1. 24 Nov, 2016 22 commits
    • samples/bpf: fix bpf loader · db6a71dd
      Alexei Starovoitov authored
      LLVM can emit relocations into sections other than program code
      (such as debug info sections). Ignore them when parsing the ELF file.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
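The filtering described in the commit above can be illustrated with a small sketch. It is not the code from the patch: `reloc_targets_program_code` is a hypothetical helper, and the assumption that "program code" means an executable `SHT_PROGBITS` section is ours. Only the `<elf.h>` types and constants are real.

```c
#include <elf.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not taken from samples/bpf): decide whether a
 * relocation section should be processed, given its own header and the
 * header of the section it applies to.
 */
static bool reloc_targets_program_code(const Elf64_Shdr *rel,
                                       const Elf64_Shdr *target)
{
    /* Only plain relocation sections are considered in this sketch. */
    if (rel->sh_type != SHT_REL)
        return false;

    /*
     * Assumption: program code lives in executable PROGBITS sections.
     * Relocations against anything else (for example debug info) are
     * skipped by the caller.
     */
    return target->sh_type == SHT_PROGBITS &&
           (target->sh_flags & SHF_EXECINSTR) != 0;
}
```

A loader loop would call such a predicate for each relocation section and simply `continue` when it returns false.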
    • samples/bpf: fix sockex2 example · d2b024d3
      Alexei Starovoitov authored
      Since LLVM commit "Do not expand UNDEF SDNode during insn selection lowering",
      LLVM generates code that uses uninitialized registers in cases
      where the C code actually uses uninitialized data.
      So this sockex2 example is technically broken.
      Fix it by fully initializing the on-stack variable.
      Also increase the verifier buffer limit, since the verifier output
      may not fit in 64k for this sockex2 code, depending on the LLVM version.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlx4: reorganize struct mlx4_en_tx_ring · e3f42f84
      Eric Dumazet authored
      Goal is to reorganize this critical structure to increase performance.
      
      ndo_start_xmit() should only dirty one cache line, and access as few
      cache lines as possible.
      
      Add sp_ (Slow Path) prefix to fields that are not used in fast path,
      to make clear what is going on.
      
      After this patch, pahole reports a much better layout: all fields
      needed by ndo_start_xmit() are packed into two cache lines instead
      of seven or eight:
      
      struct mlx4_en_tx_ring {
      	u32                        last_nr_txbb;         /*     0   0x4 */
      	u32                        cons;                 /*   0x4   0x4 */
      	long unsigned int          wake_queue;           /*   0x8   0x8 */
      	struct netdev_queue *      tx_queue;             /*  0x10   0x8 */
      	u32                        (*free_tx_desc)(struct mlx4_en_priv *, struct mlx4_en_tx_ring *, int, u8, u64, int); /*  0x18   0x8 */
      	struct mlx4_en_rx_ring *   recycle_ring;         /*  0x20   0x8 */
      
      	/* XXX 24 bytes hole, try to pack */
      
      	/* --- cacheline 1 boundary (64 bytes) --- */
      	u32                        prod;                 /*  0x40   0x4 */
      	unsigned int               tx_dropped;           /*  0x44   0x4 */
      	long unsigned int          bytes;                /*  0x48   0x8 */
      	long unsigned int          packets;              /*  0x50   0x8 */
      	long unsigned int          tx_csum;              /*  0x58   0x8 */
      	long unsigned int          tso_packets;          /*  0x60   0x8 */
      	long unsigned int          xmit_more;            /*  0x68   0x8 */
      	struct mlx4_bf             bf;                   /*  0x70  0x18 */
      	/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
      	__be32                     doorbell_qpn;         /*  0x88   0x4 */
      	__be32                     mr_key;               /*  0x8c   0x4 */
      	u32                        size;                 /*  0x90   0x4 */
      	u32                        size_mask;            /*  0x94   0x4 */
      	u32                        full_size;            /*  0x98   0x4 */
      	u32                        buf_size;             /*  0x9c   0x4 */
      	void *                     buf;                  /*  0xa0   0x8 */
      	struct mlx4_en_tx_info *   tx_info;              /*  0xa8   0x8 */
      	int                        qpn;                  /*  0xb0   0x4 */
      	u8                         queue_index;          /*  0xb4   0x1 */
      	bool                       bf_enabled;           /*  0xb5   0x1 */
      	bool                       bf_alloced;           /*  0xb6   0x1 */
      	u8                         hwtstamp_tx_type;     /*  0xb7   0x1 */
      	u8 *                       bounce_buf;           /*  0xb8   0x8 */
      	/* --- cacheline 3 boundary (192 bytes) --- */
      	long unsigned int          queue_stopped;        /*  0xc0   0x8 */
      	struct mlx4_hwq_resources  sp_wqres;             /*  0xc8  0x58 */
      	/* --- cacheline 4 boundary (256 bytes) was 32 bytes ago --- */
      	struct mlx4_qp             sp_qp;                /* 0x120  0x30 */
      	/* --- cacheline 5 boundary (320 bytes) was 16 bytes ago --- */
      	struct mlx4_qp_context     sp_context;           /* 0x150  0xf8 */
      	/* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */
      	cpumask_t                  sp_affinity_mask;     /* 0x248  0x20 */
      	enum mlx4_qp_state         sp_qp_state;          /* 0x268   0x4 */
      	u16                        sp_stride;            /* 0x26c   0x2 */
      	u16                        sp_cqn;               /* 0x26e   0x2 */
      
      	/* size: 640, cachelines: 10, members: 36 */
      	/* sum members: 600, holes: 1, sum holes: 24 */
      	/* padding: 16 */
      };
      
      Instead of this silly placement:
      
      struct mlx4_en_tx_ring {
      	u32                        last_nr_txbb;         /*     0   0x4 */
      	u32                        cons;                 /*   0x4   0x4 */
      	long unsigned int          wake_queue;           /*   0x8   0x8 */
      
      	/* XXX 48 bytes hole, try to pack */
      
      	/* --- cacheline 1 boundary (64 bytes) --- */
      	u32                        prod;                 /*  0x40   0x4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	long unsigned int          bytes;                /*  0x48   0x8 */
      	long unsigned int          packets;              /*  0x50   0x8 */
      	long unsigned int          tx_csum;              /*  0x58   0x8 */
      	long unsigned int          tso_packets;          /*  0x60   0x8 */
      	long unsigned int          xmit_more;            /*  0x68   0x8 */
      	unsigned int               tx_dropped;           /*  0x70   0x4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	struct mlx4_bf             bf;                   /*  0x78  0x18 */
      	/* --- cacheline 2 boundary (128 bytes) was 16 bytes ago --- */
      	long unsigned int          queue_stopped;        /*  0x90   0x8 */
      	cpumask_t                  affinity_mask;        /*  0x98  0x10 */
      	struct mlx4_qp             qp;                   /*  0xa8  0x30 */
      	/* --- cacheline 3 boundary (192 bytes) was 24 bytes ago --- */
      	struct mlx4_hwq_resources  wqres;                /*  0xd8  0x58 */
      	/* --- cacheline 4 boundary (256 bytes) was 48 bytes ago --- */
      	u32                        size;                 /* 0x130   0x4 */
      	u32                        size_mask;            /* 0x134   0x4 */
      	u16                        stride;               /* 0x138   0x2 */
      
      	/* XXX 2 bytes hole, try to pack */
      
      	u32                        full_size;            /* 0x13c   0x4 */
      	/* --- cacheline 5 boundary (320 bytes) --- */
      	u16                        cqn;                  /* 0x140   0x2 */
      
      	/* XXX 2 bytes hole, try to pack */
      
      	u32                        buf_size;             /* 0x144   0x4 */
      	__be32                     doorbell_qpn;         /* 0x148   0x4 */
      	__be32                     mr_key;               /* 0x14c   0x4 */
      	void *                     buf;                  /* 0x150   0x8 */
      	struct mlx4_en_tx_info *   tx_info;              /* 0x158   0x8 */
      	struct mlx4_en_rx_ring *   recycle_ring;         /* 0x160   0x8 */
      	u32                        (*free_tx_desc)(struct mlx4_en_priv *, struct mlx4_en_tx_ring *, int, u8, u64, int); /* 0x168   0x8 */
      	u8 *                       bounce_buf;           /* 0x170   0x8 */
      	struct mlx4_qp_context     context;              /* 0x178  0xf8 */
      	/* --- cacheline 9 boundary (576 bytes) was 48 bytes ago --- */
      	int                        qpn;                  /* 0x270   0x4 */
      	enum mlx4_qp_state         qp_state;             /* 0x274   0x4 */
      	u8                         queue_index;          /* 0x278   0x1 */
      	bool                       bf_enabled;           /* 0x279   0x1 */
      	bool                       bf_alloced;           /* 0x27a   0x1 */
      
      	/* XXX 5 bytes hole, try to pack */
      
      	/* --- cacheline 10 boundary (640 bytes) --- */
      	struct netdev_queue *      tx_queue;             /* 0x280   0x8 */
      	int                        hwtstamp_tx_type;     /* 0x288   0x4 */
      
      	/* size: 704, cachelines: 11, members: 36 */
      	/* sum members: 587, holes: 6, sum holes: 65 */
      	/* padding: 52 */
      };
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
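The effect of grouping fast-path fields can be reproduced on a toy structure. The sketch below is a hypothetical miniature, not the real `mlx4_en_tx_ring`: the struct names, the `LINE_OF` macro, and the 64-byte cache line are assumptions, and the stated offsets assume a typical 64-bit ABI with 8-byte alignment for `uint64_t`.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache line size in bytes */

/* Hot and cold fields interleaved: the hot fields span several lines. */
struct ring_scattered {
    uint32_t prod;          /* hot: written on every transmit */
    uint8_t  setup[120];    /* cold: only touched at setup time */
    uint64_t packets;       /* hot */
    uint64_t bytes;         /* hot */
};

/* Hot fields first, cold fields last with an "sp_" (slow path) prefix. */
struct ring_packed {
    uint32_t prod;          /* hot */
    uint64_t packets;       /* hot */
    uint64_t bytes;         /* hot */
    uint8_t  sp_setup[120]; /* cold */
};

/* Index of the cache line that holds the first byte of a member. */
#define LINE_OF(type, member) (offsetof(type, member) / CACHE_LINE)
```

With these definitions, `LINE_OF(struct ring_packed, bytes)` is 0, while in `struct ring_scattered` the `packets` and `bytes` counters land after the cold array in a later cache line. Running pahole on a compiled object would show the same information in the format quoted in the commit message.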
    • ethtool: Protect {get, set}_phy_tunable with PHY device mutex · 4b65246b
      Florian Fainelli authored
      PHY drivers should be able to rely on the caller of {get,set}_tunable
      having acquired the PHY device mutex, to serialize both against
      concurrent calls of these functions and against PHY state machine
      changes. All ethtool PHY-level functions do this, except
      {get,set}_tunable, so we make them consistent here as well.
      
      We need to update the Microsemi PHY driver in the same commit to avoid
      introducing either deadlocks or a lack of proper locking.
      
      Fixes: 968ad9da ("ethtool: Implements ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE")
      Fixes: 310d9ad5 ("net: phy: Add downshift get/set support in Microsemi PHYs driver")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
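The locking contract described above, where the core takes the lock and the driver callback assumes it is held, can be sketched in user space. Everything here is a stand-in: `fake_phy`, `fake_drv_set_downshift`, and `fake_core_set_tunable` are hypothetical names, and a pthread mutex replaces the kernel mutex.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a PHY device with its own lock. */
struct fake_phy {
    pthread_mutex_t lock;
    int downshift;
};

/*
 * Driver callback. Contract: the caller already holds phy->lock, so the
 * callback must NOT take it again (a non-recursive mutex would deadlock).
 */
static int fake_drv_set_downshift(struct fake_phy *phy, int count)
{
    if (count < 0)
        return -EINVAL;
    phy->downshift = count;
    return 0;
}

/* Core entry point: takes the lock around the driver callback. */
static int fake_core_set_tunable(struct fake_phy *phy, int count)
{
    int ret;

    pthread_mutex_lock(&phy->lock);
    ret = fake_drv_set_downshift(phy, count);
    pthread_mutex_unlock(&phy->lock);
    return ret;
}
```

This also shows why the driver has to change together with the core: if the callback kept its own lock acquisition after the core started locking, the second acquisition would deadlock; if the driver dropped its locking before the core added it, the callback would run unprotected.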
    • Merge branch 'mlx5-next' · fab96ec8
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      Mellanox 100G mlx5 SRIOV switchdev update
      
      This series from Roi and Or further enhances the new SRIOV switchdev mode.
      
      Roi's patches deal with allowing users to configure through devlink
      the level of inline headers that the VF should be setting in order for
      the eswitch HW to do proper matching. We also enforce that the matching
      required for offloaded TC rules is aligned with that level on the PF driver.
      
      Or's patches deal with allowing the user to control the VF operational
      link state through admin directives on the mlx5 VF rep link. Also in this series
      is an implementation of HW and SW counters for the mlx5 VF rep, which is aligned
      with the design set by commit a5ea31f5 'Merge branch net-offloaded-stats'.
      
      v1 --> v2:
      * constified the net-device param of get offloaded stats ndo in mlxsw
        (pointed out by 0-day screaming at us...)
      * added Or's Reviewed-by tags for Roi's patches
      
      This series was generated against commit
      e796f49d ("net: ieee802154: constify ieee802154_ops structures")
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Enforce min inline mode when offloading flows · de0af0bf
      Roi Dayan authored
      A flow should be offloaded only if the matches are
      allowed according to min inline mode.
      Signed-off-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5: E-Switch, Add control for inline mode · bffaa916
      Roi Dayan authored
      Implement devlink show and set of HW inline-mode.
      The supported modes: none, link, network, transport.
      We currently support one mode for all vports so set is done on all vports.
      When eswitch is first initialized the inline-mode is queried from the FW.
      Signed-off-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5: Enable to query min inline for a specific vport · 34e4e990
      Roi Dayan authored
      Also move the inline capabilities enum to a shared header, vport.h.
      Signed-off-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add E-Switch inline mode control · 59bfde01
      Roi Dayan authored
      Some HWs need the VF driver to put part of the packet headers on the
      TX descriptor so the e-switch can do proper matching and steering.
      
      The supported modes: none, link, network, transport.
      Signed-off-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Support VF vport link state control for SRIOV switchdev mode · 20a1ea67
      Or Gerlitz authored
      Reflect the administrative link changes done on the VF representor to the
      VF e-switch vport. This means that doing ip link set down/up commands on
      the VF rep will modify the e-switch vport state, which in turn will cause
      proper VF drivers to set their carrier accordingly.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Support HW (offloaded) and SW counters for SRIOV switchdev mode · 370bad0f
      Or Gerlitz authored
      Switchdev driver net-device port statistics should follow the model introduced
      in commit a5ea31f5 'Merge branch net-offloaded-stats'.
      
      For VF reps we return the SRIOV eswitch vport stats as the usual ones and SW stats
      if asked. For the PF, if we're in the switchdev mode, we return the uplink stats
      and SW stats if asked, otherwise as before. The uplink stats are implemented using
      the PPCNT 802_3 counters which are already being read/cached by the driver.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add net-device param to the get offloaded stats ndo · 3df5b3c6
      Or Gerlitz authored
      Some drivers need to check a few internal matters for
      that. To be used in a downstream mlx5 commit.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'phy-broadcom-wirespeed-downshift-support' · ac32378f
      David S. Miller authored
      Florian Fainelli says:
      
      ====================
      net: phy: broadcom: Wirespeed/downshift support
      
      This patch series adds support for the Broadcom Wirespeed, aka
      downshift, feature utilizing the recently added ethtool PHY tunables.
      
      Tested with two Gigabit link partners with a 4-wire cable having only
      2 pairs connected.
      
      The last patch in the series is a fix that was required for testing, which
      should make it to -stable, and which I can submit separately against net if
      you prefer, David.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link change · 30ce0de4
      Florian Fainelli authored
      In case the link changes and EEE is enabled or disabled, always try to
      re-negotiate this with the link partner.
      
      Fixes: 450b05c1 ("net: dsa: bcm_sf2: add support for controlling EEE")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: bcm7xxx: Add support for downshift/Wirespeed · db88816b
      Florian Fainelli authored
      Add support for configuring the downshift/Wirespeed enable/disable
      toggles and specifying a link retry value ranging from 1 to 9. Since
      integrated BCM7xxx PHYs have issues when Wirespeed and EEE are both
      enabled, we disable EEE whenever Wirespeed is enabled.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
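The rules stated in this commit (a retry count between 1 and 9, and EEE turned off whenever Wirespeed is turned on) can be written down as a small validation routine. This is a hedged sketch, not the driver code: `fake_phy_cfg`, `fake_set_downshift`, the use of 0 to mean "disable", and the `-ERANGE` return value are all assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

#define FAKE_DOWNSHIFT_DISABLE 0   /* assumption: 0 turns downshift off */
#define FAKE_RETRY_MIN 1
#define FAKE_RETRY_MAX 9

/* Hypothetical software view of the PHY settings involved. */
struct fake_phy_cfg {
    bool wirespeed_enabled;
    int  link_retries;
    bool eee_enabled;
};

static int fake_set_downshift(struct fake_phy_cfg *cfg, int count)
{
    if (count == FAKE_DOWNSHIFT_DISABLE) {
        cfg->wirespeed_enabled = false;
        cfg->link_retries = 0;
        return 0;   /* this sketch leaves EEE untouched on disable */
    }

    /* Reject retry counts outside the documented 1..9 range. */
    if (count < FAKE_RETRY_MIN || count > FAKE_RETRY_MAX)
        return -ERANGE;

    cfg->wirespeed_enabled = true;
    cfg->link_retries = count;
    cfg->eee_enabled = false;   /* Wirespeed and EEE do not coexist here */
    return 0;
}
```

Whether EEE should be restored when Wirespeed is later disabled is a policy choice the commit message does not spell out, so the sketch deliberately does nothing in that case.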
    • net: phy: broadcom: Allow enabling or disabling of EEE · 99cec8a4
      Florian Fainelli authored
      In preparation for adding support for Wirespeed/downshift, we need to
      change bcm_phy_eee_enable() to allow enabling or disabling EEE, so make
      the function take an extra enable/disable boolean parameter and rename
      it to illustrate it sets EEE, not necessarily just enables it.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: broadcom: Add support code for downshift/Wirespeed · d06f78c4
      Florian Fainelli authored
      Broadcom's Wirespeed feature allows us to configure how auto-negotiation
      should behave with fewer working pairs of wires on a cable. Add support
      code for retrieving and setting such downshift counters using the
      recently added ethtool downshift tunables.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: broadcom: Move bcm54xx_auxctl_{read, write} to common library · 5519da87
      Florian Fainelli authored
      We are going to need these functions to implement support for Broadcom
      Wirespeed, aka downshift.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: enhance tcp_collapse_retrans() with skb_shift() · f8071cde
      Eric Dumazet authored
      In commit 2331ccc5 ("tcp: enhance tcp collapsing"),
      we made a first step allowing copying right skb to left skb head.
      
      Since all skbs in the socket write queue are headless (except possibly
      the very first one), this strategy often does not work.
      
      This patch extends tcp_collapse_retrans() to perform frag shifting,
      thanks to skb_shift() helper.
      
      This helper must not BUG on non-headless skbs, as callers are OK
      with that.
      
      Tested:
      
      The following packetdrill test now passes:
      
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
         +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
         +0 bind(3, ..., ...) = 0
         +0 listen(3, 1) = 0
      
         +0 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 8>
         +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
      +.100 < . 1:1(0) ack 1 win 257
         +0 accept(3, ..., ...) = 4
      
         +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
         +0 write(4, ..., 200) = 200
         +0 > P. 1:201(200) ack 1
      +.001 write(4, ..., 200) = 200
         +0 > P. 201:401(200) ack 1
      +.001 write(4, ..., 200) = 200
         +0 > P. 401:601(200) ack 1
      +.001 write(4, ..., 200) = 200
         +0 > P. 601:801(200) ack 1
      +.001 write(4, ..., 200) = 200
         +0 > P. 801:1001(200) ack 1
      +.001 write(4, ..., 100) = 100
         +0 > P. 1001:1101(100) ack 1
      +.001 write(4, ..., 100) = 100
         +0 > P. 1101:1201(100) ack 1
      +.001 write(4, ..., 100) = 100
         +0 > P. 1201:1301(100) ack 1
      +.001 write(4, ..., 100) = 100
         +0 > P. 1301:1401(100) ack 1
      
      +.099 < . 1:1(0) ack 201 win 257
      +.001 < . 1:1(0) ack 201 win 257 <nop,nop,sack 1001:1401>
         +0 > P. 201:1001(800) ack 1
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: add MV88E6097 switch · 7d381a02
      Stefan Eichenberger authored
      Add support for the MV88E6097 switch. The change was tested on an Armada
      based platform with a MV88E6097 switch.
      Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/phy: add trace events for mdio accesses · e22e996b
      Uwe Kleine-König authored
      Make it possible to generate trace events for mdio read and write accesses.
      Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • VSOCK: add loopback to virtio_transport · b9116823
      Stefan Hajnoczi authored
      The VMware VMCI transport supports loopback inside virtual machines.
      This patch implements loopback for virtio-vsock.
      
      Flow control is handled by the virtio-vsock protocol as usual.  The
      sending process stops transmitting on a connection when the peer's
      receive buffer space is exhausted.
      
      Cathy Avery <cavery@redhat.com> noticed this difference between VMCI and
      virtio-vsock when a test case using loopback failed.  Although loopback
      isn't the main point of AF_VSOCK, it is useful for testing and
      virtio-vsock must match VMCI semantics so that userspace programs run
      regardless of the underlying transport.
      
      My understanding is that loopback is not supported on the host side with
      VMCI.  Follow that by implementing it only in the guest driver, not the
      vhost host driver.
      
      Cc: Jorgen Hansen <jhansen@vmware.com>
      Reported-by: Cathy Avery <cavery@redhat.com>
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
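The flow-control behaviour mentioned in this commit, where the sender stops once the peer's receive buffer is exhausted, can be modelled with a credit counter. The sketch below is a simplified user-space model, not the kernel implementation: `fake_credit`, `fake_peer_free`, and `fake_try_send` are hypothetical names, and the field names are only modelled on the buffer-allocation and forwarded-byte counters of the virtio socket device.

```c
#include <stdint.h>

/* Hypothetical per-connection credit state kept by the sender. */
struct fake_credit {
    uint32_t peer_buf_alloc; /* receive buffer size advertised by the peer */
    uint32_t peer_fwd_cnt;   /* bytes the peer reports having consumed */
    uint32_t tx_cnt;         /* bytes this side has sent so far */
};

/* Bytes the peer can still accept: buffer size minus bytes in flight. */
static uint32_t fake_peer_free(const struct fake_credit *c)
{
    return c->peer_buf_alloc - (c->tx_cnt - c->peer_fwd_cnt);
}

/*
 * Send at most len bytes, limited by the available credit.
 * Returns the number of bytes accounted as sent (0 means "must wait").
 */
static uint32_t fake_try_send(struct fake_credit *c, uint32_t len)
{
    uint32_t free_bytes = fake_peer_free(c);
    uint32_t n = len < free_bytes ? len : free_bytes;

    c->tx_cnt += n;
    return n;
}
```

In this model a loopback connection needs no special handling: sender and receiver live in the same guest, but the sender still waits until the receiving side reports consumed bytes through `peer_fwd_cnt`.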
  2. 22 Nov, 2016 17 commits
  3. 21 Nov, 2016 1 commit