1. 10 Jan, 2018 14 commits
    • ipv6: Add support for non-equal-cost multipath · 398958ae
      Ido Schimmel authored
      The use of hash-threshold instead of modulo-N makes it trivial to add
      support for non-equal-cost multipath.
      
      Instead of dividing the multipath hash function's output space equally
      between the nexthops, each nexthop is assigned a region size which is
      proportional to its weight.
      
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
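      The weighted split can be illustrated with a minimal C sketch. It assumes a simplified, hypothetical struct nh_sketch rather than the kernel's real nexthop types. Each active nexthop's inclusive upper bound is placed at (cumulative weight / total weight) of the 31-bit hash space, rounded to nearest. Inactive nexthops get -1. Lookup picks the first nexthop whose bound is at least the hash. The names, the rounding and the weight cap are illustrative assumptions, not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified nexthop holding only what this sketch needs. */
struct nh_sketch {
    unsigned int weight;   /* relative weight; this sketch caps it at 256 */
    bool active;           /* false for dead or linkdown nexthops */
    int32_t upper_bound;   /* inclusive end of this nexthop's region */
};

/*
 * Give each active nexthop a slice of [0, 2^31 - 1] proportional to its
 * weight. Inactive nexthops get -1 so a non-negative hash never selects them.
 */
static void nh_set_weighted_bounds(struct nh_sketch *nhs, size_t n)
{
    uint64_t total = 0, w = 0;

    for (size_t i = 0; i < n; i++)
        if (nhs[i].active)
            total += nhs[i].weight;

    for (size_t i = 0; i < n; i++) {
        if (!nhs[i].active || total == 0) {
            nhs[i].upper_bound = -1;
            continue;
        }
        /* Small weights keep w << 31 well inside 64 bits. */
        assert(nhs[i].weight <= 256);
        w += nhs[i].weight;
        /* Round (w / total) * 2^31 to nearest, then make it inclusive. */
        nhs[i].upper_bound = (int32_t)(((w << 31) + total / 2) / total - 1);
    }
}

/* Return the first nexthop whose region contains the 31-bit hash, or -1. */
static int nh_select(const struct nh_sketch *nhs, size_t n, int32_t hash31)
{
    for (size_t i = 0; i < n; i++)
        if (hash31 <= nhs[i].upper_bound)
            return (int)i;
    return -1;
}
```

      With weights 1 and 3, the first nexthop's region ends at 2^29 - 1. It therefore receives a quarter of the hash space, and the second nexthop receives the remaining three quarters.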
    • ipv6: Use hash-threshold instead of modulo-N · 3d709f69
      Ido Schimmel authored
      Now that each nexthop stores its region boundary in the multipath hash
      function's output space, we can use hash-threshold instead of modulo-N
      in multipath selection.
      
      This reduces the number of checks we need to perform during lookup, as
      dead and linkdown nexthops are assigned a negative region boundary. In
      addition, in contrast to modulo-N, only flows near region boundaries are
      affected when a nexthop is added or removed.
      
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
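      The difference in flow disruption can be shown with a small C sketch (not kernel code). It counts how many pseudo-random 31-bit hashes change nexthop when a route shrinks from four equal-weight nexthops to three. Two assumptions are made for brevity: the xorshift32 generator stands in for the multipath hash, and select_threshold computes the equal-region index directly instead of walking stored boundaries.

```c
#include <assert.h>
#include <stdint.h>

/* Modulo-N: the nexthop index is simply hash mod N. */
static unsigned int select_modulo(uint32_t hash31, unsigned int n)
{
    assert(n > 0);
    return hash31 % n;
}

/*
 * Hash-threshold with N equal regions over [0, 2^31):
 * nexthop i owns [i * 2^31 / N, (i + 1) * 2^31 / N).
 */
static unsigned int select_threshold(uint32_t hash31, unsigned int n)
{
    return (unsigned int)(((uint64_t)hash31 * n) >> 31);
}

/* xorshift32: a small PRNG standing in for the multipath hash. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/*
 * Count how many of `samples` flows change nexthop when the route shrinks
 * from 4 to 3 nexthops (the last nexthop is removed).
 */
static void count_remapped(unsigned int samples, unsigned int *moved_modulo,
                           unsigned int *moved_threshold)
{
    uint32_t state = 0x12345678u;

    *moved_modulo = 0;
    *moved_threshold = 0;
    for (unsigned int i = 0; i < samples; i++) {
        uint32_t h = xorshift32(&state) >> 1;   /* keep the upper 31 bits */

        if (select_modulo(h, 4) != select_modulo(h, 3))
            (*moved_modulo)++;
        if (select_threshold(h, 4) != select_threshold(h, 3))
            (*moved_threshold)++;
    }
}
```

      For uniformly distributed hashes, modulo-N keeps a flow on its nexthop only when h mod 4 equals h mod 3, so about three quarters of the flows move. Hash-threshold moves about half of them, and a quarter of all flows had to move anyway because their nexthop was removed.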
    • ipv6: Use a 31-bit multipath hash · 7696c06a
      Ido Schimmel authored
      The hash thresholds assigned to IPv6 nexthops are in the range of
      [-1, 2^31 - 1], where a negative value is assigned to nexthops that
      should not be considered during multipath selection.
      
      Therefore, in a similar fashion to IPv4, we need to use the upper
      31 bits of the multipath hash for multipath selection.
      
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
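      This hedged C sketch shows why the upper 31 bits are used. The shifted value always lies in [0, 2^31 - 1], so comparing it against a -1 boundary can never select an excluded nexthop. If the full 32-bit hash were interpreted as a signed value instead, hashes with the top bit set would become negative and would match a -1 boundary. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Keep the upper 31 bits so the result is always a non-negative int32_t. */
static int32_t mp_hash31(uint32_t hash32)
{
    return (int32_t)(hash32 >> 1);
}

/*
 * Return the index of the first nexthop whose upper bound is >= hash31,
 * or -1 if none. A bound of -1 marks a nexthop that must not be used:
 * since hash31 >= 0, the test hash31 <= -1 can never succeed.
 */
static int select_nexthop(const int32_t *upper_bounds, int n, int32_t hash31)
{
    for (int i = 0; i < n; i++)
        if (hash31 <= upper_bounds[i])
            return i;
    return -1;
}
```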
    • ipv6: Calculate hash thresholds for IPv6 nexthops · d7dedee1
      Ido Schimmel authored
      Before we convert IPv6 to use hash-threshold instead of modulo-N, we
      first need each nexthop to store its region boundary in the hash
      function's output space.
      
      The boundary is calculated by dividing the output space equally between
      the active nexthops, that is, nexthops that are neither dead nor
      linkdown.
      
      The boundaries are rebalanced whenever a nexthop is added to or removed
      from a multipath route and whenever a nexthop becomes active or inactive.
      
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
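      A minimal C sketch of the equal-split rebalancing described above. It assumes a plain array of activity flags and int32_t bounds, which are hypothetical types rather than the kernel's. Each active nexthop's inclusive upper bound sits at seen/total of the 31-bit hash space, rounded to nearest. Dead or linkdown nexthops get -1. Calling the function again after any add, remove or state change rebalances the regions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Recompute region boundaries so that the active nexthops split
 * [0, 2^31 - 1] equally. Inactive (dead or linkdown) nexthops get -1.
 */
static void rebalance_equal(const bool *active, int32_t *upper_bounds, int n)
{
    uint64_t total = 0, seen = 0;

    for (int i = 0; i < n; i++)
        if (active[i])
            total++;

    for (int i = 0; i < n; i++) {
        if (!active[i]) {
            upper_bounds[i] = -1;
            continue;
        }
        seen++;
        /* Boundary at seen/total of the space, rounded to nearest, inclusive. */
        upper_bounds[i] = (int32_t)(((seen << 31) + total / 2) / total - 1);
    }
}
```

      With nexthops {active, inactive, active}, the two active nexthops end at 2^30 - 1 and 2^31 - 1. When the middle nexthop becomes active again, the space is re-split into thirds.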
    • vhost_net: batch used ring update in rx · e2b3b35e
      Jason Wang authored
      This patch tries to batch used ring updates during RX. This is a good
      fit for the case when the guest is much faster (e.g. a DPDK-based
      backend). In this case, the used ring is almost empty:
      
      - we may get serious cache line misses/contention on both the used
        ring and the used idx.
      - at most 1 packet can be dequeued at a time, so batching in the guest
        has little effect.
      
      Updating the used ring in batches helps because the guest won't access
      the used ring until the used idx has been advanced for several
      descriptors. Since we advance the used idx once every N packets, the
      guest only needs to access the used idx once every N packets and can
      cache it in between. To interact well with both batch dequeuing and
      DPDK batching, VHOST_RX_BATCH is used as the maximum number of
      descriptors that can be batched.
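      The batching idea can be sketched in C as follows. This is a simplified model, not the vhost_net code. The names struct used_ring_sketch, RX_BATCH and RING_SIZE, and the update counter, are hypothetical. RX_BATCH only stands in for VHOST_RX_BATCH. Used entries are staged locally, and the guest-visible used idx is advanced once per batch instead of once per packet.

```c
#include <assert.h>
#include <stdint.h>

#define RX_BATCH 64    /* hypothetical stand-in for VHOST_RX_BATCH */
#define RING_SIZE 256  /* hypothetical ring size (a power of two) */

struct used_elem {
    uint32_t id;   /* descriptor head index */
    uint32_t len;  /* bytes written into the buffer */
};

/* Hypothetical, simplified used ring: entries plus the idx the guest polls. */
struct used_ring_sketch {
    struct used_elem ring[RING_SIZE];
    uint16_t idx;              /* published used idx, read by the guest */
    unsigned int idx_updates;  /* how many times idx was published */
};

/* Entries staged locally until a whole batch is ready. */
struct rx_batch {
    struct used_elem pending[RX_BATCH];
    unsigned int count;
};

/* Write all staged entries, then publish them with a single idx update. */
static void rx_batch_flush(struct used_ring_sketch *u, struct rx_batch *b)
{
    if (b->count == 0)
        return;
    for (unsigned int i = 0; i < b->count; i++)
        u->ring[(uint16_t)(u->idx + i) % RING_SIZE] = b->pending[i];
    /* A real implementation would need a write barrier before bumping idx. */
    u->idx += b->count;
    u->idx_updates++;
    b->count = 0;
}

/* Stage one used descriptor and flush once RX_BATCH entries are pending. */
static void rx_batch_add(struct used_ring_sketch *u, struct rx_batch *b,
                         uint32_t id, uint32_t len)
{
    assert(b->count < RX_BATCH);
    b->pending[b->count].id = id;
    b->pending[b->count].len = len;
    if (++b->count == RX_BATCH)
        rx_batch_flush(u, b);
}
```

      A real RX loop would also flush a partial batch when no more packets are pending. Otherwise the guest would be left waiting for descriptors that will never fill a full batch.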
      
      Tests were done between two machines with 2.40GHz Intel(R) Xeon(R) CPU
      E5-2630 connected back to back through ixgbe. Traffic was generated
      with MoonGen on the remote ixgbe, and RX pps was measured with testpmd
      in the guest while running xdp_redirect_map from the local ixgbe to
      tap. RX pps increased from 3.05 Mpps to 4.00 Mpps (about 31%
      improvement).
      
      One possible concern is the implication for TCP (especially
      latency-sensitive workloads). Result[1] does not show obvious changes
      for most of the netperf tests (RR, TX, and RX), and we do get some
      improvements for RX at some specific sizes.
      
      Guest RX:
      
      size/sessions/+thu%/+normalize%
         64/     1/   +2%/   +2%
         64/     2/   +2%/   -1%
         64/     4/   +1%/   +1%
         64/     8/    0%/    0%
        256/     1/   +6%/   -3%
        256/     2/   -3%/   +2%
        256/     4/  +11%/  +11%
        256/     8/    0%/    0%
        512/     1/   +4%/    0%
        512/     2/   +2%/   +2%
        512/     4/    0%/   -1%
        512/     8/   -8%/   -8%
       1024/     1/   -7%/  -17%
       1024/     2/   -8%/   -7%
       1024/     4/   +1%/    0%
       1024/     8/    0%/    0%
       2048/     1/  +30%/  +14%
       2048/     2/  +46%/  +40%
       2048/     4/    0%/    0%
       2048/     8/    0%/    0%
       4096/     1/  +23%/  +22%
       4096/     2/  +26%/  +23%
       4096/     4/    0%/   +1%
       4096/     8/    0%/    0%
      16384/     1/   -2%/   -3%
      16384/     2/   +1%/   -4%
      16384/     4/   -1%/   -3%
      16384/     8/    0%/   -1%
      65535/     1/  +15%/   +7%
      65535/     2/   +4%/   +7%
      65535/     4/    0%/   +1%
      65535/     8/    0%/    0%
      
      TCP_RR:
      
      size/sessions/+thu%/+normalize%
          1/     1/    0%/   +1%
          1/    25/   +2%/   +1%
          1/    50/   +4%/   +1%
         64/     1/    0%/   -4%
         64/    25/   +2%/   +1%
         64/    50/    0%/   -1%
        256/     1/    0%/    0%
        256/    25/    0%/    0%
        256/    50/   +4%/   +2%
      
      Guest TX:
      
      size/sessions/+thu%/+normalize%
         64/     1/   +4%/   -2%
         64/     2/   -6%/   -5%
         64/     4/   +3%/   +6%
         64/     8/    0%/   +3%
        256/     1/  +15%/  +16%
        256/     2/  +11%/  +12%
        256/     4/   +1%/    0%
        256/     8/   +5%/   +5%
        512/     1/   -1%/   -6%
        512/     2/    0%/   -8%
        512/     4/   -2%/   +4%
        512/     8/   +6%/   +9%
       1024/     1/   +3%/   +1%
       1024/     2/   +3%/   +9%
       1024/     4/    0%/   +7%
       1024/     8/    0%/   +7%
       2048/     1/   +8%/   +2%
       2048/     2/   +3%/   -1%
       2048/     4/   -1%/  +11%
       2048/     8/   +3%/   +9%
       4096/     1/   +8%/   +8%
       4096/     2/    0%/   -7%
       4096/     4/   +4%/   +4%
       4096/     8/   +2%/   +5%
      16384/     1/   -3%/   +1%
      16384/     2/   -1%/  -12%
      16384/     4/   -1%/   +5%
      16384/     8/    0%/   +1%
      65535/     1/    0%/   -3%
      65535/     2/   +5%/  +16%
      65535/     4/   +1%/   +2%
      65535/     8/   +1%/   -1%
      
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-updates-2018-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 65d51f26
      David S. Miller authored
      mlx5-updates-2018-01-08
      
      Four patches from Or that add Hairpin support to mlx5:
      ===========================================================
      From:  Or Gerlitz <ogerlitz@mellanox.com>
      
      We refer to the ability of the NIC HW to forward a packet received on
      one port to the other port (or from a port to itself) as hairpin. The
      application API is based on ingress tc/flower rules set on the NIC
      with the mirred redirect action. Other actions can apply to packets
      during the redirect.
      
      Hairpin allows offloading the data path of various SW DDoS gateways,
      load balancers, etc. to HW. Packets go through all the required
      processing in HW (header rewrite, encap/decap, push/pop vlan) and are
      then forwarded, while CPU usage stays at practically zero. HW flow
      counters are used by the control plane for monitoring and accounting.
      
      Hairpin is implemented by pairing a receive queue (RQ) with a send
      queue (SQ). All the flows that share <recv NIC, mirred NIC> are
      redirected through the same hairpin pair. Currently, only header
      rewrite is supported as a packet modification action.
      
      I'd like to thank Elijah Shakkour <elijahs@mellanox.com> for
      implementing this functionality on the HW simulator before it was
      available in the FW, so the driver code could be tested early.
      ===========================================================
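      The pair-sharing rule above (one RQ/SQ pair per <recv NIC, mirred NIC>) can be modeled with a small lookup table. This C sketch is an illustration only. The struct hairpin_pair type, the fixed-size table and the use of ifindex values to identify NICs are assumptions, not the mlx5 implementation. Only the lookup-or-create side is shown.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HAIRPIN_PAIRS 16  /* hypothetical table capacity */

/* Hypothetical hairpin pair: one RQ on the receiving NIC paired with one SQ. */
struct hairpin_pair {
    int recv_ifindex;     /* NIC the packets arrive on */
    int mirred_ifindex;   /* NIC the mirred action redirects to */
    unsigned int refcnt;  /* number of offloaded flows using this pair */
};

struct hairpin_table {
    struct hairpin_pair pairs[MAX_HAIRPIN_PAIRS];
    size_t used;
};

/*
 * Return the pair for <recv NIC, mirred NIC>, creating it on first use, so
 * every flow with the same two NICs shares one RQ/SQ pair.
 * Returns NULL if the table is full.
 */
static struct hairpin_pair *hairpin_get(struct hairpin_table *t,
                                        int recv_ifindex, int mirred_ifindex)
{
    for (size_t i = 0; i < t->used; i++) {
        struct hairpin_pair *p = &t->pairs[i];

        if (p->recv_ifindex == recv_ifindex &&
            p->mirred_ifindex == mirred_ifindex) {
            p->refcnt++;
            return p;
        }
    }
    if (t->used == MAX_HAIRPIN_PAIRS)
        return NULL;
    t->pairs[t->used] = (struct hairpin_pair){
        .recv_ifindex = recv_ifindex,
        .mirred_ifindex = mirred_ifindex,
        .refcnt = 1,
    };
    return &t->pairs[t->used++];
}
```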
      
      Three small patches from Feras allow IPoIB child interfaces to support
      RX timestamping, simply by hooking the mlx5e timestamping PTP ioctl
      into the IPoIB child interface netdev profile.
      
      One patch from Gal fixes a spelling mistake.
      
      Two patches from Eugenia add drop counters to the VF statistics
      reported via netlink (iproute2) and implement them in the mlx5 eswitch.
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'hns3-next' · 45f89822
      David S. Miller authored
      Peng Li says:
      
      ====================
      code improvements in HNS3 driver
      
      This patchset addresses two comments from community review.
      [patch 1/2] reverts "net: hns3: Add packet statistics of netdev",
      as reported by Jakub Kicinski and David Miller.
      [patch 2/2] puts the function type on the same line as
      hns3_nic_get_stats64, as reported by Andrew Lunn.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: report the function type the same line with hns3_nic_get_stats64 · 6c88d9d7
      Peng Li authored
      The function type should be on the same line as the function name;
      otherwise, a patch that edits the function may be displayed
      incorrectly. An example follows:
      https://www.spinics.net/lists/netdev/msg476141.html
      
      Signed-off-by: Peng Li <lipeng321@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "net: hns3: Add packet statistics of netdev" · bf909456
      Peng Li authored
      This reverts commit 84910007.
      
      Adding the netdev statistics to ethtool -S is redundant, since they
      duplicate existing statistics.
      
      Signed-off-by: Peng Li <lipeng321@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'Socionext-Synquacer-NETSEC-driver' · 68d5c265
      David S. Miller authored
      Jassi Brar says:
      
      ====================
      Socionext Synquacer NETSEC driver
      
      Changes since v5
      	# Removed helper macros
      	# Removed 'inline' qualifier
      	# Changed multiline empty comment to single line
      	# Added 'clock-names' property in DT binding example
      	# Ignore 'clock-names' property in driver until firmware in the wild
      	  is upgraded or we support instances that take more than one clock.
      	# Rebased the patchset onto net-next
      
      Changes since v4
              # Fixed ucode indexing as a word, instead of byte
              # Removed redundant clocks, keep only phy rate reference clock
                and expect it to be 'phy_ref_clk'
      
      Changes since v3
              # Discard 'socionext,snq-mdio', and simply use 'mdio' subnode.
              # Use ioremap on ucode region as well, instead of memremap.
      
      Changes since v2
              # Use 'mdio' subnode in DT bindings.
              # Use phy_interface_mode_is_rgmii(), instead of open coding the check.
              # Use readl/b with eeprom_base pointer.
              # Unregister mdio bus upon failure in probe.
      
      Changes since v1
              # Switched from using memremap to ioremap
              # Implemented ndo_do_ioctl callback
              # Defined optional 'dma-coherent' DT property
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • MAINTAINERS: Add entry for Socionext ethernet driver · 919e66a2
      Jassi Brar authored
      Add entry for the Socionext Netsec controller driver and DT bindings.
      
      Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: socionext: Add Synquacer NetSec driver · 533dd11a
      Jassi Brar authored
      This driver adds support for Socionext "netsec" IP Gigabit
      Ethernet + PHY IP used in the Synquacer SC2A11 SoC.
      
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dt-bindings: net: Add DT bindings for Socionext Netsec · f78f4107
      Jassi Brar authored
      This patch adds documentation for Device-Tree bindings for the
      Socionext NetSec Controller driver.
      
      Reviewed-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · c215dae4
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      10GbE Intel Wired LAN Driver Updates 2018-01-09
      
      This series contains updates to ixgbe and ixgbevf only.
      
      Emil fixes an issue with Wake on LAN (WoL) where we need to ensure we
      enable the reception of multicast packets so that WoL works for IPv6
      magic packets. He also cleaned up code that is no longer needed after
      the update to adaptive ITR.
      
      Paul updates the driver to advertise the highest capable link speed
      when a module gets inserted. He also extended the firmware version
      display to include the iSCSI and OEM blocks in the EEPROM, to better
      identify firmware versions/images.
      
      Tonghao Zhang cleans up a code comment that no longer applies since
      InterruptThrottleRate has been removed from the driver.
      
      Alex fixes the SR-IOV and MACVLAN offload interaction, where the
      MACVLAN offload was incorrectly configuring several filters with the
      wrong pool value, which resulted in MACVLAN interfaces not being able
      to receive traffic that had to pass over the physical interface. He
      also fixed transmit hangs and dropped receive frames when the number
      of VFs changed, added support for RSS on MACVLAN pools for X550
      devices, fixed up the MACVLAN limitations so that 63 offloaded devices
      are now supported, and cleaned up MACVLAN code that is no longer
      needed after the recent changes and fixes.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 09 Jan, 2018 26 commits