1. 17 Sep, 2016 5 commits
    • Merge branch 'mlx5e-order-0' · 31b96621
      David S. Miller authored
      Tariq Toukan says:
      
      ====================
      mlx5e Order-0 pages for Striding RQ
      
      In this series, we refactor our Striding RQ receive-flow to always use
      fragmented WQEs (Work Queue Elements) using order-0 pages, omitting the
      flow that allocates and splits high-order pages, which fragments
      and depletes the high-order pages in the system.
      
      The first patch causes a slight performance degradation, but opens the
      opportunity to use a simple page-cache mechanism of a fair size.
      The page-cache, implemented in patch 3, not only closes the performance
      gap but even gives a gain.
      In patch 2 we re-organize the code to better manage the calls for
      alloc/de-alloc pages in the RX flow.
      
      Series generated against net-next commit:
      bed806cb "Merge branch 'mlxsw-ethtool'"
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Implement RX mapped page cache for page recycle · 4415a031
      Tariq Toukan authored
      Instead of reallocating and mapping pages for the RX data path,
      recycle already-used pages in a per-ring cache.
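
      The recycling idea can be sketched in plain C. This is a minimal
      userspace illustration, not the driver's code: `rx_page`,
      `rx_page_cache`, `RX_CACHE_SIZE`, and the helper names are hypothetical,
      `malloc` stands in for page allocation plus DMA mapping, and `refcount`
      models whether anyone else still holds the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define RX_CACHE_SIZE 8 /* hypothetical capacity; must be a power of two */

/* Stand-in for an allocated and mapped RX page (assumption for this sketch). */
struct rx_page {
    void *buf;
    int refcount; /* 1 means only the ring holds the page */
};

/* Per-ring FIFO cache; holds at most RX_CACHE_SIZE - 1 pages. */
struct rx_page_cache {
    unsigned int head; /* next slot to take from */
    unsigned int tail; /* next slot to put into */
    struct rx_page *slots[RX_CACHE_SIZE];
};

/* Try to stash a page for reuse; returns false when the cache is full. */
static bool rx_cache_put(struct rx_page_cache *c, struct rx_page *p)
{
    unsigned int next = (c->tail + 1) & (RX_CACHE_SIZE - 1);

    if (next == c->head)
        return false;
    c->slots[c->tail] = p;
    c->tail = next;
    return true;
}

/* Take the oldest cached page, but only if nobody else still uses it. */
static struct rx_page *rx_cache_get(struct rx_page_cache *c)
{
    struct rx_page *p;

    if (c->head == c->tail)
        return NULL; /* cache empty */
    p = c->slots[c->head];
    if (p->refcount != 1)
        return NULL; /* oldest page is still busy */
    c->head = (c->head + 1) & (RX_CACHE_SIZE - 1);
    return p;
}

/* Prefer a recycled page; fall back to a fresh allocation. */
static struct rx_page *rx_page_alloc(struct rx_page_cache *c)
{
    struct rx_page *p = rx_cache_get(c);

    if (p)
        return p;
    p = malloc(sizeof(*p));
    if (!p)
        return NULL;
    p->buf = malloc(4096);
    if (!p->buf) {
        free(p);
        return NULL;
    }
    p->refcount = 1;
    return p;
}

/* Return a page to the cache, or free it when the cache has no room. */
static void rx_page_release(struct rx_page_cache *c, struct rx_page *p)
{
    if (rx_cache_put(c, p))
        return;
    free(p->buf);
    free(p);
}
```

      In this sketch a released page goes back into the ring's cache when
      there is room, and the next allocation reuses it only if nothing else
      still references it; otherwise the code falls back to a fresh
      allocation.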
      
      Performance tests:
      The following results were measured on a freshly booted system,
      giving optimal baseline performance, as high-order pages are yet to
      be fragmented and depleted.
      
      We ran pktgen single-stream benchmarks, with iptables-raw-drop:
      
      Single stride, 64 bytes:
      * 4,739,057 - baseline
      * 4,749,550 - order0 no cache
      * 4,786,899 - order0 with cache
      1% gain
      
      Larger packets, no page cross, 1024 bytes:
      * 3,982,361 - baseline
      * 3,845,682 - order0 no cache
      * 4,127,852 - order0 with cache
      3.7% gain
      
      Larger packets, every 3rd packet crosses a page, 1500 bytes:
      * 3,731,189 - baseline
      * 3,579,414 - order0 no cache
      * 3,931,708 - order0 with cache
      5.4% gain
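
      The gain figures are relative to the baseline. A small helper, with
      the hypothetical name `percent_change`, reproduces the arithmetic from
      the numbers listed above:

```c
#include <assert.h>

/* Relative change of `measured` against `baseline`, in percent. */
static double percent_change(double baseline, double measured)
{
    return (measured - baseline) / baseline * 100.0;
}
```

      For example, `percent_change(3731189, 3931708)` evaluates to roughly
      5.4, which matches the gain reported for 1500-byte packets.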
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Introduce API for RX mapped pages · a5a0c590
      Tariq Toukan authored
      Manage the allocation and deallocation of mapped RX pages only
      through dedicated API functions.
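
      One way to picture such an API is a pair of functions that own both
      steps, so callers never allocate or map on their own. This is a minimal
      userspace sketch under stated assumptions: `mapped_page`,
      `mapped_page_alloc`, `mapped_page_release`, and the `fake_dma_*`
      stand-ins are hypothetical names, not functions from the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_BYTES 4096

struct mapped_page {
    void *page;     /* backing memory */
    uintptr_t addr; /* stand-in for a DMA address; 0 means unmapped */
};

static int live_mappings; /* bookkeeping for this example only */

/* Stand-ins for the platform's map/unmap primitives (assumptions). */
static uintptr_t fake_dma_map(void *page)
{
    if (!page)
        return 0;
    live_mappings++;
    return (uintptr_t)page;
}

static void fake_dma_unmap(uintptr_t addr)
{
    (void)addr;
    live_mappings--;
}

/* Allocate and map in one step; on failure nothing stays allocated. */
static int mapped_page_alloc(struct mapped_page *mp)
{
    mp->page = malloc(PAGE_BYTES);
    if (!mp->page)
        return -1;
    mp->addr = fake_dma_map(mp->page);
    if (!mp->addr) {
        free(mp->page);
        mp->page = NULL;
        return -1;
    }
    return 0;
}

/* Unmap and free in one step, mirroring mapped_page_alloc(). */
static void mapped_page_release(struct mapped_page *mp)
{
    fake_dma_unmap(mp->addr);
    free(mp->page);
    mp->page = NULL;
    mp->addr = 0;
}
```

      Keeping both directions behind one pair of functions gives a single
      place where a later change, such as a recycling cache, can be added
      without touching the callers.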
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Single flow order-0 pages for Striding RQ · 7e426671
      Tariq Toukan authored
      To improve the memory consumption scheme, we omit the flow that
      demands and splits high-order pages in Striding RQ, and stay
      with a single Striding RQ flow that uses order-0 pages.
      
      Moving to fragmented memory allows the use of larger MPWQEs,
      which reduces the number of UMR posts and filler CQEs.
      
      Moving to a single flow allows several optimizations that improve
      performance, especially in production servers where we would
      fall back to order-0 allocations anyway:
      - inline functions that were called via function pointers.
      - improve the UMR post process.
      
      This patch alone is expected to give a slight performance reduction.
      However, the new memory scheme makes it possible to use a page-cache
      of a fair size that doesn't inflate the memory footprint, which will
      recover the reduction and even give a performance gain.
      
      Performance tests:
      The following results were measured on a freshly booted system,
      giving optimal baseline performance, as high-order pages are yet to
      be fragmented and depleted.
      
      We ran pktgen single-stream benchmarks, with iptables-raw-drop:
      
      Single stride, 64 bytes:
      * 4,739,057 - baseline
      * 4,749,550 - this patch
      no reduction
      
      Larger packets, no page cross, 1024 bytes:
      * 3,982,361 - baseline
      * 3,845,682 - this patch
      3.5% reduction
      
      Larger packets, every 3rd packet crosses a page, 1500 bytes:
      * 3,731,189 - baseline
      * 3,579,414 - this patch
      4% reduction
      
      Fixes: 461017cb ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)")
      Fixes: bc77b240 ("net/mlx5e: Add fragmented memory support for RX multi packet WQE")
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Make IPv6 support conditional on CONFIG_IPV6 · d1912747
      David Howells authored
      Add CONFIG_AF_RXRPC_IPV6 and make the IPv6 support code conditional on it.
      The new option is in turn made conditional on CONFIG_IPV6.
      
      Without this, the following can be seen:
      
         net/built-in.o: In function `rxrpc_init_peer':
      >> peer_object.c:(.text+0x18c3c8): undefined reference to `ip6_route_output_flags'
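
      The effect of such an option on C code can be sketched as follows.
      Only `CONFIG_AF_RXRPC_IPV6` comes from the commit message;
      `peer_family_supported` is a hypothetical function used for
      illustration, and the real code guarded by the option is not shown here.

```c
#include <assert.h>
#include <sys/socket.h>

/*
 * Returns 1 if this build can handle peers of the given address family.
 * The IPv6 case is compiled only when CONFIG_AF_RXRPC_IPV6 is defined,
 * so a build without it contains no references to IPv6-only symbols.
 */
static int peer_family_supported(int family)
{
    switch (family) {
    case AF_INET:
        return 1;
#ifdef CONFIG_AF_RXRPC_IPV6
    case AF_INET6:
        /* Calls into IPv6-only code would live inside this guard. */
        return 1;
#endif
    default:
        return 0;
    }
}
```

      When the macro is not defined, the `AF_INET6` case is absent from the
      compiled object, so the linker never has to resolve IPv6 symbols.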
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 16 Sep, 2016 35 commits