1. 27 Mar, 2017 19 commits
  2. 26 Mar, 2017 10 commits
  3. 25 Mar, 2017 11 commits
    • Merge branch 'epoll-busypoll' · 2239cc63
      David S. Miller authored
      Alexander Duyck says:
      
      ====================
      Add busy poll support for epoll
      
      This patch set adds support for using busy polling with epoll. The main
      idea behind this is that we record the NAPI ID for the last event that is
      moved onto the ready list for the epoll context and then when we no longer
      have any events on the ready list we begin polling with that ID. If the
      busy polling does not yield any events then we will reset the NAPI ID to 0
      and wait until a new event is added to the ready list with a valid NAPI ID
      before we will resume busy polling.
      
      Most of the changes in this set authored by me are meant to be cleanup or
      fixes for various things. For example, I am trying to make it so that we
      don't perform hash look-ups for the NAPI instance when we are only working
      with sender_cpu and the like.
      
      At the heart of this set are the last 3 patches, which enable epoll support
      and add support for obtaining the NAPI ID of a given socket. With these it
      becomes possible for an application to make use of epoll and get optimal
      busy poll utilization by stacking multiple sockets with the same NAPI ID on
      the same epoll context.
      
      v1: The first version of this series only allowed epoll to busy poll if all
          of the sockets with a NAPI ID shared the same NAPI ID. I feel we were
          too strict with this requirement, so I changed the behavior for v2.
      v2: The second version was pretty much a full rewrite of the first set. The
          main changes consisted of pulling apart several patches to better
          address the need to clean up a few items and to make the code easier to
          review. In the set, however, I went a bit overboard and was trying to fix
          an issue that would only occur with 500+ years of uptime, and in the
          process limited the range for busy_poll/busy_read unnecessarily.
      v3: Split off the code for limiting busy_poll and busy_read into a separate
          patch for net.
          Updated patch that changed busy loop time tracking so that it uses
          "local_clock() >> 10" as we originally did.
          Tweaked "Change return type.." patch by moving declaration of "work"
          inside the loop where it was accessed and always reset to 0.
          Added "Acked-by" for patches that received acks.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
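The NAPI ID tracking described in the cover letter above can be modelled in user space. The following is only a toy sketch of that state machine, not the kernel code: `struct ep_model`, the `ep_model_*` helpers, and the `poll_none`/`poll_one` stubs are hypothetical names, and the poll callback merely stands in for busy polling a NAPI instance.

```c
#include <stdbool.h>

/* Toy model of the per-epoll state (all names are hypothetical). */
struct ep_model {
    unsigned int napi_id; /* 0 means "nothing to busy poll on" */
    unsigned int ready;   /* number of events on the ready list */
};

/* An event from a socket is moved onto the ready list. */
static void ep_model_add_ready(struct ep_model *ep, unsigned int sock_napi_id)
{
    ep->ready++;
    if (sock_napi_id != 0) /* only a valid ID replaces the recorded one */
        ep->napi_id = sock_napi_id;
}

/* The application consumes everything on the ready list. */
static unsigned int ep_model_drain(struct ep_model *ep)
{
    unsigned int n = ep->ready;

    ep->ready = 0;
    return n;
}

/*
 * Busy poll only when the ready list is empty and an ID is recorded.
 * poll_fn returns how many events the poll produced; if it produced
 * none, the recorded ID is reset to 0 until a new event supplies one.
 */
static bool ep_model_busy_poll(struct ep_model *ep,
                               unsigned int (*poll_fn)(unsigned int napi_id))
{
    unsigned int found;

    if (ep->ready != 0 || ep->napi_id == 0)
        return false;

    found = poll_fn(ep->napi_id);
    if (found == 0) {
        ep->napi_id = 0;
        return false;
    }
    ep->ready += found;
    return true;
}

/* Stub pollers for experimentation: an idle queue and a busy queue. */
static unsigned int poll_none(unsigned int napi_id) { (void)napi_id; return 0; }
static unsigned int poll_one(unsigned int napi_id)  { (void)napi_id; return 1; }
```

In this model, a single unproductive poll is enough to drop the recorded ID, which mirrors the "reset the NAPI ID to 0 and wait" behavior in the description. The real implementation also bounds the poll by a timeout, which the sketch leaves out.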
    • net: Introduce SO_INCOMING_NAPI_ID · 6d433902
      Sridhar Samudrala authored
      This socket option returns the NAPI ID associated with the queue on which
      the last frame is received. This information can be used by the apps to
      split the incoming flows among the threads based on the Rx queue on which
      they are received.
      
      If the NAPI ID actually represents a sender_cpu then the value is ignored
      and 0 is returned.
      Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
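As a rough illustration of how an application might read this option, here is a minimal sketch. The fallback value 56 is the number used by the generic Linux socket header and differs on some architectures, so the system header definition should win when present. `get_incoming_napi_id` and `worker_for_napi_id` are hypothetical helpers, and the modulo mapping is just one possible policy, not something the patch prescribes.

```c
#include <sys/socket.h>
#include <errno.h>

/* Fallback for older headers; value from the generic Linux socket header.
 * Some architectures use a different number. */
#ifndef SO_INCOMING_NAPI_ID
#define SO_INCOMING_NAPI_ID 56
#endif

/*
 * Query the NAPI ID recorded for a socket.
 * Returns 0 on success and -1 with errno set on failure. On success a
 * stored value of 0 means no valid NAPI ID is associated with the socket.
 */
static int get_incoming_napi_id(int fd, unsigned int *napi_id)
{
    socklen_t len = sizeof(*napi_id);

    *napi_id = 0;
    return getsockopt(fd, SOL_SOCKET, SO_INCOMING_NAPI_ID, napi_id, &len);
}

/*
 * Hypothetical policy: pick a worker thread for a flow from its NAPI ID,
 * so that flows arriving on the same Rx queue land on the same worker.
 * Flows without a valid ID fall back to worker 0.
 */
static unsigned int worker_for_napi_id(unsigned int napi_id,
                                       unsigned int nworkers)
{
    if (napi_id == 0 || nworkers == 0)
        return 0;
    return napi_id % nworkers;
}
```

A server could call `get_incoming_napi_id()` on each accepted socket and hand the descriptor to the worker chosen by `worker_for_napi_id()`, falling back to its usual distribution when the call fails (for example on kernels without this option).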
    • epoll: Add busy poll support to epoll with socket fds. · bf3b9f63
      Sridhar Samudrala authored
      This patch adds busy poll support to epoll. The implementation is meant to
      be opportunistic in that it will take the NAPI ID from the last socket
      that is added to the ready list that contains a valid NAPI ID and it will
      use that for busy polling until the ready list goes empty.  Once the ready
      list goes empty the NAPI ID is reset and busy polling is disabled until a
      new socket is added to the ready list.
      
      In addition when we insert a new socket into the epoll we record the NAPI
      ID and assume we are going to receive events on it.  If that doesn't occur
      it will be evicted as the active NAPI ID and we will resume normal
      behavior.
      
      An application can use SO_INCOMING_CPU or SO_ATTACH_REUSEPORT_CBPF/EBPF socket
      options to spread the incoming connections to specific worker threads
      based on the incoming queue. This enables epoll for each worker thread
      to have only sockets that receive packets from a single queue. So when an
      application calls epoll_wait() and there are no events available to report,
      busy polling is done on the associated queue to pull the packets.
      Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Commonize busy polling code to focus on napi_id instead of socket · 7db6b048
      Sridhar Samudrala authored
      Move the core functionality in sk_busy_loop() to napi_busy_loop() and
      make it independent of sk.
      
      This enables re-using this function in epoll busy loop implementation.
      Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Track start of busy loop instead of when it should end · 37056719
      Alexander Duyck authored
      This patch flips the logic we were using to determine if the busy polling
      has timed out.  The main motivation for this is that we will need to
      support two different possible timeout values in the future and by
      recording the start time rather than when we would want to end we can focus
      on making the end_time specific to the task, be it epoll or socket based
      polling.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
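The idea of recording the start time and letting each caller supply its own budget can be sketched as follows. This is an illustrative user-space model only: `approx_usecs` and `busy_loop_timed_out` are hypothetical names, the shift by 10 follows the "local_clock() >> 10" approximation mentioned in the series notes, and the elapsed-time comparison is an assumption about how such a check could be written, not a copy of the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Approximate microseconds from a nanosecond clock: >> 10 divides by 1024. */
static uint64_t approx_usecs(uint64_t ns)
{
    return ns >> 10;
}

/*
 * Decide whether a busy loop that began at start_us has run out of time.
 * The budget is passed by the caller, so socket polling and epoll polling
 * can each use their own timeout against the same recorded start time.
 * A budget of 0 means busy polling is disabled, so it times out at once.
 */
static bool busy_loop_timed_out(uint64_t start_us, uint64_t now_us,
                                uint64_t budget_us)
{
    if (budget_us == 0)
        return true;
    /* Unsigned subtraction stays correct if the clock value wraps. */
    return now_us - start_us > budget_us;
}
```

Because only the start time is stored, the same value can be checked against two different budgets without recomputing an end time for each one.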
    • net: Change return type of sk_busy_loop from bool to void · 2b5cd0df
      Alexander Duyck authored
      Only a few spots are actually checking the return value of sk_busy_loop. As there are only a few
      consumers of that data, and the data being checked for can be replaced
      with a check for !skb_queue_empty() we might as well just pull the code
      out of sk_busy_loop and place it in the spots that actually need it.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Only define skb_mark_napi_id in one spot instead of two · d2e64dbb
      Alexander Duyck authored
      Instead of defining two versions of skb_mark_napi_id I think it is more
      readable to just match the format of the sk_mark_napi_id functions and just
      wrap the contents of the function instead of defining two versions of the
      function.  This way we can save a few lines of code since we only need 2 of
      the ifdef/endif but needed 5 for the extra function declaration.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
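The pattern described here, one function definition whose body is conditionally compiled, might look roughly like the sketch below. The `struct pkt` and `struct napi_model` types and the `pkt_mark_napi_id` name are simplified stand-ins invented for illustration, and `CONFIG_NET_RX_BUSY_POLL` is assumed to be the relevant build option.

```c
/* Simplified stand-ins for the kernel structures (hypothetical). */
struct pkt {
    unsigned int napi_id;
};

struct napi_model {
    unsigned int napi_id;
};

/*
 * A single definition: only the body depends on the config option.
 * The alternative would be two complete copies of the function, one
 * real and one empty, each inside its own #ifdef branch.
 */
static inline void pkt_mark_napi_id(struct pkt *p, const struct napi_model *n)
{
#ifdef CONFIG_NET_RX_BUSY_POLL
    p->napi_id = n->napi_id;
#else
    (void)p; /* nothing to record when busy polling is compiled out */
    (void)n;
#endif
}
```

With the option undefined the call compiles to nothing, so callers need no conditional code of their own.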
    • tcp: Record Rx hash and NAPI ID in tcp_child_process · e5907459
      Alexander Duyck authored
      While working on some recent busy poll changes we found that child sockets
      were being instantiated without NAPI ID being set.  In our first attempt to
      fix it, it was suggested that we should just pull programming the NAPI ID
      into the function itself since all callers will need to have it set.
      
      In addition to the NAPI ID change I have dropped the code that was
      populating the Rx hash since it was actually being populated in
      tcp_get_cookie_sock.
      Reported-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Busy polling should ignore sender CPUs · 545cd5e5
      Alexander Duyck authored
      This patch is a cleanup/fix for NAPI IDs following the changes that made it
      so that sender_cpu and napi_id were doing a better job of sharing the same
      location in the sk_buff.
      
      One issue I found is that we weren't validating the napi_id as being valid
      before we started trying to set up the busy polling.  This change corrects
      that by using the MIN_NAPI_ID value that is now used in both allocating the
      NAPI IDs, as well as validating them.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
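Because the field is shared between sender_cpu and napi_id, a stored value has to be range-checked before it is treated as a NAPI ID. The sketch below shows one way a single threshold could serve both allocation and validation. The helper names are hypothetical, `NR_CPUS` is given an arbitrary value of 64 for the example, and defining `MIN_NAPI_ID` as `NR_CPUS + 1` is an assumption about how the threshold keeps CPU numbers and NAPI IDs apart.

```c
#include <stdbool.h>

#ifndef NR_CPUS
#define NR_CPUS 64 /* assumed value, for illustration only */
#endif

/* Assumed threshold: values below it may be sender CPU numbers. */
#define MIN_NAPI_ID ((unsigned int)(NR_CPUS + 1))

/* Validation: only values at or above the threshold are NAPI IDs. */
static bool napi_id_is_valid(unsigned int id)
{
    return id >= MIN_NAPI_ID;
}

/* What to report to a caller: a sender CPU value is ignored and becomes 0. */
static unsigned int reported_napi_id(unsigned int stored)
{
    return napi_id_is_valid(stored) ? stored : 0;
}

/* Allocation: advance the counter, skipping the reserved range on wrap. */
static unsigned int next_napi_id(unsigned int prev)
{
    unsigned int next = prev + 1;

    if (next < MIN_NAPI_ID)
        next = MIN_NAPI_ID;
    return next;
}
```

Using the same constant on both sides means an ID handed out by the allocator can never be mistaken for a CPU number by the validator.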
    • Merge branch 'mlx5-xdp-perf-optimizations' · dcb421f4
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      Mellanox mlx5e XDP performance optimization
      
      This series provides some performance optimizations for mlx5e
      driver, especially for XDP TX flows.
      
      1st patch is a simple change of rmb to dma_rmb in CQE fetch routine
      which shows a huge gain for both RX and TX packet rates.
      
      2nd patch removes write combining logic from the driver TX handler
      and simplifies the TX logic while improving TX CPU utilization.
      
      All other patches combined provide some refactoring to the driver TX
      flows to allow some significant XDP TX improvements.
      
      More details and performance numbers per patch can be found in each patch
      commit message compared to the preceding patch.
      
      Overall performance improvements:
        System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
      
      Test case                   Baseline      Now      improvement
      ---------------------------------------------------------------
      TX packets (24 threads)     45Mpps        54Mpps      20%
      TC stack Drop (1 core)      3.45Mpps      3.6Mpps     5%
      XDP Drop      (1 core)      14Mpps        16.9Mpps    20%
      XDP TX        (1 core)      10.4Mpps      13.7Mpps    31%
      ====================
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Different SQ types · 31391048
      Saeed Mahameed authored
      Different SQ types (tx, xdp, ico) are growing apart, so we separate them
      and remove unwanted parts in each one of them, to simplify data path and
      utilize data cache.
      
      Remove DB union from SQ structures since it is not needed anymore as we
      now have different SQ data type for each SQ.
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>