1. 15 Apr, 2023 4 commits
    • Merge branch 'page_pool-allow-caching-from-safely-localized-napi' · e61caf04
      Jakub Kicinski authored
      Jakub Kicinski says:
      
      ====================
      page_pool: allow caching from safely localized NAPI
      
      I went back to the explicit "are we in NAPI" method, mostly
      because I don't like having both around :( (even though I maintain
      that in_softirq() && !in_hardirq() is just as safe, since softirqs
      do not nest).
      
      Still returning the skbs to a CPU, though, not to the NAPI
      instance. I reckon we could create a small refcounted struct per
      NAPI instance which would allow sockets and other users to hold a
      persistent and safe reference. But that's a bigger change, and I
      get 90+% recycling through the cache with just these patches (for
      RR and streaming tests with 100% CPU use it's almost 100%).
      
      Some numbers for a streaming test with 100% CPU use (from the
      previous version, but really the two perform the same):
      
                                 HW-GRO                page=page
                              before       after      before       after
      recycle:
        cached:                    0   138669686           0   150197505
        cache_full:                0      223391           0       74582
        ring:              138551933     9997191   149299454           0
        ring_full:                 0         488        3154      127590
        released_refcnt:           0           0           0           0

      alloc:
        fast:              136491361   148615710   146969587   150322859
        slow:                   1772        1799         144         105
        slow_high_order:           0           0           0           0
        empty:                  1772        1799         144         105
        refill:              2165245      156302     2332880        2128
        waive:                     0           0           0           0
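
      As a rough cross-check of the "90+%"/"almost 100%" figures, computed
      from the "after" columns above, the share of recycled pages that went
      via the lockless cache works out to:

      \[
      \text{HW-GRO: } \frac{138669686}{138669686 + 9997191} \approx 93.3\%,
      \qquad
      \text{page=page: } \frac{150197505}{150197505 + 0} = 100\%.
      \]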
      
      v1: https://lore.kernel.org/all/20230411201800.596103-1-kuba@kernel.org/
      rfcv2: https://lore.kernel.org/all/20230405232100.103392-1-kuba@kernel.org/
      ====================
      
      Link: https://lore.kernel.org/r/20230413042605.895677-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bnxt: hook NAPIs to page pools · 294e39e0
      Jakub Kicinski authored
      bnxt has a 1:1 mapping of page pools and NAPIs, so it's safe
      to hook them up together.
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Tested-by: Dragos Tatulea <dtatulea@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
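
      As an illustration of that 1:1 wiring (a sketch, not the actual bnxt
      diff: the driver-side struct and function names below are made up,
      and only the napi member of struct page_pool_params comes from this
      series), a driver with one page pool per NAPI might do roughly:

        #include <linux/netdevice.h>
        #include <net/page_pool.h>

        struct my_rx_ring {
                struct napi_struct napi;        /* NAPI that polls this ring */
                struct page_pool *page_pool;    /* pool feeding this ring's buffers */
        };

        static int my_ring_create_page_pool(struct device *dev,
                                            struct my_rx_ring *ring,
                                            unsigned int ring_size, int nid)
        {
                struct page_pool_params pp = {
                        .pool_size = ring_size,
                        .nid       = nid,
                        .dev       = dev,
                        /* Safe only because exactly one NAPI uses this pool. */
                        .napi      = &ring->napi,
                };
                struct page_pool *pool = page_pool_create(&pp);

                if (IS_ERR(pool))
                        return PTR_ERR(pool);

                ring->page_pool = pool;
                return 0;
        }
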
    • page_pool: allow caching from safely localized NAPI · 8c48eea3
      Jakub Kicinski authored
      Recent patches to mlx5 mentioned a regression when moving from
      a driver-local page pool to only using the generic page pool code.
      Page pool has two recycling paths: (1) the direct one, which runs
      in safe NAPI context (basically the consumer context, so producing
      can be lockless); and (2) via a ptr_ring, which takes a spin lock
      because the freeing can happen from any CPU and producer and
      consumer may run concurrently.
      
      Since the page pool code was added, Eric introduced a revised version
      of deferred skb freeing. TCP skbs are now usually returned to the CPU
      which allocated them, and freed in softirq context. This places the
      freeing (producing of pages back to the pool) enticingly close to
      the allocation (consumer).
      
      If we can prove that we're freeing in the same softirq context in which
      the consumer NAPI will run - lockless use of the cache is perfectly fine,
      no need for the lock.
      
      Let drivers link the page pool to a NAPI instance. If the NAPI instance
      is scheduled on the same CPU on which we're freeing - place the pages
      in the direct cache.
      
      With that and patched bnxt (XDP enabled to engage the page pool, sigh,
      bnxt really needs page pool work :() I see a 2.6% perf boost with
      a TCP stream test (app on a different physical core than softirq).
      
      The CPU use of relevant functions decreases as expected:
      
        page_pool_refill_alloc_cache   1.17% -> 0%
        _raw_spin_lock                 2.41% -> 0.98%
      
      Only consider the lockless path to be safe when NAPI is scheduled
      - in practice this should cover the majority, if not all, of
      steady-state workloads. It's usually the NAPI kicking in that
      causes the skb flush.

      The main case we miss out on is when the application runs on the
      same CPU as NAPI. In that case we don't use the deferred skb free
      path.
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Tested-by: Dragos Tatulea <dtatulea@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
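
      Condensed, the localization test described above amounts to
      something like this (a sketch, not the verbatim kernel code;
      pool->p.napi is the link provided by the driver and
      napi->list_owner records which CPU the NAPI instance is currently
      scheduled on):

        #include <linux/netdevice.h>
        #include <net/page_pool.h>

        /* Lockless direct recycling is allowed only when we free from a normal
         * NAPI/softirq context, the pool is linked to a NAPI instance, and that
         * NAPI is scheduled on the CPU we are currently running on.
         */
        static bool page_pool_napi_local(const struct page_pool *pool, bool napi_safe)
        {
                const struct napi_struct *napi = READ_ONCE(pool->p.napi);

                return napi_safe && napi &&
                       READ_ONCE(napi->list_owner) == smp_processor_id();
        }
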
    • net: skb: plumb napi state thru skb freeing paths · b07a2d97
      Jakub Kicinski authored
      We maintain a NAPI-local cache of skbs which is fed by
      napi_consume_skb(). Going forward we will also try to cache head
      and data pages. Plumb the "are we in a normal NAPI context"
      information deeper into the freeing path, all the way to
      skb_release_data() and skb_free_head()/skb_pp_recycle(). The
      "not a normal NAPI context" case comes from netpoll, which passes
      a budget of 0 to reap Tx completions without performing any Rx.
      
      Use "bool napi_safe" rather than a bare "int budget"; the further
      we get from NAPI, the more confusing the budget argument becomes
      (particularly whether 0 or MAX is the correct value to pass in
      when not in NAPI).
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Tested-by: Dragos Tatulea <dtatulea@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
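
      In outline the plumbing looks like this (a skeleton only: bodies,
      intermediate helpers and unrelated arguments are elided, so the
      call chain is simplified compared to net/core/skbuff.c):

        #include <linux/skbuff.h>

        /* The int budget is turned into a bool once, at the NAPI boundary,
         * and only the bool travels further down the freeing path.
         */
        static void skb_free_head(struct sk_buff *skb, bool napi_safe)
        {
                /* In the real code napi_safe is handed on to skb_pp_recycle()
                 * for page-pool backed heads; elided in this sketch.
                 */
        }

        static void skb_release_data(struct sk_buff *skb,
                                     enum skb_drop_reason reason, bool napi_safe)
        {
                skb_free_head(skb, napi_safe);
        }

        void napi_consume_skb(struct sk_buff *skb, int budget)
        {
                /* A budget of 0 means netpoll reaping Tx completions outside a
                 * real NAPI poll, so the lockless NAPI caches must not be used.
                 */
                bool napi_safe = budget != 0;

                skb_release_data(skb, SKB_CONSUMED, napi_safe);
        }
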
  2. 14 Apr, 2023 36 commits