1. 05 Dec, 2023 15 commits
  2. 04 Dec, 2023 24 commits
  3. 02 Dec, 2023 1 commit
    • Merge branch 'net-cacheline-optimizations' · 8470e436
      David S. Miller authored
      Coco Li says:
      
      ====================
      Analyze and Reorganize core Networking Structs to optimize cacheline consumption
      
      Currently, variable-heavy structs in the networking stack are organized
      chronologically, logically and sometimes by cacheline access.
      
      This patch series attempts to reorganize the core networking stack
      variables to minimize cacheline consumption during the phase of data
      transfer. Specifically, we looked at the TCP/IP stack and the fast
      path definition in TCP.
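The idea of grouping fast-path variables can be illustrated with a small layout experiment. This is a minimal sketch, not code from the patch series: the struct names, the field names, and the 64-byte cache line size are assumptions chosen for illustration, and each struct is assumed to start on a cache line boundary. It uses Python's `ctypes` to count how many cache lines a set of hot fields touches before and after they are grouped.

```python
import ctypes

CACHELINE = 64  # bytes; assumed cache line size (typical for x86-64)


def cachelines_touched(struct_type, hot_fields):
    """Return the set of cache line indices covered by the named fields.

    Offsets are relative to the start of the struct, so the result is only
    meaningful if the struct itself is cache line aligned (an assumption here).
    """
    lines = set()
    for name in hot_fields:
        field = getattr(struct_type, name)  # ctypes field descriptor
        first = field.offset // CACHELINE
        last = (field.offset + field.size - 1) // CACHELINE
        lines.update(range(first, last + 1))
    return lines


class Scattered(ctypes.Structure):
    # Hypothetical layout: hot fields interleaved with cold ones.
    _fields_ = [
        ("hot_a", ctypes.c_uint64),
        ("cold_1", ctypes.c_uint8 * 64),
        ("hot_b", ctypes.c_uint64),
        ("cold_2", ctypes.c_uint8 * 64),
        ("hot_c", ctypes.c_uint64),
    ]


class Grouped(ctypes.Structure):
    # Same fields, with the hot ones placed next to each other.
    _fields_ = [
        ("hot_a", ctypes.c_uint64),
        ("hot_b", ctypes.c_uint64),
        ("hot_c", ctypes.c_uint64),
        ("cold_1", ctypes.c_uint8 * 64),
        ("cold_2", ctypes.c_uint8 * 64),
    ]


HOT = ["hot_a", "hot_b", "hot_c"]

if __name__ == "__main__":
    print(len(cachelines_touched(Scattered, HOT)))  # prints 3
    print(len(cachelines_touched(Grouped, HOT)))    # prints 1
```

In this toy layout, moving the three hot fields together reduces the number of cache lines they touch from three to one. The real structs are far larger, so the actual savings depend on the specific fields and access patterns.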
      
      For documentation purposes, we also added new files for each core data
      structure we considered, although not all ended up being modified due
      to the number of cachelines they already span in the fast path. In
      the documentation, we recorded all variables we identified on the
      fast path and the reasons for including them. We also hope that in
      the future, when variables are added or modified, the document can be
      referred to and updated to reflect the latest variable organization.
      
      Tested:
      Our tests were run with neper tcp_rr using TCP traffic. The tests use $cpu
      threads and a variable number of flows (see below).

      Tests were run on 6.5-rc1.
      
      Efficiency is computed as CPU seconds / throughput (one tcp_rr round trip).
      The following results show the efficiency delta before and after the patch
      series is applied; a negative percentage means fewer CPU seconds per round
      trip with the patches.
      
      On AMD platforms with 100Gb/s NIC and 256MB L3 cache:
      IPv4
      Flows   with patches    clean kernel    Percent reduction
      30k     0.0001736538065 0.0002741191042 -36.65%
      20k     0.0001583661752 0.0002712559158 -41.62%
      10k     0.0001639148817 0.0002951800751 -44.47%
      5k      0.0001859683866 0.0003320642536 -44.00%
      1k      0.0002035190546 0.0003152056382 -35.43%
      
      IPv6
      Flows   with patches    clean kernel    Percent reduction
      30k     0.000202535503  0.0003275329163 -38.16%
      20k     0.0002020654777 0.0003411304786 -40.77%
      10k     0.0002122427035 0.0003803674705 -44.20%
      5k      0.0002348776729 0.0004030403953 -41.72%
      1k      0.0002237384583 0.0002813646157 -20.48%
      
      On Intel platforms with 200Gb/s NIC and 105MB L3 cache:
      IPv6
      Flows   with patches    clean kernel    Percent reduction
      30k     0.0006296537873 0.0006370427753 -1.16%
      20k     0.0003451029365 0.0003628016076 -4.88%
      10k     0.0003187646958 0.0003346835645 -4.76%
      5k      0.0002954676348 0.000311807592  -5.24%
      1k      0.0001909169342 0.0001848069709 3.31%
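The percent column in the tables above can be reproduced from the two efficiency values in each row. The sketch below assumes the column is the relative change of the patched efficiency against the clean kernel; the function names `efficiency` and `percent_change` are hypothetical helpers introduced for illustration and are not part of neper or the patch series.

```python
def efficiency(cpu_seconds, round_trips):
    """CPU seconds spent per tcp_rr round trip (lower is better)."""
    return cpu_seconds / round_trips


def percent_change(with_patches, clean):
    """Relative change of the patched efficiency versus the clean kernel."""
    return (with_patches - clean) / clean * 100.0


if __name__ == "__main__":
    # AMD, IPv4, 30k flows (values taken from the table above)
    delta = percent_change(0.0001736538065, 0.0002741191042)
    print(f"{delta:.2f}%")  # prints -36.65%
```

A negative result means the patched kernel spent fewer CPU seconds per round trip. The positive value in the Intel 1k row indicates a slight increase for that configuration.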
      
      v8 changes:
      1. Update net_device_read_txrx cache group maximum
      2. Update MAINTAINERS for the documentation
      3. Skip __cache_group variables in scripts/kernel-doc
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>