    net: mana: add a function to spread IRQs per CPUs · 91bfe210
    Yury Norov authored
    Souradeep found that the driver performs better when IRQs are
    spread across CPUs according to the following heuristics:
    
    1. No more than one IRQ per CPU, if possible;
    2. NUMA locality is the second priority;
    3. Sibling dislocality is the last priority.
    
    Let's consider this topology:
    
    Node            0               1
    Core        0       1       2       3
    CPU       0   1   2   3   4   5   6   7
    
    The most performant IRQ distribution based on the above topology
    and heuristics may look like this:
    
    IRQ     Node    Core    CPUs
    0       0       0       0-1
    1       0       1       2-3
    2       0       0       0-1
    3       0       1       2-3
    4       1       2       4-5
    5       1       3       6-7
    6       1       2       4-5
    7       1       3       6-7
    
    The irq_setup() routine introduced in this patch leverages the
    for_each_numa_hop_mask() iterator and assigns IRQs to sibling groups
    as described above.
    
    According to [1], with NUMA-aware but sibling-ignorant IRQ distribution
    based on cpumask_local_spread(), the performance test results look like this:
    
    ./ntttcp -r -m 16
    NTTTCP for Linux 1.4.0
    ---------------------------------------------------------
    08:05:20 INFO: 17 threads created
    08:05:28 INFO: Network activity progressing...
    08:06:28 INFO: Test run completed.
    08:06:28 INFO: Test cycle finished.
    08:06:28 INFO: #####  Totals:  #####
    08:06:28 INFO: test duration    :60.00 seconds
    08:06:28 INFO: total bytes      :630292053310
    08:06:28 INFO:   throughput     :84.04Gbps
    08:06:28 INFO:   retrans segs   :4
    08:06:28 INFO: cpu cores        :192
    08:06:28 INFO:   cpu speed      :3799.725MHz
    08:06:28 INFO:   user           :0.05%
    08:06:28 INFO:   system         :1.60%
    08:06:28 INFO:   idle           :96.41%
    08:06:28 INFO:   iowait         :0.00%
    08:06:28 INFO:   softirq        :1.94%
    08:06:28 INFO:   cycles/byte    :2.50
    08:06:28 INFO: cpu busy (all)   :534.41%
    
    With NUMA- and sibling-aware IRQ distribution, the same test achieves
    about 18% higher throughput (98.93 Gbps vs. 84.04 Gbps):
    
    ./ntttcp -r -m 16
    NTTTCP for Linux 1.4.0
    ---------------------------------------------------------
    08:08:51 INFO: 17 threads created
    08:08:56 INFO: Network activity progressing...
    08:09:56 INFO: Test run completed.
    08:09:56 INFO: Test cycle finished.
    08:09:56 INFO: #####  Totals:  #####
    08:09:56 INFO: test duration    :60.00 seconds
    08:09:56 INFO: total bytes      :741966608384
    08:09:56 INFO:   throughput     :98.93Gbps
    08:09:56 INFO:   retrans segs   :6
    08:09:56 INFO: cpu cores        :192
    08:09:56 INFO:   cpu speed      :3799.791MHz
    08:09:56 INFO:   user           :0.06%
    08:09:56 INFO:   system         :1.81%
    08:09:56 INFO:   idle           :96.18%
    08:09:56 INFO:   iowait         :0.00%
    08:09:56 INFO:   softirq        :1.95%
    08:09:56 INFO:   cycles/byte    :2.25
    08:09:56 INFO: cpu busy (all)   :569.22%
    
    [1] https://lore.kernel.org/all/20231211063726.GA4977@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/
    
    Signed-off-by: Yury Norov <yury.norov@gmail.com>
    Co-developed-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>