    net/smc: Allow virtually contiguous sndbufs or RMBs for SMC-R · b8d19945
    Wen Gu authored
    On long-running enterprise production servers, high-order contiguous
    memory pages are usually very rare and in most cases we can only get
    fragmented pages.
    
    When replacing TCP with SMC-R in such production scenarios, attempting
    to allocate high-order physically contiguous sndbufs and RMBs may result
    in frequent memory compaction, which can cause unexpected hangs and
    further stability risks.
    
    This patch therefore allows an SMC-R link group to use virtually
    contiguous sndbufs and RMBs, avoiding the issues described above.
    Whether physically or virtually contiguous buffers are used is
    controlled by the sysctl smcr_buf_type.
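
As a rough illustration of how the sysctl might be inspected and changed, here is a minimal sketch. The procfs path `/proc/sys/net/smc/smcr_buf_type`, the value `1` meaning "virtually contiguous", and the helper name `show_smcr_buf_type` are assumptions not stated in this commit message; check them against Documentation/networking/smc-sysctl.rst in your kernel tree before relying on them.

```shell
#!/bin/sh
# Sketch only: the path below and the value meanings are assumptions to
# verify against Documentation/networking/smc-sysctl.rst.
SMCR_BUF_TYPE=/proc/sys/net/smc/smcr_buf_type

# Hypothetical helper: print the current setting, or explain why it is missing.
show_smcr_buf_type() {
    if [ -r "$SMCR_BUF_TYPE" ]; then
        echo "smcr_buf_type=$(cat "$SMCR_BUF_TYPE")"
    else
        echo "smcr_buf_type unavailable (smc module not loaded or kernel lacks this patch)"
    fi
}

show_smcr_buf_type

# To request virtually contiguous buffers (needs root; assumed value 1):
#   sysctl -w net.smc.smcr_buf_type=1
```

The helper only reads the setting, so it is safe to run on a machine without SMC support. The write is left as a comment because it changes system behavior.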
    
    Note that using virtually contiguous buffers introduces a small,
    acceptable performance regression, which falls into two parts:
    
    1) Regression in the data path, caused by the additional address
       translation of the sndbuf by the RNIC on Tx. In general, though,
       address translation through the MTT is fast.
    
       Taking a 256KB sndbuf and RMB as an example, the qperf latency and
       bandwidth results with physically and virtually contiguous buffers
       compare as follows:
    
    - client:
      smc_run taskset -c <cpu> qperf <server> -oo msg_size:1:64K:*2\
      -t 5 -vu tcp_{bw|lat}
    - server:
      smc_run taskset -c <cpu> qperf
    
       [latency]
       msgsize              tcp            smcr        smcr-use-virt-buf
       1               11.17 us         7.56 us         7.51 us (-0.67%)
       2               10.65 us         7.74 us         7.56 us (-2.31%)
       4               11.11 us         7.52 us         7.59 us ( 0.84%)
       8               10.83 us         7.55 us         7.51 us (-0.48%)
       16              11.21 us         7.46 us         7.51 us ( 0.71%)
       32              10.65 us         7.53 us         7.58 us ( 0.61%)
       64              10.95 us         7.74 us         7.80 us ( 0.76%)
       128             11.14 us         7.83 us         7.87 us ( 0.47%)
       256             10.97 us         7.94 us         7.92 us (-0.28%)
       512             11.23 us         7.94 us         8.20 us ( 3.25%)
       1024            11.60 us         8.12 us         8.20 us ( 0.96%)
       2048            14.04 us         8.30 us         8.51 us ( 2.49%)
       4096            16.88 us         9.13 us         9.07 us (-0.64%)
       8192            22.50 us        10.56 us        11.22 us ( 6.26%)
       16384           28.99 us        12.88 us        13.83 us ( 7.37%)
       32768           40.13 us        16.76 us        16.95 us ( 1.16%)
       65536           68.70 us        24.68 us        24.85 us ( 0.68%)

       [bandwidth]
       msgsize                tcp              smcr          smcr-use-virt-buf
       1                1.65 MB/s         1.59 MB/s         1.53 MB/s (-3.88%)
       2                3.32 MB/s         3.17 MB/s         3.08 MB/s (-2.67%)
       4                6.66 MB/s         6.33 MB/s         6.09 MB/s (-3.85%)
       8               13.67 MB/s        13.45 MB/s        11.97 MB/s (-10.99%)
       16              25.36 MB/s        27.15 MB/s        24.16 MB/s (-11.01%)
       32              48.22 MB/s        54.24 MB/s        49.41 MB/s (-8.89%)
       64             106.79 MB/s       107.32 MB/s        99.05 MB/s (-7.71%)
       128            210.21 MB/s       202.46 MB/s       201.02 MB/s (-0.71%)
       256            400.81 MB/s       416.81 MB/s       393.52 MB/s (-5.59%)
       512            746.49 MB/s       834.12 MB/s       809.99 MB/s (-2.89%)
       1024          1292.33 MB/s      1641.96 MB/s      1571.82 MB/s (-4.27%)
       2048          2007.64 MB/s      2760.44 MB/s      2717.68 MB/s (-1.55%)
       4096          2665.17 MB/s      4157.44 MB/s      4070.76 MB/s (-2.09%)
       8192          3159.72 MB/s      4361.57 MB/s      4270.65 MB/s (-2.08%)
       16384         4186.70 MB/s      4574.13 MB/s      4501.17 MB/s (-1.60%)
       32768         4093.21 MB/s      4487.42 MB/s      4322.43 MB/s (-3.68%)
       65536         4057.14 MB/s      4735.61 MB/s      4555.17 MB/s (-3.81%)
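
The percentages in parentheses appear to be the change of the smcr-use-virt-buf column relative to the smcr column. The commit message does not state this formula, so treat it as an inference. A minimal sketch that recomputes two rows from the displayed values follows (`rel_change` is a hypothetical helper name). Results can differ from the table in the last digit, presumably because the table was computed from unrounded measurements.

```shell
#!/bin/sh
# rel_change BASE NEW: percentage change of NEW relative to BASE, two decimals.
rel_change() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.2f\n", (new - base) / base * 100 }'
}

# Latency, msgsize 8192: smcr 10.56 us, smcr-use-virt-buf 11.22 us
rel_change 10.56 11.22

# Bandwidth, msgsize 8: smcr 13.45 MB/s, smcr-use-virt-buf 11.97 MB/s
rel_change 13.45 11.97
```

A positive value means higher latency in the latency table, and a negative value means lower throughput in the bandwidth table, so both signs indicate a regression for virtually contiguous buffers.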
    
    2) Regression in the buffer initialization and destruction paths,
       caused by the additional MR operations on sndbufs. Thanks to the
       link group buffer reuse mechanism, the impact of this regression
       decreases as the number of buffer reuses increases.
    
       Taking a 256KB sndbuf and RMB as an example, the latencies of some
       key SMC-R buffer-related functions, measured with bpftrace, are as
       follows:
    
       Function                         Phys-bufs           Virt-bufs
       smcr_new_buf_create()             67154 ns            79164 ns
       smc_ib_buf_map_sg()                 525 ns              928 ns
       smc_ib_get_memory_region()       162294 ns           161191 ns
       smc_wr_reg_send()                  9957 ns             9635 ns
       smc_ib_put_memory_region()       203548 ns           198374 ns
       smc_ib_buf_unmap_sg()               508 ns             1158 ns
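
To illustrate why buffer reuse shrinks this overhead, the sketch below spreads a one-time extra cost over a number of reuses. It takes the smcr_new_buf_create() difference from the table above (79164 ns - 67154 ns = 12010 ns) as the example cost. The helper name `amortized_ns` and the simple "cost divided by reuse count" model are illustrative assumptions, not something measured in this commit.

```shell
#!/bin/sh
# amortized_ns EXTRA_NS REUSES: one-time extra cost spread evenly over REUSES uses.
amortized_ns() {
    awk -v extra="$1" -v n="$2" 'BEGIN { printf "%.1f\n", extra / n }'
}

# Extra smcr_new_buf_create() time with virtually contiguous buffers,
# taken from the table above: 79164 ns - 67154 ns = 12010 ns.
for reuses in 1 10 100; do
    echo "reuses=$reuses extra_per_use_ns=$(amortized_ns 12010 "$reuses")"
done
```

Under this model the per-use overhead falls linearly with the reuse count, which matches the qualitative claim that the regression matters less on long-lived link groups.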
    
    ------------
    Test environment notes:
    1. The above tests were run on 2 VMs on the same host.
    2. The NIC is a ConnectX-4Lx, using SR-IOV and passing through 2 VFs
       to each VM respectively.
    3. The VMs' vCPUs are bound to different physical CPUs, and the bound
       physical CPUs are isolated with the `isolcpus=xxx` cmdline option.
    4. The NICs' queue count is set to 1.

    Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>