    vsock/virtio: avoid queuing packets when intermediate queue is empty · efcd71af
    Luigi Leonardi authored
    When the driver needs to send new packets to the device, it always
    queues the new sk_buffs into an intermediate queue (send_pkt_queue)
    and schedules a worker (send_pkt_work) to then queue them into the
    virtqueue exposed to the device.
    
    This increases the chance of batching, but also introduces a lot of
    latency into the communication. So we can optimize this path by
    adding a fast path to be taken when there is no element in the
    intermediate queue, there is space available in the virtqueue,
    and no other process is sending packets (tx_lock not held).
    
    The following benchmarks were run to check improvements in latency and
    throughput. The test bed is a host with Intel i7-10700KF CPU @ 3.80GHz
    and L1 guest running on QEMU/KVM with vhost process and all vCPUs
    pinned individually to pCPUs.
    
    - Latency
       Tool: Fio version 3.37-56
       Mode: pingpong (h-g-h)
       Test runs: 50
       Runtime-per-test: 50s
       Type: SOCK_STREAM
    
    In the following fio benchmark (pingpong mode) the host sends
    a payload to the guest and waits for the same payload back.
    
    The fio process was pinned in both the host and the guest.
    
    Before: Linux 6.9.8
    
    Payload 64B:
    
    	1st perc.	overall		99th perc.
    Before	12.91		16.78		42.24		us
    After	9.77		13.57		39.17		us
    
    Payload 512B:
    
    	1st perc.	overall		99th perc.
    Before	13.35		17.35		41.52		us
    After	10.25		14.11		39.58		us
    
    Payload 4K:
    
    	1st perc.	overall		99th perc.
    Before	14.71		19.87		41.52		us
    After	10.51		14.96		40.81		us
    
    - Throughput
       Tool: iperf-vsock
    
    The size represents the buffer length (-l) to read/write.
    P represents the number of parallel streams.
    
    P=1
    	4K	64K	128K
    Before	6.87	29.3	29.5 Gb/s
    After	10.5	39.4	39.9 Gb/s
    
    P=2
    	4K	64K	128K
    Before	10.5	32.8	33.2 Gb/s
    After	17.8	47.7	48.5 Gb/s
    
    P=4
    	4K	64K	128K
    Before	12.7	33.6	34.2 Gb/s
    After	16.9	48.1	50.5 Gb/s
    
    The performance improvement is due to this optimization: I used
    an eBPF kretprobe on virtio_transport_send_skb to check that
    each packet was sent directly to the virtqueue.

    Co-developed-by: Marco Pinna <marco.pinn95@gmail.com>
    Signed-off-by: Marco Pinna <marco.pinn95@gmail.com>
    Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com>
    Message-Id: <20240730-pinna-v4-2-5c9179164db5@outlook.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
virtio_transport.c 23.3 KB