    tcp: Use per-vma locking for receive zerocopy · 7a7f0946
    Author: Arjun Roy
    Per-VMA locking allows us to lock a struct vm_area_struct without
    taking the process-wide mmap lock in read mode.
    
    Consider a workload where the mmap lock is repeatedly taken in write
    mode. In this scenario, zerocopy receives are blocked every time the
    write lock is held, even though, in principle, the memory ranges used
    by TCP are not touched by the operations that need the mmap write
    lock. This degrades performance.
    
    Now consider another workload where the mmap lock is never taken in
    write mode, but many TCP connections are concurrently receiving with
    receive zerocopy. These connections all take the mmap lock in read
    mode, which still induces heavy contention and many atomic operations
    on this process-wide lock. The result is additional CPU overhead from
    contending on the lock's cache line.
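
    The cache-line point can be made concrete with a hypothetical counting
    model. It runs in a single thread, so it shows where atomic operations
    land, not how long they take. It assumes that each connection's
    zerocopy mapping is its own VMA. It also assumes that a read lock plus
    unlock costs two atomic read-modify-writes on the lock word. The names
    lock_word, receive_all_global, and receive_all_per_vma are illustrative
    only.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CONNS 100 /* matches the connection count in this commit's test */

/* Hypothetical counting model: one atomic word per lock, plus a plain
 * counter of how many read-modify-writes landed on that word. */
struct lock_word {
	atomic_long readers;
	long rmw_ops;
};

static void read_lock_unlock(struct lock_word *w)
{
	atomic_fetch_add(&w->readers, 1);             /* read lock */
	long prev = atomic_fetch_sub(&w->readers, 1); /* read unlock */
	assert(prev >= 1);
	w->rmw_ops += 2;
}

/* Process-wide lock: every receive on every connection hits one word. */
static void receive_all_global(struct lock_word *mmap_lock, int recvs)
{
	for (int c = 0; c < NR_CONNS; c++)
		for (int r = 0; r < recvs; r++)
			read_lock_unlock(mmap_lock);
}

/* Per-VMA locks: each connection's receives hit only its own VMA's word. */
static void receive_all_per_vma(struct lock_word vma_locks[NR_CONNS], int recvs)
{
	for (int c = 0; c < NR_CONNS; c++)
		for (int r = 0; r < recvs; r++)
			read_lock_unlock(&vma_locks[c]);
}

/* Largest number of read-modify-writes absorbed by any single word. */
static long hottest_word(const struct lock_word *words, int n)
{
	long max = 0;

	for (int i = 0; i < n; i++)
		if (words[i].rmw_ops > max)
			max = words[i].rmw_ops;
	return max;
}
```

    Suppose each of the 100 connections performs 10 receives. The single
    process-wide word then absorbs 2000 read-modify-writes. The busiest
    per-VMA word absorbs only 20.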
    
    Per-VMA locking avoids both of these problems.
    
    As a test, I ran an RPC-style request/response workload over 100
    simultaneous TCP connections, with 4KB payloads and receive zerocopy
    enabled. I measured perf cycles within the
    find_tcp_vma/mmap_read_lock/mmap_read_unlock code path, with and
    without per-VMA locking enabled.
    
    With process-wide mmap semaphore read locking, about 1% of measured
    perf cycles were spent in this path. With per-VMA locking, this value
    dropped to about 0.45%.
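
    Both figures can be read as shares of each run's total measured
    cycles. This is an assumption, since the two runs' totals may differ.
    On that reading, the change works out roughly as follows:

```latex
\frac{1\% - 0.45\%}{1\%} = 0.55 \;(\approx 55\%\ \text{fewer cycles in the path}),
\qquad
\frac{1\%}{0.45\%} \approx 2.2,
\qquad
1\% - 0.45\% = 0.55\ \text{percentage points of measured cycles}
```
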

    Signed-off-by: Arjun Roy <arjunroy@google.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>