    uprobes: perform lockless SRCU-protected uprobes_tree lookup · cd7bdd9d
    Andrii Nakryiko authored
    Another big bottleneck to scalability is uprobe_treelock, which is taken
    in a very hot path in handle_swbp(). Now that uprobes are SRCU-protected,
    take advantage of that and make the uprobes_tree RB-tree lookup lockless.
    
    For the RCU-protected lockless RB-tree lookup to be correct, we need to
    take into account that such a lookup can return false negatives if there
    are parallel RB-tree modifications (rotations) going on. We use a seqcount
    to detect whether the RB-tree changed: if we find nothing and the RB-tree
    was modified in between, we just retry. If a uprobe was found, then it's
    guaranteed to be a correct lookup.
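
    The retry rule described above can be sketched in user-space C11. This
    is a minimal illustration, not the code from this patch: struct node,
    tree_insert(), tree_find_once() and tree_find() are hypothetical names,
    the tree is a plain unbalanced binary search tree keyed by a single
    long, and writers are assumed to be serialized by an external lock. An
    insert-only tree has no rotations, so the sketch models only the
    "miss plus changed sequence means retry" protocol.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a uprobe, keyed by a single value. */
struct node {
    long key;
    _Atomic(struct node *) left, right;
};

static _Atomic(struct node *) tree_root;
static atomic_uint tree_seq;    /* even: stable, odd: writer in progress */

/* Writer side. Callers are assumed to hold an external lock. */
static void tree_insert(struct node *n)
{
    unsigned int seq = atomic_load_explicit(&tree_seq, memory_order_relaxed);
    _Atomic(struct node *) *link = &tree_root;
    struct node *cur;

    assert((seq & 1) == 0);     /* no other writer may be active */
    atomic_init(&n->left, NULL);
    atomic_init(&n->right, NULL);

    /* Mark the tree as "being modified" before touching any link. */
    atomic_store_explicit(&tree_seq, seq + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);

    while ((cur = atomic_load_explicit(link, memory_order_relaxed)) != NULL)
        link = n->key < cur->key ? &cur->left : &cur->right;
    atomic_store_explicit(link, n, memory_order_release);

    /* Publish the new, stable generation. */
    atomic_store_explicit(&tree_seq, seq + 2, memory_order_release);
}

/* One lockless descent; may miss a node if the tree changes under us. */
static struct node *tree_find_once(long key)
{
    struct node *cur = atomic_load_explicit(&tree_root, memory_order_acquire);

    while (cur && cur->key != key)
        cur = atomic_load_explicit(key < cur->key ? &cur->left : &cur->right,
                                   memory_order_acquire);
    return cur;
}

/* Reader side: a hit is trusted, a miss is trusted only if nothing changed. */
static struct node *tree_find(long key)
{
    unsigned int seq;
    struct node *n;

    do {
        seq = atomic_load_explicit(&tree_seq, memory_order_acquire);
        n = tree_find_once(key);
        if (n)
            return n;           /* found: no need to validate the sequence */
        atomic_thread_fence(memory_order_acquire);
    } while ((seq & 1) ||
             seq != atomic_load_explicit(&tree_seq, memory_order_relaxed));

    return NULL;                /* stable miss: the key is really absent */
}
```

    Only the miss path pays for validating the sequence counter. A
    successful lookup returns immediately, which is what keeps the hot path
    cheap.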
    
    With all the lock-avoiding changes done, we get a pretty decent
    improvement in performance and scalability of uprobes with the number of
    CPUs, even though we are still nowhere near linear scalability. This is
    partly due to SRCU not scaling very well with the number of CPUs on the
    particular hardware used for testing (80-core Intel Xeon Gold 6138 CPU
    @ 2.00GHz), and partly due to the remaining mmap_lock, which is currently
    taken to resolve the interrupt address to inode+offset and then to the
    uprobe instance. And, of course, uretprobes still need similar RCU
    treatment to avoid refcounting in the hot path, which will be addressed
    in follow-up patches.
    
    Nevertheless, the improvement is good. We used BPF selftest-based
    uprobe-nop and uretprobe-nop benchmarks to get the below numbers,
    varying number of CPUs on which uprobes and uretprobes are triggered.
    
    BASELINE
    ========
    uprobe-nop      ( 1 cpus):    3.032 ± 0.023M/s  (  3.032M/s/cpu)
    uprobe-nop      ( 2 cpus):    3.452 ± 0.005M/s  (  1.726M/s/cpu)
    uprobe-nop      ( 4 cpus):    3.663 ± 0.005M/s  (  0.916M/s/cpu)
    uprobe-nop      ( 8 cpus):    3.718 ± 0.038M/s  (  0.465M/s/cpu)
    uprobe-nop      (16 cpus):    3.344 ± 0.008M/s  (  0.209M/s/cpu)
    uprobe-nop      (32 cpus):    2.288 ± 0.021M/s  (  0.071M/s/cpu)
    uprobe-nop      (64 cpus):    3.205 ± 0.004M/s  (  0.050M/s/cpu)
    
    uretprobe-nop   ( 1 cpus):    1.979 ± 0.005M/s  (  1.979M/s/cpu)
    uretprobe-nop   ( 2 cpus):    2.361 ± 0.005M/s  (  1.180M/s/cpu)
    uretprobe-nop   ( 4 cpus):    2.309 ± 0.002M/s  (  0.577M/s/cpu)
    uretprobe-nop   ( 8 cpus):    2.253 ± 0.001M/s  (  0.282M/s/cpu)
    uretprobe-nop   (16 cpus):    2.007 ± 0.000M/s  (  0.125M/s/cpu)
    uretprobe-nop   (32 cpus):    1.624 ± 0.003M/s  (  0.051M/s/cpu)
    uretprobe-nop   (64 cpus):    2.149 ± 0.001M/s  (  0.034M/s/cpu)
    
    SRCU CHANGES
    ============
    uprobe-nop      ( 1 cpus):    3.276 ± 0.005M/s  (  3.276M/s/cpu)
    uprobe-nop      ( 2 cpus):    4.125 ± 0.002M/s  (  2.063M/s/cpu)
    uprobe-nop      ( 4 cpus):    7.713 ± 0.002M/s  (  1.928M/s/cpu)
    uprobe-nop      ( 8 cpus):    8.097 ± 0.006M/s  (  1.012M/s/cpu)
    uprobe-nop      (16 cpus):    6.501 ± 0.056M/s  (  0.406M/s/cpu)
    uprobe-nop      (32 cpus):    4.398 ± 0.084M/s  (  0.137M/s/cpu)
    uprobe-nop      (64 cpus):    6.452 ± 0.000M/s  (  0.101M/s/cpu)
    
    uretprobe-nop   ( 1 cpus):    2.055 ± 0.001M/s  (  2.055M/s/cpu)
    uretprobe-nop   ( 2 cpus):    2.677 ± 0.000M/s  (  1.339M/s/cpu)
    uretprobe-nop   ( 4 cpus):    4.561 ± 0.003M/s  (  1.140M/s/cpu)
    uretprobe-nop   ( 8 cpus):    5.291 ± 0.002M/s  (  0.661M/s/cpu)
    uretprobe-nop   (16 cpus):    5.065 ± 0.019M/s  (  0.317M/s/cpu)
    uretprobe-nop   (32 cpus):    3.622 ± 0.003M/s  (  0.113M/s/cpu)
    uretprobe-nop   (64 cpus):    3.723 ± 0.002M/s  (  0.058M/s/cpu)
    
    Peak throughput increased from 3.7 mln/s (uprobe triggerings) up to about
    8 mln/s. For uretprobes it's a bit more modest, with a bump from 2.4 mln/s
    to 5 mln/s.
    Suggested-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Oleg Nesterov <oleg@redhat.com>
    Link: https://lore.kernel.org/r/20240903174603.3554182-8-andrii@kernel.org