    uprobe: Add uretprobe syscall to speed up return probe · ff474a78
    Jiri Olsa authored
    Add a uretprobe syscall, used instead of a breakpoint trap, to speed up
    return probes.
    
    At the moment the uretprobe setup/path is:
    
      - install entry uprobe
    
      - when the uprobe is hit, it overwrites the probed function's return
        address on the stack with the address of a trampoline that contains
        a breakpoint instruction

      - the breakpoint trap code handles execution of the uretprobe
        consumers and jumps back to the original return address
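
The flow above can be pictured with a small user-space model. This is only a sketch, not kernel code: `toy_task`, `toy_entry_probe_hit`, `toy_trampoline_hit` and `TRAMPOLINE_ADDR` are hypothetical names, and the model tracks a single pending return address, ignoring nested calls.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: every name here is hypothetical, none is a kernel symbol. */
#define TRAMPOLINE_ADDR 0x7ffff000ULL /* pretend address of the trampoline */
#define TOY_STACK_SLOTS 8

struct toy_task {
    uint64_t stack[TOY_STACK_SLOTS]; /* simulated user stack */
    int sp;                          /* slot holding the return address */
    uint64_t saved_ret;              /* original return address */
    int consumer_runs;               /* times the return consumers ran */
};

/* Entry uprobe hit: remember the real return address, then redirect it. */
static void toy_entry_probe_hit(struct toy_task *t)
{
    assert(t->sp >= 0 && t->sp < TOY_STACK_SLOTS);
    t->saved_ret = t->stack[t->sp];
    t->stack[t->sp] = TRAMPOLINE_ADDR;
}

/*
 * The probed function returned into the trampoline: run the consumers
 * (modelled as a counter) and report where execution should resume.
 */
static uint64_t toy_trampoline_hit(struct toy_task *t)
{
    t->consumer_runs++;
    return t->saved_ret;
}
```

Calling `toy_entry_probe_hit()` and then `toy_trampoline_hit()` on a task whose return slot holds some address yields that same address back, with the consumer counter incremented once.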
    
    This patch replaces the trampoline's breakpoint instruction with a call
    to the new uretprobe syscall. The syscall does exactly the same job as
    the trap, plus some extra work:
    
      - the syscall trampoline must save the original values of the
        rax/r11/rcx registers on the stack - rax is set to the syscall
        number, and r11/rcx are changed and used by the syscall instruction

      - the syscall code reads the original values of those registers and
        restores them in the task's pt_regs area

      - only a call from the trampoline exposed in '[uprobes]' is allowed;
        otherwise the process receives a SIGILL signal
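
The register handling above can be sketched as a user-space model. It is an illustration under stated assumptions, not the kernel implementation: `toy_cpu`, the `toy_*` helpers and `TOY_SYSCALL_NR` are hypothetical, the simulated stack grows upward for simplicity, and the '[uprobes]' caller check that raises SIGILL is not modelled. What it does rely on is that the x86-64 `syscall` instruction stores the return RIP in rcx and RFLAGS in r11.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: all names and the syscall number are placeholders. */
#define TOY_SYSCALL_NR 999ULL
#define TOY_STACK_SLOTS 8

struct toy_cpu {
    uint64_t rax, rcx, r11;          /* registers the syscall path disturbs */
    uint64_t stack[TOY_STACK_SLOTS]; /* simulated user stack */
    int sp;                          /* index of the next free slot */
};

static void toy_push(struct toy_cpu *c, uint64_t v)
{
    assert(c->sp < TOY_STACK_SLOTS);
    c->stack[c->sp++] = v;
}

/* Trampoline side: save the originals, then load the syscall number. */
static void toy_trampoline_enter(struct toy_cpu *c)
{
    toy_push(c, c->rax);
    toy_push(c, c->rcx);
    toy_push(c, c->r11);
    c->rax = TOY_SYSCALL_NR;
}

/* Effect of the syscall instruction itself on rcx and r11. */
static void toy_syscall_insn(struct toy_cpu *c, uint64_t next_rip,
                             uint64_t rflags)
{
    c->rcx = next_rip;
    c->r11 = rflags;
}

/* Kernel side: read the saved values back and restore the register state. */
static void toy_sys_uretprobe(struct toy_cpu *c)
{
    assert(c->sp >= 3);
    c->r11 = c->stack[--c->sp];
    c->rcx = c->stack[--c->sp];
    c->rax = c->stack[--c->sp];
}
```

Running the three steps in order on a `toy_cpu` leaves rax, rcx and r11 with the values they had before the trampoline was entered, which is the property the probed program depends on.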
    
    Even with the extra work, using the uretprobe syscall shows a speed
    improvement (compared to using the standard breakpoint):
    
      On Intel (11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz)
    
      current:
        uretprobe-nop  :    1.498 ± 0.000M/s
        uretprobe-push :    1.448 ± 0.001M/s
        uretprobe-ret  :    0.816 ± 0.001M/s
    
      with the fix:
        uretprobe-nop  :    1.969 ± 0.002M/s  < 31% speed up
        uretprobe-push :    1.910 ± 0.000M/s  < 31% speed up
        uretprobe-ret  :    0.934 ± 0.000M/s  < 14% speed up
    
      On AMD (AMD Ryzen 7 5700U)
    
      current:
        uretprobe-nop  :    0.778 ± 0.001M/s
        uretprobe-push :    0.744 ± 0.001M/s
        uretprobe-ret  :    0.540 ± 0.001M/s
    
      with the fix:
        uretprobe-nop  :    0.860 ± 0.001M/s  < 10% speed up
        uretprobe-push :    0.818 ± 0.001M/s  < 10% speed up
        uretprobe-ret  :    0.578 ± 0.000M/s  <  7% speed up
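
The speed-up figures are the relative change in throughput, after / before - 1. A minimal helper to recompute them from the numbers above might look like this (`speedup_percent` is a hypothetical name, not part of the benchmark):

```c
#include <assert.h>

/* Relative speed-up, in percent, between two throughput figures (M/s). */
static double speedup_percent(double before, double after)
{
    assert(before > 0.0);
    return (after / before - 1.0) * 100.0;
}
```

For example, `speedup_percent(1.498, 1.969)` reproduces the roughly 31% reported for uretprobe-nop on the Intel machine.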
    
    The performance test spawns a thread that runs a loop which triggers
    a uprobe with an attached bpf program. The program increments a
    counter, which is what the results above report.
    
    The uprobe (and uretprobe) kind (nop, push or ret in the results above)
    is determined by which instruction is patched with the breakpoint
    instruction. That also matters for uretprobes, because an entry uprobe
    is installed for each uretprobe.
    
    The performance test is part of bpf selftests:
      tools/testing/selftests/bpf/run_bench_uprobes.sh
    
    Note that at the moment the uretprobe syscall is supported only for
    native 64-bit processes; compat processes still use the standard
    breakpoint.
    
    Note that when shadow stack is enabled, the uretprobe syscall returns
    via iret, which is slower than returning via sysret but does not cause
    a shadow stack violation.
    
    Link: https://lore.kernel.org/all/20240611112158.40795-4-jolsa@kernel.org/

    Suggested-by: Andrii Nakryiko <andrii@kernel.org>
    Reviewed-by: Oleg Nesterov <oleg@redhat.com>
    Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>