    perf trace: Handle raw_syscalls:sys_enter just like the BPF_OUTPUT augmented event · b119970a
    Arnaldo Carvalho de Melo authored
    We use a PERF_COUNT_SW_BPF_OUTPUT event to output the augmented
    sys_enter payload, i.e. more than just the raw syscall args. If
    something goes wrong when handling an unfiltered syscall, we bail out
    and return 1 from the BPF program associated with
    raw_syscalls:sys_enter, meaning "don't filter this tracepoint". In that
    case what appears in the perf ring buffer isn't the BPF_OUTPUT event
    but the original raw_syscalls:sys_enter event with its normal payload.
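    The return-value convention described above can be modeled in plain
    userspace C. This is a hypothetical sketch, not the actual BPF program:
    model_sys_enter() and the record_kind names are invented for
    illustration, and it assumes the usual rule for BPF programs attached
    to tracepoints through perf, where returning 0 discards the original
    event and returning non-zero lets it reach the ring buffer.

```c
#include <stdbool.h>

/* Hypothetical model of what userspace ends up reading for one sys_enter. */
enum record_kind {
	RECORD_NONE,          /* pid filtered: nothing is recorded */
	RECORD_BPF_OUTPUT,    /* augmented payload emitted via BPF_OUTPUT */
	RECORD_RAW_SYS_ENTER, /* plain raw_syscalls:sys_enter payload */
};

struct sys_enter_model_result {
	int bpf_ret;             /* value the BPF program would return */
	enum record_kind record; /* event that shows up in the ring buffer */
};

/* Models the decision made by the program on raw_syscalls:sys_enter. */
static struct sys_enter_model_result
model_sys_enter(bool pid_filtered, bool augment_ok)
{
	struct sys_enter_model_result r;

	if (pid_filtered) {
		/* Filtered pid: drop the tracepoint, emit nothing. */
		r.bpf_ret = 0;
		r.record = RECORD_NONE;
	} else if (augment_ok) {
		/* Augmented payload already sent as a BPF_OUTPUT event,
		 * so the original tracepoint is suppressed. */
		r.bpf_ret = 0;
		r.record = RECORD_BPF_OUTPUT;
	} else {
		/* Bail out: return 1 so the unmodified tracepoint is kept. */
		r.bpf_ret = 1;
		r.record = RECORD_RAW_SYS_ENTER;
	}
	return r;
}
```

    The third branch is the path that matters for this fix: userspace then
    sees a raw_syscalls:sys_enter record and must handle it just like the
    augmented one.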
    
    Now that we're switching to bpf_tail_call + BPF_MAP_TYPE_PROG_ARRAY,
    this return-1 path will be taken in the common case, which surfaced a
    bug: raw_syscalls:sys_enter wasn't being handled by trace__sys_enter(),
    the strace-like augmenter, but by the normal generic tracepoint
    handler:
    
      (gdb) p evsel
      $2 = (struct perf_evsel *) 0xc03e40
      (gdb) p evsel->name
      $3 = 0xbc56c0 "raw_syscalls:sys_enter"
      (gdb) p ((struct perf_evsel *) 0xc03e40)->name
      $4 = 0xbc56c0 "raw_syscalls:sys_enter"
      (gdb) p ((struct perf_evsel *) 0xc03e40)->handler
      $5 = (void *) 0x495eb3 <trace__event_handler>
    
    This resulted in output like:
    
         0.027 raw_syscalls:sys_enter:NR 12 (0, 7fcfcac64c9b, 4d, 7fcfcac64c9b, 7fcfcac6ce00, 19)
         ... [continued]: brk())                = 0x563b88677000
    
    I.e. only the sys_exit tracepoint was handled properly. Since sys_enter
    went to the generic trace__event_handler(), it was printed using
    libtraceevent's formatter instead of perf trace's strace-like one.
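    The misrouting comes down to which function pointer the evsel carries.
    The sketch below is a hypothetical, heavily trimmed model:
    struct evsel_model, generic_tp_handler(), strace_sys_enter_handler()
    and setup_sys_enter() are invented stand-ins for struct perf_evsel,
    trace__event_handler(), trace__sys_enter() and the setup code, not
    perf's actual definitions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct perf_evsel: only the two fields
 * inspected in the gdb session above. */
struct evsel_model;
typedef int (*evsel_handler_t)(struct evsel_model *evsel, char *out, size_t len);

struct evsel_model {
	const char *name;
	evsel_handler_t handler;
};

/* Stand-in for the generic handler: prints the raw tracepoint. */
static int generic_tp_handler(struct evsel_model *evsel, char *out, size_t len)
{
	return snprintf(out, len, "%s:NR 12 (...)", evsel->name);
}

/* Stand-in for the strace-like handler. */
static int strace_sys_enter_handler(struct evsel_model *evsel, char *out, size_t len)
{
	(void)evsel;
	return snprintf(out, len, "brk(NULL)");
}

/* The fix, in spirit: route raw_syscalls:sys_enter to the strace-like
 * handler instead of leaving the generic one in place. */
static void setup_sys_enter(struct evsel_model *evsel)
{
	if (strcmp(evsel->name, "raw_syscalls:sys_enter") == 0)
		evsel->handler = strace_sys_enter_handler;
}

/* Each sample is dispatched through whatever handler the evsel carries. */
static int dispatch(struct evsel_model *evsel, char *out, size_t len)
{
	return evsel->handler(evsel, out, len);
}
```

    Until setup_sys_enter() runs, every sample goes through the generic
    formatter, which is the situation the gdb output captures.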
    
    Fix it by setting trace__sys_enter() as the handler for
    raw_syscalls:sys_enter and setting up its tp_field tracepoint field
    accessors.
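    A tracepoint field accessor essentially reads a value at a fixed offset
    in the raw tracepoint payload. The following is a hypothetical sketch
    of that idea, not perf's struct tp_field API. The offsets (id at 8,
    args[] at 16) are assumptions based on a typical x86_64
    raw_syscalls:sys_enter format file; real code should take them from
    the tracepoint's format description.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical offset-only field descriptor. */
struct field_model {
	size_t offset;
};

/* ASSUMED layout: 8 bytes of common_* fields, then "long id",
 * then "unsigned long args[6]". */
static const struct field_model sys_enter_id_field   = { .offset = 8 };
static const struct field_model sys_enter_args_field = { .offset = 16 };

/* Copy 8 bytes at off into *out; fail instead of reading past the end. */
static int read_u64_at(const unsigned char *raw, size_t raw_len,
		       size_t off, uint64_t *out)
{
	if (off > raw_len || raw_len - off < sizeof(*out))
		return -1;
	memcpy(out, raw + off, sizeof(*out));
	return 0;
}

/* Syscall number from a raw sys_enter payload. */
static int model_sys_enter_id(const unsigned char *raw, size_t len, uint64_t *id)
{
	return read_u64_at(raw, len, sys_enter_id_field.offset, id);
}

/* i-th syscall argument (0..5) from a raw sys_enter payload. */
static int model_sys_enter_arg(const unsigned char *raw, size_t len,
			       int i, uint64_t *arg)
{
	if (i < 0 || i >= 6)
		return -1;
	return read_u64_at(raw, len,
			   sys_enter_args_field.offset + (size_t)i * 8, arg);
}
```

    Using memcpy rather than casting the buffer pointer avoids unaligned
    access and strict-aliasing problems when reading from a raw byte
    buffer.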
    
    To test it, we make the BPF program for raw_syscalls:sys_enter return 1
    right after checking whether the pid is filtered, so that it doesn't
    use bpf_perf_event_output() but instead asks for the tracepoint not to
    be filtered. The result is the expected one:
    
      brk(NULL)                               = 0x556f42d6e000
    
    I.e. the BPF program returns 1, the raw_syscalls:sys_enter event is
    handled by trace__sys_enter() and is then combined with
    raw_syscalls:sys_exit in a strace-like way.
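    The strace-like merging can be pictured as per-thread state that
    sys_enter fills in and sys_exit consumes. When no entry is pending, the
    exit is printed as a continuation, which matches the
    "... [continued]: brk())" line in the buggy output. This is a
    hypothetical sketch: struct thread_model, on_sys_enter() and
    on_sys_exit() are invented names, modeled on the behavior described
    here rather than on builtin-trace.c's actual code.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-thread state: the formatted "name(args" of a
 * sys_enter that has not yet been matched with its sys_exit. */
struct thread_model {
	bool entry_pending;
	char entry_str[128];
};

/* sys_enter: remember the formatted call until the exit arrives. */
static void on_sys_enter(struct thread_model *t, const char *formatted_call)
{
	snprintf(t->entry_str, sizeof(t->entry_str), "%s", formatted_call);
	t->entry_pending = true;
}

/* sys_exit: close the pending call, or print a continuation if the
 * matching sys_enter was never seen by this handler. */
static void on_sys_exit(struct thread_model *t, const char *name,
			unsigned long long ret, char *out, size_t len)
{
	if (t->entry_pending)
		snprintf(out, len, "%s) = %#llx", t->entry_str, ret);
	else
		snprintf(out, len, "... [continued]: %s()) = %#llx", name, ret);
	t->entry_pending = false;
}
```

    If sys_enter never reaches on_sys_enter(), as happened when it went to
    the generic handler, every exit falls into the continuation branch.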
    
    Cc: Adrian Hunter <adrian.hunter@intel.com>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Link: https://lkml.kernel.org/n/tip-0mkocgk31nmy0odknegcby4z@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>