1. 10 Oct, 2019 3 commits
  2. 09 Oct, 2019 14 commits
    • perf beauty: Introduce strtoul() for x86 MSRs · 728db198
      Arnaldo Carvalho de Melo authored
      Continuing from the previous cset comment, now that the filter
      expression works:
      
        # perf trace -e msr:* --filter="msr!=FS_BASE && msr != IA32_TSC_DEADLINE && msr != 0x830 && msr != 0x83f && msr !=IA32_SPEC_CTRL" --filter-pids 3750
           0.000 Timer/5033 msr:write_msr(msr: SYSCALL_MASK, val: 292608)
           0.009 Timer/5033 msr:write_msr(msr: LSTAR, val: -1398800368)
           0.010 Timer/5033 msr:write_msr(msr: TSC_AUX, val: 4)
           0.050 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST)
          45.661 gnome-terminal/12595 msr:write_msr(msr: SYSCALL_MASK, val: 292608)
          45.672 gnome-terminal/12595 msr:write_msr(msr: LSTAR, val: -1398800368)
          45.675 gnome-terminal/12595 msr:write_msr(msr: TSC_AUX, val: 3)
          54.852 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST)
         130.508 Timer/4050 msr:write_msr(msr: SYSCALL_MASK, val: 292608)
         130.527 Timer/4050 msr:write_msr(msr: LSTAR, val: -1398800368)
         130.531 Timer/4050 msr:write_msr(msr: TSC_AUX, val: 3)
         140.924 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST)
         164.738 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST)
         603.578 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST)
         620.809 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST)
         690.115 JS Watchdog/4259 msr:write_msr(msr: SYSCALL_MASK, val: 292608)
         690.136 JS Watchdog/4259 msr:write_msr(msr: LSTAR, val: -1398800368)
         690.141 JS Watchdog/4259 msr:write_msr(msr: TSC_AUX, val: 3)
         690.186 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST)
         759.016 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST)
      ^C[root@quaco ~]#
      
      Or look at the first 3 write_msr events for that IA32_TSC_DEADLINE to learn why
      it happens so often:
      
        # perf trace --max-events=3 --max-stack=8 -e msr:* --filter="msr==IA32_TSC_DEADLINE" --filter-pids 3750
           0.000 :0/0 msr:write_msr(msr: IA32_TSC_DEADLINE, val: 19296732550862)
                                             do_trace_write_msr ([kernel.kallsyms])
                                             do_trace_write_msr ([kernel.kallsyms])
                                             lapic_next_deadline ([kernel.kallsyms])
                                             clockevents_program_event ([kernel.kallsyms])
                                             hrtimer_interrupt ([kernel.kallsyms])
                                             smp_apic_timer_interrupt ([kernel.kallsyms])
                                             apic_timer_interrupt ([kernel.kallsyms])
                                             cpuidle_enter_state ([kernel.kallsyms])
          32.646 :0/0 msr:write_msr(msr: IA32_TSC_DEADLINE, val: 19296800134158)
                                             do_trace_write_msr ([kernel.kallsyms])
                                             do_trace_write_msr ([kernel.kallsyms])
                                             lapic_next_deadline ([kernel.kallsyms])
                                             clockevents_program_event ([kernel.kallsyms])
                                             hrtimer_start_range_ns ([kernel.kallsyms])
                                             tick_nohz_restart_sched_tick ([kernel.kallsyms])
                                             tick_nohz_idle_exit ([kernel.kallsyms])
                                             do_idle ([kernel.kallsyms])
          32.802 :0/0 msr:write_msr(msr: IA32_TSC_DEADLINE, val: 19297507436922)
                                             do_trace_write_msr ([kernel.kallsyms])
                                             do_trace_write_msr ([kernel.kallsyms])
                                             lapic_next_deadline ([kernel.kallsyms])
                                             clockevents_program_event ([kernel.kallsyms])
                                             hrtimer_try_to_cancel ([kernel.kallsyms])
                                             hrtimer_cancel ([kernel.kallsyms])
                                             tick_nohz_restart_sched_tick ([kernel.kallsyms])
                                             tick_nohz_idle_exit ([kernel.kallsyms])
        #
      
      And if some of the strings can't be found:
      
        # trace -e msr:* --filter="msr!=SPECULATIVE_EXECUTION_PROBLEMS_SOLUTION && msr != IA32_TSC_DEADLINE && msr != 0x830 && msr != 0x83f && msr !=IA32_SPEC_CTRL" --filter-pids 3750
        "SPECULATIVE_EXECUTION_PROBLEMS_SOLUTION" not found for "msr" in "msr:read_msr", can't set filter "(msr!=SPECULATIVE_EXECUTION_PROBLEMS_SOLUTION && msr != IA32_TSC_DEADLINE && msr != 0x830 && msr != 0x83f && msr !=IA32_SPEC_CTRL) && (common_pid != 28131 && common_pid != 3750)"
        #
      
      Next step is to automatically wire up the pre-existing strarrays, of
      which there are quite a few.
      
      The strtoul() methods will be further enhanced to allow for looking at other
      arguments in a syscall/tracepoint, just like going from integer to string
      (scnprintf methods), so that those "val" lines for the msr tracepoints can be
      properly formatted or even resolved into some string.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-4qaai5iqjgefd11k4ddm7qg8@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Expand strings in filters to integers · 90df0249
      Arnaldo Carvalho de Melo authored
      So that one can try things like:
      
        # perf trace -e msr:* --filter="msr!=FS_BASE && msr != IA32_TSC_DEADLINE && msr != 0x830 && msr != 0x83f && msr !=IA32_SPEC_CTRL" --filter-pids 3750
      
      That, at this point in the patchset, without any strtoul in place for
      tracepoint arguments, will result in:
      
        No resolver (strtoul) for "msr" in "msr:read_msr", can't set filter "(msr!=FS_BASE && msr != IA32_TSC_DEADLINE && msr != 0x830 && msr != 0x83f && msr !=IA32_SPEC_CTRL) && (common_pid != 25407 && common_pid != 3750)"
        #
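
      A minimal sketch of the expansion step, under stated assumptions: it
      is not the perf code. The names filter_sketch__expand(), resolver_fn
      and msr_sketch__resolve() are made up for illustration, only '==' and
      '!=' are handled, and the toy resolver knows just two MSR names whose
      numbers come from the generated table used in this series:

```c
#include <assert.h>
#include <ctype.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical resolver: map name 'str' (len bytes) for argument 'arg'. */
typedef bool (*resolver_fn)(const char *arg, const char *str, size_t len,
			    uint64_t *val);

/* Toy resolver that only knows two names for the "msr" argument. */
bool msr_sketch__resolve(const char *arg, const char *str, size_t len,
			 uint64_t *val)
{
	static const struct { const char *name; uint64_t val; } known[] = {
		{ "FS_BASE",           0xc0000100 },
		{ "IA32_TSC_DEADLINE", 0x000006e0 },
	};

	if (strcmp(arg, "msr") != 0)
		return false;
	for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
		if (strlen(known[i].name) == len &&
		    strncmp(known[i].name, str, len) == 0) {
			*val = known[i].val;
			return true;
		}
	}
	return false;
}

/* Length of the run of [A-Za-z0-9_] characters starting at 's'. */
static size_t token_len(const char *s)
{
	size_t n = 0;

	while (isalnum((unsigned char)s[n]) || s[n] == '_')
		n++;
	return n;
}

/*
 * Copy 'filter' to 'out', replacing every name that follows '==' or '!='
 * with its number. Returns 0 on success, -1 on an unknown name or when
 * 'out' is too small.
 */
int filter_sketch__expand(const char *filter, resolver_fn resolve,
			  char *out, size_t outsz)
{
	bool after_op = false; /* previous token was '==' or '!=' */
	char arg[64] = "";     /* last left-hand side name seen */
	size_t used = 0;

	assert(filter != NULL && resolve != NULL && out != NULL);
	if (outsz == 0)
		return -1;
	out[0] = '\0';

	while (*filter) {
		unsigned char c = (unsigned char)*filter;
		size_t len = 1;
		int n;

		if (isalnum(c) || c == '_') {
			uint64_t val;

			len = token_len(filter);
			if (!isdigit(c) && after_op) {
				/* A name on the right-hand side: resolve it. */
				if (!resolve(arg, filter, len, &val))
					return -1;
				n = snprintf(out + used, outsz - used, "%#" PRIx64, val);
			} else {
				/* A number, or a left-hand side name: copy as-is. */
				if (!isdigit(c))
					snprintf(arg, sizeof(arg), "%.*s", (int)len, filter);
				n = snprintf(out + used, outsz - used, "%.*s", (int)len, filter);
			}
			after_op = false;
		} else if ((c == '=' || c == '!') && filter[1] == '=') {
			len = 2;
			n = snprintf(out + used, outsz - used, "%.2s", filter);
			after_op = true;
		} else {
			/* Whitespace keeps the "after operator" state. */
			if (!isspace(c))
				after_op = false;
			n = snprintf(out + used, outsz - used, "%c", c);
		}
		if (n < 0 || (size_t)n >= outsz - used)
			return -1;
		used += (size_t)n;
		filter += len;
	}
	return 0;
}
```

      Numeric operands such as 0x830 pass through untouched, so symbolic and
      numeric comparisons can be mixed in one expression.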
      
      See you in the next cset!
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-dx5j70fv2rgkeezd1cb3hv2p@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Introduce a strtoul() method for 'struct strarrays' · d0a3a104
      Arnaldo Carvalho de Melo authored
      And also for 'struct strarray', since it's needed to implement
      strarrays__strtoul(). This just traverses the entries and when finding a
      match, returns (offset + index), i.e. the value associated with the
      searched string.
      
      E.g. "EFER" (MSR_EFER) returns:
      
        # grep -w EFER -B2 /tmp/build/perf/trace/beauty/generated/x86_arch_MSRs_array.c
        #define x86_64_specific_MSRs_offset 0xc0000080
        static const char *x86_64_specific_MSRs[] = {
      	[0xc0000080 - x86_64_specific_MSRs_offset] = "EFER",
        #
      
        0xc0000080
      
      This will be auto-attached to 'struct syscall_arg_fmt' entries
      associated with strarrays as soon as we add a ->strarray and ->strarrays
      to 'struct syscall_arg_fmt'.
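
      The traversal can be sketched as follows. This is an illustration
      only, assuming a simplified layout: 'struct strarray_sketch' and
      strarray_sketch__strtoul() are hypothetical names, not the ones in
      the perf sources, and the table is a three-entry excerpt of the
      generated one:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for perf's 'struct strarray': a (possibly sparse)
 * table of names whose first slot corresponds to the value 'offset'.
 */
struct strarray_sketch {
	uint64_t     offset;
	size_t       nr_entries;
	const char **entries;
};

/* Reverse lookup: find 'name' and store (offset + index) in *ret. */
bool strarray_sketch__strtoul(const struct strarray_sketch *sa,
			      const char *name, uint64_t *ret)
{
	assert(sa != NULL && name != NULL && ret != NULL);

	for (size_t i = 0; i < sa->nr_entries; i++) {
		/* Sparse tables have NULL holes, skip them. */
		if (sa->entries[i] && strcmp(sa->entries[i], name) == 0) {
			*ret = sa->offset + i;
			return true;
		}
	}
	return false;
}

/* Excerpt of the generated x86_64 table. */
#define x86_64_specific_MSRs_offset 0xc0000080
const char *x86_64_specific_MSRs[] = {
	[0xc0000080 - x86_64_specific_MSRs_offset] = "EFER",
	[0xc0000082 - x86_64_specific_MSRs_offset] = "LSTAR",
	[0xc0000084 - x86_64_specific_MSRs_offset] = "SYSCALL_MASK",
};

const struct strarray_sketch x86_64_msrs_sketch = {
	.offset     = x86_64_specific_MSRs_offset,
	.nr_entries = sizeof(x86_64_specific_MSRs) / sizeof(x86_64_specific_MSRs[0]),
	.entries    = x86_64_specific_MSRs,
};
```

      A 'struct strarrays' variant would presumably try each contained
      table in turn and stop at the first one that finds the name.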
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-r2hpaahf8lishyb1owko9vs1@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Add a strtoul() method to 'struct syscall_arg_fmt' · 3f41b778
      Arnaldo Carvalho de Melo authored
      This will go from a string to a number, so that filter expressions can
      be constructed with strings and then, before applying the tracepoint
      filters (or eBPF, in the future) we can map those strings to numbers.
      
      The first one will be for 'msr' tracepoint arguments, but real quickly
      we will be able to reuse all strarrays for that.
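
      A rough sketch of such a per-argument method, assuming a cut-down
      structure: 'struct arg_fmt_sketch', strtoul_fn and the toy resolver
      below are hypothetical and only mirror the idea of hanging a
      string-to-number callback off each argument format entry:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct arg_fmt_sketch;

/* Callback type: map the 'len' bytes at 'str' to a number in *val. */
typedef bool (*strtoul_fn)(const char *str, size_t len,
			   const struct arg_fmt_sketch *fmt, uint64_t *val);

/* Hypothetical, cut-down analogue of 'struct syscall_arg_fmt'. */
struct arg_fmt_sketch {
	const char *name;    /* argument name, e.g. "msr" */
	strtoul_fn  strtoul; /* NULL when no resolver is available */
};

/* Toy resolver knowing only two MSR names. */
bool msr_sketch__strtoul(const char *str, size_t len,
			 const struct arg_fmt_sketch *fmt, uint64_t *val)
{
	static const struct { const char *name; uint64_t val; } known[] = {
		{ "FS_BASE",           0xc0000100 },
		{ "IA32_TSC_DEADLINE", 0x000006e0 },
	};

	(void)fmt; /* unused in this toy version */
	for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
		if (strlen(known[i].name) == len &&
		    strncmp(known[i].name, str, len) == 0) {
			*val = known[i].val;
			return true;
		}
	}
	return false;
}

/* Resolve through the callback, failing cleanly if none is attached. */
bool arg_fmt_sketch__resolve(const struct arg_fmt_sketch *fmt,
			     const char *str, size_t len, uint64_t *val)
{
	assert(fmt != NULL);
	return fmt->strtoul != NULL && fmt->strtoul(str, len, fmt, val);
}
```

      Taking a pointer plus a length lets the caller resolve a token in
      place inside the filter string, without copying it out first.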
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-wgqq48agcgr95b8dmn6fygtr@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Introduce --filter for tracepoint events · d4097f19
      Arnaldo Carvalho de Melo authored
      Similar to what is in 'perf record', works just like there:
      
        # perf trace -e msr:*
         328.297 :0/0 msr:write_msr(msr: FS_BASE, val: 140240388381888)
         328.302 :0/0 msr:write_msr(msr: FS_BASE, val: 140240388381888)
         328.306 :0/0 msr:write_msr(msr: FS_BASE, val: 140240388381888)
         328.317 :0/0 msr:write_msr(msr: FS_BASE, val: 140240388381888)
         328.322 :0/0 msr:write_msr(msr: FS_BASE, val: 140240388381888)
         328.327 :0/0 msr:write_msr(msr: FS_BASE, val: 140240388381888)
         328.331 :0/0 msr:write_msr(msr: FS_BASE, val: 140240388381888)
         328.336 :0/0 msr:write_msr(msr: FS_BASE, val: 140240388381888)
         328.340 :0/0 ^Cmsr:write_msr(msr: FS_BASE, val: 140240388381888)
        #
      
      In a system wide trace session looking at the write_msr tracepoint
      we see a flood of MSR_FS_BASE writes, so we need to get the number for that:
      
        # grep FS_BASE /tmp/build/perf/trace/beauty/generated/x86_arch_MSRs_array.c
      	[0xc0000100 - x86_64_specific_MSRs_offset] = "FS_BASE",
        #
      
      And then use it in a filter:
      
        # perf trace -e msr:* --filter="msr!=0xc0000100"
        <SNIP>
         942.177 :0/0 msr:write_msr(msr: IA32_TSC_DEADLINE, val: 3056931068232)
         942.199 :0/0 msr:write_msr(msr: IA32_TSC_DEADLINE, val: 3057135655252)
         942.203 :0/0 msr:write_msr(msr: IA32_TSC_DEADLINE, val: 3056931068222)
         942.231 :0/0 msr:write_msr(msr: IA32_TSC_DEADLINE, val: 3056998373022)
         942.241 :0/0 msr:write_msr(msr: IA32_TSC_DEADLINE, val: 3056931068236)
        <SNIP>
        #
      
      OK, let's filter that too, it is too noisy:
      
        # grep TSC_DEADLINE /tmp/build/perf/trace/beauty/generated/x86_arch_MSRs_array.c
      	[0x000006E0] = "IA32_TSC_DEADLINE",
        #
      
        # perf trace -e msr:* --filter="msr!=0xc0000100 && msr!=0x6e0" -a sleep 0.1
           0.000 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST)
           0.066 CPU 0/KVM/4895 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
           0.070 CPU 0/KVM/4895 msr:write_msr(msr: 0x830, val: 34359740667)
           0.099 CPU 0/KVM/4895 msr:read_msr(msr: IA32_SYSENTER_ESP, val: -2199021993472)
           0.100 CPU 0/KVM/4895 msr:read_msr(msr: IA32_APICBASE, val: 4276096000)
           0.101 CPU 0/KVM/4895 msr:read_msr(msr: IA32_DEBUGCTLMSR)
           0.109 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL)
           1.000 :0/0 msr:write_msr(msr: 0x830, val: 17179871485)
          18.893 :0/0 msr:write_msr(msr: 0x83f, val: 246)
          28.810 :0/0 msr:write_msr(msr: 0x830, val: 68719479037)
          40.117 CPU 0/KVM/4895 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
          40.127 CPU 0/KVM/4895 msr:read_msr(msr: IA32_DEBUGCTLMSR)
          40.139 CPU 0/KVM/4895 msr:write_msr(msr: LSTAR, val: -2130661312)
          40.141 CPU 0/KVM/4895 msr:write_msr(msr: SYSCALL_MASK, val: 14080)
          40.142 CPU 0/KVM/4895 msr:write_msr(msr: TSC_AUX)
          40.144 CPU 0/KVM/4895 msr:write_msr(msr: KERNEL_GS_BASE)
          40.147 CPU 0/KVM/4895 msr:write_msr(msr: IA32_SPEC_CTRL)
          40.148 CPU 0/KVM/4895 msr:write_msr(msr: IA32_FLUSH_CMD, val: 1)
          40.151 CPU 0/KVM/4895 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
        ^C
        #
      
      One can combine that with filtering pids as well:
      
        # perf trace -e msr:* --filter="msr!=0xc0000100 && msr!=0x6e0" --filter-pids 4895 -a sleep 0.09
           0.000 :0/0 msr:write_msr(msr: 0x830, val: 4294969597)
           0.291 gnome-terminal/2790 msr:write_msr(msr: SYSCALL_MASK, val: 292608)
           0.294 gnome-terminal/2790 msr:write_msr(msr: LSTAR, val: -1935671280)
           0.295 gnome-terminal/2790 msr:write_msr(msr: TSC_AUX, val: 6)
          10.940 gnome-terminal/2790 msr:write_msr(msr: 0x830, val: 4294969597)
          15.943 gnome-shell/2096 msr:write_msr(msr: 0x830, val: 4294969597)
          16.975 :0/0 msr:write_msr(msr: 0x830, val: 4294969597)
          19.560 :0/0 msr:write_msr(msr: 0x83f, val: 246)
          25.162 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST)
          25.807 JS Watchdog/3635 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
          25.820 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL)
          25.941 gnome-terminal/2790 msr:write_msr(msr: 0x830, val: 4294969597)
          26.941 gnome-terminal/2790 msr:write_msr(msr: 0x830, val: 4294969597)
          29.942 gnome-terminal/2790 msr:write_msr(msr: 0x830, val: 4294969597)
          45.313 :0/0 msr:write_msr(msr: 0x83f, val: 246)
          56.945 gnome-terminal/2790 msr:write_msr(msr: 0x830, val: 4294969597)
          60.946 gnome-terminal/2790 msr:write_msr(msr: 0x830, val: 4294969597)
          74.096 JS Watchdog/8971 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
          74.130 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL)
          79.673 :0/0 msr:write_msr(msr: 0x83f, val: 246)
          79.947 gnome-terminal/2790 msr:write_msr(msr: 0x830, val: 17179871485)
        #
      
      Or for just a pid, with callchains:
      
        # grep SYSCALL_MAS /tmp/build/perf/trace/beauty/generated/x86_arch_MSRs_array.c
      	[0xc0000084 - x86_64_specific_MSRs_offset] = "SYSCALL_MASK",
        # perf trace -e msr:* --filter="msr==0xc0000084" --pid 2790 --call-graph=dwarf
      
           0.000 gnome-terminal/2790 msr:write_msr(msr: SYSCALL_MASK, val: 292608)
                                             do_trace_write_msr ([kernel.kallsyms])
                                             do_trace_write_msr ([kernel.kallsyms])
                                             kvm_on_user_return ([kvm])
                                             fire_user_return_notifiers ([kernel.kallsyms])
                                             exit_to_usermode_loop ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             entry_SYSCALL_64 ([kernel.kallsyms])
                                             __GI___poll (inlined)
        9299.073 gnome-terminal/2790 msr:write_msr(msr: SYSCALL_MASK, val: 292608)
                                             do_trace_write_msr ([kernel.kallsyms])
                                             do_trace_write_msr ([kernel.kallsyms])
                                             kvm_on_user_return ([kvm])
                                             fire_user_return_notifiers ([kernel.kallsyms])
                                             exit_to_usermode_loop ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             entry_SYSCALL_64 ([kernel.kallsyms])
                                             __GI___poll (inlined)
        9348.374 gnome-terminal/2790 msr:write_msr(msr: SYSCALL_MASK, val: 292608)
                                             do_trace_write_msr ([kernel.kallsyms])
                                             do_trace_write_msr ([kernel.kallsyms])
                                             kvm_on_user_return ([kvm])
                                             fire_user_return_notifiers ([kernel.kallsyms])
                                             exit_to_usermode_loop ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             entry_SYSCALL_64 ([kernel.kallsyms])
                                             __GI___poll (inlined)
        <SNIP>
        #
      
      Ok, just another form of KVM to emit MSRs :-)
      
      Next step: eliminate those greps by getting the filter expression,
      looking for arg names, then for the arrays associated with it to do a
      reverse lookup.
      
      Also allow those filters to be associated with strace-like syscall
      names.
      
      After that: augment the 'val' arg for 'msr:write_msr' based on the first
      arg, 'msr'.
      
      Then, do that with eBPF too, not just with tracepoint filters.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-95bfe5d4tzy5f66bx49d05rj@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evlist: Introduce append_tp_filter_pid() and append_tp_filter_pids() · 1827ab5b
      Arnaldo Carvalho de Melo authored
      We'll need this to support 'perf trace -e tracepoint --filter=expr', as
      the command line tracepoint filter is attached to the preceding evsel,
      just like in 'perf record'. When we go to set pid filters, which we
      do at the minimum to filter out 'perf trace's own syscalls, we need to
      append to, not set, the tp filter.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-daynpknni44ywuzi8iua57nn@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evlist: Introduce append_tp_filter() method · 53c92f73
      Arnaldo Carvalho de Melo authored
      Will be used by 'perf trace' to support 'perf trace --filter', where we
      need to append to any pre-existing filter.
      
      When parse_filter() gets invoked to process --filter, it sets the
      filter to the one specified on the command line. Later on, when we filter
      out 'perf trace's own pid to avoid an event feedback loop, we need to
      preserve the command line filter put in place by parse_filter().
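
      A minimal sketch of the append semantics, assuming the
      "(old) && (new)" shape seen in the filter strings that 'perf trace'
      reports in this series. tp_filter_sketch__append() is a hypothetical
      helper working on plain strings, not the evsel/evlist method itself:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated "(old) && (extra)", or a copy of 'extra' when
 * there is no previous filter. The caller frees the result.
 */
char *tp_filter_sketch__append(const char *old, const char *extra)
{
	size_t len;
	char *res;

	assert(extra != NULL);

	if (old == NULL || *old == '\0') {
		len = strlen(extra) + 1;
		res = malloc(len);
		if (res != NULL)
			memcpy(res, extra, len);
		return res;
	}

	/* sizeof() covers the eight punctuation characters plus the NUL. */
	len = strlen(old) + strlen(extra) + sizeof("() && ()");
	res = malloc(len);
	if (res != NULL)
		snprintf(res, len, "(%s) && (%s)", old, extra);
	return res;
}
```

      Wrapping both halves in parentheses keeps an '||' in the user
      supplied expression from capturing the pid terms.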
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-h9rot08qmxlnfmte0holt68x@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evlist: Factor out asprintf routine to build a tracepoint pid filter · 05cea449
      Arnaldo Carvalho de Melo authored
      Will be used to append such lists to existing filters.
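
      For illustration, a sketch of building such a pid list, assuming the
      "common_pid != A && common_pid != B" form that shows up in the
      filters in this series. pid_filter_sketch__build() is a made-up name
      and uses snprintf() into a preallocated buffer rather than asprintf():

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build "common_pid != P0 && common_pid != P1 ..." for 'npids' pids.
 * Returns a malloc()ed string the caller frees, or NULL on failure.
 */
char *pid_filter_sketch__build(size_t npids, const int *pids)
{
	/* "common_pid != " (14) + an int (at most 11) + " && " (4) < 32. */
	size_t size = npids * 32 + 1, used = 0;
	char *filter;

	assert(npids == 0 || pids != NULL);

	filter = malloc(size);
	if (filter == NULL)
		return NULL;
	filter[0] = '\0';

	for (size_t i = 0; i < npids; i++) {
		int n = snprintf(filter + used, size - used, "%scommon_pid != %d",
				 i ? " && " : "", pids[i]);

		if (n < 0 || (size_t)n >= size - used) {
			free(filter);
			return NULL;
		}
		used += (size_t)n;
	}
	return filter;
}
```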
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-798vlyqfqw938ehoe8etivx1@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Associate the "msr" tracepoint arg name with x86_MSR__scnprintf() · c330ef28
      Arnaldo Carvalho de Melo authored
      So that we can go from:
      
        # perf trace -e msr:write_msr --max-stack=16 sleep 1
             0.000 sleep/6740 msr:write_msr(msr: 3221225728, val: 139636317451648)
                                               do_trace_write_msr ([kernel.kallsyms])
                                               do_trace_write_msr ([kernel.kallsyms])
                                               do_arch_prctl_64 ([kernel.kallsyms])
                                               __x64_sys_arch_prctl ([kernel.kallsyms])
                                               do_syscall_64 ([kernel.kallsyms])
                                               entry_SYSCALL_64 ([kernel.kallsyms])
                                               init_tls (/usr/lib64/ld-2.29.so)
                                               dl_main (/usr/lib64/ld-2.29.so)
                                               _dl_sysdep_start (/usr/lib64/ld-2.29.so)
                                               _dl_start (/usr/lib64/ld-2.29.so)
        #
      
      To:
      
        # perf trace -e msr:write_msr --max-stack=16 sleep 1
           0.000 sleep/8519 msr:write_msr(msr: FS_BASE, val: 139878031705472)
                                             do_trace_write_msr ([kernel.kallsyms])
                                             do_trace_write_msr ([kernel.kallsyms])
                                             do_arch_prctl_64 ([kernel.kallsyms])
                                             __x64_sys_arch_prctl ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             entry_SYSCALL_64 ([kernel.kallsyms])
                                             init_tls (/usr/lib64/ld-2.29.so)
                                             dl_main (/usr/lib64/ld-2.29.so)
                                             _dl_sysdep_start (/usr/lib64/ld-2.29.so)
                                             _dl_start (/usr/lib64/ld-2.29.so)
        #
      
      This, in reverse, will allow for symbolic system call/tracepoint
      filtering.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-q1q4unmqja5ex7dy0kb5cjaa@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace beauty: Add the glue for the autogenerated MSR arrays · 646b3e2c
      Arnaldo Carvalho de Melo authored
      We need to wrap those autogenerated string arrays with the
      strarrays__scnprintf() formatter, do it.
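
      Roughly, the glue amounts to something like the sketch below. It is
      an approximation under assumptions: the '_sketch' structures and
      function are hypothetical, the tables are short excerpts of the
      generated ones, and snprintf() stands in for the kernel-style
      scnprintf(), whose return value semantics differ on truncation:

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* One sparse name table whose first slot stands for the value 'offset'. */
struct strarray_sketch {
	uint64_t     offset;
	size_t       nr_entries;
	const char **entries;
};

/* A set of such tables, tried in order. */
struct strarrays_sketch {
	size_t                         nr_entries;
	const struct strarray_sketch **entries;
};

/* Print the name for 'val' if any table has it, else the number in hex. */
int strarrays_sketch__scnprintf(const struct strarrays_sketch *sas,
				char *bf, size_t size, uint64_t val)
{
	assert(sas != NULL && bf != NULL);

	for (size_t i = 0; i < sas->nr_entries; i++) {
		const struct strarray_sketch *sa = sas->entries[i];
		uint64_t idx;

		if (val < sa->offset)
			continue;
		idx = val - sa->offset;
		if (idx < sa->nr_entries && sa->entries[idx] != NULL)
			return snprintf(bf, size, "%s", sa->entries[idx]);
	}
	return snprintf(bf, size, "%#" PRIx64, val);
}

/* Excerpts of two of the generated tables. */
const char *x86_MSRs[] = {
	[0x0000003b] = "IA32_TSC_ADJUST",
	[0x00000048] = "IA32_SPEC_CTRL",
};

#define x86_64_specific_MSRs_offset 0xc0000080
const char *x86_64_specific_MSRs[] = {
	[0xc0000080 - x86_64_specific_MSRs_offset] = "EFER",
	[0xc0000100 - x86_64_specific_MSRs_offset] = "FS_BASE",
};

const struct strarray_sketch x86_MSRs_sketch = {
	0, ARRAY_SIZE(x86_MSRs), x86_MSRs,
};
const struct strarray_sketch x86_64_specific_MSRs_sketch = {
	x86_64_specific_MSRs_offset, ARRAY_SIZE(x86_64_specific_MSRs),
	x86_64_specific_MSRs,
};

const struct strarray_sketch *x86_MSR_tables_sketch[] = {
	&x86_MSRs_sketch,
	&x86_64_specific_MSRs_sketch,
};
const struct strarrays_sketch x86_MSR_strarrays_sketch = {
	ARRAY_SIZE(x86_MSR_tables_sketch), x86_MSR_tables_sketch,
};
```

      Values without a name in any table, such as 0x830, fall through to
      the hex form.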
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-wqjz4kwi4a0ot6lsis3kc65j@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Allow associating scnprintf routines with well known arg names · 5d88099b
      Arnaldo Carvalho de Melo authored
      For instance 'msr' appears in several tracepoints, so we can associate
      it with a single scnprintf() routine auto-generated from kernel headers,
      as will be done in followup patches.
      
      Start with an empty array of associations.
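
      One way such an association table could look, as a sketch only: the
      structure, the bsearch() based lookup and the single "msr" entry are
      assumptions made for illustration (this cset itself starts with an
      empty array), and the formatter is a toy that knows one MSR:

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef int (*arg_scnprintf_fn)(char *bf, size_t size, uint64_t val);

/* Hypothetical association of an argument name with a formatter. */
struct arg_name_fmt_sketch {
	const char       *name;
	arg_scnprintf_fn  scnprintf;
};

/* Toy formatter: knows FS_BASE, prints everything else in hex. */
int msr_sketch__scnprintf(char *bf, size_t size, uint64_t val)
{
	if (val == 0xc0000100)
		return snprintf(bf, size, "FS_BASE");
	return snprintf(bf, size, "%#" PRIx64, val);
}

/* Keep sorted by name: bsearch() depends on it. */
static const struct arg_name_fmt_sketch fmts_by_name[] = {
	{ .name = "msr", .scnprintf = msr_sketch__scnprintf },
};

static int fmt_name_cmp(const void *key, const void *entry)
{
	const struct arg_name_fmt_sketch *fmt = entry;

	return strcmp(key, fmt->name);
}

/* Find the formatter registered for 'name', or NULL if there is none. */
const struct arg_name_fmt_sketch *arg_name_fmt_sketch__find(const char *name)
{
	assert(name != NULL);
	return bsearch(name, fmts_by_name,
		       sizeof(fmts_by_name) / sizeof(fmts_by_name[0]),
		       sizeof(fmts_by_name[0]), fmt_name_cmp);
}
```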
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-89ptht6s5fez82lykuwq1eyb@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf beauty: Hook up the x86 MSR table generator · fd218347
      Arnaldo Carvalho de Melo authored
      This way we generate the source with the table for later use by plugins,
      etc.
      
      I.e. after running:
      
        $ make -C tools/perf O=/tmp/build/perf
      
      We end up with:
      
        $ head /tmp/build/perf/trace/beauty/generated/x86_arch_MSRs_array.c
        static const char *x86_MSRs[] = {
        	[0x00000000] = "IA32_P5_MC_ADDR",
        	[0x00000001] = "IA32_P5_MC_TYPE",
        	[0x00000010] = "IA32_TSC",
        	[0x00000017] = "IA32_PLATFORM_ID",
        	[0x0000001b] = "IA32_APICBASE",
        	[0x00000020] = "KNC_PERFCTR0",
        	[0x00000021] = "KNC_PERFCTR1",
        	[0x00000028] = "KNC_EVNTSEL0",
        	[0x00000029] = "KNC_EVNTSEL1",
        $
      
      Now it's just a matter of using it, first in a libtraceevent plugin.
      
      At some point we should move tools/perf/trace/beauty to tools/beauty/,
      so that it can be used more generally and even made available externally
      like libbpf, libperf, libtraceevent, etc.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-b3rmutg4igcohx6kpo67qh4j@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace beauty: Add a x86 MSR cmd id->str table generator · 693d3458
      Arnaldo Carvalho de Melo authored
      Without parameters it'll parse tools/arch/x86/include/asm/msr-index.h
      and output a table usable by tools, that will be wired up later to a
      libtraceevent plugin registered from perf's glue code:
      
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh
        static const char *x86_MSRs[] = {
       <SNIP>
        	[0x00000034] = "SMI_COUNT",
        	[0x0000003a] = "IA32_FEATURE_CONTROL",
        	[0x0000003b] = "IA32_TSC_ADJUST",
        	[0x00000040] = "LBR_CORE_FROM",
        	[0x00000048] = "IA32_SPEC_CTRL",
        	[0x00000049] = "IA32_PRED_CMD",
       <SNIP>
        	[0x0000010b] = "IA32_FLUSH_CMD",
        	[0x0000010F] = "TSX_FORCE_ABORT",
       <SNIP>
        	[0x00000198] = "IA32_PERF_STATUS",
        	[0x00000199] = "IA32_PERF_CTL",
        <SNIP>
        	[0x00000da0] = "IA32_XSS",
        	[0x00000dc0] = "LBR_INFO_0",
        	[0x00000ffc] = "IA32_BNDCFGS_RSVD",
        };
      
        #define x86_64_specific_MSRs_offset 0xc0000080
        static const char *x86_64_specific_MSRs[] = {
        	[0xc0000080 - x86_64_specific_MSRs_offset] = "EFER",
        	[0xc0000081 - x86_64_specific_MSRs_offset] = "STAR",
        	[0xc0000082 - x86_64_specific_MSRs_offset] = "LSTAR",
        	[0xc0000083 - x86_64_specific_MSRs_offset] = "CSTAR",
        	[0xc0000084 - x86_64_specific_MSRs_offset] = "SYSCALL_MASK",
        <SNIP>
        	[0xc0000103 - x86_64_specific_MSRs_offset] = "TSC_AUX",
        	[0xc0000104 - x86_64_specific_MSRs_offset] = "AMD64_TSC_RATIO",
        };
      
        #define x86_AMD_V_KVM_MSRs_offset 0xc0010000
        static const char *x86_AMD_V_KVM_MSRs[] = {
        	[0xc0010000 - x86_AMD_V_KVM_MSRs_offset] = "K7_EVNTSEL0",
        <SNIP>
        	[0xc0010114 - x86_AMD_V_KVM_MSRs_offset] = "VM_CR",
        	[0xc0010115 - x86_AMD_V_KVM_MSRs_offset] = "VM_IGNNE",
        	[0xc0010117 - x86_AMD_V_KVM_MSRs_offset] = "VM_HSAVE_PA",
        <SNIP>
        	[0xc0010240 - x86_AMD_V_KVM_MSRs_offset] = "F15H_NB_PERF_CTL",
        	[0xc0010241 - x86_AMD_V_KVM_MSRs_offset] = "F15H_NB_PERF_CTR",
        	[0xc0010280 - x86_AMD_V_KVM_MSRs_offset] = "F15H_PTSC",
        };
      
      Then these will in turn be hooked up in a follow up patch to be used by
      strarrays__scnprintf().
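
      To make the mapping from header to table concrete, here is a C
      sketch of turning one '#define MSR_<NAME> 0x<value>' line into one
      table entry. It is not how x86_msr.sh works internally: the function
      name is hypothetical, picking the table by comparing against the two
      offsets is an assumption, and the value is reprinted in a normalized
      form rather than copied verbatim from the header:

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Turn a '#define MSR_<NAME> 0x<hex>' line into a designated initializer
 * such as '[0x00000034] = "SMI_COUNT",'. Returns 0 on success, -1 when
 * the line is not an MSR number define or 'out' is too small.
 */
int msr_define_sketch__to_entry(const char *line, char *out, size_t outsz)
{
	const char *table_offset = NULL;
	char name[64];
	uint64_t val;
	int n;

	assert(line != NULL && out != NULL);

	if (sscanf(line, "#define MSR_%63[A-Za-z0-9_] 0x%" SCNx64, name, &val) != 2)
		return -1;

	/* Assumed split: pick the table by the range the number falls in. */
	if (val >= 0xc0010000)
		table_offset = "x86_AMD_V_KVM_MSRs_offset";
	else if (val >= 0xc0000080)
		table_offset = "x86_64_specific_MSRs_offset";

	if (table_offset != NULL)
		n = snprintf(out, outsz, "\t[0x%08" PRIx64 " - %s] = \"%s\",",
			     val, table_offset, name);
	else
		n = snprintf(out, outsz, "\t[0x%08" PRIx64 "] = \"%s\",", val, name);

	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```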
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-ja080xawx08kedez855usnon@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf beauty: Make strarray's offset be u64 · 8d6505ba
      Arnaldo Carvalho de Melo authored
      We need it for things like MSRs that are sparse and go over MAXINT.
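
      A small worked check of the arithmetic, as a sketch: the offset of
      the x86_64 specific MSR table used in this series does not fit in a
      plain 'int', while the table index derived from it stays small. The
      two helper names are made up for illustration:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define x86_64_specific_MSRs_offset 0xc0000080

/* 0xc0000080 is 3221225600, above INT_MAX on platforms with 32-bit int. */
static_assert(x86_64_specific_MSRs_offset > INT_MAX,
	      "this MSR table offset does not fit in an int");

/* True when 'offset' cannot be represented in a plain 'int'. */
bool offset_needs_u64(uint64_t offset)
{
	return offset > (uint64_t)INT_MAX;
}

/* Index into a table whose first slot stands for 'offset', -1 if below. */
int64_t strarray_index_sketch(uint64_t offset, uint64_t val)
{
	return val >= offset ? (int64_t)(val - offset) : -1;
}
```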
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-g8t2d0jr0mg3yimg2qrjkvlt@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 07 Oct, 2019 23 commits
    • tools arch x86: Grab a copy of the file containing the MSR numbers · 444e2ff3
      Arnaldo Carvalho de Melo authored
      We'll use it to generate a table and then convert the
      msr:{read,write}_msr 'msr' argument to a string in things like perf
      trace, script, etc.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-y1f4s0y1s43d4drh7pd2huzn@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Allow choosing how to augment the tracepoint arguments · f11b2803
      Arnaldo Carvalho de Melo authored
      So far we used the libtraceevent printing routines when showing
      tracepoint arguments, but since 'perf trace' has a lot of beautifiers
      for syscall arguments, and since some of those can be used to augment
      tracepoint arguments, add a routine to make use of those beautifiers
      and allow the user to choose which one to use.
      
      The default now is to use the same beautifiers used for the strace-like
      sys_enter+sys_exit lines, but the user can choose the libtraceevent ones
      by either using the:
      
          perf trace --libtraceevent_print
      
      command line option, or by setting:
      
        # cat ~/.perfconfig
        [trace]
      	tracepoint_beautifiers = libtraceevent
      
      For instance, here are some examples:
      
        # perf trace -e sched:*switch,*sleep,sched:*wakeup,exit*,sched:*exit sleep 1
             0.000 sched:sched_wakeup(comm: "perf", pid: 5273 (perf), prio: 120, success: 1, target_cpu: 6)
             0.621 nanosleep(rqtp: 0x7ffdd06d1140, rmtp: NULL) ...
             0.628 sched:sched_switch(prev_comm: "sleep", prev_pid: 5273 (sleep), prev_prio: 120, prev_state: 1, next_comm: "swapper/6", next_pid: 0, next_prio: 120)
          1000.879 sched:sched_wakeup(comm: "sleep", pid: 5273 (sleep), prio: 120, success: 1, target_cpu: 6)
             0.621  ... [continued]: nanosleep())          = 0
          1001.026 exit_group(error_code: 0)               = ?
          1001.216 sched:sched_process_exit(comm: "sleep", pid: 5273 (sleep), prio: 120)
        #
      
      And then using libtraceevent, as before:
      
        # perf trace --libtraceevent_print -e sched:*switch,*sleep,sched:*wakeup,exit*,sched:*exit sleep 1
             0.000 sched:sched_wakeup(comm=perf pid=5288 prio=120 target_cpu=001)
             0.739 nanosleep(rqtp: 0x7ffeba6c2f40, rmtp: NULL) ...
             0.747 sched:sched_switch(prev_comm=sleep prev_pid=5288 prev_prio=120 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120)
          1000.902 sched:sched_wakeup(comm=sleep pid=5288 prio=120 target_cpu=001)
             0.739  ... [continued]: nanosleep())          = 0
          1001.012 exit_group(error_code: 0)               = ?
        #
      
      The new default allocates an array of 'struct syscall_arg_fmt' for the
      tracepoint arguments and, just like with syscall arguments, tries to
      find suitable syscall_arg__scnprintf_NAME() routines to augment those
      tracepoint arguments based on their type (as in the tracefs "format"
      file), or even on their name + type; for instance, arguments with names
      ending in "fd" with type "int" get the fd scnprintf beautifier attached,
      etc.
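      
      As a rough illustration of the type and name + type matching described
      above, here is a Python sketch. It is not the perf C code: fmt_fd(),
      fmt_default(), pick_beautifier() and format_args() are hypothetical
      names, and only the "name ends in fd, type is int" rule comes from the
      text above.

```python
# Hypothetical model of picking a per-argument formatter from tracefs
# "format" metadata (field type and field name). Not the perf code.

def fmt_fd(value):
    # Stand-in for an fd beautifier; the real one may do much more.
    return f"fd={value}"

def fmt_default(value):
    # Fallback: print the raw value unchanged.
    return str(value)

def pick_beautifier(field_type, field_name):
    """Return a formatter based on the type, or on name + type."""
    if field_type == "int" and field_name.endswith("fd"):
        return fmt_fd
    return fmt_default

def format_args(fields, values):
    """fields: list of (type, name); values: dict of name -> raw value."""
    parts = []
    for field_type, field_name in fields:
        formatter = pick_beautifier(field_type, field_name)
        parts.append(f"{field_name}: {formatter(values[field_name])}")
    return ", ".join(parts)
```

      The formatter is chosen once per field from the metadata, so printing
      an event only has to walk the array and call what was picked.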
      
      Soon this will take advantage of the kernel BTF information to augment
      enumerations based on the tracefs "format" type info.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-o8qdluotkcb3b1x2gjqrejcl@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      f11b2803
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Enclose all events argument lists with () · 311baaf9
      Arnaldo Carvalho de Melo authored
      So that they look a bit like normal strace-like syscall enter+exit
      lines.
      
      They will look even more alike once we switch from libtraceevent's
      tep_print_event() routine to the perf beautifiers used by the
      strace-like syscall enter+exit lines.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-y4fcej6v6u1m644nbxd2r4pg@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      311baaf9
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Add array of chars scnprintf beautifier · 9597945d
      Arnaldo Carvalho de Melo authored
      Needed for the sched tracepoints' prev/next comm, where, unlike with
      syscalls, we are not dealing with an integer or pointer, but an array
      straight out from the ring buffer.
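      
      A minimal Python sketch of what such a beautifier has to do, assuming
      the field arrives as raw bytes plus its declared number of elements.
      The function name and signature are hypothetical and do not mirror the
      C routine.

```python
def scnprintf_char_array(raw, nr_entries):
    """Format a fixed-size char array copied out of a ring buffer record.

    raw: bytes holding the field; nr_entries: declared array length.
    """
    field = raw[:nr_entries]         # never look past the declared size
    text = field.split(b"\0", 1)[0]  # stop at the first NUL, if there is one
    return '"' + text.decode("ascii", errors="replace") + '"'
```

      The bound matters because the array is not guaranteed to be NUL
      terminated when the string fills it completely.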
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-rlll7tmcqe1g4odtaifil5re@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      9597945d
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Add the syscall_arg_fmt pointer to syscall_arg · 888ca854
      Arnaldo Carvalho de Melo authored
      So that the scnprintf beautifiers can access it, as will be the case
      with the char array one in the following csets, which needs to know
      the number of elements in an array.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-01qmjqv6cb1nj1qy4khdexce@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      888ca854
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Move some scnprintf methods from syscall to syscall_arg_fmt · 3e0c9b2c
      Arnaldo Carvalho de Melo authored
      Since all they operate on is a syscall_arg_fmt instance, move them so
      that they can be used from the upcoming tracepoint fprintf routine.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-ynttrs1l75f0x9tk67spd7jd@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      3e0c9b2c
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Allocate an array of beautifiers for tracepoint args · 947b843c
      Arnaldo Carvalho de Melo authored
      This will work similarly to the syscall args: we'll allocate an array
      of 'struct syscall_arg_fmt' for the tracepoint args and then init them
      using the same algorithm used for the defaults for syscall args, i.e.
      using its types and sometimes names as hints to find the right scnprintf
      routine to beautify them from numbers into strings.
      
      Next step is to stop using libtraceevent to printf tracepoints, as we'll
      have more beautifiers than it provides, modulo perhaps some plugins.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-dcl135relxvf6ljisjg13aqg@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      947b843c
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Factor out the initialization of syscall_arg_fmt->scnprintf · 8d1d4ff5
      Arnaldo Carvalho de Melo authored
      We set the default scnprintf routines for the syscall args based on
      their types or on heuristics based on their names. Now we'll use this
      for tracepoints as well, so move it out of syscall__set_arg_fmts() and
      into a routine that receives just an array of syscall_arg_fmt entries
      plus the tracepoint format fields list.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-xs3x0zzyes06c7scdsjn01ty@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      8d1d4ff5
    • Andi Kleen's avatar
      perf script: Allow --time with --reltime · 3714437d
      Andi Kleen authored
      The original --reltime patch forbade combining --time with --reltime.
      
      But it turns out --time doesn't really care about --reltime, because the
      relative time is only used at final output, while the time filtering
      always works earlier on absolute time.
      
      So just remove the check and allow combining the two options.
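      
      To see why the two options do not interfere, consider this Python
      sketch. filter_then_relativize() is a hypothetical helper, timestamps
      are plain integers, and the reference timestamp is passed in as 'base'
      because the text above does not say where perf takes it from.

```python
def filter_then_relativize(samples, start, end, base):
    """Keep samples inside [start, end], then print them relative to base.

    samples, start, end and base are all absolute timestamps.
    """
    # --time step: the window is checked against absolute time only.
    kept = [t for t in samples if start <= t <= end]
    # --reltime step: applied only when producing the final output.
    return [t - base for t in kept]
```

      Since the subtraction happens after the window check, the relative
      output never influences which samples are selected.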
      
      Fixes: 90b10f47 ("perf script: Support relative time")
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lore.kernel.org/lkml/20191002164642.1719-1-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      3714437d
    • Björn Töpel's avatar
      samples/bpf: fix build by setting HAVE_ATTR_TEST to zero · fce9501a
      Björn Töpel authored
      To remove the dependency on test_attr__{enabled,open}, which are used
      by perf-sys.h, set HAVE_ATTR_TEST to zero.
      
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Tested-by: KP Singh <kpsingh@google.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: bpf@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20191001113307.27796-3-bjorn.topel@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      fce9501a
    • Björn Töpel's avatar
      perf tools: Make usage of test_attr__* optional for perf-sys.h · 06f84d19
      Björn Töpel authored
      For users of perf-sys.h outside perf, e.g. samples/bpf/bpf_load.c, it's
      convenient not to depend on test_attr__*.
      
      After commit 91854f9a ("perf tools: Move everything related to
      sys_perf_event_open() to perf-sys.h"), all users of perf-sys.h will
      depend on test_attr__enabled and test_attr__open.
      
      This commit enables a user to define HAVE_ATTR_TEST to zero in order
      to omit the test dependency.
      
      Fixes: 91854f9a ("perf tools: Move everything related to sys_perf_event_open() to perf-sys.h")
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: bpf@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20191001113307.27796-2-bjorn.topel@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      06f84d19
    • Adrian Hunter's avatar
      perf scripts python: exported-sql-viewer.py: Add Time chart by CPU · b3700f21
      Adrian Hunter authored
      Add a time chart based on context switch information.
      
      Context switch information was added to the database export fairly
      recently, so the chart menu option will only appear if context switch
      information is in the database.
      
      Refer to the Exported SQL Viewer Help option for more information about
      the chart.
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20190821083216.1340-7-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b3700f21
    • Adrian Hunter's avatar
      perf scripts python: exported-sql-viewer.py: Add ability for Call tree to open... · e69d5df7
      Adrian Hunter authored
      perf scripts python: exported-sql-viewer.py: Add ability for Call tree to open at a specified task and time
      
      Add ability for Call tree to open at a specified task and time.
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20190821083216.1340-6-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e69d5df7
    • Adrian Hunter's avatar
      perf scripts python: exported-sql-viewer.py: Tidy up Call tree call_time · da4264f5
      Adrian Hunter authored
      Record call_time on tree nodes and re-name the misnamed "count" parameter.
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20190821083216.1340-5-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      da4264f5
    • Adrian Hunter's avatar
      perf scripts python: exported-sql-viewer.py: Add global time range calculations · 9a9dae36
      Adrian Hunter authored
      Add calculations to determine a time range that encompasses all data.
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20190821083216.1340-4-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      9a9dae36
    • Adrian Hunter's avatar
      perf scripts python: exported-sql-viewer.py: Add HBoxLayout and VBoxLayout · 42c303ff
      Adrian Hunter authored
      Add layout classes HBoxLayout and VBoxLayout.
      
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20190821083216.1340-3-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      42c303ff
    • Adrian Hunter's avatar
      perf scripts python: exported-sql-viewer.py: Add LookupModel() · 181ea40a
      Adrian Hunter authored
      Add LookupModel() to find a model in the model cache without creating it.
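      
      The difference between a creating and a non-creating lookup can be
      sketched as below. This is an illustration only: the snake_case names,
      the plain dict and the lock are assumptions and are not copied from
      exported-sql-viewer.py.

```python
import threading

_model_cache = {}
_model_cache_lock = threading.Lock()

def lookup_create_model(name, create_fn):
    """Return the cached model, creating and caching it if it is absent."""
    with _model_cache_lock:
        if name not in _model_cache:
            _model_cache[name] = create_fn()
        return _model_cache[name]

def lookup_model(name):
    """Return the cached model, or None. Never creates one."""
    with _model_cache_lock:
        return _model_cache.get(name)
```

      A caller that only wants to reuse an existing model can then test the
      result for None instead of triggering a possibly expensive creation.
      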
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/20190821083216.1340-2-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      181ea40a
    • Arnaldo Carvalho de Melo's avatar
      perf trace augmented_syscalls: Do not show syscalls when none was asked for · 8bd436b0
      Arnaldo Carvalho de Melo authored
      When using augmented syscalls, i.e. when passing on the command line an
      eBPF source or object file event that provides the
      __augmented_syscalls__ BPF_MAP_TYPE_PERF_EVENT_ARRAY, etc, as with:
      
         perf trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c
      
      or when passing that augmented eBPF source/object via trace.add_events
      in the .perfconfig file, we were assuming that syscalls were asked for,
      which differs from the behaviour when not using augmented syscalls at all.
      
      This is confusing when using .perfconfig to hide the fact we're using
      the augmenter, i.e. using:
      
       # perf trace -e sched:* sleep 1
      
      will show both the scheduler tracepoints and the syscalls, when what we
      want is to show just the scheduler tracepoints.
      
      To see the scheduler tracepoints and some specific syscall strace-like
      formatting, one has to use:
      
        # perf trace -e sched:*,nanosleep sleep 1
      
      Or, if wanting all the syscalls:
      
        # perf trace -e sched:* --syscalls sleep 1
      
      This way 'perf trace' can be used to trace just a set of tracepoints
      while allowing for mixing with strace-like when desired, by simply
      adding to the mix the name of the syscalls to show in addition to the
      tracepoints.
      
      Fix it so that the behaviour using the eBPF based syscall augmenter is
      the same as when not using one.
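      
      The intended selection rule, with or without the augmenter, can be
      modelled as below. This is a Python sketch of the behaviour described
      in this message, not the builtin-trace.c logic: plan_trace() and the
      way entries are classified (':' for tracepoints, a .c/.o suffix for
      the augmenter) are assumptions made for the example.

```python
def plan_trace(events, all_syscalls_flag=False):
    """Return (tracepoints, syscalls) for a list of -e entries.

    syscalls is either a list of names/globs or the string "all".
    """
    tracepoints = [e for e in events if ":" in e]
    augmenters = [e for e in events if e.endswith((".c", ".o"))]
    syscalls = [e for e in events
                if e not in tracepoints and e not in augmenters]
    # All syscalls only when asked for explicitly (--syscalls) or when
    # no tracepoint and no syscall name was given at all.
    if all_syscalls_flag or (not tracepoints and not syscalls):
        return tracepoints, "all"
    return tracepoints, syscalls
```

      Note that the augmenter entries never enter the decision, which is
      the point of the fix: their presence alone must not enable syscalls.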
      
      Testing:
      
      Before this patch, with this ~/.perfconfig:
      
        # egrep -B1 ^[[:space:]]+add_events ~/.perfconfig
        [trace]
        	add_events = /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
        #
      
      That points to this pre-compiled eBPF syscall augmenter:
      
        # file /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
        /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o: ELF 64-bit LSB relocatable, eBPF, version 1 (SYSV), with debug_info, not stripped
      
      And when asking for _only_ sched:sched_switch and sched:sched_wakeup we
      were unconditionally getting all the syscalls formatted strace-like:
      
        # perf trace -e sched:*switch,sched:*wakeup sleep 1 |& tail
           0.633 fstat(3, 0x7fe11d030ac0)                = 0
           0.635 mmap(NULL, 217750512, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fe10fec5000
           0.643 close(3)                                = 0
           0.668 nanosleep(0x7fff649a3a90, NULL)      ...
           0.672 sched:sched_switch:prev_comm=sleep prev_pid=4417 prev_prio=120 prev_state=S ==> next_comm=swapper/6 next_pid=0 next_prio=120
        1000.822 sched:sched_wakeup:comm=sleep pid=4417 prio=120 target_cpu=006
           0.668  ... [continued]: nanosleep())          = 0
        1000.923 close(1)                                = 0
        1000.941 close(2)                                = 0
        1000.974 exit_group(0)                           = ?
        #
      
      After the patch:
      
        # perf trace -e sched:*switch,sched:*wakeup sleep 1
           0.000 sched:sched_wakeup:comm=perf pid=5529 prio=120 target_cpu=005
           1.186 sched:sched_switch:prev_comm=sleep prev_pid=5529 prev_prio=120 prev_state=S ==> next_comm=swapper/5 next_pid=0 next_prio=120
        1001.573 sched:sched_wakeup:comm=sleep pid=5529 prio=120 target_cpu=005
        #
      
      If we add the "open*" syscalls to the mix then the eBPF augmenter _will_
      be used and these syscalls will be traced together with the specified
      sched tracepoints:
      
        # cd /sys/kernel/debug/tracing/events/syscalls/
        # ls -1d sys_enter_open*
        sys_enter_open
        sys_enter_openat
        sys_enter_open_by_handle_at
        sys_enter_open_tree
        #
      
        # perf trace -e open*,sched:*switch,sched:*wakeup sleep 1
             0.000 sched:sched_wakeup:comm=perf pid=5580 prio=120 target_cpu=005
             0.590 openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
             0.616 openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
             0.846 openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
             0.891 sched:sched_switch:prev_comm=sleep prev_pid=5580 prev_prio=120 prev_state=S ==> next_comm=swapper/5 next_pid=0 next_prio=120
          1001.005 sched:sched_wakeup:comm=sleep pid=5580 prio=120 target_cpu=005
        #
      
      And as we can see, the pathnames were collected via the eBPF augmenters.
      
      If we don't specify anything it'll trace all syscalls:
      
        # perf trace sleep 1 |& tail
             0.299 brk(0x5597543a3000)                     = 0x5597543a3000
             0.302 brk(NULL)                               = 0x5597543a3000
             0.307 openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
             0.313 fstat(3, 0x7feece50cac0)                = 0
             0.315 mmap(NULL, 217750512, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7feec13a1000
             0.323 close(3)                                = 0
             0.354 nanosleep(0x7ffe338856e0, NULL)         = 0
          1000.641 close(1)                                = 0
          1000.655 close(2)                                = 0
          1000.673 exit_group(0)                           = ?
        #
      
      Ditto if we don't use .perfconfig's trace.add_events but instead pass
      just the augmenter as a command line event:
      
        # vim ~/.perfconfig
        # egrep -B1 ^[[:space:]]+add_events ~/.perfconfig
        # perf trace -e /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o sleep 1 |& tail
             0.294 brk(0x55ae08ec3000)                     = 0x55ae08ec3000
             0.297 brk(NULL)                               = 0x55ae08ec3000
             0.302 openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
             0.309 fstat(3, 0x7f726488fac0)                = 0
             0.311 mmap(NULL, 217750512, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7257724000
             0.319 close(3)                                = 0
             0.347 nanosleep(0x7ffe23643a70, NULL)         = 0
          1000.560 close(1)                                = 0
          1000.575 close(2)                                = 0
          1000.593 exit_group(0)                           = ?
        #
      
      As well as that + some syscall names for strace-like formatting:
      
        # perf trace -e socket,connect,/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o ssh localhost
             0.000 socket(PF_LOCAL, SOCK_STREAM|CLOEXEC|NONBLOCK, 0) = 3
             0.021 connect(3, { .family: PF_LOCAL, path: /var/run/nscd/socket }, 110) = -1 ENOENT (No such file or directory)
             0.034 socket(PF_LOCAL, SOCK_STREAM|CLOEXEC|NONBLOCK, 0) = 3
             0.041 connect(3, { .family: PF_LOCAL, path: /var/run/nscd/socket }, 110) = -1 ENOENT (No such file or directory)
             0.163 socket(PF_LOCAL, SOCK_STREAM, 0)        = 4
             0.185 connect(4, { .family: PF_LOCAL, path: /var/lib/sss/pipes/nss }, 110) = 0
             0.670 socket(PF_LOCAL, SOCK_STREAM|CLOEXEC|NONBLOCK, 0) = 7
             0.684 connect(7, { .family: PF_LOCAL, path: /var/run/nscd/socket }, 110) = -1 ENOENT (No such file or directory)
             0.694 socket(PF_LOCAL, SOCK_STREAM|CLOEXEC|NONBLOCK, 0) = 7
             0.701 connect(7, { .family: PF_LOCAL, path: /var/run/nscd/socket }, 110) = -1 ENOENT (No such file or directory)
             0.994 socket(PF_LOCAL, SOCK_STREAM|CLOEXEC|NONBLOCK, 0) = 5
             1.006 connect(5, { .family: PF_LOCAL, path: /var/run/nscd/socket }, 110) = -1 ENOENT (No such file or directory)
             1.014 socket(PF_LOCAL, SOCK_STREAM|CLOEXEC|NONBLOCK, 0) = 5
             1.022 connect(5, { .family: PF_LOCAL, path: /var/run/nscd/socket }, 110) = -1 ENOENT (No such file or directory)
             1.068 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 5
             1.087 connect(5, { .family: PF_INET, port: 22, addr: 127.0.0.1 }, 16) = 0
            24.299 socket(PF_LOCAL, SOCK_STREAM, 0)        = 6
            24.337 connect(6, { .family: PF_LOCAL, path: /var/run/.heim_org.h5l.kcm-socket }, 110) = 0
            28.441 socket(PF_LOCAL, SOCK_STREAM, 0)        = 6
            28.516 connect(6, { .family: PF_LOCAL, path: /var/run/.heim_org.h5l.kcm-socket }, 110) = 0
        root@localhost's password:^C
        #
      
      Everything works without augmenters:
      
        # egrep -B1 ^[[:space:]]+add_events ~/.perfconfig
        # perf trace sleep 1 |& tail
             0.261 brk(0x5635068ac000)                     = 0x5635068ac000
             0.264 brk(NULL)                               = 0x5635068ac000
             0.268 openat(AT_FDCWD, 0xdce642a0, O_RDONLY|O_CLOEXEC) = 3
             0.275 fstat(3, 0x7f3fdce97ac0)                = 0
             0.277 mmap(NULL, 217750512, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f3fcfd2c000
             0.284 close(3)                                = 0
             0.310 nanosleep(0x7ffdaea6ecd0, NULL)         = 0
          1000.552 close(1)                                = 0
          1000.565 close(2)                                = 0
          1000.580 exit_group(0)                           = ?
        #
      
        # perf trace -e connect ssh localhost
             0.000 connect(3, 0x58266930, 110)             = -1 ENOENT (No such file or directory)
             0.022 connect(3, 0x58266af0, 110)             = -1 ENOENT (No such file or directory)
             0.150 connect(4, 0x58266b00, 110)             = 0
             0.490 connect(7, 0x58264150, 110)             = -1 ENOENT (No such file or directory)
             0.505 connect(7, 0x58264300, 110)             = -1 ENOENT (No such file or directory)
             0.832 connect(5, 0x58266220, 110)             = -1 ENOENT (No such file or directory)
             0.847 connect(5, 0x582663e0, 110)             = -1 ENOENT (No such file or directory)
             0.899 connect(5, 0x95ba0630, 16)              = 0
            25.619 connect(6, 0x58266360, 110)             = 0
            40.564 connect(6, 0x58266330, 110)             = 0
        root@localhost's password: ^C
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-624f6jxic04031tnt40va4dd@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      8bd436b0
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Postpone parsing .perfconfig trace.add_events to after --verbose is processed · 7e035929
      Arnaldo Carvalho de Melo authored
      When we add events via the '[trace]' section in perfconfig the command
      line options are not yet processed, so when something goes wrong with
      parsing those events and using --verbose is advised, we end up not
      getting any more verbosity by doing so.
      
      So just copy the trace.add_events string for later processing, after we
      processed --verbose and the other command line options.
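      
      The ordering change can be pictured with this Python sketch, which
      uses argparse and a dict as stand-ins for perf's option parser and
      the parsed .perfconfig. run() and the log messages are hypothetical.

```python
import argparse

def run(argv, config):
    """config: dict standing in for the parsed .perfconfig file."""
    # Phase 1: while reading the config, only remember the string.
    pending_events = config.get("trace.add_events")

    # Phase 2: process the command line, including --verbose.
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbose", action="count", default=0)
    opts = parser.parse_args(argv)

    # Phase 3: handle the saved events now that verbosity is known.
    log = []
    if pending_events is not None:
        for event in pending_events.split(","):
            if opts.verbose:
                log.append(f"adding event {event}")
    return opts.verbose, log
```

      With the events handled in phase 3, any diagnostics they produce can
      honour the verbosity level requested on the command line.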
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-d6wbnz85ftqljdll6ynjyjd8@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      7e035929
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Generalize the syscall_fmt find routines · bcddbfc5
      Arnaldo Carvalho de Melo authored
      To allow them to be used with other stuff, such as tracepoints.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-od3gzg77ppqgnnrxqv40fvgx@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      bcddbfc5
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Separate 'struct syscall_fmt' definition from syscall_fmts variable · 9b2036cd
      Arnaldo Carvalho de Melo authored
      As this has all the things needed to format tracepoints events, not just
      syscalls, that, after all, are just tracepoints with a set in stone ABI,
      i.e. order and number of parameters.
      
      For tracepoints we'll create a
      
        static struct syscall_fmt tracepoint_fmts[]
      
      array and will fill the ->arg[] entries with the beautifier for each
      positional argument and record the name, then, when we need it, we'll
      just check that the position has the same name, maybe even type, so that
      we can verify that the tracepoint hasn't changed; if it has, we can
      even reorder things.
      
      Keep calling it syscall_fmt but use it for tracepoints as well. Doing it
      this way minimizes changes and reuses what is in place for syscalls;
      we'll see.
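      
      The positional name check and the possible reordering could look
      roughly like this Python sketch. match_arg_fmts() and the data layout
      are hypothetical; the tracepoint_fmts[] array in C does not exist yet
      at this point in the series.

```python
def match_arg_fmts(recorded, current_names):
    """Align recorded per-argument formatters with the current fields.

    recorded: list of (name, formatter) in the order we expect.
    current_names: field names as reported by the running kernel.
    Returns one formatter per current field, None where nothing matches.
    """
    by_name = {name: fmt for name, fmt in recorded}
    aligned = []
    for pos, name in enumerate(current_names):
        if pos < len(recorded) and recorded[pos][0] == name:
            aligned.append(recorded[pos][1])    # same position, same name
        else:
            aligned.append(by_name.get(name))   # moved, or unknown field
    return aligned
```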
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-2x1jgiev13zt4njaanlnne0d@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      9b2036cd
    • Arnaldo Carvalho de Melo's avatar
      perf trace: Make evlist__set_evsel_handler() affect just entries without a handler · 206d635a
      Arnaldo Carvalho de Melo authored
      Rename it to evlist__set_default_evsel_handler(), to better reflect
      what we want to do, which is to set a default handler for events for
      which we haven't yet set a custom handler, like the ones for
      "msr:write_msr", etc., that are coming soon.
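      
      In Python terms the new semantics amount to the sketch below. The
      Evsel class is a minimal stand-in with just a name and a handler
      attribute, not the real 'struct evsel'.

```python
class Evsel:
    """Minimal stand-in for an event selector with an optional handler."""
    def __init__(self, name, handler=None):
        self.name = name
        self.handler = handler

def set_default_evsel_handler(evlist, handler):
    """Assign handler only to the entries that have none yet."""
    for evsel in evlist:
        if evsel.handler is None:
            evsel.handler = handler
```

      Custom handlers installed earlier survive the call, so the order in
      which default and custom handlers are set up stops mattering.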
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-e1bit7upnpmtsayh8039kfuw@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      206d635a
    • Arnaldo Carvalho de Melo's avatar
      perf evlist: Adopt __set_tracepoint_handlers method from perf_session · c0e53476
      Arnaldo Carvalho de Melo authored
      It all operates on the evsels in the session's evlist, so move it to the
      evlist layer to make it useful to tools not using perf_session, just
      evlists, like 'perf trace' in live mode.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-9oc53gnfi53vg82fvolkm85g@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      c0e53476