    Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9657752c
    Linus Torvalds authored
    Pull perf updates from Ingo Molnar:
     "Kernel side changes:
    
       - Add branch type profiling/tracing support. (Jin Yao)
    
       - Add the PERF_SAMPLE_PHYS_ADDR ABI to allow the tracing/profiling of
         physical memory addresses, where the PMU supports it. (Kan Liang)
    
       - Export some PMU capability details in the new
         /sys/bus/event_source/devices/cpu/caps/ sysfs directory. (Andi
         Kleen)
    
       - Aux data fixes and updates (Will Deacon)
    
       - kprobes fixes and updates (Masami Hiramatsu)
    
       - AMD uncore PMU driver fixes and updates (Janakarajan Natarajan)
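
The caps/ sysfs directory mentioned in the list above can be inspected from user space. The following is a minimal sketch, not part of this pull: it assumes each entry under caps/ is a small text file, and the helper name read_pmu_caps is hypothetical. Which files exist depends on the CPU and the kernel version.

```python
from pathlib import Path

# Path taken from the commit text; the files inside it vary by CPU and kernel.
CAPS_DIR = Path("/sys/bus/event_source/devices/cpu/caps")


def read_pmu_caps(caps_dir=CAPS_DIR):
    """Return {file name: stripped contents} for each readable file in caps_dir.

    Hypothetical helper for illustration; returns an empty dict when the
    directory is missing (older kernel, or a PMU that exports no caps/).
    """
    caps = {}
    directory = Path(caps_dir)
    if not directory.is_dir():
        return caps
    for entry in sorted(directory.iterdir()):
        if entry.is_file():
            try:
                caps[entry.name] = entry.read_text().strip()
            except OSError:
                continue  # skip entries that cannot be read
    return caps


if __name__ == "__main__":
    for name, value in read_pmu_caps().items():
        print(f"{name}: {value}")
```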
    
      On the tooling side, here's a (limited!) list of highlights - there
      were many other changes that I could not list, see the shortlog and
      git history for details:
    
      UI improvements:
    
       - Implement a visual marker for fused x86 instructions in the
         annotate TUI browser, available now in 'perf report'; more work
         is needed to make it available in 'perf top' as well (Jin Yao)
    
         Further explanation from one of Jin's patches:
    
                 │   ┌──cmpl   $0x0,argp_program_version_hook
           81.93 │   ├──je     20
                 │   │  lock   cmpxchg %esi,0x38a9a4(%rip)
                 │   │↓ jne    29
                 │   │↓ jmp    43
           11.47 │20:└─→cmpxch %esi,0x38a999(%rip)
    
         That means the cmpl+je is a fused instruction pair and they should
         be considered together.
    
       - Record the branch type and then show statistics and info about it
         in callchain entries (Jin Yao)
    
         Example from one of Jin's patches:
    
            # perf record -g -j any,save_type
            # perf report --branch-history --stdio --no-children
    
            38.50%  div.c:45                [.] main                    div
                    |
                    ---main div.c:42 (RET CROSS_2M cycles:2)
                       compute_flag div.c:28 (cycles:2)
                       compute_flag div.c:27 (RET CROSS_2M cycles:1)
                       rand rand.c:28 (cycles:1)
                       rand rand.c:28 (RET CROSS_2M cycles:1)
                       __random random.c:298 (cycles:1)
                       __random random.c:297 (COND_BWD CROSS_2M cycles:1)
                       __random random.c:295 (cycles:1)
                       __random random.c:295 (COND_BWD CROSS_2M cycles:1)
                       __random random.c:295 (cycles:1)
                       __random random.c:295 (RET CROSS_2M cycles:9)
    
      namespaces support:
    
       - Add initial support for namespaces, using setns to access files in
         namespaces, grabbing their build-ids, etc. (Krister Johansen)
    
      perf trace enhancements:
    
       - Beautify pkey_{alloc,free,mprotect} arguments in 'perf trace'
         (Arnaldo Carvalho de Melo)
    
       - Add initial 'clone' syscall args beautifier in 'perf trace'
         (Arnaldo Carvalho de Melo)
    
       - Ignore 'fd' and 'offset' args for MAP_ANONYMOUS in 'perf trace'
         (Arnaldo Carvalho de Melo)
    
       - Beautifiers for the 'cmd' arg of several ioctl types, including:
         sound, DRM, KVM, vhost virtio and perf_events. (Arnaldo Carvalho de
         Melo)
    
       - Add PERF_SAMPLE_CALLCHAIN and PERF_RECORD_MMAP[2] to 'perf data'
         CTF conversion, allowing CTF trace visualization tools to show
         callchains and to resolve symbols (Geneviève Bastien)
    
       - Beautify the fcntl syscall, which is an interesting one in the
         sense that infrastructure had to be put in place to change the
         formatters of some arguments according to the value in a previous
         one, i.e. cmd dictates how arg and the syscall return will be
         formatted. (Arnaldo Carvalho de Melo)
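
The fcntl item above describes formatters that depend on an earlier argument. The sketch below illustrates that idea only; it is not perf's implementation. The names (FCNTL_CMDS, format_fcntl) and the output layout are hypothetical, and the numeric constants are the generic Linux values, hard-coded so the example does not depend on the host.

```python
# Generic Linux values, hard-coded for illustration.
F_DUPFD, F_GETFD, F_SETFD, F_GETFL, F_SETFL = 0, 1, 2, 3, 4
FD_CLOEXEC = 1
O_ACCMODE = 0o3
ACCESS_MODES = {0: "RDONLY", 1: "WRONLY", 2: "RDWR"}
STATUS_FLAGS = {0o2000: "APPEND", 0o4000: "NONBLOCK"}


def fmt_fd_flags(value):
    """Format file descriptor flags (F_GETFD result / F_SETFD argument)."""
    return "CLOEXEC" if value & FD_CLOEXEC else "0"


def fmt_status_flags(value):
    """Format file status flags (F_GETFL result / F_SETFL argument)."""
    mode = value & O_ACCMODE
    names = [ACCESS_MODES.get(mode, hex(mode))]
    rest = value & ~O_ACCMODE
    for bit, name in STATUS_FLAGS.items():
        if rest & bit:
            names.append(name)
            rest &= ~bit
    if rest:
        names.append(hex(rest))  # bits this sketch does not know about
    return "|".join(names)


# cmd -> (name, formatter for arg or None if unused, formatter for the return)
FCNTL_CMDS = {
    F_DUPFD: ("DUPFD", str, str),
    F_GETFD: ("GETFD", None, fmt_fd_flags),
    F_SETFD: ("SETFD", fmt_fd_flags, str),
    F_GETFL: ("GETFL", None, fmt_status_flags),
    F_SETFL: ("SETFL", fmt_status_flags, str),
}


def format_fcntl(fd, cmd, arg, ret):
    """Pick the arg and return formatters based on the value of cmd."""
    name, fmt_arg, fmt_ret = FCNTL_CMDS.get(cmd, (str(cmd), hex, str))
    args = [str(fd), name]
    if fmt_arg is not None:
        args.append(fmt_arg(arg))
    shown_ret = str(ret) if ret < 0 else fmt_ret(ret)  # errors stay numeric
    return f"fcntl({', '.join(args)}) = {shown_ret}"


if __name__ == "__main__":
    print(format_fcntl(3, F_GETFL, 0, 0o4002))
    print(format_fcntl(3, F_SETFD, FD_CLOEXEC, 0))
```

Keeping the per-cmd choice in one table means a new command only needs one new entry, and unknown commands fall back to a plain numeric rendering.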
    
      perf stat enhancements:
    
       - Use group read for event groups in 'perf stat', reducing overhead
         when groups are defined in the event specification, i.e. when using
         {} to enclose a list of events, asking them to be read at the same
         time, e.g.: "perf stat -e '{cycles,instructions}'" (Jiri Olsa)
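
For context on the group read above: with PERF_FORMAT_GROUP set in read_format, a single read() on the group leader returns the values of all events in the group. The sketch below decodes such a buffer following the layout documented in perf_event_open(2); it is an illustration and not the code 'perf stat' uses. The function name parse_group_read is hypothetical, and the buffer is assumed to be in native byte order.

```python
import struct

# read_format bits from the perf_event_open(2) ABI.
PERF_FORMAT_TOTAL_TIME_ENABLED = 1 << 0
PERF_FORMAT_TOTAL_TIME_RUNNING = 1 << 1
PERF_FORMAT_ID = 1 << 2
PERF_FORMAT_GROUP = 1 << 3


def parse_group_read(buf, read_format):
    """Decode the u64 words returned by read() on a group leader.

    Layout: nr, [time_enabled], [time_running], then nr entries of
    value, [id]. Bracketed fields are present only when the matching
    read_format bit is set.
    """
    if not read_format & PERF_FORMAT_GROUP:
        raise ValueError("PERF_FORMAT_GROUP must be set for a group read")
    count = len(buf) // 8
    words = iter(struct.unpack(f"={count}Q", buf[: count * 8]))
    result = {"nr": next(words)}
    if read_format & PERF_FORMAT_TOTAL_TIME_ENABLED:
        result["time_enabled"] = next(words)
    if read_format & PERF_FORMAT_TOTAL_TIME_RUNNING:
        result["time_running"] = next(words)
    counters = []
    for _ in range(result["nr"]):
        entry = {"value": next(words)}
        if read_format & PERF_FORMAT_ID:
            entry["id"] = next(words)
        counters.append(entry)
    result["counters"] = counters
    return result


if __name__ == "__main__":
    # Synthetic buffer standing in for a read() on a two-event group.
    fmt = (PERF_FORMAT_GROUP | PERF_FORMAT_ID |
           PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING)
    sample = struct.pack("=7Q", 2, 1000, 900, 123456, 11, 654321, 12)
    print(parse_group_read(sample, fmt))
```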
    
      pipe mode improvements:
    
       - Process tracing data in 'perf annotate' pipe mode (David
         Carrillo-Cisneros)
    
       - Add header record types to pipe mode, so that this command:
    
            $ perf record -o - -e cycles sleep 1 | perf report --stdio --header
    
         shows the same output as in non-pipe mode, i.e. when a perf.data
         file is involved (David Carrillo-Cisneros)
    
      Vendor specific hardware event support updates/enhancements:
    
       - Update POWER9 vendor events tables (Sukadev Bhattiprolu)
    
       - Add POWER9 PMU events (Sukadev Bhattiprolu)
    
       - Support additional POWER8+ PVR in PMU mapfile (Shriya)
    
       - Add Skylake server uncore JSON vendor events (Andi Kleen)
    
       - Support exporting Intel PT data to sqlite3 with python perf
         scripts, this is in addition to the postgresql support that was
         already there (Adrian Hunter)"
    
    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (253 commits)
      perf symbols: Fix plt entry calculation for ARM and AARCH64
      perf probe: Fix kprobe blacklist checking condition
      perf/x86: Fix caps/ for !Intel
      perf/core, x86: Add PERF_SAMPLE_PHYS_ADDR
      perf/core, pt, bts: Get rid of itrace_started
      perf trace beauty: Beautify pkey_{alloc,free,mprotect} arguments
      tools headers: Sync cpu features kernel ABI headers with tooling headers
      perf tools: Pass full path of FEATURES_DUMP
      perf tools: Robustify detection of clang binary
      tools lib: Allow external definition of CC, AR and LD
      perf tools: Allow external definition of flex and bison binary names
      tools build tests: Don't hardcode gcc name
      perf report: Group stat values on global event id
      perf values: Zero value buffers
      perf values: Fix allocation check
      perf values: Fix thread index bug
      perf report: Add dump_read function
      perf record: Set read_format for inherit_stat
      perf c2c: Fix remote HITM detection for Skylake
      perf tools: Fix static build with newer toolchains
      ...