1. 22 Feb, 2019 1 commit
    • perf thread-stack: Improve thread_stack__no_call_return() · 1f35cd65
      Adrian Hunter authored
      Improve thread_stack__no_call_return() to better handle 'returns' that
      do not match the stack, i.e., 'no call' returns. See the code comments
      for details. The example below shows how retpolines are affected:
      
      Example:
      
        $ cat simple-retpoline.c
        __attribute__((noinline)) int bar(void)
        {
                return -1;
        }
      
        int foo(void)
        {
                return bar() + 1;
        }
      
        __attribute__((indirect_branch("thunk"))) int main()
        {
                int (*volatile fn)(void) = foo;
      
                fn();
                return fn();
        }
        $ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
        $ objdump -d simple-retpoline
        <SNIP>
        0000000000001040 <main>:
            1040:       48 83 ec 18             sub    $0x18,%rsp
            1044:       48 8d 05 25 01 00 00    lea    0x125(%rip),%rax        # 1170 <foo>
            104b:       48 89 44 24 08          mov    %rax,0x8(%rsp)
            1050:       48 8b 44 24 08          mov    0x8(%rsp),%rax
            1055:       e8 1f 01 00 00          callq  1179 <__x86_indirect_thunk_rax>
            105a:       48 8b 44 24 08          mov    0x8(%rsp),%rax
            105f:       48 83 c4 18             add    $0x18,%rsp
            1063:       e9 11 01 00 00          jmpq   1179 <__x86_indirect_thunk_rax>
        <SNIP>
        0000000000001160 <bar>:
            1160:       b8 ff ff ff ff          mov    $0xffffffff,%eax
            1165:       c3                      retq
        <SNIP>
        0000000000001170 <foo>:
            1170:       e8 eb ff ff ff          callq  1160 <bar>
            1175:       83 c0 01                add    $0x1,%eax
            1178:       c3                      retq
        0000000000001179 <__x86_indirect_thunk_rax>:
            1179:       e8 07 00 00 00          callq  1185 <__x86_indirect_thunk_rax+0xc>
            117e:       f3 90                   pause
            1180:       0f ae e8                lfence
            1183:       eb f9                   jmp    117e <__x86_indirect_thunk_rax+0x5>
            1185:       48 89 04 24             mov    %rax,(%rsp)
            1189:       c3                      retq
        <SNIP>
        $ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
        $ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
        2019-01-08 14:03:37.851655 Creating database...
        2019-01-08 14:03:37.863256 Writing records...
        2019-01-08 14:03:38.069750 Adding indexes
        2019-01-08 14:03:38.078799 Done
        $ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db
      
      Before:
      
          main
              -> __x86_indirect_thunk_rax
                  -> __x86_indirect_thunk_rax
                      -> __x86_indirect_thunk_rax
                          -> bar
      
      After:
      
          main
              -> __x86_indirect_thunk_rax
                  -> __x86_indirect_thunk_rax
                      -> foo
                          -> bar
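
      To make the 'no call' case concrete, here is a toy shadow call stack in
      C. It is a sketch under stated assumptions, not perf's thread-stack
      code: the frame layout, the helper names and the rule "a return that
      lands on a function entry is treated as a call into that function" are
      all hypothetical. Replaying the first fn() call with the addresses from
      the objdump output leaves the frames main, __x86_indirect_thunk_rax,
      __x86_indirect_thunk_rax+0xc, foo and bar, which has the same shape as
      the "After" tree.

```c
#include <assert.h>
#include <stdint.h>

/* Toy shadow call stack (hypothetical, NOT perf's thread-stack code). */
#define MAX_DEPTH 16

struct frame {
	uint64_t func;     /* entry address of the function for this frame */
	uint64_t ret_addr; /* expected return address, 0 if unknown */
};

struct shadow_stack {
	struct frame f[MAX_DEPTH];
	int depth;
};

static void on_call(struct shadow_stack *s, uint64_t target, uint64_t ret_addr)
{
	assert(s->depth < MAX_DEPTH);
	s->f[s->depth].func = target;
	s->f[s->depth].ret_addr = ret_addr;
	s->depth++;
}

/*
 * Handle a 'return' to address 'to'. If a frame expects that return address,
 * pop that frame and everything above it, and return 1. Otherwise this is a
 * 'no call' return: under the assumed policy of this sketch, a return that
 * lands on a function entry (what the retpoline thunk's retq does) is
 * modelled as a call into that function with an unknown return address.
 */
static int on_return(struct shadow_stack *s, uint64_t to, int to_is_func_entry)
{
	for (int i = s->depth - 1; i >= 0; i--) {
		if (s->f[i].ret_addr == to) {
			s->depth = i;
			return 1;
		}
	}
	if (to_is_func_entry)
		on_call(s, to, 0);
	return 0;
}

/* Replay the first fn() call in main() using the objdump addresses. */
static void replay_first_fn_call(struct shadow_stack *s)
{
	s->depth = 0;
	on_call(s, 0x1040, 0);      /* main (its caller is not modelled) */
	on_call(s, 0x1179, 0x105a); /* main: callq __x86_indirect_thunk_rax */
	on_call(s, 0x1185, 0x117e); /* thunk: callq __x86_indirect_thunk_rax+0xc */
	on_return(s, 0x1170, 1);    /* thunk: retq lands on foo, no matching call */
	on_call(s, 0x1160, 0x1175); /* foo: callq bar */
}
```

      With this rule, bar's return to 0x1175 and foo's return to 0x105a both
      match recorded frames, so the stack unwinds back to main. A real decoder
      has more cases to deal with, which this sketch ignores.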
      
      Committer testing:
      
      Choose "Reports", then "Context-Sensitive Call Graph", and then keep
      expanding:
      
      Before:
      
      simple-retpolin
         PID:PID
            _start
               _start
                  __libc_start_main
                     main
                         __x86_indirect_thunk_rax
                            __x86_indirect_thunk_rax
                            bar
      
      After:
      
      Remove the "simple-retpoline.db" file, rerun the 'perf script' line to
      regenerate the .db file, and run exported-sql-viewer.py again. The tree
      is the same all the way down to 'main'; from there, including 'main', it
      now reads:
      
                     main
                         __x86_indirect_thunk_rax
                             __x86_indirect_thunk_rax
                                 foo
                                     bar

      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20190109091835.5570-6-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 21 Feb, 2019 1 commit
    • perf annotate: Fix getting source line failure · 11db1ad4
      Wei Li authored
      The output of "perf annotate -l --stdio xxx" changed after commit 425859ff
      ("perf annotate: No need to calculate notes->start twice") removed the
      notes->start assignment in symbol__calc_lines(). With that assignment
      gone, a2l->addr is wrong, so find_address_in_section() fails when reached
      from symbol__tty_annotate(), and the annotate summary no longer reports
      source code line numbers correctly.
      
      Before fix:
      
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ cat common_while_1.c
        void hotspot_1(void)
        {
                volatile int i;
      
                for (i = 0; i < 0x10000000; i++);
                for (i = 0; i < 0x10000000; i++);
                for (i = 0; i < 0x10000000; i++);
        }
      
        int main(void)
        {
                hotspot_1();
      
                return 0;
        }
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ gcc common_while_1.c -g -o common_while_1
      
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.488 MB perf.data (12498 samples) ]
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio
      
        Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1
        ----------------------------------------------
      
         19.30 common_while_1[32]
         19.03 common_while_1[4e]
         19.01 common_while_1[16]
          5.04 common_while_1[13]
          4.99 common_while_1[4b]
          4.78 common_while_1[2c]
          4.77 common_while_1[10]
          4.66 common_while_1[2f]
          4.59 common_while_1[51]
          4.59 common_while_1[35]
          4.52 common_while_1[19]
          4.20 common_while_1[56]
          0.51 common_while_1[48]
         Percent |      Source code & Disassembly of common_while_1 for cycles:ppp (12480 samples, percent: local period)
        -----------------------------------------------------------------------------------------------------------------
               :
               :
               :
               :         Disassembly of section .text:
               :
               :         00000000000005fa <hotspot_1>:
               :         hotspot_1():
               :         void hotspot_1(void)
               :         {
          0.00 :   5fa:   push   %rbp
          0.00 :   5fb:   mov    %rsp,%rbp
               :                 volatile int i;
               :
               :                 for (i = 0; i < 0x10000000; i++);
          0.00 :   5fe:   movl   $0x0,-0x4(%rbp)
          0.00 :   605:   jmp    610 <hotspot_1+0x16>
          0.00 :   607:   mov    -0x4(%rbp),%eax
         common_while_1[10]    4.77 :   60a:   add    $0x1,%eax
         common_while_1[13]    5.04 :   60d:   mov    %eax,-0x4(%rbp)
         common_while_1[16]   19.01 :   610:   mov    -0x4(%rbp),%eax
         common_while_1[19]    4.52 :   613:   cmp    $0xfffffff,%eax
            0.00 :   618:   jle    607 <hotspot_1+0xd>
                 :                 for (i = 0; i < 0x10000000; i++);
        ...
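
      In the "Before fix" summary each label has the form
      common_while_1[offset], and the offsets line up with the disassembly once
      hotspot_1's start address (0x5fa) is added: 0x5fa + 0x10 = 0x60a and
      0x5fa + 0x16 = 0x610. That is consistent with the lookup address having
      been computed from a start of 0 instead of the symbol's real start. The
      C sketch below only does that arithmetic and recognizes the dso[addr]
      fallback label format seen above; the helper names are hypothetical and
      this is not perf's code.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Start of hotspot_1 in the listing above (00000000000005fa <hotspot_1>). */
#define HOTSPOT_1_START 0x5faULL

/* Hypothetical helper: address to look up for an instruction that is
 * 'offset' bytes into a symbol starting at 'start'. */
static uint64_t lookup_addr(uint64_t start, uint64_t offset)
{
	return start + offset;
}

/* Hypothetical helper: build a "dso[addr]" label in the format shown in the
 * "Before fix" summary (address in lowercase hex, no 0x prefix). */
static int fallback_label(char *buf, size_t len, const char *dso, uint64_t addr)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%s[%" PRIx64 "]", dso, addr);
}

/* Return 1 if 'label' is a dso[addr] fallback rather than file:line. */
static int is_fallback_label(const char *label, const char *dso)
{
	size_t n = strlen(dso);

	return strncmp(label, dso, n) == 0 && label[n] == '[';
}
```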
      
      After fix:
      
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1
        [ perf record: Woken up 2 times to write data ]
        [ perf record: Captured and wrote 0.488 MB perf.data (12500 samples) ]
        liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio
      
        Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1
        ----------------------------------------------
      
         33.34 common_while_1.c:5
         33.34 common_while_1.c:6
         33.32 common_while_1.c:7
         Percent |      Source code & Disassembly of common_while_1 for cycles:ppp (12482 samples, percent: local period)
        -----------------------------------------------------------------------------------------------------------------
               :
               :
               :
               :         Disassembly of section .text:
               :
               :         00000000000005fa <hotspot_1>:
               :         hotspot_1():
               :         void hotspot_1(void)
               :         {
          0.00 :   5fa:   push   %rbp
          0.00 :   5fb:   mov    %rsp,%rbp
               :                 volatile int i;
               :
               :                 for (i = 0; i < 0x10000000; i++);
          0.00 :   5fe:   movl   $0x0,-0x4(%rbp)
          0.00 :   605:   jmp    610 <hotspot_1+0x16>
          0.00 :   607:   mov    -0x4(%rbp),%eax
         common_while_1.c:5    4.70 :   60a:   add    $0x1,%eax
          4.89 :   60d:   mov    %eax,-0x4(%rbp)
         common_while_1.c:5   19.03 :   610:   mov    -0x4(%rbp),%eax
         common_while_1.c:5    4.72 :   613:   cmp    $0xfffffff,%eax
          0.00 :   618:   jle    607 <hotspot_1+0xd>
               :                 for (i = 0; i < 0x10000000; i++);
          0.00 :   61a:   movl   $0x0,-0x4(%rbp)
          0.00 :   621:   jmp    62c <hotspot_1+0x32>
          0.00 :   623:   mov    -0x4(%rbp),%eax
         common_while_1.c:6    4.54 :   626:   add    $0x1,%eax
          4.73 :   629:   mov    %eax,-0x4(%rbp)
         common_while_1.c:6   19.54 :   62c:   mov    -0x4(%rbp),%eax
         common_while_1.c:6    4.54 :   62f:   cmp    $0xfffffff,%eax
        ...

      Signed-off-by: Wei Li <liwei391@huawei.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Fixes: 425859ff ("perf annotate: No need to calculate notes->start twice")
      Link: http://lkml.kernel.org/r/20190221095716.39529-1-liwei391@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 20 Feb, 2019 6 commits
  4. 19 Feb, 2019 10 commits
  5. 15 Feb, 2019 2 commits
    • perf tests shell: Skip trace+probe_vfs_getname.sh if built without trace support · 83244772
      Tommi Rantala authored
      If perf was built without trace support, the trace+probe_vfs_getname.sh
      'perf test' entry fails:
      
        # perf trace -h
        perf: 'trace' is not a perf-command. See 'perf --help'
      
        # perf test 64
        64: Check open filename arg using perf trace + vfs_getname: FAILED!
      
      Check for trace support so that the test is skipped in that case:
      
        # perf test 64
        64: Check open filename arg using perf trace + vfs_getname: Skip
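
      The same check can be sketched in C: run 'perf trace -h' and look for
      the "is not a perf-command" message shown above. This is an illustration
      only; the helper names are hypothetical, and treating exit status 2 as
      "skip" is an assumption of this sketch.

```c
#define _POSIX_C_SOURCE 200809L /* for popen()/pclose() */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed exit status meaning "skip" (an assumption of this sketch). */
#define TEST_SKIP_STATUS 2

/* Message perf prints for an unknown subcommand, as in the output above. */
static const char NOT_A_COMMAND[] = "is not a perf-command";

/* Return 1 if the captured 'perf trace -h' output says trace is missing. */
static int trace_unsupported(const char *help_output)
{
	assert(help_output != NULL);
	return strstr(help_output, NOT_A_COMMAND) != NULL;
}

/* Run 'perf trace -h', capture stdout and stderr, and decide whether to
 * skip. Returns TEST_SKIP_STATUS to skip, 0 to run the test, -1 on error.
 * Only the start of the output is read; the message is short. */
static int check_trace_support(void)
{
	char buf[4096];
	size_t used = 0;
	FILE *p = popen("perf trace -h 2>&1", "r");

	if (!p)
		return -1;
	while (used < sizeof(buf) - 1) {
		size_t n = fread(buf + used, 1, sizeof(buf) - 1 - used, p);

		if (n == 0)
			break;
		used += n;
	}
	buf[used] = '\0';
	pclose(p);
	return trace_unsupported(buf) ? TEST_SKIP_STATUS : 0;
}
```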

      Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Hendrik Brueckner <brueckner@linux.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20190215134253.11454-1-tt.rantala@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Merge tag 'perf-core-for-mingo-5.1-20190214' of... · 43f4e627
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-5.1-20190214' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      perf list:
      
        Jiri Olsa:
      
        - Display metric expressions for --details option
      
      perf record:
      
        Alexey Budankov:
      
        - Implement the --affinity=node|cpu option. This is a leftover; the
          other patches in this kit were already applied.
      
      perf trace:
      
        Arnaldo Carvalho de Melo:
      
        - Fix segfaults due to not properly handling negative file descriptor syscall args.
      
        - Fix segfault related to the 'waitid' 'options' prefix showing logic.
      
        - Filter out 'gnome-terminal*' if it is a parent of 'perf trace', to reduce the
          syscall feedback loop in system wide sessions.
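
        As an illustration of the class of bug behind the negative file
        descriptor fix (not the actual perf change): an fd taken from a
        syscall argument can legitimately be negative, e.g. -1 or AT_FDCWD
        (-100 on Linux), so it needs a sign check before being used as an
        array index. The table type and helper below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread table mapping file descriptors to pathnames. */
struct fd_table {
	const char **paths;
	int nr;
};

/* Return the pathname recorded for 'fd', or NULL if unknown. Negative fds
 * are rejected before indexing; checking only 'fd >= nr' would let them
 * through and read out of bounds. */
static const char *fd_path(const struct fd_table *t, int fd)
{
	assert(t != NULL);
	if (fd < 0 || fd >= t->nr)
		return NULL;
	return t->paths[fd];
}
```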
      
      BPF:
      
        Song Liu:
      
        - Silence "Couldn't synthesize bpf events" warning for EPERM.
      
      Build system:
      
        Arnaldo Carvalho de Melo:
      
        - Fix the test-all.c feature detection fast path that was broken for
          quite a while leading to longer build times.
      
      Event parsing:
      
        Jiri Olsa:
      
        - Fix legacy events symbol separator parsing
      
      cs-etm:
      
        Mathieu Poirier:
      
        - Fix some error path return errors and plug some memory leaks.
      
        - Add proper header file for symbols
      
        - Remove unused structure fields.
      
        - Modularize auxtrace_buffer fetch, decoder and packet processing loop.
      
      Vendor events:
      
        Paul Clarke:
      
        - Add assorted metrics for the Power8 and Power9 architectures.
      
      perf report:
      
        Thomas Richter:
      
        - Add s390 diagnostic sampling descriptor size

      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 14 Feb, 2019 20 commits