1. 08 Apr, 2015 10 commits
    • perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task · 1aff59be
      Yunlong Song authored
      wait_for_tasks() calls sem_wait for each task, e.g.
      sem_wait(&task->work_done_sem).

      The sem_wait can continue only when work_done_sem is greater than 0;
      otherwise it blocks.

      For perf sched replay, one task may sem_post the work_done_sem of
      another task, so that the work_done_sem of that task is processed in a
      reasonable sequence, e.g. sem_post, sem_wait, sem_wait, sem_post...
      
      This sequence simulates the sched process of the running tasks at the
      time when perf sched record runs.
      
      As a result, all the tasks are required and their threads must be
      successfully created.
      
      If any one of the tasks (task A) fails to create its thread, then
      another task (task B), whose work_done_sem needs a sem_post from the
      failed task A, is likely to block in sem_wait.

      This is a dead halt, since task B's thread_func cannot continue at
      all.
      
      To solve this problem, perf sched replay should exit once any task fails
      to create its thread.
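The failure mode and the fix can be sketched in a few lines of C. This is a minimal illustration, not the code in builtin-sched.c: `struct task`, `peer_sem`, `replay()` and the `fail_at` knob (which simulates a thread that cannot be created) are all hypothetical names introduced here. Each task posts the semaphore of its neighbour, so a missing task would leave one semaphore at 0 forever; the sketch therefore skips the wait phase as soon as creation fails.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define NR_TASKS 4

struct task {
	pthread_t thread;
	sem_t work_done_sem;	/* posted by a peer task, waited on by main */
	sem_t *peer_sem;	/* semaphore of the task this one wakes up */
	int created;
};

static void *thread_func(void *arg)
{
	struct task *t = arg;

	sem_post(t->peer_sem);
	return NULL;
}

/* fail_at simulates a thread creation failure for that task index (-1: none). */
static int create_tasks(struct task *tasks, int nr, int fail_at)
{
	int i;

	/* Initialise every semaphore first so early threads can post safely. */
	for (i = 0; i < nr; i++) {
		sem_init(&tasks[i].work_done_sem, 0, 0);
		tasks[i].peer_sem = &tasks[(i + 1) % nr].work_done_sem;
		tasks[i].created = 0;
	}
	for (i = 0; i < nr; i++) {
		if (i == fail_at ||
		    pthread_create(&tasks[i].thread, NULL, thread_func, &tasks[i]) != 0)
			return -1;
		tasks[i].created = 1;
	}
	return 0;
}

static int replay(int fail_at)
{
	struct task tasks[NR_TASKS];
	int i, err;

	err = create_tasks(tasks, NR_TASKS, fail_at);
	if (err) {
		/* Waiting now could block forever: some semaphore is never posted. */
		fprintf(stderr, "Error: could not create every task thread\n");
	} else {
		for (i = 0; i < NR_TASKS; i++)
			sem_wait(&tasks[i].work_done_sem);
	}

	for (i = 0; i < NR_TASKS; i++) {
		if (tasks[i].created)
			pthread_join(tasks[i].thread, NULL);
		sem_destroy(&tasks[i].work_done_sem);
	}
	return err;
}
```

Calling `replay(-1)` runs all tasks to completion, while `replay(2)` reports the error and returns without waiting on any semaphore.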
      
      Example:
      
      Test environment: x86_64 with 160 cores
      
      Before this patch:
      
       $ perf sched replay
       ...
       Error: sys_perf_event_open() syscall returned with -1 (Too many open
       files)
       ------------------------------------------------------------    <- dead halt
      
      After this patch:
      
       $ perf sched replay
       ...
       task   1551 (           <unknown>:         0), nr_events: 10
       Error: sys_perf_event_open() syscall returned with -1 (Too many open
       files)
       $
      
      As shown above, perf sched replay finishes the process after printing an
      error message and does not block itself.
      Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1427809596-29559-7-git-send-email-yunlong.song@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf sched replay: Fix the segmentation fault problem caused by pr_err in threads · 08097abc
      Yunlong Song authored
      The pr_err in self_open_counters() prints an error message to stderr.
      Unlike stdout, stderr uses a memory buffer on the stack of each calling
      process.
      
      The pr_err in self_open_counters() works in a thread called thread_func
      created in function create_tasks, which concurrently creates
      sched->nr_tasks threads.
      
      If the error happens and pr_err prints the error message in each of
      these threads, the stack of the perf process (8192 Kbytes by default)
      quickly runs out and a segmentation fault follows.
      
      To solve this problem, pr_err with self_open_counters() should be moved
      from newly created threads to the old main thread of the perf process.
      Then the pr_err can work in a stable situation without the strange
      segmentation fault problem.
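One way to picture the change is the sketch below, which performs the fallible step in the main thread and hands only the result to each worker. It is an illustration under assumptions: `open_counter()` is a hypothetical stand-in for self_open_counters(), `fail_at` simulates the failure, and the fd values are made up.

```c
#include <pthread.h>
#include <stdio.h>

#define NR_TASKS 4

struct task {
	pthread_t thread;
	int fd;	/* stand-in for the counter fd opened for this task */
};

/* Hypothetical stand-in for self_open_counters(); fails for task fail_at. */
static int open_counter(int task_nr, int fail_at)
{
	return task_nr == fail_at ? -1 : 100 + task_nr;
}

static void *thread_func(void *arg)
{
	struct task *t = arg;

	/* The worker only consumes the fd; it never reports an open error. */
	(void)t->fd;
	return NULL;
}

static int create_tasks(int fail_at)
{
	struct task tasks[NR_TASKS];
	int i, created = 0, err = 0;

	for (i = 0; i < NR_TASKS; i++) {
		/* Open in the main thread so the error is printed from here only. */
		tasks[i].fd = open_counter(i, fail_at);
		if (tasks[i].fd < 0) {
			fprintf(stderr, "Error: could not open counter for task %d\n", i);
			err = -1;
			break;
		}
		if (pthread_create(&tasks[i].thread, NULL, thread_func, &tasks[i]) != 0) {
			err = -1;
			break;
		}
		created++;
	}

	for (i = 0; i < created; i++)
		pthread_join(tasks[i].thread, NULL);
	return err;
}
```

With this layout at most one error message is printed, and it comes from the main thread regardless of how many worker threads exist.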
      
      Example:
      
      Test environment: x86_64 with 160 cores
      
      Before this patch:
      
       $ perf sched replay
       ...
       task   1549 (             :163132:    163132), nr_events: 1
       task   1550 (             :163540:    163540), nr_events: 1
       task   1551 (           <unknown>:         0), nr_events: 10
       Segmentation fault
      
      After this patch:
      
       $ perf sched replay
       ...
       task   1549 (             :163132:    163132), nr_events: 1
       task   1550 (             :163540:    163540), nr_events: 1
       task   1551 (           <unknown>:         0), nr_events: 10
       ...
      
      As shown above, the result continues without any segmentation fault.
      Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1427809596-29559-6-git-send-email-yunlong.song@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations · 3a423a5c
      Yunlong Song authored
      
      Although the memory of pid_to_task can be allocated via calloc according
      to the value of /proc/sys/kernel/pid_max, it cannot handle the case when
      pid_max is changed after 'perf sched record' has created its perf.data.
      
      If the new pid_max configured in 'perf sched replay' is smaller than the
      old pid_max configured in 'perf sched record', then an assertion
      failure occurs.
      
      To solve this problem, we realloc the memory of pid_to_task stepwise
      once the passed-in pid parameter in register_pid is larger than the
      current pid_max.
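The stepwise growth described above can be sketched as follows. This is a simplified illustration: `struct pid_table` and `pid_table_reserve()` are hypothetical names, and the real code keeps the equivalent state elsewhere. The table is grown to exactly one slot past the requested pid and the new tail is zeroed, so lookups of pids that were never registered still return NULL.

```c
#include <stdlib.h>
#include <string.h>

struct task_desc {
	unsigned long pid;
};

/* Hypothetical container for the lookup table and its current size. */
struct pid_table {
	struct task_desc **pid_to_task;
	unsigned long pid_max;	/* number of slots currently allocated */
};

/*
 * Make sure slot `pid` exists, growing the table to pid + 1 slots when
 * needed and zeroing the new tail. Returns 0 on success, -1 on ENOMEM.
 */
static int pid_table_reserve(struct pid_table *t, unsigned long pid)
{
	struct task_desc **grown;
	unsigned long new_max;

	if (pid < t->pid_max)
		return 0;

	new_max = pid + 1;
	grown = realloc(t->pid_to_task, new_max * sizeof(*grown));
	if (!grown)
		return -1;

	memset(grown + t->pid_max, 0, (new_max - t->pid_max) * sizeof(*grown));
	t->pid_to_task = grown;
	t->pid_max = new_max;
	return 0;
}
```

Starting from an empty table, reserving pid 4999 and then pid 163839 grows the table twice, and a later request for a smaller pid leaves it untouched.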
      
      Example:
      
      Test environment: x86_64 with 160 cores
      
       $ cat /proc/sys/kernel/pid_max
       163840
       $ perf sched record ls
       $ echo 5000 > /proc/sys/kernel/pid_max
       $ cat /proc/sys/kernel/pid_max
       5000
      
      Before this patch:
      
       $ perf sched replay
       run measurement overhead: 221 nsecs
       sleep measurement overhead: 55356 nsecs
       the run test took 1000011 nsecs
       the sleep test took 1060940 nsecs
       perf: builtin-sched.c:337: register_pid: Assertion `!(pid >= (unsigned
       long)pid_max)' failed.
       Aborted
      
      After this patch:
      
       $ perf sched replay
       run measurement overhead: 221 nsecs
       sleep measurement overhead: 55611 nsecs
       the run test took 1000026 nsecs
       the sleep test took 1060486 nsecs
       nr_run_events:        10
       nr_sleep_events:      1562
       nr_wakeup_events:     5
       task      0 (                  :1:         1), nr_events: 1
       task      1 (                  :2:         2), nr_events: 1
       task      2 (                  :3:         3), nr_events: 1
       task      3 (                  :5:         5), nr_events: 1
       ...
      Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1427809596-29559-5-git-send-email-yunlong.song@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max · cb06ac25
      Yunlong Song authored
      
      struct task_desc *pid_to_task[MAX_PID] is currently allocated as a
      fixed-size array with a preset size, which has two problems:
      
      Problem 1: If the pid_max, which is the max number of pids in the
      system, is much smaller than MAX_PID (1024*1000), then it causes a waste
      of stack memory. This may happen in the case where the number of cpu
      cores is much smaller than 1000.
      
      Problem 2: If the pid_max is changed from the default value to a value
      larger than MAX_PID, then an assertion failure occurs. The maximum
      value of pid_max can be set to pid_max_max (see pidmap_init defined in
      kernel/pid.c), which equals PID_MAX_LIMIT. On x86_64, PID_MAX_LIMIT is
      4*1024*1024 (defined in include/linux/threads.h). This value is much
      larger than MAX_PID, and pid_to_task would then take up 32768 Kbytes
      (4*1024*1024*8/1024), which is much larger than the default 8192
      Kbytes stack size of the calling process.
      
      Due to these two problems, we use calloc to allocate the memory of
      pid_to_task dynamically.
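The size arithmetic in Problem 2 and the heap allocation can be sketched as below. The helper names (`table_kbytes`, `read_pid_max`, `alloc_pid_to_task`) are hypothetical and chosen for this illustration; the pointer size of 8 bytes is the x86_64 value used in the text.

```c
#include <stdio.h>
#include <stdlib.h>

struct task_desc;

/* Size in Kbytes of a pointer table with nr_pids slots of ptr_size bytes. */
static unsigned long table_kbytes(unsigned long nr_pids, unsigned long ptr_size)
{
	return nr_pids * ptr_size / 1024;
}

/* Read pid_max from a procfs-style file; returns -1 if it cannot be read. */
static long read_pid_max(const char *path)
{
	FILE *fp = fopen(path, "r");
	long pid_max = -1;

	if (!fp)
		return -1;
	if (fscanf(fp, "%ld", &pid_max) != 1)
		pid_max = -1;
	fclose(fp);
	return pid_max;
}

/* Heap-allocate a zeroed table with one slot per possible pid. */
static struct task_desc **alloc_pid_to_task(long pid_max)
{
	if (pid_max <= 0)
		return NULL;
	return calloc((size_t)pid_max, sizeof(struct task_desc *));
}
```

A caller would typically pass "/proc/sys/kernel/pid_max" to `read_pid_max()` and feed the result to `alloc_pid_to_task()`. `table_kbytes(4UL * 1024 * 1024, 8)` reproduces the 32768 Kbytes figure quoted above.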
      
      Example:
      
      Test environment: x86_64 with 160 cores
      
       $ cat /proc/sys/kernel/pid_max
       163840
       $ echo 1025000 > /proc/sys/kernel/pid_max
       $ cat /proc/sys/kernel/pid_max
       1025000
      
      Run some applications until the pid of some process is greater than
      the value of MAX_PID (1024*1000).
      
      Before this patch:
      
       $ perf sched replay
       run measurement overhead: 221 nsecs
       sleep measurement overhead: 55480 nsecs
       the run test took 1000008 nsecs
       the sleep test took 1063151 nsecs
       perf: builtin-sched.c:330: register_pid: Assertion `!(pid >= 1024000)'
       failed.
       Aborted
      
      After this patch:
      
       $ perf sched replay
       run measurement overhead: 221 nsecs
       sleep measurement overhead: 55435 nsecs
       the run test took 1000004 nsecs
       the sleep test took 1059312 nsecs
       nr_run_events:        10
       nr_sleep_events:      1562
       nr_wakeup_events:     5
       task      0 (                  :1:         1), nr_events: 1
       task      1 (                  :2:         2), nr_events: 1
       task      2 (                  :3:         3), nr_events: 1
       task      3 (                  :5:         5), nr_events: 1
       ...
      Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1427809596-29559-4-git-send-email-yunlong.song@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf sched replay: Increase the MAX_PID value to fix assertion failure problem · a35e27d0
      Yunlong Song authored
      The current MAX_PID is only 65536, which causes an assertion failure
      on x86_64 when there are more than 64 CPU cores.
      
      This is because the pid_max value in x86_64 is at least
      PIDS_PER_CPU_DEFAULT * num_possible_cpus() (see function pidmap_init
      defined in kernel/pid.c), where PIDS_PER_CPU_DEFAULT is 1024 (defined in
      include/linux/threads.h).
      
      Thus for MAX_PID = 65536, the corresponding number of CPU cores is
      65536/1024=64.  This is clearly not enough for x86_64, and causes an
      assertion failure due to BUG_ON(pid >= MAX_PID) in the code.
      
      We increase MAX_PID value from 65536 to 1024*1000, which can be used in
      x86_64 with 1000 cores.
      
      This number is finally decided according to the limitation of stack size
      of calling process.
      
      'ulimit -a' shows that the stack size of any process is 8192
      Kbytes, which is defined in include/uapi/linux/resource.h (#define
      _STK_LIM (8*1024*1024)).

      Thus we choose a large enough value for MAX_PID that still satisfies
      the stack size limit, i.e., one that makes the perf process take up
      a memory space just smaller than 8192 Kbytes.
      
      We have calculated and tested that 1024*1000 is OK for MAX_PID.
      
      This means perf sched replay can now be used with at most 1000 cores in
      x86_64 without any assertion failure problem.
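The relation between core count and MAX_PID used above is plain arithmetic, sketched here with hypothetical helper names. Only the value of PIDS_PER_CPU_DEFAULT is taken from the text; the functions are illustrative.

```c
#define PIDS_PER_CPU_DEFAULT 1024	/* value quoted above from include/linux/threads.h */

/* Lower bound on pid_max for a machine with nr_cpus possible CPUs. */
static unsigned long min_pid_max(unsigned long nr_cpus)
{
	return PIDS_PER_CPU_DEFAULT * nr_cpus;
}

/* Largest CPU count whose minimum pid_max does not exceed max_pid. */
static unsigned long max_cpus_for(unsigned long max_pid)
{
	return max_pid / PIDS_PER_CPU_DEFAULT;
}
```

For the 160-core test machine, `min_pid_max(160)` gives 163840, which is above the old limit of 65536 and below the new one of 1024*1000.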
      
      Example:
      
      Test environment: x86_64 with 160 cores
      
       $ cat /proc/sys/kernel/pid_max
       163840
      
      Before this patch:
      
       $ perf sched replay
       run measurement overhead: 240 nsecs
       sleep measurement overhead: 55379 nsecs
       the run test took 1000004 nsecs
       the sleep test took 1059424 nsecs
       perf: builtin-sched.c:330: register_pid: Assertion `!(pid >= 65536)'
       failed.
       Aborted
      
      After this patch:
      
       $ perf sched replay
       run measurement overhead: 221 nsecs
       sleep measurement overhead: 55397 nsecs
       the run test took 999920 nsecs
       the sleep test took 1053313 nsecs
       nr_run_events:        10
       nr_sleep_events:      1562
       nr_wakeup_events:     5
       task      0 (                  :1:         1), nr_events: 1
       task      1 (                  :2:         2), nr_events: 1
       task      2 (                  :3:         3), nr_events: 1
       task      3 (                  :5:         5), nr_events: 1
       ...
      Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1427809596-29559-3-git-send-email-yunlong.song@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf sched replay: Use struct task_desc instead of struct task_task for correct meaning · 0755bc4d
      Yunlong Song authored
      There is no struct task_task at all; it is a typo in the old
      commits. Fix it to struct task_desc in order to avoid unnecessary
      misunderstanding.
      Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1427809596-29559-2-git-send-email-yunlong.song@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf kmem: Respect -i option · 28939e1a
      Jiri Olsa authored
      Currently perf kmem does not respect the -i option.

      Initialize file.path properly after the options get parsed.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/1428298576-9785-2-git-send-email-namhyung@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools lib traceevent: Honor operator priority · 3201f0dc
      Namhyung Kim authored
      Currently it ignores operator priority and just sets the processed args
      as the right operand.  But this can result in priority inversion when
      the right operand is also an operator arg and its priority is lower.

      For example, the following print format is from the new kmem events.
      
        "page=%p", REC->pfn != -1UL ? (((struct page *)(0xffffea0000000000UL)) + (REC->pfn)) : ((void *)0)
      
      But this was treated as below:
      
        REC->pfn != ((null - 1UL) ? ((struct page *)0xffffea0000000000UL + REC->pfn) : (void *) 0)
      
      In this case, the right arg was the '?' operator, which has lower
      priority. But it just set the whole arg, making the output confusing:
      page was always 0 or 1 since that's the result of a logical operation.

      With this patch, it is handled properly, as follows:
      
        ((REC->pfn != (null - 1UL)) ? ((struct page *)0xffffea0000000000UL + REC->pfn) : (void *) 0)
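The effect of the two groupings can be checked with a small C sketch. It is a simplification made for illustration: the pointer arithmetic on struct page is replaced by plain integer addition, and the function names are hypothetical.

```c
/* Grouping before the fix: '!=' takes the whole '?:' as its right operand. */
static unsigned long long grouped_wrong(unsigned long long pfn,
					unsigned long long base)
{
	return pfn != ((-1ULL) ? (base + pfn) : 0ULL);
}

/* Grouping that follows C precedence: '!=' binds tighter than '?:'. */
static unsigned long long grouped_right(unsigned long long pfn,
					unsigned long long base)
{
	return (pfn != -1ULL) ? (base + pfn) : 0ULL;
}
```

With a non-zero base, `grouped_wrong()` can only return 0 or 1, while `grouped_right()` returns the computed address, or 0 when pfn is -1.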
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/1428298576-9785-10-git-send-email-namhyung@kernel.org
      [ Replaced 'swap' with 'rotate' in a comment as requested by Steve and agreed by Namhyung ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf kmaps: Check kmaps to make code more robust · ba92732e
      Wang Nan authored
      This patch adds checks in places where map__kmap is used to get kmaps
      from struct kmap.

      Error messages are added to map__kmap to warn about invalid accesses
      to kmap (in the case of !map->dso->kernel, kmap(map) does not exist
      at all).

      Also, introduce map__kmaps() to warn about uninitialized kmaps.
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Cc: pi3orama@163.com
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/r/1428394966-131044-2-git-send-email-wangnan0@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evlist: Fix inverted logic in perf_mmap__empty · 8ea92ceb
      He Kuang authored
      perf_evlist__mmap_consume() uses perf_mmap__empty() to judge whether
      perf_mmap is empty and can be released. But the result is inverted so
      fix it.
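The kind of inversion described can be illustrated with a simplified ring buffer. This sketch assumes that emptiness is decided by comparing a producer position with a consumer position; `struct ring` and both function names are hypothetical and do not reproduce struct perf_mmap.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a memory-mapped ring buffer. */
struct ring {
	unsigned long head;	/* position the producer has written up to */
	unsigned long prev;	/* position the consumer has read up to */
};

/* Inverted version: reports "empty" while unread data remains. */
static bool ring_empty_inverted(const struct ring *r)
{
	return r->head != r->prev;
}

/* Corrected version: empty means the consumer has caught up. */
static bool ring_empty(const struct ring *r)
{
	return r->head == r->prev;
}
```

A caller that releases the buffer when it is "empty" would, with the inverted test, release it exactly when data is still pending.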
      Signed-off-by: He Kuang <hekuang@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1428399071-7141-1-git-send-email-hekuang@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 03 Apr, 2015 1 commit
    • Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core · 6645f318
      Ingo Molnar authored
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      User visible changes:
      
        - Support unnamed union/structure members data collection in 'perf probe'. (Masami Hiramatsu)
      
        - Support missing -f to override perf.data file ownership. (Yunlong Song)
      
      Infrastructure changes:
      
        - No need to lookup thread twice when processing samples in 'perf script'. (Arnaldo Carvalho de Melo)
      
        - No need to pass thread twice to the scripting callbacks. (Arnaldo Carvalho de Melo)
      
        - No need to pass thread twice to the db-export facility. (Arnaldo Carvalho de Melo)
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 02 Apr, 2015 29 commits
    • perf data: Support using -f to override perf.data file ownership for 'convert' · bd05954b
      Yunlong Song authored
      Enable perf data convert to use perf.data when it is not owned by
      current user or root.
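The ownership rule that -f overrides can be sketched as a single predicate. This is an illustration under assumptions, not the check as implemented in perf: `may_read_data_file()` is a hypothetical name, and in a real tool the uids would come from stat() and geteuid().

```c
#include <stdbool.h>
#include <sys/types.h>

/*
 * Decide whether a data file may be read. file_uid is the owner of the
 * file, cur_uid the effective uid of the caller, force the -f flag.
 */
static bool may_read_data_file(uid_t file_uid, uid_t cur_uid, bool force)
{
	if (force)
		return true;
	/* Files owned by root (uid 0) or by the caller are always accepted. */
	return file_uid == 0 || file_uid == cur_uid;
}
```

This also covers the case where root reads a file owned by another user: the file is refused unless `force` is set.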
      
      Example:
      
       # perf record ls
       # chown Yunlong.Song:Yunlong.Song perf.data
       # ls -al perf.data
       -rw------- 1 Yunlong.Song Yunlong.Song 28260 Apr  2 17:35 perf.data
       # id
       uid=0(root) gid=0(root) groups=0(root),64(pkcs11)
      
      Before this patch:
      
       # perf data convert --to-ctf=./ctf-data/
       File perf.data not owned by current user or root (use -f to override)
       # perf data convert --to-ctf=./ctf-data/ -f
         Error: unknown switch `f'
      
        usage: perf data convert [<options>]
      
           -v, --verbose         be more verbose
           -i, --input <file>    input file name
               --to-ctf ...      Convert to CTF format
      
      After this patch:
      
       # perf data convert --to-ctf=./ctf-data/
       File perf.data not owned by current user or root (use -f to override)
       # perf data convert --to-ctf=./ctf-data/ -f
       # ls ctf-data/
       metadata  perf_stream_0
      
      As shown above, the -f option really works now.
      Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1427982439-27388-11-git-send-email-yunlong.song@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Support using -f to override perf.data file ownership · e366a6d8
      Yunlong Song authored
      Enable perf trace to use perf.data when it is not owned by current user
      or root.
      
      Example:
      
       # perf trace record ls
       # chown Yunlong.Song:Yunlong.Song perf.data
       # ls -al perf.data
       -rw------- 1 Yunlong.Song Yunlong.Song 4153101 Apr  2 15:28 perf.data
       # id
       uid=0(root) gid=0(root) groups=0(root),64(pkcs11)
      
      Before this patch:
      
       # perf trace -i perf.data
       File perf.data not owned by current user or root (use -f to override)
       # perf trace -i perf.data -f
         Error: unknown switch `f'
      
        usage: perf trace [<options>] [<command>]
           or: perf trace [<options>] -- <command> [<options>]
           or: perf trace record [<options>] [<command>]
           or: perf trace record [<options>] -- <command> [<options>]
      
               --event <event>   event selector. use 'perf list' to list
                                 available events
               --comm            show the thread COMM next to its id
               --tool_stats      show tool stats
           -e, --expr <expr>     list of events to trace
           -o, --output <file>   output file name
           -i, --input <file>    Analyze events in file
           -p, --pid <pid>       trace events on existing process id
           -t, --tid <tid>       trace events on existing thread id
               --filter-pids <float>
        ...
      
      As shown above, the -f option does not work at all.
      
      After this patch:
      
       # perf trace -i perf.data
       File perf.data not owned by current user or root (use -f to override)
       # perf trace -i perf.data -f
       0.056 ( 0.002 ms): ls/47325 brk(                                 ...
       0.108 ( 0.018 ms): ls/47325 mmap(len: 4096, prot: READ|WRITE,    ...
       0.145 ( 0.013 ms): ls/47325 access(filename: 0x7f31259a0eb0,     ...
       0.172 ( 0.008 ms): ls/47325 open(filename: 0x7fffeb9a0d00,       ...
       0.180 ( 0.004 ms): ls/47325 stat(filename: 0x7fffeb9a0d00,       ...
       0.185 ( 0.004 ms): ls/47325 open(filename: 0x7fffeb9a0d00,       ...
       0.189 ( 0.003 ms): ls/47325 stat(filename: 0x7fffeb9a0d00,       ...
       0.195 ( 0.004 ms): ls/47325 open(filename: 0x7fffeb9a0d00,       ...
       0.199 ( 0.002 ms): ls/47325 stat(filename: 0x7fffeb9a0d00,       ...
       0.205 ( 0.004 ms): ls/47325 open(filename: 0x7fffeb9a0d00,       ...
       0.211 ( 0.004 ms): ls/47325 stat(filename: 0x7fffeb9a0d00,       ...
       0.220 ( 0.007 ms): ls/47325 open(filename: 0x7f312599e8ff,       ...
       ...
       ...
      
      As shown above, the -f option really works now.
      Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1427982439-27388-10-git-send-email-yunlong.song@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf timechart: Support using -f to override perf.data file ownership · 44f7e432
      Yunlong Song authored
      Enable perf timechart to use perf.data when it is not owned by current
      user or root.
      
      Example:
      
       # perf timechart record ls
       # chown Yunlong.Song:Yunlong.Song perf.data
       # ls -al perf.data
       -rw------- 1 Yunlong.Song Yunlong.Song 5471744 Apr  2 15:15 perf.data
       # id
       uid=0(root) gid=0(root) groups=0(root),64(pkcs11)
      
      Before this patch:
      
       # perf timechart
       File perf.data not owned by current user or root (use -f to override)
       # perf timechart -f
         Error: unknown switch `f'
      
        usage: perf timechart [<options>] {record}
      
           -i, --input <file>    input file name
           -o, --output <file>   output file name
           -w, --width <n>       page width
               --highlight <duration or task name>
                                 highlight tasks. Pass duration in ns or process name.
           -P, --power-only      output power data only
           -T, --tasks-only      output processes data only
           -p, --process <process>
                                 process selector. Pass a pid or process name.
               --symfs <directory>
                                 Look for files with symbols relative to this directory
           -n, --proc-num <n>    min. number of tasks to print
           -t, --topology        sort CPUs according to topology
               --io-skip-eagain  skip EAGAIN errors
               --io-min-time <time>
                                 all IO faster than min-time will visually appear longer
               --io-merge-dist <time>
                                 merge events that are merge-dist us apart
      
      As shown above, the -f option does not work at all.
      
      After this patch:
      
       # perf timechart
       File perf.data not owned by current user or root (use -f to override)
       # perf timechart -f
       Written 0.0 seconds of trace to output.svg.
       # cat output.svg
       <?xml version="1.0" standalone="no"?>
       <!DOCTYPE svg SYSTEM "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
       <svg width="1000" height="10110" version="1.1" xmlns="http://www.w3.org/2000/svg">
       <defs>
         <style type="text/css">
           <![CDATA[
             rect          { stroke-width: 1; }
       ...
       ...
      
      As shown above, the -f option really works now.
      Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1427982439-27388-9-git-send-email-yunlong.song@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf script: Support using -f to override perf.data file ownership · 06af0f2c
      Yunlong Song authored
      Enable perf script to use perf.data when it is not owned by current user
      or root. Change the short option name of --fields to -F to avoid confusion
      with --force.
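The short-option split can be sketched with getopt_long(). Note the assumption: perf uses its own option parser, so getopt_long() is swapped in here only as a familiar stand-in, and `struct script_opts` and `parse_script_opts()` are hypothetical names.

```c
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

struct script_opts {
	bool force;		/* -f / --force */
	const char *fields;	/* -F / --fields <str> */
};

/* Returns 0 on success, -1 on an unknown switch or a missing value. */
static int parse_script_opts(int argc, char **argv, struct script_opts *o)
{
	static const struct option longopts[] = {
		{ "force",  no_argument,       NULL, 'f' },
		{ "fields", required_argument, NULL, 'F' },
		{ NULL, 0, NULL, 0 },
	};
	int c;

	o->force = false;
	o->fields = NULL;
	optind = 1;	/* restart scanning so the function can be called again */
	opterr = 0;	/* keep getopt quiet; errors go through the return value */

	while ((c = getopt_long(argc, argv, "fF:", longopts, NULL)) != -1) {
		switch (c) {
		case 'f':
			o->force = true;
			break;
		case 'F':
			o->fields = optarg;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

In the option string "fF:", `f` takes no value while `F` requires one, so `-f` alone no longer triggers a "requires a value" error.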
      
      Example:
      
       # perf record ls
       # chown Yunlong.Song:Yunlong.Song perf.data
       # ls -al perf.data
       -rw------- 1 Yunlong.Song Yunlong.Song 28360 Apr  2 14:53 perf.data
       # id
       uid=0(root) gid=0(root) groups=0(root),64(pkcs11)
      
      Before this patch:
      
       # perf script
       File perf.data not owned by current user or root (use -f to override)
       # perf script -f
         Error: switch `f' requires a value
      
        usage: perf script [<options>]
           or: perf script [<options>] record <script> [<record-options>] <command>
           or: perf script [<options>] report <script> [script-args]
           or: perf script [<options>] <script> [<record-options>] <command>
           or: perf script [<options>] <top-script> [script-args]
      
           -f, --fields <str>    comma separated output fields prepend with
           'type:'. Valid types: hw,sw,trace,raw. Fields:
           comm,tid,pid,time,cpu,event,trace,ip,sym,dso,addr,symoff,period
      
      As shown above, the -f option does not work at all. Moreover, -f is
      already taken by --fields, which makes it easy to confuse with
      --force, so change the short option name of --fields to -F like other
      perf commands do (e.g. perf report -F) and use -f as the short option
      name of --force.
      
      After this patch:
      
       # perf script
       File perf.data not owned by current user or root (use -f to override)
       # perf script -f
       :41298 41298 2590086.564226:          1 cycles:  ffffffff8103efc6
       native_write_msr_safe ([kernel.kallsyms])
       :41298 41298 2590086.564244:          1 cycles:  ffffffff8103efc6
       native_write_msr_safe ([kernel.kallsyms])
       :41298 41298 2590086.564249:          7 cycles:  ffffffff8103efc6
       native_write_msr_safe ([kernel.kallsyms])
       :41298 41298 2590086.564255:        176 cycles:  ffffffff8103efc6
       native_write_msr_safe ([kernel.kallsyms])
           ls 41298 2590086.567346:       4059 cycles:  ffffffff8105a592
           raise_softirq ([kernel.kallsyms])
           ls 41298 2590086.567353:       3717 cycles:  ffffffff8105a592
           raise_softirq ([kernel.kallsyms])
           ls 41298 2590086.567358:      63058 cycles:  ffffffff8105a592
           raise_softirq ([kernel.kallsyms])
           ls 41298 2590086.567448:    1706255 cycles:            406ae0
           [unknown] (/usr/bin/ls)
      
      As shown above, the -f option really works now.
      Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1427982439-27388-8-git-send-email-yunlong.song@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf mem: Support using -f to override perf.data file ownership · 62a1a63a
      Yunlong Song authored
      Enable perf mem to use perf.data when it is not owned by current user or
      root.
      
      Example:
      
       # perf mem -t load record ls
       # chown Yunlong.Song:Yunlong.Song perf.data
       # ls -al perf.data
       -rw------- 1 Yunlong.Song Yunlong.Song 16392 Apr  2 14:34 perf.data
       # id
       uid=0(root) gid=0(root) groups=0(root),64(pkcs11)
      
      Before this patch:
      
       # perf mem -D report
       File perf.data not owned by current user or root (use -f to override)
       # perf mem -D -f report
         Error: unknown switch `f'
      
        usage: perf mem [<options>] {record|report}
      
           -t, --type <type>     memory operations(load,store) Default load,store
           -D, --dump-raw-samples
                                 dump raw samples in ASCII
           -U, --hide-unresolved
                                 Only display entries resolved to a symbol
           -i, --input <file>    input file name
           -C, --cpu <cpu>       list of cpus to profile
           -x, --field-separator <separator>
                                 separator for columns, no spaces will be added
                                 between columns '.' is reserved.
      
      As shown above, the -f option does not work at all.
      
      After this patch:
      
       # perf mem -D report
       File perf.data not owned by current user or root (use -f to override)
       # perf mem -D -f report
       # PID, TID, IP, ADDR, LOCAL WEIGHT, DSRC, SYMBOL
       39095 39095 0xffffffff81127e40 0x016ffff887f45148338 8 0x68100142
       /proc/kcore:perf_event_aux
       39095 39095 0xffffffff8100a3fe 0xffff89007f8cb7d0 6 0x68100142
       /proc/kcore:native_sched_clock
       39095 39095 0xffffffff81309139 0xffff88bf44c9ded8 6 0x68100142
       /proc/kcore:acpi_map_lookup
       39095 39095 0xffffffff810f8c4c 0xffff89007f8ccd88 6 0x68100142
       /proc/kcore:rcu_nmi_exit
       39095 39095 0xffffffff81136346 0xffff88fea995dd50 6 0x68100142
       /proc/kcore:unlock_page
       39095 39095 0xffffffff812a64a2 0xffff88fea995dcc8 6 0x68100142
       /proc/kcore:half_md4_transform
       39095 39095 0x7f0cf877c7e9 0x25dfb94 6 0x68100142
       /lib64/libc-2.19.so:__readdir64
       39095 39095 0x7f0cf87575a3 0x7f0cf9163731 6 0x68100142
       /lib64/libc-2.19.so:__strcoll_l
       39095 39095 0xffffffff8116910e 0xffffea01c1bfbd50 23 0x68100242
       /proc/kcore:page_remove_rmap
      
      As shown above, the -f option really works now.
      Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1427982439-27388-7-git-send-email-yunlong.song@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      62a1a63a
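The ownership rule that -f overrides in the examples above can be sketched in a few lines of C. This is a minimal illustration, not the perf source: the function name `input_file_allowed` and its parameters are hypothetical, and the only behavior assumed is what the error message states (a file is accepted when owned by root or by the current user, otherwise only when forced).

```c
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not taken from perf): decide whether an input
 * file owned by `file_uid` may be read by a process whose effective
 * user ID is `euid`. `force` models the effect of passing -f.
 */
static bool input_file_allowed(uid_t file_uid, uid_t euid, bool force)
{
	/* Files owned by root or by the current user are always accepted. */
	if (file_uid == 0 || file_uid == euid)
		return true;

	/* Any other owner is rejected unless the user asked to override. */
	return force;
}
```

With this sketch, the scenario in the commit message (root reading a file owned by another user) is rejected without the flag and accepted with it.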
    • Yunlong Song's avatar
      perf lock: Support using -f to override perf.data file ownership · c4ac732a
      Yunlong Song authored
      Enable perf lock to use perf.data when it is not owned by current user
      or root.
      
      Example:
      
       # perf lock record ls
       # chown Yunlong.Song:Yunlong.Song perf.data
       # ls -al perf.data
       -rw------- 1 Yunlong.Song Yunlong.Song 4880686 Apr  2 14:14 perf.data
       # id
       uid=0(root) gid=0(root) groups=0(root),64(pkcs11)
      
      Before this patch:
      
       # perf lock report
       File perf.data not owned by current user or root (use -f to override)
       Initializing perf session failed
       # perf lock report -f
         Error: unknown switch `f'
      
        usage: perf lock report [<options>]
      
           -k, --key <acquired>  key for sorting (acquired / contended /
           avg_wait / wait_total / wait_max / wait_min)
      
      As shown above, the -f option does not work at all.
      
      After this patch:
      
       # perf lock report
       File perf.data not owned by current user or root (use -f to override)
       Initializing perf session failed
       # perf lock report -f
                      Name   acquired  contended   avg wait (ns) total wait (ns) ...
      
       &ldata->output_l...        128          0               0               0 ...
                &ctx->lock        114          0               0               0 ...
               &p->pi_lock        112          0               0               0 ...
       &(&pool->lock)->...        112          0               0               0 ...
       &(&dentry->d_loc...         70          0               0               0 ...
       &(&newf->file_lo...         62          0               0               0 ...
       &(&fs->lock)->rl...         43          0               0               0 ...
       ...
      
      As shown above, the -f option really works now.
      Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1427982439-27388-6-git-send-email-yunlong.song@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      c4ac732a
    • Yunlong Song's avatar
      perf kvm: Support using -f to override perf.data.guest file ownership · 8cc5ec1f
      Yunlong Song authored
      Enable perf kvm to use perf.data.guest when it is not owned by current
      user or root.
      
      Example:
      
       # perf kvm stat record ls
       # chown Yunlong.Song:Yunlong.Song perf.data.guest
       # ls -al perf.data.guest
       -rw------- 1 Yunlong.Song Yunlong.Song 4128937 Apr  2 11:05 perf.data.guest
       # id
       uid=0(root) gid=0(root) groups=0(root),64(pkcs11)
      
      Before this patch:
      
       # perf kvm stat report
       File perf.data.guest not owned by current user or root (use -f to override)
       Initializing perf session failed
       # perf kvm stat report -f
         Error: unknown switch `f'
      
        usage: perf kvm stat report [<options>]
      
               --event <report event>
                                 event for reporting: vmexit, mmio (x86 only),
                                 ioport (x86 only)
               --vcpu <n>        vcpu id to report
           -k, --key <sort-key>  key for sorting: sample(sort by samples
                                 number) time (sort by avg time)
           -p, --pid <pid>       analyze events only for given process id(s)
      
      As shown above, the -f option does not work at all.
      
      After this patch:
      
       # perf kvm stat report
       File perf.data.guest not owned by current user or root (use -f to override)
       Initializing perf session failed
       # perf kvm stat report -f
       Analyze events for all VMs, all VCPUs:
      
         VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time   Avg time
      
       Total Samples:0, Total events handled time:0.00us.
      
      As shown above, the -f option really works now. Since we have not
      launched any KVM related process, the result shows 0 sample here.
      Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1427982439-27388-5-git-send-email-yunlong.song@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      8cc5ec1f
    • Yunlong Song's avatar
      perf kmem: Support using -f to override perf.data file ownership · d1eeb77c
      Yunlong Song authored
      Enable perf kmem to use perf.data when it is not owned by current user
      or root.
      
      Example:
      
       # perf kmem record ls
       # chown Yunlong.Song:Yunlong.Song perf.data
       # ls -al perf.data
       -rw------- 1 Yunlong.Song Yunlong.Song 5315665 Apr  2 10:54 perf.data
       # id
       uid=0(root) gid=0(root) groups=0(root),64(pkcs11)
      
      Before this patch:
      
       # perf kmem stat
       File perf.data not owned by current user or root (use -f to override)
       # perf kmem stat -f
         Error: unknown switch `f'
      
        usage: perf kmem [<options>] {record|stat}
      
           -i, --input <file>    input file name
           -v, --verbose         be more verbose (show symbol address, etc)
               --caller          show per-callsite statistics
               --alloc           show per-allocation statistics
           -s, --sort <key[,key2...]>
                                 sort by keys: ptr, call_site, bytes, hit,
                                 pingpong, frag
           -l, --line <num>      show n lines
               --raw-ip          show raw ip instead of symbol
      
      As shown above, the -f option does not work at all.
      
      After this patch:
      
       # perf kmem stat
       File perf.data not owned by current user or root (use -f to override)
       # perf kmem stat -f
       SUMMARY
       =======
       Total bytes requested: 437599
       Total bytes allocated: 615472
       Total bytes wasted on internal fragmentation: 177873
       Internal fragmentation: 28.900259%
       Cross CPU allocations: 6/1192
      
      As shown above, the -f option really works now.
      Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1427982439-27388-4-git-send-email-yunlong.song@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d1eeb77c
    • Yunlong Song's avatar
      perf inject: Support using -f to override perf.data file ownership · ccaa474c
      Yunlong Song authored
      Enable perf inject to use perf.data when it is not owned by current user
      or root.
      
      Example:
      
       # perf record ls
       # chown Yunlong.Song:Yunlong.Song perf.data
       # ls -al perf.data
       -rw------- 1 Yunlong.Song Yunlong.Song 28260 Apr  2 10:37 perf.data
       # id
       uid=0(root) gid=0(root) groups=0(root),64(pkcs11)
      
      Before this patch:
      
       # perf inject -v -b -i perf.data -o perf.data.new
       File perf.data not owned by current user or root (use -f to override)
       # perf inject -v -b -i perf.data -o perf.data.new -f
         Error: unknown switch `f'
      
        usage: perf inject [<options>]
      
           -b, --build-ids       Inject build-ids into the output stream
           -i, --input <file>    input file name
           -o, --output <file>   output file name
           -s, --sched-stat      Merge sched-stat and sched-switch for getting
           events where and how long tasks slept
           -v, --verbose         be more verbose (show build ids, etc)
               --kallsyms <file>
                                 kallsyms pathname
      
      As shown above, the -f option does not work at all.
      
      After this patch:
      
       # perf inject -v -b -i perf.data -o perf.data.new
       File perf.data not owned by current user or root (use -f to override)
       # perf inject -v -b -i perf.data -o perf.data.new -f
       build id event received for [kernel.kallsyms]:
       f6dcb66d8b98f1c0d9eb87bf043444b69f91d30c
       symsrc__init: cannot get elf header.
       Looking at the vmlinux_path (7 entries long)
       Using /proc/kcore for kernel object code
       Using /proc/kallsyms for symbols
      
      As shown above, the -f option really works now.
      Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1427982439-27388-3-git-send-email-yunlong.song@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ccaa474c
    • Yunlong Song's avatar
      perf evlist: Support using -f to override perf.data file ownership · 9e3b6ec1
      Yunlong Song authored
      Enable perf evlist to use perf.data when it is not owned by current user
      or root.
      
      Example:
      
       # perf record ls
       # chown Yunlong.Song:Yunlong.Song perf.data
       # ls -al perf.data
       -rw------- 1 Yunlong.Song Yunlong.Song 28260 Apr  2 10:18 perf.data
       # id
       uid=0(root) gid=0(root) groups=0(root),64(pkcs11)
      
      Before this patch:
      
       # perf evlist
       File perf.data not owned by current user or root (use -f to override)
       # perf evlist -f
         Error: unknown switch `f'
      
        usage: perf evlist [<options>]
      
           -i, --input <file>    Input file name
           -F, --freq            Show the sample frequency
           -v, --verbose         Show all event attr details
           -g, --group           Show event group information
      
      As shown above, the -f option does not work at all.
      
      After this patch:
      
       # perf evlist
       File perf.data not owned by current user or root (use -f to override)
       # perf evlist -f
       cycles
      
      As shown above, the -f option really works now.
      Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1427982439-27388-2-git-send-email-yunlong.song@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      9e3b6ec1
    • Masami Hiramatsu's avatar
      perf probe: Fix to track down unnamed union/structure members · c7273835
      Masami Hiramatsu authored
      Fix 'perf probe' to track down unnamed union/structure members.
      
      perf probe did not track down the tree of unnamed union/structure
      members, since it just failed to find given "name" in a parent
      structure/union.  To solve this issue, I've introduced 2 changes.
      
      - Fix die_find_member() to track down the type-DIE if it is
        unnamed, and if it contains the specified member, returns the
        unnamed member.
        (note that we don't return found member, since unnamed member
         has the offset in the parent structure)
      - Fix convert_variable_fields() to track down the unnamed union/
        structure (one-by-one).
      
      With this patch, perf probe can access unnamed fields:
        -----
        #./perf probe -nfx ./perf lock__delete ops 'locked_ops=ops->locked.ops'
        Added new event:
          probe_perf:lock__delete (on lock__delete in /home/mhiramat/ksrc/linux-3/tools/perf/perf with ops locked_ops=ops->locked.ops)
      
        You can now use it in all perf tools, such as:
      
                perf record -e probe_perf:lock__delete -aR sleep 1
        -----
      Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
      Report-Link: https://lkml.org/lkml/2015/3/5/431
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20150402073312.14482.37942.stgit@localhost.localdomain
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      c7273835
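The lookup rule described in this commit (descend into unnamed members, and return the unnamed member itself because it carries the offset within the parent) can be illustrated on a toy type tree. This is a sketch under stated assumptions: `struct member`, `struct type`, `find_member()` and `resolve_offset()` are hypothetical stand-ins for the DWARF DIE handling in die_find_member() and convert_variable_fields(), and the sample layout is invented.

```c
#include <stddef.h>
#include <string.h>

struct type;

/* Toy model of a structure/union member (stands in for a member DIE). */
struct member {
	const char *name;         /* NULL for an unnamed union/structure */
	size_t offset;            /* offset within the parent */
	const struct type *type;  /* non-NULL if the member is a struct/union */
};

struct type {
	const struct member *members;
	size_t nr_members;
};

/*
 * Find `name` in `parent`. If it lives inside an unnamed member, return
 * that unnamed member rather than the match, since only the unnamed
 * member knows its offset in `parent`.
 */
static const struct member *find_member(const struct type *parent,
					const char *name)
{
	for (size_t i = 0; i < parent->nr_members; i++) {
		const struct member *m = &parent->members[i];

		if (m->name) {
			if (strcmp(m->name, name) == 0)
				return m;
		} else if (m->type && find_member(m->type, name)) {
			return m;
		}
	}
	return NULL;
}

/* Walk through unnamed members one by one, accumulating offsets. */
static int resolve_offset(const struct type *parent, const char *name,
			  size_t *offset)
{
	const struct member *m;
	size_t total = 0;

	while ((m = find_member(parent, name)) != NULL) {
		total += m->offset;
		if (m->name) {          /* reached the named field */
			*offset = total;
			return 0;
		}
		parent = m->type;       /* step into the unnamed member */
	}
	return -1;
}

/* Invented sample: struct { id; union { ops; flags; }; } */
static const struct member inner_members[] = {
	{ "ops", 8, NULL },
	{ "flags", 0, NULL },
};
static const struct type inner_type = { inner_members, 2 };
static const struct member outer_members[] = {
	{ "id", 0, NULL },
	{ NULL, 16, &inner_type },
};
static const struct type outer_type = { outer_members, 2 };
```

In the sample, looking up "ops" in `outer_type` first yields the unnamed member at offset 16, and the second step finds "ops" at offset 8 inside it, for a total of 24.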
    • Arnaldo Carvalho de Melo's avatar
      perf db-export: No need to have ->thread twice in struct export_sample · b83e868d
      Arnaldo Carvalho de Melo authored
      As it comes from address_location->thread, that is already stored as
      export_sample->al, where the thread can be obtained.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20150402141542.GA9630@kernel.org
      Link: http://lkml.kernel.org/n/tip-bzotbl4epoztw0jd6sm2stpf@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b83e868d
    • Arnaldo Carvalho de Melo's avatar
      perf db-export: No need to pass thread twice to db_export__sample · 7327259d
      Arnaldo Carvalho de Melo authored
      As it is available via another parameter, address_location->thread.
      Acked-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: lkml.kernel.org/r/551D08F8.3040706@intel.com
      Link: http://lkml.kernel.org/n/tip-6dbn0tcm9hyv92g7h3zj2dbt@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      7327259d
    • Arnaldo Carvalho de Melo's avatar
      perf scripting: No need to pass thread twice to the scripting callbacks · f9d5d549
      Arnaldo Carvalho de Melo authored
      It is already in the addr_location, so remove the redundant 'thread'
      parameter from the callback signatures.
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1427906210-10519-3-git-send-email-acme@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      f9d5d549
    • Arnaldo Carvalho de Melo's avatar
      perf script: No need to lookup thread twice · 79628f2c
      Arnaldo Carvalho de Melo authored
      We get the thread when we call perf_event__preprocess_sample(), no need
      to do it before that.
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1427906210-10519-2-git-send-email-acme@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      79628f2c
    • Ingo Molnar's avatar
      perf/x86/intel/pt: Fix the 32-bit build · 2e54a5bd
      Ingo Molnar authored
      On a 32-bit build I got:
      
        arch/x86/kernel/cpu/perf_event_intel_pt.c:413:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
        arch/x86/kernel/cpu/perf_event_intel_bts.c:162:24: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      
      Fix it. The code should probably be (re-)tested on 32-bit systems to make
      sure all is fine.
      
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: linux-kernel@vger.kernel.org
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2e54a5bd
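The two warnings quoted in this commit arise when a 64-bit integer is cast directly to or from a pointer on a target where pointers are 32 bits wide. The diff is not shown here, so the following is only a generic sketch of one common remedy (casting through a pointer-sized integer type), with hypothetical helper names `ptr_to_u64()` and `u64_to_ptr()`.

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not from the kernel): convert between a pointer
 * and a 64-bit value by going through uintptr_t, which always has the
 * same width as a pointer, so neither cast changes size implicitly.
 */
static inline uint64_t ptr_to_u64(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}

static inline void *u64_to_ptr(uint64_t v)
{
	/* On a 32-bit target the upper 32 bits of v are discarded here. */
	return (void *)(uintptr_t)v;
}

/* Sample object used to demonstrate a round trip. */
static int sample_object;
```

A round trip such as `u64_to_ptr(ptr_to_u64(&sample_object))` returns the original address on both 32-bit and 64-bit builds.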
    • Andi Kleen's avatar
      perf/x86/intel: Avoid rewriting DEBUGCTL with the same value for LBRs · cd1f11de
      Andi Kleen authored
      perf with LBRs on has a tendency to rewrite the DEBUGCTL MSR with
      the same value. Add a little optimization to skip the unnecessary
      write.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1426871484-21285-2-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      cd1f11de
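The optimization described in this commit is a read-compare-write pattern: compute the new register value and skip the write when nothing changes. The sketch below uses a simulated register rather than real MSR accessors; `fake_rdmsr()`, `fake_wrmsr()`, `fake_write_count` and `set_debugctl_bits()` are all hypothetical names introduced for illustration.

```c
#include <stdint.h>

/* Simulated register standing in for the DEBUGCTL MSR (not kernel code). */
static uint64_t fake_debugctl;
static unsigned int fake_write_count;   /* counts simulated MSR writes */

static uint64_t fake_rdmsr(void)
{
	return fake_debugctl;
}

static void fake_wrmsr(uint64_t val)
{
	fake_debugctl = val;
	fake_write_count++;
}

/* Set `bits` in the register, writing only if the value would change. */
static void set_debugctl_bits(uint64_t bits)
{
	uint64_t orig = fake_rdmsr();
	uint64_t val = orig | bits;

	if (val != orig)
		fake_wrmsr(val);
}
```

Calling `set_debugctl_bits()` twice with the same bits performs a single simulated write; the second call only reads.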
    • Andi Kleen's avatar
      perf/x86/intel: Streamline LBR MSR handling in PMI · 1a78d937
      Andi Kleen authored
      The perf PMI currently does unnecessary MSR accesses when
      LBRs are enabled. We use LBR freezing, or when in callstack
      mode force the LBRs to only filter on ring 3.
      
      So there is no need to disable the LBRs explicitly in the
      PMI handler.
      
      Also we always unnecessarily rewrite LBR_SELECT in the LBR
      handler, even though it can never change.
      
       5)               |  /* write_msr: MSR_LBR_SELECT(1c8), value 0 */
       5)               |  /* read_msr: MSR_IA32_DEBUGCTLMSR(1d9), value 1801 */
       5)               |  /* write_msr: MSR_IA32_DEBUGCTLMSR(1d9), value 1801 */
       5)               |  /* write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f */
       5)               |  /* write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0 */
       5)               |  /* write_msr: MSR_LBR_SELECT(1c8), value 0 */
       5)               |  /* read_msr: MSR_IA32_DEBUGCTLMSR(1d9), value 1801 */
       5)               |  /* write_msr: MSR_IA32_DEBUGCTLMSR(1d9), value 1801 */
      
      This patch:
      
        - Avoids disabling already frozen LBRs unnecessarily in the PMI
        - Avoids changing LBR_SELECT in the PMI
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1426871484-21285-1-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1a78d937
    • Andi Kleen's avatar
      perf/x86: Only dump PEBS register when PEBS has been detected · 15fde110
      Andi Kleen authored
      Technically PEBS_ENABLED is only guaranteed to exist when we
      detected PEBS. So add a check for this to the PMU dump function.
      I don't think it can happen on a real CPU, but could in a VM.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1425059312-18217-4-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      15fde110
    • Andi Kleen's avatar
      perf/x86: Dump DEBUGCTL in PMU dump · da3e606d
      Andi Kleen authored
      LBRs and LBR freezing are controlled through the DEBUGCTL MSR. So
      dump the state of DEBUGCTL too when dumping the PMU state.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1425059312-18217-3-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      da3e606d
    • Andi Kleen's avatar
      perf/x86/intel: Reset more state in PMU reset · 8882edf7
      Andi Kleen authored
      The PMU reset code didn't quite keep up with newer PMU features.
      Improve it a bit to really reset a modern PMU:
      
        - Clear all overflow status
        - Clear LBRs and freezing state
        - Disable fixed counters too
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1425059312-18217-2-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8882edf7
    • Stephane Eranian's avatar
      perf/x86/intel: Make the HT bug workaround conditional on HT enabled · b37609c3
      Stephane Eranian authored
      This patch disables the PMU HT bug workaround when Hyperthreading (HT)
      is disabled. We cannot do this test immediately when perf_events
      is initialized. We need to wait until the topology information
      is set up properly. As such, we register a later initcall, check
      the topology and potentially disable the workaround. To do this,
      we need to ensure there is no user of the PMU. At this point of
      the boot, the only user is the NMI watchdog, thus we disable
      it during the switch and re-enable it right after.
      
      Having the workaround disabled when it is not needed provides
      some benefits by limiting the overhead in time and space.
      The workaround still ensures correct scheduling of the corrupting
      memory events (0xd0, 0xd1, 0xd2) when HT is off. Those events
      can only be measured on counters 0-3, something else the current
      kernel did not handle correctly.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: bp@alien8.de
      Cc: jolsa@redhat.com
      Cc: kan.liang@intel.com
      Cc: maria.n.dimakopoulou@gmail.com
      Link: http://lkml.kernel.org/r/1416251225-17721-13-git-send-email-eranian@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b37609c3
    • Stephane Eranian's avatar
      watchdog: Add watchdog enable/disable all functions · b3738d29
      Stephane Eranian authored
      This patch adds two new functions to enable/disable
      the watchdog across all CPUs.
      
      This will be used by the HT PMU bug workaround code to
      disable/enable the NMI watchdog across quirk enablement.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: bp@alien8.de
      Cc: jolsa@redhat.com
      Cc: kan.liang@intel.com
      Cc: maria.n.dimakopoulou@gmail.com
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1416251225-17721-12-git-send-email-eranian@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b3738d29
    • Stephane Eranian's avatar
      perf/x86/intel: Limit to half counters when the HT workaround is enabled, to... · c02cdbf6
      Stephane Eranian authored
      perf/x86/intel: Limit to half counters when the HT workaround is enabled, to avoid exclusive mode starvation
      
      This patch limits the number of counters available to each CPU when
      the HT bug workaround is enabled.
      
      This is necessary to avoid counter starvation. Starvation can
      arise from a configuration where one HT thread, HT0, is using all 4 counters
      with corrupting events which require exclusion of the sibling HT, HT1.
      
      In such a case, HT1 would not be able to schedule any event until HT0
      is done. To mitigate this problem, this patch artificially limits
      the number of counters to 2.
      
      That way, we can guarantee that at least 2 counters are not in exclusive
      mode and therefore allow the sibling thread to schedule events of the
      same type (system vs. per-thread). The 2 counters are not determined
      in advance. We simply set the limit to two events per HT.
      
      This helps mitigate starvation in case of events with specific counter
      constraints such as PREC_DIST.
      
      Note that this does not eliminate the starvation in all cases. But
      it is better than not having it.
      
      (Solution suggested by Peter Zijlstra.)
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: bp@alien8.de
      Cc: jolsa@redhat.com
      Cc: kan.liang@intel.com
      Cc: maria.n.dimakopoulou@gmail.com
      Link: http://lkml.kernel.org/r/1416251225-17721-11-git-send-email-eranian@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c02cdbf6
    • Stephane Eranian's avatar
      perf/x86/intel: Fix intel_get_event_constraints() for dynamic constraints · a90738c2
      Stephane Eranian authored
      With dynamic constraints, we need to restart from the static
      constraints each time intel_get_event_constraints() is called.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
      Cc: bp@alien8.de
      Cc: jolsa@redhat.com
      Cc: kan.liang@intel.com
      Link: http://lkml.kernel.org/r/1416251225-17721-10-git-send-email-eranian@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a90738c2
    • Maria Dimakopoulou's avatar
      perf/x86/intel: Enforce HT bug workaround with PEBS for SNB/IVB/HSW · b63b4b45
      Maria Dimakopoulou authored
      This patch modifies the PEBS constraint tables for SNB/IVB/HSW
      such that corrupting events supporting PEBS activate the HT
      workaround.
      Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Stephane Eranian <eranian@google.com>
      Cc: bp@alien8.de
      Cc: jolsa@redhat.com
      Cc: kan.liang@intel.com
      Link: http://lkml.kernel.org/r/1416251225-17721-9-git-send-email-eranian@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b63b4b45
    • Maria Dimakopoulou's avatar
      perf/x86/intel: Enforce HT bug workaround for SNB/IVB/HSW · 93fcf72c
      Maria Dimakopoulou authored
      This patch activates the HT bug workaround for the
      SNB/IVB/HSW processors. This covers non-PEBS mode.
      Activation is done through the constraint tables.
      
      Both client and server processors need this workaround.
      Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Stephane Eranian <eranian@google.com>
      Cc: bp@alien8.de
      Cc: jolsa@redhat.com
      Cc: kan.liang@intel.com
      Link: http://lkml.kernel.org/r/1416251225-17721-8-git-send-email-eranian@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      93fcf72c
    • Maria Dimakopoulou's avatar
      perf/x86/intel: Implement cross-HT corruption bug workaround · e979121b
      Maria Dimakopoulou authored
      This patch implements a software workaround for a HW erratum
      on Intel SandyBridge, IvyBridge and Haswell processors
      with Hyperthreading enabled. The errata are documented for
      each processor in their respective specification update
      documents:
      
        - SandyBridge: BJ122
        - IvyBridge: BV98
        - Haswell: HSD29
      
      The bug causes silent counter corruption across hyperthreads only
      when measuring certain memory events (0xd0, 0xd1, 0xd2, 0xd3).
      Counters measuring those events may leak counts to the sibling
      counter. For instance, counter 0, thread 0 measuring event 0xd0,
      may leak to counter 0, thread 1, regardless of the event measured
      there. The size of the leak is not predictable. It all depends on
      the workload and the state of each sibling hyper-thread. The
      corrupting events do undercount as a consequence of the leak. The
      leak is compensated automatically only when the sibling counter measures
      the exact same corrupting event AND the workload on the two threads
      is the same. Given that there is no way to guarantee this, a workaround
      is necessary. Furthermore, there is a serious problem if the leaked count
      is added to a low-occurrence event. In that case the corruption on
      the low-occurrence event can be very large, e.g., orders of magnitude.
      
      There is no HW or FW workaround for this problem.
      
      The bug is very easy to reproduce on a loaded system.
      Here is an example on a Haswell client, where CPU0, CPU4
      are siblings. We load the CPUs with a simple triad app
      streaming a large floating-point vector. We use the 0x81d0
      corrupting event (MEM_UOPS_RETIRED:ALL_LOADS) and
      0x20cc (ROB_MISC_EVENTS:LBR_INSERTS). Given that we are not
      using the LBR, the 0x20cc event count should be zero.
      
        $ taskset -c 0 triad &
        $ taskset -c 4 triad &
        $ perf stat -a -C 0 -e r81d0 sleep 100 &
        $ perf stat -a -C 4 -e r20cc sleep 10
        Performance counter stats for 'system wide':
              139 277 291      r20cc
             10,000969126 seconds time elapsed
      
      In this example, 0x81d0 and 0x20cc are using sibling counters
      on CPU0 and CPU4. 0x81d0 leaks into 0x20cc and corrupts it
      from 0 to 139 million occurrences.
      
      This patch provides a software workaround to this problem by modifying the
      way events are scheduled onto counters by the kernel. The patch forces
      cross-thread mutual exclusion between counters in case a corrupting event
      is measured by one of the hyper-threads. If thread 0, counter 0 is measuring
      event 0xd0, then nothing can be measured on counter 0, thread 1. If no corrupting
      event is measured on any hyper-thread, event scheduling proceeds as before.
      
      The same example, run with the workaround enabled, yields the correct answer:
      
        $ taskset -c 0 triad &
        $ taskset -c 4 triad &
        $ perf stat -a -C 0 -e r81d0 sleep 100 &
        $ perf stat -a -C 4 -e r20cc sleep 10
        Performance counter stats for 'system wide':
              0 r20cc
             10,000969126 seconds time elapsed
      
      The patch does provide correctness for all non-corrupting events. It does not
      "repatriate" the leaked counts back to the leaking counter. This is planned
      for a second patch series. This patch series makes this repatriation
      easier by guaranteeing that the sibling counter is not measuring any useful event.
      
      The patch introduces dynamic constraints for events. That means that events which
      did not have constraints, i.e., could be measured on any counters, may now be
      constrained to a subset of the counters depending on what is going on on the sibling
      thread. The algorithm is similar to a cache coherency protocol. We call it XSU
      in reference to Exclusive, Shared, Unused, the 3 possible states of a PMU
      counter.
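
      The dynamic constraint described above can be sketched in user-space C. This
      is only an illustration under stated assumptions, not the kernel code from
      the patch: the names xsu_state, is_corrupting_event and xsu_allowed_counters
      are hypothetical, and four generic counters per hyper-thread are assumed.
      Given the sibling's per-counter states, it computes which counters an event
      may still use.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_COUNTERS 4 /* assumption: four generic counters per hyper-thread */

/* The three per-counter states named in the text (hypothetical enum). */
enum xsu_state {
    XSU_UNUSED,    /* sibling does not use this counter */
    XSU_SHARED,    /* sibling measures a non-corrupting event on this counter */
    XSU_EXCLUSIVE, /* sibling measures a corrupting event on this counter */
};

/* A raw perf config such as 0x81d0 keeps the event code in its low byte.
 * The corrupting event codes listed in the text are 0xd0 to 0xd3. */
static int is_corrupting_event(uint64_t raw_config)
{
    uint8_t event_code = (uint8_t)(raw_config & 0xff);
    return event_code >= 0xd0 && event_code <= 0xd3;
}

/* Return a bitmask of the counters this thread may use for raw_config,
 * given the state of each counter on the sibling thread. */
static uint32_t xsu_allowed_counters(const enum xsu_state *sibling,
                                     uint64_t raw_config)
{
    uint32_t mask = 0;

    assert(sibling != 0);
    for (int i = 0; i < NUM_COUNTERS; i++) {
        /* The sibling's corrupting event would leak into this counter. */
        if (sibling[i] == XSU_EXCLUSIVE)
            continue;
        /* Our corrupting event would leak into the sibling's event. */
        if (is_corrupting_event(raw_config) && sibling[i] == XSU_SHARED)
            continue;
        mask |= 1u << i;
    }
    return mask;
}
```

      With the sibling states {EXCLUSIVE, SHARED, UNUSED, UNUSED}, the event 0x20cc
      may use counters 1 to 3, while the corrupting event 0x81d0 is limited to
      counters 2 and 3. This shrinking of the usable counter set is what can lead
      to extra multiplexing.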
      
      As a consequence of the workaround, users may see an increased amount of event
      multiplexing, even in situations where fewer events than counters are
      measured on a CPU.
      
      The patch has been tested on all three impacted processors. Note that when
      HT is off, there is no corruption. However, the workaround is still enabled,
      although its cost is small. Adding dynamic detection of whether HT is on turned
      out to be complex and to require too much code to be justified.
      
      This patch addresses the issue when PEBS is not used. A subsequent patch
      fixes the problem when PEBS is used.
      Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
      [spinlock_t -> raw_spinlock_t]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Stephane Eranian <eranian@google.com>
      Cc: bp@alien8.de
      Cc: jolsa@redhat.com
      Cc: kan.liang@intel.com
      Link: http://lkml.kernel.org/r/1416251225-17721-7-git-send-email-eranian@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e979121b
    • Maria Dimakopoulou's avatar
      perf/x86/intel: Add cross-HT counter exclusion infrastructure · 6f6539ca
      Maria Dimakopoulou authored
      This patch adds a new shared_regs style structure to the
      per-cpu x86 state (cpuc). It is used to coordinate access
      between counters which must be used with exclusion across
      HyperThreads on Intel processors. This new struct is not
      needed on each PMU, thus it is allocated on demand.
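
      A minimal user-space sketch of on-demand allocation with sharing between
      sibling threads might look as follows. It is an assumption-laden
      illustration, not the structure added by this patch: excl_shared, cpu_state,
      excl_attach and excl_detach are hypothetical names, the layout is invented
      for the example, and the locking that the kernel needs is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_COUNTERS 4 /* assumption, for illustration only */

/* Hypothetical shared structure: one instance per physical core. */
struct excl_shared {
    int refcnt;                           /* hyper-threads attached */
    unsigned char state[2][NUM_COUNTERS]; /* per-thread, per-counter state */
};

/* Hypothetical stand-in for the per-cpu x86 state (cpuc). */
struct cpu_state {
    struct excl_shared *excl; /* NULL when the PMU needs no exclusion */
};

/* Attach cpu to its core's shared structure, allocating it only if the PMU
 * needs exclusion and the sibling has not allocated one already.
 * Returns 0 on success, -1 if the allocation fails. */
static int excl_attach(struct cpu_state *cpu, struct cpu_state *sibling,
                       int pmu_needs_excl)
{
    assert(cpu != NULL);
    if (!pmu_needs_excl)
        return 0; /* nothing is allocated for PMUs without the erratum */
    if (sibling != NULL && sibling->excl != NULL) {
        cpu->excl = sibling->excl; /* reuse the sibling's instance */
        cpu->excl->refcnt++;
        return 0;
    }
    cpu->excl = calloc(1, sizeof(*cpu->excl));
    if (cpu->excl == NULL)
        return -1;
    cpu->excl->refcnt = 1;
    return 0;
}

/* Drop this cpu's reference and free the structure with the last one. */
static void excl_detach(struct cpu_state *cpu)
{
    if (cpu->excl == NULL)
        return;
    if (--cpu->excl->refcnt == 0)
        free(cpu->excl);
    cpu->excl = NULL;
}
```

      The reference count lets both hyper-threads of a core point at one instance,
      while a PMU that does not need exclusion never pays for the allocation.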
      Signed-off-by: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
      [peterz: spinlock_t -> raw_spinlock_t]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Stephane Eranian <eranian@google.com>
      Cc: bp@alien8.de
      Cc: jolsa@redhat.com
      Cc: kan.liang@intel.com
      Link: http://lkml.kernel.org/r/1416251225-17721-6-git-send-email-eranian@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6f6539ca