1. 08 Apr, 2015 8 commits
    • Yunlong Song's avatar
      perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the... · 3a423a5c
      Yunlong Song authored
      perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations
      
      Although the memory of pid_to_task can be allocated via calloc according
      to the value of /proc/sys/kernel/pid_max, it cannot handle the case when
      pid_max is changed after 'perf sched record' has created its perf.data.
      
      If the new pid_max configured in 'perf sched replay' is smaller than the
      old pid_max configured in 'perf sched record', then it will cause the
      assertion failure problem.
      
      To solve this problem, we realloc the memory of pid_to_task stepwise
      once the passed-in pid parameter in register_pid is larger than the
      current pid_max.
      
      Example:
      
      Test environment: x86_64 with 160 cores
      
       $ cat /proc/sys/kernel/pid_max
       163840
       $ perf sched record ls
       $ echo 5000 > /proc/sys/kernel/pid_max
       $ cat /proc/sys/kernel/pid_max
       5000
      
      Before this patch:
      
       $ perf sched replay
       run measurement overhead: 221 nsecs
       sleep measurement overhead: 55356 nsecs
       the run test took 1000011 nsecs
       the sleep test took 1060940 nsecs
       perf: builtin-sched.c:337: register_pid: Assertion `!(pid >= (unsigned
       long)pid_max)' failed.
       Aborted
      
      After this patch:
      
       $ perf sched replay
       run measurement overhead: 221 nsecs
       sleep measurement overhead: 55611 nsecs
       the run test took 1000026 nsecs
       the sleep test took 1060486 nsecs
       nr_run_events:        10
       nr_sleep_events:      1562
       nr_wakeup_events:     5
       task      0 (                  :1:         1), nr_events: 1
       task      1 (                  :2:         2), nr_events: 1
       task      2 (                  :3:         3), nr_events: 1
       task      3 (                  :5:         5), nr_events: 1
       ...
      Signed-off-by: default avatarYunlong Song <yunlong.song@huawei.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1427809596-29559-5-git-send-email-yunlong.song@huawei.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      3a423a5c
    • Yunlong Song's avatar
      perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the... · cb06ac25
      Yunlong Song authored
      perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max
      
      The current memory allocation of struct task_desc *pid_to_task[MAX_PID]
      is in a permanent and preset way, and it has two problems:
      
      Problem 1: If the pid_max, which is the max number of pids in the
      system, is much smaller than MAX_PID (1024*1000), then it causes a waste
      of stack memory. This may happen in the case where the number of cpu
      cores is much smaller than 1000.
      
      Problem 2: If the pid_max is changed from the default value to a value
      larger than MAX_PID, then it will cause assertion failure problem. The
      maximum value of pid_max can be set to pid_max_max (see pidmap_init
      defined in kernel/pid.c), which equals to PID_MAX_LIMIT. In x86_64,
      PID_MAX_LIMIT is 4*1024*1024 (defined in include/linux/threads.h). This
      value is much larger than MAX_PID, and will take up 32768 Kbytes
      (4*1024*1024*8/1024) for memory allocation of pid_to_task, which is much
      larger than the default 8192 Kbytes of the stack size of calling
      process.
      
      Due to these two problems, we use calloc to allocate the memory of
      pid_to_task dynamically.
      
      Example:
      
      Test environment: x86_64 with 160 cores
      
       $ cat /proc/sys/kernel/pid_max
       163840
       $ echo 1025000 > /proc/sys/kernel/pid_max
       $ cat /proc/sys/kernel/pid_max
       1025000
      
      Run some applications until the pid of some process is greater than
      the value of MAX_PID (1024*1000).
      
      Before this patch:
      
       $ perf sched replay
       run measurement overhead: 221 nsecs
       sleep measurement overhead: 55480 nsecs
       the run test took 1000008 nsecs
       the sleep test took 1063151 nsecs
       perf: builtin-sched.c:330: register_pid: Assertion `!(pid >= 1024000)'
       failed.
       Aborted
      
      After this patch:
      
       $ perf sched replay
       run measurement overhead: 221 nsecs
       sleep measurement overhead: 55435 nsecs
       the run test took 1000004 nsecs
       the sleep test took 1059312 nsecs
       nr_run_events:        10
       nr_sleep_events:      1562
       nr_wakeup_events:     5
       task      0 (                  :1:         1), nr_events: 1
       task      1 (                  :2:         2), nr_events: 1
       task      2 (                  :3:         3), nr_events: 1
       task      3 (                  :5:         5), nr_events: 1
       ...
      Signed-off-by: default avatarYunlong Song <yunlong.song@huawei.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1427809596-29559-4-git-send-email-yunlong.song@huawei.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      cb06ac25
    • Yunlong Song's avatar
      perf sched replay: Increase the MAX_PID value to fix assertion failure problem · a35e27d0
      Yunlong Song authored
      Current MAX_PID is only 65536, which will cause assertion failure problem
      when CPU cores are more than 64 in x86_64.
      
      This is because the pid_max value in x86_64 is at least
      PIDS_PER_CPU_DEFAULT * num_possible_cpus() (see function pidmap_init
      defined in kernel/pid.c), where PIDS_PER_CPU_DEFAULT is 1024 (defined in
      include/linux/threads.h).
      
      Thus for MAX_PID = 65536, the correspoinding CPU cores are
      65536/1024=64.  This is obviously not enough at all for x86_64, and will
      cause an assertion failure problem due to BUG_ON(pid >= MAX_PID) in the
      codes.
      
      We increase MAX_PID value from 65536 to 1024*1000, which can be used in
      x86_64 with 1000 cores.
      
      This number is finally decided according to the limitation of stack size
      of calling process.
      
      Use 'ulimit -a', the result shows the stack size of any process is 8192
      Kbytes, which is defined in include/uapi/linux/resource.h (#define
      _STK_LIM (8*1024*1024)).
      
      Thus we choose a large enough value for MAX_PID, and make it satisfy to
      the limitation of the stack size, i.e., making the perf process take up
      a memory space just smaller than 8192 Kbytes.
      
      We have calculated and tested that 1024*1000 is OK for MAX_PID.
      
      This means perf sched replay can now be used with at most 1000 cores in
      x86_64 without any assertion failure problem.
      
      Example:
      
      Test environment: x86_64 with 160 cores
      
       $ cat /proc/sys/kernel/pid_max
       163840
      
      Before this patch:
      
       $ perf sched replay
       run measurement overhead: 240 nsecs
       sleep measurement overhead: 55379 nsecs
       the run test took 1000004 nsecs
       the sleep test took 1059424 nsecs
       perf: builtin-sched.c:330: register_pid: Assertion `!(pid >= 65536)'
       failed.
       Aborted
      
      After this patch:
      
       $ perf sched replay
       run measurement overhead: 221 nsecs
       sleep measurement overhead: 55397 nsecs
       the run test took 999920 nsecs
       the sleep test took 1053313 nsecs
       nr_run_events:        10
       nr_sleep_events:      1562
       nr_wakeup_events:     5
       task      0 (                  :1:         1), nr_events: 1
       task      1 (                  :2:         2), nr_events: 1
       task      2 (                  :3:         3), nr_events: 1
       task      3 (                  :5:         5), nr_events: 1
       ...
      Signed-off-by: default avatarYunlong Song <yunlong.song@huawei.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1427809596-29559-3-git-send-email-yunlong.song@huawei.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      a35e27d0
    • Yunlong Song's avatar
      perf sched replay: Use struct task_desc instead of struct task_task for correct meaning · 0755bc4d
      Yunlong Song authored
      There is no struct task_task at all, thus it is a typo error in the old
      commits, now fix it to what it should be in order to avoid unnecessary
      misunderstanding.
      Signed-off-by: default avatarYunlong Song <yunlong.song@huawei.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1427809596-29559-2-git-send-email-yunlong.song@huawei.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      0755bc4d
    • Jiri Olsa's avatar
      perf kmem: Respect -i option · 28939e1a
      Jiri Olsa authored
      Currently the perf kmem does not respect -i option.
      
      Initializing the file.path properly after options get parsed.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/1428298576-9785-2-git-send-email-namhyung@kernel.orgSigned-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      28939e1a
    • Namhyung Kim's avatar
      tools lib traceevent: Honor operator priority · 3201f0dc
      Namhyung Kim authored
      Currently it ignores operator priority and just sets processed args as a
      right operand.  But it could result in priority inversion in case that
      the right operand is also a operator arg and its priority is lower.
      
      For example, following print format is from new kmem events.
      
        "page=%p", REC->pfn != -1UL ? (((struct page *)(0xffffea0000000000UL)) + (REC->pfn)) : ((void *)0)
      
      But this was treated as below:
      
        REC->pfn != ((null - 1UL) ? ((struct page *)0xffffea0000000000UL + REC->pfn) : (void *) 0)
      
      In this case, the right arg was '?' operator which has lower priority.
      But it just sets the whole arg so making the output confusing - page was
      always 0 or 1 since that's the result of logical operation.
      
      With this patch, it can handle it properly like following:
      
        ((REC->pfn != (null - 1UL)) ? ((struct page *)0xffffea0000000000UL + REC->pfn) : (void *) 0)
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/1428298576-9785-10-git-send-email-namhyung@kernel.org
      [ Replaced 'swap' with 'rotate' in a comment as requested by Steve and agreed by Namhyung ]
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      3201f0dc
    • Wang Nan's avatar
      perf kmaps: Check kmaps to make code more robust · ba92732e
      Wang Nan authored
      This patch add checks in places where map__kmap is used to get kmaps
      from struct kmap.
      
      Error messages are added at map__kmap to warn invalid accessing of kmap
      (for the case of !map->dso->kernel, kmap(map) does not exists at all).
      
      Also, introduces map__kmaps() to warn uninitialized kmaps.
      Reviewed-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarWang Nan <wangnan0@huawei.com>
      Cc: pi3orama@163.com
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/r/1428394966-131044-2-git-send-email-wangnan0@huawei.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      ba92732e
    • He Kuang's avatar
      perf evlist: Fix inverted logic in perf_mmap__empty · 8ea92ceb
      He Kuang authored
      perf_evlist__mmap_consume() uses perf_mmap__empty() to judge whether
      perf_mmap is empty and can be released. But the result is inverted so
      fix it.
      Signed-off-by: default avatarHe Kuang <hekuang@huawei.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1428399071-7141-1-git-send-email-hekuang@huawei.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      8ea92ceb
  2. 03 Apr, 2015 1 commit
    • Ingo Molnar's avatar
      Merge tag 'perf-core-for-mingo' of... · 6645f318
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      User visible changes:
      
        - Support unnamed union/structure members data collection in 'perf probe'. (Masami Hiramatsu)
      
        - Support missing -f to override perf.data file ownership. (Yunlong Song)
      
      Infrastructure changes:
      
        - No need to lookup thread twice when processing samples in 'perf script'. (Arnaldo Carvalho de Melo)
      
        - No need to pass thread twice to the scripting callbacks. (Arnaldo Carvalho de Melo)
      
        - No need to pass thread twice to the db-export facility. (Arnaldo Carvalho de Melo)
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      6645f318
  3. 02 Apr, 2015 31 commits