    perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the... · cb06ac25
    Yunlong Song authored
    perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max
    
    struct task_desc *pid_to_task[MAX_PID] is currently allocated statically
    with a fixed size, which causes two problems:
    
    Problem 1: If pid_max, the maximum number of pids in the system, is
    much smaller than MAX_PID (1024*1000), most of the array is never used
    and stack memory is wasted. This can happen when the number of cpu
    cores is much smaller than 1000.
    
    Problem 2: If pid_max is raised from its default to a value larger
    than MAX_PID, perf sched replay hits an assertion failure. pid_max can
    be set as high as pid_max_max (see pidmap_init in kernel/pid.c), which
    equals PID_MAX_LIMIT. On x86_64, PID_MAX_LIMIT is 4*1024*1024 (defined
    in include/linux/threads.h), far larger than MAX_PID. Simply raising
    MAX_PID to that value is not an option: pid_to_task would then need
    32768 Kbytes (4*1024*1024*8/1024), much more than the default
    8192 Kbytes stack size of the calling process.
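
The size comparison in Problem 2 can be checked with a few lines of C. This is only a sketch of the arithmetic: table_kbytes and fits_in_stack are hypothetical helpers, not part of perf, and the 8-byte pointer size is the x86_64 value assumed by the figures above.

```c
#include <assert.h>
#include <stddef.h>

/* Values quoted in the commit message. */
#define MAX_PID        (1024UL * 1000)      /* old fixed table size */
#define PID_MAX_LIMIT  (4UL * 1024 * 1024)  /* x86_64 upper bound for pid_max */
#define STACK_KBYTES   8192UL               /* default stack size quoted above */

/* Size in Kbytes of a table of nr_entries pointers, ptr_size bytes each. */
unsigned long table_kbytes(unsigned long nr_entries, size_t ptr_size)
{
	assert(ptr_size > 0);
	return nr_entries * ptr_size / 1024;
}

/* Non-zero if a table of the given size is smaller than the default stack. */
int fits_in_stack(unsigned long kbytes)
{
	return kbytes < STACK_KBYTES;
}
```

With 8-byte pointers, table_kbytes(PID_MAX_LIMIT, 8) gives 32768, which does not fit in the 8192 Kbytes stack. The existing MAX_PID table already needs table_kbytes(MAX_PID, 8) = 8000 Kbytes.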
    
    To fix both problems, allocate the memory of pid_to_task dynamically
    with calloc.
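
A minimal sketch of what such a dynamic allocation could look like is given here. It is not the code of this patch: struct pid_table and the helpers read_pid_max, pid_table_init, pid_table_reserve and pid_table_free are hypothetical names, and growing the table on demand when a pid falls outside it is an assumption added for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct task_desc;	/* opaque here; the real struct lives in builtin-sched.c */

/* Hypothetical container for the table; not the layout used by perf. */
struct pid_table {
	struct task_desc **pid_to_task;
	unsigned long nr_slots;
};

/* Read pid_max from a procfs-style file, or return fallback on any error. */
unsigned long read_pid_max(const char *path, unsigned long fallback)
{
	unsigned long val = fallback;
	FILE *fp = fopen(path, "r");

	if (!fp)
		return fallback;
	if (fscanf(fp, "%lu", &val) != 1)
		val = fallback;
	fclose(fp);
	return val;
}

/* Allocate a zeroed table with slots for pids 0..pid_max. 0 on success. */
int pid_table_init(struct pid_table *t, unsigned long pid_max)
{
	assert(t != NULL);
	t->pid_to_task = calloc(pid_max + 1, sizeof(*t->pid_to_task));
	if (!t->pid_to_task)
		return -1;
	t->nr_slots = pid_max + 1;
	return 0;
}

/* Assumption: grow the table if pid lies beyond it; new slots start as NULL. */
int pid_table_reserve(struct pid_table *t, unsigned long pid)
{
	struct task_desc **tmp;
	unsigned long new_slots;

	if (pid < t->nr_slots)
		return 0;
	new_slots = pid + 1;
	tmp = realloc(t->pid_to_task, new_slots * sizeof(*tmp));
	if (!tmp)
		return -1;
	memset(tmp + t->nr_slots, 0, (new_slots - t->nr_slots) * sizeof(*tmp));
	t->pid_to_task = tmp;
	t->nr_slots = new_slots;
	return 0;
}

/* Release the table and reset the bookkeeping. */
void pid_table_free(struct pid_table *t)
{
	free(t->pid_to_task);
	t->pid_to_task = NULL;
	t->nr_slots = 0;
}
```

A caller would size the table once from read_pid_max("/proc/sys/kernel/pid_max", ...) and call pid_table_reserve before indexing with a pid, so that memory use follows the running system rather than a compile-time constant.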
    
    Example:
    
    Test environment: x86_64 with 160 cores
    
     $ cat /proc/sys/kernel/pid_max
     163840
     $ echo 1025000 > /proc/sys/kernel/pid_max
     $ cat /proc/sys/kernel/pid_max
     1025000
    
    Run some applications until the pid of some process is greater than
    the value of MAX_PID (1024*1000).
    
    Before this patch:
    
     $ perf sched replay
     run measurement overhead: 221 nsecs
     sleep measurement overhead: 55480 nsecs
     the run test took 1000008 nsecs
     the sleep test took 1063151 nsecs
     perf: builtin-sched.c:330: register_pid: Assertion `!(pid >= 1024000)'
     failed.
     Aborted
    
    After this patch:
    
     $ perf sched replay
     run measurement overhead: 221 nsecs
     sleep measurement overhead: 55435 nsecs
     the run test took 1000004 nsecs
     the sleep test took 1059312 nsecs
     nr_run_events:        10
     nr_sleep_events:      1562
     nr_wakeup_events:     5
     task      0 (                  :1:         1), nr_events: 1
     task      1 (                  :2:         2), nr_events: 1
     task      2 (                  :3:         3), nr_events: 1
     task      3 (                  :5:         5), nr_events: 1
     ...
    
    Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Wang Nan <wangnan0@huawei.com>
    Link: http://lkml.kernel.org/r/1427809596-29559-4-git-send-email-yunlong.song@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>