    perf tools: Make perf_event__synthesize_mmap_events() scale · 88b897a3
    Stephane Eranian authored
    This patch significantly improves the execution time of
    perf_event__synthesize_mmap_events() when running perf record on systems
    where processes have lots of threads.
    
    It just happens that reading /proc/pid/maps (e.g. with cat) relies on an
    O(N^2) algorithm to generate the map lines in the maps file.  If a process
    has 1000 threads, then it necessarily has 1000 stacks.  For each vma, the
    kernel needs to check whether it corresponds to a thread's stack.  With a
    large number of threads, this can take a very long time.  I have seen
    latencies well over 10 minutes.
    
    As of today, perf does not use the fact that a mapping is a stack, therefore we
    can work around the issue by using /proc/pid/task/pid/maps.  This entry does
    not try to map a vma to a stack and is thus much faster with no loss of
    functionality.
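
    As a rough illustration of the workaround (not the actual perf code), the
    per-task maps file can be opened by formatting the pid twice into the path.
    The helper names task_maps_path() and count_task_maps() are hypothetical
    and exist only for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not from the perf source): format the per-task maps
 * path for the main thread of a process.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int task_maps_path(char *buf, size_t size, pid_t pid)
{
	int n;

	assert(buf != NULL && size > 0);
	n = snprintf(buf, size, "/proc/%d/task/%d/maps", (int)pid, (int)pid);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Count the mapping lines in the per-task maps file; -1 on error. */
static long count_task_maps(pid_t pid)
{
	char path[64];
	char line[4096];
	long count = 0;
	FILE *fp;

	if (task_maps_path(path, sizeof(path), pid) < 0)
		return -1;

	fp = fopen(path, "r");
	if (!fp)
		return -1;

	while (fgets(line, sizeof(line), fp)) {
		/* Count a line only once, even if it spans several reads. */
		if (strchr(line, '\n'))
			count++;
	}

	fclose(fp);
	return count;
}
```

    The line format is the same as in /proc/pid/maps, so a parser written for
    one file should work unchanged on the other.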
    
    The proc-map-timeout logic is kept in case users still want some upper limit.
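
    A minimal sketch of what such an upper limit could look like, assuming a
    monotonic clock is checked after each line is read.  The function names and
    the nanosecond unit are assumptions made for this example and do not
    reflect the perf implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Hypothetical helper: read lines from fp until EOF or until more than
 * timeout_ns nanoseconds have elapsed.  Returns the number of complete
 * lines read and sets *timed_out to 1 if the limit was hit.
 */
static long read_lines_with_timeout(FILE *fp, long long timeout_ns,
				    int *timed_out)
{
	char line[4096];
	long long start = now_ns();
	long count = 0;

	assert(fp != NULL && timed_out != NULL);
	*timed_out = 0;

	while (fgets(line, sizeof(line), fp)) {
		if (strchr(line, '\n'))
			count++;

		/* Give up once the time budget is exhausted. */
		if (now_ns() - start > timeout_ns) {
			*timed_out = 1;
			break;
		}
	}

	return count;
}
```

    When the limit is hit, the caller only has a partial view of the address
    space and would need to report or record that truncation.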
    
    In V2, we fix the file path from /proc/pid/tasks/pid/maps to the actual
    /proc/pid/task/pid/maps, tasks -> task.  Thanks to Arnaldo for catching this.
    
    Committer note:
    
    This problem seems to have been eliminated in the kernel since commit
    b18cb64e ("fs/proc: Stop trying to report thread stacks").
    
    Signed-off-by: Stephane Eranian <eranian@google.com>
    Acked-by: Jiri Olsa <jolsa@redhat.com>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Link: http://lkml.kernel.org/r/20170315135059.GC2177@redhat.com
    Link: http://lkml.kernel.org/r/1489598233-25586-1-git-send-email-eranian@google.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>