commit 75925e1a
Author: Andi Kleen <ak@linux.intel.com>

    perf/x86: Optimize stack walk user accesses

    Change the perf user stack walking to use the new
    __copy_from_user_nmi() and split each access into word-sized
    transfers. This allows the complete access to be inlined and
    optimized down to a single load.
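    
    A minimal sketch of the resulting loop, assuming the 64-bit frame
    layout used by the x86 user callchain walker (simplified:
    access_ok() and frame-validity checks are omitted, and
    walk_user_stack() is a stand-in name used only for illustration):
    
        /* Sketch: walk user stack frames with word-sized NMI-safe copies. */
        struct stack_frame {
                struct stack_frame __user *next_frame;
                unsigned long return_address;
        };
    
        static void walk_user_stack(struct perf_callchain_entry *entry,
                                    const void __user *fp)
        {
                while (entry->nr < PERF_MAX_STACK_DEPTH) {
                        struct stack_frame frame;
                        unsigned long bytes;
    
                        frame.next_frame = NULL;
                        frame.return_address = 0;
    
                        /* Each field is one word-sized transfer, so the copy
                         * can be inlined down to a single load per field. */
                        bytes = __copy_from_user_nmi(&frame.next_frame, fp, 8);
                        if (bytes != 0)
                                break;
                        bytes = __copy_from_user_nmi(&frame.return_address,
                                                     fp + 8, 8);
                        if (bytes != 0)
                                break;
    
                        perf_callchain_store(entry, frame.return_address);
                        fp = frame.next_frame;
                }
        }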
    
    The main advantage is that this avoids the overhead of double page
    faults. When a normal copy_from_user() fails, it re-executes the copy
    to compute an accurate count of the bytes that were not copied, which
    means taking the expensive page fault twice.
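    
    To make the double fault concrete, here is a schematic of that failure
    path (illustrative pseudocode only, not the actual x86 implementation;
    fast_word_copy() and byte_by_byte_tail() are hypothetical helpers
    standing in for the real copy loop and its fixup tail):
    
        /* Schematic of why a failing copy_from_user() faults twice
         * (illustrative only; helper names are hypothetical). */
        unsigned long copy_from_user_schematic(void *dst,
                                               const void __user *src,
                                               unsigned long n)
        {
                unsigned long left;
    
                left = fast_word_copy(dst, src, n);
                if (left == 0)
                        return 0;       /* common case: no fault taken */
    
                /*
                 * Fault #1 happened inside fast_word_copy().  To report an
                 * exact "bytes not copied" count, the slow path re-walks the
                 * range byte by byte and hits the same bad page again
                 * (fault #2) before giving up.
                 */
                return byte_by_byte_tail(dst, src, n);
        }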
    
    While walking stacks, a fault at some point is relatively common
    (typically when some part of the program isn't compiled with frame
    pointers), so this overhead is significant.
    
    The optimized copies avoid this problem because they perform each
    access only once. They are also much faster when the access does not
    fault, since they compile down to single instructions instead of
    complex function calls.
    
    While profiling a kernel build with -g, the patch brings down the
    average time of the PMI handler from 966ns to 552ns (-43%).
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Stephane Eranian <eranian@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vince Weaver <vincent.weaver@maine.edu>
    Link: http://lkml.kernel.org/r/1445551641-13379-2-git-send-email-andi@firstfloor.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>