1. 12 May, 2016 14 commits
    • Arnaldo Carvalho de Melo's avatar
      perf evsel: Handle EACCESS + perf_event_paranoid=2 in fallback() · 08094828
      Arnaldo Carvalho de Melo authored
      Now with the default for the kernel.perf_event_paranoid sysctl being 2 [1]
      we need to fall back to :u, i.e. to set perf_event_attr.exclude_kernel
      to 1.
      
      Before:
      
        [acme@jouet linux]$ perf record usleep 1
        Error:
        You may not have permission to collect stats.
      
        Consider tweaking /proc/sys/kernel/perf_event_paranoid,
        which controls use of the performance events system by
        unprivileged users (without CAP_SYS_ADMIN).
      
        The current value is 2:
      
          -1: Allow use of (almost) all events by all users
        >= 0: Disallow raw tracepoint access by users without CAP_IOC_LOCK
        >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
        >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN
        [acme@jouet linux]$
      
      After:
      
        [acme@jouet linux]$ perf record usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.016 MB perf.data (7 samples) ]
        [acme@jouet linux]$ perf evlist
        cycles:u
        [acme@jouet linux]$ perf evlist -v
        cycles:u: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, exclude_kernel: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
        [acme@jouet linux]$
      
      And if the user turns on verbose mode, an explanation will appear:
      
        [acme@jouet linux]$ perf record -v usleep 1
        Warning:
        kernel.perf_event_paranoid=2, trying to fall back to excluding kernel samples
        mmap size 528384B
        [ perf record: Woken up 1 times to write data ]
        Looking at the vmlinux_path (8 entries long)
        Using /lib/modules/4.6.0-rc7+/build/vmlinux for symbols
        [ perf record: Captured and wrote 0.016 MB perf.data (7 samples) ]
        [acme@jouet linux]$
      
      [1] 0161028b ("perf/core: Change the default paranoia level to 2")
      Reported-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-b20jmx4dxt5hpaa9t2rroi0o@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      08094828
    • Arnaldo Carvalho de Melo's avatar
      perf evsel: Improve EPERM error handling in open_strerror() · 7d173913
      Arnaldo Carvalho de Melo authored
      We were showing a hardcoded default value for the kernel.perf_event_paranoid
      sysctl, now that it became more paranoid (1 -> 2 [1]), this would need to be
      updated, instead show the current value:
      
        [acme@jouet linux]$ perf record ls
        Error:
        You may not have permission to collect stats.
      
        Consider tweaking /proc/sys/kernel/perf_event_paranoid,
        which controls use of the performance events system by
        unprivileged users (without CAP_SYS_ADMIN).
      
        The current value is 2:
      
          -1: Allow use of (almost) all events by all users
        >= 0: Disallow raw tracepoint access by users without CAP_IOC_LOCK
        >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
        >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN
        [acme@jouet linux]$
      
      [1] 0161028b ("perf/core: Change the default paranoia level to 2")
      Reported-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-0gc4rdpg8d025r5not8s8028@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      7d173913
    • Steven Rostedt's avatar
      tools lib traceevent: Do not reassign parg after collapse_tree() · 106b816c
      Steven Rostedt authored
      At the end of process_filter(), collapse_tree() was changed to update
      the parg parameter, but the reassignment after the call wasn't removed.
      
      What happens is that the "current_op" gets modified and freed and parg
      is assigned to the new allocated argument. But after the call to
      collapse_tree(), parg is assigned again to the just freed "current_op",
      and this causes the tool to crash.
      
      The current_op variable must also be assigned to NULL in case of error,
      otherwise it will cause it to be free()ed twice.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Acked-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: stable@vger.kernel.org # 3.14+
      Fixes: 42d6194d ("tools lib traceevent: Refactor process_filter()")
      Link: http://lkml.kernel.org/r/20160511150936.678c18a1@gandalf.local.homeSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      106b816c
    • Arnaldo Carvalho de Melo's avatar
      perf probe: Check if dwarf_getlocations() is available · 49247345
      Arnaldo Carvalho de Melo authored
      If not, tell the user that:
      
        config/Makefile:273: Old libdw.h, finding variables at given 'perf probe' point will not work, install elfutils-devel/libdw-dev >= 0.157
      
      And return -ENOTSUPP in die_get_var_range(), failing features that
      need it, like the one pointed out above.
      
      This fixes the build on older systems, such as Ubuntu 12.04.5.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Vinson Lee <vlee@freedesktop.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-9l7luqkq4gfnx7vrklkq4obs@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      49247345
    • Arnaldo Carvalho de Melo's avatar
      perf dwarf: Guard !x86_64 definitions under #ifdef else clause · 62aa0e17
      Arnaldo Carvalho de Melo authored
      To fix the build on Fedora Rawhide (gcc 6.0.0 20160311 (Red Hat 6.0.0-0.17):
      
          CC       /tmp/build/perf/arch/x86/util/dwarf-regs.o
        arch/x86/util/dwarf-regs.c:66:36: error: 'x86_32_regoffset_table' defined but not used [-Werror=unused-const-variable=]
         static const struct pt_regs_offset x86_32_regoffset_table[] = {
                                            ^~~~~~~~~~~~~~~~~~~~~~
        cc1: all warnings being treated as errors
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-fghuksc1u8ln82bof4lwcj0o@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      62aa0e17
    • Arnaldo Carvalho de Melo's avatar
      perf tools: Use readdir() instead of deprecated readdir_r() · 22a9f41b
      Arnaldo Carvalho de Melo authored
      The readdir() function is thread safe as long as just one thread uses a
      DIR, which is the case when parsing tracepoint event definitions, to
      avoid breaking the build with glibc-2.23.90 (upcoming 2.24), use it
      instead of readdir_r().
      
      See: http://man7.org/linux/man-pages/man3/readdir.3.html
      
      "However, in modern implementations (including the glibc implementation),
      concurrent calls to readdir() that specify different directory streams
      are thread-safe.  In cases where multiple threads must read from the
      same directory stream, using readdir() with external synchronization is
      still preferable to the use of the deprecated readdir_r(3) function."
      
      Noticed while building on a Fedora Rawhide docker container.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-wddn49r6bz6wq4ee3dxbl7lo@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      22a9f41b
    • Arnaldo Carvalho de Melo's avatar
      perf thread_map: Use readdir() instead of deprecated readdir_r() · 7839b9f3
      Arnaldo Carvalho de Melo authored
      The readdir() function is thread safe as long as just one thread uses a
      DIR, which is the case in thread_map, so, to avoid breaking the build
      with glibc-2.23.90 (upcoming 2.24), use it instead of readdir_r().
      
      See: http://man7.org/linux/man-pages/man3/readdir.3.html
      
      "However, in modern implementations (including the glibc implementation),
      concurrent calls to readdir() that specify different directory streams
      are thread-safe.  In cases where multiple threads must read from the
      same directory stream, using readdir() with external synchronization is
      still preferable to the use of the deprecated readdir_r(3) function."
      
      Noticed while building on a Fedora Rawhide docker container.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-del8h2a0f40z75j4r42l96l0@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      7839b9f3
    • Arnaldo Carvalho de Melo's avatar
      perf script: Use readdir() instead of deprecated readdir_r() · 9a5f3bf3
      Arnaldo Carvalho de Melo authored
      The readdir() function is thread safe as long as just one thread uses a
      DIR, which is the case in 'perf script', so, to avoid breaking the build
      with glibc-2.23.90 (upcoming 2.24), use it instead of readdir_r().
      
      See: http://man7.org/linux/man-pages/man3/readdir.3.html
      
      "However, in modern implementations (including the glibc implementation),
      concurrent calls to readdir() that specify different directory streams
      are thread-safe.  In cases where multiple threads must read from the
      same directory stream, using readdir() with external synchronization is
      still preferable to the use of the deprecated readdir_r(3) function."
      
      Noticed while building on a Fedora Rawhide docker container.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-mt3xz7n2hl49ni2vx7kuq74g@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      9a5f3bf3
    • Arnaldo Carvalho de Melo's avatar
      perf tools: Use readdir() instead of deprecated readdir_r() · 2515e614
      Arnaldo Carvalho de Melo authored
      The readdir() function is thread safe as long as just one thread uses a
      DIR, which is the case when synthesizing events for pre-existing threads
      by traversing /proc, so, to avoid breaking the build with glibc-2.23.90
      (upcoming 2.24), use it instead of readdir_r().
      
      See: http://man7.org/linux/man-pages/man3/readdir.3.html
      
      "However, in modern implementations (including the glibc implementation),
      concurrent calls to readdir() that specify different directory streams
      are thread-safe.  In cases where multiple threads must read from the
      same directory stream, using readdir() with external synchronization is
      still preferable to the use of the deprecated readdir_r(3) function."
      
      Noticed while building on a Fedora Rawhide docker container.
      
         CC       /tmp/build/perf/util/event.o
        util/event.c: In function '__event__synthesize_thread':
        util/event.c:466:2: error: 'readdir_r' is deprecated [-Werror=deprecated-declarations]
          while (!readdir_r(tasks, &dirent, &next) && next) {
          ^~~~~
        In file included from /usr/include/features.h:368:0,
                         from /usr/include/stdint.h:25,
                         from /usr/lib/gcc/x86_64-redhat-linux/6.0.0/include/stdint.h:9,
                         from /git/linux/tools/include/linux/types.h:6,
                         from util/event.c:1:
        /usr/include/dirent.h:189:12: note: declared here
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-i1vj7nyjp2p750rirxgrfd3c@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      2515e614
    • Alexander Shishkin's avatar
      perf/core: Disable the event on a truncated AUX record · 9f448cd3
      Alexander Shishkin authored
      When the PMU driver reports a truncated AUX record, it effectively means
      that there is no more usable room in the event's AUX buffer (even though
      there may still be some room, so that perf_aux_output_begin() doesn't take
      action). At this point the consumer still has to be woken up and the event
      has to be disabled, otherwise the event will just keep spinning between
      perf_aux_output_begin() and perf_aux_output_end() until its context gets
      unscheduled.
      
      Again, for cpu-wide events this means never, so once in this condition,
      they will be forever losing data.
      
      Fix this by disabling the event and waking up the consumer in case of a
      truncated AUX record.
      Reported-by: default avatarMarkus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/1462886313-13660-3-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      9f448cd3
    • Alexander Shishkin's avatar
      perf/x86/intel/pt: Generate PMI in the STOP region as well · ab92b232
      Alexander Shishkin authored
      Currently, the PT driver always sets the PMI bit one region (page) before
      the STOP region so that we can wake up the consumer before we run out of
      room in the buffer and have to disable the event. However, we also need
      an interrupt in the last output region, so that we actually get to disable
      the event (if no more room from new data is available at that point),
      otherwise hardware just quietly refuses to start, but the event is
      scheduled in and we end up losing trace data till the event gets removed.
      
      For a cpu-wide event it is even worse since there may not be any
      re-scheduling at all and no chance for the ring buffer code to notice
      that its buffer is filled up and the event needs to be disabled (so that
      the consumer can re-enable it when it finishes reading the data out). In
      other words, all the trace data will be lost after the buffer gets filled
      up.
      
      This patch makes PT also generate a PMI when the last output region is
      full.
      Reported-by: default avatarMarkus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/1462886313-13660-2-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      ab92b232
    • Andrey Ryabinin's avatar
      perf/x86: Fix undefined shift on 32-bit kernels · 6d6f2833
      Andrey Ryabinin authored
      Jim reported:
      
      	UBSAN: Undefined behaviour in arch/x86/events/intel/core.c:3708:12
      	shift exponent 35 is too large for 32-bit type 'long unsigned int'
      
      The use of 'unsigned long' type obviously is not correct here, make it
      'unsigned long long' instead.
      Reported-by: default avatarJim Cromie <jim.cromie@gmail.com>
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Imre Palik <imrep@amazon.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 2c33645d ("perf/x86: Honor the architectural performance monitoring version")
      Link: http://lkml.kernel.org/r/1462974711-10037-1-git-send-email-aryabinin@virtuozzo.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      6d6f2833
    • Peter Zijlstra's avatar
      perf/x86/msr: Fix SMI overflow · 3c3116b7
      Peter Zijlstra authored
      We compute 'delta' and properly sign extend it and then ignore it and
      recompute the raw value, loosing the sign extention.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: kan.liang@intel.com
      Cc: linux-kernel@vger.kernel.org
      Cc: luto@kernel.org
      Cc: ray.huang@amd.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      3c3116b7
    • hchrzani's avatar
      perf/x86/intel/uncore: Fix CHA registers configuration procedure for Knights Landing platform · ec336c87
      hchrzani authored
      CHA events in Knights Landing platform require programming filter registers properly.
      Remote node, local node and NonNearMemCachable bits should be set to 1 at all times.
      Signed-off-by: default avatarHubert Chrzaniuk <hubert.chrzaniuk@intel.com>
      Signed-off-by: default avatarLawrence F Meadows <lawrence.f.meadows@intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: bp@suse.de
      Cc: harish.chegondi@intel.com
      Cc: hpa@zytor.com
      Cc: izumi.taku@jp.fujitsu.com
      Cc: kan.liang@intel.com
      Cc: lukasz.anaczkowski@intel.com
      Cc: vthakkar1994@gmail.com
      Fixes: 77af0037 ('perf/x86/intel/uncore: Add Knights Landing uncore PMU support')
      Link: http://lkml.kernel.org/r/1462779419-17115-2-git-send-email-hubert.chrzaniuk@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      ec336c87
  2. 11 May, 2016 1 commit
    • Namhyung Kim's avatar
      perf diff: Fix duplicated output column · e9d848cb
      Namhyung Kim authored
      The commit b97511c5 ("perf tools: Add overhead/overhead_children
      keys defaults via string") moved initialization of column headers but it
      missed to check the sort__mode.  As 'perf diff' doesn't call
      perf_hpp__init(), the setup_overhead() also should not be called.
      
      Before:
      
        # Baseline    Delta  Children  Overhead  Shared Object        Symbol
        # ........  .......  ........  ........  ...................  .......................
        #
            28.48%  -28.47%    28.48%    28.48%  [kernel.vmlinux ]    [k] intel_idle
            11.51%  -11.47%    11.51%    11.51%  libxul.so            [.] 0x0000000001a360f7
             3.49%   -3.49%     3.49%     3.49%  [kernel.vmlinux]     [k] generic_exec_single
             2.91%   -2.89%     2.91%     2.91%  libdbus-1.so.3.8.11  [.] 0x000000000000cdc2
             2.86%   -2.85%     2.86%     2.86%  libxcb.so.1.1.0      [.] 0x000000000000c890
             2.44%   -2.39%     2.44%     2.44%  [kernel.vmlinux]     [k] perf_event_aux_ctx
      
      After:
      
        # Baseline    Delta  Shared Object        Symbol
        # ........  .......  ...................  .......................
        #
            28.48%  -28.47%  [kernel.vmlinux]     [k] intel_idle
            11.51%  -11.47%  libxul.so            [.] 0x0000000001a360f7
             3.49%   -3.49%  [kernel.vmlinux]     [k] generic_exec_single
             2.91%   -2.89%  libdbus-1.so.3.8.11  [.] 0x000000000000cdc2
             2.86%   -2.85%  libxcb.so.1.1.0      [.] 0x000000000000c890
             2.44%   -2.39%  [kernel.vmlinux]     [k] perf_event_aux_ctx
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: <stable@vger.kernel.org> # 4.5+
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: b97511c5 ("perf tools: Add overhead/overhead_children keys defaults via string")
      Link: http://lkml.kernel.org/r/1462890384-12486-2-git-send-email-acme@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e9d848cb
  3. 10 May, 2016 10 commits
    • Linus Torvalds's avatar
      Merge tag 'pci-v4.6-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · c5114626
      Linus Torvalds authored
      Pull PCI fixes from Bjorn Helgaas:
       "Since v4.5, we've WARNed during resume if a PCI device, including a
        Thunderbolt device, was added while we were suspended.  A change we
        merged for v4.6-rc1 turned that warning into a system hang.  These
        enumeration patches from Lukas Wunner fix this issue:
      
         - Fix BUG on device attach failure
         - Do not treat EPROBE_DEFER as device attach failure"
      
      * tag 'pci-v4.6-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI: Do not treat EPROBE_DEFER as device attach failure
        PCI: Fix BUG on device attach failure
      c5114626
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7ec02e3b
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Two topology corner case fixes, and a MAINTAINERS file update for
        mmiotrace maintenance"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/topology: Set x86_max_cores to 1 for CONFIG_SMP=n
        MAINTAINERS: Add mmiotrace entry
        x86/topology: Handle CPUID bogosity gracefully
      7ec02e3b
    • Linus Torvalds's avatar
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ac244065
      Linus Torvalds authored
      Pull scheduler fixes from Ingo Molnar:
       "A UP kernel cpufreq fix and a rt/dl scheduler corner case fix"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/rt, sched/dl: Don't push if task's scheduling class was changed
        sched/fair: Fix !CONFIG_SMP kernel cpufreq governor breakage
      ac244065
    • Xunlei Pang's avatar
      sched/rt, sched/dl: Don't push if task's scheduling class was changed · 13b5ab02
      Xunlei Pang authored
      We got this warning:
      
          WARNING: CPU: 1 PID: 2468 at kernel/sched/core.c:1161 set_task_cpu+0x1af/0x1c0
          [...]
          Call Trace:
      
          dump_stack+0x63/0x87
          __warn+0xd1/0xf0
          warn_slowpath_null+0x1d/0x20
          set_task_cpu+0x1af/0x1c0
          push_dl_task.part.34+0xea/0x180
          push_dl_tasks+0x17/0x30
          __balance_callback+0x45/0x5c
          __sched_setscheduler+0x906/0xb90
          SyS_sched_setattr+0x150/0x190
          do_syscall_64+0x62/0x110
          entry_SYSCALL64_slow_path+0x25/0x25
      
      This corresponds to:
      
          WARN_ON_ONCE(p->state == TASK_RUNNING &&
                   p->sched_class == &fair_sched_class &&
                   (p->on_rq && !task_on_rq_migrating(p)))
      
      It happens because in find_lock_later_rq(), the task whose scheduling
      class was changed to fair class is still pushed away as if it were
      a deadline task ...
      
      So, check in find_lock_later_rq() after double_lock_balance(), if the
      scheduling class of the deadline task was changed, break and retry.
      
      Apply the same logic to RT tasks.
      Signed-off-by: default avatarXunlei Pang <xlpang@redhat.com>
      Reviewed-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Acked-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Link: http://lkml.kernel.org/r/1462767091-1215-1-git-send-email-xlpang@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      13b5ab02
    • Thomas Gleixner's avatar
      x86/topology: Set x86_max_cores to 1 for CONFIG_SMP=n · 8d415ee2
      Thomas Gleixner authored
      Josef reported that the uncore driver trips over with CONFIG_SMP=n because
      x86_max_cores is 16 instead of 12.
      
      The reason is, that for SMP=n the extended topology detection is a NOOP and
      the cache leaf is used to determine the number of cores. That's wrong in two
      aspects:
      
      1) The cache leaf enumerates the maximum addressable number of cores in the
         package, which is obviously not correct
      
      2) UP has no business with topology bits at all.
      
      Make intel_num_cpu_cores() return 1 for CONFIG_SMP=n
      Reported-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kernel-team <Kernel-team@fb.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Link: http://lkml.kernel.org/r/761b4a2a-0332-7954-f030-c6639f949612@fb.com
      8d415ee2
    • Linus Torvalds's avatar
      Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 2d0bd953
      Linus Torvalds authored
      Pull libnvdimm build fix from Dan Williams:
       "A build fix for the usage of HPAGE_SIZE in the last libnvdimm pull
        request.
      
        I have taken note that the kbuild robot build success test does not
        include results for alpha_allmodconfig.  Thanks to Guenter for the
        report.  It's tagged for -stable since the original fix will land
        there and cause build problems"
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        libnvdimm, pfn: fix ARCH=alpha allmodconfig build failure
      2d0bd953
    • Andy Lutomirski's avatar
      perf/core: Change the default paranoia level to 2 · 0161028b
      Andy Lutomirski authored
      Allowing unprivileged kernel profiling lets any user dump follow kernel
      control flow and dump kernel registers.  This most likely allows trivial
      kASLR bypassing, and it may allow other mischief as well.  (Off the top
      of my head, the PERF_SAMPLE_REGS_INTR output during /dev/urandom reads
      could be quite interesting.)
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0161028b
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 5c56b563
      Linus Torvalds authored
      Merge fixes from Andrew Morton:
       "2 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        zsmalloc: fix zs_can_compact() integer overflow
        Revert "proc/base: make prompt shell start from new line after executing "cat /proc/$pid/wchan""
      5c56b563
    • Sergey Senozhatsky's avatar
      zsmalloc: fix zs_can_compact() integer overflow · 44f43e99
      Sergey Senozhatsky authored
      zs_can_compact() has two race conditions in its core calculation:
      
      unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
      				zs_stat_get(class, OBJ_USED);
      
      1) classes are not locked, so the numbers of allocated and used
         objects can change by the concurrent ops happening on other CPUs
      2) shrinker invokes it from preemptible context
      
      Depending on the circumstances, thus, OBJ_ALLOCATED can become
      less than OBJ_USED, which can result in either very high or
      negative `total_scan' value calculated later in do_shrink_slab().
      
      do_shrink_slab() has some logic to prevent those cases:
      
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
       vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
      
      However, due to the way `total_scan' is calculated, not every
      shrinker->count_objects() overflow can be spotted and handled.
      To demonstrate the latter, I added some debugging code to do_shrink_slab()
      (x86_64) and the results were:
      
       vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
       vmscan: but total_scan > 0: 92679974445502
       vmscan: resulting total_scan: 92679974445502
      [..]
       vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
       vmscan: but total_scan > 0: 22634041808232578
       vmscan: resulting total_scan: 22634041808232578
      
      Even though shrinker->count_objects() has returned an overflowed value,
      the resulting `total_scan' is positive, and, what is more worrisome, it
      is insanely huge. This value is getting used later on in
      shrinker->scan_objects() loop:
      
              while (total_scan >= batch_size ||
                     total_scan >= freeable) {
                      unsigned long ret;
                      unsigned long nr_to_scan = min(batch_size, total_scan);
      
                      shrinkctl->nr_to_scan = nr_to_scan;
                      ret = shrinker->scan_objects(shrinker, shrinkctl);
                      if (ret == SHRINK_STOP)
                              break;
                      freed += ret;
      
                      count_vm_events(SLABS_SCANNED, nr_to_scan);
                      total_scan -= nr_to_scan;
      
                      cond_resched();
              }
      
      `total_scan >= batch_size' is true for a very-very long time and
      'total_scan >= freeable' is also true for quite some time, because
      `freeable < 0' and `total_scan' is large enough, for example,
      22634041808232578. The only break condition, in the given scheme of
      things, is shrinker->scan_objects() == SHRINK_STOP test, which is a
      bit too weak to rely on, especially in heavy zsmalloc-usage scenarios.
      
      To fix the issue, take a pool stat snapshot and use it instead of
      racy zs_stat_get() calls.
      
      Link: http://lkml.kernel.org/r/20160509140052.3389-1-sergey.senozhatsky@gmail.comSigned-off-by: default avatarSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: <stable@vger.kernel.org>        [4.3+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      44f43e99
    • Robin Humble's avatar
      Revert "proc/base: make prompt shell start from new line after executing "cat /proc/$pid/wchan"" · 1e92a61c
      Robin Humble authored
      This reverts the 4.6-rc1 commit 7e2bc81d ("proc/base: make prompt
      shell start from new line after executing "cat /proc/$pid/wchan")
      because it breaks /proc/$PID/whcan formatting in ps and top.
      
      Revert also because the patch is inconsistent - it adds a newline at the
      end of only the '0' wchan, and does not add a newline when
      /proc/$PID/wchan contains a symbol name.
      
      eg.
      $ ps -eo pid,stat,wchan,comm
      PID STAT WCHAN  COMMAND
      ...
      1189 S    -      dbus-launch
      1190 Ssl  0
      dbus-daemon
      1198 Sl   0
      lightdm
      1299 Ss   ep_pol systemd
      1301 S    -      (sd-pam)
      1304 Ss   wait   sh
      Signed-off-by: default avatarRobin Humble <plaguedbypenguins@gmail.com>
      Cc: Minfei Huang <mnfhuang@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1e92a61c
  4. 09 May, 2016 11 commits
  5. 08 May, 2016 2 commits
  6. 07 May, 2016 2 commits