Commit 9d9420f1 authored by Linus Torvalds

Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:
 "Kernel side updates:

   - Fix and enhance poll support (Jiri Olsa)

   - Re-enable inheritance optimization (Jiri Olsa)

   - Enhance Intel memory events support (Stephane Eranian)

   - Refactor the Intel uncore driver to be more maintainable (Zheng
     Yan)

   - Enhance and fix Intel CPU and uncore PMU drivers (Peter Zijlstra,
     Andi Kleen)

   - [ plus various smaller fixes/cleanups ]

  User visible tooling updates:

   - Add +field argument support for the --fields option, so that one
     can add fields to the default list of fields to show, i.e. now one
     can just do:

         perf report --fields +pid

     And the pid will appear in addition to the default fields (Jiri
     Olsa)

   - Add +field argument support for --sort option (Jiri Olsa)
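
     For example, to append a sort key to the defaults (the key name
     here is just illustrative):

         perf report --sort +srcline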

   - Honour -w in the report tools (report, top), allowing one to
     specify the widths for the histogram entries columns (Namhyung
     Kim)
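
     For example (one width per column, left to right; 0 leaves a
     column unlimited):

         perf report -w 40,0,20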

   - Properly show submicrosecond times in 'perf kvm stat' (Christian
     Borntraeger)

   - Add beautifier for mremap flags param in 'trace' (Alex Snast)

   - perf script: Allow callchains if any event samples them

   - Don't truncate Intel style addresses in 'annotate' (Alex Converse)

   - Allow profiling when kptr_restrict == 1 for non-root users; kernel
     samples will just remain unresolved (Andi Kleen)

   - Allow configuring default options for callchains in config file
     (Namhyung Kim)
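
     For example, such defaults could go in ~/.perfconfig (a sketch;
     the key names follow the call-graph.* options introduced below):

         [call-graph]
             record-mode = dwarf
             print-type = graph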

   - Support operations for shared futexes (Davidlohr Bueso)

   - "perf kvm stat report" improvements by Alexander Yarygin:
       -  Save pid string in opts.target.pid
       -  Enable the target.system_wide flag
       -  Unify the title bar output

   - [ plus lots of other fixes and small improvements.  ]

  Tooling infrastructure changes:

   - Refactor unit and scale function parameters for PMU parsing
     routines (Matt Fleming)

   - Improve DSO long names lookup with rbtree, resulting in great
     speedup for workloads with lots of DSOs (Waiman Long)

   - We were not handling POLLHUP notifications for event file
     descriptors

     Fix it by filtering entries in the events file descriptor array
     after poll() returns, refcounting mmaps so that when the last fd
     pointing to a perf mmap goes away we do the unmap (Arnaldo Carvalho
     de Melo)
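
     A sketch of the resulting loop shape, using the new fdarray API
     (the destructor name is illustrative):

         fdarray__poll(fda, timeout);
         if (fdarray__filter(fda, POLLHUP, unmap_entry) == 0)
                 done = true;   /* no pollable fds left */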

   - Intel PT prep work, from Adrian Hunter, including:
       - Let a user specify a PMU event without any config terms
       - Add perf-with-kcore script
       - Let default config be defined for a PMU
       - Add perf_pmu__scan_file()
       - Add a 'perf test' for tracking with sched_switch
       - Add 'flush' callback to scripting API

   - Use ring buffer consume method to look like other tools (Arnaldo
     Carvalho de Melo)

   - hists browser (used in top and report) refactorings, getting rid
     of unused variables and reducing source code size by handling
     similar cases in fewer functions (Namhyung Kim).

   - Replace thread unsafe strerror() with strerror_r() across the
     whole tools/perf/ tree (Masami Hiramatsu)

   - Rename ordered_samples to ordered_events and allow setting a queue
     size for ordering events (Jiri Olsa)

   - [ plus lots of fixes, cleanups and other improvements ]"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (198 commits)
  perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment
  perf/x86/intel/uncore: Fix minor race in box set up
  perf record: Fix error message for --filter option not coming after tracepoint
  perf tools: Fix build breakage on arm64 targets
  perf symbols: Improve DSO long names lookup speed with rbtree
  perf symbols: Encapsulate dsos list head into struct dsos
  perf bench futex: Sanitize -q option in requeue
  perf bench futex: Support operations for shared futexes
  perf trace: Fix mmap return address truncation to 32-bit
  perf tools: Refactor unit and scale function parameters
  perf tools: Fix line number in the config file error message
  perf tools: Convert {record,top}.call-graph option to call-graph.record-mode
  perf tools: Introduce perf_callchain_config()
  perf callchain: Move some parser functions to callchain.c
  perf tools: Move callchain config from record_opts to callchain_param
  perf hists browser: Fix callchain print bug on TUI
  perf tools: Use ACCESS_ONCE() instead of volatile cast
  perf tools: Modify error code for when perf_session__new() fails
  perf tools: Fix perf record as non root with kptr_restrict == 1
  perf stat: Fix --per-core on multi socket systems
  ...
parents 6d5f0ebf cc6cd47e
......@@ -51,6 +51,14 @@
ARCH_PERFMON_EVENTSEL_EDGE | \
ARCH_PERFMON_EVENTSEL_INV | \
ARCH_PERFMON_EVENTSEL_CMASK)
#define X86_ALL_EVENT_FLAGS \
(ARCH_PERFMON_EVENTSEL_EDGE | \
ARCH_PERFMON_EVENTSEL_INV | \
ARCH_PERFMON_EVENTSEL_CMASK | \
ARCH_PERFMON_EVENTSEL_ANY | \
ARCH_PERFMON_EVENTSEL_PIN_CONTROL | \
HSW_IN_TX | \
HSW_IN_TX_CHECKPOINTED)
#define AMD64_RAW_EVENT_MASK \
(X86_RAW_EVENT_MASK | \
AMD64_EVENTSEL_EVENT)
......
......@@ -39,7 +39,9 @@ obj-$(CONFIG_CPU_SUP_AMD) += perf_event_amd_iommu.o
endif
obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_p6.o perf_event_knc.o perf_event_p4.o
obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_intel_lbr.o perf_event_intel_ds.o perf_event_intel.o
obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_intel_uncore.o perf_event_intel_rapl.o
obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_intel_uncore.o perf_event_intel_uncore_snb.o
obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_intel_uncore_snbep.o perf_event_intel_uncore_nhmex.o
obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_intel_rapl.o
endif
......
......@@ -243,7 +243,8 @@ static bool check_hw_exists(void)
msr_fail:
printk(KERN_CONT "Broken PMU hardware detected, using software events only.\n");
printk(KERN_ERR "Failed to access perfctr msr (MSR %x is %Lx)\n", reg, val_new);
printk(boot_cpu_has(X86_FEATURE_HYPERVISOR) ? KERN_INFO : KERN_ERR
"Failed to access perfctr msr (MSR %x is %Lx)\n", reg, val_new);
return false;
}
......@@ -387,7 +388,7 @@ int x86_pmu_hw_config(struct perf_event *event)
precise++;
/* Support for IP fixup */
if (x86_pmu.lbr_nr)
if (x86_pmu.lbr_nr || x86_pmu.intel_cap.pebs_format >= 2)
precise++;
}
......@@ -443,6 +444,12 @@ int x86_pmu_hw_config(struct perf_event *event)
if (event->attr.type == PERF_TYPE_RAW)
event->hw.config |= event->attr.config & X86_RAW_EVENT_MASK;
if (event->attr.sample_period && x86_pmu.limit_period) {
if (x86_pmu.limit_period(event, event->attr.sample_period) >
event->attr.sample_period)
return -EINVAL;
}
return x86_setup_perfctr(event);
}
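
/*
 * Hedged sketch (not part of this diff): what a ->limit_period()
 * implementation could look like.  The name and the 128 floor are
 * illustrative, not the in-tree quirk; the hook returns an adjusted
 * period that the PMU can actually be programmed with.
 */
static unsigned hsw_limit_period_sketch(struct perf_event *event __maybe_unused,
					unsigned left)
{
	if (left < 128)
		left = 128;	/* enforce a minimum sustainable period */
	return left;
}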
......@@ -980,6 +987,9 @@ int x86_perf_event_set_period(struct perf_event *event)
if (left > x86_pmu.max_period)
left = x86_pmu.max_period;
if (x86_pmu.limit_period)
left = x86_pmu.limit_period(event, left);
per_cpu(pmc_prev_left[idx], smp_processor_id()) = left;
/*
......
......@@ -67,8 +67,10 @@ struct event_constraint {
*/
#define PERF_X86_EVENT_PEBS_LDLAT 0x1 /* ld+ldlat data address sampling */
#define PERF_X86_EVENT_PEBS_ST 0x2 /* st data address sampling */
#define PERF_X86_EVENT_PEBS_ST_HSW 0x4 /* haswell style st data sampling */
#define PERF_X86_EVENT_PEBS_ST_HSW 0x4 /* haswell style datala, store */
#define PERF_X86_EVENT_COMMITTED 0x8 /* event passed commit_txn */
#define PERF_X86_EVENT_PEBS_LD_HSW 0x10 /* haswell style datala, load */
#define PERF_X86_EVENT_PEBS_NA_HSW 0x20 /* haswell style datala, unknown */
struct amd_nb {
int nb_id; /* NorthBridge id */
......@@ -252,18 +254,52 @@ struct cpu_hw_events {
EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK)
#define INTEL_PLD_CONSTRAINT(c, n) \
__EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK, \
__EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK|X86_ALL_EVENT_FLAGS, \
HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_LDLAT)
#define INTEL_PST_CONSTRAINT(c, n) \
__EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK, \
__EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK|X86_ALL_EVENT_FLAGS, \
HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_ST)
/* DataLA version of store sampling without extra enable bit. */
#define INTEL_PST_HSW_CONSTRAINT(c, n) \
__EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK, \
/* Event constraint, but match on all event flags too. */
#define INTEL_FLAGS_EVENT_CONSTRAINT(c, n) \
EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK|X86_ALL_EVENT_FLAGS)
/* Check only flags, but allow all event/umask */
#define INTEL_ALL_EVENT_CONSTRAINT(code, n) \
EVENT_CONSTRAINT(code, n, X86_ALL_EVENT_FLAGS)
/* Check flags and event code, and set the HSW store flag */
#define INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_ST(code, n) \
__EVENT_CONSTRAINT(code, n, \
ARCH_PERFMON_EVENTSEL_EVENT|X86_ALL_EVENT_FLAGS, \
HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_ST_HSW)
/* Check flags and event code, and set the HSW load flag */
#define INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD(code, n) \
__EVENT_CONSTRAINT(code, n, \
ARCH_PERFMON_EVENTSEL_EVENT|X86_ALL_EVENT_FLAGS, \
HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_LD_HSW)
/* Check flags and event code/umask, and set the HSW store flag */
#define INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(code, n) \
__EVENT_CONSTRAINT(code, n, \
INTEL_ARCH_EVENT_MASK|X86_ALL_EVENT_FLAGS, \
HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_ST_HSW)
/* Check flags and event code/umask, and set the HSW load flag */
#define INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(code, n) \
__EVENT_CONSTRAINT(code, n, \
INTEL_ARCH_EVENT_MASK|X86_ALL_EVENT_FLAGS, \
HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_LD_HSW)
/* Check flags and event code/umask, and set the HSW N/A flag */
#define INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_NA(code, n) \
__EVENT_CONSTRAINT(code, n, \
INTEL_ARCH_EVENT_MASK|X86_ALL_EVENT_FLAGS, \
HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_NA_HSW)
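
/*
 * Hedged usage example (not from this hunk): a PEBS constraint table
 * entry built with the new DataLA macros; 0x82d0 is the Haswell
 * MEM_UOPS_RETIRED.ALL_STORES encoding, shown for illustration:
 *
 *	INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x82d0, 0xf),
 */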
/*
* We define the end marker as having a weight of -1
* to enable blacklisting of events using a counter bitmask
......@@ -409,6 +445,7 @@ struct x86_pmu {
struct x86_pmu_quirk *quirks;
int perfctr_second_write;
bool late_ack;
unsigned (*limit_period)(struct perf_event *event, unsigned l);
/*
* sysfs attrs
......
......@@ -697,7 +697,7 @@ static const int snb_lbr_sel_map[PERF_SAMPLE_BRANCH_MAX] = {
};
/* core */
void intel_pmu_lbr_init_core(void)
void __init intel_pmu_lbr_init_core(void)
{
x86_pmu.lbr_nr = 4;
x86_pmu.lbr_tos = MSR_LBR_TOS;
......@@ -712,7 +712,7 @@ void intel_pmu_lbr_init_core(void)
}
/* nehalem/westmere */
void intel_pmu_lbr_init_nhm(void)
void __init intel_pmu_lbr_init_nhm(void)
{
x86_pmu.lbr_nr = 16;
x86_pmu.lbr_tos = MSR_LBR_TOS;
......@@ -733,7 +733,7 @@ void intel_pmu_lbr_init_nhm(void)
}
/* sandy bridge */
void intel_pmu_lbr_init_snb(void)
void __init intel_pmu_lbr_init_snb(void)
{
x86_pmu.lbr_nr = 16;
x86_pmu.lbr_tos = MSR_LBR_TOS;
......@@ -753,7 +753,7 @@ void intel_pmu_lbr_init_snb(void)
}
/* atom */
void intel_pmu_lbr_init_atom(void)
void __init intel_pmu_lbr_init_atom(void)
{
/*
* only models starting at stepping 10 seems
......
......@@ -8,28 +8,28 @@
* Copyright (C) 2011-2012 Peter Zijlstra <pzijlstr@redhat.com>
*
* Jump labels provide an interface to generate dynamic branches using
* self-modifying code. Assuming toolchain and architecture support the result
* of a "if (static_key_false(&key))" statement is a unconditional branch (which
* self-modifying code. Assuming toolchain and architecture support, the result
* of a "if (static_key_false(&key))" statement is an unconditional branch (which
* defaults to false - and the true block is placed out of line).
*
* However at runtime we can change the branch target using
* static_key_slow_{inc,dec}(). These function as a 'reference' count on the key
* object and for as long as there are references all branches referring to
* object, and for as long as there are references all branches referring to
* that particular key will point to the (out of line) true block.
*
* Since this relies on modifying code the static_key_slow_{inc,dec}() functions
* Since this relies on modifying code, the static_key_slow_{inc,dec}() functions
* must be considered absolute slow paths (machine wide synchronization etc.).
* OTOH, since the affected branches are unconditional their runtime overhead
* OTOH, since the affected branches are unconditional, their runtime overhead
* will be absolutely minimal, esp. in the default (off) case where the total
* effect is a single NOP of appropriate size. The on case will patch in a jump
* to the out-of-line block.
*
* When the control is directly exposed to userspace it is prudent to delay the
* When the control is directly exposed to userspace, it is prudent to delay the
* decrement to avoid high frequency code modifications which can (and do)
* cause significant performance degradation. Struct static_key_deferred and
* static_key_slow_dec_deferred() provide for this.
*
* Lacking toolchain and or architecture support, it falls back to a simple
* Lacking toolchain and/or architecture support, jump labels fall back to a simple
* conditional branch.
*
* struct static_key my_key = STATIC_KEY_INIT_TRUE;
......@@ -43,8 +43,7 @@
*
* Not initializing the key (static data is initialized to 0s anyway) is the
* same as using STATIC_KEY_INIT_FALSE.
*
*/
*/
#include <linux/types.h>
#include <linux/compiler.h>
......
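
/*
 * Minimal usage sketch of the API documented above; do_rare_work()
 * is hypothetical:
 *
 *	static struct static_key my_key = STATIC_KEY_INIT_FALSE;
 *
 *	if (static_key_false(&my_key))
 *		do_rare_work();		(the out-of-line true block)
 *
 * and, from some slow path, static_key_slow_inc(&my_key) patches the
 * NOP into a jump while static_key_slow_dec(&my_key) undoes it once
 * the reference count drops back to zero.
 */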
......@@ -2538,6 +2538,7 @@
#define PCI_DEVICE_ID_INTEL_EESSC 0x0008
#define PCI_DEVICE_ID_INTEL_SNB_IMC 0x0100
#define PCI_DEVICE_ID_INTEL_IVB_IMC 0x0154
#define PCI_DEVICE_ID_INTEL_IVB_E3_IMC 0x0150
#define PCI_DEVICE_ID_INTEL_HSW_IMC 0x0c00
#define PCI_DEVICE_ID_INTEL_PXHD_0 0x0320
#define PCI_DEVICE_ID_INTEL_PXHD_1 0x0321
......
......@@ -52,6 +52,7 @@ struct perf_guest_info_callbacks {
#include <linux/atomic.h>
#include <linux/sysfs.h>
#include <linux/perf_regs.h>
#include <linux/workqueue.h>
#include <asm/local.h>
struct perf_callchain_entry {
......@@ -268,6 +269,7 @@ struct pmu {
* enum perf_event_active_state - the states of an event
*/
enum perf_event_active_state {
PERF_EVENT_STATE_EXIT = -3,
PERF_EVENT_STATE_ERROR = -2,
PERF_EVENT_STATE_OFF = -1,
PERF_EVENT_STATE_INACTIVE = 0,
......@@ -507,6 +509,9 @@ struct perf_event_context {
int nr_cgroups; /* cgroup evts */
int nr_branch_stack; /* branch_stack evt */
struct rcu_head rcu_head;
struct delayed_work orphans_remove;
bool orphans_remove_sched;
};
/*
......@@ -604,6 +609,13 @@ struct perf_sample_data {
u64 txn;
};
/* default value for data source */
#define PERF_MEM_NA (PERF_MEM_S(OP, NA) |\
PERF_MEM_S(LVL, NA) |\
PERF_MEM_S(SNOOP, NA) |\
PERF_MEM_S(LOCK, NA) |\
PERF_MEM_S(TLB, NA))
static inline void perf_sample_data_init(struct perf_sample_data *data,
u64 addr, u64 period)
{
......@@ -616,7 +628,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
data->regs_user.regs = NULL;
data->stack_user_size = 0;
data->weight = 0;
data->data_src.val = 0;
data->data_src.val = PERF_MEM_NA;
data->txn = 0;
}
......
......@@ -52,7 +52,7 @@ static void release_callchain_buffers(void)
struct callchain_cpus_entries *entries;
entries = callchain_cpus_entries;
rcu_assign_pointer(callchain_cpus_entries, NULL);
RCU_INIT_POINTER(callchain_cpus_entries, NULL);
call_rcu(&entries->rcu_head, release_callchain_buffers_rcu);
}
......
......@@ -47,6 +47,8 @@
#include <asm/irq_regs.h>
static struct workqueue_struct *perf_wq;
struct remote_function_call {
struct task_struct *p;
int (*func)(void *info);
......@@ -120,6 +122,13 @@ static int cpu_function_call(int cpu, int (*func) (void *info), void *info)
return data.ret;
}
#define EVENT_OWNER_KERNEL ((void *) -1)
static bool is_kernel_event(struct perf_event *event)
{
return event->owner == EVENT_OWNER_KERNEL;
}
#define PERF_FLAG_ALL (PERF_FLAG_FD_NO_GROUP |\
PERF_FLAG_FD_OUTPUT |\
PERF_FLAG_PID_CGROUP |\
......@@ -1370,6 +1379,45 @@ static void perf_group_detach(struct perf_event *event)
perf_event__header_size(tmp);
}
/*
* User event whose owner task is gone.
*/
static bool is_orphaned_event(struct perf_event *event)
{
return event && !is_kernel_event(event) && !event->owner;
}
/*
* Event has a parent, but the parent's task has finished and the
* event stays alive only because children hold a reference.
*/
static bool is_orphaned_child(struct perf_event *event)
{
return is_orphaned_event(event->parent);
}
static void orphans_remove_work(struct work_struct *work);
static void schedule_orphans_remove(struct perf_event_context *ctx)
{
if (!ctx->task || ctx->orphans_remove_sched || !perf_wq)
return;
if (queue_delayed_work(perf_wq, &ctx->orphans_remove, 1)) {
get_ctx(ctx);
ctx->orphans_remove_sched = true;
}
}
static int __init perf_workqueue_init(void)
{
perf_wq = create_singlethread_workqueue("perf");
WARN(!perf_wq, "failed to create perf workqueue\n");
return perf_wq ? 0 : -1;
}
core_initcall(perf_workqueue_init);
static inline int
event_filter_match(struct perf_event *event)
{
......@@ -1419,6 +1467,9 @@ event_sched_out(struct perf_event *event,
if (event->attr.exclusive || !cpuctx->active_oncpu)
cpuctx->exclusive = 0;
if (is_orphaned_child(event))
schedule_orphans_remove(ctx);
perf_pmu_enable(event->pmu);
}
......@@ -1726,6 +1777,9 @@ event_sched_in(struct perf_event *event,
if (event->attr.exclusive)
cpuctx->exclusive = 1;
if (is_orphaned_child(event))
schedule_orphans_remove(ctx);
out:
perf_pmu_enable(event->pmu);
......@@ -2326,7 +2380,7 @@ static void perf_event_context_sched_out(struct task_struct *task, int ctxn,
next_parent = rcu_dereference(next_ctx->parent_ctx);
/* If neither context has a parent context, they cannot be clones. */
if (!parent || !next_parent)
if (!parent && !next_parent)
goto unlock;
if (next_parent == ctx || next_ctx == parent || next_parent == parent) {
......@@ -3073,6 +3127,7 @@ static void __perf_event_init_context(struct perf_event_context *ctx)
INIT_LIST_HEAD(&ctx->flexible_groups);
INIT_LIST_HEAD(&ctx->event_list);
atomic_set(&ctx->refcount, 1);
INIT_DELAYED_WORK(&ctx->orphans_remove, orphans_remove_work);
}
static struct perf_event_context *
......@@ -3318,16 +3373,12 @@ static void free_event(struct perf_event *event)
}
/*
* Called when the last reference to the file is gone.
* Remove user event from the owner task.
*/
static void put_event(struct perf_event *event)
static void perf_remove_from_owner(struct perf_event *event)
{
struct perf_event_context *ctx = event->ctx;
struct task_struct *owner;
if (!atomic_long_dec_and_test(&event->refcount))
return;
rcu_read_lock();
owner = ACCESS_ONCE(event->owner);
/*
......@@ -3360,6 +3411,20 @@ static void put_event(struct perf_event *event)
mutex_unlock(&owner->perf_event_mutex);
put_task_struct(owner);
}
}
/*
* Called when the last reference to the file is gone.
*/
static void put_event(struct perf_event *event)
{
struct perf_event_context *ctx = event->ctx;
if (!atomic_long_dec_and_test(&event->refcount))
return;
if (!is_kernel_event(event))
perf_remove_from_owner(event);
WARN_ON_ONCE(ctx->parent_ctx);
/*
......@@ -3394,6 +3459,42 @@ static int perf_release(struct inode *inode, struct file *file)
return 0;
}
/*
* Remove all orphaned events from the context.
*/
static void orphans_remove_work(struct work_struct *work)
{
struct perf_event_context *ctx;
struct perf_event *event, *tmp;
ctx = container_of(work, struct perf_event_context,
orphans_remove.work);
mutex_lock(&ctx->mutex);
list_for_each_entry_safe(event, tmp, &ctx->event_list, event_entry) {
struct perf_event *parent_event = event->parent;
if (!is_orphaned_child(event))
continue;
perf_remove_from_context(event, true);
mutex_lock(&parent_event->child_mutex);
list_del_init(&event->child_list);
mutex_unlock(&parent_event->child_mutex);
free_event(event);
put_event(parent_event);
}
raw_spin_lock_irq(&ctx->lock);
ctx->orphans_remove_sched = false;
raw_spin_unlock_irq(&ctx->lock);
mutex_unlock(&ctx->mutex);
put_ctx(ctx);
}
u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running)
{
struct perf_event *child;
......@@ -3491,6 +3592,19 @@ static int perf_event_read_one(struct perf_event *event,
return n * sizeof(u64);
}
static bool is_event_hup(struct perf_event *event)
{
bool no_children;
if (event->state != PERF_EVENT_STATE_EXIT)
return false;
mutex_lock(&event->child_mutex);
no_children = list_empty(&event->child_list);
mutex_unlock(&event->child_mutex);
return no_children;
}
/*
* Read the performance event - simple non blocking version for now
*/
......@@ -3532,7 +3646,12 @@ static unsigned int perf_poll(struct file *file, poll_table *wait)
{
struct perf_event *event = file->private_data;
struct ring_buffer *rb;
unsigned int events = POLL_HUP;
unsigned int events = POLLHUP;
poll_wait(file, &event->waitq, wait);
if (is_event_hup(event))
return events;
/*
* Pin the event->rb by taking event->mmap_mutex; otherwise
......@@ -3543,9 +3662,6 @@ static unsigned int perf_poll(struct file *file, poll_table *wait)
if (rb)
events = atomic_xchg(&rb->poll, 0);
mutex_unlock(&event->mmap_mutex);
poll_wait(file, &event->waitq, wait);
return events;
}
......@@ -5809,7 +5925,7 @@ static void swevent_hlist_release(struct swevent_htable *swhash)
if (!hlist)
return;
rcu_assign_pointer(swhash->swevent_hlist, NULL);
RCU_INIT_POINTER(swhash->swevent_hlist, NULL);
kfree_rcu(hlist, rcu_head);
}
......@@ -7392,6 +7508,9 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
goto err;
}
/* Mark owner so we could distinguish it from user events. */
event->owner = EVENT_OWNER_KERNEL;
account_event(event);
ctx = find_get_context(event->pmu, task, cpu);
......@@ -7478,6 +7597,12 @@ static void sync_child_event(struct perf_event *child_event,
list_del_init(&child_event->child_list);
mutex_unlock(&parent_event->child_mutex);
/*
* Make sure the user/parent is notified that we just
* lost one event.
*/
perf_event_wakeup(parent_event);
/*
* Release the parent event, if this was the last
* reference to it.
......@@ -7512,6 +7637,9 @@ __perf_event_exit_task(struct perf_event *child_event,
if (child_event->parent) {
sync_child_event(child_event, child);
free_event(child_event);
} else {
child_event->state = PERF_EVENT_STATE_EXIT;
perf_event_wakeup(child_event);
}
}
......@@ -7695,6 +7823,7 @@ inherit_event(struct perf_event *parent_event,
struct perf_event *group_leader,
struct perf_event_context *child_ctx)
{
enum perf_event_active_state parent_state = parent_event->state;
struct perf_event *child_event;
unsigned long flags;
......@@ -7715,7 +7844,8 @@ inherit_event(struct perf_event *parent_event,
if (IS_ERR(child_event))
return child_event;
if (!atomic_long_inc_not_zero(&parent_event->refcount)) {
if (is_orphaned_event(parent_event) ||
!atomic_long_inc_not_zero(&parent_event->refcount)) {
free_event(child_event);
return NULL;
}
......@@ -7727,7 +7857,7 @@ inherit_event(struct perf_event *parent_event,
* not its attr.disabled bit. We hold the parent's mutex,
* so we won't race with perf_event_{en, dis}able_family.
*/
if (parent_event->state >= PERF_EVENT_STATE_INACTIVE)
if (parent_state >= PERF_EVENT_STATE_INACTIVE)
child_event->state = PERF_EVENT_STATE_INACTIVE;
else
child_event->state = PERF_EVENT_STATE_OFF;
......
......@@ -10,9 +10,14 @@ LIB_OBJS=
LIB_H += fs/debugfs.h
LIB_H += fs/fs.h
# See comment below about piggybacking...
LIB_H += fd/array.h
LIB_OBJS += $(OUTPUT)fs/debugfs.o
LIB_OBJS += $(OUTPUT)fs/fs.o
# XXX piggybacking here, need to introduce libapikfd, or rename this
# to plain libapik.a and make it have it all api goodies
LIB_OBJS += $(OUTPUT)fd/array.o
LIBFILE = libapikfs.a
......@@ -29,7 +34,7 @@ $(LIBFILE): $(LIB_OBJS)
$(LIB_OBJS): $(LIB_H)
libapi_dirs:
$(QUIET_MKDIR)mkdir -p $(OUTPUT)fs/
$(QUIET_MKDIR)mkdir -p $(OUTPUT)fd $(OUTPUT)fs
$(OUTPUT)%.o: %.c libapi_dirs
$(QUIET_CC)$(CC) -o $@ -c $(ALL_CFLAGS) $<
......
/*
* Copyright (C) 2014, Red Hat Inc, Arnaldo Carvalho de Melo <acme@redhat.com>
*
* Released under the GPL v2. (and only v2, not any later version)
*/
#include "array.h"
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>
void fdarray__init(struct fdarray *fda, int nr_autogrow)
{
fda->entries = NULL;
fda->priv = NULL;
fda->nr = fda->nr_alloc = 0;
fda->nr_autogrow = nr_autogrow;
}
int fdarray__grow(struct fdarray *fda, int nr)
{
void *priv;
int nr_alloc = fda->nr_alloc + nr;
size_t psize = sizeof(fda->priv[0]) * nr_alloc;
size_t size = sizeof(struct pollfd) * nr_alloc;
struct pollfd *entries = realloc(fda->entries, size);
if (entries == NULL)
return -ENOMEM;
priv = realloc(fda->priv, psize);
if (priv == NULL) {
free(entries);
return -ENOMEM;
}
fda->nr_alloc = nr_alloc;
fda->entries = entries;
fda->priv = priv;
return 0;
}
struct fdarray *fdarray__new(int nr_alloc, int nr_autogrow)
{
struct fdarray *fda = calloc(1, sizeof(*fda));
if (fda != NULL) {
if (fdarray__grow(fda, nr_alloc)) {
free(fda);
fda = NULL;
} else {
fda->nr_autogrow = nr_autogrow;
}
}
return fda;
}
void fdarray__exit(struct fdarray *fda)
{
free(fda->entries);
free(fda->priv);
fdarray__init(fda, 0);
}
void fdarray__delete(struct fdarray *fda)
{
fdarray__exit(fda);
free(fda);
}
int fdarray__add(struct fdarray *fda, int fd, short revents)
{
int pos = fda->nr;
if (fda->nr == fda->nr_alloc &&
fdarray__grow(fda, fda->nr_autogrow) < 0)
return -ENOMEM;
fda->entries[fda->nr].fd = fd;
fda->entries[fda->nr].events = revents;
fda->nr++;
return pos;
}
int fdarray__filter(struct fdarray *fda, short revents,
void (*entry_destructor)(struct fdarray *fda, int fd))
{
int fd, nr = 0;
if (fda->nr == 0)
return 0;
for (fd = 0; fd < fda->nr; ++fd) {
if (fda->entries[fd].revents & revents) {
if (entry_destructor)
entry_destructor(fda, fd);
continue;
}
if (fd != nr) {
fda->entries[nr] = fda->entries[fd];
fda->priv[nr] = fda->priv[fd];
}
++nr;
}
return fda->nr = nr;
}
int fdarray__poll(struct fdarray *fda, int timeout)
{
return poll(fda->entries, fda->nr, timeout);
}
int fdarray__fprintf(struct fdarray *fda, FILE *fp)
{
int fd, printed = fprintf(fp, "%d [ ", fda->nr);
for (fd = 0; fd < fda->nr; ++fd)
printed += fprintf(fp, "%s%d", fd ? ", " : "", fda->entries[fd].fd);
return printed + fprintf(fp, " ]");
}
#ifndef __API_FD_ARRAY__
#define __API_FD_ARRAY__
#include <stdio.h>
struct pollfd;
/**
* struct fdarray: Array of file descriptors
*
* @priv: Per array entry priv area, users should access just its contents,
*        not set it to anything, as it is kept in synch with @entries, being
*        realloc'ed, for instance, in fdarray__{grow,filter}.
*
*        I.e. using 'fda->priv[N].idx = value' where N < fda->nr is ok,
*        but doing 'fda->priv = malloc(M)' is not allowed.
*/
struct fdarray {
int nr;
int nr_alloc;
int nr_autogrow;
struct pollfd *entries;
union {
int idx;
} *priv;
};
void fdarray__init(struct fdarray *fda, int nr_autogrow);
void fdarray__exit(struct fdarray *fda);
struct fdarray *fdarray__new(int nr_alloc, int nr_autogrow);
void fdarray__delete(struct fdarray *fda);
int fdarray__add(struct fdarray *fda, int fd, short revents);
int fdarray__poll(struct fdarray *fda, int timeout);
int fdarray__filter(struct fdarray *fda, short revents,
void (*entry_destructor)(struct fdarray *fda, int fd));
int fdarray__grow(struct fdarray *fda, int extra);
int fdarray__fprintf(struct fdarray *fda, FILE *fp);
static inline int fdarray__available_entries(struct fdarray *fda)
{
return fda->nr_alloc - fda->nr;
}
#endif /* __API_FD_ARRAY__ */
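
/*
 * Usage sketch (sizes and POLLIN are illustrative).  fda->priv[pos]
 * carries per-entry private data, and passing a NULL destructor to
 * fdarray__filter() simply drops the matching entries:
 *
 *	struct fdarray *fda = fdarray__new(8, 8);
 *	int pos = fdarray__add(fda, fd, POLLIN);
 *
 *	fda->priv[pos].idx = pos;
 *
 *	if (fdarray__poll(fda, 1000) > 0)
 *		fdarray__filter(fda, POLLHUP, NULL);
 *
 *	fdarray__delete(fda);
 */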
......@@ -15,6 +15,7 @@ perf.data
perf.data.old
output.svg
perf-archive
perf-with-kcore
tags
TAGS
cscope*
......
......@@ -104,6 +104,9 @@ OPTIONS
Specify path to the executable or shared library file for user
space tracing. Can also be used with --funcs option.
--demangle-kernel::
Demangle kernel symbols.
In absence of -m/-x options, perf probe checks if the first argument after
the options is an absolute path name. If it's an absolute path, perf probe
uses it as a target module/target user space binary to probe.
......
......@@ -147,7 +147,7 @@ OPTIONS
-w::
--column-widths=<width[,width...]>::
Force each column width to the provided list, for large terminal
readability.
readability. 0 means no limit (default behavior).
-t::
--field-separator=::
......@@ -276,6 +276,9 @@ OPTIONS
Demangle symbol names to human readable form. It's enabled by default,
disable with --no-demangle.
--demangle-kernel::
Demangle kernel symbol names to human readable form (for C++ kernels).
--mem-mode::
Use the data addresses of samples in addition to instruction addresses
to build the histograms. To generate meaningful output, the perf.data
......
......@@ -98,6 +98,9 @@ Default is to monitor all CPUS.
--hide_user_symbols::
Hide user symbols.
--demangle-kernel::
Demangle kernel symbols.
-D::
--dump-symtab::
Dump the symbol table used for profiling.
......@@ -193,6 +196,12 @@ Default is to monitor all CPUS.
sum of shown entries will be always 100%. "absolute" means it retains
the original value before and after the filter is applied.
-w::
--column-widths=<width[,width...]>::
Force each column width to the provided list, for large terminal
readability. 0 means no limit (default behavior).
INTERACTIVE PROMPTING KEYS
--------------------------
......
......@@ -126,6 +126,7 @@ PYRF_OBJS =
SCRIPT_SH =
SCRIPT_SH += perf-archive.sh
SCRIPT_SH += perf-with-kcore.sh
grep-libs = $(filter -l%,$(1))
strip-libs = $(filter-out -l%,$(1))
......@@ -263,6 +264,7 @@ LIB_H += util/xyarray.h
LIB_H += util/header.h
LIB_H += util/help.h
LIB_H += util/session.h
LIB_H += util/ordered-events.h
LIB_H += util/strbuf.h
LIB_H += util/strlist.h
LIB_H += util/strfilter.h
......@@ -347,6 +349,7 @@ LIB_OBJS += $(OUTPUT)util/machine.o
LIB_OBJS += $(OUTPUT)util/map.o
LIB_OBJS += $(OUTPUT)util/pstack.o
LIB_OBJS += $(OUTPUT)util/session.o
LIB_OBJS += $(OUTPUT)util/ordered-events.o
LIB_OBJS += $(OUTPUT)util/comm.o
LIB_OBJS += $(OUTPUT)util/thread.o
LIB_OBJS += $(OUTPUT)util/thread_map.o
......@@ -399,6 +402,7 @@ LIB_OBJS += $(OUTPUT)tests/perf-record.o
LIB_OBJS += $(OUTPUT)tests/rdpmc.o
LIB_OBJS += $(OUTPUT)tests/evsel-roundtrip-name.o
LIB_OBJS += $(OUTPUT)tests/evsel-tp-sched.o
LIB_OBJS += $(OUTPUT)tests/fdarray.o
LIB_OBJS += $(OUTPUT)tests/pmu.o
LIB_OBJS += $(OUTPUT)tests/hists_common.o
LIB_OBJS += $(OUTPUT)tests/hists_link.o
......@@ -423,6 +427,7 @@ endif
endif
LIB_OBJS += $(OUTPUT)tests/mmap-thread-lookup.o
LIB_OBJS += $(OUTPUT)tests/thread-mg-share.o
LIB_OBJS += $(OUTPUT)tests/switch-tracking.o
BUILTIN_OBJS += $(OUTPUT)builtin-annotate.o
BUILTIN_OBJS += $(OUTPUT)builtin-bench.o
......@@ -765,7 +770,7 @@ $(LIBTRACEEVENT)-clean:
install-traceevent-plugins: $(LIBTRACEEVENT)
$(QUIET_SUBDIR0)$(TRACE_EVENT_DIR) $(LIBTRACEEVENT_FLAGS) install_plugins
LIBAPIKFS_SOURCES = $(wildcard $(LIB_PATH)fs/*.[ch])
LIBAPIKFS_SOURCES = $(wildcard $(LIB_PATH)fs/*.[ch] $(LIB_PATH)fd/*.[ch])
# if subdir is set, we've been called from above so target has been built
# already
......@@ -875,6 +880,8 @@ install-bin: all install-gtk
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)'
$(call QUIET_INSTALL, perf-archive) \
$(INSTALL) $(OUTPUT)perf-archive -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)'
$(call QUIET_INSTALL, perf-with-kcore) \
$(INSTALL) $(OUTPUT)perf-with-kcore -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)'
ifndef NO_LIBPERL
$(call QUIET_INSTALL, perl-scripts) \
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl/Perf-Trace-Util/lib/Perf/Trace'; \
......@@ -920,7 +927,7 @@ config-clean:
@$(MAKE) -C config/feature-checks clean >/dev/null
clean: $(LIBTRACEEVENT)-clean $(LIBAPIKFS)-clean config-clean
$(call QUIET_CLEAN, core-objs) $(RM) $(LIB_OBJS) $(BUILTIN_OBJS) $(LIB_FILE) $(OUTPUT)perf-archive $(OUTPUT)perf.o $(LANG_BINDINGS) $(GTK_OBJS)
$(call QUIET_CLEAN, core-objs) $(RM) $(LIB_OBJS) $(BUILTIN_OBJS) $(LIB_FILE) $(OUTPUT)perf-archive $(OUTPUT)perf-with-kcore $(OUTPUT)perf.o $(LANG_BINDINGS) $(GTK_OBJS)
$(call QUIET_CLEAN, core-progs) $(RM) $(ALL_PROGRAMS) perf
$(call QUIET_CLEAN, core-gen) $(RM) *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)PERF-CFLAGS $(OUTPUT)PERF-FEATURES $(OUTPUT)util/*-bison* $(OUTPUT)util/*-flex*
$(QUIET_SUBDIR0)Documentation $(QUIET_SUBDIR1) clean
......
......@@ -3,6 +3,7 @@
#include "thread.h"
#include "map.h"
#include "event.h"
#include "debug.h"
#include "tests/tests.h"
#define STACK_SIZE 8192
......
......@@ -3,6 +3,7 @@
#include <libunwind.h>
#include "perf_regs.h"
#include "../../util/unwind.h"
#include "../../util/debug.h"
int libunwind__arch_reg_id(int regnum)
{
......
......@@ -6,6 +6,8 @@
#include <asm/perf_regs.h>
#define PERF_REGS_MASK ((1ULL << PERF_REG_ARM64_MAX) - 1)
#define PERF_REGS_MAX PERF_REG_ARM64_MAX
#define PERF_REG_IP PERF_REG_ARM64_PC
#define PERF_REG_SP PERF_REG_ARM64_SP
......
......@@ -3,6 +3,7 @@
#include <libunwind.h>
#include "perf_regs.h"
#include "../../util/unwind.h"
#include "../../util/debug.h"
int libunwind__arch_reg_id(int regnum)
{
......
......@@ -12,6 +12,11 @@ const char *const arm_triplets[] = {
NULL
};
const char *const arm64_triplets[] = {
"aarch64-linux-android-",
NULL
};
const char *const powerpc_triplets[] = {
"powerpc-unknown-linux-gnu-",
"powerpc64-unknown-linux-gnu-",
......@@ -105,6 +110,8 @@ static const char *normalize_arch(char *arch)
return "x86";
if (!strcmp(arch, "sun4u") || !strncmp(arch, "sparc", 5))
return "sparc";
if (!strcmp(arch, "aarch64") || !strcmp(arch, "arm64"))
return "arm64";
if (!strncmp(arch, "arm", 3) || !strcmp(arch, "sa110"))
return "arm";
if (!strncmp(arch, "s390", 4))
......@@ -159,6 +166,8 @@ static int perf_session_env__lookup_binutils_path(struct perf_session_env *env,
if (!strcmp(arch, "arm"))
path_list = arm_triplets;
else if (!strcmp(arch, "arm64"))
path_list = arm64_triplets;
else if (!strcmp(arch, "powerpc"))
path_list = powerpc_triplets;
else if (!strcmp(arch, "sh"))
......
ifndef NO_DWARF
PERF_HAVE_DWARF_REGS := 1
LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/dwarf-regs.o
LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/skip-callchain-idx.o
endif
LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/header.o
LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/skip-callchain-idx.o
......@@ -15,6 +15,7 @@
#include "util/thread.h"
#include "util/callchain.h"
#include "util/debug.h"
/*
* When saving the callchain on Power, the kernel conservatively saves
......
......@@ -26,6 +26,7 @@ static unsigned int nsecs = 10;
/* amount of futexes per thread */
static unsigned int nfutexes = 1024;
static bool fshared = false, done = false, silent = false;
static int futex_flag = 0;
struct timeval start, end, runtime;
static pthread_mutex_t thread_lock;
......@@ -75,8 +76,7 @@ static void *workerfn(void *arg)
* such as internal waitqueue handling, thus enlarging
* the critical region protected by hb->lock.
*/
ret = futex_wait(&w->futex[i], 1234, NULL,
fshared ? 0 : FUTEX_PRIVATE_FLAG);
ret = futex_wait(&w->futex[i], 1234, NULL, futex_flag);
if (!silent &&
(!ret || errno != EAGAIN || errno != EWOULDBLOCK))
warn("Non-expected futex return call");
......@@ -135,6 +135,9 @@ int bench_futex_hash(int argc, const char **argv,
if (!worker)
goto errmem;
if (!fshared)
futex_flag = FUTEX_PRIVATE_FLAG;
printf("Run summary [PID %d]: %d threads, each operating on %d [%s] futexes for %d secs.\n\n",
getpid(), nthreads, nfutexes, fshared ? "shared":"private", nsecs);
......
......@@ -30,16 +30,18 @@ static u_int32_t futex1 = 0, futex2 = 0;
static unsigned int nrequeue = 1;
static pthread_t *worker;
static bool done = 0, silent = 0;
static bool done = false, silent = false, fshared = false;
static pthread_mutex_t thread_lock;
static pthread_cond_t thread_parent, thread_worker;
static struct stats requeuetime_stats, requeued_stats;
static unsigned int ncpus, threads_starting, nthreads = 0;
static int futex_flag = 0;
static const struct option options[] = {
OPT_UINTEGER('t', "threads", &nthreads, "Specify amount of threads"),
OPT_UINTEGER('q', "nrequeue", &nrequeue, "Specify amount of threads to requeue at once"),
OPT_BOOLEAN( 's', "silent", &silent, "Silent mode: do not display data/details"),
OPT_BOOLEAN( 'S', "shared", &fshared, "Use shared futexes instead of private ones"),
OPT_END()
};
......@@ -70,7 +72,7 @@ static void *workerfn(void *arg __maybe_unused)
pthread_cond_wait(&thread_worker, &thread_lock);
pthread_mutex_unlock(&thread_lock);
futex_wait(&futex1, 0, NULL, FUTEX_PRIVATE_FLAG);
futex_wait(&futex1, 0, NULL, futex_flag);
return NULL;
}
......@@ -127,9 +129,12 @@ int bench_futex_requeue(int argc, const char **argv,
if (!worker)
err(EXIT_FAILURE, "calloc");
printf("Run summary [PID %d]: Requeuing %d threads (from %p to %p), "
"%d at a time.\n\n",
getpid(), nthreads, &futex1, &futex2, nrequeue);
if (!fshared)
futex_flag = FUTEX_PRIVATE_FLAG;
printf("Run summary [PID %d]: Requeuing %d threads (from [%s] %p to %p), "
"%d at a time.\n\n", getpid(), nthreads,
fshared ? "shared":"private", &futex1, &futex2, nrequeue);
init_stats(&requeued_stats);
init_stats(&requeuetime_stats);
......@@ -156,16 +161,20 @@ int bench_futex_requeue(int argc, const char **argv,
/* Ok, all threads are patiently blocked, start requeueing */
gettimeofday(&start, NULL);
for (nrequeued = 0; nrequeued < nthreads; nrequeued += nrequeue)
for (nrequeued = 0; nrequeued < nthreads; nrequeued += nrequeue) {
/*
* Do not wake up any tasks blocked on futex1, allowing
* us to really measure futex_wait functionality.
*/
futex_cmp_requeue(&futex1, 0, &futex2, 0, nrequeue,
FUTEX_PRIVATE_FLAG);
futex_cmp_requeue(&futex1, 0, &futex2, 0,
nrequeue, futex_flag);
}
gettimeofday(&end, NULL);
timersub(&end, &start, &runtime);
if (nrequeued > nthreads)
nrequeued = nthreads;
update_stats(&requeued_stats, nrequeued);
update_stats(&requeuetime_stats, runtime.tv_usec);
......@@ -175,7 +184,7 @@ int bench_futex_requeue(int argc, const char **argv,
}
/* everybody should be blocked on futex2, wake'em up */
nrequeued = futex_wake(&futex2, nthreads, FUTEX_PRIVATE_FLAG);
nrequeued = futex_wake(&futex2, nthreads, futex_flag);
if (nthreads != nrequeued)
warnx("couldn't wakeup all tasks (%d/%d)", nrequeued, nthreads);
......@@ -184,7 +193,6 @@ int bench_futex_requeue(int argc, const char **argv,
if (ret)
err(EXIT_FAILURE, "pthread_join");
}
}
/* cleanup & report results */
......
......@@ -31,16 +31,18 @@ static u_int32_t futex1 = 0;
static unsigned int nwakes = 1;
pthread_t *worker;
static bool done = false, silent = false;
static bool done = false, silent = false, fshared = false;
static pthread_mutex_t thread_lock;
static pthread_cond_t thread_parent, thread_worker;
static struct stats waketime_stats, wakeup_stats;
static unsigned int ncpus, threads_starting, nthreads = 0;
static int futex_flag = 0;
static const struct option options[] = {
OPT_UINTEGER('t', "threads", &nthreads, "Specify amount of threads"),
OPT_UINTEGER('w', "nwakes", &nwakes, "Specify amount of threads to wake at once"),
OPT_BOOLEAN( 's', "silent", &silent, "Silent mode: do not display data/details"),
OPT_BOOLEAN( 'S', "shared", &fshared, "Use shared futexes instead of private ones"),
OPT_END()
};
......@@ -58,7 +60,7 @@ static void *workerfn(void *arg __maybe_unused)
pthread_cond_wait(&thread_worker, &thread_lock);
pthread_mutex_unlock(&thread_lock);
futex_wait(&futex1, 0, NULL, FUTEX_PRIVATE_FLAG);
futex_wait(&futex1, 0, NULL, futex_flag);
return NULL;
}
......@@ -130,9 +132,12 @@ int bench_futex_wake(int argc, const char **argv,
if (!worker)
err(EXIT_FAILURE, "calloc");
printf("Run summary [PID %d]: blocking on %d threads (at futex %p), "
if (!fshared)
futex_flag = FUTEX_PRIVATE_FLAG;
printf("Run summary [PID %d]: blocking on %d threads (at [%s] futex %p), "
"waking up %d at a time.\n\n",
getpid(), nthreads, &futex1, nwakes);
getpid(), nthreads, fshared ? "shared":"private", &futex1, nwakes);
init_stats(&wakeup_stats);
init_stats(&waketime_stats);
......@@ -160,7 +165,7 @@ int bench_futex_wake(int argc, const char **argv,
/* Ok, all threads are patiently blocked, start waking folks up */
gettimeofday(&start, NULL);
while (nwoken != nthreads)
nwoken += futex_wake(&futex1, nwakes, FUTEX_PRIVATE_FLAG);
nwoken += futex_wake(&futex1, nwakes, futex_flag);
gettimeofday(&end, NULL);
timersub(&end, &start, &runtime);
......
......@@ -26,7 +26,7 @@
#include <sys/socket.h>
#include <sys/wait.h>
#include <sys/time.h>
#include <sys/poll.h>
#include <poll.h>
#include <limits.h>
#include <err.h>
......
......@@ -36,7 +36,8 @@
struct perf_annotate {
struct perf_tool tool;
bool force, use_tui, use_stdio, use_gtk;
struct perf_session *session;
bool use_tui, use_stdio, use_gtk;
bool full_paths;
bool print_line;
bool skip_missing;
......@@ -188,18 +189,9 @@ static void hists__find_annotations(struct hists *hists,
static int __cmd_annotate(struct perf_annotate *ann)
{
int ret;
struct perf_session *session;
struct perf_session *session = ann->session;
struct perf_evsel *pos;
u64 total_nr_samples;
struct perf_data_file file = {
.path = input_name,
.mode = PERF_DATA_MODE_READ,
.force = ann->force,
};
session = perf_session__new(&file, false, &ann->tool);
if (session == NULL)
return -ENOMEM;
machines__set_symbol_filter(&session->machines, symbol__annotate_init);
......@@ -207,22 +199,22 @@ static int __cmd_annotate(struct perf_annotate *ann)
ret = perf_session__cpu_bitmap(session, ann->cpu_list,
ann->cpu_bitmap);
if (ret)
goto out_delete;
goto out;
}
if (!objdump_path) {
ret = perf_session_env__lookup_objdump(&session->header.env);
if (ret)
goto out_delete;
goto out;
}
ret = perf_session__process_events(session, &ann->tool);
if (ret)
goto out_delete;
goto out;
if (dump_trace) {
perf_session__fprintf_nr_events(session, stdout);
goto out_delete;
goto out;
}
if (verbose > 3)
......@@ -250,8 +242,8 @@ static int __cmd_annotate(struct perf_annotate *ann)
}
if (total_nr_samples == 0) {
ui__error("The %s file has no samples!\n", file.path);
goto out_delete;
ui__error("The %s file has no samples!\n", session->file->path);
goto out;
}
if (use_browser == 2) {
......@@ -261,24 +253,12 @@ static int __cmd_annotate(struct perf_annotate *ann)
"perf_gtk__show_annotations");
if (show_annotations == NULL) {
ui__error("GTK browser not found!\n");
goto out_delete;
goto out;
}
show_annotations();
}
out_delete:
/*
* Speed up the exit process, for large files this can
* take quite a while.
*
* XXX Enable this when using valgrind or if we ever
* librarize this command.
*
* Also experiment with obstacks to see how much speed
* up we'll get here.
*
* perf_session__delete(session);
*/
out:
return ret;
}
......@@ -297,10 +277,14 @@ int cmd_annotate(int argc, const char **argv, const char *prefix __maybe_unused)
.comm = perf_event__process_comm,
.exit = perf_event__process_exit,
.fork = perf_event__process_fork,
.ordered_samples = true,
.ordered_events = true,
.ordering_requires_timestamps = true,
},
};
struct perf_data_file file = {
.path = input_name,
.mode = PERF_DATA_MODE_READ,
};
const struct option options[] = {
OPT_STRING('i', "input", &input_name, "file",
"input file name"),
......@@ -308,7 +292,7 @@ int cmd_annotate(int argc, const char **argv, const char *prefix __maybe_unused)
"only consider symbols in these dsos"),
OPT_STRING('s', "symbol", &annotate.sym_hist_filter, "symbol",
"symbol to annotate"),
OPT_BOOLEAN('f', "force", &annotate.force, "don't complain, do it"),
OPT_BOOLEAN('f', "force", &file.force, "don't complain, do it"),
OPT_INCR('v', "verbose", &verbose,
"be more verbose (show symbol address, etc)"),
OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
......@@ -341,6 +325,7 @@ int cmd_annotate(int argc, const char **argv, const char *prefix __maybe_unused)
"Show event group information together"),
OPT_END()
};
int ret;
argc = parse_options(argc, argv, options, annotate_usage, 0);
......@@ -353,11 +338,16 @@ int cmd_annotate(int argc, const char **argv, const char *prefix __maybe_unused)
setup_browser(true);
annotate.session = perf_session__new(&file, false, &annotate.tool);
if (annotate.session == NULL)
return -1;
symbol_conf.priv_size = sizeof(struct annotation);
symbol_conf.try_vmlinux_path = true;
if (symbol__init() < 0)
return -1;
ret = symbol__init(&annotate.session->header.env);
if (ret < 0)
goto out_delete;
if (setup_sorting() < 0)
usage_with_options(annotate_usage, options);
......@@ -373,5 +363,20 @@ int cmd_annotate(int argc, const char **argv, const char *prefix __maybe_unused)
annotate.sym_hist_filter = argv[0];
}
return __cmd_annotate(&annotate);
ret = __cmd_annotate(&annotate);
out_delete:
/*
* Speed up the exit process, for large files this can
* take quite a while.
*
* XXX Enable this when using valgrind or if we ever
* librarize this command.
*
* Also experiment with obstacks to see how much speed
* up we'll get here.
*
* perf_session__delete(session);
*/
return ret;
}
......@@ -246,20 +246,9 @@ static bool dso__missing_buildid_cache(struct dso *dso, int parm __maybe_unused)
return true;
}
static int build_id_cache__fprintf_missing(const char *filename, bool force, FILE *fp)
static int build_id_cache__fprintf_missing(struct perf_session *session, FILE *fp)
{
struct perf_data_file file = {
.path = filename,
.mode = PERF_DATA_MODE_READ,
.force = force,
};
struct perf_session *session = perf_session__new(&file, false, NULL);
if (session == NULL)
return -1;
perf_session__fprintf_dsos_buildid(session, fp, dso__missing_buildid_cache, 0);
perf_session__delete(session);
return 0;
}
......@@ -302,6 +291,12 @@ int cmd_buildid_cache(int argc, const char **argv,
*missing_filename = NULL,
*update_name_list_str = NULL,
*kcore_filename;
char sbuf[STRERR_BUFSIZE];
struct perf_data_file file = {
.mode = PERF_DATA_MODE_READ,
};
struct perf_session *session = NULL;
const struct option buildid_cache_options[] = {
OPT_STRING('a', "add", &add_name_list_str,
......@@ -326,8 +321,17 @@ int cmd_buildid_cache(int argc, const char **argv,
argc = parse_options(argc, argv, buildid_cache_options,
buildid_cache_usage, 0);
if (symbol__init() < 0)
return -1;
if (missing_filename) {
file.path = missing_filename;
file.force = force;
session = perf_session__new(&file, false, NULL);
if (session == NULL)
return -1;
}
if (symbol__init(session ? &session->header.env : NULL) < 0)
goto out;
setup_pager();
......@@ -344,7 +348,7 @@ int cmd_buildid_cache(int argc, const char **argv,
continue;
}
pr_warning("Couldn't add %s: %s\n",
pos->s, strerror(errno));
pos->s, strerror_r(errno, sbuf, sizeof(sbuf)));
}
strlist__delete(list);
......@@ -362,7 +366,7 @@ int cmd_buildid_cache(int argc, const char **argv,
continue;
}
pr_warning("Couldn't remove %s: %s\n",
pos->s, strerror(errno));
pos->s, strerror_r(errno, sbuf, sizeof(sbuf)));
}
strlist__delete(list);
......@@ -370,7 +374,7 @@ int cmd_buildid_cache(int argc, const char **argv,
}
if (missing_filename)
ret = build_id_cache__fprintf_missing(missing_filename, force, stdout);
ret = build_id_cache__fprintf_missing(session, stdout);
if (update_name_list_str) {
list = strlist__new(true, update_name_list_str);
......@@ -383,7 +387,7 @@ int cmd_buildid_cache(int argc, const char **argv,
continue;
}
pr_warning("Couldn't update %s: %s\n",
pos->s, strerror(errno));
pos->s, strerror_r(errno, sbuf, sizeof(sbuf)));
}
strlist__delete(list);
......@@ -394,5 +398,9 @@ int cmd_buildid_cache(int argc, const char **argv,
build_id_cache__add_kcore(kcore_filename, debugdir, force))
pr_warning("Couldn't add %s\n", kcore_filename);
out:
if (session)
perf_session__delete(session);
return ret;
}
......@@ -360,7 +360,7 @@ static struct perf_tool tool = {
.exit = perf_event__process_exit,
.fork = perf_event__process_fork,
.lost = perf_event__process_lost,
.ordered_samples = true,
.ordered_events = true,
.ordering_requires_timestamps = true,
};
......@@ -683,7 +683,7 @@ static int __cmd_diff(void)
d->session = perf_session__new(&d->file, false, &tool);
if (!d->session) {
pr_err("Failed to open %s\n", d->file.path);
ret = -ENOMEM;
ret = -1;
goto out_delete;
}
......@@ -1143,7 +1143,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix __maybe_unused)
argc = parse_options(argc, argv, options, diff_usage, 0);
if (symbol__init() < 0)
if (symbol__init(NULL) < 0)
return -1;
if (data_init(argc, argv) < 0)
......
......@@ -28,7 +28,7 @@ static int __cmd_evlist(const char *file_name, struct perf_attr_details *details
session = perf_session__new(&file, 0, NULL);
if (session == NULL)
return -ENOMEM;
return -1;
evlist__for_each(session->evlist, pos)
perf_evsel__fprintf(pos, details, stdout);
......
......@@ -103,6 +103,8 @@ static int check_emacsclient_version(void)
static void exec_woman_emacs(const char *path, const char *page)
{
char sbuf[STRERR_BUFSIZE];
if (!check_emacsclient_version()) {
/* This works only with emacsclient version >= 22. */
struct strbuf man_page = STRBUF_INIT;
......@@ -111,16 +113,19 @@ static void exec_woman_emacs(const char *path, const char *page)
path = "emacsclient";
strbuf_addf(&man_page, "(woman \"%s\")", page);
execlp(path, "emacsclient", "-e", man_page.buf, NULL);
warning("failed to exec '%s': %s", path, strerror(errno));
warning("failed to exec '%s': %s", path,
strerror_r(errno, sbuf, sizeof(sbuf)));
}
}
static void exec_man_konqueror(const char *path, const char *page)
{
const char *display = getenv("DISPLAY");
if (display && *display) {
struct strbuf man_page = STRBUF_INIT;
const char *filename = "kfmclient";
char sbuf[STRERR_BUFSIZE];
/* It's simpler to launch konqueror using kfmclient. */
if (path) {
......@@ -139,24 +144,31 @@ static void exec_man_konqueror(const char *path, const char *page)
path = "kfmclient";
strbuf_addf(&man_page, "man:%s(1)", page);
execlp(path, filename, "newTab", man_page.buf, NULL);
warning("failed to exec '%s': %s", path, strerror(errno));
warning("failed to exec '%s': %s", path,
strerror_r(errno, sbuf, sizeof(sbuf)));
}
}
static void exec_man_man(const char *path, const char *page)
{
char sbuf[STRERR_BUFSIZE];
if (!path)
path = "man";
execlp(path, "man", page, NULL);
warning("failed to exec '%s': %s", path, strerror(errno));
warning("failed to exec '%s': %s", path,
strerror_r(errno, sbuf, sizeof(sbuf)));
}
static void exec_man_cmd(const char *cmd, const char *page)
{
struct strbuf shell_cmd = STRBUF_INIT;
char sbuf[STRERR_BUFSIZE];
strbuf_addf(&shell_cmd, "%s %s", cmd, page);
execl("/bin/sh", "sh", "-c", shell_cmd.buf, NULL);
warning("failed to exec '%s': %s", cmd, strerror(errno));
warning("failed to exec '%s': %s", cmd,
strerror_r(errno, sbuf, sizeof(sbuf)));
}
static void add_man_viewer(const char *name)
......
......@@ -23,6 +23,7 @@
struct perf_inject {
struct perf_tool tool;
struct perf_session *session;
bool build_ids;
bool sched_stat;
const char *input_name;
......@@ -340,12 +341,8 @@ static int perf_evsel__check_stype(struct perf_evsel *evsel,
static int __cmd_inject(struct perf_inject *inject)
{
struct perf_session *session;
int ret = -EINVAL;
struct perf_data_file file = {
.path = inject->input_name,
.mode = PERF_DATA_MODE_READ,
};
struct perf_session *session = inject->session;
struct perf_data_file *file_out = &inject->output;
signal(SIGINT, sig_handler);
......@@ -357,16 +354,12 @@ static int __cmd_inject(struct perf_inject *inject)
inject->tool.tracing_data = perf_event__repipe_tracing_data;
}
session = perf_session__new(&file, true, &inject->tool);
if (session == NULL)
return -ENOMEM;
if (inject->build_ids) {
inject->tool.sample = perf_event__inject_buildid;
} else if (inject->sched_stat) {
struct perf_evsel *evsel;
inject->tool.ordered_samples = true;
inject->tool.ordered_events = true;
evlist__for_each(session->evlist, evsel) {
const char *name = perf_evsel__name(evsel);
......@@ -396,8 +389,6 @@ static int __cmd_inject(struct perf_inject *inject)
perf_session__write_header(session, session->evlist, file_out->fd, true);
}
perf_session__delete(session);
return ret;
}
......@@ -427,6 +418,11 @@ int cmd_inject(int argc, const char **argv, const char *prefix __maybe_unused)
.mode = PERF_DATA_MODE_WRITE,
},
};
struct perf_data_file file = {
.mode = PERF_DATA_MODE_READ,
};
int ret;
const struct option options[] = {
OPT_BOOLEAN('b', "build-ids", &inject.build_ids,
"Inject build-ids into the output stream"),
......@@ -461,8 +457,17 @@ int cmd_inject(int argc, const char **argv, const char *prefix __maybe_unused)
return -1;
}
if (symbol__init() < 0)
file.path = inject.input_name;
inject.session = perf_session__new(&file, true, &inject.tool);
if (inject.session == NULL)
return -1;
if (symbol__init(&inject.session->header.env) < 0)
return -1;
return __cmd_inject(&inject);
ret = __cmd_inject(&inject);
perf_session__delete(inject.session);
return ret;
}
......@@ -256,7 +256,9 @@ static int process_sample_event(struct perf_tool *tool __maybe_unused,
static struct perf_tool perf_kmem = {
.sample = process_sample_event,
.comm = perf_event__process_comm,
.ordered_samples = true,
.mmap = perf_event__process_mmap,
.mmap2 = perf_event__process_mmap2,
.ordered_events = true,
};
static double fragmentation(unsigned long n_req, unsigned long n_alloc)
......@@ -403,10 +405,9 @@ static void sort_result(void)
__sort_result(&root_caller_stat, &root_caller_sorted, &caller_sort);
}
static int __cmd_kmem(void)
static int __cmd_kmem(struct perf_session *session)
{
int err = -EINVAL;
struct perf_session *session;
const struct perf_evsel_str_handler kmem_tracepoints[] = {
{ "kmem:kmalloc", perf_evsel__process_alloc_event, },
{ "kmem:kmem_cache_alloc", perf_evsel__process_alloc_event, },
......@@ -415,34 +416,22 @@ static int __cmd_kmem(void)
{ "kmem:kfree", perf_evsel__process_free_event, },
{ "kmem:kmem_cache_free", perf_evsel__process_free_event, },
};
struct perf_data_file file = {
.path = input_name,
.mode = PERF_DATA_MODE_READ,
};
session = perf_session__new(&file, false, &perf_kmem);
if (session == NULL)
return -ENOMEM;
if (perf_session__create_kernel_maps(session) < 0)
goto out_delete;
if (!perf_session__has_traces(session, "kmem record"))
goto out_delete;
goto out;
if (perf_session__set_tracepoints_handlers(session, kmem_tracepoints)) {
pr_err("Initializing perf session tracepoint handlers failed\n");
return -1;
goto out;
}
setup_pager();
err = perf_session__process_events(session, &perf_kmem);
if (err != 0)
goto out_delete;
goto out;
sort_result();
print_result(session);
out_delete:
perf_session__delete(session);
out:
return err;
}
......@@ -689,29 +678,46 @@ int cmd_kmem(int argc, const char **argv, const char *prefix __maybe_unused)
NULL,
NULL
};
struct perf_session *session;
struct perf_data_file file = {
.path = input_name,
.mode = PERF_DATA_MODE_READ,
};
int ret = -1;
argc = parse_options_subcommand(argc, argv, kmem_options,
kmem_subcommands, kmem_usage, 0);
if (!argc)
usage_with_options(kmem_usage, kmem_options);
symbol__init();
if (!strncmp(argv[0], "rec", 3)) {
symbol__init(NULL);
return __cmd_record(argc, argv);
} else if (!strcmp(argv[0], "stat")) {
}
session = perf_session__new(&file, false, &perf_kmem);
if (session == NULL)
return -1;
symbol__init(&session->header.env);
if (!strcmp(argv[0], "stat")) {
if (cpu__setup_cpunode_map())
return -1;
goto out_delete;
if (list_empty(&caller_sort))
setup_sorting(&caller_sort, default_sort_order);
if (list_empty(&alloc_sort))
setup_sorting(&alloc_sort, default_sort_order);
return __cmd_kmem();
ret = __cmd_kmem(session);
} else
usage_with_options(kmem_usage, kmem_options);
return 0;
out_delete:
perf_session__delete(session);
return ret;
}
......@@ -543,14 +543,12 @@ static void print_vcpu_info(struct perf_kvm_stat *kvm)
pr_info("Analyze events for ");
if (kvm->live) {
if (kvm->opts.target.system_wide)
pr_info("all VMs, ");
else if (kvm->opts.target.pid)
pr_info("pid(s) %s, ", kvm->opts.target.pid);
else
pr_info("dazed and confused on what is monitored, ");
}
if (kvm->opts.target.system_wide)
pr_info("all VMs, ");
else if (kvm->opts.target.pid)
pr_info("pid(s) %s, ", kvm->opts.target.pid);
else
pr_info("dazed and confused on what is monitored, ");
if (vcpu == -1)
pr_info("all VCPUs:\n\n");
......@@ -592,8 +590,8 @@ static void print_result(struct perf_kvm_stat *kvm)
pr_info("%9s ", "Samples%");
pr_info("%9s ", "Time%");
pr_info("%10s ", "Min Time");
pr_info("%10s ", "Max Time");
pr_info("%11s ", "Min Time");
pr_info("%11s ", "Max Time");
pr_info("%16s ", "Avg time");
pr_info("\n\n");
......@@ -610,8 +608,8 @@ static void print_result(struct perf_kvm_stat *kvm)
pr_info("%10llu ", (unsigned long long)ecount);
pr_info("%8.2f%% ", (double)ecount / kvm->total_count * 100);
pr_info("%8.2f%% ", (double)etime / kvm->total_time * 100);
pr_info("%8" PRIu64 "us ", min / 1000);
pr_info("%8" PRIu64 "us ", max / 1000);
pr_info("%9.2fus ", (double)min / 1e3);
pr_info("%9.2fus ", (double)max / 1e3);
pr_info("%9.2fus ( +-%7.2f%% )", (double)etime / ecount/1e3,
kvm_event_rel_stddev(vcpu, event));
pr_info("\n");
@@ -732,7 +730,7 @@ static s64 perf_kvm__mmap_read_idx(struct perf_kvm_stat *kvm, int idx,
 			return -1;
 		}
-		err = perf_session_queue_event(kvm->session, event, &sample, 0);
+		err = perf_session_queue_event(kvm->session, event, &kvm->tool, &sample, 0);
 		/*
 		 * FIXME: Here we can't consume the event, as perf_session_queue_event will
 		 * point to it, and it'll get possibly overwritten by the kernel.
@@ -785,7 +783,7 @@ static int perf_kvm__mmap_read(struct perf_kvm_stat *kvm)
 	/* flush queue after each round in which we processed events */
 	if (ntotal) {
-		kvm->session->ordered_samples.next_flush = flush_time;
+		kvm->session->ordered_events.next_flush = flush_time;
 		err = kvm->tool.finished_round(&kvm->tool, NULL, kvm->session);
 		if (err) {
 			if (kvm->lost_events)
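The ordered_samples -> ordered_events rename recurs through the rest of this series. The structure queues samples that arrive out of order across per-CPU buffers and releases them only up to next_flush, a timestamp that no still-buffered event can precede. A toy model of that flush barrier, not perf's implementation (perf marks rounds with PERF_RECORD_FINISHED_ROUND):

    #include <inttypes.h>
    #include <stdio.h>
    #include <stdlib.h>

    static int cmp(const void *a, const void *b)
    {
    	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

    	return x < y ? -1 : x > y;
    }

    int main(void)
    {
    	uint64_t q[] = { 30, 10, 40, 20 };	/* arrival order */
    	size_t n = sizeof(q) / sizeof(q[0]), i;
    	uint64_t next_flush = 25;		/* end of this round */

    	qsort(q, n, sizeof(q[0]), cmp);
    	for (i = 0; i < n && q[i] <= next_flush; i++)
    		printf("deliver %" PRIu64 "\n", q[i]);
    	/* 30 and 40 stay queued until a later round raises next_flush */
    	return 0;
    }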
@@ -885,15 +883,11 @@ static int fd_set_nonblock(int fd)
 	return 0;
 }
-static
-int perf_kvm__handle_stdin(struct termios *tc_now, struct termios *tc_save)
+static int perf_kvm__handle_stdin(void)
 {
 	int c;
-	tcsetattr(0, TCSANOW, tc_now);
 	c = getc(stdin);
-	tcsetattr(0, TCSAFLUSH, tc_save);
 	if (c == 'q')
 		return 1;
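perf_kvm__handle_stdin() can drop its per-keystroke tcsetattr() round trips because the terminal is now configured once, up front, by set_term_quiet_input() (next hunks). A self-contained sketch of what such a helper plausibly does (an assumption about its behaviour, not perf's exact code): save the old state, disable canonical mode and echo, make reads non-blocking, then restore a single time on the way out:

    #include <termios.h>

    /* Hypothetical helper in the style of set_term_quiet_input(). */
    static void term_quiet_input(struct termios *old)
    {
    	struct termios tc;

    	tcgetattr(0, old);
    	tc = *old;
    	tc.c_lflag &= ~(ICANON | ECHO);	/* raw-ish, no echo */
    	tc.c_cc[VMIN] = 0;		/* non-blocking reads */
    	tc.c_cc[VTIME] = 0;
    	tcsetattr(0, TCSANOW, &tc);
    }

    int main(void)
    {
    	struct termios save;

    	term_quiet_input(&save);	/* once, before the poll loop */
    	/* ... poll()/getc() loop runs here, no per-key tcsetattr() ... */
    	tcsetattr(0, TCSAFLUSH, &save);	/* once, on exit */
    	return 0;
    }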
@@ -904,7 +898,7 @@ static int kvm_events_live_report(struct perf_kvm_stat *kvm)
 {
 	struct pollfd *pollfds = NULL;
 	int nr_fds, nr_stdin, ret, err = -EINVAL;
-	struct termios tc, save;
+	struct termios save;
 	/* live flag must be set first */
 	kvm->live = true;
@@ -919,26 +913,14 @@ static int kvm_events_live_report(struct perf_kvm_stat *kvm)
 		goto out;
 	}
+	set_term_quiet_input(&save);
 	init_kvm_event_record(kvm);
-	tcgetattr(0, &save);
-	tc = save;
-	tc.c_lflag &= ~(ICANON | ECHO);
-	tc.c_cc[VMIN] = 0;
-	tc.c_cc[VTIME] = 0;
 	signal(SIGINT, sig_handler);
 	signal(SIGTERM, sig_handler);
-	/* copy pollfds -- need to add timerfd and stdin */
-	nr_fds = kvm->evlist->nr_fds;
-	pollfds = zalloc(sizeof(struct pollfd) * (nr_fds + 2));
-	if (!pollfds) {
-		err = -ENOMEM;
-		goto out;
-	}
-	memcpy(pollfds, kvm->evlist->pollfd,
-		sizeof(struct pollfd) * kvm->evlist->nr_fds);
+	/* use pollfds -- need to add timerfd and stdin */
+	nr_fds = kvm->evlist->pollfd.nr;
 	/* add timer fd */
 	if (perf_kvm__timerfd_create(kvm) < 0) {
@@ -946,17 +928,21 @@ static int kvm_events_live_report(struct perf_kvm_stat *kvm)
 		goto out;
 	}
-	pollfds[nr_fds].fd = kvm->timerfd;
-	pollfds[nr_fds].events = POLLIN;
+	if (perf_evlist__add_pollfd(kvm->evlist, kvm->timerfd))
+		goto out;
 	nr_fds++;
-	pollfds[nr_fds].fd = fileno(stdin);
-	pollfds[nr_fds].events = POLLIN;
+	if (perf_evlist__add_pollfd(kvm->evlist, fileno(stdin)))
+		goto out;
 	nr_stdin = nr_fds;
 	nr_fds++;
 	if (fd_set_nonblock(fileno(stdin)) != 0)
 		goto out;
+	pollfds = kvm->evlist->pollfd.entries;
 	/* everything is good - enable the events and process */
 	perf_evlist__enable(kvm->evlist);
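Rather than sizing a private array and memcpy()ing the evlist's descriptors into it, live mode now registers its extra fds (timerfd, stdin) through perf_evlist__add_pollfd() and then borrows evlist->pollfd.entries directly, so a single grower owns the storage. A hypothetical sketch of that growable-array shape (the fdarray type and helper below are illustrative, not perf's exact API):

    #include <poll.h>
    #include <stdlib.h>

    struct fdarray {
    	struct pollfd *entries;
    	int nr, nr_alloc;
    };

    /* Grow on demand and append; returns the new entry's index or -1. */
    static int fdarray__add(struct fdarray *fda, int fd, short events)
    {
    	if (fda->nr == fda->nr_alloc) {
    		int alloc = fda->nr_alloc ? fda->nr_alloc * 2 : 4;
    		struct pollfd *p = realloc(fda->entries, alloc * sizeof(*p));

    		if (!p)
    			return -1;
    		fda->entries = p;
    		fda->nr_alloc = alloc;
    	}
    	fda->entries[fda->nr].fd = fd;
    	fda->entries[fda->nr].events = events;
    	fda->entries[fda->nr].revents = 0;
    	return fda->nr++;
    }

    int main(void)
    {
    	struct fdarray fda = { NULL, 0, 0 };

    	fdarray__add(&fda, 0 /* stdin */, POLLIN);
    	poll(fda.entries, fda.nr, 0);	/* callers poll the shared array */
    	free(fda.entries);
    	return 0;
    }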
@@ -972,7 +958,7 @@ static int kvm_events_live_report(struct perf_kvm_stat *kvm)
 			goto out;
 		if (pollfds[nr_stdin].revents & POLLIN)
-			done = perf_kvm__handle_stdin(&tc, &save);
+			done = perf_kvm__handle_stdin();
 		if (!rc && !done)
 			err = poll(pollfds, nr_fds, 100);
@@ -989,7 +975,7 @@ static int kvm_events_live_report(struct perf_kvm_stat *kvm)
 	if (kvm->timerfd >= 0)
 		close(kvm->timerfd);
-	free(pollfds);
+	tcsetattr(0, TCSAFLUSH, &save);
 	return err;
 }
@@ -998,6 +984,7 @@ static int kvm_live_open_events(struct perf_kvm_stat *kvm)
 	int err, rc = -1;
 	struct perf_evsel *pos;
 	struct perf_evlist *evlist = kvm->evlist;
+	char sbuf[STRERR_BUFSIZE];
 	perf_evlist__config(evlist, &kvm->opts);
@@ -1034,12 +1021,14 @@ static int kvm_live_open_events(struct perf_kvm_stat *kvm)
 	err = perf_evlist__open(evlist);
 	if (err < 0) {
-		printf("Couldn't create the events: %s\n", strerror(errno));
+		printf("Couldn't create the events: %s\n",
+		       strerror_r(errno, sbuf, sizeof(sbuf)));
 		goto out;
 	}
 	if (perf_evlist__mmap(evlist, kvm->opts.mmap_pages, false) < 0) {
-		ui__error("Failed to mmap the events: %s\n", strerror(errno));
+		ui__error("Failed to mmap the events: %s\n",
+			  strerror_r(errno, sbuf, sizeof(sbuf)));
 		perf_evlist__close(evlist);
 		goto out;
 	}
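These two messages are part of the tree-wide strerror() -> strerror_r() conversion: strerror() returns a pointer into shared static storage, so concurrent callers can corrupt each other's message, while strerror_r() writes into a caller-supplied buffer. A compilable sketch of the pattern (the 128-byte STRERR_BUFSIZE value is an assumption standing in for perf's own constant):

    #define _GNU_SOURCE	/* select glibc's char * strerror_r() variant */
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    #define STRERR_BUFSIZE 128	/* assumed size, stands in for perf's */

    int main(void)
    {
    	char sbuf[STRERR_BUFSIZE];

    	errno = EACCES;
    	/* The GNU variant returns the message pointer, which is what
    	 * lets it sit inline in an argument list like perf's calls. */
    	printf("Couldn't create the events: %s\n",
    	       strerror_r(errno, sbuf, sizeof(sbuf)));
    	return 0;
    }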
@@ -1058,7 +1047,7 @@ static int read_events(struct perf_kvm_stat *kvm)
 	struct perf_tool eops = {
 		.sample = process_sample_event,
 		.comm = perf_event__process_comm,
-		.ordered_samples = true,
+		.ordered_events = true,
 	};
 	struct perf_data_file file = {
 		.path = kvm->file_name,
@@ -1069,9 +1058,11 @@ static int read_events(struct perf_kvm_stat *kvm)
 	kvm->session = perf_session__new(&file, false, &kvm->tool);
 	if (!kvm->session) {
 		pr_err("Initializing perf session failed\n");
-		return -EINVAL;
+		return -1;
 	}
+	symbol__init(&kvm->session->header.env);
 	if (!perf_session__has_traces(kvm->session, "kvm record"))
 		return -EINVAL;
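symbol__init() now takes an environment argument: &session->header.env where a perf.data file is at hand, NULL on pure record/live paths, so symbol setup can follow the machine described in the file's header instead of assuming the local one. The same motion repeats below in the lock, mem, report, sched and timechart builtins. A toy sketch of the shape (the types here are illustrative, not perf's):

    #include <stdio.h>

    /* Stand-in for the environment carried in the perf.data header. */
    struct env { const char *arch; };

    static void sym_init(const struct env *env)
    {
    	printf("resolve symbols for %s\n",
    	       env && env->arch ? env->arch : "the local machine");
    }

    int main(void)
    {
    	struct env recorded = { "aarch64" };

    	sym_init(NULL);		/* live/record: no file header yet */
    	sym_init(&recorded);	/* report: env read from the data file */
    	return 0;
    }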
@@ -1088,8 +1079,8 @@ static int read_events(struct perf_kvm_stat *kvm)
 static int parse_target_str(struct perf_kvm_stat *kvm)
 {
-	if (kvm->pid_str) {
-		kvm->pid_list = intlist__new(kvm->pid_str);
+	if (kvm->opts.target.pid) {
+		kvm->pid_list = intlist__new(kvm->opts.target.pid);
 		if (kvm->pid_list == NULL) {
 			pr_err("Error parsing process id string\n");
 			return -EINVAL;
@@ -1191,7 +1182,7 @@ kvm_events_report(struct perf_kvm_stat *kvm, int argc, const char **argv)
 		OPT_STRING('k', "key", &kvm->sort_key, "sort-key",
 			   "key for sorting: sample(sort by samples number)"
 			   " time (sort by avg time)"),
-		OPT_STRING('p', "pid", &kvm->pid_str, "pid",
+		OPT_STRING('p', "pid", &kvm->opts.target.pid, "pid",
 			   "analyze events only for given process id(s)"),
 		OPT_END()
 	};
@@ -1201,8 +1192,6 @@ kvm_events_report(struct perf_kvm_stat *kvm, int argc, const char **argv)
 		NULL
 	};
-	symbol__init();
 	if (argc) {
 		argc = parse_options(argc, argv,
 				     kvm_events_report_options,
@@ -1212,6 +1201,9 @@ kvm_events_report(struct perf_kvm_stat *kvm, int argc, const char **argv)
 				kvm_events_report_options);
 	}
+	if (!kvm->opts.target.pid)
+		kvm->opts.target.system_wide = true;
 	return kvm_events_report_vcpu(kvm);
 }
@@ -1311,7 +1303,7 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
 	kvm->tool.exit = perf_event__process_exit;
 	kvm->tool.fork = perf_event__process_fork;
 	kvm->tool.lost = process_lost_event;
-	kvm->tool.ordered_samples = true;
+	kvm->tool.ordered_events = true;
 	perf_tool__fill_defaults(&kvm->tool);
 	/* set defaults */
@@ -1322,7 +1314,7 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
 	kvm->opts.target.uid_str = NULL;
 	kvm->opts.target.uid = UINT_MAX;
-	symbol__init();
+	symbol__init(NULL);
 	disable_buildid_cache();
 	use_browser = 0;
@@ -1369,7 +1361,7 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
 	 */
 	kvm->session = perf_session__new(&file, false, &kvm->tool);
 	if (kvm->session == NULL) {
-		err = -ENOMEM;
+		err = -1;
 		goto out;
 	}
 	kvm->session->evlist = kvm->evlist;
tools/perf/builtin-lock.c
@@ -852,7 +852,7 @@ static int __cmd_report(bool display_info)
 	struct perf_tool eops = {
 		.sample = process_sample_event,
 		.comm = perf_event__process_comm,
-		.ordered_samples = true,
+		.ordered_events = true,
 	};
 	struct perf_data_file file = {
 		.path = input_name,
@@ -862,9 +862,11 @@ static int __cmd_report(bool display_info)
 	session = perf_session__new(&file, false, &eops);
 	if (!session) {
 		pr_err("Initializing perf session failed\n");
-		return -ENOMEM;
+		return -1;
 	}
+	symbol__init(&session->header.env);
 	if (!perf_session__has_traces(session, "lock record"))
 		goto out_delete;
@@ -974,7 +976,6 @@ int cmd_lock(int argc, const char **argv, const char *prefix __maybe_unused)
 	unsigned int i;
 	int rc = 0;
-	symbol__init();
 	for (i = 0; i < LOCKHASH_SIZE; i++)
 		INIT_LIST_HEAD(lockhash_table + i);
tools/perf/builtin-mem.c
@@ -124,7 +124,7 @@ static int report_raw_events(struct perf_mem *mem)
 					 &mem->tool);
 	if (session == NULL)
-		return -ENOMEM;
+		return -1;
 	if (mem->cpu_list) {
 		ret = perf_session__cpu_bitmap(session, mem->cpu_list,
@@ -133,7 +133,7 @@ static int report_raw_events(struct perf_mem *mem)
 			goto out_delete;
 	}
-	if (symbol__init() < 0)
+	if (symbol__init(&session->header.env) < 0)
 		return -1;
 	printf("# PID, TID, IP, ADDR, LOCAL WEIGHT, DSRC, SYMBOL\n");
@@ -194,7 +194,7 @@ int cmd_mem(int argc, const char **argv, const char *prefix __maybe_unused)
 			.lost = perf_event__process_lost,
 			.fork = perf_event__process_fork,
 			.build_id = perf_event__process_build_id,
-			.ordered_samples = true,
+			.ordered_events = true,
 		},
 		.input_name = "perf.data",
 	};
tools/perf/builtin-probe.c
@@ -290,8 +290,11 @@ static void cleanup_params(void)
 static void pr_err_with_code(const char *msg, int err)
 {
+	char sbuf[STRERR_BUFSIZE];
 	pr_err("%s", msg);
-	pr_debug(" Reason: %s (Code: %d)", strerror(-err), err);
+	pr_debug(" Reason: %s (Code: %d)",
+		 strerror_r(-err, sbuf, sizeof(sbuf)), err);
 	pr_err("\n");
 }
@@ -373,6 +376,8 @@ __cmd_probe(int argc, const char **argv, const char *prefix __maybe_unused)
 		     "target executable name or path", opt_set_target),
 	OPT_BOOLEAN(0, "demangle", &symbol_conf.demangle,
 		    "Disable symbol demangling"),
+	OPT_BOOLEAN(0, "demangle-kernel", &symbol_conf.demangle_kernel,
+		    "Enable kernel symbol demangling"),
 	OPT_END()
 	};
 	int ret;
@@ -467,7 +472,8 @@ __cmd_probe(int argc, const char **argv, const char *prefix __maybe_unused)
 			usage_with_options(probe_usage, options);
 		}
-		ret = show_line_range(&params.line_range, params.target);
+		ret = show_line_range(&params.line_range, params.target,
+				      params.uprobes);
 		if (ret < 0)
 			pr_err_with_code(" Error: Failed to show lines.", ret);
 		return ret;
[One file's diff is collapsed in the web view.]
tools/perf/builtin-report.c
@@ -58,17 +58,19 @@ struct report {
 	const char		*symbol_filter_str;
 	float			min_percent;
 	u64			nr_entries;
+	u64			queue_size;
 	DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
 };
 static int report__config(const char *var, const char *value, void *cb)
 {
+	struct report *rep = cb;
 	if (!strcmp(var, "report.group")) {
 		symbol_conf.event_group = perf_config_bool(var, value);
 		return 0;
 	}
 	if (!strcmp(var, "report.percent-limit")) {
-		struct report *rep = cb;
 		rep->min_percent = strtof(value, NULL);
 		return 0;
 	}
@@ -76,6 +78,10 @@ static int report__config(const char *var, const char *value, void *cb)
 		symbol_conf.cumulate_callchain = perf_config_bool(var, value);
 		return 0;
 	}
+	if (!strcmp(var, "report.queue-size")) {
+		rep->queue_size = perf_config_u64(var, value);
+		return 0;
+	}
 	return perf_default_config(var, value, cb);
 }
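report.queue-size joins report.group and report.percent-limit as config keys; the dotted name maps to a section and key in ~/.perfconfig. A plausible snippet (the values are arbitrary, and whether size suffixes such as 100M are accepted depends on perf_config_u64's parser):

    [report]
    	group = true
    	percent-limit = 0.5
    	queue-size = 100M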
@@ -578,7 +584,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 			.attr = perf_event__process_attr,
 			.tracing_data = perf_event__process_tracing_data,
 			.build_id = perf_event__process_build_id,
-			.ordered_samples = true,
+			.ordered_events = true,
 			.ordering_requires_timestamps = true,
 		},
 		.max_stack = PERF_MAX_STACK_DEPTH,
@@ -674,6 +680,8 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 		   "objdump binary to use for disassembly and annotations"),
 	OPT_BOOLEAN(0, "demangle", &symbol_conf.demangle,
 		    "Disable symbol demangling"),
+	OPT_BOOLEAN(0, "demangle-kernel", &symbol_conf.demangle_kernel,
+		    "Enable kernel symbol demangling"),
 	OPT_BOOLEAN(0, "mem-mode", &report.mem_mode, "mem access profile"),
 	OPT_CALLBACK(0, "percent-limit", &report, "percent",
 		     "Don't show entries under that percent", parse_percent_limit),
@@ -712,14 +720,19 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 repeat:
 	session = perf_session__new(&file, false, &report.tool);
 	if (session == NULL)
-		return -ENOMEM;
+		return -1;
+	if (report.queue_size) {
+		ordered_events__set_alloc_size(&session->ordered_events,
+					       report.queue_size);
+	}
 	report.session = session;
 	has_br_stack = perf_header__has_feat(&session->header,
 					     HEADER_BRANCH_STACK);
-	if (branch_mode == -1 && has_br_stack) {
+	if ((branch_mode == -1 && has_br_stack) || branch_mode == 1) {
 		sort__mode = SORT_MODE__BRANCH;
 		symbol_conf.cumulate_callchain = false;
 	}
@@ -787,7 +800,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 		}
 	}
-	if (symbol__init() < 0)
+	if (symbol__init(&session->header.env) < 0)
 		goto error;
 	if (argc) {
tools/perf/builtin-sched.c
@@ -428,6 +428,7 @@ static u64 get_cpu_usage_nsec_parent(void)
 static int self_open_counters(void)
 {
 	struct perf_event_attr attr;
+	char sbuf[STRERR_BUFSIZE];
 	int fd;
 	memset(&attr, 0, sizeof(attr));
@@ -440,7 +441,8 @@ static int self_open_counters(void)
 	if (fd < 0)
 		pr_err("Error: sys_perf_event_open() syscall returned "
-		       "with %d (%s)\n", fd, strerror(errno));
+		       "with %d (%s)\n", fd,
+		       strerror_r(errno, sbuf, sizeof(sbuf)));
 	return fd;
 }
@@ -1462,6 +1464,8 @@ static int perf_sched__read_events(struct perf_sched *sched,
 		return -1;
 	}
+	symbol__init(&session->header.env);
 	if (perf_session__set_tracepoints_handlers(session, handlers))
 		goto out_delete;
@@ -1662,7 +1666,7 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 			.comm = perf_event__process_comm,
 			.lost = perf_event__process_lost,
 			.fork = perf_sched__process_fork_event,
-			.ordered_samples = true,
+			.ordered_events = true,
 		},
 		.cmp_pid = LIST_HEAD_INIT(sched.cmp_pid),
 		.sort_list = LIST_HEAD_INIT(sched.sort_list),
@@ -1747,7 +1751,6 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (!strcmp(argv[0], "script"))
 		return cmd_script(argc, argv, prefix);
-	symbol__init();
 	if (!strncmp(argv[0], "rec", 3)) {
 		return __cmd_record(argc, argv);
 	} else if (!strncmp(argv[0], "lat", 3)) {
[Two file diffs are collapsed in the web view.]
tools/perf/builtin-timechart.c
@@ -1605,7 +1605,9 @@ static int __cmd_timechart(struct timechart *tchart, const char *output_name)
 	int ret = -EINVAL;
 	if (session == NULL)
-		return -ENOMEM;
+		return -1;
+	symbol__init(&session->header.env);
 	(void)perf_header__process_sections(&session->header,
 					    perf_data_file__fd(session->file),
@@ -1920,7 +1922,7 @@ int cmd_timechart(int argc, const char **argv,
 			.fork = process_fork_event,
 			.exit = process_exit_event,
 			.sample = process_sample_event,
-			.ordered_samples = true,
+			.ordered_events = true,
 		},
 		.proc_num = 15,
 		.min_time = 1000000,
@@ -1982,8 +1984,6 @@ int cmd_timechart(int argc, const char **argv,
 		return -1;
 	}
-	symbol__init();
 	if (argc && !strncmp(argv[0], "rec", 3)) {
 		argc = parse_options(argc, argv, record_options, record_usage,
 				     PARSE_OPT_STOP_AT_NON_OPTION);
[The remaining 79 file diffs are collapsed in the web view.]