Commit 41d859a8 authored by Linus Torvalds's avatar Linus Torvalds

Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:
 "Main perf kernel side changes:

   - uprobes updates/fixes.  (Oleg Nesterov)

   - Add PERF_RECORD_SWITCH to indicate context switches and use it in
     tooling.  (Adrian Hunter)

   - Support BPF programs attached to uprobes and first steps for BPF
     tooling support.  (Wang Nan)

   - x86 generic x86 MSR-to-perf PMU driver.  (Andy Lutomirski)

   - x86 Intel PT, LBR and BTS updates.  (Alexander Shishkin)

   - x86 Intel Skylake support.  (Andi Kleen)

   - x86 Intel Knights Landing (KNL) RAPL support.  (Dasaratharaman
     Chandramouli)

   - x86 Intel Broadwell-DE uncore support.  (Kan Liang)

   - x86 hw breakpoints robustization (Andy Lutomirski)

  Main perf tooling side changes:

   - Support Intel PT in several tools, enabling the use of the
     processor trace feature introduced in Intel Broadwell processors:
     (Adrian Hunter)

       # dmesg | grep Performance
       # [0.188477] Performance Events: PEBS fmt2+, 16-deep LBR, Broadwell events, full-width counters, Intel PMU driver.
       # perf record -e intel_pt//u -a sleep 1
       [ perf record: Woken up 1 times to write data ]
       [ perf record: Captured and wrote 0.216 MB perf.data ]
       # perf script # then navigate in the tool output to some area, like this one:
       184 1030 dl_main (/usr/lib64/ld-2.17.so) => 7f21ba661440 dl_main (/usr/lib64/ld-2.17.so)
       185 1457 dl_main (/usr/lib64/ld-2.17.so) => 7f21ba669f10 _dl_new_object (/usr/lib64/ld-2.17.so)
       186 9f37 _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba677b90 strlen (/usr/lib64/ld-2.17.so)
       187 7ba3 strlen (/usr/lib64/ld-2.17.so) => 7f21ba677c75 strlen (/usr/lib64/ld-2.17.so)
       188 7c78 strlen (/usr/lib64/ld-2.17.so) => 7f21ba669f3c _dl_new_object (/usr/lib64/ld-2.17.so)
       189 9f8a _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba65fab0 calloc@plt (/usr/lib64/ld-2.17.so)
       190 fab0 calloc@plt (/usr/lib64/ld-2.17.so) => 7f21ba675e70 calloc (/usr/lib64/ld-2.17.so)
       191 5e87 calloc (/usr/lib64/ld-2.17.so) => 7f21ba65fa90 malloc@plt (/usr/lib64/ld-2.17.so)
       192 fa90 malloc@plt (/usr/lib64/ld-2.17.so) => 7f21ba675e60 malloc (/usr/lib64/ld-2.17.so)
       193 5e68 malloc (/usr/lib64/ld-2.17.so) => 7f21ba65fa80 __libc_memalign@plt (/usr/lib64/ld-2.17.so)
       194 fa80 __libc_memalign@plt (/usr/lib64/ld-2.17.so) => 7f21ba675d50 __libc_memalign (/usr/lib64/ld-2.17.so)
       195 5d63 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675e20 __libc_memalign (/usr/lib64/ld-2.17.so)
       196 5e40 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675d73 __libc_memalign (/usr/lib64/ld-2.17.so)
       197 5d97 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675e18 __libc_memalign (/usr/lib64/ld-2.17.so)
       198 5e1e __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675df9 __libc_memalign (/usr/lib64/ld-2.17.so)
       199 5e10 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba669f8f _dl_new_object (/usr/lib64/ld-2.17.so)
       200 9fc2 _dl_new_object (/usr/lib64/ld-2.17.so) =>  7f21ba678e70 memcpy (/usr/lib64/ld-2.17.so)
       201 8e8c memcpy (/usr/lib64/ld-2.17.so) => 7f21ba678ea0 memcpy (/usr/lib64/ld-2.17.so)

   - Add support for using several Intel PT features (CYC, MTC packets),
     the relevant documentation was updated in:
         tools/perf/Documentation/intel-pt.txt
     briefly describing those packets, its purposes, how to configure
     them in the event config terms and relevant external documentation
     for further reading.  (Adrian Hunter)

   - Introduce support for probing at an absolute address, for user and
     kernel 'perf probe's, useful when one have the symbol maps on a
     developer machine but not on an embedded system.  (Wang Nan)

   - Add Intel BTS support, with a call-graph script to show it and PT
     in use in a GUI using 'perf script' python scripting with
     postgresql and Qt.  (Adrian Hunter)

   - Allow selecting the type of callchains per event, including
     disabling callchains in all but one entry in an event list, to save
     space, and also to ask for the callchains collected in one event to
     be used in other events.  (Kan Liang)

   - Beautify more syscall arguments in 'perf trace': (Arnaldo Carvalho
     de Melo)
       * A bunch more translate file/pathnames from pointers to strings.
       * Convert numbers to strings for the 'keyctl' syscall 'option'
         arg.
       * Add missing 'clockid' entries.

   - Introduce 'srcfile' sort key: (Andi Kleen)

       # perf record -F 10000 usleep 1
       # perf report --stdio --dsos '[kernel.vmlinux]' -s srcfile
       <SNIP>
       # Overhead  Source File
          26.49%  copy_page_64.S
           5.49%  signal.c
           0.51%  msr.h
       #

     It can be combined with other fields, for instance, experiment with
     '-s srcfile,symbol'.

     There are some oddities in some distros and with some specific
     DSOs, being investigated, so your mileage may vary.

   - Support per-event 'freq' term: (Namhyung Kim)

       $ perf record -e 'cpu/instructions,freq=1234/',cycles -c 1000 sleep 1
       $ perf evlist -F
       cpu/instructions,freq=1234/: sample_freq=1234
       cycles: sample_period=1000
       $

   - Deref sys_enter pointer args with contents from probe:vfs_getname,
     showing pathnames instead of pointers in many syscalls in 'perf
     trace'.  (Arnaldo Carvalho de Melo)

   - Stop collecting /proc/kallsyms in perf.data files, saving about
     4.5MB on a typical x86-64 system, use the the symbol resolution
     routines used in all the other tools (report, top, etc) now that we
     can ask libtraceevent to use perf's symbol resolution code.
     (Arnaldo Carvalho de Melo)

   - Allow filtering out of perf's PID via 'perf record --exclude-perf'.
     (Wang Nan)

   - 'perf trace' now supports syscall groups, like strace, i.e:

       $ trace -e file touch file

     Will expand 'file' into multiple, file related, syscalls.  More
     work needed to add extra groups for other syscall groups, and also
     to complement what was added for the 'file' group, included as a
     proof of concept.  (Arnaldo Carvalho de Melo)

   - Add lock_pi stresser to 'perf bench futex', to test the kernel code
     related to FUTEX_(UN)LOCK_PI.  (Davidlohr Bueso)

   - Let user have timestamps with per-thread recording in 'perf record'
     (Adrian Hunter)

   - ... and tons of other changes, see the shortlog and the Git log for
     details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (240 commits)
  perf evlist: Add backpointer for perf_env to evlist
  perf tools: Rename perf_session_env to perf_env
  perf tools: Do not change lib/api/fs/debugfs directly
  perf tools: Add tracing_path and remove unneeded functions
  perf buildid: Introduce sysfs/filename__sprintf_build_id
  perf evsel: Add a backpointer to the evlist a evsel is in
  perf trace: Add header with copyright and background info
  perf scripts python: Add new compaction-times script
  perf stat: Get correct cpu id for print_aggr
  tools lib traceeveent: Allow for negative numbers in print format
  perf script: Add --[no-]-demangle/--[no-]-demangle-kernel
  tracing/uprobes: Do not print '0x (null)' when offset is 0
  perf probe: Support probing at absolute address
  perf probe: Fix error reported when offset without function
  perf probe: Fix list result when address is zero
  perf probe: Fix list result when symbol can't be found
  tools build: Allow duplicate objects in the object list
  perf tools: Remove export.h from MANIFEST
  perf probe: Prevent segfault when reading probe point with absolute address
  perf tools: Update Intel PT documentation
  ...
parents 46580009 bac2e4a9
......@@ -73,6 +73,12 @@
#define MSR_LBR_CORE_FROM 0x00000040
#define MSR_LBR_CORE_TO 0x00000060
#define MSR_LBR_INFO_0 0x00000dc0 /* ... 0xddf for _31 */
#define LBR_INFO_MISPRED BIT_ULL(63)
#define LBR_INFO_IN_TX BIT_ULL(62)
#define LBR_INFO_ABORT BIT_ULL(61)
#define LBR_INFO_CYCLES 0xffff
#define MSR_IA32_PEBS_ENABLE 0x000003f1
#define MSR_IA32_DS_AREA 0x00000600
#define MSR_IA32_PERF_CAPABILITIES 0x00000345
......@@ -80,13 +86,21 @@
#define MSR_IA32_RTIT_CTL 0x00000570
#define RTIT_CTL_TRACEEN BIT(0)
#define RTIT_CTL_CYCLEACC BIT(1)
#define RTIT_CTL_OS BIT(2)
#define RTIT_CTL_USR BIT(3)
#define RTIT_CTL_CR3EN BIT(7)
#define RTIT_CTL_TOPA BIT(8)
#define RTIT_CTL_MTC_EN BIT(9)
#define RTIT_CTL_TSC_EN BIT(10)
#define RTIT_CTL_DISRETC BIT(11)
#define RTIT_CTL_BRANCH_EN BIT(13)
#define RTIT_CTL_MTC_RANGE_OFFSET 14
#define RTIT_CTL_MTC_RANGE (0x0full << RTIT_CTL_MTC_RANGE_OFFSET)
#define RTIT_CTL_CYC_THRESH_OFFSET 19
#define RTIT_CTL_CYC_THRESH (0x0full << RTIT_CTL_CYC_THRESH_OFFSET)
#define RTIT_CTL_PSB_FREQ_OFFSET 24
#define RTIT_CTL_PSB_FREQ (0x0full << RTIT_CTL_PSB_FREQ_OFFSET)
#define MSR_IA32_RTIT_STATUS 0x00000571
#define RTIT_STATUS_CONTEXTEN BIT(1)
#define RTIT_STATUS_TRIGGEREN BIT(2)
......
......@@ -159,6 +159,13 @@ struct x86_pmu_capability {
*/
#define INTEL_PMC_IDX_FIXED_BTS (INTEL_PMC_IDX_FIXED + 16)
#define GLOBAL_STATUS_COND_CHG BIT_ULL(63)
#define GLOBAL_STATUS_BUFFER_OVF BIT_ULL(62)
#define GLOBAL_STATUS_UNC_OVF BIT_ULL(61)
#define GLOBAL_STATUS_ASIF BIT_ULL(60)
#define GLOBAL_STATUS_COUNTERS_FROZEN BIT_ULL(59)
#define GLOBAL_STATUS_LBRS_FROZEN BIT_ULL(58)
/*
* IBS cpuid feature detection
*/
......
......@@ -51,6 +51,7 @@ extern int unsynchronized_tsc(void);
extern int check_tsc_unstable(void);
extern int check_tsc_disabled(void);
extern unsigned long native_calibrate_tsc(void);
extern unsigned long long native_sched_clock_from_tsc(u64 tsc);
extern int tsc_clocksource_reliable;
......
......@@ -46,6 +46,8 @@ obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE) += perf_event_intel_uncore.o \
perf_event_intel_uncore_snb.o \
perf_event_intel_uncore_snbep.o \
perf_event_intel_uncore_nhmex.o
obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_msr.o
obj-$(CONFIG_CPU_SUP_AMD) += perf_event_msr.o
endif
......
......@@ -25,32 +25,11 @@
*/
#define TOPA_PMI_MARGIN 512
/*
* Table of Physical Addresses bits
*/
enum topa_sz {
TOPA_4K = 0,
TOPA_8K,
TOPA_16K,
TOPA_32K,
TOPA_64K,
TOPA_128K,
TOPA_256K,
TOPA_512K,
TOPA_1MB,
TOPA_2MB,
TOPA_4MB,
TOPA_8MB,
TOPA_16MB,
TOPA_32MB,
TOPA_64MB,
TOPA_128MB,
TOPA_SZ_END,
};
#define TOPA_SHIFT 12
static inline unsigned int sizes(enum topa_sz tsz)
static inline unsigned int sizes(unsigned int tsz)
{
return 1 << (tsz + 12);
return 1 << (tsz + TOPA_SHIFT);
};
struct topa_entry {
......@@ -66,20 +45,26 @@ struct topa_entry {
u64 rsvd4 : 16;
};
#define TOPA_SHIFT 12
#define PT_CPUID_LEAVES 2
#define PT_CPUID_LEAVES 2
#define PT_CPUID_REGS_NUM 4 /* number of regsters (eax, ebx, ecx, edx) */
enum pt_capabilities {
PT_CAP_max_subleaf = 0,
PT_CAP_cr3_filtering,
PT_CAP_psb_cyc,
PT_CAP_mtc,
PT_CAP_topa_output,
PT_CAP_topa_multiple_entries,
PT_CAP_single_range_output,
PT_CAP_payloads_lip,
PT_CAP_mtc_periods,
PT_CAP_cycle_thresholds,
PT_CAP_psb_periods,
};
struct pt_pmu {
struct pmu pmu;
u32 caps[4 * PT_CPUID_LEAVES];
u32 caps[PT_CPUID_REGS_NUM * PT_CPUID_LEAVES];
};
/**
......
......@@ -1551,7 +1551,7 @@ static void __init filter_events(struct attribute **attrs)
}
/* Merge two pointer arrays */
static __init struct attribute **merge_attr(struct attribute **a, struct attribute **b)
__init struct attribute **merge_attr(struct attribute **a, struct attribute **b)
{
struct attribute **new;
int j, i;
......
......@@ -165,7 +165,7 @@ struct intel_excl_cntrs {
unsigned core_id; /* per-core: core id */
};
#define MAX_LBR_ENTRIES 16
#define MAX_LBR_ENTRIES 32
enum {
X86_PERF_KFREE_SHARED = 0,
......@@ -594,6 +594,7 @@ struct x86_pmu {
struct event_constraint *pebs_constraints;
void (*pebs_aliases)(struct perf_event *event);
int max_pebs_events;
unsigned long free_running_flags;
/*
* Intel LBR
......@@ -624,6 +625,7 @@ struct x86_pmu {
struct x86_perf_task_context {
u64 lbr_from[MAX_LBR_ENTRIES];
u64 lbr_to[MAX_LBR_ENTRIES];
u64 lbr_info[MAX_LBR_ENTRIES];
int lbr_callstack_users;
int lbr_stack_state;
};
......@@ -793,6 +795,8 @@ static inline void set_linear_ip(struct pt_regs *regs, unsigned long ip)
ssize_t x86_event_sysfs_show(char *page, u64 config, u64 event);
ssize_t intel_event_sysfs_show(char *page, u64 config);
struct attribute **merge_attr(struct attribute **a, struct attribute **b);
#ifdef CONFIG_CPU_SUP_AMD
int amd_pmu_init(void);
......@@ -808,20 +812,6 @@ static inline int amd_pmu_init(void)
#ifdef CONFIG_CPU_SUP_INTEL
static inline bool intel_pmu_needs_lbr_smpl(struct perf_event *event)
{
/* user explicitly requested branch sampling */
if (has_branch_stack(event))
return true;
/* implicit branch sampling to correct PEBS skid */
if (x86_pmu.intel_cap.pebs_trap && event->attr.precise_ip > 1 &&
x86_pmu.intel_cap.pebs_format < 2)
return true;
return false;
}
static inline bool intel_pmu_has_bts(struct perf_event *event)
{
if (event->attr.config == PERF_COUNT_HW_BRANCH_INSTRUCTIONS &&
......@@ -873,6 +863,8 @@ extern struct event_constraint intel_ivb_pebs_event_constraints[];
extern struct event_constraint intel_hsw_pebs_event_constraints[];
extern struct event_constraint intel_skl_pebs_event_constraints[];
struct event_constraint *intel_pebs_constraints(struct perf_event *event);
void intel_pmu_pebs_enable(struct perf_event *event);
......@@ -911,6 +903,8 @@ void intel_pmu_lbr_init_snb(void);
void intel_pmu_lbr_init_hsw(void);
void intel_pmu_lbr_init_skl(void);
int intel_pmu_setup_lbr_filter(struct perf_event *event);
void intel_pt_interrupt(void);
......@@ -934,6 +928,7 @@ static inline int is_ht_workaround_enabled(void)
{
return !!(x86_pmu.flags & PMU_FL_EXCL_ENABLED);
}
#else /* CONFIG_CPU_SUP_INTEL */
static inline void reserve_ds_buffers(void)
......
This diff is collapsed.
......@@ -62,9 +62,6 @@ struct bts_buffer {
struct pmu bts_pmu;
void intel_pmu_enable_bts(u64 config);
void intel_pmu_disable_bts(void);
static size_t buf_size(struct page *page)
{
return 1 << (PAGE_SHIFT + page_private(page));
......
......@@ -224,6 +224,19 @@ union hsw_tsx_tuning {
#define PEBS_HSW_TSX_FLAGS 0xff00000000ULL
/* Same as HSW, plus TSC */
struct pebs_record_skl {
u64 flags, ip;
u64 ax, bx, cx, dx;
u64 si, di, bp, sp;
u64 r8, r9, r10, r11;
u64 r12, r13, r14, r15;
u64 status, dla, dse, lat;
u64 real_ip, tsx_tuning;
u64 tsc;
};
void init_debug_store_on_cpu(int cpu)
{
struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
......@@ -675,6 +688,28 @@ struct event_constraint intel_hsw_pebs_event_constraints[] = {
EVENT_CONSTRAINT_END
};
struct event_constraint intel_skl_pebs_event_constraints[] = {
INTEL_FLAGS_UEVENT_CONSTRAINT(0x1c0, 0x2), /* INST_RETIRED.PREC_DIST */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_NA(0x01c2, 0xf), /* UOPS_RETIRED.ALL */
/* UOPS_RETIRED.ALL, inv=1, cmask=16 (cycles:p). */
INTEL_FLAGS_EVENT_CONSTRAINT(0x108001c2, 0xf),
INTEL_PLD_CONSTRAINT(0x1cd, 0xf), /* MEM_TRANS_RETIRED.* */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x11d0, 0xf), /* MEM_INST_RETIRED.STLB_MISS_LOADS */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x12d0, 0xf), /* MEM_INST_RETIRED.STLB_MISS_STORES */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x21d0, 0xf), /* MEM_INST_RETIRED.LOCK_LOADS */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x22d0, 0xf), /* MEM_INST_RETIRED.LOCK_STORES */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x41d0, 0xf), /* MEM_INST_RETIRED.SPLIT_LOADS */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x42d0, 0xf), /* MEM_INST_RETIRED.SPLIT_STORES */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x81d0, 0xf), /* MEM_INST_RETIRED.ALL_LOADS */
INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x82d0, 0xf), /* MEM_INST_RETIRED.ALL_STORES */
INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD(0xd1, 0xf), /* MEM_LOAD_RETIRED.* */
INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD(0xd2, 0xf), /* MEM_LOAD_L3_HIT_RETIRED.* */
INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD(0xd3, 0xf), /* MEM_LOAD_L3_MISS_RETIRED.* */
/* Allow all events as PEBS with no flags */
INTEL_ALL_EVENT_CONSTRAINT(0, 0xf),
EVENT_CONSTRAINT_END
};
struct event_constraint *intel_pebs_constraints(struct perf_event *event)
{
struct event_constraint *c;
......@@ -754,6 +789,11 @@ void intel_pmu_pebs_disable(struct perf_event *event)
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct hw_perf_event *hwc = &event->hw;
struct debug_store *ds = cpuc->ds;
bool large_pebs = ds->pebs_interrupt_threshold >
ds->pebs_buffer_base + x86_pmu.pebs_record_size;
if (large_pebs)
intel_pmu_drain_pebs_buffer();
cpuc->pebs_enabled &= ~(1ULL << hwc->idx);
......@@ -762,12 +802,8 @@ void intel_pmu_pebs_disable(struct perf_event *event)
else if (event->hw.flags & PERF_X86_EVENT_PEBS_ST)
cpuc->pebs_enabled &= ~(1ULL << 63);
if (ds->pebs_interrupt_threshold >
ds->pebs_buffer_base + x86_pmu.pebs_record_size) {
intel_pmu_drain_pebs_buffer();
if (!pebs_is_enabled(cpuc))
perf_sched_cb_dec(event->ctx->pmu);
}
if (large_pebs && !pebs_is_enabled(cpuc))
perf_sched_cb_dec(event->ctx->pmu);
if (cpuc->enabled)
wrmsrl(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
......@@ -885,7 +921,7 @@ static int intel_pmu_pebs_fixup_ip(struct pt_regs *regs)
return 0;
}
static inline u64 intel_hsw_weight(struct pebs_record_hsw *pebs)
static inline u64 intel_hsw_weight(struct pebs_record_skl *pebs)
{
if (pebs->tsx_tuning) {
union hsw_tsx_tuning tsx = { .value = pebs->tsx_tuning };
......@@ -894,7 +930,7 @@ static inline u64 intel_hsw_weight(struct pebs_record_hsw *pebs)
return 0;
}
static inline u64 intel_hsw_transaction(struct pebs_record_hsw *pebs)
static inline u64 intel_hsw_transaction(struct pebs_record_skl *pebs)
{
u64 txn = (pebs->tsx_tuning & PEBS_HSW_TSX_FLAGS) >> 32;
......@@ -918,7 +954,7 @@ static void setup_pebs_sample_data(struct perf_event *event,
* unconditionally access the 'extra' entries.
*/
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct pebs_record_hsw *pebs = __pebs;
struct pebs_record_skl *pebs = __pebs;
u64 sample_type;
int fll, fst, dsrc;
int fl = event->hw.flags;
......@@ -1016,6 +1052,16 @@ static void setup_pebs_sample_data(struct perf_event *event,
data->txn = intel_hsw_transaction(pebs);
}
/*
* v3 supplies an accurate time stamp, so we use that
* for the time stamp.
*
* We can only do this for the default trace clock.
*/
if (x86_pmu.intel_cap.pebs_format >= 3 &&
event->attr.use_clockid == 0)
data->time = native_sched_clock_from_tsc(pebs->tsc);
if (has_branch_stack(event))
data->br_stack = &cpuc->lbr_stack;
}
......@@ -1142,6 +1188,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
for (at = base; at < top; at += x86_pmu.pebs_record_size) {
struct pebs_record_nhm *p = at;
u64 pebs_status;
/* PEBS v3 has accurate status bits */
if (x86_pmu.intel_cap.pebs_format >= 3) {
......@@ -1152,12 +1199,17 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
continue;
}
bit = find_first_bit((unsigned long *)&p->status,
pebs_status = p->status & cpuc->pebs_enabled;
pebs_status &= (1ULL << x86_pmu.max_pebs_events) - 1;
bit = find_first_bit((unsigned long *)&pebs_status,
x86_pmu.max_pebs_events);
if (bit >= x86_pmu.max_pebs_events)
continue;
if (!test_bit(bit, cpuc->active_mask))
if (WARN(bit >= x86_pmu.max_pebs_events,
"PEBS record without PEBS event! status=%Lx pebs_enabled=%Lx active_mask=%Lx",
(unsigned long long)p->status, (unsigned long long)cpuc->pebs_enabled,
*(unsigned long long *)cpuc->active_mask))
continue;
/*
* The PEBS hardware does not deal well with the situation
* when events happen near to each other and multiple bits
......@@ -1172,27 +1224,21 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
* one, and it's not possible to reconstruct all events
* that caused the PEBS record. It's called collision.
* If collision happened, the record will be dropped.
*
*/
if (p->status != (1 << bit)) {
u64 pebs_status;
/* slow path */
pebs_status = p->status & cpuc->pebs_enabled;
pebs_status &= (1ULL << MAX_PEBS_EVENTS) - 1;
if (pebs_status != (1 << bit)) {
for_each_set_bit(i, (unsigned long *)&pebs_status,
MAX_PEBS_EVENTS)
error[i]++;
continue;
}
if (p->status != (1ULL << bit)) {
for_each_set_bit(i, (unsigned long *)&pebs_status,
x86_pmu.max_pebs_events)
error[i]++;
continue;
}
counts[bit]++;
}
for (bit = 0; bit < x86_pmu.max_pebs_events; bit++) {
if ((counts[bit] == 0) && (error[bit] == 0))
continue;
event = cpuc->events[bit];
WARN_ON_ONCE(!event);
WARN_ON_ONCE(!event->attr.precise_ip);
......@@ -1245,6 +1291,14 @@ void __init intel_ds_init(void)
x86_pmu.drain_pebs = intel_pmu_drain_pebs_nhm;
break;
case 3:
pr_cont("PEBS fmt3%c, ", pebs_type);
x86_pmu.pebs_record_size =
sizeof(struct pebs_record_skl);
x86_pmu.drain_pebs = intel_pmu_drain_pebs_nhm;
x86_pmu.free_running_flags |= PERF_SAMPLE_TIME;
break;
default:
printk(KERN_CONT "no PEBS fmt%d%c, ", format, pebs_type);
x86_pmu.pebs = 0;
......
......@@ -13,7 +13,8 @@ enum {
LBR_FORMAT_EIP = 0x02,
LBR_FORMAT_EIP_FLAGS = 0x03,
LBR_FORMAT_EIP_FLAGS2 = 0x04,
LBR_FORMAT_MAX_KNOWN = LBR_FORMAT_EIP_FLAGS2,
LBR_FORMAT_INFO = 0x05,
LBR_FORMAT_MAX_KNOWN = LBR_FORMAT_INFO,
};
static enum {
......@@ -139,6 +140,13 @@ static void __intel_pmu_lbr_enable(bool pmi)
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
u64 debugctl, lbr_select = 0, orig_debugctl;
/*
* No need to unfreeze manually, as v4 can do that as part
* of the GLOBAL_STATUS ack.
*/
if (pmi && x86_pmu.version >= 4)
return;
/*
* No need to reprogram LBR_SELECT in a PMI, as it
* did not change.
......@@ -186,6 +194,8 @@ static void intel_pmu_lbr_reset_64(void)
for (i = 0; i < x86_pmu.lbr_nr; i++) {
wrmsrl(x86_pmu.lbr_from + i, 0);
wrmsrl(x86_pmu.lbr_to + i, 0);
if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO)
wrmsrl(MSR_LBR_INFO_0 + i, 0);
}
}
......@@ -230,10 +240,12 @@ static void __intel_pmu_lbr_restore(struct x86_perf_task_context *task_ctx)
mask = x86_pmu.lbr_nr - 1;
tos = intel_pmu_lbr_tos();
for (i = 0; i < x86_pmu.lbr_nr; i++) {
for (i = 0; i < tos; i++) {
lbr_idx = (tos - i) & mask;
wrmsrl(x86_pmu.lbr_from + lbr_idx, task_ctx->lbr_from[i]);
wrmsrl(x86_pmu.lbr_to + lbr_idx, task_ctx->lbr_to[i]);
if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO)
wrmsrl(MSR_LBR_INFO_0 + lbr_idx, task_ctx->lbr_info[i]);
}
task_ctx->lbr_stack_state = LBR_NONE;
}
......@@ -251,10 +263,12 @@ static void __intel_pmu_lbr_save(struct x86_perf_task_context *task_ctx)
mask = x86_pmu.lbr_nr - 1;
tos = intel_pmu_lbr_tos();
for (i = 0; i < x86_pmu.lbr_nr; i++) {
for (i = 0; i < tos; i++) {
lbr_idx = (tos - i) & mask;
rdmsrl(x86_pmu.lbr_from + lbr_idx, task_ctx->lbr_from[i]);
rdmsrl(x86_pmu.lbr_to + lbr_idx, task_ctx->lbr_to[i]);
if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO)
rdmsrl(MSR_LBR_INFO_0 + lbr_idx, task_ctx->lbr_info[i]);
}
task_ctx->lbr_stack_state = LBR_VALID;
}
......@@ -411,16 +425,31 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)
u64 tos = intel_pmu_lbr_tos();
int i;
int out = 0;
int num = x86_pmu.lbr_nr;
for (i = 0; i < x86_pmu.lbr_nr; i++) {
if (cpuc->lbr_sel->config & LBR_CALL_STACK)
num = tos;
for (i = 0; i < num; i++) {
unsigned long lbr_idx = (tos - i) & mask;
u64 from, to, mis = 0, pred = 0, in_tx = 0, abort = 0;
int skip = 0;
u16 cycles = 0;
int lbr_flags = lbr_desc[lbr_format];
rdmsrl(x86_pmu.lbr_from + lbr_idx, from);
rdmsrl(x86_pmu.lbr_to + lbr_idx, to);
if (lbr_format == LBR_FORMAT_INFO) {
u64 info;
rdmsrl(MSR_LBR_INFO_0 + lbr_idx, info);
mis = !!(info & LBR_INFO_MISPRED);
pred = !mis;
in_tx = !!(info & LBR_INFO_IN_TX);
abort = !!(info & LBR_INFO_ABORT);
cycles = (info & LBR_INFO_CYCLES);
}
if (lbr_flags & LBR_EIP_FLAGS) {
mis = !!(from & LBR_FROM_FLAG_MISPRED);
pred = !mis;
......@@ -450,6 +479,7 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)
cpuc->lbr_entries[out].predicted = pred;
cpuc->lbr_entries[out].in_tx = in_tx;
cpuc->lbr_entries[out].abort = abort;
cpuc->lbr_entries[out].cycles = cycles;
cpuc->lbr_entries[out].reserved = 0;
out++;
}
......@@ -947,6 +977,26 @@ void intel_pmu_lbr_init_hsw(void)
pr_cont("16-deep LBR, ");
}
/* skylake */
__init void intel_pmu_lbr_init_skl(void)
{
x86_pmu.lbr_nr = 32;
x86_pmu.lbr_tos = MSR_LBR_TOS;
x86_pmu.lbr_from = MSR_LBR_NHM_FROM;
x86_pmu.lbr_to = MSR_LBR_NHM_TO;
x86_pmu.lbr_sel_mask = LBR_SEL_MASK;
x86_pmu.lbr_sel_map = hsw_lbr_sel_map;
/*
* SW branch filter usage:
* - support syscall, sysret capture.
* That requires LBR_FAR but that means far
* jmp need to be filtered out
*/
pr_cont("32-deep LBR, ");
}
/* atom */
void __init intel_pmu_lbr_init_atom(void)
{
......
......@@ -65,15 +65,21 @@ static struct pt_cap_desc {
} pt_caps[] = {
PT_CAP(max_subleaf, 0, CR_EAX, 0xffffffff),
PT_CAP(cr3_filtering, 0, CR_EBX, BIT(0)),
PT_CAP(psb_cyc, 0, CR_EBX, BIT(1)),
PT_CAP(mtc, 0, CR_EBX, BIT(3)),
PT_CAP(topa_output, 0, CR_ECX, BIT(0)),
PT_CAP(topa_multiple_entries, 0, CR_ECX, BIT(1)),
PT_CAP(single_range_output, 0, CR_ECX, BIT(2)),
PT_CAP(payloads_lip, 0, CR_ECX, BIT(31)),
PT_CAP(mtc_periods, 1, CR_EAX, 0xffff0000),
PT_CAP(cycle_thresholds, 1, CR_EBX, 0xffff),
PT_CAP(psb_periods, 1, CR_EBX, 0xffff0000),
};
static u32 pt_cap_get(enum pt_capabilities cap)
{
struct pt_cap_desc *cd = &pt_caps[cap];
u32 c = pt_pmu.caps[cd->leaf * 4 + cd->reg];
u32 c = pt_pmu.caps[cd->leaf * PT_CPUID_REGS_NUM + cd->reg];
unsigned int shift = __ffs(cd->mask);
return (c & cd->mask) >> shift;
......@@ -94,12 +100,22 @@ static struct attribute_group pt_cap_group = {
.name = "caps",
};
PMU_FORMAT_ATTR(cyc, "config:1" );
PMU_FORMAT_ATTR(mtc, "config:9" );
PMU_FORMAT_ATTR(tsc, "config:10" );
PMU_FORMAT_ATTR(noretcomp, "config:11" );
PMU_FORMAT_ATTR(mtc_period, "config:14-17" );
PMU_FORMAT_ATTR(cyc_thresh, "config:19-22" );
PMU_FORMAT_ATTR(psb_period, "config:24-27" );
static struct attribute *pt_formats_attr[] = {
&format_attr_cyc.attr,
&format_attr_mtc.attr,
&format_attr_tsc.attr,
&format_attr_noretcomp.attr,
&format_attr_mtc_period.attr,
&format_attr_cyc_thresh.attr,
&format_attr_psb_period.attr,
NULL,
};
......@@ -129,10 +145,10 @@ static int __init pt_pmu_hw_init(void)
for (i = 0; i < PT_CPUID_LEAVES; i++) {
cpuid_count(20, i,
&pt_pmu.caps[CR_EAX + i*4],
&pt_pmu.caps[CR_EBX + i*4],
&pt_pmu.caps[CR_ECX + i*4],
&pt_pmu.caps[CR_EDX + i*4]);
&pt_pmu.caps[CR_EAX + i*PT_CPUID_REGS_NUM],
&pt_pmu.caps[CR_EBX + i*PT_CPUID_REGS_NUM],
&pt_pmu.caps[CR_ECX + i*PT_CPUID_REGS_NUM],
&pt_pmu.caps[CR_EDX + i*PT_CPUID_REGS_NUM]);
}
ret = -ENOMEM;
......@@ -170,15 +186,65 @@ static int __init pt_pmu_hw_init(void)
return ret;
}
#define PT_CONFIG_MASK (RTIT_CTL_TSC_EN | RTIT_CTL_DISRETC)
#define RTIT_CTL_CYC_PSB (RTIT_CTL_CYCLEACC | \
RTIT_CTL_CYC_THRESH | \
RTIT_CTL_PSB_FREQ)
#define RTIT_CTL_MTC (RTIT_CTL_MTC_EN | \
RTIT_CTL_MTC_RANGE)
#define PT_CONFIG_MASK (RTIT_CTL_TSC_EN | \
RTIT_CTL_DISRETC | \
RTIT_CTL_CYC_PSB | \
RTIT_CTL_MTC)
static bool pt_event_valid(struct perf_event *event)
{
u64 config = event->attr.config;
u64 allowed, requested;
if ((config & PT_CONFIG_MASK) != config)
return false;
if (config & RTIT_CTL_CYC_PSB) {
if (!pt_cap_get(PT_CAP_psb_cyc))
return false;
allowed = pt_cap_get(PT_CAP_psb_periods);
requested = (config & RTIT_CTL_PSB_FREQ) >>
RTIT_CTL_PSB_FREQ_OFFSET;
if (requested && (!(allowed & BIT(requested))))
return false;
allowed = pt_cap_get(PT_CAP_cycle_thresholds);
requested = (config & RTIT_CTL_CYC_THRESH) >>
RTIT_CTL_CYC_THRESH_OFFSET;
if (requested && (!(allowed & BIT(requested))))
return false;
}
if (config & RTIT_CTL_MTC) {
/*
* In the unlikely case that CPUID lists valid mtc periods,
* but not the mtc capability, drop out here.
*
* Spec says that setting mtc period bits while mtc bit in
* CPUID is 0 will #GP, so better safe than sorry.
*/
if (!pt_cap_get(PT_CAP_mtc))
return false;
allowed = pt_cap_get(PT_CAP_mtc_periods);
if (!allowed)
return false;
requested = (config & RTIT_CTL_MTC_RANGE) >>
RTIT_CTL_MTC_RANGE_OFFSET;
if (!(allowed & BIT(requested)))
return false;
}
return true;
}
......@@ -191,6 +257,11 @@ static void pt_config(struct perf_event *event)
{
u64 reg;
if (!event->hw.itrace_started) {
event->hw.itrace_started = 1;
wrmsrl(MSR_IA32_RTIT_STATUS, 0);
}
reg = RTIT_CTL_TOPA | RTIT_CTL_BRANCH_EN | RTIT_CTL_TRACEEN;
if (!event->attr.exclude_kernel)
......@@ -910,7 +981,6 @@ void intel_pt_interrupt(void)
pt_config_buffer(buf->cur->table, buf->cur_idx,
buf->output_off);
wrmsrl(MSR_IA32_RTIT_STATUS, 0);
pt_config(event);
}
}
......@@ -934,7 +1004,6 @@ static void pt_event_start(struct perf_event *event, int mode)
pt_config_buffer(buf->cur->table, buf->cur_idx,
buf->output_off);
wrmsrl(MSR_IA32_RTIT_STATUS, 0);
pt_config(event);
}
......
......@@ -86,6 +86,10 @@ static const char *rapl_domain_names[NR_RAPL_DOMAINS] __initconst = {
1<<RAPL_IDX_RAM_NRG_STAT|\
1<<RAPL_IDX_PP1_NRG_STAT)
/* Knights Landing has PKG, RAM */
#define RAPL_IDX_KNL (1<<RAPL_IDX_PKG_NRG_STAT|\
1<<RAPL_IDX_RAM_NRG_STAT)
/*
* event code: LSB 8 bits, passed in attr->config
* any other bit is reserved
......@@ -486,6 +490,18 @@ static struct attribute *rapl_events_hsw_attr[] = {
NULL,
};
static struct attribute *rapl_events_knl_attr[] = {
EVENT_PTR(rapl_pkg),
EVENT_PTR(rapl_ram),
EVENT_PTR(rapl_pkg_unit),
EVENT_PTR(rapl_ram_unit),
EVENT_PTR(rapl_pkg_scale),
EVENT_PTR(rapl_ram_scale),
NULL,
};
static struct attribute_group rapl_pmu_events_group = {
.name = "events",
.attrs = NULL, /* patched at runtime */
......@@ -730,6 +746,10 @@ static int __init rapl_pmu_init(void)
rapl_cntr_mask = RAPL_IDX_SRV;
rapl_pmu_events_group.attrs = rapl_events_srv_attr;
break;
case 87: /* Knights Landing */
rapl_add_quirk(rapl_hsw_server_quirk);
rapl_cntr_mask = RAPL_IDX_KNL;
rapl_pmu_events_group.attrs = rapl_events_knl_attr;
default:
/* unsupported */
......
......@@ -911,6 +911,9 @@ static int __init uncore_pci_init(void)
case 63: /* Haswell-EP */
ret = hswep_uncore_pci_init();
break;
case 86: /* BDX-DE */
ret = bdx_uncore_pci_init();
break;
case 42: /* Sandy Bridge */
ret = snb_uncore_pci_init();
break;
......@@ -1209,6 +1212,11 @@ static int __init uncore_cpu_init(void)
break;
case 42: /* Sandy Bridge */
case 58: /* Ivy Bridge */
case 60: /* Haswell */
case 69: /* Haswell */
case 70: /* Haswell */
case 61: /* Broadwell */
case 71: /* Broadwell */
snb_uncore_cpu_init();
break;
case 45: /* Sandy Bridge-EP */
......@@ -1224,6 +1232,9 @@ static int __init uncore_cpu_init(void)
case 63: /* Haswell-EP */
hswep_uncore_cpu_init();
break;
case 86: /* BDX-DE */
bdx_uncore_cpu_init();
break;
default:
return 0;
}
......
......@@ -336,6 +336,8 @@ int ivbep_uncore_pci_init(void);
void ivbep_uncore_cpu_init(void);
int hswep_uncore_pci_init(void);
void hswep_uncore_cpu_init(void);
int bdx_uncore_pci_init(void);
void bdx_uncore_cpu_init(void);
/* perf_event_intel_uncore_nhmex.c */
void nhmex_uncore_cpu_init(void);
......@@ -45,6 +45,11 @@
#define SNB_UNC_CBO_0_PER_CTR0 0x706
#define SNB_UNC_CBO_MSR_OFFSET 0x10
/* SNB ARB register */
#define SNB_UNC_ARB_PER_CTR0 0x3b0
#define SNB_UNC_ARB_PERFEVTSEL0 0x3b2
#define SNB_UNC_ARB_MSR_OFFSET 0x10
/* NHM global control register */
#define NHM_UNC_PERF_GLOBAL_CTL 0x391
#define NHM_UNC_FIXED_CTR 0x394
......@@ -115,7 +120,7 @@ static struct intel_uncore_ops snb_uncore_msr_ops = {
.read_counter = uncore_msr_read_counter,
};
static struct event_constraint snb_uncore_cbox_constraints[] = {
static struct event_constraint snb_uncore_arb_constraints[] = {
UNCORE_EVENT_CONSTRAINT(0x80, 0x1),
UNCORE_EVENT_CONSTRAINT(0x83, 0x1),
EVENT_CONSTRAINT_END
......@@ -134,14 +139,28 @@ static struct intel_uncore_type snb_uncore_cbox = {
.single_fixed = 1,
.event_mask = SNB_UNC_RAW_EVENT_MASK,
.msr_offset = SNB_UNC_CBO_MSR_OFFSET,
.constraints = snb_uncore_cbox_constraints,
.ops = &snb_uncore_msr_ops,
.format_group = &snb_uncore_format_group,
.event_descs = snb_uncore_events,
};
static struct intel_uncore_type snb_uncore_arb = {
.name = "arb",
.num_counters = 2,
.num_boxes = 1,
.perf_ctr_bits = 44,
.perf_ctr = SNB_UNC_ARB_PER_CTR0,
.event_ctl = SNB_UNC_ARB_PERFEVTSEL0,
.event_mask = SNB_UNC_RAW_EVENT_MASK,
.msr_offset = SNB_UNC_ARB_MSR_OFFSET,
.constraints = snb_uncore_arb_constraints,
.ops = &snb_uncore_msr_ops,
.format_group = &snb_uncore_format_group,
};
static struct intel_uncore_type *snb_msr_uncores[] = {
&snb_uncore_cbox,
&snb_uncore_arb,
NULL,
};
......
......@@ -2215,7 +2215,7 @@ static struct intel_uncore_type *hswep_pci_uncores[] = {
NULL,
};
static DEFINE_PCI_DEVICE_TABLE(hswep_uncore_pci_ids) = {
static const struct pci_device_id hswep_uncore_pci_ids[] = {
{ /* Home Agent 0 */
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x2f30),
.driver_data = UNCORE_PCI_DEV_DATA(HSWEP_PCI_UNCORE_HA, 0),
......@@ -2321,3 +2321,167 @@ int hswep_uncore_pci_init(void)
return 0;
}
/* end of Haswell-EP uncore support */
/* BDX-DE uncore support */
static struct intel_uncore_type bdx_uncore_ubox = {
.name = "ubox",
.num_counters = 2,
.num_boxes = 1,
.perf_ctr_bits = 48,
.fixed_ctr_bits = 48,
.perf_ctr = HSWEP_U_MSR_PMON_CTR0,
.event_ctl = HSWEP_U_MSR_PMON_CTL0,
.event_mask = SNBEP_U_MSR_PMON_RAW_EVENT_MASK,
.fixed_ctr = HSWEP_U_MSR_PMON_UCLK_FIXED_CTR,
.fixed_ctl = HSWEP_U_MSR_PMON_UCLK_FIXED_CTL,
.num_shared_regs = 1,
.ops = &ivbep_uncore_msr_ops,
.format_group = &ivbep_uncore_ubox_format_group,
};
static struct event_constraint bdx_uncore_cbox_constraints[] = {
UNCORE_EVENT_CONSTRAINT(0x09, 0x3),
UNCORE_EVENT_CONSTRAINT(0x11, 0x1),
UNCORE_EVENT_CONSTRAINT(0x36, 0x1),
EVENT_CONSTRAINT_END
};
static struct intel_uncore_type bdx_uncore_cbox = {
.name = "cbox",
.num_counters = 4,
.num_boxes = 8,
.perf_ctr_bits = 48,
.event_ctl = HSWEP_C0_MSR_PMON_CTL0,
.perf_ctr = HSWEP_C0_MSR_PMON_CTR0,
.event_mask = SNBEP_CBO_MSR_PMON_RAW_EVENT_MASK,
.box_ctl = HSWEP_C0_MSR_PMON_BOX_CTL,
.msr_offset = HSWEP_CBO_MSR_OFFSET,
.num_shared_regs = 1,
.constraints = bdx_uncore_cbox_constraints,
.ops = &hswep_uncore_cbox_ops,
.format_group = &hswep_uncore_cbox_format_group,
};
static struct intel_uncore_type *bdx_msr_uncores[] = {
&bdx_uncore_ubox,
&bdx_uncore_cbox,
&hswep_uncore_pcu,
NULL,
};
void bdx_uncore_cpu_init(void)
{
if (bdx_uncore_cbox.num_boxes > boot_cpu_data.x86_max_cores)
bdx_uncore_cbox.num_boxes = boot_cpu_data.x86_max_cores;
uncore_msr_uncores = bdx_msr_uncores;
}
static struct intel_uncore_type bdx_uncore_ha = {
.name = "ha",
.num_counters = 4,
.num_boxes = 1,
.perf_ctr_bits = 48,
SNBEP_UNCORE_PCI_COMMON_INIT(),
};
static struct intel_uncore_type bdx_uncore_imc = {
.name = "imc",
.num_counters = 5,
.num_boxes = 2,
.perf_ctr_bits = 48,
.fixed_ctr_bits = 48,
.fixed_ctr = SNBEP_MC_CHy_PCI_PMON_FIXED_CTR,
.fixed_ctl = SNBEP_MC_CHy_PCI_PMON_FIXED_CTL,
.event_descs = hswep_uncore_imc_events,
SNBEP_UNCORE_PCI_COMMON_INIT(),
};
static struct intel_uncore_type bdx_uncore_irp = {
.name = "irp",
.num_counters = 4,
.num_boxes = 1,
.perf_ctr_bits = 48,
.event_mask = SNBEP_PMON_RAW_EVENT_MASK,
.box_ctl = SNBEP_PCI_PMON_BOX_CTL,
.ops = &hswep_uncore_irp_ops,
.format_group = &snbep_uncore_format_group,
};
static struct event_constraint bdx_uncore_r2pcie_constraints[] = {
UNCORE_EVENT_CONSTRAINT(0x10, 0x3),
UNCORE_EVENT_CONSTRAINT(0x11, 0x3),
UNCORE_EVENT_CONSTRAINT(0x13, 0x1),
UNCORE_EVENT_CONSTRAINT(0x23, 0x1),
UNCORE_EVENT_CONSTRAINT(0x25, 0x1),
UNCORE_EVENT_CONSTRAINT(0x26, 0x3),
UNCORE_EVENT_CONSTRAINT(0x2d, 0x3),
EVENT_CONSTRAINT_END
};
static struct intel_uncore_type bdx_uncore_r2pcie = {
.name = "r2pcie",
.num_counters = 4,
.num_boxes = 1,
.perf_ctr_bits = 48,
.constraints = bdx_uncore_r2pcie_constraints,
SNBEP_UNCORE_PCI_COMMON_INIT(),
};
enum {
BDX_PCI_UNCORE_HA,
BDX_PCI_UNCORE_IMC,
BDX_PCI_UNCORE_IRP,
BDX_PCI_UNCORE_R2PCIE,
};
static struct intel_uncore_type *bdx_pci_uncores[] = {
[BDX_PCI_UNCORE_HA] = &bdx_uncore_ha,
[BDX_PCI_UNCORE_IMC] = &bdx_uncore_imc,
[BDX_PCI_UNCORE_IRP] = &bdx_uncore_irp,
[BDX_PCI_UNCORE_R2PCIE] = &bdx_uncore_r2pcie,
NULL,
};
static DEFINE_PCI_DEVICE_TABLE(bdx_uncore_pci_ids) = {
{ /* Home Agent 0 */
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f30),
.driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_HA, 0),
},
{ /* MC0 Channel 0 */
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6fb0),
.driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_IMC, 0),
},
{ /* MC0 Channel 1 */
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6fb1),
.driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_IMC, 1),
},
{ /* IRP */
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f39),
.driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_IRP, 0),
},
{ /* R2PCIe */
PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x6f34),
.driver_data = UNCORE_PCI_DEV_DATA(BDX_PCI_UNCORE_R2PCIE, 0),
},
{ /* end: all zeroes */ }
};
static struct pci_driver bdx_uncore_pci_driver = {
.name = "bdx_uncore",
.id_table = bdx_uncore_pci_ids,
};
int bdx_uncore_pci_init(void)
{
int ret = snbep_pci2phy_map_init(0x6f1e);
if (ret)
return ret;
uncore_pci_uncores = bdx_pci_uncores;
uncore_pci_driver = &bdx_uncore_pci_driver;
return 0;
}
/* end of BDX-DE uncore support */
#include <linux/perf_event.h>
enum perf_msr_id {
PERF_MSR_TSC = 0,
PERF_MSR_APERF = 1,
PERF_MSR_MPERF = 2,
PERF_MSR_PPERF = 3,
PERF_MSR_SMI = 4,
PERF_MSR_EVENT_MAX,
};
bool test_aperfmperf(int idx)
{
return boot_cpu_has(X86_FEATURE_APERFMPERF);
}
bool test_intel(int idx)
{
if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
boot_cpu_data.x86 != 6)
return false;
switch (boot_cpu_data.x86_model) {
case 30: /* 45nm Nehalem */
case 26: /* 45nm Nehalem-EP */
case 46: /* 45nm Nehalem-EX */
case 37: /* 32nm Westmere */
case 44: /* 32nm Westmere-EP */
case 47: /* 32nm Westmere-EX */
case 42: /* 32nm SandyBridge */
case 45: /* 32nm SandyBridge-E/EN/EP */
case 58: /* 22nm IvyBridge */
case 62: /* 22nm IvyBridge-EP/EX */
case 60: /* 22nm Haswell Core */
case 63: /* 22nm Haswell Server */
case 69: /* 22nm Haswell ULT */
case 70: /* 22nm Haswell + GT3e (Intel Iris Pro graphics) */
case 61: /* 14nm Broadwell Core-M */
case 86: /* 14nm Broadwell Xeon D */
case 71: /* 14nm Broadwell + GT3e (Intel Iris Pro graphics) */
case 79: /* 14nm Broadwell Server */
case 55: /* 22nm Atom "Silvermont" */
case 77: /* 22nm Atom "Silvermont Avoton/Rangely" */
case 76: /* 14nm Atom "Airmont" */
if (idx == PERF_MSR_SMI)
return true;
break;
case 78: /* 14nm Skylake Mobile */
case 94: /* 14nm Skylake Desktop */
if (idx == PERF_MSR_SMI || idx == PERF_MSR_PPERF)
return true;
break;
}
return false;
}
struct perf_msr {
u64 msr;
struct perf_pmu_events_attr *attr;
bool (*test)(int idx);
};
PMU_EVENT_ATTR_STRING(tsc, evattr_tsc, "event=0x00");
PMU_EVENT_ATTR_STRING(aperf, evattr_aperf, "event=0x01");
PMU_EVENT_ATTR_STRING(mperf, evattr_mperf, "event=0x02");
PMU_EVENT_ATTR_STRING(pperf, evattr_pperf, "event=0x03");
PMU_EVENT_ATTR_STRING(smi, evattr_smi, "event=0x04");
static struct perf_msr msr[] = {
[PERF_MSR_TSC] = { 0, &evattr_tsc, NULL, },
[PERF_MSR_APERF] = { MSR_IA32_APERF, &evattr_aperf, test_aperfmperf, },
[PERF_MSR_MPERF] = { MSR_IA32_MPERF, &evattr_mperf, test_aperfmperf, },
[PERF_MSR_PPERF] = { MSR_PPERF, &evattr_pperf, test_intel, },
[PERF_MSR_SMI] = { MSR_SMI_COUNT, &evattr_smi, test_intel, },
};
static struct attribute *events_attrs[PERF_MSR_EVENT_MAX + 1] = {
NULL,
};
static struct attribute_group events_attr_group = {
.name = "events",
.attrs = events_attrs,
};
PMU_FORMAT_ATTR(event, "config:0-63");
static struct attribute *format_attrs[] = {
&format_attr_event.attr,
NULL,
};
static struct attribute_group format_attr_group = {
.name = "format",
.attrs = format_attrs,
};
static const struct attribute_group *attr_groups[] = {
&events_attr_group,
&format_attr_group,
NULL,
};
static int msr_event_init(struct perf_event *event)
{
u64 cfg = event->attr.config;
if (event->attr.type != event->pmu->type)
return -ENOENT;
if (cfg >= PERF_MSR_EVENT_MAX)
return -EINVAL;
/* unsupported modes and filters */
if (event->attr.exclude_user ||
event->attr.exclude_kernel ||
event->attr.exclude_hv ||
event->attr.exclude_idle ||
event->attr.exclude_host ||
event->attr.exclude_guest ||
event->attr.sample_period) /* no sampling */
return -EINVAL;
if (!msr[cfg].attr)
return -EINVAL;
event->hw.idx = -1;
event->hw.event_base = msr[cfg].msr;
event->hw.config = cfg;
return 0;
}
static inline u64 msr_read_counter(struct perf_event *event)
{
u64 now;
if (event->hw.event_base)
rdmsrl(event->hw.event_base, now);
else
rdtscll(now);
return now;
}
static void msr_event_update(struct perf_event *event)
{
u64 prev, now;
s64 delta;
/* Careful, an NMI might modify the previous event value. */
again:
prev = local64_read(&event->hw.prev_count);
now = msr_read_counter(event);
if (local64_cmpxchg(&event->hw.prev_count, prev, now) != prev)
goto again;
delta = now - prev;
if (unlikely(event->hw.event_base == MSR_SMI_COUNT)) {
delta <<= 32;
delta >>= 32; /* sign extend */
}
local64_add(now - prev, &event->count);
}
static void msr_event_start(struct perf_event *event, int flags)
{
u64 now;
now = msr_read_counter(event);
local64_set(&event->hw.prev_count, now);
}
static void msr_event_stop(struct perf_event *event, int flags)
{
msr_event_update(event);
}
static void msr_event_del(struct perf_event *event, int flags)
{
msr_event_stop(event, PERF_EF_UPDATE);
}
static int msr_event_add(struct perf_event *event, int flags)
{
if (flags & PERF_EF_START)
msr_event_start(event, flags);
return 0;
}
static struct pmu pmu_msr = {
.task_ctx_nr = perf_sw_context,
.attr_groups = attr_groups,
.event_init = msr_event_init,
.add = msr_event_add,
.del = msr_event_del,
.start = msr_event_start,
.stop = msr_event_stop,
.read = msr_event_update,
.capabilities = PERF_PMU_CAP_NO_INTERRUPT,
};
static int __init msr_init(void)
{
int i, j = 0;
if (!boot_cpu_has(X86_FEATURE_TSC)) {
pr_cont("no MSR PMU driver.\n");
return 0;
}
/* Probe the MSRs. */
for (i = PERF_MSR_TSC + 1; i < PERF_MSR_EVENT_MAX; i++) {
u64 val;
/*
* Virt sucks arse; you cannot tell if a R/O MSR is present :/
*/
if (!msr[i].test(i) || rdmsrl_safe(msr[i].msr, &val))
msr[i].attr = NULL;
}
/* List remaining MSRs in the sysfs attrs. */
for (i = 0; i < PERF_MSR_EVENT_MAX; i++) {
if (msr[i].attr)
events_attrs[j++] = &msr[i].attr->attr.attr;
}
events_attrs[j] = NULL;
perf_pmu_register(&pmu_msr, "msr", -1);
return 0;
}
device_initcall(msr_init);
......@@ -32,6 +32,7 @@
#include <linux/irqflags.h>
#include <linux/notifier.h>
#include <linux/kallsyms.h>
#include <linux/kprobes.h>
#include <linux/percpu.h>
#include <linux/kdebug.h>
#include <linux/kernel.h>
......@@ -179,7 +180,11 @@ int arch_check_bp_in_kernelspace(struct perf_event *bp)
va = info->address;
len = bp->attr.bp_len;
return (va >= TASK_SIZE) && ((va + len - 1) >= TASK_SIZE);
/*
* We don't need to worry about va + len - 1 overflowing:
* we already require that va is aligned to a multiple of len.
*/
return (va >= TASK_SIZE_MAX) || ((va + len - 1) >= TASK_SIZE_MAX);
}
int arch_bp_generic_fields(int x86_len, int x86_type,
......@@ -243,6 +248,20 @@ static int arch_build_bp_info(struct perf_event *bp)
info->type = X86_BREAKPOINT_RW;
break;
case HW_BREAKPOINT_X:
/*
* We don't allow kernel breakpoints in places that are not
* acceptable for kprobes. On non-kprobes kernels, we don't
* allow kernel breakpoints at all.
*/
if (bp->attr.bp_addr >= TASK_SIZE_MAX) {
#ifdef CONFIG_KPROBES
if (within_kprobe_blacklist(bp->attr.bp_addr))
return -EINVAL;
#else
return -EINVAL;
#endif
}
info->type = X86_BREAKPOINT_EXECUTE;
/*
* x86 inst breakpoints need to have a specific undefined len.
......@@ -276,8 +295,18 @@ static int arch_build_bp_info(struct perf_event *bp)
break;
#endif
default:
/* AMD range breakpoint */
if (!is_power_of_2(bp->attr.bp_len))
return -EINVAL;
if (bp->attr.bp_addr & (bp->attr.bp_len - 1))
return -EINVAL;
/*
* It's impossible to use a range breakpoint to fake out
* user vs kernel detection because bp_len - 1 can't
* have the high bit set. If we ever allow range instruction
* breakpoints, then we'll have to check for kprobe-blacklisted
* addresses anywhere in the range.
*/
if (!cpu_has_bpext)
return -EOPNOTSUPP;
info->mask = bp->attr.bp_len - 1;
......
......@@ -296,6 +296,14 @@ u64 native_sched_clock(void)
return cycles_2_ns(tsc_now);
}
/*
* Generate a sched_clock if you already have a TSC value.
*/
u64 native_sched_clock_from_tsc(u64 tsc)
{
return cycles_2_ns(tsc);
}
/* We need to define a real function for sched_clock, to override the
weak default version */
#ifdef CONFIG_PARAVIRT
......
......@@ -985,3 +985,12 @@ arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs
return -1;
}
bool arch_uretprobe_is_alive(struct return_instance *ret, enum rp_check ctx,
struct pt_regs *regs)
{
if (ctx == RP_CHECK_CALL) /* sp was just decremented by "call" insn */
return regs->sp < ret->stack;
else
return regs->sp <= ret->stack;
}
......@@ -267,6 +267,8 @@ extern void show_registers(struct pt_regs *regs);
extern void kprobes_inc_nmissed_count(struct kprobe *p);
extern bool arch_within_kprobe_blacklist(unsigned long addr);
extern bool within_kprobe_blacklist(unsigned long addr);
struct kprobe_insn_cache {
struct mutex mutex;
void *(*alloc)(void); /* allocate insn page */
......
......@@ -243,6 +243,7 @@ enum {
TRACE_EVENT_FL_USE_CALL_FILTER_BIT,
TRACE_EVENT_FL_TRACEPOINT_BIT,
TRACE_EVENT_FL_KPROBE_BIT,
TRACE_EVENT_FL_UPROBE_BIT,
};
/*
......@@ -257,6 +258,7 @@ enum {
* USE_CALL_FILTER - For trace internal events, don't use file filter
* TRACEPOINT - Event is a tracepoint
* KPROBE - Event is a kprobe
* UPROBE - Event is a uprobe
*/
enum {
TRACE_EVENT_FL_FILTERED = (1 << TRACE_EVENT_FL_FILTERED_BIT),
......@@ -267,8 +269,11 @@ enum {
TRACE_EVENT_FL_USE_CALL_FILTER = (1 << TRACE_EVENT_FL_USE_CALL_FILTER_BIT),
TRACE_EVENT_FL_TRACEPOINT = (1 << TRACE_EVENT_FL_TRACEPOINT_BIT),
TRACE_EVENT_FL_KPROBE = (1 << TRACE_EVENT_FL_KPROBE_BIT),
TRACE_EVENT_FL_UPROBE = (1 << TRACE_EVENT_FL_UPROBE_BIT),
};
#define TRACE_EVENT_FL_UKPROBE (TRACE_EVENT_FL_KPROBE | TRACE_EVENT_FL_UPROBE)
struct trace_event_call {
struct list_head list;
struct trace_event_class *class;
......@@ -542,7 +547,7 @@ event_trigger_unlock_commit_regs(struct trace_event_file *file,
event_triggers_post_call(file, tt);
}
#ifdef CONFIG_BPF_SYSCALL
#ifdef CONFIG_BPF_EVENTS
unsigned int trace_call_bpf(struct bpf_prog *prog, void *ctx);
#else
static inline unsigned int trace_call_bpf(struct bpf_prog *prog, void *ctx)
......
......@@ -92,6 +92,22 @@ struct uprobe_task {
unsigned int depth;
};
struct return_instance {
struct uprobe *uprobe;
unsigned long func;
unsigned long stack; /* stack pointer */
unsigned long orig_ret_vaddr; /* original return address */
bool chained; /* true, if instance is nested */
struct return_instance *next; /* keep as stack */
};
enum rp_check {
RP_CHECK_CALL,
RP_CHECK_CHAIN_CALL,
RP_CHECK_RET,
};
struct xol_area;
struct uprobes_state {
......@@ -128,6 +144,7 @@ extern bool arch_uprobe_xol_was_trapped(struct task_struct *tsk);
extern int arch_uprobe_exception_notify(struct notifier_block *self, unsigned long val, void *data);
extern void arch_uprobe_abort_xol(struct arch_uprobe *aup, struct pt_regs *regs);
extern unsigned long arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs *regs);
extern bool arch_uretprobe_is_alive(struct return_instance *ret, enum rp_check ctx, struct pt_regs *regs);
extern bool arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs *regs);
extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
void *src, unsigned long len);
......
......@@ -330,7 +330,8 @@ struct perf_event_attr {
mmap2 : 1, /* include mmap with inode data */
comm_exec : 1, /* flag comm events that are due to an exec */
use_clockid : 1, /* use @clockid for time fields */
__reserved_1 : 38;
context_switch : 1, /* context switch data */
__reserved_1 : 37;
union {
__u32 wakeup_events; /* wakeup every n events */
......@@ -572,9 +573,11 @@ struct perf_event_mmap_page {
/*
* PERF_RECORD_MISC_MMAP_DATA and PERF_RECORD_MISC_COMM_EXEC are used on
* different events so can reuse the same bit position.
* Ditto PERF_RECORD_MISC_SWITCH_OUT.
*/
#define PERF_RECORD_MISC_MMAP_DATA (1 << 13)
#define PERF_RECORD_MISC_COMM_EXEC (1 << 13)
#define PERF_RECORD_MISC_SWITCH_OUT (1 << 13)
/*
* Indicates that the content of PERF_SAMPLE_IP points to
* the actual instruction that triggered the event. See also
......@@ -818,6 +821,32 @@ enum perf_event_type {
*/
PERF_RECORD_LOST_SAMPLES = 13,
/*
* Records a context switch in or out (flagged by
* PERF_RECORD_MISC_SWITCH_OUT). See also
* PERF_RECORD_SWITCH_CPU_WIDE.
*
* struct {
* struct perf_event_header header;
* struct sample_id sample_id;
* };
*/
PERF_RECORD_SWITCH = 14,
/*
* CPU-wide version of PERF_RECORD_SWITCH with next_prev_pid and
* next_prev_tid that are the next (switching out) or previous
* (switching in) pid/tid.
*
* struct {
* struct perf_event_header header;
* u32 next_prev_pid;
* u32 next_prev_tid;
* struct sample_id sample_id;
* };
*/
PERF_RECORD_SWITCH_CPU_WIDE = 15,
PERF_RECORD_MAX, /* non-ABI */
};
......@@ -922,6 +951,7 @@ union perf_mem_data_src {
*
* in_tx: running in a hardware transaction
* abort: aborting a hardware transaction
* cycles: cycles from last branch (or 0 if not supported)
*/
struct perf_branch_entry {
__u64 from;
......@@ -930,7 +960,8 @@ struct perf_branch_entry {
predicted:1,/* target predicted */
in_tx:1, /* in transaction */
abort:1, /* transaction abort */
reserved:60;
cycles:16, /* cycle count to last branch */
reserved:44;
};
#endif /* _UAPI_LINUX_PERF_EVENT_H */
......@@ -163,6 +163,7 @@ static atomic_t nr_mmap_events __read_mostly;
static atomic_t nr_comm_events __read_mostly;
static atomic_t nr_task_events __read_mostly;
static atomic_t nr_freq_events __read_mostly;
static atomic_t nr_switch_events __read_mostly;
static LIST_HEAD(pmus);
static DEFINE_MUTEX(pmus_lock);
......@@ -2619,6 +2620,9 @@ static void perf_pmu_sched_task(struct task_struct *prev,
local_irq_restore(flags);
}
static void perf_event_switch(struct task_struct *task,
struct task_struct *next_prev, bool sched_in);
#define for_each_task_context_nr(ctxn) \
for ((ctxn) = 0; (ctxn) < perf_nr_task_contexts; (ctxn)++)
......@@ -2641,6 +2645,9 @@ void __perf_event_task_sched_out(struct task_struct *task,
if (__this_cpu_read(perf_sched_cb_usages))
perf_pmu_sched_task(task, next, false);
if (atomic_read(&nr_switch_events))
perf_event_switch(task, next, false);
for_each_task_context_nr(ctxn)
perf_event_context_sched_out(task, ctxn, next);
......@@ -2831,6 +2838,9 @@ void __perf_event_task_sched_in(struct task_struct *prev,
if (atomic_read(this_cpu_ptr(&perf_cgroup_events)))
perf_cgroup_sched_in(prev, task);
if (atomic_read(&nr_switch_events))
perf_event_switch(task, prev, true);
if (__this_cpu_read(perf_sched_cb_usages))
perf_pmu_sched_task(prev, task, true);
}
......@@ -3454,6 +3464,10 @@ static void unaccount_event(struct perf_event *event)
atomic_dec(&nr_task_events);
if (event->attr.freq)
atomic_dec(&nr_freq_events);
if (event->attr.context_switch) {
static_key_slow_dec_deferred(&perf_sched_events);
atomic_dec(&nr_switch_events);
}
if (is_cgroup_event(event))
static_key_slow_dec_deferred(&perf_sched_events);
if (has_branch_stack(event))
......@@ -6024,6 +6038,91 @@ void perf_log_lost_samples(struct perf_event *event, u64 lost)
perf_output_end(&handle);
}
/*
* context_switch tracking
*/
struct perf_switch_event {
struct task_struct *task;
struct task_struct *next_prev;
struct {
struct perf_event_header header;
u32 next_prev_pid;
u32 next_prev_tid;
} event_id;
};
static int perf_event_switch_match(struct perf_event *event)
{
return event->attr.context_switch;
}
static void perf_event_switch_output(struct perf_event *event, void *data)
{
struct perf_switch_event *se = data;
struct perf_output_handle handle;
struct perf_sample_data sample;
int ret;
if (!perf_event_switch_match(event))
return;
/* Only CPU-wide events are allowed to see next/prev pid/tid */
if (event->ctx->task) {
se->event_id.header.type = PERF_RECORD_SWITCH;
se->event_id.header.size = sizeof(se->event_id.header);
} else {
se->event_id.header.type = PERF_RECORD_SWITCH_CPU_WIDE;
se->event_id.header.size = sizeof(se->event_id);
se->event_id.next_prev_pid =
perf_event_pid(event, se->next_prev);
se->event_id.next_prev_tid =
perf_event_tid(event, se->next_prev);
}
perf_event_header__init_id(&se->event_id.header, &sample, event);
ret = perf_output_begin(&handle, event, se->event_id.header.size);
if (ret)
return;
if (event->ctx->task)
perf_output_put(&handle, se->event_id.header);
else
perf_output_put(&handle, se->event_id);
perf_event__output_id_sample(event, &handle, &sample);
perf_output_end(&handle);
}
static void perf_event_switch(struct task_struct *task,
struct task_struct *next_prev, bool sched_in)
{
struct perf_switch_event switch_event;
/* N.B. caller checks nr_switch_events != 0 */
switch_event = (struct perf_switch_event){
.task = task,
.next_prev = next_prev,
.event_id = {
.header = {
/* .type */
.misc = sched_in ? 0 : PERF_RECORD_MISC_SWITCH_OUT,
/* .size */
},
/* .next_prev_pid */
/* .next_prev_tid */
},
};
perf_event_aux(perf_event_switch_output,
&switch_event,
NULL);
}
/*
* IRQ throttle logging
*/
......@@ -6083,8 +6182,6 @@ static void perf_log_itrace_start(struct perf_event *event)
event->hw.itrace_started)
return;
event->hw.itrace_started = 1;
rec.header.type = PERF_RECORD_ITRACE_START;
rec.header.misc = 0;
rec.header.size = sizeof(rec);
......@@ -6792,8 +6889,8 @@ static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd)
if (event->tp_event->prog)
return -EEXIST;
if (!(event->tp_event->flags & TRACE_EVENT_FL_KPROBE))
/* bpf programs can only be attached to kprobes */
if (!(event->tp_event->flags & TRACE_EVENT_FL_UKPROBE))
/* bpf programs can only be attached to u/kprobes */
return -EINVAL;
prog = bpf_prog_get(prog_fd);
......@@ -7522,6 +7619,10 @@ static void account_event(struct perf_event *event)
if (atomic_inc_return(&nr_freq_events) == 1)
tick_nohz_full_kick_all();
}
if (event->attr.context_switch) {
atomic_inc(&nr_switch_events);
static_key_slow_inc(&perf_sched_events.key);
}
if (has_branch_stack(event))
static_key_slow_inc(&perf_sched_events.key);
if (is_cgroup_event(event))
......
......@@ -437,7 +437,10 @@ static struct page *rb_alloc_aux_page(int node, int order)
if (page && order) {
/*
* Communicate the allocation size to the driver
* Communicate the allocation size to the driver:
* if we managed to secure a high-order allocation,
* set its first page's private to this order;
* !PagePrivate(page) means it's just a normal page.
*/
split_page(page, order);
SetPagePrivate(page);
......
This diff is collapsed.
......@@ -1332,7 +1332,7 @@ bool __weak arch_within_kprobe_blacklist(unsigned long addr)
addr < (unsigned long)__kprobes_text_end;
}
static bool within_kprobe_blacklist(unsigned long addr)
bool within_kprobe_blacklist(unsigned long addr)
{
struct kprobe_blacklist_entry *ent;
......
......@@ -434,7 +434,7 @@ config UPROBE_EVENT
config BPF_EVENTS
depends on BPF_SYSCALL
depends on KPROBE_EVENT
depends on KPROBE_EVENT || UPROBE_EVENT
bool
default y
help
......
......@@ -601,7 +601,22 @@ static int probes_seq_show(struct seq_file *m, void *v)
seq_printf(m, "%c:%s/%s", c, tu->tp.call.class->system,
trace_event_name(&tu->tp.call));
seq_printf(m, " %s:0x%p", tu->filename, (void *)tu->offset);
seq_printf(m, " %s:", tu->filename);
/* Don't print "0x (null)" when offset is 0 */
if (tu->offset) {
seq_printf(m, "0x%p", (void *)tu->offset);
} else {
switch (sizeof(void *)) {
case 4:
seq_printf(m, "0x00000000");
break;
case 8:
default:
seq_printf(m, "0x0000000000000000");
break;
}
}
for (i = 0; i < tu->tp.nr_args; i++)
seq_printf(m, " %s=%s", tu->tp.args[i].name, tu->tp.args[i].comm);
......@@ -1095,11 +1110,15 @@ static void __uprobe_perf_func(struct trace_uprobe *tu,
{
struct trace_event_call *call = &tu->tp.call;
struct uprobe_trace_entry_head *entry;
struct bpf_prog *prog = call->prog;
struct hlist_head *head;
void *data;
int size, esize;
int rctx;
if (prog && !trace_call_bpf(prog, regs))
return;
esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
size = esize + tu->tp.size + dsize;
......@@ -1289,6 +1308,7 @@ static int register_uprobe_event(struct trace_uprobe *tu)
return -ENODEV;
}
call->flags = TRACE_EVENT_FL_UPROBE;
call->class->reg = trace_uprobe_register;
call->data = tu;
ret = trace_add_event_call(call);
......
......@@ -66,6 +66,7 @@ To follow the above example, the user provides following 'Build' files:
ex/Build:
ex-y += a.o
ex-y += b.o
ex-y += b.o # duplicates in the lists are allowed
libex-y += c.o
libex-y += d.o
......
......@@ -57,11 +57,13 @@ quiet_cmd_cc_i_c = CPP $@
quiet_cmd_cc_s_c = AS $@
cmd_cc_s_c = $(CC) $(c_flags) -S -o $@ $<
quiet_cmd_gen = GEN $@
# Link agregate command
# If there's nothing to link, create empty $@ object.
quiet_cmd_ld_multi = LD $@
cmd_ld_multi = $(if $(strip $(obj-y)),\
$(LD) -r -o $@ $(obj-y),rm -f $@; $(AR) rcs $@)
$(LD) -r -o $@ $(filter $(obj-y),$^),rm -f $@; $(AR) rcs $@)
# Build rules
$(OUTPUT)%.o: %.c FORCE
......
......@@ -33,7 +33,8 @@ FILES= \
test-compile-32.bin \
test-compile-x32.bin \
test-zlib.bin \
test-lzma.bin
test-lzma.bin \
test-bpf.bin
CC := $(CROSS_COMPILE)gcc -MD
PKG_CONFIG := $(CROSS_COMPILE)pkg-config
......@@ -69,8 +70,13 @@ test-libelf.bin:
test-glibc.bin:
$(BUILD)
DWARFLIBS := -ldw
ifeq ($(findstring -static,${LDFLAGS}),-static)
DWARFLIBS += -lelf -lebl -lz -llzma -lbz2
endif
test-dwarf.bin:
$(BUILD) -ldw
$(BUILD) $(DWARFLIBS)
test-libelf-mmap.bin:
$(BUILD) -lelf
......@@ -156,6 +162,9 @@ test-zlib.bin:
test-lzma.bin:
$(BUILD) -llzma
test-bpf.bin:
$(BUILD)
-include *.d
###############################
......
#include <linux/bpf.h>
int main(void)
{
union bpf_attr attr;
attr.prog_type = BPF_PROG_TYPE_KPROBE;
attr.insn_cnt = 0;
attr.insns = 0;
attr.license = 0;
attr.log_buf = 0;
attr.log_size = 0;
attr.log_level = 0;
attr.kern_version = 0;
attr = attr;
return 0;
}
#include <stdlib.h>
#if !defined(__UCLIBC__)
#include <gnu/libc-version.h>
#else
#define XSTR(s) STR(s)
#define STR(s) #s
#endif
int main(void)
{
#if !defined(__UCLIBC__)
const char *version = gnu_get_libc_version();
#else
const char *version = XSTR(__GLIBC__) "." XSTR(__GLIBC_MINOR__);
#endif
return (long)version;
}
ex-y += ex.o
ex-y += a.o
ex-y += b.o
ex-y += b.o
ex-y += empty/
ex-y += empty2/
......
......@@ -12,6 +12,7 @@
#include <linux/kernel.h>
#include "debugfs.h"
#include "tracefs.h"
#ifndef DEBUGFS_DEFAULT_PATH
#define DEBUGFS_DEFAULT_PATH "/sys/kernel/debug"
......@@ -94,11 +95,21 @@ int debugfs__strerror_open(int err, char *buf, size_t size, const char *filename
"Hint:\tIs the debugfs filesystem mounted?\n"
"Hint:\tTry 'sudo mount -t debugfs nodev /sys/kernel/debug'");
break;
case EACCES:
case EACCES: {
const char *mountpoint = debugfs_mountpoint;
if (!access(debugfs_mountpoint, R_OK) && strncmp(filename, "tracing/", 8) == 0) {
const char *tracefs_mntpoint = tracefs_find_mountpoint();
if (tracefs_mntpoint)
mountpoint = tracefs_mntpoint;
}
snprintf(buf, size,
"Error:\tNo permissions to read %s/%s\n"
"Hint:\tTry 'sudo mount -o remount,mode=755 %s'\n",
debugfs_mountpoint, filename, debugfs_mountpoint);
debugfs_mountpoint, filename, mountpoint);
}
break;
default:
snprintf(buf, size, "%s", strerror_r(err, sbuf, sizeof(sbuf)));
......
libbpf_version.h
FEATURE-DUMP
libbpf-y := libbpf.o bpf.o
# Most of this file is copied from tools/lib/traceevent/Makefile
BPF_VERSION = 0
BPF_PATCHLEVEL = 0
BPF_EXTRAVERSION = 1
MAKEFLAGS += --no-print-directory
# Makefiles suck: This macro sets a default value of $(2) for the
# variable named by $(1), unless the variable has been set by
# environment or command line. This is necessary for CC and AR
# because make sets default values, so the simpler ?= approach
# won't work as expected.
define allow-override
$(if $(or $(findstring environment,$(origin $(1))),\
$(findstring command line,$(origin $(1)))),,\
$(eval $(1) = $(2)))
endef
# Allow setting CC and AR, or setting CROSS_COMPILE as a prefix.
$(call allow-override,CC,$(CROSS_COMPILE)gcc)
$(call allow-override,AR,$(CROSS_COMPILE)ar)
INSTALL = install
# Use DESTDIR for installing into a different root directory.
# This is useful for building a package. The program will be
# installed in this directory as if it was the root directory.
# Then the build tool can move it later.
DESTDIR ?=
DESTDIR_SQ = '$(subst ','\'',$(DESTDIR))'
LP64 := $(shell echo __LP64__ | ${CC} ${CFLAGS} -E -x c - | tail -n 1)
ifeq ($(LP64), 1)
libdir_relative = lib64
else
libdir_relative = lib
endif
prefix ?= /usr/local
libdir = $(prefix)/$(libdir_relative)
man_dir = $(prefix)/share/man
man_dir_SQ = '$(subst ','\'',$(man_dir))'
export man_dir man_dir_SQ INSTALL
export DESTDIR DESTDIR_SQ
include ../../scripts/Makefile.include
# copy a bit from Linux kbuild
ifeq ("$(origin V)", "command line")
VERBOSE = $(V)
endif
ifndef VERBOSE
VERBOSE = 0
endif
ifeq ($(srctree),)
srctree := $(patsubst %/,%,$(dir $(shell pwd)))
srctree := $(patsubst %/,%,$(dir $(srctree)))
srctree := $(patsubst %/,%,$(dir $(srctree)))
#$(info Determined 'srctree' to be $(srctree))
endif
FEATURE_DISPLAY = libelf libelf-getphdrnum libelf-mmap bpf
FEATURE_TESTS = libelf bpf
INCLUDES = -I. -I$(srctree)/tools/include -I$(srctree)/arch/$(ARCH)/include/uapi -I$(srctree)/include/uapi
FEATURE_CHECK_CFLAGS-bpf = $(INCLUDES)
include $(srctree)/tools/build/Makefile.feature
export prefix libdir src obj
# Shell quotes
libdir_SQ = $(subst ','\'',$(libdir))
libdir_relative_SQ = $(subst ','\'',$(libdir_relative))
plugin_dir_SQ = $(subst ','\'',$(plugin_dir))
LIB_FILE = libbpf.a libbpf.so
VERSION = $(BPF_VERSION)
PATCHLEVEL = $(BPF_PATCHLEVEL)
EXTRAVERSION = $(BPF_EXTRAVERSION)
OBJ = $@
N =
LIBBPF_VERSION = $(BPF_VERSION).$(BPF_PATCHLEVEL).$(BPF_EXTRAVERSION)
# Set compile option CFLAGS
ifdef EXTRA_CFLAGS
CFLAGS := $(EXTRA_CFLAGS)
else
CFLAGS := -g -Wall
endif
ifeq ($(feature-libelf-mmap), 1)
override CFLAGS += -DHAVE_LIBELF_MMAP_SUPPORT
endif
ifeq ($(feature-libelf-getphdrnum), 1)
override CFLAGS += -DHAVE_ELF_GETPHDRNUM_SUPPORT
endif
# Append required CFLAGS
override CFLAGS += $(EXTRA_WARNINGS)
override CFLAGS += -Werror -Wall
override CFLAGS += -fPIC
override CFLAGS += $(INCLUDES)
ifeq ($(VERBOSE),1)
Q =
else
Q = @
endif
# Disable command line variables (CFLAGS) overide from top
# level Makefile (perf), otherwise build Makefile will get
# the same command line setup.
MAKEOVERRIDES=
export srctree OUTPUT CC LD CFLAGS V
build := -f $(srctree)/tools/build/Makefile.build dir=. obj
BPF_IN := $(OUTPUT)libbpf-in.o
LIB_FILE := $(addprefix $(OUTPUT),$(LIB_FILE))
CMD_TARGETS = $(LIB_FILE)
TARGETS = $(CMD_TARGETS)
all: $(VERSION_FILES) all_cmd
all_cmd: $(CMD_TARGETS)
$(BPF_IN): force elfdep bpfdep
$(Q)$(MAKE) $(build)=libbpf
$(OUTPUT)libbpf.so: $(BPF_IN)
$(QUIET_LINK)$(CC) --shared $^ -o $@
$(OUTPUT)libbpf.a: $(BPF_IN)
$(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^
define update_dir
(echo $1 > $@.tmp; \
if [ -r $@ ] && cmp -s $@ $@.tmp; then \
rm -f $@.tmp; \
else \
echo ' UPDATE $@'; \
mv -f $@.tmp $@; \
fi);
endef
define do_install
if [ ! -d '$(DESTDIR_SQ)$2' ]; then \
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$2'; \
fi; \
$(INSTALL) $1 '$(DESTDIR_SQ)$2'
endef
install_lib: all_cmd
$(call QUIET_INSTALL, $(LIB_FILE)) \
$(call do_install,$(LIB_FILE),$(libdir_SQ))
install: install_lib
### Cleaning rules
config-clean:
$(call QUIET_CLEAN, config)
$(Q)$(MAKE) -C $(srctree)/tools/build/feature/ clean >/dev/null
clean:
$(call QUIET_CLEAN, libbpf) $(RM) *.o *~ $(TARGETS) *.a *.so $(VERSION_FILES) .*.d \
$(RM) LIBBPF-CFLAGS
$(call QUIET_CLEAN, core-gen) $(RM) $(OUTPUT)FEATURE-DUMP
PHONY += force elfdep bpfdep
force:
elfdep:
@if [ "$(feature-libelf)" != "1" ]; then echo "No libelf found"; exit -1 ; fi
bpfdep:
@if [ "$(feature-bpf)" != "1" ]; then echo "BPF API too old"; exit -1 ; fi
# Declare the contents of the .PHONY variable as phony. We keep that
# information in a variable so we can use it in if_changed and friends.
.PHONY: $(PHONY)
/*
* common eBPF ELF operations.
*
* Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org>
* Copyright (C) 2015 Wang Nan <wangnan0@huawei.com>
* Copyright (C) 2015 Huawei Inc.
*/
#include <stdlib.h>
#include <memory.h>
#include <unistd.h>
#include <asm/unistd.h>
#include <linux/bpf.h>
#include "bpf.h"
/*
* When building perf, unistd.h is override. Define __NR_bpf is
* required to be defined.
*/
#ifndef __NR_bpf
# if defined(__i386__)
# define __NR_bpf 357
# elif defined(__x86_64__)
# define __NR_bpf 321
# elif defined(__aarch64__)
# define __NR_bpf 280
# else
# error __NR_bpf not defined. libbpf does not support your arch.
# endif
#endif
static __u64 ptr_to_u64(void *ptr)
{
return (__u64) (unsigned long) ptr;
}
static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
unsigned int size)
{
return syscall(__NR_bpf, cmd, attr, size);
}
int bpf_create_map(enum bpf_map_type map_type, int key_size,
int value_size, int max_entries)
{
union bpf_attr attr;
memset(&attr, '\0', sizeof(attr));
attr.map_type = map_type;
attr.key_size = key_size;
attr.value_size = value_size;
attr.max_entries = max_entries;
return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
}
int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
size_t insns_cnt, char *license,
u32 kern_version, char *log_buf, size_t log_buf_sz)
{
int fd;
union bpf_attr attr;
bzero(&attr, sizeof(attr));
attr.prog_type = type;
attr.insn_cnt = (__u32)insns_cnt;
attr.insns = ptr_to_u64(insns);
attr.license = ptr_to_u64(license);
attr.log_buf = ptr_to_u64(NULL);
attr.log_size = 0;
attr.log_level = 0;
attr.kern_version = kern_version;
fd = sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
if (fd >= 0 || !log_buf || !log_buf_sz)
return fd;
/* Try again with log */
attr.log_buf = ptr_to_u64(log_buf);
attr.log_size = log_buf_sz;
attr.log_level = 1;
log_buf[0] = 0;
return sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
}
/*
* common eBPF ELF operations.
*
* Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org>
* Copyright (C) 2015 Wang Nan <wangnan0@huawei.com>
* Copyright (C) 2015 Huawei Inc.
*/
#ifndef __BPF_BPF_H
#define __BPF_BPF_H
#include <linux/bpf.h>
int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
int max_entries);
/* Recommend log buffer size */
#define BPF_LOG_BUF_SIZE 65536
int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
size_t insns_cnt, char *license,
u32 kern_version, char *log_buf,
size_t log_buf_sz);
#endif
This diff is collapsed.
/*
* Common eBPF ELF object loading operations.
*
* Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org>
* Copyright (C) 2015 Wang Nan <wangnan0@huawei.com>
* Copyright (C) 2015 Huawei Inc.
*/
#ifndef __BPF_LIBBPF_H
#define __BPF_LIBBPF_H
#include <stdio.h>
#include <stdbool.h>
/*
* In include/linux/compiler-gcc.h, __printf is defined. However
* it should be better if libbpf.h doesn't depend on Linux header file.
* So instead of __printf, here we use gcc attribute directly.
*/
typedef int (*libbpf_print_fn_t)(const char *, ...)
__attribute__((format(printf, 1, 2)));
void libbpf_set_print(libbpf_print_fn_t warn,
libbpf_print_fn_t info,
libbpf_print_fn_t debug);
/* Hide internal to user */
struct bpf_object;
struct bpf_object *bpf_object__open(const char *path);
struct bpf_object *bpf_object__open_buffer(void *obj_buf,
size_t obj_buf_sz);
void bpf_object__close(struct bpf_object *object);
/* Load/unload object into/from kernel */
int bpf_object__load(struct bpf_object *obj);
int bpf_object__unload(struct bpf_object *obj);
struct bpf_object *bpf_object__next(struct bpf_object *prev);
#define bpf_object__for_each_safe(pos, tmp) \
for ((pos) = bpf_object__next(NULL), \
(tmp) = bpf_object__next(pos); \
(pos) != NULL; \
(pos) = (tmp), (tmp) = bpf_object__next(tmp))
/* Accessors of bpf_program. */
struct bpf_program;
struct bpf_program *bpf_program__next(struct bpf_program *prog,
struct bpf_object *obj);
#define bpf_object__for_each_program(pos, obj) \
for ((pos) = bpf_program__next(NULL, (obj)); \
(pos) != NULL; \
(pos) = bpf_program__next((pos), (obj)))
typedef void (*bpf_program_clear_priv_t)(struct bpf_program *,
void *);
int bpf_program__set_private(struct bpf_program *prog, void *priv,
bpf_program_clear_priv_t clear_priv);
int bpf_program__get_private(struct bpf_program *prog,
void **ppriv);
const char *bpf_program__title(struct bpf_program *prog, bool dup);
int bpf_program__fd(struct bpf_program *prog);
/*
* We don't need __attribute__((packed)) now since it is
* unnecessary for 'bpf_map_def' because they are all aligned.
* In addition, using it will trigger -Wpacked warning message,
* and will be treated as an error due to -Werror.
*/
struct bpf_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
unsigned int max_entries;
};
#endif
......@@ -418,7 +418,7 @@ static int func_map_init(struct pevent *pevent)
}
static struct func_map *
find_func(struct pevent *pevent, unsigned long long addr)
__find_func(struct pevent *pevent, unsigned long long addr)
{
struct func_map *func;
struct func_map key;
......@@ -434,6 +434,71 @@ find_func(struct pevent *pevent, unsigned long long addr)
return func;
}
struct func_resolver {
pevent_func_resolver_t *func;
void *priv;
struct func_map map;
};
/**
* pevent_set_function_resolver - set an alternative function resolver
* @pevent: handle for the pevent
* @resolver: function to be used
* @priv: resolver function private state.
*
* Some tools may have already a way to resolve kernel functions, allow them to
* keep using it instead of duplicating all the entries inside
* pevent->funclist.
*/
int pevent_set_function_resolver(struct pevent *pevent,
pevent_func_resolver_t *func, void *priv)
{
struct func_resolver *resolver = malloc(sizeof(*resolver));
if (resolver == NULL)
return -1;
resolver->func = func;
resolver->priv = priv;
free(pevent->func_resolver);
pevent->func_resolver = resolver;
return 0;
}
/**
* pevent_reset_function_resolver - reset alternative function resolver
* @pevent: handle for the pevent
*
* Stop using whatever alternative resolver was set, use the default
* one instead.
*/
void pevent_reset_function_resolver(struct pevent *pevent)
{
free(pevent->func_resolver);
pevent->func_resolver = NULL;
}
static struct func_map *
find_func(struct pevent *pevent, unsigned long long addr)
{
struct func_map *map;
if (!pevent->func_resolver)
return __find_func(pevent, addr);
map = &pevent->func_resolver->map;
map->mod = NULL;
map->addr = addr;
map->func = pevent->func_resolver->func(pevent->func_resolver->priv,
&map->addr, &map->mod);
if (map->func == NULL)
return NULL;
return map;
}
/**
* pevent_find_function - find a function by a given address
* @pevent: handle for the pevent
......@@ -1680,6 +1745,9 @@ process_cond(struct event_format *event, struct print_arg *top, char **tok)
type = process_arg(event, left, &token);
again:
if (type == EVENT_ERROR)
goto out_free;
/* Handle other operations in the arguments */
if (type == EVENT_OP && strcmp(token, ":") != 0) {
type = process_op(event, left, &token);
......@@ -1939,6 +2007,12 @@ process_op(struct event_format *event, struct print_arg *arg, char **tok)
goto out_warn_free;
type = process_arg_token(event, right, tok, type);
if (type == EVENT_ERROR) {
free_arg(right);
/* token was freed in process_arg_token() via *tok */
token = NULL;
goto out_free;
}
if (right->type == PRINT_OP &&
get_op_prio(arg->op.op) < get_op_prio(right->op.op)) {
......@@ -4754,6 +4828,7 @@ static void pretty_print(struct trace_seq *s, void *data, int size, struct event
case 'z':
case 'Z':
case '0' ... '9':
case '-':
goto cont_process;
case 'p':
if (pevent->long_size == 4)
......@@ -6564,6 +6639,7 @@ void pevent_free(struct pevent *pevent)
free(pevent->trace_clock);
free(pevent->events);
free(pevent->sort_events);
free(pevent->func_resolver);
free(pevent);
}
......
......@@ -453,6 +453,10 @@ struct cmdline_list;
struct func_map;
struct func_list;
struct event_handler;
struct func_resolver;
typedef char *(pevent_func_resolver_t)(void *priv,
unsigned long long *addrp, char **modp);
struct pevent {
int ref_count;
......@@ -481,6 +485,7 @@ struct pevent {
int cmdline_count;
struct func_map *func_map;
struct func_resolver *func_resolver;
struct func_list *funclist;
unsigned int func_count;
......@@ -611,6 +616,9 @@ enum trace_flag_type {
TRACE_FLAG_SOFTIRQ = 0x10,
};
int pevent_set_function_resolver(struct pevent *pevent,
pevent_func_resolver_t *func, void *priv);
void pevent_reset_function_resolver(struct pevent *pevent);
int pevent_register_comm(struct pevent *pevent, const char *comm, int pid);
int pevent_register_trace_clock(struct pevent *pevent, const char *trace_clock);
int pevent_register_function(struct pevent *pevent, char *name,
......
......@@ -29,3 +29,4 @@ config.mak.autogen
*.pyc
*.pyo
.config-detected
util/intel-pt-decoder/inat-tables.c
......@@ -35,6 +35,7 @@ paths += -DPERF_MAN_PATH="BUILD_STR($(mandir_SQ))"
CFLAGS_builtin-help.o += $(paths)
CFLAGS_builtin-timechart.o += $(paths)
CFLAGS_perf.o += -DPERF_HTML_PATH="BUILD_STR($(htmldir_SQ))" -include $(OUTPUT)PERF-VERSION-FILE
CFLAGS_builtin-trace.o += -DSTRACE_GROUPS_DIR="BUILD_STR($(STRACE_GROUPS_DIR_SQ))"
libperf-y += util/
libperf-y += arch/
......
Intel Branch Trace Store
========================
Overview
========
Intel BTS could be regarded as a predecessor to Intel PT and has some
similarities because it can also identify every branch a program takes. A
notable difference is that Intel BTS has no timing information and as a
consequence the present implementation is limited to per-thread recording.
While decoding Intel BTS does not require walking the object code, the object
code is still needed to pair up calls and returns correctly, consequently much
of the Intel PT documentation applies also to Intel BTS. Refer to the Intel PT
documentation and consider that the PMU 'intel_bts' can usually be used in
place of 'intel_pt' in the examples provided, with the proviso that per-thread
recording must also be stipulated i.e. the --per-thread option for
'perf record'.
perf record
===========
new event
---------
The Intel BTS kernel driver creates a new PMU for Intel BTS. The perf record
option is:
-e intel_bts//
Currently Intel BTS is limited to per-thread tracing so the --per-thread option
is also needed.
snapshot option
---------------
The snapshot option is the same as Intel PT (refer Intel PT documentation).
auxtrace mmap size option
-----------------------
The mmap size option is the same as Intel PT (refer Intel PT documentation).
perf script
===========
By default, perf script will decode trace data found in the perf.data file.
This can be further controlled by option --itrace. The --itrace option is
the same as Intel PT (refer Intel PT documentation) except that neither
"instructions" events nor "transactions" events (and consequently call
chains) are supported.
To disable trace decoding entirely, use the option --no-itrace.
dump option
-----------
perf script has an option (-D) to "dump" the events i.e. display the binary
data.
When -D is used, Intel BTS packets are displayed.
To disable the display of Intel BTS packets, combine the -D option with
--no-itrace.
perf report
===========
By default, perf report will decode trace data found in the perf.data file.
This can be further controlled by new option --itrace exactly the same as
perf script.
perf inject
===========
perf inject also accepts the --itrace option in which case tracing data is
removed and replaced with the synthesized events. e.g.
perf inject --itrace -i perf.data -o perf.data.new
This diff is collapsed.
i synthesize instructions events
b synthesize branches events
c synthesize branches events (calls only)
r synthesize branches events (returns only)
x synthesize transactions events
e synthesize error events
d create a debug log
g synthesize a call chain (use with i or x)
The default is all events i.e. the same as --itrace=ibxe
In addition, the period (default 100000) for instructions events
can be specified in units of:
i instructions
t ticks
ms milliseconds
us microseconds
ns nanoseconds (default)
Also the call chain size (default 16, max. 1024) for instructions or
transactions events can be specified.
......@@ -216,6 +216,10 @@ Suite for evaluating parallel wake calls.
*requeue*::
Suite for evaluating requeue calls.
*lock-pi*::
Suite for evaluating futex lock_pi calls.
SEE ALSO
--------
linkperf:perf[1]
......@@ -48,28 +48,7 @@ OPTIONS
Decode Instruction Tracing data, replacing it with synthesized events.
Options are:
i synthesize instructions events
b synthesize branches events
c synthesize branches events (calls only)
r synthesize branches events (returns only)
x synthesize transactions events
e synthesize error events
d create a debug log
g synthesize a call chain (use with i or x)
The default is all events i.e. the same as --itrace=ibxe
In addition, the period (default 100000) for instructions events
can be specified in units of:
i instructions
t ticks
ms milliseconds
us microseconds
ns nanoseconds (default)
Also the call chain size (default 16, max. 1024) for instructions or
transactions events can be specified.
include::itrace.txt[]
SEE ALSO
--------
......
......@@ -45,6 +45,21 @@ OPTIONS
param1 and param2 are defined as formats for the PMU in:
/sys/bus/event_sources/devices/<pmu>/format/*
There are also some params which are not defined in .../<pmu>/format/*.
These params can be used to overload default config values per event.
Here is a list of the params.
- 'period': Set event sampling period
- 'freq': Set event sampling frequency
- 'time': Disable/enable time stamping. Acceptable values are 1 for
enabling time stamping. 0 for disabling time stamping.
The default is 1.
- 'call-graph': Disable/enable callgraph. Acceptable str are "fp" for
FP mode, "dwarf" for DWARF mode, "lbr" for LBR mode and
"no" for disable callgraph.
- 'stack-size': user stack size for dwarf mode
Note: If user explicitly sets options which conflict with the params,
the value set by the params will be overridden.
- a hardware breakpoint event in the form of '\mem:addr[/len][:access]'
where addr is the address in memory you want to break in.
Access is the memory access type (read, write, execute) it can
......@@ -61,7 +76,16 @@ OPTIONS
"perf report" to view group events together.
--filter=<filter>::
Event filter.
Event filter. This option should follow a event selector (-e) which
selects tracepoint event(s). Multiple '--filter' options are combined
using '&&'.
--exclude-perf::
Don't record events issued by perf itself. This option should follow
a event selector (-e) which selects tracepoint event(s). It adds a
filter expression 'common_pid != $PERFPID' to filters. If other
'--filter' exists, the new filter expression will be combined with
them by '&&'.
-a::
--all-cpus::
......@@ -276,6 +300,10 @@ When processing pre-existing threads /proc/XXX/mmap, it may take a long time,
because the file may be huge. A time out is needed in such cases.
This option sets the time out limit. The default value is 500 ms.
--switch-events::
Record context switch events i.e. events of type PERF_RECORD_SWITCH or
PERF_RECORD_SWITCH_CPU_WIDE.
SEE ALSO
--------
linkperf:perf-stat[1], linkperf:perf-list[1]
......@@ -81,6 +81,8 @@ OPTIONS
- cpu: cpu number the task ran at the time of sample
- srcline: filename and line number executed at the time of sample. The
DWARF debugging info must be provided.
- srcfile: file name of the source file of the same. Requires dwarf
information.
- weight: Event specific weight, e.g. memory latency or transaction
abort cost. This is the global weight.
- local_weight: Local weight version of the weight above.
......@@ -109,6 +111,7 @@ OPTIONS
- mispredict: "N" for predicted branch, "Y" for mispredicted branch
- in_tx: branch in TSX transaction
- abort: TSX transaction abort.
- cycles: Cycles in basic block
And default sort keys are changed to comm, dso_from, symbol_from, dso_to
and symbol_to, see '--branch-stack'.
......@@ -328,31 +331,23 @@ OPTIONS
--itrace::
Options for decoding instruction tracing data. The options are:
i synthesize instructions events
b synthesize branches events
c synthesize branches events (calls only)
r synthesize branches events (returns only)
x synthesize transactions events
e synthesize error events
d create a debug log
g synthesize a call chain (use with i or x)
The default is all events i.e. the same as --itrace=ibxe
In addition, the period (default 100000) for instructions events
can be specified in units of:
i instructions
t ticks
ms milliseconds
us microseconds
ns nanoseconds (default)
Also the call chain size (default 16, max. 1024) for instructions or
transactions events can be specified.
include::itrace.txt[]
To disable decoding entirely, use --no-itrace.
--full-source-path::
Show the full path for source files for srcline output.
--show-ref-call-graph::
When multiple events are sampled, it may not be needed to collect
callgraphs for all of them. The sample sites are usually nearby,
and it's enough to collect the callgraphs on a reference event.
So user can use "call-graph=no" event modifier to disable callgraph
for other events to reduce the overhead.
However, perf report cannot show callgraphs for the event which
disable the callgraph.
This option extends the perf report to show reference callgraphs,
which collected by reference event, in no callgraph event.
include::callchain-overhead-calculation.txt[]
......
......@@ -222,6 +222,17 @@ OPTIONS
--show-mmap-events
Display mmap related events (e.g. MMAP, MMAP2).
--show-switch-events
Display context switch events i.e. events of type PERF_RECORD_SWITCH or
PERF_RECORD_SWITCH_CPU_WIDE.
--demangle::
Demangle symbol names to human readable form. It's enabled by default,
disable with --no-demangle.
--demangle-kernel::
Demangle kernel symbol names to human readable form (for C++ kernels).
--header
Show perf.data header.
......@@ -231,31 +242,13 @@ OPTIONS
--itrace::
Options for decoding instruction tracing data. The options are:
i synthesize instructions events
b synthesize branches events
c synthesize branches events (calls only)
r synthesize branches events (returns only)
x synthesize transactions events
e synthesize error events
d create a debug log
g synthesize a call chain (use with i or x)
The default is all events i.e. the same as --itrace=ibxe
In addition, the period (default 100000) for instructions events
can be specified in units of:
i instructions
t ticks
ms milliseconds
us microseconds
ns nanoseconds (default)
Also the call chain size (default 16, max. 1024) for instructions or
transactions events can be specified.
include::itrace.txt[]
To disable decoding entirely, use --no-itrace.
--full-source-path::
Show the full path for source files for srcline output.
SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-script-perl[1],
......
......@@ -208,6 +208,27 @@ Default is to monitor all CPUS.
This option sets the time out limit. The default value is 500 ms.
-b::
--branch-any::
Enable taken branch stack sampling. Any type of taken branch may be sampled.
This is a shortcut for --branch-filter any. See --branch-filter for more infos.
-j::
--branch-filter::
Enable taken branch stack sampling. Each sample captures a series of consecutive
taken branches. The number of branches captured with each sample depends on the
underlying hardware, the type of branches of interest, and the executed code.
It is possible to select the types of branches captured by enabling filters.
For a full list of modifiers please see the perf record manpage.
The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
The privilege levels may be omitted, in which case, the privilege levels of the associated
event are applied to the branch filter. Both kernel (k) and hypervisor (hv) privilege
levels are subject to permissions. When sampling on multiple events, branch stack sampling
is enabled for all the sampling events. The sampled branch type is the same for all events.
The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k
Note that this feature may not be available on all processors.
INTERACTIVE PROMPTING KEYS
--------------------------
......
......@@ -18,6 +18,7 @@ tools/arch/x86/include/asm/atomic.h
tools/arch/x86/include/asm/rmwcc.h
tools/lib/traceevent
tools/lib/api
tools/lib/bpf
tools/lib/hweight.c
tools/lib/rbtree.c
tools/lib/symbol/kallsyms.c
......@@ -40,7 +41,6 @@ tools/include/asm-generic/bitops.h
tools/include/linux/atomic.h
tools/include/linux/bitops.h
tools/include/linux/compiler.h
tools/include/linux/export.h
tools/include/linux/hash.h
tools/include/linux/kernel.h
tools/include/linux/list.h
......
......@@ -76,6 +76,12 @@ include config/utilities.mak
#
# Define NO_AUXTRACE if you do not want AUX area tracing support
# As per kernel Makefile, avoid funny character set dependencies
unexport LC_ALL
LC_COLLATE=C
LC_NUMERIC=C
export LC_COLLATE LC_NUMERIC
ifeq ($(srctree),)
srctree := $(patsubst %/,%,$(dir $(shell pwd)))
srctree := $(patsubst %/,%,$(dir $(srctree)))
......@@ -135,6 +141,7 @@ INSTALL = install
FLEX = flex
BISON = bison
STRIP = strip
AWK = awk
LIB_DIR = $(srctree)/tools/lib/api/
TRACE_EVENT_DIR = $(srctree)/tools/lib/traceevent/
......@@ -289,7 +296,7 @@ strip: $(PROGRAMS) $(OUTPUT)perf
PERF_IN := $(OUTPUT)perf-in.o
export srctree OUTPUT RM CC LD AR CFLAGS V BISON FLEX
export srctree OUTPUT RM CC LD AR CFLAGS V BISON FLEX AWK
build := -f $(srctree)/tools/build/Makefile.build dir=. obj
$(PERF_IN): $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)common-cmds.h FORCE
......@@ -507,6 +514,11 @@ endif
$(INSTALL) $(OUTPUT)perf-archive -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)'
$(call QUIET_INSTALL, perf-with-kcore) \
$(INSTALL) $(OUTPUT)perf-with-kcore -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)'
ifndef NO_LIBAUDIT
$(call QUIET_INSTALL, strace/groups) \
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(STRACE_GROUPS_INSTDIR_SQ)'; \
$(INSTALL) trace/strace/groups/* -t '$(DESTDIR_SQ)$(STRACE_GROUPS_INSTDIR_SQ)'
endif
ifndef NO_LIBPERL
$(call QUIET_INSTALL, perl-scripts) \
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl/Perf-Trace-Util/lib/Perf/Trace'; \
......@@ -560,7 +572,8 @@ clean: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean config-clean
$(Q)find . -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
$(Q)$(RM) $(OUTPUT).config-detected
$(call QUIET_CLEAN, core-progs) $(RM) $(ALL_PROGRAMS) perf perf-read-vdso32 perf-read-vdsox32
$(call QUIET_CLEAN, core-gen) $(RM) *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)FEATURE-DUMP $(OUTPUT)util/*-bison* $(OUTPUT)util/*-flex*
$(call QUIET_CLEAN, core-gen) $(RM) *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)FEATURE-DUMP $(OUTPUT)util/*-bison* $(OUTPUT)util/*-flex* \
$(OUTPUT)util/intel-pt-decoder/inat-tables.c
$(QUIET_SUBDIR0)Documentation $(QUIET_SUBDIR1) clean
$(python-clean)
......
......@@ -128,7 +128,7 @@ static const char *normalize_arch(char *arch)
return arch;
}
static int perf_session_env__lookup_binutils_path(struct perf_session_env *env,
static int perf_session_env__lookup_binutils_path(struct perf_env *env,
const char *name,
const char **path)
{
......@@ -206,7 +206,7 @@ static int perf_session_env__lookup_binutils_path(struct perf_session_env *env,
return -1;
}
int perf_session_env__lookup_objdump(struct perf_session_env *env)
int perf_session_env__lookup_objdump(struct perf_env *env)
{
/*
* For live mode, env->arch will be NULL and we can use
......
......@@ -5,6 +5,6 @@
extern const char *objdump_path;
int perf_session_env__lookup_objdump(struct perf_session_env *env);
int perf_session_env__lookup_objdump(struct perf_env *env);
#endif /* ARCH_PERF_COMMON_H */
libperf-y += header.o
libperf-y += tsc.o
libperf-y += pmu.o
libperf-y += kvm-stat.o
libperf-$(CONFIG_DWARF) += dwarf-regs.o
libperf-$(CONFIG_LIBUNWIND) += unwind-libunwind.o
libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
libperf-$(CONFIG_AUXTRACE) += auxtrace.o
libperf-$(CONFIG_AUXTRACE) += intel-pt.o
libperf-$(CONFIG_AUXTRACE) += intel-bts.o
/*
* auxtrace.c: AUX area tracing support
* Copyright (c) 2013-2014, Intel Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
*/
#include <stdbool.h>
#include "../../util/header.h"
#include "../../util/debug.h"
#include "../../util/pmu.h"
#include "../../util/auxtrace.h"
#include "../../util/intel-pt.h"
#include "../../util/intel-bts.h"
#include "../../util/evlist.h"
static
struct auxtrace_record *auxtrace_record__init_intel(struct perf_evlist *evlist,
int *err)
{
struct perf_pmu *intel_pt_pmu;
struct perf_pmu *intel_bts_pmu;
struct perf_evsel *evsel;
bool found_pt = false;
bool found_bts = false;
intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
if (evlist) {
evlist__for_each(evlist, evsel) {
if (intel_pt_pmu &&
evsel->attr.type == intel_pt_pmu->type)
found_pt = true;
if (intel_bts_pmu &&
evsel->attr.type == intel_bts_pmu->type)
found_bts = true;
}
}
if (found_pt && found_bts) {
pr_err("intel_pt and intel_bts may not be used together\n");
*err = -EINVAL;
return NULL;
}
if (found_pt)
return intel_pt_recording_init(err);
if (found_bts)
return intel_bts_recording_init(err);
return NULL;
}
struct auxtrace_record *auxtrace_record__init(struct perf_evlist *evlist,
int *err)
{
char buffer[64];
int ret;
*err = 0;
ret = get_cpuid(buffer, sizeof(buffer));
if (ret) {
*err = ret;
return NULL;
}
if (!strncmp(buffer, "GenuineIntel,", 13))
return auxtrace_record__init_intel(evlist, err);
return NULL;
}
This diff is collapsed.
This diff is collapsed.
#include <string.h>
#include <linux/perf_event.h>
#include "../../util/intel-pt.h"
#include "../../util/intel-bts.h"
#include "../../util/pmu.h"
struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu __maybe_unused)
{
#ifdef HAVE_AUXTRACE_SUPPORT
if (!strcmp(pmu->name, INTEL_PT_PMU_NAME))
return intel_pt_pmu_default_config(pmu);
if (!strcmp(pmu->name, INTEL_BTS_PMU_NAME))
pmu->selectable = true;
#endif
return NULL;
}
......@@ -5,6 +5,7 @@ perf-y += futex-hash.o
perf-y += futex-wake.o
perf-y += futex-wake-parallel.o
perf-y += futex-requeue.o
perf-y += futex-lock-pi.o
perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-asm.o
perf-$(CONFIG_X86_64) += mem-memset-x86-64-asm.o
......
......@@ -36,6 +36,8 @@ extern int bench_futex_wake(int argc, const char **argv, const char *prefix);
extern int bench_futex_wake_parallel(int argc, const char **argv,
const char *prefix);
extern int bench_futex_requeue(int argc, const char **argv, const char *prefix);
/* pi futexes */
extern int bench_futex_lock_pi(int argc, const char **argv, const char *prefix);
#define BENCH_FORMAT_DEFAULT_STR "default"
#define BENCH_FORMAT_DEFAULT 0
......
/*
* Copyright (C) 2015 Davidlohr Bueso.
*/
#include "../perf.h"
#include "../util/util.h"
#include "../util/stat.h"
#include "../util/parse-options.h"
#include "../util/header.h"
#include "bench.h"
#include "futex.h"
#include <err.h>
#include <stdlib.h>
#include <sys/time.h>
#include <pthread.h>
struct worker {
int tid;
u_int32_t *futex;
pthread_t thread;
unsigned long ops;
};
static u_int32_t global_futex = 0;
static struct worker *worker;
static unsigned int nsecs = 10;
static bool silent = false, multi = false;
static bool done = false, fshared = false;
static unsigned int ncpus, nthreads = 0;
static int futex_flag = 0;
struct timeval start, end, runtime;
static pthread_mutex_t thread_lock;
static unsigned int threads_starting;
static struct stats throughput_stats;
static pthread_cond_t thread_parent, thread_worker;
static const struct option options[] = {
OPT_UINTEGER('t', "threads", &nthreads, "Specify amount of threads"),
OPT_UINTEGER('r', "runtime", &nsecs, "Specify runtime (in seconds)"),
OPT_BOOLEAN( 'M', "multi", &multi, "Use multiple futexes"),
OPT_BOOLEAN( 's', "silent", &silent, "Silent mode: do not display data/details"),
OPT_BOOLEAN( 'S', "shared", &fshared, "Use shared futexes instead of private ones"),
OPT_END()
};
static const char * const bench_futex_lock_pi_usage[] = {
"perf bench futex requeue <options>",
NULL
};
static void print_summary(void)
{
unsigned long avg = avg_stats(&throughput_stats);
double stddev = stddev_stats(&throughput_stats);
printf("%sAveraged %ld operations/sec (+- %.2f%%), total secs = %d\n",
!silent ? "\n" : "", avg, rel_stddev_stats(stddev, avg),
(int) runtime.tv_sec);
}
static void toggle_done(int sig __maybe_unused,
siginfo_t *info __maybe_unused,
void *uc __maybe_unused)
{
/* inform all threads that we're done for the day */
done = true;
gettimeofday(&end, NULL);
timersub(&end, &start, &runtime);
}
static void *workerfn(void *arg)
{
struct worker *w = (struct worker *) arg;
pthread_mutex_lock(&thread_lock);
threads_starting--;
if (!threads_starting)
pthread_cond_signal(&thread_parent);
pthread_cond_wait(&thread_worker, &thread_lock);
pthread_mutex_unlock(&thread_lock);
do {
int ret;
again:
ret = futex_lock_pi(w->futex, NULL, 0, futex_flag);
if (ret) { /* handle lock acquisition */
if (!silent)
warn("thread %d: Could not lock pi-lock for %p (%d)",
w->tid, w->futex, ret);
if (done)
break;
goto again;
}
usleep(1);
ret = futex_unlock_pi(w->futex, futex_flag);
if (ret && !silent)
warn("thread %d: Could not unlock pi-lock for %p (%d)",
w->tid, w->futex, ret);
w->ops++; /* account for thread's share of work */
} while (!done);
return NULL;
}
static void create_threads(struct worker *w, pthread_attr_t thread_attr)
{
cpu_set_t cpu;
unsigned int i;
threads_starting = nthreads;
for (i = 0; i < nthreads; i++) {
worker[i].tid = i;
if (multi) {
worker[i].futex = calloc(1, sizeof(u_int32_t));
if (!worker[i].futex)
err(EXIT_FAILURE, "calloc");
} else
worker[i].futex = &global_futex;
CPU_ZERO(&cpu);
CPU_SET(i % ncpus, &cpu);
if (pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpu))
err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
if (pthread_create(&w[i].thread, &thread_attr, workerfn, &worker[i]))
err(EXIT_FAILURE, "pthread_create");
}
}
int bench_futex_lock_pi(int argc, const char **argv,
const char *prefix __maybe_unused)
{
int ret = 0;
unsigned int i;
struct sigaction act;
pthread_attr_t thread_attr;
argc = parse_options(argc, argv, options, bench_futex_lock_pi_usage, 0);
if (argc)
goto err;
ncpus = sysconf(_SC_NPROCESSORS_ONLN);
sigfillset(&act.sa_mask);
act.sa_sigaction = toggle_done;
sigaction(SIGINT, &act, NULL);
if (!nthreads)
nthreads = ncpus;
worker = calloc(nthreads, sizeof(*worker));
if (!worker)
err(EXIT_FAILURE, "calloc");
if (!fshared)
futex_flag = FUTEX_PRIVATE_FLAG;
printf("Run summary [PID %d]: %d threads doing pi lock/unlock pairing for %d secs.\n\n",
getpid(), nthreads, nsecs);
init_stats(&throughput_stats);
pthread_mutex_init(&thread_lock, NULL);
pthread_cond_init(&thread_parent, NULL);
pthread_cond_init(&thread_worker, NULL);
threads_starting = nthreads;
pthread_attr_init(&thread_attr);
gettimeofday(&start, NULL);
create_threads(worker, thread_attr);
pthread_attr_destroy(&thread_attr);
pthread_mutex_lock(&thread_lock);
while (threads_starting)
pthread_cond_wait(&thread_parent, &thread_lock);
pthread_cond_broadcast(&thread_worker);
pthread_mutex_unlock(&thread_lock);
sleep(nsecs);
toggle_done(0, NULL, NULL);
for (i = 0; i < nthreads; i++) {
ret = pthread_join(worker[i].thread, NULL);
if (ret)
err(EXIT_FAILURE, "pthread_join");
}
/* cleanup & report results */
pthread_cond_destroy(&thread_parent);
pthread_cond_destroy(&thread_worker);
pthread_mutex_destroy(&thread_lock);
for (i = 0; i < nthreads; i++) {
unsigned long t = worker[i].ops/runtime.tv_sec;
update_stats(&throughput_stats, t);
if (!silent)
printf("[thread %3d] futex: %p [ %ld ops/sec ]\n",
worker[i].tid, worker[i].futex, t);
if (multi)
free(worker[i].futex);
}
print_summary();
free(worker);
return ret;
err:
usage_with_options(bench_futex_lock_pi_usage, options);
exit(EXIT_FAILURE);
}
......@@ -55,6 +55,26 @@ futex_wake(u_int32_t *uaddr, int nr_wake, int opflags)
return futex(uaddr, FUTEX_WAKE, nr_wake, NULL, NULL, 0, opflags);
}
/**
* futex_lock_pi() - block on uaddr as a PI mutex
* @detect: whether (1) or not (0) to perform deadlock detection
*/
static inline int
futex_lock_pi(u_int32_t *uaddr, struct timespec *timeout, int detect,
int opflags)
{
return futex(uaddr, FUTEX_LOCK_PI, detect, timeout, NULL, 0, opflags);
}
/**
* futex_unlock_pi() - release uaddr as a PI mutex, waking the top waiter
*/
static inline int
futex_unlock_pi(u_int32_t *uaddr, int opflags)
{
return futex(uaddr, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0, opflags);
}
/**
* futex_cmp_requeue() - requeue tasks from uaddr to uaddr2
* @nr_wake: wake up to this many tasks
......
......@@ -67,6 +67,7 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
rb_erase(&al->sym->rb_node,
&al->map->dso->symbols[al->map->type]);
symbol__delete(al->sym);
dso__reset_find_symbol_cache(al->map->dso);
}
return 0;
}
......@@ -187,6 +188,7 @@ static void hists__find_annotations(struct hists *hists,
* symbol, free he->ms.sym->src to signal we already
* processed this symbol.
*/
zfree(&notes->src->cycles_hist);
zfree(&notes->src);
}
}
......@@ -238,6 +240,8 @@ static int __cmd_annotate(struct perf_annotate *ann)
if (nr_samples > 0) {
total_nr_samples += nr_samples;
hists__collapse_resort(hists, NULL);
/* Don't sort callchain */
perf_evsel__reset_sample_bit(pos, CALLCHAIN);
hists__output_resort(hists, NULL);
if (symbol_conf.event_group &&
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment