Commit 9657752c authored by Linus Torvalds

Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:
 "Kernel side changes:

   - Add branch type profiling/tracing support. (Jin Yao)

   - Add the PERF_SAMPLE_PHYS_ADDR ABI to allow the tracing/profiling of
     physical memory addresses, where the PMU supports it. (Kan Liang)

   - Export some PMU capability details in the new
     /sys/bus/event_source/devices/cpu/caps/ sysfs directory. (Andi
     Kleen)

   - Aux data fixes and updates (Will Deacon)

   - kprobes fixes and updates (Masami Hiramatsu)

   - AMD uncore PMU driver fixes and updates (Janakarajan Natarajan)

  On the tooling side, here's a (limited!) list of highlights - there
  were many other changes that I could not list, see the shortlog and
  git history for details:

  UI improvements:

   - Implement a visual marker for fused x86 instructions in the
     annotate TUI browser, available now in 'perf report', more work
     needed to have it available as well in 'perf top' (Jin Yao)

     Further explanation from one of Jin's patches:

             │   ┌──cmpl   $0x0,argp_program_version_hook
       81.93 │   ├──je     20
             │   │  lock   cmpxchg %esi,0x38a9a4(%rip)
             │   │↓ jne    29
             │   │↓ jmp    43
       11.47 │20:└─→cmpxch %esi,0x38a999(%rip)

     That means the cmpl+je is a fused instruction pair and they should
     be considered together.

   - Record the branch type and then show statistics and info about it in
     callchain entries (Jin Yao)

     Example from one of Jin's patches:

        # perf record -g -j any,save_type
        # perf report --branch-history --stdio --no-children

        38.50%  div.c:45                [.] main                    div
                |
                ---main div.c:42 (RET CROSS_2M cycles:2)
                   compute_flag div.c:28 (cycles:2)
                   compute_flag div.c:27 (RET CROSS_2M cycles:1)
                   rand rand.c:28 (cycles:1)
                   rand rand.c:28 (RET CROSS_2M cycles:1)
                   __random random.c:298 (cycles:1)
                   __random random.c:297 (COND_BWD CROSS_2M cycles:1)
                   __random random.c:295 (cycles:1)
                   __random random.c:295 (COND_BWD CROSS_2M cycles:1)
                   __random random.c:295 (cycles:1)
                   __random random.c:295 (RET CROSS_2M cycles:9)

  namespaces support:

   - Add initial support for namespaces, using setns to access files in
     namespaces, grabbing their build-ids, etc. (Krister Johansen)

  perf trace enhancements:

   - Beautify pkey_{alloc,free,mprotect} arguments in 'perf trace'
     (Arnaldo Carvalho de Melo)

   - Add initial 'clone' syscall args beautifier in 'perf trace'
     (Arnaldo Carvalho de Melo)

   - Ignore 'fd' and 'offset' args for MAP_ANONYMOUS in 'perf trace'
     (Arnaldo Carvalho de Melo)

   - Beautifiers for the 'cmd' arg of several ioctl types, including:
     sound, DRM, KVM, vhost virtio and perf_events. (Arnaldo Carvalho de
     Melo)

   - Add PERF_SAMPLE_CALLCHAIN and PERF_RECORD_MMAP[2] to 'perf data'
     CTF conversion, allowing CTF trace visualization tools to show
     callchains and to resolve symbols (Geneviève Bastien)

   - Beautify the fcntl syscall, which is an interesting one in the
     sense that infrastructure had to be put in place to change the
     formatters of some arguments according to the value in a previous
     one, i.e. cmd dictates how arg and the syscall return will be
     formatted. (Arnaldo Carvalho de Melo)

  perf stat enhancements:

   - Use group read for event groups in 'perf stat', reducing overhead
     when groups are defined in the event specification, i.e. when using
     {} to enclose a list of events, asking them to be read at the same
     time, e.g.: "perf stat -e '{cycles,instructions}'" (Jiri Olsa)

  pipe mode improvements:

   - Process tracing data in 'perf annotate' pipe mode (David
     Carrillo-Cisneros)

   - Add header record types to pipe-mode, now this command:

        $ perf record -o - -e cycles sleep 1 | perf report --stdio --header

     Will show the same as in non-pipe mode, i.e. involving a perf.data
     file (David Carrillo-Cisneros)

  Vendor specific hardware event support updates/enhancements:

   - Update POWER9 vendor events tables (Sukadev Bhattiprolu)

   - Add POWER9 PMU events (Sukadev Bhattiprolu)

   - Support additional POWER8+ PVR in PMU mapfile (Shriya)

   - Add Skylake server uncore JSON vendor events (Andi Kleen)

   - Support exporting Intel PT data to sqlite3 with python perf
     scripts, this is in addition to the postgresql support that was
     already there (Adrian Hunter)"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (253 commits)
  perf symbols: Fix plt entry calculation for ARM and AARCH64
  perf probe: Fix kprobe blacklist checking condition
  perf/x86: Fix caps/ for !Intel
  perf/core, x86: Add PERF_SAMPLE_PHYS_ADDR
  perf/core, pt, bts: Get rid of itrace_started
  perf trace beauty: Beautify pkey_{alloc,free,mprotect} arguments
  tools headers: Sync cpu features kernel ABI headers with tooling headers
  perf tools: Pass full path of FEATURES_DUMP
  perf tools: Robustify detection of clang binary
  tools lib: Allow external definition of CC, AR and LD
  perf tools: Allow external definition of flex and bison binary names
  tools build tests: Don't hardcode gcc name
  perf report: Group stat values on global event id
  perf values: Zero value buffers
  perf values: Fix allocation check
  perf values: Fix thread index bug
  perf report: Add dump_read function
  perf record: Set read_format for inherit_stat
  perf c2c: Fix remote HITM detection for Skylake
  perf tools: Fix static build with newer toolchains
  ...
parents 0081a0ce 1b2f76d7
......@@ -61,6 +61,7 @@ show up in /proc/sys/kernel:
- perf_cpu_time_max_percent
- perf_event_paranoid
- perf_event_max_stack
- perf_event_mlock_kb
- perf_event_max_contexts_per_stack
- pid_max
- powersave-nap [ PPC only ]
......@@ -654,7 +655,9 @@ Controls use of the performance events system by unprivileged
users (without CAP_SYS_ADMIN). The default value is 2.
-1: Allow use of (almost) all events by all users
>=0: Disallow raw tracepoint access by users without CAP_IOC_LOCK
Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>=0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
Disallow raw tracepoint access by users without CAP_SYS_ADMIN
>=1: Disallow CPU event access by users without CAP_SYS_ADMIN
>=2: Disallow kernel profiling by users without CAP_SYS_ADMIN
......@@ -673,6 +676,14 @@ The default value is 127.
==============================================================
perf_event_mlock_kb:
Control size of per-cpu ring buffer not counted against mlock limit.
The default value is 512 + 1 page
==============================================================
perf_event_max_contexts_per_stack:
Controls maximum number of stack frame context entries for
......
......@@ -18,7 +18,6 @@ struct undef_hook {
void register_undef_hook(struct undef_hook *hook);
void unregister_undef_hook(struct undef_hook *hook);
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
static inline int __in_irqentry_text(unsigned long ptr)
{
extern char __irqentry_text_start[];
......@@ -27,12 +26,6 @@ static inline int __in_irqentry_text(unsigned long ptr)
return ptr >= (unsigned long)&__irqentry_text_start &&
ptr < (unsigned long)&__irqentry_text_end;
}
#else
static inline int __in_irqentry_text(unsigned long ptr)
{
return 0;
}
#endif
static inline int in_exception_text(unsigned long ptr)
{
......
......@@ -37,18 +37,11 @@ void unregister_undef_hook(struct undef_hook *hook);
void arm64_notify_segfault(struct pt_regs *regs, unsigned long addr);
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
static inline int __in_irqentry_text(unsigned long ptr)
{
return ptr >= (unsigned long)&__irqentry_text_start &&
ptr < (unsigned long)&__irqentry_text_end;
}
#else
static inline int __in_irqentry_text(unsigned long ptr)
{
return 0;
}
#endif
static inline int in_exception_text(unsigned long ptr)
{
......
......@@ -227,7 +227,7 @@ static void crisv32_arbiter_config(int arbiter, int region, int unused_slots)
}
}
extern char _stext, _etext;
extern char _stext[], _etext[];
static void crisv32_arbiter_init(void)
{
......@@ -265,7 +265,7 @@ static void crisv32_arbiter_init(void)
#ifndef CONFIG_ETRAX_KGDB
/* Global watch for writes to kernel text segment. */
crisv32_arbiter_watch(virt_to_phys(&_stext), &_etext - &_stext,
crisv32_arbiter_watch(virt_to_phys(_stext), _etext - _stext,
MARB_CLIENTS(arbiter_all_clients, arbiter_bar_all_clients),
arbiter_all_write, NULL);
#endif
......
......@@ -158,7 +158,7 @@ static void crisv32_arbiter_config(int region, int unused_slots)
}
}
extern char _stext, _etext;
extern char _stext[], _etext[];
static void crisv32_arbiter_init(void)
{
......@@ -190,7 +190,7 @@ static void crisv32_arbiter_init(void)
#ifndef CONFIG_ETRAX_KGDB
/* Global watch for writes to kernel text segment. */
crisv32_arbiter_watch(virt_to_phys(&_stext), &_etext - &_stext,
crisv32_arbiter_watch(virt_to_phys(_stext), _etext - _stext,
arbiter_all_clients, arbiter_all_write, NULL);
#endif
}
......
......@@ -42,7 +42,7 @@ void (*nmi_handler)(struct pt_regs *);
void show_trace(unsigned long *stack)
{
unsigned long addr, module_start, module_end;
extern char _stext, _etext;
extern char _stext[], _etext[];
int i;
pr_err("\nCall Trace: ");
......@@ -69,8 +69,8 @@ void show_trace(unsigned long *stack)
* down the cause of the crash will be able to figure
* out the call path that was taken.
*/
if (((addr >= (unsigned long)&_stext) &&
(addr <= (unsigned long)&_etext)) ||
if (((addr >= (unsigned long)_stext) &&
(addr <= (unsigned long)_etext)) ||
((addr >= module_start) && (addr <= module_end))) {
#ifdef CONFIG_KALLSYMS
print_ip_sym(addr);
......
......@@ -33,9 +33,9 @@ extern unsigned long *_interrupt_redirect_table;
#define TRAP2_VEC 10
#define TRAP3_VEC 11
extern char _start, _etext;
extern char _start[], _etext[];
#define check_kernel_text(addr) \
((addr >= (unsigned long)(&_start)) && \
(addr < (unsigned long)(&_etext)) && !(addr & 1))
((addr >= (unsigned long)(_start)) && \
(addr < (unsigned long)(_etext)) && !(addr & 1))
#endif /* _H8300_TRAPS_H */
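A note on the recurring change in the hunks above: declaring linker-script symbols such as _stext/_etext as incomplete char arrays instead of single char objects matches the form already used in include/asm-generic/sections.h. These symbols only mark addresses, and the array declaration lets code use them directly as pointers rather than taking their address. A minimal sketch of the resulting idiom (the helper name below is made up for illustration):

extern char _stext[], _etext[];	/* linker-provided section bounds */

/* Hypothetical helper showing the preferred usage after the conversion. */
static inline int in_kernel_text(unsigned long addr)
{
	return addr >= (unsigned long)_stext && addr < (unsigned long)_etext;
}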
......@@ -2039,7 +2039,8 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
perf_sample_data_init(&data, ~0ULL, event->hw.last_period);
if (event->attr.sample_type & PERF_SAMPLE_ADDR)
if (event->attr.sample_type &
(PERF_SAMPLE_ADDR | PERF_SAMPLE_PHYS_ADDR))
perf_get_data_addr(regs, &data.addr);
if (event->attr.sample_type & PERF_SAMPLE_BRANCH_STACK) {
......
......@@ -675,13 +675,8 @@ apicinterrupt3 \num trace(\sym) smp_trace(\sym)
#endif
/* Make sure APIC interrupt handlers end up in the irqentry section: */
#if defined(CONFIG_FUNCTION_GRAPH_TRACER) || defined(CONFIG_KASAN)
# define PUSH_SECTION_IRQENTRY .pushsection .irqentry.text, "ax"
# define POP_SECTION_IRQENTRY .popsection
#else
# define PUSH_SECTION_IRQENTRY
# define POP_SECTION_IRQENTRY
#endif
#define PUSH_SECTION_IRQENTRY .pushsection .irqentry.text, "ax"
#define POP_SECTION_IRQENTRY .popsection
.macro apicinterrupt num sym do_sym
PUSH_SECTION_IRQENTRY
......
......@@ -400,11 +400,24 @@ static int amd_uncore_cpu_starting(unsigned int cpu)
if (amd_uncore_llc) {
unsigned int apicid = cpu_data(cpu).apicid;
unsigned int nshared;
unsigned int nshared, subleaf, prev_eax = 0;
uncore = *per_cpu_ptr(amd_uncore_llc, cpu);
cpuid_count(0x8000001d, 2, &eax, &ebx, &ecx, &edx);
nshared = ((eax >> 14) & 0xfff) + 1;
/*
* Iterate over Cache Topology Definition leaves until no
* more cache descriptions are available.
*/
for (subleaf = 0; subleaf < 5; subleaf++) {
cpuid_count(0x8000001d, subleaf, &eax, &ebx, &ecx, &edx);
/* EAX[0:4] gives type of cache */
if (!(eax & 0x1f))
break;
prev_eax = eax;
}
nshared = ((prev_eax >> 14) & 0xfff) + 1;
uncore->id = apicid - (apicid % nshared);
uncore = amd_uncore_find_online_sibling(uncore, amd_uncore_llc);
......@@ -555,7 +568,7 @@ static int __init amd_uncore_init(void)
ret = 0;
}
if (boot_cpu_has(X86_FEATURE_PERFCTR_L2)) {
if (boot_cpu_has(X86_FEATURE_PERFCTR_LLC)) {
amd_uncore_llc = alloc_percpu(struct amd_uncore *);
if (!amd_uncore_llc) {
ret = -ENOMEM;
......
......@@ -487,22 +487,28 @@ static inline int precise_br_compat(struct perf_event *event)
return m == b;
}
int x86_pmu_hw_config(struct perf_event *event)
int x86_pmu_max_precise(void)
{
if (event->attr.precise_ip) {
int precise = 0;
int precise = 0;
/* Support for constant skid */
if (x86_pmu.pebs_active && !x86_pmu.pebs_broken) {
precise++;
/* Support for constant skid */
if (x86_pmu.pebs_active && !x86_pmu.pebs_broken) {
/* Support for IP fixup */
if (x86_pmu.lbr_nr || x86_pmu.intel_cap.pebs_format >= 2)
precise++;
/* Support for IP fixup */
if (x86_pmu.lbr_nr || x86_pmu.intel_cap.pebs_format >= 2)
precise++;
if (x86_pmu.pebs_prec_dist)
precise++;
}
return precise;
}
if (x86_pmu.pebs_prec_dist)
precise++;
}
int x86_pmu_hw_config(struct perf_event *event)
{
if (event->attr.precise_ip) {
int precise = x86_pmu_max_precise();
if (event->attr.precise_ip > precise)
return -EOPNOTSUPP;
......@@ -1751,6 +1757,7 @@ ssize_t x86_event_sysfs_show(char *page, u64 config, u64 event)
}
static struct attribute_group x86_pmu_attr_group;
static struct attribute_group x86_pmu_caps_group;
static int __init init_hw_perf_events(void)
{
......@@ -1799,6 +1806,14 @@ static int __init init_hw_perf_events(void)
x86_pmu_format_group.attrs = x86_pmu.format_attrs;
if (x86_pmu.caps_attrs) {
struct attribute **tmp;
tmp = merge_attr(x86_pmu_caps_group.attrs, x86_pmu.caps_attrs);
if (!WARN_ON(!tmp))
x86_pmu_caps_group.attrs = tmp;
}
if (x86_pmu.event_attrs)
x86_pmu_events_group.attrs = x86_pmu.event_attrs;
......@@ -2213,10 +2228,30 @@ static struct attribute_group x86_pmu_attr_group = {
.attrs = x86_pmu_attrs,
};
static ssize_t max_precise_show(struct device *cdev,
struct device_attribute *attr,
char *buf)
{
return snprintf(buf, PAGE_SIZE, "%d\n", x86_pmu_max_precise());
}
static DEVICE_ATTR_RO(max_precise);
static struct attribute *x86_pmu_caps_attrs[] = {
&dev_attr_max_precise.attr,
NULL
};
static struct attribute_group x86_pmu_caps_group = {
.name = "caps",
.attrs = x86_pmu_caps_attrs,
};
static const struct attribute_group *x86_pmu_attr_groups[] = {
&x86_pmu_attr_group,
&x86_pmu_format_group,
&x86_pmu_events_group,
&x86_pmu_caps_group,
NULL,
};
......
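The caps/ group registered above is ordinary sysfs, so the exported values can be consumed with a plain read. A minimal userspace sketch, assuming the path from the changelog above and the max_precise attribute added in this hunk:

#include <stdio.h>

int main(void)
{
	/* Illustrative only: read the PEBS precision level exported via caps/. */
	const char *path = "/sys/bus/event_source/devices/cpu/caps/max_precise";
	FILE *f = fopen(path, "r");
	int max_precise;

	if (f && fscanf(f, "%d", &max_precise) == 1)
		printf("max_precise: %d\n", max_precise);
	if (f)
		fclose(f);
	return 0;
}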
......@@ -268,7 +268,7 @@ static void bts_event_start(struct perf_event *event, int flags)
bts->ds_back.bts_absolute_maximum = cpuc->ds->bts_absolute_maximum;
bts->ds_back.bts_interrupt_threshold = cpuc->ds->bts_interrupt_threshold;
event->hw.itrace_started = 1;
perf_event_itrace_started(event);
event->hw.state = 0;
__bts_event_start(event);
......
......@@ -3415,12 +3415,26 @@ static struct attribute *intel_arch3_formats_attr[] = {
&format_attr_any.attr,
&format_attr_inv.attr,
&format_attr_cmask.attr,
NULL,
};
static struct attribute *hsw_format_attr[] = {
&format_attr_in_tx.attr,
&format_attr_in_tx_cp.attr,
&format_attr_offcore_rsp.attr,
&format_attr_ldlat.attr,
NULL
};
&format_attr_offcore_rsp.attr, /* XXX do NHM/WSM + SNB breakout */
&format_attr_ldlat.attr, /* PEBS load latency */
NULL,
static struct attribute *nhm_format_attr[] = {
&format_attr_offcore_rsp.attr,
&format_attr_ldlat.attr,
NULL
};
static struct attribute *slm_format_attr[] = {
&format_attr_offcore_rsp.attr,
NULL
};
static struct attribute *skl_format_attr[] = {
......@@ -3781,6 +3795,36 @@ static ssize_t freeze_on_smi_store(struct device *cdev,
static DEVICE_ATTR_RW(freeze_on_smi);
static ssize_t branches_show(struct device *cdev,
struct device_attribute *attr,
char *buf)
{
return snprintf(buf, PAGE_SIZE, "%d\n", x86_pmu.lbr_nr);
}
static DEVICE_ATTR_RO(branches);
static struct attribute *lbr_attrs[] = {
&dev_attr_branches.attr,
NULL
};
static char pmu_name_str[30];
static ssize_t pmu_name_show(struct device *cdev,
struct device_attribute *attr,
char *buf)
{
return snprintf(buf, PAGE_SIZE, "%s\n", pmu_name_str);
}
static DEVICE_ATTR_RO(pmu_name);
static struct attribute *intel_pmu_caps_attrs[] = {
&dev_attr_pmu_name.attr,
NULL
};
static struct attribute *intel_pmu_attrs[] = {
&dev_attr_freeze_on_smi.attr,
NULL,
......@@ -3795,6 +3839,8 @@ __init int intel_pmu_init(void)
unsigned int unused;
struct extra_reg *er;
int version, i;
struct attribute **extra_attr = NULL;
char *name;
if (!cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
switch (boot_cpu_data.x86) {
......@@ -3862,6 +3908,7 @@ __init int intel_pmu_init(void)
switch (boot_cpu_data.x86_model) {
case INTEL_FAM6_CORE_YONAH:
pr_cont("Core events, ");
name = "core";
break;
case INTEL_FAM6_CORE2_MEROM:
......@@ -3877,6 +3924,7 @@ __init int intel_pmu_init(void)
x86_pmu.event_constraints = intel_core2_event_constraints;
x86_pmu.pebs_constraints = intel_core2_pebs_event_constraints;
pr_cont("Core2 events, ");
name = "core2";
break;
case INTEL_FAM6_NEHALEM:
......@@ -3905,8 +3953,11 @@ __init int intel_pmu_init(void)
intel_pmu_pebs_data_source_nhm();
x86_add_quirk(intel_nehalem_quirk);
x86_pmu.pebs_no_tlb = 1;
extra_attr = nhm_format_attr;
pr_cont("Nehalem events, ");
name = "nehalem";
break;
case INTEL_FAM6_ATOM_PINEVIEW:
......@@ -3923,6 +3974,7 @@ __init int intel_pmu_init(void)
x86_pmu.pebs_constraints = intel_atom_pebs_event_constraints;
x86_pmu.pebs_aliases = intel_pebs_aliases_core2;
pr_cont("Atom events, ");
name = "bonnell";
break;
case INTEL_FAM6_ATOM_SILVERMONT1:
......@@ -3940,7 +3992,9 @@ __init int intel_pmu_init(void)
x86_pmu.extra_regs = intel_slm_extra_regs;
x86_pmu.flags |= PMU_FL_HAS_RSP_1;
x86_pmu.cpu_events = slm_events_attrs;
extra_attr = slm_format_attr;
pr_cont("Silvermont events, ");
name = "silvermont";
break;
case INTEL_FAM6_ATOM_GOLDMONT:
......@@ -3965,7 +4019,9 @@ __init int intel_pmu_init(void)
x86_pmu.lbr_pt_coexist = true;
x86_pmu.flags |= PMU_FL_HAS_RSP_1;
x86_pmu.cpu_events = glm_events_attrs;
extra_attr = slm_format_attr;
pr_cont("Goldmont events, ");
name = "goldmont";
break;
case INTEL_FAM6_ATOM_GEMINI_LAKE:
......@@ -3991,7 +4047,9 @@ __init int intel_pmu_init(void)
x86_pmu.cpu_events = glm_events_attrs;
/* Goldmont Plus has 4-wide pipeline */
event_attr_td_total_slots_scale_glm.event_str = "4";
extra_attr = slm_format_attr;
pr_cont("Goldmont plus events, ");
name = "goldmont_plus";
break;
case INTEL_FAM6_WESTMERE:
......@@ -4020,7 +4078,9 @@ __init int intel_pmu_init(void)
X86_CONFIG(.event=0xb1, .umask=0x3f, .inv=1, .cmask=1);
intel_pmu_pebs_data_source_nhm();
extra_attr = nhm_format_attr;
pr_cont("Westmere events, ");
name = "westmere";
break;
case INTEL_FAM6_SANDYBRIDGE:
......@@ -4056,7 +4116,10 @@ __init int intel_pmu_init(void)
intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] =
X86_CONFIG(.event=0xb1, .umask=0x01, .inv=1, .cmask=1);
extra_attr = nhm_format_attr;
pr_cont("SandyBridge events, ");
name = "sandybridge";
break;
case INTEL_FAM6_IVYBRIDGE:
......@@ -4090,7 +4153,10 @@ __init int intel_pmu_init(void)
intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
extra_attr = nhm_format_attr;
pr_cont("IvyBridge events, ");
name = "ivybridge";
break;
......@@ -4118,7 +4184,10 @@ __init int intel_pmu_init(void)
x86_pmu.get_event_constraints = hsw_get_event_constraints;
x86_pmu.cpu_events = hsw_events_attrs;
x86_pmu.lbr_double_abort = true;
extra_attr = boot_cpu_has(X86_FEATURE_RTM) ?
hsw_format_attr : nhm_format_attr;
pr_cont("Haswell events, ");
name = "haswell";
break;
case INTEL_FAM6_BROADWELL_CORE:
......@@ -4154,7 +4223,10 @@ __init int intel_pmu_init(void)
x86_pmu.get_event_constraints = hsw_get_event_constraints;
x86_pmu.cpu_events = hsw_events_attrs;
x86_pmu.limit_period = bdw_limit_period;
extra_attr = boot_cpu_has(X86_FEATURE_RTM) ?
hsw_format_attr : nhm_format_attr;
pr_cont("Broadwell events, ");
name = "broadwell";
break;
case INTEL_FAM6_XEON_PHI_KNL:
......@@ -4172,8 +4244,9 @@ __init int intel_pmu_init(void)
/* all extra regs are per-cpu when HT is on */
x86_pmu.flags |= PMU_FL_HAS_RSP_1;
x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
extra_attr = slm_format_attr;
pr_cont("Knights Landing/Mill events, ");
name = "knights-landing";
break;
case INTEL_FAM6_SKYLAKE_MOBILE:
......@@ -4203,11 +4276,14 @@ __init int intel_pmu_init(void)
x86_pmu.hw_config = hsw_hw_config;
x86_pmu.get_event_constraints = hsw_get_event_constraints;
x86_pmu.format_attrs = merge_attr(intel_arch3_formats_attr,
skl_format_attr);
WARN_ON(!x86_pmu.format_attrs);
extra_attr = boot_cpu_has(X86_FEATURE_RTM) ?
hsw_format_attr : nhm_format_attr;
extra_attr = merge_attr(extra_attr, skl_format_attr);
x86_pmu.cpu_events = hsw_events_attrs;
intel_pmu_pebs_data_source_skl(
boot_cpu_data.x86_model == INTEL_FAM6_SKYLAKE_X);
pr_cont("Skylake events, ");
name = "skylake";
break;
default:
......@@ -4215,6 +4291,7 @@ __init int intel_pmu_init(void)
case 1:
x86_pmu.event_constraints = intel_v1_event_constraints;
pr_cont("generic architected perfmon v1, ");
name = "generic_arch_v1";
break;
default:
/*
......@@ -4222,10 +4299,19 @@ __init int intel_pmu_init(void)
*/
x86_pmu.event_constraints = intel_gen_event_constraints;
pr_cont("generic architected perfmon, ");
name = "generic_arch_v2+";
break;
}
}
snprintf(pmu_name_str, sizeof pmu_name_str, "%s", name);
if (version >= 2 && extra_attr) {
x86_pmu.format_attrs = merge_attr(intel_arch3_formats_attr,
extra_attr);
WARN_ON(!x86_pmu.format_attrs);
}
if (x86_pmu.num_counters > INTEL_PMC_MAX_GENERIC) {
WARN(1, KERN_ERR "hw perf events %d > max(%d), clipping!",
x86_pmu.num_counters, INTEL_PMC_MAX_GENERIC);
......@@ -4272,8 +4358,13 @@ __init int intel_pmu_init(void)
x86_pmu.lbr_nr = 0;
}
if (x86_pmu.lbr_nr)
x86_pmu.caps_attrs = intel_pmu_caps_attrs;
if (x86_pmu.lbr_nr) {
x86_pmu.caps_attrs = merge_attr(x86_pmu.caps_attrs, lbr_attrs);
pr_cont("%d-deep LBR, ", x86_pmu.lbr_nr);
}
/*
* Access extra MSR may cause #GP under certain circumstances.
* E.g. KVM doesn't support offcore event
......
......@@ -49,34 +49,47 @@ union intel_x86_pebs_dse {
*/
#define P(a, b) PERF_MEM_S(a, b)
#define OP_LH (P(OP, LOAD) | P(LVL, HIT))
#define LEVEL(x) P(LVLNUM, x)
#define REM P(REMOTE, REMOTE)
#define SNOOP_NONE_MISS (P(SNOOP, NONE) | P(SNOOP, MISS))
/* Version for Sandy Bridge and later */
static u64 pebs_data_source[] = {
P(OP, LOAD) | P(LVL, MISS) | P(LVL, L3) | P(SNOOP, NA),/* 0x00:ukn L3 */
OP_LH | P(LVL, L1) | P(SNOOP, NONE), /* 0x01: L1 local */
OP_LH | P(LVL, LFB) | P(SNOOP, NONE), /* 0x02: LFB hit */
OP_LH | P(LVL, L2) | P(SNOOP, NONE), /* 0x03: L2 hit */
OP_LH | P(LVL, L3) | P(SNOOP, NONE), /* 0x04: L3 hit */
OP_LH | P(LVL, L3) | P(SNOOP, MISS), /* 0x05: L3 hit, snoop miss */
OP_LH | P(LVL, L3) | P(SNOOP, HIT), /* 0x06: L3 hit, snoop hit */
OP_LH | P(LVL, L3) | P(SNOOP, HITM), /* 0x07: L3 hit, snoop hitm */
OP_LH | P(LVL, REM_CCE1) | P(SNOOP, HIT), /* 0x08: L3 miss snoop hit */
OP_LH | P(LVL, REM_CCE1) | P(SNOOP, HITM), /* 0x09: L3 miss snoop hitm*/
OP_LH | P(LVL, LOC_RAM) | P(SNOOP, HIT), /* 0x0a: L3 miss, shared */
OP_LH | P(LVL, REM_RAM1) | P(SNOOP, HIT), /* 0x0b: L3 miss, shared */
OP_LH | P(LVL, LOC_RAM) | SNOOP_NONE_MISS,/* 0x0c: L3 miss, excl */
OP_LH | P(LVL, REM_RAM1) | SNOOP_NONE_MISS,/* 0x0d: L3 miss, excl */
OP_LH | P(LVL, IO) | P(SNOOP, NONE), /* 0x0e: I/O */
OP_LH | P(LVL, UNC) | P(SNOOP, NONE), /* 0x0f: uncached */
P(OP, LOAD) | P(LVL, MISS) | LEVEL(L3) | P(SNOOP, NA),/* 0x00:ukn L3 */
OP_LH | P(LVL, L1) | LEVEL(L1) | P(SNOOP, NONE), /* 0x01: L1 local */
OP_LH | P(LVL, LFB) | LEVEL(LFB) | P(SNOOP, NONE), /* 0x02: LFB hit */
OP_LH | P(LVL, L2) | LEVEL(L2) | P(SNOOP, NONE), /* 0x03: L2 hit */
OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOP, NONE), /* 0x04: L3 hit */
OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOP, MISS), /* 0x05: L3 hit, snoop miss */
OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOP, HIT), /* 0x06: L3 hit, snoop hit */
OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOP, HITM), /* 0x07: L3 hit, snoop hitm */
OP_LH | P(LVL, REM_CCE1) | REM | LEVEL(L3) | P(SNOOP, HIT), /* 0x08: L3 miss snoop hit */
OP_LH | P(LVL, REM_CCE1) | REM | LEVEL(L3) | P(SNOOP, HITM), /* 0x09: L3 miss snoop hitm*/
OP_LH | P(LVL, LOC_RAM) | LEVEL(RAM) | P(SNOOP, HIT), /* 0x0a: L3 miss, shared */
OP_LH | P(LVL, REM_RAM1) | REM | LEVEL(L3) | P(SNOOP, HIT), /* 0x0b: L3 miss, shared */
OP_LH | P(LVL, LOC_RAM) | LEVEL(RAM) | SNOOP_NONE_MISS, /* 0x0c: L3 miss, excl */
OP_LH | P(LVL, REM_RAM1) | LEVEL(RAM) | REM | SNOOP_NONE_MISS, /* 0x0d: L3 miss, excl */
OP_LH | P(LVL, IO) | LEVEL(NA) | P(SNOOP, NONE), /* 0x0e: I/O */
OP_LH | P(LVL, UNC) | LEVEL(NA) | P(SNOOP, NONE), /* 0x0f: uncached */
};
/* Patch up minor differences in the bits */
void __init intel_pmu_pebs_data_source_nhm(void)
{
pebs_data_source[0x05] = OP_LH | P(LVL, L3) | P(SNOOP, HIT);
pebs_data_source[0x06] = OP_LH | P(LVL, L3) | P(SNOOP, HITM);
pebs_data_source[0x07] = OP_LH | P(LVL, L3) | P(SNOOP, HITM);
pebs_data_source[0x05] = OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOP, HIT);
pebs_data_source[0x06] = OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOP, HITM);
pebs_data_source[0x07] = OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOP, HITM);
}
void __init intel_pmu_pebs_data_source_skl(bool pmem)
{
u64 pmem_or_l4 = pmem ? LEVEL(PMEM) : LEVEL(L4);
pebs_data_source[0x08] = OP_LH | pmem_or_l4 | P(SNOOP, HIT);
pebs_data_source[0x09] = OP_LH | pmem_or_l4 | REM | P(SNOOP, HIT);
pebs_data_source[0x0b] = OP_LH | LEVEL(RAM) | REM | P(SNOOP, NONE);
pebs_data_source[0x0c] = OP_LH | LEVEL(ANY_CACHE) | REM | P(SNOOPX, FWD);
pebs_data_source[0x0d] = OP_LH | LEVEL(ANY_CACHE) | REM | P(SNOOP, HITM);
}
static u64 precise_store_data(u64 status)
......@@ -149,8 +162,6 @@ static u64 load_latency_data(u64 status)
{
union intel_x86_pebs_dse dse;
u64 val;
int model = boot_cpu_data.x86_model;
int fam = boot_cpu_data.x86;
dse.val = status;
......@@ -162,8 +173,7 @@ static u64 load_latency_data(u64 status)
/*
* Nehalem models do not support TLB, Lock infos
*/
if (fam == 0x6 && (model == 26 || model == 30
|| model == 31 || model == 46)) {
if (x86_pmu.pebs_no_tlb) {
val |= P(TLB, NA) | P(LOCK, NA);
return val;
}
......@@ -1175,7 +1185,7 @@ static void setup_pebs_sample_data(struct perf_event *event,
else
regs->flags &= ~PERF_EFLAGS_EXACT;
if ((sample_type & PERF_SAMPLE_ADDR) &&
if ((sample_type & (PERF_SAMPLE_ADDR | PERF_SAMPLE_PHYS_ADDR)) &&
x86_pmu.intel_cap.pebs_format >= 1)
data->addr = pebs->dla;
......
......@@ -109,6 +109,9 @@ enum {
X86_BR_ZERO_CALL = 1 << 15,/* zero length call */
X86_BR_CALL_STACK = 1 << 16,/* call stack */
X86_BR_IND_JMP = 1 << 17,/* indirect jump */
X86_BR_TYPE_SAVE = 1 << 18,/* indicate to save branch type */
};
#define X86_BR_PLM (X86_BR_USER | X86_BR_KERNEL)
......@@ -514,6 +517,7 @@ static void intel_pmu_lbr_read_32(struct cpu_hw_events *cpuc)
cpuc->lbr_entries[i].in_tx = 0;
cpuc->lbr_entries[i].abort = 0;
cpuc->lbr_entries[i].cycles = 0;
cpuc->lbr_entries[i].type = 0;
cpuc->lbr_entries[i].reserved = 0;
}
cpuc->lbr_stack.nr = i;
......@@ -600,6 +604,7 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)
cpuc->lbr_entries[out].in_tx = in_tx;
cpuc->lbr_entries[out].abort = abort;
cpuc->lbr_entries[out].cycles = cycles;
cpuc->lbr_entries[out].type = 0;
cpuc->lbr_entries[out].reserved = 0;
out++;
}
......@@ -677,6 +682,10 @@ static int intel_pmu_setup_sw_lbr_filter(struct perf_event *event)
if (br_type & PERF_SAMPLE_BRANCH_CALL)
mask |= X86_BR_CALL | X86_BR_ZERO_CALL;
if (br_type & PERF_SAMPLE_BRANCH_TYPE_SAVE)
mask |= X86_BR_TYPE_SAVE;
/*
* stash actual user request into reg, it may
* be used by fixup code for some CPU
......@@ -930,6 +939,43 @@ static int branch_type(unsigned long from, unsigned long to, int abort)
return ret;
}
#define X86_BR_TYPE_MAP_MAX 16
static int branch_map[X86_BR_TYPE_MAP_MAX] = {
PERF_BR_CALL, /* X86_BR_CALL */
PERF_BR_RET, /* X86_BR_RET */
PERF_BR_SYSCALL, /* X86_BR_SYSCALL */
PERF_BR_SYSRET, /* X86_BR_SYSRET */
PERF_BR_UNKNOWN, /* X86_BR_INT */
PERF_BR_UNKNOWN, /* X86_BR_IRET */
PERF_BR_COND, /* X86_BR_JCC */
PERF_BR_UNCOND, /* X86_BR_JMP */
PERF_BR_UNKNOWN, /* X86_BR_IRQ */
PERF_BR_IND_CALL, /* X86_BR_IND_CALL */
PERF_BR_UNKNOWN, /* X86_BR_ABORT */
PERF_BR_UNKNOWN, /* X86_BR_IN_TX */
PERF_BR_UNKNOWN, /* X86_BR_NO_TX */
PERF_BR_CALL, /* X86_BR_ZERO_CALL */
PERF_BR_UNKNOWN, /* X86_BR_CALL_STACK */
PERF_BR_IND, /* X86_BR_IND_JMP */
};
static int
common_branch_type(int type)
{
int i;
type >>= 2; /* skip X86_BR_USER and X86_BR_KERNEL */
if (type) {
i = __ffs(type);
if (i < X86_BR_TYPE_MAP_MAX)
return branch_map[i];
}
return PERF_BR_UNKNOWN;
}
/*
* implement actual branch filter based on user demand.
* Hardware may not exactly satisfy that request, thus
......@@ -946,7 +992,8 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
bool compress = false;
/* if sampling all branches, then nothing to filter */
if ((br_sel & X86_BR_ALL) == X86_BR_ALL)
if (((br_sel & X86_BR_ALL) == X86_BR_ALL) &&
((br_sel & X86_BR_TYPE_SAVE) != X86_BR_TYPE_SAVE))
return;
for (i = 0; i < cpuc->lbr_stack.nr; i++) {
......@@ -967,6 +1014,9 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
cpuc->lbr_entries[i].from = 0;
compress = true;
}
if ((br_sel & X86_BR_TYPE_SAVE) == X86_BR_TYPE_SAVE)
cpuc->lbr_entries[i].type = common_branch_type(type);
}
if (!compress)
......
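For context, the new perf_branch_entry::type field carries the PERF_BR_* values defined in the uapi header further down in this merge. A consumer would typically translate them back into names; the helper below is only an illustrative sketch (the function name and strings are not part of this series):

#include <linux/perf_event.h>

/* Illustrative sketch: map PERF_BR_* values to printable names. */
static const char *branch_type_name(int type)
{
	switch (type) {
	case PERF_BR_COND:	return "COND";
	case PERF_BR_UNCOND:	return "UNCOND";
	case PERF_BR_IND:	return "IND";
	case PERF_BR_CALL:	return "CALL";
	case PERF_BR_IND_CALL:	return "IND_CALL";
	case PERF_BR_RET:	return "RET";
	case PERF_BR_SYSCALL:	return "SYSCALL";
	case PERF_BR_SYSRET:	return "SYSRET";
	case PERF_BR_COND_CALL:	return "COND_CALL";
	case PERF_BR_COND_RET:	return "COND_RET";
	default:		return "UNKNOWN";
	}
}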
......@@ -471,8 +471,9 @@ static void pt_config(struct perf_event *event)
struct pt *pt = this_cpu_ptr(&pt_ctx);
u64 reg;
if (!event->hw.itrace_started) {
event->hw.itrace_started = 1;
/* First round: clear STATUS, in particular the PSB byte counter. */
if (!event->hw.config) {
perf_event_itrace_started(event);
wrmsrl(MSR_IA32_RTIT_STATUS, 0);
}
......
......@@ -91,7 +91,7 @@ struct amd_nb {
(PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_ADDR | \
PERF_SAMPLE_ID | PERF_SAMPLE_CPU | PERF_SAMPLE_STREAM_ID | \
PERF_SAMPLE_DATA_SRC | PERF_SAMPLE_IDENTIFIER | \
PERF_SAMPLE_TRANSACTION)
PERF_SAMPLE_TRANSACTION | PERF_SAMPLE_PHYS_ADDR)
/*
* A debug store configuration.
......@@ -558,6 +558,7 @@ struct x86_pmu {
int attr_rdpmc;
struct attribute **format_attrs;
struct attribute **event_attrs;
struct attribute **caps_attrs;
ssize_t (*events_sysfs_show)(char *page, u64 config);
struct attribute **cpu_events;
......@@ -591,7 +592,8 @@ struct x86_pmu {
pebs :1,
pebs_active :1,
pebs_broken :1,
pebs_prec_dist :1;
pebs_prec_dist :1,
pebs_no_tlb :1;
int pebs_record_size;
int pebs_buffer_size;
void (*drain_pebs)(struct pt_regs *regs);
......@@ -741,6 +743,8 @@ int x86_reserve_hardware(void);
void x86_release_hardware(void);
int x86_pmu_max_precise(void);
void hw_perf_lbr_event_destroy(struct perf_event *event);
int x86_setup_perfctr(struct perf_event *event);
......@@ -947,6 +951,8 @@ void intel_pmu_lbr_init_knl(void);
void intel_pmu_pebs_data_source_nhm(void);
void intel_pmu_pebs_data_source_skl(bool pmem);
int intel_pmu_setup_lbr_filter(struct perf_event *event);
void intel_pt_interrupt(void);
......
......@@ -177,7 +177,7 @@
#define X86_FEATURE_PERFCTR_NB ( 6*32+24) /* NB performance counter extensions */
#define X86_FEATURE_BPEXT (6*32+26) /* data breakpoint extension */
#define X86_FEATURE_PTSC ( 6*32+27) /* performance time-stamp counter */
#define X86_FEATURE_PERFCTR_L2 ( 6*32+28) /* L2 performance counter extensions */
#define X86_FEATURE_PERFCTR_LLC ( 6*32+28) /* Last Level Cache performance counter extensions */
#define X86_FEATURE_MWAITX ( 6*32+29) /* MWAIT extension (MONITORX/MWAITX) */
/*
......
......@@ -39,6 +39,7 @@
#include <asm/insn.h>
#include <asm/debugreg.h>
#include <asm/set_memory.h>
#include <asm/sections.h>
#include "common.h"
......@@ -251,10 +252,12 @@ static int can_optimize(unsigned long paddr)
/*
* Do not optimize in the entry code due to the unstable
* stack handling.
* stack handling and registers setup.
*/
if ((paddr >= (unsigned long)__entry_text_start) &&
(paddr < (unsigned long)__entry_text_end))
if (((paddr >= (unsigned long)__entry_text_start) &&
(paddr < (unsigned long)__entry_text_end)) ||
((paddr >= (unsigned long)__irqentry_text_start) &&
(paddr < (unsigned long)__irqentry_text_end)))
return 0;
/* Check there is enough space for a relative jump. */
......
......@@ -91,10 +91,8 @@ static bool in_entry_code(unsigned long ip)
if (addr >= __entry_text_start && addr < __entry_text_end)
return true;
#if defined(CONFIG_FUNCTION_GRAPH_TRACER) || defined(CONFIG_KASAN)
if (addr >= __irqentry_text_start && addr < __irqentry_text_end)
return true;
#endif
return false;
}
......
......@@ -273,8 +273,8 @@ void __init init_arch(bp_tag_t *bp_start)
* Initialize system. Setup memory and reserve regions.
*/
extern char _end;
extern char _stext;
extern char _end[];
extern char _stext[];
extern char _WindowVectors_text_start;
extern char _WindowVectors_text_end;
extern char _DebugInterruptVector_literal_start;
......@@ -333,7 +333,7 @@ void __init setup_arch(char **cmdline_p)
}
#endif
mem_reserve(__pa(&_stext), __pa(&_end));
mem_reserve(__pa(_stext), __pa(_end));
#ifdef CONFIG_VECTORS_OFFSET
mem_reserve(__pa(&_WindowVectors_text_start),
......
......@@ -27,6 +27,8 @@
* __kprobes_text_start, __kprobes_text_end
* __entry_text_start, __entry_text_end
* __ctors_start, __ctors_end
* __irqentry_text_start, __irqentry_text_end
* __softirqentry_text_start, __softirqentry_text_end
*/
extern char _text[], _stext[], _etext[];
extern char _data[], _sdata[], _edata[];
......@@ -39,6 +41,8 @@ extern char __per_cpu_load[], __per_cpu_start[], __per_cpu_end[];
extern char __kprobes_text_start[], __kprobes_text_end[];
extern char __entry_text_start[], __entry_text_end[];
extern char __start_rodata[], __end_rodata[];
extern char __irqentry_text_start[], __irqentry_text_end[];
extern char __softirqentry_text_start[], __softirqentry_text_end[];
/* Start and end of .ctors section - used for constructor calls. */
extern char __ctors_start[], __ctors_end[];
......
......@@ -497,25 +497,17 @@
*(.entry.text) \
VMLINUX_SYMBOL(__entry_text_end) = .;
#if defined(CONFIG_FUNCTION_GRAPH_TRACER) || defined(CONFIG_KASAN)
#define IRQENTRY_TEXT \
ALIGN_FUNCTION(); \
VMLINUX_SYMBOL(__irqentry_text_start) = .; \
*(.irqentry.text) \
VMLINUX_SYMBOL(__irqentry_text_end) = .;
#else
#define IRQENTRY_TEXT
#endif
#if defined(CONFIG_FUNCTION_GRAPH_TRACER) || defined(CONFIG_KASAN)
#define SOFTIRQENTRY_TEXT \
ALIGN_FUNCTION(); \
VMLINUX_SYMBOL(__softirqentry_text_start) = .; \
*(.softirqentry.text) \
VMLINUX_SYMBOL(__softirqentry_text_end) = .;
#else
#define SOFTIRQENTRY_TEXT
#endif
/* Section used for early init (in .S files) */
#define HEAD_TEXT *(.head.text)
......
......@@ -18,6 +18,7 @@
#include <linux/atomic.h>
#include <asm/ptrace.h>
#include <asm/irq.h>
#include <asm/sections.h>
/*
* These correspond to the IORESOURCE_IRQ_* defines in
......@@ -726,7 +727,6 @@ extern int early_irq_init(void);
extern int arch_probe_nr_irqs(void);
extern int arch_early_irq_init(void);
#if defined(CONFIG_FUNCTION_GRAPH_TRACER) || defined(CONFIG_KASAN)
/*
* We want to know which function is an entrypoint of a hardirq or a softirq.
*/
......@@ -734,16 +734,4 @@ extern int arch_early_irq_init(void);
#define __softirq_entry \
__attribute__((__section__(".softirqentry.text")))
/* Limits of hardirq entrypoints */
extern char __irqentry_text_start[];
extern char __irqentry_text_end[];
/* Limits of softirq entrypoints */
extern char __softirqentry_text_start[];
extern char __softirqentry_text_end[];
#else
#define __irq_entry
#define __softirq_entry
#endif
#endif
......@@ -147,9 +147,6 @@ struct hw_perf_event {
struct list_head cqm_groups_entry;
struct list_head cqm_group_entry;
};
struct { /* itrace */
int itrace_started;
};
struct { /* amd_power */
u64 pwr_acc;
u64 ptsc;
......@@ -541,6 +538,7 @@ struct swevent_hlist {
#define PERF_ATTACH_GROUP 0x02
#define PERF_ATTACH_TASK 0x04
#define PERF_ATTACH_TASK_DATA 0x08
#define PERF_ATTACH_ITRACE 0x10
struct perf_cgroup;
struct ring_buffer;
......@@ -864,6 +862,7 @@ extern int perf_aux_output_skip(struct perf_output_handle *handle,
unsigned long size);
extern void *perf_get_aux(struct perf_output_handle *handle);
extern void perf_aux_output_flag(struct perf_output_handle *handle, u64 flags);
extern void perf_event_itrace_started(struct perf_event *event);
extern int perf_pmu_register(struct pmu *pmu, const char *name, int type);
extern void perf_pmu_unregister(struct pmu *pmu);
......@@ -944,6 +943,8 @@ struct perf_sample_data {
struct perf_regs regs_intr;
u64 stack_user_size;
u64 phys_addr;
} ____cacheline_aligned;
/* default value for data source */
......
......@@ -139,8 +139,9 @@ enum perf_event_sample_format {
PERF_SAMPLE_IDENTIFIER = 1U << 16,
PERF_SAMPLE_TRANSACTION = 1U << 17,
PERF_SAMPLE_REGS_INTR = 1U << 18,
PERF_SAMPLE_PHYS_ADDR = 1U << 19,
PERF_SAMPLE_MAX = 1U << 19, /* non-ABI */
PERF_SAMPLE_MAX = 1U << 20, /* non-ABI */
};
/*
......@@ -174,6 +175,8 @@ enum perf_branch_sample_type_shift {
PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT = 14, /* no flags */
PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT = 15, /* no cycles */
PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT = 16, /* save branch type */
PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
};
......@@ -198,9 +201,30 @@ enum perf_branch_sample_type {
PERF_SAMPLE_BRANCH_NO_FLAGS = 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
PERF_SAMPLE_BRANCH_NO_CYCLES = 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
PERF_SAMPLE_BRANCH_TYPE_SAVE =
1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
};
/*
* Common flow change classification
*/
enum {
PERF_BR_UNKNOWN = 0, /* unknown */
PERF_BR_COND = 1, /* conditional */
PERF_BR_UNCOND = 2, /* unconditional */
PERF_BR_IND = 3, /* indirect */
PERF_BR_CALL = 4, /* function call */
PERF_BR_IND_CALL = 5, /* indirect function call */
PERF_BR_RET = 6, /* function return */
PERF_BR_SYSCALL = 7, /* syscall */
PERF_BR_SYSRET = 8, /* syscall return */
PERF_BR_COND_CALL = 9, /* conditional function call */
PERF_BR_COND_RET = 10, /* conditional function return */
PERF_BR_MAX,
};
#define PERF_SAMPLE_BRANCH_PLM_ALL \
(PERF_SAMPLE_BRANCH_USER|\
PERF_SAMPLE_BRANCH_KERNEL|\
......@@ -791,6 +815,7 @@ enum perf_event_type {
* { u64 transaction; } && PERF_SAMPLE_TRANSACTION
* { u64 abi; # enum perf_sample_regs_abi
* u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR
* { u64 phys_addr;} && PERF_SAMPLE_PHYS_ADDR
* };
*/
PERF_RECORD_SAMPLE = 9,
......@@ -931,14 +956,20 @@ union perf_mem_data_src {
mem_snoop:5, /* snoop mode */
mem_lock:2, /* lock instr */
mem_dtlb:7, /* tlb access */
mem_rsvd:31;
mem_lvl_num:4, /* memory hierarchy level number */
mem_remote:1, /* remote */
mem_snoopx:2, /* snoop mode, ext */
mem_rsvd:24;
};
};
#elif defined(__BIG_ENDIAN_BITFIELD)
union perf_mem_data_src {
__u64 val;
struct {
__u64 mem_rsvd:31,
__u64 mem_rsvd:24,
mem_snoopx:2, /* snoop mode, ext */
mem_remote:1, /* remote */
mem_lvl_num:4, /* memory hierarchy level number */
mem_dtlb:7, /* tlb access */
mem_lock:2, /* lock instr */
mem_snoop:5, /* snoop mode */
......@@ -975,6 +1006,22 @@ union perf_mem_data_src {
#define PERF_MEM_LVL_UNC 0x2000 /* Uncached memory */
#define PERF_MEM_LVL_SHIFT 5
#define PERF_MEM_REMOTE_REMOTE 0x01 /* Remote */
#define PERF_MEM_REMOTE_SHIFT 37
#define PERF_MEM_LVLNUM_L1 0x01 /* L1 */
#define PERF_MEM_LVLNUM_L2 0x02 /* L2 */
#define PERF_MEM_LVLNUM_L3 0x03 /* L3 */
#define PERF_MEM_LVLNUM_L4 0x04 /* L4 */
/* 5-0xa available */
#define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
#define PERF_MEM_LVLNUM_LFB 0x0c /* LFB */
#define PERF_MEM_LVLNUM_RAM 0x0d /* RAM */
#define PERF_MEM_LVLNUM_PMEM 0x0e /* PMEM */
#define PERF_MEM_LVLNUM_NA 0x0f /* N/A */
#define PERF_MEM_LVLNUM_SHIFT 33
/* snoop mode */
#define PERF_MEM_SNOOP_NA 0x01 /* not available */
#define PERF_MEM_SNOOP_NONE 0x02 /* no snoop */
......@@ -983,6 +1030,10 @@ union perf_mem_data_src {
#define PERF_MEM_SNOOP_HITM 0x10 /* snoop hit modified */
#define PERF_MEM_SNOOP_SHIFT 19
#define PERF_MEM_SNOOPX_FWD 0x01 /* forward */
/* 1 free */
#define PERF_MEM_SNOOPX_SHIFT 37
/* locked instruction */
#define PERF_MEM_LOCK_NA 0x01 /* not available */
#define PERF_MEM_LOCK_LOCKED 0x02 /* locked transaction */
......@@ -1015,6 +1066,7 @@ union perf_mem_data_src {
* in_tx: running in a hardware transaction
* abort: aborting a hardware transaction
* cycles: cycles from last branch (or 0 if not supported)
* type: branch type
*/
struct perf_branch_entry {
__u64 from;
......@@ -1024,7 +1076,8 @@ struct perf_branch_entry {
in_tx:1, /* in transaction */
abort:1, /* transaction abort */
cycles:16, /* cycle count to last branch */
reserved:44;
type:4, /* branch type */
reserved:40;
};
#endif /* _UAPI_LINUX_PERF_EVENT_H */
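Both new ABI bits in this header (PERF_SAMPLE_PHYS_ADDR and PERF_SAMPLE_BRANCH_TYPE_SAVE) are opt-in from userspace. A minimal sketch of how an event might request them through perf_event_open() follows; the event choice, sample period and error handling are assumptions for illustration, not code from this series:

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

int main(void)
{
	struct perf_event_attr attr;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;		/* assumed event */
	attr.sample_period = 100000;			/* assumed period */
	/* Sample the physical address of each data access ... */
	attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_ADDR |
			   PERF_SAMPLE_PHYS_ADDR | PERF_SAMPLE_BRANCH_STACK;
	/* ... and ask for the branch type to be saved in each branch entry. */
	attr.branch_sample_type = PERF_SAMPLE_BRANCH_ANY |
				  PERF_SAMPLE_BRANCH_TYPE_SAVE;

	/* PERF_SAMPLE_PHYS_ADDR needs CAP_SYS_ADMIN when perf_event_paranoid
	 * disallows kernel profiling (see the core.c hunk below). */
	fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}
	close(fd);
	return 0;
}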
......@@ -1249,26 +1249,31 @@ unclone_ctx(struct perf_event_context *ctx)
return parent_ctx;
}
static u32 perf_event_pid(struct perf_event *event, struct task_struct *p)
static u32 perf_event_pid_type(struct perf_event *event, struct task_struct *p,
enum pid_type type)
{
u32 nr;
/*
* only top level events have the pid namespace they were created in
*/
if (event->parent)
event = event->parent;
return task_tgid_nr_ns(p, event->ns);
nr = __task_pid_nr_ns(p, type, event->ns);
/* avoid -1 if it is idle thread or runs in another ns */
if (!nr && !pid_alive(p))
nr = -1;
return nr;
}
static u32 perf_event_tid(struct perf_event *event, struct task_struct *p)
static u32 perf_event_pid(struct perf_event *event, struct task_struct *p)
{
/*
* only top level events have the pid namespace they were created in
*/
if (event->parent)
event = event->parent;
return perf_event_pid_type(event, p, __PIDTYPE_TGID);
}
return task_pid_nr_ns(p, event->ns);
static u32 perf_event_tid(struct perf_event *event, struct task_struct *p)
{
return perf_event_pid_type(event, p, PIDTYPE_PID);
}
/*
......@@ -1570,6 +1575,9 @@ static void __perf_event_header_size(struct perf_event *event, u64 sample_type)
if (sample_type & PERF_SAMPLE_TRANSACTION)
size += sizeof(data->txn);
if (sample_type & PERF_SAMPLE_PHYS_ADDR)
size += sizeof(data->phys_addr);
event->header_size = size;
}
......@@ -3211,6 +3219,13 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
return;
perf_ctx_lock(cpuctx, ctx);
/*
* We must check ctx->nr_events while holding ctx->lock, such
* that we serialize against perf_install_in_context().
*/
if (!ctx->nr_events)
goto unlock;
perf_pmu_disable(ctx->pmu);
/*
* We want to keep the following priority order:
......@@ -3224,6 +3239,8 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
perf_event_sched_in(cpuctx, ctx, task);
perf_pmu_enable(ctx->pmu);
unlock:
perf_ctx_unlock(cpuctx, ctx);
}
......@@ -6003,6 +6020,9 @@ void perf_output_sample(struct perf_output_handle *handle,
}
}
if (sample_type & PERF_SAMPLE_PHYS_ADDR)
perf_output_put(handle, data->phys_addr);
if (!event->attr.watermark) {
int wakeup_events = event->attr.wakeup_events;
......@@ -6018,6 +6038,38 @@ void perf_output_sample(struct perf_output_handle *handle,
}
}
static u64 perf_virt_to_phys(u64 virt)
{
u64 phys_addr = 0;
struct page *p = NULL;
if (!virt)
return 0;
if (virt >= TASK_SIZE) {
/* If it's vmalloc()d memory, leave phys_addr as 0 */
if (virt_addr_valid((void *)(uintptr_t)virt) &&
!(virt >= VMALLOC_START && virt < VMALLOC_END))
phys_addr = (u64)virt_to_phys((void *)(uintptr_t)virt);
} else {
/*
* Walking the page tables for a user address.
* Interrupts are disabled, so it prevents any tear down
* of the page tables.
* Try IRQ-safe __get_user_pages_fast first.
* If failed, leave phys_addr as 0.
*/
if ((current->mm != NULL) &&
(__get_user_pages_fast(virt, 1, 0, &p) == 1))
phys_addr = page_to_phys(p) + virt % PAGE_SIZE;
if (p)
put_page(p);
}
return phys_addr;
}
void perf_prepare_sample(struct perf_event_header *header,
struct perf_sample_data *data,
struct perf_event *event,
......@@ -6136,6 +6188,9 @@ void perf_prepare_sample(struct perf_event_header *header,
header->size += size;
}
if (sample_type & PERF_SAMPLE_PHYS_ADDR)
data->phys_addr = perf_virt_to_phys(data->addr);
}
static void __always_inline
......@@ -7287,6 +7342,11 @@ static void perf_log_throttle(struct perf_event *event, int enable)
perf_output_end(&handle);
}
void perf_event_itrace_started(struct perf_event *event)
{
event->attach_state |= PERF_ATTACH_ITRACE;
}
static void perf_log_itrace_start(struct perf_event *event)
{
struct perf_output_handle handle;
......@@ -7302,7 +7362,7 @@ static void perf_log_itrace_start(struct perf_event *event)
event = event->parent;
if (!(event->pmu->capabilities & PERF_PMU_CAP_ITRACE) ||
event->hw.itrace_started)
event->attach_state & PERF_ATTACH_ITRACE)
return;
rec.header.type = PERF_RECORD_ITRACE_START;
......@@ -9890,6 +9950,11 @@ SYSCALL_DEFINE5(perf_event_open,
return -EINVAL;
}
/* Only privileged users can get physical addresses */
if ((attr.sample_type & PERF_SAMPLE_PHYS_ADDR) &&
perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
return -EACCES;
if (!attr.sample_max_stack)
attr.sample_max_stack = sysctl_perf_event_max_stack;
......
......@@ -38,9 +38,9 @@ struct ring_buffer {
struct user_struct *mmap_user;
/* AUX area */
local_t aux_head;
long aux_head;
local_t aux_nest;
local_t aux_wakeup;
long aux_wakeup; /* last aux_watermark boundary crossed by aux_head */
unsigned long aux_pgoff;
int aux_nr_pages;
int aux_overwrite;
......@@ -208,7 +208,7 @@ static inline int get_recursion_context(int *recursion)
{
int rctx;
if (in_nmi())
if (unlikely(in_nmi()))
rctx = 3;
else if (in_irq())
rctx = 2;
......
......@@ -367,7 +367,7 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
if (WARN_ON_ONCE(local_xchg(&rb->aux_nest, 1)))
goto err_put;
aux_head = local_read(&rb->aux_head);
aux_head = rb->aux_head;
handle->rb = rb;
handle->event = event;
......@@ -382,7 +382,7 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
*/
if (!rb->aux_overwrite) {
aux_tail = ACCESS_ONCE(rb->user_page->aux_tail);
handle->wakeup = local_read(&rb->aux_wakeup) + rb->aux_watermark;
handle->wakeup = rb->aux_wakeup + rb->aux_watermark;
if (aux_head - aux_tail < perf_aux_size(rb))
handle->size = CIRC_SPACE(aux_head, aux_tail, perf_aux_size(rb));
......@@ -433,12 +433,12 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size)
handle->aux_flags |= PERF_AUX_FLAG_OVERWRITE;
aux_head = handle->head;
local_set(&rb->aux_head, aux_head);
rb->aux_head = aux_head;
} else {
handle->aux_flags &= ~PERF_AUX_FLAG_OVERWRITE;
aux_head = local_read(&rb->aux_head);
local_add(size, &rb->aux_head);
aux_head = rb->aux_head;
rb->aux_head += size;
}
if (size || handle->aux_flags) {
......@@ -450,11 +450,10 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size)
handle->aux_flags);
}
aux_head = rb->user_page->aux_head = local_read(&rb->aux_head);
if (aux_head - local_read(&rb->aux_wakeup) >= rb->aux_watermark) {
rb->user_page->aux_head = rb->aux_head;
if (rb->aux_head - rb->aux_wakeup >= rb->aux_watermark) {
wakeup = true;
local_add(rb->aux_watermark, &rb->aux_wakeup);
rb->aux_wakeup = rounddown(rb->aux_head, rb->aux_watermark);
}
if (wakeup) {
......@@ -478,22 +477,20 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size)
int perf_aux_output_skip(struct perf_output_handle *handle, unsigned long size)
{
struct ring_buffer *rb = handle->rb;
unsigned long aux_head;
if (size > handle->size)
return -ENOSPC;
local_add(size, &rb->aux_head);
rb->aux_head += size;
aux_head = rb->user_page->aux_head = local_read(&rb->aux_head);
if (aux_head - local_read(&rb->aux_wakeup) >= rb->aux_watermark) {
rb->user_page->aux_head = rb->aux_head;
if (rb->aux_head - rb->aux_wakeup >= rb->aux_watermark) {
perf_output_wakeup(handle);
local_add(rb->aux_watermark, &rb->aux_wakeup);
handle->wakeup = local_read(&rb->aux_wakeup) +
rb->aux_watermark;
rb->aux_wakeup = rounddown(rb->aux_head, rb->aux_watermark);
handle->wakeup = rb->aux_wakeup + rb->aux_watermark;
}
handle->head = aux_head;
handle->head = rb->aux_head;
handle->size -= size;
return 0;
......
......@@ -203,6 +203,14 @@ struct kvm_arch_memory_slot {
#define KVM_DEV_ARM_VGIC_LINE_LEVEL_INTID_MASK 0x3ff
#define VGIC_LEVEL_INFO_LINE_LEVEL 0
/* Device Control API on vcpu fd */
#define KVM_ARM_VCPU_PMU_V3_CTRL 0
#define KVM_ARM_VCPU_PMU_V3_IRQ 0
#define KVM_ARM_VCPU_PMU_V3_INIT 1
#define KVM_ARM_VCPU_TIMER_CTRL 1
#define KVM_ARM_VCPU_TIMER_IRQ_VTIMER 0
#define KVM_ARM_VCPU_TIMER_IRQ_PTIMER 1
#define KVM_DEV_ARM_VGIC_CTRL_INIT 0
#define KVM_DEV_ARM_ITS_SAVE_TABLES 1
#define KVM_DEV_ARM_ITS_RESTORE_TABLES 2
......
......@@ -232,6 +232,9 @@ struct kvm_arch_memory_slot {
#define KVM_ARM_VCPU_PMU_V3_CTRL 0
#define KVM_ARM_VCPU_PMU_V3_IRQ 0
#define KVM_ARM_VCPU_PMU_V3_INIT 1
#define KVM_ARM_VCPU_TIMER_CTRL 1
#define KVM_ARM_VCPU_TIMER_IRQ_VTIMER 0
#define KVM_ARM_VCPU_TIMER_IRQ_PTIMER 1
/* KVM_IRQ_LINE irq field index values */
#define KVM_ARM_IRQ_TYPE_SHIFT 24
......
......@@ -60,6 +60,12 @@ struct kvm_regs {
#define KVM_SREGS_E_FSL_PIDn (1 << 0) /* PID1/PID2 */
/* flags for kvm_run.flags */
#define KVM_RUN_PPC_NMI_DISP_MASK (3 << 0)
#define KVM_RUN_PPC_NMI_DISP_FULLY_RECOV (1 << 0)
#define KVM_RUN_PPC_NMI_DISP_LIMITED_RECOV (2 << 0)
#define KVM_RUN_PPC_NMI_DISP_NOT_RECOV (3 << 0)
/*
* Feature bits indicate which sections of the sregs struct are valid,
* both in KVM_GET_SREGS and KVM_SET_SREGS. On KVM_SET_SREGS, registers
......
......@@ -28,6 +28,7 @@
#define KVM_DEV_FLIC_CLEAR_IO_IRQ 8
#define KVM_DEV_FLIC_AISM 9
#define KVM_DEV_FLIC_AIRQ_INJECT 10
#define KVM_DEV_FLIC_AISM_ALL 11
/*
* We can have up to 4*64k pending subchannels + 8 adapter interrupts,
* as well as up to ASYNC_PF_PER_VCPU*KVM_MAX_VCPUS pfault done interrupts.
......@@ -53,6 +54,11 @@ struct kvm_s390_ais_req {
__u16 mode;
};
struct kvm_s390_ais_all {
__u8 simm;
__u8 nimm;
};
#define KVM_S390_IO_ADAPTER_MASK 1
#define KVM_S390_IO_ADAPTER_MAP 2
#define KVM_S390_IO_ADAPTER_UNMAP 3
......@@ -70,6 +76,7 @@ struct kvm_s390_io_adapter_req {
#define KVM_S390_VM_TOD 1
#define KVM_S390_VM_CRYPTO 2
#define KVM_S390_VM_CPU_MODEL 3
#define KVM_S390_VM_MIGRATION 4
/* kvm attributes for mem_ctrl */
#define KVM_S390_VM_MEM_ENABLE_CMMA 0
......@@ -151,6 +158,11 @@ struct kvm_s390_vm_cpu_subfunc {
#define KVM_S390_VM_CRYPTO_DISABLE_AES_KW 2
#define KVM_S390_VM_CRYPTO_DISABLE_DEA_KW 3
/* kvm attributes for migration mode */
#define KVM_S390_VM_MIGRATION_STOP 0
#define KVM_S390_VM_MIGRATION_START 1
#define KVM_S390_VM_MIGRATION_STATUS 2
/* for KVM_GET_REGS and KVM_SET_REGS */
struct kvm_regs {
/* general purpose regs for s390 */
......
......@@ -177,7 +177,7 @@
#define X86_FEATURE_PERFCTR_NB ( 6*32+24) /* NB performance counter extensions */
#define X86_FEATURE_BPEXT (6*32+26) /* data breakpoint extension */
#define X86_FEATURE_PTSC ( 6*32+27) /* performance time-stamp counter */
#define X86_FEATURE_PERFCTR_L2 ( 6*32+28) /* L2 performance counter extensions */
#define X86_FEATURE_PERFCTR_LLC ( 6*32+28) /* Last Level Cache performance counter extensions */
#define X86_FEATURE_MWAITX ( 6*32+29) /* MWAIT extension (MONITORX/MWAITX) */
/*
......@@ -286,6 +286,7 @@
#define X86_FEATURE_PAUSEFILTER (15*32+10) /* filtered pause intercept */
#define X86_FEATURE_PFTHRESHOLD (15*32+12) /* pause filter threshold */
#define X86_FEATURE_AVIC (15*32+13) /* Virtual Interrupt Controller */
#define X86_FEATURE_V_VMSAVE_VMLOAD (15*32+15) /* Virtual VMSAVE VMLOAD */
/* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 16 */
#define X86_FEATURE_AVX512VBMI (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/
......
......@@ -10,3 +10,6 @@
#ifndef __NR_getcpu
# define __NR_getcpu 318
#endif
#ifndef __NR_setns
# define __NR_setns 346
#endif
......@@ -10,3 +10,6 @@
#ifndef __NR_getcpu
# define __NR_getcpu 309
#endif
#ifndef __NR_setns
#define __NR_setns 308
#endif
#ifndef _UAPI_ASM_X86_UNISTD_H
#define _UAPI_ASM_X86_UNISTD_H
/* x32 syscall flag bit */
#define __X32_SYSCALL_BIT 0x40000000
#ifndef __KERNEL__
# ifdef __i386__
# include <asm/unistd_32.h>
# elif defined(__ILP32__)
# include <asm/unistd_x32.h>
# else
# include <asm/unistd_64.h>
# endif
#endif
#endif /* _UAPI_ASM_X86_UNISTD_H */
......@@ -64,7 +64,8 @@ FEATURE_TESTS_BASIC := \
get_cpuid \
bpf \
sched_getcpu \
sdt
sdt \
setns
# FEATURE_TESTS_BASIC + FEATURE_TESTS_EXTRA is the complete list
# of all feature tests
......
......@@ -49,7 +49,8 @@ FILES= \
test-sdt.bin \
test-cxx.bin \
test-jvmti.bin \
test-sched_getcpu.bin
test-sched_getcpu.bin \
test-setns.bin
FILES := $(addprefix $(OUTPUT),$(FILES))
......@@ -95,6 +96,9 @@ $(OUTPUT)test-glibc.bin:
$(OUTPUT)test-sched_getcpu.bin:
$(BUILD)
$(OUTPUT)test-setns.bin:
$(BUILD)
DWARFLIBS := -ldw
ifeq ($(findstring -static,${LDFLAGS}),-static)
DWARFLIBS += -lelf -lebl -lz -llzma -lbz2
......
......@@ -153,6 +153,10 @@
# include "test-sdt.c"
#undef main
#define main main_test_setns
# include "test-setns.c"
#undef main
int main(int argc, char *argv[])
{
main_test_libpython();
......@@ -188,6 +192,7 @@ int main(int argc, char *argv[])
main_test_libcrypto();
main_test_sched_getcpu();
main_test_sdt();
main_test_setns();
return 0;
}
#define _GNU_SOURCE
#include <sched.h>
int main(void)
{
return setns(0, 0);
}
......@@ -8,7 +8,7 @@ ex:
include $(srctree)/tools/build/Makefile.include
ex: ex-in.o libex-in.o
gcc -o $@ $^
$(CC) -o $@ $^
ex.%: fixdep FORCE
make -f $(srctree)/tools/build/Makefile.build dir=. $@
......
#ifndef _TOOLS_LINUX_STRING_H_
#define _TOOLS_LINUX_STRING_H_
#include <linux/types.h> /* for size_t */
#include <string.h>
void *memdup(const void *src, size_t len);
......@@ -18,6 +18,14 @@ extern size_t strlcpy(char *dest, const char *src, size_t size);
char *str_error_r(int errnum, char *buf, size_t buflen);
int prefixcmp(const char *str, const char *prefix);
/**
* strstarts - does @str start with @prefix?
* @str: string to examine
* @prefix: prefix to look for.
*/
static inline bool strstarts(const char *str, const char *prefix)
{
return strncmp(str, prefix, strlen(prefix)) == 0;
}
#endif /* _LINUX_STRING_H_ */
#ifndef _ASM_GENERIC_FCNTL_H
#define _ASM_GENERIC_FCNTL_H
#include <linux/types.h>
/*
* FMODE_EXEC is 0x20
* FMODE_NONOTIFY is 0x4000000
* These cannot be used by userspace O_* until internal and external open
* flags are split.
* -Eric Paris
*/
/*
* When introducing new O_* bits, please check its uniqueness in fcntl_init().
*/
#define O_ACCMODE 00000003
#define O_RDONLY 00000000
#define O_WRONLY 00000001
#define O_RDWR 00000002
#ifndef O_CREAT
#define O_CREAT 00000100 /* not fcntl */
#endif
#ifndef O_EXCL
#define O_EXCL 00000200 /* not fcntl */
#endif
#ifndef O_NOCTTY
#define O_NOCTTY 00000400 /* not fcntl */
#endif
#ifndef O_TRUNC
#define O_TRUNC 00001000 /* not fcntl */
#endif
#ifndef O_APPEND
#define O_APPEND 00002000
#endif
#ifndef O_NONBLOCK
#define O_NONBLOCK 00004000
#endif
#ifndef O_DSYNC
#define O_DSYNC 00010000 /* used to be O_SYNC, see below */
#endif
#ifndef FASYNC
#define FASYNC 00020000 /* fcntl, for BSD compatibility */
#endif
#ifndef O_DIRECT
#define O_DIRECT 00040000 /* direct disk access hint */
#endif
#ifndef O_LARGEFILE
#define O_LARGEFILE 00100000
#endif
#ifndef O_DIRECTORY
#define O_DIRECTORY 00200000 /* must be a directory */
#endif
#ifndef O_NOFOLLOW
#define O_NOFOLLOW 00400000 /* don't follow links */
#endif
#ifndef O_NOATIME
#define O_NOATIME 01000000
#endif
#ifndef O_CLOEXEC
#define O_CLOEXEC 02000000 /* set close_on_exec */
#endif
/*
* Before Linux 2.6.33 only O_DSYNC semantics were implemented, but using
* the O_SYNC flag. We continue to use the existing numerical value
* for O_DSYNC semantics now, but using the correct symbolic name for it.
* This new value is used to request true Posix O_SYNC semantics. It is
* defined in this strange way to make sure applications compiled against
* new headers get at least O_DSYNC semantics on older kernels.
*
* This has the nice side-effect that we can simply test for O_DSYNC
* wherever we do not care if O_DSYNC or O_SYNC is used.
*
* Note: __O_SYNC must never be used directly.
*/
#ifndef O_SYNC
#define __O_SYNC 04000000
#define O_SYNC (__O_SYNC|O_DSYNC)
#endif
#ifndef O_PATH
#define O_PATH 010000000
#endif
#ifndef __O_TMPFILE
#define __O_TMPFILE 020000000
#endif
/* a horrid kludge trying to make sure that this will fail on old kernels */
#define O_TMPFILE (__O_TMPFILE | O_DIRECTORY)
#define O_TMPFILE_MASK (__O_TMPFILE | O_DIRECTORY | O_CREAT)
#ifndef O_NDELAY
#define O_NDELAY O_NONBLOCK
#endif
#define F_DUPFD 0 /* dup */
#define F_GETFD 1 /* get close_on_exec */
#define F_SETFD 2 /* set/clear close_on_exec */
#define F_GETFL 3 /* get file->f_flags */
#define F_SETFL 4 /* set file->f_flags */
#ifndef F_GETLK
#define F_GETLK 5
#define F_SETLK 6
#define F_SETLKW 7
#endif
#ifndef F_SETOWN
#define F_SETOWN 8 /* for sockets. */
#define F_GETOWN 9 /* for sockets. */
#endif
#ifndef F_SETSIG
#define F_SETSIG 10 /* for sockets. */
#define F_GETSIG 11 /* for sockets. */
#endif
#ifndef CONFIG_64BIT
#ifndef F_GETLK64
#define F_GETLK64 12 /* using 'struct flock64' */
#define F_SETLK64 13
#define F_SETLKW64 14
#endif
#endif
#ifndef F_SETOWN_EX
#define F_SETOWN_EX 15
#define F_GETOWN_EX 16
#endif
#ifndef F_GETOWNER_UIDS
#define F_GETOWNER_UIDS 17
#endif
/*
* Open File Description Locks
*
* Usually record locks held by a process are released on *any* close and are
* not inherited across a fork().
*
* These cmd values will set locks that conflict with process-associated
* record locks, but are "owned" by the open file description, not the
* process. This means that they are inherited across fork() like BSD (flock)
* locks, and they are only released automatically when the last reference to
* the open file against which they were acquired is put.
*/
#define F_OFD_GETLK 36
#define F_OFD_SETLK 37
#define F_OFD_SETLKW 38
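Illustration only (not part of this diff): an OFD lock uses the same struct flock as the process-associated commands, but l_pid must be 0. A sketch, assuming fd is an already-open descriptor and a glibc that exposes the F_OFD_* commands:

#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>

/* Take a whole-file write lock owned by the open file description. */
static int lock_whole_file_ofd(int fd)
{
	struct flock fl;

	memset(&fl, 0, sizeof(fl));
	fl.l_type   = F_WRLCK;
	fl.l_whence = SEEK_SET;
	fl.l_start  = 0;
	fl.l_len    = 0;	/* 0 means "to end of file" */
	fl.l_pid    = 0;	/* must be 0 for the F_OFD_* commands */

	return fcntl(fd, F_OFD_SETLK, &fl);	/* F_OFD_SETLKW would block instead */
}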
#define F_OWNER_TID 0
#define F_OWNER_PID 1
#define F_OWNER_PGRP 2
struct f_owner_ex {
int type;
__kernel_pid_t pid;
};
/* for F_[GET|SET]FL */
#define FD_CLOEXEC 1 /* actually anything with low bit set goes */
/* for posix fcntl() and lockf() */
#ifndef F_RDLCK
#define F_RDLCK 0
#define F_WRLCK 1
#define F_UNLCK 2
#endif
/* for old implementation of bsd flock () */
#ifndef F_EXLCK
#define F_EXLCK 4 /* or 3 */
#define F_SHLCK 8 /* or 4 */
#endif
/* operations for bsd flock(), also used by the kernel implementation */
#define LOCK_SH 1 /* shared lock */
#define LOCK_EX 2 /* exclusive lock */
#define LOCK_NB 4 /* or'd with one of the above to prevent
blocking */
#define LOCK_UN 8 /* remove lock */
#define LOCK_MAND 32 /* This is a mandatory flock ... */
#define LOCK_READ 64 /* which allows concurrent read operations */
#define LOCK_WRITE 128 /* which allows concurrent write operations */
#define LOCK_RW 192 /* which allows concurrent read & write ops */
#define F_LINUX_SPECIFIC_BASE 1024
#ifndef HAVE_ARCH_STRUCT_FLOCK
#ifndef __ARCH_FLOCK_PAD
#define __ARCH_FLOCK_PAD
#endif
struct flock {
short l_type;
short l_whence;
__kernel_off_t l_start;
__kernel_off_t l_len;
__kernel_pid_t l_pid;
__ARCH_FLOCK_PAD
};
#endif
#ifndef HAVE_ARCH_STRUCT_FLOCK64
#ifndef __ARCH_FLOCK64_PAD
#define __ARCH_FLOCK64_PAD
#endif
struct flock64 {
short l_type;
short l_whence;
__kernel_loff_t l_start;
__kernel_loff_t l_len;
__kernel_pid_t l_pid;
__ARCH_FLOCK64_PAD
};
#endif
#endif /* _ASM_GENERIC_FCNTL_H */
#ifndef __ASM_GENERIC_IOCTLS_H
#define __ASM_GENERIC_IOCTLS_H
#include <linux/ioctl.h>
/*
* These are the most common definitions for tty ioctl numbers.
* Most of them do not use the recommended _IOC(), but there is
* probably some source code out there hardcoding the number,
* so we might as well use them for all new platforms.
*
* The architectures that use different values here typically
* try to be compatible with some Unix variants for the same
* architecture.
*/
/* 0x54 is just a magic number to make these relatively unique ('T') */
#define TCGETS 0x5401
#define TCSETS 0x5402
#define TCSETSW 0x5403
#define TCSETSF 0x5404
#define TCGETA 0x5405
#define TCSETA 0x5406
#define TCSETAW 0x5407
#define TCSETAF 0x5408
#define TCSBRK 0x5409
#define TCXONC 0x540A
#define TCFLSH 0x540B
#define TIOCEXCL 0x540C
#define TIOCNXCL 0x540D
#define TIOCSCTTY 0x540E
#define TIOCGPGRP 0x540F
#define TIOCSPGRP 0x5410
#define TIOCOUTQ 0x5411
#define TIOCSTI 0x5412
#define TIOCGWINSZ 0x5413
#define TIOCSWINSZ 0x5414
#define TIOCMGET 0x5415
#define TIOCMBIS 0x5416
#define TIOCMBIC 0x5417
#define TIOCMSET 0x5418
#define TIOCGSOFTCAR 0x5419
#define TIOCSSOFTCAR 0x541A
#define FIONREAD 0x541B
#define TIOCINQ FIONREAD
#define TIOCLINUX 0x541C
#define TIOCCONS 0x541D
#define TIOCGSERIAL 0x541E
#define TIOCSSERIAL 0x541F
#define TIOCPKT 0x5420
#define FIONBIO 0x5421
#define TIOCNOTTY 0x5422
#define TIOCSETD 0x5423
#define TIOCGETD 0x5424
#define TCSBRKP 0x5425 /* Needed for POSIX tcsendbreak() */
#define TIOCSBRK 0x5427 /* BSD compatibility */
#define TIOCCBRK 0x5428 /* BSD compatibility */
#define TIOCGSID 0x5429 /* Return the session ID of FD */
#define TCGETS2 _IOR('T', 0x2A, struct termios2)
#define TCSETS2 _IOW('T', 0x2B, struct termios2)
#define TCSETSW2 _IOW('T', 0x2C, struct termios2)
#define TCSETSF2 _IOW('T', 0x2D, struct termios2)
#define TIOCGRS485 0x542E
#ifndef TIOCSRS485
#define TIOCSRS485 0x542F
#endif
#define TIOCGPTN _IOR('T', 0x30, unsigned int) /* Get Pty Number (of pty-mux device) */
#define TIOCSPTLCK _IOW('T', 0x31, int) /* Lock/unlock Pty */
#define TIOCGDEV _IOR('T', 0x32, unsigned int) /* Get primary device node of /dev/console */
#define TCGETX 0x5432 /* SYS5 TCGETX compatibility */
#define TCSETX 0x5433
#define TCSETXF 0x5434
#define TCSETXW 0x5435
#define TIOCSIG _IOW('T', 0x36, int) /* pty: generate signal */
#define TIOCVHANGUP 0x5437
#define TIOCGPKT _IOR('T', 0x38, int) /* Get packet mode state */
#define TIOCGPTLCK _IOR('T', 0x39, int) /* Get Pty lock state */
#define TIOCGEXCL _IOR('T', 0x40, int) /* Get exclusive mode state */
#define TIOCGPTPEER _IO('T', 0x41) /* Safely open the slave */
#define FIONCLEX 0x5450
#define FIOCLEX 0x5451
#define FIOASYNC 0x5452
#define TIOCSERCONFIG 0x5453
#define TIOCSERGWILD 0x5454
#define TIOCSERSWILD 0x5455
#define TIOCGLCKTRMIOS 0x5456
#define TIOCSLCKTRMIOS 0x5457
#define TIOCSERGSTRUCT 0x5458 /* For debugging only */
#define TIOCSERGETLSR 0x5459 /* Get line status register */
#define TIOCSERGETMULTI 0x545A /* Get multiport config */
#define TIOCSERSETMULTI 0x545B /* Set multiport config */
#define TIOCMIWAIT 0x545C /* wait for a change on serial input line(s) */
#define TIOCGICOUNT 0x545D /* read serial port inline interrupt counts */
/*
* Some arches already define FIOQSIZE due to a historical
* conflict with a Hayes modem-specific ioctl value.
*/
#ifndef FIOQSIZE
# define FIOQSIZE 0x5460
#endif
/* Used for packet mode */
#define TIOCPKT_DATA 0
#define TIOCPKT_FLUSHREAD 1
#define TIOCPKT_FLUSHWRITE 2
#define TIOCPKT_STOP 4
#define TIOCPKT_START 8
#define TIOCPKT_NOSTOP 16
#define TIOCPKT_DOSTOP 32
#define TIOCPKT_IOCTL 64
#define TIOCSER_TEMT 0x01 /* Transmitter physically empty */
#endif /* __ASM_GENERIC_IOCTLS_H */
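For context only (not part of this diff), these are the raw values that the new 'perf trace' ioctl 'cmd' beautifier decodes; a typical caller of one of them looks like this, with STDOUT_FILENO as an example tty:

#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
	struct winsize ws;

	/* TIOCGWINSZ fills in the dimensions of the terminal behind the fd */
	if (ioctl(STDOUT_FILENO, TIOCGWINSZ, &ws) == 0)
		printf("%d rows x %d cols\n", ws.ws_row, ws.ws_col);
	return 0;
}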
......@@ -643,7 +643,7 @@ enum bpf_func_id {
/* Mode for BPF_FUNC_skb_adjust_room helper. */
enum bpf_adj_room_mode {
BPF_ADJ_ROOM_NET_OPTS,
BPF_ADJ_ROOM_NET,
};
/* user accessible mirror of in-kernel sk_buff.
......@@ -750,6 +750,8 @@ struct bpf_map_info {
/* User bpf_sock_ops struct to access socket values and specify request ops
* and their replies.
* Some of these fields are in network (big-endian) byte order and may need
* to be converted before use (bpf_ntohl() defined in samples/bpf/bpf_endian.h).
* New fields can only be added at the end of this structure
*/
struct bpf_sock_ops {
......@@ -759,12 +761,12 @@ struct bpf_sock_ops {
__u32 replylong[4];
};
__u32 family;
__u32 remote_ip4;
__u32 local_ip4;
__u32 remote_ip6[4];
__u32 local_ip6[4];
__u32 remote_port;
__u32 local_port;
__u32 remote_ip4; /* Stored in network byte order */
__u32 local_ip4; /* Stored in network byte order */
__u32 remote_ip6[4]; /* Stored in network byte order */
__u32 local_ip6[4]; /* Stored in network byte order */
__u32 remote_port; /* Stored in network byte order */
__u32 local_port; /* stored in host byte order */
};
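A minimal sketch (not part of this diff) of a sockops program consuming these fields, converting from network byte order as the comment above describes; the helper header names follow the samples/bpf convention and port 80 is just an example:

#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"	/* SEC() macro, as used in samples/bpf */
#include "bpf_endian.h"		/* bpf_ntohl(), as mentioned above */

SEC("sockops")
int watch_port_80(struct bpf_sock_ops *skops)
{
	/* remote_port is in network byte order, local_port in host byte order */
	__u32 rport = bpf_ntohl(skops->remote_port);
	__u32 lport = skops->local_port;

	if (rport == 80 || lport == 80)
		skops->reply = 1;	/* illustrative reply value only */
	return 1;
}

char _license[] SEC("license") = "GPL";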
/* List of known BPF sock_ops operators.
......
......@@ -42,6 +42,27 @@
#define F_SEAL_WRITE 0x0008 /* prevent writes */
/* (1U << 31) is reserved for signed error codes */
/*
* Set/Get write life time hints. {GET,SET}_RW_HINT operate on the
* underlying inode, while {GET,SET}_FILE_RW_HINT operate only on
* the specific file.
*/
#define F_GET_RW_HINT (F_LINUX_SPECIFIC_BASE + 11)
#define F_SET_RW_HINT (F_LINUX_SPECIFIC_BASE + 12)
#define F_GET_FILE_RW_HINT (F_LINUX_SPECIFIC_BASE + 13)
#define F_SET_FILE_RW_HINT (F_LINUX_SPECIFIC_BASE + 14)
/*
* Valid hint values for F_{GET,SET}_RW_HINT. 0 is "not set", or can be
* used to clear any hints previously set.
*/
#define RWF_WRITE_LIFE_NOT_SET 0
#define RWH_WRITE_LIFE_NONE 1
#define RWH_WRITE_LIFE_SHORT 2
#define RWH_WRITE_LIFE_MEDIUM 3
#define RWH_WRITE_LIFE_LONG 4
#define RWH_WRITE_LIFE_EXTREME 5
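Illustration only (not part of this diff): the hint is passed by pointer as a 64-bit value. A sketch assuming a libc that exposes F_SET_RW_HINT (otherwise the constants can be taken from the uapi header, as above):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>

/* Mark data written through this inode as short-lived. */
static int mark_short_lived(int fd)
{
	uint64_t hint = RWH_WRITE_LIFE_SHORT;

	/* F_SET_RW_HINT applies to the inode; F_SET_FILE_RW_HINT would
	 * apply to this open file description only. */
	return fcntl(fd, F_SET_RW_HINT, &hint);
}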
/*
* Types of directory notifications that may be requested.
*/
......
......@@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT = 14, /* no flags */
PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT = 15, /* no cycles */
PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT = 16, /* save branch type */
PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
};
......@@ -198,9 +200,30 @@ enum perf_branch_sample_type {
PERF_SAMPLE_BRANCH_NO_FLAGS = 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
PERF_SAMPLE_BRANCH_NO_CYCLES = 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
PERF_SAMPLE_BRANCH_TYPE_SAVE =
1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
};
/*
* Common flow change classification
*/
enum {
PERF_BR_UNKNOWN = 0, /* unknown */
PERF_BR_COND = 1, /* conditional */
PERF_BR_UNCOND = 2, /* unconditional */
PERF_BR_IND = 3, /* indirect */
PERF_BR_CALL = 4, /* function call */
PERF_BR_IND_CALL = 5, /* indirect function call */
PERF_BR_RET = 6, /* function return */
PERF_BR_SYSCALL = 7, /* syscall */
PERF_BR_SYSRET = 8, /* syscall return */
PERF_BR_COND_CALL = 9, /* conditional function call */
PERF_BR_COND_RET = 10, /* conditional function return */
PERF_BR_MAX,
};
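A sketch (not part of this diff) of how a profiler asks for branch records with the new type classification; the cycles event and sample period are arbitrary examples and error handling is omitted:

#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static int open_branch_type_counter(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.sample_period = 100000;
	attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_BRANCH_STACK;
	/* record any branch and ask the PMU layer to classify it (PERF_BR_*) */
	attr.branch_sample_type = PERF_SAMPLE_BRANCH_ANY |
				  PERF_SAMPLE_BRANCH_TYPE_SAVE;

	/* current task, any CPU */
	return syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}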
#define PERF_SAMPLE_BRANCH_PLM_ALL \
(PERF_SAMPLE_BRANCH_USER|\
PERF_SAMPLE_BRANCH_KERNEL|\
......@@ -931,14 +954,20 @@ union perf_mem_data_src {
mem_snoop:5, /* snoop mode */
mem_lock:2, /* lock instr */
mem_dtlb:7, /* tlb access */
mem_rsvd:31;
mem_lvl_num:4, /* memory hierarchy level number */
mem_remote:1, /* remote */
mem_snoopx:2, /* snoop mode, ext */
mem_rsvd:24;
};
};
#elif defined(__BIG_ENDIAN_BITFIELD)
union perf_mem_data_src {
__u64 val;
struct {
__u64 mem_rsvd:31,
__u64 mem_rsvd:24,
mem_snoopx:2, /* snoop mode, ext */
mem_remote:1, /* remote */
mem_lvl_num:4, /* memory hierarchy level number */
mem_dtlb:7, /* tlb access */
mem_lock:2, /* lock instr */
mem_snoop:5, /* snoop mode */
......@@ -975,6 +1004,22 @@ union perf_mem_data_src {
#define PERF_MEM_LVL_UNC 0x2000 /* Uncached memory */
#define PERF_MEM_LVL_SHIFT 5
#define PERF_MEM_REMOTE_REMOTE 0x01 /* Remote */
#define PERF_MEM_REMOTE_SHIFT 37
#define PERF_MEM_LVLNUM_L1 0x01 /* L1 */
#define PERF_MEM_LVLNUM_L2 0x02 /* L2 */
#define PERF_MEM_LVLNUM_L3 0x03 /* L3 */
#define PERF_MEM_LVLNUM_L4 0x04 /* L4 */
/* 5-0xa available */
#define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
#define PERF_MEM_LVLNUM_LFB 0x0c /* LFB */
#define PERF_MEM_LVLNUM_RAM 0x0d /* RAM */
#define PERF_MEM_LVLNUM_PMEM 0x0e /* PMEM */
#define PERF_MEM_LVLNUM_NA 0x0f /* N/A */
#define PERF_MEM_LVLNUM_SHIFT 33
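Illustration only (not part of this diff): with the updated header, the new fields are unpacked from a PERF_SAMPLE_DATA_SRC value through the same union; raw here stands for the sampled data_src word:

#include <linux/perf_event.h>
#include <stdio.h>

static void print_mem_level(__u64 raw)
{
	union perf_mem_data_src ds = { .val = raw };

	if (ds.mem_remote)
		printf("remote ");

	switch (ds.mem_lvl_num) {
	case PERF_MEM_LVLNUM_L1:  printf("L1\n");  break;
	case PERF_MEM_LVLNUM_L3:  printf("L3\n");  break;
	case PERF_MEM_LVLNUM_RAM: printf("RAM\n"); break;
	default:
		printf("level %llu\n", (unsigned long long)ds.mem_lvl_num);
	}
}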
/* snoop mode */
#define PERF_MEM_SNOOP_NA 0x01 /* not available */
#define PERF_MEM_SNOOP_NONE 0x02 /* no snoop */
......@@ -983,6 +1028,10 @@ union perf_mem_data_src {
#define PERF_MEM_SNOOP_HITM 0x10 /* snoop hit modified */
#define PERF_MEM_SNOOP_SHIFT 19
#define PERF_MEM_SNOOPX_FWD 0x01 /* forward */
/* 1 free */
#define PERF_MEM_SNOOPX_SHIFT 37
/* locked instruction */
#define PERF_MEM_LOCK_NA 0x01 /* not available */
#define PERF_MEM_LOCK_LOCKED 0x02 /* locked transaction */
......@@ -1015,6 +1064,7 @@ union perf_mem_data_src {
* in_tx: running in a hardware transaction
* abort: aborting a hardware transaction
* cycles: cycles from last branch (or 0 if not supported)
* type: branch type
*/
struct perf_branch_entry {
__u64 from;
......@@ -1024,7 +1074,8 @@ struct perf_branch_entry {
in_tx:1, /* in transaction */
abort:1, /* transaction abort */
cycles:16, /* cycle count to last branch */
reserved:44;
type:4, /* branch type */
reserved:40;
};
#endif /* _UAPI_LINUX_PERF_EVENT_H */
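A small sketch (not part of this diff) of mapping the new 4-bit type field of a sampled branch entry back to a name, mirroring the PERF_BR_* values above; it assumes the updated uapi header:

#include <linux/perf_event.h>

static const char *branch_type_name(__u64 type)
{
	switch (type) {
	case PERF_BR_COND:      return "COND";
	case PERF_BR_UNCOND:    return "UNCOND";
	case PERF_BR_IND:       return "IND";
	case PERF_BR_CALL:      return "CALL";
	case PERF_BR_IND_CALL:  return "IND_CALL";
	case PERF_BR_RET:       return "RET";
	case PERF_BR_SYSCALL:   return "SYSCALL";
	case PERF_BR_SYSRET:    return "SYSRET";
	case PERF_BR_COND_CALL: return "COND_CALL";
	case PERF_BR_COND_RET:  return "COND_RET";
	default:                return "UNKNOWN";
	}
}

/* e.g. branch_type_name(entry.type) for a struct perf_branch_entry entry */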
#ifndef _UAPI_LINUX_SCHED_H
#define _UAPI_LINUX_SCHED_H
/*
* cloning flags:
*/
#define CSIGNAL 0x000000ff /* signal mask to be sent at exit */
#define CLONE_VM 0x00000100 /* set if VM shared between processes */
#define CLONE_FS 0x00000200 /* set if fs info shared between processes */
#define CLONE_FILES 0x00000400 /* set if open files shared between processes */
#define CLONE_SIGHAND 0x00000800 /* set if signal handlers and blocked signals shared */
#define CLONE_PTRACE 0x00002000 /* set if we want to let tracing continue on the child too */
#define CLONE_VFORK 0x00004000 /* set if the parent wants the child to wake it up on mm_release */
#define CLONE_PARENT 0x00008000 /* set if we want to have the same parent as the cloner */
#define CLONE_THREAD 0x00010000 /* Same thread group? */
#define CLONE_NEWNS 0x00020000 /* New mount namespace group */
#define CLONE_SYSVSEM 0x00040000 /* share system V SEM_UNDO semantics */
#define CLONE_SETTLS 0x00080000 /* create a new TLS for the child */
#define CLONE_PARENT_SETTID 0x00100000 /* set the TID in the parent */
#define CLONE_CHILD_CLEARTID 0x00200000 /* clear the TID in the child */
#define CLONE_DETACHED 0x00400000 /* Unused, ignored */
#define CLONE_UNTRACED 0x00800000 /* set if the tracing process can't force CLONE_PTRACE on this clone */
#define CLONE_CHILD_SETTID 0x01000000 /* set the TID in the child */
#define CLONE_NEWCGROUP 0x02000000 /* New cgroup namespace */
#define CLONE_NEWUTS 0x04000000 /* New utsname namespace */
#define CLONE_NEWIPC 0x08000000 /* New ipc namespace */
#define CLONE_NEWUSER 0x10000000 /* New user namespace */
#define CLONE_NEWPID 0x20000000 /* New pid namespace */
#define CLONE_NEWNET 0x40000000 /* New network namespace */
#define CLONE_IO 0x80000000 /* Clone io context */
/*
* Scheduling policies
*/
#define SCHED_NORMAL 0
#define SCHED_FIFO 1
#define SCHED_RR 2
#define SCHED_BATCH 3
/* SCHED_ISO: reserved but not implemented yet */
#define SCHED_IDLE 5
#define SCHED_DEADLINE 6
/* Can be ORed in to make sure the process is reverted back to SCHED_NORMAL on fork */
#define SCHED_RESET_ON_FORK 0x40000000
/*
* For the sched_{set,get}attr() calls
*/
#define SCHED_FLAG_RESET_ON_FORK 0x01
#define SCHED_FLAG_RECLAIM 0x02
#endif /* _UAPI_LINUX_SCHED_H */
#ifndef _LINUX_VHOST_H
#define _LINUX_VHOST_H
/* Userspace interface for in-kernel virtio accelerators. */
/* vhost is used to reduce the number of system calls involved in virtio.
*
* Existing virtio net code is used in the guest without modification.
*
* This header includes interface used by userspace hypervisor for
* device configuration.
*/
#include <linux/types.h>
#include <linux/compiler.h>
#include <linux/ioctl.h>
#include <linux/virtio_config.h>
#include <linux/virtio_ring.h>
struct vhost_vring_state {
unsigned int index;
unsigned int num;
};
struct vhost_vring_file {
unsigned int index;
int fd; /* Pass -1 to unbind from file. */
};
struct vhost_vring_addr {
unsigned int index;
/* Option flags. */
unsigned int flags;
/* Flag values: */
/* Whether log address is valid. If set enables logging. */
#define VHOST_VRING_F_LOG 0
/* Start of array of descriptors (virtually contiguous) */
__u64 desc_user_addr;
/* Used structure address. Must be 32 bit aligned */
__u64 used_user_addr;
/* Available structure address. Must be 16 bit aligned */
__u64 avail_user_addr;
/* Logging support. */
/* Log writes to used structure, at offset calculated from specified
* address. Address must be 32 bit aligned. */
__u64 log_guest_addr;
};
/* no alignment requirement */
struct vhost_iotlb_msg {
__u64 iova;
__u64 size;
__u64 uaddr;
#define VHOST_ACCESS_RO 0x1
#define VHOST_ACCESS_WO 0x2
#define VHOST_ACCESS_RW 0x3
__u8 perm;
#define VHOST_IOTLB_MISS 1
#define VHOST_IOTLB_UPDATE 2
#define VHOST_IOTLB_INVALIDATE 3
#define VHOST_IOTLB_ACCESS_FAIL 4
__u8 type;
};
#define VHOST_IOTLB_MSG 0x1
struct vhost_msg {
int type;
union {
struct vhost_iotlb_msg iotlb;
__u8 padding[64];
};
};
struct vhost_memory_region {
__u64 guest_phys_addr;
__u64 memory_size; /* bytes */
__u64 userspace_addr;
__u64 flags_padding; /* No flags are currently specified. */
};
/* All region addresses and sizes must be 4K aligned. */
#define VHOST_PAGE_SIZE 0x1000
struct vhost_memory {
__u32 nregions;
__u32 padding;
struct vhost_memory_region regions[0];
};
/* ioctls */
#define VHOST_VIRTIO 0xAF
/* Features bitmask for forward compatibility. Transport bits are used for
* vhost specific features. */
#define VHOST_GET_FEATURES _IOR(VHOST_VIRTIO, 0x00, __u64)
#define VHOST_SET_FEATURES _IOW(VHOST_VIRTIO, 0x00, __u64)
/* Set current process as the (exclusive) owner of this file descriptor. This
* must be called before any other vhost command. Further calls to
* VHOST_SET_OWNER fail until VHOST_RESET_OWNER is called. */
#define VHOST_SET_OWNER _IO(VHOST_VIRTIO, 0x01)
/* Give up ownership, and reset the device to default values.
* Allows subsequent call to VHOST_OWNER_SET to succeed. */
#define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02)
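A minimal sketch (not part of this diff) of the ownership and feature handshake these ioctls describe, assuming /dev/vhost-net exists and the caller has the required privileges; acknowledging all offered features is only for illustration:

#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vhost.h>

static int vhost_net_setup(void)
{
	uint64_t features;
	int fd = open("/dev/vhost-net", O_RDWR);

	if (fd < 0)
		return -1;

	/* must be the first vhost command issued on this fd */
	if (ioctl(fd, VHOST_SET_OWNER) < 0)
		goto err;

	/* read what the kernel offers, then acknowledge a subset */
	if (ioctl(fd, VHOST_GET_FEATURES, &features) < 0 ||
	    ioctl(fd, VHOST_SET_FEATURES, &features) < 0)
		goto err;

	return fd;
err:
	close(fd);
	return -1;
}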
/* Set up/modify memory layout */
#define VHOST_SET_MEM_TABLE _IOW(VHOST_VIRTIO, 0x03, struct vhost_memory)
/* Write logging setup. */
/* Memory writes can optionally be logged by setting bit at an offset
* (calculated from the physical address) from specified log base.
* The bit is set using an atomic 32 bit operation. */
/* Set base address for logging. */
#define VHOST_SET_LOG_BASE _IOW(VHOST_VIRTIO, 0x04, __u64)
/* Specify an eventfd file descriptor to signal on log write. */
#define VHOST_SET_LOG_FD _IOW(VHOST_VIRTIO, 0x07, int)
/* Ring setup. */
/* Set number of descriptors in ring. This parameter can not
* be modified while ring is running (bound to a device). */
#define VHOST_SET_VRING_NUM _IOW(VHOST_VIRTIO, 0x10, struct vhost_vring_state)
/* Set addresses for the ring. */
#define VHOST_SET_VRING_ADDR _IOW(VHOST_VIRTIO, 0x11, struct vhost_vring_addr)
/* Base value where queue looks for available descriptors */
#define VHOST_SET_VRING_BASE _IOW(VHOST_VIRTIO, 0x12, struct vhost_vring_state)
/* Get accessor: reads index, writes value in num */
#define VHOST_GET_VRING_BASE _IOWR(VHOST_VIRTIO, 0x12, struct vhost_vring_state)
/* Set the vring byte order in num. Valid values are VHOST_VRING_LITTLE_ENDIAN
* or VHOST_VRING_BIG_ENDIAN (other values return -EINVAL).
* The byte order cannot be changed while the device is active: trying to do so
* returns -EBUSY.
* This is a legacy only API that is simply ignored when VIRTIO_F_VERSION_1 is
* set.
* Not all kernel configurations support this ioctl, but all configurations that
* support SET also support GET.
*/
#define VHOST_VRING_LITTLE_ENDIAN 0
#define VHOST_VRING_BIG_ENDIAN 1
#define VHOST_SET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x13, struct vhost_vring_state)
#define VHOST_GET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x14, struct vhost_vring_state)
/* The following ioctls use eventfd file descriptors to signal and poll
* for events. */
/* Set eventfd to poll for added buffers */
#define VHOST_SET_VRING_KICK _IOW(VHOST_VIRTIO, 0x20, struct vhost_vring_file)
/* Set eventfd to signal when buffers have been used */
#define VHOST_SET_VRING_CALL _IOW(VHOST_VIRTIO, 0x21, struct vhost_vring_file)
/* Set eventfd to signal an error */
#define VHOST_SET_VRING_ERR _IOW(VHOST_VIRTIO, 0x22, struct vhost_vring_file)
/* Set busy loop timeout (in us) */
#define VHOST_SET_VRING_BUSYLOOP_TIMEOUT _IOW(VHOST_VIRTIO, 0x23, \
struct vhost_vring_state)
/* Get busy loop timeout (in us) */
#define VHOST_GET_VRING_BUSYLOOP_TIMEOUT _IOW(VHOST_VIRTIO, 0x24, \
struct vhost_vring_state)
/* VHOST_NET specific defines */
/* Attach virtio net ring to a raw socket, or tap device.
* The socket must be already bound to an ethernet device, this device will be
* used for transmit. Pass fd -1 to unbind from the socket and the transmit
* device. This can be used to stop the ring (e.g. for migration). */
#define VHOST_NET_SET_BACKEND _IOW(VHOST_VIRTIO, 0x30, struct vhost_vring_file)
/* Feature bits */
/* Log all write descriptors. Can be changed while device is active. */
#define VHOST_F_LOG_ALL 26
/* vhost-net should add virtio_net_hdr for RX, and strip for TX packets. */
#define VHOST_NET_F_VIRTIO_NET_HDR 27
/* VHOST_SCSI specific definitions */
/*
* Used by QEMU userspace to ensure a consistent vhost-scsi ABI.
*
* ABI Rev 0: July 2012 version starting point for v3.6-rc merge candidate +
* RFC-v2 vhost-scsi userspace. Add GET_ABI_VERSION ioctl usage
* ABI Rev 1: January 2013. Ignore vhost_tpgt field in struct vhost_scsi_target.
* All the targets under vhost_wwpn can be seen and used by guest.
*/
#define VHOST_SCSI_ABI_VERSION 1
struct vhost_scsi_target {
int abi_version;
char vhost_wwpn[224]; /* TRANSPORT_IQN_LEN */
unsigned short vhost_tpgt;
unsigned short reserved;
};
#define VHOST_SCSI_SET_ENDPOINT _IOW(VHOST_VIRTIO, 0x40, struct vhost_scsi_target)
#define VHOST_SCSI_CLEAR_ENDPOINT _IOW(VHOST_VIRTIO, 0x41, struct vhost_scsi_target)
/* Changing this breaks userspace. */
#define VHOST_SCSI_GET_ABI_VERSION _IOW(VHOST_VIRTIO, 0x42, int)
/* Set and get the events missed flag */
#define VHOST_SCSI_SET_EVENTS_MISSED _IOW(VHOST_VIRTIO, 0x43, __u32)
#define VHOST_SCSI_GET_EVENTS_MISSED _IOW(VHOST_VIRTIO, 0x44, __u32)
/* VHOST_VSOCK specific defines */
#define VHOST_VSOCK_SET_GUEST_CID _IOW(VHOST_VIRTIO, 0x60, __u64)
#define VHOST_VSOCK_SET_RUNNING _IOW(VHOST_VIRTIO, 0x61, int)
#endif
......@@ -8,9 +8,9 @@ srctree := $(patsubst %/,%,$(dir $(srctree)))
#$(info Determined 'srctree' to be $(srctree))
endif
CC = $(CROSS_COMPILE)gcc
AR = $(CROSS_COMPILE)ar
LD = $(CROSS_COMPILE)ld
CC ?= $(CROSS_COMPILE)gcc
AR ?= $(CROSS_COMPILE)ar
LD ?= $(CROSS_COMPILE)ld
MAKEFLAGS += --no-print-directory
......@@ -19,7 +19,7 @@ LIBFILE = $(OUTPUT)libapi.a
CFLAGS := $(EXTRA_WARNINGS) $(EXTRA_CFLAGS)
CFLAGS += -ggdb3 -Wall -Wextra -std=gnu99 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fPIC
ifeq ($(CC), clang)
ifeq ($(CC_NO_CLANG), 0)
CFLAGS += -O3
else
CFLAGS += -O6
......
......@@ -154,12 +154,12 @@ all: fixdep $(VERSION_FILES) all_cmd
all_cmd: $(CMD_TARGETS)
$(BPF_IN): force elfdep bpfdep
@(test -f ../../../include/uapi/linux/bpf.h -a -f ../../../include/uapi/linux/bpf.h && ( \
@(test -f ../../include/uapi/linux/bpf.h -a -f ../../../include/uapi/linux/bpf.h && ( \
(diff -B ../../include/uapi/linux/bpf.h ../../../include/uapi/linux/bpf.h >/dev/null) || \
echo "Warning: tools/include/uapi/linux/bpf.h differs from kernel" >&2 )) || true
@(test -f ../../../include/uapi/linux/bpf_common.h -a -f ../../../include/uapi/linux/bpf_common.h && ( \
echo "Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf.h' differs from latest version at 'include/uapi/linux/bpf.h'" >&2 )) || true
@(test -f ../../include/uapi/linux/bpf_common.h -a -f ../../../include/uapi/linux/bpf_common.h && ( \
(diff -B ../../include/uapi/linux/bpf_common.h ../../../include/uapi/linux/bpf_common.h >/dev/null) || \
echo "Warning: tools/include/uapi/linux/bpf_common.h differs from kernel" >&2 )) || true
echo "Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf_common.h' differs from latest version at 'include/uapi/linux/bpf_common.h'" >&2 )) || true
$(Q)$(MAKE) $(build)=libbpf
$(OUTPUT)libbpf.so: $(BPF_IN)
......
......@@ -39,27 +39,45 @@ void *memdup(const void *src, size_t len)
* @s: input string
* @res: result
*
* This routine returns 0 iff the first character is one of 'Yy1Nn0'.
* Otherwise it will return -EINVAL. Value pointed to by res is
* updated upon finding a match.
* This routine returns 0 iff the first character is one of 'Yy1Nn0', or
* [oO][NnFf] for "on" and "off". Otherwise it will return -EINVAL. Value
* pointed to by res is updated upon finding a match.
*/
int strtobool(const char *s, bool *res)
{
if (!s)
return -EINVAL;
switch (s[0]) {
case 'y':
case 'Y':
case '1':
*res = true;
break;
return 0;
case 'n':
case 'N':
case '0':
*res = false;
break;
return 0;
case 'o':
case 'O':
switch (s[1]) {
case 'n':
case 'N':
*res = true;
return 0;
case 'f':
case 'F':
*res = false;
return 0;
default:
break;
}
default:
return -EINVAL;
break;
}
return 0;
return -EINVAL;
}
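For illustration only (not part of this diff), the new "on"/"off" forms are accepted alongside the single-character ones; with the routine above:

bool val;
int err;

err = strtobool("on", &val);	/* err == 0, val == true */
err = strtobool("oFF", &val);	/* err == 0, val == false */
err = strtobool("maybe", &val);	/* err == -EINVAL, val unchanged */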
/**
......@@ -87,12 +105,3 @@ size_t __weak strlcpy(char *dest, const char *src, size_t size)
}
return ret;
}
int prefixcmp(const char *str, const char *prefix)
{
for (; ; str++, prefix++)
if (!*prefix)
return 0;
else if (*str != *prefix)
return (unsigned char)*prefix - (unsigned char)*str;
}
......@@ -21,7 +21,7 @@ LIBFILE = $(OUTPUT)libsubcmd.a
CFLAGS := $(EXTRA_WARNINGS) $(EXTRA_CFLAGS)
CFLAGS += -ggdb3 -Wall -Wextra -std=gnu99 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fPIC
ifeq ($(CC), clang)
ifeq ($(CC_NO_CLANG), 0)
CFLAGS += -O3
else
CFLAGS += -O6
......
......@@ -171,7 +171,7 @@ static void list_commands_in_dir(struct cmdnames *cmds,
while ((de = readdir(dir)) != NULL) {
int entlen;
if (prefixcmp(de->d_name, prefix))
if (!strstarts(de->d_name, prefix))
continue;
astrcat(&buf, de->d_name);
......
......@@ -368,7 +368,7 @@ static int parse_long_opt(struct parse_opt_ctx_t *p, const char *arg,
return 0;
}
if (!rest) {
if (!prefixcmp(options->long_name, "no-")) {
if (strstarts(options->long_name, "no-")) {
/*
* The long name itself starts with "no-", so
* accept the option without "no-" so that users
......@@ -381,7 +381,7 @@ static int parse_long_opt(struct parse_opt_ctx_t *p, const char *arg,
goto match;
}
/* Abbreviated case */
if (!prefixcmp(options->long_name + 3, arg)) {
if (strstarts(options->long_name + 3, arg)) {
flags |= OPT_UNSET;
goto is_abbreviated;
}
......@@ -406,7 +406,7 @@ static int parse_long_opt(struct parse_opt_ctx_t *p, const char *arg,
continue;
}
/* negated and abbreviated very much? */
if (!prefixcmp("no-", arg)) {
if (strstarts("no-", arg)) {
flags |= OPT_UNSET;
goto is_abbreviated;
}
......@@ -416,7 +416,7 @@ static int parse_long_opt(struct parse_opt_ctx_t *p, const char *arg,
flags |= OPT_UNSET;
rest = skip_prefix(arg + 3, options->long_name);
/* abbreviated and negated? */
if (!rest && !prefixcmp(options->long_name, arg + 3))
if (!rest && strstarts(options->long_name, arg + 3))
goto is_abbreviated;
if (!rest)
continue;
......@@ -456,7 +456,7 @@ static void check_typos(const char *arg, const struct option *options)
if (strlen(arg) < 3)
return;
if (!prefixcmp(arg, "no-")) {
if (strstarts(arg, "no-")) {
fprintf(stderr, " Error: did you mean `--%s` (with two dashes ?)", arg);
exit(129);
}
......@@ -464,7 +464,7 @@ static void check_typos(const char *arg, const struct option *options)
for (; options->type != OPTION_END; options++) {
if (!options->long_name)
continue;
if (!prefixcmp(options->long_name, arg)) {
if (strstarts(options->long_name, arg)) {
fprintf(stderr, " Error: did you mean `--%s` (with two dashes ?)", arg);
exit(129);
}
......@@ -933,10 +933,10 @@ int parse_options_usage(const char * const *usagestr,
if (opts->long_name == NULL)
continue;
if (!prefixcmp(opts->long_name, optstr))
if (strstarts(opts->long_name, optstr))
print_option_help(opts, 0);
if (!prefixcmp("no-", optstr) &&
!prefixcmp(opts->long_name, optstr + 3))
if (strstarts("no-", optstr) &&
strstarts(opts->long_name, optstr + 3))
print_option_help(opts, 0);
}
......
......@@ -50,6 +50,6 @@ libperf-y += util/
libperf-y += arch/
libperf-y += ui/
libperf-y += scripts/
libperf-y += trace/beauty/
libperf-$(CONFIG_AUDIT) += trace/beauty/
gtk-y += ui/gtk/
......@@ -192,7 +192,7 @@ do-install-man: man
# $(INSTALL) -m 644 $(DOC_MAN5) $(DESTDIR)$(man5dir); \
# $(INSTALL) -m 644 $(DOC_MAN7) $(DESTDIR)$(man7dir)
install-man: check-man-tools man
install-man: check-man-tools man do-install-man
ifdef missing_tools
DO_INSTALL_MAN = $(warning Please install $(missing_tools) to have the man pages installed)
......
......@@ -104,9 +104,9 @@ system, asynchronous, interrupt, transaction abort, trace begin, trace end, and
in transaction, respectively.
While it is possible to create scripts to analyze the data, an alternative
approach is available to export the data to a postgresql database. Refer to
script export-to-postgresql.py for more details, and to script
call-graph-from-postgresql.py for an example of using the database.
approach is available to export the data to a sqlite or postgresql database.
Refer to script export-to-sqlite.py or export-to-postgresql.py for more details,
and to script call-graph-from-sql.py for an example of using the database.
There is also script intel-pt-events.py which provides an example of how to
unpack the raw data for power events and PTWRITE.
......
......@@ -43,6 +43,10 @@ OPTIONS
--quiet::
Do not show any message. (Suppress -v)
-n::
--show-nr-samples::
Show the number of samples for each symbol
-D::
--dump-raw-trace::
Dump raw trace in ASCII.
......@@ -88,6 +92,8 @@ OPTIONS
--asm-raw::
Show raw instruction encoding of assembly instructions.
--show-total-period:: Show a column with the sum of periods.
--source::
Interleave source code with assembly code. Enabled by default,
disable with --no-source.
......
......@@ -61,6 +61,11 @@ OPTIONS
--verbose::
Be more verbose.
--target-ns=PID::
Obtain mount namespace information from the target pid. This is
used when creating a uprobe for a process that resides in a
different mount namespace from the perf(1) utility.
SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-buildid-list[1]
......@@ -130,6 +130,11 @@ OPTIONS
--max-probes=NUM::
Set the maximum number of probe points for an event. Default is 128.
--target-ns=PID::
Obtain mount namespace information from the target pid. This is
used when creating a uprobe for a process that resides in a
different mount namespace from the perf(1) utility.
-x::
--exec=PATH::
Specify path to the executable or shared library file for user
......@@ -264,6 +269,15 @@ Add probes at malloc() function on libc
./perf probe -x /lib/libc.so.6 malloc or ./perf probe /lib/libc.so.6 malloc
Add a uprobe to a target process running in a different mount namespace
./perf probe --target-ns <target pid> -x /lib64/libc.so.6 malloc
Add a USDT probe to a target process running in a different mount namespace
./perf probe --target-ns <target pid> -x /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.121-0.b13.el7_3.x86_64/jre/lib/amd64/server/libjvm.so %sdt_hotspot:thread__sleep__end
SEE ALSO
--------
linkperf:perf-trace[1], linkperf:perf-record[1], linkperf:perf-buildid-cache[1]
......@@ -332,6 +332,7 @@ following filters are defined:
- no_tx: only when the target is not in a hardware transaction
- abort_tx: only when the target is a hardware transaction abort
- cond: conditional branches
- save_type: save branch type during sampling in case binary is not available later
+
The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
......
......@@ -41,13 +41,13 @@ report::
- a symbolically formed event like 'pmu/param1=0x3,param2/' where
param1 and param2 are defined as formats for the PMU in
/sys/bus/event_sources/devices/<pmu>/format/*
/sys/bus/event_source/devices/<pmu>/format/*
- a symbolically formed event like 'pmu/config=M,config1=N,config2=K/'
where M, N, K are numbers (in decimal, hex, octal format).
Acceptable values for each of 'config', 'config1' and 'config2'
parameters are defined by corresponding entries in
/sys/bus/event_sources/devices/<pmu>/format/*
/sys/bus/event_source/devices/<pmu>/format/*
-i::
--no-inherit::
......
......@@ -237,6 +237,10 @@ Default is to monitor all CPUS.
--hierarchy::
Enable hierarchy output.
--force::
Don't do ownership validation.
INTERACTIVE PROMPTING KEYS
--------------------------
......
......@@ -398,6 +398,11 @@ struct auxtrace_error_event {
char msg[MAX_AUXTRACE_ERROR_MSG];
};
PERF_RECORD_HEADER_FEATURE = 80,
Describes a header feature. These are records used in pipe-mode that
contain information that otherwise would be in perf.data file's header.
Event types
Define the event attributes with their IDs.
......@@ -422,8 +427,9 @@ struct perf_pipe_file_header {
};
The information about attrs, data, and event_types is instead in the
synthesized events PERF_RECORD_ATTR, PERF_RECORD_HEADER_TRACING_DATA and
PERF_RECORD_HEADER_EVENT_TYPE that are generated by perf record in pipe-mode.
synthesized events PERF_RECORD_ATTR, PERF_RECORD_HEADER_TRACING_DATA,
PERF_RECORD_HEADER_EVENT_TYPE, and PERF_RECORD_HEADER_FEATURE
that are generated by perf record in pipe-mode.
References:
......
......@@ -126,7 +126,7 @@ void arch__post_process_probe_trace_events(struct perf_probe_event *pev,
struct rb_node *tmp;
int i = 0;
map = get_target_map(pev->target, pev->uprobes);
map = get_target_map(pev->target, pev->nsi, pev->uprobes);
if (!map || map__load(map) < 0)
return;
......