1. 16 Apr, 2020 17 commits
    • Alexey Budankov's avatar
      doc/admin-guide: Update perf-security.rst with CAP_PERFMON information · 902a8dcc
      Alexey Budankov authored
      Update perf-security.rst documentation file with the information
      related to usage of CAP_PERFMON capability to secure performance
      monitoring and observability operations in system.
      
      Committer notes:
      
      While testing 'perf top' under cap_perfmon I noticed that it needs
      some more capability and Alexey pointed out cap_ipc_lock, as needed by
      this kernel chunk:
      
        kernel/events/core.c: 6101
             if ((locked > lock_limit) && perf_is_paranoid() &&
                     !capable(CAP_IPC_LOCK)) {
                     ret = -EPERM;
                     goto unlock;
             }
      
      So I added it to the documentation, and also mentioned that if the
      libcap version doesn't yet supports 'cap_perfmon', its numeric value can
      be used instead, i.e. if:
      
      	# setcap "cap_perfmon,cap_ipc_lock,cap_sys_ptrace,cap_syslog=ep" perf
      
      Fails, try:
      
      	# setcap "38,cap_ipc_lock,cap_sys_ptrace,cap_syslog=ep" perf
      
      I also added a paragraph stating that using an unpatched libcap will
      fail the check for CAP_PERFMON, as it checks the cap number against a
      maximum to see if it is valid, which makes it use as the default the
      'cycles:u' event, even tho a cap_perfmon capable perf binary can get
      kernel samples, to workaround that just use, e.g.:
      
        # perf top -e cycles
        # perf record -e cycles
      
      And it will sample kernel and user modes.
      Signed-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/17278551-9399-9ebe-d665-8827016a217d@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      902a8dcc
    • Alexey Budankov's avatar
      drivers/oprofile: Open access for CAP_PERFMON privileged process · ab76878b
      Alexey Budankov authored
      Open access to monitoring for CAP_PERFMON privileged process.  Providing
      the access under CAP_PERFMON capability singly, without the rest of
      CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials
      and makes operation more secure.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility reasons access to the monitoring remains open
      for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
      secure monitoring is discouraged with respect to CAP_PERFMON capability.
      Signed-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Acked-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/691f1096-b15f-9b12-50a0-c2b93918149e@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      ab76878b
    • Alexey Budankov's avatar
      drivers/perf: Open access for CAP_PERFMON privileged process · cea7d0d4
      Alexey Budankov authored
      Open access to monitoring for CAP_PERFMON privileged process.  Providing
      the access under CAP_PERFMON capability singly, without the rest of
      CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials
      and makes operation more secure.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility reasons access to the monitoring remains open
      for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
      secure monitoring is discouraged with respect to CAP_PERFMON capability.
      Signed-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Acked-by: default avatarWill Deacon <will@kernel.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/4ec1d6f7-548c-8d1c-f84a-cebeb9674e4e@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      cea7d0d4
    • Alexey Budankov's avatar
      parisc/perf: open access for CAP_PERFMON privileged process · cf91baf3
      Alexey Budankov authored
      Open access to monitoring for CAP_PERFMON privileged process.  Providing
      the access under CAP_PERFMON capability singly, without the rest of
      CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials
      and makes operation more secure.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility reasons access to the monitoring remains open
      for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
      secure monitoring is discouraged with respect to CAP_PERFMON capability.
      Signed-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Acked-by: default avatarHelge Deller <deller@gmx.de>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/8cc98809-d35b-de0f-de02-4cf554f3cf62@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      cf91baf3
    • Alexey Budankov's avatar
      powerpc/perf: open access for CAP_PERFMON privileged process · ff467583
      Alexey Budankov authored
      Open access to monitoring for CAP_PERFMON privileged process.  Providing
      the access under CAP_PERFMON capability singly, without the rest of
      CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials
      and makes operation more secure.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility reasons access to the monitoring remains open
      for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
      secure monitoring is discouraged with respect to CAP_PERFMON capability.
      Signed-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Acked-by: default avatarAnju T Sudhakar <anju@linux.vnet.ibm.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/ac98cd9f-b59e-673c-c70d-180b3e7695d2@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      ff467583
    • Alexey Budankov's avatar
      trace/bpf_trace: Open access for CAP_PERFMON privileged process · 031258da
      Alexey Budankov authored
      Open access to bpf_trace monitoring for CAP_PERFMON privileged process.
      Providing the access under CAP_PERFMON capability singly, without the
      rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the
      credentials and makes operation more secure.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility reasons access to bpf_trace monitoring
      remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN
      usage for secure bpf_trace monitoring is discouraged with respect to
      CAP_PERFMON capability.
      Signed-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/c0a0ae47-8b6e-ff3e-416b-3cd1faaf71c0@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      031258da
    • Alexey Budankov's avatar
      drm/i915/perf: Open access for CAP_PERFMON privileged process · 4e3d3456
      Alexey Budankov authored
      Open access to i915_perf monitoring for CAP_PERFMON privileged process.
      Providing the access under CAP_PERFMON capability singly, without the
      rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the
      credentials and makes operation more secure.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility reasons access to i915_events subsystem remains
      open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
      secure i915_events monitoring is discouraged with respect to CAP_PERFMON
      capability.
      Signed-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Acked-by: default avatarLionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/e3e3292f-f765-ea98-e59c-fbe2db93fd34@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      4e3d3456
    • Alexey Budankov's avatar
      perf tools: Support CAP_PERFMON capability · 6b3e0e2e
      Alexey Budankov authored
      Extend error messages to mention CAP_PERFMON capability as an option to
      substitute CAP_SYS_ADMIN capability for secure system performance
      monitoring and observability operations. Make
      perf_event_paranoid_check() and __cmd_ftrace() to be aware of
      CAP_PERFMON capability.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility reasons access to perf_events subsystem remains
      open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
      secure perf_events monitoring is discouraged with respect to CAP_PERFMON
      capability.
      
      Committer testing:
      
      Using a libcap with this patch:
      
        diff --git a/libcap/include/uapi/linux/capability.h b/libcap/include/uapi/linux/capability.h
        index 78b2fd4c8a95..89b5b0279b60 100644
        --- a/libcap/include/uapi/linux/capability.h
        +++ b/libcap/include/uapi/linux/capability.h
        @@ -366,8 +366,9 @@ struct vfs_ns_cap_data {
      
         #define CAP_AUDIT_READ       37
      
        +#define CAP_PERFMON	     38
      
        -#define CAP_LAST_CAP         CAP_AUDIT_READ
        +#define CAP_LAST_CAP         CAP_PERFMON
      
         #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
      
      Note that using '38' in place of 'cap_perfmon' works to some degree with
      an old libcap, its only when cap_get_flag() is called that libcap
      performs an error check based on the maximum value known for
      capabilities that it will fail.
      
      This makes determining the default of perf_event_attr.exclude_kernel to
      fail, as it can't determine if CAP_PERFMON is in place.
      
      Using 'perf top -e cycles' avoids the default check and sets
      perf_event_attr.exclude_kernel to 1.
      
      As root, with a libcap supporting CAP_PERFMON:
      
        # groupadd perf_users
        # adduser perf -g perf_users
        # mkdir ~perf/bin
        # cp ~acme/bin/perf ~perf/bin/
        # chgrp perf_users ~perf/bin/perf
        # setcap "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" ~perf/bin/perf
        # getcap ~perf/bin/perf
        /home/perf/bin/perf = cap_sys_ptrace,cap_syslog,cap_perfmon+ep
        # ls -la ~perf/bin/perf
        -rwxr-xr-x. 1 root perf_users 16968552 Apr  9 13:10 /home/perf/bin/perf
      
      As the 'perf' user in the 'perf_users' group:
      
        $ perf top -a --stdio
        Error:
        Failed to mmap with 1 (Operation not permitted)
        $
      
      Either add the cap_ipc_lock capability to the perf binary or reduce the
      ring buffer size to some smaller value:
      
        $ perf top -m10 -a --stdio
        rounding mmap pages size to 64K (16 pages)
        Error:
        Failed to mmap with 1 (Operation not permitted)
        $ perf top -m4 -a --stdio
        Error:
        Failed to mmap with 1 (Operation not permitted)
        $ perf top -m2 -a --stdio
         PerfTop: 762 irqs/sec  kernel:49.7%  exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 4 CPUs)
        ------------------------------------------------------------------------------------------------------
      
           9.83%  perf                [.] __symbols__insert
           8.58%  perf                [.] rb_next
           5.91%  [kernel]            [k] module_get_kallsym
           5.66%  [kernel]            [k] kallsyms_expand_symbol.constprop.0
           3.98%  libc-2.29.so        [.] __GI_____strtoull_l_internal
           3.66%  perf                [.] rb_insert_color
           2.34%  [kernel]            [k] vsnprintf
           2.30%  [kernel]            [k] string_nocheck
           2.16%  libc-2.29.so        [.] _IO_getdelim
           2.15%  [kernel]            [k] number
           2.13%  [kernel]            [k] format_decode
           1.58%  libc-2.29.so        [.] _IO_feof
           1.52%  libc-2.29.so        [.] __strcmp_avx2
           1.50%  perf                [.] rb_set_parent_color
           1.47%  libc-2.29.so        [.] __libc_calloc
           1.24%  [kernel]            [k] do_syscall_64
           1.17%  [kernel]            [k] __x86_indirect_thunk_rax
      
        $ perf record -a sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.552 MB perf.data (74 samples) ]
        $ perf evlist
        cycles
        $ perf evlist -v
        cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
        $ perf report | head -20
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 74  of event 'cycles'
        # Event count (approx.): 15694834
        #
        # Overhead  Command          Shared Object               Symbol
        # ........  ...............  ..........................  ......................................
        #
            19.62%  perf             [kernel.vmlinux]            [k] strnlen_user
            13.88%  swapper          [kernel.vmlinux]            [k] intel_idle
            13.83%  ksoftirqd/0      [kernel.vmlinux]            [k] pfifo_fast_dequeue
            13.51%  swapper          [kernel.vmlinux]            [k] kmem_cache_free
             6.31%  gnome-shell      [kernel.vmlinux]            [k] kmem_cache_free
             5.66%  kworker/u8:3+ix  [kernel.vmlinux]            [k] delay_tsc
             4.42%  perf             [kernel.vmlinux]            [k] __set_cpus_allowed_ptr
             3.45%  kworker/2:1-eve  [kernel.vmlinux]            [k] shmem_truncate_range
             2.29%  gnome-shell      libgobject-2.0.so.0.6000.7  [.] g_closure_ref
        $
      Signed-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Acked-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/a66d5648-2b8e-577e-e1f2-1d56c017ab5e@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      6b3e0e2e
    • Alexey Budankov's avatar
      perf/core: open access to probes for CAP_PERFMON privileged process · c9e0924e
      Alexey Budankov authored
      Open access to monitoring via kprobes and uprobes and eBPF tracing for
      CAP_PERFMON privileged process. Providing the access under CAP_PERFMON
      capability singly, without the rest of CAP_SYS_ADMIN credentials,
      excludes chances to misuse the credentials and makes operation more
      secure.
      
      perf kprobes and uprobes are used by ftrace and eBPF. perf probe uses
      ftrace to define new kprobe events, and those events are treated as
      tracepoint events. eBPF defines new probes via perf_event_open interface
      and then the probes are used in eBPF tracing.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility reasons access to perf_events subsystem
      remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN
      usage for secure perf_events monitoring is discouraged with respect to
      CAP_PERFMON capability.
      Signed-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Link: http://lore.kernel.org/lkml/3c129d9a-ba8a-3483-ecc5-ad6c8e7c203f@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      c9e0924e
    • Alexey Budankov's avatar
      perf/core: Open access to the core for CAP_PERFMON privileged process · 18aa1856
      Alexey Budankov authored
      Open access to monitoring of kernel code, CPUs, tracepoints and
      namespaces data for a CAP_PERFMON privileged process. Providing the
      access under CAP_PERFMON capability singly, without the rest of
      CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials
      and makes operation more secure.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility reasons the access to perf_events subsystem
      remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN
      usage for secure perf_events monitoring is discouraged with respect to
      CAP_PERFMON capability.
      Signed-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: linux-man@vger.kernel.org
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/471acaef-bb8a-5ce2-923f-90606b78eef9@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      18aa1856
    • Alexey Budankov's avatar
      capabilities: Introduce CAP_PERFMON to kernel and user space · 98073728
      Alexey Budankov authored
      Introduce the CAP_PERFMON capability designed to secure system
      performance monitoring and observability operations so that CAP_PERFMON
      can assist CAP_SYS_ADMIN capability in its governing role for
      performance monitoring and observability subsystems.
      
      CAP_PERFMON hardens system security and integrity during performance
      monitoring and observability operations by decreasing attack surface that
      is available to a CAP_SYS_ADMIN privileged process [2]. Providing the access
      to system performance monitoring and observability operations under CAP_PERFMON
      capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes
      chances to misuse the credentials and makes the operation more secure.
      
      Thus, CAP_PERFMON implements the principle of least privilege for
      performance monitoring and observability operations (POSIX IEEE 1003.1e:
      2.2.2.39 principle of least privilege: A security design principle that
        states that a process or program be granted only those privileges
      (e.g., capabilities) necessary to accomplish its legitimate function,
      and only for the time that such privileges are actually required)
      
      CAP_PERFMON meets the demand to secure system performance monitoring and
      observability operations for adoption in security sensitive, restricted,
      multiuser production environments (e.g. HPC clusters, cloud and virtual compute
      environments), where root or CAP_SYS_ADMIN credentials are not available to
      mass users of a system, and securely unblocks applicability and scalability
      of system performance monitoring and observability operations beyond root
      and CAP_SYS_ADMIN use cases.
      
      CAP_PERFMON takes over CAP_SYS_ADMIN credentials related to system performance
      monitoring and observability operations and balances amount of CAP_SYS_ADMIN
      credentials following the recommendations in the capabilities man page [1]
      for CAP_SYS_ADMIN: "Note: this capability is overloaded; see Notes to kernel
      developers, below." For backward compatibility reasons access to system
      performance monitoring and observability subsystems of the kernel remains
      open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN capability
      usage for secure system performance monitoring and observability operations
      is discouraged with respect to the designed CAP_PERFMON capability.
      
      Although the software running under CAP_PERFMON can not ensure avoidance
      of related hardware issues, the software can still mitigate these issues
      following the official hardware issues mitigation procedure [2]. The bugs
      in the software itself can be fixed following the standard kernel development
      process [3] to maintain and harden security of system performance monitoring
      and observability operations.
      
      [1] http://man7.org/linux/man-pages/man7/capabilities.7.html
      [2] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
      [3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.htmlSigned-off-by: default avatarAlexey Budankov <alexey.budankov@linux.intel.com>
      Acked-by: default avatarJames Morris <jamorris@linux.microsoft.com>
      Acked-by: default avatarSerge E. Hallyn <serge@hallyn.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Acked-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/5590d543-82c6-490a-6544-08e6a5517db0@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      98073728
    • Jiri Olsa's avatar
      perf annotate: Add basic support for bpf_image · 3c29d448
      Jiri Olsa authored
      Add the DSO_BINARY_TYPE__BPF_IMAGE dso binary type to recognize BPF
      images that carry trampoline or dispatcher.
      
      Upcoming patches will add support to read the image data, store it
      within the BPF feature in perf.data and display it for annotation
      purposes.
      
      Currently we only display following message:
      
        # ./perf annotate bpf_trampoline_24456 --stdio
         Percent |      Source code & Disassembly of . for cycles (504  ...
        --------------------------------------------------------------- ...
                 :       to be implemented
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andriin@fb.com>
      Cc: Björn Töpel <bjorn.topel@intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@redhat.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Jesper Dangaard Brouer <hawk@kernel.org>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200312195610.346362-16-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      3c29d448
    • Jiri Olsa's avatar
      perf machine: Set ksymbol dso as loaded on arrival · 7eddf7e7
      Jiri Olsa authored
      There's no special load action for ksymbol data on map__load/dso__load
      action, where the kernel is getting loaded. It only gets confused with
      kernel kallsyms/vmlinux load for bpf object, which fails and could mess
      up with the map.
      
      Disabling any further load of the map for ksymbol related dso/map.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andriin@fb.com>
      Cc: Björn Töpel <bjorn.topel@intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@redhat.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Jesper Dangaard Brouer <hawk@kernel.org>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200312195610.346362-15-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      7eddf7e7
    • Jiri Olsa's avatar
      perf tools: Synthesize bpf_trampoline/dispatcher ksymbol event · 943930e4
      Jiri Olsa authored
      Synthesize bpf images (trampolines/dispatchers) on start, as ksymbol
      events from /proc/kallsyms. Having this perf can recognize samples from
      those images and perf report and top shows them correctly.
      
      The rest of the ksymbol handling is already in place from for the bpf
      programs monitoring, so only the initial state was needed.
      
      perf report output:
      
        # Overhead  Command     Shared Object                  Symbol
      
          12.37%  test_progs  [kernel.vmlinux]                 [k] entry_SYSCALL_64
          11.80%  test_progs  [kernel.vmlinux]                 [k] syscall_return_via_sysret
           9.63%  test_progs  bpf_prog_bcf7977d3b93787c_prog2  [k] bpf_prog_bcf7977d3b93787c_prog2
           6.90%  test_progs  bpf_trampoline_24456             [k] bpf_trampoline_24456
           6.36%  test_progs  [kernel.vmlinux]                 [k] memcpy_erms
      
      Committer notes:
      
      Use scnprintf() instead of strncpy() to overcome this on fedora:32,
      rawhide and OpenMandriva Cooker:
      
          CC       /tmp/build/perf/util/bpf-event.o
        In file included from /usr/include/string.h:495,
                         from /git/linux/tools/lib/bpf/libbpf_common.h:12,
                         from /git/linux/tools/lib/bpf/bpf.h:31,
                         from util/bpf-event.c:4:
        In function 'strncpy',
            inlined from 'process_bpf_image' at util/bpf-event.c:323:2,
            inlined from 'kallsyms_process_symbol' at util/bpf-event.c:358:9:
        /usr/include/bits/string_fortified.h:106:10: error: '__builtin_strncpy' specified bound 256 equals destination size [-Werror=stringop-truncation]
          106 |   return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest));
              |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        cc1: all warnings being treated as errors
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andriin@fb.com>
      Cc: Björn Töpel <bjorn.topel@intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@redhat.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Jesper Dangaard Brouer <hawk@kernel.org>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200312195610.346362-14-jolsa@kernel.org/Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      943930e4
    • Arnaldo Carvalho de Melo's avatar
      perf stat: Honour --timeout for forked workloads · cfbd41b7
      Arnaldo Carvalho de Melo authored
      When --timeout is used and a workload is specified to be started by
      'perf stat', i.e.
      
        $ perf stat --timeout 1000 sleep 1h
      
      The --timeout wasn't being honoured, i.e. the workload, 'sleep 1h' in
      the above example, should be terminated after 1000ms, but it wasn't,
      'perf stat' was waiting for it to finish.
      
      Fix it by sending a SIGTERM when the timeout expires.
      
      Now it works:
      
        # perf stat -e cycles --timeout 1234 sleep 1h
        sleep: Terminated
      
         Performance counter stats for 'sleep 1h':
      
                 1,066,692      cycles
      
               1.234314838 seconds time elapsed
      
               0.000750000 seconds user
               0.000000000 seconds sys
      
        #
      
      Fixes: f1f8ad52 ("perf stat: Add support to print counts after a period of time")
      Reported-by: default avatarKonstantin Kharlamov <hi-angel@yandex.ru>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207243Tested-by: default avatarKonstantin Kharlamov <hi-angel@yandex.ru>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@redhat.com>
      Tested-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: yuzhoujian <yuzhoujian@didichuxing.com>
      Link: https://lore.kernel.org/lkml/20200415153803.GB20324@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      cfbd41b7
    • Ingo Molnar's avatar
      Merge tag 'perf-urgent-for-mingo-5.7-20200414' of... · cd094335
      Ingo Molnar authored
      Merge tag 'perf-urgent-for-mingo-5.7-20200414' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
      
      perf stat:
      
        Jin Yao:
      
        - Fix no metric header if --per-socket and --metric-only set
      
      build system:
      
        - Fix python building when built with clang, that was failing if the clang
          version doesn't support -fno-semantic-interposition.
      
      tools UAPI headers:
      
        Arnaldo Carvalho de Melo:
      
        - Update various copies of kernel headers, some ended up automatically
          updating build-time generated tables to enable tools such as 'perf trace'
          to decode syscalls and tracepoints arguments.
      
          Now the tools/perf build is free of UAPI drift warnings.
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      cd094335
    • Linus Torvalds's avatar
      Merge tag 'efi-urgent-2020-04-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 00086336
      Linus Torvalds authored
      Pull EFI fixes from Ingo Molnar:
       "Misc EFI fixes, including the boot failure regression caused by the
        BSS section not being cleared by the loaders"
      
      * tag 'efi-urgent-2020-04-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efi/x86: Revert struct layout change to fix kexec boot regression
        efi/x86: Don't remap text<->rodata gap read-only for mixed mode
        efi/x86: Fix the deletion of variables in mixed mode
        efi/libstub/file: Merge file name buffers to reduce stack usage
        Documentation/x86, efi/x86: Clarify EFI handover protocol and its requirements
        efi/arm: Deal with ADR going out of range in efi_enter_kernel()
        efi/x86: Always relocate the kernel for EFI handover entry
        efi/x86: Move efi stub globals from .bss to .data
        efi/libstub/x86: Remove redundant assignment to pointer hdr
        efi/cper: Use scnprintf() for avoiding potential buffer overflow
      00086336
  2. 14 Apr, 2020 23 commits
    • Linus Torvalds's avatar
      Merge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux · 8632e9b5
      Linus Torvalds authored
      Pull hyperv fixes from Wei Liu:
      
       - a series from Tianyu Lan to fix crash reporting on Hyper-V
      
       - three miscellaneous cleanup patches
      
      * tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
        x86/Hyper-V: Report crash data in die() when panic_on_oops is set
        x86/Hyper-V: Report crash register data when sysctl_record_panic_msg is not set
        x86/Hyper-V: Report crash register data or kmsg before running crash kernel
        x86/Hyper-V: Trigger crash enlightenment only once during system crash.
        x86/Hyper-V: Free hv_panic_page when fail to register kmsg dump
        x86/Hyper-V: Unload vmbus channel in hv panic callback
        x86: hyperv: report value of misc_features
        hv_debugfs: Make hv_debug_root static
        hv: hyperv_vmbus.h: Replace zero-length array with flexible-array member
      8632e9b5
    • Linus Torvalds's avatar
      Merge tag 'for-5.7-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 6cc9306b
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
       "We have a few regressions and one fix for stable:
      
         - revert fsync optimization
      
         - fix lost i_size update
      
         - fix a space accounting leak
      
         - build fix, add back definition of a deprecated ioctl flag
      
         - fix search condition for old roots in relocation"
      
      * tag 'for-5.7-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: re-instantiate the removed BTRFS_SUBVOL_CREATE_ASYNC definition
        btrfs: fix reclaim counter leak of space_info objects
        btrfs: make full fsyncs always operate on the entire file again
        btrfs: fix lost i_size update after cloning inline extent
        btrfs: check commit root generation in should_ignore_root
      6cc9306b
    • Linus Torvalds's avatar
      Merge tag 'afs-fixes-20200413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · f4cd6668
      Linus Torvalds authored
      Pull AFS fixes from David Howells:
      
       - Fix the decoding of fetched file status records so that the xdr
         pointer is advanced under all circumstances.
      
       - Fix the decoding of a fetched file status record that indicates an
         inline abort (ie. an error) so that it sets the flag saying the
         decoder stored the abort code.
      
       - Fix the decoding of the result of the rename operation so that it
         doesn't skip the decoding of the second fetched file status (ie. that
         of the dest dir) in the case that the source and dest dirs were the
         same as this causes the xdr pointer not to be advanced, leading to
         incorrect decoding of subsequent parts of the reply.
      
       - Fix the dump of a bad YFSFetchStatus record to dump the full length.
      
       - Fix a race between local editing of directory contents and accessing
         the dir for reading or d_revalidate by using the same lock in both.
      
       - Fix afs_d_revalidate() to not accidentally reverse the version on a
         dentry when it's meant to be bringing it forward.
      
      * tag 'afs-fixes-20200413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        afs: Fix afs_d_validate() to set the right directory version
        afs: Fix race between post-modification dir edit and readdir/d_revalidate
        afs: Fix length of dump of bad YFSFetchStatus record
        afs: Fix rename operation status delivery
        afs: Fix decoding of inline abort codes from version 1 status records
        afs: Fix missing XDR advance in xdr_decode_{AFS,YFS}FSFetchStatus()
      f4cd6668
    • Arnaldo Carvalho de Melo's avatar
      tools headers: Synchronize linux/bits.h with the kernel sources · e3698b23
      Arnaldo Carvalho de Melo authored
      To pick up the changes in these csets:
      
        295bcca8 ("linux/bits.h: add compile time sanity check of GENMASK inputs")
        3945ff37 ("linux/bits.h: Extract common header for vDSO")
      
      To address this tools/perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/linux/bits.h' differs from latest version at 'include/linux/bits.h'
        diff -u tools/include/linux/bits.h include/linux/bits.h
      
      This clashes with usage of userspace's static_assert(), that, at least
      on glibc, is guarded by a ifnded/endif pair, do the same to our copy of
      build_bug.h and avoid that diff in check_headers.sh so that we continue
      checking for drifts with the kernel sources master copy.
      
      This will all be tested with the set of build containers that includes
      uCLibc, musl libc, lots of glibc versions in lots of distros and cross
      build environments.
      
      The tools/objtool, tools/bpf, etc were tested as well.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Rikard Falkeborn <rikard.falkeborn@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      e3698b23
    • Arnaldo Carvalho de Melo's avatar
      tools headers: Adopt verbatim copy of compiletime_assert() from kernel sources · 5b992add
      Arnaldo Carvalho de Melo authored
      Will be needed when syncing the linux/bits.h header, in the next cset.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      5b992add
    • Arnaldo Carvalho de Melo's avatar
      tools headers: Update x86's syscall_64.tbl with the kernel sources · d8ed4d7a
      Arnaldo Carvalho de Melo authored
      To pick the changes from:
      
        d3b1b776 ("x86/entry/64: Remove ptregs qualifier from syscall table")
        cab56d34 ("x86/entry: Remove ABI prefixes from functions in syscall tables")
        27dd84fa ("x86/entry/64: Use syscall wrappers for x32_rt_sigreturn")
      
      Addressing this tools/perf build warning:
      
        Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
        diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
      
      That didn't result in any tooling changes, as what is extracted are just
      the first two columns, and these patches touched only the third.
      
        $ cp /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c /tmp
        $ cp arch/x86/entry/syscalls/syscall_64.tbl tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
        $ make -C tools/perf O=/tmp/build/perf install-bin
        make: Entering directory '/home/acme/git/perf/tools/perf'
          BUILD:   Doing 'make -j12' parallel build
          DESCEND  plugins
          CC       /tmp/build/perf/util/syscalltbl.o
          INSTALL  trace_plugins
          LD       /tmp/build/perf/util/perf-in.o
          LD       /tmp/build/perf/perf-in.o
          LINK     /tmp/build/perf/perf
        $ diff -u /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c /tmp/syscalls_64.c
        $
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      d8ed4d7a
    • Arnaldo Carvalho de Melo's avatar
      tools headers UAPI: Sync drm/i915_drm.h with the kernel sources · 54a58ebc
      Arnaldo Carvalho de Melo authored
      To pick the change in:
      
        88be76cd ("drm/i915: Allow userspace to specify ringsize on construction")
      
      That don't result in any changes in tooling, just silences this perf
      build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
        diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      54a58ebc
    • Arnaldo Carvalho de Melo's avatar
      tools headers UAPI: Update tools's copy of drm.h headers · 0719bdf4
      Arnaldo Carvalho de Melo authored
      Picking the changes from:
      
        455e00f1 ("drm: Add getfb2 ioctl")
      
      Silencing these perf build warnings:
      
        Warning: Kernel ABI header at 'tools/include/uapi/drm/drm.h' differs from latest version at 'include/uapi/drm/drm.h'
        diff -u tools/include/uapi/drm/drm.h include/uapi/drm/drm.h
      
      Now 'perf trace' and other code that might use the
      tools/perf/trace/beauty autogenerated tables will be able to translate
      this new ioctl code into a string:
      
        $ tools/perf/trace/beauty/drm_ioctl.sh > before
        $ cp include/uapi/drm/drm.h tools/include/uapi/drm/drm.h
        $ tools/perf/trace/beauty/drm_ioctl.sh > after
        $ diff -u before after
        --- before	2020-04-14 09:28:45.461821077 -0300
        +++ after	2020-04-14 09:28:53.594782685 -0300
        @@ -107,6 +107,7 @@
         	[0xCB] = "SYNCOBJ_QUERY",
         	[0xCC] = "SYNCOBJ_TRANSFER",
         	[0xCD] = "SYNCOBJ_TIMELINE_SIGNAL",
        +	[0xCE] = "MODE_GETFB2",
         	[DRM_COMMAND_BASE + 0x00] = "I915_INIT",
         	[DRM_COMMAND_BASE + 0x01] = "I915_FLUSH",
         	[DRM_COMMAND_BASE + 0x02] = "I915_FLIP",
        $
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Daniel Stone <daniels@collabora.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Lyude Paul <lyude@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      0719bdf4
    • Arnaldo Carvalho de Melo's avatar
      tools headers kvm: Sync linux/kvm.h with the kernel sources · b8fc2280
      Arnaldo Carvalho de Melo authored
      To pick up the changes from:
      
        9a5788c6 ("KVM: PPC: Book3S HV: Add a capability for enabling secure guests")
        3c9bd400 ("KVM: x86: enable dirty log gradually in small chunks")
        13da9ae1 ("KVM: s390: protvirt: introduce and enable KVM_CAP_S390_PROTECTED")
        e0d2773d ("KVM: s390: protvirt: UV calls in support of diag308 0, 1")
        19e12277 ("KVM: S390: protvirt: Introduce instruction data area bounce buffer")
        29b40f10 ("KVM: s390: protvirt: Add initial vm and cpu lifecycle handling")
      
      So far we're ignoring those arch specific ioctls, we need to revisit
      this at some time to have arch specific tables, etc:
      
        $ grep S390 tools/perf/trace/beauty/kvm_ioctl.sh
            egrep -v " ((ARM|PPC|S390)_|[GS]ET_(DEBUGREGS|PIT2|XSAVE|TSC_KHZ)|CREATE_SPAPR_TCE_64)" | \
        $
      
      This addresses these tools/perf build warnings:
      
        Warning: Kernel ABI header at 'tools/arch/arm/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm/include/uapi/asm/kvm.h'
        diff -u tools/arch/arm/include/uapi/asm/kvm.h arch/arm/include/uapi/asm/kvm.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Janosch Frank <frankja@linux.ibm.com>
      Cc: Jay Zhou <jianjay.zhou@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      b8fc2280
    • Arnaldo Carvalho de Melo's avatar
      tools headers UAPI: Sync linux/fscrypt.h with the kernel sources · 1abcb9d9
      Arnaldo Carvalho de Melo authored
      To pick the changes from:
      
        e98ad464 ("fscrypt: add FS_IOC_GET_ENCRYPTION_NONCE ioctl")
      
      That don't trigger any changes in tooling.
      
      This silences this perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/fscrypt.h' differs from latest version at 'include/uapi/linux/fscrypt.h'
        diff -u tools/include/uapi/linux/fscrypt.h include/uapi/linux/fscrypt.h
      
      In time we should come up with something like:
      
        $ tools/perf/trace/beauty/fsconfig.sh
        static const char *fsconfig_cmds[] = {
        	[0] = "SET_FLAG",
        	[1] = "SET_STRING",
        	[2] = "SET_BINARY",
        	[3] = "SET_PATH",
        	[4] = "SET_PATH_EMPTY",
        	[5] = "SET_FD",
        	[6] = "CMD_CREATE",
        	[7] = "CMD_RECONFIGURE",
        };
        $
      
      And:
      
        $ tools/perf/trace/beauty/drm_ioctl.sh | head
        #ifndef DRM_COMMAND_BASE
        #define DRM_COMMAND_BASE                0x40
        #endif
        static const char *drm_ioctl_cmds[] = {
        	[0x00] = "VERSION",
        	[0x01] = "GET_UNIQUE",
        	[0x02] = "GET_MAGIC",
        	[0x03] = "IRQ_BUSID",
        	[0x04] = "GET_MAP",
        	[0x05] = "GET_CLIENT",
        $
      
      For fscrypt's ioctls.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      1abcb9d9
    • Arnaldo Carvalho de Melo's avatar
      tools include UAPI: Sync linux/vhost.h with the kernel sources · 3df4d4bf
      Arnaldo Carvalho de Melo authored
      To get the changes in:
      
        4c8cf318 ("vhost: introduce vDPA-based backend")
      
      Silencing this perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/vhost.h' differs from latest version at 'include/uapi/linux/vhost.h'
        diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h
      
      This automatically picks these new ioctls, making tools such as 'perf
      trace' aware of them and possibly allowing to use the strings in
      filters, etc:
      
        $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > before
        $ cp include/uapi/linux/vhost.h tools/include/uapi/linux/vhost.h
        $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > after
        $ diff -u before after
        --- before	2020-04-14 09:12:28.559748968 -0300
        +++ after	2020-04-14 09:12:38.781696242 -0300
        @@ -24,9 +24,16 @@
         	[0x44] = "SCSI_GET_EVENTS_MISSED",
         	[0x60] = "VSOCK_SET_GUEST_CID",
         	[0x61] = "VSOCK_SET_RUNNING",
        +	[0x72] = "VDPA_SET_STATUS",
        +	[0x74] = "VDPA_SET_CONFIG",
        +	[0x75] = "VDPA_SET_VRING_ENABLE",
         };
         static const char *vhost_virtio_ioctl_read_cmds[] = {
         	[0x00] = "GET_FEATURES",
         	[0x12] = "GET_VRING_BASE",
         	[0x26] = "GET_BACKEND_FEATURES",
        +	[0x70] = "VDPA_GET_DEVICE_ID",
        +	[0x71] = "VDPA_GET_STATUS",
        +	[0x73] = "VDPA_GET_CONFIG",
        +	[0x76] = "VDPA_GET_VRING_NUM",
         };
        $
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      3df4d4bf
    • Arnaldo Carvalho de Melo's avatar
      tools arch x86: Sync asm/cpufeatures.h with the kernel sources · e00a2d90
      Arnaldo Carvalho de Melo authored
      To pick up the changes from:
      
        077168e2 ("x86/mce/amd: Add PPIN support for AMD MCE")
        753039ef ("x86/cpu/amd: Call init_amd_zn() om Family 19h processors too")
        6650cdd9 ("x86/split_lock: Enable split lock detection by kernel")
      
      These don't cause any changes in tooling, just silences this perf build
      warning:
      
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
        diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Kim Phillips <kim.phillips@amd.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Wei Huang <wei.huang2@amd.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      e00a2d90
    • Arnaldo Carvalho de Melo's avatar
      tools headers UAPI: Sync linux/mman.h with the kernel · f60b3878
      Arnaldo Carvalho de Melo authored
      To get the changes in:
      
        e346b381 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()")
      
      Add that to 'perf trace's mremap 'flags' decoder.
      
      This silences this perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/mman.h' differs from latest version at 'include/uapi/linux/mman.h'
        diff -u tools/include/uapi/linux/mman.h include/uapi/linux/mman.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Brian Geffon <bgeffon@google.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      f60b3878
    • Arnaldo Carvalho de Melo's avatar
      tools headers UAPI: Sync sched.h with the kernel · 027fa8fb
      Arnaldo Carvalho de Melo authored
      To get the changes in:
      
        ef2c41cf ("clone3: allow spawning processes into cgroups")
      
      Add that to 'perf trace's clone 'flags' decoder.
      
      This silences this perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/sched.h' differs from latest version at 'include/uapi/linux/sched.h'
        diff -u tools/include/uapi/linux/sched.h include/uapi/linux/sched.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      027fa8fb
    • Arnaldo Carvalho de Melo's avatar
      tools headers: Update linux/vdso.h and grab a copy of vdso/const.h · ca64d84e
      Arnaldo Carvalho de Melo authored
      To get in line with:
      
        8165b57b ("linux/const.h: Extract common header for vDSO")
      
      And silence this tools/perf/ build warning:
      
        Warning: Kernel ABI header at 'tools/include/linux/const.h' differs from latest version at 'include/linux/const.h'
        diff -u tools/include/linux/const.h include/linux/const.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      ca64d84e
    • Jin Yao's avatar
      perf stat: Fix no metric header if --per-socket and --metric-only set · 8358f698
      Jin Yao authored
      We received a report that was no metric header displayed if --per-socket
      and --metric-only were both set.
      
      It's hard for script to parse the perf-stat output. This patch fixes this
      issue.
      
      Before:
      
        root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket
        ^C
         Performance counter stats for 'system wide':
      
        S0        8                  2.6
      
               2.215270071 seconds time elapsed
      
        root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket -I1000
        #           time socket cpus
             1.000411692 S0        8                  2.2
             2.001547952 S0        8                  3.4
             3.002446511 S0        8                  3.4
             4.003346157 S0        8                  4.0
             5.004245736 S0        8                  0.3
      
      After:
      
        root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket
        ^C
         Performance counter stats for 'system wide':
      
                                     CPI
        S0        8                  2.1
      
               1.813579830 seconds time elapsed
      
        root@kbl-ppc:~# perf stat -a -M CPI --metric-only --per-socket -I1000
        #           time socket cpus                  CPI
             1.000415122 S0        8                  3.2
             2.001630051 S0        8                  2.9
             3.002612278 S0        8                  4.3
             4.003523594 S0        8                  3.0
             5.004504256 S0        8                  3.7
      Signed-off-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200331180226.25915-1-yao.jin@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      8358f698
    • Arnaldo Carvalho de Melo's avatar
      perf python: Check if clang supports -fno-semantic-interposition · 9a00df31
      Arnaldo Carvalho de Melo authored
      The set of C compiler options used by distros to build python bindings
      may include options that are unknown to clang, we check for a variety of
      such options, add -fno-semantic-interposition to that mix:
      
      This fixes the build on, among others, Manjaro Linux:
      
          GEN      /tmp/build/perf/python/perf.so
        clang-9: error: unknown argument: '-fno-semantic-interposition'
        error: command 'clang' failed with exit status 1
        make: Leaving directory '/git/perf/tools/perf'
      
        [perfbuilder@602aed1c266d ~]$ gcc -v
        Using built-in specs.
        COLLECT_GCC=gcc
        COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/lto-wrapper
        Target: x86_64-pc-linux-gnu
        Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-pkgversion='Arch Linux 9.3.0-1' --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --enable-shared --enable-threads=posix --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --enable-multilib --disable-werror --enable-checking=release --enable-default-pie --enable-default-ssp --enable-cet=auto gdc_include_dir=/usr/include/dlang/gdc
        Thread model: posix
        gcc version 9.3.0 (Arch Linux 9.3.0-1)
        [perfbuilder@602aed1c266d ~]$
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      9a00df31
    • Arnaldo Carvalho de Melo's avatar
      tools arch x86: Sync the msr-index.h copy with the kernel sources · bab1a501
      Arnaldo Carvalho de Melo authored
      To pick up the changes in:
      
        6650cdd9 ("x86/split_lock: Enable split lock detection by kernel")
      
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
        diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
      
      Which causes these changes in tooling:
      
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
        $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
        $ diff -u before after
        --- before	2020-04-01 12:11:14.789344795 -0300
        +++ after	2020-04-01 12:11:56.907798879 -0300
        @@ -10,6 +10,7 @@
         	[0x00000029] = "KNC_EVNTSEL1",
         	[0x0000002a] = "IA32_EBL_CR_POWERON",
         	[0x0000002c] = "EBC_FREQUENCY_ID",
        +	[0x00000033] = "TEST_CTRL",
         	[0x00000034] = "SMI_COUNT",
         	[0x0000003a] = "IA32_FEAT_CTL",
         	[0x0000003b] = "IA32_TSC_ADJUST",
        @@ -27,6 +28,7 @@
         	[0x000000c2] = "IA32_PERFCTR1",
         	[0x000000cd] = "FSB_FREQ",
         	[0x000000ce] = "PLATFORM_INFO",
        +	[0x000000cf] = "IA32_CORE_CAPS",
         	[0x000000e2] = "PKG_CST_CONFIG_CONTROL",
         	[0x000000e7] = "IA32_MPERF",
         	[0x000000e8] = "IA32_APERF",
        $
      
        $ make -C tools/perf O=/tmp/build/perf install-bin
        <SNIP>
          CC       /tmp/build/perf/trace/beauty/tracepoints/x86_msr.o
          LD       /tmp/build/perf/trace/beauty/tracepoints/perf-in.o
          LD       /tmp/build/perf/trace/beauty/perf-in.o
          LD       /tmp/build/perf/perf-in.o
          LINK     /tmp/build/perf/perf
        <SNIP>
      
      Now one can do:
      
      	perf trace -e msr:* --filter=msr==IA32_CORE_CAPS
      
      or:
      
      	perf trace -e msr:* --filter='msr==IA32_CORE_CAPS || msr==TEST_CTRL'
      
      And see only those MSRs being accessed via:
      
        # perf trace -v -e msr:* --filter='msr==IA32_CORE_CAPS || msr==TEST_CTRL'
        New filter for msr:read_msr: (msr==0xcf || msr==0x33) && (common_pid != 8263 && common_pid != 23250)
        New filter for msr:write_msr: (msr==0xcf || msr==0x33) && (common_pid != 8263 && common_pid != 23250)
        New filter for msr:rdpmc: (msr==0xcf || msr==0x33) && (common_pid != 8263 && common_pid != 23250)
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/lkml/20200401153325.GC12534@kernel.org/Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      bab1a501
    • Ard Biesheuvel's avatar
      efi/x86: Revert struct layout change to fix kexec boot regression · a088b858
      Ard Biesheuvel authored
      Commit
      
        0a67361d ("efi/x86: Remove runtime table address from kexec EFI setup data")
      
      removed the code that retrieves the non-remapped UEFI runtime services
      pointer from the data structure provided by kexec, as it was never really
      needed on the kexec boot path: mapping the runtime services table at its
      non-remapped address is only needed when calling SetVirtualAddressMap(),
      which never happens during a kexec boot in the first place.
      
      However, dropping the 'runtime' member from struct efi_setup_data was a
      mistake. That struct is shared ABI between the kernel and the kexec tooling
      for x86, and so we cannot simply change its layout. So let's put back the
      removed field, but call it 'unused' to reflect the fact that we never look
      at its contents. While at it, add a comment to remind our future selves
      that the layout is external ABI.
      
      Fixes: 0a67361d ("efi/x86: Remove runtime table address from kexec EFI setup data")
      Reported-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Tested-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarDave Young <dyoung@redhat.com>
      Signed-off-by: default avatarArd Biesheuvel <ardb@kernel.org>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      a088b858
    • Ard Biesheuvel's avatar
      efi/x86: Don't remap text<->rodata gap read-only for mixed mode · f6103162
      Ard Biesheuvel authored
      Commit
      
        d9e3d2c4 ("efi/x86: Don't map the entire kernel text RW for mixed mode")
      
      updated the code that creates the 1:1 memory mapping to use read-only
      attributes for the 1:1 alias of the kernel's text and rodata sections, to
      protect it from inadvertent modification. However, it failed to take into
      account that the unused gap between text and rodata is given to the page
      allocator for general use.
      
      If the vmap'ed stack happens to be allocated from this region, any by-ref
      output arguments passed to EFI runtime services that are allocated on the
      stack (such as the 'datasize' argument taken by GetVariable() when invoked
      from efivar_entry_size()) will be referenced via a read-only mapping,
      resulting in a page fault if the EFI code tries to write to it:
      
        BUG: unable to handle page fault for address: 00000000386aae88
        #PF: supervisor write access in kernel mode
        #PF: error_code(0x0003) - permissions violation
        PGD fd61063 P4D fd61063 PUD fd62063 PMD 386000e1
        Oops: 0003 [#1] SMP PTI
        CPU: 2 PID: 255 Comm: systemd-sysv-ge Not tainted 5.6.0-rc4-default+ #22
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        RIP: 0008:0x3eaeed95
        Code: ...  <89> 03 be 05 00 00 80 a1 74 63 b1 3e 83 c0 48 e8 44 d2 ff ff eb 05
        RSP: 0018:000000000fd73fa0 EFLAGS: 00010002
        RAX: 0000000000000001 RBX: 00000000386aae88 RCX: 000000003e9f1120
        RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000001
        RBP: 000000000fd73fd8 R08: 00000000386aae88 R09: 0000000000000000
        R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000
        R13: ffffc0f040220000 R14: 0000000000000000 R15: 0000000000000000
        FS:  00007f21160ac940(0000) GS:ffff9cf23d500000(0000) knlGS:0000000000000000
        CS:  0008 DS: 0018 ES: 0018 CR0: 0000000080050033
        CR2: 00000000386aae88 CR3: 000000000fd6c004 CR4: 00000000003606e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
        Modules linked in:
        CR2: 00000000386aae88
        ---[ end trace a8bfbd202e712834 ]---
      
      Let's fix this by remapping text and rodata individually, and leave the
      gaps mapped read-write.
      
      Fixes: d9e3d2c4 ("efi/x86: Don't map the entire kernel text RW for mixed mode")
      Reported-by: default avatarJiri Slaby <jslaby@suse.cz>
      Tested-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarArd Biesheuvel <ardb@kernel.org>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20200409130434.6736-10-ardb@kernel.org
      f6103162
    • Gary Lin's avatar
      efi/x86: Fix the deletion of variables in mixed mode · a4b81ccf
      Gary Lin authored
      efi_thunk_set_variable() treated the NULL "data" pointer as an invalid
      parameter, and this broke the deletion of variables in mixed mode.
      This commit fixes the check of data so that the userspace program can
      delete a variable in mixed mode.
      
      Fixes: 8319e9d5 ("efi/x86: Handle by-ref arguments covering multiple pages in mixed mode")
      Signed-off-by: default avatarGary Lin <glin@suse.com>
      Signed-off-by: default avatarArd Biesheuvel <ardb@kernel.org>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20200408081606.1504-1-glin@suse.com
      Link: https://lore.kernel.org/r/20200409130434.6736-9-ardb@kernel.org
      a4b81ccf
    • Ard Biesheuvel's avatar
      efi/libstub/file: Merge file name buffers to reduce stack usage · 464fb126
      Ard Biesheuvel authored
      Arnd reports that commit
      
        9302c1bb ("efi/libstub: Rewrite file I/O routine")
      
      reworks the file I/O routines in a way that triggers the following
      warning:
      
        drivers/firmware/efi/libstub/file.c:240:1: warning: the frame size
                  of 1200 bytes is larger than 1024 bytes [-Wframe-larger-than=]
      
      We can work around this issue dropping an instance of efi_char16_t[256]
      from the stack frame, and reusing the 'filename' field of the file info
      struct that we use to obtain file information from EFI (which contains
      the file name even though we already know it since we used it to open
      the file in the first place)
      Reported-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarArd Biesheuvel <ardb@kernel.org>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20200409130434.6736-8-ardb@kernel.org
      464fb126
    • Ard Biesheuvel's avatar
      Documentation/x86, efi/x86: Clarify EFI handover protocol and its requirements · 8b84769a
      Ard Biesheuvel authored
      The EFI handover protocol was introduced on x86 to permit the boot
      loader to pass a populated boot_params structure as an additional
      function argument to the entry point. This allows the bootloader to
      pass the base and size of a initrd image, which is more flexible
      than relying on the EFI stub's file I/O routines, which can only
      access the file system from which the kernel image itself was loaded
      from firmware.
      
      This approach requires a fair amount of internal knowledge regarding
      the layout of the boot_params structure on the part of the boot loader,
      as well as knowledge regarding the allowed placement of the initrd in
      memory, and so it has been deprecated in favour of a new initrd loading
      method that is based on existing UEFI protocols and best practices.
      
      So update the x86 boot protocol documentation to clarify that the EFI
      handover protocol has been deprecated, and while at it, add a note that
      invoking the EFI handover protocol still requires the PE/COFF image to
      be loaded properly (as opposed to simply being copied into memory).
      Also, drop the code32_start header field from the list of values that
      need to be provided, as this is no longer required.
      Reviewed-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarArd Biesheuvel <ardb@kernel.org>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20200409130434.6736-7-ardb@kernel.org
      8b84769a