1. 23 Jun, 2021 21 commits
  2. 22 Jun, 2021 2 commits
    • Thomas Gleixner's avatar
      x86/fpu: Make init_fpstate correct with optimized XSAVE · f9dfb5e3
      Thomas Gleixner authored
      The XSAVE init code initializes all enabled and supported components with
      XRSTOR(S) to init state. Then it XSAVEs the state of the components back
      into init_fpstate which is used in several places to fill in the init state
      of components.
      
      This works correctly with XSAVE, but not with XSAVEOPT and XSAVES because
      those use the init optimization and skip writing state of components which
      are in init state. So init_fpstate.xsave still contains all zeroes after
      this operation.
      
      There are two ways to solve that:
      
         1) Use XSAVE unconditionally, but that requires to reshuffle the buffer when
            XSAVES is enabled because XSAVES uses compacted format.
      
         2) Save the components which are known to have a non-zero init state by other
            means.
      
      Looking deeper, #2 is the right thing to do because all components the
      kernel supports have all-zeroes init state except the legacy features (FP,
      SSE). Those cannot be hard coded because the states are not identical on all
      CPUs, but they can be saved with FXSAVE which avoids all conditionals.
      
      Use FXSAVE to save the legacy FP/SSE components in init_fpstate along with
      a BUILD_BUG_ON() which reminds developers to validate that a newly added
      component has all zeroes init state. As a bonus remove the now unused
      copy_xregs_to_kernel_booting() crutch.
      
      The XSAVE and reshuffle method can still be implemented in the unlikely
      case that components are added which have a non-zero init state and no
      other means to save them. For now, FXSAVE is just simple and good enough.
      
        [ bp: Fix a typo or two in the text. ]
      
      Fixes: 6bad06b7 ("x86, xsave: Use xsaveopt in context-switch path when supported")
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Reviewed-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210618143444.587311343@linutronix.de
      f9dfb5e3
    • Thomas Gleixner's avatar
      x86/fpu: Preserve supervisor states in sanitize_restored_user_xstate() · 9301982c
      Thomas Gleixner authored
      sanitize_restored_user_xstate() preserves the supervisor states only
      when the fx_only argument is zero, which allows unprivileged user space
      to put supervisor states back into init state.
      
      Preserve them unconditionally.
      
       [ bp: Fix a typo or two in the text. ]
      
      Fixes: 5d6b6a6f ("x86/fpu/xstate: Update sanitize_restored_xstate() for supervisor xstates")
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210618143444.438635017@linutronix.de
      9301982c
  3. 20 Jun, 2021 4 commits
  4. 19 Jun, 2021 13 commits
    • Linus Torvalds's avatar
      Merge tag 'powerpc-5.13-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · b84a7c28
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "Fix initrd corruption caused by our recent change to use relative jump
        labels.
      
        Fix a crash using perf record on systems without a hardware PMU
        backend.
      
        Rework our 64-bit signal handling slighty to make it more closely
        match the old behaviour, after the recent change to use unsafe user
        accessors.
      
        Thanks to Anastasia Kovaleva, Athira Rajeev, Christophe Leroy, Daniel
        Axtens, Greg Kurz, and Roman Bolshakov"
      
      * tag 'powerpc-5.13-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/perf: Fix crash in perf_instruction_pointer() when ppmu is not set
        powerpc: Fix initrd corruption with relative jump labels
        powerpc/signal64: Copy siginfo before changing regs->nip
        powerpc/mem: Add back missing header to fix 'no previous prototype' error
      b84a7c28
    • Linus Torvalds's avatar
      Merge tag 'perf-tools-fixes-for-v5.13-2021-06-19' of... · 913ec3c2
      Linus Torvalds authored
      Merge tag 'perf-tools-fixes-for-v5.13-2021-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tools fixes from Arnaldo Carvalho de Melo:
      
       - Fix refcount usage when processing PERF_RECORD_KSYMBOL.
      
       - 'perf stat' metric group fixes.
      
       - Fix 'perf test' non-bash issue with stat bpf counters.
      
       - Update unistd, in.h and socket.h with the kernel sources, silencing
         perf build warnings.
      
      * tag 'perf-tools-fixes-for-v5.13-2021-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
        tools headers UAPI: Sync linux/in.h copy with the kernel sources
        tools headers UAPI: Sync asm-generic/unistd.h with the kernel original
        perf beauty: Update copy of linux/socket.h with the kernel sources
        perf test: Fix non-bash issue with stat bpf counters
        perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL
        perf metricgroup: Return error code from metricgroup__add_metric_sys_event_iter()
        perf metricgroup: Fix find_evsel_group() event selector
      913ec3c2
    • Linus Torvalds's avatar
      Merge tag 'riscv-for-linus-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · d9403d30
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
      
       - A build fix to always build modules with the 'medany' code model, as
         the module loader doesn't support 'medlow'.
      
       - A Kconfig warning fix for the SiFive errata.
      
       - A pair of fixes that for regressions to the recent memory layout
         changes.
      
       - A fix for the FU740 device tree.
      
      * tag 'riscv-for-linus-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: dts: fu740: fix cache-controller interrupts
        riscv: Ensure BPF_JIT_REGION_START aligned with PMD size
        riscv: kasan: Fix MODULES_VADDR evaluation due to local variables' name
        riscv: sifive: fix Kconfig errata warning
        riscv32: Use medany C model for modules
      d9403d30
    • Linus Torvalds's avatar
      Merge tag 's390-5.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · e14c779a
      Linus Torvalds authored
      Pull s390 fixes from Vasily Gorbik:
      
       - Fix zcrypt ioctl hang due to AP queue msg counter dropping below 0
         when pending requests are purged.
      
       - Two fixes for the machine check handler in the entry code.
      
      * tag 's390-5.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/ap: Fix hanging ioctl caused by wrong msg counter
        s390/mcck: fix invalid KVM guest condition check
        s390/mcck: fix calculation of SIE critical section size
      e14c779a
    • Arnaldo Carvalho de Melo's avatar
      tools headers UAPI: Sync linux/in.h copy with the kernel sources · 1792a59e
      Arnaldo Carvalho de Melo authored
      To pick the changes in:
      
        32182747 ("icmp: don't send out ICMP messages with a source address of 0.0.0.0")
      
      That don't result in any change in tooling, as INADDR_ are not used to
      generate id->string tables used by 'perf trace'.
      
      This addresses this build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/in.h' differs from latest version at 'include/uapi/linux/in.h'
        diff -u tools/include/uapi/linux/in.h include/uapi/linux/in.h
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      1792a59e
    • Arnaldo Carvalho de Melo's avatar
      tools headers UAPI: Sync asm-generic/unistd.h with the kernel original · 17d27fc3
      Arnaldo Carvalho de Melo authored
      To pick the changes in:
      
        8b1462b6 ("quota: finish disable quotactl_path syscall")
      
      Those headers are used in some arches to generate the syscall table used
      in 'perf trace' to translate syscall numbers into strings.
      
      This addresses this perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
        diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Marcin Juszkiewicz <marcin@juszkiewicz.com.pl>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      17d27fc3
    • Arnaldo Carvalho de Melo's avatar
      perf beauty: Update copy of linux/socket.h with the kernel sources · ef83f9ef
      Arnaldo Carvalho de Melo authored
      To pick the changes in:
      
        ea6932d7 ("net: make get_net_ns return error if NET_NS is disabled")
      
      That don't result in any changes in the tables generated from that
      header.
      
      This silences this perf build warning:
      
        Warning: Kernel ABI header at 'tools/perf/trace/beauty/include/linux/socket.h' differs from latest version at 'include/linux/socket.h'
        diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h
      
      Cc: Changbin Du <changbin.du@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      ef83f9ef
    • Ian Rogers's avatar
      perf test: Fix non-bash issue with stat bpf counters · 482698c2
      Ian Rogers authored
      $(( .. )) is a bash feature but the test's interpreter is !/bin/sh,
      switch the code to use expr.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: bpf@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20210617184216.2075588-1-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      482698c2
    • Riccardo Mancini's avatar
      perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL · c087e948
      Riccardo Mancini authored
      ASan reported a memory leak of BPF-related ksymbols map and dso. The
      leak is caused by refount never reaching 0, due to missing __put calls
      in the function machine__process_ksymbol_register.
      
      Once the dso is inserted in the map, dso__put() should be called
      (map__new2() increases the refcount to 2).
      
      The same thing applies for the map when it's inserted into maps
      (maps__insert() increases the refcount to 2).
      
        $ sudo ./perf record -- sleep 5
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.025 MB perf.data (8 samples) ]
      
        =================================================================
        ==297735==ERROR: LeakSanitizer: detected memory leaks
      
        Direct leak of 6992 byte(s) in 19 object(s) allocated from:
            #0 0x4f43c7 in calloc (/home/user/linux/tools/perf/perf+0x4f43c7)
            #1 0x8e4e53 in map__new2 /home/user/linux/tools/perf/util/map.c:216:20
            #2 0x8cf68c in machine__process_ksymbol_register /home/user/linux/tools/perf/util/machine.c:778:10
            [...]
      
        Indirect leak of 8702 byte(s) in 19 object(s) allocated from:
            #0 0x4f43c7 in calloc (/home/user/linux/tools/perf/perf+0x4f43c7)
            #1 0x8728d7 in dso__new_id /home/user/linux/tools/perf/util/dso.c:1256:20
            #2 0x872015 in dso__new /home/user/linux/tools/perf/util/dso.c:1295:9
            #3 0x8cf623 in machine__process_ksymbol_register /home/user/linux/tools/perf/util/machine.c:774:21
            [...]
      
        Indirect leak of 1520 byte(s) in 19 object(s) allocated from:
            #0 0x4f43c7 in calloc (/home/user/linux/tools/perf/perf+0x4f43c7)
            #1 0x87b3da in symbol__new /home/user/linux/tools/perf/util/symbol.c:269:23
            #2 0x888954 in map__process_kallsym_symbol /home/user/linux/tools/perf/util/symbol.c:710:8
            [...]
      
        Indirect leak of 1406 byte(s) in 19 object(s) allocated from:
            #0 0x4f43c7 in calloc (/home/user/linux/tools/perf/perf+0x4f43c7)
            #1 0x87b3da in symbol__new /home/user/linux/tools/perf/util/symbol.c:269:23
            #2 0x8cfbd8 in machine__process_ksymbol_register /home/user/linux/tools/perf/util/machine.c:803:8
            [...]
      Signed-off-by: default avatarRiccardo Mancini <rickyman7@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tommi Rantala <tommi.t.rantala@nokia.com>
      Link: http://lore.kernel.org/lkml/20210612173751.188582-1-rickyman7@gmail.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      c087e948
    • John Garry's avatar
      perf metricgroup: Return error code from metricgroup__add_metric_sys_event_iter() · fe7a98b9
      John Garry authored
      The error code is not set at all in the sys event iter function.
      
      This may lead to an uninitialized value of "ret" in
      metricgroup__add_metric() when no CPU metric is added.
      
      Fix by properly setting the error code.
      
      It is not necessary to init "ret" to 0 in metricgroup__add_metric(), as
      if we have no CPU or sys event metric matching, then "has_match" should
      be 0 and "ret" is set to -EINVAL.
      
      However gcc cannot detect that it may not have been set after the
      map_for_each_metric() loop for CPU metrics, which is strange.
      
      Fixes: be335ec2 ("perf metricgroup: Support adding metrics for system PMUs")
      Signed-off-by: default avatarJohn Garry <john.garry@huawei.com>
      Acked-by: default avatarIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/1623335580-187317-3-git-send-email-john.garry@huawei.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      fe7a98b9
    • John Garry's avatar
      perf metricgroup: Fix find_evsel_group() event selector · fc96ec4d
      John Garry authored
      The following command segfaults on my x86 broadwell:
      
        $ ./perf stat  -M frontend_bound,retiring,backend_bound,bad_speculation sleep 1
        WARNING: grouped events cpus do not match, disabling group:
          anon group { raw 0x10e }
          anon group { raw 0x10e }
        perf: util/evsel.c:1596: get_group_fd: Assertion `!(!leader->core.fd)' failed.
        Aborted (core dumped)
      
      The issue shows itself as a use-after-free in evlist__check_cpu_maps(),
      whereby the leader of an event selector (evsel) has been deleted (yet we
      still attempt to verify for an evsel).
      
      Fundamentally the problem comes from metricgroup__setup_events() ->
      find_evsel_group(), and has developed from the previous fix attempt in
      commit 9c880c24 ("perf metricgroup: Fix for metrics containing
      duration_time").
      
      The problem now is that the logic in checking if an evsel is in the same
      group is subtly broken for the "cycles" event. For the "cycles" event,
      the pmu_name is NULL; however the logic in find_evsel_group() may set an
      event matched against "cycles" as used, when it should not be.
      
      This leads to a condition where an evsel is set, yet its leader is not.
      
      Fix the check for evsel pmu_name by not matching evsels when either has a
      NULL pmu_name.
      
      There is still a pre-existing metric issue whereby the ordering of the
      metrics may break the 'stat' function, as discussed at:
      https://lore.kernel.org/lkml/49c6fccb-b716-1bf0-18a6-cace1cdb66b9@huawei.com/
      
      Fixes: 9c880c24 ("perf metricgroup: Fix for metrics containing duration_time")
      Signed-off-by: default avatarJohn Garry <john.garry@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> # On a Thinkpad T450S
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/1623335580-187317-2-git-send-email-john.garry@huawei.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      fc96ec4d
    • David Abdurachmanov's avatar
      riscv: dts: fu740: fix cache-controller interrupts · 7ede12b0
      David Abdurachmanov authored
      The order of interrupt numbers is incorrect.
      
      The order for FU740 is: DirError, DataError, DataFail, DirFail
      
      From SiFive FU740-C000 Manual:
      19 - L2 Cache DirError
      20 - L2 Cache DirFail
      21 - L2 Cache DataError
      22 - L2 Cache DataFail
      Signed-off-by: default avatarDavid Abdurachmanov <david.abdurachmanov@sifive.com>
      Signed-off-by: default avatarPalmer Dabbelt <palmerdabbelt@google.com>
      7ede12b0
    • Jisheng Zhang's avatar
      riscv: Ensure BPF_JIT_REGION_START aligned with PMD size · 3a02764c
      Jisheng Zhang authored
      Andreas reported commit fc850476 ("riscv: bpf: Avoid breaking W^X")
      breaks booting with one kind of defconfig, I reproduced a kernel panic
      with the defconfig:
      
      [    0.138553] Unable to handle kernel paging request at virtual address ffffffff81201220
      [    0.139159] Oops [#1]
      [    0.139303] Modules linked in:
      [    0.139601] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc5-default+ #1
      [    0.139934] Hardware name: riscv-virtio,qemu (DT)
      [    0.140193] epc : __memset+0xc4/0xfc
      [    0.140416]  ra : skb_flow_dissector_init+0x1e/0x82
      [    0.140609] epc : ffffffff8029806c ra : ffffffff8033be78 sp : ffffffe001647da0
      [    0.140878]  gp : ffffffff81134b08 tp : ffffffe001654380 t0 : ffffffff81201158
      [    0.141156]  t1 : 0000000000000002 t2 : 0000000000000154 s0 : ffffffe001647dd0
      [    0.141424]  s1 : ffffffff80a43250 a0 : ffffffff81201220 a1 : 0000000000000000
      [    0.141654]  a2 : 000000000000003c a3 : ffffffff81201258 a4 : 0000000000000064
      [    0.141893]  a5 : ffffffff8029806c a6 : 0000000000000040 a7 : ffffffffffffffff
      [    0.142126]  s2 : ffffffff81201220 s3 : 0000000000000009 s4 : ffffffff81135088
      [    0.142353]  s5 : ffffffff81135038 s6 : ffffffff8080ce80 s7 : ffffffff80800438
      [    0.142584]  s8 : ffffffff80bc6578 s9 : 0000000000000008 s10: ffffffff806000ac
      [    0.142810]  s11: 0000000000000000 t3 : fffffffffffffffc t4 : 0000000000000000
      [    0.143042]  t5 : 0000000000000155 t6 : 00000000000003ff
      [    0.143220] status: 0000000000000120 badaddr: ffffffff81201220 cause: 000000000000000f
      [    0.143560] [<ffffffff8029806c>] __memset+0xc4/0xfc
      [    0.143859] [<ffffffff8061e984>] init_default_flow_dissectors+0x22/0x60
      [    0.144092] [<ffffffff800010fc>] do_one_initcall+0x3e/0x168
      [    0.144278] [<ffffffff80600df0>] kernel_init_freeable+0x1c8/0x224
      [    0.144479] [<ffffffff804868a8>] kernel_init+0x12/0x110
      [    0.144658] [<ffffffff800022de>] ret_from_exception+0x0/0xc
      [    0.145124] ---[ end trace f1e9643daa46d591 ]---
      
      After some investigation, I think I found the root cause: commit
      2bfc6cd8 ("move kernel mapping outside of linear mapping") moves
      BPF JIT region after the kernel:
      
      | #define BPF_JIT_REGION_START	PFN_ALIGN((unsigned long)&_end)
      
      The &_end is unlikely aligned with PMD size, so the front bpf jit
      region sits with part of kernel .data section in one PMD size mapping.
      But kernel is mapped in PMD SIZE, when bpf_jit_binary_lock_ro() is
      called to make the first bpf jit prog ROX, we will make part of kernel
      .data section RO too, so when we write to, for example memset the
      .data section, MMU will trigger a store page fault.
      
      To fix the issue, we need to ensure the BPF JIT region is PMD size
      aligned. This patch acchieve this goal by restoring the BPF JIT region
      to original position, I.E the 128MB before kernel .text section. The
      modification to kasan_init.c is inspired by Alexandre.
      
      Fixes: fc850476 ("riscv: bpf: Avoid breaking W^X")
      Reported-by: default avatarAndreas Schwab <schwab@linux-m68k.org>
      Signed-off-by: default avatarJisheng Zhang <jszhang@kernel.org>
      Signed-off-by: default avatarPalmer Dabbelt <palmerdabbelt@google.com>
      3a02764c