1. 01 Jul, 2021 13 commits
    • tools headers UAPI: Sync linux/kvm.h with the kernel sources · e48f62ae
      Arnaldo Carvalho de Melo authored
      To pick the changes in:
      
        19238e75 ("kvm: x86: Allow userspace to handle emulation errors")
        cb082bfa ("KVM: stats: Add fd-based API to read binary stats data")
        b87cc116 ("KVM: PPC: Book3S HV: Add KVM_CAP_PPC_RPT_INVALIDATE capability")
        f0376edb ("KVM: arm64: Add ioctl to fetch/store tags in a guest")
        0dbb1123 ("KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercall")
        6dba9403 ("KVM: x86: Introduce KVM_GET_SREGS2 / KVM_SET_SREGS2")
        644f7067 ("KVM: x86: hyper-v: Introduce KVM_CAP_HYPERV_ENFORCE_CPUID")
      
      That automatically adds support for these new ioctls:
      
        $ tools/perf/trace/beauty/kvm_ioctl.sh > before
        $ cp include/uapi/linux/kvm.h tools/include/uapi/linux/kvm.h
        $ tools/perf/trace/beauty/kvm_ioctl.sh > after
        $ diff -u before after
        --- before	2021-07-01 13:42:07.006387354 -0300
        +++ after	2021-07-01 13:45:16.051649301 -0300
        @@ -95,6 +95,9 @@
         	[0xc9] = "XEN_HVM_SET_ATTR",
         	[0xca] = "XEN_VCPU_GET_ATTR",
         	[0xcb] = "XEN_VCPU_SET_ATTR",
        +	[0xcc] = "GET_SREGS2",
        +	[0xcd] = "SET_SREGS2",
        +	[0xce] = "GET_STATS_FD",
         	[0xe0] = "CREATE_DEVICE",
         	[0xe1] = "SET_DEVICE_ATTR",
         	[0xe2] = "GET_DEVICE_ATTR",
        $
      
      This silences these perf build warnings:
      
        Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h'
        diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
        Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
        diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
      
      Cc: Aaron Lewis <aaronlewis@google.com>
      Cc: Ashish Kalra <ashish.kalra@amd.com>
      Cc: Bharata B Rao <bharata@linux.ibm.com>
      Cc: Jing Zhang <jingzhangos@google.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Maxim Levitsky <mlevitsk@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools headers cpufeatures: Sync with the kernel sources · cc200a7d
      Arnaldo Carvalho de Melo authored
      To pick the changes from:
      
        1348924b ("x86/msr: Define new bits in TSX_FORCE_ABORT MSR")
        cbcddaa3 ("perf/x86/rapl: Use CPUID bit on AMD and Hygon parts")
      
      This only causes these perf files to be rebuilt:
      
        CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
        CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o
      
      And addresses this perf build warning:
      
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
        diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
      
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools include UAPI: Update linux/mount.h copy · 14c6ef2b
      Arnaldo Carvalho de Melo authored
      To pick the changes from:
      
        dd8b477f ("mount: Support "nosymfollow" in new mount api")
      
      That ends up adding support for the new MOUNT_ATTR_NOSYMFOLLOW mount
      attribute:
      
        $ tools/perf/trace/beauty/fsmount.sh > before
        $ cp include/uapi/linux/mount.h tools/include/uapi/linux/mount.h
        $ tools/perf/trace/beauty/fsmount.sh > after
        $ diff -u before after
        --- before	2021-07-01 13:34:04.542517355 -0300
        +++ after	2021-07-01 13:34:12.423694537 -0300
        @@ -7,4 +7,5 @@
         	[ilog2(0x00000020) + 1] = "STRICTATIME",
         	[ilog2(0x00000080) + 1] = "NODIRATIME",
         	[ilog2(0x00100000) + 1] = "IDMAP",
        +	[ilog2(0x00200000) + 1] = "NOSYMFOLLOW",
         };
        $
      
      So now one can use it in --filter expressions for tracepoints.
      
      This silences this perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/mount.h' differs from latest version at 'include/uapi/linux/mount.h'
        diff -u tools/include/uapi/linux/mount.h include/uapi/linux/mount.h
      
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
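
The generated table in the diff above stores each attribute name at slot ilog2(flag) + 1. A minimal sketch of how such a table could be used to turn a flags word into names follows. The table entries are copied from the diff; the helper names (sketch_ilog2, sketch_mount_attr_str) are hypothetical and are not the functions perf actually uses.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Integer log2 of a value with a single bit set (illustrative helper). */
static unsigned int sketch_ilog2(unsigned int v)
{
	unsigned int n = 0;

	while (v >>= 1)
		n++;
	return n;
}

/* Entries taken from the diff above: slot = ilog2(flag) + 1. */
static const char *const sketch_mount_attr_names[] = {
	[5 + 1]  = "STRICTATIME",	/* 0x00000020 */
	[7 + 1]  = "NODIRATIME",	/* 0x00000080 */
	[20 + 1] = "IDMAP",		/* 0x00100000 */
	[21 + 1] = "NOSYMFOLLOW",	/* 0x00200000 */
};

/* Write the known names set in 'flags' into 'buf', separated by '|'.
 * Bits without a table entry are skipped. Returns the string length. */
static size_t sketch_mount_attr_str(unsigned int flags, char *buf, size_t size)
{
	const size_t nr = sizeof(sketch_mount_attr_names) /
			  sizeof(sketch_mount_attr_names[0]);
	size_t len = 0;

	if (size == 0)
		return 0;
	buf[0] = '\0';

	for (unsigned int bit = 0; bit < 32; bit++) {
		unsigned int flag = 1u << bit;
		size_t slot = sketch_ilog2(flag) + 1;
		int n;

		if (!(flags & flag) || slot >= nr || !sketch_mount_attr_names[slot])
			continue;

		n = snprintf(buf + len, size - len, "%s%s",
			     len ? "|" : "", sketch_mount_attr_names[slot]);
		if (n < 0 || (size_t)n >= size - len) {
			buf[len] = '\0';	/* drop a partially written name */
			break;
		}
		len += (size_t)n;
	}
	return len;
}
```

With this layout, adding a new attribute only requires one more table entry, which is why syncing the header is enough for the tooling to pick up NOSYMFOLLOW.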
    • tools arch x86: Sync the msr-index.h copy with the kernel sources · 04df0dc1
      Arnaldo Carvalho de Melo authored
      To pick up the changes from these csets:
      
        1348924b ("x86/msr: Define new bits in TSX_FORCE_ABORT MSR")
      
      That causes no changes to tooling:
      
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
        $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
        $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
        $ diff -u before after
        $
      
      Just silences this perf build warning:
      
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
        diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
      
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf arm-spe: Don't wait for PERF_RECORD_EXIT event · 8941ba50
      Leo Yan authored
      When decoding Arm SPE trace, the decoder waits for the PERF_RECORD_EXIT
      event (the last perf event) before processing trace data. This is
      needless and might even cause logic errors, e.g. it might fail to
      correlate perf events with Arm SPE events correctly.

      So this patch removes the condition check for the PERF_RECORD_EXIT event.

      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: James Clark <james.clark@arm.com>
      Tested-by: James Clark <james.clark@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Al Grant <Al.Grant@arm.com>
      Cc: Dave Martin <Dave.Martin@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20210519071939.1598923-6-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf arm-spe: Bail out if the trace is later than perf event · afb5e9e4
      Leo Yan authored
      It's possible that a record in the Arm SPE trace is later than a perf
      event, and vice versa.  This requires the perf events and the Arm SPE
      synthesized events to be correlated and processed in correct time order.

      To achieve the time ordering, this patch reverses the flow: it first
      calls arm_spe_sample() and then calls arm_spe_decode().  By comparing
      timestamps, when it detects that the perf event comes earlier than the
      Arm SPE trace data, it bails out of the decoding loop; the last record
      is pushed onto the auxtrace stack and generating its sample is deferred.
      To track the timestamp, it updates the timestamp every time for the
      latest record.

      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: James Clark <james.clark@arm.com>
      Tested-by: James Clark <james.clark@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Al Grant <Al.Grant@arm.com>
      Cc: Dave Martin <Dave.Martin@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20210519071939.1598923-5-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf arm-spe: Assign kernel time to synthesized event · 85498f75
      Leo Yan authored
      The current code assigns the arch timer counter to the samples
      synthesized from Arm SPE trace, so the samples contain only the raw
      counter value rather than the kernel time.

      To fix the issue, this patch converts the timer counter to kernel time
      and assigns it to the sample timestamp.

      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: James Clark <james.clark@arm.com>
      Tested-by: James Clark <james.clark@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Al Grant <Al.Grant@arm.com>
      Cc: Dave Martin <Dave.Martin@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20210519071939.1598923-4-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf arm-spe: Convert event kernel time to counter value · 63051901
      Leo Yan authored
      When handling a perf event, the Arm SPE decoder needs to decide whether
      this perf event is earlier or later than the samples from Arm SPE trace
      data; to do the comparison, both times must use the same unit.

      This patch converts the event kernel time to the arch timer's counter
      value, so that it can be compared with the counter value contained in
      the Arm SPE Timestamp packet.

      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: James Clark <james.clark@arm.com>
      Tested-by: James Clark <james.clark@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Al Grant <Al.Grant@arm.com>
      Cc: Dave Martin <Dave.Martin@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20210519071939.1598923-3-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
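
The conversions between the timer counter and kernel time described in this commit and the one listed before it can be sketched as follows. This is a minimal illustration, assuming the clock parameters are the time_shift, time_mult and time_zero values documented for the perf_event_open(2) mmap page; the struct and function names here are hypothetical, not the ones in the perf sources.

```c
#include <stdint.h>

/* Hypothetical container for the TIME_CONV clock parameters. */
struct sketch_tsc_conversion {
	uint16_t time_shift;
	uint32_t time_mult;
	uint64_t time_zero;
};

/* Counter value -> kernel time in nanoseconds.
 * The counter is split into quotient and remainder so that the
 * multiplication does not overflow for large counter values. */
uint64_t sketch_counter_to_time(uint64_t cyc,
				const struct sketch_tsc_conversion *tc)
{
	uint64_t quot = cyc >> tc->time_shift;
	uint64_t rem = cyc & (((uint64_t)1 << tc->time_shift) - 1);

	return tc->time_zero + quot * tc->time_mult +
	       ((rem * tc->time_mult) >> tc->time_shift);
}

/* Kernel time in nanoseconds -> counter value (inverse of the above,
 * up to rounding). Assumes time_mult is non-zero. */
uint64_t sketch_time_to_counter(uint64_t ns,
				const struct sketch_tsc_conversion *tc)
{
	uint64_t t = ns - tc->time_zero;
	uint64_t quot = t / tc->time_mult;
	uint64_t rem = t % tc->time_mult;

	return (quot << tc->time_shift) +
	       (rem << tc->time_shift) / tc->time_mult;
}
```

Converting the perf event time into a counter value allows it to be compared directly with the value in an SPE Timestamp packet, while the opposite direction gives synthesized samples a timestamp in the same unit as other perf events.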
    • perf arm-spe: Save clock parameters from TIME_CONV event · c210c306
      Leo Yan authored
      During the recording phase, "perf record" tool synthesizes event
      PERF_RECORD_TIME_CONV for the hardware clock parameters and saves the
      event into the data file.
      
      Afterwards, when processing the data file, the TIME_CONV event is
      processed very early and is stored in the session context.

      This patch extracts these parameters from the session context and saves
      them into the structure "spe->tc" of type perf_tsc_conversion, so that
      the parameters are ready for conversion between clock counter and
      timestamp.

      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: James Clark <james.clark@arm.com>
      Tested-by: James Clark <james.clark@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Al Grant <Al.Grant@arm.com>
      Cc: Dave Martin <Dave.Martin@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20210519071939.1598923-2-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf cs-etm: Remove callback cs_etm_find_snapshot() · 2f01c200
      Leo Yan authored
      The callback cs_etm_find_snapshot() is invoked in snapshot mode.  Its
      main purpose is to find the correct AUX trace data and return "head"
      and "old" (we can call "old" the "old head") to the caller; the caller
      __auxtrace_mmap__read() uses these two pointers to decide the AUX trace
      data size.

      This patch removes cs_etm_find_snapshot() for the reasons below:

      - The first thing cs_etm_find_snapshot() does is check whether the head
        has wrapped around; if it has not, it bails out directly.  The check
        is pointless because the "head" and "old" pointers are both
        monotonically increasing, so they never wrap around.

      - cs_etm_find_snapshot() adjusts the "head" and "old" pointers and
        assumes the AUX ring buffer is fully filled with hardware trace
        data, so it always subtracts "mm->len" from "head" to get "old".
        If the snapshot is taken after a very short interval, the tracers
        fill only a small chunk of the AUX ring buffer with trace data; in
        this case, it is wrong to copy the whole AUX ring buffer to the
        perf file.

      - As the "head" and "old" pointers are monotonically increasing, the
        function __auxtrace_mmap__read() handles these two pointers properly.
        It calculates the remainders for these two pointers, and the size is
        clamped to never exceed "snapshot_size".  We can simply rely on
        __auxtrace_mmap__read() to calculate the correct result for data
        copying; an Arm CoreSight specific callback is not necessary.

      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: James Clark <james.clark@arm.com>
      Tested-by: James Clark <james.clark@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Daniel Kiss <daniel.kiss@arm.com>
      Cc: Denis Nikitin <denik@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suzuki Poulose <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: coresight@lists.linaro.org
      Link: http://lore.kernel.org/lkml/20210701093537.90759-3-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
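
The last point of the commit message above (remainders of two monotonically increasing pointers, with the size clamped to the snapshot size) can be illustrated with a simplified sketch. It is written from the description only and is not the body of __auxtrace_mmap__read(); the struct, function and parameter names are hypothetical, and it assumes buf_len is non-zero.

```c
#include <stdint.h>

/* Region of the AUX ring buffer to copy: start offset and length. */
struct sketch_aux_span {
	uint64_t start;
	uint64_t size;
};

/* 'old' and 'head' only ever grow; their remainders modulo the buffer
 * length give the offsets inside the ring buffer. */
struct sketch_aux_span sketch_aux_region(uint64_t old, uint64_t head,
					 uint64_t buf_len,
					 uint64_t snapshot_size)
{
	struct sketch_aux_span s = { 0, 0 };
	uint64_t head_off, old_off, size;

	if (head == old)
		return s;		/* nothing new was written */

	head_off = head % buf_len;
	old_off = old % buf_len;

	if (head_off > old_off)
		size = head_off - old_off;
	else
		size = buf_len - (old_off - head_off);	/* data wraps */

	if (size > snapshot_size)
		size = snapshot_size;	/* keep only the newest bytes */

	s.size = size;
	s.start = (head_off + buf_len - size) % buf_len;
	return s;
}
```

Because the size comes from the distance between the two offsets, a snapshot taken shortly after the previous one copies only the small chunk that was actually written rather than the whole buffer.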
    • perf bpf_counter: Move common functions to bpf_counter.h · d6a735ef
      Namhyung Kim authored
      Some helper functions will be used for cgroup counting too.  Move them
      to a header file for sharing.
      
      Committer notes:
      
      Fix the build on older systems with:
      
        -       struct bpf_map_info map_info = {0};
        +       struct bpf_map_info map_info = { .id = 0, };
      
      This wasn't breaking the build in such systems as bpf_counter.c isn't
      built due to:
      
      tools/perf/util/Build:
      
        perf-$(CONFIG_PERF_BPF_SKEL) += bpf_counter.o
      
      The bpf_counter.h file on the other hand is included from places that
      are built everywhere.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20210625071826.608504-4-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Add cgroup_is_v2() helper · 21bcc726
      Namhyung Kim authored
      cgroup_is_v2() checks whether the given subsystem is mounted on
      cgroup v2.  It'll be used by BPF cgroup code later.

      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20210625071826.608504-3-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Add read_cgroup_id() function · 69e874db
      Namhyung Kim authored
      read_cgroup_id() reads the cgroup id from a file handle obtained with
      name_to_handle_at(2) for the given cgroup.  It'll be used by bperf
      cgroup stat later.
      
      Committer notes:
      
        -int read_cgroup_id(struct cgroup *cgrp)
        +static inline int read_cgroup_id(struct cgroup *cgrp __maybe_unused)
      
      To fix the build when HAVE_FILE_HANDLE is not defined.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20210625071826.608504-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 30 Jun, 2021 8 commits
    • tools lib: Adopt bitmap_intersects() operation from the kernel sources · f20510d5
      Alexey Bayduraev authored
      Adopt the bitmap_intersects() routine, which tests whether bitmaps
      bitmap1 and bitmap2 intersect. This routine will be used during thread
      mask initialization.

      Signed-off-by: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Namhyung Kim <namhyung@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Budankov <abudankov@huawei.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Link: http://lore.kernel.org/lkml/f75aa738d8ff8f9cffd7532d671f3ef3deb97a7c.1625065643.git.alexey.v.bayduraev@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
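
What "two bitmaps intersect" means can be shown with a small userspace sketch: the bitmaps intersect if any of the first nbits bits is set in both. This is an illustration under that definition, not the kernel or tools/lib implementation; the function name and the SKETCH_BITS_PER_LONG macro are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

#define SKETCH_BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Return true if any of the first 'nbits' bits is set in both bitmaps. */
bool sketch_bitmap_intersects(const unsigned long *bitmap1,
			      const unsigned long *bitmap2,
			      unsigned int nbits)
{
	unsigned int full = nbits / SKETCH_BITS_PER_LONG;
	unsigned int tail = nbits % SKETCH_BITS_PER_LONG;

	/* Whole words: a non-zero AND means a common bit. */
	for (unsigned int i = 0; i < full; i++)
		if (bitmap1[i] & bitmap2[i])
			return true;

	/* Partial last word: ignore bits at or above 'nbits'. */
	if (tail) {
		unsigned long mask = (1UL << tail) - 1;

		if (bitmap1[full] & bitmap2[full] & mask)
			return true;
	}
	return false;
}
```

Masking the last word matters because bits beyond nbits may hold stale data and must not count as an intersection.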
    • Arnaldo Carvalho de Melo's avatar
      857286e4
    • Merge tag 'dlm-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm · 007b350a
      Linus Torvalds authored
      Pull dlm updates from David Teigland:
       "This is a major dlm networking enhancement that adds message
        retransmission so that the dlm can reliably continue operating when
        network connections fail and nodes reconnect.
      
        Previously, this would result in lost messages which could only be
        handled as a node failure"
      
      * tag 'dlm-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: (26 commits)
        fs: dlm: invalid buffer access in lookup error
        fs: dlm: fix race in mhandle deletion
        fs: dlm: rename socket and app buffer defines
        fs: dlm: introduce proto values
        fs: dlm: move dlm allow conn
        fs: dlm: use alloc_ordered_workqueue
        fs: dlm: fix memory leak when fenced
        fs: dlm: fix lowcomms_start error case
        fs: dlm: Fix spelling mistake "stucked" -> "stuck"
        fs: dlm: Fix memory leak of object mh
        fs: dlm: don't allow half transmitted messages
        fs: dlm: add midcomms debugfs functionality
        fs: dlm: add reliable connection if reconnect
        fs: dlm: add union in dlm header for lockspace id
        fs: dlm: move out some hash functionality
        fs: dlm: add functionality to re-transmit a message
        fs: dlm: make buffer handling per msg
        fs: dlm: add more midcomms hooks
        fs: dlm: public header in out utility
        fs: dlm: fix connection tcp EOF handling
        ...
    • Merge tag 'gfs2-v5.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · 8418dabd
      Linus Torvalds authored
      Pull gfs2 updates from Andreas Gruenbacher:
       "Various minor gfs2 cleanups and fixes"
      
      * tag 'gfs2-v5.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
        gfs2: Clean up gfs2_unstuff_dinode
        gfs2: Unstuff before locking page in gfs2_page_mkwrite
        gfs2: Clean up the error handling in gfs2_page_mkwrite
        gfs2: Fix error handling in init_statfs
        gfs2: Fix underflow in gfs2_page_mkwrite
        gfs2: Use list_move_tail instead of list_del/list_add_tail
        gfs2: Fix do_gfs2_set_flags description
    • Merge tag '5.14-rc-smb3-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6 · bbd91626
      Linus Torvalds authored
      Pull cifs updates from Steve French:
      
       - improve fallocate emulation
      
       - DFS fixes
      
       - minor multichannel fixes
      
       - various cleanup patches, many to address Coverity warnings
      
      * tag '5.14-rc-smb3-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: (38 commits)
        smb3: prevent races updating CurrentMid
        cifs: fix missing spinlock around update to ses->status
        cifs: missing null pointer check in cifs_mount
        smb3: fix possible access to uninitialized pointer to DACL
        cifs: missing null check for newinode pointer
        cifs: remove two cases where rc is set unnecessarily in sid_to_id
        SMB3: Add new info level for query directory
        cifs: fix NULL dereference in smb2_check_message()
        smbdirect: missing rc checks while waiting for rdma events
        cifs: Avoid field over-reading memcpy()
        smb311: remove dead code for non compounded posix query info
        cifs: fix SMB1 error path in cifs_get_file_info_unix
        smb3: fix uninitialized value for port in witness protocol move
        cifs: fix unneeded null check
        cifs: use SPDX-Licence-Identifier
        cifs: convert list_for_each to entry variant in cifs_debug.c
        cifs: convert list_for_each to entry variant in smb2misc.c
        cifs: avoid extra calls in posix_info_parse
        cifs: retry lookup and readdir when EAGAIN is returned.
        cifs: fix check of dfs interlinks
        ...
    • Merge tag 'fs.openat2.unknown_flags.v5.14' of... · b97902b6
      Linus Torvalds authored
      Merge tag 'fs.openat2.unknown_flags.v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
      
      Pull openat2 fixes from Christian Brauner:
      
       - Remove the unused VALID_UPGRADE_FLAGS define we carried from an
         extension to openat2() that we haven't merged. Aleksa might be
         getting back to it at some point but just not right now.
      
       - openat2() used to accidentally ignore unknown flag values in the upper
         32 bits.
      
         The new openat2() syscall verifies that no unknown O-flag values are
         set and returns an error to userspace if they are while the older
         open syscalls like open() and openat() simply ignore unknown flag
         values:
      
            #define O_FLAG_CURRENTLY_INVALID (1 << 31)
            struct open_how how = {
                  .flags = O_RDONLY | O_FLAG_CURRENTLY_INVALID,
                  .resolve = 0,
            };
      
            /* fails */
            fd = openat2(-EBADF, "/dev/null", &how, sizeof(how));
      
            /* succeeds */
            fd = openat(-EBADF, "/dev/null", O_RDONLY | O_FLAG_CURRENTLY_INVALID);
      
         However, openat2() silently truncates the upper 32 bits meaning:
      
            #define O_FLAG_CURRENTLY_INVALID_LOWER32 (1 << 31)
            #define O_FLAG_CURRENTLY_INVALID_UPPER32 (1ULL << 40)

            struct open_how how_lower32 = {
                  .flags = O_RDONLY | O_FLAG_CURRENTLY_INVALID_LOWER32,
            };
      
            struct open_how how_upper32 = {
                  .flags = O_RDONLY | O_FLAG_CURRENTLY_INVALID_UPPER32,
            };
      
            /* fails */
            fd = openat2(-EBADF, "/dev/null", &how_lower32, sizeof(how_lower32));
      
            /* succeeds */
            fd = openat2(-EBADF, "/dev/null", &how_upper32, sizeof(how_upper32));
      
         Fix this by preventing the immediate truncation in build_open_flags()
         and add a compile-time check to catch when we add flags in the upper
         32 bit range.
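
The difference between validating after truncation and validating the full 64-bit value can be sketched in plain C. This is an illustration of the failure mode only: the mask SKETCH_VALID_FLAGS and both function names are hypothetical, and the code does not reproduce build_open_flags().

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical set of accepted flag bits, all within the low 31 bits. */
#define SKETCH_VALID_FLAGS UINT64_C(0x7fffffff)

/* Buggy order: the 64-bit value is truncated to 32 bits first, so any
 * unknown bit in the upper half is lost before the check runs. */
int sketch_check_truncating(uint64_t how_flags)
{
	uint32_t flags = (uint32_t)how_flags;

	if (flags & ~(uint32_t)SKETCH_VALID_FLAGS)
		return -EINVAL;
	return 0;
}

/* Fixed order: validate the full 64-bit value before narrowing it. */
int sketch_check_full(uint64_t how_flags)
{
	if (how_flags & ~SKETCH_VALID_FLAGS)
		return -EINVAL;
	return 0;
}
```

An unknown bit at position 31 is rejected by both variants, while an unknown bit at position 40 is rejected only by the second one, which mirrors the two openat2() calls shown in the message above.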
      
      * tag 'fs.openat2.unknown_flags.v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        test: add openat2() test for invalid upper 32 bit flag value
        open: don't silently ignore unknown O-flags in openat2()
        fcntl: remove unused VALID_UPGRADE_FLAGS
    • Merge tag 'fs.mount_setattr.nosymfollow.v5.14' of... · 30d1a556
      Linus Torvalds authored
      Merge tag 'fs.mount_setattr.nosymfollow.v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
      
      Pull mount_setattr updates from Christian Brauner:
       "A few releases ago the old mount API gained support for a mount
        option which prevents following symlinks on a given mount. This adds
        support for it in the new mount api through the MOUNT_ATTR_NOSYMFOLLOW
        flag via mount_setattr() and fsmount(). With mount_setattr() that flag
        can even be applied recursively.
      
        There's an additional ack from Ross Zwisler who originally authored
        the nosymfollow patch. As I've already had the patches in my for-next
        I didn't add his ack explicitly"
      
      * tag 'fs.mount_setattr.nosymfollow.v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        tests: test MOUNT_ATTR_NOSYMFOLLOW with mount_setattr()
        mount: Support "nosymfollow" in new mount api
    • Merge branch 'akpm' (patches from Andrew) · 65090f30
      Linus Torvalds authored
      Merge misc updates from Andrew Morton:
       "191 patches.
      
        Subsystems affected by this patch series: kthread, ia64, scripts,
        ntfs, squashfs, ocfs2, kernel/watchdog, and mm (gup, pagealloc, slab,
        slub, kmemleak, dax, debug, pagecache, gup, swap, memcg, pagemap,
        mprotect, bootmem, dma, tracing, vmalloc, kasan, initialization,
        pagealloc, and memory-failure)"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (191 commits)
        mm,hwpoison: make get_hwpoison_page() call get_any_page()
        mm,hwpoison: send SIGBUS with error virutal address
        mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes
        mm/page_alloc: allow high-order pages to be stored on the per-cpu lists
        mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM
        mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA
        docs: remove description of DISCONTIGMEM
        arch, mm: remove stale mentions of DISCONIGMEM
        mm: remove CONFIG_DISCONTIGMEM
        m68k: remove support for DISCONTIGMEM
        arc: remove support for DISCONTIGMEM
        arc: update comment about HIGHMEM implementation
        alpha: remove DISCONTIGMEM and NUMA
        mm/page_alloc: move free_the_page
        mm/page_alloc: fix counting of managed_pages
        mm/page_alloc: improve memmap_pages dbg msg
        mm: drop SECTION_SHIFT in code comments
        mm/page_alloc: introduce vm.percpu_pagelist_high_fraction
        mm/page_alloc: limit the number of pages on PCP lists when reclaim is active
        mm/page_alloc: scale the number of pages that are batch freed
        ...
  3. 29 Jun, 2021 19 commits
    • Merge tag 'devprop-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 349a2d52
      Linus Torvalds authored
      Pull device properties framework updates from Rafael Wysocki:
       "These unify device properties access in some pieces of code and make
        related changes.
      
        Specifics:
      
         - Handle device properties with software node API in the ACPI IORT
           table parsing code (Heikki Krogerus).
      
         - Unify of_node access in the common device properties code, constify
           the acpi_dma_supported() argument pointer and fix up CONFIG_ACPI=n
           stubs of some functions related to device properties (Andy
           Shevchenko)"
      
      * tag 'devprop-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        device property: Unify access to of_node
        ACPI: scan: Constify acpi_dma_supported() helper function
        ACPI: property: Constify stubs for CONFIG_ACPI=n case
        ACPI: IORT: Handle device properties with software node API
        device property: Retrieve fwnode from of_node via accessor
    • Merge tag 'pnp-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 72ad9f9d
      Linus Torvalds authored
      Pull PNP updates from Rafael Wysocki:
       "These get rid of unnecessary local variables and function, reduce code
        duplication and clean up message printing.
      
        Specifics:
      
         - Remove unnecessary local variables from isapnp_proc_attach_device()
           (Anupama K Patil).
      
         - Make the callers of pnp_alloc() use kzalloc() directly and drop the
           former (Heiner Kallweit).
      
         - Make two pieces of code use dev_dbg() instead of dev_printk() with
           the KERN_DEBUG message level (Heiner Kallweit).
      
         - Use DEVICE_ATTR_RO() instead of full DEVICE_ATTR() in some places
           in card.c (Zhen Lei).
      
         - Use list_for_each_entry() instead of list_for_each() in
           insert_device() (Zou Wei)"
      
      * tag 'pnp-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PNP: pnpbios: Use list_for_each_entry() instead of list_for_each()
        PNP: use DEVICE_ATTR_RO macro
        PNP: Switch over to dev_dbg()
        PNP: Remove pnp_alloc()
        drivers: pnp: isapnp: proc.c: Remove unnecessary local variables
      72ad9f9d
    • Linus Torvalds's avatar
      Merge tag 'acpi-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 5e692824
      Linus Torvalds authored
      Pull ACPI updates from Rafael Wysocki:
       "These update the ACPICA code in the kernel to the 20210604 upstream
        revision, add preliminary support for the Platform Runtime Mechanism
        (PRM), address issues related to the handling of device dependencies
        in the ACPI device enumeration code, improve the tracking of ACPI
        power resource states, improve the ACPI support for suspend-to-idle on
        AMD systems, continue the unification of message printing in the ACPI
        code, address assorted issues and clean up the code in a number of
        places.
      
        Specifics:
      
         - Update ACPICA code in the kernel to upstream revision 20210604
           including the following changes:
      
            - Add defines for the CXL Host Bridge Structure and add the
              CFMWS structure definition to CEDT (Alison Schofield).
            - iASL: Finish support for the IVRS ACPI table (Bob Moore).
            - iASL: Add support for the SVKL table (Bob Moore).
            - iASL: Add full support for RGRT ACPI table (Bob Moore).
            - iASL: Add support for the BDAT ACPI table (Bob Moore).
            - iASL: add disassembler support for PRMT (Erik Kaneda).
            - Fix memory leak caused by _CID repair function (Erik Kaneda).
            - Add support for PlatformRtMechanism OpRegion (Erik Kaneda).
            - Add PRMT module header to facilitate parsing (Erik Kaneda).
            - Add _PLD panel positions (Fabian Wüthrich).
            - MADT: add Multiprocessor Wakeup Mailbox Structure and the SVKL
              table headers (Kuppuswamy Sathyanarayanan).
            - Use ACPI_FALLTHROUGH (Wei Ming Chen).
      
         - Add preliminary support for the Platform Runtime Mechanism (PRM) to
           allow the AML interpreter to call PRM functions (Erik Kaneda).
      
         - Address some issues related to the handling of device dependencies
           reported by _DEP in the ACPI device enumeration code and clean up
           some related pieces of it (Rafael Wysocki).
      
         - Improve the tracking of states of ACPI power resources (Rafael
           Wysocki).
      
         - Improve ACPI support for suspend-to-idle on AMD systems (Alex
           Deucher, Mario Limonciello, Pratik Vishwakarma).
      
         - Continue the unification and cleanup of message printing in the
           ACPI code (Hanjun Guo, Heiner Kallweit).
      
         - Fix possible buffer overrun issue with the description_show() sysfs
           attribute method (Krzysztof Wilczyński).
      
         - Improve the acpi_mask_gpe kernel command line parameter handling
           and clean up the core ACPI code related to sysfs (Andy Shevchenko,
           Baokun Li, Clayton Casciato).
      
         - Postpone bringing devices in the general ACPI PM domain to D0
           during resume from system-wide suspend until they are really needed
           (Dmitry Torokhov).
      
         - Make the ACPI processor driver fix up C-state latency if not
           ordered (Mario Limonciello).
      
         - Add support for identifying devices depending on the given one that
           are not its direct descendants with the help of _DEP (Daniel
           Scally).
      
         - Extend the checks related to ACPI IRQ overrides on x86 in order to
           avoid false-positives (Hui Wang).
      
         - Add battery DPTF participant for Intel SoCs (Sumeet Pawnikar).
      
         - Rearrange the ACPI fan driver and device power management code to
           use a common list of device IDs (Rafael Wysocki).
      
         - Fix clang CFI violation in the ACPI BGRT table parsing code and
           clean it up (Nathan Chancellor).
      
         - Add GPE-related quirks for some laptops to the EC driver (Chris
           Chiu, Zhang Rui).
      
         - Make the ACPI PPTT table parsing code populate the cache-id value
           if present in the firmware (James Morse).
      
         - Remove redundant clearing of context->ret.pointer from
           acpi_run_osc() (Hans de Goede).
      
         - Add missing acpi_put_table() in acpi_init_fpdt() (Jing Xiangfeng).
      
         - Make ACPI APEI handle ARM Processor Error CPER records like Memory
           Error ones to avoid user space task lockups (Xiaofei Tan).
      
         - Stop warning about disabled ACPI in APEI (Jon Hunter).
      
         - Fix fall-through warning for Clang in the SBSHC driver (Gustavo A.
           R. Silva).
      
         - Add custom DSDT file as Makefile prerequisite (Richard Fitzgerald).
      
         - Initialize local variable to avoid garbage being returned (Colin
           Ian King).
      
         - Simplify assorted pieces of code, address assorted coding style and
           documentation issues and comment typos (Baokun Li, Christophe
           JAILLET, Clayton Casciato, Liu Shixin, Shaokun Zhang, Wei Yongjun,
           Yang Li, Zhen Lei)"
      
      * tag 'acpi-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (97 commits)
        ACPI: PM: postpone bringing devices to D0 unless we need them
        ACPI: tables: Add custom DSDT file as makefile prerequisite
        ACPI: bgrt: Use sysfs_emit
        ACPI: bgrt: Fix CFI violation
        ACPI: EC: trust DSDT GPE for certain HP laptop
        ACPI: scan: Simplify acpi_table_events_fn()
        ACPI: PM: Adjust behavior for field problems on AMD systems
        ACPI: PM: s2idle: Add support for new Microsoft UUID
        ACPI: PM: s2idle: Add support for multiple func mask
        ACPI: PM: s2idle: Refactor common code
        ACPI: PM: s2idle: Use correct revision id
        ACPI: sysfs: Remove tailing return statement in void function
        ACPI: sysfs: Use __ATTR_RO() and __ATTR_RW() macros
        ACPI: sysfs: Sort headers alphabetically
        ACPI: sysfs: Refactor param_get_trace_state() to drop dead code
        ACPI: sysfs: Unify pattern of memory allocations
        ACPI: sysfs: Allow bitmap list to be supplied to acpi_mask_gpe
        ACPI: sysfs: Make sparse happy about address space in use
        ACPI: scan: Fix race related to dropping dependencies
        ACPI: scan: Reorganize acpi_device_add()
        ...
      5e692824
    • Linus Torvalds's avatar
      Merge tag 'pm-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 3563f55c
      Linus Torvalds authored
      Pull power management updates from Rafael Wysocki:
       "These add hybrid processors support to the intel_pstate driver and
        make it work with more processor models when HWP is disabled, make the
        intel_idle driver use special C6 idle state parameters when package
        C-states are disabled, add cooling support to the tegra30 devfreq
        driver, rework the TEO (timer events oriented) cpuidle governor,
        extend the OPP (operating performance points) framework to use the
        required-opps DT property in more cases, fix some issues and clean up
        a number of assorted pieces of code.
      
        Specifics:
      
         - Make intel_pstate support hybrid processors using abstract
           performance units in the HWP interface (Rafael Wysocki).
      
         - Add Icelake servers and Cometlake support in no-HWP mode to
           intel_pstate (Giovanni Gherdovich).
      
         - Make cpufreq_online() error path be consistent with the CPU device
           removal path in cpufreq (Rafael Wysocki).
      
         - Clean up 3 cpufreq drivers and the statistics code (Hailong Liu,
           Randy Dunlap, Shaokun Zhang).
      
         - Make intel_idle use special idle state parameters for C6 when
           package C-states are disabled (Chen Yu).
      
         - Rework the TEO (timer events oriented) cpuidle governor to address
           some theoretical shortcomings in it (Rafael Wysocki).
      
         - Drop unneeded semicolon from the TEO governor (Wan Jiabing).
      
         - Modify the runtime PM framework to accept unassigned suspend and
           resume callback pointers (Ulf Hansson).
      
         - Improve pm_runtime_get_sync() documentation (Krzysztof Kozlowski).
      
         - Improve device performance states support in the generic power
           domains (genpd) framework (Ulf Hansson).
      
         - Fix some documentation issues in genpd (Yang Yingliang).
      
         - Make the operating performance points (OPP) framework use the
           required-opps DT property in use cases that are not related to
           genpd (Hsin-Yi Wang).
      
         - Make lazy_link_required_opp_table() use list_del_init instead of
           list_del/INIT_LIST_HEAD (Yang Yingliang).
      
         - Simplify wake IRQs handling in the core system-wide sleep support
           code and clean up some coding style inconsistencies in it (Tian
           Tao, Zhen Lei).
      
         - Add cooling support to the tegra30 devfreq driver and improve its
           DT bindings (Dmitry Osipenko).
      
         - Fix some assorted issues in the devfreq core and drivers (Chanwoo
           Choi, Dong Aisheng, YueHaibing)"
      
      * tag 'pm-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (39 commits)
        PM / devfreq: passive: Fix get_target_freq when not using required-opp
        cpufreq: Make cpufreq_online() call driver->offline() on errors
        opp: Allow required-opps to be used for non genpd use cases
        cpuidle: teo: remove unneeded semicolon in teo_select()
        dt-bindings: devfreq: tegra30-actmon: Add cooling-cells
        dt-bindings: devfreq: tegra30-actmon: Convert to schema
        PM / devfreq: userspace: Use DEVICE_ATTR_RW macro
        PM: runtime: Clarify documentation when callbacks are unassigned
        PM: runtime: Allow unassigned ->runtime_suspend|resume callbacks
        PM: runtime: Improve path in rpm_idle() when no callback
        PM: hibernate: remove leading spaces before tabs
        PM: sleep: remove trailing spaces and tabs
        PM: domains: Drop/restore performance state votes for devices at runtime PM
        PM: domains: Return early if perf state is already set for the device
        PM: domains: Split code in dev_pm_genpd_set_performance_state()
        cpuidle: teo: Use kerneldoc documentation in admin-guide
        cpuidle: teo: Rework most recent idle duration values treatment
        cpuidle: teo: Change the main idle state selection logic
        cpuidle: teo: Cosmetic modification of teo_select()
        cpuidle: teo: Cosmetic modifications of teo_update()
        ...
      3563f55c
    • Linus Torvalds's avatar
      Merge tag 'x86-entry-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1dfb0f47
      Linus Torvalds authored
      Pull x86 entry code related updates from Thomas Gleixner:
      
       - Consolidate the macros for .byte ... opcode sequences
      
       - Deduplicate register offset defines in include files
      
       - Simplify the ia32,x32 compat handling of the related syscall tables
         to get rid of #ifdeffery.
      
       - Clear all EFLAGS which are not required for syscall handling
      
       - Consolidate the syscall tables and switch the generation over to the
         generic shell script and remove the CFLAGS tweaks which are no
         longer required.
      
       - Use 'int' type for system call numbers to match the generic code.
      
       - Add more selftests for syscalls
      
      * tag 'x86-entry-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/syscalls: Don't adjust CFLAGS for syscall tables
        x86/syscalls: Remove -Wno-override-init for syscall tables
        x86/uml/syscalls: Remove array index from syscall initializers
        x86/syscalls: Clear 'offset' and 'prefix' in case they are set in env
        x86/entry: Use int everywhere for system call numbers
        x86/entry: Treat out of range and gap system calls the same
        x86/entry/64: Sign-extend system calls on entry to int
        selftests/x86/syscall: Add tests under ptrace to syscall_numbering_64
        selftests/x86/syscall: Simplify message reporting in syscall_numbering
        selftests/x86/syscall: Update and extend syscall_numbering_64
        x86/syscalls: Switch to generic syscallhdr.sh
        x86/syscalls: Use __NR_syscalls instead of __NR_syscall_max
        x86/unistd: Define X32_NR_syscalls only for 64-bit kernel
        x86/syscalls: Stop filling syscall arrays with *_sys_ni_syscall
        x86/syscalls: Switch to generic syscalltbl.sh
        x86/entry/x32: Rename __x32_compat_sys_* to __x64_compat_sys_*
      1dfb0f47
    • Linus Torvalds's avatar
      Merge tag 'x86-irq-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a22c3f61
      Linus Torvalds authored
      Pull x86 interrupt related updates from Thomas Gleixner:
      
       - Consolidate the VECTOR defines and the usage sites.
      
       - Clean up GDT/IDT related code and replace open coded ASM with proper
         native helper functions.
      
      * tag 'x86-irq-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/kexec: Set_[gi]dt() -> native_[gi]dt_invalidate() in machine_kexec_*.c
        x86: Add native_[ig]dt_invalidate()
        x86/idt: Remove address argument from idt_invalidate()
        x86/irq: Add and use NR_EXTERNAL_VECTORS and NR_SYSTEM_VECTORS
        x86/irq: Remove unused vectors defines
      a22c3f61
    • Linus Torvalds's avatar
      Merge tag 'timers-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a941a034
      Linus Torvalds authored
      Pull timer updates from Thomas Gleixner:
       "Time and clocksource/clockevent related updates:
      
        Core changes:
      
         - Infrastructure to support per CPU "broadcast" devices for per CPU
           clockevent devices which stop in deep idle states. This allows us
           to utilize the more efficient architected timer on certain ARM SoCs
           for normal operation instead of permanently using the slow to
           access SoC specific clockevent device.
      
         - Print the name of the broadcast/wakeup device in /proc/timer_list
      
         - Make the clocksource watchdog more robust against delays between
           reading the current active clocksource and the watchdog
           clocksource. Such delays can be caused by NMIs, SMIs and vCPU
           preemption.
      
           Handle this by reading the watchdog clocksource twice, i.e. before
           and after reading the current active clocksource. If the two
           watchdog reads show an excessive time delta, the read sequence
           is repeated up to 3 times.
      
         - Improve the debug output and add a test module for the watchdog
           mechanism.
      
         - Reimplementation of the venerable time64_to_tm() function with a
           faster and significantly smaller version. Straight from the source,
           i.e. the author of the related research paper contributed this!
      
        Driver changes:
      
         - No new drivers, not even new device tree bindings!
      
         - Fixes, improvements and cleanups all over the place"
      
      * tag 'timers-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
        time/kunit: Add missing MODULE_LICENSE()
        time: Improve performance of time64_to_tm()
        clockevents: Use list_move() instead of list_del()/list_add()
        clocksource: Print deviation in nanoseconds when a clocksource becomes unstable
        clocksource: Provide kernel module to test clocksource watchdog
        clocksource: Reduce clocksource-skew threshold
        clocksource: Limit number of CPUs checked for clock synchronization
        clocksource: Check per-CPU clock synchronization when marked unstable
        clocksource: Retry clock read if long delays detected
        clockevents: Add missing parameter documentation
        clocksource/drivers/timer-ti-dm: Drop unnecessary restore
        clocksource/arm_arch_timer: Improve Allwinner A64 timer workaround
        clocksource/drivers/arm_global_timer: Remove duplicated argument in arm_global_timer
        clocksource/drivers/arm_global_timer: Make symbol 'gt_clk_rate_change_nb' static
        arm: zynq: don't disable CONFIG_ARM_GLOBAL_TIMER due to CONFIG_CPU_FREQ anymore
        clocksource/drivers/arm_global_timer: Implement rate compensation whenever source clock changes
        clocksource/drivers/ingenic: Rename unreasonable array names
        clocksource/drivers/timer-ti-dm: Save and restore timer TIOCP_CFG
        clocksource/drivers/mediatek: Ack and disable interrupts on suspend
        clocksource/drivers/samsung_pwm: Constify source IO memory
        ...
      a941a034
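The watchdog hardening described in the timer pull message above (read the watchdog, read the clocksource, read the watchdog again, and retry when the two watchdog reads are too far apart) can be sketched in userspace C. This is only an illustration of the idea, not the kernel implementation: `read_ns_fn`, `wd_sample`, `wd_read_pair()`, `scripted_clock` and the `max_delay_ns` and `max_retries` parameters are all hypothetical names chosen for this sketch, and counting the first read separately from the retries is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: returns a clock reading in nanoseconds. */
typedef uint64_t (*read_ns_fn)(void *ctx);

/* One accepted sample plus the number of read sequences it took. */
struct wd_sample {
    uint64_t wd_before;    /* watchdog read before the clocksource read */
    uint64_t cs_now;       /* reading of the clocksource under test */
    uint64_t wd_after;     /* watchdog read after the clocksource read */
    unsigned int attempts; /* 1 means no retry was needed */
};

/*
 * Bracket the clocksource read with two watchdog reads. If the bracket is
 * wider than max_delay_ns (for example because an NMI, SMI or vCPU
 * preemption hit in between), discard the sample and try again, at most
 * max_retries additional times. Returns false if no clean sample was found.
 */
bool wd_read_pair(read_ns_fn read_wd, void *wd_ctx,
                  read_ns_fn read_cs, void *cs_ctx,
                  uint64_t max_delay_ns, unsigned int max_retries,
                  struct wd_sample *out)
{
    for (unsigned int attempt = 1; attempt <= max_retries + 1; attempt++) {
        uint64_t before = read_wd(wd_ctx);
        uint64_t now = read_cs(cs_ctx);
        uint64_t after = read_wd(wd_ctx);

        if (after - before <= max_delay_ns) {
            out->wd_before = before;
            out->cs_now = now;
            out->wd_after = after;
            out->attempts = attempt;
            return true;
        }
    }
    return false;
}

/* Scripted clock used only to exercise the sketch with known readings. */
struct scripted_clock {
    const uint64_t *values;
    size_t count;
    size_t next;
};

uint64_t scripted_read(void *ctx)
{
    struct scripted_clock *c = ctx;
    uint64_t v = c->values[c->next];

    /* Stay on the last value once the script is exhausted. */
    if (c->next + 1 < c->count)
        c->next++;
    return v;
}
```

With a scripted watchdog whose first bracket is 1 ms wide and whose second bracket is 100 ns wide, `wd_read_pair()` with a 5000 ns limit rejects the first sample and accepts the second, so a delayed read is not mistaken for clocksource skew.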
    • Linus Torvalds's avatar
      Merge tag 'irq-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 21edf509
      Linus Torvalds authored
      Pull irq updates from Thomas Gleixner:
       "Updates for the interrupt subsystem:
      
        Core changes:
      
         - Cleanup and simplification of common code to invoke the low level
           interrupt flow handlers when this invocation requires irqdomain
           resolution. Add the necessary core infrastructure.
      
         - Provide a proper interface for modular PMU drivers to set the
           interrupt affinity.
      
         - Add a request flag which allows interrupts to be excluded from
           spurious interrupt detection. Useful especially for IPI handlers,
           which always return IRQ_HANDLED and thereby turn the spurious
           interrupt detection into a pointless waste of CPU cycles.
      
        Driver changes:
      
         - Bulk convert interrupt chip drivers to the new irqdomain low level
           flow handler invocation mechanism.
      
         - Add device tree bindings for the Renesas R-Car M3-W+ SoC
      
         - Enable modular build of the Qualcomm PDC driver
      
         - The usual small fixes and improvements"
      
      * tag 'irq-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
        dt-bindings: interrupt-controller: arm,gic-v3: Describe GICv3 optional properties
        irqchip: gic-pm: Remove redundant error log of clock bulk
        irqchip/sun4i: Remove unnecessary oom message
        irqchip/irq-imx-gpcv2: Remove unnecessary oom message
        irqchip/imgpdc: Remove unnecessary oom message
        irqchip/gic-v3-its: Remove unnecessary oom message
        irqchip/gic-v2m: Remove unnecessary oom message
        irqchip/exynos-combiner: Remove unnecessary oom message
        irqchip: Bulk conversion to generic_handle_domain_irq()
        genirq: Move non-irqdomain handle_domain_irq() handling into ARM's handle_IRQ()
        genirq: Add generic_handle_domain_irq() helper
        irqchip/nvic: Convert from handle_IRQ() to handle_domain_irq()
        irqdesc: Fix __handle_domain_irq() comment
        genirq: Use irq_resolve_mapping() to implement __handle_domain_irq() and co
        irqdomain: Introduce irq_resolve_mapping()
        irqdomain: Protect the linear revmap with RCU
        irqdomain: Cache irq_data instead of a virq number in the revmap
        irqdomain: Use struct_size() helper when allocating irqdomain
        irqdomain: Make normal and nomap irqdomains exclusive
        powerpc: Move the use of irq_domain_add_nomap() behind a config option
        ...
      21edf509
    • Linus Torvalds's avatar
      Merge tag 'smp-urgent-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 62180152
      Linus Torvalds authored
      Pull CPU hotplug fix from Thomas Gleixner:
       "A fix for the CPU hotplug and cpusets interaction:
      
        cpusets delegate the hotplug work to a workqueue to prevent a lock
        order inversion vs. the CPU hotplug lock. The work is not flushed
        before the hotplug operation returns which creates user visible
        inconsistent state. Prevent this by flushing the work after dropping
        CPU hotplug lock and before releasing the outer mutex which serializes
        the CPU hotplug related sysfs interface operations"
      
      * tag 'smp-urgent-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        cpu/hotplug: Cure the cpusets trainwreck
      62180152
    • Linus Torvalds's avatar
      Merge tag 'smp-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 371fb854
      Linus Torvalds authored
      Pull CPU hotplug cleanup from Thomas Gleixner:
       "A simple cleanup for the CPU hotplug code to avoid per_cpu_ptr()
        reevaluation"
      
      * tag 'smp-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        cpu/hotplug: Simplify access to percpu cpuhp_state
      371fb854
    • Linus Torvalds's avatar
      Merge tag 'printk-for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux · e563592c
      Linus Torvalds authored
      Pull printk updates from Petr Mladek:
      
       - Add %pt[RT]s modifier to vsprintf(). It overrides the ISO 8601
         separator by using ' ' (space). It produces "YYYY-mm-dd HH:MM:SS"
         instead of "YYYY-mm-ddTHH:MM:SS".
      
       - Correctly parse a long row of numbers in sscanf() when a field
         width is used. Add extensive sscanf() selftest.
      
       - Generalize the re-entrant CPU lock that has already been used to
         serialize dump_stack() output. It is part of the ongoing printk
         rework. It will allow removing the obsolete printk_safe buffers
         and introducing atomic consoles.
      
       - Some code clean up and sparse warning fixes.
      
      * tag 'printk-for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
        printk: fix cpu lock ordering
        lib/dump_stack: move cpu lock to printk.c
        printk: Remove trailing semicolon in macros
        random32: Fix implicit truncation warning in prandom_seed_state()
        lib: test_scanf: Remove pointless use of type_min() with unsigned types
        selftests: lib: Add wrapper script for test_scanf
        lib: test_scanf: Add tests for sscanf number conversion
        lib: vsprintf: Fix handling of number field widths in vsscanf
        lib: vsprintf: scanf: Negative number must have field width > 1
        usb: host: xhci-tegra: Switch to use %ptTs
        nilfs2: Switch to use %ptTs
        kdb: Switch to use %ptTs
        lib/vsprintf: Allow to override ISO 8601 date and time separator
      e563592c
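The sscanf() item in the printk pull message above concerns field widths applied to a long, unbroken row of digits. The semantics in question can be illustrated with the userspace C library's `sscanf()`, where a field width caps how many characters each conversion may consume. This is a sketch of the expected behaviour only; it does not exercise the kernel's vsscanf(), and `parse_fixed_width()` is a hypothetical helper name.

```c
#include <stdio.h>

/*
 * Split a run of 20 digits into four 5-digit fields.
 * Each "%5d" consumes at most five characters, so the conversions never
 * see the whole row as a single (overflowing) number.
 * Returns the number of fields successfully converted.
 */
int parse_fixed_width(const char *digits, int out[4])
{
    return sscanf(digits, "%5d%5d%5d%5d",
                  &out[0], &out[1], &out[2], &out[3]);
}
```

For the input "12345678901234567890" the four fields come out as 12345, 67890, 12345 and 67890, even though the row as a whole would not fit in a signed 64-bit integer.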
    • Linus Torvalds's avatar
      Merge tag 'hyperv-next-signed-20210629' of... · b694011a
      Linus Torvalds authored
      Merge tag 'hyperv-next-signed-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
      
      Pull hyperv updates from Wei Liu:
       "Just a few minor enhancement patches and bug fixes"
      
      * tag 'hyperv-next-signed-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
        PCI: hv: Add check for hyperv_initialized in init_hv_pci_drv()
        Drivers: hv: Move Hyper-V extended capability check to arch neutral code
        drivers: hv: Fix missing error code in vmbus_connect()
        x86/hyperv: fix logical processor creation
        hv_utils: Fix passing zero to 'PTR_ERR' warning
        scsi: storvsc: Use blk_mq_unique_tag() to generate requestIDs
        Drivers: hv: vmbus: Copy packets sent by Hyper-V out of the ring buffer
        hv_balloon: Remove redundant assignment to region_start
      b694011a
    • Naoya Horiguchi's avatar
      mm,hwpoison: make get_hwpoison_page() call get_any_page() · 0ed950d1
      Naoya Horiguchi authored
      __get_hwpoison_page() could fail to grab a refcount due to a race
      condition, so it's helpful if we can handle it by retrying.  We already
      have retry logic, so make get_hwpoison_page() call get_any_page() when
      called from memory_failure().
      
      As a result, get_hwpoison_page() can return negative values (i.e. an
      error code), so some callers are also changed to handle error cases.
      soft_offline_page() does nothing for -EBUSY because that's enough and
      users in userspace can easily handle it.  unpoison_memory() is also
      unchanged because it's broken and needs thorough fixes (will be done
      later).
      
      Link: https://lkml.kernel.org/r/20210603233632.2964832-3-nao.horiguchi@gmail.com
      Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0ed950d1
    • Naoya Horiguchi's avatar
      mm,hwpoison: send SIGBUS with error virtual address · a3f5d80e
      Naoya Horiguchi authored
      Now an action-required MCE on an already hwpoisoned address reliably
      sends a SIGBUS to the current process, but the SIGBUS doesn't convey the
      error virtual address.  That's not optimal for hwpoison-aware
      applications.
      
      To fix the issue, make memory_failure() call kill_accessing_process(),
      which does a page table walk to find the error virtual address.  It could
      find multiple virtual addresses for the same error page, and it seems
      hard to tell which virtual address is the correct one.  But that's rare,
      and sending an incorrect virtual address could be better than no address.
      So let's report the first found virtual address for now.
      
      [naoya.horiguchi@nec.com: fix walk_page_range() return]
        Link: https://lkml.kernel.org/r/20210603051055.GA244241@hori.linux.bs1.fc.nec.co.jp
      
      Link: https://lkml.kernel.org/r/20210521030156.2612074-4-nao.horiguchi@gmail.com
      Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Aili Yao <yaoaili@kingsoft.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Jue Wang <juew@google.com>
      Cc: Borislav Petkov <bp@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a3f5d80e
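From the point of view of a hwpoison-aware application, the change above matters because a SIGBUS handler can read the reported virtual address from `siginfo_t`. The following is a minimal Linux-only sketch of such a handler, assuming the error is reported with `si_code` set to `BUS_MCEERR_AR` or `BUS_MCEERR_AO` and the address in `si_addr`; the helper names `mce_fault_address()`, `on_sigbus()` and `install_sigbus_handler()` and the two globals are hypothetical, and a real application would need its own recovery policy.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Set by the handler so the main program can react later. */
volatile sig_atomic_t poison_seen;
void *volatile poison_addr;

/*
 * Return the virtual address carried by a memory-error SIGBUS,
 * or NULL if the signal reports some other kind of bus error.
 */
void *mce_fault_address(const siginfo_t *info)
{
    if (info->si_code == BUS_MCEERR_AR || info->si_code == BUS_MCEERR_AO)
        return info->si_addr;
    return NULL;
}

/* SA_SIGINFO-style handler: only records what was reported. */
void on_sigbus(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext;
    poison_addr = mce_fault_address(info);
    poison_seen = 1;
}

/* Install the handler; returns 0 on success, -1 on failure. */
int install_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

The handler here only records the address. What to do with it (drop a cache entry, remap the page, terminate cleanly) is left to the application, and since the commit message notes that the first address found may not always be the right one, the value is best treated as a hint.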
    • Mel Gorman's avatar
      mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes · 203c06ee
      Mel Gorman authored
      Dave Hansen reported the following about Feng Tang's tests on a machine
      with persistent memory onlined as a DRAM-like device.
      
        Feng Tang tossed these on a "Cascade Lake" system with 96 threads and
        ~512G of persistent memory and 128G of DRAM.  The PMEM is in "volatile
        use" mode and being managed via the buddy just like the normal RAM.
      
        The PMEM zones are big ones:
      
              present  65011712 = 248 G
              high       134595 = 525 M
      
        The PMEM nodes, of course, don't have any CPUs in them.
      
        With your series, the pcp->high value per-cpu is 69584 pages or about
        270MB per CPU.  Scaled up by the 96 CPU threads, that's ~26GB of
        worst-case memory in the pcps per zone, or roughly 10% of the size of
        the zone.
      
      This should not cause a problem as such although it could trigger reclaim
      due to pages being stored on per-cpu lists for CPUs remote to a node.  It
      is not possible to treat cpuless nodes exactly the same as normal nodes
      but the worst-case scenario can be mitigated by splitting pcp->high across
      all online CPUs for cpuless memory nodes.
      
      Link: https://lkml.kernel.org/r/20210616110743.GK30378@techsingularity.net
      Suggested-by: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Dave Hansen <dave.hansen@intel.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: "Tang, Feng" <feng.tang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      203c06ee
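The figures quoted in the report above can be reproduced with a few lines of arithmetic, which also shows what dividing pcp->high by the number of online CPUs would do for a cpuless node. This is a back-of-the-envelope sketch, not the kernel's calculation: it assumes 4 KiB pages and a plain integer division, and the names `PAGE_SIZE_BYTES`, `pages_to_bytes()`, `pcp_worst_case_pages()`, `pcp_share_percent()` and `pcp_high_split()` are hypothetical.

```c
#include <stdint.h>

/* Assumption: 4 KiB base pages, as on x86-64. */
#define PAGE_SIZE_BYTES 4096ULL

/* Convert a page count to bytes. */
uint64_t pages_to_bytes(uint64_t pages)
{
    return pages * PAGE_SIZE_BYTES;
}

/* Worst case: every CPU fills its per-cpu list up to 'high' pages. */
uint64_t pcp_worst_case_pages(uint64_t high_per_cpu, unsigned int nr_cpus)
{
    return high_per_cpu * nr_cpus;
}

/* Share of the zone that could sit on per-cpu lists, in whole percent. */
unsigned int pcp_share_percent(uint64_t pcp_pages, uint64_t zone_pages)
{
    return (unsigned int)(pcp_pages * 100 / zone_pages);
}

/*
 * Simplified view of the mitigation: spread one 'high' budget over all
 * online CPUs instead of granting the full value to each of them.
 */
uint64_t pcp_high_split(uint64_t high_budget, unsigned int nr_online_cpus)
{
    if (nr_online_cpus == 0)
        nr_online_cpus = 1;
    return high_budget / nr_online_cpus;
}
```

With the reported values, 69584 pages per CPU is 285016064 bytes (about 270MB), 96 CPUs give 6680064 pages in the worst case, and that is about 10% of the 65011712-page zone. Under the simplified split, each of the 96 CPUs would instead be limited to 724 pages.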
    • Mel Gorman's avatar
      mm/page_alloc: allow high-order pages to be stored on the per-cpu lists · 44042b44
      Mel Gorman authored
      The per-cpu page allocator (PCP) only stores order-0 pages.  This means
      that all THP and "cheap" high-order allocations including SLUB contend on
      the zone->lock.  This patch extends the PCP allocator to store THP and
      "cheap" high-order pages.  Note that struct per_cpu_pages increases in
      size to 256 bytes (4 cache lines) on x86-64.
      
      Note that this is not necessarily a universal performance win because of
      how it is implemented.  High-order pages can cause pcp->high to be
      exceeded prematurely for lower-orders so for example, a large number of
      THP pages being freed could release order-0 pages from the PCP lists.
      Hence, much depends on the allocation/free pattern as observed by a single
      CPU to determine if caching helps or hurts a particular workload.
      
      That said, basic performance testing passed.  The following is a netperf
      UDP_STREAM test which hits the relevant patches as some of the network
      allocations are high-order.
      
      netperf-udp
                                       5.13.0-rc2             5.13.0-rc2
                                 mm-pcpburst-v3r4   mm-pcphighorder-v1r7
      Hmean     send-64         261.46 (   0.00%)      266.30 *   1.85%*
      Hmean     send-128        516.35 (   0.00%)      536.78 *   3.96%*
      Hmean     send-256       1014.13 (   0.00%)     1034.63 *   2.02%*
      Hmean     send-1024      3907.65 (   0.00%)     4046.11 *   3.54%*
      Hmean     send-2048      7492.93 (   0.00%)     7754.85 *   3.50%*
      Hmean     send-3312     11410.04 (   0.00%)    11772.32 *   3.18%*
      Hmean     send-4096     13521.95 (   0.00%)    13912.34 *   2.89%*
      Hmean     send-8192     21660.50 (   0.00%)    22730.72 *   4.94%*
      Hmean     send-16384    31902.32 (   0.00%)    32637.50 *   2.30%*
      
      Functionally, a patch like this is necessary to make bulk allocation of
      high-order pages work with similar performance to order-0 bulk
      allocations.  The bulk allocator is not updated in this series because
      bulk allocation users would first have to decide how they want to track
      the order of pages allocated with the bulk allocator.
      
      Link: https://lkml.kernel.org/r/20210611135753.GC30378@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      44042b44
    • mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM · 43b02ba9
      Mike Rapoport authored
      After removal of the DISCONTIGMEM memory model, the FLAT_NODE_MEM_MAP
      configuration option is equivalent to FLATMEM.
      
      Drop CONFIG_FLAT_NODE_MEM_MAP and use CONFIG_FLATMEM instead.
      
      Link: https://lkml.kernel.org/r/20210608091316.3622-10-rppt@kernel.org
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA · a9ee6cf5
      Mike Rapoport authored
      After removal of DISCONTIGMEM, the NEED_MULTIPLE_NODES and NUMA
      configuration options are equivalent.
      
      Drop CONFIG_NEED_MULTIPLE_NODES and use CONFIG_NUMA instead.
      
      Done with
      
      	$ sed -i 's/CONFIG_NEED_MULTIPLE_NODES/CONFIG_NUMA/' \
      		$(git grep -wl CONFIG_NEED_MULTIPLE_NODES)
      	$ sed -i 's/NEED_MULTIPLE_NODES/NUMA/' \
      		$(git grep -wl NEED_MULTIPLE_NODES)
      
      with manual tweaks afterwards.
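
      As a rough illustration of what the two sed passes do to a line of text,
      the sketch below mimics `s/OLD/NEW/` without the `g` flag, i.e. it
      replaces only the first match on each line.  The function names
      sed_first and rename are invented for this example, and the sample
      strings are not taken from the tree.

```python
import re


def sed_first(pattern, repl, text):
    """Mimic sed 's/pattern/repl/' (no 'g' flag): first match per line only."""
    return "\n".join(
        re.sub(pattern, repl, line, count=1) for line in text.split("\n")
    )


def rename(text):
    """Apply the two substitutions in the same order as the commands above."""
    text = sed_first(r"CONFIG_NEED_MULTIPLE_NODES", "CONFIG_NUMA", text)
    return sed_first(r"NEED_MULTIPLE_NODES", "NUMA", text)


if __name__ == "__main__":
    print(rename("#ifdef CONFIG_NEED_MULTIPLE_NODES"))
    print(rename("depends on NEED_MULTIPLE_NODES"))
    # Two matches on one line: only the first is rewritten.
    print(rename("NEED_MULTIPLE_NODES || !NEED_MULTIPLE_NODES"))
```

      A line carrying the symbol twice keeps its second occurrence, which is
      one kind of leftover a manual pass would have to catch; whether that
      case actually occurred in the tree is not stated here.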
      
      [rppt@linux.ibm.com: fix arm boot crash]
        Link: https://lkml.kernel.org/r/YMj9vHhHOiCVN4BF@linux.ibm.com
      
      Link: https://lkml.kernel.org/r/20210608091316.3622-9-rppt@kernel.org
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • docs: remove description of DISCONTIGMEM · 48d9f335
      Mike Rapoport authored
      Remove the description of DISCONTIGMEM from the "Memory Models" document
      and update the VM sysctl description so that it no longer mentions
      DISCONTIGMEM.
      
      Link: https://lkml.kernel.org/r/20210608091316.3622-8-rppt@kernel.org
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>