1. 20 Jan, 2022 5 commits
    • proc/vmcore: don't fake reading zeroes on surprise vmcore_cb unregistration · 25bc5b0d
      David Hildenbrand authored
      In commit cc5f2704 ("proc/vmcore: convert oldmem_pfn_is_ram callback
      to more generic vmcore callbacks"), we added detection of surprise
      vmcore_cb unregistration after the vmcore was already opened.  Once
      detected, we warn the user and simulate reading zeroes from that point
      on when accessing the vmcore.
      
      The basic reason was that unexpected unregistration, for example, by
      manually unbinding a driver from a device after opening the vmcore, is
      not supported and could result in reading oldmem the vmcore_cb would
      have actually prohibited while registered.  However, something like that
      can similarly be triggered by a user that's really looking for trouble
      simply by unbinding the relevant driver before opening the vmcore -- or
      by disallowing loading the driver in the first place.  So it's actually
      of limited help.
      
      Currently, unregistration can only be triggered via virtio-mem when
      manually unbinding the driver from the device inside the VM; there is no
      way to trigger it from the hypervisor, as hypervisors don't allow for
      unplugging virtio-mem devices -- ripping out system RAM from a VM
      without coordination with the guest is usually not a good idea.
      
      The important part is that unbinding the driver and unregistering the
      vmcore_cb while concurrently reading the vmcore won't crash the system,
      and that is handled by the rwsem.
      
      To make the mechanism more future proof, let's remove the "read zero"
      part, but leave the warning in place.  For example, we could have a
      future driver (like virtio-balloon) that will contact the hypervisor to
      figure out if we already populated a page for a given PFN.
      Hotunplugging such a device and consequently unregistering the vmcore_cb
      could be triggered from the hypervisor without harming the system even
      while kdump is running.  In that case, we don't want to silently end up
      with a vmcore that contains wrong data, because the user inside the VM
      might be unaware of the hypervisor action and might easily miss the
      warning in the log.
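
The behavior change described above can be sketched as a small userspace model. This is an illustration only, not the kernel code: `struct demo_vmcore`, `demo_unregister_cb()`, `demo_should_read_oldmem()` and `demo_deny_pfn5()` are hypothetical names, and the model assumes that reads fall back to "read oldmem" once no callback is registered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the vmcore state; none of these names are kernel API. */
struct demo_vmcore {
    bool opened;                            /* vmcore has been opened */
    bool warned;                            /* surprise unregistration was seen */
    bool (*pfn_is_ram)(unsigned long pfn);  /* NULL when no callback is registered */
};

/* Example callback: prohibits reading PFN 5, allows everything else. */
static bool demo_deny_pfn5(unsigned long pfn)
{
    return pfn != 5;
}

/* Unregistering after open only latches a warning; it no longer fakes zeroes. */
static void demo_unregister_cb(struct demo_vmcore *vc)
{
    assert(vc != NULL);
    vc->pfn_is_ram = NULL;
    if (vc->opened)
        vc->warned = true;  /* the kernel would print a warning here */
}

/* Returns true if the page should be read from old memory. */
static bool demo_should_read_oldmem(const struct demo_vmcore *vc, unsigned long pfn)
{
    assert(vc != NULL);
    /* The old behavior would have returned false here whenever vc->warned was set. */
    if (vc->pfn_is_ram)
        return vc->pfn_is_ram(pfn);
    return true;  /* assumption: with no callback, the read goes ahead */
}
```

In this model, a PFN that the callback used to prohibit is read normally after a surprise unregistration, and the only trace left is the latched warning.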
      
      Link: https://lkml.kernel.org/r/20211111192243.22002-1-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Philipp Rudo <prudo@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: percpu: add generic pcpu_populate_pte() function · 20c03576
      Kefeng Wang authored
      With NEED_PER_CPU_PAGE_FIRST_CHUNK enabled, we need a function to
      populate PTEs.  This patch adds a generic one, pcpu_populate_pte(),
      which is marked __weak and used on most architectures; x86 overrides
      it with its own implementation.
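
The `__weak` pattern mentioned above can be illustrated with a minimal userspace sketch, assuming GCC or Clang. `demo_populate_pte()`, `demo_populate_range()` and `DEMO_PAGE_SIZE` are hypothetical stand-ins, not the kernel functions; the body just counts pages instead of touching page tables.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL  /* assumed page size for the illustration */

static unsigned long demo_populated;  /* number of pages "populated" so far */

/*
 * Generic default. Because it is weak, another translation unit can supply
 * a non-weak demo_populate_pte() and the linker will pick that one instead.
 */
__attribute__((weak)) void demo_populate_pte(unsigned long addr)
{
    (void)addr;
    demo_populated++;
}

/* Calls the (possibly overridden) hook once per page in [start, end). */
static unsigned long demo_populate_range(unsigned long start, unsigned long end)
{
    assert(start <= end);
    demo_populated = 0;
    for (unsigned long addr = start; addr < end; addr += DEMO_PAGE_SIZE)
        demo_populate_pte(addr);
    return demo_populated;
}
```

With no strong definition linked in, the generic version runs; an architecture-specific object file that defines the same symbol without the weak attribute would replace it at link time.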
      
      Link: https://lkml.kernel.org/r/20211216112359.103822-5-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: percpu: add generic pcpu_fc_alloc/free function · 23f91716
      Kefeng Wang authored
      With the previous patch, we can add generic pcpu first chunk
      allocate and free functions to clean up the duplicated definitions on
      each architecture.
      
      Link: https://lkml.kernel.org/r/20211216112359.103822-4-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: percpu: add pcpu_fc_cpu_to_node_fn_t typedef · 1ca3fb3a
      Kefeng Wang authored
      Add pcpu_fc_cpu_to_node_fn_t and pass it into pcpu_fc_alloc_fn_t.  The
      pcpu first chunk allocation calls it to allocate memblock memory on
      the node corresponding to each CPU.  This prepares for the next patch.
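
The idea of passing a CPU-to-node callback into the allocator can be sketched in userspace as follows. All names here (`demo_cpu_to_node_fn_t`, `demo_fc_alloc()`, `demo_cpu_to_node()`, the two-CPUs-per-node topology) are assumptions made for illustration, and `malloc()` stands in for a node-aware memblock allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_NR_CPUS 4
#define DEMO_NO_NODE (-1)  /* hypothetical "no node preference" marker */

/* Hypothetical callback type: maps a CPU number to its NUMA node. */
typedef int (*demo_cpu_to_node_fn_t)(int cpu);

struct demo_alloc {
    void *ptr;  /* allocated memory (caller frees) */
    int node;   /* node the allocation was requested on */
};

/* Example topology: two CPUs per node. */
static int demo_cpu_to_node(int cpu)
{
    return cpu / 2;
}

/* Resolves the target node through the callback, then allocates. */
static struct demo_alloc demo_fc_alloc(int cpu, size_t size,
                                       demo_cpu_to_node_fn_t cpu_to_nd_fn)
{
    struct demo_alloc a;

    assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
    a.node = cpu_to_nd_fn ? cpu_to_nd_fn(cpu) : DEMO_NO_NODE;
    a.ptr = malloc(size);  /* a real allocator would honor a.node here */
    return a;
}
```

Passing the callback down lets one generic allocation routine serve every architecture, with only the CPU-to-node mapping left arch-specific.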
      
      Link: https://lkml.kernel.org/r/20211216112359.103822-3-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: percpu: generalize percpu related config · 7ecd19cf
      Kefeng Wang authored
      Patch series "mm: percpu: Cleanup percpu first chunk function".
      
      While adding support for the page mapping percpu first chunk allocator
      on arm64, we found a lot of duplicated code in the percpu embed/page
      first chunk allocators.  This patchset aims to clean it up and should
      introduce no functional change.
      
      The current support status of the 'embed' and 'page' allocators across
      architectures is shown below:

      	embed: NEED_PER_CPU_EMBED_FIRST_CHUNK
      	page:  NEED_PER_CPU_PAGE_FIRST_CHUNK
      
      		embed	page
      	------------------------
      	arm64	  Y	 Y
      	mips	  Y	 N
      	powerpc	  Y	 Y
      	riscv	  Y	 N
      	sparc	  Y	 Y
      	x86	  Y	 Y
      	------------------------
      
      The percpu first chunk allocator has two interfaces, which this series
      changes as follows:
      
       extern int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
                                      size_t atom_size,
                                      pcpu_fc_cpu_distance_fn_t cpu_distance_fn,
      -                               pcpu_fc_alloc_fn_t alloc_fn,
      -                               pcpu_fc_free_fn_t free_fn);
      +                               pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);
      
       extern int __init pcpu_page_first_chunk(size_t reserved_size,
      -                               pcpu_fc_alloc_fn_t alloc_fn,
      -                               pcpu_fc_free_fn_t free_fn,
      -                               pcpu_fc_populate_pte_fn_t populate_pte_fn);
      +                               pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);
      
      The pcpu_fc_alloc_fn_t/pcpu_fc_free_fn_t types are removed; we provide
      generic pcpu_fc_alloc() and pcpu_fc_free() functions, which are called
      from pcpu_embed/page_first_chunk().
      
      1) For pcpu_embed_first_chunk(), a pcpu_fc_cpu_to_node_fn_t must be
         provided when the arch supports NUMA.

      2) For pcpu_page_first_chunk(), pcpu_fc_populate_pte_fn_t is removed too;
         a generic pcpu_populate_pte() marked '__weak' is provided.  If an arch
         (like x86) needs a different function to populate PTEs, it should
         provide its own implementation.
      
      [1] https://github.com/kevin78/linux.git percpu-cleanup
      
      This patch (of 4):
      
      The HAVE_SETUP_PER_CPU_AREA/NEED_PER_CPU_EMBED_FIRST_CHUNK/
      NEED_PER_CPU_PAGE_FIRST_CHUNK/USE_PERCPU_NUMA_NODE_ID configs have
      duplicate definitions on the platforms that use them.

      Move them into mm, drop the redundant definitions, and instead just
      select them on applicable platforms.
      
      Link: https://lkml.kernel.org/r/20211216112359.103822-1-wangkefeng.wang@huawei.com
      Link: https://lkml.kernel.org/r/20211216112359.103822-2-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Cc: Will Deacon <will@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 09 Jan, 2022 6 commits
  3. 08 Jan, 2022 5 commits
  4. 07 Jan, 2022 11 commits
  5. 06 Jan, 2022 11 commits
  6. 05 Jan, 2022 2 commits
    • tracing: Tag trace_percpu_buffer as a percpu pointer · f28439db
      Naveen N. Rao authored
      Tag trace_percpu_buffer as a percpu pointer to resolve warnings
      reported by sparse:
        /linux/kernel/trace/trace.c:3218:46: warning: incorrect type in initializer (different address spaces)
        /linux/kernel/trace/trace.c:3218:46:    expected void const [noderef] __percpu *__vpp_verify
        /linux/kernel/trace/trace.c:3218:46:    got struct trace_buffer_struct *
        /linux/kernel/trace/trace.c:3234:9: warning: incorrect type in initializer (different address spaces)
        /linux/kernel/trace/trace.c:3234:9:    expected void const [noderef] __percpu *__vpp_verify
        /linux/kernel/trace/trace.c:3234:9:    got int *
      
      Link: https://lkml.kernel.org/r/ebabd3f23101d89cb75671b68b6f819f5edc830b.1640255304.git.naveen.n.rao@linux.vnet.ibm.com
      
      Cc: stable@vger.kernel.org
      Reported-by: kernel test robot <lkp@intel.com>
      Fixes: 07d777fe ("tracing: Add percpu buffers for trace_printk()")
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Fix check for trace_percpu_buffer validity in get_trace_buf() · 823e670f
      Naveen N. Rao authored
      With the new osnoise tracer, we are seeing the below splat:
          Kernel attempted to read user page (c7d880000) - exploit attempt? (uid: 0)
          BUG: Unable to handle kernel data access on read at 0xc7d880000
          Faulting instruction address: 0xc0000000002ffa10
          Oops: Kernel access of bad area, sig: 11 [#1]
          LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
          ...
          NIP [c0000000002ffa10] __trace_array_vprintk.part.0+0x70/0x2f0
          LR [c0000000002ff9fc] __trace_array_vprintk.part.0+0x5c/0x2f0
          Call Trace:
          [c0000008bdd73b80] [c0000000001c49cc] put_prev_task_fair+0x3c/0x60 (unreliable)
          [c0000008bdd73be0] [c000000000301430] trace_array_printk_buf+0x70/0x90
          [c0000008bdd73c00] [c0000000003178b0] trace_sched_switch_callback+0x250/0x290
          [c0000008bdd73c90] [c000000000e70d60] __schedule+0x410/0x710
          [c0000008bdd73d40] [c000000000e710c0] schedule+0x60/0x130
          [c0000008bdd73d70] [c000000000030614] interrupt_exit_user_prepare_main+0x264/0x270
          [c0000008bdd73de0] [c000000000030a70] syscall_exit_prepare+0x150/0x180
          [c0000008bdd73e10] [c00000000000c174] system_call_vectored_common+0xf4/0x278
      
      osnoise tracer on ppc64le is triggering osnoise_taint() for negative
      duration in get_int_safe_duration() called from
      trace_sched_switch_callback()->thread_exit().
      
      The problem though is that the check for a valid trace_percpu_buffer is
      incorrect in get_trace_buf(). The check is being done after calculating
      the pointer for the current cpu, rather than on the main percpu pointer.
      Fix the check to be against trace_percpu_buffer.
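
Why the misplaced check fails can be shown with a small userspace model. This is a sketch under assumptions, not the kernel code: the percpu base and per-CPU offsets are modeled as plain integers, and `demo_cpu_offset`, `demo_this_cpu_addr()`, `demo_buf_valid_buggy()` and `demo_buf_valid_fixed()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NR_CPUS 2

/* Hypothetical per-CPU offsets added to a percpu base address. */
static const uintptr_t demo_cpu_offset[DEMO_NR_CPUS] = { 0x1000, 0x2000 };

/* Models deriving the current CPU's address from the percpu base. */
static uintptr_t demo_this_cpu_addr(uintptr_t base, int cpu)
{
    assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
    return base + demo_cpu_offset[cpu];
}

/* Buggy: tests the derived address, which is non-zero even for a zero base. */
static bool demo_buf_valid_buggy(uintptr_t base, int cpu)
{
    return demo_this_cpu_addr(base, cpu) != 0;
}

/* Fixed: tests the base itself before any per-CPU address is derived. */
static bool demo_buf_valid_fixed(uintptr_t base, int cpu)
{
    (void)cpu;
    return base != 0;
}
```

With an unallocated (zero) base, the buggy variant reports the buffer as valid because the offset alone makes the derived address non-zero, which matches the kind of bad access seen in the splat; the fixed variant rejects it.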
      
      Link: https://lkml.kernel.org/r/a920e4272e0b0635cf20c444707cbce1b2c8973d.1640255304.git.naveen.n.rao@linux.vnet.ibm.com
      
      Cc: stable@vger.kernel.org
      Fixes: e2ace001 ("tracing: Choose static tp_printk buffer by explicit nesting count")
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>