1. 08 Sep, 2021 28 commits
    • kfence: show cpu and timestamp in alloc/free info · 4bbf04aa
      Marco Elver authored
      Record cpu and timestamp on allocations and frees, and show them in
      reports.  Upon an error, this can help correlate earlier messages in the
      kernel log via allocation and free timestamps.
      
      Link: https://lkml.kernel.org/r/20210714175312.2947941-1-elver@google.com
      Suggested-by: Joern Engel <joern@purestorage.com>
      Signed-off-by: Marco Elver <elver@google.com>
      Acked-by: Alexander Potapenko <glider@google.com>
      Acked-by: Joern Engel <joern@purestorage.com>
      Cc: Yuanyuan Zhong <yzhong@purestorage.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/secretmem: use refcount_t instead of atomic_t · 11086054
      Jordy Zomer authored
      When a secret memory region is active, memfd_secret disables hibernation.
      One of the goals is to keep the secret data from being written to
      persistent storage.
      
      It accomplishes this by maintaining a reference count in
      `secretmem_users`.  Once this reference is held, your system cannot be
      hibernated due to the check in `hibernation_available()`.  However,
      because `secretmem_users` is of type `atomic_t`, reference counter
      overflows are possible.
      
      As you can see, there's an `atomic_inc` for each `memfd` that is opened in
      the `memfd_secret` syscall.  If a local attacker succeeds in opening 2^32
      memfds, the counter will wrap around to 0.  This implies that you may
      hibernate again, even though there are still regions of this secret
      memory, thereby bypassing the security check.
      
      In an attempt to fix this I have used `refcount_t` instead of `atomic_t`
      which prevents reference counter overflows.
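
The wrap-around described above can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions: `WrappingCounter` and `SaturatingCounter` are hypothetical stand-ins for `atomic_t` and `refcount_t`, the counters are narrowed to 8 bits so the loop finishes quickly, and the kernel's actual saturation value and warnings are not modelled.

```python
class WrappingCounter:
    """Hypothetical stand-in for atomic_t: increments wrap around silently."""

    def __init__(self, bits=8):
        self.mask = (1 << bits) - 1
        self.value = 0

    def inc(self):
        # Keep only the low `bits` bits, like fixed-width integer arithmetic.
        self.value = (self.value + 1) & self.mask


class SaturatingCounter:
    """Hypothetical stand-in for refcount_t: sticks at the maximum instead of wrapping."""

    def __init__(self, bits=8):
        self.max = (1 << bits) - 1
        self.value = 0

    def inc(self):
        if self.value < self.max:
            self.value += 1


def hibernation_allowed(counter):
    # Simplified model of the check: hibernation is allowed only with no users.
    return counter.value == 0


def open_many(counter, n):
    # Model n successful memfd_secret() calls, each taking one reference.
    for _ in range(n):
        counter.inc()
    return counter
```

With 8-bit counters, 256 increments bring the wrapping counter back to 0, so the simplified check would allow hibernation again, while the saturating counter stays non-zero.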
      
      Link: https://lkml.kernel.org/r/20210820043339.2151352-1-jordy@pwning.systems
      Signed-off-by: Jordy Zomer <jordy@pwning.systems>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Jordy Zomer <jordy@jordyzomer.github.io>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: introduce PAGEFLAGS_MASK to replace ((1UL << NR_PAGEFLAGS) - 1) · 41c961b9
      Muchun Song authored
      Instead of hard-coding ((1UL << NR_PAGEFLAGS) - 1) everywhere, introduce
      PAGEFLAGS_MASK to make the code that extracts the page flags clearer.
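
As an illustration of the pattern, the sketch below names the mask once and reuses it. The value `NR_PAGEFLAGS = 23` is an assumption made purely for this example (the real value depends on the kernel configuration), and `page_flags_only` is a hypothetical helper, not a kernel function.

```python
# Assumed value for illustration only; the kernel derives it from its config.
NR_PAGEFLAGS = 23

# Named mask replacing the repeated expression ((1 << NR_PAGEFLAGS) - 1).
PAGEFLAGS_MASK = (1 << NR_PAGEFLAGS) - 1


def page_flags_only(flags_word):
    """Return just the page-flag bits, dropping any higher bits in the word."""
    return flags_word & PAGEFLAGS_MASK
```

Call sites then read `flags & PAGEFLAGS_MASK` instead of repeating the shift-and-subtract expression.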
      
      Link: https://lkml.kernel.org/r/20210819150712.59948-1-songmuchun@bytedance.com
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: Roman Gushchin <guro@fb.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: in_irq() cleanup · ea0eafea
      Changbin Du authored
      Replace the obsolete and ambiguous macro in_irq() with the new macro
      in_hardirq().
      
      Link: https://lkml.kernel.org/r/20210813145245.86070-1-changbin.du@gmail.com
      Signed-off-by: Changbin Du <changbin.du@gmail.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[kmemleak]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • highmem: don't disable preemption on RT in kmap_atomic() · 51386120
      Sebastian Andrzej Siewior authored
      kmap_atomic() disables preemption and pagefaults for historical reasons.
      The conversion to kmap_local(), which only disables migration, cannot be
      done wholesale because quite a few call sites need to be updated to
      accommodate the changed semantics.
      
      On PREEMPT_RT enabled kernels the kmap_atomic() semantics are problematic
      due to the implicit disabling of preemption, which makes it impossible to
      acquire 'sleeping' spinlocks within the kmap atomic sections.
      
      PREEMPT_RT has replaced the preempt_disable() with a migrate_disable() for
      more than a decade.  It could be argued that this is a justification to do
      this unconditionally, but PREEMPT_RT covers only a limited number of
      architectures and it disables some functionality, which limits the coverage
      further.
      
      Limit the replacement to PREEMPT_RT for now.
      
      Link: https://lkml.kernel.org/r/20210810091116.pocdmaatdcogvdso@linutronix.de
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/early_ioremap.c: remove redundant early_ioremap_shutdown() · 395519b4
      Weizhao Ouyang authored
      early_ioremap_reset() reserved the weak function early_ioremap_shutdown()
      so that architectures could provide a specific cleanup.  No architecture
      uses it any more, so remove this redundant function.
      
      Link: https://lkml.kernel.org/r/20210901082917.399953-1-o451686892@gmail.com
      Signed-off-by: Weizhao Ouyang <o451686892@gmail.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: don't allow executable ioremap mappings · 8491502f
      Christoph Hellwig authored
      There is no need to execute from iomem (and on most platforms it is
      impossible anyway), so add the pgprot_nx() call similar to vmap.
      
      Link: https://lkml.kernel.org/r/20210824091259.1324527-3-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: move ioremap_page_range to vmalloc.c · 82a70ce0
      Christoph Hellwig authored
      Patch series "small ioremap cleanups".
      
      The first patch moves a little code around the vmalloc/ioremap boundary
      following a bigger move by Nick earlier.  The second enforces
      non-executable mapping on ioremap just like we do for vmap.  No driver
      currently uses executable mappings anyway, as they should.
      
      This patch (of 2):
      
      This keeps it together with the implementation and allows removing the
      vmap_range wrapper.
      
      Link: https://lkml.kernel.org/r/20210824091259.1324527-1-hch@lst.de
      Link: https://lkml.kernel.org/r/20210824091259.1324527-2-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • riscv: only select GENERIC_IOREMAP if MMU support is enabled · 8350229f
      Christoph Hellwig authored
      nommu ioremap is an inline stub in asm-generic/io.h.
      
      Link: https://lkml.kernel.org/r/20210825072036.GA29161@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: remove redundant compound_head() calling · fe3df441
      Muchun Song authored
      There is a READ_ONCE() in compound_head(), which prevents the compiler
      from optimizing the code when compound_head() is called more than once
      in a function.  Remove the redundant calls to compound_head() from
      page_to_index() and page_add_file_rmap() for better code generation.
      
      Link: https://lkml.kernel.org/r/20210811101431.83940-1-songmuchun@bytedance.com
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: David Howells <dhowells@redhat.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: use helper zone_is_zone_device() to simplify the code · 5ef5f810
      Miaohe Lin authored
      Patch series "Cleanup and fixups for memory hotplug".
      
      This series contains cleanups that use a helper function to simplify the
      code.  It also fixes some potential bugs.  More details can be found in
      the respective changelogs.
      
      This patch (of 3):
      
      Use helper zone_is_zone_device() to simplify the code and remove some
      explicit CONFIG_ZONE_DEVICE code.
      
      Link: https://lkml.kernel.org/r/20210821094246.10149-1-linmiaohe@huawei.com
      Link: https://lkml.kernel.org/r/20210821094246.10149-2-linmiaohe@huawei.com
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Chris Goldsworthy <cgoldswo@codeaurora.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: improved dynamic memory group aware "auto-movable" online policy · 3fcebf90
      David Hildenbrand authored
      Currently, the "auto-movable" online policy does not allow for hotplugged
      KERNEL (ZONE_NORMAL) memory to increase the amount of MOVABLE memory we
      can have, primarily because there is no coordination across memory
      devices and we don't want to create zone imbalances accidentally when
      unplugging memory.
      
      However, within a single memory device it's different.  Let's allow for
      KERNEL memory within a dynamic memory group to allow for more MOVABLE
      within the same memory group.  The only thing we have to take care of is
      that the managing driver avoids zone imbalances by unplugging MOVABLE
      memory first; otherwise there can be corner cases where unplug of memory
      could result in (accidental) zone imbalances.
      
      virtio-mem is the only user of dynamic memory groups and recently added
      support for prioritizing unplug of ZONE_MOVABLE over ZONE_NORMAL, so we
      don't need a new toggle to enable it for dynamic memory groups.
      
      We limit this handling to dynamic memory groups, because:
      
      * We want to keep the runtime overhead for collecting stats when
        onlining a single memory block small.  We tend to have only a handful of
        dynamic memory groups, but we can have quite a few static memory groups
        (e.g., 256 DIMMs).
      
      * It doesn't make too much sense for static memory groups, as we try
        onlining all applicable memory blocks either completely to ZONE_MOVABLE
        or not.  In ordinary operation, we won't have a mixture of zones within
        a static memory group.
      
      When adding memory to a dynamic memory group, we'll first online memory to
      ZONE_MOVABLE as long as early KERNEL memory allows for it.  Then, we'll
      online the next unit(s) to ZONE_NORMAL, until we can online the next
      unit(s) to ZONE_MOVABLE.
      
      For a simple virtio-mem device with a MOVABLE:KERNEL ratio of 3:1, it will
      result in a layout like:
      
        [M][M][M][M][M][M][M][M][N][M][M][M][N][M][M][M]...
        ^ movable memory due to early kernel memory
      			   ^ allows for more movable memory ...
      			      ^-----^ ... here
      				       ^ allows for more movable memory ...
      				          ^-----^ ... here
      
      While the created layout is sub-optimal when it comes to contiguous zones,
      it gives us the maximum flexibility when dynamically growing/shrinking a
      device; we can grow small VMs really big in small steps, and still shrink
      reliably to e.g., 1/4 of the maximum VM size in this example, removing
      full memory blocks along with meta data more reliably.
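
The alternating pattern in the layout above can be reproduced with a toy simulation. This is a simplified sketch, not the kernel's algorithm: `online_units` is a hypothetical function, memory is counted in whole units rather than present pages, and early kernel memory and KERNEL memory inside the group are lumped into one counter.

```python
def online_units(num_units, early_kernel_units, ratio=3):
    """Return a string of 'M'/'N' showing the zone chosen for each added unit.

    A unit goes to ZONE_MOVABLE ('M') as long as MOVABLE memory would not
    exceed ratio * KERNEL memory; otherwise it goes to ZONE_NORMAL ('N'),
    which in turn allows for more MOVABLE units afterwards.
    """
    kernel = early_kernel_units
    movable = 0
    layout = []
    for _ in range(num_units):
        if movable + 1 <= ratio * kernel:
            movable += 1
            layout.append("M")
        else:
            kernel += 1
            layout.append("N")
    return "".join(layout)
```

With two units' worth of early kernel memory and a 3:1 ratio, the first six units are onlined MOVABLE, after which every NORMAL unit is followed by three MOVABLE ones.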
      
      Mark dynamic memory groups in the xarray such that we can efficiently
      iterate over them when collecting stats.  In usual setups, we have one
      virtio-mem device per NUMA node, and usually only a small number of NUMA
      nodes.
      
      Note: for now, there seems to be no compelling reason to make this
      behavior configurable.
      
      Link: https://lkml.kernel.org/r/20210806124715.17090-10-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hui Zhu <teawater@gmail.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Marek Kedzierski <mkedzier@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: memory group aware "auto-movable" online policy · 445fcf7c
      David Hildenbrand authored
      Use memory groups to improve our "auto-movable" onlining policy:
      
      1. For static memory groups (e.g., a DIMM), online a memory block MOVABLE
         only if all other memory blocks in the group are either MOVABLE or could
         be onlined MOVABLE. A DIMM will either be MOVABLE or not, not a mixture.
      
      2. For dynamic memory groups (e.g., a virtio-mem device), online a
         memory block MOVABLE only if all other memory blocks inside the
         current unit are either MOVABLE or could be onlined MOVABLE. For a
         virtio-mem device with a device block size of 512 MiB, all 128 MiB
         memory blocks within a 512 MiB unit will either be MOVABLE or not, not
         a mixture.
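
The per-unit rule for dynamic memory groups can be sketched as follows. This is an illustration under assumptions, not the kernel implementation: `blocks_in_unit` and `may_online_movable` are hypothetical helpers, block state is kept in a plain dictionary keyed by start address, and a block that is still offline is simply assumed to be one that "could be onlined MOVABLE".

```python
MIB = 1 << 20


def blocks_in_unit(block_start, unit_size, block_size):
    """Return the start addresses of all memory blocks in the unit containing block_start."""
    unit_start = (block_start // unit_size) * unit_size
    return [unit_start + i * block_size for i in range(unit_size // block_size)]


def may_online_movable(block_start, states, unit_size=512 * MIB, block_size=128 * MIB):
    """Allow MOVABLE only if no other block in the same unit is a kernel block.

    `states` maps a block start address to "movable" or "kernel"; blocks that
    are missing from the mapping are treated as offline (assumed onlinable
    MOVABLE for the purpose of this sketch).
    """
    others = (b for b in blocks_in_unit(block_start, unit_size, block_size)
              if b != block_start)
    return all(states.get(b, "offline") in ("movable", "offline") for b in others)
```

A kernel block in one 512 MiB unit blocks MOVABLE onlining only for the other three 128 MiB blocks of that unit; blocks in neighbouring units are unaffected.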
      
      We have to pass the memory group to zone_for_pfn_range() to take the
      memory group into account.
      
      Note: for now, there seems to be no compelling reason to make this
      behavior configurable.
      
      Link: https://lkml.kernel.org/r/20210806124715.17090-9-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hui Zhu <teawater@gmail.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Marek Kedzierski <mkedzier@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • virtio-mem: use a single dynamic memory group for a single virtio-mem device · ffaa6ce8
      David Hildenbrand authored
      Let's use a single dynamic memory group.
      
      Link: https://lkml.kernel.org/r/20210806124715.17090-8-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hui Zhu <teawater@gmail.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Marek Kedzierski <mkedzier@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dax/kmem: use a single static memory group for a single probed unit · eedf634a
      David Hildenbrand authored
      Although dax/kmem users often disable auto-onlining and instead online
      memory manually (usually to ZONE_MOVABLE), there is still value in having
      auto-onlining be aware of the relationship of memory blocks.
      
      Let's treat one probed unit as a single static memory device, similar to a
      single ACPI memory device.
      
      Link: https://lkml.kernel.org/r/20210806124715.17090-7-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hui Zhu <teawater@gmail.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Marek Kedzierski <mkedzier@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ACPI: memhotplug: use a single static memory group for a single memory device · 2a157839
      David Hildenbrand authored
      Let's group all memory we add for a single memory device - we want a
      single node for that (which also seems to be the sane thing to do).
      
      We won't care for now about memory that was already added to the system
      (e.g., via e820) -- usually *all* memory of a memory device was already
      added and we'll fail acpi_memory_enable_device().
      
      Link: https://lkml.kernel.org/r/20210806124715.17090-6-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hui Zhu <teawater@gmail.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Marek Kedzierski <mkedzier@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: track present pages in memory groups · 836809ec
      David Hildenbrand authored
      Let's track all present pages in each memory group.  In particular, track
      memory present in ZONE_MOVABLE and memory present in one of the kernel
      zones (which really only is ZONE_NORMAL right now as memory groups only
      apply to hotplugged memory) separately within a memory group, to prepare
      for making smart auto-online decisions for individual memory blocks within
      a memory group based on group statistics.
      
      Link: https://lkml.kernel.org/r/20210806124715.17090-5-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hui Zhu <teawater@gmail.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Marek Kedzierski <mkedzier@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/base/memory: introduce "memory groups" to logically group memory blocks · 028fc57a
      David Hildenbrand authored
      In our "auto-movable" memory onlining policy, we want to make decisions
      across memory blocks of a single memory device.  Examples of memory
      devices include ACPI memory devices (in the simplest case a single DIMM)
      and virtio-mem.  For now, we don't have a connection between a single
      memory block device and the real memory device.  Each memory device
      consists of 1..X memory block devices.
      
      Let's logically group memory blocks belonging to the same memory device in
      "memory groups".  Memory groups can span multiple physical ranges and a
      memory group itself does not contain any information regarding physical
      ranges, only properties (e.g., "max_pages") necessary for improved memory
      onlining.
      
      Introduce two memory group types:
      
      1) Static memory group: E.g., a single ACPI memory device, consisting
         of 1..X memory resources.  A memory group consists of 1..Y memory
         blocks.  The whole group is added/removed in one go.  If any part
         cannot get offlined, the whole group cannot be removed.
      
      2) Dynamic memory group: E.g., a single virtio-mem device.  Memory is
         dynamically added/removed in a fixed granularity, called a "unit",
         consisting of 1..X memory blocks.  A unit is added/removed in one go.
         If any part of a unit cannot get offlined, the whole unit cannot be
         removed.
      
      In case of 1) we usually want either all memory managed by ZONE_MOVABLE or
      none.  In case of 2) we usually want to have as many units as possible
      managed by ZONE_MOVABLE.  We want a single unit to be of the same type.
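
The difference in removal granularity between the two group types can be sketched with a small data model. This is purely illustrative: the `MemoryGroup` class, its fields, and `removable_chunks` are hypothetical names for this example and do not mirror the kernel's data structures.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MemoryGroup:
    """Toy model of a memory group (hypothetical, for illustration only)."""
    is_dynamic: bool
    blocks_per_unit: int = 1  # only meaningful for dynamic groups
    # One entry per memory block: True if that block could be offlined.
    offlinable: List[bool] = field(default_factory=list)

    def removable_chunks(self) -> List[Tuple[int, int]]:
        """Return (start, end) block index ranges that can be removed in one go."""
        if not self.is_dynamic:
            # Static group: all or nothing.
            if self.offlinable and all(self.offlinable):
                return [(0, len(self.offlinable))]
            return []
        # Dynamic group: decide unit by unit.
        chunks = []
        for start in range(0, len(self.offlinable), self.blocks_per_unit):
            unit = self.offlinable[start:start + self.blocks_per_unit]
            if all(unit):
                chunks.append((start, start + len(unit)))
        return chunks
```

With the same four blocks, one of which cannot be offlined, a static group cannot be removed at all, whereas a dynamic group with two blocks per unit can still give up the unit that is fully offlinable.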
      
      For now, memory groups are an internal concept that is not exposed to user
      space; we might want to change that in the future, though.
      
      add_memory() users can specify a mgid instead of a nid when passing the
      MHP_NID_IS_MGID flag.
      
      Link: https://lkml.kernel.org/r/20210806124715.17090-4-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hui Zhu <teawater@gmail.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Marek Kedzierski <mkedzier@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: introduce "auto-movable" online policy · e83a437f
      David Hildenbrand authored
      When onlining without specifying a zone (using "online" instead of
      "online_kernel" or "online_movable"), we currently select a zone such that
      existing zones are kept contiguous.  This online policy made sense in the
      past, where contiguous zones were required.
      
      We'd like to implement smarter policies, however:
      
      * User space has little insight.  As one example, it has no idea which
        memory blocks logically belong together (e.g., to a DIMM or to a
        virtio-mem device).
      
      * Drivers that add memory in separate memory blocks, especially
        virtio-mem, want memory to get onlined right from the kernel when
        adding.
      
      So we really want to have onlining to differing zones managed in the
      kernel, configured by user space.
      
      We see more and more cases where we might eventually hotplug a lot of
      memory in the future (e.g., eventually grow a 2 GiB VM to 64 GiB),
      however:
      
      * Resizing happens dynamically, in smaller steps in both directions
        (e.g., 2 GiB -> 8 GiB -> 4 GiB -> 16 GiB ...)
      
      * We still want as much flexibility as possible, especially,
        hotunplugging as much memory as possible later.
      
      We can really only use "online_movable" if we know the amount of memory
      we are going to hotplug upfront, and we know that it won't result in a
      zone imbalance.  So in our example, a 2 GiB VM that could grow to 64
      GiB could currently not use "online_movable", and instead, "online_kernel"
      would have to be used, resulting in worse (no) memory hotunplug
      reliability.
      
      Let's add a new "auto-movable" online policy that considers the current
      zone ratios (global, per-node) to determine whether a memory block can
      be onlined to ZONE_MOVABLE:
      
      	MOVABLE : KERNEL
      
      However, internally we'll only consider the following ratio for now:
      
      	MOVABLE : KERNEL_EARLY
      
      For now, we don't allow for hotplugged KERNEL memory to allow for more
      MOVABLE memory, because there is no coordination across memory devices.
      In follow-up patches, we will allow for more KERNEL memory within a memory
      device to allow for more MOVABLE memory within the same memory device --
      which only makes sense for special memory device types.
      
      We base our calculation on "present pages", see the code comments for
      details.  Hotplugged memory will get onlined to ZONE_MOVABLE if the
      configured ratio allows for it.  Depending on the setup, this can result
      in fragmented zones, which can make compaction slower and dynamic
      allocation of gigantic pages when not using CMA less reliable (...  which
      is already pretty unreliable).
      
      The old policy will be the default and called "contig-zones".  In
      follow-up patches, our new policy will use additional information, such as
      memory groups, to make even smarter decisions across memory blocks.
      
      Configuration:
      
      * memory_hotplug.online_policy is used to switch between both policies
        and defaults to "contig-zones".
      
      * memory_hotplug.auto_movable_ratio defines the maximum ratio in
        percent and defaults to "301" -- allowing e.g., most 8 GiB machines to
        grow to 32 GiB and have all hotplugged memory in ZONE_MOVABLE.  The
        additional percent accounts for a handful of lost present pages (e.g.,
        firmware allocations).  User space is expected to adjust this ratio when
        enabling the new "auto-movable" policy, though.
      
      * memory_hotplug.auto_movable_numa_aware considers numa node stats in
        addition to global stats, and defaults to "true".
      
      Note: just like the old policy, the new policy won't take things like
      unmovable huge pages or memory ballooning that doesn't support balloon
      compaction into account.  User space has to configure onlining
      accordingly.
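
The ratio check described above can be sketched as follows. This is a
simplified illustration, not the kernel implementation: it works on bytes
instead of present pages, ignores the per-node accounting, and the function
name `may_online_movable` is hypothetical.

```python
MIB = 1 << 20
GIB = 1 << 30


def may_online_movable(kernel_early, movable, block, ratio_percent=301):
    """Return True if onlining `block` more bytes to ZONE_MOVABLE keeps
    MOVABLE : KERNEL_EARLY at or below `ratio_percent` (assumed semantics)."""
    return (movable + block) * 100 <= kernel_early * ratio_percent


# An 8 GiB machine may grow by 24 GiB of MOVABLE memory (32 GiB in total).
print(may_online_movable(8 * GIB, 0, 24 * GIB))

# With a handful of lost present pages, a ratio of exactly 300 would refuse
# the last blocks; the additional percent absorbs the loss.
lost = 16 * MIB  # assumed amount of firmware allocations
print(may_online_movable(8 * GIB - lost, 0, 24 * GIB, ratio_percent=300))
print(may_online_movable(8 * GIB - lost, 0, 24 * GIB, ratio_percent=301))
```

The integer comparison avoids floating point, which is also what one would
expect from kernel code, although the exact arithmetic used there is not
shown in this commit message.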
      
      Link: https://lkml.kernel.org/r/20210806124715.17090-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hui Zhu <teawater@gmail.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Marek Kedzierski <mkedzier@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e83a437f
    • David Hildenbrand's avatar
      mm: track present early pages per zone · 4b097002
      David Hildenbrand authored
      Patch series "mm/memory_hotplug: "auto-movable" online policy and memory groups", v3.
      
      I. Goal
      
      The goal of this series is improving in-kernel auto-online support.  It
      tackles the fundamental problems that:
      
       1) We can create zone imbalances when onlining all memory blindly to
          ZONE_MOVABLE, in the worst case crashing the system. We have to know
          upfront how much memory we are going to hotplug such that we can
          safely enable auto-onlining of all hotplugged memory to ZONE_MOVABLE
          via "online_movable". This is far from practical and only applicable in
          limited setups -- like inside VMs under the RHV/oVirt hypervisor which
          will never hotplug more than 3 times the boot memory (and the
          limitation is only in place due to the Linux limitation).
      
       2) We see more setups that implement dynamic VM resizing, hot(un)plugging
          memory to resize VM memory. In these setups, we might hotplug a lot of
          memory, but it might happen in various small steps in both directions
          (e.g., 2 GiB -> 8 GiB -> 4 GiB -> 16 GiB ...). virtio-mem is the
          primary driver of this upstream right now, performing such dynamic
          resizing NUMA-aware via multiple virtio-mem devices.
      
          Onlining all hotplugged memory to ZONE_NORMAL means we basically have
          no hotunplug guarantees. Onlining all to ZONE_MOVABLE means we can
          easily run into zone imbalances when growing a VM. We want a mixture,
          and we want as much memory as reasonable/configured in ZONE_MOVABLE.
          Details regarding zone imbalances can be found at [1].
      
       3) Memory devices consist of 1..X memory block devices, however, the
          kernel doesn't really track the relationship. Consequently, user
          space has no idea either. We want to make per-device decisions.
      
          As one example, for memory hotunplug it doesn't make sense to use a
          mixture of zones within a single DIMM: we want all MOVABLE if
          possible, otherwise all !MOVABLE, because any !MOVABLE part will easily
          block the whole DIMM from getting hotunplugged.
      
          As another example, virtio-mem operates on individual units that span
          1..X memory blocks. Similar to a DIMM, we want a unit to either be all
          MOVABLE or !MOVABLE. A "unit" can be thought of like a DIMM, however,
          all units of a virtio-mem device logically belong together and are
          managed (added/removed) by a single driver. We want as much memory of
          a virtio-mem device to be MOVABLE as possible.
      
       4) We want memory onlining to be done right from the kernel while adding
          memory, not triggered by user space via udev rules; for example, this
          is required for fast memory hotplug for drivers that add individual
          memory blocks, like virtio-mem. We want a way to configure a policy in
          the kernel and avoid implementing advanced policies in user space.
      
      The auto-onlining support we have in the kernel is not sufficient.  All we
      have is a) online everything MOVABLE (online_movable) b) online everything
      !MOVABLE (online_kernel) c) keep zones contiguous (online).  This series
      allows configuring c) to mean instead "online movable if possible
      according to the configuration, driven by a maximum MOVABLE:KERNEL ratio"
      -- a new onlining policy.
      
      II. Approach
      
      This series does 3 things:
      
       1) Introduces the "auto-movable" online policy that initially operates on
          individual memory blocks only. It uses a maximum MOVABLE:KERNEL ratio
          to make a decision whether a memory block will be onlined to
          ZONE_MOVABLE or not. However, in the basic form, hotplugged KERNEL
          memory does not allow for more MOVABLE memory (details in the
          patches). CMA memory is treated like MOVABLE memory.
      
       2) Introduces static (e.g., DIMM) and dynamic (e.g., virtio-mem) memory
          groups and uses group information to make decisions in the
          "auto-movable" online policy across memory blocks of a single memory
          device (modeled as memory group). More details can be found in patch
          #3 or in the DIMM example below.
      
       3) Maximizes ZONE_MOVABLE memory within dynamic memory groups, by
          allowing ZONE_NORMAL memory within a dynamic memory group to allow for
          more ZONE_MOVABLE memory within the same memory group. The target use
          case is dynamic VM resizing using virtio-mem. See the virtio-mem
          example below.
      
      I remember that the basic idea of using a ratio to implement a policy in
      the kernel was once mentioned by Vitaly Kuznetsov, but I might be wrong (I
      lost the pointer to that discussion).
      
      For me, the main use case is using it along with virtio-mem (and DIMMs /
      ppc64 dlpar where necessary) for dynamic resizing of VMs, increasing the
      amount of memory we can hotunplug reliably again if we might eventually
      hotplug a lot of memory to a VM.
      
      III. Target Usage
      
      The target usage will be:
      
       1) Linux boots with "mhp_default_online_type=offline"
      
       2) User space (e.g., systemd unit) configures memory onlining (according
          to a config file and system properties), for example:
          * Setting memory_hotplug.online_policy=auto-movable
          * Setting memory_hotplug.auto_movable_ratio=301
          * Setting memory_hotplug.auto_movable_numa_aware=true
      
       3) User space enables auto onlining via "echo online >
          /sys/devices/system/memory/auto_online_blocks"
      
       4) User space triggers manual onlining of all already-offline memory
          blocks (go over offline memory blocks and set them to "online")
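
Steps 2) to 4) could be scripted roughly as below. This is a sketch under
assumptions: the parameter files are assumed to be exposed under
/sys/module/memory_hotplug/parameters/ and the per-block "state" files under
/sys/devices/system/memory/memoryN/; the function name and the `sysfs_root`
argument (used to try the code against a scratch directory) are hypothetical.
Writing "1"/"0" for the boolean parameter is also an assumption.

```python
from pathlib import Path


def configure_auto_movable(sysfs_root="/sys", ratio=301, numa_aware=True):
    """Select the "auto-movable" policy, enable auto onlining and online
    all memory blocks that are currently offline. Returns their names."""
    root = Path(sysfs_root)

    # Step 2: configure the policy (assumed module parameter location).
    params = root / "module" / "memory_hotplug" / "parameters"
    (params / "online_policy").write_text("auto-movable\n")
    (params / "auto_movable_ratio").write_text(f"{ratio}\n")
    (params / "auto_movable_numa_aware").write_text("1\n" if numa_aware else "0\n")

    # Step 3: let the kernel online memory that gets hotplugged from now on.
    memory = root / "devices" / "system" / "memory"
    (memory / "auto_online_blocks").write_text("online\n")

    # Step 4: manually online blocks that were added while still offline.
    onlined = []
    for state in sorted(memory.glob("memory*/state")):
        if state.read_text().strip() == "offline":
            state.write_text("online\n")
            onlined.append(state.parent.name)
    return onlined
```

Running this against the real /sys requires root privileges; step 1) (booting
with "mhp_default_online_type=offline") has to be done on the kernel command
line and is not covered here.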
      
      IV. Example
      
      For DIMMs, hotplugging 4 GiB DIMMs to a 4 GiB VM with a configured ratio of
      301% results in the following layout:
      	Memory block 0-15:    DMA32   (early)
      	Memory block 32-47:   Normal  (early)
      	Memory block 48-79:   Movable (DIMM 0)
      	Memory block 80-111:  Movable (DIMM 1)
      	Memory block 112-143: Movable (DIMM 2)
      	Memory block 144-175: Normal  (DIMM 3)
      	Memory block 176-207: Normal  (DIMM 4)
      	... all Normal
      	(-> hotplugged Normal memory does not allow for more Movable memory)
      
      For virtio-mem, using a simple, single virtio-mem device with a 4 GiB VM
      will result in the following layout:
      	Memory block 0-15:    DMA32   (early)
      	Memory block 32-47:   Normal  (early)
      	Memory block 48-143:  Movable (virtio-mem, first 12 GiB)
      	Memory block 144:     Normal  (virtio-mem, next 128 MiB)
      	Memory block 145-147: Movable (virtio-mem, next 384 MiB)
      	Memory block 148:     Normal  (virtio-mem, next 128 MiB)
      	Memory block 149-151: Movable (virtio-mem, next 384 MiB)
      	... Normal/Movable mixture as above
      	(-> hotplugged Normal memory allows for more Movable memory within
      	    the same device)
      
      Which gives us maximum flexibility when dynamically growing/shrinking a
      VM in smaller steps.
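
Both layouts can be reproduced with a small simulation. It is only a model
under simplifying assumptions: 128 MiB memory blocks, exactly 4 GiB of early
kernel memory, byte counts instead of present pages, and hypothetical
function names. A DIMM is treated as a static group (one zone for the whole
DIMM, hotplugged Normal memory is not counted), a virtio-mem device as a
dynamic group (Normal memory of the device counts as KERNEL).

```python
MIB = 1 << 20
GIB = 1 << 30
BLOCK = 128 * MIB  # assumed memory block size


def fits(kernel, movable, extra, ratio_percent):
    """MOVABLE : KERNEL stays within the ratio after adding `extra` bytes."""
    return (movable + extra) * 100 <= kernel * ratio_percent


def online_dimms(boot, dimm_size, count, ratio_percent=301):
    """Zone chosen for each hotplugged DIMM, in hotplug order."""
    movable, zones = 0, []
    for _ in range(count):
        if fits(boot, movable, dimm_size, ratio_percent):
            movable += dimm_size
            zones.append("Movable")
        else:
            # Hotplugged Normal memory does not allow for more Movable memory.
            zones.append("Normal")
    return zones


def online_virtio_mem(boot, blocks, ratio_percent=301):
    """Zone chosen for each memory block of a single virtio-mem device."""
    kernel, movable, zones = boot, 0, []
    for _ in range(blocks):
        if fits(kernel, movable, BLOCK, ratio_percent):
            movable += BLOCK
            zones.append("Movable")
        else:
            # Normal memory within the device allows for more Movable memory.
            kernel += BLOCK
            zones.append("Normal")
    return zones


print(online_dimms(4 * GIB, 4 * GIB, 5))
print(online_virtio_mem(4 * GIB, 104)[96:])
```

In this model, the first three DIMMs end up Movable and all later ones
Normal, while the virtio-mem device onlines 96 blocks (12 GiB) as Movable and
then alternates between one Normal block and three Movable blocks.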
      
      V. Doc Update
      
      I'll update the memory-hotplug.rst documentation, once the overhaul [1] is
      upstream. Until then, details can be found in patch #2.
      
      VI. Future Work
      
       1) Use memory groups for ppc64 dlpar
       2) Being able to specify a portion of (early) kernel memory that will be
          excluded from the ratio. Like "128 MiB globally/per node" are excluded.
      
          This might be helpful when starting VMs with extremely small memory
          footprint (e.g., 128 MiB) and hotplugging memory later -- not wanting
          the first hotplugged units getting onlined to ZONE_MOVABLE. One
          alternative would be a trigger to not consider ZONE_DMA memory
          in the ratio. We'll have to see if this is really required.
       3) Indicate to user space that MOVABLE might be a bad idea -- especially
          relevant when memory ballooning without support for balloon compaction
          is active.
      
      This patch (of 9):
      
      For implementing a new memory onlining policy, which determines when to
      online memory blocks to ZONE_MOVABLE semi-automatically, we need the
      number of present early (boot) pages -- present pages excluding hotplugged
      pages.  Let's track these pages per zone.
      
      Pass a page instead of the zone to adjust_present_page_count(), similar to
      adjust_managed_page_count(), and derive the zone from the page.
      
      It's worth noting that a memory block to be offlined/onlined is either
      completely "early" or "not early".  add_memory() and friends can only add
      complete memory blocks and we only online/offline complete (individual)
      memory blocks.
      
      Link: https://lkml.kernel.org/r/20210806124715.17090-1-david@redhat.com
      Link: https://lkml.kernel.org/r/20210806124715.17090-2-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Marek Kedzierski <mkedzier@redhat.com>
      Cc: Hui Zhu <teawater@gmail.com>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4b097002
    • David Hildenbrand's avatar
      ACPI: memhotplug: memory resources cannot be enabled yet · 35ba0cd5
      David Hildenbrand authored
      We allocate + initialize everything from scratch.  In case enabling the
      device fails, we free all memory resources.
      
      Link: https://lkml.kernel.org/r/20210712124052.26491-5-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Blanchard <anton@ozlabs.org>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jia He <justin.he@arm.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Laurent Dufour <ldufour@linux.ibm.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michel Lespinasse <michel@lespinasse.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Nathan Lynch <nathanl@linux.ibm.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pierre Morel <pmorel@linux.ibm.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Scott Cheloha <cheloha@linux.ibm.com>
      Cc: Sergei Trofimovich <slyfox@gentoo.org>
      Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      35ba0cd5
    • David Hildenbrand's avatar
      mm/memory_hotplug: remove nid parameter from remove_memory() and friends · e1c158e4
      David Hildenbrand authored
      There is only a single user remaining.  We can simply lookup the nid only
      used for node offlining purposes when walking our memory blocks.  We don't
      expect to remove multi-nid ranges; and if we'd ever do, we most probably
      don't care about removing multi-nid ranges that actually result in empty
      nodes.
      
      If ever required, we can detect the "multi-nid" scenario and simply try
      offlining all online nodes.
      
      Link: https://lkml.kernel.org/r/20210712124052.26491-4-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Nathan Lynch <nathanl@linux.ibm.com>
      Cc: Laurent Dufour <ldufour@linux.ibm.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Scott Cheloha <cheloha@linux.ibm.com>
      Cc: Anton Blanchard <anton@ozlabs.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jia He <justin.he@arm.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michel Lespinasse <michel@lespinasse.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pankaj.gupta@ionos.com>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pierre Morel <pmorel@linux.ibm.com>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Sergei Trofimovich <slyfox@gentoo.org>
      Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e1c158e4
    • David Hildenbrand's avatar
      mm/memory_hotplug: remove nid parameter from arch_remove_memory() · 65a2aa5f
      David Hildenbrand authored
      The parameter is unused, let's remove it.
      
      Link: https://lkml.kernel.org/r/20210712124052.26491-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
      Acked-by: Heiko Carstens <hca@linux.ibm.com>	[s390]
      Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Laurent Dufour <ldufour@linux.ibm.com>
      Cc: Sergei Trofimovich <slyfox@gentoo.org>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Michel Lespinasse <michel@lespinasse.org>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Pierre Morel <pmorel@linux.ibm.com>
      Cc: Jia He <justin.he@arm.com>
      Cc: Anton Blanchard <anton@ozlabs.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Nathan Lynch <nathanl@linux.ibm.com>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Scott Cheloha <cheloha@linux.ibm.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      65a2aa5f
    • David Hildenbrand's avatar
      mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range() · 7cf209ba
      David Hildenbrand authored
      Patch series "mm/memory_hotplug: preparatory patches for new online policy and memory"
      
      These are all cleanups and one fix previously sent as part of [1]:
      [PATCH v1 00/12] mm/memory_hotplug: "auto-movable" online policy and memory
      groups.
      
      These patches make sense even without the other series, therefore I pulled
      them out to make the other series easier to digest.
      
      [1] https://lkml.kernel.org/r/20210607195430.48228-1-david@redhat.com
      
      This patch (of 4):
      
      Checkpatch complained on a follow-up patch that we are using "unsigned"
      here, which defaults to "unsigned int" and checkpatch is correct.
      
      As we will search for a fitting zone using the wrong pfn, we might end
      up onlining memory to one of the special kernel zones, such as ZONE_DMA,
      which can end badly as the onlined memory does not satisfy properties of
      these zones.
      
      Use "unsigned long" instead, just as we do in other places when handling
      PFNs.  This can bite us once we have physical addresses in the range of
      multiple TB.
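
The truncation can be illustrated with a short calculation. The sketch
assumes 4 KiB pages (PAGE_SHIFT of 12) and a 32-bit "unsigned int"; the
helper names are hypothetical and only mimic what storing a PFN in too
narrow a type does.

```python
PAGE_SHIFT = 12  # assumed: 4 KiB pages
TIB = 1 << 40


def pfn_of(phys_addr):
    """Page frame number of a physical address."""
    return phys_addr >> PAGE_SHIFT


def as_unsigned_int(value):
    """Value as it would be stored in a 32-bit "unsigned int"."""
    return value & 0xFFFFFFFF


for addr in (4 * TIB, 16 * TIB, 20 * TIB):
    pfn = pfn_of(addr)
    print(f"{addr // TIB:>2} TiB: pfn={pfn:#x} truncated={as_unsigned_int(pfn):#x}")
```

Under these assumptions, PFNs survive up to just below 16 TiB. The PFN of
16 TiB truncates to 0, and the PFN of 20 TiB aliases the PFN of 4 TiB, so a
zone lookup would be done for the wrong, much lower address.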
      
      Link: https://lkml.kernel.org/r/20210712124052.26491-2-david@redhat.com
      Fixes: e5e68930 ("mm, memory_hotplug: display allowed zones in the preferred ordering")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: virtualization@lists.linux-foundation.org
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Anton Blanchard <anton@ozlabs.org>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jia He <justin.he@arm.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Laurent Dufour <ldufour@linux.ibm.com>
      Cc: Michel Lespinasse <michel@lespinasse.org>
      Cc: Nathan Lynch <nathanl@linux.ibm.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pierre Morel <pmorel@linux.ibm.com>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Scott Cheloha <cheloha@linux.ibm.com>
      Cc: Sergei Trofimovich <slyfox@gentoo.org>
      Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7cf209ba
    • Mike Rapoport's avatar
      mm: memory_hotplug: cleanup after removal of pfn_valid_within() · 673d40c8
      Mike Rapoport authored
      When test_pages_in_a_zone() used pfn_valid_within(), it had some logic
      surrounding the pfn_valid_within() checks.
      
      Since pfn_valid_within() is gone, this logic can be removed.
      
      Link: https://lkml.kernel.org/r/20210713080035.7464-3-rppt@kernel.org
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      673d40c8
    • Mike Rapoport's avatar
      mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE · 859a85dd
      Mike Rapoport authored
      Patch series "mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE".
      
      After recent updates to freeing unused parts of the memory map, no
      architecture can have holes in the memory map within a pageblock.  This
      makes pfn_valid_within() check and CONFIG_HOLES_IN_ZONE configuration
      option redundant.
      
      The first patch removes them both in a mechanical way and the second patch
      simplifies memory_hotplug::test_pages_in_a_zone() that had
      pfn_valid_within() surrounded by more logic than simple if.
      
      This patch (of 2):
      
      After recent changes in freeing of the unused parts of the memory map and
      rework of pfn_valid() in arm and arm64 there are no architectures that can
      have holes in the memory map within a pageblock and so nothing can enable
      CONFIG_HOLES_IN_ZONE which guards non trivial implementation of
      pfn_valid_within().
      
      With that, pfn_valid_within() is always hardwired to 1 and can be
      completely removed.
      
      Remove calls to pfn_valid_within() and CONFIG_HOLES_IN_ZONE.
      
      Link: https://lkml.kernel.org/r/20210713080035.7464-1-rppt@kernel.org
      Link: https://lkml.kernel.org/r/20210713080035.7464-2-rppt@kernel.org
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      859a85dd
    • David Hildenbrand's avatar
      memory-hotplug.rst: complete admin-guide overhaul · ac3332c4
      David Hildenbrand authored
      The memory hot(un)plug documentation is outdated and incomplete.  Most of
      the content dates back to 2007, so it's time for a major overhaul.
      
      Let's rewrite, reorganize and update most parts of the documentation.  In
      addition to memory hot(un)plug, also add some details regarding
      ZONE_MOVABLE, with memory hotunplug being one of its main consumers.
      
      Drop the file history, that information can more reliably be had from the
      git log.
      
      The style of the document is also fixed so that, e.g., "restview"
      renders it cleanly now.
      
      In the future, we might add some more details about virt users like
      virtio-mem, the XEN balloon, the Hyper-V balloon and ppc64 dlpar.
      
      Link: https://lkml.kernel.org/r/20210707073205.3835-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ac3332c4
    • David Hildenbrand's avatar
      memory-hotplug.rst: remove locking details from admin-guide · df82bf5a
      David Hildenbrand authored
      Patch series "memory-hotplug.rst: complete admin-guide overhaul", v3.
      
      This patch (of 2):
      
      We have the same content at Documentation/core-api/memory-hotplug.rst and
      it doesn't fit into the admin-guide.  The documentation was accidentally
      duplicated when merging.
      
      Link: https://lkml.kernel.org/r/20210707073205.3835-1-david@redhat.com
      Link: https://lkml.kernel.org/r/20210707073205.3835-2-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      df82bf5a
  2. 29 Aug, 2021 8 commits
  3. 28 Aug, 2021 3 commits
  4. 27 Aug, 2021 1 commit
    • Linus Torvalds's avatar
      Merge tag 'block-5.14-2021-08-27' of git://git.kernel.dk/linux-block · 64b4fc45
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - Revert the mq-deadline priority handling, it's causing serious
         performance regressions. While experimental patches exist to fix
         this up, it's too late to do so now. Revert it and re-do it properly
         for 5.15 instead.
      
       - Fix a NULL vs IS_ERR() regression in this release (Dan)
      
       - Fix a mq-deadline accounting regression in this release (Bart)
      
       - Mark cryptoloop as deprecated. It's broken and dm-crypt fully
         supports it, and it's actively interfering with loop. Plan on removal
         for 5.16 (Christoph)
      
      * tag 'block-5.14-2021-08-27' of git://git.kernel.dk/linux-block:
        cryptoloop: add a deprecation warning
        pd: fix a NULL vs IS_ERR() check
        Revert "block/mq-deadline: Prioritize high-priority requests"
        mq-deadline: Fix request accounting
      64b4fc45