    mm/memory_hotplug: don't access uninitialized memmaps in shrink_zone_span() · 7ce700bf
    David Hildenbrand authored
    Let's limit shrinking to !ZONE_DEVICE so we can fix the current code.
    We should never try to touch the memmap of offline sections where we
    could have uninitialized memmaps and could trigger BUGs when calling
    page_to_nid() on poisoned pages.
    
    There is no reliable way to distinguish an uninitialized memmap from an
    initialized memmap that belongs to ZONE_DEVICE, as we don't have
    anything like SECTION_IS_ONLINE we can use similar to
    pfn_to_online_section() for !ZONE_DEVICE memory.
    
    E.g., set_zone_contiguous() similarly relies on pfn_to_online_section()
    and will therefore never mark a ZONE_DEVICE zone as contiguous.  No
    longer shrinking the ZONE_DEVICE zone therefore results in no observable
    changes, besides /proc/zoneinfo indicating different boundaries -
    something we can totally live with.
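
    The idea can be sketched as a small user-space model.  This is not the
    kernel code: all model_* names, the section size and the data layout are
    hypothetical, and only removal at the start of the zone is handled.  It
    shows the two rules described above: offline sections are never
    inspected, and a ZONE_DEVICE zone is left unshrunk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these types are the kernel's. */
enum model_zone_type { MODEL_ZONE_NORMAL, MODEL_ZONE_DEVICE };

#define MODEL_PAGES_PER_SECTION 4UL	/* assumed, tiny for illustration */

struct model_section {
	bool online;	/* stands in for SECTION_IS_ONLINE */
	int nid;	/* only trustworthy when the section is online */
};

struct model_zone {
	enum model_zone_type type;
	int nid;
	unsigned long start_pfn;
	unsigned long spanned_pages;
};

/*
 * Find the lowest pfn in [start, end) that lies in an online section of
 * the given node. Offline sections are skipped without reading them,
 * mirroring the "never touch the memmap of offline sections" rule.
 */
static bool model_find_smallest_pfn(const struct model_section *sections,
				    size_t nr_sections, int nid,
				    unsigned long start, unsigned long end,
				    unsigned long *out)
{
	for (unsigned long pfn = start; pfn < end;
	     pfn += MODEL_PAGES_PER_SECTION) {
		size_t idx = pfn / MODEL_PAGES_PER_SECTION;

		if (idx >= nr_sections || !sections[idx].online)
			continue;	/* possibly uninitialized: do not look */
		if (sections[idx].nid != nid)
			continue;
		*out = pfn;
		return true;
	}
	return false;
}

/* Shrink the zone after [removed_start, removed_end) went away. */
static void model_shrink_zone_span(struct model_zone *zone,
				   const struct model_section *sections,
				   size_t nr_sections,
				   unsigned long removed_start,
				   unsigned long removed_end)
{
	unsigned long zone_end = zone->start_pfn + zone->spanned_pages;
	unsigned long pfn;

	/* Online state is unknown for device memory: keep the span as is. */
	if (zone->type == MODEL_ZONE_DEVICE)
		return;

	if (removed_start != zone->start_pfn)
		return;		/* this sketch only handles removal at the start */

	if (model_find_smallest_pfn(sections, nr_sections, zone->nid,
				    removed_end, zone_end, &pfn)) {
		zone->start_pfn = pfn;
		zone->spanned_pages = zone_end - pfn;
	} else {
		zone->start_pfn = 0;
		zone->spanned_pages = 0;
	}
}
```

    With four sections of which only the last two are online, removing the
    first section from a normal zone moves its start to the third section,
    while an otherwise identical device zone keeps its old boundaries.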
    
    Before commit d0dc12e8 ("mm/memory_hotplug: optimize memory
    hotplug"), the memmap was initialized with 0 and the node with the right
    value.  So the zone might be wrong but not garbage.  After that commit,
    both the zone and the node will be garbage when touching uninitialized
    memmaps.
    
    Toshiki reported a BUG (race between delayed initialization of
    ZONE_DEVICE memmaps without holding the memory hotplug lock and
    concurrent zone shrinking).
    
      https://lkml.org/lkml/2019/11/14/1040
    
    "Iteration of create and destroy namespace causes the panic as below:
    
          kernel BUG at mm/page_alloc.c:535!
          CPU: 7 PID: 2766 Comm: ndctl Not tainted 5.4.0-rc4 #6
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
          RIP: 0010:set_pfnblock_flags_mask+0x95/0xf0
          Call Trace:
           memmap_init_zone_device+0x165/0x17c
           memremap_pages+0x4c1/0x540
           devm_memremap_pages+0x1d/0x60
           pmem_attach_disk+0x16b/0x600 [nd_pmem]
           nvdimm_bus_probe+0x69/0x1c0
           really_probe+0x1c2/0x3e0
           driver_probe_device+0xb4/0x100
           device_driver_attach+0x4f/0x60
           bind_store+0xc9/0x110
           kernfs_fop_write+0x116/0x190
           vfs_write+0xa5/0x1a0
           ksys_write+0x59/0xd0
           do_syscall_64+0x5b/0x180
           entry_SYSCALL_64_after_hwframe+0x44/0xa9
    
      While creating a namespace and initializing memmap, if you destroy the
      namespace and shrink the zone, it will initialize the memmap outside
      the zone and trigger VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page),
      pfn), page) in set_pfnblock_flags_mask()."
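
    The reported condition can be illustrated with another user-space
    sketch.  The model_* names and the pfn values are hypothetical; only the
    span check follows the shape of zone_spans_pfn().  It counts how many
    pfns of a still-running memmap initialization end up outside the zone
    once the zone has been shrunk underneath it.

```c
#include <stdbool.h>

/* Hypothetical model of a zone span; not the kernel's struct zone. */
struct model_span {
	unsigned long start_pfn;
	unsigned long spanned_pages;
};

/* Same shape as the check behind the reported VM_BUG_ON_PAGE(). */
static bool model_zone_spans_pfn(const struct model_span *zone,
				 unsigned long pfn)
{
	return zone->start_pfn <= pfn &&
	       pfn < zone->start_pfn + zone->spanned_pages;
}

/* Concurrent shrinking: drop nr_pages from the end of the span. */
static void model_shrink_tail(struct model_span *zone, unsigned long nr_pages)
{
	if (nr_pages >= zone->spanned_pages)
		zone->spanned_pages = 0;
	else
		zone->spanned_pages -= nr_pages;
}

/*
 * Count the pfns in [start, end) that a delayed initializer would touch
 * although the zone no longer spans them. Each such pfn would trip the
 * check in this model.
 */
static unsigned long model_count_out_of_span(const struct model_span *zone,
					     unsigned long start,
					     unsigned long end)
{
	unsigned long bad = 0;

	for (unsigned long pfn = start; pfn < end; pfn++)
		if (!model_zone_spans_pfn(zone, pfn))
			bad++;
	return bad;
}
```

    As long as the zone is not shrunk while initialization is in flight,
    the count stays at zero, which is the effect of no longer shrinking
    the ZONE_DEVICE zone.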
    
    This BUG is also mitigated by this commit: for now, we stop shrinking
    the ZONE_DEVICE zone until we can do it in a safe and clean way.
    
    Link: http://lkml.kernel.org/r/20191006085646.5768-5-david@redhat.com
    Fixes: f1dd2cd1 ("mm, memory_hotplug: do not associate hotadded memory to zones until online")	[visible after d0dc12e8]
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Reported-by: Toshiki Fukasawa <t-fukasawa@vx.jp.nec.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Anshuman Khandual <anshuman.khandual@arm.com>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christian Borntraeger <borntraeger@de.ibm.com>
    Cc: Christophe Leroy <christophe.leroy@c-s.fr>
    Cc: Damian Tometzki <damian.tometzki@gmail.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Fenghua Yu <fenghua.yu@intel.com>
    Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Halil Pasic <pasic@linux.ibm.com>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Ira Weiny <ira.weiny@intel.com>
    Cc: Jason Gunthorpe <jgg@ziepe.ca>
    Cc: Jun Yao <yaojun8558363@gmail.com>
    Cc: Logan Gunthorpe <logang@deltatee.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
    Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Mike Rapoport <rppt@linux.ibm.com>
    Cc: Pankaj Gupta <pagupta@redhat.com>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Qian Cai <cai@lca.pw>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Robin Murphy <robin.murphy@arm.com>
    Cc: Steve Capper <steve.capper@arm.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tom Lendacky <thomas.lendacky@amd.com>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Vasily Gorbik <gor@linux.ibm.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Wei Yang <richard.weiyang@gmail.com>
    Cc: Wei Yang <richardw.yang@linux.intel.com>
    Cc: Will Deacon <will@kernel.org>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Cc: Yu Zhao <yuzhao@google.com>
    Cc: <stable@vger.kernel.org>	[4.13+]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>