    mm,hwpoison: take free pages off the buddy freelists · a8b2c2ce
    Oscar Salvador authored
    The crux of the matter is that historically we left poisoned pages in the
    buddy system because we have some checks in place when allocating a page
    that act as gatekeepers for poisoned pages.  Unfortunately, we do have
    other users (e.g. compaction [1]) that scan the buddy freelists and try to
    get a page from there without checking whether the page is HWPoison.
    
    As I stated already, I think it is fundamentally wrong to keep HWPoison
    pages within the buddy system, checks in place or not.
    
    Let us fix this the same way we did for soft_offline [2], taking the page
    off the buddy freelist so it is completely unreachable.
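
    To illustrate the idea, here is a minimal userspace sketch.  It is a toy
    model, not kernel code: struct toy_page, struct toy_freelist and every
    toy_* helper are hypothetical names made up for this example.  It shows
    a scanner that, like the users mentioned above, does not check the
    poison flag, and why unlinking the page makes it unreachable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; NOT the kernel's definition. */
struct toy_page {
	unsigned long pfn;
	bool hwpoison;
	struct toy_page *prev, *next;
};

/* Toy freelist: a circular doubly linked list with a sentinel head. */
struct toy_freelist {
	struct toy_page head;
};

static void toy_freelist_init(struct toy_freelist *fl)
{
	fl->head.prev = fl->head.next = &fl->head;
}

/* Add a page at the front of the freelist. */
static void toy_freelist_add(struct toy_freelist *fl, struct toy_page *p)
{
	p->next = fl->head.next;
	p->prev = &fl->head;
	fl->head.next->prev = p;
	fl->head.next = p;
}

/* Unlink the page so that no freelist scanner can reach it anymore. */
static void toy_take_page_off_freelist(struct toy_page *p)
{
	assert(p != NULL);
	p->prev->next = p->next;
	p->next->prev = p->prev;
	p->prev = p->next = p;
}

/*
 * A scanner that grabs the first free page WITHOUT checking hwpoison,
 * modelling a freelist user that bypasses the allocation-time checks.
 */
static struct toy_page *toy_scan_and_grab(struct toy_freelist *fl)
{
	struct toy_page *p = fl->head.next;

	if (p == &fl->head)
		return NULL;	/* freelist is empty */
	toy_take_page_off_freelist(p);
	return p;
}
```

    In this model, a poisoned page left on the list can be returned by
    toy_scan_and_grab(), whereas a page unlinked at poison time never can,
    no matter which checks the scanner performs.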
    
    Note that this is fairly simple to trigger, as we only need to poison free
    buddy pages (madvise MADV_HWPOISON) and then run some sort of memory
    stress workload.
    
    Just for reference, I put a dump_page() in compaction_alloc() to trigger
    for HWPoison pages:
    
        page:0000000012b2982b refcount:1 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x1d5db
        flags: 0xfffffc0800000(hwpoison)
        raw: 000fffffc0800000 ffffea00007573c8 ffffc90000857de0 0000000000000000
        raw: 0000000000000001 0000000000000000 00000001ffffffff 0000000000000000
        page dumped because: compaction_alloc
    
        CPU: 4 PID: 123 Comm: kcompactd0 Tainted: G            E     5.9.0-rc2-mm1-1-default+ #5
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
        Call Trace:
         dump_stack+0x6d/0x8b
         compaction_alloc+0xb2/0xc0
         migrate_pages+0x2a6/0x12a0
         compact_zone+0x5eb/0x11c0
         proactive_compact_node+0x89/0xf0
         kcompactd+0x2d0/0x3a0
         kthread+0x118/0x130
         ret_from_fork+0x22/0x30
    
    After that, if e.g. a process faults in the page, it will get killed
    unexpectedly.
    Fix it by containing the page immediately.
    
    Besides that, two more changes can be noticed:
    
    * MF_DELAYED no longer fits, as we are fixing the issue by containing
      the page immediately, so we no longer rely on the allocation-time
      checks to stop HWPoison pages from being handed over.
      The page cannot be handed out again unless it is unpoisoned, so we
      fixed the situation.
      Because of that, let us use MF_RECOVERED from now on.
    
    * The second block that handles PageBuddy pages is no longer needed:
      We used to call shake_page and then check whether the page was Buddy
      because shake_page calls drain_all_pages, which sends pcp-pages back to
      the buddy freelists, so we could have a chance to handle free pages.
      Currently, get_hwpoison_page already calls drain_all_pages, and we call
      get_hwpoison_page right before coming here, so we should be on the safe
      side.
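
    The change in the reported outcome can be sketched as follows.  This is
    a toy model under stated assumptions, not the code in memory-failure.c:
    enum toy_mf_result, struct toy_page and both toy_handle_* functions are
    hypothetical, and the TOY_MF_FAILED branch (the page left the freelist
    before we could take it) is an assumption added for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the outcome names used in this message. */
enum toy_mf_result { TOY_MF_FAILED, TOY_MF_DELAYED, TOY_MF_RECOVERED };

/* Toy page state; NOT the kernel's struct page. */
struct toy_page {
	bool hwpoison;
	bool on_freelist;
};

/*
 * Old behaviour: mark the page and leave it on the freelist, relying on
 * allocation-time checks later.  Containment is deferred, hence DELAYED.
 */
static enum toy_mf_result toy_handle_free_page_old(struct toy_page *p)
{
	assert(p != NULL);
	p->hwpoison = true;
	return TOY_MF_DELAYED;
}

/*
 * New behaviour: mark the page and take it off the freelist right away.
 * Once it is off the list nothing can hand it out, hence RECOVERED.
 */
static enum toy_mf_result toy_handle_free_page_new(struct toy_page *p)
{
	assert(p != NULL);
	p->hwpoison = true;
	if (!p->on_freelist)
		return TOY_MF_FAILED;	/* assumption: someone else took it */
	p->on_freelist = false;
	return TOY_MF_RECOVERED;
}
```

    The point of the model is only that the result reflects whether the
    page is still reachable: DELAYED while it sits on the freelist,
    RECOVERED once it has been removed.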
    
    [1] https://lore.kernel.org/linux-mm/20190826104144.GA7849@linux/T/#u
    [2] https://patchwork.kernel.org/cover/11792607/
    
    [osalvador@suse.de: take the poisoned subpage off the buddy freelists]
      Link: https://lkml.kernel.org/r/20201013144447.6706-4-osalvador@suse.de
    
    Link: https://lkml.kernel.org/r/20201013144447.6706-3-osalvador@suse.de
    Signed-off-by: Oscar Salvador <osalvador@suse.de>
    Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
memory-failure.c 52.4 KB