  1. 27 May, 2022 4 commits
  2. 13 May, 2022 2 commits
    • zsmalloc: fix races between asynchronous zspage free and page migration · 2505a981
      Sultan Alsawaf authored
      The asynchronous zspage free worker tries to lock a zspage's entire page
      list without defending against page migration.  Since pages which haven't
      yet been locked can concurrently migrate off the zspage page list while
      lock_zspage() churns away, lock_zspage() can suffer from a few different
      lethal races.
      
      It can lock a page which no longer belongs to the zspage and unsafely
      dereference page_private(), it can unsafely dereference a torn pointer to
      the next page (since there's a data race), and it can observe a spurious
      NULL pointer to the next page and thus not lock all of the zspage's pages
      (since a single page migration will reconstruct the entire page list, and
      create_page_chain() unconditionally zeroes out each list pointer in the
      process).
      
      Fix the races by using migrate_read_lock() in lock_zspage() to synchronize
      with page migration.
      
      Link: https://lkml.kernel.org/r/20220509024703.243847-1-sultan@kerneltoast.com
      Fixes: 77ff4657 ("zsmalloc: zs_page_migrate: skip unnecessary loops but not return -EBUSY if zspage is not inuse")
      Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Revert "mm/cma.c: remove redundant cma_mutex lock" · 60a60e32
      Dong Aisheng authored
      This reverts commit a4efc174, which introduced a regression: when
      multiple processes allocate DMA memory in parallel by calling
      dma_alloc_coherent(), the allocation may sometimes fail as follows:
      
      Error log:
      cma: cma_alloc: linux,cma: alloc failed, req-size: 148 pages, ret: -16
      cma: number of available pages:
      3@125+20@172+12@236+4@380+32@736+17@2287+23@2473+20@36076+99@40477+108@40852+44@41108+20@41196+108@41364+108@41620+
      108@42900+108@43156+483@44061+1763@45341+1440@47712+20@49324+20@49388+5076@49452+2304@55040+35@58141+20@58220+20@58284+
      7188@58348+84@66220+7276@66452+227@74525+6371@75549=> 33161 free of 81920 total pages
      
      When the issue happened, there were still 33161 pages (129M) of free CMA
      memory, and the CMA bitmap had many free slots large enough for the
      148 pages we wanted to allocate.
      
      When dumping memory info, we found that there was also ~342M of free
      normal memory, but only 1352K of free CMA memory left in the buddy
      system, while many pageblocks were isolated.
      
      Memory info log:
      Normal free:351096kB min:30000kB low:37500kB high:45000kB reserved_highatomic:0KB
      	    active_anon:98060kB inactive_anon:98948kB active_file:60864kB inactive_file:31776kB
      	    unevictable:0kB writepending:0kB present:1048576kB managed:1018328kB mlocked:0kB
      	    bounce:0kB free_pcp:220kB local_pcp:192kB free_cma:1352kB lowmem_reserve[]: 0 0 0
      Normal: 78*4kB (UECI) 1772*8kB (UMECI) 1335*16kB (UMECI) 360*32kB (UMECI) 65*64kB (UMCI)
      	36*128kB (UMECI) 16*256kB (UMCI) 6*512kB (EI) 8*1024kB (UEI) 4*2048kB (MI) 8*4096kB (EI)
      	8*8192kB (UI) 3*16384kB (EI) 8*32768kB (M) = 489288kB
      
      The root cause of this issue is that since commit a4efc174 ("mm/cma.c:
      remove redundant cma_mutex lock"), CMA supports concurrent memory
      allocation.  The memory range that process A is trying to allocate may
      already have been isolated by process B's allocation during memory
      migration.
      
      The problem is that the memory range isolated during one allocation by
      start_isolate_page_range() can be much bigger than the size actually
      requested, because the range is aligned to MAX_ORDER_NR_PAGES.
      
      Take an ARMv7 platform with 1G of memory as an example: when
      MAX_ORDER_NR_PAGES is big (e.g.  32M with max_order 14) and CMA memory is
      relatively small (e.g.  128M), there are only 4 MAX_ORDER slots, so it is
      very easy for all CMA memory to have already been isolated by other
      processes when one process tries to allocate memory using
      dma_alloc_coherent().  Since the current CMA code scans the whole
      available CMA memory only once, dma_alloc_coherent() may easily fail due
      to contention with other processes.
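      The alignment arithmetic above can be checked with a small sketch. It
      assumes 4 KiB pages and MAX_ORDER_NR_PAGES = 1 << (MAX_ORDER - 1), and
      models the isolated range as the request widened to MAX_ORDER_NR_PAGES
      boundaries on both ends, as the message describes; isolated_pages() and
      the other helpers are hypothetical names, not kernel functions.

```c
/* Assumption: 4 KiB pages, as on a typical ARMv7 configuration. */
#define PAGE_KB 4UL

/* MAX_ORDER_NR_PAGES = 1 << (MAX_ORDER - 1) pages. */
static unsigned long max_order_nr_pages(unsigned int max_order)
{
	return 1UL << (max_order - 1);
}

/* Round a pfn down/up to a power-of-two alignment `a`. */
static unsigned long align_down(unsigned long pfn, unsigned long a)
{
	return pfn & ~(a - 1);
}

static unsigned long align_up(unsigned long pfn, unsigned long a)
{
	return (pfn + a - 1) & ~(a - 1);
}

/* Pages isolated when a request of nr_pages starting at pfn is widened to
 * MAX_ORDER_NR_PAGES boundaries on both ends. */
static unsigned long isolated_pages(unsigned long pfn, unsigned long nr_pages,
				    unsigned int max_order)
{
	unsigned long a = max_order_nr_pages(max_order);

	return align_up(pfn + nr_pages, a) - align_down(pfn, a);
}
```

      With max_order 14, MAX_ORDER_NR_PAGES is 8192 pages (32M), so a 128M CMA
      area holds only 4 such blocks. A 148-page request at a hypothetical pfn
      10000 isolates a full 8192-page block, and one straddling a boundary (pfn
      8100) isolates two, i.e. half of the whole CMA area.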
      
      This patch simply falls back to the original method of using cma_mutex
      to make alloc_contig_range() run sequentially, avoiding the issue.
      
      Link: https://lkml.kernel.org/r/20220509094551.3596244-1-aisheng.dong@nxp.com
      Link: https://lore.kernel.org/all/20220315144521.3810298-2-aisheng.dong@nxp.com/
      Fixes: a4efc174 ("mm/cma.c: remove redundant cma_mutex lock")
      Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>	[5.11+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  3. 10 May, 2022 7 commits
  4. 29 Apr, 2022 7 commits
    • mm/hwpoison: use pr_err() instead of dump_page() in get_any_page() · 1825b93b
      Naoya Horiguchi authored
      The following VM_BUG_ON_FOLIO() is triggered when a memory error event
      happens on (thp/folio) pages that are about to be freed:
      
        [ 1160.232771] page:00000000b36a8a0f refcount:1 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x16a000
        [ 1160.236916] page:00000000b36a8a0f refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x16a000
        [ 1160.240684] flags: 0x57ffffc0800000(hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
        [ 1160.243458] raw: 0057ffffc0800000 dead000000000100 dead000000000122 0000000000000000
        [ 1160.246268] raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
        [ 1160.249197] page dumped because: VM_BUG_ON_FOLIO(!folio_test_large(folio))
        [ 1160.251815] ------------[ cut here ]------------
        [ 1160.253438] kernel BUG at include/linux/mm.h:788!
        [ 1160.256162] invalid opcode: 0000 [#1] PREEMPT SMP PTI
        [ 1160.258172] CPU: 2 PID: 115368 Comm: mceinj.sh Tainted: G            E     5.18.0-rc1-v5.18-rc1-220404-2353-005-g83111+ #3
        [ 1160.262049] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
        [ 1160.265103] RIP: 0010:dump_page.cold+0x27e/0x2bd
        [ 1160.266757] Code: fe ff ff 48 c7 c6 81 f1 5a 98 e9 4c fe ff ff 48 c7 c6 a1 95 59 98 e9 40 fe ff ff 48 c7 c6 50 bf 5a 98 48 89 ef e8 9d 04 6d ff <0f> 0b 41 f7 c4 ff 0f 00 00 0f 85 9f fd ff ff 49 8b 04 24 a9 00 00
        [ 1160.273180] RSP: 0018:ffffaa2c4d59fd18 EFLAGS: 00010292
        [ 1160.274969] RAX: 000000000000003e RBX: 0000000000000001 RCX: 0000000000000000
        [ 1160.277263] RDX: 0000000000000001 RSI: ffffffff985995a1 RDI: 00000000ffffffff
        [ 1160.279571] RBP: ffffdc9c45a80000 R08: 0000000000000000 R09: 00000000ffffdfff
        [ 1160.281794] R10: ffffaa2c4d59fb08 R11: ffffffff98940d08 R12: ffffdc9c45a80000
        [ 1160.283920] R13: ffffffff985b6f94 R14: 0000000000000000 R15: ffffdc9c45a80000
        [ 1160.286641] FS:  00007eff54ce1740(0000) GS:ffff99c67bd00000(0000) knlGS:0000000000000000
        [ 1160.289498] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [ 1160.291106] CR2: 00005628381a5f68 CR3: 0000000104712003 CR4: 0000000000170ee0
        [ 1160.293031] Call Trace:
        [ 1160.293724]  <TASK>
        [ 1160.294334]  get_hwpoison_page+0x47d/0x570
        [ 1160.295474]  memory_failure+0x106/0xaa0
        [ 1160.296474]  ? security_capable+0x36/0x50
        [ 1160.297524]  hard_offline_page_store+0x43/0x80
        [ 1160.298684]  kernfs_fop_write_iter+0x11c/0x1b0
        [ 1160.299829]  new_sync_write+0xf9/0x160
        [ 1160.300810]  vfs_write+0x209/0x290
        [ 1160.301835]  ksys_write+0x4f/0xc0
        [ 1160.302718]  do_syscall_64+0x3b/0x90
        [ 1160.303664]  entry_SYSCALL_64_after_hwframe+0x44/0xae
        [ 1160.304981] RIP: 0033:0x7eff54b018b7
      
      As the RIP address shows, this VM_BUG_ON in folio_entire_mapcount() is
      reached from dump_page("hwpoison: unhandlable page") in get_any_page().
      The race happens as follows:
      
        CPU 0                                       CPU 1
      
          memory_failure
            get_hwpoison_page
              get_any_page
                dump_page
                  compound = PageCompound
                                                      free_pages_prepare
                                                        page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP
                  folio_entire_mapcount
                    VM_BUG_ON_FOLIO(!folio_test_large(folio))
      
      So replace dump_page() with a safer alternative, pr_err().
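      Because the race is a check-then-use between two reads of the page flags,
      it can be reproduced deterministically in a single thread. The sketch
      below is a user-space model, not kernel code: struct fake_page,
      FAKE_PG_HEAD, dump_page_sim() and report_unhandlable() are hypothetical,
      a callback plays the role of CPU 1, and the message format only mimics a
      pr_err() line rather than reproducing the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define FAKE_PG_HEAD 0x1UL /* hypothetical "compound head" flag bit */

/* Hypothetical stand-in for struct page: just flags and a pfn. */
struct fake_page {
	unsigned long flags;
	unsigned long pfn;
};

/* Simulates CPU 1 running between the two reads on CPU 0. */
typedef void (*interleave_fn)(struct fake_page *);

/* Models free_pages_prepare() clearing the flags checked at prep time. */
static void free_pages_prepare_sim(struct fake_page *p)
{
	p->flags &= ~FAKE_PG_HEAD;
}

/* Check-then-use, like the dump_page() path in the diagram: "compound" is
 * decided from one read of the flags, then a later step assumes the page is
 * still large. Returns false where the kernel would hit VM_BUG_ON_FOLIO(). */
static bool dump_page_sim(struct fake_page *p, interleave_fn cpu1)
{
	bool compound = (p->flags & FAKE_PG_HEAD) != 0;

	if (cpu1 != NULL)
		cpu1(p); /* the other CPU frees the page right here */
	if (compound && !(p->flags & FAKE_PG_HEAD))
		return false; /* the folio_test_large() assertion would fire */
	return true;
}

/* pr_err()-style report: prints only the pfn and never re-derives compound
 * state from flags that may be changing underneath it. */
static int report_unhandlable(const struct fake_page *p, char *buf, size_t len)
{
	return snprintf(buf, len, "%#lx: unhandlable page.\n", p->pfn);
}
```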
      
      Link: https://lkml.kernel.org/r/20220427053220.719866-1-naoya.horiguchi@linux.dev
      Fixes: 74e8ee47 ("mm: Turn head_compound_mapcount() into folio_entire_mapcount()")
      Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Reviewed-by: John Hubbard <jhubbard@nvidia.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/huge_memory: do not overkill when splitting huge_zero_page · 478d134e
      Xu Yu authored
      The kernel panics when a memory failure is injected on the global
      huge_zero_page with CONFIG_DEBUG_VM enabled, as follows:
      
        Injecting memory failure for pfn 0x109ff9 at process virtual address 0x20ff9000
        page:00000000fb053fc3 refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x109e00
        head:00000000fb053fc3 order:9 compound_mapcount:0 compound_pincount:0
        flags: 0x17fffc000010001(locked|head|node=0|zone=2|lastcpupid=0x1ffff)
        raw: 017fffc000010001 0000000000000000 dead000000000122 0000000000000000
        raw: 0000000000000000 0000000000000000 00000002ffffffff 0000000000000000
        page dumped because: VM_BUG_ON_PAGE(is_huge_zero_page(head))
        ------------[ cut here ]------------
        kernel BUG at mm/huge_memory.c:2499!
        invalid opcode: 0000 [#1] PREEMPT SMP PTI
        CPU: 6 PID: 553 Comm: split_bug Not tainted 5.18.0-rc1+ #11
        Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014
        RIP: 0010:split_huge_page_to_list+0x66a/0x880
        Code: 84 9b fb ff ff 48 8b 7c 24 08 31 f6 e8 9f 5d 2a 00 b8 b8 02 00 00 e9 e8 fb ff ff 48 c7 c6 e8 47 3c 82 4c b
        RSP: 0018:ffffc90000dcbdf8 EFLAGS: 00010246
        RAX: 000000000000003c RBX: 0000000000000001 RCX: 0000000000000000
        RDX: 0000000000000000 RSI: ffffffff823e4c4f RDI: 00000000ffffffff
        RBP: ffff88843fffdb40 R08: 0000000000000000 R09: 00000000fffeffff
        R10: ffffc90000dcbc48 R11: ffffffff82d68448 R12: ffffea0004278000
        R13: ffffffff823c6203 R14: 0000000000109ff9 R15: ffffea000427fe40
        FS:  00007fc375a26740(0000) GS:ffff88842fd80000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007fc3757c9290 CR3: 0000000102174006 CR4: 00000000003706e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
        try_to_split_thp_page+0x3a/0x130
        memory_failure+0x128/0x800
        madvise_inject_error.cold+0x8b/0xa1
        __x64_sys_madvise+0x54/0x60
        do_syscall_64+0x35/0x80
        entry_SYSCALL_64_after_hwframe+0x44/0xae
        RIP: 0033:0x7fc3754f8bf9
        Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8
        RSP: 002b:00007ffeda93a1d8 EFLAGS: 00000217 ORIG_RAX: 000000000000001c
        RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3754f8bf9
        RDX: 0000000000000064 RSI: 0000000000003000 RDI: 0000000020ff9000
        RBP: 00007ffeda93a200 R08: 0000000000000000 R09: 0000000000000000
        R10: 00000000ffffffff R11: 0000000000000217 R12: 0000000000400490
        R13: 00007ffeda93a2e0 R14: 0000000000000000 R15: 0000000000000000
      
      We think raising a BUG is overkill when splitting huge_zero_page.  The
      huge_zero_page cannot be reached from normal paths other than memory
      failure, but memory failure is a valid caller.  So we replace the BUG
      with a WARN plus returning -EBUSY, and the panic above can no longer
      happen.
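      The message does not show the replacement code, so the following is only
      a user-space sketch of the pattern it describes: a fatal assertion turned
      into a warning plus -EBUSY. struct fake_thp and split_huge_page_sim() are
      hypothetical, and the one-shot warning is an assumption about how the
      WARN might be limited, not necessarily what the patch does.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a THP head page. */
struct fake_thp {
	bool is_huge_zero; /* the shared, global huge zero page */
	bool split;        /* set once the page has been split */
};

static int split_warned; /* emulates a one-shot, WARN_ON_ONCE()-style warning */

/* Previously this check was a fatal assertion (VM_BUG_ON_PAGE), so a valid
 * caller such as memory failure could crash the machine. Here the caller
 * gets -EBUSY and can carry on. */
static int split_huge_page_sim(struct fake_thp *head)
{
	if (head->is_huge_zero) {
		if (!split_warned) {
			split_warned = 1;
			fprintf(stderr, "WARNING: split requested for huge zero page\n");
		}
		return -EBUSY;
	}
	head->split = true;
	return 0;
}
```

      A caller such as a memory-failure handler can then treat -EBUSY as "could
      not split" and report the failure instead of panicking.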
      
      Link: https://lkml.kernel.org/r/f35f8b97377d5d3ede1bc5ac3114da888c57cbce.1651052574.git.xuyu@linux.alibaba.com
      Fixes: d173d541 ("mm/memory-failure.c: skip huge_zero_page in memory_failure()")
      Fixes: 6a46079c ("HWPOISON: The high level memory error handler in the VM v7")
      Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
      Suggested-by: Yang Shi <shy828301@gmail.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Revert "mm/memory-failure.c: skip huge_zero_page in memory_failure()" · b4e61fc0
      Xu Yu authored
      Patch series "mm/memory-failure: rework fix on huge_zero_page splitting".
      
      
      This patch (of 2):
      
      This reverts commit d173d541.
      
      The commit d173d541 ("mm/memory-failure.c: skip huge_zero_page in
      memory_failure()") explicitly skips huge_zero_page in memory_failure(), in
      order to avoid triggering VM_BUG_ON_PAGE on huge_zero_page in
      split_huge_page_to_list().
      
      This works, but Yang Shi argues:
      
          Raising BUG is overkilling for splitting huge_zero_page. The
          huge_zero_page can't be met from normal paths other than memory
          failure, but memory failure is a valid caller. So I tend to replace
          the BUG to WARN + returning -EBUSY. If we don't care about the
          reason code in memory failure, we don't have to touch memory
          failure.
      
      As for the issue of huge_zero_page being marked PG_has_hwpoisoned,
      Yang Shi comments:
      
          The anonymous page fault doesn't check if the page is poisoned or
          not since it typically gets a fresh allocated page and assumes the
          poisoned page (isolated successfully) can't be reallocated again.
          But huge zero page and base zero page are reused every time. So no
          matter what fix we pick, the issue is always there.
      
      Finally, Yang, David, Anshuman and Naoya all agreed to fix the bug
      triggered by splitting huge_zero_page in split_huge_page_to_list().
      
      This reverts the commit d173d541 ("mm/memory-failure.c: skip
      huge_zero_page in memory_failure()"), and the original bug will be fixed
      by the next patch.
      
      Link: https://lkml.kernel.org/r/872cefb182ba1dd686b0e7db1e6b2ebe5a4fff87.1651039624.git.xuyu@linux.alibaba.com
      Fixes: d173d541 ("mm/memory-failure.c: skip huge_zero_page in memory_failure()")
      Fixes: 6a46079c ("HWPOISON: The high level memory error handler in the VM v7")
      Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
      Suggested-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Merge tag 'drm-fixes-2022-04-29' of git://anongit.freedesktop.org/drm/drm · 38d741cb
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Another relatively quiet week, amdgpu leads the way, some i915 display
        fixes, and a single sunxi fix.
      
        amdgpu:
         - Runtime pm fix
         - DCN memory leak fix in error path
         - SI DPM deadlock fix
         - S0ix fix
      
        amdkfd:
         - GWS fix
         - GWS support for CRIU
      
        i915:
         - Fix #5284: Backlight control regression on XMG Core 15 e21
         - Fix black display plane on Acer One AO532h
         - Two smaller display fixes
      
        sunxi:
         - Single fix to stop applying PHYS_OFFSET twice"
      
      * tag 'drm-fixes-2022-04-29' of git://anongit.freedesktop.org/drm/drm:
        drm/amdgpu: keep mmhub clock gating being enabled during s2idle suspend
        drm/amd/pm: fix the deadlock issue observed on SI
        drm/amd/display: Fix memory leak in dcn21_clock_source_create
        drm/amdgpu: don't runtime suspend if there are displays attached (v3)
        drm/amdkfd: CRIU add support for GWS queues
        drm/amdkfd: Fix GWS queue count
        drm/sun4i: Remove obsolete references to PHYS_OFFSET
        drm/i915/fbc: Consult hw.crtc instead of uapi.crtc
        drm/i915: Fix SEL_FETCH_PLANE_*(PIPE_B+) register addresses
        drm/i915: Check EDID for HDR static metadata when choosing blc
        drm/i915: Fix DISP_POS_Y and DISP_HEIGHT defines
    • Merge tag 'amd-drm-fixes-5.18-2022-04-27' of... · 9d9f7207
      Dave Airlie authored
      Merge tag 'amd-drm-fixes-5.18-2022-04-27' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
      
      amd-drm-fixes-5.18-2022-04-27:
      
      amdgpu:
      - Runtime pm fix
      - DCN memory leak fix in error path
      - SI DPM deadlock fix
      - S0ix fix
      
      amdkfd:
      - GWS fix
      - GWS support for CRIU
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      From: Alex Deucher <alexander.deucher@amd.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220428023232.5794-1-alexander.deucher@amd.com
    • Merge tag 'drm-intel-fixes-2022-04-28' of... · 22c73ba4
      Dave Airlie authored
      Merge tag 'drm-intel-fixes-2022-04-28' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
      
      - Fix #5284: Backlight control regression on XMG Core 15 e21
      - Fix black display plane on Acer One AO532h
      - Two smaller display fixes
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/Ymotel5VfZUrJahf@jlahtine-mobl.ger.corp.intel.com
    • Merge tag 'drm-misc-fixes-2022-04-27' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes · 43f2c104
      Dave Airlie authored
      drm-misc-fixes for v5.18-rc5:
      - Single fix to stop applying PHYS_OFFSET twice in sunxi.
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/f692bb62-5620-1868-91b7-dffb8d6f9175@linux.intel.com
  5. 28 Apr, 2022 20 commits