1. 10 Jul, 2024 6 commits
    • SeongJae Park's avatar
      Docs/admin-guide/mm/damon/start: add access pattern snapshot example · 752f18b9
      SeongJae Park authored
      DAMON user-space tool (damo) provides access pattern snapshot feature,
      which is expected to be frequently used for real time access pattern
      analysis.  The snapshot output is also showing what DAMON provides on its
      own, including the 'age' information.
      
      In contrast, the recorded access patterns, which is shown as an example
      usage on the quick start section, shows what users can make from what
      DAMON provided.  It includes information that generated outside of DAMON
      and makes the 'age' concept bit unclear.  Hence snapshot output is easier
      at understanding the raw realtime output of DAMON.  Add the snapshot usage
      example on the quick start section.
      
      Link: https://lkml.kernel.org/r/20240701192706.51415-4-sj@kernel.orgSigned-off-by: default avatarSeongJae Park <sj@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      752f18b9
    • SeongJae Park's avatar
      Docs/mm/damon/design: clarify regions merging operation · 2081610d
      SeongJae Park authored
      DAMON design document is not explaining how min_nr_regions limit is kept,
      and what happens if the number of regions exceeds max_nr_regions.  Add
      more clarification for those.
      
      Link: https://lkml.kernel.org/r/20240701192706.51415-3-sj@kernel.orgSigned-off-by: default avatarSeongJae Park <sj@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      2081610d
    • SeongJae Park's avatar
      Docs/mm/damon/design: fix two typos · efeacc2c
      SeongJae Park authored
      Patch series "Docs/damon: minor fixups and improvements".
      
      Fixup typos, clarify regions merging operation design with recent change,
      add access pattern snapshot example use case, and improve readability of
      the design document and subsystem documents index by
      reorganizing/wordsmithing and adding links to other sections and/or
      documents for easy browsing.
      
      
      This patch (of 9):
      
      Fix two typos.  The first one is just a simple typo: s/accurach/accuracy/
      
      The second one is made by the author being out of their mind.  'Region
      Based Sampling' section of the doc is mistakenly calling the access
      frequency counter of region as 'nr_regions'.  Fix it with the correct
      name, 'nr_accesses'.
      
      Link: https://lkml.kernel.org/r/20240701192706.51415-1-sj@kernel.org
      Link: https://lkml.kernel.org/r/20240701192706.51415-2-sj@kernel.orgSigned-off-by: default avatarSeongJae Park <sj@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      efeacc2c
    • Bang Li's avatar
      mm/shmem: fix input and output inconsistencies · 843a2e24
      Bang Li authored
      Commit 19eaf449 ("mm: thp: support allocation of anonymous multi-size
      THP") added mTHP support for anonymous shmem.  We can configure different
      policies through the multi-size THP sysfs interface for anonymous shmem.
      
      But when we configure the "advise" policy of
      /sys/kernel/mm/transparent_hugepage/hugepages-xxxkB/shmem_enabled, we
      cannot write the "advise", but write the "madvise", which is unreasonable.
      We should keep the output and input values consistent, which is more
      convenient for users.
      
      Link: https://lkml.kernel.org/r/20240628032327.16987-1-libang.li@antgroup.com
      Fixes: 61a57f1b1da9 ("mm: shmem: add multi-size THP sysfs interface for anonymous shmem")
      Signed-off-by: default avatarBang Li <libang.li@antgroup.com>
      Reviewed-by: default avatarBaolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Bang Li <libang.li@antgroup.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      843a2e24
    • Edward Liaw's avatar
      selftests: centralize -D_GNU_SOURCE= to CFLAGS in lib.mk · cc937dad
      Edward Liaw authored
      Centralize the _GNU_SOURCE definition to CFLAGS in lib.mk.  Remove
      redundant defines from Makefiles that import lib.mk.  Convert any usage of
      "#define _GNU_SOURCE 1" to "#define _GNU_SOURCE".
      
      This uses the form "-D_GNU_SOURCE=", which is equivalent to
      "#define _GNU_SOURCE".
      
      Otherwise using "-D_GNU_SOURCE" is equivalent to "-D_GNU_SOURCE=1" and
      "#define _GNU_SOURCE 1", which is less commonly seen in source code and
      would require many changes in selftests to avoid redefinition warnings.
      
      Link: https://lkml.kernel.org/r/20240625223454.1586259-2-edliaw@google.comSigned-off-by: default avatarEdward Liaw <edliaw@google.com>
      Suggested-by: default avatarJohn Hubbard <jhubbard@nvidia.com>
      Acked-by: default avatarShuah Khan <skhan@linuxfoundation.org>
      Reviewed-by: default avatarMuhammad Usama Anjum <usama.anjum@collabora.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: André Almeida <andrealmeid@igalia.com>
      Cc: Darren Hart <dvhart@infradead.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Jarkko Sakkinen <jarkko@kernel.org>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kees Cook <kees@kernel.org>
      Cc: Kevin Tian <kevin.tian@intel.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Reinette Chatre <reinette.chatre@intel.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      cc937dad
    • Barry Song's avatar
      tools/mm: introduce a tool to assess swap entry allocation for thp_swapout · 95139d94
      Barry Song authored
      Both Ryan and Chris have been utilizing the small test program to aid in
      debugging and identifying issues with swap entry allocation.  While a real
      or intricate workload might be more suitable for assessing the correctness
      and effectiveness of the swap allocation policy, a small test program
      presents a simpler means of understanding the problem and initially
      verifying the improvements being made.
      
      Let's endeavor to integrate it into tools/mm.  Although it presently only
      accommodates 64KB and 4KB, I'm optimistic that we can expand its
      capabilities to support multiple sizes and simulate more complex systems
      in the future as required.
      
      Basically, we have
      
      1. Use MADV_PAGEPUT for rapid swap-out, putting the swap allocation
         code under high exercise in a short time.
      
      2. Use MADV_DONTNEED to simulate the behavior of libc and Java heap in
         freeing memory, as well as for munmap, app exits, or OOM killer
         scenarios.  This ensures new mTHP is always generated, released or
         swapped out, similar to the behavior on a PC or Android phone where
         many applications are frequently started and terminated.
      
      3. Swap in with or without the "-a" option to observe how fragments
         due to swap-in and the incoming swap-in of large folios will impact
         swap-out fallback.
      
      Due to 2, we ensure a certain proportion of mTHP.  Similarly, because of
      3, we maintain a certain proportion of small folios, as we don't support
      large folios swap-in, meaning any swap-in will immediately result in small
      folios.  Therefore, with both 2 and 3, we automatically achieve a system
      containing both mTHP and small folios.  Additionally, 1 provides the
      ability to continuously swap them out.
      
      We can also use "-s" to add a dedicated small folios memory area.
      
      [akpm@linux-foundation.org: thp_swap_allocator_test.c needs mman.h, per Kairui Song]
      Link: https://lkml.kernel.org/r/20240622071231.576056-2-21cnbao@gmail.comSigned-off-by: default avatarBarry Song <v-songbaohua@oppo.com>
      Acked-by: default avatarChris Li <chrisl@kernel.org>
      Tested-by: default avatarChris Li <chrisl@kernel.org>
      Reviewed-by: default avatarRyan Roberts <ryan.roberts@arm.com>
      Tested-by: default avatarRyan Roberts <ryan.roberts@arm.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kairui Song <kasong@tencent.com>
      Cc: Kalesh Singh <kaleshsingh@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      95139d94
  2. 06 Jul, 2024 11 commits
    • Kefeng Wang's avatar
      mm: migrate: remove folio_migrate_copy() · 3f594937
      Kefeng Wang authored
      The folio_migrate_copy() is just a wrapper of folio_copy() and
      folio_migrate_flags(), it is simple and only aio use it for now, unfold it
      and remove folio_migrate_copy().
      
      Link: https://lkml.kernel.org/r/20240626085328.608006-7-wangkefeng.wang@huawei.comSigned-off-by: default avatarKefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: default avatarJane Chu <jane.chu@oracle.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jiaqi Yan <jiaqiyan@google.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      3f594937
    • Kefeng Wang's avatar
      fs: hugetlbfs: support poisoned recover from hugetlbfs_migrate_folio() · f00b295b
      Kefeng Wang authored
      This is similar to __migrate_folio(), use folio_mc_copy() in HugeTLB folio
      migration to avoid panic when copy from poisoned folio.
      
      Link: https://lkml.kernel.org/r/20240626085328.608006-6-wangkefeng.wang@huawei.comSigned-off-by: default avatarKefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jiaqi Yan <jiaqiyan@google.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      f00b295b
    • Kefeng Wang's avatar
      mm: migrate: support poisoned recover from migrate folio · 06091399
      Kefeng Wang authored
      The folio migration is widely used in kernel, memory compaction, memory
      hotplug, soft offline page, numa balance, memory demote/promotion, etc,
      but once access a poisoned source folio when migrating, the kerenl will
      panic.
      
      There is a mechanism in the kernel to recover from uncorrectable memory
      errors, ARCH_HAS_COPY_MC, which is already used in other core-mm paths,
      eg, CoW, khugepaged, coredump, ksm copy, see copy_mc_to_{user,kernel},
      copy_mc_{user_}highpage callers.
      
      In order to support poisoned folio copy recover from migrate folio, we
      chose to make folio migration tolerant of memory failures and return error
      for folio migration, because folio migration is no guarantee of success,
      this could avoid the similar panic shown below.
      
        CPU: 1 PID: 88343 Comm: test_softofflin Kdump: loaded Not tainted 6.6.0
        pc : copy_page+0x10/0xc0
        lr : copy_highpage+0x38/0x50
        ...
        Call trace:
         copy_page+0x10/0xc0
         folio_copy+0x78/0x90
         migrate_folio_extra+0x54/0xa0
         move_to_new_folio+0xd8/0x1f0
         migrate_folio_move+0xb8/0x300
         migrate_pages_batch+0x528/0x788
         migrate_pages_sync+0x8c/0x258
         migrate_pages+0x440/0x528
         soft_offline_in_use_page+0x2ec/0x3c0
         soft_offline_page+0x238/0x310
         soft_offline_page_store+0x6c/0xc0
         dev_attr_store+0x20/0x40
         sysfs_kf_write+0x4c/0x68
         kernfs_fop_write_iter+0x130/0x1c8
         new_sync_write+0xa4/0x138
         vfs_write+0x238/0x2d8
         ksys_write+0x74/0x110
      
      Note, folio copy is moved in the begin of the __migrate_folio(), which
      could simplify the error handling since there is no turning back if
      folio_migrate_mapping() return success, the downside is the folio copied
      even though folio_migrate_mapping() return fail, an optimization is to
      check whether source folio does not have extra refs before we do folio
      copy.
      
      Link: https://lkml.kernel.org/r/20240626085328.608006-5-wangkefeng.wang@huawei.comSigned-off-by: default avatarKefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jiaqi Yan <jiaqiyan@google.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      06091399
    • Kefeng Wang's avatar
      mm: migrate: split folio_migrate_mapping() · 52881539
      Kefeng Wang authored
      The folio refcount check is moved out for both !mapping and mapping folio,
      also update comment from page to folio for folio_migrate_mapping().
      
      No functional change intended.
      
      Link: https://lkml.kernel.org/r/20240626085328.608006-4-wangkefeng.wang@huawei.comSigned-off-by: default avatarKefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jiaqi Yan <jiaqiyan@google.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      52881539
    • Kefeng Wang's avatar
      mm: add folio_mc_copy() · 02f4ee5a
      Kefeng Wang authored
      Add a #MC variant of folio_copy() which uses copy_mc_highpage() to support
      #MC handled during folio copy, it will be used in folio migration soon.
      
      Link: https://lkml.kernel.org/r/20240626085328.608006-3-wangkefeng.wang@huawei.comSigned-off-by: default avatarKefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: default avatarJane Chu <jane.chu@oracle.com>
      Reviewed-by: default avatarMiaohe Lin <linmiaohe@huawei.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jiaqi Yan <jiaqiyan@google.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      02f4ee5a
    • Kefeng Wang's avatar
      mm: move memory_failure_queue() into copy_mc_[user]_highpage() · 28bdacbc
      Kefeng Wang authored
      Patch series "mm: migrate: support poison recover from migrate folio", v5.
      
      The folio migration is widely used in kernel, memory compaction, memory
      hotplug, soft offline page, numa balance, memory demote/promotion, etc,
      but once access a poisoned source folio when migrating, the kernel will
      panic.
      
      There is a mechanism in the kernel to recover from uncorrectable memory
      errors, ARCH_HAS_COPY_MC(eg, Machine Check Safe Memory Copy on x86), which
      is already used in NVDIMM or core-mm paths(eg, CoW, khugepaged, coredump,
      ksm copy), see copy_mc_to_{user,kernel}, copy_mc_{user_}highpage callers.
      
      This series of patches provide the recovery mechanism from folio copy for
      the widely used folio migration.  Please note, because folio migration is
      no guarantee of success, so we could chose to make folio migration
      tolerant of memory failures, adding folio_mc_copy() which is a #MC
      versions of folio_copy(), once accessing a poisoned source folio, we could
      return error and make the folio migration fail, and this could avoid the
      similar panic shown below.
      
        CPU: 1 PID: 88343 Comm: test_softofflin Kdump: loaded Not tainted 6.6.0
        pc : copy_page+0x10/0xc0
        lr : copy_highpage+0x38/0x50
        ...
        Call trace:
         copy_page+0x10/0xc0
         folio_copy+0x78/0x90
         migrate_folio_extra+0x54/0xa0
         move_to_new_folio+0xd8/0x1f0
         migrate_folio_move+0xb8/0x300
         migrate_pages_batch+0x528/0x788
         migrate_pages_sync+0x8c/0x258
         migrate_pages+0x440/0x528
         soft_offline_in_use_page+0x2ec/0x3c0
         soft_offline_page+0x238/0x310
         soft_offline_page_store+0x6c/0xc0
         dev_attr_store+0x20/0x40
         sysfs_kf_write+0x4c/0x68
         kernfs_fop_write_iter+0x130/0x1c8
         new_sync_write+0xa4/0x138
         vfs_write+0x238/0x2d8
         ksys_write+0x74/0x110
      
      
      This patch (of 5):
      
      There is a memory_failure_queue() call after copy_mc_[user]_highpage(),
      see callers, eg, CoW/KSM page copy, it is used to mark the source page as
      h/w poisoned and unmap it from other tasks, and the upcomming poison
      recover from migrate folio will do the similar thing, so let's move the
      memory_failure_queue() into the copy_mc_[user]_highpage() instead of
      adding it into each user, this should also enhance the handling of
      poisoned page in khugepaged.
      
      Link: https://lkml.kernel.org/r/20240626085328.608006-1-wangkefeng.wang@huawei.com
      Link: https://lkml.kernel.org/r/20240626085328.608006-2-wangkefeng.wang@huawei.comSigned-off-by: default avatarKefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: default avatarJane Chu <jane.chu@oracle.com>
      Reviewed-by: default avatarMiaohe Lin <linmiaohe@huawei.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jiaqi Yan <jiaqiyan@google.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      28bdacbc
    • Andrew Morton's avatar
      Merge branch 'mm-hotfixes-stable' into mm-stable to pick up "mm: fix · 8ef6fd0e
      Andrew Morton authored
      crashes from deferred split racing folio migration", needed by "mm:
      migrate: split folio_migrate_mapping()".
      8ef6fd0e
    • Lorenzo Stoakes's avatar
    • Hugh Dickins's avatar
      mm: fix crashes from deferred split racing folio migration · be9581ea
      Hugh Dickins authored
      Even on 6.10-rc6, I've been seeing elusive "Bad page state"s (often on
      flags when freeing, yet the flags shown are not bad: PG_locked had been
      set and cleared??), and VM_BUG_ON_PAGE(page_ref_count(page) == 0)s from
      deferred_split_scan()'s folio_put(), and a variety of other BUG and WARN
      symptoms implying double free by deferred split and large folio migration.
      
      6.7 commit 9bcef597 ("mm: memcg: fix split queue list crash when large
      folio migration") was right to fix the memcg-dependent locking broken in
      85ce2c51 ("memcontrol: only transfer the memcg data for migration"),
      but missed a subtlety of deferred_split_scan(): it moves folios to its own
      local list to work on them without split_queue_lock, during which time
      folio->_deferred_list is not empty, but even the "right" lock does nothing
      to secure the folio and the list it is on.
      
      Fortunately, deferred_split_scan() is careful to use folio_try_get(): so
      folio_migrate_mapping() can avoid the race by folio_undo_large_rmappable()
      while the old folio's reference count is temporarily frozen to 0 - adding
      such a freeze in the !mapping case too (originally, folio lock and
      unmapping and no swap cache left an anon folio unreachable, so no freezing
      was needed there: but the deferred split queue offers a way to reach it).
      
      Link: https://lkml.kernel.org/r/29c83d1a-11ca-b6c9-f92e-6ccb322af510@google.com
      Fixes: 9bcef597 ("mm: memcg: fix split queue list crash when large folio migration")
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: default avatarBaolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Barry Song <baohua@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Nhat Pham <nphamcs@gmail.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      be9581ea
    • Paul Menzel's avatar
      lib/build_OID_registry: avoid non-destructive substitution for Perl < 5.13.2 compat · 2fe29fe9
      Paul Menzel authored
      On a system with Perl 5.12.1, commit 5ef6dc08
      ("lib/build_OID_registry: don't mention the full path of the script in
      output") causes the build to fail with the error below.
      
           Bareword found where operator expected at ./lib/build_OID_registry line 41, near "s#^\Q$abs_srctree/\E##r"
           syntax error at ./lib/build_OID_registry line 41, near "s#^\Q$abs_srctree/\E##r"
           Execution of ./lib/build_OID_registry aborted due to compilation errors.
           make[3]: *** [lib/Makefile:352: lib/oid_registry_data.c] Error 255
      
      Ahmad Fatoum analyzed that non-destructive substitution is only supported since
      Perl 5.13.2. Instead of dropping `r` and having the side effect of modifying
      `$0`, introduce a dedicated variable to support older Perl versions.
      
      Link: https://lkml.kernel.org/r/20240702223512.8329-2-pmenzel@molgen.mpg.de
      Link: https://lkml.kernel.org/r/20240701155802.75152-1-pmenzel@molgen.mpg.de
      Fixes: 5ef6dc08 ("lib/build_OID_registry: don't mention the full path of the script in output")
      Link: https://lore.kernel.org/all/259f7a87-2692-480e-9073-1c1c35b52f67@molgen.mpg.de/Signed-off-by: default avatarPaul Menzel <pmenzel@molgen.mpg.de>
      Suggested-by: default avatarAhmad Fatoum <a.fatoum@pengutronix.de>
      Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Cc: Nicolas Schier <nicolas@fjasle.eu>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Ahmad Fatoum <a.fatoum@pengutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      2fe29fe9
    • Yang Shi's avatar
      mm: gup: stop abusing try_grab_folio · f442fa61
      Yang Shi authored
      A kernel warning was reported when pinning folio in CMA memory when
      launching SEV virtual machine.  The splat looks like:
      
      [  464.325306] WARNING: CPU: 13 PID: 6734 at mm/gup.c:1313 __get_user_pages+0x423/0x520
      [  464.325464] CPU: 13 PID: 6734 Comm: qemu-kvm Kdump: loaded Not tainted 6.6.33+ #6
      [  464.325477] RIP: 0010:__get_user_pages+0x423/0x520
      [  464.325515] Call Trace:
      [  464.325520]  <TASK>
      [  464.325523]  ? __get_user_pages+0x423/0x520
      [  464.325528]  ? __warn+0x81/0x130
      [  464.325536]  ? __get_user_pages+0x423/0x520
      [  464.325541]  ? report_bug+0x171/0x1a0
      [  464.325549]  ? handle_bug+0x3c/0x70
      [  464.325554]  ? exc_invalid_op+0x17/0x70
      [  464.325558]  ? asm_exc_invalid_op+0x1a/0x20
      [  464.325567]  ? __get_user_pages+0x423/0x520
      [  464.325575]  __gup_longterm_locked+0x212/0x7a0
      [  464.325583]  internal_get_user_pages_fast+0xfb/0x190
      [  464.325590]  pin_user_pages_fast+0x47/0x60
      [  464.325598]  sev_pin_memory+0xca/0x170 [kvm_amd]
      [  464.325616]  sev_mem_enc_register_region+0x81/0x130 [kvm_amd]
      
      Per the analysis done by yangge, when starting the SEV virtual machine, it
      will call pin_user_pages_fast(..., FOLL_LONGTERM, ...) to pin the memory. 
      But the page is in CMA area, so fast GUP will fail then fallback to the
      slow path due to the longterm pinnalbe check in try_grab_folio().
      
      The slow path will try to pin the pages then migrate them out of CMA area.
      But the slow path also uses try_grab_folio() to pin the page, it will
      also fail due to the same check then the above warning is triggered.
      
      In addition, the try_grab_folio() is supposed to be used in fast path and
      it elevates folio refcount by using add ref unless zero.  We are guaranteed
      to have at least one stable reference in slow path, so the simple atomic add
      could be used.  The performance difference should be trivial, but the
      misuse may be confusing and misleading.
      
      Redefined try_grab_folio() to try_grab_folio_fast(), and try_grab_page()
      to try_grab_folio(), and use them in the proper paths.  This solves both
      the abuse and the kernel warning.
      
      The proper naming makes their usecase more clear and should prevent from
      abusing in the future.
      
      peterx said:
      
      : The user will see the pin fails, for gpu-slow it further triggers the WARN
      : right below that failure (as in the original report):
      : 
      :         folio = try_grab_folio(page, page_increm - 1,
      :                                 foll_flags);
      :         if (WARN_ON_ONCE(!folio)) { <------------------------ here
      :                 /*
      :                         * Release the 1st page ref if the
      :                         * folio is problematic, fail hard.
      :                         */
      :                 gup_put_folio(page_folio(page), 1,
      :                                 foll_flags);
      :                 ret = -EFAULT;
      :                 goto out;
      :         }
      
      [1] https://lore.kernel.org/linux-mm/1719478388-31917-1-git-send-email-yangge1116@126.com/
      
      [shy828301@gmail.com: fix implicit declaration of function try_grab_folio_fast]
        Link: https://lkml.kernel.org/r/CAHbLzkowMSso-4Nufc9hcMehQsK9PNz3OSu-+eniU-2Mm-xjhA@mail.gmail.com
      Link: https://lkml.kernel.org/r/20240628191458.2605553-1-yang@os.amperecomputing.com
      Fixes: 57edfcfd ("mm/gup: accelerate thp gup even for "pages != NULL"")
      Signed-off-by: default avatarYang Shi <yang@os.amperecomputing.com>
      Reported-by: default avataryangge <yangge1116@126.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: <stable@vger.kernel.org>	[6.6+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      f442fa61
  3. 05 Jul, 2024 23 commits