- 09 Sep, 2024 22 commits
-
-
Hugh Dickins authored
Although shmem_get_folio_gfp() is correctly putting inodes on the shrinklist according to the folio size, shmem_unused_huge_shrink() was still dealing with that shrinklist in terms of HPAGE_PMD_SIZE. Generalize that; and to handle the mixture of sizes more sensibly, have shmem_alloc_and_add_folio() give it a number of pages to be freed (approximate: no need to minimize that with an exact calculation) instead of a number of inodes to split. [akpm@linux-foundation.org: comment tweak, per David] Link: https://lkml.kernel.org/r/d8c40850-6774-7a93-1e2c-8d941683b260@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Hugh Dickins authored
There has been a long-standing and very minor off-by-one, where shmem_get_folio_gfp() decides if a large folio extends beyond i_size far enough to leave a page or more for freeing later under pressure. This is not something needed for stable: but it will be proportionately more significant as support for smaller large folios is added, and is best fixed before duplicating the check in other places. Link: https://lkml.kernel.org/r/d8e75079-af2d-8519-56df-6be1dccc247a@google.com Fixes: 779750d2 ("shmem: split huge pages beyond i_size under memory pressure") Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
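The kind of boundary check described above can be illustrated with a small userspace sketch. Everything here is an assumption for illustration: the 4 KiB page size, the function names, and the "off-by-one" variant, which only shows how such a slip can arise and is not claimed to be the kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumption: 4 KiB base pages */

/* Number of pages needed to hold i_size bytes (rounded up). */
uint64_t sketch_pages_in_file(uint64_t i_size)
{
	return (i_size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/*
 * A folio covers page indices [index, index + nr_pages).
 * Return how many of its pages lie wholly beyond the end of the file.
 */
uint64_t sketch_pages_beyond_eof(uint64_t i_size, uint64_t index,
				 uint64_t nr_pages)
{
	uint64_t next_index = index + nr_pages;
	uint64_t used = sketch_pages_in_file(i_size);

	assert(nr_pages > 0);
	if (used <= index)
		return nr_pages;	/* the whole folio is beyond EOF */
	return used < next_index ? next_index - used : 0;
}

/* Intended check: is there a page or more that could be freed later? */
bool sketch_worth_shrinking(uint64_t i_size, uint64_t index,
			    uint64_t nr_pages)
{
	return sketch_pages_in_file(i_size) < index + nr_pages;
}

/* Hypothetical off-by-one: only fires when two or more pages are spare. */
bool sketch_worth_shrinking_off_by_one(uint64_t i_size, uint64_t index,
				       uint64_t nr_pages)
{
	return sketch_pages_in_file(i_size) < index + nr_pages - 1;
}
```

With a 512-page folio at index 0 and a file of exactly 511 pages, one page is spare: the intended check reports it, while the off-by-one variant does not. The smaller the folio, the larger the share of it that a single missed page represents, which matches the commit's remark about smaller large folios.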
-
Wei Yang authored
Just do what mt_dump_range64() does. Dump the error message based on format. Link: https://lkml.kernel.org/r/20240826012422.29935-2-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Wei Yang authored
mt_dump_arange64() only applies to an entry whose type is maple_arange_64, in which mte_is_leaf() must return false. Since mte_is_leaf() here is always false, we can remove this condition check. Link: https://lkml.kernel.org/r/20240826012422.29935-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
SeongJae Park authored
We added a public Google calendar for easy sharing of DAMON bi-weekly meetups[1]. Add it to the official document for better visibility. [1] https://lore.kernel.org/all/20240717235812.53087-1-sj@kernel.org/ Link: https://lkml.kernel.org/r/20240826015741.80707-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Alex Shi <alexs@kernel.org> Cc: Hu Haowen <2023002089@link.tyut.edu.cn> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Yanteng Si <siyanteng@loongson.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
SeongJae Park authored
maintainer-profile.rst for DAMON separates the links and target definitions. It is not really necessary, and only hurts readability. At the least, the definitions would need a section title (say, "References"). Just add the links in place in the doc. Link: https://lkml.kernel.org/r/20240826015741.80707-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Alex Shi <alexs@kernel.org> Cc: Hu Haowen <2023002089@link.tyut.edu.cn> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Yanteng Si <siyanteng@loongson.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
SeongJae Park authored
Patch series "Docs/damon: update GitHub repo URLs and maintainer-profile". Replace GitHub URLs on DAMON documents for non-kernel parts DAMON repos with new ones[1] via the first patch. With the following two patches, wordsmith maintainer-profile for better readability, and document the Google calendar for bi-weekly meetups, respectively. [1] https://lore.kernel.org/20240813232158.83903-1-sj@kernel.org This patch (of 3): GitHub repos for non-kernel parts of the DAMON project, including 'damo', 'damon-tests' and 'damoos', will be moved[1] from the 'awslabs' org to 'damonitor' by 2024-09-05. Update related URLs in the kernel tree. [1] https://lore.kernel.org/20240813232158.83903-1-sj@kernel.org Link: https://lkml.kernel.org/r/20240826015741.80707-1-sj@kernel.org Link: https://lkml.kernel.org/r/20240826015741.80707-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Alex Shi <alexs@kernel.org> Cc: Hu Haowen <2023002089@link.tyut.edu.cn> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Yanteng Si <siyanteng@loongson.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
SeongJae Park authored
This reverts commit 0742cadf5e4c ("mm/damon/lru_sort: adjust local variable to dynamic allocation"). The commit was introduced to avoid unnecessary usage of stack memory for the per-scheme region priorities histogram buffer. The fix is nice, but its point is not very clear unless the commit message is read together with the code. That's mainly because the buffer is a private field, which means it is hidden from the DAMON API users. That's not the fault of the fix but of the underlying data structure. Now the per-scheme histogram buffer is gone, so the problem that the commit was fixing is also removed. The use of kmemdup() no longer serves a purpose and only makes the code a bit harder to understand. Revert the fix. Link: https://lkml.kernel.org/r/20240826042323.87025-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
SeongJae Park authored
Nobody is reading from or writing to the per-scheme region priorities histogram buffer. It is only wasting memory. Remove it. Link: https://lkml.kernel.org/r/20240826042323.87025-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
SeongJae Park authored
Replace the usage of per-quota region priorities histogram buffer with the per-context one. After this change, the per-quota histogram is not used by anyone, and hence it is ready to be removed. Link: https://lkml.kernel.org/r/20240826042323.87025-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
SeongJae Park authored
Patch series "replace per-quota region priorities histogram buffer with per-context one". Each DAMOS quota (struct damos_quota) maintains a histogram of total region size per prioritization score. DAMOS calculates the minimum prioritization score of regions that are ok to apply the DAMOS action to while respecting the quota. The histogram is constructed only for the calculation of the minimum score in damos_adjust_quota() for each quota, which is called by kdamond_fn(). Hence, there is no real reason to have a per-quota histogram. Only a per-kdamond histogram is needed, since parallel kdamonds could have races otherwise. The current implementation is only wasting memory, and can easily cause unintended stack usage[1]. So, introducing a per-kdamond histogram and replacing the per-quota one with it would be the right solution for the issue. However, supporting multiple DAMON contexts per kdamond is still ongoing work[2] without a clear estimated time of arrival. Meanwhile, a per-context histogram could be an effective and straightforward solution having no blocker. Let's fix the problem first in that way. This patch (of 4): Introduce a per-context buffer for the region priority scores-total size histogram. As with the per-quota one (->histogram of struct damos_quota), the new buffer is hidden from DAMON API users by being defined as a private field of the DAMON context structure. It is dynamically allocated and de-allocated at the beginning and ending of the execution of the kdamond by kdamond_fn() itself. [1] commit 0742cadf5e4c ("mm/damon/lru_sort: adjust local variable to dynamic allocation") [2] https://lore.kernel.org/20240531122320.909060-1-yorha.op@gmail.com Link: https://lkml.kernel.org/r/20240826042323.87025-1-sj@kernel.org Link: https://lkml.kernel.org/r/20240826042323.87025-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
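The minimum-score calculation described above can be sketched as a walk over the histogram from the highest score downwards. This is a userspace illustration only: the score range (0 to 99), the function name and the parameter names are assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_SCORE 99 /* assumption: scores fall in [0, 99] */

/*
 * hist[s] holds the total size of regions whose prioritization score is s.
 * Accumulate sizes from the highest score down until the quota is covered;
 * the score reached at that point is the minimum score still worth acting on.
 */
unsigned int sketch_min_score(const unsigned long hist[SKETCH_MAX_SCORE + 1],
			      unsigned long quota_sz)
{
	unsigned long cumulated_sz = 0;
	unsigned int score;

	assert(hist != NULL);
	for (score = SKETCH_MAX_SCORE; ; score--) {
		cumulated_sz += hist[score];
		if (cumulated_sz >= quota_sz || score == 0)
			break;
	}
	return score;
}
```

Because the buffer is only scratch space for this one walk, a single buffer per execution context is enough as long as no two walks run concurrently on it, which is the argument the series makes for dropping the per-quota copies.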
-
Kefeng Wang authored
There are no more callers of putback_lru_page(), remove it. Link: https://lkml.kernel.org/r/20240826065814.1336616-7-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kefeng Wang authored
There are no more callers of isolate_lru_page(), remove it. [wangkefeng.wang@huawei.com: convert page to folio in comment and document, per Matthew] Link: https://lkml.kernel.org/r/20240826144114.1928071-1-wangkefeng.wang@huawei.com Link: https://lkml.kernel.org/r/20240826065814.1336616-6-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kefeng Wang authored
Saves a couple of calls to compound_head() and removes the last two callers of putback_lru_page(). Link: https://lkml.kernel.org/r/20240826065814.1336616-5-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kefeng Wang authored
The page for migrate_device_unmap() already has a reference, so it is safe to convert the page to folio to save a few calls to compound_head(), which removes the last isolate_lru_page() call. Link: https://lkml.kernel.org/r/20240826065814.1336616-4-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kefeng Wang authored
Save two calls to compound_head() and use folio throughout. Link: https://lkml.kernel.org/r/20240826065814.1336616-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kefeng Wang authored
Patch series "mm: finish isolate/putback_lru_page()". Convert to use more folios in migrate_device.c, then we could remove isolate_lru_page() and putback_lru_page(). This patch (of 6): Save a few calls to compound_head() and use folio throughout. Link: https://lkml.kernel.org/r/20240826065814.1336616-1-wangkefeng.wang@huawei.com Link: https://lkml.kernel.org/r/20240826065814.1336616-2-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Retrieve a folio from the page cache rather than a page. Saves a couple of conversions between page & folio. Link: https://lkml.kernel.org/r/20240826202138.3804238-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Barry Song authored
When a THP is added to the deferred_list because it is partially mapped, its unmapped pages are unused, leading to wasted memory and potentially increasing memory reclamation pressure. Detailing the specifics of how unmapping occurs is quite difficult and not that useful, so we adopt a simple approach: each time a THP enters the deferred_list, we increment the count by 1; whenever it leaves for any reason, we decrement the count by 1. Link: https://lkml.kernel.org/r/20240824010441.21308-3-21cnbao@gmail.com Signed-off-by: Barry Song <v-songbaohua@oppo.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Chris Li <chrisl@kernel.org> Cc: Chuanhua Han <hanchuanhua@oppo.com> Cc: Kairui Song <kasong@tencent.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuai Yuan <yuanshuai@oppo.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Barry Song authored
Patch series "mm: count the number of anonymous THPs per size", v4. Knowing the number of transparent anon THPs in the system is crucial for performance analysis. It helps in understanding the ratio and distribution of THPs versus small folios throughout the system. Additionally, partial unmapping by userspace can lead to significant waste of THPs over time and increase memory reclamation pressure. We need this information for comprehensive system tuning. This patch (of 2): Let's track for each anonymous THP size, how many of them are currently allocated. We'll track the complete lifespan of an anon THP, starting when it becomes an anon THP ("large anon folio") (->mapping gets set), until it gets freed (->mapping gets cleared). Introduce a new "nr_anon" counter per THP size and adjust the corresponding counter in the following cases:
* We allocate a new THP and call folio_add_new_anon_rmap() to map it the first time and turn it into an anon THP.
* We split an anon THP into multiple smaller ones.
* We migrate an anon THP, when we prepare the destination.
* We free an anon THP back to the buddy.
Note that AnonPages in /proc/meminfo currently tracks the total number of *mapped* anonymous *pages*, and therefore has slightly different semantics. In the future, we might also want to track "nr_anon_mapped" for each THP size, which might be helpful when comparing it to the number of allocated anon THPs (long-term pinning, stuck in swapcache, memory leaks, ...). Further note that for now, we only track anon THPs after they got their ->mapping set, for example via folio_add_new_anon_rmap(). If we would allocate some in the swapcache, they will only show up in the statistics for now after they have been mapped to user space the first time, where we call folio_add_new_anon_rmap(). [akpm@linux-foundation.org: documentation fixups, per David] Link: https://lkml.kernel.org/r/3e8add35-e26b-443b-8a04-1078f4bc78f6@redhat.com Link: https://lkml.kernel.org/r/20240824010441.21308-1-21cnbao@gmail.com Link: https://lkml.kernel.org/r/20240824010441.21308-2-21cnbao@gmail.com Signed-off-by: Barry Song <v-songbaohua@oppo.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Chris Li <chrisl@kernel.org> Cc: Chuanhua Han <hanchuanhua@oppo.com> Cc: Kairui Song <kasong@tencent.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuai Yuan <yuanshuai@oppo.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
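The four accounting points listed in the message can be sketched as a per-order counter in userspace C. All names, the maximum order, and the choice to skip order-0 results of a split are assumptions made for illustration; the sketch does not reproduce the kernel's helpers.

```c
#include <assert.h>

#define SKETCH_MAX_ORDER 13 /* assumption: largest THP order tracked */

/* One "nr_anon" counter per folio order. */
static long sketch_nr_anon[SKETCH_MAX_ORDER + 1];

static void sketch_mod_nr_anon(unsigned int order, long delta)
{
	assert(order <= SKETCH_MAX_ORDER);
	if (order == 0)
		return;	/* assumption: order-0 folios are not THPs */
	sketch_nr_anon[order] += delta;
}

/* A new THP is mapped for the first time and becomes an anon THP. */
void sketch_anon_thp_first_map(unsigned int order)
{
	sketch_mod_nr_anon(order, 1);
}

/* One anon THP is split into 2^(order - new_order) smaller folios. */
void sketch_anon_thp_split(unsigned int order, unsigned int new_order)
{
	assert(new_order < order);
	sketch_mod_nr_anon(order, -1);
	sketch_mod_nr_anon(new_order, 1L << (order - new_order));
}

/* Migration: the destination folio is prepared as an anon THP. */
void sketch_anon_thp_migrate_prepare_dst(unsigned int order)
{
	sketch_mod_nr_anon(order, 1);
}

/* An anon THP goes back to the allocator. */
void sketch_anon_thp_freed(unsigned int order)
{
	sketch_mod_nr_anon(order, -1);
}

long sketch_read_nr_anon(unsigned int order)
{
	assert(order <= SKETCH_MAX_ORDER);
	return sketch_nr_anon[order];
}
```

In this model a migration briefly counts both source and destination, and the count settles once the source is freed.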
-
Ryan Roberts authored
Previously we had a situation where shmem mTHP controls and stats were not exposed for some supported sizes and were exposed for some unsupported sizes. So let's clean that up. Anon mTHP can support all large orders [2, PMD_ORDER]. But shmem can support all large orders [1, MAX_PAGECACHE_ORDER]. However, per-size shmem controls and stats were previously being exposed for all the anon mTHP orders, meaning order-1 was not present, and for arm64 64K base pages, orders 12 and 13 were exposed but were not supported internally. Tidy this all up by defining ctrl and stats attribute groups for anon and file separately. Anon ctrl and stats groups are populated for all orders in THP_ORDERS_ALL_ANON and file ctrl and stats groups are populated for all orders in THP_ORDERS_ALL_FILE_DEFAULT. Additionally, create "any" ctrl and stats attribute groups which are populated for all orders in (THP_ORDERS_ALL_ANON | THP_ORDERS_ALL_FILE_DEFAULT). swpout stats use this since they apply to anon and shmem. The side-effect of all this is that different hugepage-*kB directories contain different sets of controls and stats, depending on which memory types support that size. This approach is preferred over the alternative, which is to populate dummy controls and stats for memory types that do not support a given size. [ryan.roberts@arm.com: file pages and shmem can also be split] Link: https://lkml.kernel.org/r/f7ced14c-8bc5-405f-bee7-94f63980f525@arm.com Link: https://lkml.kernel.org/r/20240808111849.651867-3-ryan.roberts@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Tested-by: Barry Song <baohua@kernel.org> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Gavin Shan <gshan@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
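The order ranges above can be modelled as bitmasks, one bit per order. The following userspace sketch uses the arm64 64K example from the message; the concrete values (PMD order 13, maximum page cache order 11) are inferred from the statement that orders 12 and 13 were exposed but unsupported, and all identifiers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for the arm64 64K base page example in the message. */
#define SKETCH_PMD_ORDER		13
#define SKETCH_MAX_PAGECACHE_ORDER	11

/* Bitmask with bits lo..hi (inclusive) set. */
unsigned long sketch_order_range(unsigned int lo, unsigned int hi)
{
	assert(lo <= hi && hi < 8 * sizeof(unsigned long) - 1);
	return ((1UL << (hi + 1)) - 1) & ~((1UL << lo) - 1);
}

/* Anon mTHP: orders [2, PMD_ORDER]. */
unsigned long sketch_orders_anon(void)
{
	return sketch_order_range(2, SKETCH_PMD_ORDER);
}

/* shmem/file: orders [1, MAX_PAGECACHE_ORDER]. */
unsigned long sketch_orders_file(void)
{
	return sketch_order_range(1, SKETCH_MAX_PAGECACHE_ORDER);
}

/* "any": union of both, used for stats that apply to either type. */
unsigned long sketch_orders_any(void)
{
	return sketch_orders_anon() | sketch_orders_file();
}

/* Which attribute groups would a given hugepage-*kB directory get? */
bool sketch_has_anon_attrs(unsigned int order)
{
	return (sketch_orders_anon() & (1UL << order)) != 0;
}

bool sketch_has_file_attrs(unsigned int order)
{
	return (sketch_orders_file() & (1UL << order)) != 0;
}

bool sketch_has_any_attrs(unsigned int order)
{
	return (sketch_orders_any() & (1UL << order)) != 0;
}
```

Under these assumptions the order-1 directory carries only file and "any" attributes, and the order-12 and order-13 directories carry only anon and "any" attributes, which is the per-directory variation the message describes.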
-
Ryan Roberts authored
Patch series "Shmem mTHP controls and stats improvements", v3. This is a small series to tidy up the way the shmem controls and stats are exposed. These patches were previously part of the series at [2], but I decided to split them out since they can go in independently. This patch (of 2): Let's move count_mthp_stat() so that it's always defined, even when THP is disabled. Previously uses of the function in files such as shmem.c, which are compiled even when THP is disabled, required ugly THP ifdeferry. With this cleanup, we can remove those ifdefs and the function resolves to a nop when THP is disabled. I shortly plan to call count_mthp_stat() from more THP-invariant source files. Link: https://lkml.kernel.org/r/20240808111849.651867-1-ryan.roberts@arm.com Link: https://lkml.kernel.org/r/20240808111849.651867-2-ryan.roberts@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: Barry Song <baohua@kernel.org> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Lance Yang <ioworker0@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Gavin Shan <gshan@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
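The "always defined, no-op when disabled" pattern described above is a common C idiom and can be sketched as follows. The config macro, enum, array and function names are all hypothetical stand-ins; only the shape of the pattern is taken from the message.

```c
#include <assert.h>

enum sketch_mthp_stat_item {
	SKETCH_STAT_ALLOC,
	SKETCH_STAT_SWPOUT,
	SKETCH_STAT_COUNT,
};

#define SKETCH_MAX_ORDER 10 /* assumption */

#ifdef SKETCH_CONFIG_THP /* hypothetical stand-in for the THP config option */
static unsigned long sketch_stats[SKETCH_MAX_ORDER + 1][SKETCH_STAT_COUNT];

static inline void sketch_count_mthp_stat(int order, int item)
{
	if (order <= 0 || order > SKETCH_MAX_ORDER)
		return;
	sketch_stats[order][item]++;
}

static inline unsigned long sketch_read_mthp_stat(int order, int item)
{
	return sketch_stats[order][item];
}
#else
/* Feature disabled: same signatures, empty bodies the compiler can drop. */
static inline void sketch_count_mthp_stat(int order, int item)
{
	(void)order;
	(void)item;
}

static inline unsigned long sketch_read_mthp_stat(int order, int item)
{
	(void)order;
	(void)item;
	return 0;
}
#endif

/* A caller that is built in both configurations needs no #ifdef of its own. */
int sketch_alloc_large_folio(int order)
{
	assert(order >= 0 && order <= SKETCH_MAX_ORDER);
	sketch_count_mthp_stat(order, SKETCH_STAT_ALLOC);
	return order;
}
```

The design choice is to pay for the conditional once, at the definition, so that every call site stays identical regardless of configuration.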
-
- 04 Sep, 2024 18 commits
-
-
Kefeng Wang authored
Use isolate_folio_to_list() to unify hugetlb/LRU/non-LRU folio isolation, which cleans up the code a bit and saves a few calls to compound_head(). [wangkefeng.wang@huawei.com: various fixes] Link: https://lkml.kernel.org/r/20240829150500.2599549-1-wangkefeng.wang@huawei.com Link: https://lkml.kernel.org/r/20240827114728.3212578-6-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kefeng Wang authored
Add the isolate_folio_to_list() helper to try to isolate HugeTLB, non-LRU movable and LRU folios to a list, which will be reused by do_migrate_range() from memory hotplug soon. Also drop mf_isolate_folio(), since we can directly use the new helper in soft_offline_in_use_page(). Link: https://lkml.kernel.org/r/20240827114728.3212578-5-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Miaohe Lin <linmiaohe@huawei.com> Tested-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kefeng Wang authored
Commit b15c8726 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined") doesn't handle hugetlb pages, so the endless loop could still occur when offlining a hwpoisoned hugetlb page. Luckily, since commit e591ef7d ("mm, hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage"), the HPageMigratable flag of the hugetlb page is cleared and the hwpoisoned hugetlb page is skipped in scan_movable_pages(), so the endless loop issue is fixed. However, if the HPageMigratable() check passes (it is done without a reference or lock), the hugetlb page may still become hwpoisoned afterwards. This won't cause an issue, since the hwpoisoned page will be handled correctly in the next movable pages scan loop, but it will be isolated in do_migrate_range() and then fail to migrate. In order to avoid the unnecessary isolation and unify all hwpoisoned page handling, let's unconditionally check hwpoison first, and if it is a hwpoisoned hugetlb page, try to unmap it as the catch-all safety net, like a normal page does. Link: https://lkml.kernel.org/r/20240827114728.3212578-4-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kefeng Wang authored
Add unmap_poisoned_folio() helper which will be reused by do_migrate_range() from memory hotplug soon. [akpm@linux-foundation.org: whitespace tweak, per Miaohe Lin] Link: https://lkml.kernel.org/r/1f80c7e3-c30d-1ac1-6a36-d1a5f5907f7c@huawei.com Link: https://lkml.kernel.org/r/20240827114728.3212578-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kefeng Wang authored
Patch series "mm: memory_hotplug: improve do_migrate_range()", v3. Unify hwpoisoned page handling and isolation of HugeTLB/LRU/non-LRU movable pages, and also convert do_migrate_range() to use folios. This patch (of 5): Directly use a folio for HugeTLB and THP when calculating the next pfn, then remove the unused head variable. Link: https://lkml.kernel.org/r/20240827114728.3212578-1-wangkefeng.wang@huawei.com Link: https://lkml.kernel.org/r/20240827114728.3212578-2-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
SeongJae Park authored
'--kunitconfig' option of 'kunit.py run' supports '.kunitconfig' file name convention. Add the file for DAMON kunit tests for more convenient kunit run. Link: https://lkml.kernel.org/r/20240827030336.7930-10-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Gow <davidgow@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
SeongJae Park authored
There was a discussion about better places for kunit test code[1] and test file name suffix[2]. Following the conclusion, move kunit tests for DAMON to mm/damon/tests/ subdirectory and rename those. [1] https://lore.kernel.org/CABVgOS=pUdWb6NDHszuwb1HYws4a1-b1UmN=i8U_ED7HbDT0mg@mail.gmail.com [2] https://lore.kernel.org/CABVgOSmKwPq7JEpHfS6sbOwsR0B-DBDk_JP-ZD9s9ZizvpUjbQ@mail.gmail.com Link: https://lkml.kernel.org/r/20240827030336.7930-9-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Gow <davidgow@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
SeongJae Park authored
The test depends on registration of DAMON_OPS_PADDR. It would be registered only when CONFIG_DAMON_PADDR is set. DAMON core kunit tests do fake ops registration for such a case. However, the functions for such fake ops registration are not available to the DAMON debugfs interface. Just skip the test in that case. Link: https://lkml.kernel.org/r/20240827030336.7930-8-sj@kernel.org Fixes: 999b9467 ("mm/damon/dbgfs-test: fix is_target_id() change") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Gow <davidgow@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
SeongJae Park authored
The test depends on registration of DAMON_OPS_PADDR. It would be registered only when CONFIG_DAMON_PADDR is set. DAMON core kunit tests do fake ops registration for such a case. However, the functions for such fake ops registration are not available to the DAMON debugfs interface. Just skip the test in that case. Link: https://lkml.kernel.org/r/20240827030336.7930-7-sj@kernel.org Fixes: 999b9467 ("mm/damon/dbgfs-test: fix is_target_id() change") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Gow <davidgow@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
SeongJae Park authored
DAMON core kunit test can be executed without CONFIG_DAMON_VADDR. In that case, vaddr DAMON ops is not registered. Meanwhile, ops registration kunit test assumes the vaddr ops is registered. Check and handle the case by registering fake vaddr ops inside the test code. Link: https://lkml.kernel.org/r/20240827030336.7930-6-sj@kernel.org Fixes: 4f540f5a ("mm/damon/core-test: add a kunit test case for ops registration") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Gow <davidgow@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
SeongJae Park authored
DAMON ops registration kunit test tests both vaddr and paddr use cases in parts of the whole test cases. Basically testing only one ops use case is enough. Do the test with only vaddr use case. Link: https://lkml.kernel.org/r/20240827030336.7930-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Gow <davidgow@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
SeongJae Park authored
Some test scripts are missing executable permissions. It causes warnings that make the test output unnecessarily verbose. Add executable permissions. Link: https://lkml.kernel.org/r/20240827030336.7930-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Gow <davidgow@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
SeongJae Park authored
Python-based tests create a __pycache__/ directory. Remove it with 'make clean' by defining it as EXTRA_CLEAN. Link: https://lkml.kernel.org/r/20240827030336.7930-3-sj@kernel.org Fixes: b5906f5f ("selftests/damon: add a test for update_schemes_tried_regions sysfs command") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Gow <davidgow@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
SeongJae Park authored
Patch series "misc fixups for DAMON {self,kunit} tests". This patchset is for minor fixups of DAMON selftests and kunit tests. The first three patches make DAMON selftests more cleanly maintained (patches 1 and 2) without unnecessary warnings (patch 3). The following six patches remove an unnecessary test case (patch 4), handle config combinations that can make tests fail (patches 5-7), reorganize the test files following the new guideline (patch 8), and add a reference kunitconfig for DAMON kunit tests (patch 9). This patch (of 9): DAMON selftests build access_memory_even, but it's not on the .gitignore list. Add it to make 'git status' output cleaner. Link: https://lkml.kernel.org/r/20240827030336.7930-1-sj@kernel.org Link: https://lkml.kernel.org/r/20240827030336.7930-2-sj@kernel.org Fixes: c94df805 ("selftests/damon: implement a program for even-numbered memory regions access") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: David Gow <davidgow@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Yujie Liu authored
Problem statement: Since commit fc137c0d ("sched/numa: enhance vma scanning logic"), the NUMA vma scan overhead has been reduced a lot. Meanwhile, the reduced vma scanning might generate less NUMA page fault information. The insufficient information makes it harder for the NUMA balancer to make decisions. Later, commit b7a5b537 ("sched/numa: Complete scanning of partial VMAs regardless of PID activity") and commit 84db47ca ("sched/numa: Fix mm numa_scan_seq based unconditional scan") were found to bring back part of the performance. Recently, when running SPECcpu omnetpp_r on a 320 CPUs/2 Sockets system, a long duration of remote NUMA node reads was observed by PMU events: a few cores had ~500MB/s of remote memory access for ~20 seconds. It causes high core-to-core variance and a performance penalty. After investigation, it was found that many vmas are skipped due to the active PID check. According to the trace events, in most cases vma_is_accessed() returns false because the access history stored in the pids_active array has been cleared. Proposal: The main idea is to adjust vma_is_accessed() to let it return true more readily. Thus compare the diff between mm->numa_scan_seq and vma->numab_state->prev_scan_seq. If the diff has exceeded the threshold, scan the vma. This patch especially helps the cases where there is a small number of threads, like the process-based SPECcpu. Without this patch, if the SPECcpu process accesses the vma at the beginning and then sleeps for a long time, the pids_active array will be cleared. As a result, if this process is woken up again, it never has a chance to set prot_none anymore, because the vma scan is only granted for the first two scan sequences: (current->mm->numa_scan_seq - vma->numab_state->start_scan_seq) < 2. Worse, no other thread within the task can help set prot_none. This causes information loss. 
Raghavendra helped test current patch and got the positive result on the AMD platform:

autonumabench NUMA01
                              base                 patched
Amean  syst-NUMA01          194.05 (  0.00%)     165.11 * 14.92%*
Amean  elsp-NUMA01          324.86 (  0.00%)     315.58 *  2.86%*
Duration User            380345.36            368252.04
Duration System            1358.89              1156.23
Duration Elapsed           2277.45              2213.25

autonumabench NUMA02
Amean  syst-NUMA02            1.12 (  0.00%)       1.09 *  2.93%*
Amean  elsp-NUMA02            3.50 (  0.00%)       3.56 * -1.84%*
Duration User              1513.23              1575.48
Duration System               8.33                 8.13
Duration Elapsed             28.59                29.71

kernbench
Amean  user-256           22935.42 (  0.00%)   22535.19 *  1.75%*
Amean  syst-256            7284.16 (  0.00%)    7608.72 * -4.46%*
Amean  elsp-256             159.01 (  0.00%)     158.17 *  0.53%*
Duration User             68816.41             67615.74
Duration System           21873.94             22848.08
Duration Elapsed            506.66               504.55

Intel 256 CPUs/2 Sockets: autonuma benchmark also shows improvements:

                                 v6.10-rc5            v6.10-rc5 +patch
Amean  syst-NUMA01                245.85 (  0.00%)     230.84 *  6.11%*
Amean  syst-NUMA01_THREADLOCAL    205.27 (  0.00%)     191.86 *  6.53%*
Amean  syst-NUMA02                 18.57 (  0.00%)      18.09 *  2.58%*
Amean  syst-NUMA02_SMT              2.63 (  0.00%)       2.54 *  3.47%*
Amean  elsp-NUMA01                517.17 (  0.00%)     526.34 * -1.77%*
Amean  elsp-NUMA01_THREADLOCAL     99.92 (  0.00%)     100.59 * -0.67%*
Amean  elsp-NUMA02                 15.81 (  0.00%)      15.72 *  0.59%*
Amean  elsp-NUMA02_SMT             13.23 (  0.00%)      12.89 *  2.53%*

                                 v6.10-rc5            v6.10-rc5 +patch
Duration User                 1064010.16           1075416.23
Duration System                  3307.64              3104.66
Duration Elapsed                 4537.54              4604.73

The SPECcpu remote node access issue disappears with the patch applied.
Link: https://lkml.kernel.org/r/20240827112958.181388-1-yu.c.chen@intel.com Fixes: fc137c0d ("sched/numa: enhance vma scanning logic") Signed-off-by: Chen Yu <yu.c.chen@intel.com> Co-developed-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Yujie Liu <yujie.liu@intel.com> Reported-by: Xiaoping Zhou <xiaoping.zhou@intel.com> Reviewed-and-tested-by: Raghavendra K T <raghavendra.kt@amd.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: "Chen, Tim C" <tim.c.chen@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@amd.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
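The proposal in this commit can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel implementation: the struct, the function name vma_is_accessed_model() and the SCAN_SKIP_THRESHOLD value are hypothetical, and the per-PID bitmap check is reduced to a single flag. It shows the order of the checks: the unconditional scan for the first two scan sequences, the recorded-access check, and the new rule that a vma skipped for too many scan sequences is scanned anyway.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for vma->numab_state; only the fields used here. */
struct numab_state_model {
	unsigned long start_scan_seq; /* scan sequence when the vma was first seen */
	unsigned long prev_scan_seq;  /* scan sequence when the vma was last scanned */
	bool pid_recorded;            /* models "current PID is set in pids_active" */
};

/* Hypothetical threshold: number of scan sequences a vma may be skipped. */
#define SCAN_SKIP_THRESHOLD 4UL

static bool vma_is_accessed_model(unsigned long mm_scan_seq,
				  const struct numab_state_model *st)
{
	assert(st != NULL);

	/* The first two scan sequences are granted unconditionally. */
	if (mm_scan_seq - st->start_scan_seq < 2)
		return true;

	/* A recorded recent access allows the scan. */
	if (st->pid_recorded)
		return true;

	/* Rule described in the proposal: skipped for too long, scan anyway. */
	if (mm_scan_seq - st->prev_scan_seq > SCAN_SKIP_THRESHOLD)
		return true;

	return false;
}
```

With start_scan_seq = 0, prev_scan_seq = 1 and no recorded access, the model skips the vma at scan sequence 3 but scans it again at scan sequence 6, which is the starvation case the commit message describes.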
-
Yanfei Xu authored
Commit 823430c8 ("memory tier: consolidate the initialization of memory tiers") introduced a locking change that uses guard(mutex) instead of mutex_lock/unlock() for memory_tier_lock. It unexpectedly expanded the locked region to include hotplug_memory_notifier(); as a result, lockdep reports a possible ABBA deadlock. Exclude hotplug_memory_notifier() from the locked region to fix it.

The deadlock scenario is as follows: when a memory online event occurs, the execution of the memory notifier takes the read lock of memory_chain.rwsem; then the registration of the memory notifier in memory_tier_init() acquires the write lock of memory_chain.rwsem while holding memory_tier_lock. The memory online event then continues to invoke the memory hotplug callback registered by memory_tier_init(). Since this callback tries to acquire memory_tier_lock, a deadlock occurs.

In fact, this deadlock can't happen, because memory_tier_init() always executes before memory online events happen: subsys_initcall() has a higher priority than module_init().

[ 133.491106] WARNING: possible circular locking dependency detected [ 133.493656] 6.11.0-rc2+ #146 Tainted: G O N [ 133.504290] ------------------------------------------------------ [ 133.515194] (udev-worker)/1133 is trying to acquire lock: [ 133.525715] ffffffff87044e28 (memory_tier_lock){+.+.}-{3:3}, at: memtier_hotplug_callback+0x383/0x4b0 [ 133.536449] [ 133.536449] but task is already holding lock: [ 133.549847] ffffffff875d3310 ((memory_chain).rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x60/0xb0 [ 133.556781] [ 133.556781] which lock already depends on the new lock. 
[ 133.556781] [ 133.569957] [ 133.569957] the existing dependency chain (in reverse order) is: [ 133.577618] [ 133.577618] -> #1 ((memory_chain).rwsem){++++}-{3:3}: [ 133.584997] down_write+0x97/0x210 [ 133.588647] blocking_notifier_chain_register+0x71/0xd0 [ 133.592537] register_memory_notifier+0x26/0x30 [ 133.596314] memory_tier_init+0x187/0x300 [ 133.599864] do_one_initcall+0x117/0x5d0 [ 133.603399] kernel_init_freeable+0xab0/0xeb0 [ 133.606986] kernel_init+0x28/0x2f0 [ 133.610312] ret_from_fork+0x59/0x90 [ 133.613652] ret_from_fork_asm+0x1a/0x30 [ 133.617012] [ 133.617012] -> #0 (memory_tier_lock){+.+.}-{3:3}: [ 133.623390] __lock_acquire+0x2efd/0x5c60 [ 133.626730] lock_acquire+0x1ce/0x580 [ 133.629757] __mutex_lock+0x15c/0x1490 [ 133.632731] mutex_lock_nested+0x1f/0x30 [ 133.635717] memtier_hotplug_callback+0x383/0x4b0 [ 133.638748] notifier_call_chain+0xbf/0x370 [ 133.641647] blocking_notifier_call_chain+0x76/0xb0 [ 133.644636] memory_notify+0x2e/0x40 [ 133.647427] online_pages+0x597/0x720 [ 133.650246] memory_subsys_online+0x4f6/0x7f0 [ 133.653107] device_online+0x141/0x1d0 [ 133.655831] online_memory_block+0x4d/0x60 [ 133.658616] walk_memory_blocks+0xc0/0x120 [ 133.661419] add_memory_resource+0x51d/0x6c0 [ 133.664202] add_memory_driver_managed+0xf5/0x180 [ 133.667060] dev_dax_kmem_probe+0x7f7/0xb40 [kmem] [ 133.669949] dax_bus_probe+0x147/0x230 [ 133.672687] really_probe+0x27f/0xac0 [ 133.675463] __driver_probe_device+0x1f3/0x460 [ 133.678493] driver_probe_device+0x56/0x1b0 [ 133.681366] __driver_attach+0x277/0x570 [ 133.684149] bus_for_each_dev+0x145/0x1e0 [ 133.686937] driver_attach+0x49/0x60 [ 133.689673] bus_add_driver+0x2f3/0x6b0 [ 133.692421] driver_register+0x170/0x4b0 [ 133.695118] __dax_driver_register+0x141/0x1b0 [ 133.697910] dax_kmem_init+0x54/0xff0 [kmem] [ 133.700794] do_one_initcall+0x117/0x5d0 [ 133.703455] do_init_module+0x277/0x750 [ 133.706054] load_module+0x5d1d/0x74f0 [ 133.708602] init_module_from_file+0x12c/0x1a0 [ 133.711234] 
idempotent_init_module+0x3f1/0x690 [ 133.713937] __x64_sys_finit_module+0x10e/0x1a0 [ 133.716492] x64_sys_call+0x184d/0x20d0 [ 133.719053] do_syscall_64+0x6d/0x140 [ 133.721537] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 133.724239] [ 133.724239] other info that might help us debug this: [ 133.724239] [ 133.730832] Possible unsafe locking scenario: [ 133.730832] [ 133.735298] CPU0 CPU1 [ 133.737759] ---- ---- [ 133.740165] rlock((memory_chain).rwsem); [ 133.742623] lock(memory_tier_lock); [ 133.745357] lock((memory_chain).rwsem); [ 133.748141] lock(memory_tier_lock); [ 133.750489] [ 133.750489] *** DEADLOCK *** [ 133.750489] [ 133.756742] 6 locks held by (udev-worker)/1133: [ 133.759179] #0: ffff888207be6158 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x26c/0x570 [ 133.762299] #1: ffffffff875b5868 (device_hotplug_lock){+.+.}-{3:3}, at: lock_device_hotplug+0x20/0x30 [ 133.765565] #2: ffff88820cf6a108 (&dev->mutex){....}-{3:3}, at: device_online+0x2f/0x1d0 [ 133.768978] #3: ffffffff86d08ff0 (cpu_hotplug_lock){++++}-{0:0}, at: mem_hotplug_begin+0x17/0x30 [ 133.772312] #4: ffffffff8702dfb0 (mem_hotplug_lock){++++}-{0:0}, at: mem_hotplug_begin+0x23/0x30 [ 133.775544] #5: ffffffff875d3310 ((memory_chain).rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x60/0xb0 [ 133.779113] [ 133.779113] stack backtrace: [ 133.783728] CPU: 5 UID: 0 PID: 1133 Comm: (udev-worker) Tainted: G O N 6.11.0-rc2+ #146 [ 133.787220] Tainted: [O]=OOT_MODULE, [N]=TEST [ 133.789948] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 [ 133.793291] Call Trace: [ 133.795826] <TASK> [ 133.798284] dump_stack_lvl+0xea/0x150 [ 133.801025] dump_stack+0x19/0x20 [ 133.803609] print_circular_bug+0x477/0x740 [ 133.806341] check_noncircular+0x2f4/0x3e0 [ 133.809056] ? __pfx_check_noncircular+0x10/0x10 [ 133.811866] ? __pfx_lockdep_lock+0x10/0x10 [ 133.814670] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30 [ 133.817610] __lock_acquire+0x2efd/0x5c60 [ 133.820339] ? 
__pfx___lock_acquire+0x10/0x10 [ 133.823128] ? __dax_driver_register+0x141/0x1b0 [ 133.825926] ? do_one_initcall+0x117/0x5d0 [ 133.828648] lock_acquire+0x1ce/0x580 [ 133.831349] ? memtier_hotplug_callback+0x383/0x4b0 [ 133.834293] ? __pfx_lock_acquire+0x10/0x10 [ 133.837134] __mutex_lock+0x15c/0x1490 [ 133.839829] ? memtier_hotplug_callback+0x383/0x4b0 [ 133.842753] ? memtier_hotplug_callback+0x383/0x4b0 [ 133.845602] ? __this_cpu_preempt_check+0x21/0x30 [ 133.848438] ? __pfx___mutex_lock+0x10/0x10 [ 133.851200] ? __pfx_lock_acquire+0x10/0x10 [ 133.853935] ? global_dirty_limits+0xc0/0x160 [ 133.856699] ? __sanitizer_cov_trace_switch+0x58/0xa0 [ 133.859564] mutex_lock_nested+0x1f/0x30 [ 133.862251] ? mutex_lock_nested+0x1f/0x30 [ 133.864964] memtier_hotplug_callback+0x383/0x4b0 [ 133.867752] notifier_call_chain+0xbf/0x370 [ 133.870550] ? writeback_set_ratelimit+0xe8/0x160 [ 133.873372] blocking_notifier_call_chain+0x76/0xb0 [ 133.876311] memory_notify+0x2e/0x40 [ 133.879013] online_pages+0x597/0x720 [ 133.881686] ? irqentry_exit+0x3e/0xa0 [ 133.884397] ? __pfx_online_pages+0x10/0x10 [ 133.887244] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30 [ 133.890299] ? mhp_init_memmap_on_memory+0x7a/0x1c0 [ 133.893203] memory_subsys_online+0x4f6/0x7f0 [ 133.896099] ? __pfx_memory_subsys_online+0x10/0x10 [ 133.899039] ? xa_load+0x16d/0x2e0 [ 133.901667] ? __pfx_xa_load+0x10/0x10 [ 133.904366] ? __pfx_memory_subsys_online+0x10/0x10 [ 133.907218] device_online+0x141/0x1d0 [ 133.909845] online_memory_block+0x4d/0x60 [ 133.912494] walk_memory_blocks+0xc0/0x120 [ 133.915104] ? __pfx_online_memory_block+0x10/0x10 [ 133.917776] add_memory_resource+0x51d/0x6c0 [ 133.920404] ? __pfx_add_memory_resource+0x10/0x10 [ 133.923104] ? _raw_write_unlock+0x31/0x60 [ 133.925781] ? register_memory_resource+0x119/0x180 [ 133.928450] add_memory_driver_managed+0xf5/0x180 [ 133.931036] dev_dax_kmem_probe+0x7f7/0xb40 [kmem] [ 133.933665] ? __pfx_dev_dax_kmem_probe+0x10/0x10 [kmem] [ 133.936332] ? 
__pfx___up_read+0x10/0x10 [ 133.938878] dax_bus_probe+0x147/0x230 [ 133.941332] ? __pfx_dax_bus_probe+0x10/0x10 [ 133.943954] really_probe+0x27f/0xac0 [ 133.946387] ? __sanitizer_cov_trace_const_cmp1+0x1e/0x30 [ 133.949106] __driver_probe_device+0x1f3/0x460 [ 133.951704] ? parse_option_str+0x149/0x190 [ 133.954241] driver_probe_device+0x56/0x1b0 [ 133.956749] __driver_attach+0x277/0x570 [ 133.959228] ? __pfx___driver_attach+0x10/0x10 [ 133.961776] bus_for_each_dev+0x145/0x1e0 [ 133.964367] ? __pfx_bus_for_each_dev+0x10/0x10 [ 133.967019] ? __kasan_check_read+0x15/0x20 [ 133.969543] ? _raw_spin_unlock+0x31/0x60 [ 133.972132] driver_attach+0x49/0x60 [ 133.974536] bus_add_driver+0x2f3/0x6b0 [ 133.977044] driver_register+0x170/0x4b0 [ 133.979480] __dax_driver_register+0x141/0x1b0 [ 133.982126] ? __pfx_dax_kmem_init+0x10/0x10 [kmem] [ 133.984724] dax_kmem_init+0x54/0xff0 [kmem] [ 133.987284] ? __pfx_dax_kmem_init+0x10/0x10 [kmem] [ 133.989965] do_one_initcall+0x117/0x5d0 [ 133.992506] ? __pfx_do_one_initcall+0x10/0x10 [ 133.995185] ? __kasan_kmalloc+0x88/0xa0 [ 133.997748] ? kasan_poison+0x3e/0x60 [ 134.000288] ? kasan_unpoison+0x2c/0x60 [ 134.002762] ? kasan_poison+0x3e/0x60 [ 134.005202] ? __asan_register_globals+0x62/0x80 [ 134.007753] ? __pfx_dax_kmem_init+0x10/0x10 [kmem] [ 134.010439] do_init_module+0x277/0x750 [ 134.012953] load_module+0x5d1d/0x74f0 [ 134.015406] ? __pfx_load_module+0x10/0x10 [ 134.017887] ? __pfx_ima_post_read_file+0x10/0x10 [ 134.020470] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30 [ 134.023127] ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20 [ 134.025767] ? security_kernel_post_read_file+0xa2/0xd0 [ 134.028429] ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20 [ 134.031162] ? kernel_read_file+0x503/0x820 [ 134.033645] ? __pfx_kernel_read_file+0x10/0x10 [ 134.036232] ? __pfx___lock_acquire+0x10/0x10 [ 134.038766] init_module_from_file+0x12c/0x1a0 [ 134.041291] ? init_module_from_file+0x12c/0x1a0 [ 134.043936] ? 
__pfx_init_module_from_file+0x10/0x10 [ 134.046516] ? __this_cpu_preempt_check+0x21/0x30 [ 134.049091] ? __kasan_check_read+0x15/0x20 [ 134.051551] ? do_raw_spin_unlock+0x60/0x210 [ 134.054077] idempotent_init_module+0x3f1/0x690 [ 134.056643] ? __pfx_idempotent_init_module+0x10/0x10 [ 134.059318] ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20 [ 134.061995] ? __fget_light+0x17d/0x210 [ 134.064428] __x64_sys_finit_module+0x10e/0x1a0 [ 134.066976] x64_sys_call+0x184d/0x20d0 [ 134.069405] do_syscall_64+0x6d/0x140 [ 134.071926] entry_SYSCALL_64_after_hwframe+0x76/0x7e [yanfei.xu@intel.com: add mutex_lock/unlock() pair back] Link: https://lkml.kernel.org/r/20240830102447.1445296-1-yanfei.xu@intel.com Link: https://lkml.kernel.org/r/20240827113614.1343049-1-yanfei.xu@intel.com Fixes: 823430c8 ("memory tier: consolidate the initialization of memory tiers") Signed-off-by: Yanfei Xu <yanfei.xu@intel.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Cc: Ho-Ren (Jack) Chuang <horen.chuang@linux.dev> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
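The lock ordering behind this report can be reproduced with a toy order checker. The sketch below is an assumption-laden userspace model, not lockdep and not the kernel code: all names (model_acquire(), init_register_inside(), and so on) are hypothetical, and only direct two-lock inversions are detected. It records which lock was taken while another was held, and flags an acquisition whose reverse order was seen before. Registering the notifier while holding the tier lock creates the tier -> chain dependency that the hotplug path (chain -> tier) later inverts; registering after the tier lock is released does not.

```c
#include <assert.h>
#include <stdbool.h>

enum lock_id { TIER_LOCK, CHAIN_RWSEM, NR_LOCKS };

static bool held[NR_LOCKS];
static bool dep[NR_LOCKS][NR_LOCKS]; /* dep[a][b]: b was taken while a was held */

static void model_reset(void)
{
	for (int a = 0; a < NR_LOCKS; a++) {
		held[a] = false;
		for (int b = 0; b < NR_LOCKS; b++)
			dep[a][b] = false;
	}
}

/* Returns false if this acquisition inverts a previously recorded order. */
static bool model_acquire(enum lock_id l)
{
	bool ok = true;

	assert(!held[l]);
	for (int h = 0; h < NR_LOCKS; h++) {
		if (!held[h])
			continue;
		if (dep[l][h])
			ok = false; /* the reverse order h-after-l was seen before */
		dep[h][l] = true;
	}
	held[l] = true;
	return ok;
}

static void model_release(enum lock_id l)
{
	held[l] = false;
}

/* Init path with the notifier registered inside the locked region. */
static bool init_register_inside(void)
{
	bool ok = model_acquire(TIER_LOCK);

	ok &= model_acquire(CHAIN_RWSEM); /* registration takes the write lock */
	model_release(CHAIN_RWSEM);
	model_release(TIER_LOCK);
	return ok;
}

/* Init path with the notifier registered after the tier lock is dropped. */
static bool init_register_outside(void)
{
	bool ok = model_acquire(TIER_LOCK);

	model_release(TIER_LOCK);
	ok &= model_acquire(CHAIN_RWSEM);
	model_release(CHAIN_RWSEM);
	return ok;
}

/* Hotplug path: the chain is read-locked, then the callback takes the tier lock. */
static bool hotplug_event(void)
{
	bool ok = model_acquire(CHAIN_RWSEM);

	ok &= model_acquire(TIER_LOCK);
	model_release(TIER_LOCK);
	model_release(CHAIN_RWSEM);
	return ok;
}
```

After model_reset(), calling init_register_inside() followed by hotplug_event() makes the hotplug path report an inversion, while init_register_outside() followed by hotplug_event() does not.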
-
Uladzislau Rezki (Sony) authored
The aim is to simplify the vm_area_alloc_pages() function and make it less confusing, as it has become more cluttered over time:
- eliminate the "bulk_gfp" variable and do not overwrite the gfp flags for the bulk allocator;
- drop the __GFP_NOFAIL flag for high-order-page requests at the upper layer, so that __GFP_NOFAIL handling is less spread between levels;
- add a comment about the fallback path taken if a high-order attempt is unsuccessful, because __GFP_NOFAIL is dropped for such cases;
- fix a typo in a commit message.
Link: https://lkml.kernel.org/r/20240827190916.34242-1-urezki@gmail.com Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Baoquan He <bhe@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sony.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
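The flag handling described in this commit can be sketched as follows. This is a hypothetical userspace model, not the vmalloc code: the type gfp_model_t, the GFP_MODEL_* bit values and alloc_order_model() are invented for illustration, and the availability of high-order pages is passed in as a parameter. It shows the idea that a high-order attempt is made without the no-fail flag, and that an unsuccessful attempt falls back to order-0 with the caller's original flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int gfp_model_t;

/* Hypothetical bit values, chosen only for this sketch. */
#define GFP_MODEL_KERNEL 0x1u
#define GFP_MODEL_NOFAIL 0x2u

/*
 * Returns the order that satisfied the request and stores the flags that
 * were used for the successful attempt in *flags_used.
 */
static unsigned int alloc_order_model(gfp_model_t gfp, unsigned int order,
				      bool high_order_available,
				      gfp_model_t *flags_used)
{
	assert(flags_used != NULL);

	if (order > 0) {
		/* High-order attempts are allowed to fail: mask off no-fail. */
		gfp_model_t high_gfp = gfp & ~GFP_MODEL_NOFAIL;

		if (high_order_available) {
			*flags_used = high_gfp;
			return order;
		}
	}

	/* Fallback path: order-0 pages with the caller's original flags. */
	*flags_used = gfp;
	return 0;
}
```

In this model only the order-0 path ever sees the no-fail flag, which is the separation between levels that the commit message aims for.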
-
Lorenzo Stoakes authored
In commit 714965ca ("mm/mmap: start distinguishing if vma can be removed in mergeability test") we relaxed the VMA merge rules for VMAs possessing a vm_ops->close() hook, permitting this operation in instances where we wouldn't delete the VMA as part of the merge operation. This was later corrected in commit fc0c8f90 ("mm, mmap: fix vma_merge() case 7 with vma_ops->close") to account for a subtle case that the previous commit had not taken into account. In both instances, we first rely on is_mergeable_vma() to determine whether we might be dealing with a VMA that might be removed, taking advantage of the fact that a 'previous' VMA will never be deleted, only VMAs that follow it. The second patch corrects the instance where a merge of the previous VMA into a subsequent one did not correctly check whether the subsequent VMA had a vm_ops->close() handler. Both changes prevent merge cases that are actually permissible (for instance a merge of a VMA into a following VMA with a vm_ops->close(), but with no previous VMA, which would result in the next VMA being extended, not deleted). In addition, both changes fail to consider the case where a VMA that would otherwise be merged with the previous and next VMA might have vm_ops->close(), on the assumption that for this to be the case, all three would have to have the same vma->vm_file to be mergeable and thus the same vm_ops. And in addition both changes operate at 50,000 feet, trying to guess whether a VMA will be deleted. As we have majorly refactored the VMA merge operation and de-duplicated code to the point where we know precisely where deletions will occur, this patch removes the aforementioned checks altogether and instead explicitly checks whether a VMA will be deleted. In cases where a reduced merge is still possible (where we merge both previous and next VMA but the next VMA has a vm_ops->close hook, meaning we could just merge the previous and current VMA), we do so, otherwise the merge is not permitted. 
We take advantage of our userland testing to assert that this functions correctly - replacing the previous limited vm_ops->close() tests with tests for every single case where we delete a VMA. We also update all testing for both new and modified VMAs to set vma->vm_ops->close() in every single instance where this would not prevent the merge, to assert that we never do so. Link: https://lkml.kernel.org/r/9f96b8cfeef3d14afabddac3d6144afdfbef2e22.1725040657.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Mark Brown <broonie@kernel.org> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
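The rule described in this commit, checking explicitly whether a VMA that would be deleted has a close hook, can be sketched with a simplified decision function. This is a hypothetical model and not the kernel's merge code: struct vma_model, enum merge_kind and decide_merge() are invented names, and the model assumes an existing VMA that is fully covered by the merge, so any merge deletes it. Merging with both neighbours also deletes the next VMA, while merging with the next VMA alone only extends it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct vma_model {
	bool has_close; /* models a non-NULL vm_ops->close hook */
};

enum merge_kind {
	MERGE_NONE, /* merge not permitted or not possible */
	MERGE_PREV, /* prev extended; middle deleted */
	MERGE_NEXT, /* next extended; middle deleted */
	MERGE_BOTH, /* prev extended; middle and next deleted */
};

/*
 * can_left / can_right: whether the middle VMA is otherwise compatible
 * with its previous / next neighbour.
 */
static enum merge_kind decide_merge(bool can_left, bool can_right,
				    const struct vma_model *middle,
				    const struct vma_model *next)
{
	assert(middle != NULL && next != NULL);

	if (!can_left && !can_right)
		return MERGE_NONE;

	/* Every merge in this model deletes the middle VMA. */
	if (middle->has_close)
		return MERGE_NONE;

	if (can_left && can_right) {
		/* Merging both would delete next; reduce the merge instead. */
		if (next->has_close)
			return MERGE_PREV;
		return MERGE_BOTH;
	}

	/* Merging right alone extends next, so its close hook is harmless. */
	return can_left ? MERGE_PREV : MERGE_NEXT;
}
```

For example, when both neighbours are compatible and only the next VMA has a close hook, the model returns MERGE_PREV, matching the reduced merge described above; when the middle VMA itself has a close hook, no merge is permitted.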
-