- 04 Jul, 2024 23 commits
-
-
Kairui Song authored
folio_file_pos is only needed for mixed usage of page cache and swap cache, for pure page cache usage, the caller can just use folio_pos instead. It can't be a swap cache page here. Swap mapping may only call into fs through swap_rw and that is not supported for netfs. So just drop it and use folio_pos instead. Link: https://lkml.kernel.org/r/20240521175854.96038-7-ryncsn@gmail.comSigned-off-by: Kairui Song <kasong@tencent.com> Cc: David Howells <dhowells@redhat.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Anna Schumaker <anna@kernel.org> Cc: Barry Song <v-songbaohua@oppo.com> Cc: Chao Yu <chao@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Marc Dionne <marc.dionne@auristor.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: NeilBrown <neilb@suse.de> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Xiubo Li <xiubli@redhat.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kairui Song authored
folio_file_pos is only needed for mixed usage of page cache and swap cache, for pure page cache usage, the caller can just use folio_pos instead. It can't be a swap cache page here. Swap mapping may only call into fs through swap_rw and that is not supported for afs. So just drop it and use folio_pos instead. Link: https://lkml.kernel.org/r/20240521175854.96038-6-ryncsn@gmail.comSigned-off-by: Kairui Song <kasong@tencent.com> Cc: David Howells <dhowells@redhat.com> Cc: Marc Dionne <marc.dionne@auristor.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Barry Song <v-songbaohua@oppo.com> Cc: Chao Yu <chao@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: NeilBrown <neilb@suse.de> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Xiubo Li <xiubli@redhat.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kairui Song authored
This function is no longer used after commit 4fa7a717 ("NFS: Fix up nfs_vm_page_mkwrite() for folios"), all users have been converted to use folio instead, just delete it to remove usage of page_index. Link: https://lkml.kernel.org/r/20240521175854.96038-5-ryncsn@gmail.comSigned-off-by: Kairui Song <kasong@tencent.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Barry Song <v-songbaohua@oppo.com> Cc: Chao Yu <chao@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Marc Dionne <marc.dionne@auristor.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: NeilBrown <neilb@suse.de> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Xiubo Li <xiubli@redhat.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kairui Song authored
page_index is needed for mixed usage of page cache and swap cache, for pure page cache usage, the caller can just use page->index instead. It can't be a swap cache page here, so just drop it. Link: https://lkml.kernel.org/r/20240521175854.96038-4-ryncsn@gmail.comSigned-off-by: Kairui Song <kasong@tencent.com> Cc: Xiubo Li <xiubli@redhat.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Anna Schumaker <anna@kernel.org> Cc: Barry Song <v-songbaohua@oppo.com> Cc: Chao Yu <chao@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Marc Dionne <marc.dionne@auristor.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: NeilBrown <neilb@suse.de> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kairui Song authored
Patch series "mm/swap: clean up and optimize swap cache index", v6. Currently we use one swap_address_space for every 64M chunk to reduce lock contention, this is like having a set of smaller files inside a swap device. But when doing swap cache look up or insert, we are still using the offset of the whole large swap device. This is OK for correctness, as the offset (key) is unique. But Xarray is specially optimized for small indexes, it creates the redix tree levels lazily to be just enough to fit the largest key stored in one Xarray. So we are wasting tree nodes unnecessarily. For 64M chunk it should only take at most 3 level to contain everything. But if we are using the offset from the whole swap device, the offset (key) value will be way beyond 64M, and so will the tree level. Optimize this by reduce the swap cache search space into 64M scope. Test with `time memhog 128G` inside a 8G memcg using 128G swap (ramdisk with SWP_SYNCHRONOUS_IO dropped, tested 3 times, results are stable. The test result is similar but the improvement is smaller if SWP_SYNCHRONOUS_IO is enabled, as swap out path can never skip swap cache): Before: 6.07user 250.74system 4:17.26elapsed 99%CPU (0avgtext+0avgdata 8373376maxresident)k 0inputs+0outputs (55major+33555018minor)pagefaults 0swaps After (+1.8% faster): 6.08user 246.09system 4:12.58elapsed 99%CPU (0avgtext+0avgdata 8373248maxresident)k 0inputs+0outputs (54major+33555027minor)pagefaults 0swaps Similar result with MySQL and sysbench using swap: Before: 94055.61 qps After (+0.8% faster): 94834.91 qps There is alse a very slight drop of radix tree node slab usage: Before: 303952K After: 302224K For this series: There are multiple places that expect mixed type of pages (page cache or swap cache), eg. migration, huge memory split; There are four helpers for that: - page_index - page_file_offset - folio_index - folio_file_pos To keep the code clean and compatible, this series first cleaned up usage of them. page_file_offset and folio_file_pos are historical helpes that can be simply dropped after clean up. And page_index can be all converted to folio_index or folio->index. Then introduce two new helpers swap_cache_index and swap_dev_pos for swap. Replace swp_offset with swap_cache_index when used to retrieve folio from swap cache, and use swap_dev_pos when needed to retrieve the device position of a swap entry. This way, swap_cache_index can return the optimized value with no compatibility issue. The result is better performance and reduced LOC. Idealy, in the future, we may want to reduce SWAP_ADDRESS_SPACE_SHIFT from 14 to 12: Default Xarray chunk offset is 6, so we have 3 level trees instead of 2 level trees just for 2 extra bits. But swap cache is based on address_space struct, with 4 times more metadata sparsely distributed in memory it waste more cacheline, the performance gain from this series is almost canceled according to my test. So first, just have a cleaner seperation of offsets and smaller search space. This patch (of 10): page_index is only for mixed usage of page cache and swap cache, for pure page cache usage, the caller can just use page->index instead. It can't be a swap cache page here (being part of buffer head), so just drop it. And while we are at it, optimize the code by retrieving the offset of the buffer head within the folio directly using bh_offset, and get rid of the loop and usage of page helpers. Link: https://lkml.kernel.org/r/20240521175854.96038-1-ryncsn@gmail.com Link: https://lkml.kernel.org/r/20240521175854.96038-3-ryncsn@gmail.comSuggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Kairui Song <kasong@tencent.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Barry Song <v-songbaohua@oppo.com> Cc: Chao Yu <chao@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Marc Dionne <marc.dionne@auristor.com> Cc: Minchan Kim <minchan@kernel.org> Cc: NeilBrown <neilb@suse.de> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Xiubo Li <xiubli@redhat.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kemeng Shi authored
Factor out balance_wb_limits to remove repeated code [shikemeng@huaweicloud.com: add comment] Link: https://lkml.kernel.org/r/20240606033547.344376-1-shikemeng@huaweicloud.com [akpm@linux-foundation.org: s/fileds/fields/ in comment] Link: https://lkml.kernel.org/r/20240514125254.142203-9-shikemeng@huaweicloud.comSigned-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kemeng Shi authored
Factor out wb_dirty_exceeded to remove repeated code Link: https://lkml.kernel.org/r/20240514125254.142203-8-shikemeng@huaweicloud.comSigned-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kemeng Shi authored
Factor out balance_domain_limits to remove repeated code. Link: https://lkml.kernel.org/r/20240514125254.142203-7-shikemeng@huaweicloud.comSigned-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kemeng Shi authored
Factor out wb_dirty_freerun to remove more repeated freerun code. Link: https://lkml.kernel.org/r/20240514125254.142203-6-shikemeng@huaweicloud.comSigned-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kemeng Shi authored
Factor out code of freerun into new helper functions domain_poll_intv and domain_dirty_freerun to remove repeated code. Link: https://lkml.kernel.org/r/20240514125254.142203-5-shikemeng@huaweicloud.comSigned-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kemeng Shi authored
Factor out domain_over_bg_thresh from wb_over_bg_thresh to remove repeated code. Link: https://lkml.kernel.org/r/20240514125254.142203-4-shikemeng@huaweicloud.comSigned-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kemeng Shi authored
Add general function domain_dirty_avail to calculate dirty and avail for either dirty limit or background writeback in either global domain or wb domain. Link: https://lkml.kernel.org/r/20240514125254.142203-3-shikemeng@huaweicloud.comSigned-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kemeng Shi authored
Patch series "Add helper functions to remove repeated code and improve readability of cgroup writeback", v2. This series adds a lot of helpers to remove repeated code between domain and wb; dirty limit and dirty background; global domain and wb domain. The helpers also improve readability. More details can be found in the respective patches. A simple domain hierarchy is tested: global domain (> 20G) | cgroup domain1(10G) | wb1 | fio Test steps: /* make it easy to observe */ echo 300000 > /proc/sys/vm/dirty_expire_centisecs echo 3000 > /proc/sys/vm/dirty_writeback_centisecs /* create cgroup domain */ cd /sys/fs/cgroup echo "+memory +io" > cgroup.subtree_control mkdir group1 cd group1 echo 10G > memory.high echo 10G > memory.max echo $$ > cgroup.procs mkfs.ext4 -F /dev/vdb mount /dev/vdb /bdi1/ /* run fio to generate dirty pages */ fio -name test -filename=/bdi1/file -size=xxx -ioengine=libaio -bs=4K \ -iodepth=1 -rw=write -direct=0 --time_based -runtime=600 -invalidate=0 When fio size is 1G, the wb is in freerun state and dirty pages are only written back when dirty inode is expired after 30 seconds. When fio size is 2G, the dirty pages keep being written back and bandwidth of fio is limited. This patch (of 8): Similar to wb_dirty_limits which calculates dirty and thresh of wb, wb_bg_dirty_limits calculates background dirty and background thresh of wb. With wb_bg_dirty_limits, we could remove repeated code in wb_over_bg_thresh. Link: https://lkml.kernel.org/r/20240514125254.142203-1-shikemeng@huaweicloud.com Link: https://lkml.kernel.org/r/20240514125254.142203-2-shikemeng@huaweicloud.comSigned-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Shakeel Butt authored
The commit 6be5e186fd65 ("mm: vmscan: restore incremental cgroup iteration") added a retry reclaim heuristic to iterate all the cgroups before returning an unsuccessful reclaim but missed to reset the sc->priority. Let's fix it. Link: https://lkml.kernel.org/r/20240529154911.3008025-1-shakeel.butt@linux.dev Fixes: 6be5e186fd65 ("mm: vmscan: restore incremental cgroup iteration") Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Reported-by: syzbot+17416257cb95200cba44@syzkaller.appspotmail.com Tested-by: syzbot+17416257cb95200cba44@syzkaller.appspotmail.com Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Johannes Weiner authored
Currently, reclaim always walks the entire cgroup tree in order to ensure fairness between groups. While overreclaim is limited in shrink_lruvec(), many of our systems have a sizable number of active groups, and an even bigger number of idle cgroups with cache left behind by previous jobs; the mere act of walking all these cgroups can impose significant latency on direct reclaimers. In the past, we've used a save-and-restore iterator that enabled incremental tree walks over multiple reclaim invocations. This ensured fairness, while keeping the work of individual reclaimers small. However, in edge cases with a lot of reclaim concurrency, individual reclaimers would sometimes not see enough of the cgroup tree to make forward progress and (prematurely) declare OOM. Consequently we switched to comprehensive walks in 1ba6fc9a ("mm: vmscan: do not share cgroup iteration between reclaimers"). To address the latency problem without bringing back the premature OOM issue, reinstate the shared iteration, but with a restart condition to do the full walk in the OOM case - similar to what we do for memory.low enforcement and active page protection. In the worst case, we do one more full tree walk before declaring OOM. But the vast majority of direct reclaim scans can then finish much quicker, while fairness across the tree is maintained: - Before this patch, we observed that direct reclaim always takes more than 100us and most direct reclaim time is spent in reclaim cycles lasting between 1ms and 1 second. Almost 40% of direct reclaim time was spent on reclaim cycles exceeding 100ms. - With this patch, almost all page reclaim cycles last less than 10ms, and a good amount of direct page reclaim finishes in under 100us. No page reclaim cycles lasting over 100ms were observed anymore. The shared iterator state is maintaned inside the target cgroup, so fair and incremental walks are performed during both global reclaim and cgroup limit reclaim of complex subtrees. Link: https://lkml.kernel.org/r/20240514202641.2821494-1-hannes@cmpxchg.orgSigned-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Rik van Riel <riel@surriel.com> Reported-by: Rik van Riel <riel@surriel.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Facebook Kernel Team <kernel-team@fb.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Ran Xiaokai authored
huge_anon_orders_always is accessed lockless, it is better to use the READ_ONCE() wrapper. This is not fixing any visible bug, hopefully this can cease some KCSAN complains in the future. Also do that for huge_anon_orders_madvise. Link: https://lkml.kernel.org/r/20240515104754889HqrahFPePOIE1UlANHVAh@zte.com.cnSigned-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lu Zhongjun <lu.zhongjun@zte.com.cn> Reviewed-by: xu xin <xu.xin16@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kefeng Wang authored
Let's change shmem_alloc_folio() to take a order and use folio_alloc_mpol() helper, then directly use it for normal or large folio to cleanup code. Link: https://lkml.kernel.org/r/20240515070709.78529-5-wangkefeng.wang@huawei.comSigned-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kefeng Wang authored
Convert to use folio_alloc_mpol() to make vma_alloc_folio_noprof() to use folio throughout. Link: https://lkml.kernel.org/r/20240515070709.78529-4-wangkefeng.wang@huawei.comSigned-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kefeng Wang authored
Convert to use folio_alloc_mpol_noprof() to make vma_alloc_folio_noprof() to use folio throughout. Link: https://lkml.kernel.org/r/20240515070709.78529-3-wangkefeng.wang@huawei.comSigned-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kefeng Wang authored
Patch series "mm: convert to folio_alloc_mpol()". This patch (of 4): This adds a new folio_alloc_mpol() like folio_alloc() but allocate folio according to NUMA mempolicy. Link: https://lkml.kernel.org/r/20240515070709.78529-1-wangkefeng.wang@huawei.com Link: https://lkml.kernel.org/r/20240515070709.78529-2-wangkefeng.wang@huawei.comSigned-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Oscar Salvador authored
Since commit d67e32f2 ("hugetlb: restructure pool allocations"), the parameter node_alloc_noretry from alloc_fresh_hugetlb_folio() is not used, so drop it. Link: https://lkml.kernel.org/r/20240516081035.5651-1-osalvador@suse.deSigned-off-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Illia Ostapyshyn authored
Commit 49fd9b6d ("mm/vmscan: fix a lot of comments") renamed shrink_page_list() to shrink_folio_list(). Fix up the remaining references to the old name in comments and documentation. Link: https://lkml.kernel.org/r/20240517091348.1185566-1-illia@yshyn.comSigned-off-by: Illia Ostapyshyn <illia@yshyn.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Thomas Weißschuh authored
The sysctl core is preparing to only expose instances of struct ctl_table as "const". This will also affect the ctl_table argument of sysctl handlers. As the function prototype of all sysctl handlers throughout the tree needs to stay consistent that change will be done in one commit. To reduce the size of that final commit, switch utility functions which are not bound by "typedef proc_handler" to "const struct ctl_table". No functional change. Link: https://lkml.kernel.org/r/20240518-sysctl-const-handler-hugetlb-v1-1-47e34e2871b2@weissschuh.netSigned-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Muchun Song <muchun.song@linux.dev> Cc: Joel Granados <j.granados@samsung.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 30 Jun, 2024 16 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linuxLinus Torvalds authored
Pull ata fixes from Niklas Cassel: - Add NOLPM quirk for for all Crucial BX SSD1 models. Considering that we now have had bug reports for 3 different BX SSD1 variants from Crucial with the same product name, make the quirk more inclusive, to catch more device models from the same generation. - Fix a trivial NULL pointer dereference in the error path for ata_host_release(). - Create a ata_port_free(), so that we don't miss freeing ata_port struct members when freeing a struct ata_port. - Fix a trivial double free in the error path for ata_host_alloc(). - Ensure that we remove the libata "remapped NVMe device count" sysfs entry on .probe() error. * tag 'ata-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux: ata: ahci: Clean up sysfs file on error ata: libata-core: Fix double free on error ata,scsi: libata-core: Do not leak memory for ata_port struct members ata: libata-core: Fix null pointer dereference on error ata: libata-core: Add ATA_HORKAGE_NOLPM for all Crucial BX SSD1 models
-
Niklas Cassel authored
.probe() (ahci_init_one()) calls sysfs_add_file_to_group(), however, if probe() fails after this call, we currently never call sysfs_remove_file_from_group(). (The sysfs_remove_file_from_group() call in .remove() (ahci_remove_one()) does not help, as .remove() is not called on .probe() error.) Thus, if probe() fails after the sysfs_add_file_to_group() call, the next time we insmod the module we will get: sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:04.0/remapped_nvme' CPU: 11 PID: 954 Comm: modprobe Not tainted 6.10.0-rc5 #43 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x5d/0x80 sysfs_warn_dup.cold+0x17/0x23 sysfs_add_file_mode_ns+0x11a/0x130 sysfs_add_file_to_group+0x7e/0xc0 ahci_init_one+0x31f/0xd40 [ahci] Fixes: 894fba7f ("ata: ahci: Add sysfs attribute to show remapped NVMe device count") Cc: stable@vger.kernel.org Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240629124210.181537-10-cassel@kernel.orgSigned-off-by: Niklas Cassel <cassel@kernel.org>
-
Niklas Cassel authored
If e.g. the ata_port_alloc() call in ata_host_alloc() fails, we will jump to the err_out label, which will call devres_release_group(). devres_release_group() will trigger a call to ata_host_release(). ata_host_release() calls kfree(host), so executing the kfree(host) in ata_host_alloc() will lead to a double free: kernel BUG at mm/slub.c:553! Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 11 PID: 599 Comm: (udev-worker) Not tainted 6.10.0-rc5 #47 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:kfree+0x2cf/0x2f0 Code: 5d 41 5e 41 5f 5d e9 80 d6 ff ff 4d 89 f1 41 b8 01 00 00 00 48 89 d9 48 89 da RSP: 0018:ffffc90000f377f0 EFLAGS: 00010246 RAX: ffff888112b1f2c0 RBX: ffff888112b1f2c0 RCX: ffff888112b1f320 RDX: 000000000000400b RSI: ffffffffc02c9de5 RDI: ffff888112b1f2c0 RBP: ffffc90000f37830 R08: 0000000000000000 R09: 0000000000000000 R10: ffffc90000f37610 R11: 617461203a736b6e R12: ffffea00044ac780 R13: ffff888100046400 R14: ffffffffc02c9de5 R15: 0000000000000006 FS: 00007f2f1cabe980(0000) GS:ffff88813b380000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f2f1c3acf75 CR3: 0000000111724000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? __die_body.cold+0x19/0x27 ? die+0x2e/0x50 ? do_trap+0xca/0x110 ? do_error_trap+0x6a/0x90 ? kfree+0x2cf/0x2f0 ? exc_invalid_op+0x50/0x70 ? kfree+0x2cf/0x2f0 ? asm_exc_invalid_op+0x1a/0x20 ? ata_host_alloc+0xf5/0x120 [libata] ? ata_host_alloc+0xf5/0x120 [libata] ? kfree+0x2cf/0x2f0 ata_host_alloc+0xf5/0x120 [libata] ata_host_alloc_pinfo+0x14/0xa0 [libata] ahci_init_one+0x6c9/0xd20 [ahci] Ensure that we will not call kfree(host) twice, by performing the kfree() only if the devres_open_group() call failed. Fixes: dafd6c49 ("libata: ensure host is free'd on error exit paths") Cc: stable@vger.kernel.org Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240629124210.181537-9-cassel@kernel.orgSigned-off-by: Niklas Cassel <cassel@kernel.org>
-
Niklas Cassel authored
libsas is currently not freeing all the struct ata_port struct members, e.g. ncq_sense_buf for a driver supporting Command Duration Limits (CDL). Add a function, ata_port_free(), that is used to free a ata_port, including its struct members. It makes sense to keep the code related to freeing a ata_port in its own function, which will also free all the struct members of struct ata_port. Fixes: 18bd7718 ("scsi: ata: libata: Handle completion of CDL commands using policy 0xD") Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240629124210.181537-8-cassel@kernel.orgSigned-off-by: Niklas Cassel <cassel@kernel.org>
-
Niklas Cassel authored
If the ata_port_alloc() call in ata_host_alloc() fails, ata_host_release() will get called. However, the code in ata_host_release() tries to free ata_port struct members unconditionally, which can lead to the following: BUG: unable to handle page fault for address: 0000000000003990 PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 10 PID: 594 Comm: (udev-worker) Not tainted 6.10.0-rc5 #44 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:ata_host_release.cold+0x2f/0x6e [libata] Code: e4 4d 63 f4 44 89 e2 48 c7 c6 90 ad 32 c0 48 c7 c7 d0 70 33 c0 49 83 c6 0e 41 RSP: 0018:ffffc90000ebb968 EFLAGS: 00010246 RAX: 0000000000000041 RBX: ffff88810fb52e78 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff88813b3218c0 RDI: ffff88813b3218c0 RBP: ffff88810fb52e40 R08: 0000000000000000 R09: 6c65725f74736f68 R10: ffffc90000ebb738 R11: 73692033203a746e R12: 0000000000000004 R13: 0000000000000000 R14: 0000000000000011 R15: 0000000000000006 FS: 00007f6cc55b9980(0000) GS:ffff88813b300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000003990 CR3: 00000001122a2000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? __die_body.cold+0x19/0x27 ? page_fault_oops+0x15a/0x2f0 ? exc_page_fault+0x7e/0x180 ? asm_exc_page_fault+0x26/0x30 ? ata_host_release.cold+0x2f/0x6e [libata] ? ata_host_release.cold+0x2f/0x6e [libata] release_nodes+0x35/0xb0 devres_release_group+0x113/0x140 ata_host_alloc+0xed/0x120 [libata] ata_host_alloc_pinfo+0x14/0xa0 [libata] ahci_init_one+0x6c9/0xd20 [ahci] Do not access ata_port struct members unconditionally. Fixes: 633273a3 ("libata-pmp: hook PMP support and enable it") Cc: stable@vger.kernel.org Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240629124210.181537-7-cassel@kernel.orgSigned-off-by: Niklas Cassel <cassel@kernel.org>
-
Linus Torvalds authored
Merge tag 'kbuild-fixes-v6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Remove the executable bit from installed DTB files - Escape $ in subshell execution in the debian-orig target - Fix RPM builds with CONFIG_MODULES=n - Fix xconfig with the O= option - Fix scripts_gdb with the O= option * tag 'kbuild-fixes-v6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: scripts/gdb: bring the "abspath" back kbuild: Use $(obj)/%.cc to fix host C++ module builds kbuild: rpm-pkg: fix build error with CONFIG_MODULES=n kbuild: Fix build target deb-pkg: ln: failed to create hard link kbuild: doc: Update default INSTALL_MOD_DIR from extra to updates kbuild: Install dtb files as 0644 in Makefile.dtbinst
-
Linus Torvalds authored
The kernel test robot reported that clang no longer compiles the 32-bit x86 kernel in some configurations due to commit 95ece481 ("locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions"). The build fails with arch/x86/include/asm/cmpxchg_32.h:149:9: error: inline assembly requires more registers than available and the reason seems to be that not only does the cmpxchg8b instruction need four fixed registers (EDX:EAX and ECX:EBX), with the emulation fallback the inline asm also wants a fifth fixed register for the address (it uses %esi for that, but that's just a software convention with cmpxchg8b_emu). Avoiding using another pointer input to the asm (and just forcing it to use the "0(%esi)" addressing that we end up requiring for the sw fallback) seems to fix the issue. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202406230912.F6XFIyA6-lkp@intel.com/ Fixes: 95ece481 ("locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions") Link: https://lore.kernel.org/all/202406230912.F6XFIyA6-lkp@intel.com/Suggested-by: Uros Bizjak <ubizjak@gmail.com> Reviewed-and-Tested-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-miscLinus Torvalds authored
Pull char/misc driver fixes from Greg KH: "Here are some small driver fixes for 6.10-rc6. Included in here are: - IIO driver fixes for reported issues - Counter driver fix for a reported problem. All of these have been in linux-next this week with no reported issues" * tag 'char-misc-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: counter: ti-eqep: enable clock at probe iio: chemical: bme680: Fix sensor data read operation iio: chemical: bme680: Fix overflows in compensate() functions iio: chemical: bme680: Fix calibration data variable iio: chemical: bme680: Fix pressure value output iio: humidity: hdc3020: fix hysteresis representation iio: dac: fix ad9739a random config compile error iio: accel: fxls8962af: select IIO_BUFFER & IIO_KFIFO_BUF iio: adc: ad7266: Fix variable checking bug iio: xilinx-ams: Don't include ams_ctrl_channels in scan_mask
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/stagingLinus Torvalds authored
Pull staging driver fixes from Greg KH: "Here are two small staging driver fixes for 6.10-rc6, both for the vc04_services drivers: - build fix if CONFIG_DEBUGFS was not set - initialization check fix that was much reported. Both of these have been in linux-next this week with no reported issues" * tag 'staging-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: vchiq_debugfs: Fix build if CONFIG_DEBUG_FS is not set staging: vc04_services: vchiq_arm: Fix initialisation check
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/ttyLinus Torvalds authored
Pull tty / serial / console fixes from Greg KH: "Here are a bunch of fixes/reverts for 6.10-rc6. Include in here are: - revert the bunch of tty/serial/console changes that landed in -rc1 that didn't quite work properly yet. Everyone agreed to just revert them for now and will work on making them better for a future release instead of trying to quick fix the existing changes this late in the release cycle - 8250 driver port count bugfix - Other tiny serial port bugfixes for reported issues All of these have been in linux-next this week with no reported issues" * tag 'tty-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: Revert "printk: Save console options for add_preferred_console_match()" Revert "printk: Don't try to parse DEVNAME:0.0 console options" Revert "printk: Flag register_console() if console is set on command line" Revert "serial: core: Add support for DEVNAME:0.0 style naming for kernel console" Revert "serial: core: Handle serial console options" Revert "serial: 8250: Add preferred console in serial8250_isa_init_ports()" Revert "Documentation: kernel-parameters: Add DEVNAME:0.0 format for serial ports" Revert "serial: 8250: Fix add preferred console for serial8250_isa_init_ports()" Revert "serial: core: Fix ifdef for serial base console functions" serial: bcm63xx-uart: fix tx after conversion to uart_port_tx_limited() serial: core: introduce uart_port_tx_limited_flags() Revert "serial: core: only stop transmit when HW fifo is empty" serial: imx: set receiver level before starting uart tty: mcf: MCF54418 has 10 UARTS serial: 8250_omap: Implementation of Errata i2310 tty: serial: 8250: Fix port count mismatch with the device
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usbLinus Torvalds authored
Pull USB fixes from Greg KH: "Here are a handful of small USB driver fixes for 6.10-rc6 to resolve some reported issues. Included in here are: - typec driver bugfixes - usb gadget driver reverts for commits that were reported to have problems - resource leak bugfix - gadget driver bugfixes - dwc3 driver bugfixes - usb atm driver bugfix for when syzbot got loose on it All of these have been in linux-next this week with no reported issues" * tag 'usb-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: dwc3: core: Workaround for CSR read timeout Revert "usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach" Revert "usb: gadget: u_ether: Re-attach netif device to mirror detachment" usb: gadget: aspeed_udc: fix device address configuration usb: dwc3: core: remove lock of otg mode during gadget suspend/resume to avoid deadlock usb: typec: ucsi: glink: fix child node release in probe function usb: musb: da8xx: fix a resource leak in probe() usb: typec: ucsi_acpi: Add LG Gram quirk usb: ucsi: stm32: fix command completion handling usb: atm: cxacru: fix endpoint checking in cxacru_bind() usb: gadget: printer: fix races against disable usb: gadget: printer: SS+ support
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull smp fixes from Borislav Petkov: - Fix "nosmp" and "maxcpus=0" after the parallel CPU bringup work went in and broke them - Make sure CPU hotplug dynamic prepare states are actually executed * tag 'smp_urgent_for_v6.10_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu: Fix broken cmdline "nosmp" and "maxcpus=0" cpu/hotplug: Fix dynstate assignment in __cpuhp_setup_state_cpuslocked()
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull irq fixes from Borislav Petkov: - Make sure multi-bridge machines get all eiointc interrupt controllers initialized even if the number of CPUs has been limited by a cmdline param - Make sure interrupt lines on liointc hw are configured properly even when interrupt routing changes - Avoid use-after-free in the error path of the MSI init code * tag 'irq_urgent_for_v6.10_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: PCI/MSI: Fix UAF in msi_capability_init irqchip/loongson-liointc: Set different ISRs for different cores irqchip/loongson-eiointc: Use early_cpu_to_node() instead of cpu_to_node()
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull timer fix from Borislav Petkov: - Warn when an hrtimer doesn't get a callback supplied * tag 'timers_urgent_for_v6.10_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hrtimer: Prevent queuing of hrtimer without a function callback
-
git://www.linux-watchdog.org/linux-watchdogLinus Torvalds authored
Pull watchdog fixes from Wim Van Sebroeck: - lenovo_se10_wdt: add HAS_IOPORT dependency - add missing MODULE_DESCRIPTION() macros * tag 'linux-watchdog-6.10-rc-fixes' of git://www.linux-watchdog.org/linux-watchdog: watchdog: add missing MODULE_DESCRIPTION() macros watchdog: lenovo_se10_wdt: add HAS_IOPORT dependency
-
- 29 Jun, 2024 1 commit
-
-
git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds authored
Pull NFS client fix from Trond Myklebust: - One more SUNRPC fix for the NFSv4.x backchannel timeouts * tag 'nfs-for-6.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Fix backchannel reply, again
-