1. 27 May, 2022 13 commits
  2. 25 May, 2022 17 commits
  3. 19 May, 2022 10 commits
    • Kefeng Wang's avatar
      02e34fff
    • Vasily Averin's avatar
      tracing: incorrect isolate_mode_t cast in mm_vmscan_lru_isolate · 2b132903
      Vasily Averin authored
      Fixes the following sparse warnings:
      
        CHECK   mm/vmscan.c
      mm/vmscan.c: note: in included file (through
      include/trace/trace_events.h, include/trace/define_trace.h,
      include/trace/events/vmscan.h):
      ./include/trace/events/vmscan.h:281:1: sparse: warning:
       cast to restricted isolate_mode_t
      ./include/trace/events/vmscan.h:281:1: sparse: warning:
       restricted isolate_mode_t degrades to integer
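
For context, sparse reports these warnings when a restricted (bitwise) type is converted to or from a plain integer without an explicit forced cast. The following is a minimal user-space sketch of that pattern, not the patch itself. The names `DEMO_BITWISE`, `DEMO_FORCE`, `demo_isolate_mode_t`, and the two helpers are hypothetical; the annotations are assumed to be active only when sparse defines `__CHECKER__`.

```c
/* Sketch only: annotations are active under sparse, no-ops for the compiler. */
#ifdef __CHECKER__
#define DEMO_BITWISE __attribute__((bitwise))
#define DEMO_FORCE   __attribute__((force))
#else
#define DEMO_BITWISE
#define DEMO_FORCE
#endif

/* Hypothetical restricted type standing in for isolate_mode_t. */
typedef unsigned int DEMO_BITWISE demo_isolate_mode_t;

/* A forced cast marks the conversion to a plain integer as intentional. */
static inline unsigned int demo_mode_to_uint(demo_isolate_mode_t mode)
{
	return (DEMO_FORCE unsigned int)mode;
}

/* The reverse direction needs the same treatment. */
static inline demo_isolate_mode_t demo_uint_to_mode(unsigned int raw)
{
	return (DEMO_FORCE demo_isolate_mode_t)raw;
}
```

With a regular compiler both helpers are plain identity conversions; only sparse sees the difference.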
      
      Link: https://lkml.kernel.org/r/e85d7ff2-fd10-53f8-c24e-ba0458439c1b@openvz.org
      Signed-off-by: Vasily Averin <vvs@openvz.org>
      Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      2b132903
    • Christophe de Dinechin's avatar
      nodemask.h: fix compilation error with GCC12 · 37462a92
      Christophe de Dinechin authored
      With gcc version 12.0.1 20220401 (Red Hat 12.0.1-0), building with
      defconfig results in the following compilation error:
      
      |   CC      mm/swapfile.o
      | mm/swapfile.c: In function `setup_swap_info':
      | mm/swapfile.c:2291:47: error: array subscript -1 is below array bounds
      |  of `struct plist_node[]' [-Werror=array-bounds]
      |  2291 |                                 p->avail_lists[i].prio = 1;
      |       |                                 ~~~~~~~~~~~~~~^~~
      | In file included from mm/swapfile.c:16:
      | ./include/linux/swap.h:292:27: note: while referencing `avail_lists'
      |   292 |         struct plist_node avail_lists[]; /*
      |       |                           ^~~~~~~~~~~
      
      This is due to the compiler detecting that the mask in
      node_states[__state] could theoretically be zero, which would lead to
      first_node() returning -1 through find_first_bit.
      
      I believe that the warning/error is legitimate.  I first tried adding a
      test to check that the node mask is not empty, since a similar test exists
      in the case where MAX_NUMNODES == 1.
      
      However, adding the if statement causes other warnings to appear in
      for_each_cpu_node_but, because it introduces a dangling else ambiguity. 
      And unfortunately, GCC is not smart enough to detect that the added test
      makes the case where (node) == -1 impossible, so it still complains with
      the same message.
      
      This is why I settled on replacing that with a harmless, but relatively
      useless (node) >= 0 test.  Based on the warning for the dangling else, I
      also decided to fix the case where MAX_NUMNODES == 1 by moving the
      condition inside the for loop.  It will still only be tested once.  This
      ensures that the meaning of an else following for_each_node_mask or
      derivatives would not silently have a different meaning depending on the
      configuration.
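
The dangling-else problem described above can be reproduced outside the kernel. The sketch below uses two hypothetical iteration macros, `DEMO_FOR_EACH_IF` and `DEMO_FOR_EACH_LOOP`, which are simplified stand-ins and not the real `for_each_node_mask` definitions. It shows how an `else` written after a macro that hides an `if` binds to that hidden `if`, while folding the guard into the loop condition keeps the `else` attached to the caller's own `if`.

```c
/* Compilers warn about the first variant; silence it so the sketch builds. */
#if defined(__GNUC__)
#pragma GCC diagnostic ignored "-Wdangling-else"
#endif

/* Variant A: guard with an if. An else after the loop binds to this if. */
#define DEMO_FOR_EACH_IF(i, n) \
	if ((n) > 0) for ((i) = 0; (i) < (n); (i)++)

/* Variant B: guard folded into the loop condition. No hidden if. */
#define DEMO_FOR_EACH_LOOP(i, n) \
	for ((i) = 0; (n) > 0 && (i) < (n); (i)++)

/* Returns 1 if the else branch ran, 0 otherwise. */
static int demo_else_ran_if_variant(int cond, int n)
{
	int i, sum = 0, else_ran = 0;

	if (cond)
		DEMO_FOR_EACH_IF(i, n)
			sum += i;
	else	/* binds to the hidden if ((n) > 0), not to if (cond) */
		else_ran = 1;
	(void)sum;
	return else_ran;
}

/* Same caller code, but the else now belongs to if (cond). */
static int demo_else_ran_loop_variant(int cond, int n)
{
	int i, sum = 0, else_ran = 0;

	if (cond)
		DEMO_FOR_EACH_LOOP(i, n)
			sum += i;
	else
		else_ran = 1;
	(void)sum;
	return else_ran;
}
```

With `cond` true and `n` equal to 0, the first variant runs the `else` branch and the second does not, which is the kind of configuration-dependent change in meaning the commit message warns about.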
      
      Link: https://lkml.kernel.org/r/20220414150855.2407137-3-dinechin@redhat.com
      Signed-off-by: Christophe de Dinechin <christophe@dinechin.org>
      Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ben Segall <bsegall@google.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Zhen Lei <thunder.leizhen@huawei.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      37462a92
    • Qi Zheng's avatar
      mm: fix missing handler for __GFP_NOWARN · 3f913fc5
      Qi Zheng authored
      We expect no warnings to be issued when we specify __GFP_NOWARN, but
      currently paths such as alloc_pages() and kmalloc() still print some
      warnings.  Fix that.
      
      Warnings that report usage problems are deliberately left alone: if such
      a warning is printed, the usage problem itself should be fixed.  One
      example is the following case:
      
      	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
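
The distinction drawn above, between failure warnings that a no-warn flag should silence and usage warnings that stay visible, can be sketched in user space as follows. All names here (`DEMO_GFP_NOWARN`, `DEMO_GFP_NOFAIL`, `demo_warn_alloc_failure`, `demo_warn_bad_usage`, `demo_warnings`) are hypothetical stand-ins, and the flag values are arbitrary; this is an illustration under those assumptions, not the kernel code.

```c
#include <stdio.h>

#define DEMO_GFP_NOWARN 0x1u	/* hypothetical stand-in for __GFP_NOWARN */
#define DEMO_GFP_NOFAIL 0x2u	/* hypothetical stand-in for __GFP_NOFAIL */

/* Counts emitted warnings so the behaviour can be observed. */
static unsigned int demo_warnings;

/* Failure-path warning: suppressed when the caller asked for silence. */
static void demo_warn_alloc_failure(unsigned int flags, unsigned int order)
{
	if (flags & DEMO_GFP_NOWARN)
		return;
	demo_warnings++;
	fprintf(stderr, "allocation failure, order %u\n", order);
}

/* Usage-problem warning: reported even if DEMO_GFP_NOWARN is set. */
static void demo_warn_bad_usage(unsigned int flags, unsigned int order)
{
	if ((flags & DEMO_GFP_NOFAIL) && order > 1) {
		demo_warnings++;
		fprintf(stderr, "NOFAIL request with order %u\n", order);
	}
}
```

Keeping the usage check independent of the no-warn flag means a caller cannot hide a misuse simply by asking for quiet allocations.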
      
      [zhengqi.arch@bytedance.com: v2]
       Link: https://lkml.kernel.org/r/20220511061951.1114-1-zhengqi.arch@bytedance.com
      Link: https://lkml.kernel.org/r/20220510113809.80626-1-zhengqi.arch@bytedance.com
      Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Slaby <jirislaby@kernel.org>
      Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      3f913fc5
    • Wonhyuk Yang's avatar
      mm/page_alloc: fix tracepoint mm_page_alloc_zone_locked() · 10e0f753
      Wonhyuk Yang authored
      Currently, the tracepoint mm_page_alloc_zone_locked() doesn't show
      correct information.
      
      First, when alloc_flag has ALLOC_HARDER/ALLOC_CMA, a page can be allocated
      from MIGRATE_HIGHATOMIC/MIGRATE_CMA.  Nevertheless, the tracepoint reports
      the requested migration type, not MIGRATE_HIGHATOMIC or MIGRATE_CMA.
      
      Second, after commit 44042b44 ("mm/page_alloc: allow high-order pages
      to be stored on the per-cpu lists") the percpu-list can store high-order
      pages.  But the tracepoint determines whether it is a refill of the
      percpu-list by comparing the requested order with 0.
      
      To handle these problems, make mm_page_alloc_zone_locked() be called only
      by __rmqueue_smallest, with the correct migration type.  With a new
      argument called percpu_refill, it can show roughly whether it is a refill
      of the percpu-list.
      
      Link: https://lkml.kernel.org/r/20220512025307.57924-1-vvghjk1234@gmail.com
      Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Baik Song An <bsahn@etri.re.kr>
      Cc: Hong Yeon Kim <kimhy@etri.re.kr>
      Cc: Taeung Song <taeung@reallinux.co.kr>
      Cc: <linuxgeek@linuxgeek.io>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      10e0f753
    • Fanjun Kong's avatar
      mm/page_owner.c: add missing __initdata attribute · 3645b5ec
      Fanjun Kong authored
      This patch fixes two issues:
      1. Add __initdata attribute according to include/linux/init.h:
      	For initialized data:
      	You should insert __initdata between the variable name and equal
      	sign followed by value
      
      2. Fix below error reported by checkpatch.pl:
      	ERROR: do not initialise statics to false
      
      Special thanks to Muchun Song :)
      
      Link: https://lkml.kernel.org/r/20220516030039.1487005-1-bh1scw@gmail.com
      Signed-off-by: Fanjun Kong <bh1scw@gmail.com>
      Suggested-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      3645b5ec
    • Luo Meng's avatar
      tmpfs: fix undefined-behaviour in shmem_reconfigure() · d14f5efa
      Luo Meng authored
      When shmem_reconfigure() calls __percpu_counter_compare(), the second
      parameter is unsigned long long.  But in the definition of
      __percpu_counter_compare(), the second parameter is s64.  So when
      __percpu_counter_compare() executes abs(count - rhs), UBSAN shows the
      following warning:
      
      ================================================================================
      UBSAN: Undefined behaviour in lib/percpu_counter.c:209:6
      signed integer overflow:
      0 - -9223372036854775808 cannot be represented in type 'long long int'
      CPU: 1 PID: 9636 Comm: syz-executor.2 Tainted: G                 ---------r-  - 4.18.0 #2
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      Call Trace:
       __dump_stack home/install/linux-rh-3-10/lib/dump_stack.c:77 [inline]
       dump_stack+0x125/0x1ae home/install/linux-rh-3-10/lib/dump_stack.c:117
       ubsan_epilogue+0xe/0x81 home/install/linux-rh-3-10/lib/ubsan.c:159
       handle_overflow+0x19d/0x1ec home/install/linux-rh-3-10/lib/ubsan.c:190
       __percpu_counter_compare+0x124/0x140 home/install/linux-rh-3-10/lib/percpu_counter.c:209
       percpu_counter_compare home/install/linux-rh-3-10/./include/linux/percpu_counter.h:50 [inline]
       shmem_remount_fs+0x1ce/0x6b0 home/install/linux-rh-3-10/mm/shmem.c:3530
       do_remount_sb+0x11b/0x530 home/install/linux-rh-3-10/fs/super.c:888
       do_remount home/install/linux-rh-3-10/fs/namespace.c:2344 [inline]
       do_mount+0xf8d/0x26b0 home/install/linux-rh-3-10/fs/namespace.c:2844
       ksys_mount+0xad/0x120 home/install/linux-rh-3-10/fs/namespace.c:3075
       __do_sys_mount home/install/linux-rh-3-10/fs/namespace.c:3089 [inline]
       __se_sys_mount home/install/linux-rh-3-10/fs/namespace.c:3086 [inline]
       __x64_sys_mount+0xbf/0x160 home/install/linux-rh-3-10/fs/namespace.c:3086
       do_syscall_64+0xca/0x5c0 home/install/linux-rh-3-10/arch/x86/entry/common.c:298
       entry_SYSCALL_64_after_hwframe+0x6a/0xdf
      RIP: 0033:0x46b5e9
      Code: 5d db fa ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b db fa ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f54d5f22c68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
      RAX: ffffffffffffffda RBX: 000000000077bf60 RCX: 000000000046b5e9
      RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000000
      RBP: 000000000077bf60 R08: 0000000020000140 R09: 0000000000000000
      R10: 00000000026740a4 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007ffd1fb1592f R14: 00007f54d5f239c0 R15: 000000000077bf6c
      ================================================================================
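
The overflow in the report above (`0 - -9223372036854775808`) can be illustrated with a small user-space sketch. It assumes a GCC- or Clang-compatible compiler for `__builtin_sub_overflow`, and the helper names `demo_sub_overflows` and `demo_limit_fits_s64` are hypothetical. The second helper shows one possible guard, rejecting an unsigned limit before it is ever interpreted as a signed 64-bit value; it is not claimed to be the exact change made by the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if count - rhs cannot be represented in int64_t. */
static bool demo_sub_overflows(int64_t count, int64_t rhs)
{
	int64_t diff;

	/* The builtin performs the subtraction without invoking UB. */
	return __builtin_sub_overflow(count, rhs, &diff);
}

/* Accept only limits that survive conversion to a signed 64-bit value. */
static bool demo_limit_fits_s64(uint64_t requested)
{
	return requested <= (uint64_t)INT64_MAX;
}
```

An unsigned limit with the top bit set fails `demo_limit_fits_s64()`, so it never reaches a signed subtraction such as the one flagged by UBSAN.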
      
      [akpm@linux-foundation.org: tweak error message text]
      Link: https://lkml.kernel.org/r/20220513025225.2678727-1-luomeng12@huawei.com
      Signed-off-by: Luo Meng <luomeng12@huawei.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      d14f5efa
    • Wang Cheng's avatar
      mm/mempolicy: fix uninit-value in mpol_rebind_policy() · 018160ad
      Wang Cheng authored
      mpol_set_nodemask() (mm/mempolicy.c) does not set up the nodemask when
      pol->mode is MPOL_LOCAL.  Check pol->mode before accessing
      pol->w.cpuset_mems_allowed in mpol_rebind_policy() (mm/mempolicy.c).
      
      BUG: KMSAN: uninit-value in mpol_rebind_policy mm/mempolicy.c:352 [inline]
      BUG: KMSAN: uninit-value in mpol_rebind_task+0x2ac/0x2c0 mm/mempolicy.c:368
       mpol_rebind_policy mm/mempolicy.c:352 [inline]
       mpol_rebind_task+0x2ac/0x2c0 mm/mempolicy.c:368
       cpuset_change_task_nodemask kernel/cgroup/cpuset.c:1711 [inline]
       cpuset_attach+0x787/0x15e0 kernel/cgroup/cpuset.c:2278
       cgroup_migrate_execute+0x1023/0x1d20 kernel/cgroup/cgroup.c:2515
       cgroup_migrate kernel/cgroup/cgroup.c:2771 [inline]
       cgroup_attach_task+0x540/0x8b0 kernel/cgroup/cgroup.c:2804
       __cgroup1_procs_write+0x5cc/0x7a0 kernel/cgroup/cgroup-v1.c:520
       cgroup1_tasks_write+0x94/0xb0 kernel/cgroup/cgroup-v1.c:539
       cgroup_file_write+0x4c2/0x9e0 kernel/cgroup/cgroup.c:3852
       kernfs_fop_write_iter+0x66a/0x9f0 fs/kernfs/file.c:296
       call_write_iter include/linux/fs.h:2162 [inline]
       new_sync_write fs/read_write.c:503 [inline]
       vfs_write+0x1318/0x2030 fs/read_write.c:590
       ksys_write+0x28b/0x510 fs/read_write.c:643
       __do_sys_write fs/read_write.c:655 [inline]
       __se_sys_write fs/read_write.c:652 [inline]
       __x64_sys_write+0xdb/0x120 fs/read_write.c:652
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Uninit was created at:
       slab_post_alloc_hook mm/slab.h:524 [inline]
       slab_alloc_node mm/slub.c:3251 [inline]
       slab_alloc mm/slub.c:3259 [inline]
       kmem_cache_alloc+0x902/0x11c0 mm/slub.c:3264
       mpol_new mm/mempolicy.c:293 [inline]
       do_set_mempolicy+0x421/0xb70 mm/mempolicy.c:853
       kernel_set_mempolicy mm/mempolicy.c:1504 [inline]
       __do_sys_set_mempolicy mm/mempolicy.c:1510 [inline]
       __se_sys_set_mempolicy+0x44c/0xb60 mm/mempolicy.c:1507
       __x64_sys_set_mempolicy+0xd8/0x110 mm/mempolicy.c:1507
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      KMSAN: uninit-value in mpol_rebind_task (2)
      https://syzkaller.appspot.com/bug?id=d6eb90f952c2a5de9ea718a1b873c55cb13b59dc
      
      This patch seems to fix the bug below too.
      KMSAN: uninit-value in mpol_rebind_mm (2)
      https://syzkaller.appspot.com/bug?id=f2fecd0d7013f54ec4162f60743a2b28df40926b
      
      The uninit-value is pol->w.cpuset_mems_allowed in mpol_rebind_policy().
      When the syzkaller reproducer reaches the beginning of mpol_new(),
      
      	    mpol_new() mm/mempolicy.c
      	  do_mbind() mm/mempolicy.c
      	kernel_mbind() mm/mempolicy.c
      
      `mode` is 1(MPOL_PREFERRED), nodes_empty(*nodes) is `true` and `flags`
      is 0. Then
      
      	mode = MPOL_LOCAL;
      	...
      	policy->mode = mode;
      	policy->flags = flags;
      
      will be executed. So in mpol_set_nodemask(),
      
      	    mpol_set_nodemask() mm/mempolicy.c
      	  do_mbind()
      	kernel_mbind()
      
      pol->mode is 4 (MPOL_LOCAL), so the `nodemask` in `pol` is not initialized,
      yet it is later accessed in mpol_rebind_policy().
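
The idea of checking the mode before touching a field that is only set up for some modes can be sketched as follows. This is a simplified user-space illustration: `struct demo_policy`, `demo_rebind`, and the `DEMO_MPOL_*` constants are hypothetical, with the numeric values 1 and 4 taken from the description above.

```c
#include <stdbool.h>
#include <stddef.h>

enum demo_mode {
	DEMO_MPOL_PREFERRED = 1,
	DEMO_MPOL_LOCAL = 4,
};

struct demo_policy {
	enum demo_mode mode;
	/* Only set up when mode != DEMO_MPOL_LOCAL. */
	unsigned long mems_allowed;
};

/* Returns true if the policy was rebound to new_mask, false if skipped. */
static bool demo_rebind(struct demo_policy *pol, unsigned long new_mask)
{
	/* Bail out before reading a field that may never have been written. */
	if (pol == NULL || pol->mode == DEMO_MPOL_LOCAL)
		return false;
	if (pol->mems_allowed == new_mask)
		return false;
	pol->mems_allowed = new_mask;
	return true;
}
```

Placing the mode test first means the comparison against `mems_allowed` is reached only for policies whose mask was actually initialized.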
      
      Link: https://lkml.kernel.org/r/20220512123428.fq3wofedp6oiotd4@ppc.localdomain
      Signed-off-by: Wang Cheng <wanngchenng@gmail.com>
      Reported-by: <syzbot+217f792c92599518a2ab@syzkaller.appspotmail.com>
      Tested-by: <syzbot+217f792c92599518a2ab@syzkaller.appspotmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      018160ad
    • Minchan Kim's avatar
      mm: don't be stuck to rmap lock on reclaim path · 6d4675e6
      Minchan Kim authored
      The rmap locks (i_mmap_rwsem and anon_vma->root->rwsem) can be contended
      under memory pressure if processes keep working on their vmas (e.g., fork,
      mmap, munmap), which leaves the reclaim path stuck.  In our real workload
      traces, we see kswapd waiting on the lock for 300ms+ (worst case, a
      second), which pushes other processes into direct reclaim, where they
      also get stuck on the lock.
      
      This patch makes the lru aging path use try_lock mode, like
      shrink_page_list, so the reclaim context keeps working on the next lru
      pages without getting stuck.  If it finds the rmap lock contended, it
      rotates the page back to the head of the lru, in both the active and
      inactive lrus for consistent behavior, which is a basic starting point
      rather than adding more heuristics.
      
      Since this patch introduces a new "contended" field as an out-param along
      with a try_lock in-param in rmap_walk_control, the structure is no longer
      immutable when try_lock is set, so remove the const keywords on
      rmap-related functions.  Since rmap walking is already an expensive
      operation, I doubt the const provides a sizable benefit (and we didn't
      have it until 5.17).
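
The in-param/out-param pairing described above can be sketched in user space with a C11 `atomic_flag` standing in for the rmap lock. The names `struct demo_walk_control` and `demo_walk` are hypothetical, and the spin loop is only a placeholder for a blocking lock acquisition; this is an illustration of the control flow, not the kernel implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct demo_walk_control {
	bool try_lock;	/* in: do not wait for a contended lock */
	bool contended;	/* out: set when the lock could not be taken */
};

/*
 * Returns true if the walk ran, false if it was skipped because the lock
 * was contended. The control struct cannot be const: contended is written.
 */
static bool demo_walk(atomic_flag *lock, struct demo_walk_control *rwc)
{
	if (atomic_flag_test_and_set(lock)) {
		/* The lock is already held elsewhere. */
		if (rwc->try_lock) {
			rwc->contended = true;
			return false;
		}
		/* Blocking mode: wait until the holder releases the lock. */
		while (atomic_flag_test_and_set(lock))
			;
	}
	/* ... the actual walk would happen here ... */
	atomic_flag_clear(lock);
	return true;
}
```

A caller that sees `contended` set can move on to the next item instead of waiting, which mirrors the reclaim behavior the commit message describes.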
      
      In a heavy app workload on Android, traces show the following statistics.
      The patch almost removes rmap lock contention from the reclaim path.
      
      Martin Liu reported:
      
      Before:
      
         max_dur(ms)  min_dur(ms)  max-min(dur)ms  avg_dur(ms)  sum_dur(ms)  count blocked_function
               1632            0            1631   151.542173        31672    209  page_lock_anon_vma_read
                601            0             601   145.544681        28817    198  rmap_walk_file
      
      After:
      
         max_dur(ms)  min_dur(ms)  max-min(dur)ms  avg_dur(ms)  sum_dur(ms)  count blocked_function
                NaN          NaN              NaN          NaN          NaN    0.0             NaN
                  0            0                0     0.127645            1     12  rmap_walk_file
      
      [minchan@kernel.org: add comment, per Matthew]
        Link: https://lkml.kernel.org/r/YnNqeB5tUf6LZ57b@google.com
      Link: https://lkml.kernel.org/r/20220510215423.164547-1-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: John Dias <joaodias@google.com>
      Cc: Tim Murray <timmurray@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Martin Liu <liumartin@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      6d4675e6
    • Johannes Weiner's avatar
      zswap: memcg accounting · f4840ccf
      Johannes Weiner authored
      Applications can currently escape their cgroup memory containment when
      zswap is enabled.  This patch adds per-cgroup tracking and limiting of
      zswap backend memory to rectify this.
      
      The existing cgroup2 memory.stat file is extended to show zswap statistics
      analogous to what's in meminfo and vmstat.  Furthermore, two new control
      files, memory.zswap.current and memory.zswap.max, are added to allow
      tuning zswap usage on a per-workload basis.  This is important since not
      all workloads benefit from zswap equally; some even suffer compared to
      disk swap when memory contents don't compress well.  The optimal size of
      the zswap pool, and the threshold for writeback, also depend on the size
      of the workload's warm set.
      
      The implementation doesn't use a traditional page_counter transaction. 
      zswap is unconventional as a memory consumer in that we only know the
      amount of memory to charge once expensive compression has occurred.  If
      zswap is disabled or the limit is already exceeded we obviously don't want
      to compress page upon page only to reject them all.  Instead, the limit is
      checked against current usage, then we compress and charge.  This allows
      some limit overrun, but not enough to matter in practice.
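
The check-then-compress-then-charge order described above can be sketched as follows. The structure and function names (`struct demo_cgroup`, `demo_may_zswap`, `demo_charge`, `demo_store`) are hypothetical, and treating a limit of 0 as "zswap disabled" is an assumption made for this illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_cgroup {
	size_t zswap_usage;	/* bytes currently charged */
	size_t zswap_max;	/* limit in bytes; 0 is assumed to mean disabled */
};

/* Cheap pre-check against current usage, done before any compression. */
static bool demo_may_zswap(const struct demo_cgroup *cg)
{
	return cg->zswap_max != 0 && cg->zswap_usage < cg->zswap_max;
}

/* Charge the size that is only known after compression. */
static void demo_charge(struct demo_cgroup *cg, size_t compressed_size)
{
	cg->zswap_usage += compressed_size;
}

/* Returns true if the page was stored, false if it was rejected up front. */
static bool demo_store(struct demo_cgroup *cg, size_t compressed_size)
{
	if (!demo_may_zswap(cg))
		return false;	/* reject before paying for compression */
	/* ... compression would happen here, yielding compressed_size ... */
	demo_charge(cg, compressed_size);
	return true;
}
```

Because the limit is compared with usage before the charge, a single store can push usage past the limit, which is the bounded overrun the commit message accepts.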
      
      [hannes@cmpxchg.org: fix for CONFIG_SLOB builds]
        Link: https://lkml.kernel.org/r/YnwD14zxYjUJPc2w@cmpxchg.org
      [hannes@cmpxchg.org: opt out of cgroups v1]
        Link: https://lkml.kernel.org/r/Yn6it9mBYFA+/lTb@cmpxchg.org
      Link: https://lkml.kernel.org/r/20220510152847.230957-7-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      f4840ccf