    mm: vmscan: remove deadlock due to throttling failing to make progress · b485c6f1
    Mel Gorman authored
    A soft lockup bug in kcompactd was reported in a private bugzilla with
    the following visible in dmesg;
    
      watchdog: BUG: soft lockup - CPU#33 stuck for 26s! [kcompactd0:479]
      watchdog: BUG: soft lockup - CPU#33 stuck for 52s! [kcompactd0:479]
      watchdog: BUG: soft lockup - CPU#33 stuck for 78s! [kcompactd0:479]
      watchdog: BUG: soft lockup - CPU#33 stuck for 104s! [kcompactd0:479]
    
    The machine had 256G of RAM with no swap and an earlier failed
    allocation indicated that node 0 where kcompactd was run was potentially
    unreclaimable;
    
      Node 0 active_anon:29355112kB inactive_anon:2913528kB active_file:0kB
        inactive_file:0kB unevictable:64kB isolated(anon):0kB isolated(file):0kB
        mapped:8kB dirty:0kB writeback:0kB shmem:26780kB shmem_thp: 0kB
        shmem_pmdmapped: 0kB anon_thp: 23480320kB writeback_tmp:0kB
        kernel_stack:2272kB pagetables:24500kB all_unreclaimable? yes
    
    Vlastimil Babka investigated a crash dump and found that a task
    migrating pages was trying to drain PCP lists;
    
      PID: 52922  TASK: ffff969f820e5000  CPU: 19  COMMAND: "kworker/u128:3"
      Call Trace:
         __schedule
         schedule
         schedule_timeout
         wait_for_completion
         __flush_work
         __drain_all_pages
         __alloc_pages_slowpath.constprop.114
         __alloc_pages
         alloc_migration_target
         migrate_pages
         migrate_to_node
         do_migrate_pages
         cpuset_migrate_mm_workfn
         process_one_work
         worker_thread
         kthread
         ret_from_fork
    
    This failure is specific to CONFIG_PREEMPT=n builds.  The root of the
    problem is that kcompactd0 is not rescheduling on a CPU while a task
    that has isolated a large number of pages from the LRU is waiting on
    kcompactd0 to reschedule so the pages can be released.  While
    shrink_inactive_list() only loops once around too_many_isolated(),
    reclaim can continue without rescheduling if sc->skipped_deactivate ==
    1, which could happen if there was no file LRU and the inactive anon
    list was not low.
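    Because kcompactd is a kthread, reclaim_throttle() returns early for
    it without sleeping, so nothing in the loop ever yields the CPU on a
    non-preemptible kernel.  The core of the fix is to add an explicit
    cond_resched() on that early-return path so kcompactd yields and the
    task holding the isolated pages can run.  A sketch of the change to
    reclaim_throttle() in mm/vmscan.c (context lines paraphrased, not the
    verbatim upstream diff):

      --- a/mm/vmscan.c
      +++ b/mm/vmscan.c
      @@ reclaim_throttle()
       	/*
       	 * Do not throttle IO workers, kthreads other than kswapd or
       	 * workqueues. They may be required for reclaim to make
       	 * forward progress (e.g. journalling workqueues or kthreads).
       	 */
       	if (!current_is_kswapd() &&
      -	    current->flags & (PF_IO_WORKER|PF_KTHREAD))
      +	    current->flags & (PF_IO_WORKER|PF_KTHREAD)) {
      +		/* Never throttle these tasks, but give them a
      +		 * scheduling point so CONFIG_PREEMPT=n kernels
      +		 * cannot soft-lock while looping in reclaim.
      +		 */
      +		cond_resched();
       		return;
      +	}

    cond_resched() is effectively free when no reschedule is pending, so
    the non-throttled fast path is unaffected on preemptible kernels.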
    
    Link: https://lkml.kernel.org/r/20220203100326.GD3301@suse.de
    Fixes: d818fca1 ("mm/vmscan: throttle reclaim and compaction when too may pages are isolated")
    Signed-off-by: Mel Gorman <mgorman@suse.de>
    Debugged-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: David Rientjes <rientjes@google.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>