    workqueue: zero cpumask of wq_numa_possible_cpumask on init · 5a6024f1
    Yasuaki Ishimatsu authored
    When a CPU is hot-added and onlined, the kernel panics with the
    following call trace.
    
      BUG: unable to handle kernel paging request at 0000000000001d08
      IP: [<ffffffff8114acfd>] __alloc_pages_nodemask+0x9d/0xb10
      PGD 0
      Oops: 0000 [#1] SMP
      ...
      Call Trace:
       [<ffffffff812b8745>] ? cpumask_next_and+0x35/0x50
       [<ffffffff810a3283>] ? find_busiest_group+0x113/0x8f0
       [<ffffffff81193bc9>] ? deactivate_slab+0x349/0x3c0
       [<ffffffff811926f1>] new_slab+0x91/0x300
       [<ffffffff815de95a>] __slab_alloc+0x2bb/0x482
       [<ffffffff8105bc1c>] ? copy_process.part.25+0xfc/0x14c0
       [<ffffffff810a3c78>] ? load_balance+0x218/0x890
       [<ffffffff8101a679>] ? sched_clock+0x9/0x10
       [<ffffffff81105ba9>] ? trace_clock_local+0x9/0x10
       [<ffffffff81193d1c>] kmem_cache_alloc_node+0x8c/0x200
       [<ffffffff8105bc1c>] copy_process.part.25+0xfc/0x14c0
       [<ffffffff81114d0d>] ? trace_buffer_unlock_commit+0x4d/0x60
       [<ffffffff81085a80>] ? kthread_create_on_node+0x140/0x140
       [<ffffffff8105d0ec>] do_fork+0xbc/0x360
       [<ffffffff8105d3b6>] kernel_thread+0x26/0x30
       [<ffffffff81086652>] kthreadd+0x2c2/0x300
       [<ffffffff81086390>] ? kthread_create_on_cpu+0x60/0x60
       [<ffffffff815f20ec>] ret_from_fork+0x7c/0xb0
       [<ffffffff81086390>] ? kthread_create_on_cpu+0x60/0x60
    
    My investigation found that the root cause is wq_numa_possible_cpumask.
    All entries of wq_numa_possible_cpumask are allocated by
    alloc_cpumask_var_node() and then used without being initialized, so
    they contain garbage values.
    
    When a CPU is hot-added and onlined, wq_update_unbound_numa() is called.
    It calls alloc_unbound_pwq(), which in turn calls get_unbound_pool().
    In get_unbound_pool(), worker_pool->node is set as follows:
    
        /* if cpumask is contained inside a NUMA node, we belong to that node */
        if (wq_numa_enabled) {
                for_each_node(node) {
                        if (cpumask_subset(pool->attrs->cpumask,
                                           wq_numa_possible_cpumask[node])) {
                                pool->node = node;
                                break;
                        }
                }
        }
    
    But wq_numa_possible_cpumask[node] does not hold a correct cpumask, so
    the wrong node is selected. As a result, the kernel panics.
    
    This patch allocates all entries of wq_numa_possible_cpumask with
    zalloc_cpumask_var_node() so that they are zero-initialized. With the
    patch applied, the panic no longer occurs.
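
The effect of zero-initialization on the node lookup can be illustrated with
a small userspace model. This is only a sketch, not kernel code: mask_t,
mask_subset(), pick_node(), build_table(), the two-node layout, and the
"stale" bit pattern are all hypothetical stand-ins chosen for illustration.
A 64-bit integer plays the role of a cpumask, and the stale value simulates
whatever happened to be in memory returned by a non-zeroing allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_NODES 2      /* hypothetical: two NUMA nodes */
#define NO_NODE  (-1)   /* stands in for NUMA_NO_NODE */

typedef uint64_t mask_t;        /* one bit per CPU, up to 64 CPUs */

/* Nonzero if every CPU in 'sub' is also set in 'super' (models cpumask_subset()). */
static int mask_subset(mask_t sub, mask_t super)
{
        return (sub & ~super) == 0;
}

/* Models the lookup in get_unbound_pool(): the first node whose mask covers the pool wins. */
static int pick_node(mask_t pool_cpus, const mask_t *possible)
{
        for (int node = 0; node < NR_NODES; node++)
                if (mask_subset(pool_cpus, possible[node]))
                        return node;
        return NO_NODE;
}

/*
 * Builds the per-node table. Each entry starts either zeroed or holding
 * 'stale' bits (simulating uninitialized memory), then the real CPUs are
 * OR-ed in: CPUs 0-3 on node 0, CPUs 4-7 on node 1.
 */
static void build_table(mask_t *tbl, mask_t stale, int zero_first)
{
        assert(tbl != NULL);

        for (int node = 0; node < NR_NODES; node++)
                tbl[node] = zero_first ? 0 : stale;

        for (int cpu = 0; cpu < 8; cpu++)
                tbl[cpu / 4] |= (mask_t)1 << cpu;
}
```

In this model, a pool bound to CPUs 4-7 (mask 0xF0) is matched to node 0
when the table entries start with stale bits 0xF0, because node 0's mask
then appears to cover those CPUs. When the entries start zeroed, the same
pool is matched to node 1.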

    Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
    Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: stable@vger.kernel.org
    Fixes: bce90380 ("workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]")