    powerpc/numa: Update cpu_cpu_map on CPU online/offline · 9a245d0e
    Srikar Dronamraju authored
    cpu_cpu_map holds all the CPUs in the DIE. However, on PowerPC this
    mask is not updated when CPUs are onlined or offlined; it is updated
    only when CPUs are added or removed. So when online/offline and
    add/remove operations are done simultaneously, the cpumaps end up
    broken and the following warning is seen:
    
    WARNING: CPU: 13 PID: 1142 at kernel/sched/topology.c:898 build_sched_domains+0xd48/0x1720
    Modules linked in: rpadlpar_io rpaphp mptcp_diag xsk_diag tcp_diag
    udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag
    bonding tls nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
    nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
    nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set
    rfkill nf_tables nfnetlink pseries_rng xts vmx_crypto uio_pdrv_genirq
    uio binfmt_misc ip_tables xfs libcrc32c dm_service_time sd_mod t10_pi sg
    ibmvfc scsi_transport_fc ibmveth dm_multipath dm_mirror dm_region_hash
    dm_log dm_mod fuse
    CPU: 13 PID: 1142 Comm: kworker/13:2 Not tainted 5.13.0-rc6+ #28
    Workqueue: events cpuset_hotplug_workfn
    NIP:  c0000000001caac8 LR: c0000000001caac4 CTR: 00000000007088ec
    REGS: c00000005596f220 TRAP: 0700   Not tainted  (5.13.0-rc6+)
    MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 48828222  XER: 00000009
    CFAR: c0000000001ea698 IRQMASK: 0
    GPR00: c0000000001caac4 c00000005596f4c0 c000000001c4a400 0000000000000036
    GPR04: 00000000fffdffff c00000005596f1d0 0000000000000027 c0000018cfd07f90
    GPR08: 0000000000000023 0000000000000001 0000000000000027 c0000018fe68ffe8
    GPR12: 0000000000008000 c00000001e9d1880 c00000013a047200 0000000000000800
    GPR16: c000000001d3c7d0 0000000000000240 0000000000000048 c000000010aacd18
    GPR20: 0000000000000001 c000000010aacc18 c00000013a047c00 c000000139ec2400
    GPR24: 0000000000000280 c000000139ec2520 c000000136c1b400 c000000001c93060
    GPR28: c00000013a047c20 c000000001d3c6c0 c000000001c978a0 000000000000000d
    NIP [c0000000001caac8] build_sched_domains+0xd48/0x1720
    LR [c0000000001caac4] build_sched_domains+0xd44/0x1720
    Call Trace:
    [c00000005596f4c0] [c0000000001caac4] build_sched_domains+0xd44/0x1720 (unreliable)
    [c00000005596f670] [c0000000001cc5ec] partition_sched_domains_locked+0x3ac/0x4b0
    [c00000005596f710] [c0000000002804e4] rebuild_sched_domains_locked+0x404/0x9e0
    [c00000005596f810] [c000000000283e60] rebuild_sched_domains+0x40/0x70
    [c00000005596f840] [c000000000284124] cpuset_hotplug_workfn+0x294/0xf10
    [c00000005596fc60] [c000000000175040] process_one_work+0x290/0x590
    [c00000005596fd00] [c0000000001753c8] worker_thread+0x88/0x620
    [c00000005596fda0] [c000000000181704] kthread+0x194/0x1a0
    [c00000005596fe10] [c00000000000ccec] ret_from_kernel_thread+0x5c/0x70
    Instruction dump:
    485af049 60000000 2fa30800 409e0028 80fe0000 e89a00f8 e86100e8 38da0120
    7f88e378 7ce53b78 4801fb91 60000000 <0fe00000> 39000000 38e00000 38c00000
    
    Fix this by updating cpu_cpu_map aka cpumask_of_node() on every CPU
    online/offline.
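
    As a rough illustration of the idea (not the kernel change itself), the
    following userspace C sketch models a per-node CPU mask that is kept in
    sync on all four events: add, remove, online and offline. Every name in
    it (node_cpu_mask, model_cpu_online() and so on) is hypothetical and
    chosen for this example only; the real code operates on kernel cpumasks.

```c
/* Userspace model only: all names here are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 4
#define MAX_CPUS  64

/* One bit per CPU, one mask per node: a stand-in for the per-node cpumask. */
static uint64_t node_cpu_mask[MAX_NODES];
/* Node each CPU is associated with, or -1 when the CPU is not present. */
static int cpu_node[MAX_CPUS];

void model_init(void)
{
    for (int n = 0; n < MAX_NODES; n++)
        node_cpu_mask[n] = 0;
    for (int c = 0; c < MAX_CPUS; c++)
        cpu_node[c] = -1;
}

/* CPU add: remember the node and set the CPU's bit in that node's mask. */
void model_cpu_add(int cpu, int node)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    assert(node >= 0 && node < MAX_NODES);
    cpu_node[cpu] = node;
    node_cpu_mask[node] |= UINT64_C(1) << cpu;
}

/* CPU remove: clear the bit and forget the node association. */
void model_cpu_remove(int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    if (cpu_node[cpu] < 0)
        return;
    node_cpu_mask[cpu_node[cpu]] &= ~(UINT64_C(1) << cpu);
    cpu_node[cpu] = -1;
}

/* CPU online: also update the mask, so it never lags behind hotplug. */
void model_cpu_online(int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    if (cpu_node[cpu] >= 0)
        node_cpu_mask[cpu_node[cpu]] |= UINT64_C(1) << cpu;
}

/* CPU offline: clear the bit but keep the node association for later. */
void model_cpu_offline(int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    if (cpu_node[cpu] >= 0)
        node_cpu_mask[cpu_node[cpu]] &= ~(UINT64_C(1) << cpu);
}

/* Query helper: is this CPU currently set in the given node's mask? */
bool model_cpu_in_node_mask(int cpu, int node)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    assert(node >= 0 && node < MAX_NODES);
    return (node_cpu_mask[node] >> cpu) & 1;
}
```

    In this model, calling model_cpu_offline() followed by
    model_cpu_online() leaves the node mask consistent no matter how those
    calls interleave with add/remove, which is the property the fix is
    after.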
    Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20210826100521.412639-5-srikar@linux.vnet.ibm.com