    drivers: base: cacheinfo: Fix shared_cpu_map changes in event of CPU hotplug · 126310c9
    K Prateek Nayak authored
    While building the shared_cpu_map, check that the cache level and cache
    type match. On certain systems that build the cache topology based on
    the instance ID, the same ID may repeat across multiple cache levels,
    leading to an inaccurate topology.
    
    When a CPU goes offline, cache_shared_cpu_map_remove() does not check
    whether the IDs being compared belong to the same cache level. As a
    result, when the same ID repeats across different cache levels, the CPU
    going offline is not removed from every shared_cpu_map.
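
The failure mode described above can be reproduced with a small user-space
model. This is only a sketch, not the kernel code: the `Leaf` class, the
`remove_cpu()` helper, and the instance ID value 8 are hypothetical, and the
ID-only comparison stands in for the kernel's shared-leaf check on systems
that match caches by instance ID.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    """Hypothetical stand-in for one cache leaf of one CPU."""
    level: int
    type: str
    id: int
    shared: set = field(default_factory=set)


def build_smt_pair(cpu_a, cpu_b, cache_id=8):
    """Two SMT siblings whose L1i, L1d and L2 all carry the same ID (assumed)."""
    def leaves():
        return [
            Leaf(1, "Instruction", cache_id, {cpu_a, cpu_b}),  # index0
            Leaf(1, "Data", cache_id, {cpu_a, cpu_b}),         # index1
            Leaf(2, "Unified", cache_id, {cpu_a, cpu_b}),      # index2
        ]
    return {cpu_a: leaves(), cpu_b: leaves()}


def remove_cpu(caches, cpu, match_level_and_type):
    """Model of removing an offlined CPU from its siblings' shared sets."""
    for this_leaf in caches[cpu]:
        for sibling in sorted(this_leaf.shared):  # sorted() copies the set
            if sibling == cpu:
                continue
            for sib_leaf in caches[sibling]:
                if match_level_and_type and (
                    (sib_leaf.level, sib_leaf.type)
                    != (this_leaf.level, this_leaf.type)
                ):
                    continue  # only compare leaves of the same level and type
                if sib_leaf.id == this_leaf.id:  # ID-only comparison
                    sib_leaf.shared.discard(cpu)
                    this_leaf.shared.discard(sibling)
                    break  # stop at the first leaf with a matching ID


def shared_lists_after_offline(match_level_and_type):
    """Offline CPU8 and return CPU136's shared sets, one list per index."""
    caches = build_smt_pair(8, 136)
    remove_cpu(caches, 8, match_level_and_type)
    return [sorted(leaf.shared) for leaf in caches[136]]


if __name__ == "__main__":
    print(shared_lists_after_offline(False))  # [[136], [8, 136], [8, 136]]
    print(shared_lists_after_offline(True))   # [[136], [136], [136]]
```

Without the level and type check, every leaf of the offlined CPU matches the
sibling's first leaf by ID, so only that first leaf is ever updated.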
    
    Below is the cache topology of CPU8 before it is offlined, and of its
    SMT sibling (CPU136) after CPU8 is offlined, on a dual socket 3rd
    Generation AMD EPYC processor (2 x 64C/128T) running kernel release v6.3:
    
      # for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
        /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136
        /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136
        /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136
        /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143
    
      # echo 0 > /sys/devices/system/cpu/cpu8/online
    
      # for i in /sys/devices/system/cpu/cpu136/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
        /sys/devices/system/cpu/cpu136/cache/index0/shared_cpu_list: 136
        /sys/devices/system/cpu/cpu136/cache/index1/shared_cpu_list: 8,136
        /sys/devices/system/cpu/cpu136/cache/index2/shared_cpu_list: 8,136
        /sys/devices/system/cpu/cpu136/cache/index3/shared_cpu_list: 9-15,136-143
    
    CPU8 is removed from index0 (L1i) but remains in the shared_cpu_list of
    index1 (L1d) and index2 (L2). Since L1i, L1d, and L2 are shared by the
    SMT siblings and have the same cache instance ID, CPU8 is only removed
    from the first index with a matching ID, which is index0 (L1i) in this
    case. With this fix, the results are as expected when performing the
    same experiment on the same system:
    
      # for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
        /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136
        /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136
        /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136
        /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143
    
      # echo 0 > /sys/devices/system/cpu/cpu8/online
    
      # for i in /sys/devices/system/cpu/cpu136/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
        /sys/devices/system/cpu/cpu136/cache/index0/shared_cpu_list: 136
        /sys/devices/system/cpu/cpu136/cache/index1/shared_cpu_list: 136
        /sys/devices/system/cpu/cpu136/cache/index2/shared_cpu_list: 136
        /sys/devices/system/cpu/cpu136/cache/index3/shared_cpu_list: 9-15,136-143
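
Checking the listings by eye gets tedious on large systems. The helper below
is a minimal sketch for spotting offline CPUs that still appear in a
shared_cpu_list. The function names are hypothetical, and it works on strings
that have already been read from sysfs rather than reading the files itself.

```python
def parse_cpulist(text):
    """Expand a cpulist string such as "9-15,136-143" into a set of ints."""
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # tolerate an empty list
        first, _, last = chunk.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def stale_entries(shared_cpu_lists, offline_cpus):
    """Return {name: [offline CPUs still listed]} for lists that have any."""
    offline = set(offline_cpus)
    stale = {}
    for name, text in shared_cpu_lists.items():
        leftover = parse_cpulist(text) & offline
        if leftover:
            stale[name] = sorted(leftover)
    return stale


if __name__ == "__main__":
    # CPU136's lists as reported on v6.3 after CPU8 was offlined.
    before_fix = {
        "index0": "136",
        "index1": "8,136",
        "index2": "8,136",
        "index3": "9-15,136-143",
    }
    print(stale_entries(before_fix, {8}))  # {'index1': [8], 'index2': [8]}
```

Feeding it the lists reported with the fix applied yields an empty result.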
    
    The same problem appears when rebuilding the topology, since
    cache_shared_cpu_map_setup() implements similar logic. Consider the
    same 3rd Generation EPYC processor: the CPUs in Core 1, which share the
    L1 and L2 caches, have 1 as their L1 and L2 instance ID. For all the
    CPUs on the second chiplet, the L3 ID is also 1, so the CPUs from
    Core 1 (1, 17) and the entire second chiplet (8-15, 24-31) are grouped
    as sharing one cache domain. This went undetected because, until kernel
    release v6.3-rc5, x86 processors depended on the arch specific
    populate_cache_leaves() method to repopulate the shared_cpu_map when a
    CPU came back online.
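
The grouping described above can be illustrated with a toy topology. This is
a sketch under assumptions, not the kernel implementation: it models 16 cores
with two threads each (CPU n and CPU n + 16), eight cores per chiplet, L1 and
L2 IDs equal to the core number, and L3 IDs equal to the chiplet number. The
helper names are hypothetical.

```python
def build_topology(cores=16, cores_per_chiplet=8):
    """Map each CPU to its (level, type, id) leaves in index order."""
    topo = {}
    for core in range(cores):
        leaves = [
            (1, "Instruction", core),
            (1, "Data", core),
            (2, "Unified", core),
            (3, "Unified", core // cores_per_chiplet),
        ]
        for cpu in (core, core + cores):  # the two SMT siblings of this core
            topo[cpu] = leaves
    return topo


def shared_cpus(topo, cpu, index, match_level_and_type):
    """CPUs grouped with `cpu` for the cache leaf at `index`."""
    level, ctype, cache_id = topo[cpu][index]
    shared = {cpu}
    for other, leaves in topo.items():
        for sib_level, sib_type, sib_id in leaves:
            if match_level_and_type and (sib_level, sib_type) != (level, ctype):
                continue  # only compare leaves of the same level and type
            if sib_id == cache_id:  # ID-only comparison
                shared.add(other)
                break
    return shared


if __name__ == "__main__":
    topo = build_topology()
    # L3 (index 3) of CPU8, which sits on the second chiplet (L3 ID 1).
    print(sorted(shared_cpus(topo, 8, 3, False)))
    print(sorted(shared_cpus(topo, 8, 3, True)))
```

With ID-only matching, CPUs 1 and 17 join the L3 group of the second chiplet
because their L1 ID is also 1. Requiring the level and type to match leaves
only CPUs 8-15 and 24-31 in the group.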
    
    Fixes: 198102c9 ("cacheinfo: Fix shared_cpu_map to handle shared caches at different levels")
    Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
    Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
    Link: https://lore.kernel.org/r/20230508084115.1157-2-kprateek.nayak@amd.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>