    x86/resctrl: Introduce snc_nodes_per_l3_cache · e13db55b
    Tony Luck authored
    Intel Sub-NUMA Cluster (SNC) is a feature that subdivides the CPU cores
    and memory controllers on a socket into two or more groups. These are
    presented to the operating system as NUMA nodes.
    
    This may enable some workloads to have slightly lower latency to memory
    as the memory controller(s) in an SNC node are electrically closer to the
    CPU cores on that SNC node. This gain may be offset by lower bandwidth
    since the memory accesses for each core can only be interleaved between
    the memory controllers on the same SNC node.
    
    Resctrl monitoring on an Intel system depends upon attaching RMIDs to tasks
    to track L3 cache occupancy and memory bandwidth. There is an MSR that
    controls how the RMIDs are shared between SNC nodes.
    
    The default mode divides them numerically. E.g. when there are two SNC
    nodes on a socket, the lower-numbered half of the RMIDs are given to the
    first node, the remainder to the second node. This would be difficult
    to use with the Linux resctrl interface as specific RMID values assigned
    to resctrl groups are not visible to users.
    
    RMID sharing mode divides the physical RMIDs evenly between SNC nodes
    but uses a logical RMID in the IA32_PQR_ASSOC MSR. For example a system
    with 200 physical RMIDs (as enumerated by CPUID leaf 0xF) that has two
    SNC nodes per L3 cache instance would have 100 logical RMIDs available
    for Linux to use. A task running on SNC node 0 with RMID 5 would
    accumulate LLC occupancy and MBM bandwidth data in physical RMID 5.
    Another task using RMID 5, but running on SNC node 1 would accumulate
    data in physical RMID 105.
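
    The renumbering in this example can be sketched as below. This is a
    minimal illustration only: the helper name, its signature, and the
    explicit snc_node argument are assumptions made for the sketch and
    are not taken from the kernel source.

```c
/*
 * Hypothetical helper (name and signature are illustrative): map a
 * logical RMID to the physical RMID that accumulates data for one
 * SNC node.
 *
 * logical_rmid:  RMID assigned to the task by Linux
 * snc_node:      index of the SNC node within its L3 cache instance
 * logical_rmids: logical RMIDs per node, i.e. the physical RMID count
 *                divided by the number of SNC nodes per L3 cache
 */
static unsigned int logical_to_physical_rmid(unsigned int logical_rmid,
					     unsigned int snc_node,
					     unsigned int logical_rmids)
{
	/* Each SNC node owns a contiguous block of physical RMIDs. */
	return logical_rmid + snc_node * logical_rmids;
}
```

    With 200 physical RMIDs and two SNC nodes, logical_rmids is 100, so
    logical RMID 5 maps to physical RMID 5 on node 0 and to physical
    RMID 105 on node 1, matching the example above.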
    
    Even with this renumbering, SNC mode requires several changes in resctrl
    behavior for correct operation.
    
    Add a static global to arch/x86/kernel/cpu/resctrl/monitor.c to indicate
    how many SNC domains share an L3 cache instance.  Initialize this to
    "1". Runtime detection of SNC mode will adjust this value.
    
    Update all places to take appropriate action when SNC mode is enabled:
    1) The number of logical RMIDs per L3 cache available for use is the
       number of physical RMIDs divided by the number of SNC nodes.
    2) Likewise the "mon_scale" value must be divided by the number of SNC
       nodes.
    3) Add a function to convert from logical RMID values (assigned to
       tasks and loaded into the IA32_PQR_ASSOC MSR on context switch)
       to physical RMID values to load into IA32_QM_EVTSEL MSR when
       reading counters on each SNC node.
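
    Items 1 and 2 can be sketched as below. This is a simplified
    stand-alone illustration, not the patch itself: struct mon_params
    and scale_for_snc() are hypothetical names, and only the
    snc_nodes_per_l3_cache global and its initial value of 1 come from
    the description above.

```c
/* Hypothetical stand-in for the per-resource monitoring parameters. */
struct mon_params {
	unsigned int num_rmid;  /* logical RMIDs usable per L3 cache */
	unsigned int mon_scale; /* factor applied to raw counter values */
};

/* SNC nodes sharing one L3 cache instance; 1 means SNC is not enabled. */
static unsigned int snc_nodes_per_l3_cache = 1;

/*
 * Derive the values resctrl would use from the hardware-enumerated
 * physical RMID count and scale factor (both passed in here so the
 * sketch stays self-contained).
 */
static struct mon_params scale_for_snc(unsigned int physical_rmids,
				       unsigned int hw_scale)
{
	struct mon_params p;

	p.num_rmid = physical_rmids / snc_nodes_per_l3_cache;
	p.mon_scale = hw_scale / snc_nodes_per_l3_cache;
	return p;
}
```

    With the default value of 1 both results equal the hardware values,
    so non-SNC systems are unaffected. Once detection sets the global to
    2, a system with 200 physical RMIDs would report 100 logical RMIDs.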

    Signed-off-by: Tony Luck <tony.luck@intel.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
    Tested-by: Babu Moger <babu.moger@amd.com>
    Link: https://lore.kernel.org/r/20240628215619.76401-7-tony.luck@intel.com