    workqueue: Implement non-strict affinity scope for unbound workqueues · 8639eceb
    Tejun Heo authored
    An unbound workqueue can be served by multiple worker_pools to improve
    locality. The segmentation is achieved by grouping CPUs into pods. By
    default, the cache boundaries according to cpus_share_cache() define how
    the CPUs are grouped. Let's say a workqueue is allowed to run on all CPUs
    and the system has two L3 caches. The workqueue would be mapped to two
    worker_pools, each serving one L3 cache domain.
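
    The grouping described above can be pictured with a small C sketch. This
    is not the actual kernel implementation: pod_of[] and the helper name are
    made up for illustration, while cpus_share_cache(), for_each_possible_cpu()
    and NR_CPUS are real kernel interfaces.

        #include <linux/cpumask.h>
        #include <linux/sched/topology.h>

        /* Hypothetical per-CPU pod assignment, keyed by shared cache. */
        static int pod_of[NR_CPUS];

        static void group_cpus_into_cache_pods(void)
        {
                int cpu, peer, nr_pods = 0;

                /* Mark every possible CPU as not yet assigned to a pod. */
                for_each_possible_cpu(cpu)
                        pod_of[cpu] = -1;

                for_each_possible_cpu(cpu) {
                        if (pod_of[cpu] >= 0)
                                continue;
                        /* Start a new pod and pull in every CPU that shares a cache. */
                        pod_of[cpu] = nr_pods;
                        for_each_possible_cpu(peer)
                                if (pod_of[peer] < 0 && cpus_share_cache(cpu, peer))
                                        pod_of[peer] = nr_pods;
                        nr_pods++;
                }
                /* Two L3 caches -> two pods -> two worker_pools per unbound wq. */
        }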
    
    While this improves locality, the strict pod boundaries limit the total
    bandwidth a given issuer can consume. For example, let's say there is a
    thread pinned to a CPU issuing enough work items to saturate the whole
    machine. With the machine segmented into two pods, no matter how many work
    items it issues, it can only use half of the CPUs on the system.
    
    While this limitation has existed for a very long time, it wasn't very
    pronounced because the affinity grouping used to always be by NUMA nodes.
    With cache boundaries as the def...