    workqueue: Implement non-strict affinity scope for unbound workqueues · 8639eceb
    Tejun Heo authored
    An unbound workqueue can be served by multiple worker_pools to improve
    locality. The segmentation is achieved by grouping CPUs into pods. By
    default, the cache boundaries according to cpus_share_cache() define how
    the CPUs are grouped. Let's say a workqueue is allowed to run on all CPUs
    and the system has two L3 caches. The workqueue would be mapped to two
    worker_pools, each serving one L3 cache domain.
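
    The grouping described above can be pictured with a small C sketch. This
    is not the actual kernel implementation: pod_of[] and the helper name are
    made up for illustration, while cpus_share_cache(), for_each_possible_cpu()
    and NR_CPUS are real kernel interfaces.

        #include <linux/cpumask.h>
        #include <linux/sched/topology.h>

        /* Hypothetical per-CPU pod assignment, keyed by shared cache. */
        static int pod_of[NR_CPUS];

        static void group_cpus_into_cache_pods(void)
        {
                int cpu, peer, nr_pods = 0;

                /* Mark every possible CPU as not yet assigned to a pod. */
                for_each_possible_cpu(cpu)
                        pod_of[cpu] = -1;

                for_each_possible_cpu(cpu) {
                        if (pod_of[cpu] >= 0)
                                continue;
                        /* Start a new pod and pull in every CPU that shares a cache. */
                        pod_of[cpu] = nr_pods;
                        for_each_possible_cpu(peer)
                                if (pod_of[peer] < 0 && cpus_share_cache(cpu, peer))
                                        pod_of[peer] = nr_pods;
                        nr_pods++;
                }
                /* Two L3 caches -> two pods -> two worker_pools per unbound wq. */
        }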
    
    While this improves locality, the strict pod boundaries limit the total
    bandwidth a given issuer can consume. For example, let's say there is a
    thread pinned to a CPU issuing enough work items to saturate the whole
    machine. With the machine segmented into two pods, no matter how many work
    items it issues, it can only use half of the CPUs on the system.
    
    While this limitation has existed for a very long time, it wasn't very
    pronounced because the affinity grouping used to always be by NUMA nodes.
    With cache boundaries as the def...