    workqueue: Add workqueue_attrs->__pod_cpumask · 9546b29e
    Tejun Heo authored
    workqueue_attrs has two uses:
    
    * for users to specify the required unbound workqueue properties

    * for core code to match worker_pool properties to workqueues
    
    For example, if the user wants to restrict a workqueue to run only on CPUs
    0 and 2, and the two CPUs are in different affinity scopes, the workqueue's
    attrs->cpumask would contain CPUs 0 and 2, and the workqueue would be
    associated with two worker_pools, one with attrs->cpumask containing just
    CPU 0 and the other just CPU 2.
    
    Workqueue wants to support non-strict affinity scopes where work items are
    started in their matching affinity scopes but the scheduler is free to
    migrate them outside the starting scopes, which can enable utilizing the
    whole machine while maintaining most of the locality benefits from affinity
    scopes.
    
    To enable that, a worker_pool needs to distinguish between the strict
    affinity that it has to follow (because that's the restriction coming from
    the user) and the soft affinity that it wants to apply when dispatching
    work items. Note that two worker_pools with different soft dispatching
    requirements have to be separate; otherwise, for example, we'd be
    ping-ponging worker threads across NUMA boundaries constantly.
    
    This patch adds workqueue_attrs->__pod_cpumask. The new field is double
    underscored as it's only used internally to distinguish worker_pools. A
    worker_pool's ->cpumask is now always the same as the online subset of
    allowed CPUs of the associated workqueues, and ->__pod_cpumask is the pod's
    subset of that ->cpumask. Going back to the example above, both worker_pools
    would have ->cpumask containing both CPUs 0 and 2, but one's ->__pod_cpumask
    would contain CPU 0 while the other's would contain CPU 2.
    
    * pool_allowed_cpus() is added. It returns the worker_pool's strict cpumask
      that the pool's workers must stay within. This is currently always
      ->__pod_cpumask as all boundaries are still strict.
    
    * As a workqueue_attrs can now track both the associated workqueues' cpumask
      and its per-pod subset, wq_calc_pod_cpumask() no longer needs an external
      out-argument. Drop @cpumask and instead store the result in
      ->__pod_cpumask.
    
    * The above also simplifies apply_wqattrs_prepare() as the same
      workqueue_attrs can be used to create all pods associated with a
      workqueue. tmp_attrs is dropped.
    
    * wq_update_pod() is updated to use wqattrs_equal() to test whether a pwq
      update is needed instead of only comparing ->cpumask so that
      ->__pod_cpumask is compared too. It could compare the ->__pod_cpumask
      fields directly but the code is easier to understand and more robust
      this way.
    
    The only user-visible behavior change is that two workqueues with different
    cpumasks no longer can share worker_pools even when their pod subsets
    coincide. Going back to the example, let's say there's another workqueue
    with cpumask 0, 2, 3, where 2 and 3 are in the same pod. It would be mapped
    to two worker_pools - one with CPU 0, the other with 2 and 3. The former has
    the same cpumask as the first pod of the earlier example and would have
    shared the same worker_pool but that's no longer the case after this patch.
    The worker_pools would have the same ->__pod_cpumask but their ->cpumasks
    wouldn't match.
    
    While this is necessary to support non-strict affinity scopes, there can be
    further optimizations to maintain sharing among strict affinity scopes.
    However, non-strict affinity scopes are going to be preferable for most use
    cases and we don't see very diverse mixture of unbound workqueue cpumasks
    anyway, so the additional overhead doesn't seem to justify the extra
    complexity.
    
    v2: - wq_update_pod() was incorrectly comparing target_attrs->__pod_cpumask
          to pool->attrs->cpumask instead of its ->__pod_cpumask. Fix it by
          using wqattrs_equal() for comparison instead.
    
        - Per-cpu worker pools weren't initializing ->__pod_cpumask which caused
          a subtle problem later on. Set it to cpumask_of(cpu) like ->cpumask.
    Signed-off-by: Tejun Heo <tj@kernel.org>