    Merge branch 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq · 46d9be3e
    Linus Torvalds authored
    Pull workqueue updates from Tejun Heo:
     "A lot of activity on the workqueue side this time.  The changes achieve
      the following.
    
       - WQ_UNBOUND workqueues - the workqueues which are not per-cpu - are
         updated to be able to interface with multiple backend worker pools.
         This involved a lot of churning but the end result seems actually
         neater as unbound workqueues are now a lot closer to per-cpu ones.
    
       - The ability to interface with multiple backend worker pools is
         used to implement unbound workqueues with custom attributes.
         Currently the supported attributes are the nice level and CPU
         affinity.  It may be expanded to include cgroup association in the
         future.  The attributes can be specified either by calling
         apply_workqueue_attrs() or through
         /sys/bus/workqueue/devices/WQ_NAME/* if the workqueue in question
         is exported through sysfs.
    
         The backend worker pools are keyed by the actual attributes and
         shared by any workqueues which share the same attributes.  When
         attributes of a workqueue are changed, the workqueue binds to the
         worker pool with the specified attributes while leaving the work
         items which are already executing in its previous worker pools
         alone.
    
         This allows converting custom worker pool implementations which
         want worker attribute tuning to use workqueues.  The writeback pool
         is already converted in the block tree and a couple of others are
         likely to follow, including btrfs io workers.
    
       - WQ_UNBOUND's ability to bind to multiple worker pools is also used
         to make it NUMA-aware.  Before this change, there was no
         association between the issuer of a work item and the specific
         worker assigned to execute it, so using an unbound workqueue led to
         unnecessary cross-node bouncing.  autonuma couldn't help either, as
         it requires tasks to have implicit node affinity and workers are
         assigned randomly.
    
         After these changes, an unbound workqueue binds to multiple
         NUMA-affine worker pools so that queued work items are executed on
         the node they were queued from.  This is turned on by default but
         can be disabled system-wide or for individual workqueues.
    
         Crypto was requesting NUMA affinity as encrypting data across
         different nodes can contribute noticeable overhead.  Doing it
         per-cpu was too limiting for certain cases, as IO throughput could
         be bottlenecked by one CPU being fully occupied while others have
         idle cycles.
    
      While the new features required a lot of changes, including
      restructuring locking, they didn't complicate the execution paths
      much.
      The unbound workqueue handling is now closer to per-cpu ones and the
      new features are implemented by simply associating a workqueue with
      different sets of backend worker pools without changing queue,
      execution or flush paths.
    
      As such, even though the amount of change is very high, I feel
      relatively safe in that it isn't likely to cause subtle issues with
      basic correctness of work item execution and handling.  If something
      is wrong, it's likely to show up as a workqueue being associated with
      worker pools with the wrong attributes, or as an OOPS while workqueue
      attributes are being changed or during CPU hotplug.
    
      While this creates more backend worker pools, it doesn't add too many
      more workers unless, of course, there are many workqueues with unique
      combinations of attributes.  Assuming everything else is the same,
      NUMA awareness costs an extra worker pool per NUMA node with online
      CPUs.
    
      There are also a couple of things which are being routed outside the
      workqueue tree.
    
       - The block tree pulled in workqueue for-3.10 so that the writeback
         worker pool can be converted to an unbound workqueue with sysfs
         control exposed.  This simplifies the code, makes writeback workers
         NUMA-aware and allows tuning nice level and CPU affinity via sysfs.
    
       - The conversion to workqueue means that there's no longer a 1:1
         association with a specific worker, which makes writeback folks
         unhappy as they want to be able to tell which filesystem caused a
         problem from a backtrace on systems with many filesystems mounted.
         This is resolved by allowing work items to set a debug info string
         which is printed when the task is dumped.  As this change involves
         unifying implementations of dump_stack() and friends in arch code,
         it's being routed through Andrew's -mm tree."
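
The pull message says that backend worker pools are keyed by their attributes and shared by every workqueue with the same attributes. The following is a minimal userspace sketch of that idea only, not the kernel implementation. Every name in it (struct model_attrs, struct model_pool, model_get_pool(), model_put_pool(), model_apply_attrs()) is hypothetical, and the 64-bit cpumask and fixed-size pool table are simplifying assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for worker attributes: nice level and CPU affinity. */
struct model_attrs {
    int nice;           /* nice level of the workers */
    uint64_t cpumask;   /* allowed CPUs, one bit per CPU (assumes <= 64 CPUs) */
};

/* Hypothetical backend pool, identified purely by its attributes. */
struct model_pool {
    struct model_attrs attrs;
    int refcnt;         /* number of workqueues currently bound to the pool */
    bool in_use;
};

#define MODEL_MAX_POOLS 16
static struct model_pool model_pools[MODEL_MAX_POOLS];

static bool model_attrs_equal(const struct model_attrs *a,
                              const struct model_attrs *b)
{
    return a->nice == b->nice && a->cpumask == b->cpumask;
}

/* Return the pool matching attrs, creating it on first use; NULL if full. */
static struct model_pool *model_get_pool(const struct model_attrs *attrs)
{
    struct model_pool *free_slot = NULL;

    for (size_t i = 0; i < MODEL_MAX_POOLS; i++) {
        struct model_pool *p = &model_pools[i];

        if (p->in_use && model_attrs_equal(&p->attrs, attrs)) {
            p->refcnt++;        /* share the existing pool */
            return p;
        }
        if (!p->in_use && !free_slot)
            free_slot = p;      /* remember the first empty slot */
    }
    if (!free_slot)
        return NULL;
    free_slot->attrs = *attrs;
    free_slot->refcnt = 1;
    free_slot->in_use = true;
    return free_slot;
}

/* Drop one reference; the slot becomes reusable once nobody is bound. */
static void model_put_pool(struct model_pool *p)
{
    if (!p)
        return;
    assert(p->refcnt > 0);
    if (--p->refcnt == 0)
        p->in_use = false;
}

struct model_wq {
    struct model_pool *pool;    /* pool the workqueue is currently bound to */
};

/* Rebind wq to the pool for the new attrs. Returns 0 on success, -1 if full. */
static int model_apply_attrs(struct model_wq *wq, const struct model_attrs *attrs)
{
    struct model_pool *new_pool = model_get_pool(attrs);

    if (!new_pool)
        return -1;
    model_put_pool(wq->pool);   /* the old pool stays alive while others use it */
    wq->pool = new_pool;
    return 0;
}
```

Taking the new pool before dropping the old one means that re-applying identical attributes never frees and recreates the pool. The sketch leaves out what happens to work items already executing in the previous pool, which the message says are left alone.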
    
    * 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (84 commits)
      workqueue: use kmem_cache_free() instead of kfree()
      workqueue: avoid false negative WARN_ON() in destroy_workqueue()
      workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity
      workqueue: implement NUMA affinity for unbound workqueues
      workqueue: introduce put_pwq_unlocked()
      workqueue: introduce numa_pwq_tbl_install()
      workqueue: use NUMA-aware allocation for pool_workqueues
      workqueue: break init_and_link_pwq() into two functions and introduce alloc_unbound_pwq()
      workqueue: map an unbound workqueues to multiple per-node pool_workqueues
      workqueue: move hot fields of workqueue_struct to the end
      workqueue: make workqueue->name[] fixed len
      workqueue: add workqueue->unbound_attrs
      workqueue: determine NUMA node of workers accourding to the allowed cpumask
      workqueue: drop 'H' from kworker names of unbound worker pools
      workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]
      workqueue: move pwq_pool_locking outside of get/put_unbound_pool()
      workqueue: fix memory leak in apply_workqueue_attrs()
      workqueue: fix unbound workqueue attrs hashing / comparison
      workqueue: fix race condition in unbound workqueue free path
      workqueue: remove pwq_lock which is no longer used
      ...
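
The NUMA behaviour described in the pull message (work runs on the node it was queued from, with one extra pool per node that has online CPUs) can be modelled in userspace roughly as below. This is a simplified sketch under stated assumptions, not the kernel code: the topology table, struct model_numa_wq, model_select_pool() and model_count_node_pools() are all hypothetical names, and the 8-CPU, 4-node layout is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_NODES 4
#define MODEL_NR_CPUS  8

/* Hypothetical topology: CPU i belongs to node model_cpu_node[i]. */
static const int model_cpu_node[MODEL_NR_CPUS] = { 0, 0, 1, 1, 2, 2, 3, 3 };

/* Hypothetical unbound workqueue with one pool id per NUMA node. */
struct model_numa_wq {
    bool no_numa;                       /* per-workqueue NUMA opt-out */
    int node_pool[MODEL_NR_NODES];      /* pool serving each node */
    int dfl_pool;                       /* pool used when NUMA affinity is off */
};

/* Pick the pool for a work item queued from the given CPU. */
static int model_select_pool(const struct model_numa_wq *wq, int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    if (wq->no_numa)
        return wq->dfl_pool;
    return wq->node_pool[model_cpu_node[cpu]];
}

/* Count the nodes that have at least one online CPU, i.e. need a pool. */
static int model_count_node_pools(const bool cpu_online[MODEL_NR_CPUS])
{
    bool seen[MODEL_NR_NODES] = { false };
    int count = 0;

    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        int node = model_cpu_node[cpu];

        if (cpu_online[cpu] && !seen[node]) {
            seen[node] = true;
            count++;
        }
    }
    return count;
}
```

With only CPUs 0 to 2 online in this made-up topology, two nodes have online CPUs, so the model counts two node pools.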