1. 20 Aug, 2012 2 commits
    • workqueue: make all workqueues non-reentrant · dbf2576e
      Tejun Heo authored
      By default, each per-cpu part of a bound workqueue operates separately
      and a work item may be executing concurrently on different CPUs.  The
      behavior avoids some cross-cpu traffic but leads to subtle weirdness
      and not-so-subtle contortions in the API.
      
      * There's no sane usefulness in allowing a single work item to be
        executed concurrently on multiple CPUs.  People just get the
        behavior unintentionally and get surprised after learning about it.
        Most either explicitly synchronize or use a non-reentrant or ordered
        workqueue, but this is error-prone.
      
      * flush_work() can't wait for multiple instances of the same work item
        on different CPUs.  If a work item is executing on cpu0 and then
        queued on cpu1, flush_work() can only wait for the one on cpu1.
      
        Unfortunately, work items can easily cross CPU boundaries
        unintentionally when the queueing thread gets migrated.  This means
        that if multiple queuers compete, flush_work() can't even guarantee
        that the instance queued right before it is finished before
        returning.
      
      * flush_work_sync() was added to work around some of the deficiencies
        of flush_work().  In addition to the usual flushing, it ensures that
        all currently executing instances are finished before returning.
        This operation is expensive, as it has to walk all CPUs, and at the
        same time it fails to address the competing-queuer case.
      
        Incorrectly using flush_work() when flush_work_sync() is necessary
        is an easy error to make and can lead to bugs which are difficult to
        reproduce.
      
      * Similar problems exist for flush_delayed_work[_sync]().
      
      Other than the cross-cpu access concern, there's no benefit in
      allowing parallel execution, and it's plain silly to have this level of
      contortion for the workqueue, which is used everywhere from core code to
      extremely obscure drivers.
      
      This patch makes all workqueues non-reentrant.  If a work item is
      executing on a different CPU when queueing is requested, it is always
      queued to that CPU.  This guarantees that any given work item can be
      executing on at most one CPU and that, if a work item is both queued
      and executing, both are on the same CPU.
      
      The only behavior change which may affect workqueue users negatively
      is that non-reentrancy overrides the affinity specified by
      queue_work_on().  On a reentrant workqueue, the affinity specified by
      queue_work_on() is always followed.  Now, if the work item is
      executing on one of the CPUs, the work item will be queued there
      regardless of the requested affinity.  I've reviewed all workqueue
      users which request explicit affinity, and, fortunately, none seems to
      be crazy enough to exploit parallel execution of the same work item.
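
      The queueing rule above fits in a few lines.  The code below is a
      userspace C model written for illustration, not the kernel's code.
      pick_queue_cpu() and NOT_EXECUTING are hypothetical names.  The real
      implementation finds the executing instance through a busy_hash lookup
      on the previously used gcwq rather than receiving it as a parameter.

```c
#include <assert.h>

/*
 * Userspace model of the non-reentrancy rule (not kernel code).
 * NOT_EXECUTING is a hypothetical sentinel meaning "not running anywhere".
 */
#define NOT_EXECUTING (-1)

/*
 * Return the CPU a work item should be queued on.  If an instance is
 * still executing on some CPU, that CPU wins over the requested one,
 * so the item can never run on two CPUs at once.
 */
static int pick_queue_cpu(int requested_cpu, int executing_cpu)
{
	if (executing_cpu != NOT_EXECUTING)
		return executing_cpu;	/* follow the running instance */
	return requested_cpu;		/* idle: honour requested affinity */
}
```

      For example, pick_queue_cpu(1, 0) yields 0.  A queue_work_on()-style
      request for CPU 1 is redirected to CPU 0 while the item is still
      running there, which is the affinity override described above.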
      
      This adds an additional busy_hash lookup if the work item was
      previously queued on a different CPU.  This shouldn't be noticeable
      under any sane workload.  Work item queueing isn't a very
      high-frequency operation and work items don't jump across CPUs all the
      time.  In a micro benchmark designed to exaggerate this difference,
      measuring the time it takes for two work items to repeatedly jump
      between two CPUs 10M times with the busy_hash table densely populated,
      the difference was around 3%.
      
      While the overhead is measurable, it is only visible in pathological
      cases and the difference isn't huge.  This change brings much needed
      sanity to workqueue and makes its behavior consistent with timer.  I
      think this is the right tradeoff to make.
      
      This enables significant simplification of workqueue API.
      Simplification patches will follow.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: fix checkpatch issues · 044c782c
      Valentin Ilie authored
      Fixed some checkpatch warnings.
      
      tj: adapted to wq/for-3.7 and massaged pr_xxx() format strings a bit.
      Signed-off-by: Valentin Ilie <valentin.ilie@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      LKML-Reference: <1345326762-21747-1-git-send-email-valentin.ilie@gmail.com>
  2. 16 Aug, 2012 6 commits
    • workqueue: use system_highpri_wq for unbind_work · 7635d2fd
      Joonsoo Kim authored
      To speed up CPU-down processing, use system_highpri_wq.  Its workers
      have a higher scheduling priority than those of system_wq, and it is
      not contended by other normal work items on this CPU.  Work queued on
      it is therefore processed faster than on system_wq.
      
      tj: CPU up/downs care quite a bit about latency these days.  This
          shouldn't hurt anything and makes sense.
      Signed-off-by: Joonsoo Kim <js1304@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: use system_highpri_wq for highpri workers in rebind_workers() · e2b6a6d5
      Joonsoo Kim authored
      In rebind_workers(), a work item is inserted for each busy worker to
      rebind it to its CPU.  Currently, only system_wq is used for this.
      This can lead to errors, as there is a mismatch between cwq->pool and
      worker->pool for highpri workers.

      To prevent this, use system_highpri_wq for highpri workers so that
      these match.  This patch implements that.
      
      tj: Rephrased comment a bit.
      Signed-off-by: Joonsoo Kim <js1304@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: introduce system_highpri_wq · 1aabe902
      Joonsoo Kim authored
      Commit 3270476a ('workqueue: reimplement
      WQ_HIGHPRI using a separate worker_pool') introduced a separate worker
      pool for HIGHPRI.  When we handle busy workers for a gcwq, each can be
      a normal worker or a highpri worker.  However, rebind_workers() doesn't
      consider this difference and uses system_wq even for highpri workers.
      This creates a mismatch between cwq->pool and worker->pool.

      This doesn't cause an error in the current implementation, but it could
      in the future.  Introduce system_highpri_wq so that rebind_workers()
      can use the proper cwq for highpri workers.  A following patch fixes
      the issue properly.
      
      tj: Even apart from rebinding, having system_highpri_wq generally
          makes sense.
      Signed-off-by: Joonsoo Kim <js1304@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: change value of lcpu in __queue_delayed_work_on() · e42986de
      Joonsoo Kim authored
      __queue_delayed_work_on() stores a CPU id in the work struct's data
      field.  In the current implementation, when a work item comes in for
      the first time, the id of the currently running CPU is stored.
      Suppose __queue_delayed_work_on() is called for CPU A while running on
      CPU B.  The __queue_work() invoked from delayed_work_timer_fn() then
      goes into the following sub-optimal path in the WQ_NON_REENTRANT case:
      
      	gcwq = get_gcwq(cpu);
      	if (wq->flags & WQ_NON_REENTRANT &&
      		(last_gcwq = get_work_gcwq(work)) && last_gcwq != gcwq) {
      
      Set lcpu to @cpu, and reset lcpu to the local CPU if it is
      WORK_CPU_UNBOUND.  This is sufficient to avoid the sub-optimal path.
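
      The lcpu selection described above can be sketched under a simplified
      model.  pick_lcpu() and MODEL_WORK_CPU_UNBOUND are hypothetical
      stand-ins.  The sketch omits the other conditions under which the real
      __queue_delayed_work_on() records lcpu at all.

```c
#include <assert.h>

/* Illustrative stand-in for WORK_CPU_UNBOUND; the value is arbitrary. */
#define MODEL_WORK_CPU_UNBOUND 64

/*
 * Choose the CPU recorded in the work item's data before arming the timer.
 * Recording the requested CPU (instead of the CPU we happen to run on)
 * means the timer callback later finds a matching "last CPU" and skips the
 * non-reentrancy detour.
 */
static int pick_lcpu(int cpu, int local_cpu)
{
	int lcpu = cpu;                         /* the requested CPU ... */

	if (lcpu == MODEL_WORK_CPU_UNBOUND)
		lcpu = local_cpu;               /* ... unless none was requested */
	return lcpu;
}
```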
      
      tj: Slightly rephrased the comment.
      Signed-off-by: Joonsoo Kim <js1304@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: correct req_cpu in trace_workqueue_queue_work() · b75cac93
      Joonsoo Kim authored
      The workqueue_queue_work() tracepoint records the requested CPU.
      However, if !(@wq->flags & WQ_UNBOUND) and @cpu is WORK_CPU_UNBOUND,
      @cpu has already been changed to the local CPU by the time it is
      recorded.  This change does not occur when @wq->flags & WQ_UNBOUND,
      so it is reasonable to correct the recorded value.

      Use a temporary local variable to store the requested CPU.
      Signed-off-by: Joonsoo Kim <js1304@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: use enum value to set array size of pools in gcwq · 330dad5b
      Joonsoo Kim authored
      Commit 3270476a ('workqueue: reimplement
      WQ_HIGHPRI using a separate worker_pool') introduced a separate
      worker_pool for HIGHPRI.  The NR_WORKER_POOLS enum value represents the
      number of pools, but the definition of the worker_pool array in gcwq
      doesn't use it.  Using it makes the code more robust and prevents
      future mistakes, so change the code to use this enum value.
      Signed-off-by: Joonsoo Kim <js1304@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  3. 14 Aug, 2012 1 commit
    • workqueue: add missing wmb() in clear_work_data() · 23657bb1
      Tejun Heo authored
      Any operation which clears PENDING should be preceded by a wmb to
      guarantee that the next PENDING owner sees all the changes made before
      PENDING release.
      
      There are only two places where PENDING is cleared -
      set_work_cpu_and_clear_pending() and clear_work_data().  The caller of
      the former already does smp_wmb() but the latter doesn't have any.
      
      Move the wmb above set_work_cpu_and_clear_pending() into it and add
      one to clear_work_data().
      
      There hasn't been any report related to this issue, and, given how
      clear_work_data() is used, it is extremely unlikely to have caused any
      actual problems on any architecture.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
  4. 13 Aug, 2012 2 commits
    • workqueue: fix CPU binding of flush_delayed_work[_sync]() · 1265057f
      Tejun Heo authored
      delayed_work encodes the workqueue to use and the last CPU in
      delayed_work->work.data while it's on timer.  The target CPU is
      implicitly recorded as the CPU the timer is queued on and
      delayed_work_timer_fn() queues delayed_work->work to the CPU it is
      running on.
      
      Unfortunately, this leaves flush_delayed_work[_sync]() no way to find
      out which CPU the delayed_work was queued for when they try to
      re-queue after killing the timer.  Currently, it chooses the local CPU
      flush is running on.  This can unexpectedly move a delayed_work queued
      on a specific CPU to another CPU and lead to subtle errors.
      
      There isn't much point in trying to save several bytes in struct
      delayed_work, which is already close to a hundred bytes on 64bit with
      all debug options turned off.  This patch adds delayed_work->cpu to
      remember the CPU it's queued for.
      
      Note that if the timer is migrated during CPU down, the work item
      could be queued to the downed global_cwq after this change.  As a
      detached global_cwq behaves like an unbound one, this doesn't change
      much for the delayed_work.
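
      The effect of the new delayed_work->cpu field can be sketched with a
      toy model.  struct model_dwork and both helpers are hypothetical and
      ignore locking and the timer itself.  The point is only that flush
      re-queues to the remembered CPU instead of the CPU the flush runs on.

```c
#include <assert.h>

/* Hypothetical, heavily simplified delayed work item. */
struct model_dwork {
	int timer_pending; /* 1 while the timer is armed */
	int cpu;           /* CPU the item was queued for (the new field) */
	int queued_cpu;    /* CPU it was actually queued on, -1 if none */
};

static void model_queue_delayed(struct model_dwork *d, int cpu)
{
	d->cpu = cpu;          /* remember the target explicitly */
	d->timer_pending = 1;
	d->queued_cpu = -1;
}

/*
 * Flush: kill the timer and queue immediately.  The target is the
 * remembered CPU, not whichever CPU the flush happens to run on.
 */
static void model_flush_delayed(struct model_dwork *d, int flushing_cpu)
{
	(void)flushing_cpu;    /* deliberately ignored after the fix */
	if (d->timer_pending) {
		d->timer_pending = 0;
		d->queued_cpu = d->cpu;
	}
}
```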
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
    • workqueue: use mod_delayed_work() instead of cancel + queue · 41f63c53
      Tejun Heo authored
      Convert delayed_work users doing cancel_delayed_work() followed by
      queue_delayed_work() to mod_delayed_work().
      
      Most conversions are straightforward.  The ones worth mentioning are:
      
      * drivers/edac/edac_mc.c: edac_mc_workq_setup() converted to always
        use mod_delayed_work() and cancel loop in
        edac_mc_reset_delay_period() is dropped.
      
      * drivers/platform/x86/thinkpad_acpi.c: No need to remember whether
        watchdog is active or not.  @fan_watchdog_active and related code
        dropped.
      
      * drivers/power/charger-manager.c: Seemingly a lot of
        delayed_work_pending() abuse going on here.
        [delayed_]work_pending() are unsynchronized and racy when used like
        this.  I converted one instance in fullbatt_handler().  Please
        convert the rest so that it invokes workqueue APIs for the intended
        target state rather than trying to game work item pending state
        transitions.  e.g. if timer should be modified - call
        mod_delayed_work(), canceled - call cancel_delayed_work[_sync]().
      
      * drivers/thermal/thermal_sys.c: thermal_zone_device_set_polling()
        simplified.  Note that the round_jiffies() calls in this function are
        meaningless: round_jiffies() works on absolute jiffies, not on the
        delta delay used by delayed_work.
      
      v2: Tomi pointed out that __cancel_delayed_work() users can't be
          safely converted to mod_delayed_work().  They could be calling it
          from irq context and if that happens while delayed_work_timer_fn()
          is running, it could deadlock.  __cancel_delayed_work() users are
          dropped.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
      Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Acked-by: Anton Vorontsov <cbouatmailru@gmail.com>
      Acked-by: David Howells <dhowells@redhat.com>
      Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Doug Thompson <dougthompson@xmission.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: "John W. Linville" <linville@tuxdriver.com>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Johannes Berg <johannes@sipsolutions.net>
  5. 03 Aug, 2012 13 commits
    • workqueue: implement mod_delayed_work[_on]() · 8376fe22
      Tejun Heo authored
      Workqueue was lacking a mechanism to modify the timeout of an already
      pending delayed_work.  delayed_work users have been working around
      this using several methods - using an explicit timer + work item,
      messing directly with delayed_work->timer, and canceling before
      re-queueing, all of which are error-prone and/or ugly.
      
      This patch implements mod_delayed_work[_on]() which behaves similarly
      to mod_timer() - if the delayed_work is idle, it's queued with the
      given delay; otherwise, its timeout is modified to the new value.
      Zero @delay guarantees immediate execution.
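
      The mod_timer()-like semantics can be modelled as a small state
      machine.  This is a hypothetical userspace sketch, not the kernel API.
      The states and the tick-based expires field are illustrative.  The bool
      return (true if the item was already pending) mirrors mod_timer() and
      should be treated as an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states of a delayed work item. */
enum model_state { IDLE, TIMER_PENDING, QUEUED };

struct model_dwork {
	enum model_state state;
	unsigned long expires; /* meaningful only while TIMER_PENDING */
};

/*
 * Model of the described behaviour: queue with @delay if idle, otherwise
 * move the timeout to the new value; zero @delay queues for immediate
 * execution.  Returns true if the item was already pending.
 */
static bool model_mod_delayed_work(struct model_dwork *d,
				   unsigned long now, unsigned long delay)
{
	bool was_pending = d->state != IDLE;

	if (delay == 0) {
		d->state = QUEUED;          /* run as soon as possible */
	} else {
		d->state = TIMER_PENDING;   /* (re)arm with the new timeout */
		d->expires = now + delay;
	}
	return was_pending;
}
```

      A single call replaces the cancel-then-queue sequence that the
      conversion commit removes.  The caller states the intended target
      state instead of inspecting pending state first.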
      
      v2: Updated to reflect try_to_grab_pending() changes.  Now safe to be
          called from bh context.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
    • workqueue: mark a work item being canceled as such · bbb68dfa
      Tejun Heo authored
      There are two reasons try_to_grab_pending() can fail with -EAGAIN.
      One is when someone else is queueing or dequeueing the work item.  With
      the previous patches, it is guaranteed that PENDING and queued state
      will soon agree making it safe to busy-retry in this case.
      
      The other is if multiple __cancel_work_timer() invocations are racing
      one another.  __cancel_work_timer() grabs PENDING and then waits for
      running instances of the target work item on all CPUs while holding
      PENDING and !queued.  try_to_grab_pending() invoked from another task
      will keep returning -EAGAIN while the current owner is waiting.
      
      Not distinguishing the two cases is okay because __cancel_work_timer()
      is the only user of try_to_grab_pending() and it invokes
      wait_on_work() whenever grabbing fails.  For the first case, busy
      looping should be fine but wait_on_work() doesn't cause any critical
      problem.  For the latter case, the new contender usually waits for the
      same condition as the current owner, so no unnecessarily extended
      busy-looping happens.  Combined, these make __cancel_work_timer()
      technically correct even without irq protection while grabbing PENDING
      or distinguishing the two different cases.
      
      While the current code is technically correct, not distinguishing the
      two cases makes it difficult to use try_to_grab_pending() for other
      purposes than canceling because it's impossible to tell whether it's
      safe to busy-retry grabbing.
      
      This patch adds a mechanism to mark a work item being canceled.
      try_to_grab_pending() now disables irq on success and returns -EAGAIN
      to indicate that grabbing failed but PENDING and queued states are
      going to agree soon and it's safe to busy-loop.  It returns -ENOENT if
      the work item is being canceled and it may stay PENDING && !queued for
      an arbitrary amount of time.
      
      __cancel_work_timer() is modified to mark the work canceling with
      WORK_OFFQ_CANCELING after grabbing PENDING, thus making
      try_to_grab_pending() fail with -ENOENT instead of -EAGAIN.  Also, it
      invokes wait_on_work() iff grabbing failed with -ENOENT.  This isn't
      necessary for correctness but makes it consistent with other future
      users of try_to_grab_pending().
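
      The caller-side contract of the two error codes can be sketched as
      follows.  model_try_to_grab() is a hypothetical scripted stand-in for
      try_to_grab_pending().  The counters replace real busy-looping and
      wait_on_work().  Only the control flow is meant to match the
      description above.

```c
#include <assert.h>
#include <errno.h>

/* Scripted outcomes for the hypothetical grab function. */
struct grab_script {
	const int *results;
	int n, next;
};

/* Return the next scripted result, repeating the last one when exhausted. */
static int model_try_to_grab(struct grab_script *s)
{
	int i = s->next < s->n ? s->next++ : s->n - 1;

	return s->results[i];
}

/*
 * Loop shaped like __cancel_work_timer(): -EAGAIN means PENDING and queued
 * state will agree soon, so retry immediately; -ENOENT means another task
 * is cancelling, so wait for it before retrying.  The script must end in a
 * non-negative value or this loops forever.
 */
static int model_cancel_grab(struct grab_script *s, int *spins, int *waits)
{
	int ret;

	do {
		ret = model_try_to_grab(s);
		if (ret == -ENOENT)
			(*waits)++;   /* kernel: wait_on_work() */
		else if (ret == -EAGAIN)
			(*spins)++;   /* transient race: busy-retry */
	} while (ret < 0);
	return ret;
}
```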
      
      v2: try_to_grab_pending() was testing preempt_count() to ensure that
          the caller has disabled preemption.  This triggers spuriously if
          !CONFIG_PREEMPT_COUNT.  Use preemptible() instead.  Reported by
          Fengguang Wu.
      
      v3: Updated so that try_to_grab_pending() disables irq on success
          rather than requiring preemption disabled by the caller.  This
          makes busy-looping easier and will allow try_to_grab_pending() to
          be used from bh/irq contexts.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
    • workqueue: reorganize try_to_grab_pending() and __cancel_timer_work() · 36e227d2
      Tejun Heo authored
      * Use bool @is_dwork instead of @timer and let try_to_grab_pending()
        use to_delayed_work() to determine the delayed_work address.
      
      * Move timer handling from __cancel_work_timer() to
        try_to_grab_pending().
      
      * Make try_to_grab_pending() use -EAGAIN instead of -1 for
        busy-looping and drop the ret local variable.
      
      * Add proper function comment to try_to_grab_pending().
      
      This makes the code a bit easier to understand and will ease further
      changes.  This patch doesn't make any functional change.
      
      v2: Use @is_dwork instead of @timer.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: factor out __queue_delayed_work() from queue_delayed_work_on() · 7beb2edf
      Tejun Heo authored
      This is to prepare for mod_delayed_work[_on]() and doesn't cause any
      functional difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: introduce WORK_OFFQ_FLAG_* · b5490077
      Tejun Heo authored
      Low WORK_STRUCT_FLAG_BITS bits of work_struct->data contain
      WORK_STRUCT_FLAG_* and flush color.  If the work item is queued, the
      rest point to the cpu_workqueue with WORK_STRUCT_CWQ set; otherwise,
      WORK_STRUCT_CWQ is clear and the bits contain the last CPU number -
      either a real CPU number or one of WORK_CPU_*.
      
      Scheduled addition of mod_delayed_work[_on]() requires an additional
      flag, which is used only while a work item is off queue.  There are
      more than enough bits to represent off-queue CPU number on both 32 and
      64bits.  This patch introduces WORK_OFFQ_FLAG_* which occupy the lower
      part of the @work->data high bits while off queue.  This patch doesn't
      define any actual OFFQ flag yet.
      
      Off-queue CPU number is now shifted by WORK_OFFQ_CPU_SHIFT, which adds
      the number of bits used by OFFQ flags to WORK_STRUCT_FLAG_SHIFT, to
      make room for OFFQ flags.
      
      To avoid shift width warning with large WORK_OFFQ_FLAG_BITS, ulong
      cast is added to WORK_STRUCT_NO_CPU and, just in case, BUILD_BUG_ON()
      to check that there are enough bits to accommodate off-queue CPU number
      is added.
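
      The off-queue layout of work->data can be illustrated with shifts and
      masks.  The widths below are made-up values; the kernel's depend on
      configuration.  M_OFFQ_FLAG0 is a hypothetical flag.  The
      _Static_assert only mimics the spirit of the BUILD_BUG_ON() check.

```c
#include <assert.h>

/* Illustrative widths only; not the kernel's actual values. */
#define M_FLAG_BITS      8u   /* stands in for WORK_STRUCT_FLAG_BITS */
#define M_OFFQ_FLAG_BITS 1u   /* stands in for WORK_OFFQ_FLAG_BITS */
#define M_OFFQ_CPU_SHIFT (M_FLAG_BITS + M_OFFQ_FLAG_BITS)
#define M_OFFQ_FLAG0     (1ul << M_FLAG_BITS) /* hypothetical OFFQ flag */

/* Layout while off queue: [ last CPU | OFFQ flags | WORK_STRUCT flags ] */
static unsigned long m_pack_offq(unsigned long cpu, unsigned long offq_flags,
				 unsigned long struct_flags)
{
	return (cpu << M_OFFQ_CPU_SHIFT) | offq_flags | struct_flags;
}

static unsigned long m_offq_cpu(unsigned long data)
{
	return data >> M_OFFQ_CPU_SHIFT;
}

/* Compile-time check that a reasonably large CPU number still fits. */
_Static_assert(sizeof(unsigned long) * 8 - M_OFFQ_CPU_SHIFT >= 16,
	       "not enough bits for the off-queue CPU number");
```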
      
      This patch doesn't make any functional difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: move try_to_grab_pending() upwards · bf4ede01
      Tejun Heo authored
      try_to_grab_pending() will be used by to-be-implemented
      mod_delayed_work[_on]().  Move try_to_grab_pending() and related
      functions above queueing functions.
      
      This patch only moves functions around.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: fix zero @delay handling of queue_delayed_work_on() · 715f1300
      Tejun Heo authored
      If @delay is zero and the delayed_work is idle, queue_delayed_work()
      queues it for immediate execution; however, queue_delayed_work_on()
      lacks this logic and always goes through timer regardless of @delay.
      
      This patch moves 0 @delay handling logic from queue_delayed_work() to
      queue_delayed_work_on() so that both functions behave the same.
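
      The unified behaviour can be sketched under a simplified model.  Here
      claiming PENDING is a plain bool and the outcome is reported as an
      enum; all names are hypothetical.  The point is that the zero-@delay
      shortcut now lives in the _on() variant, so both entry points behave
      the same.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of a queueing attempt. */
enum m_path { M_NONE, M_QUEUED_NOW, M_TIMER };

/* Model of queue_delayed_work_on() after the fix. */
static enum m_path m_queue_delayed_on(bool *pending, unsigned long delay)
{
	if (*pending)
		return M_NONE;          /* already pending: nothing to do */
	*pending = true;                /* claim PENDING */
	if (delay == 0)
		return M_QUEUED_NOW;    /* zero delay bypasses the timer */
	return M_TIMER;                 /* arm the timer for @delay */
}

/* Model of queue_delayed_work(): no special-casing, just delegate. */
static enum m_path m_queue_delayed(bool *pending, unsigned long delay)
{
	return m_queue_delayed_on(pending, delay);
}
```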
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: unify local CPU queueing handling · 57469821
      Tejun Heo authored
      Queueing functions have been using different methods to determine the
      local CPU.
      
      * queue_work() superfluously uses get/put_cpu() to acquire and hold the
        local CPU across queue_work_on().
      
      * delayed_work_timer_fn() uses smp_processor_id().
      
      * queue_delayed_work() calls queue_delayed_work_on() with -1 @cpu
        which is interpreted as the local CPU.
      
      * flush_delayed_work[_sync]() were using raw_smp_processor_id().
      
      * __queue_work() interprets %WORK_CPU_UNBOUND as the local CPU if the
        target workqueue is a bound one, but nobody uses this.
      
      This patch converts all functions to uniformly use %WORK_CPU_UNBOUND
      to indicate local CPU and use the local binding feature of
      __queue_work().  unlikely() is dropped from %WORK_CPU_UNBOUND handling
      in __queue_work().
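
      The resolution rule for %WORK_CPU_UNBOUND can be sketched as below.
      The value of M_WORK_CPU_UNBOUND and the function m_resolve_cpu() are
      illustrative assumptions.  In the kernel, the local CPU comes from the
      executing context rather than a parameter.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative sentinel; not the kernel's definition. */
#define M_WORK_CPU_UNBOUND 64

/*
 * Bound workqueues map WORK_CPU_UNBOUND to the local CPU; unbound
 * workqueues keep the sentinel.  Explicit CPUs pass through unchanged.
 */
static int m_resolve_cpu(int cpu, bool wq_unbound, int local_cpu)
{
	if (!wq_unbound && cpu == M_WORK_CPU_UNBOUND)
		return local_cpu;
	return cpu;
}
```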
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: set delayed_work->timer function on initialization · d8e794df
      Tejun Heo authored
      delayed_work->timer.function is currently initialized during
      queue_delayed_work_on().  Export delayed_work_timer_fn() and set
      delayed_work timer function during delayed_work initialization
      together with other fields.
      
      This ensures the timer function is always valid on an initialized
      delayed_work.  This is to help mod_delayed_work() implementation.
      
      To detect delayed_work users which diddle with the internal timer,
      trigger WARN if timer function doesn't match on queue.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: disable irq while manipulating PENDING · 8930caba
      Tejun Heo authored
      Queueing operations use WORK_STRUCT_PENDING_BIT to synchronize access
      to the target work item.  They first try to claim the bit and proceed
      with queueing only after that succeeds and there's a window between
      PENDING being set and the actual queueing where the task can be
      interrupted or preempted.
      
      There's also a similar window in process_one_work() when clearing
      PENDING.  A work item is dequeued, gcwq->lock is released and then
      PENDING is cleared and the worker might get interrupted or preempted
      between releasing gcwq->lock and clearing PENDING.
      
      cancel[_delayed]_work_sync() tries to claim or steal PENDING.  The
      function assumes that a work item with PENDING is either queued or in
      the process of being [de]queued.  In the latter case, it busy-loops
      until either the work item loses PENDING or is queued.  If canceling
      coincides with the above described interrupts or preemptions, the
      canceling task will busy-loop while the queueing or executing task is
      preempted.
      
      This patch keeps irq disabled across claiming PENDING and actual
      queueing and moves PENDING clearing in process_one_work() inside
      gcwq->lock so that busy looping from PENDING && !queued doesn't wait
      for interrupted/preempted tasks.  Note that, in process_one_work(),
      setting last CPU and clearing PENDING got merged into single
      operation.
      
      This removes possible long busy-loops and will allow using
      try_to_grab_pending() from bh and irq contexts.
      
      v2: __queue_work() was testing preempt_count() to ensure that the
          caller has disabled preemption.  This triggers spuriously if
          !CONFIG_PREEMPT_COUNT.  Use preemptible() instead.  Reported by
          Fengguang Wu.
      
      v3: Disable irq instead of preemption.  IRQ will be disabled while
          grabbing gcwq->lock later anyway and this allows using
          try_to_grab_pending() from bh and irq contexts.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
    • workqueue: add missing smp_wmb() in process_one_work() · 959d1af8
      Tejun Heo authored
      WORK_STRUCT_PENDING is used to claim ownership of a work item and
      process_one_work() releases it before starting execution.  When
      someone else grabs PENDING, all pre-release updates to the work item
      should be visible and all updates made by the new owner should happen
      afterwards.
      
      Grabbing PENDING uses test_and_set_bit() and thus has a full barrier;
      however, clearing doesn't have a matching wmb.  Given the preceding
      spin_unlock and use of clear_bit, I don't believe this can be a
      problem on an actual machine and there hasn't been any related report
      but it is still theoretically possible for clear_pending to permeate
      upwards and happen before work->entry update.
      
      Add an explicit smp_wmb() before work_clear_pending().
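
      The pairing of a fully ordered grab with a write barrier before the
      release can be approximated with C11 atomics.  This is an analogy, not
      kernel code; struct m_work and the helpers are hypothetical.
      atomic_thread_fence(memory_order_release) is somewhat stronger than
      smp_wmb(), which orders stores only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define M_PENDING 1ul

struct m_work {
	atomic_ulong data;  /* bit 0 models WORK_STRUCT_PENDING */
	int entry_state;    /* models the work->entry update */
};

/* Grab: like test_and_set_bit(), a fully ordered read-modify-write. */
static bool m_grab_pending(struct m_work *w)
{
	return !(atomic_fetch_or_explicit(&w->data, M_PENDING,
					  memory_order_seq_cst) & M_PENDING);
}

/* Release: the owner's last update must be visible before PENDING clears. */
static void m_release_pending(struct m_work *w, int new_entry_state)
{
	w->entry_state = new_entry_state;
	/* Analog of the added smp_wmb(): order the store above before the
	 * store that clears PENDING. */
	atomic_thread_fence(memory_order_release);
	atomic_fetch_and_explicit(&w->data, ~M_PENDING, memory_order_relaxed);
}
```

      A single-threaded run can only exercise the bit logic, not the
      ordering guarantee itself.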
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: stable@vger.kernel.org
    • workqueue: make queueing functions return bool · d4283e93
      Tejun Heo authored
      All queueing functions return 1 on success, 0 if the work item was
      already pending.  Update them to return bool instead.  This better
      signifies that they don't return 0 / -errno.
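
      What the bool return expresses can be sketched with an atomic exchange
      standing in for the PENDING test-and-set.  struct m_item and
      m_queue_work() are hypothetical.  The counter stands in for actually
      inserting the item into a worklist.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical work item: only the PENDING bit and a queue counter. */
struct m_item {
	atomic_bool pending;
	int times_queued;
};

/* true: this call queued the item; false: it was already pending. */
static bool m_queue_work(struct m_item *it)
{
	/* atomic_exchange returns the previous value of the flag. */
	if (atomic_exchange(&it->pending, true))
		return false;
	it->times_queued++;   /* the kernel would insert into a worklist here */
	return true;
}
```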
      
      This is cleanup and doesn't cause any functional difference.
      
      While at it, fix comment opening for schedule_work_on().
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: reorder queueing functions so that _on() variants are on top · 0a13c00e
      Tejun Heo authored
      Currently, queue/schedule[_delayed]_work_on() are located below their
      counterparts without the _on suffix even though the latter are usually
      implemented using the former.  Swap them.
      
      This is cleanup and doesn't cause any functional difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
  6. 02 Aug, 2012 16 commits
    • Linux 3.6-rc1 · 0d7614f0
      Linus Torvalds authored
    • Merge branch 'for-linus-3.6' of git://dev.laptop.org/users/dilinger/linux-olpc · fc6bdb59
      Linus Torvalds authored
      Pull OLPC platform updates from Andres Salomon:
       "These move the OLPC Embedded Controller driver out of
        arch/x86/platform and into drivers/platform/olpc.
      
        OLPC machines are now ARM-based (which means lots of x86 and ARM
        changes), but are typically pretty self-contained..  so it makes more
        sense to go through a separate OLPC tree after getting the appropriate
        review/ACKs."
      
      * 'for-linus-3.6' of git://dev.laptop.org/users/dilinger/linux-olpc:
        x86: OLPC: move s/r-related EC cmds to EC driver
        Platform: OLPC: move global variables into priv struct
        Platform: OLPC: move debugfs support from x86 EC driver
        x86: OLPC: switch over to using new EC driver on x86
        Platform: OLPC: add a suspended flag to the EC driver
        Platform: OLPC: turn EC driver into a platform_driver
        Platform: OLPC: allow EC cmd to be overridden, and create a workqueue to call it
        drivers: OLPC: update various drivers to include olpc-ec.h
        Platform: OLPC: add a stub to drivers/platform/ for the OLPC EC driver
    • Merge tag 'dt2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 44d82e29
      Linus Torvalds authored
      Pull arm-soc Marvell Orion device-tree updates from Olof Johansson:
       "This contains a set of device-tree conversions for Marvell Orion
        platforms that were staged early but took a few tries to get the
        branch into a format where it was suitable for us to pick up.
      
        Given that most people working on these platforms are hobbyists with
        limited time, we were a bit more flexible with merging it even though
        it came in late."
      
      * tag 'dt2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (21 commits)
        ARM: Kirkwood: Replace mrvl with marvell
        ARM: Kirkwood: Describe GoFlex Net LEDs and SATA in DT.
        ARM: Kirkwood: Describe Dreamplug LEDs in DT.
        ARM: Kirkwood: Describe iConnects LEDs in DT.
        ARM: Kirkwood: Describe iConnects temperature sensor in DT.
        ARM: Kirkwood: Describe IB62x0 LEDs in DT.
        ARM: Kirkwood: Describe IB62x0 gpio-keys in DT.
        ARM: Kirkwood: Describe DNS32? gpio-keys in DT.
        ARM: Kirkwood: Move common portions into a kirkwood-dnskw.dtsi
        ARM: Kirkwood: Replace DNS-320/DNS-325 leds with dt bindings
        ARM: Kirkwood: Describe DNS325 temperature sensor in DT.
        ARM: Kirkwood: Use DT to configure SATA device.
        ARM: kirkwood: use devicetree for SPI on dreamplug
        ARM: kirkwood: Add LS-XHL and LS-CHLv2 support
        ARM: Kirkwood: Initial DTS support for Kirkwood GoFlex Net
        ARM: Kirkwood: Add basic device tree support for QNAP TS219.
        ATA: sata_mv: Add device tree support
        ARM: Orion: DTify the watchdog timer.
        ARM: Orion: Add arch support needed for I2C via DT.
        ARM: kirkwood: use devicetree for orion-spi
        ...
      
      Conflicts:
      	drivers/watchdog/orion_wdt.c
    • Merge tag 'pm2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · bfdf85df
      Linus Torvalds authored
      Pull arm-soc cpuidle enablement for OMAP from Olof Johansson:
       "Coupled cpuidle was meant to merge for 3.5 through Len Brown's tree,
        but didn't go in because the pull request ended up rejected.  So it
        just got merged, and we got this staged branch that enables the
        coupled cpuidle code on OMAP.
      
        With a stable git workflow from the other maintainer we could have
        staged this earlier, but that wasn't the case so we have had to merge
        it late.
      
        The alternative is to hold it off until 3.7 but given that the code is
        well-isolated to OMAP and they are eager to see it go in, I didn't
        push back hard in that direction."
      
      * tag 'pm2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        ARM: OMAP4: CPUidle: Open broadcast clock-event device.
        ARM: OMAP4: CPUidle: add synchronization for coupled idle states
        ARM: OMAP4: CPUidle: Use coupled cpuidle states to implement SMP cpuidle.
        ARM: OMAP: timer: allow gp timer clock-event to be used on both cpus
    • Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · d1494ba8
      Linus Torvalds authored
      Pull ARM SoC fixes from Olof Johansson:
       "A few fixes for merge window fallout, and a bugfix for timer resume on
        PRIMA2."
      
      * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        ARM: mmp: add missing irqs.h
        arm: mvebu: fix typo in .dtsi comment for Armada XP SoCs
        ARM: PRIMA2: delete redundant codes to restore LATCHED when timer resumes
        ARM: mxc: Include missing irqs.h header
    • Merge tag 'sh-for-linus' of git://github.com/pmundt/linux-sh · 0a276d16
      Linus Torvalds authored
      Pull SuperH fixes from Paul Mundt.
      
      * tag 'sh-for-linus' of git://github.com/pmundt/linux-sh: (24 commits)
        sh: explicitly include sh_dma.h in setup-sh7722.c
        sh: ecovec: care CN5 VBUS if USB host mode
        sh: sh7724: fixup renesas_usbhs clock settings
        sh: intc: initial irqdomain support.
        sh: pfc: Fix up init ordering mess.
        serial: sh-sci: fix compilation breakage, when DMA is enabled
        dmaengine: shdma: restore partial transfer calculation
        sh: modify the sh_dmae_slave_config for RSPI in setup-sh7757
        sh: Fix up recursive fault in oops with unset TTB.
        sh: pfc: Build fix for pinctrl_remove_gpio_range() changes.
        sh: select the fixed regulator driver on several boards
        sh: ecovec: switch MMC power control to regulators
        sh: add fixed voltage regulators to se7724
        sh: add fixed voltage regulators to sdk7786
        sh: add fixed voltage regulators to rsk
        sh: add fixed voltage regulators to migor
        sh: add fixed voltage regulators to kfr2r09
        sh: add fixed voltage regulators to ap325rxa
        sh: add fixed voltage regulators to sh7757lcr
        sh: add fixed voltage regulators to sh2007
        ...
    • Merge tag 'md-3.6' of git://neil.brown.name/md · 25aa6a7a
      Linus Torvalds authored
      Pull additional md update from NeilBrown:
       "This contains a few patches that depend on plugging changes in the
        block layer so needed to wait for those.
      
        It also contains a Kconfig fix for the new RAID10 support in dm-raid."
      
      * tag 'md-3.6' of git://neil.brown.name/md:
        md/dm-raid: DM_RAID should select MD_RAID10
        md/raid1: submit IO from originating thread instead of md thread.
        raid5: raid5d handle stripe in batch way
        raid5: make_request use batch stripe release
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · c8924234
      Linus Torvalds authored
      Pull two ceph fixes from Sage Weil:
       "The first patch fixes up the old crufty open intent code to use the
        atomic_open stuff properly, and the second fixes a possible null deref
        and memory leak with the crypto keys."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
        libceph: fix crypto key null deref, memory leak
        ceph: simplify+fix atomic_open
    • Merge tag 'ecryptfs-3.6-rc1-fixes' of... · 410fc4ce
      Linus Torvalds authored
      Merge tag 'ecryptfs-3.6-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
      
      Pull ecryptfs fixes from Tyler Hicks:
       - Fixes a bug when the lower filesystem mount options include 'acl',
         but the eCryptfs mount options do not
       - Cleanups in the messaging code
       - Better handling of empty files in the lower filesystem to improve
         usability.  Failed file creations are now cleaned up and empty lower
         files are converted into eCryptfs during open().
       - The write-through cache changes are being reverted due to bugs that
         are not easy to fix.  Stability outweighs the performance
         enhancements here.
       - Improvement to the mount code to catch unsupported ciphers specified
         in the mount options
      
      * tag 'ecryptfs-3.6-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
        eCryptfs: check for eCryptfs cipher support at mount
        eCryptfs: Revert to a writethrough cache model
        eCryptfs: Initialize empty lower files when opening them
        eCryptfs: Unlink lower inode when ecryptfs_create() fails
        eCryptfs: Make all miscdev functions use daemon ptr in file private_data
        eCryptfs: Remove unused messaging declarations and function
        eCryptfs: Copy up POSIX ACL and read-only flags from lower mount
    • Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 · 630103ea
      Linus Torvalds authored
      Pull CIFS update from Steve French:
       "Adds SMB2 rmdir/mkdir capability to the SMB2/SMB2.1 support in cifs.
      
        I am holding up a few more days on merging the remainder of the
        SMB2/SMB2.1 enablement although it is nearing review completion, in
        order to address some review comments from Jeff Layton on a few of the
        subsequent SMB2 patches, and also to debug an unrelated cifs problem
        that Pavel discovered."
      
      * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
        CIFS: Add SMB2 support for rmdir
        CIFS: Move rmdir code to ops struct
        CIFS: Add SMB2 support for mkdir operation
        CIFS: Separate protocol specific part from mkdir
        CIFS: Simplify cifs_mkdir call
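Moving mkdir/rmdir "to ops struct" follows the usual kernel pattern of a per-protocol function-pointer table, so generic code never branches on the dialect. The sketch below is a minimal, hypothetical model of that dispatch pattern; `struct demo_ops`, `demo_ops_for()`, and the handler names are illustrative assumptions, not the real cifs structures or signatures.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified per-dialect operations table. */
struct demo_server;

struct demo_ops {
	const char *dialect;
	int (*mkdir)(struct demo_server *srv, const char *path);
	int (*rmdir)(struct demo_server *srv, const char *path);
};

struct demo_server {
	const struct demo_ops *ops;
	char last_call[64]; /* records which handler ran, for illustration */
};

static int smb1_mkdir(struct demo_server *srv, const char *path)
{
	snprintf(srv->last_call, sizeof(srv->last_call), "SMB1 mkdir %s", path);
	return 0;
}

static int smb1_rmdir(struct demo_server *srv, const char *path)
{
	snprintf(srv->last_call, sizeof(srv->last_call), "SMB1 rmdir %s", path);
	return 0;
}

static int smb2_mkdir(struct demo_server *srv, const char *path)
{
	snprintf(srv->last_call, sizeof(srv->last_call), "SMB2 mkdir %s", path);
	return 0;
}

static int smb2_rmdir(struct demo_server *srv, const char *path)
{
	snprintf(srv->last_call, sizeof(srv->last_call), "SMB2 rmdir %s", path);
	return 0;
}

static const struct demo_ops demo_smb1_ops = {
	.dialect = "SMB1", .mkdir = smb1_mkdir, .rmdir = smb1_rmdir,
};

static const struct demo_ops demo_smb2_ops = {
	.dialect = "SMB2", .mkdir = smb2_mkdir, .rmdir = smb2_rmdir,
};

/* Pick the table once (e.g. after dialect negotiation); NULL if unknown. */
static const struct demo_ops *demo_ops_for(const char *dialect)
{
	if (strcmp(dialect, "SMB1") == 0)
		return &demo_smb1_ops;
	if (strcmp(dialect, "SMB2") == 0)
		return &demo_smb2_ops;
	return NULL;
}

/* Generic entry points: no protocol-specific code, only dispatch. */
static int demo_mkdir(struct demo_server *srv, const char *path)
{
	if (!srv->ops->mkdir)
		return -EOPNOTSUPP; /* dialect does not implement this op */
	return srv->ops->mkdir(srv, path);
}

static int demo_rmdir(struct demo_server *srv, const char *path)
{
	if (!srv->ops->rmdir)
		return -EOPNOTSUPP;
	return srv->ops->rmdir(srv, path);
}
```

Adding SMB2 support for another operation then means filling in one more slot in the SMB2 table rather than touching the generic path.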
    • mm: remove node_start_pfn checking in new WARN_ON for now · 8783b6e2
      Linus Torvalds authored
      Borislav Petkov reports that the new warning added in commit
      88fdf75d ("mm: warn if pg_data_t isn't initialized with zero")
      triggers for him, and it is the node_start_pfn field that has already
      been initialized once.
      
      The call trace looks like this:
      
        x86_64_start_kernel ->
          x86_64_start_reservations ->
          start_kernel ->
          setup_arch ->
          paging_init ->
          zone_sizes_init ->
          free_area_init_nodes ->
          free_area_init_node
      
      and (with the warning replaced by debug output), Borislav sees
      
        On node 0 totalpages: 4193848
          DMA zone: 64 pages used for memmap
          DMA zone: 6 pages reserved
          DMA zone: 3890 pages, LIFO batch:0
          DMA32 zone: 16320 pages used for memmap
          DMA32 zone: 798464 pages, LIFO batch:31
          Normal zone: 52736 pages used for memmap
          Normal zone: 3322368 pages, LIFO batch:31
        free_area_init_node: pgdat->node_start_pfn: 4423680      <----
        On node 1 totalpages: 4194304
          Normal zone: 65536 pages used for memmap
          Normal zone: 4128768 pages, LIFO batch:31
        free_area_init_node: pgdat->node_start_pfn: 8617984      <----
        On node 2 totalpages: 4194304
          Normal zone: 65536 pages used for memmap
          Normal zone: 4128768 pages, LIFO batch:31
        free_area_init_node: pgdat->node_start_pfn: 12812288     <----
        On node 3 totalpages: 4194304
          Normal zone: 65536 pages used for memmap
          Normal zone: 4128768 pages, LIFO batch:31
      
      so remove the bogus warning for now to avoid annoying people.  Minchan
      Kim is looking at it.
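A minimal model of the change: the strict check treats any nonzero tracked field as "not zero-initialized", while the relaxed check skips node_start_pfn, which earlier boot code had evidently already set. The struct and function names are hypothetical, and which other fields the real warning inspects is an assumption made here for illustration.

```c
#include <stdbool.h>

/* Hypothetical subset of a node descriptor; not the real pg_data_t layout. */
struct demo_pgdat {
	int nr_zones;
	unsigned long node_start_pfn;
	int classzone_idx;
};

/* Strict check: every tracked field must still be zero. */
static bool demo_pgdat_dirty_strict(const struct demo_pgdat *p)
{
	return p->nr_zones || p->node_start_pfn || p->classzone_idx;
}

/* Relaxed check: node_start_pfn is skipped because it may legitimately
 * have been filled in before this init path runs. */
static bool demo_pgdat_dirty_relaxed(const struct demo_pgdat *p)
{
	return p->nr_zones || p->classzone_idx;
}
```

With node_start_pfn = 4423680, the value reported for node 1 above, the strict check fires and the relaxed one does not.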
      Reported-by: Borislav Petkov <bp@amd64.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ARM: mmp: add missing irqs.h · bac6f615
      Haojian Zhuang authored
      arch/arm/mach-mmp/gplugd.c:195:13: error: ‘MMP_NR_IRQS’ undeclared here
      (not in a function)
      make[1]: *** [arch/arm/mach-mmp/gplugd.o] Error 1
      
      Include <mach/irqs.h> to fix this issue.
      Signed-off-by: Haojian Zhuang <haojian.zhuang@gmail.com>
      Signed-off-by: Olof Johansson <olof@lixom.net>
    • arm: mvebu: fix typo in .dtsi comment for Armada XP SoCs · 10b683cb
      Thomas Petazzoni authored
      The comment was wrongly referring to Armada 370 while the file is
      related to Armada XP.
      Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Signed-off-by: Olof Johansson <olof@lixom.net>
    • ARM: PRIMA2: delete redundant codes to restore LATCHED when timer resumes · debeaf6c
      Barry Song authored
      The only way to write the LATCHED registers is to write LATCH_BIT to the
      LATCH register, which latches COUNTER into LATCHED, e.g.
      
          writel_relaxed(SIRFSOC_TIMER_LATCH_BIT, sirfsoc_timer_base +
                  SIRFSOC_TIMER_LATCH);
      
      Writing values to the LATCHED registers directly has no effect.
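To make the latch semantics concrete, here is a minimal software model of the register behavior described above. Only the COUNTER/LATCH/LATCHED roles come from the message; the struct, the DEMO_* names, and the bit value are hypothetical.

```c
#include <stdint.h>

/* Hypothetical bit value; the real SIRFSOC_TIMER_LATCH_BIT may differ. */
#define DEMO_TIMER_LATCH_BIT 0x1u

/* Software model of the timer block, not a real register layout. */
struct demo_timer {
	uint64_t counter; /* free-running COUNTER */
	uint64_t latched; /* LATCHED: changes only via the latch operation */
};

/* Write to the LATCH register: LATCH_BIT snapshots COUNTER into LATCHED. */
static void demo_write_latch(struct demo_timer *t, uint32_t val)
{
	if (val & DEMO_TIMER_LATCH_BIT)
		t->latched = t->counter;
}

/* Direct write to LATCHED: modeled as ignored, which is why "restoring"
 * LATCHED on resume accomplishes nothing. */
static void demo_write_latched(struct demo_timer *t, uint64_t val)
{
	(void)t;
	(void)val;
}
```

In this model, a resume path only needs to restore COUNTER; LATCHED is refreshed the next time LATCH_BIT is written.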
      Signed-off-by: Barry Song <Baohua.Song@csr.com>
      Signed-off-by: Olof Johansson <olof@lixom.net>
    • libceph: fix crypto key null deref, memory leak · f0666b1a
      Sylvain Munaut authored
      Avoid crashing if the crypto key payload was NULL, as when it was not correctly
      allocated and initialized.  Also, avoid leaking it.
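The fix combines two habits: a destroy helper that tolerates a NULL key, and a destroy callback that frees the container as well as its contents. The sketch below shows that pattern under stated assumptions; `struct demo_crypto_key`, `struct demo_key`, and both functions are hypothetical stand-ins, not libceph's types.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not libceph's real types. */
struct demo_crypto_key {
	size_t len;
	unsigned char *secret; /* heap-allocated key material */
};

struct demo_key {
	void *payload; /* may be NULL if setup failed part-way */
};

/* Free the key material; a NULL key is tolerated instead of dereferenced. */
static void demo_crypto_key_destroy(struct demo_crypto_key *ck)
{
	if (ck) {
		free(ck->secret);
		ck->secret = NULL;
	}
}

/* Destroy callback: release the key material and the container itself. */
static void demo_key_destroy(struct demo_key *k)
{
	struct demo_crypto_key *ck = k->payload;

	demo_crypto_key_destroy(ck);
	free(ck); /* free(NULL) is a no-op, so a NULL payload is safe here too */
	k->payload = NULL;
}
```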
      Signed-off-by: Sylvain Munaut <tnt@246tNt.com>
      Signed-off-by: Sage Weil <sage@inktank.com>
      Reviewed-by: Alex Elder <elder@inktank.com>
    • ceph: simplify+fix atomic_open · 5ef50c3b
      Sage Weil authored
      The initial ->atomic_open op was carried over from the old intent code,
      which was incomplete and didn't really work.  Replace it with a fresh
      method.  In particular:
      
       * always attempt to do an atomic open+lookup, both for the create case
         and for lookups of existing files.
       * fix symlink handling by returning 1 to the VFS so that we can follow
         the link to its destination. This fixes a longstanding ceph bug (#2392).
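The return convention described above can be modeled as a small decision function: one request covers both creation and lookup of existing files, and a symlink result is handed back with 1 so the caller can follow it instead of opening the link itself. Everything below (`struct demo_lookup`, `demo_atomic_open()`, the enum) is a hypothetical simplification, not Ceph's or the VFS's actual types.

```c
#include <errno.h>

/* Hypothetical file types and lookup result. */
enum demo_type { DEMO_REG, DEMO_SYMLINK };

struct demo_lookup {
	int err;             /* negative errno from the server, or 0 */
	int found;           /* nonzero if the name already exists */
	enum demo_type type; /* meaningful only when found */
};

/*
 * One request serves both the create case and lookups of existing files.
 * Returns 0 if the file was opened (and sets *opened), 1 if the caller
 * should finish the open itself (e.g. follow a symlink), or a negative errno.
 */
static int demo_atomic_open(const struct demo_lookup *res, int want_create,
			    int *opened)
{
	*opened = 0;
	if (res->err)
		return res->err;
	if (!res->found && !want_create)
		return -ENOENT;
	if (res->found && res->type == DEMO_SYMLINK)
		return 1; /* do not open; let the caller resolve the link */
	*opened = 1; /* existing regular file, or a newly created one */
	return 0;
}
```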
      Signed-off-by: Sage Weil <sage@inktank.com>