  1. 07 Apr, 2009 1 commit
  2. 01 Apr, 2009 1 commit
    • epoll keyed wakeups: add __wake_up_locked_key() and __wake_up_sync_key() · 4ede816a
      Davide Libenzi authored
      This patchset introduces wakeup hints for some of the most popular (from
      epoll POV) devices, so that epoll code can avoid spurious wakeups on its
      waiters.
      
      The problem with epoll is that the callback-based wakeups do not, ATM,
      carry any information about the events the wakeup is related to.  So the
      only choice epoll has (not being able to call f_op->poll() from inside the
      callback), is to add the file* to a ready-list and resolve the real events
      later on, at epoll_wait() (or its own f_op->poll()) time.  This can cause
      spurious wakeups, since the wake_up() itself might be for an event the
      caller is not interested in.
      
      The rate of these spurious wakeups can be pretty high in case of many
      network sockets being monitored.
      
      By allowing devices to report the events the wakeups refer to (at least
      the two major classes - POLLIN/POLLOUT), we are able to spare useless
      wakeups by proper handling inside the epoll's poll callback.
      
      Epoll will in any case have to call f_op->poll() on the file* later
      on, since the change needed to have the full event set sent via
      wakeup is too invasive for the way our f_op->poll() system works
      (the full event set is calculated inside the poll function - there
      are too many of them to even start thinking about the change - and
      poll/select would need changes too).
      
      Epoll is changed in a way that both devices which send event hints, and
      the ones that don't, are correctly handled.  The former will gain some
      efficiency though.
      
      As a general rule, devices should add an event mask, by using the
      key-aware wakeup macros, when waking up their poll wait queues.  I tested it
      (together with the epoll's poll fix patch Andrew has in -mm) and wakeups
      for the supported devices are correctly filtered.
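
      As a hedged illustration of that rule, a driver's receive path could
      pass the readiness class as the wakeup key (the poll-aware wakeup
      macro name follows this patchset; dev->read_wait is a hypothetical
      wait queue field):
      
      /* tell keyed waiters (e.g. epoll) this wakeup means "readable" */
      wake_up_interruptible_poll(&dev->read_wait, POLLIN | POLLRDNORM);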
      
      Test program available here:
      
      http://www.xmailserver.org/epoll_test.c
      
      This patch:
      
      Nothing revolutionary here.  Just using the available "key" that our
      wakeup core already supports.  The __wake_up_locked_key() was a
      no-brainer, since both __wake_up_locked() and __wake_up_locked_key()
      are thin wrappers around __wake_up_common().
      
      The __wake_up_sync() function had a body, so the choice was between
      borrowing the body for __wake_up_sync_key() and calling it from
      __wake_up_sync(), or making an inline and calling it from both.  I chose the
      former since in most archs it all resolves to "mov $0, REG; jmp ADDR".
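
      A minimal sketch of the two resulting shapes, assuming the 2009-era
      signatures in kernel/sched.c (simplified; locking elided):
      
      /* no-brainer: forward the key to the common wakeup path */
      void __wake_up_locked_key(wait_queue_head_t *q, unsigned int mode,
                                void *key)
      {
              __wake_up_common(q, mode, 1, 0, key);
      }
      
      /* keep the old entry point: a keyless sync wakeup is just a keyed
         sync wakeup with a NULL key */
      void __wake_up_sync(wait_queue_head_t *q, unsigned int mode,
                          int nr_exclusive)
      {
              __wake_up_sync_key(q, mode, nr_exclusive, NULL);
      }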
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Miller <davem@davemloft.net>
      Cc: William Lee Irwin III <wli@movementarian.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 31 Mar, 2009 1 commit
    • hrtimer: fix rq->lock inversion (again) · 7f1e2ca9
      Peter Zijlstra authored
      It appears I inadvertently introduced rq->lock recursion to the
      hrtimer_start() path when I delegated running already expired
      timers to softirq context.
      
      This patch fixes it by introducing a __hrtimer_start_range_ns()
      method that will not use raise_softirq_irqoff() but
      __raise_softirq_irqoff() which avoids the wakeup.
      
      It then also changes schedule() to check for pending softirqs and
      do the wakeup then.  I'm not quite sure I like this last bit, nor
      am I convinced it's really needed.
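
      A hedged sketch of why the two raise variants differ (shapes follow
      2009-era kernel/softirq.c, simplified):
      
      /* only marks the softirq pending; safe under rq->lock */
      void __raise_softirq_irqoff(unsigned int nr)
      {
              or_softirq_pending(1UL << nr);
      }
      
      /* may wake ksoftirqd, and that wakeup takes rq->lock -- hence the
         recursion when called from the hrtimer_start() path */
      void raise_softirq_irqoff(unsigned int nr)
      {
              __raise_softirq_irqoff(nr);
              if (!in_interrupt())
                      wakeup_softirqd();
      }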
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      LKML-Reference: <20090313112301.096138802@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 30 Mar, 2009 1 commit
  5. 29 Mar, 2009 1 commit
  6. 25 Mar, 2009 11 commits
    • sched: Add comments to find_busiest_group() function · b7bb4c9b
      Gautham R Shenoy authored
      Impact: cleanup
      
      Add /** style comments around find_busiest_group(). Also add a few
      explanatory comments.
      
      This concludes the find_busiest_group() cleanup. The function is
      now down to 72 lines from the original 313 lines.
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
      LKML-Reference: <20090325091427.13992.18933.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Refactor the power savings balance code · c071df18
      Gautham R Shenoy authored
      Impact: cleanup
      
      Create separate helper functions to initialize the
      power-savings-balance related variables, to update them and
      to check if we have a scope for performing power-savings balance.
      
      Add no-op inline functions for the !(CONFIG_SCHED_MC || CONFIG_SCHED_SMT)
      case.
      
      This will eliminate all the #ifdef jungle in find_busiest_group() and the
      other helper functions.
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
      LKML-Reference: <20090325091422.13992.73616.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Optimize the !power_savings_balance during fbg() · a021dc03
      Gautham R Shenoy authored
      Impact: cleanup, micro-optimization
      
      We don't need to perform power_savings balance if either the
      cpu is NOT_IDLE or if the sched_domain doesn't have the
      SD_POWERSAVINGS_BALANCE flag set.
      
      Currently, we check for these conditions multiple times, even
      times, even though these variables don't change over the scope
      of find_busiest_group().
      
      Check once, and store the value in the already existing
      "power_savings_balance" variable.
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
      LKML-Reference: <20090325091417.13992.2657.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Create a helper function to calculate imbalance · dbc523a3
      Gautham R Shenoy authored
      Move all the imbalance calculation out of find_busiest_group()
      through this helper function.
      
      With this change, the structure of find_busiest_group() will be
      as follows:
      
      - update_sched_domain_statistics.
      
      - check if imbalance exists.
      
      - update imbalance and return busiest.
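
      A hedged skeleton of that flow (helper and struct names are the ones
      introduced in this series; arguments and corner cases simplified):
      
      static struct sched_group *
      find_busiest_group(struct sched_domain *sd, unsigned long *imbalance)
      {
              struct sd_lb_stats sds;
      
              update_sd_lb_stats(sd, &sds);      /* gather statistics */
              if (!sds.busiest)                  /* no imbalance exists */
                      return NULL;
              calculate_imbalance(&sds, imbalance);
              return sds.busiest;
      }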
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
      LKML-Reference: <20090325091411.13992.43293.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Create helper to calculate small_imbalance in fbg() · 2e6f44ae
      Gautham R Shenoy authored
      Impact: cleanup
      
      We have two places in find_busiest_group() where we need to calculate
      the minor imbalance before returning the busiest group. Encapsulate
      this functionality into a separate helper function.
      
      Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      LKML-Reference: <20090325091406.13992.54316.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Create a helper function to calculate sched_domain stats for fbg() · 37abe198
      Gautham R Shenoy authored
      Impact: cleanup
      
      Create a helper function named update_sd_lb_stats() to update the
      various sched_domain related statistics in find_busiest_group().
      
      With this, we have moved all the statistics computation out of
      find_busiest_group().
      
      Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      LKML-Reference: <20090325091401.13992.88737.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Define structure to store the sched_domain statistics for fbg() · 222d656d
      Gautham R Shenoy authored
      Impact: cleanup
      
      Currently we use a lot of local variables in find_busiest_group()
      to capture the various statistics related to the sched_domain.
      Group them together into a single data structure.
      
      This will help us to offload the job of updating the sched_domain
      statistics to a helper function.
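
      A hedged sketch of the container this describes (field set trimmed,
      comments illustrative; the real struct sd_lb_stats in the patch
      carries more fields):
      
      struct sd_lb_stats {
              struct sched_group *busiest; /* busiest group in this sd */
              struct sched_group *this;    /* group containing this cpu */
              unsigned long total_load;    /* total load of all groups */
              unsigned long total_pwr;     /* total cpu power of groups */
              unsigned long avg_load;      /* avg load across all groups */
      };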
      
      Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      LKML-Reference: <20090325091356.13992.25970.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Create a helper function to calculate sched_group stats for fbg() · 1f8c553d
      Gautham R Shenoy authored
      Impact: cleanup
      
      Create a helper function named update_sg_lb_stats() which
      can be invoked to calculate the individual group's statistics
      in find_busiest_group().
      
      This reduces the length of find_busiest_group() considerably.
      
      Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      LKML-Reference: <20090325091351.13992.43461.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Define structure to store the sched_group statistics for fbg() · 381be78f
      Gautham R Shenoy authored
      Impact: cleanup
      
      Currently a whole bunch of variables are used to store the
      various statistics pertaining to the groups we iterate over
      in find_busiest_group().
      
      Group them together in a single data structure and add
      appropriate comments.
      
      This will be useful later on when we create helper functions
      to calculate the sched_group statistics.
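
      A hedged sketch of that per-group container (field names follow the
      series, comments illustrative, non-essential fields omitted):
      
      struct sg_lb_stats {
              unsigned long avg_load;          /* avg load per group cpu */
              unsigned long group_load;        /* total group load */
              unsigned long sum_nr_running;    /* tasks in the group */
              unsigned long sum_weighted_load; /* weighted group load */
      };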
      
      Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      LKML-Reference: <20090325091345.13992.20099.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Fix indentations in find_busiest_group() using gotos · 6dfdb062
      Gautham R Shenoy authored
      Impact: cleanup
      
      Some indentations in find_busiest_group() can be minimized by using
      early exits with the help of gotos. This improves readability in
      a couple of cases.
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
      LKML-Reference: <20090325091340.13992.45062.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Simple helper functions for find_busiest_group() · 67bb6c03
      Gautham R Shenoy authored
      Impact: cleanup
      
      Currently the load idx calculation code is in find_busiest_group().
      Move that to a static inline helper function.
      
      Similarly, to find the first cpu of a sched_group we use
      cpumask_first(sched_group_cpus(group))
      
      Use a helper for that.  It improves readability in some cases.
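
      A minimal sketch of that helper (group_first_cpu is the name this
      commit uses; treat the exact form as illustrative):
      
      /* return the first cpu in a sched_group's cpumask */
      static inline unsigned int group_first_cpu(struct sched_group *group)
      {
              return cpumask_first(sched_group_cpus(group));
      }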
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
      LKML-Reference: <20090325091335.13992.55424.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  7. 24 Mar, 2009 1 commit
  8. 19 Mar, 2009 2 commits
  9. 17 Mar, 2009 2 commits
  10. 13 Mar, 2009 2 commits
  11. 11 Mar, 2009 1 commit
    • sched: add avg_overlap decay · df1c99d4
      Mike Galbraith authored
      Impact: more precise avg_overlap metric - better load-balancing
      
      avg_overlap is used to measure the runtime overlap of the waker and
      wakee.
      
      However, when a process changes behaviour, e.g. a pipe becomes
      un-congested and we don't need to go to sleep after a wakeup
      for a while, the avg_overlap value grows stale.
      
      When running, we use the average runtime between preemptions as
      a measure for avg_overlap, since the amount of runtime can be
      correlated with cache footprint.
      
      The longer we run, the less likely we'll be wanting to be
      migrated to another CPU.
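
      A hedged sketch of the decay step (shapes follow this commit;
      update_avg() is a one-eighth exponential average):
      
      static void update_avg(u64 *avg, u64 sample)
      {
              s64 diff = sample - *avg;
              *avg += diff >> 3;      /* decay toward the new sample */
      }
      
      /* on preemption while still TASK_RUNNING, feed the runtime since
         the last preemption into avg_overlap */
      if (prev->state == TASK_RUNNING) {
              u64 runtime = prev->se.sum_exec_runtime -
                            prev->se.prev_sum_exec_runtime;
      
              runtime = min_t(u64, runtime, 2 * sysctl_sched_migration_cost);
              update_avg(&prev->se.avg_overlap, runtime);
      }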
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1236709131.25234.576.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  12. 10 Mar, 2009 1 commit
  13. 06 Mar, 2009 1 commit
  14. 05 Mar, 2009 1 commit
    • sched: don't rebalance if attached on NULL domain · 8a0be9ef
      Frederic Weisbecker authored
      Impact: fix function graph trace hang / drop pointless softirq on UP
      
      While debugging a function graph trace hang on an old PII, I saw
      that it consumed most of its time in the timer interrupt, and
      the domain rebalancing softirq was the main culprit.
      
      The timer interrupt calls trigger_load_balance() which will
      decide if it is worth scheduling a rebalancing softirq.
      
      In the case of a builtin UP kernel, no problem arises because
      there is no domain question.
      
      In the case of a builtin SMP kernel running on an SMP box, still
      no problem: the softirq will be raised each time we reach the
      next_balance time.
      
      In the case of a builtin SMP kernel running on a UP box (most
      distros provide default SMP kernels, whatever the box you have),
      the CPU is attached to the NULL sched domain.  So a kind of
      unexpected behaviour happens:
      
      trigger_load_balance() raises the rebalancing softirq; later, in
      softirq context, run_rebalance_domains() -> rebalance_domains(),
      where the for_each_domain(cpu, sd) loop is not taken because of
      the NULL domain we are attached to.  Which means rq->next_balance
      is never updated.  So on the next timer tick, we will enter
      trigger_load_balance(), which will always re-raise the rebalancing
      softirq:
      
      if (time_after_eq(jiffies, rq->next_balance))
      	raise_softirq(SCHED_SOFTIRQ);
      
      So for each tick, we process this pointless softirq.
      
      This patch fixes it by checking if we are attached to the NULL
      domain before raising the softirq; another possible fix would be
      to set rq->next_balance to the maximal possible jiffies value when
      we are attached to the NULL domain.
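
      A hedged sketch of the guard (on_null_domain() is the helper this
      patch introduces; surrounding code simplified):
      
      /* true when the cpu's runqueue has no sched domain attached */
      static inline int on_null_domain(int cpu)
      {
              return !rcu_dereference(cpu_rq(cpu)->sd);
      }
      
      /* in trigger_load_balance(): don't raise a softirq that can never
         make progress */
      if (time_after_eq(jiffies, rq->next_balance) &&
          likely(!on_null_domain(cpu)))
              raise_softirq(SCHED_SOFTIRQ);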
      
      v2: build fix on UP
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <49af242d.1c07d00a.32d5.ffffc019@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  15. 02 Mar, 2009 1 commit
  16. 27 Feb, 2009 1 commit
  17. 26 Feb, 2009 2 commits
  18. 25 Feb, 2009 1 commit
    • generic-ipi: remove CSD_FLAG_WAIT · 6e275637
      Peter Zijlstra authored
      Oleg noticed that we don't strictly need CSD_FLAG_WAIT; rework
      the code so that we can use CSD_FLAG_LOCK for both purposes.
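
      A hedged sketch of the resulting pattern (details of kernel/smp.c
      simplified): a synchronous caller no longer tests a dedicated WAIT
      flag, it just spins until the callee drops CSD_FLAG_LOCK.
      
      /* wait for the remote cpu to release the call data */
      if (wait)
              while (data->flags & CSD_FLAG_LOCK)
                      cpu_relax();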
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  19. 20 Feb, 2009 1 commit
  20. 16 Feb, 2009 1 commit
  21. 15 Feb, 2009 1 commit
  22. 12 Feb, 2009 1 commit
  23. 11 Feb, 2009 1 commit
  24. 05 Feb, 2009 1 commit
    • wait: prevent exclusive waiter starvation · 777c6c5f
      Johannes Weiner authored
      With exclusive waiters, every process woken up through the wait queue must
      ensure that the next waiter down the line is woken when it has finished.
      
      Interruptible waiters don't do that when aborting due to a signal.  And if
      an aborting waiter is concurrently woken up through the waitqueue, no one
      will ever wake up the next waiter.
      
      This has been observed with __wait_on_bit_lock() used by
      lock_page_killable(): the first contender on the queue was aborting when
      the actual lock holder woke it up concurrently.  The aborted contender
      didn't acquire the lock and therefore never did an unlock followed by
      waking up the next waiter.
      
      Add abort_exclusive_wait() which removes the process' wait descriptor from
      the waitqueue, iff still queued, or wakes up the next waiter otherwise.
      It does so under the waitqueue lock.  Racing with a wake up means the
      aborting process is either already woken (removed from the queue) and will
      wake up the next waiter, or it will remove itself from the queue and the
      concurrent wake up will apply to the next waiter after it.
      
      Use abort_exclusive_wait() in __wait_event_interruptible_exclusive() and
      __wait_on_bit_lock() when they were interrupted by other means than a wake
      up through the queue.
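
      A hedged sketch of the helper as described above (close to the shape
      added to kernel/wait.c; details simplified):
      
      void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait,
                                unsigned int mode, void *key)
      {
              unsigned long flags;
      
              __set_current_state(TASK_RUNNING);
              spin_lock_irqsave(&q->lock, flags);
              if (!list_empty(&wait->task_list))
                      list_del_init(&wait->task_list);    /* still queued */
              else if (waitqueue_active(q))
                      __wake_up_locked_key(q, mode, key); /* pass it on */
              spin_unlock_irqrestore(&q->lock, flags);
      }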
      
      [akpm@linux-foundation.org: coding-style fixes]
      Reported-by: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Mentored-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Chuck Lever <cel@citi.umich.edu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>		["after some testing"]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  25. 04 Feb, 2009 1 commit
    • sched: fix nohz load balancer on cpu offline · 483b4ee6
      Suresh Siddha authored
      Christian Borntraeger reports:
      
      > After a logical cpu offline, even on a complete idle system, there
      > is one cpu with full ticks. It turns out that nohz.cpu_mask has
      > the offlined cpu still set.
      >
      > In select_nohz_load_balancer() we check if the system is completely
      > idle to turn off load balancing. We compare cpu_online_map with
      > nohz.cpu_mask.  Since cpu_online_map is updated on cpu unplug,
      > but nohz.cpu_mask is not, the check fails and the scheduler believes
      > that we need an "idle load balancer" even on a fully idle system.
      > Since the ilb cpu does not deactivate the timer tick this breaks NOHZ.
      
      Fix the select_nohz_load_balancer() to not set the nohz.cpu_mask
      while a cpu is going offline.
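
      A hedged sketch of the guard (the cpu_active() test mirrors this
      fix; the resign step is simplified):
      
      /* in select_nohz_load_balancer(): an offlining cpu must neither
         enter nohz.cpu_mask nor stay on as the idle load balancer */
      if (!cpu_active(cpu)) {
              if (atomic_read(&nohz.load_balancer) == cpu)
                      atomic_set(&nohz.load_balancer, -1);
              return 0;
      }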
      Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  26. 01 Feb, 2009 1 commit