  1. 28 Aug, 2009 1 commit
    • sched: Fix division by zero - really · 34d76c41
      Peter Zijlstra authored
      When re-computing the shares for each task group's cpu
      representation we need the ratio of weight on each cpu vs the
      total weight of the sched domain.
      
      Since load-balancing is loosely (read not) synchronized, the
      weight of individual cpus can change between doing the sum and
      calculating the ratio.
      
      The previous patch dealt with only one of the race scenarios,
      this patch side steps them all by saving a snapshot of all the
      individual cpu weights, thereby always working on a consistent
      set.
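
      As a rough sketch of the idea (plain C with illustrative names -
      the real update_shares() path uses a per-cpu scratch buffer rather
      than an on-stack array, and the helper's signature differs):

        /*
         * Take one snapshot of every cpu's weight, then compute both the
         * sum and the per-cpu ratios from that same snapshot, so that a
         * concurrent weight change can never make a cpu's weight exceed
         * the sum, or make the sum zero while a weight is non-zero.
         */
        unsigned long weight[NR_CPUS];  /* per-cpu scratch in the real code */
        unsigned long sum = 0;
        int i;

        for_each_cpu(i, sched_domain_span(sd)) {
                weight[i] = tg->cfs_rq[i]->load.weight;
                sum += weight[i];
        }

        if (!sum)
                sum = 1;        /* never divide by zero */

        for_each_cpu(i, sched_domain_span(sd))
                update_group_shares_cpu(tg, i, shares, sum, weight);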
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: torvalds@linux-foundation.org
      Cc: jes@sgi.com
      Cc: jens.axboe@oracle.com
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <1251371336.18584.77.camel@twins>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 21 Aug, 2009 1 commit
  3. 20 Aug, 2009 1 commit
  4. 02 Aug, 2009 6 commits
  5. 24 Jul, 2009 1 commit
  6. 18 Jul, 2009 6 commits
  7. 10 Jul, 2009 4 commits
  8. 29 Jun, 2009 1 commit
  9. 19 Jun, 2009 1 commit
    • perf_counter: Simplify and fix task migration counting · e5289d4a
      Peter Zijlstra authored
      The task migrations counter was causing rare and hard-to-decipher
      memory corruptions under load. After a day of debugging and bisection
      we found that the problem was introduced with:
      
        3f731ca6: perf_counter: Fix cpu migration counter
      
      Turning them off fixes the crashes. Incidentally, the whole
      perf_counter_task_migration() logic can be done simpler as well,
      by injecting a proper sw-counter event.
      
      This cleanup also fixed the crashes. The precise failure mode is
      not completely clear yet, but we are clearly not unhappy about
      having a fix ;-)
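
      Roughly (sketch only - the helper name, enum value and argument
      order are assumptions, not a quote of this patch), the migration
      hook in the scheduler then reduces to injecting one software
      event at the point where the task actually changes cpu:

        /* illustrative hook, called where the task's cpu is updated */
        static inline void count_task_migration(struct task_struct *p,
                                                unsigned int old_cpu,
                                                unsigned int new_cpu)
        {
                if (old_cpu != new_cpu)
                        perf_swcounter_event(PERF_COUNT_SW_CPU_MIGRATIONS,
                                             1, 1, NULL, 0);
        }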
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  10. 18 Jun, 2009 1 commit
    • kthreads: simplify migration_thread() exit path · 371cbb38
      Oleg Nesterov authored
      Now that kthread_stop() can be used even if the task has already exited,
      we can kill the "wait_to_die:" loop in migration_thread().  But we must
      pin rq->migration_thread after creation.
      
      Actually, I don't think CPU_UP_CANCELED or CPU_DEAD should wait for
      ->migration_thread exit.  Perhaps we can simplify this code a bit more.
      migration_call() can set ->should_stop and forget about this thread.  But
      we need a new helper in kthread.c for that.
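
      The shape of the simplified loop, as a sketch (the real
      migration_thread() also drains its request list; only the exit
      path matters here):

        static int migration_thread(void *data)
        {
                set_current_state(TASK_INTERRUPTIBLE);
                while (!kthread_should_stop()) {
                        /* ... handle queued migration requests ... */
                        schedule();
                        set_current_state(TASK_INTERRUPTIBLE);
                }
                __set_current_state(TASK_RUNNING);
                return 0;       /* kthread_stop() reaps us, no wait_to_die: loop */
        }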
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Vitaliy Gusev <vgusev@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 17 Jun, 2009 3 commits
  12. 15 Jun, 2009 1 commit
    • sched: Introduce SCHED_RESET_ON_FORK scheduling policy flag · ca94c442
      Lennart Poettering authored
      This patch introduces a new flag SCHED_RESET_ON_FORK which can be passed
      to the kernel via sched_setscheduler(), ORed in the policy parameter.
      If set, this will make sure that when the process forks: a) the
      scheduling priority is reset to DEFAULT_PRIO if it was higher, and
      b) the scheduling policy is reset to SCHED_NORMAL if it was either
      SCHED_FIFO or SCHED_RR.
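
      A sketch of the fork-side effect (illustrative, assuming a
      per-task sched_reset_on_fork flag; not the exact sched_fork()
      hunk):

        if (p->sched_reset_on_fork) {
                if (p->policy == SCHED_FIFO || p->policy == SCHED_RR)
                        p->policy = SCHED_NORMAL;
                if (p->prio < DEFAULT_PRIO)     /* lower value = higher prio */
                        p->prio = DEFAULT_PRIO;
        }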
      
      Why have this?
      
      Currently, if a process is real-time scheduled this will 'leak' to all
      its child processes. For security reasons it is often (always?) a good
      idea to make sure that if a process acquires RT scheduling this is
      confined to this process and only this process. More specifically this
      makes the per-process resource limit RLIMIT_RTTIME useful for security
      purposes, because it makes it impossible to use a fork bomb to
      circumvent the per-process RLIMIT_RTTIME accounting.
      
      This feature is also useful for tools like 'renice' which can then
      change the nice level of a process without having this spill to all its
      child processes.
      
      Why expose this via sched_setscheduler() and not other syscalls such as
      prctl() or sched_setparam()?
      
      prctl() does not take a pid parameter. Because of that, it would be
      impossible to modify this flag for processes other than the current one.
      
      The struct passed to sched_setparam() can unfortunately not be extended
      without breaking compatibility, since sched_setparam() lacks a size
      parameter.
      
      How to use this from userspace? In your RT program simply replace this:
      
        sched_setscheduler(pid, SCHED_FIFO, &param);
      
      by this:
      
        sched_setscheduler(pid, SCHED_FIFO|SCHED_RESET_ON_FORK, &param);
      Signed-off-by: Lennart Poettering <lennart@poettering.net>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090615152714.GA29092@tango.0pointer.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  13. 12 Jun, 2009 1 commit
  14. 11 Jun, 2009 3 commits
  15. 02 Jun, 2009 2 commits
    • perf_counter: Fix cpu migration counter · 3f731ca6
      Paul Mackerras authored
      This fixes the cpu migration software counter to count
      correctly even when contexts get swapped from one task to
      another.  Previously the cpu migration counts reported by perf
      stat were bogus, ranging from negative to several thousand for
      a single "lat_ctx 2 8 32" run.  With this patch the cpu
      migration count reported for "lat_ctx 2 8 32" is almost always
      between 35 and 44.
      
      This fixes the problem by adding a call into the perf_counter
      code from set_task_cpu when tasks are migrated.  This enables
      us to use the generic swcounter code (with some modifications)
      for the cpu migration counter.
      
      This modifies the swcounter code to allow a NULL regs pointer
      to be passed in to perf_swcounter_ctx_event() etc.  The cpu
      migration counter does this because there isn't necessarily a
      pt_regs struct for the task available.  In this case, the
      counter will not have interrupt capability - but the migration
      counter didn't have interrupt capability before, so this is no
      loss.
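
      In sketch form (function and field names here are illustrative,
      not the exact patch), the swcounter path simply treats a NULL
      regs pointer as "count only, never sample":

        static void swcounter_add(struct perf_counter *counter, u64 nr,
                                  struct pt_regs *regs)
        {
                atomic64_add(nr, &counter->count);

                /* no pt_regs (e.g. a migration noted in set_task_cpu):
                 * no sampling/interrupt capability, just the count */
                if (regs && swcounter_overflow(counter))
                        swcounter_record_sample(counter, regs);
        }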
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <18979.35006.819769.416327@cargo.ozlabs.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter: Initialize per-cpu context earlier on cpu up · f38b0820
      Paul Mackerras authored
      This arranges for perf_counter's notifier for cpu hotplug
      operations to be called earlier than the migration notifier in
      sched.c by increasing its priority to 20, compared to the 10
      for the migration notifier.  The reason for doing this is that
      a subsequent commit to convert the cpu migration counter to use
      the generic swcounter infrastructure will add a call into the
      perf_counter subsystem when tasks get migrated.  Therefore the
      perf_counter subsystem needs a chance to initialize its per-cpu
      data for the new cpu before it can get called from the
      migration code.
      
      This also adds a comment to the migration notifier noting that
      its priority needs to be lower than that of the perf_counter
      notifier.
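
      Expressed purely through notifier priorities, roughly (callback
      names assumed; the struct fields are the standard notifier_block
      ones):

        /* perf_counter's hotplug notifier: runs first on CPU_UP_PREPARE */
        static struct notifier_block __cpuinitdata perf_cpu_nb = {
                .notifier_call  = perf_cpu_notify,
                .priority       = 20,
        };

        /* sched.c migration notifier: must stay below perf_counter's */
        static struct notifier_block __cpuinitdata migration_notifier = {
                .notifier_call  = migration_call,
                .priority       = 10,
        };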
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <18981.1900.792795.836858@cargo.ozlabs.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  16. 23 May, 2009 1 commit
    • perf_counter: Fix dynamic irq_period logging · e220d2dc
      Peter Zijlstra authored
      We call perf_adjust_freq() from perf_counter_task_tick(), which
      is called under the rq->lock, causing lock recursion. However,
      perf_adjust_freq() no longer needs to be called under the
      rq->lock, so move the call out from under it.
      
      Also, fix up some related comments.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <20090523163012.476197912@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  17. 22 May, 2009 1 commit
    • perf_counter: Optimize context switch between identical inherited contexts · 564c2b21
      Paul Mackerras authored
      When monitoring a process and its descendants with a set of inherited
      counters, we can often get the situation in a context switch where
      both the old (outgoing) and new (incoming) process have the same set
      of counters, and their values are ultimately going to be added together.
      In that situation it doesn't matter which set of counters is used to
      count the activity for the new process, so there is really no need to
      go through the process of reading the hardware counters and updating
      the old task's counters and then setting up the PMU for the new task.
      
      This optimizes the context switch in this situation.  Instead of
      scheduling out the perf_counter_context for the old task and
      scheduling in the new context, we simply transfer the old context
      to the new task and keep using it without interruption.  The new
      context gets transferred to the old task.  This means that both
      tasks still have a valid perf_counter_context, so no special case
      is introduced when the old task gets scheduled in again, either on
      this CPU or another CPU.
      
      The equivalence of contexts is detected by keeping a pointer in
      each cloned context pointing to the context it was cloned from.
      To cope with the situation where a context is changed by adding
      or removing counters after it has been cloned, we also keep a
      generation number on each context which is incremented every time
      a context is changed.  When a context is cloned we take a copy
      of the parent's generation number, and two cloned contexts are
      equivalent only if they have the same parent and the same
      generation number.  In order that the parent context pointer
      remains valid (and is not reused), we increment the parent
      context's reference count for each context cloned from it.
      
      Since we don't have individual fds for the counters in a cloned
      context, the only thing that can make two clones of a given parent
      different after they have been cloned is enabling or disabling all
      counters with prctl.  To account for this, we keep a count of the
      number of enabled counters in each context.  Two contexts must have
      the same number of enabled counters to be considered equivalent.
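
      Taken together, the equivalence test stays cheap; a sketch under
      the description above (field names illustrative):

        static int context_equiv(struct perf_counter_context *ctx1,
                                 struct perf_counter_context *ctx2)
        {
                return ctx1->parent_ctx && ctx2->parent_ctx
                        && ctx1->parent_ctx == ctx2->parent_ctx
                        && ctx1->parent_gen == ctx2->parent_gen
                        && ctx1->nr_enabled == ctx2->nr_enabled;
        }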
      
      Here are some measurements of the context switch time as measured with
      the lat_ctx benchmark from lmbench, comparing the times obtained with
      and without this patch series:
      
      		-----Unmodified-----		With this patch series
      Counters:	none	2 HW	4H+4S	none	2 HW	4H+4S
      
      2 processes:
      Average		3.44	6.45	11.24	3.12	3.39	3.60
      St dev		0.04	0.04	0.13	0.05	0.17	0.19
      
      8 processes:
      Average		6.45	8.79	14.00	5.57	6.23	7.57
      St dev		1.27	1.04	0.88	1.42	1.46	1.42
      
      32 processes:
      Average		5.56	8.43	13.78	5.28	5.55	7.15
      St dev		0.41	0.47	0.53	0.54	0.57	0.81
      
      The numbers are the mean and standard deviation of 20 runs of
      lat_ctx.  The "none" columns are lat_ctx run directly without any
      counters.  The "2 HW" columns are with lat_ctx run under perfstat,
      counting cycles and instructions.  The "4H+4S" columns are lat_ctx run
      under perfstat with 4 hardware counters and 4 software counters
      (cycles, instructions, cache references, cache misses, task
      clock, context switch, cpu migrations, and page faults).
      
      [ Impact: performance optimization of counter context-switches ]
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <18966.10666.517218.332164@cargo.ozlabs.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  18. 19 May, 2009 1 commit
  19. 15 May, 2009 2 commits
    • sched, timers: cleanup avenrun users · 2d02494f
      Thomas Gleixner authored
      avenrun is a rough estimate, so we don't have to worry about
      consistency of the three avenrun values. Remove the xtime lock
      dependency and provide a function to scale the values. Clean up
      the users.
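
      A sketch of such a scaling helper (the fixed-point format of
      avenrun[] is the long-standing one; treat the exact signature as
      an assumption):

        /*
         * avenrun[] holds the 1/5/15-minute load averages in fixed
         * point (FSHIFT fractional bits). Hand out scaled/offset
         * copies without any locking - they are estimates anyway.
         */
        void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
        {
                loads[0] = (avenrun[0] + offset) << shift;
                loads[1] = (avenrun[1] + offset) << shift;
                loads[2] = (avenrun[2] + offset) << shift;
        }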
      
      [ Impact: cleanup ]
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
    • sched, timers: move calc_load() to scheduler · dce48a84
      Thomas Gleixner authored
      Dimitri Sivanich noticed that xtime_lock is held write locked across
      calc_load() which iterates over all online CPUs. That can cause long
      latencies for xtime_lock readers on large SMP systems. 
      
      The load average calculation is a rough estimate anyway, so there is
      no real need to protect the readers against the update. It's not a problem
      when the avenrun array is updated while a reader copies the values.
      
      Instead of iterating over all online CPUs let the scheduler_tick code
      update the number of active tasks shortly before the avenrun update
      happens. The avenrun update itself is handled by the CPU which calls
      do_timer().
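
      In sketch form (names as commonly used for this code, treat them
      as illustrative): each cpu folds the delta of its runnable plus
      uninterruptible tasks into one global atomic from its own tick,
      and only the do_timer() cpu folds that into avenrun[]:

        static atomic_long_t calc_load_tasks;

        /* per-cpu, called from the scheduler tick */
        static void calc_load_account_active(struct rq *this_rq)
        {
                long nr_active, delta;

                nr_active = this_rq->nr_running + this_rq->nr_uninterruptible;
                delta = nr_active - this_rq->calc_load_active;
                this_rq->calc_load_active = nr_active;
                atomic_long_add(delta, &calc_load_tasks);
        }

        /* standard fixed-point decay: load = load*exp + active*(1 - exp) */
        static unsigned long calc_load(unsigned long load, unsigned long exp,
                                       unsigned long active)
        {
                load *= exp;
                load += active * (FIXED_1 - exp);
                return load >> FSHIFT;
        }

        /* called only on the do_timer() cpu */
        static void calc_global_load(void)
        {
                long active = atomic_long_read(&calc_load_tasks);

                active = active > 0 ? active * FIXED_1 : 0;
                avenrun[0] = calc_load(avenrun[0], EXP_1, active);
                avenrun[1] = calc_load(avenrun[1], EXP_5, active);
                avenrun[2] = calc_load(avenrun[2], EXP_15, active);
        }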
      
      [ Impact: reduce xtime_lock write locked section ]
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
  20. 13 May, 2009 2 commits
    • timers: Logic to move non pinned timers · eea08f32
      Arun R Bharadwaj authored
      * Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-16 12:11:36]:
      
      This patch migrates all non-pinned timers and hrtimers from idle CPUs
      to the current idle load balancer. Timers firing on busy CPUs are not
      migrated.
      
      While migrating hrtimers, care should be taken to check whether moving
      an hrtimer would add latency. We therefore compare the expiry of the
      hrtimer with the next timer interrupt on the target cpu and migrate the
      hrtimer only if it expires *after* the next interrupt on the target cpu.
      To do this, add a clockevents_get_next_event() helper function that
      returns the next_event of the target cpu's clock_event_device.
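
      A sketch of that check (the helper's exact signature is assumed;
      hrtimer_can_migrate() is illustrative, not a real function):

        static int hrtimer_can_migrate(struct hrtimer *timer, int target_cpu)
        {
                ktime_t expires = hrtimer_get_expires(timer);
                ktime_t next    = clockevents_get_next_event(target_cpu);

                /* moving is safe only if the target cpu will be woken
                 * by its own next timer interrupt before we expire */
                return ktime_to_ns(expires) > ktime_to_ns(next);
        }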
      
      [ tglx: cleanups and simplifications ]
      Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • timers: /proc/sys sysctl hook to enable timer migration · cd1bb94b
      Arun R Bharadwaj authored
      * Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-16 12:11:36]:
      
      This patch creates the /proc/sys sysctl interface at
      /proc/sys/kernel/timer_migration
      
      Timer migration is enabled by default.
      
      To disable timer migration (when CONFIG_SCHED_DEBUG=y):
      
        echo 0 > /proc/sys/kernel/timer_migration
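
      Internally this is an ordinary integer sysctl; a sketch of the
      table entry (the variable name sysctl_timer_migration is an
      assumption):

        int sysctl_timer_migration = 1;         /* enabled by default */

        static struct ctl_table timer_migration_ctl = {
                .procname       = "timer_migration",
                .data           = &sysctl_timer_migration,
                .maxlen         = sizeof(int),
                .mode           = 0644,
                .proc_handler   = proc_dointvec,
        };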
      Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>