Commits · 206f4a83894f93a4afc4fa82592eb195827c3378 · Kirill Smelkov / linux

24 Aug, 2004 17 commits

[PATCH] move CONFIG_SCHEDSTATS to arch/ppc64/Kconfig.debug · 206f4a83

Nathan Lynch authored Aug 23, 2004

Otherwise it shows up under "iSeries device drivers", which doesn't seem
right.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] scheduler statistics · 7394ebbd

Rick Lindsley authored Aug 23, 2004

It adds lots of CPU scheduler stats in /proc/pid/stat.  They are described in
the new Documentation/sched-stats.txt.

We were carrying this patch offline for some time, but as there's still
considerable ongoing work in this area, and as the new stats are a
configuration option, I think it's best that this capability be in the base
kernel.

Nick removed a fair amount of statistics that he wasn't using.  The full patch
gathers more information.  In particular, his patch doesn't include the code
to measure the latency between the time a process is made runnable and the
time it hits a processor which will be key to measuring interactivity changes.

He passed his changes back to me and I got finished merging his changes with
the current statistics patches just before OLS.  I believe this is largely a
superset of the patch you grabbed and should port relatively easily too.

Versions also exist for

    2.6.8-rc2
    2.6.8-rc2-mm1
    2.6.8-rc2-mm2

at
    http://eaglet.rain.com/rick/linux/schedstat/patches/

and within 24 hours at

    http://oss.software.ibm.com/linux/patches/?patch_id=730&show=all

The version below is for 2.6.8-rc2-mm2 without the staircase code and has
been compiled cleanly but not yet run.

From: Ingo Molnar <mingo@elte.hu>

this code needs a couple of cleanups before it can go into mainline:

fs/proc/array.c, fs/proc/base.c, fs/proc/proc_misc.c:

 - moved the new /proc/<PID>/stat fields to /proc/<PID>/schedstat,
   because the new fields break older procps. It's cleaner this way
   anyway. This moving of fields necessitated a bump to version 10.
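
A minimal user-space sketch of reading the per-task file mentioned above. It assumes a kernel built with CONFIG_SCHEDSTATS and a three-field layout (time on the CPU, time waiting on a runqueue, number of timeslices run), which is how later versions of sched-stats.txt describe it; the field meanings and the helper names `parse_schedstat` and `read_schedstat` are assumptions, not part of the patch.

```python
from typing import NamedTuple


class SchedStat(NamedTuple):
    run_time: int    # assumed: time spent on the CPU
    wait_time: int   # assumed: time spent waiting on a runqueue
    timeslices: int  # assumed: number of timeslices run on the CPU


def parse_schedstat(text: str) -> SchedStat:
    """Parse the whitespace-separated contents of /proc/<PID>/schedstat."""
    fields = text.split()
    if len(fields) < 3:
        raise ValueError("expected at least 3 fields, got %d" % len(fields))
    # Ignore any extra fields a different schedstat version might append.
    return SchedStat(int(fields[0]), int(fields[1]), int(fields[2]))


def read_schedstat(pid: int) -> SchedStat:
    """Read and parse the file for one PID (hypothetical convenience helper)."""
    with open("/proc/%d/schedstat" % pid) as fh:
        return parse_schedstat(fh.read())
```

Keeping the parser separate from the file read means an older procps-style tool that only reads /proc/<PID>/stat is unaffected, which is the point of moving the fields.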

Documentation/sched-stats.txt:

 - updated sched-stats.txt for version 10

 - wake_up_forked_thread() => wake_up_new_task()

 - updated the per-process field description

Kconfig:

 - removed the default y and made the option dependent on DEBUG_KERNEL.
   This is really for scheduler analysis, normal users don't need the
   overhead.

include/linux/sched.h:

 - moved the definitions into kernel/sched.c - this fixes UP compilation
   and is cleaner.

 - also moved the sched-domain definitions to sched.c - now that the 
   sched-domains internals are not exposed to architectures this is
   doable. It's also necessary due to the previous change.

kernel/fork.c:

 - moved the ->sched_info init to sched_fork() where it belongs.

kernel/sched.c:

 - wake_up_forked_thread() -> wake_up_new_task(), wuft_cnt -> wunt_cnt,
   wuft_moved -> wunt_moved.

 - wunt_cnt and wunt_moved were defined but never updated - added the
   missing code to wake_up_new_task().

 - whitespace/style police

 - removed whitespace changes done to code not related to schedstats -
   i'll send a separate patch for these (and more).
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] sched: adjust p4 per-cpu gain · 8399dc16

Con Kolivas authored Aug 23, 2004

The smt-nice handling is a little too aggressive because it does not estimate
the per cpu gain as high enough for Pentium 4 hyperthreading. This patch
changes the per sibling cpu gain from 15% to 25%. The true per cpu gain is
entirely dependent on the workload but overall the 2 species of Pentium4 that
support hyperthreading have about 20-30% gain.
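
To illustrate why a higher gain estimate makes the handling less aggressive, here is a small sketch. It assumes the smt-nice check discounts the running task's timeslice by the per-cpu gain and holds back the sibling's candidate task only if the discounted value still exceeds the candidate's timeslice. The function name and the exact condition are assumptions for illustration, not a copy of the kernel code.

```python
def sibling_should_wait(running_slice_ms: int,
                        candidate_slice_ms: int,
                        per_cpu_gain: int) -> bool:
    """Return True if the candidate on the sibling CPU would be held back.

    per_cpu_gain is a percentage (15 before this patch, 25 after it).
    """
    # Discount the running task's timeslice by the estimated SMT gain.
    discounted = running_slice_ms * (100 - per_cpu_gain) // 100
    return discounted > candidate_slice_ms


# Illustrative timeslices: 100 ms running task, 80 ms candidate task.
held_back_at_15 = sibling_should_wait(100, 80, 15)
held_back_at_25 = sibling_should_wait(100, 80, 25)
```

Under these assumptions the candidate is held back with a 15% gain but allowed to run with a 25% gain.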

P.S: Anton - For the power processors that are now using this SMT nice
infrastructure it would be worth setting this value separately at 40%.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Create cpu_sibling_map for PPC64 · 0c5af7c6

Matthew Dobson authored Aug 23, 2004

In light of some proposed changes in the sched_domains code, I coded up
this little ditty that simply creates and populates a cpu_sibling_map for
PPC64 machines. The patch just checks the CPU flags to determine if the
CPU supports SMT (aka Hyper-Threading aka Multi-Threading aka ...) and
fills in a mask of the siblings for each CPU in the system. This should
allow us to build sched_domains for PPC64 with generic code in
kernel/sched.c for the SMT systems. SMT is becoming more popular and is
turning up in more and more architectures. I don't think it will be too
long until this feature is supported by most arches...
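
A rough sketch of the idea, assuming hardware threads of one core are numbered consecutively and every core has the same thread count. `build_sibling_map` and its parameters are hypothetical; the real patch reads CPU feature flags instead of taking a thread count.

```python
from typing import List


def build_sibling_map(nr_cpus: int, threads_per_core: int) -> List[int]:
    """Return one bitmask per CPU with a bit set for each sibling thread."""
    sibling_map = []
    for cpu in range(nr_cpus):
        # First hardware thread of the core this CPU belongs to.
        first = cpu - (cpu % threads_per_core)
        mask = 0
        for sib in range(first, min(first + threads_per_core, nr_cpus)):
            mask |= 1 << sib
        sibling_map.append(mask)
    return sibling_map
```

With `threads_per_core=1` (no SMT) each CPU's mask contains only itself, so generic code can use the map unconditionally.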
Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] sched: isolated sched domains · 6f4c30b1

Dimitri Sivanich authored Aug 23, 2004

Here's a version of the isolated scheduler domain code that I mentioned in
an RFC on 7/22.  This patch applies on top of 2.6.8-rc2-mm1 (to include all
of the new arch_init_sched_domain code).  This patch also contains the 2
line fix to remove the check of first_cpu(sd->groups->cpumask) that Jesse
sent in earlier.

Note that this has not been tested with CONFIG_SCHED_SMT.  I hope that my
handling of those instances is OK.
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] sched: limit cpuspan of node scheduler domains · c183e253

Jesse Barnes authored Aug 23, 2004

  This patch limits the cpu span of each node's scheduler domain to prevent
  balancing across too many cpus.  The cpus included in a node's domain are
  determined by the SD_NODES_PER_DOMAIN define and the arch specific
  sched_domain_node_span routine if ARCH_HAS_SCHED_DOMAIN is defined.  If
  ARCH_HAS_SCHED_DOMAIN is not defined, behavior is unchanged--all possible
  cpus will be included in each node's scheduling domain.  Currently, only
  ia64 provides an arch specific sched_domain_node_span routine.

From: Jesse Barnes <jbarnes@engr.sgi.com>

  This patch adds some more NUMA specific logic to the creation of scheduler
  domains.  Domains spanning all CPUs in a large system are too large to
  schedule across efficiently, leading to livelocks and inordinate amounts of
  time being spent in scheduler routines.  With this patch applied, the node
  scheduling domains for NUMA platforms will only contain a specified number
  of nearby CPUs, based on the value of SD_NODES_PER_DOMAIN.  It also allows
  arches to override SD_NODE_INIT, which sets the domain scheduling parameters
  for each node's domain.  This is necessary especially for large systems.
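
One way to picture the span computation is the sketch below. It assumes a node distance table is available and that the span is simply the CPUs of the SD_NODES_PER_DOMAIN nearest nodes; the data structures and the tie-breaking rule are assumptions for illustration, not the ia64 implementation.

```python
from typing import Dict, List, Set


def sched_domain_node_span(node: int,
                           node_distance: List[List[int]],
                           node_to_cpus: Dict[int, Set[int]],
                           nodes_per_domain: int) -> Set[int]:
    """Return the CPUs of the `nodes_per_domain` nodes nearest to `node`."""
    # Sort nodes by distance from `node`; break ties by node number.
    order = sorted(range(len(node_distance)),
                   key=lambda n: (node_distance[node][n], n))
    span: Set[int] = set()
    for n in order[:nodes_per_domain]:
        span |= node_to_cpus[n]
    return span


# Hypothetical 4-node ring, two CPUs per node.
distance = [
    [10, 20, 30, 20],
    [20, 10, 20, 30],
    [30, 20, 10, 20],
    [20, 30, 20, 10],
]
cpus = {n: {2 * n, 2 * n + 1} for n in range(4)}
span_for_node0 = sched_domain_node_span(0, distance, cpus, 2)
```

Limiting the span this way keeps each node's balancing domain small regardless of the total CPU count.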

  Possible future directions:

  o multilevel node hierarchy (e.g.  node domains could contain 4 nodes
    worth of CPUs, supernode domains could contain 32 nodes worth, etc.  each
    with their own SD_NODE_INIT values)

  o more tweaking of SD_NODE_INIT values for good load balancing vs. 
    overhead tradeoffs

From: mita akinobu <amgta@yacht.ocn.ne.jp>

  Compile fix
Signed-off-by: Jesse Barnes <jbarnes@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] sched: consolidate sched domains · 8a7a2318

Nick Piggin authored Aug 23, 2004

  Teach the generic domains builder about SMT, and consolidate all
  architecture specific domain code into that.  Also, the SD_*_INIT macros can
  now be redefined by arch code without duplicating the entire setup code. 
  This can be done by defining ARCH_HAS_SCHED_TUNE.

  The generic builder has been simplified with the addition of a helper
  macro which will probably prove to be useful to arch specific code as well
  and should be exported if that is the case.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>

From: Matthew Dobson <colpatch@us.ibm.com>

  The attached patch is against 2.6.8-rc2-mm2, and removes Nick's
  conditional definition & population of cpu_sibling_map[] in favor of my
  unconditional ones.  This does not affect how cpu_sibling_map is used, just
  gives it broader scope.

From: Nick Piggin <nickpiggin@yahoo.com.au>

  Small fix to sched-consolidate-domains.patch picked up by

From: Suresh <suresh.b.siddha@intel.com>

  another sched consolidate domains fix

From: Nick Piggin <nickpiggin@yahoo.com.au>

  Don't use cpu_sibling_map if !CONFIG_SCHED_SMT

  This one spotted by Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] sched: fork hotplug handling cleanup · c62e7cdb

Ingo Molnar authored Aug 23, 2004

- remove the hotplug lock from around much of fork(), and re-copy the
  cpus_allowed mask to solve the hotplug race cleanly.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] sched: remove balance on clone · c15d3bea

Nick Piggin authored Aug 23, 2004

This removes balance on clone capability altogether.  I told Andi we wouldn't
remove it yet, but provided it is in a single small patch, he mightn't get too
upset.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] sched: disable balance on clone · b4f14b64

Nick Piggin authored Aug 23, 2004

Don't balance on clone by default.

Balance on clone has a number of trivial performance failure cases, but it was
needed to get decent OpenMP performance on NUMA (Opteron) systems.  Not doing
child-runs-first for new threads also solves this problem in a nicer way
(implemented in a previous patch).
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] sched: sched misc changes · 8a78765b

Nick Piggin authored Aug 23, 2004

Add some likely/unlikelies, a for_each_cpu => for_each_cpu_online, and close
the sched_exit race.

From: Ingo Molnar <mingo@elte.hu>

  fix a typo in a previous patch breaking RT scheduling & interactivity.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] sched: make rt_task unlikely · 0df0d063

Nick Piggin authored Aug 23, 2004

From: Ingo Molnar <mingo@elte.hu>

RT tasks are unlikely, move this into rt_task() instead of open-coding it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] sched: misc cleanups #2 · ce9bb66d

Ingo Molnar authored Aug 23, 2004

 - fix two stale comments
 - cleanup
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] kernel thread idle fix · 49717553

Nick Piggin authored Aug 23, 2004

Now that init_idle does not remove tasks from the runqueue, those
architectures that use kernel_thread instead of copy_process for the idle
task will break.  To fix, ensure that CLONE_IDLETASK tasks are not put on
the runqueue in the first place.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] sched: cleanup, improve sched <=> fork APIs · 3632d86a

Nick Piggin authored Aug 23, 2004

Move balancing and child-runs-first logic from fork.c into sched.c where
it belongs.

* Consolidate wake_up_forked_process and wake_up_forked_thread into
  wake_up_new_process, and pass in clone_flags as suggested by Linus.  This
  removes a lot of code duplication and allows all logic to be handled in that
  function.

* Don't do balance-on-clone balancing for vfork'ed threads.

* Don't do set_task_cpu or balance on clone in wake_up_new_process.
  Instead do it in sched_fork to fix set_cpus_allowed races.

* Don't do child-runs-first for CLONE_VM processes, as there is obviously no
  COW benefit to be had.  This is a big one, it enables Andi's workload to run
  well without clone balancing, because the OpenMP child threads can get
  balanced off to other nodes *before* they start running and allocating
  memory.

* Rename sched_balance_exec to sched_exec: hide the policy from the API.
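
The flag-based decisions in the list above can be sketched as follows. The CLONE_VM and CLONE_VFORK values are the ones from the kernel headers; the two helper functions are hypothetical and only model the policy described here, not the code in sched.c.

```python
CLONE_VM = 0x00000100     # child shares the parent's address space
CLONE_VFORK = 0x00004000  # parent sleeps until the child releases the mm


def child_runs_first(clone_flags: int) -> bool:
    """Child-runs-first only pays off when there are COW pages to avoid."""
    return not (clone_flags & CLONE_VM)


def wants_clone_balance(clone_flags: int) -> bool:
    """Assumed policy: balance new threads, but not vfork'ed ones."""
    return bool(clone_flags & CLONE_VM) and not (clone_flags & CLONE_VFORK)
```

Passing clone_flags into one wakeup function lets both decisions live in the scheduler instead of being split between fork.c and sched.c.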


From: Ingo Molnar <mingo@elte.hu>

  rename wake_up_new_process -> wake_up_new_task.

  in sched.c we are gradually moving away from the overloaded 'process' or
  'thread' notion to the traditional task (or context) naming.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] sched: cleanup init_idle() · 70a0b8e7

Nick Piggin authored Aug 23, 2004

Clean up init_idle to not use wake_up_forked_process, then undo all the stuff
that call does.  Instead, do everything in init_idle.

Make double_rq_lock depend on CONFIG_SMP because it is no longer used on UP.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] sched: fix timeslice calculations for HZ=1000. · b2a0e913

Ingo Molnar authored Aug 23, 2004

The main benefit is that with the default HZ=1000 nice +19 tasks now get 5
msecs of timeslices, so the ratio of CPU use is linear.  (nice 0 task gets
20 times more CPU time than a nice 19 task.  Prior to this change the ratio
was 1:10)

another effect is that nice 0 tasks now get a round 100 msecs of timeslices
(as intended), instead of 102 msecs.

here's a table of old/new timeslice values, for HZ=1000 and 100:

                      HZ=1000         (   HZ=100   )
                    old    new        ( old    new )

        nice -20:   200    200        ( 200    200 )
        nice -19:   195    195        ( 190    190 )
        ...
        nice 0:     102    100        ( 100    100 )
        nice 1:      97     95        (  90     90 )
        nice 2:      92     90        (  90     90 )
        ...
        nice 17:     19     15        (  10     10 )
        nice 18:     14     10        (  10     10 )
        nice 19:     10      5        (  10     10 )
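
The table rows can be reproduced with the sketch below. The two formulas are assumptions chosen to match the values shown (a 10..200 msec interpolation before, a purely linear scale with a 5 msec floor after); they are not copied from the patch, and all function names are hypothetical. Timeslices are computed in ticks and converted back to msecs, which is what produces the coarse HZ=100 column.

```python
MAX_PRIO = 140
MAX_USER_PRIO = 40


def nice_to_prio(nice: int) -> int:
    """Map nice -20..19 to static priority 100..139."""
    return 120 + nice


def old_timeslice_ms(nice: int, hz: int) -> int:
    """Assumed old rule: interpolate between 10 and 200 msecs."""
    min_ts = max(10 * hz // 1000, 1)  # ticks
    max_ts = 200 * hz // 1000         # ticks
    prio = nice_to_prio(nice)
    ticks = min_ts + (max_ts - min_ts) * (MAX_PRIO - 1 - prio) // (MAX_USER_PRIO - 1)
    return ticks * 1000 // hz


def new_timeslice_ms(nice: int, hz: int) -> int:
    """Assumed new rule: linear in priority, floored at 5 msecs (min 1 tick)."""
    min_ts = max(5 * hz // 1000, 1)   # ticks
    max_ts = 200 * hz // 1000         # ticks
    prio = nice_to_prio(nice)
    ticks = max(max_ts * (MAX_PRIO - prio) // MAX_USER_PRIO, min_ts)
    return ticks * 1000 // hz


for nice in (-20, -19, 0, 1, 2, 17, 18, 19):
    print(nice,
          old_timeslice_ms(nice, 1000), new_timeslice_ms(nice, 1000),
          old_timeslice_ms(nice, 100), new_timeslice_ms(nice, 100))
```

With HZ=1000 the new rule gives 5 msecs per nice level, so nice 0 (100) to nice 19 (5) is the 20:1 ratio mentioned above.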

i've tested the patch on x86.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

23 Aug, 2004 23 commits
- Linux 2.6.9-rc1 · 5a528e75
  Linus Torvalds authored Aug 23, 2004
  
- [PATCH] ppc64: use struct list_head for hose_list · fae337c7
  Paul Mackerras authored Aug 23, 2004
```
This patch changes hose_list from a simple linked list to a
"list.h"-style list.  This is in preparation for the runtime
addition/removal of PCI Host Bridges.
Signed-off-by: John Rose <johnrose@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
```
- [PATCH] ppc64: fix enable_surveillance() for power5 · d613adcd
  Nathan Fontenot authored Aug 23, 2004
```
On some platforms (notably power5) you can't enable surveillance
(firmware/service processor watchdog) from the kernel - you have to do
it in the firmware.

This patch changes enable_surveillance() to make the message that is
printed in this situation more informative.  Additionally, the rtas_call
was changed to rtas_set_indicator so as to avoid having to handle
RTAS_BUSY returns.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
```
- Merge bk://ppc.bkbits.net/for-linus-ppc64 · bf851860
  Linus Torvalds authored Aug 23, 2004
```
into ppc970.osdl.org:/home/torvalds/v2.6/linux
```
- Use F_SETLK instead of F_SETLK64 in nfs locking code. · 6a8e8a44
  Linus Torvalds authored Aug 23, 2004
```
The code doesn't actually _care_ about 32/64-bit issues,
only about F_SETLK vs F_SETLKW, and the F_SETLK64 doesn't
exist except as a compatibility thing on 64-bit architectures
(since the regular one already _is_ 64-bit, of course).
```
- Merge http://nfsclient.bkbits.net/linux-2.6 · 0646a4e4
  Trond Myklebust authored Aug 23, 2004
```
into fys.uio.no:/home/linux/bitkeeper/nfsclient-2.6
```
- RPC,NFSv4: NFSv4 operations that create or destroy state on the · b4a558fd
  Trond Myklebust authored Aug 23, 2004
```
   server are not allowed to be interrupted as that may result in the
   client and server disagreeing.
```
- NFSv4: Enable delegations by actually telling the server about our · fdd46e51
  Trond Myklebust authored Aug 23, 2004
```
   recall ability.
Signed-off-by: Trond Myklebust <trond.myklebust@fys.uio.no>
```
- NFSv4: return all delegations we hold if the server issues a · ef7306b4
  Trond Myklebust authored Aug 23, 2004
```
   NFS4ERR_CB_PATH_DOWN error.
```
- NFSv4: More aggressive caching if we have a delegation. · b42a8a16
  Trond Myklebust authored Aug 23, 2004
```
Signed-off-by: Trond Myklebust <trond.myklebust@fys.uio.no>
```
- NFSv4: Delegated open. · 7129dfe5
  Trond Myklebust authored Aug 23, 2004
```
Signed-off-by: Trond Myklebust <trond.myklebust@fys.uio.no>
```
- NFSv4: Recover delegations on server reboot. · 2ac2de8d
  Trond Myklebust authored Aug 23, 2004
```
Signed-off-by: Trond Myklebust <trond.myklebust@fys.uio.no>
```
- NFSv4: More delegation recall code · 0a3b5d3b
  Trond Myklebust authored Aug 23, 2004
```
Signed-off-by: Trond Myklebust <trond.myklebust@fys.uio.no>
```
- NFSv4: Service delegation recall requests from the server. · 3f052fdc
  Trond Myklebust authored Aug 23, 2004
```
Signed-off-by: Trond Myklebust <trond.myklebust@fys.uio.no>
```
- NFSv4: Further XDR cleanups in preparation for delegations. · f092be42
  Trond Myklebust authored Aug 23, 2004
```
Signed-off-by: Trond Myklebust <trond.myklebust@fys.uio.no>
```
- NFSv4: XDR cleanups in preparation for delegations. · b6784786
  Trond Myklebust authored Aug 23, 2004
```
Signed-off-by: Trond Myklebust <trond.myklebust@fys.uio.no>
```
- NFSv4: Add support for a delegation callback server. · 2fa8729d
  Trond Myklebust authored Aug 23, 2004
```
Signed-off-by: Trond Myklebust <trond.myklebust@fys.uio.no>
```
- NFSv4: Basic code for managing delegation state. · 6d132c2f
  Trond Myklebust authored Aug 23, 2004
```
Signed-off-by: Trond Myklebust <trond.myklebust@fys.uio.no>
```
- Merge fys.uio.no:/home/linux/bitkeeper/nfsclient-2.6 · 477d40b4
  Trond Myklebust authored Aug 23, 2004
```
into fys.uio.no:/home/linux/bitkeeper/work/nfsclient-2.6
```
- Merge http://nfsclient.bkbits.net/linux-2.6 · fec00732
  Trond Myklebust authored Aug 23, 2004
```
into fys.uio.no:/home/linux/bitkeeper/nfsclient-2.6
```
- NFSv2/v3/v4: Make the rpc_ops->getattr method take a filehandle · 60fa4cfb
  Trond Myklebust authored Aug 23, 2004
```
   rather than an inode argument. Fix up nfs_instantiate() and
   _nfs4_do_open to use this since doing a new lookup might be racy.
```
- NFSv4: don't retry CREATE operations if the server returns · 26ee7f10
  Trond Myklebust authored Aug 23, 2004
```
   NFS4ERR_DELAY on the GETATTR call.
```
- NFSv4: More cleanups of the NFSv4 state. · 90c1f6b2
  Trond Myklebust authored Aug 23, 2004
```
Signed-off-by: Trond Myklebust <trond.myklebust@fys.uio.no>
```