Commits · 1079b18707c78ea1a31ca66f3e1e17fe86fd9894 · nexedi / linux

12 Apr, 2004 40 commits

[PATCH] Update Documentation/Changes · 1079b187

Andrew Morton authored Apr 11, 2004

From: Trivial Patch Monkey <trivial@rustcorp.com.au>

From:  Thomas Molina <tmolina@cablespeed.com>

1079b187

[PATCH] ne2k-pci.c compile fix on ppc[64] · 73007d9b

Andrew Morton authored Apr 11, 2004

From: Rusty Russell <rusty@rustcorp.com.au>

These macros are redefined here.  Previous definitions are in
asm-ppc(64)/io.h

73007d9b

[PATCH] Add CC Trivial Patch Monkey to SubmittingPatches · 64ea79c7

Andrew Morton authored Apr 11, 2004

From: Rusty Russell <rusty@rustcorp.com.au>

From: maximilian attems <janitor@sternwelten.at>

Add the Monkey to SubmittingPatches.

64ea79c7

[PATCH] Use valid node number when unmapping x86 CPUs · 7275fb97

Andrew Morton authored Apr 11, 2004

From: Rusty Russell <rusty@rustcorp.com.au>

From: colpatch@us.ibm.com

The cpu_2_node[] array for i386 is initialized to all 0's, meaning that
until modified at CPU bring-up, all CPUs are mapped to node 0.

When CPUs are brought online, they are mapped to the appropriate node by
various mechanisms, depending on the underlying hardware.

When we unmap CPUs (hotplug time), we should return the mapping for the CPU
that is going away to its original state, ie: 0.

When this code was initially submitted, the misguided poster (me) made the
mistake of putting a -1 in the cpu_2_node[] array for the CPU going away.

This patch fixes this mistake, and allows code to get a valid node number
for all valid CPU numbers. This is important, because most (if not all)
callers do not error check the value returned by the cpu_to_node() macro,
and they should not have to. The API specifies that a valid node number be
returned for any valid CPU number.
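
A minimal sketch of the mapping described above, assuming a small fixed
NR_CPUS and illustrative helper names (this is not the actual i386 code):
the array starts out all zeros, bring-up records the real node, and
unmapping restores 0 rather than -1, so unchecked callers always see a valid
node number.

```c
#define NR_CPUS 8   /* hypothetical CPU count for this sketch */

/* Zero-initialized: every CPU maps to node 0 until bring-up says otherwise. */
static int cpu_2_node[NR_CPUS];

/* Record the node a CPU belongs to when it comes online. */
static void map_cpu_to_node(int cpu, int node)
{
    cpu_2_node[cpu] = node;
}

/* On hot-unplug, restore the boot-time default (node 0), not -1. */
static void unmap_cpu_to_node(int cpu)
{
    cpu_2_node[cpu] = 0;
}

/* Callers use the result unchecked, so it must always be a valid node. */
static int cpu_to_node(int cpu)
{
    return cpu_2_node[cpu];
}
```

With -1 stored for an unmapped CPU, a caller indexing a per-node array with
the result would read out of bounds; storing 0 avoids that without adding
error checks to every caller.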

7275fb97

[PATCH] Kill duplicate #include <linux/ioport.h> · 3a2d85ea

Andrew Morton authored Apr 11, 2004

From: Rusty Russell <rusty@rustcorp.com.au>

include/linux/device.h includes include/linux/ioport.h twice.

3a2d85ea

[PATCH] updating email info in CREDITS · 17ec30a3

Andrew Morton authored Apr 11, 2004

From: Rusty Russell <rusty@rustcorp.com.au>

From:  Thomas Molina <tmolina@cablespeed.com>

17ec30a3

[PATCH] CONFIG_X86_GENERIC description fixup · e1319f38

Andrew Morton authored Apr 11, 2004

From: Rusty Russell <rusty@rustcorp.com.au>

From:  Stewart Smith <stewart@linux.org.au>

A better explanation of the X86_GENERIC config option follows.

e1319f38

[PATCH] Fix genksyms parsing · f17ea056

Andrew Morton authored Apr 11, 2004

From: Rusty Russell <rusty@rustcorp.com.au>

From: Andreas Schwab <schwab@suse.de>

I'm getting a warning when building for ia64 with MODVERSIONS enabled.  This
is a bug in genksyms: it can't cope with some arguments of __typeof__.

The following patch will fix that. Actually the argument of __typeof__ is
an abstract declarator, but the genksyms parser has no production for that;
decl_specifier_seq also matches some invalid constructs, but I don't think
this is a problem in practice, since the compiler will reject them.

f17ea056

[PATCH] Trivial Patch Monkey should be in MAINTAINERS · fa79e47b

Andrew Morton authored Apr 11, 2004

From: Rusty Russell <rusty@rustcorp.com.au>

From:  Petri Koistinen <petri.koistinen@iki.fi>

fa79e47b

[PATCH] Fix firmware loader docs · f333f50d

Andrew Morton authored Apr 11, 2004

From: Rusty Russell <rusty@rustcorp.com.au>

From:  Pavel Machek <pavel@ucw.cz>

sysfs should be mounted on /sys these days.

f333f50d

[PATCH] i386 irq.c ifdef cleanup · bc344a64

Andrew Morton authored Apr 11, 2004

From: Rusty Russell <rusty@rustcorp.com.au>

From:  Josef 'Jeff' Sipek <jeffpc@optonline.net>

I just noticed the nested ifdefs, and made it a little more readable.

bc344a64

[PATCH] fix sch_ingress help · 387ec9eb

Andrew Morton authored Apr 11, 2004

From: Rusty Russell <rusty@rustcorp.com.au>

From:  John Levon <levon@movementarian.org>

387ec9eb

[PATCH] SGML: close tag with ">" · bd9646e6

Andrew Morton authored Apr 11, 2004

From: Rusty Russell <rusty@rustcorp.com.au>

From:  Hans Ulrich Niedermann <linux-kernel@n-dimensional.de>

doc patch: close tag with ">"

bd9646e6

[PATCH] Consistently use quotes for SGML attributes · c02dc9a8

Andrew Morton authored Apr 11, 2004

From: Rusty Russell <rusty@rustcorp.com.au>

From:  Hans Ulrich Niedermann <linux-kernel@n-dimensional.de>

doc patch: Consistently use quotes for SGML attributes.  This makes it
possible to process the SGML files without SHORTTAG YES.

c02dc9a8

[PATCH] document unused pte bits on i386 · 2b5f9408

Andrew Morton authored Apr 11, 2004

From: Rusty Russell <rusty@rustcorp.com.au>

From:  Ed L Cashin <ecashin@uga.edu>

This small patch documents that bits 9, 10, and 11 are unused by the Linux
kernel.  The IA-32 Intel Architecture Software Developer's Manual says that
these bits are available for programmer use.

2b5f9408

[PATCH] Update CodingStyle hints for Emacs users. · b4ecf1b0

Andrew Morton authored Apr 11, 2004

From: Trivial Patch Monkey <trivial@rustcorp.com.au>

From:  Ben Greear <greearb@candelatech.com>

Depending on one's default emacs settings, the suggestion in the
CodingStyle may or may not work.  This patch adds a few more commands to
ensure it works in more cases.

b4ecf1b0

[PATCH] ver_linux fix · 3bca5aa3

Andrew Morton authored Apr 11, 2004

From: Rusty Russell <rusty@rustcorp.com.au>

From:  Adrian Bunk <bunk@fs.tum.de>

Some versions of ps print non-version lines when ps --version is invoked.
Filter them out with grep.

3bca5aa3

[PATCH] Broken bitmap_parse for ncpus > 32 · f9511792

Andrew Morton authored Apr 11, 2004

From: Joe Korty <joe.korty@ccur.com>

This patch replaces the call to bitmap_shift_right() in bitmap_parse() with
bitmap_shift_left().

I also prepended comments to the bitmap_shift_* functions defining what
'left' and 'right' means.  This is under the theory that if I and all the
reviewers were bamboozled, others in the future occasionally might be too.
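
As an illustration of the convention, here is a sketch of a multi-word
shift in which "left" means moving bits toward higher bit numbers (the
multiplying direction).  The function name and word layout (least
significant word first) are assumptions for this example, not the kernel's
bitmap_shift_left().

```c
#include <limits.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Shift "left": bit i of src becomes bit i + shift of dst.
 * Words are stored least significant first; bits shifted past the
 * top of the last word are dropped.
 */
static void bitmap_shl(unsigned long *dst, const unsigned long *src,
                       unsigned int shift, unsigned int nwords)
{
    unsigned int word_off = shift / SKETCH_BITS_PER_LONG;
    unsigned int bit_off = shift % SKETCH_BITS_PER_LONG;

    /* Walk from the top word down so an in-place shift is safe. */
    for (unsigned int k = nwords; k-- > 0; ) {
        unsigned long v = 0;

        if (k >= word_off) {
            v = src[k - word_off] << bit_off;
            /* Pull in the bits carried up from the next lower word. */
            if (bit_off && k > word_off)
                v |= src[k - word_off - 1] >> (SKETCH_BITS_PER_LONG - bit_off);
        }
        dst[k] = v;
    }
}
```

A parser that reads the most significant chunk first has to shift the
accumulated value in this direction before OR-ing in the next chunk.  With a
single word the wrong direction can go unnoticed, which may be why the bug
only showed up for ncpus > 32.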

f9511792

[PATCH] Fix sys_time() to get subtick correction from the new xtime · 5362a354

Andrew Morton authored Apr 11, 2004

From: "La Monte H.P. Yarroll" <piggy@timesys.com>

This is a Scott Wood patch against 2.6.3.


Use gettimeofday() rather than xtime.tv_sec in sys_time(), since
sys_stime() uses settimeofday() and thus subtracts the subtick correction
from the new xtime.

stime() used settimeofday(), but time() did not use gettimeofday().  Since
settimeofday() subtracts out the current intra-tick correction, and nsec
was 0 (since stime() only allows seconds), this resulted in xtime being
slightly earlier than the time that was set.

If time() had used gettimeofday(), the correction would have been applied,
and everything would be fine.  However, instead time just reads the current
xtime.tv_sec, so if time() is called immediately after stime(), you'll
usually get a value one second earlier.
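
The off-by-one can be reproduced with a small model.  This is only a sketch:
the struct, the helper names and the 3 ms subtick value are invented for
illustration and do not mirror the kernel's timekeeping code.

```c
#define NSEC_PER_SEC 1000000000L

struct timespec_sketch { long sec; long nsec; };

static struct timespec_sketch xtime;   /* time as of the last tick */

/* Bring nsec back into [0, NSEC_PER_SEC). */
static void normalize(struct timespec_sketch *t)
{
    while (t->nsec < 0) { t->nsec += NSEC_PER_SEC; t->sec--; }
    while (t->nsec >= NSEC_PER_SEC) { t->nsec -= NSEC_PER_SEC; t->sec++; }
}

/* stime()-style set: whole seconds, minus the time since the last tick. */
static void set_seconds(long sec, long subtick_nsec)
{
    xtime.sec = sec;
    xtime.nsec = -subtick_nsec;
    normalize(&xtime);
}

/* Old time(): reads xtime.sec directly and ignores the correction. */
static long time_from_xtime(void)
{
    return xtime.sec;
}

/* New time(): adds the subtick correction back, as gettimeofday() does. */
static long time_from_gettimeofday(long subtick_nsec)
{
    struct timespec_sketch now = xtime;

    now.nsec += subtick_nsec;
    normalize(&now);
    return now.sec;
}
```

Setting the clock to 1000 s when 3 ms have elapsed since the last tick
leaves xtime at 999.997 s, so the old read returns 999 while the corrected
read returns 1000.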

5362a354

[PATCH] add file_operations.fcntl · cea39746

Andrew Morton authored Apr 11, 2004

From: Chuck Lever <cel@citi.umich.edu>

O_DIRECT|O_APPEND cannot possibly work on NFS, so NFS needs some way of
preventing the user from setting this combination.  We felt that the best
way of implementing this restriction is to allow the filesystem to implement
its own fcntl() handler.

This patch does that, and provides the appropriate handler for NFS.
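
The idea can be sketched in user-space C as a per-filesystem hook that the
generic F_SETFL path consults before accepting new flags.  The hook type and
both function names below are hypothetical; the real file_operations
interface added by this patch is not reproduced here.

```c
#include <errno.h>
#include <fcntl.h>

#ifndef O_DIRECT
#define O_DIRECT 040000   /* x86 Linux value; fallback for this sketch only */
#endif

/* Hypothetical per-filesystem hook: validate flags passed to F_SETFL. */
typedef int (*setfl_check_fn)(int flags);

/* An NFS-like filesystem refuses O_APPEND combined with O_DIRECT. */
static int nfs_like_check_flags(int flags)
{
    if ((flags & (O_APPEND | O_DIRECT)) == (O_APPEND | O_DIRECT))
        return -EINVAL;
    return 0;
}

/* Generic path: ask the filesystem first, if it registered a hook. */
static int generic_setfl(setfl_check_fn check, int flags)
{
    if (check) {
        int err = check(flags);

        if (err)
            return err;
    }
    return 0;   /* flags accepted; a real implementation would store them */
}
```

Filesystems without a hook pass NULL and keep the old behaviour, so the
restriction stays local to the filesystem that needs it.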

Additional details from Chuck:

Forgetting O_DIRECT for a moment, O_APPEND writes on NFS don't work in any
case when multiple clients are writing to a file, since an NFS client can
never guarantee it knows where the true end of file is 100% of the time.
It works as expected only if a single client writes to an O_APPEND file at
a time.

Multi-client O_APPEND writing doesn't seem to be a problem for any
application I'm aware of.  Since it can be made to behave in the
multi-client case with careful application logic or by using file locking,
I don't think we should disallow it.

I want to drop the inode semaphore when doing NFS direct I/O because it is
synchronous; holding the i_sem means we reduce direct I/O concurrency to
one I/O per file at a time.  The important thing sct was worried about was
the case where a single client is writing with O_APPEND and O_DIRECT, and
we don't hold the i_sem during the write.

We must at least hold the i_sem when determining where the end of file is
to do the O_APPEND write.  In 2.6, I believe that is handled correctly in
the VFS layer, so this is not an issue for 2.6, right?

cea39746

[PATCH] pmdisk: fix strcmp in sysfs store · 3f66b056

Andrew Morton authored Apr 11, 2004

From: Herbert Xu <herbert@gondor.apana.org.au>

This patch fixes the sysfs store functions for pmdisk when the input
contains a trailing newline.
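
A sketch of the kind of comparison a sysfs store function needs, since
"echo value > file" delivers the value with a trailing newline.  The helper
name and the keyword used in the example are hypothetical, not taken from
the pmdisk code.

```c
#include <string.h>

/* Compare sysfs input against a keyword, ignoring one trailing newline. */
static int input_matches(const char *buf, size_t count, const char *keyword)
{
    size_t len = count;

    if (len > 0 && buf[len - 1] == '\n')
        len--;
    /* Require an exact match on the remaining bytes. */
    return len == strlen(keyword) && strncmp(buf, keyword, len) == 0;
}
```

A plain strcmp(buf, "shutdown") fails on "shutdown\n", while this helper
accepts both forms and still rejects prefixes such as "shut\n".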

3f66b056

[PATCH] sb_mixer bounds checking · 77abb2f0

Andrew Morton authored Apr 11, 2004

From: Muli Ben-Yehuda <mulix@mulix.org>

This patch adds proper bounds checking to the sb_mixer.c code, found by the
stanford checker[0]. It fixes bugzilla bugs 252[1], 253[2] and 254[3].
Patch is against 2.6.5-rc2. It was tested by Rene Herman on SN AWE64 gold
and sound still works. The issue was previously discussed on lkml[4], but
apparently no fix was applied.

The patch is a bit more intrusive than I would've liked, but I don't think
it can be helped without really intrusive changes. sb_devc has a pointer
to an array (iomap) that is set at run time to point to arrays of variable
sizes. The patch adds an 'iomap_sz' member to sb_devc that is set to the
length of the array, and does bounds checking in sb_common_mixer_set() and
smw_mixer_set() against that.
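
The pattern looks roughly like this.  The struct layouts and the lookup
function are simplified stand-ins invented for this sketch; only the idea of
carrying the array length next to the pointer comes from the description
above.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for one mixer table entry. */
struct mixer_ent_sketch { int regno; };

/* The table pointer is set at run time, so its length travels with it. */
struct sb_devc_sketch {
    const struct mixer_ent_sketch *iomap;
    size_t iomap_sz;   /* number of entries in iomap */
};

/* Look up a mixer register, rejecting indices outside the current table. */
static int mixer_regno(const struct sb_devc_sketch *devc, int dev)
{
    if (dev < 0 || (size_t)dev >= devc->iomap_sz)
        return -EINVAL;
    return devc->iomap[dev].regno;
}
```

Whenever iomap is repointed at a different table, iomap_sz has to be updated
in the same place, otherwise the check guards the wrong length.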

77abb2f0

[PATCH] fs/proc/proc_tty.c comment fixes · 59b46ce5

Andrew Morton authored Apr 11, 2004

From: Marc-Christian Petersen <m.c.p@wolk-project.de>

59b46ce5

[PATCH] set mod->waiter before calling stop_machine · 07ebe427

Andrew Morton authored Apr 11, 2004

From: Rusty Russell <rusty@rustcorp.com.au>

mod->waiter needs to be set before we try to stop the module: setting it in
__try_stop_module means it gets set to the kthread, not rmmod.

07ebe427

[PATCH] slab: updates for per-arch alignments · b9e55f3d

Andrew Morton authored Apr 11, 2004

From: Manfred Spraul <manfred@colorfullife.com>

Description:

Right now kmem_cache_create automatically decides about the alignment of
allocated objects. The automatic decisions are sometimes wrong:

- for some objects, it's better to keep them as small as possible to
  reduce the memory usage.  Ingo already added a parameter to
  kmem_cache_create for the sigqueue cache, but it wasn't implemented.

- for s390, normal kmalloc must be 8-byte aligned.  With debugging
  enabled, the default allocation was 4-bytes.  This means that s390 cannot
  enable slab debugging.

- arm26 needs 1 kB aligned objects.  Previously this was impossible to
  generate, therefore arm has its own allocator in
  arm26/machine/small_page.c

- most objects should be cache line aligned, to avoid false sharing.  But
  the cache line size was set at compile time, often to 128 bytes for
  generic kernels.  This wastes memory.  The new code uses the runtime
  determined cache line size instead.

- some caches want an explicit alignment.  One example is the pte_chain
  objects: they must find the start of the object with addr&mask.  Right
  now pte_chain objects are scaled to the cache line size, because that was
  the only alignment that could be generated reliably.

The implementation reuses the "offset" parameter of kmem_cache_create and
now uses it to pass in the requested alignment.  offset was ignored by the
current implementation, and the only user I found is sigqueue, which
intended to set the alignment.

In the long run, it might be interesting for the main tree: due to the
128-byte alignment, only 7 inodes fit into one page; with 64-byte alignment,
9 inodes fit - 20% memory recovered for Athlon systems.



For generic kernels running on P6 cpus (i.e. 32 byte cachelines), this
changes the number of objects per page as follows:

 ext2_inode_cache: 8 instead of 7
 ext3_inode_cache: 8 instead of 7
 fat_inode_cache: 9 instead of 7
 rpc_tasks: 24 instead of 15
 tcp_tw_bucket: 40 instead of 30
 arp_cache: 40 instead of 30
 nfs_write_data: 9 instead of 7
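
The effect of the alignment on packing density is simple arithmetic.  The
sketch below assumes a 4096-byte page and ignores slab management overhead,
so it will not reproduce the exact figures above; the 400-byte object used
in the example is hypothetical.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */

/* Round size up to the next multiple of align. */
static size_t align_up(size_t size, size_t align)
{
    return (size + align - 1) / align * align;
}

/* Objects that fit in one page once each is padded to the alignment. */
static size_t objs_per_page(size_t size, size_t align)
{
    return SKETCH_PAGE_SIZE / align_up(size, align);
}
```

A 400-byte object padded to 128 bytes occupies 512 bytes (8 per page);
padded to 64 bytes it occupies 448 bytes (9 per page).  That is the kind of
gain a smaller runtime-determined cache line size gives.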

b9e55f3d

[PATCH] Fix scripts/kernel-doc to handle __attribute__ · 1aa6c0d1

Andrew Morton authored Apr 11, 2004

From: Tom Rini <trini@kernel.crashing.org>

The following patch is needed so that kernel-doc can handle functions which
have __attribute__'s on them (such as __attribute__ ((weak))).

1aa6c0d1

[PATCH] readv/writev range checking fix · fb14ef35

Andrew Morton authored Apr 11, 2004

do_readv_writev() is trying to fail if

a) any of the segments have a length < 0 or

b) the sum of the segments wraps negative.

But it gets b) wrong because local variable tot_len is unsigned.

Fix that up.
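
The two checks can be sketched in portable C as follows.  This is an
illustration with an invented function name, not the kernel's
do_readv_writev(); it tests for overflow before adding, because signed
overflow is undefined in standard C.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/*
 * Sum segment lengths, failing if (a) any length is negative or
 * (b) the running total would exceed the signed range.
 */
static long total_len(const long *lens, size_t n)
{
    long tot = 0;   /* signed, so "too large" is representable as a failure */

    for (size_t i = 0; i < n; i++) {
        if (lens[i] < 0)
            return -EINVAL;          /* case (a) */
        if (lens[i] > LONG_MAX - tot)
            return -EINVAL;          /* case (b): sum would wrap */
        tot += lens[i];
    }
    return tot;
}
```

With an unsigned accumulator a test such as "tot_len < 0" is always false,
so case (b) is never detected; that is the bug described above.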

fb14ef35

[PATCH] jbd: fix I/O error handling · b1ee3fea

Andrew Morton authored Apr 11, 2004

Fix a few buglets spotted by Jeff Mahoney <jeffm@suse.com>. We're currently
only checking for I/O errors against journal buffers if they were locked when
they were first inspected.

We need to check buffer_uptodate() even if the buffers were already unlocked.

b1ee3fea

[PATCH] JBD: ordered-data commit cleanup · 2b38960c

Andrew Morton authored Apr 11, 2004

For data=ordered, kjournald at commit time has to write out and wait upon a
long list of buffers.  It does this in a rather awkward way with a single
list.  It causes complexity and long lock hold times, and makes the addition
of rescheduling points quite hard.

So what we do instead (based on Chris Mason's suggestion) is to add a new
buffer list (t_locked_list) to the journal.  It contains buffers which have
been placed under I/O.

So as we walk the t_sync_datalist list we move buffers over to t_locked_list
as they are written out.

When t_sync_datalist is empty we may then walk t_locked_list waiting for the
I/O to complete.

As a side-effect this means that we can remove the nasty synchronous wait in
journal_dirty_data which is there to avoid the kjournald livelock which would
otherwise occur when someone is continuously dirtying a buffer.

2b38960c

[PATCH] jbd: fix ordered-data writeout logic · 376fd482

Andrew Morton authored Apr 11, 2004

There's some nasty code in commit which deals with a lock ranking problem.
Currently if it fails to get the lock when local variable `bufs' is zero
we forget to write out some ordered-data buffers. So a subsequent
crash+recovery could yield stale data in existing files.

Fix it by correctly restarting the t_sync_datalist search.

376fd482

[PATCH] speed up ext2 fsync() and fdatasync() · 7176142a

Andrew Morton authored Apr 11, 2004

ext2_sync_file() forgets to clear the inode's dirty bits, so we write the
inode on every fsync(), even if it hasn't changed.

Fix that up via the new sync_file() API which correctly manages the inode
state bits and the superblock inode lists.

When performing file overwrite on IDE with and without writeback caching
enabled this patch approximately doubles fsync() speed, bringing it into line
with O_SYNC writes.

Also, fix up the return value handling in ext2_sync_file().

Credit due to Jeffrey Siegal <jbs@quiotix.com> who noticed the performance
discrepancy and wrote a test app.

7176142a

[PATCH] ext3 fsync() and fdatasync() speedup · a1ff5989

Andrew Morton authored Apr 11, 2004

ext3's fsync/fdatasync implementation is currently syncing the inode via a
full journal commit even if it was unaltered.

Fix that up by exporting the core VFS's inode sync function to modules and
calling it if the inode is dirty. We need to do it this way so that the
inode is moved to the appropriate superblock list and so that the i_state
dirty flags are appropriately updated.

This speeds up ext3 fsync() for file overwrites by a factor of four (disk
non-writeback) to forty (disk in writeback mode).

a1ff5989

[PATCH] Fix page allocator lower zone protection for NUMA · af70f767

Andrew Morton authored Apr 11, 2004

From: Martin Hicks <mort@wildopensource.com>

This changes __alloc_pages() so it uses precalculated values for the "min".
This should prevent the problem of min incrementing from zone to zone across
many nodes on a NUMA machine.  The result of falling back to other nodes with
the old incremental min calculations was that the min value became very
large.

af70f767

[PATCH] move job control fields from task_struct to signal_struct · 7860b371

Andrew Morton authored Apr 11, 2004

From: Roland McGrath <roland@redhat.com>

This patch moves all the fields relating to job control from task_struct to
signal_struct, so that all this info is properly per-process rather than
being per-thread.

7860b371

[PATCH] IPMI driver updates · 0ab2d668

Andrew Morton authored Apr 11, 2004

From: Corey Minyard <minyard@acm.org>

- Add support for messaging through an IPMI LAN interface, which is
  required for some system software that already exists on other IPMI
  drivers.  It also does some renaming and a lot of little cleanups.

- Add the "System Interface" driver.  The previous driver for system
  interfaces only supported the KCS interface; this driver supports all
  system interfaces defined in the IPMI standard.  It also does a much better
  job of handling ACPI and SMBIOS tables for detecting IPMI system
  interfaces.

0ab2d668

[PATCH] compat emulation for posix message queues · 87c22e84

Andrew Morton authored Apr 11, 2004

From: Arnd Bergmann <arnd@arndb.de>

I have tested the code with the open posix test suite and found the same
four failures for both 64-bit and compat mode; most tests pass. The patch
is against -mc1, but I guess it also applies to the other trees around.

What worries me more than mq_attr compatibility is the conversion of struct
sigevent, which might turn out really hard when more fields in there are
used. AFAICS, the only other part in the kernel ABI is sys_timer_create(),
so maybe it's not too late to deprecate the current structure and create a
structure that can be used properly for compat syscalls.

87c22e84

[PATCH] posix message queues: send notifications via netlink · 34b98f22

Andrew Morton authored Apr 11, 2004

From: Manfred Spraul <manfred@colorfullife.com>

SIGEV_THREAD means that a given callback should be called in the context of a
new thread. This must be done by the C library. The kernel must deliver a
notice of the event to the C library when the callback should be called.

This patch switches to a new, simpler interface: User space creates a socket
with socket(PF_NETLINK, SOCK_RAW, 0) and passes the fd to the mq_notify call
together with a cookie. When the mq_notify() condition is satisfied, the
kernel "writes" the cookie to the socket. User space then reads the cookie
and calls the appropriate callback.

34b98f22

[PATCH] split netlink_unicast · ed6dcf4a

Andrew Morton authored Apr 11, 2004

From: Manfred Spraul <manfred@colorfullife.com>

The attached patch splits netlink_unicast into three steps:

- netlink_getsock{bypid,byfilp}: lookup the destination socket.

- netlink_attachskb: perform the nonblock checks, sleep if the socket
  queue is longer than the limit, etc.

- netlink_sendskb: actually send the skb.

jamal looked over it and didn't see a problem with the netlink change.  The
actual use from ipc/mqueue.c is still open (just send back whatever the C
library passed to mq_notify, add an nlmsghdr or perhaps even make it a
specialized netlink protocol), but the attached patch is independent of
the message queue change.

(acked by davem)

ed6dcf4a

[PATCH] security bugfix for mqueue · b06d7b4c

Andrew Morton authored Apr 11, 2004

From: Manfred Spraul <manfred@colorfullife.com>

I found a security bug in the new mqueue code: a process that has only
write permissions to a message queue could call mq_notify(SIGEV_THREAD) and
use the returned notification file descriptor to read from the message
queue.

b06d7b4c

[PATCH] posix message queue update · f3ca8d5d

Andrew Morton authored Apr 11, 2004

From: Manfred Spraul <manfred@colorfullife.com>

My discussion with Ulrich had one result:

- mq_setattr can accept implementation defined flags.  Right now we have
  none, but we might add some later (e.g.  switch to CLOCK_MONOTONIC for
  mq_timed{send,receive} or something similar).  When we add flags, we
  might need the fields for additional information.  And they don't hurt.
  Therefore add four __reserved fields to mq_attr.

- fail mq_setattr if we get unknown flags - otherwise glibc can't detect
  if it's running on a future kernel that supports new features.

- use memset to initialize the mq_attr structure - theoretically we could
  leak kernel memory.

- Only set O_NONBLOCK in mq_attr, explicitly clear O_RDWR & friends.
  openposix uses getattr, attr |=O_NONBLOCK, setattr - a sane approach.
  Without clearing O_RDWR, this fails.
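
The flag handling in the list above can be sketched as two small helpers.
Both names are hypothetical and the sketch covers only the flag masking, not
the rest of mq_getattr/mq_setattr.

```c
#include <errno.h>
#include <fcntl.h>

/* getattr side: report only O_NONBLOCK, masking O_RDWR and friends. */
static long mq_getattr_flags(int f_flags)
{
    return f_flags & O_NONBLOCK;
}

/* setattr side: reject any flag this implementation does not know. */
static int mq_setattr_check(long mq_flags)
{
    if (mq_flags & ~(long)O_NONBLOCK)
        return -EINVAL;
    return 0;
}
```

Because getattr never reports the access-mode bits, the getattr, OR in
O_NONBLOCK, setattr sequence passes the unknown-flag check, while a future
flag sent to an older kernel is refused and can be detected by glibc.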

I've retested all openposix conformance tests with the new patch - the two
new FAILED tests check undefined behavior.  Note that I won't have net
access until Sunday - if the message queue patch breaks something important
either ask Krzysztof or drop it.

Ulrich had another good idea for SIGEV_THREAD, but I must think about it.
It would mean less complexity in glibc, but more code in the kernel.  I'm
not yet convinced that it's overall better.

f3ca8d5d