Commits · 3d178e973cf1b246350c5fb80020ae4523e9c969 · nexedi / linux

18 Jun, 2004 40 commits

Thomas Winischhofer authored Jun 17, 2004

attached is an update for the sisfb driver to version 1.7.10.

This update includes

- fixes for pure 64bit and 32/64bit mixed systems (add ioctl conversion;
  fix variable sizes, etc; REQUIRED for current X.org/XFree86 on 64bit
  systems, even if pure 64bit),

- fixes for 301C video bridge (scales TV output correctly now),

- fixes for 1600x1200 and 1400x1050 LCD panels,

- many fixes for 661/741/760 (amongst others, proper LFB support for the
  760 and corrections for SiS' new BIOS data layout; would lead to display
  corruption with old driver)

- add support for many modes for LCD which were unsupported previously,

- add support for HiVision and YPbPr HDTV

- "vga=" statement now honoured properly (sisfb will set the same mode as
  the kernel did by default)

- use LCD native resolution mode if no mode is given

- a major clean up of main driver code,

- radical removal of duplicate (or nearly duplicate) code,

- switched to 2.6 module_param macros,

- enhanced communication with the X driver,

- added eventual POSTing of SiS300/305 card for non-x86 archs,

- added ability to relocate the image on the TV screen using a userland
  tool,

- added Documentation/fb/sisfb.txt (why the heck was this missing?!)

- small fix for SiS DRM driver (match 32/64bit fixes mentioned above)
  (cast the data passed to sis_free as u32)

- make driver re-entrant by avoiding static structures and variables.

As usual, heavily tested.  The mode switching code is even lab-tested by
SiS (although 100% written by me).  Please apply asap (especially since
64bit systems were not properly supported previously; as mentioned, current
X.org/XFree86 needs this update for proper communication with the
framebuffer driver on 64bit systems.  X crashes on such systems with the
old driver).
Signed-off-by: Thomas Winischhofer <thomas@winischhofer.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

3d178e97

[PATCH] iso9660: NFS fix · 8fc7813c

Paul Serice authored Jun 17, 2004

Make all inode numbers unique for images less than 128GB in size.  Required
for knfsd.
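
The 128GB figure is consistent with a 32-bit inode number that addresses the
image in 32-byte units.  A minimal sketch of that arithmetic follows; the
32-byte granularity and both helper names are assumptions made for
illustration and are not taken from the patch text.

```python
INO_BITS = 32        # width of the inode number
GRANULE_BYTES = 32   # assumed addressing granularity (hypothetical)


def max_unique_image_bytes(ino_bits=INO_BITS, granule=GRANULE_BYTES):
    """Largest image for which every granule gets a distinct inode number."""
    return (1 << ino_bits) * granule


def ino_for_offset(byte_offset, granule=GRANULE_BYTES):
    """Map a byte offset to an inode number under the assumed scheme."""
    return byte_offset // granule


print(max_unique_image_bytes() // 2**30, "GB")
```
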
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] iso9660: fix handling of inodes beyond 4GB · 9210c204

Paul Serice authored Jun 17, 2004

This is my fourth attempt to patch the isofs code. It is similar to the last
posting except this one implements the NFS get_parent() method which has
always been missing.

The original problem I set out to address is that the current iso9660 file
system cannot reach inodes located beyond the 4GB barrier. This is caused by
using the inode number as the byte offset of the inode data. Being 32-bits
wide, the inode number is unable to reach inode data that does not reside on
the first 4GB of the file system.
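
To illustrate the limit described above, here is a small sketch of what
happens when a byte offset is stored in a 32-bit inode number.  The function
name is hypothetical; only the 32-bit width and the 4GB boundary come from
the text.

```python
INO_MASK = 0xFFFFFFFF  # inode numbers are 32 bits wide
FOUR_GB = 1 << 32


def ino_from_byte_offset(byte_offset):
    """Old scheme: the inode number is the byte offset, truncated to 32 bits."""
    return byte_offset & INO_MASK


near_start = 0x1000
beyond_4gb = FOUR_GB + 0x1000

# Both offsets collapse onto the same inode number, so the inode that lives
# beyond 4GB can never be addressed.
print(hex(ino_from_byte_offset(near_start)), hex(ino_from_byte_offset(beyond_4gb)))
```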

This causes real problems with "growisofs"

http://fy.chalmers.se/~appro/linux/DVD+RW/#isofs4gb

and my pet project "shunt"

http://www.serice.net/shunt/

This patch switches the isofs code from iget() to iget5_locked() which allows
extra data to be passed into isofs_read_inode() so that inode data anywhere on
the disk can be reached.

The inode number scheme was also changed. Continuing to use the byte offset
would have resulted in non-unique inodes in many common situations, but
because the inode number no longer plays any role in reading the meta-data off
the disk, I was free to set the inode number to some unique characteristic of
the file. I have chosen to use the block offset which is also 32-bits wide.

Lastly, the pre-patch code uses the default export_operations to handle
accessing the file system through NFS. The problem with this is that the
default NFS operations assume that iget() works which is no longer the case
because of the necessity of switching to iget5_locked(). So, I had to
implement the NFS operations too. As a bonus, I went ahead and implemented
the NFS get_parent() method which has always been missing.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] BSD accounting format rework · f38928f4

Tim Schmielau authored Jun 17, 2004

BSD accounting format rework:

Use all explicit and implicit padding in struct acct to

 - correctly report 32 bit uid/gid,
 - correctly report jobs (e.g., daemons) running longer than 497 days,
 - increase the precision of ac_etime from 2^-13 to 2^-20
   (i.e., from ~6 hours to ~1 min. after a year)
 - store the current AHZ value.
 - allow cross-platform processing of the accounting file
   (limited for m68k which has a different size struct acct).
 - introduce versioning for smooth transition to incompatible formats in
   the future. Currently the following version numbers are defined:
     0: old format (until 2.6.7) with 16 bit uid/gid
     1: extended variant (binary compatible to v0 on M68K)
     2: extended variant (binary compatible to v0 on everything except M68K)
     3: a new binary incompatible format (64 bytes)
     4: new binary incompatible format (128 bytes).
        layout of its first 64 bytes is the same as for v3.
     5: marks second half of new binary incompatible format (128 bytes)
        (layout is not yet defined)
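
The 497-day limit in the list above matches what a 32-bit tick counter can
hold at 100 ticks per second.  The sketch below only checks that arithmetic;
treating the old limit as a 32-bit counter at 100 Hz is an assumption, and
the helper name is hypothetical.

```python
TICKS_PER_SECOND = 100     # assumed tick rate behind the 497-day figure
SECONDS_PER_DAY = 86400


def max_days(counter_bits, hz=TICKS_PER_SECOND):
    """Days until a tick counter of the given width wraps around."""
    return (2 ** counter_bits) / hz / SECONDS_PER_DAY


print(round(max_days(32), 1))
```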

All this is accomplished without breaking binary compatibility.  32 bit
uid/gid support is compatible with the patch previously floating around and
used e.g.  by Red Hat.

This patch also introduces a config option for a new, binary incompatible
"version 3" format that

 - is uniform across and properly aligned on all platforms
 - stores pid and ppid
 - uses AHZ==100 on all platforms (allows longer times to be reported)

Much of the compatibility glue goes away when v1/v2 support is removed from
the kernel.  Such a patch is at

  http://www.physik3.uni-rostock.de/tim/kernel/2.7/acct-cleanup-04.patch

and might be applied in the 2.7 timeframe.

The new v3 format is source compatible with current GNU acct tools (6.3.5).
However, current GNU acct tools can be compiled for only one format.  As there
is no way to pass the kernel configuration to userspace, with my patch they
will still support only the old v2 format.  Recompiling GNU acct tools will
yield v3 support only once v1/v2 support is removed from the kernel.

A preliminary take at the corresponding work on cross-platform userspace tools
(GNU acct package) is at

  http://www.physik3.uni-rostock.de/tim/kernel/utils/acct/

This version of the package is able to read any of the v0/v2/v3 formats,
regardless of byte-order (untested), even within the same file.
Cross-platform compatibility with m68k (v1 format) is not yet implemented, but
native use on m68k should work (untested).  pid and ppid are currently only
shown by the dump-acct utility.

Thanks to Arthur Corliss, Albert Cahalan and Ragnar Kjørstad for their
comments, and to Albert Cahalan for the u64->IEEE float conversion code.
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] make the 3c59x/3c90x driver somewhat more reliable · e1cb4984

Alan Cox authored Jun 17, 2004

The existing driver violates basic PCI rules in several places making it
unusable for basic things like DHCP in Fedora Core.  This patch removes all
the situations I can find where it writes to the device while in D3 state
and breaks stuff.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] fix 3c59x.c to allow 3c905c 100bT-FD · 02bf06ec

Burton N. Windle authored Jun 17, 2004

Fix the 3c905C 10/100 transceiver initialisation woes.

(This was reverted from 2.6.7-rcX, but the bug reporter said the failure
turned out to be unrepeatable).
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Validate PM-Timer rate at boot time · 6d58b128

Joris van Rantwijk authored Jun 17, 2004

Add a check to the PM-Timer initialization code.  It validates the PM-Timer
rate against PIT channel 2 and rejects the PM-Timer if its rate is not
within 5% of the expected number.

Rationale:

The PMTMR timers of certain (older) mainboards are running at invalid
rates, often much faster than the rate expected by the PM-Timer code.  This
causes the system clock to run much too fast.  See also
http://bugme.osdl.org/show_bug.cgi?id=2375

Possible workarounds are disabling the PM-Timer in the kernel config or
disabling the PM-Timer at boot time through the "clock=tsc" parameter.
However, we believe it is more user friendly to automatically validate the
PM-Timer rate at boot time before using it as the system time source.
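
A minimal sketch of the 5% check, assuming the PM-Timer ticks are counted
over an interval timed by an independent reference (PIT channel 2 in the
patch).  The function name and the 50 ms interval in the usage line are
hypothetical; 3.579545 MHz is the nominal ACPI PM-Timer frequency.

```python
PMTMR_HZ = 3_579_545  # nominal ACPI PM-Timer frequency


def pmtmr_rate_ok(ticks, interval_s, tolerance=0.05):
    """Accept the timer only if its tick count is within 5% of the expected."""
    expected = PMTMR_HZ * interval_s
    return abs(ticks - expected) <= tolerance * expected


# A timer running at twice the nominal rate is rejected.
print(pmtmr_rate_ok(178_977, 0.05), pmtmr_rate_ok(357_954, 0.05))
```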

Tested by me (with broken timer) and John Stultz (with good timer) and
believed to be ok.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] aha1542 !CONFIG_MCA build fix · 261efef3

Matthew Wilcox authored Jun 17, 2004

The old 1542 scsi driver is both ISA and MCA.  The MCA portions are disabled
when !CONFIG_MCA through the typical wrapper scheme (a la pci.h and
!CONFIG_PCI).  However...  the driver unconditionally includes linux/mca.h
which in turn unconditionally includes asm/mca.h.

This breaks drivers on platforms with ISA but not MCA, like alpha.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] x86: remove io_apic_sync · 2765df29

Ingo Molnar authored Jun 17, 2004

The patch below gets rid of io_apic_sync().

io_apic_sync() was introduced in 2.1.104 and it was originally done for
masking and unmasking as well.  Later the unmasking use got removed but the
masking use lingered around.  I don't think it was ever justified to do it
and clearly since the lack of io_apic_sync() didn't break some of the other
writes we do to the IO-APIC registers, it must be unnecessary in the
masking case too.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] x86: remove APIC_LOCKUP_DEBUG · ecab0503

Ingo Molnar authored Jun 17, 2004

The patch below gets rid of APIC_LOCKUP_DEBUG. It has been in the kernel
for more than 3 years and the message was only reported once during that
period of time - and even in that case it was a side-effect of a really bad
crash. The lockup workaround works, the debugging code can be moved out.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] io_apic.c code consolidation · 6be35b97

Pavel Machek authored Jun 17, 2004

This cleans up io_apic.c a bit -- I do not really like 4 copies of same
code.

Ingo said:

   yeah, agreed - i checked & test it, it's ok.  I made a small
   modification (see the patch below) to uninline the __modify_IO_APIC_irq()
   function - shaving 0.5K off the kernel's size.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Fix read() vs truncate race · 4bd9607e

Nick Piggin authored Jun 17, 2004

do_generic_mapping_read()
{
	isize1 = i_size_read();
	...
	readpage
	copy_to_user up to isize1;
}

readpage()
{
	isize2 = i_size_read();
	...
	read blocks
	...
	zero-fill all blocks past isize2
}

If a second thread runs truncate and shrinks i_size, so isize1 and isize2 are
different, the read can return up to a page of zero-fill that shouldn't really
exist.

The trick is to read isize1 after doing the readpage.  I realised this is the
right way to do it without having to change the readpage API.

The patch should not cost any cycles when reading from pagecache.
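
The ordering problem can be reproduced in a deterministic toy model.  This is
only a sketch: ToyFile, read_old and read_fixed are hypothetical names, the
file fits in a single page, and the "race" is a callback invoked at the point
where a concurrent truncate could run.

```python
PAGE_SIZE = 4096


class ToyFile:
    """Single-page file whose i_size a racing truncate can shrink."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.i_size = len(data)

    def truncate(self, size):
        self.i_size = size
        del self.data[size:]

    def readpage(self):
        # Like readpage(): samples i_size itself and zero-fills past it.
        isize2 = self.i_size
        valid = bytes(self.data[:isize2])
        return valid + bytes(PAGE_SIZE - len(valid))


def read_old(f, race=lambda f: None):
    isize1 = f.i_size      # sampled before readpage
    race(f)                # a truncate may run here
    page = f.readpage()
    return page[:isize1]   # can include zero-fill past the new EOF


def read_fixed(f, race=lambda f: None):
    page = f.readpage()
    race(f)                # a truncate may run here
    isize1 = f.i_size      # sampled after readpage
    return page[:isize1]   # a shrinking truncate only shortens the copy


def shrink(f):
    f.truncate(10)


print(len(read_old(ToyFile(b"A" * 100), shrink)))    # 100 bytes, 90 of them zero
print(len(read_fixed(ToyFile(b"A" * 100), shrink)))  # 10 bytes
```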
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] invalidate_inode_pages2(): mark pages not uptodate · 3cf8b87b

Andrew Morton authored Jun 17, 2004

Andrea Arcangeli <andrea@suse.de> points out that invalidate_inode_pages2() is
supposed to mark mapped-into-pagetable pages as not uptodate so that next time
someone faults the page in we will go get a new version from backing store.

The callers are the direct-io code and the NFS "something changed on the
server" code. In both these cases we do need to go and re-read the page.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Check return status of register calls in i82365 · 8d6d3943

Herbert Xu authored Jun 17, 2004

i82365 calls driver_register and platform_device_register without checking
their return values.  This patch fixes that.

It also runs platform_device_register() prior to isa_probe() so we don't have
to undo isa_probe()'s effects if platform_device_register() ends up failing.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] getgroups16() fix · 8783a1ce

Tomas Olsson authored Jun 17, 2004

sys_getgroups16 (or rather groups16_to_user()) returns large gids
truncated.  Needs to be fixed, one way or another.  Don't know why the
other similar casts are still there.
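
For illustration, this is what truncating a large gid to 16 bits looks like,
next to one plausible alternative that reports a placeholder instead.  The
message does not say which fix was chosen; the placeholder value 65534 and
both function names are assumptions.

```python
OVERFLOW_GID = 65534  # assumed placeholder for gids that do not fit in 16 bits


def truncate_gid16(gid):
    """Plain cast to 16 bits: a large gid silently becomes a different gid."""
    return gid & 0xFFFF


def clamp_gid16(gid, overflow_gid=OVERFLOW_GID):
    """Report a placeholder rather than a wrong but valid-looking gid."""
    return gid if gid <= 0xFFFF else overflow_gid


print(truncate_gid16(70000), clamp_gid16(70000))
```
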
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] same small resource tweaks, x86_64 version · 406e1707

Rene Herman authored Jun 17, 2004

The same small tweaks for x86_64.  Just to keep the two in sync.  One
additional wrinkle: vram_resource was exported to e820.c, which didn't
actually use it.  Undo that.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] small tweaks to standard resource stuff · bc3e9bd2

Rene Herman authored Jun 17, 2004

Various small tweaks. Compiled and booted.

1. add IORESOURCE_BUSY | IORESOURCE_MEM also for the kernel code and
     data resources. I don't believe this actually matters one bit, but
     they're hooked into a BUSY/MEM parent ("System RAM") and marking
     them busy seems to make sense.

2. delete the .start = 1M default for the kernel code resource. This
     isn't actually a change; it's set to virt_to_phys(_text) in
     setup_arch() overriding any default anyways.

3. s/vram_resource/video_ram_resource/. Lines up much nicer with
     video_rom_resource...

4. s/checksum/romchecksum/. setup.c is a fairly large file, and
     "checksum" pollutes the namespace.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Use first-fit for pty allocation · 3931ca0a

H. Peter Anvin authored Jun 17, 2004

(With Andrew Morton).

The current dynamic pty allocation scheme has a few problems:

- pty numbers grow to be very large, causing wtmp file bloat.

- Seems to break libc5 and some old applications

So change it to do first-fit.  An IDR tree is used to provide a
logarithmic-time search.
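
A sketch of the first-fit policy itself: always hand out the lowest free
number, with logarithmic-time operations.  This uses a min-heap as a stand-in
and is not the IDR tree used by the patch; the class name is hypothetical and
double releases are not guarded against.

```python
import heapq


class FirstFitAllocator:
    """Hand out the lowest free number; reuse released numbers first."""

    def __init__(self):
        self._free = []   # min-heap of released numbers
        self._next = 0    # smallest number never handed out

    def alloc(self):
        if self._free:
            return heapq.heappop(self._free)
        n = self._next
        self._next += 1
        return n

    def release(self, n):
        heapq.heappush(self._free, n)


ptys = FirstFitAllocator()
first = [ptys.alloc() for _ in range(4)]   # 0, 1, 2, 3
ptys.release(2)
ptys.release(1)
reused = [ptys.alloc(), ptys.alloc(), ptys.alloc()]  # 1, 2, then 4
print(first, reused)
```

Numbers stay small as long as ptys are closed and reopened, which is the
property that avoids the wtmp bloat mentioned above.
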
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Ext3: Retry allocation after transaction commit (v2) · 5c4ad014

Theodore Y. Ts'o authored Jun 17, 2004

Here is a reworked version of my patch to ext3 to retry certain filesystem
operations after an ENOSPC error.  The ext3_should_retry_alloc() function will
not wait on the currently running transaction if there is a currently active
handle; hence this should avoid deadlocks in the Lustre use case.  The patch
is versus BK-recent.
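
The retry-from-the-top-level idea can be sketched as a small wrapper.  This is
a userspace model only: retry_on_enospc, the should_retry_alloc callback and
the limit of three retries are assumptions for illustration, not the kernel
interface.

```python
import errno


def retry_on_enospc(operation, should_retry_alloc, max_retries=3):
    """Re-run the whole operation while it fails with ENOSPC and a retry may help."""
    retries = 0
    while True:
        try:
            return operation()
        except OSError as exc:
            if exc.errno != errno.ENOSPC:
                raise                       # unrelated failure
            if retries >= max_retries or not should_retry_alloc():
                raise                       # give up, report ENOSPC
            retries += 1                    # blocks may have been freed by a commit


attempts = {"n": 0}


def flaky_mkdir():
    # Fails twice as if space were still held by an uncommitted transaction.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise OSError(errno.ENOSPC, "No space left on device")
    return "ok"


print(retry_on_enospc(flaky_mkdir, lambda: True), attempts["n"])
```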

I've also included a simple, reliable test case which demonstrates the problem
this patch is intended to fix.  (Note that BK-recent is not sufficient to
address this test case, and waiting on the committing transaction in
ext3_new_block is also not sufficient.  Been there, tried that, didn't work.
We need to do the full-bore retry from the top level.  The
ext3_should_retry_alloc() will only wait on the committing transaction if
there is no active handle; hence Lustre will probably also need to use
ext3_should_retry_alloc() if it wants to reliably avoid this particular
problem.)

#!/bin/sh
#
#
TEST_DIR=/tmp
IMAGE=$TEST_DIR/retry.img
MNTPT=$TEST_DIR/retry.mnt
TEST_SRC=/usr/projects/e2fsprogs/e2fsprogs/build
MKE2FS_OPTS=""
IMAGE_SIZE=8192

umount $MNTPT
dd if=/dev/zero of=$IMAGE bs=4k count=$IMAGE_SIZE
mke2fs -j -F $MKE2FS_OPTS $IMAGE 

test_log ()
{
	echo "$*"
	logger -p local4.notice "$*"
}

mkdir -p $MNTPT
mount -o loop -t ext3 $IMAGE $MNTPT
test_log Retry test: BEGIN
for i in `seq 1 3`
do
	test_log "Retry test: Loop $i"
	echo 2 > /proc/sys/fs/jbd-debug
	while ! mkdir -p $MNTPT/foo/bar
	do
		test_log "Retry test: mkdir failed"
		sleep 1
	done
	echo 0 > /proc/sys/fs/jbd-debug
	cp -r $TEST_SRC $MNTPT/foo/bar 2> /dev/null
	rm -rf $MNTPT/*
done
umount $MNTPT
test_log "Retry test: END"


akpm@osdl.org

  Rework the code to make it a formal JBD API entry point.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] pc9800: merge std_resources.c back into setup.c · 248af7e2

Rene Herman authored Jun 17, 2004

std_resources.{c,h} was only split off due to pc9800 wanting to override it.
With it gone, it might as well be merged back in.  Doesn't change any code.
It was compiled and booted.

This time this also actually doesn't break compilation of any of the
subarches.  That's to say, any further.  I guess it might have been my .config
(my regular PC config, with just the subarch switched through menuconfig) or
O=, but only ELAN actually compiled.  Voyager and VISWS bombed out at the
final link and NUMAQ much sooner (with "physnode_map undeclared" during
compilation of numaq.c).
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] more PC9800 removal · 808ef260

Adrian Bunk authored Jun 17, 2004

Removes more PC9800 code.

Requires:

  bk rm drivers/char/upd4990a.c
  bk rm drivers/net/ne2k_cbus.c
  bk rm drivers/net/ne2k_cbus.h
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Remove PC9800 support · 5e018f7e

Randy Dunlap authored Jun 17, 2004

PC9800 sub-arch is incomplete, hackish (at least in IDE), maintainers don't
reply to emails and haven't touched it in a while.  Can't even config it to
try to build it without other patches to the kernel tree.

bk-rm-script:

#! /bin/sh
bk rm -r ./arch/i386/mach-pc9800
bk rm -r ./arch/i386/boot98
bk rm ./drivers/char/lp_old98.c
bk rm ./drivers/serial/serial98.c
bk rm ./drivers/scsi/scsi_pc98.c
bk rm ./drivers/scsi/pc980155.c
bk rm ./drivers/scsi/pc980155.h
bk rm ./drivers/block/floppy98.c
bk rm ./drivers/input/keyboard/98kbd.c
bk rm ./drivers/input/serio/98kbd-io.c
bk rm ./drivers/input/misc/98spkr.c
bk rm ./drivers/input/mouse/98busmouse.c
bk rm ./drivers/ide/legacy/pc9800.c
bk rm ./drivers/ide/legacy/hd98.c
bk rm -r ./include/asm-i386/mach-pc9800
bk rm ./include/asm-i386/pc9800_sca.h
bk rm ./include/asm-i386/pc9800.h
bk rm ./fs/partitions/nec98.c
bk rm ./fs/partitions/nec98.h
bk rm ./sound/isa/cs423x/pc98.c
bk rm ./sound/isa/cs423x/pc9801_118_magic.h
bk rm ./sound/isa/cs423x/sound_pc9800.h
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] HPET driver · b429f3b3

Robert Picco authored Jun 17, 2004

The driver supports the High Precision Event Timer. The driver has adopted
a similar API to the Real Time Clock driver. It can support any number of
HPET devices and the maximum number of timers per HPET device. For further
information look at the documentation in the patch.

Thanks to Venki at Intel for testing the driver on X86 hardware with HPET.

HPET documentation is available at http://www.intel.com/design/chipsets/datashts/252516.htm
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] RLIM: adjust default mqueue sizes · 02fb4124

Chris Wright authored Jun 17, 2004

Lower default sizes for POSIX mqueue allocation now that rlimits are in place.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] RLIM: enforce rlimits for POSIX mqueue allocation · ae17b2b3

Chris Wright authored Jun 17, 2004

Add a user_struct to the mq_inode_info structure.  Charge the maximum number
of bytes that could be allocated to a mqueue to the user who creates the
mqueue.  This is checked against the per user rlimit.
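
A minimal sketch of the accounting described above, assuming the worst case
is simply maxmsg * msgsize.  The class and function names and the limit used
in the usage lines are hypothetical; only the mq_bytes counter and the
per-user check come from the text.

```python
class UserStruct:
    """Per-user accounting record; only the mqueue counter is modelled."""

    def __init__(self):
        self.mq_bytes = 0


def charge_mqueue(user, maxmsg, msgsize, rlimit_bytes):
    """Charge the worst-case size of a new queue to its creator, or refuse."""
    worst_case = maxmsg * msgsize          # assumed worst-case footprint
    if user.mq_bytes + worst_case > rlimit_bytes:
        return False
    user.mq_bytes += worst_case
    return True


def uncharge_mqueue(user, maxmsg, msgsize):
    """Give the bytes back when the queue is destroyed."""
    user.mq_bytes -= maxmsg * msgsize


creator = UserStruct()
LIMIT = 819_200                                       # hypothetical rlimit
print(charge_mqueue(creator, 10, 8192, LIMIT))        # fits
print(charge_mqueue(creator, 100, 8192, LIMIT))       # would exceed the limit
```
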
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] RLIM: add mq_attr_ok() helper · b1cae1ec

Chris Wright authored Jun 17, 2004

Add helper function mq_attr_ok() to do mq_attr sanity checking, and do some
extra overflow checking.
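
As an illustration of this kind of sanity check, here is a sketch that
rejects attributes whose total size would not fit in a 32-bit unsigned long.
It is a model, not the kernel helper: the parameter names, the limits passed
in and the 32-bit width are assumptions.

```python
ULONG_MAX_32 = 2**32 - 1  # assumed width of the byte counter


def mq_attr_ok(maxmsg, msgsize, max_maxmsg, max_msgsize):
    """Return True if the requested queue attributes are sane."""
    if maxmsg <= 0 or msgsize <= 0:
        return False
    if maxmsg > max_maxmsg or msgsize > max_msgsize:
        return False
    # Division form of "maxmsg * msgsize > ULONG_MAX": the check itself
    # cannot overflow in a fixed-width language.
    if msgsize > ULONG_MAX_32 // maxmsg:
        return False
    return True


print(mq_attr_ok(10, 8192, 40, 16384))
```
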
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] RLIM: add mq_bytes to user_struct · 9d9f6e8b

Chris Wright authored Jun 17, 2004

Add mq_bytes field to user_struct, and make sure it's properly initialized.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] RLIM: add rlimit entry for POSIX mqueue allocation · faaa0feb

Chris Wright authored Jun 17, 2004

Add an rlimit entry to control the maximum number of bytes a user can allocate
to a POSIX mqueue.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] RLIM: add simple get_uid() helper · db49b0f9

Chris Wright authored Jun 17, 2004

Add simple helper function to grab a reference to a user_struct.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] RLIM: add sigpending field to user_struct · 84f4d297

Chris Wright authored Jun 17, 2004

Add sigpending field to user_struct, and make sure it's properly initialized.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] RLIM: add rlimit entry for controlling queued signals · 63e9e5dc

Chris Wright authored Jun 17, 2004

The following patches introduce per user rlimits for both queued signals and
POSIX message queues.  The changes touch every arch's resource.h file as
well as init_task.c to get the rlimit defaults set up.

Both require caching the user_struct to avoid problems with setuid().

The signal changes make some small changes to send_signal() to pass along the
task being signalled to get proper accounting for signals initiated in
interrupt.  Thanks to Marcelo for getting this one going.


This patch:

Add an rlimit entry to control the maximum number of pending signals a user
may have.  This is essentially just the resource.h changes.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] i2c fixups for idr API change · 8e3ca9ba

Andrew Morton authored Jun 17, 2004

Fix up the i2c code which uses the IDR library.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] IDR fixups · 935b33bc

Corey Minyard authored Jun 17, 2004

There were definitely some problems in there.  I've made some changes and
tested with a lot of bounds.  I don't have a machine with enough memory to
fill it up (it would take ~16GB on a 64-bit machine), but I use the "above"
code to simulate a lot of situations.

The problems were:

    * IDR_FULL was not the right value
    * idr_get_new_above() was not defined in the headers or documented.
    * idr_alloc() bug-ed if there was a race and not enough memory was
      allocated.  It should have returned NULL.
    * id will overflow when you go past the end.
    * There was a "(id >= (1 << (layers*IDR_BITS)))" comparison, but at
      the top layer it would overflow the id and be zero.
    * The allocation should return ENOSPC for an "above" value with
      nothing above it, but it returned EAGAIN.

I have not tested on 64-bits (as I don't have a 64-bit machine).

I've included the files, a diff from the previous version, and my test
programs.

For the test programs, idr_test <size> will just attempt to allocate 
<size> elements, check them, free them, and check them again.

idr_test2 <size> <incr> will allocate <size> elements with <incr> between
them.

idr_test3 just tests some bounds and tries all values with just a few in
the idr.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] idr: remove counter bits from id's · 5470e17c

Andrew Morton authored Jun 17, 2004

idr_get_new() currently returns an incrementing counter in the top 8 bits of
the returned id, which means that most users have to mask it off again, and we
only have a 24-bit range.
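
To make the old layout concrete, the sketch below packs an 8-bit counter
above a 24-bit slot number and shows the masking callers had to do.  The
helper names are hypothetical; the bit widths come from the description, and
the new range assumes MAX_ID_SHIFT is 31 as proposed in the list that follows.

```python
OLD_ID_BITS = 24
OLD_ID_MASK = (1 << OLD_ID_BITS) - 1
MAX_ID_SHIFT = 31


def old_idr_id(slot, counter):
    """Old layout: 8-bit counter in the top bits, 24-bit slot below it."""
    return ((counter & 0xFF) << OLD_ID_BITS) | (slot & OLD_ID_MASK)


def old_idr_slot(idr_id):
    """What callers had to do to get the usable part back."""
    return idr_id & OLD_ID_MASK


def new_id_range():
    """With the counter gone, ids span 0 to 2G-1."""
    return 0, (1 << MAX_ID_SHIFT) - 1


print(hex(old_idr_id(42, 7)), old_idr_slot(old_idr_id(42, 7)), new_id_range())
```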

So remove that counter.  Also:

- Remove the BITS_PER_INT define due to namespace collision risk.

- Make MAX_ID_SHIFT 31, so counters have a 0 to 2G-1 range.

- Why is MAX_ID_SHIFT using sizeof(int) and not sizeof(long)?  If it's for
  consistency across 32- and 64-bit machines, why not just make it "31"?

- Does this still hold true with the counter removed?

/* We can only use half the bits in the top level because there are
   only four possible bits in the top level (5 bits * 4 levels = 25
   bits, but you only use 24 bits in the id). */

  If not, what needs to change?
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Fixes for idr code · 90e518e1

Corey Minyard authored Jun 17, 2004

* On a 32-bit architecture, the idr code will cease to work if you add
  more than 2^20 entries.  You will not be able to find many of the
  entries.  The problem is that the IDR code uses 5-bit chunks of the
  number and the lower portion used by IDR is 24 bits, so you have one bit
  that leaks over into the comparisons that should not be there.  The
  solution is to mask off that bit before doing IDR processing.  This
  actually causes the POSIX timer code to crash if you create that many
  timers.  I have included an idr_test.tar.gz file that demonstrates this
  with and without the fix, in case you need more evidence :).

* When the IDR fills up, it returns -1.  However, there was no way to
  check for this condition.  This patch adds the ability to check for the
  idr being full and fixes all the users.  It also fixes a problem in
  fs/super.c where the idr code wasn't checking for -1.

* There was a race condition creating POSIX timers.  The timer was added
  to a task struct for another process then the data for the timer was
  filled out.  The other task could use/destroy the timer as soon as it is
  in the task's queue and the lock is released.  This moves setting up the
  timer data to before the timer is enqueued or (for some data) into the
  lock.

* Change things so that the caller doesn't need to run idr_full() to find
  out the reason for an idr_get_new() failure.

  Just return -ENOSPC if the tree was full, or -EAGAIN if the caller needs
  to re-run idr_pre_get() and try again.
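
The first item above (one bit leaking into the 5-bit chunks) can be seen with
a few lines of arithmetic.  This is a sketch under the stated numbers, 5-bit
chunks and a 24-bit id portion; the function name is hypothetical and the
code models only the index calculation, not the tree.

```python
IDR_BITS = 5                         # bits consumed per tree level
ID_BITS = 24                         # portion of the id the tree should see
ID_MASK = (1 << ID_BITS) - 1
LEVELS = -(-ID_BITS // IDR_BITS)     # ceil(24 / 5) == 5 levels, i.e. 25 bits


def level_indices(idr_id, levels=LEVELS):
    """Per-level slot indices, top level first."""
    chunk_mask = (1 << IDR_BITS) - 1
    return [(idr_id >> (IDR_BITS * lvl)) & chunk_mask
            for lvl in range(levels - 1, -1, -1)]


raw = (1 << 24) | 5   # slot 5 with the lowest counter bit set

# Five levels cover bits 0..24, so bit 24 lands in the top-level index.
print(level_indices(raw))            # [16, 0, 0, 0, 5]
print(level_indices(raw & ID_MASK))  # [0, 0, 0, 0, 5]
```

Masking before the lookup is what keeps the top-level index within the part
of the tree that actually holds the entry.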
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] reiserfs data logging support · f1372916

Chris Mason authored Jun 17, 2004

Add data=journal support for reiserfs
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] reiserfs: btree readahead · 2167f071

Chris Mason authored Jun 17, 2004

Walking the btree can trigger a number of single block synchronous reads.
This patch does btree readahead during operations that are likely to be long
and sequential.  So far, that only includes directory reads and truncates, but
it can make both much faster.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] reiserfs: remove debugging warning from block allocator · 36f9f7fc

Chris Mason authored Jun 17, 2004

Remove debugging warning from the reiserfs block allocator code
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] reiserfs: block allocator should not inherit "packing locality 1" · 930c07f9

Chris Mason authored Jun 17, 2004

reiserfsck --rebuild-tree expects the only key with a packing locality of 1 to
be for the root directory (key [1 2]).  The new block allocator inherited that
packing locality down to subdirectories, which triggers failures in reiserfsck
--rebuild-tree

reiserfsck in readonly check mode doesn't complain about this.  Thanks to Jeff
Mahoney for finding it.

The fix is to never inherit packing locality #1
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] reiserfs: block allocator optimizations · 734db689

Chris Mason authored Jun 17, 2004

From: <mason@suse.com>
From: <jeffm@suse.com>

The current reiserfs allocator pretty much allocates things sequentially
from the start of the disk.  It works very nicely for desktop loads, but
once you've got more than one proc doing I/O, data files can fragment badly.

One obvious solution is something like ext2's bitmap groups, which puts
file data into different areas of the disk based on which subdirectory
they are in.  The problem with bitmap groups is that if you've got a
group of subdirectories their contents will be spread out all over the
disk, leading to lots of seeks during a sequential read.

This allocator patch uses the packing locality to determine which bitmap
group to allocate from, but when you create a file it looks in the bitmaps
to see how 'full' that packing locality already is.  If it hasn't been
heavily used yet, the packing locality is inherited from the parent
directory putting files in new subdirs close to the parent subdir,
otherwise it is the inode number of the parent directory putting new
files far away from the parent subdir.
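
The decision described in the paragraph above can be sketched as a single
function.  The fullness measure and the 0.5 threshold are assumptions for
illustration, as is the function name; the text does not say how "heavily
used" is quantified.

```python
def choose_packing_locality(parent_locality, parent_objectid, fullness,
                            busy_threshold=0.5):
    """Pick the packing locality for a new file.

    fullness is the fraction of the parent's packing locality already in
    use, as read from the bitmaps (hypothetical measure).
    """
    if fullness < busy_threshold:
        # Lightly used: stay close to the parent directory.
        return parent_locality
    # Heavily used: start a new locality keyed by the parent's inode number.
    return parent_objectid


print(choose_packing_locality(100, 2345, fullness=0.1))  # 100
print(choose_packing_locality(100, 2345, fullness=0.9))  # 2345
```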

The end result is fewer bitmap groups for the same working set.  For
example, one test data set created by 20 procs running in parallel has
6822 subdirs.  With vanilla reiserfs that would mean 6822
packing localities.  This patch turns that into 26 packing localities.

This makes sequential reads of big directory trees more efficient, but
it also makes the btree more efficient in general.  Things end up sorted
better because groups of subdirs end up with similar keys in the btree,
instead of being spread out all over.

The bitmap grouping code tries to use the start of each bitmap group
for metadata, and offsets the data slightly.  The data and metadata
are still close together, but not completely intermixed like they are
in the default allocator.  The end result is that leaf nodes tend to be
close to each other, making metadata readahead more effective.

The old block allocator had the ability to enforce a minimum
allocation size, but did not use it.  It now tries to do a pass looking
for larger allocation chunks before falling back to the old behaviour
of taking any blocks it can find.

The patch changes the defaults to:

mount -o alloc=skip_busy:dirid_groups:packing_groups

You can get back the old behaviour with mount -o alloc=skip_busy

mount -o alloc=dirid_groups will turn on the bitmap groups
mount -o alloc=packing_groups turns on the packing locality reduction code
mount -o alloc=skip_busy:dirid_groups turns on both dirid_groups and
skip_busy

Finally the patch adds a mount -o alloc=oid_groups, which puts files into
bitmap groups based on a hash of their objectid.  This would be used for
databases or other situations where you have a limited number of very
large files.

This command will tell you how many packing localities are actually in
use:

debugreiserfs -d /dev/xxx | grep '^|.*SD' | sed 's/^.....//' | awk '{print $1}' | sort -u | wc -l
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
