- 18 Jun, 2004 40 commits
-
-
Thomas Winischhofer authored
attached is an update for the sisfb driver to version 1.7.10. This update includes - fixes for pure 64bit and 32/64bit mixed systems (add ioctl conversion; fix variable sizes, etc; REQUIRED for current X.org/XFree86 on 64bit systems, even if pure 64bit), - fixes for 301C video bridge, (scales TV output correctly now) - fixes for 1600x1200 and 1400x1050 LCD panels, - many fixes for 661/741/760 (amongst others, proper LFB support for the 760 and corrections for SiS' new BIOS data layout; would lead to display corruption with old driver) - add support for many modes for LCD which were unsupported previously, - add support for HiVision and YPbPr HDTV - "vga=" statement now honoured properly (sisfb will set the same mode as the kernel did by default) - use LCD native resolution mode if no mode is given - a major clean up of main driver code, - radical removal of duplicate (or nearly duplicate) code, - switched to 2.6 module_param macros, - enhanced communication with the X driver, - added eventual POSTing of SiS300/305 card for non-x86 archs, - added ability to relocate the image on the TV screen using a userland tool, - added Documentation/fb/sisfb.txt (why the heck was this missing?!) - small fix for SiS DRM driver (match 32/64bit fixes mentioned above) (cast the data passed to sis_free as u32) - make driver re-entrant by avoiding static structures and variables. As usual, heavily tested. The mode switching code is even lab-tested by SiS (although 100% written by me). Please apply asap (especially since 64bit systems were not properly supported previously; as mentioned, current X.org/XFree86 needs this update for proper communication with the framebuffer driver on 64bit systems. X crashes on such systems with the old driver). Signed-off-by: Thomas Winischhofer <thomas@winischhofer.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Paul Serice authored
Make all inode numbers unique for images less than 128GB in size. Required for knfsd. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Paul Serice authored
This is my fourth attempt to patch the isofs code. It is similar to the last posting except this one implements the NFS get_parent() method which has always been missing. The original problem I set out to addresses is that the current iso9660 file system cannot reach inodes located beyond the 4GB barrier. This is caused by using the inode number as the byte offset of the inode data. Being 32-bits wide, the inode number is unable to reach inode data that does not reside on the first 4GB of the file system. This causes real problems with "growisofs" http://fy.chalmers.se/~appro/linux/DVD+RW/#isofs4gb and my pet project "shunt" http://www.serice.net/shunt/ This patch switches the isofs code from iget() to iget5_locked() which allows extra data to be passed into isofs_read_inode() so that inode data anywhere on the disk can be reached. The inode number scheme was also changed. Continuing to use the byte offset would have resulted in non-unique inodes in many common situations, but because the inode number no longer plays any role in reading the meta-data off the disk, I was free to set the inode number to some unique characteristic of the file. I have chosen to use the block offset which is also 32-bits wide. Lastly, the pre-patch code uses the default export_operations to handle accessing the file system through NFS. The problem with this is that the default NFS operations assume that iget() works which is no longer the case because of the necessity of switching to iget5_locked(). So, I had to implement the NFS operations too. As a bonus, I went ahead and implemented the NFS get_parent() method which has always been missing. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Tim Schmielau authored
BSD accounting format rework: Use all explicit and implicit padding in struct acct to - correctly report 32 bit uid/gid, - correctly report jobs (e.g., daemons) running longer than 497 days, - increase the precision of ac_etime from 2^-13 to 2^-20 (i.e., from ~6 hours to ~1 min. after a year) - store the current AHZ value. - allow cross-platform processing of the accounting file (limited for m68k which has a different size struct acct). - introduce versioning for smooth transition to incompatible formats in the future. Currently the following version numbers are defined: 0: old format (until 2.6.7) with 16 bit uid/gid 1: extended variant (binary compatible to v0 on M68K) 2: extended variant (binary compatible to v0 on everything except M68K) 3: a new binary incompatible format (64 bytes) 4: new binary incompatible format (128 bytes). layout of its first 64 bytes is the same as for v3. 5: marks second half of new binary incompatible format (128 bytes) (layout is not yet defined) All this is accomplished without breaking binary compatibility. 32 bit uid/gid support is compatible with the patch previously floating around and used e.g. by Red Hat. This patch also introduces a config option for a new, binary incompatible "version 3" format that - is uniform across and properly aligned on all platforms - stores pid and ppid - uses AHZ==100 on all platforms (allows to report longer times) Much of the compatibility glue goes away when v1/v2 support is removed from the kernel. Such a patch is at http://www.physik3.uni-rostock.de/tim/kernel/2.7/acct-cleanup-04.patch and might be applied in the 2.7 timeframe. The new v3 format is source compatible with current GNU acct tools (6.3.5). However, current GNU acct tools can be compiled for only one format. As there is no way to pass the kernel configuration to userspace, with my patch it will still only support the old v2 format. Only if v1/v2 support is removed from the kernel, recompiling GNU acct tools will yield v3 support. A preliminary take at the corresponding work on cross-platform userspace tools (GNU acct package) is at http://www.physik3.uni-rostock.de/tim/kernel/utils/acct/ This version of the package is able to read any of the v0/v2/v3 formats, regardless of byte-order (untested), even within the same file. Cross-platform compatibility with m68k (v1 format) is not yet implemented, but native use on m68k should work (untested). pid and ppid are currently only shown by the dump-acct utility. Thanks to Arthur Corliss, Albert Cahalan and Ragnar Kjørstad for their comments, and to Albert Cahalan for the u64->IEEE float conversion code. Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Alan Cox authored
The existing driver violates basic PCI rules in several places making it unusable for basic things like DHCP in Fedora Core. This patch removes all the situations I can find where it writes to the device while in D3 state and breaks stuff. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Burton N. Windle authored
Fix the 3c905C 10/100 transceiver initialisation woes. (This was reverted from 2.6.7-rcX, but the bug reporter said the failure turned out to be unrepeatable). Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Joris van Rantwijk authored
Add a check to the PM-Timer initialization code. It validates the PM-Timer rate against PIT channel 2 and rejects the PM-Timer if its rate is not withing 5% of the expected number. Rationale: The PMTMR timers of certain (older) mainboards are running at invalid rates, often much faster than the rate expected by the PM-Timer code. This causes the system clock to run much too fast. See also http://bugme.osdl.org/show_bug.cgi?id=2375 Possible workarounds are disabling the PM-Timer in the kernel config or disabling the PM-Timer at boot time through the "clock=tsc" parameter. However, we believe it is more user friendly to automatically validate the PM-Timer rate at boot time before using it as the system time source. Tested by me (with broken timer) and John Stultz (with good timer) and believed to be ok. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Matthew Wilcox authored
The old 1542 scsi driver is both ISA and MCA. The MCA portions are disabled when !CONFIG_MCA through the typical wrapper scheme (a la pci.h and !CONFIG_PCI). However... the driver unconditionally includes linux/mca.h which in turn unconditionally includes asm/mca.h. This breaks drivers on platforms with ISA but not MCA, like alpha. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
The patch below gets rid of io_apic_sync(). io_apic_sync() was introduced in 2.1.104 and it was originally done for masking and unmasking as well. Later the unmasking use got removed but the masking use lingered around. I dont think it was ever justified to do it and clearly since the lack of io_apic_sync() didnt break some of the other writes we do to the IO-APIC registers, it must be unnecessary in the masking case too. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
the patch below gets rid of APIC_LOCKUP_DEBUG. It has been in the kernel for more than 3 years and the message was only reported once during that period of time - and even in that case it was a side-effect of a really bad crash. The lockup workaround works, the debugging code can be moved out. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Pavel Machek authored
This cleans up io_apic.c a bit -- I do not really like 4 copies of same code. Ingo said: yeah, agreed - i checked & test it, it's ok. I made a small modification (see the patch below) to uninline the __modify_IO_APIC_irq() function - shaving 0.5K off the kernel's size. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Nick Piggin authored
do_generic_mapping_read() { isize1 = i_size_read(); ... readpage copy_to_user up to isize1; } readpage() { isize2 = i_size_read(); ... read blocks ... zero-fill all blocks past isize2 } If a second thread runs truncate and shrinks i_size, so isize1 and isize2 are different, the read can return up to a page of zero-fill that shouldn't really exist. The trick is to read isize1 after doing the readpage. I realised this is the right way to do it without having to change the readpage API. The patch should not cost any cycles when reading from pagecache. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
Andrea Arcangeli <andrea@suse.de> points out that invalidate_inode_pages2() is supposed to mark mapped-into-pagetable pages as not uptodate so that next time someone faults the page in we will go get a new version from backing store. The callers are the direct-io code and the NFS "something changed on the server" code. In both these cases we do need to go and re-read the page. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Herbert Xu authored
i82365 calls driver_register and platform_device_register without checking their return values. This patch fixes that. It also runs platform_device_register() prior to isa_probe() so we don't have to undo ise_probe()'s effects if platform_device_register() ends up failing. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Tomas Olsson authored
sys_getgroups16 (or rather groups16_to_user()) returns large gids truncated. Needs to be fixed, one way or another. Don't know why the other similar casts are still there. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Rene Herman authored
The same small tweaks for x86_64. Just to keep the two in sync. One additional wrinkle: vram_resource was exported to e820.c, which didn't actually use it. Undo that. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Rene Herman authored
Various small tweaks. Compiled and booted. 1. add IORESOURCE_BUSY | IORESOURCE_MEM also for the kernel code and data resources. I don't believe this actually matters one bit, but they're hooked into a BUSY/MEM parent ("System RAM") and marking them busy seems to make sense. 2. delete the .start = 1M default for the kernel code resource. This isn't actually a change; it's set to virt_to_phys(_text) in setup_arch() overriding any default anyways. 3. s/vram_resource/video_ram_resource/. Lines up much nicer with video_rom_resource... 4. s/checksum/romchecksum/. setup.c is a fairly large file, and "checksum" pollutes the namespace. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
H. Peter Anvin authored
(With Andrew Morton). The current dynamic pty allocation scheme has a few problems: - pty numbers grow to be very large, causing wtmp file bloat. - Seems to break libc5 and some old applications So change it to do first-fit. An IDR tree is used to provide a logarithmic-time search. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Theodore Y. Ts'o authored
Here is a reworked version of my patch to ext3 to retry certain filesystem operations after an ENOSPC error. The ext3_should_retry_alloc() function will not wait on the currently running transaction if there is a currently active handle; hence this should avoid deadlocks in the Lustre use case. The patch is versus BK-recent. I've also included a simple, reliable test case which demonstrates the problem this patch is intended to fix. (Note that BK-recent is not sufficient to address this test case, and waiting on the commiting transaction in ext3_new_block is also not sufficient. Been there, tried that, didn't work. We need to do the full-bore retry from the top level. The ext3_should_retry_alloc() will only wait on the committing transaction if there is an active handle; hence Lustre will probably also need to use ext3_should_retry_alloc() if it wants to reliably avoid this particular problem.) #!/bin/sh # # TEST_DIR=/tmp IMAGE=$TEST_DIR/retry.img MNTPT=$TEST_DIR/retry.mnt TEST_SRC=/usr/projects/e2fsprogs/e2fsprogs/build MKE2FS_OPTS="" IMAGE_SIZE=8192 umount $MNTPT dd if=/dev/zero of=$IMAGE bs=4k count=$IMAGE_SIZE mke2fs -j -F $MKE2FS_OPTS $IMAGE function test_log () { echo $* logger -p local4.notice $* } mkdir -p $MNTPT mount -o loop -t ext3 $IMAGE $MNTPT test_log Retry test: BEGIN for i in `seq 1 3` do test_log "Retry test: Loop $i" echo 2 > /proc/sys/fs/jbd-debug while ! mkdir -p $MNTPT/foo/bar do test_log "Retry test: mkdir failed" sleep 1 done echo 0 > /proc/sys/fs/jbd-debug cp -r $TEST_SRC $MNTPT/foo/bar 2> /dev/null rm -rf $MNTPT/* done umount $MNTPT test_log "Retry test: END" akpm@osdl.org Rework the code to make it a formal JBD API entry point. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Rene Herman authored
std_resources.{c,h} was only split off due to pc9800 wanting to override it. With it gone, it might as well be merged back in. Doesn't change any code. It was compiled and booted. This time this also actually doesn't break compilation of any of the subarches. That's to say, any further. I guess it might have been my .config (my regular PC config, with just the subarch switched through menuconfig) or O=, but only ELAN actually compiled. Voyager and VISWS bombed out at the final link and NUMAQ much sooner (with "physnode_map undeclared" during compilation of numaq.c). Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Adrian Bunk authored
Removes more PC9800 code. Requires: bk rm drivers/char/upd4990a.c bk rm drivers/net/ne2k_cbus.c bk rm drivers/net/ne2k_cbus.h Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Randy Dunlap authored
PC9800 sub-arch is incomplete, hackish (at least in IDE), maintainers don't reply to emails and haven't touched it in awhile. Can't even config it to try to build it without other patches to the kernel tree. bk-rm-script: #! /bin/sh bk rm -r ./arch/i386/mach-pc9800 bk rm -r ./arch/i386/boot98 bk rm ./drivers/char/lp_old98.c bk rm ./drivers/serial/serial98.c bk rm ./drivers/scsi/scsi_pc98.c bk rm ./drivers/scsi/pc980155.c bk rm ./drivers/scsi/pc980155.h bk rm ./drivers/block/floppy98.c bk rm ./drivers/input/keyboard/98kbd.c bk rm ./drivers/input/serio/98kbd-io.c bk rm ./drivers/input/misc/98spkr.c bk rm ./drivers/input/mouse/98busmouse.c bk rm ./drivers/ide/legacy/pc9800.c bk rm ./drivers/ide/legacy/hd98.c bk rm -r ./include/asm-i386/mach-pc9800 bk rm ./include/asm-i386/pc9800_sca.h bk rm ./include/asm-i386/pc9800.h bk rm ./fs/partitions/nec98.c bk rm ./fs/partitions/nec98.h bk rm ./sound/isa/cs423x/pc98.c bk rm ./sound/isa/cs423x/pc9801_118_magic.h bk rm ./sound/isa/cs423x/sound_pc9800.h Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Robert Picco authored
The driver supports the High Precision Event Timer. The driver has adopted a similar API to the Real Time Clock driver. It can support any number of HPET devices and the maximum number of timers per HPET device. For further information look at the documentation in the patch. Thanks to Venki at Intel for testing the driver on X86 hardware with HPET. HPET documentation is available at http://www.intel.com/design/chipsets/datashts/252516.htmSigned-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Chris Wright authored
Lower default sizes for POSIX mqueue allocation now that rlimits are in place. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Chris Wright authored
Add a user_struct to the mq_inode_info structure. Charge the maximum number of bytes that could be allocated to a mqueue to the user who creates the mqueue. This is checked against the per user rlimit. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Chris Wright authored
Add helper function mq_attr_ok() to do mq_attr sanity checking, and do some extra overlow checking. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Chris Wright authored
Add mq_bytes field to user_struct, and make sure it's properly initialized. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Chris Wright authored
Add an rlimit entry to control the maximum number of bytes a user can allocate to a POSIX mqueue. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Chris Wright authored
Add simple helper function to grab a reference to a user_struct. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Chris Wright authored
Add sigpending field to user_struct, and make sure it's properly initialized. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Chris Wright authored
The following patches introduce per user rlimits for both queued signals and POSIX message queues. The changes touch all the arches resource.h files as well as init_task.c to get the rlimit defaults setup. Both require caching the user_struct to avoid problems with setuid(). The signal changes makes some small changes to send_signal() to pass along the task being signalled to get proper accounting for signals initiated in interrupt. Thanks to Marcelo for getting this one going. This patch: Add an rlimit entry to control the maximum number of pending signals a user may have. This is essentially just the resource.h changes. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
Fix up the i2c code which uses the IDR library. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Corey Minyard authored
There were definately some problems in there. I've made some changes and tested with a lot of bounds. I don't have a machine with enough memory to fill it up (it would take ~16GB on a 64-bit machine), but I use the "above" code to simulate a lot of situations. The problems were: * IDR_FULL was not the right value * idr_get_new_above() was not defined in the headers or documented. * idr_alloc() bug-ed if there was a race and not enough memory was allocated. It should have returned NULL. * id will overflow when you go past the end. * There was a "(id >= (1 << (layers*IDR_BITS)))" comparison, but at the top layer it would overflow the id and be zero. * The allocation should return ENOSPC for an "above" value with nothing above it, but it returned EAGAIN. I have not tested on 64-bits (as I don't have a 64-bit machine). I've included the files, a diff from the previous version, and my test programs. For the test programs, idr_test <size> will just attempt to allocate <size> elements, check them, free them, and check them again. idr_test2 <size> <incr> will allocate <size> element with <incr> between them. idr_test3 just tests some bounds and tries all values with just a few in the idr. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
idr_get_new() currently returns an incrementing counter in the top 8 bits of the counter. Which means that most users have to mask it off again, and we only have a 24-bit range. So remove that counter. Also: - Remove the BITS_PER_INT define due to namespace collision risk. - Make MAX_ID_SHIFT 31, so counters have a 0 to 2G-1 range. - Why is MAX_ID_SHIFT using sizeof(int) and not sizeof(long)? If it's for consistency across 32- and 64-bit machines, why not just make it "31"? - Does this still hold true with the counter removed? /* We can only use half the bits in the top level because there are only four possible bits in the top level (5 bits * 4 levels = 25 bits, but you only use 24 bits in the id). */ If not, what needs to change? Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Corey Minyard authored
* On a 32-bit architecture, the idr code will cease to work if you add more than 2^20 entries. You will not be able to find many of the entries. The problem is that the IDR code uses 5-bit chunks of the number and the lower portion used by IDR is 24 bits, so you have one bit that leaks over into the comparisons that should not be there. The solution is to mask off that bit before doing IDR processing. This actually causes the POSIX timer code to crash if you create that many timers. I have included an idr_test.tar.gz file that demonstrates this with and without the fix, in case you need more evidence :). * When the IDR fills up, it returns -1. However, there was no way to check for this condition. This patch adds the ability to check for the idr being full and fixes all the users. It also fixes a problem in fs/super.c where the idr code wasn't checking for -1. * There was a race condition creating POSIX timers. The timer was added to a task struct for another process then the data for the timer was filled out. The other task could use/destroy time timer as soon as it is in the task's queue and the lock is released. This moves settup up the timer data to before the timer is enqueued or (for some data) into the lock. * Change things so that the caller doesn't need to run idr_full() to find out the reason for an idr_get_new() failure. Just return -ENOSPC if the tree was full, or -EAGAIN if the caller needs to re-run idr_pre_get() and try again. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Chris Mason authored
Add data=journal support for reiserfs Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Chris Mason authored
Walking the btree can trigger a number of single block synchronous reads. This patch does btree readahead during operations that are likely to be long and sequential. So far, that only includes directory reads and truncates, but it can make both much faster. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Chris Mason authored
Remove debugging warning from the reiserfs block allocator code Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Chris Mason authored
reiserfsck --rebuild-tree expects the only key with a packing locality of 1 to be for the root directory (key [1 2]). The new block allocator inherited that packing locality down to subdirectories, which triggers failures in reiserfsck --rebuild-tree reiserfsck in readonly check mode doesn't complain about this, thanks to Jeff Mahoney for finding it. The fix is to never inherit packing locality #1 Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Chris Mason authored
From: <mason@suse.com> From: <jeffm@suse.com> The current reiserfs allocator pretty much allocates things sequentially from the start of the disk, it works very nicely for desktop loads but once you've got more then one proc doing io data files can fragment badly. One obvious solution is something like ext2's bitmap groups, which puts file data into different areas of the disk based on which subdirectory they are in. The problem with bitmap groups is that if you've got a group of subdirectories their contents will be spread out all over the disk, leading to lots of seeks during a sequential read. This allocator patch uses the packing locality to determine which bitmap group to allocate from, but when you create a file it looks in the bitmaps to see how 'full' that packing locality already is. If it hasn't been heavily used yet, the packing locality is inherited from the parent directory putting files in new subdirs close to the parent subdir, otherwise it is the inode number of the parent directory putting new files far away from the parent subdir. The end result is fewer bitmap groups for the same working set. For example, one test data set created by 20 procs running in parallel has 6822 subdirs. And with vanilla reiserfs that would mean 6822 packing localities. This patch turns that into 26 packing localities. This makes sequential reads of big directory trees more efficient, but it also makes the btree more efficient in general. Things end up sorted better because groups of subdirs end up with similar keys in the btree, instead of being spread out all over. The bitmap grouping code tries to use the start of each bitmap group for metadata, and offsets the data slightly. The data and metadata are still close together, but not completely intermixed like they are in the default allocator. The end result is that leaf nodes tend to be close to each other, making metadata readahead more effective. The old block allocator had the ability to enforce a minimum allocation size, but did not use it. It now tries to do a pass looking for larger allocation chunks before falling back to the old behaviour of taking any blocks it can find. The patch changes the defaults to: mount -o alloc=skip_busy:dirid_groups:packing_groups You can get back the old behaviour with mount -o alloc=skip_busy mount -o alloc=dirid_groups will turn on the bitmap groups mount -o alloc=packing_groups turns on the packing locality reduction code mount -o alloc=skip_busy:dirid_groups turns on both dirid_groups and skip_busy Finally the patch adds a mount -o alloc=oid_groups, which puts files into bitmap groups based on a hash of their objectid. This would be used for databases or other situations where you have a limited number of very large files. This command will tell you how many packing localities are actually in use: debugreiserfs -d /dev/xxx | grep '^|.*SD' | sed 's/^.....//' | awk '{print $1}' | sort -u | wc -l Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-