- 29 Dec, 2003 40 commits
-
-
Andrew Morton authored
From: Roland McGrath <roland@redhat.com>

I believe I have identified a failure mode that Linus saw a couple weeks back when tracking down some other fork/exit sorts of races. We saw this come up on rare occasions with the RHEL3 kernel's backport of the new code (while trying to track down other race failure modes we have yet to fix, sigh). I am talking about the following scenario:

> Btw, even with the fix, doing a "while : ; ./crash t 10 ; done" will
> eventually result in a stuck process:
>
>  1415 tty1 D 0:00 ./crash
>
> This is some kind of deadlock: most of the fifty threads are in "D"
> state, with a trace something like
>
>   [<c011fbe3>] schedule+0x360/0x7f8
>   [<c0120539>] wait_for_completion+0xd4/0x1c3
>   [<c0128c9e>] do_exit+0x627/0x6a4
>   [<c0128ddd>] do_group_exit+0x3d/0x177
>   [<c0130c13>] dequeue_signal+0x2d/0x84
>   [<c0133911>] get_signal_to_deliver+0x390/0x575
>   [<c010a541>] do_signal+0x6c/0xf1
>   [<c01200be>] default_wake_function+0x0/0x12
>   [<c01200be>] default_wake_function+0x0/0x12
>   [<c013d50f>] do_futex+0x6d/0x7d
>   [<c013d635>] sys_futex+0x116/0x12f
>   [<c010a601>] do_notify_resume+0x3b/0x3d
>   [<c010a82e>] work_notifysig+0x13/0x15
>
> except for one that is trying to core-dump:
>
>   [<c0120539>] wait_for_completion+0xd4/0x1c3
>   [<c01200be>] default_wake_function+0x0/0x12
>   [<c01200be>] default_wake_function+0x0/0x12
>   [<c02101aa>] rwsem_wake+0x86/0x12d
>   [<c01738af>] coredump_wait+0xa8/0xaa
>   [<c0173a26>] do_coredump+0x175/0x26c
>
> and three that are just doing a regular "exit()" system call:
>
>   [<c011fbe3>] schedule+0x360/0x7f8
>   [<c011e19a>] recalc_task_prio+0x90/0x1aa
>   [<c0120539>] wait_for_completion+0xd4/0x1c3
>   [<c01200be>] default_wake_function+0x0/0x12
>   [<c01200be>] default_wake_function+0x0/0x12
>   [<c0210207>] rwsem_wake+0xe3/0x12d
>   [<c0128c9e>] do_exit+0x627/0x6a4
>   [<c0128d4d>] next_thread+0x0/0x53
>   [<c010a7e3>] syscall_call+0x7/0xb
>
> However, the rest of the system is totally unaffected by this deadlock:
> it's only deadlocked within the thread group itself, nobody else cares.

What happens here is a race between an exiting thread checking mm->core_waiters in __exit_mm, and the thread taking the core-dump signal (in coredump_wait) examining the first thread's ->mm pointer and incrementing mm->core_waiters to account for it. There is no synchronization at all in __exit_mm's use of mm->core_waiters. If the coredump_wait thread reads tsk->mm when tsk is in __exit_mm between checking mm->core_waiters and clearing tsk->mm, then it will increment mm->core_waiters and the total count will later exceed the number of threads that will ever decrement it and synchronize. Hence it blocks forever.

The following patch fixes the problem by using mm->mmap_sem in __exit_mm. The read lock must be held around checking mm->core_waiters and clearing tsk->mm so that coredump_wait (which gets the write lock) cannot come in between and do bogus bookkeeping.
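The interleaving described in this message can be replayed deterministically in user space. The sketch below is only a model under stated assumptions: `mm_model`, `task_model` and the step functions are hypothetical names, not kernel code, and the mmap_sem read lock is reduced to a flag that says whether the exiting thread's check-and-clear runs as one unit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct mm_model { int core_waiters; };
struct task_model { struct mm_model *mm; bool will_sync; };

/* Exiting thread, step 1: look at core_waiters while tsk->mm is still set.
 * The thread will decrement the count later only if it saw a dump here. */
static void exit_check(struct task_model *t)
{
    t->will_sync = (t->mm->core_waiters != 0);
}

/* Exiting thread, step 2: drop the mm pointer. */
static void exit_clear(struct task_model *t)
{
    t->mm = NULL;
}

/* Dumping thread: count a thread that still points at this mm. */
static void dumper_count(struct mm_model *mm, const struct task_model *t)
{
    if (t->mm == mm)
        mm->core_waiters++;
}

/*
 * Replay one exiting thread against one dumper.
 * locked == false: the dumper runs between check and clear (the race).
 * locked == true:  check and clear form one unit, so the dumper can only
 *                  run before or after them; this replay runs it after.
 * Returns the number of counted waiters that will never synchronize.
 */
static int replay(bool locked)
{
    struct mm_model mm = { 0 };
    struct task_model t = { &mm, false };

    exit_check(&t);
    if (!locked)
        dumper_count(&mm, &t);   /* dumper slips in mid-exit */
    exit_clear(&t);
    if (locked)
        dumper_count(&mm, &t);   /* dumper now sees tsk->mm == NULL */

    return mm.core_waiters - (t.will_sync ? 1 : 0);
}
```

In this model `replay(false)` leaves one waiter that nobody will ever account for, which corresponds to the dumper blocking forever, while `replay(true)` leaves none.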
-
Andrew Morton authored
From: Dave Olien <dmo@osdl.org>

Here's a patch that changes the DAC960 driver from having one request queue for ALL disks on the controller, to having a request queue for each logical disk. This turns out to make little difference for the deadline scheduler, or for the AS scheduler under light IO load. But under the AS scheduler with heavy IO, it makes about a 40% difference on the dbt2 workload.

Here are the measured numbers. The 2.6.0-test11-D kernel version includes this multi-queue patch to the DAC960 driver.

For non-cached dbt2 workload (heavy IO load):

Scheduler   kernel/driver     NOTPM (bigger is better)
AS          2.6.0-test11-D    1598
AS          2.6.0-test11      973
deadline    2.6.0-test11      1640
deadline    2.6.0-test11-D    1645

For cached dbt2 workload (lighter IO load):

AS          2.6.0-test11-D    4993
AS          2.6.-test6-mm4    4976, 4890, 4972
deadline    2.6.0-test11-D    4998

Can this be included in 2.6.0? I know it's not a "critical patch" in the sense that something won't work without it. On the other hand, the change is isolated to a driver.
-
Andrew Morton authored
From: Paul Clements <Paul.Clements@SteelEye.com> A previous "cleanup" on the nbd.h header file broke userspace compiles. I've added an #ifdef __KERNEL__ so that userspace doesn't need to worry about the nbd_device structure, which is only used in-kernel. The patch allows me to compile my nbd tools with the 2.6 nbd.h.
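The guard described in this message follows a common header pattern. The sketch below is illustrative only: the `_sketch` structures and their fields are hypothetical and are not the contents of nbd.h.

```c
/* Hypothetical header layout; structure and field names are illustrative. */

/* Visible to both kernel and userspace: the part tools actually need. */
struct nbd_request_sketch {
    unsigned int magic;
    unsigned int type;
};

#ifdef __KERNEL__
/* Kernel-only state. Userspace never sees this definition, so it does not
 * need the kernel-internal types (such as spinlock_t) it depends on. */
struct nbd_device_sketch {
    struct nbd_request_sketch pending;
    spinlock_t queue_lock;
};
#endif

/* Returns 1 when compiled as kernel code, 0 when compiled in userspace. */
static int built_for_kernel(void)
{
#ifdef __KERNEL__
    return 1;
#else
    return 0;
#endif
}
```

Compiled by a userspace tool, everything inside the guard is skipped, so the missing kernel types never cause an error.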
-
Andrew Morton authored
From: Tonnerre Anklin <thunder@keepsake.ch> This spinlock was used uninitialized. Gave me a lot of warnings.
-
Andrew Morton authored
From: Manfred Spraul <manfred@colorfullife.com> If alloc_slabmgmt fails, then kmem_freepages() calls sub_page_state(), although nr_slab was not yet increased. The attached patch fixes that by moving the inc_page_state into kmem_getpages().
-
Andrew Morton authored
From: Rusty Russell <rusty@rustcorp.com.au> Steve Youngs, Stephen Hemminger Three more MODULE_ALIASes. Trivial, but useful if people want things to "just work" in 2.6.0.
-
Andrew Morton authored
From: Ingo Molnar <mingo@elte.hu> I've attached a minor fix for the 2.6.1 timeframe - we clearly meant __struct_cpy_bug(). Newest versions of gcc warn about this.
-
Andrew Morton authored
From: Manfred Spraul <manfred@colorfullife.com> slab_reclaim_pages is increased even if get_free_pages fails. The attached patch moves the update to the correct position.
-
Andrew Morton authored
From: Tim Schmielau <tim@physik3.uni-rostock.de>
-
Andrew Morton authored
From: Kurt Garloff <garloff@suse.de>

The comment in front of vt_ioctl() reads

/*
 * We handle the console-specific ioctl's here. We allow the
 * capability to modify any console, not just the fg_console.
 */

Unfortunately, this does not apply to PIO_UNIMAPCLR, nor GIO_/PIO_UNIMAP. They always operate on the current foreground console, which is inconsistent at least. For most ioctls, the comment is applicable. It also causes problems, as setfont can't do the full job on the non-fg consoles. (OK, our setfont is slightly changed to even try it ... as you know.)

The attached patch does fix this. I have a similar patch for 2.4, but it never got merged :-( because not many people seem to care and I submitted in the middle of the 2.4 series ... It has been in UnitedLinux/SUSE kernels for ages, though.
-
Andrew Morton authored
From: "Martin J. Bligh" <mbligh@aracnet.com>, Guillaume Morin <guillaume@morinfr.org> We need to clear the software dirty bit on the tail pages of a compound page when freeing it up. The tail pages can become dirtied by mmap'ing /dev/mem, and writing into any clustered page group (that a driver might have created or whatever). Plus it's better to run all these pages through the free_pages_check checks anyway.
-
Andrew Morton authored
From: Ingo Molnar <mingo@elte.hu> I'd also suggest the following patch below, to clarify the use of unsynchronized list_empty(). list_empty_careful() can only be safe in the very specific case of "one-shot" list entries which might be removed by another CPU. (but nothing else can happen to them and this is their only final state.) list_empty_careful() is otherwise completely unsynchronized on both the compiler and CPU level and is not 'SMP safe' in any way.
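The difference between the two checks can be shown with a small user-space model. This is a sketch under assumptions: the `_model` helpers are stand-ins written for this example, and the "careful" variant simply tests both link pointers, which is how the check is commonly described.

```c
#include <stdbool.h>

/* Minimal stand-in for a doubly linked list node. */
struct list_head { struct list_head *next, *prev; };

static void list_init_model(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static void list_add_model(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink the entry and leave it pointing at itself. */
static void list_del_init_model(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    list_init_model(e);
}

/* Plain check: looks at ->next only. */
static bool list_empty_model(const struct list_head *head)
{
    return head->next == head;
}

/* Careful check: empty only if ->next and ->prev agree, so a removal that
 * another CPU has only half finished is not reported as "empty". */
static bool list_empty_careful_model(const struct list_head *head)
{
    const struct list_head *next = head->next;

    return next == head && next == head->prev;
}
```

The model contains no memory barriers, so it says nothing about ordering between CPUs; it only shows which intermediate pointer states the two checks accept.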
-
Andrew Morton authored
From: Geert Uytterhoeven <geert@linux-m68k.org> Mailing lists at vger.rutgers.edu are obsolete, use vger.kernel.org instead.
-
Andrew Morton authored
From: Joe Korty <joe.korty@ccur.com> The API for get_compat_timespec / put_compat_timespec is incorrect: it forces a caller with const args to (incorrectly) cast. The posix message queue patch is one such caller.
-
Andrew Morton authored
From: Gerd Knorr <kraxel@bytesex.org> Below is an ObviouslyCorrect[tm] patch which fixes the i2c bus timeout handling in the saa7146 driver.
-
Andrew Morton authored
The locking rules say that b_committed_data is covered by jbd_lock_bh_state(), so implement that during the start of commit, while throwing away unused shadow buffers. I don't expect that there is really a race here, but them's the rules.
-
Andrew Morton authored
From: Badari Pulavarty <pbadari@us.ibm.com> I found the problem with the O_DIRECT memory leak. The problem is that when we are doing a DIO read and cross the end of file, we don't release references on all the pages we got from get_user_pages() (since it is a success case). The fix is to call dio_cleanup() even for success cases.
-
Andrew Morton authored
From: Roland McGrath <roland@redhat.com>

The following test program will crash every time if dynamically linked. I think this bites all 32-bit platforms, including 32-bit executables on 64-bit platforms that support them (and could in theory bite 64-bit platforms with bss sizes beyond the bounds of comprehension).

#include <stdio.h>
#include <stdlib.h>

volatile char hugebss[1080000000];

int main(void)
{
    printf("%p..%p\n", &hugebss[0], &hugebss[sizeof hugebss]);
    system("cat /proc/$PPID/maps");
    hugebss[sizeof hugebss - 1] = 1;
    return 23;
}

The problem is that the kernel maps ld.so at 0x40000000 or some such place, before it maps the bss. Here the bss is so large that it overlaps and clobbers that mapping. I've changed it to map the bss before it loads the interpreter, so that part of the address space is reserved before ld.so's mapping (which doesn't really care where it goes) is done.

This patch also adds error checking to the bss setup (and interpreter's bss setup). With the aforementioned change but no error checking, "ulimit -v 65536; ./hugebss" will crash in the store after the `system' call, because the kernel will have failed to allocate the bss and ignored the error, so the program runs without those pages being mapped at all. With this change it dies with a SIGKILL as for a failure to set up stack pages.

It might be even better to try to detect the case earlier so that execve can return an error before it has wiped out the address space. But that seems like it would always be fragile and miss some corner cases, so I did not try to add such complexity.
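The overlap described in this message is plain address arithmetic. The sketch below assumes a classic i386 layout: the executable base 0x08048000 is an assumption not stated in the message, and the interpreter base is the message's own "0x40000000 or some such place". The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for a classic i386 layout. */
#define ASSUMED_EXEC_BASE   UINT64_C(0x08048000)  /* assumed load address */
#define ASSUMED_INTERP_BASE UINT64_C(0x40000000)  /* approximate ld.so base */

/* Do the half-open ranges [a, a+alen) and [b, b+blen) overlap? */
static bool ranges_overlap(uint64_t a, uint64_t alen,
                           uint64_t b, uint64_t blen)
{
    return a < b + blen && b < a + alen;
}

/* Does a bss of bss_size bytes starting at bss_start run into an
 * interpreter mapping of interp_size bytes at the assumed base? */
static bool bss_clobbers_interp(uint64_t bss_start, uint64_t bss_size,
                                uint64_t interp_size)
{
    return ranges_overlap(bss_start, bss_size,
                          ASSUMED_INTERP_BASE, interp_size);
}
```

With these assumed addresses, the 1080000000-byte array from the test program ends well above 0x40000000, so the two ranges collide, while a bss of a few megabytes stays clear of the interpreter.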
-
Andrew Morton authored
From: Joe Korty <joe.korty@ccur.com> do_gettimeofday() is using tick_usec which is defined in terms of USER_HZ not HZ.
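The unit mismatch can be made concrete with a little arithmetic. The sketch below assumes HZ = 1000 and USER_HZ = 100, and assumes the per-tick length is derived by dividing one second by the tick rate; neither the constants nor the way do_gettimeofday() consumes the value are given in the message, so none of that is modeled here.

```c
/* Assumed values; the message does not state them. */
#define HZ_ASSUMED       1000UL
#define USER_HZ_ASSUMED   100UL

/* Microseconds per tick for a given tick rate, rounded to nearest. */
static unsigned long usec_per_tick(unsigned long hz)
{
    return (1000000UL + hz / 2) / hz;
}

/* Microseconds accounted for after `ticks` timer interrupts, given the
 * per-tick length the code believes in. */
static unsigned long elapsed_usec(unsigned long ticks,
                                  unsigned long tick_len_usec)
{
    return ticks * tick_len_usec;
}
```

Under these assumptions a tick length derived from USER_HZ is ten times the length of a real HZ tick, so any calculation that mixes the two is off by that factor.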
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> stripe to be effective. This patch sets ra_pages appropriately.
-
Andrew Morton authored
From: NeilBrown <neilb@cse.unsw.edu.au> As no md personalities honour the merge_bvec_fn of underlying devices, we must make sure never to submit a bio larger than 1 page when a merge_bvec_fn is defined. raid5 already does this (it never submits bios larger than one page). With this patch, all other raid personalities limit their max_sectors when a merge_bvec_fn is present.
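The one-page limit translates into a small sector count. The sketch below assumes 512-byte sectors and 4096-byte pages, and uses a hypothetical `queue_limits_model` structure in place of the real request-queue fields.

```c
/* Assumed constants: 512-byte sectors and 4096-byte pages. */
#define SECTOR_SHIFT_ASSUMED 9
#define PAGE_SIZE_ASSUMED    4096U

/* Hypothetical stand-in for the limits of an underlying device. */
struct queue_limits_model {
    unsigned int max_sectors;   /* largest request, in sectors */
    int has_merge_bvec_fn;      /* non-zero if the device defines one */
};

/* Limit for the stacked device: never more than the lower device allows,
 * and never more than one page if the lower device has a merge_bvec_fn
 * that the upper layer is not going to call. */
static unsigned int stacked_max_sectors(unsigned int upper_max,
                                        const struct queue_limits_model *lower)
{
    unsigned int one_page = PAGE_SIZE_ASSUMED >> SECTOR_SHIFT_ASSUMED;
    unsigned int limit = upper_max < lower->max_sectors
                             ? upper_max : lower->max_sectors;

    if (lower->has_merge_bvec_fn && limit > one_page)
        limit = one_page;
    return limit;
}
```

With the assumed sizes, one page is 8 sectors, so a device with a merge_bvec_fn caps the stacked limit at 8 while a device without one leaves the smaller of the two limits unchanged.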
-
Andrew Morton authored
From: glee@gnupilgrims.org I think Adrian had forgotten to update the help text.
-
Andrew Morton authored
From: Jan Kara <jack@ucw.cz> Here's a patch which should fix the deadlock with quotas+ext3 reported in 2.4 (the same problem existed in 2.6 but nobody found it).
-
Andrew Morton authored
From: Jan Kara <jack@ucw.cz> I'm sending you a fix for a possible Oops in vfs_quota_sync(). Actually nobody has run into it; I found it when I was looking through the code.
-
Andrew Morton authored
From: Geoffrey Lee <glee@gnupilgrims.org> This fixes what seems to be an obvious = vs == bug in the init301.c sis file.
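For readers unfamiliar with this class of bug, the sketch below shows the general pattern. The names are hypothetical and are not taken from init301.c; the extra parentheses in the buggy version are there only to keep compilers from warning about the assignment.

```c
/* Hypothetical mode value; not from the sis driver. */
#define MODE_A 0x20

/* Buggy form: assignment inside the condition. The test is always true
 * because MODE_A is non-zero, and *mode is overwritten as a side effect. */
static int is_mode_a_buggy(int *mode)
{
    if ((*mode = MODE_A))
        return 1;
    return 0;
}

/* Fixed form: comparison, with no side effect on *mode. */
static int is_mode_a_fixed(const int *mode)
{
    if (*mode == MODE_A)
        return 1;
    return 0;
}
```

The buggy form both reports the wrong answer and destroys the value it was supposed to inspect, which is why such a one-character slip can be hard to trace.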
-
Andrew Morton authored
From: William Lee Irwin III <wli@holomorphy.com> This field is 100% unused. This patch removes it.
-
Andrew Morton authored
From: Andi Kleen <ak@muc.de> 32bit siginfo would sometimes get passed incorrectly on x86-64. This change fixes the conversion function to be a bit dumber, but more correct.
-
Andrew Morton authored
From: Andi Kleen <ak@muc.de> Merge i386 fix. Don't panic in MP table parsing when the table is bad.
-
Andrew Morton authored
From: Andi Kleen <ak@muc.de> Merge signal race fixes from i386 to x86-64. Fix a bug in system call restart, noted by John Blackwood.
-
Andrew Morton authored
From: Andi Kleen <ak@muc.de> Merge the i386 fix for the page fault from Linus to x86-64 (I'm not actually sure what it fixes, but if it's good for 32bit it is likely good for 64bit too)
-
Andrew Morton authored
From: Andi Kleen <ak@muc.de> Make sure we never access anything in kernel mapping while doing the prefetch workaround checks on x86-64. Originally suggested by Jamie Lokier.
-
Andrew Morton authored
From: Andi Kleen <ak@muc.de> Another potential data corruption fix. The 32bit truncate64 on x86-64 did silently truncate offsets >32bit. That broke mysql for example. Fix that. From Chris Wilson
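A 32-bit caller typically hands a 64-bit offset to the kernel as two 32-bit halves. The message does not say how the truncation happened, so the sketch below only illustrates the symptom and the usual way of joining the halves; both function names are hypothetical.

```c
#include <stdint.h>

/* Symptom: only the low half survives, the high half is dropped. */
static int64_t offset_truncated(uint32_t low, uint32_t high)
{
    (void)high;   /* silently ignored */
    return (int64_t)low;
}

/* Usual join: widen the high half first, then shift, then merge. */
static int64_t offset_joined(uint32_t low, uint32_t high)
{
    return (int64_t)(((uint64_t)high << 32) | low);
}
```

For offsets below 4 GiB the two functions agree, which is why this kind of bug tends to show up only with large files such as database tables.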
-
Andrew Morton authored
From: Andi Kleen <ak@muc.de> From Badari Pulavarty. Without this, sysrq-t shows the same backtrace for all processes on x86-64.
-
Andrew Morton authored
From: Andi Kleen <ak@muc.de> A lot of people have run into this: the x86-64 cpuid driver didn't compile as a module. Using a kludge suggested by Sam Ravnborg.
-
Andrew Morton authored
From: Andi Kleen <ak@muc.de>

Please consider applying this patch, I would consider it critical for x86-64.

The 2.6.0 x86-64 IOMMU code unfortunately had a few problems, leading to non-booting systems and in a few cases to data corruption.

It fixes two serious bugs in handling special kinds of scatter gather lists in pci_map_sg.

AGP was completely broken with IOMMU because of a wrong #ifdef. Fix that.

One TLB flush optimization I did a long time ago seems to break on some 3ware boards (which require IOMMU because they don't support 64bit addresses). The breakage led to data corruption. This patch disables the optimization for now and fixes a potential SMP race in the flush code too. The TLB flush is done in a slower, but more reliable way now too.

This patch fixes them. Please consider applying, because some of these problems hit quite a few people.

This also disables the IOMMU_DEBUG in the defconfig. A lot of people were using the IOMMU when they didn't need to, which multiplied the problems.

IOMMU merge is disabled for now. This was an experimental optimization which helped with some block devices, but for production it seems to be better to disable it for now because there are some questionable corner cases when the IOMMU aperture fragments. The same is done for IOMMU SAC force, which was related to that.

i386 has quite broken semantics for pci_alloc_consistent(). It uses the standard device DMA mask instead of the consistent mask. Make us bug-to-bug compatible here. This fixes problems with some sound drivers that don't support full 32bit addressing.
-
Andrew Morton authored
From: Andi Kleen <ak@muc.de> Add 32bit a.out support for x86-64. Not exactly an important bug fix, but maybe it will help someone. This should increase the current 98% compatibility to i386 to perhaps 98.1% @) I tested an old a.out SuSE 4.2 installation in chroot and it worked. It also ran some very old linux binaries from '92 found on ftp.funet.fi. The only program that didn't was the SuSE a.out GNU emacs, but I was too lazy to track that down. Core dumps are not supported.
-
Andrew Morton authored
From: Andi Kleen <ak@muc.de> It fixes the statfs64 emulation on x86-64. The problem is that x86-64 needs an __attribute__((aligned)) on the compat_statfs64 structure. The conclusion last time this was discussed was that the structure should be duplicated. Essentially it is the old shared structure copied to every user and x86-64 uses __attribute__((packed)).
-
Andrew Morton authored
From: Mark Haverkamp <markh@osdl.org> About three weeks ago markw at osdl posted a mail about a panic that he was seeing: http://marc.theaimsgroup.com/?l=linux-kernel&m=106737176716474&w=2 I believe what is happening, is that the dm __clone_and_map function is generating bio structures with the bi_idx field non-zero. When __blk_queue_bounce creates a new bio with bounce pages, it sets the bi_idx field to 0 rather than the bi_idx of the original. This causes trouble since bv_page pointers will be dereferenced later that are zero. The following uses the original bio structure's bi_idx in the new bio structure and in copy_to_high_bio_irq and bounce_end_io. This has cleared up the panic when using the volume. (acked by Joe Thornber)
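The failure described in this message can be modeled with a reduced structure. This is a sketch under assumptions: `bio_model` and `bvec_model` are hypothetical and far simpler than the kernel's structures, and `keep_idx` is a switch invented for this example to contrast the old and new behaviour.

```c
#include <stddef.h>

#define MAX_VECS 8

/* Hypothetical, heavily reduced model of a bio. */
struct bvec_model { const char *page; };
struct bio_model {
    struct bvec_model vec[MAX_VECS];
    unsigned short vcnt;   /* number of slots in use */
    unsigned short idx;    /* first slot still to be processed */
};

/* Build a bounced copy. Only slots from orig->idx onward receive a page;
 * earlier slots stay NULL. keep_idx != 0 carries the original index over,
 * keep_idx == 0 resets it to zero as the old code did. */
static void bounce_copy(struct bio_model *dst, const struct bio_model *orig,
                        const char *bounce_page, int keep_idx)
{
    unsigned short i;

    for (i = 0; i < MAX_VECS; i++)
        dst->vec[i].page = NULL;
    for (i = orig->idx; i < orig->vcnt; i++)
        dst->vec[i].page = bounce_page;
    dst->vcnt = orig->vcnt;
    dst->idx = keep_idx ? orig->idx : 0;
}

/* Count the slots a consumer would walk that hold no page. */
static int null_pages_seen(const struct bio_model *bio)
{
    int nulls = 0;
    unsigned short i;

    for (i = bio->idx; i < bio->vcnt; i++)
        if (bio->vec[i].page == NULL)
            nulls++;
    return nulls;
}
```

When the copy starts at index zero but the original started at a later slot, the consumer walks the unfilled slots first; in the kernel those are the zero bv_page pointers the message mentions.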
-
Andrew Morton authored
From: Neil Brown <neilb@cse.unsw.edu.au> Change ext3 to run bd_claim() against external journal devices. It is significant only for those who have ext3 journals on a separate device, and gets exclusive access to that device.
-
Andrew Morton authored
From: Arnaldo Carvalho de Melo <acme@conectiva.com.br> pagemap.h, do not include thyself.
-