1. 29 Dec, 2003 40 commits
    • [PATCH] synchronize use of mm->core_waiters · 99365bd4
      Andrew Morton authored
      From: Roland McGrath <roland@redhat.com>
      
      I believe I have identified a failure mode that Linus saw a couple weeks
      back when tracking down some other fork/exit sorts of races.  We saw this
      come up on rare occasions with the RHEL3 kernel's backport of the new code
      (while trying to track down other race failure modes we have yet to fix, sigh).
      
      I am talking about the following scenario:
      
      > Btw, even with the fix, doing a "while : ; ./crash t 10 ; done" will
      > eventually result in a stuck process:
      >
      > 	 1415 tty1     D      0:00 ./crash
      >
      > This is some kind of deadlock: most of the fifty threads are in "D"
      > state, with a trace something like
      >
      > 	 [<c011fbe3>] schedule+0x360/0x7f8
      > 	 [<c0120539>] wait_for_completion+0xd4/0x1c3
      > 	 [<c0128c9e>] do_exit+0x627/0x6a4
      > 	 [<c0128ddd>] do_group_exit+0x3d/0x177
      > 	 [<c0130c13>] dequeue_signal+0x2d/0x84
      > 	 [<c0133911>] get_signal_to_deliver+0x390/0x575
      > 	 [<c010a541>] do_signal+0x6c/0xf1
      > 	 [<c01200be>] default_wake_function+0x0/0x12
      > 	 [<c01200be>] default_wake_function+0x0/0x12
      > 	 [<c013d50f>] do_futex+0x6d/0x7d
      > 	 [<c013d635>] sys_futex+0x116/0x12f
      > 	 [<c010a601>] do_notify_resume+0x3b/0x3d
      > 	 [<c010a82e>] work_notifysig+0x13/0x15
      >
      > except for one that is trying to core-dump:
      >
      > 	 [<c0120539>] wait_for_completion+0xd4/0x1c3
      > 	 [<c01200be>] default_wake_function+0x0/0x12
      > 	 [<c01200be>] default_wake_function+0x0/0x12
      > 	 [<c02101aa>] rwsem_wake+0x86/0x12d
      > 	 [<c01738af>] coredump_wait+0xa8/0xaa
      > 	 [<c0173a26>] do_coredump+0x175/0x26c
      >
      > and three that are just doing a regular "exit()" system call:
      >
      > 	 [<c011fbe3>] schedule+0x360/0x7f8
      > 	 [<c011e19a>] recalc_task_prio+0x90/0x1aa
      > 	 [<c0120539>] wait_for_completion+0xd4/0x1c3
      > 	 [<c01200be>] default_wake_function+0x0/0x12
      > 	 [<c01200be>] default_wake_function+0x0/0x12
      > 	 [<c0210207>] rwsem_wake+0xe3/0x12d
      > 	 [<c0128c9e>] do_exit+0x627/0x6a4
      > 	 [<c0128d4d>] next_thread+0x0/0x53
      > 	 [<c010a7e3>] syscall_call+0x7/0xb
      >
      > However, the rest of the system is totally unaffected by this deadlock:
      > it's only deadlocked within the thread group itself, nobody else cares.
      
      What happens here is a race between an exiting thread checking
      mm->core_waiters in __exit_mm, and the thread taking the core-dump signal
      (in coredump_wait) examining the first thread's ->mm pointer and
      incrementing mm->core_waiters to account for it.  There is no
      synchronization at all in __exit_mm's use of mm->core_waiters.  If the
      coredump_wait thread reads tsk->mm when tsk is in __exit_mm between
      checking mm->core_waiters and clearing tsk->mm, then it will increment
      mm->core_waiters and the total count will later exceed the number of
      threads that will ever decrement it and synchronize.  Hence it blocks forever.
      
      The following patch fixes the problem by using mm->mmap_sem in __exit_mm.
      The read lock must be held around checking mm->core_waiters and clearing
      tsk->mm so that coredump_wait (which gets the write lock) cannot come in
      between and do bogus bookkeeping.
    • [PATCH] DAC960 request queue per disk · dc942a21
      Andrew Morton authored
      From: Dave Olien <dmo@osdl.org>
      
      Here's a patch that changes the DAC960 driver from having one request
      queue for ALL disks on the controller, to having a request queue for
      each logical disk.  This makes little difference for the deadline
      scheduler, or for the AS scheduler under light IO load.  But under the AS
      scheduler with heavy IO, it makes about a 40% difference on the dbt2
      workload.  Here are the measured numbers:
      
      The 2.6.0-test11-D kernel version includes this multi-queue patch to the
      DAC960 driver.
      
      For non-cached dbt2 workload  (heavy IO load)
      
      Scheduler   Kernel/driver     NOTPM (bigger is better)
      AS          2.6.0-test11-D    1598
      AS          2.6.0-test11       973
      deadline    2.6.0-test11      1640
      deadline    2.6.0-test11-D    1645
      
      For cached dbt2 workload (lighter IO load)
      
      Scheduler   Kernel/driver     NOTPM (bigger is better)
      AS          2.6.0-test11-D    4993
      AS          2.6.-test6-mm4    4976, 4890, 4972
      deadline    2.6.0-test11-D    4998
      
      Can this be included in 2.6.0?  I know it's not a "critical patch"
      in the sense that something won't work without it.  On the other hand,
      the change is isolated to a driver.
    • [PATCH] fix userspace compiles with nbd.h · dd5a4db6
      Andrew Morton authored
      From: Paul Clements <Paul.Clements@SteelEye.com>
      
      A previous "cleanup" on the nbd.h header file broke userspace compiles.
      I've added an #ifdef __KERNEL__ so that userspace doesn't need to worry
      about the nbd_device structure, which is only used in-kernel. The patch
      allows me to compile my nbd tools with the 2.6 nbd.h.
    • [PATCH] isdn_ppp_ccp.c uses uninitialized spinlock · fe8bbcd3
      Andrew Morton authored
      From: Tonnerre Anklin <thunder@keepsake.ch>
      
      This spinlock was used uninitialized. Gave me a lot of warnings.
    • [PATCH] nr_slab accounting fix · d71abcaf
      Andrew Morton authored
      From: Manfred Spraul <manfred@colorfullife.com>
      
      If alloc_slabmgmt fails, then kmem_freepages() calls sub_page_state(),
      although nr_slab was not yet increased.  The attached patch fixes that by
      moving the inc_page_state into kmem_getpages().
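
The accounting rule can be illustrated with a small user-space sketch. All names here (`nr_slab_sketch`, `getpages_sketch`, `freepages_sketch`, `grow_sketch`) are hypothetical, and `malloc` stands in for the page allocator. The idea it tries to show is that the counter is incremented in the same function that obtains the pages, so the decrement in the free path always has a matching increment, even when a later step fails.

```c
#include <assert.h>
#include <stdlib.h>

static long nr_slab_sketch;	/* stands in for the nr_slab page-state counter */

/* Obtain pages and account for them in the same place. */
static void *getpages_sketch(size_t bytes)
{
	void *pages = malloc(bytes);

	if (pages != NULL)
		nr_slab_sketch++;	/* counted only once the pages exist */
	return pages;
}

/* Release pages; the decrement pairs with the increment above. */
static void freepages_sketch(void *pages)
{
	assert(pages != NULL);
	nr_slab_sketch--;
	free(pages);
}

/*
 * Grow a cache. mgmt_alloc_fails simulates the failure of the slab
 * management allocation. Returns 0 on success, -1 on failure.
 */
static int grow_sketch(size_t bytes, int mgmt_alloc_fails, void **out)
{
	void *pages = getpages_sketch(bytes);

	if (pages == NULL)
		return -1;
	if (mgmt_alloc_fails) {
		freepages_sketch(pages);	/* counter returns to its old value */
		return -1;
	}
	*out = pages;
	return 0;
}
```

With the increment placed after the management allocation instead, the failure path would decrement a counter that was never incremented.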
    • [PATCH] More MODULE_ALIASes · 6788a95d
      Andrew Morton authored
      From: Rusty Russell <rusty@rustcorp.com.au>
            Steve Youngs, Stephen Hemminger
      
      Three more MODULE_ALIASes.  Trivial, but useful if people want things
      to "just work" in 2.6.0.
    • [PATCH] struct_cpy compilation warning · e85132b2
      Andrew Morton authored
      From: Ingo Molnar <mingo@elte.hu>
      
      I've attached a minor fix for the 2.6.1 timeframe - we clearly meant
      __struct_cpy_bug().  Newest versions of gcc warn about this.
    • [PATCH] slab reclaim accounting fix · 1cdf0eef
      Andrew Morton authored
      From: Manfred Spraul <manfred@colorfullife.com>
      
      slab_reclaim_pages is increased even if get_free_pages fails.  The attached
      patch moves the update to the correct position.
    • [PATCH] fix outdated comment in jiffies.h · 162bc7d1
      Andrew Morton authored
      From: Tim Schmielau <tim@physik3.uni-rostock.de>
    • [PATCH] Allow unimap change on non fg console · a4b05bb1
      Andrew Morton authored
      From: Kurt Garloff <garloff@suse.de>
      
      The comment in front of vt_ioctl() reads
      /*
       * We handle the console-specific ioctl's here.  We allow the
       * capability to modify any console, not just the fg_console.
       */
      
      Unfortunately, this does not apply to PIO_UNIMAPCLR, nor
      GIO_/PIO_UNIMAP. They always operate on the current foreground
      console, which is inconsistent at least. For most ioctls, the
      comment is applicable.
      
      It also causes problems, as setfont can't do the full job on
      the non-fg consoles. (OK, our setfont is slightly changed to
      even try it ... as you know.)
      
      The attached patch does fix this.
      
      I have a similar patch for 2.4, but it never got merged :-(
      because not many people seem to care and I submitted in the middle
      of the 2.4 series ...
      It has been in UnitedLinux/SUSE kernels for ages, though.
    • [PATCH] Clear dirty bits etc on compound frees · e86ff3c7
      Andrew Morton authored
      From: "Martin J. Bligh" <mbligh@aracnet.com>,
            Guillaume Morin <guillaume@morinfr.org>
      
      We need to clear the software dirty bit on the tail pages of a compound page
      when freeing it up.
      
      The tail pages can become dirtied by mmap'ing /dev/mem, and writing into
      any clustered page group (that a driver might have created or whatever).
      
      Plus it's better to run all these pages through the free_pages_check checks
      anyway.
    • [PATCH] list_empty_careful() documentation. · 3182fe92
      Andrew Morton authored
      From: Ingo Molnar <mingo@elte.hu>
      
      I'd also suggest the following patch below, to clarify the use of
      unsynchronized list_empty().  list_empty_careful() can only be safe in the
      very specific case of "one-shot" list entries which might be removed by
      another CPU.  (but nothing else can happen to them and this is their only
      final state.) list_empty_careful() is otherwise completely unsynchronized
      on both the compiler and CPU level and is not 'SMP safe' in any way.
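
The difference between the two checks can be sketched outside the kernel. The code below is an illustration under assumptions, not the kernel header: the `_sketch` names are hypothetical, and the half-updated state used in the usage note is constructed by hand rather than produced by a real concurrent removal.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal list head with the same shape as the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head_sketch(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_sketch(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Plain check: looks at ->next only. */
static int list_empty_sketch(const struct list_head *head)
{
	return head->next == head;
}

/*
 * Careful check: both ->next and ->prev must point back at head, so a
 * removal that has updated only one of the two pointers is not
 * reported as an empty list.
 */
static int list_empty_careful_sketch(const struct list_head *head)
{
	const struct list_head *next;

	assert(head != NULL);
	next = head->next;
	return next == head && next == head->prev;
}
```

If a single entry is being removed and only `head->next` has been rewritten so far, the plain check already reports "empty" while the careful check does not. As the message says, this covers only the one-shot removal case and provides no ordering guarantees of its own.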
    • [PATCH] MAINTAINERS vger.rutgers.edu · c13bb409
      Andrew Morton authored
      From: Geert Uytterhoeven <geert@linux-m68k.org>
      
      Mailing lists at vger.rutgers.edu are obsolete, use vger.kernel.org
      instead.
    • [PATCH] more correct get_compat_timespec interface · 0eea2040
      Andrew Morton authored
      From: Joe Korty <joe.korty@ccur.com>
      
      The API for get_compat_timespec / put_compat_timespec is incorrect, it
      forces a caller with const args to (incorrectly) cast.  The posix message
      queue patch is one such caller.
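
A rough sketch of what a const-correct pair of conversion helpers looks like follows. The struct and function names are hypothetical, and the user-space copy primitives that real kernel code would need are left out. The only point is that the side that is merely read is const-qualified, so a caller holding a const object needs no cast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical native and 32-bit-compat time structures. */
struct ts_sketch {
	long tv_sec;
	long tv_nsec;
};

struct compat_ts_sketch {
	int32_t tv_sec;
	int32_t tv_nsec;
};

/* The compat source is only read, so it is const-qualified. */
static int get_compat_ts_sketch(struct ts_sketch *ts,
				const struct compat_ts_sketch *cts)
{
	assert(ts != NULL && cts != NULL);
	ts->tv_sec = cts->tv_sec;
	ts->tv_nsec = cts->tv_nsec;
	return 0;
}

/* The native source is only read, so it is const-qualified. */
static int put_compat_ts_sketch(const struct ts_sketch *ts,
				struct compat_ts_sketch *cts)
{
	assert(ts != NULL && cts != NULL);
	cts->tv_sec = (int32_t)ts->tv_sec;
	cts->tv_nsec = (int32_t)ts->tv_nsec;
	return 0;
}
```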
    • [PATCH] dvb i2c timeout fix · 0f4e98bc
      Andrew Morton authored
      From: Gerd Knorr <kraxel@bytesex.org>
      
      Below is an ObviouslyCorrect[tm] patch which fixes the i2c bus timeout
      handling in the saa7146 driver.
    • [PATCH] JBD: b_committed_data locking fix · 524e63d2
      Andrew Morton authored
      The locking rules say that b_committed_data is covered by
      jbd_lock_bh_state(), so implement that during the start of commit, while
      throwing away unused shadow buffers.
      
      I don't expect that there is really a race here, but them's the rules.
    • [PATCH] O_DIRECT memory leak fix · 7e3989bb
      Andrew Morton authored
      From: Badari Pulavarty <pbadari@us.ibm.com>
      
      I found the problem with O_DIRECT memory leak.
      
      The problem is that when we are doing a DIO read and cross the end of file, we
      don't release references on all the pages we got from get_user_pages()
      (since it is a success case).
      
      The fix is to call dio_cleanup() even for success cases.
    • [PATCH] fix ELF exec with huge bss · 0363994f
      Andrew Morton authored
      From: Roland McGrath <roland@redhat.com>
      
      The following test program will crash every time if dynamically linked.
      I think this bites all 32-bit platforms, including 32-bit executables on
      64-bit platforms that support them (and could in theory bite 64-bit
      platforms with bss sizes beyond the bounds of comprehension).
      
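      	#include <stdio.h>
      	#include <stdlib.h>

      	volatile char hugebss[1080000000];
      	int main(void)
      	{
      		/* Print the bss range, then the process's own mappings. */
      		printf("%p..%p\n", (void *)&hugebss[0], (void *)&hugebss[sizeof hugebss]);
      		system("cat /proc/$PPID/maps");
      		hugebss[sizeof hugebss - 1] = 1;	/* touch the last bss byte */
      		return 23;
      	}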
      
      The problem is that the kernel maps ld.so at 0x40000000 or some such place,
      before it maps the bss.  Here the bss is so large that it overlaps and
      clobbers that mapping.  I've changed it to map the bss before it loads the
      interpreter, so that part of the address space is reserved before ld.so's
      mapping (which doesn't really care where it goes) is done.
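
The overlap can be checked with simple address arithmetic. The sketch below assumes a classic i386 layout: the executable base 0x08048000 is an assumption not stated in the message, and 0x40000000 is the approximate ld.so address quoted above. `bss_overlaps_interp` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; neither is taken from the patch. */
#define ASSUMED_EXEC_BASE   0x08048000u	/* typical i386 ELF load address */
#define ASSUMED_INTERP_BASE 0x40000000u	/* "0x40000000 or some such place" */

/*
 * Returns 1 if a bss of bss_size bytes starting at bss_start extends
 * past the address where the interpreter was mapped, 0 otherwise.
 */
static int bss_overlaps_interp(uint32_t bss_start, uint32_t bss_size)
{
	uint64_t bss_end = (uint64_t)bss_start + bss_size;

	assert(bss_start < ASSUMED_INTERP_BASE);
	return bss_end > ASSUMED_INTERP_BASE;
}
```

With these assumed numbers, the 1080000000-byte array from the test program ends well above 0x40000000, while an ordinary small bss does not come close.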
      
      This patch also adds error checking to the bss setup (and interpreter's bss
      setup).  With the aforementioned change but no error checking, "ulimit -v
      65536; ./hugebss" will crash in the store after the `system' call, because
      the kernel will have failed to allocate the bss and ignored the error, so
      the program runs without those pages being mapped at all.  With this change
      it dies with a SIGKILL as for a failure to set up stack pages.  It might be
      even better to try to detect the case earlier so that execve can return an
      error before it has wiped out the address space.  But that seems like it
      would always be fragile and miss some corner cases, so I did not try to add
      such complexity.
    • [PATCH] Erronous use of tick_usec in do_gettimeofday · 709087ca
      Andrew Morton authored
      From: Joe Korty <joe.korty@ccur.com>
      
      do_gettimeofday() is using tick_usec which is defined in terms of USER_HZ
      not HZ.
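
The size of the mismatch is easy to see with numbers. The values below are assumptions for illustration (USER_HZ of 100 and HZ of 1000, as on i386 in this era), and `usec_per_tick` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/* Assumed tick rates; the real values depend on the architecture. */
#define SKETCH_USER_HZ 100UL	/* rate exposed to user space */
#define SKETCH_HZ      1000UL	/* rate of the kernel timer interrupt */

/* Microseconds per tick at the given rate, rounded to nearest. */
static unsigned long usec_per_tick(unsigned long hz)
{
	assert(hz > 0);
	return (1000000UL + hz / 2) / hz;
}
```

Under these assumptions a value derived from USER_HZ is 10000 microseconds, ten times the 1000 microseconds that one timer tick actually lasts, so any per-tick correction computed with it would be off by that factor.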
    • [PATCH] md: set ra_pages for raid0/raid5 devices properly. · c5b971d7
      Andrew Morton authored
      From: NeilBrown <neilb@cse.unsw.edu.au>
      
      stripe to be effective.  This patch sets ra_pages
      appropriately.
    • [PATCH] md: Limit max_sectors on md when merge_bvec_fn defined on underlying device. · 59165b4f
      Andrew Morton authored
      From: NeilBrown <neilb@cse.unsw.edu.au>
      
      As no md personalities honour the merge_bvec_fn of underlying devices,
      we must make sure never to submit a bio larger than 1 page when a 
      merge_bvec_fn is defined.
      
      raid5 already does this (it never submits bios larger than one page).
      With this patch, all other raid personalities limit their
      max_sectors when a merge_bvec_fn is present.
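
The one-page limit translates into a sector count as follows. This is a sketch under assumptions: a 4096-byte page and 512-byte sectors are assumed, and `limit_max_sectors_sketch` is a hypothetical helper rather than the code in the patch.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE    4096u	/* assumed page size in bytes */
#define SKETCH_SECTOR_SHIFT 9		/* 512-byte sectors */

/*
 * Clamp max_sectors to one page worth of sectors when the underlying
 * device defines a merge_bvec_fn; otherwise leave it unchanged.
 */
static unsigned int limit_max_sectors_sketch(unsigned int max_sectors,
					     int has_merge_bvec_fn)
{
	unsigned int one_page = SKETCH_PAGE_SIZE >> SKETCH_SECTOR_SHIFT;

	assert(max_sectors > 0);
	if (has_merge_bvec_fn && max_sectors > one_page)
		return one_page;
	return max_sectors;
}
```

With these assumed sizes the limit is 8 sectors, so no bio larger than one page can be built for the device.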
    • [PATCH] BINFMT_ELF=m is not an option · 0b0a866d
      Andrew Morton authored
      From: glee@gnupilgrims.org
      
      I think Adrian had forgotten to update the help text.
    • [PATCH] Ext3+quota deadlock fix · db84a820
      Andrew Morton authored
      From: Jan Kara <jack@ucw.cz>
      
      Here's a patch which should fix the deadlock with quotas+ext3 reported in 2.4
      (the same problem existed in 2.6 but nobody found it).
    • [PATCH] Fix possible oops in vfs_quota_sync() · b0d8c562
      Andrew Morton authored
      From: Jan Kara <jack@ucw.cz>
      
      I'm sending you a fix for a possible Oops in vfs_quota_sync().  Actually
      nobody has run into it; I found it when I was looking through the code.
    • [PATCH] sis comparison / assignment operator fix · 155717ab
      Andrew Morton authored
      From: Geoffrey Lee <glee@gnupilgrims.org>
      
      This fixes what seems to be an obvious = vs == bug in the init301.c sis
      file.
    • [PATCH] remove mm->swap_address · 695716f5
      Andrew Morton authored
      From: William Lee Irwin III <wli@holomorphy.com>
      
      This field is 100% unused. This patch removes it.
    • [PATCH] Fix 32bit siginfo problems on x86-64 · 3b35cbe5
      Andrew Morton authored
      From: Andi Kleen <ak@muc.de>
      
      32bit siginfo would sometimes get passed incorrectly on x86-64. This
      change fixes the conversion function to be a bit dumber, but more
      correct.
    • [PATCH] Don't panic in mpparse on x86-64 · 53b3aa6c
      Andrew Morton authored
      From: Andi Kleen <ak@muc.de>
      
      Merge i386 fix. Don't panic in MP table parsing when the table is bad.
    • [PATCH] Signal fixes for x86-64 · ca981c9f
      Andrew Morton authored
      From: Andi Kleen <ak@muc.de>
      
      Merge signal race fixes from i386 to x86-64.
      
      Fix a bug in system call restart, noted by John Blackwood.
    • [PATCH] Merge i386 fix for page fault to x86-64 · 2988d8dd
      Andrew Morton authored
      From: Andi Kleen <ak@muc.de>
      
      Merge the i386 fix for the page fault from Linus to x86-64
      (I'm not actually sure what it fixes, but if it's good for 32bit
      it is likely good for 64bit too)
    • [PATCH] Add more paranoid checking in x86-64 prefetch checker · cf79a124
      Andrew Morton authored
      From: Andi Kleen <ak@muc.de>
      
      Make sure we never access anything in kernel mapping while
      doing the prefetch workaround checks on x86-64.
      
      Originally suggested by Jamie Lokier.
    • [PATCH] Fix 32bit truncate on x86-64 · 8f0f4aaa
      Andrew Morton authored
      From: Andi Kleen <ak@muc.de>
      
      Another potential data corruption fix.
      
      The 32bit truncate64 on x86-64 did silently truncate
      offsets >32bit. That broke mysql for example. Fix that.
      
      From Chris Wilson
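
One way such a truncation can arise is when a 64-bit offset arrives from a 32-bit caller as two 32-bit halves and only the low half is used. Whether that is the exact mechanism fixed here is an assumption; the message only says that offsets above 32 bits were silently truncated. Both helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy shape: the high half is ignored, so large offsets wrap. */
static int64_t offset_truncated_sketch(uint32_t low, uint32_t high)
{
	(void)high;
	return (int64_t)low;
}

/* Intended shape: rebuild the full 64-bit offset from both halves. */
static int64_t offset_combined_sketch(uint32_t low, uint32_t high)
{
	/* Keep the result representable as a non-negative signed offset. */
	assert(high <= 0x7fffffffu);
	return (int64_t)(((uint64_t)high << 32) | low);
}
```

For a file truncated to exactly 4 GiB the halves are low = 0 and high = 1: the buggy shape yields 0, which would empty the file, while the combined form yields 4294967296.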
    • [PATCH] Fix sysrq-t on x86-64 · 3959fde8
      Andrew Morton authored
      From: Andi Kleen <ak@muc.de>
      
      From Badari Pulavarty
      
      Without this sysrq-t shows the same backtrace for all processes on x86-64
    • [PATCH] Fix CPUID compilation on x86-64 · 2393a309
      Andrew Morton authored
      From: Andi Kleen <ak@muc.de>
      
      A lot of people have run into this: the x86-64 cpuid driver didn't
      compile as a module.
      
      Using a kludge suggested by Sam Ravnborg.
    • [PATCH] Critical x86-64 IOMMU fixes for 2.6.0 · f2059100
      Andrew Morton authored
      From: Andi Kleen <ak@muc.de>
      
      Please consider applying this patch, I would consider it critical for x86-64.
      
      The 2.6.0 x86-64 IOMMU code unfortunately had a few problems, leading
      to non booting systems and in a few cases to data corruption.
      
      It fixes two serious bugs in handling special kinds of scatter gather
      lists in pci_map_sg.
      
      AGP was completely broken with IOMMU because of a wrong #ifdef.
      Fix that.
      
      One TLB flush optimization I did a long time ago seems to break on
      some 3ware boards (which require IOMMU because they don't support 64bit
      addresses).  The breakage led to data corruption. This patch disables
      the optimization for now and fixes a potential SMP race in the flush
      code too. The TLB flush is now done in a slower, but more reliable way.
      
      This patch fixes them. Please consider applying, because some of these
      problems hit quite a lot of people.
      
      This also disables the IOMMU_DEBUG in the defconfig. A lot of people 
      were using the IOMMU when they didn't need to, which multiplied the
      problems.
      
      IOMMU merge is disabled for now. This was an experimental optimization
      which helped with some block devices, but for production it seems to
      be better to disable it for now because there are some questionable
      corner cases when the IOMMU aperture fragments. The same is done
      for IOMMU SAC force, which was related to that. 
      
      i386 has quite broken semantics for pci_alloc_consistent(). It uses
      the standard device DMA mask instead of the consistent mask. Make us
      bug-to-bug compatible here. This fixes problems with some sound
      drivers that don't support full 32bit addressing.
    • [PATCH] Add a.out support for x86-64 · b14a4258
      Andrew Morton authored
      From: Andi Kleen <ak@muc.de>
      
      Add 32bit a.out support for x86-64.
      
      Not exactly an important bug fix, but maybe it will help someone.  This
      should increase the current 98% compatibility to i386 to perhaps 98.1% @)
      
      I tested an old a.out SuSE 4.2 installation in chroot and it worked.  It
      also ran some very old linux binaries from '92 found on ftp.funet.fi.  The
      only program that didn't was the SuSE a.out GNU emacs, but I was too lazy
      to track that down.  Core dumps are not supported.
    • [PATCH] statfs64 fix · dce80777
      Andrew Morton authored
      From: Andi Kleen <ak@muc.de>
      
      It fixes the statfs64 emulation on x86-64.  The problem is that x86-64
      needs an __attribute__((aligned)) on the compat_statfs64 structure.  The
      conclusion last time this was discussed was that the structure should be
      duplicated.
      
      Essentially it is the old shared structure copied to every user and x86-64
      uses __attribute__((packed)).
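
Why a layout attribute matters here can be shown with a toy structure. The record below is hypothetical and is not the real compat_statfs64 layout; the attribute syntax is the GCC/Clang extension. A 64-bit field is aligned to 4 bytes under the i386 ABI but to 8 bytes on x86-64, so the same declaration produces different offsets unless the layout is pinned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record laid out with the compiler's natural alignment. */
struct rec_natural_sketch {
	uint32_t type;
	uint64_t blocks;
	uint32_t namelen;
};

/* Same fields with padding suppressed, matching a 4-byte-aligned layout. */
struct rec_packed_sketch {
	uint32_t type;
	uint64_t blocks;
	uint32_t namelen;
} __attribute__((packed));

/* The packed form has one fixed layout regardless of the host ABI. */
static_assert(sizeof(struct rec_packed_sketch) == 16,
	      "packed record must be exactly 16 bytes");
static_assert(offsetof(struct rec_packed_sketch, blocks) == 4,
	      "64-bit field must follow the first 32-bit field directly");
```

On an x86-64 host the natural form is larger than 16 bytes because of padding before and after the 64-bit field, which is the kind of mismatch a 32-bit caller would trip over.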
    • [PATCH] dm and bounce buffer panic fix · 85734c47
      Andrew Morton authored
      From: Mark Haverkamp <markh@osdl.org>
      
      About three weeks ago markw at osdl posted a mail about a panic that he
      was seeing:
      
      http://marc.theaimsgroup.com/?l=linux-kernel&m=106737176716474&w=2
      
      I believe what is happening is that the dm __clone_and_map function is
      generating bio structures with the bi_idx field non-zero.  When
      __blk_queue_bounce creates a new bio with bounce pages, it sets the bi_idx
      field to 0 rather than the bi_idx of the original.  This causes trouble since
      bv_page pointers will be dereferenced later that are zero.  The following
      uses the original bio structure's bi_idx in the new bio structure and in
      copy_to_high_bio_irq and bounce_end_io.
      
      This has cleared up the panic when using the volume.
      
      (acked by Joe Thornber)
    • [PATCH] ext3: bd_claim for journal device · 9907e736
      Andrew Morton authored
      From: Neil Brown <neilb@cse.unsw.edu.au>
      
      Change ext3 to run bd_claim() against external journal devices. It is
      significant only for those who have ext3 journals on a separate device, and
      gets exclusive access to that device.
    • [PATCH] remove include recursion from linux/pagemap.h · 1fcec52f
      Andrew Morton authored
      From: Arnaldo Carvalho de Melo <acme@conectiva.com.br>
      
      pagemap.h, do not include thyself.