1. 20 Jun, 2007 5 commits
  2. 19 Jun, 2007 16 commits
  3. 18 Jun, 2007 7 commits
    • Fix possible runqueue lock starvation in wait_task_inactive() · fa490cfd
      Linus Torvalds authored
      Miklos Szeredi reported very long pauses (several seconds, sometimes
      more) on his T60 (with a Core2Duo) which he managed to track down to
      wait_task_inactive()'s open-coded busy-loop.
      
      He observed that an interrupt on one core tries to acquire the
      runqueue-lock but does not succeed in doing so for a very long time -
      while wait_task_inactive() on the other core loops waiting for the first
      core to deschedule a task (which it won't do while spinning in an
      interrupt handler).
      
      This rewrites wait_task_inactive() to do all its waiting optimistically
      without any locks taken at all, and then just double-check the end
      result with the proper runqueue lock held over just a very short
      section.  If there were races in the optimistic wait, or a preemption
      event scheduled the process away, we simply re-synchronize, and start
      over.
      
      So the code now looks like this:
      
      	repeat:
      		/* Unlocked, optimistic looping! */
      		rq = task_rq(p);
      		while (task_running(rq, p))
      			cpu_relax();
      
      		/* Get the *real* values */
      		rq = task_rq_lock(p, &flags);
      		running = task_running(rq, p);
      		array = p->array;
      		task_rq_unlock(rq, &flags);
      
      		/* Check them.. */
      		if (unlikely(running)) {
      			cpu_relax();
      			goto repeat;
      		}
      
      		/* Preempted away? Yield if so.. */
      		if (unlikely(array)) {
      			yield();
      			goto repeat;
      		}
      
      Basically, that first "while()" loop is done entirely without any
      locking at all (and doesn't check for the case where the target process
      might have been preempted away), and so it's possibly "incorrect", but
      we don't really care.  Both the runqueue used, and the "task_running()"
      check might be the wrong tests, but they won't oops - they just mean
      that we could possibly get the wrong results due to lack of locking and
      exit the loop early in the case of a race condition.
      
      So once we've exited the loop, we then get the proper (and careful) rq
      lock, and check the running/runnable state _safely_.  And if it turns
      out that our quick-and-dirty and unsafe loop was wrong after all, we
      just go back and try it all again.
      
      (The patch also adds a lot of comments, which is the actual bulk of it
      all, to make it more obvious why we can do these things without holding
      the locks).
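As a rough userspace analogue of the pattern described above (not the kernel code itself), the sketch below waits without the lock and then re-checks under it. The names `struct fake_task` and `wait_until_inactive()` are hypothetical; a pthread mutex stands in for the runqueue lock and `sched_yield()` stands in for `cpu_relax()`.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a task whose state is guarded by a lock. */
struct fake_task {
	pthread_mutex_t lock;   /* plays the role of the runqueue lock */
	atomic_bool running;    /* plays the role of task_running() */
	atomic_bool queued;     /* plays the role of p->array != NULL */
};

/*
 * Wait until the task is neither running nor queued.
 * Returns how many times the lock had to be taken for the re-check.
 */
static int wait_until_inactive(struct fake_task *t)
{
	int locked_checks = 0;

	for (;;) {
		/* Unlocked, optimistic wait: the answer may be stale. */
		while (atomic_load_explicit(&t->running, memory_order_relaxed))
			sched_yield();

		/* Hold the lock only long enough to read a consistent snapshot. */
		pthread_mutex_lock(&t->lock);
		bool running = atomic_load(&t->running);
		bool queued = atomic_load(&t->queued);
		pthread_mutex_unlock(&t->lock);
		locked_checks++;

		if (!running && !queued)
			return locked_checks;

		/* The optimistic result was wrong or the task is still queued: retry. */
		sched_yield();
	}
}
```

The point of the design is that the lock is held only for the short snapshot, so another thread that needs the same lock is never starved by the waiter.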
      
      Thanks to Miklos for all the testing and tracking it down.
      Tested-by: Miklos Szeredi <miklos@szeredi.hu>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sched: fix SysRq-N (normalize RT tasks) · a0f98a1c
      Ingo Molnar authored
      Gene Heskett reported the following problem while testing CFS: SysRq-N
      is not always effective in normalizing tasks back to SCHED_OTHER.
      
      The reason for that turns out to be the following bug:
      
       - normalize_rt_tasks() uses for_each_process() to iterate through all
         tasks in the system.  The problem is, this method does not iterate
         through all tasks, it iterates through all thread groups.
      
      The proper mechanism to enumerate over all threads is to use a
      do_each_thread() + while_each_thread() loop.
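The difference between walking thread groups and walking every thread can be illustrated with a toy model. This is only a sketch: `struct toy_task` and the helper functions are hypothetical and do not use the kernel's task list or its iteration macros. A task is treated as a group leader when its `pid` equals its `tgid`.

```c
#include <assert.h>
#include <stddef.h>

enum toy_policy { TOY_SCHED_OTHER, TOY_SCHED_FIFO };

struct toy_task {
	int pid;
	int tgid;                /* equals pid for the thread-group leader */
	enum toy_policy policy;
};

/* Buggy variant: only visits thread-group leaders. */
static void normalize_leaders_only(struct toy_task *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (tasks[i].pid == tasks[i].tgid)
			tasks[i].policy = TOY_SCHED_OTHER;
}

/* Fixed variant: visits every thread in every group. */
static void normalize_every_thread(struct toy_task *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		tasks[i].policy = TOY_SCHED_OTHER;
}

/* Counts tasks that are still real-time after normalization. */
static size_t count_rt(const struct toy_task *tasks, size_t n)
{
	size_t rt = 0;

	for (size_t i = 0; i < n; i++)
		if (tasks[i].policy == TOY_SCHED_FIFO)
			rt++;
	return rt;
}
```

In this model, a non-leader thread that was made real-time stays real-time after the leader-only pass, which matches the reported symptom.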
      Reported-by: Gene Heskett <gene.heskett@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6 · 4cc21505
      Linus Torvalds authored
      * master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
        [SCSI] ESP: Don't forget to clear ESP_FLAG_RESETTING.
        [SCSI] fusion: fix for BZ 8426 - massive slowdown on SCSI CD/DVD drive
    • Fix signalfd interaction with thread-private signals · caec4e8d
      Benjamin Herrenschmidt authored
      Don't let signalfd dequeue private signals off other threads (in the
      case of things like SIGILL or SIGSEGV, trying to do so would result
      in undefined behaviour on who actually gets the signal, since they
      are force unblocked).
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Revert "futex_requeue_pi optimization" · bd197234
      Thomas Gleixner authored
      This reverts commit d0aa7a70.
      
      It not only introduced user space visible changes to the futex syscall,
      it is also non-functional and there is no way to fix it properly before
      the 2.6.22 release.
      
      The breakage report ( http://lkml.org/lkml/2007/5/12/17 ) went
      unanswered, and unfortunately it turned out that the concept is not
      feasible at all.  It violates the rtmutex semantics badly by introducing
      a virtual owner, which hacks around the coupling of the user-space
      pi_futex and the kernel internal rt_mutex representation.
      
      At the moment the only safe option is to remove it fully as it contains
      user-space visible changes to broken kernel code, which we do not want
      to expose in the 2.6.22 release.
      
      The patch reverts the original patch mostly 1:1, but contains a couple
      of trivial manual cleanups which were necessary because later patches
      touched the same area of code.
      
      Verified against the glibc tests and my own PI futex tests.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Ulrich Drepper <drepper@redhat.com>
      Cc: Pierre Peiffer <pierre.peiffer@bull.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sh: oops_enter()/oops_exit() in die(). · 55273982
      Paul Mundt authored
      As Russell helpfully pointed out on linux-arch:
      
      	http://marc.info/?l=linux-arch&m=118208089204630&w=2
      
      We were missing the oops_enter/exit() in the sh die() implementation.
      As we do support lockdep, it's beneficial to add these calls so lockdep
      properly disables itself in the die() case.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
    • sh: Fix restartable syscall arg5 clobbering. · 69a33147
      Kaz Kojima authored
      We use R0 as the 5th argument of syscall.  When the syscall restarts
      after signal handling, we should restore the old value of R0.
      The attached patch does it. Without this patch, I've experienced random
      failures in situations where signals are issued frequently.
      Signed-off-by: Kaz Kojima <kkojima@rr.iij4u.or.jp>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
  4. 17 Jun, 2007 1 commit
  5. 16 Jun, 2007 11 commits
    • shm: fix the filename of hugetlb sysv shared memory · 9d66586f
      Eric W. Biederman authored
      Some user space tools need to identify SYSV shared memory when examining
      /proc/<pid>/maps.  To do so they look for a block device with major zero, a
      dentry named SYSV<sysv key>, and having the minor of the internal sysv
      shared memory kernel mount.
      
      To help these tools and to make it easier for people just browsing
      /proc/<pid>/maps this patch modifies hugetlb sysv shared memory to use the
      SYSV<key> dentry naming convention.
      
      User space tools will still have to be aware that hugetlb sysv shared
      memory lives on a different internal kernel mount and so has a different
      block device minor number from the rest of sysv shared memory.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Albert Cahalan <acahalan@gmail.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: fix get_policy for stacked shared memory files · 22741925
      Adam Litke authored
      Here's another breakage as a result of shared memory stacked files :(
      
      The NUMA policy for a VMA is determined by checking the following (in the
      order given):
      
      1) vma->vm_ops->get_policy() (if defined)
      2) vma->vm_policy (if defined)
      3) task->mempolicy (if defined)
      4) Fall back to default_policy
      
      By switching to stacked files for shared memory, get_policy() is now always
      set to shm_get_policy which is a wrapper function.  This causes us to stop
      at step 1, which yields NULL for hugetlb instead of task->mempolicy which
      was the previous (and correct) result.
      
      This patch modifies the shm_get_policy() wrapper to maintain steps 1-3 for
      the wrapped vm_ops.
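The lookup order and the wrapper behaviour described above can be modelled in a few lines. This is a simplified sketch with hypothetical types (`struct toy_policy`, `struct toy_vma`) and a hypothetical `wrapper_get_policy()`; it is not the kernel's `shm_get_policy()`, and the task policy is passed in explicitly instead of being read from the current task.

```c
#include <assert.h>
#include <stddef.h>

struct toy_policy { const char *name; };
struct toy_vma;

typedef struct toy_policy *(*toy_get_policy_fn)(struct toy_vma *vma);

struct toy_vma {
	toy_get_policy_fn inner_get_policy; /* get_policy of the wrapped vm_ops, may be NULL */
	struct toy_policy *vm_policy;       /* per-VMA policy, may be NULL */
};

static struct toy_policy toy_default_policy = { "default" };

/*
 * Wrapper that preserves steps 1-3 for the wrapped vm_ops:
 * the inner get_policy() if present, else the VMA policy, else the task policy.
 */
static struct toy_policy *wrapper_get_policy(struct toy_vma *vma,
					      struct toy_policy *task_policy)
{
	if (vma->inner_get_policy)
		return vma->inner_get_policy(vma);
	if (vma->vm_policy)
		return vma->vm_policy;
	return task_policy;
}

/* Step 4: fall back to the default policy when nothing else is defined. */
static struct toy_policy *lookup_policy(struct toy_vma *vma,
					 struct toy_policy *task_policy)
{
	struct toy_policy *pol = wrapper_get_policy(vma, task_policy);

	return pol ? pol : &toy_default_policy;
}
```

A wrapper that unconditionally returned NULL when the inner `get_policy` is missing would skip the task policy, which is the failure the commit message describes for hugetlb.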
      
      (akpm: the refcounting of mempolicies is busted and this patch does nothing to
      improve it)
      Signed-off-by: Adam Litke <agl@us.ibm.com>
      Acked-by: William Irwin <bill.irwin@oracle.com>
      Cc: dean gaudet <dean@arctic.org>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • udf: fix possible leakage of blocks · 74584ae5
      Jan Kara authored
      We have to take care that when we call udf_discard_prealloc() from
      udf_clear_inode() we have to write inode ourselves afterwards (otherwise,
      some changes might be lost leading to leakage of blocks, use of free blocks
      or improperly aligned extents).
      
      Also udf_discard_prealloc() does two different things - it removes
      preallocated blocks and truncates the last extent to exactly match i_size.
      We move the latter functionality to udf_truncate_tail_extent(), call
      udf_discard_prealloc() when last reference to a file is dropped and call
      udf_truncate_tail_extent() when inode is being removed from inode cache
      (udf_clear_inode() call).
      
      We cannot call udf_truncate_tail_extent() earlier as subsequent open+write
      would find the last block of the file mapped and happily write to the end
      of it, although the last extent says it's shorter.
      
      [akpm@linux-foundation.org: Make checkpatch.pl happier]
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Eric Sandeen <sandeen@sandeen.net>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • SLUB: minimum alignment fixes · 4b356be0
      Christoph Lameter authored
      If ARCH_KMALLOC_MINALIGN is set to a value greater than 8 (SLUB's smallest
      kmalloc cache) then SLUB may generate duplicate slabs in sysfs (yes again)
      because the object size is padded to reach ARCH_KMALLOC_MINALIGN.  Thus the
      size of the small slabs is all the same.
      
      No arch sets ARCH_KMALLOC_MINALIGN larger than 8, though, except mips, which
      for some reason wants a 128 byte alignment.
      
      This patch increases the size of the smallest cache if
      ARCH_KMALLOC_MINALIGN is greater than 8.  In that case more and more of the
      smallest caches are disabled.
      
      If we do that then the count of the active general caches that is displayed
      on boot is not correct anymore since we may skip elements of the kmalloc
      array.  So count them separately.
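The effect of raising the smallest cache size can be sketched as follows. This is an illustrative model only: it assumes power-of-two cache sizes and a power-of-two minimum alignment, ignores any non-power-of-two caches, and the function names are hypothetical rather than taken from SLUB.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_DEFAULT_MIN_SIZE 8u

/* Smallest cache size once the minimum alignment is taken into account. */
static size_t toy_smallest_cache(size_t minalign)
{
	return minalign > TOY_DEFAULT_MIN_SIZE ? minalign : TOY_DEFAULT_MIN_SIZE;
}

/*
 * Count the caches that remain active, walking sizes from the smallest
 * usable cache up to max_size. Counting separately avoids assuming that
 * every slot of the size array is populated.
 */
static unsigned toy_count_active_caches(size_t minalign, size_t max_size)
{
	unsigned count = 0;

	for (size_t size = toy_smallest_cache(minalign); size <= max_size; size <<= 1)
		count++;
	return count;
}
```

With a 128 byte minimum alignment, the 8, 16, 32 and 64 byte caches disappear from this model, so a count derived from the array length would overstate the number of active caches.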
      
      This approach was tested by Havard yesterday.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Rework ptep_set_access_flags and fix sun4c · 8dab5241
      Some changes done a while ago to avoid pounding on ptep_set_access_flags and
      update_mmu_cache in some race situations break sun4c which requires
      update_mmu_cache() to always be called on minor faults.
      
      This patch reworks ptep_set_access_flags() semantics, implementations and
      callers so that it's now responsible for returning whether an update is
      necessary or not (basically whether the PTE actually changed).  This allows
      fixing the sparc implementation to always return 1 on sun4c.
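The reworked contract, in which the access-flags helper reports whether the caller needs to update the MMU cache, might look roughly like this in a toy model. All names here are hypothetical (`toy_pte_t`, `toy_set_access_flags()`, and so on), and a counter stands in for `update_mmu_cache()`.

```c
#include <assert.h>

typedef unsigned long toy_pte_t;

/* Generic flavour: write the new entry and report whether the PTE changed. */
static int toy_set_access_flags(toy_pte_t *ptep, toy_pte_t entry)
{
	int changed = (*ptep != entry);

	if (changed)
		*ptep = entry;
	return changed;
}

/* sun4c-like flavour: always ask the caller to refresh the MMU cache. */
static int toy_set_access_flags_always(toy_pte_t *ptep, toy_pte_t entry)
{
	*ptep = entry;
	return 1;
}

/* Counts simulated update_mmu_cache() calls. */
static unsigned toy_mmu_cache_updates;

/* Caller side: only refresh the MMU cache when the helper says so. */
static void toy_minor_fault(toy_pte_t *ptep, toy_pte_t entry,
			    int (*set_flags)(toy_pte_t *, toy_pte_t))
{
	if (set_flags(ptep, entry))
		toy_mmu_cache_updates++;
}
```

Putting the decision in the per-architecture helper lets most architectures skip redundant updates while an architecture with stricter needs can simply return 1 every time.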
      
      [akpm@linux-foundation.org: fixes, cleanups]
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Mark Fortescue <mark@mtfhpc.demon.co.uk>
      Acked-by: William Lee Irwin III <wli@holomorphy.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • random: fix output buffer folding · 679ce0ac
      Matt Mackall authored
      (As reported by linux@horizon.com)
      
      Folding is done to minimize the theoretical possibility of systematic
      weakness in the particular bits of the SHA1 hash output.  The result of
      this bug is that 16 out of 80 bits are un-folded.  Without a major new
      vulnerability being found in SHA1, this is harmless, but still worth
      fixing.
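For illustration, one way to fold a 160-bit hash (five 32-bit words) so that the first 80 bits depend on all of the input is sketched below. This is an assumption about the general technique, not a copy of the patched kernel code; `toy_rol32()` and `toy_fold_hash()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by s bits; s must be in the range 1..31. */
static uint32_t toy_rol32(uint32_t x, unsigned s)
{
	return (x << s) | (x >> (32 - s));
}

/*
 * Fold five 32-bit words in place so that w[0], w[1] and the low
 * 16 bits of w[2] (80 bits in total) mix in every input bit.
 */
static void toy_fold_hash(uint32_t w[5])
{
	w[0] ^= w[3];                  /* bits 0-31 absorb bits 96-127 */
	w[1] ^= w[4];                  /* bits 32-63 absorb bits 128-159 */
	w[2] ^= toy_rol32(w[2], 16);   /* low half of w[2] absorbs its high half */
}
```

If the last step were applied to the wrong word, the 16 output bits taken from `w[2]` would pass through unfolded, which is the kind of gap the commit message describes.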
      Signed-off-by: Matt Mackall <mpm@selenic.com>
      Cc: <linux@horizon.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • uml: kill x86_64 STACK_TOP_MAX · 39a27902
      Jeff Dike authored
      The x86_64 a.out.h got a definition of STACK_TOP_MAX, which interferes with
      the UML version.  So, just undef it like STACK_TOP.
      Signed-off-by: Jeff Dike <jdike@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • uml: remove PAGE_SIZE from libc code · c539ab73
      Jeff Dike authored
      Distros seem to be removing PAGE_SIZE from asm/page.h.  So, the libc side of
      UML should stop using it.
      
      I replace it with UM_KERN_PAGE_SIZE, which is defined to be the same as
      PAGE_SIZE on the kernel side of the house.  I could also use getpagesize(),
      but it's more important that UML have the same value of PAGE_SIZE everywhere.
      It's conceivable that it could be built with a larger PAGE_SIZE, and use of
      getpagesize() would break that badly.
      
      PAGE_MASK got the same treatment, as it is closely tied to PAGE_SIZE.
      Signed-off-by: Jeff Dike <jdike@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • spi doc updates · f5a9c77d
      David Brownell authored
      Update two points in the SPI interface documentation:
      
      - Update description of the "chip stays selected after message ends"
        mode.  In some cases it's required for correctness; it isn't just a
        performance tweak.  (Yes: to use this mode on multi-device busses, another
        programming interface will be needed.  One draft has been circulated
        already.)
      
      - Clarify spi_setup(), highlighting that callers must ensure that no
        requests are queued (can't change configuration except between I/Os), and
        that the device must be deselected when this returns (which is a key part
        of why it's called during device init).
      Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • md: fix bug in error handling during raid1 repair · ed456662
      Mike Accetta authored
      If raid1/repair (which reads all blocks and fixes any differences it finds)
      hits a read error, it doesn't reset the bio for writing before writing
      correct data back, so the read error isn't fixed, and the device probably
      gets a zero-length write which it might complain about.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • md: fix two raid10 bugs · af03b8e4
      NeilBrown authored
      1/ When resyncing a degraded raid10 which has more than 2 copies of each block,
        garbage can get synced on top of good data.
      
      2/ We round the wrong way in part of the device size calculation, which
        can cause confusion.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>