1. 02 Mar, 2007 29 commits
  2. 01 Mar, 2007 11 commits
    • [PATCH] VM: invalidate_inode_pages2_range() should not exit early · 7b965e08
      Trond Myklebust authored
      Fix invalidate_inode_pages2_range() so that it does not immediately exit
      just because a single page in the specified range could not be removed.
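A minimal user-space sketch of the control-flow change described above, not the kernel code itself. The `page_slot` type and the `try_remove()` helper are hypothetical stand-ins; the point is only that a failure on one page is recorded while the walk continues over the rest of the range.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached page; `busy` means it cannot be removed. */
struct page_slot {
	int busy;
	int present;
};

/* Hypothetical helper: returns 0 on success, -EIO if the page cannot be removed. */
static int try_remove(struct page_slot *p)
{
	if (p->busy)
		return -EIO;
	p->present = 0;
	return 0;
}

/*
 * Walk the whole range [start, end].  A failure on one page is remembered
 * (the first error wins) but does not stop the walk.
 */
static int invalidate_range_sketch(struct page_slot *pages, size_t start, size_t end)
{
	int ret = 0;
	size_t i;

	for (i = start; i <= end; i++) {
		int err = try_remove(&pages[i]);

		if (err && !ret)
			ret = err;
	}
	return ret;
}
```

With one busy page in the middle of a three-page range, the sketch still removes the pages on either side and reports the error once at the end.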
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] tty_io: fix race in master pty close/slave pty close path · 5a39e8c6
      Aristeu Sergio Rozanski Filho authored
      This patch fixes a possible race that leads to double freeing an idr index.
      When the master begins to close, release_dev() is called and then
      pty_close() is called:
      
              if (tty->driver->close)
                      tty->driver->close(tty, filp);
      
      This is done without holding any locks other than the BKL.  Inside
      pty_close(), being a master close, the devpts entry will be removed:
      
      #ifdef CONFIG_UNIX98_PTYS
                      if (tty->driver == ptm_driver)
                              devpts_pty_kill(tty->index);
      #endif
      
      But devpts_pty_kill() will call get_node(), which may sleep while waiting for
      &devpts_root->d_inode->i_sem.  When this happens while the slave is being
      opened, tty_open() has just found the driver and index:
      
              driver = get_tty_driver(device, &index);
              if (!driver) {
                      mutex_unlock(&tty_mutex);
                      return -ENODEV;
              }
      
      This part of the code is already protected by tty_mutex.  The problem is
      that the slave open has already got an index.  Then init_dev() is called and
      blocks waiting for the same &devpts_root->d_inode->i_sem.
      
      When the master close resumes, it removes the devpts entry, and the
      relation between idr index and the tty is gone.  The master then sleeps
      waiting for the tty_mutex on release_dev().
      
      Slave open resumes and finds no tty for that index.  As a result, a NULL tty
      is returned and init_dev() does not jump to fast_track:
      
              /* check whether we're reopening an existing tty */
              if (driver->flags & TTY_DRIVER_DEVPTS_MEM) {
                      tty = devpts_get_tty(idx);
                      if (tty && driver->subtype == PTY_TYPE_MASTER)
                              tty = tty->link;
              } else {
                      tty = driver->ttys[idx];
              }
              if (tty) goto fast_track;
      
      The result is that a new tty is created and init_dev() returns
      successfully.  After returning, tty_mutex is dropped and master close may resume.
      
      Master close finds it's the only use and both sides are closing, then releases
      the tty and the index. At this point, the idr index is free, but slave still
      has it.
      
      Slave open then calls pty_open(), finds that tty->link->count is 0 because
      there is no master, and returns an error.  Then tty_open() calls
      release_dev() which executes without any warning, as it was a case of last
      slave close when the master is already closed (master->count == 0,
      slave->count == 1).  The tty is then released with the already released idr
      index.
      
      This normally would only issue a warning on idr_remove() but in case of a
      customer's critical application, it's never too simple:
      
      thread1: opens master, gets index X
      thread1: begin closing master
      thread2: begin opening slave with index X
      thread1: finishes closing master, index X released
      thread3: opens master, gets index X, just released
      thread2: fails opening slave, releases index X         <----
      thread4: opens master, gets index X, init_dev() then finds an already in use
               and healthy tty and fails
      
      If no more indexes are released, ptmx_open() will keep failing, as the
      first free index available is X, and it will make init_dev() fail because
      you're trying to "reopen a master" which isn't valid.
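The interleaving above can be replayed in a small user-space model. This is only a sketch: the `in_use` array is a hypothetical stand-in for the idr, and `idx_alloc()`, `idx_release()` and `replay_interleaving()` are illustrative names, not kernel functions.

```c
#define MAX_IDX 8

/* Hypothetical stand-in for the idr: in_use[i] != 0 means index i is allocated. */
static int in_use[MAX_IDX];

/* Hand out the lowest free index, or -1 if none is free. */
static int idx_alloc(void)
{
	int i;

	for (i = 0; i < MAX_IDX; i++) {
		if (!in_use[i]) {
			in_use[i] = 1;
			return i;
		}
	}
	return -1;
}

/*
 * Release an index.  Returns -1 when the index is out of range or was not
 * allocated, which is the situation the idr_remove() warning reports.
 */
static int idx_release(int idx)
{
	if (idx < 0 || idx >= MAX_IDX || !in_use[idx])
		return -1;
	in_use[idx] = 0;
	return 0;
}

/*
 * Replay the interleaving from the commit message.  Returns 1 if thread4
 * is handed an index that thread3 still owns.
 */
static int replay_interleaving(void)
{
	int x, t3, t4;

	x = idx_alloc();	/* thread1: opens master, gets index X */
	idx_release(x);		/* thread1: finishes closing master */
	t3 = idx_alloc();	/* thread3: opens master, gets X again */
	idx_release(x);		/* thread2: failed slave open releases X */
	t4 = idx_alloc();	/* thread4: is handed X while thread3 uses it */

	return t3 == x && t4 == t3;
}
```

In this model the stale release by thread2 succeeds silently because thread3 has re-allocated the same index in the meantime, so no warning is produced and two owners end up sharing one index.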
      
      The patch notices when this race happens and makes init_dev() fail
      immediately.  The init_dev() function is called with tty_mutex held, so it's
      safe to continue with tty till the end of function because release_dev()
      won't make any further changes without grabbing the tty_mutex.
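A rough user-space model of the "fail immediately" idea, under stated assumptions: `devpts_table`, `tty_model`, `lookup_for_open()` and the choice of `-EIO` as the error code are all hypothetical and are not taken from the patch itself. The sketch only shows the shape of the check: a slave open that finds no tty for its index is rejected instead of creating a fresh tty.

```c
#include <errno.h>
#include <stddef.h>

#define NR_PTYS 4

struct tty_model {
	int index;
};

/*
 * Hypothetical stand-in for the devpts index -> tty relation.
 * A NULL entry means the master close has already removed it.
 */
static struct tty_model *devpts_table[NR_PTYS];

enum side { SIDE_MASTER, SIDE_SLAVE };

/*
 * Model of the lookup step on open.  Returns 0 with *out set to the
 * existing tty (or NULL when a master should get a fresh one), and
 * -EIO when a slave open finds no tty for its index.
 */
static int lookup_for_open(int idx, enum side side, struct tty_model **out)
{
	struct tty_model *tty = devpts_table[idx];

	*out = NULL;
	if (!tty && side == SIDE_SLAVE)
		return -EIO;	/* master already started closing: fail now */
	*out = tty;
	return 0;
}
```

Because the caller is assumed to hold the equivalent of tty_mutex for the whole lookup, the table cannot change between the check and the use of the result.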
      
      Without the patch, on some machines it is easy to get idr warnings
      like this one:
      
      idr_remove called for id=15 which is not allocated.
       [<c02555b9>] idr_remove+0x139/0x170
       [<c02a1b62>] release_mem+0x182/0x230
       [<c02a28e7>] release_dev+0x4b7/0x700
       [<c02a0ea7>] tty_ldisc_enable+0x27/0x30
       [<c02a1e64>] init_dev+0x254/0x580
       [<c02a0d64>] check_tty_count+0x14/0xb0
       [<c02a4f05>] tty_open+0x1c5/0x340
       [<c02a4d40>] tty_open+0x0/0x340
       [<c017388f>] chrdev_open+0xaf/0x180
       [<c017c2ac>] open_namei+0x8c/0x760
       [<c01737e0>] chrdev_open+0x0/0x180
       [<c0167bc9>] __dentry_open+0xc9/0x210
       [<c0167e2c>] do_filp_open+0x5c/0x70
       [<c0167a91>] get_unused_fd+0x61/0xd0
       [<c0167e93>] do_sys_open+0x53/0x100
       [<c0167f97>] sys_open+0x27/0x30
       [<c010303b>] syscall_call+0x7/0xb
      
      using this test application available on:
       http://www.ruivo.org/~aris/pty_sodomizer.c
      Signed-off-by: Aristeu Sergio Rozanski Filho <aris@ruivo.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Chuck Ebbert <cebbert@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] fix memory leak in dma_declare_coherent_memory() · 3a0ee2ce
      Yoichi Yuasa authored
      When it goes to free1_out, dev->dma_mem has not been freed.
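The general pattern behind this kind of fix can be sketched in user space. The types and the `declare_mem_sketch()` function below are hypothetical and use `calloc()`/`free()` in place of kernel allocators; only the label names mirror the commit message. The `fail_bitmap` parameter exists purely to exercise the error path.

```c
#include <stdlib.h>

struct coherent_mem_sketch {
	unsigned long *bitmap;
	size_t pages;
};

struct device_sketch {
	struct coherent_mem_sketch *dma_mem;
};

/*
 * Returns 0 on success, -1 on failure.  `fail_bitmap` forces the second
 * allocation to fail so the cleanup path can be observed.
 */
static int declare_mem_sketch(struct device_sketch *dev, size_t pages, int fail_bitmap)
{
	dev->dma_mem = calloc(1, sizeof(*dev->dma_mem));
	if (!dev->dma_mem)
		goto out;

	dev->dma_mem->bitmap = fail_bitmap ? NULL : calloc(pages, sizeof(unsigned long));
	if (!dev->dma_mem->bitmap)
		goto free1_out;

	dev->dma_mem->pages = pages;
	return 0;

free1_out:
	free(dev->dma_mem);	/* without this, the first allocation leaks */
	dev->dma_mem = NULL;
out:
	return -1;
}
```

Each label undoes exactly the allocations that succeeded before the jump, so a failure at the second step releases the first allocation and leaves no dangling pointer behind.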
      Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] Fix buffer overflow and races in capi debug functions · 17f0cd2f
      Karsten Keil authored
      The CAPI trace debug functions were using a fixed size buffer, which could
      overflow if malformed CAPI messages were sent to the kernel capi layer.
      The code was also not protected against multiple callers.  This fixes
      bug 8028.
      
      Additionally the patch makes the CAPI trace functions optional.
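As an illustration of the overflow class described here, the following user-space sketch appends formatted text to a fixed-size buffer without ever writing past its end. It is not the CAPI code: `dbg_buf` and `dbg_append()` are hypothetical names, and the sketch covers only the bounds problem, not the locking against multiple callers.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

struct dbg_buf {
	char data[64];	/* deliberately small fixed buffer */
	size_t len;	/* bytes used, excluding the trailing NUL */
	int truncated;	/* set once any output had to be cut short */
};

/*
 * Append formatted text, never writing past the end of the buffer.
 * Returns 0 on success, -1 if the output was truncated or formatting failed.
 */
static int dbg_append(struct dbg_buf *b, const char *fmt, ...)
{
	size_t room = sizeof(b->data) - b->len;	/* includes space for the NUL */
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(b->data + b->len, room, fmt, ap);
	va_end(ap);

	if (n < 0) {
		b->data[b->len] = '\0';	/* keep the buffer well-formed */
		b->truncated = 1;
		return -1;
	}
	if ((size_t)n >= room) {
		b->len = sizeof(b->data) - 1;	/* vsnprintf filled the buffer */
		b->truncated = 1;
		return -1;
	}
	b->len += (size_t)n;
	return 0;
}
```

The key design choice is that the remaining room is passed to `vsnprintf()` on every call, so an oversized message is cut off and flagged rather than overrunning the buffer.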
      Signed-off-by: Karsten Keil <kkeil@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] adapt page_lock_anon_vma() to PREEMPT_RCU · 34bbd704
      Oleg Nesterov authored
      page_lock_anon_vma() uses spin_lock() to block RCU.  This doesn't work with
      PREEMPT_RCU; we have to call rcu_read_lock() explicitly.  Otherwise, it is
      theoretically possible that slab returns anon_vma's memory to the system
      before we do spin_unlock(&anon_vma->lock).
      
      [ Hugh points out that this only matters for PREEMPT_RCU, which isn't merged
        yet, and may never be.  Regardless, this patch is conceptually the
        right thing to do, even if it doesn't matter at this point.  - Linus ]
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Acked-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] Documentation: CPU load calculation description · 48dba8ab
      Vassili Karpov authored
      Describes how/when the information exported to `/proc/stat' is calculated,
      and possible problems with this approach.
      Signed-off-by: Vassili Karpov <av1474@comtv.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] sched: fix SMT scheduler bug · 7355690e
      Ingo Molnar authored
      The SMT scheduler incorrectly skips kernel threads even if they are
      runnable (but they are preempted by a higher-prio user-space task which got
      SMT-delayed by an even higher-priority task running on a sibling CPU).
      
      Fix this for now by only doing the SMT-nice optimization if the
      to-be-delayed task is the only runnable task.  (This should cover most of
      the real-life cases anyway.)
      
      This bug has been in the SMT scheduler since 2.6.17 or so, but has only
      been noticed now by the active check in the dynticks code.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] ps3: introduce CONFIG_PS3_ADVANCED · 3f555c70
      Geert Uytterhoeven authored
      ps3: Introduce CONFIG_PS3_ADVANCED, as suggested by Roman Zippel, and use
      it to control questions about PS3 subsystems that may not be obvious for
      the casual user.
      
      This gets rid of the following warning on non-powerpc platforms:
      drivers/video/Kconfig:1604:warning: 'select' used by config symbol 'FB_PS3'
      refer to undefined symbol 'PS3_PS3AV'
      Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
      Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] ext[34]: EA block reference count racing fix · 8a2bfdcb
      Mingming Cao authored
      There are race issues around the ext[34] xattr block release code.

      ext[34]_xattr_release_block() checks the reference count of the xattr block
      (h_refcount) and frees the block if the caller holds the last reference to
      it.  Unlike ext2, the check of this counter is not protected by any lock.
      ext[34]_xattr_release_block() frees the mb_cache entry before freeing the
      xattr block, so there is a small window between the check for
      h_refcount == 1 and the call to mb_cache_entry_free().  During this window
      another inode might find the xattr block in the mbcache and reuse it,
      racing with the refcount update.  The first inode will then free the xattr
      block without noticing that another inode is still using it.  If that
      block is later reallocated as a data block for another file, more serious
      problems can follow.
      
      We need to take a lock around the places that check the refcount as well,
      to avoid this race.  Another place that needs this kind of protection is
      ext3_xattr_block_set(), which modifies the xattr block content on the fly
      if the refcount is 1 (meaning that this is the only inode referencing it).
      
      This also fixes another issue: the xattr block may never be freed if no
      lock protects the refcount check at release time.  The last two inodes
      could release the shared xattr block at the same time, each believing it
      is not the last one, so both only decrement h_refcount and neither frees
      the xattr block.
      
      We need to call lock_buffer() after ext3_journal_get_write_access() to
      avoid deadlock (because the latter calls lock_buffer()/unlock_buffer()
      as well).
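The locking rule described in this message can be sketched in user space. This is an illustration only: a pthread mutex stands in for the buffer lock, and `xattr_block_sketch` and `release_block_sketch()` are hypothetical names. The point is that the refcount check and the update happen under the same lock, so exactly one releaser sees the count at 1.

```c
#include <pthread.h>

struct xattr_block_sketch {
	pthread_mutex_t lock;	/* stands in for the buffer lock */
	unsigned int refcount;	/* stands in for h_refcount */
	int freed;		/* set when the block has been released */
};

/*
 * Drop one reference.  The check and the update are done under the same
 * lock, so two concurrent callers cannot both decide they are "not last".
 * Assumes refcount >= 1 on entry.  Returns 1 if this call freed the block.
 */
static int release_block_sketch(struct xattr_block_sketch *b)
{
	int freed_here = 0;

	pthread_mutex_lock(&b->lock);
	if (b->refcount == 1) {
		b->refcount = 0;
		b->freed = 1;
		freed_here = 1;
	} else {
		b->refcount--;
	}
	pthread_mutex_unlock(&b->lock);
	return freed_here;
}
```

Without the lock, two callers could both read a count of 2, both take the decrement branch, and leave the block allocated forever, which is the second issue the message describes.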
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Cc: Andreas Gruenbacher <agruen@suse.de>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] uml: pte_mkread fix · 1463fdbc
      Jeff Dike authored
      Fix the fact that pte_mkread sets _PAGE_RW instead of _PAGE_USER (the logic is
      copied from i386 in most places, so it is really as bad as you're thinking).

      Thus currently page tables are more permissive than they should be.
      
      Such a change may trigger other latent bugs, so be careful with this.
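The effect of setting the wrong bit can be shown with a tiny sketch. The bit values and the `SK_`-prefixed names below are illustrative only and are not the real UML or i386 definitions; `pte_mkread_buggy()` and `pte_mkread_fixed()` are hypothetical names for the before and after behaviour.

```c
/* Illustrative bit values only; not the real UML or i386 definitions. */
#define SK_PAGE_PRESENT 0x1UL
#define SK_PAGE_RW      0x2UL
#define SK_PAGE_USER    0x4UL

typedef unsigned long pte_sketch_t;

/* Buggy variant: "make readable" also grants write permission. */
static pte_sketch_t pte_mkread_buggy(pte_sketch_t pte)
{
	return pte | SK_PAGE_RW;
}

/* Fixed variant: "make readable" only sets the user bit. */
static pte_sketch_t pte_mkread_fixed(pte_sketch_t pte)
{
	return pte | SK_PAGE_USER;
}

/* True if the entry allows writes. */
static int pte_writable(pte_sketch_t pte)
{
	return (pte & SK_PAGE_RW) != 0;
}
```

Starting from an entry with only the present bit set, the buggy variant yields a writable page while the fixed variant does not, which is the "more permissive than they should be" effect.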
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Signed-off-by: Jeff Dike <jdike@addtoit.com>
      Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] uml: host VDSO fix · 14251809
      Jeff Dike authored
      This fixes a problem seen by a number of people running UML on newer host
      kernels.  init would hang with an infinite segfault loop.
      
      It turns out that the host kernel was providing an AT_SYSINFO_EHDR of
      0xffffe000, which faked UML into believing that the host VDSO page could be
      reused.  However, AT_SYSINFO pointed into the middle of the address space, and
      was unmapped as a result.  Because UML was providing AT_SYSINFO_EHDR and
      AT_SYSINFO to its own processes, these would branch to nowhere when trying to
      use the VDSO.
      
      The fix is to also check the location of AT_SYSINFO when deciding whether to
      use the host's VDSO.
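One plausible shape for such a check, sketched as a pure function. This is an assumption about the form of the test, not the code from the patch: `host_vdso_reusable()` and the `um_top` boundary (an address above which host mappings are assumed to remain visible to UML processes) are hypothetical.

```c
/*
 * Decide whether the host VDSO can be passed through to UML processes.
 * `um_top` is a hypothetical boundary: addresses at or above it are
 * assumed to stay mapped, addresses below it are assumed to be unmapped.
 */
static int host_vdso_reusable(unsigned long sysinfo_ehdr,
			      unsigned long sysinfo,
			      unsigned long um_top)
{
	if (!sysinfo_ehdr || !sysinfo)
		return 0;
	if (sysinfo_ehdr < um_top)
		return 0;
	/* The additional check: the entry point must be reachable too. */
	if (sysinfo < um_top)
		return 0;
	return 1;
}
```

With an AT_SYSINFO_EHDR of 0xffffe000 and an AT_SYSINFO in the middle of the address space, checking the header address alone would accept the VDSO, while checking both rejects it.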
      Signed-off-by: Jeff Dike <jdike@addtoit.com>
      Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>