1. 08 Feb, 2008 40 commits
    • tty: let architectures override the user/kernel macros. · aa7738a5
      Heiko Carstens authored
      Give architectures that support the new termios2 the possibility to override the
      user_termios_to_kernel_termios and kernel_termios_to_user_termios macros.  As
      soon as all architectures that use the generic variant have been converted, the
      ifdefs can go away again.  The architectures in question are avr32, frv, powerpc
      and s390.
      
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • intel-iommu: fault_reason index cleanup · d94afc6c
      mark gross authored
      Fix an off-by-one bug in the fault reason string reporting function, and
      clean up some of the code around this buglet.
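      
      A minimal sketch of a bounds-checked lookup of the kind described above.  It
      is not the driver's code: ex_fault_reason_string() and its parameters are
      hypothetical, and the last table entry is assumed to be a catch-all string.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: map a fault reason code to a string.
 * The table's last entry is assumed to be the catch-all ("Unknown").
 */
const char *ex_fault_reason_string(const char *const *table, size_t count,
				   size_t reason)
{
	size_t max_idx;

	assert(table != NULL && count > 0);
	max_idx = count - 1;	/* highest valid index, not the element count */

	/* Comparing against the element count with ">" would let
	 * reason == count through and read one slot past the table. */
	if (reason > max_idx)
		reason = max_idx;
	return table[reason];
}
```

      Deriving the highest valid index from the element count in one place keeps
      the bounds check and the table from drifting apart.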
      
      [akpm@linux-foundation.org: cleanup]
      Signed-off-by: mark gross <mgross@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • intel-iommu: PMEN support · f8bab735
      mark gross authored
      Add support for the protected memory enable (PMEN) bits by clearing them if
      they are set at startup time.  Some future boot loaders or firmware could
      leave these bits set after loading the kernel, and they need to be cleared
      if DMAs are going to happen effectively.
      Signed-off-by: mark gross <mgross@intel.com>
      Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: fix ->open'less usage due to ->proc_fops flip · 2d3a4e36
      Alexey Dobriyan authored
      Typical PDE creation code looks like:
      
      	pde = create_proc_entry("foo", 0, NULL);
      	if (pde)
      		pde->proc_fops = &foo_proc_fops;
      
      Notice that the PDE is created first, and only then is ->proc_fops set to its
      final value. This is a problem because right after creation
      a) the PDE is fully visible in /proc, and
      b) ->proc_fops are proc_file_operations, which do not have an ->open callback. So, it's
         possible to ->read without ->open (see one class of oopses below).
      
      The fix is a new API called proc_create(), which makes sure ->proc_fops are
      set up before gluing the PDE to the main tree. Typical new code looks like:
      
      	pde = proc_create("foo", 0, NULL, &foo_proc_fops);
      	if (!pde)
      		return -ENOMEM;
      
      Fix most networking users for a start.
      
      In the long run, create_proc_entry() for regular files will go.
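      
      The ordering that proc_create() enforces can be modelled in plain userspace C.
      This is only a sketch: ex_entry_create() and the ex_* structures are
      hypothetical, single-threaded stand-ins, not the procfs implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for file_operations, a PDE and its parent. */
struct ex_file_ops {
	int (*open)(void);
};

struct ex_entry {
	const char *name;
	const struct ex_file_ops *fops;
	struct ex_entry *next;
};

struct ex_dir {
	struct ex_entry *head;	/* entries reachable by lookups */
};

/*
 * Create an entry whose operations are final before it becomes visible.
 * Returns NULL on allocation failure.
 */
struct ex_entry *ex_entry_create(struct ex_dir *parent, const char *name,
				 const struct ex_file_ops *fops)
{
	struct ex_entry *e;

	assert(parent != NULL && fops != NULL);
	e = calloc(1, sizeof(*e));
	if (!e)
		return NULL;

	e->name = name;
	e->fops = fops;		/* step 1: set the final operations */
	e->next = parent->head;
	parent->head = e;	/* step 2: only now link into the tree */
	return e;
}
```

      With the old create-then-assign pattern the two steps run in the opposite
      order, so a lookup between them can see an entry with the wrong operations.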
      
      BUG: unable to handle kernel NULL pointer dereference at virtual address 00000024
      printing eip: c1188c1b *pdpt = 000000002929e001 *pde = 0000000000000000
      Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      last sysfs file: /sys/block/sda/sda1/dev
      Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
      
      Pid: 24679, comm: cat Not tainted (2.6.24-rc3-mm1 #2)
      EIP: 0060:[<c1188c1b>] EFLAGS: 00210002 CPU: 0
      EIP is at mutex_lock_nested+0x75/0x25d
      EAX: 000006fe EBX: fffffffb ECX: 00001000 EDX: e9340570
      ESI: 00000020 EDI: 00200246 EBP: e9340570 ESP: e8ea1ef8
       DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      Process cat (pid: 24679, ti=E8EA1000 task=E9340570 task.ti=E8EA1000)
      Stack: 00000000 c106f7ce e8ee05b4 00000000 00000001 458003d0 f6fb6f20 fffffffb
             00000000 c106f7aa 00001000 c106f7ce 08ae9000 f6db53f0 00000020 00200246
             00000000 00000002 00000000 00200246 00200246 e8ee05a0 fffffffb e8ee0550
      Call Trace:
       [<c106f7ce>] seq_read+0x24/0x28a
       [<c106f7aa>] seq_read+0x0/0x28a
       [<c106f7ce>] seq_read+0x24/0x28a
       [<c106f7aa>] seq_read+0x0/0x28a
       [<c10818b8>] proc_reg_read+0x60/0x73
       [<c1081858>] proc_reg_read+0x0/0x73
       [<c105a34f>] vfs_read+0x6c/0x8b
       [<c105a6f3>] sys_read+0x3c/0x63
       [<c10025f2>] sysenter_past_esp+0x5f/0xa5
       [<c10697a7>] destroy_inode+0x24/0x33
       =======================
      INFO: lockdep is turned off.
      Code: 75 21 68 e1 1a 19 c1 68 87 00 00 00 68 b8 e8 1f c1 68 25 73 1f c1 e8 84 06 e9 ff e8 52 b8 e7 ff 83 c4 10 9c 5f fa e8 28 89 ea ff <f0> fe 4e 04 79 0a f3 90 80 7e 04 00 7e f8 eb f0 39 76 34 74 33
      EIP: [<c1188c1b>] mutex_lock_nested+0x75/0x25d SS:ESP 0068:e8ea1ef8
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: fix the threaded /proc/self · c6caeb7c
      Eric W. Biederman authored
      Long ago, when the CLONE_THREAD support first went in, someone thought it
      would be wise to point /proc/self at /proc/<tgid> instead of /proc/<pid>.
      
      Given that /proc/<tgid> can return information about a very different task
      (if enough things have been unshared) than our current process, /proc/<tgid>
      seems blatantly wrong.  So far I have yet to think up an example where the
      current behavior would be advantageous, and I can see several places where
      it is seriously non-intuitive.
      
      We may be stuck with the current broken behavior for backwards
      compatibility reasons, but let's try fixing our ancient bug for the 2.6.25
      time frame and see if anyone screams.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: "Guillaume Chazarain" <guichaz@yahoo.fr>
      Cc: "Pavel Emelyanov" <xemul@openvz.org>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: proper pidns handling for /proc/self · 488e5bc4
      Eric W. Biederman authored
      Currently, if you access a /proc that is not mounted with your process's
      current pid namespace, /proc/self will point at a completely random task.
      
      This patch fixes /proc/self to point to the current process if it is
      available in the particular mount of /proc or to return -ENOENT if the
      current process is not visible.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: seqfile convert proc_pid_status to properly handle pid namespaces · df5f8314
      Eric W. Biederman authored
      Currently we possibly look up the pid in the wrong pid namespace.  So
      convert proc_pid_status to seq_file, which ensures the proper pid namespace is
      passed in.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: another build fix]
      [akpm@linux-foundation.org: s390 build fix]
      [akpm@linux-foundation.org: fix task_name() output]
      [akpm@linux-foundation.org: fix nommu build]
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Andrew Morgan <morgan@kernel.org>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • seqfile convert proc_pid_statm · a56d3fc7
      Eric W. Biederman authored
      This conversion is just for code cleanliness, uniformity, and general safety.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: rewrite do_task_stat to correctly handle pid namespaces. · ee992744
      Eric W. Biederman authored
      Currently (as pointed out by Oleg) do_task_stat has a race when calling
      task_pid_nr_ns with the task exiting.  In addition do_task_stat is not
      currently displaying information in the context of the pid namespace that
      mounted the /proc filesystem.  So "cut -d' ' -f 1 /proc/<pid>/stat" may not
      equal <pid>.
      
      This patch fixes the problem by converting to a single_open seq_file show
      method, getting the pid namespace from the filesystem superblock instead of
      current, and simply using the struct pid from the inode instead of
      attempting to get that same pid from the task.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: implement proc_single_file_operations · be614086
      Eric W. Biederman authored
      Currently many /proc/pid files use a crufty precursor to the current seq_file
      api, and they don't have direct access to the pid_namespace or the pid for
      which they are displaying data.
      
      So implement proc_single_file_operations to make the seq_file routines easy to
      use, and to give access to the full state of the pid we are displaying data
      for.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: detect duplicate names on registration · 94413d88
      Zhang Rui authored
      Print a warning if a PDE is registered with a name which already exists in
      the target directory.
      
      Bug report and a simple fix can be found here:
      http://bugzilla.kernel.org/show_bug.cgi?id=8798
      
      [\n fixlet and no undescriptive variable usage --adobriyan]
      [akpm@linux-foundation.org: make printk comprehensible]
      Signed-off-by: Zhang Rui <rui.zhang@intel.com>
      Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: remove useless check on symlink removal · fd2cbe48
      Alexey Dobriyan authored
      proc symlinks always have valid ->data containing the destination of the symlink.  No
      need to check it on removal -- proc_symlink() has already done it.
      Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: simplify function prototypes · 76df0c25
      Alexey Dobriyan authored
      Move code around so as to reduce the number of forward-declarations.
      Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: less LOCK operations during lookup · 4237e0d3
      Alexey Dobriyan authored
      Pseudo-code for lookup effectively is:
      
      	LOCK kernel
      	LOCK proc_subdir_lock
      		find PDE
      		UNLOCK proc_subdir_lock
      
      		get inode
      
      		LOCK proc_subdir_lock
      		goto unlock
      	UNLOCK proc_subdir_lock
      	UNLOCK kernel
      
      We can get rid of the LOCK/UNLOCK pair after getting the inode simply by jumping
      to unlock_kernel() directly.
      Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: remove MODULE_LICENSE · 5b3fe63b
      Alexey Dobriyan authored
      proc is not modular, so MODULE_LICENSE just expands to empty space.  proc
      without doubt remains GPLed.
      Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • namespaces: mark NET_NS with "depends on NAMESPACES" · cbdc7387
      Pavel Emelyanov authored
      There's already an option controlling the net namespaces cloning code, so make
      it work the same way as all the other namespaces' options.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • namespaces: cleanup the code managed with PID_NS option · 74bd59bb
      Pavel Emelyanov authored
      Just like with the user namespaces, move the namespace management code into
      a separate .c file and mark the (already existing) PID_NS option as "depends
      on NAMESPACES".
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • namespaces: cleanup the code managed with the USER_NS option · aee16ce7
      Pavel Emelyanov authored
      Make the user_namespace.o compilation depend on this option and move
      init_user_ns into the user.c file to make the kernel compile and work without
      namespaces support.  This makes the user namespace code organized similarly to
      the other namespaces'.
      
      Also mark the USER_NS option as "depends on NAMESPACES".
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • namespaces: move the IPC namespace under IPC_NS option · ae5e1b22
      Pavel Emelyanov authored
      Currently the IPC namespace management code is spread over the ipc/*.c files.
      I moved this code into ipc/namespace.c file which is compiled out when needed.
      
      The linux/ipc_namespace.h file is used to store the prototypes of the
      functions in namespace.c and the stubs for the NAMESPACES=n case.  This is done
      because the stub for copy_ipc_namespace requires knowledge of the
      CLONE_NEWIPC flag, which is in sched.h.  But the linux/ipc.h file itself is
      included into many .c files via the sys.h->sem.h sequence, so adding
      sched.h to it would make all these .c files depend on sched.h, which is not that
      good.  On the other hand, knowledge of the namespaces stuff is required
      in only 4 .c files.
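      
      The stub pattern described above can be sketched as follows.  The names are
      hypothetical (ex_copy_ipc_ns, EX_CONFIG_IPC_NS), errors are reported through
      an out-parameter rather than the kernel's pointer-encoded errors, and in a
      real header the stub would be a static inline function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Same value as CLONE_NEWIPC; renamed to keep the sketch standalone. */
#define EX_CLONE_NEWIPC 0x08000000UL

struct ex_ipc_ns {
	int refcount;
};

#ifdef EX_CONFIG_IPC_NS
/* Full implementation lives in the namespace .c file. */
struct ex_ipc_ns *ex_copy_ipc_ns(unsigned long flags, struct ex_ipc_ns *ns,
				 int *err);
#else
/*
 * Stub for the compiled-out case: sharing the existing namespace is
 * fine, but asking for a new one cannot be honoured.
 */
struct ex_ipc_ns *ex_copy_ipc_ns(unsigned long flags, struct ex_ipc_ns *ns,
				 int *err)
{
	assert(err != NULL);
	if (flags & EX_CLONE_NEWIPC) {
		*err = -EINVAL;
		return NULL;
	}
	*err = 0;
	return ns;
}
#endif
```

      Because the stub has to test the clone flag, whichever header holds it needs
      the flag's definition, which is the dependency the paragraph above describes.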
      
      Besides, this patch compiles out some auxiliary functions from the ipc/sem.c,
      msg.c and shm.c files.  It turned out that moving these functions into
      namespace.c is not that easy because they use many other calls and macros
      from the original files.  Moving them would make this patch complicated.  On
      the other hand, all these functions can be consolidated, so I will send a
      separate patch doing this a bit later.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • namespaces: move the UTS namespace under UTS_NS option · 58bfdd6d
      Pavel Emelyanov authored
      Currently all the namespace management code is in the kernel/utsname.c file,
      so just compile it out and make stubs in the appropriate header.
      
      The init namespace itself is in init/version.c and is in the kernel all the
      time.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • namespaces: add the NAMESPACES config option · c5289a69
      Pavel Emelyanov authored
      The option is selectable only if EMBEDDED is chosen.  When EMBEDDED is off,
      namespaces will be on.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: add locking for overcommit sysctl · a3d0c6aa
      Nishanth Aravamudan authored
      When I replaced hugetlb_dynamic_pool with nr_overcommit_hugepages I used
      proc_doulongvec_minmax() directly.  However, hugetlb.c's locking rules
      require that all counter modifications occur under the hugetlb_lock.  Add a
      callback into the hugetlb code similar to the one for nr_hugepages.  Grab
      the lock around the manipulation of nr_overcommit_hugepages in
      proc_doulongvec_minmax().
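      
      A userspace sketch of the locking rule, assuming the handler parses into a
      staging variable and then publishes the counter under the lock.  All ex_*
      names are hypothetical, and a C11 atomic_flag spinlock stands in for
      hugetlb_lock.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_flag ex_hugetlb_lock = ATOMIC_FLAG_INIT;
static unsigned long ex_sysctl_overcommit;	/* staging value from the parser */
static unsigned long ex_nr_overcommit;		/* counter protected by the lock */

void ex_spin_lock(atomic_flag *lock)
{
	/* Spin until the flag was previously clear, i.e. we own the lock. */
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;
}

void ex_spin_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Handler: the generic parser is modelled by the plain assignment. */
void ex_overcommit_handler(unsigned long parsed_value)
{
	ex_sysctl_overcommit = parsed_value;

	ex_spin_lock(&ex_hugetlb_lock);
	ex_nr_overcommit = ex_sysctl_overcommit;	/* modify under the lock */
	ex_spin_unlock(&ex_hugetlb_lock);
}

unsigned long ex_read_overcommit(void)
{
	unsigned long value;

	ex_spin_lock(&ex_hugetlb_lock);
	value = ex_nr_overcommit;
	ex_spin_unlock(&ex_hugetlb_lock);
	return value;
}
```

      Routing the write through a dedicated handler keeps every modification of
      the counter behind the same lock as the other hugetlb counters.
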
      Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
      Acked-by: Adam Litke <agl@us.ibm.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • inotify: fix check for one-shot watches before destroying them · ac74c00e
      Ulisses Furquim authored
      As the IN_ONESHOT bit is never set when an event is sent, we must check it
      in the watch's mask and not in the event's mask.
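      
      A small sketch of the corrected check.  The structure and function are
      hypothetical; the two constants mirror the values of IN_MODIFY and
      IN_ONESHOT but are renamed so the example does not need <sys/inotify.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_IN_MODIFY	0x00000002u
#define EX_IN_ONESHOT	0x80000000u

struct ex_watch {
	uint32_t mask;	/* what the user asked for, including IN_ONESHOT */
};

/*
 * Decide whether a watch must be destroyed after delivering an event.
 * The event mask only says what happened, so the one-shot bit has to be
 * read from the watch itself.
 */
bool ex_should_remove_watch(const struct ex_watch *watch, uint32_t event_mask)
{
	assert(watch != NULL);
	(void)event_mask;	/* testing this for the one-shot bit never matches */
	return (watch->mask & EX_IN_ONESHOT) != 0;
}
```
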
      Signed-off-by: Ulisses Furquim <ulissesf@gmail.com>
      Reported-by: "Clem Taylor" <clem.taylor@gmail.com>
      Tested-by: "Clem Taylor" <clem.taylor@gmail.com>
      Cc: Amy Griffis <amy.griffis@hp.com>
      Cc: Robert Love <rlove@google.com>
      Cc: John McCutchan <ttb@tentacle.dhs.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm · a4ffc0a0
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: (44 commits)
        dm raid1: report fault status
        dm raid1: handle read failures
        dm raid1: fix EIO after log failure
        dm raid1: handle recovery failures
        dm raid1: handle write failures
        dm snapshot: combine consecutive exceptions in memory
        dm: stripe enhanced status return
        dm: stripe trigger event on failure
        dm log: auto load modules
        dm: move deferred bio flushing to workqueue
        dm crypt: use async crypto
        dm crypt: prepare async callback fn
        dm crypt: add completion for async
        dm crypt: add async request mempool
        dm crypt: extract scatterlist processing
        dm crypt: tidy io ref counting
        dm crypt: introduce crypt_write_io_loop
        dm crypt: abstract crypt_write_done
        dm crypt: store sector mapping in dm_crypt_io
        dm crypt: move queue functions
        ...
    • Merge branch 'release' of git://lm-sensors.org/kernel/mhoffman/hwmon-2.6 · d7511ec8
      Linus Torvalds authored
      * 'release' of git://lm-sensors.org/kernel/mhoffman/hwmon-2.6: (59 commits)
        hwmon: (lm80) Add individual alarm files
        hwmon: (lm80) De-macro the sysfs callbacks
        hwmon: (lm80) Various cleanups
        hwmon: (w83627hf) Refactor beep enable handling
        hwmon: (w83627hf) Add individual alarm and beep files
        hwmon: (w83627hf) Enable VBAT monitoring
        hwmon: (w83627ehf) The W83627DHG has 8 VID pins
        hwmon: (asb100) Add individual alarm files
        hwmon: (asb100) De-macro the sysfs callbacks
        hwmon: (asb100) Various cleanups
        hwmon: VRM is not written to registers
        hwmon: (dme1737) fix Super-IO device ID override
        hwmon: (dme1737) fix divide-by-0
        hwmon: (abituguru3) Add AUX4 fan input for Abit IP35 Pro
        hwmon: Add support for Texas Instruments/Burr-Brown ADS7828
        hwmon: (adm9240) Add individual alarm files
        hwmon: (lm77) Add individual alarm files
        hwmon: Discard useless I2C driver IDs
        hwmon: (lm85) Make the pwmN_enable files writable
        hwmon: (lm85) Return standard values in pwmN_enable
        ...
    • Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6 · 0b61a2ba
      Linus Torvalds authored
      * 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: (62 commits)
        [XFS] add __init/__exit mark to specific init/cleanup functions
        [XFS] Fix oops in xfs_file_readdir()
        [XFS] kill xfs_root
        [XFS] keep i_nlink updated and use proper accessors
        [XFS] stop updating inode->i_blocks
        [XFS] Make xfs_ail_check check less by default
        [XFS] Move AIL pushing into it's own thread
        [XFS] use generic_permission
        [XFS] stop re-checking permissions in xfs_swapext
        [XFS] clean up xfs_swapext
        [XFS] remove permission check from xfs_change_file_space
        [XFS] prevent panic during log recovery due to bogus op_hdr length
        [XFS] Cleanup various fid related bits:
        [XFS] Fix xfs_lowbit64
        [XFS] Remove CFORK macros and use code directly in IFORK and DFORK macros.
        [XFS] kill superflous buffer locking (2nd attempt)
        [XFS] Use kernel-supplied "roundup_pow_of_two" for simplicity
        [XFS] Remove the BPCSHIFT and NB* based macros from XFS.
        [XFS] Remove bogus assert
        [XFS] optimize XFS_IS_REALTIME_INODE w/o realtime config
        ...
    • Convert SG from nopage to fault. · a13ff0bb
      Nick Piggin authored
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Douglas Gilbert <dougg@torque.net>
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'slub-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm · c00f08d7
      Linus Torvalds authored
      * 'slub-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm:
        SLUB: fix checkpatch warnings
        Use non atomic unlock
        SLUB: Support for performance statistics
        SLUB: Alternate fast paths using cmpxchg_local
        SLUB: Use unique end pointer for each slab page.
        SLUB: Deal with annoying gcc warning on kfree()
    • dm raid1: report fault status · af195ac8
      Jonathan Brassow authored
      This patch adds extra information to the mirror status output, so that
      it can be determined which device(s) have failed.  For each mirror device,
      a character is printed indicating the most severe error encountered.  The
      characters are:
       *    A => Alive - No failures
       *    D => Dead - A write failure occurred leaving mirror out-of-sync
       *    S => Sync - A synchronization failure occurred, mirror out-of-sync
       *    R => Read - A read failure occurred, mirror data unaffected
      This allows userspace to properly reconfigure the mirror set.
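      
      The status characters above could be derived as in this sketch.  The error
      bit names and the severity order (write, then sync, then read) are
      assumptions made for illustration, not taken from the patch.

```c
#include <assert.h>

/* Hypothetical per-device error bits recorded by the mirror. */
enum ex_mirror_error {
	EX_WRITE_ERROR = 1u << 0,
	EX_SYNC_ERROR  = 1u << 1,
	EX_READ_ERROR  = 1u << 2,
};

/* Return the character for the most severe error seen on one device. */
char ex_device_status_char(unsigned int error_bits)
{
	if (error_bits & EX_WRITE_ERROR)
		return 'D';	/* Dead: write failure, mirror out-of-sync */
	if (error_bits & EX_SYNC_ERROR)
		return 'S';	/* Sync: synchronization failure */
	if (error_bits & EX_READ_ERROR)
		return 'R';	/* Read: read failure, data unaffected */
	return 'A';		/* Alive: no failures */
}
```

      Checking the bits from most to least severe means a device with several
      recorded errors reports only the worst one.
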
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm raid1: handle read failures · 06386bbf
      Jonathan Brassow authored
      This patch gives the ability to respond-to/record device failures
      that happen during read operations.  It also adds the ability to
      read from mirror devices that are not the primary if they are
      in-sync.
      
      There are essentially two read paths in mirroring; the direct path
      and the queued path.  When a read request is mapped, if the region
      is 'in-sync' the direct path is taken; otherwise the queued path
      is taken.
      
      If the direct path is taken, we must record bio information so that
      if the read fails we can retry it.  We then discover the status of
      a direct read through mirror_end_io.  If the read has failed, we will
      mark the device from which the read was attempted as failed (so we
      don't try to read from it again), restore the bio and try again.
      
      If the queued path is taken, we discover the results of the read
      from 'read_callback'.  If the device failed, we will mark the device
      as failed and attempt the read again if there is another device
      where this region is known to be 'in-sync'.
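      
      The retry behaviour described above can be modelled as follows.  This is a
      simplified sketch with hypothetical names; the media_ok field exists only to
      simulate whether a read succeeds, and no bio handling is shown.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct ex_mirror {
	bool failed;	/* set once a read from this device has failed */
	bool in_sync;	/* region is known to be in-sync on this device */
	bool media_ok;	/* simulation only: would a read succeed? */
};

/* Pick the first device that has not failed and holds in-sync data. */
int ex_choose_mirror(const struct ex_mirror *m, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!m[i].failed && m[i].in_sync)
			return (int)i;
	return -1;
}

/* Returns the index of the device that served the read, or -EIO. */
int ex_read_with_retry(struct ex_mirror *m, size_t n)
{
	int idx;

	assert(m != NULL);
	while ((idx = ex_choose_mirror(m, n)) >= 0) {
		if (m[idx].media_ok)
			return idx;
		m[idx].failed = true;	/* never read from it again */
	}
	return -EIO;	/* no in-sync device left to try */
}
```

      Each failed attempt marks one device as failed, so the loop ends after at
      most one pass over the devices.
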
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm raid1: fix EIO after log failure · b80aa7a0
      Jonathan Brassow authored
      This patch adds the ability to requeue write I/O to
      core device-mapper when there is a log device failure.
      
      If a write to the log produces an error, the pending writes are
      put on the "failures" list.  Since the log is marked as failed,
      they will stay on the failures list until a suspend happens.
      
      Suspends come in two phases, presuspend and postsuspend.  We must
      make sure that all the writes on the failures list are requeued
      in the presuspend phase (a requirement of dm core).  This means
      that recovery must be complete (because writes may be delayed
      behind it) and the failures list must be requeued before we
      return from presuspend.
      
      The mechanisms to ensure recovery is complete (or stopped) were
      already in place, but needed to be moved from postsuspend to
      presuspend.  We rely on 'flush_workqueue' to ensure that the
      mirror thread is complete and, therefore, has requeued all writes
      in the failures list.
      
      Because we are using flush_workqueue, we must ensure that no
      additional 'queue_work' calls will produce additional I/O
      that we need to requeue (because once we return from
      presuspend, we are unable to do anything about it).  'queue_work'
      is called in response to the following functions:
      - complete_resync_work = NA, recovery is stopped
      - rh_dec (mirror_end_io) = NA, only calls 'queue_work' if it
                                 is ready to recover the region
                                 (recovery is stopped) or it needs
                                 to clear the region in the log*
                                 **this doesn't get called while
                                 suspending**
      - rh_recovery_end = NA, recovery is stopped
      - rh_recovery_start = NA, recovery is stopped
      - write_callback = 1) Writes w/o failures simply call
                         bio_endio -> mirror_end_io -> rh_dec
                         (see rh_dec above)
                         2) Writes with failures are put on
                         the failures list and queue_work is
                         called**
                         ** write_callbacks don't happen
                         during suspend **
      - do_failures = NA, 'queue_work' not called if suspending
      - add_mirror (initialization) = NA, only done on mirror creation
      - queue_bio = NA, 1) delayed I/O scheduled before flush_workqueue
                    is called.  2) No more I/Os are being issued.
                    3) Re-attempted READs can still be handled.
                    (Write completions are handled through rh_dec/
                    write_callback - mentioned above - and do not
                    use queue_bio.)
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      b80aa7a0
    • Jonathan Brassow's avatar
      dm raid1: handle recovery failures · 8f0205b7
      Jonathan Brassow authored
      This patch adds the calls to 'fail_mirror' if an error occurs during
      mirror recovery (aka resynchronization).  'fail_mirror' is responsible
      for recording the type of error by mirror device and ensuring an event
      gets raised for the purpose of notifying userspace.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      8f0205b7
    • Jonathan Brassow's avatar
      dm raid1: handle write failures · 72f4b314
      Jonathan Brassow authored
      This patch gives mirror the ability to handle device failures
      during normal write operations.
      
      The 'write_callback' function is called when a write completes.
      If all the writes failed or succeeded, we report failure or
      success respectively.  If some of the writes failed, we call
      fail_mirror, which increments the error count for the device, notes
      the type of error encountered (DM_RAID1_WRITE_ERROR), and
      selects a new primary (if necessary).  Note that the primary
      device can never change while the mirror is not in-sync (IOW,
      while recovery is happening.)  This means that the scenario
      where a failed write changes the primary and gives
      recovery_complete a chance to misread the primary never happens.
      The fact that the primary can change has necessitated the change
      to the default_mirror field.  We need to protect against reading
      garbage while the primary changes.  We then add the bio to a new
      list in the mirror set, 'failures'.  For every bio in the 'failures'
      list, we call a new function, '__bio_mark_nosync', where we mark
      the region 'not-in-sync' in the log and properly set the region
      state as RH_NOSYNC.  Userspace must also be notified of the
      failure.  This is done by 'raising an event' (dm_table_event()).
      If fail_mirror is called in process context the event can be raised
      right away.  If in interrupt context, the event is deferred to the
      kmirrord thread - which raises the event if 'event_waiting' is set.
      
      Backwards compatibility is maintained by ignoring errors if
      the DM_FEATURES_HANDLE_ERRORS flag is not present.
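
      The completion logic described above can be modelled in user space.
      The following is only a minimal sketch under stated assumptions, not
      the dm-raid1 code: every name ending in _sketch, the result enum, and
      the bitmask convention (bit i set means the write to mirror i failed)
      are hypothetical.  The sketch also assumes that an error is recorded
      for every device whose write failed, and it models "ignoring errors"
      as completing a partially failed write successfully.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of a completed mirrored write. */
enum write_result_sketch {
    COMPLETE_OK,         /* report success to the caller */
    COMPLETE_EIO,        /* every mirror failed: report failure */
    QUEUED_ON_FAILURES   /* partial failure: bio goes on the 'failures' list */
};

/* Hypothetical stand-in for the per-device state touched by fail_mirror. */
struct mirror_sketch {
    unsigned error_count;  /* incremented once per recorded failure */
    bool write_error;      /* models noting DM_RAID1_WRITE_ERROR */
};

static void fail_mirror_sketch(struct mirror_sketch *m)
{
    m->error_count++;
    m->write_error = true;
}

/*
 * error_bits: bit i is set if the write to mirror i failed.
 * handle_errors: models the DM_FEATURES_HANDLE_ERRORS flag being present.
 */
static enum write_result_sketch
write_callback_sketch(unsigned long error_bits, struct mirror_sketch *mirrors,
                      size_t nr_mirrors, bool handle_errors)
{
    size_t i, failed = 0;

    if (error_bits == 0)
        return COMPLETE_OK;

    for (i = 0; i < nr_mirrors; i++) {
        if (error_bits & (1UL << i)) {
            fail_mirror_sketch(&mirrors[i]);
            failed++;
        }
    }

    if (failed == nr_mirrors)
        return COMPLETE_EIO;

    /* Some mirrors still hold the data. */
    return handle_errors ? QUEUED_ON_FAILURES : COMPLETE_OK;
}
```

      With two mirrors, an error mask of 0x1 and the flag set, the sketch
      returns QUEUED_ON_FAILURES; the same mask without the flag returns
      COMPLETE_OK, which is how the sketch represents the old behaviour.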
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      72f4b314
    • Milan Broz's avatar
      dm snapshot: combine consecutive exceptions in memory · d74f81f8
      Milan Broz authored
      Provided sector_t is 64 bits, reduce the in-memory footprint of the
      snapshot exception table by the simple method of using unused bits of
      the chunk number to combine consecutive entries.
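
      The idea can be illustrated with a small user-space sketch.  It is
      not the dm-snapshot code: the names are hypothetical, the 56/8 bit
      split is an assumption chosen for illustration, and only appending
      to the end of a run is shown.  The high bits of the stored chunk
      value count how many further consecutive chunks the entry covers.

```c
#include <stdint.h>

/* Assumed layout: low 56 bits = chunk number, high 8 bits = extra chunks. */
#define SKETCH_NUMBER_BITS 56
#define SKETCH_NUMBER_MASK ((UINT64_C(1) << SKETCH_NUMBER_BITS) - 1)
#define SKETCH_MAX_CONSECUTIVE 255u

/* Hypothetical exception entry mapping a run of old chunks to new chunks. */
struct exception_sketch {
    uint64_t old_chunk;  /* first chunk of the run on the origin */
    uint64_t new_chunk;  /* first new chunk, with the run length in the top bits */
};

/* Chunk number with the run-length bits masked off. */
static uint64_t sketch_chunk_number(uint64_t packed)
{
    return packed & SKETCH_NUMBER_MASK;
}

/* Number of additional consecutive chunks folded into this entry. */
static unsigned sketch_consecutive_count(const struct exception_sketch *e)
{
    return (unsigned)(e->new_chunk >> SKETCH_NUMBER_BITS);
}

/*
 * Try to fold the mapping old_chunk -> new_chunk into e.
 * Returns 1 if it directly follows the run on both sides and was merged,
 * 0 if a separate entry is needed.
 */
static int sketch_try_merge(struct exception_sketch *e,
                            uint64_t old_chunk, uint64_t new_chunk)
{
    unsigned count = sketch_consecutive_count(e);

    if (count == SKETCH_MAX_CONSECUTIVE)
        return 0;
    if (old_chunk != e->old_chunk + count + 1)
        return 0;
    if (new_chunk != sketch_chunk_number(e->new_chunk) + count + 1)
        return 0;

    e->new_chunk += UINT64_C(1) << SKETCH_NUMBER_BITS;
    return 1;
}
```

      Starting from an entry mapping chunk 100 to chunk 7, merging 101 -> 8
      and 102 -> 9 leaves a single entry with a consecutive count of 2,
      while 104 -> 11 is rejected because it does not extend the run.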
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      d74f81f8
    • Brian Wood's avatar
      dm: stripe enhanced status return · 4f7f5c67
      Brian Wood authored
      This patch adds additional information to the status line. It is added at the
      end of the returned text so it will not interfere with existing
      implementations using this data. The addition of this information will allow
      for a common return interface to match that returned with the dm-raid1.c
      status line (with Jonathan Brassow's patches).
      
      Here is a sample of what is returned with a mirror "status" call:
      isw_eeaaabgfg_mirror: 0 488390920 mirror 2 8:16 8:32 3727/3727 1 AA 1 core
      
      Here's what's returned with this patch for a stripe "status" call:
      isw_dheeijjdej_stripe: 0 976783872 striped 2 8:16 8:32 1 AA
      Signed-off-by: Brian Wood <brian.j.wood@intel.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      4f7f5c67
    • Brian Wood's avatar
      dm: stripe trigger event on failure · a25eb944
      Brian Wood authored
      This patch adds the stripe_end_io function to process errors that might
      occur after an IO operation. As part of this there are a number of
      enhancements made to record and trigger events:
      
      - New atomic variable in struct stripe to record the number of
      errors each stripe volume device has experienced (could be used
      later with uevents to report back directly to userspace)
      
      - New workqueue/work struct setup to process the trigger_event function
      
      - New end_io function. It is here that testing for BIO error conditions
      takes place. It determines the exact stripe that caused the error,
      records this in the new atomic variable, and calls the queue_work() function
      
      - New trigger_event function to process failure events. This
      calls dm_table_event()
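
      The flow listed above (count the error, defer the notification,
      raise the event later) can be sketched in user space with C11
      atomics.  This is an illustration only: all names are hypothetical,
      a pending flag stands in for queue_work(), a plain counter stands in
      for dm_table_event(), and the failing stripe index is passed in
      rather than derived from the bio.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_STRIPES 8

/* Hypothetical model of the per-target stripe state. */
struct stripe_set_sketch {
    size_t nr_stripes;
    atomic_uint error_count[SKETCH_MAX_STRIPES]; /* errors seen per stripe device */
    atomic_bool event_pending;                   /* stands in for queue_work() */
    unsigned events_raised;                      /* stands in for dm_table_event() */
};

/* Completion path: record the failure and schedule the deferred event. */
static void sketch_end_io(struct stripe_set_sketch *sc, size_t stripe, int error)
{
    if (!error || stripe >= sc->nr_stripes)
        return;

    atomic_fetch_add(&sc->error_count[stripe], 1);
    atomic_store(&sc->event_pending, true);
}

/* Deferred work: raise one event if any failure was recorded since last run. */
static void sketch_trigger_event(struct stripe_set_sketch *sc)
{
    if (atomic_exchange(&sc->event_pending, false))
        sc->events_raised++;
}
```

      Deferring the notification keeps the completion path short: several
      failures recorded before the deferred function runs produce a single
      event in this sketch, while the per-stripe counters keep the detail.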
      Signed-off-by: Brian Wood <brian.j.wood@intel.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      a25eb944
    • Jonathan Brassow's avatar
      dm log: auto load modules · fb8b2848
      Jonathan Brassow authored
      If the log type is not recognised, attempt to load the module
      'dm-log-<type>.ko'.
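
      A lookup-then-load-then-retry pattern of this kind might look like
      the following user-space sketch.  Everything here is hypothetical:
      the type table, the loader callback and the fake loader are
      illustrative stand-ins, and the sketch assumes the loader is given
      the module name without the .ko file suffix.

```c
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_TYPES 8

/* Hypothetical registry of known log types. */
static const char *sketch_types[SKETCH_MAX_TYPES] = { "core", "disk" };
static size_t sketch_nr_types = 2;

typedef int (*sketch_loader_fn)(const char *module_name);

static int sketch_is_registered(const char *type)
{
    size_t i;

    for (i = 0; i < sketch_nr_types; i++)
        if (strcmp(sketch_types[i], type) == 0)
            return 1;
    return 0;
}

static int sketch_register(const char *type)
{
    if (sketch_nr_types == SKETCH_MAX_TYPES)
        return -1;
    sketch_types[sketch_nr_types++] = type;
    return 0;
}

/* Build "dm-log-<type>"; returns 0 on success, -1 if it does not fit. */
static int sketch_log_module_name(const char *type, char *buf, size_t len)
{
    int n = snprintf(buf, len, "dm-log-%s", type);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Look the type up; if unknown, try loading its module once and retry. */
static int sketch_get_log_type(const char *type, sketch_loader_fn load_module)
{
    char name[64];

    if (sketch_is_registered(type))
        return 0;
    if (sketch_log_module_name(type, name, sizeof(name)) != 0)
        return -1;
    if (load_module(name) != 0)
        return -1;
    return sketch_is_registered(type) ? 0 : -1;
}

/* Fake loader for demonstration: only "dm-log-clustered" can be loaded. */
static int sketch_fake_load(const char *module_name)
{
    if (strcmp(module_name, "dm-log-clustered") != 0)
        return -1;
    return sketch_register("clustered");
}
```

      The second lookup after loading matters: a module that loads but
      does not register the requested type is still treated as a failure.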
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      fb8b2848
    • Milan Broz's avatar
      dm: move deferred bio flushing to workqueue · 304f3f6a
      Milan Broz authored
      Add a single-thread workqueue for each mapped device
      and move flushing of the lists of pushback and deferred bios
      to this new workqueue.
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      304f3f6a
    • Milan Broz's avatar
      dm crypt: use async crypto · 3a7f6c99
      Milan Broz authored
      dm-crypt: Use crypto ablkcipher interface
      
      Move encrypt/decrypt core to async crypto call.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      3a7f6c99
    • Milan Broz's avatar
      dm crypt: prepare async callback fn · 95497a96
      Milan Broz authored
      dm-crypt: Use crypto ablkcipher interface
      
      Prepare callback function for async crypto operation.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      95497a96