1. 20 Feb, 2009 6 commits
    • ftrace: break out modify loop immediately on detection of error · 4377245a
      Steven Rostedt authored
      Impact: added precaution on failure detection
      
      Break out of the modifying loop as soon as a failure is detected.
      This is just an added precaution found by code review and was not
      found by any bug chasing.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • ftrace: immediately stop code modification if failure is detected · 90c7ac49
      Steven Rostedt authored
      Impact: fix to prevent NMI lockup
      
      If the page fault handler produces a WARN_ON while text is being
      modified, and the system is set up to have a high frequency of NMIs,
      we can lock up the system on a failure to modify code.
      
      The modifying of code with NMIs allows all NMIs to modify the code
      if it is about to run. This prevents a modifier on one CPU from
      modifying code running in NMI context on another CPU. The modifying
      is done through stop_machine, so only NMIs must be considered.
      
      But if the write causes the page fault handler to produce a warning,
      the print can slow it down enough that as soon as it is done
      it will take another NMI before going back to the process context.
      The new NMI will perform the write again causing another print and
      this will hang the box.
      
      This patch turns off the writing as soon as a failure is detected
      and does not wait for it to be turned off by the process context.
      This will keep NMIs from getting stuck in this back and forth
      of print outs.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
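The "turn off writing as soon as a failure is detected" idea can be sketched in user space as below. This is a simplified illustration rather than the kernel code: the names `mod_code_write`, `mod_code_status`, `try_write`, `nmi_handler` and `simulate_nmis` are assumptions made for the example, and the failing write is simulated.

```c
/* Hypothetical state shared between the modifier and the NMI handlers. */
static int mod_code_write;   /* nonzero while NMIs should perform the write */
static int mod_code_status;  /* 0 on success, negative on failure */
static int write_attempts;   /* counts simulated writes, for illustration */

/* Simulated text write that always fails, standing in for a faulting write. */
static int try_write(void)
{
    write_attempts++;
    return -1;
}

/* What each NMI does in this sketch: perform the pending write and, on
 * failure, clear the flag right away so later NMIs do not retry. */
static void nmi_handler(void)
{
    if (!mod_code_write)
        return;
    mod_code_status = try_write();
    if (mod_code_status)
        mod_code_write = 0;  /* stop now; do not wait for process context */
}

/* Deliver n NMIs while a modification is pending; returns the write count. */
static int simulate_nmis(int n)
{
    mod_code_write = 1;
    mod_code_status = 0;
    write_attempts = 0;
    for (int i = 0; i < n; i++)
        nmi_handler();
    return write_attempts;
}
```

Calling `simulate_nmis(1000)` performs a single failing write. Without the early clearing of the flag, every one of the simulated NMIs would repeat the write (and, in the scenario described above, the warning print).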
    • ftrace, x86: make kernel text writable only for conversions · 16239630
      Steven Rostedt authored
      Impact: keep kernel text read only
      
      Because dynamic ftrace converts the calls to mcount into and out of
      nops at run time, we needed to always keep the kernel text writable.
      
      But this defeats the point of CONFIG_DEBUG_RODATA. This patch converts
      the kernel code to writable before ftrace modifies the text, and converts
      it back to read only afterward.
      
      The kernel text is converted to read/write, stop_machine is called to
      modify the code, then the kernel text is converted back to read only.
      
      The original version used SYSTEM_STATE to determine when it was OK
      or not to change the code to rw or ro. Andrew Morton pointed out that
      using SYSTEM_STATE is a bad idea since there is no guarantee to what
      its state will actually be.
      
      Instead, I moved the check into the set_kernel_text_* functions
      themselves, and use a local variable to determine when it is
      OK to change the kernel text RW permissions.
      
      [ Update: Ingo Molnar suggested moving the prototypes to cacheflush.h ]
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
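The sequence "make writable, modify, make read-only again", guarded by a local variable instead of a global system state, might look roughly like this. It is a user-space sketch under stated assumptions: `text_marked_readonly`, `text_writable`, `mark_text_readonly`, `set_text_rw`, `set_text_ro` and `patch_text` are hypothetical names, and page permissions are simulated with a boolean.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the real page-permission state. */
static bool text_marked_readonly;  /* set once the text was first protected */
static bool text_writable = true;  /* current simulated permission */

/* Called once in this sketch, when the text is first made read-only. */
static void mark_text_readonly(void)
{
    text_writable = false;
    text_marked_readonly = true;
}

/* Only touch permissions if the text was ever made read-only. */
static void set_text_rw(void)
{
    if (!text_marked_readonly)
        return;
    text_writable = true;
}

static void set_text_ro(void)
{
    if (!text_marked_readonly)
        return;
    text_writable = false;
}

/* Make writable, patch one byte, restore. Returns true if patched. */
static bool patch_text(unsigned char *text, unsigned char new_byte)
{
    bool ok;

    set_text_rw();
    ok = text_writable;
    if (ok)
        *text = new_byte;
    set_text_ro();
    return ok;
}
```

Before `mark_text_readonly()` has run, both toggles are no-ops and the text simply stays writable. Afterwards, the text is writable only for the duration of `patch_text()`.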
    • ftrace: allow archs to perform pre and post process for code modification · 000ab691
      Steven Rostedt authored
      This patch creates the weak functions: ftrace_arch_code_modify_prepare
      and ftrace_arch_code_modify_post_process that are called before and
      after the stop machine is called to modify the kernel text.
      
      If the arch needs to do pre or post processing, it only needs to define
      these functions.
      
      [ Update: Ingo Molnar suggested using the name ftrace_arch_code_modify_*
                over using ftrace_arch_modify_* ]
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
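A minimal sketch of the weak-function pattern, assuming a GCC- or Clang-compatible compiler. The two hook names come from the commit message; their `int` return type, the `modify_code` stand-in and the `run_code_modification` caller are assumptions made for illustration.

```c
/* Default hooks: do nothing and report success. An architecture that needs
 * pre or post processing would supply non-weak definitions with the same
 * names, which the linker then prefers over these. */
__attribute__((weak)) int ftrace_arch_code_modify_prepare(void)
{
    return 0;
}

__attribute__((weak)) int ftrace_arch_code_modify_post_process(void)
{
    return 0;
}

static int modifications_run;  /* hypothetical counter for illustration */

/* Stand-in for the stop_machine-based text modification. */
static void modify_code(void)
{
    modifications_run++;
}

/* Hypothetical caller: prepare, modify, post-process. */
static int run_code_modification(void)
{
    int ret = ftrace_arch_code_modify_prepare();

    if (ret)
        return ret;
    modify_code();
    return ftrace_arch_code_modify_post_process();
}
```

With only the weak defaults linked in, `run_code_modification()` returns 0 and runs the modification once. An architecture overrides the behaviour just by defining the hooks, with no change to the caller.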
    • x86: use the right protections for split-up pagetables · 07a66d7c
      Ingo Molnar authored
      Steven Rostedt found a bug where, in his modified kernel,
      ftrace was unable to modify the kernel text because the PMD
      itself had also been marked read-only in
      split_large_page().
      
      The fix, suggested by Linus, is to not try to 'clone' the
      reference protection of a huge-page, but to use the standard
      (and permissive) page protection bits of KERNPG_TABLE.
      
      The 'cloning' makes sense for the ptes but it's a confused and
      incorrect concept at the page table level - because the
      pagetable entry is a set of all ptes and hence cannot
      'clone' any single protection attribute - the ptes can be any
      mixture of protections.
      
      With the permissive KERNPG_TABLE, even if the pte protections
      get changed after this point (due to ftrace doing code-patching
      or other similar activities like kprobes), the resulting combined
      protections will still be correct and the pte's restrictive
      (or permissive) protections will control it.
      
      Also update the comment.
      
      This bug was there for a long time but has not caused visible
      problems before as it needs a rather large read-only area to
      trigger. Steve possibly hacked his kernel with some really
      large arrays or so. Anyway, the bug is definitely worth fixing.
      
      [ Huang Ying also experienced problems in this area when writing
        the EFI code, but the real bug in split_large_page() was not
        realized back then. ]
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Reported-by: Huang Ying <ying.huang@intel.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
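Why a permissive table-level entry is the safe choice can be seen in a simplified model where a write is allowed only if every level of the page-table walk permits it. The bit values and the names `DEMO_PRESENT`, `DEMO_RW` and `write_allowed` are illustrative assumptions, not the x86 encoding or a kernel API.

```c
#include <stdbool.h>

#define DEMO_PRESENT 0x1u  /* illustrative bit values only */
#define DEMO_RW      0x2u

/* A write is allowed only if both the table-level entry and the pte allow it. */
static bool write_allowed(unsigned table_entry_prot, unsigned pte_prot)
{
    unsigned combined = table_entry_prot & pte_prot;

    return (combined & DEMO_PRESENT) && (combined & DEMO_RW);
}
```

In this model a read-only table-level entry blocks writes even after a pte beneath it has been made writable, which matches the symptom described above. With a permissive table-level entry, each pte alone decides.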
    • x86, vmi: TSC going backwards check in vmi clocksource · 48ffc70b
      Alok N Kataria authored
      Impact: fix time warps under vmware
      
      Similar to the check for TSC going backwards in the TSC clocksource,
      we also need this check for VMI clocksource.
      Signed-off-by: Alok N Kataria <akataria@vmware.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: stable@kernel.org
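A "going backwards" check of this kind can be sketched as a clamp against the last value returned. The names `cycle_last` and `read_monotonic` are assumptions for the example, and the sketch updates `cycle_last` inside the read function purely to stay self-contained.

```c
#include <stdint.h>

static uint64_t cycle_last;  /* hypothetical: last value handed out */

/* Return the raw counter if it has not gone backwards; otherwise repeat
 * the last value so callers never observe time moving backwards. */
static uint64_t read_monotonic(uint64_t raw)
{
    uint64_t ret = raw >= cycle_last ? raw : cycle_last;

    cycle_last = ret;
    return ret;
}
```

Feeding the raw readings 100, 90, 150 in that order yields 100, 100, 150: the regression to 90 is masked instead of being reported as a time warp.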
  2. 19 Feb, 2009 23 commits
  3. 18 Feb, 2009 11 commits
    • inotify: fix GFP_KERNEL related deadlock · f04b30de
      Ingo Molnar authored
      Enhanced lockdep coverage of __GFP_NOFS turned up this new lockdep
      assert:
      
      [ 1093.677775]
      [ 1093.677781] =================================
      [ 1093.680031] [ INFO: inconsistent lock state ]
      [ 1093.680031] 2.6.29-rc5-tip-01504-gb49eca1-dirty #1
      [ 1093.680031] ---------------------------------
      [ 1093.680031] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
      [ 1093.680031] kswapd0/308 [HC0[0]:SC0[0]:HE1:SE1] takes:
      [ 1093.680031]  (&inode->inotify_mutex){+.+.?.}, at: [<c0205942>] inotify_inode_is_dead+0x20/0x80
      [ 1093.680031] {RECLAIM_FS-ON-W} state was registered at:
      [ 1093.680031]   [<c01696b9>] mark_held_locks+0x43/0x5b
      [ 1093.680031]   [<c016baa4>] lockdep_trace_alloc+0x6c/0x6e
      [ 1093.680031]   [<c01cf8b0>] kmem_cache_alloc+0x20/0x150
      [ 1093.680031]   [<c040d0ec>] idr_pre_get+0x27/0x6c
      [ 1093.680031]   [<c02056e3>] inotify_handle_get_wd+0x25/0xad
      [ 1093.680031]   [<c0205f43>] inotify_add_watch+0x7a/0x129
      [ 1093.680031]   [<c020679e>] sys_inotify_add_watch+0x20f/0x250
      [ 1093.680031]   [<c010389e>] sysenter_do_call+0x12/0x35
      [ 1093.680031]   [<ffffffff>] 0xffffffff
      [ 1093.680031] irq event stamp: 60417
      [ 1093.680031] hardirqs last  enabled at (60417): [<c018d5f5>] call_rcu+0x53/0x59
      [ 1093.680031] hardirqs last disabled at (60416): [<c018d5b9>] call_rcu+0x17/0x59
      [ 1093.680031] softirqs last  enabled at (59656): [<c0146229>] __do_softirq+0x157/0x16b
      [ 1093.680031] softirqs last disabled at (59651): [<c0106293>] do_softirq+0x74/0x15d
      [ 1093.680031]
      [ 1093.680031] other info that might help us debug this:
      [ 1093.680031] 2 locks held by kswapd0/308:
      [ 1093.680031]  #0:  (shrinker_rwsem){++++..}, at: [<c01b0502>] shrink_slab+0x36/0x189
      [ 1093.680031]  #1:  (&type->s_umount_key#4){+++++.}, at: [<c01e6d77>] shrink_dcache_memory+0x110/0x1fb
      [ 1093.680031]
      [ 1093.680031] stack backtrace:
      [ 1093.680031] Pid: 308, comm: kswapd0 Not tainted 2.6.29-rc5-tip-01504-gb49eca1-dirty #1
      [ 1093.680031] Call Trace:
      [ 1093.680031]  [<c016947a>] valid_state+0x12a/0x13d
      [ 1093.680031]  [<c016954e>] mark_lock+0xc1/0x1e9
      [ 1093.680031]  [<c016a5b4>] ? check_usage_forwards+0x0/0x3f
      [ 1093.680031]  [<c016ab74>] __lock_acquire+0x2c6/0xac8
      [ 1093.680031]  [<c01688d9>] ? register_lock_class+0x17/0x228
      [ 1093.680031]  [<c016b3d3>] lock_acquire+0x5d/0x7a
      [ 1093.680031]  [<c0205942>] ? inotify_inode_is_dead+0x20/0x80
      [ 1093.680031]  [<c08824c4>] __mutex_lock_common+0x3a/0x4cb
      [ 1093.680031]  [<c0205942>] ? inotify_inode_is_dead+0x20/0x80
      [ 1093.680031]  [<c08829ed>] mutex_lock_nested+0x2e/0x36
      [ 1093.680031]  [<c0205942>] ? inotify_inode_is_dead+0x20/0x80
      [ 1093.680031]  [<c0205942>] inotify_inode_is_dead+0x20/0x80
      [ 1093.680031]  [<c01e6672>] dentry_iput+0x90/0xc2
      [ 1093.680031]  [<c01e67a3>] d_kill+0x21/0x45
      [ 1093.680031]  [<c01e6a46>] __shrink_dcache_sb+0x27f/0x355
      [ 1093.680031]  [<c01e6dc5>] shrink_dcache_memory+0x15e/0x1fb
      [ 1093.680031]  [<c01b05ed>] shrink_slab+0x121/0x189
      [ 1093.680031]  [<c01b0d12>] kswapd+0x39f/0x561
      [ 1093.680031]  [<c01ae499>] ? isolate_pages_global+0x0/0x233
      [ 1093.680031]  [<c0157eae>] ? autoremove_wake_function+0x0/0x43
      [ 1093.680031]  [<c01b0973>] ? kswapd+0x0/0x561
      [ 1093.680031]  [<c0157daf>] kthread+0x41/0x82
      [ 1093.680031]  [<c0157d6e>] ? kthread+0x0/0x82
      [ 1093.680031]  [<c01043ab>] kernel_thread_helper+0x7/0x10
      
      inotify_handle_get_wd() does idr_pre_get(), which does a
      kmem_cache_alloc() with GFP_KERNEL, that is, without masking out
      __GFP_FS - and is hence deadlockable under extreme MM pressure.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: MinChan Kim <minchan.kim@gmail.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
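The hazard can be modelled with a few flag bits: an allocation that still allows filesystem reclaim, made while a filesystem-related lock is held, may re-enter code that wants the same lock. The `SK_` flag values and the helper names below are illustrative assumptions, not the kernel's definitions.

```c
#include <stdbool.h>

/* Illustrative flag bits; the kernel's actual values differ. */
#define SK_GFP_WAIT 0x1u
#define SK_GFP_IO   0x2u
#define SK_GFP_FS   0x4u

#define SK_GFP_KERNEL (SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS)
#define SK_GFP_NOFS   (SK_GFP_WAIT | SK_GFP_IO)

/* True if reclaim triggered by this allocation may call into the filesystem. */
static bool may_reenter_fs(unsigned gfp)
{
    return (gfp & SK_GFP_FS) != 0;
}

/* Deadlock is possible when FS reclaim is allowed while an FS lock is held. */
static bool alloc_can_deadlock(unsigned gfp, bool holding_fs_lock)
{
    return holding_fs_lock && may_reenter_fs(gfp);
}
```

In this model, allocating with `SK_GFP_KERNEL` under the lock is flagged, while `SK_GFP_NOFS` under the same lock is not.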
    • spi-gpio: sanitize MISO bitvalue · be50344e
      Michael Buesch authored
      gpio_get_value() returns 0 or nonzero, but getmiso() expects 0 or 1.
      Sanitize the value to a 0/1 boolean.
      Signed-off-by: Michael Buesch <mb@bu3sch.de>
      Acked-by: David Brownell <dbrownell@users.sourceforge.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
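Collapsing "0 or nonzero" to exactly 0 or 1 is commonly done with a double negation. The sketch below assumes a hypothetical `fake_gpio_get_value()` that returns the masked register bit, as some GPIO implementations do. `getmiso_sketch` is likewise an illustrative name.

```c
/* Hypothetical stand-in for gpio_get_value(): returns 0 or some nonzero value. */
static int fake_gpio_get_value(int raw_register, int bit)
{
    return raw_register & (1 << bit);
}

/* Normalise to a strict 0/1 value before it is shifted into a data word. */
static int getmiso_sketch(int raw_register, int bit)
{
    return !!fake_gpio_get_value(raw_register, bit);
}
```

For bit 5 of register value 0x20 the raw helper returns 0x20, whereas `getmiso_sketch()` returns 1. A caller that ORs the result into the low bit of a word only behaves correctly with the normalised value.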
    • Bernhard has moved · 97bef7dd
      Bernhard Walle authored
      Since I don't work for SUSE any more and the bwalle@suse.de address is
      invalid, correct it in the copyright headers and documentation.
      Signed-off-by: Bernhard Walle <bernhard.walle@gmx.de>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86: dell-laptop: depends on POWER_SUPPLY · 310d8c93
      Randy Dunlap authored
      Build breaks when DELL_LAPTOP=y and POWER_SUPPLY=m.  DELL_LAPTOP needs to
      depend on POWER_SUPPLY.
      
      dell-laptop.c:(.text+0x1ef3c4): undefined reference to `power_supply_is_system_supplied'
      dell-laptop.c:(.text+0x1ef45e): undefined reference to `power_supply_is_system_supplied'
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Len Brown <lenb@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vt: Declare PIO_CMAP/GIO_CMAP as compatible ioctls. · 2db69a93
      Bill Nottingham authored
      Otherwise, these don't work when called from 32-bit userspace on 64-bit
      kernels.
      
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fbdev/drm: fix Kconfig submenu mess in "Graphics support" · a1a5c3b9
      Krzysztof Helt authored
      Submenus of the graphics support "Support for frame buffer devices" and
      "Direct Rendering Manager (XFree86 4.1.0 and higher DRI support)" are
      broken in half after latest changes for Intel 915 mode setting support.
      
      The DRM subsection is broken because one option is put outside the choice
      section it depends on.
      
      The frame buffers part is then broken due to a circular dependency.  Fix
      this by making Intel frame buffers depend on CONFIG_INTEL_AGP.
      
      Kconfigs are broken by d2f59357
      ("drm/i915: select framebuffer support automatically").
      
      This is probably not the only way to fix this.
      Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Dave Airlie <airlied@linux.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • floppy: request and release only the ports we actually use · 5a74db06
      Philippe De Muyter authored
      The floppy driver requests an I/O port it doesn't need, and sometimes this
      causes a conflict with a motherboard device reported by PNPBIOS.
      
      This patch makes the floppy driver request and release only the ports it
      actually uses.  It also factors out the request/release stuff and the
      io-ports list so they're all in one place now.
      
      The current floppy driver uses only these ports:
      
          0x3f2 (FD_DOR)
          0x3f4 (FD_STATUS)
          0x3f5 (FD_DATA)
          0x3f7 (FD_DCR/FD_DIR)
      
      but it requests 0x3f2-0x3f5 and 0x3f7, which includes the unused port
      0x3f3.
      
      Some BIOSes report 0x3f3 as a motherboard resource.  The PNP system driver
      reserves that, which causes a conflict when the floppy driver requests
      0x3f2-0x3f5 later.
      
      Philippe reported that this conflict broke the floppy driver between
      2.6.11 and 2.6.22.  His PNPBIOS reports these devices:
      
          $ cat 00:07/id 00:07/resources	# motherboard device
          PNP0c02
          state = active
          io 0x80-0x80
          io 0x10-0x1f
          io 0x22-0x3f
          io 0x44-0x5f
          io 0x90-0x9f
          io 0xa2-0xbf
          io 0x3f0-0x3f1
          io 0x3f3-0x3f3
      
          $ cat 00:03/id 00:03/resources	# floppy device
          PNP0700
          state = active
          io 0x3f4-0x3f5
          io 0x3f2-0x3f2
      
      Reference:
          http://lkml.org/lkml/2009/1/31/162
      Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Signed-off-by: Philippe De Muyter <phdm@macqel.be>
      Reported-by: Philippe De Muyter <phdm@macqel.be>
      Tested-by: Philippe De Muyter <phdm@macqel.be>
      Cc: Adam M Belay <abelay@mit.edu>
      Cc: Robert Hancock <hancockrwd@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
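Keeping the port list in one table makes the difference between the old and new requests easy to check. The sketch below is an illustration under assumptions: `struct io_region`, the two tables and `regions_claim_port` are names chosen for the example, with offsets taken relative to a base of 0x3f0.

```c
#include <stdbool.h>
#include <stddef.h>

struct io_region {
    unsigned offset;  /* offset from the controller base */
    unsigned size;    /* number of consecutive ports */
};

/* Only the ports the driver actually uses: base+2, base+4..5, base+7. */
static const struct io_region used_regions[] = {
    { 2, 1 },  /* FD_DOR */
    { 4, 2 },  /* FD_STATUS, FD_DATA */
    { 7, 1 },  /* FD_DCR/FD_DIR */
};

/* The old request, which also covered the unused port at base+3. */
static const struct io_region old_regions[] = {
    { 2, 4 },  /* base+2 .. base+5 */
    { 7, 1 },
};

/* True if any region in the table covers the given absolute port. */
static bool regions_claim_port(const struct io_region *r, size_t n,
                               unsigned base, unsigned port)
{
    for (size_t i = 0; i < n; i++) {
        unsigned first = base + r[i].offset;

        if (port >= first && port < first + r[i].size)
            return true;
    }
    return false;
}
```

With base 0x3f0, the old table claims port 0x3f3 and would collide with a motherboard resource at that address, while the reduced table does not.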
    • jsm: additional device support · ffa7525c
      Adam Lackorzynski authored
      I have a Digi Neo 8 PCI card (114f:00b1) Serial controller: Digi
      International Digi Neo 8 (rev 05)
      
      that works with the jsm driver after using the following patch.
      Signed-off-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de>
      Cc: Scott H Kilau <Scott_Kilau@digi.com>
      Cc: Wendy Xiong <wendyx@us.ibm.com>
      Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix memmap init for handling memory hole · cc2559bc
      KAMEZAWA Hiroyuki authored
      Currently, early_pfn_in_nid(PFN, NID) may return false if PFN is a hole,
      in which case memmap initialization is not done for it. This caused
      trouble for sparc boot.
      
      To fix this, the PFN should be initialized and marked as PG_reserved.
      This patch changes early_pfn_in_nid() to return true if PFN is a hole.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reported-by: David Miller <davem@davemloft.net>
      Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
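The changed return value for holes can be sketched as follows. The memory layout in `pfn_to_nid_or_hole()` is invented for the example, and both function names are hypothetical. Only the rule "a hole counts as belonging to the node being initialised" is taken from the description above.

```c
#include <stdbool.h>

/* Hypothetical lookup: node id for a PFN, or -1 if the PFN lies in a hole. */
static int pfn_to_nid_or_hole(unsigned long pfn)
{
    if (pfn < 0x100)
        return 0;
    if (pfn >= 0x200 && pfn < 0x300)
        return 1;
    return -1;  /* hole */
}

/* A hole is reported as "in this node" so that its memmap entry still gets
 * initialised; only a PFN known to belong to another node is skipped. */
static bool early_pfn_in_nid_sketch(unsigned long pfn, int nid)
{
    int found = pfn_to_nid_or_hole(pfn);

    return found < 0 || found == nid;
}
```

In this layout, PFN 0x150 is a hole and is accepted for either node, whereas PFN 0x50 is accepted only for node 0.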
    • mm: clean up for early_pfn_to_nid() · f2dbcfa7
      KAMEZAWA Hiroyuki authored
      What's happening is that the assertion in mm/page_alloc.c:move_freepages()
      is triggering:
      
      	BUG_ON(page_zone(start_page) != page_zone(end_page));
      
      Once I knew this is what was happening, I added some annotations:
      
      	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
      		printk(KERN_ERR "move_freepages: Bogus zones: "
      		       "start_page[%p] end_page[%p] zone[%p]\n",
      		       start_page, end_page, zone);
      		printk(KERN_ERR "move_freepages: "
      		       "start_zone[%p] end_zone[%p]\n",
      		       page_zone(start_page), page_zone(end_page));
      		printk(KERN_ERR "move_freepages: "
      		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
      		       page_to_pfn(start_page), page_to_pfn(end_page));
      		printk(KERN_ERR "move_freepages: "
      		       "start_nid[%d] end_nid[%d]\n",
      		       page_to_nid(start_page), page_to_nid(end_page));
       ...
      
      And here's what I got:
      
      	move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00]
      	move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
      	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
      	move_freepages: start_nid[1] end_nid[0]
      
      My memory layout on this box is:
      
      [    0.000000] Zone PFN ranges:
      [    0.000000]   Normal   0x00000000 -> 0x0081ff5d
      [    0.000000] Movable zone start PFN for each node
      [    0.000000] early_node_map[8] active PFN ranges
      [    0.000000]     0: 0x00000000 -> 0x00020000
      [    0.000000]     1: 0x00800000 -> 0x0081f7ff
      [    0.000000]     1: 0x0081f800 -> 0x0081fe50
      [    0.000000]     1: 0x0081fed1 -> 0x0081fed8
      [    0.000000]     1: 0x0081feda -> 0x0081fedb
      [    0.000000]     1: 0x0081fedd -> 0x0081fee5
      [    0.000000]     1: 0x0081fee7 -> 0x0081ff51
      [    0.000000]     1: 0x0081ff59 -> 0x0081ff5d
      
      So it's a block move in that 0x81f600-->0x81f7ff region which triggers
      the problem.
      
      This patch:
      
      Declarations of early_pfn_to_nid() are scattered over per-arch include
      files, and it is hard to tell which declaration is used when. This
      makes the memmap-init fix harder than it needs to be.
      
      This patch moves all declarations to include/linux/mm.h
      
      After this,
        if !CONFIG_ARCH_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
           -> Use static definition in include/linux/mm.h
        else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
           -> Use generic definition in mm/page_alloc.c
        else
           -> per-arch back end function will be called.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reported-by: David Miller <davem@davemloft.net>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/super.c: add lockdep annotation to s_umount · ada723dc
      Peter Zijlstra authored
      Li Zefan said:
      
      Thread 1:
        for ((; ;))
        {
            mount -t cpuset xxx /mnt > /dev/null 2>&1
            cat /mnt/cpus > /dev/null 2>&1
            umount /mnt > /dev/null 2>&1
        }
      
      Thread 2:
        for ((; ;))
        {
            mount -t cpuset xxx /mnt > /dev/null 2>&1
            umount /mnt > /dev/null 2>&1
        }
      
      (Note: It is irrelevant which cgroup subsys is used.)
      
      After a while a lockdep warning showed up:
      
      =============================================
      [ INFO: possible recursive locking detected ]
      2.6.28 #479
      ---------------------------------------------
      mount/13554 is trying to acquire lock:
       (&type->s_umount_key#19){--..}, at: [<c049d888>] sget+0x5e/0x321
      
      but task is already holding lock:
       (&type->s_umount_key#19){--..}, at: [<c049da0c>] sget+0x1e2/0x321
      
      other info that might help us debug this:
      1 lock held by mount/13554:
       #0:  (&type->s_umount_key#19){--..}, at: [<c049da0c>] sget+0x1e2/0x321
      
      stack backtrace:
      Pid: 13554, comm: mount Not tainted 2.6.28-mc #479
      Call Trace:
       [<c044ad2e>] validate_chain+0x4c6/0xbbd
       [<c044ba9b>] __lock_acquire+0x676/0x700
       [<c044bb82>] lock_acquire+0x5d/0x7a
       [<c049d888>] ? sget+0x5e/0x321
       [<c061b9b8>] down_write+0x34/0x50
       [<c049d888>] ? sget+0x5e/0x321
       [<c049d888>] sget+0x5e/0x321
       [<c045a2e7>] ? cgroup_set_super+0x0/0x3e
       [<c045959f>] ? cgroup_test_super+0x0/0x2f
       [<c045bcea>] cgroup_get_sb+0x98/0x2e7
       [<c045cfb6>] cpuset_get_sb+0x4a/0x5f
       [<c049dfa4>] vfs_kern_mount+0x40/0x7b
       [<c049e02d>] do_kern_mount+0x37/0xbf
       [<c04af4a0>] do_mount+0x5c3/0x61a
       [<c04addd2>] ? copy_mount_options+0x2c/0x111
       [<c04af560>] sys_mount+0x69/0xa0
       [<c0403251>] sysenter_do_call+0x12/0x31
      
      The cause is that, after alloc_super() and the subsequent retry, an old
      entry is found in the fs_supers list, so grab_super(old) is called. Both
      functions take an s_umount lock:
      
      struct super_block *sget(...)
      {
      	...
      retry:
      	spin_lock(&sb_lock);
      	if (test) {
      		list_for_each_entry(old, &type->fs_supers, s_instances) {
      			if (!test(old, data))
      				continue;
      			if (!grab_super(old))  <--- 2nd: down_write(&old->s_umount);
      				goto retry;
      			if (s)
      				destroy_super(s);
      			return old;
      		}
      	}
      	if (!s) {
      		spin_unlock(&sb_lock);
      		s = alloc_super(type);   <--- 1st: down_write(&s->s_umount)
      		if (!s)
      			return ERR_PTR(-ENOMEM);
      		goto retry;
      	}
      	...
      }
      
      It seems like a false positive, and it seems that the VFS, not cgroup,
      needs to be fixed.
      
      Peter said:
      
      We can simply put the new s_umount instance in a different subclass;
      lockdep doesn't particularly care about subclass order.
      
      If there's any issue with the callers of sget() assuming the s_umount lock
      being of subclass 0, then there is another annotation we can use to fix
      that, but let's not bother with that if this is sufficient.
      
      Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12673
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: Li Zefan <lizf@cn.fujitsu.com>
      Reported-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Paul Menage <menage@google.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
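Why a subclass annotation silences this report can be shown with a toy checker that, loosely like lockdep, tracks held locks by class and subclass rather than by instance. Everything here (`struct held_lock`, `acquire`, `release_all`, the fixed-size table) is a hypothetical model for illustration, not lockdep's implementation.

```c
#include <stdbool.h>

#define MAX_HELD 8

struct held_lock {
    int class_id;
    int subclass;
};

static struct held_lock held[MAX_HELD];
static int nr_held;

/* Returns false, modelling a "possible recursive locking" report, when a
 * lock of the same class and subclass is already held. The toy model also
 * refuses when its table is full. */
static bool acquire(int class_id, int subclass)
{
    for (int i = 0; i < nr_held; i++)
        if (held[i].class_id == class_id && held[i].subclass == subclass)
            return false;
    if (nr_held == MAX_HELD)
        return false;
    held[nr_held++] = (struct held_lock){ class_id, subclass };
    return true;
}

static void release_all(void)
{
    nr_held = 0;
}
```

Taking two different instances of the same class with subclass 0 is reported in this model, just as in the trace above. Taking the newly allocated instance with subclass 1 and the old one with subclass 0 is accepted.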