1. 20 Feb, 2014 20 commits
    • Bojan Smojver's avatar
      PM / Hibernate: Hibernate/thaw fixes/improvements · b249f99c
      Bojan Smojver authored
      commit 5a21d489 upstream.
      
       1. Do not allocate memory for buffers from emergency pools, unless
          absolutely required. Do not warn about and do not retry non-essential
          failed allocations.
      
       2. Do not check the amount of free pages left on every single page
          write, but wait until one map is completely populated and then check.
      
       3. Set maximum number of pages for read buffering consistently, instead
          of inadvertently depending on the size of the sector type.
      
       4. Fix copyright line, which I missed when I submitted the hibernation
          threading patch.
      
       5. Dispense with bit shifting arithmetic to improve readability.
      
       6. Really recalculate the number of pages required to be free after all
          allocations have been done.
      
       7. Fix calculation of pages required for read buffering. Only count in
          pages that do not belong to high memory.
      Signed-off-by: default avatarBojan Smojver <bojan@rexursive.com>
      Signed-off-by: default avatarRafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b249f99c
    • Avi Kivity's avatar
      KVM: Fix buffer overflow in kvm_set_irq() · ec90b611
      Avi Kivity authored
      commit f2ebd422 upstream.
      
      kvm_set_irq() has an internal buffer of three irq routing entries, allowing
      connecting a GSI to three IRQ chips or on MSI.  However setup_routing_entry()
      does not properly enforce this, allowing three irqchip routes followed by
      an MSI route to overflow the buffer.
      
      Fix by ensuring that an MSI entry is added to an empty list.
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ec90b611
    • Nicholas Bellinger's avatar
      target/file: Re-enable optional fd_buffered_io=1 operation · 230a0c3b
      Nicholas Bellinger authored
      commit b32f4c7e upstream.
      
      This patch re-adds the ability to optionally run in buffered FILEIO mode
      (eg: w/o O_DSYNC) for device backends in order to once again use the
      Linux buffered cache as a write-back storage mechanism.
      
      This logic was originally dropped with mainline v3.5-rc commit:
      
      commit a4dff304
      Author: Nicholas Bellinger <nab@linux-iscsi.org>
      Date:   Wed May 30 16:25:41 2012 -0700
      
          target/file: Use O_DSYNC by default for FILEIO backends
      
      This difference with this patch is that fd_create_virtdevice() now
      forces the explicit setting of emulate_write_cache=1 when buffered FILEIO
      operation has been enabled.
      
      (v2: Switch to FDBD_HAS_BUFFERED_IO_WCE + add more detailed
           comment as requested by hch)
      Reported-by: default avatarFerry <iscsitmp@bananateam.nl>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      230a0c3b
    • Nicholas Bellinger's avatar
      target/file: Use O_DSYNC by default for FILEIO backends · 45a0374f
      Nicholas Bellinger authored
      commit a4dff304 upstream.
      
      Convert to use O_DSYNC for all cases at FILEIO backend creation time to
      avoid the extra syncing of pure timestamp updates with legacy O_SYNC during
      default operation as recommended by hch.  Continue to do this independently of
      Write Cache Enable (WCE) bit, as WCE=0 is currently the default for all backend
      devices and enabled by user on per device basis via attrib/emulate_write_cache.
      
      This patch drops the now unnecessary fd_buffered_io= token usage that was
      originally signalling when to explictly disable O_SYNC at backend creation
      time for buffered I/O operation.  This can end up being dangerous for a number
      of reasons during physical node failure, so go ahead and drop this option
      for now when O_DSYNC is used as the default.
      
      Also allow explict FUA WRITEs -> vfs_fsync_range() call to function in
      fd_execute_cmd() independently of WCE bit setting.
      Reported-by: default avatarChristoph Hellwig <hch@lst.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      [bwh: Backported to 3.2:
       - We have fd_do_task() and not fd_execute_cmd()
       - Various fields are in struct se_task rather than struct se_cmd
       - fd_create_virtdevice() flags initialisation hasn't been cleaned up]
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      45a0374f
    • Jan Kara's avatar
      IB/qib: Convert qib_user_sdma_pin_pages() to use get_user_pages_fast() · f7441069
      Jan Kara authored
      commit 603e7729 upstream.
      
      qib_user_sdma_queue_pkts() gets called with mmap_sem held for
      writing. Except for get_user_pages() deep down in
      qib_user_sdma_pin_pages() we don't seem to need mmap_sem at all.  Even
      more interestingly the function qib_user_sdma_queue_pkts() (and also
      qib_user_sdma_coalesce() called somewhat later) call copy_from_user()
      which can hit a page fault and we deadlock on trying to get mmap_sem
      when handling that fault.
      
      So just make qib_user_sdma_pin_pages() use get_user_pages_fast() and
      leave mmap_sem locking for mm.
      
      This deadlock has actually been observed in the wild when the node
      is under memory pressure.
      Reviewed-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      [Backported to 3.4: (Thank to Ben Hutchings)
       - Adjust context
       - Adjust indentation and nr_pages argument in qib_user_sdma_pin_pages()]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f7441069
    • Peter Zijlstra's avatar
      sched/nohz: Fix rq->cpu_load calculations some more · f5a4c4b7
      Peter Zijlstra authored
      commit 5aaa0b7a upstream.
      
      Follow up on commit 556061b0 ("sched/nohz: Fix rq->cpu_load[]
      calculations") since while that fixed the busy case it regressed the
      mostly idle case.
      
      Add a callback from the nohz exit to also age the rq->cpu_load[]
      array. This closes the hole where either there was no nohz load
      balance pass during the nohz, or there was a 'significant' amount of
      idle time between the last nohz balance and the nohz exit.
      
      So we'll update unconditionally from the tick to not insert any
      accidental 0 load periods while busy, and we try and catch up from
      nohz idle balance and nohz exit. Both these are still prone to missing
      a jiffy, but that has always been the case.
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: pjt@google.com
      Cc: Venkatesh Pallipadi <venki@google.com>
      Link: http://lkml.kernel.org/n/tip-kt0trz0apodbf84ucjfdbr1a@git.kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5a4c4b7
    • Peter Zijlstra's avatar
      sched/nohz: Fix rq->cpu_load[] calculations · e2d51f27
      Peter Zijlstra authored
      commit 556061b0 upstream.
      
      While investigating why the load-balancer did funny I found that the
      rq->cpu_load[] tables were completely screwy.. a bit more digging
      revealed that the updates that got through were missing ticks followed
      by a catchup of 2 ticks.
      
      The catchup assumes the cpu was idle during that time (since only nohz
      can cause missed ticks and the machine is idle etc..) this means that
      esp. the higher indices were significantly lower than they ought to
      be.
      
      The reason for this is that its not correct to compare against jiffies
      on every jiffy on any other cpu than the cpu that updates jiffies.
      
      This patch cludges around it by only doing the catch-up stuff from
      nohz_idle_balance() and doing the regular stuff unconditionally from
      the tick.
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: pjt@google.com
      Cc: Venkatesh Pallipadi <venki@google.com>
      Link: http://lkml.kernel.org/n/tip-tp4kj18xdd5aj4vvj0qg55s2@git.kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e2d51f27
    • Steven Rostedt's avatar
      ftrace: Have function graph only trace based on global_ops filters · 1c2bd0db
      Steven Rostedt authored
      commit 23a8e844 upstream.
      
      Doing some different tests, I discovered that function graph tracing, when
      filtered via the set_ftrace_filter and set_ftrace_notrace files, does
      not always keep with them if another function ftrace_ops is registered
      to trace functions.
      
      The reason is that function graph just happens to trace all functions
      that the function tracer enables. When there was only one user of
      function tracing, the function graph tracer did not need to worry about
      being called by functions that it did not want to trace. But now that there
      are other users, this becomes a problem.
      
      For example, one just needs to do the following:
      
       # cd /sys/kernel/debug/tracing
       # echo schedule > set_ftrace_filter
       # echo function_graph > current_tracer
       # cat trace
      [..]
       0)               |  schedule() {
       ------------------------------------------
       0)    <idle>-0    =>   rcu_pre-7
       ------------------------------------------
      
       0) ! 2980.314 us |  }
       0)               |  schedule() {
       ------------------------------------------
       0)   rcu_pre-7    =>    <idle>-0
       ------------------------------------------
      
       0) + 20.701 us   |  }
      
       # echo 1 > /proc/sys/kernel/stack_tracer_enabled
       # cat trace
      [..]
       1) + 20.825 us   |      }
       1) + 21.651 us   |    }
       1) + 30.924 us   |  } /* SyS_ioctl */
       1)               |  do_page_fault() {
       1)               |    __do_page_fault() {
       1)   0.274 us    |      down_read_trylock();
       1)   0.098 us    |      find_vma();
       1)               |      handle_mm_fault() {
       1)               |        _raw_spin_lock() {
       1)   0.102 us    |          preempt_count_add();
       1)   0.097 us    |          do_raw_spin_lock();
       1)   2.173 us    |        }
       1)               |        do_wp_page() {
       1)   0.079 us    |          vm_normal_page();
       1)   0.086 us    |          reuse_swap_page();
       1)   0.076 us    |          page_move_anon_rmap();
       1)               |          unlock_page() {
       1)   0.082 us    |            page_waitqueue();
       1)   0.086 us    |            __wake_up_bit();
       1)   1.801 us    |          }
       1)   0.075 us    |          ptep_set_access_flags();
       1)               |          _raw_spin_unlock() {
       1)   0.098 us    |            do_raw_spin_unlock();
       1)   0.105 us    |            preempt_count_sub();
       1)   1.884 us    |          }
       1)   9.149 us    |        }
       1) + 13.083 us   |      }
       1)   0.146 us    |      up_read();
      
      When the stack tracer was enabled, it enabled all functions to be traced, which
      now the function graph tracer also traces. This is a side effect that should
      not occur.
      
      To fix this a test is added when the function tracing is changed, as well as when
      the graph tracer is enabled, to see if anything other than the ftrace global_ops
      function tracer is enabled. If so, then the graph tracer calls a test trampoline
      that will look at the function that is being traced and compare it with the
      filters defined by the global_ops.
      
      As an optimization, if there's no other function tracers registered, or if
      the only registered function tracers also use the global ops, the function
      graph infrastructure will call the registered function graph callback directly
      and not go through the test trampoline.
      
      Fixes: d2d45c7a "tracing: Have stack_tracer use a separate list of functions"
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1c2bd0db
    • Steven Rostedt's avatar
      ftrace: Fix synchronization location disabling and freeing ftrace_ops · 29558665
      Steven Rostedt authored
      commit a4c35ed2 upstream.
      
      The synchronization needed after ftrace_ops are unregistered must happen
      after the callback is disabled from becing called by functions.
      
      The current location happens after the function is being removed from the
      internal lists, but not after the function callbacks were disabled, leaving
      the functions susceptible of being called after their callbacks are freed.
      
      This affects perf and any externel users of function tracing (LTTng and
      SystemTap).
      
      Fixes: cdbe61bf "ftrace: Allow dynamically allocated function tracers"
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      29558665
    • Steven Rostedt's avatar
      ftrace: Synchronize setting function_trace_op with ftrace_trace_function · 95bcd16e
      Steven Rostedt authored
      commit 405e1d83 upstream.
      
      [ Partial commit backported to 3.4. The ftrace_sync() code by this is
        required for other fixes that 3.4 needs. ]
      
      ftrace_trace_function is a variable that holds what function will be called
      directly by the assembly code (mcount). If just a single function is
      registered and it handles recursion itself, then the assembly will call that
      function directly without any helper function. It also passes in the
      ftrace_op that was registered with the callback. The ftrace_op to send is
      stored in the function_trace_op variable.
      
      The ftrace_trace_function and function_trace_op needs to be coordinated such
      that the called callback wont be called with the wrong ftrace_op, otherwise
      bad things can happen if it expected a different op. Luckily, there's no
      callback that doesn't use the helper functions that requires this. But
      there soon will be and this needs to be fixed.
      
      Use a set_function_trace_op to store the ftrace_op to set the
      function_trace_op to when it is safe to do so (during the update function
      within the breakpoint or stop machine calls). Or if dynamic ftrace is not
      being used (static tracing) then we have to do a bit more synchronization
      when the ftrace_trace_function is set as that takes affect immediately
      (as oppose to dynamic ftrace doing it with the modification of the trampoline).
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      95bcd16e
    • Mikulas Patocka's avatar
      dm sysfs: fix a module unload race · a7333f3d
      Mikulas Patocka authored
      commit 2995fa78 upstream.
      
      This reverts commit be35f486 ("dm: wait until embedded kobject is
      released before destroying a device") and provides an improved fix.
      
      The kobject release code that calls the completion must be placed in a
      non-module file, otherwise there is a module unload race (if the process
      calling dm_kobject_release is preempted and the DM module unloaded after
      the completion is triggered, but before dm_kobject_release returns).
      
      To fix this race, this patch moves the completion code to dm-builtin.c
      which is always compiled directly into the kernel if BLK_DEV_DM is
      selected.
      
      The patch introduces a new dm_kobject_holder structure, its purpose is
      to keep the completion and kobject in one place, so that it can be
      accessed from non-module code without the need to export the layout of
      struct mapped_device to that code.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a7333f3d
    • Xishi Qiu's avatar
      mm: setup pageblock_order before it's used by sparsemem · 3fea8b0a
      Xishi Qiu authored
      commit ca57df79 upstream.
      
      On architectures with CONFIG_HUGETLB_PAGE_SIZE_VARIABLE set, such as
      Itanium, pageblock_order is a variable with default value of 0.  It's set
      to the right value by set_pageblock_order() in function
      free_area_init_core().
      
      But pageblock_order may be used by sparse_init() before free_area_init_core()
      is called along path:
      sparse_init()
          ->sparse_early_usemaps_alloc_node()
      	->usemap_size()
      	    ->SECTION_BLOCKFLAGS_BITS
      		->((1UL << (PFN_SECTION_SHIFT - pageblock_order)) *
      NR_PAGEBLOCK_BITS)
      
      The uninitialized pageblock_size will cause memory wasting because
      usemap_size() returns a much bigger value then it's really needed.
      
      For example, on an Itanium platform,
      sparse_init() pageblock_order=0 usemap_size=24576
      free_area_init_core() before pageblock_order=0, usemap_size=24576
      free_area_init_core() after pageblock_order=12, usemap_size=8
      
      That means 24K memory has been wasted for each section, so fix it by calling
      set_pageblock_order() from sparse_init().
      Signed-off-by: default avatarXishi Qiu <qiuxishi@huawei.com>
      Signed-off-by: default avatarJiang Liu <liuj97@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Keping Chen <chenkeping@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3fea8b0a
    • Andrew Morton's avatar
      mm/page_alloc.c: remove pageblock_default_order() · 237597d8
      Andrew Morton authored
      commit 955c1cd7 upstream.
      
      This has always been broken: one version takes an unsigned int and the
      other version takes no arguments.  This bug was hidden because one
      version of set_pageblock_order() was a macro which doesn't evaluate its
      argument.
      
      Simplify it all and remove pageblock_default_order() altogether.
      Reported-by: default avatarrajman mekaco <rajman.mekaco@gmail.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      237597d8
    • Daniel Vetter's avatar
      drm/i915: kick any firmware framebuffers before claiming the gtt · dbdd2eb4
      Daniel Vetter authored
      commit 9f846a16 upstream.
      
      Especially vesafb likes to map everything as uc- (yikes), and if that
      mapping hangs around still while we try to map the gtt as wc the
      kernel will downgrade our request to uc-, resulting in abyssal
      performance.
      
      Unfortunately we can't do this as early as readon does (i.e. as the
      first thing we do when initializing the hw) because our fb/mmio space
      region moves around on a per-gen basis. So I've had to move it below
      the gtt initialization, but that seems to work, too. The important
      thing is that we do this before we set up the gtt wc mapping.
      
      Now an altogether different question is why people compile their
      kernels with vesafb enabled, but I guess making things just work isn't
      bad per se ...
      
      v2:
      - s/radeondrmfb/inteldrmfb/
      - fix up error handling
      
      v3: Kill #ifdef X86, this is Intel after all. Noticed by Ben Widawsky.
      
      v4: Jani Nikula complained about the pointless bool primary
      initialization.
      
      v5: Don't oops if we can't allocate, noticed by Chris Wilson.
      
      v6: Resolve conflicts with agp rework and fixup whitespace.
      
      This is commit e188719a in drm-next.
      
      Backport to 3.5 -fixes queue requested by Dave Airlie - due to grub
      using vesa on fedora their initrd seems to load vesafb before loading
      the real kms driver. So tons more people actually experience a
      dead-slow gpu. Hence also the Cc: stable.
      Reported-and-tested-by: default avatar"Kilarski, Bernard R" <bernard.r.kilarski@intel.com>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      dbdd2eb4
    • Tao Ma's avatar
      ext4: protect group inode free counting with group lock · 4e0bc3f3
      Tao Ma authored
      commit 6f2e9f0e upstream.
      
      Now when we set the group inode free count, we don't have a proper
      group lock so that multiple threads may decrease the inode free
      count at the same time. And e2fsck will complain something like:
      
      Free inodes count wrong for group #1 (1, counted=0).
      Fix? no
      
      Free inodes count wrong for group #2 (3, counted=0).
      Fix? no
      
      Directories count wrong for group #2 (780, counted=779).
      Fix? no
      
      Free inodes count wrong for group #3 (2272, counted=2273).
      Fix? no
      
      So this patch try to protect it with the ext4_lock_group.
      
      btw, it is found by xfstests test case 269 and the volume is
      mkfsed with the parameter
      "-O ^resize_inode,^uninit_bg,extent,meta_bg,flex_bg,ext_attr"
      and I have run it 100 times and the error in e2fsck doesn't
      show up again.
      Signed-off-by: default avatarTao Ma <boyu.mt@taobao.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarBenjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      4e0bc3f3
    • Paul E. McKenney's avatar
      printk: Fix scheduling-while-atomic problem in console_cpu_notify() · 5e23efd0
      Paul E. McKenney authored
      commit 85eae82a upstream.
      
      The console_cpu_notify() function runs with interrupts disabled in the
      CPU_DYING case.  It therefore cannot block, for example, as will happen
      when it calls console_lock().  Therefore, remove the CPU_DYING leg of
      the switch statement to avoid this problem.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Guillaume Morin <guillaume@morinfr.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5e23efd0
    • Peter Oberparleiter's avatar
      x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y · 36f0c45d
      Peter Oberparleiter authored
      commit 6583327c upstream.
      
      Commit d61931d8, "x86: Add optimized popcnt variants" introduced
      compile flag -fcall-saved-rdi for lib/hweight.c. When combined with
      options -fprofile-arcs and -O2, this flag causes gcc to generate
      broken constructor code. As a result, a 64 bit x86 kernel compiled
      with CONFIG_GCOV_PROFILE_ALL=y prints message "gcov: could not create
      file" and runs into sproadic BUGs during boot.
      
      The gcc people indicate that these kinds of problems are endemic when
      using ad hoc calling conventions.  It is therefore best to treat any
      file compiled with ad hoc calling conventions as an isolated
      environment and avoid things like profiling or coverage analysis,
      since those subsystems assume a "normal" calling conventions.
      
      This patch avoids the bug by excluding lib/hweight.o from coverage
      profiling.
      Reported-by: default avatarMeelis Roos <mroos@linux.ee>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarPeter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/52F3A30C.7050205@linux.vnet.ibm.comSigned-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      36f0c45d
    • KOSAKI Motohiro's avatar
      mm: __set_page_dirty uses spin_lock_irqsave instead of spin_lock_irq · d89985cb
      KOSAKI Motohiro authored
      commit 227d53b3 upstream.
      
      To use spin_{un}lock_irq is dangerous if caller disabled interrupt.
      During aio buffer migration, we have a possibility to see the following
      call stack.
      
      aio_migratepage  [disable interrupt]
        migrate_page_copy
          clear_page_dirty_for_io
            set_page_dirty
              __set_page_dirty_buffers
                __set_page_dirty
                  spin_lock_irq
      
      This mean, current aio migration is a deadlockable.  spin_lock_irqsave
      is a safer alternative and we should use it.
      Signed-off-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reported-by: David Rientjes rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d89985cb
    • KOSAKI Motohiro's avatar
      mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq() · 4d4bed81
      KOSAKI Motohiro authored
      commit a85d9df1 upstream.
      
      During aio stress test, we observed the following lockdep warning.  This
      mean AIO+numa_balancing is currently deadlockable.
      
      The problem is, aio_migratepage disable interrupt, but
      __set_page_dirty_nobuffers unintentionally enable it again.
      
      Generally, all helper function should use spin_lock_irqsave() instead of
      spin_lock_irq() because they don't know caller at all.
      
         other info that might help us debug this:
          Possible unsafe locking scenario:
      
                CPU0
                ----
           lock(&(&ctx->completion_lock)->rlock);
           <Interrupt>
             lock(&(&ctx->completion_lock)->rlock);
      
          *** DEADLOCK ***
      
            dump_stack+0x19/0x1b
            print_usage_bug+0x1f7/0x208
            mark_lock+0x21d/0x2a0
            mark_held_locks+0xb9/0x140
            trace_hardirqs_on_caller+0x105/0x1d0
            trace_hardirqs_on+0xd/0x10
            _raw_spin_unlock_irq+0x2c/0x50
            __set_page_dirty_nobuffers+0x8c/0xf0
            migrate_page_copy+0x434/0x540
            aio_migratepage+0xb1/0x140
            move_to_new_page+0x7d/0x230
            migrate_pages+0x5e5/0x700
            migrate_misplaced_page+0xbc/0xf0
            do_numa_page+0x102/0x190
            handle_pte_fault+0x241/0x970
            handle_mm_fault+0x265/0x370
            __do_page_fault+0x172/0x5a0
            do_page_fault+0x1a/0x70
            page_fault+0x28/0x30
      Signed-off-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4d4bed81
    • Stephen Smalley's avatar
      SELinux: Fix kernel BUG on empty security contexts. · a0f916d4
      Stephen Smalley authored
      commit 2172fa70 upstream.
      
      Setting an empty security context (length=0) on a file will
      lead to incorrectly dereferencing the type and other fields
      of the security context structure, yielding a kernel BUG.
      As a zero-length security context is never valid, just reject
      all such security contexts whether coming from userspace
      via setxattr or coming from the filesystem upon a getxattr
      request by SELinux.
      
      Setting a security context value (empty or otherwise) unknown to
      SELinux in the first place is only possible for a root process
      (CAP_MAC_ADMIN), and, if running SELinux in enforcing mode, only
      if the corresponding SELinux mac_admin permission is also granted
      to the domain by policy.  In Fedora policies, this is only allowed for
      specific domains such as livecd for setting down security contexts
      that are not defined in the build host policy.
      
      Reproducer:
      su
      setenforce 0
      touch foo
      setfattr -n security.selinux foo
      
      Caveat:
      Relabeling or removing foo after doing the above may not be possible
      without booting with SELinux disabled.  Any subsequent access to foo
      after doing the above will also trigger the BUG.
      
      BUG output from Matthew Thode:
      [  473.893141] ------------[ cut here ]------------
      [  473.962110] kernel BUG at security/selinux/ss/services.c:654!
      [  473.995314] invalid opcode: 0000 [#6] SMP
      [  474.027196] Modules linked in:
      [  474.058118] CPU: 0 PID: 8138 Comm: ls Tainted: G      D   I
      3.13.0-grsec #1
      [  474.116637] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0
      07/29/10
      [  474.149768] task: ffff8805f50cd010 ti: ffff8805f50cd488 task.ti:
      ffff8805f50cd488
      [  474.183707] RIP: 0010:[<ffffffff814681c7>]  [<ffffffff814681c7>]
      context_struct_compute_av+0xce/0x308
      [  474.219954] RSP: 0018:ffff8805c0ac3c38  EFLAGS: 00010246
      [  474.252253] RAX: 0000000000000000 RBX: ffff8805c0ac3d94 RCX:
      0000000000000100
      [  474.287018] RDX: ffff8805e8aac000 RSI: 00000000ffffffff RDI:
      ffff8805e8aaa000
      [  474.321199] RBP: ffff8805c0ac3cb8 R08: 0000000000000010 R09:
      0000000000000006
      [  474.357446] R10: 0000000000000000 R11: ffff8805c567a000 R12:
      0000000000000006
      [  474.419191] R13: ffff8805c2b74e88 R14: 00000000000001da R15:
      0000000000000000
      [  474.453816] FS:  00007f2e75220800(0000) GS:ffff88061fc00000(0000)
      knlGS:0000000000000000
      [  474.489254] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  474.522215] CR2: 00007f2e74716090 CR3: 00000005c085e000 CR4:
      00000000000207f0
      [  474.556058] Stack:
      [  474.584325]  ffff8805c0ac3c98 ffffffff811b549b ffff8805c0ac3c98
      ffff8805f1190a40
      [  474.618913]  ffff8805a6202f08 ffff8805c2b74e88 00068800d0464990
      ffff8805e8aac860
      [  474.653955]  ffff8805c0ac3cb8 000700068113833a ffff880606c75060
      ffff8805c0ac3d94
      [  474.690461] Call Trace:
      [  474.723779]  [<ffffffff811b549b>] ? lookup_fast+0x1cd/0x22a
      [  474.778049]  [<ffffffff81468824>] security_compute_av+0xf4/0x20b
      [  474.811398]  [<ffffffff8196f419>] avc_compute_av+0x2a/0x179
      [  474.843813]  [<ffffffff8145727b>] avc_has_perm+0x45/0xf4
      [  474.875694]  [<ffffffff81457d0e>] inode_has_perm+0x2a/0x31
      [  474.907370]  [<ffffffff81457e76>] selinux_inode_getattr+0x3c/0x3e
      [  474.938726]  [<ffffffff81455cf6>] security_inode_getattr+0x1b/0x22
      [  474.970036]  [<ffffffff811b057d>] vfs_getattr+0x19/0x2d
      [  475.000618]  [<ffffffff811b05e5>] vfs_fstatat+0x54/0x91
      [  475.030402]  [<ffffffff811b063b>] vfs_lstat+0x19/0x1b
      [  475.061097]  [<ffffffff811b077e>] SyS_newlstat+0x15/0x30
      [  475.094595]  [<ffffffff8113c5c1>] ? __audit_syscall_entry+0xa1/0xc3
      [  475.148405]  [<ffffffff8197791e>] system_call_fastpath+0x16/0x1b
      [  475.179201] Code: 00 48 85 c0 48 89 45 b8 75 02 0f 0b 48 8b 45 a0 48
      8b 3d 45 d0 b6 00 8b 40 08 89 c6 ff ce e8 d1 b0 06 00 48 85 c0 49 89 c7
      75 02 <0f> 0b 48 8b 45 b8 4c 8b 28 eb 1e 49 8d 7d 08 be 80 01 00 00 e8
      [  475.255884] RIP  [<ffffffff814681c7>]
      context_struct_compute_av+0xce/0x308
      [  475.296120]  RSP <ffff8805c0ac3c38>
      [  475.328734] ---[ end trace f076482e9d754adc ]---
      Reported-by: default avatarMatthew Thode <mthode@mthode.org>
      Signed-off-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: default avatarPaul Moore <pmoore@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a0f916d4
  2. 13 Feb, 2014 20 commits
    • Greg Kroah-Hartman's avatar
      Linux 3.4.80 · a6d2ebcd
      Greg Kroah-Hartman authored
      a6d2ebcd
    • Colin Cross's avatar
      3.4.y: timekeeping: fix 32-bit overflow in get_monotonic_boottime · cd34de10
      Colin Cross authored
      fixed upstream in v3.6 by ec145bab
      
      get_monotonic_boottime adds three nanonsecond values stored
      in longs, followed by an s64.  If the long values are all
      close to 1e9 the first three additions can overflow and
      become negative when added to the s64.  Cast the first
      value to s64 so that all additions are 64 bit.
      Signed-off-by: default avatarColin Cross <ccross@android.com>
      [jstultz: Fished this out of the AOSP commong.git tree. This was
      fixed upstream in v3.6 by ec145bab]
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cd34de10
    • John Stultz's avatar
      timekeeping: Avoid possible deadlock from clock_was_set_delayed · cf85cc93
      John Stultz authored
      commit 6fdda9a9 upstream.
      
      As part of normal operaions, the hrtimer subsystem frequently calls
      into the timekeeping code, creating a locking order of
        hrtimer locks -> timekeeping locks
      
      clock_was_set_delayed() was suppoed to allow us to avoid deadlocks
      between the timekeeping the hrtimer subsystem, so that we could
      notify the hrtimer subsytem the time had changed while holding
      the timekeeping locks. This was done by scheduling delayed work
      that would run later once we were out of the timekeeing code.
      
      But unfortunately the lock chains are complex enoguh that in
      scheduling delayed work, we end up eventually trying to grab
      an hrtimer lock.
      
      Sasha Levin noticed this in testing when the new seqlock lockdep
      enablement triggered the following (somewhat abrieviated) message:
      
      [  251.100221] ======================================================
      [  251.100221] [ INFO: possible circular locking dependency detected ]
      [  251.100221] 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 Not tainted
      [  251.101967] -------------------------------------------------------
      [  251.101967] kworker/10:1/4506 is trying to acquire lock:
      [  251.101967]  (timekeeper_seq){----..}, at: [<ffffffff81160e96>] retrigger_next_event+0x56/0x70
      [  251.101967]
      [  251.101967] but task is already holding lock:
      [  251.101967]  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
      [  251.101967]
      [  251.101967] which lock already depends on the new lock.
      [  251.101967]
      [  251.101967]
      [  251.101967] the existing dependency chain (in reverse order) is:
      [  251.101967]
      -> #5 (hrtimer_bases.lock#11){-.-...}:
      [snipped]
      -> #4 (&rt_b->rt_runtime_lock){-.-...}:
      [snipped]
      -> #3 (&rq->lock){-.-.-.}:
      [snipped]
      -> #2 (&p->pi_lock){-.-.-.}:
      [snipped]
      -> #1 (&(&pool->lock)->rlock){-.-...}:
      [  251.101967]        [<ffffffff81194803>] validate_chain+0x6c3/0x7b0
      [  251.101967]        [<ffffffff81194d9d>] __lock_acquire+0x4ad/0x580
      [  251.101967]        [<ffffffff81194ff2>] lock_acquire+0x182/0x1d0
      [  251.101967]        [<ffffffff84398500>] _raw_spin_lock+0x40/0x80
      [  251.101967]        [<ffffffff81153e69>] __queue_work+0x1a9/0x3f0
      [  251.101967]        [<ffffffff81154168>] queue_work_on+0x98/0x120
      [  251.101967]        [<ffffffff81161351>] clock_was_set_delayed+0x21/0x30
      [  251.101967]        [<ffffffff811c4bd1>] do_adjtimex+0x111/0x160
      [  251.101967]        [<ffffffff811e2711>] compat_sys_adjtimex+0x41/0x70
      [  251.101967]        [<ffffffff843a4b49>] ia32_sysret+0x0/0x5
      [  251.101967]
      -> #0 (timekeeper_seq){----..}:
      [snipped]
      [  251.101967] other info that might help us debug this:
      [  251.101967]
      [  251.101967] Chain exists of:
        timekeeper_seq --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock#11
      
      [  251.101967]  Possible unsafe locking scenario:
      [  251.101967]
      [  251.101967]        CPU0                    CPU1
      [  251.101967]        ----                    ----
      [  251.101967]   lock(hrtimer_bases.lock#11);
      [  251.101967]                                lock(&rt_b->rt_runtime_lock);
      [  251.101967]                                lock(hrtimer_bases.lock#11);
      [  251.101967]   lock(timekeeper_seq);
      [  251.101967]
      [  251.101967]  *** DEADLOCK ***
      [  251.101967]
      [  251.101967] 3 locks held by kworker/10:1/4506:
      [  251.101967]  #0:  (events){.+.+.+}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
      [  251.101967]  #1:  (hrtimer_work){+.+...}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
      [  251.101967]  #2:  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
      [  251.101967]
      [  251.101967] stack backtrace:
      [  251.101967] CPU: 10 PID: 4506 Comm: kworker/10:1 Not tainted 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053
      [  251.101967] Workqueue: events clock_was_set_work
      
      So the best solution is to avoid calling clock_was_set_delayed() while
      holding the timekeeping lock, and instead using a flag variable to
      decide if we should call clock_was_set() once we've released the locks.
      
      This works for the case here, where the do_adjtimex() was the deadlock
      trigger point. Unfortuantely, in update_wall_time() we still hold
      the jiffies lock, which would deadlock with the ipi triggered by
      clock_was_set(), preventing us from calling it even after we drop the
      timekeeping lock. So instead call clock_was_set_delayed() at that point.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Tested-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      cf85cc93
    • Borislav Petkov's avatar
      rtc-cmos: Add an alarm disable quirk · ab99a94d
      Borislav Petkov authored
      commit d5a1c7e3 upstream.
      
      41c7f742 ("rtc: Disable the alarm in the hardware (v2)") added the
      functionality to disable the RTC wake alarm when shutting down the box.
      
      However, there are at least two b0rked BIOSes we know about:
      
      https://bugzilla.novell.com/show_bug.cgi?id=812592
      https://bugzilla.novell.com/show_bug.cgi?id=805740
      
      where, when wakeup alarm is enabled in the BIOS, the machine reboots
      automatically right after shutdown, regardless of what wakeup time is
      programmed.
      
      Bisecting the issue lead to this patch so disable its functionality with
      a DMI quirk only for those boxes.
      
      Cc: Brecht Machiels <brecht@mos6581.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Rabin Vincent <rabin.vincent@stericsson.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      [jstultz: Changed variable name for clarity, added extra dmi entry]
      Tested-by: default avatarBrecht Machiels <brecht@mos6581.org>
      Tested-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ab99a94d
    • Ying Xue's avatar
      sched/rt: Avoid updating RT entry timeout twice within one tick period · dbf32394
      Ying Xue authored
      commit 57d2aa00 upstream.
      
      The issue below was found in 2.6.34-rt rather than mainline rt
      kernel, but the issue still exists upstream as well.
      
      So please let me describe how it was noticed on 2.6.34-rt:
      
      On this version, each softirq has its own thread, it means there
      is at least one RT FIFO task per cpu. The priority of these
      tasks is set to 49 by default. If user launches an RT FIFO task
      with priority lower than 49 of softirq RT tasks, it's possible
      there are two RT FIFO tasks enqueued one cpu runqueue at one
      moment. By current strategy of balancing RT tasks, when it comes
      to RT tasks, we really need to put them off to a CPU that they
      can run on as soon as possible. Even if it means a bit of cache
      line flushing, we want RT tasks to be run with the least latency.
      
      When the user RT FIFO task which just launched before is
      running, the sched timer tick of the current cpu happens. In this
      tick period, the timeout value of the user RT task will be
      updated once. Subsequently, we try to wake up one softirq RT
      task on its local cpu. As the priority of current user RT task
      is lower than the softirq RT task, the current task will be
      preempted by the higher priority softirq RT task. Before
      preemption, we check to see if current can readily move to a
      different cpu. If so, we will reschedule to allow the RT push logic
      to try to move current somewhere else. Whenever the woken
      softirq RT task runs, it first tries to migrate the user FIFO RT
      task over to a cpu that is running a task of lesser priority. If
      migration is done, it will send a reschedule request to the found
      cpu by IPI interrupt. Once the target cpu responds the IPI
      interrupt, it will pick the migrated user RT task to preempt its
      current task. When the user RT task is running on the new cpu,
      the sched timer tick of the cpu fires. So it will tick the user
      RT task again. This also means the RT task timeout value will be
      updated again. As the migration may be done in one tick period,
      it means the user RT task timeout value will be updated twice
      within one tick.
      
      If we set a limit on the amount of cpu time for the user RT task
      by setrlimit(RLIMIT_RTTIME), the SIGXCPU signal should be posted
      upon reaching the soft limit.
      
      But exactly when the SIGXCPU signal should be sent depends on the
      RT task timeout value. In fact the timeout mechanism of sending
      the SIGXCPU signal assumes the RT task timeout is increased once
      every tick.
      
      However, currently the timeout value may be added twice per
      tick. So it results in the SIGXCPU signal being sent earlier
      than expected.
      
      To solve this issue, we prevent the timeout value from increasing
      twice within one tick time by remembering the jiffies value of
      last updating the timeout. As long as the RT task's jiffies is
      different with the global jiffies value, we allow its timeout to
      be updated.
      Signed-off-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarFan Du <fan.du@windriver.com>
      Reviewed-by: default avatarYong Zhang <yong.zhang0@gmail.com>
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1342508623-2887-1-git-send-email-ying.xue@windriver.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      [ lizf: backported to 3.4: adjust context ]
      Signed-off-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dbf32394
    • Peter Boonstoppel's avatar
      sched: Unthrottle rt runqueues in __disable_runtime() · f61eb9ce
      Peter Boonstoppel authored
      commit a4c96ae3 upstream.
      
      migrate_tasks() uses _pick_next_task_rt() to get tasks from the
      real-time runqueues to be migrated. When rt_rq is throttled
      _pick_next_task_rt() won't return anything, in which case
      migrate_tasks() can't move all threads over and gets stuck in an
      infinite loop.
      
      Instead unthrottle rt runqueues before migrating tasks.
      
      Additionally: move unthrottle_offline_cfs_rqs() to rq_offline_fair()
      Signed-off-by: default avatarPeter Boonstoppel <pboonstoppel@nvidia.com>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/5FBF8E85CA34454794F0F7ECBA79798F379D3648B7@HQMAIL04.nvidia.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      [ lizf: backported to 3.4: adjust context ]
      Signed-off-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f61eb9ce
    • Mike Galbraith's avatar
      sched,rt: fix isolated CPUs leaving root_task_group indefinitely throttled · 1e5c13ec
      Mike Galbraith authored
      commit e221d028 upstream.
      
      Root task group bandwidth replenishment must service all CPUs, regardless of
      where the timer was last started, and regardless of the isolation mechanism,
      lest 'Quoth the Raven, "Nevermore"' become rt scheduling policy.
      Signed-off-by: default avatarMike Galbraith <efault@gmx.de>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1344326558.6968.25.camel@marge.simpson.netSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1e5c13ec
    • Colin Cross's avatar
      sched/rt: Fix SCHED_RR across cgroups · 21b53baf
      Colin Cross authored
      commit 454c7999 upstream.
      
      task_tick_rt() has an optimization to only reschedule SCHED_RR tasks
      if they were the only element on their rq.  However, with cgroups
      a SCHED_RR task could be the only element on its per-cgroup rq but
      still be competing with other SCHED_RR tasks in its parent's
      cgroup.  In this case, the SCHED_RR task in the child cgroup would
      never yield at the end of its timeslice.  If the child cgroup
      rt_runtime_us was the same as the parent cgroup rt_runtime_us,
      the task in the parent cgroup would starve completely.
      
      Modify task_tick_rt() to check that the task is the only task on its
      rq, and that the each of the scheduling entities of its ancestors
      is also the only entity on its rq.
      Signed-off-by: default avatarColin Cross <ccross@android.com>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1337229266-15798-1-git-send-email-ccross@android.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      21b53baf
    • Al Viro's avatar
      hpfs: deadlock and race in directory lseek() · d5c20298
      Al Viro authored
      commit 31abdab9 upstream.
      
      For one thing, there's an ABBA deadlock on hpfs fs-wide lock and i_mutex
      in hpfs_dir_lseek() - there's a lot of methods that grab the former with
      the caller already holding the latter, so it must take i_mutex first.
      
      For another, locking the damn thing, carefully validating the offset,
      then dropping locks and assigning the offset is obviously racy.
      
      Moreover, we _must_ do hpfs_add_pos(), or the machinery in dnode.c
      won't modify the sucker on B-tree surgeries.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d5c20298
    • Yijing Wang's avatar
      PCI: Enable ARI if dev and upstream bridge support it; disable otherwise · d7c16d1e
      Yijing Wang authored
      commit b0cc6020 upstream.
      
      Currently, we enable ARI in a device's upstream bridge if the bridge and
      the device support it.  But we never disable ARI, even if the device is
      removed and replaced with a device that doesn't support ARI.
      
      This means that if we hot-remove an ARI device and replace it with a
      non-ARI multi-function device, we find only function 0 of the new device
      because the upstream bridge still has ARI enabled, and next_ari_fn()
      only returns function 0 for the new non-ARI device.
      
      This patch disables ARI in the upstream bridge if the device doesn't
      support ARI.  See the PCIe spec, r3.0, sec 6.13.
      
      [bhelgaas: changelog, function comment]
      [yijing: replace PCIe Cap accessor with legacy PCI accessor]
      Signed-off-by: default avatarYijing Wang <wangyijing@huawei.com>
      Signed-off-by: default avatarJiang Liu <jiang.liu@huawei.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      d7c16d1e
    • Alex Deucher's avatar
      drm/radeon/DCE4+: clear bios scratch dpms bit (v2) · 27fb12b9
      Alex Deucher authored
      commit 6802d4ba upstream.
      
      The BlankCrtc table in some DCE8 boards has some
      logic shortcuts for the vbios when this bit is set.
      Clear it for driver use.
      
      v2: fix typo
      
      Bug:
      https://bugs.freedesktop.org/show_bug.cgi?id=73420Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      27fb12b9
    • Alex Deucher's avatar
      drm/radeon: set the full cache bit for fences on r7xx+ · 00a67d1c
      Alex Deucher authored
      commit d45b964a upstream.
      
      Needed to properly flush the read caches for fences.
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      00a67d1c
    • Marek Olšák's avatar
      drm/radeon: skip colorbuffer checking if COLOR_INFO.FORMAT is set to INVALID · 790d8449
      Marek Olšák authored
      commit 56492e0f upstream.
      
      This fixes a bug which was causing rejections of valid GPU commands
      from userspace.
      Signed-off-by: default avatarMarek Olšák <marek.olsak@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      790d8449
    • Michel Dänzer's avatar
      radeon/pm: Guard access to rdev->pm.power_state array · e4496194
      Michel Dänzer authored
      commit 37016951 upstream.
      
      It's never allocated on systems without an ATOMBIOS or COMBIOS ROM.
      
      Should fix an oops I encountered while resetting the GPU after a lockup
      on my PowerBook with an RV350.
      Signed-off-by: default avatarMichel Dänzer <michel.daenzer@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e4496194
    • Alex Deucher's avatar
      drm/radeon: warn users when hw_i2c is enabled (v2) · aea570ea
      Alex Deucher authored
      commit d1951782 upstream.
      
      The hw i2c engines are disabled by default as the
      current implementation is still experimental.  Print
      a warning when users enable it so that it's obvious
      when the option is enabled.
      
      v2: check for non-0 rather than 1
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Reviewed-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aea570ea
    • Joe Thornber's avatar
      dm space map common: make sure new space is used during extend · 11690e14
      Joe Thornber authored
      commit 12c91a5c upstream.
      
      When extending a low level space map we should update nr_blocks at
      the start so the new space is used for the index entries.
      
      Otherwise extend can fail, e.g.: sm_metadata_extend call sequence
      that fails:
       -> sm_ll_extend
          -> dm_tm_new_block -> dm_sm_new_block -> sm_bootstrap_new_block
          => returns -ENOSPC because smm->begin == smm->ll.nr_blocks
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      11690e14
    • Mikulas Patocka's avatar
      dm: wait until embedded kobject is released before destroying a device · 39bbeb69
      Mikulas Patocka authored
      commit be35f486 upstream.
      
      There may be other parts of the kernel holding a reference on the dm
      kobject.  We must wait until all references are dropped before
      deallocating the mapped_device structure.
      
      The dm_kobject_release method signals that all references are dropped
      via completion.  But dm_kobject_release doesn't free the kobject (which
      is embedded in the mapped_device structure).
      
      This is the sequence of operations:
      * when destroying a DM device, call kobject_put from dm_sysfs_exit
      * wait until all users stop using the kobject, when it happens the
        release method is called
      * the release method signals the completion and should return without
        delay
      * the dm device removal code that waits on the completion continues
      * the dm device removal code drops the dm_mod reference the device had
      * the dm device removal code frees the mapped_device structure that
        contains the kobject
      
      Using kobject this way should avoid the module unload race that was
      mentioned at the beginning of this thread:
      https://lkml.org/lkml/2014/1/4/83Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      39bbeb69
    • Weston Andros Adamson's avatar
      sunrpc: Fix infinite loop in RPC state machine · b0c0d5a3
      Weston Andros Adamson authored
      commit 6ff33b7d upstream.
      
      When a task enters call_refreshresult with status 0 from call_refresh and
      !rpcauth_uptodatecred(task) it enters call_refresh again with no rate-limiting
      or max number of retries.
      
      Instead of trying forever, make use of the retry path that other errors use.
      
      This only seems to be possible when the crrefresh callback is gss_refresh_null,
      which only happens when destroying the context.
      
      To reproduce:
      
      1) mount with sec=krb5 (or sec=sys with krb5 negotiated for non FSID specific
         operations).
      
      2) reboot - the client will be stuck and will need to be hard rebooted
      
      BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:2:46]
      Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache ppdev crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd serio_raw i2c_piix4 i2c_core e1000 parport_pc parport shpchp nfsd auth_rpcgss oid_registry exportfs nfs_acl lockd sunrpc autofs4 mptspi scsi_transport_spi mptscsih mptbase ata_generic floppy
      irq event stamp: 195724
      hardirqs last  enabled at (195723): [<ffffffff814a925c>] restore_args+0x0/0x30
      hardirqs last disabled at (195724): [<ffffffff814b0a6a>] apic_timer_interrupt+0x6a/0x80
      softirqs last  enabled at (195722): [<ffffffff8103f583>] __do_softirq+0x1df/0x276
      softirqs last disabled at (195717): [<ffffffff8103f852>] irq_exit+0x53/0x9a
      CPU: 0 PID: 46 Comm: kworker/0:2 Not tainted 3.13.0-rc3-branch-dros_testing+ #4
      Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
      Workqueue: rpciod rpc_async_schedule [sunrpc]
      task: ffff8800799c4260 ti: ffff880079002000 task.ti: ffff880079002000
      RIP: 0010:[<ffffffffa0064fd4>]  [<ffffffffa0064fd4>] __rpc_execute+0x8a/0x362 [sunrpc]
      RSP: 0018:ffff880079003d18  EFLAGS: 00000246
      RAX: 0000000000000005 RBX: 0000000000000007 RCX: 0000000000000007
      RDX: 0000000000000007 RSI: ffff88007aecbae8 RDI: ffff8800783d8900
      RBP: ffff880079003d78 R08: ffff88006e30e9f8 R09: ffffffffa005a3d7
      R10: ffff88006e30e7b0 R11: ffff8800783d8900 R12: ffffffffa006675e
      R13: ffff880079003ce8 R14: ffff88006e30e7b0 R15: ffff8800783d8900
      FS:  0000000000000000(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f3072333000 CR3: 0000000001a0b000 CR4: 00000000001407f0
      Stack:
       ffff880079003d98 0000000000000246 0000000000000000 ffff88007a9a4830
       ffff880000000000 ffffffff81073f47 ffff88007f212b00 ffff8800799c4260
       ffff8800783d8988 ffff88007f212b00 ffffe8ffff604800 0000000000000000
      Call Trace:
       [<ffffffff81073f47>] ? trace_hardirqs_on_caller+0x145/0x1a1
       [<ffffffffa00652d3>] rpc_async_schedule+0x27/0x32 [sunrpc]
       [<ffffffff81052974>] process_one_work+0x211/0x3a5
       [<ffffffff810528d5>] ? process_one_work+0x172/0x3a5
       [<ffffffff81052eeb>] worker_thread+0x134/0x202
       [<ffffffff81052db7>] ? rescuer_thread+0x280/0x280
       [<ffffffff81052db7>] ? rescuer_thread+0x280/0x280
       [<ffffffff810584a0>] kthread+0xc9/0xd1
       [<ffffffff810583d7>] ? __kthread_parkme+0x61/0x61
       [<ffffffff814afd6c>] ret_from_fork+0x7c/0xb0
       [<ffffffff810583d7>] ? __kthread_parkme+0x61/0x61
      Code: e8 87 63 fd e0 c6 05 10 dd 01 00 01 48 8b 43 70 4c 8d 6b 70 45 31 e4 a8 02 0f 85 d5 02 00 00 4c 8b 7b 48 48 c7 43 48 00 00 00 00 <4c> 8b 4b 50 4d 85 ff 75 0c 4d 85 c9 4d 89 cf 0f 84 32 01 00 00
      
      And the output of "rpcdebug -m rpc -s all":
      
      RPC:    61 call_refresh (status 0)
      RPC:    61 call_refresh (status 0)
      RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
      RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
      RPC:    61 call_refreshresult (status 0)
      RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
      RPC:    61 call_refreshresult (status 0)
      RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
      RPC:    61 call_refresh (status 0)
      RPC:    61 call_refreshresult (status 0)
      RPC:    61 call_refresh (status 0)
      RPC:    61 call_refresh (status 0)
      RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
      RPC:    61 call_refreshresult (status 0)
      RPC:    61 call_refresh (status 0)
      RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
      RPC:    61 call_refresh (status 0)
      RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
      RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
      RPC:    61 call_refreshresult (status 0)
      RPC:    61 call_refresh (status 0)
      RPC:    61 call_refresh (status 0)
      RPC:    61 call_refresh (status 0)
      RPC:    61 call_refresh (status 0)
      RPC:    61 call_refreshresult (status 0)
      RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
      Signed-off-by: default avatarWeston Andros Adamson <dros@netapp.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b0c0d5a3
    • Weston Andros Adamson's avatar
      nfs4.1: properly handle ENOTSUP in SECINFO_NO_NAME · b5f0608e
      Weston Andros Adamson authored
      commit 78b19bae upstream.
      
      Don't check for -NFS4ERR_NOTSUPP, it's already been mapped to -ENOTSUPP
      by nfs4_stat_to_errno.
      
      This allows the client to mount v4.1 servers that don't support
      SECINFO_NO_NAME by falling back to the "guess and check" method of
      nfs4_find_root_sec.
      Signed-off-by: default avatarWeston Andros Adamson <dros@primarydata.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b5f0608e
    • Trond Myklebust's avatar
      NFSv4: OPEN must handle the NFS4ERR_IO return code correctly · 3c16dfe2
      Trond Myklebust authored
      commit c7848f69 upstream.
      
      decode_op_hdr() cannot distinguish between an XDR decoding error and
      the perfectly valid errorcode NFS4ERR_IO. This is normally not a
      problem, but for the particular case of OPEN, we need to be able
      to increment the NFSv4 open sequence id when the server returns
      a valid response.
      Reported-by: default avatarJ Bruce Fields <bfields@fieldses.org>
      Link: http://lkml.kernel.org/r/20131204210356.GA19452@fieldses.orgSigned-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3c16dfe2