1. 22 Feb, 2014 14 commits
  2. 20 Feb, 2014 25 commits
    • Linux 3.4.81 · dd12c7c4
      Greg Kroah-Hartman authored
    • nfs: tear down caches in nfs_init_writepagecache when allocation fails · 478b97d4
      Jeff Layton authored
      commit 3dd4765f upstream.
      
      ...and ensure that we tear down the nfs_commit_data cache too when
      unloading the module.
      
      Cc: Bryan Schumaker <bjschuma@netapp.com>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      [bwh: Backported to 3.2: drop the nfs_cdata_cachep cleanup; it doesn't exist]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • lib/vsprintf.c: kptr_restrict: fix pK-error in SysRq show-all-timers(Q) · 26fead64
      Dan Rosenberg authored
      commit 3715c530 upstream.
      
      When using ALT+SysRq+Q all the pointers are replaced with "pK-error" like
      this:
      
      	[23153.208033]   .base:               pK-error
      
      with echo h > /proc/sysrq-trigger it works:
      
      	[23107.776363]   .base:       ffff88023e60d540
      
      The intent behind this behavior was to return "pK-error" in cases where
      the %pK format specifier was used in interrupt context, because the
      CAP_SYSLOG check wouldn't be meaningful.  Clearly this should only apply
      when kptr_restrict is actually enabled though.
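The corrected decision can be sketched as a small user-space model in C. This is an illustration only, not the code in lib/vsprintf.c: the name `pk_decide`, the enum, and the boolean parameters are hypothetical stand-ins for the kernel's interrupt-context and CAP_SYSLOG checks, and the meanings assumed for kptr_restrict values 0, 1 and 2 are taken from the sysctl's documented behavior.

```c
#include <stdbool.h>

/* Possible outcomes when formatting a pointer with %pK (model only). */
enum pk_result { PK_SHOW_POINTER, PK_SHOW_ZEROS, PK_ERROR };

/*
 * Hypothetical model of the %pK decision after the fix.
 * kptr_restrict: 0 = unrestricted, 1 = only CAP_SYSLOG holders see
 * pointers, 2 = pointers always hidden (assumed semantics).
 */
enum pk_result pk_decide(int kptr_restrict, bool in_interrupt,
                         bool has_cap_syslog)
{
    /* The capability check is meaningless in interrupt context, but
     * that only matters when restriction is actually enabled. */
    if (kptr_restrict && in_interrupt)
        return PK_ERROR;

    if (kptr_restrict == 0 || (kptr_restrict == 1 && has_cap_syslog))
        return PK_SHOW_POINTER;

    return PK_SHOW_ZEROS;
}
```

In this model, a call such as `pk_decide(0, true, false)` yields `PK_SHOW_POINTER`, which corresponds to the ALT+SysRq+Q case working again when kptr_restrict is disabled.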
      
      Reported-by: Stevie Trujillo <stevie.trujillo@gmail.com>
      Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • virtio-blk: Use block layer provided spinlock · 374d3a41
      Asias He authored
      commit 2c95a329 upstream.
      
      Block layer will allocate a spinlock for the queue if the driver does
      not provide one in blk_init_queue().
      
      The reason to use the internal spinlock is that blk_cleanup_queue() will
      switch to use the internal spinlock in the cleanup code path.
      
              if (q->queue_lock != &q->__queue_lock)
                      q->queue_lock = &q->__queue_lock;
      
      However, processes which are in D state might have taken the driver
      provided spinlock; when those processes wake up, they would release
      the block layer provided spinlock instead.
      
      =====================================
      [ BUG: bad unlock balance detected! ]
      3.4.0-rc7+ #238 Not tainted
      -------------------------------------
      fio/3587 is trying to release lock (&(&q->__queue_lock)->rlock) at:
      [<ffffffff813274d2>] blk_queue_bio+0x2a2/0x380
      but there are no more locks to release!
      
      other info that might help us debug this:
      1 lock held by fio/3587:
       #0:  (&(&vblk->lock)->rlock){......}, at:
      [<ffffffff8132661a>] get_request_wait+0x19a/0x250
      
      Other drivers use block layer provided spinlock as well, e.g. SCSI.
      
      Switching to the block layer provided spinlock saves a bit of memory and
      does not increase lock contention. Performance tests show no real
      difference before and after this patch.
      
      Changes in v2: Improve commit log as Michael suggested.
      
      Cc: virtualization@lists.linux-foundation.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: Asias He <asias@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Input: synaptics - handle out of bounds values from the hardware · ba37c708
      Seth Forshee authored
      commit c0394506 upstream.
      
      The touchpad on the Acer Aspire One D250 will report out of range values
      in the extreme lower portion of the touchpad. These appear as abrupt
      changes in the values reported by the hardware from very low values to
      very high values, which can cause unexpected vertical jumps in the
      position of the mouse pointer.
      
      What seems to be happening is that the value is wrapping to a two's
      complement negative value of higher resolution than the 13-bit value
      reported by the hardware, with the high-order bits being truncated. This
      patch adds handling for these values by converting them to the
      appropriate negative values.
      
      The only tricky part about this is deciding when to treat a number as
      negative. It stands to reason that if out of range values can be
      reported on the low end then it could also happen on the high end, so
      not all out of range values should be treated as negative. The approach
      taken here is to split the difference between the maximum legitimate
      value for the axis and the maximum possible value that the hardware can
      report, treating values greater than this number as negative and all
      other values as positive. This can be tweaked later if hardware is found
      that operates outside of these parameters.
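The "split the difference" rule described above can be sketched in C as follows. This is a minimal illustration, not the driver's code: the helper name `decode_abs_position_model` and the `axis_max` parameter are hypothetical, and using 1 << 13 as the wrap value for the 13-bit field is an assumption.

```c
/* Width of the position field reported by the hardware (per the text). */
#define ABS_POS_BITS 13

/*
 * Hypothetical helper: interpret a raw 13-bit reading as a possibly
 * negative position. axis_max is the maximum legitimate value for the
 * axis and is an assumed input.
 */
int decode_abs_position_model(int raw, int axis_max)
{
    int wrap = 1 << ABS_POS_BITS;            /* 8192 */
    int threshold = (wrap + axis_max) / 2;   /* midpoint of the out-of-range gap */

    /* Readings above the midpoint are treated as wrapped negative values. */
    if (raw > threshold)
        raw -= wrap;

    return raw;
}
```

With an assumed axis maximum of 4448, the threshold is 6320, so a raw reading of 8190 decodes to -2 while a slightly out-of-range reading of 4500 stays positive.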
      
      BugLink: http://bugs.launchpad.net/bugs/1001251
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Reviewed-by: Daniel Kurtz <djkurtz@chromium.org>
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PM / Hibernate: Hibernate/thaw fixes/improvements · b249f99c
      Bojan Smojver authored
      commit 5a21d489 upstream.
      
       1. Do not allocate memory for buffers from emergency pools, unless
          absolutely required. Do not warn about and do not retry non-essential
          failed allocations.
      
       2. Do not check the amount of free pages left on every single page
          write, but wait until one map is completely populated and then check.
      
       3. Set maximum number of pages for read buffering consistently, instead
          of inadvertently depending on the size of the sector type.
      
       4. Fix copyright line, which I missed when I submitted the hibernation
          threading patch.
      
       5. Dispense with bit shifting arithmetic to improve readability.
      
       6. Really recalculate the number of pages required to be free after all
          allocations have been done.
      
       7. Fix calculation of pages required for read buffering. Only count in
          pages that do not belong to high memory.
      
      Signed-off-by: Bojan Smojver <bojan@rexursive.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: Fix buffer overflow in kvm_set_irq() · ec90b611
      Avi Kivity authored
      commit f2ebd422 upstream.
      
      kvm_set_irq() has an internal buffer of three irq routing entries, allowing
      a GSI to be connected to three IRQ chips or one MSI.  However setup_routing_entry()
      does not properly enforce this, allowing three irqchip routes followed by
      an MSI route to overflow the buffer.
      
      Fix by ensuring that an MSI entry is added to an empty list.
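The rule "an MSI entry may only be added to an empty list" can be modelled in user-space C as below. This is a sketch under stated assumptions, not the KVM code: every type and function name here is hypothetical, the duplicate-irqchip rejection is assumed from how the three-entry limit is described, and the explicit capacity check is an extra defensive step added for the model.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ROUTES_PER_GSI 3   /* size of the internal buffer described above */

enum route_type { ROUTE_IRQCHIP, ROUTE_MSI };

struct route {
    enum route_type type;
    int irqchip;               /* only meaningful for ROUTE_IRQCHIP */
};

struct gsi_routes {
    struct route entries[MAX_ROUTES_PER_GSI];
    size_t count;
};

/* Returns true if the route was added, false if it was rejected. */
bool add_route_model(struct gsi_routes *gsi, struct route new_route)
{
    for (size_t i = 0; i < gsi->count; i++) {
        const struct route *e = &gsi->entries[i];

        /* An MSI must be alone on its GSI, and each irqchip may
         * appear at most once (assumed rule). */
        if (new_route.type == ROUTE_MSI || e->type == ROUTE_MSI ||
            new_route.irqchip == e->irqchip)
            return false;
    }

    /* Defensive bound check, added for this model only. */
    if (gsi->count >= MAX_ROUTES_PER_GSI)
        return false;

    gsi->entries[gsi->count++] = new_route;
    return true;
}
```

Under this model, three irqchip routes followed by an MSI route leave the list at three entries, because the MSI is rejected as soon as any existing entry is found.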
      
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • target/file: Re-enable optional fd_buffered_io=1 operation · 230a0c3b
      Nicholas Bellinger authored
      commit b32f4c7e upstream.
      
      This patch re-adds the ability to optionally run in buffered FILEIO mode
      (eg: w/o O_DSYNC) for device backends in order to once again use the
      Linux buffered cache as a write-back storage mechanism.
      
      This logic was originally dropped with mainline v3.5-rc commit:
      
      commit a4dff304
      Author: Nicholas Bellinger <nab@linux-iscsi.org>
      Date:   Wed May 30 16:25:41 2012 -0700
      
          target/file: Use O_DSYNC by default for FILEIO backends
      
      The difference with this patch is that fd_create_virtdevice() now
      forces the explicit setting of emulate_write_cache=1 when buffered FILEIO
      operation has been enabled.
      
      (v2: Switch to FDBD_HAS_BUFFERED_IO_WCE + add more detailed
           comment as requested by hch)
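The relationship between buffered mode, O_DSYNC and the write cache setting can be sketched as follows. This is an illustrative user-space model, not the target code: the struct, the function name `fileio_choose_mode_model`, and the base flags `O_CREAT | O_RDWR` are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L  /* makes O_DSYNC visible under strict modes */
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical summary of how a FILEIO backend would be opened. */
struct fileio_mode_model {
    int open_flags;             /* flags that would be passed to open() */
    bool emulate_write_cache;   /* whether WCE is reported to initiators */
};

struct fileio_mode_model fileio_choose_mode_model(bool buffered_io)
{
    struct fileio_mode_model mode = {
        .open_flags = O_CREAT | O_RDWR,   /* assumed base flags */
        .emulate_write_cache = false,
    };

    if (buffered_io) {
        /* Buffered mode: no O_DSYNC, so the page cache acts as a
         * write-back cache and the write cache is forced on. */
        mode.emulate_write_cache = true;
    } else {
        /* Default mode: every write reaches stable storage. */
        mode.open_flags |= O_DSYNC;
    }

    return mode;
}
```

Forcing the write cache flag on in buffered mode means initiators are told that data may still be in a volatile cache, so they can issue explicit flushes when they need durability.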
      
      Reported-by: Ferry <iscsitmp@bananateam.nl>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • target/file: Use O_DSYNC by default for FILEIO backends · 45a0374f
      Nicholas Bellinger authored
      commit a4dff304 upstream.
      
      Convert to use O_DSYNC for all cases at FILEIO backend creation time to
      avoid the extra syncing of pure timestamp updates with legacy O_SYNC during
      default operation as recommended by hch.  Continue to do this independently of
      Write Cache Enable (WCE) bit, as WCE=0 is currently the default for all backend
      devices and enabled by user on per device basis via attrib/emulate_write_cache.
      
      This patch drops the now unnecessary fd_buffered_io= token usage that was
      originally signalling when to explicitly disable O_SYNC at backend creation
      time for buffered I/O operation.  This can end up being dangerous for a number
      of reasons during physical node failure, so go ahead and drop this option
      for now when O_DSYNC is used as the default.
      
      Also allow explicit FUA WRITEs -> vfs_fsync_range() call to function in
      fd_execute_cmd() independently of WCE bit setting.
      
      Reported-by: Christoph Hellwig <hch@lst.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      [bwh: Backported to 3.2:
       - We have fd_do_task() and not fd_execute_cmd()
       - Various fields are in struct se_task rather than struct se_cmd
       - fd_create_virtdevice() flags initialisation hasn't been cleaned up]
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • IB/qib: Convert qib_user_sdma_pin_pages() to use get_user_pages_fast() · f7441069
      Jan Kara authored
      commit 603e7729 upstream.
      
      qib_user_sdma_queue_pkts() gets called with mmap_sem held for
      writing. Except for get_user_pages() deep down in
      qib_user_sdma_pin_pages() we don't seem to need mmap_sem at all.  Even
      more interestingly the function qib_user_sdma_queue_pkts() (and also
      qib_user_sdma_coalesce() called somewhat later) call copy_from_user()
      which can hit a page fault and we deadlock on trying to get mmap_sem
      when handling that fault.
      
      So just make qib_user_sdma_pin_pages() use get_user_pages_fast() and
      leave mmap_sem locking for mm.
      
      This deadlock has actually been observed in the wild when the node
      is under memory pressure.
      
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      [Backported to 3.4: (Thanks to Ben Hutchings)
       - Adjust context
       - Adjust indentation and nr_pages argument in qib_user_sdma_pin_pages()]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sched/nohz: Fix rq->cpu_load calculations some more · f5a4c4b7
      Peter Zijlstra authored
      commit 5aaa0b7a upstream.
      
      Follow up on commit 556061b0 ("sched/nohz: Fix rq->cpu_load[]
      calculations") since while that fixed the busy case it regressed the
      mostly idle case.
      
      Add a callback from the nohz exit to also age the rq->cpu_load[]
      array. This closes the hole where either there was no nohz load
      balance pass during the nohz, or there was a 'significant' amount of
      idle time between the last nohz balance and the nohz exit.
      
      So we'll update unconditionally from the tick to not insert any
      accidental 0 load periods while busy, and we try and catch up from
      nohz idle balance and nohz exit. Both these are still prone to missing
      a jiffy, but that has always been the case.
      
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: pjt@google.com
      Cc: Venkatesh Pallipadi <venki@google.com>
      Link: http://lkml.kernel.org/n/tip-kt0trz0apodbf84ucjfdbr1a@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sched/nohz: Fix rq->cpu_load[] calculations · e2d51f27
      Peter Zijlstra authored
      commit 556061b0 upstream.
      
      While investigating why the load-balancer behaved strangely I found that the
      rq->cpu_load[] tables were completely screwy.. a bit more digging
      revealed that the updates that got through were missing ticks followed
      by a catchup of 2 ticks.
      
      The catchup assumes the cpu was idle during that time (since only nohz
      can cause missed ticks and the machine is idle etc..) this means that
      esp. the higher indices were significantly lower than they ought to
      be.
      
      The reason for this is that it's not correct to compare against jiffies
      on every jiffy on any other cpu than the cpu that updates jiffies.
      
      This patch kludges around it by only doing the catch-up stuff from
      nohz_idle_balance() and doing the regular stuff unconditionally from
      the tick.
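To see why a spurious catch-up depresses the load averages, the update can be modelled in user-space C. This is a deliberately simplified sketch and not the scheduler's code: the function name is hypothetical, the averaging formula (index i keeps (2^i - 1)/2^i of the old value per tick) is an assumption about how the rq->cpu_load[] array behaves, and missed ticks are aged with a naive loop.

```c
#define CPU_LOAD_IDX_MAX 5

/*
 * Hypothetical model: fold this_load into each cpu_load[] index,
 * first ageing the old value as if the CPU had been idle for
 * missed_ticks ticks.
 */
void update_cpu_load_model(unsigned long cpu_load[CPU_LOAD_IDX_MAX],
                           unsigned long this_load,
                           unsigned long missed_ticks)
{
    cpu_load[0] = this_load;   /* index 0 tracks the instantaneous load */

    for (int i = 1; i < CPU_LOAD_IDX_MAX; i++) {
        unsigned long scale = 1UL << i;
        unsigned long old_load = cpu_load[i];

        /* Catch-up: each missed tick is assumed to have had zero load. */
        for (unsigned long t = 0; t < missed_ticks; t++)
            old_load = old_load * (scale - 1) / scale;

        /* Round up when the load is rising so the average can reach it. */
        unsigned long new_load = this_load;
        if (new_load > old_load)
            new_load += scale - 1;

        cpu_load[i] = (old_load * (scale - 1) + new_load) / scale;
    }
}
```

In this model a CPU with a steady load of 1024 keeps every index at 1024 when no ticks are reported missed, but if two ticks are wrongly treated as missed idle time the averaged indices fall below 1024 even though the CPU was busy the whole time.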
      
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: pjt@google.com
      Cc: Venkatesh Pallipadi <venki@google.com>
      Link: http://lkml.kernel.org/n/tip-tp4kj18xdd5aj4vvj0qg55s2@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ftrace: Have function graph only trace based on global_ops filters · 1c2bd0db
      Steven Rostedt authored
      commit 23a8e844 upstream.
      
      Doing some different tests, I discovered that function graph tracing, when
      filtered via the set_ftrace_filter and set_ftrace_notrace files, does
      not always honor them if another function ftrace_ops is registered
      to trace functions.
      
      The reason is that function graph just happens to trace all functions
      that the function tracer enables. When there was only one user of
      function tracing, the function graph tracer did not need to worry about
      being called by functions that it did not want to trace. But now that there
      are other users, this becomes a problem.
      
      For example, one just needs to do the following:
      
       # cd /sys/kernel/debug/tracing
       # echo schedule > set_ftrace_filter
       # echo function_graph > current_tracer
       # cat trace
      [..]
       0)               |  schedule() {
       ------------------------------------------
       0)    <idle>-0    =>   rcu_pre-7
       ------------------------------------------
      
       0) ! 2980.314 us |  }
       0)               |  schedule() {
       ------------------------------------------
       0)   rcu_pre-7    =>    <idle>-0
       ------------------------------------------
      
       0) + 20.701 us   |  }
      
       # echo 1 > /proc/sys/kernel/stack_tracer_enabled
       # cat trace
      [..]
       1) + 20.825 us   |      }
       1) + 21.651 us   |    }
       1) + 30.924 us   |  } /* SyS_ioctl */
       1)               |  do_page_fault() {
       1)               |    __do_page_fault() {
       1)   0.274 us    |      down_read_trylock();
       1)   0.098 us    |      find_vma();
       1)               |      handle_mm_fault() {
       1)               |        _raw_spin_lock() {
       1)   0.102 us    |          preempt_count_add();
       1)   0.097 us    |          do_raw_spin_lock();
       1)   2.173 us    |        }
       1)               |        do_wp_page() {
       1)   0.079 us    |          vm_normal_page();
       1)   0.086 us    |          reuse_swap_page();
       1)   0.076 us    |          page_move_anon_rmap();
       1)               |          unlock_page() {
       1)   0.082 us    |            page_waitqueue();
       1)   0.086 us    |            __wake_up_bit();
       1)   1.801 us    |          }
       1)   0.075 us    |          ptep_set_access_flags();
       1)               |          _raw_spin_unlock() {
       1)   0.098 us    |            do_raw_spin_unlock();
       1)   0.105 us    |            preempt_count_sub();
       1)   1.884 us    |          }
       1)   9.149 us    |        }
       1) + 13.083 us   |      }
       1)   0.146 us    |      up_read();
      
      When the stack tracer was enabled, it enabled all functions to be traced, which
      now the function graph tracer also traces. This is a side effect that should
      not occur.
      
      To fix this a test is added when the function tracing is changed, as well as when
      the graph tracer is enabled, to see if anything other than the ftrace global_ops
      function tracer is enabled. If so, then the graph tracer calls a test trampoline
      that will look at the function that is being traced and compare it with the
      filters defined by the global_ops.
      
      As an optimization, if there are no other function tracers registered, or if
      the only registered function tracers also use the global ops, the function
      graph infrastructure will call the registered function graph callback directly
      and not go through the test trampoline.
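The choice between the direct callback and the test trampoline can be sketched in user-space C. This is a model of the idea only: all names are hypothetical, and a single string stands in for the global_ops filter, which in the kernel is a set of function addresses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef int (*graph_entry_fn)(const char *func);

/* Hypothetical stand-in for the global_ops filter; NULL means "trace all". */
static const char *global_filter;
static int traced_count;

/* The graph tracer's own entry callback: records the function. */
static int graph_entry(const char *func)
{
    (void)func;
    traced_count++;
    return 1;
}

static bool global_ops_match(const char *func)
{
    return global_filter == NULL || strcmp(func, global_filter) == 0;
}

/* Test trampoline: check the global_ops filter before tracing. */
static int graph_entry_test(const char *func)
{
    if (!global_ops_match(func))
        return 0;
    return graph_entry(func);
}

/*
 * Pick the callback: go through the trampoline only when some other
 * tracer may have enabled functions that the graph tracer does not want.
 */
graph_entry_fn select_graph_entry(bool other_ops_registered)
{
    return other_ops_registered ? graph_entry_test : graph_entry;
}
```

With `global_filter` set to "schedule" and another tracer registered, the selected callback ignores `do_page_fault` and still records `schedule`, which is the behavior the commit message describes as intended.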
      
      Fixes: d2d45c7a "tracing: Have stack_tracer use a separate list of functions"
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ftrace: Fix synchronization location disabling and freeing ftrace_ops · 29558665
      Steven Rostedt authored
      commit a4c35ed2 upstream.
      
      The synchronization needed after ftrace_ops are unregistered must happen
      after the callback is disabled from being called by functions.
      
      The current location happens after the function is being removed from the
      internal lists, but not after the function callbacks were disabled, leaving
      the functions susceptible to being called after their callbacks are freed.
      
      This affects perf and any external users of function tracing (LTTng and
      SystemTap).
      
      Fixes: cdbe61bf "ftrace: Allow dynamically allocated function tracers"
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ftrace: Synchronize setting function_trace_op with ftrace_trace_function · 95bcd16e
      Steven Rostedt authored
      commit 405e1d83 upstream.
      
      [ Partial commit backported to 3.4. The ftrace_sync() code added by this
        commit is required for other fixes that 3.4 needs. ]
      
      ftrace_trace_function is a variable that holds what function will be called
      directly by the assembly code (mcount). If just a single function is
      registered and it handles recursion itself, then the assembly will call that
      function directly without any helper function. It also passes in the
      ftrace_op that was registered with the callback. The ftrace_op to send is
      stored in the function_trace_op variable.
      
      The ftrace_trace_function and function_trace_op need to be coordinated such
      that the called callback won't be called with the wrong ftrace_op, otherwise
      bad things can happen if it expected a different op. Luckily, there's no
      callback that doesn't use the helper functions that requires this. But
      there soon will be and this needs to be fixed.
      
      Use a set_function_trace_op to store the ftrace_op to set the
      function_trace_op to when it is safe to do so (during the update function
      within the breakpoint or stop machine calls). Or if dynamic ftrace is not
      being used (static tracing) then we have to do a bit more synchronization
      when the ftrace_trace_function is set as that takes effect immediately
      (as opposed to dynamic ftrace doing it with the modification of the trampoline).
      
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dm sysfs: fix a module unload race · a7333f3d
      Mikulas Patocka authored
      commit 2995fa78 upstream.
      
      This reverts commit be35f486 ("dm: wait until embedded kobject is
      released before destroying a device") and provides an improved fix.
      
      The kobject release code that calls the completion must be placed in a
      non-module file, otherwise there is a module unload race (if the process
      calling dm_kobject_release is preempted and the DM module unloaded after
      the completion is triggered, but before dm_kobject_release returns).
      
      To fix this race, this patch moves the completion code to dm-builtin.c
      which is always compiled directly into the kernel if BLK_DEV_DM is
      selected.
      
      The patch introduces a new dm_kobject_holder structure; its purpose is
      to keep the completion and kobject in one place, so that it can be
      accessed from non-module code without the need to export the layout of
      struct mapped_device to that code.
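The holder idea can be illustrated with a user-space C model. The types below are simplified stand-ins rather than the kernel's struct kobject and struct completion, and all names are hypothetical: the point is only that the release code can reach the completion from the embedded kobject without knowing the layout of the larger device structure.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types (model only). */
struct kobject_model { int refcount; };
struct completion_model { bool done; };

/* Keeps the kobject and its completion together in one small struct. */
struct kobject_holder_model {
    struct kobject_model kobj;
    struct completion_model completion;
};

/* Recover the holder from a pointer to its embedded kobject. */
static struct kobject_holder_model *holder_of(struct kobject_model *kobj)
{
    return (struct kobject_holder_model *)
        ((char *)kobj - offsetof(struct kobject_holder_model, kobj));
}

/*
 * Release callback that would live in always-built-in code: it only
 * needs the holder layout, not the layout of the device that embeds it.
 */
void holder_release_model(struct kobject_model *kobj)
{
    holder_of(kobj)->completion.done = true;
}
```

A device structure would embed one such holder, and the module's teardown path would wait on `completion` before freeing the device.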
      
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: setup pageblock_order before it's used by sparsemem · 3fea8b0a
      Xishi Qiu authored
      commit ca57df79 upstream.
      
      On architectures with CONFIG_HUGETLB_PAGE_SIZE_VARIABLE set, such as
      Itanium, pageblock_order is a variable with default value of 0.  It's set
      to the right value by set_pageblock_order() in function
      free_area_init_core().
      
      But pageblock_order may be used by sparse_init() before free_area_init_core()
      is called along path:
      sparse_init()
          ->sparse_early_usemaps_alloc_node()
      	->usemap_size()
      	    ->SECTION_BLOCKFLAGS_BITS
      		->((1UL << (PFN_SECTION_SHIFT - pageblock_order)) *
      NR_PAGEBLOCK_BITS)
      
      The uninitialized pageblock_order wastes memory because
      usemap_size() returns a much bigger value than is really needed.
      
      For example, on an Itanium platform,
      sparse_init() pageblock_order=0 usemap_size=24576
      free_area_init_core() before pageblock_order=0, usemap_size=24576
      free_area_init_core() after pageblock_order=12, usemap_size=8
      
      That means 24K memory has been wasted for each section, so fix it by calling
      set_pageblock_order() from sparse_init().
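The arithmetic behind those figures can be reproduced with a small C model. This is a sketch under assumptions: the function name is hypothetical, the two rounding steps are an assumed reading of usemap_size(), and the inputs PFN_SECTION_SHIFT = 16 and NR_PAGEBLOCK_BITS = 3 are not stated in the commit message but are chosen here because they reproduce its numbers.

```c
#include <stddef.h>

/* Round value up to the next multiple of `multiple`. */
static size_t round_up_to(size_t value, size_t multiple)
{
    return (value + multiple - 1) / multiple * multiple;
}

/*
 * Hypothetical model of the usemap size for one memory section:
 * one group of nr_pageblock_bits flags per pageblock, rounded up to
 * whole bytes and then to a multiple of sizeof(unsigned long).
 */
size_t usemap_size_model(unsigned pfn_section_shift,
                         unsigned pageblock_order,
                         unsigned nr_pageblock_bits)
{
    size_t pageblocks = (size_t)1 << (pfn_section_shift - pageblock_order);
    size_t bits = pageblocks * nr_pageblock_bits;
    size_t bytes = round_up_to(bits, 8) / 8;

    return round_up_to(bytes, sizeof(unsigned long));
}
```

With the assumed inputs, `usemap_size_model(16, 0, 3)` gives 24576 bytes and `usemap_size_model(16, 12, 3)` gives 8 bytes, matching the before and after values quoted above.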
      
      Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
      Signed-off-by: Jiang Liu <liuj97@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Keping Chen <chenkeping@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm/page_alloc.c: remove pageblock_default_order() · 237597d8
      Andrew Morton authored
      commit 955c1cd7 upstream.
      
      This has always been broken: one version takes an unsigned int and the
      other version takes no arguments.  This bug was hidden because one
      version of set_pageblock_order() was a macro which doesn't evaluate its
      argument.
      
      Simplify it all and remove pageblock_default_order() altogether.
      
      Reported-by: rajman mekaco <rajman.mekaco@gmail.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/i915: kick any firmware framebuffers before claiming the gtt · dbdd2eb4
      Daniel Vetter authored
      commit 9f846a16 upstream.
      
      Especially vesafb likes to map everything as uc- (yikes), and if that
      mapping hangs around still while we try to map the gtt as wc the
      kernel will downgrade our request to uc-, resulting in abysmal
      performance.
      
      Unfortunately we can't do this as early as radeon does (i.e. as the
      first thing we do when initializing the hw) because our fb/mmio space
      region moves around on a per-gen basis. So I've had to move it below
      the gtt initialization, but that seems to work, too. The important
      thing is that we do this before we set up the gtt wc mapping.
      
      Now an altogether different question is why people compile their
      kernels with vesafb enabled, but I guess making things just work isn't
      bad per se ...
      
      v2:
      - s/radeondrmfb/inteldrmfb/
      - fix up error handling
      
      v3: Kill #ifdef X86, this is Intel after all. Noticed by Ben Widawsky.
      
      v4: Jani Nikula complained about the pointless bool primary
      initialization.
      
      v5: Don't oops if we can't allocate, noticed by Chris Wilson.
      
      v6: Resolve conflicts with agp rework and fixup whitespace.
      
      This is commit e188719a in drm-next.
      
      Backport to 3.5 -fixes queue requested by Dave Airlie - due to grub
      using vesa on fedora their initrd seems to load vesafb before loading
      the real kms driver. So tons more people actually experience a
      dead-slow gpu. Hence also the Cc: stable.
      
      Reported-and-tested-by: "Kilarski, Bernard R" <bernard.r.kilarski@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: protect group inode free counting with group lock · 4e0bc3f3
      Tao Ma authored
      commit 6f2e9f0e upstream.
      
      Currently, when we set the group inode free count, we don't hold a
      proper group lock, so multiple threads may decrease the inode free
      count at the same time, and e2fsck will complain with something like:
      
      Free inodes count wrong for group #1 (1, counted=0).
      Fix? no
      
      Free inodes count wrong for group #2 (3, counted=0).
      Fix? no
      
      Directories count wrong for group #2 (780, counted=779).
      Fix? no
      
      Free inodes count wrong for group #3 (2272, counted=2273).
      Fix? no
      
      So this patch tries to protect it with the ext4_lock_group.
      
      btw, it is found by xfstests test case 269 and the volume is
      mkfsed with the parameter
      "-O ^resize_inode,^uninit_bg,extent,meta_bg,flex_bg,ext_attr"
      and I have run it 100 times and the error in e2fsck doesn't
      show up again.
      
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • printk: Fix scheduling-while-atomic problem in console_cpu_notify() · 5e23efd0
      Paul E. McKenney authored
      commit 85eae82a upstream.
      
      The console_cpu_notify() function runs with interrupts disabled in the
      CPU_DYING case.  It therefore cannot block, for example, as will happen
      when it calls console_lock().  Therefore, remove the CPU_DYING leg of
      the switch statement to avoid this problem.
      
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Guillaume Morin <guillaume@morinfr.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y · 36f0c45d
      Peter Oberparleiter authored
      commit 6583327c upstream.
      
      Commit d61931d8, "x86: Add optimized popcnt variants" introduced
      compile flag -fcall-saved-rdi for lib/hweight.c. When combined with
      options -fprofile-arcs and -O2, this flag causes gcc to generate
      broken constructor code. As a result, a 64 bit x86 kernel compiled
      with CONFIG_GCOV_PROFILE_ALL=y prints message "gcov: could not create
      file" and runs into sporadic BUGs during boot.
      
      The gcc people indicate that these kinds of problems are endemic when
      using ad hoc calling conventions.  It is therefore best to treat any
      file compiled with ad hoc calling conventions as an isolated
      environment and avoid things like profiling or coverage analysis,
      since those subsystems assume a "normal" calling convention.
      
      This patch avoids the bug by excluding lib/hweight.o from coverage
      profiling.
      Reported-by: Meelis Roos <mroos@linux.ee>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/52F3A30C.7050205@linux.vnet.ibm.com
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      36f0c45d
    • KOSAKI Motohiro's avatar
      mm: __set_page_dirty uses spin_lock_irqsave instead of spin_lock_irq · d89985cb
      KOSAKI Motohiro authored
      commit 227d53b3 upstream.
      
      Using spin_{un}lock_irq is dangerous if the caller has already
      disabled interrupts.  During aio buffer migration, the following
      call stack is possible.
      
      aio_migratepage  [disable interrupt]
        migrate_page_copy
          clear_page_dirty_for_io
            set_page_dirty
              __set_page_dirty_buffers
                __set_page_dirty
                  spin_lock_irq
      
      This means the current aio migration can deadlock.  spin_lock_irqsave
      is a safer alternative and we should use it.
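
      The difference between the two locking variants can be sketched
      in user space.  This is only a model under stated assumptions: a
      single boolean stands in for the CPU interrupt-enable state, and
      every sim_* and helper_* name is hypothetical rather than a kernel
      API.  It shows why a helper that uses the _irq variant re-enables
      interrupts behind a caller that had them disabled, while the
      _irqsave variant restores whatever state it found.

```c
#include <assert.h>
#include <stdbool.h>

/* Models the CPU interrupt-enable flag. */
static bool irqs_enabled = true;

typedef struct { bool locked; } sim_lock_t;

/* Model of spin_lock_irq(): disables interrupts unconditionally. */
static void sim_lock_irq(sim_lock_t *l)
{
    irqs_enabled = false;
    assert(!l->locked);
    l->locked = true;
}

/* Model of spin_unlock_irq(): enables interrupts unconditionally. */
static void sim_unlock_irq(sim_lock_t *l)
{
    l->locked = false;
    irqs_enabled = true;
}

/* Model of spin_lock_irqsave(): remembers the previous interrupt state. */
static unsigned long sim_lock_irqsave(sim_lock_t *l)
{
    unsigned long flags = irqs_enabled ? 1UL : 0UL;

    irqs_enabled = false;
    assert(!l->locked);
    l->locked = true;
    return flags;
}

/* Model of spin_unlock_irqrestore(): puts the saved state back. */
static void sim_unlock_irqrestore(sim_lock_t *l, unsigned long flags)
{
    l->locked = false;
    irqs_enabled = (flags != 0);
}

/* Helper written with the _irq variant. */
static void helper_irq(sim_lock_t *l)
{
    sim_lock_irq(l);
    sim_unlock_irq(l);
}

/* Helper written with the _irqsave variant. */
static void helper_irqsave(sim_lock_t *l)
{
    unsigned long flags = sim_lock_irqsave(l);

    sim_unlock_irqrestore(l, flags);
}

/* Caller disables interrupts, runs the helper, and reports what it sees. */
static bool caller_sees_irqs_enabled(void (*helper)(sim_lock_t *))
{
    sim_lock_t lock = { false };
    bool seen;

    irqs_enabled = false;   /* caller's own critical section begins */
    helper(&lock);
    seen = irqs_enabled;    /* should still be false for a safe helper */
    irqs_enabled = true;    /* reset the model for the next run */
    return seen;
}
```

      With helper_irq the caller finds interrupts enabled in the middle
      of its own critical section; with helper_irqsave they stay
      disabled.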
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reported-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d89985cb
    • KOSAKI Motohiro's avatar
      mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq() · 4d4bed81
      KOSAKI Motohiro authored
      commit a85d9df1 upstream.
      
      During an aio stress test, we observed the following lockdep warning.
      This means AIO+numa_balancing can currently deadlock.
      
      The problem is that aio_migratepage disables interrupts, but
      __set_page_dirty_nobuffers unintentionally enables them again.
      
      Generally, all helper functions should use spin_lock_irqsave() instead
      of spin_lock_irq() because they know nothing about their callers.
      
         other info that might help us debug this:
          Possible unsafe locking scenario:
      
                CPU0
                ----
           lock(&(&ctx->completion_lock)->rlock);
           <Interrupt>
             lock(&(&ctx->completion_lock)->rlock);
      
          *** DEADLOCK ***
      
            dump_stack+0x19/0x1b
            print_usage_bug+0x1f7/0x208
            mark_lock+0x21d/0x2a0
            mark_held_locks+0xb9/0x140
            trace_hardirqs_on_caller+0x105/0x1d0
            trace_hardirqs_on+0xd/0x10
            _raw_spin_unlock_irq+0x2c/0x50
            __set_page_dirty_nobuffers+0x8c/0xf0
            migrate_page_copy+0x434/0x540
            aio_migratepage+0xb1/0x140
            move_to_new_page+0x7d/0x230
            migrate_pages+0x5e5/0x700
            migrate_misplaced_page+0xbc/0xf0
            do_numa_page+0x102/0x190
            handle_pte_fault+0x241/0x970
            handle_mm_fault+0x265/0x370
            __do_page_fault+0x172/0x5a0
            do_page_fault+0x1a/0x70
            page_fault+0x28/0x30
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4d4bed81
    • Stephen Smalley's avatar
      SELinux: Fix kernel BUG on empty security contexts. · a0f916d4
      Stephen Smalley authored
      commit 2172fa70 upstream.
      
      Setting an empty security context (length=0) on a file will
      lead to incorrectly dereferencing the type and other fields
      of the security context structure, yielding a kernel BUG.
      As a zero-length security context is never valid, just reject
      all such security contexts whether coming from userspace
      via setxattr or coming from the filesystem upon a getxattr
      request by SELinux.
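
      The idea of rejecting empty contexts up front can be sketched as
      a small length check.  This is a user-space illustration only:
      check_context_len is a hypothetical name, and the choice of
      -EINVAL as the error code is an assumption, not something stated
      in the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical validator: refuse a missing or zero-length security
 * context before any field of the parsed context is dereferenced.
 * Returns 0 when the length is acceptable, -EINVAL otherwise.
 */
static int check_context_len(const char *ctx, size_t len)
{
    if (ctx == NULL || len == 0)
        return -EINVAL;  /* a zero-length context is never valid */
    return 0;
}

/* Small self-check showing the intended behaviour. */
static void check_context_len_selfcheck(void)
{
    assert(check_context_len("", 0) == -EINVAL);
    assert(check_context_len("user_u:object_r:file_t", 22) == 0);
}
```

      Doing the check once at the entry point covers both paths named
      above, values arriving via setxattr and values read back from the
      filesystem.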
      
      Setting a security context value (empty or otherwise) unknown to
      SELinux in the first place is only possible for a root process
      (CAP_MAC_ADMIN), and, if running SELinux in enforcing mode, only
      if the corresponding SELinux mac_admin permission is also granted
      to the domain by policy.  In Fedora policies, this is only allowed for
      specific domains such as livecd for setting down security contexts
      that are not defined in the build host policy.
      
      Reproducer:
      su
      setenforce 0
      touch foo
      setfattr -n security.selinux foo
      
      Caveat:
      Relabeling or removing foo after doing the above may not be possible
      without booting with SELinux disabled.  Any subsequent access to foo
      after doing the above will also trigger the BUG.
      
      BUG output from Matthew Thode:
      [  473.893141] ------------[ cut here ]------------
      [  473.962110] kernel BUG at security/selinux/ss/services.c:654!
      [  473.995314] invalid opcode: 0000 [#6] SMP
      [  474.027196] Modules linked in:
      [  474.058118] CPU: 0 PID: 8138 Comm: ls Tainted: G      D   I
      3.13.0-grsec #1
      [  474.116637] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0
      07/29/10
      [  474.149768] task: ffff8805f50cd010 ti: ffff8805f50cd488 task.ti:
      ffff8805f50cd488
      [  474.183707] RIP: 0010:[<ffffffff814681c7>]  [<ffffffff814681c7>]
      context_struct_compute_av+0xce/0x308
      [  474.219954] RSP: 0018:ffff8805c0ac3c38  EFLAGS: 00010246
      [  474.252253] RAX: 0000000000000000 RBX: ffff8805c0ac3d94 RCX:
      0000000000000100
      [  474.287018] RDX: ffff8805e8aac000 RSI: 00000000ffffffff RDI:
      ffff8805e8aaa000
      [  474.321199] RBP: ffff8805c0ac3cb8 R08: 0000000000000010 R09:
      0000000000000006
      [  474.357446] R10: 0000000000000000 R11: ffff8805c567a000 R12:
      0000000000000006
      [  474.419191] R13: ffff8805c2b74e88 R14: 00000000000001da R15:
      0000000000000000
      [  474.453816] FS:  00007f2e75220800(0000) GS:ffff88061fc00000(0000)
      knlGS:0000000000000000
      [  474.489254] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  474.522215] CR2: 00007f2e74716090 CR3: 00000005c085e000 CR4:
      00000000000207f0
      [  474.556058] Stack:
      [  474.584325]  ffff8805c0ac3c98 ffffffff811b549b ffff8805c0ac3c98
      ffff8805f1190a40
      [  474.618913]  ffff8805a6202f08 ffff8805c2b74e88 00068800d0464990
      ffff8805e8aac860
      [  474.653955]  ffff8805c0ac3cb8 000700068113833a ffff880606c75060
      ffff8805c0ac3d94
      [  474.690461] Call Trace:
      [  474.723779]  [<ffffffff811b549b>] ? lookup_fast+0x1cd/0x22a
      [  474.778049]  [<ffffffff81468824>] security_compute_av+0xf4/0x20b
      [  474.811398]  [<ffffffff8196f419>] avc_compute_av+0x2a/0x179
      [  474.843813]  [<ffffffff8145727b>] avc_has_perm+0x45/0xf4
      [  474.875694]  [<ffffffff81457d0e>] inode_has_perm+0x2a/0x31
      [  474.907370]  [<ffffffff81457e76>] selinux_inode_getattr+0x3c/0x3e
      [  474.938726]  [<ffffffff81455cf6>] security_inode_getattr+0x1b/0x22
      [  474.970036]  [<ffffffff811b057d>] vfs_getattr+0x19/0x2d
      [  475.000618]  [<ffffffff811b05e5>] vfs_fstatat+0x54/0x91
      [  475.030402]  [<ffffffff811b063b>] vfs_lstat+0x19/0x1b
      [  475.061097]  [<ffffffff811b077e>] SyS_newlstat+0x15/0x30
      [  475.094595]  [<ffffffff8113c5c1>] ? __audit_syscall_entry+0xa1/0xc3
      [  475.148405]  [<ffffffff8197791e>] system_call_fastpath+0x16/0x1b
      [  475.179201] Code: 00 48 85 c0 48 89 45 b8 75 02 0f 0b 48 8b 45 a0 48
      8b 3d 45 d0 b6 00 8b 40 08 89 c6 ff ce e8 d1 b0 06 00 48 85 c0 49 89 c7
      75 02 <0f> 0b 48 8b 45 b8 4c 8b 28 eb 1e 49 8d 7d 08 be 80 01 00 00 e8
      [  475.255884] RIP  [<ffffffff814681c7>]
      context_struct_compute_av+0xce/0x308
      [  475.296120]  RSP <ffff8805c0ac3c38>
      [  475.328734] ---[ end trace f076482e9d754adc ]---
      Reported-by: Matthew Thode <mthode@mthode.org>
      Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: Paul Moore <pmoore@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a0f916d4
  3. 13 Feb, 2014 1 commit