1. 19 May, 2013 40 commits
    • Jani Nikula's avatar
      drm/i915: clear the stolen fb before resuming · f28cecc1
      Jani Nikula authored
      commit 1ffc5289 upstream.
      
      Similar to
      commit 88afe715
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Sun Dec 16 12:15:41 2012 +0000
      
          drm/i915: Clear the stolen fb before enabling
      
      but on the resume path.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=57191Reported-and-tested-by: default avatarNikolay Amiantov <nikoamia@gmail.com>
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      Reviewed-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f28cecc1
    • Daniel Vetter's avatar
      drm: don't check modeset locks in panic handler · 60724ed5
      Daniel Vetter authored
      commit a9b054e8 upstream.
      
      Since we know that locking is broken in that case and it's more
      important to not flood the dmesg with random gunk.
      Reported-and-tested-by: default avatarBorislav Petkov <bp@suse.de>
      References: http://lkml.kernel.org/r/20130502000206.GH15623@pd.tnic
      Cc: Dave Airlie <airlied@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      60724ed5
    • Daniel Vetter's avatar
      drm/mm: fix dump table BUG · dfdaa3fc
      Daniel Vetter authored
      commit 3a359f0b upstream.
      
      In
      
      commit 9e8944ab
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Thu Nov 15 11:32:17 2012 +0000
      
          drm: Introduce an iterator over holes in the drm_mm range manager
      
      helpers and iterators for hole handling have been introduced with some
      debug BUG_ONs sprinkled over. Unfortunately this broke the mm dumper
      which unconditionally tried to compute the size of the very first
      hole.
      
      While at it unify the code a bit with the hole dumping in the loop.
      
      v2: Extract a hole dump helper.
      Reported-by: default avatarChristopher Harvey <charvey@matrox.com>
      Cc: Christopher Harvey <charvey@matrox.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Acked-by: default avatarDave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dfdaa3fc
    • Christopher Harvey's avatar
      drm/mgag200: Fix framebuffer base address programming · 70386b7f
      Christopher Harvey authored
      commit 9f1d0366 upstream.
      
      Higher bits of the base address of framebuffers weren't being
      programmed properly. This caused framebuffers that didn't happen to be
      allocated at a low enough address to not be displayed properly.
      Signed-off-by: default avatarChristopher Harvey <charvey@matrox.com>
      Signed-off-by: default avatarMathieu Larouche <mathieu.larouche@matrox.com>
      Acked-by: default avatarJulia Lemire <jlemire@matrox.com>
      Tested-by: default avatarJulia Lemire <jlemire@matrox.com>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      70386b7f
    • Christopher Harvey's avatar
      drm/mgag200: Fix writes into MGA1064_PIX_CLK_CTL register · 6640a6a9
      Christopher Harvey authored
      commit fb70a669 upstream.
      
      The original line,
        WREG_DAC(MGA1064_PIX_CLK_CTL_CLK_DIS, tmp);
      wrote tmp into MGA1064_PIX_CLK_CTL_CLK_DIS, where
      MGA1064_PIX_CLK_CTL_CLK_DIS is an offset into
      MGA1064_PIX_CLK_CTL. Change the line to write properly into
      MGA1064_PIX_CLK_CTL. There were other chunks of code nearby that use
      the same pattern (but work correctly), so this patch updates them all
      to use this new (slightly more efficient) write pattern. The WREG_DAC
      macro was causing the DAC_INDEX register to be set to the same value
      twice. WREG8(DAC_DATA, foo) takes advantage of the fact that DAC_INDEX
      is already at the value we want.
      Signed-off-by: default avatarChristopher Harvey <charvey@matrox.com>
      Acked-by: default avatarJulia Lemire <jlemire@matrox.com>
      Tested-by: default avatarJulia Lemire <jlemire@matrox.com>
      Acked-by: default avatarMathieu Larouche <mathieu.larouche@matrox.com>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6640a6a9
    • Stanislaw Gruszka's avatar
      iwl4965: workaround connection regression on passive channel · 4aea296f
      Stanislaw Gruszka authored
      commit dd9c4640 upstream.
      
      Jake reported that since commit 1672c0e3
      "mac80211: start auth/assoc timeout on frame status", he is unable to
      connect to his AP, which is configured to use passive channel.
      
      After switch to passive channel 4965 firmware drops any TX packet until
      it receives beacon. Before commit 1672c0e3 we waited on channel and
      retransmit packet after 200ms, that makes we receive beacon on the
      meantime and association process succeed. New mac80211 behaviour cause
      that any ASSOC frame fail immediately on iwl4965 and we can not
      associate.
      
      This patch restore old mac80211 behaviour for iwl4965, by removing
      IEEE80211_HW_REPORTS_TX_ACK_STATUS feature. This feature will be
      added again to iwl4965 driver, when different, more complex
      workaround for this firmware issue, will be added to the driver.
      Bisected-by: default avatarJake Edge <jake@lwn.net>
      Reported-and-tested-by: default avatarJake Edge <jake@lwn.net>
      Signed-off-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4aea296f
    • Thommy Jakobsson's avatar
      B43: Handle DMA RX descriptor underrun · 548ebaff
      Thommy Jakobsson authored
      commit 73b82bf0 upstream.
      
      Add handling of rx descriptor underflow. This fixes a fault that could
      happen on slow machines, where data is received faster than the CPU can
      handle. In such a case the device will use up all rx descriptors and
      refuse to send any more data before confirming that it is ok. This
      patch enables necessary interrupt to discover such a situation and will
      handle them by dropping everything in the ring buffer.
      Reviewed-by: default avatarMichael Buesch <m@bues.ch>
      Signed-off-by: default avatarThommy Jakobsson <thommyj@gmail.com>
      Signed-off-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      548ebaff
    • Chris Metcalf's avatar
      tile: support new Tilera hypervisor · ab08ba32
      Chris Metcalf authored
      commit c539914d upstream.
      
      The Tilera hypervisor shipped in releases up through MDE 4.1 launches
      the client operating system (i.e. Linux) at privilege level 1 (PL1).
      Starting with MDE 4.2, as part of the work to enable KVM, the
      Tilera hypervisor launches Linux at PL2 instead.
      
      This commit makes the KERNEL_PL option default to 2 for tilegx, while
      still saying at 1 for tilepro, which doesn't have an updated hypervisor.
      It also explains how and when you might want to choose another value.
      In addition, we change a small buglet in the on-chip Ethernet driver,
      where we were failing to use the KERNEL_PL constant in an API call.
      
      To make the transition cleaner, this change also provides the updated
      hv_init() API for the new hypervisor that supports announcing Linux's
      compiled-in PL, so the hypervisor can generate a suitable error in the
      case of a mismatched hypervisor and Linux binary.
      Signed-off-by: default avatarChris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ab08ba32
    • Daniel Drake's avatar
      mwifiex: fix setting of multicast filter · f4d497e4
      Daniel Drake authored
      commit ccd384b1 upstream.
      
      A small bug in this code was causing the ALLMULTI filter to be set
      when in fact we were just wanting to program a selective multicast list
      to the hardware.
      
      Fix that bug and remove a redundant if condition in the code that
      follows.
      
      This fixes wakeup behaviour when multicast WOL is enabled. Previously,
      all multicast packets would wake up the system. Now, only those that the
      host intended to receive trigger wakeups.
      Signed-off-by: default avatarDaniel Drake <dsd@laptop.org>
      Acked-by: default avatarBing Zhao <bzhao@marvell.com>
      Signed-off-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4d497e4
    • Amitkumar Karwar's avatar
      mwifiex: fix memory leak issue when driver unload · 09ec4566
      Amitkumar Karwar authored
      commit f16fdc9d upstream.
      
      After unregister_netdevice() call the request is queued and
      reg_state is changed to NETREG_UNREGISTERING.
      As we check for NETREG_UNREGISTERED state, free_netdev() never
      gets executed causing memory leak.
      
      Initialize "dev->destructor" to free_netdev() to free device
      data after unregistration.
      Reported-by: default avatarDaniel Drake <dsd@laptop.org>
      Tested-by: default avatarDaniel Drake <dsd@laptop.org>
      Signed-off-by: default avatarAmitkumar Karwar <akarwar@marvell.com>
      Signed-off-by: default avatarBing Zhao <bzhao@marvell.com>
      Signed-off-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      09ec4566
    • Bing Zhao's avatar
      mwifiex: clear is_suspended flag when interrupt is received early · 8dd844a8
      Bing Zhao authored
      commit 48795424 upstream.
      
      When the XO-4 with 8787 wireless is woken up due to wake-on-WLAN
      mwifiex is often flooded with "not allowed while suspended" messages
      and the interface is unusable.
      
      [  202.171609] int: sdio_ireg = 0x1
      [  202.180700] info: mwifiex_process_hs_config: auto cancelling host
                     sleep since there is interrupt from the firmware
      [  202.201880] event: wakeup device...
      [  202.211452] event: hs_deactivated
      [  202.514638] info: --- Rx: Data packet ---
      [  202.514753] data: 4294957544 BSS(0-0): Data <= kernel
      [  202.514825] PREP_CMD: device in suspended state
      [  202.514839] data: dequeuing the packet ec7248c0 ec4869c0
      [  202.514886] mwifiex_write_data_sync: not allowed while suspended
      [  202.514886] host_to_card, write iomem (1) failed: -1
      [  202.514917] mwifiex_write_data_sync: not allowed while suspended
      [  202.514936] host_to_card, write iomem (2) failed: -1
      [  202.514949] mwifiex_write_data_sync: not allowed while suspended
      [  202.514965] host_to_card, write iomem (3) failed: -1
      [  202.514976] mwifiex_write_data_async failed: 0xFFFFFFFF
      
      This can be readily reproduced when putting the XO-4 in a loop where
      it goes to sleep due to inactivity, but then wakes up due to an
      incoming ping. The error is hit within an hour or two.
      
      This issue happens when an interrupt comes in early while host sleep
      is still activated. Driver handles this case by auto cancelling host
      sleep. However is_suspended flag is still set which prevents any cmd
      or data from being sent to firmware. Fix it by clearing is_suspended
      flag in this path.
      Reported-by: default avatarDaniel Drake <dsd@laptop.org>
      Tested-by: default avatarDaniel Drake <dsd@laptop.org>
      Signed-off-by: default avatarBing Zhao <bzhao@marvell.com>
      Signed-off-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8dd844a8
    • Felix Fietkau's avatar
      ath9k: fix key allocation error handling for powersave keys · 003e033d
      Felix Fietkau authored
      commit 4ef69d03 upstream.
      
      If no keycache slots are available, ath_key_config can return -ENOSPC.
      If the key index is not checked for errors, it can lead to logspam that
      looks like this: "ath: wiphy0: keyreset: keycache entry 228 out of range"
      This can cause follow-up errors if the invalid keycache index gets
      used for tx.
      Signed-off-by: default avatarFelix Fietkau <nbd@openwrt.org>
      Signed-off-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      003e033d
    • Anton Blanchard's avatar
      powerpc/kexec: Fix kexec when using VMX optimised memcpy · 240814c5
      Anton Blanchard authored
      commit 79c66ce8 upstream.
      
      commit b3f271e8 (powerpc: POWER7 optimised memcpy using VMX and
      enhanced prefetch) uses VMX when it is safe to do so (ie not in
      interrupt). It also looks at the task struct to decide if we have to
      save the current tasks' VMX state.
      
      kexec calls memcpy() at a point where the task struct may have been
      overwritten by the new kexec segments. If it has been overwritten
      then when memcpy -> enable_altivec looks up current->thread.regs->msr
      we get a cryptic oops or lockup.
      
      I also notice we aren't initialising thread_info->cpu, which means
      smp_processor_id is broken. Fix that too.
      Signed-off-by: default avatarAnton Blanchard <anton@samba.org>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      240814c5
    • Robert Jennings's avatar
      powerpc: Bring all threads online prior to migration/hibernation · ce253003
      Robert Jennings authored
      commit 120496ac upstream.
      
      This patch brings online all threads which are present but not online
      prior to migration/hibernation.  After migration/hibernation those
      threads are taken back offline.
      
      During migration/hibernation all online CPUs must call H_JOIN, this is
      required by the hypervisor.  Without this patch, threads that are offline
      (H_CEDE'd) will not be woken to make the H_JOIN call and the OS will be
      deadlocked (all threads either JOIN'd or CEDE'd).
      Signed-off-by: default avatarRobert Jennings <rcj@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ce253003
    • Jaccon Bastiaansen's avatar
      ARM: 7720/1: ARM v6/v7 cmpxchg64 shouldn't clear upper 32 bits of the old/new value · e374a2ff
      Jaccon Bastiaansen authored
      commit 6eabb330 upstream.
      
      The implementation of cmpxchg64() for the ARM v6 and v7 architecture
      casts parameter 2 and 3 (the old and new 64bit values) to an unsigned
      long before calling the atomic_cmpxchg64() function. This clears
      the top 32 bits of the old and new values, resulting in the wrong
      values being compare-exchanged. Luckily, this only appears to be used
      for 64-bit sched_clock, which we don't (yet) have on ARM.
      
      This bug was introduced by commit 3e0f5a15 ("ARM: 7404/1: cmpxchg64:
      use atomic64 and local64 routines for cmpxchg64").
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarJaccon Bastiaansen <jaccon.bastiaansen@gmail.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e374a2ff
    • Konrad Rzeszutek Wilk's avatar
      x86/microcode: Add local mutex to fix physical CPU hot-add deadlock · 903bded0
      Konrad Rzeszutek Wilk authored
      commit 074d72ff upstream.
      
      This can easily be triggered if a new CPU is added (via
      ACPI hotplug mechanism) and from user-space you do:
      
         echo 1 > /sys/devices/system/cpu/cpu3/online
      
      (or wait for UDEV to do it) on a newly appeared physical CPU.
      
      The deadlock is that the "store_online" in drivers/base/cpu.c
      takes the cpu_hotplug_driver_lock() lock, then calls "cpu_up".
      "cpu_up" eventually ends up calling "save_mc_for_early"
      which also takes the cpu_hotplug_driver_lock() lock.
      
      And here is that lockdep thinks of it:
      
       smpboot: Stack at about ffff880075c39f44
       smpboot: CPU3: has booted.
       microcode: CPU3 sig=0x206a7, pf=0x2, revision=0x25
      
       =============================================
       [ INFO: possible recursive locking detected ]
       3.9.0upstream-10129-g167af0e #1 Not tainted
       ---------------------------------------------
       sh/2487 is trying to acquire lock:
        (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20
      
       but task is already holding lock:
        (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock(x86_cpu_hotplug_driver_mutex);
         lock(x86_cpu_hotplug_driver_mutex);
      
        *** DEADLOCK ***
      
        May be due to missing lock nesting notation
      
       6 locks held by sh/2487:
        #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff811ca48d>] vfs_write+0x17d/0x190
        #1:  (&buffer->mutex){+.+.+.}, at: [<ffffffff812464ef>] sysfs_write_file+0x3f/0x160
        #2:  (s_active#20){.+.+.+}, at: [<ffffffff81246578>] sysfs_write_file+0xc8/0x160
        #3:  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20
        #4:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff810961c2>] cpu_maps_update_begin+0x12/0x20
        #5:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810962a7>] cpu_hotplug_begin+0x27/0x60
      Suggested-and-Acked-by: default avatarBorislav Petkov <bp@alien8.de>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: fenghua.yu@intel.com
      Cc: xen-devel@lists.xensource.com
      Link: http://lkml.kernel.org/r/1368029583-23337-1-git-send-email-konrad.wilk@oracle.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      903bded0
    • Lachlan McIlroy's avatar
      ext4: limit group search loop for non-extent files · ee7122ad
      Lachlan McIlroy authored
      commit e6155736 upstream.
      
      In the case where we are allocating for a non-extent file,
      we must limit the groups we allocate from to those below
      2^32 blocks, and ext4_mb_regular_allocator() attempts to
      do this initially by putting a cap on ngroups for the
      subsequent search loop.
      
      However, the initial target group comes in from the
      allocation context (ac), and it may already be beyond
      the artificially limited ngroups.  In this case,
      the limit
      
      	if (group == ngroups)
      		group = 0;
      
      at the top of the loop is never true, and the loop will
      run away.
      
      Catch this case inside the loop and reset the search to
      start at group 0.
      
      [sandeen@redhat.com: add commit msg & comments]
      Signed-off-by: default avatarLachlan McIlroy <lmcilroy@redhat.com>
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ee7122ad
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Fix leaks of filter preds · 23b929f9
      Steven Rostedt (Red Hat) authored
      commit 60705c89 upstream.
      
      Special preds are created when folding a series of preds that
      can be done in serial. These are allocated in an ops field of
      the pred structure. But they were never freed, causing memory
      leaks.
      
      This was discovered using the kmemleak checker:
      
      unreferenced object 0xffff8800797fd5e0 (size 32):
        comm "swapper/0", pid 1, jiffies 4294690605 (age 104.608s)
        hex dump (first 32 bytes):
          00 00 01 00 03 00 05 00 07 00 09 00 0b 00 0d 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff814b52af>] kmemleak_alloc+0x73/0x98
          [<ffffffff8111ff84>] kmemleak_alloc_recursive.constprop.42+0x16/0x18
          [<ffffffff81120e68>] __kmalloc+0xd7/0x125
          [<ffffffff810d47eb>] kcalloc.constprop.24+0x2d/0x2f
          [<ffffffff810d4896>] fold_pred_tree_cb+0xa9/0xf4
          [<ffffffff810d3781>] walk_pred_tree+0x47/0xcc
          [<ffffffff810d5030>] replace_preds.isra.20+0x6f8/0x72f
          [<ffffffff810d50b5>] create_filter+0x4e/0x8b
          [<ffffffff81b1c30d>] ftrace_test_event_filter+0x5a/0x155
          [<ffffffff8100028d>] do_one_initcall+0xa0/0x137
          [<ffffffff81afbedf>] kernel_init_freeable+0x14d/0x1dc
          [<ffffffff814b24b7>] kernel_init+0xe/0xdb
          [<ffffffff814d539c>] ret_from_fork+0x7c/0xb0
          [<ffffffffffffffff>] 0xffffffffffffffff
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      23b929f9
    • Thomas Gleixner's avatar
      tick: Cleanup NOHZ per cpu data on cpu down · c25c0eb5
      Thomas Gleixner authored
      commit 4b0c0f29 upstream.
      
      Prarit reported a crash on CPU offline/online. The reason is that on
      CPU down the NOHZ related per cpu data of the dead cpu is not cleaned
      up. If at cpu online an interrupt happens before the per cpu tick
      device is registered the irq_enter() check potentially sees stale data
      and dereferences a NULL pointer.
      
      Cleanup the data after the cpu is dead.
      Reported-by: default avatarPrarit Bhargava <prarit@redhat.com>
      Cc: Mike Galbraith <bitbucket@online.de>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305031451561.2886@ionosSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c25c0eb5
    • Tirupathi Reddy's avatar
      timer: Don't reinitialize the cpu base lock during CPU_UP_PREPARE · 3715b5fa
      Tirupathi Reddy authored
      commit 42a5cf46 upstream.
      
      An inactive timer's base can refer to a offline cpu's base.
      
      In the current code, cpu_base's lock is blindly reinitialized each
      time a CPU is brought up. If a CPU is brought online during the period
      that another thread is trying to modify an inactive timer on that CPU
      with holding its timer base lock, then the lock will be reinitialized
      under its feet. This leads to following SPIN_BUG().
      
      <0> BUG: spinlock already unlocked on CPU#3, kworker/u:3/1466
      <0> lock: 0xe3ebe000, .magic: dead4ead, .owner: kworker/u:3/1466, .owner_cpu: 1
      <4> [<c0013dc4>] (unwind_backtrace+0x0/0x11c) from [<c026e794>] (do_raw_spin_unlock+0x40/0xcc)
      <4> [<c026e794>] (do_raw_spin_unlock+0x40/0xcc) from [<c076c160>] (_raw_spin_unlock+0x8/0x30)
      <4> [<c076c160>] (_raw_spin_unlock+0x8/0x30) from [<c009b858>] (mod_timer+0x294/0x310)
      <4> [<c009b858>] (mod_timer+0x294/0x310) from [<c00a5e04>] (queue_delayed_work_on+0x104/0x120)
      <4> [<c00a5e04>] (queue_delayed_work_on+0x104/0x120) from [<c04eae00>] (sdhci_msm_bus_voting+0x88/0x9c)
      <4> [<c04eae00>] (sdhci_msm_bus_voting+0x88/0x9c) from [<c04d8780>] (sdhci_disable+0x40/0x48)
      <4> [<c04d8780>] (sdhci_disable+0x40/0x48) from [<c04bf300>] (mmc_release_host+0x4c/0xb0)
      <4> [<c04bf300>] (mmc_release_host+0x4c/0xb0) from [<c04c7aac>] (mmc_sd_detect+0x90/0xfc)
      <4> [<c04c7aac>] (mmc_sd_detect+0x90/0xfc) from [<c04c2504>] (mmc_rescan+0x7c/0x2c4)
      <4> [<c04c2504>] (mmc_rescan+0x7c/0x2c4) from [<c00a6a7c>] (process_one_work+0x27c/0x484)
      <4> [<c00a6a7c>] (process_one_work+0x27c/0x484) from [<c00a6e94>] (worker_thread+0x210/0x3b0)
      <4> [<c00a6e94>] (worker_thread+0x210/0x3b0) from [<c00aad9c>] (kthread+0x80/0x8c)
      <4> [<c00aad9c>] (kthread+0x80/0x8c) from [<c000ea80>] (kernel_thread_exit+0x0/0x8)
      
      As an example, this particular crash occurred when CPU #3 is executing
      mod_timer() on an inactive timer whose base is refered to offlined CPU
      #2.  The code locked the timer_base corresponding to CPU #2. Before it
      could proceed, CPU #2 came online and reinitialized the spinlock
      corresponding to its base. Thus now CPU #3 held a lock which was
      reinitialized. When CPU #3 finally ended up unlocking the old cpu_base
      corresponding to CPU #2, we hit the above SPIN_BUG().
      
      CPU #0		CPU #3				       CPU #2
      ------		-------				       -------
      .....		 ......				      <Offline>
      		mod_timer()
      		 lock_timer_base
      		   spin_lock_irqsave(&base->lock)
      
      cpu_up(2)	 .....				        ......
      							init_timers_cpu()
      ....		 .....				    	spin_lock_init(&base->lock)
      .....		   spin_unlock_irqrestore(&base->lock)  ......
      		   <spin_bug>
      
      Allocation of per_cpu timer vector bases is done only once under
      "tvec_base_done[]" check. In the current code, spinlock_initialization
      of base->lock isn't under this check. When a CPU is up each time the
      base lock is reinitialized. Move base spinlock initialization under
      the check.
      Signed-off-by: default avatarTirupathi Reddy <tirupath@codeaurora.org>
      Link: http://lkml.kernel.org/r/1368520142-4136-1-git-send-email-tirupath@codeaurora.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3715b5fa
    • John Stultz's avatar
      time: Revert ALWAYS_USE_PERSISTENT_CLOCK compile time optimizaitons · d96ac6f2
      John Stultz authored
      commit b4f711ee upstream.
      
      Kay Sievers noted that the ALWAYS_USE_PERSISTENT_CLOCK config,
      which enables some minor compile time optimization to avoid
      uncessary code in mostly the suspend/resume path could cause
      problems for userland.
      
      In particular, the dependency for RTC_HCTOSYS on
      !ALWAYS_USE_PERSISTENT_CLOCK, which avoids setting the time
      twice and simplifies suspend/resume, has the side effect
      of causing the /sys/class/rtc/rtcN/hctosys flag to always be
      zero, and this flag is commonly used by udev to setup the
      /dev/rtc symlink to /dev/rtcN, which can cause pain for
      older applications.
      
      While the udev rules could use some work to be less fragile,
      breaking userland should strongly be avoided. Additionally
      the compile time optimizations are fairly minor, and the code
      being optimized is likely to be reworked in the future, so
      lets revert this change.
      Reported-by: default avatarKay Sievers <kay@vrfy.org>
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Link: http://lkml.kernel.org/r/1366828376-18124-1-git-send-email-john.stultz@linaro.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d96ac6f2
    • Jeff Layton's avatar
      audit: vfs: fix audit_inode call in O_CREAT case of do_last · 93d927e2
      Jeff Layton authored
      commit 33e2208a upstream.
      
      Jiri reported a regression in auditing of open(..., O_CREAT) syscalls.
      In older kernels, creating a file with open(..., O_CREAT) created
      audit_name records that looked like this:
      
      type=PATH msg=audit(1360255720.628:64): item=1 name="/abc/foo" inode=138810 dev=fd:00 mode=0100640 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0
      type=PATH msg=audit(1360255720.628:64): item=0 name="/abc/" inode=138635 dev=fd:00 mode=040750 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0
      
      ...in recent kernels though, they look like this:
      
      type=PATH msg=audit(1360255402.886:12574): item=2 name=(null) inode=264599 dev=fd:00 mode=0100640 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0
      type=PATH msg=audit(1360255402.886:12574): item=1 name=(null) inode=264598 dev=fd:00 mode=040750 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0
      type=PATH msg=audit(1360255402.886:12574): item=0 name="/abc/foo" inode=264598 dev=fd:00 mode=040750 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0
      
      Richard bisected to determine that the problems started with commit
      bfcec708, but the log messages have changed with some later
      audit-related patches.
      
      The problem is that this audit_inode call is passing in the parent of
      the dentry being opened, but audit_inode is being called with the parent
      flag false. This causes later audit_inode and audit_inode_child calls to
      match the wrong entry in the audit_names list.
      
      This patch simply sets the flag to properly indicate that this inode
      represents the parent. With this, the audit_names entries are back to
      looking like they did before.
      Reported-by: default avatarJiri Jaburek <jjaburek@redhat.com>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Test By: Richard Guy Briggs <rbriggs@redhat.com>
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      93d927e2
    • Anton Blanchard's avatar
      audit: Syscall rules are not applied to existing processes on non-x86 · 16f0b63b
      Anton Blanchard authored
      commit cdee3904 upstream.
      
      Commit b05d8447 (audit: inline audit_syscall_entry to reduce
      burden on archs) changed audit_syscall_entry to check for a dummy
      context before calling __audit_syscall_entry. Unfortunately the dummy
      context state is maintained in __audit_syscall_entry so once set it
      never gets cleared, even if the audit rules change.
      
      As a result, if there are no auditing rules when a process starts
      then it will never be subject to any rules added later. x86 doesn't
      see this because it has an assembly fast path that calls directly into
      __audit_syscall_entry.
      
      I noticed this issue when working on audit performance optimisations.
      I wrote a set of simple test cases available at:
      
      http://ozlabs.org/~anton/junkcode/audit_tests.tar.gz
      
      02_new_rule.py fails without the patch and passes with it. The
      test case clears all rules, starts a process, adds a rule then
      verifies the process produces a syscall audit record.
      Signed-off-by: default avatarAnton Blanchard <anton@samba.org>
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      16f0b63b
    • James Bottomley's avatar
      SCSI: sd: fix array cache flushing bug causing performance problems · ccb2c9da
      James Bottomley authored
      commit 39c60a09 upstream.
      
      Some arrays synchronize their full non volatile cache when the sd driver sends
      a SYNCHRONIZE CACHE command.  Unfortunately, they can have Terrabytes of this
      and we send a SYNCHRONIZE CACHE for every barrier if an array reports it has a
      writeback cache.  This leads to massive slowdowns on journalled filesystems.
      
      The fix is to allow userspace to turn off the writeback cache setting as a
      temporary measure (i.e. without doing the MODE SELECT to write it back to the
      device), so even though the device reported it has a writeback cache, the
      user, knowing that the cache is non volatile and all they care about is
      filesystem correctness, can turn that bit off in the kernel and avoid the
      performance ruinous (and safety irrelevant) SYNCHRONIZE CACHE commands.
      
      The way you do this is add a 'temporary' prefix when performing the usual
      cache setting operations, so
      
      echo temporary write through > /sys/class/scsi_disk/<disk>/cache_type
      Reported-by: default avatarRic Wheeler <rwheeler@redhat.com>
      Signed-off-by: default avatarJames Bottomley <JBottomley@Parallels.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ccb2c9da
    • Konrad Rzeszutek Wilk's avatar
      xen/vcpu/pvhvm: Fix vcpu hotplugging hanging. · db9f69dc
      Konrad Rzeszutek Wilk authored
      commit 7f1fc268 upstream.
      
      If a user did:
      
      	echo 0 > /sys/devices/system/cpu/cpu1/online
      	echo 1 > /sys/devices/system/cpu/cpu1/online
      
      we would (this a build with DEBUG enabled) get to:
      smpboot: ++++++++++++++++++++=_---CPU UP  1
      .. snip..
      smpboot: Stack at about ffff880074c0ff44
      smpboot: CPU1: has booted.
      
      and hang. The RCU mechanism would kick in an try to IPI the CPU1
      but the IPIs (and all other interrupts) would never arrive at the
      CPU1. At first glance at least. A bit digging in the hypervisor
      trace shows that (using xenanalyze):
      
      [vla] d4v1 vec 243 injecting
         0.043163027 --|x d4v1 intr_window vec 243 src 5(vector) intr f3
      ]  0.043163639 --|x d4v1 vmentry cycles 1468
      ]  0.043164913 --|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254
         0.043164913 --|x d4v1 inj_virq vec 243  real
        [vla] d4v1 vec 243 injecting
         0.043164913 --|x d4v1 intr_window vec 243 src 5(vector) intr f3
      ]  0.043165526 --|x d4v1 vmentry cycles 1472
      ]  0.043166800 --|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254
         0.043166800 --|x d4v1 inj_virq vec 243  real
        [vla] d4v1 vec 243 injecting
      
      there is a pending event (subsequent debugging shows it is the IPI
      from the VCPU0 when smpboot.c on VCPU1 has done
      "set_cpu_online(smp_processor_id(), true)") and the guest VCPU1 is
      interrupted with the callback IPI (0xf3 aka 243) which ends up calling
      __xen_evtchn_do_upcall.
      
      The __xen_evtchn_do_upcall seems to do *something* but not acknowledge
      the pending events. And the moment the guest does a 'cli' (that is the
      ffffffff81673254 in the log above) the hypervisor is invoked again to
      inject the IPI (0xf3) to tell the guest it has pending interrupts.
      This repeats itself forever.
      
      The culprit was the per_cpu(xen_vcpu, cpu) pointer. At the bootup
      we set each per_cpu(xen_vcpu, cpu) to point to the
      shared_info->vcpu_info[vcpu] but later on use the VCPUOP_register_vcpu_info
      to register per-CPU  structures (xen_vcpu_setup).
      This is used to allow events for more than 32 VCPUs and for performance
      optimizations reasons.
      
      When the user performs the VCPU hotplug we end up calling the
      the xen_vcpu_setup once more. We make the hypercall which returns
      -EINVAL as it does not allow multiple registration calls (and
      already has re-assigned where the events are being set). We pick
      the fallback case and set per_cpu(xen_vcpu, cpu) to point to the
      shared_info->vcpu_info[vcpu] (which is a good fallback during bootup).
      However the hypervisor is still setting events in the register
      per-cpu structure (per_cpu(xen_vcpu_info, cpu)).
      
      As such when the events are set by the hypervisor (such as timer one),
      and when we iterate in __xen_evtchn_do_upcall we end up reading stale
      events from the shared_info->vcpu_info[vcpu] instead of the
      per_cpu(xen_vcpu_info, cpu) structures. Hence we never acknowledge the
      events that the hypervisor has set and the hypervisor keeps on reminding
      us to ack the events which we never do.
      
      The fix is simple. Don't on the second time when xen_vcpu_setup is
      called over-write the per_cpu(xen_vcpu, cpu) if it points to
      per_cpu(xen_vcpu_info).
      Acked-by: default avatarStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      db9f69dc
    • Li Zefan's avatar
      shm: fix null pointer deref when userspace specifies invalid hugepage size · 159590f2
      Li Zefan authored
      commit 091d0d55 upstream.
      
      Dave reported an oops triggered by trinity:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
        IP: newseg+0x10d/0x390
        PGD cf8c1067 PUD cf8c2067 PMD 0
        Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
        CPU: 2 PID: 7636 Comm: trinity-child2 Not tainted 3.9.0+#67
        ...
        Call Trace:
          ipcget+0x182/0x380
          SyS_shmget+0x5a/0x60
          tracesys+0xdd/0xe2
      
      This bug was introduced by commit af73e4d9 ("hugetlbfs: fix mmap
      failure in unaligned size request").
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Signed-off-by: default avatarLi Zefan <lizfan@huawei.com>
      Reviewed-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      159590f2
    • Shuah Khan's avatar
      hp_accel: Ignore the error from lis3lv02d_poweron() at resume · 7b44587e
      Shuah Khan authored
      commit 77838199 upstream.
      
      The error in lis3lv02_poweron() is harmless in the resume path, so
      we should ignore it. It is inline with the other usages of lis3lv02_poweron()
      and matches the 3.0 code for this routine. This patch is in suse git and
      might have missed making it into the mainline.
      opensuse - commit id: 66ccdac87c322cf7af12bddba8c805af640b1cff
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarShuah Khan <shuah.khan@hp.com>
      Signed-off-by: default avatarMatthew Garrett <matthew.garrett@nebula.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7b44587e
    • Jeff Layton's avatar
      nfsd: fix oops when legacy_recdir_name_error is passed a -ENOENT error · 59d7914f
      Jeff Layton authored
      commit 7255e716 upstream.
      
      Toralf reported the following oops to the linux-nfs mailing list:
      
          -----------------[snip]------------------
          NFSD: unable to generate recoverydir name (-2).
          NFSD: disabling legacy clientid tracking. Reboot recovery will not function correctly!
          BUG: unable to handle kernel NULL pointer dereference at 000003c8
          IP: [<f90a3d91>] nfsd4_client_tracking_exit+0x11/0x50 [nfsd]
          *pdpt = 000000002ba33001 *pde = 0000000000000000
          Oops: 0000 [#1] SMP
          Modules linked in: loop nfsd auth_rpcgss ipt_MASQUERADE xt_owner xt_multiport ipt_REJECT xt_tcpudp xt_recent xt_conntrack nf_conntrack_ftp xt_limit xt_LOG iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables af_packet pppoe pppox ppp_generic slhc bridge stp llc tun arc4 iwldvm mac80211 coretemp kvm_intel uvcvideo sdhci_pci sdhci mmc_core videobuf2_vmalloc videobuf2_memops usblp videobuf2_core i915 iwlwifi psmouse videodev cfg80211 kvm fbcon bitblit cfbfillrect acpi_cpufreq mperf evdev softcursor font cfbimgblt i2c_algo_bit cfbcopyarea intel_agp intel_gtt drm_kms_helper snd_hda_codec_conexant drm agpgart fb fbdev tpm_tis thinkpad_acpi tpm nvram e1000e rfkill thermal ptp wmi pps_core tpm_bios 8250_pci processor 8250 ac snd_hda_intel snd_hda_codec snd_pcm battery video i2c_i801 snd_page_alloc snd_timer button serial_core i2c_core snd soundcore thermal_sys hwmon aesni_intel ablk_helper cryp
      td lrw aes_i586 xts gf128mul cbc fuse nfs lockd sunrpc dm_crypt dm_mod hid_monterey hid_microsoft hid_logitech hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech hid_generic usbhid hid sr_mod cdrom sg [last unloaded: microcode]
          Pid: 6374, comm: nfsd Not tainted 3.9.1 #6 LENOVO 4180F65/4180F65
          EIP: 0060:[<f90a3d91>] EFLAGS: 00010202 CPU: 0
          EIP is at nfsd4_client_tracking_exit+0x11/0x50 [nfsd]
          EAX: 00000000 EBX: fffffffe ECX: 00000007 EDX: 00000007
          ESI: eb9dcb00 EDI: eb2991c0 EBP: eb2bde38 ESP: eb2bde34
          DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
          CR0: 80050033 CR2: 000003c8 CR3: 2ba80000 CR4: 000407f0
          DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
          DR6: ffff0ff0 DR7: 00000400
          Process nfsd (pid: 6374, ti=eb2bc000 task=eb2711c0 task.ti=eb2bc000)
          Stack:
          fffffffe eb2bde4c f90a3e0c f90a7754 fffffffe eb0a9c00 eb2bdea0 f90a41ed
          eb2991c0 1b270000 eb2991c0 eb2bde7c f9099ce9 eb2bde98 0129a020 eb29a020
          eb2bdecc eb2991c0 eb2bdea8 f9099da5 00000000 eb9dcb00 00000001 67822f08
          Call Trace:
          [<f90a3e0c>] legacy_recdir_name_error+0x3c/0x40 [nfsd]
          [<f90a41ed>] nfsd4_create_clid_dir+0x15d/0x1c0 [nfsd]
          [<f9099ce9>] ? nfsd4_lookup_stateid+0x99/0xd0 [nfsd]
          [<f9099da5>] ? nfs4_preprocess_seqid_op+0x85/0x100 [nfsd]
          [<f90a4287>] nfsd4_client_record_create+0x37/0x50 [nfsd]
          [<f909d6ce>] nfsd4_open_confirm+0xfe/0x130 [nfsd]
          [<f90980b1>] ? nfsd4_encode_operation+0x61/0x90 [nfsd]
          [<f909d5d0>] ? nfsd4_free_stateid+0xc0/0xc0 [nfsd]
          [<f908fd0b>] nfsd4_proc_compound+0x41b/0x530 [nfsd]
          [<f9081b7b>] nfsd_dispatch+0x8b/0x1a0 [nfsd]
          [<f857b85d>] svc_process+0x3dd/0x640 [sunrpc]
          [<f908165d>] nfsd+0xad/0x110 [nfsd]
          [<f90815b0>] ? nfsd_destroy+0x70/0x70 [nfsd]
          [<c1054824>] kthread+0x94/0xa0
          [<c1486937>] ret_from_kernel_thread+0x1b/0x28
          [<c1054790>] ? flush_kthread_work+0xd0/0xd0
          Code: 86 b0 00 00 00 90 c5 0a f9 c7 04 24 70 76 0a f9 e8 74 a9 3d c8 eb ba 8d 76 00 55 89 e5 53 66 66 66 66 90 8b 15 68 c7 0a f9 85 d2 <8b> 88 c8 03 00 00 74 2c 3b 11 77 28 8b 5c 91 08 85 db 74 22 8b
          EIP: [<f90a3d91>] nfsd4_client_tracking_exit+0x11/0x50 [nfsd] SS:ESP 0068:eb2bde34
          CR2: 00000000000003c8
          ---[ end trace 09e54015d145c9c6 ]---
      
      The problem appears to be a regression that was introduced in commit
      9a9c6478 "nfsd: make NFSv4 recovery client tracking options per net".
      Prior to that commit, it was safe to pass a NULL net pointer to
      nfsd4_client_tracking_exit in the legacy recdir case, and
      legacy_recdir_name_error did so. After that comit, the net pointer must
      be valid.
      
      This patch just fixes legacy_recdir_name_error to pass in a valid net
      pointer to that function.
      Reported-and-tested-by: default avatarToralf Förster <toralf.foerster@gmx.de>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      59d7914f
    • J. Bruce Fields's avatar
      nfsd4: don't allow owner override on 4.1 CLAIM_FH opens · faad5f5c
      J. Bruce Fields authored
      commit 9f415eb2 upstream.
      
      The Linux client is using CLAIM_FH to implement regular opens, not just
      recovery cases, so it depends on the server to check permissions
      correctly.
      
      Therefore the owner override, which may make sense in the delegation
      recovery case, isn't right in the CLAIM_FH case.
      
      Symptoms: on a client with 49f9a0fa
      "NFSv4.1: Enable open-by-filehandle", Bryan noticed this:
      
      	touch test.txt
      	chmod 000 test.txt
      	echo test > test.txt
      
      succeeding.
      Reported-by: default avatarBryan Schumaker <bjschuma@netapp.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      faad5f5c
    • Stanislaw Gruszka's avatar
      sched: Avoid prev->stime underflow · 6bc7f6ef
      Stanislaw Gruszka authored
      commit 68aa8efc upstream.
      
      Dave Hansen reported strange utime/stime values on his system:
      https://lkml.org/lkml/2013/4/4/435
      
      This happens because prev->stime value is bigger than rtime
      value. Root of the problem are non-monotonic rtime values (i.e.
      current rtime is smaller than previous rtime) and that should be
      debugged and fixed.
      
      But since problem did not manifest itself before commit
      62188451 "cputime: Avoid
      multiplication overflow on utime scaling", it should be threated
      as regression, which we can easily fixed on cputime_adjust()
      function.
      
      For now, let's apply this fix, but further work is needed to fix
      root of the problem.
      Reported-and-tested-by: default avatarDave Hansen <dave@sr71.net>
      Signed-off-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: rostedt@goodmis.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1367314507-9728-3-git-send-email-sgruszka@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6bc7f6ef
    • Stanislaw Gruszka's avatar
      Revert "math64: New div64_u64_rem helper" · 859a8c0d
      Stanislaw Gruszka authored
      commit f3002134 upstream.
      
      This reverts commit f7926850.
      
      The cputime scaling code was changed/fixed and does not need the
      div64_u64_rem() primitive anymore. It has no other users, so let's
      remove them.
      Signed-off-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: rostedt@goodmis.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1367314507-9728-4-git-send-email-sgruszka@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      859a8c0d
    • Stanislaw Gruszka's avatar
      sched: Do not account bogus utime · f25d7d1c
      Stanislaw Gruszka authored
      commit 772c808a upstream.
      
      Due to rounding in scale_stime(), for big numbers, scaled stime
      values will grow in chunks. Since rtime grow in jiffies and we
      calculate utime like below:
      
      	prev->stime = max(prev->stime, stime);
      	prev->utime = max(prev->utime, rtime - prev->stime);
      
      we could erroneously account stime values as utime. To prevent
      that only update prev->{u,s}time values when they are smaller
      than current rtime.
      Signed-off-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: rostedt@goodmis.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1367314507-9728-2-git-send-email-sgruszka@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f25d7d1c
    • Stanislaw Gruszka's avatar
      sched: Avoid cputime scaling overflow · 434c4913
      Stanislaw Gruszka authored
      commit 55eaa7c1 upstream.
      
      Here is patch, which adds Linus's cputime scaling algorithm to the
      kernel.
      
      This is a follow up (well, fix) to commit
      d9a3c982 ("sched: Lower chances
      of cputime scaling overflow") which commit tried to avoid
      multiplication overflow, but did not guarantee that the overflow
      would not happen.
      
      Linus crated a different algorithm, which completely avoids the
      multiplication overflow by dropping precision when numbers are
      big.
      
      It was tested by me and it gives good relative error of
      scaled numbers. Testing method is described here:
      http://marc.info/?l=linux-kernel&m=136733059505406&w=2
      
      Originally-From: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: rostedt@goodmis.org
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20130430151441.GC10465@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      434c4913
    • Frederic Weisbecker's avatar
      sched: Lower chances of cputime scaling overflow · 96fc7a7d
      Frederic Weisbecker authored
      commit d9a3c982 upstream.
      
      Some users have reported that after running a process with
      hundreds of threads on intensive CPU-bound loads, the cputime
      of the group started to freeze after a few days.
      
      This is due to how we scale the tick-based cputime against
      the scheduler precise execution time value.
      
      We add the values of all threads in the group and we multiply
      that against the sum of the scheduler exec runtime of the whole
      group.
      
      This easily overflows after a few days/weeks of execution.
      
      A proposed solution to solve this was to compute that multiplication
      on stime instead of utime:
         62188451
         ("cputime: Avoid multiplication overflow on utime scaling")
      
      The rationale behind that was that it's easy for a thread to
      spend most of its time in userspace under intensive CPU-bound workload
      but it's much harder to do CPU-bound intensive long run in the kernel.
      
      This postulate got defeated when a user recently reported he was still
      seeing cputime freezes after the above patch. The workload that
      triggers this issue relates to intensive networking workloads where
      most of the cputime is consumed in the kernel.
      
      To reduce much more the opportunities for multiplication overflow,
      lets reduce the multiplication factors to the remainders of the division
      between sched exec runtime and cputime. Assuming the difference between
      these shouldn't ever be that large, it could work on many situations.
      
      This gets the same results as in the upstream scaling code except for
      a small difference: the upstream code always rounds the results to
      the nearest integer not greater to what would be the precise result.
      The new code rounds to the nearest integer either greater or not
      greater. In practice this difference probably shouldn't matter but
      it's worth mentioning.
      
      If this solution appears not to be enough in the end, we'll
      need to partly revert back to the behaviour prior to commit
           0cf55e1e
           ("sched, cputime: Introduce thread_group_times()")
      
      Back then, the scaling was done on exit() time before adding the cputime
      of an exiting thread to the signal struct. And then we'll need to
      scale one-by-one the live threads cputime in thread_group_cputime(). The
      drawback may be a slightly slower code on exit time.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      96fc7a7d
    • Frederic Weisbecker's avatar
      math64: New div64_u64_rem helper · c459e23a
      Frederic Weisbecker authored
      commit f7926850 upstream.
      
      Provide an extended version of div64_u64() that
      also returns the remainder of the division.
      
      We are going to need this to refine the cputime
      scaling code.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c459e23a
    • Wei Yongjun's avatar
      dm cache: fix error return code in cache_create · 3dc73aa4
      Wei Yongjun authored
      commit fa4d683a upstream.
      
      Return -ENOMEM if memory allocation fails in cache_create
      instead of 0 (to avoid NULL pointer dereference).
      Signed-off-by: default avatarWei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3dc73aa4
    • Wei Yongjun's avatar
      dm snapshot: fix error return code in snapshot_ctr · 62253ab0
      Wei Yongjun authored
      commit 09e8b813 upstream.
      
      Return -ENOMEM instead of success if unable to allocate pending
      exception mempool in snapshot_ctr.
      Signed-off-by: default avatarWei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      62253ab0
    • Mikulas Patocka's avatar
      dm bufio: avoid a possible __vmalloc deadlock · 8f9341a6
      Mikulas Patocka authored
      commit 502624bd upstream.
      
      This patch uses memalloc_noio_save to avoid a possible deadlock in
      dm-bufio.  (it could happen only with large block size, at most
      PAGE_SIZE << MAX_ORDER (typically 8MiB).
      
      __vmalloc doesn't fully respect gfp flags. The specified gfp flags are
      used for allocation of requested pages, structures vmap_area, vmap_block
      and vm_struct and the radix tree nodes.
      
      However, the kernel pagetables are allocated always with GFP_KERNEL.
      Thus the allocation of pagetables can recurse back to the I/O layer and
      cause a deadlock.
      
      This patch uses the function memalloc_noio_save to set per-process
      PF_MEMALLOC_NOIO flag and the function memalloc_noio_restore to restore
      it. When this flag is set, all allocations in the process are done with
      implied GFP_NOIO flag, thus the deadlock can't happen.
      
      This should be backported to stable kernels, but they don't have the
      PF_MEMALLOC_NOIO flag and memalloc_noio_save/memalloc_noio_restore
      functions. So, PF_MEMALLOC should be set and restored instead.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8f9341a6
    • Mike Snitzer's avatar
      dm stripe: fix regression in stripe_width calculation · ce397c5e
      Mike Snitzer authored
      commit d793e684 upstream.
      
      Fix a regression in the calculation of the stripe_width in the
      dm stripe target which led to incorrect processing of device limits.
      
      The stripe_width is the stripe device length divided by the number of
      stripes.  The group of commits in the range f14fa693 ("dm stripe: fix
      size test") to eb850de6 ("dm stripe: support for non power of 2
      chunksize") interfered with each other (a merging error) and led to the
      stripe_width being set incorrectly to the stripe device length divided by
      chunk_size * stripe_count.
      
      For example, a stripe device's table with: 0 33553920 striped 3 512 ...
      should result in a stripe_width of 11184640 (33553920 / 3), but due to
      the bug it was getting set to 21845 (33553920 / (512 * 3)).
      
      The impact of this bug is that device topologies that previously worked
      fine with the stripe target are no longer considered valid.  In
      particular, there is a higher risk of seeing this issue if one of the
      stripe devices has a 4K logical block size.  Resulting in an error
      message like this:
      "device-mapper: table: 253:4: len=21845 not aligned to h/w logical block size 4096 of dm-1"
      
      The fix is to swap the order of the divisions and to use a temporary
      variable for the second one, so that width retains the intended
      value.
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ce397c5e
    • Mike Snitzer's avatar
      dm table: fix write same support · e861593f
      Mike Snitzer authored
      commit dc019b21 upstream.
      
      If device_not_write_same_capable() returns true then the iterate_devices
      loop in dm_table_supports_write_same() should return false.
      Reported-by: default avatarBharata B Rao <bharata.rao@gmail.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e861593f