1. 17 May, 2015 38 commits
    • ARM: fix broken hibernation · 3e98276b
      Russell King authored
      [ Upstream commit 767bf7e7 ]
      
      Normally, when a CPU wants to clear a cache line to zero in the external
      L2 cache, it would generate bus cycles to write each word as it would do
      with any other data access.
      
      However, a Cortex A9 connected to a L2C-310 has a specific feature where
      the CPU can detect this operation, and signal that it wants to zero an
      entire cache line.  This feature, known as Full Line of Zeros (FLZ),
      involves a non-standard AXI signalling mechanism which only the L2C-310
      can properly interpret.
      
      There are separate enable bits in both the L2C-310 and the Cortex A9 -
      the L2C-310 needs to be enabled and have the FLZ enable bit set in the
      auxiliary control register before the Cortex A9 has this feature
      enabled.
      
      Unfortunately, the suspend code was not respecting this - it's not
      obvious from the code:
      
      swsusp_arch_suspend()
       cpu_suspend() /* saves the Cortex A9 auxiliary control register */
        arch_save_image()
        soft_restart() /* turns off FLZ in Cortex A9, and disables L2C */
         cpu_resume() /* restores the Cortex A9 registers, inc auxcr */
      
      At this point, we end up with the L2C disabled, but the Cortex A9 with
      FLZ enabled - which means any memset() or zeroing of a full cache line
      will fail to take effect.
      
      A similar issue exists in the resume path, but it's slightly more
      complex:
      
      swsusp_arch_suspend()
       cpu_suspend() /* saves the Cortex A9 auxiliary control register */
        arch_save_image() /* image with A9 auxcr saved */
      ...
      swsusp_arch_resume()
       call_with_stack()
        arch_restore_image() /* restores image with A9 auxcr saved above */
        soft_restart() /* turns off FLZ in Cortex A9, and disables L2C */
         cpu_resume() /* restores the Cortex A9 registers, inc auxcr */
      
      Again, here we end up with the L2C disabled, but Cortex A9 FLZ enabled.
      
      There's no need to turn off the L2C in either of these two paths; there
      are benefits from not doing so - for example, the page copies will be
      faster with the L2C enabled.
      
      Hence, fix this by providing a variant of soft_restart() which can be
      used without turning the L2 cache controller off, and use it in both
      of these paths to keep the L2C enabled across the respective resume
      transitions.
      
      Fixes: 8ef418c7 ("ARM: l2c: trial at enabling some Cortex-A9 optimisations")
      Reported-by: Sean Cross <xobs@kosagi.com>
      Tested-by: Sean Cross <xobs@kosagi.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • NFS: fix BUG() crash in notify_change() with patch to chown_common() · 36af79ba
      Andrew Elble authored
      [ Upstream commit c1b8940b ]
      
      We have observed a BUG() crash in fs/attr.c:notify_change(). The crash
      occurs during an rsync into a filesystem that is exported via NFS.
      
      1.) fs/attr.c:notify_change() modifies the caller's version of attr.
      2.) 6de0ec00 ("VFS: make notify_change pass ATTR_KILL_S*ID to
          setattr operations") introduced a BUG() restriction such that "no
          function will ever call notify_change() with both ATTR_MODE and
          ATTR_KILL_S*ID set". Under some circumstances though, it will have
          assisted in setting the caller's version of attr to this very
          combination.
      3.) 27ac0ffe ("locks: break delegations on any attribute
          modification") introduced code to handle breaking
          delegations. This can result in notify_change() being re-called. attr
          _must_ be explicitly reset to avoid triggering the BUG() established
          in #2.
      4.) The path that triggers this is via fs/open.c:chmod_common().
          The combination of attr flags set here and in the first call to
          notify_change() along with a later failed break_deleg_wait()
          results in notify_change() being called again via retry_deleg
          without resetting attr.
      
      Solution is to move retry_deleg in chmod_common() a bit further up to
      ensure attr is completely reset.
      
      There are other places where this seemingly could occur, such as
      fs/utimes.c:utimes_common(), but the attr flags are not initially
      set in such a way to trigger this.
      
      Fixes: 27ac0ffe ("locks: break delegations on any attribute modification")
      Reported-by: Eric Meddaugh <etmsys@rit.edu>
      Tested-by: Eric Meddaugh <etmsys@rit.edu>
      Signed-off-by: Andrew Elble <aweits@rit.edu>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • power_supply: ipaq_micro_battery: Check return values in probe · 32354801
      Krzysztof Kozlowski authored
      [ Upstream commit a2c1d531 ]
      
      The return values of create_singlethread_workqueue() and
      power_supply_register() calls were not checked and even on error probe()
      function returned 0.
      
      1. If allocation of workqueue failed (returning NULL) then further
         accesses could lead to NULL pointer dereference. The
         queue_delayed_work() expects workqueue to be non-NULL.
      
      2. If registration of power supply failed then during unbind the driver
         tried to unregister power supply which was not actually registered.
         This could lead to memory corruption because
         power_supply_unregister() unconditionally cleans up given power
         supply.
      Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Fixes: 00a588f9 ("power: add driver for battery reading on iPaq h3xxx")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Sebastian Reichel <sre@kernel.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • power_supply: ipaq_micro_battery: Fix leaking workqueue · d81418c1
      Krzysztof Kozlowski authored
      [ Upstream commit f852ec46 ]
      
      Driver allocates singlethread workqueue in probe but it is not destroyed
      during removal.
      Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Fixes: 00a588f9 ("power: add driver for battery reading on iPaq h3xxx")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Sebastian Reichel <sre@kernel.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
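The error handling described in the two ipaq_micro_battery entries can be illustrated with a small userspace sketch. It is not the driver code: `battery_sketch`, `create_wq_stub()`, `register_supply_stub()` and the `fail_*` knobs are hypothetical stand-ins, with `malloc()`/`free()` playing the role of workqueue creation and destruction. It only shows the general pattern of checking both return values in probe, unwinding on failure, and releasing the workqueue on removal.

```c
#include <errno.h>
#include <stdlib.h>

/* Fault-injection knobs for this sketch only; the real driver has nothing like them. */
static int fail_workqueue;
static int fail_register;

struct battery_sketch {
	void *wq;        /* stands in for the workqueue pointer */
	int registered;  /* 1 once the (pretend) power supply is registered */
};

/* Stand-in for create_singlethread_workqueue(): returns NULL on failure. */
static void *create_wq_stub(void)
{
	return fail_workqueue ? NULL : malloc(1);
}

/* Stand-in for power_supply_register(): returns a negative errno on failure. */
static int register_supply_stub(struct battery_sketch *b)
{
	if (fail_register)
		return -EINVAL;
	b->registered = 1;
	return 0;
}

/* Probe: check both return values and undo earlier steps on failure. */
static int probe_sketch(struct battery_sketch *b)
{
	int ret;

	b->registered = 0;
	b->wq = create_wq_stub();
	if (!b->wq)
		return -ENOMEM;

	ret = register_supply_stub(b);
	if (ret < 0) {
		free(b->wq);    /* do not leak the workqueue */
		b->wq = NULL;
		return ret;     /* fail the probe, so remove never runs */
	}
	return 0;
}

/* Remove: unregister only what was registered, then release the workqueue. */
static void remove_sketch(struct battery_sketch *b)
{
	if (b->registered)
		b->registered = 0;
	free(b->wq);
	b->wq = NULL;
}
```

Because probe fails whenever a step fails, the removal path can assume that both resources exist, which is what makes an unconditional unregister safe there.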
    • power_supply: lp8788-charger: Fix leaked power supply on probe fail · 58aad404
      Krzysztof Kozlowski authored
      [ Upstream commit a7117f81 ]
      
      Driver forgot to unregister charger power supply if registering of
      battery supply failed in probe(). In such case the memory associated
      with power supply leaked.
      Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Fixes: 98a27664 ("power_supply: Add new lp8788 charger driver")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Sebastian Reichel <sre@kernel.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • power_supply: twl4030_madc: Check return value of power_supply_register · 6bc373ff
      Krzysztof Kozlowski authored
      [ Upstream commit 68c3ed6f ]
      
      The return value of power_supply_register() call was not checked and
      even on error probe() function returned 0. If registering failed then
      during unbind the driver tried to unregister power supply which was not
      actually registered.
      
      This could lead to memory corruption because power_supply_unregister()
      unconditionally cleans up given power supply.
      Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Fixes: da0a00eb ("power: Add twl4030_madc battery driver.")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Sebastian Reichel <sre@kernel.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • ring-buffer: Replace this_cpu_*() with __this_cpu_*() · 6ec5fc3a
      Steven Rostedt authored
      [ Upstream commit 80a9b64e ]
      
      It has come to my attention that this_cpu_read/write are horrible on
      architectures other than x86. Worse yet, they actually disable
      preemption or interrupts! This caused some unexpected tracing results
      on ARM.
      
         101.356868: preempt_count_add <-ring_buffer_lock_reserve
         101.356870: preempt_count_sub <-ring_buffer_lock_reserve
      
      The ring_buffer_lock_reserve has recursion protection that requires
      accessing a per cpu variable. But since preempt_disable() is traced, it
      too got traced while accessing the variable that is supposed to prevent
      recursion like this.
      
      The generic version of this_cpu_read() and write() are:
      
       #define this_cpu_generic_read(pcp)					\
       ({	typeof(pcp) ret__;						\
      	preempt_disable();						\
      	ret__ = *this_cpu_ptr(&(pcp));					\
      	preempt_enable();						\
      	ret__;								\
       })
      
       #define this_cpu_generic_to_op(pcp, val, op)				\
       do {									\
      	unsigned long flags;						\
      	raw_local_irq_save(flags);					\
      	*__this_cpu_ptr(&(pcp)) op val;					\
      	raw_local_irq_restore(flags);					\
       } while (0)
      
      Which is unacceptable for locations that know they are within preempt
      disabled or interrupt disabled locations.
      
      Paul McKenney stated that __this_cpu_() versions produce much better code on
      other architectures than this_cpu_() does, if we know that the call is done in
      a preempt disabled location.
      
      I also changed the recursive_unlock() to use two local variables instead
      of accessing the per_cpu variable twice.
      
      Link: http://lkml.kernel.org/r/20150317114411.GE3589@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/20150317104038.312e73d1@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Acked-by: Christoph Lameter <cl@linux.com>
      Reported-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de>
      Tested-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • compal-laptop: Check return value of power_supply_register · ebc00a20
      Krzysztof Kozlowski authored
      [ Upstream commit 1915a718 ]
      
      The return value of power_supply_register() call was not checked and
      even on error probe() function returned 0. If registering failed then
      during unbind the driver tried to unregister power supply which was not
      actually registered.
      
      This could lead to memory corruption because power_supply_unregister()
      unconditionally cleans up given power supply.
      
      Fix this by checking return status of power_supply_register() call. In
      case of failure, clean up sysfs entries and fail the probe.
      Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Fixes: 9be0fcb5 ("compal-laptop: add JHL90, battery & hwmon interface")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Sebastian Reichel <sre@kernel.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • compal-laptop: Fix leaking hwmon device · 29037bcf
      Krzysztof Kozlowski authored
      [ Upstream commit ad774702 ]
      
      The commit c2be45f0 ("compal-laptop: Use
      devm_hwmon_device_register_with_groups") wanted to change the
      registering of hwmon device to resource-managed version. It mostly did
      it except the main thing - it forgot to use devm-like function so the
      hwmon device leaked after device removal or probe failure.
      Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Fixes: c2be45f0 ("compal-laptop: Use devm_hwmon_device_register_with_groups")
      Cc: <stable@vger.kernel.org>
      Acked-by: Guenter Roeck <linux@roeck-us.net>
      Acked-by: Darren Hart <dvhart@linux.intel.com>
      Signed-off-by: Sebastian Reichel <sre@kernel.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • spi: spidev: fix possible arithmetic overflow for multi-transfer message · 502e0246
      Ian Abbott authored
      [ Upstream commit f20fbaad ]
      
      `spidev_message()` sums the lengths of the individual SPI transfers to
      determine the overall SPI message length.  It restricts the total
      length, returning an error if too long, but it does not check for
      arithmetic overflow.  For example, if the SPI message consisted of two
      transfers and the first has a length of 10 and the second has a length
      of (__u32)(-1), the total length would be seen as 9, even though the
      second transfer is actually very long.  If the second transfer specifies
      a null `rx_buf` and a non-null `tx_buf`, the `copy_from_user()` could
      overrun the spidev's pre-allocated tx buffer before it reaches an
      invalid user memory address.  Fix it by checking that neither the total
      nor the individual transfer lengths exceed the maximum allowed value.
      
      Thanks to Dan Carpenter for reporting the potential integer overflow.
      Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
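The wraparound in the spidev entry can be reproduced in a few lines of userspace C. This is a sketch of the arithmetic only, not the driver's code: `BUFSIZ_LIMIT` is a hypothetical stand-in for the spidev buffer size, and both function names are made up for illustration. The checked variant tests each transfer length as well as the running total, which is the approach the commit message describes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit standing in for the size of spidev's pre-allocated buffers. */
#define BUFSIZ_LIMIT 4096u

/* Unchecked sum: silently wraps modulo 2^32. */
static uint32_t total_len_unchecked(const uint32_t *len, size_t n)
{
	uint32_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += len[i];
	return total;
}

/* Checked sum: returns 0 and stores the total, or -1 if any bound is exceeded. */
static int total_len_checked(const uint32_t *len, size_t n, uint32_t *out)
{
	uint32_t total = 0;

	for (size_t i = 0; i < n; i++) {
		if (len[i] > BUFSIZ_LIMIT)
			return -1;
		/* Cannot wrap: both operands are at most BUFSIZ_LIMIT here. */
		total += len[i];
		if (total > BUFSIZ_LIMIT)
			return -1;
	}
	*out = total;
	return 0;
}
```

With transfer lengths of 10 and `(uint32_t)-1`, the unchecked sum comes out as 9 and would pass a total-only limit check, while the checked variant rejects the second transfer before it is added.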
    • spi: imx: read back the RX/TX watermark levels earlier · 2ed82e9e
      Lucas Stach authored
      [ Upstream commit f511ab09 ]
      
      They are used to decide if the controller can do DMA on a buffer
      of a specific length and thus are needed before any transfer is attempted.
      
      This fixes a memory leak where the SPI core uses the driver's can_dma()
      callback to determine if a buffer needs to be mapped. As the watermark
      levels aren't correct at that point the driver falsely claims to be able to
      DMA the buffer when in fact it isn't.
      After the transfer has been done the core uses the same callback to
      determine if it needs to unmap the buffers. As the driver now correctly
      claims not to be able to DMA the buffer the core doesn't attempt to
      unmap the buffer which leaves the SGT leaking.
      
      Fixes: f62caccd (spi: spi-imx: add DMA support)
      Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • mmc: sunxi: Use devm_reset_control_get_optional() for reset control · 7c06756c
      Chen-Yu Tsai authored
      [ Upstream commit 9e71c589 ]
      
      The reset control for the sunxi mmc controller is optional. Some
      newer platforms (sun6i, sun8i, sun9i) have it, while older ones
      (sun4i, sun5i, sun7i) don't.
      
      Use the properly stubbed _optional version so the driver does not
      fail to compile when RESET_CONTROLLER=n.
      
      This patch also adds a check for deferred probing on the reset
      control.
      Signed-off-by: Chen-Yu Tsai <wens@csie.org>
      Cc: <stable@vger.kernel.org> # 3.16+
      Acked-by: David Lanzendörfer <david.lanzendoerfer@o2s.ch>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • cdc-wdm: fix endianness bug in debug statements · cdc0cf5f
      Oliver Neukum authored
      [ Upstream commit 323ece54 ]
      
      Values directly from descriptors given in debug statements
      must be converted to native endianness.
      Signed-off-by: Oliver Neukum <oneukum@suse.de>
      CC: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • md/raid0: fix bug with chunksize not a power of 2. · d2c861b7
      NeilBrown authored
      [ Upstream commit 47d68979 ]
      
      Since commit 20d0189b
      in v3.14-rc1 RAID0 has performed incorrect calculations
      when the chunksize is not a power of 2.
      
      This happens because "sector_div()" modifies its first argument, but
      this wasn't taken into account in the patch.
      
      So restore that first arg before re-using the variable.
      Reported-by: Joe Landman <joe.landman@gmail.com>
      Reported-by: Dave Chinner <david@fromorbit.com>
      Fixes: 20d0189b
      Cc: stable@vger.kernel.org (3.14 and later).
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
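The pitfall named in the raid0 entry, a division helper that overwrites its first argument, can be sketched in userspace as follows. `sector_div_sketch()` is a hypothetical stand-in that mimics the contract the message describes (the dividend is replaced by the quotient and the remainder is returned). The two wrapper functions are illustrative and are not taken from the raid0 code.

```c
#include <stdint.h>

typedef uint64_t sector_t;

/* Stand-in for sector_div(): *n becomes the quotient, the remainder is returned. */
static uint32_t sector_div_sketch(sector_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

/* Buggy pattern: 'sector' is reused after the division has clobbered it. */
static uint32_t chunk_offset_clobbering(sector_t sector, uint32_t chunk_sects,
					sector_t *after)
{
	uint32_t off = sector_div_sketch(&sector, chunk_sects);

	*after = sector;  /* now holds the chunk index, not the original sector */
	return off;
}

/* Fixed pattern: restore the first argument before using the variable again. */
static uint32_t chunk_offset_preserving(sector_t sector, uint32_t chunk_sects,
					sector_t *after)
{
	sector_t saved = sector;
	uint32_t off = sector_div_sketch(&sector, chunk_sects);

	sector = saved;
	*after = sector;
	return off;
}
```

For sector 100 and a 24-sector chunk, both variants compute an offset of 4, but only the second still sees sector 100 afterwards. The first is left with the quotient, 4.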
    • staging: android: sync: Fix memory corruption in sync_timeline_signal(). · f364a04f
      Alistair Strachan authored
      [ Upstream commit 8e43c9c7 ]
      
      The android_fence_release() function checks for active sync points
      by calling list_empty() on the list head embedded on the sync
      point. However, it is only valid to use list_empty() on nodes that
      have been initialized with INIT_LIST_HEAD() or list_del_init().
      
      Because the list entry has likely been removed from the active list
      by sync_timeline_signal(), there is a good chance that this
      WARN_ON_ONCE() will be hit due to dangling pointers pointing at
      freed memory (even though the sync drivers did nothing wrong)
      and memory corruption will ensue as the list entry is removed for
      a second time, corrupting the active list.
      
      This problem can be reproduced quite easily with CONFIG_DEBUG_LIST=y
      and fences with more than one sync point.
      Signed-off-by: Alistair Strachan <alistair.strachan@imgtec.com>
      Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Colin Cross <ccross@google.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
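The list_empty() rule quoted in the sync entry can be demonstrated with a minimal userspace reimplementation of a circular doubly linked list. The helpers below borrow the kernel's names where the semantics match, but they are simplified sketches written for this note. `list_del_plain()` is a made-up name for a removal that leaves the removed node's own pointers stale (the kernel's list_del() poisons them instead).

```c
#include <stdbool.h>

struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* True only when the node points at itself. */
static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Insert 'n' right after 'head'. */
static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink 'e' but leave its own pointers untouched (stale). */
static void list_del_plain(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* Unlink 'e' and reinitialise it, so list_empty(e) is meaningful afterwards. */
static void list_del_init(struct list_head *e)
{
	list_del_plain(e);
	init_list_head(e);
}
```

After `list_del_plain()`, `list_empty()` on the removed node reports false even though the node is on no list, which is how a later check can conclude the node is still active and unlink it a second time.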
    • staging: panel: fix lcd type · fd431a7c
      Sudip Mukherjee authored
      [ Upstream commit 2c20d92d ]
      
      The LCD type as defined in the Kconfig does not match the code.
      As a result the rs, rw and en pins were getting interchanged.
      Kconfig defines the value of PANEL_LCD to be 1 if we select custom
      configuration but in the code LCD_TYPE_CUSTOM is defined as 5.
      
      My hardware is LCD_TYPE_CUSTOM, but the pins were assigned to it
      as pins of LCD_TYPE_OLD, and it was not working.
      Now the values are corrected with reference to the values defined in
      Kconfig and it is working.
      Checked on a JHD204A LCD with the LCD_TYPE_CUSTOM configuration.
      
      Cc: <stable@vger.kernel.org> # 2.6.32+
      Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
      Acked-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • MIPS: Hibernate: flush TLB entries earlier · 886b9d66
      Huacai Chen authored
      [ Upstream commit a843d00d ]
      
      We found that TLB mismatch not only happens after kernel resume, but
      also happens during snapshot restore. So move the TLB flush to the
      beginning of swsusp_arch_suspend().
      Signed-off-by: Huacai Chen <chenhc@lemote.com>
      Cc: <stable@vger.kernel.org>
      Cc: Steven J. Hill <Steven.Hill@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: Fuxin Zhang <zhangfx@lemote.com>
      Cc: Zhangjin Wu <wuzhangjin@gmail.com>
      Cc: stable@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/9621/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • MIPS: Loongson-3: Add IRQF_NO_SUSPEND to Cascade irqaction · 38d99bff
      Huacai Chen authored
      [ Upstream commit 0add9c2f ]
      
      The HPET irq is routed to the i8259 and then to the MIPS CPU irq (cascade).
      After commit a3e6c1ef (MIPS: IRQ: Fix disable_irq on CPU IRQs), without
      IRQF_NO_SUSPEND in cascade_irqaction, HPET interrupts are lost during
      suspend. The result is that the machine cannot be woken up.
      Signed-off-by: Huacai Chen <chenhc@lemote.com>
      Cc: <stable@vger.kernel.org>
      Cc: Steven J. Hill <Steven.Hill@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: Fuxin Zhang <zhangfx@lemote.com>
      Cc: Zhangjin Wu <wuzhangjin@gmail.com>
      Patchwork: https://patchwork.linux-mips.org/patch/9528/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • MIPS: asm: asm-eva: Introduce kernel load/store variants · 301080ae
      Markos Chandras authored
      [ Upstream commit 60cd7e08 ]
      
      Introduce new macros for kernel load/store variants which will be
      used to perform regular kernel space load/store operations in EVA
      mode.
      Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
      Cc: <stable@vger.kernel.org> # v3.15+
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/9500/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • MIPS: Malta: Detect and fix bad memsize values · 312bc67c
      Markos Chandras authored
      [ Upstream commit f7f8aea4 ]
      
      memsize denotes the amount of RAM we can access from kseg{0,1} and
      that should be up to 256M. In case the bootloader reports a value
      higher than that (perhaps reporting all the available RAM) it's best
      if we fix it ourselves and just warn the user about that. This is
      usually a problem with the bootloader and/or its environment.
      
      [ralf@linux-mips.org: Remove useless parens as suggested by Sergei.
      Reformat long pr_warn statement to fit into 80 column limit.]
      Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
      Cc: <stable@vger.kernel.org> # v3.15+
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/9362/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
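The memsize fixup in the Malta entry amounts to a clamp with a warning. A minimal userspace sketch, assuming the 256M ceiling stated in the message, might look like the following. `MAX_KSEG_MEMSIZE` and `fixup_memsize()` are names invented for this illustration, and the wording of the warning is not the kernel's.

```c
#include <stdio.h>

/* Upper bound on RAM reachable through kseg0/kseg1, per the commit message. */
#define MAX_KSEG_MEMSIZE (256UL << 20)

/* Clamp a bootloader-reported size; 'label' is only used in the warning text. */
static unsigned long fixup_memsize(unsigned long memsize, const char *label)
{
	if (memsize > MAX_KSEG_MEMSIZE) {
		fprintf(stderr, "Unsupported %s value (0x%lx), truncating to 0x%lx\n",
			label, memsize, MAX_KSEG_MEMSIZE);
		return MAX_KSEG_MEMSIZE;
	}
	return memsize;
}
```

Values at or below the ceiling pass through unchanged, so a correctly configured bootloader sees no difference.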
    • MIPS: lose_fpu(): Disable FPU when MSA enabled · 00f1187a
      James Hogan authored
      [ Upstream commit f8483988 ]
      
      The lose_fpu() function only disables the FPU in CP0_Status.CU1 if the
      FPU is in use and MSA isn't enabled.
      
      This isn't necessarily a problem because KSTK_STATUS(current), the
      version of CP0_Status stored on the kernel stack on entry from user
      mode, does always get updated and gets restored when returning to user
      mode, but I don't think it was intended, and it is inconsistent with the
      case of only the FPU being in use. Sometimes leaving the FPU enabled may
      also mask kernel bugs where FPU operations are executed when the FPU
      might not be enabled.
      
      So let's disable the FPU in the MSA case too.
      
      Fixes: 33c771ba ("MIPS: save/disable MSA in lose_fpu")
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/9323/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • MIPS: KVM: Handle MSA Disabled exceptions from guest · 707ff225
      James Hogan authored
      [ Upstream commit 98119ad5 ]
      
      Guest user mode can generate a guest MSA Disabled exception on an MSA
      capable core by simply trying to execute an MSA instruction. Since this
      exception is unknown to KVM it will be passed on to the guest kernel.
      However guest Linux kernels prior to v3.15 do not set up an exception
      handler for the MSA Disabled exception as they don't support any MSA
      capable cores. This results in a guest OS panic.
      
      Since an older processor ID may be being emulated, and MSA support is
      not advertised to the guest, the correct behaviour is to generate a
      Reserved Instruction exception in the guest kernel so it can send the
      guest process an illegal instruction signal (SIGILL), as would happen
      with a non-MSA-capable core.
      
      Fix this as minimally as reasonably possible by preventing
      kvm_mips_check_privilege() from relaying MSA Disabled exceptions from
      guest user mode to the guest kernel, and handling the MSA Disabled
      exception by emulating a Reserved Instruction exception in the guest,
      via a new handle_msa_disabled() KVM callback.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Cc: <stable@vger.kernel.org> # v3.15+
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
    • KVM: arm/arm64: check IRQ number on userland injection · e580744e
      Andre Przywara authored
      [ Upstream commit fd1d0ddf ]
      
      When userland injects a SPI via the KVM_IRQ_LINE ioctl we currently
      only check it against a fixed limit, which historically is set
      to 127. With the new dynamic IRQ allocation the effective limit may
      actually be smaller (64).
      So when a malicious or buggy userland now injects a SPI in that
      range, we spill over our VGIC bitmap and bytemap memory.
      I could trigger a host kernel NULL pointer dereference with current
      mainline by injecting some bogus IRQ number from a hacked kvmtool:
      -----------------
      ....
      DEBUG: kvm_vgic_inject_irq(kvm, cpu=0, irq=114, level=1)
      DEBUG: vgic_update_irq_pending(kvm, cpu=0, irq=114, level=1)
      DEBUG: IRQ #114 still in the game, writing to bytemap now...
      Unable to handle kernel NULL pointer dereference at virtual address 00000000
      pgd = ffffffc07652e000
      [00000000] *pgd=00000000f658b003, *pud=00000000f658b003, *pmd=0000000000000000
      Internal error: Oops: 96000006 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 1 PID: 1053 Comm: lkvm-msi-irqinj Not tainted 4.0.0-rc7+ #3027
      Hardware name: FVP Base (DT)
      task: ffffffc0774e9680 ti: ffffffc0765a8000 task.ti: ffffffc0765a8000
      PC is at kvm_vgic_inject_irq+0x234/0x310
      LR is at kvm_vgic_inject_irq+0x30c/0x310
      pc : [<ffffffc0000ae0a8>] lr : [<ffffffc0000ae180>] pstate: 80000145
      .....
      
      So this patch fixes this by checking the SPI number against the
      actual limit. Also we remove the former legacy hard limit of
      127 in the ioctl code.
      Signed-off-by: Andre Przywara <andre.przywara@arm.com>
      Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
      CC: <stable@vger.kernel.org> # 4.0, 3.19, 3.18
      [maz: wrap KVM_ARM_IRQ_GIC_MAX with #ifndef __KERNEL__,
      as suggested by Christopher Covington]
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
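The check described in the KVM arm/arm64 entry, validating an injected IRQ number against the limit that is actually in effect instead of a fixed constant, can be sketched as below. `vgic_sketch`, its `nr_irqs` field and `inject_irq_sketch()` are hypothetical names for this illustration and do not reproduce the kernel's data structures.

```c
#include <errno.h>

struct vgic_sketch {
	unsigned int nr_irqs;  /* number of IRQs actually allocated for this VM */
};

/* Reject out-of-range IRQ numbers before any per-IRQ state is touched. */
static int inject_irq_sketch(const struct vgic_sketch *vgic, unsigned int irq)
{
	if (irq >= vgic->nr_irqs)
		return -EINVAL;

	/* ... the bitmap and bytemap updates would follow here ... */
	return 0;
}
```

With 64 IRQs allocated, a request for IRQ 114 passes a fixed limit of 127 but is rejected by the per-VM check.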
    • KVM: use slowpath for cross page cached accesses · 35e13292
      Radim Krčmář authored
      [ Upstream commit ca3f0874 ]
      
      kvm_write_guest_cached() does not mark all written pages as dirty and
      code comments in kvm_gfn_to_hva_cache_init() talk about NULL memslot
      with cross page accesses.  Fix all the easy way.
      
      The check is '<= 1' to have the same result for 'len = 0' cache anywhere
      in the page.  (nr_pages_needed is 0 on page boundary.)
      
      Fixes: 8f964525 ("KVM: Allow cross page reads and writes from cached translations.")
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Message-Id: <20150408121648.GA3519@potion.brq.redhat.com>
      Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
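The remark about '<= 1' and 'len = 0' in the cached-access entry is easier to follow with the page-count arithmetic written out. The sketch below assumes 4 KiB pages and assumes that the count is derived from the first and last guest frame numbers of the access. `nr_pages_needed()` and `fast_path_ok()` are illustrative names, not the kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12  /* assumption: 4 KiB pages */

/* Number of pages spanned by [gpa, gpa + len), computed from frame numbers. */
static uint64_t nr_pages_needed(uint64_t gpa, uint64_t len)
{
	uint64_t start_gfn = gpa >> PAGE_SHIFT_SKETCH;
	uint64_t end_gfn = (gpa + len - 1) >> PAGE_SHIFT_SKETCH;

	return end_gfn - start_gfn + 1;
}

/* The cached fast path is used only when the access stays within one page. */
static bool fast_path_ok(uint64_t gpa, uint64_t len)
{
	return nr_pages_needed(gpa, len) <= 1;
}
```

A zero-length access in the middle of a page yields a count of 1, while the same access exactly on a page boundary yields 0, so comparing with '<= 1' treats both the same way. A 0x20-byte access starting at 0x1ff0 spans two pages and would take the slow path.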
    • s390/hibernate: fix save and restore of kernel text section · 23786453
      Heiko Carstens authored
      [ Upstream commit d7441949 ]
      
      Sebastian reported a crash caused by a jump label mismatch after resume.
      This happens because we do not save the kernel text section during suspend
      and therefore also do not restore it during resume, but use the kernel image
      that restores the old system.
      
      This means that after a suspend/resume cycle we lost all modifications done
      to the kernel text section.
      The reason for this is the pfn_is_nosave() function, which incorrectly
      returns that read-only pages don't need to be saved. This is incorrect since
      we mark the kernel text section read-only.
      We still need to make sure to not save and restore pages contained within
      NSS and DCSS segments.
      To fix this add an extra case for the kernel text section and only save
      those pages if they are not contained within an NSS segment.
      
      Fixes the following crash (and the above bugs as well):
      
      Jump label code mismatch at netif_receive_skb_internal+0x28/0xd0
      Found:    c0 04 00 00 00 00
      Expected: c0 f4 00 00 00 11
      New:      c0 04 00 00 00 00
      Kernel panic - not syncing: Corrupted kernel text
      CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.19.0-01975-gb1b096e70f23 #4
      Call Trace:
        [<0000000000113972>] show_stack+0x72/0xf0
        [<000000000081f15e>] dump_stack+0x6e/0x90
        [<000000000081c4e8>] panic+0x108/0x2b0
        [<000000000081be64>] jump_label_bug.isra.2+0x104/0x108
        [<0000000000112176>] __jump_label_transform+0x9e/0xd0
        [<00000000001121e6>] __sm_arch_jump_label_transform+0x3e/0x50
        [<00000000001d1136>] multi_cpu_stop+0x12e/0x170
        [<00000000001d1472>] cpu_stopper_thread+0xb2/0x168
        [<000000000015d2ac>] smpboot_thread_fn+0x134/0x1b0
        [<0000000000158baa>] kthread+0x10a/0x110
        [<0000000000824a86>] kernel_thread_starter+0x6/0xc
      Reported-and-tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      23786453
    • Jens Freimann's avatar
      KVM: s390: fix get_all_floating_irqs · 00d83726
      Jens Freimann authored
      [ Upstream commit 94aa033e ]
      
      This fixes a bug introduced with commit c05c4186 ("KVM: s390:
      add floating irq controller").
      
      get_all_floating_irqs() does copy_to_user() while holding
      a spin lock. Let's fix this by filling a temporary buffer
      first and copy it to userspace after giving up the lock.
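
      The pattern described above can be sketched in userspace C. This is
      only an illustration, not the kernel patch: struct irq_list,
      get_all_irqs() and copy_out() are hypothetical names, a C11 atomic_flag
      stands in for the kernel spin lock, and memcpy() stands in for
      copy_to_user(), which in the kernel may sleep and so must not run with
      a spin lock held.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define MAX_IRQS 8

struct irq_info {
    int type;
    int parm;
};

/* Hypothetical stand-in for the floating interrupt list and its spin lock. */
struct irq_list {
    atomic_flag lock;
    size_t count;
    struct irq_info items[MAX_IRQS];
};

static void irq_list_init(struct irq_list *list)
{
    atomic_flag_clear(&list->lock);
    list->count = 0;
}

static void list_lock(struct irq_list *list)
{
    while (atomic_flag_test_and_set_explicit(&list->lock, memory_order_acquire))
        ; /* spin until the flag is released */
}

static void list_unlock(struct irq_list *list)
{
    atomic_flag_clear_explicit(&list->lock, memory_order_release);
}

/* Stand-in for copy_to_user(); the real call may fault and sleep. */
static int copy_out(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    return 0;
}

/* Returns the number of entries copied, or -1 if dst is too small. */
static long get_all_irqs(struct irq_list *list, struct irq_info *dst,
                         size_t dst_cap)
{
    struct irq_info tmp[MAX_IRQS];
    size_t n;

    /* Step 1: snapshot into a temporary buffer while holding the lock. */
    list_lock(list);
    n = list->count;
    if (n > dst_cap || n > MAX_IRQS) {
        list_unlock(list);
        return -1;
    }
    memcpy(tmp, list->items, n * sizeof(tmp[0]));
    list_unlock(list);

    /* Step 2: copy to the caller only after the lock has been dropped. */
    if (copy_out(dst, tmp, n * sizeof(tmp[0])) != 0)
        return -1;
    return (long)n;
}
```

      The trade-off is one extra copy in exchange for a short critical
      section that never sleeps.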
      
      Cc: <stable@vger.kernel.org> # 3.18+: 69a8d456 KVM: s390: no need to hold...
      Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      00d83726
    • Christian Borntraeger's avatar
      KVM: s390: no need to hold the kvm->mutex for floating interrupts · 7e15bc0e
      Christian Borntraeger authored
      [ Upstream commit 69a8d456 ]
      
      The kvm mutex was (probably) used to protect against cpu hotplug.
      The current code no longer needs to protect against that, as we only
      rely on CPU data structures that are guaranteed to be available
      if we can access the CPU. (e.g. vcpu_create will put the cpu
      in the array AFTER the cpu is ready).
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      7e15bc0e
    • Ekaterina Tumanova's avatar
      KVM: s390: Zero out current VMDB of STSI before including level3 data. · abd80ecb
      Ekaterina Tumanova authored
      [ Upstream commit b75f4c9a ]
      
      s390 documentation requires words 0 and 10-15 to be reserved and stored as
      zeros. As we fill out all other fields, we can memset the full structure.
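
      A minimal sketch of the idea, assuming a made-up 16-word layout:
      struct vm_desc and fill_vm_desc() are hypothetical and do not mirror
      the real STSI data structure. Zeroing the whole block first guarantees
      the reserved words (0 and 10-15) are stored as zeros, however the other
      fields are filled afterwards.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-word descriptor; words 0 and 10-15 are reserved. */
struct vm_desc {
    uint32_t words[16];
};

static void fill_vm_desc(struct vm_desc *d, uint32_t value)
{
    int i;

    /* Clear everything, including the reserved words and any stale data. */
    memset(d, 0, sizeof(*d));

    /* Fill only the non-reserved words 1-9 (placeholder values). */
    for (i = 1; i <= 9; i++)
        d->words[i] = value + (uint32_t)i;
}
```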
      Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      abd80ecb
    • David Hildenbrand's avatar
      KVM: s390: reinjection of irqs can fail in the tpi handler · b7c23d30
      David Hildenbrand authored
      [ Upstream commit 15462e37 ]
      
      The reinjection of an I/O interrupt can fail if the list is at the limit
      and between the dequeue and the reinjection, another I/O interrupt is
      injected (e.g. if user space floods kvm with I/O interrupts).
      
      This patch avoids this memory leak and returns -EFAULT in this special
      case. This error is not recoverable, so let's fail hard. This can later
      be avoided by not dequeuing the interrupt but working directly on the
      locked list.
      Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org # 3.16+
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      b7c23d30
    • David Hildenbrand's avatar
      KVM: s390: fix handling of write errors in the tpi handler · 19881aff
      David Hildenbrand authored
      [ Upstream commit 261520dc ]
      
      If the I/O interrupt could not be written to the guest provided
      area (e.g. access exception), a program exception was injected into the
      guest but "inti" wasn't freed, therefore resulting in a memory leak.
      
      In addition, the I/O interrupt wasn't reinjected. Therefore the dequeued
      interrupt is lost.
      
      This patch fixes the problem while cleaning up the function and making the
      cc and rc logic easier to handle.
      Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org # 3.16+
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      19881aff
    • Andrzej Pietrasiewicz's avatar
      usb: gadget: printer: enqueue printer's response for setup request · 67c5b95c
      Andrzej Pietrasiewicz authored
      [ Upstream commit eb132ccb ]
      
      Function-specific setup requests should be handled in such a way that,
      apart from filling in the data buffer, the requests are also actually
      enqueued: if function-specific setup is called from composite_setup(),
      the "usb_ep_queue()" block of code in composite_setup() is skipped.
      
      The printer function lacks this part, which results in e.g. get device id
      requests failing: the host expects some response, the device prepares it
      but does not enqueue it for sending to the host, so the host eventually
      times out.
      
      This patch adds enqueueing the prepared responses.
      
      Cc: <stable@vger.kernel.org> # v3.4+
      Fixes: 2e87edf4: "usb: gadget: make g_printer use composite"
      Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
      Signed-off-by: Felipe Balbi <balbi@ti.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      67c5b95c
    • Filipe Manana's avatar
      Btrfs: fix inode eviction infinite loop after extent_same ioctl · 43e8149d
      Filipe Manana authored
      [ Upstream commit HEAD ]
      
      commit 113e8283 upstream.
      
      If we pass a length of 0 to the extent_same ioctl, we end up locking an
      extent range with a start offset greater than its end offset (if the
      destination file's offset is greater than zero). This results in a warning
      from extent_io.c:insert_state through the following call chain:
      
        btrfs_extent_same()
          btrfs_double_lock()
            lock_extent_range()
              lock_extent(inode->io_tree, offset, offset + len - 1)
                lock_extent_bits()
                  __set_extent_bit()
                    insert_state()
                      --> WARN_ON(end < start)
      
      This leads to an infinite loop when evicting the inode. This is the same
      problem that my previous patch titled
      "Btrfs: fix inode eviction infinite loop after cloning into it" addressed
      but for the extent_same ioctl instead of the clone ioctl.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Omar Sandoval <osandov@osandov.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit 9dc10661)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      43e8149d
    • Filipe Manana's avatar
      Btrfs: fix inode eviction infinite loop after cloning into it · d5454242
      Filipe Manana authored
      [ Upstream commit HEAD ]
      
      commit ccccf3d6 upstream.
      
      If we attempt to clone a 0 length region into a file we can end up
      inserting a range in the inode's extent_io tree with a start offset
      that is greater than the end offset, which immediately triggers the
      following warning:
      
      [ 3914.619057] WARNING: CPU: 17 PID: 4199 at fs/btrfs/extent_io.c:435 insert_state+0x4b/0x10b [btrfs]()
      [ 3914.620886] BTRFS: end < start 4095 4096
      (...)
      [ 3914.638093] Call Trace:
      [ 3914.638636]  [<ffffffff81425fd9>] dump_stack+0x4c/0x65
      [ 3914.639620]  [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
      [ 3914.640789]  [<ffffffffa03ca44f>] ? insert_state+0x4b/0x10b [btrfs]
      [ 3914.642041]  [<ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48
      [ 3914.643236]  [<ffffffffa03ca44f>] insert_state+0x4b/0x10b [btrfs]
      [ 3914.644441]  [<ffffffffa03ca729>] __set_extent_bit+0x107/0x3f4 [btrfs]
      [ 3914.645711]  [<ffffffffa03cb256>] lock_extent_bits+0x65/0x1bf [btrfs]
      [ 3914.646914]  [<ffffffff8142b2fb>] ? _raw_spin_unlock+0x28/0x33
      [ 3914.648058]  [<ffffffffa03cbac4>] ? test_range_bit+0xcc/0xde [btrfs]
      [ 3914.650105]  [<ffffffffa03cb3c3>] lock_extent+0x13/0x15 [btrfs]
      [ 3914.651361]  [<ffffffffa03db39e>] lock_extent_range+0x3d/0xcd [btrfs]
      [ 3914.652761]  [<ffffffffa03de1fe>] btrfs_ioctl_clone+0x278/0x388 [btrfs]
      [ 3914.654128]  [<ffffffff811226dd>] ? might_fault+0x58/0xb5
      [ 3914.655320]  [<ffffffffa03e0909>] btrfs_ioctl+0xb51/0x2195 [btrfs]
      (...)
      [ 3914.669271] ---[ end trace 14843d3e2e622fc1 ]---
      
      This later makes the inode eviction handler enter an infinite loop that
      keeps dumping the following warning over and over:
      
      [ 3915.117629] WARNING: CPU: 22 PID: 4228 at fs/btrfs/extent_io.c:435 insert_state+0x4b/0x10b [btrfs]()
      [ 3915.119913] BTRFS: end < start 4095 4096
      (...)
      [ 3915.137394] Call Trace:
      [ 3915.137913]  [<ffffffff81425fd9>] dump_stack+0x4c/0x65
      [ 3915.139154]  [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
      [ 3915.140316]  [<ffffffffa03ca44f>] ? insert_state+0x4b/0x10b [btrfs]
      [ 3915.141505]  [<ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48
      [ 3915.142709]  [<ffffffffa03ca44f>] insert_state+0x4b/0x10b [btrfs]
      [ 3915.143849]  [<ffffffffa03ca729>] __set_extent_bit+0x107/0x3f4 [btrfs]
      [ 3915.145120]  [<ffffffffa038c1e3>] ? btrfs_kill_super+0x17/0x23 [btrfs]
      [ 3915.146352]  [<ffffffff811548f6>] ? deactivate_locked_super+0x3b/0x50
      [ 3915.147565]  [<ffffffffa03cb256>] lock_extent_bits+0x65/0x1bf [btrfs]
      [ 3915.148785]  [<ffffffff8142b7e2>] ? _raw_write_unlock+0x28/0x33
      [ 3915.149931]  [<ffffffffa03bc325>] btrfs_evict_inode+0x196/0x482 [btrfs]
      [ 3915.151154]  [<ffffffff81168904>] evict+0xa0/0x148
      [ 3915.152094]  [<ffffffff811689e5>] dispose_list+0x39/0x43
      [ 3915.153081]  [<ffffffff81169564>] evict_inodes+0xdc/0xeb
      [ 3915.154062]  [<ffffffff81154418>] generic_shutdown_super+0x49/0xef
      [ 3915.155193]  [<ffffffff811546d1>] kill_anon_super+0x13/0x1e
      [ 3915.156274]  [<ffffffffa038c1e3>] btrfs_kill_super+0x17/0x23 [btrfs]
      (...)
      [ 3915.167404] ---[ end trace 14843d3e2e622fc2 ]---
      
      So just bail out of the clone ioctl if the length of the region to clone
      is zero, without locking any extent range, in order to prevent this issue
      (same behaviour as a pwrite with a 0 length for example).
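
      The arithmetic behind the "end < start 4095 4096" warning, and the
      early bail-out, can be sketched as follows. This is a standalone
      illustration, not the btrfs code: struct byte_range, naive_range_end()
      and make_lock_range() are hypothetical names. The range end is
      inclusive, so it is computed as offset + len - 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inclusive byte range, as used when locking [start, end]. */
struct byte_range {
    uint64_t start;
    uint64_t end;
};

/* Unchecked computation: with len == 0 the end lands one byte before start. */
static uint64_t naive_range_end(uint64_t offset, uint64_t len)
{
    return offset + len - 1;
}

/*
 * Checked variant: refuse to build a range for a zero length, so the caller
 * can return early without locking anything.
 */
static bool make_lock_range(uint64_t offset, uint64_t len,
                            struct byte_range *out)
{
    if (len == 0)
        return false;
    out->start = offset;
    out->end = offset + len - 1;
    return true;
}
```

      With offset 4096 and length 0, naive_range_end() yields 4095, which
      matches the values in the warning above.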
      
      This is trivial to reproduce. For example, the steps for the test I just
      made for fstests:
      
        mkfs.btrfs -f SCRATCH_DEV
        mount SCRATCH_DEV $SCRATCH_MNT
      
        touch $SCRATCH_MNT/foo
        touch $SCRATCH_MNT/bar
      
        $CLONER_PROG -s 0 -d 4096 -l 0 $SCRATCH_MNT/foo $SCRATCH_MNT/bar
        umount $SCRATCH_MNT
      
      A test case for fstests follows soon.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Omar Sandoval <osandov@osandov.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit 449b4627)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      d5454242
    • David Sterba's avatar
      btrfs: don't accept bare namespace as a valid xattr · 5728a924
      David Sterba authored
      [ Upstream commit HEAD ]
      
      commit 3c3b04d1 upstream.
      
      Due to insufficient check in btrfs_is_valid_xattr, this unexpectedly
      works:
      
       $ touch file
       $ setfattr -n user. -v 1 file
       $ getfattr -d file
      user.="1"
      
      ie. the missing attribute name after the namespace.
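
      A minimal sketch of the stricter check, assuming a hypothetical helper
      xattr_name_is_valid() (this is not the body of btrfs_is_valid_xattr):
      a name is accepted only if it starts with a known namespace prefix and
      has at least one character after it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Namespace prefixes used for this illustration. */
static const char *const known_prefixes[] = {
    "user.", "trusted.", "security.", "system.",
};

static bool xattr_name_is_valid(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(known_prefixes) / sizeof(known_prefixes[0]); i++) {
        size_t plen = strlen(known_prefixes[i]);

        /* The prefix must match and must be followed by a non-empty name. */
        if (strncmp(name, known_prefixes[i], plen) == 0)
            return name[plen] != '\0';
    }
    return false; /* unknown namespace */
}
```

      Under this check "user." is rejected while "user.foo" is accepted.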
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=94291
      Reported-by: William Douglas <william.douglas@intel.com>
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit 1bb2835e)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      5728a924
    • Filipe Manana's avatar
      Btrfs: fix log tree corruption when fs mounted with -o discard · 372b2ac5
      Filipe Manana authored
      [ Upstream commit HEAD ]
      
      commit dcc82f47 upstream.
      
      While committing a transaction we free the log roots before we write the
      new super block. Freeing the log roots implies marking the disk location
      of every node/leaf (metadata extent) as pinned before the new super block
      is written. This is to prevent the disk location of log metadata extents
      from being reused before the new super block is written, otherwise we
      would have a corrupted log tree if before the new super block is written
      a crash/reboot happens and the location of any log tree metadata extent
      ended up being reused and rewritten.
      
      Even though we pinned the log tree's metadata extents, we were issuing a
      discard against them if the fs was mounted with the -o discard option,
      resulting in corruption of the log tree if a crash/reboot happened before
      writing the new super block - the next time the fs was mounted, during
      the log replay process we would find nodes/leafs of the log btree with
      a content full of zeroes, causing the process to fail and require the
      use of the tool btrfs-zero-log to wipe out the log tree (and all data
      previously fsynced becoming lost forever).
      
      Fix this by not doing a discard when pinning an extent. The discard will
      be done later when it's safe (after the new super block is committed) at
      extent-tree.c:btrfs_finish_extent_commit().
      
      Fixes: e688b725 (Btrfs: fix extent pinning bugs in the tree log)
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit 3909e5a9)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      372b2ac5
    • Nadav Amit's avatar
      KVM: x86: Fix MSR_IA32_BNDCFGS in msrs_to_save · 753fd54a
      Nadav Amit authored
      [ Upstream commit HEAD ]
      
      commit 9e9c3fe4 upstream.
      
      kvm_init_msr_list is currently called before hardware_setup. As a result,
      vmx_mpx_supported always returns false when kvm_init_msr_list checks whether to
      save MSR_IA32_BNDCFGS.
      
      Move kvm_init_msr_list after vmx_hardware_setup is called to fix this issue.
      Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      Message-Id: <1428864435-4732-1-git-send-email-namit@cs.technion.ac.il>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      
      (cherry picked from commit 702a71cf)
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      753fd54a
    • Mike Galbraith's avatar
      sched/idle/x86: Optimize unnecessary mwait_idle() resched IPIs · 6cbb41b1
      Mike Galbraith authored
      [ Upstream commit f8e617f4 ]
      
      To fully take advantage of MWAIT, apparently the CLFLUSH instruction needs
      another quirk on certain CPUs: proper barriers around it on certain machines.
      
      On a Q6600 SMP system, pipe-test scheduling performance, cross core,
      improves significantly:
      
        3.8.13                   487.2 KHz    1.000
        3.13.0-master            415.5 KHz     .852
        3.13.0-master+           415.2 KHz     .852     + restore mwait_idle
        3.13.0-master++          488.5 KHz    1.002     + restore mwait_idle + IPI fix
      
      Since X86_BUG_CLFLUSH_MONITOR is already a quirk, don't create a separate
      quirk for the extra smp_mb()s.
      Signed-off-by: Mike Galbraith <bitbucket@online.de>
      Cc: <stable@vger.kernel.org> # 3.10+
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ian Malone <ibmalone@gmail.com>
      Cc: Josh Boyer <jwboyer@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1390061684.5566.4.camel@marge.simpson.net
      [ Ported to recent kernel, added comments about the quirk. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      6cbb41b1
    • Len Brown's avatar
      sched/idle/x86: Restore mwait_idle() to fix boot hangs, to improve power... · 560e6448
      Len Brown authored
      sched/idle/x86: Restore mwait_idle() to fix boot hangs, to improve power savings and to improve performance
      
      [ Upstream commit b253149b ]
      
      In Linux-3.9 we removed the mwait_idle() loop:
      
        69fb3676 ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param")
      
      The reasoning was that modern machines should be sufficiently
      happy during the boot process using the default_idle() HALT
      loop, until cpuidle loads and either acpi_idle or intel_idle
      invoke the newer MWAIT-with-hints idle loop.
      
      But two machines reported problems:
      
       1. Certain Core2-era machines support MWAIT-C1 and HALT only.
          MWAIT-C1 is preferred for optimal power and performance.
          But if they support just C1, cpuidle never loads and
          so they use the boot-time default idle loop forever.
      
       2. Some laptops will boot-hang if HALT is used,
          but will boot successfully if MWAIT is used.
          This appears to be a hidden assumption in BIOS SMI,
          that is presumably valid on the proprietary OS
          where the BIOS was validated.
      
             https://bugzilla.kernel.org/show_bug.cgi?id=60770
      
      So here we effectively revert the patch above, restoring
      the mwait_idle() loop.  However, we don't bother restoring
      the idle=mwait cmdline parameter, since it appears to add
      no value.
      
      Maintainer notes:
      
        For 3.9, simply revert 69fb3676
        For 3.10, patch -F3 applies, fuzz needed due to __cpuinit use in context
        For 3.11, 3.12, 3.13, this patch applies cleanly
      Tested-by: Mike Galbraith <bitbucket@online.de>
      Signed-off-by: Len Brown <len.brown@intel.com>
      Acked-by: Mike Galbraith <bitbucket@online.de>
      Cc: <stable@vger.kernel.org> # 3.9+
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ian Malone <ibmalone@gmail.com>
      Cc: Josh Boyer <jwboyer@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/345254a551eb5a6a866e048d7ab570fd2193aca4.1389763084.git.len.brown@intel.com
      [ Ported to recent kernels. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      560e6448
  2. 11 May, 2015 2 commits
    • Eric Dumazet's avatar
      net: fix crash in build_skb() · 0ff99ba9
      Eric Dumazet authored
      [ Upstream commit 2ea2f62c ]
      
      When I added pfmemalloc support in build_skb(), I forgot netlink
      was using build_skb() with a vmalloc() area.
      
      In this patch I introduce __build_skb() for netlink use,
      and build_skb() is a wrapper handling both skb->head_frag and
      skb->pfmemalloc
      
      This means netlink no longer has to hack skb->head_frag
      
      [ 1567.700067] kernel BUG at arch/x86/mm/physaddr.c:26!
      [ 1567.700067] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
      [ 1567.700067] Dumping ftrace buffer:
      [ 1567.700067]    (ftrace buffer empty)
      [ 1567.700067] Modules linked in:
      [ 1567.700067] CPU: 9 PID: 16186 Comm: trinity-c182 Not tainted 4.0.0-next-20150424-sasha-00037-g4796e21 #2167
      [ 1567.700067] task: ffff880127efb000 ti: ffff880246770000 task.ti: ffff880246770000
      [ 1567.700067] RIP: __phys_addr (arch/x86/mm/physaddr.c:26 (discriminator 3))
      [ 1567.700067] RSP: 0018:ffff8802467779d8  EFLAGS: 00010202
      [ 1567.700067] RAX: 000041000ed8e000 RBX: ffffc9008ed8e000 RCX: 000000000000002c
      [ 1567.700067] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffffb3fd6049
      [ 1567.700067] RBP: ffff8802467779f8 R08: 0000000000000019 R09: ffff8801d0168000
      [ 1567.700067] R10: ffff8801d01680c7 R11: ffffed003a02d019 R12: ffffc9000ed8e000
      [ 1567.700067] R13: 0000000000000f40 R14: 0000000000001180 R15: ffffc9000ed8e000
      [ 1567.700067] FS:  00007f2a7da3f700(0000) GS:ffff8801d1000000(0000) knlGS:0000000000000000
      [ 1567.700067] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1567.700067] CR2: 0000000000738308 CR3: 000000022e329000 CR4: 00000000000007e0
      [ 1567.700067] Stack:
      [ 1567.700067]  ffffc9000ed8e000 ffff8801d0168000 ffffc9000ed8e000 ffff8801d0168000
      [ 1567.700067]  ffff880246777a28 ffffffffad7c0a21 0000000000001080 ffff880246777c08
      [ 1567.700067]  ffff88060d302e68 ffff880246777b58 ffff880246777b88 ffffffffad9a6821
      [ 1567.700067] Call Trace:
      [ 1567.700067] build_skb (include/linux/mm.h:508 net/core/skbuff.c:316)
      [ 1567.700067] netlink_sendmsg (net/netlink/af_netlink.c:1633 net/netlink/af_netlink.c:2329)
      [ 1567.774369] ? sched_clock_cpu (kernel/sched/clock.c:311)
      [ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273)
      [ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273)
      [ 1567.774369] sock_sendmsg (net/socket.c:614 net/socket.c:623)
      [ 1567.774369] sock_write_iter (net/socket.c:823)
      [ 1567.774369] ? sock_sendmsg (net/socket.c:806)
      [ 1567.774369] __vfs_write (fs/read_write.c:479 fs/read_write.c:491)
      [ 1567.774369] ? get_lock_stats (kernel/locking/lockdep.c:249)
      [ 1567.774369] ? default_llseek (fs/read_write.c:487)
      [ 1567.774369] ? vtime_account_user (kernel/sched/cputime.c:701)
      [ 1567.774369] ? rw_verify_area (fs/read_write.c:406 (discriminator 4))
      [ 1567.774369] vfs_write (fs/read_write.c:539)
      [ 1567.774369] SyS_write (fs/read_write.c:586 fs/read_write.c:577)
      [ 1567.774369] ? SyS_read (fs/read_write.c:577)
      [ 1567.774369] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
      [ 1567.774369] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2594 kernel/locking/lockdep.c:2636)
      [ 1567.774369] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:42)
      [ 1567.774369] system_call_fastpath (arch/x86/kernel/entry_64.S:261)
      
      Fixes: 79930f58 ("net: do not deplete pfmemalloc reserve")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      0ff99ba9
    • Eric Dumazet's avatar
      net: do not deplete pfmemalloc reserve · cfe7befc
      Eric Dumazet authored
      [ Upstream commit 79930f58 ]
      
      build_skb() should look at the page pfmemalloc status.
      If set, this means page allocator allocated this page in the
      expectation it would help to free other pages. Networking
      stack can do that only if skb->pfmemalloc is also set.
      
      Also, we must refrain from using high order pages from the pfmemalloc
      reserve, so __page_frag_refill() must also use __GFP_NOMEMALLOC for
      them. Under memory pressure, using order-0 pages is probably the best
      strategy.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      cfe7befc