1. 20 Jun, 2014 33 commits
    • NeilBrown's avatar
      md: always set MD_RECOVERY_INTR when interrupting a reshape thread. · ab41ac4c
      NeilBrown authored
      commit 2ac295a5 upstream.
      
      Commit 8313b8e5
         md: fix problem when adding device to read-only array with bitmap.
      
added a call to md_reap_sync_thread() which causes a reshape thread
to be interrupted (in particular, it could cause md_thread() to never even
call md_do_sync()).
However it didn't set MD_RECOVERY_INTR, so ->finish_reshape() would not
know that the reshape didn't complete.
      
This only happens when mddev->ro is set and normally reshape threads
don't run in that situation.  But raid5 and raid10 can start a reshape
thread during "run" if the array is in the middle of a reshape.
They do this even if ->ro is set.

So it is best to set MD_RECOVERY_INTR before aborting the
sync thread, just in case.
      
Though it is rare for this to trigger a problem, it can cause data
corruption because the reshape isn't finished properly.
So it is suitable for any stable kernel to which the offending commit
was applied (3.2 or later).
      
Fixes: 8313b8e5
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      ab41ac4c
    • NeilBrown's avatar
      md: always set MD_RECOVERY_INTR when aborting a reshape or other "resync". · 73e620da
      NeilBrown authored
      commit 3991b31e upstream.
      
If mddev->ro is set, md_do_sync will (correctly) abort.
      However in that case MD_RECOVERY_INTR isn't set.
      
      If a RESHAPE had been requested, then ->finish_reshape() will be
      called and it will think the reshape was successful even though
      nothing happened.
      
      Normally a resync will not be requested if ->ro is set, but if an
      array is stopped while a reshape is on-going, then when the array is
      started, the reshape will be restarted.  If the array is also set
read-only at this point, the reshape will instantly appear to succeed,
      resulting in data corruption.
      
      Consequently, this patch is suitable for any -stable kernel.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      73e620da
    • Martin K. Petersen's avatar
      libata: Blacklist queued trim for Crucial M500 · dc99f5d6
      Martin K. Petersen authored
      commit 3b8d2676 upstream.
      
      Queued trim only works for some users with MU05 firmware.  Revert to
      blacklisting all firmware versions.
      
      Introduced by commit d121f7d0 ("libata: Update queued trim blacklist
      for M5x0 drives") which this effectively reverts, while retaining the
      blacklisting of M550.
      
      See
      
          https://bugzilla.kernel.org/show_bug.cgi?id=71371
      
      for reports of trouble with MU05 firmware.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      dc99f5d6
    • Chris Wilson's avatar
      drm/i915: Only copy back the modified fields to userspace from execbuffer · f1daac89
      Chris Wilson authored
      commit 9aab8bff upstream.
      
We only want to modify a single field in the userspace view of the
execbuffer command buffer, so explicitly change that rather than copy
everything back again.
      
      This serves two purposes:
      
1. The single fields are much cheaper to copy (constant size, so the
copy uses special-case code) and much smaller than the whole array.

2. We modify the array for internal use, and those changes need to be
masked from the user.
      
      Note: We need this backported since without it the next bugfix will
      blow up when userspace recycles batchbuffers and relocations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      f1daac89
    • Christian König's avatar
      9292af25
    • Lai Jiangshan's avatar
      sched: Fix hotplug vs. set_cpus_allowed_ptr() · 774f680a
      Lai Jiangshan authored
      commit 6acbfb96 upstream.
      
      Lai found that:
      
        WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b()
        ...
        migration_cpu_stop+0x1d/0x22
      
      was caused by set_cpus_allowed_ptr() assuming that cpu_active_mask is
      always a sub-set of cpu_online_mask.
      
      This isn't true since 5fbd036b ("sched: Cleanup cpu_active madness").
      
      So set active and online at the same time to avoid this particular
      problem.
      
      Fixes: 5fbd036b ("sched: Cleanup cpu_active madness")
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael wang <wangyun@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/53758B12.8060609@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      774f680a
    • Heinz Mauelshagen's avatar
      dm cache: always split discards on cache block boundaries · 3ae4e547
      Heinz Mauelshagen authored
      commit f1daa838 upstream.
      
      The DM cache target cannot cope with discards that span multiple cache
      blocks, so each discard bio that spans more than one cache block must
      get split by the DM core.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      3ae4e547
    • Bibek Basu's avatar
      cpufreq: remove race while accessing cur_policy · 3ddbd9a2
      Bibek Basu authored
      commit c5450db8 upstream.
      
While executing the CPUFREQ_GOV_START, CPUFREQ_GOV_STOP and
CPUFREQ_GOV_LIMITS events, cur_policy is accessed without taking
the dbs_data->mutex lock, which leads to a race and data corruption
while running a continuous suspend/resume test. This is seen with
the ondemand governor in a suspend/resume test using rtcwake.
      
       Unable to handle kernel NULL pointer dereference at virtual address 00000028
       pgd = ed610000
       [00000028] *pgd=adf11831, *pte=00000000, *ppte=00000000
       Internal error: Oops: 17 [#1] PREEMPT SMP ARM
       Modules linked in: nvhost_vi
       CPU: 1 PID: 3243 Comm: rtcwake Not tainted 3.10.24-gf5cf9e5 #1
       task: ee708040 ti: ed61c000 task.ti: ed61c000
       PC is at cpufreq_governor_dbs+0x400/0x634
       LR is at cpufreq_governor_dbs+0x3f8/0x634
       pc : [<c05652b8>] lr : [<c05652b0>] psr: 600f0013
       sp : ed61dcb0 ip : 000493e0 fp : c1cc14f0
       r10: 00000000 r9 : 00000000 r8 : 00000000
       r7 : eb725280 r6 : c1cc1560 r5 : eb575200 r4 : ebad7740
       r3 : ee708040 r2 : ed61dca8 r1 : 001ebd24 r0 : 00000000
       Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
       Control: 10c5387d Table: ad61006a DAC: 00000015
       [<c05652b8>] (cpufreq_governor_dbs+0x400/0x634) from [<c055f700>] (__cpufreq_governor+0x98/0x1b4)
       [<c055f700>] (__cpufreq_governor+0x98/0x1b4) from [<c0560770>] (__cpufreq_set_policy+0x250/0x320)
       [<c0560770>] (__cpufreq_set_policy+0x250/0x320) from [<c0561dcc>] (cpufreq_update_policy+0xcc/0x168)
       [<c0561dcc>] (cpufreq_update_policy+0xcc/0x168) from [<c0561ed0>] (cpu_freq_notify+0x68/0xdc)
       [<c0561ed0>] (cpu_freq_notify+0x68/0xdc) from [<c008eff8>] (notifier_call_chain+0x4c/0x8c)
       [<c008eff8>] (notifier_call_chain+0x4c/0x8c) from [<c008f3d4>] (__blocking_notifier_call_chain+0x50/0x68)
       [<c008f3d4>] (__blocking_notifier_call_chain+0x50/0x68) from [<c008f40c>] (blocking_notifier_call_chain+0x20/0x28)
       [<c008f40c>] (blocking_notifier_call_chain+0x20/0x28) from [<c00aac6c>] (pm_qos_update_bounded_target+0xd8/0x310)
       [<c00aac6c>] (pm_qos_update_bounded_target+0xd8/0x310) from [<c00ab3b0>] (__pm_qos_update_request+0x64/0x70)
       [<c00ab3b0>] (__pm_qos_update_request+0x64/0x70) from [<c004b4b8>] (tegra_pm_notify+0x114/0x134)
       [<c004b4b8>] (tegra_pm_notify+0x114/0x134) from [<c008eff8>] (notifier_call_chain+0x4c/0x8c)
       [<c008eff8>] (notifier_call_chain+0x4c/0x8c) from [<c008f3d4>] (__blocking_notifier_call_chain+0x50/0x68)
       [<c008f3d4>] (__blocking_notifier_call_chain+0x50/0x68) from [<c008f40c>] (blocking_notifier_call_chain+0x20/0x28)
       [<c008f40c>] (blocking_notifier_call_chain+0x20/0x28) from [<c00ac228>] (pm_notifier_call_chain+0x1c/0x34)
       [<c00ac228>] (pm_notifier_call_chain+0x1c/0x34) from [<c00ad38c>] (enter_state+0xec/0x128)
       [<c00ad38c>] (enter_state+0xec/0x128) from [<c00ad400>] (pm_suspend+0x38/0xa4)
       [<c00ad400>] (pm_suspend+0x38/0xa4) from [<c00ac114>] (state_store+0x70/0xc0)
       [<c00ac114>] (state_store+0x70/0xc0) from [<c027b1e8>] (kobj_attr_store+0x14/0x20)
       [<c027b1e8>] (kobj_attr_store+0x14/0x20) from [<c019cd9c>] (sysfs_write_file+0x104/0x184)
       [<c019cd9c>] (sysfs_write_file+0x104/0x184) from [<c0143038>] (vfs_write+0xd0/0x19c)
       [<c0143038>] (vfs_write+0xd0/0x19c) from [<c0143414>] (SyS_write+0x4c/0x78)
       [<c0143414>] (SyS_write+0x4c/0x78) from [<c000f080>] (ret_fast_syscall+0x0/0x30)
       Code: e1a00006 eb084346 e59b0020 e5951024 (e5903028)
       ---[ end trace 0488523c8f6b0f9d ]---
Signed-off-by: Bibek Basu <bbasu@nvidia.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      3ddbd9a2
    • Rabin Vincent's avatar
      ARM: 8064/1: fix v7-M signal return · ae87686e
      Rabin Vincent authored
      commit 483a6c9d upstream.
      
      According to the ARM ARM, the behaviour is UNPREDICTABLE if the PC read
      from the exception return stack is not half word aligned.  See the
      pseudo code for ExceptionReturn() and PopStack().
      
      The signal handler's address has the bit 0 set, and setup_return()
      directly writes this to regs->ARM_pc.  Current hardware happens to
      discard this bit, but QEMU's emulation doesn't and this makes processes
      crash.  Mask out bit 0 before the exception return in order to get
      predictable behaviour.
      
      Fixes: 19c4d593 ("ARM: ARMv7-M: Add support for exception handling")
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      ae87686e
    • Andrey Ryabinin's avatar
      ARM: 8051/1: put_user: fix possible data corruption in put_user · add232a1
      Andrey Ryabinin authored
      commit 537094b6 upstream.
      
According to the ARM procedure call standard, the r2 register is
call-clobbered. So after the result of the x expression is put into
r2, any function call in the evaluation of p may overwrite r2. To fix
this, the result of the p expression must be saved to a temporary
variable before the x expression is assigned to __r2.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      add232a1
    • Laurent Pinchart's avatar
      ARM: OMAP3: clock: Back-propagate rate change from cam_mclk to dpll4_m5 on all OMAP3 platforms · 7e84ef36
      Laurent Pinchart authored
      commit 98d7e1ae upstream.
      
Commit 7b2e1277 ("ARM: OMAP3: clock:
Back-propagate rate change from cam_mclk to dpll4_m5") enabled clock
rate back-propagation from cam_mclk to dpll4_m5 on OMAP3630 only.
      Perform back-propagation on other OMAP3 platforms as well.
Reported-by: Jean-Philippe François <jp.francois@cynove.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      7e84ef36
    • Peter Ujfalusi's avatar
      ARM: omap5: hwmod_data: Correct IDLEMODE for McPDM · d0c827e6
      Peter Ujfalusi authored
      commit 0f9e19ad upstream.
      
McPDM needs to be configured in NO_IDLE mode when it is in use,
otherwise vital clocks will be gated, which results in 'slow motion'
audio playback.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      d0c827e6
    • Emil Goode's avatar
      ARM: imx: fix error handling in ipu device registration · 93a5cb6c
      Emil Goode authored
      commit d1d70e5d upstream.
      
If we fail to allocate the struct platform_device pdev, we
dereference it after the goto label err.
      
      This bug was found using coccinelle.
      
      Fixes: afa77ef3 (ARM: mx3: dynamically allocate "ipu-core" devices)
Signed-off-by: Emil Goode <emilgoode@gmail.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      93a5cb6c
    • Joe Lawrence's avatar
      SCSI: scsi_transport_sas: move bsg destructor into sas_rphy_remove · 517e5981
      Joe Lawrence authored
      commit 6aa6caff upstream.
      
      The recent change in sysfs, bcdde7e2
      "sysfs: make __sysfs_remove_dir() recursive" revealed an asymmetric
      rphy device creation/deletion sequence in scsi_transport_sas:
      
        modprobe mpt2sas
          sas_rphy_add
            device_add A               rphy->dev
            device_add B               sas_device transport class
            device_add C               sas_end_device transport class
            device_add D               bsg class
      
        rmmod mpt2sas
          sas_rphy_delete
            sas_rphy_remove
              device_del B
              device_del C
              device_del A
                sysfs_remove_group     recursive sysfs dir removal
            sas_rphy_free
              device_del D             warning
      
        where device A is the parent of B, C, and D.
      
      When sas_rphy_free tries to unregister the bsg request queue (device D
      above), the ensuing sysfs cleanup discovers that its sysfs group has
      already been removed and emits a warning, "sysfs group... not found for
      kobject 'end_device-X:0'".
      
      Since bsg creation is a side effect of sas_rphy_add, move its
      complementary removal call into sas_rphy_remove. This imposes the
      following tear-down order for the devices above: D, B, C, A.
      
      Note the sas_device and sas_end_device transport class devices (B and C
      above) are created and destroyed both via the list match traversal in
      attribute_container_device_trigger, so the order in which they are
      handled is fixed. This is fine as long as they are deleted before their
      parent device.
Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      517e5981
    • Alex Deucher's avatar
      drm/radeon: handle non-VGA class pci devices with ATRM · 7d683054
      Alex Deucher authored
      commit d8ade352 upstream.
      
      Newer PX systems have non-VGA pci class dGPUs.  Update
      the ATRM fetch method to handle those cases.
      
Bug:
https://bugzilla.kernel.org/show_bug.cgi?id=75401
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      7d683054
    • Christian König's avatar
      drm/radeon: also try GART for CPU accessed buffers · bac59d1b
      Christian König authored
      commit 54409259 upstream.
      
      Placing them exclusively into VRAM might not work all the time.
      
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=78297
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      bac59d1b
    • Ben Skeggs's avatar
      drm/gf119-/disp: fix nasty bug which can clobber SOR0's clock setup · afb44e17
      Ben Skeggs authored
      commit 0f1d360b upstream.
      
      Fixes a LVDS bleed issue on Lenovo W530 that can occur under a
      number of circumstances.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      afb44e17
    • Jean Delvare's avatar
      hwmon: (ntc_thermistor) Fix OF device ID mapping · 1b2b80af
      Jean Delvare authored
      commit ead82d67 upstream.
      
      The mapping from OF device IDs to platform device IDs is wrong.
      TYPE_NCPXXWB473 is 0, TYPE_NCPXXWL333 is 1, so
      ntc_thermistor_id[TYPE_NCPXXWB473] is { "ncp15wb473", TYPE_NCPXXWB473 }
      while
      ntc_thermistor_id[TYPE_NCPXXWL333] is { "ncp18wb473", TYPE_NCPXXWB473 }.
      
      So the name is wrong for all but the "ntc,ncp15wb473" entry, and the
      type is wrong for the "ntc,ncp15wl333" entry.
      
      So map the entries by index, it is neither elegant nor robust but at
      least it is correct.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes: 9e8269de ("hwmon: (ntc_thermistor) Add DT with IIO support to NTC thermistor driver")
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Cc: Naveen Krishna Chatradhi <ch.naveen@samsung.com>
      Cc: Doug Anderson <dianders@chromium.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      1b2b80af
    • Jean Delvare's avatar
      hwmon: (ntc_thermistor) Fix dependencies · f8239ad5
      Jean Delvare authored
      commit 59cf4243 upstream.
      
      In commit 9e8269de, support was added for ntc_thermistor devices being
      declared in the device tree and implemented on top of IIO. With that
      change, a dependency was added to the ntc_thermistor driver:
      
      	depends on (!OF && !IIO) || (OF && IIO)
      
      This construct has the drawback that the driver can no longer be
      selected when OF is set and IIO isn't, nor when IIO is set and OF is
      not. This is a regression for the original users of the driver.
      
      As the new code depends on IIO and is useless without OF, include it
      only if both are enabled, and set the dependencies accordingly. This
      is clearer, more simple and more correct.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes: 9e8269de ("hwmon: (ntc_thermistor) Add DT with IIO support to NTC thermistor driver")
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Cc: Naveen Krishna Chatradhi <ch.naveen@samsung.com>
      Cc: Doug Anderson <dianders@chromium.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      f8239ad5
    • Johannes Berg's avatar
      Documentation: fix DOCBOOKS=... building · 4552eb24
      Johannes Berg authored
      commit e60cbeed upstream.
      
      Prior to commit 42661299 ("[media] DocBook: Move all media docbook
      stuff into its own directory") it was possible to build only a single
      (or more) book(s) by calling, for example
      
          make htmldocs DOCBOOKS=80211.xml
      
      This now fails:
      
          cp: target `.../Documentation/DocBook//media_api' is not a directory
      
      Ignore errors from that copy to make this possible again.
      
      Fixes: 42661299 ("[media] DocBook: Move all media docbook stuff into its own directory")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      4552eb24
    • Naoya Horiguchi's avatar
      mm/memory-failure.c: fix memory leak by race between poison and unpoison · 191df2f2
      Naoya Horiguchi authored
      commit 3e030ecc upstream.
      
      When a memory error happens on an in-use page or (free and in-use)
      hugepage, the victim page is isolated with its refcount set to one.
      
      When you try to unpoison it later, unpoison_memory() calls put_page()
      for it twice in order to bring the page back to free page pool (buddy or
      free hugepage list).  However, if another memory error occurs on the
      page which we are unpoisoning, memory_failure() returns without
      releasing the refcount which was incremented in the same call at first,
which results in a memory leak and inconsistent num_poisoned_pages
      statistics.  This patch fixes it.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      191df2f2
    • Peter Zijlstra's avatar
      perf: Fix race in removing an event · c8db9ba0
      Peter Zijlstra authored
      commit 46ce0fe9 upstream.
      
      When removing a (sibling) event we do:
      
      	raw_spin_lock_irq(&ctx->lock);
      	perf_group_detach(event);
      	raw_spin_unlock_irq(&ctx->lock);
      
      	<hole>
      
      	perf_remove_from_context(event);
      		raw_spin_lock_irq(&ctx->lock);
      		...
      		raw_spin_unlock_irq(&ctx->lock);
      
      Now, assuming the event is a sibling, it will be 'unreachable' for
      things like ctx_sched_out() because that iterates the
      groups->siblings, and we just unhooked the sibling.
      
      So, if during <hole> we get ctx_sched_out(), it will miss the event
      and not call event_sched_out() on it, leaving it programmed on the
      PMU.
      
      The subsequent perf_remove_from_context() call will find the ctx is
      inactive and only call list_del_event() to remove the event from all
      other lists.
      
      Hereafter we can proceed to free the event; while still programmed!
      
      Close this hole by moving perf_group_detach() inside the same
      ctx->lock region(s) perf_remove_from_context() has.
      
      The condition on inherited events only in __perf_event_exit_task() is
      likely complete crap because non-inherited events are part of groups
      too and we're tearing down just the same. But leave that for another
      patch.
      
      Most-likely-Fixes: e03a9a55 ("perf: Change close() semantics for group events")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Much-staring-at-traces-by: Vince Weaver <vincent.weaver@maine.edu>
Much-staring-at-traces-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140505093124.GN17778@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      c8db9ba0
    • Peter Zijlstra's avatar
      perf: Limit perf_event_attr::sample_period to 63 bits · ed8acba3
      Peter Zijlstra authored
      commit 0819b2e3 upstream.
      
      Vince reported that using a large sample_period (one with bit 63 set)
      results in wreckage since while the sample_period is fundamentally
      unsigned (negative periods don't make sense) the way we implement
      things very much rely on signed logic.
      
      So limit sample_period to 63 bits to avoid tripping over this.
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-p25fhunibl4y3qi0zuqmyf4b@git.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      ed8acba3
    • Jiri Olsa's avatar
      perf: Prevent false warning in perf_swevent_add · e36e8c8f
      Jiri Olsa authored
      commit 39af6b16 upstream.
      
      The perf cpu offline callback takes down all cpu context
      events and releases swhash->swevent_hlist.
      
      This could race with task context software event being just
      scheduled on this cpu via perf_swevent_add while cpu hotplug
      code already cleaned up event's data.
      
      The race happens in the gap between the cpu notifier code
      and the cpu being actually taken down. Note that only cpu
      ctx events are terminated in the perf cpu hotplug code.
      
      It's easily reproduced with:
        $ perf record -e faults perf bench sched pipe
      
      while putting one of the cpus offline:
        # echo 0 > /sys/devices/system/cpu/cpu1/online
      
The console emits the following warning:
        WARNING: CPU: 1 PID: 2845 at kernel/events/core.c:5672 perf_swevent_add+0x18d/0x1a0()
        Modules linked in:
        CPU: 1 PID: 2845 Comm: sched-pipe Tainted: G        W    3.14.0+ #256
        Hardware name: Intel Corporation Montevina platform/To be filled by O.E.M., BIOS AMVACRB1.86C.0066.B00.0805070703 05/07/2008
         0000000000000009 ffff880077233ab8 ffffffff81665a23 0000000000200005
         0000000000000000 ffff880077233af8 ffffffff8104732c 0000000000000046
         ffff88007467c800 0000000000000002 ffff88007a9cf2a0 0000000000000001
        Call Trace:
         [<ffffffff81665a23>] dump_stack+0x4f/0x7c
         [<ffffffff8104732c>] warn_slowpath_common+0x8c/0xc0
         [<ffffffff8104737a>] warn_slowpath_null+0x1a/0x20
         [<ffffffff8110fb3d>] perf_swevent_add+0x18d/0x1a0
         [<ffffffff811162ae>] event_sched_in.isra.75+0x9e/0x1f0
         [<ffffffff8111646a>] group_sched_in+0x6a/0x1f0
         [<ffffffff81083dd5>] ? sched_clock_local+0x25/0xa0
         [<ffffffff811167e6>] ctx_sched_in+0x1f6/0x450
         [<ffffffff8111757b>] perf_event_sched_in+0x6b/0xa0
         [<ffffffff81117a4b>] perf_event_context_sched_in+0x7b/0xc0
         [<ffffffff81117ece>] __perf_event_task_sched_in+0x43e/0x460
         [<ffffffff81096f1e>] ? put_lock_stats.isra.18+0xe/0x30
         [<ffffffff8107b3c8>] finish_task_switch+0xb8/0x100
         [<ffffffff8166a7de>] __schedule+0x30e/0xad0
         [<ffffffff81172dd2>] ? pipe_read+0x3e2/0x560
         [<ffffffff8166b45e>] ? preempt_schedule_irq+0x3e/0x70
         [<ffffffff8166b45e>] ? preempt_schedule_irq+0x3e/0x70
         [<ffffffff8166b464>] preempt_schedule_irq+0x44/0x70
         [<ffffffff816707f0>] retint_kernel+0x20/0x30
         [<ffffffff8109e60a>] ? lockdep_sys_exit+0x1a/0x90
         [<ffffffff812a4234>] lockdep_sys_exit_thunk+0x35/0x67
         [<ffffffff81679321>] ? sysret_check+0x5/0x56
      
Fix this by tracking the cpu hotplug state and emitting the WARN
only if the current cpu is initialized properly.
      
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1396861448-10097-1-git-send-email-jolsa@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      e36e8c8f
    • Thomas Gleixner's avatar
      sched: Sanitize irq accounting madness · 3cca5deb
      Thomas Gleixner authored
      commit 2d513868 upstream.
      
      Russell reported, that irqtime_account_idle_ticks() takes ages due to:
      
             for (i = 0; i < ticks; i++)
                     irqtime_account_process_tick(current, 0, rq);
      
It's sad that this code was written way _AFTER_ the NOHZ idle
functionality was available. I charge myself guilty for not paying
attention when that crap got merged with commit abb74cef ("sched:
Export ns irqtimes through /proc/stat").
      
      So instead of looping nr_ticks times just apply the whole thing at
      once.
      
      As a side note: The whole cputime_t vs. u64 business in that context
      wants to be cleaned up as well. There is no point in having all these
      back and forth conversions. Lets standardise on u64 nsec for all
      kernel internal accounting and be done with it. Everything else does
      not make sense at all for fine grained accounting. Frederic, can you
      please take care of that?
Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Shaun Ruffell <sruffell@digium.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1405022307000.6261@ionos.tec.linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      3cca5deb
    • Steven Rostedt (Red Hat)'s avatar
      sched: Use CPUPRI_NR_PRIORITIES instead of MAX_RT_PRIO in cpupri check · 21af1041
      Steven Rostedt (Red Hat) authored
      commit 6227cb00 upstream.
      
The check at the beginning of cpupri_find() makes sure that the task_pri
variable does not exceed the cp->pri_to_cpu array length. But that length
is CPUPRI_NR_PRIORITIES, not MAX_RT_PRIO, so the check misses the last
two priorities in that array.
      
      As task_pri is computed from convert_prio() which should never be bigger
      than CPUPRI_NR_PRIORITIES, if the check should cause a panic if it is
      hit.
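      A minimal sketch of the corrected bound, assuming MAX_RT_PRIO = 100 as in the kernel headers; the function name and the use of assert() in place of BUG_ON() are illustrative only:

```c
#include <assert.h>

#define MAX_RT_PRIO 100
#define CPUPRI_NR_PRIORITIES (MAX_RT_PRIO + 2) /* as in kernel/sched/cpupri.h */

/* Returns 1 when task_pri is a legal index into pri_to_cpu[]. The old
 * bailout compared against MAX_RT_PRIO and silently skipped the last two
 * slots; the fix treats anything >= CPUPRI_NR_PRIORITIES as a hard bug. */
static int task_pri_in_range(int task_pri)
{
	assert(task_pri < CPUPRI_NR_PRIORITIES); /* stands in for BUG_ON() */
	return 1;
}
```

      With the old bound, indexes MAX_RT_PRIO and MAX_RT_PRIO + 1 were treated as "no CPU found" instead of being searched.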
      Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1397015410.5212.13.camel@marge.simpson.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      21af1041
    • iommu/vt-d: Fix missing IOTLB flush in intel_iommu_unmap() · 4801cb6b
      David Woodhouse authored
      This is a small excerpt of the upstream commit
      ea8ea460 (iommu/vt-d: Clean up and fix
      page table clear/free behaviour).
      
      The missing IOTLB flush was added as a minor, inconsequential bug fix
      in commit ea8ea460 ("iommu/vt-d: Clean up and fix page table clear/free
      behaviour") in 3.15. It wasn't originally intended for -stable but a
      couple of users have reported issues which turn out to be fixed by
      adding the missing flush.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      4801cb6b
    • ARM: perf: hook up perf_sample_event_took around pmu irq handling · 22259d1f
      Will Deacon authored
      commit 5f5092e7 upstream.
      
      Since we indirect all of our PMU IRQ handling through a dispatcher, it's
      trivial to hook up perf_sample_event_took to prevent applications such
      as oprofile from generating interrupt storms due to an unrealistically
      low sample period.
      Reported-by: Robert Richter <rric@kernel.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      22259d1f
    • mm/compaction: make isolate_freepages start at pageblock boundary · 4ce13868
      Vlastimil Babka authored
      commit 49e068f0 upstream.
      
      The compaction freepage scanner implementation in isolate_freepages()
      starts by taking the current cc->free_pfn value as the first pfn.  In a
      for loop, it scans from this first pfn to the end of the pageblock, and
      then subtracts pageblock_nr_pages from the first pfn to obtain the first
      pfn for the next for loop iteration.
      
      This means that when cc->free_pfn starts at offset X rather than being
      aligned on pageblock boundary, the scanner will start at offset X in all
      scanned pageblocks, ignoring potentially many free pages.  Currently this
      can happen when
      
       a) zone's end pfn is not pageblock aligned, or
      
       b) through zone->compact_cached_free_pfn with CONFIG_HOLES_IN_ZONE
          enabled and a hole spanning the beginning of a pageblock
      
      This patch fixes the problem by aligning the initial pfn in
      isolate_freepages() to pageblock boundary.  This also permits replacing
      the end-of-pageblock alignment within the for loop with a simple
      pageblock_nr_pages increment.
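      The alignment itself is a one-liner. Here is a sketch with an illustrative pageblock size (the real pageblock_nr_pages is derived from pageblock_order); the function name is made up, and the trick relies on the size being a power of two:

```c
#include <stdint.h>

#define PAGEBLOCK_NR_PAGES 512ULL /* illustrative order-9 pageblock; power of two */

/* Round a pfn down to its pageblock boundary, as the fix now does for
 * the initial cc->free_pfn in isolate_freepages(). Without this, a
 * free_pfn at offset X kept every subsequent scan starting at offset X. */
static uint64_t pageblock_start_pfn(uint64_t pfn)
{
	return pfn & ~(PAGEBLOCK_NR_PAGES - 1);
}
```

      For example, a cached free_pfn of 1000 is pulled back to 512, so the scanner covers the whole pageblock instead of only its tail.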
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Heesub Shin <heesub.shin@samsung.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Dongjun Shin <d.j.shin@samsung.com>
      Cc: Sunghwan Yun <sunghwan.yun@samsung.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      4ce13868
    • mm: compaction: detect when scanners meet in isolate_freepages · 31662557
      Vlastimil Babka authored
      commit 7ed695e0 upstream.
      
      Compaction of a zone is finished when the migrate scanner (which begins
      at the zone's lowest pfn) meets the free page scanner (which begins at
      the zone's highest pfn).  This is detected in compact_zone() and in the
      case of direct compaction, the compact_blockskip_flush flag is set so
      that kswapd later resets the cached scanner pfn's, and a new compaction
      may again start at the zone's borders.
      
      The meeting of the scanners can happen during either scanner's activity.
      However, it may currently fail to be detected when it occurs in the free
      page scanner, due to two problems.  First, isolate_freepages() keeps
      free_pfn at the highest block where it isolated pages from, for the
      purposes of not missing the pages that are returned back to allocator
      when migration fails.  Second, failing to isolate enough free pages due
      to scanners meeting results in -ENOMEM being returned by
      migrate_pages(), which makes compact_zone() bail out immediately without
      calling compact_finished() that would detect scanners meeting.
      
      This failure to detect scanners meeting might result in repeated
      attempts at compaction of a zone that keep starting from the cached
      pfn's close to the meeting point, and quickly failing through the
      -ENOMEM path, without the cached pfns being reset, over and over.  This
      has been observed (through additional tracepoints) in the third phase of
      the mmtests stress-highalloc benchmark, where the allocator runs on an
      otherwise idle system.  The problem was observed in the DMA32 zone,
      which was used as a fallback to the preferred Normal zone, but on the
      4GB system it was actually the largest zone.  The problem is even
      amplified for such a fallback zone - the deferred compaction logic, which
      could (after being fixed by a previous patch) reset the cached scanner
      pfn's, is only applied to the preferred zone and not to the fallbacks.
      
      The problem in the third phase of the benchmark was further amplified by
      commit 81c0a2bb ("mm: page_alloc: fair zone allocator policy") which
      resulted in a non-deterministic regression of the allocation success
      rate from ~85% to ~65%.  This occurs in about half of benchmark runs,
      making bisection problematic.  It is unlikely that the commit itself is
      buggy, but it should put more pressure on the DMA32 zone during phases 1
      and 2, which may leave it more fragmented in phase 3 and expose the bugs
      that this patch fixes.
      
      The fix is to make scanners meeting in isolate_freepages() stay that way,
      and to check in compact_zone() for scanners meeting when migrate_pages()
      returns -ENOMEM.  The result is that compact_finished() also detects
      scanners meeting and sets the compact_blockskip_flush flag to make
      kswapd reset the scanner pfn's.
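      The meeting condition reduces to a single comparison. This sketch uses a made-up function name; in the real code the check lives in compact_finished() and, after this fix, is also performed in compact_zone() on the -ENOMEM path:

```c
/* The free scanner works downward from free_pfn, the migrate scanner
 * upward from migrate_pfn; compaction of the zone is done once they
 * cross. The fix keeps free_pfn at or below migrate_pfn once they meet,
 * so this predicate stays true and can be observed even after
 * migrate_pages() has failed with -ENOMEM. */
static int compact_scanners_met(unsigned long free_pfn,
				unsigned long migrate_pfn)
{
	return free_pfn <= migrate_pfn;
}
```

      Before the fix, isolate_freepages() could move free_pfn back above migrate_pfn, so the crossing was lost and the cached pfn's were never reset.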
      
      The results in stress-highalloc benchmark show that the "regression" by
      commit 81c0a2bb in phase 3 no longer occurs, and phase 1 and 2
      allocation success rates are also significantly improved.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      31662557
    • mm: compaction: reset cached scanner pfn's before reading them · 948ec1db
      Vlastimil Babka authored
      commit d3132e4b upstream.
      
      Compaction caches pfn's for its migrate and free scanners to avoid
      scanning the whole zone each time.  In compact_zone(), the cached values
      are read to set up initial values for the scanners.  There are several
      situations when these cached pfn's are reset to the first and last pfn
      of the zone, respectively.  One of these situations is when a compaction
      has been deferred for a zone and is now being restarted during a direct
      compaction, which is also done in compact_zone().
      
      However, compact_zone() currently reads the cached pfn's *before*
      resetting them.  This means the reset doesn't affect the compaction that
      performs it, and with good chance also subsequent compactions, as
      update_pageblock_skip() is likely to be called and update the cached
      pfn's to those being processed.  Another chance for a successful reset
      is when a direct compaction detects that migration and free scanners
      meet (which has its own problems addressed by another patch) and sets
      update_pageblock_skip flag which kswapd uses to do the reset because it
      goes to sleep.
      
      This is clearly a bug that results in non-deterministic behavior, so
      this patch moves the cached pfn reset to be performed *before* the
      values are read.
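      The ordering bug boils down to reset-after-read versus reset-before-read. A minimal sketch, with all names invented for illustration (the real fields live in struct zone and struct compact_control):

```c
/* Cached scanner positions, as a stand-in for the zone's cached pfn's. */
struct cached_pfns { unsigned long migrate_pfn, free_pfn; };

/* Fixed ordering: perform the deferred-compaction reset *before* reading
 * the cached values, so the restarted compaction really begins at the
 * zone boundaries instead of the stale cached positions. */
static void compact_zone_setup(struct cached_pfns *c,
			       unsigned long zone_start, unsigned long zone_end,
			       unsigned long *migrate_pfn, unsigned long *free_pfn)
{
	c->migrate_pfn = zone_start;	/* reset first ... */
	c->free_pfn = zone_end;
	*migrate_pfn = c->migrate_pfn;	/* ... then read */
	*free_pfn = c->free_pfn;
}

/* Convenience check: returns 1 when a restart over stale cached values
 * really starts at the given zone boundaries. */
static int restart_starts_at_bounds(unsigned long zone_start,
				    unsigned long zone_end)
{
	struct cached_pfns c = { 400UL, 500UL };	/* stale cached positions */
	unsigned long m, f;

	compact_zone_setup(&c, zone_start, zone_end, &m, &f);
	return m == zone_start && f == zone_end;
}
```

      With the buggy order, the scanners would start at the stale 400/500 values and only a *later* compaction would benefit from the reset.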
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      948ec1db
    • target: Fix NULL pointer dereference for XCOPY in target_put_sess_cmd · 9c7e0735
      Nicholas Bellinger authored
      commit 0ed6e189 upstream.
      
      This patch fixes a NULL pointer dereference regression bug that was
      introduced with:
      
      commit 1e1110c4
      Author: Mikulas Patocka <mpatocka@redhat.com>
      Date:   Sat May 17 06:49:22 2014 -0400
      
          target: fix memory leak on XCOPY
      
      Now that target_put_sess_cmd() -> kref_put_spinlock_irqsave() is
      called with a valid se_cmd->cmd_kref, a NULL pointer dereference
      is triggered because the XCOPY passthrough commands don't have
      an associated se_session pointer.
      
      To address this bug, go ahead and check for a NULL se_sess pointer
      within target_put_sess_cmd(), and call se_cmd->se_tfo->release_cmd()
      to release the XCOPY's xcopy_pt_cmd memory.
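      The shape of the guard can be sketched in a few lines. Every name here is an illustrative stand-in for the real TCM types (struct se_session, se_cmd->se_tfo->release_cmd()), and the kref machinery is reduced to a counter:

```c
#include <stddef.h>

/* Illustrative stand-in for struct se_session. */
struct sess_sketch { int active_cmds; };

/* Sketch of the fix: XCOPY passthrough commands carry no se_session, so
 * release the command directly rather than dereferencing NULL. Sets
 * *released when the direct-release path was taken. */
static int put_sess_cmd_sketch(struct sess_sketch *se_sess, int *released)
{
	if (se_sess == NULL) {
		*released = 1;	/* se_cmd->se_tfo->release_cmd() in the real code */
		return 1;
	}
	se_sess->active_cmds--;	/* heavily simplified stand-in for the kref path */
	return 0;
}

/* Check that a session-less (XCOPY-style) put releases the command
 * instead of crashing. */
static int xcopy_put_is_safe(void)
{
	int released = 0;

	return put_sess_cmd_sketch(NULL, &released) == 1 && released == 1;
}
```

      Before the fix, the equivalent of the `se_sess == NULL` branch was missing, so the session pointer was dereferenced unconditionally.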
      Reported-by: Thomas Glanzmann <thomas@glanzmann.de>
      Cc: Thomas Glanzmann <thomas@glanzmann.de>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org # 3.12+
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      9c7e0735
    • btrfs: fix defrag 32-bit integer overflow · 6e8451cb
      Justin Maggard authored
      commit c41570c9 upstream.
      
      When defragging a very large file, the cluster variable can wrap its 32-bit
      signed int type and become negative, which eventually gets passed to
      btrfs_force_ra() as a very large unsigned long value.  On 32-bit platforms,
      this eventually results in an Oops from the SLAB allocator.
      
      Change the cluster and max_cluster signed int variables to unsigned long to
      match the readahead functions.  This also allows the min() comparison in
      btrfs_defrag_file() to work as intended.
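      The wrap is easy to demonstrate in isolation. The two function names below are made up for the example; they model the old signed-int `cluster` type against the widened type used after the fix:

```c
#include <stdint.h>

/* Old behaviour: accumulating a page count in a signed 32-bit int (the
 * former type of `cluster`) wraps once the count exceeds INT32_MAX.
 * The conversion below is implementation-defined in C, but on the
 * two's-complement platforms the kernel runs on it goes negative. */
static int32_t cluster_as_int(uint64_t pages)
{
	return (int32_t)pages;
}

/* Fixed behaviour: a 64-bit/unsigned long count, matching the readahead
 * functions' parameter types, holds the value without wrapping. */
static uint64_t cluster_as_ulong(uint64_t pages)
{
	return pages;
}
```

      A defrag covering ~3 billion pages turns negative through the signed path, and that negative value is what eventually reached btrfs_force_ra() as a huge unsigned long.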
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      6e8451cb
  2. 18 Jun, 2014 7 commits