1. 07 Jun, 2017 12 commits
    • drm/i915: Restore has_fbc=1 for ILK-M · 27fe407c
      Ville Syrjälä authored
      Restore the lost has_fbc flag for mobile ILK.
      
      Cc: Carlos Santa <carlos.santa@intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Fixes: a1323380 ("drm/i915: Introduce GEN5_FEATURES for device info")
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170606133229.12439-1-ville.syrjala@linux.intel.com
      Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      (cherry picked from commit c2d1a0ce)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    • drm/i915: Workaround VLV/CHV DSI scanline counter hardware fail · 8f4d3809
      Ville Syrjälä authored
      The scanline counter is bonkers on VLV/CHV DSI. The scanline counter
      increment is not lined up with the start of vblank like it is on
      every other platform and output type. This causes problems for
      both the vblank timestamping and atomic update vblank evasion.
      
      On my FFRD8 machine at least, the scanline counter increment
      happens about 1/3 of a scanline ahead of the start of vblank (which
      is where all register latching happens still). That means we can't
      trust the scanline counter to tell us whether we're in vblank or not
      while we're on that particular line. In order to keep vblank
      timestamping in working condition when called from the vblank irq,
      we'll leave scanline_offset at one, which means that the entire
      line containing the start of vblank is considered to be inside
      the vblank.
      
      For the vblank evasion we'll need to consider that entire line
      to be bad, since we can't tell whether the registers already
      got latched or not. And we can't actually use the start of vblank
      interrupt to get us past that line as the interrupt would fire
      too soon, and then we'd end up waiting for the next start of vblank
      instead. One way around that would be using the frame start
      interrupt instead since that wouldn't fire until the next
      scanline, but that would require some bigger changes in the
      interrupt code. So for simplicity we'll just poll until we get
      past the bad line.
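
The "poll until we get past the bad line" approach can be sketched in plain C as below. This is only an illustration under stated assumptions, not the driver code: `struct sim_pipe`, `read_scanline()` and `wait_past_bad_line()` are hypothetical names, and the hardware counter is replaced by a simulated one that advances on every read.

```c
#include <assert.h>

/* Hypothetical stand-in for the hardware scanline counter: a simulated
 * pipe whose counter advances by one line each time it is read. */
struct sim_pipe {
    int scanline;   /* current scanline */
    int vtotal;     /* total number of lines per frame */
};

static int read_scanline(struct sim_pipe *pipe)
{
    int cur = pipe->scanline;

    pipe->scanline = (pipe->scanline + 1) % pipe->vtotal;
    return cur;
}

/* Poll the counter until it reports any line other than bad_line,
 * then return the line that was observed. */
static int wait_past_bad_line(struct sim_pipe *pipe, int bad_line)
{
    int line;

    assert(pipe->vtotal > 1);   /* otherwise we could never leave the line */

    do {
        line = read_scanline(pipe);
    } while (line == bad_line);

    return line;
}
```

If the counter is not on the bad line the function returns after a single read, so the polling cost is only paid while the counter actually sits on the untrusted line.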
      
      v2: Adjust the comments a bit
      
      Cc: stable@vger.kernel.org
      Cc: Jonas Aaberg <cja@gmx.net>
      Tested-by: Jonas Aaberg <cja@gmx.net>
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99086
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161215174734.28779-1-ville.syrjala@linux.intel.com
      Tested-by: Mika Kahola <mika.kahola@intel.com>
      Reviewed-by: Mika Kahola <mika.kahola@intel.com>
      (cherry picked from commit ec1b4ee2)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    • drm/i915: Fix logical inversion for gen4 quirking · 5857dbfa
      Chris Wilson authored
      The assertion that we want to make before disabling the pin of the pages
      for the unknown swizzling quirk is that the quirk is indeed active, and
      that the quirk is disabled before we do apply it to the pages.
      
      Fixes: 2c3a3f44 ("drm/i915: Fix pages pin counting around swizzle quirk")
      Fixes: 957870f9 ("drm/i915: Split out i915_gem_object_set_tiling()")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170521124014.27678-1-chris@chris-wilson.co.uk
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      (cherry picked from commit 20bb3771)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    • drm/i915: Guard against i915_ggtt_disable_guc() being invoked unconditionally · d90c9890
      Chris Wilson authored
      Commit 7c3f86b6 ("drm/i915: Invalidate the guc ggtt TLB upon
      insertion") added the restoration of the invalidation routine after the
      GuC was disabled, but missed that the GuC was unconditionally disabled
      when not used. This then overwrites the invalidate routine for the older
      chipsets, causing havoc and breaking resume as the most obvious victim.
      
      We place the guard inside i915_ggtt_disable_guc() to be backport
      friendly (the bug was introduced into v4.11) but it would be preferred
      to have more control over when this is called (i.e. do not try and
      tear down the data structures before we have enabled them). That should
      be true with the reorganisation of the guc loaders.
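
The idea of placing the guard inside the disable routine can be illustrated with a small standalone C sketch. All names here (`struct sim_ggtt`, `sim_enable_guc()`, `sim_disable_guc()` and the three invalidate routines) are hypothetical stand-ins, not the driver's symbols; the point is only that disabling becomes a no-op unless the GuC routine was actually installed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical invalidate routines; each returns a distinct id so that a
 * caller can tell which one is currently installed. */
typedef int (*sim_invalidate_fn)(void);

static int sim_legacy_invalidate(void)  { return 1; }
static int sim_default_invalidate(void) { return 2; }
static int sim_guc_invalidate(void)     { return 3; }

struct sim_ggtt {
    sim_invalidate_fn invalidate;
};

static void sim_enable_guc(struct sim_ggtt *ggtt)
{
    assert(ggtt != NULL);
    ggtt->invalidate = sim_guc_invalidate;
}

static void sim_disable_guc(struct sim_ggtt *ggtt)
{
    assert(ggtt != NULL);

    /* Guard: only restore the routine if the GuC one was installed, so an
     * unconditional call cannot overwrite an older chipset's routine. */
    if (ggtt->invalidate == sim_guc_invalidate)
        ggtt->invalidate = sim_default_invalidate;
}
```

With the guard in the callee, callers that disable unconditionally stay unchanged, which is what keeps such a fix easy to backport.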
      Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Fixes: 7c3f86b6 ("drm/i915: Invalidate the guc ggtt TLB upon insertion")
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Oscar Mateo <oscar.mateo@intel.com>
      Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
      Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
      Cc: <stable@vger.kernel.org> # v4.11+
      Link: http://patchwork.freedesktop.org/patch/msgid/20170531190514.3691-1-chris@chris-wilson.co.uk
      Reviewed-by: Michel Thierry <michel.thierry@intel.com>
      (cherry picked from commit cb60606d)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    • drm/i915: Always recompute watermarks when distrust_bios_wm is set, v2. · 4e3aed84
      Maarten Lankhorst authored
      On some systems there can be a race condition in which no crtc state is
      added to the first atomic commit. This results in all crtc's having a
      null DDB allocation, causing a FIFO underrun on any update until the
      first modeset.
      
      Changes since v1:
      - Do not take the connection_mutex, this is already done below.
      Reported-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Inspired-by: Mahesh Kumar <mahesh1.kumar@intel.com>
      Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Fixes: 98d39494 ("drm/i915/gen9: Compute DDB allocation at atomic
      check time (v4)")
      Cc: <stable@vger.kernel.org> # v4.8+
      Cc: Mahesh Kumar <mahesh1.kumar@intel.com>
      Cc: Matt Roper <matthew.d.roper@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170531154236.27180-1-maarten.lankhorst@linux.intel.com
      Reviewed-by: Mahesh Kumar <mahesh1.kumar@intel.com>
      Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
      (cherry picked from commit 367d73d2)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    • drm/i915: Prevent the system suspend complete optimization · 6ab92afc
      Imre Deak authored
      Since
      
      commit bac2a909
      Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Date:   Wed Jan 21 02:17:42 2015 +0100
      
          PCI / PM: Avoid resuming PCI devices during system suspend
      
      PCI devices will default to allowing the system suspend complete
      optimization where devices are not woken up during system suspend if
      they were already runtime suspended. This however breaks the i915/HDA
      drivers for two reasons:
      
      - The i915 driver has system suspend specific steps that it needs to
        run, that bring the device to a different state than its runtime
        suspended state.
      
      - The HDA driver's suspend handler requires power that it will request
        from the i915 driver's power domain handler. This in turn requires the
        i915 driver to runtime resume itself, but this won't be possible if the
        suspend complete optimization is in effect: in this case the i915
        runtime PM is disabled and trying to get an RPM reference returns
        -EACCES.
      
      Solve this by requiring the PCI/PM core to resume the device during
      system suspend which in effect disables the suspend complete optimization.
      
      Regardless of the above commit the optimization stayed disabled for DRM
      devices until
      
      commit d14d2a84
      Author: Lukas Wunner <lukas@wunner.de>
      Date:   Wed Jun 8 12:49:29 2016 +0200
      
          drm: Remove dev_pm_ops from drm_class
      
      so this patch is in practice a fix for this commit. Another reason for
      the bug staying hidden for so long is that the optimization for a device
      is disabled if it's disabled for any of its children devices. i915 may
      have a backlight device as its child which doesn't support runtime PM
      and so doesn't allow the optimization either.  So if this backlight
      device got registered the bug stayed hidden.
      
      Credits to Marta, Tomi and David who enabled pstore logging,
      that caught one instance of this issue across a suspend/
      resume-to-ram and Ville who remembered that the optimization was enabled
      for some devices at one point.
      
      The first WARN triggered by the problem:
      
      [ 6250.746445] WARNING: CPU: 2 PID: 17384 at drivers/gpu/drm/i915/intel_runtime_pm.c:2846 intel_runtime_pm_get+0x6b/0xd0 [i915]
      [ 6250.746448] pm_runtime_get_sync() failed: -13
      [ 6250.746451] Modules linked in: snd_hda_intel i915 vgem snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hda_codec_realtek snd_hda_codec_generic ghash_clmulni_intel e1000e snd_hda_codec snd_hwdep snd_hda_core ptp mei_me pps_core snd_pcm lpc_ich mei prime_numbers i2c_hid i2c_designware_platform i2c_designware_core [last unloaded: i915]
      [ 6250.746512] CPU: 2 PID: 17384 Comm: kworker/u8:0 Tainted: G     U  W       4.11.0-rc5-CI-CI_DRM_334+ #1
      [ 6250.746515] Hardware name:                  /NUC5i5RYB, BIOS RYBDWi35.86A.0362.2017.0118.0940 01/18/2017
      [ 6250.746521] Workqueue: events_unbound async_run_entry_fn
      [ 6250.746525] Call Trace:
      [ 6250.746530]  dump_stack+0x67/0x92
      [ 6250.746536]  __warn+0xc6/0xe0
      [ 6250.746542]  ? pci_restore_standard_config+0x40/0x40
      [ 6250.746546]  warn_slowpath_fmt+0x46/0x50
      [ 6250.746553]  ? __pm_runtime_resume+0x56/0x80
      [ 6250.746584]  intel_runtime_pm_get+0x6b/0xd0 [i915]
      [ 6250.746610]  intel_display_power_get+0x1b/0x40 [i915]
      [ 6250.746646]  i915_audio_component_get_power+0x15/0x20 [i915]
      [ 6250.746654]  snd_hdac_display_power+0xc8/0x110 [snd_hda_core]
      [ 6250.746661]  azx_runtime_resume+0x218/0x280 [snd_hda_intel]
      [ 6250.746667]  pci_pm_runtime_resume+0x76/0xa0
      [ 6250.746672]  __rpm_callback+0xb4/0x1f0
      [ 6250.746677]  ? pci_restore_standard_config+0x40/0x40
      [ 6250.746682]  rpm_callback+0x1f/0x80
      [ 6250.746686]  ? pci_restore_standard_config+0x40/0x40
      [ 6250.746690]  rpm_resume+0x4ba/0x740
      [ 6250.746698]  __pm_runtime_resume+0x49/0x80
      [ 6250.746703]  pci_pm_suspend+0x57/0x140
      [ 6250.746709]  dpm_run_callback+0x6f/0x330
      [ 6250.746713]  ? pci_pm_freeze+0xe0/0xe0
      [ 6250.746718]  __device_suspend+0xf9/0x370
      [ 6250.746724]  ? dpm_watchdog_set+0x60/0x60
      [ 6250.746730]  async_suspend+0x1a/0x90
      [ 6250.746735]  async_run_entry_fn+0x34/0x160
      [ 6250.746741]  process_one_work+0x1f2/0x6d0
      [ 6250.746749]  worker_thread+0x49/0x4a0
      [ 6250.746755]  kthread+0x107/0x140
      [ 6250.746759]  ? process_one_work+0x6d0/0x6d0
      [ 6250.746763]  ? kthread_create_on_node+0x40/0x40
      [ 6250.746768]  ret_from_fork+0x2e/0x40
      [ 6250.746778] ---[ end trace 102a62fd2160f5e6 ]---
      
      v2:
      - Use the new pci_dev->needs_resume flag, to avoid any overhead during
        the ->pm_prepare hook. (Rafael)
      
      v3:
      - Update commit message to reference the actual regressing commit.
        (Lukas)
      
      v4:
      - Rebase on v4 of patch 1/2.
      
      Fixes: d14d2a84 ("drm: Remove dev_pm_ops from drm_class")
      References: https://bugs.freedesktop.org/show_bug.cgi?id=100378
      References: https://bugs.freedesktop.org/show_bug.cgi?id=100770
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Marta Lofstedt <marta.lofstedt@intel.com>
      Cc: David Weinehall <david.weinehall@linux.intel.com>
      Cc: Tomi Sarvela <tomi.p.sarvela@intel.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Lukas Wunner <lukas@wunner.de>
      Cc: linux-pci@vger.kernel.org
      Cc: <stable@vger.kernel.org> # v4.10.x: 4d071c32 - PCI/PM: Add needs_resume flag
      Cc: <stable@vger.kernel.org> # v4.10.x
      Signed-off-by: Imre Deak <imre.deak@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reported-and-tested-by: Marta Lofstedt <marta.lofstedt@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1493726649-32094-2-git-send-email-imre.deak@intel.com
      (cherry picked from commit adfdf85d)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    • drm/i915/psr: disable psr2 for resolution greater than 32X20 · bd709898
      Nagaraju, Vathsala authored
      PSR1 was also being disabled for panel resolutions greater than 32X20
      (3200x2000). Add a PSR2 check so that only PSR2 panels with a resolution
      greater than 32X20 are blocked.
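
A minimal sketch of the intended condition, assuming the 3200x2000 limit quoted in this commit message: the resolution check should only reject a panel when it is a PSR2 panel. The function name `sim_psr_allowed()` and the macro names are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Limit taken from the commit message; macro names are made up here. */
#define SIM_PSR2_MAX_HDISPLAY 3200
#define SIM_PSR2_MAX_VDISPLAY 2000

/* Returns false only for PSR2 panels above the resolution limit.
 * PSR1 panels are not affected by the resolution check. */
static bool sim_psr_allowed(bool is_psr2, int hdisplay, int vdisplay)
{
    assert(hdisplay > 0 && vdisplay > 0);

    if (is_psr2 &&
        (hdisplay > SIM_PSR2_MAX_HDISPLAY || vdisplay > SIM_PSR2_MAX_VDISPLAY))
        return false;

    return true;
}
```

Without the `is_psr2` term the same check would also block PSR1 on large panels, which is the regression described above.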
      
      issue was introduced by
      commit-id : "acf45d11"
      commit message: "PSR2 is restricted to work with panel resolutions
      upto 3200x2000, move the check to intel_psr_match_conditions and fully
      block psr."
      
      v2: (Rodrigo)
         Add previous commit details which introduced the issue
      
      Fixes: acf45d11 ("drm/i915/psr: disable psr2 for resolution greater than 32X20")
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Jim Bride <jim.bride@linux.intel.com>
      Cc: Yaroslav Shabalin <yaroslav.shabalin@gmail.com>
      Reported-by: Yaroslav Shabalin <yaroslav.shabalin@gmail.com>
      Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Signed-off-by: vathsala nagaraju <vathsala.nagaraju@intel.com>
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/49935bdff896ee3140bed471012b9f9110a863a4.1495729964.git.vathsala.nagaraju@intel.com
      (cherry picked from commit bef8c056)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    • drm/i915: Hold a wakeref for probing the ring registers · d9533f19
      Chris Wilson authored
      Allow intel_engine_is_idle() to be called outside of the GT wakeref by
      acquiring the device runtime pm for ourselves. This allows the function
      to act as a check after we assume the engine is idle and we release the GT
      wakeref held whilst we have requests. At the moment, we do not call it
      outside of an awake context but taking the wakeref as required makes it
      more convenient to use for quick debugging in future.
      
      [ 2613.401647] RPM wakelock ref not held during HW access
      [ 2613.401684] ------------[ cut here ]------------
      [ 2613.401720] WARNING: CPU: 5 PID: 7739 at drivers/gpu/drm/i915/intel_drv.h:1787 gen6_read32+0x21f/0x2b0 [i915]
      [ 2613.401731] Modules linked in: snd_hda_intel i915 vgem snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek coretemp snd_hda_codec_generic crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm r8169 mii mei_me lpc_ich mei prime_numbers [last unloaded: i915]
      [ 2613.401823] CPU: 5 PID: 7739 Comm: drv_missed_irq Tainted: G     U          4.12.0-rc2-CI-CI_DRM_421+ #1
      [ 2613.401825] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
      [ 2613.401840] task: ffff880409e3a740 task.stack: ffffc900084dc000
      [ 2613.401861] RIP: 0010:gen6_read32+0x21f/0x2b0 [i915]
      [ 2613.401863] RSP: 0018:ffffc900084dfce8 EFLAGS: 00010292
      [ 2613.401869] RAX: 000000000000002a RBX: ffff8804016a8000 RCX: 0000000000000006
      [ 2613.401871] RDX: 0000000000000006 RSI: ffffffff81cbf2d9 RDI: ffffffff81c9e3a7
      [ 2613.401874] RBP: ffffc900084dfd18 R08: ffff880409e3afc8 R09: 0000000000000000
      [ 2613.401877] R10: 000000008a1c483f R11: 0000000000000000 R12: 000000000000209c
      [ 2613.401879] R13: 0000000000000001 R14: ffff8804016a8000 R15: ffff8804016ac150
      [ 2613.401882] FS:  00007f39ef3dd8c0(0000) GS:ffff88041fb40000(0000) knlGS:0000000000000000
      [ 2613.401885] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 2613.401887] CR2: 00000000023717c8 CR3: 00000002e7b34000 CR4: 00000000001406e0
      [ 2613.401889] Call Trace:
      [ 2613.401912]  intel_engine_is_idle+0x76/0x90 [i915]
      [ 2613.401931]  i915_gem_wait_for_idle+0xe6/0x1e0 [i915]
      [ 2613.401951]  fault_irq_set+0x40/0x90 [i915]
      [ 2613.401970]  i915_ring_test_irq_set+0x42/0x50 [i915]
      [ 2613.401976]  simple_attr_write+0xc7/0xe0
      [ 2613.401981]  full_proxy_write+0x4f/0x70
      [ 2613.401987]  __vfs_write+0x23/0x120
      [ 2613.401992]  ? rcu_read_lock_sched_held+0x75/0x80
      [ 2613.401996]  ? rcu_sync_lockdep_assert+0x2a/0x50
      [ 2613.401999]  ? __sb_start_write+0xfa/0x1f0
      [ 2613.402004]  vfs_write+0xc5/0x1d0
      [ 2613.402008]  ? trace_hardirqs_on_caller+0xe7/0x1c0
      [ 2613.402013]  SyS_write+0x44/0xb0
      [ 2613.402020]  entry_SYSCALL_64_fastpath+0x1c/0xb1
      [ 2613.402022] RIP: 0033:0x7f39eded6670
      [ 2613.402025] RSP: 002b:00007fffdcdcb1a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [ 2613.402030] RAX: ffffffffffffffda RBX: ffffffff81470203 RCX: 00007f39eded6670
      [ 2613.402033] RDX: 0000000000000001 RSI: 000000000041bc33 RDI: 0000000000000006
      [ 2613.402036] RBP: ffffc900084dff88 R08: 00007f39ef3dd8c0 R09: 0000000000000001
      [ 2613.402038] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000041bc33
      [ 2613.402041] R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000
      [ 2613.402046]  ? __this_cpu_preempt_check+0x13/0x20
      [ 2613.402052] Code: 01 9b fa e0 0f ff e9 28 fe ff ff 80 3d 6a dd 0e 00 00 0f 85 29 fe ff ff 48 c7 c7 48 19 29 a0 c6 05 56 dd 0e 00 01 e8 da 9a fa e0 <0f> ff e9 0f fe ff ff b9 01 00 00 00 ba 01 00 00 00 44 89 e6 48
      [ 2613.402199] ---[ end trace 31f0cfa93ab632bf ]---
      
      Fixes: 5400367a ("drm/i915: Ensure the engine is idle before manually changing HWS")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170530121334.17364-2-chris@chris-wilson.co.uk
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      (cherry picked from commit a091d4ee)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    • drm/i915: Short-circuit i915_gem_wait_for_idle() if already idle · e0da1963
      Chris Wilson authored
      If the device is asleep (no GT wakeref), we know the GPU is already idle.
      If we add an early return, we can avoid touching registers and checking
      hw state outside of the assumed GT wakelock. This prevents causing such
      errors whilst debugging:
      
      [ 2613.401647] RPM wakelock ref not held during HW access
      [ 2613.401684] ------------[ cut here ]------------
      [ 2613.401720] WARNING: CPU: 5 PID: 7739 at drivers/gpu/drm/i915/intel_drv.h:1787 gen6_read32+0x21f/0x2b0 [i915]
      [ 2613.401731] Modules linked in: snd_hda_intel i915 vgem snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek coretemp snd_hda_codec_generic crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm r8169 mii mei_me lpc_ich mei prime_numbers [last unloaded: i915]
      [ 2613.401823] CPU: 5 PID: 7739 Comm: drv_missed_irq Tainted: G     U          4.12.0-rc2-CI-CI_DRM_421+ #1
      [ 2613.401825] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
      [ 2613.401840] task: ffff880409e3a740 task.stack: ffffc900084dc000
      [ 2613.401861] RIP: 0010:gen6_read32+0x21f/0x2b0 [i915]
      [ 2613.401863] RSP: 0018:ffffc900084dfce8 EFLAGS: 00010292
      [ 2613.401869] RAX: 000000000000002a RBX: ffff8804016a8000 RCX: 0000000000000006
      [ 2613.401871] RDX: 0000000000000006 RSI: ffffffff81cbf2d9 RDI: ffffffff81c9e3a7
      [ 2613.401874] RBP: ffffc900084dfd18 R08: ffff880409e3afc8 R09: 0000000000000000
      [ 2613.401877] R10: 000000008a1c483f R11: 0000000000000000 R12: 000000000000209c
      [ 2613.401879] R13: 0000000000000001 R14: ffff8804016a8000 R15: ffff8804016ac150
      [ 2613.401882] FS:  00007f39ef3dd8c0(0000) GS:ffff88041fb40000(0000) knlGS:0000000000000000
      [ 2613.401885] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 2613.401887] CR2: 00000000023717c8 CR3: 00000002e7b34000 CR4: 00000000001406e0
      [ 2613.401889] Call Trace:
      [ 2613.401912]  intel_engine_is_idle+0x76/0x90 [i915]
      [ 2613.401931]  i915_gem_wait_for_idle+0xe6/0x1e0 [i915]
      [ 2613.401951]  fault_irq_set+0x40/0x90 [i915]
      [ 2613.401970]  i915_ring_test_irq_set+0x42/0x50 [i915]
      [ 2613.401976]  simple_attr_write+0xc7/0xe0
      [ 2613.401981]  full_proxy_write+0x4f/0x70
      [ 2613.401987]  __vfs_write+0x23/0x120
      [ 2613.401992]  ? rcu_read_lock_sched_held+0x75/0x80
      [ 2613.401996]  ? rcu_sync_lockdep_assert+0x2a/0x50
      [ 2613.401999]  ? __sb_start_write+0xfa/0x1f0
      [ 2613.402004]  vfs_write+0xc5/0x1d0
      [ 2613.402008]  ? trace_hardirqs_on_caller+0xe7/0x1c0
      [ 2613.402013]  SyS_write+0x44/0xb0
      [ 2613.402020]  entry_SYSCALL_64_fastpath+0x1c/0xb1
      [ 2613.402022] RIP: 0033:0x7f39eded6670
      [ 2613.402025] RSP: 002b:00007fffdcdcb1a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [ 2613.402030] RAX: ffffffffffffffda RBX: ffffffff81470203 RCX: 00007f39eded6670
      [ 2613.402033] RDX: 0000000000000001 RSI: 000000000041bc33 RDI: 0000000000000006
      [ 2613.402036] RBP: ffffc900084dff88 R08: 00007f39ef3dd8c0 R09: 0000000000000001
      [ 2613.402038] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000041bc33
      [ 2613.402041] R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000
      [ 2613.402046]  ? __this_cpu_preempt_check+0x13/0x20
      [ 2613.402052] Code: 01 9b fa e0 0f ff e9 28 fe ff ff 80 3d 6a dd 0e 00 00 0f 85 29 fe ff ff 48 c7 c7 48 19 29 a0 c6 05 56 dd 0e 00 01 e8 da 9a fa e0 <0f> ff e9 0f fe ff ff b9 01 00 00 00 ba 01 00 00 00 44 89 e6 48
      [ 2613.402199] ---[ end trace 31f0cfa93ab632bf ]---
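
The early return described in this commit message can be sketched as follows. This is a simulation under stated assumptions rather than the driver code: `struct sim_gpu`, `sim_engines_idle()` and `sim_wait_for_idle()` are hypothetical names, and `hw_reads` merely counts how often the simulated hardware would have been touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sim_gpu {
    bool awake;     /* stands in for "a GT wakeref is held" */
    int hw_reads;   /* number of simulated register accesses */
};

/* Simulated hardware probe: counts the access and reports idle. */
static bool sim_engines_idle(struct sim_gpu *gpu)
{
    gpu->hw_reads++;
    return true;
}

static int sim_wait_for_idle(struct sim_gpu *gpu)
{
    assert(gpu != NULL);

    /* If the device is asleep it is already idle: return early and
     * never touch the (simulated) registers. */
    if (!gpu->awake)
        return 0;

    while (!sim_engines_idle(gpu))
        ;   /* keep polling until the engines report idle */

    return 0;
}
```

The check costs one branch on the awake path and removes every register access on the asleep path, which is where the warning above came from.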
      
      Fixes: 25112b64 ("drm/i915: Wait for all engines to be idle as part of i915_gem_wait_for_idle()")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170530121334.17364-1-chris@chris-wilson.co.uk
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      (cherry picked from commit 863e9fde)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    • drm/i915: Disable decoupled MMIO · 4c4c5655
      Kai Chen authored
      The decoupled MMIO feature doesn't work as intended by the HW team. Enabling
      it with forcewake will only make debugging efforts more difficult, so
      let's disable it.
      
      Fixes: 85ee17eb ("drm/i915/bxt: Broxton decoupled MMIO")
      Cc: Zhe Wang <zhe1.wang@intel.com>
      Cc: Praveen Paneri <praveen.paneri@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Daniel Vetter <daniel.vetter@intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: <stable@vger.kernel.org> # v4.10+
      Signed-off-by: Kai Chen <kai.chen@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170523215812.18328-2-kai.chen@intel.com
      (cherry picked from commit 0051c10a)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    • drm/i915/guc: Remove stale comment for q_fail · 4ca9a582
      Michal Wajdeczko authored
      This member was dropped a long time ago.
      
      Fixes: 774439e1 ("drm/i915/guc: re-optimise i915_guc_client layout")
      Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170518113104.54400-1-michal.wajdeczko@intel.com
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      (cherry picked from commit 4afc67be)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    • drm/i915: Serialize GTT/Aperture accesses on BXT · d86b18a0
      Jon Bloomfield authored
      BXT has a H/W issue with IOMMU which can lead to system hangs when
      Aperture accesses are queued within the GAM behind GTT Accesses.
      
      This patch avoids the condition by wrapping all GTT updates in stop_machine
      and using a flushing read prior to restarting the machine.
      
      The stop_machine guarantees no new Aperture accesses can begin while
      the PTE writes are being emitted. The flushing read ensures that
      any following Aperture accesses cannot begin until the PTE writes
      have been cleared out of the GAM's fifo.
      
      Only FOLLOWING Aperture accesses need to be separated from in flight
      PTE updates. PTE Writes may follow tightly behind already in flight
      Aperture accesses, so no flushing read is required at the start of
      a PTE update sequence.
      
      This issue was reproduced by running
      	igt/gem_readwrite and
      	igt/gem_render_copy
      simultaneously from different processes, each in a tight loop,
      with INTEL_IOMMU enabled.
      
      This patch was originally published as:
      	drm/i915: Serialize GTT Updates on BXT
      
      [Note: This will cause a performance penalty for some use cases, but
      avoiding hangs trumps performance hits. This may need to be worked
      around in Mesa to recover the lost performance.]
      
      v2: Move bxt/iommu detection into static function
          Remove #ifdef CONFIG_INTEL_IOMMU protection
          Make function names more reflective of purpose
          Move flushing read into static function
      
      v3: Tidy up for checkpatch.pl
      
      Testcase: igt/gem_concurrent_blit
      Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: John Harrison <john.C.Harrison@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: stable@vger.kernel.org
      Link: http://patchwork.freedesktop.org/patch/msgid/1495641251-30022-1-git-send-email-jon.bloomfield@intel.com
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      (cherry picked from commit 0ef34ad6)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
  2. 04 Jun, 2017 9 commits
    • Linux 4.12-rc4 · 3c2993b8
      Linus Torvalds authored
    • fs/ufs: Set UFS default maximum bytes per file · 239e250e
      Richard Narron authored
      This fixes a problem with reading files larger than 2GB from a UFS-2
      file system:
      
          https://bugzilla.kernel.org/show_bug.cgi?id=195721
      
      The incorrect UFS s_maxbytes limit became a problem as of commit
      c2a9737f ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
      which started using s_maxbytes to avoid a page index overflow in
      do_generic_file_read().
      
      That caused files to be truncated on UFS-2 file systems because the
      default maximum file size is 2GB (MAX_NON_LFS) and UFS didn't update it.
      
      Here I simply increase the default to a common value used by other file
      systems.
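
The effect of a too-small per-file limit can be shown with a small standalone calculation. This is a sketch, not the VFS code: `sim_readable_bytes()` is a hypothetical helper, `SIM_MAX_NON_LFS` mirrors the 2GB default mentioned above (2^31 - 1 bytes), and the larger limit used in the usage note is an arbitrary illustrative value.

```c
#include <assert.h>
#include <stdint.h>

/* The 2GB default limit referred to above: 2^31 - 1 bytes. */
#define SIM_MAX_NON_LFS ((((uint64_t)1) << 31) - 1)

/* How many bytes can be read starting at pos when reads are clamped to
 * max_bytes, regardless of how large the file really is. */
static uint64_t sim_readable_bytes(uint64_t file_size, uint64_t pos,
                                   uint64_t max_bytes)
{
    uint64_t end;

    assert(max_bytes > 0);

    end = file_size < max_bytes ? file_size : max_bytes;
    return pos < end ? end - pos : 0;
}
```

For a 3 GiB file read from offset 0, the 2GB limit yields 2147483647 readable bytes, so the tail of the file is silently cut off; with a limit at or above the file size the full 3221225472 bytes are readable.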
      Signed-off-by: Richard Narron <comet.berkeley@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Will B <will.brokenbourgh2877@gmail.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: <stable@vger.kernel.org> # v4.9 and backports of c2a9737f
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'nfs-for-4.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 125f42b0
      Linus Torvalds authored
      Pull NFS client bugfixes from Trond Myklebust:
       "Bugfixes include:
      
         - Fix a typo in commit e0926934 ("NFS append COMMIT after
           synchronous COPY") that breaks copy offload
      
         - Fix the connect error propagation in xs_tcp_setup_socket()
      
         - Fix a lock leak in nfs40_walk_client_list
      
         - Verify that pNFS requests lie within the offset range of the layout
           segment"
      
      * tag 'nfs-for-4.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        nfs: Mark unnecessarily extern functions as static
        SUNRPC: ensure correct error is reported by xs_tcp_setup_socket()
        NFSv4.0: Fix a lock leak in nfs40_walk_client_list
        pnfs: Fix the check for requests in range of layout segment
        xprtrdma: Delete an error message for a failed memory allocation in xprt_rdma_bc_setup()
        pNFS/flexfiles: missing error code in ff_layout_alloc_lseg()
        NFS fix COMMIT after COPY
    • Merge tag 'tty-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 3c06e6cb
      Linus Torvalds authored
      Pull tty fix from Greg KH:
       "Here is a single tty core fix for 4.12-rc4. It reverts a patch that a
        lot of people reported as causing lockdep and other warnings.
      
        Right after I reverted this in my tree, it seems like another
        "correct" fix might have shown up, but it's too late in the release
        cycle to be messing with tty core locking, so let's just revert this
        for now to go back how things always have been and try it again for
        4.13.
      
        This has not been in linux-next as I only reverted it a few hours ago"
      
      * tag 'tty-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        Revert "tty: fix port buffer locking"
      3c06e6cb
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · e00811b4
      Linus Torvalds authored
      Pull input subsystem fixes from Dmitry Torokhov:
      
       - a couple of regression fixes in synaptics and axp20x-pek drivers
      
       - try to ease transition from PS/2 to RMI for Synaptics touchpad users
         by ensuring we do not try to activate RMI mode when RMI SMBus support
         is not enabled, and nag users a bit to enable it
      
       - plus a couple of other changes that seemed worthwhile for this
         release
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: axp20x-pek - switch to acpi_dev_present and check for ACPI0011 too
        Input: axp20x-pek - only check for "INTCFD9" ACPI device on Cherry Trail
        Input: tm2-touchkey - use LED_ON as boolean value instead of LED_FULL
        Input: synaptics - tell users to report when they should be using rmi-smbus
        Input: synaptics - warn the users when there is a better mode
        Input: synaptics - keep PS/2 around when RMI4_SMB is not enabled
        Input: synaptics - clear device info before filling in
        Input: silead - disable interrupt during suspend
      e00811b4
    • Linus Torvalds's avatar
      Merge tag 'rtc-4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux · 9f03b2c7
      Linus Torvalds authored
      Pull RTC fixlet from Alexandre Belloni:
       "A single patch, not really a fix but I don't think there is any reason
        to delay it.
      
        Change the mailing list address"
      
      * tag 'rtc-4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
        MAINTAINERS: update RTC mailing list
      9f03b2c7
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 1f915b7f
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "This is nine fixes, seven of which are for the qedi driver (new as of
        4.10); the other two are a use-after-free in the cxgbi drivers and a
        potential NULL dereference in the rdac device handler"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: libcxgbi: fix skb use after free
        scsi: qedi: Fix endpoint NULL panic during recovery.
        scsi: qedi: set max_fin_rt default value
        scsi: qedi: Set firmware tcp msl timer value.
        scsi: qedi: Fix endpoint NULL panic in qedi_set_path.
        scsi: qedi: Set dma_boundary to 0xfff.
        scsi: qedi: Correctly set firmware max supported BDs.
        scsi: qedi: Fix bad pte call trace when iscsiuio is stopped.
        scsi: scsi_dh_rdac: Use ctlr directly in rdac_failover_get()
      1f915b7f
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · 55cbdaf6
      Linus Torvalds authored
      Pull rdma fixes from Doug Ledford:
       "For the most part this is just a minor -rc cycle for the rdma
        subsystem. Even given that this is all of the -rc patches since the
        merge window closed, it's still only about 25 patches:
      
         - Multiple i40iw, nes, iw_cxgb4, hfi1, qib, mlx4, mlx5 fixes
      
         - A few upper layer protocol fixes (IPoIB, iSER, SRP)
      
         - A modest number of core fixes"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (26 commits)
        RDMA/SA: Fix kernel panic in CMA request handler flow
        RDMA/umem: Fix missing mmap_sem in get umem ODP call
        RDMA/core: not to set page dirty bit if it's already set.
        RDMA/uverbs: Declare local function static and add brackets to sizeof
        RDMA/netlink: Reduce exposure of RDMA netlink functions
        RDMA/srp: Fix NULL deref at srp_destroy_qp()
        RDMA/IPoIB: Limit the ipoib_dev_uninit_default scope
        RDMA/IPoIB: Replace netdev_priv with ipoib_priv for ipoib_get_link_ksettings
        RDMA/qedr: add null check before pointer dereference
        RDMA/mlx5: set UMR wqe fence according to HCA cap
        net/mlx5: Define interface bits for fencing UMR wqe
        RDMA/mlx4: Fix MAD tunneling when SRIOV is enabled
        RDMA/qib,hfi1: Fix MR reference count leak on write with immediate
        RDMA/hfi1: Defer setting VL15 credits to link-up interrupt
        RDMA/hfi1: change PCI bar addr assignments to Linux API functions
        RDMA/hfi1: fix array termination by appending NULL to attr array
        RDMA/iw_cxgb4: fix the calculation of ipv6 header size
        RDMA/iw_cxgb4: calculate t4_eq_status_entries properly
        RDMA/iw_cxgb4: Avoid touch after free error in ARP failure handlers
        RDMA/nes: ACK MPA Reply frame
        ...
      55cbdaf6
    • Greg Kroah-Hartman's avatar
      Revert "tty: fix port buffer locking" · fc098af1
      Greg Kroah-Hartman authored
      This reverts commit 925bb1ce.
      
      It causes lots of warnings and problems so for now, let's just revert
      it.
      
      Reported-by: <valdis.kletnieks@vt.edu>
      Reported-by: Russell King <linux@armlinux.org.uk>
      Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Reported-by: Jiri Slaby <jslaby@suse.cz>
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Acked-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc098af1
  3. 03 Jun, 2017 8 commits
  4. 02 Jun, 2017 11 commits
    • Linus Torvalds's avatar
      Merge tag 'acpi-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 104c08ba
      Linus Torvalds authored
      Pull ACPI fixes from Rafael Wysocki:
       "These revert one more problematic commit related to the ACPI-based
        handling of laptop lids and make some unuseful error messages coming
        from ACPICA go away.
      
        Specifics:
      
         - Revert one more commit related to the ACPI-based handling of laptop
           lids that changed the default behavior on laptops that booted with
           closed lids and introduced a regression there (Benjamin Tissoires).
      
         - Add a missing acpi_put_table() to the code implementing the
           /sys/firmware/acpi/tables interface to prevent a counter in the
           ACPICA core from overflowing (Dan Williams).
      
         - Drop error messages printed by ACPICA on acpi_get_table() reference
           counting mismatches as they need not indicate real errors at this
           point (Lv Zheng)"
      
      * tag 'acpi-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPICA: Tables: Fix regression introduced by a too early mechanism enabling
        Revert "ACPI / button: Change default behavior to lid_init_state=open"
        ACPI / sysfs: fix acpi_get_table() leak / acpi-sysfs denial of service
      104c08ba
    • Linus Torvalds's avatar
      Merge tag 'pm-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 89af529a
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "These fix two bugs in error code paths in the cpufreq core and in the
        kirkwood-cpufreq driver.
      
        Specifics:
      
         - Make cpufreq_register_driver() return an error if the ->init()
           calls fail for all CPUs to prevent non-functional drivers from
           hanging around for no reason (David Arcari).
      
         - Make kirkwood-cpufreq check the return value of
           clk_prepare_enable() (which may fail) as appropriate (Arvind
           Yadav)"
      
      * tag 'pm-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: kirkwood-cpufreq:- Handle return value of clk_prepare_enable()
        cpufreq: cpufreq_register_driver() should return -ENODEV if init fails
      89af529a
    • Linus Torvalds's avatar
      Merge tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random · 5a4829b5
      Linus Torvalds authored
      Pull /dev/random bug fix from Ted Ts'o:
       "Fix a race on architectures with prioritized interrupts (such as m68k)
        which can cause crashes in drivers/char/random.c:get_reg()"
      
      * tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
        fix race in drivers/char/random.c:get_reg()
      5a4829b5
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · f2197649
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "15 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        scripts/gdb: make lx-dmesg command work (reliably)
        mm: consider memblock reservations for deferred memory initialization sizing
        mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified
        mlock: fix mlock count can not decrease in race condition
        mm/migrate: fix refcount handling when !hugepage_migration_supported()
        dax: fix race between colliding PMD & PTE entries
        mm: avoid spurious 'bad pmd' warning messages
        mm/page_alloc.c: make sure OOM victim can try allocations with no watermarks once
        pcmcia: remove left-over %Z format
        slub/memcg: cure the brainless abuse of sysfs attributes
        initramfs: fix disabling of initramfs (and its compression)
        mm: clarify why we want kmalloc before falling back to vmalloc
        frv: declare jiffies to be located in the .data section
        include/linux/gfp.h: fix ___GFP_NOLOCKDEP value
        ksm: prevent crash after write_protect_page fails
      f2197649
    • André Draszik's avatar
      scripts/gdb: make lx-dmesg command work (reliably) · d6c97087
      André Draszik authored
      lx-dmesg needs access to the log_buf symbol from printk.c.
      Unfortunately, the symbol log_buf also exists in BPF's verifier.c and
      hence gdb can pick one or the other.  If it happens to pick BPF's
      log_buf, lx-dmesg doesn't work:
      
        (gdb) lx-dmesg
        Python Exception <class 'gdb.MemoryError'> Cannot access memory at address 0x0:
        Error occurred in Python command: Cannot access memory at address 0x0
        (gdb) p log_buf
        $15 = 0x0
      
      Luckily, GDB has a way to deal with this, see
        https://sourceware.org/gdb/onlinedocs/gdb/Symbols.html
      
        (gdb) info variables ^log_buf$
        All variables matching regular expression "^log_buf$":
      
        File <linux.git>/kernel/bpf/verifier.c:
        static char *log_buf;
      
        File <linux.git>/kernel/printk/printk.c:
        static char *log_buf;
        (gdb) p 'verifier.c'::log_buf
        $1 = 0x0
        (gdb) p 'printk.c'::log_buf
        $2 = 0x811a6aa0 <__log_buf> ""
        (gdb) p &log_buf
        $3 = (char **) 0x8120fe40 <log_buf>
        (gdb) p &'verifier.c'::log_buf
        $4 = (char **) 0x8120fe40 <log_buf>
        (gdb) p &'printk.c'::log_buf
        $5 = (char **) 0x8048b7d0 <log_buf>
      
      By being explicit about the location of the symbol, we can make lx-dmesg
      work again.  While at it, do the same for the other symbols we need from
      printk.c
      
      Link: http://lkml.kernel.org/r/20170526112222.3414-1-git@andred.net
      Signed-off-by: André Draszik <git@andred.net>
      Tested-by: Kieran Bingham <kieran@bingham.xyz>
      Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d6c97087
    • Michal Hocko's avatar
      mm: consider memblock reservations for deferred memory initialization sizing · 864b9a39
      Michal Hocko authored
      We have seen an early OOM killer invocation on ppc64 systems with
      crashkernel=4096M:
      
      	kthreadd invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=7, order=0, oom_score_adj=0
      	kthreadd cpuset=/ mems_allowed=7
      	CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.4.68-1.gd7fe927-default #1
      	Call Trace:
      	  dump_stack+0xb0/0xf0 (unreliable)
      	  dump_header+0xb0/0x258
      	  out_of_memory+0x5f0/0x640
      	  __alloc_pages_nodemask+0xa8c/0xc80
      	  kmem_getpages+0x84/0x1a0
      	  fallback_alloc+0x2a4/0x320
      	  kmem_cache_alloc_node+0xc0/0x2e0
      	  copy_process.isra.25+0x260/0x1b30
      	  _do_fork+0x94/0x470
      	  kernel_thread+0x48/0x60
      	  kthreadd+0x264/0x330
      	  ret_from_kernel_thread+0x5c/0xa4
      
      	Mem-Info:
      	active_anon:0 inactive_anon:0 isolated_anon:0
      	 active_file:0 inactive_file:0 isolated_file:0
      	 unevictable:0 dirty:0 writeback:0 unstable:0
      	 slab_reclaimable:5 slab_unreclaimable:73
      	 mapped:0 shmem:0 pagetables:0 bounce:0
      	 free:0 free_pcp:0 free_cma:0
      	Node 7 DMA free:0kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52428800kB managed:110016kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:320kB slab_unreclaimable:4672kB kernel_stack:1152kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      	lowmem_reserve[]: 0 0 0 0
      	Node 7 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
      	0 total pagecache pages
      	0 pages in swap cache
      	Swap cache stats: add 0, delete 0, find 0/0
      	Free swap  = 0kB
      	Total swap = 0kB
      	819200 pages RAM
      	0 pages HighMem/MovableOnly
      	817481 pages reserved
      	0 pages cma reserved
      	0 pages hwpoisoned
      
      The reason is that the managed memory is too low (only 110MB) while the
      rest of the 50GB is still waiting for the deferred initialization to
      be done.  update_defer_init estimates the initial memory to initialize
      to 2GB at least but it doesn't consider any memory allocated in that
      range.  In this particular case we've had
      
      	Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 51200MB)
      
      so the low 2GB is mostly depleted.
      
      Fix this by considering memblock allocations in the initial static
      initialization estimation.  Move the max_initialise to
      reset_deferred_meminit and implement a simple memblock_reserved_memory
      helper which iterates all reserved blocks and sums the size of all that
      start below the given address.  The cumulative size is then added on top
      of the initial estimation.  This is still not ideal because
      reset_deferred_meminit doesn't consider holes and so reservation might
      be above the initial estimation which we ignore but let's make the
      logic simpler until we really need to handle more complicated cases.
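
      The sizing rule described above can be sketched in plain userspace C.
      This is only an illustration, not the kernel code: struct
      reserved_region, reserved_below() and deferred_init_limit() are
      hypothetical names standing in for the memblock data and the
      memblock_reserved_memory helper mentioned in the text.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one reserved memblock region. */
struct reserved_region {
	uint64_t base;  /* start address in bytes */
	uint64_t size;  /* length in bytes */
};

/*
 * Sum the sizes of all reserved regions that start below limit_addr.
 * A region is counted in full even if it extends past limit_addr,
 * matching the "start below the given address" wording of the commit.
 */
static uint64_t reserved_below(const struct reserved_region *regions,
			       size_t count, uint64_t limit_addr)
{
	uint64_t total = 0;

	for (size_t i = 0; i < count; i++) {
		if (regions[i].base < limit_addr)
			total += regions[i].size;
	}
	return total;
}

/* Initial estimate plus everything already reserved below it. */
static uint64_t deferred_init_limit(const struct reserved_region *regions,
				    size_t count, uint64_t initial_estimate)
{
	return initial_estimate +
	       reserved_below(regions, count, initial_estimate);
}
```

      With the crashkernel reservation from the report (4096MB at 128MB) and
      a 2GB initial estimate, this sketch yields 2048MB + 4096MB = 6144MB to
      initialize up front.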
      
      Fixes: 3a80a7fa ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
      Link: http://lkml.kernel.org/r/20170531104010.GI27783@dhcp22.suse.cz
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: <stable@vger.kernel.org>	[4.2+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      864b9a39
    • James Morse's avatar
      mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified · 9a291a7c
      James Morse authored
      KVM uses get_user_pages() to resolve its stage2 faults.  KVM sets the
      FOLL_HWPOISON flag causing faultin_page() to return -EHWPOISON when it
      finds a VM_FAULT_HWPOISON.  KVM handles these hwpoison pages as a
      special case.  (check_user_page_hwpoison())
      
      When huge pages are involved, this doesn't work so well.
      get_user_pages() calls follow_hugetlb_page(), which stops early if it
      receives VM_FAULT_HWPOISON from hugetlb_fault(), eventually returning
      -EFAULT to the caller.  The step to map this to -EHWPOISON based on the
      FOLL_ flags is missing.  The hwpoison special case is skipped, and
      -EFAULT is returned to user-space, causing Qemu or kvmtool to exit.
      
      Instead, move this VM_FAULT_ to errno mapping code into a header file
      and use it from faultin_page() and follow_hugetlb_page().
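
      A shared fault-to-errno mapping of the kind described here might look
      roughly like the sketch below. It is a userspace model under stated
      assumptions: the FAULT_* and GUP_HWPOISON values are illustrative
      stand-ins for the kernel's VM_FAULT_* and FOLL_HWPOISON flags, and
      fault_to_errno() is a hypothetical name, not the kernel helper itself.

```c
#include <errno.h>

#ifndef EHWPOISON
#define EHWPOISON 133  /* fallback only for libcs that do not define it */
#endif

/* Illustrative flag values; the kernel's own definitions differ. */
enum {
	FAULT_OOM      = 1 << 0,
	FAULT_SIGBUS   = 1 << 1,
	FAULT_HWPOISON = 1 << 2,
};

enum { GUP_HWPOISON = 1 << 0 };  /* stand-in for FOLL_HWPOISON */

/*
 * Translate a fault result into a negative errno. A poisoned page only
 * becomes -EHWPOISON when the caller asked for that with GUP_HWPOISON;
 * otherwise it is reported as a plain -EFAULT.
 */
static int fault_to_errno(int fault, int gup_flags)
{
	if (fault & FAULT_OOM)
		return -ENOMEM;
	if (fault & FAULT_HWPOISON)
		return (gup_flags & GUP_HWPOISON) ? -EHWPOISON : -EFAULT;
	if (fault & FAULT_SIGBUS)
		return -EFAULT;
	return 0;
}
```

      Calling the same function from both the normal-page and the huge-page
      path is what keeps the two from disagreeing about poisoned pages.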
      
      With this, KVM works as expected.
      
      This isn't a problem for arm64 today as we haven't enabled
      MEMORY_FAILURE, but I can't see any reason this doesn't happen on x86
      too, so I think this should be a fix.  This doesn't apply earlier than
      stable's v4.11.1 due to all sorts of cleanup.
      
      [james.morse@arm.com: add vm_fault_to_errno() call to faultin_page()]
      suggested.
        Link: http://lkml.kernel.org/r/20170525171035.16359-1-james.morse@arm.com
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/20170524160900.28786-1-james.morse@arm.com
      Signed-off-by: James Morse <james.morse@arm.com>
      Acked-by: Punit Agrawal <punit.agrawal@arm.com>
      Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>	[4.11.1+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9a291a7c
    • Yisheng Xie's avatar
      mlock: fix mlock count can not decrease in race condition · 70feee0e
      Yisheng Xie authored
      Kefeng reported that when running the following test, the mlock count in
      meminfo will increase permanently:
      
       [1] testcase
       linux:~ # cat test_mlockal
       grep Mlocked /proc/meminfo
        for j in `seq 0 10`
        do
       	for i in `seq 4 15`
       	do
       		./p_mlockall >> log &
       	done
       	sleep 0.2
       done
       # wait some time to let the mlock counter decrease; 5s may not be enough
       sleep 5
       grep Mlocked /proc/meminfo
      
       linux:~ # cat p_mlockall.c
       #include <sys/mman.h>
       #include <stdlib.h>
       #include <stdio.h>
      
       #define SPACE_LEN	4096
      
       int main(int argc, char ** argv)
       {
      	 	int ret;
      	 	void *adr = malloc(SPACE_LEN);
      	 	if (!adr)
      	 		return -1;
      
      	 	ret = mlockall(MCL_CURRENT | MCL_FUTURE);
      	 	printf("mlockall ret = %d\n", ret);
      
      	 	ret = munlockall();
      	 	printf("munlockall ret = %d\n", ret);
      
      	 	free(adr);
      	 	return 0;
      	 }
      
      In __munlock_pagevec() we should decrement NR_MLOCK for each page where
      we clear the PageMlocked flag.  Commit 1ebb7cc6 ("mm: munlock: batch
      NR_MLOCK zone state updates") has introduced a bug where we don't
      decrement NR_MLOCK for pages where we clear the flag, but fail to
      isolate them from the lru list (e.g.  when the pages are on some other
      cpu's percpu pagevec).  Since PageMlocked stays cleared, the NR_MLOCK
      accounting gets permanently disrupted by this.
      
      Fix it by counting the number of pages whose PageMlocked flag is cleared.
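
      The accounting rule can be modelled in a few lines of userspace C.
      This is a sketch, not the __munlock_pagevec() code: struct fake_page
      and munlock_batch() are hypothetical, and on_our_lru is an assumed
      flag that models whether isolation from the LRU would succeed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-page state. */
struct fake_page {
	bool mlocked;     /* models the PageMlocked flag */
	bool on_our_lru;  /* false models a page in another CPU's pagevec */
};

/*
 * Clear the mlocked flag on every page in the batch and return how much
 * to subtract from the mlock counter. *isolated reports how many of
 * those pages could also be taken off the LRU.
 */
static long munlock_batch(struct fake_page *pages, size_t n, size_t *isolated)
{
	long delta = 0;

	*isolated = 0;
	for (size_t i = 0; i < n; i++) {
		if (!pages[i].mlocked)
			continue;
		pages[i].mlocked = false;
		delta++;                /* count every cleared flag ... */
		if (pages[i].on_our_lru)
			(*isolated)++;  /* ... not only the isolated pages */
	}
	return delta;
}
```

      Subtracting only the isolated count would leave the counter too high
      by one for every page whose flag was cleared but which could not be
      isolated, which is the permanent drift the test case shows.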
      
      Fixes: 1ebb7cc6 ("mm: munlock: batch NR_MLOCK zone state updates")
      Link: http://lkml.kernel.org/r/1495678405-54569-1-git-send-email-xieyisheng1@huawei.com
      Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
      Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: zhongjiang <zhongjiang@huawei.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      70feee0e
    • Punit Agrawal's avatar
      mm/migrate: fix refcount handling when !hugepage_migration_supported() · 30809f55
      Punit Agrawal authored
      On failing to migrate a page, soft_offline_huge_page() performs the
      necessary update to the hugepage ref-count.
      
      But when !hugepage_migration_supported(), unmap_and_move_hugepage()
      also decrements the page ref-count for the hugepage.  The combined
      behaviour leaves the ref-count in an inconsistent state.
      
      This leads to soft lockups when running the overcommitted hugepage test
      from mce-tests suite.
      
        Soft offlining pfn 0x83ed600 at process virtual address 0x400000000000
        soft offline: 0x83ed600: migration failed 1, type 1fffc00000008008 (uptodate|head)
        INFO: rcu_preempt detected stalls on CPUs/tasks:
         Tasks blocked on level-0 rcu_node (CPUs 0-7): P2715
          (detected by 7, t=5254 jiffies, g=963, c=962, q=321)
          thugetlb_overco R  running task        0  2715   2685 0x00000008
          Call trace:
            dump_backtrace+0x0/0x268
            show_stack+0x24/0x30
            sched_show_task+0x134/0x180
            rcu_print_detail_task_stall_rnp+0x54/0x7c
            rcu_check_callbacks+0xa74/0xb08
            update_process_times+0x34/0x60
            tick_sched_handle.isra.7+0x38/0x70
            tick_sched_timer+0x4c/0x98
            __hrtimer_run_queues+0xc0/0x300
            hrtimer_interrupt+0xac/0x228
            arch_timer_handler_phys+0x3c/0x50
            handle_percpu_devid_irq+0x8c/0x290
            generic_handle_irq+0x34/0x50
            __handle_domain_irq+0x68/0xc0
            gic_handle_irq+0x5c/0xb0
      
      Address this by changing the putback_active_hugepage() in
      soft_offline_huge_page() to putback_movable_pages().
      
      This only triggers on systems that enable memory failure handling
      (ARCH_SUPPORTS_MEMORY_FAILURE) but not hugepage migration
      (!ARCH_ENABLE_HUGEPAGE_MIGRATION).
      
      I imagine this wasn't triggered as there aren't many systems running
      this configuration.
      
      [akpm@linux-foundation.org: remove dead comment, per Naoya]
      Link: http://lkml.kernel.org/r/20170525135146.32011-1-punit.agrawal@arm.com
      Reported-by: Manoj Iyer <manoj.iyer@canonical.com>
      Tested-by: Manoj Iyer <manoj.iyer@canonical.com>
      Suggested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: <stable@vger.kernel.org>	[3.14+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      30809f55
    • Ross Zwisler's avatar
      dax: fix race between colliding PMD & PTE entries · e2093926
      Ross Zwisler authored
      We currently have two related PMD vs PTE races in the DAX code.  These
      can both be easily triggered by having two threads reading and writing
      simultaneously to the same private mapping, with the key being that
      private mapping reads can be handled with PMDs but private mapping
      writes are always handled with PTEs so that we can COW.
      
      Here is the first race:
      
        CPU 0					CPU 1
      
        (private mapping write)
        __handle_mm_fault()
          create_huge_pmd() - FALLBACK
          handle_pte_fault()
            passes check for pmd_devmap()
      
      					(private mapping read)
      					__handle_mm_fault()
      					  create_huge_pmd()
      					    dax_iomap_pmd_fault() inserts PMD
      
            dax_iomap_pte_fault() does a PTE fault, but we already have a DAX PMD
            			  installed in our page tables at this spot.
      
      Here's the second race:
      
        CPU 0					CPU 1
      
        (private mapping read)
        __handle_mm_fault()
          passes check for pmd_none()
          create_huge_pmd()
            dax_iomap_pmd_fault() inserts PMD
      
        (private mapping write)
        __handle_mm_fault()
          create_huge_pmd() - FALLBACK
      					(private mapping read)
      					__handle_mm_fault()
      					  passes check for pmd_none()
      					  create_huge_pmd()
      
          handle_pte_fault()
            dax_iomap_pte_fault() inserts PTE
      					    dax_iomap_pmd_fault() inserts PMD,
      					       but we already have a PTE at
      					       this spot.
      
      The core of the issue is that while there is isolation between faults to
      the same range in the DAX fault handlers via our DAX entry locking,
      there is no isolation between faults in the code in mm/memory.c.  This
      means for instance that this code in __handle_mm_fault() can run:
      
      	if (pmd_none(*vmf.pmd) && transparent_hugepage_enabled(vma)) {
      		ret = create_huge_pmd(&vmf);
      
      But by the time we actually get to run the fault handler called by
      create_huge_pmd(), the PMD is no longer pmd_none() because a racing PTE
      fault has installed a normal PMD here as a parent.  This is the cause of
      the 2nd race.  The first race is similar - there is the following check
      in handle_pte_fault():
      
      	} else {
      		/* See comment in pte_alloc_one_map() */
      		if (pmd_devmap(*vmf->pmd) || pmd_trans_unstable(vmf->pmd))
      			return 0;
      
      So if a pmd_devmap() PMD (a DAX PMD) has been installed at vmf->pmd, we
      will bail and retry the fault.  This is correct, but there is nothing
      preventing the PMD from being installed after this check but before we
      actually get to the DAX PTE fault handlers.
      
      In my testing these races result in the following types of errors:
      
        BUG: Bad rss-counter state mm:ffff8800a817d280 idx:1 val:1
        BUG: non-zero nr_ptes on freeing mm: 15
      
      Fix this issue by having the DAX fault handlers verify that it is safe
      to continue their fault after they have taken an entry lock to block
      other racing faults.
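
      The "recheck after taking the entry lock" idea can be illustrated with
      a small single-threaded model. Everything here is an assumption made
      for illustration: struct range, the pmd_state and fault_result enums,
      and the two handler functions are hypothetical, and the entry_locked
      flag only marks where the real DAX entry lock would be held.

```c
#include <stdbool.h>

enum pmd_state { PMD_NONE, PMD_TABLE, PMD_DAX_HUGE };
enum fault_result { RES_DONE, RES_RETRY, RES_FALLBACK };

/* Hypothetical model of one PMD-sized range and its entry lock. */
struct range {
	enum pmd_state pmd;
	bool entry_locked;
};

/* PTE-sized fault: must not proceed if a DAX huge PMD is already there. */
static enum fault_result pte_fault(struct range *r)
{
	enum fault_result res = RES_DONE;

	r->entry_locked = true;          /* take the entry lock */
	if (r->pmd == PMD_DAX_HUGE)
		res = RES_RETRY;         /* lost the race: back out and retry */
	else
		r->pmd = PMD_TABLE;      /* a page table now backs this range */
	r->entry_locked = false;         /* drop the entry lock */
	return res;
}

/* PMD-sized fault: must not proceed if a page table is already there. */
static enum fault_result pmd_fault(struct range *r)
{
	enum fault_result res = RES_DONE;

	r->entry_locked = true;
	if (r->pmd == PMD_TABLE)
		res = RES_FALLBACK;      /* lost the race: fall back to PTEs */
	else
		r->pmd = PMD_DAX_HUGE;
	r->entry_locked = false;
	return res;
}
```

      The point of the model is only the ordering: each handler looks at the
      PMD again once it holds the lock, so whichever fault arrives second
      sees what the first one installed instead of overwriting it.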
      
      [ross.zwisler@linux.intel.com: improve fix for colliding PMD & PTE entries]
        Link: http://lkml.kernel.org/r/20170526195932.32178-1-ross.zwisler@linux.intel.com
      Link: http://lkml.kernel.org/r/20170522215749.23516-2-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: Pawel Lebioda <pawel.lebioda@intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Pawel Lebioda <pawel.lebioda@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Xiong Zhou <xzhou@redhat.com>
      Cc: Eryu Guan <eguan@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e2093926
    • Ross Zwisler's avatar
      mm: avoid spurious 'bad pmd' warning messages · d0f0931d
      Ross Zwisler authored
      When the pmd_devmap() checks were added by 5c7fb56e ("mm, dax:
      dax-pmd vs thp-pmd vs hugetlbfs-pmd") to add better support for DAX huge
      pages, they were all added to the end of if() statements after existing
      pmd_trans_huge() checks.  So, things like:
      
        -       if (pmd_trans_huge(*pmd))
        +       if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd))
      
      When further checks were added after pmd_trans_unstable() checks by
      commit 7267ec00 ("mm: postpone page table allocation until we have
      page to map") they were also added at the end of the conditional:
      
        +       if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))
      
      This ordering is fine for pmd_trans_huge(), but doesn't work for
      pmd_trans_unstable().  This is because DAX huge pages trip the bad_pmd()
      check inside of pmd_none_or_trans_huge_or_clear_bad() (called by
      pmd_trans_unstable()), which prints out a warning and returns 1.  So, we
      do end up doing the right thing, but only after spamming dmesg with
      suspicious looking messages:
      
        mm/pgtable-generic.c:39: bad pmd ffff8808daa49b88(84000001006000a5)
      
      Reorder these checks in a helper so that pmd_devmap() is checked first,
      avoiding the error messages, and add a comment explaining why the
      ordering is important.
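
      Why the order of the two checks matters can be shown with a userspace
      model of short-circuit evaluation. The fake_* functions and the
      warning counter below are hypothetical stand-ins for pmd_devmap(),
      pmd_trans_unstable() and the 'bad pmd' message; they are not the
      kernel implementations.

```c
#include <stdbool.h>

/* Hypothetical model of a PMD entry. */
struct fake_pmd {
	bool devmap;    /* models a DAX huge-page PMD */
	bool unstable;  /* models any other unstable state */
};

static int bad_pmd_warnings;  /* counts simulated 'bad pmd' messages */

static bool fake_pmd_devmap(const struct fake_pmd *pmd)
{
	return pmd->devmap;
}

/* Like the real check, a DAX PMD makes this warn and still return true. */
static bool fake_pmd_trans_unstable(const struct fake_pmd *pmd)
{
	if (pmd->devmap) {
		bad_pmd_warnings++;
		return true;
	}
	return pmd->unstable;
}

/* Old order: the noisy check runs first, so DAX PMDs trigger a warning. */
static bool unstable_or_devmap(const struct fake_pmd *pmd)
{
	return fake_pmd_trans_unstable(pmd) || fake_pmd_devmap(pmd);
}

/* New order: || short-circuits, so DAX PMDs never reach the noisy check. */
static bool devmap_or_unstable(const struct fake_pmd *pmd)
{
	return fake_pmd_devmap(pmd) || fake_pmd_trans_unstable(pmd);
}
```

      Both helpers return the same answer for every input; only the side
      effect differs, which matches the commit's remark that the old code
      did the right thing but spammed dmesg on the way.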
      
      Fixes: 7267ec00 ("mm: postpone page table allocation until we have page to map")
      Link: http://lkml.kernel.org/r/20170522215749.23516-1-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Pawel Lebioda <pawel.lebioda@intel.com>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Xiong Zhou <xzhou@redhat.com>
      Cc: Eryu Guan <eguan@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d0f0931d