1. 08 Feb, 2018 2 commits
  2. 07 Feb, 2018 3 commits
  3. 05 Feb, 2018 1 commit
    • cpufreq: Skip cpufreq resume if it's not suspended · 703cbaa6
      Bo Yan authored
      cpufreq_resume can be called even without a preceding cpufreq_suspend.
      This can happen in the following scenario:
      
          suspend_devices_and_enter
             --> dpm_suspend_start
                --> dpm_prepare
                    --> device_prepare : this function errors out
                --> dpm_suspend: this is skipped due to dpm_prepare failure
                                 this means cpufreq_suspend is skipped over
             --> goto Recover_platform, due to previous error
             --> goto Resume_devices
             --> dpm_resume_end
                 --> dpm_resume
                     --> cpufreq_resume
      
      If schedutil is used as the frequency governor, cpufreq_resume will
      eventually call sugov_start, which does the following:
      
          memset(sg_cpu, 0, sizeof(*sg_cpu));
          ....
      
      This effectively erases the function pointer for frequency updates,
      causing a crash later on. The function pointer would have been set
      correctly if the subsequent cpufreq_add_update_util_hook call had
      succeeded, but that function returns early because cpufreq_suspend
      was not called:
      
          if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu)))
      		return;
      
      The fix is to check cpufreq_suspended first. If it is false,
      cpufreq_suspend was never called, so do not resume cpufreq.
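      A minimal user-space sketch of that guard, assuming a plain global flag and hypothetical model_* functions (this is not the kernel's cpufreq code): resume becomes a no-op unless a matching suspend actually ran.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the cpufreq suspend/resume pairing.
 * Names mirror the commit message, but this is not kernel code.
 */
static bool cpufreq_suspended;  /* true between suspend and resume */
static int governor_starts;     /* counts simulated governor restarts */

static void model_cpufreq_suspend(void)
{
    assert(!cpufreq_suspended); /* suspend is not expected to nest */
    cpufreq_suspended = true;   /* the real code also stops governors */
}

static void model_cpufreq_resume(void)
{
    /* The guard: if suspend never ran (e.g. dpm_prepare failed), bail out. */
    if (!cpufreq_suspended)
        return;

    cpufreq_suspended = false;
    governor_starts++;          /* stands in for restarting the governor */
}
```

      Calling model_cpufreq_resume() on its own leaves governor_starts at 0; only a suspend followed by a resume restarts the (simulated) governor.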
      Signed-off-by: Bo Yan <byan@nvidia.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      [ rjw: Dropped printing a message ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  4. 29 Jan, 2018 3 commits
    • Merge tag 'pm-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 7f3fdd40
      Linus Torvalds authored
      Pull power management updates from Rafael Wysocki:
       "This includes some infrastructure changes in the PM core, mostly
        related to integration between runtime PM and system-wide suspend and
        hibernation, plus some driver changes depending on them and fixes for
        issues in that area which have become quite apparent recently.
      
        Also included are changes making more x86-based systems use the Low
        Power Sleep S0 _DSM interface by default, which turned out to be
        necessary to handle power button wakeups from suspend-to-idle on
        Surface Pro3.
      
        On the cpufreq front we have fixes and cleanups in the core, some new
        hardware support, driver updates and the removal of some unused code
        from the CPU cooling thermal driver.
      
        Apart from this, the Operating Performance Points (OPP) framework is
        prepared to be used with power domains in the future and there is a
        usual bunch of assorted fixes and cleanups.
      
        Specifics:
      
         - Define a PM driver flag allowing drivers to request that their
           devices be left in suspend after system-wide transitions to the
           working state if possible and add support for it to the PCI bus
           type and the ACPI PM domain (Rafael Wysocki).
      
         - Make the PM core carry out optimizations for devices with driver PM
           flags set in some cases and make a few drivers set those flags
           (Rafael Wysocki).
      
         - Fix and clean up wrapper routines allowing runtime PM device
           callbacks to be re-used for system-wide PM, change the generic
           power domains (genpd) framework to stop using those routines
           incorrectly and fix up a driver depending on that behavior of genpd
           (Rafael Wysocki, Ulf Hansson, Geert Uytterhoeven).
      
         - Fix and clean up the PM core's device wakeup framework and
           re-factor system-wide PM core code related to device wakeup
           (Rafael Wysocki, Ulf Hansson, Brian Norris).
      
         - Make more x86-based systems use the Low Power Sleep S0 _DSM
           interface by default (to fix power button wakeup from
           suspend-to-idle on Surface Pro3) and add a kernel command line
           switch to tell it to ignore the system sleep blacklist in the ACPI
           core (Rafael Wysocki).
      
         - Fix a race condition related to cpufreq governor module removal and
           clean up the governor management code in the cpufreq core (Rafael
           Wysocki).
      
         - Drop the unused generic code related to the handling of the static
           power energy usage model in the CPU cooling thermal driver along
           with the corresponding documentation (Viresh Kumar).
      
         - Add mt2712 support to the Mediatek cpufreq driver (Andrew-sh
           Cheng).
      
         - Add a new operating point to the imx6ul and imx6q cpufreq drivers
           and switch the latter to using clk_bulk_get() (Anson Huang, Dong
           Aisheng).
      
         - Add support for multiple regulators to the TI cpufreq driver along
           with a new DT binding related to that and clean up that driver
           somewhat (Dave Gerlach).
      
         - Fix a powernv cpufreq driver regression leading to incorrect CPU
           frequency reporting, fix that driver to deal with non-contiguous
           P-states correctly and clean it up (Gautham Shenoy, Shilpasri
           Bhat).
      
         - Add support for frequency scaling on Armada 37xx SoCs through the
           generic DT cpufreq driver (Gregory CLEMENT).
      
         - Fix error code paths in the mvebu cpufreq driver (Gregory CLEMENT).
      
         - Fix a transition delay setting regression in the longhaul cpufreq
           driver (Viresh Kumar).
      
         - Add Skylake X (server) support to the intel_pstate cpufreq driver
           and clean up that driver somewhat (Srinivas Pandruvada).
      
         - Clean up the cpufreq statistics collection code (Viresh Kumar).
      
         - Drop cluster terminology and dependency on physical_package_id from
           the PSCI driver and drop dependency on arm_big_little from the SCPI
           cpufreq driver (Sudeep Holla).
      
         - Add support for system-wide suspend and resume to the RAPL power
           capping driver and drop a redundant semicolon from it (Zhen Han,
           Luis de Bethencourt).
      
         - Make SPI domain validation (in the SCSI SPI transport driver) and
           system-wide suspend mutually exclusive as they rely on the same
           underlying mechanism and cannot be carried out at the same time
           (Bart Van Assche).
      
         - Fix the computation of the amount of memory to preallocate in the
           hibernation core and clean up one function in there (Rainer Fiebig,
           Kyungsik Lee).
      
         - Prepare the Operating Performance Points (OPP) framework for being
           used with power domains and clean up one function in it (Viresh
           Kumar, Wei Yongjun).
      
         - Clean up the generic sysfs interface for device PM (Andy
           Shevchenko).
      
         - Fix several minor issues in power management frameworks and clean
           them up a bit (Arvind Yadav, Bjorn Andersson, Geert Uytterhoeven,
           Gustavo Silva, Julia Lawall, Luis de Bethencourt, Paul Gortmaker,
           Sergey Senozhatsky, gaurav jindal).
      
         - Make it easier to disable PM via Kconfig (Mark Brown).
      
         - Clean up the cpupower and intel_pstate_tracer utilities (Doug
           Smythies, Laura Abbott)"
      
      * tag 'pm-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (89 commits)
        PCI / PM: Remove spurious semicolon
        cpufreq: scpi: remove arm_big_little dependency
        drivers: psci: remove cluster terminology and dependency on physical_package_id
        powercap: intel_rapl: Fix trailing semicolon
        dmaengine: rcar-dmac: Make DMAC reinit during system resume explicit
        PM / runtime: Allow no callbacks in pm_runtime_force_suspend|resume()
        PM / hibernate: Drop unused parameter of enough_swap
        PM / runtime: Check ignore_children in pm_runtime_need_not_resume()
        PM / runtime: Rework pm_runtime_force_suspend/resume()
        PM / genpd: Stop/start devices without pm_runtime_force_suspend/resume()
        cpufreq: powernv: Dont assume distinct pstate values for nominal and pmin
        cpufreq: intel_pstate: Add Skylake servers support
        cpufreq: intel_pstate: Replace bxt_funcs with core_funcs
        platform/x86: surfacepro3: Support for wakeup from suspend-to-idle
        ACPI / PM: Use Low Power S0 Idle on more systems
        PM / wakeup: Print warn if device gets enabled as wakeup source during sleep
        PM / domains: Don't skip driver's ->suspend|resume_noirq() callbacks
        PM / core: Propagate wakeup_path status flag in __device_suspend_late()
        PM / core: Re-structure code for clearing the direct_complete flag
        powercap: add suspend and resume mechanism for SOC power limit
        ...
    • Merge tag 'sound-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 1c1f395b
      Linus Torvalds authored
      Pull sound updates from Takashi Iwai:
       "The major core API changes in this cycle are the still-ongoing ASoC
        componentization work. Other than that, there are only a few small
        changes, such as 20-bit PCM format support.
      
        Meanwhile, the majority of the remaining changes are for ASoC drivers:
      
         - Large cleanups of some of the TI CODEC drivers
      
         - Continued work on Intel ASoC stuff for new quirks, ACPI GPIO
           handling, Kconfigs and lots of cleanups
      
         - Refactoring of the Freescale SSI driver, as preliminary work for
           the upcoming changes
      
         - Work on ST DFSDM driver, including the required IIO patches
      
         - New drivers for Allwinner A83T, Maxim MAX98373, SocioNext UniPhier
           EVEA, Tempo Semiconductor TSCS42xx and TI PCM816x, TAS5722 and
           TAS6424 devices
      
         - Removal of dead code for SN95031 and board drivers
      
        Last but not least, a few HD-audio and USB-audio quirks are included
        as usual, too"
      
      * tag 'sound-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (303 commits)
        ALSA: hda - Reduce the suspend time consumption for ALC256
        ASoC: use seq_file to dump the contents of dai_list,platform_list and codec_list
        ASoC: soc-core: add missing EXPORT_SYMBOL_GPL() for snd_soc_rtdcom_lookup
        IIO: ADC: stm32-dfsdm: remove unused variable again
        ASoC: bcm2835: fix hw_params error when device is in prepared state
        ASoC: mxs-sgtl5000: Do not print error on probe deferral
        ASoC: sgtl5000: Do not print error on probe deferral
        ASoC: Intel: remove select on non-existing SND_SOC_INTEL_COMMON
        ALSA: usb-audio: Support changing input on Sound Blaster E1
        ASoC: Intel: remove second duplicated assignment to pointer 'res'
        ALSA: hda/realtek - update ALC215 depop optimize
        ALSA: hda/realtek - Support headset mode for ALC215/ALC285/ALC289
        ALSA: pcm: Fix trailing semicolon
        ASoC: add Component level .read/.write
        ASoC: cx20442: fix regression by adding back .read/.write
        ASoC: uda1380: fix regression by adding back .read/.write
        ASoC: tlv320dac33: fix regression by adding back .read/.write
        ALSA: hda - Use IS_REACHABLE() for dependency on input
        IIO: ADC: stm32-dfsdm: fix static check warning
        IIO: ADC: stm32-dfsdm: code optimization
        ...
    • Merge tag 'init_task-20180117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 49f9c355
      Linus Torvalds authored
      Pull init_task initializer cleanups from David Howells:
       "It doesn't seem useful to have the init_task in a header file rather
        than in a normal source file. We could consolidate init_task handling
        instead and expand out various macros.
      
        Here's a series of patches that consolidate init_task handling:
      
         (1) Make THREAD_SIZE available to vmlinux.lds for cris, hexagon and
             openrisc.
      
         (2) Alter the INIT_TASK_DATA linker script macro to set
             init_thread_union and init_stack rather than defining these in C.
      
             Insert init_task and init_thread_info into the init_stack area in
             the linker script as appropriate to the configuration, with
             different section markers so that they end up correctly ordered.
      
             We can then merge ia64's init_task.c into the main one.
      
             We then have a bunch of single-use INIT_*() macros that seem only
             to be macros because they used to be used per-arch. We can then
             expand these at their single point of use and get rid of a few
             lines and a lot of backslashes.
      
         (3) Expand INIT_TASK() in place.
      
         (4) Expand in place various small INIT_*() macros that are defined
             conditionally. Expand them and surround them by #if[n]def/#endif
             in the .c file as it takes fewer lines.
      
         (5) Expand INIT_SIGNALS() and INIT_SIGHAND() in place.
      
         (6) Expand INIT_STRUCT_PID in place.
      
        These macros can then be discarded"
      
      * tag 'init_task-20180117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        Expand INIT_STRUCT_PID and remove
        Expand the INIT_SIGNALS and INIT_SIGHAND macros and remove
        Expand various INIT_* macros and remove
        Expand INIT_TASK() in init/init_task.c and remove
        Construct init thread stack in the linker script rather than by union
        openrisc: Make THREAD_SIZE available to vmlinux.lds
        hexagon: Make THREAD_SIZE available to vmlinux.lds
        cris: Make THREAD_SIZE available to vmlinux.lds
  5. 28 Jan, 2018 8 commits
  6. 27 Jan, 2018 2 commits
    • hrtimer: Reset hrtimer cpu base proper on CPU hotplug · d5421ea4
      Thomas Gleixner authored
      The hrtimer interrupt code contains a hang detection and mitigation
      mechanism, which prevents a long-delayed hrtimer interrupt from causing
      continuous retriggering of interrupts that would keep the system from
      making progress. If a hang is detected, the timer hardware is programmed
      with a certain delay into the future and a flag is set in the hrtimer cpu
      base which prevents newly enqueued timers from reprogramming the timer
      hardware prior to the chosen delay. The subsequent hrtimer interrupt after
      the delay clears the flag and resumes normal operation.
      
      If such a hang happens in the last hrtimer interrupt before a CPU is
      unplugged then the hang_detected flag is set and stays that way when the
      CPU is plugged in again. At that point the timer hardware is not armed and
      it cannot be armed because the hang_detected flag is still active, so
      nothing clears that flag. As a consequence the CPU does not receive hrtimer
      interrupts and no timers expire on that CPU which results in RCU stalls and
      other malfunctions.
      
      Clear the flag along with some other less critical members of the hrtimer
      cpu base to ensure starting from a clean state when a CPU is plugged in.
      
      Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the
      root cause of that hard to reproduce heisenbug. Once understood it's
      trivial and certainly justifies a brown paperbag.
      
      Fixes: 41d2e494 ("hrtimer: Tune hrtimer_interrupt hang logic")
      Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Sewior <bigeasy@linutronix.de>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
    • x86: Mark hpa as a "Designated Reviewer" for the time being · 8a95b74d
      H. Peter Anvin authored
      Due to some unfortunate events, I have not been directly involved in
      the x86 kernel patch flow for a while now.  I have also not been able
      to ramp back up by now like I had hoped to, and after reviewing what I
      will need to work on both internally at Intel and elsewhere in the near
      term, it is clear that I am not going to be able to ramp back up until
      late 2018 at the very earliest.
      
      It is not acceptable to not recognize that this load is currently
      taken by Ingo and Thomas without my direct participation, so I mark
      myself as R: (designated reviewer) rather than M: (maintainer) until
      further notice.  This is in fact recognizing the de facto situation
      for the past few years.
      
      I have obviously no intention of going away, and I will do everything
      within my power to improve Linux on x86 and x86 for Linux.  This,
      however, puts credit where it is due and reflects a change of focus.
      
      This patch also removes stale entries for portions of the x86
      architecture which have not been maintained separately from arch/x86
      for a long time.  If there is a reason to re-introduce them then that
      can happen later.
      Signed-off-by: H. Peter Anvin <h.peter.anvin@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Bruce Schlobohm <bruce.schlobohm@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180125195934.5253-1-hpa@zytor.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. 26 Jan, 2018 13 commits
  8. 25 Jan, 2018 8 commits
    • drm/nouveau: Move irq setup/teardown to pci ctor/dtor · 0fd189a9
      Lyude Paul authored
      For a while we've been having issues with seemingly random interrupts
      coming from nvidia cards when resuming them. Originally the fix for this
      was thought to be just re-arming the MSI interrupt registers right after
      re-allocating our IRQs; however, it seems a lot of what we do is both
      wrong and not even necessary.
      
      This was made apparent by what appeared to be a regression in the
      mainline kernel that started introducing suspend/resume issues for
      nouveau:
      
              a0c9259d (irq/matrix: Spread interrupts on allocation)
      
      After this commit was introduced, we started getting interrupts from the
      GPU before we actually re-allocated our own IRQ (see references below)
      and assigned the IRQ handler. Investigating this showed that the
      problem was not with the commit, but with the fact that nouveau frees
      and re-allocates its IRQs before and after suspend/resume at all.
      
      For starters: drivers in the linux kernel haven't had to handle
      freeing/re-allocating their IRQs during suspend/resume cycles for quite
      a while now. Nouveau seems to be one of the few drivers left that still
      does this, despite the fact there's no reason we actually need to since
      disabling interrupts from the device side should be enough, as the
      kernel is already smart enough to know to disable host-side interrupts
      for us before going into suspend. Since we were tearing down our IRQs by
      hand however, that means there was a short period during resume where
      interrupts could be received before we re-allocated our IRQ which would
      lead to us getting an unhandled IRQ. Since we never handle said IRQ and
      re-arm the interrupt registers, this would cause us to miss all of the
      interrupts from the GPU and cause our init process to start timing out
      on anything requiring interrupts.
      
      So, since this whole setup/teardown every suspend/resume cycle is
      useless anyway, move irq setup/teardown into the pci subdev's ctor/dtor
      functions instead so they're only called at driver load and driver
      unload. This should fix most of the issues with pending interrupts on
      resume, along with getting suspend/resume for nouveau to work again.
      
      As well, this probably means we can also just remove the msi rearm call
      inside nvkm_pci_init(). But since our main focus here is to fix
      suspend/resume before 4.15, we'll save that for a later patch.
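      A hypothetical sketch of the resulting lifecycle (not nouveau's actual code; all names are invented for illustration): the IRQ handler is installed in the constructor and removed in the destructor, while suspend and resume only toggle device-side interrupt enables, so an interrupt arriving early during resume still finds a handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a device whose IRQ handler is installed once. */
struct model_gpu {
    bool irq_handler_installed;  /* request_irq()-like state */
    bool dev_irqs_enabled;       /* device-side interrupt enable */
    int irq_requests;            /* how many times the handler was installed */
};

static void model_ctor(struct model_gpu *g)      /* driver load */
{
    g->irq_handler_installed = true;
    g->irq_requests++;
    g->dev_irqs_enabled = true;
}

static void model_suspend(struct model_gpu *g)
{
    /* Only silence the device; the handler stays installed. */
    g->dev_irqs_enabled = false;
}

static void model_resume(struct model_gpu *g)
{
    g->dev_irqs_enabled = true;
}

/* An interrupt is handled only if a handler exists when it arrives. */
static bool model_irq_arrives(const struct model_gpu *g)
{
    return g->irq_handler_installed;
}

static void model_dtor(struct model_gpu *g)      /* driver unload */
{
    assert(g->irq_handler_installed);
    g->dev_irqs_enabled = false;
    g->irq_handler_installed = false;
}
```

      The handler is requested exactly once per load, regardless of how many suspend/resume cycles run in between.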
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Cc: Karol Herbst <kherbst@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
    • net: don't call update_pmtu unconditionally · f15ca723
      Nicolas Dichtel authored
      Some dst_ops (e.g. md_dst_ops) don't set this handler. This may result in:
      "BUG: unable to handle kernel NULL pointer dereference at           (null)"
      
      Let's add a helper to check if update_pmtu is available before calling it.
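      A sketch of such a helper, using hypothetical stripped-down stand-ins for dst_entry and dst_ops rather than the real networking structures: the handler is invoked only when a dst is attached and its ops table actually provides update_pmtu.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for struct dst_ops / dst_entry. */
struct model_dst;

struct model_dst_ops {
    /* May be NULL: not every ops table implements it (cf. md_dst_ops). */
    void (*update_pmtu)(struct model_dst *dst, unsigned int mtu);
};

struct model_dst {
    const struct model_dst_ops *ops;
    unsigned int pmtu;
};

/* An example handler for ops tables that do implement the hook. */
static void model_update_pmtu(struct model_dst *dst, unsigned int mtu)
{
    dst->pmtu = mtu;
}

/* The guard: call the handler only when both the dst and the hook exist. */
static void model_dst_update_pmtu(struct model_dst *dst, unsigned int mtu)
{
    if (!dst)
        return;                  /* no route attached to the packet */
    assert(dst->ops != NULL);    /* every dst in this model has an ops table */
    if (dst->ops->update_pmtu)
        dst->ops->update_pmtu(dst, mtu);
}
```

      A dst whose ops table leaves update_pmtu as NULL is simply left untouched instead of triggering a NULL pointer dereference.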
      
      Fixes: 52a589d5 ("geneve: update skb dst pmtu on tx path")
      Fixes: a93bf0ff ("vxlan: update skb dst pmtu on tx path")
      CC: Roman Kapl <code@rkapl.cz>
      CC: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 6e20630e
      Linus Torvalds authored
      Pull KVM fixes from Radim Krčmář:
       "Fix races and a potential use after free in the s390 cmma migration
        code"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: s390: add proper locking for CMMA migration bitmap
    • Merge tag 'for-4.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 525273fb
      Linus Torvalds authored
      Pull btrfs fix from David Sterba:
       "It's been reported recently that readdir can list stale entries under
        some conditions. Fix it."
      
      * tag 'for-4.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        Btrfs: fix stale entries in readdir
    • net: tcp: close sock if net namespace is exiting · 4ee806d5
      Dan Streetman authored
      When a tcp socket is closed, if it detects that its net namespace is
      exiting, close immediately and do not wait for FIN sequence.
      
      For normal sockets, a reference is taken to their net namespace, so it will
      never exit while the socket is open.  However, kernel sockets do not take a
      reference to their net namespace, so it may begin exiting while the kernel
      socket is still open.  In this case if the kernel socket is a tcp socket,
      it will stay open trying to complete its close sequence.  The sock's dst(s)
      hold references to their interfaces, and all of those references are
      transferred to the namespace's loopback interface when the real
      interfaces are taken down.
      When the namespace tries to take down its loopback interface, it hangs
      waiting for all references to the loopback interface to release, which
      results in messages like:
      
      unregister_netdevice: waiting for lo to become free. Usage count = 1
      
      These messages continue until the socket finally times out and closes.
      Since the net namespace cleanup holds the net_mutex while calling its
      registered pernet callbacks, any new net namespace initialization is
      blocked until the current net namespace finishes exiting.
      
      After this change, the tcp socket notices the exiting net namespace, and
      closes immediately, releasing its dst(s) and their reference to the
      loopback interface, which lets the net namespace continue exiting.
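      The close-time decision can be sketched as follows. The types, state names and the lo_refcnt counter are hypothetical simplifications of the netns and dst refcounting described above, not the kernel's TCP implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a namespace whose loopback device is refcounted. */
struct model_net {
    bool exiting;          /* namespace teardown has begun */
    int lo_refcnt;         /* references held on the loopback device */
};

enum model_tcp_state { MODEL_ESTABLISHED, MODEL_FIN_WAIT1, MODEL_CLOSED };

struct model_sock {
    struct model_net *net;
    enum model_tcp_state state;
    bool holds_dst;        /* a cached route pins the loopback device */
};

static void model_release_dst(struct model_sock *sk)
{
    if (sk->holds_dst) {
        sk->holds_dst = false;
        sk->net->lo_refcnt--;
    }
}

/* Close: skip the FIN sequence when the namespace is already exiting. */
static void model_tcp_close(struct model_sock *sk)
{
    assert(sk->state == MODEL_ESTABLISHED);
    if (sk->net->exiting) {
        sk->state = MODEL_CLOSED;
        model_release_dst(sk);        /* lets the loopback device go away */
    } else {
        sk->state = MODEL_FIN_WAIT1;  /* normal path: wait for the peer */
    }
}
```

      In the exiting case the loopback reference count drops to zero immediately, so the "waiting for lo to become free" loop has nothing left to wait for.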
      
      Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
      Signed-off-by: Dan Streetman <ddstreet@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • perf/x86: Fix perf,x86,cpuhp deadlock · efe951d3
      Peter Zijlstra authored
      More lockdep gifts, a 5-way lockup race:
      
      	perf_event_create_kernel_counter()
      	  perf_event_alloc()
      	    perf_try_init_event()
      	      x86_pmu_event_init()
      		__x86_pmu_event_init()
      		  x86_reserve_hardware()
       #0		    mutex_lock(&pmc_reserve_mutex);
      		    reserve_ds_buffer()
       #1		      get_online_cpus()
      
      	perf_event_release_kernel()
      	  _free_event()
      	    hw_perf_event_destroy()
      	      x86_release_hardware()
       #0		mutex_lock(&pmc_reserve_mutex)
      		release_ds_buffer()
       #1		  get_online_cpus()
      
       #1	do_cpu_up()
      	  perf_event_init_cpu()
       #2	    mutex_lock(&pmus_lock)
       #3	    mutex_lock(&ctx->mutex)
      
      	sys_perf_event_open()
      	  mutex_lock_double()
       #3	    mutex_lock(ctx->mutex)
       #4	    mutex_lock_nested(ctx->mutex, 1);
      
      	perf_try_init_event()
       #4	  mutex_lock_nested(ctx->mutex, 1)
      	  x86_pmu_event_init()
      	    intel_pmu_hw_config()
      	      x86_add_exclusive()
       #0		mutex_lock(&pmc_reserve_mutex)
      
      Fix it by using ordering constructs instead of locking.
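      To see why these five paths deadlock, the trace can be reduced to a lock-order graph and checked for a cycle, roughly the kind of reasoning lockdep automates. The sketch below is a hypothetical, simplified checker (adjacency matrix plus depth-first search), not lockdep itself; lock ids 0 to 4 correspond to the #0..#4 labels above.

```c
#include <assert.h>
#include <stdbool.h>

#define NLOCKS 5

/* edge[a][b]: some path acquires lock b while holding lock a. */
static bool edge[NLOCKS][NLOCKS];

static void add_order(int held, int acquired)
{
    assert(held >= 0 && held < NLOCKS && acquired >= 0 && acquired < NLOCKS);
    edge[held][acquired] = true;
}

/* DFS colors: 0 = unvisited, 1 = on the current path, 2 = finished. */
static int color[NLOCKS];

static bool dfs_finds_cycle(int u)
{
    color[u] = 1;
    for (int v = 0; v < NLOCKS; v++) {
        if (!edge[u][v])
            continue;
        if (color[v] == 1)
            return true;                 /* back edge: ordering cycle */
        if (color[v] == 0 && dfs_finds_cycle(v))
            return true;
    }
    color[u] = 2;
    return false;
}

static bool has_lock_cycle(void)
{
    for (int i = 0; i < NLOCKS; i++)
        color[i] = 0;
    for (int i = 0; i < NLOCKS; i++)
        if (color[i] == 0 && dfs_finds_cycle(i))
            return true;
    return false;
}

/* Edges transcribed from the code paths in the trace. */
static void load_trace_edges(void)
{
    add_order(0, 1);   /* pmc_reserve_mutex -> get_online_cpus() */
    add_order(1, 2);   /* do_cpu_up(): hotplug lock -> pmus_lock */
    add_order(2, 3);   /* pmus_lock -> ctx->mutex */
    add_order(3, 4);   /* mutex_lock_double(): ctx -> nested ctx */
    add_order(4, 0);   /* nested ctx -> pmc_reserve_mutex */
}
```

      Loading the five edges yields a cycle; clearing any single edge, for example #4 -> #0, makes the graph acyclic, which is why removing just one of these lock dependencies is enough to break the deadlock.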
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/core: Fix ctx::mutex deadlock · 0c7296ca
      Peter Zijlstra authored
      Lockdep noticed the following 3-way lockup scenario:
      
      	sys_perf_event_open()
      	  perf_event_alloc()
      	    perf_try_init_event()
       #0	      ctx = perf_event_ctx_lock_nested(1)
      	      perf_swevent_init()
      		swevent_hlist_get()
       #1		  mutex_lock(&pmus_lock)
      
      	perf_event_init_cpu()
       #1	  mutex_lock(&pmus_lock)
       #2	  mutex_lock(&ctx->mutex)
      
      	sys_perf_event_open()
      	  mutex_lock_double()
       #2	   mutex_lock()
       #0	   mutex_lock_nested()
      
      And while we need that perf_event_ctx_lock_nested() for HW PMUs such
      that they can iterate the sibling list, trying to match it to the
      available counters, the software PMUs need do no such thing. Exclude
      them.
      
      In particular the swevent PMU triggers the above inversion, while the
      tpevent PMU triggers a more elaborate one through its event_mutex.
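      The exclusion can be sketched as conditional locking. This is a hypothetical model (a boolean stands in for ctx->mutex and an enum for the PMU type), not the kernel's perf_try_init_event(): only hardware PMUs, which walk the sibling list, take the context lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the kernel's struct pmu or perf_event. */
enum model_pmu_kind { MODEL_PMU_HW, MODEL_PMU_SW };

struct model_ctx {
    bool mutex_held;            /* stand-in for ctx->mutex */
};

struct model_event {
    enum model_pmu_kind kind;
    struct model_ctx *ctx;
};

static int ctx_locks_taken;     /* counts ctx->mutex acquisitions */

static void model_ctx_lock(struct model_ctx *ctx)
{
    assert(!ctx->mutex_held);   /* re-acquiring would self-deadlock */
    ctx->mutex_held = true;
    ctx_locks_taken++;
}

static void model_ctx_unlock(struct model_ctx *ctx)
{
    assert(ctx->mutex_held);
    ctx->mutex_held = false;
}

/* PMU init: hardware PMUs walk the sibling list, which needs the lock. */
static int model_pmu_event_init(struct model_event *ev)
{
    if (ev->kind == MODEL_PMU_HW)
        assert(ev->ctx->mutex_held);
    return 0;
}

/*
 * Take the context lock only for hardware PMUs. Software PMUs never hold
 * it during init, so they cannot create the ctx->mutex -> pmus_lock
 * ordering shown in the trace.
 */
static int model_try_init_event(struct model_event *ev)
{
    bool need_ctx = (ev->kind == MODEL_PMU_HW);
    int ret;

    if (need_ctx)
        model_ctx_lock(ev->ctx);
    ret = model_pmu_event_init(ev);
    if (need_ctx)
        model_ctx_unlock(ev->ctx);
    return ret;
}
```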
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/core: Fix another perf,trace,cpuhp lock inversion · 43fa87f7
      Peter Zijlstra authored
      Lockdep noticed the following 3-way lockup race:
      
              perf_trace_init()
       #0       mutex_lock(&event_mutex)
                perf_trace_event_init()
                  perf_trace_event_reg()
                    tp_event->class->reg() := tracepoint_probe_register
       #1              mutex_lock(&tracepoints_mutex)
                        trace_point_add_func()
       #2                  static_key_enable()
      
       #2	do_cpu_up()
      	  perf_event_init_cpu()
       #3	    mutex_lock(&pmus_lock)
       #4	    mutex_lock(&ctx->mutex)
      
      	perf_ioctl()
       #4	  ctx = perf_event_ctx_lock()
      	  _perf_ioctl()
      	    ftrace_profile_set_filter()
       #0	      mutex_lock(&event_mutex)
      
      Fudge it for now by noting that the tracepoint state does not depend
      on the event <-> context relation. Ugly though :/
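      One way to read that "fudge" is to release the context mutex around the filter update, so that event_mutex is never acquired while ctx->mutex is held; treat that as an assumption, since the message does not spell out the mechanism. The hypothetical sketch below records observed lock orderings to show the ctx -> event_mutex edge disappearing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock ids for the two locks that close the cycle. */
enum { LOCK_EVENT_MUTEX, LOCK_CTX_MUTEX, NR_MODEL_LOCKS };

static bool held[NR_MODEL_LOCKS];
/* order_seen[a][b]: lock b was acquired while lock a was held. */
static bool order_seen[NR_MODEL_LOCKS][NR_MODEL_LOCKS];

static void model_lock(int l)
{
    assert(!held[l]);
    for (int h = 0; h < NR_MODEL_LOCKS; h++)
        if (held[h])
            order_seen[h][l] = true;
    held[l] = true;
}

static void model_unlock(int l)
{
    assert(held[l]);
    held[l] = false;
}

/* Stand-in for ftrace_profile_set_filter(): it needs event_mutex. */
static void model_set_filter(void)
{
    model_lock(LOCK_EVENT_MUTEX);
    /* ... update tracepoint filter state ... */
    model_unlock(LOCK_EVENT_MUTEX);
}

/*
 * Stand-in for the ioctl path. With drop_ctx set, the context mutex is
 * released around the filter update; this is only safe because the
 * tracepoint state does not depend on the event <-> context relation.
 */
static void model_ioctl_set_filter(bool drop_ctx)
{
    model_lock(LOCK_CTX_MUTEX);
    if (drop_ctx)
        model_unlock(LOCK_CTX_MUTEX);
    model_set_filter();
    if (drop_ctx)
        model_lock(LOCK_CTX_MUTEX);
    model_unlock(LOCK_CTX_MUTEX);
}
```

      With drop_ctx false the model records the ctx -> event_mutex ordering that closes the cycle; with drop_ctx true that ordering is never observed.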
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>