1. 09 Feb, 2018 8 commits
  2. 28 Jan, 2018 8 commits
  3. 27 Jan, 2018 2 commits
    • hrtimer: Reset hrtimer cpu base proper on CPU hotplug · d5421ea4
      Thomas Gleixner authored
      The hrtimer interrupt code contains a hang detection and mitigation
      mechanism, which prevents a long-delayed hrtimer interrupt from causing
      continuous retriggering of interrupts that would keep the system from making
      progress. If a hang is detected then the timer hardware is programmed with
      a certain delay into the future and a flag is set in the hrtimer cpu base
      which prevents newly enqueued timers from reprogramming the timer hardware
      prior to the chosen delay. The subsequent hrtimer interrupt after the delay
      clears the flag and resumes normal operation.
      
      If such a hang happens in the last hrtimer interrupt before a CPU is
      unplugged then the hang_detected flag is set and stays that way when the
      CPU is plugged in again. At that point the timer hardware is not armed and
      it cannot be armed because the hang_detected flag is still active, so
      nothing clears that flag. As a consequence the CPU does not receive hrtimer
      interrupts and no timers expire on that CPU which results in RCU stalls and
      other malfunctions.
      
      Clear the flag along with some other less critical members of the hrtimer
      cpu base to ensure starting from a clean state when a CPU is plugged in.
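      
      The failure sequence described above can be modelled in a few lines of
      userspace C. This is only a sketch under stated assumptions: the struct
      and every model_* function are hypothetical stand-ins, not the kernel's
      hrtimer cpu base or its API, and the real fix resets more members than
      the single flag shown here.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for the per-CPU hrtimer base. */
struct cpu_base_model {
	bool hang_detected; /* set when the interrupt code detects a hang */
	bool hw_armed;      /* true while the timer hardware is programmed */
};

/* Hang mitigation: program the hardware for a later point, set the flag. */
void model_hang_detected(struct cpu_base_model *b)
{
	b->hang_detected = true;
	b->hw_armed = true;
}

/* Newly enqueued timers must not reprogram the hardware while flagged. */
void model_enqueue_timer(struct cpu_base_model *b)
{
	if (b->hang_detected)
		return;
	b->hw_armed = true;
}

/* The delayed interrupt fires, clears the flag, normal operation resumes. */
void model_interrupt(struct cpu_base_model *b)
{
	if (!b->hw_armed)
		return; /* no armed hardware, no interrupt */
	b->hw_armed = false;
	b->hang_detected = false;
}

/* Unplug: the hardware is no longer armed, software state is left alone. */
void model_cpu_offline(struct cpu_base_model *b)
{
	b->hw_armed = false;
}

/* Plug in again; reset_state models the behaviour with the fix applied. */
void model_cpu_online(struct cpu_base_model *b, bool reset_state)
{
	if (reset_state)
		b->hang_detected = false;
}

/* Returns whether a timer enqueued after re-plug arms the hardware. */
bool model_replug_arms_timer(bool reset_state)
{
	struct cpu_base_model b = { false, false };

	model_hang_detected(&b);   /* hang in the last interrupt before unplug */
	model_cpu_offline(&b);
	model_cpu_online(&b, reset_state);
	model_enqueue_timer(&b);
	return b.hw_armed;
}
```

      In this model, model_replug_arms_timer(false) leaves the hardware
      unarmed forever, because nothing can clear the stale flag, while
      model_replug_arms_timer(true) arms it again.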
      
      Thanks to Paul, Sebastian and Anna-Maria for their help in getting down to
      the root cause of that hard-to-reproduce heisenbug. Once understood, it's
      trivial and certainly justifies a brown paper bag.
      
      Fixes: 41d2e494 ("hrtimer: Tune hrtimer_interrupt hang logic")
      Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
    • x86: Mark hpa as a "Designated Reviewer" for the time being · 8a95b74d
      H. Peter Anvin authored
      Due to some unfortunate events, I have not been directly involved in
      the x86 kernel patch flow for a while now.  I have also not been able
      to ramp back up by now like I had hoped to, and after reviewing what I
      will need to work on both internally at Intel and elsewhere in the near
      term, it is clear that I am not going to be able to ramp back up until
      late 2018 at the very earliest.
      
      It is not acceptable to not recognize that this load is currently
      taken by Ingo and Thomas without my direct participation, so I mark
      myself as R: (designated reviewer) rather than M: (maintainer) until
      further notice.  This is in fact recognizing the de facto situation
      for the past few years.
      
      I have obviously no intention of going away, and I will do everything
      within my power to improve Linux on x86 and x86 for Linux.  This,
      however, puts credit where it is due and reflects a change of focus.
      
      This patch also removes stale entries for portions of the x86
      architecture which have not been maintained separately from arch/x86
      for a long time.  If there is a reason to re-introduce them then that
      can happen later.
      
      Signed-off-by: H. Peter Anvin <h.peter.anvin@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Bruce Schlobohm <bruce.schlobohm@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180125195934.5253-1-hpa@zytor.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 26 Jan, 2018 13 commits
  5. 25 Jan, 2018 9 commits
    • drm/nouveau: Move irq setup/teardown to pci ctor/dtor · 0fd189a9
      Lyude Paul authored
      For a while we've been having issues with seemingly random interrupts
      coming from nvidia cards when resuming them. Originally the fix for this
      was thought to be just re-arming the MSI interrupt registers right after
      re-allocating our IRQs. However, it seems a lot of what we do is both
      wrong and not even necessary.
      
      This was made apparent by what appeared to be a regression in the
      mainline kernel that started introducing suspend/resume issues for
      nouveau:
      
              a0c9259d (irq/matrix: Spread interrupts on allocation)
      
      After this commit was introduced, we started getting interrupts from the
      GPU before we actually re-allocated our own IRQ (see references below)
      and assigned the IRQ handler. Investigation showed that the problem was
      not with the commit, but with the fact that nouveau even frees and
      re-allocates its IRQs before and after suspend/resume.
      
      For starters: drivers in the Linux kernel haven't had to handle
      freeing/re-allocating their IRQs during suspend/resume cycles for quite
      a while now. Nouveau seems to be one of the few drivers left that still
      does this, even though there's no reason to: disabling interrupts from
      the device side should be enough, as the kernel already disables
      host-side interrupts for us before going into suspend. Because we were
      tearing down our IRQs by hand, however, there was a short period during
      resume where interrupts could be received before we re-allocated our
      IRQ, which would lead to an unhandled IRQ. Since we never handle said
      IRQ and never re-arm the interrupt registers, we would miss all further
      interrupts from the GPU, and our init process would start timing out
      on anything requiring interrupts.
      
      So, since this whole setup/teardown every suspend/resume cycle is
      useless anyway, move irq setup/teardown into the pci subdev's ctor/dtor
      functions instead so they're only called at driver load and driver
      unload. This should fix most of the issues with pending interrupts on
      resume, along with getting suspend/resume for nouveau to work again.
      
      As well, this probably means we can also just remove the msi rearm call
      inside nvkm_pci_init(). But since our main focus here is to fix
      suspend/resume before 4.15, we'll save that for a later patch.
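      
      The difference between the two IRQ lifecycles can be sketched with a
      small userspace model. Everything here is a hypothetical illustration:
      the struct and the model_* functions are assumptions made for this
      example and do not correspond to nouveau's or the PCI core's actual
      functions.

```c
#include <stdbool.h>

/* Hypothetical model of a device and its host-side interrupt handler. */
struct model_dev {
	bool irq_allocated; /* host side: a handler is registered */
	int unhandled;      /* interrupts that arrived with no handler */
};

/* Driver load: the IRQ is allocated once in the constructor. */
void model_ctor(struct model_dev *d)
{
	d->irq_allocated = true;
	d->unhandled = 0;
}

/* Driver unload: the IRQ is freed once in the destructor. */
void model_dtor(struct model_dev *d)
{
	d->irq_allocated = false;
}

/* Suspend: the old scheme additionally frees the IRQ by hand. */
void model_suspend(struct model_dev *d, bool realloc_irq_each_cycle)
{
	if (realloc_irq_each_cycle)
		d->irq_allocated = false;
}

/* Resume: the old scheme re-allocates the IRQ here, possibly too late. */
void model_resume(struct model_dev *d, bool realloc_irq_each_cycle)
{
	if (realloc_irq_each_cycle)
		d->irq_allocated = true;
}

/* The device raises an interrupt; without a handler it goes unhandled. */
void model_device_raises_irq(struct model_dev *d)
{
	if (!d->irq_allocated)
		d->unhandled++;
}

/* Counts unhandled interrupts when one arrives early during resume. */
int model_unhandled_irqs_across_resume(bool realloc_irq_each_cycle)
{
	struct model_dev d;

	model_ctor(&d);
	model_suspend(&d, realloc_irq_each_cycle);
	model_device_raises_irq(&d); /* arrives before resume has finished */
	model_resume(&d, realloc_irq_each_cycle);
	return d.unhandled;
}
```

      With the per-cycle teardown the model reports one unhandled interrupt;
      with allocation confined to the constructor and destructor it reports
      none.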
      
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Cc: Karol Herbst <kherbst@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
    • net: don't call update_pmtu unconditionally · f15ca723
      Nicolas Dichtel authored
      Some dst_ops (e.g. md_dst_ops) don't set this handler, which may result in:
      "BUG: unable to handle kernel NULL pointer dereference at           (null)"
      
      Let's add a helper to check if update_pmtu is available before calling it.
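      
      A guarded call of this kind might look roughly like the following
      userspace sketch. The model_* types and functions are hypothetical
      stand-ins for illustration only; they are not the kernel's dst_entry,
      dst_ops, or the helper this commit actually adds.

```c
#include <stddef.h>

struct model_dst;

/* Hypothetical ops table; update_pmtu may legitimately be NULL. */
struct model_dst_ops {
	void (*update_pmtu)(struct model_dst *dst, unsigned int mtu);
};

/* Hypothetical destination entry carrying its ops and a cached PMTU. */
struct model_dst {
	const struct model_dst_ops *ops;
	unsigned int pmtu;
};

/* Example handler that simply stores the new path MTU. */
void model_set_pmtu(struct model_dst *dst, unsigned int mtu)
{
	dst->pmtu = mtu;
}

/* One ops table with the handler set, one without (like md_dst_ops). */
const struct model_dst_ops model_ops_with_handler = {
	.update_pmtu = model_set_pmtu,
};
const struct model_dst_ops model_ops_without_handler = {
	.update_pmtu = NULL,
};

/*
 * Call the handler only if it exists.
 * Returns 1 if the handler ran, 0 if the call was skipped.
 */
int model_dst_update_pmtu(struct model_dst *dst, unsigned int mtu)
{
	if (dst == NULL || dst->ops == NULL || dst->ops->update_pmtu == NULL)
		return 0;
	dst->ops->update_pmtu(dst, mtu);
	return 1;
}
```

      Callers then go through model_dst_update_pmtu() instead of
      dereferencing the function pointer directly, so an ops table without
      the handler is skipped rather than causing a NULL pointer dereference.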
      
      Fixes: 52a589d5 ("geneve: update skb dst pmtu on tx path")
      Fixes: a93bf0ff ("vxlan: update skb dst pmtu on tx path")
      CC: Roman Kapl <code@rkapl.cz>
      CC: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 6e20630e
      Linus Torvalds authored
      Pull KVM fixes from Radim Krčmář:
       "Fix races and a potential use after free in the s390 cmma migration
        code"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: s390: add proper locking for CMMA migration bitmap
    • Merge tag 'for-4.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 525273fb
      Linus Torvalds authored
      Pull btrfs fix from David Sterba:
       "It's been reported recently that readdir can list stale entries under
        some conditions. Fix it."
      
      * tag 'for-4.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        Btrfs: fix stale entries in readdir
    • net: tcp: close sock if net namespace is exiting · 4ee806d5
      Dan Streetman authored
      When a tcp socket is closed, if it detects that its net namespace is
      exiting, close immediately and do not wait for FIN sequence.
      
      For normal sockets, a reference is taken to their net namespace, so it will
      never exit while the socket is open.  However, kernel sockets do not take a
      reference to their net namespace, so it may begin exiting while the kernel
      socket is still open.  In this case if the kernel socket is a tcp socket,
      it will stay open trying to complete its close sequence.  The sock's dst(s)
      hold references to their interfaces, and these references are all
      transferred to the namespace's loopback interface when the real
      interfaces are taken down.
      When the namespace tries to take down its loopback interface, it hangs
      waiting for all references to the loopback interface to release, which
      results in messages like:
      
      unregister_netdevice: waiting for lo to become free. Usage count = 1
      
      These messages continue until the socket finally times out and closes.
      Since the net namespace cleanup holds the net_mutex while calling its
      registered pernet callbacks, any new net namespace initialization is
      blocked until the current net namespace finishes exiting.
      
      After this change, the tcp socket notices the exiting net namespace, and
      closes immediately, releasing its dst(s) and their reference to the
      loopback interface, which lets the net namespace continue exiting.
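      
      The decision described above can be sketched as a small userspace
      model. All names are hypothetical and chosen for this illustration;
      the reference count and the close logic are simplified assumptions,
      not the kernel's struct net, struct sock, or TCP state machine.

```c
#include <stdbool.h>

/* Hypothetical net namespace: a count of zero means it is exiting. */
struct model_net {
	int refcount;
};

/* Hypothetical socket that remembers which namespace it lives in. */
struct model_sock {
	struct model_net *net;
	bool is_kernel_socket;
};

enum model_close_action {
	MODEL_CLOSE_GRACEFUL, /* wait for the FIN sequence to complete */
	MODEL_CLOSE_IMMEDIATE /* drop the socket right away */
};

/* Normal sockets pin their namespace, kernel sockets do not. */
struct model_sock model_sock_open(struct model_net *net, bool is_kernel_socket)
{
	struct model_sock sk = { net, is_kernel_socket };

	if (!is_kernel_socket)
		net->refcount++;
	return sk;
}

/* The namespace is still alive while anyone holds a reference to it. */
bool model_net_alive(const struct model_net *net)
{
	return net->refcount > 0;
}

/* On close, skip the FIN handshake if the namespace is already exiting. */
enum model_close_action model_tcp_close(const struct model_sock *sk)
{
	if (!model_net_alive(sk->net))
		return MODEL_CLOSE_IMMEDIATE;
	return MODEL_CLOSE_GRACEFUL;
}
```

      In this model only a kernel socket can ever observe a dead namespace,
      because a normal socket's own reference keeps the count above zero
      for as long as the socket is open.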
      
      Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
      Signed-off-by: Dan Streetman <ddstreet@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • perf/x86: Fix perf,x86,cpuhp deadlock · efe951d3
      Peter Zijlstra authored
      More lockdep gifts, a 5-way lockup race:
      
      	perf_event_create_kernel_counter()
      	  perf_event_alloc()
      	    perf_try_init_event()
      	      x86_pmu_event_init()
      		__x86_pmu_event_init()
      		  x86_reserve_hardware()
       #0		    mutex_lock(&pmc_reserve_mutex);
      		    reserve_ds_buffer()
       #1		      get_online_cpus()
      
      	perf_event_release_kernel()
      	  _free_event()
      	    hw_perf_event_destroy()
      	      x86_release_hardware()
       #0		mutex_lock(&pmc_reserve_mutex)
      		release_ds_buffer()
       #1		  get_online_cpus()
      
       #1	do_cpu_up()
      	  perf_event_init_cpu()
       #2	    mutex_lock(&pmus_lock)
       #3	    mutex_lock(&ctx->mutex)
      
      	sys_perf_event_open()
      	  mutex_lock_double()
       #3	    mutex_lock(ctx->mutex)
       #4	    mutex_lock_nested(ctx->mutex, 1);
      
      	perf_try_init_event()
       #4	  mutex_lock_nested(ctx->mutex, 1)
      	  x86_pmu_event_init()
      	    intel_pmu_hw_config()
      	      x86_add_exclusive()
       #0		mutex_lock(&pmc_reserve_mutex)
      
      Fix it by using ordering constructs instead of locking.
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/core: Fix ctx::mutex deadlock · 0c7296ca
      Peter Zijlstra authored
      Lockdep noticed the following 3-way lockup scenario:
      
      	sys_perf_event_open()
      	  perf_event_alloc()
      	    perf_try_init_event()
       #0	      ctx = perf_event_ctx_lock_nested(1)
      	      perf_swevent_init()
      		swevent_hlist_get()
       #1		  mutex_lock(&pmus_lock)
      
      	perf_event_init_cpu()
       #1	  mutex_lock(&pmus_lock)
       #2	  mutex_lock(&ctx->mutex)
      
      	sys_perf_event_open()
      	  mutex_lock_double()
       #2	   mutex_lock()
       #0	   mutex_lock_nested()
      
      And while we need that perf_event_ctx_lock_nested() for HW PMUs such
      that they can iterate the sibling list, trying to match it to the
      available counters, the software PMUs need do no such thing. Exclude
      them.
      
      In particular the swevent triggers the above inversion, while the
      tpevent PMU triggers a more elaborate one through their event_mutex.
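      
      The 3-way scenario above is a cycle in the lock acquisition order:
      #0 before #1, #1 before #2, and #2 before #0. A minimal sketch of
      detecting such a cycle is shown below. It is a toy illustration only:
      the model_* names are hypothetical, and this is not how lockdep is
      implemented.

```c
#include <stdbool.h>

#define MODEL_MAX_LOCKS 8

/* edge[a][b] is true if some code path takes lock b while holding a. */
struct model_lock_graph {
	bool edge[MODEL_MAX_LOCKS][MODEL_MAX_LOCKS];
};

/* Start from an empty graph with no recorded orderings. */
void model_graph_reset(struct model_lock_graph *g)
{
	for (int a = 0; a < MODEL_MAX_LOCKS; a++)
		for (int b = 0; b < MODEL_MAX_LOCKS; b++)
			g->edge[a][b] = false;
}

/* Record "next taken while held"; both must be below MODEL_MAX_LOCKS. */
void model_record_order(struct model_lock_graph *g, int held, int next)
{
	g->edge[held][next] = true;
}

/* Depth-first search: can target be reached by following edges? */
static bool model_reaches(const struct model_lock_graph *g, int from,
			  int target, bool *seen)
{
	for (int next = 0; next < MODEL_MAX_LOCKS; next++) {
		if (!g->edge[from][next])
			continue;
		if (next == target)
			return true;
		if (!seen[next]) {
			seen[next] = true;
			if (model_reaches(g, next, target, seen))
				return true;
		}
	}
	return false;
}

/* An inversion exists if any lock can reach itself through the graph. */
bool model_has_inversion(const struct model_lock_graph *g)
{
	for (int lock = 0; lock < MODEL_MAX_LOCKS; lock++) {
		bool seen[MODEL_MAX_LOCKS] = { false };

		if (model_reaches(g, lock, lock, seen))
			return true;
	}
	return false;
}
```

      Recording the three orderings from the traces (0 then 1, 1 then 2,
      2 then 0) makes model_has_inversion() return true. Dropping the
      first edge, which is one way to read "exclude them" for the software
      PMUs, breaks the cycle in this model.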
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/core: Fix another perf,trace,cpuhp lock inversion · 43fa87f7
      Peter Zijlstra authored
      Lockdep noticed the following 3-way lockup race:
      
              perf_trace_init()
       #0       mutex_lock(&event_mutex)
                perf_trace_event_init()
                  perf_trace_event_reg()
                    tp_event->class->reg() := tracepoint_probe_register
       #1              mutex_lock(&tracepoints_mutex)
                        trace_point_add_func()
       #2                  static_key_enable()
      
       #2	do_cpu_up()
      	  perf_event_init_cpu()
       #3	    mutex_lock(&pmus_lock)
       #4	    mutex_lock(&ctx->mutex)
      
      	perf_ioctl()
       #4	  ctx = perf_event_ctx_lock()
      	  _perf_ioctl()
      	    ftrace_profile_set_filter()
       #0	      mutex_lock(&event_mutex)
      
      Fudge it for now by noting that the tracepoint state does not depend
      on the event <-> context relation. Ugly though :/
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/core: Fix lock inversion between perf,trace,cpuhp · 82d94856
      Peter Zijlstra authored
      Lockdep gifted us with noticing the following 4-way lockup scenario:
      
              perf_trace_init()
       #0       mutex_lock(&event_mutex)
                perf_trace_event_init()
                  perf_trace_event_reg()
                    tp_event->class->reg() := tracepoint_probe_register
       #1             mutex_lock(&tracepoints_mutex)
                        trace_point_add_func()
       #2                 static_key_enable()
      
       #2     do_cpu_up()
                perf_event_init_cpu()
       #3         mutex_lock(&pmus_lock)
       #4         mutex_lock(&ctx->mutex)
      
              perf_event_task_disable()
                mutex_lock(&current->perf_event_mutex)
       #4       ctx = perf_event_ctx_lock()
       #5       perf_event_for_each_child()
      
              do_exit()
                task_work_run()
                  __fput()
                    perf_release()
                      perf_event_release_kernel()
       #4               mutex_lock(&ctx->mutex)
       #5               mutex_lock(&event->child_mutex)
                        free_event()
                          _free_event()
                            event->destroy() := perf_trace_destroy
       #0                     mutex_lock(&event_mutex);
      
      Fix that by moving the free_event() out from under the locks.
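      
      The general pattern of moving a destructor call out from under a lock
      can be sketched as follows: unlink the objects while holding the lock,
      then destroy them after dropping it. This is an illustrative userspace
      model with hypothetical model_* names and a flag standing in for a
      mutex; it is not the code of this patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy lock: a flag is enough to show what runs while it is held. */
struct model_lock {
	bool held;
};

void model_lock_acquire(struct model_lock *l) { l->held = true; }
void model_lock_release(struct model_lock *l) { l->held = false; }

/* Hypothetical event on a singly linked child list. */
struct model_event {
	struct model_event *next;
	bool destroyed;
	bool destroyed_under_lock; /* records whether the lock was held */
};

/* Stand-in for a destroy callback that may take other locks itself. */
void model_destroy(struct model_event *e, const struct model_lock *list_lock)
{
	e->destroyed = true;
	e->destroyed_under_lock = list_lock->held;
}

/* Unlink every child under the lock, destroy them only after unlocking. */
void model_release_children(struct model_event **children,
			    struct model_lock *list_lock)
{
	struct model_event *to_free = NULL;

	model_lock_acquire(list_lock);
	while (*children != NULL) {
		struct model_event *e = *children;

		*children = e->next; /* remove from the shared list */
		e->next = to_free;   /* park on a private list */
		to_free = e;
	}
	model_lock_release(list_lock);

	while (to_free != NULL) {
		struct model_event *e = to_free;

		to_free = e->next;
		e->next = NULL;
		model_destroy(e, list_lock); /* runs with the lock dropped */
	}
}
```

      Because model_destroy() only runs after the list lock is released, any
      lock it takes internally can no longer be ordered after the list lock,
      which removes that edge from the inversion.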
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>