1. 01 Jun, 2014 1 commit
  2. 28 May, 2014 2 commits
    • Srivatsa S. Bhat's avatar
      powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode · 011e4b02
      Srivatsa S. Bhat authored
      If we try to perform a kexec when the machine is in ST (Single-Threaded) mode
      (ppc64_cpu --smt=off), the kexec operation doesn't succeed properly, and we
      get the following messages during boot:
      
      [    0.089866] POWER8 performance monitor hardware support registered
      [    0.089985] power8-pmu: PMAO restore workaround active.
      [    5.095419] Processor 1 is stuck.
      [   10.097933] Processor 2 is stuck.
      [   15.100480] Processor 3 is stuck.
      [   20.102982] Processor 4 is stuck.
      [   25.105489] Processor 5 is stuck.
      [   30.108005] Processor 6 is stuck.
      [   35.110518] Processor 7 is stuck.
      [   40.113369] Processor 9 is stuck.
      [   45.115879] Processor 10 is stuck.
      [   50.118389] Processor 11 is stuck.
      [   55.120904] Processor 12 is stuck.
      [   60.123425] Processor 13 is stuck.
      [   65.125970] Processor 14 is stuck.
      [   70.128495] Processor 15 is stuck.
      [   75.131316] Processor 17 is stuck.
      
      Note that only the sibling threads are stuck, while the primary threads (0, 8,
      16 etc) boot just fine. Looking closer at the previous step of kexec, we observe
      that kexec tries to wakeup (bring online) the sibling threads of all the cores,
      before performing kexec:
      
      [ 9464.131231] Starting new kernel
      [ 9464.148507] kexec: Waking offline cpu 1.
      [ 9464.148552] kexec: Waking offline cpu 2.
      [ 9464.148600] kexec: Waking offline cpu 3.
      [ 9464.148636] kexec: Waking offline cpu 4.
      [ 9464.148671] kexec: Waking offline cpu 5.
      [ 9464.148708] kexec: Waking offline cpu 6.
      [ 9464.148743] kexec: Waking offline cpu 7.
      [ 9464.148779] kexec: Waking offline cpu 9.
      [ 9464.148815] kexec: Waking offline cpu 10.
      [ 9464.148851] kexec: Waking offline cpu 11.
      [ 9464.148887] kexec: Waking offline cpu 12.
      [ 9464.148922] kexec: Waking offline cpu 13.
      [ 9464.148958] kexec: Waking offline cpu 14.
      [ 9464.148994] kexec: Waking offline cpu 15.
      [ 9464.149030] kexec: Waking offline cpu 17.
      
      Instrumenting this piece of code revealed that the cpu_up() operation actually
      fails with -EBUSY. Thus, only the primary threads of all the cores are online
      during kexec, and hence this is a sure-shot receipe for disaster, as explained
      in commit e8e5c215 (powerpc/kexec: Fix orphaned offline CPUs across kexec),
      as well as in the comment above wake_offline_cpus().
      
      It turns out that cpu_up() was returning -EBUSY because the variable
      'cpu_hotplug_disabled' was set to 1; and this disabling of CPU hotplug was done
      by migrate_to_reboot_cpu() inside kernel_kexec().
      
      Now, migrate_to_reboot_cpu() was originally written with the assumption that
      any further code will not need to perform CPU hotplug, since we are anyway in
      the reboot path. However, kexec is clearly not such a case, since we depend on
      onlining CPUs, atleast on powerpc.
      
      So re-enable cpu-hotplug after returning from migrate_to_reboot_cpu() in the
      kexec path, to fix this regression in kexec on powerpc.
      
      Also, wrap the cpu_up() in powerpc kexec code within a WARN_ON(), so that we
      can catch such issues more easily in the future.
      
      Fixes: c97102ba (kexec: migrate to reboot cpu)
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      011e4b02
    • Guenter Roeck's avatar
      powerpc: Fix 64 bit builds with binutils 2.24 · 7998eb3d
      Guenter Roeck authored
      With binutils 2.24, various 64 bit builds fail with relocation errors
      such as
      
      arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e':
      	(.text+0x165ee): relocation truncated to fit: R_PPC64_ADDR16_HI
      	against symbol `interrupt_base_book3e' defined in .text section
      	in arch/powerpc/kernel/built-in.o
      arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e':
      	(.text+0x16602): relocation truncated to fit: R_PPC64_ADDR16_HI
      	against symbol `interrupt_end_book3e' defined in .text section
      	in arch/powerpc/kernel/built-in.o
      
      The assembler maintainer says:
      
       I changed the ABI, something that had to be done but unfortunately
       happens to break the booke kernel code.  When building up a 64-bit
       value with lis, ori, shl, oris, ori or similar sequences, you now
       should use @high and @higha in place of @h and @ha.  @h and @ha
       (and their associated relocs R_PPC64_ADDR16_HI and R_PPC64_ADDR16_HA)
       now report overflow if the value is out of 32-bit signed range.
       ie. @h and @ha assume you're building a 32-bit value. This is needed
       to report out-of-range -mcmodel=medium toc pointer offsets in @toc@h
       and @toc@ha expressions, and for consistency I did the same for all
       other @h and @ha relocs.
      
      Replacing @h with @high in one strategic location fixes the relocation
      errors. This has to be done conditionally since the assembler either
      supports @h or @high but not both.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      7998eb3d
  3. 12 May, 2014 2 commits
    • Anton Blanchard's avatar
      powerpc: irq work racing with timer interrupt can result in timer interrupt hang · 8050936c
      Anton Blanchard authored
      I am seeing an issue where a CPU running perf eventually hangs.
      Traces show timer interrupts happening every 4 seconds even
      when a userspace task is running on the CPU. /proc/timer_list
      also shows pending hrtimers have not run in over an hour,
      including the scheduler.
      
      Looking closer, decrementers_next_tb is getting set to
      0xffffffffffffffff, and at that point we will never take
      a timer interrupt again.
      
      In __timer_interrupt() we set decrementers_next_tb to
      0xffffffffffffffff and rely on ->event_handler to update it:
      
              *next_tb = ~(u64)0;
              if (evt->event_handler)
                      evt->event_handler(evt);
      
      In this case ->event_handler is hrtimer_interrupt. This will eventually
      call back through the clockevents code with the next event to be
      programmed:
      
      static int decrementer_set_next_event(unsigned long evt,
                                            struct clock_event_device *dev)
      {
              /* Don't adjust the decrementer if some irq work is pending */
              if (test_irq_work_pending())
                      return 0;
              __get_cpu_var(decrementers_next_tb) = get_tb_or_rtc() + evt;
      
      If irq work came in between these two points, we will return
      before updating decrementers_next_tb and we never process a timer
      interrupt again.
      
      This looks to have been introduced by 0215f7d8 (powerpc: Fix races
      with irq_work). Fix it by removing the early exit and relying on
      code later on in the function to force an early decrementer:
      
             /* We may have raced with new irq work */
             if (test_irq_work_pending())
                     set_dec(1);
      Signed-off-by: default avatarAnton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org # 3.14+
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      8050936c
    • Gavin Shan's avatar
      powerpc/powernv: Reset root port in firmware · 372cf124
      Gavin Shan authored
      Resetting root port has more stuff to do than that for PCIe switch
      ports and we should have resetting root port done in firmware instead
      of the kernel itself. The problem was introduced by commit 5b2e198e
      ("powerpc/powernv: Rework EEH reset").
      
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: default avatarGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      372cf124
  4. 28 Apr, 2014 33 commits
  5. 20 Apr, 2014 2 commits