1. 28 May, 2014 11 commits
    • Michael Ellerman's avatar
      powerpc/kvm/book3s_hv: Use threads_per_subcore in KVM · 3102f784
      Michael Ellerman authored
      To support split core on POWER8 we need to modify various parts of the
      KVM code to use threads_per_subcore instead of threads_per_core. On
      systems that do not support split core threads_per_subcore ==
      threads_per_core and these changes are a nop.
      
      We use threads_per_subcore as the value reported by KVM_CAP_PPC_SMT.
      This communicates to userspace that guests can only be created with
      a value of threads_per_core that is less than or equal to the current
      threads_per_subcore. This ensures that guests can only be created with a
      thread configuration that we are able to run given the current split
      core mode.
      
      Although threads_per_subcore can change during the life of the system,
      the commit that enables that will ensure that threads_per_subcore does
      not change during the life of a KVM VM.
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Acked-by: default avatarAlexander Graf <agraf@suse.de>
      Acked-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      3102f784
    • Michael Ellerman's avatar
      powerpc: Check cpu_thread_in_subcore() in __cpu_up() · 6f5e40a3
      Michael Ellerman authored
      To support split core we need to change the check in __cpu_up() that
      determines if a cpu is allowed to come online.
      
      Currently we refuse to online cpus which are not the primary thread
      within their core.
      
      On POWER8 with split core support this check needs to instead refuse to
      online cpus which are not the primary thread within their *sub* core.
      
      On POWER7 and other systems that do not support split core,
      threads_per_subcore == threads_per_core and so the check is equivalent.
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      6f5e40a3
    • Michael Ellerman's avatar
      powerpc: Add threads_per_subcore · 5853aef1
      Michael Ellerman authored
      On POWER8 we have a new concept of a subcore. This is what happens when
      you take a regular core and split it. A subcore is a grouping of two or
      four SMT threads, as well as a handfull of SPRs which allows the subcore
      to appear as if it were a core from the point of view of a guest.
      
      Unlike threads_per_core which is fixed at boot, threads_per_subcore can
      change while the system is running. Most code will not want to use
      threads_per_subcore.
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      5853aef1
    • Michael Ellerman's avatar
      powerpc/powernv: Make it possible to skip the IRQHAPPENED check in power7_nap() · 8d6f7c5a
      Michael Ellerman authored
      To support split core we need to be able to force all secondaries into
      nap, so the core can detect they are idle and do an unsplit.
      
      Currently power7_nap() will return without napping if there is an irq
      pending. We want to ignore the pending irq and nap anyway, we will deal
      with the interrupt later.
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      8d6f7c5a
    • Michael Ellerman's avatar
      powerpc/kvm/book3s_hv: Rework the secondary inhibit code · 441c19c8
      Michael Ellerman authored
      As part of the support for split core on POWER8, we want to be able to
      block splitting of the core while KVM VMs are active.
      
      The logic to do that would be exactly the same as the code we currently
      have for inhibiting onlining of secondaries.
      
      Instead of adding an identical mechanism to block split core, rework the
      secondary inhibit code to be a "HV KVM is active" check. We can then use
      that in both the cpu hotplug code and the upcoming split core code.
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Acked-by: default avatarAlexander Graf <agraf@suse.de>
      Acked-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      441c19c8
    • Nishanth Aravamudan's avatar
      powerpc/numa: Enable CONFIG_HAVE_MEMORYLESS_NODES · 64bb80d8
      Nishanth Aravamudan authored
      Based off fd1197f1 for ia64, enable CONFIG_HAVE_MEMORYLESS_NODES if
      NUMA. Initialize the local memory node in start_secondary.
      
      With this commit and the preceding to enable
      CONFIG_USER_PERCPU_NUMA_NODE_ID, which is a prerequisite, in a PowerKVM
      guest with the following topology:
      
      numactl --hardware
      available: 3 nodes (0-2)
      node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
      23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
      47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70
      71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94
      95 96 97 98 99
      node 0 size: 1998 MB
      node 0 free: 521 MB
      node 1 cpus: 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114
      115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132
      133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150
      151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168
      169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186
      187 188 189 190 191 192 193 194 195 196 197 198 199
      node 1 size: 0 MB
      node 1 free: 0 MB
      node 2 cpus:
      node 2 size: 2039 MB
      node 2 free: 1739 MB
      node distances:
      node   0   1   2
        0:  10  40  40
        1:  40  10  40
        2:  40  40  10
      
      the unreclaimable slab is reduced by close to 130M:
      
      Before:
              Slab:             418176 kB
              SReclaimable:      26624 kB
              SUnreclaim:       391552 kB
      
      After:
              Slab:             298944 kB
              SReclaimable:      31744 kB
              SUnreclaim:       267200 kB
      Signed-off-by: default avatarNishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      64bb80d8
    • Nishanth Aravamudan's avatar
      powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID · 8c272261
      Nishanth Aravamudan authored
      Based off 3bccd996 for ia64, convert powerpc to use the generic per-CPU
      topology tracking, specifically:
      
          initialize per cpu numa_node entry in start_secondary
          remove the powerpc cpu_to_node()
          define CONFIG_USE_PERCPU_NUMA_NODE_ID if NUMA
      Signed-off-by: default avatarNishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      8c272261
    • Benjamin Herrenschmidt's avatar
      Merge branch 'merge' into next · 86969cf7
      Benjamin Herrenschmidt authored
      Merge the binutils and kexec fixes.
      86969cf7
    • Srivatsa S. Bhat's avatar
      powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode · 011e4b02
      Srivatsa S. Bhat authored
      If we try to perform a kexec when the machine is in ST (Single-Threaded) mode
      (ppc64_cpu --smt=off), the kexec operation doesn't succeed properly, and we
      get the following messages during boot:
      
      [    0.089866] POWER8 performance monitor hardware support registered
      [    0.089985] power8-pmu: PMAO restore workaround active.
      [    5.095419] Processor 1 is stuck.
      [   10.097933] Processor 2 is stuck.
      [   15.100480] Processor 3 is stuck.
      [   20.102982] Processor 4 is stuck.
      [   25.105489] Processor 5 is stuck.
      [   30.108005] Processor 6 is stuck.
      [   35.110518] Processor 7 is stuck.
      [   40.113369] Processor 9 is stuck.
      [   45.115879] Processor 10 is stuck.
      [   50.118389] Processor 11 is stuck.
      [   55.120904] Processor 12 is stuck.
      [   60.123425] Processor 13 is stuck.
      [   65.125970] Processor 14 is stuck.
      [   70.128495] Processor 15 is stuck.
      [   75.131316] Processor 17 is stuck.
      
      Note that only the sibling threads are stuck, while the primary threads (0, 8,
      16 etc) boot just fine. Looking closer at the previous step of kexec, we observe
      that kexec tries to wakeup (bring online) the sibling threads of all the cores,
      before performing kexec:
      
      [ 9464.131231] Starting new kernel
      [ 9464.148507] kexec: Waking offline cpu 1.
      [ 9464.148552] kexec: Waking offline cpu 2.
      [ 9464.148600] kexec: Waking offline cpu 3.
      [ 9464.148636] kexec: Waking offline cpu 4.
      [ 9464.148671] kexec: Waking offline cpu 5.
      [ 9464.148708] kexec: Waking offline cpu 6.
      [ 9464.148743] kexec: Waking offline cpu 7.
      [ 9464.148779] kexec: Waking offline cpu 9.
      [ 9464.148815] kexec: Waking offline cpu 10.
      [ 9464.148851] kexec: Waking offline cpu 11.
      [ 9464.148887] kexec: Waking offline cpu 12.
      [ 9464.148922] kexec: Waking offline cpu 13.
      [ 9464.148958] kexec: Waking offline cpu 14.
      [ 9464.148994] kexec: Waking offline cpu 15.
      [ 9464.149030] kexec: Waking offline cpu 17.
      
      Instrumenting this piece of code revealed that the cpu_up() operation actually
      fails with -EBUSY. Thus, only the primary threads of all the cores are online
      during kexec, and hence this is a sure-shot receipe for disaster, as explained
      in commit e8e5c215 (powerpc/kexec: Fix orphaned offline CPUs across kexec),
      as well as in the comment above wake_offline_cpus().
      
      It turns out that cpu_up() was returning -EBUSY because the variable
      'cpu_hotplug_disabled' was set to 1; and this disabling of CPU hotplug was done
      by migrate_to_reboot_cpu() inside kernel_kexec().
      
      Now, migrate_to_reboot_cpu() was originally written with the assumption that
      any further code will not need to perform CPU hotplug, since we are anyway in
      the reboot path. However, kexec is clearly not such a case, since we depend on
      onlining CPUs, atleast on powerpc.
      
      So re-enable cpu-hotplug after returning from migrate_to_reboot_cpu() in the
      kexec path, to fix this regression in kexec on powerpc.
      
      Also, wrap the cpu_up() in powerpc kexec code within a WARN_ON(), so that we
      can catch such issues more easily in the future.
      
      Fixes: c97102ba (kexec: migrate to reboot cpu)
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      011e4b02
    • Guenter Roeck's avatar
      powerpc: Fix 64 bit builds with binutils 2.24 · 7998eb3d
      Guenter Roeck authored
      With binutils 2.24, various 64 bit builds fail with relocation errors
      such as
      
      arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e':
      	(.text+0x165ee): relocation truncated to fit: R_PPC64_ADDR16_HI
      	against symbol `interrupt_base_book3e' defined in .text section
      	in arch/powerpc/kernel/built-in.o
      arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e':
      	(.text+0x16602): relocation truncated to fit: R_PPC64_ADDR16_HI
      	against symbol `interrupt_end_book3e' defined in .text section
      	in arch/powerpc/kernel/built-in.o
      
      The assembler maintainer says:
      
       I changed the ABI, something that had to be done but unfortunately
       happens to break the booke kernel code.  When building up a 64-bit
       value with lis, ori, shl, oris, ori or similar sequences, you now
       should use @high and @higha in place of @h and @ha.  @h and @ha
       (and their associated relocs R_PPC64_ADDR16_HI and R_PPC64_ADDR16_HA)
       now report overflow if the value is out of 32-bit signed range.
       ie. @h and @ha assume you're building a 32-bit value. This is needed
       to report out-of-range -mcmodel=medium toc pointer offsets in @toc@h
       and @toc@ha expressions, and for consistency I did the same for all
       other @h and @ha relocs.
      
      Replacing @h with @high in one strategic location fixes the relocation
      errors. This has to be done conditionally since the assembler either
      supports @h or @high but not both.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      7998eb3d
    • Benjamin Herrenschmidt's avatar
      Merge remote-tracking branch 'scott/next' into next · b9d80095
      Benjamin Herrenschmidt authored
      <<
      Highlights include a few new boards, a device tree binding for CCF
      (including backwards-compatible device tree updates to distinguish
      incompatible versions), and some fixes.
      >>
      b9d80095
  2. 22 May, 2014 21 commits
  3. 20 May, 2014 8 commits