1. 22 Oct, 2021 26 commits
  2. 14 Oct, 2021 2 commits
  3. 13 Oct, 2021 3 commits
  4. 12 Oct, 2021 2 commits
  5. 08 Oct, 2021 7 commits
    • powerpc/pseries/cpuhp: remove obsolete comment from pseries_cpu_die · f9473a65
      Nathan Lynch authored
      This comment likely refers to the obsolete DLPAR workflow where some
      resource state transitions were driven more directly from user space
      utilities, but it also seems to contradict itself: "Change isolate state to
      Isolate [...]" is at odds with the preceding sentences, and it does not
      relate at all to the code that follows.
      
      Remove it to prevent confusion.
      
      Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
      Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210927201933.76786-5-nathanl@linux.ibm.com
    • powerpc/pseries/cpuhp: delete add/remove_by_count code · fa2a5dfe
      Nathan Lynch authored
      The core DLPAR code supports two actions (add and remove) and three
      subtypes of action:
      
      * By DRC index: the action is attempted on a single specified resource.
        This is the usual case for processors.
      * By indexed count: the action is attempted on a range of resources
        beginning at the specified index. This is implemented only by the memory
        DLPAR code.
      * By count: the lower layer (CPU or memory) is responsible for locating the
        specified number of resources to which the action can be applied.
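
      To make the action/subtype split concrete, here is a minimal userspace
      sketch that parses a request string of the shape written to
      /sys/kernel/dlpar in the examples in this message. It is not the kernel's
      parser: all type and function names are hypothetical, the "index" and
      "indexed-count" keywords and the argument order for indexed count are
      assumptions, and values are read as decimal only.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical names; the kernel's own types and constants differ. */
enum dlpar_action { ACT_ADD, ACT_REMOVE };
enum dlpar_id_type { ID_DRC_INDEX, ID_INDEXED_COUNT, ID_COUNT };

struct dlpar_request {
	char resource[8];           /* e.g. "cpu" or "memory" */
	enum dlpar_action action;
	enum dlpar_id_type id_type;
	unsigned int index;         /* used by ID_DRC_INDEX and ID_INDEXED_COUNT */
	unsigned int count;         /* used by ID_COUNT and ID_INDEXED_COUNT */
};

/*
 * Parse "<resource> <action> <id type> <value> [<value>]".
 * Returns 0 on success, -1 if the string is not understood.
 */
int parse_dlpar_request(const char *s, struct dlpar_request *req)
{
	char action[8], id_type[16];
	unsigned int a = 0, b = 0;
	int n = sscanf(s, "%7s %7s %15s %u %u",
		       req->resource, action, id_type, &a, &b);

	if (n < 4)
		return -1;

	if (strcmp(action, "add") == 0)
		req->action = ACT_ADD;
	else if (strcmp(action, "remove") == 0)
		req->action = ACT_REMOVE;
	else
		return -1;

	req->index = 0;
	req->count = 0;

	if (strcmp(id_type, "index") == 0 && n == 4) {
		/* By DRC index: one specific resource. */
		req->id_type = ID_DRC_INDEX;
		req->index = a;
	} else if (strcmp(id_type, "count") == 0 && n == 4) {
		/* By count: the lower layer picks the resources. */
		req->id_type = ID_COUNT;
		req->count = a;
	} else if (strcmp(id_type, "indexed-count") == 0 && n == 5) {
		/* By indexed count: a range starting at an index (order assumed). */
		req->id_type = ID_INDEXED_COUNT;
		req->count = a;
		req->index = b;
	} else {
		return -1;
	}
	return 0;
}
```

      With this sketch, "cpu remove count 2" parses to a remove action of the
      by-count subtype with a count of 2; an unknown action word is rejected.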
      
      I cannot find any evidence of the "by count" subtype being used by drmgr or
      qemu for processors. And when I try to exercise this code, the add case
      does not work:
      
        $ ppc64_cpu --smt ; nproc
        SMT=8
        24
        $ printf "cpu remove count 2" > /sys/kernel/dlpar
        $ nproc
        8
        $ printf "cpu add count 2" > /sys/kernel/dlpar
        -bash: printf: write error: Invalid argument
        $ dmesg | tail -2
        pseries-hotplug-cpu: Failed to find enough CPUs (1 of 2) to add
        dlpar: Could not handle DLPAR request "cpu add count 2"
        $ nproc
        8
        $ drmgr -c cpu -a -q 2         # this uses the by-index method
        Validating CPU DLPAR capability...yes.
        CPU 1
        CPU 17
        $ nproc
        24
      
      This is because find_drc_info_cpus_to_add() does not increment drc_index
      appropriately during its search.
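
      The following userspace model shows one plausible shape of such a bug. It
      is not a copy of find_drc_info_cpus_to_add(): the struct, the function
      names, the predicate, and the index values are all hypothetical. In the
      buggy variant the "already present" path skips the statement that advances
      the candidate index, so the search gets stuck on the first present CPU it
      meets.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a run of DRC indexes: start, start + inc, ... */
struct drc_range {
	unsigned int start;
	unsigned int count;
	unsigned int inc;
};

/* Caller-supplied predicate: is a CPU with this DRC index already present? */
typedef bool (*cpu_present_fn)(unsigned int drc_index);

/* Buggy search: "continue" bypasses the increment at the bottom of the loop. */
size_t find_cpus_buggy(const struct drc_range *r, cpu_present_fn present,
		       unsigned int *out, size_t want)
{
	size_t found = 0;
	unsigned int drc_index = r->start;

	for (unsigned int j = 0; j < r->count && found < want; j++) {
		if (present(drc_index))
			continue;	/* drc_index is never advanced on this path */
		out[found++] = drc_index;
		drc_index += r->inc;
	}
	return found;
}

/* Corrected search: the candidate index is derived from the loop counter. */
size_t find_cpus_fixed(const struct drc_range *r, cpu_present_fn present,
		       unsigned int *out, size_t want)
{
	size_t found = 0;

	for (unsigned int j = 0; j < r->count && found < want; j++) {
		unsigned int drc_index = r->start + j * r->inc;

		if (!present(drc_index))
			out[found++] = drc_index;
	}
	return found;
}

/* Example predicate with made-up values: only the middle index is present. */
bool example_present(unsigned int drc_index)
{
	return drc_index == 9;
}
```

      For a range of three indexes (1, 9, 17) with only 9 present, the buggy
      variant finds one candidate when asked for two, which resembles the
      "1 of 2" failure in the log above; the corrected variant finds both.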
      
      This is not hard to fix. But the _by_count() functions also have the
      property that they attempt to roll back all prior operations if the entire
      request cannot be satisfied, even though the rollback itself can encounter
      errors. It's not possible to provide transaction-like behavior at this
      level, and it's undesirable to have code that can only pretend to do that.
      Any users of these functions cannot know what the state of the system is in
      the error case. And the error paths are, to my knowledge, impossible to
      test without adding custom error injection code.
      
      Summary:
      
      * This code has not worked reliably since its introduction.
      * There is no evidence that it is used.
      * It contains questionable rollback behaviors in error paths which are
        difficult to test.
      
      So let's remove it.
      
      Fixes: ac713800 ("powerpc/pseries: Add CPU dlpar remove functionality")
      Fixes: 90edf184 ("powerpc/pseries: Add CPU dlpar add functionality")
      Fixes: b015f6bc ("powerpc/pseries: Add cpu DLPAR support for drc-info property")
      Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
      Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
      Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210927201933.76786-4-nathanl@linux.ibm.com
    • powerpc/cpuhp: BUG -> WARN conversion in offline path · 983f9101
      Nathan Lynch authored
      If, due to bugs elsewhere, we get into unregister_cpu_online() with a CPU
      that isn't marked hotpluggable, we can emit a warning and return an
      appropriate error instead of crashing.
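
      As a rough userspace illustration of the pattern (not the kernel code
      itself), the sketch below reports the unexpected state and returns an
      error rather than aborting. The struct, the function name, and the choice
      of -EBUSY are assumptions made for the example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for a per-CPU record; not the kernel's struct cpu. */
struct model_cpu {
	int id;
	bool hotpluggable;
	bool online;
};

/*
 * Take a CPU offline in the model.
 * A BUG-style check would abort the whole program here. The WARN-style
 * check below logs the problem and lets the caller handle the failure.
 */
int model_unregister_cpu_online(struct model_cpu *c)
{
	if (!c->hotpluggable) {
		fprintf(stderr, "warning: cpu %d cannot be offlined\n", c->id);
		return -EBUSY;	/* error code chosen for illustration */
	}

	c->online = false;
	return 0;
}
```

      A caller that passes a non-hotpluggable CPU gets a negative error code
      back and the CPU stays online; the rest of the system keeps running.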
      
      Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
      Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210927201933.76786-3-nathanl@linux.ibm.com
    • powerpc/pseries/cpuhp: cache node corrections · 7edd5c9a
      Nathan Lynch authored
      On pseries, cache nodes in the device tree can be added and removed by the
      CPU DLPAR code as well as the partition migration (mobility) code. PowerVM
      partitions in dedicated processor mode typically have L2 and L3 cache
      nodes.
      
      The CPU DLPAR code has the following shortcomings:
      
      * Cache nodes returned as siblings of a new CPU node by
        ibm,configure-connector are silently discarded; only the CPU node is
        added to the device tree.
      
      * Cache nodes which become unreferenced in the processor removal path are
        not removed from the device tree. This can lead to duplicate nodes when
        the post-migration device tree update code replaces cache nodes.
      
      This is long-standing behavior. Presumably it has gone mostly unnoticed
      because the two bugs have the property of obscuring each other in common
      simple scenarios (e.g. remove a CPU and add it back). Likely you'd notice
      only if you cared to inspect the device tree or the sysfs cacheinfo
      information.
      
      Booted with two processors:
      
        $ pwd
        /sys/firmware/devicetree/base/cpus
        $ ls -1d */
        l2-cache@2010/
        l2-cache@2011/
        l3-cache@3110/
        l3-cache@3111/
        PowerPC,POWER9@0/
        PowerPC,POWER9@8/
        $ lsprop */l2-cache
        l2-cache@2010/l2-cache
                       00003110 (12560)
        l2-cache@2011/l2-cache
                       00003111 (12561)
        PowerPC,POWER9@0/l2-cache
                       00002010 (8208)
        PowerPC,POWER9@8/l2-cache
                       00002011 (8209)
        $ ls /sys/devices/system/cpu/cpu0/cache/
        index0  index1  index2  index3
      
      After DLPAR-adding PowerPC,POWER9@10, we see that its associated cache
      nodes are absent, its threads' L2+L3 cacheinfo is unpopulated, and it is
      missing a cache level in its sched domain hierarchy:
      
        $ ls -1d */
        l2-cache@2010/
        l2-cache@2011/
        l3-cache@3110/
        l3-cache@3111/
        PowerPC,POWER9@0/
        PowerPC,POWER9@10/
        PowerPC,POWER9@8/
        $ lsprop PowerPC\,POWER9@10/l2-cache
        PowerPC,POWER9@10/l2-cache
                       00002012 (8210)
        $ ls /sys/devices/system/cpu/cpu16/cache/
        index0  index1
        $ grep . /sys/kernel/debug/sched/domains/cpu{0,8,16}/domain*/name
        /sys/kernel/debug/sched/domains/cpu0/domain0/name:SMT
        /sys/kernel/debug/sched/domains/cpu0/domain1/name:CACHE
        /sys/kernel/debug/sched/domains/cpu0/domain2/name:DIE
        /sys/kernel/debug/sched/domains/cpu8/domain0/name:SMT
        /sys/kernel/debug/sched/domains/cpu8/domain1/name:CACHE
        /sys/kernel/debug/sched/domains/cpu8/domain2/name:DIE
        /sys/kernel/debug/sched/domains/cpu16/domain0/name:SMT
        /sys/kernel/debug/sched/domains/cpu16/domain1/name:DIE
      
      When removing PowerPC,POWER9@8, we see that its cache nodes are left
      behind:
      
        $ ls -1d */
        l2-cache@2010/
        l2-cache@2011/
        l3-cache@3110/
        l3-cache@3111/
        PowerPC,POWER9@0/
      
      When DLPAR is combined with VM migration, we can get duplicate nodes. E.g.
      removing one processor, then migrating, adding a processor, and then
      migrating again can result in warnings from the OF core during
      post-migration device tree updates:
      
        Duplicate name in cpus, renamed to "l2-cache@2011#1"
        Duplicate name in cpus, renamed to "l3-cache@3111#1"
      
      and nodes with duplicated phandles in the tree, making lookup behavior
      unpredictable:
      
        $ lsprop l[23]-cache@*/ibm,phandle
        l2-cache@2010/ibm,phandle
                         00002010 (8208)
        l2-cache@2011#1/ibm,phandle
                         00002011 (8209)
        l2-cache@2011/ibm,phandle
                         00002011 (8209)
        l3-cache@3110/ibm,phandle
                         00003110 (12560)
        l3-cache@3111#1/ibm,phandle
                         00003111 (12561)
        l3-cache@3111/ibm,phandle
                         00003111 (12561)
      
      Address these issues by:
      
      * Correctly processing siblings of the node returned from
        dlpar_configure_connector().
      * Removing cache nodes in the CPU remove path when it can be determined
        that they are not associated with other CPUs or caches.
      
      Use the of_changeset API in both cases, which allows us to keep the error
      handling in this code from becoming more complex while ensuring that the
      device tree cannot become inconsistent.
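
      The removal rule can be modeled in userspace as follows. This is a sketch
      under stated assumptions, not the kernel implementation: the node struct
      and function names are hypothetical, the tree is a flat array, and changes
      are applied directly instead of being staged in an of_changeset. Starting
      from the CPU being removed, it follows the l2-cache phandle chain and
      removes each cache node that no remaining CPU or cache refers to.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flat model of the CPU and cache nodes under /cpus. */
struct model_node {
	const char *name;
	unsigned int phandle;     /* 0 for CPU nodes in this model */
	unsigned int next_cache;  /* value of the "l2-cache" property, 0 if none */
	bool present;
};

/* Find a present node by phandle; NULL if there is none. */
static struct model_node *find_by_phandle(struct model_node *nodes, size_t n,
					  unsigned int ph)
{
	for (size_t i = 0; i < n; i++)
		if (nodes[i].present && nodes[i].phandle == ph)
			return &nodes[i];
	return NULL;
}

/* Does any present node (CPU or cache) still point at this phandle? */
static bool is_referenced(const struct model_node *nodes, size_t n,
			  unsigned int ph)
{
	for (size_t i = 0; i < n; i++)
		if (nodes[i].present && nodes[i].next_cache == ph)
			return true;
	return false;
}

/*
 * Remove a CPU node, then walk up its cache chain and remove every cache
 * that has become unreferenced. Returns the number of nodes removed.
 */
size_t remove_cpu_and_caches(struct model_node *nodes, size_t n,
			     struct model_node *cpu)
{
	size_t removed = 1;
	unsigned int ph = cpu->next_cache;

	cpu->present = false;

	while (ph != 0) {
		struct model_node *cache = find_by_phandle(nodes, n, ph);

		/* A shared cache stays, and so does everything above it. */
		if (!cache || is_referenced(nodes, n, ph))
			break;

		cache->present = false;
		removed++;
		ph = cache->next_cache;
	}
	return removed;
}
```

      Applied to the two-processor tree shown earlier, removing
      PowerPC,POWER9@8 in this model also removes l2-cache@2011 and
      l3-cache@3111, while the caches used by PowerPC,POWER9@0 are kept.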
      
      Fixes: ac713800 ("powerpc/pseries: Add CPU dlpar remove functionality")
      Fixes: 90edf184 ("powerpc/pseries: Add CPU dlpar add functionality")
      Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
      Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
      Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210927201933.76786-2-nathanl@linux.ibm.com
    • powerpc/paravirt: correct preempt debug splat in vcpu_is_preempted() · fda0eb22
      Nathan Lynch authored
      vcpu_is_preempted() can be used outside of preempt-disabled critical
      sections, yielding warnings such as:
      
      BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/185
      caller is rwsem_spin_on_owner+0x1cc/0x2d0
      CPU: 1 PID: 185 Comm: systemd-udevd Not tainted 5.15.0-rc2+ #33
      Call Trace:
      [c000000012907ac0] [c000000000aa30a8] dump_stack_lvl+0xac/0x108 (unreliable)
      [c000000012907b00] [c000000001371f70] check_preemption_disabled+0x150/0x160
      [c000000012907b90] [c0000000001e0e8c] rwsem_spin_on_owner+0x1cc/0x2d0
      [c000000012907be0] [c0000000001e1408] rwsem_down_write_slowpath+0x478/0x9a0
      [c000000012907ca0] [c000000000576cf4] filename_create+0x94/0x1e0
      [c000000012907d10] [c00000000057ac08] do_symlinkat+0x68/0x1a0
      [c000000012907d70] [c00000000057ae18] sys_symlink+0x58/0x70
      [c000000012907da0] [c00000000002e448] system_call_exception+0x198/0x3c0
      [c000000012907e10] [c00000000000c54c] system_call_common+0xec/0x250
      
      The result of vcpu_is_preempted() is always used speculatively, and the
      function does not access per-cpu resources in a (Linux) preempt-unsafe way.
      Use raw_smp_processor_id() to avoid such warnings, adding explanatory
      comments.
      
      Fixes: ca3f969d ("powerpc/paravirt: Use is_kvm_guest() in vcpu_is_preempted()")
      Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
      Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210928214147.312412-3-nathanl@linux.ibm.com
    • powerpc/paravirt: vcpu_is_preempted() commentary · 799f9b51
      Nathan Lynch authored
      Add comments more clearly documenting that this function determines whether
      hypervisor-level preemption of the VM has occurred.
      
      Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
      Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210928214147.312412-2-nathanl@linux.ibm.com
    • powerpc: fix unbalanced node refcount in check_kvm_guest() · 56537faf
      Nathan Lynch authored
      When check_kvm_guest() succeeds in looking up a /hypervisor OF node, it
      returns without performing a matching put for the lookup, leaving the
      node's reference count elevated.
      
      Add the necessary call to of_node_put(), rearranging the code slightly to
      avoid repetition or goto.
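
      The get/put imbalance can be illustrated with a small userspace model.
      This is only a sketch: the struct and the model_node_get() and
      model_node_put() helpers are hypothetical stand-ins for an OF node lookup
      and of_node_put(), and the boolean return value is a simplification.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reference-counted node standing in for /hypervisor. */
struct ref_node {
	int refcount;
	bool is_kvm;
};

/* Stand-in for a lookup that returns the node with an extra reference. */
static struct ref_node *model_node_get(struct ref_node *n)
{
	if (n)
		n->refcount++;
	return n;
}

/* Stand-in for dropping the reference taken by the lookup. */
static void model_node_put(struct ref_node *n)
{
	if (n)
		n->refcount--;
}

/* Unbalanced: a successful lookup is never paired with a put. */
bool check_guest_leaky(struct ref_node *hypervisor)
{
	struct ref_node *np = model_node_get(hypervisor);

	if (!np)
		return false;
	return np->is_kvm;	/* returns while still holding the reference */
}

/* Balanced: a single exit path after the lookup, with exactly one put. */
bool check_guest_balanced(struct ref_node *hypervisor)
{
	struct ref_node *np = model_node_get(hypervisor);
	bool is_kvm = false;

	if (!np)
		return false;

	if (np->is_kvm)
		is_kvm = true;

	model_node_put(np);
	return is_kvm;
}
```

      In the model, each call to the leaky variant raises the count by one,
      while the balanced variant leaves it unchanged and needs neither a
      repeated put nor a goto.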
      
      Fixes: 107c5500 ("powerpc/pseries: Add KVM guest doorbell restrictions")
      Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
      Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210928124550.132020-1-nathanl@linux.ibm.com