1. 25 Nov, 2021 4 commits
    • powerpc/cell: add missing of_node_put · a841fd00
      Julia Lawall authored
      for_each_node_by_name performs an of_node_get on each iteration, so
      a break out of the loop requires an of_node_put.
      
      A simplified version of the semantic patch that fixes this problem is as
      follows (http://coccinelle.lip6.fr):
      
      // <smpl>
      @@
      expression e,e1;
      local idexpression n;
      @@
      
       for_each_node_by_name(n, e1) {
         ... when != of_node_put(n)
             when != e = n
      (
         return n;
      |
      +  of_node_put(n);
      ?  return ...;
      )
         ...
       }
      // </smpl>
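
The reference-counting rule described above can be illustrated in plain C. This is a minimal user-space sketch, not kernel code: `struct mock_node`, `mock_node_get`, `mock_node_put`, `mock_find_node_by_name`, `mock_for_each_node_by_name`, and `stop_at_first_match` are all hypothetical stand-ins, written on the assumption that the iterator takes a reference on the node it hands to the loop body and drops the reference on the previous node when it advances.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-in for a reference-counted device node. */
struct mock_node {
    const char *name;
    int refcount;
    struct mock_node *next;
};

static struct mock_node *mock_node_get(struct mock_node *n)
{
    if (n)
        n->refcount++;
    return n;
}

static void mock_node_put(struct mock_node *n)
{
    if (n) {
        assert(n->refcount > 0);
        n->refcount--;
    }
}

/*
 * Assumed iterator step: return the next node called 'name' after 'from'
 * with a reference held, and drop the reference held on 'from'.
 */
static struct mock_node *mock_find_node_by_name(struct mock_node *head,
                                                struct mock_node *from,
                                                const char *name)
{
    struct mock_node *n = from ? from->next : head;

    while (n && strcmp(n->name, name) != 0)
        n = n->next;
    mock_node_get(n);
    mock_node_put(from);
    return n;
}

#define mock_for_each_node_by_name(head, n, name)                \
    for (n = mock_find_node_by_name(head, NULL, name); n;        \
         n = mock_find_node_by_name(head, n, name))

/*
 * Leave the loop at the first match. Because the iterator never gets the
 * chance to drop the reference on 'n', the caller must do it explicitly;
 * put_on_exit == 0 reproduces the leak.
 */
static void stop_at_first_match(struct mock_node *head, const char *name,
                                int put_on_exit)
{
    struct mock_node *n;

    mock_for_each_node_by_name(head, n, name) {
        if (put_on_exit)
            mock_node_put(n);
        return;
    }
}
```

Calling `stop_at_first_match` with `put_on_exit` set to 0 leaves the first matching node with a nonzero count, which is the leak the semantic patch looks for; with `put_on_exit` set to 1 the count returns to zero.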
      Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1448051604-25256-7-git-send-email-Julia.Lawall@lip6.fr
    • powerpc/powernv: add missing of_node_put · 7d405a93
      Julia Lawall authored
      for_each_compatible_node performs an of_node_get on each iteration, so
      a break out of the loop requires an of_node_put.
      
      A simplified version of the semantic patch that fixes this problem is as
      follows (http://coccinelle.lip6.fr):
      
      // <smpl>
      @@
      local idexpression n;
      expression e;
      @@
      
       for_each_compatible_node(n,...) {
         ...
      (
         of_node_put(n);
      |
         e = n
      |
      +  of_node_put(n);
      ?  break;
      )
         ...
       }
      ... when != n
      // </smpl>
      Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1448051604-25256-4-git-send-email-Julia.Lawall@lip6.fr
    • powerpc/6xx: add missing of_node_put · f6e82647
      Julia Lawall authored
      for_each_compatible_node performs an of_node_get on each iteration, so
      a break out of the loop requires an of_node_put.
      
      A simplified version of the semantic patch that fixes this problem is as
      follows (http://coccinelle.lip6.fr):
      
      // <smpl>
      @@
      local idexpression n;
      expression e;
      @@
      
       for_each_compatible_node(n,...) {
         ...
      (
         of_node_put(n);
      |
         e = n
      |
      +  of_node_put(n);
      ?  break;
      )
         ...
       }
      ... when != n
      // </smpl>
      Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1448051604-25256-2-git-send-email-Julia.Lawall@lip6.fr
    • Merge branch 'topic/ppc-kvm' into next · ff0d6be4
      Michael Ellerman authored
      This merges Nick's big P9 KVM series, original cover letter follows:
      
      KVM: PPC: Book3S HV P9: entry/exit optimisations
      
      This reduces radix guest full entry/exit latency on POWER9 and POWER10
      by 2x.
      
      Nested HV guests should see smaller improvements in their L1 entry/exit,
      but most of the L0 speedups also apply to nested entry. With both L0 and
      L1 patched, an nginx localhost throughput test in an SMP nested guest
      improves by about 10% (in a direct guest it doesn't change much because
      it uses XIVE for IPIs).
      
      It does this in several main ways:
      
      - Rearrange code to optimise SPR accesses. Mainly, avoid scoreboard
        stalls.
      
      - Test SPR values to avoid mtSPRs where possible. mtSPRs are expensive.
      
      - Reduce mftb. mftb is expensive.
      
      - Demand fault certain facilities to avoid saving and/or restoring them
        (at the cost of a fault when they are used, but this is mitigated over
        a number of entries, as is done for these facilities when context
        switching processes). PM, TM, and EBB so far.
      
      - Defer some sequences that are made just in case a guest is interrupted
        in the middle of a critical section to the case where the guest is
        scheduled on a different CPU, rather than every time (at the cost of
        an extra IPI in this case). Namely the tlbsync sequence for radix with
        GTSE, which is very expensive.
      
      - Reduce locking, barriers, atomics related to the vcpus-per-vcore > 1
        handling that the P9 path does not require.
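
The "test SPR values to avoid mtSPRs" item in the list above can be sketched in a few lines of C. This is an illustration only, not code from the series: `struct mock_spr`, `mock_mtspr`, and `switch_spr` are hypothetical names, and the register is modelled as a plain struct that counts how many writes it receives. The assumption is that the caller already knows which value is live in the register, so a comparison can replace an expensive write.

```c
#include <stdint.h>

/* Hypothetical model of one special-purpose register. */
struct mock_spr {
    uint64_t value;       /* value currently held by the register */
    unsigned long writes; /* number of (expensive) writes performed */
};

/* Stand-in for an mtSPR instruction: always performs the write. */
static void mock_mtspr(struct mock_spr *spr, uint64_t val)
{
    spr->value = val;
    spr->writes++;
}

/*
 * Switch the register from the value known to be live ('current') to the
 * value the next context needs ('wanted'), skipping the write when the two
 * already match.
 */
static void switch_spr(struct mock_spr *spr, uint64_t current, uint64_t wanted)
{
    if (current != wanted)
        mock_mtspr(spr, wanted);
}
```

When host and guest happen to use the same value, `switch_spr` performs no write at all, which is the saving the cover letter describes; when they differ, the cost is one extra comparison on top of the write.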
  2. 24 Nov, 2021 36 commits