1. 30 Nov, 2021 6 commits
    • powerpc/modules: Don't WARN on first module allocation attempt · f1797e4d
      Christophe Leroy authored
      module_alloc() first tries to allocate module text within reach of a
      24-bit direct jump from kernel text, and tries a wider allocation if
      the first one fails.
      
      When the first allocation fails, the following is observed in kernel logs:
      
        vmap allocation for size 2400256 failed: use vmalloc=<size> to increase size
        systemd-udevd: vmalloc error: size 2395133b, vm_struct allocation failed, mode:0xcc0(GFP_KERNEL), nodemask=(null)
        CPU: 0 PID: 127 Comm: systemd-udevd Tainted: G        W         5.15.5-gentoo-PowerMacG4 #9
        Call Trace:
        [e2a53a50] [c0ba0048] dump_stack_lvl+0x80/0xb0 (unreliable)
        [e2a53a70] [c0540128] warn_alloc+0x11c/0x2b4
        [e2a53b50] [c0531be8] __vmalloc_node_range+0xd8/0x64c
        [e2a53c10] [c00338c0] module_alloc+0xa0/0xac
        [e2a53c40] [c027a368] load_module+0x2ae0/0x8148
        [e2a53e30] [c027fc78] sys_finit_module+0xfc/0x130
        [e2a53f30] [c0035098] ret_from_syscall+0x0/0x28
        ...
      
      Add the __GFP_NOWARN flag to the first allocation so that no warning
      appears when it fails.
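      
      The retry pattern described above can be sketched in userspace C roughly
      as follows. This is a minimal model, not the kernel code: try_alloc(),
      FLAG_NOWARN, the warning counter and the window sizes are hypothetical
      stand-ins for __vmalloc_node_range(), __GFP_NOWARN, warn_alloc() and the
      real address ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FLAG_NOWARN 0x1u  /* hypothetical stand-in for __GFP_NOWARN */

static int warnings_emitted;  /* counts simulated allocation warnings */

/* Hypothetical allocator: succeeds only if the request fits in the window. */
static bool try_alloc(size_t size, size_t window, unsigned int flags)
{
    if (size <= window)
        return true;
    if (!(flags & FLAG_NOWARN))
        warnings_emitted++;  /* a failure is reported only without NOWARN */
    return false;
}

/* Try the narrow window quietly, then fall back to the wide one. */
static bool module_alloc_sketch(size_t size, size_t near_window,
                                size_t far_window)
{
    assert(near_window <= far_window);
    if (try_alloc(size, near_window, FLAG_NOWARN))
        return true;
    /* The final attempt may warn: if it fails, the load really failed. */
    return try_alloc(size, far_window, 0);
}
```

      In this model a failed first attempt stays silent, and only a failure of
      the wider fallback is reported.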
      
      Reported-by: Erhard Furtner <erhard_f@mailbox.org>
      Fixes: 2ec13df1 ("powerpc/modules: Load modules closer to kernel text")
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/93c9b84d6ec76aaf7b4f03468e22433a6d308674.1638267035.git.christophe.leroy@csgroup.eu
    • powerpc/64s: Get LPID bit width from device tree · 5402e239
      Nicholas Piggin authored
      Allow the LPID bit width and partition table size to be set at runtime
      from the device tree.
      
      Move the PID bit width detection into the same place.
      
      KVM does not support using the extra bits yet; this is mainly required
      to get the PTCR register values correct (so KVM will run, but it will
      not allocate > 4096 LPIDs).
      
      OPAL firmware provides this property for POWER10 CPUs since skiboot
      commit 9b85f7d961f2 ("hdata: add mmu-pid-bits and mmu-lpid-bits for
      POWER10 CPUs").
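      
      To illustrate why the bit width affects the partition table size, here
      is a minimal sketch. It assumes one table entry per LPID and 16-byte
      entries; both function names and the PATB_ENTRY_BYTES constant are
      hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define PATB_ENTRY_BYTES 16u  /* assumption: two 64-bit doublewords per entry */

/* Number of distinct LPIDs representable with the given bit width. */
static uint64_t lpid_count(unsigned int lpid_bits)
{
    assert(lpid_bits < 32);
    return (uint64_t)1 << lpid_bits;
}

/* Bytes needed for a table holding one entry per LPID. */
static uint64_t partition_table_bytes(unsigned int lpid_bits)
{
    return lpid_count(lpid_bits) * PATB_ENTRY_BYTES;
}
```

      Under these assumptions a 12-bit LPID gives 4096 LPIDs, which is the
      limit mentioned above.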
      
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20211129030915.1888332-1-npiggin@gmail.com
    • powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an overflown PMC · 2c9ac51b
      Athira Rajeev authored
      Running the perf fuzzer produced the following message in the dmesg logs:
        "Can't find PMC that caused IRQ"
      
      This means a PMU exception happened, but none of the PMCs (Performance
      Monitor Counters) was found to be overflown. There are some corner cases
      that clear the PMCs after the PMI gets masked. In such cases, the perf
      interrupt handler will not find the active PMC values that caused the
      overflow, which leads to this message while replaying.
      
      Case 1: A PMU interrupt happens during replay of other interrupts, and
      counter values get cleared by PMU callbacks before replay:
      
      During replay of interrupts like timer, __do_irq() and doorbell
      exception, we conditionally enable interrupts via may_hard_irq_enable().
      This could potentially create a window to generate a PMI. Since irq soft
      mask is set to ALL_DISABLED, the PMI will get masked here. IPIs could
      then run before the perf interrupt is replayed, and the PMU events could
      be deleted or stopped. This changes the PMU SPR values and resets
      the counters. Snippet of ftrace log showing PMU callbacks invoked in
      __do_irq():
      
        <idle>-0 [051] dns. 132025441306354: __do_irq <-call_do_irq
        <idle>-0 [051] dns. 132025441306430: irq_enter <-__do_irq
        <idle>-0 [051] dns. 132025441306503: irq_enter_rcu <-__do_irq
        <idle>-0 [051] dnH. 132025441306599: xive_get_irq <-__do_irq
        <<>>
        <idle>-0 [051] dnH. 132025441307770: generic_smp_call_function_single_interrupt <-smp_ipi_demux_relaxed
        <idle>-0 [051] dnH. 132025441307839: flush_smp_call_function_queue <-smp_ipi_demux_relaxed
        <idle>-0 [051] dnH. 132025441308057: _raw_spin_lock <-event_function
        <idle>-0 [051] dnH. 132025441308206: power_pmu_disable <-perf_pmu_disable
        <idle>-0 [051] dnH. 132025441308337: power_pmu_del <-event_sched_out
        <idle>-0 [051] dnH. 132025441308407: power_pmu_read <-power_pmu_del
        <idle>-0 [051] dnH. 132025441308477: read_pmc <-power_pmu_read
        <idle>-0 [051] dnH. 132025441308590: isa207_disable_pmc <-power_pmu_del
        <idle>-0 [051] dnH. 132025441308663: write_pmc <-power_pmu_del
        <idle>-0 [051] dnH. 132025441308787: power_pmu_event_idx <-perf_event_update_userpage
        <idle>-0 [051] dnH. 132025441308859: rcu_read_unlock_strict <-perf_event_update_userpage
        <idle>-0 [051] dnH. 132025441308975: power_pmu_enable <-perf_pmu_enable
        <<>>
        <idle>-0 [051] dnH. 132025441311108: irq_exit <-__do_irq
        <idle>-0 [051] dns. 132025441311319: performance_monitor_exception <-replay_soft_interrupts
      
      Case 2: PMIs are masked during local_* operations, for example
      local_add(). If the local_add() operation happens within a
      local_irq_save(), the PMI is replayed during local_irq_restore().
      Similar to case 1, this could also create a window before replay where
      PMU events get deleted or stopped.
      
      Fix it by updating the PMU callback function power_pmu_disable() to
      check for a pending perf interrupt. If there is an overflown PMC and a
      pending perf interrupt indicated in the paca, clear the PMI bit in the
      paca to drop that sample. The PMI bit is cleared in power_pmu_disable()
      since disable is invoked before any event gets deleted/stopped. With
      this fix, if more than one event is running in the PMU, there is a
      chance that we clear the PMI bit for an event which is not getting
      deleted/stopped. The other events may still remain active. Hence, to
      make sure we don't drop a valid sample in such cases, another check is
      added in power_pmu_enable(). This checks whether there is an overflown
      PMC among the active events and, if so, sets the PMI bit again. Two new
      helper functions are introduced to clear/set the PMI, i.e.
      clear_pmi_irq_pending() and set_pmi_irq_pending(). The helper function
      pmi_irq_pending() is introduced to give a warning if the PMI bit is
      pending in the paca but no PMC is overflown.
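      
      The disable/enable logic described above can be modelled in userspace
      roughly as follows. This is a simplified sketch, not the patch itself:
      struct pmu_model, NUM_PMCS and the *_sketch() functions are hypothetical,
      and treating a counter as overflown when its top bit is set is an
      assumption of this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PMCS 6  /* assumption: illustrative counter count */

struct pmu_model {
    uint32_t pmc[NUM_PMCS];  /* counter values */
    bool active[NUM_PMCS];   /* counter backs a live event */
    bool pmi_pending;        /* stand-in for the pending-PMI bit in the paca */
};

/* Assumption: a counter counts as overflown once its top bit is set. */
static bool pmc_overflown(uint32_t val)
{
    return (val & 0x80000000u) != 0;
}

static bool any_overflown(const struct pmu_model *pmu, bool active_only)
{
    for (int i = 0; i < NUM_PMCS; i++) {
        if (active_only && !pmu->active[i])
            continue;
        if (pmc_overflown(pmu->pmc[i]))
            return true;
    }
    return false;
}

/* Disable path: drop a masked PMI before any event is deleted or stopped. */
static void pmu_disable_sketch(struct pmu_model *pmu)
{
    if (pmu->pmi_pending && any_overflown(pmu, false))
        pmu->pmi_pending = false;
}

/* Deleting an event resets its counter, so its overflow is gone. */
static void pmu_del_sketch(struct pmu_model *pmu, int idx)
{
    assert(idx >= 0 && idx < NUM_PMCS);
    pmu->active[idx] = false;
    pmu->pmc[idx] = 0;
}

/* Enable path: re-arm the PMI if a surviving event still has an overflow. */
static void pmu_enable_sketch(struct pmu_model *pmu)
{
    if (any_overflown(pmu, true))
        pmu->pmi_pending = true;
}
```

      In this model, deleting the only overflown event leaves no pending PMI,
      while an overflow on a surviving event gets its PMI re-armed on enable.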
      
      There are also corner cases which result in performance monitor
      interrupts being triggered during power_pmu_disable(). This happens
      because the PMXE bit is not cleared along with the other MMCR0 bits
      when the PMU is disabled. Such PMIs could leave the PMU running and could
      trigger a PMI again, which will set the MMCR0 PMAO bit. This could lead to
      spurious interrupts in some corner cases. For example, a timer after
      power_pmu_del() re-enables interrupts and triggers a PMI again because
      the PMAO bit is still set, but the handler fails to find a valid
      overflow because the PMC was cleared in power_pmu_del(). Fix that by
      disabling PMXE along with the other MMCR0 bits in power_pmu_disable().
      
      We can't just replay a PMI at any time. Hence this approach is preferred
      over replaying the PMI before resetting the overflown PMC. The patch also
      documents in core-book3s a race condition which can trigger these PMC
      messages in the idle path on PowerNV.
      
      Fixes: f442d004 ("powerpc/64s: Add support to mask perf interrupts and replay them")
      Reported-by: Nageswara R Sastry <nasastry@in.ibm.com>
      Suggested-by: Nicholas Piggin <npiggin@gmail.com>
      Suggested-by: Madhavan Srinivasan <maddy@linux.ibm.com>
      Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Make pmi_irq_pending() return bool, reflow/reword some comments]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1626846509-1350-2-git-send-email-atrajeev@linux.vnet.ibm.com
    • powerpc/atomics: Remove atomic_inc()/atomic_dec() and friends · f05cab00
      Christophe Leroy authored
      Now that atomic_add() and atomic_sub() handle immediate operands,
      atomic_inc() and atomic_dec() have no added value compared to the
      generic fallback which calls atomic_add(1) and atomic_sub(1).
      
      Also remove atomic_inc_not_zero(), which falls back to
      atomic_add_unless(), which itself falls back to
      atomic_fetch_add_unless(), which now handles immediate operands.
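      
      The fallback chain described above can be sketched with C11 atomics. The
      my_* names are hypothetical userspace stand-ins chosen for illustration;
      they are not the kernel's generic implementations.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the kernel's atomic_add()/atomic_sub(). */
static void my_atomic_add(int i, atomic_int *v) { atomic_fetch_add(v, i); }
static void my_atomic_sub(int i, atomic_int *v) { atomic_fetch_sub(v, i); }

/* Fallback-style inc/dec: the add/sub primitives with a constant 1. */
static void my_atomic_inc(atomic_int *v) { my_atomic_add(1, v); }
static void my_atomic_dec(atomic_int *v) { my_atomic_sub(1, v); }

/* Add a to *v unless *v == u; returns the value seen before the update. */
static int my_atomic_fetch_add_unless(atomic_int *v, int a, int u)
{
    int old = atomic_load(v);

    while (old != u && !atomic_compare_exchange_weak(v, &old, old + a))
        ;  /* a failed compare-exchange refreshed old; retry */
    return old;
}

/* inc_not_zero expressed through fetch_add_unless with constants 1 and 0. */
static int my_atomic_inc_not_zero(atomic_int *v)
{
    return my_atomic_fetch_add_unless(v, 1, 0) != 0;
}
```

      Because the constant 1 is visible at every call site, a primitive that
      accepts immediate operands makes a dedicated inc/dec implementation
      redundant.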
      
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/0bc64a2f18726055093dbb2e479cefc60a409cfd.1632236981.git.christophe.leroy@csgroup.eu
    • powerpc/atomics: Use immediate operand when possible · 41d65207
      Christophe Leroy authored
      Today we get the following code generation for atomic operations:
      
      	c001bb2c:	39 20 00 01 	li      r9,1
      	c001bb30:	7d 40 18 28 	lwarx   r10,0,r3
      	c001bb34:	7d 09 50 50 	subf    r8,r9,r10
      	c001bb38:	7d 00 19 2d 	stwcx.  r8,0,r3
      
      	c001c7a8:	39 40 00 01 	li      r10,1
      	c001c7ac:	7d 00 18 28 	lwarx   r8,0,r3
      	c001c7b0:	7c ea 42 14 	add     r7,r10,r8
      	c001c7b4:	7c e0 19 2d 	stwcx.  r7,0,r3
      
      By allowing GCC to choose between immediate or regular operation,
      we get:
      
      	c001bb2c:	7d 20 18 28 	lwarx   r9,0,r3
      	c001bb30:	39 49 ff ff 	addi    r10,r9,-1
      	c001bb34:	7d 40 19 2d 	stwcx.  r10,0,r3
      	--
      	c001c7a4:	7d 40 18 28 	lwarx   r10,0,r3
      	c001c7a8:	39 0a 00 01 	addi    r8,r10,1
      	c001c7ac:	7d 00 19 2d 	stwcx.  r8,0,r3
      
      For "and", the dot form "andi." has to be used because "andi" doesn't exist.
      
      For logical operations we use an unsigned 16-bit immediate.
      For arithmetic operations we use a signed 16-bit immediate.
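      
      The two immediate ranges can be written as simple range checks. This is
      an illustrative sketch only: fits_simm16() and fits_uimm16() are
      hypothetical helper names, and the patch itself leaves this choice to
      GCC rather than testing ranges by hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Arithmetic immediates (e.g. addi) are sign-extended 16-bit values. */
static bool fits_simm16(int64_t v)
{
    return v >= -32768 && v <= 32767;
}

/* Logical immediates (e.g. ori) are zero-extended 16-bit values. */
static bool fits_uimm16(int64_t v)
{
    return v >= 0 && v <= 65535;
}
```

      The operands 1 and -1 from the listings above both fit the signed range,
      so the separate "li" disappears.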
      
      On pmac32_defconfig, it reduces the text by approx another 8 kbytes.
      
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/2ec558d44db8045752fe9dbd29c9ba84bab6030b.1632236981.git.christophe.leroy@csgroup.eu
    • powerpc/bitops: Use immediate operand when possible · fb350784
      Christophe Leroy authored
      Today we get the following code generation for bitops like
      set or clear bit:
      
      	c0009fe0:	39 40 08 00 	li      r10,2048
      	c0009fe4:	7c e0 40 28 	lwarx   r7,0,r8
      	c0009fe8:	7c e7 53 78 	or      r7,r7,r10
      	c0009fec:	7c e0 41 2d 	stwcx.  r7,0,r8
      
      	c000d568:	39 00 18 00 	li      r8,6144
      	c000d56c:	7c c0 38 28 	lwarx   r6,0,r7
      	c000d570:	7c c6 40 78 	andc    r6,r6,r8
      	c000d574:	7c c0 39 2d 	stwcx.  r6,0,r7
      
      Most bits being set are constants within the lower 16 bits, so the
      operation can easily be replaced by its "immediate" version. Allow
      GCC to choose between the normal and the immediate form.
      
      For clearing bits, on 32 bits 'rlwinm' can be used instead of 'andc'
      when all bits to be cleared are consecutive.
      
      On 64 bits there is no equivalent single operation for clearing a
      single bit or a few bits. We would need two 'rldicl', so it is not
      worth it: the li/andc sequence does the same.
      
      With this patch we get:
      
      	c0009fe0:	7d 00 50 28 	lwarx   r8,0,r10
      	c0009fe4:	61 08 08 00 	ori     r8,r8,2048
      	c0009fe8:	7d 00 51 2d 	stwcx.  r8,0,r10
      
      	c000d558:	7c e0 40 28 	lwarx   r7,0,r8
      	c000d55c:	54 e7 05 64 	rlwinm  r7,r7,0,21,18
      	c000d560:	7c e0 41 2d 	stwcx.  r7,0,r8
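      
      As a cross-check of the 'rlwinm' operands in the listing above, here is
      a sketch that derives MB/ME for a run of consecutive bits to clear. The
      helper names are hypothetical, and the code only models the mask
      arithmetic with bit 0 as the most significant bit; it is not the
      kernel's bitops implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the set bits of mask form one unbroken run (no wrap-around). */
static bool is_contiguous(uint32_t mask)
{
    if (mask == 0)
        return false;
    uint32_t low = mask & (~mask + 1);  /* lowest set bit */
    return ((mask + low) & mask) == 0;  /* the carry clears the whole run */
}

/* Mask kept by "rlwinm rD,rS,0,MB,ME": bits MB..ME, wrapping if MB > ME. */
static uint32_t rlwinm_mask(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xffffffffu >> mb;       /* bits mb..31 */
    uint32_t to_me = 0xffffffffu << (31 - me);  /* bits 0..me */
    return mb <= me ? (from_mb & to_me) : (from_mb | to_me);
}

/* MB/ME that keep every bit except a contiguous clear_mask. */
static void rlwinm_clear_operands(uint32_t clear_mask,
                                  unsigned *mb, unsigned *me)
{
    assert(is_contiguous(clear_mask) && clear_mask != 0xffffffffu);
    unsigned lo = 0, hi = 31;  /* positions counted from the LSB */
    while (!(clear_mask & (1u << lo)))
        lo++;
    while (!(clear_mask & (1u << hi)))
        hi--;
    *mb = (31 - lo + 1) % 32;   /* first kept bit after the cleared run */
    *me = (31 - hi + 31) % 32;  /* last kept bit before the cleared run */
}
```

      For the mask 6144 from the earlier li/andc sequence this yields MB = 21
      and ME = 18, matching the operands in the listing.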
      
      On pmac32_defconfig, it reduces the text by approx 10 kbytes.
      
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/e6f815d9181bab09df3b350af51149437863e9f9.1632236981.git.christophe.leroy@csgroup.eu
  2. 29 Nov, 2021 16 commits
  3. 25 Nov, 2021 18 commits