1. 28 Aug, 2023 2 commits
  2. 25 Aug, 2023 1 commit
    • powerpc/iommu: Fix notifiers being shared by PCI and VIO buses · c37b6908
      Russell Currey authored
      fail_iommu_setup() registers the fail_iommu_bus_notifier struct to both
      PCI and VIO buses.  struct notifier_block is a linked list node, so this
      causes any notifiers later registered to either bus type to also be
      registered to the other since they share the same node.
      
      This causes issues in (at least) the vgaarb code, which registers a
      notifier for PCI buses.  pci_notify() ends up being called on a vio
      device, converted with to_pci_dev() even though it's not a PCI device,
      and finally makes a bad access in vga_arbiter_add_pci_device() as
      discovered with KASAN:
      
       BUG: KASAN: slab-out-of-bounds in vga_arbiter_add_pci_device+0x60/0xe00
       Read of size 4 at addr c000000264c26fdc by task swapper/0/1
      
       Call Trace:
         dump_stack_lvl+0x1bc/0x2b8 (unreliable)
         print_report+0x3f4/0xc60
         kasan_report+0x244/0x698
         __asan_load4+0xe8/0x250
         vga_arbiter_add_pci_device+0x60/0xe00
         pci_notify+0x88/0x444
         notifier_call_chain+0x104/0x320
         blocking_notifier_call_chain+0xa0/0x140
         device_add+0xac8/0x1d30
         device_register+0x58/0x80
         vio_register_device_node+0x9ac/0xce0
         vio_bus_scan_register_devices+0xc4/0x13c
         __machine_initcall_pseries_vio_device_init+0x94/0xf0
         do_one_initcall+0x12c/0xaa8
         kernel_init_freeable+0xa48/0xba8
         kernel_init+0x64/0x400
         ret_from_kernel_thread+0x5c/0x64
      
      Fix this by creating separate notifier_block structs for each bus type.
      
      Fixes: d6b9a81b ("powerpc: IOMMU fault injection")
      Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
      Signed-off-by: Russell Currey <ruscur@russell.cc>
      Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
      Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
      [mpe: Add #ifdef to fix CONFIG_IBMVIO=n build]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/20230322035322.328709-1-ruscur@russell.cc
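
The failure mode described in this commit can be reproduced outside the kernel. The following is a minimal userspace sketch, not the kernel code: `struct notifier_block`, `struct notifier_head`, and `chain_register()` are simplified stand-ins (no callback, no locking), and `pci_notifier_leaks_to_vio()` is a hypothetical helper written only for this illustration. It shows that because the node embeds its own `next` pointer, registering one node on two chains makes a later registration on one chain visible from the other.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct notifier_block: the node
 * itself carries the 'next' link of whatever chain it is on. The callback
 * pointer is omitted because only the list linkage matters here. */
struct notifier_block {
    struct notifier_block *next;
    int priority;
};

struct notifier_head {
    struct notifier_block *head;
};

/* Insert 'n' into the chain, ordered by descending priority and after
 * existing entries of equal priority (a simplified registration routine). */
static void chain_register(struct notifier_head *nh, struct notifier_block *n)
{
    struct notifier_block **nl = &nh->head;

    while (*nl != NULL && n->priority <= (*nl)->priority)
        nl = &(*nl)->next;
    n->next = *nl;
    *nl = n;
}

/* Return 1 if 'n' is reachable by walking the chain, 0 otherwise. */
static int chain_contains(const struct notifier_head *nh,
                          const struct notifier_block *n)
{
    const struct notifier_block *it;

    for (it = nh->head; it != NULL; it = it->next)
        if (it == n)
            return 1;
    return 0;
}

/* Hypothetical scenario: register a fault-injection node on both buses
 * (shared or separate), then register a PCI-only notifier. Returns 1 if
 * the PCI-only notifier ends up reachable from the VIO chain. */
static int pci_notifier_leaks_to_vio(int share_node)
{
    struct notifier_head pci_chain = { NULL };
    struct notifier_head vio_chain = { NULL };
    struct notifier_block fail_pci = { NULL, 0 };
    struct notifier_block fail_vio = { NULL, 0 };
    struct notifier_block pci_notify = { NULL, 0 };

    chain_register(&pci_chain, &fail_pci);
    chain_register(&vio_chain, share_node ? &fail_pci : &fail_vio);
    chain_register(&pci_chain, &pci_notify);

    return chain_contains(&vio_chain, &pci_notify);
}
```

Calling `pci_notifier_leaks_to_vio(1)` models the bug (one shared node), while `pci_notifier_leaks_to_vio(0)` models the fix (one node per bus type).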
  3. 24 Aug, 2023 16 commits
  4. 23 Aug, 2023 3 commits
    • powerpc/85xx: Mark some functions static and add missing includes to fix no previous prototype error · c265735f
      Christophe Leroy authored
      
      corenet{32/64}_smp_defconfig leads to:
      
        CC      arch/powerpc/sysdev/ehv_pic.o
      arch/powerpc/sysdev/ehv_pic.c:45:6: error: no previous prototype for 'ehv_pic_unmask_irq' [-Werror=missing-prototypes]
         45 | void ehv_pic_unmask_irq(struct irq_data *d)
            |      ^~~~~~~~~~~~~~~~~~
      arch/powerpc/sysdev/ehv_pic.c:52:6: error: no previous prototype for 'ehv_pic_mask_irq' [-Werror=missing-prototypes]
         52 | void ehv_pic_mask_irq(struct irq_data *d)
            |      ^~~~~~~~~~~~~~~~
      arch/powerpc/sysdev/ehv_pic.c:59:6: error: no previous prototype for 'ehv_pic_end_irq' [-Werror=missing-prototypes]
         59 | void ehv_pic_end_irq(struct irq_data *d)
            |      ^~~~~~~~~~~~~~~
      arch/powerpc/sysdev/ehv_pic.c:66:6: error: no previous prototype for 'ehv_pic_direct_end_irq' [-Werror=missing-prototypes]
         66 | void ehv_pic_direct_end_irq(struct irq_data *d)
            |      ^~~~~~~~~~~~~~~~~~~~~~
      arch/powerpc/sysdev/ehv_pic.c:71:5: error: no previous prototype for 'ehv_pic_set_affinity' [-Werror=missing-prototypes]
         71 | int ehv_pic_set_affinity(struct irq_data *d, const struct cpumask *dest,
            |     ^~~~~~~~~~~~~~~~~~~~
      arch/powerpc/sysdev/ehv_pic.c:112:5: error: no previous prototype for 'ehv_pic_set_irq_type' [-Werror=missing-prototypes]
        112 | int ehv_pic_set_irq_type(struct irq_data *d, unsigned int flow_type)
            |     ^~~~~~~~~~~~~~~~~~~~
        CC      arch/powerpc/sysdev/fsl_rio.o
      arch/powerpc/sysdev/fsl_rio.c:102:5: error: no previous prototype for 'fsl_rio_mcheck_exception' [-Werror=missing-prototypes]
        102 | int fsl_rio_mcheck_exception(struct pt_regs *regs)
            |     ^~~~~~~~~~~~~~~~~~~~~~~~
      arch/powerpc/sysdev/fsl_rio.c:306:5: error: no previous prototype for 'fsl_map_inb_mem' [-Werror=missing-prototypes]
        306 | int fsl_map_inb_mem(struct rio_mport *mport, dma_addr_t lstart,
            |     ^~~~~~~~~~~~~~~
      arch/powerpc/sysdev/fsl_rio.c:357:6: error: no previous prototype for 'fsl_unmap_inb_mem' [-Werror=missing-prototypes]
        357 | void fsl_unmap_inb_mem(struct rio_mport *mport, dma_addr_t lstart)
            |      ^~~~~~~~~~~~~~~~~
      arch/powerpc/sysdev/fsl_rio.c:445:5: error: no previous prototype for 'fsl_rio_setup' [-Werror=missing-prototypes]
        445 | int fsl_rio_setup(struct platform_device *dev)
            |     ^~~~~~~~~~~~~
        CC      arch/powerpc/sysdev/fsl_rmu.o
      arch/powerpc/sysdev/fsl_rmu.c:362:6: error: no previous prototype for 'msg_unit_error_handler' [-Werror=missing-prototypes]
        362 | void msg_unit_error_handler(void)
            |      ^~~~~~~~~~~~~~~~~~~~~~
        CC      arch/powerpc/platforms/85xx/corenet_generic.o
      arch/powerpc/platforms/85xx/corenet_generic.c:33:13: error: no previous prototype for 'corenet_gen_pic_init' [-Werror=missing-prototypes]
         33 | void __init corenet_gen_pic_init(void)
            |             ^~~~~~~~~~~~~~~~~~~~
      arch/powerpc/platforms/85xx/corenet_generic.c:51:13: error: no previous prototype for 'corenet_gen_setup_arch' [-Werror=missing-prototypes]
         51 | void __init corenet_gen_setup_arch(void)
            |             ^~~~~~~~~~~~~~~~~~~~~~
      arch/powerpc/platforms/85xx/corenet_generic.c:104:12: error: no previous prototype for 'corenet_gen_publish_devices' [-Werror=missing-prototypes]
        104 | int __init corenet_gen_publish_devices(void)
            |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~
        CC      arch/powerpc/platforms/85xx/qemu_e500.o
      arch/powerpc/platforms/85xx/qemu_e500.c:28:13: error: no previous prototype for 'qemu_e500_pic_init' [-Werror=missing-prototypes]
         28 | void __init qemu_e500_pic_init(void)
            |             ^~~~~~~~~~~~~~~~~~
        CC      arch/powerpc/kernel/pmc.o
      arch/powerpc/kernel/pmc.c:78:6: error: no previous prototype for 'power4_enable_pmcs' [-Werror=missing-prototypes]
         78 | void power4_enable_pmcs(void)
            |      ^~~~~~~~~~~~~~~~~~
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/c90780017b624b91771a3e4240dcbadc68137915.1692684784.git.christophe.leroy@csgroup.eu
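
For readers unfamiliar with `-Wmissing-prototypes`, the two kinds of fix named in the commit title can be sketched in a single file. This is an illustrative sketch only: `irq_mask_bit()` and `board_setup()` are hypothetical names, not functions from the files listed above, and in a real tree the prototype would come from an included header rather than sit next to the definition.

```c
#include <assert.h>

/* Fix 1: a function used only inside this file gets internal linkage.
 * A static function needs no previous prototype, so the warning goes away. */
static unsigned int irq_mask_bit(unsigned int irq)
{
    return 1u << (irq & 31u);
}

/* Fix 2: a function that must stay global needs a declaration that is
 * visible before its definition. In the kernel this declaration lives in
 * a header that the .c file includes; it is inlined here for brevity. */
int board_setup(int id);

int board_setup(int id)
{
    return id >= 0 ? 0 : -1;
}
```

Building this with `-Wmissing-prototypes` should produce no diagnostics, whereas dropping `static` from the first function or the declaration before the second would bring the warning back.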
      
    • powerpc/64e: Fix circular dependency with CONFIG_SMP disabled · 0e2a34c4
      Christophe Leroy authored
      asm/percpu.h includes asm/paca.h, which needs struct tlb_core_data,
      which is defined in mmu-e500.h.

      asm/percpu.h is included from asm/mmu.h inside an #ifdef CONFIG_E500,
      before the inclusion of mmu-e500.h, so asm/paca.h is processed before
      struct tlb_core_data has been defined.

      To fix that, move the inclusion of asm/percpu.h into mmu-e500.h,
      after the definition of struct tlb_core_data.
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202308220708.nRf5AUAe-lkp@intel.com/
      Closes: https://lore.kernel.org/oe-kbuild-all/202308220857.uFq2oAxM-lkp@intel.com/
      Closes: https://lore.kernel.org/oe-kbuild-all/202308221055.lw3UzJIL-lkp@intel.com/
      Fixes: 3a24ea0d ("powerpc/kuap: Use ASM feature fixups instead of static branches")
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/5e0f97d5cbcd05238b56b4424ab096468296824d.1692684461.git.christophe.leroy@csgroup.eu
    • powerpc/powernv: fix debugfs_create_dir() error checking · 429356fa
      Immad Mir authored
      debugfs_create_dir() returns an ERR_PTR in case of an error, and the
      correct way of checking for that is the IS_ERR() inline function, not
      a simple NULL comparison. This patch fixes the check.
      Suggested-by: Ivan Orlov <ivan.orlov0322@gmail.com>
      Signed-off-by: Immad Mir <mirimmad17@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/CY5PR12MB64553EE96EBB3927311DB598C6459@CY5PR12MB6455.namprd12.prod.outlook.com
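
Why a NULL comparison misses an ERR_PTR can be shown with a small userspace sketch. The `ERR_PTR()`, `PTR_ERR()`, and `IS_ERR()` helpers below are simplified re-creations of the kernel's error-pointer convention, written for this example; `fake_create_dir()` is a hypothetical stand-in for a function that reports failure through an error pointer, and the use of `-ENOMEM` as the failure code is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace re-creation of the error-pointer convention: small negative
 * errno values are encoded in the topmost addresses of the pointer range. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for a directory-creation call: returns a valid
 * object on success and an encoded error (never NULL) on failure. */
static void *fake_create_dir(int fail)
{
    static int dir_object;

    return fail ? ERR_PTR(-ENOMEM) : (void *)&dir_object;
}

/* The old style of check: treats anything non-NULL as success. */
static int null_check_sees_failure(const void *dir)
{
    return dir == NULL;
}

/* The corrected style of check. */
static int is_err_check_sees_failure(const void *dir)
{
    return IS_ERR(dir);
}
```

With a failing `fake_create_dir(1)`, the NULL check reports success because the encoded error is a non-NULL pointer, while the `IS_ERR()` check catches it and `PTR_ERR()` recovers the errno value.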
  5. 21 Aug, 2023 16 commits
  6. 18 Aug, 2023 2 commits
    • powerpc/perf: Convert fsl_emb notifier to state machine callbacks · 34daf445
      Christophe Leroy authored
        CC      arch/powerpc/perf/core-fsl-emb.o
      arch/powerpc/perf/core-fsl-emb.c:675:6: error: no previous prototype for 'hw_perf_event_setup' [-Werror=missing-prototypes]
        675 | void hw_perf_event_setup(int cpu)
            |      ^~~~~~~~~~~~~~~~~~~
      
      Looks like fsl_emb was completely missed by commit 3f6da390 ("perf:
      Rework and fix the arch CPU-hotplug hooks")
      
      So, apply the same changes as commit 3f6da390 ("perf: Rework and fix
      the arch CPU-hotplug hooks"), then those of commit 57ecde42
      ("powerpc/perf: Convert book3s notifier to state machine callbacks").

      While at it, also fix the following error:
      
      arch/powerpc/perf/core-fsl-emb.c: In function 'perf_event_interrupt':
      arch/powerpc/perf/core-fsl-emb.c:648:13: error: variable 'found' set but not used [-Werror=unused-but-set-variable]
        648 |         int found = 0;
            |             ^~~~~
      
      Fixes: 3f6da390 ("perf: Rework and fix the arch CPU-hotplug hooks")
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/603e1facb32608f88f40b7d7b9094adc50e7b2dc.1692349125.git.christophe.leroy@csgroup.eu
    • PCI: rpaphp: Error out on busy status from get-sensor-state · 77583f77
      Mahesh Salgaonkar authored
      When certain PHB hardware failures cause pHyp to recover the PHB, it
      marks the PE state as temporarily unavailable until recovery is complete.
      This also triggers an EEH handler in Linux, which needs to notify drivers
      and perform recovery. But before notifying the driver about the PCI
      error, it uses the
      get_adapter_status()->rpaphp_get_sensor_state()->rtas_call(get-sensor-state)
      operation of the hotplug_slot to determine whether the slot contains a
      device. If the slot is empty, the recovery is skipped entirely.
      
      eeh_event_handler()
        ->eeh_handle_normal_event()
          ->eeh_slot_presence_check()
            ->get_adapter_status()
              ->rpaphp_get_sensor_state()
                ->rtas_get_sensor()
                  ->rtas_call(get-sensor-state)
      
      However, on certain PHB failures, rtas_call(get-sensor-state) returns an
      extended busy error (9902) until the PHB is recovered by pHyp. Once the
      PHB is recovered, rtas_call(get-sensor-state) returns success with the
      correct presence status. The RTAS call interface rtas_get_sensor() loops
      over the RTAS call on the extended delay return code (9902) until the
      return value is either success (0) or error (-1). This causes the EEH
      handler to get stuck for ~6 seconds before it can notify drivers that the
      PCI error has been detected and stop any active operations. Hence, with
      I/O traffic running, the network driver continues its operation during
      these 6 seconds and hits a timeout (netdev watchdog).
      
      ------------
      [52732.244731] DEBUG: ibm_read_slot_reset_state2()
      [52732.244762] DEBUG: ret = 0, rets[0]=5, rets[1]=1, rets[2]=4000, rets[3]=>
      [52732.244798] DEBUG: in eeh_slot_presence_check
      [52732.244804] DEBUG: error state check
      [52732.244807] DEBUG: Is slot hotpluggable
      [52732.244810] DEBUG: hotpluggable ops ?
      [52732.244953] DEBUG: Calling ops->get_adapter_status
      [52732.244958] DEBUG: calling rpaphp_get_sensor_state
      [52736.564262] ------------[ cut here ]------------
      [52736.564299] NETDEV WATCHDOG: enP64p1s0f3 (tg3): transmit queue 0 timed o>
      [52736.564324] WARNING: CPU: 1442 PID: 0 at net/sched/sch_generic.c:478 dev>
      [...]
      [52736.564505] NIP [c000000000c32368] dev_watchdog+0x438/0x440
      [52736.564513] LR [c000000000c32364] dev_watchdog+0x434/0x440
      ------------
      
      On timeouts, the network driver starts dumping debug information to the
      console (e.g. the bnx2x driver calls bnx2x_panic_dump()) and goes into
      its recovery path while pHyp is still recovering the PHB. As part of
      recovery, the driver tries to reset the device, which keeps failing since
      every PCI read/write returns ff's. When EEH recovery then kicks in, the
      driver is unable to recover the device. This impacts the ssh connection
      and leaves the system inaccessible. Getting the NIC working again
      requires a reboot or re-assigning the I/O adapter from the HMC.
      
      [ 9531.168587] EEH: Beginning: 'slot_reset'
      [ 9531.168601] PCI 0013:01:00.0#10000: EEH: Invoking bnx2x->slot_reset()
      [...]
      [ 9614.110094] bnx2x: [bnx2x_func_stop:9129(enP19p1s0f0)]FUNC_STOP ramrod failed. Running a dry transaction
      [ 9614.110300] bnx2x: [bnx2x_igu_int_disable:902(enP19p1s0f0)]BUG! Proper val not read from IGU!
      [ 9629.178067] bnx2x: [bnx2x_fw_command:3055(enP19p1s0f0)]FW failed to respond!
      [ 9629.178085] bnx2x 0013:01:00.0 enP19p1s0f0: bc 7.10.4
      [ 9629.178091] bnx2x: [bnx2x_fw_dump_lvl:789(enP19p1s0f0)]Cannot dump MCP info while in PCI error
      [ 9644.241813] bnx2x: [bnx2x_io_slot_reset:14245(enP19p1s0f0)]IO slot reset --> driver unload
      [...]
      [ 9644.241819] PCI 0013:01:00.0#10000: EEH: bnx2x driver reports: 'disconnect'
      [ 9644.241823] PCI 0013:01:00.1#10000: EEH: Invoking bnx2x->slot_reset()
      [ 9644.241827] bnx2x: [bnx2x_io_slot_reset:14229(enP19p1s0f1)]IO slot reset initializing...
      [ 9644.241916] bnx2x 0013:01:00.1: enabling device (0140 -> 0142)
      [ 9644.258604] bnx2x: [bnx2x_io_slot_reset:14245(enP19p1s0f1)]IO slot reset --> driver unload
      [ 9644.258612] PCI 0013:01:00.1#10000: EEH: bnx2x driver reports: 'disconnect'
      [ 9644.258615] EEH: Finished:'slot_reset' with aggregate recovery state:'disconnect'
      [ 9644.258620] EEH: Unable to recover from failure from PHB#13-PE#10000.
      [ 9644.261811] EEH: Beginning: 'error_detected(permanent failure)'
      [...]
      [ 9644.261823] EEH: Finished:'error_detected(permanent failure)'
      
      Hence, it is important to inform the driver about the PCI error as early
      as possible, so that the driver is aware of the error and waits for the
      EEH handler's next action for a successful recovery.

      The current implementation uses the rtas_get_sensor() API, which blocks
      the slot state check until the RTAS call returns success. To avoid this,
      fix the PCI hotplug driver (rpaphp) to return an error (-EBUSY) if the
      slot presence state cannot be detected immediately while the PE is in EEH
      recovery state. Change rpaphp_get_sensor_state() to invoke
      rtas_call(get-sensor-state) directly only if the respective PE is in EEH
      recovery state, and take actions based on the RTAS return status. This
      way the EEH handler is not blocked in rpaphp_get_sensor_state() and can
      immediately notify the driver about the PCI error and stop any active
      operations.

      In normal (non-EEH) cases, rpaphp_get_sensor_state() continues to invoke
      rtas_get_sensor() as before, with no change in existing behavior.
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
      Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/169235815601.193557.13989873835811325343.stgit@jupiter
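
The behaviour change described in this commit can be sketched in userspace C. This is not the rpaphp code: `struct fake_fw`, `fake_get_sensor_state()`, `sensor_state_blocking()`, and `sensor_state()` are hypothetical names, and the firmware is modelled as returning the extended busy status (9902, as quoted in the commit message) a fixed number of times before succeeding. The sketch contrasts the retry-until-done path with the path that returns `-EBUSY` at once when the PE is in EEH recovery.

```c
#include <assert.h>
#include <errno.h>

/* Extended busy status, as quoted in the commit message. */
#define FW_EXTENDED_BUSY 9902

/* Hypothetical firmware model: reports busy for a number of polls,
 * then returns success and the slot presence state. */
struct fake_fw {
    int busy_polls_left;
    int presence;
    int calls;          /* how many times the firmware was invoked */
};

static int fake_get_sensor_state(struct fake_fw *fw, int *state)
{
    fw->calls++;
    if (fw->busy_polls_left > 0) {
        fw->busy_polls_left--;
        return FW_EXTENDED_BUSY;
    }
    *state = fw->presence;
    return 0;
}

/* Blocking variant: keeps retrying while the firmware reports busy.
 * (The real interface would also sleep between retries.) */
static int sensor_state_blocking(struct fake_fw *fw, int *state)
{
    int rc;

    do {
        rc = fake_get_sensor_state(fw, state);
    } while (rc == FW_EXTENDED_BUSY);

    return rc;
}

/* Variant sketched from the commit message: during EEH recovery, make a
 * single call and report -EBUSY instead of waiting; otherwise keep the
 * existing blocking behaviour. */
static int sensor_state(struct fake_fw *fw, int in_eeh_recovery, int *state)
{
    int rc;

    if (!in_eeh_recovery)
        return sensor_state_blocking(fw, state);

    rc = fake_get_sensor_state(fw, state);
    if (rc == FW_EXTENDED_BUSY)
        return -EBUSY;

    return rc;
}
```

In this model, a caller in EEH recovery gets `-EBUSY` after one firmware call and can go on to notify the driver, while a caller outside EEH recovery still waits for the final presence state.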