1. 26 Oct, 2023 3 commits
    • Merge branch 'for-next/feat_sve_b16b16' into for-next/core · 2a3f8ce3
      Catalin Marinas authored
      * for-next/feat_sve_b16b16:
        : Add support for FEAT_SVE_B16B16 (BFloat16)
        kselftest/arm64: Verify HWCAP2_SVE_B16B16
        arm64/sve: Report FEAT_SVE_B16B16 to userspace
    • Merge branches 'for-next/sve-remove-pseudo-regs', 'for-next/backtrace-ipi',... · 1519018c
      Catalin Marinas authored
      Merge branches 'for-next/sve-remove-pseudo-regs', 'for-next/backtrace-ipi', 'for-next/kselftest', 'for-next/misc' and 'for-next/cpufeat-display-cores', remote-tracking branch 'arm64/for-next/perf' into for-next/core
      
      * arm64/for-next/perf:
        perf: hisi: Fix use-after-free when register pmu fails
        drivers/perf: hisi_pcie: Initialize event->cpu only on success
        drivers/perf: hisi_pcie: Check the type first in pmu::event_init()
        perf/arm-cmn: Enable per-DTC counter allocation
        perf/arm-cmn: Rework DTC counters (again)
        perf/arm-cmn: Fix DTC domain detection
        drivers: perf: arm_pmuv3: Drop some unused arguments from armv8_pmu_init()
        drivers: perf: arm_pmuv3: Read PMMIR_EL1 unconditionally
        drivers/perf: hisi: use cpuhp_state_remove_instance_nocalls() for hisi_hns3_pmu uninit process
        drivers/perf: xgene: Use device_get_match_data()
        perf/amlogic: add missing MODULE_DEVICE_TABLE
        docs/perf: Add ampere_cspmu to toctree to fix a build warning
        perf: arm_cspmu: ampere_cspmu: Add support for Ampere SoC PMU
        perf: arm_cspmu: Support implementation specific validation
        perf: arm_cspmu: Support implementation specific filters
        perf: arm_cspmu: Split 64-bit write to 32-bit writes
        perf: arm_cspmu: Separate Arm and vendor module
      
      * for-next/sve-remove-pseudo-regs:
        : arm64/fpsimd: Remove the vector length pseudo registers
        arm64/sve: Remove SMCR pseudo register from cpufeature code
        arm64/sve: Remove ZCR pseudo register from cpufeature code
      
      * for-next/backtrace-ipi:
        : Add IPI for backtraces/kgdb, use NMI
        arm64: smp: Don't directly call arch_smp_send_reschedule() for wakeup
        arm64: smp: avoid NMI IPIs with broken MediaTek FW
        arm64: smp: Mark IPI globals as __ro_after_init
        arm64: kgdb: Implement kgdb_roundup_cpus() to enable pseudo-NMI roundup
        arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP should try for NMI
        arm64: smp: Add arch support for backtrace using pseudo-NMI
        arm64: smp: Remove dedicated wakeup IPI
        arm64: idle: Tag the arm64 idle functions as __cpuidle
        irqchip/gic-v3: Enable support for SGIs to act as NMIs
      
      * for-next/kselftest:
        : Various arm64 kselftest updates
        kselftest/arm64: Validate SVCR in streaming SVE stress test
      
      * for-next/misc:
        : Miscellaneous patches
        arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newer
        arm64: module: Fix PLT counting when CONFIG_RANDOMIZE_BASE=n
        arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper
        clocksource/drivers/arm_arch_timer: limit XGene-1 workaround
        arm64: Remove system_uses_lse_atomics()
        arm64: Mark the 'addr' argument to set_ptes() and __set_pte_at() as unused
        arm64/mm: Hoist synchronization out of set_ptes() loop
        arm64: swiotlb: Reduce the default size if no ZONE_DMA bouncing needed
      
      * for-next/cpufeat-display-cores:
        : arm64 cpufeature display enabled cores
        arm64: cpufeature: Change DBM to display enabled cores
        arm64: cpufeature: Display the set of cores with a feature
    • arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newer · 146a15b8
      Nathan Chancellor authored
      Prior to LLVM 15.0.0, LLVM's integrated assembler would incorrectly
      byte-swap NOP when compiling for big-endian, and the resulting series of
      bytes happened to match the encoding of FNMADD S21, S30, S0, S0.
      
      This went unnoticed until commit:
      
        34f66c4c ("arm64: Use a positive cpucap for FP/SIMD")
      
      Prior to that commit, the kernel would always enable the use of FPSIMD
      early in boot when __cpu_setup() initialized CPACR_EL1, and so usage of
      FNMADD within the kernel was not detected, but could result in the
      corruption of user or kernel FPSIMD state.
      
      After that commit, the instructions happen to trap during boot prior to
      FPSIMD being detected and enabled, e.g.
      
      | Unhandled 64-bit el1h sync exception on CPU0, ESR 0x000000001fe00000 -- ASIMD
      | CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc3-00013-g34f66c4c #1
      | Hardware name: linux,dummy-virt (DT)
      | pstate: 400000c9 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      | pc : __pi_strcmp+0x1c/0x150
      | lr : populate_properties+0xe4/0x254
      | sp : ffffd014173d3ad0
      | x29: ffffd014173d3af0 x28: fffffbfffddffcb8 x27: 0000000000000000
      | x26: 0000000000000058 x25: fffffbfffddfe054 x24: 0000000000000008
      | x23: fffffbfffddfe000 x22: fffffbfffddfe000 x21: fffffbfffddfe044
      | x20: ffffd014173d3b70 x19: 0000000000000001 x18: 0000000000000005
      | x17: 0000000000000010 x16: 0000000000000000 x15: 00000000413e7000
      | x14: 0000000000000000 x13: 0000000000001bcc x12: 0000000000000000
      | x11: 00000000d00dfeed x10: ffffd414193f2cd0 x9 : 0000000000000000
      | x8 : 0101010101010101 x7 : ffffffffffffffc0 x6 : 0000000000000000
      | x5 : 0000000000000000 x4 : 0101010101010101 x3 : 000000000000002a
      | x2 : 0000000000000001 x1 : ffffd014171f2988 x0 : fffffbfffddffcb8
      | Kernel panic - not syncing: Unhandled exception
      | CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc3-00013-g34f66c4c #1
      | Hardware name: linux,dummy-virt (DT)
      | Call trace:
      |  dump_backtrace+0xec/0x108
      |  show_stack+0x18/0x2c
      |  dump_stack_lvl+0x50/0x68
      |  dump_stack+0x18/0x24
      |  panic+0x13c/0x340
      |  el1t_64_irq_handler+0x0/0x1c
      |  el1_abort+0x0/0x5c
      |  el1h_64_sync+0x64/0x68
      |  __pi_strcmp+0x1c/0x150
      |  unflatten_dt_nodes+0x1e8/0x2d8
      |  __unflatten_device_tree+0x5c/0x15c
      |  unflatten_device_tree+0x38/0x50
      |  setup_arch+0x164/0x1e0
      |  start_kernel+0x64/0x38c
      |  __primary_switched+0xbc/0xc4
      
      Restrict CONFIG_CPU_BIG_ENDIAN to a known good assembler, which is
      either GNU as or LLVM's IAS 15.0.0 and newer, which contains the linked
      commit.
      
      Closes: https://github.com/ClangBuiltLinux/linux/issues/1948
      Link: https://github.com/llvm/llvm-project/commit/1379b150991f70a5782e9a143c2ba5308da1161c
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Cc: stable@vger.kernel.org
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Link: https://lore.kernel.org/r/20231025-disable-arm64-be-ias-b4-llvm-15-v1-1-b25263ed8b23@kernel.org
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
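The byte-swap described in the commit message above can be checked by hand. The following is a minimal sketch, not kernel or LLVM code: it assumes the standard A64 NOP encoding (0xD503201F) and the field layout of the A64 "floating-point data-processing (3 source)" group, and the helper names (`byte_swap32`, `decode_fp_3src`, `esr_exception_class`) are hypothetical. It also extracts the exception class from the ESR value in the quoted log, which is 0x07, the class used for trapped FP/Advanced SIMD accesses.

```python
import struct

NOP = 0xD503201F  # A64 encoding of NOP


def byte_swap32(word):
    """Reverse the byte order of a 32-bit word."""
    return struct.unpack("<I", struct.pack(">I", word))[0]


def decode_fp_3src(word):
    """Decode a word as an A64 FP 3-source instruction, or return None.

    Assumed layout: bits 31-24 = 0x1F, type = bits 23-22, o1 = bit 21,
    Rm = bits 20-16, o0 = bit 15, Ra = bits 14-10, Rn = bits 9-5, Rd = bits 4-0.
    """
    if (word >> 24) & 0xFF != 0x1F:
        return None
    prefix = {0b00: "S", 0b01: "D", 0b11: "H"}.get((word >> 22) & 0x3)
    if prefix is None:
        return None
    o1 = (word >> 21) & 1
    o0 = (word >> 15) & 1
    mnemonic = {(0, 0): "FMADD", (0, 1): "FMSUB",
                (1, 0): "FNMADD", (1, 1): "FNMSUB"}[(o1, o0)]
    rm = (word >> 16) & 0x1F
    ra = (word >> 10) & 0x1F
    rn = (word >> 5) & 0x1F
    rd = word & 0x1F
    return f"{mnemonic} {prefix}{rd}, {prefix}{rn}, {prefix}{rm}, {prefix}{ra}"


def esr_exception_class(esr):
    """Return the EC field (bits 31-26) of an ESR_ELx value."""
    return (esr >> 26) & 0x3F


swapped = byte_swap32(NOP)
print(hex(swapped))                            # 0x1f2003d5
print(decode_fp_3src(swapped))                 # FNMADD S21, S30, S0, S0
print(hex(esr_exception_class(0x1FE00000)))    # 0x7
```

The correctly encoded NOP does not decode as an FP instruction (`decode_fp_3src(NOP)` returns `None`); only the byte-swapped word does, which is why the fault shows up as an ASIMD trap.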
  2. 24 Oct, 2023 5 commits
  3. 23 Oct, 2023 5 commits
  4. 19 Oct, 2023 3 commits
  5. 18 Oct, 2023 3 commits
  6. 17 Oct, 2023 2 commits
  7. 16 Oct, 2023 1 commit
    • arm64/mm: Hoist synchronization out of set_ptes() loop · 3425cec4
      Ryan Roberts authored
      set_ptes() sets a physically contiguous block of memory (which all
      belongs to the same folio) to a contiguous block of ptes. The arm64
      implementation of this previously just looped, operating on each
      individual pte. But the __sync_icache_dcache() and mte_sync_tags()
      operations can both be hoisted out of the loop so that they are
      performed once for the contiguous set of pages (which may be less than
      the whole folio). This should result in minor performance gains.
      
      __sync_icache_dcache() already acts on the whole folio, and sets a flag
      in the folio so that it skips duplicate calls. But by hoisting the call,
      all the pte testing is done only once.
      
      mte_sync_tags() operates on each individual page with its own loop. But
      by passing the number of pages explicitly, we can rely solely on its
      loop and do the checks only once. This approach is also more robust for
      the future: rather than assuming that the whole compound page is being
      mapped whenever its head page is being mapped, we explicitly know how
      many pages are being mapped. The old assumption may not continue to
      hold once the "anonymous large folios" feature is merged.

      Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
      Reviewed-by: Steven Price <steven.price@arm.com>
      Link: https://lore.kernel.org/r/20231005140730.2191134-1-ryan.roberts@arm.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
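The hoisting described in the set_ptes() commit message above can be illustrated with a toy model. This is only a sketch of the loop restructuring, not the arm64 implementation: `Stats`, `sync_icache`, `sync_tags`, `set_ptes_looped` and `set_ptes_hoisted` are hypothetical stand-ins, a pte is modelled as a plain page frame number, and the old compound-page behaviour of mte_sync_tags() is deliberately simplified to one page per call.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    pte_checks: int = 0    # times a pte is inspected to decide whether to sync
    icache_calls: int = 0  # calls to the stand-in for __sync_icache_dcache()
    mte_calls: int = 0     # calls to the stand-in for mte_sync_tags()
    pages_tagged: int = 0  # pages visited inside the stand-in's own loop


def sync_icache(stats):
    stats.icache_calls += 1


def sync_tags(stats, nr_pages):
    # The stand-in keeps its own per-page loop, as the commit message describes.
    stats.mte_calls += 1
    for _ in range(nr_pages):
        stats.pages_tagged += 1


def set_ptes_looped(first_pfn, nr, stats):
    """Old shape: checks and sync calls repeated for every pte."""
    table = []
    for i in range(nr):
        stats.pte_checks += 1
        sync_icache(stats)
        sync_tags(stats, 1)
        table.append(first_pfn + i)
    return table


def set_ptes_hoisted(first_pfn, nr, stats):
    """New shape: check and sync once, then write all ptes."""
    stats.pte_checks += 1
    sync_icache(stats)
    sync_tags(stats, nr)  # page count passed explicitly
    return [first_pfn + i for i in range(nr)]


looped, hoisted = Stats(), Stats()
assert set_ptes_looped(0x1000, 16, looped) == set_ptes_hoisted(0x1000, 16, hoisted)
print(looped)
print(hoisted)
```

Both variants produce the same table and visit the same number of pages for tagging; only the number of checks and calls differs, which is the source of the minor gain the commit message expects.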
  8. 13 Oct, 2023 1 commit
  9. 12 Oct, 2023 1 commit
  10. 10 Oct, 2023 1 commit
  11. 06 Oct, 2023 2 commits
  12. 05 Oct, 2023 4 commits
  13. 29 Sep, 2023 2 commits
  14. 25 Sep, 2023 7 commits