arm64: numa: rework ACPI NUMA initialization

Current ACPI ARM64 NUMA initialization code in acpi_numa_gicc_affinity_init() carries out NUMA nodes creation and cpu<->node mappings at the same time in the arch backend so that a single SRAT walk is needed to parse both pieces of information. This implies that the cpu<->node mappings must be stashed in an array (sized NR_CPUS) so that SMP code can later use the stashed values to avoid another SRAT table walk to set-up the early cpu<->node mappings. If the kernel is configured with a NR_CPUS value less than the actual processor entries in the SRAT (and MADT), the logic in acpi_numa_gicc_affinity_init() is broken in that the cpu<->node mapping is only carried out (and stashed for future use) only for a number of SRAT entries up to NR_CPUS, which do not necessarily correspond to the possible cpus detected at SMP initialization in acpi_map_gic_cpu_interface() (ie MADT and SRAT processor entries order is not enforced), which leaves the kernel with broken cpu<->node mappings. Furthermore, given the current ACPI NUMA code parsing logic in acpi_numa_gicc_affinity_init(), PXM domains for CPUs that are not parsed because they exceed NR_CPUS entries are not mapped to NUMA nodes (ie the PXM corresponding node is not created in the kernel) leaving the system with a broken NUMA topology. Rework the ACPI ARM64 NUMA initialization process so that the NUMA nodes creation and cpu<->node mappings are decoupled. cpu<->node mappings are moved to SMP initialization code (where they are needed), at the cost of an extra SRAT walk so that ACPI NUMA mappings can be batched before being applied, fixing current parsing pitfalls. Acked-by: Hanjun Guo <hanjun.guo@linaro.org> Tested-by: John Garry <john.garry@huawei.com> Fixes: d8b47fca ("arm64, ACPI, NUMA: NUMA support based on SRAT and SLIT") Link: http://lkml.kernel.org/r/1527768879-88161-2-git-send-email-xiexiuqi@huawei.comReported-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Punit Agrawal <punit.agrawal@arm.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com> Cc: Jeremy Linton <jeremy.linton@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>

arm64: numa: rework ACPI NUMA initialization
Current ACPI ARM64 NUMA initialization code in acpi_numa_gicc_affinity_init() carries out NUMA nodes creation and cpu<->node mappings at the same time in the arch backend so that a single SRAT walk is needed to parse both pieces of information. This implies that the cpu<->node mappings must be stashed in an array (sized NR_CPUS) so that SMP code can later use the stashed values to avoid another SRAT table walk to set-up the early cpu<->node mappings. If the kernel is configured with a NR_CPUS value less than the actual processor entries in the SRAT (and MADT), the logic in acpi_numa_gicc_affinity_init() is broken in that the cpu<->node mapping is only carried out (and stashed for future use) only for a number of SRAT entries up to NR_CPUS, which do not necessarily correspond to the possible cpus detected at SMP initialization in acpi_map_gic_cpu_interface() (ie MADT and SRAT processor entries order is not enforced), which leaves the kernel with broken cpu<->node mappings. Furthermore, given the current ACPI NUMA code parsing logic in acpi_numa_gicc_affinity_init(), PXM domains for CPUs that are not parsed because they exceed NR_CPUS entries are not mapped to NUMA nodes (ie the PXM corresponding node is not created in the kernel) leaving the system with a broken NUMA topology. Rework the ACPI ARM64 NUMA initialization process so that the NUMA nodes creation and cpu<->node mappings are decoupled. cpu<->node mappings are moved to SMP initialization code (where they are needed), at the cost of an extra SRAT walk so that ACPI NUMA mappings can be batched before being applied, fixing current parsing pitfalls. Acked-by: Hanjun Guo <hanjun.guo@linaro.org> Tested-by: John Garry <john.garry@huawei.com> Fixes: d8b47fca ("arm64, ACPI, NUMA: NUMA support based on SRAT and SLIT") Link: http://lkml.kernel.org/r/1527768879-88161-2-git-send-email-xiexiuqi@huawei.comReported-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Punit Agrawal <punit.agrawal@arm.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com> Cc: Jeremy Linton <jeremy.linton@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
e1896249 · Lorenzo Pieralisi · Will Deacon · e7d4bac4 · e1896249 · e1896249
Commit e1896249 authored Jun 25, 2018 by Lorenzo Pieralisi Committed by Will Deacon Jul 09, 2018
Showing with 85 additions and 48 deletions

arch/arm64/include/asm/acpi.h arch/arm64/include/asm/acpi.h +4 -2

arch/arm64/kernel/acpi_numa.c arch/arm64/kernel/acpi_numa.c +53 -35

arch/arm64/kernel/smp.c arch/arm64/kernel/smp.c +28 -11

No files found.
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -134,10 +134,12 @@ pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr);

 #ifdef CONFIG_ACPI_NUMA
 int arm64_acpi_numa_init(void);
-int acpi_numa_get_nid(unsigned int cpu, u64 hwid);
+int acpi_numa_get_nid(unsigned int cpu);
+void acpi_map_cpus_to_nodes(void);
 #else
 static inline int arm64_acpi_numa_init(void) { return -ENOSYS; }
-static inline int acpi_numa_get_nid(unsigned int cpu, u64 hwid) { return NUMA_NO_NODE; }
+static inline int acpi_numa_get_nid(unsigned int cpu) { return NUMA_NO_NODE; }
+static inline void acpi_map_cpus_to_nodes(void) { }
 #endif /* CONFIG_ACPI_NUMA */

 #define ACPI_TABLE_UPGRADE_MAX_PHYS MEMBLOCK_ALLOC_ACCESSIBLE

--- a/arch/arm64/kernel/acpi_numa.c
+++ b/arch/arm64/kernel/acpi_numa.c
@@ -26,36 +26,73 @@
 #include <linux/module.h>
 #include <linux/topology.h>

-#include <acpi/processor.h>
 #include <asm/numa.h>

-static int cpus_in_srat;
+static int acpi_early_node_map[NR_CPUS] __initdata = { NUMA_NO_NODE };

-struct __node_cpu_hwid {
-	u32 node_id;    /* logical node containing this CPU */
-	u64 cpu_hwid;   /* MPIDR for this CPU */
-};
+int __init acpi_numa_get_nid(unsigned int cpu)
+{
+	return acpi_early_node_map[cpu];
+}
+
+static inline int get_cpu_for_acpi_id(u32 uid)
+{
+	int cpu;
+
+	for (cpu = 0; cpu < nr_cpu_ids; cpu++)
+		if (uid == get_acpi_id_for_cpu(cpu))
+			return cpu;

-static struct __node_cpu_hwid early_node_cpu_hwid[NR_CPUS] = {
-[0 ... NR_CPUS - 1] = {NUMA_NO_NODE, PHYS_CPUID_INVALID} };
+	return -EINVAL;
+}

-int acpi_numa_get_nid(unsigned int cpu, u64 hwid)
+static int __init acpi_parse_gicc_pxm(struct acpi_subtable_header *header,
+				      const unsigned long end)
 {
-	int i;
+	struct acpi_srat_gicc_affinity *pa;
+	int cpu, pxm, node;

-	for (i = 0; i < cpus_in_srat; i++) {
-		if (hwid == early_node_cpu_hwid[i].cpu_hwid)
-			return early_node_cpu_hwid[i].node_id;
-	}
+	if (srat_disabled())
+		return -EINVAL;
+
+	pa = (struct acpi_srat_gicc_affinity *)header;
+	if (!pa)
+		return -EINVAL;
+
+	if (!(pa->flags & ACPI_SRAT_GICC_ENABLED))
+		return 0;

-	return NUMA_NO_NODE;
+	pxm = pa->proximity_domain;
+	node = pxm_to_node(pxm);
+
+	/*
+	 * If we can't map the UID to a logical cpu this
+	 * means that the UID is not part of possible cpus
+	 * so we do not need a NUMA mapping for it, skip
+	 * the SRAT entry and keep parsing.
+	 */
+	cpu = get_cpu_for_acpi_id(pa->acpi_processor_uid);
+	if (cpu < 0)
+		return 0;
+
+	acpi_early_node_map[cpu] = node;
+	pr_info("SRAT: PXM %d -> MPIDR 0x%llx -> Node %d\n", pxm,
+		cpu_logical_map(cpu), node);
+
+	return 0;
+}
+
+void __init acpi_map_cpus_to_nodes(void)
+{
+	acpi_table_parse_entries(ACPI_SIG_SRAT, sizeof(struct acpi_table_srat),
+					    ACPI_SRAT_TYPE_GICC_AFFINITY,
+					    acpi_parse_gicc_pxm, 0);
 }

 /* Callback for Proximity Domain -> ACPI processor UID mapping */
 void __init acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa)
 {
 	int pxm, node;
-	phys_cpuid_t mpidr;

 	if (srat_disabled())
 		return;
@@ -70,12 +107,6 @@ void __init acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa)
 	if (!(pa->flags & ACPI_SRAT_GICC_ENABLED))
 		return;

-	if (cpus_in_srat >= NR_CPUS) {
-		pr_warn_once("SRAT: cpu_to_node_map[%d] is too small, may not be able to use all cpus\n",
-			     NR_CPUS);
-		return;
-	}
-
 	pxm = pa->proximity_domain;
 	node = acpi_map_pxm_to_node(pxm);

@@ -85,20 +116,7 @@ void __init acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa)
 		return;
 	}

-	mpidr = acpi_map_madt_entry(pa->acpi_processor_uid);
-	if (mpidr == PHYS_CPUID_INVALID) {
-		pr_err("SRAT: PXM %d with ACPI ID %d has no valid MPIDR in MADT\n",
-			pxm, pa->acpi_processor_uid);
-		bad_srat();
-		return;
-	}
-
-	early_node_cpu_hwid[cpus_in_srat].node_id = node;
-	early_node_cpu_hwid[cpus_in_srat].cpu_hwid =  mpidr;
 	node_set(node, numa_nodes_parsed);
-	cpus_in_srat++;
-	pr_info("SRAT: PXM %d -> MPIDR 0x%Lx -> Node %d\n",
-		pxm, mpidr, node);
 }

 int __init arm64_acpi_numa_init(void)

--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -522,7 +522,6 @@ acpi_map_gic_cpu_interface(struct acpi_madt_generic_interrupt *processor)
 		}
 		bootcpu_valid = true;
 		cpu_madt_gicc[0] = *processor;
-		early_map_cpu_to_node(0, acpi_numa_get_nid(0, hwid));
 		return;
 	}

@@ -545,8 +544,6 @@ acpi_map_gic_cpu_interface(struct acpi_madt_generic_interrupt *processor)
 	 */
 	acpi_set_mailbox_entry(cpu_count, processor);

-	early_map_cpu_to_node(cpu_count, acpi_numa_get_nid(cpu_count, hwid));
-
 	cpu_count++;
 }

@@ -566,8 +563,34 @@ acpi_parse_gic_cpu_interface(struct acpi_subtable_header *header,

 	return 0;
 }
+
+static void __init acpi_parse_and_init_cpus(void)
+{
+	int i;
+
+	/*
+	 * do a walk of MADT to determine how many CPUs
+	 * we have including disabled CPUs, and get information
+	 * we need for SMP init.
+	 */
+	acpi_table_parse_madt(ACPI_MADT_TYPE_GENERIC_INTERRUPT,
+				      acpi_parse_gic_cpu_interface, 0);
+
+	/*
+	 * In ACPI, SMP and CPU NUMA information is provided in separate
+	 * static tables, namely the MADT and the SRAT.
+	 *
+	 * Thus, it is simpler to first create the cpu logical map through
+	 * an MADT walk and then map the logical cpus to their node ids
+	 * as separate steps.
+	 */
+	acpi_map_cpus_to_nodes();
+
+	for (i = 0; i < nr_cpu_ids; i++)
+		early_map_cpu_to_node(i, acpi_numa_get_nid(i));
+}
 #else
-#define acpi_table_parse_madt(...)	do { } while (0)
+#define acpi_parse_and_init_cpus(...)	do { } while (0)
 #endif

 /*
@@ -640,13 +663,7 @@ void __init smp_init_cpus(void)
 	if (acpi_disabled)
 		of_parse_and_init_cpus();
 	else
-		/*
-		 * do a walk of MADT to determine how many CPUs
-		 * we have including disabled CPUs, and get information
-		 * we need for SMP init
-		 */
-		acpi_table_parse_madt(ACPI_MADT_TYPE_GENERIC_INTERRUPT,
-				      acpi_parse_gic_cpu_interface, 0);
+		acpi_parse_and_init_cpus();

 	if (cpu_count > nr_cpu_ids)
 		pr_warn("Number of cores (%d) exceeds configured maximum of %u - clipping\n",