1. 07 Jan, 2011 1 commit
    • x86, numa: Fix CONFIG_DEBUG_PER_CPU_MAPS without NUMA emulation · d906f0eb
      David Rientjes authored
      "x86, numa: Fake node-to-cpumask for NUMA emulation" broke the
      build when CONFIG_DEBUG_PER_CPU_MAPS is set and CONFIG_NUMA_EMU
      is not.  This is because a cpu may map to multiple nodes when NUMA
      emulation is used; the patch relied on a physical-node address table
      to find those nodes, and that table is only available when
      CONFIG_NUMA_EMU is enabled.
      
      This extracts the common debug functionality to its own function
      for CONFIG_DEBUG_PER_CPU_MAPS and uses it regardless of whether
      CONFIG_NUMA_EMU is set or not.
      
      With NUMA emulation, the code now iterates over the set of possible
      nodes for each cpu and calls the new debug function for each of them;
      without emulation, only the cpu's own node is used.  A minimal model
      of this split is sketched below.
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <alpine.DEB.2.00.1012301053590.12995@chino.kir.corp.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
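      A minimal user-space model of that split, assuming a made-up
      debug_cpumask_set_cpu() helper and emu_to_phys[] table rather than the
      kernel's actual code:

          /* build: cc -DCONFIG_NUMA_EMU demo.c   (drop the -D for the plain path) */
          #include <stdbool.h>
          #include <stdio.h>

          #define MAX_NUMNODES 8

          /* Shared debug path, built whether or not emulation is configured. */
          static void debug_cpumask_set_cpu(int cpu, int node, bool enable)
          {
              printf("cpu %d %s node %d cpumask\n",
                     cpu, enable ? "added to" : "removed from", node);
          }

          #ifdef CONFIG_NUMA_EMU
          /* Emulation: one cpu may belong to several emulated nodes, found via
             a physical-node table (made up here). */
          static int emu_to_phys[MAX_NUMNODES] = { 0, 0, 1, 1, -1, -1, -1, -1 };

          static void numa_set_cpu(int cpu, int physnode, bool enable)
          {
              for (int nid = 0; nid < MAX_NUMNODES; nid++)
                  if (emu_to_phys[nid] == physnode)
                      debug_cpumask_set_cpu(cpu, nid, enable);
          }
          #else
          /* No emulation: only the cpu's own node is touched. */
          static void numa_set_cpu(int cpu, int node, bool enable)
          {
              debug_cpumask_set_cpu(cpu, node, enable);
          }
          #endif

          int main(void)
          {
              numa_set_cpu(0, 0, true);   /* cpu 0 has affinity to physical node 0 */
              return 0;
          }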
  2. 29 Dec, 2010 1 commit
  3. 23 Dec, 2010 5 commits
    • x86, numa: Fix cpu to node mapping for sparse node ids · a387e95a
      David Rientjes authored
      NUMA boot code assumes that physical node ids start at 0, but the DIMMs
      that the apic id represents may not be reachable.  If this is the case,
      node 0 is never online and cpus never end up getting appropriately
      assigned to a node.  This causes the cpumask of all online nodes to be
      empty and machines crash with kernel code assuming online nodes have
      valid cpus.
      
      The fix is to appropriately map all the address ranges for physical nodes
      and ensure the cpu to node mapping function checks all possible nodes (up
      to MAX_NUMNODES) instead of simply checking nodes 0-N, where N is the
      number of physical nodes, for valid address ranges.
      
      This means the address ranges of nodes in the physical node map are no
      longer "compressed" into indices 0-N; instead, the index into
      physnodes[] is the actual node id of the physical node.  Accordingly,
      amd_get_nodes() and acpi_get_nodes() no longer need to return the
      number of nodes to iterate through; all such iterations now run to
      MAX_NUMNODES, as sketched below.
      
      This change also passes the end address of system RAM (which may differ
      from normal operation if mem= is specified on the command line) before
      the physnodes[] array is populated.  ACPI-parsed nodes are truncated to
      fit within the address range that respects the mem= boundary, and some
      physical nodes may become unreachable entirely in such cases.
      
      When NUMA emulation does succeed, any apicid-to-node mappings that
      exist for unreachable nodes are given default values so that proximity
      domains can still be assigned.  This is important for node_distance()
      to function as expected.
      Signed-off-by: David Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1012221702090.3701@chino.kir.corp.google.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
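      A small user-space model of the sparse lookup described above; the
      bootnode layout, node ids and addresses are illustrative assumptions,
      not taken from the patch:

          #include <stdio.h>

          #define MAX_NUMNODES 8

          struct bootnode {                 /* simplified boot-time range */
              unsigned long long start, end;
          };

          /* Indexed by the real, possibly sparse, node id; only 2 and 5 exist. */
          static struct bootnode physnodes[MAX_NUMNODES] = {
              [2] = { 0x000000000ULL, 0x100000000ULL },
              [5] = { 0x100000000ULL, 0x200000000ULL },
          };

          /* Return the node id owning addr, scanning every possible slot
             (0..MAX_NUMNODES-1) and skipping the holes, or -1 if none. */
          static int addr_to_node(unsigned long long addr)
          {
              for (int nid = 0; nid < MAX_NUMNODES; nid++) {
                  if (physnodes[nid].start == physnodes[nid].end)
                      continue;             /* unpopulated node id */
                  if (addr >= physnodes[nid].start && addr < physnodes[nid].end)
                      return nid;
              }
              return -1;
          }

          int main(void)
          {
              printf("0x180000000 -> node %d\n", addr_to_node(0x180000000ULL));
              return 0;                     /* prints node 5 */
          }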
    • x86, numa: Fake node-to-cpumask for NUMA emulation · c1c3443c
      David Rientjes authored
      It's necessary to fake the node-to-cpumask mapping so that an emulated
      node ID returns a cpumask that includes all cpus that have affinity to
      the memory it represents.
      
      This is a little intrusive because it requires knowledge of the physical
      topology of the system.  setup_physnodes() gives us that information, but
      since NUMA emulation ends up altering the physnodes array, it's necessary
      to reset it before cpus are brought online.
      
      Accordingly, the physnodes array is moved out of init.data and into
      cpuinit.data since it will be needed on cpuup callbacks.
      
      This works regardless of whether numa=fake is used on the command line,
      or the setup of the fake node succeeds or fails.  The physnodes array
      always contains the physical topology of the machine if CONFIG_NUMA_EMU
      is enabled and can be used to setup the correct node-to-cpumask mappings
      in all cases since setup_physnodes() is called whenever the array needs
      to be repopulated with the correct data.
      
      To fake the actual mappings, numa_add_cpu() and numa_remove_cpu() are
      rewritten for CONFIG_NUMA_EMU: first find the physical node to which
      the cpu has local affinity, then iterate through all online nodes to
      find the emulated nodes that have local affinity to that physical
      node, and finally map the cpu to each of those emulated nodes.  A
      compact user-space model of this walk is sketched below.
      Signed-off-by: David Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1012221701520.3701@chino.kir.corp.google.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
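      A compact user-space model of that three-step walk, with made-up
      affinity tables and a plain bitmask standing in for the per-node
      cpumasks:

          #include <stdio.h>

          #define MAX_NUMNODES 8
          #define NR_CPUS      4

          /* Made-up affinity tables: which physical node each cpu is local to,
             and which physical node each emulated node was carved from. */
          static int cpu_to_physnode[NR_CPUS]  = { 0, 0, 1, 1 };
          static int emu_to_phys[MAX_NUMNODES] = { 0, 0, 1, 1, -1, -1, -1, -1 };

          /* Stand-in for the per-node cpumasks: one bit per cpu. */
          static unsigned int node_cpu_mask[MAX_NUMNODES];

          static void numa_add_cpu(int cpu)
          {
              int physnid = cpu_to_physnode[cpu];          /* step 1: physical node */

              for (int nid = 0; nid < MAX_NUMNODES; nid++) /* step 2: emulated nodes */
                  if (emu_to_phys[nid] == physnid)
                      node_cpu_mask[nid] |= 1u << cpu;     /* step 3: map the cpu */
          }

          int main(void)
          {
              for (int cpu = 0; cpu < NR_CPUS; cpu++)
                  numa_add_cpu(cpu);
              for (int nid = 0; nid < MAX_NUMNODES; nid++)
                  if (emu_to_phys[nid] >= 0)
                      printf("emulated node %d: cpu mask 0x%x\n",
                             nid, node_cpu_mask[nid]);
              return 0;
          }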
    • x86, numa: Fake apicid and pxm mappings for NUMA emulation · f51bf307
      David Rientjes authored
      This patch adds the equivalent of acpi_fake_nodes() for AMD Northbridge
      platforms.  The goal is to fake the apicid-to-node mappings for NUMA
      emulation so the physical topology of the machine is correctly maintained
      within the kernel.
      
      This change also fakes proximity domains for both ACPI and k8 code so the
      physical distance between emulated nodes is maintained via
      node_distance().  This exports the correct distances via
      /sys/devices/system/node/.../distance based on the underlying topology.
      
      A new helper function, fake_physnodes(), is introduced to invoke the
      appropriate NUMA code to fake these two mappings based on the system
      type.  If there is no underlying NUMA configuration, all cpus are
      mapped to node 0 for local distance.  A stub of this dispatch is
      sketched below.
      
      Since acpi_fake_nodes() is no longer called with CONFIG_ACPI_NUMA, its
      prototype can be removed from the header file for such a
      configuration.
      Signed-off-by: David Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1012221701360.3701@chino.kir.corp.google.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
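      A stub of that dispatch, with the source of NUMA information reduced to
      an enum and the signatures and bodies simplified to prints; none of
      this mirrors the real helpers exactly:

          #include <stdio.h>

          /* The machine's source of NUMA information, reduced to an enum. */
          enum numa_src { NUMA_NONE, NUMA_ACPI, NUMA_AMD };

          static void acpi_fake_nodes(int nr_nodes)
          {
              printf("faking ACPI pxm/apicid mappings for %d nodes\n", nr_nodes);
          }

          static void amd_fake_nodes(int nr_nodes)
          {
              printf("faking AMD northbridge apicid mappings for %d nodes\n", nr_nodes);
          }

          static void fake_physnodes(enum numa_src src, int nr_nodes)
          {
              switch (src) {
              case NUMA_ACPI:
                  acpi_fake_nodes(nr_nodes);   /* keep SRAT-derived distances */
                  break;
              case NUMA_AMD:
                  amd_fake_nodes(nr_nodes);    /* keep northbridge topology */
                  break;
              default:
                  /* No underlying NUMA: all cpus on node 0, local distance. */
                  printf("mapping all cpus to node 0\n");
                  break;
              }
          }

          int main(void)
          {
              fake_physnodes(NUMA_ACPI, 4);
              return 0;
          }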
    • x86, numa: Avoid compiling NUMA emulation functions without CONFIG_NUMA_EMU · 4e76f4e6
      David Rientjes authored
      Both acpi_get_nodes() and amd_get_nodes() are only needed when
      CONFIG_NUMA_EMU is enabled, so they are no longer compiled when the
      option is disabled.  The guard pattern is sketched below.
      Signed-off-by: David Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1012221701210.3701@chino.kir.corp.google.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
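      The guard pattern as a self-contained translation unit; the struct and
      prototypes are simplified stand-ins for the real x86 declarations:

          /* Simplified stand-ins for the real struct and prototypes. */
          struct bootnode { unsigned long long start, end; };

          #ifdef CONFIG_NUMA_EMU
          /* Declared and compiled only when the emulation code that calls
             them exists; otherwise this whole block disappears. */
          int acpi_get_nodes(struct bootnode *physnodes);
          int amd_get_nodes(struct bootnode *physnodes);

          int acpi_get_nodes(struct bootnode *physnodes) { (void)physnodes; return 0; }
          int amd_get_nodes(struct bootnode *physnodes)  { (void)physnodes; return 0; }
          #endif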
    • x86, numa: Reduce minimum fake node size to 32M · 34dc9e74
      David Rientjes authored
      This patch changes the minimum fake node size from 64MB to 32MB so it
      is possible to test NUMA code at a larger scale on smaller machines
      (64 nodes on a 2G machine, 1024 nodes on a 32G machine with
      CONFIG_NODES_SHIFT=10).  The arithmetic is checked in the snippet
      below.
      Signed-off-by: David Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1012221700590.3701@chino.kir.corp.google.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
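      A quick check of those node counts, assuming the minimum node size is
      the only limit (CONFIG_NODES_SHIFT and memory holes are ignored):

          #include <stdio.h>

          #define FAKE_NODE_MIN_SIZE_MB 32ULL      /* was 64 before this change */

          int main(void)
          {
              unsigned long long ram_mb[] = { 2ULL << 10, 32ULL << 10 };  /* 2G, 32G */

              for (int i = 0; i < 2; i++)
                  printf("%llu MiB of RAM -> up to %llu fake nodes\n",
                         ram_mb[i], ram_mb[i] / FAKE_NODE_MIN_SIZE_MB);
              return 0;     /* prints 64 and 1024, matching the message above */
          }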
  4. 10 Dec, 2010 1 commit
    • x86: apic: Cleanup and simplify setup_local_APIC() · 0aa002fe
      Tejun Heo authored
      setup_local_APIC() is used to set up the local APIC early during CPU
      initialization and already assumes that preemption is disabled on
      entry.  However, the function unnecessarily disables and re-enables
      preemption and uses smp_processor_id() multiple times inside and
      outside the nested preemption-disabled section.  This gives the wrong
      impression that the function might be able to handle being called with
      preemption enabled and/or being migrated to another processor in the
      middle.
      
      Make it clear that the function is always called with preemption
      disabled, drop the confusing preemption-disable block and call
      smp_processor_id() once at the beginning of the function.  The
      resulting shape is sketched below.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
      Reviewed-by: Pekka Enberg <penberg@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: brgerst@gmail.com
      LKML-Reference: <4D00B3B9.7060702@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
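      A user-space sketch of that resulting shape; the stub
      smp_processor_id() and the prints stand in for the real APIC
      programming, which is not reproduced here:

          #include <stdio.h>

          /* Stand-in for the kernel helper; always cpu 0 in this model. */
          static int smp_processor_id(void) { return 0; }

          static void setup_local_APIC(void)
          {
              /* Called early in cpu bringup with preemption already disabled,
                 so the cpu id is read once and reused throughout. */
              int cpu = smp_processor_id();

              printf("programming LVT entries for cpu %d\n", cpu);
              printf("clearing stale ISR/IRR state for cpu %d\n", cpu);
          }

          int main(void)
          {
              setup_local_APIC();
              return 0;
          }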
  5. 09 Dec, 2010 10 commits
  6. 06 Dec, 2010 4 commits
  7. 20 Nov, 2010 1 commit
  8. 18 Nov, 2010 5 commits
  9. 17 Nov, 2010 3 commits
  10. 16 Nov, 2010 1 commit
  11. 15 Nov, 2010 8 commits