1. 02 May, 2007 40 commits
    • Fernando Luis [** ISO-8859-1 charset **] VzquezCao's avatar
      [PATCH] i386: Use safe_apic_wait_icr_idle in safe_apic_wait_icr_idle - i386 · f5efb41e
      Use safe_apic_wait_icr_idle to check ICR idle bit if the vector is
      NMI_VECTOR to avoid potential hangups in the event of crash when kdump
      tries to stop the other CPUs.
      Signed-off-by: default avatarFernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      f5efb41e
    • Fernando Luis [** ISO-8859-1 charset **] VzquezCao's avatar
      [PATCH] x86-64: __send_IPI_dest_field - x86_64 · 9062d888
      Implement __send_IPI_dest_field which can be used to send IPIs when the
      "destination shorthand" field of the ICR is set to 00 (destination
      field). Use it whenever possible.
      Signed-off-by: default avatarFernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      9062d888
    • Fernando Luis [** ISO-8859-1 charset **] VzquezCao's avatar
      [PATCH] i386: __send_IPI_dest_field - i386 · 45ae5e96
      Implement __send_IPI_dest_field which can be used to send IPIs when the
      "destination shorthand" field of the ICR is set to 00 (destination
      field). Use it whenever possible.
      Signed-off-by: default avatarFernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      45ae5e96
    • Fernando Luis VazquezCao's avatar
      [PATCH] x86-64: use safe_apic_wait_icr_idle in smpboot.c - x86_64 · 3144c332
      Fernando Luis VazquezCao authored
      inquire_remote_apic is used for APIC debugging, so use
      safe_apic_wait_icr_idle  instead of apic_wait_icr_idle to avoid possible
      lockups when APIC delivery fails.
      Signed-off-by: default avatarFernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      3144c332
    • Fernando Luis VazquezCao's avatar
      [PATCH] i386: use safe_apic_wait_icr_idle in smpboot.c · 4312fa81
      Fernando Luis VazquezCao authored
      __inquire_remote_apic is used for APIC debugging, so use
      safe_apic_wait_icr_idle  instead of apic_wait_icr_idle to avoid possible
      lockups when APIC delivery fails.
      Signed-off-by: default avatarFernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      4312fa81
    • Fernando Luis VazquezCao's avatar
      [PATCH] x86-64: use safe_apic_wait_icr_idle in smpboot.c - x86_64 · ea8c733b
      Fernando Luis VazquezCao authored
      The functionality provided by the new safe_apic_wait_icr_idle is being
      open-coded all over "kernel/smpboot.c". Use safe_apic_wait_icr_idle
      instead to consolidate code and ease maintenance.
      Signed-off-by: default avatarFernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      ea8c733b
    • Fernando Luis VazquezCao's avatar
      [PATCH] i386: use safe_apic_wait_icr_idle - i386 · ae08e43e
      Fernando Luis VazquezCao authored
      The functionality provided by the new safe_apic_wait_icr_idle is being
      open-coded all over "kernel/smpboot.c". Use safe_apic_wait_icr_idle
      instead to consolidate code and ease maintenance.
      Signed-off-by: default avatarFernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      ae08e43e
    • Fernando Luis VazquezCao's avatar
      [PATCH] x86-64: safe_apic_wait_icr_idle - x86_64 · 8339e9fb
      Fernando Luis VazquezCao authored
      apic_wait_icr_idle looks like this:
      
      static __inline__ void apic_wait_icr_idle(void)
      {
        while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
          cpu_relax();
      }
      
      The busy loop in this function would not be problematic if the
      corresponding status bit in the ICR were always updated, but that does
      not seem to be the case under certain crash scenarios. Kdump uses an IPI
      to stop the other CPUs in the event of a crash, but when any of the
      other CPUs are locked-up inside the NMI handler the CPU that sends the
      IPI will end up looping forever in the ICR check, effectively
      hard-locking the whole system.
      
      Quoting from Intel's "MultiProcessor Specification" (Version 1.4), B-3:
      
      "A local APIC unit indicates successful dispatch of an IPI by
      resetting the Delivery Status bit in the Interrupt Command
      Register (ICR). The operating system polls the delivery status
      bit after sending an INIT or STARTUP IPI until the command has
      been dispatched.
      
      A period of 20 microseconds should be sufficient for IPI dispatch
      to complete under normal operating conditions. If the IPI is not
      successfully dispatched, the operating system can abort the
      command. Alternatively, the operating system can retry the IPI by
      writing the lower 32-bit double word of the ICR. This “time-out”
      mechanism can be implemented through an external interrupt, if
      interrupts are enabled on the processor, or through execution of
      an instruction or time-stamp counter spin loop."
      
      Intel's documentation suggests the implementation of a time-out
      mechanism, which, by the way, is already being open-coded in some parts
      of the kernel that tinker with ICR.
      
      Create a apic_wait_icr_idle replacement that implements the time-out
      mechanism and that can be used to solve the aforementioned problem.
      
      AK: moved both functions out of line
      AK: Added improved loop from Keith Owens
      Signed-off-by: default avatarFernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      8339e9fb
    • Fernando Luis VazquezCao's avatar
      [PATCH] i386: safe_apic_wait_icr_idle - i386 · f2b218dd
      Fernando Luis VazquezCao authored
      apic_wait_icr_idle looks like this:
      
      static __inline__ void apic_wait_icr_idle(void)
      {
        while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
          cpu_relax();
      }
      
      The busy loop in this function would not be problematic if the
      corresponding status bit in the ICR were always updated, but that does
      not seem to be the case under certain crash scenarios. Kdump uses an IPI
      to stop the other CPUs in the event of a crash, but when any of the
      other CPUs are locked-up inside the NMI handler the CPU that sends the
      IPI will end up looping forever in the ICR check, effectively
      hard-locking the whole system.
      
      Quoting from Intel's "MultiProcessor Specification" (Version 1.4), B-3:
      
      "A local APIC unit indicates successful dispatch of an IPI by
      resetting the Delivery Status bit in the Interrupt Command
      Register (ICR). The operating system polls the delivery status
      bit after sending an INIT or STARTUP IPI until the command has
      been dispatched.
      
      A period of 20 microseconds should be sufficient for IPI dispatch
      to complete under normal operating conditions. If the IPI is not
      successfully dispatched, the operating system can abort the
      command. Alternatively, the operating system can retry the IPI by
      writing the lower 32-bit double word of the ICR. This “time-out”
      mechanism can be implemented through an external interrupt, if
      interrupts are enabled on the processor, or through execution of
      an instruction or time-stamp counter spin loop."
      
      Intel's documentation suggests the implementation of a time-out
      mechanism, which, by the way, is already being open-coded in some parts
      of the kernel that tinker with ICR.
      
      Create a apic_wait_icr_idle replacement that implements the time-out
      mechanism and that can be used to solve the aforementioned problem.
      
      AK: moved both functions out of line
      AK: added improved loop from Keith Owens
      Signed-off-by: default avatarFernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      f2b218dd
    • Bernhard Kaindl's avatar
      [PATCH] i386: Enable support for fixed-range IORRs to keep RdMem & WrMem in sync · de938c51
      Bernhard Kaindl authored
      If our copy of the MTRRs of the BSP has RdMem or WrMem set, and
      we are running on an AMD64/K8 system, the boot CPU must have had
      MtrrFixDramEn and MtrrFixDramModEn set (otherwise our RDMSR would
      have copied these bits cleared), so we set them on this CPU as well.
      
      This allows us to keep the AMD64/K8 RdMem and WrMem bits in sync
      across the CPUs of SMP systems in order to fullfill the duty of
      system software to "initialize and maintain MTRR consistency
      across all processors." as written in the AMD and Intel manuals.
      
      If an WRMSR instruction fails because MtrrFixDramModEn is not
      set, I expect that also the Intel-style MTRR bits are not updated.
      
      AK: minor cleanup, moved MSR defines around
      Signed-off-by: default avatarBernhard Kaindl <bk@suse.de>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      de938c51
    • Bernhard Kaindl's avatar
      [PATCH] x86: Save and restore the fixed-range MTRRs of the BSP when suspending · 3ebad590
      Bernhard Kaindl authored
      Note: This patch didn'nt need an update since it's initial post.
      
      Some BIOSes may modify fixed-range MTRRs in SMM, e.g. when they
      transition the system into ACPI mode, which is entered thru an SMI,
      triggered by Linux in acpi_enable().
      
      SMIs which cause that Linux is interrupted and BIOS code is
      executed (which may change e.g. fixed-range MTRRs) in SMM may
      be raised by an embedded system controller which is often found
      in notebooks also at other occasions.
      
      If we would not update our copy of the fixed-range MTRRs before
      suspending to RAM or to disk, restore_processor_state() would
      set the fixed-range MTRRs of the BSP using old backup values
      which may be outdated and this could cause the system to fail
      later during resume.
      
      This patch ensures that our copy of the fixed-range MTRRs
      is updated when saving the boot processor state on suspend
      to disk and suspend to RAM.
      
      In combination with other patches this allows to fix s2ram
      and s2disk on the Acer Ferrari 1000 notebook and at least
      s2disk on the Acer Ferrari 5000 notebook.
      Signed-off-by: default avatarBernhard Kaindl <bk@suse.de>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      3ebad590
    • Bernhard Kaindl's avatar
      [PATCH] x86: Save the MTRRs of the BSP before booting an AP · 2b1f6278
      Bernhard Kaindl authored
      Applied fix by Andew Morton:
      http://lkml.org/lkml/2007/4/8/88 - Fix `make headers_check'.
      
      AMD and Intel x86 CPU manuals state that it is the responsibility of
      system software to initialize and maintain MTRR consistency across
      all processors in Multi-Processing Environments.
      
      Quote from page 188 of the AMD64 System Programming manual (Volume 2):
      
      7.6.5 MTRRs in Multi-Processing Environments
      
      "In multi-processing environments, the MTRRs located in all processors must
      characterize memory in the same way. Generally, this means that identical
      values are written to the MTRRs used by the processors." (short omission here)
      "Failure to do so may result in coherency violations or loss of atomicity.
      Processor implementations do not check the MTRR settings in other processors
      to ensure consistency. It is the responsibility of system software to
      initialize and maintain MTRR consistency across all processors."
      
      Current Linux MTRR code already implements the above in the case that the
      BIOS does not properly initialize MTRRs on the secondary processors,
      but the case where the fixed-range MTRRs of the boot processor are changed
      after Linux started to boot, before the initialsation of a secondary
      processor, is not handled yet.
      
      In this case, secondary processors are currently initialized by Linux
      with MTRRs which the boot processor had very early, when mtrr_bp_init()
      did run, but not with the MTRRs which the boot processor uses at the
      time when that secondary processors is actually booted,
      causing differing MTRR contents on the secondary processors.
      
      Such situation happens on Acer Ferrari 1000 and 5000 notebooks where the
      BIOS enables and sets AMD-specific IORR bits in the fixed-range MTRRs
      of the boot processor when it transitions the system into ACPI mode.
      The SMI handler of the BIOS does this in SMM, entered while Linux ACPI
      code runs acpi_enable().
      
      Other occasions where the SMI handler of the BIOS may change bits in
      the MTRRs could occur as well. To initialize newly booted secodary
      processors with the fixed-range MTRRs which the boot processor uses
      at that time, this patch saves the fixed-range MTRRs of the boot
      processor before new secondary processors are started. When the
      secondary processors run their Linux initialisation code, their
      fixed-range MTRRs will be updated with the saved fixed-range MTRRs.
      
      If CONFIG_MTRR is not set, we define mtrr_save_state
      as an empty statement because there is nothing to do.
      
      Possible TODOs:
      
      *) CPU-hotplugging outside of SMP suspend/resume is not yet tested
         with this patch.
      
      *) If, even in this case, an AP never runs i386/do_boot_cpu or x86_64/cpu_up,
         then the calls to mtrr_save_state() could be replaced by calls to
         mtrr_save_fixed_ranges(NULL) and  mtrr_save_state() would not be
         needed.
      
         That would need either verification of the CPU-hotplug code or
         at least a test on a >2 CPU machine.
      
      *) The MTRRs of other running processors are not yet checked at this
         time but it might be interesting to syncronize the MTTRs of all
         processors before booting. That would be an incremental patch,
         but of rather low priority since there is no machine known so
         far which would require this.
      
      AK: moved prototypes on x86-64 around to fix warnings
      Signed-off-by: default avatarBernhard Kaindl <bk@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      2b1f6278
    • Bernhard Kaindl's avatar
      [PATCH] x86: Adds mtrr_save_fixed_ranges() for use in two later patches. · 2b3b4835
      Bernhard Kaindl authored
      In this current implementation which is used in other patches,
      mtrr_save_fixed_ranges() accepts a dummy void pointer because
      in the current implementation of one of these patches, this
      function may be called from smp_call_function_single() which
      requires that this function takes a void pointer argument.
      
      This function calls get_fixed_ranges(), passing mtrr_state.fixed_ranges
      which is the element of the static struct which stores our current
      backup of the fixed-range MTRR values which all CPUs shall be
      using.
      
      Because  mtrr_save_fixed_ranges calls get_fixed_ranges after
      kernel initialisation time, __init needs to be removed from
      the declaration of get_fixed_ranges().
      
      If CONFIG_MTRR is not set, we define mtrr_save_fixed_ranges
      as an empty statement because there is nothing to do.
      
      AK: Moved prototypes for x86-64 around to fix warnings
      Signed-off-by: default avatarBernhard Kaindl <bk@suse.de>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      2b3b4835
    • Andi Kleen's avatar
      856f44ff
    • Jeremy Fitzhardinge's avatar
      [PATCH] i386: Clean up ELF note generation · 03df4f6e
      Jeremy Fitzhardinge authored
      Three cleanups:
      
      1: ELF notes are never mapped, so there's no need to have any access
      flags in their phdr.
      
      2: When generating them from asm, tell the assembler to use a SHT_NOTE
      section type.  There doesn't seem to be a way to do this from C.
      
      3: Use ANSI rather than traditional cpp behaviour to stringify the
      macro argument.
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      03df4f6e
    • Andi Kleen's avatar
      [PATCH] i386: PARAVIRT: Export paravirt_ops for non GPL modules too · 21564fd2
      Andi Kleen authored
      Otherwise non GPL modules cannot even do basic operations
      like disabling interrupts anymore, which would be excessive.
      
      Longer term should split the single structure up into
      internal and external symbols and not export the internal
      ones at all.
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      21564fd2
    • Jeremy Fitzhardinge's avatar
      [PATCH] x86: PARAVIRT: Jeremy Fitzhardinge <jeremy@goop.org> · 441d40dc
      Jeremy Fitzhardinge authored
      The other symbols used to delineate the alt-instructions sections have the
      form __foo/__foo_end.  Rename parainstructions to match.
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      441d40dc
    • Zachary Amsden's avatar
      [PATCH] i386: Convert VMI timer to use clock events · e0bb8643
      Zachary Amsden authored
      Convert VMI timer to use clock events, making it properly able to use the NO_HZ
      infrastructure.  On UP systems, with no local APIC, we just continue to route
      these events through the PIT.  On systems with a local APIC, or SMP, we provide
      a single source interrupt chip which creates the local timer IRQ.  It actually
      gets delivered by the APIC hardware, but we don't want to use the same local
      APIC clocksource processing, so we create our own handler here.
      Signed-off-by: default avatarZachary Amsden <zach@vmware.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      CC: Dan Hecht <dhecht@vmware.com>
      CC: Ingo Molnar <mingo@elte.hu>
      CC: Thomas Gleixner <tglx@linutronix.de>
      e0bb8643
    • Zachary Amsden's avatar
      [PATCH] i386: Implement vmi_kmap_atomic_pte · eeef9c68
      Zachary Amsden authored
      Implement vmi_kmap_atomic_pte in terms of the backend set_linear_mapping
      operation.  The conversion is rather straighforward; call kmap_atomic
      and then inform the hypervisor of the page mapping.
      
      The _flush_tlb damage is due to macros being pulled in from highmem.h.
      Signed-off-by: default avatarZachary Amsden <zach@vmware.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      eeef9c68
    • Zachary Amsden's avatar
    • Zachary Amsden's avatar
      [PATCH] i386: Clean up arch/i386/kernel/cpu/mcheck/p4.c · 18420001
      Zachary Amsden authored
      No, just no.  You do not use goto to skip a code block.  You do not
      return an obvious variable from a singly-inlined function and give
      the function a return value.  You don't put unexplained comments
      about kmalloc in code which doesn't do dynamic allocation.  And
      you don't leave stray warnings around for no good reason.
      
      Also, when possible, it is better to use block scoped variables
      because gcc can sometime generate better code.
      Signed-off-by: default avatarZachary Amsden <zach@vmware.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      18420001
    • Jeremy Fitzhardinge's avatar
      [PATCH] i386: PARAVIRT: Allow boot-time disable of paravirt_ops patching · 959b4fdf
      Jeremy Fitzhardinge authored
      Add "noreplace-paravirt" to disable paravirt_ops patching.
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      959b4fdf
    • Zachary Amsden's avatar
    • Jeremy Fitzhardinge's avatar
      [PATCH] x86: update for i386 and x86-64 check_bugs · 57decbda
      Jeremy Fitzhardinge authored
      Remove spurious comments, headers and keywords from x86-64 bugs.[ch].
      
      Use identify_boot_cpu()
      
      AK: merged with other patch
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      57decbda
    • Jeremy Fitzhardinge's avatar
      [PATCH] i386: map enough initial memory to create lowmem mappings · 9ce8c2ed
      Jeremy Fitzhardinge authored
      head.S creates the very initial pagetable for the kernel.  This just
      maps enough space for the kernel itself, and an allocation bitmap.
      The amount of mapped memory is rounded up to 4Mbytes, and so this
      typically ends up mapping 8Mbytes of memory.
      
      When booting, pagetable_init() needs to create mappings for all
      lowmem, and the pagetables for these mappings are allocated from the
      free pages around the kernel in low memory.  If the number of
      pagetable pages + kernel size exceeds head.S's initial mapping, it
      will end up faulting on an unmapped page.  This will only happen with
      specific combinations of kernel size and memory size.
      
      This patch makes sure that head.S also maps enough space to fit the
      kernel pagetables as well as the kernel itself.  It ends up using an
      additional two pages of unreclaimable memory.
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Acked-by: default avatar"H. Peter Anvin" <hpa@zytor.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>,
      9ce8c2ed
    • Jeremy Fitzhardinge's avatar
      [PATCH] i386: Fix UP gdt bugs · c5413fbe
      Jeremy Fitzhardinge authored
      Fixes two problems with the GDT when compiling for uniprocessor:
       - There's no percpu segment, so trying to load its selector into %fs fails.
         Use a null selector instead.
       - The real gdt needs to be loaded at some point.  Do it in cpu_init().
      Signed-off-by: default avatarChris Wright <chrisw@sous-sol.org>
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      c5413fbe
    • Jeremy Fitzhardinge's avatar
      [PATCH] i386: Define per_cpu_offset · 1956c73b
      Jeremy Fitzhardinge authored
      Define per_cpu_offset in asm-i386/percpu.h when SMP defined, like
      asm-generic/percpu.h does for UP.
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Andi Kleen <ak@suse.de>
      1956c73b
    • Jeremy Fitzhardinge's avatar
      [PATCH] i386: cleanups to help using per-cpu variables from asm · 978c038e
      Jeremy Fitzhardinge authored
      This patch does a few small cleanups:
       - use PER_CPU_NAME to generate the names of per-cpu variables
       - use lea to add the per_cpu offset in PER_CPU(), because it doesn't
         affect condition flags
       - add PER_CPU_VAR which allows direct access to pre-cpu variables
         with the %fs: prefix on SMP.
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Andi Kleen <ak@suse.de>
      978c038e
    • Jeremy Fitzhardinge's avatar
      [PATCH] i386: Convert PDA into the percpu section · 7c3576d2
      Jeremy Fitzhardinge authored
      Currently x86 (similar to x84-64) has a special per-cpu structure
      called "i386_pda" which can be easily and efficiently referenced via
      the %fs register.  An ELF section is more flexible than a structure,
      allowing any piece of code to use this area.  Indeed, such a section
      already exists: the per-cpu area.
      
      So this patch:
      (1) Removes the PDA and uses per-cpu variables for each current member.
      (2) Replaces the __KERNEL_PDA segment with __KERNEL_PERCPU.
      (3) Creates a per-cpu mirror of __per_cpu_offset called this_cpu_off, which
          can be used to calculate addresses for this CPU's variables.
      (4) Simplifies startup, because %fs doesn't need to be loaded with a
          special segment at early boot; it can be deferred until the first
          percpu area is allocated (or never for UP).
      
      The result is less code and one less x86-specific concept.
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      7c3576d2
    • Jeremy Fitzhardinge's avatar
      [PATCH] i386: Page-align the GDT · 7a61d35d
      Jeremy Fitzhardinge authored
      Xen wants a dedicated page for the GDT.  I believe VMI likes it too.
      lguest, KVM and native don't care.
      
      Simple transformation to page-aligned "struct gdt_page".
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Acked-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      7a61d35d
    • Jeremy Fitzhardinge's avatar
      [PATCH] x86-64: deflate inflate_dynamic too · 39b7ee06
      Jeremy Fitzhardinge authored
      inflate_dynamic() has piggy stack usage too, so heap allocate it too.
      I'm not sure it actually gets used, but it shows up large in "make
      checkstack".
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      39b7ee06
    • Jeremy Fitzhardinge's avatar
      [PATCH] x86: deflate stack usage in lib/inflate.c · 35c74226
      Jeremy Fitzhardinge authored
      inflate_fixed and huft_build together use around 2.7k of stack.  When
      using 4k stacks, I saw stack overflows from interrupts arriving while
      unpacking the root initrd:
      
      do_IRQ: stack overflow: 384
       [<c0106b64>] show_trace_log_lvl+0x1a/0x30
       [<c01075e6>] show_trace+0x12/0x14
       [<c010763f>] dump_stack+0x16/0x18
       [<c0107ca4>] do_IRQ+0x6d/0xd9
       [<c010202b>] xen_evtchn_do_upcall+0x6e/0xa2
       [<c0106781>] xen_hypervisor_callback+0x25/0x2c
       [<c010116c>] xen_restore_fl+0x27/0x29
       [<c0330f63>] _spin_unlock_irqrestore+0x4a/0x50
       [<c0117aab>] change_page_attr+0x577/0x584
       [<c0117b45>] kernel_map_pages+0x8d/0xb4
       [<c016a314>] cache_alloc_refill+0x53f/0x632
       [<c016a6c2>] __kmalloc+0xc1/0x10d
       [<c0463d34>] malloc+0x10/0x12
       [<c04641c1>] huft_build+0x2a7/0x5fa
       [<c04645a5>] inflate_fixed+0x91/0x136
       [<c04657e2>] unpack_to_rootfs+0x5f2/0x8c1
       [<c0465acf>] populate_rootfs+0x1e/0xe4
      
      (This was under Xen, but there's no reason it couldn't happen on bare
        hardware.)
      
      This patch mallocs the local variables, thereby reducing the stack
      usage to sane levels.
      
      Also, up the heap size for the kernel decompressor to deal with the
      extra allocation.
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Tim Yamin <plasmaroo@gentoo.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ian Molton <spyro@f2s.com>
      35c74226
    • Jeremy Fitzhardinge's avatar
      [PATCH] i386: PARAVIRT: drop unused ptep_get_and_clear · 4cdd9c89
      Jeremy Fitzhardinge authored
      In shadow mode hypervisors, ptep_get_and_clear achieves the desired
      purpose of keeping the shadows in sync by issuing a native_get_and_clear,
      followed by a call to pte_update, which indicates the PTE has been
      modified.
      
      Direct mode hypervisors (Xen) have no need for this anyway, and will trap
      the update using writable pagetables.
      
      This means no hypervisor makes use of ptep_get_and_clear; there is no
      reason to have it in the paravirt-ops structure.  Change confusing
      terminology about raw vs. native functions into consistent use of
      native_pte_xxx for operations which do not invoke paravirt-ops.
      Signed-off-by: default avatarZachary Amsden <zach@vmware.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      4cdd9c89
    • Jeremy Fitzhardinge's avatar
      [PATCH] i386: PARAVIRT: Clean up paravirt patchable wrappers · 1a45b7aa
      Jeremy Fitzhardinge authored
      Replace all the open-coded macros for generating calls with a pair of
      more general macros (__PVOP_CALL/VCALL), and redefine all the
      PVOP_V?CALL[0-4] in terms of them.
      
      [ Andrew, Andi: this should slot in immediately after "Document asm-i386/paravirt.h"
        (paravirt_ops-document-asm-i386-paravirth.patch) ]
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      1a45b7aa
    • Jeremy Fitzhardinge's avatar
      [PATCH] i386: PARAVIRT: Use enums for paravirt lazy flush modi · 4e0fa856
      Jeremy Fitzhardinge authored
      Remove #defines, add enum for PARAVIRT_LAZY_FLUSH.
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      4e0fa856
    • Jeremy Fitzhardinge's avatar
      [PATCH] i386: PARAVIRT: flush lazy mmu updates on kunmap_atomic · 7b2f27f4
      Jeremy Fitzhardinge authored
      kunmap_atomic should flush any pending lazy mmu updates, mainly to be
      consistent with kmap_atomic, and to preserve its normal behaviour.
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      7b2f27f4
    • Jeremy Fitzhardinge's avatar
      [PATCH] i386: PARAVIRT: add kmap_atomic_pte for mapping highpte pages · ce6234b5
      Jeremy Fitzhardinge authored
      Xen and VMI both have special requirements when mapping a highmem pte
      page into the kernel address space.  These can be dealt with by adding
      a new kmap_atomic_pte() function for mapping highptes, and hooking it
      into the paravirt_ops infrastructure.
      
      Xen specifically wants to map the pte page RO, so this patch exposes a
      helper function, kmap_atomic_prot, which maps the page with the
      specified page protections.
      
      This also adds a kmap_flush_unused() function to clear out the cached
      kmap mappings.  Xen needs this to clear out any potential stray RW
      mappings of pages which will become part of a pagetable.
      
      [ Zach - vmi.c will need some attention after this patch.  It wasn't
        immediately obvious to me what needs to be done. ]
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      ce6234b5
    • Jeremy Fitzhardinge's avatar
      [PATCH] i386: PARAVIRT: revert map_pt_hook. · a27fe809
      Jeremy Fitzhardinge authored
      Back out the map_pt_hook to clear the way for kmap_atomic_pte.
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      a27fe809
    • Jeremy Fitzhardinge's avatar
      [PATCH] i386: PARAVIRT: add flush_tlb_others paravirt_op · d4c10477
      Jeremy Fitzhardinge authored
      This patch adds a pv_op for flush_tlb_others.  Linux running on native
      hardware uses cross-CPU IPIs to flush the TLB on any CPU which may
      have a particular mm's pagetable entries cached in its TLB.  This is
      inefficient in a paravirtualized environment, since the hypervisor
      knows which real CPUs actually contain cached mappings, which may be a
      small subset of a guest's VCPUs.
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      d4c10477
    • Jeremy Fitzhardinge's avatar
      [PATCH] i386: PARAVIRT: add common patching machinery · 63f70270
      Jeremy Fitzhardinge authored
      Implement the actual patching machinery.  paravirt_patch_default()
      contains the logic to automatically patch a callsite based on a few
      simple rules:
      
       - if the paravirt_op function is paravirt_nop, then patch nops
       - if the paravirt_op function is a jmp target, then jmp to it
       - if the paravirt_op function is callable and doesn't clobber too much
          for the callsite, call it directly
      
      paravirt_patch_default is suitable as a default implementation of
      paravirt_ops.patch, will remove most of the expensive indirect calls
      in favour of either a direct call or a pile of nops.
      
      Backends may implement their own patcher, however.  There are several
      helper functions to help with this:
      
      paravirt_patch_nop	nop out a callsite
      paravirt_patch_ignore	leave the callsite as-is
      paravirt_patch_call	patch a call if the caller and callee
      			have compatible clobbers
      paravirt_patch_jmp	patch in a jmp
      paravirt_patch_insns	patch some literal instructions over
      			the callsite, if they fit
      
      This patch also implements more direct patches for the native case, so
      that when running on native hardware many common operations are
      implemented inline.
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Anthony Liguori <anthony@codemonkey.ws>
      Acked-by: default avatarIngo Molnar <mingo@elte.hu>
      63f70270