1. 23 Jan, 2009 2 commits
    • xen: handle highmem pages correctly when shrinking a domain · ff4ce8c3
      Ian Campbell authored
      Commit 1058a75f ("xen: actually release memory when shrinking domain")
      causes a crash if the page being released is a highmem page.
      
      If a page is highmem then there is no need to unmap it.
      Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
      Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, mm: fix pte_free() · 42ef73fe
      Peter Zijlstra authored
      On -rt we were seeing spurious bad page states like:
      
      Bad page state in process 'firefox'
      page:c1bc2380 flags:0x40000000 mapping:c1bc2390 mapcount:0 count:0
      Trying to fix it up, but a reboot is needed
      Backtrace:
      Pid: 503, comm: firefox Not tainted 2.6.26.8-rt13 #3
      [<c043d0f3>] ? printk+0x14/0x19
      [<c0272d4e>] bad_page+0x4e/0x79
      [<c0273831>] free_hot_cold_page+0x5b/0x1d3
      [<c02739f6>] free_hot_page+0xf/0x11
      [<c0273a18>] __free_pages+0x20/0x2b
      [<c027d170>] __pte_alloc+0x87/0x91
      [<c027d25e>] handle_mm_fault+0xe4/0x733
      [<c043f680>] ? rt_mutex_down_read_trylock+0x57/0x63
      [<c043f680>] ? rt_mutex_down_read_trylock+0x57/0x63
      [<c0218875>] do_page_fault+0x36f/0x88a
      
      This is the case where a concurrent fault already installed the PTE and
      we get to free the newly allocated one.
      
      This is due to pgtable_page_ctor() doing the spin_lock_init(&page->ptl)
      which is overlaid with the {private, mapping} struct.
      
      union {
          struct {
              unsigned long private;
              struct address_space *mapping;
          };
          spinlock_t ptl;
          struct kmem_cache *slab;
          struct page *first_page;
      };
      
      Normally the spinlock is small enough to not stomp on page->mapping, but
      PREEMPT_RT=y has huge 'spin'locks.
      
      But lockdep kernels should also be able to trigger this splat, as the
      lock tracking code grows the spinlock to cover page->mapping.
      
      The obvious fix is calling pgtable_page_dtor() like the regular pte free
      path __pte_free_tlb() does.
      
      It seems all architectures except x86 and mn10300 already do this, and
      mn10300 doesn't seem to use pgtable_page_ctor(), which suggests it
      doesn't do SMP or simply doesn't do MMU at all or something.
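
      The overlay problem described above can be reproduced in miniature in
      user space. This is only a sketch under stated assumptions: every name
      ending in _sketch is hypothetical, the oversized lock is a stand-in for
      a PREEMPT_RT or lockdep spinlock, and the constructor and destructor
      only imitate the side effect that pgtable_page_ctor() and
      pgtable_page_dtor() have on page->mapping.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an oversized PREEMPT_RT or lockdep spinlock. */
struct big_lock_sketch {
    unsigned long words[4];
};

/* Simplified stand-in for the overlaid fields of struct page. */
struct page_sketch {
    union {
        struct {
            unsigned long private_data;
            void *mapping;
        };
        struct big_lock_sketch ptl; /* large enough to cover ->mapping */
    };
};

/* Imitates the constructor: initialising the lock writes over ->mapping. */
static void ctor_sketch(struct page_sketch *page)
{
    memset(&page->ptl, 0xff, sizeof(page->ptl));
}

/* Imitates the destructor: tear the lock down and clear ->mapping again. */
static void dtor_sketch(struct page_sketch *page)
{
    page->mapping = NULL;
}

/* Imitates the free-path sanity check: a stale ->mapping means "bad page". */
static int page_is_bad_sketch(const struct page_sketch *page)
{
    return page->mapping != NULL;
}
```

      Freeing the page straight after ctor_sketch() trips the check, while
      calling dtor_sketch() first leaves ->mapping clear, which is the shape
      of the fix the message describes.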
      Signed-off-by: Peter Zijlstra <a.p.zijlsta@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>
  2. 22 Jan, 2009 2 commits
  3. 21 Jan, 2009 6 commits
    • x86: add MSR_IA32_MISC_ENABLE bits to <asm/msr-index.h> · bdf21a49
      H. Peter Anvin authored
      Impact: None (new bit definitions currently unused)
      
      Add bit definitions for the MSR_IA32_MISC_ENABLE MSRs to
      <asm/msr-index.h>.
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86: fix PTE corruption issue while mapping RAM using /dev/mem · 95971342
      Suresh Siddha authored
      Beschorner Daniel reported:
      > hwinfo problem since 2.6.28, showing this in the oops:
      >	Corrupted page table at address 7fd04de3ec00
      
      Also, PaX Team reported a regression with this commit:
      
      >	commit 9542ada8
      >	Author: Suresh Siddha <suresh.b.siddha@intel.com>
      >	Date:   Wed Sep 24 08:53:33 2008 -0700
      >
      >	    x86: track memtype for RAM in page struct
      
      This commit breaks mapping any RAM page through /dev/mem, because
      reserve_memtype() was not initializing the return attribute type, and so
      corrupted the PTE entry that was set up with that return attribute type.

      Because of this bug, an application mapping such a RAM page through /dev/mem
      dies with a "Corrupted page table at address xxxx" message in the kernel
      log, and the kernel identity mapping which maps the underlying RAM
      page gets converted to UC.

      Fix this by initializing the return attribute type before calling
      reserve_ram_pages_type().
      Reported-by: PaX Team <pageexec@freemail.hu>
      Reported-and-tested-by: Beschorner Daniel <Daniel.Beschorner@facton.com>
      Tested-and-Acked-by: PaX Team <pageexec@freemail.hu>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: mtrr fix debug boot parameter · 731f1872
      Thomas Renninger authored
      While looking at:
      
        http://bugzilla.kernel.org/show_bug.cgi?id=11541
      
      I realized that the mtrr.show param cannot work, because
      the code is processed much too early.
      
      This patch:
       - Declares mtrr.show as early_param
       - Stays consistent with the previous param (which I doubt
         ever worked), so mtrr.show=1 would still work
       - Declares mtrr_show as initdata
      Signed-off-by: Thomas Renninger <trenn@suse.de>
      Acked-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: fix page attribute corruption with cpa() · a1e46212
      Suresh Siddha authored
      Impact: fix sporadic slowdowns and warning messages
      
      This patch fixes a performance issue reported by Linus on his
      Nehalem system. While Linus reverted the PAT patch (commit 58dab916)
      which exposed the issue, the existing cpa() code can potentially still
      cause wrong behavior (page attribute corruption).
      
      This patch also fixes the "WARNING: at arch/x86/mm/pageattr.c:560" that
      various people reported.
      
      In a 64-bit kernel, the kernel identity mapping might have holes depending
      on the available memory and how e820 reports the address range
      covering the RAM, ACPI, PCI reserved regions. If there is a 2MB/1GB hole
      in the address range that is not listed by e820 entries, the kernel identity
      mapping will have a corresponding hole in its 1-1 identity mapping.
      
      If cpa() happens on the kernel identity mapping which falls into these holes,
      existing code fails like this:
      
      	__change_page_attr_set_clr()
      		__change_page_attr()
      			returns 0 because of if (!kpte). But doesn't
      			set cpa->numpages and cpa->pfn.
      		cpa_process_alias()
      			uses uninitialized cpa->pfn (random value)
      			which can potentially lead to changing the page
      			attribute of kernel text/data, kernel identity
      			mapping of RAM pages etc. oops!
      
      This bug was easily exposed by another PAT patch which was doing
      cpa() more often on kernel identity mapping holes (the physical range between
      max_low_pfn_mapped and 4GB), where it was setting the
      cache disable attribute (PCD) for kernel identity mappings as well.

      Fix cpa() to handle the kernel identity mapping holes. Retain
      the WARN() for cpa() calls to other not-present address ranges
      (kernel text/data, ioremap() addresses).
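
      The failure path and its fix can be sketched in user space as follows.
      This is a hedged illustration, not the patch itself: the struct, the
      function, and the constants ending in _SKETCH are hypothetical, the
      PAGE_OFFSET value is made up, and the range test against
      max_pfn_mapped is an assumption about how a hole in the identity
      mapping is recognised.

```c
#include <stdint.h>

#define PAGE_SHIFT_SKETCH   12
#define PAGE_OFFSET_SKETCH  UINT64_C(0xffff880000000000) /* made-up base */
#define EFAULT_SKETCH       14

/* Hypothetical subset of the cpa bookkeeping the message mentions. */
struct cpa_data_sketch {
    uint64_t pfn;
    int numpages;
};

/*
 * Called when the page table lookup for vaddr found no PTE.
 * Inside the identity mapping a hole is expected: report one page as
 * processed and fill in its pfn, so that later alias handling never
 * sees an uninitialized value. Anywhere else it stays an error (the
 * kernel would WARN() here).
 */
static int cpa_process_fault_sketch(struct cpa_data_sketch *cpa,
                                    uint64_t vaddr,
                                    uint64_t max_pfn_mapped)
{
    uint64_t start = PAGE_OFFSET_SKETCH;
    uint64_t end = start + (max_pfn_mapped << PAGE_SHIFT_SKETCH);

    if (vaddr >= start && vaddr < end) {
        cpa->numpages = 1;
        cpa->pfn = (vaddr - start) >> PAGE_SHIFT_SKETCH;
        return 0;
    }
    return -EFAULT_SKETCH;
}
```

      The point of the sketch is only that the "return 0" path now always
      leaves numpages and pfn in a defined state.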
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • Revert "x86: signal: change type of paramter for sys_rt_sigreturn()" · 552b8aa4
      Ingo Molnar authored
      This reverts commit 4217458d.
      
      Justin Madru bisected the problem to this commit; it was causing weird
      Firefox crashes.
      
      The reason is that GCC mis-optimizes (re-uses) the on-stack parameters of
      the calling frame, which corrupts the syscall return pt_regs state and
      thus corrupts user-space register state.
      
      So we go back to the slightly less clean but more optimization-safe
      method of getting to pt_regs. Also add a comment to explain this.
      
      Resolves: http://bugzilla.kernel.org/show_bug.cgi?id=12505
      Reported-and-bisected-by: Justin Madru <jdm64@gawab.com>
      Tested-by: Justin Madru <jdm64@gawab.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: use early clobbers in usercopy*.c · e0a96129
      Andi Kleen authored
      Impact: fix rare (but currently harmless) miscompile with certain configs and gcc versions
      
      Hugh Dickins noticed that strncpy_from_user() was miscompiled
      in some circumstances with gcc 4.3.
      
      Thanks to Hugh's excellent analysis it was easy to track down.
      
      Hugh writes:
      
      > Try building an x86_64 defconfig 2.6.29-rc1 kernel tree,
      > except not quite defconfig, switch CONFIG_PREEMPT_NONE=y
      > and CONFIG_PREEMPT_VOLUNTARY off (because it expands a
      > might_fault() there, which hides the issue): using a
      > gcc 4.3.2 (I've checked both openSUSE 11.1 and Fedora 10).
      >
      > It generates the following:
      >
      > 0000000000000000 <__strncpy_from_user>:
      >    0:   48 89 d1                mov    %rdx,%rcx
      >    3:   48 85 c9                test   %rcx,%rcx
      >    6:   74 0e                   je     16 <__strncpy_from_user+0x16>
      >    8:   ac                      lods   %ds:(%rsi),%al
      >    9:   aa                      stos   %al,%es:(%rdi)
      >    a:   84 c0                   test   %al,%al
      >    c:   74 05                   je     13 <__strncpy_from_user+0x13>
      >    e:   48 ff c9                dec    %rcx
      >   11:   75 f5                   jne    8 <__strncpy_from_user+0x8>
      >   13:   48 29 c9                sub    %rcx,%rcx
      >   16:   48 89 c8                mov    %rcx,%rax
      >   19:   c3                      retq
      >
      > Observe that "sub %rcx,%rcx; mov %rcx,%rax", whereas gcc 4.2.1
      > (and many other configs) say "sub %rcx,%rdx; mov %rdx,%rax".
      > Isn't it returning 0 when it ought to be returning strlen?
      
      The asm constraints for the strncpy_from_user() result were missing an
      early clobber, which tells gcc that the last output arguments
      are written before all input arguments are read.
      
      Also add more early clobbers in the rest of the file and fix 32-bit
      usercopy.c in the same way.
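
      A minimal sketch of what an early clobber does, assuming GCC-style
      extended asm on x86-64 (other targets fall back to plain C so the
      example still builds). The function name is hypothetical and this is
      not the usercopy code; it only reproduces the "output written before
      the last input is read" situation from the analysis above.

```c
/*
 * Computes count - remaining with a two-instruction asm body.
 * The output register is written by the first instruction, before the
 * second input is read by the next one. The "&" in "=&r" tells gcc
 * that, so it must not place the output in the same register as an
 * input. Without it, gcc would be free to emit the equivalent of
 * "sub %rcx,%rcx" and return 0.
 */
static long sub_with_early_clobber_sketch(long count, long remaining)
{
    long res;

#if defined(__GNUC__) && defined(__x86_64__)
    __asm__("mov %1, %0\n\t"   /* res = count (output written early) */
            "sub %2, %0"       /* res -= remaining (input read late) */
            : "=&r"(res)
            : "r"(count), "r"(remaining)
            : "cc");
#else
    res = count - remaining;   /* portable fallback, same result */
#endif
    return res;
}
```

      Dropping the "&" does not guarantee a miscompile; it merely permits
      the register sharing that gcc 4.3 happened to choose.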
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      [ since this API is rarely used and no in-kernel user relies on a 'len'
        return value (they only rely on negative return values) this miscompile
        was never noticed in the field. But it's worth fixing it nevertheless. ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 19 Jan, 2009 5 commits
    • x86: remove kernel_physical_mapping_init() from init section · f5495506
      Gary Hade authored
      Impact: fix crash with memory hotplug enabled
      
      kernel_physical_mapping_init() is called during memory hotplug
      so it does not belong in the init section.
      
      If the kernel is built with CONFIG_DEBUG_SECTION_MISMATCH=y on
      the make command line, arch/x86/mm/init_64.c is compiled with
      the -fno-inline-functions-called-once gcc option defeating
      inlining of kernel_physical_mapping_init() within init_memory_mapping().
      
      When kernel_physical_mapping_init() is not inlined it is placed
      in the .init.text section according to the __init in its current
      declaration.  A later call to kernel_physical_mapping_init() during
      a memory hotplug operation encounters an int3 trap because the
      .init.text section memory has been freed.
      
      This patch eliminates the crash caused by the int3 trap by moving the
      non-inlined kernel_physical_mapping_init() from .init.text to .meminit.text.
      Signed-off-by: Gary Hade <garyhade@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • fix: crash: IP: __bitmap_intersects+0x48/0x73 · bfa318ad
      Ingo Molnar authored
      -tip testing found this crash:
      
      > [   35.258515] calling  acpi_cpufreq_init+0x0/0x127 @ 1
      > [   35.264127] BUG: unable to handle kernel NULL pointer dereference at (null)
      > [   35.267554] IP: [<ffffffff80478092>] __bitmap_intersects+0x48/0x73
      > [   35.267554] PGD 0
      > [   35.267554] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      
      arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c is still broken: there's no
      allocation of the variable mask, so we pass in an uninitialized cmd.mask
      field to drv_read(), which then passes it to the scheduler which then
      crashes ...
      
      Switch it over to the much simpler constant-cpumask-pointers approach.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • cpufreq: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write · 72859081
      Mike Travis authored
      Impact: use new work_on_cpu function to reduce stack usage
      
      Replace the saving of current->cpus_allowed and set_cpus_allowed_ptr() with
      a work_on_cpu function for drv_read() and drv_write().
      
      Basically converts do_drv_{read,write} into "work_on_cpu" functions that
      are now called by drv_read and drv_write.
      
      Note: This patch basically reverts 50c668d6 which reverted 7503bfba, now
      that the work_on_cpu() function is more stable.
      Signed-off-by: Mike Travis <travis@sgi.com>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Tested-by: Dieter Ries <clip2@gmx.de>
      Tested-by: Maciej Rutecki <maciej.rutecki@gmail.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: <cpufreq@vger.kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • work_on_cpu: Use our own workqueue. · 8ccad40d
      Rusty Russell authored
      Impact: remove potential clashes with generic kevent workqueue
      
      Annoyingly, some places we want to use work_on_cpu are already in
      workqueues.  As per Ingo's suggestion, we create a different workqueue
      for work_on_cpu.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • work_on_cpu: don't try to get_online_cpus() in work_on_cpu. · 31ad9081
      Rusty Russell authored
      Impact: remove potential circular lock dependency with cpu hotplug lock
      
      This has caused more problems than it solved, with a pile of cpu
      hotplug locking issues.
      
      Follow-up patches will call get_online_cpus() in callers that need it, but
      if they don't do it they're no worse than before when they were using
      set_cpus_allowed without locking.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 18 Jan, 2009 3 commits
    • x86: fix section mismatch warnings in kernel/setup_percpu.c · c7f8562a
      Leonardo Potenza authored
      The function setup_cpu_local_masks() has been marked __init, in
      order to remove the following section mismatch messages:
      
      WARNING: vmlinux.o(.text+0x3c2c7): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var()
      The function setup_cpu_local_masks() references
      the function __init alloc_bootmem_cpumask_var().
      This is often because setup_cpu_local_masks lacks a __init
      annotation or the annotation of alloc_bootmem_cpumask_var is wrong.
      
      WARNING: vmlinux.o(.text+0x3c2d3): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var()
      The function setup_cpu_local_masks() references
      the function __init alloc_bootmem_cpumask_var().
      This is often because setup_cpu_local_masks lacks a __init
      annotation or the annotation of alloc_bootmem_cpumask_var is wrong.
      
      WARNING: vmlinux.o(.text+0x3c2df): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var()
      The function setup_cpu_local_masks() references
      the function __init alloc_bootmem_cpumask_var().
      This is often because setup_cpu_local_masks lacks a __init
      annotation or the annotation of alloc_bootmem_cpumask_var is wrong.
      
      WARNING: vmlinux.o(.text+0x3c2eb): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var()
      The function setup_cpu_local_masks() references
      the function __init alloc_bootmem_cpumask_var().
      This is often because setup_cpu_local_masks lacks a __init
      annotation or the annotation of alloc_bootmem_cpumask_var is wrong.
      Signed-off-by: Leonardo Potenza <lpotenza@inwind.it>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: put trigger in to detect mismatched apic versions · b2b815d8
      Mike Travis authored
      Impact: add debug warning
      
      Fire off one message if two APICs are discovered with different
      APIC versions. (This code is only called during CPU init.)

      The goal of this is to pave the way for the removal of the apic_version[]
      array. We don't expect any APIC version incompatibilities in the x86
      landscape of systems [if there are, we don't handle them very well and probably
      never will handle deep APIC version asymmetries well], but it's prudent
      to have a debug check for one kernel cycle nevertheless.
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: define ARCH_WANT_FRAME_POINTERS · 64dec40d
      Jeff Mahoney authored
      Commit da4276b8 changed a dependency for FRAME_POINTER from X86 to
      ARCH_WANT_FRAME_POINTERS, but didn't actually define it.
      
      This patch adds the definition for ARCH_WANT_FRAME_POINTERS. Without it,
      FRAME_POINTER can't be enabled on x86.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 16 Jan, 2009 1 commit
    • x86: fix assumed to be contiguous leaf page tables for kmap_atomic region (take 2) · a3c6018e
      Jan Beulich authored
      Debugging and original patch from Nick Piggin <npiggin@suse.de>
      
      The early fixmap pmd entry inserted at the very top of the KVA is causing the
      subsequent fixmap mapping code to not provide physically linear pte pages over
      the kmap atomic portion of the fixmap (which relies on said property to
      calculate pte addresses).
      
      This has caused weird boot failures in kmap_atomic much later in the boot
      process (initial userspace faults) on a 32-bit PAE system with a larger number
      of CPUs (smaller CPU counts tend not to run over into the next page so don't
      show up the problem).
      
      Solve this by attempting to clear out the page table, and copy any of its
      entries to the new one. Also, add a bug if a nonlinear condition is encountered
      and can't be resolved, which might save some hours of debugging if this fragile
      scheme ever breaks again...
      
      Once we have such logic, we can also use it to eliminate the early ioremap
      trickery around the page table setup for the fixmap area. This also fixes
      potential issues with FIX_* entries sharing the leaf page table with the early
      ioremap ones getting discarded by early_ioremap_clear() and not restored by
      early_ioremap_reset(). It at once eliminates the temporary (and configuration,
      namely NR_CPUS, dependent) unavailability of early fixed mappings during the
      time the fixmap area page tables get constructed.
      
      Finally, also replace the hard coded calculation of the initial table space
      needed for the fixmap area with a proper one, allowing kernels configured for
      large CPU counts to actually boot.
      
      Based-on: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  7. 15 Jan, 2009 2 commits
  8. 14 Jan, 2009 1 commit
  9. 13 Jan, 2009 2 commits
    • x86, generic: mark complex bitops.h inlines as __always_inline · c8399943
      Andi Kleen authored
      Impact: reduce kernel image size
      
      Hugh Dickins noticed that older gcc versions didn't inline some of
      the bitops when the kernel is built for code size.

      Mark all complex x86 bitops that have more than a single
      asm statement or two as always inline to avoid this problem.
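
      As a rough user-space illustration of the technique, assuming GCC or
      Clang: the macro and function names below are hypothetical (the kernel
      spells the annotation __always_inline), and the bit helpers are
      stand-ins rather than the actual bitops.h functions.

```c
/* Hypothetical macro name; forces inlining even when optimizing for size. */
#define ALWAYS_INLINE_SKETCH inline __attribute__((__always_inline__))

/* Index of the lowest set bit. The caller must pass a non-zero word. */
static ALWAYS_INLINE_SKETCH unsigned long first_set_bit_sketch(unsigned long word)
{
    return (unsigned long)__builtin_ctzl(word);
}

/*
 * A multi-step helper built on top of it: report the lowest set bit and
 * clear it. With a plain "inline" the compiler may decide to emit this
 * out of line; the attribute removes that choice.
 */
static ALWAYS_INLINE_SKETCH unsigned long pop_lowest_bit_sketch(unsigned long *word)
{
    unsigned long bit = first_set_bit_sketch(*word);

    *word &= *word - 1;   /* clears exactly the lowest set bit */
    return bit;
}
```

      Whether forcing the inline actually shrinks the image depends on the
      compiler version and configuration, as the measurements in this
      message suggest.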
      
      Probably should be done for other architectures too.
      
      Ingo then found a better fix that only requires
      a single line change, but it unfortunately only
      works on gcc 4.3.
      
      On older gccs the original patch still makes a ~0.3% defconfig
      difference with CONFIG_OPTIMIZE_INLINING=y.
      
      With gcc 4.1 and a defconfig-like build:

             text    data     bss     dec     hex filename
          6116998 1138540  883788 8139326  7c323e vmlinux-oi-with-patch
          6137043 1138540  883788 8159371  7c808b vmlinux-optimize-inlining
      
      ~20k / 0.3% difference.
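
      The arithmetic behind that figure can be checked with a few lines of C.
      This is a sketch with hypothetical function names; it assumes the first
      number in each row is the text size, and takes the patched text size as
      6116998 bytes, which is what the total (8139326) minus the second and
      third columns (1138540 and 883788) gives.

```c
/* Bytes saved: size without the patch minus size with it. */
static unsigned long bytes_saved_sketch(unsigned long with_patch,
                                        unsigned long without_patch)
{
    return without_patch - with_patch;
}

/* The same saving expressed as a percentage of the unpatched size. */
static double percent_saved_sketch(unsigned long with_patch,
                                   unsigned long without_patch)
{
    return 100.0 * (double)bytes_saved_sketch(with_patch, without_patch)
           / (double)without_patch;
}
```

      bytes_saved_sketch(6116998, 6137043) is 20045 bytes, the same delta as
      between the two totals, and percent_saved_sketch() on the text sizes
      comes out at roughly 0.33%.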
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, cpufreq: remove leftover copymask_copy() · 4a922a96
      Ingo Molnar authored
      Impact: fix potential boot crash on MAXSMP
      
      Remove code left over by:
      
        50c668d6: Revert "cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read
      
      That cmd.cpumask is not allocated anymore. No impact on default !MAXSMP
      kernels.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  10. 12 Jan, 2009 5 commits
  11. 11 Jan, 2009 1 commit
  12. 10 Jan, 2009 10 commits