1. 20 Jul, 2012 1 commit
  2. 28 Jun, 2012 9 commits
    • Alex Shi's avatar
      x86/tlb: do flush_tlb_kernel_range by 'invlpg' · effee4b9
      Alex Shi authored
      This patch do flush_tlb_kernel_range by 'invlpg'. The performance pay
      and gain was analyzed in previous patch
      (x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range).
      
      In the testing: http://lkml.org/lkml/2012/6/21/10
      
      The pay is mostly covered by long kernel path, but the gain is still
      quite clear, memory access in user APP can increase 30+% when kernel
      execute this funtion.
      Signed-off-by: default avatarAlex Shi <alex.shi@intel.com>
      Link: http://lkml.kernel.org/r/1340845344-27557-10-git-send-email-alex.shi@intel.comSigned-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      effee4b9
    • Alex Shi's avatar
      x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR · 52aec330
      Alex Shi authored
      There are 32 INVALIDATE_TLB_VECTOR now in kernel. That is quite big
      amount of vector in IDT. But it is still not enough, since modern x86
      sever has more cpu number. That still causes heavy lock contention
      in TLB flushing.
      
      The patch using generic smp call function to replace it. That saved 32
      vector number in IDT, and resolved the lock contention in TLB
      flushing on large system.
      
      In the NHM EX machine 4P * 8cores * HT = 64 CPUs, hackbench pthread
      has 3% performance increase.
      Signed-off-by: default avatarAlex Shi <alex.shi@intel.com>
      Link: http://lkml.kernel.org/r/1340845344-27557-9-git-send-email-alex.shi@intel.comSigned-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      52aec330
    • Alex Shi's avatar
      x86/tlb: enable tlb flush range support for x86 · 611ae8e3
      Alex Shi authored
      Not every tlb_flush execution moment is really need to evacuate all
      TLB entries, like in munmap, just few 'invlpg' is better for whole
      process performance, since it leaves most of TLB entries for later
      accessing.
      
      This patch also rewrite flush_tlb_range for 2 purposes:
      1, split it out to get flush_blt_mm_range function.
      2, clean up to reduce line breaking, thanks for Borislav's input.
      
      My micro benchmark 'mummap' http://lkml.org/lkml/2012/5/17/59
      show that the random memory access on other CPU has 0~50% speed up
      on a 2P * 4cores * HT NHM EP while do 'munmap'.
      
      Thanks Yongjie's testing on this patch:
      -------------
      I used Linux 3.4-RC6 w/ and w/o his patches as Xen dom0 and guest
      kernel.
      After running two benchmarks in Xen HVM guest, I found his patches
      brought about 1%~3% performance gain in 'kernel build' and 'netperf'
      testing, though the performance gain was not very stable in 'kernel
      build' testing.
      
      Some detailed testing results are below.
      
      Testing Environment:
      	Hardware: Romley-EP platform
      	Xen version: latest upstream
      	Linux kernel: 3.4-RC6
      	Guest vCPU number: 8
      	NIC: Intel 82599 (10GB bandwidth)
      
      In 'kernel build' testing in guest:
      	Command line  |  performance gain
          make -j 4      |    3.81%
          make -j 8      |    0.37%
          make -j 16     |    -0.52%
      
      In 'netperf' testing, we tested TCP_STREAM with default socket size
      16384 byte as large packet and 64 byte as small packet.
      I used several clients to add networking pressure, then 'netperf' server
      automatically generated several threads to response them.
      I also used large-size packet and small-size packet in the testing.
      	Packet size  |  Thread number | performance gain
      	16384 bytes  |      4       |   0.02%
      	16384 bytes  |      8       |   2.21%
      	16384 bytes  |      16      |   2.04%
      	64 bytes     |      4       |   1.07%
      	64 bytes     |      8       |   3.31%
      	64 bytes     |      16      |   0.71%
      Signed-off-by: default avatarAlex Shi <alex.shi@intel.com>
      Link: http://lkml.kernel.org/r/1340845344-27557-8-git-send-email-alex.shi@intel.comTested-by: default avatarRen, Yongjie <yongjie.ren@intel.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      611ae8e3
    • Alex Shi's avatar
      mm/mmu_gather: enable tlb flush range in generic mmu_gather · 597e1c35
      Alex Shi authored
      This patch enabled the tlb flush range support in generic mmu layer.
      
      Most of arch has self tlb flush range support, like ARM/IA64 etc.
      X86 arch has no this support in hardware yet. But another instruction
      'invlpg' can implement this function in some degree. So, enable this
      feather in generic layer for x86 now. and maybe useful for other archs
      in further.
      
      Generic mmu_gather struct is protected by micro
      HAVE_GENERIC_MMU_GATHER. Other archs that has flush range supported
      own self mmu_gather struct. So, now this change is safe for them.
      
      In future we may unify this struct and related functions on multiple
      archs.
      
      Thanks for Peter Zijlstra time and time reminder for multiple
      architecture code safe!
      Signed-off-by: default avatarAlex Shi <alex.shi@intel.com>
      Link: http://lkml.kernel.org/r/1340845344-27557-7-git-send-email-alex.shi@intel.comSigned-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      597e1c35
    • Alex Shi's avatar
      x86/tlb: add tlb_flushall_shift knob into debugfs · 3df3212f
      Alex Shi authored
      kernel will replace cr3 rewrite with invlpg when
        tlb_flush_entries <= active_tlb_entries / 2^tlb_flushall_factor
      if tlb_flushall_factor is -1, kernel won't do this replacement.
      
      User can modify its value according to specific CPU/applications.
      
      Thanks for Borislav providing the help message of
      CONFIG_DEBUG_TLBFLUSH.
      Signed-off-by: default avatarAlex Shi <alex.shi@intel.com>
      Link: http://lkml.kernel.org/r/1340845344-27557-6-git-send-email-alex.shi@intel.comSigned-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      3df3212f
    • Alex Shi's avatar
      x86/tlb: add tlb_flushall_shift for specific CPU · c4211f42
      Alex Shi authored
      Testing show different CPU type(micro architectures and NUMA mode) has
      different balance points between the TLB flush all and multiple invlpg.
      And there also has cases the tlb flush change has no any help.
      
      This patch give a interface to let x86 vendor developers have a chance
      to set different shift for different CPU type.
      
      like some machine in my hands, balance points is 16 entries on
      Romely-EP; while it is at 8 entries on Bloomfield NHM-EP; and is 256 on
      IVB mobile CPU. but on model 15 core2 Xeon using invlpg has nothing
      help.
      
      For untested machine, do a conservative optimization, same as NHM CPU.
      Signed-off-by: default avatarAlex Shi <alex.shi@intel.com>
      Link: http://lkml.kernel.org/r/1340845344-27557-5-git-send-email-alex.shi@intel.comSigned-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      c4211f42
    • Alex Shi's avatar
      x86/tlb: fall back to flush all when meet a THP large page · d8dfe60d
      Alex Shi authored
      We don't need to flush large pages by PAGE_SIZE step, that just waste
      time. and actually, large page don't need 'invlpg' optimizing according
      to our micro benchmark. So, just flush whole TLB is enough for them.
      
      The following result is tested on a 2CPU * 4cores * 2HT NHM EP machine,
      with THP 'always' setting.
      
      Multi-thread testing, '-t' paramter is thread number:
                             without this patch 	with this patch
      ./mprotect -t 1         14ns                       13ns
      ./mprotect -t 2         13ns                       13ns
      ./mprotect -t 4         12ns                       11ns
      ./mprotect -t 8         14ns                       10ns
      ./mprotect -t 16        28ns                       28ns
      ./mprotect -t 32        54ns                       52ns
      ./mprotect -t 128       200ns                      200ns
      Signed-off-by: default avatarAlex Shi <alex.shi@intel.com>
      Link: http://lkml.kernel.org/r/1340845344-27557-4-git-send-email-alex.shi@intel.comSigned-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      d8dfe60d
    • Alex Shi's avatar
      x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range · e7b52ffd
      Alex Shi authored
      x86 has no flush_tlb_range support in instruction level. Currently the
      flush_tlb_range just implemented by flushing all page table. That is not
      the best solution for all scenarios. In fact, if we just use 'invlpg' to
      flush few lines from TLB, we can get the performance gain from later
      remain TLB lines accessing.
      
      But the 'invlpg' instruction costs much of time. Its execution time can
      compete with cr3 rewriting, and even a bit more on SNB CPU.
      
      So, on a 512 4KB TLB entries CPU, the balance points is at:
      	(512 - X) * 100ns(assumed TLB refill cost) =
      		X(TLB flush entries) * 100ns(assumed invlpg cost)
      
      Here, X is 256, that is 1/2 of 512 entries.
      
      But with the mysterious CPU pre-fetcher and page miss handler Unit, the
      assumed TLB refill cost is far lower then 100ns in sequential access. And
      2 HT siblings in one core makes the memory access more faster if they are
      accessing the same memory. So, in the patch, I just do the change when
      the target entries is less than 1/16 of whole active tlb entries.
      Actually, I have no data support for the percentage '1/16', so any
      suggestions are welcomed.
      
      As to hugetlb, guess due to smaller page table, and smaller active TLB
      entries, I didn't see benefit via my benchmark, so no optimizing now.
      
      My micro benchmark show in ideal scenarios, the performance improves 70
      percent in reading. And in worst scenario, the reading/writing
      performance is similar with unpatched 3.4-rc4 kernel.
      
      Here is the reading data on my 2P * 4cores *HT NHM EP machine, with THP
      'always':
      
      multi thread testing, '-t' paramter is thread number:
      	       	        with patch   unpatched 3.4-rc4
      ./mprotect -t 1           14ns		24ns
      ./mprotect -t 2           13ns		22ns
      ./mprotect -t 4           12ns		19ns
      ./mprotect -t 8           14ns		16ns
      ./mprotect -t 16          28ns		26ns
      ./mprotect -t 32          54ns		51ns
      ./mprotect -t 128         200ns		199ns
      
      Single process with sequencial flushing and memory accessing:
      
      		       	with patch   unpatched 3.4-rc4
      ./mprotect		    7ns			11ns
      ./mprotect -p 4096  -l 8 -n 10240
      			    21ns		21ns
      
      [ hpa: http://lkml.kernel.org/r/1B4B44D9196EFF41AE41FDA404FC0A100BFF94@SHSMSX101.ccr.corp.intel.com
        has additional performance numbers. ]
      Signed-off-by: default avatarAlex Shi <alex.shi@intel.com>
      Link: http://lkml.kernel.org/r/1340845344-27557-3-git-send-email-alex.shi@intel.comSigned-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      e7b52ffd
    • Alex Shi's avatar
      x86/tlb_info: get last level TLB entry number of CPU · e0ba94f1
      Alex Shi authored
      For 4KB pages, x86 CPU has 2 or 1 level TLB, first level is data TLB and
      instruction TLB, second level is shared TLB for both data and instructions.
      
      For hupe page TLB, usually there is just one level and seperated by 2MB/4MB
      and 1GB.
      
      Although each levels TLB size is important for performance tuning, but for
      genernal and rude optimizing, last level TLB entry number is suitable. And
      in fact, last level TLB always has the biggest entry number.
      
      This patch will get the biggest TLB entry number and use it in furture TLB
      optimizing.
      
      Accroding Borislav's suggestion, except tlb_ll[i/d]_* array, other
      function and data will be released after system boot up.
      
      For all kinds of x86 vendor friendly, vendor specific code was moved to its
      specific files.
      Signed-off-by: default avatarAlex Shi <alex.shi@intel.com>
      Link: http://lkml.kernel.org/r/1340845344-27557-2-git-send-email-alex.shi@intel.comSigned-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      e0ba94f1
  3. 14 Jun, 2012 2 commits
    • Vlad Zolotarov's avatar
      x86: Add read_mostly declaration/definition to variables from smp.h · 0816b0f0
      Vlad Zolotarov authored
      Add "read-mostly" qualifier to the following variables in
      smp.h:
      
       - cpu_sibling_map
       - cpu_core_map
       - cpu_llc_shared_map
       - cpu_llc_id
       - cpu_number
       - x86_cpu_to_apicid
       - x86_bios_cpu_apicid
       - x86_cpu_to_logical_apicid
      
      As long as all the variables above are only written during the
      initialization, this change is meant to prevent the false
      sharing. More specifically, on vSMP Foundation platform
      x86_cpu_to_apicid shared the same internode_cache_line with
      frequently written lapic_events.
      
      From the analysis of the first 33 per_cpu variables out of 219
      (memories they describe, to be more specific) the 8 have read_mostly
      nature (tlb_vector_offset, cpu_loops_per_jiffy, xen_debug_irq, etc.)
      and 25 are frequently written (irq_stack_union, gdt_page,
      exception_stacks, idt_desc, etc.).
      
      Assuming that the spread of the rest of the per_cpu variables is
      similar, identifying the read mostly memories will make more sense
      in terms of long-term code maintenance comparing to identifying
      frequently written memories.
      Signed-off-by: default avatarVlad Zolotarov <vlad@scalemp.com>
      Acked-by: default avatarShai Fultheim <shai@scalemp.com>
      Cc: Shai Fultheim (Shai@ScaleMP.com) <Shai@scalemp.com>
      Cc: ido@wizery.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1719258.EYKzE4Zbq5@vladSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      0816b0f0
    • Ido Yariv's avatar
      x86: Define early read-mostly per-cpu macros · c35f7741
      Ido Yariv authored
      Some read-mostly per-cpu data may need to be declared or defined
      early, so it can be initialized and accessed before per_cpu
      areas are allocated.
      
      Only the data that resides in the per_cpu areas should be
      read-mostly, as there is little benefit in optimizing cache
      lines on initialization.
      Signed-off-by: default avatarIdo Yariv <ido@wizery.com>
      [ Added the missing declarations in !SMP code. ]
      Signed-off-by: default avatarVlad Zolotarov <vlad@scalemp.com>
      Acked-by: default avatarShai Fultheim <shai@scalemp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/46188571.ddB8aVQYWo@vladSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      c35f7741
  4. 11 Jun, 2012 2 commits
  5. 09 Jun, 2012 1 commit
  6. 08 Jun, 2012 23 commits
    • David Rientjes's avatar
      mm, oom: fix badness score underflow · 1e11ad8d
      David Rientjes authored
      If the privileges given to root threads (3% of allowable memory) or a
      negative value of /proc/pid/oom_score_adj happen to exceed the amount of
      rss of a thread, its badness score overflows as a result of commit
      a7f638f9 ("mm, oom: normalize oom scores to oom_score_adj scale only
      for userspace").
      
      Fix this by making the type signed and return 1, meaning the thread is
      still eligible for kill, if the value is negative.
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1e11ad8d
    • Linus Torvalds's avatar
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 72494504
      Linus Torvalds authored
      Pull scheduler fixes from Ingo Molnar.
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched: Fix the relax_domain_level boot parameter
        sched: Validate assumptions in sched_init_numa()
        sched: Always initialize cpu-power
        sched: Fix domain iteration
        sched/rt: Fix lockdep annotation within find_lock_lowest_rq()
        sched/numa: Load balance between remote nodes
        sched/x86: Calculate booted cores after construction of sibling_mask
      72494504
    • Randy Dunlap's avatar
      sched/fair: fix lots of kernel-doc warnings · cd96891d
      Randy Dunlap authored
      Fix lots of new kernel-doc warnings in kernel/sched/fair.c:
      
        Warning(kernel/sched/fair.c:3625): No description found for parameter 'env'
        Warning(kernel/sched/fair.c:3625): Excess function parameter 'sd' description in 'update_sg_lb_stats'
        Warning(kernel/sched/fair.c:3735): No description found for parameter 'env'
        Warning(kernel/sched/fair.c:3735): Excess function parameter 'sd' description in 'update_sd_pick_busiest'
        Warning(kernel/sched/fair.c:3735): Excess function parameter 'this_cpu' description in 'update_sd_pick_busiest'
        .. more warnings
      Signed-off-by: default avatarRandy Dunlap <rdunlap@xenotime.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      cd96891d
    • Linus Torvalds's avatar
      Revert "drm/i915/crt: Do not rely upon the HPD presence pin" · 8f53369b
      Linus Torvalds authored
      This reverts commit 9e612a00.
      
      It incorrectly finds VGA connectors where none are attached, apparently
      not noticing that nothing replied to the EDID queries, and happily using
      the default EDID modes that have nothing to do with actual hardware.
      
      That in turn then causes X to fall down to the lowest common
      denominator, which is usually the default 1024x768 mode that is in the
      default EDID and pretty much anything supports).
      
      I'd suggest that if not relying on the HDP pin, the code should at least
      check whether it gets valid EDID data back, rather than just assume
      there's something on the VGA connector.
      
      Cc: Dave Airlie <airlied@linux.ie>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8f53369b
    • Linus Torvalds's avatar
      Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 77249539
      Linus Torvalds authored
      Pull ext4 bug fixes from Theodore Ts'o:
       "This update contains two bug fixes, both destined for the stable tree.
        Perhaps the most important is one which fixes ext4 when used with file
        systems originally formatted for use with ext3, but then later
        converted to take advantage of ext4."
      
      * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: don't set i_flags in EXT4_IOC_SETFLAGS
        ext4: fix the free blocks calculation for ext3 file systems w/ uninit_bg
      77249539
    • Linus Torvalds's avatar
      Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc · 3e9ca022
      Linus Torvalds authored
      Pull powerpc fixes from Paul Mackerras:
       "Two small fixes for powerpc:
         - a fix for a regression since 3.2 that causes 4-second (or longer)
           pauses
         - a fix for a potential oops when loading kernel modules on 32-bit
           embedded systems."
      
      * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
        powerpc: Fix kernel panic during kernel module load
        powerpc/time: Sanity check of decrementer expiration is necessary
      3e9ca022
    • Linus Torvalds's avatar
      Merge tag 'upstream-3.5-rc2' of git://git.infradead.org/linux-ubifs · e7264308
      Linus Torvalds authored
      Pull UBI/UBIFS fixes from Artem Bityutskiy:
       "Fix UBI and UBIFS - they refuse to work without debugfs.  This was
        broken by the 3.5-rc1 UBI/UBIFS changes when we removed the debugging
        Kconfig switches.
      
        Also, correct locking in 'ubi_wl_flush()' - it was extended to support
        flushing a specific LEB in 3.5-rc1, and the locking was sub-optimal."
      
      * tag 'upstream-3.5-rc2' of git://git.infradead.org/linux-ubifs:
        UBI: correct ubi_wl_flush locking
        UBIFS: fix debugfs-less systems support
        UBI: fix debugfs-less systems support
      e7264308
    • Linus Torvalds's avatar
      Revert "vfs: stop d_splice_alias creating directory aliases" · 32ba9c3f
      Linus Torvalds authored
      This reverts commit 7732a557 (and commit
      3f50fff4, which was a follow-up
      cleanup).
      
      We're chasing an elusive bug that Dave Jones can apparently reproduce
      using his system call fuzzer tool, and that looks like some kind of
      locking ordering problem on the directory i_mutex chain.  Our i_mutex
      locking is rather complex, and depends on the topological ordering of
      the directories, which is why we have been very wary of splicing
      directory entries around.
      
      Of course, we really don't want to ever see aliased unconnected
      directories anyway, so none of this should ever happen, but this revert
      aims to basically get us back to a known older state.
      
      Bruce points to some of the previous discussion at
      
             http://marc.info/?i=<20110310105821.GE22723@ZenIV.linux.org.uk>
      
      and in particular a long post from Neil:
      
             http://marc.info/?i=<20110311150749.2fa2be66@notabene.brown>
      
      It should be noted that it's possible that Dave's problems come from
      other changes altohgether, including possibly just the fact that Dave
      constantly is teachning his fuzzer new tricks.  So what appears to be a
      new bug could in fact be an old one that just gets newly triggered, but
      reverting these patches as "still under heavy discussion" is the right
      thing regardless.
      Requested-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Acked-by: default avatarJ. Bruce Fields <bfields@fieldses.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      32ba9c3f
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0b35d326
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar.
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/nmi: Fix section mismatch warnings on 32-bit
        x86/uv: Fix UV2 BAU legacy mode
        x86/mm: Only add extra pages count for the first memory range during pre-allocation early page table space
        x86, efi stub: Add .reloc section back into image
        x86/ioapic: Fix NULL pointer dereference on CPU hotplug after disabling irqs
        x86/reboot: Fix a warning message triggered by stop_other_cpus()
        x86/intel/moorestown: Change intel_scu_devices_create() to __devinit
        x86/numa: Set numa_nodes_parsed at acpi_numa_memory_affinity_init()
        x86/gart: Fix kmemleak warning
        x86: mce: Add the dropped timer interval init back
        x86/mce: Fix the MCE poll timer logic
      0b35d326
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 106544d8
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "A bit larger than what I'd wish for - half of it is due to hw driver
        updates to Intel Ivy-Bridge which info got recently released,
        cycles:pp should work there now too, amongst other things.  (but we
        are generally making exceptions for hardware enablement of this type.)
      
        There are also callchain fixes in it - responding to mostly
        theoretical (but valid) concerns.  The tooling side sports perf.data
        endianness/portability fixes which did not make it for the merge
        window - and various other fixes as well."
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
        perf/x86: Check user address explicitly in copy_from_user_nmi()
        perf/x86: Check if user fp is valid
        perf: Limit callchains to 127
        perf/x86: Allow multiple stacks
        perf/x86: Update SNB PEBS constraints
        perf/x86: Enable/Add IvyBridge hardware support
        perf/x86: Implement cycles:p for SNB/IVB
        perf/x86: Fix Intel shared extra MSR allocation
        x86/decoder: Fix bsr/bsf/jmpe decoding with operand-size prefix
        perf: Remove duplicate invocation on perf_event_for_each
        perf uprobes: Remove unnecessary check before strlist__delete
        perf symbols: Check for valid dso before creating map
        perf evsel: Fix 32 bit values endianity swap for sample_id_all header
        perf session: Handle endianity swap on sample_id_all header data
        perf symbols: Handle different endians properly during symbol load
        perf evlist: Pass third argument to ioctl explicitly
        perf tools: Update ioctl documentation for PERF_IOC_FLAG_GROUP
        perf tools: Make --version show kernel version instead of pull req tag
        perf tools: Check if callchain is corrupted
        perf callchain: Make callchain cursors TLS
        ...
      106544d8
    • Linus Torvalds's avatar
      Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 03d8f540
      Linus Torvalds authored
      Pull drm intel and exynos fixes from Dave Airlie:
       "A bunch of fixes for Intel and exynos, nothing too major, a new intel
        PCI ID, and a fix for CRT detection."
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
        drm/i915: pch_irq_handler -> {ibx, cpt}_irq_handler
        char/agp: add another Ironlake host bridge
        drm/i915: fix up ivb plane 3 pageflips
        drm/exynos: fixed blending for hdmi graphic layer
        drm/exynos: Remove dummy encoder get_crtc operation implementation
        drm/exynos: Keep a reference to frame buffer GEM objects
        drm/exynos: Don't cast GEM object to Exynos GEM object when not needed
        drm/exynos: DRIVER_BUS_PLATFORM is not a driver feature
        drm/exynos: fixed size type.
        drm/exynos: Use DRM_FORMAT_{NV12, YUV420} instead of DRM_FORMAT_{NV12M, YUV420M}
        drm/i915: hold forcewake around ring hw init
        drm/i915: Mark the ringbuffers as being in the GTT domain
        drm/i915/crt: Do not rely upon the HPD presence pin
        drm/i915: Reset last_retired_head when resetting ring
      03d8f540
    • Linus Torvalds's avatar
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b1e25f41
      Linus Torvalds authored
      Pull leap second timer fix from Thomas Gleixner.
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        timekeeping: Fix CLOCK_MONOTONIC inconsistency during leapsecond
      b1e25f41
    • Linus Torvalds's avatar
      Merge tag 'moduleparam-for-linus' of... · 857505fa
      Linus Torvalds authored
      Merge tag 'moduleparam-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
      
      Pull minor module param fixes from Rusty Russell:
       "One bugfix for multiple moduleparam levels, one removal of overzealous
        printk."
      
      * tag 'moduleparam-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
        init: Drop initcall level output
        module_param: stop double-calling parameters.
      857505fa
    • Don Zickus's avatar
      x86/nmi: Fix section mismatch warnings on 32-bit · eeaaa96a
      Don Zickus authored
      It was reported that compiling for 32-bit caused a bunch of
      section mismatch warnings:
      
       VDSOSYM arch/x86/vdso/vdso32-syms.lds
        LD      arch/x86/vdso/built-in.o
        LD      arch/x86/built-in.o
      
       WARNING: arch/x86/built-in.o(.data+0x5af0): Section mismatch in
       reference from the variable test_nmi_ipi_callback_na.10451 to
       the function .init.text:test_nmi_ipi_callback() [...]
      
       WARNING: arch/x86/built-in.o(.data+0x5b04): Section mismatch in
       reference from the variable nmi_unk_cb_na.10399 to the function
       .init.text:nmi_unk_cb() The variable nmi_unk_cb_na.10399
       references the function __init nmi_unk_cb() [...]
      
      Both of these are attributed to the internal representation of
      the nmiaction struct created during register_nmi_handler.  The
      reason for this is that those structs are not defined in the
      init section whereas the rest of the code in nmi_selftest.c is.
      
      To resolve this, I created a new #define,
      register_nmi_handler_initonly, that tags the struct as
      __initdata to resolve the mismatch.  This #define should only be
      used in rare situations where the register/unregister is called
      during init of the kernel.
      
      Big thanks to Jan Beulich for decoding this for me as I didn't
      have a clue what was going on.
      Reported-by: default avatarWitold Baryluk <baryluk@smp.if.uj.edu.pl>
      Tested-by: default avatarWitold Baryluk <baryluk@smp.if.uj.edu.pl>
      Cc: Jan Beulich <JBeulich@suse.com>
      Signed-off-by: default avatarDon Zickus <dzickus@redhat.com>
      Link: http://lkml.kernel.org/r/1338991542-23000-1-git-send-email-dzickus@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      eeaaa96a
    • Steffen Rumler's avatar
      powerpc: Fix kernel panic during kernel module load · 3c752965
      Steffen Rumler authored
      This fixes a problem which can causes kernel oopses while loading
      a kernel module.
      
      According to the PowerPC EABI specification, GPR r11 is assigned
      the dedicated function to point to the previous stack frame.
      In the powerpc-specific kernel module loader, do_plt_call()
      (in arch/powerpc/kernel/module_32.c), GPR r11 is also used
      to generate trampoline code.
      
      This combination crashes the kernel, in the case where the compiler
      chooses to use a helper function for saving GPRs on entry, and the
      module loader has placed the .init.text section far away from the
      .text section, meaning that it has to generate a trampoline for
      functions in the .init.text section to call the GPR save helper.
      Because the trampoline trashes r11, references to the stack frame
      using r11 can cause an oops.
      
      The fix just uses GPR r12 instead of GPR r11 for generating the
      trampoline code.  According to the statements from Freescale, this is
      safe from an EABI perspective.
      
      I've tested the fix for kernel 2.6.33 on MPC8541.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSteffen Rumler <steffen.rumler.ext@nsn.com>
      [paulus@samba.org: reworded the description]
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      3c752965
    • Cliff Wickman's avatar
      x86/uv: Fix UV2 BAU legacy mode · d5d2d2ee
      Cliff Wickman authored
      The SGI Altix UV2 BAU (Broadcast Assist Unit) as used for
      tlb-shootdown (selective broadcast mode) always uses UV2
      broadcast descriptor format. There is no need to clear the
      'legacy' (UV1) mode, because the hardware always uses UV2 mode
      for selective broadcast.
      
      But the BIOS uses general broadcast and legacy mode, and the
      hardware pays attention to the legacy mode bit for general
      broadcast. So the kernel must not clear that mode bit.
      Signed-off-by: default avatarCliff Wickman <cpw@sgi.com>
      Cc: <stable@kernel.org>
      Link: http://lkml.kernel.org/r/E1SccoO-0002Lh-Cb@eag09.americas.sgi.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      d5d2d2ee
    • Yinghai Lu's avatar
      x86/mm: Only add extra pages count for the first memory range during... · bd2753b2
      Yinghai Lu authored
      x86/mm: Only add extra pages count for the first memory range during pre-allocation early page table space
      
      Robin found this regression:
      
      | I just tried to boot an 8TB system.  It fails very early in boot with:
      | Kernel panic - not syncing: Cannot find space for the kernel page tables
      
      git bisect commit 722bc6b1.
      
      A git revert of that commit does boot past that point on the 8TB
      configuration.
      
      That commit will add up extra pages for all memory range even
      above 4g.
      
      Try to limit that extra page count adding to first entry only.
      Bisected-by: default avatarRobin Holt <holt@sgi.com>
      Tested-by: default avatarRobin Holt <holt@sgi.com>
      Signed-off-by: default avatarYinghai Lu <yinghai@kernel.org>
      Cc: WANG Cong <xiyou.wangcong@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/CAE9FiQUj3wyzQxtq9yzBNc9u220p8JZ1FYHG7t%3DMOzJ%3D9BZMYA@mail.gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      bd2753b2
    • Dave Airlie's avatar
      Merge branch 'exynos-drm-fixes' of... · 2d5c7cd3
      Dave Airlie authored
      Merge branch 'exynos-drm-fixes' of git://git.infradead.org/users/kmpark/linux-samsung into drm-fixes
      
      * 'exynos-drm-fixes' of git://git.infradead.org/users/kmpark/linux-samsung:
        drm/exynos: fixed blending for hdmi graphic layer
        drm/exynos: Remove dummy encoder get_crtc operation implementation
        drm/exynos: Keep a reference to frame buffer GEM objects
        drm/exynos: Don't cast GEM object to Exynos GEM object when not needed
        drm/exynos: DRIVER_BUS_PLATFORM is not a driver feature
        drm/exynos: fixed size type.
        drm/exynos: Use DRM_FORMAT_{NV12, YUV420} instead of DRM_FORMAT_{NV12M, YUV420M}
      2d5c7cd3
    • Dave Airlie's avatar
      Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes · 6cf98d6e
      Dave Airlie authored
      * 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
        drm/i915: pch_irq_handler -> {ibx, cpt}_irq_handler
        char/agp: add another Ironlake host bridge
        drm/i915: fix up ivb plane 3 pageflips
        drm/i915: hold forcewake around ring hw init
        drm/i915: Mark the ringbuffers as being in the GTT domain
        drm/i915/crt: Do not rely upon the HPD presence pin
        drm/i915: Reset last_retired_head when resetting ring
      6cf98d6e
    • Borislav Petkov's avatar
      init: Drop initcall level output · 19efb72f
      Borislav Petkov authored
      9fb48c74 ("params: add 3rd arg to option handler callback
      signature") added similar lines to dmesg:
      
      initlevel:0=early, 4 registered initcalls
      initlevel:1=core, 31 registered initcalls
      initlevel:2=postcore, 11 registered initcalls
      initlevel:3=arch, 7 registered initcalls
      initlevel:4=subsys, 40 registered initcalls
      initlevel:5=fs, 30 registered initcalls
      initlevel:6=device, 250 registered initcalls
      initlevel:7=late, 35 registered initcalls
      
      but they don't contain any info for the general user staring at dmesg.
      I'm very doubtful the count of initcalls registered per level helps
      anyone so drop that output completely.
      
      Cc: Jim Cromie <jim.cromie@gmail.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jason Baron <jbaron@redhat.com>
      Signed-off-by: default avatarBorislav Petkov <borislav.petkov@amd.com>
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      19efb72f
    • Rusty Russell's avatar
      module_param: stop double-calling parameters. · ae82fdb1
      Rusty Russell authored
      Commit 026cee00 "params:
      <level>_initcall-like kernel parameters" set old-style module
      parameters to level 0.  And we call those level 0 calls where we used
      to, early in start_kernel().
      
      We also loop through the initcall levels and call the levelled
      module_params before the corresponding initcall.  Unfortunately level
      0 is early_init(), so we call the standard module_param calls twice.
      
      (Turns out most things don't care, but at least ubi.mtd does).
      
      Change the level to -1 for standard module_param calls.
      Reported-by: default avatarBenoît Thébaudeau <benoit.thebaudeau@advansee.com>
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Cc: stable@kernel.org
      ae82fdb1
    • Paul Mackerras's avatar
      powerpc/time: Sanity check of decrementer expiration is necessary · 860aed25
      Paul Mackerras authored
      This reverts 68568add ("powerpc/time: Remove unnecessary sanity check
      of decrementer expiration").  We do need to check whether we have reached
      the expiration time of the next event, because we sometimes get an early
      decrementer interrupt, most notably when we set the decrementer to 1 in
      arch_irq_work_raise().  The effect of not having the sanity check is that
      if timer_interrupt() gets called early, we leave the decrementer set to
      its maximum value, which means we then don't get any more decrementer
      interrupts for about 4 seconds (or longer, depending on timebase
      frequency).  I saw these pauses as a consequence of getting a stray
      hypervisor decrementer interrupt left over from exiting a KVM guest.
      
      This isn't quite a straight revert because of changes to the surrounding
      code, but it restores the same algorithm as was previously used.
      
      Cc: stable@vger.kernel.org
      Acked-by: default avatarAnton Blanchard <anton@samba.org>
      Acked-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      860aed25
    • Linus Torvalds's avatar
      Revert "mm: correctly synchronize rss-counters at exit/exec" · 48d212a2
      Linus Torvalds authored
      This reverts commit 40af1bbd.
      
      It's horribly and utterly broken for at least the following reasons:
      
       - calling sync_mm_rss() from mmput() is fundamentally wrong, because
         there's absolutely no reason to believe that the task that does the
         mmput() always does it on its own VM.  Example: fork, ptrace, /proc -
         you name it.
      
       - calling it *after* having done mmdrop() on it is doubly insane, since
         the mm struct may well be gone now.
      
       - testing mm against NULL before you call it is insane too, since a
      NULL mm there would have caused oopses long before.
      
      .. and those are just the three bugs I found before I decided to give up
      looking for me and revert it asap.  I should have caught it before I
      even took it, but I trusted Andrew too much.
      
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      48d212a2
  7. 07 Jun, 2012 2 commits