1. 20 Mar, 2012 5 commits
    • Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9c2b957d
      Linus Torvalds authored
      Pull perf events changes for v3.4 from Ingo Molnar:
      
       - New "hardware based branch profiling" feature both on the kernel and
         the tooling side, on CPUs that support it.  (modern x86 Intel CPUs
         with the 'LBR' hardware feature currently.)
      
         This new feature is basically a sophisticated 'magnifying glass' for
         branch execution - something that is pretty difficult to extract from
         regular, function histogram centric profiles.
      
         The simplest mode is activated via 'perf record -b', and the result
         looks like this in perf report:
      
      	$ perf record -b any_call,u -e cycles:u branchy
      
      	$ perf report -b --sort=symbol
      	    52.34%  [.] main                   [.] f1
      	    24.04%  [.] f1                     [.] f3
      	    23.60%  [.] f1                     [.] f2
      	     0.01%  [k] _IO_new_file_xsputn    [k] _IO_file_overflow
      	     0.01%  [k] _IO_vfprintf_internal  [k] _IO_new_file_xsputn
      	     0.01%  [k] _IO_vfprintf_internal  [k] strchrnul
      	     0.01%  [k] __printf               [k] _IO_vfprintf_internal
      	     0.01%  [k] main                   [k] __printf
      
         This output shows from/to branch columns and shows the highest
         percentage (from,to) jump combinations - i.e.  the most likely taken
         branches in the system.  "branches" can also include function calls
         and any other synchronous and asynchronous transitions of the
         instruction pointer that are not 'next instruction' - such as system
         calls, traps, interrupts, etc.
      
         This feature comes with (hopefully intuitive) flat ascii and TUI
         support in perf report.
      
       - Various 'perf annotate' visual improvements for us assembly junkies.
         It will now recognize function calls in the TUI and by hitting enter
         you can follow the call (recursively) and back, amongst other
         improvements.
      
       - Multiple threads/processes recording support in perf record, perf
         stat, perf top - which is activated via a comma-list of PIDs:
      
      	perf top -p 21483,21485
      	perf stat -p 21483,21485 -ddd
      	perf record -p 21483,21485
      
       - Support for per UID views, via the --uid parameter to perf top, perf
         report, etc.  For example 'perf top --uid mingo' will only show the
         tasks that I am running, excluding other users, root, etc.
      
       - Jump label restructurings and improvements - this includes the
         factoring out of the (hopefully much clearer) include/linux/static_key.h
         generic facility:
      
      	struct static_key key = STATIC_KEY_INIT_FALSE;
      
      	...
      
      	if (static_key_false(&key))
      	        do unlikely code
      	else
      	        do likely code
      
      	...
      	static_key_slow_inc(&key);
      	...
      	static_key_slow_dec(&key);
      	...
      
         The static_key_false() branch will be generated into the code with as
         little impact to the likely code path as possible.  The
         static_key_slow_*() APIs flip the branch via live kernel code patching.
      
         This facility can now be used more widely within the kernel to
         micro-optimize hot branches whose likelihood matches the static-key
         usage and fast/slow cost patterns.
      
       - SW function tracer improvements: perf support and filtering support.
      
       - Various hardenings of the perf.data ABI, to make older perf.data's
         smoother on newer tool versions, to make new features integrate more
         smoothly, to support cross-endian recording/analyzing workflows
         better, etc.
      
       - Restructuring of the kprobes code, the splitting out of 'optprobes',
         and a corner case bugfix.
      
       - Allow the tracing of kernel console output (printk).
      
       - Improvements/fixes to user-space RDPMC support, allowing user-space
         self-profiling code to extract PMU counts without performing any
         system calls, while playing nice with the kernel side.
      
       - 'perf bench' improvements
      
       - ... and lots of internal restructurings, cleanups and fixes that made
         these features possible.  And, as usual, this list is incomplete, as
         there were also lots of other improvements.
      
      * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (120 commits)
        perf report: Fix annotate double quit issue in branch view mode
        perf report: Remove duplicate annotate choice in branch view mode
        perf/x86: Prettify pmu config literals
        perf report: Enable TUI in branch view mode
        perf report: Auto-detect branch stack sampling mode
        perf record: Add HEADER_BRANCH_STACK tag
        perf record: Provide default branch stack sampling mode option
        perf tools: Make perf able to read files from older ABIs
        perf tools: Fix ABI compatibility bug in print_event_desc()
        perf tools: Enable reading of perf.data files from different ABI rev
        perf: Add ABI reference sizes
        perf report: Add support for taken branch sampling
        perf record: Add support for sampling taken branch
        perf tools: Add code to support PERF_SAMPLE_BRANCH_STACK
        x86/kprobes: Split out optprobe related code to kprobes-opt.c
        x86/kprobes: Fix a bug which can modify kernel code permanently
        x86/kprobes: Fix instruction recovery on optimized path
        perf: Add callback to flush branch_stack on context switch
        perf: Disable PERF_SAMPLE_BRANCH_* when not supported
        perf/x86: Add LBR software filter support for Intel CPUs
        ...
    • Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0bbfcaff
      Linus Torvalds authored
      Pull irq/core changes for v3.4 from Ingo Molnar
      
      * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq: Remove paranoid warnons and bogus fixups
        genirq: Flush the irq thread on synchronization
        genirq: Get rid of unnecessary IRQTF_DIED flag
        genirq: No need to check IRQTF_DIED before stopping a thread handler
        genirq: Get rid of unnecessary irqaction field in task_struct
        genirq: Fix incorrect check for forced IRQ thread handler
        softirq: Reduce invoke_softirq() code duplication
        genirq: Fix long-term regression in genirq irq_set_irq_type() handling
        x86-32/irq: Don't switch to irq stack for a user-mode irq
    • Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5928a2b6
      Linus Torvalds authored
      Pull RCU changes for v3.4 from Ingo Molnar.  The major features of this
      series are:
      
       - making RCU more aggressive about entering dyntick-idle mode in order
         to improve energy efficiency
      
       - converting a few more call_rcu()s to kfree_rcu()s
      
       - applying a number of rcutree fixes and cleanups to rcutiny
      
       - removing CONFIG_SMP #ifdefs from treercu
      
       - allowing RCU CPU stall times to be set via sysfs
      
       - adding CPU-stall capability to rcutorture
      
       - adding more RCU-abuse diagnostics
      
       - updating documentation
      
       - fixing yet more issues located by the still-ongoing top-to-bottom
         inspection of RCU, this time with a special focus on the CPU-hotplug
         code path.
      
      * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
        rcu: Stop spurious warnings from synchronize_sched_expedited
        rcu: Hold off RCU_FAST_NO_HZ after timer posted
        rcu: Eliminate softirq-mediated RCU_FAST_NO_HZ idle-entry loop
        rcu: Add RCU_NONIDLE() for idle-loop RCU read-side critical sections
        rcu: Allow nesting of rcu_idle_enter() and rcu_idle_exit()
        rcu: Remove redundant check for rcu_head misalignment
        PTR_ERR should be called before its argument is cleared.
        rcu: Convert WARN_ON_ONCE() in rcu_lock_acquire() to lockdep
        rcu: Trace only after NULL-pointer check
        rcu: Call out dangers of expedited RCU primitives
        rcu: Rework detection of use of RCU by offline CPUs
        lockdep: Add CPU-idle/offline warning to lockdep-RCU splat
        rcu: No interrupt disabling for rcu_prepare_for_idle()
        rcu: Move synchronize_sched_expedited() to rcutree.c
        rcu: Check for illegal use of RCU from offlined CPUs
        rcu: Update stall-warning documentation
        rcu: Add CPU-stall capability to rcutorture
        rcu: Make documentation give more realistic rcutorture duration
        rcutorture: Permit holding off CPU-hotplug operations during boot
        rcu: Print scheduling-clock information on RCU CPU stall-warning messages
        ...
    • Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5ed59af8
      Linus Torvalds authored
      Pull core/locking changes for v3.4 from Ingo Molnar
      
      * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        futex: Simplify return logic
        futex: Cover all PI opcodes with cmpxchg enabled check
    • Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b7f077d7
      Linus Torvalds authored
      Pull core/iommu changes for v3.4 from Ingo Molnar
      
      * 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/iommu/intel: Increase the number of iommus supported to MAX_IO_APICS
        x86/iommu/intel: Fix identity mapping for sandy bridge
  2. 19 Mar, 2012 2 commits
    • Merge branch 'dcache-word-accesses' · b0e37d7a
      Linus Torvalds authored
      * branch 'dcache-word-accesses':
        vfs: use 'unsigned long' accesses for dcache name comparison and hashing
      
      This does the name hashing and lookup using word-sized accesses when
      that is efficient, namely on x86 (although any little-endian machine
      with good unaligned accesses would do).
      
      It does very much depend on little-endian logic, but it's a very hot
      couple of functions under some real loads, and this patch improves the
      performance of __d_lookup_rcu() and link_path_walk() by up to about 30%,
      giving a 10% improvement on some very pathname-heavy benchmarks.
      
      Because we do make unaligned accesses past the filename, the
      optimization is disabled when CONFIG_DEBUG_PAGEALLOC is active, and we
      effectively depend on the fact that on x86 we don't really ever have the
      last page of usable RAM followed immediately by any IO memory (due to
      ACPI tables, BIOS buffer areas etc).
      
      Some of the bit operations we do are a bit "subtle".  It's commented,
      but you do need to really think about the code.  Or just consider it
      black magic.
      
      Thanks to people on G+ for some of the optimized bit tricks.
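
The word-sized access idea described above can be sketched in user space. This is only an illustration, not the code from the commit: the helper names (has_zero_byte, name_len_wordwise, names_equal_wordwise) are hypothetical, it assumes a little-endian machine, and it assumes every buffer is readable up to the next multiple of 8 bytes (the commit instead relies on the page-layout argument given above). __builtin_ctzll is a GCC/Clang builtin.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Nonzero iff at least one byte of v is 0x00 (classic bit trick). */
static inline uint64_t has_zero_byte(uint64_t v)
{
    return (v - ONES) & ~v & HIGHS;
}

/*
 * Length of a NUL-terminated name, scanning 8 bytes per step.
 * Assumption: the buffer is readable in whole 8-byte chunks from s.
 * On little-endian, the lowest set bit of the mask marks the first NUL.
 */
static size_t name_len_wordwise(const char *s)
{
    size_t n = 0;

    for (;;) {
        uint64_t w, z;

        memcpy(&w, s + n, sizeof(w)); /* well-defined unaligned load */
        z = has_zero_byte(w);
        if (z)
            return n + (size_t)(__builtin_ctzll(z) / 8);
        n += sizeof(w);
    }
}

/*
 * Compare the first len bytes of a and b one word at a time.
 * The trailing partial word is loaded in full and the bytes past
 * len are masked off (low-order bytes come first on little-endian).
 */
static int names_equal_wordwise(const char *a, const char *b, size_t len)
{
    uint64_t wa, wb, mask;

    while (len >= sizeof(uint64_t)) {
        memcpy(&wa, a, sizeof(wa));
        memcpy(&wb, b, sizeof(wb));
        if (wa != wb)
            return 0;
        a += sizeof(wa);
        b += sizeof(wb);
        len -= sizeof(wa);
    }
    if (len == 0)
        return 1;
    memcpy(&wa, a, sizeof(wa));
    memcpy(&wb, b, sizeof(wb));
    mask = ~0ULL >> (8 * (sizeof(uint64_t) - len));
    return ((wa ^ wb) & mask) == 0;
}
```

The final masked load is the part that reads past the end of the name, which is why the real optimization has to be careful about what memory follows a filename.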
    • vfs: get rid of batshit-insane pointless dentry hash calculations · 6d7d1a0d
      Linus Torvalds authored
      For some odd historical reason, the final mixing round for the dentry
      cache hash table lookup had an insane "xor with big constant" logic.  In
      two places.
      
      The big constant that is being xor'ed is GOLDEN_RATIO_PRIME, which is a
      fairly random-looking number that is designed to be *multiplied* with so
      that the bits get spread out over a whole long-word.
      
      But xor'ing with it is insane.  It doesn't really even change the hash -
      it really only shifts the hash around in the hash table.  To make
      matters worse, the insane big constant is different on 32-bit and 64-bit
      builds, even though the name hash bits we use are always 32-bit (and the
      bits from the pointer we mix in effectively are too).
      
      It's all total voodoo programming, in other words.
      
      Now, some testing and analysis of the hash chains shows that the rest of
      the hash function seems to be fairly good.  It does pick the right bits
      of the parent dentry pointer, for example, and while it's generally a
      bad idea to use an xor to mix down the upper bits (because if there is a
      repeating pattern, the xor can cause "destructive interference"), it
      seems to not have been a disaster.
      
      For example, replacing the hash with the normal "hash_long()" code (that
      uses the GOLDEN_RATIO_PRIME constant correctly, btw) actually just makes
      the hash worse.  The hand-picked hash knew which bits of the pointer had
      the highest entropy, and hash_long() ends up mixing bits less optimally
      at least in some trivial tests.
      
      So the hash function overall seems fine, it just has that really odd
      "shift result around by a constant xor".
      
      So get rid of the silly xor, and replace the down-mixing of the bits
      with an add instead of an xor that tends to not have the same kind of
      destructive interference issues.  Some stats on the resulting hash
      chains shows that they look statistically identical before and after,
      but the code is simpler and no longer makes you go "WTF?".
      
      Also, the incoming hash really is just "unsigned int", not a long, and
      there's no real point to worry about the high 26 bits of the dentry
      pointer for the 64-bit case, because they are all going to be identical
      anyway.
      
      So also change the hashing to be done in the more natural 'unsigned int'
      that is the real size of the actual hashed data anyway.
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
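
The claim that xor'ing with a constant only shifts the hash around in the table can be checked with a small user-space sketch. This is not the kernel's d_hash() code: HASH_BITS, the function names and the table size are assumptions made for illustration, and BIG_CONSTANT merely stands in for the "big constant" (any value behaves the same way). Because shifting and masking both distribute over xor, the bucket chosen with the constant is always the bucket chosen without it, xor'ed with one fixed value, so two hashes collide in one scheme exactly when they collide in the other.

```c
#include <stdint.h>

#define HASH_BITS    10u                       /* hypothetical: 1024 buckets */
#define HASH_MASK    ((1u << HASH_BITS) - 1u)
#define BIG_CONSTANT 0x9e370001u               /* stand-in for the big constant */

/* The fixed value by which the constant permutes the bucket index. */
#define XOR_SHIFT    ((BIG_CONSTANT >> HASH_BITS) & HASH_MASK)

/* Fold the upper bits down with xor, mixing in the constant. */
static unsigned int bucket_xor_const(unsigned int hash)
{
    hash = hash ^ ((hash ^ BIG_CONSTANT) >> HASH_BITS);
    return hash & HASH_MASK;
}

/* The same fold without the constant. */
static unsigned int bucket_xor_plain(unsigned int hash)
{
    hash = hash ^ (hash >> HASH_BITS);
    return hash & HASH_MASK;
}

/* Add-based fold, along the lines the message describes. */
static unsigned int bucket_add(unsigned int hash)
{
    hash = hash + (hash >> HASH_BITS);
    return hash & HASH_MASK;
}
```

For every input, bucket_xor_const(h) equals bucket_xor_plain(h) ^ XOR_SHIFT, so the constant renames buckets without changing chain lengths. The add-based fold differs in kind, since carries let the upper bits influence the lower ones instead of cancelling against them.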
  3. 18 Mar, 2012 3 commits
    • Linux 3.3 · c16fa4f2
      Linus Torvalds authored
    • Don't limit non-nested epoll paths · 93dc6107
      Jason Baron authored
      Commit 28d82dc1 ("epoll: limit paths") that I did to limit the
      number of possible wakeup paths in epoll is causing a few applications
      to no longer work (dovecot for one).
      
      The original patch is really about limiting the amount of epoll nesting
      (since epoll fds can be attached to other fds). Thus, we can probably
      allow an unlimited number of paths of depth 1 (my current patch limits
      it at 1000), while still enforcing the limits on paths that have a
      greater depth.
      
      This is captured in: https://bugzilla.redhat.com/show_bug.cgi?id=681578
      
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
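
The policy described in this message (no limit for non-nested paths, limits kept for deeper ones) might be sketched as below. This is a user-space illustration only, not the epoll code: MAX_DEPTH, the function names and every limit value other than the 1000 mentioned above are assumptions.

```c
#include <string.h>

#define MAX_DEPTH 5 /* hypothetical maximum nesting depth */

/*
 * Hypothetical per-depth limits. Index 0 is a path of depth 1; its
 * entry (the 1000 from the text) is kept only to show the old limit.
 */
static const int path_limits[MAX_DEPTH] = { 1000, 500, 100, 50, 10 };
static int path_count[MAX_DEPTH];

/* Clear the counters before a new check. */
static void path_count_reset(void)
{
    memset(path_count, 0, sizeof(path_count));
}

/*
 * Record one more wakeup path at nesting level nests (0 = depth 1).
 * Returns 0 if the path is allowed, -1 if it exceeds the limit.
 */
static int path_count_inc(int nests)
{
    if (nests < 0 || nests >= MAX_DEPTH)
        return -1;
    if (nests == 0) /* non-nested paths are not limited */
        return 0;
    if (++path_count[nests] > path_limits[nests])
        return -1;
    return 0;
}
```

With this shape, an application that registers thousands of plain (non-nested) epoll paths is never rejected, while nested configurations still hit a ceiling.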
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · c579bc7e
      Linus Torvalds authored
      Pull networking changes from David Miller:
       "1) icmp6_dst_alloc() returns NULL instead of ERR_PTR() leading to
           crashes, particularly during shutdown.  Reported by Dave Jones and
           fixed by Eric Dumazet.
      
        2) hyperv and wimax/i2400m return NETDEV_TX_BUSY when they have
           already freed the SKB, which causes crashes as to the caller this
           means requeue the packet.  Fixes from Eric Dumazet.
      
        3) usbnet driver doesn't allocate the right amount of headroom on
           fresh RX SKBs, fix from Eric Dumazet.
      
        4) Fix regression in ip6_mc_find_dev_rcu(), as an RCU lookup it
           absolutely should not take a reference to 'dev', this leads to
           leaks.  Fix from RonQing Li.
      
        5) Fix netfilter ctnetlink race between delete and timeout expiration.
           From Pablo Neira Ayuso.
      
        6) Revert SFQ change which causes regressions, specifically queueing
           to tail can lead to unavoidable flow starvation.  From Eric
           Dumazet.
      
        7) Fix a memory leak and a crash on corrupt firmware files in bnx2x,
           from Michal Schmidt."
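
Item 1 above turns on the difference between returning NULL and returning an ERR_PTR() value. A user-space re-creation of that convention shows why the mismatch crashes. The three helpers mirror the well-known kernel macros, but they are re-implemented here for illustration, and alloc_buggy, alloc_fixed and caller_would_dereference are hypothetical names, not functions from the networking code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True only for pointers in the top MAX_ERRNO addresses. */
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int dummy_object;

/* Hypothetical allocator that signals failure with NULL. */
static void *alloc_buggy(int fail)
{
    return fail ? NULL : &dummy_object;
}

/* Hypothetical allocator that signals failure with ERR_PTR(). */
static void *alloc_fixed(int fail)
{
    return fail ? ERR_PTR(-ENOMEM) : &dummy_object;
}

/*
 * A caller that only checks IS_ERR(), as the convention expects.
 * Returns 1 if it would go on to dereference the pointer.
 */
static int caller_would_dereference(const void *p)
{
    return !IS_ERR(p);
}
```

Since IS_ERR(NULL) is false, a caller written for the ERR_PTR() convention treats a NULL failure as success and dereferences it.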
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        netfilter: ctnetlink: fix race between delete and timeout expiration
        ipv6: Don't dev_hold(dev) in ip6_mc_find_dev_rcu.
        wimax/i2400m: fix erroneous NETDEV_TX_BUSY use
        net/hyperv: fix erroneous NETDEV_TX_BUSY use
        net/usbnet: reserve headroom on rx skbs
        bnx2x: fix memory leak in bnx2x_init_firmware()
        bnx2x: fix a crash on corrupt firmware file
        sch_sfq: revert dont put new flow at the end of flows
        ipv6: fix icmp6_dst_alloc()
  4. 17 Mar, 2012 10 commits
  5. 16 Mar, 2012 20 commits