1. 20 Feb, 2013 4 commits
    • Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d652e1eb
      Linus Torvalds authored
      Pull scheduler changes from Ingo Molnar:
       "Main changes:
      
         - scheduler side full-dynticks (user-space execution is undisturbed
           and receives no timer IRQs) preparation changes that convert the
           cputime accounting code to be full-dynticks ready, from Frederic
           Weisbecker.
      
         - Initial sched.h split-up changes, by Clark Williams
      
         - select_idle_sibling() performance improvement by Mike Galbraith:
      
              " 1 tbench pair (worst case) in a 10 core + SMT package:
      
                pre   15.22 MB/sec 1 procs
                post 252.01 MB/sec 1 procs "
      
        - sched_rr_get_interval() ABI fix/change.  We think this detail is not
          used by apps (so it's not an ABI in practice), but let's keep it
          under observation.
      
        - misc RT scheduling cleanups, optimizations"
      
      * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
        sched/rt: Add <linux/sched/rt.h> header to <linux/init_task.h>
        cputime: Remove irqsave from seqlock readers
        sched, powerpc: Fix sched.h split-up build failure
        cputime: Restore CPU_ACCOUNTING config defaults for PPC64
        sched/rt: Move rt specific bits into new header file
        sched/rt: Add a tuning knob to allow changing SCHED_RR timeslice
        sched: Move sched.h sysctl bits into separate header
        sched: Fix signedness bug in yield_to()
        sched: Fix select_idle_sibling() bouncing cow syndrome
        sched/rt: Further simplify pick_rt_task()
        sched/rt: Do not account zero delta_exec in update_curr_rt()
        cputime: Safely read cputime of full dynticks CPUs
        kvm: Prepare to add generic guest entry/exit callbacks
        cputime: Use accessors to read task cputime stats
        cputime: Allow dynamic switch between tick/virtual based cputime accounting
        cputime: Generic on-demand virtual cputime accounting
        cputime: Move default nsecs_to_cputime() to jiffies based cputime file
        cputime: Librarize per nsecs resolution cputime definitions
        cputime: Avoid multiplication overflow on utime scaling
        context_tracking: Export context state for generic vtime
        ...
      
      Fix up conflict in kernel/context_tracking.c due to comment additions.
    • Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8f55cea4
      Linus Torvalds authored
      Pull perf changes from Ingo Molnar:
       "There are lots of improvements, the biggest changes are:
      
        Main kernel side changes:
      
         - Improve uprobes performance by adding 'pre-filtering' support, by
           Oleg Nesterov.
      
         - Make some POWER7 events available in sysfs, equivalent to what was
           done on x86, from Sukadev Bhattiprolu.
      
         - tracing updates by Steve Rostedt - mostly misc fixes and smaller
           improvements.
      
         - Use perf/event tracing to report PCI Express advanced errors, by
           Tony Luck.
      
         - Enable northbridge performance counters on AMD family 15h, by Jacob
           Shin.
      
         - This tracing commit:
      
              tracing: Remove the extra 4 bytes of padding in events
      
           changes the ABI.  All involved parties (PowerTop in particular)
           seem to agree that it's safe to do now with the introduction of
           libtraceevent, but the devil is in the details ...
      
        Main tooling side changes:
      
         - Add 'event group view', from Namhyung Kim:
      
           To use it, 'perf record' should group events when recording.
           'perf report' then parses the saved group relation from the file
           header and prints the events together if the --group option is
           provided.  You can use the 'perf evlist' command to see event
           group information:
      
              $ perf record -e '{ref-cycles,cycles}' noploop 1
              [ perf record: Woken up 2 times to write data ]
              [ perf record: Captured and wrote 0.385 MB perf.data (~16807 samples) ]
      
              $ perf evlist --group
              {ref-cycles,cycles}
      
           With this example, default perf report will show you each event
           separately.
      
           You can use --group option to enable event group view:
      
              $ perf report --group
              ...
              # group: {ref-cycles,cycles}
              # ========
              # Samples: 7K of event 'anon group { ref-cycles, cycles }'
              # Event count (approx.): 6876107743
              #
              #         Overhead  Command      Shared Object                      Symbol
              # ................  .......  .................  ..........................
                  99.84%  99.76%  noploop  noploop            [.] main
                   0.07%   0.00%  noploop  ld-2.15.so         [.] strcmp
                   0.03%   0.00%  noploop  [kernel.kallsyms]  [k] timerqueue_del
                   0.03%   0.03%  noploop  [kernel.kallsyms]  [k] sched_clock_cpu
                   0.02%   0.00%  noploop  [kernel.kallsyms]  [k] account_user_time
                   0.01%   0.00%  noploop  [kernel.kallsyms]  [k] __alloc_pages_nodemask
                   0.00%   0.00%  noploop  [kernel.kallsyms]  [k] native_write_msr_safe
                   0.00%   0.11%  noploop  [kernel.kallsyms]  [k] _raw_spin_lock
                   0.00%   0.06%  noploop  [kernel.kallsyms]  [k] find_get_page
                   0.00%   0.02%  noploop  [kernel.kallsyms]  [k] rcu_check_callbacks
                   0.00%   0.02%  noploop  [kernel.kallsyms]  [k] __current_kernel_time
      
           As you can see, the Overhead column now contains both ref-cycles
           and cycles, and the header line also shows the group information:
           'anon group { ref-cycles, cycles }'.  The output is sorted by the
           period of the group leader first.
      
         - Initial GTK+ annotate browser, from Namhyung Kim.
      
         - Add option for runtime switching perf data file in perf report,
           just press 's' and a menu with the valid files found in the current
           directory will be presented, from Feng Tang.
      
         - Add support to display whole group data for raw columns, from Jiri
           Olsa.
      
         - Add per processor socket count aggregation in perf stat, from
           Stephane Eranian.
      
         - Add interval printing in 'perf stat', from Stephane Eranian.
      
         - 'perf test' improvements
      
         - Add support for wildcards in tracepoint system name, from Jiri
           Olsa.
      
         - Add anonymous huge page recognition, from Joshua Zhu.
      
         - perf build-id cache now can show DSOs present in a perf.data file
           that are not in the cache, to integrate with build-id servers being
           put in place by organizations such as Fedora.
      
         - perf top now shares more of the evsel config/creation routines with
           'record', paving the way for further integration like 'top'
           snapshots, etc.
      
         - perf top now supports DWARF callchains.
      
         - Fix mmap limitations on 32-bit, fix from David Miller.
      
         - 'perf bench numa mem' NUMA performance measurement suite
      
         - ... and lots of fixes, performance improvements, cleanups and other
           improvements I failed to list - see the shortlog and git log for
           details."
      
      * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (270 commits)
        perf/x86/amd: Enable northbridge performance counters on AMD family 15h
        perf/hwbp: Fix cleanup in case of kzalloc failure
        perf tools: Fix build with bison 2.3 and older.
        perf tools: Limit unwind support to x86 archs
        perf annotate: Make it to be able to skip unannotatable symbols
        perf gtk/annotate: Fail early if it can't annotate
        perf gtk/annotate: Show source lines with gray color
        perf gtk/annotate: Support multiple event annotation
        perf ui/gtk: Implement basic GTK2 annotation browser
        perf annotate: Fix warning message on a missing vmlinux
        perf buildid-cache: Add --update option
        uprobes/perf: Avoid uprobe_apply() whenever possible
        uprobes/perf: Teach trace_uprobe/perf code to use UPROBE_HANDLER_REMOVE
        uprobes/perf: Teach trace_uprobe/perf code to pre-filter
        uprobes/perf: Teach trace_uprobe/perf code to track the active perf_event's
        uprobes: Introduce uprobe_apply()
        perf: Introduce hw_perf_event->tp_target and ->tp_list
        uprobes/perf: Always increment trace_uprobe->nhit
        uprobes/tracing: Kill uprobe_trace_consumer, embed uprobe_consumer into trace_uprobe
        uprobes/tracing: Introduce is_trace_uprobe_enabled()
        ...
    • Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b7133a9a
      Linus Torvalds authored
      Pull irq core changes from Ingo Molnar:
       "The biggest changes are the IRQ-work and printk changes from Frederic
        Weisbecker, which prepare the code for 'full dynticks' (the ability to
        stop or slow down the periodic tick arbitrarily, not just in idle time
        as today):
      
         - Don't stop tick with irq works pending.  This fix is generally
           useful and concerns archs that can't raise self IPIs.
      
         - Flush irq works before CPU offlining.
      
         - Introduce "lazy" irq works that can wait for the next tick to be
           executed, unless it's stopped.
      
         - Implement klogd wake up using irq work.  This removes the ad-hoc
           printk_tick()/printk_needs_cpu() hooks and makes it work even in
           dynticks mode.
      
         - Cleanups and fixes."
      
      * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq: Export enable/disable_percpu_irq()
        arch Kconfig: Remove references to IRQ_PER_CPU
        irq_work: Remove return value from the irq_work_queue() function
        genirq: Avoid deadlock in spurious handling
        printk: Wake up klogd using irq_work
        irq_work: Make self-IPIs optable
        irq_work: Warn if there's still work on cpu_down
        irq_work: Flush work on CPU_DYING
        irq_work: Don't stop the tick with pending works
        nohz: Add API to check tick state
        irq_work: Remove CONFIG_HAVE_IRQ_WORK
        irq_work: Fix racy check on work pending flag
        irq_work: Fix racy IRQ_WORK_BUSY flag setting
    • Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e84cf5d0
      Linus Torvalds authored
      Pull RCU changes from Ingo Molnar:
       "SRCU changes:
      
         - These include debugging aids, updates that move towards the goal of
           permitting srcu_read_lock() and srcu_read_unlock() to be used from
           idle and offline CPUs, and a few small fixes.
      
        Changes to rcutorture and to RCU documentation:
      
         - Posted to LKML at https://lkml.org/lkml/2013/1/26/188
      
        Enhancements to uniprocessor handling in tiny RCU:
      
         - Posted to LKML at https://lkml.org/lkml/2013/1/27/2
      
        Tag RCU callbacks with grace-period number to simplify callback
        advancement:
      
         - Posted to LKML at https://lkml.org/lkml/2013/1/26/203
      
        Miscellaneous fixes:
      
         - Posted to LKML at https://lkml.org/lkml/2013/1/26/204"
      
      * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
        srcu: use ACCESS_ONCE() to access sp->completed in srcu_read_lock()
        srcu: Update synchronize_srcu_expedited()'s comments
        srcu: Update synchronize_srcu()'s comments
        srcu: Remove checks preventing idle CPUs from calling srcu_read_lock()
        srcu: Remove checks preventing offline CPUs from calling srcu_read_lock()
        srcu: Simple cleanup for cleanup_srcu_struct()
        srcu: Add might_sleep() annotation to synchronize_srcu()
        srcu: Simplify __srcu_read_unlock() via this_cpu_dec()
        rcu: Allow rcutorture to be built at low optimization levels
        rcu: Make rcutorture's shuffler task shuffle recently added tasks
        rcu: Allow TREE_PREEMPT_RCU on UP systems
        rcu: Provide RCU CPU stall warnings for tiny RCU
        context_tracking: Add comments on interface and internals
        rcu: Remove obsolete Kconfig option from comment
        rcu: Remove unused code originally used for context tracking
        rcu: Consolidate debugging Kconfig options
        rcu: Correct 'optimized' to 'optimize' in header comment
        rcu: Trace callback acceleration
        rcu: Tag callback lists with corresponding grace-period number
        rcutorture: Don't compare ptr with 0
        ...
  2. 19 Feb, 2013 2 commits
  3. 18 Feb, 2013 4 commits
  4. 16 Feb, 2013 1 commit
  5. 15 Feb, 2013 10 commits
  6. 14 Feb, 2013 14 commits
  7. 13 Feb, 2013 5 commits
    • xen: remove redundant NULL check before unregister_and_remove_pcpu(). · 4f8c8527
      Cyril Roelandt authored
      unregister_and_remove_pcpu on a NULL pointer is a no-op, so the NULL check in
      sync_pcpu can be removed.
      Signed-off-by: Cyril Roelandt <tipecaml@gmail.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
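The reasoning in this commit (a callee that already treats NULL as a no-op makes the caller's NULL check redundant) can be sketched in plain C. This is only an illustration under assumptions: `struct demo_pcpu`, `demo_unregister_and_remove()`, `demo_sync()` and `demo_removed_count` are hypothetical names invented here, not the Xen driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-CPU record; not the real Xen structure. */
struct demo_pcpu {
    int cpu_id;
};

/* Counts how many records were actually torn down (for illustration only). */
static int demo_removed_count;

/* Like free(), this helper tolerates NULL itself: a NULL argument is a no-op. */
static void demo_unregister_and_remove(struct demo_pcpu *p)
{
    if (p == NULL)
        return;
    demo_removed_count++;
    free(p);
}

/* Caller after the cleanup: no "if (p)" guard is needed before the call. */
static void demo_sync(struct demo_pcpu *p, int still_present)
{
    if (!still_present)
        demo_unregister_and_remove(p);
}

/* Small usage example: the NULL case and the non-NULL case. */
static void demo_check(void)
{
    struct demo_pcpu *p = malloc(sizeof(*p));

    assert(p != NULL);
    demo_removed_count = 0;
    demo_sync(NULL, 0);   /* safe: the callee ignores NULL */
    demo_sync(p, 0);      /* frees p and bumps the counter */
    assert(demo_removed_count == 1);
}
```

Keeping the NULL handling in one place (the callee) means callers cannot get it wrong, which is the same convention `free()` follows.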
    • x86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS. · 13d2b4d1
      Jan Beulich authored
      This fixes CVE-2013-0228 / XSA-42
      
      Drew Jones, while working on CVE-2013-0190, found that an unprivileged guest user
      in a 32-bit PV guest can crash the guest with a panic like this:
      
      -------------
      general protection fault: 0000 [#1] SMP
      last sysfs file: /sys/devices/vbd-51712/block/xvda/dev
      Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
      iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
      xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xen_netfront ext4
      mbcache jbd2 xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last
      unloaded: scsi_wait_scan]
      
      Pid: 1250, comm: r Not tainted 2.6.32-356.el6.i686 #1
      EIP: 0061:[<c0407462>] EFLAGS: 00010086 CPU: 0
      EIP is at xen_iret+0x12/0x2b
      EAX: eb8d0000 EBX: 00000001 ECX: 08049860 EDX: 00000010
      ESI: 00000000 EDI: 003d0f00 EBP: b77f8388 ESP: eb8d1fe0
       DS: 0000 ES: 007b FS: 0000 GS: 00e0 SS: 0069
      Process r (pid: 1250, ti=eb8d0000 task=c2953550 task.ti=eb8d0000)
      Stack:
       00000000 0027f416 00000073 00000206 b77f8364 0000007b 00000000 00000000
      Call Trace:
      Code: c3 8b 44 24 18 81 4c 24 38 00 02 00 00 8d 64 24 30 e9 03 00 00 00
      8d 76 00 f7 44 24 08 00 00 02 80 75 33 50 b8 00 e0 ff ff 21 e0 <8b> 40
      10 8b 04 85 a0 f6 ab c0 8b 80 0c b0 b3 c0 f6 44 24 0d 02
      EIP: [<c0407462>] xen_iret+0x12/0x2b SS:ESP 0069:eb8d1fe0
      general protection fault: 0000 [#2]
      ---[ end trace ab0d29a492dcd330 ]---
      Kernel panic - not syncing: Fatal exception
      Pid: 1250, comm: r Tainted: G      D    ---------------
      2.6.32-356.el6.i686 #1
      Call Trace:
       [<c08476df>] ? panic+0x6e/0x122
       [<c084b63c>] ? oops_end+0xbc/0xd0
       [<c084b260>] ? do_general_protection+0x0/0x210
       [<c084a9b7>] ? error_code+0x73/
      -------------
      
      Petr says: "
       I've analysed the bug and I think that xen_iret() cannot cope with
       mangled DS, in this case zeroed out (null selector/descriptor) by either
       xen_failsafe_callback() or RESTORE_REGS because the corresponding LDT
       entry was invalidated by the reproducer. "
      
      Jan took a look at the preliminary patch and came up with a fix that solves
      this problem:
      
      "This code gets called after all registers other than those handled by
      IRET got already restored, hence a null selector in %ds or a non-null
      one that got loaded from a code or read-only data descriptor would
      cause a kernel mode fault (with the potential of crashing the kernel
      as a whole, if panic_on_oops is set)."
      
      The way to fix this is to realize that we can only rely on the
      registers that IRET restores. The two that are guaranteed are the
      %cs and %ss as they are always fixed GDT selectors. Also they are
      inaccessible from user mode - so they cannot be altered. This is
      the approach taken in this patch.
      
      An alternative suggested by Jan would be to rely on
      the subtle realization that using the %ebp or %esp relative references uses
      the %ss segment.  In which case we could switch from using %eax to %ebp and
      would not need the %ss over-rides. That would also require one extra
      instruction to compensate for the one place where the register is used
      as a scaled index. However, Andrew pointed out that this is too subtle: if
      further work were done in this code path, it could escape people's attention
      and lead to accidents.
      Reviewed-by: Petr Matousek <pmatouse@redhat.com>
      Reported-by: Petr Matousek <pmatouse@redhat.com>
      Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • sparc64: Fix get_user_pages_fast() wrt. THP. · 89a77915
      David S. Miller authored
      Mostly mirrors the s390 logic, as unlike x86 we don't need the
      SetPageReferenced() bits.
      
      On sparc64 we also lack a user/privileged bit in the huge PMDs.
      
      In order to make this work for THP and non-THP builds, some header
      file adjustments were necessary.  Namely, provide the PMD_HUGE_* bit
      defines and the pmd_large() inline unconditionally rather than
      protected by TRANSPARENT_HUGEPAGE.
      Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc64: Add missing HAVE_ARCH_TRANSPARENT_HUGEPAGE. · b9156ebb
      David S. Miller authored
      This got missed in the cleanups done for the S390 THP
      support.
      
      CC: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 323a72d8
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "This is primarily to get those r8169 reverts sorted, but other fixes
        have accumulated meanwhile.
      
         1) Revert two r8169 changes to fix suspend/resume for some users,
            from Francois Romieu.
      
         2) PCI dma mapping errors in atl1c are not checked for and this causes
            hard crashes for some users, from Xiong Huang.
      
         3) In 3.8.x we merged the removal of the EXPERIMENTAL dependency for
            'dlm' but the same patch for 'sctp' got lost somewhere, resulting
            in the potential for build errors since there are cross
            dependencies.  From Kees Cook.
      
         4) SCTP's ipv6 socket route validation makes boolean tests
            incorrectly, fix from Daniel Borkmann.
      
         5) mac80211 does sizeof(ptr) instead of (sizeof(ptr) * nelem), from
            Cong Ding.
      
         6) arp_rcv() can crash on shared non-linear packets, from Eric
            Dumazet.
      
         7) Avoid crashes in macvtap by setting ->gso_type consistently in
            ixgbe, qlcnic, and bnx2x drivers.  From Michael S Tsirkin and
            Alexander Duyck.
      
         8) Trinity fuzzer spots infinite loop in __skb_recv_datagram(), fix
            from Eric Dumazet.
      
         9) STP protocol frames should use high packet priority, otherwise an
            overloaded bridge can get stuck.  From Stephen Hemminger.
      
        10) The HTB packet scheduler was converted some time ago to store
            internal timestamps in nanoseconds, but we don't convert back into
            psched ticks for the user during dumps.  Fix from Jiri Pirko.
      
        11) mwl8k channel table doesn't set the .band field properly,
            resulting in NULL pointer derefs.  Fix from Jonas Gorski.
      
        12) mac80211 doesn't accumulate channels properly during a scan so we
            can downgrade heavily to a much less desirable connection speed.
            Fix from Johannes Berg.
      
        13) PHY probe failure in stmmac can result in resource leaks and
            double MDIO registration later, from Giuseppe CAVALLARO.
      
        14) Correct ipv6 checksumming in ip6t_NPT netfilter module, also fix
            address prefix mangling, from YOSHIFUJI Hideaki."
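Item 5 above describes a common C bug class: using the size of a pointer where the size of the whole array was intended. The sketch below illustrates that bug class only. It is not the mac80211 patch, and every name in it (`struct demo_chan`, `demo_bytes_buggy()`, `demo_bytes_fixed()`, `demo_dup()`) is hypothetical. It uses the generic "element size times element count" idiom, which may differ from the exact expression in the real fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical element type; stands in for whatever the array holds. */
struct demo_chan {
    int freq;
    int flags;
};

/* Buggy pattern: sizeof(src) is the size of the pointer, not of the array. */
static size_t demo_bytes_buggy(const struct demo_chan *src, size_t nelem)
{
    (void)nelem;            /* the element count is (wrongly) ignored */
    return sizeof(src);
}

/* Intended pattern: element size multiplied by the number of elements. */
static size_t demo_bytes_fixed(const struct demo_chan *src, size_t nelem)
{
    return sizeof(*src) * nelem;
}

/* Duplicate an array of nelem elements using the corrected byte count. */
static struct demo_chan *demo_dup(const struct demo_chan *src, size_t nelem)
{
    size_t bytes = demo_bytes_fixed(src, nelem);
    struct demo_chan *copy = malloc(bytes);

    if (copy != NULL)
        memcpy(copy, src, bytes);
    return copy;
}
```

With the buggy size, a copy of a four-element array would move only as many bytes as one pointer occupies, silently leaving the rest of the destination uninitialized.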
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
        net, sctp: remove CONFIG_EXPERIMENTAL
        net: sctp: sctp_v6_get_dst: fix boolean test in dst cache
        batman-adv: Fix NULL pointer dereference in DAT hash collision avoidance
        net/macb: fix race with RX interrupt while doing NAPI
        atl1c: add error checking for pci_map_single functions
        htb: fix values in opt dump
        ixgbe: Only set gso_type to SKB_GSO_TCPV4 as RSC does not support IPv6
        net: fix infinite loop in __skb_recv_datagram()
        net: qmi_wwan: add Yota / Megafon M100-1 4g modem
        mwl8k: fix band for supported channels
        bridge: set priority of STP packets
        mac80211: fix channel selection bug
        arp: fix possible crash in arp_rcv()
        bnx2x: set gso_type
        qlcnic: set gso_type
        ixgbe: fix gso type
        stmmac: mdio register has to fail if the phy is not found
        stmmac: fix macro used for debugging the xmit
        Revert "r8169: enable internal ASPM and clock request settings".
        Revert "r8169: enable ALDPS for power saving".
        ...