1. 12 Dec, 2012 40 commits
    • Merge tag 'regmap-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap · 99b8f42e
      Linus Torvalds authored
      Pull regmap updates from Mark Brown:
       "Quite a few enhancements this time around, helpers and diagnostics for
        the most part which is good to see:
      
         - Addition of table based lookups for the register access checks from
           Davide Ciminaghi, making life easier for drivers with big blocks of
           similar registers.
         - Allow drivers to get the irqdomain for regmap irq_chips, allowing
           the domain to be used with other APIs.
         - Debug improvements for paged register maps.
         - Performance improvements for some of the diagnostic infrastructure,
           very helpful for devices with large register maps."
      
      * tag 'regmap-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
        regmap: debugfs: Cache offsets of valid regions for dump
        regmap: debugfs: Factor out initial seek
        regmap: debugfs: Avoid overflows for very small reads
        regmap: Cache register and value sizes for debugfs
        regmap: introduce tables for readable/writeable/volatile/precious checks
        regmap: core: Report registers in hex when we can't cache
        regmap: Fix printing of size_t variable
        regmap: make lock/unlock functions customizable
        regmap: silence GCC warning
        regmap: Split raw writes that cross window boundaries
        regmap: Make return code checks consistent
        regmap: Factor range lookup out of page selection
        regmap: Provide debugfs read of register ranges
        regmap: Factor out debugfs register read
        regmap: Allow ranges to be named
        regmap: When we sanity check during range adds say what errors we find
        regmap: Rename n_ranges to num_ranges
        regmap: irq: Allow users to retrieve the irq_domain
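
      The table-based access checks mentioned above can be modelled in a few
      lines.  This is a userspace sketch in Python, not kernel code: the
      AccessTable type, the function names and the example register ranges
      are all hypothetical, and the rule that exclusions take precedence
      (with an empty inclusion list meaning "anything not excluded") is an
      assumption that should be checked against the regmap source.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive (first_reg, last_reg)


@dataclass
class AccessTable:
    """Hypothetical stand-in for a table of allowed/denied register ranges."""
    yes_ranges: List[Range] = field(default_factory=list)
    no_ranges: List[Range] = field(default_factory=list)


def in_ranges(reg: int, ranges: List[Range]) -> bool:
    """True if reg falls inside any of the inclusive ranges."""
    return any(lo <= reg <= hi for lo, hi in ranges)


def check_table(reg: int, table: AccessTable) -> bool:
    # Assumption: exclusions win over inclusions.
    if in_ranges(reg, table.no_ranges):
        return False
    # Assumption: an empty inclusion list allows everything not excluded.
    if not table.yes_ranges:
        return True
    return in_ranges(reg, table.yes_ranges)


# Example: two big blocks of similar registers with a reserved hole in the first.
READABLE = AccessTable(
    yes_ranges=[(0x00, 0x7F), (0x100, 0x1FF)],
    no_ranges=[(0x40, 0x4F)],
)
```

      A driver describing hundreds of similar registers then needs one table
      per property (readable, writeable, volatile, precious) rather than a
      hand-written callback with a long switch statement.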
    • Merge tag 'please-pull-einj-fix-for-acpi5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras · 139353ff
      Linus Torvalds authored
      Pull ACPI5 error injection fix from Tony Luck:
       "Trivial fix for error injection code using ACPI5 version of EINJ"
      
      * tag 'please-pull-einj-fix-for-acpi5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
        ACPI, APEI, EINJ: Add missed ACPI5 support for error trigger table
    • Merge tag 'please-pull-pstore_mevent' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux · 251a8cfe
      Linus Torvalds authored
      Pull pstore fixes from Tony Luck:
       "Patch series to allow EFI variable backend to pstore to hold multiple
        records."
      
      * tag 'please-pull-pstore_mevent' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
        efi_pstore: Add a format check for an existing variable name at erasing time
        efi_pstore: Add a format check for an existing variable name at reading time
        efi_pstore: Add a sequence counter to a variable name
        efi_pstore: Add ctime to argument of erase callback
        efi_pstore: Remove a logic erasing entries from a write callback to hold multiple logs
        efi_pstore: Add a logic erasing entries to an erase callback
        efi_pstore: Check remaining space with QueryVariableInfo() before writing data
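
      To hold multiple records, each stored variable needs a name that is
      unique per record and that can be recognised again when reading or
      erasing.  The sketch below models that idea in userspace Python.  The
      name layout (type, part, sequence counter, creation time) and every
      identifier in it are hypothetical and chosen only for illustration;
      the real format is defined by the efi_pstore patches themselves.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical layout for this sketch: dump-type<type>-<part>-<count>-<ctime>
NAME_RE = re.compile(r"dump-type(\d+)-(\d+)-(\d+)-(\d+)")


class RecordId(NamedTuple):
    rec_type: int
    part: int
    count: int   # sequence counter, distinguishes records written at the same time
    ctime: int   # creation time in seconds


def make_name(rec_type: int, part: int, count: int, ctime: int) -> str:
    """Build a variable name that embeds the sequence counter."""
    return f"dump-type{rec_type}-{part}-{count}-{ctime}"


def parse_name(name: str) -> Optional[RecordId]:
    """Format check: return the decoded fields, or None for foreign names."""
    m = NAME_RE.fullmatch(name)
    if m is None:
        return None
    return RecordId(*(int(g) for g in m.groups()))
```

      Readers and erasers that skip any name for which parse_name() returns
      None never touch variables that were not written in this format.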
    • Merge tag 'please-pull-misc-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux · 70f2836d
      Linus Torvalds authored
      Pull ia64 fix from Tony Luck:
       "Miscellaneous ia64 fix for 3.8.  Just need to avoid a pending
        namespace collision from other work being merged."
      
      * tag 'please-pull-misc-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
        [IA64] Resolve name space collision for cache_show()
    • Merge tag 'arm64-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64 · 97ebe8f5
      Linus Torvalds authored
      Pull ARM64 updates from Catalin Marinas:
      
       - Generic execve, kernel_thread, fork/vfork/clone.
      
       - Preparatory patches for KVM support (initialising EL2 mode for later
         installing KVM support, hypervisor stub).
      
       - Signal handling corner case fix (alternative signal stack set up for
         a SEGV handler, which is raised in response to RLIMIT_STACK being
         reached).
      
       - Sub-nanosecond timer error fix.
      
      * tag 'arm64-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64: (30 commits)
        arm64: Update the MAINTAINERS entry
        arm64: compat for clock_adjtime(2) is miswired
        arm64: move FP-SIMD save/restore code to a macro
        arm64: hyp: initialize vttbr_el2 to zero
        arm64: add hypervisor stub
        arm64: record boot mode when entering the kernel
        arm64: move vector entry macro to assembler.h
        arm64: add AArch32 execution modes to ptrace.h
        arm64: expand register mapping between AArch32 and AArch64
        arm64: generic timer: use virtual counter instead of physical at EL0
        arm64: vdso: defer shifting of nanosecond component of timespec
        arm64: vdso: rework __do_get_tspec register allocation and return shift
        arm64: vdso: check sequence counter even for coarse realtime operations
        arm64: vdso: fix clocksource mask when extracting bottom 56 bits
        ARM64: Remove incorrect Kconfig symbol HAVE_SPARSE_IRQ
        Documentation: Fixes a word in Documentation/arm64/memory.txt
        arm64: Make !dirty ptes read-only
        arm64: Convert empty flush_cache_{mm,page} functions to static inline
        arm64: signal: let the compiler inline compat_get_sigframe
        arm64: signal: return struct rt_sigframe from get_sigframe
        ...
      
      Conflicts:
      	arch/arm64/include/asm/unistd32.h
    • Merge branch 'omap-serial' of git://git.linaro.org/people/rmk/linux-arm · d07e43d7
      Linus Torvalds authored
      Pull ARM OMAP serial updates from Russell King:
       "This series is a major reworking of the OMAP serial driver code fixing
        various bugs in the hardware-assisted flow control, extending up into
        serial_core for a couple of issues.  These fixes have been done as a
        set of progressive changes and transformations in the hope that no new
        bugs will be introduced by this series.
      
        The problems are many-fold, from the driver not being informed about
        updated settings, to the driver not knowing what the intentions of the
        upper layers are.
      
        The first four patches tackle the serial_core layer, allowing it to
        provide the necessary information to drivers, and the remaining
        patches allow the OMAP serial driver to take advantage of this.
      
        This brings hardware assisted RTS/CTS and XON/XOFF flow control into a
        useful state.
      
        These patches have been in linux-next for most of the last cycle;
        indeed they predate the previous merge window.  They've also been
        posted to the OMAP people."
      
      * 'omap-serial' of git://git.linaro.org/people/rmk/linux-arm: (21 commits)
        SERIAL: omap: fix hardware assisted flow control
        SERIAL: omap: simplify (2)
        SERIAL: omap: move xon/xoff setting earlier
        SERIAL: omap: always set TCR
        SERIAL: omap: simplify
        SERIAL: omap: don't read back LCR/MCR/EFR
        SERIAL: omap: serial_omap_configure_xonxoff() contents into set_termios
        SERIAL: omap: configure xon/xoff before setting modem control lines
        SERIAL: omap: remove OMAP_UART_SYSC_RESET and OMAP_UART_FIFO_CLR
        SERIAL: omap: move driver private definitions and structures to driver
        SERIAL: omap: remove 'irq_pending' bitfield
        SERIAL: omap: fix MCR TCRTLR bit handling
        SERIAL: omap: fix set_mctrl() breakage
        SERIAL: omap: no need to re-read EFR
        SERIAL: omap: remove setting of EFR SCD bit
        SERIAL: omap: allow hardware assisted IXANY mode to be disabled
        SERIAL: omap: allow hardware assisted rts/cts modes to be disabled
        SERIAL: core: add throttle/unthrottle callbacks for hardware assisted flow control
        SERIAL: core: add hardware assisted h/w flow control support
        SERIAL: core: add hardware assisted s/w flow control support
        ...
      
      Conflicts:
      	drivers/tty/serial/omap-serial.c
    • Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1ebaf4f4
      Linus Torvalds authored
      Pull x86 timer update from Ingo Molnar:
       "This tree includes HPET fixes and also implements a calibration-free,
        TSC match driven APIC timer interrupt mode: 'TSC deadline mode'
        supported in SandyBridge and later CPUs."
      
      * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: hpet: Fix inverted return value check in arch_setup_hpet_msi()
        x86: hpet: Fix masking of MSI interrupts
        x86: apic: Use tsc deadline for oneshot when available
    • Merge branch 'x86-nuke386-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 743aa456
      Linus Torvalds authored
      Pull "Nuke 386-DX/SX support" from Ingo Molnar:
       "This tree removes ancient-386-CPUs support and thus zaps quite a bit
        of complexity:
      
          24 files changed, 56 insertions(+), 425 deletions(-)
      
        ... which complexity has plagued us with extra work whenever we wanted
        to change SMP primitives, for years.
      
        Unfortunately there's a nostalgic cost: your old original 386 DX33
        system from early 1991 won't be able to boot modern Linux kernels
        anymore.  Sniff."
      
      I'm not sentimental.  Good riddance.
      
      * 'x86-nuke386-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86, 386 removal: Document Nx586 as a 386 and thus unsupported
        x86, cleanups: Simplify sync_core() in the case of no CPUID
        x86, 386 removal: Remove CONFIG_X86_POPAD_OK
        x86, 386 removal: Remove CONFIG_X86_WP_WORKS_OK
        x86, 386 removal: Remove CONFIG_INVLPG
        x86, 386 removal: Remove CONFIG_BSWAP
        x86, 386 removal: Remove CONFIG_XADD
        x86, 386 removal: Remove CONFIG_CMPXCHG
        x86, 386 removal: Remove CONFIG_M386 from Kconfig
    • Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a05a4e24
      Linus Torvalds authored
      Pull x86 topology discovery improvements from Ingo Molnar:
       "These changes improve topology discovery on AMD CPUs.
      
        Right now this feeds information displayed in
        /sys/devices/system/cpu/cpuX/cache/indexY/* - but in the future we
        could use this to set up a better scheduling topology."
      
      * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86, cacheinfo: Base cache sharing info on CPUID 0x8000001d on AMD
        x86, cacheinfo: Make use of CPUID 0x8000001d for cache information on AMD
        x86, cacheinfo: Determine number of cache leafs using CPUID 0x8000001d on AMD
        x86: Add cpu_has_topoext
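
      The cache information referred to above can be read back from
      /sys/devices/system/cpu/cpuX/cache/indexY/.  The following Python
      sketch lists it for one CPU.  The function name and the root parameter
      are hypothetical, and the attribute file names (level, type, size,
      shared_cpu_list) are assumed to be present; attributes that are
      missing are simply skipped.

```python
from pathlib import Path
from typing import Dict, List


def read_cache_info(cpu: int, root: str = "/sys/devices/system/cpu") -> List[Dict[str, str]]:
    """Return one dict per cache index directory of the given CPU."""
    cache_dir = Path(root) / f"cpu{cpu}" / "cache"
    if not cache_dir.is_dir():
        return []
    result = []
    for index_dir in sorted(cache_dir.glob("index[0-9]*")):
        entry = {"index": index_dir.name}
        # Assumed attribute names; any file that is absent is left out.
        for attr in ("level", "type", "size", "shared_cpu_list"):
            attr_file = index_dir / attr
            if attr_file.is_file():
                entry[attr] = attr_file.read_text().strip()
        result.append(entry)
    return result


if __name__ == "__main__":
    for cache in read_cache_info(0):
        print(cache)
```

      The shared_cpu_list value is where improved topology discovery shows
      up: it names the CPUs that share a given cache.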
    • Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e9a5a919
      Linus Torvalds authored
      Pull x86 cleanups from Ingo Molnar:
       "Small cleanups."
      
      * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: Fix the error of using "const" in gen-insn-attr-x86.awk
        x86, apic: Cleanup cfg->domain setup for legacy interrupts
        x86: Remove dead hlt_use_halt code
    • Merge branch 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 74b84233
      Linus Torvalds authored
      Pull x86 BSP hotplug changes from Ingo Molnar:
       "This tree enables CPU#0 (the boot processor) to be onlined/offlined on
        x86, just like any other CPU.  Enabled on Intel CPUs for now.
      
        Allowing this required the identification and fixing of latent CPU#0
        assumptions (such as CPU#0 initializations, etc.) in the x86
        architecture code, plus the identification of barriers to
        BSP-offlining, such as active PIC interrupts which can only be
        serviced on the BSP.
      
        It's behind a default-off option, and there's a debug option that
        allows the automatic testing of this feature.
      
        The motivation of this feature is to allow and prepare for true
        CPU-hotplug hardware support: recent changes to MCE support enable us
        to detect a deteriorating but not yet hard-failing L1/L2 cache on a
        CPU that could be soft-unplugged - or a failing L3 cache on a
        multi-socket system.
      
        Note that true hardware hot-plug is not yet fully enabled by this,
        because that requires a special platform wakeup sequence to be sent to
        the freshly powered up CPU#0.  Future patches for this are planned,
        once such a platform exists.  Chicken and egg"
      
      * 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86, topology: Debug CPU0 hotplug
        x86/i387.c: Initialize thread xstate only on CPU0 only once
        x86, hotplug: Handle retrigger irq by the first available CPU
        x86, hotplug: The first online processor saves the MTRR state
        x86, hotplug: During CPU0 online, enable x2apic, set_numa_node.
        x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI
        x86-32, hotplug: Add start_cpu0() entry point to head_32.S
        x86-64, hotplug: Add start_cpu0() entry point to head_64.S
        kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callback
        x86, hotplug, suspend: Online CPU0 for suspend or hibernate
        x86, hotplug: Support functions for CPU0 online/offline
        x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it
        x86, Kconfig: Add config switch for CPU0 hotplug
        doc: Add x86 CPU0 online/offline feature
    • Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 50744747
      Linus Torvalds authored
      Pull x86 boot changes from Ingo Molnar:
       "Two small changes: a cleanup and allow CONFIG_X86_MPPARSE to be turned
        off on SFI as well."
      
      * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        arch/x86/Kconfig: Allow turning off CONFIG_X86_MPPARSE when either ACPI or SFI is present
        x86/boot/doc: Fix grammar and typo in boot.txt
    • Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0019fab3
      Linus Torvalds authored
      Pull x86 asm changes from Ingo Molnar:
       "Two fixlets and a cleanup."
      
      * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86_32: Return actual stack when requesting sp from regs
        x86: Don't clobber top of pt_regs in nested NMI
        x86/asm: Clean up copy_page_*() comments and code
    • Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b64c5fda
      Linus Torvalds authored
      Pull core timer changes from Ingo Molnar:
       "It contains continued generic-NOHZ work by Frederic and smaller
        cleanups."
      
      * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        time: Kill xtime_lock, replacing it with jiffies_lock
        clocksource: arm_generic: use this_cpu_ptr per-cpu helper
        clocksource: arm_generic: use integer math helpers
        time/jiffies: Make clocksource_jiffies static
        clocksource: clean up parse_pmtmr()
        tick: Correct the comments for tick_sched_timer()
        tick: Conditionally build nohz specific code in tick handler
        tick: Consolidate tick handling for high and low res handlers
        tick: Consolidate timekeeping handling code
    • Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f57d54ba
      Linus Torvalds authored
      Pull scheduler updates from Ingo Molnar:
       "The biggest change affects group scheduling: we now track the runnable
        average on a per-task entity basis, allowing a smoother, exponential
        decay average based load/weight estimation instead of the previous
        binary on-the-runqueue/off-the-runqueue load weight method.
      
        This will inevitably disturb workloads that were in some sort of
        borderline balancing state or unstable equilibrium, so an eye has to
        be kept on regressions.
      
        For that reason the new load average is only limited to group
        scheduling (shares distribution) at the moment (which was also hurting
        the most from the prior, crude weight calculation and whose scheduling
        quality wins most from this change) - but we plan to extend this to
        regular SMP balancing as well in the future, which will simplify and
        speed up things a bit.
      
        Other changes involve ongoing preparatory work to extend NOHZ to the
        scheduler as well, eventually allowing completely irq-free user-space
        execution."
      
      * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
        Revert "sched/autogroup: Fix crash on reboot when autogroup is disabled"
        cputime: Comment cputime's adjusting code
        cputime: Consolidate cputime adjustment code
        cputime: Rename thread_group_times to thread_group_cputime_adjusted
        cputime: Move thread_group_cputime() to sched code
        vtime: Warn if irqs aren't disabled on system time accounting APIs
        vtime: No need to disable irqs on vtime_account()
        vtime: Consolidate a bit the ctx switch code
        vtime: Explicitly account pending user time on process tick
        vtime: Remove the underscore prefix invasion
        sched/autogroup: Fix crash on reboot when autogroup is disabled
        cputime: Separate irqtime accounting from generic vtime
        cputime: Specialize irq vtime hooks
        kvm: Directly account vtime to system on guest switch
        vtime: Make vtime_account_system() irqsafe
        vtime: Gather vtime declarations to their own header file
        sched: Describe CFS load-balancer
        sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking
        sched: Make __update_entity_runnable_avg() fast
        sched: Update_cfs_shares at period edge
        ...
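
      The difference between a binary on/off-runqueue weight and an
      exponentially decayed runnable average can be sketched numerically.
      This is a floating-point model in Python, not the kernel's fixed-point
      code.  The period length (1024 us) and the half-life of 32 periods are
      assumptions that are not stated in the pull message, and the function
      name is hypothetical.

```python
# Assumed decay factor: a contribution halves after 32 periods (Y**32 == 0.5).
Y = 0.5 ** (1 / 32)


def runnable_avg(history, period_us=1024):
    """Decayed fraction of time an entity was runnable.

    history: runnable microseconds in each period, oldest first.
    """
    runnable_sum = 0.0
    period_sum = 0.0
    for runnable_us in history:
        # Age everything seen so far by one period, then add the newest one.
        runnable_sum = runnable_sum * Y + runnable_us
        period_sum = period_sum * Y + period_us
    return runnable_sum / period_sum if period_sum else 0.0


if __name__ == "__main__":
    busy_then_idle = [1024] * 1000 + [0] * 32
    print(round(runnable_avg(busy_then_idle), 3))
```

      An entity that was fully runnable for a long time and then slept for 32
      periods is weighted at roughly one half under this model, whereas the
      binary method would drop its contribution to zero the moment it left
      the runqueue.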
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · da830e58
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "These are late-v3.7 pending fixes for tracing."
      
      Fix up trivial conflict in kernel/trace/ring_buffer.c: the NULL pointer
      fix clashed with the change of type of the 'ret' variable.
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        ring-buffer: Fix race between integrity check and readers
        ring-buffer: Fix NULL pointer if rb_set_head_page() fails
        ftrace: Clear bits properly in reset_iter_read()
    • Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 090f8ccb
      Linus Torvalds authored
      Pull perf updates from Ingo Molnar:
       "Lots of activity:
      
         211 files changed, 8328 insertions(+), 4116 deletions(-)
      
        most of it on the tooling side.
      
        Main changes:
      
         * ftrace enhancements and fixes from Steve Rostedt.
      
         * uprobes fixes, cleanups and preparation for the ARM port from Oleg
           Nesterov.
      
         * UAPI fixes, from David Howells - prepares the arch/x86 UAPI
           transition
      
         * Separate perf tests into multiple objects, one per test, from Jiri
           Olsa.
      
         * Make hardware event translations available in sysfs, from Jiri
           Olsa.
      
         * Fixes to /proc/pid/maps parsing, preparatory to supporting data
           maps, from Namhyung Kim
      
         * Implement ui_progress for GTK, from Namhyung Kim
      
         * Add framework for automated perf_event_attr tests, where tools with
           different command line options will be run from a 'perf test', via
           python glue, and the perf syscall will be intercepted to verify
           that the perf_event_attr fields set by the tool are those expected,
           from Jiri Olsa
      
         * Add a 'link' method for hists, so that we can have the leader with
           buckets for all the entries in all the hists.  This new method is
           now used in the default 'diff' output, making the sum of the
           'baseline' column be 100%, eliminating blind spots.
      
         * libtraceevent fixes for compiler warnings, to make perf build on
           some distros such as Fedora 14 32-bit.  Some of the warnings
           pointed to real bugs.
      
         * Add a browser for 'perf script' and make it available from the
           report and annotate browsers.  It does filtering to find the
           scripts that handle events found in the perf.data file used.  From
           Feng Tang
      
         * perf inject changes to allow showing where a task sleeps, from
           Andrew Vagin.
      
         * Makefile improvements from Namhyung Kim.
      
         * Add --pre and --post command hooks in 'stat', from Peter Zijlstra.
      
         * Don't stop synthesizing threads when one vanishes, this is for the
           existing threads when we start a tool like trace.
      
         * Use sched:sched_stat_runtime to provide a thread summary, this
           produces the same output as the 'trace summary' subcommand of
           tglx's original "trace" tool.
      
         * Support interrupted syscalls in 'trace'
      
         * Add an event duration column and filter in 'trace'.
      
         * There are references to the man pages in some tools, so try to
           build Documentation when installing, warning the user if that is
           not possible, from Borislav Petkov.
      
         * Give user better message if precise is not supported, from David
           Ahern.
      
         * Try to find cross-built objdump path by using the session
           environment information in the perf.data file header, from Irina
           Tirdea, original patch and idea by Namhyung Kim.
      
         * Displays more output on features check for make V=1, so that one can
           figure out what is happening by looking at gcc output, etc.  From
           Jiri Olsa.
      
         * Add on_exit implementation for systems without one, e.g.  Android,
           from Bernhard Rosenkraenzer.
      
         * Only process events for vcpus of interest, helps handling large
           number of events, from David Ahern.
      
         * Cross compilation fixes for Android, from Irina Tirdea.
      
         * Add documentation on compiling for Android, from Irina Tirdea.
      
         * perf diff improvements from Jiri Olsa.
      
         * Target (task/user/cpu/syswide) handling improvements, from Namhyung
           Kim.
      
         * Add support in 'trace' for tracing workload given by command line,
           from Namhyung Kim.
      
         * ... and much more."
      
      * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (194 commits)
        uprobes: Use percpu_rw_semaphore to fix register/unregister vs dup_mmap() race
        perf evsel: Introduce is_group_member method
        perf powerpc: Use uapi/unistd.h to fix build error
        tools: Pass the target in descend
        tools: Honour the O= flag when tool build called from a higher Makefile
        tools: Define a Makefile function to do subdir processing
        perf ui: Always compile browser setup code
        perf ui: Add ui_progress__finish()
        perf ui gtk: Implement ui_progress functions
        perf ui: Introduce generic ui_progress helper
        perf ui tui: Move progress.c under ui/tui directory
        perf tools: Add basic event modifier sanity check
        perf tools: Omit group members from perf_evlist__disable/enable
        perf tools: Ensure single disable call per event in record comand
        perf tools: Fix 'disabled' attribute config for record command
        perf tools: Fix attributes for '{}' defined event groups
        perf tools: Use sscanf for parsing /proc/pid/maps
        perf tools: Add gtk.<command> config option for launching GTK browser
        perf tools: Fix compile error on NO_NEWT=1 build
        perf hists: Initialize all of he->stat with zeroes
        ...
    • Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · aefb058b
      Linus Torvalds authored
      Pull irq fixes from Ingo Molnar:
       "Affinity fixes and a nested threaded IRQ handling fix."
      
      * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq: Always force thread affinity
        irq: Set CPU affinity right on thread creation
        genirq: Provide means to retrigger parent
    • Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 37ea95a9
      Linus Torvalds authored
      Pull RCU update from Ingo Molnar:
       "The major features of this tree are:
      
           1. A first version of no-callbacks CPUs.  This version prohibits
              offlining CPU 0, but only when enabled via CONFIG_RCU_NOCB_CPU=y.
              Relaxing this constraint is in progress, but not yet ready
              for prime time.  These commits were posted to LKML at
              https://lkml.org/lkml/2012/10/30/724.
      
           2. Changes to SRCU that allow statically initialized srcu_struct
              structures.  These commits were posted to LKML at
              https://lkml.org/lkml/2012/10/30/296.
      
           3. Restructuring of RCU's debugfs output.  These commits were posted
              to LKML at https://lkml.org/lkml/2012/10/30/341.
      
           4. Additional CPU-hotplug/RCU improvements, posted to LKML at
              https://lkml.org/lkml/2012/10/30/327.
              Note that the commit eliminating __stop_machine() was judged to
              be too high a risk, so it is deferred to 3.9.
      
           5. Changes to RCU's idle interface, most notably a new module
              parameter that redirects normal grace-period operations to
              their expedited equivalents.  These were posted to LKML at
              https://lkml.org/lkml/2012/10/30/739.
      
           6. Additional diagnostics for RCU's CPU stall warning facility,
              posted to LKML at https://lkml.org/lkml/2012/10/30/315.
              The most notable change reduces the
              default RCU CPU stall-warning time from 60 seconds to 21 seconds,
              so that it once again happens sooner than the softlockup timeout.
      
           7. Documentation updates, which were posted to LKML at
              https://lkml.org/lkml/2012/10/30/280.
              A couple of late-breaking changes were posted at
              https://lkml.org/lkml/2012/11/16/634 and
              https://lkml.org/lkml/2012/11/16/547.
      
           8. Miscellaneous fixes, which were posted to LKML at
              https://lkml.org/lkml/2012/10/30/309.
      
           9. Finally, a fix for a lockdep-RCU splat was posted to LKML
              at https://lkml.org/lkml/2012/11/7/486."
      
      * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits)
        context_tracking: New context tracking susbsystem
        sched: Mark RCU reader in sched_show_task()
        rcu: Separate accounting of callbacks from callback-free CPUs
        rcu: Add callback-free CPUs
        rcu: Add documentation for the new rcuexp debugfs trace file
        rcu: Update documentation for TREE_RCU debugfs tracing
        rcu: Reduce default RCU CPU stall warning timeout
        rcu: Fix TINY_RCU rcu_is_cpu_rrupt_from_idle check
        rcu: Clarify memory-ordering properties of grace-period primitives
        rcu: Add new rcutorture module parameters to start/end test messages
        rcu: Remove list_for_each_continue_rcu()
        rcu: Fix batch-limit size problem
        rcu: Add tracing for synchronize_sched_expedited()
        rcu: Remove old debugfs interfaces and also RCU flavor name
        rcu: split 'rcuhier' to each flavor
        rcu: split 'rcugp' to each flavor
        rcu: split 'rcuboost' to each flavor
        rcu: split 'rcubarrier' to each flavor
        rcu: Fix tracing formatting
        rcu: Remove the interface "rcudata.csv"
        ...
    • Merge branches 'core-locking-for-linus' and 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · de0c276b
      Linus Torvalds authored
      
      Pull trivial fix branches from Ingo Molnar.
      
      Cleanup in __get_key_name, and a timer comment fixlet.
      
      * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        lockdep: Use KSYM_NAME_LEN'ed buffer for __get_key_name()
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        timers, sched: Correct the comments for tick_sched_timer()
    • Merge branch 'akpm' (Andrew's patchbomb) · 608ff1a2
      Linus Torvalds authored
      Merge misc updates from Andrew Morton:
       "About half of most of MM.  Going very early this time due to
        uncertainty over the coreautounifiednumasched things.  I'll send the
        other half of most of MM tomorrow.  The rest of MM awaits a slab merge
        from Pekka."
      
      * emailed patches from Andrew Morton: (71 commits)
        memory_hotplug: ensure every online node has NORMAL memory
        memory_hotplug: handle empty zone when online_movable/online_kernel
        mm, memory-hotplug: dynamic configure movable memory and portion memory
        drivers/base/node.c: cleanup node_state_attr[]
        bootmem: fix wrong call parameter for free_bootmem()
        avr32, kconfig: remove HAVE_ARCH_BOOTMEM
        mm: cma: remove watermark hacks
        mm: cma: skip watermarks check for already isolated blocks in split_free_page()
        mm, oom: fix race when specifying a thread as the oom origin
        mm, oom: change type of oom_score_adj to short
        mm: cleanup register_node()
        mm, mempolicy: remove duplicate code
        mm/vmscan.c: try_to_freeze() returns boolean
        mm: introduce putback_movable_pages()
        virtio_balloon: introduce migration primitives to balloon pages
        mm: introduce compaction and migration for ballooned pages
        mm: introduce a common interface for balloon pages mobility
        mm: redefine address_space.assoc_mapping
        mm: adjust address_space_operations.migratepage() return code
        arch/sparc/kernel/sys_sparc_64.c: s/COLOUR/COLOR/
        ...
      608ff1a2
    • Lai Jiangshan's avatar
      memory_hotplug: ensure every online node has NORMAL memory · 74d42d8f
      Lai Jiangshan authored
      The old memory hotplug code and the new online_movable mode can leave an
      online node without any normal memory, but memory management behaves
      badly when an online node has no normal memory.  For example, a task
      bound to such a node can fail every kernel allocation, so it cannot
      create tasks or other kernel objects.

      So we disallow nodes without normal memory here; we will allow them once
      the kernel is prepared for them.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      74d42d8f
    • Lai Jiangshan's avatar
      memory_hotplug: handle empty zone when online_movable/online_kernel · e455a9b9
      Lai Jiangshan authored
      Allow online_movable/online_kernel to empty a zone or to move memory into
      an empty zone.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e455a9b9
    • Lai Jiangshan's avatar
      mm, memory-hotplug: dynamic configure movable memory and portion memory · 511c2aba
      Lai Jiangshan authored
      Add online_movable and online_kernel for logical memory hotplug.  This is
      the dynamic version of "movablecore" & "kernelcore".

      The motivation is the same as for "movablecore" & "kernelcore", but the
      configuration is dynamic and happens at run time:

      o We can configure memory as kernelcore or movablecore after boot.

        When the userspace workload increases and we need more hugepages, we
        can use "online_movable" to add memory and allow the system to use more
        THP (transparent huge pages), and vice versa when the kernel workload
        increases.

        It also helps virtualization to dynamically configure host/guest
        memory, to save memory (reduce waste).

        Memory capacity on Demand

      o When a new node is physically onlined after boot, we need to use
        "online_movable" or "online_kernel" to configure/partition it as
        expected when we logically online it.

        This configuration also helps with physical memory migration.

      o All the benefits of the existing "movablecore" & "kernelcore".

      o Preparing for movable-node, which is very important for power saving,
        hardware partitioning and high-availability systems (hardware fault
        management).

      (Note, we don't introduce movable-node here.)

      Action behavior:
      When a memoryblock/memorysection is onlined by "online_movable", the kernel
      will not hold direct references to the pages of the memoryblock,
      so we can remove that memory at any time when needed.

      When it is onlined by "online_kernel", the kernel can use it.
      When it is onlined by "online", the zone type doesn't change.

      Current constraints:
      Only a memoryblock which is adjacent to ZONE_MOVABLE
      can be onlined from ZONE_NORMAL to ZONE_MOVABLE.

      [akpm@linux-foundation.org: use min_t, cleanups]
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      511c2aba
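Illustration (not part of the commit): the three mode strings named above are written to a memory block's state file. The sketch below is a small userspace helper under stated assumptions: the function names are hypothetical, and the state file path is supplied by the caller (on a real system it is assumed to look like /sys/devices/system/memory/memoryN/state).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if `mode` is one of the three strings named in the commit message. */
static int is_valid_online_mode(const char *mode)
{
    return strcmp(mode, "online") == 0 ||
           strcmp(mode, "online_kernel") == 0 ||
           strcmp(mode, "online_movable") == 0;
}

/*
 * Write `mode` to a memory block's state file.
 * `state_path` is caller-supplied (assumption: a sysfs "state" file).
 * Returns 0 on success, -1 on an invalid mode or an I/O failure.
 */
static int set_memory_block_state(const char *state_path, const char *mode)
{
    FILE *f;
    int ok;

    assert(state_path != NULL && mode != NULL);
    if (!is_valid_online_mode(mode))
        return -1;

    f = fopen(state_path, "w");
    if (!f)
        return -1;

    ok = fputs(mode, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A caller would pass the block's state path and "online_movable" to request a movable zone, or "online_kernel" to make the block usable by the kernel. Writing to real sysfs files requires root and a kernel with memory hotplug enabled.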
    • Lai Jiangshan's avatar
      drivers/base/node.c: cleanup node_state_attr[] · fcf07d22
      Lai Jiangshan authored
      Use [index] = init_value designated initializers.
      Use N_xxxxx instead of hardcoded values.

      This makes the array more readable and makes it easier to add a new state.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fcf07d22
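Illustration (not part of the commit): the "[index] = init_value" style is C's designated array initializer. The sketch below uses a hypothetical enum and attribute struct (the DEMO_ and demo_ names are invented for this example and are not the kernel's N_xxxxx states) to show why indexing by enum constant is easier to read and extend than positional initialization.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state indices; the kernel's real enum has different members. */
enum demo_state {
    DEMO_POSSIBLE,
    DEMO_ONLINE,
    DEMO_HAS_MEMORY,
    DEMO_NR_STATES
};

struct demo_attr {
    const char *name;
    enum demo_state state;
};

#define DEMO_ATTR(n, s) { .name = (n), .state = (s) }

/* Each slot is named by its enum constant, so order in the source is irrelevant. */
static const struct demo_attr demo_attrs[DEMO_NR_STATES] = {
    [DEMO_POSSIBLE]   = DEMO_ATTR("possible", DEMO_POSSIBLE),
    [DEMO_ONLINE]     = DEMO_ATTR("online", DEMO_ONLINE),
    [DEMO_HAS_MEMORY] = DEMO_ATTR("has_memory", DEMO_HAS_MEMORY),
};

/* Look up the attribute name for a state. */
static const char *demo_attr_name(enum demo_state s)
{
    assert(s < DEMO_NR_STATES);
    return demo_attrs[s].name;
}
```

Adding a state then means adding one enum constant and one initializer line, with no need to keep array positions in sync by hand.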
    • Joonsoo Kim's avatar
      bootmem: fix wrong call parameter for free_bootmem() · 81df9bff
      Joonsoo Kim authored
      It is strange that alloc_bootmem() returns a virtual address and
      free_bootmem() requires a physical address.  Anyway, free_bootmem()'s
      first parameter should be a physical address.

      There are some call sites that pass a virtual address to free_bootmem().
      So fix them.

      [akpm@linux-foundation.org: improve free_bootmem() and free_bootmem_late() documentation]
      Signed-off-by: Joonsoo Kim <js1304@gmail.com>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      81df9bff
    • Joonsoo Kim's avatar
      avr32, kconfig: remove HAVE_ARCH_BOOTMEM · e9b2e78c
      Joonsoo Kim authored
      There is no code for CONFIG_HAVE_ARCH_BOOTMEM, so remove it.
      Signed-off-by: Joonsoo Kim <js1304@gmail.com>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e9b2e78c
    • Marek Szyprowski's avatar
      mm: cma: remove watermark hacks · bc357f43
      Marek Szyprowski authored
      Commits 2139cbe6 ("cma: fix counting of isolated pages") and
      d95ea5d1 ("cma: fix watermark checking") introduced a reliable
      method of free page accounting when memory is being allocated from CMA
      regions, so the workaround introduced earlier by commit 49f223a9
      ("mm: trigger page reclaim in alloc_contig_range() to stabilise
      watermarks") can finally be removed.
      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bc357f43
    • Marek Szyprowski's avatar
      mm: cma: skip watermarks check for already isolated blocks in split_free_page() · 2e30abd1
      Marek Szyprowski authored
      Since commit 2139cbe6 ("cma: fix counting of isolated pages"), free
      pages in isolated pageblocks are not accounted to the NR_FREE_PAGES
      counters, so the watermark check is not required when operating on a
      free page in an isolated pageblock.
      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2e30abd1
    • David Rientjes's avatar
      mm, oom: fix race when specifying a thread as the oom origin · e1e12d2f
      David Rientjes authored
      test_set_oom_score_adj() and compare_swap_oom_score_adj() are used to
      specify that current should be killed first if an oom condition occurs in
      between the two calls.
      
      The usage is
      
      	short oom_score_adj = test_set_oom_score_adj(OOM_SCORE_ADJ_MAX);
      	...
      	compare_swap_oom_score_adj(OOM_SCORE_ADJ_MAX, oom_score_adj);
      
      to store the thread's oom_score_adj, temporarily change it to the maximum
      score possible, and then restore the old value if it is still the same.
      
      This happens to still be racy, however, if the user writes
      OOM_SCORE_ADJ_MAX to /proc/pid/oom_score_adj in between the two calls.
      The compare_swap_oom_score_adj() will then incorrectly reset the old value
      prior to the write of OOM_SCORE_ADJ_MAX.
      
      To fix this, introduce a new oom_flags_t member in struct signal_struct
      that will be used for per-thread oom killer flags.  KSM and swapoff can
      now use a bit in this member to specify that threads should be killed
      first in oom conditions without playing around with oom_score_adj.
      
      This also allows the correct oom_score_adj to always be shown when reading
      /proc/pid/oom_score_adj.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e1e12d2f
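Illustration (not part of the commit): the race described above can be replayed sequentially in a toy userspace model. All toy_ names are hypothetical, the struct is not the kernel's signal_struct, and the bool field merely stands in for a bit in the new flags member.

```c
#include <assert.h>
#include <stdbool.h>

#define OOM_SCORE_ADJ_MAX 1000

/* Toy stand-in for per-process state; not the kernel's signal_struct. */
struct toy_signal {
    short oom_score_adj;  /* value the user reads and writes */
    bool oom_origin;      /* hypothetical flag: "kill this one first" */
};

/* Old scheme: save the current value and overwrite it. */
static short toy_test_set(struct toy_signal *s, short new_val)
{
    short old = s->oom_score_adj;

    s->oom_score_adj = new_val;
    return old;
}

/* Old scheme: restore only if the value still equals what we set. */
static void toy_compare_swap(struct toy_signal *s, short expected, short restore)
{
    if (s->oom_score_adj == expected)
        s->oom_score_adj = restore;
}

/* Replay the problematic interleaving with the old scheme. */
static short toy_replay_old(short initial)
{
    struct toy_signal s = { .oom_score_adj = initial, .oom_origin = false };
    short saved = toy_test_set(&s, OOM_SCORE_ADJ_MAX);

    s.oom_score_adj = OOM_SCORE_ADJ_MAX;  /* user write between the two calls */
    toy_compare_swap(&s, OOM_SCORE_ADJ_MAX, saved);
    return s.oom_score_adj;               /* the user's write has been undone */
}

/* Same interleaving with a separate flag: oom_score_adj is never touched. */
static short toy_replay_flag(short initial)
{
    struct toy_signal s = { .oom_score_adj = initial, .oom_origin = false };

    s.oom_origin = true;                  /* mark as preferred victim */
    s.oom_score_adj = OOM_SCORE_ADJ_MAX;  /* user write in between */
    s.oom_origin = false;                 /* clear the mark */
    return s.oom_score_adj;               /* the user's write survives */
}
```

With an initial value of 0, the old scheme ends at 0 even though the user asked for the maximum, while the flag-based scheme ends at OOM_SCORE_ADJ_MAX.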
    • David Rientjes's avatar
      mm, oom: change type of oom_score_adj to short · a9c58b90
      David Rientjes authored
      The maximum oom_score_adj is 1000 and the minimum oom_score_adj is -1000,
      so this range can be represented by the signed short type with no
      functional change.  The extra space this frees up in struct signal_struct
      will be used for per-thread oom kill flags in the next patch.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a9c58b90
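Illustration (not part of the commit): the range argument can be checked at compile time, and the space saving can be seen by comparing two toy structs. The toy_ names and the unsigned short flags field are assumptions for this sketch; the size comparison holds on common ABIs where int is 4 bytes and short is 2 bytes.

```c
#include <assert.h>
#include <limits.h>

#define OOM_SCORE_ADJ_MIN (-1000)
#define OOM_SCORE_ADJ_MAX 1000

/* Compile-time check that the documented range fits in a signed short. */
_Static_assert(SHRT_MIN <= OOM_SCORE_ADJ_MIN && OOM_SCORE_ADJ_MAX <= SHRT_MAX,
               "oom_score_adj range must fit in short");

/* Before: the score occupies a full int. */
struct toy_before {
    int oom_score_adj;
};

/* After: a short score leaves room for a hypothetical short-sized flags field. */
struct toy_after {
    short oom_score_adj;
    unsigned short oom_flags;
};

/* Narrow an int score to short; the assert documents the valid range. */
static short toy_store_adj(int value)
{
    assert(value >= OOM_SCORE_ADJ_MIN && value <= OOM_SCORE_ADJ_MAX);
    return (short)value;
}
```

On such ABIs both structs have the same size, which is the sense in which the narrower type frees space for the flags.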
    • Yasuaki Ishimatsu's avatar
      mm: cleanup register_node() · fa264375
      Yasuaki Ishimatsu authored
      register_node() is declared as extern in include/linux/node.h.  But the
      function is only called from register_one_node() in drivers/base/node.c.

      So the patch defines register_node() as static.
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fa264375
    • David Rientjes's avatar
      mm, mempolicy: remove duplicate code · 212a0a6f
      David Rientjes authored
      Remove some duplicate code and simplify alloc_pages_vma().  No functional
      change.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      212a0a6f
    • Jeff Liu's avatar
      mm/vmscan.c: try_to_freeze() returns boolean · 6f6313d4
      Jeff Liu authored
      kswapd()->try_to_freeze() is defined to return a boolean, so it's better
      to use a bool to hold its return value.
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6f6313d4
    • Rafael Aquini's avatar
      mm: introduce putback_movable_pages() · 5733c7d1
      Rafael Aquini authored
      The patch "mm: introduce compaction and migration for virtio ballooned pages"
      hacks around putback_lru_pages() in order to allow ballooned pages to be
      re-inserted on the balloon page list as if a ballooned page were an LRU page.
      
      As ballooned pages are not legitimate LRU pages, this patch introduces
      putback_movable_pages() to properly cope with cases where the isolated
      pageset contains ballooned pages and LRU pages, thus fixing the mentioned
      inelegant hack around putback_lru_pages().
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5733c7d1
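Illustration (not part of the commit): the idea of putting back a mixed set of isolated pages can be modeled as a dispatch on page kind. Everything below is a toy userspace model with hypothetical toy_ names; it counts pages per destination list instead of manipulating real kernel lists.

```c
#include <assert.h>
#include <stddef.h>

enum toy_page_kind { TOY_LRU, TOY_BALLOON };

/* Toy page on a singly linked "isolated" list. */
struct toy_page {
    enum toy_page_kind kind;
    struct toy_page *next;
};

/* Counters standing in for the LRU list and the balloon page list. */
struct toy_lists {
    int lru_count;
    int balloon_count;
};

/* Return every isolated page to the list that matches its kind. */
static void toy_putback_movable_pages(struct toy_page *isolated,
                                      struct toy_lists *lists)
{
    assert(lists != NULL);
    while (isolated) {
        struct toy_page *next = isolated->next;

        isolated->next = NULL;  /* detach from the isolated list */
        if (isolated->kind == TOY_BALLOON)
            lists->balloon_count++;
        else
            lists->lru_count++;
        isolated = next;
    }
}
```

The point of the sketch is only that one helper handles both kinds, so callers no longer need to pretend a ballooned page is an LRU page.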
    • Rafael Aquini's avatar
      virtio_balloon: introduce migration primitives to balloon pages · e2250429
      Rafael Aquini authored
      Memory fragmentation introduced by ballooning might reduce significantly
      the number of 2MB contiguous memory blocks that can be used within a guest,
      thus imposing performance penalties associated with the reduced number of
      transparent huge pages that could be used by the guest workload.
      
      Besides making balloon pages movable at allocation time and introducing
      the necessary primitives to perform balloon page migration/compaction,
      this patch also introduces the following locking scheme, in order to
      enhance the synchronization methods for accessing elements of struct
      virtio_balloon, thus providing protection against concurrent access
      introduced by parallel memory migration threads.
      
       - balloon_lock (mutex) : synchronizes the access demand to elements of
                                struct virtio_balloon and its queue operations;
      
      [yongjun_wei@trendmicro.com.cn: fix missing unlock on error in fill_balloon()]
      [akpm@linux-foundation.org: avoid having multiple return points in fill_balloon()]
      [akpm@linux-foundation.org: fix printk warning]
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e2250429
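Illustration (not part of the commit): the notes about a missing unlock on error and about avoiding multiple return points describe a common pattern, a single exit path that always releases the lock. The sketch below shows that pattern with a POSIX mutex in userspace; the toy_ names and the capacity field are hypothetical, and this is not the driver's fill_balloon().

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Toy balloon guarded by one mutex, mirroring the balloon_lock idea. */
struct toy_balloon {
    pthread_mutex_t balloon_lock;
    size_t num_pages;  /* pages currently held; always <= capacity */
    size_t capacity;   /* hypothetical upper bound */
};

/* Add `want` pages; every path leaves through `out`, so the lock is always dropped. */
static int toy_fill_balloon(struct toy_balloon *b, size_t want)
{
    int ret = 0;

    assert(b != NULL);
    pthread_mutex_lock(&b->balloon_lock);

    if (want > b->capacity - b->num_pages) {
        ret = -ENOMEM;  /* error path still unlocks below */
        goto out;
    }
    b->num_pages += want;

out:
    pthread_mutex_unlock(&b->balloon_lock);
    return ret;
}
```

Because the error path jumps to the same label as the success path, a later call cannot block on a lock that an earlier failed call forgot to release.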
    • Rafael Aquini's avatar
      mm: introduce compaction and migration for ballooned pages · bf6bddf1
      Rafael Aquini authored
      Memory fragmentation introduced by ballooning might reduce significantly
      the number of 2MB contiguous memory blocks that can be used within a guest,
      thus imposing performance penalties associated with the reduced number of
      transparent huge pages that could be used by the guest workload.
      
      This patch introduces the helper functions as well as the necessary changes
      to teach compaction and migration bits how to cope with pages which are
      part of a guest memory balloon, in order to make them movable by memory
      compaction procedures.
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bf6bddf1
    • Rafael Aquini's avatar
      mm: introduce a common interface for balloon pages mobility · 18468d93
      Rafael Aquini authored
      Memory fragmentation introduced by ballooning might reduce significantly
      the number of 2MB contiguous memory blocks that can be used within a guest,
      thus imposing performance penalties associated with the reduced number of
      transparent huge pages that could be used by the guest workload.
      
      This patch introduces a common interface to help a balloon driver make
      its page set movable by compaction, thus allowing the system to better
      leverage compaction efforts for memory defragmentation.

      [akpm@linux-foundation.org: use PAGE_FLAGS_CHECK_AT_PREP, s/__balloon_page_flags/page_flags_cleared/, small cleanups]
      [rientjes@google.com: allow balloon compaction for any system with memory compaction enabled, which is the defconfig]
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      18468d93
    • Rafael Aquini's avatar
      mm: redefine address_space.assoc_mapping · 252aa6f5
      Rafael Aquini authored
      Overhaul struct address_space.assoc_mapping, renaming it to
      address_space.private_data and redefining its type as void *.  With this
      approach we consistently name the .private_* elements of struct
      address_space and also allow extended usage for associating an
      address_space with other data structures through ->private_data.

      Also, all users of the old ->assoc_mapping element are converted to
      reflect its new name and type (->private_data).
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      252aa6f5
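Illustration (not part of the commit): an untyped private_data pointer lets one structure be associated with any other. The sketch below uses hypothetical toy_ structs, not the kernel's struct address_space, to show the attach and retrieve pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Toy mapping with an opaque association slot, like ->private_data. */
struct toy_address_space {
    void *private_data;
};

/* Hypothetical payload a driver might want to hang off a mapping. */
struct toy_balloon_info {
    unsigned long isolated_pages;
};

/* Associate arbitrary data with the mapping. */
static void toy_attach(struct toy_address_space *mapping, void *data)
{
    assert(mapping != NULL);
    mapping->private_data = data;
}

/* Retrieve the payload; the caller must know which type was attached. */
static struct toy_balloon_info *toy_balloon_of(const struct toy_address_space *mapping)
{
    assert(mapping != NULL);
    return mapping->private_data;
}
```

The trade-off of void * is flexibility over type safety: the compiler no longer checks that the reader and the writer agree on the payload type.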
    • Rafael Aquini's avatar
      mm: adjust address_space_operations.migratepage() return code · 78bd5209
      Rafael Aquini authored
      Memory fragmentation introduced by ballooning might reduce significantly
      the number of 2MB contiguous memory blocks that can be used within a
      guest, thus imposing performance penalties associated with the reduced
      number of transparent huge pages that could be used by the guest workload.
      
      This patch-set follows the main idea discussed at 2012 LSFMMS session:
      "Ballooning for transparent huge pages" -- http://lwn.net/Articles/490114/
      to introduce the required changes to the virtio_balloon driver, as well as
      the changes to the core compaction & migration bits, in order to make
      those subsystems aware of ballooned pages and allow memory balloon pages
      to become movable within a guest, thus avoiding the aforementioned
      fragmentation issue.

      The following numbers show the benefit of this patch set in allowing
      compaction to be more effective on memory-ballooned guests.

      Results for STRESS-HIGHALLOC benchmark, from Mel Gorman's mmtests suite,
      running on a 4GB RAM KVM guest which was ballooning 512MB RAM in 64MB
      chunks, at every minute (inflating/deflating), while test was running:
      
      ===BEGIN stress-highalloc
      
      STRESS-HIGHALLOC
                       highalloc-3.7     highalloc-3.7
                           rc4-clean         rc4-patch
      Pass 1          55.00 ( 0.00%)    62.00 ( 7.00%)
      Pass 2          54.00 ( 0.00%)    62.00 ( 8.00%)
      while Rested    75.00 ( 0.00%)    80.00 ( 5.00%)
      
      MMTests Statistics: duration
                       3.7         3.7
                 rc4-clean   rc4-patch
      User         1207.59     1207.46
      System       1300.55     1299.61
      Elapsed      2273.72     2157.06
      
      MMTests Statistics: vmstat
                                      3.7         3.7
                                rc4-clean   rc4-patch
      Page Ins                    3581516     2374368
      Page Outs                  11148692    10410332
      Swap Ins                         80          47
      Swap Outs                      3641         476
      Direct pages scanned          37978       33826
      Kswapd pages scanned        1828245     1342869
      Kswapd pages reclaimed      1710236     1304099
      Direct pages reclaimed        32207       31005
      Kswapd efficiency               93%         97%
      Kswapd velocity             804.077     622.546
      Direct efficiency               84%         91%
      Direct velocity              16.703      15.682
      Percentage direct scans          2%          2%
      Page writes by reclaim        79252        9704
      Page writes file              75611        9228
      Page writes anon               3641         476
      Page reclaim immediate        16764       11014
      Page rescued immediate            0           0
      Slabs scanned               2171904     2152448
      Direct inode steals             385        2261
      Kswapd inode steals          659137      609670
      Kswapd skipped wait               1          69
      THP fault alloc                 546         631
      THP collapse alloc              361         339
      THP splits                      259         263
      THP fault fallback               98          50
      THP collapse fail                20          17
      Compaction stalls               747         499
      Compaction success              244         145
      Compaction failures             503         354
      Compaction pages moved       370888      474837
      Compaction move failure       77378       65259
      
      ===END stress-highalloc
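Illustration (not part of the commit): the efficiency rows in the vmstat table are consistent with pages reclaimed divided by pages scanned, expressed as a whole-number percentage. The helper below is a sketch under that assumption; the function name is hypothetical and truncation (rather than rounding) is assumed because it reproduces the table's figures.

```c
#include <assert.h>

/* Percentage of scanned pages that were reclaimed, truncated to an integer. */
static unsigned long toy_efficiency_pct(unsigned long reclaimed,
                                        unsigned long scanned)
{
    assert(reclaimed <= scanned);
    if (scanned == 0)
        return 0;
    return reclaimed * 100UL / scanned;
}
```

Applied to the table, kswapd gives 1710236/1828245 and 1304099/1342869, which truncate to 93 and 97, and direct reclaim gives 32207/37978 and 31005/33826, which truncate to 84 and 91.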
      
      This patch:
      
      Introduce MIGRATEPAGE_SUCCESS as the default return code for the
      address_space_operations.migratepage() method and document the expected
      return code for the same method in failure cases.
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      78bd5209
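Illustration (not part of the commit): a named success code lets callers compare against a symbol instead of a bare number. The sketch below is a toy userspace model; the value 0 for MIGRATEPAGE_SUCCESS, the use of -EAGAIN for the failure case, and all toy_ names are assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MIGRATEPAGE_SUCCESS 0  /* assumed value for this sketch */

/* Toy page: `locked` blocks migration, `migrated` records success. */
struct toy_page {
    int locked;
    int migrated;
};

/* Toy migratepage callback: named success code, negative errno on failure. */
static int toy_migratepage(struct toy_page *newpage, struct toy_page *page)
{
    assert(newpage != NULL && page != NULL);
    if (page->locked)
        return -EAGAIN;  /* assumed "try again later" failure code */
    newpage->migrated = 1;
    return MIGRATEPAGE_SUCCESS;
}

/* Callers test against the symbol rather than a literal 0. */
static int toy_migrate_ok(int rc)
{
    return rc == MIGRATEPAGE_SUCCESS;
}
```

Comparing against the symbol keeps call sites readable and leaves room for additional non-error return codes later without auditing every "== 0" check.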