1. 15 Dec, 2014 3 commits
  2. 13 Dec, 2014 2 commits
  3. 12 Dec, 2014 9 commits
  4. 11 Dec, 2014 26 commits
    • Merge tag 'pm+acpi-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 92a578b0
      Linus Torvalds authored
      Pull ACPI and power management updates from Rafael Wysocki:
       "This time we have some more new material than we used to have during
        the last couple of development cycles.
      
        The most important part of it to me is the introduction of a unified
        interface for accessing device properties provided by platform
        firmware.  It works with Device Trees and ACPI in a uniform way and
        drivers using it need not worry about where the properties come from
        as long as the platform firmware (either DT or ACPI) makes them
        available.  It covers both devices and "bare" device node objects
        without struct device representation as that turns out to be necessary
        in some cases.  This has been in the works for quite a few months (and
        development cycles) and has been approved by all of the relevant
        maintainers.
      
        On top of that, some drivers are switched over to the new interface
        (at25, leds-gpio, gpio_keys_polled) and some additional changes are
        made to the core GPIO subsystem to allow device drivers to manipulate
        GPIOs in the "canonical" way on platforms that provide GPIO
        information in their ACPI tables, but don't assign names to GPIO lines
        (in which case the driver needs to do that on the basis of what it
        knows about the device in question).  That also has been approved by
        the GPIO core maintainers and the rfkill driver is now going to use
        it.
      
        Second is support for hardware P-states in the intel_pstate driver.
        It uses CPUID to detect whether or not the feature is supported by the
        processor in which case it will be enabled by default.  However, it
        can be disabled entirely from the kernel command line if necessary.
      
        Next is support for a platform firmware interface based on ACPI
        operation regions used by the PMIC (Power Management Integrated
        Circuit) chips on the Intel Baytrail-T and Baytrail-T-CR platforms.
        That interface is used for manipulating power resources and for
        thermal management: sensor temperature reporting, trip point setting
        and so on.
      
        Also the ACPI core is now going to support the _DEP configuration
        information in a limited way.  Basically, _DEP is supposed to reflect
        off-the-hierarchy dependencies between devices which may be very
        indirect, like when AML for one device accesses locations in an
        operation region handled by another device's driver (usually, the
        device depended on this way is a serial bus or GPIO controller).  The
        support added this time is sufficient to make the ACPI battery driver
        work on Asus T100A, but it is general enough to be able to cover some
        other use cases in the future.
      
        Finally, we have a new cpufreq driver for the Loongson1B processor.
      
        In addition to the above, there are fixes and cleanups all over the
        place as usual and a traditional ACPICA update to a recent upstream
        release.
      
        As far as the fixes go, the ACPI LPSS (Low-power Subsystem) driver for
        Intel platforms should be able to handle power management of the DMA
        engine correctly, the cpufreq-dt driver should interact with the
        thermal subsystem in a better way and the ACPI backlight driver should
        handle some more corner cases, among other things.
      
        On top of the ACPICA update there are fixes for race conditions in the
        ACPICA's interrupt handling code which might lead to some random and
        strange looking failures on some systems.
      
        In the cleanups department the most visible part is the series of
        commits targeted at getting rid of the CONFIG_PM_RUNTIME configuration
        option.  That was triggered by a discussion regarding the generic
        power domains code during which we realized that trying to support
        certain combinations of PM config options was painful and not really
        worth it, because nobody would use them in production anyway.  For
        this reason, we decided to make CONFIG_PM_SLEEP select
        CONFIG_PM_RUNTIME and that led to the conclusion that the latter
        became redundant and CONFIG_PM could be used instead of it.  The
        material here makes that replacement in a major part of the tree, but
        there will be at least one more batch of that in the second part of
        the merge window.
      
        Specifics:
      
         - Support for retrieving device properties information from ACPI _DSD
           device configuration objects and a unified device properties
           interface for device drivers (and subsystems) on top of that.  As
           stated above, this works with Device Trees and ACPI and allows
           device drivers to be written in a platform firmware (DT or ACPI)
           agnostic way.  The at25, leds-gpio and gpio_keys_polled drivers are
           now going to use this new interface and the GPIO subsystem is
           additionally modified to allow device drivers to assign names to
           GPIO resources returned by ACPI _CRS objects (in case _DSD is not
           present or does not provide the expected data).  The changes in
           this set are mostly from Mika Westerberg, Rafael J Wysocki, Aaron
           Lu, and Darren Hart with some fixes from others (Fabio Estevam,
           Geert Uytterhoeven).
      
         - Support for Hardware Managed Performance States (HWP) as described
           in Volume 3, section 14.4, of the Intel SDM in the intel_pstate
           driver.  CPUID is used to detect whether or not the feature is
           supported by the processor.  If supported, it will be enabled
           automatically unless the intel_pstate=no_hwp switch is present in
           the kernel command line.  From Dirk Brandewie.
      
         - New Intel Broadwell-H ID for intel_pstate (Dirk Brandewie).
      
         - Support for firmware interface based on ACPI operation regions used
           by the PMIC chips on the Intel Baytrail-T and Baytrail-T-CR
           platforms for power resource control and thermal management (Aaron
           Lu).
      
         - Limited support for retrieving off-the-hierarchy dependencies
           between devices from ACPI _DEP device configuration objects and
           deferred probing support for the ACPI battery driver based on the
           _DEP information to make that driver work on Asus T100A (Lan
           Tianyu).
      
         - New cpufreq driver for the Loongson1B processor (Kelvin Cheung).
      
         - ACPICA update to upstream revision 20141107 which only affects
           tools (Bob Moore).
      
         - Fixes for race conditions in the ACPICA's interrupt handling code
           and in the ACPI code related to system suspend and resume (Lv Zheng
           and Rafael J Wysocki).
      
         - ACPI core fix for an RCU-related issue in the ioremap() regions
           management code that slowed down significantly after CPUs had been
           allowed to enter idle states even if they'd had RCU callbacks
           queued and triggered some problems in a certain proprietary graphics
           driver (and elsewhere).  The fix replaces synchronize_rcu() in that
           code with synchronize_rcu_expedited() which makes the issue go
           away.  From Konstantin Khlebnikov.
      
         - ACPI LPSS (Low-Power Subsystem) driver fix to handle power
           management of the DMA engine included into the LPSS correctly.  The
           problem is that the DMA engine doesn't have ACPI PM support of its
           own and it simply is turned off when the last LPSS device having
           ACPI PM support goes into D3cold.  To work around that, the PM
           domain used by the ACPI LPSS driver is redesigned so at least one
           device with ACPI PM support will be on as long as the DMA engine is
           in use.  From Andy Shevchenko.
      
         - ACPI backlight driver fix to avoid using it on "Win8-compatible"
           systems where it doesn't work and where it was used by default by
           mistake (Aaron Lu).
      
         - Assorted minor ACPI core fixes and cleanups from Tomasz Nowicki,
           Sudeep Holla, Huang Rui, Hanjun Guo, Fabian Frederick, and Ashwin
           Chaugule (mostly related to the upcoming ARM64 support).
      
         - Intel RAPL (Running Average Power Limit) power capping driver fixes
           and improvements including new processor IDs (Jacob Pan).
      
         - Generic power domains modification to power up domains after
           attaching devices to them to meet the expectations of device
           drivers and bus types assuming devices to be accessible at probe
           time (Ulf Hansson).
      
         - Preliminary support for controlling device clocks from the generic
           power domains core code and modifications of the ARM/shmobile
           platform to use that feature (Ulf Hansson).
      
         - Assorted minor fixes and cleanups of the generic power domains core
           code (Ulf Hansson, Geert Uytterhoeven).
      
         - Assorted minor fixes and cleanups of the device clocks control code
           in the PM core (Geert Uytterhoeven, Grygorii Strashko).
      
         - Consolidation of device power management Kconfig options by making
           CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and removing the latter
           which is now redundant (Rafael J Wysocki and Kevin Hilman).  That
           is the first batch of the changes needed for this purpose.
      
         - Core device runtime power management support code cleanup related
           to the execution of callbacks (Andrzej Hajda).
      
         - cpuidle ARM support improvements (Lorenzo Pieralisi).
      
         - cpuidle cleanup related to the CPUIDLE_FLAG_TIME_VALID flag and a
           new MAINTAINERS entry for ARM Exynos cpuidle (Daniel Lezcano and
           Bartlomiej Zolnierkiewicz).
      
         - New cpufreq driver callback (->ready) to be executed when the
           cpufreq core is ready to use a given policy object and cpufreq-dt
           driver modification to use that callback for cooling device
           registration (Viresh Kumar).
      
         - cpufreq core fixes and cleanups (Viresh Kumar, Vince Hsu, James
           Geboski, Tomeu Vizoso).
      
         - Assorted fixes and cleanups in the cpufreq-pcc, intel_pstate,
           cpufreq-dt, pxa2xx cpufreq drivers (Lenny Szubowicz, Ethan Zhao,
           Stefan Wahren, Petr Cvek).
      
         - OPP (Operating Performance Points) framework modification to allow
           OPPs to be removed too and update of a few cpufreq drivers
           (cpufreq-dt, exynos5440, imx6q, cpufreq) to remove OPPs (added
           during initialization) on driver removal (Viresh Kumar).
      
         - Hibernation core fixes and cleanups (Tina Ruchandani and Markus
           Elfring).
      
         - PM Kconfig fix related to CPU power management (Pankaj Dubey).
      
         - cpupower tool fix (Prarit Bhargava)"
      
      * tag 'pm+acpi-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (120 commits)
        i2c-omap / PM: Drop CONFIG_PM_RUNTIME from i2c-omap.c
        dmaengine / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
        tools: cpupower: fix return checks for sysfs_get_idlestate_count()
        drivers: sh / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
        e1000e / igb / PM: Eliminate CONFIG_PM_RUNTIME
        MMC / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
        MFD / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
        misc / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
        media / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
        input / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
        leds: leds-gpio: Fix multiple instances registration without 'label' property
        iio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
        hsi / OMAP / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
        i2c-hid / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
        drm / exynos / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
        gpio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
        hwrandom / exynos / PM: Use CONFIG_PM in #ifdef
        block / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
        USB / PM: Drop CONFIG_PM_RUNTIME from the USB core
        PM: Merge the SET*_RUNTIME_PM_OPS() macros
        ...
      92a578b0
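
The unified property interface described in this pull lets a driver ask for a named property without caring whether Device Tree or ACPI supplied it. A minimal userspace sketch of that idea follows. Every name in it (`fw_node`, `fw_prop`, `node_read_u32()`) is hypothetical and is not the kernel API. The only firmware detail it relies on is that Device Tree stores 32-bit cells in big-endian byte order; an ACPI _DSD value is simply modeled as a plain integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a firmware node; none of these names are kernel APIs. */
enum fw_kind { FW_DT, FW_ACPI };

struct fw_prop {
    const char *name;
    uint8_t dt_cell[4];  /* Device Tree: one 32-bit cell, big-endian bytes */
    uint64_t acpi_int;   /* ACPI _DSD: modeled as a plain integer */
};

struct fw_node {
    enum fw_kind kind;
    const struct fw_prop *props;
    size_t nprops;
};

/*
 * Look up a 32-bit property by name, whatever the firmware flavor.
 * Returns 0 and fills *val on success, -1 if the property is missing
 * or does not fit in 32 bits.
 */
int node_read_u32(const struct fw_node *node, const char *name, uint32_t *val)
{
    assert(node && name && val);

    for (size_t i = 0; i < node->nprops; i++) {
        const struct fw_prop *p = &node->props[i];

        if (strcmp(p->name, name) != 0)
            continue;

        if (node->kind == FW_DT) {
            /* Assemble the value from big-endian bytes. */
            *val = ((uint32_t)p->dt_cell[0] << 24) |
                   ((uint32_t)p->dt_cell[1] << 16) |
                   ((uint32_t)p->dt_cell[2] << 8) |
                   (uint32_t)p->dt_cell[3];
            return 0;
        }

        if (p->acpi_int > UINT32_MAX)
            return -1;
        *val = (uint32_t)p->acpi_int;
        return 0;
    }
    return -1;
}
```

A caller in this model only ever sees `node_read_u32()`: a node of either kind that carries, say, a `poll-interval` of 100 yields the same result, which is the property the pull message attributes to the real interface.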
    • Merge tag 'pci-v3.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · c75059c4
      Linus Torvalds authored
      Pull PCI changes from Bjorn Helgaas:
       "Here are the PCI changes intended for v3.19.  I don't think there's
        anything very exciting here, but there was a lot of MSI-related stuff
        coming via Thomas.
      
        Details:
      
        NUMA
          - Allow numa_node override via sysfs (Prarit Bhargava)
      
        Resource management
          - Restore detection of read-only BARs (Myron Stowe)
          - Shrink decoding-disabled window while sizing BARs (Myron Stowe)
          - Add informational printk for invalid BARs (Myron Stowe)
          - Remove fixed parameter in pci_iov_resource_bar() (Myron Stowe)
      
        MSI
          - Add pci_msi_ignore_mask to prevent writes to MSI/MSI-X Mask Bits (Yijing Wang)
          - Revert "PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()" (Yijing Wang)
          - s390/MSI: Use __msi_mask_irq() instead of default_msi_mask_irq() (Yijing Wang)
      
        Virtualization
          - xen: Process failure for pcifront_(re)scan_root() (Chen Gang)
          - Make FLR and AF FLR reset warning messages different (Gavin Shan)
      
        Generic host bridge driver
          - Allocate config space windows after limiting bus number range (Lorenzo Pieralisi)
          - Convert to DT resource parsing API (Lorenzo Pieralisi)
      
        Freescale Layerscape
          - Add Freescale Layerscape PCIe driver (Minghuan Lian)
      
        NVIDIA Tegra
          - Do not build on 64-bit ARM (Thierry Reding)
          - Add Kconfig help text (Thierry Reding)
      
        Renesas R-Car
          - Make rcar_pci static (Jingoo Han)
      
        Samsung Exynos
          - Add exynos prefix to add_pcie_port(), pcie_init() (Jingoo Han)
      
        ST Microelectronics SPEAr13xx
          - Add spear prefix to add_pcie_port(), pcie_init() (Jingoo Han)
          - Make spear13xx_add_pcie_port() __init (Jingoo Han)
          - Remove unnecessary OOM message (Jingoo Han)
      
        TI DRA7xx
          - Add dra7xx prefix to add_pcie_port() (Jingoo Han)
          - Make dra7xx_add_pcie_port() __init (Jingoo Han)
      
        TI Keystone
          - Make ks_dw_pcie_msi_domain_ops static (Jingoo Han)
          - Remove unnecessary OOM message (Jingoo Han)
      
        Miscellaneous
          - Delete unnecessary NULL pointer checks (Markus Elfring)
          - Remove unused to_hotplug_slot() (Gavin Shan)
          - Whitespace cleanup (Jingoo Han)
          - Simplify if-return sequences (Quentin Lambert)"
      
      * tag 'pci-v3.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (28 commits)
        PCI: Remove fixed parameter in pci_iov_resource_bar()
        PCI: Add informational printk for invalid BARs
        PCI: tegra: Add Kconfig help text
        PCI: tegra: Do not build on 64-bit ARM
        PCI: spear: Remove unnecessary OOM message
        PCI: mvebu: Add a blank line after declarations
        PCI: designware: Add a blank line after declarations
        PCI: exynos: Remove unnecessary return statement
        PCI: imx6: Use tabs for indentation
        PCI: keystone: Remove unnecessary OOM message
        PCI: Remove unused and broken to_hotplug_slot()
        PCI: Make FLR and AF FLR reset warning messages different
        PCI: dra7xx: Add __init annotation to dra7xx_add_pcie_port()
        PCI: spear: Add __init annotation to spear13xx_add_pcie_port()
        PCI: spear: Rename add_pcie_port(), pcie_init() to spear13xx_add_pcie_port(), etc.
        PCI: dra7xx: Rename add_pcie_port() to dra7xx_add_pcie_port()
        PCI: layerscape: Add Freescale Layerscape PCIe driver
        PCI: Simplify if-return sequences
        PCI: Delete unnecessary NULL pointer checks
        PCI: Shrink decoding-disabled window while sizing BARs
        ...
      c75059c4
    • Merge tag 'ktest-v3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest · f74ea368
      Linus Torvalds authored
      Pull ktest changes from Steven Rostedt:
       "The following ktest updates were done:
      
         - Fix handling the make kernelrelease change
         - Fix make_min_config that was broken by new bisect_config changes
         - Allow tests to undefine default options (not just being able to
           override them)
         - Print name of test (if defined) to start of test output"
      
      * tag 'ktest-v3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
        ktest: Add back "tail -1" to kernelrelease make
        ktest: Add name to running title
        ktest: Allow tests to undefine default options
        ktest: Fix make_min_config to handle new assign_configs call
        ktest: Use make -s kernelrelease
      f74ea368
    • Merge tag 'trace-seq-buf-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 350e4f49
      Linus Torvalds authored
      Pull nmi-safe seq_buf printk update from Steven Rostedt:
       "This code is a fork from the trace-3.19 pull as it needed the
        trace_seq clean ups from that branch.
      
        This code solves the issue of performing stack dumps from NMI context.
        The issue is that printk() is not safe from NMI context as if the NMI
        were to trigger when a printk() was being performed, the NMI could
        deadlock from the printk() internal locks.  This has been seen in
        practice.
      
        With lots of review from Petr Mladek, this code went through several
        iterations, and we feel that it is now at a point of quality to be
        accepted into mainline.
      
        Here's what is contained in this patch set:
      
         - Creates a "seq_buf" generic buffer utility that allows a descriptor
           to be passed around where functions can write their own "printk()"
           formatted strings into it.  The generic version was pulled out of
           the trace_seq() code that was made specifically for tracing.
      
         - The seq_buf code was changed to model the seq_file code.  I have a
           patch (not included for 3.19) that converts the seq_file.c code
           over to use seq_buf.c like the trace_seq.c code does.  This was
           done to make sure that seq_buf.c is compatible with seq_file.c.  I
           may try to get that patch in for 3.20.
      
         - The seq_buf.c file was moved to lib/ to remove it from being
           dependent on CONFIG_TRACING.
      
         - The printk() was updated to allow for a per_cpu "override" of the
           internal calls.  That is, instead of writing to the console, a call
           to printk() may do something else.  This made it easier to allow
           the NMI to change what printk() does in order to call dump_stack()
           without needing to update that code as well.
      
         - Finally, the dump_stack from all CPUs via NMI code was converted to
           use the seq_buf code.  The caller to trigger the NMI code would
           wait till all the NMIs finished, and then it would print the
           seq_buf data to the console safely from a non-NMI context.
      
        One added bonus is that this code also makes the NMI dump stack work
        on PREEMPT_RT kernels.  As printk() includes sleeping locks on
        PREEMPT_RT, printk() only writes to console if the console does not
        use any rt_mutex converted spin locks, which a lot do"
      
      * tag 'trace-seq-buf-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        x86/nmi: Fix use of unallocated cpumask_var_t
        printk/percpu: Define printk_func when printk is not defined
        x86/nmi: Perform a safe NMI stack trace on all CPUs
        printk: Add per_cpu printk func to allow printk to be diverted
        seq_buf: Move the seq_buf code to lib/
        seq-buf: Make seq_buf_bprintf() conditional on CONFIG_BINARY_PRINTF
        tracing: Add seq_buf_get_buf() and seq_buf_commit() helper functions
        tracing: Have seq_buf use full buffer
        seq_buf: Add seq_buf_can_fit() helper function
        tracing: Add paranoid size check in trace_printk_seq()
        tracing: Use trace_seq_used() and seq_buf_used() instead of len
        tracing: Clean up tracing_fill_pipe_page()
        seq_buf: Create seq_buf_used() to find out how much was written
        tracing: Add a seq_buf_clear() helper and clear len and readpos in init
        tracing: Convert seq_buf fields to be like seq_file fields
        tracing: Convert seq_buf_path() to be like seq_path()
        tracing: Create seq_buf layer in trace_seq
      350e4f49
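
The pull message above describes two pieces: a buffer descriptor that formatted output can be written into, and a way to divert the print call so that output lands in that buffer and is flushed later from a safe context. The following is a minimal userspace sketch of both, under loud assumptions: all names (`sbuf`, `my_print()`, `divert_begin()`, `divert_end()`) are hypothetical, and a single global function pointer stands in for the per-CPU override the kernel uses.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a seq_buf-style descriptor. */
struct sbuf {
    char *buf;
    size_t size;  /* capacity of buf, including room for the trailing NUL */
    size_t len;   /* characters written; size + 1 marks overflow */
};

void sbuf_init(struct sbuf *s, char *buf, size_t size)
{
    s->buf = buf;
    s->size = size;
    s->len = 0;
    if (size > 0)
        buf[0] = '\0';
}

int sbuf_has_overflowed(const struct sbuf *s)
{
    return s->len > s->size;
}

/* Append formatted text; on lack of space, mark the buffer as overflowed. */
void sbuf_vprintf(struct sbuf *s, const char *fmt, va_list ap)
{
    if (s->len < s->size) {
        size_t room = s->size - s->len;
        int n = vsnprintf(s->buf + s->len, room, fmt, ap);

        if (n >= 0 && (size_t)n < room) {
            s->len += (size_t)n;
            return;
        }
    }
    s->len = s->size + 1;
}

typedef int (*print_fn)(const char *fmt, va_list ap);

static int console_vprint(const char *fmt, va_list ap)
{
    return vprintf(fmt, ap);
}

/* One global pointer here; the kernel change keeps one per CPU. */
static print_fn current_print = console_vprint;
static struct sbuf *divert_target;

static int buffer_vprint(const char *fmt, va_list ap)
{
    sbuf_vprintf(divert_target, fmt, ap);
    return 0;
}

/* Every caller uses this entry point; where the text goes can change. */
int my_print(const char *fmt, ...)
{
    va_list ap;
    int ret;

    va_start(ap, fmt);
    ret = current_print(fmt, ap);
    va_end(ap);
    return ret;
}

void divert_begin(struct sbuf *s)
{
    divert_target = s;
    current_print = buffer_vprint;
}

void divert_end(void)
{
    current_print = console_vprint;
    divert_target = NULL;
}
```

Code that calls `my_print()` needs no changes to be captured: between `divert_begin()` and `divert_end()` its output accumulates in the buffer, and the caller can print the buffer afterwards. This mirrors the stated motivation of letting dump_stack() run unchanged from NMI context.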
    • Merge tag 'ftracetest-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · c3280952
      Linus Torvalds authored
      Pull ftrace self-test updates from Steven Rostedt:
       "Updates for the ftrace self tests:
      
         - Added kprobes on ftrace testcase
         - Sort test cases
         - Add file to hold helper functions
         - Use logfile name supported by busybox's mktemp
         - Clear trace buffer after running kprobe test
         - Fix show descriptions when run on dash shell
         - Add --verbose option for showing echo output"
      
      * tag 'ftracetest-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        ftracetest: Add --verbose option for showing echo output
        ftracetest: Fix to show descriptions on dash
        ftracetest: Add basic event tracing test cases
        ftracetest: Clear trace buffer after running kprobe testcases
        ftracetest: Use logfile name supported by busybox's mktemp
        ftracetest: Add a couple of ftrace test cases
        ftracetest: Add functions file that holds helper functions
        ftracetest: Sort testcases
        ftracetest: Add kprobes on ftrace testcase
      c3280952
    • Merge tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 1dd7dcb6
      Linus Torvalds authored
      Pull tracing updates from Steven Rostedt:
       "There was a lot of clean ups and minor fixes.  One of those clean ups
        was to the trace_seq code.  It also removed the return values of the
        trace_seq_*() functions, so callers now use trace_seq_has_overflowed()
        to see if the buffer filled up or not.  This is similar to work being done to
        the seq_file code as well in another tree.
      
        Some of the other goodies include:
      
         - Added some "!" (NOT) logic to the tracing filter.
      
         - Fixed the frame pointer logic to the x86_64 mcount trampolines
      
         - Added the logic for dynamic trampolines on !CONFIG_PREEMPT systems.
           That is, the ftrace trampoline can be dynamically allocated and be
           called directly by functions that only have a single hook to them"
      
      * tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (55 commits)
        tracing: Truncated output is better than nothing
        tracing: Add additional marks to signal very large time deltas
        Documentation: describe trace_buf_size parameter more accurately
        tracing: Allow NOT to filter AND and OR clauses
        tracing: Add NOT to filtering logic
        ftrace/fgraph/x86: Have prepare_ftrace_return() take ip as first parameter
        ftrace/x86: Get rid of ftrace_caller_setup
        ftrace/x86: Have save_mcount_regs macro also save stack frames if needed
        ftrace/x86: Add macro MCOUNT_REG_SIZE for amount of stack used to save mcount regs
        ftrace/x86: Simplify save_mcount_regs on getting RIP
        ftrace/x86: Have save_mcount_regs store RIP in %rdi for first parameter
        ftrace/x86: Rename MCOUNT_SAVE_FRAME and add more detailed comments
        ftrace/x86: Move MCOUNT_SAVE_FRAME out of header file
        ftrace/x86: Have static tracing also use ftrace_caller_setup
        ftrace/x86: Have static function tracing always test for function graph
        kprobes: Add IPMODIFY flag to kprobe_ftrace_ops
        ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict
        kprobes/ftrace: Recover original IP if pre_handler doesn't change it
        tracing/trivial: Fix typos and make an int into a bool
        tracing: Deletion of an unnecessary check before iput()
        ...
      1dd7dcb6
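
The trace_seq cleanup mentioned in this pull replaces per-call return values with a single overflow query. A small sketch of that calling pattern is shown below, assuming a fixed-size buffer with a sticky overflow flag. The names (`tseq`, `tseq_puts()`, `tseq_has_overflowed()`, `format_event()`) are hypothetical and only imitate the shape of the kernel helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TSEQ_SIZE 16

/* Hypothetical stand-in for a trace_seq-style buffer. */
struct tseq {
    char buf[TSEQ_SIZE];
    size_t len;
    int full;  /* sticky: once set, later writes are dropped */
};

void tseq_init(struct tseq *s)
{
    s->len = 0;
    s->full = 0;
}

/* Writers return nothing; a failed write only sets the sticky flag. */
void tseq_puts(struct tseq *s, const char *str)
{
    size_t n = strlen(str);

    assert(s->len <= TSEQ_SIZE);
    if (s->full || n > TSEQ_SIZE - s->len) {
        s->full = 1;
        return;
    }
    memcpy(s->buf + s->len, str, n);
    s->len += n;
}

int tseq_has_overflowed(const struct tseq *s)
{
    return s->full;
}

/* Emit a whole line, then check once. Returns 1 if everything fit. */
int format_event(struct tseq *s, const char *comm, const char *event)
{
    tseq_puts(s, comm);
    tseq_puts(s, ": ");
    tseq_puts(s, event);
    tseq_puts(s, "\n");
    return !tseq_has_overflowed(s);
}
```

The design point is that `format_event()` carries no error handling between writes; one check at the end is enough because the flag cannot be cleared by a later, shorter write.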
    • Merge branch 'akpm' (patchbomb from Andrew) · b6da0076
      Linus Torvalds authored
      Merge first patchbomb from Andrew Morton:
       - a few minor cifs fixes
       - dma-debug updates
       - ocfs2
       - slab
       - about half of MM
       - procfs
       - kernel/exit.c
       - panic.c tweaks
       - printk updates
       - lib/ updates
       - checkpatch updates
       - fs/binfmt updates
       - the drivers/rtc tree
       - nilfs
       - kmod fixes
       - more kernel/exit.c
       - various other misc tweaks and fixes
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits)
        exit: pidns: fix/update the comments in zap_pid_ns_processes()
        exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting
        exit: exit_notify: re-use "dead" list to autoreap current
        exit: reparent: call forget_original_parent() under tasklist_lock
        exit: reparent: avoid find_new_reaper() if no children
        exit: reparent: introduce find_alive_thread()
        exit: reparent: introduce find_child_reaper()
        exit: reparent: document the ->has_child_subreaper checks
        exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper()
        exit: reparent: fix the cross-namespace PR_SET_CHILD_SUBREAPER reparenting
        exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting
        exit: proc: don't try to flush /proc/tgid/task/tgid
        exit: release_task: fix the comment about group leader accounting
        exit: wait: drop tasklist_lock before psig->c* accounting
        exit: wait: don't use zombie->real_parent
        exit: wait: cleanup the ptrace_reparented() checks
        usermodehelper: kill the kmod_thread_locker logic
        usermodehelper: don't use CLONE_VFORK for ____call_usermodehelper()
        fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp
        nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races
        ...
      b6da0076
    • exit: pidns: fix/update the comments in zap_pid_ns_processes() · a53b8315
      Oleg Nesterov authored
      The comments in zap_pid_ns_processes() are not clear, we need to explain
      how this code actually works.
      
      1. "Ignore SIGCHLD" looks like optimization but it is not, we also
         need this for correctness.
      
      2. The comment above sys_wait4() could tell more.
      
         EXIT_ZOMBIE child is only possible if it has exited before we
         ignored SIGCHLD. Or if it is traced from the parent namespace,
         but in this case it will be reaped by debugger after detach,
         sys_wait4() acts as a synchronization point.
      
      3. The comment about TASK_DEAD (EXIT_DEAD in fact) children is
         outdated. Contrary to what it says we do not need to make sure
         they all go away after 0a01f2cc "pidns: Make the pidns proc
         mount/umount logic obvious".
      
         At the same time, we do need to wait for nr_hashed==init_pids,
         but the reasons are quite different and not obvious: setns().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a53b8315
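
The first point of this commit message rests on what ignoring SIGCHLD does: exiting children are reaped automatically instead of lingering as zombies. The sketch below is only a userspace analogy of that rule, not the kernel code in zap_pid_ns_processes(). It relies on the POSIX behavior that, with SIGCHLD set to SIG_IGN, wait() blocks until all children have exited and then fails with ECHILD. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Ignore SIGCHLD, fork one child that exits at once, then call wait().
 * Returns the errno reported by wait(), 0 if wait() unexpectedly
 * collected a child, or -1 if the setup itself failed.
 */
int wait_errno_with_sigchld_ignored(void)
{
    struct sigaction ign = { 0 };
    struct sigaction old;
    int err = 0;
    pid_t pid;

    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    if (sigaction(SIGCHLD, &ign, &old) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        sigaction(SIGCHLD, &old, NULL);
        return -1;
    }
    if (pid == 0)
        _exit(0);  /* child: exit immediately */

    /* The child is auto-reaped, so there is nothing left to collect. */
    if (wait(NULL) == -1)
        err = errno;

    sigaction(SIGCHLD, &old, NULL);  /* restore the previous disposition */
    return err;
}
```

In this analogy wait() still acts as a synchronization point, since it does not return until the child is gone, which is the role the commit message assigns to sys_wait4().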
    • exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting · 24c037eb
      Oleg Nesterov authored
      alloc_pid() does get_pid_ns() beforehand but forgets to put_pid_ns() if it
      fails because disable_pid_allocation() was called by the exiting
      child_reaper.
      
      We could simply move get_pid_ns() down to successful return, but this fix
      tries to be as trivial as possible.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      24c037eb
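
The leak described in this commit is a general reference-counting mistake: a reference is taken before the work starts and one failure path returns without dropping it. The following sketch shows the shape of the trivial fix. `struct ns`, `ns_get()`, `ns_put()` and `alloc_id()` are hypothetical stand-ins, not the kernel's pid_namespace code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted namespace. */
struct ns {
    int refcount;
    int alloc_disabled;  /* set once the namespace's reaper is exiting */
};

static void ns_get(struct ns *ns)
{
    ns->refcount++;
}

static void ns_put(struct ns *ns)
{
    assert(ns->refcount > 0);
    ns->refcount--;
}

/*
 * Returns 0 on success, in which case the new id keeps the reference
 * taken here, or -1 on failure with the reference count unchanged.
 */
int alloc_id(struct ns *ns)
{
    ns_get(ns);  /* reference taken up front */

    if (ns->alloc_disabled) {
        ns_put(ns);  /* the fix: drop it again on this failure path */
        return -1;
    }
    return 0;
}
```

Moving `ns_get()` down to the successful return would remove the need for the extra `ns_put()`, which is the alternative the commit message mentions and declines in favor of the smallest possible change.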
    • exit: exit_notify: re-use "dead" list to autoreap current · 6c66e7db
      Oleg Nesterov authored
      After the previous change we can add just the exiting EXIT_DEAD task to
      the "dead" list and remove another release_task(tsk).
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6c66e7db
    • exit: reparent: call forget_original_parent() under tasklist_lock · 482a3767
      Oleg Nesterov authored
      Shift "release dead children" loop from forget_original_parent() to its
      caller, exit_notify().  It is safe to reap them even if our parent reaps
      us right after we drop tasklist_lock, those children no longer have any
      connection to the exiting task.
      
      And this allows us to avoid write_lock_irq(tasklist_lock) right after it
      was released by forget_original_parent(), we can simply call it with
      tasklist_lock held.
      
      While at it, move the comment about forget_original_parent() up to
      this function.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      482a3767
    • exit: reparent: avoid find_new_reaper() if no children · ad9e206a
      Oleg Nesterov authored
      Now that the pid_ns logic is isolated, we can change forget_original_parent()
      to return right after find_child_reaper() when father->children is empty;
      there is nothing to reparent in this case.
      
      In particular, this avoids find_alive_thread(), which helps when a whole
      process exits with a lot of PF_EXITING threads at the start of the thread
      list: every exiting thread has to walk past them, which can easily lead to
      O(nr_threads ** 2) iterations.
      
      Trivial test case (tested under KVM, 2 CPUs):
      
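          #include <assert.h>
          #include <pthread.h>
          #include <signal.h>
          #include <stdlib.h>
          #include <sys/wait.h>
          #include <unistd.h>
      
          /* Each thread just sleeps until the process is killed. */
          static void *tfunc(void *arg)
          {
              pause();
              return NULL;
          }
      
          /* Create nt sleeping threads, then kill the whole process. */
          static int child(unsigned int nt)
          {
              pthread_t pt;
      
              while (nt--)
                  assert(pthread_create(&pt, NULL, tfunc, NULL) == 0);
      
              /* SIGTRAP is not handled, so its default action ends the process. */
              pthread_kill(pt, SIGTRAP);
              pause();
              return 0;
          }
      
          int main(int argc, const char *argv[])
          {
              int stat;
              unsigned int nf = atoi(argv[1]);    /* number of forks */
              unsigned int nt = atoi(argv[2]);    /* threads per child */
      
              while (nf--) {
                  if (!fork())
                      return child(nt);
      
                  wait(&stat);
                  /* Holds when the child dies of SIGTRAP without dumping core. */
                  assert(stat == SIGTRAP);
              }
      
              return 0;
          }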
      
      $ time ./test 16 16536 shows:
      
                    real        user         sys
          -    5m37.628s    0m4.437s    8m5.560s
          +    0m50.032s    0m7.130s    1m4.927s
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ad9e206a
    • exit: reparent: introduce find_alive_thread() · c9dc05bf
      Oleg Nesterov authored
      Add the new simple helper to factor out the for_each_thread() code in
      find_child_reaper() and find_new_reaper().  It can also simplify the
      potential PF_EXITING -> exit_state change, plus perhaps we can change this
      code to take SIGNAL_GROUP_EXIT into account.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c9dc05bf
    • exit: reparent: introduce find_child_reaper() · 1109909c
      Oleg Nesterov authored
      find_new_reaper() does 2 completely different things.  Not only does it
      find a reaper, it also updates pid_ns->child_reaper or kills the whole
      namespace if the caller is ->child_reaper.
      
      Now that the has_child_subreaper logic doesn't depend on the child_reaper
      check, we can move that pid_ns code into a separate helper.  IMHO this
      makes the code cleaner, and it enables the next changes.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1109909c
    • exit: reparent: document the ->has_child_subreaper checks · 175aed3f
      Oleg Nesterov authored
      Swap the "init_task" and same_thread_group() checks.  This way it is
      simpler to document these checks and we can remove the link to the
      previous discussion on lkml.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      175aed3f
    • exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper() · 3750ef97
      Oleg Nesterov authored
      Change find_new_reaper() to use for_each_thread() instead of the deprecated
      while_each_thread().  We do not bother to check "thread != father" in the
      1st loop; we can rely on the PF_EXITING check.
      
      Note: this means a minor behavioural change: for_each_thread() starts
      from the group leader.  But this should be fine, nobody should make any
      assumption about do_wait(__WNOTHREAD) when it comes to reparented tasks.
      And this can avoid pointless reparenting to a short-lived thread,
      while zombie leaders are not that common.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3750ef97
    • exit: reparent: fix the cross-namespace PR_SET_CHILD_SUBREAPER reparenting · 7d24e2df
      Oleg Nesterov authored
      find_new_reaper() assumes that "has_child_subreaper" logic is safe as
      long as we are not the exiting ->child_reaper and this is doubly wrong:
      
      1. In fact it is safe if "pid_ns->child_reaper == father"; there must
         be no children after zap_pid_ns_processes() returns, so it doesn't
         matter what we return in this case and even pid_ns->child_reaper is
         wrong otherwise: we can't reparent to ->child_reaper == current.
      
         This is not a bug, but this is confusing.
      
      2. It is not safe if we are not pid_ns->child_reaper but from the same
         thread group. We drop tasklist_lock before zap_pid_ns_processes(),
         so another thread can lock it and choose the new reaper from the
         upper namespace if has_child_subreaper == T, and this is obviously
         wrong.
      
         This is not that bad, zap_pid_ns_processes() won't return until
         the new reaper reaps all zombies, but this should be fixed anyway.
      
      We could change for_each_thread() loop to use ->exit_state instead of
      PF_EXITING which we had to use until 8aac6270, or we could change
      copy_signal() to check CLONE_NEWPID before setting has_child_subreaper,
      but let's change this code so that it is clear we can't look outside of
      our namespace, otherwise same_thread_group(reaper, child_reaper) check
      will look wrong and confusing anyway.
      
      We can simply start from "father" and fix the problem. We can't wrongly
      return a thread from the same thread group if ->is_child_subreaper == T,
      we know that all threads have PF_EXITING set.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7d24e2df
    • exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting · 8a1296ae
      Oleg Nesterov authored
      The ->has_child_subreaper code in find_new_reaper() finds an alive "thread"
      but returns another thread, "reaper", which can be dead.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8a1296ae
    • exit: proc: don't try to flush /proc/tgid/task/tgid · c35a7f18
      Oleg Nesterov authored
      proc_flush_task_mnt() always tries to flush task/pid, but this is
      pointless if we reap the leader. d_invalidate() is recursive, and
      if nothing else the next d_hash_and_lookup(tgid) should fail anyway.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c35a7f18
    • exit: release_task: fix the comment about group leader accounting · 26e75b5c
      Oleg Nesterov authored
      Contrary to what the comment in __exit_signal() says we do account the
      group leader. Fix this and explain why.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      26e75b5c
    • exit: wait: drop tasklist_lock before psig->c* accounting · 986094df
      Oleg Nesterov authored
      wait_task_zombie() no longer needs tasklist_lock to accumulate the
      psig->c* counters, we can drop it right after cmpxchg(exit_state).
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      986094df
    • exit: wait: don't use zombie->real_parent · f953ccd0
      Oleg Nesterov authored
      1. wait_task_zombie() uses p->real_parent to get psig/siglock. This is
         correct but needs tasklist_lock, because ->real_parent can exit.
      
         We can use "current" instead. This is our natural child, its parent
         must be our sub-thread.
      
      2. Read psig/sig outside of ->siglock, ->signal is no longer protected
         by this lock.
      
      3. Fix the outdated comments about tasklist_lock. We cannot race with
         __exit_signal(): the whole thread group is dead, so nobody but us can
         call it.
      
         Also clarify the usage of ->stats_lock and ->siglock.
      
      Note: thread_group_cputime_adjusted() is sub-optimal in this case, we
      probably want to export cputime_adjust() to avoid thread_group_cputime().
      The comment says "all threads" but there are no other threads.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f953ccd0
    • exit: wait: cleanup the ptrace_reparented() checks · f6507f83
      Oleg Nesterov authored
      Now that EXIT_DEAD is the terminal state we can kill "int traced"
      variable and check "state == EXIT_DEAD" instead to cleanup the code.  In
      particular, this way it is clear that the check obviously doesn't need
      tasklist_lock.
      
      Also fix the type of "unsigned long state", "long" was always wrong
      although this doesn't matter because cmpxchg/xchg uses typeof(*ptr).
      
      [akpm@linux-foundation.org: don't make me google the C Operator Precedence table]
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sterling Alexander <stalexan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f6507f83
    • usermodehelper: kill the kmod_thread_locker logic · 7f6def9f
      Oleg Nesterov authored
      Now that we do not call kernel_thread(CLONE_VFORK) from the worker
      thread, we cannot deadlock if do_execve() in turn triggers another
      call_usermodehelper(), so we can remove the kmod_thread_locker code.
      
      Note: we should probably kill khelper_wq and simply use one of the
      global workqueues, say, system_unbound_wq, this special wq for umh buys
      nothing nowadays.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7f6def9f
    • usermodehelper: don't use CLONE_VFORK for ____call_usermodehelper() · 7117bc88
      Oleg Nesterov authored
      After "kernel/kmod: fix use-after-free of the sub_info structure"
      CLONE_VFORK in __call_usermodehelper() buys nothing, we rely on
      umh_complete() in ____call_usermodehelper() anyway.
      
      Remove it.  This also eliminates the unnecessary sleep/wakeup in the
      likely case, and this allows the next change.
      
      While at it, kill the "int wait" locals in ____call_usermodehelper() and
      __call_usermodehelper(), they can safely use sub_info->wait.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7117bc88
    • fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp · ddbc22e2
      Rasmus Villemoes authored
      Relying on the sign (after casting to int) of the difference of two
      quantities for comparison is usually wrong.  For example, should a-b
      turn out to be 2^31, the return value of cmp(a,b) is -2^31; but that
      would also be the return value from cmp(b, a).  So a compares less than
      b and b compares less than a.  One can also easily find three values
      a,b,c such that a compares less than b, b compares less than c, but a
      does not compare less than c.
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ddbc22e2