1. 03 Jun, 2014 16 commits
    • Rafael J. Wysocki's avatar
      Merge branches 'pnp', 'powercap', 'pm-runtime' and 'pm-opp' · cd0c5bd3
      Rafael J. Wysocki authored
      * pnp:
        MAINTAINERS: Remove Bjorn Helgaas as PNP maintainer
        PNP / resources: remove positive test on unsigned values
      
      * powercap:
        powercap / RAPL: add new CPU IDs
        powercap / RAPL: further relax energy counter checks
      
      * pm-runtime:
        PM / runtime: Update documentation to reflect the current code flow
      
      * pm-opp:
        PM / OPP: discard duplicate OPPs
        PM / OPP: Make OPP invisible to users in Kconfig
        PM / OPP: fix incorrect OPP count handling in of_init_opp_table
      cd0c5bd3
    • Rafael J. Wysocki's avatar
      Merge branch 'acpi-thermal' · e81a0e77
      Rafael J. Wysocki authored
      * acpi-thermal:
        ACPI / thermal: Use acpi_bus_attach_private_data() to attach private data
      e81a0e77
    • Rafael J. Wysocki's avatar
      Merge branch 'acpi-video' · d9bd4493
      Rafael J. Wysocki authored
      * acpi-video:
        ACPI / video: Add 4 new models to the use_native_backlight DMI list
        ACPI / video: Add use native backlight quirk for the ThinkPad W530
        ACPI / video: Unregister the backlight device if a raw one shows up later
        backlight: Add backlight device (un)registration notification
        nouveau: Don't check acpi_video_backlight_support() before registering backlight
        acer-wmi: Add Aspire 5741 to video_vendor_dmi_table
        acer-wmi: Switch to acpi_video_unregister_backlight
        ACPI / video: Add an acpi_video_unregister_backlight function
        ACPI / video: Don't register acpi_video_resume notifier without backlight devices
        ACPI / video: change acpi-video brightness_switch_enabled default to 0
      d9bd4493
    • Rafael J. Wysocki's avatar
      Merge branch 'acpica' · 0e36d43c
      Rafael J. Wysocki authored
      * acpica: (63 commits)
        ACPICA: Namespace: Remove _PRP method support.
        ACPI: Fix x86 regression related to early mapping size limitation
        ACPICA: Tables: Add mechanism to control early table checksum verification.
        ACPICA: acpidump: Fix repetitive table dump in -n mode.
        ACPI: Clean up acpi_os_map/unmap_memory() to eliminate __iomem.
        ACPICA: Clean up redudant definitions already defined elsewhere
        ACPICA: Linux headers: Add <asm/acenv.h> to remove mis-ordered inclusion of <asm/acpi.h>
        ACPICA: Linux headers: Add <acpi/platform/aclinuxex.h>
        ACPICA: Linux headers: Remove ACPI_PREEMPTION_POINT() due to no usages.
        ACPICA: Update version to 20140424.
        ACPICA: Comment/format update, no functional change.
        ACPICA: Events: Update GPE handling and initialization code.
        ACPICA: Remove extraneous error message for large number of GPEs.
        ACPICA: Tables: Remove old mechanism to validate if XSDT contains NULL entries.
        ACPICA: Tables: Add new mechanism to skip NULL entries in RSDT and XSDT.
        ACPICA: acpidump: Add support to force using RSDT.
        ACPICA: Back port of improvements on exception code.
        ACPICA: Back port of _PRP update.
        ACPICA: acpidump: Fix truncated RSDP signature validation.
        ACPICA: Linux header: Add support for stubbed externals.
        ...
      0e36d43c
    • Rafael J. Wysocki's avatar
      Merge branch 'acpi-enumeration' · b04c58b1
      Rafael J. Wysocki authored
      * acpi-enumeration:
        ACPI / scan: use platform bus type by default for _HID enumeration
        ACPI / scan: always register ACPI LPSS scan handler
        ACPI / scan: always register memory hotplug scan handler
        ACPI / scan: always register container scan handler
        ACPI / scan: Change the meaning of missing .attach() in scan handlers
        ACPI / scan: introduce platform_id device PNP type flag
        ACPI / scan: drop unsupported serial IDs from PNP ACPI scan handler ID list
        ACPI / scan: drop IDs that do not comply with the ACPI PNP ID rule
        ACPI / PNP: use device ID list for PNPACPI device enumeration
        ACPI / scan: .match() callback for ACPI scan handlers
      b04c58b1
    • Rafael J. Wysocki's avatar
      Merge branch 'acpi-lpss' · 864e055f
      Rafael J. Wysocki authored
      * acpi-lpss:
        ACPI / LPSS: support for fractional divider clock
        ACPI / LPSS: custom power domain for LPSS
      864e055f
    • Rafael J. Wysocki's avatar
      Merge branch 'pm-clk' · 6ad29246
      Rafael J. Wysocki authored
      * pm-clk:
        clk: new basic clk type for fractional divider
      6ad29246
    • Rafael J. Wysocki's avatar
      Merge branch 'acpi-platform' · 2770b8b1
      Rafael J. Wysocki authored
      * acpi-platform:
        ACPI / platform / LPSS: Enable async suspend/resume of LPSS devices
        ACPI / platform: add IDs for Broadcom Bluetooth and GPS chips
      2770b8b1
    • Rafael J. Wysocki's avatar
      Merge branch 'acpi-pm' · a392f7d4
      Rafael J. Wysocki authored
      * acpi-pm:
        ACPI / PM: Export rest of the subsys PM callbacks
        ACPI / PM: Avoid resuming devices in ACPI PM domain during system suspend
        ACPI / PM: Hold ACPI scan lock over the "freeze" sleep state
        ACPI / PM: Export acpi_target_system_state() to modules
      a392f7d4
    • Rafael J. Wysocki's avatar
      Merge branch 'acpi-battery' · f58c41cc
      Rafael J. Wysocki authored
      * acpi-battery:
        ACPI / battery: wakeup the system only when necessary
        power_supply: allow power supply devices registered w/o wakeup source
        ACPI / battery: introduce support for POWER_SUPPLY_PROP_CAPACITY_LEVEL
        ACPI / battery: Accelerate battery resume callback
      f58c41cc
    • Rafael J. Wysocki's avatar
      Merge branch 'pm-sleep' · ee7f9d7c
      Rafael J. Wysocki authored
      * pm-sleep:
        PM / hibernate: fixed typo in comment
        PM / sleep: unregister wakeup source when disabling device wakeup
        PM / sleep: Introduce command line argument for sleep state enumeration
        PM / sleep: Use valid_state() for platform-dependent sleep states only
        PM / sleep: Add state field to pm_states[] entries
        PM / sleep: Update device PM documentation to cover direct_complete
        PM / sleep: Mechanism to avoid resuming runtime-suspended devices unnecessarily
        PM / hibernate: Fix memory corruption in resumedelay_setup()
        PM / hibernate: convert simple_strtoul to kstrtoul
        PM / hibernate: Documentation: Fix script for unswapping
        PM / hibernate: no kernel_power_off when pm_power_off NULL
        PM / hibernate: use unsigned local variables in swsusp_show_speed()
      ee7f9d7c
    • Rafael J. Wysocki's avatar
      Merge branch 'pm-cpuidle' · 97b80e68
      Rafael J. Wysocki authored
      * pm-cpuidle:
        PM / suspend: Always use deepest C-state in the "freeze" sleep state
        cpuidle / menu: move repeated correction factor check to init
        cpuidle / menu: Return (-1) if there are no suitable states
        cpuidle: Combine cpuidle_enabled() with cpuidle_select()
        ARM: clps711x: Add cpuidle driver
      97b80e68
    • Rafael J. Wysocki's avatar
      Merge branches 'acpi-tables' and 'acpi-general' · 91ab377b
      Rafael J. Wysocki authored
      * acpi-tables:
        ACPI: Fix conflict between customized DSDT and DSDT local copy
      
      * acpi-general:
        ACPI: Add acpi_bus_attach_private_data() to attach data to ACPI handle
      91ab377b
    • Rafael J. Wysocki's avatar
      Merge branches 'acpi-processor' and 'acpi-pad' · 26f8784e
      Rafael J. Wysocki authored
      * acpi-processor:
        ACPI / processor: Fix STARTING/DYING action in acpi_cpu_soft_notify()
        ACPI / processor: Check if LAPIC is present during initialization
        ACPI / ia64: introduce variable acpi_lapic into ia64
      
      * acpi-pad:
        ACPI / PAD: Use time_before() for time comparison
        ACPI / PAD: call schedule() when need_resched() is true
      26f8784e
    • Rafael J. Wysocki's avatar
      Merge branches 'acpi-scan', 'acpi-hotplug' and 'acpi-pci' · 4db367fd
      Rafael J. Wysocki authored
      * acpi-scan:
        ACPI / scan: do not scan fixed hardware on HW-reduced platform
      
      * acpi-hotplug:
        ACPI: add dynamic_debug support
        ACPI / notify: Clean up handling of hotplug events
      
      * acpi-pci:
        ACPI / PCI: Stub out pci_acpi_crs_quirks() and make it x86 specific
      4db367fd
    • Lv Zheng's avatar
      ACPICA: Namespace: Remove _PRP method support. · 729ddb65
      Lv Zheng authored
      The _PRP method is not going to be a part of the ACPI standard. This patch
      removes its support code introduced by the following commits:
       1. ACPICA: Predefined names: Add support for the _PRP method.
       2. ACPICA: Update for _PRP predefined name.
       3. ACPICA: Add support for _LPD and _PRP methods.
       4. ACPICA: Back port of _PRP update.
      Signed-off-by: default avatarLv Zheng <lv.zheng@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      729ddb65
  2. 02 Jun, 2014 8 commits
  3. 01 Jun, 2014 1 commit
  4. 31 May, 2014 6 commits
    • Niv Yehezkel's avatar
      PM / hibernate: fixed typo in comment · 057b0a75
      Niv Yehezkel authored
      Fix a trivial comment typo (s/mam/map) in kernel/power/swap.c.
      Signed-off-by: default avatarNiv Yehezkel <executerx@gmail.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      057b0a75
    • Lv Zheng's avatar
      ACPI: Fix x86 regression related to early mapping size limitation · 4fc0a7e8
      Lv Zheng authored
      The following warning message is triggered:
       WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:136 __early_ioremap+0x11f/0x1f2()
       Modules linked in:
       CPU: 0 PID: 0 Comm: swapper Not tainted 3.15.0-rc1-00017-g86dfc6f3-dirty #298
       Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.99.99.x036.091920111209 09/19/2011
        0000000000000009 ffffffff81b75c40 ffffffff817c627b 0000000000000000
        ffffffff81b75c78 ffffffff81067b5d 000000000000007b 8000000000000563
        00000000b96b20dc 0000000000000001 ffffffffff300e0c ffffffff81b75c88
       Call Trace:
        [<ffffffff817c627b>] dump_stack+0x45/0x56
        [<ffffffff81067b5d>] warn_slowpath_common+0x7d/0xa0
        [<ffffffff81067c3a>] warn_slowpath_null+0x1a/0x20
        [<ffffffff81d4b9d5>] __early_ioremap+0x11f/0x1f2
        [<ffffffff81d4bc5b>] early_ioremap+0x13/0x15
        [<ffffffff81d2b8f3>] __acpi_map_table+0x13/0x18
        [<ffffffff817b8d1a>] acpi_os_map_memory+0x26/0x14e
        [<ffffffff813ff018>] acpi_tb_acquire_table+0x42/0x70
        [<ffffffff813ff086>] acpi_tb_validate_table+0x27/0x37
        [<ffffffff813ff0e5>] acpi_tb_verify_table+0x22/0xd8
        [<ffffffff813ff6a8>] acpi_tb_install_non_fixed_table+0x60/0x1c9
        [<ffffffff81d61024>] acpi_tb_parse_root_table+0x218/0x26a
        [<ffffffff81d1b120>] ? early_idt_handlers+0x120/0x120
        [<ffffffff81d610cd>] acpi_initialize_tables+0x57/0x59
        [<ffffffff81d5f25d>] acpi_table_init+0x1b/0x99
        [<ffffffff81d2bca0>] acpi_boot_table_init+0x1e/0x85
        [<ffffffff81d23043>] setup_arch+0x99d/0xcc6
        [<ffffffff81d1b120>] ? early_idt_handlers+0x120/0x120
        [<ffffffff81d1bbbe>] start_kernel+0x8b/0x415
        [<ffffffff81d1b120>] ? early_idt_handlers+0x120/0x120
        [<ffffffff81d1b5ee>] x86_64_start_reservations+0x2a/0x2c
        [<ffffffff81d1b72e>] x86_64_start_kernel+0x13e/0x14d
       ---[ end trace 11ae599a1898f4e7 ]---
      when installing the following table during early stage:
       ACPI: SSDT 0x00000000B9638018 07A0C4 (v02 INTEL  S2600CP  00004000 INTL 20100331)
      The regression is caused by the size limitation of the x86 early IO mapping.
      
      The root cause is:
       1. ACPICA doesn't split IO memory mapping and table mapping;
       2. Linux x86 OSL implements acpi_os_map_memory() using a size limited fix-map
          mechanism during early boot stage, which is more suitable for only IO
          mappings.
      
      This patch fixes this issue by utilizing acpi_gbl_verify_table_checksum to
      disable the table mapping during early stage and enabling it again for the
      late stage. In this way, the normal code path is not affected. Then after
      the code related to the root cause is cleaned up, the early checksum
      verification can be easily re-enabled.
      
      A new boot parameter - acpi_force_table_verification is introduced for
      the platforms that require the checksum verification to stop loading bad
      tables.
      
      This fix also covers the checksum verification for the table overrides. Now
      large tables can also be overridden using the initrd override mechanism.
      Signed-off-by: default avatarLv Zheng <lv.zheng@intel.com>
      Reported-and-tested-by: default avatarYuanhan Liu <yuanhan.liu@linux.intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      4fc0a7e8
    • Lv Zheng's avatar
      ACPICA: Tables: Add mechanism to control early table checksum verification. · 47d68c7f
      Lv Zheng authored
      It is reported that Linux x86 kernel cannot map large tables. The following
      large SSDT table on such platform fails to pass checksum verification and
      cannot be installed:
       ACPI: SSDT 0x00000000B9638018 07A0C4 (v02 INTEL  S2600CP  00004000 INTL 20100331)
      
      It sounds strange that in the 64-bit virtual memory address space, we
      cannot map a single ACPI table to do checksum verification. The root cause
      is:
       1. ACPICA doesn't split IO memory mapping and table mapping;
       2. Linux x86 OSL implements acpi_os_map_memory() using a size limited fix-map
          mechanism during early boot stage, which is more suitable for only IO
          mappings.
      
      ACPICA originally only mapped table header for signature validation, and
      this header mapping is required by OSL override mechanism. There was no
      checksum verification because we could not map the whole table using this
      OSL. While the following ACPICA commit enforces checksum verification by
      mapping the whole table during Linux boot stage and it finally triggers
      this issue on some platforms:
       Commit: 86dfc6f3
       Subject: ACPICA: Tables: Fix table checksums verification before installation.
      
      Before doing further cleanups for the OSL table mapping and override
      implementation, this patch introduces an option for such OSPMs to
      temporarily discard the checksum verification feature. It then can be
      re-enabled easily when the ACPICA and the underlying OSL is ready.
      
      This patch also deletes a comment around the limitation of mappings because
      it is not correct. The limitation is not how many times we can map in the
      early stage, but the OSL mapping facility may not be suitable for mapping
      the ACPI tables and thus may complain us the size limitation.
      
      The acpi_tb_verify_table() is renamed to acpi_tb_verify_temp_table() due to the
      work around added, it now only applies to the table descriptor that hasn't
      been installed and cannot be used in other cases. Lv Zheng.
      Tested-by: default avatarYuanhan Liu <yuanhan.liu@linux.intel.com>
      Signed-off-by: default avatarLv Zheng <lv.zheng@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      47d68c7f
    • Linus Torvalds's avatar
      Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a4bf79eb
      Linus Torvalds authored
      Pull core futex/rtmutex fixes from Thomas Gleixner:
       "Three fixlets for long standing issues in the futex/rtmutex code
        unearthed by Dave Jones syscall fuzzer:
      
         - Add missing early deadlock detection checks in the futex code
         - Prevent user space from attaching a futex to kernel threads
         - Make the deadlock detector of rtmutex work again
      
        Looks large, but is more comments than code change"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        rtmutex: Fix deadlock detector for real
        futex: Prevent attaching to kernel threads
        futex: Add another early deadlock detection check
      a4bf79eb
    • Linus Torvalds's avatar
      Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 80e06794
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Mostly quiet now:
      
        i915:
          fixing userspace visiblie issues, all stable marked
      
        radeon:
          one more pll fix, two crashers, one suspend/resume regression"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
        drm/radeon: Resume fbcon last
        drm/radeon: only allocate necessary size for vm bo list
        drm/radeon: don't allow RADEON_GEM_DOMAIN_CPU for command submission
        drm/radeon: avoid crash if VM command submission isn't available
        drm/radeon: lower the ref * post PLL maximum once more
        drm/i915: Prevent negative relocation deltas from wrapping
        drm/i915: Only copy back the modified fields to userspace from execbuffer
        drm/i915: Fix dynamic allocation of physical handles
      80e06794
    • Linus Torvalds's avatar
      dcache: add missing lockdep annotation · 9f12600f
      Linus Torvalds authored
      lock_parent() very much on purpose does nested locking of dentries, and
      is careful to maintain the right order (lock parent first).  But because
      it didn't annotate the nested locking order, lockdep thought it might be
      a deadlock on d_lock, and complained.
      
      Add the proper annotation for the inner locking of the child dentry to
      make lockdep happy.
      
      Introduced by commit 046b961b ("shrink_dentry_list(): take parent's
      ->d_lock earlier").
      Reported-and-tested-by: default avatarJosh Boyer <jwboyer@fedoraproject.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9f12600f
  5. 30 May, 2014 9 commits
    • Daniel Vetter's avatar
      drm/radeon: Resume fbcon last · 18ee37a4
      Daniel Vetter authored
      So a few people complained that
      
      commit 177cf92d
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Tue Apr 1 22:14:59 2014 +0200
      
          drm/crtc-helpers: fix dpms on logic
      
      which was merged into 3.15-rc1, broke resume on radeons. Strangely git
      bisect lead everyone to
      
      commit 25f397a4
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Fri Jul 19 18:57:11 2013 +0200
      
          drm/crtc-helper: explicit DPMS on after modeset
      
      which was merged long ago and actually part of 3.14.
      
      Digging deeper I've noticed (again) that the call to
      drm_helper_resume_force_mode in the radeon resume handlers was a no-op
      previously because everything gets shut down on suspend. radeon does
      this with explicit calls to drm_helper_connector_dpms with DPMS_OFF.
      But with 177c we now force the dpms state to ON, so suddenly
      resume_force_mode actually forced the crtcs back on.
      
      This is the intention of the change after all, the problem is that
      radeon resumes the fbdev console layer _before_ restoring the display,
      through calling fb_set_suspend. And fbcon does an immediate ->set_par,
      which in turn causes the same forced mode restore to happen.
      
      Two concurrent modeset operations didn't lead to happiness. Fix this
      by delaying the fbcon resume until the end of the readeon resum
      functions.
      
      v2: Fix up a bit of the spelling fail.
      
      References: https://lkml.org/lkml/2014/5/29/1043
      References: https://lkml.org/lkml/2014/5/2/388
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=74751Tested-by: default avatarKen Moffat <zarniwhoop@ntlworld.com>
      Cc: Alex Deucher <alexdeucher@gmail.com>
      Cc: Ken Moffat <zarniwhoop@ntlworld.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarDave Airlie <airlied@gmail.com>
      18ee37a4
    • Dave Airlie's avatar
      Merge branch 'drm-fixes-3.15' of git://people.freedesktop.org/~deathsimple/linux into drm-fixes · 1446e04c
      Dave Airlie authored
      this is the next pull request for stashed up radeon fixes for 3.15. This is finally calming down with only four patches in this pull request.
      
      * 'drm-fixes-3.15' of git://people.freedesktop.org/~deathsimple/linux:
        drm/radeon: only allocate necessary size for vm bo list
        drm/radeon: don't allow RADEON_GEM_DOMAIN_CPU for command submission
        drm/radeon: avoid crash if VM command submission isn't available
        drm/radeon: lower the ref * post PLL maximum once more
      1446e04c
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 1487385e
      Linus Torvalds authored
      Pull input subsystem fixes from Dmitry Torokhov:
       "A couple of driver/build fixups and also redone quirk for Synaptics
        touchpads on Lenovo boxes (now using PNP IDs instead of DMI data to
        limit number of quirks)"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: synaptics - change min/max quirk table to pnp-id matching
        Input: synaptics - add a matches_pnp_id helper function
        Input: synaptics - T540p - unify with other LEN0034 models
        Input: synaptics - add min/max quirk for the ThinkPad W540
        Input: ambakmi - request a shared interrupt for AMBA KMI devices
        Input: pxa27x-keypad - fix generating scancode
        Input: atmel-wm97xx - only build for AVR32
        Input: fix ps2/serio module dependency
      1487385e
    • Linus Torvalds's avatar
      Merge tag 'firewire-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394 · 1326af24
      Linus Torvalds authored
      Pull firewire fix from Stefan Richter:
       "A regression fix for the IEEE 1394 subsystem: re-enable IRQ-based
        asynchronous request reception at addresses below 128 TB"
      
      * tag 'firewire-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
        firewire: revert to 4 GB RDMA, fix protocols using Memory Space
      1326af24
    • Linus Torvalds's avatar
      Merge tag 'dm-3.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm · 24e19d27
      Linus Torvalds authored
      Pull device-mapper fixes from Mike Snitzer:
       "A dm-cache stable fix to split discards on cache block boundaries
        because dm-cache cannot yet handle discards that span cache blocks.
      
        Really fix a dm-mpath LOCKDEP warning that was introduced in -rc1.
      
        Add a 'no_space_timeout' control to dm-thinp to restore the ability to
        queue IO indefinitely when no data space is available.  This fixes a
        change in behavior that was introduced in -rc6 where the timeout
        couldn't be disabled"
      
      * tag 'dm-3.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm mpath: really fix lockdep warning
        dm cache: always split discards on cache block boundaries
        dm thin: add 'no_space_timeout' dm-thin-pool module param
      24e19d27
    • Minchan Kim's avatar
      x86_64: expand kernel stack to 16K · 6538b8ea
      Minchan Kim authored
      While I play inhouse patches with much memory pressure on qemu-kvm,
      3.14 kernel was randomly crashed. The reason was kernel stack overflow.
      
      When I investigated the problem, the callstack was a little bit deeper
      by involve with reclaim functions but not direct reclaim path.
      
      I tried to diet stack size of some functions related with alloc/reclaim
      so did a hundred of byte but overflow was't disappeard so that I encounter
      overflow by another deeper callstack on reclaim/allocator path.
      
      Of course, we might sweep every sites we have found for reducing
      stack usage but I'm not sure how long it saves the world(surely,
      lots of developer start to add nice features which will use stack
      agains) and if we consider another more complex feature in I/O layer
      and/or reclaim path, it might be better to increase stack size(
      meanwhile, stack usage on 64bit machine was doubled compared to 32bit
      while it have sticked to 8K. Hmm, it's not a fair to me and arm64
      already expaned to 16K. )
      
      So, my stupid idea is just let's expand stack size and keep an eye
      toward stack consumption on each kernel functions via stacktrace of ftrace.
      For example, we can have a bar like that each funcion shouldn't exceed 200K
      and emit the warning when some function consumes more in runtime.
      Of course, it could make false positive but at least, it could make a
      chance to think over it.
      
      I guess this topic was discussed several time so there might be
      strong reason not to increase kernel stack size on x86_64, for me not
      knowing so Ccing x86_64 maintainers, other MM guys and virtio
      maintainers.
      
      Here's an example call trace using up the kernel stack:
      
               Depth    Size   Location    (51 entries)
               -----    ----   --------
         0)     7696      16   lookup_address
         1)     7680      16   _lookup_address_cpa.isra.3
         2)     7664      24   __change_page_attr_set_clr
         3)     7640     392   kernel_map_pages
         4)     7248     256   get_page_from_freelist
         5)     6992     352   __alloc_pages_nodemask
         6)     6640       8   alloc_pages_current
         7)     6632     168   new_slab
         8)     6464       8   __slab_alloc
         9)     6456      80   __kmalloc
        10)     6376     376   vring_add_indirect
        11)     6000     144   virtqueue_add_sgs
        12)     5856     288   __virtblk_add_req
        13)     5568      96   virtio_queue_rq
        14)     5472     128   __blk_mq_run_hw_queue
        15)     5344      16   blk_mq_run_hw_queue
        16)     5328      96   blk_mq_insert_requests
        17)     5232     112   blk_mq_flush_plug_list
        18)     5120     112   blk_flush_plug_list
        19)     5008      64   io_schedule_timeout
        20)     4944     128   mempool_alloc
        21)     4816      96   bio_alloc_bioset
        22)     4720      48   get_swap_bio
        23)     4672     160   __swap_writepage
        24)     4512      32   swap_writepage
        25)     4480     320   shrink_page_list
        26)     4160     208   shrink_inactive_list
        27)     3952     304   shrink_lruvec
        28)     3648      80   shrink_zone
        29)     3568     128   do_try_to_free_pages
        30)     3440     208   try_to_free_pages
        31)     3232     352   __alloc_pages_nodemask
        32)     2880       8   alloc_pages_current
        33)     2872     200   __page_cache_alloc
        34)     2672      80   find_or_create_page
        35)     2592      80   ext4_mb_load_buddy
        36)     2512     176   ext4_mb_regular_allocator
        37)     2336     128   ext4_mb_new_blocks
        38)     2208     256   ext4_ext_map_blocks
        39)     1952     160   ext4_map_blocks
        40)     1792     384   ext4_writepages
        41)     1408      16   do_writepages
        42)     1392      96   __writeback_single_inode
        43)     1296     176   writeback_sb_inodes
        44)     1120      80   __writeback_inodes_wb
        45)     1040     160   wb_writeback
        46)      880     208   bdi_writeback_workfn
        47)      672     144   process_one_work
        48)      528     112   worker_thread
        49)      416     240   kthread
        50)      176     176   ret_from_fork
      
      [ Note: the problem is exacerbated by certain gcc versions that seem to
        generate much bigger stack frames due to apparently bad coalescing of
        temporaries and generating too many spills.  Rusty saw gcc-4.6.4 using
        35% more stack on the virtio path than 4.8.2 does, for example.
      
        Minchan not only uses such a bad gcc version (4.6.3 in his case), but
        some of the stack use is due to debugging (CONFIG_DEBUG_PAGEALLOC is
        what causes that kernel_map_pages() frame, for example). But we're
        clearly getting too close.
      
        The VM code also seems to have excessive stack frames partly for the
        same compiler reason, triggered by excessive inlining and lots of
        function arguments.
      
        We need to improve on our stack use, but in the meantime let's do this
        simple stack increase too.  Unlike most earlier reports, there is
        nothing simple that stands out as being really horribly wrong here,
        apart from the fact that the stack frames are just bigger than they
        should need to be.        - Linus ]
      Signed-off-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michael S Tsirkin <mst@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: PJ Waskiewicz <pjwaskiewicz@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6538b8ea
    • Linus Torvalds's avatar
      Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 6f6111e4
      Linus Torvalds authored
      Pull vfs dcache livelock fix from Al Viro:
       "Fixes for livelocks in shrink_dentry_list() introduced by fixes to
        shrink list corruption; the root cause was that trylock of parent's
        ->d_lock could be disrupted by d_walk() happening on other CPUs,
        resulting in shrink_dentry_list() making no progress *and* the same
        d_walk() being called again and again for as long as
        shrink_dentry_list() doesn't get past that mess.
      
        The solution is to have shrink_dentry_list() treat that trylock
        failure not as 'try to do the same thing again', but 'lock them in the
        right order'"
      
      * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        dentry_kill() doesn't need the second argument now
        dealing with the rest of shrink_dentry_list() livelock
        shrink_dentry_list(): take parent's ->d_lock earlier
        expand dentry_kill(dentry, 0) in shrink_dentry_list()
        split dentry_kill()
        lift the "already marked killed" case into shrink_dentry_list()
      6f6111e4
    • Al Viro's avatar
      dentry_kill() doesn't need the second argument now · 8cbf74da
      Al Viro authored
      it's 1 in the only remaining caller.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      8cbf74da
    • Al Viro's avatar
      dealing with the rest of shrink_dentry_list() livelock · b2b80195
      Al Viro authored
      We have the same problem with ->d_lock order in the inner loop, where
      we are dropping references to ancestors.  Same solution, basically -
      instead of using dentry_kill() we use lock_parent() (introduced in the
      previous commit) to get that lock in a safe way, recheck ->d_count
      (in case if lock_parent() has ended up dropping and retaking ->d_lock
      and somebody managed to grab a reference during that window), trylock
      the inode->i_lock and use __dentry_kill() to do the rest.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      b2b80195