1. 04 Oct, 2015 1 commit
  2. 03 Oct, 2015 4 commits
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2cf30826
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Fixes all around the map: W+X kernel mapping fix, WCHAN fixes, two
        build failure fixes for corner case configs, x32 header fix and a
        speling fix"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/headers/uapi: Fix __BITS_PER_LONG value for x32 builds
        x86/mm: Set NX on gap between __ex_table and rodata
        x86/kexec: Fix kexec crash in syscall kexec_file_load()
        x86/process: Unify 32bit and 64bit implementations of get_wchan()
        x86/process: Add proper bound checks in 64bit get_wchan()
        x86, efi, kasan: Fix build failure on !KASAN && KMEMCHECK=y kernels
        x86/hyperv: Fix the build in the !CONFIG_KEXEC_CORE case
        x86/cpufeatures: Correct spelling of the HWP_NOTIFY flag
      2cf30826
    • Linus Torvalds's avatar
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 37cc7ab1
      Linus Torvalds authored
      Pull timer fixes from Ingo Molnar:
       "An abs64() fix in the watchdog driver, and two clocksource driver
        NO_IRQ assumption fixes"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource: Fix abs() usage w/ 64bit values
        clocksource/drivers/keystone: Fix bad NO_IRQ usage
        clocksource/drivers/rockchip: Fix bad NO_IRQ usage
      37cc7ab1
    • Linus Torvalds's avatar
      Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a758379b
      Linus Torvalds authored
      Pull EFI fixes from Ingo Molnar:
       "Two EFI fixes: one for x86, one for ARM, fixing a boot crash bug that
        can trigger under newer EFI firmware"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        arm64/efi: Fix boot crash by not padding between EFI_MEMORY_RUNTIME regions
        x86/efi: Fix boot crash by mapping EFI memmap entries bottom-up at runtime, instead of top-down
      a758379b
    • Linus Torvalds's avatar
      Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 14f97d97
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Bunch of fixes all over the place, all pretty small: amdgpu, i915,
        exynos, one qxl and one vmwgfx.
      
        There is also a bunch of mst fixes, I left some cleanups in the series
        as I didn't think it was worth splitting up the tested series"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (37 commits)
        drm/dp/mst: add some defines for logical/physical ports
        drm/dp/mst: drop cancel work sync in the mstb destroy path (v2)
        drm/dp/mst: split connector registration into two parts (v2)
        drm/dp/mst: update the link_address_sent before sending the link address (v3)
        drm/dp/mst: fixup handling hotplug on port removal.
        drm/dp/mst: don't pass port into the path builder function
        drm/radeon: drop radeon_fb_helper_set_par
        drm: handle cursor_set2 in restore_fbdev_mode
        drm/exynos: Staticize local function in exynos_drm_gem.c
        drm/exynos: fimd: actually disable dp clock
        drm/exynos: dp: remove suspend/resume functions
        drm/qxl: recreate the primary surface when the bo is not primary
        drm/amdgpu: only print meaningful VM faults
        drm/amdgpu/cgs: remove import_gpu_mem
        drm/i915: Call non-locking version of drm_kms_helper_poll_enable(), v2
        drm: Add a non-locking version of drm_kms_helper_poll_enable(), v2
        drm/vmwgfx: Fix a command submission hang regression
        drm/exynos: remove unused mode_fixup() code
        drm/exynos: remove decon_mode_fixup()
        drm/exynos: remove fimd_mode_fixup()
        ...
      14f97d97
  3. 02 Oct, 2015 35 commits
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 978ab6a0
      Linus Torvalds authored
      Pull input layer fixes from Dmitry Torokhov:
       "Fixes for two recent regressions (in Synaptics PS/2 and uinput
        drivers) and some more driver fixups"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Revert "Input: synaptics - fix handling of disabling gesture mode"
        Input: psmouse - fix data race in __ps2_command
        Input: elan_i2c - add all valid ic type for i2c/smbus
        Input: zhenhua - ensure we have BITREVERSE
        Input: omap4-keypad - fix memory leak
        Input: serio - fix blocking of parport
        Input: uinput - fix crash when using ABS events
        Input: elan_i2c - expand maximum product_id form 0xFF to 0xFFFF
        Input: elan_i2c - add ic type 0x03
        Input: elan_i2c - don't require known iap version
        Input: imx6ul_tsc - fix controller name
        Input: imx6ul_tsc - use the preferred method for kzalloc()
        Input: imx6ul_tsc - check for negative return value
        Input: imx6ul_tsc - propagate the errors
        Input: walkera0701 - fix abs() calculations on 64 bit values
        Input: mms114 - remove unneded semicolons
        Input: pm8941-pwrkey - remove unneded semicolon
        Input: fix typo in MT documentation
        Input: cyapa - fix address of Gen3 devices in device tree documentation
      978ab6a0
    • John Stultz's avatar
      clocksource: Fix abs() usage w/ 64bit values · 67dfae0c
      John Stultz authored
      This patch fixes one cases where abs() was being used with 64-bit
      nanosecond values, where the result may be capped at 32-bits.
      
      This potentially could cause watchdog false negatives on 32-bit
      systems, so this patch addresses the issue by using abs64().
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1442279124-7309-2-git-send-email-john.stultz@linaro.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      67dfae0c
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 5634347d
      Linus Torvalds authored
      Pull arm64 fixes from Catalin Marinas:
      
       - Fix for transparent huge page change_protection() logic which was
         inadvertently changing a huge pmd page into a pmd table entry.
      
       - Function graph tracer panic fix caused by the return_to_handler code
         corrupting the multi-regs function return value (composite types).
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: ftrace: fix function_graph tracer panic
        arm64: Fix THP protection change logic
      5634347d
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k · b55a97e7
      Linus Torvalds authored
      Pull m68k updates from Geert Uytterhoeven:
       "Summary:
         - Fix for accidental modification of arguments of syscall functions
         - Wire up new syscalls
         - Update defconfigs"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
        m68k/defconfig: Update defconfigs for v4.3-rc1
        m68k: Define asmlinkage_protect
        m68k: Wire up membarrier
        m68k: Wire up userfaultfd
        m68k: Wire up direct socket calls
      b55a97e7
    • Marc Zyngier's avatar
      irqchip/gic-v3-its: Count additional LPIs for the aliased devices · 791c76d5
      Marc Zyngier authored
      When configuring the interrupt mapping for a new device, we
      iterate over all the possible aliases to account for their
      maximum MSI allocation. This was introduced by e8137f4f
      ("irqchip: gicv3-its: Iterate over PCI aliases to generate ITS configuration").
      
      Turns out that the code doing that is a bit braindead, and repeatedly
      accounts for the same device over and over.
      
      Fix this by counting the actual alias that is passed to us by the
      core code.
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Alex Shi <alex.shi@linaro.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: David Daney <ddaney.cavm@gmail.com>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Link: http://lkml.kernel.org/r/1443800646-8074-3-git-send-email-marc.zyngier@arm.comSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      791c76d5
    • Marc Zyngier's avatar
      irqchip/gic-v3-its: Silence warning when its_lpi_alloc_chunks gets inlined · c8415b94
      Marc Zyngier authored
      More agressive inlining in recent versions of GCC have uncovered
      a new set of warnings:
      
       drivers/irqchip/irq-gic-v3-its.c: In function its_msi_prepare:
        drivers/irqchip/irq-gic-v3-its.c:1148:26: warning: lpi_base may be used
          uninitialized in this function [-Wmaybe-uninitialized]
           dev->event_map.lpi_base = lpi_base;
                                ^
       drivers/irqchip/irq-gic-v3-its.c:1116:6: note: lpi_base was declared here
        int lpi_base;
      	      ^
       drivers/irqchip/irq-gic-v3-its.c:1149:25: warning: nr_lpis may be used
        uninitialized in this function [-Wmaybe-uninitialized]
         dev->event_map.nr_lpis = nr_lpis;
      	                         ^
       drivers/irqchip/irq-gic-v3-its.c:1117:6: note: nr_lpis was declared here
        int nr_lpis;
      	      ^
      The warning is fairly benign (there is no code path that could
      actually use uninitialized variables), but let's silence it anyway
      by zeroing the variables on the error path.
      Reported-by: default avatarAlex Shi <alex.shi@linaro.org>
      Tested-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: David Daney <ddaney.cavm@gmail.com>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Link: http://lkml.kernel.org/r/1443800646-8074-2-git-send-email-marc.zyngier@arm.comSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      c8415b94
    • Linus Torvalds's avatar
      Merge tag 'dmaengine-fix-4.3-rc4' of git://git.infradead.org/users/vkoul/slave-dma · 83dc311c
      Linus Torvalds authored
      Pull dmaengine fixes from Vinod Koul:
       "This contains fixes spread throughout the drivers, and also fixes one
        more instance of privatecnt in dmaengine.
      
        Driver fixes summary:
         - bunch of pxa_dma fixes for reuse of descriptor issue, residue and
           no-requestor
         - odd fixes in xgene, idma, sun4i and zxdma
         - at_xdmac fixes for cleaning descriptor and block addr mode"
      
      * tag 'dmaengine-fix-4.3-rc4' of git://git.infradead.org/users/vkoul/slave-dma:
        dmaengine: pxa_dma: fix residue corner case
        dmaengine: pxa_dma: fix the no-requestor case
        dmaengine: zxdma: Fix off-by-one for testing valid pchan request
        dmaengine: at_xdmac: clean used descriptor
        dmaengine: at_xdmac: change block increment addressing mode
        dmaengine: dw: properly read DWC_PARAMS register
        dmaengine: xgene-dma: Fix overwritting DMA tx ring
        dmaengine: fix balance of privatecnt
        dmaengine: sun4i: fix unsafe list iteration
        dmaengine: idma64: improve residue estimation
        dmaengine: xgene-dma: fix handling xgene_dma_get_ring_size result
        dmaengine: pxa_dma: fix initial list move
      83dc311c
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 27728bf0
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "Another week, another round of fixes.
      
        These have been brewing for a bit and in various iterations, but I
        feel pretty comfortable about the quality of them.  They fix real
        issues.  The pull request is mostly blk-mq related, and the only one
        not fixing a real bug, is the tag iterator abstraction from Christoph.
        But it's pretty trivial, and we'll need it for another fix soon.
      
        Apart from the blk-mq fixes, there's an NVMe affinity fix from Keith,
        and a single fix for xen-blkback from Roger fixing failure to free
        requests on disconnect"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        blk-mq: factor out a helper to iterate all tags for a request_queue
        blk-mq: fix racy updates of rq->errors
        blk-mq: fix deadlock when reading cpu_list
        blk-mq: avoid inserting requests before establishing new mapping
        blk-mq: fix q->mq_usage_counter access race
        blk-mq: Fix use after of free q->mq_map
        blk-mq: fix sysfs registration/unregistration race
        blk-mq: avoid setting hctx->tags->cpumask before allocation
        NVMe: Set affinity after allocating request queues
        xen/blkback: free requests on disconnection
      27728bf0
    • Dmitry Torokhov's avatar
      Revert "Input: synaptics - fix handling of disabling gesture mode" · 62d78461
      Dmitry Torokhov authored
      This reverts commit e51e3849: we
      actually do want the device to work in extended W mode, as this is the
      mode that allows us receiving multiple contact information.
      
      Cc: stable@vger.kernel.org
      62d78461
    • Linus Torvalds's avatar
      Merge tag 'mmc-v4.3-rc3' of git://git.linaro.org/people/ulf.hansson/mmc · 36f8dafe
      Linus Torvalds authored
      Pull MMC fixes from Ulf Hansson:
       "Here are some mmc fixes intended for v4.3 rc4:
      
        MMC core:
         - Allow users of mmc_of_parse() to succeed when CONFIG_GPIOLIB is
           unset
         - Prevent infinite loop of re-tuning for CRC-errors for CMD19 and
           CMD21
      
         MMC host:
         - pxamci: Fix issues with card detect
         - sunxi: Fix clk-delay settings"
      
      * tag 'mmc-v4.3-rc3' of git://git.linaro.org/people/ulf.hansson/mmc:
        mmc: core: fix dead loop of mmc_retune
        mmc: pxamci: fix card detect with slot-gpio API
        mmc: sunxi: Fix clk-delay settings
        mmc: core: Don't return an error for CD/WP GPIOs when GPIOLIB is unset
      36f8dafe
    • Linus Torvalds's avatar
      Merge git://git.infradead.org/intel-iommu · 8c25ab8b
      Linus Torvalds authored
      Pull IOVA fixes from David Woodhouse:
       "The main fix here is the first one, fixing the over-allocation of
         size-aligned requests.  The other patches simply make the existing
        IOVA code available to users other than the Intel VT-d driver, with no
        functional change.
      
        I concede the latter really *should* have been submitted during the
        merge window, but since it's basically risk-free and people are
        waiting to build on top of it and it's my fault I didn't get it in, I
        (and they) would be grateful if you'd take it"
      
      * git://git.infradead.org/intel-iommu:
        iommu: Make the iova library a module
        iommu: iova: Export symbols
        iommu: iova: Move iova cache management to the iova library
        iommu/iova: Avoid over-allocating when size-aligned
      8c25ab8b
    • Li Bin's avatar
      arm64: ftrace: fix function_graph tracer panic · ee556d00
      Li Bin authored
      When function graph tracer is enabled, the following operation
      will trigger panic:
      
      mount -t debugfs nodev /sys/kernel
      echo next_tgid > /sys/kernel/tracing/set_ftrace_filter
      echo function_graph > /sys/kernel/tracing/current_tracer
      ls /proc/
      
      ------------[ cut here ]------------
      [  198.501417] Unable to handle kernel paging request at virtual address cb88537fdc8ba316
      [  198.506126] pgd = ffffffc008f79000
      [  198.509363] [cb88537fdc8ba316] *pgd=00000000488c6003, *pud=00000000488c6003, *pmd=0000000000000000
      [  198.517726] Internal error: Oops: 94000005 [#1] SMP
      [  198.518798] Modules linked in:
      [  198.520582] CPU: 1 PID: 1388 Comm: ls Tainted: G
      [  198.521800] Hardware name: linux,dummy-virt (DT)
      [  198.522852] task: ffffffc0fa9e8000 ti: ffffffc0f9ab0000 task.ti: ffffffc0f9ab0000
      [  198.524306] PC is at next_tgid+0x30/0x100
      [  198.525205] LR is at return_to_handler+0x0/0x20
      [  198.526090] pc : [<ffffffc0002a1070>] lr : [<ffffffc0000907c0>] pstate: 60000145
      [  198.527392] sp : ffffffc0f9ab3d40
      [  198.528084] x29: ffffffc0f9ab3d40 x28: ffffffc0f9ab0000
      [  198.529406] x27: ffffffc000d6a000 x26: ffffffc000b786e8
      [  198.530659] x25: ffffffc0002a1900 x24: ffffffc0faf16c00
      [  198.531942] x23: ffffffc0f9ab3ea0 x22: 0000000000000002
      [  198.533202] x21: ffffffc000d85050 x20: 0000000000000002
      [  198.534446] x19: 0000000000000002 x18: 0000000000000000
      [  198.535719] x17: 000000000049fa08 x16: ffffffc000242efc
      [  198.537030] x15: 0000007fa472b54c x14: ffffffffff000000
      [  198.538347] x13: ffffffc0fada84a0 x12: 0000000000000001
      [  198.539634] x11: ffffffc0f9ab3d70 x10: ffffffc0f9ab3d70
      [  198.540915] x9 : ffffffc0000907c0 x8 : ffffffc0f9ab3d40
      [  198.542215] x7 : 0000002e330f08f0 x6 : 0000000000000015
      [  198.543508] x5 : 0000000000000f08 x4 : ffffffc0f9835ec0
      [  198.544792] x3 : cb88537fdc8ba316 x2 : cb88537fdc8ba306
      [  198.546108] x1 : 0000000000000002 x0 : ffffffc000d85050
      [  198.547432]
      [  198.547920] Process ls (pid: 1388, stack limit = 0xffffffc0f9ab0020)
      [  198.549170] Stack: (0xffffffc0f9ab3d40 to 0xffffffc0f9ab4000)
      [  198.582568] Call trace:
      [  198.583313] [<ffffffc0002a1070>] next_tgid+0x30/0x100
      [  198.584359] [<ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
      [  198.585503] [<ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
      [  198.586574] [<ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
      [  198.587660] [<ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
      [  198.588896] Code: aa0003f5 2a0103f4 b4000102 91004043 (885f7c60)
      [  198.591092] ---[ end trace 6a346f8f20949ac8 ]---
      
      This is because when using function graph tracer, if the traced
      function return value is in multi regs ([x0-x7]), return_to_handler
      may corrupt them. So in return_to_handler, the parameter regs should
      be protected properly.
      
      Cc: <stable@vger.kernel.org> # 3.18+
      Signed-off-by: default avatarLi Bin <huawei.libin@huawei.com>
      Acked-by: default avatarAKASHI Takahiro <takahiro.akashi@linaro.org>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      ee556d00
    • Ben Hutchings's avatar
      x86/headers/uapi: Fix __BITS_PER_LONG value for x32 builds · f4b4aae1
      Ben Hutchings authored
      On x32, gcc predefines __x86_64__ but long is only 32-bit.  Use
      __ILP32__ to distinguish x32.
      
      Fixes this compiler error in perf:
      
      	tools/include/asm-generic/bitops/__ffs.h: In function '__ffs':
      	tools/include/asm-generic/bitops/__ffs.h:19:8: error: right shift count >= width of type [-Werror=shift-count-overflow]
      	  word >>= 32;
      	       ^
      
      This isn't sufficient to build perf for x32, though.
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1443660043.2730.15.camel@decadent.org.ukSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f4b4aae1
    • Stephen Smalley's avatar
      x86/mm: Set NX on gap between __ex_table and rodata · ab76f7b4
      Stephen Smalley authored
      Unused space between the end of __ex_table and the start of
      rodata can be left W+x in the kernel page tables.  Extend the
      setting of the NX bit to cover this gap by starting from
      text_end rather than rodata_start.
      
        Before:
        ---[ High Kernel Mapping ]---
        0xffffffff80000000-0xffffffff81000000          16M                               pmd
        0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
        0xffffffff81600000-0xffffffff81754000        1360K     ro                 GLB x  pte
        0xffffffff81754000-0xffffffff81800000         688K     RW                 GLB x  pte
        0xffffffff81800000-0xffffffff81a00000           2M     ro         PSE     GLB NX pmd
        0xffffffff81a00000-0xffffffff81b3b000        1260K     ro                 GLB NX pte
        0xffffffff81b3b000-0xffffffff82000000        4884K     RW                 GLB NX pte
        0xffffffff82000000-0xffffffff82200000           2M     RW         PSE     GLB NX pmd
        0xffffffff82200000-0xffffffffa0000000         478M                               pmd
      
        After:
        ---[ High Kernel Mapping ]---
        0xffffffff80000000-0xffffffff81000000          16M                               pmd
        0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
        0xffffffff81600000-0xffffffff81754000        1360K     ro                 GLB x  pte
        0xffffffff81754000-0xffffffff81800000         688K     RW                 GLB NX pte
        0xffffffff81800000-0xffffffff81a00000           2M     ro         PSE     GLB NX pmd
        0xffffffff81a00000-0xffffffff81b3b000        1260K     ro                 GLB NX pte
        0xffffffff81b3b000-0xffffffff82000000        4884K     RW                 GLB NX pte
        0xffffffff82000000-0xffffffff82200000           2M     RW         PSE     GLB NX pmd
        0xffffffff82200000-0xffffffffa0000000         478M                               pmd
      Signed-off-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Cc: <stable@vger.kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1443704662-3138-1-git-send-email-sds@tycho.nsa.govSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      ab76f7b4
    • Lee, Chun-Yi's avatar
      x86/kexec: Fix kexec crash in syscall kexec_file_load() · e3c41e37
      Lee, Chun-Yi authored
      The original bug is a page fault crash that sometimes happens
      on big machines when preparing ELF headers:
      
          BUG: unable to handle kernel paging request at ffffc90613fc9000
          IP: [<ffffffff8103d645>] prepare_elf64_ram_headers_callback+0x165/0x260
      
      The bug is caused by us under-counting the number of memory ranges
      and subsequently not allocating enough ELF header space for them.
      The bug is typically masked on smaller systems, because the ELF header
      allocation is rounded up to the next page.
      
      This patch modifies the code in fill_up_crash_elf_data() by using
      walk_system_ram_res() instead of walk_system_ram_range() to correctly
      count the max number of crash memory ranges. That's because the
      walk_system_ram_range() filters out small memory regions that
      reside in the same page, but walk_system_ram_res() does not.
      
      Here's how I found the bug:
      
      After tracing prepare_elf64_headers() and prepare_elf64_ram_headers_callback(),
      the code uses walk_system_ram_res() to fill-in crash memory regions information
      to the program header, so it counts those small memory regions that
      reside in a page area.
      
      But, when the kernel was using walk_system_ram_range() in
      fill_up_crash_elf_data() to count the number of crash memory regions,
      it filters out small regions.
      
      I printed those small memory regions, for example:
      
        kexec: Get nr_ram ranges. vaddr=0xffff880077592258 paddr=0x77592258, sz=0xdc0
      
      Based on the code in walk_system_ram_range(), this memory region
      will be filtered out:
      
        pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593
        end_pfn = (0x77592258 + 0xfc0 -1 + 1) >> 12 = 0x77593
        end_pfn - pfn = 0x77593 - 0x77593 = 0  <=== if (end_pfn > pfn) is FALSE
      
      So, the max_nr_ranges that's counted by the kernel doesn't include
      small memory regions - causing us to under-allocate the required space.
      That causes the page fault crash that happens in a later code path
      when preparing ELF headers.
      
      This bug is not easy to reproduce on small machines that have few
      CPUs, because the allocated page aligned ELF buffer has more free
      space to cover those small memory regions' PT_LOAD headers.
      Signed-off-by: default avatarLee, Chun-Yi <jlee@suse.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: kexec@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1443531537-29436-1-git-send-email-jlee@suse.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e3c41e37
    • Dave Airlie's avatar
      drm/dp/mst: add some defines for logical/physical ports · ccf03d69
      Dave Airlie authored
      This just removes the magic number.
      Acked-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      ccf03d69
    • Dave Airlie's avatar
      drm/dp/mst: drop cancel work sync in the mstb destroy path (v2) · 274d8352
      Dave Airlie authored
      Since 9eb1e57f
      drm/dp/mst: make sure mst_primary mstb is valid in work function
      
      we validate the mstb structs in the work function, and doing
      that takes a reference. So we should never get here with the
      work function running using the mstb device, only if the work
      function hasn't run yet or is running for another mstb.
      
      So we don't need to sync the work here, this was causing
      lockdep spew as below.
      
      [  +0.000160] =============================================
      [  +0.000001] [ INFO: possible recursive locking detected ]
      [  +0.000002] 3.10.0-320.el7.rhel72.stable.backport.3.x86_64.debug #1 Tainted: G        W      ------------
      [  +0.000001] ---------------------------------------------
      [  +0.000001] kworker/4:2/1262 is trying to acquire lock:
      [  +0.000001]  ((&mgr->work)){+.+.+.}, at: [<ffffffff810b29a5>] flush_work+0x5/0x2e0
      [  +0.000007]
      but task is already holding lock:
      [  +0.000001]  ((&mgr->work)){+.+.+.}, at: [<ffffffff810b57e4>] process_one_work+0x1b4/0x710
      [  +0.000004]
      other info that might help us debug this:
      [  +0.000001]  Possible unsafe locking scenario:
      
      [  +0.000002]        CPU0
      [  +0.000000]        ----
      [  +0.000001]   lock((&mgr->work));
      [  +0.000002]   lock((&mgr->work));
      [  +0.000001]
       *** DEADLOCK ***
      
      [  +0.000001]  May be due to missing lock nesting notation
      
      [  +0.000002] 2 locks held by kworker/4:2/1262:
      [  +0.000001]  #0:  (events_long){.+.+.+}, at: [<ffffffff810b57e4>] process_one_work+0x1b4/0x710
      [  +0.000004]  #1:  ((&mgr->work)){+.+.+.}, at: [<ffffffff810b57e4>] process_one_work+0x1b4/0x710
      [  +0.000003]
      stack backtrace:
      [  +0.000003] CPU: 4 PID: 1262 Comm: kworker/4:2 Tainted: G        W      ------------   3.10.0-320.el7.rhel72.stable.backport.3.x86_64.debug #1
      [  +0.000001] Hardware name: LENOVO 20EGS0R600/20EGS0R600, BIOS GNET71WW (2.19 ) 02/05/2015
      [  +0.000008] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
      [  +0.000001]  ffffffff82c26c90 00000000a527b914 ffff88046399bae8 ffffffff816fe04d
      [  +0.000004]  ffff88046399bb58 ffffffff8110f47f ffff880461438000 0001009b840fc003
      [  +0.000002]  ffff880461438a98 0000000000000000 0000000804dc26e1 ffffffff824a2c00
      [  +0.000003] Call Trace:
      [  +0.000004]  [<ffffffff816fe04d>] dump_stack+0x19/0x1b
      [  +0.000004]  [<ffffffff8110f47f>] __lock_acquire+0x115f/0x1250
      [  +0.000002]  [<ffffffff8110fd49>] lock_acquire+0x99/0x1e0
      [  +0.000002]  [<ffffffff810b29a5>] ? flush_work+0x5/0x2e0
      [  +0.000002]  [<ffffffff810b29ee>] flush_work+0x4e/0x2e0
      [  +0.000002]  [<ffffffff810b29a5>] ? flush_work+0x5/0x2e0
      [  +0.000004]  [<ffffffff81025905>] ? native_sched_clock+0x35/0x80
      [  +0.000002]  [<ffffffff81025959>] ? sched_clock+0x9/0x10
      [  +0.000002]  [<ffffffff810da1f5>] ? local_clock+0x25/0x30
      [  +0.000002]  [<ffffffff8110dca9>] ? mark_held_locks+0xb9/0x140
      [  +0.000003]  [<ffffffff810b4ed5>] ? __cancel_work_timer+0x95/0x160
      [  +0.000002]  [<ffffffff810b4ee8>] __cancel_work_timer+0xa8/0x160
      [  +0.000002]  [<ffffffff810b4fb0>] cancel_work_sync+0x10/0x20
      [  +0.000007]  [<ffffffffa0160d17>] drm_dp_destroy_mst_branch_device+0x27/0x120 [drm_kms_helper]
      [  +0.000006]  [<ffffffffa0163968>] drm_dp_mst_link_probe_work+0x78/0xa0 [drm_kms_helper]
      [  +0.000002]  [<ffffffff810b5850>] process_one_work+0x220/0x710
      [  +0.000002]  [<ffffffff810b57e4>] ? process_one_work+0x1b4/0x710
      [  +0.000005]  [<ffffffff810b5e5b>] worker_thread+0x11b/0x3a0
      [  +0.000003]  [<ffffffff810b5d40>] ? process_one_work+0x710/0x710
      [  +0.000002]  [<ffffffff810beced>] kthread+0xed/0x100
      [  +0.000003]  [<ffffffff810bec00>] ? insert_kthread_work+0x80/0x80
      [  +0.000003]  [<ffffffff817121d8>] ret_from_fork+0x58/0x90
      
      v2: add flush_work.
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      274d8352
    • Dave Airlie's avatar
      drm/dp/mst: split connector registration into two parts (v2) · d9515c5e
      Dave Airlie authored
      In order to cache the EDID properly for tiled displays, we
      need to retrieve it before we register the connector with
      userspace, otherwise userspace can call get resources
      and try and get the edid before we've even cached it.
      
      This fixes some problems when hotplugging mst monitors,
      with X/mutter running. As mutter seems to get 0 modes
      for one of the monitors in the tile.
      
      v2: fix warning in radeon
      handle tile setting in cached path rather than
      get edid path.
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      d9515c5e
    • Dave Airlie's avatar
      drm/dp/mst: update the link_address_sent before sending the link address (v3) · 68d8c9fc
      Dave Airlie authored
      Update the state before sending the msg to close it.
      
      v2: reset value if return indicates we haven't send the msg.
      v3: just clean the code up.
      Pointed out by Adam J Richter on
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91481Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      68d8c9fc
    • Dave Airlie's avatar
      drm/dp/mst: fixup handling hotplug on port removal. · df4839fd
      Dave Airlie authored
      output ports should always have a connector, unless
      in the rare case connector allocation fails in the
      driver.
      
      In this case we only need to teardown the pdt,
      and free the struct, and there is no need to
      send a hotplug msg.
      
      In the case were we add the port to the destroy
      list we need to send a hotplug if we destroy
      any connectors, so userspace knows to reprobe
      stuff.
      
      this patch also handles port->connector allocation
      failing which should be a rare event, but makes
      the code consistent.
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      df4839fd
    • Dave Airlie's avatar
      drm/dp/mst: don't pass port into the path builder function · 1c960876
      Dave Airlie authored
      This is unnecessary and it makes it easier to see what is needed
      from port.
      
      also add blank line to make things nicer.
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      1c960876
    • Alex Deucher's avatar
      drm/radeon: drop radeon_fb_helper_set_par · 0c6dadbe
      Alex Deucher authored
      It was just a wrapper around drm_fb_helper_set_par that
      called cursor_set2 in addition.  Now that the core handles
      this, drop this radeon specific version.
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Reviewed-by: default avatarChristian König <christian.koenig@amd.com>
      Reviewed-by: default avatarMichel Dänzer <michel.daenzer@amd.com>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      0c6dadbe
    • Alex Deucher's avatar
      drm: handle cursor_set2 in restore_fbdev_mode · 03f9abb2
      Alex Deucher authored
      If a driver uses the cursor_set2 crtc callback rather than
      cursor_set, use that.  This fixes the fbdev helper for drivers
      that use cursor_set2.
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Reviewed-by: default avatarChristian König <christian.koenig@amd.com>
      Reviewed-by: default avatarMichel Dänzer <michel.daenzer@amd.com>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      03f9abb2
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · bde17b90
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "12 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        dmapool: fix overflow condition in pool_find_page()
        thermal: avoid division by zero in power allocator
        memcg: remove pcp_counter_lock
        kprobes: use _do_fork() in samples to make them work again
        drivers/input/joystick/Kconfig: zhenhua.c needs BITREVERSE
        memcg: make mem_cgroup_read_stat() unsigned
        memcg: fix dirty page migration
        dax: fix NULL pointer in __dax_pmd_fault()
        mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault
        mm/slab: fix unexpected index mapping result of kmalloc_size(INDEX_NODE+1)
        userfaultfd: remove kernel header include from uapi header
        arch/x86/include/asm/efi.h: fix build failure
      bde17b90
    • Linus Torvalds's avatar
      Merge tag 'pm+acpi-4.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 1bca1000
      Linus Torvalds authored
      Pull power management and ACPI fixes from Rafael Wysocki:
       "These are fixes mostly, for a few changes made in this cycle (the
        intel_idle driver, the OPP library, the ACPI EC driver, turbostat) and
        for some issues that have just been discovered (ACPI PCI IRQ
        management, PCI power management documentation, turbostat), with a
        couple of cleanups on top of them.
      
        Specifics:
      
         - intel_idle driver fixup for the recently added Skylake chips
           support (Len Brown).
      
         - Operating Performance Points (OPP) library fix related to the
           recently added support for new DT bindings and a fix for a typo in
           a comment (Viresh Kumar, Stephen Boyd).
      
         - ACPI EC driver fix for a recently introduced memory leak in an
           error code path (Lv Zheng).
      
         - ACPI PCI IRQ management fix for the issue where an ISA IRQ is
           shared with a PCI device which requires it to be configured in a
           different way and may cause an interrupt storm to happen as a
           result with an extra ACPI SCI IRQ handling simplification on top of
           it (Jiang Liu).
      
         - Update of the PCI power management documentation that became
           outdated and started to actively confuse the readers to make it
           actually reflect the code (Rafael J Wysocki).
      
         - turbostat fixes including an IVB Xeon regression fix (related to
           the --debug command line option), Skylake adjustment for the TSC
           running at a frequency that doesn't match the base one exactly, and
           a Knights Landing quirk to account for the fact that it only
           updates APERF and MPERF every 1024 clock cycles plus bumping up the
           turbostat version number (Len Brown, Hubert Chrzaniuk)"
      
      * tag 'pm+acpi-4.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        tools/power turbosat: update version number
        tools/power turbostat: SKL: Adjust for TSC difference from base frequency
        tools/power turbostat: KNL workaround for %Busy and Avg_MHz
        tools/power turbostat: IVB Xeon: fix --debug regression
        ACPI / PCI: Remove duplicated penalty on SCI IRQ
        ACPI, PCI, irq: Do not share PCI IRQ with ISA IRQ
        ACPI / EC: Fix a memory leak issue in acpi_ec_query()
        PM / OPP: Fix typo modifcation -> modification
        PCI / PM: Update runtime PM documentation for PCI devices
        PM / OPP: of_property_count_u32_elems() can return errors
        intel_idle: Skylake Client Support - updated
      1bca1000
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 3deaa4f5
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
      1) Fix regression in SKB partial checksum handling, from Pravin B
         Shalar.
      
      2) Fix VLAN inside of VXLAN handling in i40e driver, from Jesse
         Brandeburg.
      
      3) Cure softlockups during accept() in SCTP, from Karl Heiss.
      
      4) MSG_PEEK should return multiple SKBs worth of data in AF_UNIX, from
         Aaron Conole.
      
      5) IPV6 erroneously ignores output interface specifier in lookup key for
         route lookups, fix from David Ahern.
      
      6) In Marvell DSA driver, forward unknown frames to CPU port, from
         Andrew Lunn.
      
      7) Mission flow flag initializations in some code paths, from David
         Ahern.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        net: Initialize flow flags in input path
        net: dsa: fix preparation of a port STP update
        testptp: Silence compiler warnings on ppc64
        net/mlx4: Handle return codes in mlx4_qp_attach_common
        dsa: mv88e6xxx: Enable forwarding for unknown to the CPU port
        skbuff: Fix skb checksum partial check.
        net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set
        net sysfs: Print link speed as signed integer
        bna: fix error handling
        af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag
        af_unix: Convert the unix_sk macro to an inline function for type safety
        net: sctp: Don't use 64 kilobyte lookup table for four elements
        l2tp: protect tunnel->del_work by ref_count
        net/ibm/emac: bump version numbers for correct work with ethtool
        sctp: Prevent soft lockup when sctp_accept() is called during a timeout event
        sctp: Whitespace fix
        i40e/i40evf: check for stopped admin queue
        i40e: fix VLAN inside VXLAN
        r8169: fix handling rtl_readphy result
        net: hisilicon: fix handling platform_get_irq result
      3deaa4f5
    • Robin Murphy's avatar
      dmapool: fix overflow condition in pool_find_page() · 676bd991
      Robin Murphy authored
      If a DMA pool lies at the very top of the dma_addr_t range (as may
      happen with an IOMMU involved), the calculated end address of the pool
      wraps around to zero, and page lookup always fails.
      
      Tweak the relevant calculation to be overflow-proof.
      Signed-off-by: default avatarRobin Murphy <robin.murphy@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Sakari Ailus <sakari.ailus@iki.fi>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      676bd991
    • Andrea Arcangeli's avatar
      thermal: avoid division by zero in power allocator · 44241628
      Andrea Arcangeli authored
      During boot I get a div by zero Oops regression starting in v4.3-rc3.
      Signed-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: default avatarJavi Merino <javi.merino@arm.com>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: Eduardo Valentin <edubezval@gmail.com>
      Cc: Daniel Kurtz <djkurtz@chromium.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      44241628
    • Greg Thelen's avatar
      memcg: remove pcp_counter_lock · ef510194
      Greg Thelen authored
      Commit 733a572e ("memcg: make mem_cgroup_read_{stat|event}() iterate
      possible cpus instead of online") removed the last use of the per memcg
      pcp_counter_lock but forgot to remove the variable.
      
      Kill the vestigial variable.
      Signed-off-by: default avatarGreg Thelen <gthelen@google.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ef510194
    • Petr Mladek's avatar
      kprobes: use _do_fork() in samples to make them work again · 54aea454
      Petr Mladek authored
      Commit 3033f14a ("clone: support passing tls argument via C rather
      than pt_regs magic") introduced _do_fork() that allowed to pass @tls
      parameter.
      
      The old do_fork() is defined only for architectures that are not ready
      to use this way and do not define HAVE_COPY_THREAD_TLS.
      
      Let's use _do_fork() in the kprobe examples to make them work again on
      all architectures.
      Signed-off-by: default avatarPetr Mladek <pmladek@suse.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thiago Macieira <thiago.macieira@intel.com>
      Cc: Jiri Kosina <jkosina@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      54aea454
    • Andrew Morton's avatar
      drivers/input/joystick/Kconfig: zhenhua.c needs BITREVERSE · 09a59a9d
      Andrew Morton authored
      It uses bitrev8(), so it must ensure that lib/bitrev.o gets included in
      vmlinux.
      
      Cc: Fengguang Wu <fengguang.wu@gmail.com>
      Cc: yalin wang <yalin.wang2010@gmail.com>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      09a59a9d
    • Greg Thelen's avatar
      memcg: make mem_cgroup_read_stat() unsigned · 484ebb3b
      Greg Thelen authored
      mem_cgroup_read_stat() returns a page count by summing per cpu page
      counters.  The summing is racy wrt.  updates, so a transient negative
      sum is possible.  Callers don't want negative values:
      
       - mem_cgroup_wb_stats() doesn't want negative nr_dirty or nr_writeback.
         This could confuse dirty throttling.
      
       - oom reports and memory.stat shouldn't show confusing negative usage.
      
       - tree_usage() already avoids negatives.
      
      Avoid returning negative page counts from mem_cgroup_read_stat() and
      convert it to unsigned.
      
      [akpm@linux-foundation.org: fix old typo while we're in there]
      Signed-off-by: default avatarGreg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>	[4.2+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      484ebb3b
    • Greg Thelen's avatar
      memcg: fix dirty page migration · 0610c25d
      Greg Thelen authored
      The problem starts with a file backed dirty page which is charged to a
      memcg.  Then page migration is used to move oldpage to newpage.
      
      Migration:
       - copies the oldpage's data to newpage
       - clears oldpage.PG_dirty
       - sets newpage.PG_dirty
       - uncharges oldpage from memcg
       - charges newpage to memcg
      
      Clearing oldpage.PG_dirty decrements the charged memcg's dirty page
      count.
      
      However, because newpage is not yet charged, setting newpage.PG_dirty
      does not increment the memcg's dirty page count.  After migration
      completes newpage.PG_dirty is eventually cleared, often in
      account_page_cleaned().  At this time newpage is charged to a memcg so
      the memcg's dirty page count is decremented which causes underflow
      because the count was not previously incremented by migration.  This
      underflow causes balance_dirty_pages() to see a very large unsigned
      number of dirty memcg pages which leads to aggressive throttling of
      buffered writes by processes in non root memcg.
      
      This issue:
       - can harm performance of non root memcg buffered writes.
       - can report too small (even negative) values in
         memory.stat[(total_)dirty] counters of all memcg, including the root.
      
      To avoid polluting migrate.c with #ifdef CONFIG_MEMCG checks, introduce
      page_memcg() and set_page_memcg() helpers.
      
      Test:
          0) setup and enter limited memcg
          mkdir /sys/fs/cgroup/test
          echo 1G > /sys/fs/cgroup/test/memory.limit_in_bytes
          echo $$ > /sys/fs/cgroup/test/cgroup.procs
      
          1) buffered writes baseline
          dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
          sync
          grep ^dirty /sys/fs/cgroup/test/memory.stat
      
          2) buffered writes with compaction antagonist to induce migration
          yes 1 > /proc/sys/vm/compact_memory &
          rm -rf /data/tmp/foo
          dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
          kill %
          sync
          grep ^dirty /sys/fs/cgroup/test/memory.stat
      
          3) buffered writes without antagonist, should match baseline
          rm -rf /data/tmp/foo
          dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
          sync
          grep ^dirty /sys/fs/cgroup/test/memory.stat
      
                             (speed, dirty residue)
                   unpatched                       patched
          1) 841 MB/s 0 dirty pages          886 MB/s 0 dirty pages
          2) 611 MB/s -33427456 dirty pages  793 MB/s 0 dirty pages
          3) 114 MB/s -33427456 dirty pages  891 MB/s 0 dirty pages
      
          Notice that unpatched baseline performance (1) fell after
          migration (3): 841 -> 114 MB/s.  In the patched kernel, post
          migration performance matches baseline.
      
      Fixes: c4843a75 ("memcg: add per cgroup dirty page accounting")
      Signed-off-by: default avatarGreg Thelen <gthelen@google.com>
      Reported-by: default avatarDave Hansen <dave.hansen@intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>	[4.2+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0610c25d
    • Ross Zwisler's avatar
      dax: fix NULL pointer in __dax_pmd_fault() · 8346c416
      Ross Zwisler authored
      Commit 46c043ed ("mm: take i_mmap_lock in unmap_mapping_range() for
      DAX") moved some code in __dax_pmd_fault() that was responsible for
      zeroing newly allocated PMD pages.  The new location didn't properly set
      up 'kaddr', so when run this code resulted in a NULL pointer BUG.
      
      Fix this by getting the correct 'kaddr' via bdev_direct_access().
      Signed-off-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarDan Williams <dan.j.williams@intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8346c416
    • Mel Gorman's avatar
      mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault · 2f84a899
      Mel Gorman authored
      SunDong reported the following on
      
        https://bugzilla.kernel.org/show_bug.cgi?id=103841
      
      	I think I find a linux bug, I have the test cases is constructed. I
      	can stable recurring problems in fedora22(4.0.4) kernel version,
      	arch for x86_64.  I construct transparent huge page, when the parent
      	and child process with MAP_SHARE, MAP_PRIVATE way to access the same
      	huge page area, it has the opportunity to lead to huge page copy on
      	write failure, and then it will munmap the child corresponding mmap
      	area, but then the child mmap area with VM_MAYSHARE attributes, child
      	process munmap this area can trigger VM_BUG_ON in set_vma_resv_flags
      	functions (vma - > vm_flags & VM_MAYSHARE).
      
      There were a number of problems with the report (e.g.  it's hugetlbfs that
      triggers this, not transparent huge pages) but it was fundamentally
      correct in that a VM_BUG_ON in set_vma_resv_flags() can be triggered that
      looks like this
      
      	 vma ffff8804651fd0d0 start 00007fc474e00000 end 00007fc475e00000
      	 next ffff8804651fd018 prev ffff8804651fd188 mm ffff88046b1b1800
      	 prot 8000000000000027 anon_vma           (null) vm_ops ffffffff8182a7a0
      	 pgoff 0 file ffff88106bdb9800 private_data           (null)
      	 flags: 0x84400fb(read|write|shared|mayread|maywrite|mayexec|mayshare|dontexpand|hugetlb)
      	 ------------
      	 kernel BUG at mm/hugetlb.c:462!
      	 SMP
      	 Modules linked in: xt_pkttype xt_LOG xt_limit [..]
      	 CPU: 38 PID: 26839 Comm: map Not tainted 4.0.4-default #1
      	 Hardware name: Dell Inc. PowerEdge R810/0TT6JF, BIOS 2.7.4 04/26/2012
      	 set_vma_resv_flags+0x2d/0x30
      
      The VM_BUG_ON is correct because private and shared mappings have
      different reservation accounting but the warning clearly shows that the
      VMA is shared.
      
      When a private COW fails to allocate a new page then only the process
      that created the VMA gets the page -- all the children unmap the page.
      If the children access that data in the future then they get killed.
      
      The problem is that the same file is mapped shared and private.  During
      the COW, the allocation fails, the VMAs are traversed to unmap the other
      private pages but a shared VMA is found and the bug is triggered.  This
      patch identifies such VMAs and skips them.
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Reported-by: default avatarSunDong <sund_sky@126.com>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: David Rientjes <rientjes@google.com>
      Reviewed-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2f84a899