1. 17 Nov, 2017 6 commits
    • KVM: x86: pvclock: Handle first-time write to pvclock-page contains random junk · 51c4b8bb
      Liran Alon authored
      When a guest passes KVM its pvclock-page GPA via WRMSR to
      MSR_KVM_SYSTEM_TIME / MSR_KVM_SYSTEM_TIME_NEW, KVM doesn't initialize
      the pvclock-page to any start values. It just requests a clock update,
      which will happen before entering the guest.
      
      The clock-update logic will call kvm_setup_pvclock_page() to update the
      pvclock-page with info. However, kvm_setup_pvclock_page() *wrongly*
      assumes that the version-field is initialized to an even number. This is
      wrong because at first-time write, field could be any-value.
      
      Fix simply makes sure that if first-time version-field is odd, increment
      it once more to make it even and only then start standard logic.
      This follows same logic as done in other pvclock shared-pages (See
      kvm_write_wall_clock() and record_steal_time()).
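The even/odd version handling described above can be sketched as follows. This is only an illustration under stated assumptions: `demo_pvclock_page` and `demo_publish_time` are hypothetical names, the struct models just two fields, and the memory barriers a real implementation needs are only marked with comments.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the guest-visible pvclock page; only the
 * fields needed for the illustration are modelled. */
struct demo_pvclock_page {
    uint32_t version;      /* odd: update in progress, even: stable */
    uint64_t system_time;  /* payload published to the guest */
};

/* Publish new_time using the even/odd version protocol. On the first
 * write the page may hold junk, so an odd version is bumped to an even
 * value before the standard sequence starts. */
static void demo_publish_time(struct demo_pvclock_page *page, uint64_t new_time)
{
    if (page->version & 1)
        page->version++;        /* junk was odd: make it even first */

    page->version++;            /* now odd: readers must retry */
    /* a real implementation needs a write barrier here */
    page->system_time = new_time;
    /* ... and another write barrier here */
    page->version++;            /* even again: update is complete */

    assert((page->version & 1) == 0);
}
```

Without the initial parity check, a page that starts with an odd version would end the sequence on an odd value, and a reader following the protocol would keep retrying.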
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • kvm: vmx: Allow disabling virtual NMI support · d02fcf50
      Paolo Bonzini authored
      To simplify testing of these rarely used code paths, add a module parameter
      that turns it on.  One eventinj.flat test (NMI after iret) fails when
      loading kvm_intel with vnmi=0.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • kvm: vmx: Reinstate support for CPUs without virtual NMI · 8a1b4392
      Paolo Bonzini authored
      This is more or less a revert of commit 2c82878b ("KVM: VMX: require
      virtual NMI support", 2017-03-27); it turns out that Core 2 Duo machines
      only had virtual NMIs in some SKUs.
      
      The revert is not trivial because in the meantime there have been several
      fixes to nested NMI injection.  Therefore, the entire vNMI state is moved
      to struct loaded_vmcs.
      
      Another change compared to before the patch is a simplification here:
      
             if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked &&
                 !(is_guest_mode(vcpu) && nested_cpu_has_virtual_nmis(
                                             get_vmcs12(vcpu))))) {
      
      The final condition here is always true (because nested_cpu_has_virtual_nmis
      is always false) and is removed.
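The simplification can be checked with a small truth-table sketch. Plain booleans stand in for the kernel helpers here, and all `demo_*` names are hypothetical; the only premise taken from the message is that the nested virtual-NMI check is always false on this path.

```c
#include <stdbool.h>

/* The condition as quoted, with booleans replacing the kernel helpers. */
static bool demo_full_condition(bool has_vnmi, bool soft_blocked,
                                bool guest_mode, bool nested_has_vnmi)
{
    return !has_vnmi && soft_blocked && !(guest_mode && nested_has_vnmi);
}

/* The same condition with the final term dropped. */
static bool demo_simplified_condition(bool has_vnmi, bool soft_blocked)
{
    return !has_vnmi && soft_blocked;
}

/* Returns true if both forms agree for every input once
 * nested_has_vnmi is fixed to false. */
static bool demo_conditions_agree(void)
{
    for (int bits = 0; bits < 8; bits++) {
        bool has_vnmi = (bits & 1) != 0;
        bool soft_blocked = (bits & 2) != 0;
        bool guest_mode = (bits & 4) != 0;

        if (demo_full_condition(has_vnmi, soft_blocked, guest_mode, false) !=
            demo_simplified_condition(has_vnmi, soft_blocked))
            return false;
    }
    return true;
}
```

The two forms only differ when the nested check can be true, which the message rules out.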
      
      Fixes: 2c82878b
      Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1490803
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: SVM: obey guest PAT · 15038e14
      Paolo Bonzini authored
      For many years some users of assigned devices have reported worse
      performance on AMD processors with NPT than on AMD without NPT,
      Intel or bare metal.
      
      The reason turned out to be that SVM is discarding the guest PAT
      setting and uses the default (PA0=PA4=WB, PA1=PA5=WT, PA2=PA6=UC-,
      PA3=UC).  The guest might be using a different setting, and
      especially might want write combining but isn't getting it
      (instead getting slow UC or UC- accesses).
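To make the default layout above concrete, here is a sketch that packs the eight PAT entries into a 64-bit value, one byte per entry with the memory type in the low three bits. The type encodings (UC=0, WC=1, WT=4, WP=5, WB=6, UC-=7) are the architectural ones; the `demo_*` helpers are hypothetical and not taken from the patch.

```c
#include <stdint.h>

/* Architectural PAT memory-type encodings. */
enum demo_pat_type {
    DEMO_PAT_UC       = 0,
    DEMO_PAT_WC       = 1,
    DEMO_PAT_WT       = 4,
    DEMO_PAT_WP       = 5,
    DEMO_PAT_WB       = 6,
    DEMO_PAT_UC_MINUS = 7,
};

/* Pack eight entries (PA0 first) into the 64-bit PAT layout. */
static uint64_t demo_pat_pack(const uint8_t entries[8])
{
    uint64_t pat = 0;

    for (int i = 0; i < 8; i++)
        pat |= (uint64_t)(entries[i] & 0x7) << (8 * i);
    return pat;
}

/* Extract the memory type stored in entry PA<index>. */
static uint8_t demo_pat_entry(uint64_t pat, int index)
{
    return (uint8_t)((pat >> (8 * index)) & 0x7);
}

/* The default described in the message:
 * PA0=PA4=WB, PA1=PA5=WT, PA2=PA6=UC-, PA3=PA7=UC. */
static uint64_t demo_pat_default(void)
{
    const uint8_t entries[8] = {
        DEMO_PAT_WB, DEMO_PAT_WT, DEMO_PAT_UC_MINUS, DEMO_PAT_UC,
        DEMO_PAT_WB, DEMO_PAT_WT, DEMO_PAT_UC_MINUS, DEMO_PAT_UC,
    };

    return demo_pat_pack(entries);
}
```

A guest that reprograms, say, PA1 to WC expects pages selecting that entry to be write-combined; if the default value is used instead, the same pages resolve to whatever the default holds in that slot.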
      
      Thanks a lot to geoff@hostfission.com for noticing the relation
      to the g_pat setting.  The patch has been tested also by a bunch
      of people on VFIO users forums.
      
      Fixes: 709ddebf
      Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=196409
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Tested-by: Nick Sarnie <commendsarnex@gmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • Merge tag 'kvm-arm-gicv4-for-v4.15' of... · fc3790fa
      Paolo Bonzini authored
      Merge tag 'kvm-arm-gicv4-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      GICv4 Support for KVM/ARM for v4.15
    • Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · cf9b0772
      Linus Torvalds authored
      Pull ARM SoC driver updates from Arnd Bergmann:
       "This branch contains platform-related driver updates for ARM and
        ARM64, these are the areas that bring the changes:
      
        New drivers:
      
         - driver support for Renesas R-Car V3M (R8A77970)
      
         - power management support for Amlogic GX
      
         - a new driver for the Tegra BPMP thermal sensor
      
         - a new bus driver for Technologic Systems NBUS
      
        Changes for subsystems that prefer to merge through arm-soc:
      
         - the usual updates for reset controller drivers from Philipp Zabel,
           with five added drivers for SoCs in the arc, meson, socfpga,
           uniphier and mediatek families
      
         - updates to the ARM SCPI and PSCI frameworks, from Sudeep Holla,
           Heiner Kallweit and Lorenzo Pieralisi
      
        Changes specific to some ARM-based SoC
      
         - the Freescale/NXP DPAA QBMan drivers from PowerPC can now work on
           ARM as well
      
         - several changes for power management on Broadcom SoCs
      
         - various improvements on Qualcomm, Broadcom, Amlogic, Atmel,
           Mediatek
      
         - minor cleanups for Samsung, TI OMAP SoCs"
      
      [ NOTE! This doesn't work without the previous ARM SoC device-tree pull,
        because the R8A77970 driver is missing a header file that came from
        that pull.
      
        The fact that this got merged afterwards only fixes it at this point,
        and bisection of that driver will fail if/when you walk into the
        history of that driver.           - Linus ]
      
      * tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (96 commits)
        soc: amlogic: meson-gx-pwrc-vpu: fix power-off when powered by bootloader
        bus: add driver for the Technologic Systems NBUS
        memory: omap-gpmc: Remove deprecated gpmc_update_nand_reg()
        soc: qcom: remove unused label
        soc: amlogic: gx pm domain: add PM and OF dependencies
        drivers/firmware: psci_checker: Add missing destroy_timer_on_stack()
        dt-bindings: power: add amlogic meson power domain bindings
        soc: amlogic: add Meson GX VPU Domains driver
        soc: qcom: Remote filesystem memory driver
        dt-binding: soc: qcom: Add binding for rmtfs memory
        of: reserved_mem: Accessor for acquiring reserved_mem
        of/platform: Generalize /reserved-memory handling
        soc: mediatek: pwrap: fix fatal compiler error
        soc: mediatek: pwrap: fix compiler errors
        arm64: mediatek: cleanup message for platform selection
        soc: Allow test-building of MediaTek drivers
        soc: mediatek: place Kconfig for all SoC drivers under menu
        soc: mediatek: pwrap: add support for MT7622 SoC
        soc: mediatek: pwrap: add common way for setup CS timing extenstion
        soc: mediatek: pwrap: add MediaTek MT6380 as one slave of pwrap
        ...
  2. 16 Nov, 2017 34 commits
    • Merge tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 527d1470
      Linus Torvalds authored
      Pull ARM device-tree updates from Arnd Bergmann:
       "We add device tree files for a couple of additional SoCs in various
        areas:
      
        Allwinner R40/V40 for entertainment, Broadcom Hurricane 2 for
        networking, Amlogic A113D for audio, and Renesas R-Car V3M for
        automotive.
      
        As usual, lots of new boards get added based on those and other SoCs:
      
         - Actions S500 based CubieBoard6 single-board computer
      
         - Amlogic Meson-AXG A113D based development board
         - Amlogic S912 based Khadas VIM2 single-board computer
         - Amlogic S912 based Tronsmart Vega S96 set-top-box
      
         - Allwinner H5 based NanoPi NEO Plus2 single-board computer
         - Allwinner R40 based Banana Pi M2 Ultra and Berry single-board computers
         - Allwinner A83T based TBS A711 Tablet
      
         - Broadcom Hurricane 2 based Ubiquiti UniFi Switch 8
         - Broadcom bcm47xx based Luxul XAP-1440/XAP-810/ABR-4500/XBR-4500
           wireless access points and routers
      
         - NXP i.MX51 based Zodiac Inflight Innovations RDU1 board
         - NXP i.MX53 based GE Healthcare PPD biometric monitor
         - NXP i.MX6 based Pistachio single-board computer
         - NXP i.MX6 based Vining-2000 automotive diagnostic interface
         - NXP i.MX6 based Ka-Ro TX6 Computer-on-Module in additional variants
      
         - Qualcomm MSM8974 (Snapdragon 800) based Fairphone 2 phone
         - Qualcomm MSM8974pro (Snapdragon 801) based Sony Xperia Z2 Tablet
      
         - Realtek RTD1295 based set-top-boxes MeLE V9 and PROBOX2 AVA
      
         - Renesas R-Car V3M (R8A77970) SoC and "Eagle" reference board
         - Renesas H3ULCB and M3ULCB "Kingfisher" extension infotainment boards
         - Renesas r8a7745 based iWave G22D-SODIMM SoM
      
         - Rockchip rk3288 based Amarula Vyasa single-board computer
      
         - Samsung Exynos5800 based Odroid HC1 single-board computer
      
        For existing SoC support, there was a lot of ongoing work, as usual
        most of that concentrated on the Renesas, Rockchip, OMAP, i.MX,
        Amlogic and Allwinner platforms, but others were also active.
      
        Rob Herring and many others worked on reducing the number of issues
        that the latest version of 'dtc' now warns about. Unfortunately there
        is still a lot left to do.
      
        A rework of the ARM foundation model introduced several new files for
        common variations of the model"
      
      * tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (599 commits)
        arm64: dts: uniphier: route on-board device IRQ to GPIO controller for PXs3
        dt-bindings: bus: Add documentation for the Technologic Systems NBUS
        arm64: dts: actions: s900-bubblegum-96: Add fake uart5 clock
        ARM: dts: owl-s500: Add CubieBoard6
        dt-bindings: arm: actions: Add CubieBoard6
        ARM: dts: owl-s500-guitar-bb-rev-b: Add fake uart3 clock
        ARM: dts: owl-s500: Set power domains for CPU2 and CPU3
        arm: dts: mt7623: remove unused compatible string for pio node
        arm: dts: mt7623: update usb related nodes
        arm: dts: mt7623: update crypto node
        ARM: dts: sun8i: a711: Enable USB OTG
        ARM: dts: sun8i: a711: Add regulator support
        ARM: dts: sun8i: a83t: bananapi-m3: Enable AP6212 WiFi on mmc1
        ARM: dts: sun8i: a83t: cubietruck-plus: Enable AP6330 WiFi on mmc1
        ARM: dts: sun8i: a83t: Move mmc1 pinctrl setting to dtsi file
        ARM: dts: sun8i: a83t: allwinner-h8homlet-v2: Add AXP818 regulator nodes
        ARM: dts: sun8i: a83t: bananapi-m3: Add AXP813 regulator nodes
        ARM: dts: sun8i: a83t: cubietruck-plus: Add AXP818 regulator nodes
        ARM: dts: sunxi: Add dtsi for AXP81x PMIC
        arm64: dts: allwinner: H5: Restore EMAC changes
        ...
    • Merge tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 8c609698
      Linus Torvalds authored
      Pull ARM SoC platform updates from Arnd Bergmann:
       "Most of the commits are for defconfig changes, to enable newly added
        drivers or features that people have started using. For the changed
        lines, we have mostly cleanups; the affected platforms are OMAP,
        Versatile, EP93xx, Samsung, Broadcom, i.MX, and Actions.
      
        The largest single change is the introduction of the TI "sysc" bus
        driver, with the intention of cleaning up more legacy code.
      
        Two new SoC platforms get added this time:
      
         - Allwinner R40 is a modernized version of the A20 chip, now with a
           Quad-Core ARM Cortex-A7. According to the manufacturer, it is
           intended for "Smart Hardware"
      
         - Broadcom Hurricane 2 (Aka Strataconnect BCM5334X) is a family of
           chips meant for managed gigabit ethernet switches, based around a
           Cortex-A9 CPU.
      
        Finally, we gain SMP support for two platforms: Renesas R-Car E2 and
        Amlogic Meson8/8b, which were previously added but only supported
        uniprocessor operation"
      
      * tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (118 commits)
        ARM: multi_v7_defconfig: Select RPMSG_VIRTIO as module
        ARM: multi_v7_defconfig: enable CONFIG_GPIO_UNIPHIER
        arm64: defconfig: enable CONFIG_GPIO_UNIPHIER
        ARM: meson: enable MESON_IRQ_GPIO in Kconfig for meson8b
        ARM: meson: Add SMP bringup code for Meson8 and Meson8b
        ARM: smp_scu: allow the platform code to read the SCU CPU status
        ARM: smp_scu: add a helper for powering on a specific CPU
        dt-bindings: Amlogic: Add Meson8 and Meson8b SMP related documentation
        ARM: OMAP3: Delete an unnecessary variable initialisation in omap3xxx_hwmod_init()
        ARM: OMAP3: Use common error handling code in omap3xxx_hwmod_init()
        ARM: defconfig: select the right SX150X driver
        arm64: defconfig: Enable QCOM_IOMMU
        arm64: Add ThunderX drivers to defconfig
        arm64: defconfig: Enable Tegra PCI controller
        cpufreq: imx6q: Move speed grading check to cpufreq driver
        arm64: defconfig: re-enable Qualcomm DB410c USB
        ARM: configs: stm32: Add MDMA support in STM32 defconfig
        ARM: imx: Enable cpuidle for i.MX6DL starting at 1.1
        bus: ti-sysc: Fix unbalanced pm_runtime_enable by adding remove
        bus: ti-sysc: mark PM functions as __maybe_unused
        ...
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 18c83d2c
      Linus Torvalds authored
      Pull virtio updates from Michael Tsirkin:
       "Fixes in qemu, vhost and virtio"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        fw_cfg: fix the command line module name
        vhost/vsock: fix uninitialized vhost_vsock->guest_cid
        vhost: fix end of range for access_ok
        vhost/scsi: Use safe iteration in vhost_scsi_complete_cmd_work()
        virtio_balloon: fix deadlock on OOM
    • Merge tag 'for-linus-4.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 051089a2
      Linus Torvalds authored
      Pull xen updates from Juergen Gross:
       "Xen features and fixes for v4.15-rc1
      
        Apart from several small fixes it contains the following features:
      
         - a series by Joao Martins to add vdso support of the pv clock
           interface
      
         - a series by Juergen Gross to add support for Xen pv guests to be
           able to run on 5 level paging hosts
      
         - a series by Stefano Stabellini adding the Xen pvcalls frontend
           driver using a paravirtualized socket interface"
      
      * tag 'for-linus-4.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (34 commits)
        xen/pvcalls: fix potential endless loop in pvcalls-front.c
        xen/pvcalls: Add MODULE_LICENSE()
        MAINTAINERS: xen, kvm: track pvclock-abi.h changes
        x86/xen/time: setup vcpu 0 time info page
        x86/xen/time: set pvclock flags on xen_time_init()
        x86/pvclock: add setter for pvclock_pvti_cpu0_va
        ptp_kvm: probe for kvm guest availability
        xen/privcmd: remove unused variable pageidx
        xen: select grant interface version
        xen: update arch/x86/include/asm/xen/cpuid.h
        xen: add grant interface version dependent constants to gnttab_ops
        xen: limit grant v2 interface to the v1 functionality
        xen: re-introduce support for grant v2 interface
        xen: support priv-mapping in an HVM tools domain
        xen/pvcalls: remove redundant check for irq >= 0
        xen/pvcalls: fix unsigned less than zero error check
        xen/time: Return -ENODEV from xen_get_wallclock()
        xen/pvcalls-front: mark expected switch fall-through
        xen: xenbus_probe_frontend: mark expected switch fall-throughs
        xen/time: do not decrease steal time after live migration on xen
        ...
    • Merge tag 'kvm-4.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 974aa563
      Linus Torvalds authored
      Pull KVM updates from Radim Krčmář:
       "First batch of KVM changes for 4.15
      
        Common:
         - Python 3 support in kvm_stat
         - Accounting of slabs to kmemcg
      
        ARM:
         - Optimized arch timer handling for KVM/ARM
         - Improvements to the VGIC ITS code and introduction of an ITS reset
           ioctl
         - Unification of the 32-bit fault injection logic
         - More exact external abort matching logic
      
        PPC:
         - Support for running hashed page table (HPT) MMU mode on a host that
           is using the radix MMU mode; single threaded mode on POWER 9 is
           added as a pre-requisite
         - Resolution of merge conflicts with the last second 4.14 HPT fixes
         - Fixes and cleanups
      
        s390:
         - Some initial preparation patches for exitless interrupts and crypto
         - New capability for AIS migration
         - Fixes
      
        x86:
         - Improved emulation of LAPIC timer mode changes, MCi_STATUS MSRs,
           and after-reset state
         - Refined dependencies for VMX features
         - Fixes for nested SMI injection
         - A lot of cleanups"
      
      * tag 'kvm-4.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (89 commits)
        KVM: s390: provide a capability for AIS state migration
        KVM: s390: clear_io_irq() requests are not expected for adapter interrupts
        KVM: s390: abstract conversion between isc and enum irq_types
        KVM: s390: vsie: use common code functions for pinning
        KVM: s390: SIE considerations for AP Queue virtualization
        KVM: s390: document memory ordering for kvm_s390_vcpu_wakeup
        KVM: PPC: Book3S HV: Cosmetic post-merge cleanups
        KVM: arm/arm64: fix the incompatible matching for external abort
        KVM: arm/arm64: Unify 32bit fault injection
        KVM: arm/arm64: vgic-its: Implement KVM_DEV_ARM_ITS_CTRL_RESET
        KVM: arm/arm64: Document KVM_DEV_ARM_ITS_CTRL_RESET
        KVM: arm/arm64: vgic-its: Free caches when GITS_BASER Valid bit is cleared
        KVM: arm/arm64: vgic-its: New helper functions to free the caches
        KVM: arm/arm64: vgic-its: Remove kvm_its_unmap_device
        arm/arm64: KVM: Load the timer state when enabling the timer
        KVM: arm/arm64: Rework kvm_timer_should_fire
        KVM: arm/arm64: Get rid of kvm_timer_flush_hwstate
        KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit
        KVM: arm/arm64: Move phys_timer_emulate function
        KVM: arm/arm64: Use kvm_arm_timer_set/get_reg for guest register traps
        ...
    • Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm · 441692aa
      Linus Torvalds authored
      Pull ARM updates from Russell King:
      
       - add support for ELF fdpic binaries on both MMU and noMMU platforms
      
       - linker script cleanups
      
       - support for compressed .data section for XIP images
      
       - discard memblock arrays when possible
      
       - various cleanups
      
       - atomic DMA pool updates
      
       - better diagnostics of missing/corrupt device tree
      
       - export information to allow userspace kexec tool to place images more
         intelligently, so that the device tree isn't overwritten by the
         booting kernel
      
       - make early_printk more efficient on semihosted systems
      
       - noMMU cleanups
      
       - SA1111 PCMCIA update in preparation for further cleanups
      
      * 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (38 commits)
        ARM: 8719/1: NOMMU: work around maybe-uninitialized warning
        ARM: 8717/2: debug printch/printascii: translate '\n' to "\r\n" not "\n\r"
        ARM: 8713/1: NOMMU: Support MPU in XIP configuration
        ARM: 8712/1: NOMMU: Use more MPU regions to cover memory
        ARM: 8711/1: V7M: Add support for MPU to M-class
        ARM: 8710/1: Kconfig: Kill CONFIG_VECTORS_BASE
        ARM: 8709/1: NOMMU: Disallow MPU for XIP
        ARM: 8708/1: NOMMU: Rework MPU to be mostly done in C
        ARM: 8707/1: NOMMU: Update MPU accessors to use cp15 helpers
        ARM: 8706/1: NOMMU: Move out MPU setup in separate module
        ARM: 8702/1: head-common.S: Clear lr before jumping to start_kernel()
        ARM: 8705/1: early_printk: use printascii() rather than printch()
        ARM: 8703/1: debug.S: move hexbuf to a writable section
        ARM: add additional table to compressed kernel
        ARM: decompressor: fix BSS size calculation
        pcmcia: sa1111: remove special sa1111 mmio accessors
        pcmcia: sa1111: use sa1111_get_irq() to obtain IRQ resources
        ARM: better diagnostics with missing/corrupt dtb
        ARM: 8699/1: dma-mapping: Remove init_dma_coherent_pool_size()
        ARM: 8698/1: dma-mapping: Mark atomic_pool as __ro_after_init
        ...
    • Merge tag 'powerpc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 5b0e2cb0
      Linus Torvalds authored
      Pull powerpc updates from Michael Ellerman:
       "A bit of a small release, I suspect in part due to me travelling for
        KS. But my backlog of patches to review is smaller than usual, so I
        think in part folks just didn't send as much this cycle.
      
        Non-highlights:
      
         - Five fixes for the >128T address space handling, both to fix bugs
           in our implementation and to bring the semantics exactly into line
           with x86.
      
        Highlights:
      
         - Support for a new OPAL call on bare metal machines which gives us a
           true NMI (ie. is not masked by MSR[EE]=0) for debugging etc.
      
         - Support for Power9 DD2 in the CXL driver.
      
         - Improvements to machine check handling so that uncorrectable errors
           can be reported into the generic memory_failure() machinery.
      
         - Some fixes and improvements for VPHN, which is used under PowerVM
           to notify the Linux partition of topology changes.
      
         - Plumbing to enable TM (transactional memory) without suspend on
           some Power9 processors (PPC_FEATURE2_HTM_NO_SUSPEND).
      
         - Support for emulating vector loads from cache-inhibited memory, on
           some Power9 revisions.
      
         - Disable the fast-endian switch "syscall" by default (behind a
           CONFIG), we believe it has never had any users.
      
         - A major rework of the API drivers use when initiating and waiting
           for long running operations performed by OPAL firmware, and changes
           to the powernv_flash driver to use the new API.
      
         - Several fixes for the handling of FP/VMX/VSX while processes are
           using transactional memory.
      
         - Optimisations of TLB range flushes when using the radix MMU on
           Power9.
      
         - Improvements to the VAS facility used to access coprocessors on
           Power9, and related improvements to the way the NX crypto driver
           handles requests.
      
         - Implementation of PMEM_API and UACCESS_FLUSHCACHE for 64-bit.
      
        Thanks to: Alexey Kardashevskiy, Alistair Popple, Allen Pais, Andrew
        Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Balbir Singh, Benjamin
        Herrenschmidt, Breno Leitao, Christophe Leroy, Christophe Lombard,
        Cyril Bur, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven,
        Guilherme G. Piccoli, Gustavo Romero, Haren Myneni, Joel Stanley,
        Kamalesh Babulal, Kautuk Consul, Markus Elfring, Masami Hiramatsu,
        Michael Bringmann, Michael Neuling, Michal Suchanek, Naveen N. Rao,
        Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pedro Miraglia
        Franco de Carvalho, Philippe Bergheaud, Sandipan Das, Seth Forshee,
        Shriya, Stephen Rothwell, Stewart Smith, Sukadev Bhattiprolu, Tyrel
        Datwyler, Vaibhav Jain, Vaidyanathan Srinivasan, and William A.
        Kennington III"
      
      * tag 'powerpc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (151 commits)
        powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature
        powerpc/64s: Fix masking of SRR1 bits on instruction fault
        powerpc/64s: mm_context.addr_limit is only used on hash
        powerpc/64s/radix: Fix 128TB-512TB virtual address boundary case allocation
        powerpc/64s/hash: Allow MAP_FIXED allocations to cross 128TB boundary
        powerpc/64s/hash: Fix fork() with 512TB process address space
        powerpc/64s/hash: Fix 128TB-512TB virtual address boundary case allocation
        powerpc/64s/hash: Fix 512T hint detection to use >= 128T
        powerpc: Fix DABR match on hash based systems
        powerpc/signal: Properly handle return value from uprobe_deny_signal()
        powerpc/fadump: use kstrtoint to handle sysfs store
        powerpc/lib: Implement UACCESS_FLUSHCACHE API
        powerpc/lib: Implement PMEM API
        powerpc/powernv/npu: Don't explicitly flush nmmu tlb
        powerpc/powernv/npu: Use flush_all_mm() instead of flush_tlb_mm()
        powerpc/powernv/idle: Round up latency and residency values
        powerpc/kprobes: refactor kprobe_lookup_name for safer string operations
        powerpc/kprobes: Blacklist emulate_update_regs() from kprobes
        powerpc/kprobes: Do not disable interrupts for optprobes and kprobes_on_ftrace
        powerpc/kprobes: Disable preemption before invoking probe handler for optprobes
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace · 758f8758
      Linus Torvalds authored
      Pull user namespace update from Eric Biederman:
       "The only change that is production ready this round is the work to
        increase the number of uid and gid mappings a user namespace can
        support from 5 to 340.
      
        This code was carefully benchmarked and it was confirmed that in the
        existing cases the performance remains the same. In the worst case
        with 340 mappings, cache-cold stat times go from 158ns to 248ns.
        That is noticeable but still quite small, and only the people who are
        doing crazy things pay the cost.
      
        This work uncovered some documentation and cleanup opportunities in
        the mapping code, and patches to make those cleanups and improve the
        documentation will be coming in the next merge window"
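As a rough illustration of why lookup cost can grow with the number of mappings, here is a sketch of translating an ID through a table of extents. Everything here is an assumption for illustration: the `demo_*` names, the three-field extent layout, and the choice of a linear scan up to the old limit of 5 with a binary search beyond it. It is not presented as the code merged in this series.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_INVALID_ID ((uint32_t)-1)
#define DEMO_SMALL_MAP  5   /* the previous limit named in the text */

/* One contiguous mapping: IDs [first, first + count) inside the
 * namespace map to [lower_first, lower_first + count) outside it. */
struct demo_extent {
    uint32_t first;
    uint32_t lower_first;
    uint32_t count;
};

static int demo_extent_contains(const struct demo_extent *e, uint32_t id)
{
    return id >= e->first && id - e->first < e->count;
}

/* Translate id through the map; returns DEMO_INVALID_ID if unmapped.
 * Maps larger than DEMO_SMALL_MAP must be sorted by 'first' and must
 * not overlap, because they are searched by bisection. */
static uint32_t demo_map_id_down(const struct demo_extent *map, size_t n,
                                 uint32_t id)
{
    size_t lo = 0, hi = n;

    if (n <= DEMO_SMALL_MAP) {
        /* Small maps: a linear scan touches at most five entries. */
        for (size_t i = 0; i < n; i++)
            if (demo_extent_contains(&map[i], id))
                return id - map[i].first + map[i].lower_first;
        return DEMO_INVALID_ID;
    }

    /* Larger maps: binary search over the sorted extents. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (demo_extent_contains(&map[mid], id))
            return id - map[mid].first + map[mid].lower_first;
        if (id < map[mid].first)
            hi = mid;
        else
            lo = mid + 1;
    }
    return DEMO_INVALID_ID;
}
```

Keeping the small-map path as a plain scan is one way to leave existing configurations on the same code path while larger tables pay a logarithmic search cost.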
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
        userns: Simplify insert_extent
        userns: Make map_id_down a wrapper for map_id_range_down
        userns: Don't read extents twice in m_start
        userns: Simplify the user and group mapping functions
        userns: Don't special case a count of 0
        userns: bump idmap limits to 340
        userns: use union in {g,u}idmap struct
    • Merge tag 'f2fs-for-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs · a02cd422
      Linus Torvalds authored
      Pull f2fs updates from Jaegeuk Kim:
       "In this round, we introduce sysfile-based quota support which is
        required for Android by default. In addition, we allow users to
        reserve some blocks at runtime to mitigate performance drops
        in low free space.
      
        Enhancements:
         - assign proper data segments according to write_hints given by user
         - issue cache_flush on dirty devices only among multiple devices
         - exploit cp_error flag and add more faults to enhance fault
           injection test
         - conduct more readaheads during f2fs_readdir
         - add a range for discard commands
      
        Bug fixes:
         - fix zero stat->st_blocks when inline_data is set
         - drop crypto key and free stale memory pointer while evict_inode is
           failing
         - fix some corner cases in free space and segment management
         - fix wrong last_disk_size
      
        This series includes lots of clean-ups and code enhancement in terms
        of xattr operations, discard/flush command control. In addition, it
        adds versatile debugfs entries to monitor f2fs status"
      
      * tag 'f2fs-for-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (75 commits)
        f2fs: deny accessing encryption policy if encryption is off
        f2fs: inject fault in inc_valid_node_count
        f2fs: fix to clear FI_NO_PREALLOC
        f2fs: expose quota information in debugfs
        f2fs: separate nat entry mem alloc from nat_tree_lock
        f2fs: validate before set/clear free nat bitmap
        f2fs: avoid opened loop codes in __add_ino_entry
        f2fs: apply write hints to select the type of segments for buffered write
        f2fs: introduce scan_curseg_cache for cleanup
        f2fs: optimize the way of traversing free_nid_bitmap
        f2fs: keep scanning until enough free nids are acquired
        f2fs: trace checkpoint reason in fsync()
        f2fs: keep isize once block is reserved cross EOF
        f2fs: avoid race in between GC and block exchange
        f2fs: save a multiplication for last_nid calculation
        f2fs: fix summary info corruption
        f2fs: remove dead code in update_meta_page
        f2fs: remove unneeded semicolon
        f2fs: don't bother with inode->i_version
        f2fs: check curseg space before foreground GC
        ...
    • Merge tag 'afs-next-20171113' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 487e2c9f
      Linus Torvalds authored
      Pull AFS updates from David Howells:
       "kAFS filesystem driver overhaul.
      
        The major points of the overhaul are:
      
         (1) Preliminary groundwork is laid for supporting network-namespacing
             of kAFS. The remainder of the namespacing work requires some way
             to pass namespace information to submounts triggered by an
             automount. This requires something like the mount overhaul that's
             in progress.
      
         (2) sockaddr_rxrpc is used in preference to in_addr for holding
             addresses internally, and support is added for talking to the
             YFS VL server. With this, kAFS can do everything over IPv6 as
             well as IPv4 if it's talking to servers that support it.
      
         (3) Callback handling is overhauled to be generally passive rather
             than active. 'Callbacks' are promises by the server to tell us
             about data and metadata changes. Callbacks are now checked when
             we next touch an inode rather than actively going and looking for
             it where possible.
      
         (4) File access permit caching is overhauled to store the caching
             information per-inode rather than per-directory, shared over
             subordinate files. Whilst older AFS servers only allow ACLs on
             directories (shared to the files in that directory), newer AFS
             servers break that restriction.
      
             To improve memory usage and to make it easier to do mass-key
             removal, permit combinations are cached and shared.
      
         (5) Cell database management is overhauled to allow lighter locks to
             be used and to make cell records autonomous state machines that
             look after getting their own DNS records and cleaning themselves
             up, in particular preventing races in acquiring and relinquishing
             the fscache token for the cell.
      
         (6) Volume caching is overhauled. The afs_vlocation record is got rid
             of to simplify things and the superblock is now keyed on the cell
             and the numeric volume ID only. The volume record is tied to a
             superblock and normal superblock management is used to mediate
             the lifetime of the volume fscache token.
      
         (7) File server record caching is overhauled to make server records
             independent of cells and volumes. A server can be in multiple
             cells (in such a case, the administrator must make sure that the
             VL services for all cells correctly reflect the volumes shared
             between those cells).
      
             Server records are now indexed using the UUID of the server
             rather than the address since a server can have multiple
             addresses.
      
         (8) File server rotation is overhauled to handle VMOVED, VBUSY (and
             similar), VOFFLINE and VNOVOL indications and to handle rotation
             both of servers and addresses of those servers. The rotation will
             also wait and retry if the server says it is busy.
      
         (9) Data writeback is overhauled. Each inode no longer stores a list
             of modified sections tagged with the key that authorised it in
             favour of noting the modified region of a page in page->private
             and storing a list of keys that made modifications in the inode.
      
             This simplifies things and allows other keys to be used to
             actually write to the server if a key that made a modification
             becomes useless.
      
        (10) Writable mmap() is implemented. This allows a kernel to be built
             entirely on AFS.
      
        Note that pre-AFS-3.4 servers are no longer supported, though this can
        be added back if necessary (AFS-3.4 was released in 1998)"
      
      * tag 'afs-next-20171113' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (35 commits)
        afs: Protect call->state changes against signals
        afs: Trace page dirty/clean
        afs: Implement shared-writeable mmap
        afs: Get rid of the afs_writeback record
        afs: Introduce a file-private data record
        afs: Use a dynamic port if 7001 is in use
        afs: Fix directory read/modify race
        afs: Trace the sending of pages
        afs: Trace the initiation and completion of client calls
        afs: Fix documentation on # vs % prefix in mount source specification
        afs: Fix total-length calculation for multiple-page send
        afs: Only progress call state at end of Tx phase from rxrpc callback
        afs: Make use of the YFS service upgrade to fully support IPv6
        afs: Overhaul volume and server record caching and fileserver rotation
        afs: Move server rotation code into its own file
        afs: Add an address list concept
        afs: Overhaul cell database management
        afs: Overhaul permit caching
        afs: Overhaul the callback handling
        afs: Rename struct afs_call server member to cm_server
        ...
      487e2c9f
    • Linus Torvalds's avatar
      Merge tag 'pinctrl-v4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · b630a23a
      Linus Torvalds authored
      Pull pin control updates from Linus Walleij:
       "This is the bulk of pin control changes for the v4.15 kernel cycle:
      
        Core:
      
         - The pin control Kconfig entry PINCTRL is now turned into a
           menuconfig option. This obviously has the implication of making the
           subsystem menu visible in menuconfig. This is happening because of
           two things:
      
            (a) Intel have started to deploy and depend on pin controllers in
                a way that is affecting users directly. This happens on the
                highly integrated laptop chipsets named after geographical
                places: baytrail, broxton, cannonlake, cedarfork, cherryview,
                denverton, geminilake, lewisburg, merrifield, sunrisepoint...
                It started a while back and now it is ever more evident that
                this is crucial infrastructure for x86 laptops and not an
                embedded obscurity anymore. Users need to be aware.
      
            (b) Pin control expanders on I2C and SPI that are arch-agnostic.
                Currently Semtech SX150X and Microchip MCP28x08 but more are
                expected. Users will have to be able to configure these in
                directly for their set-up.
      
         - Just go and select GPIOLIB now that we made sure that GPIOLIB is a
           very vanilla subsystem. Do not depend on it, if we need it, select
           it.
      
         - Exposing the pin control subsystem in menuconfig uncovered a bunch
           of obscure bugs that are now hopefully fixed, all more or less
           pertaining to Blackfin.
      
         - Unified namespace for cross-calls between pin control and GPIO.
      
         - New support for clock skew/delay generic DT bindings and generic
           pin config options for this.
      
         - Minor documentation improvements.
      
        Various:
      
         - The Renesas SH-PFC pin controller has evolved a lot. It seems
           Renesas are churning out new SoCs by the minute.
      
         - A bunch of non-critical fixes for the Rockchip driver.
      
         - Improve the use of library functions instead of open coding.
      
         - Support the MCP28018 variant in the MCP28x08 driver.
      
         - Static constifying"
      
      * tag 'pinctrl-v4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (91 commits)
        pinctrl: gemini: Fix missing pad descriptions
        pinctrl: Add some depends on HAS_IOMEM
        pinctrl: samsung/s3c24xx: add CONFIG_OF dependency
        pinctrl: gemini: Fix GMAC groups
        pinctrl: qcom: spmi-gpio: Add pmi8994 gpio support
        pinctrl: ti-iodelay: remove redundant unused variable dev
        pinctrl: max77620: Use common error handling code in max77620_pinconf_set()
        pinctrl: gemini: Implement clock skew/delay config
        pinctrl: gemini: Use generic DT parser
        pinctrl: Add skew-delay pin config and bindings
        pinctrl: armada-37xx: Add edge both type gpio irq support
        pinctrl: uniphier: remove eMMC hardware reset pin-mux
        pinctrl: rockchip: Add iomux-route switching support for rk3288
        pinctrl: intel: Add Intel Cedar Fork PCH pin controller support
        pinctrl: intel: Make offset to interrupt status register configurable
        pinctrl: sunxi: Enforce the strict mode by default
        pinctrl: sunxi: Disable strict mode for old pinctrl drivers
        pinctrl: sunxi: Introduce the strict flag
        pinctrl: sh-pfc: Save/restore registers for PSCI system suspend
        pinctrl: sh-pfc: r8a7796: Use generic IOCTRL register description
        ...
      b630a23a
    • Linus Torvalds's avatar
      Merge tag 'backlight-next-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight · 9c7a867e
      Linus Torvalds authored
      Pull backlight updates from Lee Jones:
      
         - handle 32bit overflow in pwm_bl
      
         - remove redundant code/checks in tps65217_bl and ili922x
      
      * tag 'backlight-next-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
        backlight: ili922x: Remove redundant variable len
        backlight: tps65217_bl: Remove unnecessary default brightness check
        backlight: pwm_bl: Fix overflow condition
      9c7a867e
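The pull request above only says that a 32-bit overflow in pwm_bl is handled; it does not show where the overflow occurs. As a minimal sketch, assuming the overflow comes from multiplying a brightness level by a PWM period in 32-bit arithmetic, the difference between a wrapping and a widened computation looks like this. The function names and parameters are hypothetical and are not taken from the driver.

```c
#include <stdint.h>

/*
 * Hypothetical helper: scale a period by brightness/max_brightness using
 * 32-bit arithmetic only. The intermediate product wraps modulo 2^32 when
 * brightness * period_ns does not fit in 32 bits.
 * The caller must ensure max_brightness is non-zero.
 */
static uint32_t scale_duty_32(uint32_t brightness, uint32_t period_ns,
                              uint32_t max_brightness)
{
    return brightness * period_ns / max_brightness;
}

/*
 * Same computation with the product widened to 64 bits before dividing,
 * so the intermediate value cannot wrap for any 32-bit inputs.
 */
static uint32_t scale_duty_64(uint32_t brightness, uint32_t period_ns,
                              uint32_t max_brightness)
{
    uint64_t product = (uint64_t)brightness * period_ns;

    return (uint32_t)(product / max_brightness);
}
```

With brightness and max_brightness both 65535 and a period of 1000000 ns, the widened version returns the full period, while the 32-bit version returns a much smaller value because the product wrapped.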
    • Linus Torvalds's avatar
      Merge tag 'mfd-next-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd · d3092e4e
      Linus Torvalds authored
      Pull MFD updates from Lee Jones:
       "New drivers:
         - Add support for Cherry Trail Dollar Cove TI PMIC
         - Add support for Spreadtrum SC27xx series PMICs
      
        New device support:
         - Add regulator support to axp20x
      
        New functionality:
         - Add DT support; aspeed-scu sc27xx-pmic
         - Add power saving support; rts5249
      
        Fix-ups:
         - DT clean-up/rework; tps65217, max77693, iproc-cdru, iproc-mhb, tps65218
         - Staticise/constify; stw481x
         - Use new succinct IRQ API; fsl-imx25-tsadc
         - Kconfig fix-ups; MFD_TPS65218
         - Identify SPI method; lpc_ich
         - Use managed resources (devm_*) calls; ssbi
         - Remove unused/obsolete code/documentation; mc13xxx
      
        Bug fixes:
         - Fix typo in MAINTAINERS
         - Fix error handling; mxs-lradc
         - Clean-up IRQs on .remove; fsl-imx25-tsadc"
      
      * tag 'mfd-next-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (21 commits)
        dt-bindings: mfd: mc13xxx: Remove obsolete property
        mfd: axp20x: Add axp20x-regulator cell for AXP813
        mfd: Add Spreadtrum SC27xx series PMICs driver
        dt-bindings: mfd: Add Spreadtrum SC27xx PMIC documentation
        mfd: ssbi: Use devm_of_platform_populate()
        mfd: fsl-imx25: Clean up irq settings during removal
        mfd: mxs-lradc: Fix error handling in mxs_lradc_probe()
        mfd: lpc_ich: Avoton/Rangeley uses SPI_BYT method
        mfd: tps65218: Introduce dependency on CONFIG_OF
        mfd: tps65218: Correct the config description
        MAINTAINERS: Fix Dialog search term for watchdog binding file
        mfd: fsl-imx25: Set irq handler and data in one go
        mfd: rts5249: Add support for RTS5250S power saving
        ACPI / PMIC: Add opregion driver for Intel Dollar Cove TI PMIC
        mfd: Add support for Cherry Trail Dollar Cove TI PMIC
        syscon: dt-bindings: Add binding document for iProc MHB block
        syscon: dt-bindings: Add binding doc for Broadcom iProc CDRU
        mfd: max77693: Add muic of_compatible in mfd_cell
        mfd: stw481x: Make three arrays static const, reduces object code size
        mfd: tps65217: Introduce dependency on CONFIG_OF
        ...
      d3092e4e
    • Linus Torvalds's avatar
      Merge tag 'char-misc-4.15-rc1' of... · 2bf16b7a
      Linus Torvalds authored
      Merge tag 'char-misc-4.15-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
      
      Pull char/misc updates from Greg KH:
       "Here is the big set of char/misc and other driver subsystem patches
        for 4.15-rc1.
      
        There are small changes all over here, hyperv driver updates, pcmcia
        driver updates, w1 driver updates, vme driver updates, nvmem driver
        updates, and lots of other little one-off driver updates as well. The
        shortlog has the full details.
      
        All of these have been in linux-next for quite a while with no
        reported issues"
      
      * tag 'char-misc-4.15-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (90 commits)
        VME: Return -EBUSY when DMA list in use
        w1: keep balance of mutex locks and refcnts
        MAINTAINERS: Update VME subsystem tree.
        nvmem: sunxi-sid: add support for A64/H5's SID controller
        nvmem: imx-ocotp: Update module description
        nvmem: imx-ocotp: Enable i.MX7D OTP write support
        nvmem: imx-ocotp: Add i.MX7D timing write clock setup support
        nvmem: imx-ocotp: Move i.MX6 write clock setup to dedicated function
        nvmem: imx-ocotp: Add support for banked OTP addressing
        nvmem: imx-ocotp: Pass parameters via a struct
        nvmem: imx-ocotp: Restrict OTP write to IMX6 processors
        nvmem: uniphier: add UniPhier eFuse driver
        dt-bindings: nvmem: add description for UniPhier eFuse
        nvmem: set nvmem->owner to nvmem->dev->driver->owner if unset
        nvmem: qfprom: fix different address space warnings of sparse
        nvmem: mtk-efuse: fix different address space warnings of sparse
        nvmem: mtk-efuse: use stack for nvmem_config instead of malloc'ing it
        nvmem: imx-iim: use stack for nvmem_config instead of malloc'ing it
        thunderbolt: tb: fix use after free in tb_activate_pcie_devices
        MAINTAINERS: Add git tree for Thunderbolt development
        ...
      2bf16b7a
    • Linus Torvalds's avatar
      Merge tag 'driver-core-4.15-rc1' of... · b9743042
      Linus Torvalds authored
      Merge tag 'driver-core-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
      
      Pull driver core updates from Greg KH:
       "Here is the set of driver core / debugfs patches for 4.15-rc1.
      
        Not many here, mostly all are debugfs fixes to resolve some
        long-reported problems with files going away with references to them
        in userspace. There's also some SPDX cleanups for the debugfs code, as
        well as a few other minor driver core changes for issues reported by
        people.
      
        All of these have been in linux-next for a week or more with no
        reported issues"
      
      * tag 'driver-core-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        driver core: Fix device link deferred probe
        debugfs: Remove redundant license text
        debugfs: add SPDX identifiers to all debugfs files
        debugfs: defer debugfs_fsdata allocation to first usage
        debugfs: call debugfs_real_fops() only after debugfs_file_get()
        debugfs: purge obsolete SRCU based removal protection
        IB/hfi1: convert to debugfs_file_get() and -put()
        debugfs: convert to debugfs_file_get() and -put()
        debugfs: debugfs_real_fops(): drop __must_hold sparse annotation
        debugfs: implement per-file removal protection
        debugfs: add support for more elaborate ->d_fsdata
        driver core: Move device_links_purge() after bus_remove_device()
        arch_topology: Fix section miss match warning due to free_raw_capacity()
        driver-core: pr_err() strings should end with newlines
      b9743042
    • Masahiro Yamada's avatar
      arm64: dts: uniphier: route on-board device IRQ to GPIO controller for PXs3 · ba5b5034
      Masahiro Yamada authored
      Commit 429f203e ("arm64: dts: uniphier: route on-board device IRQ
      to GPIO controller") failed to update this DTS.  It becomes a real
      problem when the arm and arm64 trees are merged together.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      ba5b5034
    • Radim Krčmář's avatar
      Merge tag 'kvm-s390-next-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux · a6014f1a
      Radim Krčmář authored
      KVM: s390: fixes and improvements for 4.15
      
      - Some initial preparation patches for exitless interrupts and crypto
      - New capability for AIS migration
      - Fixes
      - merge of the sthyi tree from the base s390 team, which moves the sthyi
      out of KVM into a shared function also for non-KVM
      a6014f1a
    • Linus Torvalds's avatar
      Merge tag 'drm-for-v4.15' of git://people.freedesktop.org/~airlied/linux · e60e1ee6
      Linus Torvalds authored
      Pull drm updates from Dave Airlie:
       "This is the main drm pull request for v4.15.
      
        Core:
         - Atomic object lifetime fixes
         - Atomic iterator improvements
         - Sparse/smatch fixes
         - Legacy kms ioctls to be interruptible
         - EDID override improvements
         - fb/gem helper cleanups
         - Simple outreachy patches
         - Documentation improvements
         - Fix dma-buf rcu races
         - DRM mode object leasing for improving VR use cases.
         - vgaarb improvements for non-x86 platforms.
      
        New driver:
         - tve200: Faraday Technology TVE200 block.
      
           This "TV Encoder" encodes a ITU-T BT.656 stream and can be found in
           the StorLink SL3516 (later Cortina Systems CS3516) as well as the
           Grain Media GM8180.
      
        New bridges:
         - SiI9234 support
      
        New panels:
         - S6E63J0X03, OTM8009A, Seiko 43WVF1G, 7" rpi touch panel, Toshiba
           LT089AC19000, Innolux AT043TN24
      
        i915:
         - Remove Coffeelake from alpha support
         - Cannonlake workarounds
         - Infoframe refactoring for DisplayPort
         - VBT updates
         - DisplayPort vswing/emph/buffer translation refactoring
         - CCS fixes
         - Restore GPU clock boost on missed vblanks
         - Scatter list updates for userptr allocations
         - Gen9+ transition watermarks
         - Display IPC (Isochronous Priority Control)
         - Private PAT management
         - GVT: improved error handling and pci config sanitizing
         - Execlist refactoring
         - Transparent Huge Page support
         - User defined priorities support
         - HuC/GuC firmware refactoring
         - DP MST fixes
         - eDP power sequencing fixes
         - Use RCU instead of stop_machine
         - PSR state tracking support
         - Eviction fixes
         - BDW DP aux channel timeout fixes
         - LSPCON fixes
         - Cannonlake PLL fixes
      
        amdgpu:
         - Per VM BO support
         - Powerplay cleanups
         - CI powerplay support
         - PASID mgr for kfd
         - SR-IOV fixes
         - initial GPU reset for vega10
         - Prime mmap support
         - TTM updates
         - Clock query interface for Raven
         - Fence to handle ioctl
         - UVD encode ring support on Polaris
         - Transparent huge page DMA support
         - Compute LRU pipe tweaks
         - BO flag to allow buffers to opt out of implicit sync
         - CTX priority setting API
         - VRAM lost infrastructure plumbing
      
        qxl:
         - fix flicker since atomic rework
      
        amdkfd:
         - Further improvements from internal AMD tree
         - Usermode events
         - Drop radeon support
      
        nouveau:
         - Pascal temperature sensor support
         - Improved BAR2 handling
         - MMU rework to support Pascal MMU
      
        exynos:
         - Improved HDMI/mixer support
         - HDMI audio interface support
      
        tegra:
         - Prep work for tegra186
         - Cleanup/fixes
      
        msm:
         - Preemption support for a5xx
         - Display fixes for 8x96 (snapdragon 820)
         - Async cursor plane fixes
         - FW loading rework
         - GPU debugging improvements
      
        vc4:
         - Prep for DSI panels
         - fix T-format tiling scanout
         - New madvise ioctl
      
        Rockchip:
         - LVDS support
      
        omapdrm:
         - omap4 HDMI CEC support
      
        etnaviv:
         - GPU performance counters groundwork
      
        sun4i:
         - refactor driver load + TCON backend
         - HDMI improvements
         - A31 support
         - Misc fixes
      
        udl:
         - Probe/EDID read fixes.
      
        tilcdc:
         - Misc fixes.
      
        pl111:
         - Support more variants
      
        adv7511:
         - Improve EDID handling.
         - HDMI CEC support
      
        sii8620:
         - Add remote control support"
      
      * tag 'drm-for-v4.15' of git://people.freedesktop.org/~airlied/linux: (1480 commits)
        drm/rockchip: analogix_dp: Use mutex rather than spinlock
        drm/mode_object: fix documentation for object lookups.
        drm/i915: Reorder context-close to avoid calling i915_vma_close() under RCU
        drm/i915: Move init_clock_gating() back to where it was
        drm/i915: Prune the reservation shared fence array
        drm/i915: Idle the GPU before shinking everything
        drm/i915: Lock llist_del_first() vs llist_del_all()
        drm/i915: Calculate ironlake intermediate watermarks correctly, v2.
        drm/i915: Disable lazy PPGTT page table optimization for vGPU
        drm/i915/execlists: Remove the priority "optimisation"
        drm/i915: Filter out spurious execlists context-switch interrupts
        drm/amdgpu: use irq-safe lock for kiq->ring_lock
        drm/amdgpu: bypass lru touch for KIQ ring submission
        drm/amdgpu: Potential uninitialized variable in amdgpu_vm_update_directories()
        drm/amdgpu: potential uninitialized variable in amdgpu_vce_ring_parse_cs()
        drm/amd/powerplay: initialize a variable before using it
        drm/amd/powerplay: suppress KASAN out of bounds warning in vega10_populate_all_memory_levels
        drm/amd/amdgpu: fix evicted VRAM bo adjudgement condition
        drm/vblank: Tune drm_crtc_accurate_vblank_count() WARN down to a debug
        drm/rockchip: add CONFIG_OF dependency for lvds
        ...
      e60e1ee6
    • Linus Torvalds's avatar
      Merge tag 'media/v4.15-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 5d352e69
      Linus Torvalds authored
      Pull media updates from Mauro Carvalho Chehab:
      
       - Documentation for digital TV (both kAPI and uAPI) is now in sync
         with the implementation (except for legacy/deprecated ioctls). This
         is a major step, as there was always a gap there
      
       - New sensor driver: imx274
      
       - New cec driver: cec-gpio
      
       - New platform drivers for Rockchip RGA and Tegra CEC
      
       - New RC driver: tango-ir
      
       - Several cleanups at atomisp driver
      
       - Core improvements for RC, CEC, V4L2 async probing support and DVB
      
       - Lots of drivers cleanup, fixes and improvements.
      
      * tag 'media/v4.15-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (332 commits)
        dvb_frontend: don't use-after-free the frontend struct
        media: dib0700: fix invalid dvb_detach argument
        media: v4l2-ctrls: Don't validate BITMASK twice
        media: s5p-mfc: fix lockdep warning
        media: dvb-core: always call invoke_release() in fe_free()
        media: usb: dvb-usb-v2: dvb_usb_core: remove redundant code in dvb_usb_fe_sleep
        media: au0828: make const array addr_list static
        media: cx88: make const arrays default_addr_list and pvr2000_addr_list static
        media: drxd: make const array fastIncrDecLUT static
        media: usb: fix spelling mistake: "synchronuously" -> "synchronously"
        media: ddbridge: fix build warnings
        media: av7110: avoid 2038 overflow in debug print
        media: Don't do DMA on stack for firmware upload in the AS102 driver
        media: v4l: async: fix unregister for implicitly registered sub-device notifiers
        media: v4l: async: fix return of unitialized variable ret
        media: imx274: fix missing return assignment from call to imx274_mode_regs
        media: camss-vfe: always initialize reg at vfe_set_xbar_cfg()
        media: atomisp: make function calls cleaner
        media: atomisp: get rid of storage_class.h
        media: atomisp: get rid of wrong stddef.h include
        ...
      5d352e69
    • Linus Torvalds's avatar
      Merge tag 'leaks-4.15-rc1' of git://github.com/tcharding/linux · 93ea0eb7
      Linus Torvalds authored
      Pull leaking_addresses script updates from Tobin Harding:
       "Here are development patches for the leaking_addresses.pl script.
      
        Changes include:
      
         - add summary reporting to the script
      
         - add 'SigIgn' to false positives
      
         - add a file read timeout so the script doesn't block indefinitely
      
         - add infrastructure to enable multi-arch support and add support for ppc
      
         - add some exclude files/paths suggested by various people
      
         - code clean up and refactoring
      
         - overhaul command line options"
      
      * tag 'leaks-4.15-rc1' of git://github.com/tcharding/linux:
        leaking_addresses: add SigIgn to false positives
        leaking_addresses: add timeout on file read
        leaking_addresses: add support for ppc64
        leaking_addresses: add summary reporting options
        leaking_addresses: add to exclude files/paths list
        leaking_addresses: fix comment string typo
        leaking_addresses: remove command line options
        leaking_addresses: remove dead/unused code
        leaking_addresses: use tabs instead of spaces
      93ea0eb7
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 7c225c69
      Linus Torvalds authored
      Merge updates from Andrew Morton:
      
       - a few misc bits
      
       - ocfs2 updates
      
       - almost all of MM
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (131 commits)
        memory hotplug: fix comments when adding section
        mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP
        mm: simplify nodemask printing
        mm,oom_reaper: remove pointless kthread_run() error check
        mm/page_ext.c: check if page_ext is not prepared
        writeback: remove unused function parameter
        mm: do not rely on preempt_count in print_vma_addr
        mm, sparse: do not swamp log with huge vmemmap allocation failures
        mm/hmm: remove redundant variable align_end
        mm/list_lru.c: mark expected switch fall-through
        mm/shmem.c: mark expected switch fall-through
        mm/page_alloc.c: broken deferred calculation
        mm: don't warn about allocations which stall for too long
        fs: fuse: account fuse_inode slab memory as reclaimable
        mm, page_alloc: fix potential false positive in __zone_watermark_ok
        mm: mlock: remove lru_add_drain_all()
        mm, sysctl: make NUMA stats configurable
        shmem: convert shmem_init_inodecache() to void
        Unify migrate_pages and move_pages access checks
        mm, pagevec: rename pagevec drained field
        ...
      7c225c69
    • Oscar Salvador's avatar
      mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP · 0cd842f9
      Oscar Salvador authored
      free_area_init_node() calls alloc_node_mem_map(), but this function does
      nothing unless we have CONFIG_FLAT_NODE_MEM_MAP.
      
      As a cleanup, we can move the "#ifdef CONFIG_FLAT_NODE_MEM_MAP" within
      alloc_node_mem_map() out of the function, and define an empty
      alloc_node_mem_map() { } when CONFIG_FLAT_NODE_MEM_MAP is not present.
      
      This also moves the printk that lies within the "#ifdef
      CONFIG_FLAT_NODE_MEM_MAP" block from free_area_init_node() to
      alloc_node_mem_map(), getting rid of the "#ifdef
      CONFIG_FLAT_NODE_MEM_MAP" in free_area_init_node().
      
      [akpm@linux-foundation.org: clean up the printk while we're there]
      Link: http://lkml.kernel.org/r/20171114111935.GA11758@techadventures.net
      Signed-off-by: Oscar Salvador <osalvador@techadventures.net>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0cd842f9
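The cleanup described in the commit above (hoisting the #ifdef out of the function body and providing an empty stub) can be sketched in standalone C. This is a simplified illustration, not the kernel code: struct node_data and its fields are hypothetical stand-ins, and the config macro is defined by hand here instead of coming from Kconfig.

```c
#include <stdio.h>

/* Hypothetical stand-in for per-node data; not the kernel's pg_data_t. */
struct node_data {
    int node_id;
    int mem_map_allocated;
};

/* Defined by hand for this sketch; remove it to get the empty stub. */
#define CONFIG_FLAT_NODE_MEM_MAP 1

#ifdef CONFIG_FLAT_NODE_MEM_MAP
/* Real work, including the message that used to sit in the caller. */
static void alloc_node_mem_map(struct node_data *pgdat)
{
    pgdat->mem_map_allocated = 1;
    printf("node %d: mem_map allocated\n", pgdat->node_id);
}
#else
/* Empty stub: the caller compiles unchanged when the option is off. */
static void alloc_node_mem_map(struct node_data *pgdat) { (void)pgdat; }
#endif

/* The call site needs no #ifdef of its own. */
static void free_area_init_node(struct node_data *pgdat, int nid)
{
    pgdat->node_id = nid;
    pgdat->mem_map_allocated = 0;
    alloc_node_mem_map(pgdat);
}
```

Keeping the conditional at the definition means every caller reads the same in both configurations, which is the point of the cleanup.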
    • Michal Hocko's avatar
      mm: simplify nodemask printing · 0205f755
      Michal Hocko authored
      alloc_warn() and dump_header() have to explicitly handle NULL nodemask
      which forces both paths to use pr_cont.  We can do better.  printk
      already handles NULL pointers properly so all we need is to teach
      nodemask_pr_args to handle NULL nodemask carefully.  This allows
      simplification of both alloc_warn() and dump_header() and gets rid of
      pr_cont altogether.
      
      This patch has been motivated by patch from Joe Perches
      
        http://lkml.kernel.org/r/b31236dfe3fc924054fd7842bde678e71d193638.1509991345.git.joe@perches.com
      
      [akpm@linux-foundation.org: fix tile warning, per Arnd]
      Link: http://lkml.kernel.org/r/20171109100531.3cn2hcqnuj7mjaju@dhcp22.suse.cz
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Joe Perches <joe@perches.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0205f755
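The idea in the commit above, teaching the formatting helper to tolerate a NULL nodemask so that callers can emit one complete message, can be approximated in userspace C. This is a minimal sketch under stated assumptions: nodemask_sketch_t, MAX_NODES and nodemask_str() are hypothetical names, and printing "(null)" for a missing mask is an assumption of this example, not a statement about the kernel's output format.

```c
#include <stdio.h>
#include <stddef.h>

#define MAX_NODES 8 /* hypothetical small node count for the sketch */

/* Hypothetical stand-in for a nodemask: one bit per node. */
typedef struct {
    unsigned long bits;
} nodemask_sketch_t;

/*
 * Format a mask as a comma-separated list of set node numbers.
 * A NULL mask is handled here, so callers do not need a special case
 * and can print their whole message with a single call.
 */
static const char *nodemask_str(const nodemask_sketch_t *mask,
                                char *buf, size_t len)
{
    size_t used = 0;
    int node;

    if (len == 0)
        return "";
    if (!mask) {
        snprintf(buf, len, "(null)");
        return buf;
    }

    buf[0] = '\0';
    for (node = 0; node < MAX_NODES; node++) {
        int n;

        if (!(mask->bits & (1UL << node)))
            continue;
        n = snprintf(buf + used, len - used, "%s%d",
                     used ? "," : "", node);
        if (n < 0 || (size_t)n >= len - used)
            break; /* output truncated; stop appending */
        used += (size_t)n;
    }
    return buf;
}
```

A caller can then write one statement such as printf("nodemask=%s\n", nodemask_str(mask, buf, sizeof(buf))) whether or not mask is NULL, instead of splitting the message across continuation prints.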
    • Tetsuo Handa's avatar
      mm,oom_reaper: remove pointless kthread_run() error check · c50842c8
      Tetsuo Handa authored
      Since oom_init() is called before userspace processes start, memory
      allocation failure for creating the OOM reaper kernel thread will let
      the OOM killer call panic() rather than wake up the OOM reaper.
      
      Link: http://lkml.kernel.org/r/1510137800-4602-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c50842c8
    • Jaewon Kim's avatar
      mm/page_ext.c: check if page_ext is not prepared · e492080e
      Jaewon Kim authored
      online_page_ext() and page_ext_init() allocate page_ext for each
      section, but they do not allocate if the first PFN is !pfn_present(pfn)
      or !pfn_valid(pfn).  Then section->page_ext remains NULL.
      lookup_page_ext() checks for NULL only if CONFIG_DEBUG_VM is enabled.
      For a valid PFN, __set_page_owner() will try to get page_ext through
      lookup_page_ext().  Without CONFIG_DEBUG_VM, lookup_page_ext() will
      misuse the NULL pointer as value 0.  This incurs an invalid address
      access.
      
      This is an example of the panic when PFN 0x100000 is not valid but PFN
      0x13FC00 is being used for page_ext.  section->page_ext is NULL, and
      get_entry() returned the invalid page_ext address 0x1DFA000 for PFN
      0x13FC00.
      
      To avoid this panic, the CONFIG_DEBUG_VM guard should be removed so that
      page_ext is checked at all times.
      
        Unable to handle kernel paging request at virtual address 01dfa014
        ------------[ cut here ]------------
        Kernel BUG at ffffff80082371e0 [verbose debug info unavailable]
        Internal error: Oops: 96000045 [#1] PREEMPT SMP
        Modules linked in:
        PC is at __set_page_owner+0x48/0x78
        LR is at __set_page_owner+0x44/0x78
          __set_page_owner+0x48/0x78
          get_page_from_freelist+0x880/0x8e8
          __alloc_pages_nodemask+0x14c/0xc48
          __do_page_cache_readahead+0xdc/0x264
          filemap_fault+0x2ac/0x550
          ext4_filemap_fault+0x3c/0x58
          __do_fault+0x80/0x120
          handle_mm_fault+0x704/0xbb0
          do_page_fault+0x2e8/0x394
          do_mem_abort+0x88/0x124
      
      Pre-4.7 kernels also need commit f86e4271 ("mm: check the return
      value of lookup_page_ext for all call sites").
      
      Link: http://lkml.kernel.org/r/20171107094131.14621-1-jaewon31.kim@samsung.com
      Fixes: eefa864b ("mm/page_ext: resurrect struct page extending code for debugging")
      Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: <stable@vger.kernel.org>	[depends on f86e4271, see above]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e492080e
    • Wang Long's avatar
      writeback: remove unused function parameter · 2bce774e
      Wang Long authored
      The parameter `struct bdi_writeback *wb` is not used in the
      function body.  Remove it.
      
      Link: http://lkml.kernel.org/r/1509685485-15278-1-git-send-email-wanglong19@meituan.com
      Signed-off-by: Wang Long <wanglong19@meituan.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2bce774e
    • Michal Hocko's avatar
      mm: do not rely on preempt_count in print_vma_addr · 0a7f682d
      Michal Hocko authored
      The preempt count check in print_vma_addr was added by commit
      e8bff74a ("x86: fix "BUG: sleeping function called from invalid
      context" in print_vma_addr()").  It relied on the elevated preempt
      count from preempt_conditional_sti because a preempt_count check doesn't
      work on non-preemptible kernels by default.
      
      The code has evolved though, and commit d99e1bd1 ("x86/entry/traps:
      Refactor preemption and interrupt flag handling") replaced
      preempt_conditional_sti with an explicit preempt_disable, which is a
      no-op on !PREEMPT, so the check in print_vma_addr is broken.
      
      Fix the issue by using trylock on mmap_sem rather than checking the
      preempt count.  The allocation we are relying on has to be GFP_NOWAIT as
      well.  There is a chance that we won't dump the vma state if the lock is
      contended or memory is short, but this is an acceptable outcome and much
      less fragile than the non-working preemption check or tricks around it.
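
      The "try the lock, skip the dump if contended" idea can be illustrated in
      userspace.  This is only a sketch: a C11 atomic_flag stands in for
      mmap_sem, and demo_trylock(), demo_unlock() and dump_state_best_effort()
      are hypothetical names, not kernel functions.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for mmap_sem: a single flag used as a try-lock. */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;

static bool demo_trylock(void)
{
    /* test_and_set returns the previous value; false means we got the lock. */
    return !atomic_flag_test_and_set(&demo_lock);
}

static void demo_unlock(void)
{
    atomic_flag_clear(&demo_lock);
}

/* Best-effort diagnostic: print only if the lock is free, never wait.
 * Returns true if the state was dumped, false if the dump was skipped. */
static bool dump_state_best_effort(unsigned long addr)
{
    if (!demo_trylock())
        return false;          /* contended: give up rather than sleep */
    printf("addr %#lx\n", addr);
    demo_unlock();
    return true;
}
```

      The caller never blocks, so the helper is safe to call from a context
      that must not sleep, at the cost of occasionally printing nothing.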
      
      Link: http://lkml.kernel.org/r/20171106134031.g6dbelg55mrbyc6i@dhcp22.suse.cz
      Fixes: d99e1bd1 ("x86/entry/traps: Refactor preemption and interrupt flag handling")
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Yang Shi <yang.s@alibaba-inc.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0a7f682d
    • Michal Hocko's avatar
      mm, sparse: do not swamp log with huge vmemmap allocation failures · fcdaf842
      Michal Hocko authored
      While doing memory hotplug tests under heavy memory pressure we have
      noticed too many page allocation failures when allocating the vmemmap
      memmap backed by huge pages:
      
        kworker/u3072:1: page allocation failure: order:9, mode:0x24084c0(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO)
        [...]
        Call Trace:
          dump_trace+0x59/0x310
          show_stack_log_lvl+0xea/0x170
          show_stack+0x21/0x40
          dump_stack+0x5c/0x7c
          warn_alloc_failed+0xe2/0x150
          __alloc_pages_nodemask+0x3ed/0xb20
          alloc_pages_current+0x7f/0x100
          vmemmap_alloc_block+0x79/0xb6
          __vmemmap_alloc_block_buf+0x136/0x145
          vmemmap_populate+0xd2/0x2b9
          sparse_mem_map_populate+0x23/0x30
          sparse_add_one_section+0x68/0x18e
          __add_pages+0x10a/0x1d0
          arch_add_memory+0x4a/0xc0
          add_memory_resource+0x89/0x160
          add_memory+0x6d/0xd0
          acpi_memory_device_add+0x181/0x251
          acpi_bus_attach+0xfd/0x19b
          acpi_bus_scan+0x59/0x69
          acpi_device_hotplug+0xd2/0x41f
          acpi_hotplug_work_fn+0x1a/0x23
          process_one_work+0x14e/0x410
          worker_thread+0x116/0x490
          kthread+0xbd/0xe0
          ret_from_fork+0x3f/0x70
      
      and we do see many of those because essentially every allocation fails
      for each memory section.  This is an excessive way to tell the user that
      there is nothing to really worry about because we do have a fallback
      mechanism to use base pages.  The only downside might be a performance
      degradation due to TLB pressure.
      
      This patch changes vmemmap_alloc_block() to use __GFP_NOWARN and warn
      explicitly once on the first allocation failure.  This will reduce the
      noise in the kernel log considerably, while we still have an indication
      that performance might be impacted.
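
      The "stay quiet, but warn once on the first failure" pattern can be
      sketched in userspace as follows.  alloc_block_quiet(), the
      simulate_failure flag and the warnings_printed counter are hypothetical
      and only illustrate the idea; this is not the kernel code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

static int warnings_printed;   /* counts how often the warning fired */

/* Hypothetical allocator: try a large block, warn once on failure, and
 * let the caller fall back to a smaller size.  simulate_failure stands
 * in for memory pressure. */
static void *alloc_block_quiet(size_t size, bool simulate_failure)
{
    static bool warned;        /* persists across calls */
    void *p = simulate_failure ? NULL : malloc(size);

    if (!p && !warned) {
        warned = true;
        warnings_printed++;
        fprintf(stderr, "large allocation failed, falling back; "
                        "performance may be affected\n");
    }
    return p;
}
```

      However many allocations fail afterwards, the log receives a single
      line, which is the trade-off the message describes.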
      
      [mhocko@kernel.org: forgot to git add the follow up fix]
        Link: http://lkml.kernel.org/r/20171107090635.c27thtse2lchjgvb@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/20171106092228.31098-1-mhocko@kernel.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fcdaf842
    • Colin Ian King's avatar
      mm/hmm: remove redundant variable align_end · fec11bc0
      Colin Ian King authored
      Variable align_end is assigned a value but it is never read, so the
      variable is redundant and can be removed.  Cleans up the clang warning:
      Value stored to 'align_end' is never read
      
      Link: http://lkml.kernel.org/r/20171017143837.23207-1-colin.king@canonical.com
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fec11bc0
    • Gustavo A. R. Silva's avatar
      mm/list_lru.c: mark expected switch fall-through · 5b568acc
      Gustavo A. R. Silva authored
      In preparation for enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
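
      As an illustration of what such a marking looks like, here is a minimal
      sketch.  The enum and function are hypothetical and are not taken from
      mm/list_lru.c; the point is only the comment placed where one case
      deliberately continues into the next.  GCC's -Wimplicit-fallthrough
      accepts a comment of this form at its default level.

```c
/* Hypothetical states, not taken from mm/list_lru.c. */
enum demo_action { DEMO_REMOVE_RETRY, DEMO_REMOVE, DEMO_SKIP };

/* Returns how many bookkeeping steps an action triggers. */
static int demo_steps(enum demo_action a)
{
    int steps = 0;

    switch (a) {
    case DEMO_REMOVE_RETRY:
        steps++;               /* extra step, then continue as DEMO_REMOVE */
        /* fall through */
    case DEMO_REMOVE:
        steps++;
        break;
    case DEMO_SKIP:
        break;
    }
    return steps;
}
```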
      
      Link: http://lkml.kernel.org/r/20171020190754.GA24332@embeddedor.com
      Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5b568acc
    • Gustavo A. R. Silva's avatar
      mm/shmem.c: mark expected switch fall-through · c8402871
      Gustavo A. R. Silva authored
      In preparation for enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
      
      Link: http://lkml.kernel.org/r/20171020191058.GA24427@embeddedor.com
      Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c8402871
    • Pavel Tatashin's avatar
      mm/page_alloc.c: broken deferred calculation · d135e575
      Pavel Tatashin authored
      In reset_deferred_meminit() we determine the number of pages that must
      not be deferred.  We initialize pages for at least 2G of memory, but also
      pages for reserved memory in this node.
      
      The reserved memory is determined by the function
      memblock_reserved_memory_within(), which operates on physical
      addresses and returns a size in bytes.  However, reset_deferred_meminit()
      assumes that this function operates on pfns and returns a page
      count.
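
      The unit mismatch can be shown with a small sketch.  It assumes 4 KiB
      pages; demo_reserved_bytes_within() is a hypothetical query that, like
      the function described above, takes physical addresses and returns
      bytes, and the demo_* conversion helpers are local stand-ins.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12   /* assumes 4 KiB pages */

/* Local stand-ins for PFN <-> physical address conversion. */
static uint64_t demo_pfn_to_phys(uint64_t pfn)  { return pfn << DEMO_PAGE_SHIFT; }
static uint64_t demo_phys_to_pfn(uint64_t phys) { return phys >> DEMO_PAGE_SHIFT; }

/* Hypothetical reservation query in physical addresses, returning bytes.
 * Pretends that [1 MiB, 3 MiB) is reserved. */
static uint64_t demo_reserved_bytes_within(uint64_t start, uint64_t end)
{
    const uint64_t r_start = 1ull << 20, r_end = 3ull << 20;
    uint64_t lo = start > r_start ? start : r_start;
    uint64_t hi = end < r_end ? end : r_end;
    return hi > lo ? hi - lo : 0;
}

/* Correct use: convert PFNs to addresses on the way in, and bytes to
 * pages on the way out. */
static uint64_t demo_reserved_pages(uint64_t start_pfn, uint64_t end_pfn)
{
    uint64_t bytes = demo_reserved_bytes_within(demo_pfn_to_phys(start_pfn),
                                                demo_pfn_to_phys(end_pfn));
    return demo_phys_to_pfn(bytes);
}
```

      For the PFN range [0, 1024), demo_reserved_pages() yields 512 pages,
      whereas passing the raw PFNs straight to demo_reserved_bytes_within()
      yields 0, because the range is then interpreted as the first 1024 bytes.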
      
      The result is that in the best case the machine boots more slowly than
      expected because more pages than needed are initialized in a single
      thread, and in the worst case it panics because fewer pages than needed
      are initialized early.
      
      Link: http://lkml.kernel.org/r/20171021011707.15191-1-pasha.tatashin@oracle.com
      Fixes: 864b9a39 ("mm: consider memblock reservations for deferred memory initialization sizing")
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d135e575
    • Tetsuo Handa's avatar
      mm: don't warn about allocations which stall for too long · 400e2249
      Tetsuo Handa authored
      Commit 63f53dea ("mm: warn about allocations which stall for too
      long") was a great step toward reducing the possibility of silent
      hang-ups caused by memory allocation stalls.  But this commit reverts it,
      because it is possible to trigger OOM lockups and/or soft lockups when
      many threads concurrently call warn_alloc() (in order to warn about
      memory allocation stalls) due to the current implementation of printk(),
      and it is difficult to obtain useful information due to the limitations
      of the synchronous warning approach.
      
      The current printk() implementation flushes all pending logs using the
      context of the thread which called console_unlock().  printk() should be
      able to flush all pending logs eventually unless somebody keeps
      appending to the printk() buffer.
      
      Since warn_alloc() started appending to the printk() buffer while waiting
      for oom_kill_process() to make forward progress when oom_kill_process()
      is processing pending logs, it became possible for warn_alloc() to force
      oom_kill_process() to loop inside printk().  As a result, warn_alloc()
      significantly increased the possibility of preventing oom_kill_process()
      from making forward progress.
      
      ---------- Pseudo code start ----------
      Before warn_alloc() was introduced:
      
        retry:
          if (mutex_trylock(&oom_lock)) {
            while (atomic_read(&printk_pending_logs) > 0) {
              atomic_dec(&printk_pending_logs);
              print_one_log();
            }
            // Send SIGKILL here.
            mutex_unlock(&oom_lock);
          }
          goto retry;
      
      After warn_alloc() was introduced:
      
        retry:
          if (mutex_trylock(&oom_lock)) {
            while (atomic_read(&printk_pending_logs) > 0) {
              atomic_dec(&printk_pending_logs);
              print_one_log();
            }
            // Send SIGKILL here.
            mutex_unlock(&oom_lock);
          } else if (waited_for_10seconds()) {
            atomic_inc(&printk_pending_logs);
          }
          goto retry;
      ---------- Pseudo code end ----------
      
      Although waited_for_10seconds() becomes true only once per 10 seconds,
      an unbounded number of threads can call waited_for_10seconds() at the
      same time.  Also, since threads calling waited_for_10seconds() keep
      running an almost busy loop, the thread doing print_one_log() gets
      little CPU time.  Therefore, this situation can be simplified to
      
      ---------- Pseudo code start ----------
        retry:
          if (mutex_trylock(&oom_lock)) {
            while (atomic_read(&printk_pending_logs) > 0) {
              atomic_dec(&printk_pending_logs);
              print_one_log();
            }
            // Send SIGKILL here.
            mutex_unlock(&oom_lock);
          } else {
            atomic_inc(&printk_pending_logs);
          }
          goto retry;
      ---------- Pseudo code end ----------
      
      when printk() is called faster than print_one_log() can process a log.
      
      One possible mitigation would be to introduce a new lock in order to
      make sure that no other series of printk() (either oom_kill_process() or
      warn_alloc()) can append to the printk() buffer when one series of
      printk() (either oom_kill_process() or warn_alloc()) is already in
      progress.
      
      Such serialization would also help in obtaining kernel messages in
      readable form.
      
      ---------- Pseudo code start ----------
        retry:
          if (mutex_trylock(&oom_lock)) {
            mutex_lock(&oom_printk_lock);
            while (atomic_read(&printk_pending_logs) > 0) {
              atomic_dec(&printk_pending_logs);
              print_one_log();
            }
            // Send SIGKILL here.
            mutex_unlock(&oom_printk_lock);
            mutex_unlock(&oom_lock);
          } else {
            if (mutex_trylock(&oom_printk_lock)) {
              atomic_inc(&printk_pending_logs);
              mutex_unlock(&oom_printk_lock);
            }
          }
          goto retry;
      ---------- Pseudo code end ----------
      
      But this commit does not go in that direction, for we don't want to
      introduce a new lock dependency, and we are unlikely to be able to obtain
      useful information even if we serialized oom_kill_process() and
      warn_alloc().
      
      The synchronous approach is prone to unexpected results (e.g. too late
      [1], too frequent [2], overlooked [3]).  As far as I know, warn_alloc()
      never helped with providing information other than "something is going
      wrong".  I want to consider an asynchronous approach which can obtain
      information during stalls from possibly relevant threads (e.g. the owner
      of oom_lock and kswapd-like threads) and serve as a trigger for actions
      (e.g. turn on/off tracepoints, ask the libvirt daemon to take a memory
      dump of a stalling KVM guest for diagnostic purposes).
      
      This commit temporarily loses the ability to report e.g. an OOM lockup
      caused by being unable to invoke the OOM killer for a !__GFP_FS
      allocation request.  But an asynchronous approach will be able to detect
      such a situation and emit a warning.  Thus, let's remove warn_alloc().
      
      [1] https://bugzilla.kernel.org/show_bug.cgi?id=192981
      [2] http://lkml.kernel.org/r/CAM_iQpWuPVGc2ky8M-9yukECtS+zKjiDasNymX7rMcBjBFyM_A@mail.gmail.com
      [3] commit db73ee0d ("mm, vmscan: do not loop on too_many_isolated for ever")
      
      Link: http://lkml.kernel.org/r/1509017339-4802-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
      Reported-by: yuwang.yuwang <yuwang.yuwang@alibaba-inc.com>
      Reported-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      400e2249