Commits · c74f162e845f4aa636c735398676b16640d9541d · nexedi / linux

25 Feb, 2016 1 commit

Merge tag 'mvebu-arm64-4.6-1' of git://git.infradead.org/linux-mvebu into next/arm64 · c74f162e

Olof Johansson authored Feb 24, 2016

mvebu arm64 for 4.6 (part 1)

Non dt part of the Armada 3700 support:
- Kconfig update
- defconfig update
- documentation update (including MAINTAINERS:)

* tag 'mvebu-arm64-4.6-1' of git://git.infradead.org/linux-mvebu:
  arm64: defconfig: enable Armada 3700 related config
  Documentation: arm: update supported Marvell EBU processors
  MAINTAINERS: Extend dts entry for ARM64 mvebu files
  arm64: add mvebu architecture entry
  irqchip/armada-370-xp: Do not enable it by default when ARCH_MVEBU is selected
  ARM: mvebu: Use the ARMADA_370_XP_IRQ option
  irqchip/armada-370-xp: Allow allocation of multiple MSIs
  irqchip/armada-370-xp: Use shorter names for irq_chip
  irqchip/armada-370-xp: Use PCI_MSI_DOORBELL_START where appropriate
  irqchip/armada-370-xp: Use the generic MSI infrastructure
  irqchip/armada-370-xp: Add Kconfig option for the driver
Signed-off-by: Olof Johansson <olof@lixom.net>

c74f162e

24 Feb, 2016 1 commit

Merge tag 'v4.6-rockchip-soc64-1' of... · 8f4f2721

Olof Johansson authored Feb 24, 2016

Merge tag 'v4.6-rockchip-soc64-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into next/arm64

Enable the rockchip-specific timers on arm64 rockchip platforms.
The driver got reworked to not use arm32-specific dsb calls in
4.5-rc1, so now we can safely enable it.

* tag 'v4.6-rockchip-soc64-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  arm64: Enable the timer on Rockchip architecture
Signed-off-by: Olof Johansson <olof@lixom.net>

8f4f2721

17 Feb, 2016 4 commits

arm64: defconfig: enable Armada 3700 related config · e772ca05

Gregory CLEMENT authored Feb 02, 2016

This patch enables the configuration for the Armada 3700 family and for
the related driver it uses.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Reviewed-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>

e772ca05

Documentation: arm: update supported Marvell EBU processors · a71916b0

Gregory CLEMENT authored Feb 02, 2016

Now that we support Armada 37xx, let's add this family of SoC to the
Marvell documentation, and a reference to a link with more details about
those processors. As for Armda 39x, no datasheet is publicly
available at this time.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>

a71916b0

MAINTAINERS: Extend dts entry for ARM64 mvebu files · dcc3068a

Gregory CLEMENT authored Feb 02, 2016

Extend the mvebu entry to ARM64 device tree sources.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Reviewed-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>

dcc3068a

arm64: add mvebu architecture entry · b4f596b1

Gregory CLEMENT authored Feb 02, 2016

The Armada 3700 is an mvebu ARM64 SoC using one or two Cortex-A53 cores
depending of the variant.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Reviewed-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>

b4f596b1

16 Feb, 2016 7 commits

irqchip/armada-370-xp: Do not enable it by default when ARCH_MVEBU is selected · 63131b63

Gregory CLEMENT authored Feb 08, 2016

The irq-armada-370-xp driver can only be built for ARM 32 bits. The mvebu
family had grown with a new ARM64 SoC which will also select the
ARCH_MEVBU configuration. Since "ARM: mvebu: use the ARMADA_370_XP_IRQ
option", the ARM32 mvebu SoC directly select this new option. Selecting
it by default when ARCH_MEVBU is selected is no more needed.

This patch removes this dependency, thanks to this, a kernel for ARM64
mvebu SoC can be built without error due this driver.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Link: https://lkml.kernel.org/r/1454951660-13289-3-git-send-email-gregory.clement@free-electrons.comSigned-off-by: Jason Cooper <jason@lakedaemon.net>

63131b63

ARM: mvebu: Use the ARMADA_370_XP_IRQ option · cb49d86d

Thomas Petazzoni authored Feb 10, 2016

Now that there is a ARMADA_370_XP_IRQ option to enable the irqchip
driver for Armada 370, XP, 375, 38x and 39x, let's select this option
when needed. Note that this selection is currently not mandatory
because ARMADA_370_XP_IRQ is for now always enabled when ARCH_MVEBU=y,
but this is something that we will change in the future, and therefore
we should make the relevant platforms select ARMADA_370_XP_IRQ when
needed.

Due to this, selecting GENERIC_IRQ_CHIP is no longer needed.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Link: https://lkml.kernel.org/r/1455115621-22846-7-git-send-email-thomas.petazzoni@free-electrons.comSigned-off-by: Jason Cooper <jason@lakedaemon.net>

cb49d86d

irqchip/armada-370-xp: Allow allocation of multiple MSIs · a71b9412

Thomas Petazzoni authored Feb 10, 2016

Add support for allocating multiple MSIs at the same time, so that the
MSI_FLAG_MULTI_PCI_MSI flag can be added to the msi_domain_info
structure.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Link: https://lkml.kernel.org/r/1455115621-22846-6-git-send-email-thomas.petazzoni@free-electrons.comSigned-off-by: Jason Cooper <jason@lakedaemon.net>

a71b9412

irqchip/armada-370-xp: Use shorter names for irq_chip · f692a172

Thomas Petazzoni authored Feb 10, 2016

In order to make the output of /proc/interrupts, use shorter names for
the irq_chip registered by the irq-armada-370-xp driver. Using capital
letters also matches better what is done for the GIC driver, which
uses just "GIC" as the irq_chip->name.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Link: https://lkml.kernel.org/r/1455115621-22846-5-git-send-email-thomas.petazzoni@free-electrons.comSigned-off-by: Jason Cooper <jason@lakedaemon.net>

f692a172

irqchip/armada-370-xp: Use PCI_MSI_DOORBELL_START where appropriate · 0636bab6

Thomas Petazzoni authored Feb 10, 2016

As suggested by Gregory Clement, this commit adjusts the
irq-armada-370-xp driver to use the PCI_MSI_DOORBELL_START define in
the armada_370_xp_handle_msi_irq() function, rather than hardcoding
its value.
Suggested-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Link: https://lkml.kernel.org/r/1455115621-22846-4-git-send-email-thomas.petazzoni@free-electrons.comSigned-off-by: Jason Cooper <jason@lakedaemon.net>

0636bab6

irqchip/armada-370-xp: Use the generic MSI infrastructure · fcc392d5

Thomas Petazzoni authored Feb 10, 2016

This commit moves the irq-armada-370-xp driver from using the
PCI-specific MSI infrastructure to the generic MSI infrastructure, to
which drivers are progressively converted.

In this hardware, the MSI controller is directly bundled inside the
interrupt controller, so we have a single Device Tree node to which
multiple IRQ domaines are attached: the wired interrupt domain and the
MSI interrupt domain. In order to ensure that they can be
differentiated, we have to force the bus_token of the wired interrupt
domain to be DOMAIN_BUS_WIRED. The MSI domain bus_token is
automatically set to the appropriate value by
pci_msi_create_irq_domain().
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Suggested-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Link: https://lkml.kernel.org/r/1455115621-22846-3-git-send-email-thomas.petazzoni@free-electrons.comSigned-off-by: Jason Cooper <jason@lakedaemon.net>

fcc392d5

irqchip/armada-370-xp: Add Kconfig option for the driver · fed6d336

Thomas Petazzoni authored Feb 10, 2016

Instead of building the irq-armada-370-xp driver directly when
CONFIG_ARCH_MVEBU is enabled, this commit introduces an intermediate
CONFIG_ARMADA_370_XP_IRQ hidden Kconfig option.

This allows this option to select other interrupt-related Kconfig
options (which will be needed in follow-up commits) rather than having
such selects done from arch/arm/mach-<foo>/.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Link: https://lkml.kernel.org/r/1455115621-22846-2-git-send-email-thomas.petazzoni@free-electrons.comSigned-off-by: Jason Cooper <jason@lakedaemon.net>

fed6d336

08 Feb, 2016 1 commit

arm64: defconfig: add spmi and usb related configs · efdda175

Srinivas Kandagatla authored Feb 06, 2016

This patch adds kconfigs for spmi bus support, pinctrl drivers and usb
related to get USB working on Qualcomm DB410C board.
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>

efdda175

07 Feb, 2016 5 commits

Linux 4.5-rc3 · 388f7b1d
Linus Torvalds authored Feb 07, 2016

388f7b1d

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · c17dfb01

Linus Torvalds authored Feb 07, 2016

Pull ARM SoC fixes from Olof Johansson:
 "The first real batch of fixes for this release cycle, so there are a
  few more than usual.

  Most of these are fixes and tweaks to board support (DT bugfixes,
  etc).  I've also picked up a couple of small cleanups that seemed
  innocent enough that there was little reason to wait (const/
  __initconst and Kconfig deps).

  Quite a bit of the changes on OMAP were due to fixes to no longer
  write to rodata from assembly when ARM_KERNMEM_PERMS was enabled, but
  there were also other fixes.

  Kirkwood had a bunch of gpio fixes for some boards.  OMAP had RTC
  fixes on OMAP5, and Nomadik had changes to MMC parameters in DT.

  All in all, mostly the usual mix of various fixes"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (46 commits)
  ARM: multi_v7_defconfig: enable DW_WATCHDOG
  ARM: nomadik: fix up SD/MMC DT settings
  ARM64: tegra: Add chosen node for tegra132 norrin
  ARM: realview: use "depends on" instead of "if" after prompt
  ARM: tango: use "depends on" instead of "if" after prompt
  ARM: tango: use const and __initconst for smp_operations
  ARM: realview: use const and __initconst for smp_operations
  bus: uniphier-system-bus: revive tristate prompt
  arm64: dts: Add missing DMA Abort interrupt to Juno
  bus: vexpress-config: Add missing of_node_put
  ARM: dts: am57xx: sbc-am57x: correct Eth PHY settings
  ARM: dts: am57xx: cl-som-am57x: fix CPSW EMAC pinmux
  ARM: dts: am57xx: sbc-am57x: fix UART3 pinmux
  ARM: dts: am57xx: cl-som-am57x: update SPI Flash frequency
  ARM: dts: am57xx: cl-som-am57x: set HOST mode for USB2
  ARM: dts: am57xx: sbc-am57x: fix SB-SOM EEPROM I2C address
  ARM: dts: LogicPD Torpedo: Revert Duplicative Entries
  ARM: dts: am437x: pixcir_tangoc: use correct flags for irq types
  ARM: dts: am4372: fix irq type for arm twd and global timer
  ARM: dts: at91: sama5d4 xplained: fix phy0 IRQ type
  ...

c17dfb01

Merge branch 'mailbox-devel' of git://git.linaro.org/landing-teams/working/fujitsu/integration · 63fee123

Linus Torvalds authored Feb 07, 2016

Pull mailbox fixes from Jassi Brar:

 - fix getting element from the pcc-channels array by simply indexing
   into it

 - prevent building mailbox-test driver for archs that don't have IOMEM

* 'mailbox-devel' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
  mailbox: Fix dependencies for !HAS_IOMEM archs
  mailbox: pcc: fix channel calculation in get_pcc_channel()

63fee123

Merge tag 'usb-4.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 46df55ce

Linus Torvalds authored Feb 06, 2016

Pull USB fixes from Greg KH:
 "Here are some USB fixes for 4.5-rc3.

  The usual, xhci fixes for reported issues, combined with some small
  gadget driver fixes, and a MAINTAINERS file update.  All have been in
  linux-next with no reported issues"

* tag 'usb-4.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  xhci: harden xhci_find_next_ext_cap against device removal
  xhci: Fix list corruption in urb dequeue at host removal
  usb: host: xhci-plat: fix NULL pointer in probe for device tree case
  usb: xhci-mtk: fix AHB bus hang up caused by roothubs polling
  usb: xhci-mtk: fix bpkts value of LS/HS periodic eps not behind TT
  usb: xhci: apply XHCI_PME_STUCK_QUIRK to Intel Broxton-M platforms
  usb: xhci: set SSIC port unused only if xhci_suspend succeeds
  usb: xhci: add a quirk bit for ssic port unused
  usb: xhci: handle both SSIC ports in PME stuck quirk
  usb: dwc3: gadget: set the OTG flag in dwc3 gadget driver.
  Revert "xhci: don't finish a TD if we get a short-transfer event mid TD"
  MAINTAINERS: fix my email address
  usb: dwc2: Fix probe problem on bcm2835
  Revert "usb: dwc2: Move reset into dwc2_get_hwparams()"
  usb: musb: ux500: Fix NULL pointer dereference at system PM
  usb: phy: mxs: declare variable with initialized value
  usb: phy: msm: fix error handling in probe.

46df55ce

Merge tag 'staging-4.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · dacd53c8

Linus Torvalds authored Feb 06, 2016

Pull staging and IIO driver fixes from Greg KH:
 "Here are some IIO and staging driver fixes for 4.5-rc3.

  All of them, except one, are for IIO drivers, and one is for a speakup
  driver fix caused by some earlier patches, to resolve a reported build
  failure"

* tag 'staging-4.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  Staging: speakup: Fix allyesconfig build on mn10300
  iio: dht11: Use boottime
  iio: ade7753: avoid uninitialized data
  iio: pressure: mpl115: fix temperature offset sign
  iio: imu: Fix dependencies for !HAS_IOMEM archs
  staging: iio: Fix dependencies for !HAS_IOMEM archs
  iio: adc: Fix dependencies for !HAS_IOMEM archs
  iio: inkern: fix a NULL dereference on error
  iio:adc:ti_am335x_adc Fix buffered mode by identifying as software buffer.
  iio: light: acpi-als: Report data as processed
  iio: dac: mcp4725: set iio name property in sysfs
  iio: add HAS_IOMEM dependency to VF610_ADC
  iio: add IIO_TRIGGER dependency to STK8BA50
  iio: proximity: lidar: correct return value
  iio-light: Use a signed return type for ltr501_match_samp_freq()

dacd53c8

06 Feb, 2016 21 commits

Merge branch 'akpm' (patches from Andrew) · 5af9c2e1

Linus Torvalds authored Feb 05, 2016

Merge fixes from Andrew Morton:
 "22 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
  epoll: restrict EPOLLEXCLUSIVE to POLLIN and POLLOUT
  radix-tree: fix oops after radix_tree_iter_retry
  MAINTAINERS: trim the file triggers for ABI/API
  dax: dirty inode only if required
  thp: make deferred_split_scan() work again
  mm: replace vma_lock_anon_vma with anon_vma_lock_read/write
  ocfs2/dlm: clear refmap bit of recovery lock while doing local recovery cleanup
  um: asm/page.h: remove the pte_high member from struct pte_t
  mm, hugetlb: don't require CMA for runtime gigantic pages
  mm/hugetlb: fix gigantic page initialization/allocation
  mm: downgrade VM_BUG in isolate_lru_page() to warning
  mempolicy: do not try to queue pages from !vma_migratable()
  mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress
  vmstat: make vmstat_update deferrable
  mm, vmstat: make quiet_vmstat lighter
  mm/Kconfig: correct description of DEFERRED_STRUCT_PAGE_INIT
  memblock: don't mark memblock_phys_mem_size() as __init
  dump_stack: avoid potential deadlocks
  mm: validate_mm browse_rb SMP race condition
  m32r: fix build failure due to SMP and MMU
  ...

5af9c2e1

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · 5d6a6a75

Linus Torvalds authored Feb 05, 2016

Pull Ceph fixes from Sage Weil:
 "We have a few wire protocol compatibility fixes, ports of a few recent
  CRUSH mapping changes, and a couple error path fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: MOSDOpReply v7 encoding
  libceph: advertise support for TUNABLES5
  crush: decode and initialize chooseleaf_stable
  crush: add chooseleaf_stable tunable
  crush: ensure take bucket value is valid
  crush: ensure bucket id is valid before indexing buckets array
  ceph: fix snap context leak in error path
  ceph: checking for IS_ERR instead of NULL

5d6a6a75

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 9b108828

Linus Torvalds authored Feb 05, 2016

Pull drm fixes from Dave Airlie:
 "Fixes all over the place:

   - amdkfd: two static checker fixes
   - mst: a bunch of static checker and spec/hw interaction fixes
   - amdgpu: fix Iceland hw properly, and some fiji bugs, along with
     some write-combining fixes.
   - exynos: some regression fixes
   - adv7511: fix some EDID reading issues"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (38 commits)
  drm/dp/mst: deallocate payload on port destruction
  drm/dp/mst: Reverse order of MST enable and clearing VC payload table.
  drm/dp/mst: move GUID storage from mgr, port to only mst branch
  drm/dp/mst: change MST detection scheme
  drm/dp/mst: Calculate MST PBN with 31.32 fixed point
  drm: Add drm_fixp_from_fraction and drm_fixp2int_ceil
  drm/mst: Add range check for max_payloads during init
  drm/mst: Don't ignore the MST PBN self-test result
  drm: fix missing reference counting decrease
  drm/amdgpu: disable uvd and vce clockgating on Fiji
  drm/amdgpu: remove exp hardware support from iceland
  drm/amdgpu: load MEC ucode manually on iceland
  drm/amdgpu: don't load MEC2 on topaz
  drm/amdgpu: drop topaz support from gmc8 module
  drm/amdgpu: pull topaz gmc bits into gmc_v7
  drm/amdgpu: The VI specific EXE bit should only apply to GMC v8.0 above
  drm/amdgpu: iceland use CI based MC IP
  drm/amdgpu: move gmc7 support out of CIK dependency
  drm/amdgpu/gfx7: enable cp inst/reg error interrupts
  drm/amdgpu/gfx8: enable cp inst/reg error interrupts
  ...

9b108828

Merge tag 'pm+acpi-4.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 22f60701

Linus Torvalds authored Feb 05, 2016

Pull power management and ACPI fixes from Rafael Wysocki:
 "These are: a fix for a recently introduced false-positive warnings
  about PM domain pointers being changed inappropriately (harmless but
  annoying), an MCH size workaround quirk for one more platform, a
  compiler warning fix (generic power domains framework), an ACPI LPSS
  (Intel SoCs) driver fixup and a cleanup of the ACPI CPPC core code.

  Specifics:

   - PM core fix to avoid false-positive warnings generated when the
     pm_domain field is cleared for a device that appears to be bound to
     a driver (Rafael Wysocki).

   - New MCH size workaround quirk for Intel Haswell-ULT (Josh Boyer).

   - Fix for an "unused function" compiler warning in the generic power
     domains framework (Ulf Hansson).

   - Fixup for the ACPI driver for Intel SoCs (acpi-lpss) to set the PM
     domain pointer of a device properly in one place that was
     overlooked by a recent PM core update (Andy Shevchenko).

   - Removal of a redundant function declaration in the ACPI CPPC core
     code (Timur Tabi)"

* tag 'pm+acpi-4.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: Avoid false-positive warnings in dev_pm_domain_set()
  PM / Domains: Silence compiler warning for an unused function
  ACPI / CPPC: remove redundant mbox_send_message() declaration
  ACPI / LPSS: set PM domain via helper setter
  PNP: Add Haswell-ULT to Intel MCH size workaround

22f60701

epoll: restrict EPOLLEXCLUSIVE to POLLIN and POLLOUT · b6a515c8

Jason Baron authored Feb 05, 2016

In the current implementation of the EPOLLEXCLUSIVE flag (added for
4.5-rc1), if epoll waiters create different POLL* sets and register them
as exclusive against the same target fd, the current implementation will
stop waking any further waiters once it finds the first idle waiter.
This means that waiters could miss wakeups in certain cases.

For example, when we wake up a pipe for reading we do:
wake_up_interruptible_sync_poll(&pipe->wait, POLLIN | POLLRDNORM); So if
one epoll set or epfd is added to pipe p with POLLIN and a second set
epfd2 is added to pipe p with POLLRDNORM, only epfd may receive the
wakeup since the current implementation will stop after it finds any
intersection of events with a waiter that is blocked in epoll_wait().

We could potentially address this by requiring all epoll waiters that
are added to p be required to pass the same set of POLL* events.  IE the
first EPOLL_CTL_ADD that passes EPOLLEXCLUSIVE establishes the set POLL*
flags to be used by any other epfds that are added as EPOLLEXCLUSIVE.
However, I think it might be somewhat confusing interface as we would
have to reference count the number of users for that set, and so
userspace would have to keep track of that count, or we would need a
more involved interface.  It also adds some shared state that we'd have
store somewhere.  I don't think anybody will want to bloat
__wait_queue_head for this.

I think what we could do instead, is to simply restrict EPOLLEXCLUSIVE
such that it can only be specified with EPOLLIN and/or EPOLLOUT.  So
that way if the wakeup includes 'POLLIN' and not 'POLLOUT', we can stop
once we hit the first idle waiter that specifies the EPOLLIN bit, since
any remaining waiters that only have 'POLLOUT' set wouldn't need to be
woken.  Likewise, we can do the same thing if 'POLLOUT' is in the wakeup
bit set and not 'POLLIN'.  If both 'POLLOUT' and 'POLLIN' are set in the
wake bit set (there is at least one example of this I saw in fs/pipe.c),
then we just wake the entire exclusive list.  Having both 'POLLOUT' and
'POLLIN' both set should not be on any performance critical path, so I
think that's ok (in fs/pipe.c its in pipe_release()).  We also continue
to include EPOLLERR and EPOLLHUP by default in any exclusive set.  Thus,
the user can specify EPOLLERR and/or EPOLLHUP but is not required to do
so.

Since epoll waiters may be interested in other events as well besides
EPOLLIN, EPOLLOUT, EPOLLERR and EPOLLHUP, these can still be added by
doing a 'dup' call on the target fd and adding that as one normally
would with EPOLL_CTL_ADD.  Since I think that the POLLIN and POLLOUT
events are what we are interest in balancing, I think that the 'dup'
thing could perhaps be added to only one of the waiter threads.
However, I think that EPOLLIN, EPOLLOUT, EPOLLERR and EPOLLHUP should be
sufficient for the majority of use-cases.

Since EPOLLEXCLUSIVE is intended to be used with a target fd shared
among multiple epfds, where between 1 and n of the epfds may receive an
event, it does not satisfy the semantics of EPOLLONESHOT where only 1
epfd would get an event.  Thus, it is not allowed to be specified in
conjunction with EPOLLEXCLUSIVE.

EPOLL_CTL_MOD is also not allowed if the fd was previously added as
EPOLLEXCLUSIVE.  It seems with the limited number of flags to not be as
interesting, but this could be relaxed at some further point.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Tested-by: Madars Vitolins <m@silodev.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Eric Wong <normalperson@yhbt.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b6a515c8

radix-tree: fix oops after radix_tree_iter_retry · 73204282

Konstantin Khlebnikov authored Feb 05, 2016

Helper radix_tree_iter_retry() resets next_index to the current index.
In following radix_tree_next_slot current chunk size becomes zero.  This
isn't checked and it tries to dereference null pointer in slot.

Tagged iterator is fine because retry happens only at slot 0 where tag
bitmask in iter->tags is filled with single bit.

Fixes: 46437f9a ("radix-tree: fix race in gang lookup")
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

73204282

MAINTAINERS: trim the file triggers for ABI/API · b14fd334

Michael Kerrisk (man-pages) authored Feb 05, 2016

Commit ea8f8fc8 ("MAINTAINERS: add linux-api for review of API/ABI
changes") added file triggers for various paths that likely indicated
API/ABI changes.  However, catching all changes in Documentation/ABI/
and include/uapi/ produces a large volume of mail to linux-api, rather
than only API/ABI changes.  Drop those two entries, but leave
include/linux/syscalls.h and kernel/sys_ni.c to catch syscall-related
changes.

[josh@joshtriplett.org: redid changelog]
Signed-off-by: Michael Kerrisk <mtk.man-pages@gmail.com>
Acked-by: Shuah khan <shuahkh@osg.samsung.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b14fd334

dax: dirty inode only if required · d2b2a28e

Dmitry Monakhov authored Feb 05, 2016

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d2b2a28e

thp: make deferred_split_scan() work again · ae026204

Kirill A. Shutemov authored Feb 05, 2016

We need to iterate over split_queue, not local empty list to get
anything split from the shrinker.

Fixes: e3ae1953 ("thp: limit number of object to scan on deferred_split_scan()")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ae026204

mm: replace vma_lock_anon_vma with anon_vma_lock_read/write · 12352d3c

Konstantin Khlebnikov authored Feb 05, 2016

Sequence vma_lock_anon_vma() - vma_unlock_anon_vma() isn't safe if
anon_vma appeared between lock and unlock. We have to check anon_vma
first or call anon_vma_prepare() to be sure that it's here. There are
only few users of these legacy helpers. Let's get rid of them.

This patch fixes anon_vma lock imbalance in validate_mm(). Write lock
isn't required here, read lock is enough.

And reorders expand_downwards/expand_upwards: security_mmap_addr() and
wrapping-around check don't have to be under anon vma lock.

Link: https://lkml.kernel.org/r/CACT4Y+Y908EjM2z=706dv4rV6dWtxTLK9nFg9_7DhRMLppBo2g@mail.gmail.comSigned-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

12352d3c

ocfs2/dlm: clear refmap bit of recovery lock while doing local recovery cleanup · c95a5180

xuejiufei authored Feb 05, 2016

When recovery master down, dlm_do_local_recovery_cleanup() only remove
the $RECOVERY lock owned by dead node, but do not clear the refmap bit.
Which will make umount thread falling in dead loop migrating $RECOVERY
to the dead node.
Signed-off-by: xuejiufei <xuejiufei@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c95a5180

um: asm/page.h: remove the pte_high member from struct pte_t · 012a4163

Nicolai Stange authored Feb 05, 2016

Commit 16da3068 ("um: kill pfn_t") introduced a compile warning for
defconfig (SUBARCH=i386):

  arch/um/kernel/skas/mmu.c:38:206:
      warning: right shift count >= width of type [-Wshift-count-overflow]

Aforementioned patch changes the definition of the phys_to_pfn() macro
from

  ((pfn_t) ((p) >> PAGE_SHIFT))

to

  ((p) >> PAGE_SHIFT)

This effectively changes the phys_to_pfn() expansion's type from
unsigned long long to unsigned long.

Through the callchain init_stub_pte() => mk_pte(), the expansion of
phys_to_pfn() is (indirectly) fed into the 'phys' argument of the
pte_set_val(pte, phys, prot) macro, eventually leading to

  (pte).pte_high = (phys) >> 32;

This results in the warning from above.

Since UML only deals with 32 bit addresses, the upper 32 bits from
'phys' used to be always zero anyway.  Also, all page protection flags
defined by UML don't use any bits beyond bit 9.  Since the contents of a
PTE are defined within architecture scope only, the ->pte_high member
can be safely removed.

Remove the ->pte_high member from struct pte_t.
Rename ->pte_low to ->pte.
Adapt the pte helper macros in arch/um/include/asm/page.h.

Noteworthy is the pte_copy() macro where a smp_wmb() gets dropped.  This
write barrier doesn't seem to be paired with any read barrier though and
thus, was useless anyway.

Fixes: 16da3068 ("um: kill pfn_t")
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

012a4163

mm, hugetlb: don't require CMA for runtime gigantic pages · 080fe206

Vlastimil Babka authored Feb 05, 2016

Commit 944d9fec ("hugetlb: add support for gigantic page allocation
at runtime") has added the runtime gigantic page allocation via
alloc_contig_range(), making this support available only when CONFIG_CMA
is enabled.  Because it doesn't depend on MIGRATE_CMA pageblocks and the
associated infrastructure, it is possible with few simple adjustments to
require only CONFIG_MEMORY_ISOLATION instead of full CONFIG_CMA.

After this patch, alloc_contig_range() and related functions are
available and used for gigantic pages with just CONFIG_MEMORY_ISOLATION
enabled.  Note CONFIG_CMA selects CONFIG_MEMORY_ISOLATION.  This allows
supporting runtime gigantic pages without the CMA-specific checks in
page allocator fastpaths.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

080fe206

mm/hugetlb: fix gigantic page initialization/allocation · b4330afb

Mike Kravetz authored Feb 05, 2016

Attempting to preallocate 1G gigantic huge pages at boot time with
"hugepagesz=1G hugepages=1" on the kernel command line will prevent
booting with the following:

  kernel BUG at mm/hugetlb.c:1218!

When mapcount accounting was reworked, the setting of
compound_mapcount_ptr in prep_compound_gigantic_page was overlooked.  As
a result, the validation of mapcount in free_huge_page fails.

The "BUG_ON" checks in free_huge_page were also changed to
"VM_BUG_ON_PAGE" to assist with debugging.

Fixes: 53f9263b ("mm: rework mapcount accounting to enable 4k mapping of THPs")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b4330afb

mm: downgrade VM_BUG in isolate_lru_page() to warning · cf2a82ee

Kirill A. Shutemov authored Feb 05, 2016

Calling isolate_lru_page() is wrong and shouldn't happen, but it not
nessesary fatal: the page just will not be isolated if it's not on LRU.

Let's downgrade the VM_BUG_ON_PAGE() to WARN_RATELIMIT().
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cf2a82ee

mempolicy: do not try to queue pages from !vma_migratable() · 77bf45e7

Kirill A. Shutemov authored Feb 05, 2016

Maybe I miss some point, but I don't see a reason why we try to queue
pages from non migratable VMAs.

This testcase steps on VM_BUG_ON_PAGE() in isolate_lru_page():

    #include <fcntl.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <numaif.h>

    #define SIZE 0x2000

    int foo;

    int main()
    {
        int fd;
        char *p;
        unsigned long mask = 2;

        fd = open("/dev/sg0", O_RDWR);
        p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        /* Faultin pages */
        foo = p[0] + p[0x1000];
        mbind(p, SIZE, MPOL_BIND, &mask, 4, MPOL_MF_MOVE | MPOL_MF_STRICT);
        return 0;
    }

The only case when we can queue pages from such VMA is MPOL_MF_STRICT
plus MPOL_MF_MOVE or MPOL_MF_MOVE_ALL for VMA which has pages on LRU,
but gfp mask is not sutable for migaration (see mapping_gfp_mask() check
in vma_migratable()).  That's looks like a bug to me.

Let's filter out non-migratable vma at start of queue_pages_test_walk()
and go to queue_pages_pte_range() only if MPOL_MF_MOVE or
MPOL_MF_MOVE_ALL flag is set.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

77bf45e7

mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress · 564e81a5

Tetsuo Handa authored Feb 05, 2016

Jan Stancek has reported that system occasionally hanging after "oom01"
testcase from LTP triggers OOM.  Guessing from a result that there is a
kworker thread doing memory allocation and the values between "Node 0
Normal free:" and "Node 0 Normal:" differs when hanging, vmstat is not
up-to-date for some reason.

According to commit 373ccbe5 ("mm, vmstat: allow WQ concurrency to
discover memory reclaim doesn't make any progress"), it meant to force
the kworker thread to take a short sleep, but it by error used
schedule_timeout(1).  We missed that schedule_timeout() in state
TASK_RUNNING doesn't do anything.

Fix it by using schedule_timeout_uninterruptible(1) which forces the
kworker thread to take a short sleep in order to make sure that vmstat
is up-to-date.

Fixes: 373ccbe5 ("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Jan Stancek <jstancek@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Cristopher Lameter <clameter@sgi.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Arkadiusz Miskiewicz <arekm@maven.pl>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

564e81a5

vmstat: make vmstat_update deferrable · ccde8bd4

Michal Hocko authored Feb 05, 2016

Commit 0eb77e98 ("vmstat: make vmstat_updater deferrable again and
shut down on idle") made vmstat_shepherd deferrable.  vmstat_update
itself is still useing standard timer which might interrupt idle task.
This is possible because "mm, vmstat: make quiet_vmstat lighter" removed
cancel_delayed_work from the quiet_vmstat.

Change vmstat_work to use DEFERRABLE_WORK to prevent from pointless
wakeups from the idle context.
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ccde8bd4

mm, vmstat: make quiet_vmstat lighter · f01f17d3

Michal Hocko authored Feb 05, 2016

Mike has reported a considerable overhead of refresh_cpu_vm_stats from
the idle entry during pipe test:

    12.89%  [kernel]       [k] refresh_cpu_vm_stats.isra.12
     4.75%  [kernel]       [k] __schedule
     4.70%  [kernel]       [k] mutex_unlock
     3.14%  [kernel]       [k] __switch_to

This is caused by commit 0eb77e98 ("vmstat: make vmstat_updater
deferrable again and shut down on idle") which has placed quiet_vmstat
into cpu_idle_loop.  The main reason here seems to be that the idle
entry has to get over all zones and perform atomic operations for each
vmstat entry even though there might be no per cpu diffs.  This is a
pointless overhead for _each_ idle entry.

Make sure that quiet_vmstat is as light as possible.

First of all it doesn't make any sense to do any local sync if the
current cpu is already set in oncpu_stat_off because vmstat_update puts
itself there only if there is nothing to do.

Then we can check need_update which should be a cheap way to check for
potential per-cpu diffs and only then do refresh_cpu_vm_stats.

The original patch also did cancel_delayed_work which we are not doing
here.  There are two reasons for that.  Firstly cancel_delayed_work from
idle context will blow up on RT kernels (reported by Mike):

  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rt3 #7
  Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
  Call Trace:
    dump_stack+0x49/0x67
    ___might_sleep+0xf5/0x180
    rt_spin_lock+0x20/0x50
    try_to_grab_pending+0x69/0x240
    cancel_delayed_work+0x26/0xe0
    quiet_vmstat+0x75/0xa0
    cpu_idle_loop+0x38/0x3e0
    cpu_startup_entry+0x13/0x20
    start_secondary+0x114/0x140

And secondly, even on !RT kernels it might add some non trivial overhead
which is not necessary.  Even if the vmstat worker wakes up and preempts
idle then it will be most likely a single shot noop because the stats
were already synced and so it would end up on the oncpu_stat_off anyway.
We just need to teach both vmstat_shepherd and vmstat_update to stop
scheduling the worker if there is nothing to do.

[mgalbraith@suse.de: cancel pending work of the cpu_stat_off CPU]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f01f17d3

mm/Kconfig: correct description of DEFERRED_STRUCT_PAGE_INIT · 1ce22103

Vlastimil Babka authored Feb 05, 2016

The description mentions kswapd threads, while the deferred struct page
initialization is actually done by one-off "pgdatinitX" threads.

Fix the description so that potentially users are not confused about
pgdatinit threads using CPU after boot instead of kswapd.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1ce22103

memblock: don't mark memblock_phys_mem_size() as __init · 1f1ffb8a

David Gibson authored Feb 05, 2016

At the moment memblock_phys_mem_size() is marked as __init, and so is
discarded after boot. This is different from most of the memblock
functions which are marked __init_memblock, and are only discarded after
boot if memory hotplug is not configured.

To allow for upcoming code which will need memblock_phys_mem_size() in
the hotplug path, change it from __init to __init_memblock.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1f1ffb8a