Commits · 0b0f9dc52ed0333fa52a9314b53d0b2b248b821d · Kirill Smelkov / linux

10 Apr, 2017 4 commits

Revert "virtio_pci: use shared interrupts for virtqueues" · 0b0f9dc5

Michael S. Tsirkin authored Apr 04, 2017

This reverts commit 07ec5148.

Conflicts:
	drivers/virtio/virtio_pci_common.c

Unfortunately the idea does not work with threadirqs
as more than 32 queues can then map to a single interrupts.

Further, the cleanup seems to be one of the changes that broke
hybernation for some users. We are still not sure why
but revert helps.

This reverts the cleanup changes but keeps the affinity support.
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

0b0f9dc5

Revert "virtio_pci: don't duplicate the msix_enable flag in struct pci_dev" · 2008c154

Michael S. Tsirkin authored Apr 04, 2017

This reverts commit 53a020c6.

The cleanup seems to be one of the changes that broke
hybernation for some users. We are still not sure why
but revert helps.
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

2008c154

Revert "virtio_pci: simplify MSI-X setup" · bf951b10

Michael S. Tsirkin authored Apr 04, 2017

This reverts commit 52a61516.

Conflicts:
	drivers/virtio/virtio_pci_common.c

The cleanup seems to be one of the changes that broke
hybernation for some users. We are still not sure why
but revert helps.

This reverts the cleanup changes but keeps the affinity support.
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

bf951b10

Revert "virtio_pci: fix out of bound access for msix_names" · 8f10d014

Michael S. Tsirkin authored Apr 04, 2017

This reverts commit de85ec8b.

Follow-up patches will revert 07ec5148 ("virtio_pci: use shared
interrupts for virtqueues") that triggered the problem so no need for
this one anymore.
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

8f10d014

07 Apr, 2017 5 commits

MAINTAINERS: fix virtio file pattern · 404a5c39

Cornelia Huck authored Mar 30, 2017

The pattern did not catch include/linux/virtio.h.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

404a5c39

virtio_console: fix uninitialized variable use · 2055997f

Michael S. Tsirkin authored Mar 29, 2017

We try to disable callbacks on c_ivq even without multiport
even though that vq is not initialized in this configuration.

Fixes: c743d09d ("virtio: console: Disable callbacks for virtqueues at start of S4 freeze")
Suggested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

2055997f

virtio_net: clear MTU when out of range · fe36cbe0

Michael S. Tsirkin authored Mar 29, 2017

virtio attempts to clear the MTU feature bit if the value is out of the
supported range, but this has no real effect since FEATURES_OK has
already been set.

Fix this up by checking the MTU in the new validate callback.

Fixes: 14de9d11 ("virtio-net: Add initial MTU advice feature")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

fe36cbe0

virtio: allow drivers to validate features · 404123c2

Michael S. Tsirkin authored Mar 29, 2017

Some drivers can't support all features in all configurations. At the
moment we blindly set FEATURES_OK and later FAILED. Support this better
by adding a callback drivers can use to do some early checks.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

404123c2

virtio_net: enable big packets for large MTU values · 2e123b44

Michael S. Tsirkin authored Mar 08, 2017

If one enables e.g. jumbo frames without mergeable
buffers, packets won't fit in 1500 byte buffers
we use. Switch to big packet mode instead.
TODO: make sizing more exact, possibly extend small
packet mode to use larger pages.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

2e123b44

03 Apr, 2017 1 commit
- Linux 4.11-rc5 · a71c9a1c
  Linus Torvalds authored Apr 02, 2017
  
  a71c9a1c
02 Apr, 2017 10 commits

Merge tag 'dmaengine-fix-4.11-rc5' of git://git.infradead.org/users/vkoul/slave-dma · f49237bf

Linus Torvalds authored Apr 02, 2017

Pull dmaengine fixes from Vinod Koul:
 "A couple of minor fixes for 4.11:

   - array bound fix for __get_unmap_pool()

   - cyclic period splitting for bcm2835"

* tag 'dmaengine-fix-4.11-rc5' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: Fix array index out of bounds warning in __get_unmap_pool()
  dmaengine: bcm2835: Fix cyclic DMA period splitting

f49237bf

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 496dcc50

Linus Torvalds authored Apr 02, 2017

Pull x86 fixes from Thomas Gleixner:
 "This update provides:

   - prevent KASLR from randomizing EFI regions

   - restrict the usage of -maccumulate-outgoing-args and document when
     and why it is required.

   - make the Global Physical Address calculation for UV4 systems work
     correctly.

   - address a copy->paste->forgot-edit problem in the MCE exception
     table entries.

   - assign a name to AMD MCA bank 3, so the sysfs file registration
     works.

   - add a missing include in the boot code"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Include missing header file
  x86/mce/AMD: Give a name to MCA bank 3 when accessed with legacy MSRs
  x86/build: Mostly disable '-maccumulate-outgoing-args'
  x86/mm/KASLR: Exclude EFI region from KASLR VA space randomization
  x86/mce: Fix copy/paste error in exception table entries
  x86/platform/uv: Fix calculation of Global Physical Address

496dcc50

Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 128c434a

Linus Torvalds authored Apr 02, 2017

Pull scheduler fixes from Thomas Gleixner:
 "This update provides:

   - make the scheduler clock switch to unstable mode smooth so the
     timestamps stay at microseconds granularity instead of switching to
     tick granularity.

   - unbreak perf test tsc by taking the new offset into account which
     was added in order to proveide better sched clock continuity

   - switching sched clock to unstable mode runs all clock related
     computations which affect the sched clock output itself from a work
     queue. In case of preemption sched clock uses half updated data and
     provides wrong timestamps. Keep the math in the protected context
     and delegate only the static key switch to workqueue context.

   - remove a duplicate header include"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/headers: Remove duplicate #include <linux/sched/debug.h> line
  sched/clock: Fix broken stable to unstable transfer
  sched/clock, x86/perf: Fix "perf test tsc"
  sched/clock: Fix clear_sched_clock_stable() preempt wobbly

128c434a

Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0a89b5eb

Linus Torvalds authored Apr 02, 2017

Pull EFI fix from Thomas Gleixner:
 "Downgrade the missing ESRT header printk to warning level and remove a
  useless error printk which just generates noise for no value"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/esrt: Cleanup bad memory map log messages

0a89b5eb

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4a6808f3

Linus Torvalds authored Apr 02, 2017

Pull timer fixes from Thomas Gleixner:
 "Two small fixes for the new CLKEVT_OF infrastructure"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  vmlinux.lds: Add __clkevt_of_table to kernel
  clockevents: Fix syntax error in clkevt-of macro

4a6808f3

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 907977b2

Linus Torvalds authored Apr 02, 2017

Pull irq fixes from Thomas Gleixner:
 "Two small fixlets:

   - select a required Kconfig to make the MVEBU driver compile

   - add the missing MIPS local GIC interrupts which prevent drivers to
     probe successfully"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/mips-gic: Fix Local compare interrupt
  irqchip/mvebu-odmi: Select GENERIC_MSI_IRQ_DOMAIN

907977b2

Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ada63c61

Linus Torvalds authored Apr 02, 2017

Pull core fix from Thomas Gleixner:
 "Prevent leaking kernel memory via /proc/$pid/syscall when the queried
  task is not in a syscall"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lib/syscall: Clear return values when no stack

ada63c61

Merge branch 'parisc-4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 346ce1d7

Linus Torvalds authored Apr 01, 2017

Pull parisc fixes from Helge Deller:
"Al Viro reported that - in case of read faults - our copy_from_user()
implementation may claim to have copied more bytes than it actually
did. In order to fix this bug and because of the way how gcc optimizes
register usage for inline assembly in C code, we had to replace our
pa_memcpy() function with a pure assembler implementation.

While fixing the memcpy bug we noticed some other issues with our
get_user() and put_user() functions, e.g. nested faults may return
wrong data. This is now fixed by a common fixup handler for
get_user/put_user in the exception handler which additionally makes
generated code smaller and faster.

The third patch is a trivial one-line fix for a patch which went in
during 4.11-rc and which avoids stalled CPU warnings after power
shutdown (for parisc machines which can't plug power off themselves).

Due to the rewrite of pa_memcpy() into assembly this patch got bigger
than what I wanted to have sent at this stage.

Those patches have been running in production during the last few days
on our debian build servers without any further issues"

* 'parisc-4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Avoid stalled CPU warnings after system shutdown
parisc: Clean up fixup routines for get_user()/put_user()
parisc: Fix access fault handling in pa_memcpy()

346ce1d7

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 7d34ddbe

Linus Torvalds authored Apr 01, 2017

Pull SCSI fixes from James Bottomley:
 "Thirteen small fixes: The hopefully final effort to get the lpfc nvme
  kconfig problems sorted, there's one important sg fix (user can induce
  read after end of buffer) and one minor enhancement (adding an extra
  PCI ID to qedi). The rest are a set of minor fixes, which mostly occur
  as user visible in error legs or on specific devices"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: remove the duplicated checking for supporting clkscaling
  scsi: lpfc: fix building without debugfs support
  scsi: lpfc: Fix PT2PT PRLI reject
  scsi: hpsa: fix volume offline state
  scsi: libsas: fix ata xfer length
  scsi: scsi_dh_alua: Warn if the first argument of alua_rtpg_queue() is NULL
  scsi: scsi_dh_alua: Ensure that alua_activate() calls the completion function
  scsi: scsi_dh_alua: Check scsi_device_get() return value
  scsi: sg: check length passed to SG_NEXT_CMD_LEN
  scsi: ufshcd-platform: remove the useless cast in ERR_PTR/IS_ERR
  scsi: qedi: Add PCI device-ID for QL41xxx adapters.
  scsi: aacraid: Fix potential null access
  scsi: qla2xxx: Fix crash in qla2xxx_eh_abort on bad ptr

7d34ddbe

Merge branch 'akpm' (patches from Andrew) · 978e0f92

Linus Torvalds authored Apr 01, 2017

Merge misc fixes from Andrew Morton:
 "11 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  kasan: do not sanitize kexec purgatory
  drivers/rapidio/devices/tsi721.c: make module parameter variable name unique
  mm/hugetlb.c: don't call region_abort if region_chg fails
  kasan: report only the first error by default
  hugetlbfs: initialize shared policy as part of inode allocation
  mm: fix section name for .data..ro_after_init
  mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd()
  mm: workingset: fix premature shadow node shrinking with cgroups
  mm: rmap: fix huge file mmap accounting in the memcg stats
  mm: move mm_percpu_wq initialization earlier
  mm: migrate: fix remove_migration_pte() for ksm pages

978e0f92

01 Apr, 2017 20 commits

Merge tag 'usb-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · a9f6b6b8

Linus Torvalds authored Apr 01, 2017

Pull USB fixes from Greg KH:
 "Here are some small USB fixes for 4.11-rc5.

  The usual xhci fixes are here, as well as a fix for yet-another-bug-
  found-by-KASAN, those developers are doing great stuff here.

  And there's a phy build warning fix that showed up in 4.11-rc1.

  All of these have been in linux-next with no reported issues"

* tag 'usb-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: phy: isp1301: Fix build warning when CONFIG_OF is disabled
  xhci: Manually give back cancelled URB if we can't queue it for cancel
  xhci: Set URB actual length for stopped control transfers
  xhci: plat: Register shutdown for xhci_plat
  USB: fix linked-list corruption in rh_call_control()

a9f6b6b8

Merge tag 'tty-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · b3ff4fac

Linus Torvalds authored Apr 01, 2017

Pull tty/serial fixes from Greg KH:
 "Here are some small fixes for some serial drivers and Kconfig help
  text for 4.11-rc5. Nothing major here at all, a few things resolving
  reported bugs in some random serial drivers.

  I don't think these made the last linux-next due to me getting to them
  yesterday, but I am not sure, they might have snuck in. The patches
  only affect drivers that the maintainers of sent me these patches for,
  so we should be safe here :)"

* tag 'tty-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: pl011: fix earlycon work-around for QDF2400 erratum 44
  serial: 8250_EXAR: fix duplicate Kconfig text and add missing help text
  tty/serial: atmel: fix TX path in atmel_console_write()
  tty/serial: atmel: fix race condition (TX+DMA)
  serial: mxs-auart: Fix baudrate calculation

b3ff4fac

Merge tag 'acpi-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 7ece03b0

Linus Torvalds authored Apr 01, 2017

Pull ACPI fixes from Rafael Wysocki:
 "These fix two issues related to IOAPIC hotplug, an overzealous build
  optimization that prevents the function graph tracer from working with
  the ACPI subsystem correctly and an RCU synchronization issue in the
  ACPI APEI code.

  Specifics:

   - drop the unconditional setting of the '-Os' gcc flag from the ACPI
     Makefile to make the function graph tracer work correctly with the
     ACPI subsystem (Josh Poimboeuf).

   - add missing synchronize_rcu() to ghes_remove() which removes an
     element from an RCU-protected list, but fails to synchronize it
     properly afterward (James Morse).

   - fix two problems related to IOAPIC hotplug, a local variable
     initialization in setup_res() and the creation of platform device
     objects for IO(x)APICs which are (a) unused and (b) leaked on
     hot-removal (Joerg Roedel)"

* tag 'acpi-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: Fix incompatibility with mcount-based function graph tracing
  ACPI / APEI: Add missing synchronize_rcu() on NOTIFY_SCI removal
  ACPI: Do not create a platform_device for IOAPIC/IOxAPIC
  ACPI: ioapic: Clear on-stack resource before using it

7ece03b0

Merge tag 'pm-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 0d2ceec6

Linus Torvalds authored Apr 01, 2017

Pull power management fixes from Rafael Wysocki:
 "These fix a cpufreq core issue with the initialization of the cpufreq
  sysfs interface and a cpuidle powernv driver initialization issue.

  Specifics:

   - symbolic links from CPU directories to the corresponding cpufreq
     policy directories in sysfs are not created during initialization
     in some cases which confuses user space, so prevent that from
     happening (Rafael Wysocki).

   - the powernv cpuidle driver fails to pass a correct cpumaks to the
     cpuidle core in some cases which causes subsequent failures to
     occur, so fix it (Vaidyanathan Srinivasan)"

* tag 'pm-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpuidle: powernv: Pass correct drv->cpumask for registration
  cpufreq: Fix creation of symbolic links to policy directories

0d2ceec6

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 1300dc68

Linus Torvalds authored Apr 01, 2017

Pull i2c fixes from Wolfram Sang:
 "Two bugfixes from I2C, specifically the I2C mux section. Thanks to
  peda for collecting them"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: mux: pca954x: Add missing pca9546 definition to chip_desc
  Revert "i2c: mux: pca954x: Add ACPI support for pca954x"

1300dc68

Merge tag 'arc-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc · dcbcb491

Linus Torvalds authored Apr 01, 2017

Pull ARC fixes from Vineet Gupta:
 "Accumulated fixes for ARC which I've been been sitting on for a while:

   - reading clk from driver vs device tree [Vlad]

   - fix support for UIO in VDK platform [Alexey]

   - SLC busy bit reading workaround

   - build warning with kprobes header reorg"

* tag 'arc-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: fix build warnings with !CONFIG_KPROBES
  ARCv2: SLC: Make sure busy bit is set properly on SLC flushing
  ARC: vdk: Fix support of UIO
  ARCv2: make unimplemented vectors as no-ops rather than halt core
  ARC: get rate from clk driver instead of reading device tree
  ARC: [dts] add cpu nodes to ARCHS SMP device tree
  ARC: [dts] add input clocks for cpu nodes

dcbcb491

Merge tag 'nfsd-4.11-1' of git://linux-nfs.org/~bfields/linux · 09c8b3d1

Linus Torvalds authored Apr 01, 2017

Pull nfsd fixes from Bruce Fields:
 "The restriction of NFSv4 to TCP went overboard and also broke the
  backchannel; fix.

  Also some minor refinements to the nfsd version-setting interface that
  we'd like to get fixed before release"

* tag 'nfsd-4.11-1' of git://linux-nfs.org/~bfields/linux:
  svcrdma: set XPT_CONG_CTRL flag for bc xprt
  NFSD: fix nfsd_reset_versions for NFSv4.
  NFSD: fix nfsd_minorversion(.., NFSD_AVAIL)
  NFSD: further refinement of content of /proc/fs/nfsd/versions
  nfsd: map the ENOKEY to nfserr_perm for avoiding warning
  SUNRPC/backchanel: set XPT_CONG_CTRL flag for bc xprt

09c8b3d1

tty: pl011: fix earlycon work-around for QDF2400 erratum 44 · e53e597f

Timur Tabi authored Mar 31, 2017

The work-around for the Qualcomm Datacenter Technologies QDF2400
erratum 44 sets the "qdf2400_e44_present" global variable if the
work-around is needed. However, this check does not happen until after
earlycon is initialized, which means the work-around is not
used, and the console hangs as soon as it displays one character.

Fixes: d8a4995b ("tty: pl011: Work around QDF2400 E44 stuck BUSY bit")
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

e53e597f

Merge branch 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · fe8e12b5

Linus Torvalds authored Mar 31, 2017

Pull btrfs fixes from Chris Mason:
 "We have three small fixes queued up in my for-linus-4.11 branch"

* 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix an integer overflow check
  btrfs: Change qgroup_meta_rsv to 64bit
  Btrfs: bring back repair during read

fe8e12b5

kasan: do not sanitize kexec purgatory · 13a6798e

Mike Galbraith authored Mar 31, 2017

Fixes this:

  kexec: Undefined symbol: __asan_load8_noabort
  kexec-bzImage64: Loading purgatory failed

Link: http://lkml.kernel.org/r/1489672155.4458.7.camel@gmx.deSigned-off-by: Mike Galbraith <efault@gmx.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

13a6798e

drivers/rapidio/devices/tsi721.c: make module parameter variable name unique · 4785603b

Randy Dunlap authored Mar 31, 2017

kbuild test robot reported a non-static variable name collision between
a staging driver and a RapidIO driver, with a generic variable name of
'dbg_level'.

Both drivers should be changed so that they don't use this generic
public variable name.  This patch fixes the RapidIO driver but does not
change the user interface (name) for the module parameter.

  drivers/staging/built-in.o:(.bss+0x109d0): multiple definition of `dbg_level'
  drivers/rapidio/built-in.o:(.bss+0x16c): first defined here

Link: http://lkml.kernel.org/r/ab527fc5-aa3c-4b07-5d48-eef5de703192@infradead.orgSigned-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4785603b

mm/hugetlb.c: don't call region_abort if region_chg fails · ff8c0c53

Mike Kravetz authored Mar 31, 2017

Changes to hugetlbfs reservation maps is a two step process.  The first
step is a call to region_chg to determine what needs to be changed, and
prepare that change.  This should be followed by a call to call to
region_add to commit the change, or region_abort to abort the change.

The error path in hugetlb_reserve_pages called region_abort after a
failed call to region_chg.  As a result, the adds_in_progress counter in
the reservation map is off by 1.  This is caught by a VM_BUG_ON in
resv_map_release when the reservation map is freed.

syzkaller fuzzer (when using an injected kmalloc failure) found this
bug, that resulted in the following:

 kernel BUG at mm/hugetlb.c:742!
 Call Trace:
  hugetlbfs_evict_inode+0x7b/0xa0 fs/hugetlbfs/inode.c:493
  evict+0x481/0x920 fs/inode.c:553
  iput_final fs/inode.c:1515 [inline]
  iput+0x62b/0xa20 fs/inode.c:1542
  hugetlb_file_setup+0x593/0x9f0 fs/hugetlbfs/inode.c:1306
  newseg+0x422/0xd30 ipc/shm.c:575
  ipcget_new ipc/util.c:285 [inline]
  ipcget+0x21e/0x580 ipc/util.c:639
  SYSC_shmget ipc/shm.c:673 [inline]
  SyS_shmget+0x158/0x230 ipc/shm.c:657
  entry_SYSCALL_64_fastpath+0x1f/0xc2
 RIP: resv_map_release+0x265/0x330 mm/hugetlb.c:742

Link: http://lkml.kernel.org/r/1490821682-23228-1-git-send-email-mike.kravetz@oracle.comSigned-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ff8c0c53

kasan: report only the first error by default · b0845ce5

Mark Rutland authored Mar 31, 2017

Disable kasan after the first report.  There are several reasons for
this:

 - Single bug quite often has multiple invalid memory accesses causing
   storm in the dmesg.

 - Write OOB access might corrupt metadata so the next report will print
   bogus alloc/free stacktraces.

 - Reports after the first easily could be not bugs by itself but just
   side effects of the first one.

Given that multiple reports usually only do harm, it makes sense to
disable kasan after the first one.  If user wants to see all the
reports, the boot-time parameter kasan_multi_shot must be used.

[aryabinin@virtuozzo.com: wrote changelog and doc, added missing include]
Link: http://lkml.kernel.org/r/20170323154416.30257-1-aryabinin@virtuozzo.comSigned-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b0845ce5

hugetlbfs: initialize shared policy as part of inode allocation · 4742a35d

Mike Kravetz authored Mar 31, 2017

Any time after inode allocation, destroy_inode can be called.  The
hugetlbfs inode contains a shared_policy structure, and
mpol_free_shared_policy is unconditionally called as part of
hugetlbfs_destroy_inode.  Initialize the policy as part of inode
allocation so that any quick (error path) calls to destroy_inode will be
handed an initialized policy.

syzkaller fuzzer found this bug, that resulted in the following:

    BUG: KASAN: user-memory-access in atomic_inc
    include/asm-generic/atomic-instrumented.h:87 [inline] at addr
    000000131730bd7a
    BUG: KASAN: user-memory-access in __lock_acquire+0x21a/0x3a80
    kernel/locking/lockdep.c:3239 at addr 000000131730bd7a
    Write of size 4 by task syz-executor6/14086
    CPU: 3 PID: 14086 Comm: syz-executor6 Not tainted 4.11.0-rc3+ #364
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Call Trace:
     atomic_inc include/asm-generic/atomic-instrumented.h:87 [inline]
     __lock_acquire+0x21a/0x3a80 kernel/locking/lockdep.c:3239
     lock_acquire+0x1ee/0x590 kernel/locking/lockdep.c:3762
     __raw_write_lock include/linux/rwlock_api_smp.h:210 [inline]
     _raw_write_lock+0x33/0x50 kernel/locking/spinlock.c:295
     mpol_free_shared_policy+0x43/0xb0 mm/mempolicy.c:2536
     hugetlbfs_destroy_inode+0xca/0x120 fs/hugetlbfs/inode.c:952
     alloc_inode+0x10d/0x180 fs/inode.c:216
     new_inode_pseudo+0x69/0x190 fs/inode.c:889
     new_inode+0x1c/0x40 fs/inode.c:918
     hugetlbfs_get_inode+0x40/0x420 fs/hugetlbfs/inode.c:734
     hugetlb_file_setup+0x329/0x9f0 fs/hugetlbfs/inode.c:1282
     newseg+0x422/0xd30 ipc/shm.c:575
     ipcget_new ipc/util.c:285 [inline]
     ipcget+0x21e/0x580 ipc/util.c:639
     SYSC_shmget ipc/shm.c:673 [inline]
     SyS_shmget+0x158/0x230 ipc/shm.c:657
     entry_SYSCALL_64_fastpath+0x1f/0xc2

Analysis provided by Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

Link: http://lkml.kernel.org/r/1490477850-7944-1-git-send-email-mike.kravetz@oracle.comSigned-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4742a35d

mm: fix section name for .data..ro_after_init · 906f2a51

Kees Cook authored Mar 31, 2017

A section name for .data..ro_after_init was added by both:

    commit d07a980c ("s390: add proper __ro_after_init support")

and

    commit d7c19b06 ("mm: kmemleak: scan .data.ro_after_init")

The latter adds incorrect wrapping around the existing s390 section, and
came later.  I'd prefer the s390 naming, so this moves the s390-specific
name up to the asm-generic/sections.h and renames the section as used by
kmemleak (and in the future, kernel/extable.c).

Link: http://lkml.kernel.org/r/20170327192213.GA129375@beastSigned-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>	[s390 parts]
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Eddie Kovsky <ewk@edkovsky.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

906f2a51

mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd() · c9d398fa

Naoya Horiguchi authored Mar 31, 2017

I found the race condition which triggers the following bug when
move_pages() and soft offline are called on a single hugetlb page
concurrently.

    Soft offlining page 0x119400 at 0x700000000000
    BUG: unable to handle kernel paging request at ffffea0011943820
    IP: follow_huge_pmd+0x143/0x190
    PGD 7ffd2067
    PUD 7ffd1067
    PMD 0
        [61163.582052] Oops: 0000 [#1] SMP
    Modules linked in: binfmt_misc ppdev virtio_balloon parport_pc pcspkr i2c_piix4 parport i2c_core acpi_cpufreq ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk 8139too crc32c_intel ata_piix serio_raw libata virtio_pci 8139cp virtio_ring virtio mii floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: cap_check]
    CPU: 0 PID: 22573 Comm: iterate_numa_mo Tainted: P           OE   4.11.0-rc2-mm1+ #2
    Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    RIP: 0010:follow_huge_pmd+0x143/0x190
    RSP: 0018:ffffc90004bdbcd0 EFLAGS: 00010202
    RAX: 0000000465003e80 RBX: ffffea0004e34d30 RCX: 00003ffffffff000
    RDX: 0000000011943800 RSI: 0000000000080001 RDI: 0000000465003e80
    RBP: ffffc90004bdbd18 R08: 0000000000000000 R09: ffff880138d34000
    R10: ffffea0004650000 R11: 0000000000c363b0 R12: ffffea0011943800
    R13: ffff8801b8d34000 R14: ffffea0000000000 R15: 000077ff80000000
    FS:  00007fc977710740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffea0011943820 CR3: 000000007a746000 CR4: 00000000001406f0
    Call Trace:
     follow_page_mask+0x270/0x550
     SYSC_move_pages+0x4ea/0x8f0
     SyS_move_pages+0xe/0x10
     do_syscall_64+0x67/0x180
     entry_SYSCALL64_slow_path+0x25/0x25
    RIP: 0033:0x7fc976e03949
    RSP: 002b:00007ffe72221d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000117
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc976e03949
    RDX: 0000000000c22390 RSI: 0000000000001400 RDI: 0000000000005827
    RBP: 00007ffe72221e00 R08: 0000000000c2c3a0 R09: 0000000000000004
    R10: 0000000000c363b0 R11: 0000000000000246 R12: 0000000000400650
    R13: 00007ffe72221ee0 R14: 0000000000000000 R15: 0000000000000000
    Code: 81 e4 ff ff 1f 00 48 21 c2 49 c1 ec 0c 48 c1 ea 0c 4c 01 e2 49 bc 00 00 00 00 00 ea ff ff 48 c1 e2 06 49 01 d4 f6 45 bc 04 74 90 <49> 8b 7c 24 20 40 f6 c7 01 75 2b 4c 89 e7 8b 47 1c 85 c0 7e 2a
    RIP: follow_huge_pmd+0x143/0x190 RSP: ffffc90004bdbcd0
    CR2: ffffea0011943820
    ---[ end trace e4f81353a2d23232 ]---
    Kernel panic - not syncing: Fatal exception
    Kernel Offset: disabled

This bug is triggered when pmd_present() returns true for non-present
hugetlb, so fixing the present check in follow_huge_pmd() prevents it.
Using pmd_present() to determine present/non-present for hugetlb is not
correct, because pmd_present() checks multiple bits (not only
_PAGE_PRESENT) for historical reason and it can misjudge hugetlb state.

Fixes: e66f17ff ("mm/hugetlb: take page table lock in follow_huge_pmd()")
Link: http://lkml.kernel.org/r/1490149898-20231-1-git-send-email-n-horiguchi@ah.jp.nec.comSigned-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: <stable@vger.kernel.org>        [4.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c9d398fa

mm: workingset: fix premature shadow node shrinking with cgroups · 0cefabda

Johannes Weiner authored Mar 31, 2017

Commit 0a6b76dd ("mm: workingset: make shadow node shrinker memcg
aware") enabled cgroup-awareness in the shadow node shrinker, but forgot
to also enable cgroup-awareness in the list_lru the shadow nodes sit on.

Consequently, all shadow nodes are sitting on a global (per-NUMA node)
list, while the shrinker applies the limits according to the amount of
cache in the cgroup its shrinking. The result is excessive pressure on
the shadow nodes from cgroups that have very little cache.

Enable memcg-mode on the shadow node LRUs, such that per-cgroup limits
are applied to per-cgroup lists.

Fixes: 0a6b76dd ("mm: workingset: make shadow node shrinker memcg aware")
Link: http://lkml.kernel.org/r/20170322005320.8165-1-hannes@cmpxchg.orgSigned-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vladimir Davydov <vdavydov@tarantool.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org> [4.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0cefabda

mm: rmap: fix huge file mmap accounting in the memcg stats · 553af430

Johannes Weiner authored Mar 31, 2017

Huge pages are accounted as single units in the memcg's "file_mapped"
counter.  Account the correct number of base pages, like we do in the
corresponding node counter.

Link: http://lkml.kernel.org/r/20170322005111.3156-1-hannes@cmpxchg.orgSigned-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

553af430

mm: move mm_percpu_wq initialization earlier · 597b7305

Michal Hocko authored Mar 31, 2017

Yang Li has reported that drain_all_pages triggers a WARN_ON which means
that this function is called earlier than the mm_percpu_wq is
initialized on arm64 with CMA configured:

  WARNING: CPU: 2 PID: 1 at mm/page_alloc.c:2423 drain_all_pages+0x244/0x25c
  Modules linked in:
  CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.11.0-rc1-next-20170310-00027-g64dfbc5 #127
  Hardware name: Freescale Layerscape 2088A RDB Board (DT)
  task: ffffffc07c4a6d00 task.stack: ffffffc07c4a8000
  PC is at drain_all_pages+0x244/0x25c
  LR is at start_isolate_page_range+0x14c/0x1f0
  [...]
   drain_all_pages+0x244/0x25c
   start_isolate_page_range+0x14c/0x1f0
   alloc_contig_range+0xec/0x354
   cma_alloc+0x100/0x1fc
   dma_alloc_from_contiguous+0x3c/0x44
   atomic_pool_init+0x7c/0x208
   arm64_dma_init+0x44/0x4c
   do_one_initcall+0x38/0x128
   kernel_init_freeable+0x1a0/0x240
   kernel_init+0x10/0xfc
   ret_from_fork+0x10/0x20

Fix this by moving the whole setup_vmstat which is an initcall right now
to init_mm_internals which will be called right after the WQ subsystem
is initialized.

Link: http://lkml.kernel.org/r/20170315164021.28532-1-mhocko@kernel.orgSigned-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Yang Li <pku.leo@gmail.com>
Tested-by: Yang Li <pku.leo@gmail.com>
Tested-by: Xiaolong Ye <xiaolong.ye@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

597b7305

mm: migrate: fix remove_migration_pte() for ksm pages · 4b0ece6f

Naoya Horiguchi authored Mar 31, 2017

I found that calling page migration for ksm pages causes the following
bug:

    page:ffffea0004d51180 count:2 mapcount:2 mapping:ffff88013c785141 index:0x913
    flags: 0x57ffffc0040068(uptodate|lru|active|swapbacked)
    raw: 0057ffffc0040068 ffff88013c785141 0000000000000913 0000000200000001
    raw: ffffea0004d5f9e0 ffffea0004d53f60 0000000000000000 ffff88007d81b800
    page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
    page->mem_cgroup:ffff88007d81b800
    ------------[ cut here ]------------
    kernel BUG at /src/linux-dev/mm/rmap.c:1086!
    invalid opcode: 0000 [#1] SMP
    Modules linked in: ppdev parport_pc virtio_balloon i2c_piix4 pcspkr parport i2c_core acpi_cpufreq ip_tables xfs libcrc32c ata_generic pata_acpi ata_piix 8139too libata virtio_blk 8139cp crc32c_intel mii virtio_pci virtio_ring serio_raw virtio floppy dm_mirror dm_region_hash dm_log dm_mod
    CPU: 0 PID: 3162 Comm: bash Not tainted 4.11.0-rc2-mm1+ #1
    Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    RIP: 0010:do_page_add_anon_rmap+0x1ba/0x260
    RSP: 0018:ffffc90002473b30 EFLAGS: 00010282
    RAX: 0000000000000021 RBX: ffffea0004d51180 RCX: 0000000000000006
    RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff88007dc0dfe0
    RBP: ffffc90002473b58 R08: 00000000fffffffe R09: 00000000000001c1
    R10: 0000000000000005 R11: 00000000000001c0 R12: ffff880139ab3d80
    R13: 0000000000000000 R14: 0000700000000200 R15: 0000160000000000
    FS:  00007f5195f50740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fd450287000 CR3: 000000007a08e000 CR4: 00000000001406f0
    Call Trace:
     page_add_anon_rmap+0x18/0x20
     remove_migration_pte+0x220/0x2c0
     rmap_walk_ksm+0x143/0x220
     rmap_walk+0x55/0x60
     remove_migration_ptes+0x53/0x80
     migrate_pages+0x8ed/0xb60
     soft_offline_page+0x309/0x8d0
     store_soft_offline_page+0xaf/0xf0
     dev_attr_store+0x18/0x30
     sysfs_kf_write+0x3a/0x50
     kernfs_fop_write+0xff/0x180
     __vfs_write+0x37/0x160
     vfs_write+0xb2/0x1b0
     SyS_write+0x55/0xc0
     do_syscall_64+0x67/0x180
     entry_SYSCALL64_slow_path+0x25/0x25
    RIP: 0033:0x7f51956339e0
    RSP: 002b:00007ffcfa0dffc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f51956339e0
    RDX: 000000000000000c RSI: 00007f5195f53000 RDI: 0000000000000001
    RBP: 00007f5195f53000 R08: 000000000000000a R09: 00007f5195f50740
    R10: 000000000000000b R11: 0000000000000246 R12: 00007f5195907400
    R13: 000000000000000c R14: 0000000000000001 R15: 0000000000000000
    Code: fe ff ff 48 81 c2 00 02 00 00 48 89 55 d8 e8 2e c3 fd ff 48 8b 55 d8 e9 42 ff ff ff 48 c7 c6 e0 52 a1 81 48 89 df e8 46 ad fe ff <0f> 0b 48 83 e8 01 e9 7f fe ff ff 48 83 e8 01 e9 96 fe ff ff 48
    RIP: do_page_add_anon_rmap+0x1ba/0x260 RSP: ffffc90002473b30
    ---[ end trace a679d00f4af2df48 ]---
    Kernel panic - not syncing: Fatal exception
    Kernel Offset: disabled
    ---[ end Kernel panic - not syncing: Fatal exception

The problem is in the following lines:

    new = page - pvmw.page->index +
        linear_page_index(vma, pvmw.address);

The 'new' is calculated with 'page' which is given by the caller as a
destination page and some offset adjustment for thp.  But this doesn't
properly work for ksm pages because pvmw.page->index doesn't change for
each address but linear_page_index() changes, which means that 'new'
points to different pages for each addresses backed by the ksm page.  As
a result, we try to set totally unrelated pages as destination pages,
and that causes kernel crash.

This patch fixes the miscalculation and makes ksm page migration work
fine.

Fixes: 3fe87967 ("mm: convert remove_migration_pte() to use page_vma_mapped_walk()")
Link: http://lkml.kernel.org/r/1489717683-29905-1-git-send-email-n-horiguchi@ah.jp.nec.comSigned-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4b0ece6f