Commits · 42ff72cf27027fa28dd79acabe01d9196f1480a7 · nexedi / linux

30 Aug, 2017 6 commits

Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 42ff72cf

Linus Torvalds authored Aug 30, 2017

Pull libnvdimm fix from Dan Williams:
 "A single patch removing some structure definitions from a uapi header
  file. These payloads are never processed directly by the kernel they
  are simply passed through an ioctl as opaque blobs to the ACPI _DSM
  (Device Specific Method) interface.

  Userspace should not be depending on the kernel to define these
  payloads. We will instead provide these definitions via the existing
  libndctl (https://github.com/pmem/ndctl) project that has NVDIMM
  command helpers and other definitions"

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm: clean up command definitions

42ff72cf

Merge tag 'drm-fixes-for-v4.13-rc8' of git://people.freedesktop.org/~airlied/linux · 94249117

Linus Torvalds authored Aug 30, 2017

Pull drm fixes from Dave Airlie:
 "Two fixes (a vmwgfx and core drm fix) in the queue for 4.13 final,
  hopefully that is it"

* tag 'drm-fixes-for-v4.13-rc8' of git://people.freedesktop.org/~airlied/linux:
  drm/vmwgfx: Fix F26 Wayland screen update issue
  drm/bridge/sii8620: Fix memory corruption

94249117

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · c02bf3e5

Linus Torvalds authored Aug 30, 2017

Pull SCSI fixes from James Bottomley:
 "Three minor fixes: a NULL deref in qedf, an off by one in sg and a fix
  to IPR to prevent an error on initialisation"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: qedf: Fix a potential NULL pointer dereference
  scsi: sg: off by one in sg_ioctl()
  scsi: ipr: Set no_report_opcodes for RAID arrays

c02bf3e5

Merge branch 'for-linus-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml · 0761fc15

Linus Torvalds authored Aug 30, 2017

Pull UML fix from Richard Weinberger:
 "This contains a single fix for a regression which was introduced while
  the merge window"

* 'for-linus-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: Fix check for _xstate for older hosts

0761fc15

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha · dd689a68

Linus Torvalds authored Aug 30, 2017

Pull alpha update from Matt Turner:
 "A few fixes and wires up some additional syscalls."

[ Some of this is technically not really rc7 material, but it's alpha,
  and it all looks safe anyway. Matt explains: "My alpha has been
  offline, hence the very late-in-cycle pull request" and hasn't caused
  problems before, so he gets to slide.   - Linus ]

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
  alpha: uapi: Add support for __SANE_USERSPACE_TYPES__
  alpha: Define ioremap_wc
  alpha: Fix section mismatches
  alpha: support R_ALPHA_REFLONG relocations for module loading
  alpha: Fix typo in ev6-copy_user.S
  alpha: Package string routines together
  alpha: Update for new syscalls
  alpha: Fix build error without CONFIG_VGA_HOSE.

dd689a68

Merge branch 'drm-vmwgfx-fixes' of git://people.freedesktop.org/~syeh/repos_linux into drm-fixes · 58aec872
Dave Airlie authored Aug 30, 2017
```
Single vmwgfx fix.

* 'drm-vmwgfx-fixes' of git://people.freedesktop.org/~syeh/repos_linux:
  drm/vmwgfx: Fix F26 Wayland screen update issue
```
58aec872

29 Aug, 2017 15 commits

drm/vmwgfx: Fix F26 Wayland screen update issue · 021aba76

Sinclair Yeh authored Aug 29, 2017

vmwgfx currently cannot support non-blocking commit because when
vmw_*_crtc_page_flip is called, drm_atomic_nonblocking_commit()
schedules the update on a thread.  This means vmw_*_crtc_page_flip
cannot rely on the new surface being bound before the subsequent
dirty and flush operations happen.

Cc: <stable@vger.kernel.org> # 4.12.x
Signed-off-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>

021aba76

Merge tag 'drm-misc-fixes-2017-08-28' of git://anongit.freedesktop.org/git/drm-misc into drm-fixes · e38f5164

Dave Airlie authored Aug 30, 2017

Driver Changes:
- bridge/sii8620: Fix out-of-bounds write to incorrect register

Cc: Maciej Purski <m.purski@samsung.com>
Cc: Andrzej Hajda <a.hajda@samsung.com>

* tag 'drm-misc-fixes-2017-08-28' of git://anongit.freedesktop.org/git/drm-misc:
  drm/bridge/sii8620: Fix memory corruption

e38f5164

alpha: uapi: Add support for __SANE_USERSPACE_TYPES__ · cec80d82

Ben Hutchings authored Oct 01, 2015

This fixes compiler errors in perf such as:

tests/attr.c: In function 'store_event':
tests/attr.c:66:27: error: format '%llu' expects argument of type 'long long unsigned int', but argument 6 has type '__u64 {aka long unsigned int}' [-Werror=format=]
  snprintf(path, PATH_MAX, "%s/event-%d-%llu-%d", dir,
                           ^
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Tested-by: Michael Cree <mcree@orcon.net.nz>
Cc: stable@vger.kernel.org
Signed-off-by: Matt Turner <mattst88@gmail.com>

cec80d82

alpha: Define ioremap_wc · 7817cedc

Guenter Roeck authored Jul 31, 2015

Commit 3cc2dac5 ("drivers/video/fbdev/atyfb: Replace MTRR UC hole
with strong UC") introduces calls to ioremap_wc and ioremap_uc. This
causes build failures with alpha:allmodconfig. Map the missing functions
to ioremap_nocache.

Fixes: 3cc2dac5 ("drivers/video/fbdev/atyfb:
        Replace MTRR UC hole with strong UC")
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Matt Turner <mattst88@gmail.com>

7817cedc

alpha: Fix section mismatches · 69f06782
Matt Turner authored Aug 24, 2017
```
Signed-off-by: Matt Turner <mattst88@gmail.com>
```
69f06782

alpha: support R_ALPHA_REFLONG relocations for module loading · 4f61e078

Michael Cree authored Jun 24, 2017

Since commit 71810db2 (modversions: treat symbol CRCs
as 32 bit quantities) R_ALPHA_REFLONG relocations can be required
to load modules. This implements it.
Tested-by: Bob Tracy <rct@gherkin.frus.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Signed-off-by: Matt Turner <mattst88@gmail.com>

4f61e078

alpha: Fix typo in ev6-copy_user.S · 4606f68f

Richard Henderson authored Jun 23, 2017

Patch 85250231 introduced a typo.

That said, the identity AND insns added by that patch are more
clearly written as MOV.  At the same time, re-schedule the ev6
version so that the first dispatch can execute in parallel.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Matt Turner <mattst88@gmail.com>

4606f68f

alpha: Package string routines together · 4758ce82

Richard Henderson authored Jun 23, 2017

There are direct branches between {str*cpy,str*cat} and stx*cpy.
Ensure the branches are within range by merging these objects.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Matt Turner <mattst88@gmail.com>

4758ce82

alpha: Update for new syscalls · a7208306

Richard Henderson authored Jun 23, 2017

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Matt Turner <mattst88@gmail.com>

a7208306

alpha: Fix build error without CONFIG_VGA_HOSE. · e42faf55

Matt Turner authored Oct 24, 2016

pci_vga_hose is #defined to 0 in include/asm/vga.h if CONFIG_VGA_HOSE is
not set.
Signed-off-by: Matt Turner <mattst88@gmail.com>

e42faf55

Merge branch 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · 36fde05f

Linus Torvalds authored Aug 29, 2017

Pull cgroup fix from Tejun Heo:
 "A late but obvious fix for cgroup.

  I broke the 'cpuset.memory_pressure' file a long time ago (v4.4) by
  accidentally deleting its file index, which made it a duplicate of the
  'cpuset.memory_migrate' file. Spotted and fixed by Waiman"

* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cpuset: Fix incorrect memory_pressure control file mapping

36fde05f

Merge branch 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata · 31a3faf3

Linus Torvalds authored Aug 29, 2017

Pull libata fixes from Tejun Heo:
 "Late fixes for libata. There's a minor platform driver fix but the
  important one is READ LOG PAGE.

  This is a new ATA command which is used to test some optional features
  but it broke probing of some devices - they locked up instead of
  failing the unknown command.

  Christoph tried blacklisting, but, after finding out there are
  multiple devices which fail this way, backed off to testing feature
  bit in IDENTIFY data first, which is a bit lossy (we can miss features
  on some devices) but should be a lot safer"

* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  Revert "libata: quirk read log on no-name M.2 SSD"
  libata: check for trusted computing in IDENTIFY DEVICE data
  libata: quirk read log on no-name M.2 SSD
  sata: ahci-da850: Fix some error handling paths in 'ahci_da850_probe()'

31a3faf3

Revert "rmap: do not call mmu_notifier_invalidate_page() under ptl" · 785373b4

Linus Torvalds authored Aug 29, 2017

This reverts commit aac2fea9.

It turns out that that patch was complete and utter garbage, and broke
KVM, resulting in odd oopses.

Quoting Andrea Arcangeli:
 "The aforementioned commit has 3 bugs.

  1) mmu_notifier_invalidate_range cannot be used in replacement of
     mmu_notifier_invalidate_range_start/end.

     For KVM mmu_notifier_invalidate_range is a noop and rightfully so.

     A MMU notifier implementation has to implement either
     ->invalidate_range method or the invalidate_range_start/end
     methods, not both. And if you implement invalidate_range_start/end
     like KVM is forced to do, calling mmu_notifier_invalidate_range in
     common code is a noop for KVM.

     For those MMU notifiers that can get away only implementing
     ->invalidate_range, the ->invalidate_range is implicitly called by
     mmu_notifier_invalidate_range_end(). And only those secondary MMUs
     that share the same pagetable with the primary MMU (like AMD
     iommuv2) can get away only implementing ->invalidate_range.

     So all cases (THP on/off) are broken right now.

     To fix this is enough to replace mmu_notifier_invalidate_range with
     mmu_notifier_invalidate_range_start;mmu_notifier_invalidate_range_end.
     Either that or call multiple mmu_notifier_invalidate_page like
     before.

  2) address + (1UL << compound_order(page) is buggy, it should be
     PAGE_SIZE << compound_order(page), it's bytes not pages, 2M not
     512.

  3) The whole invalidate_range thing was an attempt to call a single
     invalidate while walking multiple 4k ptes that maps the same THP
     (after a pmd virtual split without physical compound page THP
     split).

     It's unclear if the rmap_walk will always provide an address that
     is 2M aligned as parameter to try_to_unmap_one, in presence of THP.
     I think it needs also an address &= (PAGE_SIZE <<
     compound_order(page)) - 1 to be safe"

In general, we should stop making excuses for horrible MMU notifier
users.  It's much more important that the core VM is sane and safe, than
letting MMU notifiers sleep.

So if some MMU notifier is sleeping under a spinlock, we need to fix the
notifier, not try to make excuses for that garbage in the core VM.
Reported-and-tested-by: Bernhard Held <berny156@gmx.de>
Reported-and-tested-by: Adam Borowski <kilobyte@angband.pl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: axie <axie@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

785373b4

Revert "libata: quirk read log on no-name M.2 SSD" · 2aca3923

Tejun Heo authored Aug 29, 2017

This reverts commit 35f0b6a7.

We now conditionalize issuing of READ LOG PAGE on the TRUSTED
COMPUTING SUPPORTED bit in the identity data and this shouldn't be
necessary.
Signed-off-by: Tejun Heo <tj@kernel.org>

2aca3923

libata: check for trusted computing in IDENTIFY DEVICE data · e8f11db9

Christoph Hellwig authored Aug 29, 2017

ATA-8 and later mirrors the TRUSTED COMPUTING SUPPORTED bit in word 48 of
the IDENTIFY DEVICE data.  Check this before issuing a READ LOG PAGE
command to avoid issues with buggy devices.  The only downside is that
we can't support Security Send / Receive for a device with an older
revision due to the conflicting use of this field in earlier
specifications.

tj: The reason we need this is because some devices which don't
    support READ LOG PAGE lock up after getting issued that command.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

e8f11db9

28 Aug, 2017 12 commits

page waitqueue: always add new entries at the end · 9c3a815f

Linus Torvalds authored Aug 28, 2017

Commit 3510ca20 ("Minor page waitqueue cleanups") made the page
queue code always add new waiters to the back of the queue, which helps
upcoming patches to batch the wakeups for some horrid loads where the
wait queues grow to thousands of entries.

However, I forgot about the nasrt add_page_wait_queue() special case
code that is only used by the cachefiles code.  That one still continued
to add the new wait queue entries at the beginning of the list.

Fix it, because any sane batched wakeup will require that we don't
suddenly start getting new entries at the beginning of the list that we
already handled in a previous batch.

[ The current code always does the whole list while holding the lock, so
  wait queue ordering doesn't matter for correctness, but even then it's
  better to add later entries at the end from a fairness standpoint ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9c3a815f

cpumask: fix spurious cpumask_of_node() on non-NUMA multi-node configs · b339752d

Tejun Heo authored Aug 28, 2017

When !NUMA, cpumask_of_node(@node) equals cpu_online_mask regardless of
@node.  The assumption seems that if !NUMA, there shouldn't be more than
one node and thus reporting cpu_online_mask regardless of @node is
correct.  However, that assumption was broken years ago to support
DISCONTIGMEM and whether a system has multiple nodes or not is
separately controlled by NEED_MULTIPLE_NODES.

This means that, on a system with !NUMA && NEED_MULTIPLE_NODES,
cpumask_of_node() will report cpu_online_mask for all possible nodes,
indicating that the CPUs are associated with multiple nodes which is an
impossible configuration.

This bug has been around forever but doesn't look like it has caused any
noticeable symptoms.  However, it triggers a WARN recently added to
workqueue to verify NUMA affinity configuration.

Fix it by reporting empty cpumask on non-zero nodes if !NUMA.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b339752d

ARCv2: SMP: Mask only private-per-core IRQ lines on boot at core intc · e8206d2b

Alexey Brodkin authored Aug 28, 2017

Recent commit a8ec3ee8 "arc: Mask individual IRQ lines during core
INTC init" breaks interrupt handling on ARCv2 SMP systems.

That commit masked all interrupts at onset, as some controllers on some
boards (customer as well as internal), would assert interrutps early
before any handlers were installed.  For SMP systems, the masking was
done at each cpu's core-intc.  Later, when the IRQ was actually
requested, it was unmasked, but only on the requesting cpu.

For "common" interrupts, which were wired up from the 2nd level IDU
intc, this was as issue as they needed to be enabled on ALL the cpus
(given that IDU IRQs are by default served Round Robin across cpus)

So fix that by NOT masking "common" interrupts at core-intc, but instead
at the 2nd level IDU intc (latter already being done in idu_of_init())

Fixes: a8ec3ee8 ("arc: Mask individual IRQ lines during core INTC init")
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
[vgupta: reworked changelog, removed the extraneous idu_irq_mask_raw()]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e8206d2b

fs/select: Fix memory corruption in compat_get_fd_set() · 79de3cbe

Helge Deller authored Aug 23, 2017

Commit 464d6242 ("select: switch compat_{get,put}_fd_set() to
compat_{get,put}_bitmap()") changed the calculation on how many bytes
need to be zeroed when userspace handed over a NULL pointer for a fdset
array in the select syscall.

The calculation was changed in compat_get_fd_set() wrongly from
	memset(fdset, 0, ((nr + 1) & ~1)*sizeof(compat_ulong_t));
to
	memset(fdset, 0, ALIGN(nr, BITS_PER_LONG));

The ALIGN(nr, BITS_PER_LONG) calculates the number of _bits_ which need
to be zeroed in the target fdset array (rounded up to the next full bits
for an unsigned long).

But the memset() call expects the number of _bytes_ to be zeroed.

This leads to clearing more memory than wanted (on the stack area or
even at kmalloc()ed memory areas) and to random kernel crashes as we
have seen them on the parisc platform.

The correct change should have been

	memset(fdset, 0, (ALIGN(nr, BITS_PER_LONG) / BITS_PER_LONG) * BYTES_PER_LONG);

which is the same as can be archieved with a call to

	zero_fd_set(nr, fdset).

Fixes: 464d6242 ("select: switch compat_{get,put}_fd_set() to compat_{get,put}_bitmap()"
Acked-by: : Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

79de3cbe

Merge tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming · 702e9762

Linus Torvalds authored Aug 28, 2017

Pull c6x tweaks from Mark Salter.

* tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming:
  c6x: Convert to using %pOF instead of full_name
  c6x: defconfig: Cleanup from old Kconfig options

702e9762

libata: quirk read log on no-name M.2 SSD · 35f0b6a7

Christoph Hellwig authored Aug 28, 2017

Ido reported that reading the log page on his systems fails,
so quirk it as it won't support ZBC or security protocols.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

35f0b6a7

libnvdimm: clean up command definitions · 7a14724f

Dan Williams authored Aug 28, 2017

Remove the command payloads that do not have an associated libnvdimm
ioctl. I.e. remove the payloads that would only ever be carried in the
ND_CMD_CALL envelope. This prevents userspace from growing unnecessary
dependencies on this kernel header when userspace already has everything
it needs to craft and send these commands.

Cc: Jerry Hoemann <jerry.hoemann@hpe.com>
Reported-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

7a14724f

Linux 4.13-rc7 · cc4a41fe
Linus Torvalds authored Aug 27, 2017

cc4a41fe

Merge tag 'iommu-fixes-v4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 2c25833c

Linus Torvalds authored Aug 27, 2017

Pull IOMMU fix from Joerg Roedel:
 "Another fix, this time in common IOMMU sysfs code.

  In the conversion from the old iommu sysfs-code to the
  iommu_device_register interface, I missed to update the release path
  for the struct device associated with an IOMMU. It freed the 'struct
  device', which was a pointer before, but is now embedded in another
  struct.

  Freeing from the middle of allocated memory had all kinds of nasty
  side effects when an IOMMU was unplugged. Unfortunatly nobody
  unplugged and IOMMU until now, so this was not discovered earlier. The
  fix is to make the 'struct device' a pointer again"

* tag 'iommu-fixes-v4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu: Fix wrong freeing of iommu_device->dev

2c25833c

Merge tag 'char-misc-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 80f73b2d

Linus Torvalds authored Aug 27, 2017

Pull char/misc fix from Greg KH:
 "Here is a single misc driver fix for 4.13-rc7. It resolves a reported
  problem in the Android binder driver due to previous patches in
  4.13-rc.

  It's been in linux-next with no reported issues"

* tag 'char-misc-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  ANDROID: binder: fix proc->tsk check.

80f73b2d

Merge tag 'staging-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · c3c16263

Linus Torvalds authored Aug 27, 2017

Pull staging/iio fixes from Greg KH:
 "Here are few small staging driver fixes, and some more IIO driver
  fixes for 4.13-rc7. Nothing major, just resolutions for some reported
  problems.

  All of these have been in linux-next with no reported problems"

* tag 'staging-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  iio: magnetometer: st_magn: remove ihl property for LSM303AGR
  iio: magnetometer: st_magn: fix status register address for LSM303AGR
  iio: hid-sensor-trigger: Fix the race with user space powering up sensors
  iio: trigger: stm32-timer: fix get trigger mode
  iio: imu: adis16480: Fix acceleration scale factor for adis16480
  PATCH] iio: Fix some documentation warnings
  staging: rtl8188eu: add RNX-N150NUB support
  Revert "staging: fsl-mc: be consistent when checking strcmp() return"
  iio: adc: stm32: fix common clock rate
  iio: adc: ina219: Avoid underflow for sleeping time
  iio: trigger: stm32-timer: add enable attribute
  iio: trigger: stm32-timer: fix get/set down count direction
  iio: trigger: stm32-timer: fix write_raw return value
  iio: trigger: stm32-timer: fix quadrature mode get routine
  iio: bmp280: properly initialize device for humidity reading

c3c16263

Merge tag 'ntb-4.13-bugfixes' of git://github.com/jonmason/ntb · fff4e7a0

Linus Torvalds authored Aug 27, 2017

Pull NTB fixes from Jon Mason:
 "NTB bug fixes to address an incorrect ntb_mw_count reference in the
  NTB transport, improperly bringing down the link if SPADs are
  corrupted, and an out-of-order issue regarding link negotiation and
  data passing"

* tag 'ntb-4.13-bugfixes' of git://github.com/jonmason/ntb:
  ntb: ntb_test: ensure the link is up before trying to configure the mws
  ntb: transport shouldn't disable link due to bogus values in SPADs
  ntb: use correct mw_count function in ntb_tool and ntb_transport

fff4e7a0

27 Aug, 2017 3 commits

Avoid page waitqueue race leaving possible page locker waiting · a8b169af

Linus Torvalds authored Aug 27, 2017

The "lock_page_killable()" function waits for exclusive access to the
page lock bit using the WQ_FLAG_EXCLUSIVE bit in the waitqueue entry
set.

That means that if it gets woken up, other waiters may have been
skipped.

That, in turn, means that if it sees the page being unlocked, it *must*
take that lock and return success, even if a lethal signal is also
pending.

So instead of checking for lethal signals first, we need to check for
them after we've checked the actual bit that we were waiting for.  Even
if that might then delay the killing of the process.

This matches the order of the old "wait_on_bit_lock()" infrastructure
that the page locking used to use (and is still used in a few other
areas).

Note that if we still return an error after having unsuccessfully tried
to acquire the page lock, that is ok: that means that some other thread
was able to get ahead of us and lock the page, and when that other
thread then unlocks the page, the wakeup event will be repeated.  So any
other pending waiters will now get properly woken up.

Fixes: 62906027 ("mm: add PageWaiters indicating tasks are waiting for a page bit")
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Jan Kara <jack@suse.cz>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a8b169af

Minor page waitqueue cleanups · 3510ca20

Linus Torvalds authored Aug 27, 2017

Tim Chen and Kan Liang have been battling a customer load that shows
extremely long page wakeup lists.  The cause seems to be constant NUMA
migration of a hot page that is shared across a lot of threads, but the
actual root cause for the exact behavior has not been found.

Tim has a patch that batches the wait list traversal at wakeup time, so
that we at least don't get long uninterruptible cases where we traverse
and wake up thousands of processes and get nasty latency spikes.  That
is likely 4.14 material, but we're still discussing the page waitqueue
specific parts of it.

In the meantime, I've tried to look at making the page wait queues less
expensive, and failing miserably.  If you have thousands of threads
waiting for the same page, it will be painful.  We'll need to try to
figure out the NUMA balancing issue some day, in addition to avoiding
the excessive spinlock hold times.

That said, having tried to rewrite the page wait queues, I can at least
fix up some of the braindamage in the current situation. In particular:

 (a) we don't want to continue walking the page wait list if the bit
     we're waiting for already got set again (which seems to be one of
     the patterns of the bad load).  That makes no progress and just
     causes pointless cache pollution chasing the pointers.

 (b) we don't want to put the non-locking waiters always on the front of
     the queue, and the locking waiters always on the back.  Not only is
     that unfair, it means that we wake up thousands of reading threads
     that will just end up being blocked by the writer later anyway.

Also add a comment about the layout of 'struct wait_page_key' - there is
an external user of it in the cachefiles code that means that it has to
match the layout of 'struct wait_bit_key' in the two first members.  It
so happens to match, because 'struct page *' and 'unsigned long *' end
up having the same values simply because the page flags are the first
member in struct page.

Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Christopher Lameter <cl@linux.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3510ca20

Clarify (and fix) MAX_LFS_FILESIZE macros · 0cc3b0ec

Linus Torvalds authored Aug 27, 2017

We have a MAX_LFS_FILESIZE macro that is meant to be filled in by
filesystems (and other IO targets) that know they are 64-bit clean and
don't have any 32-bit limits in their IO path.

It turns out that our 32-bit value for that limit was bogus.  On 32-bit,
the VM layer is limited by the page cache to only 32-bit index values,
but our logic for that was confusing and actually wrong.  We used to
define that value to

	(((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)

which is actually odd in several ways: it limits the index to 31 bits,
and then it limits files so that they can't have data in that last byte
of a page that has the highest 31-bit index (ie page index 0x7fffffff).

Neither of those limitations make sense.  The index is actually the full
32 bit unsigned value, and we can use that whole full page.  So the
maximum size of the file would logically be "PAGE_SIZE << BITS_PER_LONG".

However, we do wan tto avoid the maximum index, because we have code
that iterates over the page indexes, and we don't want that code to
overflow.  So the maximum size of a file on a 32-bit host should
actually be one page less than the full 32-bit index.

So the actual limit is ULONG_MAX << PAGE_SHIFT.  That means that we will
not actually be using the page of that last index (ULONG_MAX), but we
can grow a file up to that limit.

The wrong value of MAX_LFS_FILESIZE actually caused problems for Doug
Nazar, who was still using a 32-bit host, but with a 9.7TB 2 x RAID5
volume.  It turns out that our old MAX_LFS_FILESIZE was 8TiB (well, one
byte less), but the actual true VM limit is one page less than 16TiB.

This was invisible until commit c2a9737f ("vfs,mm: fix a dead loop
in truncate_inode_pages_range()"), which started applying that
MAX_LFS_FILESIZE limit to block devices too.

NOTE! On 64-bit, the page index isn't a limiter at all, and the limit is
actually just the offset type itself (loff_t), which is signed.  But for
clarity, on 64-bit, just use the maximum signed value, and don't make
people have to count the number of 'f' characters in the hex constant.

So just use LLONG_MAX for the 64-bit case.  That was what the value had
been before too, just written out as a hex constant.

Fixes: c2a9737f ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
Reported-and-tested-by: Doug Nazar <nazard@nazar.ca>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0cc3b0ec

26 Aug, 2017 4 commits

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · bab97524

Linus Torvalds authored Aug 26, 2017

Pull input fixes from Dmitry Torokhov:

 - a tweak to the IBM Trackpoint driver that helps recognizing
   trackpoints on never Lenovo Carbons

 - a fix to the ALPS driver solving scroll issues on some Dells

 - yet another ACPI ID has been added to Elan I2C toucpad driver

 - quieted diagnostic message in soc_button_array driver

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: ALPS - fix two-finger scroll breakage in right side on ALPS touchpad
  Input: soc_button_array - silence -ENOENT error on Dell XPS13 9365
  Input: trackpoint - add new trackpoint firmware ID
  Input: elan_i2c - add ELAN0602 ACPI ID to support Lenovo Yoga310

bab97524

Merge tag 'pci-v4.13-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 9716bdb2

Linus Torvalds authored Aug 26, 2017

Pull PCI fix from Bjorn Helgaas:
 "Remove needlessly alarming MSI affinity warning (this is not actually
  a bug fix, but the warning prompts unnecessary bug reports)"

* tag 'pci-v4.13-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI/MSI: Don't warn when irq_create_affinity_masks() returns NULL

9716bdb2

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c153e621

Linus Torvalds authored Aug 26, 2017

Pull x86 fixes from Ingo Molnar:
 "Two fixes: one for an ldt_struct handling bug and a cherry-picked
  objtool fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Fix use-after-free of ldt_struct
  objtool: Fix '-mtune=atom' decoding support in objtool 2.0

c153e621

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0adb8f3d

Linus Torvalds authored Aug 26, 2017

Pull timer fix from Ingo Molnar:
 "Fix a timer granularity handling race+bug, which would manifest itself
  by spuriously increasing timeouts of some timers (from 1 jiffy to ~500
  jiffies in the worst case measured) in certain nohz states"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers: Fix excessive granularity of new timers after a nohz idle

0adb8f3d