Commits · dc52d783d68c07a68743d42ef2f9a6f9a6d80fb1 · Kirill Smelkov / linux

25 Aug, 2017 1 commit

xen-blkback: stop blkback thread of every queue in xen_blkif_disconnect · dc52d783

Annie Li authored Aug 24, 2017

In xen_blkif_disconnect, before checking inflight I/O, following code
stops the blkback thread,
if (ring->xenblkd) {
	kthread_stop(ring->xenblkd);
	wake_up(&ring->shutdown_wq);
}
If there is inflight I/O in any non-last queue, blkback returns -EBUSY
directly, and above code would not be called to stop thread of remaining
queue and processs them. When removing vbd device with lots of disk I/O
load, some queues with inflight I/O still have blkback thread running even
though the corresponding vbd device or guest is gone.
And this could cause some problems, for example, if the backend device type
is file, some loop devices and blkback thread always lingers there forever
after guest is destroyed, and this causes failure of umounting repositories
unless rebooting the dom0.
This patch allows thread of every queue has the chance to get stopped.
Otherwise, only thread of queue previous to(including) first busy one get
stopped, blkthread of remaining queue will still run.  So stop all threads
properly and return -EBUSY if any queue has inflight I/O.
Signed-off-by: Annie Li <annie.li@oracle.com>
Reviewed-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Reviewed-by: Adnan Misherfi <adnan.misherfi@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

dc52d783

15 Aug, 2017 2 commits

xen-blkfront: use a right index when checking requests · b15bd8cb

Munehisa Kamata authored Aug 09, 2017

Since commit d05d7f40 ("Merge branch 'for-4.8/core' of
git://git.kernel.dk/linux-block") and 3fc9d690 ("Merge branch
'for-4.8/drivers' of git://git.kernel.dk/linux-block"), blkfront_resume()
has been using an index for iterating ring_info to check request when
iterating blk_shadow in an inner loop. This seems to have been
accidentally introduced during the massive rewrite of the block layer
macros in the commits.

This may cause crash like this:

[11798.057074] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[11798.058832] IP: [<ffffffff814411fa>] blkfront_resume+0x10a/0x610
....
[11798.061063] Call Trace:
[11798.061063]  [<ffffffff8139ce93>] xenbus_dev_resume+0x53/0x140
[11798.061063]  [<ffffffff8139ce40>] ? xenbus_dev_probe+0x150/0x150
[11798.061063]  [<ffffffff813f359e>] dpm_run_callback+0x3e/0x110
[11798.061063]  [<ffffffff813f3a08>] device_resume+0x88/0x190
[11798.061063]  [<ffffffff813f4cc0>] dpm_resume+0x100/0x2d0
[11798.061063]  [<ffffffff813f5221>] dpm_resume_end+0x11/0x20
[11798.061063]  [<ffffffff813950a8>] do_suspend+0xe8/0x1a0
[11798.061063]  [<ffffffff813954bd>] shutdown_handler+0xfd/0x130
[11798.061063]  [<ffffffff8139aba0>] ? split+0x110/0x110
[11798.061063]  [<ffffffff8139ac26>] xenwatch_thread+0x86/0x120
[11798.061063]  [<ffffffff810b4570>] ? prepare_to_wait_event+0x110/0x110
[11798.061063]  [<ffffffff8108fe57>] kthread+0xd7/0xf0
[11798.061063]  [<ffffffff811da811>] ? kfree+0x121/0x170
[11798.061063]  [<ffffffff8108fd80>] ? kthread_park+0x60/0x60
[11798.061063]  [<ffffffff810863b0>] ?  call_usermodehelper_exec_work+0xb0/0xb0
[11798.061063]  [<ffffffff810864ea>] ?  call_usermodehelper_exec_async+0x13a/0x140
[11798.061063]  [<ffffffff81534a45>] ret_from_fork+0x25/0x30

Use the right index in the inner loop.

Fixes: d05d7f40 ("Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-block")
Fixes: 3fc9d690 ("Merge branch 'for-4.8/drivers' of git://git.kernel.dk/linux-block")
Signed-off-by: Munehisa Kamata <kamatam@amazon.com>
Reviewed-by: Thomas Friebel <friebelt@amazon.de>
Reviewed-by: Eduardo Valentin <eduval@amazon.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Roger Pau Monne <roger.pau@citrix.com>
Cc: xen-devel@lists.xenproject.org
Cc: stable@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

b15bd8cb

xen: fix bio vec merging · 462cdace

Roger Pau Monne authored Jul 18, 2017

The current test for bio vec merging is not fully accurate and can be
tricked into merging bios when certain grant combinations are used.
The result of these malicious bio merges is a bio that extends past
the memory page used by any of the originating bios.

Take into account the following scenario, where a guest creates two
grant references that point to the same mfn, ie: grant 1 -> mfn A,
grant 2 -> mfn A.

These references are then used in a PV block request, and mapped by
the backend domain, thus obtaining two different pfns that point to
the same mfn, pfn B -> mfn A, pfn C -> mfn A.

If those grants happen to be used in two consecutive sectors of a disk
IO operation becoming two different bios in the backend domain, the
checks in xen_biovec_phys_mergeable will succeed, because bfn1 == bfn2
(they both point to the same mfn). However due to the bio merging,
the backend domain will end up with a bio that expands past mfn A into
mfn A + 1.

Fix this by making sure the check in xen_biovec_phys_mergeable takes
into account the offset and the length of the bio, this basically
replicates whats done in __BIOVEC_PHYS_MERGEABLE using mfns (bus
addresses). While there also remove the usage of
__BIOVEC_PHYS_MERGEABLE, since that's already checked by the callers
of xen_biovec_phys_mergeable.

CC: stable@vger.kernel.org
Reported-by: "Jan H. Schönherr" <jschoenh@amazon.de>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

462cdace

25 Jul, 2017 2 commits

xen/blkfront: always allocate grants first from per-queue persistent grants · bd912ef3

Dongli Zhang authored Jun 28, 2017

This patch partially reverts 3df0e505 ("xen/blkfront: pseudo support for
multi hardware queues/rings"). The xen-blkfront queue/ring might hang due
to grants allocation failure in the situation when gnttab_free_head is
almost empty while many persistent grants are reserved for this queue/ring.

As persistent grants management was per-queue since 73716df ("xen/blkfront:
make persistent grants pool per-queue"), we should always allocate from
persistent grants first.
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

bd912ef3

xen-blkfront: fix mq start/stop race · 4b422cb9

Junxiao Bi authored Jul 20, 2017

When ring buf full, hw queue will be stopped. While blkif interrupt consume
request and make free space in ring buf, hw queue will be started again.
But since start queue is protected by spin lock while stop not, that will
cause a race.

interrupt:                                      process:
blkif_interrupt()                               blkif_queue_rq()
 kick_pending_request_queues_locked()
   blk_mq_start_stopped_hw_queues()
      clear_bit(BLK_MQ_S_STOPPED, &hctx->state)
	                                             blk_mq_stop_hw_queue(hctx)
	  blk_mq_run_hw_queue(hctx, async)

If ring buf is made empty in this case, interrupt will never come, then the
hw queue will be stopped forever, all processes waiting for the pending io
in the queue will hung.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ankur Arora <ankur.a.arora@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

4b422cb9

24 Jul, 2017 3 commits

blk-mq: map queues to all present CPUs · 76451d79

Christoph Hellwig authored Jul 18, 2017

We already do this for PCI mappings, and the higher level code now
expects that CPU on/offlining doesn't have an affect on the queue
mappings.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

76451d79

block: disable runtime-pm for blk-mq · 765e40b6

Christoph Hellwig authored Jul 21, 2017

The blk-mq code lacks support for looking at the rpm_status field, tracking
active requests and the RQF_PM flag.

Due to the default switch to blk-mq for scsi people start to run into
suspend / resume issue due to this fact, so make sure we disable the runtime
PM functionality until it is properly implemented.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

765e40b6

xen-blkfront: Fix handling of non-supported operations · 31c4ccc3

Bart Van Assche authored Jul 21, 2017

This patch fixes the following sparse warnings:

drivers/block/xen-blkfront.c:916:45: warning: incorrect type in argument 2 (different base types)
drivers/block/xen-blkfront.c:916:45: expected restricted blk_status_t [usertype] error
drivers/block/xen-blkfront.c:916:45: got int [signed] error
drivers/block/xen-blkfront.c:1599:47: warning: incorrect type in assignment (different base types)
drivers/block/xen-blkfront.c:1599:47: expected int [signed] error
drivers/block/xen-blkfront.c:1599:47: got restricted blk_status_t [usertype] <noident>
drivers/block/xen-blkfront.c:1607:55: warning: incorrect type in assignment (different base types)
drivers/block/xen-blkfront.c:1607:55: expected int [signed] error
drivers/block/xen-blkfront.c:1607:55: got restricted blk_status_t [usertype] <noident>
drivers/block/xen-blkfront.c:1625:55: warning: incorrect type in assignment (different base types)
drivers/block/xen-blkfront.c:1625:55: expected int [signed] error
drivers/block/xen-blkfront.c:1625:55: got restricted blk_status_t [usertype] <noident>
drivers/block/xen-blkfront.c:1628:62: warning: restricted blk_status_t degrades to integer

Compile-tested only.

Fixes: commit 2a842aca ("block: introduce new block status code type")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: <xen-devel@lists.xenproject.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

31c4ccc3

22 Jul, 2017 4 commits

nbd: only set sndtimeo if we have a timeout set · a7ee8cf1

Josef Bacik authored Jul 21, 2017

A user reported that he was getting immediate disconnects with my
sndtimeo patch applied.  This is because by default the OSS nbd client
doesn't set a timeout, so we end up setting the sndtimeo to 0, which of
course means we have send errors a lot.  Instead only set our sndtimeo
if the user specified a timeout, otherwise we'll just wait forever like
we did previously.

Fixes: dc88e34d ("nbd: set sk->sk_sndtimeo for our sockets")
Reported-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a7ee8cf1

nbd: take tx_lock before disconnecting · b4b2aecc

Josef Bacik authored Jul 21, 2017

We need to take the tx_lock so we don't interleave our disconnect
request between real data going down the wire.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b4b2aecc

nbd: allow multiple disconnects to be sent · 2e13456f

Josef Bacik authored Jul 21, 2017

There's no reason to limit ourselves to one disconnect message per
socket.  Sometimes networks do strange things, might as well let
sysadmins hit the panic button as much as they want.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2e13456f

MAINTAINERS: fix alphabetical ordering · 82abbea7

Randy Dunlap authored Jul 21, 2017

Fix major alphabetic errors.  No attempt to fix items that all begin
with the same word (like ARM, BROADCOM, DRM, EDAC, FREESCALE, INTEL,
OMAP, PCI, SAMSUNG, TI, USB, etc.).

(diffstat +/- is different by one line because TI KEYSTONE MULTICORE
had 2 blank lines after it.)
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

82abbea7

21 Jul, 2017 28 commits

Merge tag 'nfs-for-4.13-2' of git://git.linux-nfs.org/projects/anna/linux-nfs · 505d5c11

Linus Torvalds authored Jul 21, 2017

Pull NFS client bugfixes from Anna Schumaker:
 "Stable bugfix:
   - Fix error reporting regression

  Bugfixes:
   - Fix setting filelayout ds address race
   - Fix subtle access bug when using ACLs
   - Fix setting mnt3_counts array size
   - Fix a couple of pNFS commit races"

* tag 'nfs-for-4.13-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS/filelayout: Fix racy setting of fl->dsaddr in filelayout_check_deviceid()
  NFS: Be more careful about mapping file permissions
  NFS: Store the raw NFS access mask in the inode's access cache
  NFSv3: Convert nfs3_proc_access() to use nfs_access_set_mask()
  NFS: Refactor NFS access to kernel access mask calculation
  net/sunrpc/xprt_sock: fix regression in connection error reporting.
  nfs: count correct array for mnt3_counts array size
  Revert commit 722f0b89 ("pNFS: Don't send COMMITs to the DSes if...")
  pNFS/flexfiles: Handle expired layout segments in ff_layout_initiate_commit()
  NFS: Fix another COMMIT race in pNFS
  NFS: Fix a COMMIT race in pNFS
  mount: copy the port field into the cloned nfs_server structure.
  NFS: Don't run wake_up_bit() when nobody is waiting...
  nfs: add export operations

505d5c11

Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs · 99313414

Linus Torvalds authored Jul 21, 2017

Pull overlayfs fixes from Miklos Szeredi:
 "This fixes a crash with SELinux and several other old and new bugs"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: check for bad and whiteout index on lookup
  ovl: do not cleanup directory and whiteout index entries
  ovl: fix xattr get and set with selinux
  ovl: remove unneeded check for IS_ERR()
  ovl: fix origin verification of index dir
  ovl: mark parent impure on ovl_link()
  ovl: fix random return value on mount

99313414

Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 0151ef00

Linus Torvalds authored Jul 21, 2017

Pull block fixes from Jens Axboe:
 "A small set of fixes for -rc2 - two fixes for BFQ, documentation and
  code, and a removal of an unused variable in nbd. Outside of that, a
  small collection of fixes from the usual crew on the nvme side"

* 'for-linus' of git://git.kernel.dk/linux-block:
  nvmet: don't report 0-bytes in serial number
  nvmet: preserve controller serial number between reboots
  nvmet: Move serial number from controller to subsystem
  nvmet: prefix version configfs file with attr
  nvme-pci: Fix an error handling path in 'nvme_probe()'
  nvme-pci: Remove nvme_setup_prps BUG_ON
  nvme-pci: add another device ID with stripe quirk
  nvmet-fc: fix byte swapping in nvmet_fc_ls_create_association
  nvme: fix byte swapping in the streams code
  nbd: kill unused ret in recv_work
  bfq: dispatch request to prevent queue stalling after the request completion
  bfq: fix typos in comments about B-WF2Q+ algorithm

0151ef00

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · bb236dbe

Linus Torvalds authored Jul 21, 2017

Pull more rdma fixes from Doug Ledford:
 "As per my previous pull request, there were two drivers that each had
  a rather large number of legitimate fixes still to be sent.

  As it turned out, I also missed a reasonably large set of fixes from
  one person across the stack that are all important fixes. All in all,
  the bnxt_re, i40iw, and Dan Carpenter are 3/4 to 2/3rds of this pull
  request.

  There were some other random fixes that I didn't send in the last pull
  request that I added to this one. This catches the rdma stack up to
  the fixes from up to about the beginning of this week. Any more fixes
  I'll wait and batch up later in the -rc cycle. This will give us a
  good base to start with for basing a for-next branch on -rc2.

  Summary:

   - i40iw fixes

   - bnxt_re fixes

   - Dan Carpenter bugfixes across stack

   - ten more random fixes, no more than two from any one person"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits)
  RDMA/core: Initialize port_num in qp_attr
  RDMA/uverbs: Fix the check for port number
  IB/cma: Fix reference count leak when no ipv4 addresses are set
  RDMA/iser: don't send an rkey if all data is written as immadiate-data
  rxe: fix broken receive queue draining
  RDMA/qedr: Prevent memory overrun in verbs' user responses
  iw_cxgb4: don't use WR keys/addrs for 0 byte reads
  IB/mlx4: Fix CM REQ retries in paravirt mode
  IB/rdmavt: Setting of QP timeout can overflow jiffies computation
  IB/core: Fix sparse warnings
  RDMA/bnxt_re: Fix the value reported for local ack delay
  RDMA/bnxt_re: Report MISSED_EVENTS in req_notify_cq
  RDMA/bnxt_re: Fix return value of poll routine
  RDMA/bnxt_re: Enable atomics only if host bios supports
  RDMA/bnxt_re: Specify RDMA component when allocating stats context
  RDMA/bnxt_re: Fixed the max_rd_atomic support for initiator and destination QP
  RDMA/bnxt_re: Report supported value to IB stack in query_device
  RDMA/bnxt_re: Do not free the ctx_tbl entry if delete GID fails
  RDMA/bnxt_re: Fix WQE Size posted to HW to prevent it from throwing error
  RDMA/bnxt_re: Free doorbell page index (DPI) during dealloc ucontext
  ...

bb236dbe

Merge tag 'drm-fixes-for-v4.13-rc2' of git://people.freedesktop.org/~airlied/linux · 24a1635a

Linus Torvalds authored Jul 21, 2017

Pull drm fixes from Dave Airlie:
 "A bunch of fixes for rc2: two imx regressions, vc4 fix, dma-buf fix,
  some displayport mst fixes, and an amdkfd fix.

  Nothing too crazy, I assume we just haven't see much rc1 testing yet"

* tag 'drm-fixes-for-v4.13-rc2' of git://people.freedesktop.org/~airlied/linux:
  drm/mst: Avoid processing partially received up/down message transactions
  drm/mst: Avoid dereferencing a NULL mstb in drm_dp_mst_handle_up_req()
  drm/mst: Fix error handling during MST sideband message reception
  drm/imx: parallel-display: Accept drm_of_find_panel_or_bridge failure
  drm/imx: fix typo in ipu_plane_formats[]
  drm/vc4: Fix VBLANK handling in crtc->enable() path
  dma-buf/fence: Avoid use of uninitialised timestamp
  drm/amdgpu: Remove unused field kgd2kfd_shared_resources.num_mec
  drm/radeon: Remove initialization of shared_resources.num_mec
  drm/amdkfd: Remove unused references to shared_resources.num_mec
  drm/amdgpu: Fix KFD oversubscription by tracking queues correctly

24a1635a

Merge tag 'trace-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · f79ec886

Linus Torvalds authored Jul 21, 2017

Pull tracing fixes from Steven Rostedt:
 "Three minor updates

   - Use the new GFP_RETRY_MAYFAIL to be more aggressive in allocating
     memory for the ring buffer without causing OOMs

   - Fix a memory leak in adding and removing instances

   - Add __rcu annotation to be able to debug RCU usage of function
     tracing a bit better"

* tag 'trace-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  trace: fix the errors caused by incompatible type of RCU variables
  tracing: Fix kmemleak in instance_rmdir
  tracing/ring_buffer: Try harder to allocate

f79ec886

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · b0a75281

Linus Torvalds authored Jul 21, 2017

Pull KVM fixes from Radim Krčmář:
 "A bunch of small fixes for x86"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: x86: hyperv: avoid livelock in oneshot SynIC timers
  KVM: VMX: Fix invalid guest state detection after task-switch emulation
  x86: add MULTIUSER dependency for KVM
  KVM: nVMX: Disallow VM-entry in MOV-SS shadow
  KVM: nVMX: track NMI blocking state separately for each VMCS
  KVM: x86: masking out upper bits

b0a75281

Merge tag 'powerpc-4.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 10fc9554

Linus Torvalds authored Jul 21, 2017

Pull powerpc fixes from Michael Ellerman:
 "A handful of fixes, mostly for new code:

   - some reworking of the new STRICT_KERNEL_RWX support to make sure we
     also remove executable permission from __init memory before it's
     freed.

   - a fix to some recent optimisations to the hypercall entry where we
     were clobbering r12, this was breaking nested guests (PR KVM).

   - a fix for the recent patch to opal_configure_cores(). This could
     break booting on bare metal Power8 boxes if the kernel was built
     without CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG.

   - .. and finally a workaround for spurious PMU interrupts on Power9
     DD2.

  Thanks to: Nicholas Piggin, Anton Blanchard, Balbir Singh"

* tag 'powerpc-4.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm: Mark __init memory no-execute when STRICT_KERNEL_RWX=y
  powerpc/mm/hash: Refactor hash__mark_rodata_ro()
  powerpc/mm/radix: Refactor radix__mark_rodata_ro()
  powerpc/64s: Fix hypercall entry clobbering r12 input
  powerpc/perf: Avoid spurious PMU interrupts after idle
  powerpc/powernv: Fix boot on Power8 bare metal due to opal_configure_cores()

10fc9554

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4ec9f7a1

Linus Torvalds authored Jul 21, 2017

Pull x86 fixes from Ingo Molnar:
 "Half of the fixes are for various build time warnings triggered by
  randconfig builds. Most (but not all...) were harmless.

  There's also:

   - ACPI boundary condition fixes

   - UV platform fixes

   - defconfig updates

   - an AMD K6 CPU init fix

   - a %pOF printk format related preparatory change

   - .. and a warning fix related to the tlb/PCID changes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/devicetree: Convert to using %pOF instead of ->full_name
  x86/platform/uv/BAU: Disable BAU on single hub configurations
  x86/platform/intel-mid: Fix a format string overflow warning
  x86/platform: Add PCI dependency for PUNIT_ATOM_DEBUG
  x86/build: Silence the build with "make -s"
  x86/io: Add "memory" clobber to insb/insw/insl/outsb/outsw/outsl
  x86/fpu/math-emu: Avoid bogus -Wint-in-bool-context warning
  x86/fpu/math-emu: Fix possible uninitialized variable use
  perf/x86: Shut up false-positive -Wmaybe-uninitialized warning
  x86/defconfig: Remove stale, old Kconfig options
  x86/ioapic: Pass the correct data to unmask_ioapic_irq()
  x86/acpi: Prevent out of bound access caused by broken ACPI tables
  x86/mm, KVM: Fix warning when !CONFIG_PREEMPT_COUNT
  x86/platform/uv/BAU: Fix congested_response_us not taking effect
  x86/cpu: Use indirect call to measure performance in init_amd_k6()

4ec9f7a1

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e234b4a8

Linus Torvalds authored Jul 21, 2017

Pull timer fix from Ingo Molnar:
 "A timer_irq_init() clocksource API robustness fix"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource/drivers/timer-of: Handle of_irq_get_byname() result correctly

e234b4a8

Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5a77f025

Linus Torvalds authored Jul 21, 2017

Pull scheduler fixes from Ingo Molnar:
 "A cputime fix and code comments/organization fix to the deadline
  scheduler"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/deadline: Fix confusing comments about selection of top pi-waiter
  sched/cputime: Don't use smp_processor_id() in preemptible context

5a77f025

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · bbcdea65

Linus Torvalds authored Jul 21, 2017

Pull perf fixes from Ingo Molnar:
 "Two hw-enablement patches, two race fixes, three fixes for regressions
  of semantics, plus a number of tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Add proper condition to run sched_task callbacks
  perf/core: Fix locking for children siblings group read
  perf/core: Fix scheduling regression of pinned groups
  perf/x86/intel: Fix debug_store reset field for freq events
  perf/x86/intel: Add Goldmont Plus CPU PMU support
  perf/x86/intel: Enable C-state residency events for Apollo Lake
  perf symbols: Accept zero as the kernel base address
  Revert "perf/core: Drop kernel samples even though :u is specified"
  perf annotate: Fix broken arrow at row 0 connecting jmp instruction to its target
  perf evsel: State in the default event name if attr.exclude_kernel is set
  perf evsel: Fix attr.exclude_kernel setting for default cycles:p

bbcdea65

Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8b810a3a

Linus Torvalds authored Jul 21, 2017

Pull locking fixlet from Ingo Molnar:
 "Remove an unnecessary priority adjustment in the rtmutex code"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/rtmutex: Remove unnecessary priority adjustment

8b810a3a

NFS/filelayout: Fix racy setting of fl->dsaddr in filelayout_check_deviceid() · 1ebf9801

Trond Myklebust authored Jul 20, 2017

We must set fl->dsaddr once, and once only, even if there are multiple
processes calling filelayout_check_deviceid() for the same layout
segment.
Reported-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

1ebf9801

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 34eddefe

Linus Torvalds authored Jul 21, 2017

Pull irq fixes from Ingo Molnar:
 "A resume_irq() fix, plus a number of static declaration fixes"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/digicolor: Drop unnecessary static
  irqchip/mips-cpu: Drop unnecessary static
  irqchip/gic/realview: Drop unnecessary static
  irqchip/mips-gic: Remove population of irq domain names
  genirq/PM: Properly pretend disabled state when force resuming interrupts

34eddefe

Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0a6109fd

Linus Torvalds authored Jul 21, 2017

Pull core fixes from Ingo Molnar:
 "A fix to WARN_ON_ONCE() done by modules, plus a MAINTAINERS update"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  debug: Fix WARN_ON_ONCE() for modules
  MAINTAINERS: Update the PTRACE entry

0a6109fd

NFS: Be more careful about mapping file permissions · ecbb903c

Trond Myklebust authored Jul 11, 2017

When mapping a directory, we want the MAY_WRITE permissions to reflect
whether or not we have permission to modify, add and delete the directory
entries. MAY_EXEC must map to lookup permissions.

On the other hand, for files, we want MAY_WRITE to reflect a permission
to modify and extend the file.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

ecbb903c

NFS: Store the raw NFS access mask in the inode's access cache · bd8b2441

Trond Myklebust authored Jul 11, 2017

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

bd8b2441

NFSv3: Convert nfs3_proc_access() to use nfs_access_set_mask() · eda3e208

Trond Myklebust authored Jul 11, 2017

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

eda3e208

NFS: Refactor NFS access to kernel access mask calculation · 15d4b73a

Trond Myklebust authored Jul 11, 2017

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

15d4b73a

net/sunrpc/xprt_sock: fix regression in connection error reporting. · 3ffbc1d6

NeilBrown authored Jul 19, 2017

Commit 3d476263 ("tcp: remove poll() flakes when receiving
RST") in v4.12 changed the order in which ->sk_state_change()
and ->sk_error_report() are called when a socket is shut
down - sk_state_change() is now called first.

This causes xs_tcp_state_change() -> xs_sock_mark_closed() ->
xprt_disconnect_done() to wake all pending tasked with -EAGAIN.
When the ->sk_error_report() callback arrives, it is too late to
pass the error on, and it is lost.

As easy way to demonstrate the problem caused is to try to start
rpc.nfsd while rcpbind isn't running.
nfsd will attempt a tcp connection to rpcbind.  A ECONNREFUSED
error is returned, but sunrpc code loses the error and keeps
retrying.  If it saw the ECONNREFUSED, it would abort.

To fix this, handle the sk->sk_err in the TCP_CLOSE branch of
xs_tcp_state_change().

Fixes: 3d476263 ("tcp: remove poll() flakes when receiving RST")
Cc: stable@vger.kernel.org (v4.12)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

3ffbc1d6

nfs: count correct array for mnt3_counts array size · ecc7b435

Eryu Guan authored Jul 18, 2017

Array size of mnt3_counts should be the size of array
mnt3_procedures, not mnt_procedures, though they're same in size
right now. Found this by code inspection.

Fixes: 1c5876dd ("sunrpc: move p_count out of struct rpc_procinfo")
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

ecc7b435

x86/devicetree: Convert to using %pOF instead of ->full_name · db15e7f2

Rob Herring authored Jul 18, 2017

Now that we have a custom printf format specifier, convert users of
full_name to use %pOF instead. This is preparation to remove storing
of the full path string for each device node.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: devicetree@vger.kernel.org
Link: http://lkml.kernel.org/r/20170718214339.7774-7-robh@kernel.org
[ Clarify the error message while at it, as 'node' is ambiguous. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

db15e7f2

perf/x86/intel: Add proper condition to run sched_task callbacks · df6c3db8

Jiri Olsa authored Jul 19, 2017

We have 2 functions using the same sched_task callback:

  - PEBS drain for free running counters
  - LBR save/store

Both of them are called from intel_pmu_sched_task() and
either of them can be unwillingly triggered when the
other one is configured to run.

Let's say there's PEBS drain configured in sched_task
callback for the event, but in the callback itself
(intel_pmu_sched_task()) we will also run the code for
LBR save/restore, which we did not ask for, but the
code in intel_pmu_sched_task() does not check for that.

This can lead to extra cycles in some perf monitoring,
like when we monitor PEBS event without LBR data.

  # perf record --no-timestamp -c 10000 -e cycles:p ./perf bench sched pipe -l 1000000

  (We need PEBS, non freq/non timestamp event to enable
   the sched_task callback)

The perf stat of cycles and msr:write_msr for above
command before the change:
  ...
  Performance counter stats for './perf record --no-timestamp -c 10000 -e cycles:p \
                                 ./perf bench sched pipe -l 1000000' (5 runs):

    18,519,557,441      cycles:k
        91,195,527      msr:write_msr

      29.334476406 seconds time elapsed

And after the change:
  ...
  Performance counter stats for './perf record --no-timestamp -c 10000 -e cycles:p \
                                 ./perf bench sched pipe -l 1000000' (5 runs):

    18,704,973,540      cycles:k
        27,184,720      msr:write_msr

      16.977875900 seconds time elapsed

There's no affect on cycles:k because the sched_task happens
with events switched off, however the msr:write_msr tracepoint
counter together with almost 50% of time speedup show the
improvement.

Monitoring LBR event and having extra PEBS drain processing
in sched_task callback showed just a little speedup, because
the drain function does not do much extra work in case there
is no PEBS data.

Adding conditions to recognize the configured work that needs
to be done in the x86_pmu's sched_task callback.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170719075247.GA27506@kravaSigned-off-by: Ingo Molnar <mingo@kernel.org>

df6c3db8

x86/platform/uv/BAU: Disable BAU on single hub configurations · 2fe9a5c6

Andrew Banman authored Jul 20, 2017

The BAU confers no benefit to a UV system running with only one hub/socket.
Permanently disable the BAU driver if there are less than two hubs online
to avoid BAU overhead. We have observed failed boots on single-socket UV4
systems caused by BAU that are avoided with this patch.

Also, while at it, consolidate initialization error blocks and fix a
memory leak.
Signed-off-by: Andrew Banman <abanman@hpe.com>
Acked-by: Russ Anderson <rja@hpe.com>
Acked-by: Mike Travis <mike.travis@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: tony.ernst@hpe.com
Link: http://lkml.kernel.org/r/1500588351-78016-1-git-send-email-abanman@hpe.com
[ Minor cleanups. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

2fe9a5c6

perf/core: Fix locking for children siblings group read · 2aeb1883

Jiri Olsa authored Jul 20, 2017

We're missing ctx lock when iterating children siblings
within the perf_read path for group reading. Following
race and crash can happen:

User space doing read syscall on event group leader:

T1:
  perf_read
    lock event->ctx->mutex
    perf_read_group
      lock leader->child_mutex
      __perf_read_group_add(child)
        list_for_each_entry(sub, &leader->sibling_list, group_entry)

---->   sub might be invalid at this point, because it could
        get removed via perf_event_exit_task_context in T2

Child exiting and cleaning up its events:

T2:
  perf_event_exit_task_context
    lock ctx->mutex
    list_for_each_entry_safe(child_event, next, &child_ctx->event_list,...
      perf_event_exit_event(child)
        lock ctx->lock
        perf_group_detach(child)
        unlock ctx->lock

---->   child is removed from sibling_list without any sync
        with T1 path above

        ...
        free_event(child)

Before the child is removed from the leader's child_list,
(and thus is omitted from perf_read_group processing), we
need to ensure that perf_read_group touches child's
siblings under its ctx->lock.

Peter further notes:

| One additional note; this bug got exposed by commit:
|
|   ba5213ae ("perf/core: Correct event creation with PERF_FORMAT_GROUP")
|
| which made it possible to actually trigger this code-path.
Tested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: ba5213ae ("perf/core: Correct event creation with PERF_FORMAT_GROUP")
Link: http://lkml.kernel.org/r/20170720141455.2106-1-jolsa@kernel.orgSigned-off-by: Ingo Molnar <mingo@kernel.org>

2aeb1883

Merge tag 'imx-drm-fixes-2017-07-18' of git://git.pengutronix.de/git/pza/linux into drm-fixes · 5896ec77

Dave Airlie authored Jul 21, 2017

imx-drm: fix parallel display regression and typo in plane format list

- Fix a regression where the parallel-display driver would not probe
  anymore if no panel is specified in the device tree, since the
  introduction of drm_of_find_panel_or_bridge.
- Fix a typo in the plane format list: replace a duplicate BGRA8888 format
  with BGRX8888, as originally intended.

* tag 'imx-drm-fixes-2017-07-18' of git://git.pengutronix.de/git/pza/linux:
  drm/imx: parallel-display: Accept drm_of_find_panel_or_bridge failure
  drm/imx: fix typo in ipu_plane_formats[]

5896ec77

Merge tag 'drm-misc-fixes-2017-07-20' of git://anongit.freedesktop.org/git/drm-misc into drm-fixes · 660f6b5c

Dave Airlie authored Jul 21, 2017

Core Changes:
- fence: Introduce new fence flag to signify timestamp is populated (Chris)
- mst: Avoid processing incomplete data + fix NULL dereference (Imre)

Driver Changes:
- vc4: Avoid WARN from grabbing a ref from vblank that's not on (Boris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
Cc: Imre Deak <imre.deak@intel.com>

* tag 'drm-misc-fixes-2017-07-20' of git://anongit.freedesktop.org/git/drm-misc:
  drm/mst: Avoid processing partially received up/down message transactions
  drm/mst: Avoid dereferencing a NULL mstb in drm_dp_mst_handle_up_req()
  drm/mst: Fix error handling during MST sideband message reception
  drm/vc4: Fix VBLANK handling in crtc->enable() path
  dma-buf/fence: Avoid use of uninitialised timestamp

660f6b5c