Commits · 81b1dab45809234958653dca29d04d4161f0a476 · Kirill Smelkov / linux

21 May, 2018 1 commit

Merge branch 'nvme-4.18' of git://git.infradead.org/nvme into for-4.18/block · 81b1dab4

Jens Axboe authored May 21, 2018

Pull NVMe changes from Keith:

"This is just the first nvme pull request for 4.18. There are several
fabrics and target patches that I missed, so there will be more to
come."

* 'nvme-4.18' of git://git.infradead.org/nvme:
  nvme-pci: drop IRQ disabling on submission queue lock
  nvme-pci: split the nvme queue lock into submission and completion locks
  nvme-pci: handle completions outside of the queue lock
  nvme-pci: move ->cq_vector == -1 check outside of ->q_lock
  nvme-pci: remove cq check after submission
  nvme-pci: simplify nvme_cqe_valid
  nvme: mark the result argument to nvme_complete_async_event volatile
  nvme/pci: Sync controller reset for AER slot_reset
  nvme/pci: Hold controller reference during async probe
  nvme: only reconfigure discard if necessary
  nvme/pci: Use async_schedule for initial reset work
  nvme: lightnvm: add granby support
  NVMe: Add Quirk Delay before CHK RDY for Seagate Nytro Flash Storage
  nvme: change order of qid and cmdid in completion trace
  nvme: fc: provide a descriptive error

18 May, 2018 8 commits

nvme-pci: drop IRQ disabling on submission queue lock · 1eae349d

Jens Axboe authored May 17, 2018

Since we aren't sharing the lock for completions now, we don't
have to make it IRQ safe.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-pci: split the nvme queue lock into submission and completion locks · 1ab0cd69

Jens Axboe authored May 17, 2018

This is now feasible. We protect the submission queue ring with
->sq_lock, and the completion side with ->cq_lock.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-pci: handle completions outside of the queue lock · 5cb525c8

Jens Axboe authored May 17, 2018

Split the completion of events into a two-part process:

1) Reap the events inside the queue lock
2) Complete the events outside the queue lock

Since we never wrap the queue, we can access it locklessly after we've
updated the completion queue head. This patch started off with batching
events on the stack, but with this trick we don't have to. Keith Busch
<keith.busch@intel.com> came up with that idea.

Note that this kills the ->cqe_seen as well. I haven't been able to
trigger any ill effects of this. If we do race with polling every so
often, it should be rare enough NOT to trigger any issues.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: refactored, restored poll early exit optimization]
Signed-off-by: Christoph Hellwig <hch@lst.de>
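
The two steps above can be pictured with a small userspace sketch. This is not the driver code: `struct cq`, `cq_post()`, `cq_reap()` and `cq_complete()` are hypothetical names, an `atomic_flag` stands in for the queue spinlock, and the sketch assumes fewer than `CQ_DEPTH` completions are ever outstanding at once, so the reaped range never laps itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define CQ_DEPTH 8              /* hypothetical ring size */

struct cqe {
    uint16_t command_id;
    uint16_t status;            /* bit 0 carries the phase tag */
};

struct cq {
    atomic_flag lock;           /* stand-in for the queue spinlock */
    struct cqe entries[CQ_DEPTH];
    uint16_t head;              /* next entry the host will look at */
    uint8_t phase;              /* phase value that marks a new entry */
};

static void cq_init(struct cq *q)
{
    memset(q, 0, sizeof(*q));
    atomic_flag_clear(&q->lock);
    q->phase = 1;
}

/* Simulates the device posting a completion into a slot. */
static void cq_post(struct cq *q, uint16_t slot, uint16_t command_id,
                    uint8_t phase)
{
    q->entries[slot].command_id = command_id;
    q->entries[slot].status = phase & 1;
}

static int cqe_pending(const struct cq *q)
{
    return (q->entries[q->head].status & 1) == q->phase;
}

/* Step 1: under the lock, only advance the head past new entries. */
static void cq_reap(struct cq *q, uint16_t *start, uint16_t *end)
{
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;                       /* spin until the lock is free */
    *start = q->head;
    while (cqe_pending(q)) {
        if (++q->head == CQ_DEPTH) {
            q->head = 0;
            q->phase = !q->phase;
        }
    }
    *end = q->head;
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Step 2: without the lock, handle the entries in [start, end). */
static unsigned cq_complete(const struct cq *q, uint16_t start, uint16_t end,
                            uint16_t *ids)
{
    unsigned n = 0;

    while (start != end) {
        ids[n++] = q->entries[start].command_id;
        if (++start == CQ_DEPTH)
            start = 0;
    }
    return n;
}
```

The point of the split is that the potentially expensive per-command work in step 2 no longer extends the time the lock is held; only the head and phase bookkeeping does.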

nvme-pci: move ->cq_vector == -1 check outside of ->q_lock · d1f06f4a

Jens Axboe authored May 17, 2018

We only clear it dynamically in nvme_suspend_queue(). When we do, make sure
to do a full flush so that any nvme_queue_rq() invocation will see it.

Ideally we'd kill this check completely, but we're using it to flush
requests on a dying queue.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-pci: remove cq check after submission · f9dde187

Jens Axboe authored May 17, 2018

We always check the completion queue after submitting, but in my testing
this isn't a win even on DRAM/xpoint devices. In some cases it's
actually worse. Kill it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-pci: simplify nvme_cqe_valid · 750dde44

Christoph Hellwig authored May 18, 2018

We always look at the current CQ head and phase, so don't pass these
as separate arguments, and rename the function to nvme_cqe_pending.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: mark the result argument to nvme_complete_async_event volatile · 287a63eb

Christoph Hellwig authored May 17, 2018

We'll need that in the PCIe driver soon as we'll read it straight off the
CQ.
Signed-off-by: Christoph Hellwig <hch@lst.de>

blk-mq: clear hctx->dispatch_from when mappings change · d416c92c

huhai authored May 18, 2018

When the number of hardware queues is changed, the drivers will call
blk_mq_update_nr_hw_queues() to remap hardware queues. This changes
the ctx mappings, but the current code doesn't clear the
->dispatch_from hint. This can result in dispatch_from pointing to
a ctx that isn't mapped to the hctx anymore.

Fixes: b347689f ("blk-mq-sched: improve dispatching from sw queue")
Signed-off-by: huhai <huhai@kylinos.cn>
Reviewed-by: Ming Lei <ming.lei@redhat.com>

Moved the placement of the clearing to where we clear other items
pertaining to the existing mapping, added Fixes line, and reworded
the commit message.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

16 May, 2018 8 commits

nbd: call nbd_bdev_reset instead of bd_set_size on disconnect · 76aa1d34

Josef Bacik authored May 16, 2018

We need to make sure we don't just set the size of the bdev to 0 while
it's being used by a file system.  We have the appropriate check in
nbd_bdev_reset, simply use that helper instead.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nbd: fix how we set bd_invalidated · fe1f9e66

Josef Bacik authored May 16, 2018

bd_invalidated is kind of a pain wrt partitions, as it only triggers the
partition rescan if it is set after bd_ops->open() runs, so setting it
when we reset the device isn't useful.  We would also sporadically still
have partitions left over in some disconnect cases.  Fix this by always
setting bd_invalidated on open if there's no configuration or if a
disconnect has happened, so that the partition table gets invalidated
and rescanned properly.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nbd: clear_sock on netlink disconnect · 96d97e17

Josef Bacik authored May 16, 2018

This is what the ioctl based nbd disconnect does as well.  Without this
the device will just sit there and wait for the connection to go away
(or IO to occur) before the device gets torn down.  Instead clear
everything up on our end so the configuration goes away as quickly as
possible.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nbd: use bd_set_size when updating disk size · 9e2b1967

Josef Bacik authored May 16, 2018

When we stopped relying on the bdev everywhere I broke updating the
block device size on the fly, which ceph relies on.  We can't just do
set_capacity, we also have to do bd_set_size so things like parted will
notice the device size change.

Fixes: 29eaadc0 ("nbd: stop using the bdev everywhere")
cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nbd: update size when connected · c3f7c939

Josef Bacik authored May 16, 2018

I messed up changing the size of an NBD device while it was connected by
not actually updating the device or doing the uevent.  Fix this by
updating everything if we're connected and we change the size.

cc: stable@vger.kernel.org
Fixes: 639812a1 ("nbd: don't set the device size until we're connected")
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nbd: fix nbd device deletion · 8364da47

Josef Bacik authored May 16, 2018

This fixes a use-after-free bug: we shouldn't be dereferencing
disk->queue right after we call del_gendisk(disk).  Save the queue and
do the cleanup after the del_gendisk().

Fixes: c6a4759e ("nbd: add device refcounting")
cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
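
The ordering described above (read the pointer first, tear down second, clean up through the saved copy) can be shown with a userspace analogue. The types and helpers here (`struct disk`, `del_disk()`, `cleanup_queue()`, `dev_remove()`) are hypothetical stand-ins, not the nbd or block-layer functions; `del_disk()` simply clears the field so that a late read would be easy to spot.

```c
#include <assert.h>
#include <stddef.h>

struct queue {
    int cleaned_up;
};

struct disk {
    struct queue *queue;
};

/*
 * Hypothetical stand-in for the teardown call: once it returns, the
 * disk's fields must be treated as gone. Clearing the pointer makes a
 * late dereference obvious in this sketch.
 */
static void del_disk(struct disk *d)
{
    d->queue = NULL;
}

static void cleanup_queue(struct queue *q)
{
    assert(q != NULL);
    q->cleaned_up = 1;
}

static void dev_remove(struct disk *d)
{
    struct queue *q = d->queue;     /* save before the teardown */

    del_disk(d);
    cleanup_queue(q);               /* use the saved copy, not d->queue */
}
```

Swapping the first two statements of `dev_remove()` would hand `cleanup_queue()` a pointer read from an object that is no longer valid, which is the shape of the bug the message describes.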

block: fix MAINTAINERS email for nbd · 3de9beee

Josef Bacik authored May 16, 2018

I've been missing stuff because it's been going into my work email which
is a black hole.  Update to the email I actually use so I stop missing
patches and bug reports.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-mq: remove redundant insert case in blk_mq_make_request() · 8fa9f556

huhai authored May 16, 2018

We can use blk_mq_sched_insert_request() even if we don't have
an IO scheduler attached, since that case will end up being
exactly the same as what blk_mq_queue_io() does now.
Signed-off-by: huhai <huhai@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

15 May, 2018 1 commit

Remove jsflash driver · da3c6efe

Jens Axboe authored May 15, 2018

Nobody is using it anymore, and it's been abandoned. Since David
is fine with removing it, kill it.
Suggested-by: Christoph Hellwig <hch@lst.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

14 May, 2018 18 commits

block: Add sysfs entry for fua support · 6fcefbe5

Kent Overstreet authored May 08, 2018

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Export bio check/set pages_dirty · 1900fcc4

Kent Overstreet authored May 08, 2018

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Add warning for bi_next not NULL in bio_endio() · 0ba99ca4

Kent Overstreet authored May 08, 2018

Recently found a bug where a driver left bi_next not NULL and then
called bio_endio(), and then the submitter of the bio used
bio_copy_data() which was treating src and dst as lists of bios.

Fixed that bug by splitting out bio_list_copy_data(), but in case other
things are depending on bi_next in weird ways, add a warning to help
avoid more bugs like that in the future.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Add missing flush_dcache_page() call · 6e6e811d

Kent Overstreet authored May 08, 2018

Since a bio can point to userspace pages (e.g. direct IO), this is
generally necessary.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Split out bio_list_copy_data() · 45db54d5

Kent Overstreet authored May 08, 2018

Found a bug (with ASAN) where we were passing a bio to bio_copy_data()
with bi_next not NULL, when it should have been - a driver had left
bi_next set to something after calling bio_endio().

Since the normal case is only copying single bios, split out
bio_list_copy_data() to avoid more bugs like this in the future.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
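
A userspace analogue of the split, assuming a hypothetical `struct chunk` with a `next` link in place of a bio and its bi_next: `copy_one()` never follows the link, so a stale `next` cannot pull in unrelated data, while `copy_list()` is the explicit opt-in for chained buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a buffer that may be chained. */
struct chunk {
    char *data;
    size_t len;
    struct chunk *next;
};

static size_t min_size(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* Copies a single chunk; deliberately ignores ->next on both sides. */
static size_t copy_one(struct chunk *dst, const struct chunk *src)
{
    size_t n = min_size(dst->len, src->len);

    memcpy(dst->data, src->data, n);
    return n;
}

/* Copies across two chains, following ->next on both sides. */
static size_t copy_list(struct chunk *dst, const struct chunk *src)
{
    size_t doff = 0, soff = 0, total = 0;

    while (dst && src) {
        size_t n = min_size(dst->len - doff, src->len - soff);

        memcpy(dst->data + doff, src->data + soff, n);
        doff += n;
        soff += n;
        total += n;
        if (doff == dst->len) {     /* destination chunk is full */
            dst = dst->next;
            doff = 0;
        }
        if (soff == src->len) {     /* source chunk is drained */
            src = src->next;
            soff = 0;
        }
    }
    return total;
}
```

Keeping the common single-buffer case free of any list walking means a forgotten link is harmless there, and callers that really do pass chains say so by name.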

block: Add bio_copy_data_iter(), zero_fill_bio_iter() · 38a72dac

Kent Overstreet authored May 08, 2018

Add versions that take bvec_iter args instead of using bio->bi_iter - to
be used by bcachefs.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Use bioset_init() for fs_bio_set · f4f8154a

Kent Overstreet authored May 08, 2018

Minor optimization - remove a pointer indirection when using fs_bio_set.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Add bioset_init()/bioset_exit() · 917a38c7

Kent Overstreet authored May 08, 2018

Similarly to mempool_init()/mempool_exit(), take a pointer indirection
out of allocation/freeing by allowing biosets to be embedded in other
structs.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Convert bio_set to mempool_init() · 8aa6ba2f

Kent Overstreet authored May 08, 2018

Minor performance improvement by getting rid of pointer indirections
from allocation/freeing fastpaths.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

mempool: Add mempool_init()/mempool_exit() · c1a67fef

Kent Overstreet authored May 04, 2015

Allows mempools to be embedded in other structs, getting rid of a
pointer indirection from allocation fastpaths.

mempool_exit() is safe to call on an uninitialized but zeroed mempool.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
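
The embedding idea can be sketched in userspace C. This is only an analogue under stated assumptions: `struct pool`, `pool_init()`, `pool_exit()` and `struct device_ctx` are hypothetical names, and malloc()/free() stand in for the kernel allocators. The pool lives inside its owner, so reaching it costs no extra pointer load, and `pool_exit()` tolerates an all-zero pool.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pool of preallocated elements. */
struct pool {
    void **elements;
    int min_nr;
    int curr_nr;
};

/* Safe on a zeroed, never-initialized pool: nothing to free. */
static void pool_exit(struct pool *p)
{
    while (p->curr_nr > 0)
        free(p->elements[--p->curr_nr]);
    free(p->elements);              /* free(NULL) is a no-op */
    p->elements = NULL;
}

/* Initializes caller-owned storage instead of allocating a pool. */
static int pool_init(struct pool *p, int min_nr, size_t elem_size)
{
    assert(p != NULL);
    memset(p, 0, sizeof(*p));
    if (min_nr <= 0 || elem_size == 0)
        return -1;

    p->elements = calloc((size_t)min_nr, sizeof(*p->elements));
    if (!p->elements)
        return -1;
    p->min_nr = min_nr;

    while (p->curr_nr < min_nr) {
        void *e = malloc(elem_size);

        if (!e) {
            pool_exit(p);           /* unwind the partial fill */
            return -1;
        }
        p->elements[p->curr_nr++] = e;
    }
    return 0;
}

/* The owner embeds the pool directly rather than holding a pointer. */
struct device_ctx {
    int id;
    struct pool pool;
};
```

Because exit is harmless on zeroed storage, an owner's error path can call it unconditionally without tracking whether init ever ran.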

sbitmap: fix race in wait batch accounting · c854ab57

Jens Axboe authored May 14, 2018

If we have multiple callers of sbq_wake_up(), we can end up in a
situation where the wait_cnt will continually go more and more
negative. Consider the case where our wake batch is 1, hence
wait_cnt will start out as 1.

wait_cnt == 1

CPU0				CPU1
atomic_dec_return(), cnt == 0
				atomic_dec_return(), cnt == -1
				cmpxchg(-1, 0) (succeeds)
				[wait_cnt now 0]
cmpxchg(0, 1) (fails)

This ends up with wait_cnt being 0, so we'll wake up immediately
next time. Going through the same loop as above again, we'll
end up with wait_cnt at -1.

For the case where we have a larger wake batch, the only
difference is that the starting point will be higher. We'll
still end up with continually smaller batch wakeups, which
defeats the purpose of the rolling wakeups.

Always reset the wait_cnt to the batch value. Then it doesn't
matter who wins the race. But ensure that whoever does win
the race is the one that increments the ws index and wakes up
our batch count; the loser gets to call __sbq_wake_up() again to
account its wakeups towards the next active wait state index.

Fixes: 6c0ca7ae ("sbitmap: fix wakeup hang after sbq resize")
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
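
The reset-to-batch rule can be sketched with C11 atomics. This is a simplified userspace model, not the sbitmap code: `struct wait_queue`, `wake_up_once()` and `wq_wake_up()` are hypothetical names, the `woken` counter stands in for actually waking sleepers, and the sketch always uses the wait state at the current index instead of searching for one with active waiters.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_WAIT_STATES 8        /* hypothetical number of wait states */

struct wait_state {
    atomic_int wait_cnt;
    atomic_int woken;           /* stand-in for waking sleepers */
};

struct wait_queue {
    struct wait_state ws[NR_WAIT_STATES];
    atomic_uint wake_index;
    int wake_batch;
};

static void wq_init(struct wait_queue *wq, int wake_batch)
{
    wq->wake_batch = wake_batch;
    atomic_init(&wq->wake_index, 0);
    for (int i = 0; i < NR_WAIT_STATES; i++) {
        atomic_init(&wq->ws[i].wait_cnt, wake_batch);
        atomic_init(&wq->ws[i].woken, 0);
    }
}

/* Returns true if the caller lost the race and should try again. */
static bool wake_up_once(struct wait_queue *wq)
{
    unsigned idx = atomic_load(&wq->wake_index) % NR_WAIT_STATES;
    struct wait_state *ws = &wq->ws[idx];
    int wait_cnt = atomic_fetch_sub(&ws->wait_cnt, 1) - 1;
    int expected = wait_cnt;

    if (wait_cnt > 0)
        return false;           /* batch not used up yet */

    /* Reset to the full batch value, whatever the count dropped to. */
    if (atomic_compare_exchange_strong(&ws->wait_cnt, &expected,
                                       wq->wake_batch)) {
        /* Winner: advance the index and wake one batch. */
        atomic_fetch_add(&wq->wake_index, 1);
        atomic_fetch_add(&ws->woken, wq->wake_batch);
        return false;
    }
    return true;                /* loser: account against the next state */
}

static void wq_wake_up(struct wait_queue *wq)
{
    while (wake_up_once(wq))
        ;
}
```

Whichever caller wins the compare-and-exchange, the counter ends at the batch value, so it cannot drift further negative on each round.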

block: consistently use GFP_NOIO instead of __GFP_NORECLAIM · 0eb0b63c

Christoph Hellwig authored May 09, 2018

Same numerical value (for now at least), but a much better documentation
of intent.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: use GFP_NOIO instead of __GFP_DIRECT_RECLAIM · c3036021

Christoph Hellwig authored May 09, 2018

We just can't do I/O when doing block layer request allocations,
so use GFP_NOIO instead of the even more limited __GFP_DIRECT_RECLAIM.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: pass an explicit gfp_t to get_request · 4accf5fc

Christoph Hellwig authored May 09, 2018

blk_old_get_request already has it at hand, and in blk_queue_bio, which
is the fast path, it is constant.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: sanitize blk_get_request calling conventions · ff005a06

Christoph Hellwig authored May 09, 2018

Switch everyone to blk_get_request_flags, and then rename
blk_get_request_flags to blk_get_request.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: fix __get_request documentation · a9a14d36

Christoph Hellwig authored May 09, 2018

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

scsi/osd: remove the gfp argument to osd_start_request · ac613e45

Christoph Hellwig authored May 09, 2018

Always GFP_KERNEL, and keeping it would cause serious complications for
the next change.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

memstick: remove unused variables · 058147bc

Christoph Hellwig authored May 14, 2018

Fixes: 7c2d748e ("memstick: don't call blk_queue_bounce_limit")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

11 May, 2018 4 commits

ps3disk: handle highmem pages · e4f0e0cb

Christoph Hellwig authored May 09, 2018

The ps3disk driver already kmaps all pages when copying from/to the
internal bounce buffer, so it can accept highmem pages just fine.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

jsflash: handle highmem pages · 37a5b5c6

Christoph Hellwig authored May 09, 2018

Just kmap the bio single page payload before processing it.

(and yes, now highmem on sparc32 anyway, but kmap_(un)map atomic are nops,
so this gives the right example)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

aoe: handle highmem pages · ad180f6f

Christoph Hellwig authored May 09, 2018

Use kmap_atomic when copying out of a bio_vec.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

mtd_blkdevs: handle highmem pages · 34ab96e6

Christoph Hellwig authored May 09, 2018

Just kmap the single payload page before passing it on to the FTL.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
