Commits · 1c5b12f76301b86d0e5828c7d11ec7c36ffd0195 · Kirill Smelkov / linux

24 Apr, 2017 28 commits

Fix implicit logo and RSCN handling for NVMET · 1c5b12f7

James Smart authored Apr 21, 2017

NVMET didn't have any RSCN handling at all and
would not execute implicit LOGO when receiving a PLOGI
from an rport that NVMET had in state UNMAPPED.

Clean up the logic in lpfc_nlp_state_cleanup for
initiators (FCP and NVME). NVMET should not respond to
RSCN including allocating new ndlps so this code was
conditionalized when nvmet_support is true.  The check
for NLP_RCV_PLOGI in lpfc_setup_disc_node was moved
below the check for nvmet_support to allow the NVMET
to recover initiator nodes correctly.  The implicit
logo was introduced with lpfc_rcv_plogi when NVMET gets
a PLOGI on an ndlp in UNMAPPED state.  The RSCN handling
was modified to not respond to an RSCN in NVMET.  Instead
NVMET sends a GID_FT and determines if an NVMEP_INITIATOR
it has is UNMAPPED but no longer in the zone membership.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

1c5b12f7

Add Fabric assigned WWN support. · aeb3c817

James Smart authored Apr 21, 2017

Adding support for Fabric assigned WWPN and WWNN.

Firmware sends first FLOGI to fabric with vendor version changes.
On link up driver gets updated service parameter with FAWWN assigned port
name.  Driver sends 2nd FLOGI with updated fawwpn and modifies the
vport->fc_portname in driver.

Note:
Soft wwpn will not be allowed when fawwpn is enabled.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

aeb3c817

Fix max_sgl_segments settings for NVME / NVMET · 4d4c4a4a

James Smart authored Apr 21, 2017

Cannot set NVME segment counts to a large number

The existing module parameter lpfc_sg_seg_cnt is used for both
SCSI and NVME.

Limit the module parameter lpfc_sg_seg_cnt to 128 with the
default being 64 for both NVME and NVMET, assuming NVME is enabled in the
driver for that port. The driver will set max_sgl_segments in the
NVME/NVMET template to lpfc_sg_seg_cnt + 1.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

4d4c4a4a

Fix crash after issuing lip reset · 9d3d340d

James Smart authored Apr 21, 2017

When RPI is not available, driver sends WQE with invalid RPI value and
rejected by HBA.
lpfc 0000:82:00.3: 1:3154 BLS ABORT RSP failed, data:  x3/xa0320008
and
lpfc :2753 PLOGI failure DID:FFFFFA Status:x3/xa0240008

In this case, driver accesses rpi_ids array out of bounds.

Fix:
Check return value of lpfc_sli4_alloc_rpi(). Do not allocate
lpfc_nodelist entry if RPI is not available.

When RPI is not available, we will get discovery timeouts and
command drops for some of the vports as seen below.

lpfc :0273 Unexpected discovery timeout, vport State x0
lpfc :0230 Unexpected timeout, hba link state x5
lpfc :0111 Dropping received ELS cmd Data: x0 xc90c55 x0
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

9d3d340d

Fix driver load issues when MRQ=8 · 2b7824d0

James Smart authored Apr 21, 2017

The symptom is that the driver will fail to login to the fabric.
The reason is because it is out of iocb resources.

There is a one to one relationship between MRQs
(receive buffers for NVMET-FC) and iocbs and the default number of
IOCBs was not accounting for the number of MRQs that were being created.

This fix aligns the number of MRQ resources with the total resources so
that it can handle fabric events when needed.

Also the initialization of ctxlock to be on FCP commands, NOT LS commands.
And modified log messages so that the log output can be correlated with
the analyzer trace.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

2b7824d0

Remove hba lock from NVMET issue WQE. · 59c6e13e

James Smart authored Apr 21, 2017

Unnecessary lock is taken. ring lock should be sufficient to protect the
work queue submission.

This was noticed when doing performance testing. The hbalock is not
needed to issue io to the nvme work queue.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

59c6e13e

Fix nvme initiator handling when not enabled. · 4410a67a

James Smart authored Apr 21, 2017

Fix nvme initiator handline when CONFIG_LPFC_NVME_INITIATOR is not enabled.

With update nvme upstream driver sources, loading
the driver with nvme enabled resulting in this Oops.

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
 IP: lpfc_nvme_update_localport+0x23/0xd0 [lpfc]
 PGD 0
 Oops: 0000 [#1] SMP
 CPU: 0 PID: 10256 Comm: lpfc_worker_0 Tainted
 Hardware name: ...
 task: ffff881028191c40 task.stack: ffff880ffdf00000
 RIP: 0010:lpfc_nvme_update_localport+0x23/0xd0 [lpfc]
 RSP: 0018:ffff880ffdf03c20 EFLAGS: 00010202

Cause: As the initiator driver completes discovery at different stages,
it call lpfc_nvme_update_localport to hint that the DID and role may have
changed.  In the implementation of lpfc_nvme_update_localport, the driver
was not validating the localport or the lport during the execution
of the update_localport routine.  With the recent upstream additions to
the driver, the create_localport routine didn't run and so the localport
was NULL causing the page-fault Oops.

Fix: Add the CONFIG_LPFC_NVME_INITIATOR preprocessor inclusions to
lpfc_nvme_update_localport to turn off all routine processing when
the running kernel does not have NVME configured.  Add NULL pointer
checks on the localport and lport in lpfc_nvme_update_localport and
dump messages if they are NULL and just exit.
Also one alingment issue fixed.
Repalces the ifdef with the IS_ENABLED macro.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

4410a67a

Fix driver usage of 128B WQEs when WQ_CREATE is V1. · 3f247de7

James Smart authored Apr 21, 2017

There are two versions of a structure for queue creation and setup that the
driver shares with FW. The driver was only treating as version 0.

Verify WQ_CREATE with 128B WQEs in V0 and V1.

Code review of another bug showed the driver passing
128B WQEs and 8 pages in WQ CREATE and V0.
Code inspection/instrumentation showed that the driver
uses V0 in WQ_CREATE and if the caller passes queue->entry_size
128B, the driver sets the hdr_version to V1 so all is good.
When I tested the V1 WQ_CREATE, the mailbox failed causing
the driver to unload.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

3f247de7

Fix driver unload/reload operation. · d1f525aa

James Smart authored Apr 21, 2017

There are couple of different load/unload issues fixed with this patch.
One of the issues was reported by Junichi Nomura, a patch was submitted
by Johannes Thumsrhirn which did fix one of the problems but the fix in
this patch separates the pring free from the queue free and does not set
the parameter passed in to NULL.

issues:
(1) driver could not be unloaded and reloaded without some Oops or
 Panic occurring.
(2) The driver was panicking because of a corruption in the Memory
Manager when the iocb list was getting allocated.

Root cause for the memory corruption was a double free of the Work Queue
ring pointer memory - Freed once in the lpfc_sli4_queue_free when the CQ
was destroyed and again in lpfc_sli4_queue_free when the WQ was destroyed.

The pring free and the queue free were separated, the pring free was moved
to the wq destroy routine because it a better fit logically to delete the
ring with the wq.

The checkpatch flagged several alignmenet issues that were also corrected
with this patch.

The mboxq was never initialed correctly before it was used by the driver
this patch corrects that issue.
Reported-by: Junichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Tested-by: Junichi Nomura <j-nomura@ce.jp.nec.com>

d1f525aa

Fix PRLI ACC rsp for NVME · c07f10cd

James Smart authored Apr 21, 2017

PRLI ACC from target is FCP oriented.

Word 0 was wrong. This was noticed by another nvmet-fc vendor that
was testing the lpfc nvme-fc initiator with their target.

Verified results with analyzer.

PRLI
BC B5 56 56  22 61 04 00  00 61 00 00  01 29 00 00  20 00 00 00
00 10 FF FF  00 00 00 00  20 14 00 18  28 00 00 00  00 00 00 00
00 00 00 00  00 00 00 20  00 00 00 00  9C D8 DA C9  BC 95 75 75

ACC
BC B5 56 56  23 61 00 00  00 61 04 00  01 98 00 00  30 00 00 00
00 10 00 18  00 00 00 00  02 14 00 18  28 00 01 00  00 00 00 00
00 00 00 00  00 00 00 18  00 00 00 00  B0 6B 07 57  BC B5 75 75
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

c07f10cd

Fix extra line print in rqpair debug print. · afbb38fe

James Smart authored Apr 21, 2017

An extra blank line was being added the the rqpair printing.

Remove the extra line feed.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

afbb38fe

Remove NULL ptr check before kfree. · eafe89f5

James Smart authored Apr 21, 2017

The check for NULL ptr is not necessary, kfree will check it.

Removing NULL ptr check.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

eafe89f5

Remove unused defines for NVME PostBuf. · 06909bb9

James Smart authored Apr 21, 2017

These defines for the posting of buffers for nvmet target were not used.

Removing the unused defines.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

06909bb9

Fix spelling in comments. · 0ef69968

James Smart authored Apr 21, 2017

Comment should have said Repost.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

0ef69968

Add debug messages for nvme/fcp resource allocation. · e8c0a779

James Smart authored Apr 21, 2017

The xri resources are split into pools for NVME and FCP IO when NVME is
enabled. There was not message in the log that identified this allocation.

Added debug message to log XRI split.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

e8c0a779

Fix log message in completion path. · c154e750

James Smart authored Apr 21, 2017

In the lpfc_nvme_io_cmd_wqe_cmpl routine the driver was printing two
pointers and the DID for the rport whenever an IO completed on a now
that had transitioned to a non active state.

There is no need to print the node pointer address for a node that
is not active the DID should be enough to debug.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

c154e750

Fix rejected nvme LS Req. · ba43c4d0

James Smart authored Apr 21, 2017

In this case, the NVME initiator is sending an LS REQ command on an NDLP
that is not MAPPED.  The FW rejects it.

The lpfc_nvme_ls_req routine checks for a NULL ndlp pointer
but does not check the NDLP state.  This allows the routine
to send an LS IO when the ndlp is disconnected.

Check the ndlp for NULL, actual node, Target and MAPPED
or Initiator and UNMAPPED. This avoids Fabric nodes getting
the Create Association or Create Connection commands.  Initiators
are free to Reject either Create.
Also some of the messages numbers in lpfc_nvme_ls_req were changed because
they were already used in other log messages.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

ba43c4d0

Fix nvme unregister port timeout. · 975ff31c

James Smart authored Apr 21, 2017

During some link event testing it was observed that the
wait_for_completion_timeout in the lpfc_nvme_unregister_port
was timing out all the time.

The initiator is claiming the nvme_fc_unregister_remoteport upcall is
not completing the unregister in the time allotted.
[ 2186.151317] lpfc 0000:07:00.0: 0:(0):6169 Unreg nvme wait failed 0

 The wait_for_completion_timeout returns 0 when the wait has
been outstanding for the jiffies passed by the caller.  In this error
message, the nvme initiator passed value 5 - meaning 5 jiffies -
and this is just wrong.

Calculate 5 seconds in Jiffies and pass that value
from the current jiffies.

Also the log message for the unregister timeout was reduced
because timeout failure is the same as timeout.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

975ff31c

Standardize nvme SGL segment count · a44e4e8b

James Smart authored Apr 21, 2017

Standardize default SGL segment count for nvme target and initiator

The driver needs to make them the same for clarity.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

a44e4e8b

nvmet-fcloop: mark two symbols static · 36b8890e

Christoph Hellwig authored Apr 21, 2017

Found by sparse.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

36b8890e

nvmet-fc: properly endian swap sq_head · 8ad76cf1

Christoph Hellwig authored Apr 21, 2017

Found by sparse.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

8ad76cf1

nvmet-fc: mark the sqhd field as __le16 · f63688a6

Christoph Hellwig authored Apr 21, 2017

That's what it's used as.

Found by sparse.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

f63688a6

nvmet-fc: fix endianess annoations for nvmet_fc_format_rsp_hdr · 3f5e1188

Christoph Hellwig authored Apr 21, 2017

The passed in desc_len is a big endian value, so mark it as such.

Found by sparse.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

3f5e1188

nvmet-fc: mark nvmet_fc_handle_fcp_rqst static · edba98dd

Christoph Hellwig authored Apr 21, 2017

Found by sparse.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

edba98dd

nvme-fc: mark two symbols static · baee29ac

Christoph Hellwig authored Apr 21, 2017

Found by sparse.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

baee29ac

nvme_fc: add controller reset support · 61bff8ef

James Smart authored Apr 23, 2017

This patch actually does quite a few things.  When looking to add
controller reset support, the organization modeled after rdma was
very fragmented. rdma duplicates the reset and teardown paths and does
different things to the block layer on the two paths. The code to build
up the controller is also duplicated between the initial creation and
the reset/error recovery paths. So I decided to make this sane.

I reorganized the controller creation and teardown so that there is a
connect path and a disconnect path.  Initial creation obviously uses
the connect path.  Controller teardown will use the disconnect path,
followed last access code. Controller reset will use the disconnect
path to stop operation, and then the connect path to re-establish
the controller.

Along the way, several things were fixed
- aens were not properly set up. They are allocated differently from
  the per-request structure on the blk queues.
- aens were oddly torn down. the prior patch corrected to abort, but
  we still need to dma unmap and free relative elements.
- missed a few ref counting points: in aen completion and on i/o's
  that fail
- controller initial create failure paths were still confused vs teardown
  before converting to ref counting vs after we convert to refcounting.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

61bff8ef

nvme_fc: add aen abort to teardown · 78a7ac26

James Smart authored Apr 23, 2017

Add abort support for aens. Commonized the op abort to apply to aen or
real ios (caused some reorg/routine movement). Abort path sets termination
flag in prep for next patch that will be watching i/o abort completion
before proceeding with controller teardown.

Now that we're aborting aens, the "exit" code that simply cleared out
their context no longer applies.

Also clarified how we detect an AEN vs a normal io - by a flag, not
by whether a rq exists or the a rqno is out of range.

Note: saw some interesting cases where if the queues are stopped and
we're waiting for the aborts, the core layer can call the complete_rq
callback for the io. So the io completion synchronizes link side completion
with possible blk layer completion under error.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

78a7ac26

nvme_fc: fix command id check · 458f280d

James Smart authored Apr 23, 2017

The code validates the command_id in the response to the original
sqe command. But prior code was using the rq->rqno as the sqe command
id. The core layer overwrites what the transport set there originally.

Use the actual sqe content.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

458f280d

23 Apr, 2017 6 commits

lightnvm: pblk: fix erase counters on error fail · a44f53fa

Javier González authored Apr 22, 2017

When block erases fail, these blocks are marked bad. The number of valid
blocks in the line was not updated, which could cause an infinite loop
on the erase path.

Fix this atomic counter and, in order to avoid taking an irq lock on the
interrupt context, make the erase counters atomic too.

Also, in the case that a significant number of blocks become bad in a
line, the result is the double shared metadata buffer (emeta) to stop
the pipeline until all metadata is flushed to the media. Increase the
number of metadata lines from 2 to 4 to avoid this case.

Fixes: a4bd217b "lightnvm: physical block device (pblk) target"
Signed-off-by: Javier González <javier@cnexlabs.com>
Reviewed-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

a44f53fa

lightnvm: pblk: free metadata on line alloc failure · be388d9f

Javier González authored Apr 22, 2017

When a line allocation fails, for example, due to having too many bad
blocks, free its metadata correctly.

Fixes: a4bd217b "lightnvm: physical block device (pblk) target"
Signed-off-by: Javier González <javier@cnexlabs.com>
Reviewed-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

be388d9f

lightnvm: pblk: fix memory leak on error path · 33db9fd4

Javier González authored Apr 22, 2017

When write recovery fails, Free memory for the recovery structure.

Fixes: a4bd217b "lightnvm: physical block device (pblk) target"
Signed-off-by: Javier González <javier@cnexlabs.com>
Reviewed-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

33db9fd4

lightnvm: pblk: fix bad error check · f3236cef

Javier González authored Apr 22, 2017

Fix bad error check

Fixes: a4bd217b "lightnvm: physical block device (pblk) target"
Signed-off-by: Javier González <javier@cnexlabs.com>
Reviewed-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

f3236cef

lightnvm: pblk: fix race condition on line retry · 3dc001f3

Javier González authored Apr 22, 2017

When a pblk line fails (or is recovered), make sure to take the line
management lock.

Fixes: a4bd217b "lightnvm: physical block device (pblk) target"
Signed-off-by: Javier González <javier@cnexlabs.com>
Reviewed-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

3dc001f3

block: fix blk_integrity_register to use template's interval_exp if not 0 · 2859323e

Mike Snitzer authored Apr 22, 2017

When registering an integrity profile: if the template's interval_exp is
not 0 use it, otherwise use the ilog2() of logical block size of the
provided gendisk.

This fixes a long-standing DM linear target bug where it cannot pass
integrity data to the underlying device if its logical block size
conflicts with the underlying device's logical block size.

Cc: stable@vger.kernel.org
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

2859323e

21 Apr, 2017 6 commits

lightnvm: don't print a warning for ADDR_EMPTY · 659226eb

Dan Carpenter authored Apr 21, 2017

Reading from ADDR_EMPTY is out of bounds. The current code generates a
static checker warning because we check for out of bounds "lba" before
we check for ADDR_EMPTY, so the second check is always false. It looks
like we intended ADDR_EMPTY to be a no-op without printing a warning.

Fixes: a4bd217b ("lightnvm: physical block device (pblk) target")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

659226eb

lightnvm: potential underflow in pblk_read_rq() · 5bf1e1ee

Dan Carpenter authored Apr 21, 2017

This is a static checker fix, and perhaps not a real bug.  The static
checker thinks that nr_secs could be negative.  It would result in
zeroing more memory than intended.  Anyway, even if it's not a bug,
changing this variable to unsigned makes the code easier to audit.

Fixes: a4bd217b ("lightnvm: physical block device (pblk) target")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

5bf1e1ee

block: get rid of blk_integrity_revalidate() · 19b7ccf8

Ilya Dryomov authored Apr 18, 2017

Commit 25520d55 ("block: Inline blk_integrity in struct gendisk")
introduced blk_integrity_revalidate(), which seems to assume ownership
of the stable pages flag and unilaterally clears it if no blk_integrity
profile is registered:

    if (bi->profile)
            disk->queue->backing_dev_info->capabilities |=
                    BDI_CAP_STABLE_WRITES;
    else
            disk->queue->backing_dev_info->capabilities &=
                    ~BDI_CAP_STABLE_WRITES;

It's called from revalidate_disk() and rescan_partitions(), making it
impossible to enable stable pages for drivers that support partitions
and don't use blk_integrity: while the call in revalidate_disk() can be
trivially worked around (see zram, which doesn't support partitions and
hence gets away with zram_revalidate_disk()), rescan_partitions() can
be triggered from userspace at any time.  This breaks rbd, where the
ceph messenger is responsible for generating/verifying CRCs.

Since blk_integrity_{un,}register() "must" be used for (un)registering
the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES
setting there.  This way drivers that call blk_integrity_register() and
use integrity infrastructure won't interfere with drivers that don't
but still want stable pages.

Fixes: 25520d55 ("block: Inline blk_integrity in struct gendisk")
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 4.4+, needs backporting
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

19b7ccf8

lightnvm: propagate pblk_init return to userspace · 8d77bb82

Rakesh Pandit authored Apr 20, 2017

From userspace calling ioctl(NVM_DEV_CREATE) was returning ENOMEM for
invalid arguments even though pblk (pblk_init) was returning correctly
-EINVAL to nvm_create_tgt inside core.  This patch propagates the
correct return value to userspace.

Because pblk was introduced recently this only needs to go in 4.12.

Fixes: a4bd217b ("lightnvm: physical block device (pblk) target")
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

8d77bb82

blk-mq: Fix preempt count imbalance · abc25a69

Bart Van Assche authored Apr 21, 2017

Avoid that the following kernel bug gets triggered:

BUG: sleeping function called from invalid context at ./include/linux/buffer_head.h:349
in_atomic(): 1, irqs_disabled(): 0, pid: 8019, name: find
CPU: 10 PID: 8019 Comm: find Tainted: G        W I     4.11.0-rc4-dbg+ #2
Call Trace:
 dump_stack+0x68/0x93
 ___might_sleep+0x16e/0x230
 __might_sleep+0x4a/0x80
 __ext4_get_inode_loc+0x1e0/0x4e0
 ext4_iget+0x70/0xbc0
 ext4_iget_normal+0x2f/0x40
 ext4_lookup+0xb6/0x1f0
 lookup_slow+0x104/0x1e0
 walk_component+0x19a/0x330
 path_lookupat+0x4b/0x100
 filename_lookup+0x9a/0x110
 user_path_at_empty+0x36/0x40
 vfs_statx+0x67/0xc0
 SYSC_newfstatat+0x20/0x40
 SyS_newfstatat+0xe/0x10
 entry_SYSCALL_64_fastpath+0x18/0xad

This happens since the big if/else in blk_mq_make_request() doesn't
have final else section that also drops the ctx. Add that.

Fixes: b00c53e8 ("blk-mq: fix schedule-while-atomic with scheduler attached")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Omar Sandoval <osandov@fb.com>

Added a bit more to the commit log.
Signed-off-by: Jens Axboe <axboe@fb.com>

abc25a69

Merge branch 'nvme-4.12' of git://git.infradead.org/nvme into for-4.12/block · f8a05a1d

Jens Axboe authored Apr 21, 2017

Christoph writes:

This is the current NVMe pile: virtualization extensions, lots of FC
updates and various misc bits.  There are a few more FC bits that didn't
make the cut, but we'd like to get this request out before the merge
window for sure.

f8a05a1d