Commits · 4a636e9c7a2107b9a590f08d6f8f8a917e6b85de · Kirill Smelkov / linux

21 Aug, 2020 · 34 commits

scsi: mpt3sas: Remove superfluous memset() · 4a636e9c

Li Heng authored Jul 30, 2020

Fixes coccicheck warning:

./drivers/scsi/mpt3sas/mpt3sas_base.c:5247:16-34: WARNING: dma_alloc_coherent use in ioc -> request already zeroes out memory,  so memset is not needed

dma_alloc_coherent() already zeroes out memory so memset() is not needed.
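As a userspace sketch of the same pattern, calloc() stands in for dma_alloc_coherent() below, since the latter is kernel-only; both return zero-filled memory. The function names are hypothetical. The second allocator is equivalent to the first without the redundant memset():

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Before: zeroed allocation followed by a redundant memset(). */
unsigned char *alloc_ring_before(size_t len)
{
	unsigned char *buf = calloc(1, len);	/* already zero-filled */

	if (buf)
		memset(buf, 0, len);		/* superfluous */
	return buf;
}

/* After: rely on the allocator's zeroing guarantee. */
unsigned char *alloc_ring_after(size_t len)
{
	return calloc(1, len);
}

/* Returns 1 if all len bytes of buf are zero, 0 otherwise. */
int all_zero(const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] != 0)
			return 0;
	return 1;
}
```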

Link: https://lore.kernel.org/r/1596079918-41115-4-git-send-email-liheng40@huawei.com
Signed-off-by: Li Heng <liheng40@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qla2xxx: Remove superfluous memset() · bef93fbf

Li Heng authored Jul 30, 2020

Fixes coccicheck warning:

./drivers/scsi/qla2xxx/qla_mbx.c:4928:15-33: WARNING: dma_alloc_coherent use in els_cmd_map already zeroes out memory,  so memset is not needed

dma_alloc_coherent() already zeroes out memory so memset() is not needed.

Link: https://lore.kernel.org/r/1596079918-41115-3-git-send-email-liheng40@huawei.com
Signed-off-by: Li Heng <liheng40@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: pmcraid: Remove superfluous memset() · 7b1d8862

Li Heng authored Jul 30, 2020

Fixes coccicheck warning:

./drivers/scsi/pmcraid.c:4709:3-21: WARNING: dma_alloc_coherent use in pinstance -> hrrq_start [ i ] already zeroes out memory,  so memset is not needed

dma_alloc_coherent() already zeroes out memory so memset() is not needed.

Link: https://lore.kernel.org/r/1596079918-41115-2-git-send-email-liheng40@huawei.com
Signed-off-by: Li Heng <liheng40@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mvsas: Remove superfluous memset() · f672d7d3

Li Heng authored Jul 30, 2020

Fixes coccicheck warning:

./drivers/scsi/mvsas/mv_init.c:244:11-29: WARNING: dma_alloc_coherent use in mvi -> tx already zeroes out memory,  so memset is not needed
./drivers/scsi/mvsas/mv_init.c:250:15-33: WARNING: dma_alloc_coherent use in mvi -> rx_fis already zeroes out memory,  so memset is not needed
./drivers/scsi/mvsas/mv_init.c:256:11-29: WARNING: dma_alloc_coherent use in mvi -> rx already zeroes out memory,  so memset is not needed
./drivers/scsi/mvsas/mv_init.c:265:13-31: WARNING: dma_alloc_coherent use in mvi -> slot already zeroes out memory,  so memset is not needed

dma_alloc_coherent() already zeroes out memory so memset() is not needed.

Link: https://lore.kernel.org/r/1596078235-54002-1-git-send-email-liheng40@huawei.com
Signed-off-by: Li Heng <liheng40@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mptctl: Remove unneeded cast from memory allocation · 8fee79ed

Li Heng authored Jul 29, 2020

Remove casts of the values returned by memory allocation functions.

Coccinelle emits the following warnings:

./drivers/message/fusion/mptctl.c:2596:14-31: WARNING: casting value returned by memory allocation function to (SCSIDevicePage0_t *) is useless.
./drivers/message/fusion/mptctl.c:2660:15-32: WARNING: casting value returned by memory allocation function to (SCSIDevicePage3_t *) is useless.
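In C, the void * returned by an allocator converts implicitly to any object pointer type, which is why the cast is useless. A minimal userspace sketch, assuming malloc() in place of the driver's allocator and a hypothetical DemoPage_t in place of the real page structures:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a config page structure. */
typedef struct {
	unsigned int header;
	unsigned int flags;
} DemoPage_t;

DemoPage_t *alloc_page_with_cast(void)
{
	/* Legal but useless: void * converts implicitly in C. */
	return (DemoPage_t *)malloc(sizeof(DemoPage_t));
}

DemoPage_t *alloc_page_without_cast(void)
{
	/* Idiomatic C: no cast, and sizeof(*p) follows the pointer's type. */
	DemoPage_t *p = malloc(sizeof(*p));

	return p;
}
```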

Link: https://lore.kernel.org/r/1596014390-18605-1-git-send-email-liheng40@huawei.com
Signed-off-by: Li Heng <liheng40@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mptfc: Remove unneeded cast from memory allocation · 33fff97c

Li Heng authored Jul 29, 2020

Remove casts of the values returned by memory allocation functions.

Coccinelle emits the following warnings:

./drivers/message/fusion/mptfc.c:766:17-30: WARNING: casting value returned by memory allocation function to (FCPortPage0_t *) is useless.
./drivers/message/fusion/mptfc.c:907:17-30: WARNING: casting value returned by memory allocation function to (FCPortPage1_t *) is useless.

[mkp: memset()]

Link: https://lore.kernel.org/r/1596014354-59935-1-git-send-email-liheng40@huawei.com
Signed-off-by: Li Heng <liheng40@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: ufs-mediatek: Modify the minimum RX/TX lane count to 2 · 460d74a0

Andy Teng authored Aug 19, 2020

MediaTek UFS host now supports 2 lanes, so modify the lane count to 2.

This modification should not impact older 1-lane hosts because
PA_CONNECTEDRXDATALANES and PA_CONNECTEDTXDATALANES limit the target
lanes properly during power mode change, so we can relax the limitation
in ufs_dev_params.
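A simplified model of the reasoning above, assuming the lanes used after a power mode change are the minimum of the requested count and the connected count reported by PA_CONNECTEDRXDATALANES / PA_CONNECTEDTXDATALANES. The struct and function names are hypothetical, not the driver's:

```c
#include <assert.h>

/* Hypothetical snapshot of the connected lane counts read from UniPro. */
struct lane_cfg {
	unsigned int connected_rx;	/* PA_CONNECTEDRXDATALANES */
	unsigned int connected_tx;	/* PA_CONNECTEDTXDATALANES */
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Raising the requested count never exceeds what is actually connected. */
unsigned int target_rx_lanes(const struct lane_cfg *cfg, unsigned int requested)
{
	return min_u(cfg->connected_rx, requested);
}

unsigned int target_tx_lanes(const struct lane_cfg *cfg, unsigned int requested)
{
	return min_u(cfg->connected_tx, requested);
}
```

Under this model, requesting 2 lanes on a 1-lane host still yields 1 lane, which is why the relaxed limit is harmless there.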

Link: https://lore.kernel.org/r/20200819084340.7021-1-stanley.chu@mediatek.com
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Andy Teng <andy.teng@mediatek.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: Remove an unpaired ufshcd_scsi_unblock_requests() in err_handler() · 50807f22

Can Guo authored Aug 18, 2020

Commit 5586dd8e ("scsi: ufs: Fix a race condition between error handler
and runtime PM ops") moved ufshcd_scsi_block_requests() inside
err_handler() but forgot to remove the ufshcd_scsi_unblock_requests() in
the early return path, leaving it unpaired. Remove it.

Link: https://lore.kernel.org/r/1597798958-24322-1-git-send-email-cang@codeaurora.org
Fixes: 5586dd8e ("scsi: ufs: Fix a race condition between error handler and runtime PM ops")
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Hongwu Su <hongwus@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: Change fDeviceInit busy wait · 29707fab

Kiwoong Kim authored Aug 10, 2020

Currently, the UFS driver busy waits for fDeviceInit to be cleared. Provide
an upper bound and sleep between attempts instead of busy waiting.
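A sketch of a bounded, sleeping poll of the kind described above. The flag reader is a hypothetical callback, the timeout and interval values are illustrative only, and the kernel would use its own sleep and timing helpers rather than nanosleep():

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical accessor: returns true while fDeviceInit is still set. */
typedef bool (*read_flag_fn)(void *ctx);

/*
 * Poll the flag for at most timeout_ms, sleeping interval_ms between reads
 * instead of spinning. Elapsed time is approximated by summing the requested
 * sleeps. Returns 0 once the flag clears, -1 on timeout.
 */
int wait_flag_clear(read_flag_fn read_flag, void *ctx,
		    unsigned int timeout_ms, unsigned int interval_ms)
{
	struct timespec ts = {
		.tv_sec = interval_ms / 1000,
		.tv_nsec = (long)(interval_ms % 1000) * 1000000L,
	};
	unsigned int waited = 0;

	assert(interval_ms > 0);	/* otherwise the bound is never reached */
	for (;;) {
		if (!read_flag(ctx))
			return 0;
		if (waited >= timeout_ms)
			return -1;
		nanosleep(&ts, NULL);
		waited += interval_ms;
	}
}

/* Demo reader: reports "still set" for *ctx more reads, then "cleared". */
bool countdown_flag(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return true;
	}
	return false;
}
```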

Link: https://lore.kernel.org/r/1597053747-75171-1-git-send-email-kwmad.kim@samsung.com
Tested-by: Kiwoong Kim <kwmad.kim@samsung.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: Remove several redundant goto statements · b0008625

Bean Huo authored Aug 14, 2020

Link: https://lore.kernel.org/r/20200814095034.20709-3-huobean@gmail.com
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: Change ufshcd_comp_devman_upiu() to ufshcd_compose_devman_upiu() · f273c54b

Bean Huo authored Aug 14, 2020

ufshcd_comp_devman_upiu() was poorly named, leading people to think it was a
completion function. Rename it to ufshcd_compose_devman_upiu().

Link: https://lore.kernel.org/r/20200814095034.20709-2-huobean@gmail.com
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Acked-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qedf: Fix race between ELS completion and flushing ELS request · 3079285b

Saurav Kashyap authored Aug 07, 2020

Fix a race between ELS completion and flushing of the ELS request.

Link: https://lore.kernel.org/r/20200807110656.19965-8-jhasan@marvell.com
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qedf: Don't process ELS completion if event is flushed or cleaned up · 22ddec31

Saurav Kashyap authored Aug 07, 2020

Don't process ELS completion if event is flushed or cleaned up.

Link: https://lore.kernel.org/r/20200807110656.19965-7-jhasan@marvell.com
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qedf: Initiate cleanup for ELS commands as well · 1f6d1d4c

Saurav Kashyap authored Aug 07, 2020

Initiate cleanup for ELS commands as well.

Link: https://lore.kernel.org/r/20200807110656.19965-6-jhasan@marvell.com
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qedf: Send cleanup even for RRQ on timeout · 39d0357d

Saurav Kashyap authored Aug 07, 2020

Send cleanup even for RRQ on timeout.

Link: https://lore.kernel.org/r/20200807110656.19965-5-jhasan@marvell.com
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qedf: Do not kill timeout work for original I/O on RRQ completion · b09ea43f

Saurav Kashyap authored Aug 07, 2020

The timer is already cancelled when the abort completes, so there is no
need to cancel it again.

Link: https://lore.kernel.org/r/20200807110656.19965-4-jhasan@marvell.com
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qedf: Check the validity of rjt frame before processing · 7fb8ff08

Saurav Kashyap authored Aug 07, 2020

This was reported by Klocwork.

Link: https://lore.kernel.org/r/20200807110656.19965-3-jhasan@marvell.com
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: qedf: Check for port type and role before processing an event · a521bbc3

Saurav Kashyap authored Aug 07, 2020

The rport lock is initialized only during offload. If a non-FCP or
non-target rport is logged out, its lock has never been initialized, and
KASAN complained about it.

=========
[   14.384434] the code is fine but needs lockdep annotation.
[   14.384482] turning off the locking correctness validator.
========
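A minimal sketch of the guard this change describes, assuming that only FCP target rports are offloaded and therefore only they have an initialized lock. The struct, flag, and function names are hypothetical; the role bit is modelled loosely on the FC transport's FC_RPORT_ROLE_FCP_TARGET:

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_ROLE_FCP_TARGET 0x1u	/* hypothetical role bit */

struct demo_rport {
	bool is_fcp;		/* port type is FCP */
	unsigned int roles;	/* role bitmask */
	bool lock_ready;	/* set only when the session is offloaded */
};

/* Only FCP targets are offloaded, so only they have an initialized lock. */
static bool demo_rport_offloadable(const struct demo_rport *rp)
{
	return rp->is_fcp && (rp->roles & DEMO_ROLE_FCP_TARGET);
}

/* Returns 0 if the logout was handled, -1 if the event was skipped. */
int demo_handle_logout(struct demo_rport *rp)
{
	if (!demo_rport_offloadable(rp))
		return -1;	/* never offloaded: do not touch its lock */
	assert(rp->lock_ready);
	/* ...take the rport lock and tear down the offloaded session... */
	return 0;
}
```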

Link: https://lore.kernel.org/r/20200807110656.19965-2-jhasan@marvell.com
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs-qcom: Remove unused MSM bus scaling APIs · 68bdb3db

Sai Prakash Ranjan authored Aug 04, 2020

MSM bus scaling has moved to the interconnect framework, and downstream
bus scaling APIs like msm_bus_scale*() no longer exist in the kernel. They
are currently guarded by a config option that also does not exist, which
is why no build failures have been reported. Remove these unused
interfaces: they are no-ops, and any scaling support added in the future
will use the interconnect API.

Link: https://lore.kernel.org/r/20200804161033.15586-1-saiprakash.ranjan@codeaurora.org
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: smartpqi: Bump version to 1.2.16-010 · ce60a2b8

Don Brace authored Jul 31, 2020

Link: https://lore.kernel.org/r/159622931040.30579.9167901134341507088.stgit@brunhilda
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Gerry Morong <gerry.morong@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: smartpqi: Add RAID bypass counter · 8b664fef

Kevin Barnett authored Jul 31, 2020

Add a counter to assist in verifying when RAID bypass is being used.

Link: https://lore.kernel.org/r/159622930468.30579.13153724465552773544.stgit@brunhilda
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: smartpqi: Support device deletion via sysfs · 4d15ad38

Kevin Barnett authored Jul 31, 2020

Support device deletion via sysfs, for example:

echo 1 > /sys/block/sd<X>/device/delete

Link: https://lore.kernel.org/r/159622929885.30579.2727491506675011534.stgit@brunhilda
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: smartpqi: Avoid crashing kernel for controller issues · 9e68cccc

Kevin Barnett authored Jul 31, 2020

Eliminate kernel panics when getting invalid responses from the
controller. Take the controller offline instead of panicking the kernel.
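A hedged sketch of the "take the controller offline" approach, assuming for illustration that an invalid response shows up as an out-of-range request ID. The structure and function names are hypothetical and not smartpqi's:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical controller state. */
struct demo_ctrl {
	bool offline;
	unsigned int num_requests;	/* size of the request table */
};

/*
 * Validate what the controller reports instead of trusting it (e.g. using
 * the id as a table index or hitting BUG_ON()). On a bad response, take
 * the controller offline and fail requests from then on.
 */
int demo_process_response(struct demo_ctrl *ctrl, unsigned int request_id)
{
	if (ctrl->offline)
		return -1;
	if (request_id >= ctrl->num_requests) {
		fprintf(stderr, "invalid request id %u, taking controller offline\n",
			request_id);
		ctrl->offline = true;
		return -1;
	}
	/* ...complete the request normally... */
	return 0;
}
```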

Link: https://lore.kernel.org/r/159622929306.30579.16523318707596752828.stgit@brunhilda
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Prasad Munirathnam <Prasad.Munirathnam@microsemi.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: smartpqi: Update logical volume size after expansion · 244ca45e

Mahesh Rajashekhara authored Jul 31, 2020

Have the OS rescan after logical volume expansion to reflect the new size.

Link: https://lore.kernel.org/r/159622928727.30579.298277463169866711.stgit@brunhilda
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: smartpqi: Add id support for SmartRAID 3152-8i · 3af06083

Mahesh Rajashekhara authored Jul 31, 2020

VID_9005, DID_028F, SVID_9005 and SDID_080A.

Link: https://lore.kernel.org/r/159622928143.30579.14769183842894725454.stgit@brunhilda
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: smartpqi: Identify physical devices without issuing INQUIRY · ce143793

Kevin Barnett authored Jul 31, 2020

Stop issuing INQUIRYs to problematic devices by using information
provided by the controller instead.

Link: https://lore.kernel.org/r/159622927172.30579.3960527536810532094.stgit@brunhilda
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpt3sas: Update driver version to 35.100.00.00 · 0491bdc7

Suganath Prabu S authored Jul 30, 2020

Update the driver version to 35.100.00.00.

Link: https://lore.kernel.org/r/1596096229-3341-8-git-send-email-suganath-prabu.subramani@broadcom.com
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpt3sas: Postprocessing of target and LUN reset · 711a923c

Suganath Prabu S authored Jul 30, 2020

If the driver has not received the interrupt for the aborted SCSI command
before processing the TM reply, it polls all the reply descriptor pools
looking for the reply to the aborted SCSI command before marking the TM as
FAILED. If it finds the reply, it marks the TM as SUCCESS; otherwise it
marks it FAILED.

scsih_tm_cmd_map_status() checks whether the TM has aborted the timed-out
SCSI command. If it has, it returns SUCCESS; otherwise it
returns FAILED.
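A simplified sketch of the polling step described above. Each reply queue is modelled as a plain array of SMIDs awaiting processing; the types and function names are hypothetical, and the real driver walks its reply descriptor pools rather than arrays:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_tm_result { DEMO_TM_FAILED, DEMO_TM_SUCCESS };

/* Hypothetical reply queue: SMIDs completed but not yet processed. */
struct demo_reply_queue {
	const unsigned short *pending_smids;
	size_t count;
};

static bool demo_queue_has_reply(const struct demo_reply_queue *q,
				 unsigned short smid)
{
	for (size_t i = 0; i < q->count; i++)
		if (q->pending_smids[i] == smid)
			return true;
	return false;
}

/*
 * After a TM reply: if the aborted command's completion has not been seen,
 * poll every reply queue for it before declaring the TM failed.
 */
enum demo_tm_result demo_tm_post_process(bool completion_seen,
					 const struct demo_reply_queue *queues,
					 size_t nqueues, unsigned short aborted_smid)
{
	if (completion_seen)
		return DEMO_TM_SUCCESS;
	for (size_t i = 0; i < nqueues; i++)
		if (demo_queue_has_reply(&queues[i], aborted_smid))
			return DEMO_TM_SUCCESS;
	return DEMO_TM_FAILED;
}
```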

Link: https://lore.kernel.org/r/1596096229-3341-7-git-send-email-suganath-prabu.subramani@broadcom.com
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpt3sas: Add functions to check if any cmd is outstanding on Target and LUN · 521e9c0b

Suganath Prabu S authored Jul 30, 2020

Add helper functions to check whether any SCSI command is outstanding on
a particular target or LUN.

Also add the parameters 'channel' and 'id' to
mpt3sas_scsih_issue_tm().
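A minimal sketch of what such helpers check, assuming a hypothetical flat table of commands tagged with channel, target id and LUN. The real driver looks commands up through its own tracking structures, and these names are illustrative only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one slot in an outstanding-command table. */
struct demo_cmd {
	bool active;		/* issued and not yet completed */
	unsigned int channel;
	unsigned int id;	/* target id */
	unsigned int lun;
};

/* Is any command still outstanding on the given target? */
bool demo_target_busy(const struct demo_cmd *cmds, size_t n,
		      unsigned int channel, unsigned int id)
{
	for (size_t i = 0; i < n; i++)
		if (cmds[i].active && cmds[i].channel == channel &&
		    cmds[i].id == id)
			return true;
	return false;
}

/* Is any command still outstanding on the given LUN of that target? */
bool demo_lun_busy(const struct demo_cmd *cmds, size_t n,
		   unsigned int channel, unsigned int id, unsigned int lun)
{
	for (size_t i = 0; i < n; i++)
		if (cmds[i].active && cmds[i].channel == channel &&
		    cmds[i].id == id && cmds[i].lun == lun)
			return true;
	return false;
}
```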

Link: https://lore.kernel.org/r/1596096229-3341-6-git-send-email-suganath-prabu.subramani@broadcom.com
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpt3sas: Rename and export interrupt mask/unmask functions · 5afa9d44

Suganath Prabu S authored Jul 30, 2020

Rename _base_unmask_interrupts() to mpt3sas_base_unmask_interrupts() and
_base_mask_interrupts() to mpt3sas_base_mask_interrupts(). Also add the
function declarations to mpt3sas_base.h.

Link: https://lore.kernel.org/r/1596096229-3341-5-git-send-email-suganath-prabu.subramani@broadcom.com
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpt3sas: Cancel the running work during host reset · 9e73ed2e

Suganath Prabu S authored Jul 30, 2020

It is not recommended to issue back-to-back host resets without any delay.
However, if someone does, target devices get unregistered from and
re-registered with the SML. If the OS drive is behind the HBA when it gets
unregistered, the file system goes into read-only mode.

Normally during host reset, the driver marks accessible target devices as
responding and triggers the MPT3SAS_REMOVE_UNRESPONDING_DEVICES event to
remove any non-responding devices through the FW worker thread. While
processing this event, the driver unregisters the non-responding devices
and clears the responding flag for all devices.

Currently, during host reset, the driver cancels only the firmware event
works that are still pending in the firmware event workqueue; it does not
cancel the work that is currently running. Change the driver to cancel all
events, including the running one.

Link: https://lore.kernel.org/r/1596096229-3341-4-git-send-email-suganath-prabu.subramani@broadcom.com
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpt3sas: Dump system registers for debugging · af6ec1ee

Suganath Prabu S authored Jul 30, 2020

When the controller fails to transition to the READY state during driver
probe, dump the system interface register set. This gives a snapshot of
the firmware status for debugging driver load issues.

Link: https://lore.kernel.org/r/1596096229-3341-3-git-send-email-suganath-prabu.subramani@broadcom.com
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: mpt3sas: Memset config_cmds.reply buffer with zeros · f09219e4

Suganath Prabu S authored Jul 30, 2020

Currently the config_cmds.reply buffer is not zeroed before a config page
request message is posted. In some cases the previous config reply gets
processed for the current config request, and a PageType mismatch between
the request and the reply buffer is observed. This type of issue is hard
to debug, and it misleadingly suggests that the HBA firmware itself posted
the wrong config reply. Zero the config_cmds.reply buffer before issuing
each config request.
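A sketch of why zeroing helps, using a hypothetical reply layout and assuming, for illustration only, that a firmware-written reply always has a nonzero length field. With the buffer cleared first, "no reply yet" can be told apart from a genuine PageType mismatch:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified config reply layout. */
struct demo_config_reply {
	unsigned char msg_length;	/* assumed nonzero in any real reply */
	unsigned char page_type;
	unsigned char payload[30];
};

/* Clear the shared reply buffer before posting a new config request. */
void demo_prepare_config_request(struct demo_config_reply *reply)
{
	memset(reply, 0, sizeof(*reply));
}

/*
 * Classify the buffer after the request: -1 if nothing has been written
 * (still zeroed), -2 on a PageType mismatch, 0 if it matches.
 */
int demo_check_config_reply(const struct demo_config_reply *reply,
			    unsigned char expected_type)
{
	if (reply->msg_length == 0)
		return -1;
	if (reply->page_type != expected_type)
		return -2;
	return 0;
}
```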

Link: https://lore.kernel.org/r/1596096229-3341-2-git-send-email-suganath-prabu.subramani@broadcom.com
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: Properly release resources if a task is aborted successfully · 8bb2dde0

Can Guo authored Aug 09, 2020

In the current UFS task abort hook, ufshcd_abort(), if a task is aborted
successfully, the clk_gating.active_reqs count held by the task is not
decremented. clk_gating.active_reqs then stays above zero forever, so clock
gating never happens. Instead of releasing the task's resources
"manually", use the existing function __ufshcd_transfer_req_compl(). This
change also eliminates a possible race with scsi_dma_unmap() in the real
completion path in the IRQ handler.
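A simplified model of the leak and its fix, with hypothetical names: demo_transfer_req_compl() plays the role of __ufshcd_transfer_req_compl(). Routing a successful abort through the common completion path drops the clock-gating hold, and the outstanding-bit check keeps a later completion of the same tag from releasing it twice:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-host bookkeeping. */
struct demo_hba {
	int active_reqs;		/* like clk_gating.active_reqs */
	unsigned long outstanding;	/* one bit per in-flight tag */
};

void demo_issue(struct demo_hba *hba, unsigned int tag)
{
	hba->active_reqs++;		/* hold clocks for this request */
	hba->outstanding |= 1UL << tag;
}

/* Common completion path: releases what each finished task holds. */
void demo_transfer_req_compl(struct demo_hba *hba, unsigned long completed)
{
	for (unsigned int tag = 0; tag < 8 * sizeof(completed); tag++) {
		unsigned long bit = 1UL << tag;

		if ((completed & bit) && (hba->outstanding & bit)) {
			hba->outstanding &= ~bit;
			hba->active_reqs--;	/* release the clock hold */
		}
	}
}

/* On a successful abort, reuse the completion path. */
void demo_abort_succeeded(struct demo_hba *hba, unsigned int tag)
{
	demo_transfer_req_compl(hba, 1UL << tag);
}

bool demo_can_gate_clocks(const struct demo_hba *hba)
{
	return hba->active_reqs == 0;
}
```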

Link: https://lore.kernel.org/r/1596975355-39813-10-git-send-email-cang@codeaurora.org
Fixes: 1ab27c9c ("ufs: Add support for clock gating")
CC: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

18 Aug, 2020 · 6 commits

scsi: ufs: Fix a race condition between error handler and runtime PM ops · 5586dd8e

Can Guo authored Aug 09, 2020

The current IRQ handler blocks SCSI requests before scheduling eh_work.
When the error handler calls pm_runtime_get_sync() and ufshcd_suspend/resume
sends a SCSI cmd (most likely the SSU cmd), that cmd cannot be issued
because SCSI requests are blocked. ufshcd_suspend/resume is then stuck
waiting on it, so pm_runtime_get_sync() never returns. Fix this as follows:

 - In queuecommand path, hba->ufshcd_state check and ufshcd_send_command
   should stay under the same spin lock. This is to make sure that no more
   commands leak into doorbell after hba->ufshcd_state is changed.

 - Don't block SCSI requests before error handler starts to run, let error
   handler block SCSI requests when it is ready to start error recovery.

 - Don't let SCSI layer keep requeuing the SCSI cmds sent from HBA runtime
   PM ops, let them pass or fail them. Let them pass if eh_work is
   scheduled due to non-fatal errors. Fail them if eh_work is scheduled due
   to fatal errors, otherwise the cmds may eventually time out since UFS is
   in bad state, which gets error handler blocked for too long. If we fail
   the SCSI cmds sent from HBA runtime PM ops, HBA runtime PM ops fails
   too, but it does not hurt since error handler can recover HBA runtime PM
   error.
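A userspace sketch of the first point above. An atomic_flag spinlock stands in for the host spinlock, and -EBUSY stands in for the busy status the real queuecommand hands back to the SCSI midlayer; all names are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

enum demo_state { DEMO_OPERATIONAL, DEMO_EH_SCHEDULED, DEMO_ERROR };

struct demo_hba {
	atomic_flag lock;	/* stands in for the host spinlock */
	enum demo_state state;
	unsigned long doorbell;	/* one bit per in-flight tag */
};

static void demo_lock(struct demo_hba *hba)
{
	while (atomic_flag_test_and_set_explicit(&hba->lock, memory_order_acquire))
		;	/* spin */
}

static void demo_unlock(struct demo_hba *hba)
{
	atomic_flag_clear_explicit(&hba->lock, memory_order_release);
}

void demo_hba_init(struct demo_hba *hba)
{
	atomic_flag_clear(&hba->lock);
	hba->state = DEMO_OPERATIONAL;
	hba->doorbell = 0;
}

/*
 * The state check and the doorbell write happen under one lock, so no
 * command can slip into the doorbell after the state has been changed.
 */
int demo_queuecommand(struct demo_hba *hba, unsigned int tag)
{
	int ret = 0;

	demo_lock(hba);
	if (hba->state != DEMO_OPERATIONAL)
		ret = -EBUSY;	/* caller requeues or fails the command */
	else
		hba->doorbell |= 1UL << tag;
	demo_unlock(hba);
	return ret;
}

void demo_set_state(struct demo_hba *hba, enum demo_state state)
{
	demo_lock(hba);
	hba->state = state;
	demo_unlock(hba);
}
```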

Link: https://lore.kernel.org/r/1596975355-39813-9-git-send-email-cang@codeaurora.org
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: Move dumps in IRQ handler to error handler · c3be8d1e

Can Guo authored Aug 09, 2020

Performing dumps in the IRQ handler causes system stability issues. Move
dumps to the error handler and only print basic host registers here.

Link: https://lore.kernel.org/r/1596975355-39813-8-git-send-email-cang@codeaurora.org
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: Recover HBA runtime PM error in error handler · c72e79c0

Can Guo authored Aug 09, 2020

The current error handler cannot recover from an HBA runtime PM error if
ufshcd_suspend/resume has failed due to UFS errors, e.g. a hibern8
enter/exit error or an SSU cmd error. When this happens, the error handler
may fail to perform a full reset and restore, because it always assumes
that power, IRQs and clocks are ready after pm_runtime_get_sync() returns,
but they are not if ufshcd_resume has failed [1].

If ufshcd_suspend/resume fails due to UFS errors, the runtime PM framework
saves the error value to dev.power.runtime_error. After that, HBA dev
runtime suspend/resume is not invoked anymore unless runtime_error is
cleared [2].

If ufshcd_suspend/resume fails due to UFS errors, then for scenario [1] the
error handler cannot assume anything about pm_runtime_get_sync(), meaning
it should explicitly turn on power, IRQs and clocks again. To get HBA
runtime PM working again in scenario [2], the error handler can clear
runtime_error by calling pm_runtime_set_active() if the full reset and
restore succeeds. More importantly, if pm_runtime_set_active() returns no
error, meaning runtime_error has been cleared, we also need to resume the
SCSI devices under the HBA in case any of them failed to resume due to the
HBA runtime resume failure. This unblocks blk_queue_enter() in case there
are bios waiting inside it.

Link: https://lore.kernel.org/r/1596975355-39813-7-git-send-email-cang@codeaurora.org
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: Fix concurrency of error handler and other error recovery paths · 4db7a236

Can Guo authored Aug 09, 2020

Error recovery can be invoked from multiple code paths, including hibern8
enter/exit (from ufshcd_link_recovery), ufshcd_eh_host_reset_handler() and
eh_work scheduled from IRQ context. Ultimately, these paths are all trying
to invoke ufshcd_reset_and_restore() in either a synchronous or
asynchronous manner. This causes problems:

- If link recovery happens during ungate work, ufshcd_hold() would be
called recursively. Although commit 53c12d0e ("scsi: ufs: fix error
recovery after the hibern8 exit failure") fixed a deadlock due to
recursive calls of ufshcd_hold() by adding a check of eh_in_progress
into ufshcd_hold, this check allows eh_work to run in parallel while
link recovery is running.

- Similar concurrency can also happen when error recovery is invoked from
ufshcd_eh_host_reset_handler and ufshcd_link_recovery.

- Concurrency can even happen between eh_works. eh_work, currently queued
on system_wq, is allowed to have multiple instances running in parallel,
but we don't have proper protection for that.

If any of the above concurrency scenarios happens, error recovery fails and
leaves the UFS device and host in bad states. To fix the concurrency
problem, this change queues eh_work on a single-threaded workqueue and
removes link recovery calls from the hibern8 enter/exit path. In addition,
use eh_work in eh_host_reset_handler instead of calling
ufshcd_reset_and_restore directly. This unifies the UFS error recovery
mechanism.

According to the UFSHCI JEDEC spec, hibern8 enter/exit error occurs when
the link is broken. This essentially applies to any power mode change
operations (since they all use PACP_PWR cmds in UniPro layer). So, if a
power mode change operation (including AH8 enter/exit) fails, mark link
state as UIC_LINK_BROKEN_STATE and schedule the eh_work. In this case,
error handler needs to do a full reset and restore to recover the link back
to active. Before the link state is recovered to active,
ufshcd_uic_pwr_ctrl simply returns -ENOLINK to avoid more errors.
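A minimal model of the last paragraph, with hypothetical names: once a power mode change fails, the link is marked broken and error handling is scheduled, and further power mode changes fail fast with -ENOLINK until the error handler's reset and restore brings the link back to active:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum demo_link_state { DEMO_LINK_ACTIVE, DEMO_LINK_BROKEN };

struct demo_hba {
	enum demo_link_state link_state;
	bool eh_scheduled;	/* stands in for queueing eh_work */
};

/* Hypothetical stand-ins for a UniPro power mode change (PACP_PWR). */
int demo_pwr_change_ok(void)
{
	return 0;
}

int demo_pwr_change_fail(void)
{
	return -EIO;
}

int demo_uic_pwr_ctrl(struct demo_hba *hba, int (*pwr_change)(void))
{
	int ret;

	if (hba->link_state == DEMO_LINK_BROKEN)
		return -ENOLINK;	/* avoid piling up more errors */
	ret = pwr_change();
	if (ret) {
		hba->link_state = DEMO_LINK_BROKEN;
		hba->eh_scheduled = true;
	}
	return ret;
}

/* The error handler's full reset and restore brings the link back. */
void demo_eh_work(struct demo_hba *hba)
{
	hba->link_state = DEMO_LINK_ACTIVE;
	hba->eh_scheduled = false;
}
```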

Link: https://lore.kernel.org/r/1596975355-39813-6-git-send-email-cang@codeaurora.org
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs: Add some debug information to ufshcd_print_host_state() · 3f8af604

Can Guo authored Aug 09, 2020

Information about the last interrupt status and timestamp is helpful when
debugging system stability issues (IRQ starvation, for instance). Add this
information to ufshcd_print_host_state() output.

In addition, UFS device information such as model name and firmware version
also comes in handy during debugging. This is printed as well.

Link: https://lore.kernel.org/r/1596975355-39813-5-git-send-email-cang@codeaurora.org
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Reviewed-by: Hongwu Su <hongwus@codeaurora.org>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: ufs-qcom: Remove testbus dump in ufs_qcom_dump_dbg_regs · 423cc66b

Can Guo authored Aug 09, 2020

Dumping testbus registers outputs a lot of information and can cause
stability issues. Remove the dump code.

Link: https://lore.kernel.org/r/1596975355-39813-4-git-send-email-cang@codeaurora.org
Reviewed-by: Hongwu Su <hongwus@codeaurora.org>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>