scsi: smartpqi: correct lun reset issues
Kevin Barnett authored

Problem:
The Linux kernel takes a logical volume offline after a LUN reset.  This is
generally accompanied by this message in the dmesg output:

Device offlined - not ready after error recovery

Root Cause:
The root cause is a "quirk" in the timeout handling in the Linux SCSI
layer. The Linux kernel places a 30-second timeout on most media access
commands (reads and writes) that it send to device drivers.  When a media
access command times out, the Linux kernel goes into error recovery mode
for the LUN that was the target of the command that timed out. Every
command that timed out is kept on a list inside of the Linux kernel to be
retried later. The kernel attempts to recover the command(s) that timed out
by issuing a LUN reset followed by a TEST UNIT READY. If the LUN reset and
TEST UNIT READY commands are successful, the kernel retries the command(s)
that timed out.

Each SCSI command issued by the kernel has a result field associated with
it. This field indicates the final result of the command (success or
error). When a command times out, the kernel places a value in this result
field indicating that the command timed out.

The "quirk" is that after the LUN reset and TEST UNIT READY commands are
completed, the kernel checks each command on the timed-out command list
before retrying it. If the result field is still "timed out", the kernel
treats that command as not having been successfully recovered for a
retry. If the number of commands that are in this state are greater than
two, the kernel takes the LUN offline.

Fix:
When our RAIDStack receives a LUN reset, it simply waits until all
outstanding commands complete. Generally, all of these outstanding commands
complete successfully. Therefore, the fix in the smartpqi driver is to
always set the command result field to indicate success when a request
completes successfully. This normally isn’t necessary because the result
field is always initialized to success when the command is submitted to the
driver. So when the command completes successfully, the result field is
left untouched. But in this case, the kernel changes the result field
behind the driver’s back and then expects the field to be changed by the
driver as the commands that timed-out complete.
Reviewed-by: default avatarDave Carroll <david.carroll@microsemi.com>
Reviewed-by: default avatarScott Teel <scott.teel@microsemi.com>
Signed-off-by: default avatarKevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: default avatarDon Brace <don.brace@microsemi.com>
Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
2ba55c98
Name Last commit Last update
Documentation scsi: remove the use_clustering flag
LICENSES Merge tag 'docs-4.20' of git://git.lwn.net/linux
arch scsi: remove the use_clustering flag
block scsi: block: remove the cluster flag
certs export.h: remove VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR()
crypto KEYS: asym_tpm: Add support for the sign operation [ver #2]
drivers scsi: smartpqi: correct lun reset issues
firmware kbuild: remove all dummy assignments to obj-
fs Merge tag 'tags/upstream-4.20-rc1' of git://git.infradead.org/linux-ubifs
include scsi: block: remove the cluster flag
init memblock: stop using implicit alignment to SMP_CACHE_BYTES
ipc ipc: IPCMNI limit check for semmni
kernel Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
lib Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
mm memory_hotplug: cond_resched in __remove_pages
net Merge tag 'nfs-for-4.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
samples Merge tag 'vfio-v4.20-rc1.v2' of git://github.com/awilliam/linux-vfio
scripts Merge tag 'kbuild-v4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
security Merge tag 'apparmor-pr-2018-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor
sound Merge tag 'sound-fix-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
tools Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
usr initramfs: move gen_initramfs_list.sh from scripts/ to usr/
virt Revert "mm, mmu_notifier: annotate mmu notifiers with blockable invalidate callbacks"
.clang-format page cache: Convert find_get_pages_contig to XArray
.cocciconfig scripts: add Linux .cocciconfig for coccinelle
.get_maintainer.ignore
.gitattributes
.gitignore
.mailmap
COPYING
CREDITS
Kbuild
Kconfig
MAINTAINERS
Makefile
README
Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.