Commit 223f91b4 authored by Douglas Gilbert's avatar Douglas Gilbert Committed by Martin K. Petersen

scsi: scsi_debug: Fix scp is NULL errors

John Garry reported 'sdebug_q_cmd_complete: scp is NULL' failures that were
mainly seen on aarch64 machines (e.g. RPi 4 with four A72 CPUs). The
problem was tracked down to a missing critical section on a "short circuit"
path. Namely, the time to process the current command so far has already
exceeded the requested command duration (i.e. the number of nanoseconds in
the ndelay parameter).

The random=1 parameter setting was pivotal in finding this error.  The
failure scenario involved first taking that "short circuit" path (due to a
very short command duration) and then taking the more likely
hrtimer_start() path (due to a longer command duration). With random=1 each
command's duration is taken from the uniformly distributed [0..ndelay)
interval.  The fio utility also helped by reliably generating the error
scenario at about once per minute on a RPi 4 (64 bit OS).

Link: https://lore.kernel.org/r/20200813155738.109298-1-dgilbert@interlog.comReported-by: default avatarJohn Garry <john.garry@huawei.com>
Reviewed-by: default avatarLee Duncan <lduncan@suse.com>
Signed-off-by: default avatarDouglas Gilbert <dgilbert@interlog.com>
Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
parent 2d9a2c5f
......@@ -5490,9 +5490,11 @@ static int schedule_resp(struct scsi_cmnd *cmnd, struct sdebug_dev_info *devip,
u64 d = ktime_get_boottime_ns() - ns_from_boot;
if (kt <= d) { /* elapsed duration >= kt */
spin_lock_irqsave(&sqp->qc_lock, iflags);
sqcp->a_cmnd = NULL;
atomic_dec(&devip->num_in_q);
clear_bit(k, sqp->in_use_bm);
spin_unlock_irqrestore(&sqp->qc_lock, iflags);
if (new_sd_dp)
kfree(sd_dp);
/* call scsi_done() from this thread */
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment