Commits · 9affa289e2f9ef4721e85edbde86466524bfe957 · nexedi / linux

03 Jul, 2011 40 commits

Dan Williams authored Mar 23, 2011

Don't assume the hardware is in a known state at init.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

9affa289

isci: task.h compile and checkpatch fixes · ce0b89f3

Dan Williams authored Mar 17, 2011

A usage of "FALSE" leaked in as well as some checkpatch escapes.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

ce0b89f3

isci: don't hold scic_lock over calls to sas_task_abort() · c4b9e24c

Jeff Skirvin authored Mar 16, 2011

In the case where submitted I/Os fail with the status code
SCI_FAILURE_REMOTE_DEVICE_RESET_REQUIRED, the execute function now waits
until scic_lock is cleared before calling the helper function
"isci_request_signal_device_reset" which sets the flag for the pending
reset condition on the I/O.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

c4b9e24c

isci: fix incorrect assumptions about task->dev and task->dev->port being NULL · 1077a574

Dan Williams authored Mar 11, 2011

A domain_device has the same lifetime as its related scsi_target.  The
scsi_target is reference counted based on outstanding commands,
therefore it is safe to assume that if we have a valid sas_task that the
->dev pointer is also valid.

The asd_sas_port of a domain_device has the same lifetime as the driver
so it can also never be NULL as long as the sas_task is valid and the
driver is loaded.

This also cleans up isci_task_complete_for_upper_layer(), renames it to
isci_task_refuse() and notices that the isci_completion_selection
parameter was set to isci_perform_normal_io_completion by all callers.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

1077a574

isci: add "isci_id" attribute · 34cad85d

Dan Williams authored Mar 10, 2011

Allow each controller to be identified via sysfs.

# cat /sys/class/scsi_host/host13/isci_id
1
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

34cad85d

isci: All pending requests are terminated before stopping the device. · 6e2802a7

Jeff Skirvin authored Mar 08, 2011

Make sure all pending I/O, including any in the libsas error handler
process, is cleaned up.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

6e2802a7

isci: Always set response/status for requests going into the error path. · aa145102

Jeff Skirvin authored Mar 07, 2011

In the case of I/O requests being failed because of a required device
reset condition, set the response and status to indicate an I/O failure.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

aa145102

isci: Errors in the submit path for SATA devices manage the ap lock. · 50e7f9b5

Dan Williams authored Mar 09, 2011

Since libsas takes the domain device sata_dev.ap->lock before submitting
a task, error completions in the submit path for SATA devices must
unlock/relock when completing the sas_task back to libsas.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

50e7f9b5

isci: Fixed BUG_ON in isci_abort_task_process_cb callback. · 70957a94

Jeff Skirvin authored Mar 04, 2011

The request may be in the "aborted" or the "completed" state when
performing a task management operation on it.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

70957a94

isci: Fix TMF build for SAS/SATA LUN reset cases. · c3f42feb

Jeff Skirvin authored Mar 04, 2011

In the case where a SAS or SATA LUN reset TMF is built a NULL pointer
dereference occurred because of the (unused) callback data pointer.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jacek Danecki <Jacek.Danecki@intel.com>

c3f42feb

isci: Termination handling cleanup, added termination timeouts. · 4dc043c4

Jeff Skirvin authored Mar 04, 2011

Added a request "dead" state for use when a termination wait times out.

isci_terminate_pending_requests now detaches the device's pending list
and terminates each entry on the detached list.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

4dc043c4
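
The detach-then-terminate approach described above can be sketched in userspace C. This is only an illustration: `struct req`, `struct dev`, `detach_pending()` and `terminate_detached()` are hypothetical names, and the driver itself works on kernel lists under its own locking. The point is that the pending list is emptied in one step, so entries are then terminated from a private list that nothing else can modify.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request node; not the driver's real structure. */
struct req {
	int id;
	int terminated;
	struct req *next;
};

/* Hypothetical per-device state holding a singly linked pending list. */
struct dev {
	struct req *pending;
};

/*
 * Detach the whole pending list in one step, leaving the device's list
 * empty. In a driver this step would run under the relevant lock.
 */
static struct req *detach_pending(struct dev *d)
{
	struct req *head;

	assert(d != NULL);
	head = d->pending;
	d->pending = NULL;
	return head;
}

/* Terminate every entry on the detached list; returns the entry count. */
static int terminate_detached(struct req *head)
{
	int n = 0;

	while (head) {
		struct req *next = head->next; /* save before unlinking */

		head->next = NULL;
		head->terminated = 1;
		head = next;
		n++;
	}
	return n;
}
```

A caller would invoke `detach_pending()` once and pass the result to `terminate_detached()`; requests queued after the detach land on the now-empty device list and are not touched by this pass.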

isci: Code review change for completion pointer cleanup. · cbb65c66

Jeff Skirvin authored Mar 04, 2011

Since the request structure contains a pointer to the completion to be
used if the request is being aborted or terminated, there is no reason
to pass the completion as a pointer to isci_terminate_request_core().
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Jacek Danecki <Jacek.Danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

cbb65c66

isci: Cleaning up task execute path. · f0846c68

Jeff Skirvin authored Mar 08, 2011

Made sure the device ready check accounts for all states.
Moved the aborted task check into the loop of pulling task requests
off of the submitted list.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Jacek Danecki <Jacek.Danecki@intel.com>
[remove host and device starting state checks]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

f0846c68

isci: save the i/o tag outside the scic request structure. · 1fad9e93

Jeff Skirvin authored Mar 04, 2011

The pointer to the core representation of a request is marked NULL at
completion, but we need to save the i/o tag for task management.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Jacek Danecki <Jacek.Danecki@intel.com>
[revise changelog]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

1fad9e93

isci: Any reset indicated on an I/O completion escalates it to the error path. · ec6c9638

Jeff Skirvin authored Mar 04, 2011

If there is a pending device reset, the I/O is used to accomplish the reset by setting the
RESET bit in the task status, and then putting the task into the error handler
path using sas abort task.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Jacek Danecki <Jacek.Danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

ec6c9638

isci: fix completion / abort path. · a5fde225

Jeff Skirvin authored Mar 04, 2011

Corrected use of the request state_lock in the completion callback.

In the case where an abort (or reset) thread is trying to terminate an
I/O request, it sets the request state to "aborting" (or "terminating")
if the state is still "starting".  One of the bugs was to never set the
state to "completed".  Another was to not correctly recognize the
situation where the I/O had completed but the sas_task was still pending
callback to task_done - this was typically a problem in the LUN and
device reset cases.

It is now possible that we leave isci_task_abort_task() with
request->io_request_completion pointing to a locally allocated
aborted_io_completion struct. This may result in a system crash.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Maciej Trela <Maciej.Trela@intel.com>
Signed-off-by: Jacek Danecki <Jacek.Danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

a5fde225

isci: Changes in isci_host_completion_routine · 11b00c19

Jeff Skirvin authored Mar 04, 2011

Changes to move management of the reqs_in_process entry for the request here.
Made changes to note when the task is already in the abort path and
cannot be completed through callbacks.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Jacek Danecki <Jacek.Danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

11b00c19

isci: isci_request_cleanup_completed_loiterer checks task before task_done · 18d3d72a

Jeff Skirvin authored Mar 04, 2011

In the condition where outstanding I/Os are being cleaned from the device
requests in process list, the cleanup function needs to check that the
request is actually a sas-task and not a task management function.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

18d3d72a

isci: cleanup debug leftovers in isci.h · 5409bc3a

Dan Williams authored Mar 08, 2011

Reported-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

5409bc3a

isci: replace remote_device_lock with scic_lock · 1a38045b

Dan Williams authored Mar 03, 2011

The remote_device_lock is currently used to protect a controller global
resource (RNCs), but the remote_device_lock is per-port.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

1a38045b

isci: preallocate remote devices · d9c37390

Dan Williams authored Mar 03, 2011

Until we synchronize against device removal this limits the damage of
use after free bugs to the driver's own objects. Unless we implement
reference counting we need to ensure at least a subset of a remote
device is valid at all times. We follow the lead of other libsas
drivers that also preallocate devices.

This also enforces maximum remote device accounting at the lldd layer,
but the core may still run out of RNCs before we hit this limit.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

d9c37390
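
As a rough illustration of preallocation, the sketch below keeps a fixed array of device slots inside a host object, so a stale pointer to a "freed" device still points at memory the host owns. All names (`MAX_REMOTE_DEVICES`, `struct remote_dev`, `dev_alloc()`, `dev_free()`) and the limit of 4 are assumptions made for the example, not the driver's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REMOTE_DEVICES 4 /* hypothetical limit for this sketch */

struct remote_dev {
	int in_use;
	int id;
};

/* The slots live exactly as long as the host object itself. */
struct host {
	struct remote_dev devices[MAX_REMOTE_DEVICES];
};

/* Hand out a free slot, or NULL once the fixed limit is reached. */
static struct remote_dev *dev_alloc(struct host *h)
{
	size_t i;

	assert(h != NULL);
	for (i = 0; i < MAX_REMOTE_DEVICES; i++) {
		if (!h->devices[i].in_use) {
			h->devices[i].in_use = 1;
			h->devices[i].id = (int)i;
			return &h->devices[i];
		}
	}
	return NULL;
}

/* Releasing a slot only clears the flag; the memory stays valid. */
static void dev_free(struct remote_dev *d)
{
	assert(d != NULL);
	d->in_use = 0;
}
```

The fixed array is also what enforces the maximum device count: `dev_alloc()` simply fails when every slot is taken.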

isci: replace isci_remote_device completion with event queue · 6ad31fec

Dan Williams authored Mar 04, 2011

Replace the device completion infrastructure with the controller wide
event queue.  There was a potential for the stop and ready notifications
to corrupt each other, now that cannot happen.

The stop pending flag cannot be used until devices are statically
allocated.  We temporarily need to maintain a completion to handle
waiting for an object that has disappeared, but we can at least stop
scribbling on freed memory.

A future change will also get rid of the "stopping" state as it should
not be exposed to the rest of the driver.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

6ad31fec

isci: kill "host quiesce" mechanism · 8acaec15

Dan Williams authored Mar 07, 2011

The midlayer is already throttling i/o in the places where host_quiesce
was trying to prevent further i/o to the device. It's also problematic
in that it holds a lock over GFP_KERNEL allocations.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

8acaec15

isci: remove sci_device_handle · 3a97eec6

Dan Williams authored Mar 04, 2011

It belies the fact that isci_remote_device and scic_sds_remote_device
are one and the same object with the same lifetime rules.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

3a97eec6

isci: kill isci_host list in favor of an array · b329aff1

Dan Williams authored Mar 07, 2011

isci_host_by_id() should have been a clue that an array would have been
a simpler approach.
Reported-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

b329aff1
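
A minimal sketch of why an array suits lookup by id: the id is the index, so no list walk is needed. `MAX_HOSTS`, `struct host_entry`, `struct pci_info` and `host_by_id()` are hypothetical stand-ins and do not reproduce the driver's own types.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HOSTS 2 /* hypothetical number of controllers per device */

struct host_entry {
	int id;
};

/* Hypothetical per-PCI-device container holding the hosts by index. */
struct pci_info {
	struct host_entry *hosts[MAX_HOSTS];
};

/* Lookup by id is a bounds check plus an array index. */
static struct host_entry *host_by_id(struct pci_info *p, size_t id)
{
	assert(p != NULL);
	return id < MAX_HOSTS ? p->hosts[id] : NULL;
}
```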

isci: enable isci for dmar builds · 52bed8ea

Dan Williams authored Mar 03, 2011

Now that phys_to_virt() and virt_to_phys() have been removed we are no
longer violating the dma mapping (or kmap apis).
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

52bed8ea

isci: pad stp and smp request sizes · fe9a6431

Dan Williams authored Mar 03, 2011

Ross says:
 "The memory allocation for these requests doesn’t take into account the
  additional memory needed when the code in
  scic_sds_s[mst]p_request_assign_buffers() shifts the struct
  scu_task_context so that it is cache line aligned:

  In an example from my machine, total buffer that I’ve given to SCIC goes
  from 0x410024566f84 to 0x410024567308.  From this same example, this
  call shifts my task_context_buffer from 0x410024567208 to
  0x410024567240.

  This means that the task_context_buffer that used to range from
  0x410024567208 to 0x410024567308 instead now goes from 0x410024567240 to
  0x410024567340.

  When the memset() call at the end of scic_task_request_construct()
  clears out this task_context_buffer, it does so from 0x410024567240 to
  0x410024567340, effectively killing whatever buffer follows this
  allocation in memory."

djbw:
Use the kernel's PTR_ALIGN instead of
scic_sds_request_align_task_context_buffer() and SMP_CACHE_BYTES instead of
the local CACHE_LINE_SIZE definition.

TODO: These allocations really want to be better defined in a union rather
than opaque buffers carved up by macros.
Reported-by: Ross Zwisler <ross.zwisler@intel.com>
Signed-off-by: Jacek Danecki <Jacek.Danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

fe9a6431
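
The arithmetic in Ross's report can be checked with a small userspace sketch. It assumes a 64-byte cache line (consistent with the 0x...208 to 0x...240 shift quoted above) and uses hypothetical helpers `align_up()`, `overrun()` and `padded_size()`; the kernel's PTR_ALIGN performs the same round-up on pointers.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 64u /* assumed stand-in for SMP_CACHE_BYTES */

/* Round addr up to the next multiple of align (a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Number of bytes by which a region of 'size' bytes, placed at the
 * aligned version of 'start', runs past the end of the allocation.
 */
static uint64_t overrun(uint64_t start, uint64_t size,
			uint64_t alloc_end, uint64_t align)
{
	uint64_t end = align_up(start, align) + size;

	return end > alloc_end ? end - alloc_end : 0;
}

/* Worst-case size needed so the aligned region always fits. */
static uint64_t padded_size(uint64_t size, uint64_t align)
{
	return size + align - 1;
}
```

With the addresses from the report, `align_up(0x410024567208, 64)` gives 0x410024567240, and the 0x100-byte task context then ends 0x38 bytes past the original allocation end of 0x410024567308. Padding the request by `align - 1` bytes covers the worst-case shift.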

isci: fix hang after target reset · 27ce51df

Dan Williams authored Mar 02, 2011

When aborting a task context we need to be sure that the hardware has acted on
this request (retrieved the task context) before invalidating the remote node
context. In the case of the "dummy" task context and remote node we do not
have the full state machine that goes through the complete tc abort and rnc
invalidate states. Instead we ensure the hardware has seen and acted on
Signed-off-by: Jacek Danecki <Jacek.Danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

27ce51df

isci: Cleanup warning messages for phy resets · d7628d05

Dave Jiang authored Mar 02, 2011

Moving some of the chattiness of warning messages to debug so only the Linux
system messages are shown.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

d7628d05

isci: Adding support for phy enable and disable · 4d07f7f3

Dave Jiang authored Mar 02, 2011

Adding support for PHY_FUNC_LINK_RESET and PHY_FUNC_DISABLE. This allows the
sysfs knob enable (both 0 and 1) and link_reset to work properly.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

4d07f7f3

isci: controller stop/start fixes · c658b109

Pawel Marek authored Mar 01, 2011

Core reworks to support stopping and re-starting the controller, lays the
groundwork for phy disable / re-enable and fixes other bugs around port/phy
setup/teardown.
Signed-off-by: Pawel Marek <pawel.marek@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

c658b109

isci: handle cases where a d2h fis is used to report an ncq error · 3ff0121a

Piotr Sawicki authored Feb 25, 2011

Some devices were observed to return a d2h fis; treat it like an sdb error fis.
Signed-off-by: Piotr Sawicki <piotr.sawicki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

3ff0121a

isci: workaround port task scheduler starvation issue · a8d4b9fe

Tomasz Chudy authored Feb 25, 2011

There is a condition whereby TCs (task contexts) can jump to the head of
the round robin queue causing indefinite starvation of pending tasks.
Posting a TC to a suspended RNC (remote node context) causes the
hardware to select that task first, but since the RNC is suspended the
scheduler proceeds to the next task in the expected round robin fashion,
restoring TC arbitration fairness.
Signed-off-by: Tomasz Chudy <tomasz.chudy@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

a8d4b9fe

isci: rework timer api · 7c40a803

Dan Williams authored Mar 02, 2011

Prepare the timer api for the arrival of dynamic creation and
destruction events from the core.  It pretended to do this previously
but the core to date only used it in a static init-time only fashion.
This is an interim fix until a cleaner event queue can be developed.

1/ make all locking external to the api (add WARN_ONCE to verify)
2/ add a timer_destroy interface (to be used by the core)
3/ use del_timer_sync() prior to deallocating timer data
4/ delete the "timer_list" indirection, we only have timers allocated
   for the isci_host
5/ fix detection of timer list allocation errors
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

7c40a803

isci: fix sas address reporting · 150fc6fc

Dan Williams authored Feb 25, 2011

Undo the open coded and incorrect translation of the oem parameter sas
address to its libsas expected format.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

150fc6fc

isci: Removing deprecated functions · 7392d275

Dave Jiang authored Feb 23, 2011

Removed all callbacks in deprecated.c. The core will call the appropriate
functions directly.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

7392d275

isci: Change event notify calls from scic_cb_* to isci_event_* · a1914059

Dave Jiang authored Feb 23, 2011

Renaming the callbacks to appropriate event notify calls for the LLDD.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

a1914059

isci: have the driver use native SG calls and DMA-API · 6389a775

Dave Jiang authored Feb 23, 2011

Remove abstraction for SG building and get rid of callbacks for getting
DMA memory mapping.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

6389a775

isci: Make the driver copy data directly from and to sg for PIO · 103a00c2

Dave Jiang authored Feb 23, 2011

We can copy the data directly to and from sg for SATA PIO read operations.
There is no reason to involve the hardware SGL. In the process we also need
to kmap the sg because we don't know where that can come from.

We also do not need to call phys_to_virt(). The driver already has the information.
We can just calculate the appropriate offsets.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

103a00c2
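
The idea of copying PIO data straight into a scatter-gather list by computing offsets can be sketched in plain C. `struct seg` and `copy_to_segs()` are hypothetical stand-ins for a scatterlist and a copy helper; the kernel code would additionally kmap each page, which has no userspace equivalent here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one scatter-gather entry. */
struct seg {
	unsigned char *buf;
	size_t len;
};

/*
 * Copy 'len' bytes from 'src' into the segments, starting 'off' bytes
 * into the logical buffer they form. Returns the bytes actually copied,
 * which is less than 'len' if the segments run out.
 */
static size_t copy_to_segs(struct seg *segs, size_t nsegs, size_t off,
			   const unsigned char *src, size_t len)
{
	size_t copied = 0;
	size_t i;

	assert(segs != NULL && src != NULL);
	for (i = 0; i < nsegs && copied < len; i++) {
		size_t room, n;

		if (off >= segs[i].len) {
			off -= segs[i].len; /* offset lies past this segment */
			continue;
		}
		room = segs[i].len - off;
		n = len - copied < room ? len - copied : room;
		memcpy(segs[i].buf + off, src + copied, n);
		copied += n;
		off = 0; /* later segments are written from their start */
	}
	return copied;
}
```

Tracking a running offset across calls is what lets successive PIO data frames land at the right place without any address translation.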

isci: Removed special macros that do 64bit address math · f7885c84

Dave Jiang authored Feb 22, 2011

These macros are not necessary. We can do 64bit math directly.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

f7885c84
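
To illustrate "64bit math directly", the sketch below adds an offset to a 64-bit address with ordinary arithmetic and splits the result into 32-bit halves only where a register layout needs them. The helper names `addr_add()`, `upper32()` and `lower32()` are hypothetical; the kernel offers upper_32_bits() and lower_32_bits() for the split.

```c
#include <assert.h>
#include <stdint.h>

/* Plain 64-bit addition; the carry into the upper half is automatic. */
static uint64_t addr_add(uint64_t base, uint32_t offset)
{
	assert(base <= UINT64_MAX - offset); /* no wrap-around expected */
	return base + offset;
}

/* Upper 32 bits of a 64-bit value. */
static uint32_t upper32(uint64_t v)
{
	return (uint32_t)(v >> 32);
}

/* Lower 32 bits of a 64-bit value. */
static uint32_t lower32(uint64_t v)
{
	return (uint32_t)v;
}
```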