1. 19 Aug, 2010 4 commits
    • firewire: core: do not use del_timer_sync() in interrupt context · 2222bcb7
      Clemens Ladisch authored
      Because we might be in interrupt context, replace del_timer_sync() with
      del_timer().  If the timer is already running, we know that it will
      clean up the transaction, so we do not need to do any further processing
      in the normal transaction handler.
      
      Many thanks to Yong Zhang for diagnosing this.
      Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
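      The reasoning in this message can be sketched as a small userspace
      model.  It is not the kernel code: model_timer, model_del_timer() and
      the model_transaction helpers are hypothetical names, and the sketch
      assumes every transaction has its timeout armed.  It only illustrates
      the del_timer() return convention (nonzero if a pending timer was
      deactivated) and why a zero return lets the completion path skip
      cleanup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a kernel timer (not struct timer_list). */
struct model_timer {
    bool pending;   /* armed, and its handler has not started yet */
};

/*
 * Models the del_timer() return convention: nonzero if a pending timer was
 * deactivated, zero if it was not pending (for example because its handler
 * has already started).  It never waits for a running handler, which is what
 * makes it usable where del_timer_sync() is not.
 */
static int model_del_timer(struct model_timer *t)
{
    int was_pending = t->pending;

    t->pending = false;
    return was_pending;
}

struct model_transaction {
    struct model_timer split_timeout;
    int cleanups;   /* how many times the transaction was cleaned up */
};

/* Timer handler path: the timeout fired, so the handler owns the cleanup. */
static void model_timeout_fired(struct model_transaction *t)
{
    t->split_timeout.pending = false;
    t->cleanups++;
}

/* Normal completion path, which may run in interrupt context. */
static void model_complete(struct model_transaction *t)
{
    if (!model_del_timer(&t->split_timeout))
        return;     /* handler already ran or is running: it cleans up */
    t->cleanups++;
}
```

      Calling model_complete() on a transaction whose timeout has already
      fired leaves the cleanup count at one, so the transaction is never
      cleaned up twice.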
    • firewire: net: fix unicast reception RCODE in failure paths · 1bf145fe
      Stefan Richter authored
      The incoming request handler fwnet_receive_packet() expects subsequent
      datagram handling code to return non-zero on errors.  However, almost
      none of the failure paths did so.  Fix them all.
      
      (This error reporting is used to send an RCODE_CONFLICT_ERROR to the
      sender node in such failure cases.  Two modes of failure exist:  Out of
      memory, or firewire-net is unaware of any peer node to which a fragment
      or an ARP packet belongs.  However, it is unclear whether a sender can
      actually make use of such information.  A Linux peer apparently can't.
      Maybe it should all be simplified to void functions.)
      Reported-by: Julia Lawall <julia@diku.dk>
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
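      The return-value contract described in this message can be sketched
      as follows.  This is a simplified userspace model, not the driver
      code: the model_ names, the struct fields and the enum values are
      assumptions made for illustration.  Only the rule itself comes from
      the message: every failure path returns non-zero, and the request
      handler maps that to a conflict-error response.

```c
#include <assert.h>

/* Sketch-local response codes; only the names echo the commit message. */
enum model_rcode {
    MODEL_RCODE_COMPLETE,
    MODEL_RCODE_CONFLICT_ERROR
};

/* Hypothetical inputs covering the two failure modes the message names. */
struct model_datagram {
    int peer_known;     /* 0: no known peer for this fragment or ARP packet */
    int alloc_fails;    /* 1: simulate an out-of-memory condition */
};

/* Datagram handling: 0 on success, non-zero on every failure path. */
static int model_incoming_packet(const struct model_datagram *d)
{
    if (!d->peer_known)
        return -1;      /* unknown peer */
    if (d->alloc_fails)
        return -1;      /* out of memory */
    return 0;
}

/* Request handler: any non-zero result becomes a conflict-error response. */
static enum model_rcode model_receive_packet(const struct model_datagram *d)
{
    return model_incoming_packet(d) ? MODEL_RCODE_CONFLICT_ERROR
                                    : MODEL_RCODE_COMPLETE;
}
```

      A failure path that returned 0 by mistake would make the handler
      answer with the "complete" code even though the datagram was dropped.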
    • firewire: sbp2: fix stall with "Unsolicited response" · a481e97d
      Stefan Richter authored
      Fix I/O stalls with some 4-bay RAID enclosures which are based on
      OXUF936QSE:
        - Onnto dataTale RSM4QO, old firmware (not anymore with current
          firmware),
        - inXtron Hydra Super-S LCM, old as well as current firmware
      when used in RAID-5 mode, perhaps also in other RAID modes.
      
      The stalls happen during heavy or moderate disk traffic in periods that
      are a multiple of 5 minutes, roughly twice per hour.  They are caused
      by the target responding too late to an ORB_Pointer register write:
      The target responds after Split_Timeout, hence firewire-core cancels
      the transaction, and firewire-sbp2 fails the SCSI request.  The SCSI
      core retries the request, which fails again (and again), hence SCSI core
      calls firewire-sbp2's abort handler (and even the Management_Agent
      register write in the abort handler has the transaction timeout
      problem).
      
      During all that, the process which issued the I/O is stalled in I/O
      wait state.
      
      Meanwhile, the target actually acts on the first failed SCSI request:
      It responds to the ORB_Pointer write later (seen in the kernel log as
      "firewire_core: Unsolicited response") and also finishes the SCSI
      request with proper status (seen in the kernel log as "firewire_sbp2:
      status write for unknown orb").
      
      So let's just ignore RCODE_CANCELLED in the transaction callback and
      wait for the target to complete the ORB nevertheless.  This requires
      a small modification in sbp2_cancel_orbs(); it now needs to call
      orb->callback() regardless of whether fw_cancel_transaction() found the
      transaction unfinished or finished.
      
      A different solution is to increase Split_Timeout on the local node.
      (Tested: 2000ms timeout; maybe 1000ms or something like that works too.
      200ms is insufficient.  Standard is 100ms.)  However, I would rather not do
      this because any software on any node could change the Split_Timeout to
      something unsuitable.  Or such a large Split_Timeout may be undesirable
      for other purposes.
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
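      The chosen approach can be illustrated with a small userspace model.
      It is only a sketch under stated assumptions: model_orb and the
      model_ functions are hypothetical, the enum values are local to the
      sketch, and locking and list handling are left out.  It shows the one
      decision the message describes: a cancelled ORB_Pointer write is
      treated like a completed one, so the ORB stays pending until the
      target's status write arrives.

```c
#include <assert.h>

/* Sketch-local result codes; only the names echo the commit message. */
enum model_rcode {
    MODEL_RCODE_COMPLETE,
    MODEL_RCODE_CANCELLED,
    MODEL_RCODE_SEND_ERROR
};

/* Hypothetical, heavily reduced view of an ORB's state. */
struct model_orb {
    int pending;    /* still waiting for the target's status write */
    int failed;     /* the request was failed back to the SCSI layer */
};

/* Transaction callback for the ORB_Pointer write. */
static void model_transaction_done(struct model_orb *orb, enum model_rcode rcode)
{
    /* A late response is expected: keep waiting for the status write. */
    if (rcode == MODEL_RCODE_COMPLETE || rcode == MODEL_RCODE_CANCELLED)
        return;

    /* Any other result is a real failure of the write. */
    orb->pending = 0;
    orb->failed = 1;
}

/* The target finished the ORB and wrote its status. */
static void model_status_write(struct model_orb *orb)
{
    if (!orb->pending)
        return;     /* corresponds to "status write for unknown orb" */
    orb->pending = 0;
}
```

      With the cancelled case ignored, the late status write still finds
      the ORB pending and completes it instead of hitting the "unknown orb"
      branch.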
    • firewire: sbp2: fix memory leak in sbp2_cancel_orbs or at send error · 6c74340b
      Stefan Richter authored
      When an ORB was canceled (Command ORB i.e. SCSI request timed out, or
      Management ORB timed out), or there was a send error in the initial
      transaction, we failed to drop one of the ORB's references and thus
      leaked memory.
      
      Background:
      In total, we hold 3 references to each Operation Request Block:
        - 1 during sbp2_scsi_queuecommand() or sbp2_send_management_orb()
          respectively,
        - 1 for the duration of the write transaction to the ORB_Pointer or
          Management_Agent register of the target,
        - 1 for as long as the ORB stays within the lu->orb_list, until
          the ORB is unlinked from the list and the orb->callback was
          executed.
      
      The last of these 3 references is dropped
        - normally by sbp2_status_write() when the target wrote status
          for a pending ORB,
        - or by sbp2_cancel_orbs() in case of an ORB time-out,
        - or by complete_transaction() in case of a send error.
      Of these, the latter two lacked the kref_put().
      
      Add the missing kref_put()s.  Add comments to the gets and puts of
      references for transaction callbacks and ORB callbacks so that it is
      easier to see what is supposed to happen.
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
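      The reference accounting described under "Background" can be checked
      with a small userspace model.  This is a sketch, not the driver: a
      plain integer stands in for the kref, and model_orb and all model_
      functions are hypothetical names.  It takes the three references
      listed above and shows that the count only reaches zero when the
      cancel and send-error paths also drop the orb_list reference.

```c
#include <assert.h>

/* Hypothetical ORB with a plain counter standing in for a kref. */
struct model_orb {
    int refs;
    int freed;
    int on_list;    /* models membership in lu->orb_list */
};

static void model_get(struct model_orb *o)
{
    o->refs++;
}

static void model_put(struct model_orb *o)
{
    assert(o->refs > 0);
    if (--o->refs == 0)
        o->freed = 1;   /* the release function would run here */
}

/* Submission takes the three references listed in the message. */
static void model_submit(struct model_orb *o)
{
    o->refs = 1;        /* held by the submitting function */
    model_get(o);       /* write transaction */
    model_get(o);       /* orb_list membership */
    o->on_list = 1;
}

/* Transaction callback; on a send error it also unlinks the ORB. */
static void model_transaction_callback(struct model_orb *o, int send_error)
{
    if (send_error && o->on_list) {
        o->on_list = 0;
        model_put(o);   /* orb_list reference: a put the fix adds */
    }
    model_put(o);       /* transaction reference */
}

/* ORB time-out path. */
static void model_cancel(struct model_orb *o)
{
    if (o->on_list) {
        o->on_list = 0;
        model_put(o);   /* orb_list reference: a put the fix adds */
    }
}

/* The submitting function is done with the ORB. */
static void model_submitter_done(struct model_orb *o)
{
    model_put(o);
}
```

      Removing either of the two marked model_put() calls leaves refs at 1
      after all paths have run, which corresponds to the leak this commit
      fixes.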
  2. 05 Aug, 2010 1 commit
  3. 02 Aug, 2010 2 commits
  4. 01 Aug, 2010 2 commits
  5. 31 Jul, 2010 5 commits
  6. 30 Jul, 2010 8 commits
  7. 29 Jul, 2010 18 commits