1. 30 Nov, 2016 5 commits
    • scsi: cxgb4i: Add a missing call to neigh_release · 338be072
      Quentin Lambert authored
      Most error branches following the call to dst_neigh_lookup contain a
      call to neigh_release. This patch adds these calls where they are
      missing.
      
      This issue was found with Hector.
      Signed-off-by: Quentin Lambert <lambert.quentin@gmail.com>
      Acked-by: Varun Prakash <varun@chelsio.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
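The release-on-every-error-branch pattern described in this commit can be sketched in user space as follows. This is only an illustration, not the cxgb4i code: `fake_neigh`, `fake_lookup`, `fake_release`, `setup_route`, and the `fail_step` parameter are hypothetical stand-ins, and the choice to leave the reference held on success is an assumption made for the example.

```c
#include <stddef.h>

/* Hypothetical stand-in for a refcounted neighbour entry
 * (not the kernel's struct neighbour). */
struct fake_neigh {
    int refcnt;
};

/* Stand-in for dst_neigh_lookup(): returns the entry with one
 * extra reference held, or NULL if there is no entry. */
static struct fake_neigh *fake_lookup(struct fake_neigh *n)
{
    if (n)
        n->refcnt++;
    return n;
}

/* Stand-in for neigh_release(): drops one reference. */
static void fake_release(struct fake_neigh *n)
{
    n->refcnt--;
}

/*
 * fail_step selects which later step fails (0 = none).
 * Every error branch after the lookup funnels through one label
 * that drops the reference, so no branch can forget the release.
 */
static int setup_route(struct fake_neigh *entry, int fail_step)
{
    struct fake_neigh *n = fake_lookup(entry);
    int err;

    if (!n)
        return -1;              /* nothing acquired, nothing to release */

    if (fail_step == 1) {
        err = -2;
        goto rel_neigh;
    }
    if (fail_step == 2) {
        err = -3;
        goto rel_neigh;
    }
    return 0;                   /* assumption: success keeps the reference */

rel_neigh:
    fake_release(n);
    return err;
}
```

Routing all failures through a single label is one common way to keep the reference count balanced; the actual patch may instead add the release call directly in each branch.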
    • scsi: cxlflash: Avoid command room violation · 11f7b184
      Uma Krishnan authored
      During test, a command room violation interrupt is occasionally seen
      for the master context when the CXL flash devices are stressed.
      
      A study of the code suggests gaps in the way the command room value
      is cached in cxlflash. When the cached command room is zero, the
      thread attempting to send is responsible for updating the cached
      value with the actual value from the AFU. Today, this is handled with
      an atomic set operation of the raw value read. Following the atomic
      update, the thread proceeds to send.
      
      This behavior is incorrect on two counts:
      
         - The update fails to take into account the current thread and its
           consumption of one of the hardware commands.
      
         - The update does not take into account other threads also atomically
           updating. Per design, a worker thread updates the cached value when a
           send thread times out. By not protecting the update with a lock, the
           cached value can be incorrectly clobbered.
      
      To correct these issues, the update of the cached command room has been
      simplified and also protected using a spin lock which is held until the
      MMIO is complete. This ensures the command room is properly consumed by
      the same thread. The update of the cached value also takes into
      account the current thread consuming a hardware command.
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
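A minimal user-space sketch of the locking scheme this commit describes is given below. It is not the driver's code: `fake_afu`, `send_cmd`, and the `hw_room` field (standing in for the AFU's command room register) are hypothetical, a pthread mutex stands in for the spin lock, and decrementing `hw_room` stands in for the MMIO write.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical model of the state involved; names are illustrative only. */
struct fake_afu {
    pthread_mutex_t lock;   /* stands in for the driver's spin lock */
    int64_t room;           /* cached command room */
    int64_t hw_room;        /* stands in for the AFU's room register */
    int64_t sent;           /* commands handed to the "hardware" */
};

/* Returns 0 if the command was sent, -1 if no command room is left. */
static int send_cmd(struct fake_afu *afu)
{
    int rc = 0;

    pthread_mutex_lock(&afu->lock);

    /* Consume one cached slot; refresh the cache only when it ran out. */
    if (--afu->room < 0) {
        /* Refresh from "hardware", already minus this thread's command. */
        afu->room = afu->hw_room - 1;
        if (afu->room < 0) {
            afu->room = 0;
            rc = -1;
            goto out;
        }
    }

    /* Simulated MMIO, issued while the lock is still held. */
    afu->hw_room--;
    afu->sent++;
out:
    pthread_mutex_unlock(&afu->lock);
    return rc;
}
```

Because the refresh, the consumption, and the simulated MMIO all happen under one lock, a second updater cannot overwrite the cached value between the read and the send, and the refreshed value already accounts for the sending thread's own command.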
    • scsi: cxlflash: Improve context_reset() logic · 3d2f617d
      Uma Krishnan authored
      Currently, the context reset routine waits for command room to
      be available before sending the reset request. Per review of the
      SISLite specification and clarifications from the CXL Flash AFU
      designers, this wait is unnecessary. The reset request can be
      sent anytime regardless of command room, so long as only a single
      reset request is active at any one point in time.
      
      This commit simplifies the reset routine by removing the wait for
      command room. Additionally, it adds a debug trace to help pinpoint
      hardware errors when a context reset does not complete.
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: cxlflash: Fix crash in cxlflash_restore_luntable() · 8a260543
      Uma Krishnan authored
      During test, the following crash was observed:
      
      [34538.981505] Faulting instruction address: 0xd000000007c9c870
      cpu 0x9: Vector: 300 (Data Access) at [c0000007f1e8f590]
          pc: d000000007c9c870: cxlflash_restore_luntable+0x70/0x1d0 [cxlflash]
          lr: d000000007c9c84c: cxlflash_restore_luntable+0x4c/0x1d0 [cxlflash]
          sp: c0000007f1e8f810
         msr: 9000000100009033
         dar: c00000171d637438
       dsisr: 40000000
        current = 0xc0000007f1e43f90
        paca    = 0xc000000007b25100   softe: 0        irq_happened: 0x01
          pid   = 493, comm = eehd
      enter ? for help
      [c0000007f1e8f8a0] d000000007c940b0 init_afu+0xd60/0x1200 [cxlflash]
      [c0000007f1e8f9a0] d000000007c945a8 cxlflash_pci_slot_reset+0x58/0xe0 [cxlflash]
      [c0000007f1e8fa20] d00000000715f790 cxl_pci_slot_reset+0x230/0x340 [cxl]
      [c0000007f1e8fae0] c000000000040dd4 eeh_report_reset+0x144/0x180
      [c0000007f1e8fb20] c00000000003f708 eeh_pe_dev_traverse+0x98/0x170
      [c0000007f1e8fbb0] c000000000041618 eeh_handle_normal_event+0x328/0x410
      [c0000007f1e8fc30] c000000000041db8 eeh_handle_event+0x178/0x330
      [c0000007f1e8fce0] c000000000042118 eeh_event_handler+0x1a8/0x1b0
      [c0000007f1e8fd80] c00000000011420c kthread+0xec/0x100
      [c0000007f1e8fe30] c00000000000a47c ret_from_kernel_thread+0x5c/0xe0
      
      When superpipe mode is disabled for a LUN, the references for the
      local LUN are deleted but the LUN is still identified as being present
      in the LUN table. This mismatched state can result in the above crash
      when the LUN table is restored during an error recovery operation.
      
      To fix this issue, the local LUN information structure is updated to
      reflect the LUN is no longer in the LUN table once all references to
      the LUN are gone.
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
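The state mismatch and its fix can be illustrated with a small user-space sketch. The structure and function names (`fake_lun`, `fake_lun_put`, `fake_restore_table`) and the `in_table` flag are hypothetical and do not reflect the driver's actual fields; the sketch only shows the idea of clearing the "present in table" marker when the last reference goes away so that a later restore pass skips the entry.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the local LUN information structure. */
struct fake_lun {
    int refs;        /* outstanding references to this LUN */
    bool in_table;   /* true while the LUN occupies a LUN table slot */
};

/* Drop one reference; once none remain, record that the LUN
 * is no longer in the table. */
static void fake_lun_put(struct fake_lun *lun)
{
    if (lun->refs > 0 && --lun->refs == 0)
        lun->in_table = false;
}

/* Restore pass: only LUNs still marked as present are restored.
 * Returns the number of entries that would be restored. */
static size_t fake_restore_table(const struct fake_lun *luns, size_t n)
{
    size_t restored = 0;

    for (size_t i = 0; i < n; i++) {
        if (!luns[i].in_table)
            continue;           /* stale entry: skip, do not dereference */
        restored++;
    }
    return restored;
}
```

Without the flag update in `fake_lun_put`, the restore pass would still treat the released LUN as present, which mirrors the mismatched state the commit message describes.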
    • scsi: cxlflash: Set sg_tablesize to 1 instead of SG_NONE · 68ab2d76
      Uma Krishnan authored
      The following Oops is encountered when blk_mq is enabled with the
      cxlflash driver:
      
      [ 2960.817172] Oops: Kernel access of bad area, sig: 11 [#5]
      [ 2960.817309] NIP  __blk_mq_run_hw_queue+0x278/0x4c0
      [ 2960.817313] LR __blk_mq_run_hw_queue+0x2bc/0x4c0
      [ 2960.817314] Call Trace:
      [ 2960.817320] __blk_mq_run_hw_queue+0x2bc/0x4c0 (unreliable)
      [ 2960.817324] blk_mq_run_hw_queue+0xd8/0x100
      [ 2960.817329] blk_mq_insert_requests+0x14c/0x1f0
      [ 2960.817333] blk_mq_flush_plug_list+0x150/0x190
      [ 2960.817338] blk_flush_plug_list+0x11c/0x2b0
      [ 2960.817344] blk_finish_plug+0x58/0x80
      [ 2960.817348] __do_page_cache_readahead+0x1c0/0x2e0
      [ 2960.817352] force_page_cache_readahead+0x68/0xd0
      [ 2960.817356] generic_file_read_iter+0x43c/0x6a0
      [ 2960.817359] blkdev_read_iter+0x68/0xa0
      [ 2960.817361] __vfs_read+0x11c/0x180
      [ 2960.817364] vfs_read+0xa4/0x1c0
      [ 2960.817366] SyS_read+0x6c/0x110
      [ 2960.817369] system_call+0x38/0xb4
      
      The SCSI blk_mq stack assumes that sg_tablesize is always a non-zero
      value with scsi_mq_setup_tags() allocating tags using sg_tablesize.
      The cxlflash driver currently uses SG_NONE (0) for the sg_tablesize
      as the devices it supports are not capable of scatter gather. This
      mismatch of values results in the Oops above.
      
      To resolve this issue, sg_tablesize for cxlflash can simply be set
      to 1, a value which satisfies the constraints in cxlflash and the
      lack of support of SG_NONE in SCSI blk_mq.
      Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
      Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  2. 29 Nov, 2016 19 commits
  3. 25 Nov, 2016 13 commits
  4. 22 Nov, 2016 3 commits