1. 06 Sep, 2019 9 commits
    • nvme-pci: Fix async probe remove race · 4a982919
      Keith Busch authored
      [ Upstream commit bd46a906 ]
      
      Ensure the controller is not in the NEW state when nvme_probe() exits.
      This will always allow a subsequent nvme_remove() to set the state to
      DELETING, fixing a potential race between the initial asynchronous probe
      and device removal.
      
      Reported-by: Li Zhong <lizhongfs@gmail.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
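The state rule behind this fix can be illustrated with a small model. This is a minimal sketch, assuming a simplified set of controller states and a transition table in which DELETING cannot be entered from NEW; the names `ctrl_state`, `change_state`, `probe_racy`, `probe_fixed` and `remove_ctrl` are hypothetical and are not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified controller states (not the kernel's definitions). */
enum ctrl_state { CTRL_NEW, CTRL_RESETTING, CTRL_LIVE, CTRL_DELETING };

struct ctrl {
	enum ctrl_state state;
};

/* Apply a transition only if the assumed table allows it; report success. */
static bool change_state(struct ctrl *c, enum ctrl_state next)
{
	bool ok = false;

	assert(c != NULL);
	switch (next) {
	case CTRL_RESETTING:
		ok = (c->state == CTRL_NEW || c->state == CTRL_LIVE);
		break;
	case CTRL_LIVE:
		ok = (c->state == CTRL_RESETTING);
		break;
	case CTRL_DELETING:
		/* Assumption: DELETING is not reachable directly from NEW. */
		ok = (c->state == CTRL_LIVE || c->state == CTRL_RESETTING);
		break;
	default:
		break;
	}
	if (ok)
		c->state = next;
	return ok;
}

/* Models a probe that returns while the controller is still NEW. */
static void probe_racy(struct ctrl *c)
{
	c->state = CTRL_NEW;
}

/* Models a probe that leaves NEW before returning. */
static void probe_fixed(struct ctrl *c)
{
	c->state = CTRL_NEW;
	change_state(c, CTRL_RESETTING);
}

/* Models removal: it can only proceed if DELETING can be entered. */
static bool remove_ctrl(struct ctrl *c)
{
	return change_state(c, CTRL_DELETING);
}
```

In this model, removal right after `probe_racy()` fails to enter DELETING, while removal after `probe_fixed()` always succeeds.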
    • nvme: fix a possible deadlock when passthru commands sent to a multipath device · 431f579a
      Sagi Grimberg authored
      [ Upstream commit b9156dae ]
      
      When the user issues a command with side effects, we will end up freezing
      the namespace request queue when updating disk info (and the same for
      the corresponding mpath disk node).
      
      However, we are not freezing the mpath node request queue,
      which means that mpath I/O can still come in and block on blk_queue_enter
      (called from nvme_ns_head_make_request -> direct_make_request).
      
      This is a deadlock, because blk_queue_enter will block until the inner
      namespace request queue is unfrozen, but that process is blocked because
      the namespace revalidation is trying to update the mpath disk info
      and freeze its request queue (which will never complete because
      of the I/O that is blocked on blk_queue_enter).
      
      Fix this by freezing all the subsystem nsheads request queues before
      executing the passthru command. Given that these commands are infrequent
      we should not worry about this temporary I/O freeze to keep things sane.
      
      Here are the matching hang traces:
      --
      [ 374.465002] INFO: task systemd-udevd:17994 blocked for more than 122 seconds.
      [ 374.472975] Not tainted 5.2.0-rc3-mpdebug+ #42
      [ 374.478522] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 374.487274] systemd-udevd D 0 17994 1 0x00000000
      [ 374.493407] Call Trace:
      [ 374.496145] __schedule+0x2ef/0x620
      [ 374.500047] schedule+0x38/0xa0
      [ 374.503569] blk_queue_enter+0x139/0x220
      [ 374.507959] ? remove_wait_queue+0x60/0x60
      [ 374.512540] direct_make_request+0x60/0x130
      [ 374.517219] nvme_ns_head_make_request+0x11d/0x420 [nvme_core]
      [ 374.523740] ? generic_make_request_checks+0x307/0x6f0
      [ 374.529484] generic_make_request+0x10d/0x2e0
      [ 374.534356] submit_bio+0x75/0x140
      [ 374.538163] ? guard_bio_eod+0x32/0xe0
      [ 374.542361] submit_bh_wbc+0x171/0x1b0
      [ 374.546553] block_read_full_page+0x1ed/0x330
      [ 374.551426] ? check_disk_change+0x70/0x70
      [ 374.556008] ? scan_shadow_nodes+0x30/0x30
      [ 374.560588] blkdev_readpage+0x18/0x20
      [ 374.564783] do_read_cache_page+0x301/0x860
      [ 374.569463] ? blkdev_writepages+0x10/0x10
      [ 374.574037] ? prep_new_page+0x88/0x130
      [ 374.578329] ? get_page_from_freelist+0xa2f/0x1280
      [ 374.583688] ? __alloc_pages_nodemask+0x179/0x320
      [ 374.588947] read_cache_page+0x12/0x20
      [ 374.593142] read_dev_sector+0x2d/0xd0
      [ 374.597337] read_lba+0x104/0x1f0
      [ 374.601046] find_valid_gpt+0xfa/0x720
      [ 374.605243] ? string_nocheck+0x58/0x70
      [ 374.609534] ? find_valid_gpt+0x720/0x720
      [ 374.614016] efi_partition+0x89/0x430
      [ 374.618113] ? string+0x48/0x60
      [ 374.621632] ? snprintf+0x49/0x70
      [ 374.625339] ? find_valid_gpt+0x720/0x720
      [ 374.629828] check_partition+0x116/0x210
      [ 374.634214] rescan_partitions+0xb6/0x360
      [ 374.638699] __blkdev_reread_part+0x64/0x70
      [ 374.643377] blkdev_reread_part+0x23/0x40
      [ 374.647860] blkdev_ioctl+0x48c/0x990
      [ 374.651956] block_ioctl+0x41/0x50
      [ 374.655766] do_vfs_ioctl+0xa7/0x600
      [ 374.659766] ? locks_lock_inode_wait+0xb1/0x150
      [ 374.664832] ksys_ioctl+0x67/0x90
      [ 374.668539] __x64_sys_ioctl+0x1a/0x20
      [ 374.672732] do_syscall_64+0x5a/0x1c0
      [ 374.676828] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [ 374.738474] INFO: task nvmeadm:49141 blocked for more than 123 seconds.
      [ 374.745871] Not tainted 5.2.0-rc3-mpdebug+ #42
      [ 374.751419] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 374.760170] nvmeadm D 0 49141 36333 0x00004080
      [ 374.766301] Call Trace:
      [ 374.769038] __schedule+0x2ef/0x620
      [ 374.772939] schedule+0x38/0xa0
      [ 374.776452] blk_mq_freeze_queue_wait+0x59/0x100
      [ 374.781614] ? remove_wait_queue+0x60/0x60
      [ 374.786192] blk_mq_freeze_queue+0x1a/0x20
      [ 374.790773] nvme_update_disk_info.isra.57+0x5f/0x350 [nvme_core]
      [ 374.797582] ? nvme_identify_ns.isra.50+0x71/0xc0 [nvme_core]
      [ 374.804006] __nvme_revalidate_disk+0xe5/0x110 [nvme_core]
      [ 374.810139] nvme_revalidate_disk+0xa6/0x120 [nvme_core]
      [ 374.816078] ? nvme_submit_user_cmd+0x11e/0x320 [nvme_core]
      [ 374.822299] nvme_user_cmd+0x264/0x370 [nvme_core]
      [ 374.827661] nvme_dev_ioctl+0x112/0x1d0 [nvme_core]
      [ 374.833114] do_vfs_ioctl+0xa7/0x600
      [ 374.837117] ? __audit_syscall_entry+0xdd/0x130
      [ 374.842184] ksys_ioctl+0x67/0x90
      [ 374.845891] __x64_sys_ioctl+0x1a/0x20
      [ 374.850082] do_syscall_64+0x5a/0x1c0
      [ 374.854178] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      --
      Reported-by: James Puthukattukaran <james.puthukattukaran@oracle.com>
      Tested-by: James Puthukattukaran <james.puthukattukaran@oracle.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • nvmet-loop: Flush nvme_delete_wq when removing the port · 32c0b8f1
      Logan Gunthorpe authored
      [ Upstream commit 86b9a63e ]
      
      After calling nvme_loop_delete_ctrl(), the controllers will not
      yet be deleted because nvme_delete_ctrl() only schedules work
      to do the delete.
      
      This means a race can occur if a port is removed but there
      are still active controllers trying to access that memory.
      
      To fix this, flush the nvme_delete_wq before returning from
      nvme_loop_remove_port() so that any controllers that might
      be in the process of being deleted won't access a freed port.
      
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • afs: Only update d_fsdata if different in afs_d_revalidate() · 9c55dc85
      David Howells authored
      [ Upstream commit 5dc84855 ]
      
      In the in-kernel afs filesystem, d_fsdata is set with the data version of
      the parent directory.  afs_d_revalidate() will update this to the current
      directory version, but it shouldn't do this if the value it read from
      d_fsdata is the same, as no lock is held and cmpxchg() is not used.
      
      Fix the code to only change the value if it is different from the current
      directory version.
      
      Fixes: 260a9803 ("[AFS]: Add "directory write" support.")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
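The compare-before-store idea in this commit can be sketched in userspace C. This is only an illustration under stated assumptions: `dentry_stub`, its `writes` counter and `update_dir_version` are hypothetical stand-ins, not the afs code, and the counter exists purely to make skipped stores observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a dentry; 'writes' counts stores to fsdata. */
struct dentry_stub {
	uintptr_t fsdata;
	unsigned int writes;
};

/*
 * Store the directory version only when it differs from the cached one.
 * Returns true if a store happened. Skipping the redundant store avoids
 * writing to a shared field when nothing has changed.
 */
static bool update_dir_version(struct dentry_stub *d, uintptr_t dir_version)
{
	assert(d != NULL);
	if (d->fsdata == dir_version)
		return false;
	d->fsdata = dir_version;
	d->writes++;
	return true;
}
```

Calling `update_dir_version()` twice with the same version performs at most one store.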
    • fs: afs: Fix a possible null-pointer dereference in afs_put_read() · 24e093b9
      Jia-Ju Bai authored
      [ Upstream commit a6eed4ab ]
      
      In afs_read_dir(), there is an if statement on line 255 to check whether
      req->pages is NULL:
      	if (!req->pages)
      		goto error;
      
      If req->pages is NULL, afs_put_read() on line 337 is executed.
      In afs_put_read(), req->pages[i] is used on line 195.
      Thus, a possible null-pointer dereference may occur in this case.
      
      To fix this possible bug, an if statement is added in afs_put_read() to
      check req->pages.
      
      This bug is found by a static analysis tool STCheck written by us.
      
      Fixes: f3ddee8d ("afs: Fix directory handling")
      Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
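The shape of this fix, guarding the page loop in the release function, can be sketched as follows. This is a userspace approximation under stated assumptions: `read_req` and `put_read` are hypothetical names, `malloc`/`free` stand in for page allocation and release, and the returned count is added only so the behaviour can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request: 'pages' may be NULL if its allocation failed. */
struct read_req {
	unsigned int nr_pages;
	void **pages;
};

/*
 * Release a request and everything it owns.
 * Returns the number of page buffers that were released.
 */
static unsigned int put_read(struct read_req *req)
{
	unsigned int released = 0;

	assert(req != NULL);
	if (req->pages) {	/* the added check: skip the loop if never allocated */
		for (unsigned int i = 0; i < req->nr_pages; i++) {
			if (req->pages[i]) {
				free(req->pages[i]);
				released++;
			}
		}
		free(req->pages);
	}
	free(req);
	return released;
}
```

With the check in place, an error path that reaches `put_read()` before `pages` was allocated no longer indexes a NULL array.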
    • afs: Fix loop index mixup in afs_deliver_vl_get_entry_by_name_u() · 8e5179f9
      Marc Dionne authored
      [ Upstream commit 4a46fdba ]
      
      afs_deliver_vl_get_entry_by_name_u() scans through the vl entry
      received from the volume location server and builds a return list
      containing the sites that are currently valid.  When assigning
      values for the return list, the index into the vl entry (i) is used
      rather than the one for the new list (entry->nr_server).  If all
      sites are usable, this works out fine as the indices will match.
      If some sites are not valid, for example if AFS_VLSF_DONTUSE is
      set, fs_mask and the uuid will be set for the wrong return site.
      
      Fix this by using entry->nr_server as the index into the arrays
      being filled in rather than i.
      
      This can lead to EDESTADDRREQ errors if none of the returned sites
      have a valid fs_mask.
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
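The index mixup described above is a common pattern when filtering one array into another. The following is a minimal sketch, assuming simplified hypothetical structures (`raw_entry`, `site_list`, `FLAG_DONTUSE`, `build_site_list`) rather than the real afs types; it shows the corrected form, where the output count is used as the write index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SITES    4
#define FLAG_DONTUSE 0x1u	/* hypothetical stand-in for AFS_VLSF_DONTUSE */

/* Hypothetical raw entry as received: one slot per site, valid or not. */
struct raw_entry {
	unsigned int nr;
	uint32_t flags[MAX_SITES];
	uint32_t mask[MAX_SITES];
};

/* Hypothetical compacted list containing only usable sites. */
struct site_list {
	unsigned int nr_servers;
	uint32_t fs_mask[MAX_SITES];
};

static void build_site_list(const struct raw_entry *in, struct site_list *out)
{
	assert(in != NULL && out != NULL);
	out->nr_servers = 0;
	for (unsigned int i = 0; i < in->nr && i < MAX_SITES; i++) {
		if (in->flags[i] & FLAG_DONTUSE)
			continue;	/* skipped sites leave no gap in the output */
		/* Write at the output index, not at i. */
		out->fs_mask[out->nr_servers] = in->mask[i];
		out->nr_servers++;
	}
}
```

Had the loop written to `out->fs_mask[i]` instead, the two indices would diverge as soon as one site was skipped, and later masks would land in the wrong output slots.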
    • afs: Fix the CB.ProbeUuid service handler to reply correctly · dfc438c0
      David Howells authored
      [ Upstream commit 2067b2b3 ]
      
      Fix the service handler function for the CB.ProbeUuid RPC call so that it
      replies in the correct manner - that is an empty reply for success and an
      abort of 1 for failure.
      
      Putting 0 or 1 in an integer in the body of the reply should result in the
      fileserver throwing an RX_PROTOCOL_ERROR abort and discarding its record of
      the client; older servers, however, don't necessarily check that all the
      data got consumed, and so might incorrectly think that they got a positive
      response and associate the client with the wrong host record.
      
      If the client is incorrectly associated, this will result in callbacks
      intended for a different client being delivered to this one and then, when
      the other client connects and responds positively, all of the callback
      promises meant for the client that issued the improper response will be
      lost and it won't receive any further change notifications.
      
      Fixes: 9396d496 ("afs: support the CB.ProbeUuid RPC op")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • nvme-multipath: revalidate nvme_ns_head gendisk in nvme_validate_ns · 7436dc2a
      Anthony Iliopoulos authored
      [ Upstream commit fab7772b ]
      
      When CONFIG_NVME_MULTIPATH is set, only the hidden gendisk associated
      with the per-controller ns is run through revalidate_disk when a
      rescan is triggered, while the visible blockdev never gets its size
      (bdev->bd_inode->i_size) updated to reflect any capacity changes that
      may have occurred.
      
      This prevents online resizing of nvme block devices and, by extension,
      of any filesystems atop them, which are unable to expand while mounted,
      as userspace relies on the blockdev size for obtaining the disk capacity
      (via BLKGETSIZE/64 ioctls).
      
      Fix this by explicitly revalidating the actual namespace gendisk in
      addition to the per-controller gendisk, when multipath is enabled.
      
      Signed-off-by: Anthony Iliopoulos <ailiopoulos@suse.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • dmaengine: ste_dma40: fix unneeded variable warning · 2013d6ec
      Arnd Bergmann authored
      [ Upstream commit 5d6fb560 ]
      
      clang-9 points out that there are two variables that depending on the
      configuration may only be used in an ARRAY_SIZE() expression but not
      referenced:
      
      drivers/dma/ste_dma40.c:145:12: error: variable 'd40_backup_regs' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
      static u32 d40_backup_regs[] = {
                 ^
      drivers/dma/ste_dma40.c:214:12: error: variable 'd40_backup_regs_chan' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
      static u32 d40_backup_regs_chan[] = {
      
      Mark these __maybe_unused to shut up the warning.
      
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Link: https://lore.kernel.org/r/20190712091357.744515-1-arnd@arndb.de
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
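The situation clang reports can be reproduced in a few lines. This is a minimal sketch, assuming GCC- or clang-style attributes and a C11 compiler; `MAYBE_UNUSED`, `backup_regs` and `backup_reg_count` are hypothetical userspace stand-ins for the kernel's `__maybe_unused` and the driver's arrays.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Userspace stand-in for the kernel's __maybe_unused annotation. */
#define MAYBE_UNUSED __attribute__((unused))

/*
 * The array is only ever named inside sizeof (via ARRAY_SIZE), which does
 * not evaluate its operand, so no code refers to the data itself. The
 * attribute tells the compiler that this is intentional.
 */
static MAYBE_UNUSED unsigned int backup_regs[] = { 0x10, 0x14, 0x18 };

/* Compile-time use: still no reference to the array's contents. */
static_assert(ARRAY_SIZE(backup_regs) == 3, "unexpected register count");

static size_t backup_reg_count(void)
{
	return ARRAY_SIZE(backup_regs);
}
```

The attribute changes only diagnostics: the compiler is still free to drop the array from the object file when nothing reads its contents.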
  2. 29 Aug, 2019 31 commits