- 10 Dec, 2018 1 commit
-
-
Jens Axboe authored
Ming reports that lockdep spews the following trace. What this essentially says is that the sbitmap swap_lock was used inconsistently in IRQ enabled and disabled context, and that is usually indicative of a bug that will cause a deadlock. For this case, it's a false positive. The swap_lock is used from process context only, when we swap the bits in the word and cleared mask. We also end up doing that when we are getting a driver tag, from the blk_mq_mark_tag_wait(), and from there we hold the waitqueue lock with IRQs disabled. However, this isn't from an actual IRQ, it's still process context. In lieu of a better way to fix this, simply always disable interrupts when grabbing the swap_lock if lockdep is enabled. [ 100.967642] ================start test sanity/001================ [ 101.238280] null: module loaded [ 106.093735] [ 106.094012] ===================================================== [ 106.094854] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected [ 106.095759] 4.20.0-rc3_5d2ee712_for-next+ #1 Not tainted [ 106.096551] ----------------------------------------------------- [ 106.097386] fio/1043 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: [ 106.098231] 000000004c43fa71 (&(&sb->map[i].swap_lock)->rlock){+.+.}, at: sbitmap_get+0xd5/0x22c [ 106.099431] [ 106.099431] and this task is already holding: [ 106.100229] 000000007eec8b2f (&(&hctx->dispatch_wait_lock)->rlock){....}, at: blk_mq_dispatch_rq_list+0x4c1/0xd7c [ 106.101630] which would create a new lock dependency: [ 106.102326] (&(&hctx->dispatch_wait_lock)->rlock){....} -> (&(&sb->map[i].swap_lock)->rlock){+.+.} [ 106.103553] [ 106.103553] but this new dependency connects a SOFTIRQ-irq-safe lock: [ 106.104580] (&sbq->ws[i].wait){..-.} [ 106.104582] [ 106.104582] ... which became SOFTIRQ-irq-safe at: [ 106.105751] _raw_spin_lock_irqsave+0x4b/0x82 [ 106.106284] __wake_up_common_lock+0x119/0x1b9 [ 106.106825] sbitmap_queue_wake_up+0x33f/0x383 [ 106.107456] sbitmap_queue_clear+0x4c/0x9a [ 106.108046] __blk_mq_free_request+0x188/0x1d3 [ 106.108581] blk_mq_free_request+0x23b/0x26b [ 106.109102] scsi_end_request+0x345/0x5d7 [ 106.109587] scsi_io_completion+0x4b5/0x8f0 [ 106.110099] scsi_finish_command+0x412/0x456 [ 106.110615] scsi_softirq_done+0x23f/0x29b [ 106.111115] blk_done_softirq+0x2a7/0x2e6 [ 106.111608] __do_softirq+0x360/0x6ad [ 106.112062] run_ksoftirqd+0x2f/0x5b [ 106.112499] smpboot_thread_fn+0x3a5/0x3db [ 106.113000] kthread+0x1d4/0x1e4 [ 106.113457] ret_from_fork+0x3a/0x50 [ 106.113969] [ 106.113969] to a SOFTIRQ-irq-unsafe lock: [ 106.114672] (&(&sb->map[i].swap_lock)->rlock){+.+.} [ 106.114674] [ 106.114674] ... which became SOFTIRQ-irq-unsafe at: [ 106.116000] ... [ 106.116003] _raw_spin_lock+0x33/0x64 [ 106.116676] sbitmap_get+0xd5/0x22c [ 106.117134] __sbitmap_queue_get+0xe8/0x177 [ 106.117731] __blk_mq_get_tag+0x1e6/0x22d [ 106.118286] blk_mq_get_tag+0x1db/0x6e4 [ 106.118756] blk_mq_get_driver_tag+0x161/0x258 [ 106.119383] blk_mq_dispatch_rq_list+0x28e/0xd7c [ 106.120043] blk_mq_do_dispatch_sched+0x23a/0x287 [ 106.120607] blk_mq_sched_dispatch_requests+0x379/0x3fc [ 106.121234] __blk_mq_run_hw_queue+0x137/0x17e [ 106.121781] __blk_mq_delay_run_hw_queue+0x80/0x25f [ 106.122366] blk_mq_run_hw_queue+0x151/0x187 [ 106.122887] blk_mq_sched_insert_requests+0x13f/0x175 [ 106.123492] blk_mq_flush_plug_list+0x7d6/0x81b [ 106.124042] blk_flush_plug_list+0x392/0x3d7 [ 106.124557] blk_finish_plug+0x37/0x4f [ 106.125019] read_pages+0x3ef/0x430 [ 106.125446] __do_page_cache_readahead+0x18e/0x2fc [ 106.126027] force_page_cache_readahead+0x121/0x133 [ 106.126621] page_cache_sync_readahead+0x35f/0x3bb [ 106.127229] generic_file_buffered_read+0x410/0x1860 [ 106.127932] __vfs_read+0x319/0x38f [ 106.128415] vfs_read+0xd2/0x19a [ 106.128817] ksys_read+0xb9/0x135 [ 106.129225] do_syscall_64+0x140/0x385 [ 106.129684] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 106.130292] [ 106.130292] other info that might help us debug this: [ 106.130292] [ 106.131226] Chain exists of: [ 106.131226] &sbq->ws[i].wait --> &(&hctx->dispatch_wait_lock)->rlock --> &(&sb->map[i].swap_lock)->rlock [ 106.131226] [ 106.132865] Possible interrupt unsafe locking scenario: [ 106.132865] [ 106.133659] CPU0 CPU1 [ 106.134194] ---- ---- [ 106.134733] lock(&(&sb->map[i].swap_lock)->rlock); [ 106.135318] local_irq_disable(); [ 106.136014] lock(&sbq->ws[i].wait); [ 106.136747] lock(&(&hctx->dispatch_wait_lock)->rlock); [ 106.137742] <Interrupt> [ 106.138110] lock(&sbq->ws[i].wait); [ 106.138625] [ 106.138625] *** DEADLOCK *** [ 106.138625] [ 106.139430] 3 locks held by fio/1043: [ 106.139947] #0: 0000000076ff0fd9 (rcu_read_lock){....}, at: hctx_lock+0x29/0xe8 [ 106.140813] #1: 000000002feb1016 (&sbq->ws[i].wait){..-.}, at: blk_mq_dispatch_rq_list+0x4ad/0xd7c [ 106.141877] #2: 000000007eec8b2f (&(&hctx->dispatch_wait_lock)->rlock){....}, at: blk_mq_dispatch_rq_list+0x4c1/0xd7c [ 106.143267] [ 106.143267] the dependencies between SOFTIRQ-irq-safe lock and the holding lock: [ 106.144351] -> (&sbq->ws[i].wait){..-.} ops: 82 { [ 106.144926] IN-SOFTIRQ-W at: [ 106.145314] _raw_spin_lock_irqsave+0x4b/0x82 [ 106.146042] __wake_up_common_lock+0x119/0x1b9 [ 106.146785] sbitmap_queue_wake_up+0x33f/0x383 [ 106.147567] sbitmap_queue_clear+0x4c/0x9a [ 106.148379] __blk_mq_free_request+0x188/0x1d3 [ 106.149148] blk_mq_free_request+0x23b/0x26b [ 106.149864] scsi_end_request+0x345/0x5d7 [ 106.150546] scsi_io_completion+0x4b5/0x8f0 [ 106.151367] scsi_finish_command+0x412/0x456 [ 106.152157] scsi_softirq_done+0x23f/0x29b [ 106.152855] blk_done_softirq+0x2a7/0x2e6 [ 106.153537] __do_softirq+0x360/0x6ad [ 106.154280] run_ksoftirqd+0x2f/0x5b [ 106.155020] smpboot_thread_fn+0x3a5/0x3db [ 106.155828] kthread+0x1d4/0x1e4 [ 106.156526] ret_from_fork+0x3a/0x50 [ 106.157267] INITIAL USE at: [ 106.157713] _raw_spin_lock_irqsave+0x4b/0x82 [ 106.158542] prepare_to_wait_exclusive+0xa8/0x215 [ 106.159421] blk_mq_get_tag+0x34f/0x6e4 [ 106.160186] blk_mq_get_request+0x48e/0xaef [ 106.160997] blk_mq_make_request+0x27e/0xbd2 [ 106.161828] generic_make_request+0x4d1/0x873 [ 106.162661] submit_bio+0x20c/0x253 [ 106.163379] mpage_bio_submit+0x44/0x4b [ 106.164142] mpage_readpages+0x3c2/0x407 [ 106.164919] read_pages+0x13a/0x430 [ 106.165633] __do_page_cache_readahead+0x18e/0x2fc [ 106.166530] force_page_cache_readahead+0x121/0x133 [ 106.167439] page_cache_sync_readahead+0x35f/0x3bb [ 106.168337] generic_file_buffered_read+0x410/0x1860 [ 106.169255] __vfs_read+0x319/0x38f [ 106.169977] vfs_read+0xd2/0x19a [ 106.170662] ksys_read+0xb9/0x135 [ 106.171356] do_syscall_64+0x140/0x385 [ 106.172120] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 106.173051] } [ 106.173308] ... key at: [<ffffffff85094600>] __key.26481+0x0/0x40 [ 106.174219] ... acquired at: [ 106.174646] _raw_spin_lock+0x33/0x64 [ 106.175183] blk_mq_dispatch_rq_list+0x4c1/0xd7c [ 106.175843] blk_mq_do_dispatch_sched+0x23a/0x287 [ 106.176518] blk_mq_sched_dispatch_requests+0x379/0x3fc [ 106.177262] __blk_mq_run_hw_queue+0x137/0x17e [ 106.177900] __blk_mq_delay_run_hw_queue+0x80/0x25f [ 106.178591] blk_mq_run_hw_queue+0x151/0x187 [ 106.179207] blk_mq_sched_insert_requests+0x13f/0x175 [ 106.179926] blk_mq_flush_plug_list+0x7d6/0x81b [ 106.180571] blk_flush_plug_list+0x392/0x3d7 [ 106.181187] blk_finish_plug+0x37/0x4f [ 106.181737] __se_sys_io_submit+0x171/0x304 [ 106.182346] do_syscall_64+0x140/0x385 [ 106.182895] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 106.183607] [ 106.183830] -> (&(&hctx->dispatch_wait_lock)->rlock){....} ops: 1 { [ 106.184691] INITIAL USE at: [ 106.185119] _raw_spin_lock+0x33/0x64 [ 106.185838] blk_mq_dispatch_rq_list+0x4c1/0xd7c [ 106.186697] blk_mq_do_dispatch_sched+0x23a/0x287 [ 106.187551] blk_mq_sched_dispatch_requests+0x379/0x3fc [ 106.188481] __blk_mq_run_hw_queue+0x137/0x17e [ 106.189307] __blk_mq_delay_run_hw_queue+0x80/0x25f [ 106.190189] blk_mq_run_hw_queue+0x151/0x187 [ 106.190989] blk_mq_sched_insert_requests+0x13f/0x175 [ 106.191902] blk_mq_flush_plug_list+0x7d6/0x81b [ 106.192739] blk_flush_plug_list+0x392/0x3d7 [ 106.193535] blk_finish_plug+0x37/0x4f [ 106.194269] __se_sys_io_submit+0x171/0x304 [ 106.195059] do_syscall_64+0x140/0x385 [ 106.195794] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 106.196705] } [ 106.196950] ... key at: [<ffffffff84880620>] __key.51231+0x0/0x40 [ 106.197853] ... acquired at: [ 106.198270] lock_acquire+0x280/0x2f3 [ 106.198806] _raw_spin_lock+0x33/0x64 [ 106.199337] sbitmap_get+0xd5/0x22c [ 106.199850] __sbitmap_queue_get+0xe8/0x177 [ 106.200450] __blk_mq_get_tag+0x1e6/0x22d [ 106.201035] blk_mq_get_tag+0x1db/0x6e4 [ 106.201589] blk_mq_get_driver_tag+0x161/0x258 [ 106.202237] blk_mq_dispatch_rq_list+0x5b9/0xd7c [ 106.202902] blk_mq_do_dispatch_sched+0x23a/0x287 [ 106.203572] blk_mq_sched_dispatch_requests+0x379/0x3fc [ 106.204316] __blk_mq_run_hw_queue+0x137/0x17e [ 106.204956] __blk_mq_delay_run_hw_queue+0x80/0x25f [ 106.205649] blk_mq_run_hw_queue+0x151/0x187 [ 106.206269] blk_mq_sched_insert_requests+0x13f/0x175 [ 106.206997] blk_mq_flush_plug_list+0x7d6/0x81b [ 106.207644] blk_flush_plug_list+0x392/0x3d7 [ 106.208264] blk_finish_plug+0x37/0x4f [ 106.208814] __se_sys_io_submit+0x171/0x304 [ 106.209415] do_syscall_64+0x140/0x385 [ 106.209965] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 106.210684] [ 106.210904] [ 106.210904] the dependencies between the lock to be acquired [ 106.210905] and SOFTIRQ-irq-unsafe lock: [ 106.212541] -> (&(&sb->map[i].swap_lock)->rlock){+.+.} ops: 1969 { [ 106.213393] HARDIRQ-ON-W at: [ 106.213840] _raw_spin_lock+0x33/0x64 [ 106.214570] sbitmap_get+0xd5/0x22c [ 106.215282] __sbitmap_queue_get+0xe8/0x177 [ 106.216086] __blk_mq_get_tag+0x1e6/0x22d [ 106.216876] blk_mq_get_tag+0x1db/0x6e4 [ 106.217627] blk_mq_get_driver_tag+0x161/0x258 [ 106.218465] blk_mq_dispatch_rq_list+0x28e/0xd7c [ 106.219326] blk_mq_do_dispatch_sched+0x23a/0x287 [ 106.220198] blk_mq_sched_dispatch_requests+0x379/0x3fc [ 106.221138] __blk_mq_run_hw_queue+0x137/0x17e [ 106.221975] __blk_mq_delay_run_hw_queue+0x80/0x25f [ 106.222874] blk_mq_run_hw_queue+0x151/0x187 [ 106.223686] blk_mq_sched_insert_requests+0x13f/0x175 [ 106.224597] blk_mq_flush_plug_list+0x7d6/0x81b [ 106.225444] blk_flush_plug_list+0x392/0x3d7 [ 106.226255] blk_finish_plug+0x37/0x4f [ 106.227006] read_pages+0x3ef/0x430 [ 106.227717] __do_page_cache_readahead+0x18e/0x2fc [ 106.228595] force_page_cache_readahead+0x121/0x133 [ 106.229491] page_cache_sync_readahead+0x35f/0x3bb [ 106.230373] generic_file_buffered_read+0x410/0x1860 [ 106.231277] __vfs_read+0x319/0x38f [ 106.231986] vfs_read+0xd2/0x19a [ 106.232666] ksys_read+0xb9/0x135 [ 106.233350] do_syscall_64+0x140/0x385 [ 106.234097] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 106.235012] SOFTIRQ-ON-W at: [ 106.235460] _raw_spin_lock+0x33/0x64 [ 106.236195] sbitmap_get+0xd5/0x22c [ 106.236913] __sbitmap_queue_get+0xe8/0x177 [ 106.237715] __blk_mq_get_tag+0x1e6/0x22d [ 106.238488] blk_mq_get_tag+0x1db/0x6e4 [ 106.239244] blk_mq_get_driver_tag+0x161/0x258 [ 106.240079] blk_mq_dispatch_rq_list+0x28e/0xd7c [ 106.240937] blk_mq_do_dispatch_sched+0x23a/0x287 [ 106.241806] blk_mq_sched_dispatch_requests+0x379/0x3fc [ 106.242751] __blk_mq_run_hw_queue+0x137/0x17e [ 106.243579] __blk_mq_delay_run_hw_queue+0x80/0x25f [ 106.244469] blk_mq_run_hw_queue+0x151/0x187 [ 106.245277] blk_mq_sched_insert_requests+0x13f/0x175 [ 106.246191] blk_mq_flush_plug_list+0x7d6/0x81b [ 106.247044] blk_flush_plug_list+0x392/0x3d7 [ 106.247859] blk_finish_plug+0x37/0x4f [ 106.248749] read_pages+0x3ef/0x430 [ 106.249463] __do_page_cache_readahead+0x18e/0x2fc [ 106.250357] force_page_cache_readahead+0x121/0x133 [ 106.251263] page_cache_sync_readahead+0x35f/0x3bb [ 106.252157] generic_file_buffered_read+0x410/0x1860 [ 106.253084] __vfs_read+0x319/0x38f [ 106.253808] vfs_read+0xd2/0x19a [ 106.254488] ksys_read+0xb9/0x135 [ 106.255186] do_syscall_64+0x140/0x385 [ 106.255943] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 106.256867] INITIAL USE at: [ 106.257300] _raw_spin_lock+0x33/0x64 [ 106.258033] sbitmap_get+0xd5/0x22c [ 106.258747] __sbitmap_queue_get+0xe8/0x177 [ 106.259542] __blk_mq_get_tag+0x1e6/0x22d [ 106.260320] blk_mq_get_tag+0x1db/0x6e4 [ 106.261072] blk_mq_get_driver_tag+0x161/0x258 [ 106.261902] blk_mq_dispatch_rq_list+0x28e/0xd7c [ 106.262762] blk_mq_do_dispatch_sched+0x23a/0x287 [ 106.263626] blk_mq_sched_dispatch_requests+0x379/0x3fc [ 106.264571] __blk_mq_run_hw_queue+0x137/0x17e [ 106.265409] __blk_mq_delay_run_hw_queue+0x80/0x25f [ 106.266302] blk_mq_run_hw_queue+0x151/0x187 [ 106.267111] blk_mq_sched_insert_requests+0x13f/0x175 [ 106.268028] blk_mq_flush_plug_list+0x7d6/0x81b [ 106.268878] blk_flush_plug_list+0x392/0x3d7 [ 106.269694] blk_finish_plug+0x37/0x4f [ 106.270432] read_pages+0x3ef/0x430 [ 106.271139] __do_page_cache_readahead+0x18e/0x2fc [ 106.272040] force_page_cache_readahead+0x121/0x133 [ 106.272932] page_cache_sync_readahead+0x35f/0x3bb [ 106.273811] generic_file_buffered_read+0x410/0x1860 [ 106.274709] __vfs_read+0x319/0x38f [ 106.275407] vfs_read+0xd2/0x19a [ 106.276074] ksys_read+0xb9/0x135 [ 106.276764] do_syscall_64+0x140/0x385 [ 106.277500] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 106.278417] } [ 106.278676] ... key at: [<ffffffff85094640>] __key.26212+0x0/0x40 [ 106.279586] ... acquired at: [ 106.280026] lock_acquire+0x280/0x2f3 [ 106.280559] _raw_spin_lock+0x33/0x64 [ 106.281101] sbitmap_get+0xd5/0x22c [ 106.281610] __sbitmap_queue_get+0xe8/0x177 [ 106.282221] __blk_mq_get_tag+0x1e6/0x22d [ 106.282809] blk_mq_get_tag+0x1db/0x6e4 [ 106.283368] blk_mq_get_driver_tag+0x161/0x258 [ 106.284018] blk_mq_dispatch_rq_list+0x5b9/0xd7c [ 106.284685] blk_mq_do_dispatch_sched+0x23a/0x287 [ 106.285371] blk_mq_sched_dispatch_requests+0x379/0x3fc [ 106.286135] __blk_mq_run_hw_queue+0x137/0x17e [ 106.286806] __blk_mq_delay_run_hw_queue+0x80/0x25f [ 106.287515] blk_mq_run_hw_queue+0x151/0x187 [ 106.288149] blk_mq_sched_insert_requests+0x13f/0x175 [ 106.289041] blk_mq_flush_plug_list+0x7d6/0x81b [ 106.289912] blk_flush_plug_list+0x392/0x3d7 [ 106.290590] blk_finish_plug+0x37/0x4f [ 106.291238] __se_sys_io_submit+0x171/0x304 [ 106.291864] do_syscall_64+0x140/0x385 [ 106.292534] entry_SYSCALL_64_after_hwframe+0x49/0xbe Reported-by: Ming Lei <ming.lei@redhat.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 08 Dec, 2018 39 commits
-
-
Dan Carpenter authored
Smatch generates a warning: drivers/scsi/scsi_lib.c:1656 scsi_mq_done() warn: test_bit() takes a bit number The problem is that SCMD_STATE_COMPLETE is supposed to be bit number 0 and not a mask like "(1 << 0)". It is used like this: if (test_and_set_bit(SCMD_STATE_COMPLETE, &scmd->state)) The test_and_set_bit() has a shift built in so it's a double left shift and uses bit number 1 instead of number 0. This bug is harmless because it's done consistently and it doesn't clash with any other flags. Fixes: f1342709 ("scsi: Do not rely on blk-mq for double completions") Reviewed-by: Keith Busch <keith.busch@intel.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Israel Rukshin authored
Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Keith Busch authored
A controller may have an internal state that is not able to successfully process commands for a short duration. In such states, an immediate command requeue is expected to fail. The driver may exceed its max retry count, which permanently ends the command in failure when the same command would succeed after waiting for the controller to be ready. NVMe ratified TP 4033 provides a delay hint in the completion status code for failed commands. Implement the retry delay based on the command completion status and the controller's requested delay. Note that requeued commands are handled per request_queue, not per individual request. If multiple commands fail, the controller should consistently report the desired delay time for retryable commands in all CQEs, otherwise the requeue list may be kicked too soon. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Chaitanya Kulkarni authored
This is a cleanup patch which fixes the structure member indentation introduced by the p2p: commit c6925093 ("nvmet: Optionally use PCI P2P memory"). We don't change any functionality in this patch. This is needed so that any future members will also follow the uniform indentation. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Chaitanya Kulkarni authored
This patch adds unlikely in the nvmet request completion path for the status check in the low level function __nvmet_req_complete. This is helpful in the scenario where host and target connection is working smoothly. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Israel Rukshin authored
Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Israel Rukshin authored
Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Sagi Grimberg authored
As for now, we don't care about sq_head pointer updates anyway, so at least allow the controller to micro-optimize by omiting this update. Note that we will probably need to support it when a controller that requires this comes along. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Sagi Grimberg authored
Technical Proposal introduces an indication for SQ flow control disable support. Expose it since we are able to operate in this mode. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Sagi Grimberg authored
Only override the allowed parts of it. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> [hch: slight tweak to the NVME_TREQ_SECURE_CHANNEL_MASK definition] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Sagi Grimberg authored
Technical proposal 8005 "fabrics SQ flow control" introduces a mode where a host and controller agree to omit sq_head pointer updates when sending nvme completions. In case the host indicated desire to operate in this mode (connect attribute) the controller will return back a connect completion with sq_head value of 0xffff as indication that it will omit sq_head pointer updates. This mode saves us an atomic update in the I/O path. Reviewed-by: Hannes Reinecke <hare@suse.com> [hch: suggested better implementation] Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
James Smart authored
All target lldd's call the cmd receive and op completions in non-isr thread contexts. As such the IN_ISR options are not necessary. Remove the functionality and flags, which also removes cpu assignments to queues. Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jay Sternberg authored
Add functions to find connections requesting Discovery Change events and send a notification to hosts that maintain an explicit persistent connection and have and active Asynchronous Event Request pending. Only Hosts that have access to the Subsystem effected by the change will receive notifications of Discovery Change event. Call these functions each time there is a configfs change that effects the Discover Log Pages. Set the OAES field in the Identify Controller response to advertise the support for Asynchronous Event Notifications. Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com> Reviewed-by: Phil Cayton <phil.cayton@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Sagi Grimberg authored
It is perfectly valid that a host connects to a discovery subsystem and gets an empty discovery log page since no subsystems are provisioned to it. No reason to disallow connecting to the discovery subsystem all together. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jay Sternberg <jay.e.sternberg@intel.com> Reviewed-by: Phil Cayton <phil.cayton@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jay Sternberg authored
Add custom get/set features to commands allowed by Discovery controllers. Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jay Sternberg authored
Add AEN/AER values as defined by the specification Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jay Sternberg authored
Make common process of get/set features available to other controllers by making simple functions static inline and others not static and prototypes in nvmet.h file Also remove static from nvmet_execute_async_event and add prototype to nvmet.h to allow used by other controllers Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jay Sternberg authored
Per change to specification allowing Discovery controllers to have explicit persistent connections, remove restriction on Discovery controllers allowing kato on connect. Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jay Sternberg authored
Functions nvmet_aen_disabled and nvmet_clear_aen were using values not bit numbers ie 1 << 9 not 9 for bit function clear_bit and test_and_set_bit. Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com> Reviewed-by: Phil Cayton <phil.cayton@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jay Sternberg authored
Move nvmet_aen_disabled and nvmet_clear_aen in preparation for other types of controllers to use, initially the discovery controller. Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Chaitanya Kulkarni authored
This patch optimizes read command behavior when file-ns configured with buffered I/O. Instead of offloading the buffered I/O read operations to the worker threads, we first issue the read operation with IOCB_NOWAIT and try and access the data from the cache. Here we only offload the request to the worker thread and complete the request in the worker thread context when IOCB_NOWAIT request fails. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Sagi Grimberg authored
A controller that supports traffic based keep-alive can restart the keep alive timer even when no keep-alive was not received in the kato period as long as other admin or I/O commands were received. For each command set ctrl->cmd_seen to true, and when keep-alive timer expires, if any commands were seen, resched ka_work instead of escalating to a fatal error. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Sagi Grimberg authored
If the controller supports traffic based keep alive, we restart the keep alive timer if any admin or io commands was completed during the kato period. This prevents a possible starvation of keep alive commands in the presence of heavy traffic as in such case, we already have a health indication from the host perspective. Only set a comp_seen indicator in case the controller supports keep alive to minimize the overhead for pci controllers. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Sagi Grimberg authored
We get the controller attributes in identify, cache them as we'll need them for traffic based keep alive support. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Sagi Grimberg authored
We are growing more controller attributes, so use a proper enumeration for it. For now just add the 128-bit hostid which we support. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Hannes Reinecke authored
Instead of directly poking into the struct device add a new numa_node field to struct nvme_ctrl. This allows fabrics drivers where ctrl->dev is a virtual device to support NUMA affinity as well. Also expose the field as a sysfs attribute, and populate it for the RDMA and FC transports. Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Chaitanya Kulkarni authored
In function nvme_setup_cmd() we call command specific setup function for flush, rw, and discard. Instead of calling memset in each function lets call it once in the parent function. This is purely code cleanup patch and it does not change any existing functionality. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
Now almost all .map_queues() implementation based on managed irq affinity doesn't update queue mapping and it just retrieves the old built mapping, so if nr_hw_queues is changed, the mapping talbe includes stale mapping. And only blk_mq_map_queues() may rebuild the mapping talbe. One case is that we limit .nr_hw_queues as 1 in case of kdump kernel. However, drivers often builds queue mapping before allocating tagset via pci_alloc_irq_vectors_affinity(), but set->nr_hw_queues can be set as 1 in case of kdump kernel, so wrong queue mapping is used, and kernel panic[1] is observed during booting. This patch fixes the kernel panic triggerd on nvme by rebulding the mapping table via blk_mq_map_queues(). [1] kernel panic log [ 4.438371] nvme nvme0: 16/0/0 default/read/poll queues [ 4.443277] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098 [ 4.444681] PGD 0 P4D 0 [ 4.445367] Oops: 0000 [#1] SMP NOPTI [ 4.446342] CPU: 3 PID: 201 Comm: kworker/u33:10 Not tainted 4.20.0-rc5-00664-g5eb02f7ee1eb-dirty #459 [ 4.447630] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014 [ 4.448689] Workqueue: nvme-wq nvme_scan_work [nvme_core] [ 4.449368] RIP: 0010:blk_mq_map_swqueue+0xfb/0x222 [ 4.450596] Code: 04 f5 20 28 ef 81 48 89 c6 39 55 30 76 93 89 d0 48 c1 e0 04 48 03 83 f8 05 00 00 48 8b 00 42 8b 3c 28 48 8b 43 58 48 8b 04 f8 <48> 8b b8 98 00 00 00 4c 0f a3 37 72 42 f0 4c 0f ab 37 66 8b b8 f6 [ 4.453132] RSP: 0018:ffffc900023b3cd8 EFLAGS: 00010286 [ 4.454061] RAX: 0000000000000000 RBX: ffff888174448000 RCX: 0000000000000001 [ 4.456480] RDX: 0000000000000001 RSI: ffffe8feffc506c0 RDI: 0000000000000001 [ 4.458750] RBP: ffff88810722d008 R08: ffff88817647a880 R09: 0000000000000002 [ 4.464580] R10: ffffc900023b3c10 R11: 0000000000000004 R12: ffff888174448538 [ 4.467803] R13: 0000000000000004 R14: 0000000000000001 R15: 0000000000000001 [ 4.469220] FS: 0000000000000000(0000) GS:ffff88817bac0000(0000) knlGS:0000000000000000 [ 4.471554] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4.472464] CR2: 0000000000000098 CR3: 0000000174e4e001 CR4: 0000000000760ee0 [ 4.474264] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 4.476007] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 4.477061] PKRU: 55555554 [ 4.477464] Call Trace: [ 4.478731] blk_mq_init_allocated_queue+0x36a/0x3ad [ 4.479595] blk_mq_init_queue+0x32/0x4e [ 4.480178] nvme_validate_ns+0x98/0x623 [nvme_core] [ 4.480963] ? nvme_submit_sync_cmd+0x1b/0x20 [nvme_core] [ 4.481685] ? nvme_identify_ctrl.isra.8+0x70/0xa0 [nvme_core] [ 4.482601] nvme_scan_work+0x23a/0x29b [nvme_core] [ 4.483269] ? _raw_spin_unlock_irqrestore+0x25/0x38 [ 4.483930] ? try_to_wake_up+0x38d/0x3b3 [ 4.484478] ? process_one_work+0x179/0x2fc [ 4.485118] process_one_work+0x1d3/0x2fc [ 4.485655] ? rescuer_thread+0x2ae/0x2ae [ 4.486196] worker_thread+0x1e9/0x2be [ 4.486841] kthread+0x115/0x11d [ 4.487294] ? kthread_park+0x76/0x76 [ 4.487784] ret_from_fork+0x3a/0x50 [ 4.488322] Modules linked in: nvme nvme_core qemu_fw_cfg virtio_scsi ip_tables [ 4.489428] Dumping ftrace buffer: [ 4.489939] (ftrace buffer empty) [ 4.490492] CR2: 0000000000000098 [ 4.491052] ---[ end trace 03cd268ad5a86ff7 ]--- Cc: Christoph Hellwig <hch@lst.de> Cc: linux-nvme@lists.infradead.org Cc: David Milburn <dmilburn@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dennis Zhou authored
I was a little overzealous in removing the rcu_read_lock() call from blkcg_bio_issue_check() and it broke blk-throttle. Put it back. Fixes: e35403a034bf ("blkcg: associate blkg when associating a device") Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Josef Bacik authored
Now that we have this common helper, convert io-latency over to use it as well. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Josef Bacik authored
Now that we have rq_qos_wait() in place, convert wbt_wait() over to using it with it's specific callbacks. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Josef Bacik authored
Originally when I split out the common code from blk-wbt into rq_qos I left the wbt_wait() where it was and simply copied and modified it slightly to work for io-latency. However they are both basically the same thing, and as time has gone on wbt_wait() has ended up much smarter and kinder than it was when I copied it into io-latency, which means io-latency has lost out on these improvements. Since they are the same thing essentially except for a few minor things, create rq_qos_wait() that replicates what wbt_wait() currently does with callbacks that can be passed in for the snowflakes to do their own thing as appropriate. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dennis Zhou authored
blkg reference counting now uses percpu_ref rather than atomic_t. Let's make this consistent with css_tryget. This renames blkg_try_get to blkg_tryget and now returns a bool rather than the blkg or %NULL. Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dennis Zhou authored
Every bio is now associated with a blkg putting blkg_get, blkg_try_get, and blkg_put on the hot path. Switch over the refcnt in blkg to use percpu_ref. Signed-off-by: Dennis Zhou <dennis@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dennis Zhou authored
Now that a bio only holds a blkg reference, so clean up is simply putting back that reference. Remove bio_disassociate_task() as it just calls bio_disassociate_blkg() and call the latter directly. Signed-off-by: Dennis Zhou <dennis@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dennis Zhou authored
The previous patch in this series removed carrying around a pointer to the css in blkg. However, the blkg association logic still relied on taking a reference on the css to ensure we wouldn't fail in getting a reference for the blkg. Here the implicit dependency on the css is removed. The association continues to rely on the tryget logic walking up the blkg tree. This streamlines the three ways that association can happen: normal, swap, and writeback. Signed-off-by: Dennis Zhou <dennis@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dennis Zhou authored
Prior patches ensured that any bio that interacts with a request_queue is properly associated with a blkg. This makes bio->bi_css unnecessary as blkg maintains a reference to blkcg already. This removes the bio field bi_css and transfers corresponding uses to access via bi_blkg. Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Dennis Zhou authored
One of the goals of this series is to remove a separate reference to the css of the bio. This can and should be accessed via bio_blkcg(). In this patch, wbc_init_bio() now requires a bio to have a device associated with it. Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-