qed: allow sleep in qed_mcp_trace_dump()
By default, qed_mcp_cmd_and_union() delays 10us at a time in a loop that can run 500K times, so calls to qed_mcp_nvm_rd_cmd() may block the current thread for over 5s. We observed thread scheduling delays over 700ms in production, with stacktraces pointing to this code as the culprit. qed_mcp_trace_dump() is called from ethtool, so sleeping is permitted. It already can sleep in qed_mcp_halt(), which calls qed_mcp_cmd(). Add a "can sleep" parameter to qed_find_nvram_image() and qed_nvram_read() so they can sleep during qed_mcp_trace_dump(). qed_mcp_trace_get_meta_info() and qed_mcp_trace_read_meta(), called only by qed_mcp_trace_dump(), allow these functions to sleep. I can't tell if the other caller (qed_grc_dump_mcp_hw_dump()) can sleep, so keep b_can_sleep set to false when it calls these functions. An example stacktrace from a custom warning we added to the kernel showing a thread that has not scheduled despite long needing resched: [ 2745.362925,17] ------------[ cut here ]------------ [ 2745.362941,17] WARNING: CPU: 23 PID: 5640 at arch/x86/kernel/irq.c:233 do_IRQ+0x15e/0x1a0() [ 2745.362946,17] Thread not rescheduled for 744 ms after irq 99 [ 2745.362956,17] Modules linked in: ... [ 2745.363339,17] CPU: 23 PID: 5640 Comm: lldpd Tainted: P O 4.4.182+ #202104120910+6d1da174272d.61x [ 2745.363343,17] Hardware name: FOXCONN MercuryB/Quicksilver Controller, BIOS H11P1N09 07/08/2020 [ 2745.363346,17] 0000000000000000 ffff885ec07c3ed8 ffffffff8131eb2f ffff885ec07c3f20 [ 2745.363358,17] ffffffff81d14f64 ffff885ec07c3f10 ffffffff81072ac2 ffff88be98ed0000 [ 2745.363369,17] 0000000000000063 0000000000000174 0000000000000074 0000000000000000 [ 2745.363379,17] Call Trace: [ 2745.363382,17] <IRQ> [<ffffffff8131eb2f>] dump_stack+0x8e/0xcf [ 2745.363393,17] [<ffffffff81072ac2>] warn_slowpath_common+0x82/0xc0 [ 2745.363398,17] [<ffffffff81072b4c>] warn_slowpath_fmt+0x4c/0x50 [ 2745.363404,17] [<ffffffff810d5a8e>] ? rcu_irq_exit+0xae/0xc0 [ 2745.363408,17] [<ffffffff817c99fe>] do_IRQ+0x15e/0x1a0 [ 2745.363413,17] [<ffffffff817c7ac9>] common_interrupt+0x89/0x89 [ 2745.363416,17] <EOI> [<ffffffff8132aa74>] ? delay_tsc+0x24/0x50 [ 2745.363425,17] [<ffffffff8132aa04>] __udelay+0x34/0x40 [ 2745.363457,17] [<ffffffffa04d45ff>] qed_mcp_cmd_and_union+0x36f/0x7d0 [qed] [ 2745.363473,17] [<ffffffffa04d5ced>] qed_mcp_nvm_rd_cmd+0x4d/0x90 [qed] [ 2745.363490,17] [<ffffffffa04e1dc7>] qed_mcp_trace_dump+0x4a7/0x630 [qed] [ 2745.363504,17] [<ffffffffa04e2556>] ? qed_fw_asserts_dump+0x1d6/0x1f0 [qed] [ 2745.363520,17] [<ffffffffa04e4ea7>] qed_dbg_mcp_trace_get_dump_buf_size+0x37/0x80 [qed] [ 2745.363536,17] [<ffffffffa04ea881>] qed_dbg_feature_size+0x61/0xa0 [qed] [ 2745.363551,17] [<ffffffffa04eb427>] qed_dbg_all_data_size+0x247/0x260 [qed] [ 2745.363560,17] [<ffffffffa0482c10>] qede_get_regs_len+0x30/0x40 [qede] [ 2745.363566,17] [<ffffffff816c9783>] ethtool_get_drvinfo+0xe3/0x190 [ 2745.363570,17] [<ffffffff816cc152>] dev_ethtool+0x1362/0x2140 [ 2745.363575,17] [<ffffffff8109bcc6>] ? finish_task_switch+0x76/0x260 [ 2745.363580,17] [<ffffffff817c2116>] ? __schedule+0x3c6/0x9d0 [ 2745.363585,17] [<ffffffff810dbd50>] ? hrtimer_start_range_ns+0x1d0/0x370 [ 2745.363589,17] [<ffffffff816c1e5b>] ? dev_get_by_name_rcu+0x6b/0x90 [ 2745.363594,17] [<ffffffff816de6a8>] dev_ioctl+0xe8/0x710 [ 2745.363599,17] [<ffffffff816a58a8>] sock_do_ioctl+0x48/0x60 [ 2745.363603,17] [<ffffffff816a5d87>] sock_ioctl+0x1c7/0x280 [ 2745.363608,17] [<ffffffff8111f393>] ? seccomp_phase1+0x83/0x220 [ 2745.363612,17] [<ffffffff811e3503>] do_vfs_ioctl+0x2b3/0x4e0 [ 2745.363616,17] [<ffffffff811e3771>] SyS_ioctl+0x41/0x70 [ 2745.363619,17] [<ffffffff817c6ffe>] entry_SYSCALL_64_fastpath+0x1e/0x79 [ 2745.363622,17] ---[ end trace f6954aa440266421 ]--- Fixes: c965db44 ("qed: Add support for debug data collection") Signed-off-by: Caleb Sander <csander@purestorage.com> Acked-by: Alok Prasad <palok@marvell.com> Link: https://lore.kernel.org/r/20230103233021.1457646-1-csander@purestorage.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
Showing
Please register or sign in to comment