Merge branch 'ionic-rework-fix-for-doorbell-miss'
Shannon Nelson says: ==================== ionic: rework fix for doorbell miss A latency test in a scaled out setting (many VMs with many queues) has uncovered an issue with our missed doorbell fix from commit b69585bf ("ionic: missed doorbell workaround") As a refresher, the Elba ASIC has an issue where once in a blue moon it might miss/drop a queue doorbell notification from the driver. This can result in Tx timeouts and potential Rx buffer misses. The basic problem with the original solution is that we're delaying things with a timer for every single queue, periodically using mod_timer() to reset to reset the alarm, and mod_timer() becomes a more and more expensive thing as there are more and more VFs and queues each with their own timer. A ping-pong latency test tends to exacerbate the effect such that every napi is doing a mod_timer() in every cycle. An alternative has been worked out to replace this using periodic workqueue items outside the napi cycle to request a napi_schedule driven by a single delayed-workqueue per device rather than a timer for every queue. Also, now that newer firmware is actually reporting its ASIC type, we can restrict this to the appropriate chip. The testing scenario used 128 VFs in UP state, 16 queues per VF, and latency tests were done using TCP_RR with adaptive interrupt coalescing enabled, running on 1 VF. We would see 99th percentile latencies of up to 900us range, with some max fliers as much as 4ms. With these fixes the 99th percentile latencies are typically well under 50us with the occasional max under 500us. v1: https://lore.kernel.org/netdev/20240610230706.34883-1-shannon.nelson@amd.com/ ==================== Link: https://lore.kernel.org/r/20240619003257.6138-1-shannon.nelson@amd.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
Showing
Please register or sign in to comment