Commit da4e8648 authored by Long Li's avatar Long Li Committed by Jakub Kicinski

net: mana: Batch ringing RX queue doorbell on receiving packets

It's inefficient to ring the doorbell page every time a WQE is posted to
the received queue. Excessive MMIO writes result in CPU spending more
time waiting on LOCK instructions (atomic operations), resulting in
poor scaling performance.

Move the code for ringing doorbell page to where after we have posted all
WQEs to the receive queue during a callback from napi_poll().

With this change, tests showed an improvement from 120G/s to 160G/s on a
200G physical link, with 16 or 32 hardware queues.

Tests showed no regression in network latency benchmarks on single
connection.
Reviewed-by: default avatarHaiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: default avatarDexuan Cui <decui@microsoft.com>
Signed-off-by: default avatarLong Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1689622539-5334-2-git-send-email-longli@linuxonhyperv.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
parent f8e34332
......@@ -1386,8 +1386,8 @@ static void mana_post_pkt_rxq(struct mana_rxq *rxq)
recv_buf_oob = &rxq->rx_oobs[curr_index];
err = mana_gd_post_and_ring(rxq->gdma_rq, &recv_buf_oob->wqe_req,
&recv_buf_oob->wqe_inf);
err = mana_gd_post_work_request(rxq->gdma_rq, &recv_buf_oob->wqe_req,
&recv_buf_oob->wqe_inf);
if (WARN_ON_ONCE(err))
return;
......@@ -1657,6 +1657,12 @@ static void mana_poll_rx_cq(struct mana_cq *cq)
mana_process_rx_cqe(rxq, cq, &comp[i]);
}
if (comp_read > 0) {
struct gdma_context *gc = rxq->gdma_rq->gdma_dev->gdma_context;
mana_gd_wq_ring_doorbell(gc, rxq->gdma_rq);
}
if (rxq->xdp_flush)
xdp_do_flush();
}
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment