Commit e3a843f9 authored by David S. Miller's avatar David S. Miller

Merge branch 'sfx-xdp-fallback-tx-queues'

Íñigo Huguet says:

====================
sfc: fallback for lack of xdp tx queues

If there are not enough hardware resources to allocate one tx queue per
CPU for XDP, XDP_TX and XDP_REDIRECT actions were unavailable, and using
them resulted each time with the packet being drop and this message in
the logs: XDP TX failed (-22)

These patches implement 2 fallback solutions for 2 different situations
that might happen:
1. There are not enough free resources for all the tx queues, but there
   are some free resources available
2. There are not enough free resources at all for tx queues.

Both solutions are based in sharing tx queues, using __netif_tx_lock for
synchronization. In the second case, as there are not XDP TX queues to
share, network stack queues are used instead, but since we're taking
__netif_tx_lock, concurrent access to the queues is correctly protected.

The solution for this second case might affect performance both of XDP
traffic and normal traffice due to lock contention if both are used
intensively. That's why I call it a "last resort" fallback: it's not a
desirable situation, but at least we have XDP TX working.

Some tests has shown good results and indicate that the non-fallback
case is not being damaged by this changes. They are also promising for
the fallback cases. This is the test:
1. From another machine, send high amount of packets with pktgen, script
   samples/pktgen/pktgen_sample04_many_flows.sh
2. In the tested machine, run samples/bpf/xdp_rxq_info with arguments
   "-a XDP_TX --swapmac" and see the results
3. In the tested machine, run also pktgen_sample04 to create high TX
   normal traffic, and see how xdp_rxq_info results vary

Note that this test doesn't check the worst situations for the fallback
solutions because XDP_TX will only be executed from the same CPUs that
are processed by sfc, and not from every CPU in the system, so the
performance drop due to the highest locking contention doesn't happen.
I'd like to test that, as well, but I don't have access right now to a
proper environment.

Test results:

Without doing TX:
Before changes: ~2,900,000 pps
After changes, 1 queues/core: ~2,900,000 pps
After changes, 2 queues/core: ~2,900,000 pps
After changes, 8 queues/core: ~2,900,000 pps
After changes, borrowing from network stack: ~2,900,000 pps

With multiflow TX at the same time:
Before changes: ~1,700,000 - 2,900,000 pps
After changes, 1 queues/core: ~1,700,000 - 2,900,000 pps
After changes, 2 queues/core: ~1,700,000 pps
After changes, 8 queues/core: ~1,700,000 pps
After changes, borrowing from network stack: 1,150,000 pps

Sporadic "XDP TX failed (-5)" warnings are shown when running xdp program
and pktgen simultaneously. This was expected because XDP doesn't have any
buffering system if the NIC is under very high pressure. Thousands of
these warnings are shown in the case of borrowing net stack queues. As I
said before, this was also expected.
====================
Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
parents 2a48d96f 6215b608
......@@ -166,32 +166,46 @@ static int efx_allocate_msix_channels(struct efx_nic *efx,
* We need a channel per event queue, plus a VI per tx queue.
* This may be more pessimistic than it needs to be.
*/
if (n_channels + n_xdp_ev > max_channels) {
netif_err(efx, drv, efx->net_dev,
"Insufficient resources for %d XDP event queues (%d other channels, max %d)\n",
n_xdp_ev, n_channels, max_channels);
netif_err(efx, drv, efx->net_dev,
"XDP_TX and XDP_REDIRECT will not work on this interface");
efx->n_xdp_channels = 0;
efx->xdp_tx_per_channel = 0;
efx->xdp_tx_queue_count = 0;
if (n_channels >= max_channels) {
efx->xdp_txq_queues_mode = EFX_XDP_TX_QUEUES_BORROWED;
netif_warn(efx, drv, efx->net_dev,
"Insufficient resources for %d XDP event queues (%d other channels, max %d)\n",
n_xdp_ev, n_channels, max_channels);
netif_warn(efx, drv, efx->net_dev,
"XDP_TX and XDP_REDIRECT might decrease device's performance\n");
} else if (n_channels + n_xdp_tx > efx->max_vis) {
netif_err(efx, drv, efx->net_dev,
"Insufficient resources for %d XDP TX queues (%d other channels, max VIs %d)\n",
n_xdp_tx, n_channels, efx->max_vis);
netif_err(efx, drv, efx->net_dev,
"XDP_TX and XDP_REDIRECT will not work on this interface");
efx->n_xdp_channels = 0;
efx->xdp_tx_per_channel = 0;
efx->xdp_tx_queue_count = 0;
efx->xdp_txq_queues_mode = EFX_XDP_TX_QUEUES_BORROWED;
netif_warn(efx, drv, efx->net_dev,
"Insufficient resources for %d XDP TX queues (%d other channels, max VIs %d)\n",
n_xdp_tx, n_channels, efx->max_vis);
netif_warn(efx, drv, efx->net_dev,
"XDP_TX and XDP_REDIRECT might decrease device's performance\n");
} else if (n_channels + n_xdp_ev > max_channels) {
efx->xdp_txq_queues_mode = EFX_XDP_TX_QUEUES_SHARED;
netif_warn(efx, drv, efx->net_dev,
"Insufficient resources for %d XDP event queues (%d other channels, max %d)\n",
n_xdp_ev, n_channels, max_channels);
n_xdp_ev = max_channels - n_channels;
netif_warn(efx, drv, efx->net_dev,
"XDP_TX and XDP_REDIRECT will work with reduced performance (%d cpus/tx_queue)\n",
DIV_ROUND_UP(n_xdp_tx, tx_per_ev * n_xdp_ev));
} else {
efx->xdp_txq_queues_mode = EFX_XDP_TX_QUEUES_DEDICATED;
}
if (efx->xdp_txq_queues_mode != EFX_XDP_TX_QUEUES_BORROWED) {
efx->n_xdp_channels = n_xdp_ev;
efx->xdp_tx_per_channel = tx_per_ev;
efx->xdp_tx_queue_count = n_xdp_tx;
n_channels += n_xdp_ev;
netif_dbg(efx, drv, efx->net_dev,
"Allocating %d TX and %d event queues for XDP\n",
n_xdp_tx, n_xdp_ev);
n_xdp_ev * tx_per_ev, n_xdp_ev);
} else {
efx->n_xdp_channels = 0;
efx->xdp_tx_per_channel = 0;
efx->xdp_tx_queue_count = n_xdp_tx;
}
if (vec_count < n_channels) {
......@@ -858,6 +872,20 @@ int efx_realloc_channels(struct efx_nic *efx, u32 rxq_entries, u32 txq_entries)
goto out;
}
static inline int
efx_set_xdp_tx_queue(struct efx_nic *efx, int xdp_queue_number,
struct efx_tx_queue *tx_queue)
{
if (xdp_queue_number >= efx->xdp_tx_queue_count)
return -EINVAL;
netif_dbg(efx, drv, efx->net_dev, "Channel %u TXQ %u is XDP %u, HW %u\n",
tx_queue->channel->channel, tx_queue->label,
xdp_queue_number, tx_queue->queue);
efx->xdp_tx_queues[xdp_queue_number] = tx_queue;
return 0;
}
int efx_set_channels(struct efx_nic *efx)
{
struct efx_tx_queue *tx_queue;
......@@ -896,20 +924,9 @@ int efx_set_channels(struct efx_nic *efx)
if (efx_channel_is_xdp_tx(channel)) {
efx_for_each_channel_tx_queue(tx_queue, channel) {
tx_queue->queue = next_queue++;
/* We may have a few left-over XDP TX
* queues owing to xdp_tx_queue_count
* not dividing evenly by EFX_MAX_TXQ_PER_CHANNEL.
* We still allocate and probe those
* TXQs, but never use them.
*/
if (xdp_queue_number < efx->xdp_tx_queue_count) {
netif_dbg(efx, drv, efx->net_dev, "Channel %u TXQ %u is XDP %u, HW %u\n",
channel->channel, tx_queue->label,
xdp_queue_number, tx_queue->queue);
efx->xdp_tx_queues[xdp_queue_number] = tx_queue;
rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, tx_queue);
if (rc == 0)
xdp_queue_number++;
}
}
} else {
efx_for_each_channel_tx_queue(tx_queue, channel) {
......@@ -918,10 +935,35 @@ int efx_set_channels(struct efx_nic *efx)
channel->channel, tx_queue->label,
tx_queue->queue);
}
/* If XDP is borrowing queues from net stack, it must use the queue
* with no csum offload, which is the first one of the channel
* (note: channel->tx_queue_by_type is not initialized yet)
*/
if (efx->xdp_txq_queues_mode == EFX_XDP_TX_QUEUES_BORROWED) {
tx_queue = &channel->tx_queue[0];
rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, tx_queue);
if (rc == 0)
xdp_queue_number++;
}
}
}
}
WARN_ON(xdp_queue_number != efx->xdp_tx_queue_count);
WARN_ON(efx->xdp_txq_queues_mode == EFX_XDP_TX_QUEUES_DEDICATED &&
xdp_queue_number != efx->xdp_tx_queue_count);
WARN_ON(efx->xdp_txq_queues_mode != EFX_XDP_TX_QUEUES_DEDICATED &&
xdp_queue_number > efx->xdp_tx_queue_count);
/* If we have more CPUs than assigned XDP TX queues, assign the already
* existing queues to the exceeding CPUs
*/
next_queue = 0;
while (xdp_queue_number < efx->xdp_tx_queue_count) {
tx_queue = efx->xdp_tx_queues[next_queue++];
rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, tx_queue);
if (rc == 0)
xdp_queue_number++;
}
rc = netif_set_real_num_tx_queues(efx->net_dev, efx->n_tx_channels);
if (rc)
......
......@@ -782,6 +782,12 @@ struct efx_async_filter_insertion {
#define EFX_RPS_MAX_IN_FLIGHT 8
#endif /* CONFIG_RFS_ACCEL */
enum efx_xdp_tx_queues_mode {
EFX_XDP_TX_QUEUES_DEDICATED, /* one queue per core, locking not needed */
EFX_XDP_TX_QUEUES_SHARED, /* each queue used by more than 1 core */
EFX_XDP_TX_QUEUES_BORROWED /* queues borrowed from net stack */
};
/**
* struct efx_nic - an Efx NIC
* @name: Device name (net device name or bus id before net device registered)
......@@ -820,6 +826,7 @@ struct efx_async_filter_insertion {
* should be allocated for this NIC
* @xdp_tx_queue_count: Number of entries in %xdp_tx_queues.
* @xdp_tx_queues: Array of pointers to tx queues used for XDP transmit.
* @xdp_txq_queues_mode: XDP TX queues sharing strategy.
* @rxq_entries: Size of receive queues requested by user.
* @txq_entries: Size of transmit queues requested by user.
* @txq_stop_thresh: TX queue fill level at or above which we stop it.
......@@ -979,6 +986,7 @@ struct efx_nic {
unsigned int xdp_tx_queue_count;
struct efx_tx_queue **xdp_tx_queues;
enum efx_xdp_tx_queues_mode xdp_txq_queues_mode;
unsigned rxq_entries;
unsigned txq_entries;
......
......@@ -428,23 +428,32 @@ int efx_xdp_tx_buffers(struct efx_nic *efx, int n, struct xdp_frame **xdpfs,
unsigned int len;
int space;
int cpu;
int i;
int i = 0;
cpu = raw_smp_processor_id();
if (unlikely(n && !xdpfs))
return -EINVAL;
if (unlikely(!n))
return 0;
if (!efx->xdp_tx_queue_count ||
unlikely(cpu >= efx->xdp_tx_queue_count))
cpu = raw_smp_processor_id();
if (unlikely(cpu >= efx->xdp_tx_queue_count))
return -EINVAL;
tx_queue = efx->xdp_tx_queues[cpu];
if (unlikely(!tx_queue))
return -EINVAL;
if (unlikely(n && !xdpfs))
return -EINVAL;
if (efx->xdp_txq_queues_mode != EFX_XDP_TX_QUEUES_DEDICATED)
HARD_TX_LOCK(efx->net_dev, tx_queue->core_txq, cpu);
if (!n)
return 0;
/* If we're borrowing net stack queues we have to handle stop-restart
* or we might block the queue and it will be considered as frozen
*/
if (efx->xdp_txq_queues_mode == EFX_XDP_TX_QUEUES_BORROWED) {
if (netif_tx_queue_stopped(tx_queue->core_txq))
goto unlock;
efx_tx_maybe_stop_queue(tx_queue);
}
/* Check for available space. We should never need multiple
* descriptors per frame.
......@@ -484,6 +493,10 @@ int efx_xdp_tx_buffers(struct efx_nic *efx, int n, struct xdp_frame **xdpfs,
if (flush && i > 0)
efx_nic_push_buffers(tx_queue);
unlock:
if (efx->xdp_txq_queues_mode != EFX_XDP_TX_QUEUES_DEDICATED)
HARD_TX_UNLOCK(efx->net_dev, tx_queue->core_txq);
return i == 0 ? -EIO : i;
}
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment