Commit 54f00cce authored by William Tu, committed by David S. Miller

vmxnet3: Add XDP support.

This patch adds native-mode XDP support for the XDP_DROP, XDP_PASS,
XDP_TX, and XDP_REDIRECT actions.

Background:
The vmxnet3 rx path consists of three rings: ring0 (r0), ring1 (r1), and
the dataring. Buffers in r0 are allocated using the alloc_skb APIs and
DMA-mapped to the ring's descriptors. If LRO is enabled and the packet is
larger than 3K (VMXNET3_MAX_SKB_BUF_SIZE), r1 is used to map the part of
the packet that exceeds VMXNET3_MAX_SKB_BUF_SIZE. Each buffer in r1 is
allocated using alloc_page. So for LRO packets, the payload is spread over
one buffer from r0 and multiple buffers from r1; for non-LRO packets
smaller than 3K, only one descriptor in r0 is used.

When receiving a packet, the first descriptor will have the sop (start of
packet) bit set, and the last descriptor will have the eop (end of packet)
bit set. Non-LRO packets will have only one descriptor with both sop and
eop set.

Besides r0 and r1, the vmxnet3 dataring is specifically designed for
handling small packets (the size limit, usually 128 bytes, is defined in
VMXNET3_DEF_RXDATA_DESC_SIZE). The backend driver in ESXi simply copies
the packet into the ring's memory region in the front-end vmxnet3 driver,
which avoids the memory mapping/unmapping overhead. In summary, by packet
size:
    A. < 128B: use dataring
    B. 128B - 3K: use ring0 (VMXNET3_RX_BUF_SKB)
    C. > 3K: use ring0 and ring1 (VMXNET3_RX_BUF_SKB + VMXNET3_RX_BUF_PAGE)
As a result, the patch adds XDP support for packets using the dataring
and r0 (cases A and B), but not for large packets when LRO is enabled.
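
The size split above can be sketched as a small userspace C function. This
is only an illustration: the names prefixed with sketch_/SKETCH_ are
hypothetical, the thresholds are the 128-byte and 3K figures quoted in the
text rather than values read from the driver headers, and the handling of
the exact boundary sizes is an assumption. The real driver decides per
descriptor, not with a single length check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Thresholds quoted in the commit text; stand-ins, not driver constants. */
#define SKETCH_RXDATA_DESC_SIZE 128u   /* for VMXNET3_DEF_RXDATA_DESC_SIZE */
#define SKETCH_MAX_SKB_BUF_SIZE 3072u  /* for VMXNET3_MAX_SKB_BUF_SIZE, "3K" */

static_assert(SKETCH_RXDATA_DESC_SIZE < SKETCH_MAX_SKB_BUF_SIZE,
	      "dataring limit must be below the ring0 buffer size");

enum sketch_rx_path {
	SKETCH_RX_DATARING,     /* case A: copied into the data ring */
	SKETCH_RX_RING0,        /* case B: one skb-type buffer in ring0 */
	SKETCH_RX_RING0_RING1,  /* case C: ring0 plus page buffers in ring1 */
};

/* Map a packet length to the rx path named in the summary list. */
static enum sketch_rx_path sketch_classify(size_t len)
{
	if (len < SKETCH_RXDATA_DESC_SIZE)
		return SKETCH_RX_DATARING;
	if (len <= SKETCH_MAX_SKB_BUF_SIZE)  /* boundary choice is assumed */
		return SKETCH_RX_RING0;
	return SKETCH_RX_RING0_RING1;
}

/* XDP is described as covering cases A and B only. */
static bool sketch_xdp_supported(size_t len)
{
	return sketch_classify(len) != SKETCH_RX_RING0_RING1;
}
```

For example, the 64B and 512B test packets used later fall into case A
and case B respectively, so both are eligible for XDP.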

XDP Implementation:
When the user loads an XDP prog, the vmxnet3 driver checks the
configuration, such as MTU and LRO, and re-allocates the rx buffers to
reserve the extra headroom, XDP_PACKET_HEADROOM, for the XDP frame. The
XDP prog is then associated with every rx queue of the device. Note that
when the dataring is used for small packets, vmxnet3 (the front-end
driver) doesn't control the buffer allocation; as a result, we allocate a
new page and copy the packet from the dataring to the XDP frame.

The receive side of XDP is implemented for cases A and B by invoking the
bpf program in vmxnet3_rq_rx_complete and handling its returned action.
The vmxnet3_process_xdp() and vmxnet3_process_xdp_small() functions handle
the ring0 and dataring cases respectively, and decide what happens to
the packet next.
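
The "handle the returned action" step can be pictured with the following
userspace sketch. It is not the code of vmxnet3_process_xdp(), which lives
in a part of the diff that is collapsed here. The enum is a local mirror of
the kernel's XDP verdicts, the counter names follow the fields this patch
adds to vmxnet3_rq_driver_stats, and counting unknown verdicts as aborted
is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the XDP verdicts so the sketch builds in userspace. */
enum sketch_xdp_action {
	SKETCH_XDP_ABORTED,
	SKETCH_XDP_DROP,
	SKETCH_XDP_PASS,
	SKETCH_XDP_TX,
	SKETCH_XDP_REDIRECT,
};

/* Counter names follow the new vmxnet3_rq_driver_stats fields. */
struct sketch_rq_stats {
	uint64_t xdp_packets;
	uint64_t xdp_tx;
	uint64_t xdp_redirects;
	uint64_t xdp_drops;
	uint64_t xdp_aborted;
};

/*
 * Account for one verdict. Returns true only for PASS, meaning the
 * packet should continue to the regular network stack.
 */
static bool sketch_handle_verdict(struct sketch_rq_stats *st, int act)
{
	assert(st != NULL);
	st->xdp_packets++;

	switch (act) {
	case SKETCH_XDP_PASS:
		return true;
	case SKETCH_XDP_TX:
		st->xdp_tx++;
		return false;
	case SKETCH_XDP_REDIRECT:
		st->xdp_redirects++;
		return false;
	case SKETCH_XDP_DROP:
		st->xdp_drops++;
		return false;
	case SKETCH_XDP_ABORTED:
	default:  /* assumption: unknown verdicts are counted as aborted */
		st->xdp_aborted++;
		return false;
	}
}
```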

For TX, vmxnet3 has a split-header design. Outgoing packets are parsed
first and the protocol headers (L2/L3/L4) are copied to the backend. The
rest of the payload is DMA-mapped. Since XDP_TX does not parse the
packet protocol, the entire XDP frame is DMA-mapped for transmission
and transmitted in a batch. Later on, the frame is freed and recycled
back to the memory pool.

Performance:
Tested using two VMs on one ESXi vSphere 7.0 machine, using a single
core on each vmxnet3 device. The sender runs DPDK testpmd in txonly
forward mode attached to the vmxnet3 device, sending 64B or 512B UDP
packets.

VM1 txgen:
$ dpdk-testpmd -l 0-3 -n 1 -- -i --nb-cores=3 \
--forward-mode=txonly --eth-peer=0,<mac addr of vm2>
option: add "--txonly-multi-flow"
option: use --txpkts=512 or --txpkts=64 (packet size in bytes)

VM2 running XDP:
$ ./samples/bpf/xdp_rxq_info -d ens160 -a <options> --skb-mode
$ ./samples/bpf/xdp_rxq_info -d ens160 -a <options>
options: XDP_DROP, XDP_PASS, XDP_TX

To test REDIRECT to cpu 0, use
$ ./samples/bpf/xdp_redirect_cpu -d ens160 -c 0 -e drop

Single core performance comparison with skb-mode.
64B:      skb-mode -> native-mode
XDP_DROP: 1.6Mpps -> 2.4Mpps
XDP_PASS: 338Kpps -> 367Kpps
XDP_TX:   1.1Mpps -> 2.3Mpps
REDIRECT-drop: 1.3Mpps -> 2.3Mpps

512B:     skb-mode -> native-mode
XDP_DROP: 863Kpps -> 1.3Mpps
XDP_PASS: 275Kpps -> 376Kpps
XDP_TX:   554Kpps -> 1.2Mpps
REDIRECT-drop: 659Kpps -> 1.2Mpps
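
To read the tables as ratios, a minimal helper such as the one below can
be used. The function name is hypothetical and the inputs are simply the
figures listed above, converted to Kpps.

```c
#include <assert.h>

/*
 * Ratio of native-mode to skb-mode throughput. Both arguments must use
 * the same unit (for example Kpps).
 */
static double sketch_speedup(double skb_kpps, double native_kpps)
{
	assert(skb_kpps > 0.0);
	return native_kpps / skb_kpps;
}
```

With the 64B numbers, XDP_DROP goes from 1600 to 2400 Kpps, a 1.5x gain,
and XDP_TX from 1100 to 2300 Kpps, roughly 2.1x. XDP_PASS gains the least
(338 to 367 Kpps, about 1.09x).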

Demo: https://youtu.be/4lm1CSCi78Q

Future work:
- XDP frag support
- use napi_consume_skb() instead of dev_kfree_skb_any at unmap
- stats using u64_stats_t
- using bitfield macro BIT()
- optimization for DMA synchronization using actual frame length,
  instead of always max_len

Signed-off-by: William Tu <u9012063@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
parent 76fa3635
@@ -571,6 +571,7 @@ config VMXNET3
 	tristate "VMware VMXNET3 ethernet driver"
 	depends on PCI && INET
 	depends on PAGE_SIZE_LESS_THAN_64KB
+	select PAGE_POOL
 	help
 	  This driver supports VMware's vmxnet3 virtual ethernet NIC.
 	  To compile this driver as a module, choose M here: the
@@ -32,4 +32,4 @@
 obj-$(CONFIG_VMXNET3) += vmxnet3.o
-vmxnet3-objs := vmxnet3_drv.o vmxnet3_ethtool.o
+vmxnet3-objs := vmxnet3_drv.o vmxnet3_ethtool.o vmxnet3_xdp.o
This diff is collapsed.
@@ -28,6 +28,7 @@
 #include "vmxnet3_int.h"
 #include <net/vxlan.h>
 #include <net/geneve.h>
+#include "vmxnet3_xdp.h"
 #define VXLAN_UDP_PORT 8472
@@ -76,6 +77,10 @@ vmxnet3_tq_driver_stats[] = {
 			copy_skb_header) },
 	{ " giant hdr", offsetof(struct vmxnet3_tq_driver_stats,
 			oversized_hdr) },
+	{ " xdp xmit", offsetof(struct vmxnet3_tq_driver_stats,
+			xdp_xmit) },
+	{ " xdp xmit err", offsetof(struct vmxnet3_tq_driver_stats,
+			xdp_xmit_err) },
 };
 /* per rq stats maintained by the device */
@@ -106,6 +111,16 @@ vmxnet3_rq_driver_stats[] = {
 			drop_fcs) },
 	{ " rx buf alloc fail", offsetof(struct vmxnet3_rq_driver_stats,
 			rx_buf_alloc_failure) },
+	{ " xdp packets", offsetof(struct vmxnet3_rq_driver_stats,
+			xdp_packets) },
+	{ " xdp tx", offsetof(struct vmxnet3_rq_driver_stats,
+			xdp_tx) },
+	{ " xdp redirects", offsetof(struct vmxnet3_rq_driver_stats,
+			xdp_redirects) },
+	{ " xdp drops", offsetof(struct vmxnet3_rq_driver_stats,
+			xdp_drops) },
+	{ " xdp aborted", offsetof(struct vmxnet3_rq_driver_stats,
+			xdp_aborted) },
 };
 /* global stats maintained by the driver */
@@ -249,10 +264,18 @@ vmxnet3_get_strings(struct net_device *netdev, u32 stringset, u8 *buf)
 netdev_features_t vmxnet3_fix_features(struct net_device *netdev,
 				       netdev_features_t features)
 {
+	struct vmxnet3_adapter *adapter = netdev_priv(netdev);
 	/* If Rx checksum is disabled, then LRO should also be disabled */
 	if (!(features & NETIF_F_RXCSUM))
 		features &= ~NETIF_F_LRO;
+	/* If XDP is enabled, then LRO should not be enabled */
+	if (vmxnet3_xdp_enabled(adapter) && (features & NETIF_F_LRO)) {
+		netdev_err(netdev, "LRO is not supported with XDP");
+		features &= ~NETIF_F_LRO;
+	}
 	return features;
 }
@@ -56,6 +56,9 @@
 #include <linux/if_arp.h>
 #include <linux/inetdevice.h>
 #include <linux/log2.h>
+#include <linux/bpf.h>
+#include <net/page_pool/helpers.h>
+#include <net/xdp.h>
 #include "vmxnet3_defs.h"
@@ -188,19 +191,20 @@ struct vmxnet3_tx_data_ring {
 	dma_addr_t basePA;
 };
-enum vmxnet3_buf_map_type {
-	VMXNET3_MAP_INVALID = 0,
-	VMXNET3_MAP_NONE,
-	VMXNET3_MAP_SINGLE,
-	VMXNET3_MAP_PAGE,
-};
+#define VMXNET3_MAP_NONE 0
+#define VMXNET3_MAP_SINGLE BIT(0)
+#define VMXNET3_MAP_PAGE BIT(1)
+#define VMXNET3_MAP_XDP BIT(2)
 struct vmxnet3_tx_buf_info {
 	u32 map_type;
 	u16 len;
 	u16 sop_idx;
 	dma_addr_t dma_addr;
-	struct sk_buff *skb;
+	union {
+		struct sk_buff *skb;
+		struct xdp_frame *xdpf;
+	};
 };
 struct vmxnet3_tq_driver_stats {
@@ -217,6 +221,9 @@ struct vmxnet3_tq_driver_stats {
 	u64 linearized; /* # of pkts linearized */
 	u64 copy_skb_header; /* # of times we have to copy skb header */
 	u64 oversized_hdr;
+	u64 xdp_xmit;
+	u64 xdp_xmit_err;
 };
 struct vmxnet3_tx_ctx {
@@ -253,12 +260,13 @@ struct vmxnet3_tx_queue {
 	 * stopped */
 	int qid;
 	u16 txdata_desc_size;
-} __attribute__((__aligned__(SMP_CACHE_BYTES)));
+} ____cacheline_aligned;
 enum vmxnet3_rx_buf_type {
 	VMXNET3_RX_BUF_NONE = 0,
 	VMXNET3_RX_BUF_SKB = 1,
-	VMXNET3_RX_BUF_PAGE = 2
+	VMXNET3_RX_BUF_PAGE = 2,
+	VMXNET3_RX_BUF_XDP = 3,
 };
 #define VMXNET3_RXD_COMP_PENDING 0
@@ -285,6 +293,12 @@ struct vmxnet3_rq_driver_stats {
 	u64 drop_err;
 	u64 drop_fcs;
 	u64 rx_buf_alloc_failure;
+	u64 xdp_packets; /* Total packets processed by XDP. */
+	u64 xdp_tx;
+	u64 xdp_redirects;
+	u64 xdp_drops;
+	u64 xdp_aborted;
 };
 struct vmxnet3_rx_data_ring {
@@ -307,7 +321,9 @@ struct vmxnet3_rx_queue {
 	struct vmxnet3_rx_buf_info *buf_info[2];
 	struct Vmxnet3_RxQueueCtrl *shared;
 	struct vmxnet3_rq_driver_stats stats;
-} __attribute__((__aligned__(SMP_CACHE_BYTES)));
+	struct page_pool *page_pool;
+	struct xdp_rxq_info xdp_rxq;
+} ____cacheline_aligned;
 #define VMXNET3_DEVICE_MAX_TX_QUEUES 32
 #define VMXNET3_DEVICE_MAX_RX_QUEUES 32 /* Keep this value as a power of 2 */
@@ -415,6 +431,7 @@ struct vmxnet3_adapter {
 	u16 tx_prod_offset;
 	u16 rx_prod_offset;
 	u16 rx_prod2_offset;
+	struct bpf_prog __rcu *xdp_bpf_prog;
 };
 #define VMXNET3_WRITE_BAR0_REG(adapter, reg, val) \
@@ -490,6 +507,12 @@ vmxnet3_tq_destroy_all(struct vmxnet3_adapter *adapter);
 void
 vmxnet3_rq_destroy_all(struct vmxnet3_adapter *adapter);
+int
+vmxnet3_rq_create_all(struct vmxnet3_adapter *adapter);
+void
+vmxnet3_adjust_rx_ring_size(struct vmxnet3_adapter *adapter);
 netdev_features_t
 vmxnet3_fix_features(struct net_device *netdev, netdev_features_t features);
This diff is collapsed.
/* SPDX-License-Identifier: GPL-2.0-or-later
 *
 * Linux driver for VMware's vmxnet3 ethernet NIC.
 * Copyright (C) 2008-2023, VMware, Inc. All Rights Reserved.
 * Maintained by: pv-drivers@vmware.com
 *
 */

#ifndef _VMXNET3_XDP_H
#define _VMXNET3_XDP_H

#include <linux/filter.h>
#include <linux/bpf_trace.h>
#include <linux/netlink.h>

#include "vmxnet3_int.h"

#define VMXNET3_XDP_HEADROOM	(XDP_PACKET_HEADROOM + NET_IP_ALIGN)
#define VMXNET3_XDP_RX_TAILROOM	SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
#define VMXNET3_XDP_RX_OFFSET	VMXNET3_XDP_HEADROOM
#define VMXNET3_XDP_MAX_FRSIZE	(PAGE_SIZE - VMXNET3_XDP_HEADROOM - \
				 VMXNET3_XDP_RX_TAILROOM)
#define VMXNET3_XDP_MAX_MTU	(VMXNET3_XDP_MAX_FRSIZE - ETH_HLEN - \
				 2 * VLAN_HLEN - ETH_FCS_LEN)

int vmxnet3_xdp(struct net_device *netdev, struct netdev_bpf *bpf);
int vmxnet3_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
		     u32 flags);
int vmxnet3_process_xdp(struct vmxnet3_adapter *adapter,
			struct vmxnet3_rx_queue *rq,
			struct Vmxnet3_RxCompDesc *rcd,
			struct vmxnet3_rx_buf_info *rbi,
			struct Vmxnet3_RxDesc *rxd,
			struct sk_buff **skb_xdp_pass);
int vmxnet3_process_xdp_small(struct vmxnet3_adapter *adapter,
			      struct vmxnet3_rx_queue *rq,
			      void *data, int len,
			      struct sk_buff **skb_xdp_pass);
void *vmxnet3_pp_get_buff(struct page_pool *pp, dma_addr_t *dma_addr,
			  gfp_t gfp_mask);

static inline bool vmxnet3_xdp_enabled(struct vmxnet3_adapter *adapter)
{
	return !!rcu_access_pointer(adapter->xdp_bpf_prog);
}

#endif
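
The size macros in this header reduce to simple arithmetic, worked through
in the userspace sketch below. The SKETCH_ constants restate values fixed by
kernel headers (XDP_PACKET_HEADROOM, ETH_HLEN, VLAN_HLEN, ETH_FCS_LEN). The
page size, NET_IP_ALIGN, and the aligned size of struct skb_shared_info are
passed in as parameters because they depend on architecture and kernel
configuration; the values used in the example afterwards are assumptions,
not numbers taken from this commit.

```c
#include <assert.h>

/* Values fixed by kernel headers, restated for a userspace build. */
#define SKETCH_XDP_PACKET_HEADROOM 256
#define SKETCH_ETH_HLEN            14
#define SKETCH_VLAN_HLEN           4
#define SKETCH_ETH_FCS_LEN         4

/* Mirrors VMXNET3_XDP_MAX_FRSIZE: page minus headroom and tailroom. */
static int sketch_xdp_max_frsize(int page_size, int net_ip_align,
				 int rx_tailroom)
{
	int headroom = SKETCH_XDP_PACKET_HEADROOM + net_ip_align;
	int frsize = page_size - headroom - rx_tailroom;

	assert(frsize > 0);
	return frsize;
}

/* Mirrors VMXNET3_XDP_MAX_MTU: frame size minus L2 overhead. */
static int sketch_xdp_max_mtu(int page_size, int net_ip_align,
			      int rx_tailroom)
{
	return sketch_xdp_max_frsize(page_size, net_ip_align, rx_tailroom) -
	       SKETCH_ETH_HLEN - 2 * SKETCH_VLAN_HLEN - SKETCH_ETH_FCS_LEN;
}
```

Assuming a 4096-byte page, NET_IP_ALIGN of 0, and a 320-byte aligned
skb_shared_info, the frame size is 4096 - 256 - 320 = 3520 bytes and the
largest MTU is 3520 - 14 - 8 - 4 = 3494 bytes.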