Commit ec14325c authored by Alexei Starovoitov's avatar Alexei Starovoitov

Merge branch 'xdp-metadata-via-kfuncs-for-ice-vlan-hint'

Larysa Zaremba says:

====================
XDP metadata via kfuncs for ice + VLAN hint

This series introduces XDP hints via kfuncs [0] to the ice driver.

Series brings the following existing hints to the ice driver:
 - HW timestamp
 - RX hash with type

Series also introduces VLAN tag with protocol XDP hint, it now be accessed by
XDP and userspace (AF_XDP) programs. They can also be checked with xdp_metadata
test and xdp_hw_metadata program.

Impact of these patches on ice performance:
ZC:
* Full hints implementation decreases pps in ZC mode by less than 3%
  (64B, rxdrop)

skb (packets with invalid IP, dropped by stack):
* Overall, patchset improves peak performance in skb mode by about 0.5%

[0] https://patchwork.kernel.org/project/netdevbpf/cover/20230119221536.3349901-1-sdf@google.com/

v7:
https://lore.kernel.org/bpf/20231115175301.534113-1-larysa.zaremba@intel.com/
v6:
https://lore.kernel.org/bpf/20231012170524.21085-1-larysa.zaremba@intel.com/
Intermediate RFC v2:
https://lore.kernel.org/bpf/20230927075124.23941-1-larysa.zaremba@intel.com/
Intermediate RFC v1:
https://lore.kernel.org/bpf/20230824192703.712881-1-larysa.zaremba@intel.com/
v5:
https://lore.kernel.org/bpf/20230811161509.19722-1-larysa.zaremba@intel.com/
v4:
https://lore.kernel.org/bpf/20230728173923.1318596-1-larysa.zaremba@intel.com/
v3:
https://lore.kernel.org/bpf/20230719183734.21681-1-larysa.zaremba@intel.com/
v2:
https://lore.kernel.org/bpf/20230703181226.19380-1-larysa.zaremba@intel.com/
v1:
https://lore.kernel.org/all/20230512152607.992209-1-larysa.zaremba@intel.com/

Changes since v7:
* shorten timestamp assignment in ice
* change first argument of ice_fill_rx_descs back to xsk_buff_pool
* fix kernel-doc for ice_run_xdp_zc
* add missing XSK_CHECK_PRIV_TYPE() in ice
* resolved selftests merge conflicts with TX hints
* AF_INET patch adds new packet generation, not replaces AF_XDP one
* fix destination port in xdp_metadata

Changes since v6:
* add ability to fill cb of all xdp_buffs in xsk_buff_pool
* place just pointer to packet context in ice_xdp_buff
* add const qualifiers in veth implementation
* generate uapi for VLAN hint

Changes since v5:
* drop checksum hint from the patchset entirely
* Alex's patch that lifts the data_meta size limitation is no longer
  required in this patchset, so will be sent separately
* new patch: hide some ice hints code behind a static key
* fix several bugs in ZC mode (ice)
* change argument order in VLAN hint kfunc (tci, proto -> proto, tci)
* cosmetic changes
* analyze performance impact

Changes since v4:
* Drop the concept of partial checksum from the hint design
* Drop the concept of checksum level from the hint design

Changes since v3:
* use XDP_CHECKSUM_VALID_LVL0 + csum_level instead of csum_level + 1
* fix spelling mistakes
* read XDP timestamp unconditionally
* add TO_STR() macro

Changes since v2:
* redesign checksum hint, so now it gives full status
* rename vlan_tag -> vlan_tci, where applicable
* use open_netns() and close_netns() in xdp_metadata
* improve VLAN hint documentation
* replace CFI with DEI
* use VLAN_VID_MASK in xdp_metadata
* make vlan_get_tag() return -ENODATA
* remove unused rx_ptype in ice_xsk.c
* fix ice timestamp code division between patches

Changes since v1:
* directly return RX hash, RX timestamp and RX checksum status
  in skb-common functions
* use intermediate enum value for checksum status in ice
* get rid of ring structure dependency in ice kfunc implementation
* make variables const, when possible, in ice implementation
* use -ENODATA instead of -EOPNOTSUPP for driver implementation
* instead of having 2 separate functions for c-tag and s-tag,
  use 1 function that outputs both VLAN tag and protocol ID
* improve documentation for introduced hints
* update xdp_metadata selftest to test new hints
* implement new hints in veth, so they can be tested in xdp_metadata
* parse VLAN tag in xdp_hw_metadata
====================

Link: https://lore.kernel.org/r/20231205210847.28460-1-larysa.zaremba@intel.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
parents 73376328 4c6612f6
......@@ -54,6 +54,10 @@ definitions:
name: hash
doc:
Device is capable of exposing receive packet hash via bpf_xdp_metadata_rx_hash().
-
name: vlan-tag
doc:
Device is capable of exposing receive packet VLAN tag via bpf_xdp_metadata_rx_vlan_tag().
-
type: flags
name: xsk-flags
......
......@@ -20,7 +20,13 @@ Currently, the following kfuncs are supported. In the future, as more
metadata is supported, this set will grow:
.. kernel-doc:: net/core/xdp.c
:identifiers: bpf_xdp_metadata_rx_timestamp bpf_xdp_metadata_rx_hash
:identifiers: bpf_xdp_metadata_rx_timestamp
.. kernel-doc:: net/core/xdp.c
:identifiers: bpf_xdp_metadata_rx_hash
.. kernel-doc:: net/core/xdp.c
:identifiers: bpf_xdp_metadata_rx_vlan_tag
An XDP program can use these kfuncs to read the metadata into stack
variables for its own consumption. Or, to pass the metadata on to other
......
......@@ -996,4 +996,6 @@ static inline void ice_clear_rdma_cap(struct ice_pf *pf)
set_bit(ICE_FLAG_UNPLUG_AUX_DEV, pf->flags);
clear_bit(ICE_FLAG_RDMA_ENA, pf->flags);
}
extern const struct xdp_metadata_ops ice_xdp_md_ops;
#endif /* _ICE_H_ */
......@@ -519,6 +519,19 @@ static int ice_setup_rx_ctx(struct ice_rx_ring *ring)
return 0;
}
static void ice_xsk_pool_fill_cb(struct ice_rx_ring *ring)
{
void *ctx_ptr = &ring->pkt_ctx;
struct xsk_cb_desc desc = {};
XSK_CHECK_PRIV_TYPE(struct ice_xdp_buff);
desc.src = &ctx_ptr;
desc.off = offsetof(struct ice_xdp_buff, pkt_ctx) -
sizeof(struct xdp_buff);
desc.bytes = sizeof(ctx_ptr);
xsk_pool_fill_cb(ring->xsk_pool, &desc);
}
/**
* ice_vsi_cfg_rxq - Configure an Rx queue
* @ring: the ring being configured
......@@ -553,6 +566,7 @@ int ice_vsi_cfg_rxq(struct ice_rx_ring *ring)
if (err)
return err;
xsk_pool_set_rxq_info(ring->xsk_pool, &ring->xdp_rxq);
ice_xsk_pool_fill_cb(ring);
dev_info(dev, "Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring %d\n",
ring->q_index);
......@@ -575,6 +589,7 @@ int ice_vsi_cfg_rxq(struct ice_rx_ring *ring)
xdp_init_buff(&ring->xdp, ice_rx_pg_size(ring) / 2, &ring->xdp_rxq);
ring->xdp.data = NULL;
ring->xdp_ext.pkt_ctx = &ring->pkt_ctx;
err = ice_setup_rx_ctx(ring);
if (err) {
dev_err(dev, "ice_setup_rx_ctx failed for RxQ %d, err %d\n",
......
......@@ -3397,6 +3397,7 @@ static void ice_set_ops(struct ice_vsi *vsi)
netdev->netdev_ops = &ice_netdev_ops;
netdev->udp_tunnel_nic_info = &pf->hw.udp_tunnel_nic;
netdev->xdp_metadata_ops = &ice_xdp_md_ops;
ice_set_ethtool_ops(netdev);
if (vsi->type != ICE_VSI_PF)
......@@ -6042,6 +6043,23 @@ ice_fix_features(struct net_device *netdev, netdev_features_t features)
return features;
}
/**
* ice_set_rx_rings_vlan_proto - update rings with new stripped VLAN proto
* @vsi: PF's VSI
* @vlan_ethertype: VLAN ethertype (802.1Q or 802.1ad) in network byte order
*
* Store current stripped VLAN proto in ring packet context,
* so it can be accessed more efficiently by packet processing code.
*/
static void
ice_set_rx_rings_vlan_proto(struct ice_vsi *vsi, __be16 vlan_ethertype)
{
u16 i;
ice_for_each_alloc_rxq(vsi, i)
vsi->rx_rings[i]->pkt_ctx.vlan_proto = vlan_ethertype;
}
/**
* ice_set_vlan_offload_features - set VLAN offload features for the PF VSI
* @vsi: PF's VSI
......@@ -6084,6 +6102,9 @@ ice_set_vlan_offload_features(struct ice_vsi *vsi, netdev_features_t features)
if (strip_err || insert_err)
return -EIO;
ice_set_rx_rings_vlan_proto(vsi, enable_stripping ?
htons(vlan_ethertype) : 0);
return 0;
}
......
......@@ -2127,30 +2127,26 @@ int ice_ptp_set_ts_config(struct ice_pf *pf, struct ifreq *ifr)
}
/**
* ice_ptp_rx_hwtstamp - Check for an Rx timestamp
* @rx_ring: Ring to get the VSI info
* ice_ptp_get_rx_hwts - Get packet Rx timestamp in ns
* @rx_desc: Receive descriptor
* @skb: Particular skb to send timestamp with
* @pkt_ctx: Packet context to get the cached time
*
* The driver receives a notification in the receive descriptor with timestamp.
* The timestamp is in ns, so we must convert the result first.
*/
void
ice_ptp_rx_hwtstamp(struct ice_rx_ring *rx_ring,
union ice_32b_rx_flex_desc *rx_desc, struct sk_buff *skb)
u64 ice_ptp_get_rx_hwts(const union ice_32b_rx_flex_desc *rx_desc,
const struct ice_pkt_ctx *pkt_ctx)
{
struct skb_shared_hwtstamps *hwtstamps;
u64 ts_ns, cached_time;
u32 ts_high;
if (!(rx_desc->wb.time_stamp_low & ICE_PTP_TS_VALID))
return;
return 0;
cached_time = READ_ONCE(rx_ring->cached_phctime);
cached_time = READ_ONCE(pkt_ctx->cached_phctime);
/* Do not report a timestamp if we don't have a cached PHC time */
if (!cached_time)
return;
return 0;
/* Use ice_ptp_extend_32b_ts directly, using the ring-specific cached
* PHC value, rather than accessing the PF. This also allows us to
......@@ -2161,9 +2157,7 @@ ice_ptp_rx_hwtstamp(struct ice_rx_ring *rx_ring,
ts_high = le32_to_cpu(rx_desc->wb.flex_ts.ts_high);
ts_ns = ice_ptp_extend_32b_ts(cached_time, ts_high);
hwtstamps = skb_hwtstamps(skb);
memset(hwtstamps, 0, sizeof(*hwtstamps));
hwtstamps->hwtstamp = ns_to_ktime(ts_ns);
return ts_ns;
}
/**
......
......@@ -298,9 +298,8 @@ void ice_ptp_extts_event(struct ice_pf *pf);
s8 ice_ptp_request_ts(struct ice_ptp_tx *tx, struct sk_buff *skb);
enum ice_tx_tstamp_work ice_ptp_process_ts(struct ice_pf *pf);
void
ice_ptp_rx_hwtstamp(struct ice_rx_ring *rx_ring,
union ice_32b_rx_flex_desc *rx_desc, struct sk_buff *skb);
u64 ice_ptp_get_rx_hwts(const union ice_32b_rx_flex_desc *rx_desc,
const struct ice_pkt_ctx *pkt_ctx);
void ice_ptp_reset(struct ice_pf *pf);
void ice_ptp_prepare_for_reset(struct ice_pf *pf);
void ice_ptp_init(struct ice_pf *pf);
......@@ -329,9 +328,14 @@ static inline bool ice_ptp_process_ts(struct ice_pf *pf)
{
return true;
}
static inline void
ice_ptp_rx_hwtstamp(struct ice_rx_ring *rx_ring,
union ice_32b_rx_flex_desc *rx_desc, struct sk_buff *skb) { }
static inline u64
ice_ptp_get_rx_hwts(const union ice_32b_rx_flex_desc *rx_desc,
const struct ice_pkt_ctx *pkt_ctx)
{
return 0;
}
static inline void ice_ptp_reset(struct ice_pf *pf) { }
static inline void ice_ptp_prepare_for_reset(struct ice_pf *pf) { }
static inline void ice_ptp_init(struct ice_pf *pf) { }
......
......@@ -557,13 +557,14 @@ ice_rx_frame_truesize(struct ice_rx_ring *rx_ring, const unsigned int size)
* @xdp_prog: XDP program to run
* @xdp_ring: ring to be used for XDP_TX action
* @rx_buf: Rx buffer to store the XDP action
* @eop_desc: Last descriptor in packet to read metadata from
*
* Returns any of ICE_XDP_{PASS, CONSUMED, TX, REDIR}
*/
static void
ice_run_xdp(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring,
struct ice_rx_buf *rx_buf)
struct ice_rx_buf *rx_buf, union ice_32b_rx_flex_desc *eop_desc)
{
unsigned int ret = ICE_XDP_PASS;
u32 act;
......@@ -571,6 +572,8 @@ ice_run_xdp(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
if (!xdp_prog)
goto exit;
ice_xdp_meta_set_desc(xdp, eop_desc);
act = bpf_prog_run_xdp(xdp_prog, xdp);
switch (act) {
case XDP_PASS:
......@@ -1180,8 +1183,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
struct sk_buff *skb;
unsigned int size;
u16 stat_err_bits;
u16 vlan_tag = 0;
u16 rx_ptype;
u16 vlan_tci;
/* get the Rx desc from Rx ring based on 'next_to_clean' */
rx_desc = ICE_RX_DESC(rx_ring, ntc);
......@@ -1241,7 +1243,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
if (ice_is_non_eop(rx_ring, rx_desc))
continue;
ice_run_xdp(rx_ring, xdp, xdp_prog, xdp_ring, rx_buf);
ice_run_xdp(rx_ring, xdp, xdp_prog, xdp_ring, rx_buf, rx_desc);
if (rx_buf->act == ICE_XDP_PASS)
goto construct_skb;
total_rx_bytes += xdp_get_buff_len(xdp);
......@@ -1276,7 +1278,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
continue;
}
vlan_tag = ice_get_vlan_tag_from_rx_desc(rx_desc);
vlan_tci = ice_get_vlan_tci(rx_desc);
/* pad the skb if needed, to make a valid ethernet frame */
if (eth_skb_pad(skb))
......@@ -1286,14 +1288,11 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
total_rx_bytes += skb->len;
/* populate checksum, VLAN, and protocol */
rx_ptype = le16_to_cpu(rx_desc->wb.ptype_flex_flags0) &
ICE_RX_FLEX_DESC_PTYPE_M;
ice_process_skb_fields(rx_ring, rx_desc, skb, rx_ptype);
ice_process_skb_fields(rx_ring, rx_desc, skb);
ice_trace(clean_rx_irq_indicate, rx_ring, rx_desc, skb);
/* send completed skb up the stack */
ice_receive_skb(rx_ring, skb, vlan_tag);
ice_receive_skb(rx_ring, skb, vlan_tci);
/* update budget accounting */
total_rx_pkts++;
......
......@@ -257,6 +257,20 @@ enum ice_rx_dtype {
ICE_RX_DTYPE_SPLIT_ALWAYS = 2,
};
struct ice_pkt_ctx {
u64 cached_phctime;
__be16 vlan_proto;
};
struct ice_xdp_buff {
struct xdp_buff xdp_buff;
const union ice_32b_rx_flex_desc *eop_desc;
const struct ice_pkt_ctx *pkt_ctx;
};
/* Required for compatibility with xdp_buffs from xsk_pool */
static_assert(offsetof(struct ice_xdp_buff, xdp_buff) == 0);
/* indices into GLINT_ITR registers */
#define ICE_RX_ITR ICE_IDX_ITR0
#define ICE_TX_ITR ICE_IDX_ITR1
......@@ -298,7 +312,6 @@ enum ice_dynamic_itr {
/* descriptor ring, associated with a VSI */
struct ice_rx_ring {
/* CL1 - 1st cacheline starts here */
struct ice_rx_ring *next; /* pointer to next ring in q_vector */
void *desc; /* Descriptor ring memory */
struct device *dev; /* Used for DMA mapping */
struct net_device *netdev; /* netdev ring maps to */
......@@ -310,13 +323,24 @@ struct ice_rx_ring {
u16 count; /* Number of descriptors */
u16 reg_idx; /* HW register index of the ring */
u16 next_to_alloc;
/* CL2 - 2nd cacheline starts here */
union {
struct ice_rx_buf *rx_buf;
struct xdp_buff **xdp_buf;
};
struct xdp_buff xdp;
/* CL2 - 2nd cacheline starts here */
union {
struct ice_xdp_buff xdp_ext;
struct xdp_buff xdp;
};
/* CL3 - 3rd cacheline starts here */
union {
struct ice_pkt_ctx pkt_ctx;
struct {
u64 cached_phctime;
__be16 vlan_proto;
};
};
struct bpf_prog *xdp_prog;
u16 rx_offset;
......@@ -332,9 +356,9 @@ struct ice_rx_ring {
/* CL4 - 4th cacheline starts here */
struct ice_channel *ch;
struct ice_tx_ring *xdp_ring;
struct ice_rx_ring *next; /* pointer to next ring in q_vector */
struct xsk_buff_pool *xsk_pool;
dma_addr_t dma; /* physical address of ring */
u64 cached_phctime;
u16 rx_buf_len;
u8 dcb_tc; /* Traffic class of ring */
u8 ptp_rx;
......
......@@ -63,28 +63,42 @@ static enum pkt_hash_types ice_ptype_to_htype(u16 ptype)
}
/**
* ice_rx_hash - set the hash value in the skb
* ice_get_rx_hash - get RX hash value from descriptor
* @rx_desc: specific descriptor
*
* Returns hash, if present, 0 otherwise.
*/
static u32 ice_get_rx_hash(const union ice_32b_rx_flex_desc *rx_desc)
{
const struct ice_32b_rx_flex_desc_nic *nic_mdid;
if (unlikely(rx_desc->wb.rxdid != ICE_RXDID_FLEX_NIC))
return 0;
nic_mdid = (struct ice_32b_rx_flex_desc_nic *)rx_desc;
return le32_to_cpu(nic_mdid->rss_hash);
}
/**
* ice_rx_hash_to_skb - set the hash value in the skb
* @rx_ring: descriptor ring
* @rx_desc: specific descriptor
* @skb: pointer to current skb
* @rx_ptype: the ptype value from the descriptor
*/
static void
ice_rx_hash(struct ice_rx_ring *rx_ring, union ice_32b_rx_flex_desc *rx_desc,
struct sk_buff *skb, u16 rx_ptype)
ice_rx_hash_to_skb(const struct ice_rx_ring *rx_ring,
const union ice_32b_rx_flex_desc *rx_desc,
struct sk_buff *skb, u16 rx_ptype)
{
struct ice_32b_rx_flex_desc_nic *nic_mdid;
u32 hash;
if (!(rx_ring->netdev->features & NETIF_F_RXHASH))
return;
if (rx_desc->wb.rxdid != ICE_RXDID_FLEX_NIC)
return;
nic_mdid = (struct ice_32b_rx_flex_desc_nic *)rx_desc;
hash = le32_to_cpu(nic_mdid->rss_hash);
skb_set_hash(skb, hash, ice_ptype_to_htype(rx_ptype));
hash = ice_get_rx_hash(rx_desc);
if (likely(hash))
skb_set_hash(skb, hash, ice_ptype_to_htype(rx_ptype));
}
/**
......@@ -170,12 +184,39 @@ ice_rx_csum(struct ice_rx_ring *ring, struct sk_buff *skb,
ring->vsi->back->hw_csum_rx_error++;
}
/**
* ice_ptp_rx_hwts_to_skb - Put RX timestamp into skb
* @rx_ring: Ring to get the VSI info
* @rx_desc: Receive descriptor
* @skb: Particular skb to send timestamp with
*
* The timestamp is in ns, so we must convert the result first.
*/
static void
ice_ptp_rx_hwts_to_skb(struct ice_rx_ring *rx_ring,
const union ice_32b_rx_flex_desc *rx_desc,
struct sk_buff *skb)
{
u64 ts_ns = ice_ptp_get_rx_hwts(rx_desc, &rx_ring->pkt_ctx);
skb_hwtstamps(skb)->hwtstamp = ns_to_ktime(ts_ns);
}
/**
* ice_get_ptype - Read HW packet type from the descriptor
* @rx_desc: RX descriptor
*/
static u16 ice_get_ptype(const union ice_32b_rx_flex_desc *rx_desc)
{
return le16_to_cpu(rx_desc->wb.ptype_flex_flags0) &
ICE_RX_FLEX_DESC_PTYPE_M;
}
/**
* ice_process_skb_fields - Populate skb header fields from Rx descriptor
* @rx_ring: Rx descriptor ring packet is being transacted on
* @rx_desc: pointer to the EOP Rx descriptor
* @skb: pointer to current skb being populated
* @ptype: the packet type decoded by hardware
*
* This function checks the ring, descriptor, and packet information in
* order to populate the hash, checksum, VLAN, protocol, and
......@@ -184,9 +225,11 @@ ice_rx_csum(struct ice_rx_ring *ring, struct sk_buff *skb,
void
ice_process_skb_fields(struct ice_rx_ring *rx_ring,
union ice_32b_rx_flex_desc *rx_desc,
struct sk_buff *skb, u16 ptype)
struct sk_buff *skb)
{
ice_rx_hash(rx_ring, rx_desc, skb, ptype);
u16 ptype = ice_get_ptype(rx_desc);
ice_rx_hash_to_skb(rx_ring, rx_desc, skb, ptype);
/* modifies the skb - consumes the enet header */
skb->protocol = eth_type_trans(skb, rx_ring->netdev);
......@@ -194,28 +237,24 @@ ice_process_skb_fields(struct ice_rx_ring *rx_ring,
ice_rx_csum(rx_ring, skb, rx_desc, ptype);
if (rx_ring->ptp_rx)
ice_ptp_rx_hwtstamp(rx_ring, rx_desc, skb);
ice_ptp_rx_hwts_to_skb(rx_ring, rx_desc, skb);
}
/**
* ice_receive_skb - Send a completed packet up the stack
* @rx_ring: Rx ring in play
* @skb: packet to send up
* @vlan_tag: VLAN tag for packet
* @vlan_tci: VLAN TCI for packet
*
* This function sends the completed packet (via. skb) up the stack using
* gro receive functions (with/without VLAN tag)
*/
void
ice_receive_skb(struct ice_rx_ring *rx_ring, struct sk_buff *skb, u16 vlan_tag)
ice_receive_skb(struct ice_rx_ring *rx_ring, struct sk_buff *skb, u16 vlan_tci)
{
netdev_features_t features = rx_ring->netdev->features;
bool non_zero_vlan = !!(vlan_tag & VLAN_VID_MASK);
if ((features & NETIF_F_HW_VLAN_CTAG_RX) && non_zero_vlan)
__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlan_tag);
else if ((features & NETIF_F_HW_VLAN_STAG_RX) && non_zero_vlan)
__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021AD), vlan_tag);
if ((vlan_tci & VLAN_VID_MASK) && rx_ring->vlan_proto)
__vlan_hwaccel_put_tag(skb, rx_ring->vlan_proto,
vlan_tci);
napi_gro_receive(&rx_ring->q_vector->napi, skb);
}
......@@ -464,3 +503,125 @@ void ice_finalize_xdp_rx(struct ice_tx_ring *xdp_ring, unsigned int xdp_res,
spin_unlock(&xdp_ring->tx_lock);
}
}
/**
* ice_xdp_rx_hw_ts - HW timestamp XDP hint handler
* @ctx: XDP buff pointer
* @ts_ns: destination address
*
* Copy HW timestamp (if available) to the destination address.
*/
static int ice_xdp_rx_hw_ts(const struct xdp_md *ctx, u64 *ts_ns)
{
const struct ice_xdp_buff *xdp_ext = (void *)ctx;
*ts_ns = ice_ptp_get_rx_hwts(xdp_ext->eop_desc,
xdp_ext->pkt_ctx);
if (!*ts_ns)
return -ENODATA;
return 0;
}
/* Define a ptype index -> XDP hash type lookup table.
* It uses the same ptype definitions as ice_decode_rx_desc_ptype[],
* avoiding possible copy-paste errors.
*/
#undef ICE_PTT
#undef ICE_PTT_UNUSED_ENTRY
#define ICE_PTT(PTYPE, OUTER_IP, OUTER_IP_VER, OUTER_FRAG, T, TE, TEF, I, PL)\
[PTYPE] = XDP_RSS_L3_##OUTER_IP_VER | XDP_RSS_L4_##I | XDP_RSS_TYPE_##PL
#define ICE_PTT_UNUSED_ENTRY(PTYPE) [PTYPE] = 0
/* A few supplementary definitions for when XDP hash types do not coincide
* with what can be generated from ptype definitions
* by means of preprocessor concatenation.
*/
#define XDP_RSS_L3_NONE XDP_RSS_TYPE_NONE
#define XDP_RSS_L4_NONE XDP_RSS_TYPE_NONE
#define XDP_RSS_TYPE_PAY2 XDP_RSS_TYPE_L2
#define XDP_RSS_TYPE_PAY3 XDP_RSS_TYPE_NONE
#define XDP_RSS_TYPE_PAY4 XDP_RSS_L4
static const enum xdp_rss_hash_type
ice_ptype_to_xdp_hash[ICE_NUM_DEFINED_PTYPES] = {
ICE_PTYPES
};
#undef XDP_RSS_L3_NONE
#undef XDP_RSS_L4_NONE
#undef XDP_RSS_TYPE_PAY2
#undef XDP_RSS_TYPE_PAY3
#undef XDP_RSS_TYPE_PAY4
#undef ICE_PTT
#undef ICE_PTT_UNUSED_ENTRY
/**
* ice_xdp_rx_hash_type - Get XDP-specific hash type from the RX descriptor
* @eop_desc: End of Packet descriptor
*/
static enum xdp_rss_hash_type
ice_xdp_rx_hash_type(const union ice_32b_rx_flex_desc *eop_desc)
{
u16 ptype = ice_get_ptype(eop_desc);
if (unlikely(ptype >= ICE_NUM_DEFINED_PTYPES))
return 0;
return ice_ptype_to_xdp_hash[ptype];
}
/**
* ice_xdp_rx_hash - RX hash XDP hint handler
* @ctx: XDP buff pointer
* @hash: hash destination address
* @rss_type: XDP hash type destination address
*
* Copy RX hash (if available) and its type to the destination address.
*/
static int ice_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash,
enum xdp_rss_hash_type *rss_type)
{
const struct ice_xdp_buff *xdp_ext = (void *)ctx;
*hash = ice_get_rx_hash(xdp_ext->eop_desc);
*rss_type = ice_xdp_rx_hash_type(xdp_ext->eop_desc);
if (!likely(*hash))
return -ENODATA;
return 0;
}
/**
* ice_xdp_rx_vlan_tag - VLAN tag XDP hint handler
* @ctx: XDP buff pointer
* @vlan_proto: destination address for VLAN protocol
* @vlan_tci: destination address for VLAN TCI
*
* Copy VLAN tag (if was stripped) and corresponding protocol
* to the destination address.
*/
static int ice_xdp_rx_vlan_tag(const struct xdp_md *ctx, __be16 *vlan_proto,
u16 *vlan_tci)
{
const struct ice_xdp_buff *xdp_ext = (void *)ctx;
*vlan_proto = xdp_ext->pkt_ctx->vlan_proto;
if (!*vlan_proto)
return -ENODATA;
*vlan_tci = ice_get_vlan_tci(xdp_ext->eop_desc);
if (!*vlan_tci)
return -ENODATA;
return 0;
}
const struct xdp_metadata_ops ice_xdp_md_ops = {
.xmo_rx_timestamp = ice_xdp_rx_hw_ts,
.xmo_rx_hash = ice_xdp_rx_hash,
.xmo_rx_vlan_tag = ice_xdp_rx_vlan_tag,
};
......@@ -84,7 +84,7 @@ ice_build_ctob(u64 td_cmd, u64 td_offset, unsigned int size, u64 td_tag)
}
/**
* ice_get_vlan_tag_from_rx_desc - get VLAN from Rx flex descriptor
* ice_get_vlan_tci - get VLAN TCI from Rx flex descriptor
* @rx_desc: Rx 32b flex descriptor with RXDID=2
*
* The OS and current PF implementation only support stripping a single VLAN tag
......@@ -92,7 +92,7 @@ ice_build_ctob(u64 td_cmd, u64 td_offset, unsigned int size, u64 td_tag)
* one is found return the tag, else return 0 to mean no VLAN tag was found.
*/
static inline u16
ice_get_vlan_tag_from_rx_desc(union ice_32b_rx_flex_desc *rx_desc)
ice_get_vlan_tci(const union ice_32b_rx_flex_desc *rx_desc)
{
u16 stat_err_bits;
......@@ -148,7 +148,17 @@ void ice_release_rx_desc(struct ice_rx_ring *rx_ring, u16 val);
void
ice_process_skb_fields(struct ice_rx_ring *rx_ring,
union ice_32b_rx_flex_desc *rx_desc,
struct sk_buff *skb, u16 ptype);
struct sk_buff *skb);
void
ice_receive_skb(struct ice_rx_ring *rx_ring, struct sk_buff *skb, u16 vlan_tag);
ice_receive_skb(struct ice_rx_ring *rx_ring, struct sk_buff *skb, u16 vlan_tci);
static inline void
ice_xdp_meta_set_desc(struct xdp_buff *xdp,
union ice_32b_rx_flex_desc *eop_desc)
{
struct ice_xdp_buff *xdp_ext = container_of(xdp, struct ice_xdp_buff,
xdp_buff);
xdp_ext->eop_desc = eop_desc;
}
#endif /* !_ICE_TXRX_LIB_H_ */
......@@ -458,6 +458,11 @@ static u16 ice_fill_rx_descs(struct xsk_buff_pool *pool, struct xdp_buff **xdp,
rx_desc->read.pkt_addr = cpu_to_le64(dma);
rx_desc->wb.status_error0 = 0;
/* Put private info that changes on a per-packet basis
* into xdp_buff_xsk->cb.
*/
ice_xdp_meta_set_desc(*xdp, rx_desc);
rx_desc++;
xdp++;
}
......@@ -863,8 +868,7 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
struct xdp_buff *xdp;
struct sk_buff *skb;
u16 stat_err_bits;
u16 vlan_tag = 0;
u16 rx_ptype;
u16 vlan_tci;
rx_desc = ICE_RX_DESC(rx_ring, ntc);
......@@ -942,13 +946,10 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
total_rx_bytes += skb->len;
total_rx_packets++;
vlan_tag = ice_get_vlan_tag_from_rx_desc(rx_desc);
rx_ptype = le16_to_cpu(rx_desc->wb.ptype_flex_flags0) &
ICE_RX_FLEX_DESC_PTYPE_M;
vlan_tci = ice_get_vlan_tci(rx_desc);
ice_process_skb_fields(rx_ring, rx_desc, skb, rx_ptype);
ice_receive_skb(rx_ring, skb, vlan_tag);
ice_process_skb_fields(rx_ring, rx_desc, skb);
ice_receive_skb(rx_ring, skb, vlan_tci);
}
rx_ring->next_to_clean = ntc;
......
......@@ -256,9 +256,24 @@ static int mlx5e_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash,
return 0;
}
static int mlx5e_xdp_rx_vlan_tag(const struct xdp_md *ctx, __be16 *vlan_proto,
u16 *vlan_tci)
{
const struct mlx5e_xdp_buff *_ctx = (void *)ctx;
const struct mlx5_cqe64 *cqe = _ctx->cqe;
if (!cqe_has_vlan(cqe))
return -ENODATA;
*vlan_proto = htons(ETH_P_8021Q);
*vlan_tci = be16_to_cpu(cqe->vlan_info);
return 0;
}
const struct xdp_metadata_ops mlx5e_xdp_metadata_ops = {
.xmo_rx_timestamp = mlx5e_xdp_rx_timestamp,
.xmo_rx_hash = mlx5e_xdp_rx_hash,
.xmo_rx_vlan_tag = mlx5e_xdp_rx_vlan_tag,
};
struct mlx5e_xsk_tx_complete {
......
......@@ -1722,6 +1722,24 @@ static int veth_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash,
return 0;
}
static int veth_xdp_rx_vlan_tag(const struct xdp_md *ctx, __be16 *vlan_proto,
u16 *vlan_tci)
{
const struct veth_xdp_buff *_ctx = (void *)ctx;
const struct sk_buff *skb = _ctx->skb;
int err;
if (!skb)
return -ENODATA;
err = __vlan_hwaccel_get_tag(skb, vlan_tci);
if (err)
return err;
*vlan_proto = skb->vlan_proto;
return err;
}
static const struct net_device_ops veth_netdev_ops = {
.ndo_init = veth_dev_init,
.ndo_open = veth_open,
......@@ -1746,6 +1764,7 @@ static const struct net_device_ops veth_netdev_ops = {
static const struct xdp_metadata_ops veth_xdp_metadata_ops = {
.xmo_rx_timestamp = veth_xdp_rx_timestamp,
.xmo_rx_hash = veth_xdp_rx_hash,
.xmo_rx_vlan_tag = veth_xdp_rx_vlan_tag,
};
#define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HW_CSUM | \
......
......@@ -540,7 +540,7 @@ static inline int __vlan_get_tag(const struct sk_buff *skb, u16 *vlan_tci)
struct vlan_ethhdr *veth = skb_vlan_eth_hdr(skb);
if (!eth_type_vlan(veth->h_vlan_proto))
return -EINVAL;
return -ENODATA;
*vlan_tci = ntohs(veth->h_vlan_TCI);
return 0;
......@@ -561,7 +561,7 @@ static inline int __vlan_hwaccel_get_tag(const struct sk_buff *skb,
return 0;
} else {
*vlan_tci = 0;
return -EINVAL;
return -ENODATA;
}
}
......
......@@ -918,7 +918,7 @@ static inline u8 get_cqe_tls_offload(struct mlx5_cqe64 *cqe)
return (cqe->tls_outer_l3_tunneled >> 3) & 0x3;
}
static inline bool cqe_has_vlan(struct mlx5_cqe64 *cqe)
static inline bool cqe_has_vlan(const struct mlx5_cqe64 *cqe)
{
return cqe->l4_l3_hdr_type & 0x1;
}
......
......@@ -404,6 +404,10 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
NETDEV_XDP_RX_METADATA_HASH, \
bpf_xdp_metadata_rx_hash, \
xmo_rx_hash) \
XDP_METADATA_KFUNC(XDP_METADATA_KFUNC_RX_VLAN_TAG, \
NETDEV_XDP_RX_METADATA_VLAN_TAG, \
bpf_xdp_metadata_rx_vlan_tag, \
xmo_rx_vlan_tag) \
enum xdp_rx_metadata {
#define XDP_METADATA_KFUNC(name, _, __, ___) name,
......@@ -432,6 +436,7 @@ enum xdp_rss_hash_type {
XDP_RSS_L4_UDP = BIT(5),
XDP_RSS_L4_SCTP = BIT(6),
XDP_RSS_L4_IPSEC = BIT(7), /* L4 based hash include IPSEC SPI */
XDP_RSS_L4_ICMP = BIT(8),
/* Second part: RSS hash type combinations used for driver HW mapping */
XDP_RSS_TYPE_NONE = 0,
......@@ -447,11 +452,13 @@ enum xdp_rss_hash_type {
XDP_RSS_TYPE_L4_IPV4_UDP = XDP_RSS_L3_IPV4 | XDP_RSS_L4 | XDP_RSS_L4_UDP,
XDP_RSS_TYPE_L4_IPV4_SCTP = XDP_RSS_L3_IPV4 | XDP_RSS_L4 | XDP_RSS_L4_SCTP,
XDP_RSS_TYPE_L4_IPV4_IPSEC = XDP_RSS_L3_IPV4 | XDP_RSS_L4 | XDP_RSS_L4_IPSEC,
XDP_RSS_TYPE_L4_IPV4_ICMP = XDP_RSS_L3_IPV4 | XDP_RSS_L4 | XDP_RSS_L4_ICMP,
XDP_RSS_TYPE_L4_IPV6_TCP = XDP_RSS_L3_IPV6 | XDP_RSS_L4 | XDP_RSS_L4_TCP,
XDP_RSS_TYPE_L4_IPV6_UDP = XDP_RSS_L3_IPV6 | XDP_RSS_L4 | XDP_RSS_L4_UDP,
XDP_RSS_TYPE_L4_IPV6_SCTP = XDP_RSS_L3_IPV6 | XDP_RSS_L4 | XDP_RSS_L4_SCTP,
XDP_RSS_TYPE_L4_IPV6_IPSEC = XDP_RSS_L3_IPV6 | XDP_RSS_L4 | XDP_RSS_L4_IPSEC,
XDP_RSS_TYPE_L4_IPV6_ICMP = XDP_RSS_L3_IPV6 | XDP_RSS_L4 | XDP_RSS_L4_ICMP,
XDP_RSS_TYPE_L4_IPV6_TCP_EX = XDP_RSS_TYPE_L4_IPV6_TCP | XDP_RSS_L3_DYNHDR,
XDP_RSS_TYPE_L4_IPV6_UDP_EX = XDP_RSS_TYPE_L4_IPV6_UDP | XDP_RSS_L3_DYNHDR,
......@@ -462,6 +469,8 @@ struct xdp_metadata_ops {
int (*xmo_rx_timestamp)(const struct xdp_md *ctx, u64 *timestamp);
int (*xmo_rx_hash)(const struct xdp_md *ctx, u32 *hash,
enum xdp_rss_hash_type *rss_type);
int (*xmo_rx_vlan_tag)(const struct xdp_md *ctx, __be16 *vlan_proto,
u16 *vlan_tci);
};
#ifdef CONFIG_NET
......
......@@ -14,6 +14,12 @@
#ifdef CONFIG_XDP_SOCKETS
struct xsk_cb_desc {
void *src;
u8 off;
u8 bytes;
};
void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries);
bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc);
u32 xsk_tx_peek_release_desc_batch(struct xsk_buff_pool *pool, u32 max);
......@@ -47,6 +53,12 @@ static inline void xsk_pool_set_rxq_info(struct xsk_buff_pool *pool,
xp_set_rxq_info(pool, rxq);
}
static inline void xsk_pool_fill_cb(struct xsk_buff_pool *pool,
struct xsk_cb_desc *desc)
{
xp_fill_cb(pool, desc);
}
static inline unsigned int xsk_pool_get_napi_id(struct xsk_buff_pool *pool)
{
#ifdef CONFIG_NET_RX_BUSY_POLL
......@@ -274,6 +286,11 @@ static inline void xsk_pool_set_rxq_info(struct xsk_buff_pool *pool,
{
}
static inline void xsk_pool_fill_cb(struct xsk_buff_pool *pool,
struct xsk_cb_desc *desc)
{
}
static inline unsigned int xsk_pool_get_napi_id(struct xsk_buff_pool *pool)
{
return 0;
......
......@@ -12,6 +12,7 @@
struct xsk_buff_pool;
struct xdp_rxq_info;
struct xsk_cb_desc;
struct xsk_queue;
struct xdp_desc;
struct xdp_umem;
......@@ -135,6 +136,7 @@ static inline void xp_init_xskb_dma(struct xdp_buff_xsk *xskb, struct xsk_buff_p
/* AF_XDP ZC drivers, via xdp_sock_buff.h */
void xp_set_rxq_info(struct xsk_buff_pool *pool, struct xdp_rxq_info *rxq);
void xp_fill_cb(struct xsk_buff_pool *pool, struct xsk_cb_desc *desc);
int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
unsigned long attrs, struct page **pages, u32 nr_pages);
void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs);
......
......@@ -44,10 +44,13 @@ enum netdev_xdp_act {
* timestamp via bpf_xdp_metadata_rx_timestamp().
* @NETDEV_XDP_RX_METADATA_HASH: Device is capable of exposing receive packet
* hash via bpf_xdp_metadata_rx_hash().
* @NETDEV_XDP_RX_METADATA_VLAN_TAG: Device is capable of exposing receive
* packet VLAN tag via bpf_xdp_metadata_rx_vlan_tag().
*/
enum netdev_xdp_rx_metadata {
NETDEV_XDP_RX_METADATA_TIMESTAMP = 1,
NETDEV_XDP_RX_METADATA_HASH = 2,
NETDEV_XDP_RX_METADATA_VLAN_TAG = 4,
};
/**
......
......@@ -736,6 +736,39 @@ __bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash,
return -EOPNOTSUPP;
}
/**
* bpf_xdp_metadata_rx_vlan_tag - Get XDP packet outermost VLAN tag
* @ctx: XDP context pointer.
* @vlan_proto: Destination pointer for VLAN Tag protocol identifier (TPID).
* @vlan_tci: Destination pointer for VLAN TCI (VID + DEI + PCP)
*
* In case of success, ``vlan_proto`` contains *Tag protocol identifier (TPID)*,
* usually ``ETH_P_8021Q`` or ``ETH_P_8021AD``, but some networks can use
* custom TPIDs. ``vlan_proto`` is stored in **network byte order (BE)**
* and should be used as follows:
* ``if (vlan_proto == bpf_htons(ETH_P_8021Q)) do_something();``
*
* ``vlan_tci`` contains the remaining 16 bits of a VLAN tag.
* Driver is expected to provide those in **host byte order (usually LE)**,
* so the bpf program should not perform byte conversion.
* According to 802.1Q standard, *VLAN TCI (Tag control information)*
* is a bit field that contains:
* *VLAN identifier (VID)* that can be read with ``vlan_tci & 0xfff``,
* *Drop eligible indicator (DEI)* - 1 bit,
* *Priority code point (PCP)* - 3 bits.
* For detailed meaning of DEI and PCP, please refer to other sources.
*
* Return:
* * Returns 0 on success or ``-errno`` on error.
* * ``-EOPNOTSUPP`` : device driver doesn't implement kfunc
* * ``-ENODATA`` : VLAN tag was not stripped or is not available
*/
__bpf_kfunc int bpf_xdp_metadata_rx_vlan_tag(const struct xdp_md *ctx,
__be16 *vlan_proto, u16 *vlan_tci)
{
return -EOPNOTSUPP;
}
__bpf_kfunc_end_defs();
BTF_SET8_START(xdp_metadata_kfunc_ids)
......
......@@ -125,6 +125,18 @@ void xp_set_rxq_info(struct xsk_buff_pool *pool, struct xdp_rxq_info *rxq)
}
EXPORT_SYMBOL(xp_set_rxq_info);
void xp_fill_cb(struct xsk_buff_pool *pool, struct xsk_cb_desc *desc)
{
u32 i;
for (i = 0; i < pool->heads_cnt; i++) {
struct xdp_buff_xsk *xskb = &pool->heads[i];
memcpy(xskb->cb + desc->off, desc->src, desc->bytes);
}
}
EXPORT_SYMBOL(xp_fill_cb);
static void xp_disable_drv_zc(struct xsk_buff_pool *pool)
{
struct netdev_bpf bpf;
......
......@@ -44,10 +44,13 @@ enum netdev_xdp_act {
* timestamp via bpf_xdp_metadata_rx_timestamp().
* @NETDEV_XDP_RX_METADATA_HASH: Device is capable of exposing receive packet
* hash via bpf_xdp_metadata_rx_hash().
* @NETDEV_XDP_RX_METADATA_VLAN_TAG: Device is capable of exposing receive
* packet VLAN tag via bpf_xdp_metadata_rx_vlan_tag().
*/
enum netdev_xdp_rx_metadata {
NETDEV_XDP_RX_METADATA_TIMESTAMP = 1,
NETDEV_XDP_RX_METADATA_HASH = 2,
NETDEV_XDP_RX_METADATA_VLAN_TAG = 4,
};
/**
......
......@@ -53,6 +53,7 @@ const char *netdev_xdp_act_str(enum netdev_xdp_act value)
static const char * const netdev_xdp_rx_metadata_strmap[] = {
[0] = "timestamp",
[1] = "hash",
[2] = "vlan-tag",
};
const char *netdev_xdp_rx_metadata_str(enum netdev_xdp_rx_metadata value)
......
......@@ -20,7 +20,7 @@
#define UDP_PAYLOAD_BYTES 4
#define AF_XDP_SOURCE_PORT 1234
#define UDP_SOURCE_PORT 1234
#define AF_XDP_CONSUMER_PORT 8080
#define UMEM_NUM 16
......@@ -33,6 +33,18 @@
#define RX_ADDR "10.0.0.2"
#define PREFIX_LEN "8"
#define FAMILY AF_INET
#define TX_NETNS_NAME "xdp_metadata_tx"
#define RX_NETNS_NAME "xdp_metadata_rx"
#define TX_MAC "00:00:00:00:00:01"
#define RX_MAC "00:00:00:00:00:02"
#define VLAN_ID 59
#define VLAN_PROTO "802.1Q"
#define VLAN_PID htons(ETH_P_8021Q)
#define TX_NAME_VLAN TX_NAME "." TO_STR(VLAN_ID)
#define XDP_RSS_TYPE_L4 BIT(3)
#define VLAN_VID_MASK 0xfff
struct xsk {
void *umem_area;
......@@ -181,7 +193,7 @@ static int generate_packet(struct xsk *xsk, __u16 dst_port)
ASSERT_EQ(inet_pton(FAMILY, RX_ADDR, &iph->daddr), 1, "inet_pton(RX_ADDR)");
ip_csum(iph);
udph->source = htons(AF_XDP_SOURCE_PORT);
udph->source = htons(UDP_SOURCE_PORT);
udph->dest = htons(dst_port);
udph->len = htons(sizeof(*udph) + UDP_PAYLOAD_BYTES);
udph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
......@@ -204,6 +216,30 @@ static int generate_packet(struct xsk *xsk, __u16 dst_port)
return 0;
}
static int generate_packet_inet(void)
{
char udp_payload[UDP_PAYLOAD_BYTES];
struct sockaddr_in rx_addr;
int sock_fd, err = 0;
/* Build a packet */
memset(udp_payload, 0xAA, UDP_PAYLOAD_BYTES);
rx_addr.sin_addr.s_addr = inet_addr(RX_ADDR);
rx_addr.sin_family = AF_INET;
rx_addr.sin_port = htons(AF_XDP_CONSUMER_PORT);
sock_fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
if (!ASSERT_GE(sock_fd, 0, "socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)"))
return sock_fd;
err = sendto(sock_fd, udp_payload, UDP_PAYLOAD_BYTES, MSG_DONTWAIT,
(void *)&rx_addr, sizeof(rx_addr));
ASSERT_GE(err, 0, "sendto");
close(sock_fd);
return err;
}
static void complete_tx(struct xsk *xsk)
{
struct xsk_tx_metadata *meta;
......@@ -236,7 +272,7 @@ static void refill_rx(struct xsk *xsk, __u64 addr)
}
}
static int verify_xsk_metadata(struct xsk *xsk)
static int verify_xsk_metadata(struct xsk *xsk, bool sent_from_af_xdp)
{
const struct xdp_desc *rx_desc;
struct pollfd fds = {};
......@@ -290,17 +326,42 @@ static int verify_xsk_metadata(struct xsk *xsk)
if (!ASSERT_NEQ(meta->rx_hash, 0, "rx_hash"))
return -1;
if (!sent_from_af_xdp) {
if (!ASSERT_NEQ(meta->rx_hash_type & XDP_RSS_TYPE_L4, 0, "rx_hash_type"))
return -1;
if (!ASSERT_EQ(meta->rx_vlan_tci & VLAN_VID_MASK, VLAN_ID, "rx_vlan_tci"))
return -1;
if (!ASSERT_EQ(meta->rx_vlan_proto, VLAN_PID, "rx_vlan_proto"))
return -1;
goto done;
}
ASSERT_EQ(meta->rx_hash_type, 0, "rx_hash_type");
/* checksum offload */
ASSERT_EQ(udph->check, htons(0x721c), "csum");
done:
xsk_ring_cons__release(&xsk->rx, 1);
refill_rx(xsk, comp_addr);
return 0;
}
static void switch_ns_to_rx(struct nstoken **tok)
{
close_netns(*tok);
*tok = open_netns(RX_NETNS_NAME);
}
static void switch_ns_to_tx(struct nstoken **tok)
{
close_netns(*tok);
*tok = open_netns(TX_NETNS_NAME);
}
void test_xdp_metadata(void)
{
struct xdp_metadata2 *bpf_obj2 = NULL;
......@@ -318,27 +379,35 @@ void test_xdp_metadata(void)
int sock_fd;
int ret;
/* Setup new networking namespace, with a veth pair. */
/* Setup new networking namespaces, with a veth pair. */
SYS(out, "ip netns add " TX_NETNS_NAME);
SYS(out, "ip netns add " RX_NETNS_NAME);
SYS(out, "ip netns add xdp_metadata");
tok = open_netns("xdp_metadata");
tok = open_netns(TX_NETNS_NAME);
SYS(out, "ip link add numtxqueues 1 numrxqueues 1 " TX_NAME
" type veth peer " RX_NAME " numtxqueues 1 numrxqueues 1");
SYS(out, "ip link set dev " TX_NAME " address 00:00:00:00:00:01");
SYS(out, "ip link set dev " RX_NAME " address 00:00:00:00:00:02");
SYS(out, "ip link set " RX_NAME " netns " RX_NETNS_NAME);
SYS(out, "ip link set dev " TX_NAME " address " TX_MAC);
SYS(out, "ip link set dev " TX_NAME " up");
SYS(out, "ip link add link " TX_NAME " " TX_NAME_VLAN
" type vlan proto " VLAN_PROTO " id " TO_STR(VLAN_ID));
SYS(out, "ip link set dev " TX_NAME_VLAN " up");
SYS(out, "ip addr add " TX_ADDR "/" PREFIX_LEN " dev " TX_NAME_VLAN);
/* Avoid ARP calls */
SYS(out, "ip -4 neigh add " RX_ADDR " lladdr " RX_MAC " dev " TX_NAME_VLAN);
switch_ns_to_rx(&tok);
SYS(out, "ip link set dev " RX_NAME " address " RX_MAC);
SYS(out, "ip link set dev " RX_NAME " up");
SYS(out, "ip addr add " TX_ADDR "/" PREFIX_LEN " dev " TX_NAME);
SYS(out, "ip addr add " RX_ADDR "/" PREFIX_LEN " dev " RX_NAME);
rx_ifindex = if_nametoindex(RX_NAME);
tx_ifindex = if_nametoindex(TX_NAME);
/* Setup separate AF_XDP for TX and RX interfaces. */
ret = open_xsk(tx_ifindex, &tx_xsk);
if (!ASSERT_OK(ret, "open_xsk(TX_NAME)"))
goto out;
/* Setup separate AF_XDP for RX interface. */
ret = open_xsk(rx_ifindex, &rx_xsk);
if (!ASSERT_OK(ret, "open_xsk(RX_NAME)"))
......@@ -379,18 +448,38 @@ void test_xdp_metadata(void)
if (!ASSERT_GE(ret, 0, "bpf_map_update_elem"))
goto out;
/* Send packet destined to RX AF_XDP socket. */
switch_ns_to_tx(&tok);
/* Setup separate AF_XDP for TX interface nad send packet to the RX socket. */
tx_ifindex = if_nametoindex(TX_NAME);
ret = open_xsk(tx_ifindex, &tx_xsk);
if (!ASSERT_OK(ret, "open_xsk(TX_NAME)"))
goto out;
if (!ASSERT_GE(generate_packet(&tx_xsk, AF_XDP_CONSUMER_PORT), 0,
"generate AF_XDP_CONSUMER_PORT"))
goto out;
/* Verify AF_XDP RX packet has proper metadata. */
if (!ASSERT_GE(verify_xsk_metadata(&rx_xsk), 0,
switch_ns_to_rx(&tok);
/* Verify packet sent from AF_XDP has proper metadata. */
if (!ASSERT_GE(verify_xsk_metadata(&rx_xsk, true), 0,
"verify_xsk_metadata"))
goto out;
switch_ns_to_tx(&tok);
complete_tx(&tx_xsk);
/* Now check metadata of packet, generated with network stack */
if (!ASSERT_GE(generate_packet_inet(), 0, "generate UDP packet"))
goto out;
switch_ns_to_rx(&tok);
if (!ASSERT_GE(verify_xsk_metadata(&rx_xsk, false), 0,
"verify_xsk_metadata"))
goto out;
/* Make sure freplace correctly picks up original bound device
* and doesn't crash.
*/
......@@ -408,11 +497,15 @@ void test_xdp_metadata(void)
if (!ASSERT_OK(xdp_metadata2__attach(bpf_obj2), "attach freplace"))
goto out;
switch_ns_to_tx(&tok);
/* Send packet to trigger . */
if (!ASSERT_GE(generate_packet(&tx_xsk, AF_XDP_CONSUMER_PORT), 0,
"generate freplace packet"))
goto out;
switch_ns_to_rx(&tok);
while (!retries--) {
if (bpf_obj2->bss->called)
break;
......@@ -427,5 +520,6 @@ void test_xdp_metadata(void)
xdp_metadata__destroy(bpf_obj);
if (tok)
close_netns(tok);
SYS_NOFAIL("ip netns del xdp_metadata");
SYS_NOFAIL("ip netns del " RX_NETNS_NAME);
SYS_NOFAIL("ip netns del " TX_NETNS_NAME);
}
......@@ -20,21 +20,32 @@ extern int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx,
__u64 *timestamp) __ksym;
extern int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, __u32 *hash,
enum xdp_rss_hash_type *rss_type) __ksym;
extern int bpf_xdp_metadata_rx_vlan_tag(const struct xdp_md *ctx,
__be16 *vlan_proto,
__u16 *vlan_tci) __ksym;
SEC("xdp.frags")
int rx(struct xdp_md *ctx)
{
void *data, *data_meta, *data_end;
struct ipv6hdr *ip6h = NULL;
struct ethhdr *eth = NULL;
struct udphdr *udp = NULL;
struct iphdr *iph = NULL;
struct xdp_meta *meta;
struct ethhdr *eth;
int err;
data = (void *)(long)ctx->data;
data_end = (void *)(long)ctx->data_end;
eth = data;
if (eth + 1 < data_end && (eth->h_proto == bpf_htons(ETH_P_8021AD) ||
eth->h_proto == bpf_htons(ETH_P_8021Q)))
eth = (void *)eth + sizeof(struct vlan_hdr);
if (eth + 1 < data_end && eth->h_proto == bpf_htons(ETH_P_8021Q))
eth = (void *)eth + sizeof(struct vlan_hdr);
if (eth + 1 < data_end) {
if (eth->h_proto == bpf_htons(ETH_P_IP)) {
iph = (void *)(eth + 1);
......@@ -76,15 +87,28 @@ int rx(struct xdp_md *ctx)
return XDP_PASS;
}
meta->hint_valid = 0;
meta->xdp_timestamp = bpf_ktime_get_tai_ns();
err = bpf_xdp_metadata_rx_timestamp(ctx, &meta->rx_timestamp);
if (!err)
meta->xdp_timestamp = bpf_ktime_get_tai_ns();
if (err)
meta->rx_timestamp_err = err;
else
meta->rx_timestamp = 0; /* Used by AF_XDP as not avail signal */
meta->hint_valid |= XDP_META_FIELD_TS;
err = bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash, &meta->rx_hash_type);
if (err < 0)
meta->rx_hash_err = err; /* Used by AF_XDP as no hash signal */
err = bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash,
&meta->rx_hash_type);
if (err)
meta->rx_hash_err = err;
else
meta->hint_valid |= XDP_META_FIELD_RSS;
err = bpf_xdp_metadata_rx_vlan_tag(ctx, &meta->rx_vlan_proto,
&meta->rx_vlan_tci);
if (err)
meta->rx_vlan_tag_err = err;
else
meta->hint_valid |= XDP_META_FIELD_VLAN_TAG;
__sync_add_and_fetch(&pkts_redir, 1);
return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS);
......
......@@ -23,6 +23,9 @@ extern int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx,
__u64 *timestamp) __ksym;
extern int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, __u32 *hash,
enum xdp_rss_hash_type *rss_type) __ksym;
extern int bpf_xdp_metadata_rx_vlan_tag(const struct xdp_md *ctx,
__be16 *vlan_proto,
__u16 *vlan_tci) __ksym;
SEC("xdp")
int rx(struct xdp_md *ctx)
......@@ -86,6 +89,8 @@ int rx(struct xdp_md *ctx)
meta->rx_timestamp = 1;
bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash, &meta->rx_hash_type);
bpf_xdp_metadata_rx_vlan_tag(ctx, &meta->rx_vlan_proto,
&meta->rx_vlan_tci);
return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS);
}
......
......@@ -9,6 +9,9 @@
#include <bpf/libbpf.h>
#include <time.h>
#define __TO_STR(x) #x
#define TO_STR(x) __TO_STR(x)
int parse_num_list(const char *s, bool **set, int *set_len);
__u32 link_info_prog_id(const struct bpf_link *link, struct bpf_link_info *info);
int bpf_prog_test_load(const char *file, enum bpf_prog_type type,
......
......@@ -21,6 +21,9 @@
#include "xsk.h"
#include <error.h>
#include <linux/kernel.h>
#include <linux/bits.h>
#include <linux/bitfield.h>
#include <linux/errqueue.h>
#include <linux/if_link.h>
#include <linux/net_tstamp.h>
......@@ -182,19 +185,31 @@ static void print_tstamp_delta(const char *name, const char *refname,
(double)delta / 1000);
}
#define VLAN_PRIO_MASK GENMASK(15, 13) /* Priority Code Point */
#define VLAN_DEI_MASK GENMASK(12, 12) /* Drop Eligible Indicator */
#define VLAN_VID_MASK GENMASK(11, 0) /* VLAN Identifier */
static void print_vlan_tci(__u16 tag)
{
__u16 vlan_id = FIELD_GET(VLAN_VID_MASK, tag);
__u8 pcp = FIELD_GET(VLAN_PRIO_MASK, tag);
bool dei = FIELD_GET(VLAN_DEI_MASK, tag);
printf("PCP=%u, DEI=%d, VID=0x%X\n", pcp, dei, vlan_id);
}
static void verify_xdp_metadata(void *data, clockid_t clock_id)
{
struct xdp_meta *meta;
meta = data - sizeof(*meta);
if (meta->rx_hash_err < 0)
printf("No rx_hash err=%d\n", meta->rx_hash_err);
else
if (meta->hint_valid & XDP_META_FIELD_RSS)
printf("rx_hash: 0x%X with RSS type:0x%X\n",
meta->rx_hash, meta->rx_hash_type);
else
printf("No rx_hash, err=%d\n", meta->rx_hash_err);
if (meta->rx_timestamp) {
if (meta->hint_valid & XDP_META_FIELD_TS) {
__u64 ref_tstamp = gettime(clock_id);
/* store received timestamps to calculate a delta at tx */
......@@ -206,7 +221,16 @@ static void verify_xdp_metadata(void *data, clockid_t clock_id)
print_tstamp_delta("XDP RX-time", "User RX-time",
meta->xdp_timestamp, ref_tstamp);
} else {
printf("No rx_timestamp\n");
printf("No rx_timestamp, err=%d\n", meta->rx_timestamp_err);
}
if (meta->hint_valid & XDP_META_FIELD_VLAN_TAG) {
printf("rx_vlan_proto: 0x%X\n", ntohs(meta->rx_vlan_proto));
printf("rx_vlan_tci: ");
print_vlan_tci(meta->rx_vlan_tci);
} else {
printf("No rx_vlan_tci or rx_vlan_proto, err=%d\n",
meta->rx_vlan_tag_err);
}
}
......
......@@ -9,12 +9,44 @@
#define ETH_P_IPV6 0x86DD
#endif
#ifndef ETH_P_8021Q
#define ETH_P_8021Q 0x8100
#endif
#ifndef ETH_P_8021AD
#define ETH_P_8021AD 0x88A8
#endif
#ifndef BIT
#define BIT(nr) (1 << (nr))
#endif
/* Non-existent checksum status */
#define XDP_CHECKSUM_MAGIC BIT(2)
enum xdp_meta_field {
XDP_META_FIELD_TS = BIT(0),
XDP_META_FIELD_RSS = BIT(1),
XDP_META_FIELD_VLAN_TAG = BIT(2),
};
struct xdp_meta {
__u64 rx_timestamp;
union {
__u64 rx_timestamp;
__s32 rx_timestamp_err;
};
__u64 xdp_timestamp;
__u32 rx_hash;
union {
__u32 rx_hash_type;
__s32 rx_hash_err;
};
union {
struct {
__be16 rx_vlan_proto;
__u16 rx_vlan_tci;
};
__s32 rx_vlan_tag_err;
};
enum xdp_meta_field hint_valid;
};
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment