Commit 4b837ad5 authored by David S. Miller

Merge branch 'netfilter-flowtable'

Pablo Neira Ayuso says:

====================
netfilter: flowtable enhancements

[ This is v2 that includes documentation enhancements, including
  existing limitations. This is a rebase on top of net-next. ]

The following patchset augments the Netfilter flowtable fastpath to
support network topologies that combine IP forwarding, bridge,
classic VLAN devices, bridge VLAN filtering, DSA and PPPoE. This
includes support for the flowtable software and hardware datapaths.

The following picture provides an example scenario:

                        fast path!
                .------------------------.
               /                          \
               |           IP forwarding  |
               |          /             \ \/
               |       br0               wan ..... eth0
               .       / \                         host C
               -> veth1  veth2
                   .           switch/router
                   .
                   .
                 eth0
                host A

The bridge master device 'br0' has an IP address and a DHCP server is
also assumed to be running to provide connectivity to host A which
reaches the Internet through 'br0' as default gateway. Packets then
enter the IP forwarding path and Netfilter is used to NAT the packets
before they leave through the wan device.

The general idea is to accelerate forwarding by building a fast path
that takes packets from the ingress path of the bridge port and places
them in the egress path of the wan device (and vice versa), hence
skipping the classic bridge and IP stack paths.

** Patches from #1 to #6 add the infrastructure which describes the list of
   netdevice hops to reach a given destination MAC address in the local
   network topology.

Patch #1 adds dev_fill_forward_path() and .ndo_fill_forward_path() to
         netdev_ops.

Patch #2 adds .ndo_fill_forward_path for vlan devices, which provides
         the next device hop via vlan->real_dev, the vlan ID and the
         protocol.

Patch #3 adds .ndo_fill_forward_path for bridge devices, which performs
         FDB lookups to locate the next device hop (bridge port) in the
         forwarding path.

Patch #4 extends bridge .ndo_fill_forward_path to support bridge VLAN
         filtering.

Patch #5 adds .ndo_fill_forward_path for PPPoE devices.

Patch #6 adds .ndo_fill_forward_path for DSA.

** Patches from #7 to #14 update the flowtable software datapath:

Patch #7 adds the transmit path type field to the flow tuple. Two transmit
         paths are supported so far: the neighbour and the xfrm transmit
         paths.

Patch #8 and #9 update the flowtable datapath to use dev_fill_forward_path()
         to obtain the real ingress/egress device for the flowtable datapath.
         This adds the new ethernet xmit direct path to the flowtable.

Patch #10 adds native flowtable VLAN support (up to 2 VLAN tags) through
          dev_fill_forward_path(). The flowtable stores the VLAN id and
          protocol in the flow tuple.

Patch #11 adds native flowtable bridge VLAN filter support through
          dev_fill_forward_path().

Patch #12 adds native flowtable PPPoE support through dev_fill_forward_path().

Patch #13 adds DSA support through dev_fill_forward_path().

Patch #14 extends flowtable selftests to cover the flowtable software
          datapath enhancements.

** Patches from #15 to #20 update the flowtable hardware offload datapath:

Patch #15 extends the flowtable hardware offload to support the
          direct ethernet xmit path. This also includes VLAN support.

Patch #16 stores the egress real device in the flow tuple. The software
          flowtable datapath uses dev_hard_header() to transmit packets,
          hence it might refer to a VLAN/DSA/PPPoE software device, not
          the real ethernet device.

Patch #17 deals with switchdev PVID hardware offload to skip it on
          egress.

Patch #18 adds FLOW_ACTION_PPPOE_PUSH to the flow_offload action API.

Patch #19 extends the flowtable hardware offload to support PPPoE.

Patch #20 adds TC_SETUP_FT support for DSA.

** Patches from #21 to #23: Felix Fietkau adds a new driver which supports
   hardware offload for the mtk PPE engine through the existing flow
   offload API, which supports the flowtable enhancements coming in
   this batch.

Patch #24 extends the documentation and describes existing limitations.

Please, apply, thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
parents ad248f77 143490cd
......@@ -4,35 +4,38 @@
Netfilter's flowtable infrastructure
====================================
This documentation describes the software flowtable infrastructure available in
Netfilter since Linux kernel 4.16.
This documentation describes the Netfilter flowtable infrastructure which allows
you to define a fastpath through the flowtable datapath. This infrastructure
also provides hardware offload support. The flowtable supports the layer 3
IPv4 and IPv6 and the layer 4 TCP and UDP protocols.
Overview
--------
Initial packets follow the classic forwarding path, once the flow enters the
established state according to the conntrack semantics (ie. we have seen traffic
in both directions), then you can decide to offload the flow to the flowtable
from the forward chain via the 'flow offload' action available in nftables.
Once the first packet of the flow successfully goes through the IP forwarding
path, from the second packet on, you might decide to offload the flow to the
flowtable through your ruleset. The flowtable infrastructure provides a rule
action that allows you to specify when to add a flow to the flowtable.
Packets that find an entry in the flowtable (ie. flowtable hit) are sent to the
output netdevice via neigh_xmit(), hence, they bypass the classic forwarding
path (the visible effect is that you do not see these packets from any of the
netfilter hooks coming after the ingress). In case of flowtable miss, the packet
follows the classic forward path.
A packet that finds a matching entry in the flowtable (ie. flowtable hit) is
transmitted to the output netdevice via neigh_xmit(), hence, packets bypass the
classic IP forwarding path (the visible effect is that you do not see these
packets from any of the Netfilter hooks coming after ingress). In case that
there is no matching entry in the flowtable (ie. flowtable miss), the packet
follows the classic IP forwarding path.
The flowtable uses a resizable hashtable, lookups are based on the following
7-tuple selectors: source, destination, layer 3 and layer 4 protocols, source
and destination ports and the input interface (useful in case there are several
conntrack zones in place).
The flowtable uses a resizable hashtable. Lookups are based on the following
n-tuple selectors: layer 2 protocol encapsulation (VLAN and PPPoE), layer 3
source and destination, layer 4 source and destination ports and the input
interface (useful in case there are several conntrack zones in place).
Flowtables are populated via the 'flow offload' nftables action, so the user can
selectively specify what flows are placed into the flow table. Hence, packets
follow the classic forwarding path unless the user explicitly instruct packets
to use this new alternative forwarding path via nftables policy.
The 'flow add' action allows you to populate the flowtable: the user selectively
specifies what flows are placed into the flowtable. Hence, packets follow the
classic IP forwarding path unless the user explicitly instructs flows to use this
new alternative forwarding path via policy.
This is represented in Fig.1, which describes the classic forwarding path
including the Netfilter hooks and the flowtable fastpath bypass.
The flowtable datapath is represented in Fig.1, which describes the classic IP
forwarding path including the Netfilter hooks and the flowtable fastpath bypass.
::
......@@ -67,11 +70,13 @@ including the Netfilter hooks and the flowtable fastpath bypass.
Fig.1 Netfilter hooks and flowtable interactions
The flowtable entry also stores the NAT configuration, so all packets are
mangled according to the NAT policy that matches the initial packets that went
through the classic forwarding path. The TTL is decremented before calling
neigh_xmit(). Fragmented traffic is passed up to follow the classic forwarding
path given that the transport selectors are missing, therefore flowtable lookup
is not possible.
mangled according to the NAT policy that is specified from the classic IP
forwarding path. The TTL is decremented before calling neigh_xmit(). Fragmented
traffic is passed up to follow the classic IP forwarding path given that the
transport header is missing; in this case, flowtable lookups are not possible.
TCP RST and FIN packets are also passed up to the classic IP forwarding path to
release the flow gracefully. Packets that exceed the MTU are also passed up to
the classic forwarding path to report packet-too-big ICMP errors to the sender.
Example configuration
---------------------
......@@ -85,7 +90,7 @@ flowtable and add one rule to your forward chain::
}
chain y {
type filter hook forward priority 0; policy accept;
ip protocol tcp flow offload @f
ip protocol tcp flow add @f
counter packets 0 bytes 0
}
}
......@@ -103,6 +108,117 @@ flow is offloaded, you will observe that the counter rule in the example above
does not get updated for the packets that are being forwarded through the
forwarding bypass.
You can identify offloaded flows through the [OFFLOAD] tag when listing your
connection tracking table.
::
# conntrack -L
tcp 6 src=10.141.10.2 dst=192.168.10.2 sport=52728 dport=5201 src=192.168.10.2 dst=192.168.10.1 sport=5201 dport=52728 [OFFLOAD] mark=0 use=2
Layer 2 encapsulation
---------------------
Since Linux kernel 5.13, the flowtable infrastructure discovers the real
netdevice behind VLAN and PPPoE netdevices. The flowtable software datapath
parses the VLAN and PPPoE layer 2 headers to extract the ethertype and the
VLAN ID / PPPoE session ID which are used for the flowtable lookups. The
flowtable datapath also deals with layer 2 decapsulation.
You do not need to add the PPPoE and the VLAN devices to your flowtable,
instead the real device is sufficient for the flowtable to track your flows.
Bridge and IP forwarding
------------------------
Since Linux kernel 5.13, you can add bridge ports to the flowtable. The
flowtable infrastructure discovers the topology behind the bridge device. This
allows the flowtable to define a fastpath bypass between the bridge ports
(represented as eth1 and eth2 in the example figure below) and the gateway
device (represented as eth0) in your switch/router.
::
fastpath bypass
.-------------------------.
/ \
| IP forwarding |
| / \ \/
| br0 eth0 ..... eth0
. / \ *host B*
-> eth1 eth2
. *switch/router*
.
.
eth0
*host A*
The flowtable infrastructure also supports bridge VLAN filtering actions
such as PVID and untagged. You can also stack a classic VLAN device on top of
your bridge port.
If you would like your flowtable to define a fastpath between your bridge
ports and your IP forwarding path, you have to add your bridge ports (as
represented by the real netdevice) to your flowtable definition.
Counters
--------
The flowtable can synchronize packet and byte counters with the existing
connection tracking entry by specifying the counter statement in your flowtable
definition, e.g.
::
table inet x {
flowtable f {
hook ingress priority 0; devices = { eth0, eth1 };
counter
}
...
}
Counter support is available since Linux kernel 5.7.
Hardware offload
----------------
If your network device provides hardware offload support, you can turn it on by
means of the 'offload' flag in your flowtable definition, e.g.
::
table inet x {
flowtable f {
hook ingress priority 0; devices = { eth0, eth1 };
flags offload;
}
...
}
There is a workqueue that adds the flows to the hardware. Note that a few
packets might still run over the flowtable software path until the workqueue has
a chance to offload the flow to the network device.
You can identify hardware offloaded flows through the [HW_OFFLOAD] tag when
listing your connection tracking table. Please note that the [OFFLOAD] tag
refers to the software offload mode, so there is a distinction between [OFFLOAD]
which refers to the software flowtable fastpath and [HW_OFFLOAD] which refers
to the hardware offload datapath being used by the flow.
The flowtable hardware offload infrastructure also supports DSA
(Distributed Switch Architecture).
Limitations
-----------
The flowtable behaves like a cache. The flowtable entries might get stale if
either the destination MAC address or the egress netdevice that is used for
transmission changes.
This might be a problem if:
- You run the flowtable in software mode and you combine bridge and IP
forwarding in your setup.
- Hardware offload is enabled.
More reading
------------
......
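The documentation hunk above lists several cases in which a packet is handed back to the classic IP forwarding path: a flowtable miss, fragmented traffic, TCP RST and FIN packets, and packets that exceed the MTU. The following userspace sketch condenses those rules into one decision function. It is only an illustration under stated assumptions: `ex_pkt`, `ex_verdict` and `ex_flowtable_verdict()` are hypothetical names and do not correspond to kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical verdicts: stay on the fastpath or fall back. */
enum ex_verdict { EX_FASTPATH, EX_CLASSIC_PATH };

/* Hypothetical per-packet summary; not a kernel structure. */
struct ex_pkt {
	bool is_fragment;	/* transport header missing, no lookup possible */
	bool flowtable_hit;	/* a matching flowtable entry exists */
	bool tcp_fin_or_rst;	/* flow teardown must be seen by conntrack */
	unsigned int len;	/* packet length compared against the MTU */
};

static inline enum ex_verdict
ex_flowtable_verdict(const struct ex_pkt *pkt, unsigned int mtu)
{
	assert(pkt != NULL);

	if (pkt->is_fragment)		/* lookup is not possible */
		return EX_CLASSIC_PATH;
	if (!pkt->flowtable_hit)	/* flowtable miss */
		return EX_CLASSIC_PATH;
	if (pkt->tcp_fin_or_rst)	/* release the flow gracefully */
		return EX_CLASSIC_PATH;
	if (pkt->len > mtu)		/* let the stack report packet-too-big */
		return EX_CLASSIC_PATH;

	return EX_FASTPATH;
}
```

The order of the checks is a choice made for this sketch; the documentation only states which cases fall back, not the order in which the datapath tests them.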
......@@ -4,5 +4,5 @@
#
obj-$(CONFIG_NET_MEDIATEK_SOC) += mtk_eth.o
mtk_eth-y := mtk_eth_soc.o mtk_sgmii.o mtk_eth_path.o
mtk_eth-y := mtk_eth_soc.o mtk_sgmii.o mtk_eth_path.o mtk_ppe.o mtk_ppe_debugfs.o mtk_ppe_offload.o
obj-$(CONFIG_NET_MEDIATEK_STAR_EMAC) += mtk_star_emac.o
......@@ -19,6 +19,7 @@
#include <linux/interrupt.h>
#include <linux/pinctrl/devinfo.h>
#include <linux/phylink.h>
#include <net/dsa.h>
#include "mtk_eth_soc.h"
......@@ -1264,13 +1265,12 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
break;
/* find out which mac the packet comes from. values start at 1 */
if (MTK_HAS_CAPS(eth->soc->caps, MTK_SOC_MT7628)) {
if (MTK_HAS_CAPS(eth->soc->caps, MTK_SOC_MT7628) ||
(trxd.rxd4 & RX_DMA_SPECIAL_TAG))
mac = 0;
} else {
mac = (trxd.rxd4 >> RX_DMA_FPORT_SHIFT) &
RX_DMA_FPORT_MASK;
mac--;
}
else
mac = ((trxd.rxd4 >> RX_DMA_FPORT_SHIFT) &
RX_DMA_FPORT_MASK) - 1;
if (unlikely(mac < 0 || mac >= MTK_MAC_COUNT ||
!eth->netdev[mac]))
......@@ -2233,6 +2233,9 @@ static void mtk_gdm_config(struct mtk_eth *eth, u32 config)
val |= config;
if (!i && eth->netdev[0] && netdev_uses_dsa(eth->netdev[0]))
val |= MTK_GDMA_SPECIAL_TAG;
mtk_w32(eth, val, MTK_GDMA_FWD_CFG(i));
}
/* Reset and enable PSE */
......@@ -2255,12 +2258,17 @@ static int mtk_open(struct net_device *dev)
/* we run 2 netdevs on the same dma ring so we only bring it up once */
if (!refcount_read(&eth->dma_refcnt)) {
int err = mtk_start_dma(eth);
u32 gdm_config = MTK_GDMA_TO_PDMA;
int err;
err = mtk_start_dma(eth);
if (err)
return err;
mtk_gdm_config(eth, MTK_GDMA_TO_PDMA);
if (eth->soc->offload_version && mtk_ppe_start(&eth->ppe) == 0)
gdm_config = MTK_GDMA_TO_PPE;
mtk_gdm_config(eth, gdm_config);
napi_enable(&eth->tx_napi);
napi_enable(&eth->rx_napi);
......@@ -2327,6 +2335,9 @@ static int mtk_stop(struct net_device *dev)
mtk_dma_free(eth);
if (eth->soc->offload_version)
mtk_ppe_stop(&eth->ppe);
return 0;
}
......@@ -2832,6 +2843,7 @@ static const struct net_device_ops mtk_netdev_ops = {
#ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller = mtk_poll_controller,
#endif
.ndo_setup_tc = mtk_eth_setup_tc,
};
static int mtk_add_mac(struct mtk_eth *eth, struct device_node *np)
......@@ -3088,6 +3100,17 @@ static int mtk_probe(struct platform_device *pdev)
goto err_free_dev;
}
if (eth->soc->offload_version) {
err = mtk_ppe_init(&eth->ppe, eth->dev,
eth->base + MTK_ETH_PPE_BASE, 2);
if (err)
goto err_free_dev;
err = mtk_eth_offload_init(eth);
if (err)
goto err_free_dev;
}
for (i = 0; i < MTK_MAX_DEVS; i++) {
if (!eth->netdev[i])
continue;
......@@ -3162,6 +3185,7 @@ static const struct mtk_soc_data mt7621_data = {
.hw_features = MTK_HW_FEATURES,
.required_clks = MT7621_CLKS_BITMAP,
.required_pctl = false,
.offload_version = 2,
};
static const struct mtk_soc_data mt7622_data = {
......@@ -3170,6 +3194,7 @@ static const struct mtk_soc_data mt7622_data = {
.hw_features = MTK_HW_FEATURES,
.required_clks = MT7622_CLKS_BITMAP,
.required_pctl = false,
.offload_version = 2,
};
static const struct mtk_soc_data mt7623_data = {
......
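The reworked MAC selection in mtk_poll_rx() can be tried out in isolation. The sketch below mirrors the new branch (MT7628 or a special-tagged frame maps to MAC 0, otherwise the forward port field minus one) together with the range check that follows it. It is a userspace approximation: the `EX_` constants copy the values of RX_DMA_FPORT_SHIFT, RX_DMA_FPORT_MASK and RX_DMA_SPECIAL_TAG from this diff, while `ex_rx_mac()` and its `mac_count` parameter are hypothetical, and the driver's additional `!eth->netdev[mac]` check is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the rxd4 definitions shown in this diff. */
#define EX_RX_DMA_FPORT_SHIFT	19
#define EX_RX_DMA_FPORT_MASK	0x7u
#define EX_RX_DMA_SPECIAL_TAG	(UINT32_C(1) << 22)

/*
 * Returns the zero-based MAC index for a receive descriptor word,
 * or -1 if the index is out of range. mac_count stands in for
 * MTK_MAC_COUNT, whose value is not visible in this diff.
 */
static inline int ex_rx_mac(uint32_t rxd4, bool is_mt7628, int mac_count)
{
	int mac;

	assert(mac_count > 0);

	if (is_mt7628 || (rxd4 & EX_RX_DMA_SPECIAL_TAG))
		mac = 0;
	else
		mac = (int)((rxd4 >> EX_RX_DMA_FPORT_SHIFT) &
			    EX_RX_DMA_FPORT_MASK) - 1;

	if (mac < 0 || mac >= mac_count)
		return -1;

	return mac;
}
```

Because forward port values start at 1, a descriptor with a zero port field and no special tag yields -1 here, which corresponds to the packet being rejected by the range check in the driver.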
......@@ -15,6 +15,8 @@
#include <linux/u64_stats_sync.h>
#include <linux/refcount.h>
#include <linux/phylink.h>
#include <linux/rhashtable.h>
#include "mtk_ppe.h"
#define MTK_QDMA_PAGE_SIZE 2048
#define MTK_MAX_RX_LENGTH 1536
......@@ -40,7 +42,8 @@
NETIF_F_HW_VLAN_CTAG_RX | \
NETIF_F_SG | NETIF_F_TSO | \
NETIF_F_TSO6 | \
NETIF_F_IPV6_CSUM)
NETIF_F_IPV6_CSUM |\
NETIF_F_HW_TC)
#define MTK_HW_FEATURES_MT7628 (NETIF_F_SG | NETIF_F_RXCSUM)
#define NEXT_DESP_IDX(X, Y) (((X) + 1) & ((Y) - 1))
......@@ -82,10 +85,12 @@
/* GDM Egress Control Register */
#define MTK_GDMA_FWD_CFG(x) (0x500 + (x * 0x1000))
#define MTK_GDMA_SPECIAL_TAG BIT(24)
#define MTK_GDMA_ICS_EN BIT(22)
#define MTK_GDMA_TCS_EN BIT(21)
#define MTK_GDMA_UCS_EN BIT(20)
#define MTK_GDMA_TO_PDMA 0x0
#define MTK_GDMA_TO_PPE 0x4444
#define MTK_GDMA_DROP_ALL 0x7777
/* Unicast Filter MAC Address Register - Low */
......@@ -300,11 +305,18 @@
/* QDMA descriptor rxd3 */
#define RX_DMA_VID(_x) ((_x) & 0xfff)
/* QDMA descriptor rxd4 */
#define MTK_RXD4_FOE_ENTRY GENMASK(13, 0)
#define MTK_RXD4_PPE_CPU_REASON GENMASK(18, 14)
#define MTK_RXD4_SRC_PORT GENMASK(21, 19)
#define MTK_RXD4_ALG GENMASK(31, 22)
/* QDMA descriptor rxd4 */
#define RX_DMA_L4_VALID BIT(24)
#define RX_DMA_L4_VALID_PDMA BIT(30) /* when PDMA is used */
#define RX_DMA_FPORT_SHIFT 19
#define RX_DMA_FPORT_MASK 0x7
#define RX_DMA_SPECIAL_TAG BIT(22)
/* PHY Indirect Access Control registers */
#define MTK_PHY_IAC 0x10004
......@@ -802,6 +814,7 @@ struct mtk_soc_data {
u32 caps;
u32 required_clks;
bool required_pctl;
u8 offload_version;
netdev_features_t hw_features;
};
......@@ -901,6 +914,9 @@ struct mtk_eth {
u32 tx_int_status_reg;
u32 rx_dma_l4_valid;
int ip_align;
struct mtk_ppe ppe;
struct rhashtable flow_table;
};
/* struct mtk_mac - the structure that holds the info about the MACs of the
......@@ -945,4 +961,9 @@ int mtk_gmac_sgmii_path_setup(struct mtk_eth *eth, int mac_id);
int mtk_gmac_gephy_path_setup(struct mtk_eth *eth, int mac_id);
int mtk_gmac_rgmii_path_setup(struct mtk_eth *eth, int mac_id);
int mtk_eth_offload_init(struct mtk_eth *eth);
int mtk_eth_setup_tc(struct net_device *dev, enum tc_setup_type type,
void *type_data);
#endif /* MTK_ETH_H */
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (C) 2020 Felix Fietkau <nbd@nbd.name> */
#ifndef __MTK_PPE_H
#define __MTK_PPE_H
#include <linux/kernel.h>
#include <linux/bitfield.h>
#define MTK_ETH_PPE_BASE 0xc00
#define MTK_PPE_ENTRIES_SHIFT 3
#define MTK_PPE_ENTRIES (1024 << MTK_PPE_ENTRIES_SHIFT)
#define MTK_PPE_HASH_MASK (MTK_PPE_ENTRIES - 1)
#define MTK_FOE_IB1_UNBIND_TIMESTAMP GENMASK(7, 0)
#define MTK_FOE_IB1_UNBIND_PACKETS GENMASK(23, 8)
#define MTK_FOE_IB1_UNBIND_PREBIND BIT(24)
#define MTK_FOE_IB1_BIND_TIMESTAMP GENMASK(14, 0)
#define MTK_FOE_IB1_BIND_KEEPALIVE BIT(15)
#define MTK_FOE_IB1_BIND_VLAN_LAYER GENMASK(18, 16)
#define MTK_FOE_IB1_BIND_PPPOE BIT(19)
#define MTK_FOE_IB1_BIND_VLAN_TAG BIT(20)
#define MTK_FOE_IB1_BIND_PKT_SAMPLE BIT(21)
#define MTK_FOE_IB1_BIND_CACHE BIT(22)
#define MTK_FOE_IB1_BIND_TUNNEL_DECAP BIT(23)
#define MTK_FOE_IB1_BIND_TTL BIT(24)
#define MTK_FOE_IB1_PACKET_TYPE GENMASK(27, 25)
#define MTK_FOE_IB1_STATE GENMASK(29, 28)
#define MTK_FOE_IB1_UDP BIT(30)
#define MTK_FOE_IB1_STATIC BIT(31)
enum {
MTK_PPE_PKT_TYPE_IPV4_HNAPT = 0,
MTK_PPE_PKT_TYPE_IPV4_ROUTE = 1,
MTK_PPE_PKT_TYPE_BRIDGE = 2,
MTK_PPE_PKT_TYPE_IPV4_DSLITE = 3,
MTK_PPE_PKT_TYPE_IPV6_ROUTE_3T = 4,
MTK_PPE_PKT_TYPE_IPV6_ROUTE_5T = 5,
MTK_PPE_PKT_TYPE_IPV6_6RD = 7,
};
#define MTK_FOE_IB2_QID GENMASK(3, 0)
#define MTK_FOE_IB2_PSE_QOS BIT(4)
#define MTK_FOE_IB2_DEST_PORT GENMASK(7, 5)
#define MTK_FOE_IB2_MULTICAST BIT(8)
#define MTK_FOE_IB2_WHNAT_QID2 GENMASK(13, 12)
#define MTK_FOE_IB2_WHNAT_DEVIDX BIT(16)
#define MTK_FOE_IB2_WHNAT_NAT BIT(17)
#define MTK_FOE_IB2_PORT_MG GENMASK(17, 12)
#define MTK_FOE_IB2_PORT_AG GENMASK(23, 18)
#define MTK_FOE_IB2_DSCP GENMASK(31, 24)
#define MTK_FOE_VLAN2_WHNAT_BSS GENMASK(5, 0)
#define MTK_FOE_VLAN2_WHNAT_WCID GENMASK(13, 6)
#define MTK_FOE_VLAN2_WHNAT_RING GENMASK(15, 14)
enum {
MTK_FOE_STATE_INVALID,
MTK_FOE_STATE_UNBIND,
MTK_FOE_STATE_BIND,
MTK_FOE_STATE_FIN
};
struct mtk_foe_mac_info {
u16 vlan1;
u16 etype;
u32 dest_mac_hi;
u16 vlan2;
u16 dest_mac_lo;
u32 src_mac_hi;
u16 pppoe_id;
u16 src_mac_lo;
};
struct mtk_foe_bridge {
u32 dest_mac_hi;
u16 src_mac_lo;
u16 dest_mac_lo;
u32 src_mac_hi;
u32 ib2;
u32 _rsv[5];
u32 udf_tsid;
struct mtk_foe_mac_info l2;
};
struct mtk_ipv4_tuple {
u32 src_ip;
u32 dest_ip;
union {
struct {
u16 dest_port;
u16 src_port;
};
struct {
u8 protocol;
u8 _pad[3]; /* fill with 0xa5a5a5 */
};
u32 ports;
};
};
struct mtk_foe_ipv4 {
struct mtk_ipv4_tuple orig;
u32 ib2;
struct mtk_ipv4_tuple new;
u16 timestamp;
u16 _rsv0[3];
u32 udf_tsid;
struct mtk_foe_mac_info l2;
};
struct mtk_foe_ipv4_dslite {
struct mtk_ipv4_tuple ip4;
u32 tunnel_src_ip[4];
u32 tunnel_dest_ip[4];
u8 flow_label[3];
u8 priority;
u32 udf_tsid;
u32 ib2;
struct mtk_foe_mac_info l2;
};
struct mtk_foe_ipv6 {
u32 src_ip[4];
u32 dest_ip[4];
union {
struct {
u8 protocol;
u8 _pad[3]; /* fill with 0xa5a5a5 */
}; /* 3-tuple */
struct {
u16 dest_port;
u16 src_port;
}; /* 5-tuple */
u32 ports;
};
u32 _rsv[3];
u32 udf;
u32 ib2;
struct mtk_foe_mac_info l2;
};
struct mtk_foe_ipv6_6rd {
u32 src_ip[4];
u32 dest_ip[4];
u16 dest_port;
u16 src_port;
u32 tunnel_src_ip;
u32 tunnel_dest_ip;
u16 hdr_csum;
u8 dscp;
u8 ttl;
u8 flag;
u8 pad;
u8 per_flow_6rd_id;
u8 pad2;
u32 ib2;
struct mtk_foe_mac_info l2;
};
struct mtk_foe_entry {
u32 ib1;
union {
struct mtk_foe_bridge bridge;
struct mtk_foe_ipv4 ipv4;
struct mtk_foe_ipv4_dslite dslite;
struct mtk_foe_ipv6 ipv6;
struct mtk_foe_ipv6_6rd ipv6_6rd;
u32 data[19];
};
};
enum {
MTK_PPE_CPU_REASON_TTL_EXCEEDED = 0x02,
MTK_PPE_CPU_REASON_OPTION_HEADER = 0x03,
MTK_PPE_CPU_REASON_NO_FLOW = 0x07,
MTK_PPE_CPU_REASON_IPV4_FRAG = 0x08,
MTK_PPE_CPU_REASON_IPV4_DSLITE_FRAG = 0x09,
MTK_PPE_CPU_REASON_IPV4_DSLITE_NO_TCP_UDP = 0x0a,
MTK_PPE_CPU_REASON_IPV6_6RD_NO_TCP_UDP = 0x0b,
MTK_PPE_CPU_REASON_TCP_FIN_SYN_RST = 0x0c,
MTK_PPE_CPU_REASON_UN_HIT = 0x0d,
MTK_PPE_CPU_REASON_HIT_UNBIND = 0x0e,
MTK_PPE_CPU_REASON_HIT_UNBIND_RATE_REACHED = 0x0f,
MTK_PPE_CPU_REASON_HIT_BIND_TCP_FIN = 0x10,
MTK_PPE_CPU_REASON_HIT_TTL_1 = 0x11,
MTK_PPE_CPU_REASON_HIT_BIND_VLAN_VIOLATION = 0x12,
MTK_PPE_CPU_REASON_KEEPALIVE_UC_OLD_HDR = 0x13,
MTK_PPE_CPU_REASON_KEEPALIVE_MC_NEW_HDR = 0x14,
MTK_PPE_CPU_REASON_KEEPALIVE_DUP_OLD_HDR = 0x15,
MTK_PPE_CPU_REASON_HIT_BIND_FORCE_CPU = 0x16,
MTK_PPE_CPU_REASON_TUNNEL_OPTION_HEADER = 0x17,
MTK_PPE_CPU_REASON_MULTICAST_TO_CPU = 0x18,
MTK_PPE_CPU_REASON_MULTICAST_TO_GMAC1_CPU = 0x19,
MTK_PPE_CPU_REASON_HIT_PRE_BIND = 0x1a,
MTK_PPE_CPU_REASON_PACKET_SAMPLING = 0x1b,
MTK_PPE_CPU_REASON_EXCEED_MTU = 0x1c,
MTK_PPE_CPU_REASON_PPE_BYPASS = 0x1e,
MTK_PPE_CPU_REASON_INVALID = 0x1f,
};
struct mtk_ppe {
struct device *dev;
void __iomem *base;
int version;
struct mtk_foe_entry *foe_table;
dma_addr_t foe_phys;
void *acct_table;
};
int mtk_ppe_init(struct mtk_ppe *ppe, struct device *dev, void __iomem *base,
int version);
int mtk_ppe_start(struct mtk_ppe *ppe);
int mtk_ppe_stop(struct mtk_ppe *ppe);
static inline void
mtk_foe_entry_clear(struct mtk_ppe *ppe, u16 hash)
{
ppe->foe_table[hash].ib1 = 0;
dma_wmb();
}
static inline int
mtk_foe_entry_timestamp(struct mtk_ppe *ppe, u16 hash)
{
u32 ib1 = READ_ONCE(ppe->foe_table[hash].ib1);
if (FIELD_GET(MTK_FOE_IB1_STATE, ib1) != MTK_FOE_STATE_BIND)
return -1;
return FIELD_GET(MTK_FOE_IB1_BIND_TIMESTAMP, ib1);
}
int mtk_foe_entry_prepare(struct mtk_foe_entry *entry, int type, int l4proto,
u8 pse_port, u8 *src_mac, u8 *dest_mac);
int mtk_foe_entry_set_pse_port(struct mtk_foe_entry *entry, u8 port);
int mtk_foe_entry_set_ipv4_tuple(struct mtk_foe_entry *entry, bool orig,
__be32 src_addr, __be16 src_port,
__be32 dest_addr, __be16 dest_port);
int mtk_foe_entry_set_ipv6_tuple(struct mtk_foe_entry *entry,
__be32 *src_addr, __be16 src_port,
__be32 *dest_addr, __be16 dest_port);
int mtk_foe_entry_set_dsa(struct mtk_foe_entry *entry, int port);
int mtk_foe_entry_set_vlan(struct mtk_foe_entry *entry, int vid);
int mtk_foe_entry_set_pppoe(struct mtk_foe_entry *entry, int sid);
int mtk_foe_entry_commit(struct mtk_ppe *ppe, struct mtk_foe_entry *entry,
u16 timestamp);
int mtk_ppe_debugfs_init(struct mtk_ppe *ppe);
#endif
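The inline helper mtk_foe_entry_timestamp() in the header above reads the state field from ib1 and returns the bind timestamp only for bound entries. The following userspace sketch reproduces that logic with plain shifts and masks in place of the kernel's FIELD_GET() and GENMASK(). The bit positions follow MTK_FOE_IB1_STATE (bits 29:28) and MTK_FOE_IB1_BIND_TIMESTAMP (bits 14:0); every `EX_`/`ex_` name is hypothetical, and the READ_ONCE() access used by the driver is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same ordering as the MTK_FOE_STATE_* enum in the header. */
enum {
	EX_FOE_STATE_INVALID,
	EX_FOE_STATE_UNBIND,
	EX_FOE_STATE_BIND,
	EX_FOE_STATE_FIN
};

#define EX_FOE_IB1_BIND_TIMESTAMP	UINT32_C(0x00007fff)	/* bits 14:0 */
#define EX_FOE_IB1_STATE_SHIFT		28			/* bits 29:28 */
#define EX_FOE_IB1_STATE_MASK		UINT32_C(0x3)

/* Returns the 15-bit bind timestamp, or -1 if the entry is not bound. */
static inline int ex_foe_entry_timestamp(const uint32_t *ib1_word)
{
	uint32_t ib1, state;

	assert(ib1_word != NULL);

	ib1 = *ib1_word;
	state = (ib1 >> EX_FOE_IB1_STATE_SHIFT) & EX_FOE_IB1_STATE_MASK;
	if (state != EX_FOE_STATE_BIND)
		return -1;

	return (int)(ib1 & EX_FOE_IB1_BIND_TIMESTAMP);
}
```

Note that the low bits of ib1 are interpreted differently for unbound entries (MTK_FOE_IB1_UNBIND_*), which is why the state check has to come before the timestamp is extracted.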
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (C) 2020 Felix Fietkau <nbd@nbd.name> */
#include <linux/kernel.h>
#include <linux/debugfs.h>
#include "mtk_eth_soc.h"
struct mtk_flow_addr_info
{
void *src, *dest;
u16 *src_port, *dest_port;
bool ipv6;
};
static const char *mtk_foe_entry_state_str(int state)
{
static const char * const state_str[] = {
[MTK_FOE_STATE_INVALID] = "INV",
[MTK_FOE_STATE_UNBIND] = "UNB",
[MTK_FOE_STATE_BIND] = "BND",
[MTK_FOE_STATE_FIN] = "FIN",
};
if (state >= ARRAY_SIZE(state_str) || !state_str[state])
return "UNK";
return state_str[state];
}
static const char *mtk_foe_pkt_type_str(int type)
{
static const char * const type_str[] = {
[MTK_PPE_PKT_TYPE_IPV4_HNAPT] = "IPv4 5T",
[MTK_PPE_PKT_TYPE_IPV4_ROUTE] = "IPv4 3T",
[MTK_PPE_PKT_TYPE_BRIDGE] = "L2",
[MTK_PPE_PKT_TYPE_IPV4_DSLITE] = "DS-LITE",
[MTK_PPE_PKT_TYPE_IPV6_ROUTE_3T] = "IPv6 3T",
[MTK_PPE_PKT_TYPE_IPV6_ROUTE_5T] = "IPv6 5T",
[MTK_PPE_PKT_TYPE_IPV6_6RD] = "6RD",
};
if (type >= ARRAY_SIZE(type_str) || !type_str[type])
return "UNKNOWN";
return type_str[type];
}
static void
mtk_print_addr(struct seq_file *m, u32 *addr, bool ipv6)
{
u32 n_addr[4];
int i;
if (!ipv6) {
seq_printf(m, "%pI4h", addr);
return;
}
for (i = 0; i < ARRAY_SIZE(n_addr); i++)
n_addr[i] = htonl(addr[i]);
seq_printf(m, "%pI6", n_addr);
}
static void
mtk_print_addr_info(struct seq_file *m, struct mtk_flow_addr_info *ai)
{
mtk_print_addr(m, ai->src, ai->ipv6);
if (ai->src_port)
seq_printf(m, ":%d", *ai->src_port);
seq_printf(m, "->");
mtk_print_addr(m, ai->dest, ai->ipv6);
if (ai->dest_port)
seq_printf(m, ":%d", *ai->dest_port);
}
static int
mtk_ppe_debugfs_foe_show(struct seq_file *m, void *private, bool bind)
{
struct mtk_ppe *ppe = m->private;
int i, count;
for (i = 0, count = 0; i < MTK_PPE_ENTRIES; i++) {
struct mtk_foe_entry *entry = &ppe->foe_table[i];
struct mtk_foe_mac_info *l2;
struct mtk_flow_addr_info ai = {};
unsigned char h_source[ETH_ALEN];
unsigned char h_dest[ETH_ALEN];
int type, state;
u32 ib2;
state = FIELD_GET(MTK_FOE_IB1_STATE, entry->ib1);
if (!state)
continue;
if (bind && state != MTK_FOE_STATE_BIND)
continue;
type = FIELD_GET(MTK_FOE_IB1_PACKET_TYPE, entry->ib1);
seq_printf(m, "%05x %s %7s", i,
mtk_foe_entry_state_str(state),
mtk_foe_pkt_type_str(type));
switch (type) {
case MTK_PPE_PKT_TYPE_IPV4_HNAPT:
case MTK_PPE_PKT_TYPE_IPV4_DSLITE:
ai.src_port = &entry->ipv4.orig.src_port;
ai.dest_port = &entry->ipv4.orig.dest_port;
fallthrough;
case MTK_PPE_PKT_TYPE_IPV4_ROUTE:
ai.src = &entry->ipv4.orig.src_ip;
ai.dest = &entry->ipv4.orig.dest_ip;
break;
case MTK_PPE_PKT_TYPE_IPV6_ROUTE_5T:
ai.src_port = &entry->ipv6.src_port;
ai.dest_port = &entry->ipv6.dest_port;
fallthrough;
case MTK_PPE_PKT_TYPE_IPV6_ROUTE_3T:
case MTK_PPE_PKT_TYPE_IPV6_6RD:
ai.src = &entry->ipv6.src_ip;
ai.dest = &entry->ipv6.dest_ip;
ai.ipv6 = true;
break;
}
seq_printf(m, " orig=");
mtk_print_addr_info(m, &ai);
switch (type) {
case MTK_PPE_PKT_TYPE_IPV4_HNAPT:
case MTK_PPE_PKT_TYPE_IPV4_DSLITE:
ai.src_port = &entry->ipv4.new.src_port;
ai.dest_port = &entry->ipv4.new.dest_port;
fallthrough;
case MTK_PPE_PKT_TYPE_IPV4_ROUTE:
ai.src = &entry->ipv4.new.src_ip;
ai.dest = &entry->ipv4.new.dest_ip;
seq_printf(m, " new=");
mtk_print_addr_info(m, &ai);
break;
}
if (type >= MTK_PPE_PKT_TYPE_IPV4_DSLITE) {
l2 = &entry->ipv6.l2;
ib2 = entry->ipv6.ib2;
} else {
l2 = &entry->ipv4.l2;
ib2 = entry->ipv4.ib2;
}
*((__be32 *)h_source) = htonl(l2->src_mac_hi);
*((__be16 *)&h_source[4]) = htons(l2->src_mac_lo);
*((__be32 *)h_dest) = htonl(l2->dest_mac_hi);
*((__be16 *)&h_dest[4]) = htons(l2->dest_mac_lo);
seq_printf(m, " eth=%pM->%pM etype=%04x"
" vlan=%d,%d ib1=%08x ib2=%08x\n",
h_source, h_dest, ntohs(l2->etype),
l2->vlan1, l2->vlan2, entry->ib1, ib2);
}
return 0;
}
static int
mtk_ppe_debugfs_foe_show_all(struct seq_file *m, void *private)
{
return mtk_ppe_debugfs_foe_show(m, private, false);
}
static int
mtk_ppe_debugfs_foe_show_bind(struct seq_file *m, void *private)
{
return mtk_ppe_debugfs_foe_show(m, private, true);
}
static int
mtk_ppe_debugfs_foe_open_all(struct inode *inode, struct file *file)
{
return single_open(file, mtk_ppe_debugfs_foe_show_all,
inode->i_private);
}
static int
mtk_ppe_debugfs_foe_open_bind(struct inode *inode, struct file *file)
{
return single_open(file, mtk_ppe_debugfs_foe_show_bind,
inode->i_private);
}
int mtk_ppe_debugfs_init(struct mtk_ppe *ppe)
{
static const struct file_operations fops_all = {
.open = mtk_ppe_debugfs_foe_open_all,
.read = seq_read,
.llseek = seq_lseek,
.release = single_release,
};
static const struct file_operations fops_bind = {
.open = mtk_ppe_debugfs_foe_open_bind,
.read = seq_read,
.llseek = seq_lseek,
.release = single_release,
};
struct dentry *root;
root = debugfs_create_dir("mtk_ppe", NULL);
if (!root)
return -ENOMEM;
debugfs_create_file("entries", S_IRUGO, root, ppe, &fops_all);
debugfs_create_file("bind", S_IRUGO, root, ppe, &fops_bind);
return 0;
}
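The debugfs dump above rebuilds each Ethernet address from the split `*_mac_hi` (32-bit) and `*_mac_lo` (16-bit) fields of struct mtk_foe_mac_info by storing htonl()/htons() results into a byte array. The sketch below shows the same byte order using explicit shifts, which avoids the pointer casts. It is a userspace illustration only; `ex_foe_mac_unpack()` is a hypothetical helper, not part of the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Unpack a MAC address stored as a 32-bit high part and a 16-bit low
 * part into six bytes in transmission order. This matches writing
 * htonl(hi) at offset 0 and htons(lo) at offset 4.
 */
static inline void ex_foe_mac_unpack(uint32_t hi, uint16_t lo, uint8_t mac[6])
{
	assert(mac != NULL);

	mac[0] = (uint8_t)(hi >> 24);
	mac[1] = (uint8_t)(hi >> 16);
	mac[2] = (uint8_t)(hi >> 8);
	mac[3] = (uint8_t)hi;
	mac[4] = (uint8_t)(lo >> 8);
	mac[5] = (uint8_t)lo;
}
```

For example, a high part of 0x00112233 and a low part of 0x4455 unpack to the address 00:11:22:33:44:55.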
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright (C) 2020 Felix Fietkau <nbd@nbd.name> */
#ifndef __MTK_PPE_REGS_H
#define __MTK_PPE_REGS_H
#define MTK_PPE_GLO_CFG 0x200
#define MTK_PPE_GLO_CFG_EN BIT(0)
#define MTK_PPE_GLO_CFG_TSID_EN BIT(1)
#define MTK_PPE_GLO_CFG_IP4_L4_CS_DROP BIT(2)
#define MTK_PPE_GLO_CFG_IP4_CS_DROP BIT(3)
#define MTK_PPE_GLO_CFG_TTL0_DROP BIT(4)
#define MTK_PPE_GLO_CFG_PPE_BSWAP BIT(5)
#define MTK_PPE_GLO_CFG_PSE_HASH_OFS BIT(6)
#define MTK_PPE_GLO_CFG_MCAST_TB_EN BIT(7)
#define MTK_PPE_GLO_CFG_FLOW_DROP_KA BIT(8)
#define MTK_PPE_GLO_CFG_FLOW_DROP_UPDATE BIT(9)
#define MTK_PPE_GLO_CFG_UDP_LITE_EN BIT(10)
#define MTK_PPE_GLO_CFG_UDP_LEN_DROP BIT(11)
#define MTK_PPE_GLO_CFG_MCAST_ENTRIES GENMASK(13, 12)
#define MTK_PPE_GLO_CFG_BUSY BIT(31)
#define MTK_PPE_FLOW_CFG 0x204
#define MTK_PPE_FLOW_CFG_IP4_TCP_FRAG BIT(6)
#define MTK_PPE_FLOW_CFG_IP4_UDP_FRAG BIT(7)
#define MTK_PPE_FLOW_CFG_IP6_3T_ROUTE BIT(8)
#define MTK_PPE_FLOW_CFG_IP6_5T_ROUTE BIT(9)
#define MTK_PPE_FLOW_CFG_IP6_6RD BIT(10)
#define MTK_PPE_FLOW_CFG_IP4_NAT BIT(12)
#define MTK_PPE_FLOW_CFG_IP4_NAPT BIT(13)
#define MTK_PPE_FLOW_CFG_IP4_DSLITE BIT(14)
#define MTK_PPE_FLOW_CFG_L2_BRIDGE BIT(15)
#define MTK_PPE_FLOW_CFG_IP_PROTO_BLACKLIST BIT(16)
#define MTK_PPE_FLOW_CFG_IP4_NAT_FRAG BIT(17)
#define MTK_PPE_FLOW_CFG_IP4_HASH_FLOW_LABEL BIT(18)
#define MTK_PPE_FLOW_CFG_IP4_HASH_GRE_KEY BIT(19)
#define MTK_PPE_FLOW_CFG_IP6_HASH_GRE_KEY BIT(20)
#define MTK_PPE_IP_PROTO_CHK 0x208
#define MTK_PPE_IP_PROTO_CHK_IPV4 GENMASK(15, 0)
#define MTK_PPE_IP_PROTO_CHK_IPV6 GENMASK(31, 16)
#define MTK_PPE_TB_CFG 0x21c
#define MTK_PPE_TB_CFG_ENTRY_NUM GENMASK(2, 0)
#define MTK_PPE_TB_CFG_ENTRY_80B BIT(3)
#define MTK_PPE_TB_CFG_SEARCH_MISS GENMASK(5, 4)
#define MTK_PPE_TB_CFG_AGE_PREBIND BIT(6)
#define MTK_PPE_TB_CFG_AGE_NON_L4 BIT(7)
#define MTK_PPE_TB_CFG_AGE_UNBIND BIT(8)
#define MTK_PPE_TB_CFG_AGE_TCP BIT(9)
#define MTK_PPE_TB_CFG_AGE_UDP BIT(10)
#define MTK_PPE_TB_CFG_AGE_TCP_FIN BIT(11)
#define MTK_PPE_TB_CFG_KEEPALIVE GENMASK(13, 12)
#define MTK_PPE_TB_CFG_HASH_MODE GENMASK(15, 14)
#define MTK_PPE_TB_CFG_SCAN_MODE GENMASK(17, 16)
#define MTK_PPE_TB_CFG_HASH_DEBUG GENMASK(19, 18)
enum {
MTK_PPE_SCAN_MODE_DISABLED,
MTK_PPE_SCAN_MODE_CHECK_AGE,
MTK_PPE_SCAN_MODE_KEEPALIVE_AGE,
};
enum {
MTK_PPE_KEEPALIVE_DISABLE,
MTK_PPE_KEEPALIVE_UNICAST_CPU,
MTK_PPE_KEEPALIVE_DUP_CPU = 3,
};
enum {
MTK_PPE_SEARCH_MISS_ACTION_DROP,
MTK_PPE_SEARCH_MISS_ACTION_FORWARD = 2,
MTK_PPE_SEARCH_MISS_ACTION_FORWARD_BUILD = 3,
};
#define MTK_PPE_TB_BASE 0x220
#define MTK_PPE_TB_USED 0x224
#define MTK_PPE_TB_USED_NUM GENMASK(13, 0)
#define MTK_PPE_BIND_RATE 0x228
#define MTK_PPE_BIND_RATE_BIND GENMASK(15, 0)
#define MTK_PPE_BIND_RATE_PREBIND GENMASK(31, 16)
#define MTK_PPE_BIND_LIMIT0 0x22c
#define MTK_PPE_BIND_LIMIT0_QUARTER GENMASK(13, 0)
#define MTK_PPE_BIND_LIMIT0_HALF GENMASK(29, 16)
#define MTK_PPE_BIND_LIMIT1 0x230
#define MTK_PPE_BIND_LIMIT1_FULL GENMASK(13, 0)
#define MTK_PPE_BIND_LIMIT1_NON_L4 GENMASK(23, 16)
#define MTK_PPE_KEEPALIVE 0x234
#define MTK_PPE_KEEPALIVE_TIME GENMASK(15, 0)
#define MTK_PPE_KEEPALIVE_TIME_TCP GENMASK(23, 16)
#define MTK_PPE_KEEPALIVE_TIME_UDP GENMASK(31, 24)
#define MTK_PPE_UNBIND_AGE 0x238
#define MTK_PPE_UNBIND_AGE_MIN_PACKETS GENMASK(31, 16)
#define MTK_PPE_UNBIND_AGE_DELTA GENMASK(7, 0)
#define MTK_PPE_BIND_AGE0 0x23c
#define MTK_PPE_BIND_AGE0_DELTA_NON_L4 GENMASK(30, 16)
#define MTK_PPE_BIND_AGE0_DELTA_UDP GENMASK(14, 0)
#define MTK_PPE_BIND_AGE1 0x240
#define MTK_PPE_BIND_AGE1_DELTA_TCP_FIN GENMASK(30, 16)
#define MTK_PPE_BIND_AGE1_DELTA_TCP GENMASK(14, 0)
#define MTK_PPE_HASH_SEED 0x244
#define MTK_PPE_DEFAULT_CPU_PORT 0x248
#define MTK_PPE_DEFAULT_CPU_PORT_MASK(_n) (GENMASK(2, 0) << ((_n) * 4))
#define MTK_PPE_MTU_DROP 0x308
#define MTK_PPE_VLAN_MTU0 0x30c
#define MTK_PPE_VLAN_MTU0_NONE GENMASK(13, 0)
#define MTK_PPE_VLAN_MTU0_1TAG GENMASK(29, 16)
#define MTK_PPE_VLAN_MTU1 0x310
#define MTK_PPE_VLAN_MTU1_2TAG GENMASK(13, 0)
#define MTK_PPE_VLAN_MTU1_3TAG GENMASK(29, 16)
#define MTK_PPE_VPM_TPID 0x318
#define MTK_PPE_CACHE_CTL 0x320
#define MTK_PPE_CACHE_CTL_EN BIT(0)
#define MTK_PPE_CACHE_CTL_LOCK_CLR BIT(4)
#define MTK_PPE_CACHE_CTL_REQ BIT(8)
#define MTK_PPE_CACHE_CTL_CLEAR BIT(9)
#define MTK_PPE_CACHE_CTL_CMD GENMASK(13, 12)
#define MTK_PPE_MIB_CFG 0x334
#define MTK_PPE_MIB_CFG_EN BIT(0)
#define MTK_PPE_MIB_CFG_RD_CLR BIT(1)
#define MTK_PPE_MIB_TB_BASE 0x338
#define MTK_PPE_MIB_CACHE_CTL 0x350
#define MTK_PPE_MIB_CACHE_CTL_EN BIT(0)
#define MTK_PPE_MIB_CACHE_CTL_FLUSH BIT(2)
#endif
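The register definitions above pair an offset (for example MTK_PPE_TB_CFG at 0x21c) with BIT()/GENMASK() field masks. As a rough illustration of how such masks are typically combined into a register value, here is a minimal userspace sketch. The SK_* macros and sk_field_prep()/sk_field_get() are hypothetical stand-ins written for this example; they are not the kernel's BIT(), GENMASK(), FIELD_PREP() or FIELD_GET(), and the sketch assumes 32-bit registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for BIT()/GENMASK() on 32-bit registers. */
#define SK_BIT(n)        (UINT32_C(1) << (n))
#define SK_GENMASK(h, l) ((UINT32_MAX >> (31 - (h))) & (UINT32_MAX << (l)))

/* Field positions copied from the MTK_PPE_TB_CFG definitions in the diff. */
#define SK_TB_CFG_AGE_TCP    SK_BIT(9)
#define SK_TB_CFG_KEEPALIVE  SK_GENMASK(13, 12)
#define SK_TB_CFG_SCAN_MODE  SK_GENMASK(17, 16)

/* Position of the lowest set bit of a non-zero mask. */
static uint32_t sk_mask_shift(uint32_t mask)
{
	uint32_t shift = 0;

	while (!((mask >> shift) & 1))
		shift++;
	return shift;
}

/* Shift val into the field selected by mask, truncating excess bits. */
static uint32_t sk_field_prep(uint32_t mask, uint32_t val)
{
	return (val << sk_mask_shift(mask)) & mask;
}

/* Extract the field selected by mask from a register value. */
static uint32_t sk_field_get(uint32_t mask, uint32_t reg)
{
	return (reg & mask) >> sk_mask_shift(mask);
}
```

With these helpers, a value such as MTK_PPE_KEEPALIVE_DUP_CPU (3) would be placed in bits 13:12 by sk_field_prep(SK_TB_CFG_KEEPALIVE, 3) and OR-ed with single-bit flags such as SK_TB_CFG_AGE_TCP.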
......@@ -1560,12 +1560,34 @@ static void ppp_dev_priv_destructor(struct net_device *dev)
ppp_destroy_interface(ppp);
}
static int ppp_fill_forward_path(struct net_device_path_ctx *ctx,
struct net_device_path *path)
{
struct ppp *ppp = netdev_priv(ctx->dev);
struct ppp_channel *chan;
struct channel *pch;
if (ppp->flags & SC_MULTILINK)
return -EOPNOTSUPP;
if (list_empty(&ppp->channels))
return -ENODEV;
pch = list_first_entry(&ppp->channels, struct channel, clist);
chan = pch->chan;
if (!chan->ops->fill_forward_path)
return -EOPNOTSUPP;
return chan->ops->fill_forward_path(ctx, path, chan);
}
static const struct net_device_ops ppp_netdev_ops = {
.ndo_init = ppp_dev_init,
.ndo_uninit = ppp_dev_uninit,
.ndo_start_xmit = ppp_start_xmit,
.ndo_do_ioctl = ppp_net_ioctl,
.ndo_get_stats64 = ppp_get_stats64,
.ndo_fill_forward_path = ppp_fill_forward_path,
};
static struct device_type ppp_type = {
......
......@@ -972,8 +972,31 @@ static int pppoe_xmit(struct ppp_channel *chan, struct sk_buff *skb)
return __pppoe_xmit(sk, skb);
}
static int pppoe_fill_forward_path(struct net_device_path_ctx *ctx,
struct net_device_path *path,
const struct ppp_channel *chan)
{
struct sock *sk = (struct sock *)chan->private;
struct pppox_sock *po = pppox_sk(sk);
struct net_device *dev = po->pppoe_dev;
if (sock_flag(sk, SOCK_DEAD) ||
!(sk->sk_state & PPPOX_CONNECTED) || !dev)
return -1;
path->type = DEV_PATH_PPPOE;
path->encap.proto = htons(ETH_P_PPP_SES);
path->encap.id = be16_to_cpu(po->num);
memcpy(path->encap.h_dest, po->pppoe_pa.remote, ETH_ALEN);
path->dev = ctx->dev;
ctx->dev = dev;
return 0;
}
static const struct ppp_channel_ops pppoe_chan_ops = {
.start_xmit = pppoe_xmit,
.fill_forward_path = pppoe_fill_forward_path,
};
static int pppoe_recvmsg(struct socket *sock, struct msghdr *m,
......
......@@ -848,6 +848,59 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
struct sk_buff *skb,
struct net_device *sb_dev);
enum net_device_path_type {
DEV_PATH_ETHERNET = 0,
DEV_PATH_VLAN,
DEV_PATH_BRIDGE,
DEV_PATH_PPPOE,
DEV_PATH_DSA,
};
struct net_device_path {
enum net_device_path_type type;
const struct net_device *dev;
union {
struct {
u16 id;
__be16 proto;
u8 h_dest[ETH_ALEN];
} encap;
struct {
enum {
DEV_PATH_BR_VLAN_KEEP,
DEV_PATH_BR_VLAN_TAG,
DEV_PATH_BR_VLAN_UNTAG,
DEV_PATH_BR_VLAN_UNTAG_HW,
} vlan_mode;
u16 vlan_id;
__be16 vlan_proto;
} bridge;
struct {
int port;
u16 proto;
} dsa;
};
};
#define NET_DEVICE_PATH_STACK_MAX 5
#define NET_DEVICE_PATH_VLAN_MAX 2
struct net_device_path_stack {
int num_paths;
struct net_device_path path[NET_DEVICE_PATH_STACK_MAX];
};
struct net_device_path_ctx {
const struct net_device *dev;
const u8 *daddr;
int num_vlans;
struct {
u16 id;
__be16 proto;
} vlan[NET_DEVICE_PATH_VLAN_MAX];
};
enum tc_setup_type {
TC_SETUP_QDISC_MQPRIO,
TC_SETUP_CLSU32,
......@@ -1282,6 +1335,8 @@ struct netdev_net_notifier {
* struct net_device *(*ndo_get_peer_dev)(struct net_device *dev);
* If a device is paired with a peer device, return the peer instance.
* The caller must be under RCU read context.
* int (*ndo_fill_forward_path)(struct net_device_path_ctx *ctx, struct net_device_path *path);
* Get the forwarding path to reach the real device from the HW destination address
*/
struct net_device_ops {
int (*ndo_init)(struct net_device *dev);
......@@ -1488,6 +1543,8 @@ struct net_device_ops {
int (*ndo_tunnel_ctl)(struct net_device *dev,
struct ip_tunnel_parm *p, int cmd);
struct net_device * (*ndo_get_peer_dev)(struct net_device *dev);
int (*ndo_fill_forward_path)(struct net_device_path_ctx *ctx,
struct net_device_path *path);
};
/**
......@@ -2870,6 +2927,8 @@ void dev_remove_offload(struct packet_offload *po);
int dev_get_iflink(const struct net_device *dev);
int dev_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb);
int dev_fill_forward_path(const struct net_device *dev, const u8 *daddr,
struct net_device_path_stack *stack);
struct net_device *__dev_get_by_flags(struct net *net, unsigned short flags,
unsigned short mask);
struct net_device *dev_get_by_name(struct net *net, const char *name);
......
......@@ -28,6 +28,9 @@ struct ppp_channel_ops {
int (*start_xmit)(struct ppp_channel *, struct sk_buff *);
/* Handle an ioctl call that has come in via /dev/ppp. */
int (*ioctl)(struct ppp_channel *, unsigned int, unsigned long);
int (*fill_forward_path)(struct net_device_path_ctx *,
struct net_device_path *,
const struct ppp_channel *);
};
struct ppp_channel {
......
......@@ -147,6 +147,7 @@ enum flow_action_id {
FLOW_ACTION_MPLS_POP,
FLOW_ACTION_MPLS_MANGLE,
FLOW_ACTION_GATE,
FLOW_ACTION_PPPOE_PUSH,
NUM_FLOW_ACTIONS,
};
......@@ -274,6 +275,9 @@ struct flow_action_entry {
u32 num_entries;
struct action_gate_entry *entries;
} gate;
struct { /* FLOW_ACTION_PPPOE_PUSH */
u16 sid;
} pppoe;
};
struct flow_action_cookie *cookie; /* user defined action cookie */
};
......
......@@ -89,6 +89,14 @@ enum flow_offload_tuple_dir {
};
#define FLOW_OFFLOAD_DIR_MAX IP_CT_DIR_MAX
enum flow_offload_xmit_type {
FLOW_OFFLOAD_XMIT_NEIGH = 0,
FLOW_OFFLOAD_XMIT_XFRM,
FLOW_OFFLOAD_XMIT_DIRECT,
};
#define NF_FLOW_TABLE_ENCAP_MAX 2
struct flow_offload_tuple {
union {
struct in_addr src_v4;
......@@ -107,15 +115,28 @@ struct flow_offload_tuple {
u8 l3proto;
u8 l4proto;
struct {
u16 id;
__be16 proto;
} encap[NF_FLOW_TABLE_ENCAP_MAX];
/* All members above are keys for lookups, see flow_offload_hash(). */
struct { } __hash;
u8 dir;
u8 dir:2,
xmit_type:2,
encap_num:2,
in_vlan_ingress:2;
u16 mtu;
struct dst_entry *dst_cache;
union {
struct dst_entry *dst_cache;
struct {
u32 ifidx;
u32 hw_ifidx;
u8 h_source[ETH_ALEN];
u8 h_dest[ETH_ALEN];
} out;
};
};
struct flow_offload_tuple_rhash {
......@@ -158,7 +179,23 @@ static inline __s32 nf_flow_timeout_delta(unsigned int timeout)
struct nf_flow_route {
struct {
struct dst_entry *dst;
struct dst_entry *dst;
struct {
u32 ifindex;
struct {
u16 id;
__be16 proto;
} encap[NF_FLOW_TABLE_ENCAP_MAX];
u8 num_encaps:2,
ingress_vlans:2;
} in;
struct {
u32 ifindex;
u32 hw_ifindex;
u8 h_source[ETH_ALEN];
u8 h_dest[ETH_ALEN];
} out;
enum flow_offload_xmit_type xmit_type;
} tuple[FLOW_OFFLOAD_DIR_MAX];
};
......
......@@ -776,6 +776,26 @@ static int vlan_dev_get_iflink(const struct net_device *dev)
return real_dev->ifindex;
}
static int vlan_dev_fill_forward_path(struct net_device_path_ctx *ctx,
struct net_device_path *path)
{
struct vlan_dev_priv *vlan = vlan_dev_priv(ctx->dev);
path->type = DEV_PATH_VLAN;
path->encap.id = vlan->vlan_id;
path->encap.proto = vlan->vlan_proto;
path->dev = ctx->dev;
ctx->dev = vlan->real_dev;
if (ctx->num_vlans >= ARRAY_SIZE(ctx->vlan))
return -ENOSPC;
ctx->vlan[ctx->num_vlans].id = vlan->vlan_id;
ctx->vlan[ctx->num_vlans].proto = vlan->vlan_proto;
ctx->num_vlans++;
return 0;
}
static const struct ethtool_ops vlan_ethtool_ops = {
.get_link_ksettings = vlan_ethtool_get_link_ksettings,
.get_drvinfo = vlan_ethtool_get_drvinfo,
......@@ -814,6 +834,7 @@ static const struct net_device_ops vlan_netdev_ops = {
#endif
.ndo_fix_features = vlan_dev_fix_features,
.ndo_get_iflink = vlan_dev_get_iflink,
.ndo_fill_forward_path = vlan_dev_fill_forward_path,
};
static void vlan_dev_free(struct net_device *dev)
......
......@@ -385,6 +385,54 @@ static int br_del_slave(struct net_device *dev, struct net_device *slave_dev)
return br_del_if(br, slave_dev);
}
static int br_fill_forward_path(struct net_device_path_ctx *ctx,
struct net_device_path *path)
{
struct net_bridge_fdb_entry *f;
struct net_bridge_port *dst;
struct net_bridge *br;
if (netif_is_bridge_port(ctx->dev))
return -1;
br = netdev_priv(ctx->dev);
br_vlan_fill_forward_path_pvid(br, ctx, path);
f = br_fdb_find_rcu(br, ctx->daddr, path->bridge.vlan_id);
if (!f || !f->dst)
return -1;
dst = READ_ONCE(f->dst);
if (!dst)
return -1;
if (br_vlan_fill_forward_path_mode(br, dst, path))
return -1;
path->type = DEV_PATH_BRIDGE;
path->dev = dst->br->dev;
ctx->dev = dst->dev;
switch (path->bridge.vlan_mode) {
case DEV_PATH_BR_VLAN_TAG:
if (ctx->num_vlans >= ARRAY_SIZE(ctx->vlan))
return -ENOSPC;
ctx->vlan[ctx->num_vlans].id = path->bridge.vlan_id;
ctx->vlan[ctx->num_vlans].proto = path->bridge.vlan_proto;
ctx->num_vlans++;
break;
case DEV_PATH_BR_VLAN_UNTAG_HW:
case DEV_PATH_BR_VLAN_UNTAG:
ctx->num_vlans--;
break;
case DEV_PATH_BR_VLAN_KEEP:
break;
}
return 0;
}
static const struct ethtool_ops br_ethtool_ops = {
.get_drvinfo = br_getinfo,
.get_link = ethtool_op_get_link,
......@@ -419,6 +467,7 @@ static const struct net_device_ops br_netdev_ops = {
.ndo_bridge_setlink = br_setlink,
.ndo_bridge_dellink = br_dellink,
.ndo_features_check = passthru_features_check,
.ndo_fill_forward_path = br_fill_forward_path,
};
static struct device_type br_type = {
......
......@@ -1118,6 +1118,13 @@ void br_vlan_notify(const struct net_bridge *br,
bool br_vlan_can_enter_range(const struct net_bridge_vlan *v_curr,
const struct net_bridge_vlan *range_end);
void br_vlan_fill_forward_path_pvid(struct net_bridge *br,
struct net_device_path_ctx *ctx,
struct net_device_path *path);
int br_vlan_fill_forward_path_mode(struct net_bridge *br,
struct net_bridge_port *dst,
struct net_device_path *path);
static inline struct net_bridge_vlan_group *br_vlan_group(
const struct net_bridge *br)
{
......@@ -1277,6 +1284,19 @@ static inline int nbp_get_num_vlan_infos(struct net_bridge_port *p,
return 0;
}
static inline void br_vlan_fill_forward_path_pvid(struct net_bridge *br,
struct net_device_path_ctx *ctx,
struct net_device_path *path)
{
}
static inline int br_vlan_fill_forward_path_mode(struct net_bridge *br,
struct net_bridge_port *dst,
struct net_device_path *path)
{
return 0;
}
static inline struct net_bridge_vlan_group *br_vlan_group(
const struct net_bridge *br)
{
......
......@@ -1339,6 +1339,61 @@ int br_vlan_get_pvid_rcu(const struct net_device *dev, u16 *p_pvid)
}
EXPORT_SYMBOL_GPL(br_vlan_get_pvid_rcu);
void br_vlan_fill_forward_path_pvid(struct net_bridge *br,
struct net_device_path_ctx *ctx,
struct net_device_path *path)
{
struct net_bridge_vlan_group *vg;
int idx = ctx->num_vlans - 1;
u16 vid;
path->bridge.vlan_mode = DEV_PATH_BR_VLAN_KEEP;
if (!br_opt_get(br, BROPT_VLAN_ENABLED))
return;
vg = br_vlan_group(br);
if (idx >= 0 &&
ctx->vlan[idx].proto == br->vlan_proto) {
vid = ctx->vlan[idx].id;
} else {
path->bridge.vlan_mode = DEV_PATH_BR_VLAN_TAG;
vid = br_get_pvid(vg);
}
path->bridge.vlan_id = vid;
path->bridge.vlan_proto = br->vlan_proto;
}
int br_vlan_fill_forward_path_mode(struct net_bridge *br,
struct net_bridge_port *dst,
struct net_device_path *path)
{
struct net_bridge_vlan_group *vg;
struct net_bridge_vlan *v;
if (!br_opt_get(br, BROPT_VLAN_ENABLED))
return 0;
vg = nbp_vlan_group_rcu(dst);
v = br_vlan_find(vg, path->bridge.vlan_id);
if (!v || !br_vlan_should_use(v))
return -EINVAL;
if (!(v->flags & BRIDGE_VLAN_INFO_UNTAGGED))
return 0;
if (path->bridge.vlan_mode == DEV_PATH_BR_VLAN_TAG)
path->bridge.vlan_mode = DEV_PATH_BR_VLAN_KEEP;
else if (v->priv_flags & BR_VLFLAG_ADDED_BY_SWITCHDEV)
path->bridge.vlan_mode = DEV_PATH_BR_VLAN_UNTAG_HW;
else
path->bridge.vlan_mode = DEV_PATH_BR_VLAN_UNTAG;
return 0;
}
int br_vlan_get_info(const struct net_device *dev, u16 vid,
struct bridge_vlan_info *p_vinfo)
{
......
......@@ -848,6 +848,52 @@ int dev_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
}
EXPORT_SYMBOL_GPL(dev_fill_metadata_dst);
static struct net_device_path *dev_fwd_path(struct net_device_path_stack *stack)
{
int k = stack->num_paths++;
if (WARN_ON_ONCE(k >= NET_DEVICE_PATH_STACK_MAX))
return NULL;
return &stack->path[k];
}
int dev_fill_forward_path(const struct net_device *dev, const u8 *daddr,
struct net_device_path_stack *stack)
{
const struct net_device *last_dev;
struct net_device_path_ctx ctx = {
.dev = dev,
.daddr = daddr,
};
struct net_device_path *path;
int ret = 0;
stack->num_paths = 0;
while (ctx.dev && ctx.dev->netdev_ops->ndo_fill_forward_path) {
last_dev = ctx.dev;
path = dev_fwd_path(stack);
if (!path)
return -1;
memset(path, 0, sizeof(struct net_device_path));
ret = ctx.dev->netdev_ops->ndo_fill_forward_path(&ctx, path);
if (ret < 0)
return -1;
if (WARN_ON_ONCE(last_dev == ctx.dev))
return -1;
}
path = dev_fwd_path(stack);
if (!path)
return -1;
path->type = DEV_PATH_ETHERNET;
path->dev = ctx.dev;
return ret;
}
EXPORT_SYMBOL_GPL(dev_fill_forward_path);
/**
* __dev_get_by_name - find a device by its name
* @net: the applicable net namespace
......
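The dev_fill_forward_path() hunk above walks from a virtual device down to the real one: each device's callback records one path entry and rewrites ctx.dev to the next lower device, and a final DEV_PATH_ETHERNET entry is appended when no callback remains. The following userspace sketch models only that control flow. All sk_* names are hypothetical, the VLAN/bridge state in the context is omitted, and the bounds check is done before incrementing the counter, which differs slightly from dev_fwd_path().

```c
#include <assert.h>
#include <stddef.h>

#define SK_STACK_MAX 5

enum sk_path_type { SK_PATH_ETHERNET, SK_PATH_VLAN };

struct sk_dev;
struct sk_path { enum sk_path_type type; const struct sk_dev *dev; };
struct sk_ctx { const struct sk_dev *dev; };

/* Hypothetical device model: 'fill' plays the role of ndo_fill_forward_path. */
struct sk_dev {
	const char *name;
	const struct sk_dev *lower;
	int (*fill)(struct sk_ctx *ctx, struct sk_path *path);
};

struct sk_stack { int num_paths; struct sk_path path[SK_STACK_MAX]; };

/* VLAN-like hop: record this device, then descend to the lower device. */
static int sk_vlan_fill(struct sk_ctx *ctx, struct sk_path *path)
{
	path->type = SK_PATH_VLAN;
	path->dev = ctx->dev;
	ctx->dev = ctx->dev->lower;
	return 0;
}

/* Reserve the next stack slot, or return NULL when the stack is full. */
static struct sk_path *sk_next(struct sk_stack *stack)
{
	if (stack->num_paths >= SK_STACK_MAX)
		return NULL;
	return &stack->path[stack->num_paths++];
}

static int sk_fill_forward_path(const struct sk_dev *dev, struct sk_stack *stack)
{
	struct sk_ctx ctx = { .dev = dev };
	struct sk_path *path;

	stack->num_paths = 0;
	while (ctx.dev && ctx.dev->fill) {
		const struct sk_dev *last = ctx.dev;

		path = sk_next(stack);
		if (!path)
			return -1;
		if (ctx.dev->fill(&ctx, path) < 0)
			return -1;
		if (ctx.dev == last)	/* callback made no progress */
			return -1;
	}
	/* The device without a callback terminates the walk. */
	path = sk_next(stack);
	if (!path)
		return -1;
	path->type = SK_PATH_ETHERNET;
	path->dev = ctx.dev;
	return 0;
}
```

For a VLAN device stacked on a plain Ethernet device, the sketch yields two entries: the VLAN hop first, then the terminating Ethernet entry.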
......@@ -1278,14 +1278,32 @@ static int dsa_slave_setup_tc_block(struct net_device *dev,
}
}
static int dsa_slave_setup_ft_block(struct dsa_switch *ds, int port,
void *type_data)
{
struct dsa_port *cpu_dp = dsa_to_port(ds, port)->cpu_dp;
struct net_device *master = cpu_dp->master;
if (!master->netdev_ops->ndo_setup_tc)
return -EOPNOTSUPP;
return master->netdev_ops->ndo_setup_tc(master, TC_SETUP_FT, type_data);
}
static int dsa_slave_setup_tc(struct net_device *dev, enum tc_setup_type type,
void *type_data)
{
struct dsa_port *dp = dsa_slave_to_port(dev);
struct dsa_switch *ds = dp->ds;
if (type == TC_SETUP_BLOCK)
switch (type) {
case TC_SETUP_BLOCK:
return dsa_slave_setup_tc_block(dev, type_data);
case TC_SETUP_FT:
return dsa_slave_setup_ft_block(ds, dp->index, type_data);
default:
break;
}
if (!ds->ops->port_setup_tc)
return -EOPNOTSUPP;
......@@ -1654,6 +1672,21 @@ static void dsa_slave_get_stats64(struct net_device *dev,
dev_get_tstats64(dev, s);
}
static int dsa_slave_fill_forward_path(struct net_device_path_ctx *ctx,
struct net_device_path *path)
{
struct dsa_port *dp = dsa_slave_to_port(ctx->dev);
struct dsa_port *cpu_dp = dp->cpu_dp;
path->dev = ctx->dev;
path->type = DEV_PATH_DSA;
path->dsa.proto = cpu_dp->tag_ops->proto;
path->dsa.port = dp->index;
ctx->dev = cpu_dp->master;
return 0;
}
static const struct net_device_ops dsa_slave_netdev_ops = {
.ndo_open = dsa_slave_open,
.ndo_stop = dsa_slave_close,
......@@ -1679,6 +1712,7 @@ static const struct net_device_ops dsa_slave_netdev_ops = {
.ndo_vlan_rx_kill_vid = dsa_slave_vlan_rx_kill_vid,
.ndo_get_devlink_port = dsa_slave_get_devlink_port,
.ndo_change_mtu = dsa_slave_change_mtu,
.ndo_fill_forward_path = dsa_slave_fill_forward_path,
};
static struct device_type dsa_type = {
......
......@@ -79,11 +79,8 @@ static int flow_offload_fill_route(struct flow_offload *flow,
enum flow_offload_tuple_dir dir)
{
struct flow_offload_tuple *flow_tuple = &flow->tuplehash[dir].tuple;
struct dst_entry *other_dst = route->tuple[!dir].dst;
struct dst_entry *dst = route->tuple[dir].dst;
if (!dst_hold_safe(route->tuple[dir].dst))
return -1;
int i, j = 0;
switch (flow_tuple->l3proto) {
case NFPROTO_IPV4:
......@@ -94,12 +91,46 @@ static int flow_offload_fill_route(struct flow_offload *flow,
break;
}
flow_tuple->iifidx = other_dst->dev->ifindex;
flow_tuple->dst_cache = dst;
flow_tuple->iifidx = route->tuple[dir].in.ifindex;
for (i = route->tuple[dir].in.num_encaps - 1; i >= 0; i--) {
flow_tuple->encap[j].id = route->tuple[dir].in.encap[i].id;
flow_tuple->encap[j].proto = route->tuple[dir].in.encap[i].proto;
if (route->tuple[dir].in.ingress_vlans & BIT(i))
flow_tuple->in_vlan_ingress |= BIT(j);
j++;
}
flow_tuple->encap_num = route->tuple[dir].in.num_encaps;
switch (route->tuple[dir].xmit_type) {
case FLOW_OFFLOAD_XMIT_DIRECT:
memcpy(flow_tuple->out.h_dest, route->tuple[dir].out.h_dest,
ETH_ALEN);
memcpy(flow_tuple->out.h_source, route->tuple[dir].out.h_source,
ETH_ALEN);
flow_tuple->out.ifidx = route->tuple[dir].out.ifindex;
flow_tuple->out.hw_ifidx = route->tuple[dir].out.hw_ifindex;
break;
case FLOW_OFFLOAD_XMIT_XFRM:
case FLOW_OFFLOAD_XMIT_NEIGH:
if (!dst_hold_safe(route->tuple[dir].dst))
return -1;
flow_tuple->dst_cache = dst;
break;
}
flow_tuple->xmit_type = route->tuple[dir].xmit_type;
return 0;
}
static void nft_flow_dst_release(struct flow_offload *flow,
enum flow_offload_tuple_dir dir)
{
if (flow->tuplehash[dir].tuple.xmit_type == FLOW_OFFLOAD_XMIT_NEIGH ||
flow->tuplehash[dir].tuple.xmit_type == FLOW_OFFLOAD_XMIT_XFRM)
dst_release(flow->tuplehash[dir].tuple.dst_cache);
}
int flow_offload_route_init(struct flow_offload *flow,
const struct nf_flow_route *route)
{
......@@ -118,7 +149,7 @@ int flow_offload_route_init(struct flow_offload *flow,
return 0;
err_route_reply:
dst_release(route->tuple[FLOW_OFFLOAD_DIR_ORIGINAL].dst);
nft_flow_dst_release(flow, FLOW_OFFLOAD_DIR_ORIGINAL);
return err;
}
......@@ -169,8 +200,8 @@ static void flow_offload_fixup_ct(struct nf_conn *ct)
static void flow_offload_route_release(struct flow_offload *flow)
{
dst_release(flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_cache);
dst_release(flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_cache);
nft_flow_dst_release(flow, FLOW_OFFLOAD_DIR_ORIGINAL);
nft_flow_dst_release(flow, FLOW_OFFLOAD_DIR_REPLY);
}
void flow_offload_free(struct flow_offload *flow)
......
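In the flow_offload_fill_route() hunk above, the encapsulation entries collected for the ingress side are copied into the flow tuple in reverse order, and each bit of ingress_vlans is moved to the matching reversed position in in_vlan_ingress. A minimal userspace sketch of that loop follows. The sk_* types are hypothetical simplifications: the protocol field is a plain uint16_t here rather than __be16, and the bitfields are widened to whole bytes.

```c
#include <assert.h>
#include <stdint.h>

#define SK_ENCAP_MAX 2
#define SK_BIT(n) (1u << (n))

struct sk_encap { uint16_t id; uint16_t proto; };

/* Simplified stand-in for route->tuple[dir].in */
struct sk_route_in {
	struct sk_encap encap[SK_ENCAP_MAX];
	uint8_t num_encaps;
	uint8_t ingress_vlans;
};

/* Simplified stand-in for the encap part of struct flow_offload_tuple */
struct sk_tuple {
	struct sk_encap encap[SK_ENCAP_MAX];
	uint8_t encap_num;
	uint8_t in_vlan_ingress;
};

/* Copy encap entries last-to-first; remap ingress bit i to position j. */
static void sk_fill_encap(struct sk_tuple *t, const struct sk_route_in *in)
{
	int i, j = 0;

	for (i = in->num_encaps - 1; i >= 0; i--) {
		t->encap[j] = in->encap[i];
		if (in->ingress_vlans & SK_BIT(i))
			t->in_vlan_ingress |= SK_BIT(j);
		j++;
	}
	t->encap_num = in->num_encaps;
}
```

With two entries, the entry at index 1 of the route ends up at index 0 of the tuple, and an ingress flag set on route index 0 ends up on tuple index 1.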
......@@ -177,28 +177,45 @@ static int flow_offload_eth_src(struct net *net,
enum flow_offload_tuple_dir dir,
struct nf_flow_rule *flow_rule)
{
const struct flow_offload_tuple *tuple = &flow->tuplehash[!dir].tuple;
struct flow_action_entry *entry0 = flow_action_entry_next(flow_rule);
struct flow_action_entry *entry1 = flow_action_entry_next(flow_rule);
struct net_device *dev;
const struct flow_offload_tuple *other_tuple, *this_tuple;
struct net_device *dev = NULL;
const unsigned char *addr;
u32 mask, val;
u16 val16;
dev = dev_get_by_index(net, tuple->iifidx);
if (!dev)
return -ENOENT;
this_tuple = &flow->tuplehash[dir].tuple;
switch (this_tuple->xmit_type) {
case FLOW_OFFLOAD_XMIT_DIRECT:
addr = this_tuple->out.h_source;
break;
case FLOW_OFFLOAD_XMIT_NEIGH:
other_tuple = &flow->tuplehash[!dir].tuple;
dev = dev_get_by_index(net, other_tuple->iifidx);
if (!dev)
return -ENOENT;
addr = dev->dev_addr;
break;
default:
return -EOPNOTSUPP;
}
mask = ~0xffff0000;
memcpy(&val16, dev->dev_addr, 2);
memcpy(&val16, addr, 2);
val = val16 << 16;
flow_offload_mangle(entry0, FLOW_ACT_MANGLE_HDR_TYPE_ETH, 4,
&val, &mask);
mask = ~0xffffffff;
memcpy(&val, dev->dev_addr + 2, 4);
memcpy(&val, addr + 2, 4);
flow_offload_mangle(entry1, FLOW_ACT_MANGLE_HDR_TYPE_ETH, 8,
&val, &mask);
dev_put(dev);
if (dev)
dev_put(dev);
return 0;
}
......@@ -210,27 +227,40 @@ static int flow_offload_eth_dst(struct net *net,
{
struct flow_action_entry *entry0 = flow_action_entry_next(flow_rule);
struct flow_action_entry *entry1 = flow_action_entry_next(flow_rule);
const void *daddr = &flow->tuplehash[!dir].tuple.src_v4;
const struct flow_offload_tuple *other_tuple, *this_tuple;
const struct dst_entry *dst_cache;
unsigned char ha[ETH_ALEN];
struct neighbour *n;
const void *daddr;
u32 mask, val;
u8 nud_state;
u16 val16;
dst_cache = flow->tuplehash[dir].tuple.dst_cache;
n = dst_neigh_lookup(dst_cache, daddr);
if (!n)
return -ENOENT;
read_lock_bh(&n->lock);
nud_state = n->nud_state;
ether_addr_copy(ha, n->ha);
read_unlock_bh(&n->lock);
this_tuple = &flow->tuplehash[dir].tuple;
if (!(nud_state & NUD_VALID)) {
switch (this_tuple->xmit_type) {
case FLOW_OFFLOAD_XMIT_DIRECT:
ether_addr_copy(ha, this_tuple->out.h_dest);
break;
case FLOW_OFFLOAD_XMIT_NEIGH:
other_tuple = &flow->tuplehash[!dir].tuple;
daddr = &other_tuple->src_v4;
dst_cache = this_tuple->dst_cache;
n = dst_neigh_lookup(dst_cache, daddr);
if (!n)
return -ENOENT;
read_lock_bh(&n->lock);
nud_state = n->nud_state;
ether_addr_copy(ha, n->ha);
read_unlock_bh(&n->lock);
neigh_release(n);
return -ENOENT;
if (!(nud_state & NUD_VALID))
return -ENOENT;
break;
default:
return -EOPNOTSUPP;
}
mask = ~0xffffffff;
......@@ -243,7 +273,6 @@ static int flow_offload_eth_dst(struct net *net,
val = val16;
flow_offload_mangle(entry1, FLOW_ACT_MANGLE_HDR_TYPE_ETH, 4,
&val, &mask);
neigh_release(n);
return 0;
}
......@@ -465,27 +494,52 @@ static void flow_offload_ipv4_checksum(struct net *net,
}
}
static void flow_offload_redirect(const struct flow_offload *flow,
static void flow_offload_redirect(struct net *net,
const struct flow_offload *flow,
enum flow_offload_tuple_dir dir,
struct nf_flow_rule *flow_rule)
{
struct flow_action_entry *entry = flow_action_entry_next(flow_rule);
struct rtable *rt;
const struct flow_offload_tuple *this_tuple, *other_tuple;
struct flow_action_entry *entry;
struct net_device *dev;
int ifindex;
this_tuple = &flow->tuplehash[dir].tuple;
switch (this_tuple->xmit_type) {
case FLOW_OFFLOAD_XMIT_DIRECT:
this_tuple = &flow->tuplehash[dir].tuple;
ifindex = this_tuple->out.hw_ifidx;
break;
case FLOW_OFFLOAD_XMIT_NEIGH:
other_tuple = &flow->tuplehash[!dir].tuple;
ifindex = other_tuple->iifidx;
break;
default:
return;
}
dev = dev_get_by_index(net, ifindex);
if (!dev)
return;
rt = (struct rtable *)flow->tuplehash[dir].tuple.dst_cache;
entry = flow_action_entry_next(flow_rule);
entry->id = FLOW_ACTION_REDIRECT;
entry->dev = rt->dst.dev;
dev_hold(rt->dst.dev);
entry->dev = dev;
}
static void flow_offload_encap_tunnel(const struct flow_offload *flow,
enum flow_offload_tuple_dir dir,
struct nf_flow_rule *flow_rule)
{
const struct flow_offload_tuple *this_tuple;
struct flow_action_entry *entry;
struct dst_entry *dst;
dst = flow->tuplehash[dir].tuple.dst_cache;
this_tuple = &flow->tuplehash[dir].tuple;
if (this_tuple->xmit_type == FLOW_OFFLOAD_XMIT_DIRECT)
return;
dst = this_tuple->dst_cache;
if (dst && dst->lwtstate) {
struct ip_tunnel_info *tun_info;
......@@ -502,10 +556,15 @@ static void flow_offload_decap_tunnel(const struct flow_offload *flow,
enum flow_offload_tuple_dir dir,
struct nf_flow_rule *flow_rule)
{
const struct flow_offload_tuple *other_tuple;
struct flow_action_entry *entry;
struct dst_entry *dst;
dst = flow->tuplehash[!dir].tuple.dst_cache;
other_tuple = &flow->tuplehash[!dir].tuple;
if (other_tuple->xmit_type == FLOW_OFFLOAD_XMIT_DIRECT)
return;
dst = other_tuple->dst_cache;
if (dst && dst->lwtstate) {
struct ip_tunnel_info *tun_info;
......@@ -517,10 +576,14 @@ static void flow_offload_decap_tunnel(const struct flow_offload *flow,
}
}
int nf_flow_rule_route_ipv4(struct net *net, const struct flow_offload *flow,
enum flow_offload_tuple_dir dir,
struct nf_flow_rule *flow_rule)
static int
nf_flow_rule_route_common(struct net *net, const struct flow_offload *flow,
enum flow_offload_tuple_dir dir,
struct nf_flow_rule *flow_rule)
{
const struct flow_offload_tuple *other_tuple;
int i;
flow_offload_decap_tunnel(flow, dir, flow_rule);
flow_offload_encap_tunnel(flow, dir, flow_rule);
......@@ -528,6 +591,39 @@ int nf_flow_rule_route_ipv4(struct net *net, const struct flow_offload *flow,
flow_offload_eth_dst(net, flow, dir, flow_rule) < 0)
return -1;
other_tuple = &flow->tuplehash[!dir].tuple;
for (i = 0; i < other_tuple->encap_num; i++) {
struct flow_action_entry *entry;
if (other_tuple->in_vlan_ingress & BIT(i))
continue;
entry = flow_action_entry_next(flow_rule);
switch (other_tuple->encap[i].proto) {
case htons(ETH_P_PPP_SES):
entry->id = FLOW_ACTION_PPPOE_PUSH;
entry->pppoe.sid = other_tuple->encap[i].id;
break;
case htons(ETH_P_8021Q):
entry->id = FLOW_ACTION_VLAN_PUSH;
entry->vlan.vid = other_tuple->encap[i].id;
entry->vlan.proto = other_tuple->encap[i].proto;
break;
}
}
return 0;
}
int nf_flow_rule_route_ipv4(struct net *net, const struct flow_offload *flow,
enum flow_offload_tuple_dir dir,
struct nf_flow_rule *flow_rule)
{
if (nf_flow_rule_route_common(net, flow, dir, flow_rule) < 0)
return -1;
if (test_bit(NF_FLOW_SNAT, &flow->flags)) {
flow_offload_ipv4_snat(net, flow, dir, flow_rule);
flow_offload_port_snat(net, flow, dir, flow_rule);
......@@ -540,7 +636,7 @@ int nf_flow_rule_route_ipv4(struct net *net, const struct flow_offload *flow,
test_bit(NF_FLOW_DNAT, &flow->flags))
flow_offload_ipv4_checksum(net, flow, flow_rule);
flow_offload_redirect(flow, dir, flow_rule);
flow_offload_redirect(net, flow, dir, flow_rule);
return 0;
}
......@@ -550,11 +646,7 @@ int nf_flow_rule_route_ipv6(struct net *net, const struct flow_offload *flow,
enum flow_offload_tuple_dir dir,
struct nf_flow_rule *flow_rule)
{
flow_offload_decap_tunnel(flow, dir, flow_rule);
flow_offload_encap_tunnel(flow, dir, flow_rule);
if (flow_offload_eth_src(net, flow, dir, flow_rule) < 0 ||
flow_offload_eth_dst(net, flow, dir, flow_rule) < 0)
if (nf_flow_rule_route_common(net, flow, dir, flow_rule) < 0)
return -1;
if (test_bit(NF_FLOW_SNAT, &flow->flags)) {
......@@ -566,7 +658,7 @@ int nf_flow_rule_route_ipv6(struct net *net, const struct flow_offload *flow,
flow_offload_port_dnat(net, flow, dir, flow_rule);
}
flow_offload_redirect(flow, dir, flow_rule);
flow_offload_redirect(net, flow, dir, flow_rule);
return 0;
}
......@@ -580,10 +672,10 @@ nf_flow_offload_rule_alloc(struct net *net,
enum flow_offload_tuple_dir dir)
{
const struct nf_flowtable *flowtable = offload->flowtable;
const struct flow_offload_tuple *tuple, *other_tuple;
const struct flow_offload *flow = offload->flow;
const struct flow_offload_tuple *tuple;
struct dst_entry *other_dst = NULL;
struct nf_flow_rule *flow_rule;
struct dst_entry *other_dst;
int err = -ENOMEM;
flow_rule = kzalloc(sizeof(*flow_rule), GFP_KERNEL);
......@@ -599,7 +691,10 @@ nf_flow_offload_rule_alloc(struct net *net,
flow_rule->rule->match.key = &flow_rule->match.key;
tuple = &flow->tuplehash[dir].tuple;
other_dst = flow->tuplehash[!dir].tuple.dst_cache;
other_tuple = &flow->tuplehash[!dir].tuple;
if (other_tuple->xmit_type == FLOW_OFFLOAD_XMIT_NEIGH)
other_dst = other_tuple->dst_cache;
err = nf_flow_rule_match(&flow_rule->match, tuple, other_dst);
if (err < 0)
goto err_flow_match;
......
......@@ -19,10 +19,205 @@ struct nft_flow_offload {
struct nft_flowtable *flowtable;
};
static enum flow_offload_xmit_type nft_xmit_type(struct dst_entry *dst)
{
if (dst_xfrm(dst))
return FLOW_OFFLOAD_XMIT_XFRM;
return FLOW_OFFLOAD_XMIT_NEIGH;
}
static void nft_default_forward_path(struct nf_flow_route *route,
struct dst_entry *dst_cache,
enum ip_conntrack_dir dir)
{
route->tuple[!dir].in.ifindex = dst_cache->dev->ifindex;
route->tuple[dir].dst = dst_cache;
route->tuple[dir].xmit_type = nft_xmit_type(dst_cache);
}
static int nft_dev_fill_forward_path(const struct nf_flow_route *route,
const struct dst_entry *dst_cache,
const struct nf_conn *ct,
enum ip_conntrack_dir dir, u8 *ha,
struct net_device_path_stack *stack)
{
const void *daddr = &ct->tuplehash[!dir].tuple.src.u3;
struct net_device *dev = dst_cache->dev;
struct neighbour *n;
u8 nud_state;
n = dst_neigh_lookup(dst_cache, daddr);
if (!n)
return -1;
read_lock_bh(&n->lock);
nud_state = n->nud_state;
ether_addr_copy(ha, n->ha);
read_unlock_bh(&n->lock);
neigh_release(n);
if (!(nud_state & NUD_VALID))
return -1;
return dev_fill_forward_path(dev, ha, stack);
}
struct nft_forward_info {
const struct net_device *indev;
const struct net_device *outdev;
const struct net_device *hw_outdev;
struct id {
__u16 id;
__be16 proto;
} encap[NF_FLOW_TABLE_ENCAP_MAX];
u8 num_encaps;
u8 ingress_vlans;
u8 h_source[ETH_ALEN];
u8 h_dest[ETH_ALEN];
enum flow_offload_xmit_type xmit_type;
};
static bool nft_is_valid_ether_device(const struct net_device *dev)
{
if (!dev || (dev->flags & IFF_LOOPBACK) || dev->type != ARPHRD_ETHER ||
dev->addr_len != ETH_ALEN || !is_valid_ether_addr(dev->dev_addr))
return false;
return true;
}
static void nft_dev_path_info(const struct net_device_path_stack *stack,
struct nft_forward_info *info,
unsigned char *ha, struct nf_flowtable *flowtable)
{
const struct net_device_path *path;
int i;
memcpy(info->h_dest, ha, ETH_ALEN);
for (i = 0; i < stack->num_paths; i++) {
path = &stack->path[i];
switch (path->type) {
case DEV_PATH_ETHERNET:
case DEV_PATH_DSA:
case DEV_PATH_VLAN:
case DEV_PATH_PPPOE:
info->indev = path->dev;
if (is_zero_ether_addr(info->h_source))
memcpy(info->h_source, path->dev->dev_addr, ETH_ALEN);
if (path->type == DEV_PATH_ETHERNET)
break;
if (path->type == DEV_PATH_DSA) {
i = stack->num_paths;
break;
}
/* DEV_PATH_VLAN and DEV_PATH_PPPOE */
if (info->num_encaps >= NF_FLOW_TABLE_ENCAP_MAX) {
info->indev = NULL;
break;
}
info->outdev = path->dev;
info->encap[info->num_encaps].id = path->encap.id;
info->encap[info->num_encaps].proto = path->encap.proto;
info->num_encaps++;
if (path->type == DEV_PATH_PPPOE)
memcpy(info->h_dest, path->encap.h_dest, ETH_ALEN);
break;
case DEV_PATH_BRIDGE:
if (is_zero_ether_addr(info->h_source))
memcpy(info->h_source, path->dev->dev_addr, ETH_ALEN);
switch (path->bridge.vlan_mode) {
case DEV_PATH_BR_VLAN_UNTAG_HW:
info->ingress_vlans |= BIT(info->num_encaps - 1);
break;
case DEV_PATH_BR_VLAN_TAG:
info->encap[info->num_encaps].id = path->bridge.vlan_id;
info->encap[info->num_encaps].proto = path->bridge.vlan_proto;
info->num_encaps++;
break;
case DEV_PATH_BR_VLAN_UNTAG:
info->num_encaps--;
break;
case DEV_PATH_BR_VLAN_KEEP:
break;
}
info->xmit_type = FLOW_OFFLOAD_XMIT_DIRECT;
break;
default:
info->indev = NULL;
break;
}
}
if (!info->outdev)
info->outdev = info->indev;
info->hw_outdev = info->indev;
if (nf_flowtable_hw_offload(flowtable) &&
nft_is_valid_ether_device(info->indev))
info->xmit_type = FLOW_OFFLOAD_XMIT_DIRECT;
}
static bool nft_flowtable_find_dev(const struct net_device *dev,
struct nft_flowtable *ft)
{
struct nft_hook *hook;
bool found = false;
list_for_each_entry_rcu(hook, &ft->hook_list, list) {
if (hook->ops.dev != dev)
continue;
found = true;
break;
}
return found;
}
static void nft_dev_forward_path(struct nf_flow_route *route,
const struct nf_conn *ct,
enum ip_conntrack_dir dir,
struct nft_flowtable *ft)
{
const struct dst_entry *dst = route->tuple[dir].dst;
struct net_device_path_stack stack;
struct nft_forward_info info = {};
unsigned char ha[ETH_ALEN];
int i;
if (nft_dev_fill_forward_path(route, dst, ct, dir, ha, &stack) >= 0)
nft_dev_path_info(&stack, &info, ha, &ft->data);
if (!info.indev || !nft_flowtable_find_dev(info.indev, ft))
return;
route->tuple[!dir].in.ifindex = info.indev->ifindex;
for (i = 0; i < info.num_encaps; i++) {
route->tuple[!dir].in.encap[i].id = info.encap[i].id;
route->tuple[!dir].in.encap[i].proto = info.encap[i].proto;
}
route->tuple[!dir].in.num_encaps = info.num_encaps;
route->tuple[!dir].in.ingress_vlans = info.ingress_vlans;
if (info.xmit_type == FLOW_OFFLOAD_XMIT_DIRECT) {
memcpy(route->tuple[dir].out.h_source, info.h_source, ETH_ALEN);
memcpy(route->tuple[dir].out.h_dest, info.h_dest, ETH_ALEN);
route->tuple[dir].out.ifindex = info.outdev->ifindex;
route->tuple[dir].out.hw_ifindex = info.hw_outdev->ifindex;
route->tuple[dir].xmit_type = info.xmit_type;
}
}
static int nft_flow_route(const struct nft_pktinfo *pkt,
const struct nf_conn *ct,
struct nf_flow_route *route,
-enum ip_conntrack_dir dir)
+enum ip_conntrack_dir dir,
+struct nft_flowtable *ft)
{
struct dst_entry *this_dst = skb_dst(pkt->skb);
struct dst_entry *other_dst = NULL;
@@ -44,8 +239,14 @@ static int nft_flow_route(const struct nft_pktinfo *pkt,
if (!other_dst)
return -ENOENT;
route->tuple[dir].dst = this_dst;
route->tuple[!dir].dst = other_dst;
nft_default_forward_path(route, this_dst, dir);
nft_default_forward_path(route, other_dst, !dir);
if (route->tuple[dir].xmit_type == FLOW_OFFLOAD_XMIT_NEIGH &&
route->tuple[!dir].xmit_type == FLOW_OFFLOAD_XMIT_NEIGH) {
nft_dev_forward_path(route, ct, dir, ft);
nft_dev_forward_path(route, ct, !dir, ft);
}
return 0;
}
@@ -74,8 +275,8 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
struct nft_flow_offload *priv = nft_expr_priv(expr);
struct nf_flowtable *flowtable = &priv->flowtable->data;
struct tcphdr _tcph, *tcph = NULL;
+struct nf_flow_route route = {};
enum ip_conntrack_info ctinfo;
-struct nf_flow_route route;
struct flow_offload *flow;
enum ip_conntrack_dir dir;
struct nf_conn *ct;
@@ -112,7 +313,7 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
goto out;
dir = CTINFO2DIR(ctinfo);
-if (nft_flow_route(pkt, ct, &route, dir) < 0)
+if (nft_flow_route(pkt, ct, &route, dir, priv->flowtable) < 0)
goto err_flow_route;
flow = flow_offload_alloc(ct);
......
@@ -371,6 +371,88 @@ else
ip netns exec nsr1 nft list ruleset
fi
# Another test:
# Add bridge interface br0 to Router1, with NAT enabled.
ip -net nsr1 link add name br0 type bridge
ip -net nsr1 addr flush dev veth0
ip -net nsr1 link set up dev veth0
ip -net nsr1 link set veth0 master br0
ip -net nsr1 addr add 10.0.1.1/24 dev br0
ip -net nsr1 addr add dead:1::1/64 dev br0
ip -net nsr1 link set up dev br0
ip netns exec nsr1 sysctl net.ipv4.conf.br0.forwarding=1 > /dev/null
# br0 with NAT enabled.
ip netns exec nsr1 nft -f - <<EOF
flush table ip nat
table ip nat {
chain prerouting {
type nat hook prerouting priority 0; policy accept;
meta iif "br0" ip daddr 10.6.6.6 tcp dport 1666 counter dnat ip to 10.0.2.99:12345
}
chain postrouting {
type nat hook postrouting priority 0; policy accept;
meta oifname "veth1" counter masquerade
}
}
EOF
if test_tcp_forwarding_nat ns1 ns2; then
echo "PASS: flow offloaded for ns1/ns2 with bridge NAT"
else
echo "FAIL: flow offload for ns1/ns2 with bridge NAT" 1>&2
ip netns exec nsr1 nft list ruleset
ret=1
fi
# Another test:
# Add bridge interface br0 to Router1, with NAT and VLAN.
ip -net nsr1 link set veth0 nomaster
ip -net nsr1 link set down dev veth0
ip -net nsr1 link add link veth0 name veth0.10 type vlan id 10
ip -net nsr1 link set up dev veth0
ip -net nsr1 link set up dev veth0.10
ip -net nsr1 link set veth0.10 master br0
ip -net ns1 addr flush dev eth0
ip -net ns1 link add link eth0 name eth0.10 type vlan id 10
ip -net ns1 link set eth0 up
ip -net ns1 link set eth0.10 up
ip -net ns1 addr add 10.0.1.99/24 dev eth0.10
ip -net ns1 route add default via 10.0.1.1
ip -net ns1 addr add dead:1::99/64 dev eth0.10
if test_tcp_forwarding_nat ns1 ns2; then
echo "PASS: flow offloaded for ns1/ns2 with bridge NAT and VLAN"
else
echo "FAIL: flow offload for ns1/ns2 with bridge NAT and VLAN" 1>&2
ip netns exec nsr1 nft list ruleset
ret=1
fi
# restore test topology (remove bridge and VLAN)
ip -net nsr1 link set veth0 nomaster
ip -net nsr1 link set veth0 down
ip -net nsr1 link set veth0.10 down
ip -net nsr1 link delete veth0.10 type vlan
ip -net nsr1 link delete br0 type bridge
ip -net ns1 addr flush dev eth0.10
ip -net ns1 link set eth0.10 down
ip -net ns1 link set eth0 down
ip -net ns1 link delete eth0.10 type vlan
# restore address in ns1 and nsr1
ip -net ns1 link set eth0 up
ip -net ns1 addr add 10.0.1.99/24 dev eth0
ip -net ns1 route add default via 10.0.1.1
ip -net ns1 addr add dead:1::99/64 dev eth0
ip -net ns1 route add default via dead:1::1
ip -net nsr1 addr add 10.0.1.1/24 dev veth0
ip -net nsr1 addr add dead:1::1/64 dev veth0
ip -net nsr1 link set up dev veth0
KEY_SHA="0x"$(ps -xaf | sha1sum | cut -d " " -f 1)
KEY_AES="0x"$(ps -xaf | md5sum | cut -d " " -f 1)
SPI1=$RANDOM
......