Commit 2aec17f1 authored by David S. Miller's avatar David S. Miller

Merge branch 'fix-indirect-flow_block-infrastructure'

Pablo Neira Ayuso says:

====================
the indirect flow_block infrastructure, revisited

This series fixes b5140a36 ("netfilter: flowtable: add indr block
setup support") that adds support for the indirect block for the
flowtable. This patch crashes the kernel with the TC CT action.

[  630.908086] BUG: kernel NULL pointer dereference, address: 00000000000000f0
[  630.908233] #PF: error_code(0x0000) - not-present page
[  630.908304] PGD 800000104addd067 P4D 800000104addd067 PUD 104311d067 PMD 0
[  630.908380] Oops: 0000 [#1] SMP PTI [  630.908615] RIP: 0010:nf_flow_table_indr_block_cb+0xc0/0x190 [nf_flow_table]
[  630.908690] Code: 5b 41 5c 41 5d 41 5e 41 5f 5d c3 4c 89 75 a0 4c 89 65 a8 4d 89 ee 49 89 dd 4c 89 fe 48 c7 c7 b7 64 36 a0 31 c0 e8 ce ed d8 e0 <49> 8b b7 f0 00 00 00 48 c7 c7 c8 64      36 a0 31 c0 e8 b9 ed d8 e0 49[  630.908790] RSP: 0018:ffffc9000895f8c0 EFLAGS: 00010246
[...]
[  630.910774] Call Trace:
[  630.911192]  ? mlx5e_rep_indr_setup_block+0x270/0x270 [mlx5_core]
[  630.911621]  ? mlx5e_rep_indr_setup_block+0x270/0x270 [mlx5_core]
[  630.912040]  ? mlx5e_rep_indr_setup_block+0x270/0x270 [mlx5_core]
[  630.912443]  flow_block_cmd+0x51/0x80
[  630.912844]  __flow_indr_block_cb_register+0x26c/0x510
[  630.913265]  mlx5e_nic_rep_netdevice_event+0x9e/0x110 [mlx5_core]
[  630.913665]  notifier_call_chain+0x53/0xa0
[  630.914063]  raw_notifier_call_chain+0x16/0x20
[  630.914466]  call_netdevice_notifiers_info+0x39/0x90
[  630.914859]  register_netdevice+0x484/0x550
[  630.915256]  __ip_tunnel_create+0x12b/0x1f0 [ip_tunnel]
[  630.915661]  ip_tunnel_init_net+0x116/0x180 [ip_tunnel]
[  630.916062]  ipgre_tap_init_net+0x22/0x30 [ip_gre]
[  630.916458]  ops_init+0x44/0x110
[  630.916851]  register_pernet_operations+0x112/0x200

A workaround patch to cure this crash has been proposed. However, there
is another problem: The indirect flow_block still does not work for the
new TC CT action. The problem is that the existing flow_indr_block_entry
callback assumes you can look up for the flowtable from the netdevice to
get the flow_block. This flow_block allows you to offload the flows via
TC_SETUP_CLSFLOWER. Unfortunately, it is not possible to get the
flow_block from the TC CT flowtables because they are _not_ bound to any
specific netdevice.

= What is the indirect flow_block infrastructure?

The indirect flow_block infrastructure allows drivers to offload
tc/netfilter rules that belong to software tunnel netdevices, e.g.
vxlan.

This indirect flow_block infrastructure relates tunnel netdevices with
drivers because there is no obvious way to relate these two things
from the control plane.

= How does the indirect flow_block work before this patchset?

Front-ends register the indirect block callback through
flow_indr_add_block_cb() if they support for offloading tunnel
netdevices.

== Setting up an indirect block

1) Drivers track tunnel netdevices via NETDEV_{REGISTER,UNREGISTER} events.
   If there is a new tunnel netdevice that the driver can offload, then the
   driver invokes __flow_indr_block_cb_register() with the new tunnel
   netdevice and the driver callback. The __flow_indr_block_cb_register()
   call iterates over the list of the front-end callbacks.

2) The front-end callback sets up the flow_block_offload structure and it
   invokes the driver callback to set up the flow_block.

3) The driver callback now registers the flow_block structure and it
   returns the flow_block back to the front-end.

4) The front-end gets the flow_block object and it is now ready to
   offload rules for this tunnel netdevice.

A simplified callgraph is represented below.

        Front-end                      Driver

                                   NETDEV_REGISTER
                                         |
                     __flow_indr_block_cb_register(netdev, cb_priv, driver_cb)
                                         | [1]
            .--------------frontend_indr_block_cb(cb_priv, driver_cb)
            |
            .
   setup_flow_block_offload(bo)
            | [2]
       driver_cb(bo, cb_priv) -----------.
                                         |
                                         \/
                                  set up flow_blocks [3]
                                         |
      add rules to flow_block <----------
      TC_SETUP_CLSFLOWER [4]

== Releasing the indirect flow_block

There are two possibilities, either tunnel netdevice is removed or
a netdevice (port representor) is removed.

=== Tunnel netdevice is removed

Driver waits for the NETDEV_UNREGISTER event that announces the tunnel
netdevice removal. Then, it calls __flow_indr_block_cb_unregister() to
remove the flow_block and rules.  Callgraph is very similar to the one
described above.

=== Netdevice is removed (port representor)

Driver calls __flow_indr_block_cb_unregister() to remove the existing
netfilter/tc rule that belong to the tunnel netdevice.

= How does the indirect flow_block work after this patchset?

Drivers register the indirect flow_block setup callback through
flow_indr_dev_register() if they support for offloading tunnel
netdevices.

== Setting up an indirect flow_block

1) Frontends check if dev->netdev_ops->ndo_setup_tc is unset. If so,
   frontends call flow_indr_dev_setup_offload(). This call invokes
   the drivers' indirect flow_block setup callback.

2) The indirect flow_block setup callback sets up a flow_block structure
   which relates the tunnel netdevice and the driver.

3) The front-end uses flow_block and offload the rules.

Note that the operational to set up (non-indirect) flow_block is very
similar.

== Releasing the indirect flow_block

=== Tunnel netdevice is removed

This calls flow_indr_dev_setup_offload() to set down the flow_block and
remove the offloaded rules. This alternate path is exercised if
dev->netdev_ops->ndo_setup_tc is unset.

=== Netdevice is removed (port representor)

If a netdevice is removed, then it might need to to clean up the
offloaded tc/netfilter rules that belongs to the tunnel netdevice:

1) The driver invokes flow_indr_dev_unregister() when a netdevice is
   removed.

2) This call iterates over the existing indirect flow_blocks
   and it invokes the cleanup callback to let the front-end remove the
   tc/netfilter rules. The cleanup callback already provides the
   flow_block that the front-end needs to clean up.

        Front-end                      Driver

                                         |
                            flow_indr_dev_unregister(...)
                                         |
                         iterate over list of indirect flow_block
                               and invoke cleanup callback
                                         |
            .-----------------------------
            |
            .
   frontend_flow_block_cleanup(flow_block)
            .
            |
           \/
   remove rules to flow_block
      TC_SETUP_CLSFLOWER

= About this patchset

This patchset aims to address the existing TC CT problem while
simplifying the indirect flow_block infrastructure. Saving 300 LoC in
the flow_offload core and the drivers. The operational gets aligned with
the (non-indirect) flow_blocks logic. Patchset is composed of:

Patch #1 add nf_flow_table_gc_cleanup() which is required by the
         netfilter's flowtable new indirect flow_block approach.

Patch #2 adds the flow_block_indr object which is actually part of
         of the flow_block object. This stores the indirect flow_block
         metadata such as the tunnel netdevice owner and the cleanup
         callback (in case the tunnel netdevice goes away).

         This patch adds flow_indr_dev_{un}register() to allow drivers
         to offer netdevice tunnel hardware offload to the front-ends.
         Then, front-ends call flow_indr_dev_setup_offload() to invoke
         the drivers to set up the (indirect) flow_block.

Patch #3 add the tcf_block_offload_init() helper function, this is
         a preparation patch to adapt the tc front-end to use this
         new indirect flow_block infrastructure.

Patch #4 updates the tc and netfilter front-ends to use the new
         indirect flow_block infrastructure.

Patch #5 updates the mlx5 driver to use the new indirect flow_block
         infrastructure.

Patch #6 updates the nfp driver to use the new indirect flow_block
         infrastructure.

Patch #7 updates the bnxt driver to use the new indirect flow_block
         infrastructure.

Patch #8 removes the indirect flow_block infrastructure version 1,
         now that frontends and drivers have been translated to
         version 2 (coming in this patchset).
====================
Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
parents a01c2454 709ffbe1
......@@ -1870,7 +1870,6 @@ struct bnxt {
u8 dsn[8];
struct bnxt_tc_info *tc_info;
struct list_head tc_indr_block_list;
struct notifier_block tc_netdev_nb;
struct dentry *debugfs_pdev;
struct device *hwmon_dev;
};
......
......@@ -1939,53 +1939,25 @@ static int bnxt_tc_setup_indr_block(struct net_device *netdev, struct bnxt *bp,
return 0;
}
static int bnxt_tc_setup_indr_cb(struct net_device *netdev, void *cb_priv,
enum tc_setup_type type, void *type_data)
{
switch (type) {
case TC_SETUP_BLOCK:
return bnxt_tc_setup_indr_block(netdev, cb_priv, type_data);
default:
return -EOPNOTSUPP;
}
}
static bool bnxt_is_netdev_indr_offload(struct net_device *netdev)
{
return netif_is_vxlan(netdev);
}
static int bnxt_tc_indr_block_event(struct notifier_block *nb,
unsigned long event, void *ptr)
static int bnxt_tc_setup_indr_cb(struct net_device *netdev, void *cb_priv,
enum tc_setup_type type, void *type_data)
{
struct net_device *netdev;
struct bnxt *bp;
int rc;
netdev = netdev_notifier_info_to_dev(ptr);
if (!bnxt_is_netdev_indr_offload(netdev))
return NOTIFY_OK;
bp = container_of(nb, struct bnxt, tc_netdev_nb);
return -EOPNOTSUPP;
switch (event) {
case NETDEV_REGISTER:
rc = __flow_indr_block_cb_register(netdev, bp,
bnxt_tc_setup_indr_cb,
bp);
if (rc)
netdev_info(bp->dev,
"Failed to register indirect blk: dev: %s\n",
netdev->name);
break;
case NETDEV_UNREGISTER:
__flow_indr_block_cb_unregister(netdev,
bnxt_tc_setup_indr_cb,
bp);
switch (type) {
case TC_SETUP_BLOCK:
return bnxt_tc_setup_indr_block(netdev, cb_priv, type_data);
default:
break;
}
return NOTIFY_DONE;
return -EOPNOTSUPP;
}
static const struct rhashtable_params bnxt_tc_flow_ht_params = {
......@@ -2074,8 +2046,8 @@ int bnxt_init_tc(struct bnxt *bp)
/* init indirect block notifications */
INIT_LIST_HEAD(&bp->tc_indr_block_list);
bp->tc_netdev_nb.notifier_call = bnxt_tc_indr_block_event;
rc = register_netdevice_notifier(&bp->tc_netdev_nb);
rc = flow_indr_dev_register(bnxt_tc_setup_indr_cb, bp);
if (!rc)
return 0;
......@@ -2101,7 +2073,8 @@ void bnxt_shutdown_tc(struct bnxt *bp)
if (!bnxt_tc_flower_enabled(bp))
return;
unregister_netdevice_notifier(&bp->tc_netdev_nb);
flow_indr_dev_unregister(bnxt_tc_setup_indr_cb, bp,
bnxt_tc_setup_indr_block_cb);
rhashtable_destroy(&tc_info->flow_table);
rhashtable_destroy(&tc_info->l2_table);
rhashtable_destroy(&tc_info->decap_l2_table);
......
......@@ -306,20 +306,6 @@ mlx5e_rep_indr_block_priv_lookup(struct mlx5e_rep_priv *rpriv,
return NULL;
}
static void mlx5e_rep_indr_unregister_block(struct mlx5e_rep_priv *rpriv,
struct net_device *netdev);
void mlx5e_rep_indr_clean_block_privs(struct mlx5e_rep_priv *rpriv)
{
struct mlx5e_rep_indr_block_priv *cb_priv, *temp;
struct list_head *head = &rpriv->uplink_priv.tc_indr_block_priv_list;
list_for_each_entry_safe(cb_priv, temp, head, list) {
mlx5e_rep_indr_unregister_block(rpriv, cb_priv->netdev);
kfree(cb_priv);
}
}
static int
mlx5e_rep_indr_offload(struct net_device *netdev,
struct flow_cls_offload *flower,
......@@ -423,9 +409,14 @@ mlx5e_rep_indr_setup_block(struct net_device *netdev,
struct flow_block_offload *f,
flow_setup_cb_t *setup_cb)
{
struct mlx5e_priv *priv = netdev_priv(rpriv->netdev);
struct mlx5e_rep_indr_block_priv *indr_priv;
struct flow_block_cb *block_cb;
if (!mlx5e_tc_tun_device_to_offload(priv, netdev) &&
!(is_vlan_dev(netdev) && vlan_dev_real_dev(netdev) == rpriv->netdev))
return -EOPNOTSUPP;
if (f->binder_type != FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
return -EOPNOTSUPP;
......@@ -492,76 +483,20 @@ int mlx5e_rep_indr_setup_cb(struct net_device *netdev, void *cb_priv,
}
}
static int mlx5e_rep_indr_register_block(struct mlx5e_rep_priv *rpriv,
struct net_device *netdev)
{
int err;
err = __flow_indr_block_cb_register(netdev, rpriv,
mlx5e_rep_indr_setup_cb,
rpriv);
if (err) {
struct mlx5e_priv *priv = netdev_priv(rpriv->netdev);
mlx5_core_err(priv->mdev, "Failed to register remote block notifier for %s err=%d\n",
netdev_name(netdev), err);
}
return err;
}
static void mlx5e_rep_indr_unregister_block(struct mlx5e_rep_priv *rpriv,
struct net_device *netdev)
{
__flow_indr_block_cb_unregister(netdev, mlx5e_rep_indr_setup_cb,
rpriv);
}
static int mlx5e_nic_rep_netdevice_event(struct notifier_block *nb,
unsigned long event, void *ptr)
{
struct mlx5e_rep_priv *rpriv = container_of(nb, struct mlx5e_rep_priv,
uplink_priv.netdevice_nb);
struct mlx5e_priv *priv = netdev_priv(rpriv->netdev);
struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
if (!mlx5e_tc_tun_device_to_offload(priv, netdev) &&
!(is_vlan_dev(netdev) && vlan_dev_real_dev(netdev) == rpriv->netdev))
return NOTIFY_OK;
switch (event) {
case NETDEV_REGISTER:
mlx5e_rep_indr_register_block(rpriv, netdev);
break;
case NETDEV_UNREGISTER:
mlx5e_rep_indr_unregister_block(rpriv, netdev);
break;
}
return NOTIFY_OK;
}
int mlx5e_rep_tc_netdevice_event_register(struct mlx5e_rep_priv *rpriv)
{
struct mlx5_rep_uplink_priv *uplink_priv = &rpriv->uplink_priv;
int err;
/* init indirect block notifications */
INIT_LIST_HEAD(&uplink_priv->tc_indr_block_priv_list);
uplink_priv->netdevice_nb.notifier_call = mlx5e_nic_rep_netdevice_event;
err = register_netdevice_notifier_dev_net(rpriv->netdev,
&uplink_priv->netdevice_nb,
&uplink_priv->netdevice_nn);
return err;
return flow_indr_dev_register(mlx5e_rep_indr_setup_cb, rpriv);
}
void mlx5e_rep_tc_netdevice_event_unregister(struct mlx5e_rep_priv *rpriv)
{
struct mlx5_rep_uplink_priv *uplink_priv = &rpriv->uplink_priv;
/* clean indirect TC block notifications */
unregister_netdevice_notifier_dev_net(rpriv->netdev,
&uplink_priv->netdevice_nb,
&uplink_priv->netdevice_nn);
flow_indr_dev_unregister(mlx5e_rep_indr_setup_cb, rpriv,
mlx5e_rep_indr_setup_tc_cb);
}
#if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
......
......@@ -33,7 +33,6 @@ void mlx5e_rep_encap_entry_detach(struct mlx5e_priv *priv,
int mlx5e_rep_setup_tc(struct net_device *dev, enum tc_setup_type type,
void *type_data);
void mlx5e_rep_indr_clean_block_privs(struct mlx5e_rep_priv *rpriv);
bool mlx5e_rep_tc_update_skb(struct mlx5_cqe64 *cqe,
struct sk_buff *skb,
......@@ -65,9 +64,6 @@ static inline int
mlx5e_rep_setup_tc(struct net_device *dev, enum tc_setup_type type,
void *type_data) { return -EOPNOTSUPP; }
static inline void
mlx5e_rep_indr_clean_block_privs(struct mlx5e_rep_priv *rpriv) {}
struct mlx5e_tc_update_priv;
static inline bool
mlx5e_rep_tc_update_skb(struct mlx5_cqe64 *cqe,
......
......@@ -1018,7 +1018,6 @@ static int mlx5e_init_rep_tx(struct mlx5e_priv *priv)
static void mlx5e_cleanup_uplink_rep_tx(struct mlx5e_rep_priv *rpriv)
{
mlx5e_rep_tc_netdevice_event_unregister(rpriv);
mlx5e_rep_indr_clean_block_privs(rpriv);
mlx5e_rep_bond_cleanup(rpriv);
mlx5e_rep_tc_cleanup(rpriv);
}
......
......@@ -69,13 +69,8 @@ struct mlx5_rep_uplink_priv {
* tc_indr_block_cb_priv_list is used to lookup indirect callback
* private data
*
* netdevice_nb is the netdev events notifier - used to register
* tunnel devices for block events
*
*/
struct list_head tc_indr_block_priv_list;
struct notifier_block netdevice_nb;
struct netdev_net_notifier netdevice_nn;
struct mlx5_tun_entropy tun_entropy;
......
......@@ -830,6 +830,10 @@ static int nfp_flower_init(struct nfp_app *app)
if (err)
goto err_cleanup;
err = flow_indr_dev_register(nfp_flower_indr_setup_tc_cb, app);
if (err)
goto err_cleanup;
if (app_priv->flower_ext_feats & NFP_FL_FEATS_VF_RLIM)
nfp_flower_qos_init(app);
......@@ -856,6 +860,9 @@ static void nfp_flower_clean(struct nfp_app *app)
skb_queue_purge(&app_priv->cmsg_skbs_low);
flush_work(&app_priv->cmsg_work);
flow_indr_dev_unregister(nfp_flower_indr_setup_tc_cb, app,
nfp_flower_setup_indr_block_cb);
if (app_priv->flower_ext_feats & NFP_FL_FEATS_VF_RLIM)
nfp_flower_qos_cleanup(app);
......@@ -959,10 +966,6 @@ nfp_flower_netdev_event(struct nfp_app *app, struct net_device *netdev,
return ret;
}
ret = nfp_flower_reg_indir_block_handler(app, netdev, event);
if (ret & NOTIFY_STOP_MASK)
return ret;
ret = nfp_flower_internal_port_event_handler(app, netdev, event);
if (ret & NOTIFY_STOP_MASK)
return ret;
......
......@@ -458,9 +458,10 @@ void nfp_flower_qos_cleanup(struct nfp_app *app);
int nfp_flower_setup_qos_offload(struct nfp_app *app, struct net_device *netdev,
struct tc_cls_matchall_offload *flow);
void nfp_flower_stats_rlim_reply(struct nfp_app *app, struct sk_buff *skb);
int nfp_flower_reg_indir_block_handler(struct nfp_app *app,
struct net_device *netdev,
unsigned long event);
int nfp_flower_indr_setup_tc_cb(struct net_device *netdev, void *cb_priv,
enum tc_setup_type type, void *type_data);
int nfp_flower_setup_indr_block_cb(enum tc_setup_type type, void *type_data,
void *cb_priv);
void
__nfp_flower_non_repr_priv_get(struct nfp_flower_non_repr_priv *non_repr_priv);
......
......@@ -1619,7 +1619,7 @@ nfp_flower_indr_block_cb_priv_lookup(struct nfp_app *app,
return NULL;
}
static int nfp_flower_setup_indr_block_cb(enum tc_setup_type type,
int nfp_flower_setup_indr_block_cb(enum tc_setup_type type,
void *type_data, void *cb_priv)
{
struct nfp_flower_indr_block_cb_priv *priv = cb_priv;
......@@ -1708,10 +1708,13 @@ nfp_flower_setup_indr_tc_block(struct net_device *netdev, struct nfp_app *app,
return 0;
}
static int
int
nfp_flower_indr_setup_tc_cb(struct net_device *netdev, void *cb_priv,
enum tc_setup_type type, void *type_data)
{
if (!nfp_fl_is_netdev_to_offload(netdev))
return -EOPNOTSUPP;
switch (type) {
case TC_SETUP_BLOCK:
return nfp_flower_setup_indr_tc_block(netdev, cb_priv,
......@@ -1720,29 +1723,3 @@ nfp_flower_indr_setup_tc_cb(struct net_device *netdev, void *cb_priv,
return -EOPNOTSUPP;
}
}
int nfp_flower_reg_indir_block_handler(struct nfp_app *app,
struct net_device *netdev,
unsigned long event)
{
int err;
if (!nfp_fl_is_netdev_to_offload(netdev))
return NOTIFY_OK;
if (event == NETDEV_REGISTER) {
err = __flow_indr_block_cb_register(netdev, app,
nfp_flower_indr_setup_tc_cb,
app);
if (err)
nfp_flower_cmsg_warn(app,
"Indirect block reg failed - %s\n",
netdev->name);
} else if (event == NETDEV_UNREGISTER) {
__flow_indr_block_cb_unregister(netdev,
nfp_flower_indr_setup_tc_cb,
app);
}
return NOTIFY_OK;
}
......@@ -443,6 +443,16 @@ enum tc_setup_type;
typedef int flow_setup_cb_t(enum tc_setup_type type, void *type_data,
void *cb_priv);
struct flow_block_cb;
struct flow_block_indr {
struct list_head list;
struct net_device *dev;
enum flow_block_binder_type binder_type;
void *data;
void (*cleanup)(struct flow_block_cb *block_cb);
};
struct flow_block_cb {
struct list_head driver_list;
struct list_head list;
......@@ -450,6 +460,7 @@ struct flow_block_cb {
void *cb_ident;
void *cb_priv;
void (*release)(void *cb_priv);
struct flow_block_indr indr;
unsigned int refcnt;
};
......@@ -523,19 +534,18 @@ static inline void flow_block_init(struct flow_block *flow_block)
typedef int flow_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
enum tc_setup_type type, void *type_data);
int flow_indr_dev_register(flow_indr_block_bind_cb_t *cb, void *cb_priv);
void flow_indr_dev_unregister(flow_indr_block_bind_cb_t *cb, void *cb_priv,
flow_setup_cb_t *setup_cb);
int flow_indr_dev_setup_offload(struct net_device *dev,
enum tc_setup_type type, void *data,
struct flow_block_offload *bo,
void (*cleanup)(struct flow_block_cb *block_cb));
typedef void flow_indr_block_cmd_t(struct net_device *dev,
flow_indr_block_bind_cb_t *cb, void *cb_priv,
enum flow_block_command command);
struct flow_indr_block_entry {
flow_indr_block_cmd_t *cb;
struct list_head list;
};
void flow_indr_add_block_cb(struct flow_indr_block_entry *entry);
void flow_indr_del_block_cb(struct flow_indr_block_entry *entry);
int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv,
flow_indr_block_bind_cb_t *cb,
void *cb_ident);
......
......@@ -175,6 +175,8 @@ void flow_offload_refresh(struct nf_flowtable *flow_table,
struct flow_offload_tuple_rhash *flow_offload_lookup(struct nf_flowtable *flow_table,
struct flow_offload_tuple *tuple);
void nf_flow_table_gc_cleanup(struct nf_flowtable *flowtable,
struct net_device *dev);
void nf_flow_table_cleanup(struct net_device *dev);
int nf_flow_table_init(struct nf_flowtable *flow_table);
......
This diff is collapsed.
......@@ -588,7 +588,7 @@ static void nf_flow_table_do_cleanup(struct flow_offload *flow, void *data)
flow_offload_teardown(flow);
}
static void nf_flow_table_iterate_cleanup(struct nf_flowtable *flowtable,
void nf_flow_table_gc_cleanup(struct nf_flowtable *flowtable,
struct net_device *dev)
{
nf_flow_table_iterate(flowtable, nf_flow_table_do_cleanup, dev);
......@@ -602,7 +602,7 @@ void nf_flow_table_cleanup(struct net_device *dev)
mutex_lock(&flowtable_lock);
list_for_each_entry(flowtable, &flowtables, list)
nf_flow_table_iterate_cleanup(flowtable, dev);
nf_flow_table_gc_cleanup(flowtable, dev);
mutex_unlock(&flowtable_lock);
}
EXPORT_SYMBOL_GPL(nf_flow_table_cleanup);
......
......@@ -942,6 +942,18 @@ static void nf_flow_table_block_offload_init(struct flow_block_offload *bo,
INIT_LIST_HEAD(&bo->cb_list);
}
static void nf_flow_table_indr_cleanup(struct flow_block_cb *block_cb)
{
struct nf_flowtable *flowtable = block_cb->indr.data;
struct net_device *dev = block_cb->indr.dev;
nf_flow_table_gc_cleanup(flowtable, dev);
down_write(&flowtable->flow_block_lock);
list_del(&block_cb->list);
flow_block_cb_free(block_cb);
up_write(&flowtable->flow_block_lock);
}
static int nf_flow_table_indr_offload_cmd(struct flow_block_offload *bo,
struct nf_flowtable *flowtable,
struct net_device *dev,
......@@ -950,12 +962,9 @@ static int nf_flow_table_indr_offload_cmd(struct flow_block_offload *bo,
{
nf_flow_table_block_offload_init(bo, dev_net(dev), cmd, flowtable,
extack);
flow_indr_block_call(dev, bo, cmd, TC_SETUP_FT);
if (list_empty(&bo->cb_list))
return -EOPNOTSUPP;
return 0;
return flow_indr_dev_setup_offload(dev, TC_SETUP_FT, flowtable, bo,
nf_flow_table_indr_cleanup);
}
static int nf_flow_table_offload_cmd(struct flow_block_offload *bo,
......@@ -999,69 +1008,6 @@ int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
}
EXPORT_SYMBOL_GPL(nf_flow_table_offload_setup);
static void nf_flow_table_indr_block_ing_cmd(struct net_device *dev,
struct nf_flowtable *flowtable,
flow_indr_block_bind_cb_t *cb,
void *cb_priv,
enum flow_block_command cmd)
{
struct netlink_ext_ack extack = {};
struct flow_block_offload bo;
if (!flowtable)
return;
nf_flow_table_block_offload_init(&bo, dev_net(dev), cmd, flowtable,
&extack);
cb(dev, cb_priv, TC_SETUP_FT, &bo);
nf_flow_table_block_setup(flowtable, &bo, cmd);
}
static void nf_flow_table_indr_block_cb_cmd(struct nf_flowtable *flowtable,
struct net_device *dev,
flow_indr_block_bind_cb_t *cb,
void *cb_priv,
enum flow_block_command cmd)
{
if (!(flowtable->flags & NF_FLOWTABLE_HW_OFFLOAD))
return;
nf_flow_table_indr_block_ing_cmd(dev, flowtable, cb, cb_priv, cmd);
}
static void nf_flow_table_indr_block_cb(struct net_device *dev,
flow_indr_block_bind_cb_t *cb,
void *cb_priv,
enum flow_block_command cmd)
{
struct net *net = dev_net(dev);
struct nft_flowtable *nft_ft;
struct nft_table *table;
struct nft_hook *hook;
mutex_lock(&net->nft.commit_mutex);
list_for_each_entry(table, &net->nft.tables, list) {
list_for_each_entry(nft_ft, &table->flowtables, list) {
list_for_each_entry(hook, &nft_ft->hook_list, list) {
if (hook->ops.dev != dev)
continue;
nf_flow_table_indr_block_cb_cmd(&nft_ft->data,
dev, cb,
cb_priv, cmd);
}
}
}
mutex_unlock(&net->nft.commit_mutex);
}
static struct flow_indr_block_entry block_ing_entry = {
.cb = nf_flow_table_indr_block_cb,
.list = LIST_HEAD_INIT(block_ing_entry.list),
};
int nf_flow_table_offload_init(void)
{
nf_flow_offload_wq = alloc_workqueue("nf_flow_table_offload",
......@@ -1069,13 +1015,10 @@ int nf_flow_table_offload_init(void)
if (!nf_flow_offload_wq)
return -ENOMEM;
flow_indr_add_block_cb(&block_ing_entry);
return 0;
}
void nf_flow_table_offload_exit(void)
{
flow_indr_del_block_cb(&block_ing_entry);
destroy_workqueue(nf_flow_offload_wq);
}
......@@ -285,40 +285,41 @@ static int nft_block_offload_cmd(struct nft_base_chain *chain,
return nft_block_setup(chain, &bo, cmd);
}
static void nft_indr_block_ing_cmd(struct net_device *dev,
struct nft_base_chain *chain,
flow_indr_block_bind_cb_t *cb,
void *cb_priv,
enum flow_block_command cmd)
static void nft_indr_block_cleanup(struct flow_block_cb *block_cb)
{
struct nft_base_chain *basechain = block_cb->indr.data;
struct net_device *dev = block_cb->indr.dev;
struct netlink_ext_ack extack = {};
struct net *net = dev_net(dev);
struct flow_block_offload bo;
if (!chain)
return;
nft_flow_block_offload_init(&bo, dev_net(dev), cmd, chain, &extack);
cb(dev, cb_priv, TC_SETUP_BLOCK, &bo);
nft_block_setup(chain, &bo, cmd);
nft_flow_block_offload_init(&bo, dev_net(dev), FLOW_BLOCK_UNBIND,
basechain, &extack);
mutex_lock(&net->nft.commit_mutex);
list_move(&block_cb->list, &bo.cb_list);
nft_flow_offload_unbind(&bo, basechain);
mutex_unlock(&net->nft.commit_mutex);
}
static int nft_indr_block_offload_cmd(struct nft_base_chain *chain,
static int nft_indr_block_offload_cmd(struct nft_base_chain *basechain,
struct net_device *dev,
enum flow_block_command cmd)
{
struct netlink_ext_ack extack = {};
struct flow_block_offload bo;
int err;
nft_flow_block_offload_init(&bo, dev_net(dev), cmd, chain, &extack);
nft_flow_block_offload_init(&bo, dev_net(dev), cmd, basechain, &extack);
flow_indr_block_call(dev, &bo, cmd, TC_SETUP_BLOCK);
err = flow_indr_dev_setup_offload(dev, TC_SETUP_BLOCK, basechain, &bo,
nft_indr_block_cleanup);
if (err < 0)
return err;
if (list_empty(&bo.cb_list))
return -EOPNOTSUPP;
return nft_block_setup(chain, &bo, cmd);
return nft_block_setup(basechain, &bo, cmd);
}
#define FLOW_SETUP_BLOCK TC_SETUP_BLOCK
......@@ -555,24 +556,6 @@ static struct nft_chain *__nft_offload_get_chain(struct net_device *dev)
return NULL;
}
static void nft_indr_block_cb(struct net_device *dev,
flow_indr_block_bind_cb_t *cb, void *cb_priv,
enum flow_block_command cmd)
{
struct net *net = dev_net(dev);
struct nft_chain *chain;
mutex_lock(&net->nft.commit_mutex);
chain = __nft_offload_get_chain(dev);
if (chain && chain->flags & NFT_CHAIN_HW_OFFLOAD) {
struct nft_base_chain *basechain;
basechain = nft_base_chain(chain);
nft_indr_block_ing_cmd(dev, basechain, cb, cb_priv, cmd);
}
mutex_unlock(&net->nft.commit_mutex);
}
static int nft_offload_netdev_event(struct notifier_block *this,
unsigned long event, void *ptr)
{
......@@ -594,30 +577,16 @@ static int nft_offload_netdev_event(struct notifier_block *this,
return NOTIFY_DONE;
}
static struct flow_indr_block_entry block_ing_entry = {
.cb = nft_indr_block_cb,
.list = LIST_HEAD_INIT(block_ing_entry.list),
};
static struct notifier_block nft_offload_netdev_notifier = {
.notifier_call = nft_offload_netdev_event,
};
int nft_offload_init(void)
{
int err;
err = register_netdevice_notifier(&nft_offload_netdev_notifier);
if (err < 0)
return err;
flow_indr_add_block_cb(&block_ing_entry);
return 0;
return register_netdevice_notifier(&nft_offload_netdev_notifier);
}
void nft_offload_exit(void)
{
flow_indr_del_block_cb(&block_ing_entry);
unregister_netdevice_notifier(&nft_offload_netdev_notifier);
}
......@@ -621,96 +621,42 @@ static void tcf_chain_flush(struct tcf_chain *chain, bool rtnl_held)
static int tcf_block_setup(struct tcf_block *block,
struct flow_block_offload *bo);
static void tc_indr_block_cmd(struct net_device *dev, struct tcf_block *block,
flow_indr_block_bind_cb_t *cb, void *cb_priv,
enum flow_block_command command, bool ingress)
{
struct flow_block_offload bo = {
.command = command,
.binder_type = ingress ?
FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS :
FLOW_BLOCK_BINDER_TYPE_CLSACT_EGRESS,
.net = dev_net(dev),
.block_shared = tcf_block_non_null_shared(block),
};
INIT_LIST_HEAD(&bo.cb_list);
if (!block)
return;
bo.block = &block->flow_block;
down_write(&block->cb_lock);
cb(dev, cb_priv, TC_SETUP_BLOCK, &bo);
tcf_block_setup(block, &bo);
up_write(&block->cb_lock);
}
static struct tcf_block *tc_dev_block(struct net_device *dev, bool ingress)
static void tcf_block_offload_init(struct flow_block_offload *bo,
struct net_device *dev,
enum flow_block_command command,
enum flow_block_binder_type binder_type,
struct flow_block *flow_block,
bool shared, struct netlink_ext_ack *extack)
{
const struct Qdisc_class_ops *cops;
const struct Qdisc_ops *ops;
struct Qdisc *qdisc;
if (!dev_ingress_queue(dev))
return NULL;
qdisc = dev_ingress_queue(dev)->qdisc_sleeping;
if (!qdisc)
return NULL;
ops = qdisc->ops;
if (!ops)
return NULL;
if (!ingress && !strcmp("ingress", ops->id))
return NULL;
cops = ops->cl_ops;
if (!cops)
return NULL;
if (!cops->tcf_block)
return NULL;
return cops->tcf_block(qdisc,
ingress ? TC_H_MIN_INGRESS : TC_H_MIN_EGRESS,
NULL);
bo->net = dev_net(dev);
bo->command = command;
bo->binder_type = binder_type;
bo->block = flow_block;
bo->block_shared = shared;
bo->extack = extack;
INIT_LIST_HEAD(&bo->cb_list);
}
static void tc_indr_block_get_and_cmd(struct net_device *dev,
flow_indr_block_bind_cb_t *cb,
void *cb_priv,
enum flow_block_command command)
{
struct tcf_block *block;
block = tc_dev_block(dev, true);
tc_indr_block_cmd(dev, block, cb, cb_priv, command, true);
block = tc_dev_block(dev, false);
tc_indr_block_cmd(dev, block, cb, cb_priv, command, false);
}
static void tcf_block_unbind(struct tcf_block *block,
struct flow_block_offload *bo);
static void tc_indr_block_call(struct tcf_block *block,
struct net_device *dev,
struct tcf_block_ext_info *ei,
enum flow_block_command command,
struct netlink_ext_ack *extack)
static void tc_block_indr_cleanup(struct flow_block_cb *block_cb)
{
struct flow_block_offload bo = {
.command = command,
.binder_type = ei->binder_type,
.net = dev_net(dev),
.block = &block->flow_block,
.block_shared = tcf_block_shared(block),
.extack = extack,
};
INIT_LIST_HEAD(&bo.cb_list);
struct tcf_block *block = block_cb->indr.data;
struct net_device *dev = block_cb->indr.dev;
struct netlink_ext_ack extack = {};
struct flow_block_offload bo;
flow_indr_block_call(dev, &bo, command, TC_SETUP_BLOCK);
tcf_block_setup(block, &bo);
tcf_block_offload_init(&bo, dev, FLOW_BLOCK_UNBIND,
block_cb->indr.binder_type,
&block->flow_block, tcf_block_shared(block),
&extack);
down_write(&block->cb_lock);
list_move(&block_cb->list, &bo.cb_list);
up_write(&block->cb_lock);
rtnl_lock();
tcf_block_unbind(block, &bo);
rtnl_unlock();
}
static bool tcf_block_offload_in_use(struct tcf_block *block)
......@@ -727,15 +673,16 @@ static int tcf_block_offload_cmd(struct tcf_block *block,
struct flow_block_offload bo = {};
int err;
bo.net = dev_net(dev);
bo.command = command;
bo.binder_type = ei->binder_type;
bo.block = &block->flow_block;
bo.block_shared = tcf_block_shared(block);
bo.extack = extack;
INIT_LIST_HEAD(&bo.cb_list);
tcf_block_offload_init(&bo, dev, command, ei->binder_type,
&block->flow_block, tcf_block_shared(block),
extack);
if (dev->netdev_ops->ndo_setup_tc)
err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
else
err = flow_indr_dev_setup_offload(dev, TC_SETUP_BLOCK, block,
&bo, tc_block_indr_cleanup);
if (err < 0) {
if (err != -EOPNOTSUPP)
NL_SET_ERR_MSG(extack, "Driver ndo_setup_tc failed");
......@@ -753,13 +700,13 @@ static int tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
int err;
down_write(&block->cb_lock);
if (!dev->netdev_ops->ndo_setup_tc)
goto no_offload_dev_inc;
/* If tc offload feature is disabled and the block we try to bind
* to already has some offloaded filters, forbid to bind.
*/
if (!tc_can_offload(dev) && tcf_block_offload_in_use(block)) {
if (dev->netdev_ops->ndo_setup_tc &&
!tc_can_offload(dev) &&
tcf_block_offload_in_use(block)) {
NL_SET_ERR_MSG(extack, "Bind to offloaded block failed as dev has offload disabled");
err = -EOPNOTSUPP;
goto err_unlock;
......@@ -771,18 +718,15 @@ static int tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
if (err)
goto err_unlock;
tc_indr_block_call(block, dev, ei, FLOW_BLOCK_BIND, extack);
up_write(&block->cb_lock);
return 0;
no_offload_dev_inc:
if (tcf_block_offload_in_use(block)) {
err = -EOPNOTSUPP;
if (tcf_block_offload_in_use(block))
goto err_unlock;
}
err = 0;
block->nooffloaddevcnt++;
tc_indr_block_call(block, dev, ei, FLOW_BLOCK_BIND, extack);
err_unlock:
up_write(&block->cb_lock);
return err;
......@@ -795,10 +739,6 @@ static void tcf_block_offload_unbind(struct tcf_block *block, struct Qdisc *q,
int err;
down_write(&block->cb_lock);
tc_indr_block_call(block, dev, ei, FLOW_BLOCK_UNBIND, NULL);
if (!dev->netdev_ops->ndo_setup_tc)
goto no_offload_dev_dec;
err = tcf_block_offload_cmd(block, dev, ei, FLOW_BLOCK_UNBIND, NULL);
if (err == -EOPNOTSUPP)
goto no_offload_dev_dec;
......@@ -3824,11 +3764,6 @@ static struct pernet_operations tcf_net_ops = {
.size = sizeof(struct tcf_net),
};
static struct flow_indr_block_entry block_entry = {
.cb = tc_indr_block_get_and_cmd,
.list = LIST_HEAD_INIT(block_entry.list),
};
static int __init tc_filter_init(void)
{
int err;
......@@ -3841,8 +3776,6 @@ static int __init tc_filter_init(void)
if (err)
goto err_register_pernet_subsys;
flow_indr_add_block_cb(&block_entry);
rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL,
RTNL_FLAG_DOIT_UNLOCKED);
rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL,
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment