Commit aad93c70 authored by David S. Miller

Merge branch 'ipvlan-private-vepa'

Mahesh Bandewar says:

====================
add 'private' and 'vepa' attributes to ipvlan modes

IPvlan has always operated in bridge mode for all of its supported modes, i.e.
if packets are destined to an adjacent neighbor dev, the IPvlan driver
switches them internally without the packets needing to hit the
wire or get routed. However, there are situations where this bridge mode is
not wanted, e.g. two private processes running inside two namespaces that
each have one IPvlan slave but share the master. These
processes should reach the outside world through the master device, but
the bridge function should not work between them. Currently that is not
possible, hence the private attribute for the selected mode comes into play.

VEPA or 802.1Qbg, on the other hand, has limited appeal with IPvlan since IPvlan
uses the mac-address of the lower device. So packets that are destined to
the adjacent neighbor slave-dev will have the same src and dest mac. When these
packets reach the external switch/router, it will send back a redirect
message which the host will have to deal with. Having said that, this attribute
is useful for debugging since IPvlan will not switch / short-circuit
packets internally, e.g. using VEPA mode with the lower device in loopback mode
avoids some complicated set-ups that use non-local-bind with some route
jugglery.

This patch-set implements these attributes for the existing modes that
IPvlan has. Please see individual patches for their detailed implementation.
A subsequent ip-utils patch is needed and will be sent soon.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
parents 995231c8 fe89aa6b
@@ -22,9 +22,21 @@ The driver can be built into the kernel (CONFIG_IPVLAN=y) or as a module
 	There are no module parameters for this driver and it can be configured
 using IProute2/ip utility.

-	ip link add link <master-dev> name <slave-dev> type ipvlan mode { l2 | l3 | l3s }
+    ip link add link <master> name <slave> type ipvlan [ mode MODE ] [ FLAGS ]
+       where
+         MODE: l3 (default) | l3s | l2
+         FLAGS: bridge (default) | private | vepa

-	e.g. ip link add link eth0 name ipvl0 type ipvlan mode l2
+    e.g.
+    (a) Following will create IPvlan link with eth0 as master in
+        L3 bridge mode
+          bash# ip link add link eth0 name ipvl0 type ipvlan
+    (b) This command will create IPvlan link in L2 bridge mode.
+          bash# ip link add link eth0 name ipvl0 type ipvlan mode l2 bridge
+    (c) This command will create an IPvlan device in L2 private mode.
+          bash# ip link add link eth0 name ipvlan type ipvlan mode l2 private
+    (d) This command will create an IPvlan device in L2 vepa mode.
+          bash# ip link add link eth0 name ipvlan type ipvlan mode l2 vepa


 4. Operating modes:
@@ -54,7 +66,29 @@ works in this mode and hence it is L3-symmetric (L3s). This will have slightly l
 performance but that shouldn't matter since you are choosing this mode over plain-L3
 mode to make conn-tracking work.

-5. What to choose (macvlan vs. ipvlan)?
+5. Mode flags:
+	At this time following mode flags are available
+
+5.1 bridge:
+	This is the default option. To configure the IPvlan port in this mode,
+user can choose to either add this option on the command-line or don't specify
+anything. This is the traditional mode where slaves can cross-talk among
+themseleves apart from talking through the master device.
+
+5.2 private:
+	If this option is added to the command-line, the port is set in private
+mode. i.e. port wont allow cross communication between slaves.
+
+5.3 vepa:
+	If this is added to the command-line, the port is set in VEPA mode.
+i.e. port will offload switching functionality to the external entity as
+described in 802.1Qbg
+Note: VEPA mode in IPvlan has limitations. IPvlan uses the mac-address of the
+master-device, so the packets which are emitted in this mode for the adjacent
+neighbor will have source and destination mac same. This will make the switch /
+router send the redirect message.
+
+6. What to choose (macvlan vs. ipvlan)?
 	These two devices are very similar in many regards and the specific use
 case could very well define which device to choose. if one of the following
 situations defines your use case then you can choose to use ipvlan -
......
@@ -96,6 +96,7 @@ struct ipvl_port {
 	struct hlist_head	hlhead[IPVLAN_HASH_SIZE];
 	struct list_head	ipvlans;
 	u16			mode;
+	u16			flags;
 	u16			dev_id_start;
 	struct work_struct	wq;
 	struct sk_buff_head	backlog;
@@ -123,6 +124,36 @@ static inline struct ipvl_port *ipvlan_port_get_rtnl(const struct net_device *d)
 	return rtnl_dereference(d->rx_handler_data);
 }

+static inline bool ipvlan_is_private(const struct ipvl_port *port)
+{
+	return !!(port->flags & IPVLAN_F_PRIVATE);
+}
+
+static inline void ipvlan_mark_private(struct ipvl_port *port)
+{
+	port->flags |= IPVLAN_F_PRIVATE;
+}
+
+static inline void ipvlan_clear_private(struct ipvl_port *port)
+{
+	port->flags &= ~IPVLAN_F_PRIVATE;
+}
+
+static inline bool ipvlan_is_vepa(const struct ipvl_port *port)
+{
+	return !!(port->flags & IPVLAN_F_VEPA);
+}
+
+static inline void ipvlan_mark_vepa(struct ipvl_port *port)
+{
+	port->flags |= IPVLAN_F_VEPA;
+}
+
+static inline void ipvlan_clear_vepa(struct ipvl_port *port)
+{
+	port->flags &= ~IPVLAN_F_VEPA;
+}
+
 void ipvlan_init_secret(void);
 unsigned int ipvlan_mac_hash(const unsigned char *addr);
 rx_handler_result_t ipvlan_handle_frame(struct sk_buff **pskb);
......
@@ -514,10 +514,16 @@ static int ipvlan_xmit_mode_l3(struct sk_buff *skb, struct net_device *dev)
 	if (!lyr3h)
 		goto out;

-	addr = ipvlan_addr_lookup(ipvlan->port, lyr3h, addr_type, true);
-	if (addr)
-		return ipvlan_rcv_frame(addr, &skb, true);
-
+	if (!ipvlan_is_vepa(ipvlan->port)) {
+		addr = ipvlan_addr_lookup(ipvlan->port, lyr3h, addr_type, true);
+		if (addr) {
+			if (ipvlan_is_private(ipvlan->port)) {
+				consume_skb(skb);
+				return NET_XMIT_DROP;
+			}
+			return ipvlan_rcv_frame(addr, &skb, true);
+		}
+	}
 out:
 	ipvlan_skb_crossing_ns(skb, ipvlan->phy_dev);
 	return ipvlan_process_outbound(skb);
@@ -531,13 +537,19 @@ static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
 	void *lyr3h;
 	int addr_type;

-	if (ether_addr_equal(eth->h_dest, eth->h_source)) {
+	if (!ipvlan_is_vepa(ipvlan->port) &&
+	    ether_addr_equal(eth->h_dest, eth->h_source)) {
 		lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
 		if (lyr3h) {
 			addr = ipvlan_addr_lookup(ipvlan->port, lyr3h, addr_type, true);
-			if (addr)
+			if (addr) {
+				if (ipvlan_is_private(ipvlan->port)) {
+					consume_skb(skb);
+					return NET_XMIT_DROP;
+				}
 				return ipvlan_rcv_frame(addr, &skb, true);
+			}
 		}
 		skb = skb_share_check(skb, GFP_ATOMIC);
 		if (!skb)
 			return NET_XMIT_DROP;
......
@@ -462,11 +462,29 @@ static int ipvlan_nl_changelink(struct net_device *dev,
 	struct ipvl_port *port = ipvlan_port_get_rtnl(ipvlan->phy_dev);
 	int err = 0;

-	if (data && data[IFLA_IPVLAN_MODE]) {
+	if (!data)
+		return 0;
+
+	if (data[IFLA_IPVLAN_MODE]) {
 		u16 nmode = nla_get_u16(data[IFLA_IPVLAN_MODE]);

 		err = ipvlan_set_port_mode(port, nmode);
 	}
+
+	if (!err && data[IFLA_IPVLAN_FLAGS]) {
+		u16 flags = nla_get_u16(data[IFLA_IPVLAN_FLAGS]);
+
+		if (flags & IPVLAN_F_PRIVATE)
+			ipvlan_mark_private(port);
+		else
+			ipvlan_clear_private(port);
+
+		if (flags & IPVLAN_F_VEPA)
+			ipvlan_mark_vepa(port);
+		else
+			ipvlan_clear_vepa(port);
+	}
+
 	return err;
 }

@@ -474,18 +492,34 @@ static size_t ipvlan_nl_getsize(const struct net_device *dev)
 {
 	return (0
 		+ nla_total_size(2) /* IFLA_IPVLAN_MODE */
+		+ nla_total_size(2) /* IFLA_IPVLAN_FLAGS */
 		);
 }

 static int ipvlan_nl_validate(struct nlattr *tb[], struct nlattr *data[],
 			      struct netlink_ext_ack *extack)
 {
-	if (data && data[IFLA_IPVLAN_MODE]) {
+	if (!data)
+		return 0;
+
+	if (data[IFLA_IPVLAN_MODE]) {
 		u16 mode = nla_get_u16(data[IFLA_IPVLAN_MODE]);

 		if (mode < IPVLAN_MODE_L2 || mode >= IPVLAN_MODE_MAX)
 			return -EINVAL;
 	}
+	if (data[IFLA_IPVLAN_FLAGS]) {
+		u16 flags = nla_get_u16(data[IFLA_IPVLAN_FLAGS]);
+
+		/* Only two bits are used at this moment. */
+		if (flags & ~(IPVLAN_F_PRIVATE | IPVLAN_F_VEPA))
+			return -EINVAL;
+		/* Also both flags can't be active at the same time. */
+		if ((flags & (IPVLAN_F_PRIVATE | IPVLAN_F_VEPA)) ==
+		    (IPVLAN_F_PRIVATE | IPVLAN_F_VEPA))
+			return -EINVAL;
+	}
+
 	return 0;
 }

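The flag checks in the validate hunk above can be exercised outside the kernel. The following is a minimal sketch, assuming a hypothetical function `validate_flags` that takes the raw 16-bit value instead of a netlink attribute. It reproduces the two rules from the hunk: unknown bits are rejected, and private and vepa cannot be combined.

```c
#include <errno.h>
#include <stdint.h>

/* Flag values as defined by this commit in the uapi header. */
#define IPVLAN_F_PRIVATE 0x01
#define IPVLAN_F_VEPA    0x02

/* Hypothetical userspace stand-in for the flags part of validation. */
static int validate_flags(uint16_t flags)
{
	const uint16_t known = IPVLAN_F_PRIVATE | IPVLAN_F_VEPA;

	/* Only the two known bits may be set. */
	if (flags & ~known)
		return -EINVAL;

	/* private and vepa are mutually exclusive. */
	if ((flags & known) == known)
		return -EINVAL;

	return 0;
}
```

A value of 0 (bridge), `IPVLAN_F_PRIVATE` alone, or `IPVLAN_F_VEPA` alone passes; `0x03` and any value with higher bits set return `-EINVAL`.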
@@ -502,6 +536,8 @@ static int ipvlan_nl_fillinfo(struct sk_buff *skb,
 	ret = -EMSGSIZE;
 	if (nla_put_u16(skb, IFLA_IPVLAN_MODE, port->mode))
 		goto err;
+	if (nla_put_u16(skb, IFLA_IPVLAN_FLAGS, port->flags))
+		goto err;

 	return 0;

@@ -549,6 +585,12 @@ int ipvlan_link_new(struct net *src_net, struct net_device *dev,
 	ipvlan_adjust_mtu(ipvlan, phy_dev);
 	INIT_LIST_HEAD(&ipvlan->addrs);

+	/* Flags are per port and latest update overrides. User has
+	 * to be consistent in setting it just like the mode attribute.
+	 */
+	if (data && data[IFLA_IPVLAN_FLAGS])
+		ipvlan->port->flags = nla_get_u16(data[IFLA_IPVLAN_FLAGS]);
+
 	/* If the port-id base is at the MAX value, then wrap it around and
 	 * begin from 0x1 again. This may be due to a busy system where lots
 	 * of slaves are getting created and deleted.
@@ -644,6 +686,7 @@ EXPORT_SYMBOL_GPL(ipvlan_link_setup);
 static const struct nla_policy ipvlan_nl_policy[IFLA_IPVLAN_MAX + 1] =
 {
 	[IFLA_IPVLAN_MODE] = { .type = NLA_U16 },
+	[IFLA_IPVLAN_FLAGS] = { .type = NLA_U16 },
 };

 static struct rtnl_link_ops ipvlan_link_ops = {
......
@@ -465,6 +465,7 @@ enum macsec_validation_type {
 enum {
 	IFLA_IPVLAN_UNSPEC,
 	IFLA_IPVLAN_MODE,
+	IFLA_IPVLAN_FLAGS,
 	__IFLA_IPVLAN_MAX
 };

@@ -477,6 +478,9 @@ enum ipvlan_mode {
 	IPVLAN_MODE_MAX
 };

+#define IPVLAN_F_PRIVATE	0x01
+#define IPVLAN_F_VEPA		0x02
+
 /* VXLAN section */
 enum {
 	IFLA_VXLAN_UNSPEC,
......