Commit 5b2941b1 authored by David S. Miller

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch

Jesse Gross says:

====================
A number of significant new features and optimizations for net-next/3.12.
Highlights are:
 * "Megaflows", an optimization that allows userspace to specify which
   flow fields were used to compute the results of the flow lookup.
   This allows for a major reduction in flow setups (the major
   performance bottleneck in Open vSwitch) without reducing flexibility.
 * Converting netlink dump operations to use RCU, allowing for
   additional parallelism in userspace.
 * Matching and modifying SCTP protocol fields.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
parents b6750b40 5828cd9a
...@@ -91,6 +91,46 @@ Often we ellipsize arguments not important to the discussion, e.g.: ...@@ -91,6 +91,46 @@ Often we ellipsize arguments not important to the discussion, e.g.:
in_port(1), eth(...), eth_type(0x0800), ipv4(...), tcp(...) in_port(1), eth(...), eth_type(0x0800), ipv4(...), tcp(...)
Wildcarded flow key format
--------------------------
A wildcarded flow is described with two sequences of Netlink attributes
passed over the Netlink socket: a flow key, exactly as described above, and an
optional corresponding flow mask.

A wildcarded flow can represent a group of exact match flows. Each '1' bit
in the mask specifies an exact match with the corresponding bit in the flow key.
A '0' bit specifies a don't care bit, which will match either a '1' or '0' bit
of an incoming packet. Using wildcarded flows can improve the flow setup rate
by reducing the number of new flows that need to be processed by the user
space program.

Support for the mask Netlink attribute is optional for both the kernel and user
space program. The kernel can ignore the mask attribute, installing an exact
match flow, or reduce the number of don't care bits in the kernel to less than
what was specified by the user space program. In this case, variations in bits
that the kernel does not implement will simply result in additional flow setups.
The kernel module will also work with user space programs that neither support
nor supply flow mask attributes.

Since the kernel may ignore or modify wildcard bits, it can be difficult for
the userspace program to know exactly what matches are installed. There are
two possible approaches: reactively install flows as they miss the kernel
flow table (and therefore not attempt to determine wildcard changes at all)
or use the kernel's response messages to determine the installed wildcards.

When interacting with userspace, the kernel should maintain the match portion
of the key exactly as originally installed. This provides a handle to
identify the flow for all future operations. However, when reporting the
mask of an installed flow, the mask should include any restrictions imposed
by the kernel.

The behavior when using overlapping wildcarded flows is undefined. It is the
responsibility of the user space program to ensure that any incoming packet
can match at most one flow, wildcarded or not. The current implementation
performs best-effort detection of overlapping wildcarded flows and may reject
some but not all of them. However, this behavior may change in future versions.
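
The mask semantics described in this section can be illustrated with a small
userspace sketch. It is not taken from the patch: `struct toy_flow`,
`toy_flow_matches()` and `toy_flows_overlap()` are hypothetical names, and a
single 32-bit integer stands in for the real key, which is a sequence of
Netlink attributes. The sketch only shows how '1' bits force an exact match
and how two wildcarded flows can be tested for overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit stand-in for a flow key and its mask. */
struct toy_flow {
	uint32_t key;	/* match value */
	uint32_t mask;	/* '1' = exact-match bit, '0' = don't-care bit */
};

/* A packet matches when it agrees with the key on every '1' mask bit. */
bool toy_flow_matches(const struct toy_flow *f, uint32_t pkt)
{
	return ((pkt ^ f->key) & f->mask) == 0;
}

/*
 * Two flows overlap when some packet could match both, i.e. when their
 * keys agree on every bit that both masks treat as exact match.
 */
bool toy_flows_overlap(const struct toy_flow *a, const struct toy_flow *b)
{
	return ((a->key ^ b->key) & a->mask & b->mask) == 0;
}

/* Small self-check showing typical use of the two helpers. */
void toy_flow_selfcheck(void)
{
	struct toy_flow net10 = { 0x0A000000u, 0xFF000000u };	/* 10.x.x.x */
	struct toy_flow net11 = { 0x0B000000u, 0xFF000000u };	/* 11.x.x.x */

	assert(toy_flow_matches(&net10, 0x0A010203u));
	assert(!toy_flow_matches(&net11, 0x0A010203u));
	assert(!toy_flows_overlap(&net10, &net11));
}
```

With a pairwise check of this kind, a user space program could verify before
installation that no packet can match more than one of its flows, rather than
relying on the kernel's best-effort detection.
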
Basic rule for evolving flow keys Basic rule for evolving flow keys
--------------------------------- ---------------------------------
......
...@@ -41,6 +41,7 @@ ...@@ -41,6 +41,7 @@
#define NEXTHDR_ICMP 58 /* ICMP for IPv6. */ #define NEXTHDR_ICMP 58 /* ICMP for IPv6. */
#define NEXTHDR_NONE 59 /* No next header */ #define NEXTHDR_NONE 59 /* No next header */
#define NEXTHDR_DEST 60 /* Destination options header. */ #define NEXTHDR_DEST 60 /* Destination options header. */
#define NEXTHDR_SCTP 132 /* SCTP message. */
#define NEXTHDR_MOBILITY 135 /* Mobility header. */ #define NEXTHDR_MOBILITY 135 /* Mobility header. */
#define NEXTHDR_MAX 255 #define NEXTHDR_MAX 255
......
/* /*
* Copyright (c) 2007-2011 Nicira Networks. * Copyright (c) 2007-2013 Nicira, Inc.
* *
* This program is free software; you can redistribute it and/or * This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public * modify it under the terms of version 2 of the GNU General Public
...@@ -259,6 +259,7 @@ enum ovs_key_attr { ...@@ -259,6 +259,7 @@ enum ovs_key_attr {
OVS_KEY_ATTR_ND, /* struct ovs_key_nd */ OVS_KEY_ATTR_ND, /* struct ovs_key_nd */
OVS_KEY_ATTR_SKB_MARK, /* u32 skb mark */ OVS_KEY_ATTR_SKB_MARK, /* u32 skb mark */
OVS_KEY_ATTR_TUNNEL, /* Nested set of ovs_tunnel attributes */ OVS_KEY_ATTR_TUNNEL, /* Nested set of ovs_tunnel attributes */
OVS_KEY_ATTR_SCTP, /* struct ovs_key_sctp */
#ifdef __KERNEL__ #ifdef __KERNEL__
OVS_KEY_ATTR_IPV4_TUNNEL, /* struct ovs_key_ipv4_tunnel */ OVS_KEY_ATTR_IPV4_TUNNEL, /* struct ovs_key_ipv4_tunnel */
...@@ -333,6 +334,11 @@ struct ovs_key_udp { ...@@ -333,6 +334,11 @@ struct ovs_key_udp {
__be16 udp_dst; __be16 udp_dst;
}; };
struct ovs_key_sctp {
__be16 sctp_src;
__be16 sctp_dst;
};
struct ovs_key_icmp { struct ovs_key_icmp {
__u8 icmp_type; __u8 icmp_type;
__u8 icmp_code; __u8 icmp_code;
...@@ -379,6 +385,12 @@ struct ovs_key_nd { ...@@ -379,6 +385,12 @@ struct ovs_key_nd {
* @OVS_FLOW_ATTR_CLEAR: If present in a %OVS_FLOW_CMD_SET request, clears the * @OVS_FLOW_ATTR_CLEAR: If present in a %OVS_FLOW_CMD_SET request, clears the
* last-used time, accumulated TCP flags, and statistics for this flow. * last-used time, accumulated TCP flags, and statistics for this flow.
* Otherwise ignored in requests. Never present in notifications. * Otherwise ignored in requests. Never present in notifications.
* @OVS_FLOW_ATTR_MASK: Nested %OVS_KEY_ATTR_* attributes specifying the
* mask bits for wildcarded flow match. Mask bit value '1' specifies exact
* match with corresponding flow key bit, while mask bit value '0' specifies
* a wildcarded match. Omitting a nested attribute is treated as wildcarding
* all corresponding fields. Optional for all requests. If the mask attribute
* itself is not present, all flow key bits are exact match bits.
* *
* These attributes follow the &struct ovs_header within the Generic Netlink * These attributes follow the &struct ovs_header within the Generic Netlink
* payload for %OVS_FLOW_* commands. * payload for %OVS_FLOW_* commands.
...@@ -391,6 +403,7 @@ enum ovs_flow_attr { ...@@ -391,6 +403,7 @@ enum ovs_flow_attr {
OVS_FLOW_ATTR_TCP_FLAGS, /* 8-bit OR'd TCP flags. */ OVS_FLOW_ATTR_TCP_FLAGS, /* 8-bit OR'd TCP flags. */
OVS_FLOW_ATTR_USED, /* u64 msecs last used in monotonic time. */ OVS_FLOW_ATTR_USED, /* u64 msecs last used in monotonic time. */
OVS_FLOW_ATTR_CLEAR, /* Flag to clear stats, tcp_flags, used. */ OVS_FLOW_ATTR_CLEAR, /* Flag to clear stats, tcp_flags, used. */
OVS_FLOW_ATTR_MASK, /* Sequence of OVS_KEY_ATTR_* attributes. */
__OVS_FLOW_ATTR_MAX __OVS_FLOW_ATTR_MAX
}; };
......
...@@ -4,6 +4,7 @@ ...@@ -4,6 +4,7 @@
config OPENVSWITCH config OPENVSWITCH
tristate "Open vSwitch" tristate "Open vSwitch"
select LIBCRC32C
---help--- ---help---
Open vSwitch is a multilayer Ethernet switch targeted at virtualized Open vSwitch is a multilayer Ethernet switch targeted at virtualized
environments. In addition to supporting a variety of features environments. In addition to supporting a variety of features
......
...@@ -10,10 +10,13 @@ openvswitch-y := \ ...@@ -10,10 +10,13 @@ openvswitch-y := \
dp_notify.o \ dp_notify.o \
flow.o \ flow.o \
vport.o \ vport.o \
vport-gre.o \
vport-internal_dev.o \ vport-internal_dev.o \
vport-netdev.o vport-netdev.o
ifneq ($(CONFIG_OPENVSWITCH_VXLAN),) ifneq ($(CONFIG_OPENVSWITCH_VXLAN),)
openvswitch-y += vport-vxlan.o openvswitch-y += vport-vxlan.o
endif endif
ifneq ($(CONFIG_OPENVSWITCH_GRE),)
openvswitch-y += vport-gre.o
endif
/* /*
* Copyright (c) 2007-2012 Nicira, Inc. * Copyright (c) 2007-2013 Nicira, Inc.
* *
* This program is free software; you can redistribute it and/or * This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public * modify it under the terms of version 2 of the GNU General Public
...@@ -22,6 +22,7 @@ ...@@ -22,6 +22,7 @@
#include <linux/in.h> #include <linux/in.h>
#include <linux/ip.h> #include <linux/ip.h>
#include <linux/openvswitch.h> #include <linux/openvswitch.h>
#include <linux/sctp.h>
#include <linux/tcp.h> #include <linux/tcp.h>
#include <linux/udp.h> #include <linux/udp.h>
#include <linux/in6.h> #include <linux/in6.h>
...@@ -31,6 +32,7 @@ ...@@ -31,6 +32,7 @@
#include <net/ipv6.h> #include <net/ipv6.h>
#include <net/checksum.h> #include <net/checksum.h>
#include <net/dsfield.h> #include <net/dsfield.h>
#include <net/sctp/checksum.h>
#include "datapath.h" #include "datapath.h"
#include "vport.h" #include "vport.h"
...@@ -352,6 +354,39 @@ static int set_tcp(struct sk_buff *skb, const struct ovs_key_tcp *tcp_port_key) ...@@ -352,6 +354,39 @@ static int set_tcp(struct sk_buff *skb, const struct ovs_key_tcp *tcp_port_key)
return 0; return 0;
} }
static int set_sctp(struct sk_buff *skb,
const struct ovs_key_sctp *sctp_port_key)
{
struct sctphdr *sh;
int err;
unsigned int sctphoff = skb_transport_offset(skb);
err = make_writable(skb, sctphoff + sizeof(struct sctphdr));
if (unlikely(err))
return err;
sh = sctp_hdr(skb);
if (sctp_port_key->sctp_src != sh->source ||
sctp_port_key->sctp_dst != sh->dest) {
__le32 old_correct_csum, new_csum, old_csum;
old_csum = sh->checksum;
old_correct_csum = sctp_compute_cksum(skb, sctphoff);
sh->source = sctp_port_key->sctp_src;
sh->dest = sctp_port_key->sctp_dst;
new_csum = sctp_compute_cksum(skb, sctphoff);
/* Carry any checksum errors through. */
sh->checksum = old_csum ^ old_correct_csum ^ new_csum;
skb->rxhash = 0;
}
return 0;
}
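
The "carry any checksum errors through" step in set_sctp() can be hard to see
at a glance. The following userspace sketch, which is not part of the patch,
reproduces the XOR arithmetic under loud assumptions: `struct toy_hdr`,
`toy_cksum()` and `toy_set_ports()` are hypothetical, and the toy checksum is
an arbitrary deterministic function rather than the CRC32c that SCTP uses.
The point is only that a packet with a correct checksum leaves with a correct
one, while a corrupted checksum stays wrong by the same bit pattern.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical header: two ports followed by a checksum field. */
struct toy_hdr {
	uint16_t src;
	uint16_t dst;
	uint32_t checksum;
};

/* Toy stand-in for a real checksum; covers the ports only. */
uint32_t toy_cksum(const struct toy_hdr *h)
{
	return ((uint32_t)h->src * 31u + h->dst) * 2654435761u;
}

/* Rewrite the ports while preserving any existing checksum error. */
void toy_set_ports(struct toy_hdr *h, uint16_t src, uint16_t dst)
{
	uint32_t old_csum = h->checksum;
	uint32_t old_correct = toy_cksum(h);

	h->src = src;
	h->dst = dst;
	/* old_csum ^ old_correct is zero only if the packet arrived intact. */
	h->checksum = old_csum ^ old_correct ^ toy_cksum(h);
}

/* Small self-check: an intact packet stays intact after the rewrite. */
void toy_set_ports_selfcheck(void)
{
	struct toy_hdr h = { 1000, 2000, 0 };

	h.checksum = toy_cksum(&h);
	toy_set_ports(&h, 1111, 2222);
	assert(h.checksum == toy_cksum(&h));
}
```

Simply storing the freshly computed checksum would silently repair packets
that were already corrupt on arrival; XOR-ing in the old error term avoids
that.
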
static int do_output(struct datapath *dp, struct sk_buff *skb, int out_port) static int do_output(struct datapath *dp, struct sk_buff *skb, int out_port)
{ {
struct vport *vport; struct vport *vport;
...@@ -376,8 +411,10 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb, ...@@ -376,8 +411,10 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
const struct nlattr *a; const struct nlattr *a;
int rem; int rem;
BUG_ON(!OVS_CB(skb)->pkt_key);
upcall.cmd = OVS_PACKET_CMD_ACTION; upcall.cmd = OVS_PACKET_CMD_ACTION;
upcall.key = &OVS_CB(skb)->flow->key; upcall.key = OVS_CB(skb)->pkt_key;
upcall.userdata = NULL; upcall.userdata = NULL;
upcall.portid = 0; upcall.portid = 0;
...@@ -459,6 +496,10 @@ static int execute_set_action(struct sk_buff *skb, ...@@ -459,6 +496,10 @@ static int execute_set_action(struct sk_buff *skb,
case OVS_KEY_ATTR_UDP: case OVS_KEY_ATTR_UDP:
err = set_udp(skb, nla_data(nested_attr)); err = set_udp(skb, nla_data(nested_attr));
break; break;
case OVS_KEY_ATTR_SCTP:
err = set_sctp(skb, nla_data(nested_attr));
break;
} }
return err; return err;
......
/* /*
* Copyright (c) 2007-2012 Nicira, Inc. * Copyright (c) 2007-2013 Nicira, Inc.
* *
* This program is free software; you can redistribute it and/or * This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public * modify it under the terms of version 2 of the GNU General Public
...@@ -165,7 +165,7 @@ static void destroy_dp_rcu(struct rcu_head *rcu) ...@@ -165,7 +165,7 @@ static void destroy_dp_rcu(struct rcu_head *rcu)
{ {
struct datapath *dp = container_of(rcu, struct datapath, rcu); struct datapath *dp = container_of(rcu, struct datapath, rcu);
ovs_flow_tbl_destroy((__force struct flow_table *)dp->table); ovs_flow_tbl_destroy((__force struct flow_table *)dp->table, false);
free_percpu(dp->stats_percpu); free_percpu(dp->stats_percpu);
release_net(ovs_dp_get_net(dp)); release_net(ovs_dp_get_net(dp));
kfree(dp->ports); kfree(dp->ports);
...@@ -226,19 +226,18 @@ void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb) ...@@ -226,19 +226,18 @@ void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb)
struct sw_flow_key key; struct sw_flow_key key;
u64 *stats_counter; u64 *stats_counter;
int error; int error;
int key_len;
stats = this_cpu_ptr(dp->stats_percpu); stats = this_cpu_ptr(dp->stats_percpu);
/* Extract flow from 'skb' into 'key'. */ /* Extract flow from 'skb' into 'key'. */
error = ovs_flow_extract(skb, p->port_no, &key, &key_len); error = ovs_flow_extract(skb, p->port_no, &key);
if (unlikely(error)) { if (unlikely(error)) {
kfree_skb(skb); kfree_skb(skb);
return; return;
} }
/* Look up flow. */ /* Look up flow. */
flow = ovs_flow_tbl_lookup(rcu_dereference(dp->table), &key, key_len); flow = ovs_flow_lookup(rcu_dereference(dp->table), &key);
if (unlikely(!flow)) { if (unlikely(!flow)) {
struct dp_upcall_info upcall; struct dp_upcall_info upcall;
...@@ -253,6 +252,7 @@ void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb) ...@@ -253,6 +252,7 @@ void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb)
} }
OVS_CB(skb)->flow = flow; OVS_CB(skb)->flow = flow;
OVS_CB(skb)->pkt_key = &key;
stats_counter = &stats->n_hit; stats_counter = &stats->n_hit;
ovs_flow_used(OVS_CB(skb)->flow, skb); ovs_flow_used(OVS_CB(skb)->flow, skb);
...@@ -435,7 +435,7 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex, ...@@ -435,7 +435,7 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
upcall->dp_ifindex = dp_ifindex; upcall->dp_ifindex = dp_ifindex;
nla = nla_nest_start(user_skb, OVS_PACKET_ATTR_KEY); nla = nla_nest_start(user_skb, OVS_PACKET_ATTR_KEY);
ovs_flow_to_nlattrs(upcall_info->key, user_skb); ovs_flow_to_nlattrs(upcall_info->key, upcall_info->key, user_skb);
nla_nest_end(user_skb, nla); nla_nest_end(user_skb, nla);
if (upcall_info->userdata) if (upcall_info->userdata)
...@@ -468,7 +468,7 @@ static int flush_flows(struct datapath *dp) ...@@ -468,7 +468,7 @@ static int flush_flows(struct datapath *dp)
rcu_assign_pointer(dp->table, new_table); rcu_assign_pointer(dp->table, new_table);
ovs_flow_tbl_deferred_destroy(old_table); ovs_flow_tbl_destroy(old_table, true);
return 0; return 0;
} }
...@@ -611,10 +611,12 @@ static int validate_tp_port(const struct sw_flow_key *flow_key) ...@@ -611,10 +611,12 @@ static int validate_tp_port(const struct sw_flow_key *flow_key)
static int validate_and_copy_set_tun(const struct nlattr *attr, static int validate_and_copy_set_tun(const struct nlattr *attr,
struct sw_flow_actions **sfa) struct sw_flow_actions **sfa)
{ {
struct ovs_key_ipv4_tunnel tun_key; struct sw_flow_match match;
struct sw_flow_key key;
int err, start; int err, start;
err = ovs_ipv4_tun_from_nlattr(nla_data(attr), &tun_key); ovs_match_init(&match, &key, NULL);
err = ovs_ipv4_tun_from_nlattr(nla_data(attr), &match, false);
if (err) if (err)
return err; return err;
...@@ -622,7 +624,8 @@ static int validate_and_copy_set_tun(const struct nlattr *attr, ...@@ -622,7 +624,8 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
if (start < 0) if (start < 0)
return start; return start;
err = add_action(sfa, OVS_KEY_ATTR_IPV4_TUNNEL, &tun_key, sizeof(tun_key)); err = add_action(sfa, OVS_KEY_ATTR_IPV4_TUNNEL, &match.key->tun_key,
sizeof(match.key->tun_key));
add_nested_action_end(*sfa, start); add_nested_action_end(*sfa, start);
return err; return err;
...@@ -709,6 +712,12 @@ static int validate_set(const struct nlattr *a, ...@@ -709,6 +712,12 @@ static int validate_set(const struct nlattr *a,
return validate_tp_port(flow_key); return validate_tp_port(flow_key);
case OVS_KEY_ATTR_SCTP:
if (flow_key->ip.proto != IPPROTO_SCTP)
return -EINVAL;
return validate_tp_port(flow_key);
default: default:
return -EINVAL; return -EINVAL;
} }
...@@ -857,7 +866,6 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info) ...@@ -857,7 +866,6 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
struct ethhdr *eth; struct ethhdr *eth;
int len; int len;
int err; int err;
int key_len;
err = -EINVAL; err = -EINVAL;
if (!a[OVS_PACKET_ATTR_PACKET] || !a[OVS_PACKET_ATTR_KEY] || if (!a[OVS_PACKET_ATTR_PACKET] || !a[OVS_PACKET_ATTR_KEY] ||
...@@ -890,11 +898,11 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info) ...@@ -890,11 +898,11 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
if (IS_ERR(flow)) if (IS_ERR(flow))
goto err_kfree_skb; goto err_kfree_skb;
err = ovs_flow_extract(packet, -1, &flow->key, &key_len); err = ovs_flow_extract(packet, -1, &flow->key);
if (err) if (err)
goto err_flow_free; goto err_flow_free;
err = ovs_flow_metadata_from_nlattrs(flow, key_len, a[OVS_PACKET_ATTR_KEY]); err = ovs_flow_metadata_from_nlattrs(flow, a[OVS_PACKET_ATTR_KEY]);
if (err) if (err)
goto err_flow_free; goto err_flow_free;
acts = ovs_flow_actions_alloc(nla_len(a[OVS_PACKET_ATTR_ACTIONS])); acts = ovs_flow_actions_alloc(nla_len(a[OVS_PACKET_ATTR_ACTIONS]));
...@@ -908,6 +916,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info) ...@@ -908,6 +916,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
goto err_flow_free; goto err_flow_free;
OVS_CB(packet)->flow = flow; OVS_CB(packet)->flow = flow;
OVS_CB(packet)->pkt_key = &flow->key;
packet->priority = flow->key.phy.priority; packet->priority = flow->key.phy.priority;
packet->mark = flow->key.phy.skb_mark; packet->mark = flow->key.phy.skb_mark;
...@@ -922,13 +931,13 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info) ...@@ -922,13 +931,13 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
local_bh_enable(); local_bh_enable();
rcu_read_unlock(); rcu_read_unlock();
ovs_flow_free(flow); ovs_flow_free(flow, false);
return err; return err;
err_unlock: err_unlock:
rcu_read_unlock(); rcu_read_unlock();
err_flow_free: err_flow_free:
ovs_flow_free(flow); ovs_flow_free(flow, false);
err_kfree_skb: err_kfree_skb:
kfree_skb(packet); kfree_skb(packet);
err: err:
...@@ -951,9 +960,10 @@ static struct genl_ops dp_packet_genl_ops[] = { ...@@ -951,9 +960,10 @@ static struct genl_ops dp_packet_genl_ops[] = {
static void get_dp_stats(struct datapath *dp, struct ovs_dp_stats *stats) static void get_dp_stats(struct datapath *dp, struct ovs_dp_stats *stats)
{ {
struct flow_table *table;
int i; int i;
struct flow_table *table = ovsl_dereference(dp->table);
table = rcu_dereference_check(dp->table, lockdep_ovsl_is_held());
stats->n_flows = ovs_flow_tbl_count(table); stats->n_flows = ovs_flow_tbl_count(table);
stats->n_hit = stats->n_missed = stats->n_lost = 0; stats->n_hit = stats->n_missed = stats->n_lost = 0;
...@@ -1044,7 +1054,8 @@ static int set_action_to_attr(const struct nlattr *a, struct sk_buff *skb) ...@@ -1044,7 +1054,8 @@ static int set_action_to_attr(const struct nlattr *a, struct sk_buff *skb)
if (!start) if (!start)
return -EMSGSIZE; return -EMSGSIZE;
err = ovs_ipv4_tun_to_nlattr(skb, nla_data(ovs_key)); err = ovs_ipv4_tun_to_nlattr(skb, nla_data(ovs_key),
nla_data(ovs_key));
if (err) if (err)
return err; return err;
nla_nest_end(skb, start); nla_nest_end(skb, start);
...@@ -1092,6 +1103,7 @@ static size_t ovs_flow_cmd_msg_size(const struct sw_flow_actions *acts) ...@@ -1092,6 +1103,7 @@ static size_t ovs_flow_cmd_msg_size(const struct sw_flow_actions *acts)
{ {
return NLMSG_ALIGN(sizeof(struct ovs_header)) return NLMSG_ALIGN(sizeof(struct ovs_header))
+ nla_total_size(key_attr_size()) /* OVS_FLOW_ATTR_KEY */ + nla_total_size(key_attr_size()) /* OVS_FLOW_ATTR_KEY */
+ nla_total_size(key_attr_size()) /* OVS_FLOW_ATTR_MASK */
+ nla_total_size(sizeof(struct ovs_flow_stats)) /* OVS_FLOW_ATTR_STATS */ + nla_total_size(sizeof(struct ovs_flow_stats)) /* OVS_FLOW_ATTR_STATS */
+ nla_total_size(1) /* OVS_FLOW_ATTR_TCP_FLAGS */ + nla_total_size(1) /* OVS_FLOW_ATTR_TCP_FLAGS */
+ nla_total_size(8) /* OVS_FLOW_ATTR_USED */ + nla_total_size(8) /* OVS_FLOW_ATTR_USED */
...@@ -1104,7 +1116,6 @@ static int ovs_flow_cmd_fill_info(struct sw_flow *flow, struct datapath *dp, ...@@ -1104,7 +1116,6 @@ static int ovs_flow_cmd_fill_info(struct sw_flow *flow, struct datapath *dp,
u32 seq, u32 flags, u8 cmd) u32 seq, u32 flags, u8 cmd)
{ {
const int skb_orig_len = skb->len; const int skb_orig_len = skb->len;
const struct sw_flow_actions *sf_acts;
struct nlattr *start; struct nlattr *start;
struct ovs_flow_stats stats; struct ovs_flow_stats stats;
struct ovs_header *ovs_header; struct ovs_header *ovs_header;
...@@ -1113,20 +1124,31 @@ static int ovs_flow_cmd_fill_info(struct sw_flow *flow, struct datapath *dp, ...@@ -1113,20 +1124,31 @@ static int ovs_flow_cmd_fill_info(struct sw_flow *flow, struct datapath *dp,
u8 tcp_flags; u8 tcp_flags;
int err; int err;
sf_acts = ovsl_dereference(flow->sf_acts);
ovs_header = genlmsg_put(skb, portid, seq, &dp_flow_genl_family, flags, cmd); ovs_header = genlmsg_put(skb, portid, seq, &dp_flow_genl_family, flags, cmd);
if (!ovs_header) if (!ovs_header)
return -EMSGSIZE; return -EMSGSIZE;
ovs_header->dp_ifindex = get_dpifindex(dp); ovs_header->dp_ifindex = get_dpifindex(dp);
/* Fill flow key. */
nla = nla_nest_start(skb, OVS_FLOW_ATTR_KEY); nla = nla_nest_start(skb, OVS_FLOW_ATTR_KEY);
if (!nla) if (!nla)
goto nla_put_failure; goto nla_put_failure;
err = ovs_flow_to_nlattrs(&flow->key, skb);
err = ovs_flow_to_nlattrs(&flow->unmasked_key,
&flow->unmasked_key, skb);
if (err)
goto error;
nla_nest_end(skb, nla);
nla = nla_nest_start(skb, OVS_FLOW_ATTR_MASK);
if (!nla)
goto nla_put_failure;
err = ovs_flow_to_nlattrs(&flow->key, &flow->mask->key, skb);
if (err) if (err)
goto error; goto error;
nla_nest_end(skb, nla); nla_nest_end(skb, nla);
spin_lock_bh(&flow->lock); spin_lock_bh(&flow->lock);
...@@ -1161,6 +1183,11 @@ static int ovs_flow_cmd_fill_info(struct sw_flow *flow, struct datapath *dp, ...@@ -1161,6 +1183,11 @@ static int ovs_flow_cmd_fill_info(struct sw_flow *flow, struct datapath *dp,
*/ */
start = nla_nest_start(skb, OVS_FLOW_ATTR_ACTIONS); start = nla_nest_start(skb, OVS_FLOW_ATTR_ACTIONS);
if (start) { if (start) {
const struct sw_flow_actions *sf_acts;
sf_acts = rcu_dereference_check(flow->sf_acts,
lockdep_ovsl_is_held());
err = actions_to_attr(sf_acts->actions, sf_acts->actions_len, skb); err = actions_to_attr(sf_acts->actions, sf_acts->actions_len, skb);
if (!err) if (!err)
nla_nest_end(skb, start); nla_nest_end(skb, start);
...@@ -1211,20 +1238,24 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info) ...@@ -1211,20 +1238,24 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
{ {
struct nlattr **a = info->attrs; struct nlattr **a = info->attrs;
struct ovs_header *ovs_header = info->userhdr; struct ovs_header *ovs_header = info->userhdr;
struct sw_flow_key key; struct sw_flow_key key, masked_key;
struct sw_flow *flow; struct sw_flow *flow = NULL;
struct sw_flow_mask mask;
struct sk_buff *reply; struct sk_buff *reply;
struct datapath *dp; struct datapath *dp;
struct flow_table *table; struct flow_table *table;
struct sw_flow_actions *acts = NULL; struct sw_flow_actions *acts = NULL;
struct sw_flow_match match;
int error; int error;
int key_len;
/* Extract key. */ /* Extract key. */
error = -EINVAL; error = -EINVAL;
if (!a[OVS_FLOW_ATTR_KEY]) if (!a[OVS_FLOW_ATTR_KEY])
goto error; goto error;
error = ovs_flow_from_nlattrs(&key, &key_len, a[OVS_FLOW_ATTR_KEY]);
ovs_match_init(&match, &key, &mask);
error = ovs_match_from_nlattrs(&match,
a[OVS_FLOW_ATTR_KEY], a[OVS_FLOW_ATTR_MASK]);
if (error) if (error)
goto error; goto error;
...@@ -1235,9 +1266,13 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info) ...@@ -1235,9 +1266,13 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
if (IS_ERR(acts)) if (IS_ERR(acts))
goto error; goto error;
error = validate_and_copy_actions(a[OVS_FLOW_ATTR_ACTIONS], &key, 0, &acts); ovs_flow_key_mask(&masked_key, &key, &mask);
if (error) error = validate_and_copy_actions(a[OVS_FLOW_ATTR_ACTIONS],
&masked_key, 0, &acts);
if (error) {
OVS_NLERR("Flow actions may not be safe on all matching packets.\n");
goto err_kfree; goto err_kfree;
}
} else if (info->genlhdr->cmd == OVS_FLOW_CMD_NEW) { } else if (info->genlhdr->cmd == OVS_FLOW_CMD_NEW) {
error = -EINVAL; error = -EINVAL;
goto error; goto error;
...@@ -1250,8 +1285,11 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info) ...@@ -1250,8 +1285,11 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
goto err_unlock_ovs; goto err_unlock_ovs;
table = ovsl_dereference(dp->table); table = ovsl_dereference(dp->table);
flow = ovs_flow_tbl_lookup(table, &key, key_len);
/* Check if this is a duplicate flow */
flow = ovs_flow_lookup(table, &key);
if (!flow) { if (!flow) {
struct sw_flow_mask *mask_p;
/* Bail out if we're not allowed to create a new flow. */ /* Bail out if we're not allowed to create a new flow. */
error = -ENOENT; error = -ENOENT;
if (info->genlhdr->cmd == OVS_FLOW_CMD_SET) if (info->genlhdr->cmd == OVS_FLOW_CMD_SET)
...@@ -1264,7 +1302,7 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info) ...@@ -1264,7 +1302,7 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
new_table = ovs_flow_tbl_expand(table); new_table = ovs_flow_tbl_expand(table);
if (!IS_ERR(new_table)) { if (!IS_ERR(new_table)) {
rcu_assign_pointer(dp->table, new_table); rcu_assign_pointer(dp->table, new_table);
ovs_flow_tbl_deferred_destroy(table); ovs_flow_tbl_destroy(table, true);
table = ovsl_dereference(dp->table); table = ovsl_dereference(dp->table);
} }
} }
...@@ -1277,14 +1315,30 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info) ...@@ -1277,14 +1315,30 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
} }
clear_stats(flow); clear_stats(flow);
flow->key = masked_key;
flow->unmasked_key = key;
/* Make sure mask is unique in the system */
mask_p = ovs_sw_flow_mask_find(table, &mask);
if (!mask_p) {
/* Allocate a new mask if none exists. */
mask_p = ovs_sw_flow_mask_alloc();
if (!mask_p)
goto err_flow_free;
mask_p->key = mask.key;
mask_p->range = mask.range;
ovs_sw_flow_mask_insert(table, mask_p);
}
ovs_sw_flow_mask_add_ref(mask_p);
flow->mask = mask_p;
rcu_assign_pointer(flow->sf_acts, acts); rcu_assign_pointer(flow->sf_acts, acts);
/* Put flow in bucket. */ /* Put flow in bucket. */
ovs_flow_tbl_insert(table, flow, &key, key_len); ovs_flow_insert(table, flow);
reply = ovs_flow_cmd_build_info(flow, dp, info->snd_portid, reply = ovs_flow_cmd_build_info(flow, dp, info->snd_portid,
info->snd_seq, info->snd_seq, OVS_FLOW_CMD_NEW);
OVS_FLOW_CMD_NEW);
} else { } else {
/* We found a matching flow. */ /* We found a matching flow. */
struct sw_flow_actions *old_acts; struct sw_flow_actions *old_acts;
...@@ -1300,6 +1354,13 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info) ...@@ -1300,6 +1354,13 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
info->nlhdr->nlmsg_flags & (NLM_F_CREATE | NLM_F_EXCL)) info->nlhdr->nlmsg_flags & (NLM_F_CREATE | NLM_F_EXCL))
goto err_unlock_ovs; goto err_unlock_ovs;
/* The unmasked key has to be the same for flow updates. */
error = -EINVAL;
if (!ovs_flow_cmp_unmasked_key(flow, &key, match.range.end)) {
OVS_NLERR("Flow modification message rejected, unmasked key does not match.\n");
goto err_unlock_ovs;
}
/* Update actions. */ /* Update actions. */
old_acts = ovsl_dereference(flow->sf_acts); old_acts = ovsl_dereference(flow->sf_acts);
rcu_assign_pointer(flow->sf_acts, acts); rcu_assign_pointer(flow->sf_acts, acts);
...@@ -1324,6 +1385,8 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info) ...@@ -1324,6 +1385,8 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
ovs_dp_flow_multicast_group.id, PTR_ERR(reply)); ovs_dp_flow_multicast_group.id, PTR_ERR(reply));
return 0; return 0;
err_flow_free:
ovs_flow_free(flow, false);
err_unlock_ovs: err_unlock_ovs:
ovs_unlock(); ovs_unlock();
err_kfree: err_kfree:
...@@ -1341,12 +1404,16 @@ static int ovs_flow_cmd_get(struct sk_buff *skb, struct genl_info *info) ...@@ -1341,12 +1404,16 @@ static int ovs_flow_cmd_get(struct sk_buff *skb, struct genl_info *info)
struct sw_flow *flow; struct sw_flow *flow;
struct datapath *dp; struct datapath *dp;
struct flow_table *table; struct flow_table *table;
struct sw_flow_match match;
int err; int err;
int key_len;
if (!a[OVS_FLOW_ATTR_KEY]) if (!a[OVS_FLOW_ATTR_KEY]) {
OVS_NLERR("Flow get message rejected, Key attribute missing.\n");
return -EINVAL; return -EINVAL;
err = ovs_flow_from_nlattrs(&key, &key_len, a[OVS_FLOW_ATTR_KEY]); }
ovs_match_init(&match, &key, NULL);
err = ovs_match_from_nlattrs(&match, a[OVS_FLOW_ATTR_KEY], NULL);
if (err) if (err)
return err; return err;
...@@ -1358,7 +1425,7 @@ static int ovs_flow_cmd_get(struct sk_buff *skb, struct genl_info *info) ...@@ -1358,7 +1425,7 @@ static int ovs_flow_cmd_get(struct sk_buff *skb, struct genl_info *info)
} }
table = ovsl_dereference(dp->table); table = ovsl_dereference(dp->table);
flow = ovs_flow_tbl_lookup(table, &key, key_len); flow = ovs_flow_lookup_unmasked_key(table, &match);
if (!flow) { if (!flow) {
err = -ENOENT; err = -ENOENT;
goto unlock; goto unlock;
...@@ -1387,8 +1454,8 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct genl_info *info) ...@@ -1387,8 +1454,8 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct genl_info *info)
struct sw_flow *flow; struct sw_flow *flow;
struct datapath *dp; struct datapath *dp;
struct flow_table *table; struct flow_table *table;
struct sw_flow_match match;
int err; int err;
int key_len;
ovs_lock(); ovs_lock();
dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex); dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
...@@ -1401,12 +1468,14 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct genl_info *info) ...@@ -1401,12 +1468,14 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct genl_info *info)
err = flush_flows(dp); err = flush_flows(dp);
goto unlock; goto unlock;
} }
err = ovs_flow_from_nlattrs(&key, &key_len, a[OVS_FLOW_ATTR_KEY]);
ovs_match_init(&match, &key, NULL);
err = ovs_match_from_nlattrs(&match, a[OVS_FLOW_ATTR_KEY], NULL);
if (err) if (err)
goto unlock; goto unlock;
table = ovsl_dereference(dp->table); table = ovsl_dereference(dp->table);
flow = ovs_flow_tbl_lookup(table, &key, key_len); flow = ovs_flow_lookup_unmasked_key(table, &match);
if (!flow) { if (!flow) {
err = -ENOENT; err = -ENOENT;
goto unlock; goto unlock;
...@@ -1418,13 +1487,13 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct genl_info *info) ...@@ -1418,13 +1487,13 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct genl_info *info)
goto unlock; goto unlock;
} }
ovs_flow_tbl_remove(table, flow); ovs_flow_remove(table, flow);
err = ovs_flow_cmd_fill_info(flow, dp, reply, info->snd_portid, err = ovs_flow_cmd_fill_info(flow, dp, reply, info->snd_portid,
info->snd_seq, 0, OVS_FLOW_CMD_DEL); info->snd_seq, 0, OVS_FLOW_CMD_DEL);
BUG_ON(err < 0); BUG_ON(err < 0);
ovs_flow_deferred_free(flow); ovs_flow_free(flow, true);
ovs_unlock(); ovs_unlock();
ovs_notify(reply, info, &ovs_dp_flow_multicast_group); ovs_notify(reply, info, &ovs_dp_flow_multicast_group);
...@@ -1440,22 +1509,21 @@ static int ovs_flow_cmd_dump(struct sk_buff *skb, struct netlink_callback *cb) ...@@ -1440,22 +1509,21 @@ static int ovs_flow_cmd_dump(struct sk_buff *skb, struct netlink_callback *cb)
struct datapath *dp; struct datapath *dp;
struct flow_table *table; struct flow_table *table;
ovs_lock(); rcu_read_lock();
dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex); dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
if (!dp) { if (!dp) {
ovs_unlock(); rcu_read_unlock();
return -ENODEV; return -ENODEV;
} }
table = ovsl_dereference(dp->table); table = rcu_dereference(dp->table);
for (;;) { for (;;) {
struct sw_flow *flow; struct sw_flow *flow;
u32 bucket, obj; u32 bucket, obj;
bucket = cb->args[0]; bucket = cb->args[0];
obj = cb->args[1]; obj = cb->args[1];
flow = ovs_flow_tbl_next(table, &bucket, &obj); flow = ovs_flow_dump_next(table, &bucket, &obj);
if (!flow) if (!flow)
break; break;
...@@ -1468,7 +1536,7 @@ static int ovs_flow_cmd_dump(struct sk_buff *skb, struct netlink_callback *cb) ...@@ -1468,7 +1536,7 @@ static int ovs_flow_cmd_dump(struct sk_buff *skb, struct netlink_callback *cb)
cb->args[0] = bucket; cb->args[0] = bucket;
cb->args[1] = obj; cb->args[1] = obj;
} }
ovs_unlock(); rcu_read_unlock();
return skb->len; return skb->len;
} }
...@@ -1664,7 +1732,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info) ...@@ -1664,7 +1732,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
goto err_destroy_local_port; goto err_destroy_local_port;
ovs_net = net_generic(ovs_dp_get_net(dp), ovs_net_id); ovs_net = net_generic(ovs_dp_get_net(dp), ovs_net_id);
list_add_tail(&dp->list_node, &ovs_net->dps); list_add_tail_rcu(&dp->list_node, &ovs_net->dps);
ovs_unlock(); ovs_unlock();
...@@ -1678,7 +1746,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info) ...@@ -1678,7 +1746,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
err_destroy_percpu: err_destroy_percpu:
free_percpu(dp->stats_percpu); free_percpu(dp->stats_percpu);
err_destroy_table: err_destroy_table:
ovs_flow_tbl_destroy(ovsl_dereference(dp->table)); ovs_flow_tbl_destroy(ovsl_dereference(dp->table), false);
err_free_dp: err_free_dp:
release_net(ovs_dp_get_net(dp)); release_net(ovs_dp_get_net(dp));
kfree(dp); kfree(dp);
...@@ -1702,7 +1770,7 @@ static void __dp_destroy(struct datapath *dp) ...@@ -1702,7 +1770,7 @@ static void __dp_destroy(struct datapath *dp)
ovs_dp_detach_port(vport); ovs_dp_detach_port(vport);
} }
list_del(&dp->list_node); list_del_rcu(&dp->list_node);
/* OVSP_LOCAL is datapath internal port. We need to make sure that /* OVSP_LOCAL is datapath internal port. We need to make sure that
* all port in datapath are destroyed first before freeing datapath. * all port in datapath are destroyed first before freeing datapath.
...@@ -1807,8 +1875,8 @@ static int ovs_dp_cmd_dump(struct sk_buff *skb, struct netlink_callback *cb) ...@@ -1807,8 +1875,8 @@ static int ovs_dp_cmd_dump(struct sk_buff *skb, struct netlink_callback *cb)
int skip = cb->args[0]; int skip = cb->args[0];
int i = 0; int i = 0;
ovs_lock(); rcu_read_lock();
list_for_each_entry(dp, &ovs_net->dps, list_node) { list_for_each_entry_rcu(dp, &ovs_net->dps, list_node) {
if (i >= skip && if (i >= skip &&
ovs_dp_cmd_fill_info(dp, skb, NETLINK_CB(cb->skb).portid, ovs_dp_cmd_fill_info(dp, skb, NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq, NLM_F_MULTI, cb->nlh->nlmsg_seq, NLM_F_MULTI,
...@@ -1816,7 +1884,7 @@ static int ovs_dp_cmd_dump(struct sk_buff *skb, struct netlink_callback *cb) ...@@ -1816,7 +1884,7 @@ static int ovs_dp_cmd_dump(struct sk_buff *skb, struct netlink_callback *cb)
break; break;
i++; i++;
} }
ovs_unlock(); rcu_read_unlock();
cb->args[0] = i; cb->args[0] = i;
...@@ -2285,7 +2353,7 @@ static void rehash_flow_table(struct work_struct *work) ...@@ -2285,7 +2353,7 @@ static void rehash_flow_table(struct work_struct *work)
new_table = ovs_flow_tbl_rehash(old_table); new_table = ovs_flow_tbl_rehash(old_table);
if (!IS_ERR(new_table)) { if (!IS_ERR(new_table)) {
rcu_assign_pointer(dp->table, new_table); rcu_assign_pointer(dp->table, new_table);
ovs_flow_tbl_deferred_destroy(old_table); ovs_flow_tbl_destroy(old_table, true);
} }
} }
} }
......
...@@ -88,11 +88,13 @@ struct datapath { ...@@ -88,11 +88,13 @@ struct datapath {
/** /**
* struct ovs_skb_cb - OVS data in skb CB * struct ovs_skb_cb - OVS data in skb CB
* @flow: The flow associated with this packet. May be %NULL if no flow. * @flow: The flow associated with this packet. May be %NULL if no flow.
* @pkt_key: The flow information extracted from the packet. Must be nonnull.
* @tun_key: Key for the tunnel that encapsulated this packet. NULL if the * @tun_key: Key for the tunnel that encapsulated this packet. NULL if the
* packet is not being tunneled. * packet is not being tunneled.
*/ */
struct ovs_skb_cb { struct ovs_skb_cb {
struct sw_flow *flow; struct sw_flow *flow;
struct sw_flow_key *pkt_key;
struct ovs_key_ipv4_tunnel *tun_key; struct ovs_key_ipv4_tunnel *tun_key;
}; };
#define OVS_CB(skb) ((struct ovs_skb_cb *)(skb)->cb) #define OVS_CB(skb) ((struct ovs_skb_cb *)(skb)->cb)
...@@ -183,4 +185,8 @@ struct sk_buff *ovs_vport_cmd_build_info(struct vport *, u32 pid, u32 seq, ...@@ -183,4 +185,8 @@ struct sk_buff *ovs_vport_cmd_build_info(struct vport *, u32 pid, u32 seq,
int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb); int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb);
void ovs_dp_notify_wq(struct work_struct *work); void ovs_dp_notify_wq(struct work_struct *work);
#define OVS_NLERR(fmt, ...) \
pr_info_once("netlink: " fmt, ##__VA_ARGS__)
#endif /* datapath.h */ #endif /* datapath.h */
/* /*
* Copyright (c) 2007-2011 Nicira, Inc. * Copyright (c) 2007-2013 Nicira, Inc.
* *
* This program is free software; you can redistribute it and/or * This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public * modify it under the terms of version 2 of the GNU General Public
...@@ -34,6 +34,7 @@ ...@@ -34,6 +34,7 @@
#include <linux/if_arp.h> #include <linux/if_arp.h>
#include <linux/ip.h> #include <linux/ip.h>
#include <linux/ipv6.h> #include <linux/ipv6.h>
#include <linux/sctp.h>
#include <linux/tcp.h> #include <linux/tcp.h>
#include <linux/udp.h> #include <linux/udp.h>
#include <linux/icmp.h> #include <linux/icmp.h>
...@@ -46,6 +47,202 @@ ...@@ -46,6 +47,202 @@
static struct kmem_cache *flow_cache; static struct kmem_cache *flow_cache;
static void ovs_sw_flow_mask_set(struct sw_flow_mask *mask,
struct sw_flow_key_range *range, u8 val);
static void update_range__(struct sw_flow_match *match,
size_t offset, size_t size, bool is_mask)
{
struct sw_flow_key_range *range = NULL;
size_t start = rounddown(offset, sizeof(long));
size_t end = roundup(offset + size, sizeof(long));
if (!is_mask)
range = &match->range;
else if (match->mask)
range = &match->mask->range;
if (!range)
return;
if (range->start == range->end) {
range->start = start;
range->end = end;
return;
}
if (range->start > start)
range->start = start;
if (range->end < end)
range->end = end;
}
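The range bookkeeping in update_range__() can be tried outside the kernel. The following is a minimal user-space sketch, not the kernel code: `struct key_range`, `round_down()`, `round_up()` and `range_update()` are hypothetical stand-ins for `struct sw_flow_key_range`, `rounddown()`, `roundup()` and `update_range__()`. It shows that each field widens the tracked byte range to the nearest `long` boundaries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sw_flow_key_range. */
struct key_range {
	size_t start;
	size_t end;
};

/* User-space equivalents of the kernel's rounddown()/roundup(). */
static size_t round_down(size_t x, size_t align)
{
	return x - (x % align);
}

static size_t round_up(size_t x, size_t align)
{
	return round_down(x + align - 1, align);
}

/* Grow 'range' so that it covers [offset, offset + size), aligned to long. */
static void range_update(struct key_range *range, size_t offset, size_t size)
{
	size_t start = round_down(offset, sizeof(long));
	size_t end = round_up(offset + size, sizeof(long));

	assert(range->start <= range->end);

	if (range->start == range->end) {
		/* Empty range: adopt the span of the first field. */
		range->start = start;
		range->end = end;
		return;
	}
	if (range->start > start)
		range->start = start;
	if (range->end < end)
		range->end = end;
}
```

Keeping both ends aligned to `sizeof(long)` is what appears to let the later masking and comparison loops work one `long` at a time.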
#define SW_FLOW_KEY_PUT(match, field, value, is_mask) \
do { \
update_range__(match, offsetof(struct sw_flow_key, field), \
sizeof((match)->key->field), is_mask); \
if (is_mask) { \
if ((match)->mask) \
(match)->mask->key.field = value; \
} else { \
(match)->key->field = value; \
} \
} while (0)
#define SW_FLOW_KEY_MEMCPY(match, field, value_p, len, is_mask) \
do { \
update_range__(match, offsetof(struct sw_flow_key, field), \
len, is_mask); \
if (is_mask) { \
if ((match)->mask) \
memcpy(&(match)->mask->key.field, value_p, len);\
} else { \
memcpy(&(match)->key->field, value_p, len); \
} \
} while (0)
static u16 range_n_bytes(const struct sw_flow_key_range *range)
{
return range->end - range->start;
}
void ovs_match_init(struct sw_flow_match *match,
struct sw_flow_key *key,
struct sw_flow_mask *mask)
{
memset(match, 0, sizeof(*match));
match->key = key;
match->mask = mask;
memset(key, 0, sizeof(*key));
if (mask) {
memset(&mask->key, 0, sizeof(mask->key));
mask->range.start = mask->range.end = 0;
}
}
static bool ovs_match_validate(const struct sw_flow_match *match,
u64 key_attrs, u64 mask_attrs)
{
u64 key_expected = 1 << OVS_KEY_ATTR_ETHERNET;
u64 mask_allowed = key_attrs; /* At most allow all key attributes */
/* The following mask attributes are allowed only if they
* pass the validation tests. */
mask_allowed &= ~((1 << OVS_KEY_ATTR_IPV4)
| (1 << OVS_KEY_ATTR_IPV6)
| (1 << OVS_KEY_ATTR_TCP)
| (1 << OVS_KEY_ATTR_UDP)
| (1 << OVS_KEY_ATTR_SCTP)
| (1 << OVS_KEY_ATTR_ICMP)
| (1 << OVS_KEY_ATTR_ICMPV6)
| (1 << OVS_KEY_ATTR_ARP)
| (1 << OVS_KEY_ATTR_ND));
/* Always allowed mask fields. */
mask_allowed |= ((1 << OVS_KEY_ATTR_TUNNEL)
| (1 << OVS_KEY_ATTR_IN_PORT)
| (1 << OVS_KEY_ATTR_ETHERTYPE));
/* Check key attributes. */
if (match->key->eth.type == htons(ETH_P_ARP)
|| match->key->eth.type == htons(ETH_P_RARP)) {
key_expected |= 1 << OVS_KEY_ATTR_ARP;
if (match->mask && (match->mask->key.eth.type == htons(0xffff)))
mask_allowed |= 1 << OVS_KEY_ATTR_ARP;
}
if (match->key->eth.type == htons(ETH_P_IP)) {
key_expected |= 1 << OVS_KEY_ATTR_IPV4;
if (match->mask && (match->mask->key.eth.type == htons(0xffff)))
mask_allowed |= 1 << OVS_KEY_ATTR_IPV4;
if (match->key->ip.frag != OVS_FRAG_TYPE_LATER) {
if (match->key->ip.proto == IPPROTO_UDP) {
key_expected |= 1 << OVS_KEY_ATTR_UDP;
if (match->mask && (match->mask->key.ip.proto == 0xff))
mask_allowed |= 1 << OVS_KEY_ATTR_UDP;
}
if (match->key->ip.proto == IPPROTO_SCTP) {
key_expected |= 1 << OVS_KEY_ATTR_SCTP;
if (match->mask && (match->mask->key.ip.proto == 0xff))
mask_allowed |= 1 << OVS_KEY_ATTR_SCTP;
}
if (match->key->ip.proto == IPPROTO_TCP) {
key_expected |= 1 << OVS_KEY_ATTR_TCP;
if (match->mask && (match->mask->key.ip.proto == 0xff))
mask_allowed |= 1 << OVS_KEY_ATTR_TCP;
}
if (match->key->ip.proto == IPPROTO_ICMP) {
key_expected |= 1 << OVS_KEY_ATTR_ICMP;
if (match->mask && (match->mask->key.ip.proto == 0xff))
mask_allowed |= 1 << OVS_KEY_ATTR_ICMP;
}
}
}
if (match->key->eth.type == htons(ETH_P_IPV6)) {
key_expected |= 1 << OVS_KEY_ATTR_IPV6;
if (match->mask && (match->mask->key.eth.type == htons(0xffff)))
mask_allowed |= 1 << OVS_KEY_ATTR_IPV6;
if (match->key->ip.frag != OVS_FRAG_TYPE_LATER) {
if (match->key->ip.proto == IPPROTO_UDP) {
key_expected |= 1 << OVS_KEY_ATTR_UDP;
if (match->mask && (match->mask->key.ip.proto == 0xff))
mask_allowed |= 1 << OVS_KEY_ATTR_UDP;
}
if (match->key->ip.proto == IPPROTO_SCTP) {
key_expected |= 1 << OVS_KEY_ATTR_SCTP;
if (match->mask && (match->mask->key.ip.proto == 0xff))
mask_allowed |= 1 << OVS_KEY_ATTR_SCTP;
}
if (match->key->ip.proto == IPPROTO_TCP) {
key_expected |= 1 << OVS_KEY_ATTR_TCP;
if (match->mask && (match->mask->key.ip.proto == 0xff))
mask_allowed |= 1 << OVS_KEY_ATTR_TCP;
}
if (match->key->ip.proto == IPPROTO_ICMPV6) {
key_expected |= 1 << OVS_KEY_ATTR_ICMPV6;
if (match->mask && (match->mask->key.ip.proto == 0xff))
mask_allowed |= 1 << OVS_KEY_ATTR_ICMPV6;
if (match->key->ipv6.tp.src ==
htons(NDISC_NEIGHBOUR_SOLICITATION) ||
match->key->ipv6.tp.src == htons(NDISC_NEIGHBOUR_ADVERTISEMENT)) {
key_expected |= 1 << OVS_KEY_ATTR_ND;
if (match->mask && (match->mask->key.ipv6.tp.src == htons(0xffff)))
mask_allowed |= 1 << OVS_KEY_ATTR_ND;
}
}
}
}
if ((key_attrs & key_expected) != key_expected) {
/* Key attributes check failed. */
OVS_NLERR("Missing expected key attributes (key_attrs=%llx, expected=%llx).\n",
key_attrs, key_expected);
return false;
}
if ((mask_attrs & mask_allowed) != mask_attrs) {
/* Mask attributes check failed. */
OVS_NLERR("Contain more than allowed mask fields (mask_attrs=%llx, mask_allowed=%llx).\n",
mask_attrs, mask_allowed);
return false;
}
return true;
}
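The two final checks in ovs_match_validate() are plain bitmask subset tests. Below is a small user-space sketch of that idea. The attribute numbers and the names `attr_bit()` and `attrs_valid()` are made up for illustration; the real values come from the OVS_KEY_ATTR_* enum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical attribute numbers, for illustration only. */
enum { ATTR_ETHERNET = 1, ATTR_IPV4 = 2, ATTR_TCP = 3 };

/* One bit per attribute, built with a 64-bit shift. */
static uint64_t attr_bit(unsigned int attr)
{
	assert(attr < 64);
	return UINT64_C(1) << attr;
}

/*
 * Every expected key attribute must be present, and the mask may only
 * carry attributes from the allowed set.
 */
static bool attrs_valid(uint64_t key_attrs, uint64_t key_expected,
			uint64_t mask_attrs, uint64_t mask_allowed)
{
	if ((key_attrs & key_expected) != key_expected)
		return false;	/* a required key attribute is missing */
	if ((mask_attrs & mask_allowed) != mask_attrs)
		return false;	/* the mask names a disallowed attribute */
	return true;
}
```

The sketch shifts a 64-bit one, whereas the diff shifts a plain `int`. The two agree as long as every attribute number stays below 31.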
static int check_header(struct sk_buff *skb, int len) static int check_header(struct sk_buff *skb, int len)
{ {
if (unlikely(skb->len < len)) if (unlikely(skb->len < len))
...@@ -102,6 +299,12 @@ static bool udphdr_ok(struct sk_buff *skb) ...@@ -102,6 +299,12 @@ static bool udphdr_ok(struct sk_buff *skb)
sizeof(struct udphdr)); sizeof(struct udphdr));
} }
static bool sctphdr_ok(struct sk_buff *skb)
{
return pskb_may_pull(skb, skb_transport_offset(skb) +
sizeof(struct sctphdr));
}
static bool icmphdr_ok(struct sk_buff *skb) static bool icmphdr_ok(struct sk_buff *skb)
{ {
return pskb_may_pull(skb, skb_transport_offset(skb) + return pskb_may_pull(skb, skb_transport_offset(skb) +
...@@ -121,12 +324,7 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies) ...@@ -121,12 +324,7 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies)
return cur_ms - idle_ms; return cur_ms - idle_ms;
} }
#define SW_FLOW_KEY_OFFSET(field) \ static int parse_ipv6hdr(struct sk_buff *skb, struct sw_flow_key *key)
(offsetof(struct sw_flow_key, field) + \
FIELD_SIZEOF(struct sw_flow_key, field))
static int parse_ipv6hdr(struct sk_buff *skb, struct sw_flow_key *key,
int *key_lenp)
{ {
unsigned int nh_ofs = skb_network_offset(skb); unsigned int nh_ofs = skb_network_offset(skb);
unsigned int nh_len; unsigned int nh_len;
...@@ -136,8 +334,6 @@ static int parse_ipv6hdr(struct sk_buff *skb, struct sw_flow_key *key, ...@@ -136,8 +334,6 @@ static int parse_ipv6hdr(struct sk_buff *skb, struct sw_flow_key *key,
__be16 frag_off; __be16 frag_off;
int err; int err;
*key_lenp = SW_FLOW_KEY_OFFSET(ipv6.label);
err = check_header(skb, nh_ofs + sizeof(*nh)); err = check_header(skb, nh_ofs + sizeof(*nh));
if (unlikely(err)) if (unlikely(err))
return err; return err;
...@@ -176,6 +372,22 @@ static bool icmp6hdr_ok(struct sk_buff *skb) ...@@ -176,6 +372,22 @@ static bool icmp6hdr_ok(struct sk_buff *skb)
sizeof(struct icmp6hdr)); sizeof(struct icmp6hdr));
} }
void ovs_flow_key_mask(struct sw_flow_key *dst, const struct sw_flow_key *src,
const struct sw_flow_mask *mask)
{
const long *m = (long *)((u8 *)&mask->key + mask->range.start);
const long *s = (long *)((u8 *)src + mask->range.start);
long *d = (long *)((u8 *)dst + mask->range.start);
int i;
/* The memory outside of 'mask->range' is not set, since
* further operations on 'dst' only use contents within
* 'mask->range'.
*/
for (i = 0; i < range_n_bytes(&mask->range); i += sizeof(long))
*d++ = *s++ & *m++;
}
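As an illustration of what ovs_flow_key_mask() computes, here is a simplified user-space sketch. `struct key`, `KEY_WORDS` and `key_mask()` are hypothetical, and the sketch indexes an array of `long` instead of casting byte pointers as the kernel code does. Only the words inside the range are written; the rest of `dst` is left as it was.

```c
#include <assert.h>
#include <stddef.h>

#define KEY_WORDS 4	/* hypothetical key size, in longs */

struct key {
	long w[KEY_WORDS];
};

/*
 * AND 'src' with 'mask' over the byte range [start, end).
 * Both offsets must be multiples of sizeof(long).
 */
static void key_mask(struct key *dst, const struct key *src,
		     const struct key *mask, size_t start, size_t end)
{
	size_t i;

	assert(start % sizeof(long) == 0 && end % sizeof(long) == 0);
	assert(start <= end && end <= sizeof(struct key));

	for (i = start / sizeof(long); i < end / sizeof(long); i++)
		dst->w[i] = src->w[i] & mask->w[i];
}
```

Because words outside the range are never initialised by the masking step, any later hash or comparison on the result has to be limited to the same range.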
#define TCP_FLAGS_OFFSET 13 #define TCP_FLAGS_OFFSET 13
#define TCP_FLAG_MASK 0x3f #define TCP_FLAG_MASK 0x3f
...@@ -224,6 +436,7 @@ struct sw_flow *ovs_flow_alloc(void) ...@@ -224,6 +436,7 @@ struct sw_flow *ovs_flow_alloc(void)
spin_lock_init(&flow->lock); spin_lock_init(&flow->lock);
flow->sf_acts = NULL; flow->sf_acts = NULL;
flow->mask = NULL;
return flow; return flow;
} }
...@@ -263,7 +476,7 @@ static void free_buckets(struct flex_array *buckets) ...@@ -263,7 +476,7 @@ static void free_buckets(struct flex_array *buckets)
flex_array_free(buckets); flex_array_free(buckets);
} }
struct flow_table *ovs_flow_tbl_alloc(int new_size) static struct flow_table *__flow_tbl_alloc(int new_size)
{ {
struct flow_table *table = kmalloc(sizeof(*table), GFP_KERNEL); struct flow_table *table = kmalloc(sizeof(*table), GFP_KERNEL);
...@@ -281,17 +494,15 @@ struct flow_table *ovs_flow_tbl_alloc(int new_size) ...@@ -281,17 +494,15 @@ struct flow_table *ovs_flow_tbl_alloc(int new_size)
table->node_ver = 0; table->node_ver = 0;
table->keep_flows = false; table->keep_flows = false;
get_random_bytes(&table->hash_seed, sizeof(u32)); get_random_bytes(&table->hash_seed, sizeof(u32));
table->mask_list = NULL;
return table; return table;
} }
void ovs_flow_tbl_destroy(struct flow_table *table) static void __flow_tbl_destroy(struct flow_table *table)
{ {
int i; int i;
if (!table)
return;
if (table->keep_flows) if (table->keep_flows)
goto skip_flows; goto skip_flows;
...@@ -302,32 +513,56 @@ void ovs_flow_tbl_destroy(struct flow_table *table) ...@@ -302,32 +513,56 @@ void ovs_flow_tbl_destroy(struct flow_table *table)
int ver = table->node_ver; int ver = table->node_ver;
hlist_for_each_entry_safe(flow, n, head, hash_node[ver]) { hlist_for_each_entry_safe(flow, n, head, hash_node[ver]) {
hlist_del_rcu(&flow->hash_node[ver]); hlist_del(&flow->hash_node[ver]);
ovs_flow_free(flow); ovs_flow_free(flow, false);
} }
} }
BUG_ON(!list_empty(table->mask_list));
kfree(table->mask_list);
skip_flows: skip_flows:
free_buckets(table->buckets); free_buckets(table->buckets);
kfree(table); kfree(table);
} }
struct flow_table *ovs_flow_tbl_alloc(int new_size)
{
struct flow_table *table = __flow_tbl_alloc(new_size);
if (!table)
return NULL;
table->mask_list = kmalloc(sizeof(struct list_head), GFP_KERNEL);
if (!table->mask_list) {
table->keep_flows = true;
__flow_tbl_destroy(table);
return NULL;
}
INIT_LIST_HEAD(table->mask_list);
return table;
}
static void flow_tbl_destroy_rcu_cb(struct rcu_head *rcu) static void flow_tbl_destroy_rcu_cb(struct rcu_head *rcu)
{ {
struct flow_table *table = container_of(rcu, struct flow_table, rcu); struct flow_table *table = container_of(rcu, struct flow_table, rcu);
ovs_flow_tbl_destroy(table); __flow_tbl_destroy(table);
} }
void ovs_flow_tbl_deferred_destroy(struct flow_table *table) void ovs_flow_tbl_destroy(struct flow_table *table, bool deferred)
{ {
if (!table) if (!table)
return; return;
call_rcu(&table->rcu, flow_tbl_destroy_rcu_cb); if (deferred)
call_rcu(&table->rcu, flow_tbl_destroy_rcu_cb);
else
__flow_tbl_destroy(table);
} }
struct sw_flow *ovs_flow_tbl_next(struct flow_table *table, u32 *bucket, u32 *last) struct sw_flow *ovs_flow_dump_next(struct flow_table *table, u32 *bucket, u32 *last)
{ {
struct sw_flow *flow; struct sw_flow *flow;
struct hlist_head *head; struct hlist_head *head;
...@@ -353,11 +588,13 @@ struct sw_flow *ovs_flow_tbl_next(struct flow_table *table, u32 *bucket, u32 *la ...@@ -353,11 +588,13 @@ struct sw_flow *ovs_flow_tbl_next(struct flow_table *table, u32 *bucket, u32 *la
return NULL; return NULL;
} }
static void __flow_tbl_insert(struct flow_table *table, struct sw_flow *flow) static void __tbl_insert(struct flow_table *table, struct sw_flow *flow)
{ {
struct hlist_head *head; struct hlist_head *head;
head = find_bucket(table, flow->hash); head = find_bucket(table, flow->hash);
hlist_add_head_rcu(&flow->hash_node[table->node_ver], head); hlist_add_head_rcu(&flow->hash_node[table->node_ver], head);
table->count++; table->count++;
} }
...@@ -377,8 +614,10 @@ static void flow_table_copy_flows(struct flow_table *old, struct flow_table *new ...@@ -377,8 +614,10 @@ static void flow_table_copy_flows(struct flow_table *old, struct flow_table *new
head = flex_array_get(old->buckets, i); head = flex_array_get(old->buckets, i);
hlist_for_each_entry(flow, head, hash_node[old_ver]) hlist_for_each_entry(flow, head, hash_node[old_ver])
__flow_tbl_insert(new, flow); __tbl_insert(new, flow);
} }
new->mask_list = old->mask_list;
old->keep_flows = true; old->keep_flows = true;
} }
...@@ -386,7 +625,7 @@ static struct flow_table *__flow_tbl_rehash(struct flow_table *table, int n_buck ...@@ -386,7 +625,7 @@ static struct flow_table *__flow_tbl_rehash(struct flow_table *table, int n_buck
{ {
struct flow_table *new_table; struct flow_table *new_table;
new_table = ovs_flow_tbl_alloc(n_buckets); new_table = __flow_tbl_alloc(n_buckets);
if (!new_table) if (!new_table)
return ERR_PTR(-ENOMEM); return ERR_PTR(-ENOMEM);
...@@ -405,28 +644,30 @@ struct flow_table *ovs_flow_tbl_expand(struct flow_table *table) ...@@ -405,28 +644,30 @@ struct flow_table *ovs_flow_tbl_expand(struct flow_table *table)
return __flow_tbl_rehash(table, table->n_buckets * 2); return __flow_tbl_rehash(table, table->n_buckets * 2);
} }
void ovs_flow_free(struct sw_flow *flow) static void __flow_free(struct sw_flow *flow)
{ {
if (unlikely(!flow))
return;
kfree((struct sf_flow_acts __force *)flow->sf_acts); kfree((struct sf_flow_acts __force *)flow->sf_acts);
kmem_cache_free(flow_cache, flow); kmem_cache_free(flow_cache, flow);
} }
/* RCU callback used by ovs_flow_deferred_free. */
static void rcu_free_flow_callback(struct rcu_head *rcu) static void rcu_free_flow_callback(struct rcu_head *rcu)
{ {
struct sw_flow *flow = container_of(rcu, struct sw_flow, rcu); struct sw_flow *flow = container_of(rcu, struct sw_flow, rcu);
ovs_flow_free(flow); __flow_free(flow);
} }
/* Schedules 'flow' to be freed after the next RCU grace period. void ovs_flow_free(struct sw_flow *flow, bool deferred)
* The caller must hold rcu_read_lock for this to be sensible. */
void ovs_flow_deferred_free(struct sw_flow *flow)
{ {
call_rcu(&flow->rcu, rcu_free_flow_callback); if (!flow)
return;
ovs_sw_flow_mask_del_ref(flow->mask, deferred);
if (deferred)
call_rcu(&flow->rcu, rcu_free_flow_callback);
else
__flow_free(flow);
} }
/* Schedules 'sf_acts' to be freed after the next RCU grace period. /* Schedules 'sf_acts' to be freed after the next RCU grace period.
...@@ -497,18 +738,15 @@ static __be16 parse_ethertype(struct sk_buff *skb) ...@@ -497,18 +738,15 @@ static __be16 parse_ethertype(struct sk_buff *skb)
} }
static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key, static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key,
int *key_lenp, int nh_len) int nh_len)
{ {
struct icmp6hdr *icmp = icmp6_hdr(skb); struct icmp6hdr *icmp = icmp6_hdr(skb);
int error = 0;
int key_len;
/* The ICMPv6 type and code fields use the 16-bit transport port /* The ICMPv6 type and code fields use the 16-bit transport port
* fields, so we need to store them in 16-bit network byte order. * fields, so we need to store them in 16-bit network byte order.
*/ */
key->ipv6.tp.src = htons(icmp->icmp6_type); key->ipv6.tp.src = htons(icmp->icmp6_type);
key->ipv6.tp.dst = htons(icmp->icmp6_code); key->ipv6.tp.dst = htons(icmp->icmp6_code);
key_len = SW_FLOW_KEY_OFFSET(ipv6.tp);
if (icmp->icmp6_code == 0 && if (icmp->icmp6_code == 0 &&
(icmp->icmp6_type == NDISC_NEIGHBOUR_SOLICITATION || (icmp->icmp6_type == NDISC_NEIGHBOUR_SOLICITATION ||
...@@ -517,21 +755,17 @@ static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key, ...@@ -517,21 +755,17 @@ static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key,
struct nd_msg *nd; struct nd_msg *nd;
int offset; int offset;
key_len = SW_FLOW_KEY_OFFSET(ipv6.nd);
/* In order to process neighbor discovery options, we need the /* In order to process neighbor discovery options, we need the
* entire packet. * entire packet.
*/ */
if (unlikely(icmp_len < sizeof(*nd))) if (unlikely(icmp_len < sizeof(*nd)))
goto out; return 0;
if (unlikely(skb_linearize(skb))) {
error = -ENOMEM; if (unlikely(skb_linearize(skb)))
goto out; return -ENOMEM;
}
nd = (struct nd_msg *)skb_transport_header(skb); nd = (struct nd_msg *)skb_transport_header(skb);
key->ipv6.nd.target = nd->target; key->ipv6.nd.target = nd->target;
key_len = SW_FLOW_KEY_OFFSET(ipv6.nd);
icmp_len -= sizeof(*nd); icmp_len -= sizeof(*nd);
offset = 0; offset = 0;
...@@ -541,7 +775,7 @@ static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key, ...@@ -541,7 +775,7 @@ static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key,
int opt_len = nd_opt->nd_opt_len * 8; int opt_len = nd_opt->nd_opt_len * 8;
if (unlikely(!opt_len || opt_len > icmp_len)) if (unlikely(!opt_len || opt_len > icmp_len))
goto invalid; return 0;
/* Store the link layer address if the appropriate /* Store the link layer address if the appropriate
* option is provided. It is considered an error if * option is provided. It is considered an error if
...@@ -566,16 +800,14 @@ static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key, ...@@ -566,16 +800,14 @@ static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key,
} }
} }
goto out; return 0;
invalid: invalid:
memset(&key->ipv6.nd.target, 0, sizeof(key->ipv6.nd.target)); memset(&key->ipv6.nd.target, 0, sizeof(key->ipv6.nd.target));
memset(key->ipv6.nd.sll, 0, sizeof(key->ipv6.nd.sll)); memset(key->ipv6.nd.sll, 0, sizeof(key->ipv6.nd.sll));
memset(key->ipv6.nd.tll, 0, sizeof(key->ipv6.nd.tll)); memset(key->ipv6.nd.tll, 0, sizeof(key->ipv6.nd.tll));
out: return 0;
*key_lenp = key_len;
return error;
} }
/** /**
...@@ -584,7 +816,6 @@ static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key, ...@@ -584,7 +816,6 @@ static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key,
* Ethernet header * Ethernet header
* @in_port: port number on which @skb was received. * @in_port: port number on which @skb was received.
* @key: output flow key * @key: output flow key
* @key_lenp: length of output flow key
* *
* The caller must ensure that skb->len >= ETH_HLEN. * The caller must ensure that skb->len >= ETH_HLEN.
* *
...@@ -602,11 +833,9 @@ static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key, ...@@ -602,11 +833,9 @@ static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key,
* of a correct length, otherwise the same as skb->network_header. * of a correct length, otherwise the same as skb->network_header.
* For other key->eth.type values it is left untouched. * For other key->eth.type values it is left untouched.
*/ */
int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key, int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key)
int *key_lenp)
{ {
int error = 0; int error;
int key_len = SW_FLOW_KEY_OFFSET(eth);
struct ethhdr *eth; struct ethhdr *eth;
memset(key, 0, sizeof(*key)); memset(key, 0, sizeof(*key));
...@@ -649,15 +878,13 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key, ...@@ -649,15 +878,13 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key,
struct iphdr *nh; struct iphdr *nh;
__be16 offset; __be16 offset;
key_len = SW_FLOW_KEY_OFFSET(ipv4.addr);
error = check_iphdr(skb); error = check_iphdr(skb);
if (unlikely(error)) { if (unlikely(error)) {
if (error == -EINVAL) { if (error == -EINVAL) {
skb->transport_header = skb->network_header; skb->transport_header = skb->network_header;
error = 0; error = 0;
} }
goto out; return error;
} }
nh = ip_hdr(skb); nh = ip_hdr(skb);
...@@ -671,7 +898,7 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key, ...@@ -671,7 +898,7 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key,
offset = nh->frag_off & htons(IP_OFFSET); offset = nh->frag_off & htons(IP_OFFSET);
if (offset) { if (offset) {
key->ip.frag = OVS_FRAG_TYPE_LATER; key->ip.frag = OVS_FRAG_TYPE_LATER;
goto out; return 0;
} }
if (nh->frag_off & htons(IP_MF) || if (nh->frag_off & htons(IP_MF) ||
skb_shinfo(skb)->gso_type & SKB_GSO_UDP) skb_shinfo(skb)->gso_type & SKB_GSO_UDP)
...@@ -679,21 +906,24 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key, ...@@ -679,21 +906,24 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key,
/* Transport layer. */ /* Transport layer. */
if (key->ip.proto == IPPROTO_TCP) { if (key->ip.proto == IPPROTO_TCP) {
key_len = SW_FLOW_KEY_OFFSET(ipv4.tp);
if (tcphdr_ok(skb)) { if (tcphdr_ok(skb)) {
struct tcphdr *tcp = tcp_hdr(skb); struct tcphdr *tcp = tcp_hdr(skb);
key->ipv4.tp.src = tcp->source; key->ipv4.tp.src = tcp->source;
key->ipv4.tp.dst = tcp->dest; key->ipv4.tp.dst = tcp->dest;
} }
} else if (key->ip.proto == IPPROTO_UDP) { } else if (key->ip.proto == IPPROTO_UDP) {
key_len = SW_FLOW_KEY_OFFSET(ipv4.tp);
if (udphdr_ok(skb)) { if (udphdr_ok(skb)) {
struct udphdr *udp = udp_hdr(skb); struct udphdr *udp = udp_hdr(skb);
key->ipv4.tp.src = udp->source; key->ipv4.tp.src = udp->source;
key->ipv4.tp.dst = udp->dest; key->ipv4.tp.dst = udp->dest;
} }
} else if (key->ip.proto == IPPROTO_SCTP) {
if (sctphdr_ok(skb)) {
struct sctphdr *sctp = sctp_hdr(skb);
key->ipv4.tp.src = sctp->source;
key->ipv4.tp.dst = sctp->dest;
}
} else if (key->ip.proto == IPPROTO_ICMP) { } else if (key->ip.proto == IPPROTO_ICMP) {
key_len = SW_FLOW_KEY_OFFSET(ipv4.tp);
if (icmphdr_ok(skb)) { if (icmphdr_ok(skb)) {
struct icmphdr *icmp = icmp_hdr(skb); struct icmphdr *icmp = icmp_hdr(skb);
/* The ICMP type and code fields use the 16-bit /* The ICMP type and code fields use the 16-bit
...@@ -722,102 +952,175 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key, ...@@ -722,102 +952,175 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key,
memcpy(&key->ipv4.addr.dst, arp->ar_tip, sizeof(key->ipv4.addr.dst)); memcpy(&key->ipv4.addr.dst, arp->ar_tip, sizeof(key->ipv4.addr.dst));
memcpy(key->ipv4.arp.sha, arp->ar_sha, ETH_ALEN); memcpy(key->ipv4.arp.sha, arp->ar_sha, ETH_ALEN);
memcpy(key->ipv4.arp.tha, arp->ar_tha, ETH_ALEN); memcpy(key->ipv4.arp.tha, arp->ar_tha, ETH_ALEN);
key_len = SW_FLOW_KEY_OFFSET(ipv4.arp);
} }
} else if (key->eth.type == htons(ETH_P_IPV6)) { } else if (key->eth.type == htons(ETH_P_IPV6)) {
int nh_len; /* IPv6 Header + Extensions */ int nh_len; /* IPv6 Header + Extensions */
nh_len = parse_ipv6hdr(skb, key, &key_len); nh_len = parse_ipv6hdr(skb, key);
if (unlikely(nh_len < 0)) { if (unlikely(nh_len < 0)) {
if (nh_len == -EINVAL) if (nh_len == -EINVAL) {
skb->transport_header = skb->network_header; skb->transport_header = skb->network_header;
else error = 0;
} else {
error = nh_len; error = nh_len;
goto out; }
return error;
} }
if (key->ip.frag == OVS_FRAG_TYPE_LATER) if (key->ip.frag == OVS_FRAG_TYPE_LATER)
goto out; return 0;
if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP) if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP)
key->ip.frag = OVS_FRAG_TYPE_FIRST; key->ip.frag = OVS_FRAG_TYPE_FIRST;
/* Transport layer. */ /* Transport layer. */
if (key->ip.proto == NEXTHDR_TCP) { if (key->ip.proto == NEXTHDR_TCP) {
key_len = SW_FLOW_KEY_OFFSET(ipv6.tp);
if (tcphdr_ok(skb)) { if (tcphdr_ok(skb)) {
struct tcphdr *tcp = tcp_hdr(skb); struct tcphdr *tcp = tcp_hdr(skb);
key->ipv6.tp.src = tcp->source; key->ipv6.tp.src = tcp->source;
key->ipv6.tp.dst = tcp->dest; key->ipv6.tp.dst = tcp->dest;
} }
} else if (key->ip.proto == NEXTHDR_UDP) { } else if (key->ip.proto == NEXTHDR_UDP) {
key_len = SW_FLOW_KEY_OFFSET(ipv6.tp);
if (udphdr_ok(skb)) { if (udphdr_ok(skb)) {
struct udphdr *udp = udp_hdr(skb); struct udphdr *udp = udp_hdr(skb);
key->ipv6.tp.src = udp->source; key->ipv6.tp.src = udp->source;
key->ipv6.tp.dst = udp->dest; key->ipv6.tp.dst = udp->dest;
} }
} else if (key->ip.proto == NEXTHDR_SCTP) {
if (sctphdr_ok(skb)) {
struct sctphdr *sctp = sctp_hdr(skb);
key->ipv6.tp.src = sctp->source;
key->ipv6.tp.dst = sctp->dest;
}
} else if (key->ip.proto == NEXTHDR_ICMP) { } else if (key->ip.proto == NEXTHDR_ICMP) {
key_len = SW_FLOW_KEY_OFFSET(ipv6.tp);
if (icmp6hdr_ok(skb)) { if (icmp6hdr_ok(skb)) {
error = parse_icmpv6(skb, key, &key_len, nh_len); error = parse_icmpv6(skb, key, nh_len);
if (error < 0) if (error)
goto out; return error;
} }
} }
} }
out: return 0;
*key_lenp = key_len;
return error;
} }
static u32 ovs_flow_hash(const struct sw_flow_key *key, int key_start, int key_len) static u32 ovs_flow_hash(const struct sw_flow_key *key, int key_start,
int key_end)
{ {
return jhash2((u32 *)((u8 *)key + key_start), u32 *hash_key = (u32 *)((u8 *)key + key_start);
DIV_ROUND_UP(key_len - key_start, sizeof(u32)), 0); int hash_u32s = (key_end - key_start) >> 2;
/* Make sure the number of hash bytes is a multiple of u32. */
BUILD_BUG_ON(sizeof(long) % sizeof(u32));
return jhash2(hash_key, hash_u32s, 0);
} }
static int flow_key_start(struct sw_flow_key *key) static int flow_key_start(const struct sw_flow_key *key)
{ {
if (key->tun_key.ipv4_dst) if (key->tun_key.ipv4_dst)
return 0; return 0;
else else
return offsetof(struct sw_flow_key, phy); return rounddown(offsetof(struct sw_flow_key, phy),
sizeof(long));
}
static bool __cmp_key(const struct sw_flow_key *key1,
const struct sw_flow_key *key2, int key_start, int key_end)
{
const long *cp1 = (long *)((u8 *)key1 + key_start);
const long *cp2 = (long *)((u8 *)key2 + key_start);
long diffs = 0;
int i;
for (i = key_start; i < key_end; i += sizeof(long))
diffs |= *cp1++ ^ *cp2++;
return diffs == 0;
} }
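The comparison in __cmp_key() accumulates XOR differences instead of returning at the first mismatch. A minimal user-space sketch of the same pattern follows. `key_range_equal()` is a hypothetical name, and the keys are passed as arrays of `long` rather than as `struct sw_flow_key`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Compare two keys over the byte range [start, end), one long at a time.
 * Any differing bit ends up set in 'diffs'.
 */
static bool key_range_equal(const long *a, const long *b,
			    size_t start, size_t end)
{
	long diffs = 0;
	size_t i;

	assert(start % sizeof(long) == 0 && end % sizeof(long) == 0);

	for (i = start / sizeof(long); i < end / sizeof(long); i++)
		diffs |= a[i] ^ b[i];

	return diffs == 0;
}
```

Words outside the range do not take part in the result, so two keys that differ only there compare as equal.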
struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *table, static bool __flow_cmp_masked_key(const struct sw_flow *flow,
struct sw_flow_key *key, int key_len) const struct sw_flow_key *key, int key_start, int key_end)
{
return __cmp_key(&flow->key, key, key_start, key_end);
}
static bool __flow_cmp_unmasked_key(const struct sw_flow *flow,
const struct sw_flow_key *key, int key_start, int key_end)
{
return __cmp_key(&flow->unmasked_key, key, key_start, key_end);
}
bool ovs_flow_cmp_unmasked_key(const struct sw_flow *flow,
const struct sw_flow_key *key, int key_end)
{
int key_start;
key_start = flow_key_start(key);
return __flow_cmp_unmasked_key(flow, key, key_start, key_end);
}
struct sw_flow *ovs_flow_lookup_unmasked_key(struct flow_table *table,
struct sw_flow_match *match)
{
struct sw_flow_key *unmasked = match->key;
int key_end = match->range.end;
struct sw_flow *flow;
flow = ovs_flow_lookup(table, unmasked);
if (flow && (!ovs_flow_cmp_unmasked_key(flow, unmasked, key_end)))
flow = NULL;
return flow;
}
static struct sw_flow *ovs_masked_flow_lookup(struct flow_table *table,
const struct sw_flow_key *unmasked,
struct sw_flow_mask *mask)
{ {
struct sw_flow *flow; struct sw_flow *flow;
struct hlist_head *head; struct hlist_head *head;
u8 *_key; int key_start = mask->range.start;
int key_start; int key_end = mask->range.end;
u32 hash; u32 hash;
struct sw_flow_key masked_key;
key_start = flow_key_start(key); ovs_flow_key_mask(&masked_key, unmasked, mask);
hash = ovs_flow_hash(key, key_start, key_len); hash = ovs_flow_hash(&masked_key, key_start, key_end);
_key = (u8 *) key + key_start;
head = find_bucket(table, hash); head = find_bucket(table, hash);
hlist_for_each_entry_rcu(flow, head, hash_node[table->node_ver]) { hlist_for_each_entry_rcu(flow, head, hash_node[table->node_ver]) {
if (flow->mask == mask &&
if (flow->hash == hash && __flow_cmp_masked_key(flow, &masked_key,
!memcmp((u8 *)&flow->key + key_start, _key, key_len - key_start)) { key_start, key_end))
return flow; return flow;
}
} }
return NULL; return NULL;
} }
void ovs_flow_tbl_insert(struct flow_table *table, struct sw_flow *flow, struct sw_flow *ovs_flow_lookup(struct flow_table *tbl,
struct sw_flow_key *key, int key_len) const struct sw_flow_key *key)
{
struct sw_flow *flow = NULL;
struct sw_flow_mask *mask;
list_for_each_entry_rcu(mask, tbl->mask_list, list) {
flow = ovs_masked_flow_lookup(tbl, key, mask);
if (flow) /* Found */
break;
}
return flow;
}
void ovs_flow_insert(struct flow_table *table, struct sw_flow *flow)
{ {
flow->hash = ovs_flow_hash(key, flow_key_start(key), key_len); flow->hash = ovs_flow_hash(&flow->key, flow->mask->range.start,
memcpy(&flow->key, key, sizeof(flow->key)); flow->mask->range.end);
__flow_tbl_insert(table, flow); __tbl_insert(table, flow);
} }
void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow) void ovs_flow_remove(struct flow_table *table, struct sw_flow *flow)
{ {
BUG_ON(table->count == 0); BUG_ON(table->count == 0);
hlist_del_rcu(&flow->hash_node[table->node_ver]); hlist_del_rcu(&flow->hash_node[table->node_ver]);
@@ -837,6 +1140,7 @@ const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
[OVS_KEY_ATTR_IPV6] = sizeof(struct ovs_key_ipv6), [OVS_KEY_ATTR_IPV6] = sizeof(struct ovs_key_ipv6),
[OVS_KEY_ATTR_TCP] = sizeof(struct ovs_key_tcp), [OVS_KEY_ATTR_TCP] = sizeof(struct ovs_key_tcp),
[OVS_KEY_ATTR_UDP] = sizeof(struct ovs_key_udp), [OVS_KEY_ATTR_UDP] = sizeof(struct ovs_key_udp),
[OVS_KEY_ATTR_SCTP] = sizeof(struct ovs_key_sctp),
[OVS_KEY_ATTR_ICMP] = sizeof(struct ovs_key_icmp), [OVS_KEY_ATTR_ICMP] = sizeof(struct ovs_key_icmp),
[OVS_KEY_ATTR_ICMPV6] = sizeof(struct ovs_key_icmpv6), [OVS_KEY_ATTR_ICMPV6] = sizeof(struct ovs_key_icmpv6),
[OVS_KEY_ATTR_ARP] = sizeof(struct ovs_key_arp), [OVS_KEY_ATTR_ARP] = sizeof(struct ovs_key_arp),
@@ -844,149 +1148,84 @@ const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
[OVS_KEY_ATTR_TUNNEL] = -1, [OVS_KEY_ATTR_TUNNEL] = -1,
}; };
static int ipv4_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_len, static bool is_all_zero(const u8 *fp, size_t size)
const struct nlattr *a[], u32 *attrs)
{ {
const struct ovs_key_icmp *icmp_key; int i;
const struct ovs_key_tcp *tcp_key;
const struct ovs_key_udp *udp_key;
switch (swkey->ip.proto) {
case IPPROTO_TCP:
if (!(*attrs & (1 << OVS_KEY_ATTR_TCP)))
return -EINVAL;
*attrs &= ~(1 << OVS_KEY_ATTR_TCP);
*key_len = SW_FLOW_KEY_OFFSET(ipv4.tp);
tcp_key = nla_data(a[OVS_KEY_ATTR_TCP]);
swkey->ipv4.tp.src = tcp_key->tcp_src;
swkey->ipv4.tp.dst = tcp_key->tcp_dst;
break;
case IPPROTO_UDP:
if (!(*attrs & (1 << OVS_KEY_ATTR_UDP)))
return -EINVAL;
*attrs &= ~(1 << OVS_KEY_ATTR_UDP);
*key_len = SW_FLOW_KEY_OFFSET(ipv4.tp);
udp_key = nla_data(a[OVS_KEY_ATTR_UDP]);
swkey->ipv4.tp.src = udp_key->udp_src;
swkey->ipv4.tp.dst = udp_key->udp_dst;
break;
case IPPROTO_ICMP:
if (!(*attrs & (1 << OVS_KEY_ATTR_ICMP)))
return -EINVAL;
*attrs &= ~(1 << OVS_KEY_ATTR_ICMP);
*key_len = SW_FLOW_KEY_OFFSET(ipv4.tp);
icmp_key = nla_data(a[OVS_KEY_ATTR_ICMP]);
swkey->ipv4.tp.src = htons(icmp_key->icmp_type);
swkey->ipv4.tp.dst = htons(icmp_key->icmp_code);
break;
}
return 0;
}
static int ipv6_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_len,
const struct nlattr *a[], u32 *attrs)
{
const struct ovs_key_icmpv6 *icmpv6_key;
const struct ovs_key_tcp *tcp_key;
const struct ovs_key_udp *udp_key;
switch (swkey->ip.proto) {
case IPPROTO_TCP:
if (!(*attrs & (1 << OVS_KEY_ATTR_TCP)))
return -EINVAL;
*attrs &= ~(1 << OVS_KEY_ATTR_TCP);
*key_len = SW_FLOW_KEY_OFFSET(ipv6.tp);
tcp_key = nla_data(a[OVS_KEY_ATTR_TCP]);
swkey->ipv6.tp.src = tcp_key->tcp_src;
swkey->ipv6.tp.dst = tcp_key->tcp_dst;
break;
case IPPROTO_UDP:
if (!(*attrs & (1 << OVS_KEY_ATTR_UDP)))
return -EINVAL;
*attrs &= ~(1 << OVS_KEY_ATTR_UDP);
*key_len = SW_FLOW_KEY_OFFSET(ipv6.tp);
udp_key = nla_data(a[OVS_KEY_ATTR_UDP]);
swkey->ipv6.tp.src = udp_key->udp_src;
swkey->ipv6.tp.dst = udp_key->udp_dst;
break;
case IPPROTO_ICMPV6:
if (!(*attrs & (1 << OVS_KEY_ATTR_ICMPV6)))
return -EINVAL;
*attrs &= ~(1 << OVS_KEY_ATTR_ICMPV6);
*key_len = SW_FLOW_KEY_OFFSET(ipv6.tp);
icmpv6_key = nla_data(a[OVS_KEY_ATTR_ICMPV6]);
swkey->ipv6.tp.src = htons(icmpv6_key->icmpv6_type);
swkey->ipv6.tp.dst = htons(icmpv6_key->icmpv6_code);
if (swkey->ipv6.tp.src == htons(NDISC_NEIGHBOUR_SOLICITATION) || if (!fp)
swkey->ipv6.tp.src == htons(NDISC_NEIGHBOUR_ADVERTISEMENT)) { return false;
const struct ovs_key_nd *nd_key;
if (!(*attrs & (1 << OVS_KEY_ATTR_ND))) for (i = 0; i < size; i++)
return -EINVAL; if (fp[i])
*attrs &= ~(1 << OVS_KEY_ATTR_ND); return false;
*key_len = SW_FLOW_KEY_OFFSET(ipv6.nd);
nd_key = nla_data(a[OVS_KEY_ATTR_ND]);
memcpy(&swkey->ipv6.nd.target, nd_key->nd_target,
sizeof(swkey->ipv6.nd.target));
memcpy(swkey->ipv6.nd.sll, nd_key->nd_sll, ETH_ALEN);
memcpy(swkey->ipv6.nd.tll, nd_key->nd_tll, ETH_ALEN);
}
break;
}
return 0; return true;
} }
static int parse_flow_nlattrs(const struct nlattr *attr, static int __parse_flow_nlattrs(const struct nlattr *attr,
const struct nlattr *a[], u32 *attrsp) const struct nlattr *a[],
u64 *attrsp, bool nz)
{ {
const struct nlattr *nla; const struct nlattr *nla;
u32 attrs; u64 attrs;
int rem; int rem;
attrs = 0; attrs = *attrsp;
nla_for_each_nested(nla, attr, rem) { nla_for_each_nested(nla, attr, rem) {
u16 type = nla_type(nla); u16 type = nla_type(nla);
int expected_len; int expected_len;
if (type > OVS_KEY_ATTR_MAX || attrs & (1 << type)) if (type > OVS_KEY_ATTR_MAX) {
OVS_NLERR("Unknown key attribute (type=%d, max=%d).\n",
type, OVS_KEY_ATTR_MAX);
return -EINVAL;
}
if (attrs & (1 << type)) {
OVS_NLERR("Duplicate key attribute (type %d).\n", type);
return -EINVAL; return -EINVAL;
}
expected_len = ovs_key_lens[type]; expected_len = ovs_key_lens[type];
if (nla_len(nla) != expected_len && expected_len != -1) if (nla_len(nla) != expected_len && expected_len != -1) {
OVS_NLERR("Key attribute has unexpected length (type=%d"
", length=%d, expected=%d).\n", type,
nla_len(nla), expected_len);
return -EINVAL; return -EINVAL;
}
attrs |= 1 << type; if (!nz || !is_all_zero(nla_data(nla), expected_len)) {
a[type] = nla; attrs |= 1 << type;
a[type] = nla;
}
} }
if (rem) if (rem) {
OVS_NLERR("Message has %d unknown bytes.\n", rem);
return -EINVAL; return -EINVAL;
}
*attrsp = attrs; *attrsp = attrs;
return 0; return 0;
} }
static int parse_flow_mask_nlattrs(const struct nlattr *attr,
const struct nlattr *a[], u64 *attrsp)
{
return __parse_flow_nlattrs(attr, a, attrsp, true);
}
static int parse_flow_nlattrs(const struct nlattr *attr,
const struct nlattr *a[], u64 *attrsp)
{
return __parse_flow_nlattrs(attr, a, attrsp, false);
}
int ovs_ipv4_tun_from_nlattr(const struct nlattr *attr, int ovs_ipv4_tun_from_nlattr(const struct nlattr *attr,
struct ovs_key_ipv4_tunnel *tun_key) struct sw_flow_match *match, bool is_mask)
{ {
struct nlattr *a; struct nlattr *a;
int rem; int rem;
bool ttl = false; bool ttl = false;
__be16 tun_flags = 0;
memset(tun_key, 0, sizeof(*tun_key));
nla_for_each_nested(a, attr, rem) { nla_for_each_nested(a, attr, rem) {
int type = nla_type(a); int type = nla_type(a);
@@ -1000,53 +1239,78 @@ int ovs_ipv4_tun_from_nlattr(const struct nlattr *attr,
[OVS_TUNNEL_KEY_ATTR_CSUM] = 0, [OVS_TUNNEL_KEY_ATTR_CSUM] = 0,
}; };
if (type > OVS_TUNNEL_KEY_ATTR_MAX || if (type > OVS_TUNNEL_KEY_ATTR_MAX) {
ovs_tunnel_key_lens[type] != nla_len(a)) OVS_NLERR("Unknown IPv4 tunnel attribute (type=%d, max=%d).\n",
type, OVS_TUNNEL_KEY_ATTR_MAX);
return -EINVAL; return -EINVAL;
}
if (ovs_tunnel_key_lens[type] != nla_len(a)) {
OVS_NLERR("IPv4 tunnel attribute type has unexpected "
"length (type=%d, length=%d, expected=%d).\n",
type, nla_len(a), ovs_tunnel_key_lens[type]);
return -EINVAL;
}
switch (type) { switch (type) {
case OVS_TUNNEL_KEY_ATTR_ID: case OVS_TUNNEL_KEY_ATTR_ID:
tun_key->tun_id = nla_get_be64(a); SW_FLOW_KEY_PUT(match, tun_key.tun_id,
tun_key->tun_flags |= TUNNEL_KEY; nla_get_be64(a), is_mask);
tun_flags |= TUNNEL_KEY;
break; break;
case OVS_TUNNEL_KEY_ATTR_IPV4_SRC: case OVS_TUNNEL_KEY_ATTR_IPV4_SRC:
tun_key->ipv4_src = nla_get_be32(a); SW_FLOW_KEY_PUT(match, tun_key.ipv4_src,
nla_get_be32(a), is_mask);
break; break;
case OVS_TUNNEL_KEY_ATTR_IPV4_DST: case OVS_TUNNEL_KEY_ATTR_IPV4_DST:
tun_key->ipv4_dst = nla_get_be32(a); SW_FLOW_KEY_PUT(match, tun_key.ipv4_dst,
nla_get_be32(a), is_mask);
break; break;
case OVS_TUNNEL_KEY_ATTR_TOS: case OVS_TUNNEL_KEY_ATTR_TOS:
tun_key->ipv4_tos = nla_get_u8(a); SW_FLOW_KEY_PUT(match, tun_key.ipv4_tos,
nla_get_u8(a), is_mask);
break; break;
case OVS_TUNNEL_KEY_ATTR_TTL: case OVS_TUNNEL_KEY_ATTR_TTL:
tun_key->ipv4_ttl = nla_get_u8(a); SW_FLOW_KEY_PUT(match, tun_key.ipv4_ttl,
nla_get_u8(a), is_mask);
ttl = true; ttl = true;
break; break;
case OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT: case OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT:
tun_key->tun_flags |= TUNNEL_DONT_FRAGMENT; tun_flags |= TUNNEL_DONT_FRAGMENT;
break; break;
case OVS_TUNNEL_KEY_ATTR_CSUM: case OVS_TUNNEL_KEY_ATTR_CSUM:
tun_key->tun_flags |= TUNNEL_CSUM; tun_flags |= TUNNEL_CSUM;
break; break;
default: default:
return -EINVAL; return -EINVAL;
} }
} }
if (rem > 0)
return -EINVAL;
if (!tun_key->ipv4_dst) SW_FLOW_KEY_PUT(match, tun_key.tun_flags, tun_flags, is_mask);
return -EINVAL;
if (!ttl) if (rem > 0) {
OVS_NLERR("IPv4 tunnel attribute has %d unknown bytes.\n", rem);
return -EINVAL; return -EINVAL;
}
if (!is_mask) {
if (!match->key->tun_key.ipv4_dst) {
OVS_NLERR("IPv4 tunnel destination address is zero.\n");
return -EINVAL;
}
if (!ttl) {
OVS_NLERR("IPv4 tunnel TTL not specified.\n");
return -EINVAL;
}
}
return 0; return 0;
} }
int ovs_ipv4_tun_to_nlattr(struct sk_buff *skb, int ovs_ipv4_tun_to_nlattr(struct sk_buff *skb,
const struct ovs_key_ipv4_tunnel *tun_key) const struct ovs_key_ipv4_tunnel *tun_key,
const struct ovs_key_ipv4_tunnel *output)
{ {
struct nlattr *nla; struct nlattr *nla;
@@ -1054,23 +1318,24 @@ int ovs_ipv4_tun_to_nlattr(struct sk_buff *skb,
if (!nla) if (!nla)
return -EMSGSIZE; return -EMSGSIZE;
if (tun_key->tun_flags & TUNNEL_KEY && if (output->tun_flags & TUNNEL_KEY &&
nla_put_be64(skb, OVS_TUNNEL_KEY_ATTR_ID, tun_key->tun_id)) nla_put_be64(skb, OVS_TUNNEL_KEY_ATTR_ID, output->tun_id))
return -EMSGSIZE; return -EMSGSIZE;
if (tun_key->ipv4_src && if (output->ipv4_src &&
nla_put_be32(skb, OVS_TUNNEL_KEY_ATTR_IPV4_SRC, tun_key->ipv4_src)) nla_put_be32(skb, OVS_TUNNEL_KEY_ATTR_IPV4_SRC, output->ipv4_src))
return -EMSGSIZE; return -EMSGSIZE;
if (nla_put_be32(skb, OVS_TUNNEL_KEY_ATTR_IPV4_DST, tun_key->ipv4_dst)) if (output->ipv4_dst &&
nla_put_be32(skb, OVS_TUNNEL_KEY_ATTR_IPV4_DST, output->ipv4_dst))
return -EMSGSIZE; return -EMSGSIZE;
if (tun_key->ipv4_tos && if (output->ipv4_tos &&
nla_put_u8(skb, OVS_TUNNEL_KEY_ATTR_TOS, tun_key->ipv4_tos)) nla_put_u8(skb, OVS_TUNNEL_KEY_ATTR_TOS, output->ipv4_tos))
return -EMSGSIZE; return -EMSGSIZE;
if (nla_put_u8(skb, OVS_TUNNEL_KEY_ATTR_TTL, tun_key->ipv4_ttl)) if (nla_put_u8(skb, OVS_TUNNEL_KEY_ATTR_TTL, output->ipv4_ttl))
return -EMSGSIZE; return -EMSGSIZE;
if ((tun_key->tun_flags & TUNNEL_DONT_FRAGMENT) && if ((output->tun_flags & TUNNEL_DONT_FRAGMENT) &&
nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT)) nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT))
return -EMSGSIZE; return -EMSGSIZE;
if ((tun_key->tun_flags & TUNNEL_CSUM) && if ((output->tun_flags & TUNNEL_CSUM) &&
nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_CSUM)) nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_CSUM))
return -EMSGSIZE; return -EMSGSIZE;
@@ -1078,176 +1343,390 @@ int ovs_ipv4_tun_to_nlattr(struct sk_buff *skb,
return 0; return 0;
} }
/** static int metadata_from_nlattrs(struct sw_flow_match *match, u64 *attrs,
* ovs_flow_from_nlattrs - parses Netlink attributes into a flow key. const struct nlattr **a, bool is_mask)
* @swkey: receives the extracted flow key.
* @key_lenp: number of bytes used in @swkey.
* @attr: Netlink attribute holding nested %OVS_KEY_ATTR_* Netlink attribute
* sequence.
*/
int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
const struct nlattr *attr)
{ {
const struct nlattr *a[OVS_KEY_ATTR_MAX + 1]; if (*attrs & (1 << OVS_KEY_ATTR_PRIORITY)) {
const struct ovs_key_ethernet *eth_key; SW_FLOW_KEY_PUT(match, phy.priority,
int key_len; nla_get_u32(a[OVS_KEY_ATTR_PRIORITY]), is_mask);
u32 attrs; *attrs &= ~(1 << OVS_KEY_ATTR_PRIORITY);
int err; }
memset(swkey, 0, sizeof(struct sw_flow_key)); if (*attrs & (1 << OVS_KEY_ATTR_IN_PORT)) {
key_len = SW_FLOW_KEY_OFFSET(eth); u32 in_port = nla_get_u32(a[OVS_KEY_ATTR_IN_PORT]);
err = parse_flow_nlattrs(attr, a, &attrs); if (is_mask)
if (err) in_port = 0xffffffff; /* Always exact match in_port. */
return err; else if (in_port >= DP_MAX_PORTS)
return -EINVAL;
/* Metadata attributes. */ SW_FLOW_KEY_PUT(match, phy.in_port, in_port, is_mask);
if (attrs & (1 << OVS_KEY_ATTR_PRIORITY)) { *attrs &= ~(1 << OVS_KEY_ATTR_IN_PORT);
swkey->phy.priority = nla_get_u32(a[OVS_KEY_ATTR_PRIORITY]); } else if (!is_mask) {
attrs &= ~(1 << OVS_KEY_ATTR_PRIORITY); SW_FLOW_KEY_PUT(match, phy.in_port, DP_MAX_PORTS, is_mask);
} }
if (attrs & (1 << OVS_KEY_ATTR_IN_PORT)) {
u32 in_port = nla_get_u32(a[OVS_KEY_ATTR_IN_PORT]); if (*attrs & (1 << OVS_KEY_ATTR_SKB_MARK)) {
if (in_port >= DP_MAX_PORTS) uint32_t mark = nla_get_u32(a[OVS_KEY_ATTR_SKB_MARK]);
return -EINVAL;
swkey->phy.in_port = in_port; SW_FLOW_KEY_PUT(match, phy.skb_mark, mark, is_mask);
attrs &= ~(1 << OVS_KEY_ATTR_IN_PORT); *attrs &= ~(1 << OVS_KEY_ATTR_SKB_MARK);
} else {
swkey->phy.in_port = DP_MAX_PORTS;
} }
if (attrs & (1 << OVS_KEY_ATTR_SKB_MARK)) { if (*attrs & (1 << OVS_KEY_ATTR_TUNNEL)) {
swkey->phy.skb_mark = nla_get_u32(a[OVS_KEY_ATTR_SKB_MARK]); if (ovs_ipv4_tun_from_nlattr(a[OVS_KEY_ATTR_TUNNEL], match,
attrs &= ~(1 << OVS_KEY_ATTR_SKB_MARK); is_mask))
return -EINVAL;
*attrs &= ~(1 << OVS_KEY_ATTR_TUNNEL);
} }
return 0;
}
if (attrs & (1 << OVS_KEY_ATTR_TUNNEL)) { static int ovs_key_from_nlattrs(struct sw_flow_match *match, u64 attrs,
err = ovs_ipv4_tun_from_nlattr(a[OVS_KEY_ATTR_TUNNEL], &swkey->tun_key); const struct nlattr **a, bool is_mask)
if (err) {
return err; int err;
u64 orig_attrs = attrs;
attrs &= ~(1 << OVS_KEY_ATTR_TUNNEL); err = metadata_from_nlattrs(match, &attrs, a, is_mask);
} if (err)
return err;
/* Data attributes. */ if (attrs & (1 << OVS_KEY_ATTR_ETHERNET)) {
if (!(attrs & (1 << OVS_KEY_ATTR_ETHERNET))) const struct ovs_key_ethernet *eth_key;
return -EINVAL;
attrs &= ~(1 << OVS_KEY_ATTR_ETHERNET);
eth_key = nla_data(a[OVS_KEY_ATTR_ETHERNET]); eth_key = nla_data(a[OVS_KEY_ATTR_ETHERNET]);
memcpy(swkey->eth.src, eth_key->eth_src, ETH_ALEN); SW_FLOW_KEY_MEMCPY(match, eth.src,
memcpy(swkey->eth.dst, eth_key->eth_dst, ETH_ALEN); eth_key->eth_src, ETH_ALEN, is_mask);
SW_FLOW_KEY_MEMCPY(match, eth.dst,
eth_key->eth_dst, ETH_ALEN, is_mask);
attrs &= ~(1 << OVS_KEY_ATTR_ETHERNET);
}
if (attrs & (1u << OVS_KEY_ATTR_ETHERTYPE) && if (attrs & (1 << OVS_KEY_ATTR_VLAN)) {
nla_get_be16(a[OVS_KEY_ATTR_ETHERTYPE]) == htons(ETH_P_8021Q)) {
const struct nlattr *encap;
__be16 tci; __be16 tci;
if (attrs != ((1 << OVS_KEY_ATTR_VLAN) |
(1 << OVS_KEY_ATTR_ETHERTYPE) |
(1 << OVS_KEY_ATTR_ENCAP)))
return -EINVAL;
encap = a[OVS_KEY_ATTR_ENCAP];
tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]); tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]);
if (tci & htons(VLAN_TAG_PRESENT)) { if (!(tci & htons(VLAN_TAG_PRESENT))) {
swkey->eth.tci = tci; if (is_mask)
OVS_NLERR("VLAN TCI mask does not have exact match for VLAN_TAG_PRESENT bit.\n");
err = parse_flow_nlattrs(encap, a, &attrs); else
if (err) OVS_NLERR("VLAN TCI does not have VLAN_TAG_PRESENT bit set.\n");
return err;
} else if (!tci) {
/* Corner case for truncated 802.1Q header. */
if (nla_len(encap))
return -EINVAL;
swkey->eth.type = htons(ETH_P_8021Q);
*key_lenp = key_len;
return 0;
} else {
return -EINVAL; return -EINVAL;
} }
}
SW_FLOW_KEY_PUT(match, eth.tci, tci, is_mask);
attrs &= ~(1 << OVS_KEY_ATTR_VLAN);
} else if (!is_mask)
SW_FLOW_KEY_PUT(match, eth.tci, htons(0xffff), true);
if (attrs & (1 << OVS_KEY_ATTR_ETHERTYPE)) { if (attrs & (1 << OVS_KEY_ATTR_ETHERTYPE)) {
swkey->eth.type = nla_get_be16(a[OVS_KEY_ATTR_ETHERTYPE]); __be16 eth_type;
if (ntohs(swkey->eth.type) < ETH_P_802_3_MIN)
eth_type = nla_get_be16(a[OVS_KEY_ATTR_ETHERTYPE]);
if (is_mask) {
/* Always exact match EtherType. */
eth_type = htons(0xffff);
} else if (ntohs(eth_type) < ETH_P_802_3_MIN) {
OVS_NLERR("EtherType is less than minimum (type=%x, min=%x).\n",
ntohs(eth_type), ETH_P_802_3_MIN);
return -EINVAL; return -EINVAL;
}
SW_FLOW_KEY_PUT(match, eth.type, eth_type, is_mask);
attrs &= ~(1 << OVS_KEY_ATTR_ETHERTYPE); attrs &= ~(1 << OVS_KEY_ATTR_ETHERTYPE);
} else { } else if (!is_mask) {
swkey->eth.type = htons(ETH_P_802_2); SW_FLOW_KEY_PUT(match, eth.type, htons(ETH_P_802_2), is_mask);
} }
if (swkey->eth.type == htons(ETH_P_IP)) { if (attrs & (1 << OVS_KEY_ATTR_IPV4)) {
const struct ovs_key_ipv4 *ipv4_key; const struct ovs_key_ipv4 *ipv4_key;
if (!(attrs & (1 << OVS_KEY_ATTR_IPV4)))
return -EINVAL;
attrs &= ~(1 << OVS_KEY_ATTR_IPV4);
key_len = SW_FLOW_KEY_OFFSET(ipv4.addr);
ipv4_key = nla_data(a[OVS_KEY_ATTR_IPV4]); ipv4_key = nla_data(a[OVS_KEY_ATTR_IPV4]);
if (ipv4_key->ipv4_frag > OVS_FRAG_TYPE_MAX) if (!is_mask && ipv4_key->ipv4_frag > OVS_FRAG_TYPE_MAX) {
OVS_NLERR("Unknown IPv4 fragment type (value=%d, max=%d).\n",
ipv4_key->ipv4_frag, OVS_FRAG_TYPE_MAX);
return -EINVAL; return -EINVAL;
swkey->ip.proto = ipv4_key->ipv4_proto;
swkey->ip.tos = ipv4_key->ipv4_tos;
swkey->ip.ttl = ipv4_key->ipv4_ttl;
swkey->ip.frag = ipv4_key->ipv4_frag;
swkey->ipv4.addr.src = ipv4_key->ipv4_src;
swkey->ipv4.addr.dst = ipv4_key->ipv4_dst;
if (swkey->ip.frag != OVS_FRAG_TYPE_LATER) {
err = ipv4_flow_from_nlattrs(swkey, &key_len, a, &attrs);
if (err)
return err;
} }
} else if (swkey->eth.type == htons(ETH_P_IPV6)) { SW_FLOW_KEY_PUT(match, ip.proto,
const struct ovs_key_ipv6 *ipv6_key; ipv4_key->ipv4_proto, is_mask);
SW_FLOW_KEY_PUT(match, ip.tos,
ipv4_key->ipv4_tos, is_mask);
SW_FLOW_KEY_PUT(match, ip.ttl,
ipv4_key->ipv4_ttl, is_mask);
SW_FLOW_KEY_PUT(match, ip.frag,
ipv4_key->ipv4_frag, is_mask);
SW_FLOW_KEY_PUT(match, ipv4.addr.src,
ipv4_key->ipv4_src, is_mask);
SW_FLOW_KEY_PUT(match, ipv4.addr.dst,
ipv4_key->ipv4_dst, is_mask);
attrs &= ~(1 << OVS_KEY_ATTR_IPV4);
}
if (!(attrs & (1 << OVS_KEY_ATTR_IPV6))) if (attrs & (1 << OVS_KEY_ATTR_IPV6)) {
return -EINVAL; const struct ovs_key_ipv6 *ipv6_key;
attrs &= ~(1 << OVS_KEY_ATTR_IPV6);
key_len = SW_FLOW_KEY_OFFSET(ipv6.label);
ipv6_key = nla_data(a[OVS_KEY_ATTR_IPV6]); ipv6_key = nla_data(a[OVS_KEY_ATTR_IPV6]);
if (ipv6_key->ipv6_frag > OVS_FRAG_TYPE_MAX) if (!is_mask && ipv6_key->ipv6_frag > OVS_FRAG_TYPE_MAX) {
OVS_NLERR("Unknown IPv6 fragment type (value=%d, max=%d).\n",
ipv6_key->ipv6_frag, OVS_FRAG_TYPE_MAX);
return -EINVAL; return -EINVAL;
swkey->ipv6.label = ipv6_key->ipv6_label;
swkey->ip.proto = ipv6_key->ipv6_proto;
swkey->ip.tos = ipv6_key->ipv6_tclass;
swkey->ip.ttl = ipv6_key->ipv6_hlimit;
swkey->ip.frag = ipv6_key->ipv6_frag;
memcpy(&swkey->ipv6.addr.src, ipv6_key->ipv6_src,
sizeof(swkey->ipv6.addr.src));
memcpy(&swkey->ipv6.addr.dst, ipv6_key->ipv6_dst,
sizeof(swkey->ipv6.addr.dst));
if (swkey->ip.frag != OVS_FRAG_TYPE_LATER) {
err = ipv6_flow_from_nlattrs(swkey, &key_len, a, &attrs);
if (err)
return err;
} }
} else if (swkey->eth.type == htons(ETH_P_ARP) || SW_FLOW_KEY_PUT(match, ipv6.label,
swkey->eth.type == htons(ETH_P_RARP)) { ipv6_key->ipv6_label, is_mask);
SW_FLOW_KEY_PUT(match, ip.proto,
ipv6_key->ipv6_proto, is_mask);
SW_FLOW_KEY_PUT(match, ip.tos,
ipv6_key->ipv6_tclass, is_mask);
SW_FLOW_KEY_PUT(match, ip.ttl,
ipv6_key->ipv6_hlimit, is_mask);
SW_FLOW_KEY_PUT(match, ip.frag,
ipv6_key->ipv6_frag, is_mask);
SW_FLOW_KEY_MEMCPY(match, ipv6.addr.src,
ipv6_key->ipv6_src,
sizeof(match->key->ipv6.addr.src),
is_mask);
SW_FLOW_KEY_MEMCPY(match, ipv6.addr.dst,
ipv6_key->ipv6_dst,
sizeof(match->key->ipv6.addr.dst),
is_mask);
attrs &= ~(1 << OVS_KEY_ATTR_IPV6);
}
if (attrs & (1 << OVS_KEY_ATTR_ARP)) {
const struct ovs_key_arp *arp_key; const struct ovs_key_arp *arp_key;
if (!(attrs & (1 << OVS_KEY_ATTR_ARP))) arp_key = nla_data(a[OVS_KEY_ATTR_ARP]);
if (!is_mask && (arp_key->arp_op & htons(0xff00))) {
OVS_NLERR("Unknown ARP opcode (opcode=%d).\n",
arp_key->arp_op);
return -EINVAL; return -EINVAL;
}
SW_FLOW_KEY_PUT(match, ipv4.addr.src,
arp_key->arp_sip, is_mask);
SW_FLOW_KEY_PUT(match, ipv4.addr.dst,
arp_key->arp_tip, is_mask);
SW_FLOW_KEY_PUT(match, ip.proto,
ntohs(arp_key->arp_op), is_mask);
SW_FLOW_KEY_MEMCPY(match, ipv4.arp.sha,
arp_key->arp_sha, ETH_ALEN, is_mask);
SW_FLOW_KEY_MEMCPY(match, ipv4.arp.tha,
arp_key->arp_tha, ETH_ALEN, is_mask);
attrs &= ~(1 << OVS_KEY_ATTR_ARP); attrs &= ~(1 << OVS_KEY_ATTR_ARP);
}
key_len = SW_FLOW_KEY_OFFSET(ipv4.arp); if (attrs & (1 << OVS_KEY_ATTR_TCP)) {
arp_key = nla_data(a[OVS_KEY_ATTR_ARP]); const struct ovs_key_tcp *tcp_key;
swkey->ipv4.addr.src = arp_key->arp_sip;
swkey->ipv4.addr.dst = arp_key->arp_tip; tcp_key = nla_data(a[OVS_KEY_ATTR_TCP]);
if (arp_key->arp_op & htons(0xff00)) if (orig_attrs & (1 << OVS_KEY_ATTR_IPV4)) {
SW_FLOW_KEY_PUT(match, ipv4.tp.src,
tcp_key->tcp_src, is_mask);
SW_FLOW_KEY_PUT(match, ipv4.tp.dst,
tcp_key->tcp_dst, is_mask);
} else {
SW_FLOW_KEY_PUT(match, ipv6.tp.src,
tcp_key->tcp_src, is_mask);
SW_FLOW_KEY_PUT(match, ipv6.tp.dst,
tcp_key->tcp_dst, is_mask);
}
attrs &= ~(1 << OVS_KEY_ATTR_TCP);
}
if (attrs & (1 << OVS_KEY_ATTR_UDP)) {
const struct ovs_key_udp *udp_key;
udp_key = nla_data(a[OVS_KEY_ATTR_UDP]);
if (orig_attrs & (1 << OVS_KEY_ATTR_IPV4)) {
SW_FLOW_KEY_PUT(match, ipv4.tp.src,
udp_key->udp_src, is_mask);
SW_FLOW_KEY_PUT(match, ipv4.tp.dst,
udp_key->udp_dst, is_mask);
} else {
SW_FLOW_KEY_PUT(match, ipv6.tp.src,
udp_key->udp_src, is_mask);
SW_FLOW_KEY_PUT(match, ipv6.tp.dst,
udp_key->udp_dst, is_mask);
}
attrs &= ~(1 << OVS_KEY_ATTR_UDP);
}
if (attrs & (1 << OVS_KEY_ATTR_SCTP)) {
const struct ovs_key_sctp *sctp_key;
sctp_key = nla_data(a[OVS_KEY_ATTR_SCTP]);
if (orig_attrs & (1 << OVS_KEY_ATTR_IPV4)) {
SW_FLOW_KEY_PUT(match, ipv4.tp.src,
sctp_key->sctp_src, is_mask);
SW_FLOW_KEY_PUT(match, ipv4.tp.dst,
sctp_key->sctp_dst, is_mask);
} else {
SW_FLOW_KEY_PUT(match, ipv6.tp.src,
sctp_key->sctp_src, is_mask);
SW_FLOW_KEY_PUT(match, ipv6.tp.dst,
sctp_key->sctp_dst, is_mask);
}
attrs &= ~(1 << OVS_KEY_ATTR_SCTP);
}
if (attrs & (1 << OVS_KEY_ATTR_ICMP)) {
const struct ovs_key_icmp *icmp_key;
icmp_key = nla_data(a[OVS_KEY_ATTR_ICMP]);
SW_FLOW_KEY_PUT(match, ipv4.tp.src,
htons(icmp_key->icmp_type), is_mask);
SW_FLOW_KEY_PUT(match, ipv4.tp.dst,
htons(icmp_key->icmp_code), is_mask);
attrs &= ~(1 << OVS_KEY_ATTR_ICMP);
}
if (attrs & (1 << OVS_KEY_ATTR_ICMPV6)) {
const struct ovs_key_icmpv6 *icmpv6_key;
icmpv6_key = nla_data(a[OVS_KEY_ATTR_ICMPV6]);
SW_FLOW_KEY_PUT(match, ipv6.tp.src,
htons(icmpv6_key->icmpv6_type), is_mask);
SW_FLOW_KEY_PUT(match, ipv6.tp.dst,
htons(icmpv6_key->icmpv6_code), is_mask);
attrs &= ~(1 << OVS_KEY_ATTR_ICMPV6);
}
if (attrs & (1 << OVS_KEY_ATTR_ND)) {
const struct ovs_key_nd *nd_key;
nd_key = nla_data(a[OVS_KEY_ATTR_ND]);
SW_FLOW_KEY_MEMCPY(match, ipv6.nd.target,
nd_key->nd_target,
sizeof(match->key->ipv6.nd.target),
is_mask);
SW_FLOW_KEY_MEMCPY(match, ipv6.nd.sll,
nd_key->nd_sll, ETH_ALEN, is_mask);
SW_FLOW_KEY_MEMCPY(match, ipv6.nd.tll,
nd_key->nd_tll, ETH_ALEN, is_mask);
attrs &= ~(1 << OVS_KEY_ATTR_ND);
}
if (attrs != 0)
return -EINVAL;
return 0;
}
/**
* ovs_match_from_nlattrs - parses Netlink attributes into a flow key and
* mask. If 'mask' is NULL, the flow is treated as an exact match
* flow. Otherwise, it is treated as a wildcarded flow, unless the mask
* contains no don't-care bits.
* @match: receives the extracted flow match information.
* @key: Netlink attribute holding nested %OVS_KEY_ATTR_* Netlink attribute
* sequence. The fields should be those of the packet that triggered the
* creation of this flow.
* @mask: Optional. Netlink attribute holding nested %OVS_KEY_ATTR_* Netlink
* attribute sequence that specifies the mask field of the wildcarded flow.
*/
int ovs_match_from_nlattrs(struct sw_flow_match *match,
const struct nlattr *key,
const struct nlattr *mask)
{
const struct nlattr *a[OVS_KEY_ATTR_MAX + 1];
const struct nlattr *encap;
u64 key_attrs = 0;
u64 mask_attrs = 0;
bool encap_valid = false;
int err;
err = parse_flow_nlattrs(key, a, &key_attrs);
if (err)
return err;
if ((key_attrs & (1 << OVS_KEY_ATTR_ETHERNET)) &&
(key_attrs & (1 << OVS_KEY_ATTR_ETHERTYPE)) &&
(nla_get_be16(a[OVS_KEY_ATTR_ETHERTYPE]) == htons(ETH_P_8021Q))) {
__be16 tci;
if (!((key_attrs & (1 << OVS_KEY_ATTR_VLAN)) &&
(key_attrs & (1 << OVS_KEY_ATTR_ENCAP)))) {
OVS_NLERR("Invalid VLAN frame.\n");
return -EINVAL; return -EINVAL;
swkey->ip.proto = ntohs(arp_key->arp_op); }
memcpy(swkey->ipv4.arp.sha, arp_key->arp_sha, ETH_ALEN);
memcpy(swkey->ipv4.arp.tha, arp_key->arp_tha, ETH_ALEN); key_attrs &= ~(1 << OVS_KEY_ATTR_ETHERTYPE);
tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]);
encap = a[OVS_KEY_ATTR_ENCAP];
key_attrs &= ~(1 << OVS_KEY_ATTR_ENCAP);
encap_valid = true;
if (tci & htons(VLAN_TAG_PRESENT)) {
err = parse_flow_nlattrs(encap, a, &key_attrs);
if (err)
return err;
} else if (!tci) {
/* Corner case for truncated 802.1Q header. */
if (nla_len(encap)) {
OVS_NLERR("Truncated 802.1Q header has non-zero encap attribute.\n");
return -EINVAL;
}
} else {
OVS_NLERR("Encap attribute is set for a non-VLAN frame.\n");
return -EINVAL;
}
}
err = ovs_key_from_nlattrs(match, key_attrs, a, false);
if (err)
return err;
if (mask) {
err = parse_flow_mask_nlattrs(mask, a, &mask_attrs);
if (err)
return err;
if (mask_attrs & 1ULL << OVS_KEY_ATTR_ENCAP) {
__be16 eth_type = 0;
__be16 tci = 0;
if (!encap_valid) {
OVS_NLERR("Encap mask attribute is set for non-VLAN frame.\n");
return -EINVAL;
}
mask_attrs &= ~(1 << OVS_KEY_ATTR_ENCAP);
if (a[OVS_KEY_ATTR_ETHERTYPE])
eth_type = nla_get_be16(a[OVS_KEY_ATTR_ETHERTYPE]);
if (eth_type == htons(0xffff)) {
mask_attrs &= ~(1 << OVS_KEY_ATTR_ETHERTYPE);
encap = a[OVS_KEY_ATTR_ENCAP];
err = parse_flow_mask_nlattrs(encap, a, &mask_attrs);
} else {
OVS_NLERR("VLAN frames must have an exact match on the TPID (mask=%x).\n",
ntohs(eth_type));
return -EINVAL;
}
if (a[OVS_KEY_ATTR_VLAN])
tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]);
if (!(tci & htons(VLAN_TAG_PRESENT))) {
OVS_NLERR("VLAN tag present bit must have an exact match (tci_mask=%x).\n", ntohs(tci));
return -EINVAL;
}
}
err = ovs_key_from_nlattrs(match, mask_attrs, a, true);
if (err)
return err;
} else {
/* Populate exact match flow's key mask. */
if (match->mask)
ovs_sw_flow_mask_set(match->mask, &match->range, 0xff);
} }
if (attrs) if (!ovs_match_validate(match, key_attrs, mask_attrs))
return -EINVAL; return -EINVAL;
*key_lenp = key_len;
return 0; return 0;
} }
@@ -1255,7 +1734,6 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
/** /**
* ovs_flow_metadata_from_nlattrs - parses Netlink attributes into a flow key. * ovs_flow_metadata_from_nlattrs - parses Netlink attributes into a flow key.
* @flow: Receives extracted in_port, priority, tun_key and skb_mark. * @flow: Receives extracted in_port, priority, tun_key and skb_mark.
* @key_len: Length of key in @flow. Used for calculating flow hash.
* @attr: Netlink attribute holding nested %OVS_KEY_ATTR_* Netlink attribute * @attr: Netlink attribute holding nested %OVS_KEY_ATTR_* Netlink attribute
* sequence. * sequence.
* *
@@ -1264,102 +1742,100 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
* get the metadata, that is, the parts of the flow key that cannot be * get the metadata, that is, the parts of the flow key that cannot be
* extracted from the packet itself. * extracted from the packet itself.
*/ */
int ovs_flow_metadata_from_nlattrs(struct sw_flow *flow, int key_len,
const struct nlattr *attr) int ovs_flow_metadata_from_nlattrs(struct sw_flow *flow,
const struct nlattr *attr)
{ {
struct ovs_key_ipv4_tunnel *tun_key = &flow->key.tun_key; struct ovs_key_ipv4_tunnel *tun_key = &flow->key.tun_key;
const struct nlattr *nla; const struct nlattr *a[OVS_KEY_ATTR_MAX + 1];
int rem; u64 attrs = 0;
int err;
struct sw_flow_match match;
flow->key.phy.in_port = DP_MAX_PORTS; flow->key.phy.in_port = DP_MAX_PORTS;
flow->key.phy.priority = 0; flow->key.phy.priority = 0;
flow->key.phy.skb_mark = 0; flow->key.phy.skb_mark = 0;
memset(tun_key, 0, sizeof(flow->key.tun_key)); memset(tun_key, 0, sizeof(flow->key.tun_key));
nla_for_each_nested(nla, attr, rem) { err = parse_flow_nlattrs(attr, a, &attrs);
int type = nla_type(nla); if (err)
if (type <= OVS_KEY_ATTR_MAX && ovs_key_lens[type] > 0) {
int err;
if (nla_len(nla) != ovs_key_lens[type])
return -EINVAL;
switch (type) {
case OVS_KEY_ATTR_PRIORITY:
flow->key.phy.priority = nla_get_u32(nla);
break;
case OVS_KEY_ATTR_TUNNEL:
err = ovs_ipv4_tun_from_nlattr(nla, tun_key);
if (err)
return err;
break;
case OVS_KEY_ATTR_IN_PORT:
if (nla_get_u32(nla) >= DP_MAX_PORTS)
return -EINVAL;
flow->key.phy.in_port = nla_get_u32(nla);
break;
case OVS_KEY_ATTR_SKB_MARK:
flow->key.phy.skb_mark = nla_get_u32(nla);
break;
}
}
}
if (rem)
return -EINVAL; return -EINVAL;
flow->hash = ovs_flow_hash(&flow->key, memset(&match, 0, sizeof(match));
flow_key_start(&flow->key), key_len); match.key = &flow->key;
err = metadata_from_nlattrs(&match, &attrs, a, false);
if (err)
return err;
return 0; return 0;
} }
int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb) int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey,
const struct sw_flow_key *output, struct sk_buff *skb)
{ {
struct ovs_key_ethernet *eth_key; struct ovs_key_ethernet *eth_key;
struct nlattr *nla, *encap; struct nlattr *nla, *encap;
bool is_mask = (swkey != output);
if (swkey->phy.priority && if (nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, output->phy.priority))
nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, swkey->phy.priority))
goto nla_put_failure; goto nla_put_failure;
if (swkey->tun_key.ipv4_dst && if ((swkey->tun_key.ipv4_dst || is_mask) &&
ovs_ipv4_tun_to_nlattr(skb, &swkey->tun_key)) ovs_ipv4_tun_to_nlattr(skb, &swkey->tun_key, &output->tun_key))
goto nla_put_failure; goto nla_put_failure;
if (swkey->phy.in_port != DP_MAX_PORTS && if (swkey->phy.in_port == DP_MAX_PORTS) {
nla_put_u32(skb, OVS_KEY_ATTR_IN_PORT, swkey->phy.in_port)) if (is_mask && (output->phy.in_port == 0xffff))
goto nla_put_failure; if (nla_put_u32(skb, OVS_KEY_ATTR_IN_PORT, 0xffffffff))
goto nla_put_failure;
} else {
u16 upper_u16;
upper_u16 = !is_mask ? 0 : 0xffff;
if (swkey->phy.skb_mark && if (nla_put_u32(skb, OVS_KEY_ATTR_IN_PORT,
nla_put_u32(skb, OVS_KEY_ATTR_SKB_MARK, swkey->phy.skb_mark)) (upper_u16 << 16) | output->phy.in_port))
goto nla_put_failure;
}
if (nla_put_u32(skb, OVS_KEY_ATTR_SKB_MARK, output->phy.skb_mark))
goto nla_put_failure; goto nla_put_failure;
nla = nla_reserve(skb, OVS_KEY_ATTR_ETHERNET, sizeof(*eth_key)); nla = nla_reserve(skb, OVS_KEY_ATTR_ETHERNET, sizeof(*eth_key));
if (!nla) if (!nla)
goto nla_put_failure; goto nla_put_failure;
eth_key = nla_data(nla); eth_key = nla_data(nla);
memcpy(eth_key->eth_src, swkey->eth.src, ETH_ALEN); memcpy(eth_key->eth_src, output->eth.src, ETH_ALEN);
memcpy(eth_key->eth_dst, swkey->eth.dst, ETH_ALEN); memcpy(eth_key->eth_dst, output->eth.dst, ETH_ALEN);
if (swkey->eth.tci || swkey->eth.type == htons(ETH_P_8021Q)) { if (swkey->eth.tci || swkey->eth.type == htons(ETH_P_8021Q)) {
if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, htons(ETH_P_8021Q)) || __be16 eth_type;
nla_put_be16(skb, OVS_KEY_ATTR_VLAN, swkey->eth.tci)) eth_type = !is_mask ? htons(ETH_P_8021Q) : htons(0xffff);
if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, eth_type) ||
nla_put_be16(skb, OVS_KEY_ATTR_VLAN, output->eth.tci))
goto nla_put_failure; goto nla_put_failure;
encap = nla_nest_start(skb, OVS_KEY_ATTR_ENCAP); encap = nla_nest_start(skb, OVS_KEY_ATTR_ENCAP);
if (!swkey->eth.tci) if (!swkey->eth.tci)
goto unencap; goto unencap;
} else { } else
encap = NULL; encap = NULL;
}
if (swkey->eth.type == htons(ETH_P_802_2)) if (swkey->eth.type == htons(ETH_P_802_2)) {
/*
* Ethertype 802.2 is represented in the netlink with omitted
* OVS_KEY_ATTR_ETHERTYPE in the flow key attribute, and
* 0xffff in the mask attribute. Ethertype can also
* be wildcarded.
*/
if (is_mask && output->eth.type)
if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE,
output->eth.type))
goto nla_put_failure;
goto unencap; goto unencap;
}
if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, swkey->eth.type)) if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, output->eth.type))
goto nla_put_failure; goto nla_put_failure;
if (swkey->eth.type == htons(ETH_P_IP)) { if (swkey->eth.type == htons(ETH_P_IP)) {
...@@ -1369,12 +1845,12 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb) ...@@ -1369,12 +1845,12 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
if (!nla) if (!nla)
goto nla_put_failure; goto nla_put_failure;
ipv4_key = nla_data(nla); ipv4_key = nla_data(nla);
ipv4_key->ipv4_src = swkey->ipv4.addr.src; ipv4_key->ipv4_src = output->ipv4.addr.src;
ipv4_key->ipv4_dst = swkey->ipv4.addr.dst; ipv4_key->ipv4_dst = output->ipv4.addr.dst;
ipv4_key->ipv4_proto = swkey->ip.proto; ipv4_key->ipv4_proto = output->ip.proto;
ipv4_key->ipv4_tos = swkey->ip.tos; ipv4_key->ipv4_tos = output->ip.tos;
ipv4_key->ipv4_ttl = swkey->ip.ttl; ipv4_key->ipv4_ttl = output->ip.ttl;
ipv4_key->ipv4_frag = swkey->ip.frag; ipv4_key->ipv4_frag = output->ip.frag;
} else if (swkey->eth.type == htons(ETH_P_IPV6)) { } else if (swkey->eth.type == htons(ETH_P_IPV6)) {
struct ovs_key_ipv6 *ipv6_key; struct ovs_key_ipv6 *ipv6_key;
...@@ -1382,15 +1858,15 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb) ...@@ -1382,15 +1858,15 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
if (!nla) if (!nla)
goto nla_put_failure; goto nla_put_failure;
ipv6_key = nla_data(nla); ipv6_key = nla_data(nla);
memcpy(ipv6_key->ipv6_src, &swkey->ipv6.addr.src, memcpy(ipv6_key->ipv6_src, &output->ipv6.addr.src,
sizeof(ipv6_key->ipv6_src)); sizeof(ipv6_key->ipv6_src));
memcpy(ipv6_key->ipv6_dst, &swkey->ipv6.addr.dst, memcpy(ipv6_key->ipv6_dst, &output->ipv6.addr.dst,
sizeof(ipv6_key->ipv6_dst)); sizeof(ipv6_key->ipv6_dst));
ipv6_key->ipv6_label = swkey->ipv6.label; ipv6_key->ipv6_label = output->ipv6.label;
ipv6_key->ipv6_proto = swkey->ip.proto; ipv6_key->ipv6_proto = output->ip.proto;
ipv6_key->ipv6_tclass = swkey->ip.tos; ipv6_key->ipv6_tclass = output->ip.tos;
ipv6_key->ipv6_hlimit = swkey->ip.ttl; ipv6_key->ipv6_hlimit = output->ip.ttl;
ipv6_key->ipv6_frag = swkey->ip.frag; ipv6_key->ipv6_frag = output->ip.frag;
} else if (swkey->eth.type == htons(ETH_P_ARP) || } else if (swkey->eth.type == htons(ETH_P_ARP) ||
swkey->eth.type == htons(ETH_P_RARP)) { swkey->eth.type == htons(ETH_P_RARP)) {
struct ovs_key_arp *arp_key; struct ovs_key_arp *arp_key;
...@@ -1400,11 +1876,11 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb) ...@@ -1400,11 +1876,11 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
goto nla_put_failure; goto nla_put_failure;
arp_key = nla_data(nla); arp_key = nla_data(nla);
memset(arp_key, 0, sizeof(struct ovs_key_arp)); memset(arp_key, 0, sizeof(struct ovs_key_arp));
arp_key->arp_sip = swkey->ipv4.addr.src; arp_key->arp_sip = output->ipv4.addr.src;
arp_key->arp_tip = swkey->ipv4.addr.dst; arp_key->arp_tip = output->ipv4.addr.dst;
arp_key->arp_op = htons(swkey->ip.proto); arp_key->arp_op = htons(output->ip.proto);
memcpy(arp_key->arp_sha, swkey->ipv4.arp.sha, ETH_ALEN); memcpy(arp_key->arp_sha, output->ipv4.arp.sha, ETH_ALEN);
memcpy(arp_key->arp_tha, swkey->ipv4.arp.tha, ETH_ALEN); memcpy(arp_key->arp_tha, output->ipv4.arp.tha, ETH_ALEN);
} }
if ((swkey->eth.type == htons(ETH_P_IP) || if ((swkey->eth.type == htons(ETH_P_IP) ||
...@@ -1419,11 +1895,11 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb) ...@@ -1419,11 +1895,11 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
goto nla_put_failure; goto nla_put_failure;
tcp_key = nla_data(nla); tcp_key = nla_data(nla);
if (swkey->eth.type == htons(ETH_P_IP)) { if (swkey->eth.type == htons(ETH_P_IP)) {
tcp_key->tcp_src = swkey->ipv4.tp.src; tcp_key->tcp_src = output->ipv4.tp.src;
tcp_key->tcp_dst = swkey->ipv4.tp.dst; tcp_key->tcp_dst = output->ipv4.tp.dst;
} else if (swkey->eth.type == htons(ETH_P_IPV6)) { } else if (swkey->eth.type == htons(ETH_P_IPV6)) {
tcp_key->tcp_src = swkey->ipv6.tp.src; tcp_key->tcp_src = output->ipv6.tp.src;
tcp_key->tcp_dst = swkey->ipv6.tp.dst; tcp_key->tcp_dst = output->ipv6.tp.dst;
} }
} else if (swkey->ip.proto == IPPROTO_UDP) { } else if (swkey->ip.proto == IPPROTO_UDP) {
struct ovs_key_udp *udp_key; struct ovs_key_udp *udp_key;
...@@ -1433,11 +1909,25 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb) ...@@ -1433,11 +1909,25 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
goto nla_put_failure; goto nla_put_failure;
udp_key = nla_data(nla); udp_key = nla_data(nla);
if (swkey->eth.type == htons(ETH_P_IP)) { if (swkey->eth.type == htons(ETH_P_IP)) {
udp_key->udp_src = swkey->ipv4.tp.src; udp_key->udp_src = output->ipv4.tp.src;
udp_key->udp_dst = swkey->ipv4.tp.dst; udp_key->udp_dst = output->ipv4.tp.dst;
} else if (swkey->eth.type == htons(ETH_P_IPV6)) {
udp_key->udp_src = output->ipv6.tp.src;
udp_key->udp_dst = output->ipv6.tp.dst;
}
} else if (swkey->ip.proto == IPPROTO_SCTP) {
struct ovs_key_sctp *sctp_key;
nla = nla_reserve(skb, OVS_KEY_ATTR_SCTP, sizeof(*sctp_key));
if (!nla)
goto nla_put_failure;
sctp_key = nla_data(nla);
if (swkey->eth.type == htons(ETH_P_IP)) {
sctp_key->sctp_src = swkey->ipv4.tp.src;
sctp_key->sctp_dst = swkey->ipv4.tp.dst;
} else if (swkey->eth.type == htons(ETH_P_IPV6)) { } else if (swkey->eth.type == htons(ETH_P_IPV6)) {
udp_key->udp_src = swkey->ipv6.tp.src; sctp_key->sctp_src = swkey->ipv6.tp.src;
udp_key->udp_dst = swkey->ipv6.tp.dst; sctp_key->sctp_dst = swkey->ipv6.tp.dst;
} }
} else if (swkey->eth.type == htons(ETH_P_IP) && } else if (swkey->eth.type == htons(ETH_P_IP) &&
swkey->ip.proto == IPPROTO_ICMP) { swkey->ip.proto == IPPROTO_ICMP) {
...@@ -1447,8 +1937,8 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb) ...@@ -1447,8 +1937,8 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
if (!nla) if (!nla)
goto nla_put_failure; goto nla_put_failure;
icmp_key = nla_data(nla); icmp_key = nla_data(nla);
icmp_key->icmp_type = ntohs(swkey->ipv4.tp.src); icmp_key->icmp_type = ntohs(output->ipv4.tp.src);
icmp_key->icmp_code = ntohs(swkey->ipv4.tp.dst); icmp_key->icmp_code = ntohs(output->ipv4.tp.dst);
} else if (swkey->eth.type == htons(ETH_P_IPV6) && } else if (swkey->eth.type == htons(ETH_P_IPV6) &&
swkey->ip.proto == IPPROTO_ICMPV6) { swkey->ip.proto == IPPROTO_ICMPV6) {
struct ovs_key_icmpv6 *icmpv6_key; struct ovs_key_icmpv6 *icmpv6_key;
...@@ -1458,8 +1948,8 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb) ...@@ -1458,8 +1948,8 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
if (!nla) if (!nla)
goto nla_put_failure; goto nla_put_failure;
icmpv6_key = nla_data(nla); icmpv6_key = nla_data(nla);
icmpv6_key->icmpv6_type = ntohs(swkey->ipv6.tp.src); icmpv6_key->icmpv6_type = ntohs(output->ipv6.tp.src);
icmpv6_key->icmpv6_code = ntohs(swkey->ipv6.tp.dst); icmpv6_key->icmpv6_code = ntohs(output->ipv6.tp.dst);
if (icmpv6_key->icmpv6_type == NDISC_NEIGHBOUR_SOLICITATION || if (icmpv6_key->icmpv6_type == NDISC_NEIGHBOUR_SOLICITATION ||
icmpv6_key->icmpv6_type == NDISC_NEIGHBOUR_ADVERTISEMENT) { icmpv6_key->icmpv6_type == NDISC_NEIGHBOUR_ADVERTISEMENT) {
...@@ -1469,10 +1959,10 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb) ...@@ -1469,10 +1959,10 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
if (!nla) if (!nla)
goto nla_put_failure; goto nla_put_failure;
nd_key = nla_data(nla); nd_key = nla_data(nla);
memcpy(nd_key->nd_target, &swkey->ipv6.nd.target, memcpy(nd_key->nd_target, &output->ipv6.nd.target,
sizeof(nd_key->nd_target)); sizeof(nd_key->nd_target));
memcpy(nd_key->nd_sll, swkey->ipv6.nd.sll, ETH_ALEN); memcpy(nd_key->nd_sll, output->ipv6.nd.sll, ETH_ALEN);
memcpy(nd_key->nd_tll, swkey->ipv6.nd.tll, ETH_ALEN); memcpy(nd_key->nd_tll, output->ipv6.nd.tll, ETH_ALEN);
} }
} }
} }
...@@ -1491,6 +1981,8 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb) ...@@ -1491,6 +1981,8 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
* Returns zero if successful or a negative error code. */ * Returns zero if successful or a negative error code. */
int ovs_flow_init(void) int ovs_flow_init(void)
{ {
BUILD_BUG_ON(sizeof(struct sw_flow_key) % sizeof(long));
flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow), 0, flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow), 0,
0, NULL); 0, NULL);
if (flow_cache == NULL) if (flow_cache == NULL)
...@@ -1504,3 +1996,84 @@ void ovs_flow_exit(void) ...@@ -1504,3 +1996,84 @@ void ovs_flow_exit(void)
{ {
kmem_cache_destroy(flow_cache); kmem_cache_destroy(flow_cache);
} }
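The BUILD_BUG_ON added to ovs_flow_init() above rejects, at compile time, any sw_flow_key layout whose size is not a multiple of sizeof(long). The code that relies on this is not shown in this part of the diff. One plausible use, sketched here purely as an assumption, is comparing keys one long at a time. `toy_key` and `toy_key_equal` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct sw_flow_key: four longs' worth of bytes. */
struct toy_key {
	unsigned char bytes[4 * sizeof(long)];
};

/* Userspace analogue of BUILD_BUG_ON(sizeof(struct sw_flow_key) % sizeof(long)). */
static_assert(sizeof(struct toy_key) % sizeof(long) == 0,
	      "key size must be a multiple of sizeof(long)");

/* Compare two keys one long at a time; memcpy keeps the loads well defined. */
static bool toy_key_equal(const struct toy_key *a, const struct toy_key *b)
{
	unsigned long diffs = 0;
	size_t i;

	for (i = 0; i < sizeof(a->bytes); i += sizeof(long)) {
		unsigned long wa, wb;

		memcpy(&wa, a->bytes + i, sizeof(wa));
		memcpy(&wb, b->bytes + i, sizeof(wb));
		diffs |= wa ^ wb;
	}
	return diffs == 0;
}
```

Because the size is a whole number of longs, the loop never reads past the end of the key.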
struct sw_flow_mask *ovs_sw_flow_mask_alloc(void)
{
struct sw_flow_mask *mask;
mask = kmalloc(sizeof(*mask), GFP_KERNEL);
if (mask)
mask->ref_count = 0;
return mask;
}
void ovs_sw_flow_mask_add_ref(struct sw_flow_mask *mask)
{
mask->ref_count++;
}
void ovs_sw_flow_mask_del_ref(struct sw_flow_mask *mask, bool deferred)
{
if (!mask)
return;
BUG_ON(!mask->ref_count);
mask->ref_count--;
if (!mask->ref_count) {
list_del_rcu(&mask->list);
if (deferred)
kfree_rcu(mask, rcu);
else
kfree(mask);
}
}
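The mask lifetime rules above (allocate with a zero count, take a reference per user, free on the last drop) can be sketched in userspace as follows. `toy_mask` and its helpers are hypothetical names. The RCU list removal and deferred free are omitted, and the boolean return value is added only so the example can be checked. The counter is a plain int, so this sketch assumes, as the original appears to, that callers are serialized by a lock not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sw_flow_mask; only the counter is kept. */
struct toy_mask {
	int ref_count;
};

/* A new mask starts with no references, as in ovs_sw_flow_mask_alloc(). */
static struct toy_mask *toy_mask_alloc(void)
{
	struct toy_mask *mask = malloc(sizeof(*mask));

	if (mask)
		mask->ref_count = 0;
	return mask;
}

static void toy_mask_add_ref(struct toy_mask *mask)
{
	mask->ref_count++;
}

/* Drop one reference; returns true when the mask was freed. */
static bool toy_mask_del_ref(struct toy_mask *mask)
{
	if (!mask)
		return false;

	assert(mask->ref_count > 0);	/* mirrors BUG_ON(!mask->ref_count) */
	if (--mask->ref_count)
		return false;

	free(mask);
	return true;
}
```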
static bool ovs_sw_flow_mask_equal(const struct sw_flow_mask *a,
const struct sw_flow_mask *b)
{
u8 *a_ = (u8 *)&a->key + a->range.start;
u8 *b_ = (u8 *)&b->key + b->range.start;
return (a->range.end == b->range.end)
&& (a->range.start == b->range.start)
&& (memcmp(a_, b_, range_n_bytes(&a->range)) == 0);
}
struct sw_flow_mask *ovs_sw_flow_mask_find(const struct flow_table *tbl,
const struct sw_flow_mask *mask)
{
struct list_head *ml;
list_for_each(ml, tbl->mask_list) {
struct sw_flow_mask *m;
m = container_of(ml, struct sw_flow_mask, list);
if (ovs_sw_flow_mask_equal(mask, m))
return m;
}
return NULL;
}
/**
* add a new mask into the mask list.
* The caller needs to make sure that 'mask' is not the same
* as any masks that are already on the list.
*/
void ovs_sw_flow_mask_insert(struct flow_table *tbl, struct sw_flow_mask *mask)
{
list_add_rcu(&mask->list, tbl->mask_list);
}
/**
* Set 'range' fields in the mask to the value of 'val'.
*/
static void ovs_sw_flow_mask_set(struct sw_flow_mask *mask,
struct sw_flow_key_range *range, u8 val)
{
u8 *m = (u8 *)&mask->key + range->start;
mask->range = *range;
memset(m, val, range_n_bytes(range));
}
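The mask helpers above treat a mask as a byte range within the key plus the key bytes inside that range. A minimal userspace sketch of the set and compare operations follows. The `toy_*` names are hypothetical, and `toy_range_n_bytes` assumes that range_n_bytes(), which is not shown in this diff, returns `end - start`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for sw_flow_key, sw_flow_key_range, sw_flow_mask. */
struct toy_key { uint8_t bytes[16]; };
struct toy_range { size_t start; size_t end; };
struct toy_mask { struct toy_range range; struct toy_key key; };

/* Assumption: a range covers the bytes [start, end). */
static size_t toy_range_n_bytes(const struct toy_range *range)
{
	assert(range->start <= range->end);
	return range->end - range->start;
}

/* Record 'range' in the mask and fill its bytes with 'val'. */
static void toy_mask_set(struct toy_mask *mask,
			 const struct toy_range *range, uint8_t val)
{
	mask->range = *range;
	memset(mask->key.bytes + range->start, val, toy_range_n_bytes(range));
}

/* Masks are equal when the ranges match and the bytes inside them match. */
static bool toy_mask_equal(const struct toy_mask *a, const struct toy_mask *b)
{
	return a->range.start == b->range.start
		&& a->range.end == b->range.end
		&& memcmp(a->key.bytes + a->range.start,
			  b->key.bytes + b->range.start,
			  toy_range_n_bytes(&a->range)) == 0;
}
```

Bytes outside the range never take part in the comparison, so two masks with different leftover bytes but identical ranges and in-range contents compare equal.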
/* /*
* Copyright (c) 2007-2011 Nicira, Inc. * Copyright (c) 2007-2013 Nicira, Inc.
* *
* This program is free software; you can redistribute it and/or * This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public * modify it under the terms of version 2 of the GNU General Public
...@@ -33,6 +33,8 @@ ...@@ -33,6 +33,8 @@
#include <net/inet_ecn.h> #include <net/inet_ecn.h>
struct sk_buff; struct sk_buff;
struct sw_flow_mask;
struct flow_table;
struct sw_flow_actions { struct sw_flow_actions {
struct rcu_head rcu; struct rcu_head rcu;
...@@ -97,8 +99,8 @@ struct sw_flow_key { ...@@ -97,8 +99,8 @@ struct sw_flow_key {
} addr; } addr;
union { union {
struct { struct {
__be16 src; /* TCP/UDP source port. */ __be16 src; /* TCP/UDP/SCTP source port. */
__be16 dst; /* TCP/UDP destination port. */ __be16 dst; /* TCP/UDP/SCTP destination port. */
} tp; } tp;
struct { struct {
u8 sha[ETH_ALEN]; /* ARP source hardware address. */ u8 sha[ETH_ALEN]; /* ARP source hardware address. */
...@@ -113,8 +115,8 @@ struct sw_flow_key { ...@@ -113,8 +115,8 @@ struct sw_flow_key {
} addr; } addr;
__be32 label; /* IPv6 flow label. */ __be32 label; /* IPv6 flow label. */
struct { struct {
__be16 src; /* TCP/UDP source port. */ __be16 src; /* TCP/UDP/SCTP source port. */
__be16 dst; /* TCP/UDP destination port. */ __be16 dst; /* TCP/UDP/SCTP destination port. */
} tp; } tp;
struct { struct {
struct in6_addr target; /* ND target address. */ struct in6_addr target; /* ND target address. */
...@@ -123,7 +125,7 @@ struct sw_flow_key { ...@@ -123,7 +125,7 @@ struct sw_flow_key {
} nd; } nd;
} ipv6; } ipv6;
}; };
}; } __aligned(__alignof__(long));
struct sw_flow { struct sw_flow {
struct rcu_head rcu; struct rcu_head rcu;
...@@ -131,6 +133,8 @@ struct sw_flow { ...@@ -131,6 +133,8 @@ struct sw_flow {
u32 hash; u32 hash;
struct sw_flow_key key; struct sw_flow_key key;
struct sw_flow_key unmasked_key;
struct sw_flow_mask *mask;
struct sw_flow_actions __rcu *sf_acts; struct sw_flow_actions __rcu *sf_acts;
spinlock_t lock; /* Lock for values below. */ spinlock_t lock; /* Lock for values below. */
...@@ -140,6 +144,20 @@ struct sw_flow { ...@@ -140,6 +144,20 @@ struct sw_flow {
u8 tcp_flags; /* Union of seen TCP flags. */ u8 tcp_flags; /* Union of seen TCP flags. */
}; };
struct sw_flow_key_range {
size_t start;
size_t end;
};
struct sw_flow_match {
struct sw_flow_key *key;
struct sw_flow_key_range range;
struct sw_flow_mask *mask;
};
void ovs_match_init(struct sw_flow_match *match,
struct sw_flow_key *key, struct sw_flow_mask *mask);
struct arp_eth_header { struct arp_eth_header {
__be16 ar_hrd; /* format of hardware address */ __be16 ar_hrd; /* format of hardware address */
__be16 ar_pro; /* format of protocol address */ __be16 ar_pro; /* format of protocol address */
...@@ -159,21 +177,21 @@ void ovs_flow_exit(void); ...@@ -159,21 +177,21 @@ void ovs_flow_exit(void);
struct sw_flow *ovs_flow_alloc(void); struct sw_flow *ovs_flow_alloc(void);
void ovs_flow_deferred_free(struct sw_flow *); void ovs_flow_deferred_free(struct sw_flow *);
void ovs_flow_free(struct sw_flow *flow); void ovs_flow_free(struct sw_flow *, bool deferred);
struct sw_flow_actions *ovs_flow_actions_alloc(int actions_len); struct sw_flow_actions *ovs_flow_actions_alloc(int actions_len);
void ovs_flow_deferred_free_acts(struct sw_flow_actions *); void ovs_flow_deferred_free_acts(struct sw_flow_actions *);
int ovs_flow_extract(struct sk_buff *, u16 in_port, struct sw_flow_key *, int ovs_flow_extract(struct sk_buff *, u16 in_port, struct sw_flow_key *);
int *key_lenp);
void ovs_flow_used(struct sw_flow *, struct sk_buff *); void ovs_flow_used(struct sw_flow *, struct sk_buff *);
u64 ovs_flow_used_time(unsigned long flow_jiffies); u64 ovs_flow_used_time(unsigned long flow_jiffies);
int ovs_flow_to_nlattrs(const struct sw_flow_key *,
int ovs_flow_to_nlattrs(const struct sw_flow_key *, struct sk_buff *); const struct sw_flow_key *, struct sk_buff *);
int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp, int ovs_match_from_nlattrs(struct sw_flow_match *match,
const struct nlattr *,
const struct nlattr *); const struct nlattr *);
int ovs_flow_metadata_from_nlattrs(struct sw_flow *flow, int key_len, int ovs_flow_metadata_from_nlattrs(struct sw_flow *flow,
const struct nlattr *attr); const struct nlattr *attr);
#define MAX_ACTIONS_BUFSIZE (32 * 1024) #define MAX_ACTIONS_BUFSIZE (32 * 1024)
#define TBL_MIN_BUCKETS 1024 #define TBL_MIN_BUCKETS 1024
...@@ -182,6 +200,7 @@ struct flow_table { ...@@ -182,6 +200,7 @@ struct flow_table {
struct flex_array *buckets; struct flex_array *buckets;
unsigned int count, n_buckets; unsigned int count, n_buckets;
struct rcu_head rcu; struct rcu_head rcu;
struct list_head *mask_list;
int node_ver; int node_ver;
u32 hash_seed; u32 hash_seed;
bool keep_flows; bool keep_flows;
...@@ -197,22 +216,44 @@ static inline int ovs_flow_tbl_need_to_expand(struct flow_table *table) ...@@ -197,22 +216,44 @@ static inline int ovs_flow_tbl_need_to_expand(struct flow_table *table)
return (table->count > table->n_buckets); return (table->count > table->n_buckets);
} }
struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *table, struct sw_flow *ovs_flow_lookup(struct flow_table *,
struct sw_flow_key *key, int len); const struct sw_flow_key *);
void ovs_flow_tbl_destroy(struct flow_table *table); struct sw_flow *ovs_flow_lookup_unmasked_key(struct flow_table *table,
void ovs_flow_tbl_deferred_destroy(struct flow_table *table); struct sw_flow_match *match);
void ovs_flow_tbl_destroy(struct flow_table *table, bool deferred);
struct flow_table *ovs_flow_tbl_alloc(int new_size); struct flow_table *ovs_flow_tbl_alloc(int new_size);
struct flow_table *ovs_flow_tbl_expand(struct flow_table *table); struct flow_table *ovs_flow_tbl_expand(struct flow_table *table);
struct flow_table *ovs_flow_tbl_rehash(struct flow_table *table); struct flow_table *ovs_flow_tbl_rehash(struct flow_table *table);
void ovs_flow_tbl_insert(struct flow_table *table, struct sw_flow *flow,
struct sw_flow_key *key, int key_len);
void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow);
struct sw_flow *ovs_flow_tbl_next(struct flow_table *table, u32 *bucket, u32 *idx); void ovs_flow_insert(struct flow_table *table, struct sw_flow *flow);
void ovs_flow_remove(struct flow_table *table, struct sw_flow *flow);
struct sw_flow *ovs_flow_dump_next(struct flow_table *table, u32 *bucket, u32 *idx);
extern const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1]; extern const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1];
int ovs_ipv4_tun_from_nlattr(const struct nlattr *attr, int ovs_ipv4_tun_from_nlattr(const struct nlattr *attr,
struct ovs_key_ipv4_tunnel *tun_key); struct sw_flow_match *match, bool is_mask);
int ovs_ipv4_tun_to_nlattr(struct sk_buff *skb, int ovs_ipv4_tun_to_nlattr(struct sk_buff *skb,
const struct ovs_key_ipv4_tunnel *tun_key); const struct ovs_key_ipv4_tunnel *tun_key,
const struct ovs_key_ipv4_tunnel *output);
bool ovs_flow_cmp_unmasked_key(const struct sw_flow *flow,
const struct sw_flow_key *key, int key_end);
struct sw_flow_mask {
int ref_count;
struct rcu_head rcu;
struct list_head list;
struct sw_flow_key_range range;
struct sw_flow_key key;
};
struct sw_flow_mask *ovs_sw_flow_mask_alloc(void);
void ovs_sw_flow_mask_add_ref(struct sw_flow_mask *);
void ovs_sw_flow_mask_del_ref(struct sw_flow_mask *, bool deferred);
void ovs_sw_flow_mask_insert(struct flow_table *, struct sw_flow_mask *);
struct sw_flow_mask *ovs_sw_flow_mask_find(const struct flow_table *,
const struct sw_flow_mask *);
void ovs_flow_key_mask(struct sw_flow_key *dst, const struct sw_flow_key *src,
const struct sw_flow_mask *mask);
#endif /* flow.h */ #endif /* flow.h */
...@@ -16,7 +16,6 @@ ...@@ -16,7 +16,6 @@
* 02110-1301, USA * 02110-1301, USA
*/ */
#ifdef CONFIG_OPENVSWITCH_GRE
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/if.h> #include <linux/if.h>
...@@ -271,5 +270,3 @@ const struct vport_ops ovs_gre_vport_ops = { ...@@ -271,5 +270,3 @@ const struct vport_ops ovs_gre_vport_ops = {
.get_name = gre_get_name, .get_name = gre_get_name,
.send = gre_tnl_send, .send = gre_tnl_send,
}; };
#endif /* OPENVSWITCH_GRE */
...@@ -25,6 +25,7 @@ ...@@ -25,6 +25,7 @@
#include <linux/llc.h> #include <linux/llc.h>
#include <linux/rtnetlink.h> #include <linux/rtnetlink.h>
#include <linux/skbuff.h> #include <linux/skbuff.h>
#include <linux/openvswitch.h>
#include <net/llc.h> #include <net/llc.h>
...@@ -74,6 +75,15 @@ static rx_handler_result_t netdev_frame_hook(struct sk_buff **pskb) ...@@ -74,6 +75,15 @@ static rx_handler_result_t netdev_frame_hook(struct sk_buff **pskb)
return RX_HANDLER_CONSUMED; return RX_HANDLER_CONSUMED;
} }
static struct net_device *get_dpdev(struct datapath *dp)
{
struct vport *local;
local = ovs_vport_ovsl(dp, OVSP_LOCAL);
BUG_ON(!local);
return netdev_vport_priv(local)->dev;
}
static struct vport *netdev_create(const struct vport_parms *parms) static struct vport *netdev_create(const struct vport_parms *parms)
{ {
struct vport *vport; struct vport *vport;
...@@ -103,10 +113,15 @@ static struct vport *netdev_create(const struct vport_parms *parms) ...@@ -103,10 +113,15 @@ static struct vport *netdev_create(const struct vport_parms *parms)
} }
rtnl_lock(); rtnl_lock();
err = netdev_master_upper_dev_link(netdev_vport->dev,
get_dpdev(vport->dp));
if (err)
goto error_unlock;
err = netdev_rx_handler_register(netdev_vport->dev, netdev_frame_hook, err = netdev_rx_handler_register(netdev_vport->dev, netdev_frame_hook,
vport); vport);
if (err) if (err)
goto error_unlock; goto error_master_upper_dev_unlink;
dev_set_promiscuity(netdev_vport->dev, 1); dev_set_promiscuity(netdev_vport->dev, 1);
netdev_vport->dev->priv_flags |= IFF_OVS_DATAPATH; netdev_vport->dev->priv_flags |= IFF_OVS_DATAPATH;
...@@ -114,6 +129,8 @@ static struct vport *netdev_create(const struct vport_parms *parms) ...@@ -114,6 +129,8 @@ static struct vport *netdev_create(const struct vport_parms *parms)
return vport; return vport;
error_master_upper_dev_unlink:
netdev_upper_dev_unlink(netdev_vport->dev, get_dpdev(vport->dp));
error_unlock: error_unlock:
rtnl_unlock(); rtnl_unlock();
error_put: error_put:
...@@ -140,6 +157,7 @@ static void netdev_destroy(struct vport *vport) ...@@ -140,6 +157,7 @@ static void netdev_destroy(struct vport *vport)
rtnl_lock(); rtnl_lock();
netdev_vport->dev->priv_flags &= ~IFF_OVS_DATAPATH; netdev_vport->dev->priv_flags &= ~IFF_OVS_DATAPATH;
netdev_rx_handler_unregister(netdev_vport->dev); netdev_rx_handler_unregister(netdev_vport->dev);
netdev_upper_dev_unlink(netdev_vport->dev, get_dpdev(vport->dp));
dev_set_promiscuity(netdev_vport->dev, -1); dev_set_promiscuity(netdev_vport->dev, -1);
rtnl_unlock(); rtnl_unlock();
......
...@@ -203,7 +203,7 @@ struct vport *ovs_vport_add(const struct vport_parms *parms) ...@@ -203,7 +203,7 @@ struct vport *ovs_vport_add(const struct vport_parms *parms)
* ovs_vport_set_options - modify existing vport device (for kernel callers) * ovs_vport_set_options - modify existing vport device (for kernel callers)
* *
* @vport: vport to modify. * @vport: vport to modify.
* @port: New configuration. * @options: New configuration.
* *
* Modifies an existing device with the specified configuration (which is * Modifies an existing device with the specified configuration (which is
* dependent on device type). ovs_mutex must be held. * dependent on device type). ovs_mutex must be held.
...@@ -328,6 +328,7 @@ int ovs_vport_get_options(const struct vport *vport, struct sk_buff *skb) ...@@ -328,6 +328,7 @@ int ovs_vport_get_options(const struct vport *vport, struct sk_buff *skb)
* *
* @vport: vport that received the packet * @vport: vport that received the packet
* @skb: skb that was received * @skb: skb that was received
* @tun_key: tunnel (if any) that carried packet
* *
* Must be called with rcu_read_lock. The packet cannot be shared and * Must be called with rcu_read_lock. The packet cannot be shared and
* skb->data should point to the Ethernet header. * skb->data should point to the Ethernet header.
......