Commit 78a8a36f authored by David S. Miller's avatar David S. Miller
parents 04a6f441 ccb1352e
......@@ -144,6 +144,8 @@ nfc.txt
- The Linux Near Field Communication (NFS) subsystem.
olympic.txt
- IBM PCI Pit/Pit-Phy/Olympic Token Ring driver info.
openvswitch.txt
- Open vSwitch developer documentation.
operstates.txt
- Overview of network interface operational states.
packet_mmap.txt
......
Open vSwitch datapath developer documentation
=============================================
The Open vSwitch kernel module allows flexible userspace control over
flow-level packet processing on selected network devices. It can be
used to implement a plain Ethernet switch, network device bonding,
VLAN processing, network access control, flow-based network control,
and so on.
The kernel module implements multiple "datapaths" (analogous to
bridges), each of which can have multiple "vports" (analogous to ports
within a bridge). Each datapath also has associated with it a "flow
table" that userspace populates with "flows" that map from keys based
on packet headers and metadata to sets of actions. The most common
action forwards the packet to another vport; other actions are also
implemented.
When a packet arrives on a vport, the kernel module processes it by
extracting its flow key and looking it up in the flow table. If there
is a matching flow, it executes the associated actions. If there is
no match, it queues the packet to userspace for processing (as part of
its processing, userspace will likely set up a flow to handle further
packets of the same type entirely in-kernel).
Flow key compatibility
----------------------
Network protocols evolve over time. New protocols become important
and existing protocols lose their prominence. For the Open vSwitch
kernel module to remain relevant, it must be possible for newer
versions to parse additional protocols as part of the flow key. It
might even be desirable, someday, to drop support for parsing
protocols that have become obsolete. Therefore, the Netlink interface
to Open vSwitch is designed to allow carefully written userspace
applications to work with any version of the flow key, past or future.
To support this forward and backward compatibility, whenever the
kernel module passes a packet to userspace, it also passes along the
flow key that it parsed from the packet. Userspace then extracts its
own notion of a flow key from the packet and compares it against the
kernel-provided version:
- If userspace's notion of the flow key for the packet matches the
kernel's, then nothing special is necessary.
- If the kernel's flow key includes more fields than the userspace
version of the flow key, for example if the kernel decoded IPv6
headers but userspace stopped at the Ethernet type (because it
does not understand IPv6), then again nothing special is
necessary. Userspace can still set up a flow in the usual way,
as long as it uses the kernel-provided flow key to do it.
- If the userspace flow key includes more fields than the
kernel's, for example if userspace decoded an IPv6 header but
the kernel stopped at the Ethernet type, then userspace can
forward the packet manually, without setting up a flow in the
kernel. This case is bad for performance because every packet
that the kernel considers part of the flow must go to userspace,
but the forwarding behavior is correct. (If userspace can
determine that the values of the extra fields would not affect
forwarding behavior, then it could set up a flow anyway.)
How flow keys evolve over time is important to making this work, so
the following sections go into detail.
Flow key format
---------------
A flow key is passed over a Netlink socket as a sequence of Netlink
attributes. Some attributes represent packet metadata, defined as any
information about a packet that cannot be extracted from the packet
itself, e.g. the vport on which the packet was received. Most
attributes, however, are extracted from headers within the packet,
e.g. source and destination addresses from Ethernet, IP, or TCP
headers.
The <linux/openvswitch.h> header file defines the exact format of the
flow key attributes. For informal explanatory purposes here, we write
them as comma-separated strings, with parentheses indicating arguments
and nesting. For example, the following could represent a flow key
corresponding to a TCP packet that arrived on vport 1:
in_port(1), eth(src=e0:91:f5:21:d0:b2, dst=00:02:e3:0f:80:a4),
eth_type(0x0800), ipv4(src=172.16.0.20, dst=172.18.0.52, proto=17, tos=0,
frag=no), tcp(src=49163, dst=80)
Often we ellipsize arguments not important to the discussion, e.g.:
in_port(1), eth(...), eth_type(0x0800), ipv4(...), tcp(...)
Basic rule for evolving flow keys
---------------------------------
Some care is needed to really maintain forward and backward
compatibility for applications that follow the rules listed under
"Flow key compatibility" above.
The basic rule is obvious:
------------------------------------------------------------------
New network protocol support must only supplement existing flow
key attributes. It must not change the meaning of already defined
flow key attributes.
------------------------------------------------------------------
This rule does have less-obvious consequences so it is worth working
through a few examples. Suppose, for example, that the kernel module
did not already implement VLAN parsing. Instead, it just interpreted
the 802.1Q TPID (0x8100) as the Ethertype then stopped parsing the
packet. The flow key for any packet with an 802.1Q header would look
essentially like this, ignoring metadata:
eth(...), eth_type(0x8100)
Naively, to add VLAN support, it makes sense to add a new "vlan" flow
key attribute to contain the VLAN tag, then continue to decode the
encapsulated headers beyond the VLAN tag using the existing field
definitions. With this change, an TCP packet in VLAN 10 would have a
flow key much like this:
eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...)
But this change would negatively affect a userspace application that
has not been updated to understand the new "vlan" flow key attribute.
The application could, following the flow compatibility rules above,
ignore the "vlan" attribute that it does not understand and therefore
assume that the flow contained IP packets. This is a bad assumption
(the flow only contains IP packets if one parses and skips over the
802.1Q header) and it could cause the application's behavior to change
across kernel versions even though it follows the compatibility rules.
The solution is to use a set of nested attributes. This is, for
example, why 802.1Q support uses nested attributes. A TCP packet in
VLAN 10 is actually expressed as:
eth(...), eth_type(0x8100), vlan(vid=10, pcp=0), encap(eth_type(0x0800),
ip(proto=6, ...), tcp(...)))
Notice how the "eth_type", "ip", and "tcp" flow key attributes are
nested inside the "encap" attribute. Thus, an application that does
not understand the "vlan" key will not see either of those attributes
and therefore will not misinterpret them. (Also, the outer eth_type
is still 0x8100, not changed to 0x0800.)
Handling malformed packets
--------------------------
Don't drop packets in the kernel for malformed protocol headers, bad
checksums, etc. This would prevent userspace from implementing a
simple Ethernet switch that forwards every packet.
Instead, in such a case, include an attribute with "empty" content.
It doesn't matter if the empty content could be valid protocol values,
as long as those values are rarely seen in practice, because userspace
can always forward all packets with those values to userspace and
handle them individually.
For example, consider a packet that contains an IP header that
indicates protocol 6 for TCP, but which is truncated just after the IP
header, so that the TCP header is missing. The flow key for this
packet would include a tcp attribute with all-zero src and dst, like
this:
eth(...), eth_type(0x0800), ip(proto=6, ...), tcp(src=0, dst=0)
As another example, consider a packet with an Ethernet type of 0x8100,
indicating that a VLAN TCI should follow, but which is truncated just
after the Ethernet type. The flow key for this packet would include
an all-zero-bits vlan and an empty encap attribute, like this:
eth(...), eth_type(0x8100), vlan(0), encap()
Unlike a TCP packet with source and destination ports 0, an
all-zero-bits VLAN TCI is not that rare, so the CFI bit (aka
VLAN_TAG_PRESENT inside the kernel) is ordinarily set in a vlan
attribute expressly to allow this situation to be distinguished.
Thus, the flow key in this second example unambiguously indicates a
missing or malformed VLAN TCI.
Other rules
-----------
The other rules for flow keys are much less subtle:
- Duplicate attributes are not allowed at a given nesting level.
- Ordering of attributes is not significant.
- When the kernel sends a given flow key to userspace, it always
composes it the same way. This allows userspace to hash and
compare entire flow keys that it may not be able to fully
interpret.
......@@ -4868,6 +4868,14 @@ S: Maintained
T: git git://openrisc.net/~jonas/linux
F: arch/openrisc
OPENVSWITCH
M: Jesse Gross <jesse@nicira.com>
L: dev@openvswitch.org
W: http://openvswitch.org
T: git git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch.git
S: Maintained
F: net/openvswitch/
OPL4 DRIVER
M: Clemens Ladisch <clemens@ladisch.de>
L: alsa-devel@alsa-project.org (moderated for non-subscribers)
......
......@@ -85,6 +85,30 @@ enum {
/* All generic netlink requests are serialized by a global lock. */
extern void genl_lock(void);
extern void genl_unlock(void);
#ifdef CONFIG_PROVE_LOCKING
extern int lockdep_genl_is_held(void);
#endif
/**
* rcu_dereference_genl - rcu_dereference with debug checking
* @p: The pointer to read, prior to dereferencing
*
* Do an rcu_dereference(p), but check caller either holds rcu_read_lock()
* or genl mutex. Note : Please prefer genl_dereference() or rcu_dereference()
*/
#define rcu_dereference_genl(p) \
rcu_dereference_check(p, lockdep_genl_is_held())
/**
* genl_dereference - fetch RCU pointer when updates are prevented by genl mutex
* @p: The pointer to read, prior to dereferencing
*
* Return the value of the specified RCU-protected pointer, but omit
* both the smp_read_barrier_depends() and the ACCESS_ONCE(), because
* caller holds genl mutex.
*/
#define genl_dereference(p) \
rcu_dereference_protected(p, lockdep_genl_is_held())
#endif /* __KERNEL__ */
......
......@@ -310,6 +310,40 @@ static inline __be16 vlan_get_protocol(const struct sk_buff *skb)
return protocol;
}
static inline void vlan_set_encap_proto(struct sk_buff *skb,
struct vlan_hdr *vhdr)
{
__be16 proto;
unsigned char *rawp;
/*
* Was a VLAN packet, grab the encapsulated protocol, which the layer
* three protocols care about.
*/
proto = vhdr->h_vlan_encapsulated_proto;
if (ntohs(proto) >= 1536) {
skb->protocol = proto;
return;
}
rawp = skb->data;
if (*(unsigned short *) rawp == 0xFFFF)
/*
* This is a magic hack to spot IPX packets. Older Novell
* breaks the protocol design and runs IPX over 802.3 without
* an 802.2 LLC layer. We look for FFFF which isn't a used
* 802.2 SSAP/DSAP. This won't work for fault tolerant netware
* but does for the rest.
*/
skb->protocol = htons(ETH_P_802_3);
else
/*
* Real 802.2 LLC
*/
skb->protocol = htons(ETH_P_802_2);
}
#endif /* __KERNEL__ */
/* VLAN IOCTLs are found in sockios.h */
......
This diff is collapsed.
......@@ -128,6 +128,8 @@ extern int genl_register_mc_group(struct genl_family *family,
struct genl_multicast_group *grp);
extern void genl_unregister_mc_group(struct genl_family *family,
struct genl_multicast_group *grp);
extern void genl_notify(struct sk_buff *skb, struct net *net, u32 pid,
u32 group, struct nlmsghdr *nlh, gfp_t flags);
/**
* genlmsg_put - Add generic netlink header to netlink message
......
......@@ -558,7 +558,7 @@ extern void ipv6_push_frag_opts(struct sk_buff *skb,
u8 *proto);
extern int ipv6_skip_exthdr(const struct sk_buff *, int start,
u8 *nexthdrp);
u8 *nexthdrp, __be16 *frag_offp);
extern int ipv6_ext_hdr(u8 nexthdr);
......
......@@ -110,39 +110,6 @@ static struct sk_buff *vlan_reorder_header(struct sk_buff *skb)
return skb;
}
static void vlan_set_encap_proto(struct sk_buff *skb, struct vlan_hdr *vhdr)
{
__be16 proto;
unsigned char *rawp;
/*
* Was a VLAN packet, grab the encapsulated protocol, which the layer
* three protocols care about.
*/
proto = vhdr->h_vlan_encapsulated_proto;
if (ntohs(proto) >= 1536) {
skb->protocol = proto;
return;
}
rawp = skb->data;
if (*(unsigned short *) rawp == 0xFFFF)
/*
* This is a magic hack to spot IPX packets. Older Novell
* breaks the protocol design and runs IPX over 802.3 without
* an 802.2 LLC layer. We look for FFFF which isn't a used
* 802.2 SSAP/DSAP. This won't work for fault tolerant netware
* but does for the rest.
*/
skb->protocol = htons(ETH_P_802_3);
else
/*
* Real 802.2 LLC
*/
skb->protocol = htons(ETH_P_802_2);
}
struct sk_buff *vlan_untag(struct sk_buff *skb)
{
struct vlan_hdr *vhdr;
......
......@@ -215,6 +215,7 @@ source "net/sched/Kconfig"
source "net/dcb/Kconfig"
source "net/dns_resolver/Kconfig"
source "net/batman-adv/Kconfig"
source "net/openvswitch/Kconfig"
config RPS
boolean
......
......@@ -69,3 +69,4 @@ obj-$(CONFIG_DNS_RESOLVER) += dns_resolver/
obj-$(CONFIG_CEPH_LIB) += ceph/
obj-$(CONFIG_BATMAN_ADV) += batman-adv/
obj-$(CONFIG_NFC) += nfc/
obj-$(CONFIG_OPENVSWITCH) += openvswitch/
......@@ -1458,6 +1458,7 @@ static int br_multicast_ipv6_rcv(struct net_bridge *br,
const struct ipv6hdr *ip6h;
u8 icmp6_type;
u8 nexthdr;
__be16 frag_off;
unsigned len;
int offset;
int err;
......@@ -1483,7 +1484,7 @@ static int br_multicast_ipv6_rcv(struct net_bridge *br,
return -EINVAL;
nexthdr = ip6h->nexthdr;
offset = ipv6_skip_exthdr(skb, sizeof(*ip6h), &nexthdr);
offset = ipv6_skip_exthdr(skb, sizeof(*ip6h), &nexthdr, &frag_off);
if (offset < 0 || nexthdr != IPPROTO_ICMPV6)
return 0;
......
......@@ -55,9 +55,10 @@ ebt_ip6_mt(const struct sk_buff *skb, struct xt_action_param *par)
return false;
if (info->bitmask & EBT_IP6_PROTO) {
uint8_t nexthdr = ih6->nexthdr;
__be16 frag_off;
int offset_ph;
offset_ph = ipv6_skip_exthdr(skb, sizeof(_ip6h), &nexthdr);
offset_ph = ipv6_skip_exthdr(skb, sizeof(_ip6h), &nexthdr, &frag_off);
if (offset_ph == -1)
return false;
if (FWINV(info->protocol != nexthdr, EBT_IP6_PROTO))
......
......@@ -113,6 +113,7 @@ ebt_log_packet(u_int8_t pf, unsigned int hooknum,
const struct ipv6hdr *ih;
struct ipv6hdr _iph;
uint8_t nexthdr;
__be16 frag_off;
int offset_ph;
ih = skb_header_pointer(skb, 0, sizeof(_iph), &_iph);
......@@ -123,7 +124,7 @@ ebt_log_packet(u_int8_t pf, unsigned int hooknum,
printk(" IPv6 SRC=%pI6 IPv6 DST=%pI6, IPv6 priority=0x%01X, Next Header=%d",
&ih->saddr, &ih->daddr, ih->priority, ih->nexthdr);
nexthdr = ih->nexthdr;
offset_ph = ipv6_skip_exthdr(skb, sizeof(_iph), &nexthdr);
offset_ph = ipv6_skip_exthdr(skb, sizeof(_iph), &nexthdr, &frag_off);
if (offset_ph == -1)
goto out;
print_ports(skb, nexthdr, offset_ph);
......
......@@ -57,6 +57,9 @@ int ipv6_ext_hdr(u8 nexthdr)
* it returns NULL.
* - First fragment header is skipped, not-first ones
* are considered as unparsable.
* - Reports the offset field of the final fragment header so it is
* possible to tell whether this is a first fragment, later fragment,
* or not fragmented.
* - ESP is unparsable for now and considered like
* normal payload protocol.
* - Note also special handling of AUTH header. Thanks to IPsec wizards.
......@@ -64,10 +67,13 @@ int ipv6_ext_hdr(u8 nexthdr)
* --ANK (980726)
*/
int ipv6_skip_exthdr(const struct sk_buff *skb, int start, u8 *nexthdrp)
int ipv6_skip_exthdr(const struct sk_buff *skb, int start, u8 *nexthdrp,
__be16 *frag_offp)
{
u8 nexthdr = *nexthdrp;
*frag_offp = 0;
while (ipv6_ext_hdr(nexthdr)) {
struct ipv6_opt_hdr _hdr, *hp;
int hdrlen;
......@@ -87,7 +93,8 @@ int ipv6_skip_exthdr(const struct sk_buff *skb, int start, u8 *nexthdrp)
if (fp == NULL)
return -1;
if (ntohs(*fp) & ~0x7)
*frag_offp = *fp;
if (ntohs(*frag_offp) & ~0x7)
break;
hdrlen = 8;
} else if (nexthdr == NEXTHDR_AUTH)
......
......@@ -135,11 +135,12 @@ static int is_ineligible(struct sk_buff *skb)
int ptr = (u8 *)(ipv6_hdr(skb) + 1) - skb->data;
int len = skb->len - ptr;
__u8 nexthdr = ipv6_hdr(skb)->nexthdr;
__be16 frag_off;
if (len < 0)
return 1;
ptr = ipv6_skip_exthdr(skb, ptr, &nexthdr);
ptr = ipv6_skip_exthdr(skb, ptr, &nexthdr, &frag_off);
if (ptr < 0)
return 0;
if (nexthdr == IPPROTO_ICMPV6) {
......@@ -596,6 +597,7 @@ static void icmpv6_notify(struct sk_buff *skb, u8 type, u8 code, __be32 info)
int inner_offset;
int hash;
u8 nexthdr;
__be16 frag_off;
if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
return;
......@@ -603,7 +605,8 @@ static void icmpv6_notify(struct sk_buff *skb, u8 type, u8 code, __be32 info)
nexthdr = ((struct ipv6hdr *)skb->data)->nexthdr;
if (ipv6_ext_hdr(nexthdr)) {
/* now skip over extension headers */
inner_offset = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &nexthdr);
inner_offset = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr),
&nexthdr, &frag_off);
if (inner_offset<0)
return;
} else {
......
......@@ -280,6 +280,7 @@ int ip6_mc_input(struct sk_buff *skb)
u8 *ptr = skb_network_header(skb) + opt->ra;
struct icmp6hdr *icmp6;
u8 nexthdr = hdr->nexthdr;
__be16 frag_off;
int offset;
/* Check if the value of Router Alert
......@@ -293,7 +294,7 @@ int ip6_mc_input(struct sk_buff *skb)
goto out;
}
offset = ipv6_skip_exthdr(skb, sizeof(*hdr),
&nexthdr);
&nexthdr, &frag_off);
if (offset < 0)
goto out;
......
......@@ -329,10 +329,11 @@ static int ip6_forward_proxy_check(struct sk_buff *skb)
{
struct ipv6hdr *hdr = ipv6_hdr(skb);
u8 nexthdr = hdr->nexthdr;
__be16 frag_off;
int offset;
if (ipv6_ext_hdr(nexthdr)) {
offset = ipv6_skip_exthdr(skb, sizeof(*hdr), &nexthdr);
offset = ipv6_skip_exthdr(skb, sizeof(*hdr), &nexthdr, &frag_off);
if (offset < 0)
return 0;
} else
......
......@@ -49,6 +49,7 @@ static void send_reset(struct net *net, struct sk_buff *oldskb)
const __u8 tclass = DEFAULT_TOS_VALUE;
struct dst_entry *dst = NULL;
u8 proto;
__be16 frag_off;
struct flowi6 fl6;
if ((!(ipv6_addr_type(&oip6h->saddr) & IPV6_ADDR_UNICAST)) ||
......@@ -58,7 +59,7 @@ static void send_reset(struct net *net, struct sk_buff *oldskb)
}
proto = oip6h->nexthdr;
tcphoff = ipv6_skip_exthdr(oldskb, ((u8*)(oip6h+1) - oldskb->data), &proto);
tcphoff = ipv6_skip_exthdr(oldskb, ((u8*)(oip6h+1) - oldskb->data), &proto, &frag_off);
if ((tcphoff < 0) || (tcphoff > oldskb->len)) {
pr_debug("Cannot get TCP header.\n");
......
......@@ -116,9 +116,11 @@ ip_set_get_ip6_port(const struct sk_buff *skb, bool src,
{
int protoff;
u8 nexthdr;
__be16 frag_off;
nexthdr = ipv6_hdr(skb)->nexthdr;
protoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &nexthdr);
protoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &nexthdr,
&frag_off);
if (protoff < 0)
return false;
......
......@@ -98,6 +98,7 @@ static void audit_ip6(struct audit_buffer *ab, struct sk_buff *skb)
struct ipv6hdr _ip6h;
const struct ipv6hdr *ih;
u8 nexthdr;
__be16 frag_off;
int offset;
ih = skb_header_pointer(skb, skb_network_offset(skb), sizeof(_ip6h), &_ip6h);
......@@ -108,7 +109,7 @@ static void audit_ip6(struct audit_buffer *ab, struct sk_buff *skb)
nexthdr = ih->nexthdr;
offset = ipv6_skip_exthdr(skb, skb_network_offset(skb) + sizeof(_ip6h),
&nexthdr);
&nexthdr, &frag_off);
audit_log_format(ab, " saddr=%pI6c daddr=%pI6c proto=%hhu",
&ih->saddr, &ih->daddr, nexthdr);
......
......@@ -204,11 +204,12 @@ tcpmss_tg6(struct sk_buff *skb, const struct xt_action_param *par)
{
struct ipv6hdr *ipv6h = ipv6_hdr(skb);
u8 nexthdr;
__be16 frag_off;
int tcphoff;
int ret;
nexthdr = ipv6h->nexthdr;
tcphoff = ipv6_skip_exthdr(skb, sizeof(*ipv6h), &nexthdr);
tcphoff = ipv6_skip_exthdr(skb, sizeof(*ipv6h), &nexthdr, &frag_off);
if (tcphoff < 0)
return NF_DROP;
ret = tcpmss_mangle_packet(skb, par->targinfo,
......
......@@ -87,9 +87,10 @@ tcpoptstrip_tg6(struct sk_buff *skb, const struct xt_action_param *par)
struct ipv6hdr *ipv6h = ipv6_hdr(skb);
int tcphoff;
u_int8_t nexthdr;
__be16 frag_off;
nexthdr = ipv6h->nexthdr;
tcphoff = ipv6_skip_exthdr(skb, sizeof(*ipv6h), &nexthdr);
tcphoff = ipv6_skip_exthdr(skb, sizeof(*ipv6h), &nexthdr, &frag_off);
if (tcphoff < 0)
return NF_DROP;
......
......@@ -445,6 +445,7 @@ hashlimit_init_dst(const struct xt_hashlimit_htable *hinfo,
{
__be16 _ports[2], *ports;
u8 nexthdr;
__be16 frag_off;
int poff;
memset(dst, 0, sizeof(*dst));
......@@ -480,7 +481,7 @@ hashlimit_init_dst(const struct xt_hashlimit_htable *hinfo,
(XT_HASHLIMIT_HASH_DPT | XT_HASHLIMIT_HASH_SPT)))
return 0;
nexthdr = ipv6_hdr(skb)->nexthdr;
protoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &nexthdr);
protoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), &nexthdr, &frag_off);
if ((int)protoff < 0)
return -1;
break;
......
......@@ -214,6 +214,7 @@ extract_icmp6_fields(const struct sk_buff *skb,
struct icmp6hdr *icmph, _icmph;
__be16 *ports, _ports[2];
u8 inside_nexthdr;
__be16 inside_fragoff;
int inside_hdrlen;
icmph = skb_header_pointer(skb, outside_hdrlen,
......@@ -229,7 +230,8 @@ extract_icmp6_fields(const struct sk_buff *skb,
return 1;
inside_nexthdr = inside_iph->nexthdr;
inside_hdrlen = ipv6_skip_exthdr(skb, outside_hdrlen + sizeof(_icmph) + sizeof(_inside_iph), &inside_nexthdr);
inside_hdrlen = ipv6_skip_exthdr(skb, outside_hdrlen + sizeof(_icmph) + sizeof(_inside_iph),
&inside_nexthdr, &inside_fragoff);
if (inside_hdrlen < 0)
return 1; /* hjm: Packet has no/incomplete transport layer headers. */
......
......@@ -33,6 +33,14 @@ void genl_unlock(void)
}
EXPORT_SYMBOL(genl_unlock);
#ifdef CONFIG_PROVE_LOCKING
int lockdep_genl_is_held(void)
{
return lockdep_is_held(&genl_mutex);
}
EXPORT_SYMBOL(lockdep_genl_is_held);
#endif
#define GENL_FAM_TAB_SIZE 16
#define GENL_FAM_TAB_MASK (GENL_FAM_TAB_SIZE - 1)
......@@ -946,3 +954,16 @@ int genlmsg_multicast_allns(struct sk_buff *skb, u32 pid, unsigned int group,
return genlmsg_mcast(skb, pid, group, flags);
}
EXPORT_SYMBOL(genlmsg_multicast_allns);
void genl_notify(struct sk_buff *skb, struct net *net, u32 pid, u32 group,
struct nlmsghdr *nlh, gfp_t flags)
{
struct sock *sk = net->genl_sock;
int report = 0;
if (nlh)
report = nlmsg_report(nlh);
nlmsg_notify(sk, skb, pid, group, report, flags);
}
EXPORT_SYMBOL(genl_notify);
#
# Open vSwitch
#
config OPENVSWITCH
tristate "Open vSwitch"
---help---
Open vSwitch is a multilayer Ethernet switch targeted at virtualized
environments. In addition to supporting a variety of features
expected in a traditional hardware switch, it enables fine-grained
programmatic extension and flow-based control of the network. This
control is useful in a wide variety of applications but is
particularly important in multi-server virtualization deployments,
which are often characterized by highly dynamic endpoints and the
need to maintain logical abstractions for multiple tenants.
The Open vSwitch datapath provides an in-kernel fast path for packet
forwarding. It is complemented by a userspace daemon, ovs-vswitchd,
which is able to accept configuration from a variety of sources and
translate it into packet processing rules.
See http://openvswitch.org for more information and userspace
utilities.
To compile this code as a module, choose M here: the module will be
called openvswitch.
If unsure, say N.
#
# Makefile for Open vSwitch.
#
obj-$(CONFIG_OPENVSWITCH) += openvswitch.o
openvswitch-y := \
actions.o \
datapath.o \
dp_notify.o \
flow.o \
vport.o \
vport-internal_dev.o \
vport-netdev.o \
This diff is collapsed.
This diff is collapsed.
/*
* Copyright (c) 2007-2011 Nicira Networks.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
* 02110-1301, USA
*/
#ifndef DATAPATH_H
#define DATAPATH_H 1
#include <asm/page.h>
#include <linux/kernel.h>
#include <linux/mutex.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/u64_stats_sync.h>
#include <linux/version.h>
#include "flow.h"
struct vport;
#define DP_MAX_PORTS 1024
#define SAMPLE_ACTION_DEPTH 3
/**
* struct dp_stats_percpu - per-cpu packet processing statistics for a given
* datapath.
* @n_hit: Number of received packets for which a matching flow was found in
* the flow table.
* @n_miss: Number of received packets that had no matching flow in the flow
* table. The sum of @n_hit and @n_miss is the number of packets that have
* been received by the datapath.
* @n_lost: Number of received packets that had no matching flow in the flow
* table that could not be sent to userspace (normally due to an overflow in
* one of the datapath's queues).
*/
struct dp_stats_percpu {
u64 n_hit;
u64 n_missed;
u64 n_lost;
struct u64_stats_sync sync;
};
/**
* struct datapath - datapath for flow-based packet switching
* @rcu: RCU callback head for deferred destruction.
* @list_node: Element in global 'dps' list.
* @n_flows: Number of flows currently in flow table.
* @table: Current flow table. Protected by genl_lock and RCU.
* @ports: Map from port number to &struct vport. %OVSP_LOCAL port
* always exists, other ports may be %NULL. Protected by RTNL and RCU.
* @port_list: List of all ports in @ports in arbitrary order. RTNL required
* to iterate or modify.
* @stats_percpu: Per-CPU datapath statistics.
*
* Context: See the comment on locking at the top of datapath.c for additional
* locking information.
*/
struct datapath {
struct rcu_head rcu;
struct list_head list_node;
/* Flow table. */
struct flow_table __rcu *table;
/* Switch ports. */
struct vport __rcu *ports[DP_MAX_PORTS];
struct list_head port_list;
/* Stats. */
struct dp_stats_percpu __percpu *stats_percpu;
};
/**
* struct ovs_skb_cb - OVS data in skb CB
* @flow: The flow associated with this packet. May be %NULL if no flow.
*/
struct ovs_skb_cb {
struct sw_flow *flow;
};
#define OVS_CB(skb) ((struct ovs_skb_cb *)(skb)->cb)
/**
* struct dp_upcall - metadata to include with a packet to send to userspace
* @cmd: One of %OVS_PACKET_CMD_*.
* @key: Becomes %OVS_PACKET_ATTR_KEY. Must be nonnull.
* @userdata: If nonnull, its u64 value is extracted and passed to userspace as
* %OVS_PACKET_ATTR_USERDATA.
* @pid: Netlink PID to which packet should be sent. If @pid is 0 then no
* packet is sent and the packet is accounted in the datapath's @n_lost
* counter.
*/
struct dp_upcall_info {
u8 cmd;
const struct sw_flow_key *key;
const struct nlattr *userdata;
u32 pid;
};
extern struct notifier_block ovs_dp_device_notifier;
extern struct genl_multicast_group ovs_dp_vport_multicast_group;
void ovs_dp_process_received_packet(struct vport *, struct sk_buff *);
void ovs_dp_detach_port(struct vport *);
int ovs_dp_upcall(struct datapath *, struct sk_buff *,
const struct dp_upcall_info *);
const char *ovs_dp_name(const struct datapath *dp);
struct sk_buff *ovs_vport_cmd_build_info(struct vport *, u32 pid, u32 seq,
u8 cmd);
int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb);
#endif /* datapath.h */
/*
* Copyright (c) 2007-2011 Nicira Networks.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
* 02110-1301, USA
*/
#include <linux/netdevice.h>
#include <net/genetlink.h>
#include "datapath.h"
#include "vport-internal_dev.h"
#include "vport-netdev.h"
static int dp_device_event(struct notifier_block *unused, unsigned long event,
void *ptr)
{
struct net_device *dev = ptr;
struct vport *vport;
if (ovs_is_internal_dev(dev))
vport = ovs_internal_dev_get_vport(dev);
else
vport = ovs_netdev_get_vport(dev);
if (!vport)
return NOTIFY_DONE;
switch (event) {
case NETDEV_UNREGISTER:
if (!ovs_is_internal_dev(dev)) {
struct sk_buff *notify;
notify = ovs_vport_cmd_build_info(vport, 0, 0,
OVS_VPORT_CMD_DEL);
ovs_dp_detach_port(vport);
if (IS_ERR(notify)) {
netlink_set_err(init_net.genl_sock, 0,
ovs_dp_vport_multicast_group.id,
PTR_ERR(notify));
break;
}
genlmsg_multicast(notify, 0, ovs_dp_vport_multicast_group.id,
GFP_KERNEL);
}
break;
}
return NOTIFY_DONE;
}
struct notifier_block ovs_dp_device_notifier = {
.notifier_call = dp_device_event
};
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
/*
* Copyright (c) 2007-2011 Nicira Networks.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
* 02110-1301, USA
*/
#ifndef VPORT_INTERNAL_DEV_H
#define VPORT_INTERNAL_DEV_H 1
#include "datapath.h"
#include "vport.h"
int ovs_is_internal_dev(const struct net_device *);
struct vport *ovs_internal_dev_get_vport(struct net_device *);
#endif /* vport-internal_dev.h */
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment