Commit 7c804e91 authored by David S. Miller

Merge branch 'ipv6-ioam'

Justin Iurman says:

====================
Support for the IOAM Pre-allocated Trace with IPv6

v5:
 - Refine types, min/max and default values for new sysctls
 - Introduce a "_wide" sysctl for each "ioam6_id" sysctl
 - Add more validation on headers before processing data
 - RCU for sc <> ns pointers + appropriate accessors
 - Generic Netlink policies are now per op, not per family anymore
 - Address other comments/remarks from Jakub (thanks again)
 - Revert "__packed" to "__attribute__((packed))" for uapi headers
 - Add tests to cover the functionality added, as requested by David Ahern

v4:
 - Address warnings from checkpatch (ignore errors related to unnamed bitfields
   in the first patch)
 - Use of hweight32 (thanks Jakub)
 - Remove inline keyword from static functions in C files and let the compiler
   decide what to do (thanks Jakub)

v3:
 - Fix warning "unused label 'out_unregister_genl'" by adding conditional macro
 - Fix lwtunnel output redirect bug: dst cache useless in this case, use
   orig_output instead

v2:
 - Fix warning with static for __ioam6_fill_trace_data
 - Fix sparse warning with __force when casting __be64 to __be32
 - Fix unchecked dereference when removing IOAM namespaces or schemas
 - exthdrs.c: Don't drop by default (now: ignore) to match the act bits "00"
 - Add control plane support for the inline insertion (lwtunnel)
 - Provide uapi structures
 - Use __net_timestamp if skb->tstamp is empty
 - Add note about the temporary IANA allocation
 - Remove support for "removable" TLVs
 - Remove support for virtual/anonymous tunnel decapsulation

In-situ Operations, Administration, and Maintenance (IOAM) records
operational and telemetry information in a packet while it traverses
a path between two points in an IOAM domain. It is defined in
draft-ietf-ippm-ioam-data [1]. IOAM data fields can be encapsulated
into a variety of protocols. The IPv6 encapsulation is defined in
draft-ietf-ippm-ioam-ipv6-options [2], via extension headers. IOAM
can be used to complement OAM mechanisms based on e.g. ICMP or other
types of probe packets.

This patchset implements support for the Pre-allocated Trace, carried
by a Hop-by-Hop Options header. To that end, a new IPv6 Hop-by-Hop TLV
option is introduced, see IANA [3]. The three other IOAM options
(Incremental Trace, Proof-of-Transit and Edge-to-Edge) are not included
in this patchset. The main idea behind the IOAM Pre-allocated Trace is
that a node pre-allocates some room in packets for IOAM data. Then,
each IOAM node on the path inserts its data into that room. There are
several interesting use cases, e.g. fast failure detection/isolation or
smart service selection. Another use case is what we have called
Cross-Layer Telemetry (see the demo video in its repository [4]), which
aims to make the entire stack (L2/L3 -> L7) visible to distributed
tracing tools (e.g. Jaeger), instead of the current, limited L5 -> L7
view. This makes IOAM a useful addition to the Linux kernel.
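The pre-allocation idea described above can be sketched in user space as follows. This is a toy model, not the kernel implementation: `struct toy_trace`, `toy_trace_init()` and `toy_trace_insert()` are hypothetical names, and the fill order (each node writes at the tail of the remaining free space and then shrinks the remaining length) is an assumption inferred from how the selftest parser in this series walks the trace data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified trace; NOT the kernel's struct ioam6_trace_hdr. */
struct toy_trace {
	uint8_t nodelen;   /* per-node data length, in 4-octet units */
	uint8_t remlen;    /* remaining free space, in 4-octet units */
	uint8_t overflow;  /* set when a node could not insert its data */
	uint8_t data[64];  /* room pre-allocated by the first node */
};

/* The first node reserves `units` 4-octet words for the whole path. */
void toy_trace_init(struct toy_trace *t, uint8_t nodelen, uint8_t units)
{
	assert((size_t)units * 4 <= sizeof(t->data));
	memset(t, 0, sizeof(*t));
	t->nodelen = nodelen;
	t->remlen = units;
}

/*
 * Each node on the path writes nodelen * 4 octets at the tail of the
 * free area. Returns 0 on success, -1 (and sets overflow) if the
 * pre-allocated room is exhausted.
 */
int toy_trace_insert(struct toy_trace *t, const uint8_t *node_data)
{
	if (t->remlen < t->nodelen) {
		t->overflow = 1;
		return -1;
	}
	t->remlen -= t->nodelen;
	memcpy(t->data + t->remlen * 4, node_data, (size_t)t->nodelen * 4);
	return 0;
}
```

With room for two nodes, the first node's data lands in the last slot, the second node's data in front of it, and a third node only sets the overflow flag.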

This patchset also provides support for the control plane part, but only for the
inline insertion (host-to-host use case), through lightweight tunnels. Indeed,
for in-transit traffic, the solution is to have an IPv6-in-IPv6 encapsulation,
which brings some difficulties and still requires a little bit of work and
discussion (i.e. anonymous tunnel decapsulation and multi-egress resolution).

- Patch 1: IPv6 IOAM headers definition
- Patch 2: Data plane support for Pre-allocated Trace
- Patch 3: IOAM Generic Netlink API
- Patch 4: Support for IOAM injection with lwtunnels
- Patch 5: Documentation for new IOAM sysctls
- Patch 6: Test for the IOAM insertion with IPv6

  [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data
  [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options
  [3] https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2
  [4] https://github.com/iurmanj/cross-layer-telemetry
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
parents 71f4f89a 968691c7
.. SPDX-License-Identifier: GPL-2.0
======================
IOAM6 Sysctl variables
======================
/proc/sys/net/ipv6/conf/<iface>/ioam6_* variables:
==================================================
ioam6_enabled - BOOL
Accept (= enabled) or ignore (= disabled) IPv6 IOAM options on ingress
for this interface.
* 0 - disabled (default)
* 1 - enabled
ioam6_id - SHORT INTEGER
Define the IOAM id of this interface.
Default is ~0.
ioam6_id_wide - INTEGER
Define the wide IOAM id of this interface.
Default is ~0.
......@@ -1926,6 +1926,23 @@ fib_notify_on_flag_change - INTEGER
- 1 - Emit notifications.
- 2 - Emit notifications only for RTM_F_OFFLOAD_FAILED flag change.
ioam6_id - INTEGER
Define the IOAM id of this node. Uses only 24 bits out of 32 in total.
Min: 0
Max: 0xFFFFFF
Default: 0xFFFFFF
ioam6_id_wide - LONG INTEGER
Define the wide IOAM id of this node. Uses only 56 bits out of 64 in
total. Can be different from ioam6_id.
Min: 0
Max: 0xFFFFFFFFFFFFFF
Default: 0xFFFFFFFFFFFFFF
IPv6 Fragmentation:
ip6frag_high_thresh - INTEGER
......
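The 24-bit and 56-bit limits documented for ioam6_id and ioam6_id_wide leave the top 8 bits of each field free. The selftest parser in this series reads the hop limit from those top 8 bits, so the packing can be sketched as below. The macro and function names are hypothetical, and the values are shown in host byte order (on the wire they are big-endian).

```c
#include <assert.h>
#include <stdint.h>

#define NODE_ID_MAX      0xFFFFFFu            /* 24 bits, as documented */
#define NODE_ID_WIDE_MAX 0xFFFFFFFFFFFFFFull  /* 56 bits, as documented */

/* Hop limit in the top 8 bits, short node id in the low 24 bits. */
uint32_t pack_hop_and_id(uint8_t hop_limit, uint32_t id)
{
	assert(id <= NODE_ID_MAX);
	return ((uint32_t)hop_limit << 24) | id;
}

/* Hop limit in the top 8 bits, wide node id in the low 56 bits. */
uint64_t pack_hop_and_id_wide(uint8_t hop_limit, uint64_t id_wide)
{
	assert(id_wide <= NODE_ID_WIDE_MAX);
	return ((uint64_t)hop_limit << 56) | id_wide;
}
```

The documented defaults (0xFFFFFF and 0xFFFFFFFFFFFFFF) are simply the all-ones "unavailable" values shifted right by 8 bits.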
/* SPDX-License-Identifier: GPL-2.0+ */
/*
* IPv6 IOAM
*
* Author:
* Justin Iurman <justin.iurman@uliege.be>
*/
#ifndef _LINUX_IOAM6_H
#define _LINUX_IOAM6_H
#include <uapi/linux/ioam6.h>
#endif /* _LINUX_IOAM6_H */
/* SPDX-License-Identifier: GPL-2.0+ */
/*
* IPv6 IOAM Generic Netlink API
*
* Author:
* Justin Iurman <justin.iurman@uliege.be>
*/
#ifndef _LINUX_IOAM6_GENL_H
#define _LINUX_IOAM6_GENL_H
#include <uapi/linux/ioam6_genl.h>
#endif /* _LINUX_IOAM6_GENL_H */
/* SPDX-License-Identifier: GPL-2.0+ */
/*
* IPv6 IOAM Lightweight Tunnel API
*
* Author:
* Justin Iurman <justin.iurman@uliege.be>
*/
#ifndef _LINUX_IOAM6_IPTUNNEL_H
#define _LINUX_IOAM6_IPTUNNEL_H
#include <uapi/linux/ioam6_iptunnel.h>
#endif /* _LINUX_IOAM6_IPTUNNEL_H */
......@@ -76,6 +76,9 @@ struct ipv6_devconf {
__s32 disable_policy;
__s32 ndisc_tclass;
__s32 rpl_seg_enabled;
__u32 ioam6_id;
__u32 ioam6_id_wide;
__u8 ioam6_enabled;
struct ctl_table_header *sysctl_header;
};
......
/* SPDX-License-Identifier: GPL-2.0+ */
/*
* IPv6 IOAM implementation
*
* Author:
* Justin Iurman <justin.iurman@uliege.be>
*/
#ifndef _NET_IOAM6_H
#define _NET_IOAM6_H
#include <linux/net.h>
#include <linux/ipv6.h>
#include <linux/ioam6.h>
#include <linux/rhashtable-types.h>
struct ioam6_namespace {
struct rhash_head head;
struct rcu_head rcu;
struct ioam6_schema __rcu *schema;
__be16 id;
__be32 data;
__be64 data_wide;
};
struct ioam6_schema {
struct rhash_head head;
struct rcu_head rcu;
struct ioam6_namespace __rcu *ns;
u32 id;
int len;
__be32 hdr;
u8 data[0];
};
struct ioam6_pernet_data {
struct mutex lock;
struct rhashtable namespaces;
struct rhashtable schemas;
};
static inline struct ioam6_pernet_data *ioam6_pernet(struct net *net)
{
#if IS_ENABLED(CONFIG_IPV6)
return net->ipv6.ioam6_data;
#else
return NULL;
#endif
}
struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id);
void ioam6_fill_trace_data(struct sk_buff *skb,
struct ioam6_namespace *ns,
struct ioam6_trace_hdr *trace);
int ioam6_init(void);
void ioam6_exit(void);
int ioam6_iptunnel_init(void);
void ioam6_iptunnel_exit(void);
#endif /* _NET_IOAM6_H */
......@@ -51,6 +51,8 @@ struct netns_sysctl_ipv6 {
int max_dst_opts_len;
int max_hbh_opts_len;
int seg6_flowlabel;
u32 ioam6_id;
u64 ioam6_id_wide;
bool skip_notify_on_dev_down;
u8 fib_notify_on_flag_change;
};
......@@ -110,6 +112,7 @@ struct netns_ipv6 {
spinlock_t lock;
u32 seq;
} ip6addrlbl_table;
struct ioam6_pernet_data *ioam6_data;
};
#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
......
......@@ -145,6 +145,7 @@ struct in6_flowlabel_req {
#define IPV6_TLV_PADN 1
#define IPV6_TLV_ROUTERALERT 5
#define IPV6_TLV_CALIPSO 7 /* RFC 5570 */
#define IPV6_TLV_IOAM 49 /* TEMPORARY IANA allocation for IOAM */
#define IPV6_TLV_JUMBO 194
#define IPV6_TLV_HAO 201 /* home address option */
......
/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
/*
* IPv6 IOAM implementation
*
* Author:
* Justin Iurman <justin.iurman@uliege.be>
*/
#ifndef _UAPI_LINUX_IOAM6_H
#define _UAPI_LINUX_IOAM6_H
#include <asm/byteorder.h>
#include <linux/types.h>
#define IOAM6_U16_UNAVAILABLE U16_MAX
#define IOAM6_U32_UNAVAILABLE U32_MAX
#define IOAM6_U64_UNAVAILABLE U64_MAX
#define IOAM6_DEFAULT_ID (IOAM6_U32_UNAVAILABLE >> 8)
#define IOAM6_DEFAULT_ID_WIDE (IOAM6_U64_UNAVAILABLE >> 8)
#define IOAM6_DEFAULT_IF_ID IOAM6_U16_UNAVAILABLE
#define IOAM6_DEFAULT_IF_ID_WIDE IOAM6_U32_UNAVAILABLE
/*
* IPv6 IOAM Option Header
*/
struct ioam6_hdr {
__u8 opt_type;
__u8 opt_len;
__u8 :8; /* reserved */
#define IOAM6_TYPE_PREALLOC 0
__u8 type;
} __attribute__((packed));
/*
* IOAM Trace Header
*/
struct ioam6_trace_hdr {
__be16 namespace_id;
#if defined(__LITTLE_ENDIAN_BITFIELD)
__u8 :1, /* unused */
:1, /* unused */
overflow:1,
nodelen:5;
__u8 remlen:7,
:1; /* unused */
union {
__be32 type_be32;
struct {
__u32 bit7:1,
bit6:1,
bit5:1,
bit4:1,
bit3:1,
bit2:1,
bit1:1,
bit0:1,
bit15:1, /* unused */
bit14:1, /* unused */
bit13:1, /* unused */
bit12:1, /* unused */
bit11:1,
bit10:1,
bit9:1,
bit8:1,
bit23:1, /* reserved */
bit22:1,
bit21:1, /* unused */
bit20:1, /* unused */
bit19:1, /* unused */
bit18:1, /* unused */
bit17:1, /* unused */
bit16:1, /* unused */
:8; /* reserved */
} type;
};
#elif defined(__BIG_ENDIAN_BITFIELD)
__u8 nodelen:5,
overflow:1,
:1, /* unused */
:1; /* unused */
__u8 :1, /* unused */
remlen:7;
union {
__be32 type_be32;
struct {
__u32 bit0:1,
bit1:1,
bit2:1,
bit3:1,
bit4:1,
bit5:1,
bit6:1,
bit7:1,
bit8:1,
bit9:1,
bit10:1,
bit11:1,
bit12:1, /* unused */
bit13:1, /* unused */
bit14:1, /* unused */
bit15:1, /* unused */
bit16:1, /* unused */
bit17:1, /* unused */
bit18:1, /* unused */
bit19:1, /* unused */
bit20:1, /* unused */
bit21:1, /* unused */
bit22:1,
bit23:1, /* reserved */
:8; /* reserved */
} type;
};
#else
#error "Please fix <asm/byteorder.h>"
#endif
#define IOAM6_TRACE_DATA_SIZE_MAX 244
__u8 data[0];
} __attribute__((packed));
#endif /* _UAPI_LINUX_IOAM6_H */
/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
/*
* IPv6 IOAM Generic Netlink API
*
* Author:
* Justin Iurman <justin.iurman@uliege.be>
*/
#ifndef _UAPI_LINUX_IOAM6_GENL_H
#define _UAPI_LINUX_IOAM6_GENL_H
#define IOAM6_GENL_NAME "IOAM6"
#define IOAM6_GENL_VERSION 0x1
enum {
IOAM6_ATTR_UNSPEC,
IOAM6_ATTR_NS_ID, /* u16 */
IOAM6_ATTR_NS_DATA, /* u32 */
IOAM6_ATTR_NS_DATA_WIDE,/* u64 */
#define IOAM6_MAX_SCHEMA_DATA_LEN (255 * 4)
IOAM6_ATTR_SC_ID, /* u32 */
IOAM6_ATTR_SC_DATA, /* Binary */
IOAM6_ATTR_SC_NONE, /* Flag */
IOAM6_ATTR_PAD,
__IOAM6_ATTR_MAX,
};
#define IOAM6_ATTR_MAX (__IOAM6_ATTR_MAX - 1)
enum {
IOAM6_CMD_UNSPEC,
IOAM6_CMD_ADD_NAMESPACE,
IOAM6_CMD_DEL_NAMESPACE,
IOAM6_CMD_DUMP_NAMESPACES,
IOAM6_CMD_ADD_SCHEMA,
IOAM6_CMD_DEL_SCHEMA,
IOAM6_CMD_DUMP_SCHEMAS,
IOAM6_CMD_NS_SET_SCHEMA,
__IOAM6_CMD_MAX,
};
#define IOAM6_CMD_MAX (__IOAM6_CMD_MAX - 1)
#endif /* _UAPI_LINUX_IOAM6_GENL_H */
/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
/*
* IPv6 IOAM Lightweight Tunnel API
*
* Author:
* Justin Iurman <justin.iurman@uliege.be>
*/
#ifndef _UAPI_LINUX_IOAM6_IPTUNNEL_H
#define _UAPI_LINUX_IOAM6_IPTUNNEL_H
enum {
IOAM6_IPTUNNEL_UNSPEC,
IOAM6_IPTUNNEL_TRACE, /* struct ioam6_trace_hdr */
__IOAM6_IPTUNNEL_MAX,
};
#define IOAM6_IPTUNNEL_MAX (__IOAM6_IPTUNNEL_MAX - 1)
#endif /* _UAPI_LINUX_IOAM6_IPTUNNEL_H */
......@@ -190,6 +190,9 @@ enum {
DEVCONF_NDISC_TCLASS,
DEVCONF_RPL_SEG_ENABLED,
DEVCONF_RA_DEFRTR_METRIC,
DEVCONF_IOAM6_ENABLED,
DEVCONF_IOAM6_ID,
DEVCONF_IOAM6_ID_WIDE,
DEVCONF_MAX
};
......
......@@ -14,6 +14,7 @@ enum lwtunnel_encap_types {
LWTUNNEL_ENCAP_BPF,
LWTUNNEL_ENCAP_SEG6_LOCAL,
LWTUNNEL_ENCAP_RPL,
LWTUNNEL_ENCAP_IOAM6,
__LWTUNNEL_ENCAP_MAX,
};
......
......@@ -43,6 +43,8 @@ static const char *lwtunnel_encap_str(enum lwtunnel_encap_types encap_type)
return "SEG6LOCAL";
case LWTUNNEL_ENCAP_RPL:
return "RPL";
case LWTUNNEL_ENCAP_IOAM6:
return "IOAM6";
case LWTUNNEL_ENCAP_IP6:
case LWTUNNEL_ENCAP_IP:
case LWTUNNEL_ENCAP_NONE:
......
......@@ -328,4 +328,15 @@ config IPV6_RPL_LWTUNNEL
If unsure, say N.
config IPV6_IOAM6_LWTUNNEL
bool "IPv6: IOAM Pre-allocated Trace insertion support"
depends on IPV6
select LWTUNNEL
help
Support for the inline insertion of IOAM Pre-allocated
Trace Header (only on locally generated packets), using
the lightweight tunnels mechanism.
If unsure, say N.
endif # IPV6
......@@ -10,7 +10,7 @@ ipv6-objs := af_inet6.o anycast.o ip6_output.o ip6_input.o addrconf.o \
route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o udplite.o \
raw.o icmp.o mcast.o reassembly.o tcp_ipv6.o ping.o \
exthdrs.o datagram.o ip6_flowlabel.o inet6_connection_sock.o \
udp_offload.o seg6.o fib6_notifier.o rpl.o
udp_offload.o seg6.o fib6_notifier.o rpl.o ioam6.o
ipv6-offload := ip6_offload.o tcpv6_offload.o exthdrs_offload.o
......@@ -27,6 +27,7 @@ ipv6-$(CONFIG_NETLABEL) += calipso.o
ipv6-$(CONFIG_IPV6_SEG6_LWTUNNEL) += seg6_iptunnel.o seg6_local.o
ipv6-$(CONFIG_IPV6_SEG6_HMAC) += seg6_hmac.o
ipv6-$(CONFIG_IPV6_RPL_LWTUNNEL) += rpl_iptunnel.o
ipv6-$(CONFIG_IPV6_IOAM6_LWTUNNEL) += ioam6_iptunnel.o
ipv6-objs += $(ipv6-y)
......
......@@ -89,12 +89,15 @@
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/export.h>
#include <linux/ioam6.h>
#define INFINITY_LIFE_TIME 0xFFFFFFFF
#define IPV6_MAX_STRLEN \
sizeof("ffff:ffff:ffff:ffff:ffff:ffff:255.255.255.255")
static u32 ioam6_if_id_max = U16_MAX;
static inline u32 cstamp_delta(unsigned long cstamp)
{
return (cstamp - INITIAL_JIFFIES) * 100UL / HZ;
......@@ -237,6 +240,9 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
.addr_gen_mode = IN6_ADDR_GEN_MODE_EUI64,
.disable_policy = 0,
.rpl_seg_enabled = 0,
.ioam6_enabled = 0,
.ioam6_id = IOAM6_DEFAULT_IF_ID,
.ioam6_id_wide = IOAM6_DEFAULT_IF_ID_WIDE,
};
static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
......@@ -293,6 +299,9 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
.addr_gen_mode = IN6_ADDR_GEN_MODE_EUI64,
.disable_policy = 0,
.rpl_seg_enabled = 0,
.ioam6_enabled = 0,
.ioam6_id = IOAM6_DEFAULT_IF_ID,
.ioam6_id_wide = IOAM6_DEFAULT_IF_ID_WIDE,
};
/* Check if link is ready: is it up and is a valid qdisc available */
......@@ -5524,6 +5533,9 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
array[DEVCONF_DISABLE_POLICY] = cnf->disable_policy;
array[DEVCONF_NDISC_TCLASS] = cnf->ndisc_tclass;
array[DEVCONF_RPL_SEG_ENABLED] = cnf->rpl_seg_enabled;
array[DEVCONF_IOAM6_ENABLED] = cnf->ioam6_enabled;
array[DEVCONF_IOAM6_ID] = cnf->ioam6_id;
array[DEVCONF_IOAM6_ID_WIDE] = cnf->ioam6_id_wide;
}
static inline size_t inet6_ifla6_size(void)
......@@ -6930,6 +6942,31 @@ static const struct ctl_table addrconf_sysctl[] = {
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "ioam6_enabled",
.data = &ipv6_devconf.ioam6_enabled,
.maxlen = sizeof(u8),
.mode = 0644,
.proc_handler = proc_dou8vec_minmax,
.extra1 = (void *)SYSCTL_ZERO,
.extra2 = (void *)SYSCTL_ONE,
},
{
.procname = "ioam6_id",
.data = &ipv6_devconf.ioam6_id,
.maxlen = sizeof(u32),
.mode = 0644,
.proc_handler = proc_douintvec_minmax,
.extra1 = (void *)SYSCTL_ZERO,
.extra2 = (void *)&ioam6_if_id_max,
},
{
.procname = "ioam6_id_wide",
.data = &ipv6_devconf.ioam6_id_wide,
.maxlen = sizeof(u32),
.mode = 0644,
.proc_handler = proc_douintvec,
},
{
/* sentinel */
}
......
......@@ -62,6 +62,7 @@
#include <net/rpl.h>
#include <net/compat.h>
#include <net/xfrm.h>
#include <net/ioam6.h>
#include <linux/uaccess.h>
#include <linux/mroute6.h>
......@@ -961,6 +962,9 @@ static int __net_init inet6_net_init(struct net *net)
net->ipv6.sysctl.fib_notify_on_flag_change = 0;
atomic_set(&net->ipv6.fib6_sernum, 1);
net->ipv6.sysctl.ioam6_id = IOAM6_DEFAULT_ID;
net->ipv6.sysctl.ioam6_id_wide = IOAM6_DEFAULT_ID_WIDE;
err = ipv6_init_mibs(net);
if (err)
return err;
......@@ -1191,6 +1195,10 @@ static int __init inet6_init(void)
if (err)
goto rpl_fail;
err = ioam6_init();
if (err)
goto ioam6_fail;
err = igmp6_late_init();
if (err)
goto igmp6_late_err;
......@@ -1213,6 +1221,8 @@ static int __init inet6_init(void)
igmp6_late_cleanup();
#endif
igmp6_late_err:
ioam6_exit();
ioam6_fail:
rpl_exit();
rpl_fail:
seg6_exit();
......
......@@ -49,6 +49,9 @@
#include <net/seg6_hmac.h>
#endif
#include <net/rpl.h>
#include <linux/ioam6.h>
#include <net/ioam6.h>
#include <net/dst_metadata.h>
#include <linux/uaccess.h>
......@@ -928,6 +931,60 @@ static bool ipv6_hop_ra(struct sk_buff *skb, int optoff)
return false;
}
/* IOAM */
static bool ipv6_hop_ioam(struct sk_buff *skb, int optoff)
{
struct ioam6_trace_hdr *trace;
struct ioam6_namespace *ns;
struct ioam6_hdr *hdr;
/* Bad alignment (must be 4n-aligned) */
if (optoff & 3)
goto drop;
/* Ignore if IOAM is not enabled on ingress */
if (!__in6_dev_get(skb->dev)->cnf.ioam6_enabled)
goto ignore;
/* Truncated Option header */
hdr = (struct ioam6_hdr *)(skb_network_header(skb) + optoff);
if (hdr->opt_len < 2)
goto drop;
switch (hdr->type) {
case IOAM6_TYPE_PREALLOC:
/* Truncated Pre-allocated Trace header */
if (hdr->opt_len < 2 + sizeof(*trace))
goto drop;
/* Malformed Pre-allocated Trace header */
trace = (struct ioam6_trace_hdr *)((u8 *)hdr + sizeof(*hdr));
if (hdr->opt_len < 2 + sizeof(*trace) + trace->remlen * 4)
goto drop;
/* Ignore if the IOAM namespace is unknown */
ns = ioam6_namespace(ipv6_skb_net(skb), trace->namespace_id);
if (!ns)
goto ignore;
if (!skb_valid_dst(skb))
ip6_route_input(skb);
ioam6_fill_trace_data(skb, ns, trace);
break;
default:
break;
}
ignore:
return true;
drop:
kfree_skb(skb);
return false;
}
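The length checks performed by ipv6_hop_ioam() above can be isolated into a small user-space sketch. It only mirrors the alignment and length tests for the Pre-allocated Trace type. It does not model the "ignore" paths (IOAM disabled on the interface, unknown namespace). `ioam_check_lengths()` and the `IOAM_*` names are hypothetical, and `TRACE_HDR_LEN` assumes the 8-octet packed layout of struct ioam6_trace_hdr without its data.

```c
#include <assert.h>
#include <stdint.h>

#define TRACE_HDR_LEN 8 /* assumed size of the trace header, data excluded */

enum ioam_verdict { IOAM_ACCEPT, IOAM_DROP };

/*
 * optoff:  offset of the option from the start of the IPv6 header
 * opt_len: value of the option's length field
 * remlen:  remaining trace space, in 4-octet units (7-bit field)
 */
enum ioam_verdict ioam_check_lengths(int optoff, uint8_t opt_len,
				     uint8_t remlen)
{
	assert(remlen <= 127);

	if (optoff & 3)                       /* must be 4n-aligned */
		return IOAM_DROP;
	if (opt_len < 2)                      /* truncated option header */
		return IOAM_DROP;
	if (opt_len < 2 + TRACE_HDR_LEN)      /* truncated trace header */
		return IOAM_DROP;
	if (opt_len < 2 + TRACE_HDR_LEN + remlen * 4) /* malformed trace */
		return IOAM_DROP;
	return IOAM_ACCEPT;
}
```

For example, an option at offset 44 with opt_len 22 and remlen 3 passes every check, while the same option with opt_len 21 is dropped.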
/* Jumbo payload */
static bool ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
......@@ -999,6 +1056,10 @@ static const struct tlvtype_proc tlvprochopopt_lst[] = {
.type = IPV6_TLV_ROUTERALERT,
.func = ipv6_hop_ra,
},
{
.type = IPV6_TLV_IOAM,
.func = ipv6_hop_ioam,
},
{
.type = IPV6_TLV_JUMBO,
.func = ipv6_hop_jumbo,
......
// SPDX-License-Identifier: GPL-2.0+
/*
* IPv6 IOAM Lightweight Tunnel implementation
*
* Author:
* Justin Iurman <justin.iurman@uliege.be>
*/
#include <linux/kernel.h>
#include <linux/skbuff.h>
#include <linux/net.h>
#include <linux/netlink.h>
#include <linux/in6.h>
#include <linux/ioam6.h>
#include <linux/ioam6_iptunnel.h>
#include <net/dst.h>
#include <net/sock.h>
#include <net/lwtunnel.h>
#include <net/ioam6.h>
#define IOAM6_MASK_SHORT_FIELDS 0xff100000
#define IOAM6_MASK_WIDE_FIELDS 0xe00000
struct ioam6_lwt_encap {
struct ipv6_hopopt_hdr eh;
u8 pad[2]; /* 2-octet padding for 4n-alignment */
struct ioam6_hdr ioamh;
struct ioam6_trace_hdr traceh;
} __packed;
struct ioam6_lwt {
struct ioam6_lwt_encap tuninfo;
};
static struct ioam6_lwt *ioam6_lwt_state(struct lwtunnel_state *lwt)
{
return (struct ioam6_lwt *)lwt->data;
}
static struct ioam6_lwt_encap *ioam6_lwt_info(struct lwtunnel_state *lwt)
{
return &ioam6_lwt_state(lwt)->tuninfo;
}
static struct ioam6_trace_hdr *ioam6_trace(struct lwtunnel_state *lwt)
{
return &(ioam6_lwt_state(lwt)->tuninfo.traceh);
}
static const struct nla_policy ioam6_iptunnel_policy[IOAM6_IPTUNNEL_MAX + 1] = {
[IOAM6_IPTUNNEL_TRACE] = NLA_POLICY_EXACT_LEN(sizeof(struct ioam6_trace_hdr)),
};
static int nla_put_ioam6_trace(struct sk_buff *skb, int attrtype,
struct ioam6_trace_hdr *trace)
{
struct ioam6_trace_hdr *data;
struct nlattr *nla;
int len;
len = sizeof(*trace);
nla = nla_reserve(skb, attrtype, len);
if (!nla)
return -EMSGSIZE;
data = nla_data(nla);
memcpy(data, trace, len);
return 0;
}
static bool ioam6_validate_trace_hdr(struct ioam6_trace_hdr *trace)
{
u32 fields;
if (!trace->type_be32 || !trace->remlen ||
trace->remlen > IOAM6_TRACE_DATA_SIZE_MAX / 4)
return false;
trace->nodelen = 0;
fields = be32_to_cpu(trace->type_be32);
trace->nodelen += hweight32(fields & IOAM6_MASK_SHORT_FIELDS)
* (sizeof(__be32) / 4);
trace->nodelen += hweight32(fields & IOAM6_MASK_WIDE_FIELDS)
* (sizeof(__be64) / 4);
return true;
}
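The nodelen computation in ioam6_validate_trace_hdr() above can be reproduced in user space to see what a given trace type costs per node. This sketch reuses the two mask values from this file; `trace_type_bit()` and `trace_nodelen()` are hypothetical helpers, and `__builtin_popcount` (a GCC/Clang builtin) stands in for the kernel's hweight32().

```c
#include <assert.h>
#include <stdint.h>

#define MASK_SHORT_FIELDS 0xff100000u /* bits 0-7 and 11: 4 octets each */
#define MASK_WIDE_FIELDS  0x00e00000u /* bits 8-10: 8 octets each */

/* Mask for trace-type bit n, where bit 0 is the most significant bit. */
uint32_t trace_type_bit(unsigned int n)
{
	assert(n < 24);
	return UINT32_C(1) << (31 - n);
}

/* Per-node data length in 4-octet units, for a host-order trace type. */
unsigned int trace_nodelen(uint32_t type)
{
	return (unsigned int)__builtin_popcount(type & MASK_SHORT_FIELDS) +
	       2u * (unsigned int)__builtin_popcount(type & MASK_WIDE_FIELDS);
}
```

Bit 22 is covered by neither mask, so it does not contribute to nodelen.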
static int ioam6_build_state(struct net *net, struct nlattr *nla,
unsigned int family, const void *cfg,
struct lwtunnel_state **ts,
struct netlink_ext_ack *extack)
{
struct nlattr *tb[IOAM6_IPTUNNEL_MAX + 1];
struct ioam6_lwt_encap *tuninfo;
struct ioam6_trace_hdr *trace;
struct lwtunnel_state *s;
int len_aligned;
int len, err;
if (family != AF_INET6)
return -EINVAL;
err = nla_parse_nested(tb, IOAM6_IPTUNNEL_MAX, nla,
ioam6_iptunnel_policy, extack);
if (err < 0)
return err;
if (!tb[IOAM6_IPTUNNEL_TRACE]) {
NL_SET_ERR_MSG(extack, "missing trace");
return -EINVAL;
}
trace = nla_data(tb[IOAM6_IPTUNNEL_TRACE]);
if (!ioam6_validate_trace_hdr(trace)) {
NL_SET_ERR_MSG_ATTR(extack, tb[IOAM6_IPTUNNEL_TRACE],
"invalid trace validation");
return -EINVAL;
}
len = sizeof(*tuninfo) + trace->remlen * 4;
len_aligned = ALIGN(len, 8);
s = lwtunnel_state_alloc(len_aligned);
if (!s)
return -ENOMEM;
tuninfo = ioam6_lwt_info(s);
tuninfo->eh.hdrlen = (len_aligned >> 3) - 1;
tuninfo->pad[0] = IPV6_TLV_PADN;
tuninfo->ioamh.type = IOAM6_TYPE_PREALLOC;
tuninfo->ioamh.opt_type = IPV6_TLV_IOAM;
tuninfo->ioamh.opt_len = sizeof(tuninfo->ioamh) - 2 + sizeof(*trace)
+ trace->remlen * 4;
memcpy(&tuninfo->traceh, trace, sizeof(*trace));
len = len_aligned - len;
if (len == 1) {
tuninfo->traceh.data[trace->remlen * 4] = IPV6_TLV_PAD1;
} else if (len > 0) {
tuninfo->traceh.data[trace->remlen * 4] = IPV6_TLV_PADN;
tuninfo->traceh.data[trace->remlen * 4 + 1] = len - 2;
}
s->type = LWTUNNEL_ENCAP_IOAM6;
s->flags |= LWTUNNEL_STATE_OUTPUT_REDIRECT;
*ts = s;
return 0;
}
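The size arithmetic in ioam6_build_state() above is easier to follow with concrete numbers. The sketch below recomputes the same quantities in user space. The `*_LEN` constants assume the packed layouts shown in this series (2-octet Hop-by-Hop header, 2 octets of padding, 4-octet IOAM option header, 8-octet trace header); `struct ioam_sizes` and `ioam_compute_sizes()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define HOPOPT_HDR_LEN   2 /* next header + header extension length */
#define LEADING_PAD_LEN  2 /* padding for 4n-alignment of the option */
#define IOAM_OPT_HDR_LEN 4 /* opt_type, opt_len, reserved, type */
#define TRACE_HDR_LEN    8 /* trace header without data */

struct ioam_sizes {
	unsigned int total;    /* whole Hop-by-Hop header, 8n-aligned */
	unsigned int hdrlen;   /* value of the Hdr Ext Len field */
	unsigned int opt_len;  /* value of the IOAM option length field */
	unsigned int tail_pad; /* trailing padding octets */
};

/* remlen is the pre-allocated trace space in 4-octet units. */
struct ioam_sizes ioam_compute_sizes(uint8_t remlen)
{
	struct ioam_sizes s;
	unsigned int len;

	assert(remlen >= 1 && remlen <= 244 / 4);

	len = HOPOPT_HDR_LEN + LEADING_PAD_LEN + IOAM_OPT_HDR_LEN +
	      TRACE_HDR_LEN + remlen * 4u;
	s.total = (len + 7u) & ~7u;          /* round up to 8 octets */
	s.hdrlen = (s.total >> 3) - 1;       /* 8-octet units, minus the first */
	s.opt_len = IOAM_OPT_HDR_LEN - 2 + TRACE_HDR_LEN + remlen * 4u;
	s.tail_pad = s.total - len;
	return s;
}
```

Because the unaligned length is always a multiple of 4, the trailing padding is either 0 or 4 octets under these assumptions.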
static int ioam6_do_inline(struct sk_buff *skb, struct ioam6_lwt_encap *tuninfo)
{
struct ioam6_trace_hdr *trace;
struct ipv6hdr *oldhdr, *hdr;
struct ioam6_namespace *ns;
int hdrlen, err;
hdrlen = (tuninfo->eh.hdrlen + 1) << 3;
err = skb_cow_head(skb, hdrlen + skb->mac_len);
if (unlikely(err))
return err;
oldhdr = ipv6_hdr(skb);
skb_pull(skb, sizeof(*oldhdr));
skb_postpull_rcsum(skb, skb_network_header(skb), sizeof(*oldhdr));
skb_push(skb, sizeof(*oldhdr) + hdrlen);
skb_reset_network_header(skb);
skb_mac_header_rebuild(skb);
hdr = ipv6_hdr(skb);
memmove(hdr, oldhdr, sizeof(*oldhdr));
tuninfo->eh.nexthdr = hdr->nexthdr;
skb_set_transport_header(skb, sizeof(*hdr));
skb_postpush_rcsum(skb, hdr, sizeof(*hdr) + hdrlen);
memcpy(skb_transport_header(skb), (u8 *)tuninfo, hdrlen);
hdr->nexthdr = NEXTHDR_HOP;
hdr->payload_len = cpu_to_be16(skb->len - sizeof(*hdr));
trace = (struct ioam6_trace_hdr *)(skb_transport_header(skb)
+ sizeof(struct ipv6_hopopt_hdr) + 2
+ sizeof(struct ioam6_hdr));
ns = ioam6_namespace(dev_net(skb_dst(skb)->dev), trace->namespace_id);
if (ns)
ioam6_fill_trace_data(skb, ns, trace);
return 0;
}
static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
{
struct lwtunnel_state *lwt = skb_dst(skb)->lwtstate;
int err = -EINVAL;
if (skb->protocol != htons(ETH_P_IPV6))
goto drop;
/* Only for packets we send and
* that do not contain a Hop-by-Hop yet
*/
if (skb->dev || ipv6_hdr(skb)->nexthdr == NEXTHDR_HOP)
goto out;
err = ioam6_do_inline(skb, ioam6_lwt_info(lwt));
if (unlikely(err))
goto drop;
err = skb_cow_head(skb, LL_RESERVED_SPACE(skb_dst(skb)->dev));
if (unlikely(err))
goto drop;
out:
return lwt->orig_output(net, sk, skb);
drop:
kfree_skb(skb);
return err;
}
static int ioam6_fill_encap_info(struct sk_buff *skb,
struct lwtunnel_state *lwtstate)
{
struct ioam6_trace_hdr *trace = ioam6_trace(lwtstate);
if (nla_put_ioam6_trace(skb, IOAM6_IPTUNNEL_TRACE, trace))
return -EMSGSIZE;
return 0;
}
static int ioam6_encap_nlsize(struct lwtunnel_state *lwtstate)
{
struct ioam6_trace_hdr *trace = ioam6_trace(lwtstate);
return nla_total_size(sizeof(*trace));
}
static int ioam6_encap_cmp(struct lwtunnel_state *a, struct lwtunnel_state *b)
{
struct ioam6_trace_hdr *a_hdr = ioam6_trace(a);
struct ioam6_trace_hdr *b_hdr = ioam6_trace(b);
return (a_hdr->namespace_id != b_hdr->namespace_id);
}
static const struct lwtunnel_encap_ops ioam6_iptun_ops = {
.build_state = ioam6_build_state,
.output = ioam6_output,
.fill_encap = ioam6_fill_encap_info,
.get_encap_size = ioam6_encap_nlsize,
.cmp_encap = ioam6_encap_cmp,
.owner = THIS_MODULE,
};
int __init ioam6_iptunnel_init(void)
{
return lwtunnel_encap_add_ops(&ioam6_iptun_ops, LWTUNNEL_ENCAP_IOAM6);
}
void ioam6_iptunnel_exit(void)
{
lwtunnel_encap_del_ops(&ioam6_iptun_ops, LWTUNNEL_ENCAP_IOAM6);
}
......@@ -21,6 +21,7 @@
#ifdef CONFIG_NETLABEL
#include <net/calipso.h>
#endif
#include <linux/ioam6.h>
static int two = 2;
static int three = 3;
......@@ -28,6 +29,8 @@ static int flowlabel_reflect_max = 0x7;
static int auto_flowlabels_max = IP6_AUTO_FLOW_LABEL_MAX;
static u32 rt6_multipath_hash_fields_all_mask =
FIB_MULTIPATH_HASH_FIELD_ALL_MASK;
static u32 ioam6_id_max = IOAM6_DEFAULT_ID;
static u64 ioam6_id_wide_max = IOAM6_DEFAULT_ID_WIDE;
static int proc_rt6_multipath_hash_policy(struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
......@@ -196,6 +199,22 @@ static struct ctl_table ipv6_table_template[] = {
.extra1 = SYSCTL_ZERO,
.extra2 = &two,
},
{
.procname = "ioam6_id",
.data = &init_net.ipv6.sysctl.ioam6_id,
.maxlen = sizeof(u32),
.mode = 0644,
.proc_handler = proc_douintvec_minmax,
.extra2 = &ioam6_id_max,
},
{
.procname = "ioam6_id_wide",
.data = &init_net.ipv6.sysctl.ioam6_id_wide,
.maxlen = sizeof(u64),
.mode = 0644,
.proc_handler = proc_doulongvec_minmax,
.extra2 = &ioam6_id_wide_max,
},
{ }
};
......
......@@ -25,6 +25,7 @@ TEST_PROGS += bareudp.sh
TEST_PROGS += unicast_extensions.sh
TEST_PROGS += udpgro_fwd.sh
TEST_PROGS += veth.sh
TEST_PROGS += ioam6.sh
TEST_PROGS_EXTENDED := in_netns.sh
TEST_GEN_FILES = socket nettest
TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy reuseport_addr_any
......@@ -36,6 +37,7 @@ TEST_GEN_FILES += fin_ack_lat
TEST_GEN_FILES += reuseaddr_ports_exhausted
TEST_GEN_FILES += hwtstamp_config rxtimestamp timestamping txtimestamp
TEST_GEN_FILES += ipsec
TEST_GEN_FILES += ioam6_parser
TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls
......
......@@ -42,3 +42,4 @@ CONFIG_NET_CLS_FLOWER=m
CONFIG_NET_ACT_TUNNEL_KEY=m
CONFIG_NET_ACT_MIRRED=m
CONFIG_BAREUDP=m
CONFIG_IPV6_IOAM6_LWTUNNEL=y
// SPDX-License-Identifier: GPL-2.0+
/*
* Author: Justin Iurman (justin.iurman@uliege.be)
*
* IOAM parser for IPv6, see ioam6.sh for details.
*/
#include <asm/byteorder.h>
#include <linux/const.h>
#include <linux/if_ether.h>
#include <linux/ioam6.h>
#include <linux/ipv6.h>
#include <sys/socket.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
struct node_args {
__u32 id;
__u64 wide;
__u16 ingr_id;
__u16 egr_id;
__u32 ingr_wide;
__u32 egr_wide;
__u32 ns_data;
__u64 ns_wide;
__u32 sc_id;
__u8 hop_limit;
__u8 *sc_data; /* NULL when sc_id = 0xffffff (default empty value) */
};
/* expected args per node, in that order */
enum {
NODE_ARG_HOP_LIMIT,
NODE_ARG_ID,
NODE_ARG_WIDE,
NODE_ARG_INGR_ID,
NODE_ARG_INGR_WIDE,
NODE_ARG_EGR_ID,
NODE_ARG_EGR_WIDE,
NODE_ARG_NS_DATA,
NODE_ARG_NS_WIDE,
NODE_ARG_SC_ID,
__NODE_ARG_MAX,
};
#define NODE_ARGS_SIZE __NODE_ARG_MAX
struct args {
__u16 ns_id;
__u32 trace_type;
__u8 n_node;
__u8 *ifname;
struct node_args node[0];
};
/* expected args, in that order */
enum {
ARG_IFNAME,
ARG_N_NODE,
ARG_NS_ID,
ARG_TRACE_TYPE,
__ARG_MAX,
};
#define ARGS_SIZE __ARG_MAX
int check_ioam6_node_data(__u8 **p, struct ioam6_trace_hdr *trace, __u8 hlim,
__u32 id, __u64 wide, __u16 ingr_id, __u32 ingr_wide,
__u16 egr_id, __u32 egr_wide, __u32 ns_data,
__u64 ns_wide, __u32 sc_id, __u8 *sc_data)
{
__u64 raw64;
__u32 raw32;
__u8 sc_len;
if (trace->type.bit0) {
raw32 = __be32_to_cpu(*((__u32 *)*p));
if (hlim != (raw32 >> 24) || id != (raw32 & 0xffffff))
return 1;
*p += sizeof(__u32);
}
if (trace->type.bit1) {
raw32 = __be32_to_cpu(*((__u32 *)*p));
if (ingr_id != (raw32 >> 16) || egr_id != (raw32 & 0xffff))
return 1;
*p += sizeof(__u32);
}
if (trace->type.bit2)
*p += sizeof(__u32);
if (trace->type.bit3)
*p += sizeof(__u32);
if (trace->type.bit4) {
if (__be32_to_cpu(*((__u32 *)*p)) != 0xffffffff)
return 1;
*p += sizeof(__u32);
}
if (trace->type.bit5) {
if (__be32_to_cpu(*((__u32 *)*p)) != ns_data)
return 1;
*p += sizeof(__u32);
}
if (trace->type.bit6) {
if (__be32_to_cpu(*((__u32 *)*p)) != 0xffffffff)
return 1;
*p += sizeof(__u32);
}
if (trace->type.bit7) {
if (__be32_to_cpu(*((__u32 *)*p)) != 0xffffffff)
return 1;
*p += sizeof(__u32);
}
if (trace->type.bit8) {
raw64 = __be64_to_cpu(*((__u64 *)*p));
if (hlim != (raw64 >> 56) || wide != (raw64 & 0xffffffffffffff))
return 1;
*p += sizeof(__u64);
}
if (trace->type.bit9) {
if (__be32_to_cpu(*((__u32 *)*p)) != ingr_wide)
return 1;
*p += sizeof(__u32);
if (__be32_to_cpu(*((__u32 *)*p)) != egr_wide)
return 1;
*p += sizeof(__u32);
}
if (trace->type.bit10) {
if (__be64_to_cpu(*((__u64 *)*p)) != ns_wide)
return 1;
*p += sizeof(__u64);
}
if (trace->type.bit11) {
if (__be32_to_cpu(*((__u32 *)*p)) != 0xffffffff)
return 1;
*p += sizeof(__u32);
}
if (trace->type.bit22) {
raw32 = __be32_to_cpu(*((__u32 *)*p));
sc_len = sc_data ? __ALIGN_KERNEL(strlen(sc_data), 4) : 0;
if (sc_len != (raw32 >> 24) * 4 || sc_id != (raw32 & 0xffffff))
return 1;
*p += sizeof(__u32);
if (sc_data) {
if (strncmp(*p, sc_data, strlen(sc_data)))
return 1;
*p += strlen(sc_data);
sc_len -= strlen(sc_data);
while (sc_len--) {
if (**p != '\0')
return 1;
*p += sizeof(__u8);
}
}
}
return 0;
}
int check_ioam6_trace(struct ioam6_trace_hdr *trace, struct args *args)
{
__u8 *p;
int i;
if (__be16_to_cpu(trace->namespace_id) != args->ns_id ||
__be32_to_cpu(trace->type_be32) != args->trace_type)
return 1;
p = trace->data + trace->remlen * 4;
for (i = args->n_node - 1; i >= 0; i--) {
if (check_ioam6_node_data(&p, trace,
args->node[i].hop_limit,
args->node[i].id,
args->node[i].wide,
args->node[i].ingr_id,
args->node[i].ingr_wide,
args->node[i].egr_id,
args->node[i].egr_wide,
args->node[i].ns_data,
args->node[i].ns_wide,
args->node[i].sc_id,
args->node[i].sc_data))
return 1;
}
return 0;
}
int parse_node_args(int *argcp, char ***argvp, struct node_args *node)
{
char **argv = *argvp;
if (*argcp < NODE_ARGS_SIZE)
return 1;
node->hop_limit = strtoul(argv[NODE_ARG_HOP_LIMIT], NULL, 10);
if (!node->hop_limit) {
node->hop_limit = strtoul(argv[NODE_ARG_HOP_LIMIT], NULL, 16);
if (!node->hop_limit)
return 1;
}
node->id = strtoul(argv[NODE_ARG_ID], NULL, 10);
if (!node->id) {
node->id = strtoul(argv[NODE_ARG_ID], NULL, 16);
if (!node->id)
return 1;
}
node->wide = strtoull(argv[NODE_ARG_WIDE], NULL, 10);
if (!node->wide) {
node->wide = strtoull(argv[NODE_ARG_WIDE], NULL, 16);
if (!node->wide)
return 1;
}
node->ingr_id = strtoul(argv[NODE_ARG_INGR_ID], NULL, 10);
if (!node->ingr_id) {
node->ingr_id = strtoul(argv[NODE_ARG_INGR_ID], NULL, 16);
if (!node->ingr_id)
return 1;
}
node->ingr_wide = strtoul(argv[NODE_ARG_INGR_WIDE], NULL, 10);
if (!node->ingr_wide) {
node->ingr_wide = strtoul(argv[NODE_ARG_INGR_WIDE], NULL, 16);
if (!node->ingr_wide)
return 1;
}
node->egr_id = strtoul(argv[NODE_ARG_EGR_ID], NULL, 10);
if (!node->egr_id) {
node->egr_id = strtoul(argv[NODE_ARG_EGR_ID], NULL, 16);
if (!node->egr_id)
return 1;
}
node->egr_wide = strtoul(argv[NODE_ARG_EGR_WIDE], NULL, 10);
if (!node->egr_wide) {
node->egr_wide = strtoul(argv[NODE_ARG_EGR_WIDE], NULL, 16);
if (!node->egr_wide)
return 1;
}
node->ns_data = strtoul(argv[NODE_ARG_NS_DATA], NULL, 16);
if (!node->ns_data)
return 1;
node->ns_wide = strtoull(argv[NODE_ARG_NS_WIDE], NULL, 16);
if (!node->ns_wide)
return 1;
node->sc_id = strtoul(argv[NODE_ARG_SC_ID], NULL, 10);
if (!node->sc_id) {
node->sc_id = strtoul(argv[NODE_ARG_SC_ID], NULL, 16);
if (!node->sc_id)
return 1;
}
*argcp -= NODE_ARGS_SIZE;
*argvp += NODE_ARGS_SIZE;
if (node->sc_id != 0xffffff) {
if (!*argcp)
return 1;
node->sc_data = argv[NODE_ARG_SC_ID + 1];
*argcp -= 1;
*argvp += 1;
}
return 0;
}
struct args *parse_args(int argc, char **argv)
{
struct args *args;
int n_node, i;
if (argc < ARGS_SIZE)
goto out;
n_node = strtoul(argv[ARG_N_NODE], NULL, 10);
if (!n_node || n_node > 10)
goto out;
args = calloc(1, sizeof(*args) + n_node * sizeof(struct node_args));
if (!args)
goto out;
args->ns_id = strtoul(argv[ARG_NS_ID], NULL, 10);
if (!args->ns_id)
goto free;
args->trace_type = strtoul(argv[ARG_TRACE_TYPE], NULL, 16);
if (!args->trace_type)
goto free;
args->n_node = n_node;
args->ifname = argv[ARG_IFNAME];
argv += ARGS_SIZE;
argc -= ARGS_SIZE;
for (i = 0; i < n_node; i++) {
if (parse_node_args(&argc, &argv, &args->node[i]))
goto free;
}
if (argc)
goto free;
return args;
free:
free(args);
out:
return NULL;
}
int main(int argc, char **argv)
{
int ret, fd, pkts, size, hoplen, found;
struct ioam6_trace_hdr *ioam6h;
struct ioam6_hdr *opt;
struct ipv6hdr *ip6h;
__u8 buffer[400], *p;
struct args *args;
args = parse_args(argc - 1, argv + 1);
if (!args) {
ret = 1;
goto out;
}
fd = socket(AF_PACKET, SOCK_DGRAM, __cpu_to_be16(ETH_P_IPV6));
if (fd < 0) {
ret = 1;
goto out;
}
if (setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE,
args->ifname, strlen(args->ifname))) {
ret = 1;
goto close;
}
pkts = 0;
found = 0;
while (pkts < 3 && !found) {
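size = recv(fd, buffer, sizeof(buffer), 0);
pkts++;
/* Skip failed or truncated reads before touching the IPv6 header */
if (size < (int)sizeof(*ip6h))
continue;
ip6h = (struct ipv6hdr *)buffer;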
if (ip6h->nexthdr == IPPROTO_HOPOPTS) {
p = buffer + sizeof(*ip6h);
hoplen = (p[1] + 1) << 3;
p += sizeof(struct ipv6_hopopt_hdr);
while (hoplen > 0) {
opt = (struct ioam6_hdr *)p;
if (opt->opt_type == IPV6_TLV_IOAM &&
opt->type == IOAM6_TYPE_PREALLOC) {
found = 1;
p += sizeof(*opt);
ioam6h = (struct ioam6_trace_hdr *)p;
ret = check_ioam6_trace(ioam6h, args);
break;
}
p += opt->opt_len + 2;
hoplen -= opt->opt_len + 2;
}
}
}
if (!found)
ret = 1;
close:
close(fd);
out:
free(args);
return ret;
}
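The option walk in main() above treats every option as a type/length pair. A slightly more defensive walk, sketched below, also handles Pad1 (a single octet with no length field) and stops on truncated options. `find_ioam_option()` is a hypothetical helper, `opts` is assumed to point just past the 2-octet Hop-by-Hop header, and 49 is the temporary option type used by this patchset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLV_PAD1 0
#define TLV_IOAM 49 /* temporary IANA allocation used by this patchset */

/*
 * Return the offset of the first IOAM option inside opts[0..len),
 * or -1 if there is none or the option area is malformed.
 */
int find_ioam_option(const uint8_t *opts, size_t len)
{
	size_t off = 0;

	assert(opts != NULL);

	while (off < len) {
		size_t optlen;

		if (opts[off] == TLV_PAD1) { /* one octet, no length field */
			off++;
			continue;
		}
		if (off + 2 > len)           /* length octet is missing */
			return -1;
		optlen = (size_t)opts[off + 1] + 2;
		if (off + optlen > len)      /* option runs past the header */
			return -1;
		if (opts[off] == TLV_IOAM)
			return (int)off;
		off += optlen;
	}
	return -1;
}
```

A caller would add the returned offset to the start of the option area and then cast to struct ioam6_hdr, as the parser does.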