Commit 4a9f42c9 authored by Alexei Starovoitov

Merge branch 'bpf-flow-dissector'

Petar Penkov says:

====================
This patch series hardens the RX stack by allowing flow dissection in BPF,
as previously discussed [1]. Because of the rigorous checks of the BPF
verifier, this provides significant security guarantees. In particular, the
BPF flow dissector cannot enter an infinite loop, as happened with
CVE-2013-4348, because BPF programs are guaranteed to terminate. It cannot
read outside of packet bounds, because all memory accesses are checked.
Also, with BPF the administrator can decide which protocols to support,
reducing potential attack surface. Rarely encountered protocols can be
excluded from dissection, and if a bug is discovered the program can be
updated without a kernel recompile or reboot.

Patch 1 adds infrastructure to execute a BPF program in __skb_flow_dissect.
This includes a new BPF program and attach type.

Patch 2 adds the new BPF flow dissector definitions to tools/uapi.

Patch 3 adds support for the new BPF program type to libbpf and bpftool.

Patch 4 adds a flow dissector program written in BPF. It parses most of the
protocols handled by __skb_flow_dissect, filling in a subset of the flow keys
(basic, control, ports, and address types).
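Part of the control key is the pair of fragmentation flags. The sketch below derives is_frag and is_first_frag from an IPv4 header's frag_off field. It is ordinary userspace C rather than BPF and assumes frag_off has already been converted to host byte order. The helper name ipv4_frag_flags is hypothetical. The two masks use the same values as IP_MF and IP_OFFSET in the selftest's bpf_flow.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same mask values as IP_MF / IP_OFFSET in the selftest's bpf_flow.c. */
#define IP_MF     0x2000	/* "more fragments" bit */
#define IP_OFFSET 0x1FFF	/* fragment offset, in 8-byte units */

struct frag_flags {
	bool is_frag;
	bool is_first_frag;
};

/* Hypothetical helper: classify an IPv4 packet from frag_off, which is
 * assumed to be in host byte order already (a parser would ntohs() it). */
static struct frag_flags ipv4_frag_flags(uint16_t frag_off)
{
	struct frag_flags f = { false, false };

	/* A fragment has MF set or a non-zero offset; DF alone is not one. */
	f.is_frag = (frag_off & (IP_MF | IP_OFFSET)) != 0;
	/* Only the fragment at offset 0 carries the transport header. */
	f.is_first_frag = f.is_frag && (frag_off & IP_OFFSET) == 0;
	return f;
}
```

Only the first fragment contains the L4 header. A dissector can therefore read ports only when is_first_frag is set.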

Patch 5 adds a selftest that attaches the BPF program to the flow dissector
and sends traffic with different levels of encapsulation.

Performance Evaluation:
The in-kernel implementation was compared against the demo program from
patch 4 using the test in patch 5 with IPv4/UDP traffic over 10 seconds.
	$ perf record -a -C 4 taskset -c 4 ./test_flow_dissector -i 4 -f 8 \
		-t 10

In-kernel Dissector:
	__skb_flow_dissect overhead: 2.12%
	Total Packets: 3,272,597 (from output of ./test_flow_dissector)

BPF Dissector:
	__skb_flow_dissect overhead: 1.63%
	Total Packets: 3,232,356 (from output of ./test_flow_dissector)

No-op BPF Dissector:
	__skb_flow_dissect overhead: 1.52%
	Total Packets: 3,330,635 (from output of ./test_flow_dissector)
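The small program below restates these results as changes relative to the in-kernel dissector. It reuses only the figures reported above. The helper name rel_change_pct is made up for this example.

```c
#include <assert.h>

/* Figures reported above: perf overhead of __skb_flow_dissect in percent,
 * and total packets over the 10-second run. */
static const double kernel_overhead = 2.12, bpf_overhead = 1.63,
		    noop_overhead = 1.52;
static const double kernel_pkts = 3272597, bpf_pkts = 3232356,
		    noop_pkts = 3330635;

/* Relative change of `value` with respect to `base`, in percent. */
static double rel_change_pct(double base, double value)
{
	return (value - base) / base * 100.0;
}
```

With these figures, rel_change_pct(kernel_overhead, bpf_overhead) is about -23.1%, and rel_change_pct(kernel_pkts, bpf_pkts) is about -1.2%. The no-op dissector comes in at about -28.3% overhead and +1.8% packets.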

Changes since v3:
1/ struct bpf_flow_keys reorganized to remove holes in patch 1 and patch 2.

Changes since v2:
1/ Changes to tools/include/uapi pulled into a separate patch 2
2/ Changes to tools/lib and tools/bpftool pulled into a separate patch 3
3/ Changed flow_keys in __sk_buff from __u32 to struct bpf_flow_keys *
4/ Added nhoff field in struct bpf_flow_keys to pass initial offset
5/ Saved all of the modified control block, rather than just the qdisc part
6/ Sample BPF program in patch 4 modified to use the changes above

Changes since v1:
1/ LD_ABS instructions now disallowed for the new BPF prog type
2/ now checks if skb is NULL in __skb_flow_dissect()
3/ fixed incorrect accesses in flow_dissector_is_valid_access()
	- writes to the flow_keys field now disallowed
	- reads/writes to tc_classid and data_meta now disallowed
4/ headers pulled with bpf_skb_load_bytes if direct access fails

Changes since RFC:
1/ Flow dissector hook changed from global to per-netns
2/ Defined struct bpf_flow_keys to be used in BPF flow dissector
programs instead of exposing the internal flow keys layout. Added a
function to translate from bpf_flow_keys to the internal layout after BPF
dissection is complete. The pointer to this struct is stored in
qdisc_skb_cb rather than inside the 20-byte control block, which
simplifies verification and allows access to all 20 bytes of the cb.
3/ Removed GUE parsing as it relied on a hardcoded port
4/ MPLS parsing now stops at the first label which is consistent
with the in-kernel flow dissector
5/ Refactored to use direct packet access and to write out to
struct bpf_flow_keys

[1] http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
parents 1edb6e03 50b3ed57
@@ -212,6 +212,7 @@ enum bpf_reg_type {
 	PTR_TO_PACKET_META,	 /* skb->data - meta_len */
 	PTR_TO_PACKET,		 /* reg points to skb->data */
 	PTR_TO_PACKET_END,	 /* skb->data + headlen */
+	PTR_TO_FLOW_KEYS,	 /* reg points to bpf_flow_keys */
 };
 /* The information passed from prog-specific *_is_valid_access
......
@@ -32,6 +32,7 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2)
 #ifdef CONFIG_INET
 BPF_PROG_TYPE(BPF_PROG_TYPE_SK_REUSEPORT, sk_reuseport)
 #endif
+BPF_PROG_TYPE(BPF_PROG_TYPE_FLOW_DISSECTOR, flow_dissector)
 BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops)
......
@@ -243,6 +243,8 @@ struct scatterlist;
 struct pipe_inode_info;
 struct iov_iter;
 struct napi_struct;
+struct bpf_prog;
+union bpf_attr;
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
 struct nf_conntrack {
@@ -1192,6 +1194,11 @@ void skb_flow_dissector_init(struct flow_dissector *flow_dissector,
 			     const struct flow_dissector_key *key,
 			     unsigned int key_count);
+int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
+				       struct bpf_prog *prog);
+int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr);
 bool __skb_flow_dissect(const struct sk_buff *skb,
 			struct flow_dissector *flow_dissector,
 			void *target_container,
......
@@ -43,6 +43,7 @@ struct ctl_table_header;
 struct net_generic;
 struct uevent_sock;
 struct netns_ipvs;
+struct bpf_prog;
 #define NETDEV_HASHBITS 8
@@ -145,6 +146,8 @@ struct net {
 #endif
 	struct net_generic __rcu	*gen;
+	struct bpf_prog __rcu	*flow_dissector_prog;
 	/* Note : following structs are cache line aligned */
 #ifdef CONFIG_XFRM
 	struct netns_xfrm	xfrm;
......
@@ -19,6 +19,7 @@ struct Qdisc_ops;
 struct qdisc_walker;
 struct tcf_walker;
 struct module;
+struct bpf_flow_keys;
 typedef int tc_setup_cb_t(enum tc_setup_type type,
 			  void *type_data, void *cb_priv);
@@ -307,9 +308,14 @@ struct tcf_proto {
 };
 struct qdisc_skb_cb {
-	unsigned int		pkt_len;
-	u16			slave_dev_queue_mapping;
-	u16			tc_classid;
+	union {
+		struct {
+			unsigned int		pkt_len;
+			u16			slave_dev_queue_mapping;
+			u16			tc_classid;
+		};
+		struct bpf_flow_keys *flow_keys;
+	};
 #define QDISC_CB_PRIV_LEN 20
 	unsigned char		data[QDISC_CB_PRIV_LEN];
 };
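The new union lets the flow_keys pointer overlay pkt_len, slave_dev_queue_mapping and tc_classid instead of taking space from data[]. The userspace mirror below checks that data[] keeps an offset of 8 bytes. It uses stdint types in place of u16 and assumes a 4-byte unsigned int. The struct names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bpf_flow_keys;	/* opaque here: only a pointer is stored */

#define QDISC_CB_PRIV_LEN 20

/* Mirror of the patched struct qdisc_skb_cb. */
struct qdisc_skb_cb_mirror {
	union {
		struct {
			unsigned int pkt_len;
			uint16_t slave_dev_queue_mapping;
			uint16_t tc_classid;
		};
		struct bpf_flow_keys *flow_keys;
	};
	unsigned char data[QDISC_CB_PRIV_LEN];
};

/* Mirror of the layout before the patch, for comparison. */
struct qdisc_skb_cb_old {
	unsigned int pkt_len;
	uint16_t slave_dev_queue_mapping;
	uint16_t tc_classid;
	unsigned char data[QDISC_CB_PRIV_LEN];
};

/* The pointer fits in the 8 bytes already used by the first three fields,
 * so the private data area does not move on 32- or 64-bit targets. */
static_assert(sizeof(struct bpf_flow_keys *) <= 8, "pointer too wide");
static_assert(offsetof(struct qdisc_skb_cb_mirror, data) ==
	      offsetof(struct qdisc_skb_cb_old, data), "data[] moved");
```

The overlay is safe only because __skb_flow_dissect saves the control block before storing the pointer and restores it after the program returns.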
......
@@ -152,6 +152,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_LWT_SEG6LOCAL,
 	BPF_PROG_TYPE_LIRC_MODE2,
 	BPF_PROG_TYPE_SK_REUSEPORT,
+	BPF_PROG_TYPE_FLOW_DISSECTOR,
 };
 enum bpf_attach_type {
@@ -172,6 +173,7 @@ enum bpf_attach_type {
 	BPF_CGROUP_UDP4_SENDMSG,
 	BPF_CGROUP_UDP6_SENDMSG,
 	BPF_LIRC_MODE2,
+	BPF_FLOW_DISSECTOR,
 	__MAX_BPF_ATTACH_TYPE
 };
@@ -2333,6 +2335,7 @@ struct __sk_buff {
 	/* ... here. */
 	__u32 data_meta;
+	struct bpf_flow_keys *flow_keys;
 };
 struct bpf_tunnel_key {
@@ -2778,4 +2781,27 @@ enum bpf_task_fd_type {
 	BPF_FD_TYPE_URETPROBE,		/* filename + offset */
 };
+struct bpf_flow_keys {
+	__u16	nhoff;
+	__u16	thoff;
+	__u16	addr_proto;			/* ETH_P_* of valid addrs */
+	__u8	is_frag;
+	__u8	is_first_frag;
+	__u8	is_encap;
+	__u8	ip_proto;
+	__be16	n_proto;
+	__be16	sport;
+	__be16	dport;
+	union {
+		struct {
+			__be32	ipv4_src;
+			__be32	ipv4_dst;
+		};
+		struct {
+			__u32	ipv6_src[4];	/* in6_addr; network order */
+			__u32	ipv6_dst[4];	/* in6_addr; network order */
+		};
+	};
+};
 #endif /* _UAPI__LINUX_BPF_H__ */
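struct bpf_flow_keys is part of the UAPI, and the v3 changelog notes that it was reordered to remove holes. A userspace copy can check the layout at compile time. The copy uses stdint types in place of __u16, __be16 and the others. The expected offsets were worked out by hand from the field sizes (16 bytes of scalars, then a 32-byte address union), so they are an assumption of this sketch rather than values quoted from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy of the UAPI struct bpf_flow_keys with stdint types. */
struct bpf_flow_keys_copy {
	uint16_t nhoff;
	uint16_t thoff;
	uint16_t addr_proto;	/* ETH_P_* of valid addrs */
	uint8_t  is_frag;
	uint8_t  is_first_frag;
	uint8_t  is_encap;
	uint8_t  ip_proto;
	uint16_t n_proto;	/* network byte order */
	uint16_t sport;		/* network byte order */
	uint16_t dport;		/* network byte order */
	union {
		struct {
			uint32_t ipv4_src;
			uint32_t ipv4_dst;
		};
		struct {
			uint32_t ipv6_src[4];
			uint32_t ipv6_dst[4];
		};
	};
};
typedef struct bpf_flow_keys_copy keys_t;

/* Each field starts right where the previous one ends: no padding. */
static_assert(offsetof(keys_t, is_frag) == 6, "hole before is_frag");
static_assert(offsetof(keys_t, n_proto) == 10, "hole before n_proto");
static_assert(offsetof(keys_t, ipv4_src) == 16, "hole before addresses");
static_assert(offsetof(keys_t, ipv6_dst) == 32, "ipv6_dst misplaced");
static_assert(sizeof(keys_t) == 48, "unexpected size");
```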
@@ -1615,6 +1615,9 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_LIRC_MODE2:
 		ptype = BPF_PROG_TYPE_LIRC_MODE2;
 		break;
+	case BPF_FLOW_DISSECTOR:
+		ptype = BPF_PROG_TYPE_FLOW_DISSECTOR;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -1636,6 +1639,9 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_PROG_TYPE_LIRC_MODE2:
 		ret = lirc_prog_attach(attr, prog);
 		break;
+	case BPF_PROG_TYPE_FLOW_DISSECTOR:
+		ret = skb_flow_dissector_bpf_prog_attach(attr, prog);
+		break;
 	default:
 		ret = cgroup_bpf_prog_attach(attr, ptype, prog);
 	}
@@ -1688,6 +1694,8 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 		return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_SKB, NULL);
 	case BPF_LIRC_MODE2:
 		return lirc_prog_detach(attr);
+	case BPF_FLOW_DISSECTOR:
+		return skb_flow_dissector_bpf_prog_detach(attr);
 	default:
 		return -EINVAL;
 	}
......
@@ -261,6 +261,7 @@ static const char * const reg_type_str[] = {
 	[PTR_TO_PACKET]		= "pkt",
 	[PTR_TO_PACKET_META]	= "pkt_meta",
 	[PTR_TO_PACKET_END]	= "pkt_end",
+	[PTR_TO_FLOW_KEYS]	= "flow_keys",
 };
 static char slot_type_char[] = {
@@ -965,6 +966,7 @@ static bool is_spillable_regtype(enum bpf_reg_type type)
 	case PTR_TO_PACKET:
 	case PTR_TO_PACKET_META:
 	case PTR_TO_PACKET_END:
+	case PTR_TO_FLOW_KEYS:
 	case CONST_PTR_TO_MAP:
 		return true;
 	default:
@@ -1238,6 +1240,7 @@ static bool may_access_direct_pkt_data(struct bpf_verifier_env *env,
 	case BPF_PROG_TYPE_LWT_XMIT:
 	case BPF_PROG_TYPE_SK_SKB:
 	case BPF_PROG_TYPE_SK_MSG:
+	case BPF_PROG_TYPE_FLOW_DISSECTOR:
 		if (meta)
 			return meta->pkt_access;
@@ -1321,6 +1324,18 @@ static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off,
 	return -EACCES;
 }
+static int check_flow_keys_access(struct bpf_verifier_env *env, int off,
+				  int size)
+{
+	if (size < 0 || off < 0 ||
+	    (u64)off + size > sizeof(struct bpf_flow_keys)) {
+		verbose(env, "invalid access to flow keys off=%d size=%d\n",
+			off, size);
+		return -EACCES;
+	}
+	return 0;
+}
 static bool __is_pointer_value(bool allow_ptr_leaks,
 			       const struct bpf_reg_state *reg)
 {
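The (u64) cast in check_flow_keys_access() matters. off and size are ints, so off + size can overflow for a very large off, while the 64-bit sum cannot. The userspace restatement below keeps the same test. The function name flow_keys_access_ok is hypothetical, and the 48-byte limit is an assumption of this sketch: it is what the field sizes of struct bpf_flow_keys add up to.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed sizeof(struct bpf_flow_keys) for this sketch. */
#define FLOW_KEYS_SIZE 48u

/* Same bounds test as check_flow_keys_access(): reject negative values and
 * any access ending past the struct. The sum is computed in 64 bits, so
 * off near INT32_MAX cannot wrap around into the valid range. */
static int flow_keys_access_ok(int off, int size)
{
	if (size < 0 || off < 0 ||
	    (uint64_t)off + size > FLOW_KEYS_SIZE)
		return -EACCES;
	return 0;
}
```

With plain int arithmetic, INT32_MAX + 8 would be signed overflow (undefined behaviour) instead of a value larger than 48.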
@@ -1422,6 +1437,9 @@ static int check_ptr_alignment(struct bpf_verifier_env *env,
 		 * right in front, treat it the very same way.
 		 */
 		return check_pkt_ptr_alignment(env, reg, off, size, strict);
+	case PTR_TO_FLOW_KEYS:
+		pointer_desc = "flow keys ";
+		break;
 	case PTR_TO_MAP_VALUE:
 		pointer_desc = "value ";
 		break;
@@ -1692,6 +1710,17 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 		err = check_packet_access(env, regno, off, size, false);
 		if (!err && t == BPF_READ && value_regno >= 0)
 			mark_reg_unknown(env, regs, value_regno);
+	} else if (reg->type == PTR_TO_FLOW_KEYS) {
+		if (t == BPF_WRITE && value_regno >= 0 &&
+		    is_pointer_value(env, value_regno)) {
+			verbose(env, "R%d leaks addr into flow keys\n",
+				value_regno);
+			return -EACCES;
+		}
+		err = check_flow_keys_access(env, off, size);
+		if (!err && t == BPF_READ && value_regno >= 0)
+			mark_reg_unknown(env, regs, value_regno);
 	} else {
 		verbose(env, "R%d invalid mem access '%s'\n", regno,
 			reg_type_str[reg->type]);
@@ -1839,6 +1868,8 @@ static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,
 	case PTR_TO_PACKET_META:
 		return check_packet_access(env, regno, reg->off, access_size,
 					   zero_size_allowed);
+	case PTR_TO_FLOW_KEYS:
+		return check_flow_keys_access(env, reg->off, access_size);
 	case PTR_TO_MAP_VALUE:
 		return check_map_access(env, regno, reg->off, access_size,
 					zero_size_allowed);
@@ -4366,6 +4397,7 @@ static bool regsafe(struct bpf_reg_state *rold, struct bpf_reg_state *rcur,
 	case PTR_TO_CTX:
 	case CONST_PTR_TO_MAP:
 	case PTR_TO_PACKET_END:
+	case PTR_TO_FLOW_KEYS:
 		/* Only valid matches are exact, which memcmp() above
 		 * would have accepted
 		 */
......
@@ -5123,6 +5123,17 @@ sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	}
 }
+static const struct bpf_func_proto *
+flow_dissector_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+	switch (func_id) {
+	case BPF_FUNC_skb_load_bytes:
+		return &bpf_skb_load_bytes_proto;
+	default:
+		return bpf_base_func_proto(func_id);
+	}
+}
 static const struct bpf_func_proto *
 lwt_out_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -5241,6 +5252,10 @@ static bool bpf_skb_is_valid_access(int off, int size, enum bpf_access_type type
 		if (size != size_default)
 			return false;
 		break;
+	case bpf_ctx_range(struct __sk_buff, flow_keys):
+		if (size != sizeof(struct bpf_flow_keys *))
+			return false;
+		break;
 	default:
 		/* Only narrow read access allowed for now. */
 		if (type == BPF_WRITE) {
@@ -5266,6 +5281,7 @@ static bool sk_filter_is_valid_access(int off, int size,
 	case bpf_ctx_range(struct __sk_buff, data):
 	case bpf_ctx_range(struct __sk_buff, data_meta):
 	case bpf_ctx_range(struct __sk_buff, data_end):
+	case bpf_ctx_range(struct __sk_buff, flow_keys):
 	case bpf_ctx_range_till(struct __sk_buff, family, local_port):
 		return false;
 	}
@@ -5291,6 +5307,7 @@ static bool lwt_is_valid_access(int off, int size,
 	case bpf_ctx_range(struct __sk_buff, tc_classid):
 	case bpf_ctx_range_till(struct __sk_buff, family, local_port):
 	case bpf_ctx_range(struct __sk_buff, data_meta):
+	case bpf_ctx_range(struct __sk_buff, flow_keys):
 		return false;
 	}
@@ -5501,6 +5518,7 @@ static bool tc_cls_act_is_valid_access(int off, int size,
 	case bpf_ctx_range(struct __sk_buff, data_end):
 		info->reg_type = PTR_TO_PACKET_END;
 		break;
+	case bpf_ctx_range(struct __sk_buff, flow_keys):
 	case bpf_ctx_range_till(struct __sk_buff, family, local_port):
 		return false;
 	}
@@ -5702,6 +5720,7 @@ static bool sk_skb_is_valid_access(int off, int size,
 	switch (off) {
 	case bpf_ctx_range(struct __sk_buff, tc_classid):
 	case bpf_ctx_range(struct __sk_buff, data_meta):
+	case bpf_ctx_range(struct __sk_buff, flow_keys):
 		return false;
 	}
@@ -5761,6 +5780,39 @@ static bool sk_msg_is_valid_access(int off, int size,
 	return true;
 }
+static bool flow_dissector_is_valid_access(int off, int size,
+					   enum bpf_access_type type,
+					   const struct bpf_prog *prog,
+					   struct bpf_insn_access_aux *info)
+{
+	if (type == BPF_WRITE) {
+		switch (off) {
+		case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
+			break;
+		default:
+			return false;
+		}
+	}
+	switch (off) {
+	case bpf_ctx_range(struct __sk_buff, data):
+		info->reg_type = PTR_TO_PACKET;
+		break;
+	case bpf_ctx_range(struct __sk_buff, data_end):
+		info->reg_type = PTR_TO_PACKET_END;
+		break;
+	case bpf_ctx_range(struct __sk_buff, flow_keys):
+		info->reg_type = PTR_TO_FLOW_KEYS;
+		break;
+	case bpf_ctx_range(struct __sk_buff, tc_classid):
+	case bpf_ctx_range(struct __sk_buff, data_meta):
+	case bpf_ctx_range_till(struct __sk_buff, family, local_port):
+		return false;
+	}
+	return bpf_skb_is_valid_access(off, size, type, prog, info);
+}
 static u32 bpf_convert_ctx_access(enum bpf_access_type type,
 				  const struct bpf_insn *si,
 				  struct bpf_insn *insn_buf,
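flow_dissector_is_valid_access() lets a program store only into cb[0] through cb[4]. Every other field, including the flow_keys pointer itself, is read-only, and tc_classid, data_meta and family through local_port cannot be accessed at all. The sketch below approximates the resulting write rule in userspace. The partial mirror of struct __sk_buff and the function name are hypothetical. The size and alignment checks are an assumption about what the generic bpf_skb_is_valid_access() path enforces, not a copy of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical partial mirror of the leading fields of struct __sk_buff,
 * up to cb[5] and the field after it; later fields are omitted. */
struct sk_buff_ctx_prefix {
	uint32_t len, pkt_type, mark, queue_mapping, protocol, vlan_present,
		 vlan_tci, vlan_proto, priority, ingress_ifindex, ifindex,
		 tc_index;
	uint32_t cb[5];
	uint32_t hash;
};

#define CB_FIRST offsetof(struct sk_buff_ctx_prefix, cb[0])
#define CB_END   (offsetof(struct sk_buff_ctx_prefix, cb[4]) + sizeof(uint32_t))

/* Simplified model of which stores a flow dissector program may issue. */
static bool flow_dissector_write_ok(size_t off, size_t size)
{
	/* The BPF_WRITE switch only lets offsets inside cb[0..4] through. */
	if (off < CB_FIRST || off >= CB_END)
		return false;
	/* Assumed generic checks: 1/2/4/8-byte size, natural alignment,
	 * and no spill past the end of cb[4]. */
	if (size != 1 && size != 2 && size != 4 && size != 8)
		return false;
	return off % size == 0 && off + size <= CB_END;
}
```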
@@ -6055,6 +6107,15 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type,
 						     bpf_target_off(struct sock_common,
 								    skc_num, 2, target_size));
 		break;
+	case offsetof(struct __sk_buff, flow_keys):
+		off = si->off;
+		off -= offsetof(struct __sk_buff, flow_keys);
+		off += offsetof(struct sk_buff, cb);
+		off += offsetof(struct qdisc_skb_cb, flow_keys);
+		*insn++ = BPF_LDX_MEM(BPF_SIZEOF(void *), si->dst_reg,
+				      si->src_reg, off);
+		break;
 	}
 	return insn - insn_buf;
@@ -7018,6 +7079,15 @@ const struct bpf_verifier_ops sk_msg_verifier_ops = {
 const struct bpf_prog_ops sk_msg_prog_ops = {
 };
+const struct bpf_verifier_ops flow_dissector_verifier_ops = {
+	.get_func_proto		= flow_dissector_func_proto,
+	.is_valid_access	= flow_dissector_is_valid_access,
+	.convert_ctx_access	= bpf_convert_ctx_access,
+};
+const struct bpf_prog_ops flow_dissector_prog_ops = {
+};
 int sk_detach_filter(struct sock *sk)
 {
 	int ret = -ENOENT;
......
@@ -25,6 +25,9 @@
 #include <net/flow_dissector.h>
 #include <scsi/fc/fc_fcoe.h>
 #include <uapi/linux/batadv_packet.h>
+#include <linux/bpf.h>
+static DEFINE_MUTEX(flow_dissector_mutex);
 static void dissector_set_key(struct flow_dissector *flow_dissector,
 			      enum flow_dissector_key_id key_id)
@@ -62,6 +65,44 @@ void skb_flow_dissector_init(struct flow_dissector *flow_dissector,
 }
 EXPORT_SYMBOL(skb_flow_dissector_init);
+int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
+				       struct bpf_prog *prog)
+{
+	struct bpf_prog *attached;
+	struct net *net;
+	net = current->nsproxy->net_ns;
+	mutex_lock(&flow_dissector_mutex);
+	attached = rcu_dereference_protected(net->flow_dissector_prog,
+					     lockdep_is_held(&flow_dissector_mutex));
+	if (attached) {
+		/* Only one BPF program can be attached at a time */
+		mutex_unlock(&flow_dissector_mutex);
+		return -EEXIST;
+	}
+	rcu_assign_pointer(net->flow_dissector_prog, prog);
+	mutex_unlock(&flow_dissector_mutex);
+	return 0;
+}
+int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
+{
+	struct bpf_prog *attached;
+	struct net *net;
+	net = current->nsproxy->net_ns;
+	mutex_lock(&flow_dissector_mutex);
+	attached = rcu_dereference_protected(net->flow_dissector_prog,
+					     lockdep_is_held(&flow_dissector_mutex));
+	if (!attached) {
+		mutex_unlock(&flow_dissector_mutex);
+		return -ENOENT;
+	}
+	bpf_prog_put(attached);
+	RCU_INIT_POINTER(net->flow_dissector_prog, NULL);
+	mutex_unlock(&flow_dissector_mutex);
+	return 0;
+}
 /**
  * skb_flow_get_be16 - extract be16 entity
  * @skb: sk_buff to extract from
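Together, the attach and detach functions give each network namespace a single program slot. The slot is guarded by flow_dissector_mutex and published with RCU. The model below strips out locking, RCU and reference counting and keeps only the state machine. Every type and function in it is a hypothetical stand-in.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct bpf_prog and struct net. */
struct fake_prog { const char *name; };
struct fake_netns { struct fake_prog *flow_dissector_prog; };

/* One slot per namespace: a second attach fails with -EEXIST instead of
 * replacing the program that is already installed. */
static int model_attach(struct fake_netns *net, struct fake_prog *prog)
{
	if (net->flow_dissector_prog)
		return -EEXIST;
	net->flow_dissector_prog = prog;
	return 0;
}

/* Detaching from an empty slot reports -ENOENT. */
static int model_detach(struct fake_netns *net)
{
	if (!net->flow_dissector_prog)
		return -ENOENT;
	net->flow_dissector_prog = NULL;
	return 0;
}
```

In this version neither kernel function reads attr. The namespace always comes from current->nsproxy->net_ns.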
@@ -588,6 +629,60 @@ static bool skb_flow_dissect_allowed(int *num_hdrs)
 	return (*num_hdrs <= MAX_FLOW_DISSECT_HDRS);
 }
+static void __skb_flow_bpf_to_target(const struct bpf_flow_keys *flow_keys,
+				     struct flow_dissector *flow_dissector,
+				     void *target_container)
+{
+	struct flow_dissector_key_control *key_control;
+	struct flow_dissector_key_basic *key_basic;
+	struct flow_dissector_key_addrs *key_addrs;
+	struct flow_dissector_key_ports *key_ports;
+	key_control = skb_flow_dissector_target(flow_dissector,
+						FLOW_DISSECTOR_KEY_CONTROL,
+						target_container);
+	key_control->thoff = flow_keys->thoff;
+	if (flow_keys->is_frag)
+		key_control->flags |= FLOW_DIS_IS_FRAGMENT;
+	if (flow_keys->is_first_frag)
+		key_control->flags |= FLOW_DIS_FIRST_FRAG;
+	if (flow_keys->is_encap)
+		key_control->flags |= FLOW_DIS_ENCAPSULATION;
+	key_basic = skb_flow_dissector_target(flow_dissector,
+					      FLOW_DISSECTOR_KEY_BASIC,
+					      target_container);
+	key_basic->n_proto = flow_keys->n_proto;
+	key_basic->ip_proto = flow_keys->ip_proto;
+	if (flow_keys->addr_proto == ETH_P_IP &&
+	    dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_IPV4_ADDRS)) {
+		key_addrs = skb_flow_dissector_target(flow_dissector,
+						      FLOW_DISSECTOR_KEY_IPV4_ADDRS,
+						      target_container);
+		key_addrs->v4addrs.src = flow_keys->ipv4_src;
+		key_addrs->v4addrs.dst = flow_keys->ipv4_dst;
+		key_control->addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
+	} else if (flow_keys->addr_proto == ETH_P_IPV6 &&
+		   dissector_uses_key(flow_dissector,
+				      FLOW_DISSECTOR_KEY_IPV6_ADDRS)) {
+		key_addrs = skb_flow_dissector_target(flow_dissector,
+						      FLOW_DISSECTOR_KEY_IPV6_ADDRS,
+						      target_container);
+		memcpy(&key_addrs->v6addrs, &flow_keys->ipv6_src,
+		       sizeof(key_addrs->v6addrs));
+		key_control->addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
+	}
+	if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_PORTS)) {
+		key_ports = skb_flow_dissector_target(flow_dissector,
+						      FLOW_DISSECTOR_KEY_PORTS,
+						      target_container);
+		key_ports->src = flow_keys->sport;
+		key_ports->dst = flow_keys->dport;
+	}
+}
 /**
  * __skb_flow_dissect - extract the flow_keys struct and return it
  * @skb: sk_buff to extract the flow from, can be NULL if the rest are specified
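__skb_flow_bpf_to_target() ORs the three byte-sized booleans from struct bpf_flow_keys into key_control->flags. It copies addresses and ports only when the caller's dissector requested those keys, as reported by dissector_uses_key(). The flag step can be restated on its own. The bit values below are assumed to match the FLOW_DIS_* definitions in include/net/flow_dissector.h, and control_flags() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values assumed to match FLOW_DIS_* in include/net/flow_dissector.h. */
#define FLOW_DIS_IS_FRAGMENT   (1u << 0)
#define FLOW_DIS_FIRST_FRAG    (1u << 1)
#define FLOW_DIS_ENCAPSULATION (1u << 2)

/* Hypothetical helper: the flag-setting part of __skb_flow_bpf_to_target().
 * Bits are OR-ed into the existing flags, never cleared. */
static uint32_t control_flags(uint8_t is_frag, uint8_t is_first_frag,
			      uint8_t is_encap, uint32_t flags)
{
	if (is_frag)
		flags |= FLOW_DIS_IS_FRAGMENT;
	if (is_first_frag)
		flags |= FLOW_DIS_FIRST_FRAG;
	if (is_encap)
		flags |= FLOW_DIS_ENCAPSULATION;
	return flags;
}
```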
@@ -619,6 +714,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 	struct flow_dissector_key_vlan *key_vlan;
 	enum flow_dissect_ret fdret;
 	enum flow_dissector_key_id dissector_vlan = FLOW_DISSECTOR_KEY_MAX;
+	struct bpf_prog *attached;
 	int num_hdrs = 0;
 	u8 ip_proto = 0;
 	bool ret;
@@ -658,6 +754,44 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 					      FLOW_DISSECTOR_KEY_BASIC,
 					      target_container);
+	rcu_read_lock();
+	attached = skb ? rcu_dereference(dev_net(skb->dev)->flow_dissector_prog)
+		       : NULL;
+	if (attached) {
+		/* Note that even though the const qualifier is discarded
+		 * throughout the execution of the BPF program, all changes(the
+		 * control block) are reverted after the BPF program returns.
+		 * Therefore, __skb_flow_dissect does not alter the skb.
+		 */
+		struct bpf_flow_keys flow_keys = {};
+		struct bpf_skb_data_end cb_saved;
+		struct bpf_skb_data_end *cb;
+		u32 result;
+		cb = (struct bpf_skb_data_end *)skb->cb;
+		/* Save Control Block */
+		memcpy(&cb_saved, cb, sizeof(cb_saved));
+		memset(cb, 0, sizeof(cb_saved));
+		/* Pass parameters to the BPF program */
+		cb->qdisc_cb.flow_keys = &flow_keys;
+		flow_keys.nhoff = nhoff;
+		bpf_compute_data_pointers((struct sk_buff *)skb);
+		result = BPF_PROG_RUN(attached, skb);
+		/* Restore state */
+		memcpy(cb, &cb_saved, sizeof(cb_saved));
+		__skb_flow_bpf_to_target(&flow_keys, flow_dissector,
+					 target_container);
+		key_control->thoff = min_t(u16, key_control->thoff, skb->len);
+		rcu_read_unlock();
+		return result == BPF_OK;
+	}
+	rcu_read_unlock();
 	if (dissector_uses_key(flow_dissector,
 			       FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
 		struct ethhdr *eth = eth_hdr(skb);
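The block above runs the program in five steps, so callers never see the scratch state:

1. Save sizeof(struct bpf_skb_data_end) bytes of skb->cb.
2. Zero those bytes.
3. Pass &flow_keys to the program through qdisc_skb_cb and run it.
4. Restore the saved bytes.
5. Clamp thoff to the packet length.

The userspace model below follows the same sequence. Everything in it is a hypothetical stand-in. The 48-byte cb, the reduced keys struct, and passing the keys as a separate argument instead of through the control block are all simplifications. The sample program returns 0, which plays the role of BPF_OK.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CB_LEN 48	/* stand-in for the size of sk_buff::cb[] */

/* Hypothetical, heavily reduced stand-ins for the kernel objects. */
struct mini_keys { uint16_t nhoff, thoff; };
struct fake_skb { unsigned char cb[CB_LEN]; unsigned int len; };
typedef int (*fake_prog_fn)(struct fake_skb *skb, struct mini_keys *keys);

/* Example "program": scribbles over cb[] and reports a transport header
 * 20 bytes (an option-less IPv4 header) after the network header. */
static int sample_prog(struct fake_skb *skb, struct mini_keys *keys)
{
	memset(skb->cb, 0xAA, sizeof(skb->cb));
	keys->thoff = (uint16_t)(keys->nhoff + 20);
	return 0;
}

/* Save cb[], give the program zeroed scratch space and keys seeded with
 * nhoff, restore cb[] afterwards, and return thoff clamped to skb->len. */
static uint16_t run_with_saved_cb(struct fake_skb *skb, fake_prog_fn prog,
				  uint16_t nhoff, int *result)
{
	unsigned char saved[CB_LEN];
	struct mini_keys keys = { .nhoff = nhoff, .thoff = 0 };

	memcpy(saved, skb->cb, sizeof(saved));		/* save */
	memset(skb->cb, 0, sizeof(saved));		/* clean scratch space */
	*result = prog(skb, &keys);			/* run */
	memcpy(skb->cb, saved, sizeof(saved));		/* restore */

	/* Never report a transport offset beyond the end of the packet. */
	return keys.thoff < skb->len ? keys.thoff : (uint16_t)skb->len;
}
```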
......
@@ -74,6 +74,7 @@ static const char * const prog_type_name[] = {
 	[BPF_PROG_TYPE_RAW_TRACEPOINT]	= "raw_tracepoint",
 	[BPF_PROG_TYPE_CGROUP_SOCK_ADDR] = "cgroup_sock_addr",
 	[BPF_PROG_TYPE_LIRC_MODE2]	= "lirc_mode2",
+	[BPF_PROG_TYPE_FLOW_DISSECTOR]	= "flow_dissector",
 };
 static void print_boot_time(__u64 nsecs, char *buf, unsigned int size)
......
@@ -152,6 +152,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_LWT_SEG6LOCAL,
 	BPF_PROG_TYPE_LIRC_MODE2,
 	BPF_PROG_TYPE_SK_REUSEPORT,
+	BPF_PROG_TYPE_FLOW_DISSECTOR,
 };
 enum bpf_attach_type {
@@ -172,6 +173,7 @@ enum bpf_attach_type {
 	BPF_CGROUP_UDP4_SENDMSG,
 	BPF_CGROUP_UDP6_SENDMSG,
 	BPF_LIRC_MODE2,
+	BPF_FLOW_DISSECTOR,
 	__MAX_BPF_ATTACH_TYPE
 };
@@ -2333,6 +2335,7 @@ struct __sk_buff {
 	/* ... here. */
 	__u32 data_meta;
+	struct bpf_flow_keys *flow_keys;
 };
 struct bpf_tunnel_key {
@@ -2778,4 +2781,27 @@ enum bpf_task_fd_type {
 	BPF_FD_TYPE_URETPROBE,		/* filename + offset */
 };
+struct bpf_flow_keys {
+	__u16	nhoff;
+	__u16	thoff;
+	__u16	addr_proto;			/* ETH_P_* of valid addrs */
+	__u8	is_frag;
+	__u8	is_first_frag;
+	__u8	is_encap;
+	__u8	ip_proto;
+	__be16	n_proto;
+	__be16	sport;
+	__be16	dport;
+	union {
+		struct {
+			__be32	ipv4_src;
+			__be32	ipv4_dst;
+		};
+		struct {
+			__u32	ipv6_src[4];	/* in6_addr; network order */
+			__u32	ipv6_dst[4];	/* in6_addr; network order */
+		};
+	};
+};
 #endif /* _UAPI__LINUX_BPF_H__ */
@@ -1502,6 +1502,7 @@ static bool bpf_prog_type__needs_kver(enum bpf_prog_type type)
 	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
 	case BPF_PROG_TYPE_LIRC_MODE2:
 	case BPF_PROG_TYPE_SK_REUSEPORT:
+	case BPF_PROG_TYPE_FLOW_DISSECTOR:
 		return false;
 	case BPF_PROG_TYPE_UNSPEC:
 	case BPF_PROG_TYPE_KPROBE:
@@ -2121,6 +2122,7 @@ static const struct {
 	BPF_PROG_SEC("sk_skb",		BPF_PROG_TYPE_SK_SKB),
 	BPF_PROG_SEC("sk_msg",		BPF_PROG_TYPE_SK_MSG),
 	BPF_PROG_SEC("lirc_mode2",	BPF_PROG_TYPE_LIRC_MODE2),
+	BPF_PROG_SEC("flow_dissector",	BPF_PROG_TYPE_FLOW_DISSECTOR),
 	BPF_SA_PROG_SEC("cgroup/bind4",	BPF_CGROUP_INET4_BIND),
 	BPF_SA_PROG_SEC("cgroup/bind6",	BPF_CGROUP_INET6_BIND),
 	BPF_SA_PROG_SEC("cgroup/connect4", BPF_CGROUP_INET4_CONNECT),
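The new BPF_PROG_SEC entry lets libbpf infer BPF_PROG_TYPE_FLOW_DISSECTOR from an ELF section named flow_dissector. The sketch below shows that kind of table lookup. It assumes libbpf compares each section name by prefix against the entry's recorded length; that assumption is not shown in this diff. The table, enum and demo_* names are hypothetical and far smaller than libbpf's real table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Each entry records the prefix and its length (sizeof(literal) - 1). */
#define SEC_ENTRY(s, type) { s, sizeof(s) - 1, type }

enum demo_prog_type { DEMO_UNSPEC, DEMO_SK_MSG, DEMO_LIRC_MODE2,
		      DEMO_FLOW_DISSECTOR };

static const struct {
	const char *sec;
	size_t len;
	enum demo_prog_type type;
} demo_sections[] = {
	SEC_ENTRY("sk_msg", DEMO_SK_MSG),
	SEC_ENTRY("lirc_mode2", DEMO_LIRC_MODE2),
	SEC_ENTRY("flow_dissector", DEMO_FLOW_DISSECTOR),
};

/* Return the type of the first entry whose prefix matches the section
 * name, so "flow_dissector" and "flow_dissector/x" both match. */
static enum demo_prog_type demo_type_by_section(const char *name)
{
	size_t n = sizeof(demo_sections) / sizeof(demo_sections[0]);

	for (size_t i = 0; i < n; i++)
		if (strncmp(name, demo_sections[i].sec, demo_sections[i].len) == 0)
			return demo_sections[i].type;
	return DEMO_UNSPEC;
}
```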
......
@@ -23,3 +23,5 @@ test_skb_cgroup_id_user
 test_socket_cookie
 test_cgroup_storage
 test_select_reuseport
+test_flow_dissector
+flow_dissector_load
@@ -35,7 +35,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
 	test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o \
 	test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \
 	get_cgroup_id_kern.o socket_cookie_prog.o test_select_reuseport_kern.o \
-	test_skb_cgroup_id_kern.o
+	test_skb_cgroup_id_kern.o bpf_flow.o
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
@@ -47,10 +47,12 @@ TEST_PROGS := test_kmod.sh \
 	test_tunnel.sh \
 	test_lwt_seg6local.sh \
 	test_lirc_mode2.sh \
-	test_skb_cgroup_id.sh
+	test_skb_cgroup_id.sh \
+	test_flow_dissector.sh
 # Compile but not part of 'make run_tests'
-TEST_GEN_PROGS_EXTENDED = test_libbpf_open test_sock_addr test_skb_cgroup_id_user
+TEST_GEN_PROGS_EXTENDED = test_libbpf_open test_sock_addr test_skb_cgroup_id_user \
+	flow_dissector_load test_flow_dissector
 include ../lib.mk
......
// SPDX-License-Identifier: GPL-2.0
#include <limits.h>
#include <stddef.h>
#include <stdbool.h>
#include <string.h>
#include <linux/pkt_cls.h>
#include <linux/bpf.h>
#include <linux/in.h>
#include <linux/if_ether.h>
#include <linux/icmp.h>
#include <linux/ip.h>
#include <linux/ipv6.h>
#include <linux/tcp.h>
#include <linux/udp.h>
#include <linux/if_packet.h>
#include <sys/socket.h>
#include <linux/if_tunnel.h>
#include <linux/mpls.h>
#include "bpf_helpers.h"
#include "bpf_endian.h"
int _version SEC("version") = 1;
#define PROG(F) SEC(#F) int bpf_func_##F
/* These are the identifiers of the BPF programs that will be used in tail
* calls. Program names are limited to 16 characters including the
* terminating NUL; after the bpf_func_ prefix added above, only 6 characters
* remain, and anything longer is truncated.
*/
enum {
IP,
IPV6,
IPV6OP, /* Destination/Hop-by-Hop Options IPv6 Extension header */
IPV6FR, /* Fragmentation IPv6 Extension Header */
MPLS,
VLAN,
};
#define IP_MF 0x2000
#define IP_OFFSET 0x1FFF
#define IP6_MF 0x0001
#define IP6_OFFSET 0xFFF8
struct vlan_hdr {
__be16 h_vlan_TCI;
__be16 h_vlan_encapsulated_proto;
};
struct gre_hdr {
__be16 flags;
__be16 proto;
};
struct frag_hdr {
__u8 nexthdr;
__u8 reserved;
__be16 frag_off;
__be32 identification;
};
struct bpf_map_def SEC("maps") jmp_table = {
.type = BPF_MAP_TYPE_PROG_ARRAY,
.key_size = sizeof(__u32),
.value_size = sizeof(__u32),
.max_entries = 8
};
static __always_inline void *bpf_flow_dissect_get_header(struct __sk_buff *skb,
__u16 hdr_size,
void *buffer)
{
void *data_end = (void *)(long)skb->data_end;
void *data = (void *)(long)skb->data;
__u16 nhoff = skb->flow_keys->nhoff;
__u8 *hdr;
/* Verifies this variable offset does not overflow */
if (nhoff > (USHRT_MAX - hdr_size))
return NULL;
hdr = data + nhoff;
if (hdr + hdr_size <= data_end)
return hdr;
if (bpf_skb_load_bytes(skb, nhoff, buffer, hdr_size))
return NULL;
return buffer;
}
/* Dispatches on ETHERTYPE */
static __always_inline int parse_eth_proto(struct __sk_buff *skb, __be16 proto)
{
struct bpf_flow_keys *keys = skb->flow_keys;
keys->n_proto = proto;
switch (proto) {
case bpf_htons(ETH_P_IP):
bpf_tail_call(skb, &jmp_table, IP);
break;
case bpf_htons(ETH_P_IPV6):
bpf_tail_call(skb, &jmp_table, IPV6);
break;
case bpf_htons(ETH_P_MPLS_MC):
case bpf_htons(ETH_P_MPLS_UC):
bpf_tail_call(skb, &jmp_table, MPLS);
break;
case bpf_htons(ETH_P_8021Q):
case bpf_htons(ETH_P_8021AD):
bpf_tail_call(skb, &jmp_table, VLAN);
break;
default:
/* Protocol not supported */
return BPF_DROP;
}
return BPF_DROP;
}
SEC("dissect")
int dissect(struct __sk_buff *skb)
{
if (!skb->vlan_present)
return parse_eth_proto(skb, skb->protocol);
else
return parse_eth_proto(skb, skb->vlan_proto);
}
/* Parses on IPPROTO_* */
static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto)
{
struct bpf_flow_keys *keys = skb->flow_keys;
void *data_end = (void *)(long)skb->data_end;
struct icmphdr *icmp, _icmp;
struct gre_hdr *gre, _gre;
struct ethhdr *eth, _eth;
struct tcphdr *tcp, _tcp;
struct udphdr *udp, _udp;
keys->ip_proto = proto;
switch (proto) {
case IPPROTO_ICMP:
icmp = bpf_flow_dissect_get_header(skb, sizeof(*icmp), &_icmp);
if (!icmp)
return BPF_DROP;
return BPF_OK;
case IPPROTO_IPIP:
keys->is_encap = true;
return parse_eth_proto(skb, bpf_htons(ETH_P_IP));
case IPPROTO_IPV6:
keys->is_encap = true;
return parse_eth_proto(skb, bpf_htons(ETH_P_IPV6));
case IPPROTO_GRE:
gre = bpf_flow_dissect_get_header(skb, sizeof(*gre), &_gre);
if (!gre)
return BPF_DROP;
if (bpf_htons(gre->flags & GRE_VERSION))
/* Only inspect standard GRE packets with version 0 */
return BPF_OK;
keys->nhoff += sizeof(*gre); /* Step over GRE Flags and Proto */
if (GRE_IS_CSUM(gre->flags))
keys->nhoff += 4; /* Step over chksum and Padding */
if (GRE_IS_KEY(gre->flags))
keys->nhoff += 4; /* Step over key */
if (GRE_IS_SEQ(gre->flags))
keys->nhoff += 4; /* Step over sequence number */
keys->is_encap = true;
if (gre->proto == bpf_htons(ETH_P_TEB)) {
eth = bpf_flow_dissect_get_header(skb, sizeof(*eth),
&_eth);
if (!eth)
return BPF_DROP;
keys->nhoff += sizeof(*eth);
return parse_eth_proto(skb, eth->h_proto);
} else {
return parse_eth_proto(skb, gre->proto);
}
case IPPROTO_TCP:
tcp = bpf_flow_dissect_get_header(skb, sizeof(*tcp), &_tcp);
if (!tcp)
return BPF_DROP;
if (tcp->doff < 5)
return BPF_DROP;
if ((__u8 *)tcp + (tcp->doff << 2) > data_end)
return BPF_DROP;
keys->thoff = keys->nhoff;
keys->sport = tcp->source;
keys->dport = tcp->dest;
return BPF_OK;
case IPPROTO_UDP:
case IPPROTO_UDPLITE:
udp = bpf_flow_dissect_get_header(skb, sizeof(*udp), &_udp);
if (!udp)
return BPF_DROP;
keys->thoff = keys->nhoff;
keys->sport = udp->source;
keys->dport = udp->dest;
return BPF_OK;
default:
return BPF_DROP;
}
return BPF_DROP;
}
static __always_inline int parse_ipv6_proto(struct __sk_buff *skb, __u8 nexthdr)
{
struct bpf_flow_keys *keys = skb->flow_keys;
keys->ip_proto = nexthdr;
switch (nexthdr) {
case IPPROTO_HOPOPTS:
case IPPROTO_DSTOPTS:
bpf_tail_call(skb, &jmp_table, IPV6OP);
break;
case IPPROTO_FRAGMENT:
bpf_tail_call(skb, &jmp_table, IPV6FR);
break;
default:
return parse_ip_proto(skb, nexthdr);
}
return BPF_DROP;
}
PROG(IP)(struct __sk_buff *skb)
{
void *data_end = (void *)(long)skb->data_end;
struct bpf_flow_keys *keys = skb->flow_keys;
void *data = (void *)(long)skb->data;
struct iphdr *iph, _iph;
bool done = false;
iph = bpf_flow_dissect_get_header(skb, sizeof(*iph), &_iph);
if (!iph)
return BPF_DROP;
/* IP header cannot be smaller than 20 bytes */
if (iph->ihl < 5)
return BPF_DROP;
keys->addr_proto = ETH_P_IP;
keys->ipv4_src = iph->saddr;
keys->ipv4_dst = iph->daddr;
keys->nhoff += iph->ihl << 2;
if (data + keys->nhoff > data_end)
return BPF_DROP;
if (iph->frag_off & bpf_htons(IP_MF | IP_OFFSET)) {
keys->is_frag = true;
if (iph->frag_off & bpf_htons(IP_OFFSET))
/* From second fragment on, packets do not have headers
* we can parse.
*/
done = true;
else
keys->is_first_frag = true;
}
if (done)
return BPF_OK;
return parse_ip_proto(skb, iph->protocol);
}
PROG(IPV6)(struct __sk_buff *skb)
{
struct bpf_flow_keys *keys = skb->flow_keys;
struct ipv6hdr *ip6h, _ip6h;
ip6h = bpf_flow_dissect_get_header(skb, sizeof(*ip6h), &_ip6h);
if (!ip6h)
return BPF_DROP;
keys->addr_proto = ETH_P_IPV6;
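/* saddr and daddr are adjacent in both struct ipv6hdr and bpf_flow_keys,
* so both addresses are copied at once.
*/
memcpy(&keys->ipv6_src, &ip6h->saddr, 2*sizeof(ip6h->saddr));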
keys->nhoff += sizeof(struct ipv6hdr);
return parse_ipv6_proto(skb, ip6h->nexthdr);
}
PROG(IPV6OP)(struct __sk_buff *skb)
{
struct ipv6_opt_hdr *ip6h, _ip6h;
ip6h = bpf_flow_dissect_get_header(skb, sizeof(*ip6h), &_ip6h);
if (!ip6h)
return BPF_DROP;
/* hdrlen is in 8-octet units and does not include the first 8 octets
* of the header
*/
skb->flow_keys->nhoff += (1 + ip6h->hdrlen) << 3;
return parse_ipv6_proto(skb, ip6h->nexthdr);
}
PROG(IPV6FR)(struct __sk_buff *skb)
{
struct bpf_flow_keys *keys = skb->flow_keys;
struct frag_hdr *fragh, _fragh;
fragh = bpf_flow_dissect_get_header(skb, sizeof(*fragh), &_fragh);
if (!fragh)
return BPF_DROP;
keys->nhoff += sizeof(*fragh);
keys->is_frag = true;
if (!(fragh->frag_off & bpf_htons(IP6_OFFSET)))
keys->is_first_frag = true;
return parse_ipv6_proto(skb, fragh->nexthdr);
}
PROG(MPLS)(struct __sk_buff *skb)
{
struct mpls_label *mpls, _mpls;
mpls = bpf_flow_dissect_get_header(skb, sizeof(*mpls), &_mpls);
if (!mpls)
return BPF_DROP;
return BPF_OK;
}
PROG(VLAN)(struct __sk_buff *skb)
{
struct bpf_flow_keys *keys = skb->flow_keys;
struct vlan_hdr *vlan, _vlan;
__be16 proto;
/* Peek back to see if single or double-tagging */
if (bpf_skb_load_bytes(skb, keys->nhoff - sizeof(proto), &proto,
sizeof(proto)))
return BPF_DROP;
/* Account for double-tagging */
if (proto == bpf_htons(ETH_P_8021AD)) {
vlan = bpf_flow_dissect_get_header(skb, sizeof(*vlan), &_vlan);
if (!vlan)
return BPF_DROP;
if (vlan->h_vlan_encapsulated_proto != bpf_htons(ETH_P_8021Q))
return BPF_DROP;
keys->nhoff += sizeof(*vlan);
}
vlan = bpf_flow_dissect_get_header(skb, sizeof(*vlan), &_vlan);
if (!vlan)
return BPF_DROP;
keys->nhoff += sizeof(*vlan);
/* Only allow 8021AD + 8021Q double tagging and no triple tagging. */
if (vlan->h_vlan_encapsulated_proto == bpf_htons(ETH_P_8021AD) ||
vlan->h_vlan_encapsulated_proto == bpf_htons(ETH_P_8021Q))
return BPF_DROP;
return parse_eth_proto(skb, vlan->h_vlan_encapsulated_proto);
}
char __license[] SEC("license") = "GPL";
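bpf_flow.c advances `keys->nhoff` by header lengths derived from header fields: `ihl << 2` for IPv4 (rejecting an IHL below 5), `(1 + hdrlen) << 3` for IPv6 Hop-by-Hop and Destination Options, and 4 extra bytes for each optional GRE checksum, key and sequence field. The IPv4 MF and offset bits decide whether a packet is a first or a later fragment. The following sketch restates that arithmetic on host-order values so it can be checked outside the verifier; the function names and the `SKETCH_*` host-order constants are assumptions made for illustration, whereas the BPF program compares raw network-order fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-order GRE flag bits (C, K, S and version fields). */
#define SKETCH_GRE_CSUM    0x8000
#define SKETCH_GRE_KEY     0x2000
#define SKETCH_GRE_SEQ     0x1000
#define SKETCH_GRE_VERSION 0x0007

/* IPv4 fragment bits in host order, same values as IP_MF/IP_OFFSET above. */
#define SKETCH_IP_MF     0x2000
#define SKETCH_IP_OFFSET 0x1FFF

/* Bytes to step over for a version-0 GRE header: 4 fixed bytes plus 4 per
 * optional field. Returns -1 for other versions, which are not stepped over. */
static int gre_hdr_len(uint16_t flags)
{
	int len = 4; /* flags + protocol */

	if (flags & SKETCH_GRE_VERSION)
		return -1;
	if (flags & SKETCH_GRE_CSUM)
		len += 4; /* checksum + reserved */
	if (flags & SKETCH_GRE_KEY)
		len += 4;
	if (flags & SKETCH_GRE_SEQ)
		len += 4;
	return len;
}

/* IPv4 header length in bytes, or -1 if IHL is below the 20-byte minimum. */
static int ipv4_hdr_len(uint8_t ihl)
{
	assert(ihl <= 0xF); /* IHL is a 4-bit field */
	return ihl < 5 ? -1 : ihl << 2;
}

/* IPv6 options header length: hdrlen counts 8-octet units after the first 8. */
static int ipv6_opt_hdr_len(uint8_t hdrlen)
{
	return (1 + hdrlen) << 3;
}

struct frag_info {
	bool is_frag;
	bool is_first_frag;
	bool has_l4; /* false for non-first fragments: no L4 header to parse */
};

/* Classify an IPv4 packet from its host-order frag_off field. */
static struct frag_info ipv4_frag_info(uint16_t frag_off)
{
	struct frag_info fi = { false, false, true };

	if (frag_off & (SKETCH_IP_MF | SKETCH_IP_OFFSET)) {
		fi.is_frag = true;
		if (frag_off & SKETCH_IP_OFFSET)
			fi.has_l4 = false;
		else
			fi.is_first_frag = true;
	}
	return fi;
}
```

For example, a GRE flags word with the C and K bits set gives 12 bytes, and an IPv4 `frag_off` of 0x2000 (MF set, offset 0) is classified as a first fragment whose L4 header is still parsed.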
...@@ -18,3 +18,4 @@ CONFIG_CRYPTO_HMAC=m ...@@ -18,3 +18,4 @@ CONFIG_CRYPTO_HMAC=m
CONFIG_CRYPTO_SHA256=m CONFIG_CRYPTO_SHA256=m
CONFIG_VXLAN=y CONFIG_VXLAN=y
CONFIG_GENEVE=y CONFIG_GENEVE=y
CONFIG_NET_CLS_FLOWER=m
// SPDX-License-Identifier: GPL-2.0
#include <error.h>
#include <errno.h>
#include <getopt.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
const char *cfg_pin_path = "/sys/fs/bpf/flow_dissector";
const char *cfg_map_name = "jmp_table";
bool cfg_attach = true;
char *cfg_section_name;
char *cfg_path_name;
static void load_and_attach_program(void)
{
struct bpf_program *prog, *main_prog;
struct bpf_map *prog_array;
int i, fd, prog_fd, ret;
struct bpf_object *obj;
int prog_array_fd;
ret = bpf_prog_load(cfg_path_name, BPF_PROG_TYPE_FLOW_DISSECTOR, &obj,
&prog_fd);
if (ret)
error(1, 0, "bpf_prog_load %s", cfg_path_name);
main_prog = bpf_object__find_program_by_title(obj, cfg_section_name);
if (!main_prog)
error(1, 0, "bpf_object__find_program_by_title %s",
cfg_section_name);
prog_fd = bpf_program__fd(main_prog);
if (prog_fd < 0)
error(1, 0, "bpf_program__fd");
prog_array = bpf_object__find_map_by_name(obj, cfg_map_name);
if (!prog_array)
error(1, 0, "bpf_object__find_map_by_name %s", cfg_map_name);
prog_array_fd = bpf_map__fd(prog_array);
if (prog_array_fd < 0)
error(1, 0, "bpf_map__fd %s", cfg_map_name);
i = 0;
bpf_object__for_each_program(prog, obj) {
fd = bpf_program__fd(prog);
if (fd < 0)
error(1, 0, "bpf_program__fd");
if (fd != prog_fd) {
printf("%d: %s\n", i, bpf_program__title(prog, false));
if (bpf_map_update_elem(prog_array_fd, &i, &fd, BPF_ANY))
error(1, errno, "bpf_map_update_elem %d", i);
++i;
}
}
ret = bpf_prog_attach(prog_fd, 0 /* Ignore */, BPF_FLOW_DISSECTOR, 0);
if (ret)
error(1, 0, "bpf_prog_attach %s", cfg_path_name);
ret = bpf_object__pin(obj, cfg_pin_path);
if (ret)
error(1, 0, "bpf_object__pin %s", cfg_pin_path);
}
static void detach_program(void)
{
char command[64];
int ret;
ret = bpf_prog_detach(0, BPF_FLOW_DISSECTOR);
if (ret)
error(1, 0, "bpf_prog_detach");
/* To unpin, it is necessary and sufficient to just remove this dir */
snprintf(command, sizeof(command), "rm -r %s", cfg_pin_path);
ret = system(command);
if (ret)
error(1, errno, "%s", command);
}
static void parse_opts(int argc, char **argv)
{
bool attach = false;
bool detach = false;
int c;
while ((c = getopt(argc, argv, "adp:s:")) != -1) {
switch (c) {
case 'a':
if (detach)
error(1, 0, "attach/detach are exclusive");
attach = true;
break;
case 'd':
if (attach)
error(1, 0, "attach/detach are exclusive");
detach = true;
break;
case 'p':
if (cfg_path_name)
error(1, 0, "only one prog name can be given");
cfg_path_name = optarg;
break;
case 's':
if (cfg_section_name)
error(1, 0, "only one section can be given");
cfg_section_name = optarg;
break;
}
}
if (detach)
cfg_attach = false;
if (cfg_attach && !cfg_path_name)
error(1, 0, "must provide a path to the BPF program");
if (cfg_attach && !cfg_section_name)
error(1, 0, "must provide a section name");
}
int main(int argc, char **argv)
{
parse_opts(argc, argv);
if (cfg_attach)
load_and_attach_program();
else
detach_program();
return 0;
}
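flow_dissector_load fills `jmp_table` with every program other than the main one, using a running counter `i` in libbpf's iteration order. The slots therefore match the enum in bpf_flow.c (IP = 0, IPV6 = 1, ...) only if the sections appear in that order in the object file, which the printed `%d: %s` lines let you check. One hedged alternative is to derive each slot from the section title that `PROG(F)` assigns. The sketch below shows just that mapping; `slot_for_title()` and the name table are hypothetical, and a loader using it would pass the returned slot to `bpf_map_update_elem()` instead of `i`.

```c
#include <assert.h>
#include <string.h>

/* Mirrors the tail-call enum in bpf_flow.c: array position == jmp_table slot. */
static const char *const tail_call_names[] = {
	"IP", "IPV6", "IPV6OP", "IPV6FR", "MPLS", "VLAN",
};

#define N_TAIL_CALLS (sizeof(tail_call_names) / sizeof(tail_call_names[0]))

/* jmp_table in bpf_flow.c is declared with max_entries = 8. */
_Static_assert(N_TAIL_CALLS <= 8, "tail calls must fit in jmp_table");

/* Hypothetical helper: map a section title such as "IPV6FR" to its slot,
 * or return -1 if the title is not a tail-call target (e.g. "dissect"). */
static int slot_for_title(const char *title)
{
	size_t i;

	assert(title != NULL);
	for (i = 0; i < N_TAIL_CALLS; i++)
		if (strcmp(title, tail_call_names[i]) == 0)
			return (int)i;
	return -1;
}
```

Exact comparison with `strcmp()` matters here, because "IPV6" is a prefix of both "IPV6OP" and "IPV6FR".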
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
#
# Load BPF flow dissector and verify it correctly dissects traffic
export TESTNAME=test_flow_dissector
unmount=0
# Kselftest framework requirement - SKIP code is 4.
ksft_skip=4
msg="skip all tests:"
if [ $UID != 0 ]; then
echo $msg please run this as root >&2
exit $ksft_skip
fi
# This test needs to be run in a network namespace with in_netns.sh. Check if
# this is the case and run it with in_netns.sh if it is being run in the root
# namespace.
if [[ -z $(ip netns identify $$) ]]; then
../net/in_netns.sh "$0" "$@"
exit $?
fi
# Determine selftest success via shell exit code
exit_handler()
{
if (( $? == 0 )); then
echo "selftests: $TESTNAME [PASS]";
else
echo "selftests: $TESTNAME [FAILED]";
fi
set +e
# Cleanup
tc filter del dev lo ingress pref 1337 2> /dev/null
tc qdisc del dev lo ingress 2> /dev/null
./flow_dissector_load -d 2> /dev/null
if [ $unmount -ne 0 ]; then
umount bpffs 2> /dev/null
fi
}
# Exit script immediately (well, caught by the trap handler) if any
# program/thing exits with a non-zero status.
set -e
# (Use 'trap -l' to list meaning of numbers)
trap exit_handler 0 2 3 6 9
# Mount BPF file system
if /bin/mount | grep /sys/fs/bpf > /dev/null; then
echo "bpffs already mounted"
else
echo "bpffs not mounted. Mounting..."
unmount=1
/bin/mount bpffs /sys/fs/bpf -t bpf
fi
# Attach BPF program
./flow_dissector_load -p bpf_flow.o -s dissect
# Setup
tc qdisc add dev lo ingress
echo "Testing IPv4..."
# Drops all IP/UDP packets coming from port 9
tc filter add dev lo parent ffff: protocol ip pref 1337 flower ip_proto \
udp src_port 9 action drop
# Send 10 IPv4/UDP packets from port 8. Filter should not drop any.
./test_flow_dissector -i 4 -f 8
# Send 10 IPv4/UDP packets from port 9. Filter should drop all.
./test_flow_dissector -i 4 -f 9 -F
# Send 10 IPv4/UDP packets from port 10. Filter should not drop any.
./test_flow_dissector -i 4 -f 10
echo "Testing IPIP..."
# Send 10 IPv4/IPv4/UDP packets from port 8. Filter should not drop any.
./with_addr.sh ./with_tunnels.sh ./test_flow_dissector -o 4 -e bare -i 4 \
-D 192.168.0.1 -S 1.1.1.1 -f 8
# Send 10 IPv4/IPv4/UDP packets from port 9. Filter should drop all.
./with_addr.sh ./with_tunnels.sh ./test_flow_dissector -o 4 -e bare -i 4 \
-D 192.168.0.1 -S 1.1.1.1 -f 9 -F
# Send 10 IPv4/IPv4/UDP packets from port 10. Filter should not drop any.
./with_addr.sh ./with_tunnels.sh ./test_flow_dissector -o 4 -e bare -i 4 \
-D 192.168.0.1 -S 1.1.1.1 -f 10
echo "Testing IPv4 + GRE..."
# Send 10 IPv4/GRE/IPv4/UDP packets from port 8. Filter should not drop any.
./with_addr.sh ./with_tunnels.sh ./test_flow_dissector -o 4 -e gre -i 4 \
-D 192.168.0.1 -S 1.1.1.1 -f 8
# Send 10 IPv4/GRE/IPv4/UDP packets from port 9. Filter should drop all.
./with_addr.sh ./with_tunnels.sh ./test_flow_dissector -o 4 -e gre -i 4 \
-D 192.168.0.1 -S 1.1.1.1 -f 9 -F
# Send 10 IPv4/GRE/IPv4/UDP packets from port 10. Filter should not drop any.
./with_addr.sh ./with_tunnels.sh ./test_flow_dissector -o 4 -e gre -i 4 \
-D 192.168.0.1 -S 1.1.1.1 -f 10
tc filter del dev lo ingress pref 1337
echo "Testing IPv6..."
# Drops all IPv6/UDP packets coming from port 9
tc filter add dev lo parent ffff: protocol ipv6 pref 1337 flower ip_proto \
udp src_port 9 action drop
# Send 10 IPv6/UDP packets from port 8. Filter should not drop any.
./test_flow_dissector -i 6 -f 8
# Send 10 IPv6/UDP packets from port 9. Filter should drop all.
./test_flow_dissector -i 6 -f 9 -F
# Send 10 IPv6/UDP packets from port 10. Filter should not drop any.
./test_flow_dissector -i 6 -f 10
exit 0
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
#
# add private ipv4 and ipv6 addresses to loopback
readonly V6_INNER='100::a/128'
readonly V4_INNER='192.168.0.1/32'
if getopts ":s" opt; then
readonly SIT_DEV_NAME='sixtofourtest0'
readonly V6_SIT='2::/64'
readonly V4_SIT='172.17.0.1/32'
shift
fi
fail() {
echo "error: $*" 1>&2
exit 1
}
setup() {
ip -6 addr add "${V6_INNER}" dev lo || fail 'failed to setup v6 address'
ip -4 addr add "${V4_INNER}" dev lo || fail 'failed to setup v4 address'
if [[ -n "${V6_SIT}" ]]; then
ip link add "${SIT_DEV_NAME}" type sit remote any local any \
|| fail 'failed to add sit'
ip link set dev "${SIT_DEV_NAME}" up \
|| fail 'failed to bring sit device up'
ip -6 addr add "${V6_SIT}" dev "${SIT_DEV_NAME}" \
|| fail 'failed to setup v6 SIT address'
ip -4 addr add "${V4_SIT}" dev "${SIT_DEV_NAME}" \
|| fail 'failed to setup v4 SIT address'
fi
sleep 2 # avoid race causing bind to fail
}
cleanup() {
if [[ -n "${V6_SIT}" ]]; then
ip -4 addr del "${V4_SIT}" dev "${SIT_DEV_NAME}"
ip -6 addr del "${V6_SIT}" dev "${SIT_DEV_NAME}"
ip link del "${SIT_DEV_NAME}"
fi
ip -4 addr del "${V4_INNER}" dev lo
ip -6 addr del "${V6_INNER}" dev lo
}
trap cleanup EXIT
setup
"$@"
exit "$?"
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
#
# setup tunnels for flow dissection test
readonly SUFFIX="test_$(mktemp -u XXXX)"
CONFIG="remote 127.0.0.2 local 127.0.0.1 dev lo"
setup() {
ip link add "ipip_${SUFFIX}" type ipip ${CONFIG}
ip link add "gre_${SUFFIX}" type gre ${CONFIG}
ip link add "sit_${SUFFIX}" type sit ${CONFIG}
echo "tunnels before test:"
ip tunnel show
ip link set "ipip_${SUFFIX}" up
ip link set "gre_${SUFFIX}" up
ip link set "sit_${SUFFIX}" up
}
cleanup() {
ip tunnel del "ipip_${SUFFIX}"
ip tunnel del "gre_${SUFFIX}"
ip tunnel del "sit_${SUFFIX}"
echo "tunnels after test:"
ip tunnel show
}
trap cleanup EXIT
setup
"$@"
exit "$?"