Commit c85e41bf authored by Jakub Kicinski

Merge tag 'nf-next-24-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

Patch #1 skips transaction if object type provides no .update interface.

Patch #2 skips NETDEV_CHANGENAME which is unused.

Patch #3 enables conntrack to handle Multicast Router Advertisements and
	 Multicast Router Solicitations from the Multicast Router Discovery
	 protocol (RFC4286) as untracked, as opposed to invalid, packets.
	 From Linus Luessing.

Patch #4 updates DCCP conntracker to mark invalid packets as invalid, instead of
	 dropping them, from Jason Xing.

Patch #5 uses NF_DROP instead of -NF_DROP since NF_DROP is 0,
	 also from Jason.

Patch #6 removes reference in netfilter's sysctl documentation on pickup
	 entries which were already removed by Florian Westphal.

Patch #7 removes the IPS_OFFLOAD flag check that disabled early drop, which
	 allows evicting such entries from the conntrack table,
	 also from Florian.

Patches #8 to #16 update nf_tables pipapo set backend to allocate
	 the data structure copy on-demand from preparation phase,
	 to better deal with OOM situations where .commit step is too late
	 to fail. Series from Florian Westphal.

Patch #17 adds a selftest with packetdrill to cover conntrack TCP state
	 transitions, also from Florian.

Patch #18 uses GFP_KERNEL to clone elements from control plane to avoid
	 quick exhaustion of atomic reserves with large sets; the reporter
	 refers to sets on the order of a million entries.

* tag 'nf-next-24-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: allow clone callbacks to sleep
  selftests: netfilter: add packetdrill based conntrack tests
  netfilter: nft_set_pipapo: remove dirty flag
  netfilter: nft_set_pipapo: move cloning of match info to insert/removal path
  netfilter: nft_set_pipapo: prepare pipapo_get helper for on-demand clone
  netfilter: nft_set_pipapo: merge deactivate helper into caller
  netfilter: nft_set_pipapo: prepare walk function for on-demand clone
  netfilter: nft_set_pipapo: prepare destroy function for on-demand clone
  netfilter: nft_set_pipapo: make pipapo_clone helper return NULL
  netfilter: nft_set_pipapo: move prove_locking helper around
  netfilter: conntrack: remove flowtable early-drop test
  netfilter: conntrack: documentation: remove reference to non-existent sysctl
  netfilter: use NF_DROP instead of -NF_DROP
  netfilter: conntrack: dccp: try not to drop skb in conntrack
  netfilter: conntrack: fix ct-state for ICMPv6 Multicast Router Discovery
  netfilter: nf_tables: remove NETDEV_CHANGENAME from netdev chain event handler
  netfilter: nf_tables: skip transaction if update object is not implemented
====================

Link: https://lore.kernel.org/r/20240512161436.168973-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
parents cddd2dc6 fa23e0d4
......@@ -222,11 +222,11 @@ nf_flowtable_tcp_timeout - INTEGER (seconds)
Control offload timeout for tcp connections.
TCP connections may be offloaded from nf conntrack to nf flow table.
Once aged, the connection is returned to nf conntrack with tcp pickup timeout.
Once aged, the connection is returned to nf conntrack.
nf_flowtable_udp_timeout - INTEGER (seconds)
default 30
Control offload timeout for udp connections.
UDP connections may be offloaded from nf conntrack to nf flow table.
Once aged, the connection is returned to nf conntrack with udp pickup timeout.
Once aged, the connection is returned to nf conntrack.
......@@ -416,7 +416,7 @@ struct nft_expr_info;
int nft_expr_inner_parse(const struct nft_ctx *ctx, const struct nlattr *nla,
struct nft_expr_info *info);
int nft_expr_clone(struct nft_expr *dst, struct nft_expr *src);
int nft_expr_clone(struct nft_expr *dst, struct nft_expr *src, gfp_t gfp);
void nft_expr_destroy(const struct nft_ctx *ctx, struct nft_expr *expr);
int nft_expr_dump(struct sk_buff *skb, unsigned int attr,
const struct nft_expr *expr, bool reset);
......@@ -935,7 +935,7 @@ struct nft_expr_ops {
struct nft_regs *regs,
const struct nft_pktinfo *pkt);
int (*clone)(struct nft_expr *dst,
const struct nft_expr *src);
const struct nft_expr *src, gfp_t gfp);
unsigned int size;
int (*init)(const struct nft_ctx *ctx,
......
......@@ -112,6 +112,7 @@ struct icmp6hdr {
#define ICMPV6_MOBILE_PREFIX_ADV 147
#define ICMPV6_MRDISC_ADV 151
#define ICMPV6_MRDISC_SOL 152
#define ICMPV6_MSG_MAX 255
......
......@@ -44,7 +44,7 @@ static int iptable_filter_table_init(struct net *net)
return -ENOMEM;
/* Entry 1 is the FORWARD hook */
((struct ipt_standard *)repl->entries)[1].target.verdict =
forward ? -NF_ACCEPT - 1 : -NF_DROP - 1;
forward ? -NF_ACCEPT - 1 : NF_DROP - 1;
err = ipt_register_table(net, &packet_filter, repl, filter_ops);
kfree(repl);
......
......@@ -43,7 +43,7 @@ static int ip6table_filter_table_init(struct net *net)
return -ENOMEM;
/* Entry 1 is the FORWARD hook */
((struct ip6t_standard *)repl->entries)[1].target.verdict =
forward ? -NF_ACCEPT - 1 : -NF_DROP - 1;
forward ? -NF_ACCEPT - 1 : NF_DROP - 1;
err = ip6t_register_table(net, &packet_filter, repl, filter_ops);
kfree(repl);
......
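The `-NF_DROP` to `NF_DROP` change in the two filter tables above should be purely cosmetic, because `NF_DROP` is 0 and negating it changes nothing. The following userspace sketch illustrates the arithmetic. The helper names `encode_verdict()` and `decode_verdict()` are hypothetical and exist only for this illustration; the kernel open-codes the equivalent expressions.

```c
#include <assert.h>

/* Verdict values as defined in include/uapi/linux/netfilter.h. */
#define NF_DROP   0
#define NF_ACCEPT 1

/*
 * Hypothetical helper: encode a standard-target verdict the way the
 * filter tables store it, i.e. as the negative value "-verdict - 1".
 */
static int encode_verdict(unsigned int verdict)
{
	return -(int)verdict - 1;
}

/*
 * Hypothetical helper: recover the verdict from the stored value.
 * Non-negative stored values are jump offsets, not verdicts.
 */
static unsigned int decode_verdict(int stored)
{
	assert(stored < 0);
	return (unsigned int)(-stored) - 1;
}
```

With these definitions, `-NF_DROP - 1` and `NF_DROP - 1` both evaluate to -1, which decodes back to `NF_DROP`, while `-NF_ACCEPT - 1` evaluates to -2 and decodes back to `NF_ACCEPT`.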
......@@ -1440,8 +1440,6 @@ static bool gc_worker_can_early_drop(const struct nf_conn *ct)
const struct nf_conntrack_l4proto *l4proto;
u8 protonum = nf_ct_protonum(ct);
if (test_bit(IPS_OFFLOAD_BIT, &ct->status) && protonum != IPPROTO_UDP)
return false;
if (!test_bit(IPS_ASSURED_BIT, &ct->status))
return true;
......@@ -2024,7 +2022,7 @@ nf_conntrack_in(struct sk_buff *skb, const struct nf_hook_state *state)
goto repeat;
NF_CT_STAT_INC_ATOMIC(state->net, invalid);
if (ret == -NF_DROP)
if (ret == NF_DROP)
NF_CT_STAT_INC_ATOMIC(state->net, drop);
ret = -ret;
......
......@@ -525,7 +525,7 @@ int nf_conntrack_dccp_packet(struct nf_conn *ct, struct sk_buff *skb,
dh = skb_header_pointer(skb, dataoff, sizeof(*dh), &_dh.dh);
if (!dh)
return NF_DROP;
return -NF_ACCEPT;
if (dccp_error(dh, skb, dataoff, state))
return -NF_ACCEPT;
......@@ -533,7 +533,7 @@ int nf_conntrack_dccp_packet(struct nf_conn *ct, struct sk_buff *skb,
/* pull again, including possible 48 bit sequences and subtype header */
dh = dccp_header_pointer(skb, dataoff, dh, &_dh);
if (!dh)
return NF_DROP;
return -NF_ACCEPT;
type = dh->dccph_type;
if (!nf_ct_is_confirmed(ct) && !dccp_new(ct, skb, dh, state))
......
......@@ -62,7 +62,9 @@ static const u_int8_t noct_valid_new[] = {
[NDISC_ROUTER_ADVERTISEMENT - 130] = 1,
[NDISC_NEIGHBOUR_SOLICITATION - 130] = 1,
[NDISC_NEIGHBOUR_ADVERTISEMENT - 130] = 1,
[ICMPV6_MLD2_REPORT - 130] = 1
[ICMPV6_MLD2_REPORT - 130] = 1,
[ICMPV6_MRDISC_ADV - 130] = 1,
[ICMPV6_MRDISC_SOL - 130] = 1
};
bool nf_conntrack_invert_icmpv6_tuple(struct nf_conntrack_tuple *tuple,
......
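The `noct_valid_new` table in the hunk above is indexed by ICMPv6 type minus 130, so the two new MRD entries land at indices 21 and 22. A minimal userspace sketch of such a lookup follows. It assumes the standard ICMPv6 type numbers for the NDISC and MLDv2 entries (134 to 136 and 143), reproduces only the entries visible in the hunk, and uses a hypothetical function name, `icmpv6_type_is_untracked()`; the bounds check is an assumption about how a caller would guard the table, not a copy of the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ICMPv6 type numbers; the two MRDISC values match the header hunk above. */
#define NDISC_ROUTER_ADVERTISEMENT	134
#define NDISC_NEIGHBOUR_SOLICITATION	135
#define NDISC_NEIGHBOUR_ADVERTISEMENT	136
#define ICMPV6_MLD2_REPORT		143
#define ICMPV6_MRDISC_ADV		151
#define ICMPV6_MRDISC_SOL		152

/* Types that are accepted without creating a conntrack entry. */
static const uint8_t noct_valid_new[] = {
	[NDISC_ROUTER_ADVERTISEMENT - 130]	= 1,
	[NDISC_NEIGHBOUR_SOLICITATION - 130]	= 1,
	[NDISC_NEIGHBOUR_ADVERTISEMENT - 130]	= 1,
	[ICMPV6_MLD2_REPORT - 130]		= 1,
	[ICMPV6_MRDISC_ADV - 130]		= 1,
	[ICMPV6_MRDISC_SOL - 130]		= 1
};

/* Hypothetical helper: true if the type has an entry in the table. */
static bool icmpv6_type_is_untracked(uint8_t icmp6_type)
{
	int idx = icmp6_type - 130;

	/* Reject types below 130 and types past the end of the table. */
	if (idx < 0 || (size_t)idx >= sizeof(noct_valid_new))
		return false;

	return noct_valid_new[idx] != 0;
}
```

Because the array is sized by its highest designated index, adding `ICMPV6_MRDISC_SOL` (152) grows it to 23 entries; types with no initializer inside that range, such as 150, stay zero and are not treated as untracked.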
......@@ -3333,7 +3333,7 @@ static struct nft_expr *nft_expr_init(const struct nft_ctx *ctx,
return ERR_PTR(err);
}
int nft_expr_clone(struct nft_expr *dst, struct nft_expr *src)
int nft_expr_clone(struct nft_expr *dst, struct nft_expr *src, gfp_t gfp)
{
int err;
......@@ -3341,7 +3341,7 @@ int nft_expr_clone(struct nft_expr *dst, struct nft_expr *src)
return -EINVAL;
dst->ops = src->ops;
err = src->ops->clone(dst, src);
err = src->ops->clone(dst, src, gfp);
if (err < 0)
return err;
......@@ -6525,7 +6525,7 @@ int nft_set_elem_expr_clone(const struct nft_ctx *ctx, struct nft_set *set,
if (!expr)
goto err_expr;
err = nft_expr_clone(expr, set->exprs[i]);
err = nft_expr_clone(expr, set->exprs[i], GFP_KERNEL_ACCOUNT);
if (err < 0) {
kfree(expr);
goto err_expr;
......@@ -6564,7 +6564,7 @@ static int nft_set_elem_expr_setup(struct nft_ctx *ctx,
for (i = 0; i < num_exprs; i++) {
expr = nft_setelem_expr_at(elem_expr, elem_expr->size);
err = nft_expr_clone(expr, expr_array[i]);
err = nft_expr_clone(expr, expr_array[i], GFP_KERNEL_ACCOUNT);
if (err < 0)
goto err_elem_expr_setup;
......@@ -7776,6 +7776,9 @@ static int nf_tables_newobj(struct sk_buff *skb, const struct nfnl_info *info,
if (WARN_ON_ONCE(!type))
return -ENOENT;
if (!obj->ops->update)
return 0;
nft_ctx_init(&ctx, net, skb, info->nlh, family, table, NULL, nla);
return nf_tables_updobj(&ctx, type, nla[NFTA_OBJ_DATA], obj);
......@@ -9467,9 +9470,10 @@ static void nft_obj_commit_update(struct nft_trans *trans)
obj = nft_trans_obj(trans);
newobj = nft_trans_obj_newobj(trans);
if (obj->ops->update)
obj->ops->update(obj, newobj);
if (WARN_ON_ONCE(!obj->ops->update))
return;
obj->ops->update(obj, newobj);
nft_obj_destroy(&trans->ctx, newobj);
}
......
......@@ -325,9 +325,6 @@ static void nft_netdev_event(unsigned long event, struct net_device *dev,
struct nft_hook *hook, *found = NULL;
int n = 0;
if (event != NETDEV_UNREGISTER)
return;
list_for_each_entry(hook, &basechain->hook_list, list) {
if (hook->ops.dev == dev)
found = hook;
......@@ -367,8 +364,7 @@ static int nf_tables_netdev_event(struct notifier_block *this,
.net = dev_net(dev),
};
if (event != NETDEV_UNREGISTER &&
event != NETDEV_CHANGENAME)
if (event != NETDEV_UNREGISTER)
return NOTIFY_DONE;
nft_net = nft_pernet(ctx.net);
......
......@@ -210,12 +210,12 @@ static void nft_connlimit_destroy(const struct nft_ctx *ctx,
nft_connlimit_do_destroy(ctx, priv);
}
static int nft_connlimit_clone(struct nft_expr *dst, const struct nft_expr *src)
static int nft_connlimit_clone(struct nft_expr *dst, const struct nft_expr *src, gfp_t gfp)
{
struct nft_connlimit *priv_dst = nft_expr_priv(dst);
struct nft_connlimit *priv_src = nft_expr_priv(src);
priv_dst->list = kmalloc(sizeof(*priv_dst->list), GFP_ATOMIC);
priv_dst->list = kmalloc(sizeof(*priv_dst->list), gfp);
if (!priv_dst->list)
return -ENOMEM;
......
......@@ -226,7 +226,7 @@ static void nft_counter_destroy(const struct nft_ctx *ctx,
nft_counter_do_destroy(priv);
}
static int nft_counter_clone(struct nft_expr *dst, const struct nft_expr *src)
static int nft_counter_clone(struct nft_expr *dst, const struct nft_expr *src, gfp_t gfp)
{
struct nft_counter_percpu_priv *priv = nft_expr_priv(src);
struct nft_counter_percpu_priv *priv_clone = nft_expr_priv(dst);
......@@ -236,7 +236,7 @@ static int nft_counter_clone(struct nft_expr *dst, const struct nft_expr *src)
nft_counter_fetch(priv, &total);
cpu_stats = alloc_percpu_gfp(struct nft_counter, GFP_ATOMIC);
cpu_stats = alloc_percpu_gfp(struct nft_counter, gfp);
if (cpu_stats == NULL)
return -ENOMEM;
......
......@@ -35,7 +35,7 @@ static int nft_dynset_expr_setup(const struct nft_dynset *priv,
for (i = 0; i < priv->num_exprs; i++) {
expr = nft_setelem_expr_at(elem_expr, elem_expr->size);
if (nft_expr_clone(expr, priv->expr_array[i]) < 0)
if (nft_expr_clone(expr, priv->expr_array[i], GFP_ATOMIC) < 0)
return -1;
elem_expr->size += priv->expr_array[i]->ops->size;
......
......@@ -102,12 +102,12 @@ static void nft_last_destroy(const struct nft_ctx *ctx,
kfree(priv->last);
}
static int nft_last_clone(struct nft_expr *dst, const struct nft_expr *src)
static int nft_last_clone(struct nft_expr *dst, const struct nft_expr *src, gfp_t gfp)
{
struct nft_last_priv *priv_dst = nft_expr_priv(dst);
struct nft_last_priv *priv_src = nft_expr_priv(src);
priv_dst->last = kzalloc(sizeof(*priv_dst->last), GFP_ATOMIC);
priv_dst->last = kzalloc(sizeof(*priv_dst->last), gfp);
if (!priv_dst->last)
return -ENOMEM;
......
......@@ -150,7 +150,7 @@ static void nft_limit_destroy(const struct nft_ctx *ctx,
}
static int nft_limit_clone(struct nft_limit_priv *priv_dst,
const struct nft_limit_priv *priv_src)
const struct nft_limit_priv *priv_src, gfp_t gfp)
{
priv_dst->tokens_max = priv_src->tokens_max;
priv_dst->rate = priv_src->rate;
......@@ -158,7 +158,7 @@ static int nft_limit_clone(struct nft_limit_priv *priv_dst,
priv_dst->burst = priv_src->burst;
priv_dst->invert = priv_src->invert;
priv_dst->limit = kmalloc(sizeof(*priv_dst->limit), GFP_ATOMIC);
priv_dst->limit = kmalloc(sizeof(*priv_dst->limit), gfp);
if (!priv_dst->limit)
return -ENOMEM;
......@@ -223,14 +223,15 @@ static void nft_limit_pkts_destroy(const struct nft_ctx *ctx,
nft_limit_destroy(ctx, &priv->limit);
}
static int nft_limit_pkts_clone(struct nft_expr *dst, const struct nft_expr *src)
static int nft_limit_pkts_clone(struct nft_expr *dst, const struct nft_expr *src,
gfp_t gfp)
{
struct nft_limit_priv_pkts *priv_dst = nft_expr_priv(dst);
struct nft_limit_priv_pkts *priv_src = nft_expr_priv(src);
priv_dst->cost = priv_src->cost;
return nft_limit_clone(&priv_dst->limit, &priv_src->limit);
return nft_limit_clone(&priv_dst->limit, &priv_src->limit, gfp);
}
static struct nft_expr_type nft_limit_type;
......@@ -281,12 +282,13 @@ static void nft_limit_bytes_destroy(const struct nft_ctx *ctx,
nft_limit_destroy(ctx, priv);
}
static int nft_limit_bytes_clone(struct nft_expr *dst, const struct nft_expr *src)
static int nft_limit_bytes_clone(struct nft_expr *dst, const struct nft_expr *src,
gfp_t gfp)
{
struct nft_limit_priv *priv_dst = nft_expr_priv(dst);
struct nft_limit_priv *priv_src = nft_expr_priv(src);
return nft_limit_clone(priv_dst, priv_src);
return nft_limit_clone(priv_dst, priv_src, gfp);
}
static const struct nft_expr_ops nft_limit_bytes_ops = {
......
......@@ -233,7 +233,7 @@ static void nft_quota_destroy(const struct nft_ctx *ctx,
return nft_quota_do_destroy(ctx, priv);
}
static int nft_quota_clone(struct nft_expr *dst, const struct nft_expr *src)
static int nft_quota_clone(struct nft_expr *dst, const struct nft_expr *src, gfp_t gfp)
{
struct nft_quota *priv_dst = nft_expr_priv(dst);
struct nft_quota *priv_src = nft_expr_priv(src);
......@@ -241,7 +241,7 @@ static int nft_quota_clone(struct nft_expr *dst, const struct nft_expr *src)
priv_dst->quota = priv_src->quota;
priv_dst->flags = priv_src->flags;
priv_dst->consumed = kmalloc(sizeof(*priv_dst->consumed), GFP_ATOMIC);
priv_dst->consumed = kmalloc(sizeof(*priv_dst->consumed), gfp);
if (!priv_dst->consumed)
return -ENOMEM;
......
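The clone callbacks above all follow the same pattern: the hard-coded `GFP_ATOMIC` is replaced by a `gfp` argument, so that control-plane callers can pass `GFP_KERNEL_ACCOUNT` while the packet path (`nft_dynset`) keeps `GFP_ATOMIC`. The userspace sketch below illustrates only that pattern. Everything in it is a stand-in: the `gfp_t` typedef, the flag values, `mock_kmalloc()`, `struct demo_expr` and `demo_clone()` are hypothetical and not kernel definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Stand-ins for the kernel's gfp_t and flags; the values are made up. */
typedef unsigned int gfp_t;
#define GFP_ATOMIC		0x1u	/* caller must not sleep */
#define GFP_KERNEL_ACCOUNT	0x2u	/* caller may sleep, memcg-accounted */

/* Records the flags most recently handed to the mock allocator. */
static gfp_t last_gfp_seen;

/* Mock of kmalloc(): forwards to malloc() and remembers the flags. */
static void *mock_kmalloc(size_t size, gfp_t gfp)
{
	last_gfp_seen = gfp;
	return malloc(size);
}

/* Hypothetical expression state with one separately allocated field. */
struct demo_expr {
	unsigned long *consumed;
};

/* Clone callback: allocates in whatever context the caller supplies. */
static int demo_clone(struct demo_expr *dst, const struct demo_expr *src,
		      gfp_t gfp)
{
	dst->consumed = mock_kmalloc(sizeof(*dst->consumed), gfp);
	if (!dst->consumed)
		return -ENOMEM;

	*dst->consumed = *src->consumed;
	return 0;
}
```

The design point is that the callback no longer decides the allocation context itself; the caller, which knows whether it may sleep, makes that choice.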
......@@ -155,14 +155,12 @@ struct nft_pipapo_match {
* @match: Currently in-use matching data
* @clone: Copy where pending insertions and deletions are kept
* @width: Total bytes to be matched for one packet, including padding
* @dirty: Working copy has pending insertions or deletions
* @last_gc: Timestamp of last garbage collection run, jiffies
*/
struct nft_pipapo {
struct nft_pipapo_match __rcu *match;
struct nft_pipapo_match *clone;
int width;
bool dirty;
unsigned long last_gc;
};
......
......@@ -13,6 +13,7 @@ TEST_PROGS += conntrack_tcp_unreplied.sh
TEST_PROGS += conntrack_sctp_collision.sh
TEST_PROGS += conntrack_vrf.sh
TEST_PROGS += ipvs.sh
TEST_PROGS += nf_conntrack_packetdrill.sh
TEST_PROGS += nf_nat_edemux.sh
TEST_PROGS += nft_audit.sh
TEST_PROGS += nft_concat_range.sh
......@@ -45,6 +46,7 @@ $(OUTPUT)/conntrack_dump_flush: CFLAGS += $(MNL_CFLAGS)
$(OUTPUT)/conntrack_dump_flush: LDLIBS += $(MNL_LDLIBS)
TEST_FILES := lib.sh
TEST_FILES += packetdrill
TEST_INCLUDES := \
../lib.sh
......@@ -86,3 +86,4 @@ CONFIG_VLAN_8021Q=m
CONFIG_XFRM_USER=m
CONFIG_XFRM_STATISTICS=y
CONFIG_NET_PKTGEN=m
CONFIG_TUN=m
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
source lib.sh
checktool "conntrack --version" "run test without conntrack"
checktool "iptables --version" "run test without iptables"
checktool "ip6tables --version" "run test without ip6tables"
modprobe -q tun
modprobe -q nf_conntrack
# echo 1 > /proc/sys/net/netfilter/nf_log_all_netns
PDRILL_TIMEOUT=10
files="
conntrack_ack_loss_stall.pkt
conntrack_inexact_rst.pkt
conntrack_syn_challenge_ack.pkt
conntrack_synack_old.pkt
conntrack_synack_reuse.pkt
conntrack_rst_invalid.pkt
"
if ! packetdrill --dry_run --verbose "packetdrill/conntrack_ack_loss_stall.pkt";then
echo "SKIP: packetdrill not installed"
exit ${ksft_skip}
fi
ret=0
run_packetdrill()
{
filename="$1"
ipver="$2"
local mtu=1500
export NFCT_IP_VERSION="$ipver"
if [ "$ipver" = "ipv4" ];then
export xtables="iptables"
elif [ "$ipver" = "ipv6" ];then
export xtables="ip6tables"
mtu=1520
fi
timeout "$PDRILL_TIMEOUT" unshare -n packetdrill --ip_version="$ipver" --mtu=$mtu \
--tolerance_usecs=1000000 --non_fatal packet "$filename"
}
run_one_test_file()
{
filename="$1"
for v in ipv4 ipv6;do
printf "%-50s(%s)%-20s" "$filename" "$v" ""
if run_packetdrill "$filename" "$v";then
echo OK
else
echo FAIL
ret=1
fi
done
}
echo "Replaying packetdrill test cases:"
for f in $files;do
run_one_test_file packetdrill/"$f"
done
exit $ret
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
# for debugging set net.netfilter.nf_log_all_netns=1 in init_net
# or do not use net namespaces.
modprobe -q nf_conntrack
sysctl -q net.netfilter.nf_conntrack_log_invalid=6
# Flush old cached data (fastopen cookies).
ip tcp_metrics flush all > /dev/null 2>&1
# TCP min, default, and max receive and send buffer sizes.
sysctl -q net.ipv4.tcp_rmem="4096 540000 $((15*1024*1024))"
sysctl -q net.ipv4.tcp_wmem="4096 $((256*1024)) 4194304"
# TCP congestion control.
sysctl -q net.ipv4.tcp_congestion_control=cubic
# TCP slow start after idle.
sysctl -q net.ipv4.tcp_slow_start_after_idle=0
# TCP Explicit Congestion Notification (ECN)
sysctl -q net.ipv4.tcp_ecn=0
sysctl -q net.ipv4.tcp_notsent_lowat=4294967295 > /dev/null 2>&1
# Override the default qdisc on the tun device.
# Many tests fail with timing errors if the default
# is FQ and that paces their flows.
tc qdisc add dev tun0 root pfifo
# Enable conntrack
$xtables -A INPUT -m conntrack --ctstate NEW -p tcp --syn
// check that already-acked (retransmitted) packet is let through rather
// than tagged as INVALID.
`packetdrill/common.sh`
// should set -P DROP but it disconnects VM w.o. extra netns
+0 `$xtables -A INPUT -m conntrack --ctstate INVALID -j DROP`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 10) = 0
+0 < S 0:0(0) win 32792 <mss 1000>
+0 > S. 0:0(0) ack 1 <mss 1460>
+.01 < . 1:1(0) ack 1 win 65535
+0 accept(3, ..., ...) = 4
+0.0001 < P. 1:1461(1460) ack 1 win 257
+.0 > . 1:1(0) ack 1461 win 65535
+0.0001 < P. 1461:2921(1460) ack 1 win 257
+.0 > . 1:1(0) ack 2921 win 65535
+0.0001 < P. 2921:4381(1460) ack 1 win 257
+.0 > . 1:1(0) ack 4381 win 65535
+0.0001 < P. 4381:5841(1460) ack 1 win 257
+.0 > . 1:1(0) ack 5841 win 65535
+0.0001 < P. 5841:7301(1460) ack 1 win 257
+.0 > . 1:1(0) ack 7301 win 65535
+0.0001 < P. 7301:8761(1460) ack 1 win 257
+.0 > . 1:1(0) ack 8761 win 65535
+0.0001 < P. 8761:10221(1460) ack 1 win 257
+.0 > . 1:1(0) ack 10221 win 65535
+0.0001 < P. 10221:11681(1460) ack 1 win 257
+.0 > . 1:1(0) ack 11681 win 65535
+0.0001 < P. 11681:13141(1460) ack 1 win 257
+.0 > . 1:1(0) ack 13141 win 65535
+0.0001 < P. 13141:14601(1460) ack 1 win 257
+.0 > . 1:1(0) ack 14601 win 65535
+0.0001 < P. 14601:16061(1460) ack 1 win 257
+.0 > . 1:1(0) ack 16061 win 65535
+0.0001 < P. 16061:17521(1460) ack 1 win 257
+.0 > . 1:1(0) ack 17521 win 65535
+0.0001 < P. 17521:18981(1460) ack 1 win 257
+.0 > . 1:1(0) ack 18981 win 65535
+0.0001 < P. 18981:20441(1460) ack 1 win 257
+.0 > . 1:1(0) ack 20441 win 65535
+0.0001 < P. 20441:21901(1460) ack 1 win 257
+.0 > . 1:1(0) ack 21901 win 65535
+0.0001 < P. 21901:23361(1460) ack 1 win 257
+.0 > . 1:1(0) ack 23361 win 65535
+0.0001 < P. 23361:24821(1460) ack 1 win 257
0.055 > . 1:1(0) ack 24821 win 65535
+0.0001 < P. 24821:26281(1460) ack 1 win 257
+.0 > . 1:1(0) ack 26281 win 65535
+0.0001 < P. 26281:27741(1460) ack 1 win 257
+.0 > . 1:1(0) ack 27741 win 65535
+0.0001 < P. 27741:29201(1460) ack 1 win 257
+.0 > . 1:1(0) ack 29201 win 65535
+0.0001 < P. 29201:30661(1460) ack 1 win 257
+.0 > . 1:1(0) ack 30661 win 65535
+0.0001 < P. 30661:32121(1460) ack 1 win 257
+.0 > . 1:1(0) ack 32121 win 65535
+0.0001 < P. 32121:33581(1460) ack 1 win 257
+.0 > . 1:1(0) ack 33581 win 65535
+0.0001 < P. 33581:35041(1460) ack 1 win 257
+.0 > . 1:1(0) ack 35041 win 65535
+0.0001 < P. 35041:36501(1460) ack 1 win 257
+.0 > . 1:1(0) ack 36501 win 65535
+0.0001 < P. 36501:37961(1460) ack 1 win 257
+.0 > . 1:1(0) ack 37961 win 65535
+0.0001 < P. 37961:39421(1460) ack 1 win 257
+.0 > . 1:1(0) ack 39421 win 65535
+0.0001 < P. 39421:40881(1460) ack 1 win 257
+.0 > . 1:1(0) ack 40881 win 65535
+0.0001 < P. 40881:42341(1460) ack 1 win 257
+.0 > . 1:1(0) ack 42341 win 65535
+0.0001 < P. 42341:43801(1460) ack 1 win 257
+.0 > . 1:1(0) ack 43801 win 65535
+0.0001 < P. 43801:45261(1460) ack 1 win 257
+.0 > . 1:1(0) ack 45261 win 65535
+0.0001 < P. 45261:46721(1460) ack 1 win 257
+.0 > . 1:1(0) ack 46721 win 65535
+0.0001 < P. 46721:48181(1460) ack 1 win 257
+.0 > . 1:1(0) ack 48181 win 65535
+0.0001 < P. 48181:49641(1460) ack 1 win 257
+.0 > . 1:1(0) ack 49641 win 65535
+0.0001 < P. 49641:51101(1460) ack 1 win 257
+.0 > . 1:1(0) ack 51101 win 65535
+0.0001 < P. 51101:52561(1460) ack 1 win 257
+.0 > . 1:1(0) ack 52561 win 65535
+0.0001 < P. 52561:54021(1460) ack 1 win 257
+.0 > . 1:1(0) ack 54021 win 65535
+0.0001 < P. 54021:55481(1460) ack 1 win 257
+.0 > . 1:1(0) ack 55481 win 65535
+0.0001 < P. 55481:56941(1460) ack 1 win 257
+.0 > . 1:1(0) ack 56941 win 65535
+0.0001 < P. 56941:58401(1460) ack 1 win 257
+.0 > . 1:1(0) ack 58401 win 65535
+0.0001 < P. 58401:59861(1460) ack 1 win 257
+.0 > . 1:1(0) ack 59861 win 65535
+0.0001 < P. 59861:61321(1460) ack 1 win 257
+.0 > . 1:1(0) ack 61321 win 65535
+0.0001 < P. 61321:62781(1460) ack 1 win 257
+.0 > . 1:1(0) ack 62781 win 65535
+0.0001 < P. 62781:64241(1460) ack 1 win 257
+.0 > . 1:1(0) ack 64241 win 65535
+0.0001 < P. 64241:65701(1460) ack 1 win 257
+.0 > . 1:1(0) ack 65701 win 65535
+0.0001 < P. 65701:67161(1460) ack 1 win 257
+.0 > . 1:1(0) ack 67161 win 65535
// nf_ct_proto_6: SEQ is under the lower bound (already ACKed data retransmitted) IN=tun0 OUT= MAC= SRC=192.0.2.1 DST=192.168.24.72 LEN=1500 TOS=0x00 PREC=0x00 TTL=255 ID=0 PROTO=TCP SPT=34375 DPT=8080 SEQ=1 ACK=4162510439 WINDOW=257 RES=0x00 ACK PSH URGP=0
+0.0001 < P. 1:1461(1460) ack 1 win 257
// only sent if above packet isn't flagged as invalid
+.0 > . 1:1(0) ack 67161 win 65535
+0 `$xtables -D INPUT -m conntrack --ctstate INVALID -j DROP`
// check RST packet that doesn't exactly match expected next sequence
// number still transitions conntrack state to CLOSE iff it's already in
// FIN/CLOSE_WAIT.
`packetdrill/common.sh`
// 5.771921 server_ip > client_ip TLSv1.2 337 [Packet size limited during capture]
// 5.771994 server_ip > client_ip TLSv1.2 337 [Packet size limited during capture]
// 5.772212 client_ip > server_ip TCP 66 45020 > 443 [ACK] Seq=1905874048 Ack=781810658 Win=36352 Len=0 TSval=3317842872 TSecr=675936334
// 5.787924 server_ip > client_ip TLSv1.2 1300 [Packet size limited during capture]
// 5.788126 server_ip > client_ip TLSv1.2 90 Application Data
// 5.788207 server_ip > client_ip TCP 66 443 > 45020 [FIN, ACK] Seq=781811916 Ack=1905874048 Win=31104 Len=0 TSval=675936350 TSecr=3317842872
// 5.788447 client_ip > server_ip TLSv1.2 90 Application Data
// 5.788479 client_ip > server_ip TCP 66 45020 > 443 [RST, ACK] Seq=1905874072 Ack=781811917 Win=39040 Len=0 TSval=3317842889 TSecr=675936350
// 5.788581 server_ip > client_ip TCP 54 8443 > 45020 [RST] Seq=781811892 Win=0 Len=0
+0 `iptables -A INPUT -p tcp -m conntrack --ctstate INVALID -j DROP`
+0 `iptables -A OUTPUT -p tcp -m conntrack --ctstate INVALID -j DROP`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
0.1 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
0.1 > S 0:0(0) win 65535 <mss 1460,sackOK,TS val 1 ecr 0,nop,wscale 8>
+0.1 < S. 1:1(0) ack 1 win 65535 <mss 1460>
+0 > . 1:1(0) ack 1 win 65535
+0 < . 1:1001(1000) ack 1 win 65535
+0 < . 1001:2001(1000) ack 1 win 65535
+0 < . 2001:3001(1000) ack 1 win 65535
+0 > . 1:1(0) ack 1001 win 65535
+0 > . 1:1(0) ack 2001 win 65535
+0 > . 1:1(0) ack 3001 win 65535
+0 write(3, ..., 1000) = 1000
+0.0 > P. 1:1001(1000) ack 3001 win 65535
+0.1 read(3, ..., 1000) = 1000
// Conntrack should move to FIN_WAIT, then CLOSE_WAIT.
+0 < F. 3001:3001(0) ack 1001 win 65535
+0 > . 1001:1001(0) ack 3002 win 65535
+0 `conntrack -f $NFCT_IP_VERSION -L -p tcp --dport 8080 2>/dev/null |grep -q CLOSE_WAIT`
+1 close(3) = 0
// RST: unread data. FIN was seen, hence ack + 1
+0 > R. 1001:1001(0) ack 3002 win 65535
// ... and then, CLOSE.
+0 `conntrack -f $NFCT_IP_VERSION -L -p tcp --dport 8080 2>/dev/null |grep -q CLOSE\ `
// Spurious RST from peer -- no sk state. Should NOT get
// marked INVALID, because conntrack is already closing.
+0.1 < R 2001:2001(0) win 0
// No packets should have been marked INVALID
+0 `iptables -v -S INPUT | grep INVALID | grep -q -- "-c 0 0"`
+0 `iptables -v -S OUTPUT | grep INVALID | grep -q -- "-c 0 0"`
// check that out of window resets are marked as INVALID and conntrack remains
// in ESTABLISHED state.
`packetdrill/common.sh`
+0 `$xtables -A INPUT -p tcp -m conntrack --ctstate INVALID -j DROP`
+0 `$xtables -A OUTPUT -p tcp -m conntrack --ctstate INVALID -j DROP`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
0.1 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
0.1 > S 0:0(0) win 65535 <mss 1460,sackOK,TS val 1 ecr 0,nop,wscale 8>
+0.1 < S. 1:1(0) ack 1 win 65535 <mss 1460>
+0 > . 1:1(0) ack 1 win 65535
+0 < . 1:1001(1000) ack 1 win 65535
+0 < . 1001:2001(1000) ack 1 win 65535
+0 < . 2001:3001(1000) ack 1 win 65535
+0 > . 1:1(0) ack 1001 win 65535
+0 > . 1:1(0) ack 2001 win 65535
+0 > . 1:1(0) ack 3001 win 65535
+0 write(3, ..., 1000) = 1000
// out of window
+0.0 < R 0:0(0) win 0
+0 `conntrack -f $NFCT_IP_VERSION -L -p tcp --dport 8080 2>/dev/null |grep -q ESTABLISHED`
// out of window
+0.0 < R 1000000:1000000(0) win 0
+0 `conntrack -f $NFCT_IP_VERSION -L -p tcp --dport 8080 2>/dev/null |grep -q ESTABLISHED`
// in-window but not exact match
+0.0 < R 42:42(0) win 0
+0 `conntrack -f $NFCT_IP_VERSION -L -p tcp --dport 8080 2>/dev/null |grep -q ESTABLISHED`
+0.0 > P. 1:1001(1000) ack 3001 win 65535
+0.1 read(3, ..., 1000) = 1000
+0 `conntrack -f $NFCT_IP_VERSION -L -p tcp --dport 8080 2>/dev/null |grep -q ESTABLISHED`
+0 < . 3001:3001(0) ack 1001 win 65535
+0.0 < R. 3000:3000(0) ack 1001 win 0
+0 `conntrack -f $NFCT_IP_VERSION -L -p tcp --dport 8080 2>/dev/null |grep -q ESTABLISHED`
// exact next sequence
+0.0 < R. 3001:3001(0) ack 1001 win 0
// Conntrack should move to CLOSE
// Expect four invalid RSTs
+0 `$xtables -v -S INPUT | grep INVALID | grep -q -- "-c 4 "`
+0 `$xtables -v -S OUTPUT | grep INVALID | grep -q -- "-c 0 0"`
+0 `conntrack -f $NFCT_IP_VERSION -L -p tcp --dport 8080 2>/dev/null |grep -q CLOSE\ `
// Check connection re-use, i.e. peer that receives the SYN answers with
// a challenge-ACK.
// Check that conntrack lets all packets pass, including the challenge ack,
// and that a new connection is established.
`packetdrill/common.sh`
// S >
// . < (challenge-ack)
// R. >
// S >
// S. <
// Expected outcome: established connection.
+0 `$xtables -A INPUT -p tcp -m conntrack --ctstate INVALID -j DROP`
+0 `$xtables -A OUTPUT -p tcp -m conntrack --ctstate INVALID -j DROP`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
0.1 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
0.1 > S 0:0(0) win 65535 <mss 1460,sackOK,TS val 1 ecr 0,nop,wscale 8>
// Challenge ACK, old incarnation.
0.1 < . 145824453:145824453(0) ack 643160523 win 240 <mss 1460,nop,nop,TS val 1 ecr 1,nop,wscale 0>
+0.01 > R 643160523:643160523(0) win 0
+0.01 `conntrack -f $NFCT_IP_VERSION -L -p tcp --dport 8080 2>/dev/null | grep UNREPLIED | grep -q SYN_SENT`
// Must go through.
+0.01 > S 0:0(0) win 65535 <mss 1460,sackOK,TS val 1 ecr 0,nop,wscale 8>
// correct synack
+0.1 < S. 0:0(0) ack 1 win 250 <mss 1460,nop,nop,TS val 1 ecr 1,nop,wscale 0>
// 3whs completes.
+0.01 > . 1:1(0) ack 1 win 256 <nop,nop,TS val 1 ecr 1>
+0 `conntrack -f $NFCT_IP_VERSION -L -p tcp --dport 8080 2>/dev/null | grep ESTABLISHED | grep -q ASSURED`
// No packets should have been marked INVALID
+0 `$xtables -v -S INPUT | grep INVALID | grep -q -- "-c 0 0"`
+0 `$xtables -v -S OUTPUT | grep INVALID | grep -q -- "-c 0 0"`
// Check conntrack copes with syn/ack reply for a previous, old incarnation.
// tcpdump with buggy sequence
// 10.176.25.8.829 > 10.192.171.30.2049: Flags [S], seq 2375731741, win 29200, options [mss 1460,sackOK,TS val 2083107423 ecr 0,nop,wscale 7], length 0
// OLD synack, for old/previous S
// 10.192.171.30.2049 > 10.176.25.8.829: Flags [S.], seq 145824453, ack 643160523, win 65535, options [mss 8952,nop,wscale 5,TS val 3215437785 ecr 2082921663,nop,nop], length 0
// This reset never makes it to the endpoint, elided in the packetdrill script
// 10.192.171.30.2049 > 10.176.25.8.829: Flags [R.], seq 1, ack 1, win 65535, options [mss 8952,nop,wscale 5,TS val 3215443451 ecr 2082921663,nop,nop], length 0
// Syn retransmit, no change
// 10.176.25.8.829 > 10.192.171.30.2049: Flags [S], seq 2375731741, win 29200, options [mss 1460,sackOK,TS val 2083115583 ecr 0,nop,wscale 7], length 0
// CORRECT synack, should be accepted, but conntrack classified this as INVALID:
// SEQ is over the upper bound (over the window of the receiver) IN=tun0 OUT= MAC= SRC=192.0.2.1 DST=192.168.37.78 LEN=40 TOS=0x00 PREC=0x00 TTL=255 ID=0 PROTO=TCP SPT=8080 DPT=34500 SEQ=162602411 ACK=2124350315 ..
// 10.192.171.30.2049 > 10.176.25.8.829: Flags [S.], seq 162602410, ack 2375731742, win 65535, options [mss 8952,nop,wscale 5,TS val 3215445754 ecr 2083115583,nop,nop], length 0
`packetdrill/common.sh`
+0 `$xtables -A INPUT -p tcp -m conntrack --ctstate INVALID -j DROP`
+0 `$xtables -A OUTPUT -p tcp -m conntrack --ctstate INVALID -j DROP`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
0.1 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
0.1 > S 0:0(0) win 65535 <mss 1460,sackOK,TS val 1 ecr 0,nop,wscale 8>
// bogus/outdated synack, invalid ack value
0.1 < S. 145824453:145824453(0) ack 643160523 win 240 <mss 1440,nop,nop,TS val 1 ecr 1,nop,wscale 0>
// syn retransmitted
1.01 > S 0:0(0) win 65535 <mss 1460,sackOK,TS val 1015 ecr 0,nop,wscale 8>
+0 `conntrack -f $NFCT_IP_VERSION -L -p tcp --dport 8080 2>/dev/null | grep UNREPLIED | grep -q SYN_SENT`
// correct synack
+0 < S. 145758918:145758918(0) ack 1 win 250 <mss 1460,nop,nop,TS val 1 ecr 1,nop,wscale 0>
+0 write(3, ..., 1) = 1
// with buggy conntrack above packet is dropped, so SYN rtx is seen:
// script packet: 1.054007 . 1:1(0) ack 16777958 win 256 <nop,nop,TS val 1033 ecr 1>
// actual packet: 3.010000 S 0:0(0) win 65535 <mss 1460,sackOK,TS val 1015 ecr 0,nop,wscale 8>
+0 `conntrack -f $NFCT_IP_VERSION -L -p tcp --dport 8080 2>/dev/null | grep ESTABLISHED | grep -q ASSURED`
+0 > P. 1:2(1) ack 4294901762 win 256 <nop,nop,TS val 1067 ecr 1>
+0 `conntrack -f $NFCT_IP_VERSION -L -p tcp --dport 8080 2>/dev/null | grep ASSURED | grep -q ESTABLISHED`
// No packets should have been marked INVALID in OUTPUT direction, 1 in INPUT
+0 `$xtables -v -S OUTPUT | grep INVALID | grep -q -- "-c 0 0"`
+0 `$xtables -v -S INPUT | grep INVALID | grep -q -- "-c 1 "`
+0 `$xtables -D INPUT -p tcp -m conntrack --ctstate INVALID -j DROP`
+0 `$xtables -D OUTPUT -p tcp -m conntrack --ctstate INVALID -j DROP`
// Check reception of another SYN while we have an established conntrack state.
// Challenge ACK is supposed to pass through, RST reply should clear conntrack
// state and SYN retransmit should give us new 'SYN_RECV' connection state.
`packetdrill/common.sh`
// should show a match if bug is present:
+0 `iptables -A INPUT -m conntrack --ctstate INVALID -p tcp --tcp-flags SYN,ACK SYN,ACK`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 10) = 0
+0 < S 0:0(0) win 32792 <mss 1000,nop,wscale 7, TS val 1 ecr 0,nop,nop>
+0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,TS val 100 ecr 1,nop,wscale 8>
+.01 < . 1:1(0) ack 1 win 257 <TS val 1 ecr 100,nop,nop>
+0 accept(3, ..., ...) = 4
+0 < P. 1:101(100) ack 1 win 257 <TS val 2 ecr 100,nop,nop>
+.001 > . 1:1(0) ack 101 win 256 <nop,nop,TS val 110 ecr 2>
+0 read(4, ..., 101) = 100
1.0 < S 2000:2000(0) win 32792 <mss 1000,nop,wscale 7, TS val 233 ecr 0,nop,nop>
// Won't expect this: challenge ack.
+0 > . 1:1(0) ack 101 win 256 <nop,nop,TS val 112 ecr 2>
+0 < R. 101:101(0) ack 1 win 257
+0 close(4) = 0
1.5 < S 2000:2000(0) win 32792 <mss 1000,nop,wscale 0, TS val 233 ecr 0,nop,nop>
+0 `conntrack -L -p tcp --dport 8080 2>/dev/null | grep -q SYN_RECV`
+0 `iptables -v -S INPUT | grep INVALID | grep -q -- "-c 0 0"`