Commit 3d572b23 authored by David S. Miller's avatar David S. Miller

Merge branch 'RED-Introduce-an-ECN-tail-dropping-mode'

Petr Machata says:

====================
RED: Introduce an ECN tail-dropping mode

When the RED qdisc is currently configured to enable ECN, the RED algorithm
is used to decide whether a certain SKB should be marked. If that SKB is
not ECN-capable, it is early-dropped.

It is also possible to keep all traffic in the queue, and just mark the
ECN-capable subset of it, as appropriate under the RED algorithm. Some
switches support this mode, and some installations make use of it.
There is currently no way to put the RED qdiscs to this mode.

Therefore this patchset adds a new RED flag, TC_RED_TAILDROP. When the
qdisc is configured with this flag, non-ECT traffic is enqueued (and
tail-dropped when the queue size is exhausted) instead of being
early-dropped.

Unfortunately, adding a new RED flag is not as simple as it sounds. RED
flags are passed in tc_red_qopt.flags. However RED neglects to validate the
flag field, and just copies it over wholesale to its internal structure,
and later dumps it back.

A broken userspace can therefore configure a RED qdisc with arbitrary
unsupported flags, and later expect to see the flags on qdisc dump. The
current ABI thus allows storage of 5 bits of custom data along with the
qdisc instance.

GRED, SFQ and CHOKE qdiscs are in the same situation. (GRED validates VQ
flags, but not the flags for the main queue.) E.g. if SFQ ever needs to
support TC_RED_ADAPTATIVE, it needs another way of doing it, and at the
same time it needs to retain the possibility to store 6 bits of
uninterpreted data.

For RED, this problem is resolved in patch #2, which adds a new attribute,
and a way to separate flags from userbits that can be reused by other
qdiscs. The flag itself and related behavioral changes are added in patch

To test the new feature, patch #1 first introduces a TDC testsuite that
covers the existing RED flags. Patch #5 later extends it with taildrop
coverage. Patch #6 contains a forwarding selftest for the offloaded
datapath.

To test the SW datapath, I took the mlxsw selftest and adapted it in mostly
obvious ways. The test is stable enough to verify that RED, ECN and ECN
taildrop actually work. However, I have no confidence in its portability to
other people's machines or mildly different configurations. I therefore do
not find it suitable for upstreaming.

GRED and CHOKE can use the same method as RED if they ever need to support
extra flags. SFQ uses the length of TCA_OPTIONS to dispatch on binary
control structure version, and would therefore need a different approach.

v2:
- Patch #1
    - Require nsPlugin in each RED test
    - Match end-of-line to catch cases of more flags reported than
      requested
- Patch #2:
    - Replaced with another patch.
- Patch #3:
    - Fix red_use_taildrop() condition in red_enqueue switch for
      probabilistic case.
- Patch #5:
    - Require nsPlugin in each RED test
    - Match end-of-line to catch cases of more flags reported than
      requested
    - Add a test for creation of non-ECN taildrop, which should fail
====================
Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
parents 085793f0 63f3c1d0
......@@ -323,7 +323,7 @@ mlxsw_sp_qdisc_get_tc_stats(struct mlxsw_sp_port *mlxsw_sp_port,
static int
mlxsw_sp_tclass_congestion_enable(struct mlxsw_sp_port *mlxsw_sp_port,
int tclass_num, u32 min, u32 max,
u32 probability, bool is_ecn)
u32 probability, bool is_wred, bool is_ecn)
{
char cwtpm_cmd[MLXSW_REG_CWTPM_LEN];
char cwtp_cmd[MLXSW_REG_CWTP_LEN];
......@@ -341,7 +341,7 @@ mlxsw_sp_tclass_congestion_enable(struct mlxsw_sp_port *mlxsw_sp_port,
return err;
mlxsw_reg_cwtpm_pack(cwtpm_cmd, mlxsw_sp_port->local_port, tclass_num,
MLXSW_REG_CWTP_DEFAULT_PROFILE, true, is_ecn);
MLXSW_REG_CWTP_DEFAULT_PROFILE, is_wred, is_ecn);
return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(cwtpm), cwtpm_cmd);
}
......@@ -445,8 +445,9 @@ mlxsw_sp_qdisc_red_replace(struct mlxsw_sp_port *mlxsw_sp_port, u32 handle,
prob = DIV_ROUND_UP(prob, 1 << 16);
min = mlxsw_sp_bytes_cells(mlxsw_sp, p->min);
max = mlxsw_sp_bytes_cells(mlxsw_sp, p->max);
return mlxsw_sp_tclass_congestion_enable(mlxsw_sp_port, tclass_num, min,
max, prob, p->is_ecn);
return mlxsw_sp_tclass_congestion_enable(mlxsw_sp_port, tclass_num,
min, max, prob,
!p->is_nodrop, p->is_ecn);
}
static void
......
......@@ -740,6 +740,7 @@ struct tc_red_qopt_offload_params {
u32 limit;
bool is_ecn;
bool is_harddrop;
bool is_nodrop;
struct gnet_stats_queue *qstats;
};
......
......@@ -179,6 +179,44 @@ static inline bool red_check_params(u32 qth_min, u32 qth_max, u8 Wlog)
return true;
}
static inline int red_get_flags(unsigned char qopt_flags,
unsigned char historic_mask,
struct nlattr *flags_attr,
unsigned char supported_mask,
struct nla_bitfield32 *p_flags,
unsigned char *p_userbits,
struct netlink_ext_ack *extack)
{
struct nla_bitfield32 flags;
if (qopt_flags && flags_attr) {
NL_SET_ERR_MSG_MOD(extack, "flags should be passed either through qopt, or through a dedicated attribute");
return -EINVAL;
}
if (flags_attr) {
flags = nla_get_bitfield32(flags_attr);
} else {
flags.selector = historic_mask;
flags.value = qopt_flags & historic_mask;
}
*p_flags = flags;
*p_userbits = qopt_flags & ~historic_mask;
return 0;
}
static inline int red_validate_flags(unsigned char flags,
struct netlink_ext_ack *extack)
{
if ((flags & TC_RED_NODROP) && !(flags & TC_RED_ECN)) {
NL_SET_ERR_MSG_MOD(extack, "nodrop mode is only meaningful with ECN");
return -EINVAL;
}
return 0;
}
static inline void red_set_parms(struct red_parms *p,
u32 qth_min, u32 qth_max, u8 Wlog, u8 Plog,
u8 Scell_log, u8 *stab, u32 max_P)
......
......@@ -256,6 +256,7 @@ enum {
TCA_RED_PARMS,
TCA_RED_STAB,
TCA_RED_MAX_P,
TCA_RED_FLAGS, /* bitfield32 */
__TCA_RED_MAX,
};
......@@ -268,12 +269,28 @@ struct tc_red_qopt {
unsigned char Wlog; /* log(W) */
unsigned char Plog; /* log(P_max/(qth_max-qth_min)) */
unsigned char Scell_log; /* cell size for idle damping */
/* This field can be used for flags that a RED-like qdisc has
* historically supported. E.g. when configuring RED, it can be used for
* ECN, HARDDROP and ADAPTATIVE. For SFQ it can be used for ECN,
* HARDDROP. Etc. Because this field has not been validated, and is
* copied back on dump, any bits besides those to which a given qdisc
* has assigned a historical meaning need to be considered for free use
* by userspace tools.
*
* Any further flags need to be passed differently, e.g. through an
* attribute (such as TCA_RED_FLAGS above). Such attribute should allow
* passing both recent and historic flags in one value.
*/
unsigned char flags;
#define TC_RED_ECN 1
#define TC_RED_HARDDROP 2
#define TC_RED_ADAPTATIVE 4
#define TC_RED_NODROP 8
};
#define TC_RED_HISTORIC_FLAGS (TC_RED_ECN | TC_RED_HARDDROP | TC_RED_ADAPTATIVE)
struct tc_red_xstats {
__u32 early; /* Early drops */
__u32 pdrop; /* Drops due to queue limits */
......
......@@ -35,7 +35,11 @@
struct red_sched_data {
u32 limit; /* HARD maximal queue length */
unsigned char flags;
/* Non-flags in tc_red_qopt.flags. */
unsigned char userbits;
struct timer_list adapt_timer;
struct Qdisc *sch;
struct red_parms parms;
......@@ -44,6 +48,8 @@ struct red_sched_data {
struct Qdisc *qdisc;
};
static const u32 red_supported_flags = TC_RED_HISTORIC_FLAGS | TC_RED_NODROP;
static inline int red_use_ecn(struct red_sched_data *q)
{
return q->flags & TC_RED_ECN;
......@@ -54,6 +60,11 @@ static inline int red_use_harddrop(struct red_sched_data *q)
return q->flags & TC_RED_HARDDROP;
}
static int red_use_nodrop(struct red_sched_data *q)
{
return q->flags & TC_RED_NODROP;
}
static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch,
struct sk_buff **to_free)
{
......@@ -74,23 +85,36 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch,
case RED_PROB_MARK:
qdisc_qstats_overlimit(sch);
if (!red_use_ecn(q) || !INET_ECN_set_ce(skb)) {
if (!red_use_ecn(q)) {
q->stats.prob_drop++;
goto congestion_drop;
}
q->stats.prob_mark++;
if (INET_ECN_set_ce(skb)) {
q->stats.prob_mark++;
} else if (!red_use_nodrop(q)) {
q->stats.prob_drop++;
goto congestion_drop;
}
/* Non-ECT packet in ECN nodrop mode: queue it. */
break;
case RED_HARD_MARK:
qdisc_qstats_overlimit(sch);
if (red_use_harddrop(q) || !red_use_ecn(q) ||
!INET_ECN_set_ce(skb)) {
if (red_use_harddrop(q) || !red_use_ecn(q)) {
q->stats.forced_drop++;
goto congestion_drop;
}
q->stats.forced_mark++;
if (INET_ECN_set_ce(skb)) {
q->stats.forced_mark++;
} else if (!red_use_nodrop(q)) {
q->stats.forced_drop++;
goto congestion_drop;
}
/* Non-ECT packet in ECN nodrop mode: queue it. */
break;
}
......@@ -165,6 +189,7 @@ static int red_offload(struct Qdisc *sch, bool enable)
opt.set.limit = q->limit;
opt.set.is_ecn = red_use_ecn(q);
opt.set.is_harddrop = red_use_harddrop(q);
opt.set.is_nodrop = red_use_nodrop(q);
opt.set.qstats = &sch->qstats;
} else {
opt.command = TC_RED_DESTROY;
......@@ -183,9 +208,12 @@ static void red_destroy(struct Qdisc *sch)
}
static const struct nla_policy red_policy[TCA_RED_MAX + 1] = {
[TCA_RED_UNSPEC] = { .strict_start_type = TCA_RED_FLAGS },
[TCA_RED_PARMS] = { .len = sizeof(struct tc_red_qopt) },
[TCA_RED_STAB] = { .len = RED_STAB_SIZE },
[TCA_RED_MAX_P] = { .type = NLA_U32 },
[TCA_RED_FLAGS] = { .type = NLA_BITFIELD32,
.validation_data = &red_supported_flags },
};
static int red_change(struct Qdisc *sch, struct nlattr *opt,
......@@ -194,7 +222,10 @@ static int red_change(struct Qdisc *sch, struct nlattr *opt,
struct Qdisc *old_child = NULL, *child = NULL;
struct red_sched_data *q = qdisc_priv(sch);
struct nlattr *tb[TCA_RED_MAX + 1];
struct nla_bitfield32 flags_bf;
struct tc_red_qopt *ctl;
unsigned char userbits;
unsigned char flags;
int err;
u32 max_P;
......@@ -216,6 +247,12 @@ static int red_change(struct Qdisc *sch, struct nlattr *opt,
if (!red_check_params(ctl->qth_min, ctl->qth_max, ctl->Wlog))
return -EINVAL;
err = red_get_flags(ctl->flags, TC_RED_HISTORIC_FLAGS,
tb[TCA_RED_FLAGS], red_supported_flags,
&flags_bf, &userbits, extack);
if (err)
return err;
if (ctl->limit > 0) {
child = fifo_create_dflt(sch, &bfifo_qdisc_ops, ctl->limit,
extack);
......@@ -227,7 +264,14 @@ static int red_change(struct Qdisc *sch, struct nlattr *opt,
}
sch_tree_lock(sch);
q->flags = ctl->flags;
flags = (q->flags & ~flags_bf.selector) | flags_bf.value;
err = red_validate_flags(flags, extack);
if (err)
goto unlock_out;
q->flags = flags;
q->userbits = userbits;
q->limit = ctl->limit;
if (child) {
qdisc_tree_flush_backlog(q->qdisc);
......@@ -256,6 +300,12 @@ static int red_change(struct Qdisc *sch, struct nlattr *opt,
if (old_child)
qdisc_put(old_child);
return 0;
unlock_out:
sch_tree_unlock(sch);
if (child)
qdisc_put(child);
return err;
}
static inline void red_adaptative_timer(struct timer_list *t)
......@@ -299,10 +349,15 @@ static int red_dump_offload_stats(struct Qdisc *sch)
static int red_dump(struct Qdisc *sch, struct sk_buff *skb)
{
struct red_sched_data *q = qdisc_priv(sch);
struct nla_bitfield32 flags_bf = {
.selector = red_supported_flags,
.value = q->flags,
};
struct nlattr *opts = NULL;
struct tc_red_qopt opt = {
.limit = q->limit,
.flags = q->flags,
.flags = (q->flags & TC_RED_HISTORIC_FLAGS) |
q->userbits,
.qth_min = q->parms.qth_min >> q->parms.Wlog,
.qth_max = q->parms.qth_max >> q->parms.Wlog,
.Wlog = q->parms.Wlog,
......@@ -319,7 +374,8 @@ static int red_dump(struct Qdisc *sch, struct sk_buff *skb)
if (opts == NULL)
goto nla_put_failure;
if (nla_put(skb, TCA_RED_PARMS, sizeof(opt), &opt) ||
nla_put_u32(skb, TCA_RED_MAX_P, q->parms.max_P))
nla_put_u32(skb, TCA_RED_MAX_P, q->parms.max_P) ||
nla_put(skb, TCA_RED_FLAGS, sizeof(flags_bf), &flags_bf))
goto nla_put_failure;
return nla_nest_end(skb, opts);
......
......@@ -389,17 +389,14 @@ check_marking()
((pct $cond))
}
do_ecn_test()
ecn_test_common()
{
local name=$1; shift
local vlan=$1; shift
local limit=$1; shift
local backlog
local pct
# Main stream.
start_tcp_traffic $h1.$vlan $(ipaddr 1 $vlan) $(ipaddr 3 $vlan) \
$h3_mac tos=0x01
# Build the below-the-limit backlog using UDP. We could use TCP just
# fine, but this way we get a proof that UDP is accepted when queue
# length is below the limit. The main stream is using TCP, and if the
......@@ -409,7 +406,7 @@ do_ecn_test()
check_err $? "Could not build the requested backlog"
pct=$(check_marking $vlan "== 0")
check_err $? "backlog $backlog / $limit Got $pct% marked packets, expected == 0."
log_test "TC $((vlan - 10)): ECN backlog < limit"
log_test "TC $((vlan - 10)): $name backlog < limit"
# Now push TCP, because non-TCP traffic would be early-dropped after the
# backlog crosses the limit, and we want to make sure that the backlog
......@@ -419,7 +416,20 @@ do_ecn_test()
check_err $? "Could not build the requested backlog"
pct=$(check_marking $vlan ">= 95")
check_err $? "backlog $backlog / $limit Got $pct% marked packets, expected >= 95."
log_test "TC $((vlan - 10)): ECN backlog > limit"
log_test "TC $((vlan - 10)): $name backlog > limit"
}
do_ecn_test()
{
local vlan=$1; shift
local limit=$1; shift
local name=ECN
start_tcp_traffic $h1.$vlan $(ipaddr 1 $vlan) $(ipaddr 3 $vlan) \
$h3_mac tos=0x01
sleep 1
ecn_test_common "$name" $vlan $limit
# Up there we saw that UDP gets accepted when backlog is below the
# limit. Now that it is above, it should all get dropped, and backlog
......@@ -427,7 +437,31 @@ do_ecn_test()
RET=0
build_backlog $vlan $((2 * limit)) udp >/dev/null
check_fail $? "UDP traffic went into backlog instead of being early-dropped"
log_test "TC $((vlan - 10)): ECN backlog > limit: UDP early-dropped"
log_test "TC $((vlan - 10)): $name backlog > limit: UDP early-dropped"
stop_traffic
sleep 1
}
do_ecn_nodrop_test()
{
local vlan=$1; shift
local limit=$1; shift
local name="ECN nodrop"
start_tcp_traffic $h1.$vlan $(ipaddr 1 $vlan) $(ipaddr 3 $vlan) \
$h3_mac tos=0x01
sleep 1
ecn_test_common "$name" $vlan $limit
# Up there we saw that UDP gets accepted when backlog is below the
# limit. Now that it is above, in nodrop mode, make sure it goes to
# backlog as well.
RET=0
build_backlog $vlan $((2 * limit)) udp >/dev/null
check_err $? "UDP traffic was early-dropped instead of getting into backlog"
log_test "TC $((vlan - 10)): $name backlog > limit: UDP not dropped"
stop_traffic
sleep 1
......
......@@ -4,6 +4,7 @@
ALL_TESTS="
ping_ipv4
ecn_test
ecn_nodrop_test
red_test
mc_backlog_test
"
......@@ -50,6 +51,16 @@ ecn_test()
uninstall_qdisc
}
ecn_nodrop_test()
{
install_qdisc ecn nodrop
do_ecn_nodrop_test 10 $BACKLOG1
do_ecn_nodrop_test 11 $BACKLOG2
uninstall_qdisc
}
red_test()
{
install_qdisc
......
......@@ -4,6 +4,7 @@
ALL_TESTS="
ping_ipv4
ecn_test
ecn_nodrop_test
red_test
mc_backlog_test
"
......@@ -33,6 +34,13 @@ ecn_test()
uninstall_qdisc
}
ecn_nodrop_test()
{
install_qdisc ecn nodrop
do_ecn_nodrop_test 10 $BACKLOG
uninstall_qdisc
}
red_test()
{
install_qdisc
......
[
{
"id": "8b6e",
"name": "Create RED with no flags",
"category": [
"qdisc",
"red"
],
"plugins": {
"requires": "nsPlugin"
},
"setup": [
"$IP link add dev $DUMMY type dummy || /bin/true"
],
"cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root red limit 1M avpkt 1500 min 100K max 300K",
"expExitCode": "0",
"verifyCmd": "$TC qdisc show dev $DUMMY",
"matchPattern": "qdisc red 1: root .* limit 1Mb min 100Kb max 300Kb $",
"matchCount": "1",
"teardown": [
"$TC qdisc del dev $DUMMY handle 1: root",
"$IP link del dev $DUMMY type dummy"
]
},
{
"id": "342e",
"name": "Create RED with adaptive flag",
"category": [
"qdisc",
"red"
],
"plugins": {
"requires": "nsPlugin"
},
"setup": [
"$IP link add dev $DUMMY type dummy || /bin/true"
],
"cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root red adaptive limit 1M avpkt 1500 min 100K max 300K",
"expExitCode": "0",
"verifyCmd": "$TC qdisc show dev $DUMMY",
"matchPattern": "qdisc red 1: root .* limit 1Mb min 100Kb max 300Kb adaptive $",
"matchCount": "1",
"teardown": [
"$TC qdisc del dev $DUMMY handle 1: root",
"$IP link del dev $DUMMY type dummy"
]
},
{
"id": "2d4b",
"name": "Create RED with ECN flag",
"category": [
"qdisc",
"red"
],
"plugins": {
"requires": "nsPlugin"
},
"setup": [
"$IP link add dev $DUMMY type dummy || /bin/true"
],
"cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root red ecn limit 1M avpkt 1500 min 100K max 300K",
"expExitCode": "0",
"verifyCmd": "$TC qdisc show dev $DUMMY",
"matchPattern": "qdisc red 1: root .* limit 1Mb min 100Kb max 300Kb ecn $",
"matchCount": "1",
"teardown": [
"$TC qdisc del dev $DUMMY handle 1: root",
"$IP link del dev $DUMMY type dummy"
]
},
{
"id": "650f",
"name": "Create RED with flags ECN, adaptive",
"category": [
"qdisc",
"red"
],
"plugins": {
"requires": "nsPlugin"
},
"setup": [
"$IP link add dev $DUMMY type dummy || /bin/true"
],
"cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root red ecn adaptive limit 1M avpkt 1500 min 100K max 300K",
"expExitCode": "0",
"verifyCmd": "$TC qdisc show dev $DUMMY",
"matchPattern": "qdisc red 1: root .* limit 1Mb min 100Kb max 300Kb ecn adaptive $",
"matchCount": "1",
"teardown": [
"$TC qdisc del dev $DUMMY handle 1: root",
"$IP link del dev $DUMMY type dummy"
]
},
{
"id": "5f15",
"name": "Create RED with flags ECN, harddrop",
"category": [
"qdisc",
"red"
],
"plugins": {
"requires": "nsPlugin"
},
"setup": [
"$IP link add dev $DUMMY type dummy || /bin/true"
],
"cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root red ecn harddrop limit 1M avpkt 1500 min 100K max 300K",
"expExitCode": "0",
"verifyCmd": "$TC qdisc show dev $DUMMY",
"matchPattern": "qdisc red 1: root .* limit 1Mb min 100Kb max 300Kb ecn harddrop $",
"matchCount": "1",
"teardown": [
"$TC qdisc del dev $DUMMY handle 1: root",
"$IP link del dev $DUMMY type dummy"
]
},
{
"id": "53e8",
"name": "Create RED with flags ECN, nodrop",
"category": [
"qdisc",
"red"
],
"plugins": {
"requires": "nsPlugin"
},
"setup": [
"$IP link add dev $DUMMY type dummy || /bin/true"
],
"cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root red ecn nodrop limit 1M avpkt 1500 min 100K max 300K",
"expExitCode": "0",
"verifyCmd": "$TC qdisc show dev $DUMMY",
"matchPattern": "qdisc red 1: root .* limit 1Mb min 100Kb max 300Kb ecn nodrop $",
"matchCount": "1",
"teardown": [
"$TC qdisc del dev $DUMMY handle 1: root",
"$IP link del dev $DUMMY type dummy"
]
},
{
"id": "d091",
"name": "Fail to create RED with only nodrop flag",
"category": [
"qdisc",
"red"
],
"plugins": {
"requires": "nsPlugin"
},
"setup": [
"$IP link add dev $DUMMY type dummy || /bin/true"
],
"cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root red nodrop limit 1M avpkt 1500 min 100K max 300K",
"expExitCode": "2",
"verifyCmd": "$TC qdisc show dev $DUMMY",
"matchPattern": "qdisc red",
"matchCount": "0",
"teardown": [
"$IP link del dev $DUMMY type dummy"
]
},
{
"id": "af8e",
"name": "Create RED with flags ECN, nodrop, harddrop",
"category": [
"qdisc",
"red"
],
"plugins": {
"requires": "nsPlugin"
},
"setup": [
"$IP link add dev $DUMMY type dummy || /bin/true"
],
"cmdUnderTest": "$TC qdisc add dev $DUMMY handle 1: root red ecn harddrop nodrop limit 1M avpkt 1500 min 100K max 300K",
"expExitCode": "0",
"verifyCmd": "$TC qdisc show dev $DUMMY",
"matchPattern": "qdisc red 1: root .* limit 1Mb min 100Kb max 300Kb ecn harddrop nodrop $",
"matchCount": "1",
"teardown": [
"$TC qdisc del dev $DUMMY handle 1: root",
"$IP link del dev $DUMMY type dummy"
]
}
]
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment