Commit 6bff0017 authored by David S. Miller

Merge branch 'ETS-qdisc'

Petr Machata says:

====================
Add a new Qdisc, ETS

The IEEE standard 802.1Qaz (and 802.1Q-2014) specifies four principal
transmission selection algorithms: strict priority, credit-based shaper,
ETS (bandwidth sharing), and vendor-specific. All these have their
corresponding knobs in DCB. But DCB does not have interfaces to configure
RED and ECN, unlike Qdiscs.

In the Qdisc land, strict priority is implemented by PRIO. The credit-based
transmission selection algorithm can then be modeled by placing e.g. a TBF or
CBS Qdisc below some of the PRIO bands. ETS would then be modeled by
placing a DRR Qdisc under the last PRIO band.
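
For illustration, such a PRIO / DRR combination might be set up roughly as
sketched below. The device name, handles, band count and quanta are
assumptions made up for this example and are not taken from the patch set.
Note that priorities 5 to 15 end up in the DRR band without a matching
filter, which is exactly the kind of hole discussed further on.

```shell
# Hypothetical setup: 4 PRIO bands, the last of which hosts DRR.
tc qdisc add dev swp1 root handle 1: prio bands 4 \
	priomap 0 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3
# DRR under the last PRIO band (class 1:4).
tc qdisc add dev swp1 parent 1:4 handle 10: drr
# DRR starts out with no classes; add two with a 60:40 quantum split.
tc class add dev swp1 parent 10: classid 10:1 drr quantum 6000
tc class add dev swp1 parent 10: classid 10:2 drr quantum 4000
# DRR has no priomap, so classification needs explicit filters.
tc filter add dev swp1 parent 10: basic \
	match 'meta(priority eq 3)' flowid 10:1
tc filter add dev swp1 parent 10: basic \
	match 'meta(priority eq 4)' flowid 10:2
```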

The problem with this approach is that DRR on its own, as well as the
combination of PRIO and DRR, are tricky to configure and tricky to offload
to 802.1Qaz-compliant hardware. This is due to several reasons:

- Like any classful Qdisc, DRR supports adding classifiers to decide in which
  class to enqueue packets. Unlike PRIO, however, it has no fallback in the
  form of a priomap. One way to achieve classification based on packet
  priority is e.g. like this:

    # tc filter add dev swp1 root handle 1: \
		basic match 'meta(priority eq 0)' flowid 1:10

  Expressing the priomap in this manner however forces drivers to deep dive
  into the classifier block to parse the individual rules.

  A possible solution would be to extend the classes with a "defmap" a la
  split / defmap mechanism of CBQ, and introduce this as a last resort
  classification. However, unlike priomap, this doesn't have the guarantee
  of covering all priorities. Traffic whose priority is not covered is
  dropped by DRR as unclassified. But ASICs tend to implement dropping in
  the ACL block, not in scheduling pipelines. The need to treat these
  configurations correctly (if only to decide to not offload at all)
  complicates a driver.

  It's not clear how to retrofit priomap with all its benefits to DRR
  without changing it beyond recognition.

- The interplay between PRIO and DRR is also causing problems. 802.1Qaz has
  all ETS TCs as a last resort. Switch ASICs that support ETS at all are
  likely to handle ETS traffic this way as well. However, the Linux model
  is more generic, allowing the DRR block in any band. Drivers would need
  to be careful to handle this case correctly, otherwise the offloaded
  model might not match the slow-path one.

  In a similar vein, PRIO and DRR need to agree on the list of priorities
  assigned to DRR. This is doubly problematic--the user needs to take care
  to keep the two in sync, and the driver needs to watch for any holes in
  DRR coverage and treat the traffic correctly, as discussed above.

  Note that at the time that DRR Qdisc is added, it has no classes, and
  thus any priorities assigned to that PRIO band are not covered. Thus this
  case is surprisingly rather common, and needs to be handled gracefully by
  the driver.

- Similarly due to DRR flexibility, when a Qdisc (such as RED) is attached
  below it, it is not immediately clear which TC the class represents. This
  is unlike PRIO with its straightforward classid scheme. When DRR is
  combined with PRIO, the relationship between classes and TCs gets even
  more murky.

  This is a problem for users as well: the TC mapping is rather important
  for (devlink) shared buffer configuration and (ethtool) counters.
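
The coverage problem described above can be made concrete with a small
helper. This is an illustrative sketch only: the function name
uncovered_priorities is made up for this example, and it assumes the 16
priorities (0..15) that a priomap spans. Given the priorities that have a
matching classifier rule, it prints those that DRR would drop as
unclassified.

```shell
# Print the priorities in 0..15 that are not covered by any rule.
# Arguments: the priorities that do have a matching classifier rule.
uncovered_priorities()
{
	local covered holes prio

	covered=" $* "
	holes=""
	prio=0
	while [ "$prio" -le 15 ]; do
		case "$covered" in
		*" $prio "*) ;;				# covered by a rule
		*) holes="${holes:+$holes }$prio" ;;	# a hole in coverage
		esac
		prio=$((prio + 1))
	done
	echo "$holes"
}

# Only priorities 0..3 have rules; everything else is a hole.
uncovered_priorities 0 1 2 3
```

A driver faced with a non-empty result would have to either emulate the
drop or refuse to offload, which is the complication mentioned above.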

So instead, this patch set introduces a new Qdisc, which is based on
802.1Qaz wording. It is PRIO-like in how it is configured, meaning one
needs to specify how many bands there are, how many are strict and how many
are ETS, quanta for the latter, and priomap.

The new Qdisc operates like the PRIO / DRR combo would when configured as
per the standard. The strict classes, if any, are tried for traffic first.
When there's no traffic in any of the strict queues, the ETS ones (if any)
are treated in the same way as in DRR.

The chosen interface makes the overall system both reasonably easy to
configure, and reasonably easy to offload. The extra code to support ETS in
mlxsw (which already supports PRIO) is about 150 lines, of which perhaps 20
lines is bona fide new business logic.

The credit-based shaping transmission selection algorithm can be configured
by adding a CBS Qdisc under one of the strict bands (TBF can be used to
similar effect). As a non-work-conserving Qdisc, CBS can't be hooked under
the ETS bands. This is detected and handled identically to DRR Qdisc at
runtime. Note that offloading CBS is not the subject of this patchset.
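
As a rough sketch of the above, a shaper could be hooked under the first
strict band as follows. TBF is used here instead of CBS, and the device
name, handles, band layout and rates are assumptions made up for this
example.

```shell
# Hypothetical: ETS root with two strict and two ETS bands on swp1.
tc qdisc add dev swp1 root handle 1: \
	ets strict 2 quanta 6000 4000 priomap 0 1 2 3
# Rate-limit the first strict band (class 1:1) with TBF.
tc qdisc add dev swp1 parent 1:1 handle 11: tbf \
	rate 100Mbit burst 1Mbit latency 100ms
```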

The patchset proceeds in four stages:

- Patches #1-#3 are cleanups.
- Patches #4 and #5 contain the new Qdisc.
- Patches #6 and #7 update mlxsw to offload the new Qdisc.
- Patches #8-#10 add selftests for ETS.

Examples:

- Add a Qdisc with 6 bands, 3 strict and 3 ETS with 45%-30%-25% weights:

    # tc qdisc add dev swp1 root handle 1: \
	ets strict 3 quanta 4500 3000 2500 priomap 0 1 1 1 2 3 4 5
    # tc qdisc sh dev swp1
    qdisc ets 1: root refcnt 2 bands 6 strict 3 quanta 4500 3000 2500 priomap 0 1 1 1 2 3 4 5 5 5 5 5 5 5 5 5

- Tweak quantum of one of the classes of the previous Qdisc:

    # tc class ch dev swp1 classid 1:4 ets quantum 1000
    # tc qdisc sh dev swp1
    qdisc ets 1: root refcnt 2 bands 6 strict 3 quanta 1000 3000 2500 priomap 0 1 1 1 2 3 4 5 5 5 5 5 5 5 5 5
    # tc class ch dev swp1 classid 1:3 ets quantum 1000
    Error: Strict bands do not have a configurable quantum.

- Purely strict Qdisc with 1:1 mapping between priorities and TCs:

    # tc qdisc add dev swp1 root handle 1: \
	ets strict 8 priomap 7 6 5 4 3 2 1 0
    # tc qdisc sh dev swp1
    qdisc ets 1: root refcnt 2 bands 8 strict 8 priomap 7 6 5 4 3 2 1 0 7 7 7 7 7 7 7 7

- Use "bands" to specify number of bands explicitly. Underspecified bands
  are implicitly ETS and their quantum is taken from MTU. The following
  thus gives each band the same weight:

    # tc qdisc add dev swp1 root handle 1: \
	ets bands 8 priomap 7 6 5 4 3 2 1 0
    # tc qdisc sh dev swp1
    qdisc ets 1: root refcnt 2 bands 8 quanta 1514 1514 1514 1514 1514 1514 1514 1514 priomap 7 6 5 4 3 2 1 0 7 7 7 7 7 7 7 7
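
The "tc qdisc sh" outputs above can be reproduced on paper with two small
helpers. This is a sketch under stated assumptions: the function names are
made up for this example, the padding rule is the one visible in the
outputs (unmentioned priorities map to the last band), and the percentages
use plain integer division, which may differ from how the kernel or a
driver rounds weights.

```shell
# Pad a priomap to 16 entries; priorities that are not mentioned map to
# the last band. Arguments: number of bands, then the given priomap.
priomap_expand()
{
	local nbands out n

	nbands=$1; shift
	out="$*"
	n=$#
	while [ "$n" -lt 16 ]; do
		out="${out:+$out }$((nbands - 1))"
		n=$((n + 1))
	done
	echo "$out"
}

# Express quanta as integer percentages of their sum.
quanta_percent()
{
	local sum q out

	sum=0
	out=""
	for q in "$@"; do
		sum=$((sum + q))
	done
	for q in "$@"; do
		out="${out:+$out }$((100 * q / sum))"
	done
	echo "$out"
}

priomap_expand 6 0 1 1 1 2 3 4 5
quanta_percent 4500 3000 2500
```

For the first example this yields the 16-entry priomap shown by "tc qdisc
sh" and the 45%-30%-25% split of the three ETS bands.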

v2:
- This addresses points raised by David Miller.
- Patch #4:
    - sch_ets.c: Add a comment with description of the Qdisc and the
      dequeuing algorithm.
    - Kconfig: Add a high-level description to the help blurb.

v1:
- No changes, first upstream submission after RFC.

v3 (internal):
- This addresses review from Jiri Pirko.
- Patch #3:
    - Rename to _HR_ instead of to _HIERARCHY_.
- Patch #4:
    - pkt_sched.h: Keep all the TCA_ETS_ constants in one enum.
    - pkt_sched.h: Rename TCA_ETS_BANDS to _NBANDS, _STRICT to _NSTRICT,
      _BAND_QUANTUM to _QUANTA_BAND and _PMAP_BAND to _PRIOMAP_BAND.
    - sch_ets.c: Update to reflect the above changes. Add a new policy,
      ets_class_policy, which is used when parsing class changes.
      Currently that policy is the same as the quanta policy, but that
      might change.
    - sch_ets.c: Move MTU handling from ets_quantum_parse() to the one
      caller that makes use of it.
    - sch_ets.c: ets_qdisc_priomap_parse(): WARN_ON_ONCE on invalid
      attribute instead of returning an extack.
- Patch #6:
    - __mlxsw_sp_qdisc_ets_replace(): Pass the weights argument to this
      function in this patch already. Drop the weight computation.
    - mlxsw_sp_qdisc_prio_replace(): Rename "quanta" to "zeroes" and
      pass for the abovementioned "weights".
    - mlxsw_sp_qdisc_prio_graft(): Convert to a wrapper around
      __mlxsw_sp_qdisc_ets_graft(), instead of invoking the latter
      directly from mlxsw_sp_setup_tc_prio().
    - Update to follow the _HIERARCHY_ -> _HR_ renaming.
- Patch #7:
    - __mlxsw_sp_qdisc_ets_replace(): The "weights" argument passing and
      weight computation removal are now done in a previous patch.
    - mlxsw_sp_setup_tc_ets(): Drop case TC_ETS_REPLACE, which is handled
      earlier in the function.
- Patch #3 (iproute2):
    - Add an example output to the commit message.
    - tc-ets.8: Fix output of two examples.
    - tc-ets.8: Describe default values of "bands", "quanta".
    - q_ets.c: A number of fixes in error messages.
    - q_ets.c: Comment formatting: /*padding*/ -> /* padding */
    - q_ets.c: parse_nbands: Move duplicate checking to callers.
    - q_ets.c: Don't accept both "quantum" and "quanta" as equivalent.

v2 (internal):
- This addresses review from Ido Schimmel and comments from Alexander
  Kushnarov.
- Patch #2:
    - s/coment/comment in the commit message.
- Patch #4:
    - sch_ets: ets_class_is_strict(), ets_class_id(): Constify an argument
    - ets_class_find(): RXTify
- Patch #3 (iproute2):
    - tc-ets.8: some spelling fixes
    - tc-ets.8: add another example
    - tc.8: add an ETS to "CLASSFUL QDISCS" section

v1 (internal):
- This addresses RFC reviews from Ido Schimmel and Roman Mashak, bugs found
  by Alexander Petrovskiy and myself, and other improvements.
- Patch #2:
    - Expand the explanation with an explicit example.
- Patch #4:
    - Kconfig: s/sch_drr/sch_ets/
    - sch_ets: Reorder includes to be in alphabetical order
    - sch_ets: ets_quantum_parse(): Rename the return-pointer argument
      from pquantum to quantum, and use it directly, not going through a
      local temporary.
    - sch_ets: ets_qdisc_quanta_parse(): Convert syntax of function
      argument "quanta" from an array to a pointer.
    - sch_ets: ets_qdisc_priomap_parse(): Likewise with "priomap".
    - sch_ets: ets_qdisc_quanta_parse(), ets_qdisc_priomap_parse(): Invoke
      __nla_validate_nested directly instead of nl80211_validate_nested().
    - sch_ets: ets_qdisc_quanta_parse(): WARN_ON_ONCE on invalid attribute
      instead of returning an extack.
    - sch_ets: ets_qdisc_change(): Make the last band the default one for
      unmentioned priomap priorities.
    - sch_ets: Fix a panic when an offloaded child in a bandwidth-sharing
      band notified its ETS parent.
    - sch_ets: When ungrafting, add the newly-created invisible FIFO to
      the Qdisc hash
- Patch #5:
    - pkt_cls.h: Note that quantum=0 signifies a strict band.
    - Fix error path handling when ets_offload_dump() fails.
- Patch #6:
    - __mlxsw_sp_qdisc_ets_replace(): Convert syntax of function arguments
      "quanta" and "priomap" from arrays to pointers.
- Patch #7:
    - __mlxsw_sp_qdisc_ets_replace(): Convert syntax of function argument
      "weights" from an array to a pointer.
- Patch #9:
    - mlxsw/sch_ets.sh: Add a comment explaining packet prioritization.
    - Adjust the whole suite to allow testing of traffic classifiers
      in addition to testing priomap.
- Patch #10:
    - Add a number of new tests to test default priomap band, overlarge
      number of bands, zeroes in quanta, and altogether missing quanta.
- Patch #1 (iproute2):
    - State motivation for inclusion of this patch in the patchset in the
      commit message.
- Patch #3 (iproute2):
    - tc-ets.8: it is now December
    - tc-ets.8: explain inactivity WRT using non-WC Qdiscs under ETS band
    - tc-ets.8: s/flow/band in explanation of quantum
    - tc-ets.8: explain what happens with priorities not covered by priomap
    - tc-ets.8: default priomap band is now the last one
    - q_ets.c: ets_parse_opt(): Remove unnecessary initialization of
      priomap and quanta.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
parents cbd22f17 82c664b6
@@ -3477,10 +3477,10 @@ MLXSW_REG_DEFINE(qeec, MLXSW_REG_QEEC_ID, MLXSW_REG_QEEC_LEN);
 MLXSW_ITEM32(reg, qeec, local_port, 0x00, 16, 8);
 enum mlxsw_reg_qeec_hr {
-MLXSW_REG_QEEC_HIERARCY_PORT,
-MLXSW_REG_QEEC_HIERARCY_GROUP,
-MLXSW_REG_QEEC_HIERARCY_SUBGROUP,
-MLXSW_REG_QEEC_HIERARCY_TC,
+MLXSW_REG_QEEC_HR_PORT,
+MLXSW_REG_QEEC_HR_GROUP,
+MLXSW_REG_QEEC_HR_SUBGROUP,
+MLXSW_REG_QEEC_HR_TC,
 };
 /* reg_qeec_element_hierarchy
@@ -3618,8 +3618,7 @@ static inline void mlxsw_reg_qeec_ptps_pack(char *payload, u8 local_port,
 {
 MLXSW_REG_ZERO(qeec, payload);
 mlxsw_reg_qeec_local_port_set(payload, local_port);
-mlxsw_reg_qeec_element_hierarchy_set(payload,
-MLXSW_REG_QEEC_HIERARCY_PORT);
+mlxsw_reg_qeec_element_hierarchy_set(payload, MLXSW_REG_QEEC_HR_PORT);
 mlxsw_reg_qeec_ptps_set(payload, ptps);
 }
......
@@ -1796,6 +1796,8 @@ static int mlxsw_sp_setup_tc(struct net_device *dev, enum tc_setup_type type,
 return mlxsw_sp_setup_tc_red(mlxsw_sp_port, type_data);
 case TC_SETUP_QDISC_PRIO:
 return mlxsw_sp_setup_tc_prio(mlxsw_sp_port, type_data);
+case TC_SETUP_QDISC_ETS:
+return mlxsw_sp_setup_tc_ets(mlxsw_sp_port, type_data);
 default:
 return -EOPNOTSUPP;
 }
@@ -3602,26 +3604,25 @@ static int mlxsw_sp_port_ets_init(struct mlxsw_sp_port *mlxsw_sp_port)
 * one subgroup, which are all member in the same group.
 */
 err = mlxsw_sp_port_ets_set(mlxsw_sp_port,
-MLXSW_REG_QEEC_HIERARCY_GROUP, 0, 0, false,
-0);
+MLXSW_REG_QEEC_HR_GROUP, 0, 0, false, 0);
 if (err)
 return err;
 for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
 err = mlxsw_sp_port_ets_set(mlxsw_sp_port,
-MLXSW_REG_QEEC_HIERARCY_SUBGROUP, i,
+MLXSW_REG_QEEC_HR_SUBGROUP, i,
 0, false, 0);
 if (err)
 return err;
 }
 for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
 err = mlxsw_sp_port_ets_set(mlxsw_sp_port,
-MLXSW_REG_QEEC_HIERARCY_TC, i, i,
+MLXSW_REG_QEEC_HR_TC, i, i,
 false, 0);
 if (err)
 return err;
 err = mlxsw_sp_port_ets_set(mlxsw_sp_port,
-MLXSW_REG_QEEC_HIERARCY_TC,
+MLXSW_REG_QEEC_HR_TC,
 i + 8, i,
 true, 100);
 if (err)
@@ -3633,13 +3634,13 @@ static int mlxsw_sp_port_ets_init(struct mlxsw_sp_port *mlxsw_sp_port)
 * for the initial configuration.
 */
 err = mlxsw_sp_port_ets_maxrate_set(mlxsw_sp_port,
-MLXSW_REG_QEEC_HIERARCY_PORT, 0, 0,
+MLXSW_REG_QEEC_HR_PORT, 0, 0,
 MLXSW_REG_QEEC_MAS_DIS);
 if (err)
 return err;
 for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
 err = mlxsw_sp_port_ets_maxrate_set(mlxsw_sp_port,
-MLXSW_REG_QEEC_HIERARCY_SUBGROUP,
+MLXSW_REG_QEEC_HR_SUBGROUP,
 i, 0,
 MLXSW_REG_QEEC_MAS_DIS);
 if (err)
@@ -3647,14 +3648,14 @@ static int mlxsw_sp_port_ets_init(struct mlxsw_sp_port *mlxsw_sp_port)
 }
 for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
 err = mlxsw_sp_port_ets_maxrate_set(mlxsw_sp_port,
-MLXSW_REG_QEEC_HIERARCY_TC,
+MLXSW_REG_QEEC_HR_TC,
 i, i,
 MLXSW_REG_QEEC_MAS_DIS);
 if (err)
 return err;
 err = mlxsw_sp_port_ets_maxrate_set(mlxsw_sp_port,
-MLXSW_REG_QEEC_HIERARCY_TC,
+MLXSW_REG_QEEC_HR_TC,
 i + 8, i,
 MLXSW_REG_QEEC_MAS_DIS);
 if (err)
@@ -3664,7 +3665,7 @@ static int mlxsw_sp_port_ets_init(struct mlxsw_sp_port *mlxsw_sp_port)
 /* Configure the min shaper for multicast TCs. */
 for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
 err = mlxsw_sp_port_min_bw_set(mlxsw_sp_port,
-MLXSW_REG_QEEC_HIERARCY_TC,
+MLXSW_REG_QEEC_HR_TC,
 i + 8, i,
 MLXSW_REG_QEEC_MIS_MIN);
 if (err)
......
@@ -852,6 +852,8 @@ int mlxsw_sp_setup_tc_red(struct mlxsw_sp_port *mlxsw_sp_port,
 struct tc_red_qopt_offload *p);
 int mlxsw_sp_setup_tc_prio(struct mlxsw_sp_port *mlxsw_sp_port,
 struct tc_prio_qopt_offload *p);
+int mlxsw_sp_setup_tc_ets(struct mlxsw_sp_port *mlxsw_sp_port,
+struct tc_ets_qopt_offload *p);
 /* spectrum_fid.c */
 bool mlxsw_sp_fid_is_dummy(struct mlxsw_sp *mlxsw_sp, u16 fid_index);
......
@@ -160,7 +160,7 @@ static int __mlxsw_sp_dcbnl_ieee_setets(struct mlxsw_sp_port *mlxsw_sp_port,
 u8 weight = ets->tc_tx_bw[i];
 err = mlxsw_sp_port_ets_set(mlxsw_sp_port,
-MLXSW_REG_QEEC_HIERARCY_SUBGROUP, i,
+MLXSW_REG_QEEC_HR_SUBGROUP, i,
 0, dwrr, weight);
 if (err) {
 netdev_err(dev, "Failed to link subgroup ETS element %d to group\n",
@@ -198,7 +198,7 @@ static int __mlxsw_sp_dcbnl_ieee_setets(struct mlxsw_sp_port *mlxsw_sp_port,
 u8 weight = my_ets->tc_tx_bw[i];
 err = mlxsw_sp_port_ets_set(mlxsw_sp_port,
-MLXSW_REG_QEEC_HIERARCY_SUBGROUP, i,
+MLXSW_REG_QEEC_HR_SUBGROUP, i,
 0, dwrr, weight);
 }
 return err;
@@ -507,7 +507,7 @@ static int mlxsw_sp_dcbnl_ieee_setmaxrate(struct net_device *dev,
 for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
 err = mlxsw_sp_port_ets_maxrate_set(mlxsw_sp_port,
-MLXSW_REG_QEEC_HIERARCY_SUBGROUP,
+MLXSW_REG_QEEC_HR_SUBGROUP,
 i, 0,
 maxrate->tc_maxrate[i]);
 if (err) {
@@ -523,7 +523,7 @@ static int mlxsw_sp_dcbnl_ieee_setmaxrate(struct net_device *dev,
 err_port_ets_maxrate_set:
 for (i--; i >= 0; i--)
 mlxsw_sp_port_ets_maxrate_set(mlxsw_sp_port,
-MLXSW_REG_QEEC_HIERARCY_SUBGROUP,
+MLXSW_REG_QEEC_HR_SUBGROUP,
 i, 0, my_maxrate->tc_maxrate[i]);
 return err;
 }
......
@@ -849,6 +849,7 @@ enum tc_setup_type {
 TC_SETUP_QDISC_GRED,
 TC_SETUP_QDISC_TAPRIO,
 TC_SETUP_FT,
+TC_SETUP_QDISC_ETS,
 };
 /* These structures hold the attributes of bpf state that are being passed
......
@@ -791,9 +791,8 @@ enum tc_prio_command {
 struct tc_prio_qopt_offload_params {
 int bands;
 u8 priomap[TC_PRIO_MAX + 1];
-/* In case that a prio qdisc is offloaded and now is changed to a
-* non-offloadedable config, it needs to update the backlog & qlen
-* values to negate the HW backlog & qlen values (and only them).
+/* At the point of un-offloading the Qdisc, the reported backlog and
+* qlen need to be reduced by the portion that is in HW.
 */
 struct gnet_stats_queue *qstats;
 };
@@ -824,4 +823,35 @@ struct tc_root_qopt_offload {
 bool ingress;
 };
+enum tc_ets_command {
+TC_ETS_REPLACE,
+TC_ETS_DESTROY,
+TC_ETS_STATS,
+TC_ETS_GRAFT,
+};
+struct tc_ets_qopt_offload_replace_params {
+unsigned int bands;
+u8 priomap[TC_PRIO_MAX + 1];
+unsigned int quanta[TCQ_ETS_MAX_BANDS]; /* 0 for strict bands. */
+unsigned int weights[TCQ_ETS_MAX_BANDS];
+struct gnet_stats_queue *qstats;
+};
+struct tc_ets_qopt_offload_graft_params {
+u8 band;
+u32 child_handle;
+};
+struct tc_ets_qopt_offload {
+enum tc_ets_command command;
+u32 handle;
+u32 parent;
+union {
+struct tc_ets_qopt_offload_replace_params replace_params;
+struct tc_qopt_offload_stats stats;
+struct tc_ets_qopt_offload_graft_params graft_params;
+};
+};
 #endif
@@ -1187,4 +1187,21 @@ enum {
 #define TCA_TAPRIO_ATTR_MAX (__TCA_TAPRIO_ATTR_MAX - 1)
+/* ETS */
+#define TCQ_ETS_MAX_BANDS 16
+enum {
+TCA_ETS_UNSPEC,
+TCA_ETS_NBANDS, /* u8 */
+TCA_ETS_NSTRICT, /* u8 */
+TCA_ETS_QUANTA, /* nested TCA_ETS_QUANTA_BAND */
+TCA_ETS_QUANTA_BAND, /* u32 */
+TCA_ETS_PRIOMAP, /* nested TCA_ETS_PRIOMAP_BAND */
+TCA_ETS_PRIOMAP_BAND, /* u8 */
+__TCA_ETS_MAX,
+};
+#define TCA_ETS_MAX (__TCA_ETS_MAX - 1)
 #endif
@@ -409,6 +409,23 @@ config NET_SCH_PLUG
 To compile this code as a module, choose M here: the
 module will be called sch_plug.
+config NET_SCH_ETS
+tristate "Enhanced transmission selection scheduler (ETS)"
+help
+The Enhanced Transmission Selection scheduler is a classful
+queuing discipline that merges functionality of PRIO and DRR
+qdiscs in one scheduler. ETS makes it easy to configure a set of
+strict and bandwidth-sharing bands to implement the transmission
+selection described in 802.1Qaz.
+Say Y here if you want to use the ETS packet scheduling
+algorithm.
+To compile this driver as a module, choose M here: the module
+will be called sch_ets.
+If unsure, say N.
 menuconfig NET_SCH_DEFAULT
 bool "Allow override default queue discipline"
 ---help---
......
@@ -48,6 +48,7 @@ obj-$(CONFIG_NET_SCH_ATM) += sch_atm.o
 obj-$(CONFIG_NET_SCH_NETEM) += sch_netem.o
 obj-$(CONFIG_NET_SCH_DRR) += sch_drr.o
 obj-$(CONFIG_NET_SCH_PLUG) += sch_plug.o
+obj-$(CONFIG_NET_SCH_ETS) += sch_ets.o
 obj-$(CONFIG_NET_SCH_MQPRIO) += sch_mqprio.o
 obj-$(CONFIG_NET_SCH_SKBPRIO) += sch_skbprio.o
 obj-$(CONFIG_NET_SCH_CHOKE) += sch_choke.o
......
This diff is collapsed.
@@ -24,24 +24,6 @@ rate()
 echo $((8 * (t1 - t0) / interval))
 }
-start_traffic()
-{
-local h_in=$1; shift # Where the traffic egresses the host
-local sip=$1; shift
-local dip=$1; shift
-local dmac=$1; shift
-$MZ $h_in -p 8000 -A $sip -B $dip -c 0 \
--a own -b $dmac -t udp -q &
-sleep 1
-}
-stop_traffic()
-{
-# Suppress noise from killing mausezahn.
-{ kill %% && wait %%; } 2>/dev/null
-}
 check_rate()
 {
 local rate=$1; shift
@@ -96,3 +78,31 @@ measure_rate()
 echo $ir $er
 return $ret
 }
+bail_on_lldpad()
+{
+if systemctl is-active --quiet lldpad; then
+cat >/dev/stderr <<-EOF
+WARNING: lldpad is running
+lldpad will likely configure DCB, and this test will
+configure Qdiscs. mlxsw does not support both at the
+same time, one of them is arbitrarily going to overwrite
+the other. That will cause spurious failures (or,
+unlikely, passes) of this test.
+EOF
+if [[ -z $ALLOW_LLDPAD ]]; then
+cat >/dev/stderr <<-EOF
+If you want to run the test anyway, please set
+an environment variable ALLOW_LLDPAD to a
+non-empty string.
+EOF
+exit 1
+else
+return
+fi
+fi
+}
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
# A driver for the ETS selftest that implements testing in offloaded datapath.
lib_dir=$(dirname $0)/../../../net/forwarding
source $lib_dir/sch_ets_core.sh
source $lib_dir/devlink_lib.sh
source qos_lib.sh
ALL_TESTS="
ping_ipv4
priomap_mode
ets_test_strict
ets_test_mixed
ets_test_dwrr
"
switch_create()
{
ets_switch_create
# Create a bottleneck so that the DWRR process can kick in.
ethtool -s $h2 speed 1000 autoneg off
ethtool -s $swp2 speed 1000 autoneg off
# Set the ingress quota high and use the three egress TCs to limit the
# amount of traffic that is admitted to the shared buffers. This makes
# sure that there is always enough traffic of all types to select from
# for the DWRR process.
devlink_port_pool_th_set $swp1 0 12
devlink_tc_bind_pool_th_set $swp1 0 ingress 0 12
devlink_port_pool_th_set $swp2 4 12
devlink_tc_bind_pool_th_set $swp2 7 egress 4 5
devlink_tc_bind_pool_th_set $swp2 6 egress 4 5
devlink_tc_bind_pool_th_set $swp2 5 egress 4 5
# Note: sch_ets_core.sh uses VLAN ingress-qos-map to assign packet
# priorities at $swp1 based on their 802.1p headers. ingress-qos-map is
# not offloaded by mlxsw as of this writing, but the mapping used is
# 1:1, which is the mapping currently hard-coded by the driver.
}
switch_destroy()
{
devlink_tc_bind_pool_th_restore $swp2 5 egress
devlink_tc_bind_pool_th_restore $swp2 6 egress
devlink_tc_bind_pool_th_restore $swp2 7 egress
devlink_port_pool_th_restore $swp2 4
devlink_tc_bind_pool_th_restore $swp1 0 ingress
devlink_port_pool_th_restore $swp1 0
ethtool -s $swp2 autoneg on
ethtool -s $h2 autoneg on
ets_switch_destroy
}
# Callback from sch_ets_tests.sh
get_stats()
{
local band=$1; shift
ethtool_stats_get "$h2" rx_octets_prio_$band
}
bail_on_lldpad
ets_run
@@ -1065,3 +1065,21 @@ flood_test()
 flood_unicast_test $br_port $host1_if $host2_if
 flood_multicast_test $br_port $host1_if $host2_if
 }
+start_traffic()
+{
+local h_in=$1; shift # Where the traffic egresses the host
+local sip=$1; shift
+local dip=$1; shift
+local dmac=$1; shift
+$MZ $h_in -p 8000 -A $sip -B $dip -c 0 \
+-a own -b $dmac -t udp -q &
+sleep 1
+}
+stop_traffic()
+{
+# Suppress noise from killing mausezahn.
+{ kill %% && wait %%; } 2>/dev/null
+}
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
# A driver for the ETS selftest that implements testing in slowpath.
lib_dir=.
source sch_ets_core.sh
ALL_TESTS="
ping_ipv4
priomap_mode
ets_test_strict
ets_test_mixed
ets_test_dwrr
classifier_mode
ets_test_strict
ets_test_mixed
ets_test_dwrr
"
switch_create()
{
ets_switch_create
# Create a bottleneck so that the DWRR process can kick in.
tc qdisc add dev $swp2 root handle 1: tbf \
rate 1Gbit burst 1Mbit latency 100ms
PARENT="parent 1:"
}
switch_destroy()
{
ets_switch_destroy
tc qdisc del dev $swp2 root
}
# Callback from sch_ets_tests.sh
get_stats()
{
local stream=$1; shift
link_stats_get $h2.1$stream rx bytes
}
ets_run
# SPDX-License-Identifier: GPL-2.0
# This is a template for ETS Qdisc test.
#
# This test sends from H1 several traffic streams with 802.1p-tagged packets.
# The tags are used at $swp1 to prioritize the traffic. Each stream is then
# queued at a different ETS band according to the assigned priority. After
# running for a while, counters at H2 are consulted to determine whether the
# traffic scheduling was according to the ETS configuration.
#
# This template is supposed to be embedded by a test driver, which implements
# statistics collection, any HW-specific stuff, and prominently configures the
# system to assure that there is overcommitment at $swp2. That is necessary so
# that the ETS traffic selection algorithm kicks in and has to schedule some
# traffic at the expense of other.
#
# A driver for veth-based testing is in sch_ets.sh, an example of a driver for
# an offloaded data path is in selftests/drivers/net/mlxsw/sch_ets.sh.
#
# +---------------------------------------------------------------------+
# | H1 |
# | + $h1.10 + $h1.11 + $h1.12 |
# | | 192.0.2.1/28 | 192.0.2.17/28 | 192.0.2.33/28 |
# | | egress-qos-map | egress-qos-map | egress-qos-map |
# | | 0:0 | 0:1 | 0:2 |
# | \____________________ | ____________________/ |
# | \|/ |
# | + $h1 |
# +---------------------------|-----------------------------------------+
# |
# +---------------------------|-----------------------------------------+
# | SW + $swp1 |
# | | >1Gbps |
# | ____________________/|\____________________ |
# | / | \ |
# | +--|----------------+ +--|----------------+ +--|----------------+ |
# | | + $swp1.10 | | + $swp1.11 | | + $swp1.12 | |
# | | ingress-qos-map| | ingress-qos-map| | ingress-qos-map| |
# | | 0:0 1:1 2:2 | | 0:0 1:1 2:2 | | 0:0 1:1 2:2 | |
# | | | | | | | |
# | | BR10 | | BR11 | | BR12 | |
# | | | | | | | |
# | | + $swp2.10 | | + $swp2.11 | | + $swp2.12 | |
# | +--|----------------+ +--|----------------+ +--|----------------+ |
# | \____________________ | ____________________/ |
# | \|/ |
# | + $swp2 |
# | | 1Gbps (ethtool or TBF qdisc) |
# | | qdisc ets quanta $W0 $W1 $W2 |
# | | priomap 0 1 2 |
# +---------------------------|-----------------------------------------+
# |
# +---------------------------|-----------------------------------------+
# | H2 + $h2 |
# | ____________________/|\____________________ |
# | / | \ |
# | + $h2.10 + $h2.11 + $h2.12 |
# | 192.0.2.2/28 192.0.2.18/28 192.0.2.34/28 |
# +---------------------------------------------------------------------+
NUM_NETIFS=4
CHECK_TC=yes
source $lib_dir/lib.sh
source $lib_dir/sch_ets_tests.sh
PARENT=root
QDISC_DEV=
sip()
{
echo 192.0.2.$((16 * $1 + 1))
}
dip()
{
echo 192.0.2.$((16 * $1 + 2))
}
# Callback from sch_ets_tests.sh
ets_start_traffic()
{
local dst_mac=$(mac_get $h2)
local i=$1; shift
start_traffic $h1.1$i $(sip $i) $(dip $i) $dst_mac
}
ETS_CHANGE_QDISC=
priomap_mode()
{
echo "Running in priomap mode"
ets_delete_qdisc
ETS_CHANGE_QDISC=ets_change_qdisc_priomap
}
classifier_mode()
{
echo "Running in classifier mode"
ets_delete_qdisc
ETS_CHANGE_QDISC=ets_change_qdisc_classifier
}
ets_change_qdisc_priomap()
{
local dev=$1; shift
local nstrict=$1; shift
local priomap=$1; shift
local quanta=("${@}")
local op=$(if [[ -n $QDISC_DEV ]]; then echo change; else echo add; fi)
tc qdisc $op dev $dev $PARENT handle 10: ets \
$(if ((nstrict)); then echo strict $nstrict; fi) \
$(if ((${#quanta[@]})); then echo quanta ${quanta[@]}; fi) \
priomap $priomap
QDISC_DEV=$dev
}
ets_change_qdisc_classifier()
{
local dev=$1; shift
local nstrict=$1; shift
local priomap=$1; shift
local quanta=("${@}")
local op=$(if [[ -n $QDISC_DEV ]]; then echo change; else echo add; fi)
tc qdisc $op dev $dev $PARENT handle 10: ets \
$(if ((nstrict)); then echo strict $nstrict; fi) \
$(if ((${#quanta[@]})); then echo quanta ${quanta[@]}; fi)
if [[ $op == add ]]; then
local prio=0
local band
for band in $priomap; do
tc filter add dev $dev parent 10: basic \
match "meta(priority eq $prio)" \
flowid 10:$((band + 1))
((prio++))
done
fi
QDISC_DEV=$dev
}
# Callback from sch_ets_tests.sh
ets_change_qdisc()
{
if [[ -z "$ETS_CHANGE_QDISC" ]]; then
exit 1
fi
$ETS_CHANGE_QDISC "$@"
}
ets_delete_qdisc()
{
if [[ -n $QDISC_DEV ]]; then
tc qdisc del dev $QDISC_DEV $PARENT
QDISC_DEV=
fi
}
h1_create()
{
local i;
simple_if_init $h1
mtu_set $h1 9900
for i in {0..2}; do
vlan_create $h1 1$i v$h1 $(sip $i)/28
ip link set dev $h1.1$i type vlan egress 0:$i
done
}
h1_destroy()
{
local i
for i in {0..2}; do
vlan_destroy $h1 1$i
done
mtu_restore $h1
simple_if_fini $h1
}
h2_create()
{
local i
simple_if_init $h2
mtu_set $h2 9900
for i in {0..2}; do
vlan_create $h2 1$i v$h2 $(dip $i)/28
done
}
h2_destroy()
{
local i
for i in {0..2}; do
vlan_destroy $h2 1$i
done
mtu_restore $h2
simple_if_fini $h2
}
ets_switch_create()
{
local i
ip link set dev $swp1 up
mtu_set $swp1 9900
ip link set dev $swp2 up
mtu_set $swp2 9900
for i in {0..2}; do
vlan_create $swp1 1$i
ip link set dev $swp1.1$i type vlan ingress 0:0 1:1 2:2
vlan_create $swp2 1$i
ip link add dev br1$i type bridge
ip link set dev $swp1.1$i master br1$i
ip link set dev $swp2.1$i master br1$i
ip link set dev br1$i up
ip link set dev $swp1.1$i up
ip link set dev $swp2.1$i up
done
}
ets_switch_destroy()
{
local i
ets_delete_qdisc
for i in {0..2}; do
ip link del dev br1$i
vlan_destroy $swp2 1$i
vlan_destroy $swp1 1$i
done
mtu_restore $swp2
ip link set dev $swp2 down
mtu_restore $swp1
ip link set dev $swp1 down
}
setup_prepare()
{
h1=${NETIFS[p1]}
swp1=${NETIFS[p2]}
swp2=${NETIFS[p3]}
h2=${NETIFS[p4]}
put=$swp2
hut=$h2
vrf_prepare
h1_create
h2_create
switch_create
}
cleanup()
{
pre_cleanup
switch_destroy
h2_destroy
h1_destroy
vrf_cleanup
}
ping_ipv4()
{
ping_test $h1.10 $(dip 0) " vlan 10"
ping_test $h1.11 $(dip 1) " vlan 11"
ping_test $h1.12 $(dip 2) " vlan 12"
}
ets_run()
{
trap cleanup EXIT
setup_prepare
setup_wait
tests_run
exit $EXIT_STATUS
}
# SPDX-License-Identifier: GPL-2.0
# Global interface:
# $put -- port under test (e.g. $swp2)
# get_stats($band) -- A function to collect stats for band
# ets_start_traffic($band) -- Start traffic for this band
# ets_change_qdisc($dev, $nstrict, $priomap, $quanta...) -- Add or change qdisc
# WS describes the Qdisc configuration. It has one value per band (so the
# number of array elements indicates the number of bands). If the value is
# 0, it is a strict band, otherwise it's a DRR band and the value is
# that band's quantum.
declare -a WS
qdisc_describe()
{
local nbands=${#WS[@]}
local nstrict=0
local i
for ((i = 0; i < nbands; i++)); do
if ((!${WS[$i]})); then
: $((nstrict++))
fi
done
echo -n "ets bands $nbands"
if ((nstrict)); then
echo -n " strict $nstrict"
fi
if ((nstrict < nbands)); then
echo -n " quanta"
for ((i = nstrict; i < nbands; i++)); do
echo -n " ${WS[$i]}"
done
fi
}
__strict_eval()
{
local desc=$1; shift
local d=$1; shift
local total=$1; shift
local above=$1; shift
RET=0
if ((! total)); then
check_err 1 "No traffic observed"
log_test "$desc"
return
fi
local ratio=$(echo "scale=2; 100 * $d / $total" | bc -l)
if ((above)); then
test $(echo "$ratio > 95.0" | bc -l) -eq 1
check_err $? "Not enough traffic"
log_test "$desc"
log_info "Expected ratio >95% Measured ratio $ratio"
else
test $(echo "$ratio < 5" | bc -l) -eq 1
check_err $? "Too much traffic"
log_test "$desc"
log_info "Expected ratio <5% Measured ratio $ratio"
fi
}
strict_eval()
{
__strict_eval "$@" 1
}
notraf_eval()
{
__strict_eval "$@" 0
}
__ets_dwrr_test()
{
local -a streams=("$@")
local low_stream=${streams[0]}
local seen_strict=0
local -a t0 t1 d
local stream
local total
local i
echo "Testing $(qdisc_describe), streams ${streams[@]}"
for stream in ${streams[@]}; do
ets_start_traffic $stream
done
sleep 10
t0=($(for stream in ${streams[@]}; do
get_stats $stream
done))
sleep 10
t1=($(for stream in ${streams[@]}; do
get_stats $stream
done))
d=($(for ((i = 0; i < ${#streams[@]}; i++)); do
echo $((${t1[$i]} - ${t0[$i]}))
done))
total=$(echo ${d[@]} | sed 's/ /+/g' | bc)
for ((i = 0; i < ${#streams[@]}; i++)); do
local stream=${streams[$i]}
if ((seen_strict)); then
notraf_eval "band $stream" ${d[$i]} $total
elif ((${WS[$stream]} == 0)); then
strict_eval "band $stream" ${d[$i]} $total
seen_strict=1
elif ((stream == low_stream)); then
# Low stream is used as DWRR evaluation reference.
continue
else
multipath_eval "bands $low_stream:$stream" \
${WS[$low_stream]} ${WS[$stream]} \
${d[0]} ${d[$i]}
fi
done
for stream in ${streams[@]}; do
stop_traffic
done
}
ets_dwrr_test_012()
{
__ets_dwrr_test 0 1 2
}
ets_dwrr_test_01()
{
__ets_dwrr_test 0 1
}
ets_dwrr_test_12()
{
__ets_dwrr_test 1 2
}
ets_qdisc_setup()
{
local dev=$1; shift
local nstrict=$1; shift
local -a quanta=("$@")
local ndwrr=${#quanta[@]}
local nbands=$((nstrict + ndwrr))
local nstreams=$(if ((nbands > 3)); then echo 3; else echo $nbands; fi)
local priomap=$(seq 0 $((nstreams - 1)))
local i
WS=($(
for ((i = 0; i < nstrict; i++)); do
echo 0
done
for ((i = 0; i < ndwrr; i++)); do
echo ${quanta[$i]}
done
))
ets_change_qdisc $dev $nstrict "$priomap" ${quanta[@]}
}
ets_set_dwrr_uniform()
{
ets_qdisc_setup $put 0 3300 3300 3300
}
ets_set_dwrr_varying()
{
ets_qdisc_setup $put 0 5000 3500 1500
}
ets_set_strict()
{
ets_qdisc_setup $put 3
}
ets_set_mixed()
{
ets_qdisc_setup $put 1 5000 2500 1500
}
ets_change_quantum()
{
tc class change dev $put classid 10:2 ets quantum 8000
WS[1]=8000
}
ets_set_dwrr_two_bands()
{
ets_qdisc_setup $put 0 5000 2500
}
ets_test_strict()
{
ets_set_strict
ets_dwrr_test_01
ets_dwrr_test_12
}
ets_test_mixed()
{
ets_set_mixed
ets_dwrr_test_01
ets_dwrr_test_12
}
ets_test_dwrr()
{
ets_set_dwrr_uniform
ets_dwrr_test_012
ets_set_dwrr_varying
ets_dwrr_test_012
ets_change_quantum
ets_dwrr_test_012
ets_set_dwrr_two_bands
ets_dwrr_test_01
}
This diff is collapsed.