Commit 5b27aafa authored by David S. Miller

Merge branch 'net-sched-taprio-change-schedules'

Vinicius Costa Gomes says:

====================
net/sched: taprio change schedules

Changes from RFC:
 - Removed the patches for taprio offloading, because of the lack of
   in-tree users;
 - Updated the links to point to the PATCH version of this series;

Original cover letter:

Overview
--------

This RFC has two objectives: it adds support for changing the running
schedules at runtime (explained in more detail later), and it
proposes an interface between taprio and the drivers for hardware
offloading.

These two different features are presented together so it's clear what
the "final state" would look like. But after the RFC stage, they can
be proposed (and reviewed) separately.

Changing the schedules without disrupting traffic is important for
handling dynamic use cases, for example, when streams are
added/removed and when the network configuration changes.

Hardware offloading support allows schedules to be more precise and
have lower resource usage.

Changing schedules
------------------

As with the other interfaces we proposed, we try to use the same
concepts as the IEEE 802.1Q-2018 specification. So, for changing
schedules, there is an "oper" (operational) schedule and an "admin"
schedule. The "admin" schedule is mutable and not in use; the "oper"
schedule is immutable and in use.

That is, when the user first adds a schedule it is in the "admin"
state, and it becomes "oper" when its base-time (basically when it
starts) is reached.

What this means is that now it's possible to create taprio with a schedule:

$ tc qdisc add dev IFACE parent root handle 100 taprio \
      num_tc 3 \
      map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
      queues 1@0 1@1 2@2 \
      base-time 10000000 \
      sched-entry S 03 300000 \
      sched-entry S 02 300000 \
      sched-entry S 06 400000 \
      clockid CLOCK_TAI

And then, later, after the previous schedule is "promoted" to "oper",
add a new ("admin") schedule to be used some time later:

$ tc qdisc change dev IFACE parent root handle 100 taprio \
      base-time 1553121866000000000 \
      sched-entry S 02 500000 \
      sched-entry S 0f 400000 \
      clockid CLOCK_TAI
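
The admin/oper life cycle described above can be sketched in user-space
C. This is only a minimal model under stated assumptions, not the kernel
implementation: `struct schedule`, `struct gate_control`, `set_admin()`
and `maybe_promote()` are hypothetical names, and a schedule is reduced
to its base-time.

```c
#include <stdint.h>

/* Hypothetical, simplified model: a schedule is only its base-time. */
struct schedule {
	int64_t base_time;	/* ns, when the schedule starts */
	int valid;		/* 0 means "this slot is empty" */
};

struct gate_control {
	struct schedule oper;	/* in use, never modified in place */
	struct schedule admin;	/* pending, may be replaced at any time */
};

/* Install or replace the pending schedule; "oper" is left untouched. */
void set_admin(struct gate_control *gc, int64_t base_time)
{
	gc->admin.base_time = base_time;
	gc->admin.valid = 1;
}

/* Promote "admin" to "oper" once its base-time has been reached.
 * Returns 1 if the promotion happened, 0 otherwise.
 */
int maybe_promote(struct gate_control *gc, int64_t now)
{
	if (!gc->admin.valid || now < gc->admin.base_time)
		return 0;

	gc->oper = gc->admin;
	gc->admin.valid = 0;
	return 1;
}
```

In this model, the "tc qdisc add" above corresponds to
`set_admin(&gc, 10000000)`, and the later "tc qdisc change" to a second
`set_admin()` call that leaves the running schedule alone until the new
base-time is reached.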

When enabling the ability to change schedules, it makes sense to add
two more knobs to schedules: "cycle-time" truncates a cycle to a given
value, so that the schedule repeats after a well-defined period;
"cycle-time-extension" controls how much an entry can be extended if
it's the last one before the change of schedules, to avoid a very
short cycle when transitioning from one schedule to another.
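
To make the two knobs concrete, here is a small user-space sketch that
follows the same decisions as get_cycle_time() and
should_change_schedules() in this series. The function names and the
plain int64_t nanosecond types are assumptions made for illustration,
and, as the FIXME in the patch notes, the exact extension semantics may
still change.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cycle length in ns: an explicit cycle-time wins, otherwise the cycle
 * is the sum of all entry intervals.
 */
int64_t effective_cycle_time(int64_t cycle_time,
			     const int64_t *intervals, size_t n)
{
	int64_t sum = 0;
	size_t i;

	if (cycle_time != 0)
		return cycle_time;

	for (i = 0; i < n; i++)
		sum += intervals[i];

	return sum;
}

/* An entry never stays open past the end of the (possibly truncated)
 * cycle.
 */
int64_t clamp_close_time(int64_t close_time, int64_t cycle_close_time)
{
	return close_time < cycle_close_time ? close_time : cycle_close_time;
}

/* Switch to the next schedule if its base-time is reached by the time
 * the current entry closes, or would be reached within the allowed
 * extension.
 */
bool should_switch(int64_t next_base_time, int64_t close_time,
		   int64_t extension)
{
	if (next_base_time <= close_time)
		return true;

	return next_base_time <= close_time + extension;
}
```

For the first schedule above (300000 + 300000 + 400000), the implicit
cycle is 1000000 ns; passing a cycle-time of 800000 would cut the last
entry short at 800000 ns into each cycle.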

With these, taprio in the software mode should provide a fairly
complete implementation of what's defined in the Enhancements for
Scheduled Traffic parts of the specification.

Hardware offload
----------------

Some workloads require better guarantees from their schedules than
what's provided by the software implementation. This series proposes
an interface for configuring schedules into compatible network
controllers.

This part is proposed together with the support for changing
schedules, because it raises questions like: should the "qdisc" side
be responsible for providing visibility into the schedules, or should
it be the driver?

In this proposal, the driver is called with the new schedule as soon
as it is validated, and the "core" qdisc takes care of displaying
(".dump()") the correct schedules at all times. It means that some
logic would need to be duplicated in the driver, if the hardware
doesn't have support for multiple schedules. But as taprio doesn't
have enough information about the underlying controller to know how
far in advance a schedule needs to be communicated to the hardware,
it feels like a fair compromise.

The hardware offloading part of this proposal also tries to define an
interface for frame-preemption and how it interacts with the
scheduling of traffic, see Section 8.6.8.4 of IEEE 802.1Q-2018 for
more information.

One important difference between the qdisc interface and the
qdisc-driver interface is that the "gate mask" on the qdisc side
references traffic classes (bit 0 of the gate mask means Traffic
Class 0), while in the driver interface it references queues (bit 0
means queue 0). That is to say, taprio converts the references to
traffic classes into references to queues before sending
the offloading request to the driver.
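
The conversion could look roughly like the sketch below. The
offloading patches were dropped from this series, so this is not code
from the tree: `struct tc_queue_map` and `tc_mask_to_queue_mask()` are
hypothetical names, and the per-class count/offset layout is assumed to
follow the "queues count@offset" syntax used by tc.

```c
#include <stdint.h>

#define MAX_TC 16

/* Hypothetical description of which queues belong to each traffic
 * class, in the style of "queues 1@0 1@1 2@2".
 */
struct tc_queue_map {
	unsigned int num_tc;
	unsigned int count[MAX_TC];
	unsigned int offset[MAX_TC];
};

/* Expand a gate mask indexed by traffic class into one indexed by
 * queue: every queue of an open traffic class gets its bit set.
 */
uint32_t tc_mask_to_queue_mask(const struct tc_queue_map *map,
			       uint32_t tc_mask)
{
	uint32_t queue_mask = 0;
	unsigned int tc, q;

	for (tc = 0; tc < map->num_tc && tc < MAX_TC; tc++) {
		if (!(tc_mask & (1u << tc)))
			continue;

		for (q = 0; q < map->count[tc]; q++) {
			unsigned int bit = map->offset[tc] + q;

			/* Queues beyond bit 31 cannot be represented */
			if (bit < 32)
				queue_mask |= 1u << bit;
		}
	}

	return queue_mask;
}
```

With the mapping from the first example ("queues 1@0 1@1 2@2"), the
entry "S 06" opens Traffic Classes 1 and 2, which corresponds to queues
1, 2 and 3, that is, a queue mask of 0x0e.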

Request for help
----------------

I would like interested driver maintainers to take a look at
the proposed interface and see whether it's going to be too awkward
for any particular device. Also, pointers to available documentation
would be appreciated. The idea here is to start a discussion so we can
have an interface that would work for multiple vendors.

Links
-----

kernel patches:
https://github.com/vcgomes/net-next/tree/taprio-add-support-for-change-v3

iproute2 patches:
https://github.com/vcgomes/iproute2/tree/taprio-add-support-for-change-v3
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
parents cd86972a c25031e9
@@ -1148,6 +1148,16 @@ enum {
 
 #define TCA_TAPRIO_SCHED_MAX (__TCA_TAPRIO_SCHED_MAX - 1)
 
+/* The format for the admin sched (dump only):
+ * [TCA_TAPRIO_SCHED_ADMIN_SCHED]
+ *   [TCA_TAPRIO_ATTR_SCHED_BASE_TIME]
+ *   [TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST]
+ *     [TCA_TAPRIO_ATTR_SCHED_ENTRY]
+ *       [TCA_TAPRIO_ATTR_SCHED_ENTRY_CMD]
+ *       [TCA_TAPRIO_ATTR_SCHED_ENTRY_GATES]
+ *       [TCA_TAPRIO_ATTR_SCHED_ENTRY_INTERVAL]
+ */
+
 enum {
 	TCA_TAPRIO_ATTR_UNSPEC,
 	TCA_TAPRIO_ATTR_PRIOMAP, /* struct tc_mqprio_qopt */
@@ -1156,6 +1166,9 @@ enum {
 	TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY, /* single entry */
 	TCA_TAPRIO_ATTR_SCHED_CLOCKID, /* s32 */
 	TCA_TAPRIO_PAD,
+	TCA_TAPRIO_ATTR_ADMIN_SCHED, /* The admin sched, only used in dump */
+	TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME, /* s64 */
+	TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION, /* s64 */
 	__TCA_TAPRIO_ATTR_MAX,
 };
......
@@ -16,6 +16,7 @@
 #include <linux/math64.h>
 #include <linux/module.h>
 #include <linux/spinlock.h>
+#include <linux/rcupdate.h>
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
 #include <net/pkt_cls.h>
@@ -41,25 +42,88 @@ struct sched_entry {
 	u8 command;
 };
 
+struct sched_gate_list {
+	struct rcu_head rcu;
+	struct list_head entries;
+	size_t num_entries;
+	ktime_t cycle_close_time;
+	s64 cycle_time;
+	s64 cycle_time_extension;
+	s64 base_time;
+};
+
 struct taprio_sched {
 	struct Qdisc **qdiscs;
 	struct Qdisc *root;
-	s64 base_time;
 	int clockid;
 	atomic64_t picos_per_byte; /* Using picoseconds because for 10Gbps+
 				    * speeds it's sub-nanoseconds per byte
 				    */
-	size_t num_entries;
 
 	/* Protects the update side of the RCU protected current_entry */
 	spinlock_t current_entry_lock;
 	struct sched_entry __rcu *current_entry;
-	struct list_head entries;
+	struct sched_gate_list __rcu *oper_sched;
+	struct sched_gate_list __rcu *admin_sched;
 	ktime_t (*get_time)(void);
 	struct hrtimer advance_timer;
 	struct list_head taprio_list;
 };
 
+static ktime_t sched_base_time(const struct sched_gate_list *sched)
+{
+	if (!sched)
+		return KTIME_MAX;
+
+	return ns_to_ktime(sched->base_time);
+}
+
+static void taprio_free_sched_cb(struct rcu_head *head)
+{
+	struct sched_gate_list *sched = container_of(head, struct sched_gate_list, rcu);
+	struct sched_entry *entry, *n;
+
+	if (!sched)
+		return;
+
+	list_for_each_entry_safe(entry, n, &sched->entries, list) {
+		list_del(&entry->list);
+		kfree(entry);
+	}
+
+	kfree(sched);
+}
+
+static void switch_schedules(struct taprio_sched *q,
+			     struct sched_gate_list **admin,
+			     struct sched_gate_list **oper)
+{
+	rcu_assign_pointer(q->oper_sched, *admin);
+	rcu_assign_pointer(q->admin_sched, NULL);
+
+	if (*oper)
+		call_rcu(&(*oper)->rcu, taprio_free_sched_cb);
+
+	*oper = *admin;
+	*admin = NULL;
+}
+
+static ktime_t get_cycle_time(struct sched_gate_list *sched)
+{
+	struct sched_entry *entry;
+	ktime_t cycle = 0;
+
+	if (sched->cycle_time != 0)
+		return sched->cycle_time;
+
+	list_for_each_entry(entry, &sched->entries, list)
+		cycle = ktime_add_ns(cycle, entry->interval);
+
+	sched->cycle_time = cycle;
+
+	return cycle;
+}
+
 static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 			  struct sk_buff **to_free)
 {
@@ -136,8 +200,8 @@ static struct sk_buff *taprio_dequeue(struct Qdisc *sch)
 {
 	struct taprio_sched *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
+	struct sk_buff *skb = NULL;
 	struct sched_entry *entry;
-	struct sk_buff *skb;
 	u32 gate_mask;
 	int i;
@@ -154,10 +218,9 @@ static struct sk_buff *taprio_dequeue(struct Qdisc *sch)
 	 * "AdminGateSates"
 	 */
 	gate_mask = entry ? entry->gate_mask : TAPRIO_ALL_GATES_OPEN;
-	rcu_read_unlock();
 
 	if (!gate_mask)
-		return NULL;
+		goto done;
 
 	for (i = 0; i < dev->num_tx_queues; i++) {
 		struct Qdisc *child = q->qdiscs[i];
@@ -197,22 +260,72 @@ static struct sk_buff *taprio_dequeue(struct Qdisc *sch)
 
 		skb = child->ops->dequeue(child);
 		if (unlikely(!skb))
-			return NULL;
+			goto done;
 
 		qdisc_bstats_update(sch, skb);
 		qdisc_qstats_backlog_dec(sch, skb);
 		sch->q.qlen--;
 
-		return skb;
+		goto done;
 	}
 
-	return NULL;
+done:
+	rcu_read_unlock();
+
+	return skb;
+}
+
+static bool should_restart_cycle(const struct sched_gate_list *oper,
+				 const struct sched_entry *entry)
+{
+	if (list_is_last(&entry->list, &oper->entries))
+		return true;
+
+	if (ktime_compare(entry->close_time, oper->cycle_close_time) == 0)
+		return true;
+
+	return false;
+}
+
+static bool should_change_schedules(const struct sched_gate_list *admin,
+				    const struct sched_gate_list *oper,
+				    ktime_t close_time)
+{
+	ktime_t next_base_time, extension_time;
+
+	if (!admin)
+		return false;
+
+	next_base_time = sched_base_time(admin);
+
+	/* This is the simple case, the close_time would fall after
+	 * the next schedule base_time.
+	 */
+	if (ktime_compare(next_base_time, close_time) <= 0)
+		return true;
+
+	/* This is the cycle_time_extension case, if the close_time
+	 * plus the amount that can be extended would fall after the
+	 * next schedule base_time, we can extend the current schedule
+	 * for that amount.
+	 */
+	extension_time = ktime_add_ns(close_time, oper->cycle_time_extension);
+
+	/* FIXME: the IEEE 802.1Q-2018 Specification isn't clear about
+	 * how precisely the extension should be made. So after
+	 * conformance testing, this logic may change.
+	 */
+	if (ktime_compare(next_base_time, extension_time) <= 0)
+		return true;
+
+	return false;
 }
 
 static enum hrtimer_restart advance_sched(struct hrtimer *timer)
 {
 	struct taprio_sched *q = container_of(timer, struct taprio_sched,
 					      advance_timer);
+	struct sched_gate_list *oper, *admin;
 	struct sched_entry *entry, *next;
 	struct Qdisc *sch = q->root;
 	ktime_t close_time;
@@ -220,25 +333,46 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer)
 	spin_lock(&q->current_entry_lock);
 	entry = rcu_dereference_protected(q->current_entry,
 					  lockdep_is_held(&q->current_entry_lock));
+	oper = rcu_dereference_protected(q->oper_sched,
+					 lockdep_is_held(&q->current_entry_lock));
+	admin = rcu_dereference_protected(q->admin_sched,
+					  lockdep_is_held(&q->current_entry_lock));
 
-	/* This is the case that it's the first time that the schedule
-	 * runs, so it only happens once per schedule. The first entry
-	 * is pre-calculated during the schedule initialization.
+	if (!oper)
+		switch_schedules(q, &admin, &oper);
+
+	/* This can happen in two cases: 1. this is the very first run
+	 * of this function (i.e. we weren't running any schedule
+	 * previously); 2. The previous schedule just ended. The first
+	 * entry of all schedules are pre-calculated during the
+	 * schedule initialization.
 	 */
-	if (unlikely(!entry)) {
-		next = list_first_entry(&q->entries, struct sched_entry,
+	if (unlikely(!entry || entry->close_time == oper->base_time)) {
+		next = list_first_entry(&oper->entries, struct sched_entry,
 					list);
 		close_time = next->close_time;
 		goto first_run;
 	}
 
-	if (list_is_last(&entry->list, &q->entries))
-		next = list_first_entry(&q->entries, struct sched_entry,
+	if (should_restart_cycle(oper, entry)) {
+		next = list_first_entry(&oper->entries, struct sched_entry,
 					list);
-	else
+		oper->cycle_close_time = ktime_add_ns(oper->cycle_close_time,
+						      oper->cycle_time);
+	} else {
 		next = list_next_entry(entry, list);
+	}
 
 	close_time = ktime_add_ns(entry->close_time, next->interval);
+	close_time = min_t(ktime_t, close_time, oper->cycle_close_time);
+
+	if (should_change_schedules(admin, oper, close_time)) {
+		/* Set things so the next time this runs, the new
+		 * schedule runs.
+		 */
+		close_time = sched_base_time(admin);
+		switch_schedules(q, &admin, &oper);
+	}
 
 	next->close_time = close_time;
 	taprio_set_budget(q, next);
@@ -275,6 +409,8 @@ static const struct nla_policy taprio_policy[TCA_TAPRIO_ATTR_MAX + 1] = {
 	[TCA_TAPRIO_ATTR_SCHED_BASE_TIME] = { .type = NLA_S64 },
 	[TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY] = { .type = NLA_NESTED },
 	[TCA_TAPRIO_ATTR_SCHED_CLOCKID] = { .type = NLA_S32 },
+	[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME] = { .type = NLA_S64 },
+	[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION] = { .type = NLA_S64 },
 };
 
 static int fill_sched_entry(struct nlattr **tb, struct sched_entry *entry,
@@ -322,71 +458,8 @@ static int parse_sched_entry(struct nlattr *n, struct sched_entry *entry,
 	return fill_sched_entry(tb, entry, extack);
 }
 
-/* Returns the number of entries in case of success */
-static int parse_sched_single_entry(struct nlattr *n,
-				    struct taprio_sched *q,
-				    struct netlink_ext_ack *extack)
-{
-	struct nlattr *tb_entry[TCA_TAPRIO_SCHED_ENTRY_MAX + 1] = { };
-	struct nlattr *tb_list[TCA_TAPRIO_SCHED_MAX + 1] = { };
-	struct sched_entry *entry;
-	bool found = false;
-	u32 index;
-	int err;
-
-	err = nla_parse_nested_deprecated(tb_list, TCA_TAPRIO_SCHED_MAX, n,
-					  entry_list_policy, NULL);
-	if (err < 0) {
-		NL_SET_ERR_MSG(extack, "Could not parse nested entry");
-		return -EINVAL;
-	}
-
-	if (!tb_list[TCA_TAPRIO_SCHED_ENTRY]) {
-		NL_SET_ERR_MSG(extack, "Single-entry must include an entry");
-		return -EINVAL;
-	}
-
-	err = nla_parse_nested_deprecated(tb_entry,
-					  TCA_TAPRIO_SCHED_ENTRY_MAX,
-					  tb_list[TCA_TAPRIO_SCHED_ENTRY],
-					  entry_policy, NULL);
-	if (err < 0) {
-		NL_SET_ERR_MSG(extack, "Could not parse nested entry");
-		return -EINVAL;
-	}
-
-	if (!tb_entry[TCA_TAPRIO_SCHED_ENTRY_INDEX]) {
-		NL_SET_ERR_MSG(extack, "Entry must specify an index\n");
-		return -EINVAL;
-	}
-
-	index = nla_get_u32(tb_entry[TCA_TAPRIO_SCHED_ENTRY_INDEX]);
-	if (index >= q->num_entries) {
-		NL_SET_ERR_MSG(extack, "Index for single entry exceeds number of entries in schedule");
-		return -EINVAL;
-	}
-
-	list_for_each_entry(entry, &q->entries, list) {
-		if (entry->index == index) {
-			found = true;
-			break;
-		}
-	}
-
-	if (!found) {
-		NL_SET_ERR_MSG(extack, "Could not find entry");
-		return -ENOENT;
-	}
-
-	err = fill_sched_entry(tb_entry, entry, extack);
-	if (err < 0)
-		return err;
-
-	return q->num_entries;
-}
-
 static int parse_sched_list(struct nlattr *list,
-			    struct taprio_sched *q,
+			    struct sched_gate_list *sched,
 			    struct netlink_ext_ack *extack)
 {
 	struct nlattr *n;
@@ -416,64 +489,42 @@ static int parse_sched_list(struct nlattr *list,
 			return err;
 		}
 
-		list_add_tail(&entry->list, &q->entries);
+		list_add_tail(&entry->list, &sched->entries);
 		i++;
 	}
 
-	q->num_entries = i;
+	sched->num_entries = i;
 
 	return i;
 }
 
-/* Returns the number of entries in case of success */
-static int parse_taprio_opt(struct nlattr **tb, struct taprio_sched *q,
-			    struct netlink_ext_ack *extack)
+static int parse_taprio_schedule(struct nlattr **tb,
+				 struct sched_gate_list *new,
+				 struct netlink_ext_ack *extack)
 {
 	int err = 0;
-	int clockid;
 
-	if (tb[TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST] &&
-	    tb[TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY])
-		return -EINVAL;
-
-	if (tb[TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY] && q->num_entries == 0)
-		return -EINVAL;
-
-	if (q->clockid == -1 && !tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID])
-		return -EINVAL;
+	if (tb[TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY]) {
+		NL_SET_ERR_MSG(extack, "Adding a single entry is not supported");
+		return -ENOTSUPP;
+	}
 
 	if (tb[TCA_TAPRIO_ATTR_SCHED_BASE_TIME])
-		q->base_time = nla_get_s64(
-			tb[TCA_TAPRIO_ATTR_SCHED_BASE_TIME]);
+		new->base_time = nla_get_s64(tb[TCA_TAPRIO_ATTR_SCHED_BASE_TIME]);
 
-	if (tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID]) {
-		clockid = nla_get_s32(tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID]);
-
-		/* We only support static clockids and we don't allow
-		 * for it to be modified after the first init.
-		 */
-		if (clockid < 0 || (q->clockid != -1 && q->clockid != clockid))
-			return -EINVAL;
+	if (tb[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION])
+		new->cycle_time_extension = nla_get_s64(tb[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION]);
 
-		q->clockid = clockid;
-	}
+	if (tb[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME])
+		new->cycle_time = nla_get_s64(tb[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME]);
 
 	if (tb[TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST])
 		err = parse_sched_list(
-			tb[TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST], q, extack);
-	else if (tb[TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY])
-		err = parse_sched_single_entry(
-			tb[TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY], q, extack);
-
-	/* parse_sched_* return the number of entries in the schedule,
-	 * a schedule with zero entries is an error.
-	 */
-	if (err == 0) {
-		NL_SET_ERR_MSG(extack, "The schedule should contain at least one entry");
-		return -EINVAL;
-	}
+			tb[TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST], new, extack);
+	if (err < 0)
+		return err;
 
-	return err;
+	return 0;
 }
 
 static int taprio_parse_mqprio_opt(struct net_device *dev,
@@ -482,11 +533,17 @@ static int taprio_parse_mqprio_opt(struct net_device *dev,
 {
 	int i, j;
 
-	if (!qopt) {
+	if (!qopt && !dev->num_tc) {
 		NL_SET_ERR_MSG(extack, "'mqprio' configuration is necessary");
 		return -EINVAL;
 	}
 
+	/* If num_tc is already set, it means that the user already
+	 * configured the mqprio part
+	 */
+	if (dev->num_tc)
+		return 0;
+
 	/* Verify num_tc is not out of max range */
 	if (qopt->num_tc > TC_MAX_QUEUE) {
 		NL_SET_ERR_MSG(extack, "Number of traffic classes is outside valid range");
@@ -532,14 +589,15 @@ static int taprio_parse_mqprio_opt(struct net_device *dev,
 	return 0;
 }
 
-static int taprio_get_start_time(struct Qdisc *sch, ktime_t *start)
+static int taprio_get_start_time(struct Qdisc *sch,
+				 struct sched_gate_list *sched,
+				 ktime_t *start)
 {
 	struct taprio_sched *q = qdisc_priv(sch);
-	struct sched_entry *entry;
 	ktime_t now, base, cycle;
 	s64 n;
 
-	base = ns_to_ktime(q->base_time);
+	base = sched_base_time(sched);
 	now = q->get_time();
 
 	if (ktime_after(base, now)) {
@@ -547,11 +605,7 @@ static int taprio_get_start_time(struct Qdisc *sch, ktime_t *start)
 		return 0;
 	}
 
-	/* Calculate the cycle_time, by summing all the intervals.
-	 */
-	cycle = 0;
-	list_for_each_entry(entry, &q->entries, list)
-		cycle = ktime_add_ns(cycle, entry->interval);
+	cycle = get_cycle_time(sched);
 
 	/* The qdisc is expected to have at least one sched_entry. Moreover,
 	 * any entry must have 'interval' > 0. Thus if the cycle time is zero,
@@ -569,22 +623,40 @@ static int taprio_get_start_time(struct Qdisc *sch, ktime_t *start)
 	return 0;
 }
 
-static void taprio_start_sched(struct Qdisc *sch, ktime_t start)
+static void setup_first_close_time(struct taprio_sched *q,
+				   struct sched_gate_list *sched, ktime_t base)
 {
-	struct taprio_sched *q = qdisc_priv(sch);
 	struct sched_entry *first;
-	unsigned long flags;
+	ktime_t cycle;
 
-	spin_lock_irqsave(&q->current_entry_lock, flags);
+	first = list_first_entry(&sched->entries,
+				 struct sched_entry, list);
 
-	first = list_first_entry(&q->entries, struct sched_entry,
-				 list);
+	cycle = get_cycle_time(sched);
 
-	first->close_time = ktime_add_ns(start, first->interval);
+	/* FIXME: find a better place to do this */
+	sched->cycle_close_time = ktime_add_ns(base, cycle);
+
+	first->close_time = ktime_add_ns(base, first->interval);
 	taprio_set_budget(q, first);
 	rcu_assign_pointer(q->current_entry, NULL);
+}
 
-	spin_unlock_irqrestore(&q->current_entry_lock, flags);
+static void taprio_start_sched(struct Qdisc *sch,
+			       ktime_t start, struct sched_gate_list *new)
+{
+	struct taprio_sched *q = qdisc_priv(sch);
+	ktime_t expires;
+
+	expires = hrtimer_get_expires(&q->advance_timer);
+	if (expires == 0)
+		expires = KTIME_MAX;
+
+	/* If the new schedule starts before the next expiration, we
+	 * reprogram it to the earliest one, so we change the admin
+	 * schedule to the operational one at the right time.
+	 */
+	start = min_t(ktime_t, start, expires);
 
 	hrtimer_start(&q->advance_timer, start, HRTIMER_MODE_ABS);
 }
@@ -639,10 +711,12 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 			 struct netlink_ext_ack *extack)
 {
 	struct nlattr *tb[TCA_TAPRIO_ATTR_MAX + 1] = { };
+	struct sched_gate_list *oper, *admin, *new_admin;
 	struct taprio_sched *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
 	struct tc_mqprio_qopt *mqprio = NULL;
-	int i, err, size;
+	int i, err, clockid;
+	unsigned long flags;
 	ktime_t start;
 
 	err = nla_parse_nested_deprecated(tb, TCA_TAPRIO_ATTR_MAX, opt,
@@ -657,13 +731,78 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 	if (err < 0)
 		return err;
 
-	/* A schedule with less than one entry is an error */
-	size = parse_taprio_opt(tb, q, extack);
-	if (size < 0)
-		return size;
+	new_admin = kzalloc(sizeof(*new_admin), GFP_KERNEL);
+	if (!new_admin) {
+		NL_SET_ERR_MSG(extack, "Not enough memory for a new schedule");
+		return -ENOMEM;
+	}
+	INIT_LIST_HEAD(&new_admin->entries);
 
-	hrtimer_init(&q->advance_timer, q->clockid, HRTIMER_MODE_ABS);
-	q->advance_timer.function = advance_sched;
+	rcu_read_lock();
+	oper = rcu_dereference(q->oper_sched);
+	admin = rcu_dereference(q->admin_sched);
+	rcu_read_unlock();
+
+	if (mqprio && (oper || admin)) {
+		NL_SET_ERR_MSG(extack, "Changing the traffic mapping of a running schedule is not supported");
+		err = -ENOTSUPP;
+		goto free_sched;
+	}
+
+	err = parse_taprio_schedule(tb, new_admin, extack);
+	if (err < 0)
+		goto free_sched;
+
+	if (new_admin->num_entries == 0) {
+		NL_SET_ERR_MSG(extack, "There should be at least one entry in the schedule");
+		err = -EINVAL;
+		goto free_sched;
+	}
+
+	if (tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID]) {
+		clockid = nla_get_s32(tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID]);
+
+		/* We only support static clockids and we don't allow
+		 * for it to be modified after the first init.
+		 */
+		if (clockid < 0 ||
+		    (q->clockid != -1 && q->clockid != clockid)) {
+			NL_SET_ERR_MSG(extack, "Changing the 'clockid' of a running schedule is not supported");
+			err = -ENOTSUPP;
+			goto free_sched;
+		}
+
+		q->clockid = clockid;
+	}
+
+	if (q->clockid == -1 && !tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID]) {
+		NL_SET_ERR_MSG(extack, "Specifying a 'clockid' is mandatory");
+		err = -EINVAL;
+		goto free_sched;
+	}
+
+	taprio_set_picos_per_byte(dev, q);
+
+	/* Protects against enqueue()/dequeue() */
+	spin_lock_bh(qdisc_lock(sch));
+
+	if (!hrtimer_active(&q->advance_timer)) {
+		hrtimer_init(&q->advance_timer, q->clockid, HRTIMER_MODE_ABS);
+		q->advance_timer.function = advance_sched;
+	}
+
+	if (mqprio) {
+		netdev_set_num_tc(dev, mqprio->num_tc);
+		for (i = 0; i < mqprio->num_tc; i++)
+			netdev_set_tc_queue(dev, i,
+					    mqprio->count[i],
+					    mqprio->offset[i]);
+
+		/* Always use supplied priority mappings */
+		for (i = 0; i < TC_BITMASK + 1; i++)
+			netdev_set_prio_tc_map(dev, i,
+					       mqprio->prio_tc_map[i]);
+	}
 
 	switch (q->clockid) {
 	case CLOCK_REALTIME:
@@ -679,59 +818,46 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 		q->get_time = ktime_get_clocktai;
 		break;
 	default:
-		return -ENOTSUPP;
+		NL_SET_ERR_MSG(extack, "Invalid 'clockid'");
+		err = -EINVAL;
+		goto unlock;
 	}
 
-	for (i = 0; i < dev->num_tx_queues; i++) {
-		struct netdev_queue *dev_queue;
-		struct Qdisc *qdisc;
-
-		dev_queue = netdev_get_tx_queue(dev, i);
-		qdisc = qdisc_create_dflt(dev_queue,
-					  &pfifo_qdisc_ops,
-					  TC_H_MAKE(TC_H_MAJ(sch->handle),
-						    TC_H_MIN(i + 1)),
-					  extack);
-		if (!qdisc)
-			return -ENOMEM;
-
-		if (i < dev->real_num_tx_queues)
-			qdisc_hash_add(qdisc, false);
-
-		q->qdiscs[i] = qdisc;
-	}
-
-	if (mqprio) {
-		netdev_set_num_tc(dev, mqprio->num_tc);
-		for (i = 0; i < mqprio->num_tc; i++)
-			netdev_set_tc_queue(dev, i,
-					    mqprio->count[i],
-					    mqprio->offset[i]);
-
-		/* Always use supplied priority mappings */
-		for (i = 0; i < TC_BITMASK + 1; i++)
-			netdev_set_prio_tc_map(dev, i,
-					       mqprio->prio_tc_map[i]);
-	}
-
-	taprio_set_picos_per_byte(dev, q);
-
-	err = taprio_get_start_time(sch, &start);
+	err = taprio_get_start_time(sch, new_admin, &start);
 	if (err < 0) {
 		NL_SET_ERR_MSG(extack, "Internal error: failed get start time");
-		return err;
+		goto unlock;
 	}
 
-	taprio_start_sched(sch, start);
+	setup_first_close_time(q, new_admin, start);
 
-	return 0;
+	/* Protects against advance_sched() */
+	spin_lock_irqsave(&q->current_entry_lock, flags);
+
+	taprio_start_sched(sch, start, new_admin);
+
+	rcu_assign_pointer(q->admin_sched, new_admin);
+	if (admin)
+		call_rcu(&admin->rcu, taprio_free_sched_cb);
+	new_admin = NULL;
+
+	spin_unlock_irqrestore(&q->current_entry_lock, flags);
+
+	err = 0;
+
+unlock:
+	spin_unlock_bh(qdisc_lock(sch));
+
+free_sched:
+	kfree(new_admin);
+
+	return err;
 }
 
 static void taprio_destroy(struct Qdisc *sch)
 {
 	struct taprio_sched *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
-	struct sched_entry *entry, *n;
 	unsigned int i;
 
 	spin_lock(&taprio_list_lock);
@@ -750,10 +876,11 @@ static void taprio_destroy(struct Qdisc *sch)
 
 	netdev_set_num_tc(dev, 0);
 
-	list_for_each_entry_safe(entry, n, &q->entries, list) {
-		list_del(&entry->list);
-		kfree(entry);
-	}
+	if (q->oper_sched)
+		call_rcu(&q->oper_sched->rcu, taprio_free_sched_cb);
+
+	if (q->admin_sched)
+		call_rcu(&q->admin_sched->rcu, taprio_free_sched_cb);
 }
 
 static int taprio_init(struct Qdisc *sch, struct nlattr *opt,
@@ -761,12 +888,12 @@ static int taprio_init(struct Qdisc *sch, struct nlattr *opt,
 {
 	struct taprio_sched *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
+	int i;
 
-	INIT_LIST_HEAD(&q->entries);
 	spin_lock_init(&q->current_entry_lock);
 
-	/* We may overwrite the configuration later */
 	hrtimer_init(&q->advance_timer, CLOCK_TAI, HRTIMER_MODE_ABS);
+	q->advance_timer.function = advance_sched;
 
 	q->root = sch;
@@ -796,6 +923,25 @@ static int taprio_init(struct Qdisc *sch, struct nlattr *opt,
 	list_add(&q->taprio_list, &taprio_list);
 	spin_unlock(&taprio_list_lock);
 
+	for (i = 0; i < dev->num_tx_queues; i++) {
+		struct netdev_queue *dev_queue;
+		struct Qdisc *qdisc;
+
+		dev_queue = netdev_get_tx_queue(dev, i);
+		qdisc = qdisc_create_dflt(dev_queue,
+					  &pfifo_qdisc_ops,
+					  TC_H_MAKE(TC_H_MAJ(sch->handle),
+						    TC_H_MIN(i + 1)),
+					  extack);
+		if (!qdisc)
+			return -ENOMEM;
+
+		if (i < dev->real_num_tx_queues)
+			qdisc_hash_add(qdisc, false);
+
+		q->qdiscs[i] = qdisc;
+	}
+
 	return taprio_change(sch, opt, extack);
 }
@@ -867,15 +1013,55 @@ static int dump_entry(struct sk_buff *msg,
 	return -1;
 }
 
+static int dump_schedule(struct sk_buff *msg,
+			 const struct sched_gate_list *root)
+{
+	struct nlattr *entry_list;
+	struct sched_entry *entry;
+
+	if (nla_put_s64(msg, TCA_TAPRIO_ATTR_SCHED_BASE_TIME,
+			root->base_time, TCA_TAPRIO_PAD))
+		return -1;
+
+	if (nla_put_s64(msg, TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME,
+			root->cycle_time, TCA_TAPRIO_PAD))
+		return -1;
+
+	if (nla_put_s64(msg, TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION,
+			root->cycle_time_extension, TCA_TAPRIO_PAD))
+		return -1;
+
+	entry_list = nla_nest_start_noflag(msg,
+					   TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST);
+	if (!entry_list)
+		goto error_nest;
+
+	list_for_each_entry(entry, &root->entries, list) {
+		if (dump_entry(msg, entry) < 0)
+			goto error_nest;
+	}
+
+	nla_nest_end(msg, entry_list);
+	return 0;
+
+error_nest:
+	nla_nest_cancel(msg, entry_list);
+	return -1;
+}
+
 static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
 	struct taprio_sched *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
+	struct sched_gate_list *oper, *admin;
 	struct tc_mqprio_qopt opt = { 0 };
-	struct nlattr *nest, *entry_list;
-	struct sched_entry *entry;
+	struct nlattr *nest, *sched_nest;
 	unsigned int i;
 
+	rcu_read_lock();
+	oper = rcu_dereference(q->oper_sched);
+	admin = rcu_dereference(q->admin_sched);
+
 	opt.num_tc = netdev_get_num_tc(dev);
 	memcpy(opt.prio_tc_map, dev->prio_tc_map, sizeof(opt.prio_tc_map));
@@ -886,35 +1072,41 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 
 	nest = nla_nest_start_noflag(skb, TCA_OPTIONS);
 	if (!nest)
-		return -ENOSPC;
+		goto start_error;
 
 	if (nla_put(skb, TCA_TAPRIO_ATTR_PRIOMAP, sizeof(opt), &opt))
 		goto options_error;
 
-	if (nla_put_s64(skb, TCA_TAPRIO_ATTR_SCHED_BASE_TIME,
-			q->base_time, TCA_TAPRIO_PAD))
-		goto options_error;
-
 	if (nla_put_s32(skb, TCA_TAPRIO_ATTR_SCHED_CLOCKID, q->clockid))
 		goto options_error;
 
-	entry_list = nla_nest_start_noflag(skb,
-					   TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST);
-	if (!entry_list)
+	if (oper && dump_schedule(skb, oper))
 		goto options_error;
 
-	list_for_each_entry(entry, &q->entries, list) {
-		if (dump_entry(skb, entry) < 0)
-			goto options_error;
-	}
+	if (!admin)
+		goto done;
 
-	nla_nest_end(skb, entry_list);
+	sched_nest = nla_nest_start_noflag(skb, TCA_TAPRIO_ATTR_ADMIN_SCHED);
+
+	if (dump_schedule(skb, admin))
+		goto admin_error;
+
+	nla_nest_end(skb, sched_nest);
+
+done:
+	rcu_read_unlock();
 
 	return nla_nest_end(skb, nest);
 
+admin_error:
+	nla_nest_cancel(skb, sched_nest);
+
 options_error:
 	nla_nest_cancel(skb, nest);
-	return -1;
+
+start_error:
+	rcu_read_unlock();
+	return -ENOSPC;
 }
 
 static struct Qdisc *taprio_leaf(struct Qdisc *sch, unsigned long cl)
@@ -1001,6 +1193,7 @@ static struct Qdisc_ops taprio_qdisc_ops __read_mostly = {
 	.id = "taprio",
 	.priv_size = sizeof(struct taprio_sched),
 	.init = taprio_init,
+	.change = taprio_change,
 	.destroy = taprio_destroy,
 	.peek = taprio_peek,
 	.dequeue = taprio_dequeue,
......