Commit 5080f39e authored by Nik Unger's avatar Nik Unger Committed by David S. Miller

netem: apply correct delay when rate throttling

I recently reported on the netem list that iperf network benchmarks
show unexpected results when a bandwidth throttling rate has been
configured for netem. Specifically:

1) The measured link bandwidth *increases* when a higher delay is added
2) The measured link bandwidth appears higher than the specified limit
3) The measured link bandwidth for the same very slow settings varies significantly across
  machines

The issue can be reproduced by using tc to configure netem with a
512kbit rate and various (none, 1us, 50ms, 100ms, 200ms) delays on a
veth pair between network namespaces, and then using iperf (or any
other network benchmarking tool) to test throughput. Complete detailed
instructions are in the original email chain here:
https://lists.linuxfoundation.org/pipermail/netem/2017-February/001672.html

There appear to be two underlying bugs causing these effects:

- The first issue causes long delays when the rate is slow and no
  delay is configured (e.g., "rate 512kbit"). This is because SKBs are
  not orphaned when no delay is configured, so orphaning does not
  occur until *after* the rate-induced delay has been applied. For
  this reason, adding a tiny delay (e.g., "rate 512kbit delay 1us")
  dramatically increases the measured bandwidth.

- The second issue is that rate-induced delays are not correctly
  applied, allowing SKB delays to occur in parallel. The indended
  approach is to compute the delay for an SKB and to add this delay to
  the end of the current queue. However, the code does not detect
  existing SKBs in the queue due to improperly testing sch->q.qlen,
  which is nonzero even when packets exist only in the
  rbtree. Consequently, new SKBs do not wait for the current queue to
  empty. When packet delays vary significantly (e.g., if packet sizes
  are different), then this also causes unintended reordering.

I modified the code to expect a delay (and orphan the SKB) when a rate
is configured. I also added some defensive tests that correctly find
the latest scheduled delivery time, even if it is (unexpectedly) for a
packet in sch->q. I have tested these changes on the latest kernel
(4.11.0-rc1+) and the iperf / ping test results are as expected.
Signed-off-by: default avatarNik Unger <njunger@uwaterloo.ca>
Signed-off-by: default avatarStephen Hemminger <stephen@networkplumber.org>
Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
parent cd918afd
...@@ -462,7 +462,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch, ...@@ -462,7 +462,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
/* If a delay is expected, orphan the skb. (orphaning usually takes /* If a delay is expected, orphan the skb. (orphaning usually takes
* place at TX completion time, so _before_ the link transit delay) * place at TX completion time, so _before_ the link transit delay)
*/ */
if (q->latency || q->jitter) if (q->latency || q->jitter || q->rate)
skb_orphan_partial(skb); skb_orphan_partial(skb);
/* /*
...@@ -530,21 +530,31 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch, ...@@ -530,21 +530,31 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
now = psched_get_time(); now = psched_get_time();
if (q->rate) { if (q->rate) {
struct sk_buff *last; struct netem_skb_cb *last = NULL;
if (sch->q.tail)
last = netem_skb_cb(sch->q.tail);
if (q->t_root.rb_node) {
struct sk_buff *t_skb;
struct netem_skb_cb *t_last;
t_skb = netem_rb_to_skb(rb_last(&q->t_root));
t_last = netem_skb_cb(t_skb);
if (!last ||
t_last->time_to_send > last->time_to_send) {
last = t_last;
}
}
if (sch->q.qlen)
last = sch->q.tail;
else
last = netem_rb_to_skb(rb_last(&q->t_root));
if (last) { if (last) {
/* /*
* Last packet in queue is reference point (now), * Last packet in queue is reference point (now),
* calculate this time bonus and subtract * calculate this time bonus and subtract
* from delay. * from delay.
*/ */
delay -= netem_skb_cb(last)->time_to_send - now; delay -= last->time_to_send - now;
delay = max_t(psched_tdiff_t, 0, delay); delay = max_t(psched_tdiff_t, 0, delay);
now = netem_skb_cb(last)->time_to_send; now = last->time_to_send;
} }
delay += packet_len_2_sched_time(qdisc_pkt_len(skb), q); delay += packet_len_2_sched_time(qdisc_pkt_len(skb), q);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment