Commit c1d5674f authored by Yuchung Cheng, committed by David S. Miller

tcp: less aggressive window probing on local congestion

Previously, when the sender fails to send an (original) data packet
or a window probe due to congestion in the local host (e.g. throttling
in the qdisc), it retries within an RTO or two, capped at 500ms.

In low-RTT networks such as data centers, the RTO is often far below
the default minimum RTO of 200ms. Local host congestion can then
trigger a retry storm that adds fuel to the fire. Worse yet, the probe
counter (icsk_probes_out) is not properly updated, so the aggressive
retries may exceed the system limit (15 rounds) until the packet
finally slips through.

In such rare events, it's wise to retry more conservatively
(500ms), update the stats properly to reflect these incidents,
and follow the system limit. Note that this is consistent with
the behavior when a keep-alive probe or RTO retry is dropped
due to local congestion.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
parent 590d2026
```diff
@@ -3749,7 +3749,7 @@ void tcp_send_probe0(struct sock *sk)
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct net *net = sock_net(sk);
-	unsigned long probe_max;
+	unsigned long timeout;
 	int err;
 
 	err = tcp_write_wakeup(sk, LINUX_MIB_TCPWINPROBE);
@@ -3761,26 +3761,18 @@ void tcp_send_probe0(struct sock *sk)
 		return;
 	}
 
+	icsk->icsk_probes_out++;
 	if (err <= 0) {
 		if (icsk->icsk_backoff < net->ipv4.sysctl_tcp_retries2)
 			icsk->icsk_backoff++;
-		icsk->icsk_probes_out++;
-		probe_max = TCP_RTO_MAX;
+		timeout = tcp_probe0_when(sk, TCP_RTO_MAX);
 	} else {
 		/* If packet was not sent due to local congestion,
-		 * do not backoff and do not remember icsk_probes_out.
-		 * Let local senders to fight for local resources.
-		 *
-		 * Use accumulated backoff yet.
+		 * Let senders fight for local resources conservatively.
 		 */
-		if (!icsk->icsk_probes_out)
-			icsk->icsk_probes_out = 1;
-		probe_max = TCP_RESOURCE_PROBE_INTERVAL;
+		timeout = TCP_RESOURCE_PROBE_INTERVAL;
 	}
-	tcp_reset_xmit_timer(sk, ICSK_TIME_PROBE0,
-			     tcp_probe0_when(sk, probe_max),
-			     TCP_RTO_MAX,
-			     NULL);
+	tcp_reset_xmit_timer(sk, ICSK_TIME_PROBE0, timeout, TCP_RTO_MAX, NULL);
 }
 
 int tcp_rtx_synack(const struct sock *sk, struct request_sock *req)
```