1. 04 Nov, 2013 10 commits
    • Sebastian Hesselbarth's avatar
      net: mv643xx_eth: fix orphaned statistics timer crash · b24b4a82
      Sebastian Hesselbarth authored
      [ Upstream commit f564412c ]
      
      The periodic statistics timer gets started at port _probe() time, but
      is stopped on _stop() only. In a modular environment, this can cause
      the timer to access already deallocated memory, if the module is unloaded
      without starting the eth device. To fix this, we add the timer right
      before the port is started, instead of at _probe() time.
      Signed-off-by: default avatarSebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
      Acked-by: default avatarJason Cooper <jason@lakedaemon.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b24b4a82
    • Sebastian Hesselbarth's avatar
      net: mv643xx_eth: update statistics timer from timer context only · 85cd0213
      Sebastian Hesselbarth authored
      [ Upstream commit 041b4ddb ]
      
      Each port driver installs a periodic timer to update port statistics
      by calling mib_counters_update. As mib_counters_update is also called
      from non-timer context, we should not reschedule the timer there but
      rather move it to timer-only context.
      Signed-off-by: default avatarSebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
      Acked-by: default avatarJason Cooper <jason@lakedaemon.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      85cd0213
    • David S. Miller's avatar
      l2tp: Fix build warning with ipv6 disabled. · d980ed62
      David S. Miller authored
      [ Upstream commit 8d8a51e2 ]
      
      net/l2tp/l2tp_core.c: In function ‘l2tp_verify_udp_checksum’:
      net/l2tp/l2tp_core.c:499:22: warning: unused variable ‘tunnel’ [-Wunused-variable]
      
      Create a helper "l2tp_tunnel()" to facilitate this, and as a side
      effect get rid of a bunch of unnecessary void pointer casts.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d980ed62
    • François CACHEREUL's avatar
      l2tp: fix kernel panic when using IPv4-mapped IPv6 addresses · bd83cd77
      François CACHEREUL authored
      [ Upstream commit e18503f4 ]
      
      IPv4 mapped addresses cause kernel panic.
      The patch juste check whether the IPv6 address is an IPv4 mapped
      address. If so, use IPv4 API instead of IPv6.
      
      [  940.026915] general protection fault: 0000 [#1]
      [  940.026915] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core pppox ppp_generic slhc loop psmouse
      [  940.026915] CPU: 0 PID: 3184 Comm: memcheck-amd64- Not tainted 3.11.0+ #1
      [  940.026915] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [  940.026915] task: ffff880007130e20 ti: ffff88000737e000 task.ti: ffff88000737e000
      [  940.026915] RIP: 0010:[<ffffffff81333780>]  [<ffffffff81333780>] ip6_xmit+0x276/0x326
      [  940.026915] RSP: 0018:ffff88000737fd28  EFLAGS: 00010286
      [  940.026915] RAX: c748521a75ceff48 RBX: ffff880000c30800 RCX: 0000000000000000
      [  940.026915] RDX: ffff88000075cc4e RSI: 0000000000000028 RDI: ffff8800060e5a40
      [  940.026915] RBP: ffff8800060e5a40 R08: 0000000000000000 R09: ffff88000075cc90
      [  940.026915] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88000737fda0
      [  940.026915] R13: 0000000000000000 R14: 0000000000002000 R15: ffff880005d3b580
      [  940.026915] FS:  00007f163dc5e800(0000) GS:ffffffff81623000(0000) knlGS:0000000000000000
      [  940.026915] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  940.026915] CR2: 00000004032dc940 CR3: 0000000005c25000 CR4: 00000000000006f0
      [  940.026915] Stack:
      [  940.026915]  ffff88000075cc4e ffffffff81694e90 ffff880000c30b38 0000000000000020
      [  940.026915]  11000000523c4bac ffff88000737fdb4 0000000000000000 ffff880000c30800
      [  940.026915]  ffff880005d3b580 ffff880000c30b38 ffff8800060e5a40 0000000000000020
      [  940.026915] Call Trace:
      [  940.026915]  [<ffffffff81356cc3>] ? inet6_csk_xmit+0xa4/0xc4
      [  940.026915]  [<ffffffffa0038535>] ? l2tp_xmit_skb+0x503/0x55a [l2tp_core]
      [  940.026915]  [<ffffffff812b8d3b>] ? pskb_expand_head+0x161/0x214
      [  940.026915]  [<ffffffffa003e91d>] ? pppol2tp_xmit+0xf2/0x143 [l2tp_ppp]
      [  940.026915]  [<ffffffffa00292e0>] ? ppp_channel_push+0x36/0x8b [ppp_generic]
      [  940.026915]  [<ffffffffa00293fe>] ? ppp_write+0xaf/0xc5 [ppp_generic]
      [  940.026915]  [<ffffffff8110ead4>] ? vfs_write+0xa2/0x106
      [  940.026915]  [<ffffffff8110edd6>] ? SyS_write+0x56/0x8a
      [  940.026915]  [<ffffffff81378ac0>] ? system_call_fastpath+0x16/0x1b
      [  940.026915] Code: 00 49 8b 8f d8 00 00 00 66 83 7c 11 02 00 74 60 49
      8b 47 58 48 83 e0 fe 48 8b 80 18 01 00 00 48 85 c0 74 13 48 8b 80 78 02
      00 00 <48> ff 40 28 41 8b 57 68 48 01 50 30 48 8b 54 24 08 49 c7 c1 51
      [  940.026915] RIP  [<ffffffff81333780>] ip6_xmit+0x276/0x326
      [  940.026915]  RSP <ffff88000737fd28>
      [  940.057945] ---[ end trace be8aba9a61c8b7f3 ]---
      [  940.058583] Kernel panic - not syncing: Fatal exception in interrupt
      Signed-off-by: default avatarFrançois CACHEREUL <f.cachereul@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bd83cd77
    • Eric Dumazet's avatar
      net: do not call sock_put() on TIMEWAIT sockets · 53d384ae
      Eric Dumazet authored
      [ Upstream commit 80ad1d61 ]
      
      commit 3ab5aee7 ("net: Convert TCP & DCCP hash tables to use RCU /
      hlist_nulls") incorrectly used sock_put() on TIMEWAIT sockets.
      
      We should instead use inet_twsk_put()
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      53d384ae
    • Yuchung Cheng's avatar
      tcp: fix incorrect ca_state in tail loss probe · d8ac0f17
      Yuchung Cheng authored
      [ Upstream commit 031afe49 ]
      
      On receiving an ACK that covers the loss probe sequence, TLP
      immediately sets the congestion state to Open, even though some packets
      are not recovered and retransmisssion are on the way.  The later ACks
      may trigger a WARN_ON check in step D of tcp_fastretrans_alert(), e.g.,
      https://bugzilla.redhat.com/show_bug.cgi?id=989251
      
      The fix is to follow the similar procedure in recovery by calling
      tcp_try_keep_open(). The sender switches to Open state if no packets
      are retransmissted. Otherwise it goes to Disorder and let subsequent
      ACKs move the state to Recovery or Open.
      Reported-By: default avatarMichael Sterrett <michael@sterretts.net>
      Tested-By: default avatarDormando <dormando@rydia.net>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d8ac0f17
    • Eric Dumazet's avatar
      tcp: do not forget FIN in tcp_shifted_skb() · a50b0399
      Eric Dumazet authored
      [ Upstream commit 5e8a402f ]
      
      Yuchung found following problem :
      
       There are bugs in the SACK processing code, merging part in
       tcp_shift_skb_data(), that incorrectly resets or ignores the sacked
       skbs FIN flag. When a receiver first SACK the FIN sequence, and later
       throw away ofo queue (e.g., sack-reneging), the sender will stop
       retransmitting the FIN flag, and hangs forever.
      
      Following packetdrill test can be used to reproduce the bug.
      
      $ cat sack-merge-bug.pkt
      `sysctl -q net.ipv4.tcp_fack=0`
      
      // Establish a connection and send 10 MSS.
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +.000 bind(3, ..., ...) = 0
      +.000 listen(3, 1) = 0
      
      +.050 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
      +.000 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
      +.001 < . 1:1(0) ack 1 win 1024
      +.000 accept(3, ..., ...) = 4
      
      +.100 write(4, ..., 12000) = 12000
      +.000 shutdown(4, SHUT_WR) = 0
      +.000 > . 1:10001(10000) ack 1
      +.050 < . 1:1(0) ack 2001 win 257
      +.000 > FP. 10001:12001(2000) ack 1
      +.050 < . 1:1(0) ack 2001 win 257 <sack 10001:11001,nop,nop>
      +.050 < . 1:1(0) ack 2001 win 257 <sack 10001:12002,nop,nop>
      // SACK reneg
      +.050 < . 1:1(0) ack 12001 win 257
      +0 %{ print "unacked: ",tcpi_unacked }%
      +5 %{ print "" }%
      
      First, a typo inverted left/right of one OR operation, then
      code forgot to advance end_seq if the merged skb carried FIN.
      
      Bug was added in 2.6.29 by commit 832d11c5
      ("tcp: Try to restore large SKBs while SACK processing")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Acked-by: default avatarIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a50b0399
    • Eric Dumazet's avatar
      tcp: must unclone packets before mangling them · b81908e1
      Eric Dumazet authored
      [ Upstream commit c52e2421 ]
      
      TCP stack should make sure it owns skbs before mangling them.
      
      We had various crashes using bnx2x, and it turned out gso_size
      was cleared right before bnx2x driver was populating TC descriptor
      of the _previous_ packet send. TCP stack can sometime retransmit
      packets that are still in Qdisc.
      
      Of course we could make bnx2x driver more robust (using
      ACCESS_ONCE(shinfo->gso_size) for example), but the bug is TCP stack.
      
      We have identified two points where skb_unclone() was needed.
      
      This patch adds a WARN_ON_ONCE() to warn us if we missed another
      fix of this kind.
      
      Kudos to Neal for finding the root cause of this bug. Its visible
      using small MSS.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b81908e1
    • Eric Dumazet's avatar
      tcp: TSQ can use a dynamic limit · 0ae5f47e
      Eric Dumazet authored
      [ Upstream commit c9eeec26 ]
      
      When TCP Small Queues was added, we used a sysctl to limit amount of
      packets queues on Qdisc/device queues for a given TCP flow.
      
      Problem is this limit is either too big for low rates, or too small
      for high rates.
      
      Now TCP stack has rate estimation in sk->sk_pacing_rate, and TSO
      auto sizing, it can better control number of packets in Qdisc/device
      queues.
      
      New limit is two packets or at least 1 to 2 ms worth of packets.
      
      Low rates flows benefit from this patch by having even smaller
      number of packets in queues, allowing for faster recovery,
      better RTT estimations.
      
      High rates flows benefit from this patch by allowing more than 2 packets
      in flight as we had reports this was a limiting factor to reach line
      rate. [ In particular if TX completion is delayed because of coalescing
      parameters ]
      
      Example for a single flow on 10Gbp link controlled by FQ/pacing
      
      14 packets in flight instead of 2
      
      $ tc -s -d qd
      qdisc fq 8001: dev eth0 root refcnt 32 limit 10000p flow_limit 100p
      buckets 1024 quantum 3028 initial_quantum 15140
       Sent 1168459366606 bytes 771822841 pkt (dropped 0, overlimits 0
      requeues 6822476)
       rate 9346Mbit 771713pps backlog 953820b 14p requeues 6822476
        2047 flow, 2046 inactive, 1 throttled, delay 15673 ns
        2372 gc, 0 highprio, 0 retrans, 9739249 throttled, 0 flows_plimit
      
      Note that sk_pacing_rate is currently set to twice the actual rate, but
      this might be refined in the future when a flow is in congestion
      avoidance.
      
      Additional change : skb->destructor should be set to tcp_wfree().
      
      A future patch (for linux 3.13+) might remove tcp_limit_output_bytes
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Wei Liu <wei.liu2@citrix.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0ae5f47e
    • Eric Dumazet's avatar
      tcp: TSO packets automatic sizing · 5e25ba50
      Eric Dumazet authored
      [ Upstream commits 6d36824e730f247b602c90e8715a792003e3c5a7,
        02cf4ebd, and parts of
        7eec4174 ]
      
      After hearing many people over past years complaining against TSO being
      bursty or even buggy, we are proud to present automatic sizing of TSO
      packets.
      
      One part of the problem is that tcp_tso_should_defer() uses an heuristic
      relying on upcoming ACKS instead of a timer, but more generally, having
      big TSO packets makes little sense for low rates, as it tends to create
      micro bursts on the network, and general consensus is to reduce the
      buffering amount.
      
      This patch introduces a per socket sk_pacing_rate, that approximates
      the current sending rate, and allows us to size the TSO packets so
      that we try to send one packet every ms.
      
      This field could be set by other transports.
      
      Patch has no impact for high speed flows, where having large TSO packets
      makes sense to reach line rate.
      
      For other flows, this helps better packet scheduling and ACK clocking.
      
      This patch increases performance of TCP flows in lossy environments.
      
      A new sysctl (tcp_min_tso_segs) is added, to specify the
      minimal size of a TSO packet (default being 2).
      
      A follow-up patch will provide a new packet scheduler (FQ), using
      sk_pacing_rate as an input to perform optional per flow pacing.
      
      This explains why we chose to set sk_pacing_rate to twice the current
      rate, allowing 'slow start' ramp up.
      
      sk_pacing_rate = 2 * cwnd * mss / srtt
      
      v2: Neal Cardwell reported a suspect deferring of last two segments on
      initial write of 10 MSS, I had to change tcp_tso_should_defer() to take
      into account tp->xmit_size_goal_segs
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Van Jacobson <vanj@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5e25ba50
  2. 18 Oct, 2013 30 commits