1. 07 Aug, 2017 3 commits
    • Dean Jenkins's avatar
      asix: Add rx->ax_skb = NULL after usbnet_skb_return() · 22889dbb
      Dean Jenkins authored
      In asix_rx_fixup_internal() there is a risk that rx->ax_skb gets
      reused after passing the Ethernet frame into the network stack via
      usbnet_skb_return().
      
      The risks include:
      
      a) asynchronously freeing rx->ax_skb after passing the netdev buffer
         to the NAPI layer which might corrupt the backlog queue.
      
      b) erroneously reusing rx->ax_skb such as calling skb_put_data() multiple
         times which causes writing off the end of the netdev buffer.
      
      Therefore add a defensive rx->ax_skb = NULL after usbnet_skb_return()
      so that it is not possible to free rx->ax_skb or to apply
      skb_put_data() too many times.
      Signed-off-by: default avatarDean Jenkins <Dean_Jenkins@mentor.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      22889dbb
    • Thomas Richter's avatar
      bpf: fix selftest/bpf/test_pkt_md_access on s390x · f9ea3225
      Thomas Richter authored
      Commit 18f3d6be ("selftests/bpf: Add test cases to test narrower ctx field loads")
      introduced new eBPF test cases. One of them (test_pkt_md_access.c)
      fails on s390x. The BPF verifier error message is:
      
      [root@s8360046 bpf]# ./test_progs
      test_pkt_access:PASS:ipv4 349 nsec
      test_pkt_access:PASS:ipv6 212 nsec
      [....]
      libbpf: load bpf program failed: Permission denied
      libbpf: -- BEGIN DUMP LOG ---
      libbpf:
      0: (71) r2 = *(u8 *)(r1 +0)
      invalid bpf_context access off=0 size=1
      
      libbpf: -- END LOG --
      libbpf: failed to load program 'test1'
      libbpf: failed to load object './test_pkt_md_access.o'
      Summary: 29 PASSED, 1 FAILED
      [root@s8360046 bpf]#
      
      This is caused by a byte endianness issue. S390x is a big endian
      architecture.  Pointer access to the lowest byte or halfword of a
      four byte value need to add an offset.
      On little endian architectures this offset is not needed.
      
      Fix this and use the same approach as the originator used for other files
      (for example test_verifier.c) in his original commit.
      
      With this fix the test program test_progs succeeds on s390x:
      [root@s8360046 bpf]# ./test_progs
      test_pkt_access:PASS:ipv4 236 nsec
      test_pkt_access:PASS:ipv6 217 nsec
      test_xdp:PASS:ipv4 3624 nsec
      test_xdp:PASS:ipv6 1722 nsec
      test_l4lb:PASS:ipv4 926 nsec
      test_l4lb:PASS:ipv6 1322 nsec
      test_tcp_estats:PASS: 0 nsec
      test_bpf_obj_id:PASS:get-fd-by-notexist-prog-id 0 nsec
      test_bpf_obj_id:PASS:get-fd-by-notexist-map-id 0 nsec
      test_bpf_obj_id:PASS:get-prog-info(fd) 0 nsec
      test_bpf_obj_id:PASS:get-map-info(fd) 0 nsec
      test_bpf_obj_id:PASS:get-prog-info(fd) 0 nsec
      test_bpf_obj_id:PASS:get-map-info(fd) 0 nsec
      test_bpf_obj_id:PASS:get-prog-fd(next_id) 0 nsec
      test_bpf_obj_id:PASS:get-prog-info(next_id->fd) 0 nsec
      test_bpf_obj_id:PASS:get-prog-fd(next_id) 0 nsec
      test_bpf_obj_id:PASS:get-prog-info(next_id->fd) 0 nsec
      test_bpf_obj_id:PASS:check total prog id found by get_next_id 0 nsec
      test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
      test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
      test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
      test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
      test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
      test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
      test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
      test_bpf_obj_id:PASS:check get-map-info(next_id->fd) 0 nsec
      test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
      test_bpf_obj_id:PASS:check get-map-info(next_id->fd) 0 nsec
      test_bpf_obj_id:PASS:check total map id found by get_next_id 0 nsec
      test_pkt_md_access:PASS: 277 nsec
      Summary: 30 PASSED, 0 FAILED
      [root@s8360046 bpf]#
      
      Fixes: 18f3d6be ("selftests/bpf: Add test cases to test narrower ctx field loads")
      Signed-off-by: default avatarThomas Richter <tmricht@linux.vnet.ibm.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f9ea3225
    • stephen hemminger's avatar
      netvsc: fix race on sub channel creation · 732e4985
      stephen hemminger authored
      The existing sub channel code did not wait for all the sub-channels
      to completely initialize. This could lead to race causing crash
      in napi_netif_del() from bad list. The existing code would send
      an init message, then wait only for the initial response that
      the init message was received. It thought it was waiting for
      sub channels but really the init response did the wakeup.
      
      The new code keeps track of the number of open channels and
      waits until that many are open.
      
      Other issues here were:
        * host might return less sub-channels than was requested.
        * the new init status is not valid until after init was completed.
      
      Fixes: b3e6b82a ("hv_netvsc: Wait for sub-channels to be processed during probe")
      Signed-off-by: default avatarStephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      732e4985
  2. 04 Aug, 2017 9 commits
    • Daniel Borkmann's avatar
      bpf: fix byte order test in test_verifier · 2c460621
      Daniel Borkmann authored
      We really must check with #if __BYTE_ORDER == XYZ instead of
      just presence of #ifdef __LITTLE_ENDIAN. I noticed that when
      actually running this on big endian machine, the latter test
      resolves to true for user space, same for #ifdef __BIG_ENDIAN.
      
      E.g., looking at endian.h from libc, both are also defined
      there, so we really must test this against __BYTE_ORDER instead
      for proper insns selection. For the kernel, such checks are
      fine though e.g. see 13da9e20 ("Revert "endian: #define
      __BYTE_ORDER"") and 415586c9 ("UAPI: fix endianness conditionals
      in M32R's asm/stat.h") for some more context, but not for
      user space. Lets also make sure to properly include endian.h.
      After that, suite passes for me:
      
      ./test_verifier: ELF 64-bit MSB executable, [...]
      
      Linux foo 4.13.0-rc3+ #4 SMP Fri Aug 4 06:59:30 EDT 2017 s390x s390x s390x GNU/Linux
      
      Before fix: Summary: 505 PASSED, 11 FAILED
      After  fix: Summary: 516 PASSED,  0 FAILED
      
      Fixes: 18f3d6be ("selftests/bpf: Add test cases to test narrower ctx field loads")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarYonghong <yhs@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2c460621
    • Thomas Bogendoerfer's avatar
      xgene: Always get clk source, but ignore if it's missing for SGMII ports · aaf83aec
      Thomas Bogendoerfer authored
      Even the driver doesn't do anything with the clk source for SGMII
      ports it needs to be enabled by doing a devm_clk_get(), if there is
      a clk source in DT.
      
      Fixes: 0db01097 ('xgene: Don't fail probe, if there is no clk resource for SGMII interfaces')
      Signed-off-by: default avatarThomas Bogendoerfer <tbogendoerfer@suse.de>
      Tested-by: default avatarLaura Abbott <labbott@redhat.com>
      Acked-by: default avatarIyappan Subramanian <isubramanian@apm.com>
      Tested-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      aaf83aec
    • David Daney's avatar
      MIPS: Add missing file for eBPF JIT. · b6bd53f9
      David Daney authored
      Inexplicably, commit f381bf6d ("MIPS: Add support for eBPF JIT.")
      lost a file somewhere on its path to Linus' tree.  Add back the
      missing ebpf_jit.c so that we can build with CONFIG_BPF_JIT selected.
      
      This version of ebpf_jit.c is identical to the original except for two
      minor change need to resolve conflicts with changes merged from the
      BPF branch:
      
      A) Set prog->jited_len = image_size;
      B) Use BPF_TAIL_CALL instead of BPF_CALL | BPF_X
      
      Fixes: f381bf6d ("MIPS: Add support for eBPF JIT.")
      Signed-off-by: default avatarDavid Daney <david.daney@cavium.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b6bd53f9
    • David S. Miller's avatar
      Merge branch 's390-bpf-jit-fixes' · 7a973251
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      Two BPF fixes for s390
      
      Found while testing some other work touching JITs.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7a973251
    • Daniel Borkmann's avatar
      bpf, s390: fix build for libbpf and selftest suite · bad1926d
      Daniel Borkmann authored
      The BPF feature test as well as libbpf is missing the __NR_bpf
      define for s390 and currently refuses to compile (selftest suite
      depends on libbpf as well). Similar issue was fixed some time
      ago via b0c47807 ("bpf: Add sparc support to tools and
      samples."), just do the same and add definitions.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bad1926d
    • Daniel Borkmann's avatar
      bpf, s390: fix jit branch offset related to ldimm64 · b0a0c256
      Daniel Borkmann authored
      While testing some other work that required JIT modifications, I
      run into test_bpf causing a hang when JIT enabled on s390. The
      problematic test case was the one from ddc665a4 (bpf, arm64:
      fix jit branch offset related to ldimm64), and turns out that we
      do have a similar issue on s390 as well. In bpf_jit_prog() we
      update next instruction address after returning from bpf_jit_insn()
      with an insn_count. bpf_jit_insn() returns either -1 in case of
      error (e.g. unsupported insn), 1 or 2. The latter is only the
      case for ldimm64 due to spanning 2 insns, however, next address
      is only set to i + 1 not taking actual insn_count into account,
      thus fix is to use insn_count instead of 1. bpf_jit_enable in
      mode 2 provides also disasm on s390:
      
      Before fix:
      
        000003ff800349b6: a7f40003   brc     15,3ff800349bc                 ; target
        000003ff800349ba: 0000               unknown
        000003ff800349bc: e3b0f0700024       stg     %r11,112(%r15)
        000003ff800349c2: e3e0f0880024       stg     %r14,136(%r15)
        000003ff800349c8: 0db0               basr    %r11,%r0
        000003ff800349ca: c0ef00000000       llilf   %r14,0
        000003ff800349d0: e320b0360004       lg      %r2,54(%r11)
        000003ff800349d6: e330b03e0004       lg      %r3,62(%r11)
        000003ff800349dc: ec23ffeda065       clgrj   %r2,%r3,10,3ff800349b6 ; jmp
        000003ff800349e2: e3e0b0460004       lg      %r14,70(%r11)
        000003ff800349e8: e3e0b04e0004       lg      %r14,78(%r11)
        000003ff800349ee: b904002e   lgr     %r2,%r14
        000003ff800349f2: e3b0f0700004       lg      %r11,112(%r15)
        000003ff800349f8: e3e0f0880004       lg      %r14,136(%r15)
        000003ff800349fe: 07fe               bcr     15,%r14
      
      After fix:
      
        000003ff80ef3db4: a7f40003   brc     15,3ff80ef3dba
        000003ff80ef3db8: 0000               unknown
        000003ff80ef3dba: e3b0f0700024       stg     %r11,112(%r15)
        000003ff80ef3dc0: e3e0f0880024       stg     %r14,136(%r15)
        000003ff80ef3dc6: 0db0               basr    %r11,%r0
        000003ff80ef3dc8: c0ef00000000       llilf   %r14,0
        000003ff80ef3dce: e320b0360004       lg      %r2,54(%r11)
        000003ff80ef3dd4: e330b03e0004       lg      %r3,62(%r11)
        000003ff80ef3dda: ec230006a065       clgrj   %r2,%r3,10,3ff80ef3de6 ; jmp
        000003ff80ef3de0: e3e0b0460004       lg      %r14,70(%r11)
        000003ff80ef3de6: e3e0b04e0004       lg      %r14,78(%r11)          ; target
        000003ff80ef3dec: b904002e   lgr     %r2,%r14
        000003ff80ef3df0: e3b0f0700004       lg      %r11,112(%r15)
        000003ff80ef3df6: e3e0f0880004       lg      %r14,136(%r15)
        000003ff80ef3dfc: 07fe               bcr     15,%r14
      
      test_bpf.ko suite runs fine after the fix.
      
      Fixes: 05462310 ("s390/bpf: Add s390x eBPF JIT compiler backend")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Tested-by: default avatarMichael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b0a0c256
    • David S. Miller's avatar
      Merge branch 'mlxsw-Couple-of-fixes' · 1aff0c34
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      mlxsw: Couple of fixes
      
      Ido says:
      
      The first patch prevents us from warning about valid situations that can
      happen due to the fact that some operations in switchdev are deferred.
      
      Second patch fixes a long standing problem in which we didn't correctly
      free resources upon module removal, resulting in a memory leak.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1aff0c34
    • Ido Schimmel's avatar
      mlxsw: spectrum_switchdev: Release multicast groups during fini · 852cfeed
      Ido Schimmel authored
      Each multicast group (MID) stores a bitmap of ports to which a packet
      should be forwarded to in case an MDB entry associated with the MID is
      hit.
      
      Since the initial introduction of IGMP snooping in commit 3a49b4fd
      ("mlxsw: Adding layer 2 multicast support") the driver didn't correctly
      free these multicast groups upon ungraceful situations such as the
      removal of the upper bridge device or module removal.
      
      The correct way to fix this is to associate each MID with the bridge
      ports member in it and then drop the reference in case the bridge port
      is destroyed, but this will result in a lot more code and will be fixed
      in net-next.
      
      For now, upon module removal, traverse the MID list and release each
      one.
      
      Fixes: 3a49b4fd ("mlxsw: Adding layer 2 multicast support")
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      852cfeed
    • Ido Schimmel's avatar
      mlxsw: spectrum_switchdev: Don't warn about valid situations · 17b334a8
      Ido Schimmel authored
      Some operations in the bridge driver such as MDB deletion are preformed
      in an atomic context and thus deferred to a process context by the
      switchdev infrastructure.
      
      Therefore, by the time the operation is performed by the underlying
      device driver it's possible the bridge port context is already gone.
      This is especially true for removal flows, but theoretically can also be
      invoked during addition.
      
      Remove the warnings in such situations and return normally.
      
      Fixes: c57529e1 ("mlxsw: spectrum: Replace vPorts with Port-VLAN")
      Fixes: 3922285d ("net: bridge: Add support for offloading port attributes")
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      17b334a8
  3. 03 Aug, 2017 7 commits
    • David S. Miller's avatar
      Merge branch 'tcp-xmit-timer-rearming' · 337f1b07
      David S. Miller authored
      Neal Cardwell says:
      
      ====================
      tcp: fix xmit timer rearming to avoid stalls
      
      This patch series is a bug fix for a TCP loss recovery performance bug
      reported independently in recent netdev threads:
      
       (i)  July 26, 2017: netdev thread "TCP fast retransmit issues"
       (ii) July 26, 2017: netdev thread:
             "[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one
             outstanding TLP retransmission"
      
      Many thanks to Klavs Klavsen and Mao Wenan for the detailed reports,
      traces, and packetdrill test cases, which enabled us to root-cause
      this issue and verify the fix.
      
      - v1 -> v2:
       - In patch 2/3, changed an unclear comment in the pre-existing code
         in tcp_schedule_loss_probe() to be more clear (thanks to Eric Dumazet
         for suggesting we improve this).
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      337f1b07
    • Neal Cardwell's avatar
      tcp: fix xmit timer to only be reset if data ACKed/SACKed · df92c839
      Neal Cardwell authored
      Fix a TCP loss recovery performance bug raised recently on the netdev
      list, in two threads:
      
      (i)  July 26, 2017: netdev thread "TCP fast retransmit issues"
      (ii) July 26, 2017: netdev thread:
           "[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one
           outstanding TLP retransmission"
      
      The basic problem is that incoming TCP packets that did not indicate
      forward progress could cause the xmit timer (TLP or RTO) to be rearmed
      and pushed back in time. In certain corner cases this could result in
      the following problems noted in these threads:
      
       - Repeated ACKs coming in with bogus SACKs corrupted by middleboxes
         could cause TCP to repeatedly schedule TLPs forever. We kept
         sending TLPs after every ~200ms, which elicited bogus SACKs, which
         caused more TLPs, ad infinitum; we never fired an RTO to fill in
         the holes.
      
       - Incoming data segments could, in some cases, cause us to reschedule
         our RTO or TLP timer further out in time, for no good reason. This
         could cause repeated inbound data to result in stalls in outbound
         data, in the presence of packet loss.
      
      This commit fixes these bugs by changing the TLP and RTO ACK
      processing to:
      
       (a) Only reschedule the xmit timer once per ACK.
      
       (b) Only reschedule the xmit timer if tcp_clean_rtx_queue() deems the
           ACK indicates sufficient forward progress (a packet was
           cumulatively ACKed, or we got a SACK for a packet that was sent
           before the most recent retransmit of the write queue head).
      
      This brings us back into closer compliance with the RFCs, since, as
      the comment for tcp_rearm_rto() notes, we should only restart the RTO
      timer after forward progress on the connection. Previously we were
      restarting the xmit timer even in these cases where there was no
      forward progress.
      
      As a side benefit, this commit simplifies and speeds up the TCP timer
      arming logic. We had been calling inet_csk_reset_xmit_timer() three
      times on normal ACKs that cumulatively acknowledged some data:
      
      1) Once near the top of tcp_ack() to switch from TLP timer to RTO:
              if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
                     tcp_rearm_rto(sk);
      
      2) Once in tcp_clean_rtx_queue(), to update the RTO:
              if (flag & FLAG_ACKED) {
                     tcp_rearm_rto(sk);
      
      3) Once in tcp_ack() after tcp_fastretrans_alert() to switch from RTO
         to TLP:
              if (icsk->icsk_pending == ICSK_TIME_RETRANS)
                     tcp_schedule_loss_probe(sk);
      
      This commit, by only rescheduling the xmit timer once per ACK,
      simplifies the code and reduces CPU overhead.
      
      This commit was tested in an A/B test with Google web server
      traffic. SNMP stats and request latency metrics were within noise
      levels, substantiating that for normal web traffic patterns this is a
      rare issue. This commit was also tested with packetdrill tests to
      verify that it fixes the timer behavior in the corner cases discussed
      in the netdev threads mentioned above.
      
      This patch is a bug fix patch intended to be queued for -stable
      relases.
      
      Fixes: 6ba8a3b1 ("tcp: Tail loss probe (TLP)")
      Reported-by: default avatarKlavs Klavsen <kl@vsen.dk>
      Reported-by: default avatarMao Wenan <maowenan@huawei.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarNandita Dukkipati <nanditad@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      df92c839
    • Neal Cardwell's avatar
      tcp: enable xmit timer fix by having TLP use time when RTO should fire · a2815817
      Neal Cardwell authored
      Have tcp_schedule_loss_probe() base the TLP scheduling decision based
      on when the RTO *should* fire. This is to enable the upcoming xmit
      timer fix in this series, where tcp_schedule_loss_probe() cannot
      assume that the last timer installed was an RTO timer (because we are
      no longer doing the "rearm RTO, rearm RTO, rearm TLP" dance on every
      ACK). So tcp_schedule_loss_probe() must independently figure out when
      an RTO would want to fire.
      
      In the new TLP implementation following in this series, we cannot
      assume that icsk_timeout was set based on an RTO; after processing a
      cumulative ACK the icsk_timeout we see can be from a previous TLP or
      RTO. So we need to independently recalculate the RTO time (instead of
      reading it out of icsk_timeout). Removing this dependency on the
      nature of icsk_timeout makes things a little easier to reason about
      anyway.
      
      Note that the old and new code should be equivalent, since they are
      both saying: "if the RTO is in the future, but at an earlier time than
      the normal TLP time, then set the TLP timer to fire when the RTO would
      have fired".
      
      Fixes: 6ba8a3b1 ("tcp: Tail loss probe (TLP)")
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarNandita Dukkipati <nanditad@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a2815817
    • Neal Cardwell's avatar
      tcp: introduce tcp_rto_delta_us() helper for xmit timer fix · e1a10ef7
      Neal Cardwell authored
      Pure refactor. This helper will be required in the xmit timer fix
      later in the patch series. (Because the TLP logic will want to make
      this calculation.)
      
      Fixes: 6ba8a3b1 ("tcp: Tail loss probe (TLP)")
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarNandita Dukkipati <nanditad@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e1a10ef7
    • Xin Long's avatar
      ipv6: set rt6i_protocol properly in the route when it is installed · b91d5329
      Xin Long authored
      After commit c2ed1880 ("net: ipv6: check route protocol when
      deleting routes"), ipv6 route checks rt protocol when trying to
      remove a rt entry.
      
      It introduced a side effect causing 'ip -6 route flush cache' not
      to work well. When flushing caches with iproute, all route caches
      get dumped from kernel then removed one by one by sending DELROUTE
      requests to kernel for each cache.
      
      The thing is iproute sends the request with the cache whose proto
      is set with RTPROT_REDIRECT by rt6_fill_node() when kernel dumps
      it. But in kernel the rt_cache protocol is still 0, which causes
      the cache not to be matched and removed.
      
      So the real reason is rt6i_protocol in the route is not set when
      it is allocated. As David Ahern's suggestion, this patch is to
      set rt6i_protocol properly in the route when it is installed and
      remove the codes setting rtm_protocol according to rt6i_flags in
      rt6_fill_node.
      
      This is also an improvement to keep rt6i_protocol consistent with
      rtm_protocol.
      
      Fixes: c2ed1880 ("net: ipv6: check route protocol when deleting routes")
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Suggested-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b91d5329
    • Eric Dumazet's avatar
      net: fix keepalive code vs TCP_FASTOPEN_CONNECT · 2dda6400
      Eric Dumazet authored
      syzkaller was able to trigger a divide by 0 in TCP stack [1]
      
      Issue here is that keepalive timer needs to be updated to not attempt
      to send a probe if the connection setup was deferred using
      TCP_FASTOPEN_CONNECT socket option added in linux-4.11
      
      [1]
       divide error: 0000 [#1] SMP
       CPU: 18 PID: 0 Comm: swapper/18 Not tainted
       task: ffff986f62f4b040 ti: ffff986f62fa2000 task.ti: ffff986f62fa2000
       RIP: 0010:[<ffffffff8409cc0d>]  [<ffffffff8409cc0d>] __tcp_select_window+0x8d/0x160
       Call Trace:
        <IRQ>
        [<ffffffff8409d951>] tcp_transmit_skb+0x11/0x20
        [<ffffffff8409da21>] tcp_xmit_probe_skb+0xc1/0xe0
        [<ffffffff840a0ee8>] tcp_write_wakeup+0x68/0x160
        [<ffffffff840a151b>] tcp_keepalive_timer+0x17b/0x230
        [<ffffffff83b3f799>] call_timer_fn+0x39/0xf0
        [<ffffffff83b40797>] run_timer_softirq+0x1d7/0x280
        [<ffffffff83a04ddb>] __do_softirq+0xcb/0x257
        [<ffffffff83ae03ac>] irq_exit+0x9c/0xb0
        [<ffffffff83a04c1a>] smp_apic_timer_interrupt+0x6a/0x80
        [<ffffffff83a03eaf>] apic_timer_interrupt+0x7f/0x90
        <EOI>
        [<ffffffff83fed2ea>] ? cpuidle_enter_state+0x13a/0x3b0
        [<ffffffff83fed2cd>] ? cpuidle_enter_state+0x11d/0x3b0
      
      Tested:
      
      Following packetdrill no longer crashes the kernel
      
      `echo 0 >/proc/sys/net/ipv4/tcp_timestamps`
      
      // Cache warmup: send a Fast Open cookie request
          0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
         +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
         +0 setsockopt(3, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
         +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation is now in progress)
         +0 > S 0:0(0) <mss 1460,nop,nop,sackOK,nop,wscale 8,FO,nop,nop>
       +.01 < S. 123:123(0) ack 1 win 14600 <mss 1460,nop,nop,sackOK,nop,wscale 6,FO abcd1234,nop,nop>
         +0 > . 1:1(0) ack 1
         +0 close(3) = 0
         +0 > F. 1:1(0) ack 1
         +0 < F. 1:1(0) ack 2 win 92
         +0 > .  2:2(0) ack 2
      
         +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
         +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
         +0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
         +0 setsockopt(4, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
       +.01 connect(4, ..., ...) = 0
         +0 setsockopt(4, SOL_TCP, TCP_KEEPIDLE, [5], 4) = 0
         +10 close(4) = 0
      
      `echo 1 >/proc/sys/net/ipv4/tcp_timestamps`
      
      Fixes: 19f6d3f3 ("net/tcp-fastopen: Add new API support")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2dda6400
    • David S. Miller's avatar
      Merge tag 'batadv-net-for-davem-20170802' of git://git.open-mesh.org/linux-merge · 4d2bbb0e
      David S. Miller authored
      Simon Wunderlich says:
      
      ====================
      Here is a batman-adv bugfix:
      
       - fix TT sync flag inconsistency problems, which can lead to excess packets,
         by Linus Luessing
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4d2bbb0e
  4. 02 Aug, 2017 19 commits
  5. 01 Aug, 2017 2 commits
    • K. Den's avatar
      gue: fix remcsum when GRO on and CHECKSUM_PARTIAL boundary is outer UDP · 1bff8a0c
      K. Den authored
      In the case that GRO is turned on and the original received packet is
      CHECKSUM_PARTIAL, if the outer UDP header is exactly at the last
      csum-unnecessary point, which for instance could occur if the packet
      comes from another Linux guest on the same Linux host, we have to do
      either remcsum_adjust or set up CHECKSUM_PARTIAL again with its
      csum_start properly reset considering RCO.
      
      However, since b7fe10e5 ("gro: Fix remcsum offload to deal with frags
      in GRO") that barrier in such case could be skipped if GRO turned on,
      hence we pass over it and the inner L4 validation mistakenly reckons
      it as a bad csum.
      
      This patch makes remcsum_offload being reset at the same time of GRO
      remcsum cleanup, so as to make it work in such case as before.
      
      Fixes: b7fe10e5 ("gro: Fix remcsum offload to deal with frags in GRO")
      Signed-off-by: default avatarKoichiro Den <den@klaipeden.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1bff8a0c
    • K. Den's avatar
      vxlan: fix remcsum when GRO on and CHECKSUM_PARTIAL boundary is outer UDP · be73b304
      K. Den authored
      In the case that GRO is turned on and the original received packet is
      CHECKSUM_PARTIAL, if the outer UDP header is exactly at the last
      csum-unnecessary point, which for instance could occur if the packet
      comes from another Linux guest on the same Linux host, we have to do
      either remcsum_adjust or set up CHECKSUM_PARTIAL again with its
      csum_start properly reset considering RCO.
      
      However, since b7fe10e5("gro: Fix remcsum offload to deal with frags
      in GRO") that barrier in such case could be skipped if GRO turned on,
      hence we pass over it and the inner L4 validation mistakenly reckons
      it as a bad csum.
      
      This patch makes remcsum_offload being reset at the same time of GRO
      remcsum cleanup, so as to make it work in such case as before.
      
      Fixes: b7fe10e5 ("gro: Fix remcsum offload to deal with frags in GRO")
      Signed-off-by: default avatarKoichiro Den <den@klaipeden.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      be73b304