1. 26 May, 2017 3 commits
    • Peter Dawson's avatar
      ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets · 0e9a7095
      Peter Dawson authored
      This fix addresses two problems in the way the DSCP field is formulated
       on the encapsulating header of IPv6 tunnels.
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195661
      
      1) The IPv6 tunneling code was manipulating the DSCP field of the
       encapsulating packet using the 32b flowlabel. Since the flowlabel is
       only the lower 20b it was incorrect to assume that the upper 12b
       containing the DSCP and ECN fields would remain intact when formulating
       the encapsulating header. This fix handles the 'inherit' and
       'fixed-value' DSCP cases explicitly using the extant dsfield u8 variable.
      
      2) The use of INET_ECN_encapsulate(0, dsfield) in ip6_tnl_xmit was
       incorrect and resulted in the DSCP value always being set to 0.
      
      Commit 90427ef5 ("ipv6: fix flow labels when the traffic class
       is non-0") caused the regression by masking out the flowlabel
       which exposed the incorrect handling of the DSCP portion of the
       flowlabel in ip6_tunnel and ip6_gre.
      
      Fixes: 90427ef5 ("ipv6: fix flow labels when the traffic class is non-0")
      Signed-off-by: default avatarPeter Dawson <peter.a.dawson@boeing.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0e9a7095
    • Davide Caratti's avatar
      sctp: fix ICMP processing if skb is non-linear · 804ec7eb
      Davide Caratti authored
      sometimes ICMP replies to INIT chunks are ignored by the client, even if
      the encapsulated SCTP headers match an open socket. This happens when the
      ICMP packet is carried by a paged skb: use skb_header_pointer() to read
      packet contents beyond the SCTP header, so that chunk header and initiate
      tag are validated correctly.
      
      v2:
      - don't use skb_header_pointer() to read the transport header, since
        icmp_socket_deliver() already puts these 8 bytes in the linear area.
      - change commit message to make specific reference to INIT chunks.
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Reviewed-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      804ec7eb
    • linzhang's avatar
      net: llc: add lock_sock in llc_ui_bind to avoid a race condition · 0908cf4d
      linzhang authored
      There is a race condition in llc_ui_bind if two or more processes/threads
      try to bind a same socket.
      
      If more processes/threads bind a same socket success that will lead to
      two problems, one is this action is not what we expected, another is
      will lead to kernel in unstable status or oops(in my simple test case,
      cause llc2.ko can't unload).
      
      The current code is test SOCK_ZAPPED bit to avoid a process to
      bind a same socket twice but that is can't avoid more processes/threads
      try to bind a same socket at the same time.
      
      So, add lock_sock in llc_ui_bind like others, such as llc_ui_connect.
      Signed-off-by: default avatarLin Zhang <xiaolou4617@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0908cf4d
  2. 25 May, 2017 13 commits
    • Nithin Sujir's avatar
      bonding: Don't update slave->link until ready to commit · 797a9364
      Nithin Sujir authored
      In the loadbalance arp monitoring scheme, when a slave link change is
      detected, the slave->link is immediately updated and slave_state_changed
      is set. Later down the function, the rtnl_lock is acquired and the
      changes are committed, updating the bond link state.
      
      However, the acquisition of the rtnl_lock can fail. The next time the
      monitor runs, since slave->link is already updated, it determines that
      link is unchanged. This results in the bond link state permanently out
      of sync with the slave link.
      
      This patch modifies bond_loadbalance_arp_mon() to handle link changes
      identical to bond_ab_arp_{inspect/commit}(). The new link state is
      maintained in slave->new_link until we're ready to commit at which point
      it's copied into slave->link.
      
      NOTE: miimon_{inspect/commit}() has a more complex state machine
      requiring the use of the bond_{propose,commit}_link_state() functions
      which maintains the intermediate state in slave->link_new_state. The arp
      monitors don't require that.
      
      Testing: This bug is very easy to reproduce with the following steps.
      1. In a loop, toggle a slave link of a bond slave interface.
      2. In a separate loop, do ifconfig up/down of an unrelated interface to
      create contention for rtnl_lock.
      Within a few iterations, the bond link goes out of sync with the slave
      link.
      Signed-off-by: default avatarNithin Nayak Sujir <nsujir@tintri.com>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Cc: Jay Vosburgh <jay.vosburgh@canonical.com>
      Acked-by: default avatarMahesh Bandewar <maheshb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      797a9364
    • David Daney's avatar
      test_bpf: Add a couple of tests for BPF_JSGE. · 791caeb0
      David Daney authored
      Some JITs can optimize comparisons with zero.  Add a couple of
      BPF_JSGE tests against immediate zero.
      Signed-off-by: default avatarDavid Daney <david.daney@cavium.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      791caeb0
    • David S. Miller's avatar
      Merge branch 'bpf-fixes' · ae08ea97
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      Various BPF fixes
      
      Follow-up to fix incorrect pruning when alignment tracking is
      in use and to properly clear regs after call to not leave stale
      data behind, also a fix that adds bpf_clone_redirect to the
      bpf_helper_changes_pkt_data helper and exposes correct map_flags
      for lpm map into fdinfo. For details, please see individual
      patches.
      
      v1 -> v2:
        - Reworked first patch so that env->strict_alignment is the
          final indicator on whether we have to deal with strict
          alignment rather than having CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
          checks on various locations, so only checking env->strict_alignment
          is sufficient after that. Thanks for spotting, Dave!
        - Added patch 3 and 4.
        - Rest as is.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ae08ea97
    • Daniel Borkmann's avatar
      bpf: add various verifier test cases · 614d0d77
      Daniel Borkmann authored
      This patch adds various verifier test cases:
      
      1) A test case for the pruning issue when tracking alignment
         is used.
      2) Various PTR_TO_MAP_VALUE_OR_NULL tests to make sure pointer
         arithmetic turns such register into UNKNOWN_VALUE type.
      3) Test cases for the special treatment of LD_ABS/LD_IND to
         make sure verifier doesn't break calling convention here.
         Latter is needed, since f.e. arm64 JIT uses r1 - r5 for
         storing temporary data, so they really must be marked as
         NOT_INIT.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      614d0d77
    • Daniel Borkmann's avatar
      bpf: fix wrong exposure of map_flags into fdinfo for lpm · a316338c
      Daniel Borkmann authored
      trie_alloc() always needs to have BPF_F_NO_PREALLOC passed in via
      attr->map_flags, since it does not support preallocation yet. We
      check the flag, but we never copy the flag into trie->map.map_flags,
      which is later on exposed into fdinfo and used by loaders such as
      iproute2. Latter uses this in bpf_map_selfcheck_pinned() to test
      whether a pinned map has the same spec as the one from the BPF obj
      file and if not, bails out, which is currently the case for lpm
      since it exposes always 0 as flags.
      
      Also copy over flags in array_map_alloc() and stack_map_alloc().
      They always have to be 0 right now, but we should make sure to not
      miss to copy them over at a later point in time when we add actual
      flags for them to use.
      
      Fixes: b95a5c4d ("bpf: add a longest prefix match trie map implementation")
      Reported-by: default avatarJarno Rajahalme <jarno@covalent.io>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a316338c
    • Daniel Borkmann's avatar
      bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data · 41703a73
      Daniel Borkmann authored
      The bpf_clone_redirect() still needs to be listed in
      bpf_helper_changes_pkt_data() since we call into
      bpf_try_make_head_writable() from there, thus we need
      to invalidate prior pkt regs as well.
      
      Fixes: 36bbef52 ("bpf: direct packet write and access for helpers for clsact progs")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      41703a73
    • Daniel Borkmann's avatar
      bpf: properly reset caller saved regs after helper call and ld_abs/ind · a9789ef9
      Daniel Borkmann authored
      Currently, after performing helper calls, we clear all caller saved
      registers, that is r0 - r5 and fill r0 depending on struct bpf_func_proto
      specification. The way we reset these regs can affect pruning decisions
      in later paths, since we only reset register's imm to 0 and type to
      NOT_INIT. However, we leave out clearing of other variables such as id,
      min_value, max_value, etc, which can later on lead to pruning mismatches
      due to stale data.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a9789ef9
    • Daniel Borkmann's avatar
      bpf: fix incorrect pruning decision when alignment must be tracked · 1ad2f583
      Daniel Borkmann authored
      Currently, when we enforce alignment tracking on direct packet access,
      the verifier lets the following program pass despite doing a packet
      write with unaligned access:
      
        0: (61) r2 = *(u32 *)(r1 +76)
        1: (61) r3 = *(u32 *)(r1 +80)
        2: (61) r7 = *(u32 *)(r1 +8)
        3: (bf) r0 = r2
        4: (07) r0 += 14
        5: (25) if r7 > 0x1 goto pc+4
         R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
         R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp
        6: (2d) if r0 > r3 goto pc+1
         R0=pkt(id=0,off=14,r=14) R1=ctx R2=pkt(id=0,off=0,r=14)
         R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp
        7: (63) *(u32 *)(r0 -4) = r0
        8: (b7) r0 = 0
        9: (95) exit
      
        from 6 to 8:
         R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
         R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp
        8: (b7) r0 = 0
        9: (95) exit
      
        from 5 to 10:
         R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
         R3=pkt_end R7=inv,min_value=2 R10=fp
        10: (07) r0 += 1
        11: (05) goto pc-6
        6: safe                           <----- here, wrongly found safe
        processed 15 insns
      
      However, if we enforce a pruning mismatch by adding state into r8
      which is then being mismatched in states_equal(), we find that for
      the otherwise same program, the verifier detects a misaligned packet
      access when actually walking that path:
      
        0: (61) r2 = *(u32 *)(r1 +76)
        1: (61) r3 = *(u32 *)(r1 +80)
        2: (61) r7 = *(u32 *)(r1 +8)
        3: (b7) r8 = 1
        4: (bf) r0 = r2
        5: (07) r0 += 14
        6: (25) if r7 > 0x1 goto pc+4
         R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
         R3=pkt_end R7=inv,min_value=0,max_value=1
         R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
        7: (2d) if r0 > r3 goto pc+1
         R0=pkt(id=0,off=14,r=14) R1=ctx R2=pkt(id=0,off=0,r=14)
         R3=pkt_end R7=inv,min_value=0,max_value=1
         R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
        8: (63) *(u32 *)(r0 -4) = r0
        9: (b7) r0 = 0
        10: (95) exit
      
        from 7 to 9:
         R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
         R3=pkt_end R7=inv,min_value=0,max_value=1
         R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
        9: (b7) r0 = 0
        10: (95) exit
      
        from 6 to 11:
         R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
         R3=pkt_end R7=inv,min_value=2
         R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
        11: (07) r0 += 1
        12: (b7) r8 = 0
        13: (05) goto pc-7                <----- mismatch due to r8
        7: (2d) if r0 > r3 goto pc+1
         R0=pkt(id=0,off=15,r=15) R1=ctx R2=pkt(id=0,off=0,r=15)
         R3=pkt_end R7=inv,min_value=2
         R8=imm0,min_value=0,max_value=0,min_align=2147483648 R10=fp
        8: (63) *(u32 *)(r0 -4) = r0
        misaligned packet access off 2+15+-4 size 4
      
      The reason why we fail to see it in states_equal() is that the
      third test in compare_ptrs_to_packet() ...
      
        if (old->off <= cur->off &&
            old->off >= old->range && cur->off >= cur->range)
                return true;
      
      ... will let the above pass. The situation we run into is that
      old->off <= cur->off (14 <= 15), meaning that prior walked paths
      went with smaller offset, which was later used in the packet
      access after successful packet range check and found to be safe
      already.
      
      For example: Given is R0=pkt(id=0,off=0,r=0). Adding offset 14
      as in above program to it, results in R0=pkt(id=0,off=14,r=0)
      before the packet range test. Now, testing this against R3=pkt_end
      with 'if r0 > r3 goto out' will transform R0 into R0=pkt(id=0,off=14,r=14)
      for the case when we're within bounds. A write into the packet
      at offset *(u32 *)(r0 -4), that is, 2 + 14 -4, is valid and
      aligned (2 is for NET_IP_ALIGN). After processing this with
      all fall-through paths, we later on check paths from branches.
      When the above skb->mark test is true, then we jump near the
      end of the program, perform r0 += 1, and jump back to the
      'if r0 > r3 goto out' test we've visited earlier already. This
      time, R0 is of type R0=pkt(id=0,off=15,r=0), and we'll prune
      that part because this time we'll have a larger safe packet
      range, and we already found that with off=14 all further insn
      were already safe, so it's safe as well with a larger off.
      However, the problem is that the subsequent write into the packet
      with 2 + 15 -4 is then unaligned, and not caught by the alignment
      tracking. Note that min_align, aux_off, and aux_off_align were
      all 0 in this example.
      
      Since we cannot tell at this time what kind of packet access was
      performed in the prior walk and what minimal requirements it has
      (we might do so in the future, but that requires more complexity),
      fix it to disable this pruning case for strict alignment for now,
      and let the verifier do check such paths instead. With that applied,
      the test cases pass and reject the program due to misalignment.
      
      Fixes: d1174416 ("bpf: Track alignment of register values in the verifier.")
      Reference: http://patchwork.ozlabs.org/patch/761909/Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1ad2f583
    • Ihar Hrachyshka's avatar
      arp: fixed -Wuninitialized compiler warning · 5990baaa
      Ihar Hrachyshka authored
      Commit 7d472a59 ("arp: always override
      existing neigh entries with gratuitous ARP") introduced a compiler
      warning:
      
      net/ipv4/arp.c:880:35: warning: 'addr_type' may be used uninitialized in
      this function [-Wmaybe-uninitialized]
      
      While the code logic seems to be correct and doesn't allow the variable
      to be used uninitialized, and the warning is not consistently
      reproducible, it's still worth fixing it for other people not to waste
      time looking at the warning in case it pops up in the build environment.
      Yes, compiler is probably at fault, but we will need to accommodate.
      
      Fixes: 7d472a59 ("arp: always override existing neigh entries with gratuitous ARP")
      Signed-off-by: default avatarIhar Hrachyshka <ihrachys@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5990baaa
    • Wei Wang's avatar
      tcp: avoid fastopen API to be used on AF_UNSPEC · ba615f67
      Wei Wang authored
      Fastopen API should be used to perform fastopen operations on the TCP
      socket. It does not make sense to use fastopen API to perform disconnect
      by calling it with AF_UNSPEC. The fastopen data path is also prone to
      race conditions and bugs when using with AF_UNSPEC.
      
      One issue reported and analyzed by Vegard Nossum is as follows:
      +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      Thread A:                            Thread B:
      ------------------------------------------------------------------------
      sendto()
       - tcp_sendmsg()
           - sk_stream_memory_free() = 0
               - goto wait_for_sndbuf
      	     - sk_stream_wait_memory()
      	        - sk_wait_event() // sleep
                |                          sendto(flags=MSG_FASTOPEN, dest_addr=AF_UNSPEC)
      	  |                           - tcp_sendmsg()
      	  |                              - tcp_sendmsg_fastopen()
      	  |                                 - __inet_stream_connect()
      	  |                                    - tcp_disconnect() //because of AF_UNSPEC
      	  |                                       - tcp_transmit_skb()// send RST
      	  |                                    - return 0; // no reconnect!
      	  |                           - sk_stream_wait_connect()
      	  |                                 - sock_error()
      	  |                                    - xchg(&sk->sk_err, 0)
      	  |                                    - return -ECONNRESET
      	- ... // wake up, see sk->sk_err == 0
          - skb_entail() on TCP_CLOSE socket
      
      If the connection is reopened then we will send a brand new SYN packet
      after thread A has already queued a buffer. At this point I think the
      socket internal state (sequence numbers etc.) becomes messed up.
      
      When the new connection is closed, the FIN-ACK is rejected because the
      sequence number is outside the window. The other side tries to
      retransmit,
      but __tcp_retransmit_skb() calls tcp_trim_head() on an empty skb which
      corrupts the skb data length and hits a BUG() in copy_and_csum_bits().
      +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      
      Hence, this patch adds a check for AF_UNSPEC in the fastopen data path
      and return EOPNOTSUPP to user if such case happens.
      
      Fixes: cf60af03 ("tcp: Fast Open client - sendmsg(MSG_FASTOPEN)")
      Reported-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarWei Wang <weiwan@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ba615f67
    • Roman Kapl's avatar
      net: move somaxconn init from sysctl code · 7c3f1875
      Roman Kapl authored
      The default value for somaxconn is set in sysctl_core_net_init(), but this
      function is not called when kernel is configured without CONFIG_SYSCTL.
      
      This results in the kernel not being able to accept TCP connections,
      because the backlog has zero size. Usually, the user ends up with:
      "TCP: request_sock_TCP: Possible SYN flooding on port 7. Dropping request.  Check SNMP counters."
      If SYN cookies are not enabled the connection is rejected.
      
      Before ef547f2a (tcp: remove max_qlen_log), the effects were less
      severe, because the backlog was always at least eight slots long.
      Signed-off-by: default avatarRoman Kapl <roman.kapl@sysgo.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7c3f1875
    • Gustavo A. R. Silva's avatar
      net: fix potential null pointer dereference · 65d786c2
      Gustavo A. R. Silva authored
      Add null check to avoid a potential null pointer dereference.
      
      Addresses-Coverity-ID: 1408831
      Signed-off-by: default avatarGustavo A. R. Silva <garsilva@embeddedor.com>
      Acked-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      65d786c2
    • Eric Garver's avatar
      geneve: fix fill_info when using collect_metadata · 11387fe4
      Eric Garver authored
      Since 9b4437a5 ("geneve: Unify LWT and netdev handling.") fill_info
      does not return UDP_ZERO_CSUM6_RX when using COLLECT_METADATA. This is
      because it uses ip_tunnel_info_af() with the device level info, which is
      not valid for COLLECT_METADATA.
      
      Fix by checking for the presence of the actual sockets.
      
      Fixes: 9b4437a5 ("geneve: Unify LWT and netdev handling.")
      Signed-off-by: default avatarEric Garver <e@erig.me>
      Acked-by: default avatarPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      11387fe4
  3. 24 May, 2017 13 commits
    • David S. Miller's avatar
      Merge branch 'q-in-q-checksums' · e6a88e4b
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      BPF pruning follow-up
      
      Follow-up to fix incorrect pruning when alignment tracking is
      in use and to properly clear regs after call to not leave stale
      data behind. For details, please see individual patches.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e6a88e4b
    • Vlad Yasevich's avatar
      virtio-net: enable TSO/checksum offloads for Q-in-Q vlans · 2836b4f2
      Vlad Yasevich authored
      Since virtio does not provide it's own ndo_features_check handler,
      TSO, and now checksum offload, are disabled for stacked vlans.
      Re-enable the support and let the host take care of it.  This
      restores/improves Guest-to-Guest performance over Q-in-Q vlans.
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2836b4f2
    • Vlad Yasevich's avatar
      be2net: Fix offload features for Q-in-Q packets · cc6e9de6
      Vlad Yasevich authored
      At least some of the be2net cards do not seem to be capabled
      of performing checksum offload computions on Q-in-Q packets.
      In these case, the recevied checksum on the remote is invalid
      and TCP syn packets are dropped.
      
      This patch adds a call to check disbled acceleration features
      on Q-in-Q tagged traffic.
      
      CC: Sathya Perla <sathya.perla@broadcom.com>
      CC: Ajit Khaparde <ajit.khaparde@broadcom.com>
      CC: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
      CC: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cc6e9de6
    • Vlad Yasevich's avatar
      vlan: Fix tcp checksum offloads in Q-in-Q vlans · 35d2f80b
      Vlad Yasevich authored
      It appears that TCP checksum offloading has been broken for
      Q-in-Q vlans.  The behavior was execerbated by the
      series
          commit afb0bc97 ("Merge branch 'stacked_vlan_tso'")
      that that enabled accleleration features on stacked vlans.
      
      However, event without that series, it is possible to trigger
      this issue.  It just requires a lot more specialized configuration.
      
      The root cause is the interaction between how
      netdev_intersect_features() works, the features actually set on
      the vlan devices and HW having the ability to run checksum with
      longer headers.
      
      The issue starts when netdev_interesect_features() replaces
      NETIF_F_HW_CSUM with a combination of NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM,
      if the HW advertises IP|IPV6 specific checksums.  This happens
      for tagged and multi-tagged packets.   However, HW that enables
      IP|IPV6 checksum offloading doesn't gurantee that packets with
      arbitrarily long headers can be checksummed.
      
      This patch disables IP|IPV6 checksums on the packet for multi-tagged
      packets.
      
      CC: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      CC: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Acked-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      35d2f80b
    • Andrew Lunn's avatar
      net: phy: marvell: Limit errata to 88m1101 · f2899788
      Andrew Lunn authored
      The 88m1101 has an errata when configuring autoneg. However, it was
      being applied to many other Marvell PHYs as well. Limit its scope to
      just the 88m1101.
      
      Fixes: 76884679 ("phylib: Add support for Marvell 88e1111S and 88e1145")
      Reported-by: default avatarDaniel Walker <danielwa@cisco.com>
      Signed-off-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Acked-by: default avatarHarini Katakam <harinik@xilinx.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f2899788
    • Randy Dunlap's avatar
      net/phy: fix mdio-octeon dependency and build · cd47512e
      Randy Dunlap authored
      Fix build errors by making this driver depend on OF_MDIO, like
      several other similar drivers do.
      
      drivers/built-in.o: In function `octeon_mdiobus_remove':
      mdio-octeon.c:(.text+0x196ee0): undefined reference to `mdiobus_unregister'
      mdio-octeon.c:(.text+0x196ee8): undefined reference to `mdiobus_free'
      drivers/built-in.o: In function `octeon_mdiobus_probe':
      mdio-octeon.c:(.text+0x196f1d): undefined reference to `devm_mdiobus_alloc_size'
      mdio-octeon.c:(.text+0x196ffe): undefined reference to `of_mdiobus_register'
      mdio-octeon.c:(.text+0x197010): undefined reference to `mdiobus_free'
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Cc:	Andrew Lunn <andrew@lunn.ch>
      Cc:	Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cd47512e
    • David S. Miller's avatar
      Merge tag 'mlx5-fixes-2017-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 3f6b123b
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-fixes-2017-05-23
      
      Some TC offloads fixes from Or Gerlitz.
      From Erez, mlx5 IPoIB RX fix to improve GRO.
      From Mohamad, Command interface fix to improve mitigation against FW
      commands timeouts.
      From Tariq, Driver load Tolerance against affinity settings failures.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3f6b123b
    • David S. Miller's avatar
      Merge tag 'mac80211-for-davem-2017-05-23' of... · 029c5817
      David S. Miller authored
      Merge tag 'mac80211-for-davem-2017-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      Just two fixes this time:
       * fix the scheduled scan "BUG: scheduling while atomic"
       * check mesh address extension flags more strictly
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      029c5817
    • Alexander Potapenko's avatar
      net: rtnetlink: bail out from rtnl_fdb_dump() on parse error · 0ff50e83
      Alexander Potapenko authored
      rtnl_fdb_dump() failed to check the result of nlmsg_parse(), which led
      to contents of |ifm| being uninitialized because nlh->nlmsglen was too
      small to accommodate |ifm|. The uninitialized data may affect some
      branches and result in unwanted effects, although kernel data doesn't
      seem to leak to the userspace directly.
      
      The bug has been detected with KMSAN and syzkaller.
      
      For the record, here is the KMSAN report:
      
      ==================================================================
      BUG: KMSAN: use of unitialized memory in rtnl_fdb_dump+0x5dc/0x1000
      CPU: 0 PID: 1039 Comm: probe Not tainted 4.11.0-rc5+ #2727
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16
       dump_stack+0x143/0x1b0 lib/dump_stack.c:52
       kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:1007
       __kmsan_warning_32+0x66/0xb0 mm/kmsan/kmsan_instr.c:491
       rtnl_fdb_dump+0x5dc/0x1000 net/core/rtnetlink.c:3230
       netlink_dump+0x84f/0x1190 net/netlink/af_netlink.c:2168
       __netlink_dump_start+0xc97/0xe50 net/netlink/af_netlink.c:2258
       netlink_dump_start ./include/linux/netlink.h:165
       rtnetlink_rcv_msg+0xae9/0xb40 net/core/rtnetlink.c:4094
       netlink_rcv_skb+0x339/0x5a0 net/netlink/af_netlink.c:2339
       rtnetlink_rcv+0x83/0xa0 net/core/rtnetlink.c:4110
       netlink_unicast_kernel net/netlink/af_netlink.c:1272
       netlink_unicast+0x13b7/0x1480 net/netlink/af_netlink.c:1298
       netlink_sendmsg+0x10b8/0x10f0 net/netlink/af_netlink.c:1844
       sock_sendmsg_nosec net/socket.c:633
       sock_sendmsg net/socket.c:643
       ___sys_sendmsg+0xd4b/0x10f0 net/socket.c:1997
       __sys_sendmsg net/socket.c:2031
       SYSC_sendmsg+0x2c6/0x3f0 net/socket.c:2042
       SyS_sendmsg+0x87/0xb0 net/socket.c:2038
       do_syscall_64+0x102/0x150 arch/x86/entry/common.c:285
       entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246
      RIP: 0033:0x401300
      RSP: 002b:00007ffc3b0e6d58 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000401300
      RDX: 0000000000000000 RSI: 00007ffc3b0e6d80 RDI: 0000000000000003
      RBP: 00007ffc3b0e6e00 R08: 000000000000000b R09: 0000000000000004
      R10: 000000000000000d R11: 0000000000000246 R12: 0000000000000000
      R13: 00000000004065a0 R14: 0000000000406630 R15: 0000000000000000
      origin: 000000008fe00056
       save_stack_trace+0x59/0x60 arch/x86/kernel/stacktrace.c:59
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:352
       kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:247
       kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:260
       slab_alloc_node mm/slub.c:2743
       __kmalloc_node_track_caller+0x1f4/0x390 mm/slub.c:4349
       __kmalloc_reserve net/core/skbuff.c:138
       __alloc_skb+0x2cd/0x740 net/core/skbuff.c:231
       alloc_skb ./include/linux/skbuff.h:933
       netlink_alloc_large_skb net/netlink/af_netlink.c:1144
       netlink_sendmsg+0x934/0x10f0 net/netlink/af_netlink.c:1819
       sock_sendmsg_nosec net/socket.c:633
       sock_sendmsg net/socket.c:643
       ___sys_sendmsg+0xd4b/0x10f0 net/socket.c:1997
       __sys_sendmsg net/socket.c:2031
       SYSC_sendmsg+0x2c6/0x3f0 net/socket.c:2042
       SyS_sendmsg+0x87/0xb0 net/socket.c:2038
       do_syscall_64+0x102/0x150 arch/x86/entry/common.c:285
       return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:246
      ==================================================================
      
      and the reproducer:
      
      ==================================================================
        #include <sys/socket.h>
        #include <net/if_arp.h>
        #include <linux/netlink.h>
        #include <stdint.h>
      
        int main()
        {
          int sock = socket(PF_NETLINK, SOCK_DGRAM | SOCK_NONBLOCK, 0);
          struct msghdr msg;
          memset(&msg, 0, sizeof(msg));
          char nlmsg_buf[32];
          memset(nlmsg_buf, 0, sizeof(nlmsg_buf));
          struct nlmsghdr *nlmsg = nlmsg_buf;
          nlmsg->nlmsg_len = 0x11;
          nlmsg->nlmsg_type = 0x1e; // RTM_NEWROUTE = RTM_BASE + 0x0e
          // type = 0x0e = 1110b
          // kind = 2
          nlmsg->nlmsg_flags = 0x101; // NLM_F_ROOT | NLM_F_REQUEST
          nlmsg->nlmsg_seq = 0;
          nlmsg->nlmsg_pid = 0;
          nlmsg_buf[16] = (char)7;
          struct iovec iov;
          iov.iov_base = nlmsg_buf;
          iov.iov_len = 17;
          msg.msg_iov = &iov;
          msg.msg_iovlen = 1;
          sendmsg(sock, &msg, 0);
          return 0;
        }
      ==================================================================
      Signed-off-by: default avatarAlexander Potapenko <glider@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0ff50e83
    • Quentin Schulz's avatar
      net: fec: add post PHY reset delay DT property · 159a0760
      Quentin Schulz authored
      Some PHY require to wait for a bit after the reset GPIO has been
      toggled. This adds support for the DT property `phy-reset-post-delay`
      which gives the delay in milliseconds to wait after reset.
      
      If the DT property is not given, no delay is observed. Post reset delay
      greater than 1000ms are invalid.
      Signed-off-by: default avatarQuentin Schulz <quentin.schulz@free-electrons.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Acked-by: default avatarFugang Duan <fugang.duan@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      159a0760
    • David S. Miller's avatar
      Merge branch 'sctp-dupcookie-fixes' · 11d3c949
      David S. Miller authored
      Xin Long says:
      
      ====================
      sctp: a bunch of fixes for processing dupcookie
      
      After introducing transport hashtable and per stream info into sctp,
      some regressions were caused when processing dupcookie, this patchset
      is to fix them.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      11d3c949
    • Xin Long's avatar
      sctp: set new_asoc temp when processing dupcookie · 7e062977
      Xin Long authored
      After sctp changed to use transport hashtable, a transport would be
      added into global hashtable when adding the peer to an asoc, then
      the asoc can be got by searching the transport in the hashtbale.
      
      The problem is when processing dupcookie in sctp_sf_do_5_2_4_dupcook,
      a new asoc would be created. A peer with the same addr and port as
      the one in the old asoc might be added into the new asoc, but fail
      to be added into the hashtable, as they also belong to the same sk.
      
      It causes that sctp's dupcookie processing can not really work.
      
      Since the new asoc will be freed after copying it's information to
      the old asoc, it's more like a temp asoc. So this patch is to fix
      it by setting it as a temp asoc to avoid adding it's any transport
      into the hashtable and also avoid allocing assoc_id.
      
      An extra thing it has to do is to also alloc stream info for any
      temp asoc, as sctp dupcookie process needs it to update old asoc.
      But I don't think it would hurt something, as a temp asoc would
      always be freed after finishing processing cookie echo packet.
      Reported-by: default avatarJianwen Ji <jiji@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7e062977
    • Xin Long's avatar
      sctp: fix stream update when processing dupcookie · 3ab21379
      Xin Long authored
      Since commit 3dbcc105 ("sctp: alloc stream info when initializing
      asoc"), stream and stream.out info are always alloced when creating
      an asoc.
      
      So it's not correct to check !asoc->stream before updating stream
      info when processing dupcookie, but would be better to check asoc
      state instead.
      
      Fixes: 3dbcc105 ("sctp: alloc stream info when initializing asoc")
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3ab21379
  4. 23 May, 2017 11 commits