1. 16 Mar, 2018 5 commits
    • net: systemport: Rewrite __bcm_sysport_tx_reclaim() · 484d802d
      Florian Fainelli authored
      There is no need for complex checking between the last consumed index
      and current consumed index, a simple subtraction will do.
      
      This also eliminates the possibility of a permanent transmit queue stall
      under the following conditions:
      
      - one CPU bursts ring->size worth of traffic (up to 256 buffers), to the
        point where we run out of free descriptors, so we stop the transmit
        queue at the end of bcm_sysport_xmit()
      
      - because of our locking, we have the transmit process disable
        interrupts which means we can be blocking the TX reclamation process
      
      - when TX reclamation finally runs, we will be computing the difference
        between ring->c_index (last consumed index by SW) and what the HW
        reports through its register
      
      - this register is masked with (ring->size - 1) = 0xff, which will lead
        to stripping the upper bits of the index (register is 16-bits wide)
      
      - we will be computing last_tx_cn as 0, which means there is no work to
        be done, and we never wake-up the transmit queue, leaving it
        permanently disabled
      
      A practical example: ring->c_index (aka last_c_index) = 12, we
      pushed 256 entries, HW consumer index = 268, we mask it with 0xff = 12,
      so last_tx_cn == 0, nothing happens.
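
The arithmetic in the example above can be checked with a small model. This is only a sketch in Python, not the driver code: the function names are made up for illustration, and the 256-entry ring and 16-bit index width are taken from the description above.

```python
RING_SIZE = 256        # ring->size in the example above
INDEX_MASK = 0xFFFF    # the hardware consumer index is 16 bits wide


def reclaim_count_masked(last_c_index, hw_c_index, ring_size=RING_SIZE):
    """Old scheme: mask the hardware index to the ring size, then compare."""
    c_index = hw_c_index & (ring_size - 1)
    if c_index >= last_c_index:
        return c_index - last_c_index
    return ring_size - last_c_index + c_index


def reclaim_count_subtract(last_c_index, hw_c_index):
    """Simple subtraction, wrapped to the 16-bit width of the index."""
    return (hw_c_index - last_c_index) & INDEX_MASK


# last_c_index = 12, 256 entries pushed, HW consumer index = 268
print(reclaim_count_masked(12, 268))    # 0: no work seen, queue never woken
print(reclaim_count_subtract(12, 268))  # 256: the whole burst is reclaimed
```

The subtraction also survives the 16-bit wrap of the hardware index, provided the software copy of the index is kept at the same 16-bit width.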
      
      Fixes: 80105bef ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • kcm: lock lower socket in kcm_attach · 2cc683e8
      Tom Herbert authored
      We need to lock the lower socket in order to provide mutual exclusion
      with kcm_unattach.
      
      v2: Add Reported-by for syzbot
      
      Fixes: ab7ac4eb ("kcm: Kernel Connection Multiplexor module")
      Reported-by: syzbot+ea75c0ffcd353d32515f064aaebefc5279e6161e@syzkaller.appspotmail.com
      Signed-off-by: Tom Herbert <tom@quantonium.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'vlan-untag-and-insert-fixes' · e693be29
      David S. Miller authored
      Toshiaki Makita says:
      
      ====================
      Fix vlan untag and insertion for bridge and vlan with reorder_hdr off
      
      As Brandon Carpenter reported[1], sending non-vlan-offloaded packets from
      bridge devices ends up with corrupted packets. He narrowed down this problem
      and found that the root cause is in skb_reorder_vlan_header().
      
      While I was working on fixing this problem, I found that the function does
      not work properly for double tagged packets with reorder_hdr off as well.
      
      Patch 1 fixes these 2 problems in skb_reorder_vlan_header().
      
      While testing the fix, it also turned out that fixing
      skb_reorder_vlan_header() alone is not sufficient to receive double tagged
      packets with reorder_hdr off. Vlan tags got out of order when vlan devices
      with reorder_hdr disabled were stacked. Patch 2 fixes this problem.
      
      [1] https://www.spinics.net/lists/linux-ethernet-bridging/msg07039.html
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vlan: Fix out of order vlan headers with reorder header off · cbe7128c
      Toshiaki Makita authored
      With reorder header off, received packets are untagged in skb_vlan_untag()
      called from within __netif_receive_skb_core(), and later the tag will be
      inserted back in vlan_do_receive().
      
      This caused out of order vlan headers when we create a vlan device on top
      of another vlan device, because vlan_do_receive() inserts a tag as the
      outermost vlan tag. E.g. the outer tag is first removed in skb_vlan_untag()
      and inserted back in vlan_do_receive(), then the inner tag is next removed
      and inserted back as the outermost tag.
      
      This patch fixes the behaviour by inserting the inner tag at the right
      position.
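
The reordering described above can be reproduced with a toy model. This is a sketch under simplifying assumptions, not kernel code: the tag stack is a plain Python list (outermost tag first), and `receive_stacked` and its `insert_in_place` flag are hypothetical names that stand in for the untag and re-insert steps at each stacking level.

```python
def receive_stacked(tags, insert_in_place):
    """Model rx through stacked vlan devices with reorder_hdr off.

    tags lists the vlan tags outermost first. Each loop iteration models
    one stacking level: the next unprocessed tag is stripped (the
    skb_vlan_untag() step) and then put back (the vlan_do_receive() step).
    """
    header = list(tags)
    for depth in range(len(tags)):
        tag = header.pop(depth)
        # Old behaviour: always re-insert as the outermost tag (position 0).
        # Fixed behaviour: re-insert where the tag came from.
        position = depth if insert_in_place else 0
        header.insert(position, tag)
    return header


print(receive_stacked(["outer", "inner"], insert_in_place=False))
print(receive_stacked(["outer", "inner"], insert_in_place=True))
```

With `insert_in_place=False` the inner tag ends up in front of the outer one; with `insert_in_place=True` the original order is preserved.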
      Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off · 4bbb3e0e
      Toshiaki Makita authored
      When we have a bridge with vlan_filtering on and a vlan device on top of
      it, packets would be corrupted in skb_vlan_untag() called from
      br_dev_xmit().
      
      The problem sits in skb_reorder_vlan_header() used in skb_vlan_untag(),
      which makes use of skb->mac_len. In this function mac_len is meant for
      handling rx path with vlan devices with reorder_header disabled, but in
      tx path mac_len is typically 0 and cannot be used, which is the problem
      in this case.
      
      In fact, the current code does not even handle the rx path (skb_vlan_untag()
      called from __netif_receive_skb_core()) properly with reorder_header off.
      
      In rx path single tag case, it works as follows:
      
      - Before skb_reorder_vlan_header()
      
       mac_header                                data
         v                                        v
         +-------------------+-------------+------+----
         |        ETH        |    VLAN     | ETH  |
         |       ADDRS       | TPID | TCI  | TYPE |
         +-------------------+-------------+------+----
         <-------- mac_len --------->
                             <------------->
                              to be removed
      
      - After skb_reorder_vlan_header()
      
                  mac_header                     data
                       v                          v
                       +-------------------+------+----
                       |        ETH        | ETH  |
                       |       ADDRS       | TYPE |
                       +-------------------+------+----
                       <-------- mac_len --------->
      
      This is ok, but in rx double tag case, it corrupts packets:
      
      - Before skb_reorder_vlan_header()
      
       mac_header                                              data
         v                                                      v
         +-------------------+-------------+-------------+------+----
         |        ETH        |    VLAN     |    VLAN     | ETH  |
         |       ADDRS       | TPID | TCI  | TPID | TCI  | TYPE |
         +-------------------+-------------+-------------+------+----
         <--------------- mac_len ---------------->
                                           <------------->
                                          should be removed
                             <--------------------------->
                               actually will be removed
      
      - After skb_reorder_vlan_header()
      
                  mac_header                                   data
                       v                                        v
                                     +-------------------+------+----
                                     |        ETH        | ETH  |
                                     |       ADDRS       | TYPE |
                                     +-------------------+------+----
                       <--------------- mac_len ---------------->
      
      So both vlan tags are removed while only the inner one should be removed,
      and mac_header (and mac_len) is broken.
      
      skb_vlan_untag() is meant for removing the vlan header at (skb->data - 2),
      so use skb->data and skb->mac_header to calculate the right offset.
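
The offset calculation described above can be sketched on a plain byte buffer. This is a Python model under stated assumptions, not the kernel function: `frame`, `mac_header` and `data` stand in for the skb buffer and its two offsets, and the tag to drop is taken to be the 4 bytes ending at `data - 2`, as in the diagrams.

```python
VLAN_HLEN = 4   # TPID + TCI
ETH_TLEN = 2    # EtherType


def reorder_vlan_header(frame, mac_header, data):
    """Drop the vlan header that ends at (data - 2); return the new mac_header.

    Everything between mac_header and the tag is shifted forward by
    VLAN_HLEN bytes, so any outer tag is kept intact.
    """
    mac_len = data - mac_header             # derived from the offsets, not skb->mac_len
    keep = mac_len - VLAN_HLEN - ETH_TLEN   # bytes that precede the tag to drop
    if keep > 0:
        start = mac_header + VLAN_HLEN
        # The right-hand slice is copied first, so the overlap is safe.
        frame[start:start + keep] = frame[mac_header:mac_header + keep]
    return mac_header + VLAN_HLEN


addrs = bytes(range(12))             # destination + source MAC addresses
outer = bytes.fromhex("88a800c8")    # outer tag: TPID 0x88a8, TCI 0x00c8
inner = bytes.fromhex("81000064")    # inner tag: TPID 0x8100, TCI 0x0064
etype = bytes.fromhex("0800")

frame = bytearray(addrs + outer + inner + etype)
new_mac = reorder_vlan_header(frame, 0, len(frame))
print(bytes(frame[new_mac:]) == addrs + outer + etype)
```

In the double tag case only the inner tag disappears and the outer tag stays directly after the addresses, which is the "should be removed" span in the diagram.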
      Reported-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
      Fixes: a6e18ff1 ("vlan: Fix untag operations of stacked vlans with REORDER_HEADER off")
      Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 15 Mar, 2018 2 commits
    • net sched actions: return explicit error when tunnel_key mode is not specified · 51d4740f
      Roman Mashak authored
      If the set/unset mode of the tunnel_key action is not provided, ->init() still
      returns 0 and the caller proceeds with a bogus 'struct tc_action *' object,
      which results in a crash:
      
      % tc actions add action tunnel_key src_ip 1.1.1.1 dst_ip 2.2.2.1 id 7 index 1
      
      [   35.805515] general protection fault: 0000 [#1] SMP PTI
      [   35.806161] Modules linked in: act_tunnel_key kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd serio_raw
      [   35.808233] CPU: 1 PID: 428 Comm: tc Not tainted 4.16.0-rc4+ #286
      [   35.808929] RIP: 0010:tcf_action_init+0x90/0x190
      [   35.809457] RSP: 0018:ffffb8edc068b9a0 EFLAGS: 00010206
      [   35.810053] RAX: 1320c000000a0003 RBX: 0000000000000001 RCX: 0000000000000000
      [   35.810866] RDX: 0000000000000070 RSI: 0000000000007965 RDI: ffffb8edc068b910
      [   35.811660] RBP: ffffb8edc068b9d0 R08: 0000000000000000 R09: ffffb8edc068b808
      [   35.812463] R10: ffffffffc02bf040 R11: 0000000000000040 R12: ffffb8edc068bb38
      [   35.813235] R13: 0000000000000000 R14: 0000000000000000 R15: ffffb8edc068b910
      [   35.814006] FS:  00007f3d0d8556c0(0000) GS:ffff91d1dbc40000(0000) knlGS:0000000000000000
      [   35.814881] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   35.815540] CR2: 000000000043f720 CR3: 0000000019248001 CR4: 00000000001606a0
      [   35.816457] Call Trace:
      [   35.817158]  tc_ctl_action+0x11a/0x220
      [   35.817795]  rtnetlink_rcv_msg+0x23d/0x2e0
      [   35.818457]  ? __slab_alloc+0x1c/0x30
      [   35.819079]  ? __kmalloc_node_track_caller+0xb1/0x2b0
      [   35.819544]  ? rtnl_calcit.isra.30+0xe0/0xe0
      [   35.820231]  netlink_rcv_skb+0xce/0x100
      [   35.820744]  netlink_unicast+0x164/0x220
      [   35.821500]  netlink_sendmsg+0x293/0x370
      [   35.822040]  sock_sendmsg+0x30/0x40
      [   35.822508]  ___sys_sendmsg+0x2c5/0x2e0
      [   35.823149]  ? pagecache_get_page+0x27/0x220
      [   35.823714]  ? filemap_fault+0xa2/0x640
      [   35.824423]  ? page_add_file_rmap+0x108/0x200
      [   35.825065]  ? alloc_set_pte+0x2aa/0x530
      [   35.825585]  ? finish_fault+0x4e/0x70
      [   35.826140]  ? __handle_mm_fault+0xbc1/0x10d0
      [   35.826723]  ? __sys_sendmsg+0x41/0x70
      [   35.827230]  __sys_sendmsg+0x41/0x70
      [   35.827710]  do_syscall_64+0x68/0x120
      [   35.828195]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      [   35.828859] RIP: 0033:0x7f3d0ca4da67
      [   35.829331] RSP: 002b:00007ffc9f284338 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [   35.830304] RAX: ffffffffffffffda RBX: 00007ffc9f284460 RCX: 00007f3d0ca4da67
      [   35.831247] RDX: 0000000000000000 RSI: 00007ffc9f2843b0 RDI: 0000000000000003
      [   35.832167] RBP: 000000005aa6a7a9 R08: 0000000000000001 R09: 0000000000000000
      [   35.833075] R10: 00000000000005f1 R11: 0000000000000246 R12: 0000000000000000
      [   35.833997] R13: 00007ffc9f2884c0 R14: 0000000000000001 R15: 0000000000674640
      [   35.834923] Code: 24 30 bb 01 00 00 00 45 31 f6 eb 5e 8b 50 08 83 c2 07 83 e2 fc 83 c2 70 49 8b 07 48 8b 40 70 48 85 c0 74 10 48 89 14 24 4c 89 ff <ff> d0 48 8b 14 24 48 01 c2 49 01 d6 45 85 ed 74 05 41 83 47 2c
      [   35.837442] RIP: tcf_action_init+0x90/0x190 RSP: ffffb8edc068b9a0
      [   35.838291] ---[ end trace a095c06ee4b97a26 ]---
      
      Fixes: d0f6dd8a ("net/sched: Introduce act_tunnel_key")
      Signed-off-by: Roman Mashak <mrv@mojatatu.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: simplify wait when closing listen socket · 3d502067
      Ursula Braun authored
      Closing of a listen socket wakes up kernel_accept() of
      smc_tcp_listen_worker(), and then has to wait till smc_tcp_listen_worker()
      gives up the internal clcsock. The wait logic introduced with
      commit 127f4970 ("net/smc: release clcsock from tcp_listen_worker")
      might wait longer than necessary. This patch implements the wait with just
      flush_work(), and gets rid of the extra
      smc_close_wait_listen_clcsock() function.
      
      Fixes: 127f4970 ("net/smc: release clcsock from tcp_listen_worker")
      Reported-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 14 Mar, 2018 13 commits
  4. 13 Mar, 2018 4 commits
    • qed: Use after free in qed_rdma_free() · f89782c2
      Dan Carpenter authored
      We're dereferencing "p_hwfn->p_rdma_info" but that is freed on the line
      before in qed_rdma_resc_free(p_hwfn).
      
      Fixes: 9de506a5 ("qed: Free RoCE ILT Memory on rmmod qedr")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Michal Kalderon <Michal.Kalderon@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · d2ddf628
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2018-03-13
      
      1) Refuse to insert 32 bit userspace socket policies on 64
         bit systems like we do it for standard policies. We don't
         have a compat layer, so inserting socket policies from
         32 bit userspace will lead to a broken configuration.
      
      2) Make the policy hold queue work without the flowcache.
         Dummy bundles are not cached anymore, so we need to
         generate a new one on each lookup as long as the SAs
         are not yet in place.
      
      3) Fix the validation of the esn replay attribute. The
         sanity check in verify_replay() is bypassed if
         the XFRM_STATE_ESN flag is not set. Fix this by doing
         the sanity check unconditionally.
         From Florian Westphal.
      
      4) After most of the dst_entry garbage collection code
         is removed, we may leak xfrm_dst entries as they are
         neither cached nor tracked somewhere. Fix this by
         reusing the 'uncached_list' to track xfrm_dst entries
         too. From Xin Long.
      
      5) Fix a rcu_read_lock/rcu_read_unlock imbalance in
         xfrm_get_tos(). From Xin Long.
      
      6) Fix an infinite loop in xfrm_get_dst_nexthop. On
         transport mode we fetch the child dst_entry after
         we continue, so this pointer is never updated.
         Fix this by fetching it before we continue.
      
      7) Fix ESN sequence number gap after IPsec GSO packets.
         We accidentally increment the sequence number counter
         on the xfrm_state by one packet too much in the ESN
         case. Fix this by setting the sequence number to the
         correct value.
      
      8) Reset the ethernet protocol after decapsulation only if a
         mac header was set. Otherwise it breaks configurations
         with TUN devices. From Yossi Kuperman.
      
      9) Fix __this_cpu_read() usage in preemptible code. Use
         this_cpu_read() instead in ipcomp_alloc_tfms().
         From Greg Hackmann.
      
      Please pull or let me know if there are problems.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: xfrm: use preempt-safe this_cpu_read() in ipcomp_alloc_tfms() · 0dcd7876
      Greg Hackmann authored
      f7c83bcb ("net: xfrm: use __this_cpu_read per-cpu helper") added a
      __this_cpu_read() call inside ipcomp_alloc_tfms().
      
      At the time, __this_cpu_read() required the caller to either not care
      about races or to handle preemption/interrupt issues.  3.15 tightened
      the rules around some per-cpu operations, and now __this_cpu_read()
      should never be used in a preemptible context.  On 3.15 and later, we
      need to use this_cpu_read() instead.
      
      syzkaller reported this leading to the following kernel BUG while
      fuzzing sendmsg:
      
      BUG: using __this_cpu_read() in preemptible [00000000] code: repro/3101
      caller is ipcomp_init_state+0x185/0x990
      CPU: 3 PID: 3101 Comm: repro Not tainted 4.16.0-rc4-00123-g86f84779 #154
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      Call Trace:
       dump_stack+0xb9/0x115
       check_preemption_disabled+0x1cb/0x1f0
       ipcomp_init_state+0x185/0x990
       ? __xfrm_init_state+0x876/0xc20
       ? lock_downgrade+0x5e0/0x5e0
       ipcomp4_init_state+0xaa/0x7c0
       __xfrm_init_state+0x3eb/0xc20
       xfrm_init_state+0x19/0x60
       pfkey_add+0x20df/0x36f0
       ? pfkey_broadcast+0x3dd/0x600
       ? pfkey_sock_destruct+0x340/0x340
       ? pfkey_seq_stop+0x80/0x80
       ? __skb_clone+0x236/0x750
       ? kmem_cache_alloc+0x1f6/0x260
       ? pfkey_sock_destruct+0x340/0x340
       ? pfkey_process+0x62a/0x6f0
       pfkey_process+0x62a/0x6f0
       ? pfkey_send_new_mapping+0x11c0/0x11c0
       ? mutex_lock_io_nested+0x1390/0x1390
       pfkey_sendmsg+0x383/0x750
       ? dump_sp+0x430/0x430
       sock_sendmsg+0xc0/0x100
       ___sys_sendmsg+0x6c8/0x8b0
       ? copy_msghdr_from_user+0x3b0/0x3b0
       ? pagevec_lru_move_fn+0x144/0x1f0
       ? find_held_lock+0x32/0x1c0
       ? do_huge_pmd_anonymous_page+0xc43/0x11e0
       ? lock_downgrade+0x5e0/0x5e0
       ? get_kernel_page+0xb0/0xb0
       ? _raw_spin_unlock+0x29/0x40
       ? do_huge_pmd_anonymous_page+0x400/0x11e0
       ? __handle_mm_fault+0x553/0x2460
       ? __fget_light+0x163/0x1f0
       ? __sys_sendmsg+0xc7/0x170
       __sys_sendmsg+0xc7/0x170
       ? SyS_shutdown+0x1a0/0x1a0
       ? __do_page_fault+0x5a0/0xca0
       ? lock_downgrade+0x5e0/0x5e0
       SyS_sendmsg+0x27/0x40
       ? __sys_sendmsg+0x170/0x170
       do_syscall_64+0x19f/0x640
       entry_SYSCALL_64_after_hwframe+0x42/0xb7
      RIP: 0033:0x7f0ee73dfb79
      RSP: 002b:00007ffe14fc15a8 EFLAGS: 00000207 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0ee73dfb79
      RDX: 0000000000000000 RSI: 00000000208befc8 RDI: 0000000000000004
      RBP: 00007ffe14fc15b0 R08: 00007ffe14fc15c0 R09: 00007ffe14fc15c0
      R10: 0000000000000000 R11: 0000000000000207 R12: 0000000000400440
      R13: 00007ffe14fc16b0 R14: 0000000000000000 R15: 0000000000000000
      Signed-off-by: Greg Hackmann <ghackmann@google.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • net: dsa: Fix dsa_is_user_port() test inversion · 5a9f8df6
      Florian Fainelli authored
      During the conversion to dsa_is_user_port(), a condition ended up being
      reversed, which would prevent the creation of any user port when using
      the legacy binding and/or platform data. Fix that.
      
      Fixes: 4a5b85ff ("net: dsa: use dsa_is_user_port everywhere")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 12 Mar, 2018 16 commits
    • Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue · 59bb8835
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2018-03-12
      
      This series contains fixes to e1000e only.
      
      Benjamin Poirier provides two fixes. The first reverts commits that changed
      what happens to the link status when there is an error.  These commits
      were meant to resolve a race condition, but in the process of fixing it
      they changed the behavior when an error occurred.  The second fix
      resolves a race condition by not setting "get_link_status" to false
      after checking the link.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'l2tp-fix-races-with-ipv4-mapped-ipv6-addresses' · 38fbbc9c
      David S. Miller authored
      Paolo Abeni says:
      
      ====================
      l2tp: fix races with ipv4-mapped ipv6 addresses
      
      The syzbot reported an l2tp oops that uncovered some races in the l2tp xmit
      path and a partially related issue in the generic ipv6 code.
      
      We need to address them separately.
      
      v1 -> v2:
       - add missing fixes tag in patch 1
       - fix several issues in patch 2
      
      v2 -> v3:
       - dropped some unneeded chunks in patch 2
      ====================
      Reviewed-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • l2tp: fix races with ipv4-mapped ipv6 addresses · b954f940
      Paolo Abeni authored
      The l2tp_tunnel_create() function checks for v4mapped ipv6
      sockets and caches that flag, so that l2tp core code can
      reuse it at xmit time.
      
      If the socket is provided by the userspace, the connection
      status of the tunnel sockets can change between the tunnel
      creation and the xmit call, so that syzbot is able to
      trigger the following splat:
      
      BUG: KASAN: use-after-free in ip6_dst_idev include/net/ip6_fib.h:192 [inline]
      BUG: KASAN: use-after-free in ip6_xmit+0x1f76/0x2260 net/ipv6/ip6_output.c:264
      Read of size 8 at addr ffff8801bd949318 by task syz-executor4/23448
      
      CPU: 0 PID: 23448 Comm: syz-executor4 Not tainted 4.16.0-rc4+ #65
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:17 [inline]
        dump_stack+0x194/0x24d lib/dump_stack.c:53
        print_address_description+0x73/0x250 mm/kasan/report.c:256
        kasan_report_error mm/kasan/report.c:354 [inline]
        kasan_report+0x23c/0x360 mm/kasan/report.c:412
        __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
        ip6_dst_idev include/net/ip6_fib.h:192 [inline]
        ip6_xmit+0x1f76/0x2260 net/ipv6/ip6_output.c:264
        inet6_csk_xmit+0x2fc/0x580 net/ipv6/inet6_connection_sock.c:139
        l2tp_xmit_core net/l2tp/l2tp_core.c:1053 [inline]
        l2tp_xmit_skb+0x105f/0x1410 net/l2tp/l2tp_core.c:1148
        pppol2tp_sendmsg+0x470/0x670 net/l2tp/l2tp_ppp.c:341
        sock_sendmsg_nosec net/socket.c:630 [inline]
        sock_sendmsg+0xca/0x110 net/socket.c:640
        ___sys_sendmsg+0x767/0x8b0 net/socket.c:2046
        __sys_sendmsg+0xe5/0x210 net/socket.c:2080
        SYSC_sendmsg net/socket.c:2091 [inline]
        SyS_sendmsg+0x2d/0x50 net/socket.c:2087
        do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
        entry_SYSCALL_64_after_hwframe+0x42/0xb7
      RIP: 0033:0x453e69
      RSP: 002b:00007f819593cc68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f819593d6d4 RCX: 0000000000453e69
      RDX: 0000000000000081 RSI: 000000002037ffc8 RDI: 0000000000000004
      RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000000004c3 R14: 00000000006f72e8 R15: 0000000000000000
      
      This change addresses the issues by:
      * explicitly checking for TCP_ESTABLISHED for user space provided sockets
      * dropping the v4mapped flag usage - it can become outdated - and
        explicitly invoking ipv6_addr_v4mapped() instead
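
The difference between a cached flag and a check at transmit time can be sketched as follows. This is a user space Python model, not the l2tp code: `TunnelSocket`, `CachedTunnel`, `CheckedTunnel` and `xmit_family` are hypothetical names, and `is_v4mapped` only plays the role of ipv6_addr_v4mapped().

```python
import ipaddress


def is_v4mapped(addr):
    """True if addr is an IPv4-mapped IPv6 address such as ::ffff:192.0.2.1."""
    return ipaddress.IPv6Address(addr).ipv4_mapped is not None


class TunnelSocket:
    """Hypothetical stand-in for a user space provided tunnel socket."""

    def __init__(self, daddr):
        self.daddr = daddr
        self.established = True


class CachedTunnel:
    """Old scheme: decide once at tunnel creation and reuse the flag."""

    def __init__(self, sock):
        self.sock = sock
        self.v4mapped = is_v4mapped(sock.daddr)

    def xmit_family(self):
        return "ipv4" if self.v4mapped else "ipv6"


class CheckedTunnel:
    """New scheme: check connection state and address on every transmit."""

    def __init__(self, sock):
        self.sock = sock

    def xmit_family(self):
        if not self.sock.established:
            return None   # not connected any more: drop instead of transmitting
        return "ipv4" if is_v4mapped(self.sock.daddr) else "ipv6"


sock = TunnelSocket("::ffff:192.0.2.1")
cached, checked = CachedTunnel(sock), CheckedTunnel(sock)
sock.daddr = "2001:db8::1"    # user space reconnects the socket after creation
print(cached.xmit_family())   # stale answer based on the cached flag
print(checked.xmit_family())  # answer based on the current address
```

The cached variant keeps reporting the v4mapped path after the socket has changed, which is the kind of stale state the two bullet points above remove.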
      
      The issue is apparently there since ancient times.
      
      v1 -> v2: (many thanks to Guillaume)
       - fix csum issue introduced in v1
       - replace pr_err with pr_debug
       - fix build issue with IPV6 disabled
       - move l2tp_sk_is_v4mapped in l2tp_core.c
      
      v2 -> v3:
       - don't update inet_daddr for v4mapped address, unneeded
       - drop redundant check at creation time
      
      Reported-and-tested-by: syzbot+92fa328176eb07e4ac1a@syzkaller.appspotmail.com
      Fixes: 3557baab ("[L2TP]: PPP over L2TP driver core")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv6: keep sk status consistent after datagram connect failure · 2f987a76
      Paolo Abeni authored
      On unsuccessful ip6_datagram_connect(), if the failure is caused by
      ip6_datagram_dst_update(), the sk peer information is cleared, but
      the sk->sk_state is preserved.
      
      If the socket was already in an established status, the overall sk
      status is inconsistent and fouls later checks in datagram code.
      
      Fix this by saving the old peer information and restoring it in
      case of failure. This also aligns ipv6 datagram connect() behavior
      with ipv4.
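
The save-and-restore pattern described above can be sketched in a few lines. This is an illustrative Python model, not the kernel implementation: `DatagramSocket`, its fields and the `dst_update` callback are hypothetical stand-ins for the sk peer information and ip6_datagram_dst_update().

```python
class DatagramSocket:
    """Hypothetical model of a datagram socket's peer information and state."""

    def __init__(self):
        self.state = "CLOSE"
        self.daddr = None
        self.dport = None

    def connect(self, daddr, dport, dst_update):
        # Save the old peer information before touching anything.
        old_peer = (self.daddr, self.dport)
        self.daddr, self.dport = daddr, dport
        if not dst_update(daddr, dport):
            # Failure: restore the old peer so it still matches self.state.
            self.daddr, self.dport = old_peer
            return False
        self.state = "ESTABLISHED"
        return True


sk = DatagramSocket()
sk.connect("2001:db8::1", 53, lambda addr, port: True)
ok = sk.connect("2001:db8::2", 53, lambda addr, port: False)
print(ok, sk.state, sk.daddr)
```

After the failed second connect the socket is still established towards the first peer, so state and peer information stay consistent.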
      
      v1 -> v2:
       - added missing Fixes tag
      
      Fixes: 85cb73ff ("net: ipv6: reset daddr and dport in sk if connect() fails")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • e1000e: Fix link check race condition · e2710dbf
      Benjamin Poirier authored
      Alex reported the following race condition:
      
      /* link goes up... interrupt... schedule watchdog */
      \ e1000_watchdog_task
      	\ e1000e_has_link
      		\ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link
      			\ e1000e_phy_has_link_generic(..., &link)
      				link = true
      
      					 /* link goes down... interrupt */
      					 \ e1000_msix_other
      						 hw->mac.get_link_status = true
      
      			/* link is up */
      			mac->get_link_status = false
      
      		link_active = true
      		/* link_active is true, wrongly, and stays so because
      		 * get_link_status is false */
      
      Avoid this problem by making sure that we don't set get_link_status = false
      after having checked the link.
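
The interleaving in the diagram above can be replayed deterministically. This is a Python sketch, not the driver: `Mac`, `link_interrupt`, `check_link_racy` and `check_link_fixed` are hypothetical names, the interrupt is injected through a callback at the point where it fires in the diagram, and the "fixed" ordering is one possible way to satisfy the rule stated above, which may differ from the actual patch.

```python
class Mac:
    """Hypothetical model of the bits of hw->mac that matter here."""

    def __init__(self):
        self.get_link_status = True   # "the link state must be re-read"
        self.phy_link_up = True


def link_interrupt(mac, link_up):
    """Link status change interrupt: record the new state, request a re-check."""
    mac.phy_link_up = link_up
    mac.get_link_status = True


def check_link_racy(mac, irq=None):
    link = mac.phy_link_up            # read the link state
    if irq:
        irq()                         # interrupt fires between read and clear
    if link:
        mac.get_link_status = False   # clobbers the interrupt's request
    return link


def check_link_fixed(mac, irq=None):
    mac.get_link_status = False       # clear the flag before checking the link
    link = mac.phy_link_up
    if irq:
        irq()                         # the interrupt's request now survives
    if not link:
        mac.get_link_status = True    # link is down: keep asking for re-checks
    return link


racy = Mac()
print(check_link_racy(racy, lambda: link_interrupt(racy, False)), racy.get_link_status)

fixed = Mac()
print(check_link_fixed(fixed, lambda: link_interrupt(fixed, False)), fixed.get_link_status)
```

In the racy ordering the check reports the link as up and leaves get_link_status false, so the link-down event is never re-examined; in the other ordering the flag stays true and the next watchdog run re-reads the link.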
      
      It seems this problem has been present since the introduction of e1000e.
      
      Link: https://lkml.org/lkml/2018/1/29/338
      Reported-by: Alexander Duyck <alexander.duyck@gmail.com>
      Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
      Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • Revert "e1000e: Separate signaling for link check/link up" · 3016e0a0
      Benjamin Poirier authored
      This reverts commit 19110cfb.
      This reverts commit 4110e02e.
      This reverts commit d3604515c9eda464a92e8e67aae82dfe07fe3c98.
      
      Commit 19110cfb ("e1000e: Separate signaling for link check/link up")
      changed what happens to the link status when there is an error which
      happens after "get_link_status = false" in the copper check_for_link
      callbacks. Previously, such an error would be ignored and the link
      considered up. After that commit, any error implies that the link is down.
      
      Revert commit 19110cfb ("e1000e: Separate signaling for link check/link
      up") and its followups. After reverting, the race condition described in
      the log of commit 19110cfb is reintroduced. It may still be triggered
      by LSC events but this should keep the link down in case the link is
      electrically unstable, as discussed. The race may no longer be
      triggered by RXO events because commit 4aea7a5c ("e1000e: Avoid
      receiver overrun interrupt bursts") restored reading icr in the Other
      handler.
      
      Link: https://lkml.org/lkml/2018/3/1/789
      Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
      Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · b7475948
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for your net tree, they are:
      
      1) The fixed hashtable representation doesn't support the timeout flag; skip it,
         otherwise rules that add elements from the packet bogusly fail with
         EOPNOTSUPP.
      
      2) Fix bogus error with 32-bits ebtables userspace and 64-bits kernel,
         patch from Florian Westphal.
      
      3) Sanitize proc names in several x_tables extensions, also from Florian.
      
      4) Add sanitization to ebt_among wormhash logic, from Florian.
      
      5) Missing release of hook array in flowtable.
      ====================
    • Merge tag 'linux-can-fixes-for-4.16-20180312' of... · 4665c6b0
      David S. Miller authored
      Merge tag 'linux-can-fixes-for-4.16-20180312' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2018-03-12
      
      this is a pull request of 6 patches for net/master.
      
      The first patch is by Wolfram Sang and fixes a bitshift vs. comparison mistake
      in the m_can driver. Two patches of Marek Vasut repair the error handling in
      the ifi driver. The two patches by Stephane Grosjean fix a "echo_skb is
      occupied!" bug in the peak/pcie_fd driver. Bich HEMON's patch adds pinctrl
      select state calls to the m_can's driver to further improve power saving during
      suspend.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sock_diag: request _diag module only when the family or proto has been registered · bf2ae2e4
      Xin Long authored
      Currently, when using 'ss' from iproute, the kernel tries to load all _diag
      modules, which also causes the corresponding family and proto modules
      to be loaded due to module dependencies.
      
      For instance, after running 'ss', sctp, dccp and af_packet (if built as a
      module) are loaded.
      
      For example:
      
        $ lsmod|grep sctp
        $ ss
        $ lsmod|grep sctp
        sctp_diag              16384  0
        sctp                  323584  5 sctp_diag
        inet_diag              24576  4 raw_diag,tcp_diag,sctp_diag,udp_diag
        libcrc32c              16384  3 nf_conntrack,nf_nat,sctp
      
      As these family and proto modules are loaded unintentionally, it
      could cause some problems, like:
      
      - Some debug tools use 'ss' to collect the socket info, which loads all
        those diag and family and protocol modules. It's noisy for identifying
        issues.
      
      - Users who do not use SCTP usually expect SCTP INIT packets to be
        dropped silently, rather than answered with an ABORT.
      
      - It wastes resources (especially with multiple netns), and SCTP module
        can't be unloaded once it's loaded.
      
      ...
      
      In short, it's really inappropriate to have these family and proto
      modules loaded unexpectedly when just doing debugging with inet_diag.
      
      This patch introduces sock_load_diag_module(), which loads
      the _diag module only when its corresponding family or proto has
      already been registered.
      
      Note that we can't just load _diag module without the family or
      proto loaded, as some symbols used in _diag module are from the
      family or proto module.
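
      The rule described above can be modelled in user space. The sketch
      below is only an illustration: the registries, the table sizes and
      fake_request_module() are hypothetical stand-ins, not the kernel's
      data structures or its module loader. A protocol value of 0 is
      assumed here to mean a family-level diag module.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_FAMILY 64   /* illustrative table sizes, not kernel values */
#define MAX_PROTO  256

/* Hypothetical registries standing in for the family and proto tables. */
static bool family_registered[MAX_FAMILY];
static bool proto_registered[MAX_PROTO];
static int modules_requested;

/* Stand-in for a module request; it only counts how often it is called. */
static int fake_request_module(int family, int protocol)
{
    (void)family;
    (void)protocol;
    modules_requested++;
    return 0;
}

/*
 * Request the diag module only if the family (protocol == 0) or the
 * protocol it depends on is already registered; otherwise report
 * -ENOENT without loading anything.
 */
static int load_diag_module_sketch(int family, int protocol)
{
    if (family < 0 || family >= MAX_FAMILY ||
        protocol < 0 || protocol >= MAX_PROTO)
        return -EINVAL;

    if (protocol == 0) {
        if (!family_registered[family])
            return -ENOENT;
        return fake_request_module(family, 0);
    }

    if (!proto_registered[protocol])
        return -ENOENT;
    return fake_request_module(family, protocol);
}
```

      With nothing registered, a query for protocol 132 returns -ENOENT
      and requests no module; once the protocol is marked as registered,
      the same query triggers exactly one request.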
      
      v1->v2:
        - move inet proto check to inet_diag to avoid a compile error.
      v2->v3:
        - define sock_load_diag_module in sock.c and export one symbol
          only.
        - improve the changelog.
      Reported-by: Sabrina Dubroca <sd@queasysnail.net>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Phil Sutter <phil@nwl.cc>
      Acked-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bf2ae2e4
    • David S. Miller's avatar
      Merge branch 'bnxt_en-Bug-fixes' · 9e5fb720
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: Bug fixes.
      
      There are 3 bug fixes in this series to fix regressions recently
      introduced when adding the new ring reservations scheme.  2 minor
      fixes in the TC Flower code return standard errno values and
      elide some unnecessary warning messages in dmesg.  One fixes the VLAN
      TCI value passed to the stack by including the entire 16-bit VLAN TCI,
      and the last fix is to check for valid VNIC ID before setting up or
      shutting down LRO/GRO.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9e5fb720
    • Michael Chan's avatar
      bnxt_en: Check valid VNIC ID in bnxt_hwrm_vnic_set_tpa(). · 3c4fe80b
      Michael Chan authored
      During initialization, if we encounter errors, there is a code path that
      calls bnxt_hwrm_vnic_set_tpa() with invalid VNIC ID.  This may cause a
      warning in firmware logs.
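
      A minimal sketch of such a guard, assuming a sentinel value marks a
      VNIC that was never allocated. SKETCH_INVALID_ID, set_tpa_sketch()
      and the fw_cmds_sent counter are hypothetical names for
      illustration, not the driver's identifiers.

```c
#include <stdint.h>

/* Hypothetical sentinel for "no VNIC allocated". */
#define SKETCH_INVALID_ID 0xffffu

/* Counts commands that would have been sent to firmware. */
static int fw_cmds_sent;

/*
 * Skip the firmware command entirely when the VNIC ID is invalid,
 * so an error path during initialization cannot trigger a firmware
 * warning.
 */
static int set_tpa_sketch(uint16_t vnic_id)
{
    if (vnic_id == SKETCH_INVALID_ID)
        return 0;

    fw_cmds_sent++;     /* a real driver would issue the command here */
    return 0;
}
```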
      
      Fixes: c0c050c5 ("bnxt_en: New Broadcom ethernet driver.")
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3c4fe80b
    • Venkat Duvvuru's avatar
      bnxt_en: close & open NIC, only when the interface is in running state. · 1a037782
      Venkat Duvvuru authored
      bnxt_restore_pf_fw_resources routine frees PF resources by calling
      close_nic and allocates the resources back, by doing open_nic. However,
      this is not needed, if the PF is already in closed state.
      
      This bug causes the driver to open the device and call request_irq()
      when it is not needed.  Ultimately, pci_disable_msix() will crash
      when bnxt_en is unloaded.
      
      This patch fixes the problem by skipping __bnxt_close_nic and
      __bnxt_open_nic inside bnxt_restore_pf_fw_resources routine, if the
      interface is not running.
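
      The guard described above can be sketched as follows. struct
      fake_nic and its helpers are hypothetical stand-ins used to show
      the control flow only; they are not the driver's types or
      functions.

```c
#include <stdbool.h>

/* Hypothetical model of an interface and its open/close bookkeeping. */
struct fake_nic {
    bool running;
    int close_calls;
    int open_calls;
};

static void fake_close(struct fake_nic *nic)
{
    nic->close_calls++;
}

static int fake_open(struct fake_nic *nic)
{
    nic->open_calls++;
    return 0;
}

/*
 * Close and reopen around the resource restore only when the
 * interface is running; a closed interface is left untouched.
 */
static int restore_resources_sketch(struct fake_nic *nic)
{
    int rc = 0;

    if (nic->running)
        fake_close(nic);

    /* Resource re-reservation would happen here in a real driver. */

    if (nic->running)
        rc = fake_open(nic);

    return rc;
}
```

      For an interface that is down, neither helper is called, so no
      interrupt is requested on its behalf.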
      
      Fixes: 80fcaf46 ("bnxt_en: Restore MSIX after disabling SRIOV.")
      Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1a037782
    • Venkat Duvvuru's avatar
      bnxt_en: Return standard Linux error codes for hwrm flow cmds. · 6ae777ea
      Venkat Duvvuru authored
      Currently, an internal error value is returned by the driver when
      hwrm_cfa_flow_alloc() fails due to lack of resources.  We should be
      returning the Linux errno value -ENOSPC instead.
      
      This patch also converts other similar command errors to standard Linux errno
      code (-EIO) in bnxt_tc.c
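
      The conversion can be pictured as a small mapping function. The
      firmware status codes below are invented for illustration; only
      the -ENOSPC and -EIO results come from the description above.

```c
#include <errno.h>

/* Hypothetical firmware status codes, for illustration only. */
enum fw_status {
    FW_OK = 0,
    FW_ERR_NO_RESOURCE = 1,
    FW_ERR_OTHER = 2,
};

/*
 * Translate a firmware status into a standard negative errno so
 * callers never see driver-internal values.
 */
static int fw_status_to_errno(enum fw_status st)
{
    switch (st) {
    case FW_OK:
        return 0;
    case FW_ERR_NO_RESOURCE:
        return -ENOSPC;     /* out of flow resources */
    default:
        return -EIO;        /* any other command failure */
    }
}
```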
      
      Fixes: db1d36a2 ("bnxt_en: add TC flower offload flow_alloc/free FW cmds")
      Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6ae777ea
    • Michael Chan's avatar
      bnxt_en: Fix regressions when setting up MQPRIO TX rings. · 832aed16
      Michael Chan authored
      Recent changes added the bnxt_init_int_mode() call in the driver's open
      path whenever ring reservations are changed.  This call was previously
      only called in the probe path.  In the open path, if MQPRIO TC has been
      set up, the bnxt_init_int_mode() call would reset and mess up the MQPRIO
      per TC rings.
      
      Fix it by not re-initializing bp->tx_nr_rings_per_tc in
      bnxt_init_int_mode().  Instead, initialize it in the probe path only
      after the bnxt_init_int_mode() call.
      
      Fixes: 674f50a5 ("bnxt_en: Implement new method to reserve rings.")
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      832aed16
    • Michael Chan's avatar
      bnxt_en: Pass complete VLAN TCI to the stack. · ed7bc602
      Michael Chan authored
      When receiving a packet with VLAN tag, pass the entire 16-bit TCI to the
      stack when calling __vlan_hwaccel_put_tag().  The current code is only
      passing the 12-bit tag and it is missing the priority bits.
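
      To see what is lost when only the 12-bit tag is passed, the
      following sketch splits a 16-bit TCI into its fields. The mask and
      shift values match the layout of the 802.1Q TCI (3 priority bits,
      1 DEI bit, 12 VLAN ID bits); the helper names are made up for this
      example.

```c
#include <stdint.h>

#define VLAN_VID_MASK   0x0fffu   /* low 12 bits: VLAN ID */
#define VLAN_PRIO_MASK  0xe000u   /* top 3 bits: priority */
#define VLAN_PRIO_SHIFT 13

/* VLAN ID carried in the TCI. */
static uint16_t tci_vid(uint16_t tci)
{
    return tci & VLAN_VID_MASK;
}

/* Priority carried in the TCI. */
static uint16_t tci_prio(uint16_t tci)
{
    return (uint16_t)((tci & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT);
}

/* What reaches the stack if only the 12-bit tag is passed along. */
static uint16_t tci_truncated(uint16_t tci)
{
    return tci & VLAN_VID_MASK;
}
```

      For a TCI of 0xA064, the VLAN ID is 100 and the priority is 5;
      after truncation to 12 bits the VLAN ID is still 100 but the
      priority reads as 0.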
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ed7bc602
    • Sriharsha Basavapatna's avatar
      bnxt_en: Remove unwanted ovs-offload messages in some conditions · b9ecc340
      Sriharsha Basavapatna authored
      In some conditions when the driver fails to add a flow in HW and returns
      an error back to the stack, the stack continues to invoke get_flow_stats()
      and/or del_flow() on it. The driver fails these APIs with an error message
      "no flow_node for cookie". The message gets logged repeatedly as long as
      the stack keeps invoking these functions.
      
      Fix this by removing the corresponding netdev_info() calls from these
      functions.
      
      Fixes: d7bc7305 ("bnxt_en: add code to query TC flower offload stats")
      Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b9ecc340