1. 31 Aug, 2020 2 commits
  2. 29 Aug, 2020 1 commit
    • Florian Westphal's avatar
      netfilter: conntrack: do not auto-delete clash entries on reply · c4617214
      Florian Westphal authored
      Its possible that we have more than one packet with the same ct tuple
      simultaneously, e.g. when an application emits n packets on same UDP
      socket from multiple threads.
      
      NAT rules might be applied to those packets. With the right set of rules,
      n packets will be mapped to m destinations, where at least two packets end
      up with the same destination.
      
      When this happens, the existing clash resolution may merge the skb that
      is processed after the first has been received with the identical tuple
      already in hash table.
      
      However, its possible that this identical tuple is a NAT_CLASH tuple.
      In that case the second skb will be sent, but no reply can be received
      since the reply that is processed first removes the NAT_CLASH tuple.
      
      Do not auto-delete, this gives a 1 second window for replies to be passed
      back to originator.
      
      Packets that are coming later (udp stream case) will not be affected:
      they match the original ct entry, not a NAT_CLASH one.
      
      Also prevent NAT_CLASH entries from getting offloaded.
      
      Fixes: 6a757c07 ("netfilter: conntrack: allow insertion of clashing entries")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      c4617214
  3. 28 Aug, 2020 13 commits
  4. 27 Aug, 2020 12 commits
  5. 26 Aug, 2020 12 commits
    • David S. Miller's avatar
      Merge branch 'net-fix-netpoll-crash-with-bnxt' · 5875568a
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      net: fix netpoll crash with bnxt
      
      Rob run into crashes when using XDP on bnxt. Upon investigation
      it turns out that during driver reconfig irq core produces
      a warning message when IRQs are requested. This triggers netpoll,
      which in turn accesses uninitialized driver state. Same crash can
      also be triggered on this platform by changing the number of rings.
      
      Looks like we have two missing pieces here, netif_napi_add() has
      to make sure we start out with netpoll blocked. The driver also
      has to be more careful about when napi gets enabled.
      
      Tested XDP and channel count changes, the warning message no longer
      causes a crash. Not sure if the memory barriers added in patch 1
      are necessary, but it seems we should have them.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5875568a
    • Jakub Kicinski's avatar
      bnxt: don't enable NAPI until rings are ready · 96ecdcc9
      Jakub Kicinski authored
      Netpoll can try to poll napi as soon as napi_enable() is called.
      It crashes trying to access a doorbell which is still NULL:
      
       BUG: kernel NULL pointer dereference, address: 0000000000000000
       CPU: 59 PID: 6039 Comm: ethtool Kdump: loaded Tainted: G S                5.9.0-rc1-00469-g5fd99b5d-dirty #26
       RIP: 0010:bnxt_poll+0x121/0x1c0
       Code: c4 20 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 41 8b 86 a0 01 00 00 41 23 85 18 01 00 00 49 8b 96 a8 01 00 00 0d 00 00 00 24 <89> 02
      41 f6 45 77 02 74 cb 49 8b ae d8 01 00 00 31 c0 c7 44 24 1a
        netpoll_poll_dev+0xbd/0x1a0
        __netpoll_send_skb+0x1b2/0x210
        netpoll_send_udp+0x2c9/0x406
        write_ext_msg+0x1d7/0x1f0
        console_unlock+0x23c/0x520
        vprintk_emit+0xe0/0x1d0
        printk+0x58/0x6f
        x86_vector_activate.cold+0xf/0x46
        __irq_domain_activate_irq+0x50/0x80
        __irq_domain_activate_irq+0x32/0x80
        __irq_domain_activate_irq+0x32/0x80
        irq_domain_activate_irq+0x25/0x40
        __setup_irq+0x2d2/0x700
        request_threaded_irq+0xfb/0x160
        __bnxt_open_nic+0x3b1/0x750
        bnxt_open_nic+0x19/0x30
        ethtool_set_channels+0x1ac/0x220
        dev_ethtool+0x11ba/0x2240
        dev_ioctl+0x1cf/0x390
        sock_do_ioctl+0x95/0x130
      Reported-by: default avatarRob Sherwood <rsher@fb.com>
      Fixes: c0c050c5 ("bnxt_en: New Broadcom ethernet driver.")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Reviewed-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      96ecdcc9
    • Jakub Kicinski's avatar
      net: disable netpoll on fresh napis · 96e97bc0
      Jakub Kicinski authored
      napi_disable() makes sure to set the NAPI_STATE_NPSVC bit to prevent
      netpoll from accessing rings before init is complete. However, the
      same is not done for fresh napi instances in netif_napi_add(),
      even though we expect NAPI instances to be added as disabled.
      
      This causes crashes during driver reconfiguration (enabling XDP,
      changing the channel count) - if there is any printk() after
      netif_napi_add() but before napi_enable().
      
      To ensure memory ordering is correct we need to use RCU accessors.
      Reported-by: default avatarRob Sherwood <rsher@fb.com>
      Fixes: 2d8bff12 ("netpoll: Close race condition between poll_one_napi and napi_disable")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      96e97bc0
    • Ido Schimmel's avatar
      ipv4: Silence suspicious RCU usage warning · 7f6f32bb
      Ido Schimmel authored
      fib_info_notify_update() is always called with RTNL held, but not from
      an RCU read-side critical section. This leads to the following warning
      [1] when the FIB table list is traversed with
      hlist_for_each_entry_rcu(), but without a proper lockdep expression.
      
      Since modification of the list is protected by RTNL, silence the warning
      by adding a lockdep expression which verifies RTNL is held.
      
      [1]
       =============================
       WARNING: suspicious RCU usage
       5.9.0-rc1-custom-14233-g2f26e122d62f #129 Not tainted
       -----------------------------
       net/ipv4/fib_trie.c:2124 RCU-list traversed in non-reader section!!
      
       other info that might help us debug this:
      
       rcu_scheduler_active = 2, debug_locks = 1
       1 lock held by ip/834:
        #0: ffffffff85a3b6b0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x49a/0xbd0
      
       stack backtrace:
       CPU: 0 PID: 834 Comm: ip Not tainted 5.9.0-rc1-custom-14233-g2f26e122d62f #129
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
       Call Trace:
        dump_stack+0x100/0x184
        lockdep_rcu_suspicious+0x143/0x14d
        fib_info_notify_update+0x8d1/0xa60
        __nexthop_replace_notify+0xd2/0x290
        rtm_new_nexthop+0x35e2/0x5946
        rtnetlink_rcv_msg+0x4f7/0xbd0
        netlink_rcv_skb+0x17a/0x480
        rtnetlink_rcv+0x22/0x30
        netlink_unicast+0x5ae/0x890
        netlink_sendmsg+0x98a/0xf40
        ____sys_sendmsg+0x879/0xa00
        ___sys_sendmsg+0x122/0x190
        __sys_sendmsg+0x103/0x1d0
        __x64_sys_sendmsg+0x7d/0xb0
        do_syscall_64+0x32/0x50
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7fde28c3be57
       Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51
      c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
      RSP: 002b:00007ffc09330028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fde28c3be57
      RDX: 0000000000000000 RSI: 00007ffc09330090 RDI: 0000000000000003
      RBP: 000000005f45f911 R08: 0000000000000001 R09: 00007ffc0933012c
      R10: 0000000000000076 R11: 0000000000000246 R12: 0000000000000001
      R13: 00007ffc09330290 R14: 00007ffc09330eee R15: 00005610e48ed020
      
      Fixes: 1bff1a0c ("ipv4: Add function to send route updates")
      Signed-off-by: default avatarIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7f6f32bb
    • Xie He's avatar
      drivers/net/wan/lapbether: Set network_header before transmitting · 91244d10
      Xie He authored
      Set the skb's network_header before it is passed to the underlying
      Ethernet device for transmission.
      
      This patch fixes the following issue:
      
      When we use this driver with AF_PACKET sockets, there would be error
      messages of:
         protocol 0805 is buggy, dev (Ethernet interface name)
      printed in the system "dmesg" log.
      
      This is because skbs passed down to the Ethernet device for transmission
      don't have their network_header properly set, and the dev_queue_xmit_nit
      function in net/core/dev.c complains about this.
      
      Reason of setting the network_header to this place (at the end of the
      Ethernet header, and at the beginning of the Ethernet payload):
      
      Because when this driver receives an skb from the Ethernet device, the
      network_header is also set at this place.
      
      Cc: Martin Schiller <ms@dev.tdt.de>
      Signed-off-by: default avatarXie He <xie.he.0141@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      91244d10
    • Florian Westphal's avatar
      mptcp: free acked data before waiting for more memory · 1cec170d
      Florian Westphal authored
      After subflow lock is dropped, more wmem might have been made available.
      
      This fixes a deadlock in mptcp_connect.sh 'mmap' mode: wmem is exhausted.
      But as the mptcp socket holds on to already-acked data (for retransmit)
      no wakeup will occur.
      
      Using 'goto restart' calls mptcp_clean_una(sk) which will free pages
      that have been acked completely in the mean time.
      
      Fixes: fb529e62 ("mptcp: break and restart in case mptcp sndbuf is full")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1cec170d
    • Vinicius Costa Gomes's avatar
      taprio: Fix using wrong queues in gate mask · 09e31cf0
      Vinicius Costa Gomes authored
      Since commit 9c66d156 ("taprio: Add support for hardware
      offloading") there's a bit of inconsistency when offloading schedules
      to the hardware:
      
      In software mode, the gate masks are specified in terms of traffic
      classes, so if say "sched-entry S 03 20000", it means that the traffic
      classes 0 and 1 are open for 20us; when taprio is offloaded to
      hardware, the gate masks are specified in terms of hardware queues.
      
      The idea here is to fix hardware offloading, so schedules in hardware
      and software mode have the same behavior. What's needed to do is to
      map traffic classes to queues when applying the offload to the driver.
      
      Fixes: 9c66d156 ("taprio: Add support for hardware offloading")
      Signed-off-by: default avatarVinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      09e31cf0
    • YueHaibing's avatar
      net: cdc_ncm: Fix build error · 5fd99b5d
      YueHaibing authored
      If USB_NET_CDC_NCM is y and USB_NET_CDCETHER is m, build fails:
      
      drivers/net/usb/cdc_ncm.o:(.rodata+0x1d8): undefined reference to `usbnet_cdc_update_filter'
      
      Select USB_NET_CDCETHER for USB_NET_CDC_NCM to fix this.
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Fixes: e10dcb1b ("net: cdc_ncm: hook into set_rx_mode to admit multicast traffic")
      Signed-off-by: default avatarYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5fd99b5d
    • Yi Li's avatar
      net: hns3: Fix for geneve tx checksum bug · a156998f
      Yi Li authored
      when skb->encapsulation is 0, skb->ip_summed is CHECKSUM_PARTIAL
      and it is udp packet, which has a dest port as the IANA assigned.
      the hardware is expected to do the checksum offload, but the
      hardware will not do the checksum offload when udp dest port is
      6081.
      
      This patch fixes it by doing the checksum in software.
      Reported-by: default avatarLi Bing <libing@winhong.com>
      Signed-off-by: default avatarYi Li <yili@winhong.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a156998f
    • David S. Miller's avatar
      Merge branch 'bnxt_en-Bug-fixes' · 0a3445b8
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: Bug fixes.
      
      This set of driver patches include bug fixes for ethtool get channels,
      ethtool statistics, ethtool NVRAM, AER recovery, a firmware reset issue
      that could potentially crash, hwmon temperature reporting issue on VF,
      and 2 fixes for regressions introduced by the recent user-defined RSS
      map feature.
      
      Please queue patches 1 to 6 for -stable.  Thanks.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0a3445b8
    • Michael Chan's avatar
      bnxt_en: Setup default RSS map in all scenarios. · b43b9f53
      Michael Chan authored
      The recent changes to support user-defined RSS map assume that RX
      rings are always reserved and the default RSS map is set after the
      RX rings are successfully reserved.  If the firmware spec is older
      than 1.6.1, no ring reservations are required and the default RSS
      map is not setup at all.  In another scenario where the fw Resource
      Manager is older, RX rings are not reserved and we also end up with
      no valid RSS map.
      
      Fix both issues in bnxt_need_reserve_rings().  In both scenarios
      described above, we don't need to reserve RX rings so we need to
      call this new function bnxt_check_rss_map_no_rmgr() to setup the
      default RSS map when needed.
      
      Without valid RSS map, the NIC won't receive packets properly.
      
      Fixes: 1667cbf6 ("bnxt_en: Add logical RSS indirection table structure.")
      Reviewed-by: default avatarVasundhara Volam <vasundhara-v.volam@broadcom.com>
      Reviewed-by: default avatarEdwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b43b9f53
    • Edwin Peer's avatar
      bnxt_en: init RSS table for Minimal-Static VF reservation · 5fa65524
      Edwin Peer authored
      There are no VF rings available during probe when the device is configured
      using the Minimal-Static reservation strategy. In this case, the RSS
      indirection table can only be initialized later, during bnxt_open_nic().
      However, this was not happening because the rings will already have been
      reserved via bnxt_init_dflt_ring_mode(), causing bnxt_need_reserve_rings()
      to return false in bnxt_reserve_rings() and bypass the RSS table init.
      
      Solve this by pushing the call to bnxt_set_dflt_rss_indir_tbl() into
      __bnxt_reserve_rings(), which is common to both paths and is called
      whenever ring configuration is changed. After doing this, the RSS table
      init that must be called from bnxt_init_one() happens implicitly via
      bnxt_set_default_rings(), necessitating doing the allocation earlier in
      order to avoid a null pointer dereference.
      
      Fixes: bd3191b5 ("bnxt_en: Implement ethtool -X to set indirection table.")
      Signed-off-by: default avatarEdwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5fa65524