1. 29 May, 2020 2 commits
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 942110fd
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2020-05-29
      
      1) Several fixes for ESP gro/gso in transport and beet mode when
         IPv6 extension headers are present. From Xin Long.
      
      2) Fix a wrong comment on XFRMA_OFFLOAD_DEV.
         From Antony Antony.
      
      3) Fix sk_destruct callback handling on ESP in TCP encapsulation.
         From Sabrina Dubroca.
      
      4) Fix a use after free in xfrm_output_gso when used with vxlan.
         From Xin Long.
      
      5) Fix secpath handling of VTI when used with IPCOMP.
         From Xin Long.
      
      6) Fix an oops when deleting a x-netns xfrm interface.
         From Nicolas Dichtel.
      
      7) Fix a possible warning on policy updates. We had a case where it was
         possible to add two policies with the same lookup keys.
         From Xin Long.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      942110fd
    • xfrm: fix a NULL-ptr deref in xfrm_local_error · f6a23d85
      Xin Long authored
      This patch is to fix a crash:
      
        [ ] kasan: GPF could be caused by NULL-ptr deref or user memory access
        [ ] general protection fault: 0000 [#1] SMP KASAN PTI
        [ ] RIP: 0010:ipv6_local_error+0xac/0x7a0
        [ ] Call Trace:
        [ ]  xfrm6_local_error+0x1eb/0x300
        [ ]  xfrm_local_error+0x95/0x130
        [ ]  __xfrm6_output+0x65f/0xb50
        [ ]  xfrm6_output+0x106/0x46f
        [ ]  udp_tunnel6_xmit_skb+0x618/0xbf0 [ip6_udp_tunnel]
        [ ]  vxlan_xmit_one+0xbc6/0x2c60 [vxlan]
        [ ]  vxlan_xmit+0x6a0/0x4276 [vxlan]
        [ ]  dev_hard_start_xmit+0x165/0x820
        [ ]  __dev_queue_xmit+0x1ff0/0x2b90
        [ ]  ip_finish_output2+0xd3e/0x1480
        [ ]  ip_do_fragment+0x182d/0x2210
        [ ]  ip_output+0x1d0/0x510
        [ ]  ip_send_skb+0x37/0xa0
        [ ]  raw_sendmsg+0x1b4c/0x2b80
        [ ]  sock_sendmsg+0xc0/0x110
      
      This occurred when sending a v4 skb over vxlan6 over ipsec, in which case
      skb->protocol == htons(ETH_P_IPV6) while skb->sk->sk_family == AF_INET in
      xfrm_local_error(). It then goes to xfrm6_local_error(), which tries
      to get IPv6 info from an IPv4 sk.
      
      This issue was actually fixed by Commit 628e341f ("xfrm: make local
      error reporting more robust"), but brought back by Commit 844d4874
      ("xfrm: choose protocol family by skb protocol").
      
      So to fix it, we should call xfrm6_local_error() only when skb->protocol
      is htons(ETH_P_IPV6) and skb->sk->sk_family is AF_INET6.
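
The condition described above can be sketched in user space as follows. This is a minimal model, not the kernel patch: `local_error_family()` and the `SKETCH_*` constants are hypothetical names, and the IPv4 branch is an assumption added so the function is complete.

```c
#include <arpa/inet.h>   /* htons() */
#include <stdint.h>
#include <sys/socket.h>  /* AF_INET, AF_INET6 */

/* EtherType values, defined locally so the sketch builds without kernel headers. */
#define SKETCH_ETH_P_IP   0x0800
#define SKETCH_ETH_P_IPV6 0x86DD

/*
 * Hypothetical helper: return the address family whose local-error handler
 * should run, or -1 when no handler is safe to call.
 * skb_protocol is in network byte order, like skb->protocol.
 */
int local_error_family(uint16_t skb_protocol, int sk_family)
{
	if (skb_protocol == htons(SKETCH_ETH_P_IP))
		return AF_INET;

	/* Use the IPv6 handler only when the socket really is an IPv6 socket. */
	if (skb_protocol == htons(SKETCH_ETH_P_IPV6) && sk_family == AF_INET6)
		return AF_INET6;

	return -1;
}
```

With this shape, the crashing combination from the trace (an IPv6 skb owned by an AF_INET socket) yields -1 and no IPv6 socket state is touched.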
      
      Fixes: 844d4874 ("xfrm: choose protocol family by skb protocol")
      Reported-by: Xiumei Mu <xmu@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      f6a23d85
  2. 28 May, 2020 4 commits
  3. 27 May, 2020 18 commits
    • net: dsa: declare lockless TX feature for slave ports · 2b86cb82
      Vladimir Oltean authored
      Be there a platform with the following layout:
      
            Regular NIC
             |
             +----> DSA master for switch port
                     |
                     +----> DSA master for another switch port
      
      After changing DSA back to static lockdep class keys in commit
      1a33e10e ("net: partially revert dynamic lockdep key changes"), this
      kernel splat can be seen:
      
      [   13.361198] ============================================
      [   13.366524] WARNING: possible recursive locking detected
      [   13.371851] 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 Not tainted
      [   13.377874] --------------------------------------------
      [   13.383201] swapper/0/0 is trying to acquire lock:
      [   13.388004] ffff0000668ff298 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
      [   13.397879]
      [   13.397879] but task is already holding lock:
      [   13.403727] ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
      [   13.413593]
      [   13.413593] other info that might help us debug this:
      [   13.420140]  Possible unsafe locking scenario:
      [   13.420140]
      [   13.426075]        CPU0
      [   13.428523]        ----
      [   13.430969]   lock(&dsa_slave_netdev_xmit_lock_key);
      [   13.435946]   lock(&dsa_slave_netdev_xmit_lock_key);
      [   13.440924]
      [   13.440924]  *** DEADLOCK ***
      [   13.440924]
      [   13.446860]  May be due to missing lock nesting notation
      [   13.446860]
      [   13.453668] 6 locks held by swapper/0/0:
      [   13.457598]  #0: ffff800010003de0 ((&idev->mc_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400
      [   13.466593]  #1: ffffd4d3fb478700 (rcu_read_lock){....}-{1:2}, at: mld_sendpack+0x0/0x560
      [   13.474803]  #2: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x64/0xb10
      [   13.483886]  #3: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
      [   13.492793]  #4: ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
      [   13.503094]  #5: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
      [   13.512000]
      [   13.512000] stack backtrace:
      [   13.516369] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988
      [   13.530421] Call trace:
      [   13.532871]  dump_backtrace+0x0/0x1d8
      [   13.536539]  show_stack+0x24/0x30
      [   13.539862]  dump_stack+0xe8/0x150
      [   13.543271]  __lock_acquire+0x1030/0x1678
      [   13.547290]  lock_acquire+0xf8/0x458
      [   13.550873]  _raw_spin_lock+0x44/0x58
      [   13.554543]  __dev_queue_xmit+0x84c/0xbe0
      [   13.558562]  dev_queue_xmit+0x24/0x30
      [   13.562232]  dsa_slave_xmit+0xe0/0x128
      [   13.565988]  dev_hard_start_xmit+0xf4/0x448
      [   13.570182]  __dev_queue_xmit+0x808/0xbe0
      [   13.574200]  dev_queue_xmit+0x24/0x30
      [   13.577869]  neigh_resolve_output+0x15c/0x220
      [   13.582237]  ip6_finish_output2+0x244/0xb10
      [   13.586430]  __ip6_finish_output+0x1dc/0x298
      [   13.590709]  ip6_output+0x84/0x358
      [   13.594116]  mld_sendpack+0x2bc/0x560
      [   13.597786]  mld_ifc_timer_expire+0x210/0x390
      [   13.602153]  call_timer_fn+0xcc/0x400
      [   13.605822]  run_timer_softirq+0x588/0x6e0
      [   13.609927]  __do_softirq+0x118/0x590
      [   13.613597]  irq_exit+0x13c/0x148
      [   13.616918]  __handle_domain_irq+0x6c/0xc0
      [   13.621023]  gic_handle_irq+0x6c/0x160
      [   13.624779]  el1_irq+0xbc/0x180
      [   13.627927]  cpuidle_enter_state+0xb4/0x4d0
      [   13.632120]  cpuidle_enter+0x3c/0x50
      [   13.635703]  call_cpuidle+0x44/0x78
      [   13.639199]  do_idle+0x228/0x2c8
      [   13.642433]  cpu_startup_entry+0x2c/0x48
      [   13.646363]  rest_init+0x1ac/0x280
      [   13.649773]  arch_call_rest_init+0x14/0x1c
      [   13.653878]  start_kernel+0x490/0x4bc
      
      Lockdep keys themselves were added in commit ab92d68f ("net: core:
      add generic lockdep keys"), and it's very likely that this splat existed
      since then, but I have no real way to check, since this stacked platform
      wasn't supported by mainline back then.
      
      From Taehee's own words:
      
        This patch was considered that all stackable devices have LLTX flag.
        But the dsa doesn't have LLTX, so this splat happened.
        After this patch, dsa shares the same lockdep class key.
        On the nested dsa interface architecture, which you illustrated,
        the same lockdep class key will be used in __dev_queue_xmit() because
        dsa doesn't have LLTX.
        So that lockdep detects deadlock because the same lockdep class key is
        used recursively although actually the different locks are used.
        There are some ways to fix this problem.
      
        1. using NETIF_F_LLTX flag.
        If possible, using the LLTX flag is a very clear way for it.
        But I'm so sorry I don't know whether the dsa could have LLTX or not.
      
        2. using dynamic lockdep again.
        It means that each interface uses a separate lockdep class key.
        So, lockdep will not detect recursive locking.
        But this way has a problem that it could consume lockdep class key
        too many.
        Currently, lockdep can have 8192 lockdep class keys.
         - you can see this number with the following command.
           cat /proc/lockdep_stats
           lock-classes:                         1251 [max: 8192]
           ...
           The [max: 8192] means that the maximum number of lockdep class keys.
        If too many lockdep class keys are registered, lockdep stops to work.
        So, using a dynamic(separated) lockdep class key should be considered
        carefully.
        In addition, updating lockdep class key routine might have to be existing.
        (lockdep_register_key(), lockdep_set_class(), lockdep_unregister_key())
      
        3. Using lockdep subclass.
        A lockdep class key could have 8 subclasses.
        The different subclass is considered different locks by lockdep
        infrastructure.
        But "lock-classes" is not counted by subclasses.
        So, it could avoid stopping lockdep infrastructure by an overflow of
        lockdep class keys.
        This approach should also have an updating lockdep class key routine.
        (lockdep_set_subclass())
      
        4. Using nonvalidate lockdep class key.
        The lockdep infrastructure supports nonvalidate lockdep class key type.
        It means this lockdep is not validated by lockdep infrastructure.
        So, the splat will not happen but lockdep couldn't detect real deadlock
        case because lockdep really doesn't validate it.
        I think this should be used for really special cases.
        (lockdep_set_novalidate_class())
      
      Further discussion here:
      https://patchwork.ozlabs.org/project/netdev/patch/20200503052220.4536-2-xiyou.wangcong@gmail.com/
      
      There appears to be no negative side-effect to declaring lockless TX for
      the DSA virtual interfaces, which means they handle their own locking.
      So that's what we do to make the splat go away.
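
A minimal user-space model of what declaring lockless TX changes, assuming the core transmit path takes the per-device TX lock only when the lockless flag is absent. `SKETCH_F_LLTX`, `struct sketch_netdev` and both functions are hypothetical; the bit value is illustrative and is not the kernel's NETIF_F_LLTX value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative feature bit; the kernel's NETIF_F_LLTX has its own value. */
#define SKETCH_F_LLTX (1u << 0)

struct sketch_netdev {
	uint32_t features;
	int xmit_lock_taken; /* counts how often the core took the TX lock */
};

/* A lockless-TX device handles its own locking, so the core skips the TX lock. */
bool core_takes_xmit_lock(const struct sketch_netdev *dev)
{
	return !(dev->features & SKETCH_F_LLTX);
}

/* Model of the transmit path: take the per-device lock unless LLTX is set. */
void sketch_queue_xmit(struct sketch_netdev *dev)
{
	if (core_takes_xmit_lock(dev))
		dev->xmit_lock_taken++;
}
```

In the stacked layout from the commit message, both DSA interfaces would skip the lock, so the same lockdep class is never acquired twice on one transmit path.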
      
      Patch tested in a wide variety of cases: unicast, multicast, PTP, etc.
      
      Fixes: ab92d68f ("net: core: add generic lockdep keys")
      Suggested-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2b86cb82
    • net: dsa: felix: send VLANs on CPU port as egress-tagged · 183be6f9
      Vladimir Oltean authored
      As explained in other commits before (b9cd75e6 and 87b0f983),
      ocelot switches have a single egress-untagged VLAN per port, and the
      driver would deny adding a second one while an egress-untagged VLAN
      already exists.
      
      But on the CPU port (where the VLAN configuration is implicit, because
      there is no net device for the bridge to control), the DSA core attempts
      to add a VLAN using the same flags as were used for the front-panel
      port. This would make adding any untagged VLAN fail due to the CPU port
      rejecting the configuration:
      
      bridge vlan add dev swp0 vid 100 pvid untagged
      [ 1865.854253] mscc_felix 0000:00:00.5: Port already has a native VLAN: 1
      [ 1865.860824] mscc_felix 0000:00:00.5: Failed to add VLAN 100 to port 5: -16
      
      (note that port 5 is the CPU port and not the front-panel swp0).
      
      So this hardware will send all VLANs as tagged towards the CPU.
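
One way to picture the behaviour is a helper that drops the untagged flag whenever the target is the CPU port. This is a sketch under assumptions: `vlan_flags_for_port()` and the `SKETCH_VLAN_*` bits are hypothetical and only mimic the shape of the bridge VLAN flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits modelled on the bridge VLAN flags. */
#define SKETCH_VLAN_PVID     (1u << 1)
#define SKETCH_VLAN_UNTAGGED (1u << 2)

/*
 * Hypothetical helper: front-panel ports keep the flags requested by the
 * bridge, while the CPU port always sends the VLAN egress-tagged.
 */
uint16_t vlan_flags_for_port(uint16_t flags, bool is_cpu_port)
{
	if (is_cpu_port)
		flags &= (uint16_t)~SKETCH_VLAN_UNTAGGED;
	return flags;
}
```

With the untagged bit cleared on the CPU port, a second "pvid untagged" VLAN on a front-panel port no longer collides with the single egress-untagged VLAN the hardware allows per port.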
      
      Fixes: 56051948 ("net: dsa: ocelot: add driver for Felix switch family")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      183be6f9
    • bridge: multicast: work around clang bug · b3b6a84c
      Arnd Bergmann authored
      Clang-10 and clang-11 run into a corner case of the register
      allocator on 32-bit ARM, leading to excessive stack usage from
      register spilling:
      
      net/bridge/br_multicast.c:2422:6: error: stack frame size of 1472 bytes in function 'br_multicast_get_stats' [-Werror,-Wframe-larger-than=]
      
      Work around this by marking one of the internal functions as
      noinline_for_stack.
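
The technique can be sketched outside the kernel with the GCC/Clang `noinline` attribute. All names here are hypothetical, and `sketch_noinline_for_stack` only stands in for the kernel macro: the helper keeps its large temporary in its own frame instead of having it merged into the caller's.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the kernel annotation; GCC and Clang both accept this attribute. */
#define sketch_noinline_for_stack __attribute__((noinline))

struct sketch_stats {
	uint64_t counters[64];
};

/* The 512-byte temporary lives only in this function's stack frame. */
sketch_noinline_for_stack
uint64_t sketch_sum_stats(const struct sketch_stats *src)
{
	struct sketch_stats tmp;
	uint64_t sum = 0;

	memcpy(&tmp, src, sizeof(tmp));
	for (size_t i = 0; i < 64; i++)
		sum += tmp.counters[i];
	return sum;
}

/* The caller's frame stays small because the helper is never inlined into it. */
uint64_t sketch_get_stats(const struct sketch_stats *a,
			  const struct sketch_stats *b)
{
	return sketch_sum_stats(a) + sketch_sum_stats(b);
}
```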
      
      Link: https://bugs.llvm.org/show_bug.cgi?id=45802#c9
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b3b6a84c
    • vsock: fix timeout in vsock_accept() · 7e0afbdf
      Stefano Garzarella authored
      accept(2) is an "input" socket interface, so we should use
      SO_RCVTIMEO instead of SO_SNDTIMEO to set the timeout.
      
      So this patch replaces sock_sndtimeo() with sock_rcvtimeo() to
      use the right timeout in vsock_accept().
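
From user space, the expected behaviour looks like the sketch below: a receive timeout set on a listening socket bounds how long accept() blocks. This uses a loopback TCP socket as a stand-in, on the assumption that vsock may not be available on the test machine; `accept_errno_after_rcvtimeo()` is a hypothetical helper.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Listen on an ephemeral loopback TCP port with SO_RCVTIMEO set, then call
 * accept() with no client connecting. Returns the errno reported by accept(),
 * 0 if a connection was unexpectedly accepted, or -1 on setup failure.
 */
int accept_errno_after_rcvtimeo(long timeout_ms)
{
	struct timeval tv = {
		.tv_sec = timeout_ms / 1000,
		.tv_usec = (timeout_ms % 1000) * 1000,
	};
	struct sockaddr_in addr = {
		.sin_family = AF_INET,
		.sin_port = 0, /* let the kernel pick a free port */
		.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
	};
	int fd, conn, err;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
	    bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 1) < 0) {
		close(fd);
		return -1;
	}

	conn = accept(fd, NULL, NULL); /* blocks for at most the timeout */
	err = conn < 0 ? errno : 0;
	if (conn >= 0)
		close(conn);
	close(fd);
	return err;
}
```

On Linux, the call is expected to fail with EAGAIN (equal to EWOULDBLOCK) once the timeout expires; setting SO_SNDTIMEO instead would leave accept() blocking indefinitely.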
      
      Fixes: d021c344 ("VSOCK: Introduce VM Sockets")
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7e0afbdf
    • nfp: flower: fix used time of merge flow statistics · 5b186cd6
      Heinrich Kuhn authored
      Prior to this change, the correct value for the used counter is calculated
      but not stored and therefore not propagated to user-space. In at least the
      OVS use-case, this results in active flows being removed from the hardware
      datapath, which in turn causes unnecessary flow tear-down and setup, and
      packet processing on the host.
      
      This patch addresses the problem by saving the calculated used value
      which allows the value to propagate to user-space.
      
      Found by inspection.
      
      Fixes: aa6ce2ea ("nfp: flower: support stats update for merge flows")
      Signed-off-by: Heinrich Kuhn <heinrich.kuhn@netronome.com>
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5b186cd6
    • net/sched: fix infinite loop in sch_fq_pie · bb2f930d
      Davide Caratti authored
      this command hangs forever:
      
       # tc qdisc add dev eth0 root fq_pie flows 65536
      
       watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [tc:1028]
       [...]
       CPU: 1 PID: 1028 Comm: tc Not tainted 5.7.0-rc6+ #167
       RIP: 0010:fq_pie_init+0x60e/0x8b7 [sch_fq_pie]
       Code: 4c 89 65 50 48 89 f8 48 c1 e8 03 42 80 3c 30 00 0f 85 2a 02 00 00 48 8d 7d 10 4c 89 65 58 48 89 f8 48 c1 e8 03 42 80 3c 30 00 <0f> 85 a7 01 00 00 48 8d 7d 18 48 c7 45 10 46 c3 23 00 48 89 f8 48
       RSP: 0018:ffff888138d67468 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
       RAX: 1ffff9200018d2b2 RBX: ffff888139c1c400 RCX: ffffffffffffffff
       RDX: 000000000000c5e8 RSI: ffffc900000e5000 RDI: ffffc90000c69590
       RBP: ffffc90000c69580 R08: fffffbfff79a9699 R09: fffffbfff79a9699
       R10: 0000000000000700 R11: fffffbfff79a9698 R12: ffffc90000c695d0
       R13: 0000000000000000 R14: dffffc0000000000 R15: 000000002347c5e8
       FS:  00007f01e1850e40(0000) GS:ffff88814c880000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 000000000067c340 CR3: 000000013864c000 CR4: 0000000000340ee0
       Call Trace:
        qdisc_create+0x3fd/0xeb0
        tc_modify_qdisc+0x3be/0x14a0
        rtnetlink_rcv_msg+0x5f3/0x920
        netlink_rcv_skb+0x121/0x350
        netlink_unicast+0x439/0x630
        netlink_sendmsg+0x714/0xbf0
        sock_sendmsg+0xe2/0x110
        ____sys_sendmsg+0x5b4/0x890
        ___sys_sendmsg+0xe9/0x160
        __sys_sendmsg+0xd3/0x170
        do_syscall_64+0x9a/0x370
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      we can't accept 65536 as a valid number for 'nflows', because the loop on
      'idx' in fq_pie_init() will never end. The extack message is correct, but
      it doesn't say that 0 is not a valid number for 'flows': while at it, fix
      this also. Add a tdc selftest to check correct validation of 'flows'.
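
A small model of why the loop never ends, assuming the index is a 16-bit counter compared against a wider flow count. The helper names are hypothetical, and the `cap` argument exists only so the demonstration terminates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Range check in the spirit of the fix: 0 and anything above 65535 are rejected. */
bool sketch_flows_valid(uint32_t flows_cnt)
{
	return flows_cnt >= 1 && flows_cnt <= UINT16_MAX;
}

/*
 * Count iterations of a loop driven by a 16-bit index. The cap is only there
 * so the sketch returns when the index wraps around instead of hanging.
 */
uint32_t sketch_iterations_u16(uint32_t flows_cnt, uint32_t cap)
{
	uint32_t n = 0;

	for (uint16_t idx = 0; idx < flows_cnt && n < cap; idx++)
		n++;
	return n;
}
```

For 1024 flows the loop stops after 1024 iterations. For 65536 flows the index wraps from 65535 back to 0 and the condition stays true, so only the cap ends the loop.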
      
      CC: Ivan Vecera <ivecera@redhat.com>
      Fixes: ec97ecf1 ("net: sched: add Flow Queue PIE packet scheduler")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Reviewed-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bb2f930d
    • netfilter: nf_conntrack_pptp: fix compilation warning with W=1 build · 4946ea5c
      Pablo Neira Ayuso authored
      >> include/linux/netfilter/nf_conntrack_pptp.h:13:20: warning: 'const' type qualifier on return type has no effect [-Wignored-qualifiers]
      extern const char *const pptp_msg_name(u_int16_t msg);
      ^~~~~~
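
The warning arises because a top-level `const` on a returned pointer value has no effect. A minimal sketch of the warning-free shape, with a hypothetical lookup table and function name:

```c
#include <stddef.h>

/* Here the second const is meaningful: the array elements cannot be reassigned. */
static const char *const sketch_names[] = {
	"UNKNOWN_MESSAGE",
	"FIRST_MESSAGE",
	"SECOND_MESSAGE",
};

/*
 * Return type is "const char *", not "const char *const": the returned
 * pointer is a temporary value, so qualifying it would be ignored.
 */
const char *sketch_msg_name(unsigned int msg)
{
	if (msg >= sizeof(sketch_names) / sizeof(sketch_names[0]))
		msg = 0; /* fall back to the "unknown" entry */
	return sketch_names[msg];
}
```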
      Reported-by: kbuild test robot <lkp@intel.com>
      Fixes: 4c559f15 ("netfilter: nf_conntrack_pptp: prevent buffer overflows in debug code")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      4946ea5c
    • netfilter: conntrack: comparison of unsigned in cthelper confirmation · 94945ad2
      Pablo Neira Ayuso authored
      net/netfilter/nf_conntrack_core.c: In function nf_confirm_cthelper:
      net/netfilter/nf_conntrack_core.c:2117:15: warning: comparison of unsigned expression in < 0 is always false [-Wtype-limits]
       2117 |   if (protoff < 0 || (frag_off & htons(~0x7)) != 0)
            |               ^
      
      ipv6_skip_exthdr() returns a signed integer.
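
A sketch of the underlying issue, using a hypothetical stand-in for a helper that returns a negative value on failure: the error is only detectable if the result is kept in a signed variable.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a header-skipping helper: offset, or -1 on failure. */
int sketch_skip_headers(int start, bool malformed)
{
	return malformed ? -1 : start + 8;
}

/* Correct shape: keep the result signed so the "< 0" error check can fire. */
bool sketch_offset_is_error(bool malformed)
{
	int protoff = sketch_skip_headers(40, malformed);

	return protoff < 0;
}

/*
 * What an unsigned variable would hold after the same call: the error value
 * becomes a very large offset, and a "< 0" test on it could never be true.
 */
unsigned int sketch_offset_as_unsigned(bool malformed)
{
	return (unsigned int)sketch_skip_headers(40, malformed);
}
```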
      Reported-by: Colin Ian King <colin.king@canonical.com>
      Fixes: 703acd70 ("netfilter: nfnetlink_cthelper: unbreak userspace helper support")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      94945ad2
    • netfilter: conntrack: Pass value of ctinfo to __nf_conntrack_update · 46c1e062
      Nathan Chancellor authored
      Clang warns:
      
      net/netfilter/nf_conntrack_core.c:2068:21: warning: variable 'ctinfo' is
      uninitialized when used here [-Wuninitialized]
              nf_ct_set(skb, ct, ctinfo);
                                 ^~~~~~
      net/netfilter/nf_conntrack_core.c:2024:2: note: variable 'ctinfo' is
      declared here
              enum ip_conntrack_info ctinfo;
              ^
      1 warning generated.
      
      nf_conntrack_update was split up into nf_conntrack_update and
      __nf_conntrack_update, where the assignment of ctinfo is in
      nf_conntrack_update but it is used in __nf_conntrack_update.
      
      Pass the value of ctinfo from nf_conntrack_update to
      __nf_conntrack_update so that uninitialized memory is not used
      and everything works properly.
      
      Fixes: ee04805f ("netfilter: conntrack: make conntrack userspace helpers work again")
      Link: https://github.com/ClangBuiltLinux/linux/issues/1039
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      46c1e062
    • crypto: chelsio/chtls: properly set tp->lsndtime · a4976a3e
      Eric Dumazet authored
      TCP tp->lsndtime unit/base is tcp_jiffies32, not tcp_time_stamp()
      
      Fixes: 36bedb3f ("crypto: chtls - Inline TLS record Tx")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Ayush Sawal <ayush.sawal@chelsio.com>
      Cc: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a4976a3e
    • net: sctp: Fix spelling in Kconfig help · 9a233d32
      Chris Packham authored
      Change 'handeled' to 'handled' in the Kconfig help for SCTP.
      Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9a233d32
    • Merge branch 'bnxt_en-Bug-fixes' · 09cb9f26
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: Bug fixes.
      
      3 bnxt_en driver fixes, covering a bug in preserving the counters during
      some resets, proper error code when flashing NVRAM fails, and an
      endian bug when extracting the firmware response message length.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      09cb9f26
    • bnxt_en: fix firmware message length endianness · 2a5a8800
      Edwin Peer authored
      The explicit mask and shift is not the appropriate way to parse fields
      out of a little endian struct. The length field is internally __le16
      and the strategy employed only happens to work on little endian machines
      because the offset used is actually incorrect (length is at offset 6).
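
An endian-safe way to read such a field can be sketched in portable C by assembling the value from its bytes. The function names and the flat byte-buffer layout are assumptions for illustration; kernel code would normally read the `__le16` struct member through `le16_to_cpu()` instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Per the commit text, the length field sits at byte offset 6. */
#define SKETCH_RESP_LEN_OFFSET 6

/* Assemble a little-endian 16-bit value; gives the same result on any host. */
uint16_t sketch_get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical accessor for the response length in a raw response buffer. */
uint16_t sketch_resp_len(const uint8_t *resp)
{
	return sketch_get_le16(resp + SKETCH_RESP_LEN_OFFSET);
}
```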
      
      Also remove the related and no longer used definitions from bnxt.h.
      
      Fixes: 845adfe4 ("bnxt_en: Improve valid bit checking in firmware response message.")
      Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2a5a8800
    • bnxt_en: Fix return code to "flash_device". · 95ec1f47
      Vasundhara Volam authored
      When the NVRAM directory is not found, return the error code
      properly as per the firmware command failure instead of the hardcoded
      -ENOBUFS.
      
      Fixes: 3a707bed ("bnxt_en: Return -EAGAIN if fw command returns BUSY")
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      95ec1f47
    • bnxt_en: Fix accumulation of bp->net_stats_prev. · b8056e84
      Michael Chan authored
      We have logic to maintain network counters across resets by storing
      the counters in bp->net_stats_prev before reset.  But not all resets
      will clear the counters.  Certain resets that don't need to change
      the number of rings do not clear the counters.  The current logic
      accumulates the counters before all resets, causing big jumps in
      the counters after some resets, such as ethtool -G.
      
      Fix it by only accumulating the counters during reset if the irq_re_init
      parameter is set.  The parameter signifies that all rings and interrupts
      will be reset and that means that the counters will also be reset.
      Reported-by: Vijayendra Suman <vijayendra.suman@oracle.com>
      Fixes: b8875ca3 ("bnxt_en: Save ring statistics before reset.")
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b8056e84
    • net: usb: qmi_wwan: add Telit LE910C1-EUX composition · 591612aa
      Daniele Palmas authored
      Add support for Telit LE910C1-EUX composition
      
      0x1031: tty, tty, tty, rmnet
      Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
      Acked-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      591612aa
    • net: check untrusted gso_size at kernel entry · 6dd912f8
      Willem de Bruijn authored
      Syzkaller again found a path to a kernel crash through bad gso input:
      a packet with gso size exceeding len.
      
      These packets are dropped in tcp_gso_segment and udp[46]_ufo_fragment.
      But they may affect gso size calculations earlier in the path.
      
      Now that we have thlen as of commit 9274124f ("net: stricter
      validation of untrusted gso packets"), check gso_size at entry too.
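
The kind of entry check described can be sketched as a pure function. This is an illustration, not the kernel code: the function name, the parameters, and the exact strictness of the comparison are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical validation of an untrusted GSO request: the payload left
 * after the headers must be larger than one segment, otherwise there is
 * nothing to segment and the request is rejected.
 */
bool sketch_gso_size_ok(uint32_t pkt_len, uint32_t hdr_len, uint16_t gso_size)
{
	if (gso_size == 0)
		return false;
	if (hdr_len >= pkt_len)
		return false; /* no payload at all */
	return pkt_len - hdr_len > gso_size;
}
```

Rejecting such packets where they enter the kernel means later gso size calculations never see a segment size larger than the packet itself.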
      
      Fixes: bfd5f4a3 ("packet: Add GSO/csum offload support.")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6dd912f8
    • mptcp: avoid NULL-ptr derefence on fallback · 0a82e230
      Paolo Abeni authored
      In the MPTCP receive path we must cope with TCP fallback
      on blocking recvmsg(). Currently in such code path we detect
      the fallback condition, but we don't fetch the struct socket
      required for fallback.
      
      The above allowed syzkaller to trigger a NULL pointer
      dereference:
      
      general protection fault, probably for non-canonical address 0xdffffc0000000004: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
      CPU: 1 PID: 7226 Comm: syz-executor523 Not tainted 5.7.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:sock_recvmsg_nosec net/socket.c:886 [inline]
      RIP: 0010:sock_recvmsg+0x92/0x110 net/socket.c:904
      Code: 5b 41 5c 41 5d 41 5e 41 5f 5d c3 44 89 6c 24 04 e8 53 18 1d fb 4d 8d 6f 20 4c 89 e8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 4c 89 ef e8 20 12 5b fb bd a0 00 00 00 49 03 6d
      RSP: 0018:ffffc90001077b98 EFLAGS: 00010202
      RAX: 0000000000000004 RBX: ffffc90001077dc0 RCX: dffffc0000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: 0000000000000000 R08: ffffffff86565e59 R09: ffffed10115afeaa
      R10: ffffed10115afeaa R11: 0000000000000000 R12: 1ffff9200020efbc
      R13: 0000000000000020 R14: ffffc90001077de0 R15: 0000000000000000
      FS:  00007fc6a3abe700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000004d0050 CR3: 00000000969f0000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       mptcp_recvmsg+0x18d5/0x19b0 net/mptcp/protocol.c:891
       inet_recvmsg+0xf6/0x1d0 net/ipv4/af_inet.c:838
       sock_recvmsg_nosec net/socket.c:886 [inline]
       sock_recvmsg net/socket.c:904 [inline]
       __sys_recvfrom+0x2f3/0x470 net/socket.c:2057
       __do_sys_recvfrom net/socket.c:2075 [inline]
       __se_sys_recvfrom net/socket.c:2071 [inline]
       __x64_sys_recvfrom+0xda/0xf0 net/socket.c:2071
       do_syscall_64+0xf3/0x1b0 arch/x86/entry/common.c:295
       entry_SYSCALL_64_after_hwframe+0x49/0xb3
      
      Address the issue initializing the struct socket reference
      before entering the fallback code.
      
      Reported-and-tested-by: syzbot+c6bfc3db991edc918432@syzkaller.appspotmail.com
      Suggested-by: Ondrej Mosnacek <omosnace@redhat.com>
      Fixes: 8ab183de ("mptcp: cope with later TCP fallback")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0a82e230
  4. 26 May, 2020 12 commits
    • net: stmmac: enable timestamp snapshot for required PTP packets in dwmac v5.10a · f2fb6b62
      Fugang Duan authored
      The rx filter 'HWTSTAMP_FILTER_PTP_V2_EVENT' should cover
      PTP v2/802.AS1, any layer, any kind of event packet, but the HW only
      takes a timestamp snapshot for the following PTP messages: sync,
      Pdelay_req, Pdelay_resp.
      
      This causes the following issue when testing the E2E case:
      ptp4l[2479.534]: port 1: received DELAY_REQ without timestamp
      ptp4l[2481.423]: port 1: received DELAY_REQ without timestamp
      ptp4l[2481.758]: port 1: received DELAY_REQ without timestamp
      ptp4l[2483.524]: port 1: received DELAY_REQ without timestamp
      ptp4l[2484.233]: port 1: received DELAY_REQ without timestamp
      ptp4l[2485.750]: port 1: received DELAY_REQ without timestamp
      ptp4l[2486.888]: port 1: received DELAY_REQ without timestamp
      ptp4l[2487.265]: port 1: received DELAY_REQ without timestamp
      ptp4l[2487.316]: port 1: received DELAY_REQ without timestamp
      
      Timestamp snapshot dependency on register bits in received path:
      SNAPTYPSEL  TSMSTRENA  TSEVNTENA  PTP_Messages
      01          x          0          SYNC, Follow_Up, Delay_Req,
                                        Delay_Resp, Pdelay_Req, Pdelay_Resp,
                                        Pdelay_Resp_Follow_Up
      01          0          1          SYNC, Pdelay_Req, Pdelay_Resp
      
      For dwmac v5.10a, enable all events by setting register
      DWC_EQOS_TIME_STAMPING[SNAPTYPSEL] to 2'b01 and clearing bit
      [TSEVNTENA] to 0, which supports all required events.
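
The register update can be sketched as a read-modify-write on a 32-bit control value. The bit positions below are assumptions for illustration and should be checked against the hardware databook; all `SKETCH_*` names and the function are hypothetical.

```c
#include <stdint.h>

/* Assumed bit layout of the timestamp control register (illustrative only). */
#define SKETCH_TCR_TSEVNTENA        (1u << 14)
#define SKETCH_TCR_TSMSTRENA        (1u << 15)
#define SKETCH_TCR_SNAPTYPSEL_SHIFT 16
#define SKETCH_TCR_SNAPTYPSEL_MASK  (3u << SKETCH_TCR_SNAPTYPSEL_SHIFT)

/* Select SNAPTYPSEL = 01 and clear TSEVNTENA; all other bits are preserved. */
uint32_t sketch_enable_all_ptp_events(uint32_t tcr)
{
	tcr &= ~SKETCH_TCR_SNAPTYPSEL_MASK;
	tcr |= 1u << SKETCH_TCR_SNAPTYPSEL_SHIFT;
	tcr &= ~SKETCH_TCR_TSEVNTENA;
	return tcr;
}
```

This corresponds to the first row of the table above, where Delay_Req and the other messages are also timestamped.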
      Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f2fb6b62
    • Merge branch 'nexthop-group-fixes' · 4d5c32ec
      David S. Miller authored
      David Ahern says:
      
      ====================
      nexthops: Fix 2 fundamental flaws with nexthop groups
      
      Nik's torture tests have exposed 2 fundamental mistakes with the initial
      nexthop code for groups. First, the nexthops entries and num_nh in the
      nh_grp struct should not be modified once the struct is set under rcu.
      Doing so has major effects on the datapath seeing valid nexthop entries.
      
      Second, the helpers in the header file were convenient for not repeating
      code, but they cause datapath walks to potentially see 2 different group
      structs after an rcu replace, disrupting a walk of the path objects.
      This second problem applies solely to IPv4 as I re-used too much of the
      existing code in walking legs of a multipath route.
      
      Patch 1 is a refactoring change to simplify the overhead of reviewing and
      understanding the change in patch 2, which fixes the update of nexthop
      groups when a component leg is removed.
      
      Patches 3-5 address the second problem. Patch 3 inlines the multipath
      check such that the mpath lookup and subsequent calls all use the same
      nh_grp struct. Patches 4 and 5 fix datapath uses of fib_info_num_path
      with iterative calls to fib_info_nhc.
      
      fib_info_num_path can be used in the control plane path in a 'for loop' with
      subsequent fib_info_nhc calls to get each leg, since the nh_grp struct is
      only changed while holding the rtnl; the combination cannot be used in
      the data plane with external nexthops, as it involves repeated dereferences
      of the nh_grp struct, which can change between calls.
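
      The difference between repeated dereferences and a single snapshot can
      be sketched in userspace C. This is only an illustration under stated
      assumptions: struct grp, grp_publish and grp_sum_legs are hypothetical
      names, C11 atomics stand in for RCU, and reclaiming the old struct
      after a replace is not modelled.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for nh_grp and its legs; not the kernel layout. */
struct grp {
	size_t num_nh;
	int    legs[8];
};

/* Published pointer: a writer swaps in a whole new struct, never edits in place. */
static _Atomic(struct grp *) published;

/* Writer side: publish a fully initialised replacement group. */
static void grp_publish(struct grp *g)
{
	atomic_store_explicit(&published, g, memory_order_release);
}

/*
 * Reader side: load the pointer exactly once and take both the count and
 * every leg from that same snapshot, so a concurrent replace cannot make
 * the loop bound and the array come from different structs.
 */
static int grp_sum_legs(void)
{
	struct grp *g = atomic_load_explicit(&published, memory_order_acquire);
	int sum = 0;

	if (!g)
		return 0;
	for (size_t i = 0; i < g->num_nh; i++)
		sum += g->legs[i];
	return sum;
}
```

      A reader that instead fetched the count through one load and each leg
      through another could pair the old count with a newer, smaller group,
      which is the failure the cover letter describes.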
      
      Similarly, nexthop_is_multipath can be used for branching decisions in
      the datapath since the nexthop type can not be changed (a group can not
      be converted to standalone and vice versa).
      
      Patch set developed in coordination with Nikolay Aleksandrov. He did a
      lot of work creating a good reproducer, discussing options to fix it
      and testing iterations.
      
      I have adapted Nik's commands into additional tests in the nexthops
      selftest script which I will send against -next.
      
      v2
      - fixed whitespace errors
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4d5c32ec
    • David Ahern's avatar
      ipv4: nexthop version of fib_info_nh_uses_dev · 1fd1c768
      David Ahern authored
      Similar to the last patch, fib_info_nh_uses_dev needs to be fixed for
      external nexthops to avoid referencing multiple nh_grp structs.
      Move the device check in fib_info_nh_uses_dev to a helper and
      create a nexthop version that is called if the fib_info uses an
      external nexthop.
      
      Fixes: 430a0491 ("nexthop: Add support for nexthop groups")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1fd1c768
    • David Ahern's avatar
      ipv4: Refactor nhc evaluation in fib_table_lookup · af7888ad
      David Ahern authored
      FIB lookups can return an entry that references an external nexthop.
      While walking the nexthop struct we do not want to make multiple calls
      into the nexthop code, which can result in 2 different structs getting
      accessed: one call returning the number of paths, and the rest of the
      loop seeing a different nh_grp struct. If the nexthop group shrunk, the
      result is an attempt to access a fib_nh_common that does not exist for
      the new nh_grp struct but did for the old one.
      
      To fix that, move the device evaluation code to a helper that can be
      used for the inline fib_nh path as well as external nexthops.
      
      Update the existing check for fi->nh in fib_table_lookup to call a
      new helper, nexthop_get_nhc_lookup, which walks the external nexthop
      with a single rcu dereference.
      
      Fixes: 430a0491 ("nexthop: Add support for nexthop groups")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      af7888ad
    • David Ahern's avatar
      nexthop: Expand nexthop_is_multipath in a few places · 0b5e2e39
      David Ahern authored
      I got too fancy consolidating checks on multipath type. The result
      is that path lookups can access 2 different nh_grp structs, as exposed
      by Nik's torture tests. Expand nexthop_is_multipath within nexthop.h to
      avoid multiple nh_grp dereferences and make decisions based on one
      consistent struct.
      
      The only 2 places left using nexthop_is_multipath are within IPv6; both
      only check that the nexthop is a multipath for a branching decision,
      which is acceptable.
      
      Fixes: 430a0491 ("nexthop: Add support for nexthop groups")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0b5e2e39
    • Nikolay Aleksandrov's avatar
      nexthops: don't modify published nexthop groups · 90f33bff
      Nikolay Aleksandrov authored
      We must avoid modifying published nexthop groups while they might be
      in use, otherwise we might see NULL ptr dereferences. In order to do
      that we allocate 2 nexthop group structures upon nexthop creation
      and swap between them when we have to delete an entry. The reason is
      that we can't fail nexthop group removal, so we can't handle allocation
      failure there; thus we move the extra allocation to creation, where we
      can safely fail and return ENOMEM.
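
      The allocate-at-creation, swap-on-removal idea can be sketched as
      follows. This is a userspace illustration of the description above,
      not the patch itself: struct group_owner, owner_create and
      owner_remove_leg are hypothetical names, and the plain pointer swap
      stands in for an RCU publish followed by waiting for readers.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_LEGS 8

/* Hypothetical stand-in for a nexthop group. */
struct grp {
	size_t num_nh;
	int    legs[MAX_LEGS];
};

struct group_owner {
	struct grp *active;	/* the struct readers currently see */
	struct grp *spare;	/* allocated at creation, filled in on removal */
};

/* Creation is allowed to fail, so both allocations happen here. */
static int owner_create(struct group_owner *o, const int *legs, size_t n)
{
	if (n > MAX_LEGS)
		return -1;
	o->active = calloc(1, sizeof(*o->active));
	o->spare = calloc(1, sizeof(*o->spare));
	if (!o->active || !o->spare) {
		free(o->active);
		free(o->spare);
		o->active = o->spare = NULL;
		return -1;
	}
	memcpy(o->active->legs, legs, n * sizeof(*legs));
	o->active->num_nh = n;
	return 0;
}

/*
 * Removal cannot fail: the new contents are built in the spare struct,
 * the published struct is left untouched, and the two are swapped.
 */
static void owner_remove_leg(struct group_owner *o, int leg)
{
	struct grp *old = o->active;
	struct grp *next = o->spare;
	size_t j = 0;

	for (size_t i = 0; i < old->num_nh; i++)
		if (old->legs[i] != leg)
			next->legs[j++] = old->legs[i];
	next->num_nh = j;

	o->active = next;	/* in a concurrent setting this would be the publish step */
	o->spare = old;		/* must not be reused until old readers are done */
}
```

      Because no memory is allocated in owner_remove_leg, the only failure
      path is in owner_create, where returning an error is still possible.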
      
      Fixes: 430a0491 ("nexthop: Add support for nexthop groups")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      90f33bff
    • David Ahern's avatar
      nexthops: Move code from remove_nexthop_from_groups to remove_nh_grp_entry · ac21753a
      David Ahern authored
      Move nh_grp dereference and check for removing nexthop group due to
      all members gone into remove_nh_grp_entry.
      
      Fixes: 430a0491 ("nexthop: Add support for nexthop groups")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ac21753a
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 963bdfc7
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Set VLAN tag in tcp reset/icmp unreachable packets to reject
         connections in the bridge family, from Michael Braun.
      
      2) Incorrect subcounter flag update in ipset, from Phil Sutter.
      
      3) Possible buffer overflow in the pptp conntrack helper, based
         on patch from Dan Carpenter.
      
      4) Restore userspace conntrack helper hook logic that broke after
         hook consolidation rework.
      
      5) Unbreak userspace conntrack helper registration via
         nfnetlink_cthelper.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      963bdfc7
    • David S. Miller's avatar
      Merge tag 'mac80211-for-net-2020-05-25' of... · 1a6da4fc
      David S. Miller authored
      Merge tag 'mac80211-for-net-2020-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      A few changes:
       * fix a debugfs vs. wiphy rename crash
       * fix an invalid HE spec definition
       * fix a mesh timer crash
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1a6da4fc
    • Qiushi Wu's avatar
      qlcnic: fix missing release in qlcnic_83xx_interrupt_test. · 15c97385
      Qiushi Wu authored
      In qlcnic_83xx_interrupt_test(), the resources set up by
      qlcnic_83xx_diag_alloc_res() are not released by
      qlcnic_83xx_diag_free_res() when the subsequent call to
      qlcnic_alloc_mbx_args() fails. Fix this issue by adding
      a jump target "fail_mbx_args", and jump to this new target
      when qlcnic_alloc_mbx_args() fails.
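
      The shape of the fix, a cleanup label reached from the failing
      allocation, can be sketched like this. Only the label name comes from
      the message above; diag_alloc_res, diag_free_res, interrupt_test and
      the malloc'd buffer are hypothetical stand-ins for the driver
      functions and the mailbox arguments.

```c
#include <stdlib.h>

/* Counts live "diag resources" so a leak is observable in this sketch. */
static int diag_res_live;

static int diag_alloc_res(void)
{
	diag_res_live++;
	return 0;
}

static void diag_free_res(void)
{
	diag_res_live--;
}

/* mbx_alloc_fails simulates the second allocation failing. */
static int interrupt_test(int mbx_alloc_fails)
{
	void *mbx_args;
	int ret;

	ret = diag_alloc_res();
	if (ret)
		return ret;

	mbx_args = mbx_alloc_fails ? NULL : malloc(16);
	if (!mbx_args) {
		ret = -1;
		goto fail_mbx_args;	/* still release the diag resources */
	}

	/* ... the actual test would run here ... */
	ret = 0;
	free(mbx_args);

fail_mbx_args:
	diag_free_res();
	return ret;
}
```

      Returning directly from the failed allocation, instead of jumping to
      the label, would leave diag_res_live at 1, which is the leak being
      fixed.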
      
      Fixes: b6b4316c ("qlcnic: Handle qlcnic_alloc_mbx_args() failure")
      Signed-off-by: Qiushi Wu <wu000273@umn.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      15c97385
    • Vladimir Oltean's avatar
      dpaa_eth: fix usage as DSA master, try 3 · 5d14c304
      Vladimir Oltean authored
      The dpaa-eth driver probes on compatible string for the MAC node, and
      the fman/mac.c driver allocates a dpaa-ethernet platform device that
      triggers the probing of the dpaa-eth net device driver.
      
      All of this is fine, but the problem is that the struct device of the
      dpaa_eth net_device is 2 parents away from the MAC which can be
      referenced via of_node. So of_find_net_device_by_node can't find it, and
      DSA switches won't be able to probe on top of FMan ports.
      
      It would be a bit silly to modify a core function
      (of_find_net_device_by_node) to look for dev->parent->parent->of_node
      just for one driver. We're just 1 step away from implementing full
      recursion.
      
      Actually there have already been at least 2 previous attempts to make
      this work:
      - Commit a1a50c8e ("fsl/man: Inherit parent device and of_node")
      - One or more of the patches in "[v3,0/6] adapt DPAA drivers for DSA":
        https://patchwork.ozlabs.org/project/netdev/cover/1508178970-28945-1-git-send-email-madalin.bucur@nxp.com/
        (I couldn't really figure out which one was supposed to solve the
        problem and how).
      
      Point being, it looks like this is still pretty much a problem today.
      On T1040, the /sys/class/net/eth0 symlink currently points to
      
      ../../devices/platform/ffe000000.soc/ffe400000.fman/ffe4e6000.ethernet/dpaa-ethernet.0/net/eth0
      
      which pretty much illustrates the problem. The closest of_node we've got
      is the "fsl,fman-memac" at /soc@ffe000000/fman@400000/ethernet@e6000,
      which is what we'd like to be able to reference from DSA as host port.
      
      For of_find_net_device_by_node to find the eth0 port, we would need the
      parent of the eth0 net_device to not be the "dpaa-ethernet" platform
      device, but to point 1 level higher, aka the "fsl,fman-memac" node
      directly. The new sysfs path would look like this:
      
      ../../devices/platform/ffe000000.soc/ffe400000.fman/ffe4e6000.ethernet/net/eth0
      
      And this is exactly what SET_NETDEV_DEV does. It sets the parent of the
      net_device. The new parent has an of_node associated with it, and
      of_dev_node_match already checks for the of_node of the device or of its
      parent.
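
      Why reparenting is enough can be shown with a toy model of the device
      tree. Everything here is an assumption made for illustration: struct
      node, struct device and dev_node_match are simplified stand-ins, and
      the match rule is just the one stated above (the device's own of_node
      or its direct parent's).

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct node {
	const char *path;
};

struct device {
	struct device *parent;
	struct node   *of_node;
};

/* Match on the device's own node or on its direct parent's node only. */
static int dev_node_match(const struct device *dev, const struct node *np)
{
	if (dev->of_node == np)
		return 1;
	return dev->parent && dev->parent->of_node == np;
}
```

      With eth0 parented to a "dpaa-ethernet" device that has no of_node,
      the MAC node is two levels up and the match fails; once eth0's parent
      is the MAC device itself, the same check succeeds without any
      recursion.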
      
      Fixes: a1a50c8e ("fsl/man: Inherit parent device and of_node")
      Fixes: c6e26ea8 ("dpaa_eth: change device used")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5d14c304
    • Vinay Kumar Yadav's avatar
      net/tls: fix race condition causing kernel panic · 0cada332
      Vinay Kumar Yadav authored
      tls_sw_recvmsg() and tls_decrypt_done() can be run concurrently.
      // tls_sw_recvmsg()
      	if (atomic_read(&ctx->decrypt_pending))
      		crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
      	else
      		reinit_completion(&ctx->async_wait.completion);
      
      //tls_decrypt_done()
        	pending = atomic_dec_return(&ctx->decrypt_pending);
      
        	if (!pending && READ_ONCE(ctx->async_notify))
        		complete(&ctx->async_wait.completion);
      
      Consider the scenario where tls_decrypt_done() is about to run complete()
      
      	if (!pending && READ_ONCE(ctx->async_notify))
      
      and tls_sw_recvmsg() reads decrypt_pending == 0, does reinit_completion(),
      then tls_decrypt_done() runs complete(). This sequence of execution
      results in a wrong completion. Consequently, the next decrypt request
      will not wait for completion, and when the connection is eventually
      closed and the crypto resources are freed, there is no way to handle
      the pending decrypt response.
      
      This race condition can be avoided by making atomic_read() mutually
      exclusive with atomic_dec_return() and complete(). Introduce a spin lock
      to ensure the mutual exclusion.
      
      A similar problem in the tx direction is addressed as well.
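
      The mutual exclusion described above can be sketched in userspace C.
      This is not the patch: struct decrypt_ctx, ctx_lock, decrypt_done and
      recv_must_wait are hypothetical names, a C11 atomic_flag stands in for
      the kernel spin lock, and a counter stands in for the completion.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct decrypt_ctx {
	atomic_flag lock;	/* minimal spin lock for this sketch */
	int  pending;		/* outstanding async decrypt requests */
	bool notify;		/* the receiver wants to be woken by the last one */
	int  completions;	/* counts "complete()" calls for illustration */
};

static void ctx_init(struct decrypt_ctx *c, int pending)
{
	atomic_flag_clear(&c->lock);
	c->pending = pending;
	c->notify = false;
	c->completions = 0;
}

static void ctx_lock(struct decrypt_ctx *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		;	/* spin until the holder releases the flag */
}

static void ctx_unlock(struct decrypt_ctx *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Callback side: the decrement and the completion form one critical section. */
static void decrypt_done(struct decrypt_ctx *c)
{
	ctx_lock(c);
	if (--c->pending == 0 && c->notify)
		c->completions++;
	ctx_unlock(c);
}

/* Receiver side: the pending count is read under the same lock. */
static bool recv_must_wait(struct decrypt_ctx *c)
{
	bool must_wait;

	ctx_lock(c);
	c->notify = true;
	must_wait = c->pending > 0;
	ctx_unlock(c);
	return must_wait;
}
```

      Since both sides take the same lock, the receiver can no longer
      observe a zero count in the window between the callback's decrement
      and its completion.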
      
      v1->v2:
      - More readable commit message.
      - Corrected the lock to fix new race scenario.
      - Removed barrier which is not needed now.
      
      Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
      Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0cada332
  5. 25 May, 2020 4 commits