  1. 15 Dec, 2021 5 commits
    • Merge branch 'mptcp-fixes-for-ulp-a-deadlock-and-netlink-docs' · 500f3720
      Jakub Kicinski authored
      Mat Martineau says:
      
      ====================
      mptcp: Fixes for ULP, a deadlock, and netlink docs
      
      Two of the MPTCP fixes in this set are related to the TCP_ULP socket
      option with MPTCP sockets operating in "fallback" mode (the connection
      has reverted to regular TCP). The other issues are an observed deadlock
      and missing parameter documentation in the MPTCP netlink API.
      
      Patch 1 marks TCP_ULP as unsupported earlier in MPTCP setsockopt code,
      so the fallback code path in the MPTCP layer does not pass the TCP_ULP
      option down to the subflow TCP socket.
      
      Patch 2 makes sure a TCP fallback socket returned to userspace by
      accept()ing on an MPTCP listening socket does not allow use of the
      "mptcp" TCP_ULP type. That ULP is intended only for use by in-kernel
      MPTCP subflows.
      
      Patch 3 fixes the possible deadlock when sending data and there are
      socket option changes to sync to the subflows.
      
      Patch 4 makes sure all MPTCP netlink event parameters are documented
      in the MPTCP uapi header.
      ====================
      
      Link: https://lore.kernel.org/r/20211214231604.211016-1-mathew.j.martineau@linux.intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: add missing documented NL params · 6813b192
      Matthieu Baerts authored
      'loc_id' and 'rem_id' are set in all events linked to subflows but those
      were missing in the events description in the comments.
      
      Fixes: b911c97c ("mptcp: add netlink event support")
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: fix deadlock in __mptcp_push_pending() · 3d79e375
      Maxim Galaganov authored
      __mptcp_push_pending() may call mptcp_flush_join_list() with the subflow
      socket lock held. If such a call reaches mptcp_sockopt_sync_all(), then
      __mptcp_sockopt_sync() could subsequently try to lock the subflow
      socket itself, causing a deadlock.
      
      sysrq: Show Blocked State
      task:ss-server       state:D stack:    0 pid:  938 ppid:     1 flags:0x00000000
      Call Trace:
       <TASK>
       __schedule+0x2d6/0x10c0
       ? __mod_memcg_state+0x4d/0x70
       ? csum_partial+0xd/0x20
       ? _raw_spin_lock_irqsave+0x26/0x50
       schedule+0x4e/0xc0
       __lock_sock+0x69/0x90
       ? do_wait_intr_irq+0xa0/0xa0
       __lock_sock_fast+0x35/0x50
       mptcp_sockopt_sync_all+0x38/0xc0
       __mptcp_push_pending+0x105/0x200
       mptcp_sendmsg+0x466/0x490
       sock_sendmsg+0x57/0x60
       __sys_sendto+0xf0/0x160
       ? do_wait_intr_irq+0xa0/0xa0
       ? fpregs_restore_userregs+0x12/0xd0
       __x64_sys_sendto+0x20/0x30
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f9ba546c2d0
      RSP: 002b:00007ffdc3b762d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00007f9ba56c8060 RCX: 00007f9ba546c2d0
      RDX: 000000000000077a RSI: 0000000000e5e180 RDI: 0000000000000234
      RBP: 0000000000cc57f0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9ba56c8060
      R13: 0000000000b6ba60 R14: 0000000000cc7840 R15: 41d8685b1d7901b8
       </TASK>
      
      Fix the issue by using __mptcp_flush_join_list() instead of plain
      mptcp_flush_join_list() inside __mptcp_push_pending(), as suggested by
      Florian. The sockopt sync will be deferred to the workqueue.
      
      Fixes: 1b3e7ede ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/244
      Suggested-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Maxim Galaganov <max@internet.ru>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
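
The deadlock in commit 3d79e375 above has the shape of a caller re-acquiring a lock it already holds. The following is a minimal single-threaded sketch of that shape, not kernel code: `lock_model`, `flush_and_sync_model`, `flush_only_model` and `push_pending_model` are hypothetical names, and the model lock reports `-EDEADLK` where a real sleeping lock would block forever.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of a non-recursive lock: re-acquisition is reported, not blocked on. */
struct lock_model { bool held; };

static int lock_model_acquire(struct lock_model *l)
{
    if (l->held)
        return -EDEADLK; /* a real sleeping lock would hang here */
    l->held = true;
    return 0;
}

static void lock_model_release(struct lock_model *l)
{
    l->held = false;
}

struct subflow_model {
    struct lock_model lock;
    bool sync_deferred; /* set when the sync is left for a later context */
};

/* Hypothetical "plain" flush: syncs options itself, so it takes the lock. */
static int flush_and_sync_model(struct subflow_model *sf)
{
    int err = lock_model_acquire(&sf->lock);

    if (err)
        return err;
    /* the option sync would run here */
    lock_model_release(&sf->lock);
    return 0;
}

/* Hypothetical lock-free variant: only notes that a sync is still needed. */
static int flush_only_model(struct subflow_model *sf)
{
    sf->sync_deferred = true;
    return 0;
}

/* Models the push path: the subflow lock is held across the flush call. */
static int push_pending_model(struct subflow_model *sf, bool defer_sync)
{
    int err;

    assert(sf != NULL);
    err = lock_model_acquire(&sf->lock);
    if (err)
        return err;
    err = defer_sync ? flush_only_model(sf) : flush_and_sync_model(sf);
    lock_model_release(&sf->lock);
    return err;
}
```

Calling `push_pending_model(&sf, false)` reports the self-deadlock, while `push_pending_model(&sf, true)` succeeds and leaves `sync_deferred` set, which loosely mirrors deferring the sockopt sync to the workqueue.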
    • mptcp: clear 'kern' flag from fallback sockets · d6692b3b
      Florian Westphal authored
      The mptcp ULP extension relies on sk->sk_kern_sock being set correctly:
      it prevents setsockopt(fd, IPPROTO_TCP, TCP_ULP, "mptcp", 6); from
      working for plain tcp sockets (any userspace-exposed socket).
      
      But in case of fallback, accept() can return a plain tcp sk.
      In such a case, sk is still tagged as 'kernel' and setsockopt will work.
      
      This will crash the kernel: the subflow extension has a NULL ctx->conn
      mptcp socket:
      
      BUG: KASAN: null-ptr-deref in subflow_data_ready+0x181/0x2b0
      Call Trace:
       tcp_data_ready+0xf8/0x370
       [..]
      
      Fixes: cf7da0d6 ("mptcp: Create SUBFLOW socket for incoming connections")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
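
Commit d6692b3b above quotes the call `setsockopt(fd, IPPROTO_TCP, TCP_ULP, "mptcp", 6)`. A minimal userspace sketch of issuing that call on an ordinary TCP socket follows; the helper name `try_mptcp_ulp` is hypothetical, and the fallback definition of `TCP_ULP` assumes the Linux uapi value. On a socket created directly by userspace the request is expected to be refused, though the exact errno depends on the kernel and its configuration, so the sketch only reports it.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef TCP_ULP
#define TCP_ULP 31 /* Linux uapi value, assumed when libc headers lack it */
#endif

/*
 * Hypothetical helper: returns 0 if the kernel accepted the "mptcp" ULP on a
 * plain userspace TCP socket, or a negative errno value if it refused.
 */
static int try_mptcp_ulp(void)
{
    static const char ulp[] = "mptcp";
    int fd, ret = 0;

    fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (fd < 0)
        return -errno;

    /* sizeof(ulp) is 6: five characters plus the terminating NUL */
    if (setsockopt(fd, IPPROTO_TCP, TCP_ULP, ulp, sizeof(ulp)) < 0)
        ret = -errno;

    close(fd);
    return ret;
}
```

This does not reproduce the bug, which needed a fallback socket returned by accept() on an MPTCP listener; it only shows the userspace side of the request that the 'kern' flag is meant to block.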
    • mptcp: remove tcp ulp setsockopt support · 404cd9a2
      Florian Westphal authored
      TCP_ULP setsockopt cannot be used for mptcp because it is already
      used internally to plumb subflow (tcp) sockets to the mptcp layer.
      
      syzbot managed to trigger a crash for mptcp connections that are
      in fallback mode:
      
      KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
      CPU: 1 PID: 1083 Comm: syz-executor.3 Not tainted 5.16.0-rc2-syzkaller #0
      RIP: 0010:tls_build_proto net/tls/tls_main.c:776 [inline]
      [..]
       __tcp_set_ulp net/ipv4/tcp_ulp.c:139 [inline]
       tcp_set_ulp+0x428/0x4c0 net/ipv4/tcp_ulp.c:160
       do_tcp_setsockopt+0x455/0x37c0 net/ipv4/tcp.c:3391
       mptcp_setsockopt+0x1b47/0x2400 net/mptcp/sockopt.c:638
      
      Remove support for TCP_ULP setsockopt.
      
      Fixes: d9e4c129 ("mptcp: only admit explicitly supported sockopt")
      Reported-by: syzbot+1fd9b69cde42967d1add@syzkaller.appspotmail.com
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
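
The idea behind commit 404cd9a2 above, rejecting TCP_ULP before the fallback path can hand it to the subflow TCP socket, can be sketched as an admission check that runs first. This is an illustrative model only: `tcp_sockopt_admitted` and `setsockopt_model` are hypothetical names, `TCP_NODELAY` is an arbitrary example entry rather than the kernel's actual list, and the `-EOPNOTSUPP` return value is an assumption of the model.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdbool.h>

#ifndef TCP_ULP
#define TCP_ULP 31 /* Linux uapi value, assumed when libc headers lack it */
#endif

enum sockopt_path { HANDLED_BY_MPTCP = 0, PASSED_TO_SUBFLOW = 1 };

/* Hypothetical allowlist: only explicitly admitted options get through. */
static bool tcp_sockopt_admitted(int optname)
{
    switch (optname) {
    case TCP_NODELAY: /* illustrative entry, not the kernel's list */
        return true;
    case TCP_ULP:     /* reserved for in-kernel subflow plumbing */
    default:
        return false;
    }
}

/* The admission check runs before the fallback decision is even looked at. */
static int setsockopt_model(int optname, bool fallback)
{
    if (!tcp_sockopt_admitted(optname))
        return -EOPNOTSUPP;
    return fallback ? PASSED_TO_SUBFLOW : HANDLED_BY_MPTCP;
}
```

Because the check comes first, a fallback connection can no longer forward TCP_ULP to the plain TCP setsockopt path shown in the syzbot trace.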
  2. 14 Dec, 2021 19 commits
  3. 13 Dec, 2021 10 commits
    • iavf: do not override the adapter state in the watchdog task (again) · fe523d7c
      Stefan Assmann authored
      The watchdog task incorrectly changes the state to __IAVF_RESETTING,
      instead of letting the reset task take care of that. This was already
      resolved by commit 22c8fd71 ("iavf: do not override the adapter
      state in the watchdog task") but the problem was reintroduced by the
      recent code refactoring in commit 45eebd62 ("iavf: Refactor iavf
      state machine tracking").
      
      Fixes: 45eebd62 ("iavf: Refactor iavf state machine tracking")
      Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • iavf: missing unlocks in iavf_watchdog_task() · bc2f39a6
      Dan Carpenter authored
      This code was reorganized and some unlocks are now missing.
      
      Fixes: 898ef1cb ("iavf: Combine init and watchdog state machines")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • net: stmmac: Add GFP_DMA32 for rx buffers if no 64 capability · 884d2b84
      David Wu authored
      Use page_pool_alloc_pages() instead of page_pool_dev_alloc_pages() so
      that a gfp parameter can be passed. When 64-bit addressing is not
      supported, allocating rx buffers from 32-bit addressable memory
      (GFP_DMA32) avoids an extra copy through swiotlb.
      Signed-off-by: David Wu <david.wu@rock-chips.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: add a note about refcounting · d33dae51
      Russell King (Oracle) authored
      Recently, a patch has been submitted to "fix" the refcounting for a DT
      node in of_mdiobus_link_mdiodev(). This is not a leaked refcount. The
      refcount is passed to the new device.
      
      Sadly, coccicheck identifies this location as a leaked refcount, which
      means we're likely to keep getting patches to "fix" this. However,
      fixing this will cause breakage. Add a comment to state that the lack
      of of_node_put() here is intentional.
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
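
The point of commit d33dae51 above, that a reference can be handed over to a new owner instead of being dropped, can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel's device tree code: `node_model`, `device_model`, `node_get`, `node_put`, `link_device` and `release_device` are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct node_model { int refcount; };
struct device_model { struct node_model *of_node; };

/* Take one reference and return the node for convenient chaining. */
static struct node_model *node_get(struct node_model *n)
{
    if (n)
        n->refcount++;
    return n;
}

/* Drop one reference. */
static void node_put(struct node_model *n)
{
    if (n) {
        assert(n->refcount > 0);
        n->refcount--;
    }
}

/*
 * The caller passes in a node it holds a reference on. That reference is
 * handed to the device, so there is intentionally no node_put() here.
 */
static void link_device(struct device_model *dev, struct node_model *found)
{
    dev->of_node = found; /* ownership of one reference moves to dev */
}

/* The device drops the reference it was given when it goes away. */
static void release_device(struct device_model *dev)
{
    node_put(dev->of_node);
    dev->of_node = NULL;
}
```

In this model, adding a `node_put()` inside `link_device()` would leave the device pointing at a node it no longer holds a reference on, which is the kind of breakage the commit message warns about.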
    • net: ethernet: ti: add missing of_node_put before return · be565ec7
      Wang Qing authored
      Fix the following coccicheck warning:
      WARNING: Function "for_each_child_of_node"
      should have of_node_put() before return.
      
      Early exits from for_each_child_of_node should decrement the
      node reference counter.
      Signed-off-by: Wang Qing <wangqing@vivo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
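
The rule stated in commit be565ec7 above, that an early exit from a child iterator must drop the reference held on the current child, can be sketched with a userspace model of such an iterator. The names `next_child`, `for_each_child_model` and `find_child` are hypothetical; the model assumes the iterator takes a reference on each child it returns and drops the reference on the previous one.

```c
#include <stddef.h>

#define MAX_CHILDREN 4

struct node_model { int refcount; int id; };

struct parent_model {
    struct node_model children[MAX_CHILDREN];
    size_t count;
};

/* Returns the child after prev with a reference taken, and puts prev. */
static struct node_model *next_child(struct parent_model *p,
                                     struct node_model *prev)
{
    size_t idx = prev ? (size_t)(prev - p->children) + 1 : 0;

    if (prev)
        prev->refcount--;
    if (idx >= p->count)
        return NULL;
    p->children[idx].refcount++;
    return &p->children[idx];
}

#define for_each_child_model(p, c) \
    for ((c) = next_child((p), NULL); (c) != NULL; (c) = next_child((p), (c)))

/* Returns 0 if a child with the given id exists, -1 otherwise. */
static int find_child(struct parent_model *p, int id, int put_on_exit)
{
    struct node_model *c;

    for_each_child_model(p, c) {
        if (c->id == id) {
            if (put_on_exit)
                c->refcount--; /* the put the warning asks for */
            return 0;          /* early exit: the iterator never puts c */
        }
    }
    return -1; /* loop ran to completion: every reference was dropped */
}
```

With `put_on_exit` set to 0 the matching child is left with a stray reference after the early return. With it set to 1 all counts return to zero.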
    • selftest/net/forwarding: declare NETIFS p9 p10 · 71da1aec
      Hangbin Liu authored
      The recent GRE selftests defined NUM_NETIFS=10. If users copy
      forwarding.config.sample to forwarding.config directly, they will get
      the error "Command line is not complete" when running the GRE tests, because
      create_netif_veth() fails when no interface name is defined.
      
      Fix it by extending the NETIFS with p9 and p10.
      
      Fixes: 2800f248 ("selftests: forwarding: Test multipath hashing on inner IP pkts for GRE tunnel")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: Unforce speed & duplex in mac_link_down() · 9d591fc0
      Marek Behún authored
      Commit 64d47d50 ("net: dsa: mv88e6xxx: configure interface settings
      in mac_config") removed forcing of speed and duplex from
      mv88e6xxx_mac_config(), where the link is forced down, and left it only
      in mv88e6xxx_mac_link_up(), by which time the link is unforced.
      
      It seems that (at least on 88E6190) when changing cmode to 2500base-x,
      if the link is not forced down, but the speed or duplex are still
      forced, the forcing of new settings for speed & duplex doesn't take
      effect in mv88e6xxx_mac_link_up().
      
      Fix this by unforcing speed & duplex in mv88e6xxx_mac_link_down().
      
      Fixes: 64d47d50 ("net: dsa: mv88e6xxx: configure interface settings in mac_config")
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests/net: toeplitz: fix udp option · a8d13611
      Willem de Bruijn authored
      Tiny fix. Option -u ("use udp") does not take an argument.
      
      It can cause the next argument to silently be ignored.
      
      Fixes: 5ebfb4cc ("selftests/net: toeplitz test")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bcmgenet: Fix NULL vs IS_ERR() checking · ab8eb798
      Miaoqian Lin authored
      The phy_attach() function does not return NULL. It returns error pointers.
      Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
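
Why a NULL check misses the failure in commit ab8eb798 above can be shown with a userspace re-creation of the error-pointer convention, in which a small negative errno is encoded in the pointer value itself. The `_model` helpers below imitate the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), but they are written for this sketch, and `attach_model` is a hypothetical stand-in for a function that never returns NULL.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO_MODEL 4095

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr_model(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long ptr_err_model(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top MAX_ERRNO_MODEL addresses. */
static inline bool is_err_model(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_MODEL;
}

/* Hypothetical attach function: fails with an error pointer, never NULL. */
static void *attach_model(bool ok)
{
    static int dev;

    return ok ? (void *)&dev : err_ptr_model(-ENODEV);
}
```

A caller that writes `if (!attach_model(false))` never takes the error branch, because the returned pointer is non-NULL. Only the `is_err_model()` test detects the failure.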
    • net/sched: sch_ets: don't remove idle classes from the round-robin list · c062f2a0
      Davide Caratti authored
      Shuang reported that the following script:
      
       1) tc qdisc add dev ddd0 handle 10: parent 1: ets bands 8 strict 4 priomap 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
       2) mausezahn ddd0  -A 10.10.10.1 -B 10.10.10.2 -c 0 -a own -b 00:c1:a0:c1:a0:00 -t udp &
       3) tc qdisc change dev ddd0 handle 10: ets bands 4 strict 2 quanta 2500 2500 priomap 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
      
      crashes systematically when line 2) is commented out:
      
       list_del corruption, ffff8e028404bd30->next is LIST_POISON1 (dead000000000100)
       ------------[ cut here ]------------
       kernel BUG at lib/list_debug.c:47!
       invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
       CPU: 0 PID: 954 Comm: tc Not tainted 5.16.0-rc4+ #478
       Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
       RIP: 0010:__list_del_entry_valid.cold.1+0x12/0x47
       Code: fe ff 0f 0b 48 89 c1 4c 89 c6 48 c7 c7 08 42 1b 87 e8 1d c5 fe ff 0f 0b 48 89 fe 48 89 c2 48 c7 c7 98 42 1b 87 e8 09 c5 fe ff <0f> 0b 48 c7 c7 48 43 1b 87 e8 fb c4 fe ff 0f 0b 48 89 f2 48 89 fe
       RSP: 0018:ffffae46807a3888 EFLAGS: 00010246
       RAX: 000000000000004e RBX: 0000000000000007 RCX: 0000000000000202
       RDX: 0000000000000000 RSI: ffffffff871ac536 RDI: 00000000ffffffff
       RBP: ffffae46807a3a10 R08: 0000000000000000 R09: c0000000ffff7fff
       R10: 0000000000000001 R11: ffffae46807a36a8 R12: ffff8e028404b800
       R13: ffff8e028404bd30 R14: dead000000000100 R15: ffff8e02fafa2400
       FS:  00007efdc92e4480(0000) GS:ffff8e02fb600000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000682f48 CR3: 00000001058be000 CR4: 0000000000350ef0
       Call Trace:
        <TASK>
        ets_qdisc_change+0x58b/0xa70 [sch_ets]
        tc_modify_qdisc+0x323/0x880
        rtnetlink_rcv_msg+0x169/0x4a0
        netlink_rcv_skb+0x50/0x100
        netlink_unicast+0x1a5/0x280
        netlink_sendmsg+0x257/0x4d0
        sock_sendmsg+0x5b/0x60
        ____sys_sendmsg+0x1f2/0x260
        ___sys_sendmsg+0x7c/0xc0
        __sys_sendmsg+0x57/0xa0
        do_syscall_64+0x3a/0x80
        entry_SYSCALL_64_after_hwframe+0x44/0xae
       RIP: 0033:0x7efdc8031338
       Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 43 2c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 41 89 d4 55
       RSP: 002b:00007ffdf1ce9828 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 0000000061b37a97 RCX: 00007efdc8031338
       RDX: 0000000000000000 RSI: 00007ffdf1ce9890 RDI: 0000000000000003
       RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000078a940
       R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001
       R13: 0000000000688880 R14: 0000000000000000 R15: 0000000000000000
        </TASK>
       Modules linked in: sch_ets sch_tbf dummy rfkill iTCO_wdt iTCO_vendor_support intel_rapl_msr intel_rapl_common joydev pcspkr i2c_i801 virtio_balloon i2c_smbus lpc_ich ip_tables xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel serio_raw ghash_clmulni_intel ahci libahci libata virtio_blk virtio_console virtio_net net_failover failover sunrpc dm_mirror dm_region_hash dm_log dm_mod [last unloaded: sch_ets]
       ---[ end trace f35878d1912655c2 ]---
       RIP: 0010:__list_del_entry_valid.cold.1+0x12/0x47
       Code: fe ff 0f 0b 48 89 c1 4c 89 c6 48 c7 c7 08 42 1b 87 e8 1d c5 fe ff 0f 0b 48 89 fe 48 89 c2 48 c7 c7 98 42 1b 87 e8 09 c5 fe ff <0f> 0b 48 c7 c7 48 43 1b 87 e8 fb c4 fe ff 0f 0b 48 89 f2 48 89 fe
       RSP: 0018:ffffae46807a3888 EFLAGS: 00010246
       RAX: 000000000000004e RBX: 0000000000000007 RCX: 0000000000000202
       RDX: 0000000000000000 RSI: ffffffff871ac536 RDI: 00000000ffffffff
       RBP: ffffae46807a3a10 R08: 0000000000000000 R09: c0000000ffff7fff
       R10: 0000000000000001 R11: ffffae46807a36a8 R12: ffff8e028404b800
       R13: ffff8e028404bd30 R14: dead000000000100 R15: ffff8e02fafa2400
       FS:  00007efdc92e4480(0000) GS:ffff8e02fb600000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000682f48 CR3: 00000001058be000 CR4: 0000000000350ef0
       Kernel panic - not syncing: Fatal exception in interrupt
       Kernel Offset: 0x4e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
       ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
      
      We can remove 'q->classes[i].alist' only if DRR class 'i' was part of the
      active list. In the ETS scheduler DRR classes belong to that list only if
      the queue length is greater than zero: we need to test for non-zero value
      of 'q->classes[i].qdisc->q.qlen' before removing from the list, similarly
      to what has been done elsewhere in the ETS code.
      
      Fixes: de6d2592 ("net/sched: sch_ets: don't peek at classes beyond 'nbands'")
      Reported-by: Shuang Li <shuali@redhat.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
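
The guard described in commit c062f2a0 above, unlinking a class only if its queue length shows it is on the active list, can be sketched with a small userspace list model. This is not the sch_ets code: `list_model`, `class_model` and `deactivate_class` are hypothetical, and a NULL link stands in for the kernel's list poison values, so an invalid removal returns false instead of hitting a BUG.

```c
#include <stdbool.h>
#include <stddef.h>

struct list_model { struct list_model *next, *prev; };

static void list_init_model(struct list_model *h)
{
    h->next = h->prev = h;
}

static void list_add_tail_model(struct list_model *n, struct list_model *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Returns false where a debug kernel would report "list_del corruption". */
static bool list_del_model(struct list_model *n)
{
    if (n->next == NULL || n->prev == NULL)
        return false; /* entry is not linked anywhere */
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = NULL; /* NULL plays the role of list poison */
    return true;
}

struct class_model {
    struct list_model alist;
    unsigned int qlen; /* the class is on the active list only if qlen > 0 */
};

/* Guarded removal: idle classes were never linked, so leave them alone. */
static bool deactivate_class(struct class_model *cl)
{
    if (cl->qlen == 0)
        return true;
    return list_del_model(&cl->alist);
}
```

Calling `list_del_model()` directly on an idle class fails in this model, which corresponds to the crash in the trace. `deactivate_class()` skips the removal for idle classes and unlinks only the backlogged ones.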
  4. 12 Dec, 2021 6 commits