1. 06 Jul, 2021 3 commits
    • bonding: fix suspicious RCU usage in bond_ipsec_add_sa() · b648eba4
      Taehee Yoo authored
      bond_ipsec_add_sa() uses rcu_dereference() to read bond->curr_active_slave,
      but neither it nor its caller holds the RCU read lock, so a warning occurs.
      Fix this by adding rcu_read_lock().
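
      The shape of the fix can be sketched in userspace. This is only a model:
      every model_* name below is hypothetical and stands in for the kernel's
      RCU primitives, and the "splat" counter imitates the lockdep warning
      rather than reproducing it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; this is NOT the kernel RCU API. */
static int read_lock_depth;   /* nesting depth of read-side critical sections */
static int splat_count;       /* number of "suspicious usage" warnings raised */

static void model_rcu_read_lock(void)   { read_lock_depth++; }
static void model_rcu_read_unlock(void) { read_lock_depth--; }

struct model_slave { int id; };
struct model_bond  { struct model_slave *curr_active_slave; };

/* Imitates a checked dereference: warn when no read-side section is held. */
static struct model_slave *model_rcu_dereference(struct model_slave **pp)
{
    if (read_lock_depth == 0)
        splat_count++;
    return *pp;
}

/* Buggy shape: dereference without entering a read-side section. */
static int add_sa_unlocked(struct model_bond *bond)
{
    struct model_slave *slave = model_rcu_dereference(&bond->curr_active_slave);

    return slave ? slave->id : -1;
}

/* Fixed shape: the dereference and every use of the pointer sit inside
 * the read-side section. */
static int add_sa_locked(struct model_bond *bond)
{
    struct model_slave *slave;
    int id;

    model_rcu_read_lock();
    slave = model_rcu_dereference(&bond->curr_active_slave);
    id = slave ? slave->id : -1;
    model_rcu_read_unlock();
    return id;
}
```

      In this model only the unlocked variant bumps the warning counter; both
      return the same slave id.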
      
      Test commands:
          ip link add dummy0 type dummy
          ip link add bond0 type bond
          ip link set dummy0 master bond0
          ip link set dummy0 up
          ip link set bond0 up
          ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 \
      	    mode transport \
      	    reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
      	    0x44434241343332312423222114131211f4f3f2f1 128 sel \
      	    src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp offload \
      	    dev bond0 dir in
      
      Splat looks like:
      =============================
      WARNING: suspicious RCU usage
      5.13.0-rc3+ #1168 Not tainted
      -----------------------------
      drivers/net/bonding/bond_main.c:411 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      1 lock held by ip/684:
       #0: ffffffff9a2757c0 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{3:3},
      at: xfrm_netlink_rcv+0x59/0x80 [xfrm_user]
      stack backtrace:
      CPU: 0 PID: 684 Comm: ip Not tainted 5.13.0-rc3+ #1168
      Call Trace:
       dump_stack+0xa4/0xe5
       bond_ipsec_add_sa+0x18c/0x1f0 [bonding]
       xfrm_dev_state_add+0x2a9/0x770
       ? memcpy+0x38/0x60
       xfrm_add_sa+0x2278/0x3b10 [xfrm_user]
       ? xfrm_get_policy+0xaa0/0xaa0 [xfrm_user]
       ? register_lock_class+0x1750/0x1750
       xfrm_user_rcv_msg+0x331/0x660 [xfrm_user]
       ? rcu_read_lock_sched_held+0x91/0xc0
       ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
       ? find_held_lock+0x3a/0x1c0
       ? mutex_lock_io_nested+0x1210/0x1210
       ? sched_clock_cpu+0x18/0x170
       netlink_rcv_skb+0x121/0x350
       ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
       ? netlink_ack+0x9d0/0x9d0
       ? netlink_deliver_tap+0x17c/0xa50
       xfrm_netlink_rcv+0x68/0x80 [xfrm_user]
       netlink_unicast+0x41c/0x610
       ? netlink_attachskb+0x710/0x710
       netlink_sendmsg+0x6b9/0xb70
      [ ... ]
      
      Fixes: 18cb261a ("bonding: support hardware encryption offload to slaves")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: fix tcp_init_transfer() to not reset icsk_ca_initialized · be5d1b61
      Nguyen Dinh Phi authored
      This commit fixes a bug (found by syzkaller) that could cause spurious
      double-initializations for congestion control modules, which could cause
      memory leaks or other problems for congestion control modules (like CDG)
      that allocate memory in their init functions.
      
      The buggy scenario constructed by syzkaller was something like:
      
      (1) create a TCP socket
      (2) initiate a TFO connect via sendto()
      (3) while socket is in TCP_SYN_SENT, call setsockopt(TCP_CONGESTION),
          which calls:
             tcp_set_congestion_control() ->
               tcp_reinit_congestion_control() ->
                 tcp_init_congestion_control()
      (4) receive ACK, connection is established, call tcp_init_transfer(),
          set icsk_ca_initialized=0 (without first calling cc->release()),
          call tcp_init_congestion_control() again.
      
      Note that in this sequence tcp_init_congestion_control() is called
      twice without a cc->release() call in between. Thus, for CC modules
      that allocate memory in their init() function, e.g., CDG, a memory leak
      may occur. The syzkaller tool managed to find a reproducer that
      triggered such a leak in CDG.
      
      The bug was introduced by commit 8919a9b3 ("tcp: Only init
      congestion control if not initialized already"), which
      added icsk_ca_initialized and set icsk_ca_initialized to 0 in
      tcp_init_transfer(), missing the possibility for a sequence like the
      one above, where a process could call setsockopt(TCP_CONGESTION) in
      state TCP_SYN_SENT (i.e. after the connect() or TFO open sendmsg()),
      which would call tcp_init_congestion_control(). It did not intend to
      reset any initialization that the user had already explicitly made;
      it just missed the possibility of that particular sequence (which
      syzkaller managed to find).
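
      The double-initialization sequence can be modelled in userspace. The
      sketch below is an assumption-laden illustration, not the kernel code:
      the model_* names are hypothetical, and allocations are tracked with a
      counter instead of real memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: counts CC-module allocations that were never released. */
static int outstanding;
static int next_id = 1;

struct model_sock {
    bool ca_initialized;  /* stands in for icsk_ca_initialized */
    int priv_id;          /* stands in for memory allocated by init() */
};

static void model_cc_init(struct model_sock *sk)
{
    sk->priv_id = next_id++;  /* overwrites (leaks) any earlier allocation */
    outstanding++;
    sk->ca_initialized = true;
}

static void model_cc_release(struct model_sock *sk)
{
    if (sk->priv_id) {
        outstanding--;
        sk->priv_id = 0;
    }
    sk->ca_initialized = false;
}

/* Buggy shape: the flag is reset first, so init always runs again. */
static void init_transfer_buggy(struct model_sock *sk)
{
    sk->ca_initialized = false;
    if (!sk->ca_initialized)
        model_cc_init(sk);
}

/* Fixed shape: the flag is left alone, so an earlier init is respected. */
static void init_transfer_fixed(struct model_sock *sk)
{
    if (!sk->ca_initialized)
        model_cc_init(sk);
}

/* Replays steps (3) and (4), then tears the socket down once. */
static int leaked_after_scenario(bool fixed)
{
    struct model_sock sk = { false, 0 };

    outstanding = 0;
    model_cc_init(&sk);               /* step (3): setsockopt in SYN_SENT */
    if (fixed)
        init_transfer_fixed(&sk);     /* step (4) with the fix */
    else
        init_transfer_buggy(&sk);     /* step (4) before the fix */
    model_cc_release(&sk);            /* single release at teardown */
    return outstanding;
}
```

      In the model, the buggy path leaves one allocation outstanding after
      teardown, while the fixed path leaves none.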
      
      Fixes: 8919a9b3 ("tcp: Only init congestion control if not initialized already")
      Reported-by: syzbot+f1e24a0594d4e3a895d3@syzkaller.appspotmail.com
      Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Tested-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • skbuff: Release nfct refcount on napi stolen or re-used skbs · 8550ff8d
      Paul Blakey authored
      When multiple SKBs are merged into a new skb under napi GRO,
      or an SKB is re-used by napi, any nfct set on them by the
      driver is not released when their stolen head state is freed
      or when they are re-used.
      
      Release nfct on napi's stolen or re-used SKBs, and
      in gro_list_prepare, check conntrack metadata diff.
      
      Fixes: 5c6b9460 ("net/mlx5e: CT: Handle misses after executing CT action")
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 05 Jul, 2021 6 commits
  3. 03 Jul, 2021 1 commit
  4. 02 Jul, 2021 13 commits
  5. 01 Jul, 2021 17 commits
    • s390: iucv: Avoid field over-reading memcpy() · 5140aaa4
      Kees Cook authored
      In preparation for FORTIFY_SOURCE performing compile-time and run-time
      field bounds checking for memcpy(), memmove(), and memset(), avoid
      intentionally reading across neighboring array fields.
      
      Add a wrapping struct to serve as the memcpy() source so the compiler
      can perform appropriate bounds checking, avoiding this future warning:
      
      In function '__fortify_memcpy',
          inlined from 'iucv_message_pending' at net/iucv/iucv.c:1663:4:
      ./include/linux/fortify-string.h:246:4: error: call to '__read_overflow2_field' declared with attribute error: detected read beyond size of field (2nd parameter)
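
      A minimal userspace sketch of the wrapping-struct technique follows. The
      layout and the model_* names are hypothetical; the real iucv structures
      differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message layout, for illustration only. */
struct model_msg {
    uint32_t id;
    /* Wrapping struct: groups the two arrays that are copied together. */
    struct model_msg_body {
        uint8_t first[8];
        uint8_t second[4];
    } body;
    uint32_t tail;
};

/* The wrapper adds no padding here, so the in-memory layout is unchanged. */
_Static_assert(sizeof(struct model_msg_body) == 12, "body is 8 + 4 bytes");
_Static_assert(offsetof(struct model_msg, body) == 4, "body follows id");

static void model_copy_body(uint8_t dst[12], const struct model_msg *msg)
{
    /*
     * The source is a single 12-byte field, so a per-field bounds check
     * passes. Copying 12 bytes from msg->body.first instead would read
     * past an 8-byte field, which is the pattern the commit avoids.
     */
    memcpy(dst, &msg->body, sizeof(msg->body));
}
```

      The copy still moves the same 12 bytes; only the type of the source
      expression changes, which is what lets the compiler reason about its size.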
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gve: Propagate error codes to caller · 6dce38b4
      Christophe JAILLET authored
      If 'gve_probe()' fails, we should propagate the error code, instead of
      hard coding a -ENXIO value.
      Make sure that all error handling paths set a correct value for 'err'.
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: Catherine Sullivan <csully@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gve: Fix an error handling path in 'gve_probe()' · 2342ae10
      Christophe JAILLET authored
      If the 'register_netdev()' call fails, we must release the resources
      allocated by the previous 'gve_init_priv()' call, as already done in the
      remove function.
      
      Add a new label and the missing 'gve_teardown_priv_resources()' in the
      error handling path.
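
      The label-based unwinding described here can be sketched as a small
      userspace model. All model_* names and the fail_*_with test knobs are
      hypothetical; the sketch only shows the control flow, with the callee's
      error code returned unchanged.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device state for the model. */
struct model_dev {
    int priv_ready;
    int registered;
};

/* Test knobs: 0 means success, a negative errno forces a failure. */
static int fail_init_with;
static int fail_register_with;

static int model_init_priv(struct model_dev *dev)
{
    if (fail_init_with)
        return fail_init_with;
    dev->priv_ready = 1;
    return 0;
}

static void model_teardown_priv(struct model_dev *dev)
{
    dev->priv_ready = 0;
}

static int model_register(struct model_dev *dev)
{
    if (fail_register_with)
        return fail_register_with;
    dev->registered = 1;
    return 0;
}

static int model_probe(struct model_dev *dev)
{
    int err;

    err = model_init_priv(dev);
    if (err)
        goto abort;

    err = model_register(dev);
    if (err)
        goto abort_with_priv;   /* undo init_priv before returning */

    return 0;

abort_with_priv:
    model_teardown_priv(dev);
abort:
    return err;                 /* propagate the callee's error code */
}
```

      Each label undoes exactly the steps that succeeded before the failure,
      in reverse order.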
      
      Fixes: 893ce44d ("gve: Add basic driver framework for Compute Engine Virtual NIC")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: Catherine Sullivan <csully@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · aa3cf240
      David S. Miller authored
      
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2021-07-01
      
      This series contains updates to igb, igc, ixgbe, e1000e, fm10k, and iavf
      drivers.
      
      Vinicius fixes a use-after-free issue present in igc and igb.
      
      Tom Rix fixes the return value for igc_read_phy_reg() when the
      operation is not supported for igc.
      
      Christophe Jaillet fixes unrolling of PCIe error reporting for ixgbe,
      igc, igb, fm10k, e1000e, and iavf.
      
      Alex ensures that q_vector array is not accessed beyond its bounds for
      igb.
      
      Jedrzej moves ring assignment to occur after bounds have been checked in
      igb.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Terminate FPE workqueue in suspend · 6b28a86d
      Mohammad Athari Bin Ismail authored
      Add stmmac_fpe_stop_wq() in stmmac_suspend() to terminate the FPE
      workqueue during suspend, so that no FPE workqueue exists in suspend
      mode. Without this fix, an additional FPE workqueue is created
      on every suspend->resume cycle.
      
      Fixes: 5a558611 ("net: stmmac: support FPE link partner hand-shaking procedure")
      Signed-off-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sms911x-dts' · 1c88995d
      David S. Miller authored
      Geert Uytterhoeven says:
      
      ====================
      sms911x: DTS fixes and DT binding to json-schema conversion
      
      This patch series converts the Smart Mixed-Signal Connectivity (SMSC)
      LAN911x/912x Controller Device Tree binding documentation to
      json-schema, after fixing a few issues in DTS files.
      
      Changed compared to v1[1]:
        - Dropped applied patches,
        - Add Reviewed-by,
        - Drop bogus double quotes in compatible values,
        - Add comment explaining why "additionalProperties: true" is needed.
      
      [1] [PATCH 0/5] sms911x: DTS fixes and DT binding to json-schema conversion
          https://lore.kernel.org/r/cover.1621518686.git.geert+renesas@glider.be
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dt-bindings: net: sms911x: Convert to json-schema · 19373d02
      Geert Uytterhoeven authored
      Convert the Smart Mixed-Signal Connectivity (SMSC) LAN911x/912x
      Controller Device Tree binding documentation to json-schema.
      
      Document missing properties.
      Make "phy-mode" not required, as many DTS files do not have it, and the
      Linux driver falls back to PHY_INTERFACE_MODE_NA.
      Correct nodename in example.
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ARM: dts: qcom-apq8060: Correct Ethernet node name and drop bogus irq property · b6c88010
      Geert Uytterhoeven authored
      make dtbs_check:
      
          ethernet-ebi2@2,0: $nodename:0: 'ethernet-ebi2@2,0' does not match '^ethernet(@.*)?$'
          ethernet-ebi2@2,0: 'smsc,irq-active-low' does not match any of the regexes: 'pinctrl-[0-9]+'
      
      There is no "smsc,irq-active-low" property, as active low is the
      default.
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: annotate data races around unix_sk(sk)->gso_size · 18a419ba
      Eric Dumazet authored
      Accesses to udp_sk(sk)->gso_size are lockless.
      Add READ_ONCE()/WRITE_ONCE() around them.
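
      As a rough userspace approximation of this annotation pattern, the macros
      below force a single volatile access per use. They are assumptions made
      for illustration: the kernel's READ_ONCE()/WRITE_ONCE() are more
      elaborate, and the MODEL_* and model_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins: one volatile load or store per use, so the
 * compiler cannot tear, fuse, or re-read the access. */
#define MODEL_READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define MODEL_WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical socket state holding the locklessly shared field. */
struct model_udp_sock {
    uint16_t gso_size;
};

/* Writer side, modelled on the setsockopt path in the report. */
static void model_set_gso_size(struct model_udp_sock *up, uint16_t val)
{
    MODEL_WRITE_ONCE(up->gso_size, val);
}

/* Reader side, modelled on the sendmsg path: snapshot the value once
 * and use the local copy from then on. */
static uint16_t model_get_gso_size(struct model_udp_sock *up)
{
    return MODEL_READ_ONCE(up->gso_size);
}
```

      The annotations do not add locking; they mark the race as intentional
      and constrain how the compiler may emit the access.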
      
      BUG: KCSAN: data-race in udp_lib_setsockopt / udpv6_sendmsg
      
      write to 0xffff88812d78f47c of 2 bytes by task 10849 on cpu 1:
       udp_lib_setsockopt+0x3b3/0x710 net/ipv4/udp.c:2696
       udpv6_setsockopt+0x63/0x90 net/ipv6/udp.c:1630
       sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3265
       __sys_setsockopt+0x18f/0x200 net/socket.c:2104
       __do_sys_setsockopt net/socket.c:2115 [inline]
       __se_sys_setsockopt net/socket.c:2112 [inline]
       __x64_sys_setsockopt+0x62/0x70 net/socket.c:2112
       do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff88812d78f47c of 2 bytes by task 10852 on cpu 0:
       udpv6_sendmsg+0x161/0x16b0 net/ipv6/udp.c:1299
       inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:642
       sock_sendmsg_nosec net/socket.c:654 [inline]
       sock_sendmsg net/socket.c:674 [inline]
       ____sys_sendmsg+0x360/0x4d0 net/socket.c:2337
       ___sys_sendmsg net/socket.c:2391 [inline]
       __sys_sendmmsg+0x315/0x4b0 net/socket.c:2477
       __do_sys_sendmmsg net/socket.c:2506 [inline]
       __se_sys_sendmmsg net/socket.c:2503 [inline]
       __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2503
       do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x0000 -> 0x0005
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 10852 Comm: syz-executor.0 Not tainted 5.13.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: consistently disable header prediction for mptcp · 71158bb1
      Paolo Abeni authored
      The MPTCP receive path is hooked only into the TCP slow-path.
      The DSS presence allows plain MPTCP traffic to hit that
      consistently.
      
      Since commit e1ff9e82 ("net: mptcp: improve fallback to TCP"),
      when an MPTCP socket falls back to TCP, it can hit the TCP receive
      fast-path, and delay or stop triggering the event notification.
      
      Address the issue by explicitly disabling the header prediction
      for MPTCP sockets.
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/200
      Fixes: e1ff9e82 ("net: mptcp: improve fallback to TCP")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: remove the caif_hsi driver · ca75bcf0
      Christoph Hellwig authored
      The caif_hsi driver relies on a cfhsi_get_ops symbol using symbol_get,
      but this symbol is not provided anywhere in the kernel tree.  Remove
      this driver given that it is dead code.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Documentation: add more details in tipc.rst · 09ef1786
      Xin Long authored
      The kernel-doc for TIPC is too simple; it needs more information.

      This patch extends the abstract and adds the Features and Links items.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: retry reset if there are no other resets · 4f408e1f
      Sukadev Bhattiprolu authored
      Normally, if a reset fails due to failover or another communication error,
      there is another reset (e.g., FAILOVER) in the queue and we would process
      that reset. But if we are unable to communicate with PHYP or VIOS after
      H_FREE_CRQ, there would be no other resets in the queue and the adapter
      would be in an undefined state even though it was in the OPEN state
      earlier. While starting the reset we set the carrier to off state so
      we won't even get the timeout resets.
      
      If the last queued reset fails, retry it as a hard reset (after the
      usual 60 second settling time).
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Reviewed-by: Dany Madden <drt@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ptp-virtual-clocks-and-timestamping' · b2bc8148
      David S. Miller authored
      Yangbo Lu says:
      
      ====================
      ptp: support virtual clocks and timestamping
      
      The current PTP driver exposes one PTP device to the user, which binds
      one or more network interfaces to provide timestamping. However, the
      timecounter/cyclecounter infrastructure can virtualize any number of
      PTP clocks on top of the same free-running physical clock.
      Multiple PTP virtual clocks let user space use them directly and
      easily for synchronization across multiple domains.
      
      user
      space:     ^                                  ^
                 | SO_TIMESTAMPING new flag:        | Packets with
                 | SOF_TIMESTAMPING_BIND_PHC        | TX/RX HW timestamps
                 v                                  v
               +--------------------------------------------+
      sock:    |     sock (new member sk_bind_phc)          |
               +--------------------------------------------+
                 ^                                  ^
                 | ethtool_get_phc_vclocks          | Convert HW timestamps
                 |                                  | to sk_bind_phc
                 v                                  v
               +--------------+--------------+--------------+
      vclock:  | ptp1         | ptp2         | ptpN         |
               +--------------+--------------+--------------+
      pclock:  |             ptp0 free running              |
               +--------------------------------------------+
      
      The block diagram above shows how it works. Besides the PTP virtual
      clocks, the conversion of packet HW timestamps to the bound PHC is
      also done in the sock driver. From user space, PTP virtual clocks can
      be created via sysfs, and the extended SO_TIMESTAMPING API (new flag
      SOF_TIMESTAMPING_BIND_PHC) can be used to bind one PTP virtual clock
      for timestamping.
      
      The test tool timestamping.c (together with linuxptp phc_ctl tool) can
      be used to verify:
      
        # echo 4 > /sys/class/ptp/ptp0/n_vclocks
        [  129.399472] ptp ptp0: new virtual clock ptp2
        [  129.404234] ptp ptp0: new virtual clock ptp3
        [  129.409532] ptp ptp0: new virtual clock ptp4
        [  129.413942] ptp ptp0: new virtual clock ptp5
        [  129.418257] ptp ptp0: guarantee physical clock free running
        #
        # phc_ctl /dev/ptp2 set 10000
        # phc_ctl /dev/ptp3 set 20000
        #
        # timestamping eno0 2 SOF_TIMESTAMPING_TX_HARDWARE SOF_TIMESTAMPING_RAW_HARDWARE SOF_TIMESTAMPING_BIND_PHC
        # timestamping eno0 2 SOF_TIMESTAMPING_RX_HARDWARE SOF_TIMESTAMPING_RAW_HARDWARE SOF_TIMESTAMPING_BIND_PHC
        # timestamping eno0 3 SOF_TIMESTAMPING_TX_HARDWARE SOF_TIMESTAMPING_RAW_HARDWARE SOF_TIMESTAMPING_BIND_PHC
        # timestamping eno0 3 SOF_TIMESTAMPING_RX_HARDWARE SOF_TIMESTAMPING_RAW_HARDWARE SOF_TIMESTAMPING_BIND_PHC
      
      Changes for v2:
      	- Converted to num_vclocks for creating virtual clocks.
      	- Guaranteed physical clock free running when using virtual
      	  clocks.
      	- Fixed build warning.
      	- Updated copyright.
      Changes for v3:
      	- Supported PTP virtual clock by default in PTP driver.
      	- Protected concurrency of ptp->num_vclocks accessing.
      	- Supported PHC vclocks query via ethtool.
      	- Extended SO_TIMESTAMPING API for PHC binding.
      	- Converted HW timestamps to PHC bound, instead of previous
      	  binding domain value to PHC idea.
      	- Other minor fixes.
      Changes for v4:
      	- Used do_aux_work callback for vclock refreshing instead.
      	- Used unsigned int for vclocks number, and max_vclocks
      	  for limitation.
      	- Fixed mutex locking.
      	- Dynamically allocated memory for vclock index storage.
      	- Removed ethtool ioctl command for vclocks getting.
      	- Updated doc for ethtool phc vclocks get.
      	- Converted to mptcp_setsockopt_sol_socket_timestamping().
      	- Passed so_timestamping for sock_set_timestamping.
      	- Fixed checkpatch/build.
      	- Other minor fixes.
      Changes for v5:
      	- Fixed checkpatch/build/bug reported by test robot.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • MAINTAINERS: add entry for PTP virtual clock driver · 5ce15f27
      Yangbo Lu authored
      Add entry for PTP virtual clock driver.
      Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests/net: timestamping: support binding PHC · 2214d703
      Yangbo Lu authored
      Support binding PHC of PTP vclock for timestamping.
      Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: socket: support hardware timestamp conversion to PHC bound · d7c08826
      Yangbo Lu authored
      This patch adds support for converting hardware timestamps to
      the bound PHC. This applies to both RX and TX, since their skb
      handling (for TX, the skb clone in the error queue) goes
      through __sock_recv_timestamp.
      Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>