1. 19 Feb, 2022 14 commits
    • Svenning Sørensen's avatar
      net: dsa: microchip: fix bridging with more than two member ports · 3d00827a
      Svenning Sørensen authored
      Commit b3612ccd ("net: dsa: microchip: implement multi-bridge support")
      plugged a packet leak between ports that were members of different bridges.
      Unfortunately, this broke another use case, namely that of more than two
      ports that are members of the same bridge.
      
      After that commit, when a port is added to a bridge, hardware bridging
      between other member ports of that bridge will be cleared, preventing
      packet exchange between them.
      
      Fix by ensuring that the Port VLAN Membership bitmap includes any existing
      ports in the bridge, not just the port being added.
      
      Fixes: b3612ccd ("net: dsa: microchip: implement multi-bridge support")
      Signed-off-by: default avatarSvenning Sørensen <sss@secomea.com>
      Tested-by: default avatarOleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3d00827a
    • Christophe Leroy's avatar
      net: Force inlining of checksum functions in net/checksum.h · 5486f5bf
      Christophe Leroy authored
      All functions defined as static inline in net/checksum.h are
      meant to be inlined for performance reason.
      
      But since commit ac7c3e4f ("compiler: enable
      CONFIG_OPTIMIZE_INLINING forcibly") the compiler is allowed to
      uninline functions when it wants.
      
      Fair enough in the general case, but for tiny performance critical
      checksum helpers that's counter-productive.
      
      The problem mainly arises when selecting CONFIG_CC_OPTIMISE_FOR_SIZE,
      Those helpers being 'static inline' in header files you suddenly find
      them duplicated many times in the resulting vmlinux.
      
      Here is a typical exemple when building powerpc pmac32_defconfig
      with CONFIG_CC_OPTIMISE_FOR_SIZE. csum_sub() appears 4 times:
      
      	c04a23cc <csum_sub>:
      	c04a23cc:	7c 84 20 f8 	not     r4,r4
      	c04a23d0:	7c 63 20 14 	addc    r3,r3,r4
      	c04a23d4:	7c 63 01 94 	addze   r3,r3
      	c04a23d8:	4e 80 00 20 	blr
      		...
      	c04a2ce8:	4b ff f6 e5 	bl      c04a23cc <csum_sub>
      		...
      	c04a2d2c:	4b ff f6 a1 	bl      c04a23cc <csum_sub>
      		...
      	c04a2d54:	4b ff f6 79 	bl      c04a23cc <csum_sub>
      		...
      	c04a754c <csum_sub>:
      	c04a754c:	7c 84 20 f8 	not     r4,r4
      	c04a7550:	7c 63 20 14 	addc    r3,r3,r4
      	c04a7554:	7c 63 01 94 	addze   r3,r3
      	c04a7558:	4e 80 00 20 	blr
      		...
      	c04ac930:	4b ff ac 1d 	bl      c04a754c <csum_sub>
      		...
      	c04ad264:	4b ff a2 e9 	bl      c04a754c <csum_sub>
      		...
      	c04e3b08 <csum_sub>:
      	c04e3b08:	7c 84 20 f8 	not     r4,r4
      	c04e3b0c:	7c 63 20 14 	addc    r3,r3,r4
      	c04e3b10:	7c 63 01 94 	addze   r3,r3
      	c04e3b14:	4e 80 00 20 	blr
      		...
      	c04e5788:	4b ff e3 81 	bl      c04e3b08 <csum_sub>
      		...
      	c04e65c8:	4b ff d5 41 	bl      c04e3b08 <csum_sub>
      		...
      	c0512d34 <csum_sub>:
      	c0512d34:	7c 84 20 f8 	not     r4,r4
      	c0512d38:	7c 63 20 14 	addc    r3,r3,r4
      	c0512d3c:	7c 63 01 94 	addze   r3,r3
      	c0512d40:	4e 80 00 20 	blr
      		...
      	c0512dfc:	4b ff ff 39 	bl      c0512d34 <csum_sub>
      		...
      	c05138bc:	4b ff f4 79 	bl      c0512d34 <csum_sub>
      		...
      
      Restore the expected behaviour by using __always_inline for all
      functions defined in net/checksum.h
      
      vmlinux size is even reduced by 256 bytes with this patch:
      
      	   text	   data	    bss	    dec	    hex	filename
      	6980022	2515362	 194384	9689768	 93daa8	vmlinux.before
      	6979862	2515266	 194384	9689512	 93d9a8	vmlinux.now
      
      Fixes: ac7c3e4f ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly")
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarChristophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5486f5bf
    • David S. Miller's avatar
      Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 0033fced
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2022-02-18
      
      This series contains updates to ice driver only.
      
      Wojciech fixes protocol matching for slow-path switchdev so that all
      packets are correctly redirected.
      
      Michal removes accidental unconditional setting of l4 port filtering
      flag.
      
      Jake adds locking to protect VF reset and removal to fix various issues
      that can be encountered when they race with each other.
      
      Tom Rix propagates an error and initializes a struct to resolve reported
      Clang issues.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0033fced
    • David S. Miller's avatar
      Merge branch 'mptcp-fixes' · 90141edc
      David S. Miller authored
      Mat Martineau says:
      
      ====================
      mptcp: Fix address advertisement races and stabilize tests
      
      Patches 1, 2, and 7 modify two self tests to give consistent, accurate
      results by fixing timing issues and accounting for syncookie behavior.
      
      Paches 3-6 fix two races in overlapping address advertisement send and
      receive. Associated self tests are updated, including addition of two
      MIBs to enable testing and tracking dropped address events.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      90141edc
    • Paolo Abeni's avatar
      selftests: mptcp: be more conservative with cookie MPJ limits · e35f885b
      Paolo Abeni authored
      Since commit 2843ff6f ("mptcp: remote addresses fullmesh"), an
      MPTCP client can attempt creating multiple MPJ subflow simultaneusly.
      
      In such scenario the server, when syncookies are enabled, could end-up
      accepting incoming MPJ syn even above the configured subflow limit, as
      the such limit can be enforced in a reliable way only after the subflow
      creation. In case of syncookie, only after the 3rd ack reception.
      
      As a consequence the related self-tests case sporadically fails, as it
      verify that the server always accept the expected number of MPJ syn.
      
      Address the issues relaxing the MPJ syn number constrain. Note that the
      check on the accepted number of MPJ 3rd ack still remains intact.
      
      Fixes: 2843ff6f ("mptcp: remote addresses fullmesh")
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e35f885b
    • Paolo Abeni's avatar
      selftests: mptcp: more robust signal race test · 6ef84b15
      Paolo Abeni authored
      The in kernel MPTCP PM implementation can process a single
      incoming add address option at any given time. In the
      mentioned test the server can surpass such limit. Let the
      setup cope with that allowing a faster add_addr retransmission.
      
      Fixes: a88c9e49 ("mptcp: do not block subflows creation on errors")
      Fixes: f7efc777 ("mptcp: drop argument port from mptcp_pm_announce_addr")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/254Reported-and-tested-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Reviewed-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6ef84b15
    • Paolo Abeni's avatar
      mptcp: add mibs counter for ignored incoming options · f73c1194
      Paolo Abeni authored
      The MPTCP in kernel path manager has some constraints on incoming
      addresses announce processing, so that in edge scenarios it can
      end-up dropping (ignoring) some of such announces.
      
      The above is not very limiting in practice since such scenarios are
      very uncommon and MPTCP will recover due to ADD_ADDR retransmissions.
      
      This patch adds a few MIB counters to account for such drop events
      to allow easier introspection of the critical scenarios.
      
      Fixes: f7efc777 ("mptcp: drop argument port from mptcp_pm_announce_addr")
      Reviewed-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f73c1194
    • Paolo Abeni's avatar
      mptcp: fix race in incoming ADD_ADDR option processing · 837cf45d
      Paolo Abeni authored
      If an MPTCP endpoint received multiple consecutive incoming
      ADD_ADDR options, mptcp_pm_add_addr_received() can overwrite
      the current remote address value after the PM lock is released
      in mptcp_pm_nl_add_addr_received() and before such address
      is echoed.
      
      Fix the issue caching the remote address value a little earlier
      and always using the cached value after releasing the PM lock.
      
      Fixes: f7efc777 ("mptcp: drop argument port from mptcp_pm_announce_addr")
      Reviewed-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      837cf45d
    • Paolo Abeni's avatar
      mptcp: fix race in overlapping signal events · 98247bc1
      Paolo Abeni authored
      After commit a88c9e49 ("mptcp: do not block subflows
      creation on errors"), if a signal address races with a failing
      subflow creation, the subflow creation failure control path
      can trigger the selection of the next address to be announced
      while the current announced is still pending.
      
      The above will cause the unintended suppression of the ADD_ADDR
      announce.
      
      Fix the issue skipping the to-be-suppressed announce before it
      will mark an endpoint as already used. The relevant announce
      will be triggered again when the current one will complete.
      
      Fixes: a88c9e49 ("mptcp: do not block subflows creation on errors")
      Reviewed-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      98247bc1
    • Paolo Abeni's avatar
      selftests: mptcp: improve 'fair usage on close' stability · 5b31dda7
      Paolo Abeni authored
      The mentioned test has to wait for a subflow creation failure.
      The current code looks for TCP sockets in TW state and sometimes
      misses the relevant event. Switch to a more stable check, looking
      for the associated mib counter.
      
      Fixes: 46e967d1 ("selftests: mptcp: add tests for subflow creation failure")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/257Reported-and-tested-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5b31dda7
    • Paolo Abeni's avatar
      selftests: mptcp: fix diag instability · 0cd33c5f
      Paolo Abeni authored
      Instead of waiting for an arbitrary amount of time for the MPTCP
      MP_CAPABLE handshake to complete, explicitly wait for the relevant
      socket to enter into the established status.
      
      Additionally let the data transfer application use the slowest
      transfer mode available (-r), to cope with very slow host, or
      high jitter caused by hosting VMs.
      
      Fixes: df62f2ec ("selftests/mptcp: add diag interface tests")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/258Reported-and-tested-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0cd33c5f
    • Christophe JAILLET's avatar
      nfp: flower: Fix a potential leak in nfp_tunnel_add_shared_mac() · 3a14d088
      Christophe JAILLET authored
      ida_simple_get() returns an id between min (0) and max (NFP_MAX_MAC_INDEX)
      inclusive.
      So NFP_MAX_MAC_INDEX (0xff) is a valid id.
      
      In order for the error handling path to work correctly, the 'invalid'
      value for 'ida_idx' should not be in the 0..NFP_MAX_MAC_INDEX range,
      inclusive.
      
      So set it to -1.
      
      Fixes: 20cce886 ("nfp: flower: enable MAC address sharing for offloadable devs")
      Signed-off-by: default avatarChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: default avatarSimon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20220218131535.100258-1-simon.horman@corigine.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      3a14d088
    • Subash Abhinov Kasiviswanathan's avatar
    • Jeremy Linton's avatar
      net: mvpp2: always set port pcs ops · 5a2aba71
      Jeremy Linton authored
      Booting a MACCHIATObin with 5.17, the system OOPs with
      a null pointer deref when the network is started. This
      is caused by the pcs->ops structure being null in
      mcpp2_acpi_start() when it tries to call pcs_config().
      
      Hoisting the code which sets pcs_gmac.ops and pcs_xlg.ops,
      assuring they are always set, fixes the problem.
      
      The OOPs looks like:
      [   18.687760] Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000000010
      [   18.698561] Mem abort info:
      [   18.698564]   ESR = 0x96000004
      [   18.698567]   EC = 0x25: DABT (current EL), IL = 32 bits
      [   18.709821]   SET = 0, FnV = 0
      [   18.714292]   EA = 0, S1PTW = 0
      [   18.718833]   FSC = 0x04: level 0 translation fault
      [   18.725126] Data abort info:
      [   18.729408]   ISV = 0, ISS = 0x00000004
      [   18.734655]   CM = 0, WnR = 0
      [   18.738933] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000111bbf000
      [   18.745409] [0000000000000010] pgd=0000000000000000, p4d=0000000000000000
      [   18.752235] Internal error: Oops: 96000004 [#1] SMP
      [   18.757134] Modules linked in: rfkill ip_set nf_tables nfnetlink qrtr sunrpc vfat fat omap_rng fuse zram xfs crct10dif_ce mvpp2 ghash_ce sbsa_gwdt phylink xhci_plat_hcd ahci_plam
      [   18.773481] CPU: 0 PID: 681 Comm: NetworkManager Not tainted 5.17.0-0.rc3.89.fc36.aarch64 #1
      [   18.781954] Hardware name: Marvell                         Armada 7k/8k Family Board      /Armada 7k/8k Family Board      , BIOS EDK II Jun  4 2019
      [   18.795222] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      [   18.802213] pc : mvpp2_start_dev+0x2b0/0x300 [mvpp2]
      [   18.807208] lr : mvpp2_start_dev+0x298/0x300 [mvpp2]
      [   18.812197] sp : ffff80000b4732c0
      [   18.815522] x29: ffff80000b4732c0 x28: 0000000000000000 x27: ffffccab38ae57f8
      [   18.822689] x26: ffff6eeb03065a10 x25: ffff80000b473a30 x24: ffff80000b4735b8
      [   18.829855] x23: 0000000000000000 x22: 00000000000001e0 x21: ffff6eeb07b6ab68
      [   18.837021] x20: ffff6eeb07b6ab30 x19: ffff6eeb07b6a9c0 x18: 0000000000000014
      [   18.844187] x17: 00000000f6232bfe x16: ffffccab899b1dc0 x15: 000000006a30f9fa
      [   18.851353] x14: 000000003b77bd50 x13: 000006dc896f0e8e x12: 001bbbfccfd0d3a2
      [   18.858519] x11: 0000000000001528 x10: 0000000000001548 x9 : ffffccab38ad0fb0
      [   18.865685] x8 : ffff80000b473330 x7 : 0000000000000000 x6 : 0000000000000000
      [   18.872851] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff80000b4732f8
      [   18.880017] x2 : 000000000000001a x1 : 0000000000000002 x0 : ffff6eeb07b6ab68
      [   18.887183] Call trace:
      [   18.889637]  mvpp2_start_dev+0x2b0/0x300 [mvpp2]
      [   18.894279]  mvpp2_open+0x134/0x2b4 [mvpp2]
      [   18.898483]  __dev_open+0x128/0x1e4
      [   18.901988]  __dev_change_flags+0x17c/0x1d0
      [   18.906187]  dev_change_flags+0x30/0x70
      [   18.910038]  do_setlink+0x278/0xa7c
      [   18.913540]  __rtnl_newlink+0x44c/0x7d0
      [   18.917391]  rtnl_newlink+0x5c/0x8c
      [   18.920892]  rtnetlink_rcv_msg+0x254/0x314
      [   18.925006]  netlink_rcv_skb+0x48/0x10c
      [   18.928858]  rtnetlink_rcv+0x24/0x30
      [   18.932449]  netlink_unicast+0x290/0x2f4
      [   18.936386]  netlink_sendmsg+0x1d0/0x41c
      [   18.940323]  sock_sendmsg+0x60/0x70
      [   18.943825]  ____sys_sendmsg+0x248/0x260
      [   18.947762]  ___sys_sendmsg+0x74/0xa0
      [   18.951438]  __sys_sendmsg+0x64/0xcc
      [   18.955027]  __arm64_sys_sendmsg+0x30/0x40
      [   18.959140]  invoke_syscall+0x50/0x120
      [   18.962906]  el0_svc_common.constprop.0+0x4c/0xf4
      [   18.967629]  do_el0_svc+0x30/0x9c
      [   18.970958]  el0_svc+0x28/0xb0
      [   18.974025]  el0t_64_sync_handler+0x10c/0x140
      [   18.978400]  el0t_64_sync+0x1a4/0x1a8
      [   18.982078] Code: 52800004 b9416262 aa1503e0 52800041 (f94008a5)
      [   18.988196] ---[ end trace 0000000000000000 ]---
      
      Fixes: cff05632 ("net: mvpp2: use .mac_select_pcs() interface")
      Suggested-by: default avatarRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarJeremy Linton <jeremy.linton@arm.com>
      Reviewed-by: default avatarMarcin Wojtas <mw@semihalf.com>
      Link: https://lore.kernel.org/r/20220214231852.3331430-1-jeremy.linton@arm.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      5a2aba71
  2. 18 Feb, 2022 11 commits
    • Tom Rix's avatar
      ice: initialize local variable 'tlv' · 5950bdc8
      Tom Rix authored
      Clang static analysis reports this issues
      ice_common.c:5008:21: warning: The left expression of the compound
        assignment is an uninitialized value. The computed value will
        also be garbage
        ldo->phy_type_low |= ((u64)buf << (i * 16));
        ~~~~~~~~~~~~~~~~~ ^
      
      When called from ice_cfg_phy_fec() ldo is the uninitialized local
      variable tlv.  So initialize.
      
      Fixes: ea78ce4d ("ice: add link lenient and default override support")
      Signed-off-by: default avatarTom Rix <trix@redhat.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      5950bdc8
    • Tom Rix's avatar
      ice: check the return of ice_ptp_gettimex64 · ed22d9c8
      Tom Rix authored
      Clang static analysis reports this issue
      time64.h:69:50: warning: The left operand of '+'
        is a garbage value
        set_normalized_timespec64(&ts_delta, lhs.tv_sec + rhs.tv_sec,
                                             ~~~~~~~~~~ ^
      In ice_ptp_adjtime_nonatomic(), the timespec64 variable 'now'
      is set by ice_ptp_gettimex64().  This function can fail
      with -EBUSY, so 'now' can have a gargbage value.
      So check the return.
      
      Fixes: 06c16d89 ("ice: register 1588 PTP clock device object for E810 devices")
      Signed-off-by: default avatarTom Rix <trix@redhat.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      ed22d9c8
    • Jacob Keller's avatar
      ice: fix concurrent reset and removal of VFs · fadead80
      Jacob Keller authored
      Commit c503e632 ("ice: Stop processing VF messages during teardown")
      introduced a driver state flag, ICE_VF_DEINIT_IN_PROGRESS, which is
      intended to prevent some issues with concurrently handling messages from
      VFs while tearing down the VFs.
      
      This change was motivated by crashes caused while tearing down and
      bringing up VFs in rapid succession.
      
      It turns out that the fix actually introduces issues with the VF driver
      caused because the PF no longer responds to any messages sent by the VF
      during its .remove routine. This results in the VF potentially removing
      its DMA memory before the PF has shut down the device queues.
      
      Additionally, the fix doesn't actually resolve concurrency issues within
      the ice driver. It is possible for a VF to initiate a reset just prior
      to the ice driver removing VFs. This can result in the remove task
      concurrently operating while the VF is being reset. This results in
      similar memory corruption and panics purportedly fixed by that commit.
      
      Fix this concurrency at its root by protecting both the reset and
      removal flows using the existing VF cfg_lock. This ensures that we
      cannot remove the VF while any outstanding critical tasks such as a
      virtchnl message or a reset are occurring.
      
      This locking change also fixes the root cause originally fixed by commit
      c503e632 ("ice: Stop processing VF messages during teardown"), so we
      can simply revert it.
      
      Note that I kept these two changes together because simply reverting the
      original commit alone would leave the driver vulnerable to worse race
      conditions.
      
      Fixes: c503e632 ("ice: Stop processing VF messages during teardown")
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Tested-by: default avatarKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      fadead80
    • Michal Swiatkowski's avatar
      ice: fix setting l4 port flag when adding filter · 932645c2
      Michal Swiatkowski authored
      Accidentally filter flag for none encapsulated l4 port field is always
      set. Even if user wants to add encapsulated l4 port field.
      
      Remove this unnecessary flag setting.
      
      Fixes: 9e300987 ("ice: VXLAN and Geneve TC support")
      Signed-off-by: default avatarMichal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: default avatarSandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      932645c2
    • Wojciech Drewek's avatar
      ice: Match on all profiles in slow-path · b70bc066
      Wojciech Drewek authored
      In switchdev mode, slow-path rules need to match all protocols, in order
      to correctly redirect unfiltered or missed packets to the uplink. To set
      this up for the virtual function to uplink flow, the rule that redirects
      packets to the control VSI must have the tunnel type set to
      ICE_SW_TUN_AND_NON_TUN. As a result of that new tunnel type being set,
      ice_get_compat_fv_bitmap will select ICE_PROF_ALL. At that point all
      profiles would be selected for this rule, resulting in the desired
      behavior. Without this change slow-path would not work with
      tunnel protocols.
      
      Fixes: 8b032a55 ("ice: low level support for tunnels")
      Signed-off-by: default avatarWojciech Drewek <wojciech.drewek@intel.com>
      Tested-by: default avatarSandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      b70bc066
    • Xiaoke Wang's avatar
      net: ll_temac: check the return value of devm_kmalloc() · b352c346
      Xiaoke Wang authored
      devm_kmalloc() returns a pointer to allocated memory on success, NULL
      on failure. While lp->indirect_lock is allocated by devm_kmalloc()
      without proper check. It is better to check the value of it to
      prevent potential wrong memory access.
      
      Fixes: f14f5c11 ("net: ll_temac: Support indirect_mutex share within TEMAC IP")
      Signed-off-by: default avatarXiaoke Wang <xkernel.wang@foxmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b352c346
    • Eric Dumazet's avatar
      net-timestamp: convert sk->sk_tskey to atomic_t · a1cdec57
      Eric Dumazet authored
      UDP sendmsg() can be lockless, this is causing all kinds
      of data races.
      
      This patch converts sk->sk_tskey to remove one of these races.
      
      BUG: KCSAN: data-race in __ip_append_data / __ip_append_data
      
      read to 0xffff8881035d4b6c of 4 bytes by task 8877 on cpu 1:
       __ip_append_data+0x1c1/0x1de0 net/ipv4/ip_output.c:994
       ip_make_skb+0x13f/0x2d0 net/ipv4/ip_output.c:1636
       udp_sendmsg+0x12bd/0x14c0 net/ipv4/udp.c:1249
       inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
       __do_sys_sendmmsg net/socket.c:2582 [inline]
       __se_sys_sendmmsg net/socket.c:2579 [inline]
       __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      write to 0xffff8881035d4b6c of 4 bytes by task 8880 on cpu 0:
       __ip_append_data+0x1d8/0x1de0 net/ipv4/ip_output.c:994
       ip_make_skb+0x13f/0x2d0 net/ipv4/ip_output.c:1636
       udp_sendmsg+0x12bd/0x14c0 net/ipv4/udp.c:1249
       inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
       __do_sys_sendmmsg net/socket.c:2582 [inline]
       __se_sys_sendmmsg net/socket.c:2579 [inline]
       __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x0000054d -> 0x0000054e
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 8880 Comm: syz-executor.5 Not tainted 5.17.0-rc2-syzkaller-00167-gdcb85f85-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 09c2d251 ("net-timestamp: add key to disambiguate concurrent datagrams")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a1cdec57
    • Oliver Neukum's avatar
      sr9700: sanity check for packet length · e9da0b56
      Oliver Neukum authored
      A malicious device can leak heap data to user space
      providing bogus frame lengths. Introduce a sanity check.
      Signed-off-by: default avatarOliver Neukum <oneukum@suse.com>
      Reviewed-by: default avatarGrant Grundler <grundler@chromium.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e9da0b56
    • Paul Blakey's avatar
      net/sched: act_ct: Fix flow table lookup after ct clear or switching zones · 2f131de3
      Paul Blakey authored
      Flow table lookup is skipped if packet either went through ct clear
      action (which set the IP_CT_UNTRACKED flag on the packet), or while
      switching zones and there is already a connection associated with
      the packet. This will result in no SW offload of the connection,
      and the and connection not being removed from flow table with
      TCP teardown (fin/rst packet).
      
      To fix the above, remove these unneccary checks in flow
      table lookup.
      
      Fixes: 46475bb2 ("net/sched: act_ct: Software offload of established flows")
      Signed-off-by: default avatarPaul Blakey <paulb@nvidia.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2f131de3
    • suresh kumar's avatar
      net-sysfs: add check for netdevice being present to speed_show · 4224cfd7
      suresh kumar authored
      When bringing down the netdevice or system shutdown, a panic can be
      triggered while accessing the sysfs path because the device is already
      removed.
      
          [  755.549084] mlx5_core 0000:12:00.1: Shutdown was called
          [  756.404455] mlx5_core 0000:12:00.0: Shutdown was called
          ...
          [  757.937260] BUG: unable to handle kernel NULL pointer dereference at           (null)
          [  758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280
      
          crash> bt
          ...
          PID: 12649  TASK: ffff8924108f2100  CPU: 1   COMMAND: "amsd"
          ...
           #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778
              [exception RIP: dma_pool_alloc+0x1ab]
              RIP: ffffffff8ee11acb  RSP: ffff89240e1a3968  RFLAGS: 00010046
              RAX: 0000000000000246  RBX: ffff89243d874100  RCX: 0000000000001000
              RDX: 0000000000000000  RSI: 0000000000000246  RDI: ffff89243d874090
              RBP: ffff89240e1a39c0   R8: 000000000001f080   R9: ffff8905ffc03c00
              R10: ffffffffc04680d4  R11: ffffffff8edde9fd  R12: 00000000000080d0
              R13: ffff89243d874090  R14: ffff89243d874080  R15: 0000000000000000
              ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
          #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core]
          #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core]
          #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core]
          #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core]
          #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core]
          #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core]
          #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core]
          #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46
          #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208
          #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3
          #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf
          #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596
          #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10
          #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5
          #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff
          #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f
          #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92
      
          crash> net_device.state ffff89443b0c0000
            state = 0x5  (__LINK_STATE_START| __LINK_STATE_NOCARRIER)
      
      To prevent this scenario, we also make sure that the netdevice is present.
      Signed-off-by: default avatarsuresh kumar <suresh2514@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4224cfd7
    • Duoming Zhou's avatar
      drivers: hamradio: 6pack: fix UAF bug caused by mod_timer() · efe4186e
      Duoming Zhou authored
      When a 6pack device is detaching, the sixpack_close() will act to cleanup
      necessary resources. Although del_timer_sync() in sixpack_close()
      won't return if there is an active timer, one could use mod_timer() in
      sp_xmit_on_air() to wake up timer again by calling userspace syscall such
      as ax25_sendmsg(), ax25_connect() and ax25_ioctl().
      
      This unexpected waked handler, sp_xmit_on_air(), realizes nothing about
      the undergoing cleanup and may still call pty_write() to use driver layer
      resources that have already been released.
      
      One of the possible race conditions is shown below:
      
            (USE)                      |      (FREE)
      ax25_sendmsg()                   |
       ax25_queue_xmit()               |
        ...                            |
        sp_xmit()                      |
         sp_encaps()                   | sixpack_close()
          sp_xmit_on_air()             |  del_timer_sync(&sp->tx_t)
           mod_timer(&sp->tx_t,...)    |  ...
                                       |  unregister_netdev()
                                       |  ...
           (wait a while)              | tty_release()
                                       |  tty_release_struct()
                                       |   release_tty()
          sp_xmit_on_air()             |    tty_kref_put(tty_struct) //FREE
           pty_write(tty_struct) //USE |    ...
      
      The corresponding fail log is shown below:
      ===============================================================
      BUG: KASAN: use-after-free in __run_timers.part.0+0x170/0x470
      Write of size 8 at addr ffff88800a652ab8 by task swapper/2/0
      ...
      Call Trace:
        ...
        queue_work_on+0x3f/0x50
        pty_write+0xcd/0xe0pty_write+0xcd/0xe0
        sp_xmit_on_air+0xb2/0x1f0
        call_timer_fn+0x28/0x150
        __run_timers.part.0+0x3c2/0x470
        run_timer_softirq+0x3b/0x80
        __do_softirq+0xf1/0x380
        ...
      
      This patch reorders the del_timer_sync() after the unregister_netdev()
      to avoid UAF bugs. Because the unregister_netdev() is well synchronized,
      it flushs out any pending queues, waits the refcount of net_device
      decreases to zero and removes net_device from kernel. There is not any
      running routines after executing unregister_netdev(). Therefore, we could
      not arouse timer from userspace again.
      Signed-off-by: default avatarDuoming Zhou <duoming@zju.edu.cn>
      Reviewed-by: default avatarLin Ma <linma@zju.edu.cn>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      efe4186e
  3. 17 Feb, 2022 15 commits
    • Jakub Kicinski's avatar
      Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 7a2fb912
      Jakub Kicinski authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2022-02-17
      
      We've added 8 non-merge commits during the last 7 day(s) which contain
      a total of 8 files changed, 119 insertions(+), 15 deletions(-).
      
      The main changes are:
      
      1) Add schedule points in map batch ops, from Eric.
      
      2) Fix bpf_msg_push_data with len 0, from Felix.
      
      3) Fix crash due to incorrect copy_map_value, from Kumar.
      
      4) Fix crash due to out of bounds access into reg2btf_ids, from Kumar.
      
      5) Fix a bpf_timer initialization issue with clang, from Yonghong.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        bpf: Add schedule points in batch ops
        bpf: Fix crash due to out of bounds access into reg2btf_ids.
        selftests: bpf: Check bpf_msg_push_data return value
        bpf: Fix a bpf_timer initialization issue
        bpf: Emit bpf_timer in vmlinux BTF
        selftests/bpf: Add test for bpf_timer overwriting crash
        bpf: Fix crash due to incorrect copy_map_value
        bpf: Do not try bpf_msg_push_data with len 0
      ====================
      
      Link: https://lore.kernel.org/r/20220217190000.37925-1-alexei.starovoitov@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      7a2fb912
    • Linus Torvalds's avatar
      Merge tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 8b97cae3
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from wireless and netfilter.
      
        Current release - regressions:
      
         - dsa: lantiq_gswip: fix use after free in gswip_remove()
      
         - smc: avoid overwriting the copies of clcsock callback functions
      
        Current release - new code bugs:
      
         - iwlwifi:
            - fix use-after-free when no FW is present
            - mei: fix the pskb_may_pull check in ipv4
            - mei: retry mapping the shared area
            - mvm: don't feed the hardware RFKILL into iwlmei
      
        Previous releases - regressions:
      
         - ipv6: mcast: use rcu-safe version of ipv6_get_lladdr()
      
         - tipc: fix wrong publisher node address in link publications
      
         - iwlwifi: mvm: don't send SAR GEO command for 3160 devices, avoid FW
           assertion
      
         - bgmac: make idm and nicpm resource optional again
      
         - atl1c: fix tx timeout after link flap
      
        Previous releases - always broken:
      
         - vsock: remove vsock from connected table when connect is
           interrupted by a signal
      
         - ping: change destination interface checks to match raw sockets
      
         - crypto: af_alg - get rid of alg_memory_allocated to avoid confusing
           semantics (and null-deref) after SO_RESERVE_MEM was added
      
         - ipv6: make exclusive flowlabel checks per-netns
      
         - bonding: force carrier update when releasing slave
      
         - sched: limit TC_ACT_REPEAT loops
      
         - bridge: multicast: notify switchdev driver whenever MC processing
           gets disabled because of max entries reached
      
         - wifi: brcmfmac: fix crash in brcm_alt_fw_path when WLAN not found
      
         - iwlwifi: fix locking when "HW not ready"
      
         - phy: mediatek: remove PHY mode check on MT7531
      
         - dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN
      
         - dsa: lan9303:
            - fix polarity of reset during probe
            - fix accelerated VLAN handling"
      
      * tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
        bonding: force carrier update when releasing slave
        nfp: flower: netdev offload check for ip6gretap
        ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt
        ipv4: fix data races in fib_alias_hw_flags_set
        net: dsa: lan9303: add VLAN IDs to master device
        net: dsa: lan9303: handle hwaccel VLAN tags
        vsock: remove vsock from connected table when connect is interrupted by a signal
        Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname"
        ping: fix the dif and sdif check in ping_lookup
        net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
        net: sched: limit TC_ACT_REPEAT loops
        tipc: fix wrong notification node addresses
        net: dsa: lantiq_gswip: fix use after free in gswip_remove()
        ipv6: per-netns exclusive flowlabel checks
        net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled
        CDC-NCM: avoid overflow in sanity checking
        mctp: fix use after free
        net: mscc: ocelot: fix use-after-free in ocelot_vlan_del()
        bonding: fix data-races around agg_select_timer
        dpaa2-eth: Initialize mutex used in one step timestamping path
        ...
      8b97cae3
    • Zhang Changzhong's avatar
      bonding: force carrier update when releasing slave · a6ab75ce
      Zhang Changzhong authored
      In __bond_release_one(), bond_set_carrier() is only called when bond
      device has no slave. Therefore, if we remove the up slave from a master
      with two slaves and keep the down slave, the master will remain up.
      
      Fix this by moving bond_set_carrier() out of if (!bond_has_slaves(bond))
      statement.
      
      Reproducer:
      $ insmod bonding.ko mode=0 miimon=100 max_bonds=2
      $ ifconfig bond0 up
      $ ifenslave bond0 eth0 eth1
      $ ifconfig eth0 down
      $ ifenslave -d bond0 eth1
      $ cat /proc/net/bonding/bond0
      
      Fixes: ff59c456 ("[PATCH] bonding: support carrier state for master")
      Signed-off-by: default avatarZhang Changzhong <zhangchangzhong@huawei.com>
      Acked-by: default avatarJay Vosburgh <jay.vosburgh@canonical.com>
      Link: https://lore.kernel.org/r/1645021088-38370-1-git-send-email-zhangchangzhong@huawei.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      a6ab75ce
    • Eric Dumazet's avatar
      bpf: Add schedule points in batch ops · 75134f16
      Eric Dumazet authored
      syzbot reported various soft lockups caused by bpf batch operations.
      
       INFO: task kworker/1:1:27 blocked for more than 140 seconds.
       INFO: task hung in rcu_barrier
      
      Nothing prevents batch ops to process huge amount of data,
      we need to add schedule points in them.
      
      Note that maybe_wait_bpf_programs(map) calls from
      generic_map_delete_batch() can be factorized by moving
      the call after the loop.
      
      This will be done later in -next tree once we get this fix merged,
      unless there is strong opinion doing this optimization sooner.
      
      Fixes: aa2e93b8 ("bpf: Add generic support for update and delete batch ops")
      Fixes: cb4d03ab ("bpf: Add generic support for lookup batch op")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Reviewed-by: default avatarStanislav Fomichev <sdf@google.com>
      Acked-by: default avatarBrian Vazquez <brianvv@google.com>
      Link: https://lore.kernel.org/bpf/20220217181902.808742-1-eric.dumazet@gmail.com
      75134f16
    • Luis Chamberlain's avatar
      fs/file_table: fix adding missing kmemleak_not_leak() · a3580ac9
      Luis Chamberlain authored
      Commit b42bc9a3 ("Fix regression due to "fs: move binfmt_misc sysctl
      to its own file") fixed a regression, however it failed to add a
      kmemleak_not_leak().
      
      Fixes: b42bc9a3 ("Fix regression due to "fs: move binfmt_misc sysctl to its own file")
      Reported-by: default avatarTong Zhang <ztong0001@gmail.com>
      Cc: Tong Zhang <ztong0001@gmail.com>
      Signed-off-by: default avatarLuis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a3580ac9
    • Linus Torvalds's avatar
      Merge tag 'perf-tools-fixes-for-v5.17-2022-02-17' of... · 2dd3a8a1
      Linus Torvalds authored
      Merge tag 'perf-tools-fixes-for-v5.17-2022-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tools fixes from Arnaldo Carvalho de Melo:
      
       - Fix corrupt inject files when only last branch option is enabled with
         ARM CoreSight ETM
      
       - Fix use-after-free for realloc(..., 0) in libsubcmd, found by gcc 12
      
       - Defer freeing string after possible strlen() on it in the BPF loader,
         found by gcc 12
      
       - Avoid early exit in 'perf trace' due SIGCHLD from non-workload
         processes
      
       - Fix arm64 perf_event_attr 'perf test's wrt --call-graph
         initialization
      
       - Fix libperf 32-bit build for 'perf test' wrt uint64_t printf
      
       - Fix perf_cpu_map__for_each_cpu macro in libperf, providing access to
         the CPU iterator
      
       - Sync linux/perf_event.h UAPI with the kernel sources
      
       - Update Jiri Olsa's email address in MAINTAINERS
      
      * tag 'perf-tools-fixes-for-v5.17-2022-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
        perf bpf: Defer freeing string after possible strlen() on it
        perf test: Fix arm64 perf_event_attr tests wrt --call-graph initialization
        libsubcmd: Fix use-after-free for realloc(..., 0)
        libperf: Fix perf_cpu_map__for_each_cpu macro
        perf cs-etm: Fix corrupt inject files when only last branch option is enabled
        perf cs-etm: No-op refactor of synth opt usage
        libperf: Fix 32-bit build for tests uint64_t printf
        tools headers UAPI: Sync linux/perf_event.h with the kernel sources
        perf trace: Avoid early exit due SIGCHLD from non-workload processes
        MAINTAINERS: Update Jiri's email address
      2dd3a8a1
    • Linus Torvalds's avatar
      Merge tag 'modules-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux · edbd6c62
      Linus Torvalds authored
      Pull module fix from Luis Chamberlain:
       "Fixes module decompression when CONFIG_SYSFS=n
      
        The only fix trickled down for v5.17-rc cycle so far is the fix for
        module decompression when CONFIG_SYSFS=n. This was reported through
        0-day"
      
      * tag 'modules-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
        module: fix building with sysfs disabled
      edbd6c62
    • Danie du Toit's avatar
      nfp: flower: netdev offload check for ip6gretap · 7dbcda58
      Danie du Toit authored
      IPv6 GRE tunnels are not being offloaded, this is caused by a missing
      netdev offload check. The functionality of IPv6 GRE tunnel offloading
      was previously added but this check was not included. Adding the
      ip6gretap check allows IPv6 GRE tunnels to be offloaded correctly.
      
      Fixes: f7536ffb ("nfp: flower: Allow ipv6gretap interface for offloading")
      Signed-off-by: default avatarDanie du Toit <danie.dutoit@corigine.com>
      Signed-off-by: default avatarLouis Peens <louis.peens@corigine.com>
      Signed-off-by: default avatarSimon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20220217124820.40436-1-louis.peens@corigine.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      7dbcda58
    • Eric Dumazet's avatar
      ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt · d95d6320
      Eric Dumazet authored
      Because fib6_info_hw_flags_set() is called without any synchronization,
      all accesses to gi6->offload, fi->trap and fi->offload_failed
      need some basic protection like READ_ONCE()/WRITE_ONCE().
      
      BUG: KCSAN: data-race in fib6_info_hw_flags_set / fib6_purge_rt
      
      read to 0xffff8881087d5886 of 1 bytes by task 13953 on cpu 0:
       fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1007 [inline]
       fib6_purge_rt+0x4f/0x580 net/ipv6/ip6_fib.c:1033
       fib6_del_route net/ipv6/ip6_fib.c:1983 [inline]
       fib6_del+0x696/0x890 net/ipv6/ip6_fib.c:2028
       __ip6_del_rt net/ipv6/route.c:3876 [inline]
       ip6_del_rt+0x83/0x140 net/ipv6/route.c:3891
       __ipv6_dev_ac_dec+0x2b5/0x370 net/ipv6/anycast.c:374
       ipv6_dev_ac_dec net/ipv6/anycast.c:387 [inline]
       __ipv6_sock_ac_close+0x141/0x200 net/ipv6/anycast.c:207
       ipv6_sock_ac_close+0x79/0x90 net/ipv6/anycast.c:220
       inet6_release+0x32/0x50 net/ipv6/af_inet6.c:476
       __sock_release net/socket.c:650 [inline]
       sock_close+0x6c/0x150 net/socket.c:1318
       __fput+0x295/0x520 fs/file_table.c:280
       ____fput+0x11/0x20 fs/file_table.c:313
       task_work_run+0x8e/0x110 kernel/task_work.c:164
       tracehook_notify_resume include/linux/tracehook.h:189 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
       exit_to_user_mode_prepare+0x160/0x190 kernel/entry/common.c:207
       __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
       syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:300
       do_syscall_64+0x50/0xd0 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      write to 0xffff8881087d5886 of 1 bytes by task 1912 on cpu 1:
       fib6_info_hw_flags_set+0x155/0x3b0 net/ipv6/route.c:6230
       nsim_fib6_rt_hw_flags_set drivers/net/netdevsim/fib.c:668 [inline]
       nsim_fib6_rt_add drivers/net/netdevsim/fib.c:691 [inline]
       nsim_fib6_rt_insert drivers/net/netdevsim/fib.c:756 [inline]
       nsim_fib6_event drivers/net/netdevsim/fib.c:853 [inline]
       nsim_fib_event drivers/net/netdevsim/fib.c:886 [inline]
       nsim_fib_event_work+0x284f/0x2cf0 drivers/net/netdevsim/fib.c:1477
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       worker_thread+0x616/0xa70 kernel/workqueue.c:2454
       kthread+0x2c7/0x2e0 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30
      
      value changed: 0x22 -> 0x2a
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 1912 Comm: kworker/1:3 Not tainted 5.16.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: events nsim_fib_event_work
      
      Fixes: 0c5fcf9e ("IPv6: Add "offload failed" indication to routes")
      Fixes: bb3c4ab9 ("ipv6: Add "offload" and "trap" indications to routes")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Amit Cohen <amcohen@nvidia.com>
      Cc: Ido Schimmel <idosch@nvidia.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20220216173217.3792411-2-eric.dumazet@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      d95d6320
    • Eric Dumazet's avatar
      ipv4: fix data races in fib_alias_hw_flags_set · 9fcf986c
      Eric Dumazet authored
      fib_alias_hw_flags_set() can be used by concurrent threads,
      and is only RCU protected.
      
      We need to annotate accesses to following fields of struct fib_alias:
      
          offload, trap, offload_failed
      
      Because of READ_ONCE()WRITE_ONCE() limitations, make these
      field u8.
      
      BUG: KCSAN: data-race in fib_alias_hw_flags_set / fib_alias_hw_flags_set
      
      read to 0xffff888134224a6a of 1 bytes by task 2013 on cpu 1:
       fib_alias_hw_flags_set+0x28a/0x470 net/ipv4/fib_trie.c:1050
       nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
       nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
       nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
       nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
       nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
       nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       process_scheduled_works kernel/workqueue.c:2370 [inline]
       worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
       kthread+0x1bf/0x1e0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30
      
      write to 0xffff888134224a6a of 1 bytes by task 4872 on cpu 0:
       fib_alias_hw_flags_set+0x2d5/0x470 net/ipv4/fib_trie.c:1054
       nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
       nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
       nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
       nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
       nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
       nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       process_scheduled_works kernel/workqueue.c:2370 [inline]
       worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
       kthread+0x1bf/0x1e0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30
      
      value changed: 0x00 -> 0x02
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 4872 Comm: kworker/0:0 Not tainted 5.17.0-rc3-syzkaller-00188-g1d41d2e8-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: events nsim_fib_event_work
      
      Fixes: 90b93f1b ("ipv4: Add "offload" and "trap" indications to routes")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Reviewed-by: default avatarIdo Schimmel <idosch@nvidia.com>
      Link: https://lore.kernel.org/r/20220216173217.3792411-1-eric.dumazet@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      9fcf986c
    • Mans Rullgard's avatar
      net: dsa: lan9303: add VLAN IDs to master device · 430065e2
      Mans Rullgard authored
      If the master device does VLAN filtering, the IDs used by the switch
      must be added for any frames to be received.  Do this in the
      port_enable() function, and remove them in port_disable().
      
      Fixes: a1292595 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
      Signed-off-by: default avatarMans Rullgard <mans@mansr.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Link: https://lore.kernel.org/r/20220216204818.28746-1-mans@mansr.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      430065e2
    • Mans Rullgard's avatar
      net: dsa: lan9303: handle hwaccel VLAN tags · 017b355b
      Mans Rullgard authored
      Check for a hwaccel VLAN tag on rx and use it if present.  Otherwise,
      use __skb_vlan_pop() like the other tag parsers do.  This fixes the case
      where the VLAN tag has already been consumed by the master.
      
      Fixes: a1292595 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
      Signed-off-by: default avatarMans Rullgard <mans@mansr.com>
      Reviewed-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Link: https://lore.kernel.org/r/20220216124634.23123-1-mans@mansr.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      017b355b
    • Linus Torvalds's avatar
      mm: don't try to NUMA-migrate COW pages that have other uses · 80d47f5d
      Linus Torvalds authored
      Oded Gabbay reports that enabling NUMA balancing causes corruption with
      his Gaudi accelerator test load:
      
       "All the details are in the bug, but the bottom line is that somehow,
        this patch causes corruption when the numa balancing feature is
        enabled AND we don't use process affinity AND we use GUP to pin pages
        so our accelerator can DMA to/from system memory.
      
        Either disabling numa balancing, using process affinity to bind to
        specific numa-node or reverting this patch causes the bug to
        disappear"
      
      and Oded bisected the issue to commit 09854ba9 ("mm: do_wp_page()
      simplification").
      
      Now, the NUMA balancing shouldn't actually be changing the writability
      of a page, and as such shouldn't matter for COW.  But it appears it
      does.  Suspicious.
      
      However, regardless of that, the condition for enabling NUMA faults in
      change_pte_range() is nonsensical.  It uses "page_mapcount(page)" to
      decide if a COW page should be NUMA-protected or not, and that makes
      absolutely no sense.
      
      The number of mappings a page has is irrelevant: not only does GUP get a
      reference to a page as in Oded's case, but the other mappings migth be
      paged out and the only reference to them would be in the page count.
      
      Since we should never try to NUMA-balance a page that we can't move
      anyway due to other references, just fix the code to use 'page_count()'.
      Oded confirms that that fixes his issue.
      
      Now, this does imply that something in NUMA balancing ends up changing
      page protections (other than the obvious one of making the page
      inaccessible to get the NUMA faulting information).  Otherwise the COW
      simplification wouldn't matter - since doing the GUP on the page would
      make sure it's writable.
      
      The cause of that permission change would be good to figure out too,
      since it clearly results in spurious COW events - but fixing the
      nonsensical test that just happened to work before is obviously the
      CorrectThing(tm) to do regardless.
      
      Fixes: 09854ba9 ("mm: do_wp_page() simplification")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=215616
      Link: https://lore.kernel.org/all/CAFCwf10eNmwq2wD71xjUhqkvv5+_pJMR1nPug2RqNDcFT4H86Q@mail.gmail.com/Reported-and-tested-by: default avatarOded Gabbay <oded.gabbay@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Peter Xu <peterx@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      80d47f5d
    • Seth Forshee's avatar
      vsock: remove vsock from connected table when connect is interrupted by a signal · b9208492
      Seth Forshee authored
      vsock_connect() expects that the socket could already be in the
      TCP_ESTABLISHED state when the connecting task wakes up with a signal
      pending. If this happens the socket will be in the connected table, and
      it is not removed when the socket state is reset. In this situation it's
      common for the process to retry connect(), and if the connection is
      successful the socket will be added to the connected table a second
      time, corrupting the list.
      
      Prevent this by calling vsock_remove_connected() if a signal is received
      while waiting for a connection. This is harmless if the socket is not in
      the connected table, and if it is in the table then removing it will
      prevent list corruption from a double add.
      
      Note for backporting: this patch requires d5afa82c ("vsock: correct
      removal of socket from the list"), which is in all current stable trees
      except 4.9.y.
      
      Fixes: d021c344 ("VSOCK: Introduce VM Sockets")
      Signed-off-by: default avatarSeth Forshee <sforshee@digitalocean.com>
      Reviewed-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Link: https://lore.kernel.org/r/20220217141312.2297547-1-sforshee@digitalocean.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      b9208492
    • Jonas Gorski's avatar
      Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname" · 6aba04ee
      Jonas Gorski authored
      This reverts commit 3710e809.
      
      Since idm_base and nicpm_base are still optional resources not present
      on all platforms, this breaks the driver for everything except Northstar
      2 (which has both).
      
      The same change was already reverted once with 755f5738 ("net:
      broadcom: fix a mistake about ioremap resource").
      
      So let's do it again.
      
      Fixes: 3710e809 ("net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname")
      Signed-off-by: default avatarJonas Gorski <jonas.gorski@gmail.com>
      [florian: Added comments to explain the resources are optional]
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20220216184634.2032460-1-f.fainelli@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      6aba04ee