1. 05 Jun, 2023 5 commits
    • Merge branch 'regmap-TSE-PCS' · f91e32de
      David S. Miller authored
      Maxime Chevallier says:
      
      ====================
      net: add a regmap-based mdio driver and drop TSE PCS
      
      This is the V4 of a series that follows-up on the work [1] aiming to drop the
      altera TSE PCS driver, as it turns out to be a version of the Lynx PCS exposed
      as a memory-mapped block, instead of living on an MDIO bus.
      
      One step of this removal involved creating a regmap-based mdio driver
      that translates MDIO accesses into the actual underlying bus that
      exposes the register. The register layout must of course match the
      standard MDIO layout, but we can now account for differences in stride
      with recent work on the regmap subsystem [2].
      
      Sorry for repeating this, but I didn't hear anything on this matter in
      previous iterations. Mark, net maintainers: this series depends on the patch
      e12ff287 that was recently merged into the regmap tree [3].
      
      For this series to be usable in net-next, this patch must be applied
      beforehand. Should Mark create a tag that would then be merged into
      net-next? Or should we just wait for the next release to merge this
      into net-next?
      
      This series introduces a new MDIO driver, and uses it to convert Altera
      TSE from the actual TSE PCS driver to Lynx PCS.
      
      Since it turns out dwmac_socfpga also uses a TSE PCS block, port that
      driver to Lynx as well.
      
      Changes in V4:
       - Use new pcs_lynx_create/destroy helpers added by Russell
       - Rework the cleanup sequence to avoid leaking data
       - Rework Kconfig a bit to properly select dependencies
       - Fix a few hiccups with misplaced hunks in 2 commits
      
      Changes in V3:
       - Use a dedicated struct for the mii bus's priv data, to avoid
         duplicating the whole struct mdio_regmap_config, of which only
         2 fields are necessary after init, as suggested by Russell
       - Use ~0 instead of ~0UL for the no-scan bitmask, following Simon's
         review.
      
      Changes in V2:
       - Use phy_mask to avoid unnecessarily scanning the whole mdio bus
       - Go one step further and completely disable scanning if users
         set the .autoscan flag to false, in case the mdiodevice isn't an
         actual PHY (a PCS for example).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: dwmac-sogfpga: use the lynx pcs driver · 5d1f3fe7
      Maxime Chevallier authored
      dwmac_socfpga re-implements support for the TSE PCS, which is identical
      to the already existing TSE PCS, which in turn is the same as the Lynx
      PCS. Drop the existing TSE re-implementation and use the Lynx PCS
      instead, relying on the regmap-mdio driver to translate MDIO accesses
      into mmio accesses.
      
      Add a lynx_pcs reference in the stmmac's internal structure, and use
      .mac_select_pcs() to return the relevant PCS to be used.
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: pcs: Drop the TSE PCS driver · 196eec40
      Maxime Chevallier authored
      Now that we can easily create a mdio-device that represents a
      memory-mapped device that exposes an MDIO-like register layout, we don't
      need the Altera TSE PCS anymore, since we can use the Lynx PCS instead.
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: altera-tse: Convert to mdio-regmap and use PCS Lynx · db48abba
      Maxime Chevallier authored
      The newly introduced regmap-based MDIO driver allows for an easy mapping
      of an mdiodevice onto the memory-mapped TSE PCS, which is actually a
      Lynx PCS.
      
      Convert Altera TSE to use this PCS instead of the pcs-altera-tse, which
      is nothing more than a memory-mapped Lynx PCS.
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mdio: Introduce a regmap-based mdio driver · 642af0f9
      Maxime Chevallier authored
      There exist several examples today of devices that embed an ethernet
      PHY or PCS directly inside an SoC. In this situation, either the device
      is controlled through a vendor-specific register set, or sometimes
      exposes the standard 802.3 registers that are typically accessed over
      MDIO.
      
      As phylib and phylink are designed to use mdiodevices, this driver
      allows creating a virtual MDIO bus, that translates mdiodev register
      accesses to regmap accesses.
      
      We use regmap because at least 3 such devices are known today. Two of
      them are memory-mapped Altera TSE PCSs, exposed with a 4-byte stride in
      stmmac's dwmac-socfpga variant and a 2-byte stride in altera-tse. The
      third (nxp,sja1110-base-tx-mdio) is exposed over SPI.
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
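The stride difference described in the commit above can be illustrated with a small userspace sketch. It is not the kernel driver: `struct fake_mmio_window`, `mdio_reg_to_offset()`, `fake_mdio_read()` and `fake_mdio_write()` are hypothetical names invented here, and a plain byte buffer stands in for the memory-mapped block. The only facts assumed are that the standard MDIO register space holds 32 registers of 16 bits each and that consecutive registers sit one stride apart.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard (clause 22) MDIO register space: 32 registers, 16 bits each. */
#define MDIO_REG_COUNT 32u

/* Hypothetical stand-in for a memory-mapped PCS register block. */
struct fake_mmio_window {
	uint8_t *base;        /* start of the simulated register block */
	size_t size;          /* size of the block in bytes */
	unsigned int stride;  /* bytes between two consecutive registers */
};

/* Translate an MDIO register number into a byte offset in the block.
 * Returns 0 on success, -1 if the register falls outside the block. */
static int mdio_reg_to_offset(const struct fake_mmio_window *w,
			      unsigned int regnum, size_t *offset)
{
	size_t off;

	if (regnum >= MDIO_REG_COUNT)
		return -1;

	off = (size_t)regnum * w->stride;
	if (off + sizeof(uint16_t) > w->size)
		return -1;

	*offset = off;
	return 0;
}

/* Read one 16-bit register through the translated offset. */
static int fake_mdio_read(const struct fake_mmio_window *w,
			  unsigned int regnum, uint16_t *val)
{
	size_t off;

	if (mdio_reg_to_offset(w, regnum, &off))
		return -1;
	memcpy(val, w->base + off, sizeof(*val));
	return 0;
}

/* Write one 16-bit register through the translated offset. */
static int fake_mdio_write(const struct fake_mmio_window *w,
			   unsigned int regnum, uint16_t val)
{
	size_t off;

	if (mdio_reg_to_offset(w, regnum, &off))
		return -1;
	memcpy(w->base + off, &val, sizeof(val));
	return 0;
}
```

With a 2-byte stride, register 4 lands at byte offset 8; with a 4-byte stride the same register lands at offset 16. The caller issues the same register number in both cases, which is the property that lets one PCS driver serve both layouts.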
  2. 04 Jun, 2023 1 commit
  3. 03 Jun, 2023 12 commits
  4. 02 Jun, 2023 11 commits
  5. 01 Jun, 2023 11 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · a03a91bd
      Jakub Kicinski authored
      Cross-merge networking fixes after downstream PR.
      
      No conflicts.
      
      Adjacent changes:
      
      drivers/net/ethernet/sfc/tc.c
        622ab656 ("sfc: fix error unwinds in TC offload")
        b6583d5e ("sfc: support TC decap rules matching on enc_src_port")
      
      net/mptcp/protocol.c
        5b825727 ("mptcp: add annotations around msk->subflow accesses")
        e76c8ef5 ("mptcp: refactor mptcp_stream_accept()")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'net-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 714069da
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Happy Wear a Dress Day.
      
        Fairly standard-sized batch of fixes, accounting for the lack of
        sub-tree submissions this week. The mlx5 IRQ fixes are notable, people
        were complaining about that. No fires burning.
      
        Current release - regressions:
      
         - eth: mlx5e:
            - multiple fixes for dynamic IRQ allocation
            - prevent encap offload when neigh update is running
      
         - eth: mana: fix perf regression: remove rx_cqes, tx_cqes counters
      
        Current release - new code bugs:
      
         - eth: mlx5e: DR, add missing mutex init/destroy in pattern manager
      
        Previous releases - always broken:
      
         - tcp: deny tcp_disconnect() when threads are waiting
      
         - sched: prevent ingress Qdiscs from getting installed in random
           locations in the hierarchy and moving around
      
         - sched: flower: fix possible OOB write in fl_set_geneve_opt()
      
         - netlink: fix NETLINK_LIST_MEMBERSHIPS length report
      
         - udp6: fix race condition in udp6_sendmsg & connect
      
         - tcp: fix mishandling when the sack compression is deferred
      
         - rtnetlink: validate link attributes set at creation time
      
         - mptcp: fix connect timeout handling
      
         - eth: stmmac: fix call trace when stmmac_xdp_xmit() is invoked
      
         - eth: amd-xgbe: fix the false linkup in xgbe_phy_status
      
         - eth: mlx5e:
            - fix corner cases in internal buffer configuration
            - drain health before unregistering devlink
      
         - usb: qmi_wwan: set DTR quirk for BroadMobi BM818
      
        Misc:
      
         - tcp: return user_mss for TCP_MAXSEG in CLOSE/LISTEN state if
           user_mss set"
      
      * tag 'net-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (71 commits)
        mptcp: fix active subflow finalization
        mptcp: add annotations around sk->sk_shutdown accesses
        mptcp: fix data race around msk->first access
        mptcp: consolidate passive msk socket initialization
        mptcp: add annotations around msk->subflow accesses
        mptcp: fix connect timeout handling
        rtnetlink: add the missing IFLA_GRO_ tb check in validate_linkmsg
        rtnetlink: move IFLA_GSO_ tb check to validate_linkmsg
        rtnetlink: call validate_linkmsg in rtnl_create_link
        ice: recycle/free all of the fragments from multi-buffer frame
        net: phy: mxl-gpy: extend interrupt fix to all impacted variants
        net: renesas: rswitch: Fix return value in error path of xmit
        net: dsa: mv88e6xxx: Increase wait after reset deactivation
        net: ipa: Use correct value for IPA_STATUS_SIZE
        tcp: fix mishandling when the sack compression is deferred.
        net/sched: flower: fix possible OOB write in fl_set_geneve_opt()
        sfc: fix error unwinds in TC offload
        net/mlx5: Read embedded cpu after init bit cleared
        net/mlx5e: Fix error handling in mlx5e_refresh_tirs
        net/mlx5: Ensure af_desc.mask is properly initialized
        ...
    • fork, vhost: Use CLONE_THREAD to fix freezer/ps regression · f9010dbd
      Mike Christie authored
      When switching from kthreads to vhost_tasks two bugs were added:
      1. The vhost worker tasks now show up as processes, so scripts doing
         ps or ps a would now incorrectly detect the vhost task as another
         process.
      2. kthreads disabled freeze by setting PF_NOFREEZE, but vhost tasks
         didn't disable freezing or add support for it.
      
      To fix both bugs, this switches the vhost task to be a thread in the
      process that does the VHOST_SET_OWNER ioctl, and has vhost_worker call
      get_signal to support SIGKILL/SIGSTOP and freeze signals. Note that
      SIGKILL/STOP support is required because CLONE_THREAD requires
      CLONE_SIGHAND which requires those 2 signals to be supported.
      
      This is a modified version of the patch written by Mike Christie
      <michael.christie@oracle.com> which was a modified version of patch
      originally written by Linus.
      
      Much of what depended upon PF_IO_WORKER now depends on PF_USER_WORKER,
      including ignoring signals, setting up the register state, and having
      get_signal return instead of calling do_group_exit.
      
      Tidied up the vhost_task abstraction so that the definition of
      vhost_task only needs to be visible inside of vhost_task.c, making
      it easier to review the code and tell what needs to be done where.
      As part of this the main loop has been moved from vhost_worker into
      vhost_task_fn.  vhost_worker now returns true if work was done.
      
      The main loop has been updated to call get_signal which handles
      SIGSTOP, freezing, and collects the message that tells the thread to
      exit as part of process exit.  This collection clears
      __fatal_signal_pending.  This collection is not guaranteed to
      clear signal_pending() so clear that explicitly so the schedule()
      sleeps.
      
      For now the vhost thread continues to exist and run work until the
      last file descriptor is closed and the release function is called as
      part of freeing struct file.  To avoid hangs in the coredump
      rendezvous and when killing threads in a multi-threaded exec, the
      coredump code and de_thread have been modified to ignore vhost threads.
      
      Removing the special case for exec appears to require teaching
      vhost_dev_flush how to directly complete transactions in case
      the vhost thread is no longer running.
      
      Removing the special case for coredump rendezvous requires either the
      above fix needed for exec or moving the coredump rendezvous into
      get_signal.
      
      Fixes: 6e890c5d ("vhost: use vhost_tasks for worker threads")
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Co-developed-by: Mike Christie <michael.christie@oracle.com>
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'mlx5-fixes-2023-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · a451b8eb
      Jakub Kicinski authored
      Saeed Mahameed says:
      
      ====================
      mlx5 fixes 2023-05-31
      
      This series provides bug fixes to mlx5 driver.
      
      * tag 'mlx5-fixes-2023-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
        net/mlx5: Read embedded cpu after init bit cleared
        net/mlx5e: Fix error handling in mlx5e_refresh_tirs
        net/mlx5: Ensure af_desc.mask is properly initialized
        net/mlx5: Fix setting of irq->map.index for static IRQ case
        net/mlx5: Remove rmap also in case dynamic MSIX not supported
      ====================
      
      Link: https://lore.kernel.org/r/20230601031051.131529-1-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'mptcp-fixes-for-connect-timeout-access-annotations-and-subflow-init' · 66dd1014
      Jakub Kicinski authored
      Mat Martineau says:
      
      ====================
      mptcp: Fixes for connect timeout, access annotations, and subflow init
      
      Patch 1 allows the SO_SNDTIMEO sockopt to correctly change the connect
      timeout on MPTCP sockets.
      
      Patches 2-5 add READ_ONCE()/WRITE_ONCE() annotations to fix KCSAN issues.
      
      Patch 6 correctly initializes some subflow fields on outgoing connections.
      ====================
      
      Link: https://lore.kernel.org/r/20230531-send-net-20230531-v1-0-47750c420571@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: fix active subflow finalization · 55b47ca7
      Paolo Abeni authored
      Active subflows are inserted into the connection list at creation time.
      When the MPJ handshake completes successfully, a new subflow creation
      netlink event is generated correctly, but the current code wrongly
      skips initializing a couple of subflow fields.
      
      The above will cause misbehavior on a few exceptional events: unneeded
      mptcp-level retransmission on msk-level sequence wrap-around and infinite
      mapping fallback even when a MPJ socket is present.
      
      Address the issue by factoring out the needed initialization into a new
      helper and invoking it from __mptcp_finish_join() for passive
      subflows and from mptcp_finish_join() for active ones.
      
      Fixes: 0530020a ("mptcp: track and update contiguous data status")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: add annotations around sk->sk_shutdown accesses · 6b9831bf
      Paolo Abeni authored
      Christoph reported the mptcp variant of a recently addressed plain
      TCP issue. Similar to commit e14cadfd ("tcp: add annotations around
      sk->sk_shutdown accesses") add READ/WRITE ONCE annotations to silence
      KCSAN reports around lockless sk_shutdown access.
      
      Fixes: 71ba088c ("mptcp: cleanup accept and poll")
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/401
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
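The annotation pattern named in the commit above can be sketched in userspace C. The two macros below are simplified approximations written for this example, not the kernel's definitions (which do more), and they rely on the GNU `__typeof__` extension. `struct fake_sock`, `fake_set_shutdown()` and `fake_rcv_shutdown()` are hypothetical names. `RCV_SHUTDOWN` and `SEND_SHUTDOWN` mirror the kernel's flag values.

```c
#include <stdint.h>

/* Simplified approximations of the kernel macros: a volatile access
 * forces the compiler to emit exactly one load or store and keeps it
 * from tearing, fusing, or re-reading the access. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

#define RCV_SHUTDOWN  1
#define SEND_SHUTDOWN 2

/* Hypothetical stand-in for the socket field being annotated. */
struct fake_sock {
	uint8_t sk_shutdown;
};

/* Writer side: assumed to run with the socket lock held, so the read
 * of the old value can stay plain, but the store is marked because
 * lockless readers may observe it concurrently. */
static void fake_set_shutdown(struct fake_sock *sk, uint8_t how)
{
	WRITE_ONCE(sk->sk_shutdown, sk->sk_shutdown | how);
}

/* Reader side: runs without the lock, so the load is marked. */
static int fake_rcv_shutdown(const struct fake_sock *sk)
{
	return (READ_ONCE(sk->sk_shutdown) & RCV_SHUTDOWN) != 0;
}
```

The markings do not add ordering or mutual exclusion. They document that the lockless access is intentional, which is what lets a tool like KCSAN stop reporting it.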
    • mptcp: fix data race around msk->first access · 1b1b43ee
      Paolo Abeni authored
      The first subflow socket is accessed outside the msk socket lock
      by mptcp_subflow_fail(), so we need to annotate each write access
      with WRITE_ONCE, but a few spots still lack it.
      
      Fixes: 76a13b31 ("mptcp: invoke MP_FAIL response when needed")
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: consolidate passive msk socket initialization · 7e8b88ec
      Paolo Abeni authored
      When the msk socket is cloned at MPC handshake time, a few
      fields are initialized in a racy way outside mptcp_sk_clone()
      and the msk socket lock.
      
      The above is due to historical reasons: before commit a88d0092
      ("mptcp: simplify subflow_syn_recv_sock()"), the first subflow socket
      carrying all the needed data was not yet available at msk creation
      time.
      
      We can now refactor the code by moving the missing initialization bit
      under the socket lock, removing the init race and avoiding some
      code duplication.
      
      This will also simplify the next patch, as all msk->first write
      accesses are now under the msk socket lock.
      
      Fixes: 0397c6d8 ("mptcp: keep unaccepted MPC subflow into join list")
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: add annotations around msk->subflow accesses · 5b825727
      Paolo Abeni authored
      MPTCP can access the first subflow socket in a few spots
      outside the socket lock scope. That is actually safe, as MPTCP
      will delete the socket itself only after the msk sock close().
      
      Still, such accesses cause a few KCSAN splats, as reported
      by Christoph. Silence the harmless warnings by adding a few annotations
      around the relevant accesses.
      
      Fixes: 71ba088c ("mptcp: cleanup accept and poll")
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/402
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: fix connect timeout handling · 786fc124
      Paolo Abeni authored
      Ondrej reported a functional issue WRT timeout handling on connect
      with a nice reproducer.
      
      The problem is that the current mptcp connect waits for both the
      MPTCP socket level timeout, and the first subflow socket timeout.
      The latter is not influenced/touched by the exposed setsockopt().
      
      Overall the above makes the SO_SNDTIMEO a no-op on connect.
      
      Since mptcp_connect is invoked via inet_stream_connect and the
      latter properly handles the MPTCP level timeout, we can address the
      issue by making the nested subflow level connect always non-blocking.
      
      This also allows simplifying the code a bit, dropping an ugly hack
      to handle the fastopen and custom proto_ops connect.
      
      The issue predates the blamed commit below, but the current resolution
      requires the infrastructure introduced there.
      
      Fixes: 54f1944e ("mptcp: factor out mptcp_connect()")
      Reported-by: Ondrej Mosnacek <omosnace@redhat.com>
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/399
      Cc: stable@vger.kernel.org
      Reviewed-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
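From userspace, the behaviour this commit restores is the ordinary SO_SNDTIMEO pattern: set the option, then call connect(). The sketch below shows only the option handling and leaves the connect() call to the caller. The helper names `open_stream_socket()`, `set_send_timeout()` and `get_send_timeout()` are hypothetical, and the fallback to plain TCP is an assumption made so the example also works on kernels built without MPTCP.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262 /* Linux protocol number for MPTCP */
#endif

/* Open an MPTCP stream socket, falling back to plain TCP when the
 * running kernel does not support MPTCP. Returns the fd, or -1. */
static int open_stream_socket(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);

	if (fd < 0)
		fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	return fd;
}

/* Set the send timeout, which the commit above says should also bound
 * a blocking connect(). Returns 0 on success, -1 on error. */
static int set_send_timeout(int fd, long seconds)
{
	struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };

	return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
}

/* Read the timeout back; returns whole seconds, or -1 on error. */
static long get_send_timeout(int fd)
{
	struct timeval tv;
	socklen_t len = sizeof(tv);

	if (getsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, &len) < 0)
		return -1;
	return (long)tv.tv_sec;
}
```

A caller would open the socket, call `set_send_timeout(fd, 2)`, and then issue a blocking connect(). Before this fix, per the commit message, the first subflow's own timeout was not affected by the option, so the setting had no effect on MPTCP connects.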