1. 13 Apr, 2024 22 commits
  2. 12 Apr, 2024 18 commits
    • Merge branch 'nfp-minor-improvements' · 982a73c7
      David S. Miller authored
      Louis Peens says:
      
      ====================
      nfp: series of minor driver improvements
      
      This short series now only includes a small update to add a
      board part number to devlink. Previously, some dim patches also formed
      part of this series; these were dropped in v5.
      
      Patch1: Add new define for devlink string "board.part_number"
      Patch2: Make use of this field in the nfp driver
      
      Changes since V4:
      - Dropped the dim patches, as there is a more significant rework in
        progress to make it more flexible, as mentioned in the V4 review:
        https://lore.kernel.org/all/1712547870-112976-2-git-send-email-hengqi@linux.alibaba.com/
      - Updated the devlink description of 'board.part_number'
      
      Changes since V3:
      - Fixed: Documentation/networking/devlink/devlink-info.rst:150:
          WARNING: Title underline too short.
      
      Changes since V2:
      - After some discussion on the previous series it was agreed that only
        the "board.part_number" field makes sense in the common code. The
        "board.model" field which was moved to devlink common code in V1 is
        now kept in the driver. The field is specific to the nfp driver,
        exposing the codename of the board.
      - In summary, add "board.part_number" to devlink, and populate it
        in the nfp driver.
      
      Changes since V1:
      - Move nfp local defines to devlink common code as it is quite generic.
      - Add new 'dim' profile instead of using driver local overrides, as this
        allows use of the 'dim' helpers.
      - This expanded 2 patches to 4, as the common code changes are split
        into separate patches.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: update devlink device info output · 8910f93b
      Fei Qin authored
      Newer NICs will introduce a new part number, so add it
      to the devlink device info.
      
      This patch also updates the information of "board.id" in
      nfp.rst to match the devlink-info.rst.
      Signed-off-by: Fei Qin <fei.qin@corigine.com>
      Signed-off-by: Louis Peens <louis.peens@corigine.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: add a new info version tag · 3bb946c9
      Fei Qin authored
      Add definition and documentation for the new generic
      info "board.part_number".
      
      The new one is for part number specific use, and board.id
      is modified to match the documentation in devlink-info.
      Signed-off-by: Fei Qin <fei.qin@corigine.com>
      Signed-off-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: increase the default TCP scaling ratio · 697a6c8c
      Hechao Li authored
      After commit dfa2f048 ("tcp: get rid of sysctl_tcp_adv_win_scale"),
      we noticed an application-level timeout due to reduced throughput.
      
      Before the commit, for a client that sets SO_RCVBUF to 65k, it takes
      around 22 seconds to transfer 10M of data. After the commit, it takes 40
      seconds. Because our application has a 30-second timeout, this
      regression broke the application.
      
      The reason that it takes longer to transfer data is that
      tp->scaling_ratio is initialized to a value that results in ~0.25 of
      rcvbuf. In our case, SO_RCVBUF is set to 65536 by the application, which
      translates to 2 * 65536 = 131,072 bytes in rcvbuf and hence a ~28k
      initial receive window.
      
      Later, even though the scaling_ratio is updated to a more accurate
      skb->len/skb->truesize, which is ~0.66 in our environment, the window
      stays at ~0.25 * rcvbuf. This is because tp->window_clamp does not
      change together with the tp->scaling_ratio update when autotuning is
      disabled due to SO_RCVBUF. As a result, the window size is capped at the
      initial window_clamp, which is also ~0.25 * rcvbuf, and never grows
      bigger.
      
      Most modern applications let the kernel do autotuning, and benefit from
      the increased scaling_ratio. But some applications, such as Kafka,
      have a default setting of SO_RCVBUF=64k.
      
      This patch increases the initial scaling_ratio from ~25% to 50% in order
      to make it backward compatible with the original default
      sysctl_tcp_adv_win_scale for applications setting SO_RCVBUF.
      
      Fixes: dfa2f048 ("tcp: get rid of sysctl_tcp_adv_win_scale")
      Signed-off-by: Hechao Li <hli@netflix.com>
      Reviewed-by: Tycho Andersen <tycho@tycho.pizza>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/netdev/20240402215405.432863-1-hli@netflix.com/
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'rtl8226b-serdes-switching' · c31bd5b6
      David S. Miller authored
      Eric Woudstra says:
      
      ====================
      rtl8226b/8221b add C45 instances and SerDes switching
      
      Based on the comments in [PATCH net-next]
      "Realtek RTL822x PHY rework to c45 and SerDes interface switching"
      
      Add SerDes interface switching between 2500base-x and sgmii for
      rtl8221b and rtl8226b.
      
      Add get_rate_matching() for rtl8226b and rtl8221b, reading the serdes
      mode from phy.
      
      Driver instances are added for rtl8226b and rtl8221b for Clause 45
      access only. The existing code is not touched; the new instances use
      newly added functions. They also use the same rtl822xb_config_init() and
      rtl822xb_get_rate_matching(), as these functions can also be used for
      direct Clause 45 access. Also add definitions of MMD 31 registers,
      which can only be used when phydev->is_c45 is set, not through
      C45-over-C22.
      
      Change rtlgen_get_speed() so the register value is passed as argument.
      Using Clause 45 access, this value is retrieved differently.
      Rename it to rtlgen_decode_speed() and add a call to it in
      rtl822x_c45_read_status().
      
      Add rtl822x_c45_get_features() to set supported port for rtl8221b.
      
      Finally, one quirk is added for SFP modules known to have an rtl8221b
      behind the RollBall protocol, which is Clause 45 only.
      
      Changed in PATCH v4:
      * Changed switch to if statement in rtl822xb_get_rate_matching()
      * Removed setting ETHTOOL_LINK_MODE_MII_BIT in rtl822x_c45_get_features()
      
      Changed in PATCH v3:
      * Only apply to rtl8221b and rtl8226b PHYs
      * Set phydev->rate_matching in .config_init()
      * Removed OEM SFP fixup for now, as there are modules with the same
        vendor name/PN, but with different PHYs. We found rtl8221b, but
        also the ty8821, which is not yet supported.
      
      Changed in PATCH v2:
      * Set author to Marek for the commit of the new C45 instances
      * Separate commit for setting supported ports
      * Renamed rtlgen_get_speed to rtlgen_decode_speed
      * Always fill in possible interfaces
      * Renamed sfp_fixup_oem_2_5g to sfp_fixup_oem_2_5gbaset
      * Only update phydev->interface when link is up
      
      Alexander Couzens (1):
        net: phy: realtek: configure SerDes mode for rtl822xb PHYs
      
      Eric Woudstra (3):
        net: phy: realtek: add get_rate_matching() for rtl822xb PHYs
        net: phy: realtek: Change rtlgen_get_speed() to rtlgen_decode_speed()
        net: phy: realtek: add rtl822x_c45_get_features() to set supported
          port
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sfp: add quirk for another multigig RollBall transceiver · 1c77c721
      Marek Behún authored
      Add a quirk for another RollBall copper transceiver, the Turris
      RTSFP-2.5G, containing a 2.5G-capable RTL8221B PHY.
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: realtek: add rtl822x_c45_get_features() to set supported port · 2d9ce648
      Eric Woudstra authored
      Sets ETHTOOL_LINK_MODE_TP_BIT in phydev->supported.
      Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: realtek: Change rtlgen_get_speed() to rtlgen_decode_speed() · 2e4ea707
      Eric Woudstra authored
      The register value used to determine the speed is retrieved
      differently when only Clause 45 access is available. To reuse the
      rtlgen_get_speed() function in this case, pass the register value as
      an argument to rtlgen_get_speed(). The function would then always
      return 0, so change its return type to void. A better name for the
      function is now rtlgen_decode_speed().
      
      Replace a call to genphy_read_status() followed by rtlgen_get_speed()
      with a call to rtlgen_read_status() in rtl822x_read_status().
      
      Add reading speed to rtl822x_c45_read_status().
      Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: realtek: Add driver instances for rtl8221b via Clause 45 · ad5ce743
      Marek Behún authored
      Collected from several commits in [PATCH net-next]
      "Realtek RTL822x PHY rework to c45 and SerDes interface switching"
      
      The instances are used by PHYs that are only accessible via Clause 45
      on several SFP modules using the RollBall protocol.
      Signed-off-by: Marek Behún <kabel@kernel.org>
      [ Added matching functions to differentiate C45 instances ]
      Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: realtek: add get_rate_matching() for rtl822xb PHYs · c189dbd7
      Eric Woudstra authored
      Use a vendor register to determine whether the SerDes is set up in
      rate-matching mode.

      Rate matching is only supported when the SerDes is set to 2500base-x.
      Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: realtek: configure SerDes mode for rtl822xb PHYs · deb8af52
      Alexander Couzens authored
      The rtl8221b and rtl8226b series support switching SerDes mode between
      2500base-x and sgmii based on the negotiated copper speed.
      
      Configure this switching mode according to the SerDes modes supported
      by the host.
      
      There is an additional datasheet for RTL8226B/RTL8221B called
      "SERDES MODE SETTING FLOW APPLICATION NOTE" where a sequence is
      described to set up the interface and rate adapter mode.
      
      However, there is no documentation about the meaning of registers
      and bits, it's literally just magic numbers and pseudo-code.
      Signed-off-by: Alexander Couzens <lynxis@fe80.eu>
      [ refactored, dropped HiSGMII mode and changed commit message ]
      Signed-off-by: Marek Behún <kabel@kernel.org>
      [ changed rtl822x_update_interface() to use vendor register ]
      [ always fill in possible interfaces ]
      [ only apply to rtl8221b and rtl8226b phy's ]
      [ set phydev->rate_matching in .config_init() ]
      Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-dsa-allow-phylink_mac_ops-in-dsa-drivers' · af74be9f
      Jakub Kicinski authored
      Russell King says:
      
      ====================
      net: dsa: allow phylink_mac_ops in DSA drivers
      
      This series showcases my idea of moving the phylink_mac_ops into DSA
      drivers, using mv88e6xxx as an example. Since I'm only changing one
      driver, providing the mac_ops has to be optional and the existing shims
      need to be kept for unconverted drivers.
      
      The first patch introduces a new helper that converts from the
      phylink_config structure that phylink uses to communicate with MAC
      drivers to the dsa_port structure. From this, DSA drivers can get
      the dsa_switch structure and thus their implementation specific
      data structure, and they can also retrieve the port index.
      
      The second patch adds the support to the core DSA layer to allow
      DSA drivers to provide phylink_mac_ops.
      
      The third patch converts mv88e6xxx to use this.
      
      I initially made this change after adding yet more phylink to DSA
      driver shims for my work with phylink-based EEE support, and decided
      that it was getting silly to keep implementing more and more shims.
      There are cases where shims don't work well - we had already tripped
      over a case a few years ago when the phylink mac_select_pcs operation
      was introduced. Phylink tested for the presence of this in the ops
      structure, but with DSA shims, this doesn't necessarily mean that
      the sub-driver supports this method. The only way to find that out
      is to call the method with dummy values and check the return code.
      
      The same thing was partly true when adding EEE support, and I ended
      up with this in phylink to determine whether the MAC supported EEE:
      
      +static bool phylink_mac_supports_eee(struct phylink *pl)
      +{
      +       return pl->mac_ops->mac_disable_tx_lpi &&
      +              pl->mac_ops->mac_enable_tx_lpi &&
      +              pl->config->lpi_capabilities;
      +}
      
      because merely testing for the presence of the operations is
      insufficient when shims are involved - and it wasn't possible to call
      these functions in the way that mac_select_pcs could be called.
      
      So, I think it's time to get away from this shimming model and instead
      have drivers directly interface to the various subsystems.
      
      This converts mv88e6xxx. I have similar patches for other DSA drivers
      that will be sent once this has been reviewed.
      ====================
      
      Link: https://lore.kernel.org/r/ZhbrbM+d5UfgafGp@shell.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: mv88e6xxx: provide own phylink MAC operations · 0cb6da0c
      Russell King (Oracle) authored
      Convert mv88e6xxx to provide its own phylink MAC operations, thus
      avoiding the shim layer in DSA's port.c.
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Link: https://lore.kernel.org/r/E1rudqK-006K9N-HY@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: allow DSA switch drivers to provide their own phylink mac ops · cae425cb
      Russell King (Oracle) authored
      Rather than having a shim for each and every phylink MAC operation,
      allow DSA switch drivers to provide their own ops structure. When a
      DSA driver provides the phylink MAC operations, the shimmed ops must
      not be provided, so fail an attempt to register a switch with both
      the phylink_mac_ops in struct dsa_switch and the phylink_mac_*
      operations in dsa_switch_ops populated.
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Link: https://lore.kernel.org/r/E1rudqF-006K9H-Cc@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: introduce dsa_phylink_to_port() · dd0c9855
      Russell King (Oracle) authored
      We convert from a phylink_config struct to a dsa_port struct in many
      places, let's provide a helper for this.
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Link: https://lore.kernel.org/r/E1rudqA-006K9B-85@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tls: remove redundant assignment to variable decrypted · f7ac8fbd
      Colin Ian King authored
      The variable decrypted is being assigned a value that is never read:
      control flow after the assignment goes via a return path, and
      decrypted is not referenced in this path. The assignment is redundant
      and can be removed.
      
      Cleans up clang scan warning:
      net/tls/tls_sw.c:2150:4: warning: Value stored to 'decrypted' is never
      read [deadcode.DeadStores]
      Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
      Link: https://lore.kernel.org/r/20240410144136.289030-1-colin.i.king@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv4: Remove RTO_ONLINK. · 5618603f
      Guillaume Nault authored
      RTO_ONLINK was a flag used in ->flowi4_tos that allowed altering the
      scope of an IPv4 route lookup. Setting this flag was equivalent to
      specifying RT_SCOPE_LINK in ->flowi4_scope.
      
      With commit ec20b283 ("ipv4: Set scope explicitly in
      ip_route_output()."), the last users of RTO_ONLINK have been removed.
      Therefore, we can now drop the code that checked this bit and stop
      modifying ->flowi4_scope in ip_route_output_key_hash().
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/57de760565cab55df7b129f523530ac6475865b2.1712754146.git.gnault@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp: add support for SO_PEEK_OFF socket option · 05ea4916
      Jon Maloy authored
      When reading received messages from a socket with MSG_PEEK, we may want
      to read the contents with an offset, like we can do with pread/preadv()
      when reading files. Currently, it is not possible to do that.
      
      In this commit, we add support for the SO_PEEK_OFF socket option for TCP,
      similar to how it is done for Unix Domain sockets.
      
      In the iperf3 log examples shown below, we can observe a throughput
      improvement of 15-20% in the direction host->namespace when using the
      protocol splicer 'pasta' (https://passt.top).
      This is a consistent result.
      
      pasta(1) and passt(1) implement user-mode networking for network
      namespaces (containers) and virtual machines by means of a translation
      layer between a Layer-2 network interface and native Layer-4 sockets
      (TCP, UDP, ICMP/ICMPv6 echo).
      
      Received TCP data pending delivery to the container/guest is kept in
      kernel buffers until acknowledged, so the tool routinely needs to fetch
      new data from the socket, skipping data that was already sent.
      
      At the moment this is implemented using a dummy buffer passed to
      recvmsg(). With this change, we don't need a dummy buffer and the
      related buffer copy (copy_to_user()) anymore.
      
      passt and pasta are supported in KubeVirt and libvirt/qemu.
      
      jmaloy@freyr:~/passt$ perf record -g ./pasta --config-net -f
      SO_PEEK_OFF not supported by kernel.
      
      jmaloy@freyr:~/passt# iperf3 -s
      -----------------------------------------------------------
      Server listening on 5201 (test #1)
      -----------------------------------------------------------
      Accepted connection from 192.168.122.1, port 44822
      [  5] local 192.168.122.180 port 5201 connected to 192.168.122.1 port 44832
      [ ID] Interval           Transfer     Bitrate
      [  5]   0.00-1.00   sec  1.02 GBytes  8.78 Gbits/sec
      [  5]   1.00-2.00   sec  1.06 GBytes  9.08 Gbits/sec
      [  5]   2.00-3.00   sec  1.07 GBytes  9.15 Gbits/sec
      [  5]   3.00-4.00   sec  1.10 GBytes  9.46 Gbits/sec
      [  5]   4.00-5.00   sec  1.03 GBytes  8.85 Gbits/sec
      [  5]   5.00-6.00   sec  1.10 GBytes  9.44 Gbits/sec
      [  5]   6.00-7.00   sec  1.11 GBytes  9.56 Gbits/sec
      [  5]   7.00-8.00   sec  1.07 GBytes  9.20 Gbits/sec
      [  5]   8.00-9.00   sec   667 MBytes  5.59 Gbits/sec
      [  5]   9.00-10.00  sec  1.03 GBytes  8.83 Gbits/sec
      [  5]  10.00-10.04  sec  30.1 MBytes  6.36 Gbits/sec
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID] Interval           Transfer     Bitrate
      [  5]   0.00-10.04  sec  10.3 GBytes  8.78 Gbits/sec   receiver
      -----------------------------------------------------------
      Server listening on 5201 (test #2)
      -----------------------------------------------------------
      ^Ciperf3: interrupt - the server has terminated
      jmaloy@freyr:~/passt#
      logout
      [ perf record: Woken up 23 times to write data ]
      [ perf record: Captured and wrote 5.696 MB perf.data (35580 samples) ]
      jmaloy@freyr:~/passt$
      
      jmaloy@freyr:~/passt$ perf record -g ./pasta --config-net -f
      SO_PEEK_OFF supported by kernel.
      
      jmaloy@freyr:~/passt# iperf3 -s
      -----------------------------------------------------------
      Server listening on 5201 (test #1)
      -----------------------------------------------------------
      Accepted connection from 192.168.122.1, port 52084
      [  5] local 192.168.122.180 port 5201 connected to 192.168.122.1 port 52098
      [ ID] Interval           Transfer     Bitrate
      [  5]   0.00-1.00   sec  1.32 GBytes  11.3 Gbits/sec
      [  5]   1.00-2.00   sec  1.19 GBytes  10.2 Gbits/sec
      [  5]   2.00-3.00   sec  1.26 GBytes  10.8 Gbits/sec
      [  5]   3.00-4.00   sec  1.36 GBytes  11.7 Gbits/sec
      [  5]   4.00-5.00   sec  1.33 GBytes  11.4 Gbits/sec
      [  5]   5.00-6.00   sec  1.21 GBytes  10.4 Gbits/sec
      [  5]   6.00-7.00   sec  1.31 GBytes  11.2 Gbits/sec
      [  5]   7.00-8.00   sec  1.25 GBytes  10.7 Gbits/sec
      [  5]   8.00-9.00   sec  1.33 GBytes  11.5 Gbits/sec
      [  5]   9.00-10.00  sec  1.24 GBytes  10.7 Gbits/sec
      [  5]  10.00-10.04  sec  56.0 MBytes  12.1 Gbits/sec
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID] Interval           Transfer     Bitrate
      [  5]   0.00-10.04  sec  12.9 GBytes  11.0 Gbits/sec  receiver
      -----------------------------------------------------------
      Server listening on 5201 (test #2)
      -----------------------------------------------------------
      ^Ciperf3: interrupt - the server has terminated
      logout
      [ perf record: Woken up 20 times to write data ]
      [ perf record: Captured and wrote 5.040 MB perf.data (33411 samples) ]
      jmaloy@freyr:~/passt$
      
      The perf record confirms this result. Below, we can observe that the
      CPU spends significantly less time in the function ____sys_recvmsg()
      when we have offset support.
      
      Without offset support:
      ----------------------
      jmaloy@freyr:~/passt$ perf report -q --symbol-filter=do_syscall_64 \
                             -p ____sys_recvmsg -x --stdio -i  perf.data | head -1
      46.32%     0.00%  passt.avx2  [kernel.vmlinux]  [k] do_syscall_64  ____sys_recvmsg
      
      With offset support:
      ----------------------
      jmaloy@freyr:~/passt$ perf report -q --symbol-filter=do_syscall_64 \
                             -p ____sys_recvmsg -x --stdio -i  perf.data | head -1
      28.12%     0.00%  passt.avx2  [kernel.vmlinux]  [k] do_syscall_64  ____sys_recvmsg
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Jon Maloy <jmaloy@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240409152805.913891-1-jmaloy@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>