1. 20 Aug, 2024 17 commits
    • Gal Pressman
      net: Silence false field-spanning write warning in metadata_dst memcpy · 13cfd6a6
      Gal Pressman authored
      When a metadata_dst struct is allocated (using metadata_dst_alloc()), it
      reserves room for options at the end of the struct.
      
      Change the memcpy() to unsafe_memcpy() as it is guaranteed that enough
      room (md_size bytes) was allocated and the field-spanning write is
      intentional.
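
      A sketch of the converted call (based on tun_dst_unclone() as seen in the
      warning below; the justification comment and exact spelling of the hunk are
      illustrative): unsafe_memcpy() takes the same arguments as memcpy() plus a
      justification, and opts that single call out of the FORTIFY field-spanning
      check:

        unsafe_memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
                      sizeof(struct ip_tunnel_info) + md_size,
                      /* metadata_dst_alloc() reserved md_size bytes of option
                       * space right after u.tun_info, so writing past the
                       * field is intentional.
                       */);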
      
      This resolves the following warning:
      	------------[ cut here ]------------
      	memcpy: detected field-spanning write (size 104) of single field "&new_md->u.tun_info" at include/net/dst_metadata.h:166 (size 96)
      	WARNING: CPU: 2 PID: 391470 at include/net/dst_metadata.h:166 tun_dst_unclone+0x114/0x138 [geneve]
      	Modules linked in: act_tunnel_key geneve ip6_udp_tunnel udp_tunnel act_vlan act_mirred act_skbedit cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress sbsa_gwdt ipmi_devintf ipmi_msghandler xfrm_interface xfrm6_tunnel tunnel6 tunnel4 xfrm_user xfrm_algo nvme_fabrics overlay optee openvswitch nsh nf_conncount ib_srp scsi_transport_srp rpcrdma rdma_ucm ib_iser rdma_cm ib_umad iw_cm libiscsi ib_ipoib scsi_transport_iscsi ib_cm uio_pdrv_genirq uio mlxbf_pmc pwr_mlxbf mlxbf_bootctl bluefield_edac nft_chain_nat binfmt_misc xt_MASQUERADE nf_nat xt_tcpmss xt_NFLOG nfnetlink_log xt_recent xt_hashlimit xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_mark xt_comment ipt_REJECT nf_reject_ipv4 nft_compat nf_tables nfnetlink sch_fq_codel dm_multipath fuse efi_pstore ip_tables btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq raid1 raid0 nvme nvme_core mlx5_ib ib_uverbs ib_core ipv6 crc_ccitt mlx5_core crct10dif_ce mlxfw
      	 psample i2c_mlxbf gpio_mlxbf2 mlxbf_gige mlxbf_tmfifo
      	CPU: 2 PID: 391470 Comm: handler6 Not tainted 6.10.0-rc1 #1
      	Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS 4.5.0.12993 Dec  6 2023
      	pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      	pc : tun_dst_unclone+0x114/0x138 [geneve]
      	lr : tun_dst_unclone+0x114/0x138 [geneve]
      	sp : ffffffc0804533f0
      	x29: ffffffc0804533f0 x28: 000000000000024e x27: 0000000000000000
      	x26: ffffffdcfc0e8e40 x25: ffffff8086fa6600 x24: ffffff8096a0c000
      	x23: 0000000000000068 x22: 0000000000000008 x21: ffffff8092ad7000
      	x20: ffffff8081e17900 x19: ffffff8092ad7900 x18: 00000000fffffffd
      	x17: 0000000000000000 x16: ffffffdcfa018488 x15: 695f6e75742e753e
      	x14: 2d646d5f77656e26 x13: 6d5f77656e262220 x12: 646c65696620656c
      	x11: ffffffdcfbe33ae8 x10: ffffffdcfbe1baa8 x9 : ffffffdcfa0a4c10
      	x8 : 0000000000017fe8 x7 : c0000000ffffefff x6 : 0000000000000001
      	x5 : ffffff83fdeeb010 x4 : 0000000000000000 x3 : 0000000000000027
      	x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff80913f6780
      	Call trace:
      	 tun_dst_unclone+0x114/0x138 [geneve]
      	 geneve_xmit+0x214/0x10e0 [geneve]
      	 dev_hard_start_xmit+0xc0/0x220
      	 __dev_queue_xmit+0xa14/0xd38
      	 dev_queue_xmit+0x14/0x28 [openvswitch]
      	 ovs_vport_send+0x98/0x1c8 [openvswitch]
      	 do_output+0x80/0x1a0 [openvswitch]
      	 do_execute_actions+0x172c/0x1958 [openvswitch]
      	 ovs_execute_actions+0x64/0x1a8 [openvswitch]
      	 ovs_packet_cmd_execute+0x258/0x2d8 [openvswitch]
      	 genl_family_rcv_msg_doit+0xc8/0x138
      	 genl_rcv_msg+0x1ec/0x280
      	 netlink_rcv_skb+0x64/0x150
      	 genl_rcv+0x40/0x60
      	 netlink_unicast+0x2e4/0x348
      	 netlink_sendmsg+0x1b0/0x400
      	 __sock_sendmsg+0x64/0xc0
      	 ____sys_sendmsg+0x284/0x308
      	 ___sys_sendmsg+0x88/0xf0
      	 __sys_sendmsg+0x70/0xd8
      	 __arm64_sys_sendmsg+0x2c/0x40
      	 invoke_syscall+0x50/0x128
      	 el0_svc_common.constprop.0+0x48/0xf0
      	 do_el0_svc+0x24/0x38
      	 el0_svc+0x38/0x100
      	 el0t_64_sync_handler+0xc0/0xc8
      	 el0t_64_sync+0x1a4/0x1a8
      	---[ end trace 0000000000000000 ]---
      Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Gal Pressman <gal@nvidia.com>
      Link: https://patch.msgid.link/20240818114351.3612692-1-gal@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      13cfd6a6
    • Zhang Zekun
      net: hns3: Use ARRAY_SIZE() to improve readability · 2cbece60
      Zhang Zekun authored
      There is a helper macro, ARRAY_SIZE(), for calculating the size of a
      u32 array, so we don't need to do it manually. Let's use ARRAY_SIZE()
      to calculate the array size and improve the code readability.
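
      A minimal illustration of the conversion (the array name is made up; this
      is not the actual hns3 hunk):

        static const u32 reg_list[] = { 0x0, 0x4, 0x8, 0xc };
        u32 count;

        /* before: open-coded size calculation */
        count = sizeof(reg_list) / sizeof(u32);

        /* after: let the helper macro do it */
        count = ARRAY_SIZE(reg_list);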
      Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
      Reviewed-by: Jijie Shao <shaojijie@huawei.com>
      Link: https://patch.msgid.link/20240818052518.45489-1-zhangzekun11@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      2cbece60
    • Jakub Kicinski
      selftests: net/forwarding: spawn sh inside vrf to speed up ping loop · 555e5531
      Jakub Kicinski authored
      Looking at timestamped output of netdev CI reveals that
      most of the time in forwarding tests for custom route
      hashing is spent on a single case, namely the test which
      uses ping (mausezahn does not support flow labels).
      
      On a non-debug kernel we spend 714 of the 730 seconds of total test
      runtime (97%) on this test case. While having flow label
      support in a traffic gen tool / mausezahn would be best,
      we can significantly speed up the loop by putting ip vrf exec
      outside of the iteration.
      
      In a test of 1000 pings, using a normal loop takes 50 seconds
      to finish, while using:
      
        ip vrf exec $vrf sh -c "$loop-body"
      
      takes 12 seconds (1/4 of the time).
      
      Some of the slowness is likely due to our inefficient virtualization
      setup, but even on my laptop running "ip link help" 16k times takes
      25-30 seconds, so I think it's worth optimizing even for fastest
      setups.
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Tested-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
      Link: https://patch.msgid.link/20240817203659.712085-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      555e5531
    • Zhang Zekun
      net: ethernet: ibm: Simplify code with for_each_child_of_node() · 79765386
      Zhang Zekun authored
      for_each_child_of_node() can help iterate through the child device_nodes,
      so we don't need an open-coded while loop. No functional change with this
      conversion.
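
      Roughly, the conversion looks like this (parent/child variable names and
      the loop body are placeholders, not the ibm/emac code):

        struct device_node *child = NULL;

        /* before: open-coded iteration */
        while ((child = of_get_next_child(parent, child)) != NULL)
                handle_child(child);

        /* after: use the iterator helper */
        for_each_child_of_node(parent, child)
                handle_child(child);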
      Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20240816015837.109627-1-zhangzekun11@huawei.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      79765386
    • Paolo Abeni
      Merge branch 'preparations-for-fib-rule-dscp-selector' · 6b2efdc4
      Paolo Abeni authored
      Ido Schimmel says:
      
      ====================
      Preparations for FIB rule DSCP selector
      
      This patchset moves the masking of the upper DSCP bits in 'flowi4_tos'
      to the core instead of relying on callers of the FIB lookup API to do
      it.
      
      This will allow us to start changing users of the API to initialize the
      'flowi4_tos' field with all six bits of the DSCP field. In turn, this
      will allow us to extend FIB rules with a new DSCP selector.
      
      By masking the upper DSCP bits in the core we are able to maintain the
      behavior of the TOS selector in FIB rules and routes to only match on
      the lower DSCP bits.
      
      While working on this I found two users of the API that do not mask the
      upper DSCP bits before performing the lookup. The first is an ancient
      netlink family that is unlikely to be used. It is adjusted in patch #1
      to mask both the upper DSCP bits and the ECN bits before calling the
      API.
      
      The second user is an nftables module that differs in this regard from
      its equivalent iptables module. It is adjusted in patch #2 to invoke the
      API with the upper DSCP bits masked, like all other callers. The
      relevant selftest passed, but in the unlikely case that regressions are
      reported because of this change, we can restore the existing behavior
      using a new flow information flag as discussed here [1].
      
      The last patch moves the masking of the upper DSCP bits to the core,
      making the first two patches redundant, but I wanted to post them
      separately to call attention to the behavior change for these two users
      of the FIB lookup API.
      
      Future patchsets (around 3) will start unmasking the upper DSCP bits
      throughout the networking stack before adding support for the new FIB
      rule DSCP selector.
      
      Changes from v1 [2]:
      
      Patch #3: Include <linux/ip.h> in <linux/in_route.h> instead of
      including it in net/ip_fib.h
      
      [1] https://lore.kernel.org/netdev/ZpqpB8vJU%2FQ6LSqa@debian/
      [2] https://lore.kernel.org/netdev/20240725131729.1729103-1-idosch@nvidia.com/
      ====================
      
      Link: https://patch.msgid.link/20240814125224.972815-1-idosch@nvidia.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      6b2efdc4
    • Ido Schimmel
      ipv4: Centralize TOS matching · 1fa3314c
      Ido Schimmel authored
      The TOS field in the IPv4 flow information structure ('flowi4_tos') is
      matched by the kernel against the TOS selector in IPv4 rules and routes.
      The field is initialized differently by different call sites. Some treat
      it as DSCP (RFC 2474) and initialize all six DSCP bits, some treat it as
      RFC 1349 TOS and initialize it using RT_TOS() and some treat it as RFC
      791 TOS and initialize it using IPTOS_RT_MASK.
      
      What is common to all these call sites is that they all initialize the
      lower three DSCP bits, which fits the TOS definition in the initial IPv4
      specification (RFC 791).
      
      Therefore, the kernel only allows configuring IPv4 FIB rules that match
      on the lower three DSCP bits which are always guaranteed to be
      initialized by all call sites:
      
       # ip -4 rule add tos 0x1c table 100
       # ip -4 rule add tos 0x3c table 100
       Error: Invalid tos.
      
      While this works, it is unlikely to be very useful. RFC 791, which
      initially defined the TOS and IP precedence fields, was updated by RFC
      2474 over twenty-five years ago, replacing these fields with a single
      six-bit DSCP field.
      
      Extending FIB rules to match on DSCP can be done by adding a new DSCP
      selector while maintaining the existing semantics of the TOS selector
      for applications that rely on that.
      
      A prerequisite for allowing FIB rules to match on DSCP is to adjust all
      the call sites to initialize the high order DSCP bits and remove their
      masking along the path to the core where the field is matched on.
      
      However, making this change alone will result in a behavior change. For
      example, a forwarded IPv4 packet with a DS field of 0xfc will no longer
      match a FIB rule that was configured with 'tos 0x1c'.
      
      This behavior change can be avoided by masking the upper three DSCP bits
      in 'flowi4_tos' before comparing it against the TOS selectors in FIB
      rules and routes.
      
      Implement the above by adding a new function that checks whether a given
      DSCP value matches the one specified in the IPv4 flow information
      structure and invoke it from the three places that currently match on
      'flowi4_tos'.
      
      Use RT_TOS() for the masking of 'flowi4_tos' instead of IPTOS_RT_MASK
      since the latter is not uAPI and we should be able to remove it at some
      point.
      
      Include <linux/ip.h> in <linux/in_route.h> since the former defines
      IPTOS_TOS_MASK which is used in the definition of RT_TOS() in
      <linux/in_route.h>.
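
      A sketch of what such a centralized check could look like (the helper name
      and location are illustrative; the commit only specifies that the flow's
      TOS is masked with RT_TOS() before being compared against the configured
      selector):

        /* Does the configured TOS selector match this flow? The upper
         * three DSCP bits of the flow key are ignored on purpose.
         */
        static inline bool ip_tos_matches_flow(u8 cfg_tos, const struct flowi4 *fl4)
        {
                return cfg_tos == RT_TOS(fl4->flowi4_tos);
        }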
      
      No regressions in FIB tests:
      
       # ./fib_tests.sh
       [...]
       Tests passed: 218
       Tests failed:   0
      
      And FIB rule tests:
      
       # ./fib_rule_tests.sh
       [...]
       Tests passed: 116
       Tests failed:   0
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      1fa3314c
    • Ido Schimmel
      netfilter: nft_fib: Mask upper DSCP bits before FIB lookup · 548a2029
      Ido Schimmel authored
      As part of its functionality, the nftables FIB expression module
      performs a FIB lookup, but unlike other users of the FIB lookup API, it
      does so without masking the upper DSCP bits. In particular, this differs
      from the equivalent iptables match ("rpfilter") that does mask the upper
      DSCP bits before the FIB lookup.
      
      Align the module to other users of the FIB lookup API and mask the upper
      DSCP bits using IPTOS_RT_MASK before the lookup.
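
      A minimal sketch of the kind of change this describes (variable names are
      illustrative, not the exact nft_fib hunk):

        /* before: all eight bits of the DS field end up in the lookup key */
        fl4.flowi4_tos = iph->tos;

        /* after: drop the upper DSCP bits (and ECN bits) like other callers do */
        fl4.flowi4_tos = iph->tos & IPTOS_RT_MASK;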
      
      No regressions in nft_fib.sh:
      
       # ./nft_fib.sh
       PASS: fib expression did not cause unwanted packet drops
       PASS: fib expression did drop packets for 1.1.1.1
       PASS: fib expression did drop packets for 1c3::c01d
       PASS: fib expression forward check with policy based routing
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      548a2029
    • Ido Schimmel
      ipv4: Mask upper DSCP bits and ECN bits in NETLINK_FIB_LOOKUP family · 8fed5475
      Ido Schimmel authored
      The NETLINK_FIB_LOOKUP netlink family can be used to perform a FIB
      lookup according to user provided parameters and communicate the result
      back to user space.
      
      However, unlike other users of the FIB lookup API, the upper DSCP bits
      and the ECN bits of the DS field are not masked, which can lead to the
      wrong result being returned.
      
      Solve this by masking the upper DSCP bits and the ECN bits using
      IPTOS_RT_MASK.
      
      The structure that communicates the request and the response is not
      exported to user space, so it is unlikely that this netlink family is
      actually in use [1].
      
      [1] https://lore.kernel.org/netdev/ZpqpB8vJU%2FQ6LSqa@debian/
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      8fed5475
    • Paolo Abeni
      Merge branch 'net-smc-introduce-ringbufs-usage-statistics' · ccb445ae
      Paolo Abeni authored
      Wen Gu says:
      
      ====================
      net/smc: introduce ringbufs usage statistics
      
      Currently, we have histograms that show the sizes of the ringbufs ever
      used by SMC connections. However, they are always incremental, and since
      SMC allows ringbufs to be reused, we cannot know the actual amount of
      ringbufs allocated or actively in use.
      
      So this patch set introduces statistics for the amount of ringbufs
      actually allocated by the link group and actively used by connections of
      a given net namespace, so that we can react based on this memory usage
      information, e.g. actively fall back to TCP.
      
      With appropriate adaptations of smc-tools, we can obtain these ringbufs
      usage information:
      
      $ smcr -d linkgroup
      LG-ID    : 00000500
      LG-Role  : SERV
      LG-Type  : ASYML
      VLAN     : 0
      PNET-ID  :
      Version  : 1
      Conns    : 0
      Sndbuf   : 12910592 B    <-
      RMB      : 12910592 B    <-
      
      or
      
      $ smcr -d stats
      [...]
      RX Stats
        Data transmitted (Bytes)      869225943 (869.2M)
        Total requests                 18494479
        Buffer usage  (Bytes)          12910592 (12.31M)  <-
        [...]
      
      TX Stats
        Data transmitted (Bytes)    12760884405 (12.76G)
        Total requests                 36988338
        Buffer usage  (Bytes)          12910592 (12.31M)  <-
        [...]
      [...]
      
      Change log:
      v3->v2
      - use new helper nla_put_uint() instead of nla_put_u64_64bit().
      
      v2->v1
      https://lore.kernel.org/r/20240807075939.57882-1-guwen@linux.alibaba.com/
      - remove inline keyword in .c files.
      - use local variable in macros to avoid potential side effects.
      
      v1
      https://lore.kernel.org/r/20240805090551.80786-1-guwen@linux.alibaba.com/
      ====================
      
      Link: https://patch.msgid.link/20240814130827.73321-1-guwen@linux.alibaba.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      ccb445ae
    • Wen Gu
      net/smc: introduce statistics for ringbufs usage of net namespace · e0d10354
      Wen Gu authored
      The buffer size histograms in smc_stats, namely rx/tx_rmbsize, record
      the sizes of ringbufs for all connections that have ever appeared in
      the net namespace. They are incremental, and we cannot derive the actual
      ringbuf usage from them. So introduce statistics for the current
      ringbuf usage of existing SMC connections in the net namespace into
      smc_stats; the counter is incremented when a new connection uses a
      ringbuf and decremented when the ringbuf becomes unused.
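
      The counting pattern described above, as a sketch (the counter name and
      where it lives are hypothetical; the actual smc_stats layout may differ):

        /* a new connection starts using an RMB ringbuf */
        atomic64_add(rmb_desc->len, &net->smc.smc_stats_rmb_usage);

        /* the ringbuf is no longer used by any connection */
        atomic64_sub(rmb_desc->len, &net->smc.smc_stats_rmb_usage);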
      Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      e0d10354
    • Wen Gu
      net/smc: introduce statistics for allocated ringbufs of link group · d386d59b
      Wen Gu authored
      Currently we have statistics on the sndbuf/RMB sizes of all connections
      that have ever been on the link group, namely smc_stats_memsize. However,
      these statistics are incremental, and since the ringbufs of a link group
      are allowed to be reused, we cannot know the actually allocated buffers
      from them. So introduce a statistic for the actually allocated ringbufs
      of the link group; it is incremented when a new ringbuf is added to
      buf_list and decremented when one is deleted from buf_list.
      Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      d386d59b
    • Zhang Changzhong
      net: remove redundant check in skb_shift() · dca9d62a
      Zhang Changzhong authored
      The check for '!to' is redundant here, since skb_can_coalesce() already
      contains this check.
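
      For reference, skb_can_coalesce() already returns false for fragment
      index zero, which is what the removed '!to' test was guarding against
      (paraphrased from include/linux/skbuff.h):

        static inline bool skb_can_coalesce(struct sk_buff *skb, int i,
                                            const struct page *page, int off)
        {
                if (skb_zcopy(skb))
                        return false;
                if (i) {
                        const skb_frag_t *frag = &skb_shinfo(skb)->frags[i - 1];

                        return page == skb_frag_page(frag) &&
                               off == skb_frag_off(frag) + skb_frag_size(frag);
                }
                return false;   /* i == 0: nothing to coalesce with */
        }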
      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/1723730983-22912-1-git-send-email-zhangchangzhong@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      dca9d62a
    • Yue Haibing
      mptcp: Remove unused declaration mptcp_sockopt_sync() · af3dc0ad
      Yue Haibing authored
      Commit a1ab24e5 ("mptcp: consolidate sockopt synchronization")
      removed the implementation but left the declaration behind.
      Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
      Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20240816100404.879598-1-yuehaibing@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      af3dc0ad
    • Yue Haibing
      net/mlx5: E-Switch, Remove unused declarations · c5e2a1b0
      Yue Haibing authored
      These have never been implemented since commit b691b111 ("net/mlx5: Implement
      devlink port function cmds to control ipsec_packet").
      Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20240816101550.881844-1-yuehaibing@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c5e2a1b0
    • Yue Haibing
      igbvf: Remove two unused declarations · 12906bab
      Yue Haibing authored
      There are no callers or implementations in the tree.
      Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20240816101638.882072-1-yuehaibing@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      12906bab
    • Yue Haibing
      gve: Remove unused declaration gve_rx_alloc_rings() · 359c5eb0
      Yue Haibing authored
      Commit f13697cc ("gve: Switch to config-aware queue allocation")
      converted this function to gve_rx_alloc_rings_gqi().
      Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20240816101906.882743-1-yuehaibing@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      359c5eb0
    • Jakub Kicinski
      tcp_metrics: use netlink policy for IPv6 addr len validation · a2901083
      Jakub Kicinski authored
      Use the netlink policy to validate the IPv6 address length.
      The destination address currently has a policy that only sets the
      max length, and the source address has no policy validation. In both
      cases the code does the real check. With a correct policy check in
      place, the open-coded checks can be removed.
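
      A sketch of the policy entries this implies (attribute names follow the
      tcp_metrics uAPI header, but the exact hunk may differ):

        static const struct nla_policy tcp_metrics_nl_policy[TCP_METRICS_ATTR_MAX + 1] = {
                [TCP_METRICS_ATTR_ADDR_IPV6]  = NLA_POLICY_EXACT_LEN(sizeof(struct in6_addr)),
                [TCP_METRICS_ATTR_SADDR_IPV6] = NLA_POLICY_EXACT_LEN(sizeof(struct in6_addr)),
        };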
      Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://patch.msgid.link/20240816212245.467745-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a2901083
  2. 19 Aug, 2024 1 commit
  3. 17 Aug, 2024 2 commits
    • Simon Horman
      bnx2x: Set ivi->vlan field as an integer · a99ef548
      Simon Horman authored
      In bnx2x_get_vf_config():
      * The vlan field of ivi is a 32-bit integer; it is used to store a vlan ID.
      * The vlan field of bulletin is a 16-bit integer; it is also used to store
        a vlan ID.
      
      In the current code, ivi->vlan is filled using memset()/memcpy(). But in
      the case of copying the value of bulletin->vlan, this involves reading
      32 bits from a 16-bit source. This is likely safe, as the following
      6 bytes are padding in the same structure, but nonetheless it seems
      undesirable.
      
      However, it is entirely unclear to me how this scheme works on
      big-endian systems.
      
      Resolve this by simply assigning integer values to ivi->vlan.
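
      A sketch of the assignment-based version described above (the condition
      and field names follow the bulletin-board scheme in the text; the exact
      hunk may differ):

        if (bulletin->valid_bitmap & (1 << VLAN_VALID))
                /* vlan configured via ndo, so it is in the bulletin board */
                ivi->vlan = bulletin->vlan;
        else
                /* function has not been loaded yet, report vlan as 0 */
                ivi->vlan = 0;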
      
      Flagged by W=1 builds.
      f.e. gcc-14 reports:
      
      In function 'fortify_memcpy_chk',
          inlined from 'bnx2x_get_vf_config' at .../bnx2x_sriov.c:2655:4:
      .../fortify-string.h:580:25: warning: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
        580 |                         __read_overflow2_field(q_size_field, size);
            |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      Compile tested only.
      Signed-off-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Brett Creeley <brett.creeley@amd.com>
      Link: https://patch.msgid.link/20240815-bnx2x-int-vlan-v1-1-5940b76e37ad@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a99ef548
    • Christoph Paasch
      mpls: Reduce skb re-allocations due to skb_cow() · f4ae8420
      Christoph Paasch authored
      mpls_xmit() needs to prepend the MPLS-labels to the packet. That implies
      one needs to make sure there is enough space for it in the headers.
      
      Calling skb_cow(), however, implies that one wants to change even the
      payload part of the packet (which is not true for MPLS). Thus, call
      skb_cow_head() instead, which is what other tunnelling protocols do.
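
      A minimal sketch of the distinction (mirroring what mpls_xmit() needs, not
      the literal hunk): skb_cow_head() only guarantees private, sufficiently
      large headroom, while skb_cow() also un-shares the payload.

        /* before: may reallocate the whole skb head even though only
         * headroom for the label stack is needed
         */
        if (skb_cow(skb, hh_len + new_header_size))
                goto drop;

        /* after: only make sure the headers/headroom are writable */
        if (skb_cow_head(skb, hh_len + new_header_size))
                goto drop;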
      
      Running a server with this commit entirely removed the calls to
      pskb_expand_head() from the call stack in mpls_xmit(), giving a
      significant CPU reduction, especially at peak times.
      
      Cc: Roopa Prabhu <roopa@nvidia.com>
      Reported-by: Craig Taylor <cmtaylor@apple.com>
      Signed-off-by: Christoph Paasch <cpaasch@apple.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20240815161201.22021-1-cpaasch@apple.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f4ae8420
  4. 16 Aug, 2024 20 commits