- 17 Nov, 2021 24 commits
-
-
Dmytro Linkin authored
Vports' QoS is not commonly used but consume SW/HW resources, which becomes an issue on BlueField SoC systems. Don't enable QoS on vports by default on eswitch mode change and enable when it's going to be used by one of the top level users: - configuring TC matchall filter with police action; - setting rate with legacy NDO API; - calling devlink ops->rate_leaf_*() callbacks. Disable vport QoS on vport cleanup. Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Parav Pandit authored
eswitch.c is mainly for common code between legacy and offloads mode. MAC address get and set via devlink is applicable only in offloads mode. Hence, move it to eswitch_offloads.c file. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Parav Pandit authored
mlx5_eswitch_set_vport_mac() routine already does necessary checks which are duplicated in implementation of mlx5_devlink_port_function_hw_addr_set(). Hence, reuse mlx5_eswitch_set_vport_mac() and cut down the code. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Parav Pandit authored
An eswitch vport of the devlink port is always enabled before a devlink port is registered. And a eswitch vport is always disabled after a devlink port is unregistered. Hence avoid the vport enabled check in the devlink callback routine. Such check is only applicable in the legacy SR-IOV callbacks. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Sunil Sudhakar Rani <sunrani@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Chris Mi authored
There is a use case that the local and remote VTEPs are in the same host. Currently, the out ifindex is not specified when looking up the decap route for offloads. So in this case, a local route is returned and the route dev is lo. Actual tunnel interface can be created with a parameter "dev" [1], which specifies the physical device to use for tunnel endpoint communication. Pass this parameter to driver when looking up decap route for offloads. So that a unicast route will be returned. [1] ip link add name vxlan1 type vxlan id 100 dev enp4s0f0 remote 1.1.1.1 dstport 4789 Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Roi Dayan authored
Move the comment to the correct place where the driver actually removes the flag and not in the check that maybe pedit actions exists. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Roi Dayan authored
When deleting fdb/nic flow rules first release all resources and then call the kfree() calls instead of sparse them around the function. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Roi Dayan authored
Counter is only added if counter flag exists. So check the counter fag exists for deleting the counter. This is the same as in add/del fdb flow. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Yihao Han authored
swap() was used instead of the tmp variable to swap values Signed-off-by: Yihao Han <hanyihao@vivo.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Paul Blakey authored
As each CT rule uses at least 4 modify header actions, each rule causes at least 3 reallocations by the mod header actions api. Allow initial static allocation of the mod acts array, and use it for CT rules. If the static allocation is exceeded go back to dynamic allocation. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com>
-
Paul Blakey authored
For all mod hdr related functions to reside in a single self contained component (mod_hdr.c), refactor alloc() and add get_id() so that user won't rely on internal implementation, and move both to mod_hdr component. Rename the prefix to mlx5e_mod_hdr_* as other mod hdr functions. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Aya Levin authored
Use firmware version field as an indication to health buffer's sanity. When firmware version is 0xFFFFFFFF, deduce that firmware is unavailable and avoid printing the health buffer to dmesg as it doesn't provide debug info. Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Saeed Mahameed authored
Treat the string as an argument to avoid this. drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:482:5: error: format string is not a string literal (potentially insecure) name); ^~~~ drivers/net/ethernet/mellanox/mlx5/core/en_stats.c:2079:4: error: format string is not a string literal (potentially insecure) ptp_ch_stats_desc[i].format); ^~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
-
Saeed Mahameed authored
Add support for ethtool coalesce cq mode set and get. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
-
Russell King (Oracle) authored
SMII has not been documented in the kernel, but information on this PHY interface mode has been recently found. Document it, and correct the recently introduced phylink handling for this interface mode. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/E1mmfVl-0075nP-14@rmk-PC.armlinux.org.ukSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Merge branch 'r8169-disable-detection-of-further-chip-versions-that-didn-t-make-it-to-the-mass-market' Heiner Kallweit says: ==================== r8169: disable detection of further chip versions that didn't make it to the mass market There's no sign of life from further chip versions. Seems they didn't make it to the mass market. Let's disable detection and if nobody complains remove support a few kernel versions later. ==================== Link: https://lore.kernel.org/r/7708d13a-4a2b-090d-fadf-ecdd0fff5d2e@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Heiner Kallweit authored
It seems this chip version never made it to the wild. Therefore disable detection and if nobody complains remove support completely later. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Heiner Kallweit authored
It seems this chip version never made it to the wild. Therefore disable detection and if nobody complains remove support completely later. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Heiner Kallweit authored
It seems these chip versions never made it to the wild. Therefore disable detection and if nobody complains remove support completely later. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Heiner Kallweit authored
With newer chip versions ASPM-related issues seem to occur only if L1.2 is enabled. I have a test system with RTL8168h that gives a number of rx_missed errors when running iperf and L1.2 is enabled. With L1.2 disabled (and L1 + L1.1 active) everything is fine. See also [0]. Can't test this, but L1 + L1.1 being active should be sufficient to reach higher package power saving states. [0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1942830Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/36feb8c4-a0b6-422a-899c-e61f2e869dfe@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Eric Dumazet says: ==================== net: better packing of global vars First two patches avoid holes in data section, and last patch makes sure some siphash keys are contained in a single cache line. ==================== Link: https://lore.kernel.org/r/20211115172303.3732746-1-eric.dumazet@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
siphash keys use 16 bytes. Define siphash_aligned_key_t macro so that we can make sure they are not crossing a cache line boundary. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
Same rationale than prior patch : using the dedicated section avoid holes and pack all these bool values. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
.data.once contains nicely packed bool variables. It is used already by DO_ONCE_LITE(). Using it also in DO_ONCE() removes holes in .data section. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 16 Nov, 2021 16 commits
-
-
David S. Miller authored
Eric Dumazet says: ==================== net: prot_inuse and sock_inuse cleanups Small series cleaning and optimizing sock_prot_inuse_add() and sock_inuse_add(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
This is distracting really, let's make this simpler, because many callers had to take care of this by themselves, even if on x86 this adds more code than really needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
net->core.sock_inuse is a per cpu variable (int), while net->core.prot_inuse is another per cpu variable of 64 integers. per cpu allocator tend to place them in very different places. Grouping them together makes sense, since it makes updates potentially faster, if hitting the same cache line. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
MPTCP hard codes it, let us instead provide this helper. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
sock_prot_inuse_add() is very small, we can inline it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Eric Dumazet says: ==================== gro: get out of core files Move GRO related content into net/core/gro.c and include/net/gro.h. This reduces GRO scope to where it is really needed, and shrinks too big files (include/linux/netdevice.h and net/core/dev.c) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Move gro code and data from net/core/dev.c to net/core/gro.c to ease maintenance. gro_normal_list() and gro_normal_one() are inlined because they are called from both files. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
net/core/gro.c will contain all core gro functions, to shrink net/core/skbuff.c and net/core/dev.c Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
This helper is used once, no need to keep it in fat net/core/skbuff.c Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
include/linux/netdevice.h became too big, move gro stuff into include/net/gro.h Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Eric Dumazet says: ==================== tcp: optimizations for linux-5.17 Mostly small improvements in this series. The notable change is in "defer skb freeing after socket lock is released" in recvmsg() (and RX zerocopy) The idea is to try to let skb freeing to BH handler, whenever possible, or at least perform the freeing outside of the socket lock section, for much improved performance. This idea can probably be extended to other protocols. Tests on a 100Gbit NIC Max throughput for one TCP_STREAM flow, over 10 runs. MTU : 1500 (1428 bytes of TCP payload per MSS) Before: 55 Gbit After: 66 Gbit MTU : 4096+ (4096 bytes of TCP payload, plus TCP/IPv6 headers) Before: 82 Gbit After: 95 Gbit ==================== Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
sk_rx_dst/sk_rx_dst_ifindex/sk_rx_dst_cookie are read in early demux, and currently spans two cache lines. Moving them close to sk_refcnt makes more sense, as only one cache line is needed. New layout for this hot cache line is : struct sock { struct sock_common __sk_common; /* 0 0x88 */ /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */ struct dst_entry * sk_rx_dst; /* 0x88 0x8 */ int sk_rx_dst_ifindex; /* 0x90 0x4 */ u32 sk_rx_dst_cookie; /* 0x94 0x4 */ socket_lock_t sk_lock; /* 0x98 0x20 */ atomic_t sk_drops; /* 0xb8 0x4 */ int sk_rcvlowat; /* 0xbc 0x4 */ /* --- cacheline 3 boundary (192 bytes) --- */ Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Under pressure, tcp recvmsg() has logic to process the socket backlog, but calls tcp_cleanup_rbuf() right before. Avoiding sending ACK right before processing new segments makes a lot of sense, as this decrease the number of ACK packets, with no impact on effective ACK clocking. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Testing timeo before sk_err/sk_state/sk_shutdown makes more sense. Modern applications use non-blocking IO, while a socket is terminated only once during its life time. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
tcp recvmsg() (or rx zerocopy) spends a fair amount of time freeing skbs after their payload has been consumed. A typical ~64KB GRO packet has to release ~45 page references, eventually going to page allocator for each of them. Currently, this freeing is performed while socket lock is held, meaning that there is a high chance that BH handler has to queue incoming packets to tcp socket backlog. This can cause additional latencies, because the user thread has to process the backlog at release_sock() time, and while doing so, additional frames can be added by BH handler. This patch adds logic to defer these frees after socket lock is released, or directly from BH handler if possible. Being able to free these skbs from BH handler helps a lot, because this avoids the usual alloc/free assymetry, when BH handler and user thread do not run on same cpu or NUMA node. One cpu can now be fully utilized for the kernel->user copy, and another cpu is handling BH processing and skb/page allocs/frees (assuming RFS is not forcing use of a single CPU) Tested: 100Gbit NIC Max throughput for one TCP_STREAM flow, over 10 runs MTU : 1500 Before: 55 Gbit After: 66 Gbit MTU : 4096+(headers) Before: 82 Gbit After: 95 Gbit Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
TCP uses sk_eat_skb() when skbs can be removed from receive queue. However, the call to skb_orphan() from __kfree_skb() incurs an indirect call so sock_rfee(), which is more expensive than a direct call, especially for CONFIG_RETPOLINE=y. Add tcp_eat_recv_skb() function to make the call before __kfree_skb(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-