- 13 Jul, 2024 34 commits
-
-
Easwar Hariharan authored
I2C v7, SMBus 3.2, and I3C 1.1.1 specifications have replaced "master/slave" with more appropriate terms. Inspired by Wolfram's series to fix drivers/i2c/, fix the terminology for users of I2C_ALGOBIT bitbanging interface, now that the approved verbiage exists in the specification. Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Link: https://patch.msgid.link/20240711052734.1273652-5-eahariha@linux.microsoft.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Oleksij Rempel authored
This patch implements the TDR test procedure as described in "Application Note DP83TD510E Cable Diagnostics Toolkit revC", section 3.2. The procedure was tested with "draka 08 signalkabel 2x0.8mm". The reported cable length was 5 meters more for each 20 meters of actual cable length. For instance, a 20-meter cable showed as 25 meters, and a 40-meter cable showed as 50 meters. Since other parts of the diagnostics provided by this PHY (e.g., Active Link Cable Diagnostics) require accurate cable characterization to provide proper results, this tuning can be implemented in a separate patch/interface. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> changes v2: - add comments - change post silence time to 1000ms Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240712152848.2479912-1-o.rempel@pengutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Breno Leitao authored
Remove variables that are defined and incremented but never read. This issue appeared in network tests[1] as: drivers/net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c:38:6: warning: variable 'i' set but not used [-Wunused-but-set-variable] 38 | int i = 0; | ^ Link: https://netdev.bots.linux.dev/static/nipa/870263/13729811/build_clang/stderr [1] Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20240712134817.913756-1-leitao@debian.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
ARFS depends on NTUPLE filters, but the inverse is not true. Drivers which don't support ARFS commonly still support NTUPLE filtering. mlx5 has a Kconfig option to disable ARFS (MLX5_EN_ARFS) and does not advertise NTUPLE filters as a feature at all when ARFS is compiled out. That's not correct, ntuple filters indeed still work just fine (as long as MLX5_EN_RXNFC is enabled). This is needed to make the RSS test not skip all RSS context related testing. Acked-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://patch.msgid.link/20240711223722.297676-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Matthieu Baerts (NGI0) authored
It looks like we missed these two errors recently: - SC2068: Double quote array expansions to avoid re-splitting elements. - SC2145: Argument mixes string and array. Use * or separate argument. Two simple fixes, it is not supposed to change the behaviour as the variable names should not have any spaces in their names. Still, better to fix them to easily spot new issues. Fixes: f265d311 ("selftests: mptcp: lib: use setup/cleanup_ns helpers") Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240712-upstream-net-next-20240712-selftests-mptcp-fix-shellcheck-v1-1-1cb7180db40a@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Saeed Mahameed says: ==================== mlx5 misc 2023-07-08 (sf max eq) Link: https://patchwork.kernel.org/project/netdevbpf/patch/20240708080025.1593555-2-tariqt@nvidia.com/ ==================== Link: https://patch.msgid.link/20240712003310.355106-1-saeed@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Daniel Jurgens authored
If a maximum number of EQs has been set for an SF, use that amount. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://patch.msgid.link/20240712003310.355106-5-saeed@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Daniel Jurgens authored
If the user hasn't configured max_io_eqs set a low default. The SF driver shouldn't try to create more than this, but FW will enforce this limit. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://patch.msgid.link/20240712003310.355106-4-saeed@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Daniel Jurgens authored
When setting max_io_eqs for an SF function also set the sf_eq_usage_cap. This is to indicate to the SF driver from the PF that the user has set the max io eqs via devlink. So the SF driver can later query the proper max eq value from the new cap. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://patch.msgid.link/20240712003310.355106-3-saeed@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Daniel Jurgens authored
Expose a new cap sf_eq_usage. The vhca_resource_manager can write this cap, indicating the SF driver should use max_num_eqs_24b to determine how many EQs to use. Will be used in the next patch, to indicate to the SF driver from the PF that the user has set the max io eqs via devlink. So the SF driver can later query the proper max eq value from the new cap. devlink port function set pci/0000:08:00.0/32768 max_io_eqs 32 Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://patch.msgid.link/20240712003310.355106-2-saeed@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Thorsten Blum authored
Change the data type of the variable freq in mvpp2_rx_time_coal_set() and mvpp2_tx_time_coal_set() to u32 because port->priv->tclk also has the data type u32. Change the data type of the function parameter clk_hz in mvpp2_usec_to_cycles() and mvpp2_cycles_to_usec() to u32 accordingly and remove the following Coccinelle/coccicheck warning reported by do_div.cocci: WARNING: do_div() does a 64-by-32 division, please consider using div64_ul instead Use min() to simplify the code and improve its readability. Compile-tested only. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240711154741.174745-1-thorsten.blum@toblux.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Danielle Ratson authored
Currently, during the module firmware flashing process, unicast notifications are sent from the kernel using the same sequence number, making it impossible for user space to track missed notifications. Monotonically increase the message sequence number, so the order of notifications could be tracked effectively. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20240711080934.2071869-1-danieller@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Kuniyuki Iwashima says: ==================== tcp: Make simultaneous connect() RFC-compliant. Patch 1 fixes an issue that BPF TCP option parser is triggered for ACK instead of SYN+ACK in the case of simultaneous connect(). Patch 2 removes an wrong assumption in tcp_ao/self-connnect tests. v2: https://lore.kernel.org/netdev/20240708180852.92919-1-kuniyu@amazon.com/ v1: https://lore.kernel.org/netdev/20240704035703.95065-1-kuniyu@amazon.com/ ==================== Link: https://patch.msgid.link/20240710171246.87533-1-kuniyu@amazon.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Kuniyuki Iwashima authored
tcp_ao/self-connect.c checked the following SNMP stats before/after connect() to confirm that the test exercises the simultaneous connect() path. * TCPChallengeACK * TCPSYNChallenge But the stats should not be counted for self-connect in the first place, and the assumption is no longer true. Let's remove the check. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Dmitry Safonov <dima@arista.com> Link: https://patch.msgid.link/20240710171246.87533-3-kuniyu@amazon.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Kuniyuki Iwashima authored
RFC 9293 states that in the case of simultaneous connect(), the connection gets established when SYN+ACK is received. [0] TCP Peer A TCP Peer B 1. CLOSED CLOSED 2. SYN-SENT --> <SEQ=100><CTL=SYN> ... 3. SYN-RECEIVED <-- <SEQ=300><CTL=SYN> <-- SYN-SENT 4. ... <SEQ=100><CTL=SYN> --> SYN-RECEIVED 5. SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ... 6. ESTABLISHED <-- <SEQ=300><ACK=101><CTL=SYN,ACK> <-- SYN-RECEIVED 7. ... <SEQ=100><ACK=301><CTL=SYN,ACK> --> ESTABLISHED However, since commit 0c24604b ("tcp: implement RFC 5961 4.2"), such a SYN+ACK is dropped in tcp_validate_incoming() and responded with Challenge ACK. For example, the write() syscall in the following packetdrill script fails with -EAGAIN, and wrong SNMP stats get incremented. 0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_TCP) = 3 +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) +0 > S 0:0(0) <mss 1460,sackOK,TS val 1000 ecr 0,nop,wscale 8> +0 < S 0:0(0) win 1000 <mss 1000> +0 > S. 0:0(0) ack 1 <mss 1460,sackOK,TS val 3308134035 ecr 0,nop,wscale 8> +0 < S. 0:0(0) ack 1 win 1000 +0 write(3, ..., 100) = 100 +0 > P. 1:101(100) ack 1 -- # packetdrill cross-synack.pkt cross-synack.pkt:13: runtime error in write call: Expected result 100 but got -1 with errno 11 (Resource temporarily unavailable) # nstat ... TcpExtTCPChallengeACK 1 0.0 TcpExtTCPSYNChallenge 1 0.0 The problem is that bpf_skops_established() is triggered by the Challenge ACK instead of SYN+ACK. This causes the bpf prog to miss the chance to check if the peer supports a TCP option that is expected to be exchanged in SYN and SYN+ACK. Let's accept a bare SYN+ACK for active-open TCP_SYN_RECV sockets to avoid such a situation. Note that tcp_ack_snd_check() in tcp_rcv_state_process() is skipped not to send an unnecessary ACK, but this could be a bit risky for net.git, so this targets for net-next. Link: https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7 [0] Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240710171246.87533-2-kuniyu@amazon.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Peng Fan authored
Add install target for vsock to make Yocto easy to install the images. Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20240710122728.45044-1-peng.fan@oss.nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
Following files are part of TCP stack: - net/ipv4/inet_connection_sock.c - net/ipv4/inet_hashtables.c - net/ipv4/inet_timewait_sock.c - net/ipv6/inet6_connection_sock.c - net/ipv6/inet6_hashtables.c Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240712234213.3178593-1-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queueJakub Kicinski authored
Tony Nguyen says: ==================== idpf: XDP chapter I: convert Rx to libeth Alexander Lobakin says: XDP for idpf is currently 5 chapters: * convert Rx to libeth (this); * convert Tx and stats to libeth; * generic XDP and XSk code changes, libeth_xdp; * actual XDP for idpf via libeth_xdp; * XSk for idpf (^). Part I does the following: * splits &idpf_queue into 4 (RQ, SQ, FQ, CQ) and puts them on a diet; * ensures optimal cacheline placement, strictly asserts CL sizes; * moves currently unused/dead singleq mode out of line; * reuses libeth's Rx ptype definitions and helpers; * uses libeth's Rx buffer management for both header and payload; * eliminates memcpy()s and coherent DMA uses on hotpath, uses napi_build_skb() instead of in-place short skb allocation. Most idpf patches, except for the queue split, removes more lines than adds. Expect far better memory utilization and +5-8% on Rx depending on the case (+17% on skb XDP_DROP :>). * '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: idpf: use libeth Rx buffer management for payload buffer idpf: convert header split mode to libeth + napi_build_skb() libeth: support different types of buffers for Rx idpf: remove legacy Page Pool Ethtool stats idpf: reuse libeth's definitions of parsed ptype structures idpf: compile singleq code only under default-n CONFIG_IDPF_SINGLEQ idpf: merge singleq and splitq &net_device_ops idpf: strictly assert cachelines of queue and queue vector structures idpf: avoid bloating &idpf_q_vector with big %NR_CPUS idpf: split &idpf_queue into 4 strictly-typed queue structures idpf: stop using macros for accessing queue descriptors libeth: add cacheline / struct layout assertion helpers page_pool: use __cacheline_group_{begin, end}_aligned() cache: add __cacheline_group_{begin, end}_aligned() (+ couple more) ==================== Link: https://patch.msgid.link/20240710203031.188081-1-anthony.l.nguyen@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski authored
Daniel Borkmann says: ==================== pull-request: bpf-next 2024-07-12 We've added 23 non-merge commits during the last 3 day(s) which contain a total of 18 files changed, 234 insertions(+), 243 deletions(-). The main changes are: 1) Improve BPF verifier by utilizing overflow.h helpers to check for overflows, from Shung-Hsi Yu. 2) Fix NULL pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT when attr->attach_prog_fd was not specified, from Tengda Wu. 3) Fix arm64 BPF JIT when generating code for BPF trampolines with BPF_TRAMP_F_CALL_ORIG which corrupted upper address bits, from Puranjay Mohan. 4) Remove test_run callback from lwt_seg6local_prog_ops which never worked in the first place and caused syzbot reports, from Sebastian Andrzej Siewior. 5) Relax BPF verifier to accept non-zero offset on KF_TRUSTED_ARGS/ /KF_RCU-typed BPF kfuncs, from Matt Bobrowski. 6) Fix a long standing bug in libbpf with regards to handling of BPF skeleton's forward and backward compatibility, from Andrii Nakryiko. 7) Annotate btf_{seq,snprintf}_show functions with __printf, from Alan Maguire. 8) BPF selftest improvements to reuse common network helpers in sk_lookup test and dropping the open-coded inetaddr_len() and make_socket() ones, from Geliang Tang. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits) selftests/bpf: Test for null-pointer-deref bugfix in resolve_prog_type() bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT selftests/bpf: DENYLIST.aarch64: Skip fexit_sleep again bpf: use check_sub_overflow() to check for subtraction overflows bpf: use check_add_overflow() to check for addition overflows bpf: fix overflow check in adjust_jmp_off() bpf: Eliminate remaining "make W=1" warnings in kernel/bpf/btf.o bpf: annotate BTF show functions with __printf bpf, arm64: Fix trampoline for BPF_TRAMP_F_CALL_ORIG selftests/bpf: Close obj in error path in xdp_adjust_tail selftests/bpf: Null checks for links in bpf_tcp_ca selftests/bpf: Use connect_fd_to_fd in sk_lookup selftests/bpf: Use start_server_addr in sk_lookup selftests/bpf: Use start_server_str in sk_lookup selftests/bpf: Close fd in error path in drop_on_reuseport selftests/bpf: Add ASSERT_OK_FD macro selftests/bpf: Add backlog for network_helper_opts selftests/bpf: fix compilation failure when CONFIG_NF_FLOW_TABLE=m bpf: Remove tst_run from lwt_seg6local_prog_ops. bpf: relax zero fixed offset constraint on KF_TRUSTED_ARGS/KF_RCU ... ==================== Link: https://patch.msgid.link/20240712212448.5378-1-daniel@iogearbox.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski authored
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt.c f7ce5eb2 ("bnxt_en: Fix crash in bnxt_get_max_rss_ctx_ring()") 20c8ad72 ("eth: bnxt: use the RSS context XArray instead of the local list") Adjacent changes: net/ethtool/ioctl.c 503757c8 ("net: ethtool: Fix RSS setting") eac9122f ("net: ethtool: record custom RSS contexts in the XArray") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Jakub Kicinski says: ==================== eth: bnxt: use the new RSS API Convert bnxt from using the set_rxfh API to separate create/modify/remove callbacks. Two small extensions to the core APIs are necessary: - the ability to discard contexts if for some catastrophic reasons device can no longer provide them; - the ability to reserve space in the context for RSS table growth. The driver is adjusted to store indirection tables on u32 to make it easier to use core structs directly. With that out of the way the conversion is fairly straightforward. Since the opposition to discarding contexts was relatively mild and its what bnxt does already, I'm sticking to that. We may very well need to revisit that at a later time. v1: https://lore.kernel.org/all/20240702234757.4188344-1-kuba@kernel.org/ ==================== Link: https://patch.msgid.link/20240711220713.283778-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Instead of allocating a separate indir table in the vnic use the one already present in the RSS context allocated by the core. This saves some LoC and also we won't have to worry about syncing the local version back to the core, once core learns how to dump contexts. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240711220713.283778-12-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Ethtool core stores indirection table with u32 entries, "just to be safe". Switch the type in the driver, so that it's easier to swap local tables for the core ones. Memory allocations already use sizeof(*entry), switch the memset()s as well. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240711220713.283778-11-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
bnxt allocates tables of max size, and changes the used size based on number of active rings. The unused entries get padded out with zeros. bnxt_modify_rss() seems to always pad out the table of the main / default RSS context, instead of the table of the modified context. I haven't observed any behavior change due to this patch, so I don't think it's a fix. Not entirely sure what role the padding plays, 0 is a valid queue ID. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240711220713.283778-10-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Core already maintains all RSS contexts in an XArray, no need to keep a second list in the driver. Remove bnxt_get_max_rss_ctx_ring() completely since core performs the same check already. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240711220713.283778-9-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Core can allocate space for per-context driver-private data, use it for struct bnxt_rss_ctx. Inline bnxt_alloc_rss_ctx() at this point, most of the init (as in the actions bnxt_del_one_rss_ctx() will undo) is open coded in bnxt_create_rxfh_context(), anyway. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240711220713.283778-8-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
New RSS context API removes old contexts on netdev unregister. No need to wipe them manually. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240711220713.283778-7-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Core will allocate IDs for the driver, from the range [1, BNXT_MAX_ETH_RSS_CTX], no need to track the allocations. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240711220713.283778-6-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Use the new ethtool ops for RSS context management. The conversion is pretty straightforward cut / paste of the right chunks of the combined handler. Main change is that we let the core pick the IDs (bitmap will be removed separately for ease of review), so we need to tell the core when we lose a context. Since the new API passes rxfh as const, change bnxt_modify_rss() to also take const. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240711220713.283778-5-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Contexts get deleted from FW when the device is down, but they are kept in SW and re-added back on open. bnxt_set_rxfh_context() apparently does not want to deal with complexity of dealing with both the device down and device up cases. This is perhaps acceptable for creating new contexts, but not being able to delete contexts makes core-driven cleanups messy. Specifically with the new RSS API core will try to delete contexts automatically after bringing the device down. Support the delete-while-down case. Skip the FW logic and delete just the driver state. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240711220713.283778-4-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Some drivers (bnxt but I think also mlx5 from ML discussions) change the size of the indirection table depending on the number of Rx rings. Decouple the max table size from the size of the currently used table, so that we can reserve space in the context for table growth. Static members in ethtool_ops are good enough for now, we can add callbacks to read the max size more dynamically if someone needs that. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240711220713.283778-3-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
RSS contexts may get lost from a device, in various extreme circumstances. Specifically if the firmware leaks resources and resets, or crashes and either recovers in partially working state or the crash causes a different FW version to run - creating the context again may fail. Drivers should do their absolute best to prevent this from happening. When it does, however, telling user that a context exists, when it can't possibly be used any more is counter productive. Add a helper for drivers to discard contexts. Print an error, in the future netlink notification will also be sent. More robust approaches were proposed, like keeping the contexts but marking them as "dead" (but possibly resurrected by next reset). That may be better but it's unclear at this stage whether the effort is worth the benefits. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20240711220713.283778-2-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds authored
Pull more networking fixes from Jakub Kicinski: "A quick follow up to yesterday's pull. We got a regressions report for the bnxt patch as soon as it got to your tree. The ethtool fix is also good to have, although it's an older regression. Current release - regressions: - eth: bnxt_en: fix crash in bnxt_get_max_rss_ctx_ring() on older HW when user tries to decrease the ring count Previous releases - regressions: - ethtool: fix RSS setting, accept "no change" setting if the driver doesn't support the new features - eth: i40e: remove needless retries of NVM update, don't wait 20min when we know the firmware update won't succeed" * tag 'net-6.10-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: bnxt_en: Fix crash in bnxt_get_max_rss_ctx_ring() octeontx2-af: fix issue with IPv4 match for RSS octeontx2-af: fix issue with IPv6 ext match for RSS octeontx2-af: fix detection of IP layer octeontx2-af: fix a issue with cpt_lf_alloc mailbox octeontx2-af: replace cpt slot with lf id on reg write i40e: fix: remove needless retries of NVM update net: ethtool: Fix RSS setting
-
Michael Chan authored
On older chips not supporting multiple RSS contexts, reducing ethtool channels will crash: BUG: kernel NULL pointer dereference, address: 00000000000000b8 PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 7032 Comm: ethtool Tainted: G S 6.10.0-rc4 #1 Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017 RIP: 0010:bnxt_get_max_rss_ctx_ring+0x4c/0x90 [bnxt_en] Code: c3 d3 eb 4c 8b 83 38 01 00 00 48 8d bb 38 01 00 00 4c 39 c7 74 42 41 8d 54 24 ff 31 c0 0f b7 d2 4c 8d 4c 12 02 66 85 ed 74 1d <49> 8b 90 b8 00 00 00 49 8d 34 11 0f b7 0a 66 39 c8 0f 42 c1 48 83 RSP: 0018:ffffaaa501d23ba8 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff8efdf600c940 RCX: 0000000000000000 RDX: 000000000000007f RSI: ffffffffacf429c4 RDI: ffff8efdf600ca78 RBP: 0000000000000080 R08: 0000000000000000 R09: 0000000000000100 R10: 0000000000000001 R11: ffffaaa501d238c0 R12: 0000000000000080 R13: 0000000000000000 R14: ffff8efdf600c000 R15: 0000000000000006 FS: 00007f977a7d2740(0000) GS:ffff8f041f840000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000b8 CR3: 00000002320aa004 CR4: 00000000003706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __die_body+0x15/0x60 ? page_fault_oops+0x157/0x440 ? do_user_addr_fault+0x60/0x770 ? _raw_spin_lock_irqsave+0x12/0x40 ? exc_page_fault+0x61/0x120 ? asm_exc_page_fault+0x22/0x30 ? bnxt_get_max_rss_ctx_ring+0x4c/0x90 [bnxt_en] ? bnxt_get_max_rss_ctx_ring+0x25/0x90 [bnxt_en] bnxt_set_channels+0x9d/0x340 [bnxt_en] ethtool_set_channels+0x14b/0x210 __dev_ethtool+0xdf8/0x2890 ? preempt_count_add+0x6a/0xa0 ? percpu_counter_add_batch+0x23/0x90 ? filemap_map_pages+0x417/0x4a0 ? avc_has_extended_perms+0x185/0x420 ? __pfx_udp_ioctl+0x10/0x10 ? sk_ioctl+0x55/0xf0 ? kmalloc_trace_noprof+0xe0/0x210 ? dev_ethtool+0x54/0x170 dev_ethtool+0xa2/0x170 dev_ioctl+0xbe/0x530 sock_do_ioctl+0xa3/0xf0 sock_ioctl+0x20d/0x2e0 bp->rss_ctx_list is not initialized if the chip or firmware does not support multiple RSS contexts. Fix it by adding a check in bnxt_get_max_rss_ctx_ring() before proceeding to reference bp->rss_ctx_list. Fixes: 0d1b7d6c ("bnxt: fix crashes when reducing ring count with active RSS contexts") Reported-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/netdev/ZpFEJeNpwxW1aW9k@gmail.com/Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20240712175318.166811-1-michael.chan@broadcom.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 12 Jul, 2024 6 commits
-
-
Tengda Wu authored
This test verifies that resolve_prog_type() works as expected when `attach_prog_fd` is not passed in. `prog->aux->dst_prog` in resolve_prog_type() is assigned by `attach_prog_fd`, and would be NULL if `attach_prog_fd` is not provided. Loading EXT prog with bpf_dynptr_from_skb() kfunc call in this way will lead to null-pointer-deref. Verify that the null-pointer-deref bug in resolve_prog_type() is fixed. Signed-off-by: Tengda Wu <wutengda@huaweicloud.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240711145819.254178-3-wutengda@huaweicloud.com
-
Tengda Wu authored
When loading a EXT program without specifying `attr->attach_prog_fd`, the `prog->aux->dst_prog` will be null. At this time, calling resolve_prog_type() anywhere will result in a null pointer dereference. Example stack trace: [ 8.107863] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004 [ 8.108262] Mem abort info: [ 8.108384] ESR = 0x0000000096000004 [ 8.108547] EC = 0x25: DABT (current EL), IL = 32 bits [ 8.108722] SET = 0, FnV = 0 [ 8.108827] EA = 0, S1PTW = 0 [ 8.108939] FSC = 0x04: level 0 translation fault [ 8.109102] Data abort info: [ 8.109203] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 8.109399] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 8.109614] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 8.109836] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101354000 [ 8.110011] [0000000000000004] pgd=0000000000000000, p4d=0000000000000000 [ 8.112624] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP [ 8.112783] Modules linked in: [ 8.113120] CPU: 0 PID: 99 Comm: may_access_dire Not tainted 6.10.0-rc3-next-20240613-dirty #1 [ 8.113230] Hardware name: linux,dummy-virt (DT) [ 8.113390] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 8.113429] pc : may_access_direct_pkt_data+0x24/0xa0 [ 8.113746] lr : add_subprog_and_kfunc+0x634/0x8e8 [ 8.113798] sp : ffff80008283b9f0 [ 8.113813] x29: ffff80008283b9f0 x28: ffff800082795048 x27: 0000000000000001 [ 8.113881] x26: ffff0000c0bb2600 x25: 0000000000000000 x24: 0000000000000000 [ 8.113897] x23: ffff0000c1134000 x22: 000000000001864f x21: ffff0000c1138000 [ 8.113912] x20: 0000000000000001 x19: ffff0000c12b8000 x18: ffffffffffffffff [ 8.113929] x17: 0000000000000000 x16: 0000000000000000 x15: 0720072007200720 [ 8.113944] x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720 [ 8.113958] x11: 0720072007200720 x10: 0000000000f9fca4 x9 : ffff80008021f4e4 [ 8.113991] x8 : 0101010101010101 x7 : 746f72705f6d656d x6 : 000000001e0e0f5f [ 8.114006] x5 : 000000000001864f x4 : ffff0000c12b8000 x3 : 000000000000001c [ 8.114020] x2 : 0000000000000002 x1 : 0000000000000000 x0 : 0000000000000000 [ 8.114126] Call trace: [ 8.114159] may_access_direct_pkt_data+0x24/0xa0 [ 8.114202] bpf_check+0x3bc/0x28c0 [ 8.114214] bpf_prog_load+0x658/0xa58 [ 8.114227] __sys_bpf+0xc50/0x2250 [ 8.114240] __arm64_sys_bpf+0x28/0x40 [ 8.114254] invoke_syscall.constprop.0+0x54/0xf0 [ 8.114273] do_el0_svc+0x4c/0xd8 [ 8.114289] el0_svc+0x3c/0x140 [ 8.114305] el0t_64_sync_handler+0x134/0x150 [ 8.114331] el0t_64_sync+0x168/0x170 [ 8.114477] Code: 7100707f 54000081 f9401c00 f9403800 (b9400403) [ 8.118672] ---[ end trace 0000000000000000 ]--- One way to fix it is by forcing `attach_prog_fd` non-empty when bpf_prog_load(). But this will lead to `libbpf_probe_bpf_prog_type` API broken which use verifier log to probe prog type and will log nothing if we reject invalid EXT prog before bpf_check(). Another way is by adding null check in resolve_prog_type(). The issue was introduced by commit 4a9c7bbe ("bpf: Resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT") which wanted to correct type resolution for BPF_PROG_TYPE_TRACING programs. Before that, the type resolution of BPF_PROG_TYPE_EXT prog actually follows the logic below: prog->aux->dst_prog ? prog->aux->dst_prog->type : prog->type; It implies that when EXT program is not yet attached to `dst_prog`, the prog type should be EXT itself. This code worked fine in the past. So just keep using it. Fix this by returning `prog->type` for BPF_PROG_TYPE_EXT if `dst_prog` is not present in resolve_prog_type(). Fixes: 4a9c7bbe ("bpf: Resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT") Signed-off-by: Tengda Wu <wutengda@huaweicloud.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20240711145819.254178-2-wutengda@huaweicloud.com
-
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds authored
Pull btrfs fixes from David Sterba: "Fix a regression in extent map shrinker behaviour. In the past weeks we got reports from users that there are huge latency spikes or freezes. This was bisected to newly added shrinker of extent maps (it was added to fix a build up of the structures in memory). I'm assuming that the freezes would happen to many users after release so I'd like to get it merged now so it's in 6.10. Although the diff size is not small the changes are relatively straightforward, the reporters verified the fixes and we did testing on our side. The fixes: - adjust behaviour under memory pressure and check lock or scheduling conditions, bail out if needed - synchronize tracking of the scanning progress so inode ranges are not skipped or work duplicated - do a delayed iput when scanning a root so evicting an inode does not slow things down in case of lots of dirty data, also fix lockdep warning, a deadlock could happen when writing the dirty data would need to start a transaction" * tag 'for-6.10-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: avoid races when tracking progress for extent map shrinking btrfs: stop extent map shrinker if reschedule is needed btrfs: use delayed iput during extent map shrinking
-
https://github.com/ceph/ceph-clientLinus Torvalds authored
Pull ceph fixes from Ilya Dryomov: "A fix for a possible use-after-free following "rbd unmap" or "umount" marked for stable and two kernel-doc fixups" * tag 'ceph-for-6.10-rc8' of https://github.com/ceph/ceph-client: libceph: fix crush_choose_firstn() kernel-doc warnings libceph: suppress crush_choose_indep() kernel-doc warnings libceph: fix race between delayed_work() and ceph_monc_stop()
-
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pmLinus Torvalds authored
Pull pmdomain fix from Ulf Hansson: - qcom: Skip retention level for rpmhpd's * tag 'pmdomain-v6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm: pmdomain: qcom: rpmhpd: Skip retention level for Power Domains
-
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmcLinus Torvalds authored
Pull MMC host fixes from Ulf Hansson: - davinci_mmc: Prevent transmitted data size from exceeding sgm's length - sdhci: Fix max_seg_size for 64KiB PAGE_SIZE * tag 'mmc-v6.10-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: davinci_mmc: Prevent transmitted data size from exceeding sgm's length mmc: sdhci: Fix max_seg_size for 64KiB PAGE_SIZE
-