- 04 Apr, 2024 37 commits
-
-
Amit Cohen authored
Completion queues are used for completions of RDQ or SDQ. Each completion queue is used for one DQ. The first CQs are used for SDQs and the rest are used for RDQs. Currently, for each CQE (completion queue element), we check 'sr' value (send/receive) to know if it is completion of RDQ or SDQ. Actually, we do not really have to check it, as according to the queue number we know if it handles completions of Rx or Tx. Break the tasklet into two - one for Rx (RDQ) and one for Tx (SDQ). Then, setup the appropriate tasklet for each queue as part of queue initialization. Use 'sr' value for unlikely case that we get completion with type that we do not expect. Call WARN_ON_ONCE() only after checking the value, to avoid calling this method for each completion. A next patch set will use NAPI to handle events, then we will have a separate poll method for Rx and Tx. This change is a preparation for NAPI usage. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/50fbc366f8de54cb5dc72a7c4f394333ef71f1d0.1712062203.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Amit Cohen authored
This function will be broken into several functions later. As preparation, reorder variables to reverse xmas tree. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/7170a8f4429ecb5a539b0374c621697778ff8363.1712062203.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Amit Cohen authored
The previous patch changed the code to do not handle command interface from event queue. With this change the wait queue is not used anymore. Remove it and 'wait_done' variable. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/f3af6a5a9dabd97d2920cefe475c6aa57767f504.1712062203.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Amit Cohen authored
The device supports two event queues. EQ0 is used for command interface completion events. EQ1 is used for completion events of RDQ or SDQ. Currently, for each EQE (event queue element), we check the queue number and handle accordingly. More than that, for each interrupt we schedule tasklets for both EQs. This is really ineffective, especially because of the fact that EQ0 is used only as part of driver init/fini, when EMADs are not available. There is no point to schedule the tasklet for it and check each EQE. A previous patch changed the code to poll command interface for each use of it. It means that now there is no real reason to use EQ0, as we poll the command interface. Initialize only one event queue and use it as EQ1 (this is determined by queue number). Then, for each interrupt we can schedule the tasklet only for one queue and we do not have to check the queue number. This simplifies the code and should improve performance. Note that polling command interface is ok as we use it only as part of driver init/fini. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/23d764f5c032e4c363b98590b746a4b32d2bf900.1712062203.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Amit Cohen authored
Currently we use MLXSW_PCI_EQS_COUNT event queues. A next patch will change the driver to initialize only EQ1, as EQ0 is not required anymore when we poll command interface. Rename the macro to MLXSW_PCI_EQS_MAX as later we will not initialize the maximum supported EQs, this value represents the maximum and a new macro will be added to represent the actual used queues. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/b08df430b62f23ca1aa3aaa257896d2d95aa7691.1712062203.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Amit Cohen authored
Command interface is used for configuring and querying FW when EMADs are not available. During the time that the driver sets up the asynchronous queues, it polls the command interface for getting completions. Then, there is a short period when asynchronous queues work, but EMADs are not available (marked in the code as nopoll = true). During this time, we send commands via command interface, but we do not poll it, as we can get an interrupt for the completion. Completions of command interface are received from HW in EQ0 (event queue 0). The usage of EQ0 instead of polling is done only 4 times during initialization and one time during tear down, but it makes an overhead during lifetime of the driver. For each interrupt, we have to check if we get events in EQ0 or EQ1 and handle them. This is really ineffective, especially because of the fact that EQ0 is used only as part of driver init/fini. Instead, we can poll command interface for each call of cmd_exec(). It means that when we send a command via command interface (as EMADs are not available), we will poll it, regardless of availability of the asynchronous queues. This will allow us to configure later only EQ1 and simplify the flow. Remove 'nopoll' indication and change mlxsw_pci_cmd_exec() to poll till answer/timeout regardless of queues' state. For now, completions are handled also by EQ0, but it will be removed in next patch. Additional cleanups will be added in next patches. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/e674c70380ceda953e0e45a77334c5d22e69938f.1712062203.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Amit Cohen authored
This function will be used later only for EQ1. As preparation, reorder variables to reverse xmas tree and return earlier when it is possible, to simplify the code. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/2412d6c135b2a6aedb4484f5d8baab3aecd7b9ae.1712062203.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Amit Cohen authored
The structure 'mlxsw_pci_queue' stores several counters which were consumed via debugfs. Since commit 9a32562b ("mlxsw: Remove debugfs interface"), these counters are not used. Remove them. This makes the 'union u' and 'struct eq' redundant. Maintain 'struct cq' as it will be extended later. Replace increasing 'q->u.eq.ev_other_count' with WARN_ON_ONCE(), as it is used in an unreasonable case of receiving event in EQ which is not EQ0 or EQ1. When the queues are initialized, we check number of event queues and fail with the print "Unsupported number of queues" in case that the driver tries to initialize more than two queues. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/ee9e658800aa0390e08342100bc27daff4c176c0.1712062203.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Amit Cohen authored
Currently, as part of mlxsw_pci_cq_tasklet(), we check if any item was handled, and only in such case we arm doorbell. This is unlikely case, as we schedule tasklet only for CQs that we get an event for them, which means that they contain completions to handle. Remove this check, which is supposed to be true always, and even if it is false, it is not a mistake to ring the doorbell. We can warn on such case, but it is not really worth to add a check which will be run for each CQ handling when we do not expect to reach it and it does not point to logic error that should be handled. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/f8efa481bfe7bebb9f93bb803f44ab7da77f53e6.1712062203.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Amit Cohen authored
Currently, the structure 'mlxsw_pci_queue_ops' holds a pointer to the callback function of tasklet. This is used only for EQ and CQ. mlxsw driver will use NAPI in a following patch set, so CQ will not use tasklet anymore. As preparation, remove this pointer from the shared operation structure and setup the tasklet as part of queue initialization. For now, setup tasklet for EQ and CQ. Later, CQ code will be changed. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/a326cae5fc1ad085a1a063c004983de6fe389414.1712062203.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Amit Cohen authored
Move mlxsw_pci_cq_{init, fini}() after mlxsw_pci_cq_tasklet() as a next patch will setup the tasklet as part of initialization. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/25196cb5baf5acf6ec1e956203790e018ba8e306.1712062203.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Amit Cohen authored
Move mlxsw_pci_eq_{init, fini}() after mlxsw_pci_eq_tasklet() as a next patch will setup the tasklet as part of initialization. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/7ae120a02e1c490084daae7e684a0d40b7cce4e7.1712062203.git.petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Tariq Toukan says: ==================== mlx5 misc patches This patchset includes small features and misc code enhancements for the mlx5 core and EN drivers. Patches 1-4 by Gal improves the mlx5e ethtool stats implementation, for example by using standard helpers ethtool_sprintf/puts. Patch 5 by me adds a reset option for the FW command interface debugfs stats entries. This allows explicit FW command interface stats reset between different runs of a test case. Patches 6 and 7 are simple cleanups. Patch 8 by Gal adds driver support for 800Gbps link modes. Patch 9 by Jianbo enhances the L4 steering abilities. Patches 10-11 by Jianbo save redundant operations. ==================== Link: https://lore.kernel.org/r/20240402133043.56322-1-tariqt@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jianbo Liu authored
Firmware will return 0 on query BOOT/INIT PAGES for non-page supplier functions (external host PF/VF/SF), so no page is needed to be allocated for them. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240402133043.56322-12-tariqt@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jianbo Liu authored
Page events are not issued by device on the function if page_request_disable is set, so no need to create pages EQ. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240402133043.56322-11-tariqt@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jianbo Liu authored
Replace matching on TCP and UDP protocols with new l4_type field which is parsed by steering for ttc_table. It is enabled by the outer_l4_type or inner_l4_type bits in nic_rx or port_sel flow table capabilities and used only if pcc_ifa2 bit in HCA capabilities is set. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240402133043.56322-10-tariqt@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Gal Pressman authored
Add support for 800Gbps speed, link modes of 100Gbps per lane. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240402133043.56322-9-tariqt@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Gal Pressman authored
In the kernel, the preferred types are uX. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240402133043.56322-8-tariqt@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Carolina Jubran authored
Starting from commit eb9b9fdc ("net/mlx5e: Introduce extended version for mlx5e_xmit_data") sinfo is no longer passed as an argument to mlx5e_xmit_xdp_frame(), the comment is inconsistent. check_result must be zero when the packet is fragmented. Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240402133043.56322-7-tariqt@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Tariq Toukan authored
Resetting stats just before some test/debug case allows us to eliminate out the impact of previous commands. Useful in particular for the average latency calculation. The average_write() callback was unreachable, as "average" is a read-only file. Extend, rename, and use it for a newly exposed write-only "reset" file. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240402133043.56322-6-tariqt@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Gal Pressman authored
The fill_strings() callbacks were changed to accept a **data pointer, and not rely on propagating the index value. Make a similar change to fill_stats() callbacks to keep the API consistent. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240402133043.56322-5-tariqt@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Gal Pressman authored
Use ethtool_sprintf/puts() helper functions which handle the common pattern of printing a string into the ethtool strings interface and incrementing the string pointer by ETH_GSTRING_LEN. Change the fill_strings callback to accept a **data pointer, and remove the index and return value. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240402133043.56322-4-tariqt@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Gal Pressman authored
Use ethtool_sprintf/puts() helper functions which handle the common pattern of printing a string into the ethtool strings interface and incrementing the string pointer by ETH_GSTRING_LEN. The int return value in mlx5e_self_test_fill_strings() is not removed as it is still used to return the number of selftests. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240402133043.56322-3-tariqt@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Gal Pressman authored
Use ethtool_sprintf/puts() helper functions which handle the common pattern of printing a string into the ethtool strings interface and incrementing the string pointer by ETH_GSTRING_LEN. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240402133043.56322-2-tariqt@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Merge tag 'wireless-next-2024-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.10 The first "new features" pull request for v6.10 with changes both in stack and in drivers. The big thing in this pull request is that wireless subsystem is now almost free of sparse warnings. There's only one warning left in ath11k which was introduced in v6.9-rc1 and will be fixed via the wireless tree. Realtek drivers continue to improve, now we have support for RTL8922AE and RTL8723CS devices. ath11k also has long waited support for P2P. This time we have a small conflict in iwlwifi, Stephen has an example merge resolution which should help with fixing the conflict: https://lore.kernel.org/all/20240326100945.765b8caf@canb.auug.org.au/ Major changes: rtw89 * RTL8922AE Wi-Fi 7 PCI device support rtw88 * RTL8723CS SDIO device support iwlwifi * don't support puncturing in 5 GHz * support monitor mode on passive channels * BZ-W device support * P2P with HE/EHT support ath11k * P2P support for QCA6390, WCN6855 and QCA2066 * tag 'wireless-next-2024-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (122 commits) wifi: mt76: mt7915: workaround dubious x | !y warning wifi: mwl8k: Avoid -Wflex-array-member-not-at-end warnings wifi: ti: Avoid a hundred -Wflex-array-member-not-at-end warnings wifi: iwlwifi: mvm: fix check in iwl_mvm_sta_fw_id_mask net: rfkill: gpio: Convert to platform remove callback returning void wifi: mac80211: use kvcalloc() for codel vars wifi: iwlwifi: reconfigure TLC during HW restart wifi: iwlwifi: mvm: don't change BA sessions during restart wifi: iwlwifi: mvm: select STA mask only for active links wifi: iwlwifi: mvm: set wider BW OFDMA ignore correctly wifi: iwlwifi: Add support for LARI_CONFIG_CHANGE_CMD cmd v9 wifi: iwlwifi: mvm: Declare HE/EHT capabilities support for P2P interfaces wifi: iwlwifi: mvm: Remove outdated comment wifi: iwlwifi: add support for BZ_W wifi: iwlwifi: Print a specific device name. wifi: iwlwifi: remove wrong CRF_IDs wifi: iwlwifi: remove devices that never came out wifi: iwlwifi: mvm: mark EMLSR disabled in cleanup iterator wifi: iwlwifi: mvm: fix active link counting during recovery wifi: iwlwifi: mvm: assign link STA ID lookups during restart ... ==================== Link: https://lore.kernel.org/r/20240403093625.CF515C433C7@smtp.kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Rahul Rameshbabu authored
ethtool.py depends on yml files in a specific location of the linux kernel tree. Using relative lookup for those files means that ethtool.py would need to be run under tools/net/ynl/. Lookup needed yml files without depending on the current working directory that ethtool.py is invoked from. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://lore.kernel.org/r/20240402204000.115081-1-rrameshbabu@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Pawel Dembicki authored
This commit implements VCT in 88E308X/88E609X Family. It require two workarounds with some magic configuration. Regular use require only one register configuration. But Open Circuit require second workaround. It cause implementation two phases for fault length measuring. Fast Ethernet PHY have implemented very simple version of VCT. It's complitley different than vct5 or vct7. Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240402201123.2961909-3-paweldembicki@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Pawel Dembicki authored
Some PHYs can recognize during a cable test if the impedance in the cable is okay. They can detect reflections caused by impedance discontinuity between a regular 100 Ohm cable and an abnormal part with a higher or lower impedance. This commit introduces a new result code: ETHTOOL_A_CABLE_RESULT_CODE_IMPEDANCE_MISMATCH, which represents the results of a cable test indicating issues with impedance integrity. Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240402201123.2961909-2-paweldembicki@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Pawel Dembicki authored
This patch implements only basic support. It covers PHY used in multiple IC: PHY: 88E3082, 88E3083 Switch: 88E6096, 88E6097 Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20240402201123.2961909-1-paweldembicki@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Christophe JAILLET authored
In "struct muram_info", the 'size' field is unused. In "struct memac_cfg", the 'fixed_link' field is unused. Remove them. Found with cppcheck, unusedStructMember. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Sean Anderson <sean.anderson@seco.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/425222d4f6c584e8316ccb7b2ef415a85c96e455.1712084103.git.christophe.jaillet@wanadoo.frSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Kuniyuki Iwashima says: ==================== af_unix: Remove old GC leftovers. This is a follow-up series for commit 4090fa37 ("af_unix: Replace garbage collection algorithm.") which introduced the new GC for AF_UNIX. Now we no longer need two ugly tricks for the old GC, let's remove them. ==================== Link: https://lore.kernel.org/r/20240401173125.92184-1-kuniyu@amazon.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Kuniyuki Iwashima authored
In the previous GC implementation, the shape of the inflight socket graph was not expected to change while GC was in progress. MSG_PEEK was tricky because it could install inflight fd silently and transform the graph. Let's say we peeked a fd, which was a listening socket, and accept()ed some embryo sockets from it. The garbage collection algorithm would have been confused because the set of sockets visited in scan_inflight() would change within the same GC invocation. That's why we placed spin_lock(&unix_gc_lock) and spin_unlock() in unix_peek_fds() with a fat comment. In the new GC implementation, we no longer garbage-collect the socket if it exists in another queue, that is, if it has a bridge to another SCC. Also, accept() will require the lock if it has edges. Thus, we need not do the complicated lock dance. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240401173125.92184-3-kuniyu@amazon.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Kuniyuki Iwashima authored
When we passed fds, we used to bump each file's refcount twice in scm_fp_copy() and scm_fp_dup() before linking the socket to gc_inflight_list. This is because we incremented the inflight count of the socket and linked it to the list in advance before passing skb to the destination socket. Otherwise, the inflight socket could have been garbage-collected in a small race window between linking the socket to the list and queuing skb: CPU 1 : sendmsg(X) w/ A's fd CPU 2 : close(A) ----- ----- /* Here A's refcount is 1, and inflight count is 0 */ bump A's refcount to 2 in scm_fp_copy() bump A's inflight count to 1 link A to gc_inflight_list decrement A's refcount to 1 /* A's refcount == inflight count, thus A could be GC candidate */ start GC mark A as candidate purge A's receive queue queue skb w/ A's fd to X /* A is queued, but all data has been lost */ After commit 4090fa37 ("af_unix: Replace garbage collection algorithm."), we increment the inflight count and link the socket to the global list only when queuing the skb. The race no longer exists, so let's not clone the fd nor bump the count in unix_attach_fds(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240401173125.92184-2-kuniyu@amazon.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Jason Xing says: ==================== tcp: make trace of reset logic complete Before this, we miss some cases where the TCP layer could send RST but we cannot trace it. So I decided to complete it :) Link: https://lore.kernel.org/all/20240329034243.7929-1-kerneljasonxing@gmail.com/ ==================== Link: https://lore.kernel.org/r/20240401073605.37335-1-kerneljasonxing@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jason Xing authored
Prior to this patch, what we can see by enabling trace_tcp_send is only happening under two circumstances: 1) active rst mode 2) non-active rst mode and based on the full socket That means the inconsistency occurs if we use tcpdump and trace simultaneously to see how rst happens. It's necessary that we should take into other cases into considerations, say: 1) time-wait socket 2) no socket ... By parsing the incoming skb and reversing its 4-tuple can we know the exact 'flow' which might not exist. Samples after applied this patch: 1. tcp_send_reset: skbaddr=XXX skaddr=XXX src=ip:port dest=ip:port state=TCP_ESTABLISHED 2. tcp_send_reset: skbaddr=000...000 skaddr=XXX src=ip:port dest=ip:port state=UNKNOWN Note: 1) UNKNOWN means we cannot extract the right information from skb. 2) skbaddr/skaddr could be 0 Signed-off-by: Jason Xing <kernelxing@tencent.com> Link: https://lore.kernel.org/r/20240401073605.37335-3-kerneljasonxing@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jason Xing authored
Introducing entry_saddr and entry_daddr parameters in this macro for later use can help us record the reverse 4-tuple by analyzing the 4-tuple of the incoming skb when receiving. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240401073605.37335-2-kerneljasonxing@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Marcelo Tosatti authored
For systems that use CPU isolation (via nohz_full), creating or destroying a socket with SO_TIMESTAMP, SO_TIMESTAMPNS or SO_TIMESTAMPING with flag SOF_TIMESTAMPING_RX_SOFTWARE will cause a static key to be enabled/disabled. This in turn causes undesired IPIs to isolated CPUs. So enable the static key unconditionally, if CPU isolation is enabled, thus avoiding the IPIs. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/ZgrUiLLtbEUf9SFn@tpadSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 03 Apr, 2024 3 commits
-
-
David S. Miller authored
Harshitha Ramamurthy says: ==================== gve: enable ring size changes This series enables support to change ring size via ethtool in gve. The first three patches deal with some clean up, setting default values for the ring sizes and related fields. The last two patches enable ring size changes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Harshitha Ramamurthy authored
Allow the user to change ring size via ethtool if supported by the device. The driver relies on the ring size ranges queried from device to validate ring sizes requested by the user. Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Harshitha Ramamurthy authored
Add support to read ring size change capability and the min and max descriptor counts from the device and store it in the driver. Also accommodate a special case where the device does not provide minimum ring size depending on the version of the device. In that case, rely on default values for the minimums. Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-