- 01 Feb, 2023 36 commits
-
-
Yang Yingliang authored
It should be 'chain' passed to PTR_ERR() in the error path after calling nft_chain_lookup() in nf_tables_delrule(). Fixes: f80a612d ("netfilter: nf_tables: add support to destroy operation") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Alok Tiwari authored
A static analyzer detected a possible NULL pointer dereference of 'type'. The function __nft_obj_type_get() can return NULL, which must be handled: if type is a NULL pointer, return -ENOENT. This is a theoretical issue, since an existing object has a type, but better add this failsafe check. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Jakub Kicinski authored
Alex Elder says: ==================== net: ipa: remaining IPA v5.0 support This series includes almost all remaining IPA code changes required to support IPA v5.0. IPA register definitions and configuration data for IPA v5.0 will be sent later (soon). Note that the GSI register definitions still require work. GSI for IPA v5.0 supports up to 256 (rather than 32) channels, and this changes the way GSI register offsets are calculated. A few GSI register fields also change. The first patch in this series increases the number of IPA endpoints supported by the driver, from 32 to 36. The next updates the width of the destination field for the IP_PACKET_INIT immediate command so it can represent up to 256 endpoints rather than just 32. The next adds a few definitions of some IPA registers and fields that are first available in IPA v5.0. The next two patches update the code that handles router and filter table caches. Previously these were referred to as "hashed" tables, and the IPv4 and IPv6 tables are now combined into one "unified" table. The sixth and seventh patches add support for a new pulse generator, which allows time periods to be specified with a wider range of clock resolution. And the last patch just defines two new memory regions that were not previously used. ==================== Link: https://lore.kernel.org/r/20230130210158.4126129-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
IPA v5.0 uses two memory regions not previously used. Define them and treat them as valid only for IPA v5.0. Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
The AP has a third pulse generator available starting with IPA v5.0. Redefine ipa_qtime_val() to support that possibility. Pass the IPA pointer as an argument so the version can be determined. And stop using the sign of the returned tick count to indicate which of two pulse generators to use. Instead, have the caller provide the address of a variable that will hold the selected pulse generator for the Qtime value. And for version 5.0, check whether the third pulse generator best represents the time period. Add code in ipa_qtime_config() to configure the fourth pulse generator for IPA v5.0+; in that case configure both the third and fourth pulse generators to use 10 msec granularity. Consistently use "ticks" for local variables that represent a tick count. Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Starting with IPA v5.0, the head-of-line blocking timer has more than two pulse generators available to define timer granularity. To prepare for that, change the way the field value is encoded to use ipa_reg_encode() rather than ipa_reg_bit(). The aggregation granularity selection could (in principle) also use an additional pulse generator starting with IPA v5.0. Encode the AGGR_GRAN_SEL field differently to allow that as well. Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
IPA v5.0+ separates the configuration of entries in the cached (previously "hashed") routing and filtering tables into distinct registers. Previously a single "filter and router" register updated entries in both tables at once; now the routing and filter table caches have separate registers that define their content. This patch updates the code that zeroes entries in the cached filter and router tables to support IPA versions including v5.0+. Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Update the code that causes filter and router table caches to be flushed so that it supports IPA versions 5.0+. It adds a comment in ipa_hardware_config_hashing() that explains that caching does not need to be enabled, just as before, because it's enabled by default. (For the record, the FILT_ROUT_CACHE_CFG register would have been used if we wanted to explicitly enable these.) Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
Define some new registers that appear starting with IPA v5.0, along with enumerated types identifying their fields. Code that uses these will be added by upcoming patches. Most of the new registers are related to filter and routing tables, and in particular, their "hashed" variant. These tables are better described as "cached", where a hash value determines which entries are cached. From now on, naming related to this functionality will use "cache" instead of "hash", and that is reflected in these new register names. Some registers for managing these caches and their contents have changed as well. A few other new field definitions for registers (unrelated to table caches) are also defined. Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Alex Elder authored
The IP_PACKET_INIT immediate command defines the destination endpoint to which a packet should be sent. Prior to IPA v5.0, a 5 bit field in that command represents the endpoint, but starting with IPA v5.0, the field is extended to 8 bits to support more than 32 endpoints. Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
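The field-width change can be illustrated with a small range check. This is only a sketch under stated assumptions: the function names are hypothetical, versions are modelled as `(major, minor)` tuples, and the position of the field inside the command is not modelled, only the range of values it can hold.

```python
def dest_endpoint_bits(ipa_version):
    """Width of the destination endpoint field, per the description above."""
    # Assumption: versions are (major, minor) tuples, e.g. (5, 0) for IPA v5.0.
    return 8 if ipa_version >= (5, 0) else 5


def encode_dest_endpoint(ipa_version, endpoint_id):
    """Return endpoint_id if it fits the field for this version, else raise."""
    bits = dest_endpoint_bits(ipa_version)
    if not 0 <= endpoint_id < (1 << bits):
        raise ValueError(f"endpoint {endpoint_id} does not fit in {bits} bits")
    return endpoint_id
```

With a 5-bit field only endpoint IDs 0 through 31 are representable; the 8-bit field admits IDs up to 255.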
-
Alex Elder authored
Increase the number of endpoints supported by the driver to 36, which IPA v5.0 supports. This makes it impossible to check at build time whether the supported number is too big to fit within the (5-bit) PACKET_INIT destination endpoint field. Instead, convert the build time check to compare against what fits in 8 bits. Add a check in ipa_endpoint_config() to also ensure the hardware reports an endpoint count that's in the expected range. Just open-code 32 as the limit (the PACKET_INIT field mask is not available where we'd want to use it). Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Saeed Mahameed says:

====================
mlx5-updates-2023-01-30

Add fast update encryption key

Jianbo Liu says:
================

Data encryption keys (DEKs) are the keys used for data encryption and decryption operations. Starting from version 22.33.0783, firmware is optimized to accelerate the update of user keys into DEK objects in hardware. Support for bulk allocation and destruction of DEK objects is added, and the bulk-allocated DEKs are uninitialized, as the bulk creation requires no input key. When offloading encryption/decryption, the user gets one object from a bulk and updates its key with a new "modify DEK" command. This command is the same as create DEK object, but requires no heavy context memory allocation in firmware, which consumes most CPU cycles of the create DEK command.

DEKs are cached internally by the NIC, so invalidating internal NIC caches is required before reusing DEKs. The SYNC_CRYPTO command is added to support it. A DEK object can be reused, and the keys in it can be updated, after this command is executed.

This patchset enhances the key creation and destruction flow to make use of this new feature. Any user, for example ktls, ipsec and macsec, can use it to offload keys. But only ktls uses it, as the others don't need many keys, and caching too many DEKs in the pool is wasteful.

There are two new data structures added:
a. DEK pool. One pool is created for each key type. The bulks of that type are placed in the pool's different bulk lists, according to the number of available and in-use DEKs in the bulk.
b. DEK bulk. All DEKs in one bulk allocation are stored here. There are two bitmaps to indicate the state of each DEK.

New APIs are then added. When a user needs a DEK object:
a. Fetch one bulk with available DEKs from the partial_list or avail_list, otherwise create a new one.
b. Pick one DEK, and set its need_sync and in_use bits to 1. Move the bulk to full_list if it has no more available keys, or put it on partial_list if the bulk is newly created.
c. Update the DEK object's key with the user key, using the "modify DEK" command.
d. Return the DEK struct to the user, who then gets the object id and fills it into the offload commands.

When a user frees a DEK:
a. Set the in_use bit to 0. If all need_sync bits are 1 and all in_use bits of this bulk are 0, move it to sync_list.
b. If the number of DEKs freed by users is over the threshold (128), schedule a workqueue to do the sync process.

For the sync process, the SYNC_CRYPTO command is executed first. Then, for each bulk in partial_list, full_list and sync_list, reset the need_sync bits of the freed DEK objects. If all need_sync bits in one bulk are zero, move it to avail_list.

We already supported a TIS pool to recycle the TISes. With this series and the TIS pool, TLS CPS performance is improved greatly. We tested https on this system:
CPU: dual AMD EPYC 7763 64-Core processors
RAM: 512G
DEV: ConnectX-6 DX, with FW ver 22.33.0838 and TLS_OPTIMISE=true

TLS CPS performance numbers are:
Before: 11k connections/sec
After: 101k connections/sec
================

* tag 'mlx5-updates-2023-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: kTLS, Improve connection rate by using fast update encryption key
net/mlx5: Keep only one bulk of full available DEKs
net/mlx5: Add async garbage collector for DEK bulk
net/mlx5: Reuse DEKs after executing SYNC_CRYPTO command
net/mlx5: Use bulk allocation for fast update encryption key
net/mlx5: Add bulk allocation and modify_dek operation
net/mlx5: Add support SYNC_CRYPTO command
net/mlx5: Add new APIs for fast update encryption key
net/mlx5: Refactor the encryption key creation
net/mlx5: Add const to the key pointer of encryption key creation
net/mlx5: Prepare for fast crypto key update if hardware supports it
net/mlx5: Change key type to key purpose
net/mlx5: Add IFC bits and enums for crypto key
net/mlx5: Add IFC bits for general obj create param
net/mlx5: Header file for crypto
====================

Link: https://lore.kernel.org/r/20230131031201.35336-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
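The allocate, free and sync steps described in this cover letter can be modelled in a few lines of Python. This is a toy model, not the driver code: the class and method names are hypothetical, hardware commands are reduced to comments, and moving a full bulk back to the partial list after a sync is an assumption the text does not spell out.

```python
from dataclasses import dataclass, field

SYNC_THRESHOLD = 128  # freed-DEK threshold quoted in the text


@dataclass(eq=False)  # compare bulks by identity, not by bitmap contents
class DekBulk:
    size: int
    in_use: list = field(default_factory=list)     # DEK is held by a user
    need_sync: list = field(default_factory=list)  # DEK unusable until synced

    def __post_init__(self):
        self.in_use = [False] * self.size
        self.need_sync = [False] * self.size


class DekPool:
    def __init__(self, bulk_size):
        self.bulk_size = bulk_size
        self.avail, self.partial, self.full, self.sync = [], [], [], []
        self.num_freed = 0  # DEKs freed since the last sync

    def _move(self, bulk, dst):
        for lst in (self.avail, self.partial, self.full, self.sync):
            if bulk in lst:
                lst.remove(bulk)
        dst.append(bulk)

    def alloc(self):
        # a. Fetch a bulk with available DEKs, otherwise create a new one.
        if self.partial:
            bulk = self.partial[0]
        elif self.avail:
            bulk = self.avail[0]
        else:
            bulk = DekBulk(self.bulk_size)
        # b. Pick one DEK and mark it as in use and needing a sync.
        idx = bulk.need_sync.index(False)
        bulk.in_use[idx] = bulk.need_sync[idx] = True
        dst = self.full if all(bulk.need_sync) else self.partial
        if bulk not in dst:
            self._move(bulk, dst)
        # c./d. A real driver would now issue "modify DEK" and return the object.
        return bulk, idx

    def free(self, bulk, idx):
        bulk.in_use[idx] = False
        self.num_freed += 1
        if all(bulk.need_sync) and not any(bulk.in_use):
            self._move(bulk, self.sync)
        # True means the caller should schedule the sync work.
        return self.num_freed > SYNC_THRESHOLD

    def sync_crypto(self):
        # A real driver would execute the SYNC_CRYPTO command here first.
        for bulk in self.partial + self.full + self.sync:
            for i in range(bulk.size):
                if bulk.need_sync[i] and not bulk.in_use[i]:
                    bulk.need_sync[i] = False
            if not any(bulk.need_sync):
                self._move(bulk, self.avail)
            elif bulk in self.full and not all(bulk.need_sync):
                # Assumption: a full bulk that regained free DEKs becomes partial.
                self._move(bulk, self.partial)
        self.num_freed = 0
```

The point of the two bitmaps is that a freed DEK (in_use cleared) is still not reusable until a sync has cleared its need_sync bit, which is why freed bulks wait on the sync list rather than returning directly to the available list.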
-
Jakub Kicinski authored
Tony Nguyen says: ==================== Intel Wired LAN: Remove redundant Device Control Error Reporting Enable Bjorn Helgaas says: Since f26e58bf ("PCI/AER: Enable error reporting when AER is native"), the PCI core sets the Device Control bits that enable error reporting for PCIe devices. This series removes redundant calls to pci_enable_pcie_error_reporting() that do the same thing from several NIC drivers. There are several more drivers where this should be removed; I started with just the Intel drivers here. * '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ixgbe: Remove redundant pci_enable_pcie_error_reporting() igc: Remove redundant pci_enable_pcie_error_reporting() igb: Remove redundant pci_enable_pcie_error_reporting() ice: Remove redundant pci_enable_pcie_error_reporting() iavf: Remove redundant pci_enable_pcie_error_reporting() i40e: Remove redundant pci_enable_pcie_error_reporting() fm10k: Remove redundant pci_enable_pcie_error_reporting() e1000e: Remove redundant pci_enable_pcie_error_reporting() ==================== Link: https://lore.kernel.org/r/20230130192519.686446-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Petr Machata says: ==================== selftests: mlxsw: Convert to iproute2 dcb There is a dedicated tool for configuration of DCB in iproute2. Use it in the selftests instead of lldpad. Patches #1-#3 convert three tests. Patch #4 drops the now-unnecessary lldpad helpers. ==================== Link: https://lore.kernel.org/r/cover.1675096231.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Petr Machata authored
The existing users of these helpers have been converted to iproute2 dcb. Drop the helpers. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Petr Machata authored
Set up default port priority through the iproute2 dcb tool, which is easier to understand and manage. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Petr Machata authored
Set up DSCP prioritization through the iproute2 dcb tool, which is easier to understand and manage. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Petr Machata authored
Set up DSCP prioritization through the iproute2 dcb tool, which is easier to understand and manage. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Jerome Brunet says: ==================== net: mdio: add amlogic gxl mdio mux support Add support for the MDIO multiplexer found in the Amlogic GXL SoC family. This multiplexer allows choosing between the external (SoC pins) MDIO bus and the internal one leading to the integrated 10/100M PHY. This multiplexer has been handled with the mdio-mux-mmioreg generic driver so far. When it was added, it was thought the logic was handled by a single register. It turns out more than a single register needs to be properly set. As long as the device is using the Amlogic vendor bootloader, or upstream u-boot with net support, it is working fine since the kernel is inheriting the bootloader settings. Without net support in the bootloader, this glue comes unset in the kernel and only the external path may operate properly. With this driver (and the associated change in arch/arm64/boot/dts/amlogic/meson-gxl.dtsi), the kernel no longer relies on the bootloader to set things up, fixing the problem. ==================== Link: https://lore.kernel.org/r/20230130151616.375168-1-jbrunet@baylibre.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jerome Brunet authored
Add support for the mdio mux and internal phy glue of the GXL SoC family. Reported-by: Da Xue <da@lessconfused.com> Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jerome Brunet authored
Add documentation for the MDIO bus multiplexer found on the Amlogic GXL SoC family. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Jakub Kicinski says: ==================== tools: ynl: more docs and basic ethtool support I got discouraged from supporting ethtool in specs, because generating the user space C code seems a little tricky. The messages are ID'ed in a "directional" way (to and from kernel are separate ID "spaces"). There is value, however, in having the spec and being able to, for example, use it in Python. After paying off some technical debt - add a partial ethtool spec. Partial because the header for ethtool is almost 1000 LoC, so converting in one sitting is tough. But adding new commands should be trivial now. Last but not least I add more docs. I realized that I've been sending a similar "instructions" email to people working on new families. It's now intro-specs.rst. ==================== Link: https://lore.kernel.org/r/20230131023354.1732677-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
The scripts require Python 3 and some distros are dropping Python 2 support. Reported-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
We have a bit of documentation about the internals of Netlink and the specs, but really the goal is for most people to not worry about those. Add a practical guide for beginners who want to poke at the specs. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Ethtool is one of the most actively developed families. With the changes to the CLI it should be possible to use the YNL based code for easy prototyping and development. Add a partial family definition. I've tested the string set and rings. I don't have any MAC Merge implementation to test with, but I added the definition for it, anyway, because it's last. New commands can simply be added at the end without having to worry about manually providing IDs / values.

Set (with notification support - None is the response, the data is from the notification):

    $ sudo ./tools/net/ynl/cli.py \
        --spec Documentation/netlink/specs/ethtool.yaml \
        --do rings-set \
        --json '{"header":{"dev-name":"enp0s31f6"}, "rx":129}' \
        --subscribe monitor
    None
    [{'msg': {'header': {'dev-index': 2, 'dev-name': 'enp0s31f6'}, 'rx': 136, 'rx-max': 4096, 'tx': 256, 'tx-max': 4096, 'tx-push': 0}, 'name': 'rings-ntf'}]

Do / dump (yes, the kernel requires that even for dump and even if empty - the "header" nest must be there):

    $ ./tools/net/ynl/cli.py \
        --spec Documentation/netlink/specs/ethtool.yaml \
        --do rings-get \
        --json '{"header":{"dev-index": 2}}'
    {'header': {'dev-index': 2, 'dev-name': 'enp0s31f6'}, 'rx': 136, 'rx-max': 4096, 'tx': 256, 'tx-max': 4096, 'tx-push': 0}

    $ ./tools/net/ynl/cli.py \
        --spec Documentation/netlink/specs/ethtool.yaml \
        --dump rings-get \
        --json '{"header":{}}'
    [{'header': {'dev-index': 2, 'dev-name': 'enp0s31f6'}, 'rx': 136, 'rx-max': 4096, 'tx': 256, 'tx-max': 4096, 'tx-push': 0},
     {'header': {'dev-index': 3, 'dev-name': 'wlp0s20f3'}, 'tx-push': 0},
     {'header': {'dev-index': 19, 'dev-name': 'enp58s0u1u1'}, 'rx': 100, 'rx-max': 4096, 'tx-push': 0}]

And error reporting:

    $ ./tools/net/ynl/cli.py \
        --spec Documentation/netlink/specs/ethtool.yaml \
        --dump rings-get \
        --json '{"header":{"flags":5}}'
    Netlink error: Invalid argument
    nl_len = 68 (52) nl_flags = 0x300 nl_type = 2
    error: -22 extack: {'msg': 'reserved bit set', 'bad-attr-offs': 24, 'bad-attr': '.header.flags'}
    None

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
I had a (bright?) idea of introducing the concept of enum-models to account for all the weird ways families enumerate their messages. I've never finished it because generating C code for each of them is pretty daunting. But for languages which can use ID values directly the support is simple enough, so clean this up a bit. "unified" model is what I recommend going forward. "directional" model is what ethtool uses. "notify-split" is used by the proposed DPLL code, but we can just make them use "unified", it hasn't been merged :) Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
The CLI script tries to validate jsonschema by default. It seems better to validate too many times than too few. However, when copying the scripts to random servers having to install jsonschema is tedious. Load jsonschema via importlib, and let the user opt out. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
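A lazy import with an opt-out might look roughly like the sketch below. It is not the YNL code: `validate_spec` and its `module_name` parameter are hypothetical, and passing `schema=None` stands in for whatever opt-out switch the CLI actually provides. The only external call assumed is `jsonschema.validate(instance, schema)`.

```python
import importlib


def validate_spec(spec, schema, module_name="jsonschema"):
    """Validate spec against schema, importing the validator only when needed.

    Returns False without importing anything when schema is None (opt-out),
    so the validator package is only required by users who ask for validation.
    """
    if schema is None:
        return False
    validator = importlib.import_module(module_name)  # deferred import
    validator.validate(spec, schema)  # raises on an invalid spec
    return True
```

Because the import happens inside the function, a machine without the package can still run the script as long as validation is switched off.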
-
Jakub Kicinski authored
When I wrote the first version of the Python code I was quite excited that we can generate class methods directly from the spec. Unfortunately we need to use valid identifiers for method names (specifically no dashes are allowed). Don't reuse those names on the CLI, it's much more natural to use the operation names exactly as listed in the spec. Instead of: ./cli --do rings_get use: ./cli --do rings-get Signed-off-by: Jakub Kicinski <kuba@kernel.org>
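The naming split can be sketched as follows: operations are looked up by their spec name on the command line, while the generated method attributes use underscores because dashes are not valid in Python identifiers. The `Family` class here is a hypothetical stand-in, not the YNL class, and `_do` merely echoes its arguments instead of talking to Netlink.

```python
import functools


class Family:
    """Toy family object generated from a list of operation names."""

    def __init__(self, op_names):
        self.ops = {}  # keyed by the name exactly as listed in the spec
        for name in op_names:
            handler = functools.partial(self._do, name)
            self.ops[name] = handler
            # Method names must be valid identifiers, so dashes become underscores.
            setattr(self, name.replace("-", "_"), handler)

    def _do(self, op_name, attrs):
        # Placeholder: a real implementation would send a Netlink request.
        return (op_name, attrs)

    def do(self, op_name, attrs):
        """CLI entry point: look the operation up by its spec name."""
        return self.ops[op_name](attrs)
```

A CLI built on this would pass the `--do` argument straight to `do()`, so `rings-get` works while `rings_get` is rejected with a `KeyError`.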
-
Jakub Kicinski authored
One of my favorite features of the Netlink specs is that they make decoding structured extack a ton easier. Implement pretty printing bad attribute names in YNL. For example it will now say:

    'bad-attr': '.header.flags'

rather than the useless:

    'bad-attr-offs': 32

Proof:

    $ ./cli.py --spec ethtool.yaml --do rings_get \
        --json '{"header":{"dev-index":1, "flags":4}}'
    Netlink error: Invalid argument
    nl_len = 68 (52) nl_flags = 0x300 nl_type = 2
    error: -22 extack: {'msg': 'reserved bit set', 'bad-attr': '.header.flags'}

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Ethtool uses multi-attr, add the support to YNL. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Support families which use different IDs for messages to and from the kernel. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
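The difference between a single ID space and separate ID spaces per direction can be sketched like this. The input shape (a list of `(name, has_request, has_reply)` tuples), the function name, and starting both counters at 1 are all assumptions made for illustration; the real spec format and numbering may differ.

```python
def assign_ids(ops, model):
    """Assign message IDs to operations under the given enumeration model.

    ops is a list of (name, has_request, has_reply) tuples in spec order.
    """
    ids = {}
    to_kernel = from_kernel = 1  # assumption: both ID spaces start at 1
    for name, has_request, has_reply in ops:
        entry = {}
        if model == "unified":
            # One counter: request and reply of an operation share an ID.
            if has_request:
                entry["request"] = to_kernel
            if has_reply:
                entry["reply"] = to_kernel
            to_kernel += 1
        elif model == "directional":
            # Two counters: messages to and from the kernel are numbered apart.
            if has_request:
                entry["request"] = to_kernel
                to_kernel += 1
            if has_reply:
                entry["reply"] = from_kernel
                from_kernel += 1
        else:
            raise ValueError(f"unknown model {model!r}")
        ids[name] = entry
    return ids
```

In the directional model a notification consumes an ID only in the from-kernel space, so request and reply numbers for later operations drift apart.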
-
Jakub Kicinski authored
Ethtool needs support for a handful of extra types. It doesn't have the definitions section yet. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Adapt the common object hierarchy in code gen and CLI. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
There's a lot of copy and pasting going on between the "cli" and code gen when it comes to representing the parsed spec. Create a library which both can use. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Move the CLI code out of samples/ and the library part of it into tools/net/ynl/lib/. This way we can start sharing some code with the code gen. Initially I thought that code gen is too C-specific to share anything but basic stuff like calculating values for enums can easily be shared. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
An earlier fix tried to address generated code jumping around from one code-gen run to another. Turns out dict()s are already ordered since Python 3.7, the problem is that we iterate over operation modes using a set(). Sets are unordered in Python. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
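The ordering problem can be avoided by iterating over a fixed sequence and using sets only for membership tests. The sketch below is illustrative: `OP_MODES`, `emit` and the shape of `ops` are hypothetical and not taken from the code generator.

```python
# Assumption: a fixed, explicit order of operation modes. Iterating over a
# set of strings instead could yield a different order from run to run.
OP_MODES = ["do", "dump", "event"]


def emit(ops):
    """Return one output line per (operation, mode) pair in a stable order.

    ops maps operation name -> set of supported modes. The dict keeps
    insertion order (guaranteed since Python 3.7); each set is only used
    for membership tests, never iterated.
    """
    lines = []
    for mode in OP_MODES:
        for name, modes in ops.items():
            if mode in modes:
                lines.append(f"{name}:{mode}")
    return lines
```

Sorting the set before iterating would also give a stable order; an explicit list additionally lets the author choose the order of the generated code.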
-
- 31 Jan, 2023 4 commits
-
-
Caleb Connolly authored
Replace the enable_irq_wake() call with one to dev_pm_set_wake_irq() instead. This will let the dev PM framework automatically manage the wakeup capability of the ipa IRQ and ensure that userspace requests to enable/disable wakeup for the IPA via sysfs are respected. Signed-off-by: Caleb Connolly <caleb.connolly@linaro.org> Reviewed-by: Alex Elder <elder@linaro.org> Link: https://lore.kernel.org/r/20230127202758.2913612-1-caleb.connolly@linaro.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Arnd Bergmann authored
When NET_DSA_MICROCHIP_KSZ_COMMON is built-in but PTP is a loadable module, the ksz_ptp support still causes a link failure:

    ld.lld-16: error: undefined symbol: ptp_clock_index
    >>> referenced by ksz_ptp.c
    >>> drivers/net/dsa/microchip/ksz_ptp.o:(ksz_get_ts_info) in archive vmlinux.a

This can happen if NET_DSA_MICROCHIP_KSZ8863_SMI is enabled, or even if none of the KSZ9477_I2C/KSZ_SPI/KSZ8863_SMI ones are active but only the common module is. The most straightforward way to address this is to move the dependency to NET_DSA_MICROCHIP_KSZ_PTP itself, which can now only be enabled if PTP_1588_CLOCK support is reachable from NET_DSA_MICROCHIP_KSZ_COMMON. Alternatively, one could make NET_DSA_MICROCHIP_KSZ_COMMON a hidden Kconfig symbol and extend the PTP_1588_CLOCK_OPTIONAL dependency to NET_DSA_MICROCHIP_KSZ8863_SMI as well, but that is a little more fragile.

Fixes: eac1ea20 ("net: dsa: microchip: ptp: add the posix clock support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20230130131808.1084796-1-arnd@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Randy Dunlap authored
Correct spelling problems for Documentation/networking/ as reported by codespell. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: Jiri Pirko <jiri@nvidia.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Link: https://lore.kernel.org/r/20230129231053.20863-5-rdunlap@infradead.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Nick Child authored
Previously, ibmvnic IRQs were assigned to CPU numbers by assigning all the IRQs for transmit queues then assigning all the IRQs for receive queues. With multi-threaded processors, in a heavy RX or TX environment, physical cores would either be overloaded or underutilized (due to the IRQ assignment algorithm). This approach is sub-optimal because IRQs for the same subprocess (RX or TX) would be bound to adjacent CPU numbers, meaning they were more likely to be contending for the same core.

For example, in a system with 64 CPUs and 32 queues, the IRQs would be bound to CPUs in the following pattern:

    IRQ type | CPU number
    -----------------------
    TX0      | 0-1
    TX1      | 2-3
    <etc>
    RX0      | 32-33
    RX1      | 34-35
    <etc>

Observe that in SMT-8, the first 4 TX queues would be sharing the same core.

A more optimal algorithm would balance the number of RX and TX IRQs across the physical cores. Therefore, to increase performance, distribute RX and TX IRQs across cores by alternating between assigning IRQs for RX and TX queues to CPUs. With a system with 64 CPUs and 32 queues, this results in the following pattern:

    IRQ type | CPU number
    -----------------------
    TX0      | 0-1
    RX0      | 2-3
    TX1      | 4-5
    RX1      | 6-7
    <etc>

Observe that in SMT-8, there is equal distribution of RX and TX IRQs per core. In the above case, each core handles 2 TX and 2 RX IRQs.

Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Haren Myneni <haren@linux.ibm.com> Link: https://lore.kernel.org/r/20230127214358.318152-1-nnac123@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
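The two assignment patterns in this message can be reproduced with a short model. It is a sketch only: the function names are hypothetical, the example's 32 queues are taken to be 16 TX plus 16 RX, and the CPU count is assumed to divide evenly among the queues, which the real driver does not require.

```python
from collections import Counter


def assign_irqs(num_cpus, num_tx, num_rx, interleave):
    """Map each queue IRQ to a range of CPU numbers.

    Assumption: num_cpus is a multiple of the total queue count, so every
    queue gets the same number of CPUs.
    """
    stride = num_cpus // (num_tx + num_rx)
    if interleave:
        # New scheme: alternate TX and RX queues.
        order = []
        for i in range(max(num_tx, num_rx)):
            if i < num_tx:
                order.append(f"TX{i}")
            if i < num_rx:
                order.append(f"RX{i}")
    else:
        # Old scheme: all TX queues first, then all RX queues.
        order = [f"TX{i}" for i in range(num_tx)]
        order += [f"RX{i}" for i in range(num_rx)]
    return {name: range(k * stride, (k + 1) * stride)
            for k, name in enumerate(order)}


def irqs_per_core(assignment, threads_per_core=8):
    """Count TX and RX IRQs per physical core (SMT-8 by default)."""
    counts = Counter()
    for name, cpus in assignment.items():
        counts[(cpus[0] // threads_per_core, name[:2])] += 1
    return counts
```

Calling `irqs_per_core()` on both assignments for 64 CPUs shows the imbalance directly: the sequential scheme puts four TX IRQs and no RX IRQs on core 0, while the alternating scheme puts two of each there.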
-