- 03 Sep, 2024 13 commits
-
-
Jinjie Ruan authored
Avoid need to manually handle of_node_put() by using for_each_available_child_of_node_scoped(), which can simplfy code. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Jinjie Ruan authored
Avoid need to manually handle of_node_put() by using for_each_child_of_node_scoped(), which can simplfy code. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Jinjie Ruan authored
Avoid need to manually handle of_node_put() by using for_each_child_of_node_scoped(), which can simplfy code. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Paolo Abeni authored
Alexander Lobakin says: ==================== netdev_features: start cleaning netdev_features_t up NETDEV_FEATURE_COUNT is currently 64, which means we can't add any new features as netdev_features_t is u64. As per several discussions, instead of converting netdev_features_t to a bitmap, which would mean A LOT of changes, we can try cleaning up netdev feature bits. There's a bunch of bits which don't really mean features, rather device attributes/properties that can't be changed via Ethtool in any of the drivers. Such attributes can be moved to netdev private flags without losing any functionality. Start converting some read-only netdev features to private flags from the ones that are most obvious, like lockless Tx, inability to change network namespace etc. I was able to reduce NETDEV_FEATURE_COUNT from 64 to 60, which mean 4 free slots for new features. There are obviously more read-only features to convert, such as highDMA, "challenged VLAN", HSR (4 bits) - this will be done in subsequent series. Please note that currently netdev features are not uAPI/ABI by any means. Ethtool passes their names and bits to the userspace separately and there are no hardcoded names/bits in the userspace, so that new Ethtool could work on older kernels and vice versa. This, however, isn't true for Ethtools < 3.4. I haven't changed the bit positions of the already existing features and instead replaced the freed bits with stubs. But it's anyway theoretically possible that Ethtools older than 2011 will break. I hope no currently supported distros supply such an ancient version. Shell scripts also most likely won't break since the removed bits were always read-only, meaning nobody would try touching them from a script. ==================== Link: https://patch.msgid.link/20240829123340.789395-1-aleksander.lobakin@intel.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Alexander Lobakin authored
NETIF_F_ALL_FCOE is used only in vlan_dev.c, 2 times. Now that it's only 2 bits, open-code it and remove the definition from netdev_features.h. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Alexander Lobakin authored
Ability to handle maximum FCoE frames of 2158 bytes can never be changed and thus more of an attribute, not a toggleable feature. Move it from netdev_features_t to "cold" priv flags (bitfield bool) and free yet another feature bit. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Alexander Lobakin authored
"Interface can't change network namespaces" is rather an attribute, not a feature, and it can't be changed via Ethtool. Make it a "cold" private flag instead of a netdev_feature and free one more bit. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Alexander Lobakin authored
NETIF_F_LLTX can't be changed via Ethtool and is not a feature, rather an attribute, very similar to IFF_NO_QUEUE (and hot). Free one netdev_features_t bit and make it a "hot" private flag. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Alexander Lobakin authored
Make dev->priv_flags `u32` back and define bits higher than 31 as bitfield booleans as per Jakub's suggestion. This simplifies code which accesses these bits with no optimization loss (testb both before/after), allows to not extend &netdev_priv_flags each time, but also scales better as bits > 63 in the future would only add a new u64 to the structure with no complications, comparing to that extending ::priv_flags would require converting it to a bitmap. Note that I picked `unsigned long :1` to not lose any potential optimizations comparing to `bool :1` etc. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Pawel Dembicki authored
This commit introduces implementations of three functions: .port_fdb_dump .port_fdb_add .port_fdb_del The FDB database organization is the same as in other old Vitesse chips: It has 2048 rows and 4 columns (buckets). The row index is calculated by the hash function 'vsc73xx_calc_hash' and the FDB entry must be placed exactly into row[hash]. The chip selects the bucket number by itself. Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Link: https://patch.msgid.link/20240827123938.582789-1-paweldembicki@gmail.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Jakub Kicinski authored
Merge tag 'linux-can-next-for-6.12-20240830' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2024-08-30 The first patch is by Duy Nguyen and document the R-Car V4M support in the rcar-canfd DT bindings. Frank Li's patch converts the microchip,mcp251x.txt DT bindings documentation to yaml. A patch by Zhang Changzhong update a comment in the j1939 CAN networking stack. Stefan Mätje's patch updates the CAN configuration netlink code, so that the bit timing calculation doesn't work on stale can_priv::ctrlmode data. Martin Jocic contributes a patch for the kvaser_pciefd driver to convert some ifdefs into if (IS_ENABLED()). The last patch is by Yan Zhen and simplifies the probe() function of the kvaser USB driver by using dev_err_probe(). * tag 'linux-can-next-for-6.12-20240830' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: can: kvaser_usb: Simplify with dev_err_probe() can: kvaser_pciefd: Use IS_ENABLED() instead of #ifdef can: netlink: avoid call to do_set_data_bittiming callback with stale can_priv::ctrlmode can: j1939: use correct function name in comment dt-bindings: can: convert microchip,mcp251x.txt to yaml dt-bindings: can: renesas,rcar-canfd: Document R-Car V4M support ==================== Link: https://patch.msgid.link/20240830214406.1605786-1-mkl@pengutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
ChunHao Lin authored
Add support for RTL8126A rev.b. Its XID is 0x64a. It is basically based on the one with XID 0x649, but with different firmware file. Signed-off-by: ChunHao Lin <hau@realtek.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/20240830021810.11993-1-hau@realtek.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Joe Damato authored
In commit 27f91aaf ("netdev-genl: Add netlink framework functions for napi"), when an invalid NAPI ID is specified the return value -EINVAL is used and no extack is set. Change the return value to -ENOENT and set the extack. Before this commit: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --do napi-get --json='{"id": 451}' Netlink error: Invalid argument nl_len = 36 (20) nl_flags = 0x100 nl_type = 2 error: -22 After this commit: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --do napi-get --json='{"id": 451}' Netlink error: No such file or directory nl_len = 44 (28) nl_flags = 0x300 nl_type = 2 error: -2 extack: {'bad-attr': '.id'} Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20240831121707.17562-1-jdamato@fastly.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 02 Sep, 2024 1 commit
-
-
Andrew Halaney authored
This callback doesn't seem to serve much purpose, and prevents things like: - systemd.link files from disabling autonegotiation - carrier detection in NetworkManager - any ethtool setting prior to userspace bringing the link up. The only fear I can think of is accessing unclocked resources due to pm_runtime, but ethtool ioctls handle that as of commit f32a2137 ("ethtool: runtime-resume netdev parent before ethtool ioctl ops") Reviewed-by: Dmitry Dolenko <d.dolenko@metrotek.ru> Tested-by: Dmitry Dolenko <d.dolenko@metrotek.ru> Signed-off-by: Andrew Halaney <ahalaney@redhat.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 01 Sep, 2024 4 commits
-
-
David S. Miller authored
Srujana Challa says: ==================== octeontx2-af: update CPT block for CN10KB and CN10KA B0 This commit addresses two key updates for the CN10KB and CN10KA B0: 1. The number of FLT interrupt vectors has been reduced to 2 on CN10KB. The code is updated to reflect this change across the CN10K series. 2. The maximum CPT credits that RX can use are now configurable through a hardware CSR on CN10KA B0. This patch sets the default value to optimize peak performance, aligning it with other chip versions. v2: - Addressed the review comments. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Srujana Challa authored
The maximum CPT credits that RXC can use are now configurable on CN10KA B0 through a hardware CSR. This patch sets the default value to optimize peak performance, aligning it with other chip versions. Signed-off-by: Srujana Challa <schalla@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Srujana Challa authored
This patch modifies the driver to prevent access to RXC hardware registers on the CN10KB, as RXC is not available on this chip. Signed-off-by: Srujana Challa <schalla@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Srujana Challa authored
This patch updates the driver to use a dynamic number of vectors instead of a hard-coded value. This change accommodates the CN10KB, which has 2 vectors, unlike the previously supported chips that have 3 vectors. Signed-off-by: Srujana Challa <schalla@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 31 Aug, 2024 13 commits
-
-
David S. Miller authored
Ido Schimmel says: ==================== Unmask upper DSCP bits - part 2 tl;dr - This patchset continues to unmask the upper DSCP bits in the IPv4 flow key in preparation for allowing IPv4 FIB rules to match on DSCP. No functional changes are expected. Part 1 was merged in commit ("Merge branch 'unmask-upper-dscp-bits-part-1'"). The TOS field in the IPv4 flow key ('flowi4_tos') is used during FIB lookup to match against the TOS selector in FIB rules and routes. It is currently impossible for user space to configure FIB rules that match on the DSCP value as the upper DSCP bits are either masked in the various call sites that initialize the IPv4 flow key or along the path to the FIB core. In preparation for adding a DSCP selector to IPv4 and IPv6 FIB rules, we need to make sure the entire DSCP value is present in the IPv4 flow key. This patchset continues to unmask the upper DSCP bits, but this time in the output route path. Patches #1-#3 unmask the upper DSCP bits in the various places that invoke the core output route lookup functions directly. Patches #4-#6 do the same in three helpers that are widely used in the output path to initialize the TOS field in the IPv4 flow key. The rest of the patches continue to unmask these bits in call sites that invoke the following wrappers around the core lookup functions: Patch #7 - __ip_route_output_key() Patches #8-#12 - ip_route_output_flow() The next patchset will handle the callers of ip_route_output_ports() and ip_route_output_key(). No functional changes are expected as commit 1fa3314c ("ipv4: Centralize TOS matching") moved the masking of the upper DSCP bits to the core where 'flowi4_tos' is matched against the TOS selector. Changes since v1 [1]: * Remove IPTOS_RT_MASK in patch #7 instead of in patch #6 [1] https://lore.kernel.org/netdev/20240827111813.2115285-1-idosch@nvidia.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Unmask the upper DSCP bits when calling ip_route_output_flow() so that in the future it could perform the FIB lookup according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Unmask the upper DSCP bits when calling ip_route_output_flow() so that in the future it could perform the FIB lookup according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Unmask the upper DSCP bits when calling ip_route_output_flow() so that in the future it could perform the FIB lookup according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
The function calls flowi4_init_output() to initialize an IPv4 flow key with which it then performs a FIB lookup using ip_route_output_flow(). The 'tos' variable with which the TOS value in the IPv4 flow key (flowi4_tos) is initialized contains the full DS field. Unmask the upper DSCP bits so that in the future the FIB lookup could be performed according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
The function calls flowi4_init_output() to initialize an IPv4 flow key with which it then performs a FIB lookup using ip_route_output_flow(). 'arg->tos' with which the TOS value in the IPv4 flow key (flowi4_tos) is initialized contains the full DS field. Unmask the upper DSCP bits so that in the future the FIB lookup could be performed according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
The function returns a value that is used to initialize 'flowi4_tos' before being passed to the FIB lookup API in the following call chain: xfrm_bundle_create() tos = xfrm_get_tos(fl, family) xfrm_dst_lookup(..., tos, ...) __xfrm_dst_lookup(..., tos, ...) xfrm4_dst_lookup(..., tos, ...) __xfrm4_dst_lookup(..., tos, ...) fl4->flowi4_tos = tos __ip_route_output_key(net, fl4) Unmask the upper DSCP bits so that in the future the output route lookup could be performed according to the full DSCP value. Remove IPTOS_RT_MASK since it is no longer used. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
build_sk_flow_key() and __build_flow_key() are used to build an IPv4 flow key before calling one of the FIB lookup APIs. Unmask the upper DSCP bits so that in the future the lookup could be performed according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
The function is used by a few socket types to retrieve the TOS value with which to perform the FIB lookup for packets sent through the socket (flowi4_tos). If a DS field was passed using the IP_TOS control message, then it is used. Otherwise the one specified via the IP_TOS socket option. Unmask the upper DSCP bits so that in the future the lookup could be performed according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
The function is used to read the DS field that was stored in IPv4 sockets via the IP_TOS socket option so that it could be used to initialize the flowi4_tos field before resolving an output route. Unmask the upper DSCP bits so that in the future the output route lookup could be performed according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
The function is called to resolve a route for an ICMP message that is sent in response to a situation. Based on the type of the generated ICMP message, the function is either passed the DS field of the packet that generated the ICMP message or a DS field that is derived from it. Unmask the upper DSCP bits before resolving and output route via ip_route_output_key_hash() so that in the future the lookup could be performed according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Unmask the upper DSCP bits so that in the future output routes could be looked up according to the full DSCP value. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
Unmask the upper DSCP bits when looking up an output route via the RTM_GETROUTE netlink message so that in the future the lookup could be performed according to the full DSCP value. No functional changes intended since the upper DSCP bits are masked when comparing against the TOS selectors in FIB rules and routes. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 30 Aug, 2024 9 commits
-
-
Yan Zhen authored
dev_err_probe() is used to log an error message during the probe process of a device. It can simplify the error path and unify a message template. Using this helper is totally fine even if err is known to never be -EPROBE_DEFER. The benefit compared to a normal dev_err() is the standardized format of the error code, it being emitted symbolically and the fact that the error code is returned which allows more compact error paths. Signed-off-by: Yan Zhen <yanzhen@vivo.com> Link: https://patch.msgid.link/20240830110651.519119-1-yanzhen@vivo.com mkl: fix indention Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Martin Jocic authored
Use the IS_ENABLED() macro to check kernel config defines instead of ifdef. Use upper_32_bits() to avoid warnings about "right shift count >= width of type" on systems with CONFIG_ARCH_DMA_ADDR_T_64BIT not set. In kvaser_pciefd_write_dma_map_altera() use lower_32_bits() for symmetry. Signed-off-by: Martin Jocic <martin.jocic@kvaser.com> Link: https://patch.msgid.link/20240830141038.1402217-1-mkl@pengutronix.deSigned-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Stefan Mätje authored
This patch moves the evaluation of data[IFLA_CAN_CTRLMODE] in function can_changelink in front of the evaluation of data[IFLA_CAN_BITTIMING]. This avoids a call to do_set_data_bittiming providing a stale can_priv::ctrlmode with a CAN_CTRLMODE_FD flag not matching the requested state when switching between a CAN Classic and CAN-FD bitrate. In the same manner the evaluation of data[IFLA_CAN_CTRLMODE] in function can_validate is also moved in front of the evaluation of data[IFLA_CAN_BITTIMING]. This is a preparation for patches where the nominal and data bittiming may have interdependencies on the driver side depending on the CAN_CTRLMODE_FD flag state. Signed-off-by: Stefan Mätje <stefan.maetje@esd.eu> Link: https://patch.msgid.link/20240808164224.213522-1-stefan.maetje@esd.euSigned-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Zhang Changzhong authored
The function j1939_cancel_all_active_sessions() was renamed to j1939_cancel_active_session() but name in comment wasn't updated. Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Fixes: 9d71dd0c ("can: add support of SAE J1939 protocol") Link: https://patch.msgid.link/1724935703-44621-1-git-send-email-zhangchangzhong@huawei.comSigned-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Frank Li authored
Convert binding doc microchip,mcp251x.txt to yaml. Additional change: - add ref to spi-peripheral-props.yaml Fix below warning: arch/arm64/boot/dts/freescale/imx8dx-colibri-eval-v3.dtb: /bus@5a000000/spi@5a020000/can@0: failed to match any schema with compatible: ['microchip,mcp2515'] Signed-off-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://patch.msgid.link/20240814164407.4022211-1-Frank.Li@nxp.comSigned-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Duy Nguyen authored
Document support for the CAN-FD Interface on the Renesas R-Car V4M (R8A779H0) SoC, which supports up to four channels. The CAN-FD module on R-Car V4M is very similar to the one on R-Car V4H, but differs in some hardware parameters, as reflected by the Parameter Status Information part of the Global IP Version Register. However, none of this parameterization should have any impact on the driver, as the driver does not access any register that is impacted by the parameterization (except for the number of channels). Signed-off-by: Duy Nguyen <duy.nguyen.rh@renesas.com> [geert: Clarify R-Car V4M differences] Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/68b5f910bef89508e3455c768844ebe859d6ff1d.1722520779.git.geert+renesas@glider.beSigned-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Diogo Jahchan Koike authored
Move validation into set, removing .set_validate operation as its current implementation holds the rtnl lock for acquiring the PHY device, defeating the intended purpose of checking before grabbing the lock. Reported-by: syzbot+ec369e6d58e210135f71@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ec369e6d58e210135f71 Fixes: 31748765 ("net: ethtool: pse-pd: Target the command to the requested PHY") Signed-off-by: Diogo Jahchan Koike <djahchankoike@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20240829184830.5861-1-djahchankoike@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Joe Damato authored
Two fields, page_pools and *irq_moder, were added to struct net_device in commit 083772c9 ("net: page_pool: record pools per netdev") and commit f750dfe8 ("ethtool: provide customized dim profile management"), respectively. Add both to the net_cachelines documentation, as well. Signed-off-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20240829155742.366584-1-jdamato@fastly.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Eric Dumazet says: ==================== icmp: avoid possible side-channels attacks Keyu Man reminded us that linux ICMP rate limiting was still allowing side-channels attacks. Quoting the fine document [1]: 4.4 Private Source Port Scan Method ... We can then use the same global ICMP rate limit as a side channel to infer if such an ICMP message has been triggered. At first glance, this method can work but at a low speed of one port per second, due to the per-IP rate limit on ICMP messages. Surprisingly, after we analyze the source code of the ICMP rate limit implementation, we find that the global rate limit is checked prior to the per-IP rate limit. This means that even if the per-IP rate limit may eventually determine that no ICMP reply should be sent, a packet is still subjected to the global rate limit check and one token is deducted. Ironically, such a decision is consciously made by Linux developers to avoid invoking the expensive check of the per-IP rate limit [ 22], involving a search process to locate the per-IP data structure. This effectively means that the per-IP rate limit can be disre- garded for the purpose of our side channel based scan, as it only determines if the final ICMP reply is generated but has nothing to do with the global rate limit counter decrement. As a result, we can continue to use roughly the same scan method as efficient as before, achieving 1,000 ports per second ... This series : 1) Changes the order of the two rate limiters to fix the issue. 2-3) Make the 'host-wide' rate limiter a per-netns one. [1] Link: https://dl.acm.org/doi/pdf/10.1145/3372297.3417280 ==================== Link: https://patch.msgid.link/20240829144641.3880376-1-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-