- 20 Jun, 2022 20 commits
-
-
Cong Wang authored
Currently both splice() and sockmap use ->read_sock() to read skb from receive queue, but for sockmap we only read one entire skb at a time, so ->read_sock() is too conservative to use. Introduce a new proto_ops ->read_skb() which supports this sematic, with this we can finally pass the ownership of skb to recv actors. For non-TCP protocols, all ->read_sock() can be simply converted to ->read_skb(). Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220615162014.89193-3-xiyou.wangcong@gmail.com
-
Cong Wang authored
This patch inroduces tcp_read_skb() based on tcp_read_sock(), a preparation for the next patch which actually introduces a new sock ops. TCP is special here, because it has tcp_read_sock() which is mainly used by splice(). tcp_read_sock() supports partial read and arbitrary offset, neither of them is needed for sockmap. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220615162014.89193-2-xiyou.wangcong@gmail.com
-
David S. Miller authored
Ido Schimmel says: ==================== mlxsw: Unified bridge conversion - part 1/6 This set starts converting mlxsw to the unified bridge model and mainly adds new device registers and extends existing ones that will be used in follow-up patchsets. High-level summary ================== The unified bridge model is a new way of managing low-level device objects such as filtering identifiers (FIDs). The conversion moves a lot of logic out of the device's firmware towards the driver, but its main selling point is that it allows to overcome various scalability issues related to the amount of entries that need to be programmed to the device. The only (intended) user visible changes of the conversion are improvement in resource utilization and ability to support more router interfaces (RIFs) in Spectrum-{2,3}. Details ======= Commit 50853808 ("Merge branch 'mlxsw-Prepare-for-VLAN-aware-bridge-w-VxLAN'") converted mlxsw to emulate 802.1Q FIDs (represent VLANs in a VLAN-aware bridge) using 802.1D FIDs (represent VLAN-unaware bridges). This was necessary because at that time VNI could not be assigned to 802.1Q FIDs, which effectively meant that mlxsw could not support VXLAN with VLAN-aware bridges. The downside of this approach is that multiple {Port,VID}->FID entries are required in order to classify incoming traffic to a FID, as opposed to a single VID->FID entry that can be used with actual 802.1Q FIDs. For example, if 10 ports are members in the same VLAN-aware bridge and the same 100 VLANs are configured on each port, then only 100 VID->FID entries are required with 802.1Q FIDs, whereas 1000 {Port,VID}->FID entries are required with emulated 802.1Q FIDs. The above limitation is the result of various assumptions that were made in the design of the API that was exposed to software. In the unified bridge model the API is much more "raw" and therefore avoids these assumptions, allowing software to configure the device in a more efficient manner. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amit Cohen authored
Router interfaces (RIFs) constructed on top of VLAN-aware bridges are of "VLAN" type, whereas RIFs constructed on top of VLAN-unaware bridges of "FID" type. In other words, the RIF type is derived from the underlying FID type. VLAN RIFs are used on top of 802.1Q FIDs, whereas FID RIFs are used on top of 802.1D FIDs. Currently 802.1Q FIDs are emulated using 802.1D FIDs, and therefore VLAN RIFs are emulated using FID RIFs. As part of converting the driver to use unified bridge, 802.1Q FIDs and VLAN RIFs will be used. Add the relevant fields to RITR register, add pack() function for VLAN RIF and rename one field to fit the internal name. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amit Cohen authored
As preparation for unified bridge model, add support for VNI->FID mapping via SVFA register. When performing VXLAN encapsulation, the VXLAN header needs to contain a VNI. This VNI is derived from the FID classification performed on ingress, through which the ingress RIF is also determined. Similarly, when performing VXLAN decapsulation, the FID of the packet needs to be determined. This FID is derived from VNI classification performed during decapsulation. In the old model, both entries (i.e., FID->VNI and VNI->FID) were configured via SFMR.vni. In the new model, where ingress is separated from egress, ingress configuration (VNI->FID) is performed via SVFA, while SFMR only configures egress (FID->VNI). Add 'vni' field to SVFA, add new mapping table - VNI to FID, add new pack() function for VNI mapping and edit the comment in SFMR. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amit Cohen authored
RITR configures the router interface table. As preparation for unified bridge model, add egress FID field to RITR. After routing, a packet has to perform a layer-2 lookup using the destination MAC it got from the routing and a FID. In the new model, the egress FID is configured by RITR for both sub-port and FID RIFs. Add 'efid' field to sub-port router interface and update FID router interface related comment. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amit Cohen authored
The REIV maps {egress router interface (eRIF), egress_port} -> {vlan ID}. As preparation for unified bridge model, add REIV register for future use. In the past, firmware would take care of the above mentioned mapping, but in the new model this should be done by software using REIV register. REIV register supports a simultaneous update of 256 ports using 'port_page' field. When 'port_page'=0 the records represent ports 0-255, when 'port_page'=1 the records represent ports 256-511 and so on. The register is reserved while using the legacy model. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amit Cohen authored
SFGC register maps {packet type, bridge type} -> {MID base, table type}. As preparation for unified bridge model, remove 'mid' field and add 'mid_base' field. The MID index (index to PGT table which maps MID to local port list and SMPE index) is a result of 'mid_base' + 'fid_offset'. Using the legacy bridge model, firmware configures 'mid_base'. However, using the new model, software is responsible to configure it via SFGC register. The 'mid_base' is configured per {packet type, bridge type}, for example, for {Unicast, .1Q}, {Broadcast, .1D}. Add the field 'mid_base' to SFGC register and increase the length of the register accordingly. Remove the field 'mid' as currently it is ignored by the device, its use is an old leftover. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amit Cohen authored
SFMR register creates and configures FIDs. As preparation for unified bridge model, add a required field for future use. The PGT (Port Group) table maps multicast ID (MID) to {local port list, SMPE index} on Spectrum-1 and to {local port list} on the other ASICs. In the legacy model, software did not interact with this table directly. Instead, it was accessed by firmware in response to registers such as SFTR and SMID. In the new model, the SFTR register is deprecated and software has full control over the PGT table using the SMID register. The configuration of MDB entries (using SFD) is unchanged, but flooding configuration is completely different. SFGC register maps {packet type, bridge type} -> {MID base, table type}, then with FID and FID-offset which are configured via SFMR, the MID index is obtained. Add the field 'flood_bridge_type' to SFMR, software can separate between 802.1q FIDs and vFIDs using two types which are supported. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amit Cohen authored
SFD register configures FDB table. As preparation for unified bridge model, add some required fields for future use. In the new model, firmware no longer configures the egress VID, this responsibility is moved to software. For layer 2 this means that software needs to determine the egress VID for both unicast and multicast. For unicast FDB records and unicast LAG FDB records, the VID needs to be set via new fields in SFD - 'set_vid' and 'vid'. Add the two mentioned fields for future use. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amit Cohen authored
SFMR register creates and configures FIDs. As preparation unified bridge model, add some required fields for future use. The device includes two main tables to support layer 2 multicast (i.e., MDB and flooding). These are the PGT (Port Group Table) and the MPE (Multicast Port Egress) table. - PGT is {MID -> (bitmap of local_port, SPME index)} - MPE is {(Local port, SMPE index) -> eVID} In Spectrum-2 and later ASICs, the SMPE index is an attribute of the FID and programmed via new fields in SFMR register - 'smpe_valid' and 'smpe'. Add the two mentioned fields for future use and increase the length of the register accordingly. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amit Cohen authored
SMID register maps multicast ID (MID) into a list of local ports. As preparation for unified bridge model, add some required fields for future use. The device includes two main tables to support layer 2 multicast (i.e., MDB and flooding). These are the PGT (Port Group Table) and the MPE (Multicast Port Egress) table. - PGT is {MID -> (bitmap of local_port, SPME index)} - MPE is {(Local port, SMPE index) -> eVID} In Spectrum-1, both indexes into the MPE table (local port and SMPE) are derived from the PGT table. Therefore, the SMPE index needs to be programmed as part of the PGT entry via new fields in SMID - 'smpe_valid' and 'smpe'. Add the two mentioned fields for future use and align the callers of mlxsw_reg_smid2_pack() to pass zeros for SMPE fields. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amit Cohen authored
The SMPE register maps {egress_port, SMPE index} -> VID. The device includes two main tables to support layer 2 multicast (i.e., MDB and flooding). These are the PGT (Port Group Table) and the MPE (Multicast Port Egress) table. - PGT is {MID -> (bitmap of local_port, SPME index)} - MPE is {(Local port, SMPE index) -> eVID} In Spectrum-1, the index into the MPE table - called switch multicast to port egress VID (SMPE) - is derived from the PGT entry, whereas in Spectrum-2 and later ASICs it is derived from the FID. In the legacy model, software did not interact with this table as it was completely hidden in firmware. In the new model, software needs to populate the table itself in order to map from {Local port, SMPE index} to an egress VID. This is done using the SMPE register. Add the register for future use. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amit Cohen authored
SVFA register controls the VID to FID mapping and {Port, VID} to FID mapping for virtualized ports. As preparation for unified bridge model, add some required fields for future use. On ingress, after ingress ACL, a packet needs to be classified to a FID. The key for this lookup can be one of: 1. VID. When port is not in virtual mode. 2. {RQ, VID}. When port is in virtual mode. 3. FID. When FID was set by ingress ACL. Since RITR no longer performs ingress configuration, the ingress RIF for the first two entry types needs to be set via new fields in SVFA - 'irif_v' and 'irif'. Add the two mentioned fields for future use and increase the length of the register accordingly. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amit Cohen authored
SFMR register creates and configures FIDs. As preparation for unified bridge model, add some required fields for future use. On ingress, after ingress ACL, a packet needs to be classified to a FID. The key for this lookup can be one of: 1. VID. When port is not in virtual mode. 2. {RQ, VID}. When port is in virtual mode. 3. FID. When FID was set by ingress ACL. For example, via VR_AND_FID_ACTION. Since RITR no longer performs ingress configuration, the ingress RIF for the last entry type needs to be set via new fields in SFMR - 'irif_v' and 'irif'. Add the two mentioned fields for future use. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Amit Cohen authored
SFMR register creates and configures FIDs. As preparation for unified bridge model, add a field for future use. In the new model, RITR no longer configures the rFID used for sub-port RIFs and it has to be created by software via SFMR. Such FIDs need to be created with special flood indication using 'flood_rsp' field. When set, this bit instructs the device to manage the flooding entries for this FID in a reserved part of the port group table (PGT). Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ronak Doshi authored
'Commit 6f91f4ba ("vmxnet3: add support for capability registers")' added support for capability registers. These registers are used to advertize capabilities of the device. The patch updated the dev_caps to disable outer checksum offload if PTCR register does not support it. However, it missed to update other overlay offloads. This patch fixes this issue. Fixes: 6f91f4ba ("vmxnet3: add support for capability registers") Signed-off-by: Ronak Doshi <doshir@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Kuniyuki Iwashima says: ==================== raw: Fix nits of RCU conversion series. The first patch fixes a build error by commit ba44f818 ("raw: use more conventional iterators"), but it does not land in the net tree, so this series is targeted to net-next. The second patch replaces some hlist functions with sk's helper macros. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kuniyuki Iwashima authored
hlist_nulls_add_head_rcu() and hlist_nulls_for_each_entry() have dedicated macros for sk. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kuniyuki Iwashima authored
The trailing semicolon causes a compiler error, so let's remove it. net/ipv4/raw.c: In function ‘raw_icmp_error’: net/ipv4/raw.c:266:2: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement] 266 | struct hlist_nulls_head *hlist; | ^~~~~~ Fixes: ba44f818 ("raw: use more conventional iterators") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 19 Jun, 2022 14 commits
-
-
Xiang wangx authored
Delete the redundant word 'and'. Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xiang wangx authored
Delete the redundant word 'and'. Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xiang wangx authored
Delete the redundant word 'and'. Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Simon Horman authored
This reverts commit 9386ebcc ("nfp: update nfp_X logging definitions") The reverted patch was intended to improve logging for the NFP driver by including information such as the source code file and number in log messages. Unfortunately our experience is that this has not improved things as we had hoped. The resulting logs are inconsistent with (most) other kernel log messages. And rely on knowledge of the source code version in order for the extra information to be useful. Thus, revert the change. We acknowledge that Jakub Kicinski <kuba@kernel.org> foresaw this problem. Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Russell King says: ==================== net: introduce mii_bmcr_encode_fixed() While converting the mv88e6xxx driver to phylink pcs, it has been noticed that we've started to have repeated cases where we convert a speed and duplex to a BMCR value. Rather than open coding this in multiple locations, let's provide a helper for this - in linux/mii.h. This helper not only takes care of the standard 10, 100 and 1000Mbps encodings, but also includes 2500Mbps (which is the same as 1000Mbps) for those users who require that encoding as well. Unknown speeds will be encoded to 10Mbps, and non-full duplexes will be encoded as half duplex. This series converts the existing users to the new helper, and the mv88e6xxx conversion will add further users in the 6352 and 639x PCS code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Russell King (Oracle) authored
Use the newly introduced mii_bmcr_encode_fixed() for the xpcs driver. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Russell King (Oracle) authored
Make use of the newly introduced mii_bmcr_encode_fixed() to get the BMCR value when setting loopback mode for the 88e1510. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Russell King (Oracle) authored
phylib can make use of the newly introduced mii_bmcr_encode_fixed() macro, so let's convert it over. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Russell King (Oracle) authored
Add a function to encode a fixed speed/duplex to a BMCR value. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Eric Dumazet says: ==================== raw: RCU conversion Using rwlock in networking code is extremely risky. writers can starve if enough readers are constantly grabing the rwlock. I thought rwlock were at fault and sent this patch: https://lkml.org/lkml/2022/6/17/272 But Peter and Linus essentially told me rwlock had to be unfair. We need to get rid of rwlock in networking stacks. Without this conversion, following script triggers soft lockups: for i in {1..48} do ping -f -n -q 127.0.0.1 & sleep 0.1 done Next step will be to convert ping sockets to RCU as well. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Using rwlock in networking code is extremely risky. writers can starve if enough readers are constantly grabing the rwlock. I thought rwlock were at fault and sent this patch: https://lkml.org/lkml/2022/6/17/272 But Peter and Linus essentially told me rwlock had to be unfair. We need to get rid of rwlock in networking code. Without this fix, following script triggers soft lockups: for i in {1..48} do ping -f -n -q 127.0.0.1 & sleep 0.1 done Fixes: 1da177e4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
In order to prepare the following patch, I change raw v4 & v6 code to use more conventional iterators. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xiaoliang Yang authored
When adjusting the PTP clock, the base time of the TAS configuration will become unreliable. We need reset the TAS configuration by using a new base time. For example, if the driver gets a base time 0 of Qbv configuration from user, and current time is 20000. The driver will set the TAS base time to be 20000. After the PTP clock adjustment, the current time becomes 10000. If the TAS base time is still 20000, it will be a future time, and TAS entry list will stop running. Another example, if the current time becomes to be 10000000 after PTP clock adjust, a large time offset can cause the hardware to hang. This patch introduces a tas_clock_adjust() function to reset the TAS module by using a new base time after the PTP clock adjustment. This can avoid issues above. Due to PTP clock adjustment can occur at any time, it may conflict with the TAS configuration. We introduce a new TAS lock to serialize the access to the TAS registers. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Christian Marangi authored
QCOM_SOCINFO depends on QCOM_SMEM but is not selected, this cause some problems with QCOM_SOCINFO getting selected with the dependency of QCOM_SMEM not met. To fix this remove the select in Kconfig and add additional info in the DWMAC_IPQ806X config description. Reported-by: kernel test robot <lkp@intel.com> Fixes: 9ec092d2 ("net: ethernet: stmmac: add missing sgmii configure for ipq806x") Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 18 Jun, 2022 6 commits
-
-
Eric Dumazet authored
Using rwlock in networking code is extremely risky. writers can starve if enough readers are constantly grabing the rwlock. I thought rwlock were at fault and sent this patch: https://lkml.org/lkml/2022/6/17/272 But Peter and Linus essentially told me rwlock had to be unfair. We need to get rid of rwlock in networking code. Fixes: c319b4d7 ("net: ipv4: add IPPROTO_ICMP socket kind") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Peter Lafreniere authored
ax25_dev_device_up() is only called during device setup, which is done in user context. In addition, ax25_dev_device_up() unconditionally calls ax25_register_dev_sysctl(), which already allocates with GFP_KERNEL. Since it is allowed to sleep in this function, here we change ax25_dev_device_up() to use GFP_KERNEL to reduce unnecessary out-of-memory errors. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Lafreniere <pjlafren@mtu.edu> Link: https://lore.kernel.org/r/20220616152333.9812-1-pjlafren@mtu.eduSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Xiang wangx authored
Delete the redundant word 'the'. Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com> Link: https://lore.kernel.org/r/20220616164155.11686-1-wangxiang@cdjrlc.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Xiang wangx authored
Delete the redundant word 'the'. Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com> Link: https://lore.kernel.org/r/20220616142624.3397-1-wangxiang@cdjrlc.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Yinjun Zhang authored
Show correct pause frame parameters for nfp. These parameters cannot be configured, so .set_pauseparam() is not implemented. With this change: #ethtool --show-pause enp1s0np0 Pause parameters for enp1s0np0: Autonegotiate: off RX: on TX: on Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220616133358.135305-1-simon.horman@corigine.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Oleksij Rempel authored
Rework MDIO locking to avoid potential circular locking: WARNING: possible circular locking dependency detected 5.19.0-rc1-ar9331-00017-g3ab364c7c48c #5 Not tainted ------------------------------------------------------ kworker/u2:4/68 is trying to acquire lock: 81f3c83c (ar9331:1005:(&ar9331_mdio_regmap_config)->lock){+.+.}-{4:4}, at: regmap_write+0x50/0x8c but task is already holding lock: 81f60494 (&bus->mdio_lock){+.+.}-{4:4}, at: mdiobus_read+0x40/0x78 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&bus->mdio_lock){+.+.}-{4:4}: lock_acquire+0x2d4/0x360 __mutex_lock+0xf8/0x384 mutex_lock_nested+0x2c/0x38 mdiobus_write+0x44/0x80 ar9331_sw_bus_write+0x50/0xe4 _regmap_raw_write_impl+0x604/0x724 _regmap_bus_raw_write+0x9c/0xb4 _regmap_write+0xdc/0x1a0 _regmap_update_bits+0xf4/0x118 _regmap_select_page+0x108/0x138 _regmap_raw_read+0x25c/0x288 _regmap_bus_read+0x60/0x98 _regmap_read+0xd4/0x1b0 _regmap_update_bits+0xc4/0x118 regmap_update_bits_base+0x64/0x8c ar9331_sw_irq_bus_sync_unlock+0x40/0x6c __irq_set_handler+0x7c/0xac ar9331_sw_irq_map+0x48/0x7c irq_domain_associate+0x174/0x208 irq_create_mapping_affinity+0x1a8/0x230 ar9331_sw_probe+0x22c/0x388 mdio_probe+0x44/0x70 really_probe+0x200/0x424 __driver_probe_device+0x290/0x298 driver_probe_device+0x54/0xe4 __device_attach_driver+0xe4/0x130 bus_for_each_drv+0xb4/0xd8 __device_attach+0x104/0x1a4 bus_probe_device+0x48/0xc4 device_add+0x600/0x800 mdio_device_register+0x68/0xa0 of_mdiobus_register+0x2bc/0x3c4 ag71xx_probe+0x6e4/0x984 platform_probe+0x78/0xd0 really_probe+0x200/0x424 __driver_probe_device+0x290/0x298 driver_probe_device+0x54/0xe4 __driver_attach+0x17c/0x190 bus_for_each_dev+0x8c/0xd0 bus_add_driver+0x110/0x228 driver_register+0xe4/0x12c do_one_initcall+0x104/0x2a0 kernel_init_freeable+0x250/0x288 kernel_init+0x34/0x130 ret_from_kernel_thread+0x14/0x1c -> #0 (ar9331:1005:(&ar9331_mdio_regmap_config)->lock){+.+.}-{4:4}: check_noncircular+0x88/0xc0 __lock_acquire+0x10bc/0x18bc lock_acquire+0x2d4/0x360 __mutex_lock+0xf8/0x384 mutex_lock_nested+0x2c/0x38 regmap_write+0x50/0x8c ar9331_sw_mbus_read+0x74/0x1b8 __mdiobus_read+0x90/0xec mdiobus_read+0x50/0x78 get_phy_device+0xa0/0x18c fwnode_mdiobus_register_phy+0x120/0x1d4 of_mdiobus_register+0x244/0x3c4 devm_of_mdiobus_register+0xe8/0x100 ar9331_sw_setup+0x16c/0x3a0 dsa_register_switch+0x7dc/0xcc0 ar9331_sw_probe+0x370/0x388 mdio_probe+0x44/0x70 really_probe+0x200/0x424 __driver_probe_device+0x290/0x298 driver_probe_device+0x54/0xe4 __device_attach_driver+0xe4/0x130 bus_for_each_drv+0xb4/0xd8 __device_attach+0x104/0x1a4 bus_probe_device+0x48/0xc4 deferred_probe_work_func+0xf0/0x10c process_one_work+0x314/0x4d4 worker_thread+0x2a4/0x354 kthread+0x134/0x13c ret_from_kernel_thread+0x14/0x1c other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&bus->mdio_lock); lock(ar9331:1005:(&ar9331_mdio_regmap_config)->lock); lock(&bus->mdio_lock); lock(ar9331:1005:(&ar9331_mdio_regmap_config)->lock); *** DEADLOCK *** 5 locks held by kworker/u2:4/68: #0: 81c04eb4 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1e4/0x4d4 #1: 81f0de78 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x1e4/0x4d4 #2: 81f0a880 (&dev->mutex){....}-{4:4}, at: __device_attach+0x40/0x1a4 #3: 80c8aee0 (dsa2_mutex){+.+.}-{4:4}, at: dsa_register_switch+0x5c/0xcc0 #4: 81f60494 (&bus->mdio_lock){+.+.}-{4:4}, at: mdiobus_read+0x40/0x78 stack backtrace: CPU: 0 PID: 68 Comm: kworker/u2:4 Not tainted 5.19.0-rc1-ar9331-00017-g3ab364c7c48c #5 Workqueue: events_unbound deferred_probe_work_func Stack : 00000056 800d4638 81f0d64c 00000004 00000018 00000000 80a20000 80a20000 80937590 81ef3858 81f0d760 3913578a 00000005 8045e824 81f0d600 a8db84cc 00000000 00000000 80937590 00000a44 00000000 00000002 00000001 ffffffff 81f0d6a4 80982d7c 0000000f 20202020 80a20000 00000001 80937590 81ef3858 81f0d760 3913578a 00000005 00000005 00000000 03bd0000 00000000 80e00000 ... Call Trace: [<80069db0>] show_stack+0x94/0x130 [<8045e824>] dump_stack_lvl+0x54/0x8c [<800c7fac>] check_noncircular+0x88/0xc0 [<800ca068>] __lock_acquire+0x10bc/0x18bc [<800cb478>] lock_acquire+0x2d4/0x360 [<807b84c4>] __mutex_lock+0xf8/0x384 [<807b877c>] mutex_lock_nested+0x2c/0x38 [<804ea640>] regmap_write+0x50/0x8c [<80501e38>] ar9331_sw_mbus_read+0x74/0x1b8 [<804fe9a0>] __mdiobus_read+0x90/0xec [<804feac4>] mdiobus_read+0x50/0x78 [<804fcf74>] get_phy_device+0xa0/0x18c [<804ffeb4>] fwnode_mdiobus_register_phy+0x120/0x1d4 [<805004f0>] of_mdiobus_register+0x244/0x3c4 [<804f0c50>] devm_of_mdiobus_register+0xe8/0x100 [<805017a0>] ar9331_sw_setup+0x16c/0x3a0 [<807355c8>] dsa_register_switch+0x7dc/0xcc0 [<80501468>] ar9331_sw_probe+0x370/0x388 [<804ff0c0>] mdio_probe+0x44/0x70 [<804d1848>] really_probe+0x200/0x424 [<804d1cfc>] __driver_probe_device+0x290/0x298 [<804d1d58>] driver_probe_device+0x54/0xe4 [<804d2298>] __device_attach_driver+0xe4/0x130 [<804cf048>] bus_for_each_drv+0xb4/0xd8 [<804d200c>] __device_attach+0x104/0x1a4 [<804d026c>] bus_probe_device+0x48/0xc4 [<804d108c>] deferred_probe_work_func+0xf0/0x10c [<800a0ffc>] process_one_work+0x314/0x4d4 [<800a17fc>] worker_thread+0x2a4/0x354 [<800a9a54>] kthread+0x134/0x13c [<8006306c>] ret_from_kernel_thread+0x14/0x1c [ Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20220616112550.877118-1-o.rempel@pengutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-