net: phy: marvell10g: fix broken PHY interrupts for anyone after us in the driver probe list
Enabling interrupts via device tree for the internal PHYs on the mv88e6390 DSA switch does not work. The driver insists to use poll mode. Stage one debugging shows that the fwnode_mdiobus_phy_device_register function calls fwnode_irq_get properly, and phy->irq is set to a valid interrupt line initially. But it is then cleared. Stage two debugging shows that it is cleared here: phy_probe: /* Disable the interrupt if the PHY doesn't support it * but the interrupt is still a valid one */ if (!phy_drv_supports_irq(phydrv) && phy_interrupt_is_valid(phydev)) phydev->irq = PHY_POLL; Okay, so does the "Marvell 88E6390 Family" PHY driver not have the .config_intr and .handle_interrupt function pointers? Yes it does. Stage three debugging shows that the PHY device does not attempt a probe against the "Marvell 88E6390 Family" driver, but against the "mv88x3310" driver. Okay, so why does the "mv88x3310" driver match on a mv88x6390 internal PHY? The PHY IDs (MARVELL_PHY_ID_88E6390_FAMILY vs MARVELL_PHY_ID_88X3310) are way different. Stage four debugging has us looking through: phy_device_register -> device_add -> bus_probe_device -> device_initial_probe -> __device_attach -> bus_for_each_drv -> driver_match_device -> drv->bus->match -> phy_bus_match Okay, so as we said, the MII_PHYSID1 of mv88e6390 does not match the mv88x3310 driver's PHY mask & ID, so why would phy_bus_match return... Ahh, phy_bus_match calls a shortcircuit method, phydrv->match_phy_device, and does not even bother to compare the PHY ID if that is implemented. So of course, we go inside the marvell10g.c driver and sure enough, it implements .match_phy_device and does not bother to check the PHY ID. What's interesting though is that at the end of the device_add() from phy_device_register(), the driver for the internal PHYs _is_ the proper "Marvell 88E6390 Family". This is because "mv88x3310" ends up failing to probe after all, and __device_attach_driver(), to quote: /* * Ignore errors returned by ->probe so that the next driver can try * its luck. */ The next (and only other) driver that matches is the 6390 driver. For this one, phy_probe doesn't fail, and everything expects to work as normal, EXCEPT phydev->irq has already been cleared by the previous unsuccessful probe of a driver which did not implement PHY interrupts, and therefore cleared that IRQ. Okay, so it is not just Marvell 6390 that has PHY interrupts broken. Stuff like Atheros, Aquantia, Broadcom, Qualcomm work because they are lexicographically before Marvell, and stuff like NXP, Realtek, Vitesse are broken. This goes to show how fragile it is to reset phydev->irq = PHY_POLL from the actual beginning of phy_probe itself. That seems like an actual bug of its own too, since phy_probe has side effects which are not restored on probe failure, but the line of thought probably was, the same driver will attempt probe again, so it doesn't matter. Well, looks like it does. Maybe it would make more sense to move the phydev->irq clearing after the actual device_add() in phy_device_register() completes, and the bound driver is the actual final one. (also, a bit frightening that drivers are permitted to bypass the MDIO bus matching in such a trivial way and perform PHY reads and writes from the .match_phy_device method, on devices that do not even belong to them. In the general case it might not be guaranteed that the MDIO accesses one driver needs to make to figure out whether to match on a device is safe for all other PHY devices) Fixes: a5de4be0 ("net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Marek Behún <kabel@kernel.org> Signed-off-by: Marek Behún <kabel@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20210827132541.28953-1-kabel@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
Showing
Please register or sign in to comment