• Miquel Raynal's avatar
    can: sja1000: Always restart the Tx queue after an overrun · b5efb4e6
    Miquel Raynal authored
    Upstream commit 717c6ec2 ("can: sja1000: Prevent overrun stalls with
    a soft reset on Renesas SoCs") fixes an issue with Renesas own SJA1000
    CAN controller reception: the Rx buffer is only 5 messages long, so when
    the bus loaded (eg. a message every 50us), overrun may easily
    happen. Upon an overrun situation, due to a possible internal crosstalk
    situation, the controller enters a frozen state which only can be
    unlocked with a soft reset (experimentally). The solution was to offload
    a call to sja1000_start() in a threaded handler. This needs to happen in
    process context as this operation requires to sleep. sja1000_start()
    basically enters "reset mode", performs a proper software reset and
    returns back into "normal mode".
    
    Since this fix was introduced, we no longer observe any stalls in
    reception. However it was sporadically observed that the transmit path
    would now freeze. Further investigation blamed the fix mentioned above,
    and especially the reset operation. Reproducing the reset in a loop
    helped identifying what could possibly go wrong. The sja1000 is a single
    Tx queue device, which leverages the netdev helpers to process one Tx
    message at a time. The logic is: the queue is stopped, the message sent
    to the transceiver, once properly transmitted the controller sets a
    status bit which triggers an interrupt, in the interrupt handler the
    transmission status is checked and the queue woken up. Unfortunately, if
    an overrun happens, we might perform the soft reset precisely between
    the transmission of the buffer to the transceiver and the advent of the
    transmission status bit. We would then stop the transmission operation
    without re-enabling the queue, leading to all further transmissions to
    be ignored.
    
    The reset interrupt can only happen while the device is "open", and
    after a reset we anyway want to resume normal operations, no matter if a
    packet to transmit got dropped in the process, so we shall wake up the
    queue. Restarting the device and waking-up the queue is exactly what
    sja1000_set_mode(CAN_MODE_START) does. In order to be consistent about
    the queue state, we must acquire a lock both in the reset handler and in
    the transmit path to ensure serialization of both operations. It turns
    out, a lock is already held when entering the transmit path, so we can
    just acquire/release it as well with the regular net helpers inside the
    threaded interrupt handler and this way we should be safe. As the
    reset handler might still be called after the transmission of a frame to
    the transceiver but before it actually gets transmitted, we must ensure
    we don't leak the skb, so we free it (the behavior is consistent, no
    matter if there was an skb on the stack or not).
    
    Fixes: 717c6ec2 ("can: sja1000: Prevent overrun stalls with a soft reset on Renesas SoCs")
    Cc: stable@vger.kernel.org
    Signed-off-by: default avatarMiquel Raynal <miquel.raynal@bootlin.com>
    Link: https://lore.kernel.org/all/20231002160206.190953-1-miquel.raynal@bootlin.com
    [mkl: fixed call to can_free_echo_skb()]
    Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
    b5efb4e6
sja1000.c 18.4 KB