[media] ivtv: Fix corrective action taken upon DMA ERR interrupt to avoid hang
After upgrading the kernel from stock Ubuntu 7.10 to 10.04, with no hardware changes, I started getting the dreaded DMA TIMEOUT errors, followed by inability to encode until the machine was rebooted. I came across a post from Andy in March (http://www.gossamer-threads.com/lists/ivtv/users/40943#40943) where he speculates that perhaps the corrective actions being taken after a DMA ERROR are not sufficient to recover the situation. After some testing I suspect that this is indeed the case, and that in fact the corrective action may be what hangs the card's DMA engine, rather than the original error. Specifically these DMA ERROR IRQs seem to present with two different values in the IVTV_REG_DMASTATUS register: 0x11 and 0x13. The current corrective action is to clear that status register back to 0x01 or 0x03, and then issue the next DMA request. In the case of a 0x13 this seems to result in a minor glitch in the encoded stream due to the failed transfer that was not retried, but otherwise things continue OK. In the case of a 0x11 the card's DMA write engine is never heard from again, and a DMA TIMEOUT follows shortly after. 0x11 is the killer. I suspect that the two cases need to be handled differently. The difference is in bit 1 (0x02), which is set when the error is about to be successfully recovered, and clear when things are about to go bad. Bit 1 of DMASTATUS is described differently in different places either as a positive "write finished", or an inverted "write busy". If we take the first definition, then when an error arises with state 0x11, it means that the write did not complete. It makes sense to start a new transfer, as in the current code. But if we take the second definition, then 0x11 means "an error but the write engine is still busy". Trying to feed it a new transfer in this situation might not be a good idea. As an experiment, I added code to ignore the DMA ERROR IRQ if DMASTATUS is 0x11. I.e., don't start a new transfer, don't clear our flags, etc. The hope was that the card would complete the transfer and issue a ENC DMA COMPLETE, either successfully or with an error condition there. However the card still hung. The only remaining corrective action being taken with a 0x11 status was then the write back to the status register to clear the error, i.e. DMASTATUS = DMASTATUS & ~3. This would have the effect of clearing the error bit 4, while leaving the lower bits indicating DMA write busy. Strangely enough, removing this write to the status register solved the problem! If the DMA ERROR IRQ with DMASTATUS=0x11 is completely ignored, with no corrective action at all, then the card will complete the transfer and issue a new IRQ. If the status register is written to when it has the value 0x11, then the DMA engine hangs. Perhaps it's illegal to write to DMASTATUS while the read or write busy bit is set? At any rate, it appears that the current corrective action is indeed making things worse rather than better. I put together a patch that modifies ivtv_irq_dma_err to do the following: - Don't write back to IVTV_REG_DMASTATUS. - If write-busy is asserted, leave the card alone. Just extend the timeout slightly. - If write-busy is de-asserted, retry the current transfer. This has completely fixed my DMA TIMEOUT woes. DMA ERR events still occur, but now they seem to be correctly handled. 0x11 events no longer hang the card, and 0x13 events no longer result in a glitch in the stream, as the failed transfer is retried. I'm happy. I've inlined the patch below in case it is of interest. As described above, I have a theory about why it works (based on a different interpretation of bit 1 of DMASTATUS), but I can't guarantee that my theory is correct. There may be another explanation, or it may be a fluke. Maybe ignoring that IRQ entirely would be equally effective? Maybe the status register read/writeback sequence is race condition if the card changes it in the mean time? Also as I am using a PVR-150 only, I have not been able to test it on other cards, which may be especially relevant for 350s that support concurrent decoding. Hopefully the patch does not break the DMA READ path. Mike [awalls@md.metrocast.net: Modified patch to add a verbose comment, make minor brace reformats, and clear the error flags in the IVTV_REG_DMASTATUS iff both read and write DMA were not in progress. Mike's conjecture about a race condition with the writeback is correct; it can confuse the DMA engine.] [Comment and analysis from the ML post by Michael <mike@rsy.com>] Signed-off-by: Andy Walls <awalls@md.metrocast.net> Cc: stable@kernel.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Showing
Please register or sign in to comment