[netdrvr e100] fix slab corruption
* Addresses two problems, both resulting in slab corruption: 1) driver indicating skb while HW is still DMA'ing (ouch!), 2) driver not stopping receiver activity before downing i/f. Fix is 1) wait for RNR (receiver-no-resources) interrupt before restarting receiver, 2) resetting HW to stop receiver before stopping i/f.

  This issue was also reproducible with eepro100. To reproduce it, turn off copybreak and reduce the number of descriptors to 4. Then bang on it with pktgen using 60-byte packets, with slab debugging enabled.

  For e100-3.0.x, the issue was a lot easier to reproduce with NAPI, because NAPI polls independently of where the HW is at, so it's easier for us to ask HW if it's idle while it is still in the middle of finishing off the last Rx (as it runs out of resources). Checking the RU status is not reliable! That's the problem, and the mistake both eepro100 and e100-3.0.x were making. The solution is to rely on RNR interrupts as the only indicator that HW is truly done; only then are we ready to restart the RU. We should only get RNR interrupts when we overrun the Rx ring. With NAPI, if the ring is overrun, HW will post RNR, but we won't restart the RU until we're out of polling. Without NAPI, we'll restart the RU as soon as we get RNR.

  I ran some 24-hour tests with and without NAPI (with 4 descriptors) and didn't get any corruption. Prior to this patch, I would get many errors about slab corruption.

  Also, the patch is larger than you might expect. I initially thought I was doing something wrong with managing the <list.h> ring, so I rewrote that code using an old-fashioned doubly linked list. The ring management wasn't the problem after all, but I prefer the old-fashioned doubly linked implementation as it's easier to read.
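
  The restart rule described above can be sketched as a small standalone model. This is not the code from the patch: the names `nic_model`, `ru_state`, `nic_rnr_interrupt`, `nic_poll_begin`, `nic_poll_end` and `nic_down` are hypothetical, and the hardware is reduced to a state variable. It only illustrates the idea that the driver's view of the RU changes on RNR interrupts (never by polling an RU status register), that the restart is deferred while NAPI polling is active, and that the receiver is stopped by a reset before the interface goes down.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical driver-side view of the receive unit (RU). */
enum ru_state { RU_UNINITIALIZED, RU_RUNNING, RU_SUSPENDED };

/* Hypothetical model of the adapter; real hardware state is not modeled. */
struct nic_model {
    enum ru_state ru;   /* updated from RNR interrupts, never from a status poll */
    bool napi_polling;  /* true while a NAPI-style poll is in progress */
    int ru_restarts;    /* counts how often the receiver was restarted */
};

static void nic_init(struct nic_model *nic)
{
    nic->ru = RU_RUNNING;
    nic->napi_polling = false;
    nic->ru_restarts = 0;
}

/* Restart the receiver only if an RNR interrupt told us it has stopped. */
static void nic_start_receiver(struct nic_model *nic)
{
    if (nic->ru != RU_SUSPENDED)
        return;
    nic->ru = RU_RUNNING;
    nic->ru_restarts++;
}

/* RNR is treated as the only proof that HW is done with the Rx ring. */
static void nic_rnr_interrupt(struct nic_model *nic)
{
    nic->ru = RU_SUSPENDED;
    if (!nic->napi_polling)
        nic_start_receiver(nic);  /* non-NAPI case: restart right away */
}

static void nic_poll_begin(struct nic_model *nic)
{
    nic->napi_polling = true;
}

/* NAPI case: the restart is deferred until polling has finished. */
static void nic_poll_end(struct nic_model *nic)
{
    assert(nic->napi_polling);
    nic->napi_polling = false;
    nic_start_receiver(nic);
}

/* Stand-in for resetting HW so the receiver is stopped before the i/f goes down. */
static void nic_down(struct nic_model *nic)
{
    nic->ru = RU_UNINITIALIZED;
    nic->napi_polling = false;
}
```

  In this model an RNR that arrives during a poll leaves the RU suspended until `nic_poll_end()` runs, which mirrors the "post RNR, but not restart the RU until we're out of polling" behavior described in the message.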