    USB: EHCI: add a delay when unlinking an active QH · 87d61912
    Alan Stern authored
    Michael Reutman reports that an AMD/ATI EHCI host controller on one of
    his computers does not stop transferring data when an active bulk QH
    is unlinked from the async schedule.  Apparently that host controller
    fails to implement the IAA mechanism correctly when an active QH is
    unlinked.  This leads to data corruption, because the controller
    continues to update the QH in memory when the driver doesn't expect
    it.  As a result, the next URB submitted for that QH can hang, because
    the link pointers for the TD queue have been messed up.  This
    misbehavior is observed quite regularly.
    
    To be fair, the EHCI spec (section 4.8.2) says that active QHs should
    not be unlinked.  It goes on to recommend a procedure that involves
    waiting for the QH to go inactive before unlinking it.  In the real
    world this is impractical, not least because the QH may _never_ go
    inactive.  (What were they thinking?)  Sometimes we have no choice but
    to unlink an active QH.
    
    In an attempt to avoid the problems that can ensue, this patch changes
    how the driver decides when the unlink is complete.  In addition to
    waiting through two IAA cycles, in cases where the QH was not known to
    be inactive beforehand we now wait until a 2-ms period has elapsed
    with the host controller making no change to the QH data structure
    (the hw_current and hw_token fields in particular).  The intuition
    here is that after such a long period, the endpoint must be NAKing and
    hopefully the QH has been dropped from the host controller's internal
    cache.  There's no way to know if this reasoning is really valid --
    the spec is no help in this regard -- but at least this approach fixes
    Michael's problem.
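
As a rough illustration of the "no change for 2 ms" test described above, here is a minimal sketch in C. It is not the driver's code: the names `struct unlink_wait` and `qh_is_quiescent` are hypothetical, the two fields are passed in as plain host-order values, and the caller is assumed to invoke the check once per 2-ms timer tick.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one pending unlink (not the driver's layout). */
struct unlink_wait {
    uint32_t old_current;   /* hw_current seen at the previous poll */
    uint32_t old_token;     /* hw_token seen at the previous poll */
    bool     have_sample;   /* false until the first poll has run */
};

/*
 * Call once per 2-ms tick with the QH's current hw_current and hw_token.
 * Returns true only when both fields match the values recorded on the
 * previous tick, i.e. the controller left the QH untouched for a full
 * polling period.  Otherwise the new values are recorded and the caller
 * is expected to re-arm its timer and try again.
 */
static bool qh_is_quiescent(struct unlink_wait *w,
                            uint32_t hw_current, uint32_t hw_token)
{
    bool unchanged = w->have_sample &&
                     w->old_current == hw_current &&
                     w->old_token == hw_token;

    w->old_current = hw_current;
    w->old_token = hw_token;
    w->have_sample = true;
    return unchanged;
}
```

The first poll always reports "not quiescent" because there is nothing to compare against, so the shortest possible wait in this sketch is one full period.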
    
    The test for whether the QH is already known to be inactive involves
    the reason for unlinking the QH originally.  If it was unlinked
    because it had halted, or it stopped in response to a short read, or
    it overlaid a dummy TD (a silicon bug), then it certainly is inactive.
    If it was unlinked because the TD queue was empty and no TDs have been
    added to the queue in the meantime, then it must be inactive.  Or if
    the hardware status indicates that the QH is currently halted (even if
    that wasn't the reason for unlinking it), then it is inactive.
    Otherwise, if none of those checks apply, we go through the 2-ms
    delay.
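
The checks in the paragraph above can be sketched as a single predicate. This is only an illustration under stated assumptions: the `UNLINK_*` flag names and `qh_known_inactive` are hypothetical, the token is taken as a host-order value, and the Halted status bit is assumed to be bit 6 of the token as in the EHCI qTD layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags recording why the QH was unlinked. */
enum unlink_reason {
    UNLINK_HALTED        = 1u << 0, /* QH had halted */
    UNLINK_SHORT_READ    = 1u << 1, /* stopped after a short read */
    UNLINK_DUMMY_OVERLAY = 1u << 2, /* overlaid a dummy TD (silicon bug) */
    UNLINK_QUEUE_EMPTY   = 1u << 3, /* TD queue was empty */
    UNLINK_REQUESTED     = 1u << 4  /* any other reason, e.g. URB dequeue */
};

#define TOKEN_STS_HALT (1u << 6)    /* assumed: Halted bit of the token */

/*
 * Returns true when the QH is already known to be inactive, so the
 * 2-ms stability delay can be skipped.
 */
static bool qh_known_inactive(unsigned int reasons, bool td_queue_empty,
                              uint32_t hw_token)
{
    /* Unlinked because it halted, short-read, or hit the dummy-TD bug. */
    if (reasons & (UNLINK_HALTED | UNLINK_SHORT_READ | UNLINK_DUMMY_OVERLAY))
        return true;

    /* Queue was empty at unlink time and nothing has been added since. */
    if ((reasons & UNLINK_QUEUE_EMPTY) && td_queue_empty)
        return true;

    /* Hardware status says halted, whatever the unlink reason was. */
    if (hw_token & TOKEN_STS_HALT)
        return true;

    /* None of the checks apply: the caller must go through the delay. */
    return false;
}
```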
    
    Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
    Reported-by: Michael Reutman <mreutman@epiqsolutions.com>
    Tested-by: Michael Reutman <mreutman@epiqsolutions.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ehci.h 29.1 KB