    [PATCH] fix mod_timer() race · 2eb724ed
    Andrew Morton authored
    If two CPUs run mod_timer against the same not-pending timer then they
    have no locking relationship.  They can both see the timer as
    not-pending and they both add the timer to their cpu-local list.  The
    CPU which gets there second corrupts the first CPU's lists.
    
    This was causing Dave Hansen's 8-way to oops after a couple of minutes
    of specweb testing.
    
    I believe that to fix this we need locking which is associated with the
    timer itself.  The easy fix is hashed spinlocking based on the timer's
    address.  The hard fix is a lock inside the timer itself.
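
    For comparison, the "easy fix" could look roughly like the userspace
    sketch below.  It is not code from this patch: the names, the bucket
    count of 64 and the 4-bit shift are all assumptions, and a C11
    atomic_flag stands in for a kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; none come from the patch. */
#define TIMER_LOCK_BUCKETS 64	/* assumed size; must be a power of two */

struct timer_sketch {
	unsigned long expires;
	int pending;
};

/* One lock per bucket; a timer's address selects its bucket. */
static atomic_flag timer_locks[TIMER_LOCK_BUCKETS];

static void timer_locks_init(void)
{
	for (size_t i = 0; i < TIMER_LOCK_BUCKETS; i++)
		atomic_flag_clear(&timer_locks[i]);
}

static size_t timer_lock_index(const struct timer_sketch *t)
{
	assert(t != NULL);
	/* Drop low bits that tend to be equal for aligned objects, then mask. */
	return ((uintptr_t)t >> 4) & (TIMER_LOCK_BUCKETS - 1);
}

static void timer_hashed_lock(const struct timer_sketch *t)
{
	atomic_flag *l = &timer_locks[timer_lock_index(t)];

	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the bucket lock is free */
}

static void timer_hashed_unlock(const struct timer_sketch *t)
{
	atomic_flag_clear_explicit(&timer_locks[timer_lock_index(t)],
				   memory_order_release);
}
```

    Two unrelated timers may hash to the same bucket and contend with
    each other, but no per-timer initialisation is needed, which is the
    appeal of this variant.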
    
    It is hard because init_timer() becomes compulsory, to initialise that
    spinlock.  An unknown number of code paths in the kernel just wipe the
    timer to all-zeroes and start using it.
    
    I chose the hard way - it is cleaner and more idiomatic.  The patch
    also adds a "magic number" to the timer so we can detect when a timer
    was not correctly initialised.  A warning and stack backtrace are
    generated and the timer is fixed up.  After 16 such warnings the
    warning mechanism shuts itself up until a reboot.
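
    The magic-number check described above might be sketched in
    userspace as follows.  This is an illustration only: the structure
    layout, the function names and the magic value are assumptions, and
    the backtrace is reduced to a single message on stderr.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names and values; the patch may differ on every one. */
#define TIMER_MAGIC_SKETCH 0x54494d52U	/* arbitrary value for illustration */
#define MAX_MAGIC_WARNINGS 16		/* limit stated in the changelog */

struct timer_sketch {
	unsigned long expires;
	atomic_flag lock;	/* stands in for the per-timer spinlock */
	unsigned int magic;
};

static int magic_warnings;	/* warnings emitted since "boot" */

static void init_timer_sketch(struct timer_sketch *t)
{
	t->expires = 0;
	atomic_flag_clear(&t->lock);
	t->magic = TIMER_MAGIC_SKETCH;
}

/* Returns 1 if the timer was uninitialised and had to be fixed up. */
static int check_timer_sketch(struct timer_sketch *t)
{
	assert(t != NULL);
	if (t->magic == TIMER_MAGIC_SKETCH)
		return 0;

	if (magic_warnings < MAX_MAGIC_WARNINGS) {
		magic_warnings++;
		/* The kernel would also print a stack backtrace here. */
		fprintf(stderr, "uninitialised timer at %p\n", (void *)t);
	}
	init_timer_sketch(t);	/* fix up so the lock is usable */
	return 1;
}
```

    A timer that was merely wiped to all-zeroes fails the magic
    comparison, gets reported while the warning budget lasts, and is
    initialised on the spot either way.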
    
    It took six patches to my kernel to stop the warnings from coming out.
    The uninitialised timers are extremely easy to find and fix.  But it
    will take some time to weed them all out.  Maybe we should go for
    the hashed locking...
    
    Note that the new timer->lock means that we can clean up some awkward
    "oh we raced, let's try again" code in timer.c.  But to do that we'd
    also need to take timer->lock in the commonly-called del_timer(), so I
    left it as-is.
    
    The lock is not needed in add_timer() because concurrent
    add_timer()/add_timer() and concurrent add_timer()/mod_timer() are
    illegal.
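
    The locking rule above can be illustrated with a simplified
    userspace model.  It is a sketch under assumptions, not the code in
    timer.c: the names are hypothetical, a flag and a counter stand in
    for insertion into the cpu-local list, and re-queueing of an already
    pending timer is reduced to updating its expiry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model of a timer; not the kernel's struct timer_list. */
struct timer_sketch {
	unsigned long expires;
	int pending;		/* nonzero while queued on a CPU-local list */
	int queued_count;	/* illustration aid: number of list insertions */
	atomic_flag lock;	/* stands in for timer->lock */
};

static void init_timer_sketch(struct timer_sketch *t)
{
	t->expires = 0;
	t->pending = 0;
	t->queued_count = 0;
	atomic_flag_clear(&t->lock);
}

/* Stand-in for linking the timer into the current CPU's list. */
static void internal_add_timer_sketch(struct timer_sketch *t)
{
	t->pending = 1;
	t->queued_count++;
}

/* No timer lock taken: concurrent add/add and add/mod are illegal. */
static void add_timer_sketch(struct timer_sketch *t)
{
	assert(t != NULL);
	internal_add_timer_sketch(t);
}

/* Returns whether the timer was already pending. */
static int mod_timer_sketch(struct timer_sketch *t, unsigned long expires)
{
	int was_pending;

	assert(t != NULL);
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;	/* spin: a second caller waits here */

	/* The pending test and the insertion now happen atomically. */
	was_pending = t->pending;
	t->expires = expires;
	if (!was_pending)
		internal_add_timer_sketch(t);

	atomic_flag_clear_explicit(&t->lock, memory_order_release);
	return was_pending;
}
```

    With the lock held across the pending test, a second caller of
    mod_timer_sketch() sees the timer as pending and does not queue it
    again, which is the double insertion this patch is meant to prevent.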
timer.c 29.7 KB