    Add memory barrier semantics to wake_up() & co · 04e2f174
    Linus Torvalds authored
    Oleg Nesterov and others have pointed out that on some architectures,
    the traditional sequence of
    
    	set_current_state(TASK_INTERRUPTIBLE);
    	if (CONDITION)
    		return;
    	schedule();
    
    is racy wrt another CPU doing
    
    	CONDITION = 1;
    	wake_up_process(p);
    
    because while set_current_state() has a memory barrier separating
    setting of the TASK_INTERRUPTIBLE state from reading of the CONDITION
    variable, there is no such memory barrier on the wakeup side.
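
The sleeping side described above can be sketched in userspace C11. This is only an illustration, not kernel code: the `sketch_` names and the struct are hypothetical, and a sequentially consistent fence stands in for the barrier that set_current_state() provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's task states. */
enum { SKETCH_TASK_RUNNING = 0, SKETCH_TASK_INTERRUPTIBLE = 1 };

/* Hypothetical userspace model of the sleeper's view. */
struct sketch_sleeper {
	atomic_int state;      /* models the task state */
	atomic_int condition;  /* models CONDITION */
};

/*
 * Models the sleeper's sequence up to the point where it would call
 * schedule(). Returns true if the caller would go on to sleep.
 */
static bool sketch_prepare_to_sleep(struct sketch_sleeper *s)
{
	assert(s != NULL);

	/* Publish the intent to sleep ... */
	atomic_store_explicit(&s->state, SKETCH_TASK_INTERRUPTIBLE,
			      memory_order_relaxed);

	/* ... and only then, after a full barrier, read CONDITION. */
	atomic_thread_fence(memory_order_seq_cst);

	if (atomic_load_explicit(&s->condition, memory_order_relaxed))
		return false;	/* condition already set: do not sleep */

	return true;		/* caller would call schedule() here */
}
```

The fence keeps the state store ahead of the condition load on this side; the race discussed here comes from the lack of a matching ordering on the waking side.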
    
    Now, wake_up_process() does actually take a spinlock before it reads and
    sets the task state on the waking side, and on x86 (and many other
    architectures) that spinlock is in fact equivalent to a memory barrier,
    but that is not generally guaranteed.  The write that sets CONDITION
    could move into the critical region protected by the runqueue spinlock.
    
    However, adding an smp_wmb() before taking the spinlock should order the
    writing of CONDITION wrt the lock itself, which in turn is ordered wrt
    the accesses within the spinlock (which includes the reading of the old
    state).
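
A rough userspace C11 sketch of the waking side with the added barrier follows. It is an illustration under assumptions, not the actual patch: the `sketch_` names are hypothetical, an atomic_flag spin loop stands in for the runqueue spinlock, and a release fence is used as a loose stand-in for smp_wmb(). C11 has no exact equivalent of smp_wmb(), and its formal guarantees differ from the kernel's memory model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's task states. */
enum { SKETCH_TASK_RUNNING = 0, SKETCH_TASK_INTERRUPTIBLE = 1 };

/* Hypothetical userspace model of a task and its runqueue lock. */
struct sketch_task {
	atomic_flag rq_lock;   /* models the runqueue spinlock */
	atomic_int state;      /* models the task state */
	atomic_int condition;  /* models CONDITION */
};

static void sketch_task_init(struct sketch_task *t, int state)
{
	atomic_flag_clear(&t->rq_lock);
	atomic_init(&t->state, state);
	atomic_init(&t->condition, 0);
}

/*
 * Models "CONDITION = 1; wake_up_process(p);" with the barrier placed
 * before the lock. Returns true if the task was found sleeping and was
 * marked runnable.
 */
static bool sketch_set_condition_and_wake(struct sketch_task *t)
{
	bool woke = false;

	assert(t != NULL);

	/* CONDITION = 1; */
	atomic_store_explicit(&t->condition, 1, memory_order_relaxed);

	/* Stand-in for smp_wmb(): order the store above before the lock. */
	atomic_thread_fence(memory_order_release);

	/* Take the (modelled) runqueue spinlock. */
	while (atomic_flag_test_and_set_explicit(&t->rq_lock,
						 memory_order_acquire))
		;

	/* Read the old state inside the critical region and wake if needed. */
	if (atomic_load_explicit(&t->state, memory_order_relaxed)
	    != SKETCH_TASK_RUNNING) {
		atomic_store_explicit(&t->state, SKETCH_TASK_RUNNING,
				      memory_order_relaxed);
		woke = true;
	}

	atomic_flag_clear_explicit(&t->rq_lock, memory_order_release);
	return woke;
}
```

The point of the placement is that the fence sits between the CONDITION store and the lock acquisition, so the store cannot drift into the critical region and past the read of the old state.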
    
    This should thus close the race (which probably has never been seen in
    practice, but since smp_wmb() is a no-op on x86, it's not like this will
    make anything worse either on the most common architecture where the
    spinlock already gave the required protection).
    
    Acked-by: Oleg Nesterov <oleg@tv-sign.ru>
    Acked-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Nick Piggin <nickpiggin@yahoo.com.au>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sched.c 200 KB