    sbitmap: fix race in wait batch accounting · c854ab57
    Jens Axboe authored
    If we have multiple callers of sbq_wake_up(), we can end up in a
    situation where the wait_cnt will continually go more and more
    negative. Consider the case where our wake batch is 1, hence
    wait_cnt will start out as 1.
    
    wait_cnt == 1
    
    CPU0				CPU1
    atomic_dec_return(), cnt == 0
    				atomic_dec_return(), cnt == -1
    				cmpxchg(-1, 0) (succeeds)
    				[wait_cnt now 0]
    cmpxchg(0, 1) (fails)
    
    This leaves wait_cnt at 0, so the next call will wake up
    immediately. Going through the same sequence again leaves
    wait_cnt at -1.
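
    The drift described above can be replayed deterministically in
    user space. This is only a sketch using C11 atomics, not the
    kernel code: dec_return(), try_cmpxchg() and replay_old_race()
    are hypothetical stand-ins, and the replay assumes one
    interleaving consistent with the outcome above, where CPU0's
    cmpxchg runs after CPU1's decrement and therefore fails.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for atomic_dec_return(): returns the new value. */
static int dec_return(atomic_int *v)
{
	return atomic_fetch_sub(v, 1) - 1;
}

/* Hypothetical stand-in for cmpxchg: true if *v was 'old' and got replaced. */
static bool try_cmpxchg(atomic_int *v, int old, int new_val)
{
	return atomic_compare_exchange_strong(v, &old, new_val);
}

/*
 * Replay one round of the race with the pre-fix reset rule
 * (seen value + wake_batch). Assumes wait_cnt is low enough that
 * both decrements reach zero or below. Returns the counter value
 * left behind once both CPUs are done.
 */
static int replay_old_race(atomic_int *wait_cnt, int wake_batch)
{
	int seen0 = dec_return(wait_cnt);	/* CPU0 decrements first */
	int seen1 = dec_return(wait_cnt);	/* CPU1 decrements second */

	/* CPU0 expects its own value, but CPU1 has already changed it. */
	bool cpu0_ok = try_cmpxchg(wait_cnt, seen0, seen0 + wake_batch);
	/* CPU1 still sees the value it produced, so its reset succeeds. */
	bool cpu1_ok = try_cmpxchg(wait_cnt, seen1, seen1 + wake_batch);

	assert(!cpu0_ok && cpu1_ok);
	return atomic_load(wait_cnt);
}
```

    Starting from wait_cnt == 1 with a wake batch of 1, the first
    call leaves the counter at 0 and a second call leaves it at -1,
    matching the sequence described above.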
    
    For the case where we have a larger wake batch, the only
    difference is that the starting point will be higher. We'll
    still end up with continually smaller batch wakeups, which
    defeats the purpose of the rolling wakeups.
    
    Always reset the wait_cnt to the batch value. Then it doesn't
    matter who wins the race. But ensure that whoever does win
    the race is the one that increments the ws index and wakes up
    our batch count. The loser gets to call __sbq_wake_up() again to
    account its wakeups towards the next active wait state index.
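
    The reset rule described above can be modelled in user space as
    follows. This is a simplified sketch under stated assumptions,
    not the patch itself: struct wake_model and the model_*()
    helpers are hypothetical, a single counter stands in for the
    per-wait-state counters, and the actual wakeup of waiters is
    reduced to bumping wake_index.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: one counter instead of one per wait state. */
struct wake_model {
	atomic_int wait_cnt;
	int wake_batch;
	atomic_int wake_index;	/* advanced only by the cmpxchg winner */
};

/* Decrement step; returns the value this caller observed. */
static int model_dec(struct wake_model *m)
{
	return atomic_fetch_sub(&m->wait_cnt, 1) - 1;
}

/*
 * Reset step: always aim for the full batch value, whatever was seen.
 * Returns true if this caller won the race; only the winner advances
 * the index (and, in the real code, would wake the batch).
 */
static bool model_try_reset(struct wake_model *m, int seen)
{
	int expected = seen;

	if (atomic_compare_exchange_strong(&m->wait_cnt, &expected,
					   m->wake_batch)) {
		atomic_fetch_add(&m->wake_index, 1);
		return true;
	}
	return false;
}

/* Returns true if the caller lost the race and should call again. */
static bool model_wake_up(struct wake_model *m)
{
	int seen = model_dec(m);

	if (seen > 0)
		return false;	/* batch not yet consumed */
	return !model_try_reset(m, seen);
}
```

    A caller would loop with while (model_wake_up(&m)) so that a
    loser's wakeup is accounted on its next attempt. Replaying the
    earlier interleaving with this rule leaves wait_cnt back at the
    batch value once the loser has retried.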
    
    Fixes: 6c0ca7ae ("sbitmap: fix wakeup hang after sbq resize")
    Reviewed-by: Omar Sandoval <osandov@fb.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
sbitmap.c 13.8 KB