    reiserfs: fix race with flush_used_journal_lists and flush_journal_list · 721a769c
    Jeff Mahoney authored
    Two locks are involved in managing the journal lists: the general
    reiserfs_write_lock and journal->j_flush_mutex.
    
    While flush_journal_list is sleeping to acquire the j_flush_mutex or to
    submit a block for write, it will drop the write lock. This allows
    another thread to acquire the write lock and ultimately call
    flush_used_journal_lists to traverse the list of journal lists and
    select one for flushing. It can select the journal_list that has just
    had flush_journal_list called on it in the original thread and call it
    again with the same journal_list.
    
    The second thread then drops the write lock to acquire j_flush_mutex.
    The first thread reacquires the write lock, continues execution, and
    eventually clears and frees the journal list before dropping
    j_flush_mutex and returning.
    
    The second thread acquires j_flush_mutex and ends up operating on a
    journal_list that has already been released. If the memory hasn't
    been reused, we'll soon after hit a BUG_ON because the transaction id
    has already been cleared. If it's been reused, we'll crash in other
    fun ways.
    
    Since flush_journal_list will synchronize on j_flush_mutex, we can fix
    the race by taking a proper reference on the journal list in
    flush_used_journal_lists and checking whether the list is still valid
    after the mutex is taken. It's
    safe to iterate the list of journal lists and pick a list with
    just the write lock as long as a reference is taken on the journal list
    before we drop the lock. We already have code to handle whether a
    transaction has been flushed already so we can use that to handle the
    race and get rid of the trans_id BUG_ON.
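
    The pattern described above (take a reference while the write lock is
    still held, then re-check the list once the mutex is acquired) can be
    sketched in a small single-threaded userspace model. This is only an
    illustration under stated assumptions, not the kernel code: struct jlist,
    jlist_get, jlist_put, flush_complete, still_needs_flush and replay_race
    are hypothetical names, the locks are represented only by comments, and
    the two threads are replayed sequentially in the problematic order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model of a journal list; not the kernel struct. */
struct jlist {
    int refcount;            /* number of references held on this list */
    unsigned long trans_id;  /* set to 0 once the transaction is flushed */
};

static struct jlist *jlist_alloc(unsigned long trans_id)
{
    struct jlist *jl = malloc(sizeof(*jl));
    if (!jl)
        return NULL;
    jl->refcount = 1;        /* reference owned by the list of journal lists */
    jl->trans_id = trans_id;
    return jl;
}

static void jlist_get(struct jlist *jl)
{
    jl->refcount++;
}

/* Drop one reference; returns true if this call freed the list. */
static bool jlist_put(struct jlist *jl)
{
    if (--jl->refcount == 0) {
        free(jl);
        return true;
    }
    return false;
}

/* First thread finishing its flush: clear the id, drop the owning reference. */
static bool flush_complete(struct jlist *jl)
{
    jl->trans_id = 0;
    return jlist_put(jl);
}

/* Second thread, after taking the mutex: is there still anything to flush? */
static bool still_needs_flush(const struct jlist *jl, unsigned long id)
{
    return jl->trans_id != 0 && jl->trans_id == id;
}

/* Replays the racy interleaving; returns 0 if the second thread stays safe. */
int replay_race(void)
{
    struct jlist *jl = jlist_alloc(42);
    if (!jl)
        return -1;
    unsigned long id = jl->trans_id;

    /* Thread B: selects jl under the write lock and pins it before
     * dropping the write lock to wait for the flush mutex. */
    jlist_get(jl);

    /* Thread A: completes the flush and drops the owning reference.
     * The memory survives because thread B still holds a reference. */
    bool freed_by_a = flush_complete(jl);

    /* Thread B: now holds the mutex and re-checks instead of assuming
     * the transaction id is still set. */
    bool needs_flush = still_needs_flush(jl, id);

    /* Thread B: drops its reference; this is the last one. */
    bool freed_by_b = jlist_put(jl);

    return (!freed_by_a && !needs_flush && freed_by_b) ? 0 : 1;
}
```

    In this model the second thread never touches freed memory: the extra
    reference keeps the list alive until the re-check has run, and the
    already-flushed case is handled as an ordinary condition rather than a
    BUG_ON.
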

    Signed-off-by: Jeff Mahoney <jeffm@suse.com>
    Signed-off-by: Jan Kara <jack@suse.cz>
journal.c 120 KB