    bcache: avoid journal no-space deadlock by reserving 1 journal bucket · 32feee36
    Coly Li authored
    The journal no-space deadlock has been reported from time to time. Such
    a deadlock can happen in the following situation.
    
    When all journal buckets are fully filled by active jsets under heavy
    write I/O load, cache set registration (after a reboot) loads all
    active jsets and inserts them into the btree again (this is called
    journal replay). If inserting a journaled bkey into a btree node
    causes the node to split, a new journal request might be triggered.
    For example, if the btree grows one more level after the node split,
    the root node record in the cache device super block is updated by
    bch_journal_meta() from bch_btree_set_root(). But there is no space
    left in the journal buckets, so the journal replay has to wait for a
    journal bucket to be reclaimed, which can only happen after at least
    one journal bucket has been replayed. This is one example of how the
    journal no-space deadlock happens.
    
    The solution to avoid the deadlock is to reserve 1 journal bucket at
    run time, and only permit the reserved journal bucket to be used during
    the cache set registration procedure for things like journal replay.
    Then the journal space is never fully filled at run time, and there is
    no chance for the journal no-space deadlock to happen anymore.
    
    This patch adds a new member "bool do_reserve" to struct journal. It is
    initialized to 0 (false) when struct journal is allocated, and set to
    1 (true) by bch_journal_space_reserve() when all initialization is done
    in run_cache_set(). At run time, when journal_reclaim() tries to
    allocate a new journal bucket, free_journal_buckets() is called to check
    whether there are enough free journal buckets to use. If there is only
    1 free journal bucket and journal->do_reserve is 1 (true), the last
    bucket is reserved and free_journal_buckets() returns 0 to indicate
    no free journal bucket. Then journal_reclaim() gives up, and tries
    again next time to see whether there is a free journal bucket to
    allocate. By this method, there is always 1 journal bucket reserved at
    run time.
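
    The check described above can be sketched in a few lines of userspace
    C. This is only an illustrative model, not the code from journal.c:
    struct journal_model, its nr_free counter, and the *_model function
    names are hypothetical stand-ins, and the value returned when more than
    one bucket is free (free count minus the reserved one) is an
    assumption, since the text only specifies the single-free-bucket case.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the journal state.
 * This is NOT the kernel's struct journal. */
struct journal_model {
	unsigned int nr_free;	/* assumed: count of free journal buckets */
	bool do_reserve;	/* false during registration, true at run time */
};

/* Models the role of bch_journal_space_reserve(): called once all
 * initialization is done, it turns the reservation on. */
static void journal_space_reserve_model(struct journal_model *j)
{
	j->do_reserve = true;
}

/* Models the role of free_journal_buckets(): report how many buckets
 * may be allocated, holding back one bucket while do_reserve is set. */
static unsigned int free_journal_buckets_model(const struct journal_model *j)
{
	unsigned int reserved = j->do_reserve ? 1 : 0;

	if (j->nr_free > reserved)
		return j->nr_free - reserved;

	/* Nothing usable: the caller gives up and retries later. */
	return 0;
}
```

    With do_reserve still false, a single free bucket is reported as
    usable, which is what lets journal replay make progress. Once the
    reservation is switched on, the same single free bucket is reported as
    0, so it stays in reserve.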
    
    During the cache set registration, journal->do_reserve is 0 (false), so
    the reserved journal bucket can be used to avoid the no-space deadlock.
    
    Reported-by: Nikhil Kshirsagar <nkshirsagar@gmail.com>
    Signed-off-by: Coly Li <colyli@suse.de>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20220524102336.10684-5-colyli@suse.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
journal.c 23.5 KB