    bcachefs: flush journal to avoid invalid dev usage entries on recovery · bc652905
    Brian Foster authored
    A crash immediately after device removal can result in an
    unmountable filesystem due to recovery failure. The following
    command reliably reproduces the problem on a multi-device fs:
    
      bcachefs device remove <dev> && xfs_io -xc shutdown <mnt>
    
    The post-crash mount fails with an error similar to the following,
    reported by fsck:
    
      invalid journal entry dev_usage at offset 7994/8034 seq 12: bad dev, fixing
    
    The error points at a device usage entry in the journal that
    references the index of the just-removed device. Recovery considers
    this an invalid entry and fails to proceed.
    
    Device usage entries are added to journal buffer writes via
    bch2_journal_write() -> bch2_journal_super_entries_add_common(),
    which means any journal buffer write has content that refers to
    member devices at the time of the journal write.
    
    The device remove sequence already removes metadata references to
    the device being removed. It then flushes any pins that refer to the
    device, clears replica entries, removes the in-memory device object
    and lastly updates the superblock to reflect that the device is no
    longer present. The problem is that any journal write that occurs
    during this sequence includes a dev usage entry for the device so
    long as its in-core object is still present. To avoid this problem,
    flush the journal once more after the device is removed from the
    in-core structures, but before the superblock is updated to fully
    remove the device on-disk.
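
    The ordering argument above can be illustrated with a small toy
    model. This is only a sketch: every name below (toy_fs,
    toy_journal_write() and so on) is hypothetical and none of it is
    bcachefs code. It also assumes that a flush retires all older
    journal entries, so recovery reads only what was written after the
    flush began.

```c
/* Toy model only: none of these names are bcachefs APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEVS    4
#define MAX_ENTRIES 16

struct toy_fs {
	bool   incore[MAX_DEVS];             /* in-memory member devices */
	bool   sb[MAX_DEVS];                 /* members listed in the superblock */
	bool   usage[MAX_ENTRIES][MAX_DEVS]; /* dev usage per journal write */
	size_t nr;                           /* journal writes so far */
	size_t back;                         /* oldest write recovery still reads */
};

/* Every journal write records usage for each device present in-core. */
static void toy_journal_write(struct toy_fs *fs)
{
	assert(fs->nr < MAX_ENTRIES);
	memcpy(fs->usage[fs->nr], fs->incore, sizeof(fs->incore));
	fs->nr++;
}

/* Assumption: a flush writes a fresh entry and retires all older ones. */
static void toy_journal_flush(struct toy_fs *fs)
{
	toy_journal_write(fs);
	fs->back = fs->nr - 1;
}

/* Recovery rejects usage entries for devices the superblock lacks. */
static bool toy_recovery_ok(const struct toy_fs *fs)
{
	for (size_t i = fs->back; i < fs->nr; i++)
		for (size_t d = 0; d < MAX_DEVS; d++)
			if (fs->usage[i][d] && !fs->sb[d])
				return false;
	return true;
}

/* Remove device `dev` from a two-device fs, then "crash" and recover. */
bool toy_remove_then_crash(size_t dev, bool flush_before_sb)
{
	struct toy_fs fs = { 0 };

	assert(dev < 2);
	fs.incore[0] = fs.incore[1] = true;
	fs.sb[0] = fs.sb[1] = true;

	toy_journal_write(&fs);         /* write while the device is present */
	fs.incore[dev] = false;         /* drop the in-core device object */
	if (flush_before_sb)
		toy_journal_flush(&fs); /* the extra flush described above */
	fs.sb[dev] = false;             /* superblock no longer lists the device */

	return toy_recovery_ok(&fs);    /* mount after the crash */
}
```

    In this model, toy_remove_then_crash(1, false) fails recovery
    because the only journal entry still names device 1, while
    toy_remove_then_crash(1, true) succeeds because the entry written
    by the flush no longer does.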
    
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
super.c 47 KB