    xfs: di_flushiter considered harmful · e60896d8
    Dave Chinner authored
    When we made all inode updates transactional, we no longer needed
    the log recovery detection for inodes being newer on disk than the
    transaction being replayed - it was redundant as replay of the log
    would always result in the latest version of the inode being on
    disk. It was redundant, but left in place because it wasn't
    considered to be a problem.
    
    However, with the new "don't read inodes on create" optimisation,
    flushiter has come back to bite us. Essentially, the optimisation
    always initialises flushiter to zero in the create transaction,
    and so if we then crash and run recovery and the inode already on
    disk has a non-zero flushiter, recovery will skip that inode.
    As a result, log recovery does the wrong thing and we end up with a
    corrupt filesystem.
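
    A minimal sketch of the failure, not the kernel's recovery code.
    All names here (sketch_skip_replay, SKETCH_MAX_FLUSH) are
    hypothetical, and the 16-bit counter width and the wrap handling
    are assumptions made for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: largest counter value before it wraps (assumed 16-bit). */
#define SKETCH_MAX_FLUSH 0xffffu

/*
 * Returns true when recovery would skip replaying the logged inode
 * because the on-disk copy appears newer than the logged copy.
 */
static bool sketch_skip_replay(uint16_t disk_flushiter, uint16_t log_flushiter)
{
	if (log_flushiter >= disk_flushiter)
		return false;	/* log is at least as new: replay it */

	/* Assumed wrap case: disk counter at maximum, logged counter small. */
	if (disk_flushiter == SKETCH_MAX_FLUSH &&
	    log_flushiter < (SKETCH_MAX_FLUSH >> 1))
		return false;

	return true;		/* disk copy assumed newer: skip replay */
}

/* The create transaction logs flushiter 0; stale disk inode has 7. */
static void sketch_demo_create_then_crash(void)
{
	uint16_t stale_disk = 7;
	uint16_t logged_create = 0;

	/* Recovery wrongly concludes the stale inode is newer. */
	assert(sketch_skip_replay(stale_disk, logged_create));
}
```

    In this sketch the create is never replayed, so the stale
    on-disk inode is left in place of the newly created one.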
    
    Because we have to support old kernel to new kernel upgrades, we
    can't just get rid of the flushiter support in log recovery as we
    might be upgrading from a kernel that doesn't have fully transactional
    inode updates.  Unfortunately, for v4 superblocks there is no way to
    guarantee that log recovery knows about this fact.
    
    We cannot add a new inode format flag to say it's a "special inode
    create" because it won't be understood by older kernels and so
    recovery could do the wrong thing on downgrade. We cannot specially
    detect the transition from zero mode/non-zero flushiter on disk to
    non-zero mode/zero flushiter in the log item during recovery
    because wrapping of the flushiter can result in false detection.
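
    The rejected heuristic can be sketched as below. This is only an
    illustration: the struct, the function names and the assumption
    that the counter is 16 bits wide and wraps to zero are all
    hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the two fields the heuristic would inspect. */
struct sketch_inode_state {
	uint16_t mode;		/* 0 means the inode is free */
	uint16_t flushiter;
};

/* The rejected test: free inode on disk, freshly created inode in log. */
static bool sketch_looks_like_create(struct sketch_inode_state disk,
				     struct sketch_inode_state log)
{
	return disk.mode == 0 && disk.flushiter != 0 &&
	       log.mode != 0 && log.flushiter == 0;
}

/* A counter that wraps yields the same values as a create. */
static void sketch_demo_wrap_ambiguity(void)
{
	struct sketch_inode_state disk = { .mode = 0, .flushiter = 5 };
	struct sketch_inode_state created = { .mode = 0100644, .flushiter = 0 };

	uint16_t counter = 0xffff;
	counter = (uint16_t)(counter + 1);	/* wraps to 0 */
	struct sketch_inode_state wrapped = { .mode = 0100644,
					      .flushiter = counter };

	/* Both log items match, so the values alone cannot tell them apart. */
	assert(sketch_looks_like_create(disk, created));
	assert(sketch_looks_like_create(disk, wrapped));
}
```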
    
    Hence that makes this "don't use flushiter" optimisation limited to
    a disk format that guarantees that we don't need it. And that means
    the only fix here is to limit the "no read IO on create"
    optimisation to version 5 superblocks....
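
    The resulting rule can be sketched as a single predicate. The
    struct and function names are hypothetical stand-ins, not the
    identifiers used in the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mounted filesystem's superblock. */
struct sketch_sb {
	int version;	/* 4 or 5 in this sketch */
};

/*
 * The read of the on-disk inode may be skipped only for a create on a
 * version 5 superblock, where recovery does not depend on flushiter.
 */
static bool sketch_can_skip_read_on_create(const struct sketch_sb *sb,
					   bool is_create)
{
	return is_create && sb->version >= 5;
}

static void sketch_demo_gate(void)
{
	struct sketch_sb v4 = { .version = 4 };
	struct sketch_sb v5 = { .version = 5 };

	assert(!sketch_can_skip_read_on_create(&v4, true));  /* v4: read */
	assert(sketch_can_skip_read_on_create(&v5, true));   /* v5: skip */
	assert(!sketch_can_skip_read_on_create(&v5, false)); /* not create */
}
```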
    Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Mark Tinguely <tinguely@sgi.com>
    Signed-off-by: Ben Myers <bpm@sgi.com>