    btrfs: fix warning during log replay when bumping inode link count · 769030e1
    Filipe Manana authored
    During log replay, at add_link(), we may increment the link count of
    another inode that has a reference that conflicts with a new reference
    for the inode currently being processed.
    
    During log replay, at add_link(), we may drop (unlink) a reference from
    some inode in the subvolume tree if that reference conflicts with a new
    reference found in the log for the inode we are currently processing.
    
    After the unlink, if the link count has decreased from 1 to 0, then we
    increment the link count to prevent the inode from being deleted if it's
    evicted by an iput() call, because we may have references to add to that
    inode later on (and we will fixup its link count later during log replay).
    
    However, incrementing the link count from 0 to 1 triggers a warning:
    
      $ cat fs/inode.c
      (...)
      void inc_nlink(struct inode *inode)
      {
              if (unlikely(inode->i_nlink == 0)) {
                      WARN_ON(!(inode->i_state & I_LINKABLE));
                      atomic_long_dec(&inode->i_sb->s_remove_count);
              }
      (...)
    
    The I_LINKABLE flag is only set when creating an O_TMPFILE file, so it's
    never set during log replay.
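
    The condition behind the warning can be sketched in userspace as
    follows. This is only a model: struct mock_inode, MOCK_I_LINKABLE and
    mock_inc_nlink() are hypothetical stand-ins, not kernel definitions,
    and the s_remove_count accounting is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for illustration; not kernel definitions. */
#define MOCK_I_LINKABLE 0x1u

struct mock_inode {
	unsigned int i_nlink;	/* link count */
	unsigned int i_state;	/* state flags; only MOCK_I_LINKABLE is modelled */
};

/*
 * Models the check quoted from inc_nlink(): bumping a zero link count is
 * only expected for inodes flagged as linkable (O_TMPFILE files).
 * Returns true when the real function would hit its WARN_ON().
 */
static bool mock_inc_nlink(struct mock_inode *inode)
{
	bool would_warn = false;

	assert(inode != NULL);
	if (inode->i_nlink == 0)
		would_warn = !(inode->i_state & MOCK_I_LINKABLE);
	inode->i_nlink++;
	return would_warn;
}
```

    In this model, an inode with a link count of 0 and no linkable flag,
    which is the log replay case, is the only input that reports a warning.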
    
    Most of the time, the warning isn't triggered even if we dropped the last
    reference of the conflicting inode, and this is because:
    
    1) The conflicting inode was previously marked for fixup, through a call
       to link_to_fixup_dir(), which increments the inode's link count;
    
    2) And the last iput() on the inode has not triggered eviction of the
       inode, nor was eviction triggered after the iput(). So at add_link(),
       even if we unlink the last reference of the inode, its link count ends
       up being 1 and not 0.
    
    This means that if eviction is triggered after link_to_fixup_dir() is
    called, at add_link() we will read the inode back from the subvolume tree
    with a correct link count, matching the number of references it has in
    the subvolume tree. So if the inode has exactly one reference when we
    are at add_link(), its link count is 1, and after the unlink its link
    count becomes 0.
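
    The two cases above reduce to simple arithmetic on the link count. The
    sketch below is a deliberately simplified model of that reasoning and
    not code from tree-log.c; nlink_after_unlink() and its parameters are
    hypothetical names, and it assumes the bump from link_to_fixup_dir() is
    visible only while the inode stays in memory, as the description above
    implies.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: link count seen at add_link() right after the
 * conflicting reference is unlinked.
 *
 * refs_in_tree: references the inode has in the subvolume tree (>= 1).
 * fixup_bump_in_memory: true if the increment done by link_to_fixup_dir()
 *                       is still present because the inode was not evicted.
 */
static unsigned int nlink_after_unlink(unsigned int refs_in_tree,
				       bool fixup_bump_in_memory)
{
	unsigned int nlink;

	assert(refs_in_tree >= 1);
	nlink = refs_in_tree;		/* count matching the tree's references */
	if (fixup_bump_in_memory)
		nlink++;		/* extra count from link_to_fixup_dir() */
	return nlink - 1;		/* add_link() drops the conflicting reference */
}
```

    With a single reference, the model yields 1 when the bump is still in
    memory and 0 after an eviction, which is the case that reaches the
    warning.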
    
    So fix this by using set_nlink() instead of inc_nlink(), as the former
    accepts a transition from 0 to 1 and it's what we use in other similar
    contexts (like at link_to_fixup_dir()).
    
    Also make add_inode_ref() use set_nlink() instead of inc_nlink() to
    bump the link count from 0 to 1.
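
    The difference between the two ways of bumping the link count can be
    sketched with a small userspace model. Everything below is
    hypothetical scaffolding (struct mock_inode, the mock_*() helpers and
    both keep_alive_*() functions); it is not the patch itself, and the
    real set_nlink() also maintains s_remove_count, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for illustration; not kernel definitions. */
#define MOCK_I_LINKABLE 0x1u

struct mock_inode {
	unsigned int i_nlink;	/* link count */
	unsigned int i_state;	/* state flags; only MOCK_I_LINKABLE is modelled */
	unsigned int warnings;	/* times the modelled WARN_ON() fired */
};

/* Models inc_nlink(): warns on 0 -> 1 unless the inode is linkable. */
static void mock_inc_nlink(struct mock_inode *inode)
{
	assert(inode != NULL);
	if (inode->i_nlink == 0 && !(inode->i_state & MOCK_I_LINKABLE))
		inode->warnings++;
	inode->i_nlink++;
}

/* Models set_nlink(): assigns the count directly, with no 0 -> 1 check. */
static void mock_set_nlink(struct mock_inode *inode, unsigned int nlink)
{
	assert(inode != NULL);
	inode->i_nlink = nlink;
}

/* Pattern described as problematic: bump a zero count with inc_nlink(). */
static void keep_alive_with_inc(struct mock_inode *inode)
{
	if (inode->i_nlink == 0)
		mock_inc_nlink(inode);
}

/* Pattern described as the fix: bump a zero count with set_nlink(). */
static void keep_alive_with_set(struct mock_inode *inode)
{
	if (inode->i_nlink == 0)
		mock_set_nlink(inode, 1);
}
```

    Both helpers leave the link count at 1, so the inode survives a later
    iput(); only the inc_nlink() variant records a warning in this model.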
    
    The warning is actually harmless, but it may scare users. Josef also ran
    into it recently.
    
    CC: stable@vger.kernel.org # 5.1+
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>