• Jan Kara's avatar
    writeback: Avoid skipping inode writeback · be0937e0
    Jan Kara authored
    commit 5afced3b upstream.
    
    Inode's i_io_list list head is used to attach inode to several different
    lists - wb->{b_dirty, b_dirty_time, b_io, b_more_io}. When flush worker
    prepares a list of inodes to writeback e.g. for sync(2), it moves inodes
    to b_io list. Thus it is critical for sync(2) data integrity guarantees
    that inode is not requeued to any other writeback list when inode is
    queued for processing by flush worker. That's the reason why
    writeback_single_inode() does not touch i_io_list (unless the inode is
    completely clean) and why __mark_inode_dirty() does not touch i_io_list
    if I_SYNC flag is set.
    
    However there are two flaws in the current logic:
    
    1) When inode has only I_DIRTY_TIME set but it is already queued in b_io
    list due to sync(2), concurrent __mark_inode_dirty(inode, I_DIRTY_SYNC)
    can still move inode back to b_dirty list resulting in skipping
    writeback of inode time stamps during sync(2).
    
    2) When inode is on b_dirty_time list and writeback_single_inode() races
    with __mark_inode_dirty() like:
    
    writeback_single_inode()		__mark_inode_dirty(inode, I_DIRTY_PAGES)
      inode->i_state |= I_SYNC
      __writeback_single_inode()
    					  inode->i_state |= I_DIRTY_PAGES;
    					  if (inode->i_state & I_SYNC)
    					    bail
      if (!(inode->i_state & I_DIRTY_ALL))
      - not true so nothing done
    
    We end up with I_DIRTY_PAGES inode on b_dirty_time list and thus
    standard background writeback will not writeback this inode leading to
    possible dirty throttling stalls etc. (thanks to Martijn Coenen for this
    analysis).
    
    Fix these problems by tracking whether inode is queued in b_io or
    b_more_io lists in a new I_SYNC_QUEUED flag. When this flag is set, we
    know flush worker has queued inode and we should not touch i_io_list.
    On the other hand we also know that once flush worker is done with the
    inode it will requeue the inode to appropriate dirty list. When
    I_SYNC_QUEUED is not set, __mark_inode_dirty() can (and must) move inode
    to appropriate dirty list.
    Reported-by: default avatarMartijn Coenen <maco@android.com>
    Reviewed-by: default avatarMartijn Coenen <maco@android.com>
    Tested-by: default avatarMartijn Coenen <maco@android.com>
    Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
    Fixes: 0ae45f63 ("vfs: add support for a lazytime mount option")
    CC: stable@vger.kernel.org
    Signed-off-by: default avatarJan Kara <jack@suse.cz>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    be0937e0
fs-writeback.c 72.9 KB