• Dmitry Monakhov's avatar
    ext4: fix ext4_flush_completed_IO wait semantics · c278531d
    Dmitry Monakhov authored
    BUG #1) All places where we call ext4_flush_completed_IO are broken
        because buffered io and DIO/AIO goes through three stages
        1) submitted io,
        2) completed io (in i_completed_io_list) conversion pended
        3) finished  io (conversion done)
        And by calling ext4_flush_completed_IO we will flush only
        requests which were in (2) stage, which is wrong because:
         1) punch_hole and truncate _must_ wait for all outstanding unwritten io
          regardless to it's state.
         2) fsync and nolock_dio_read should also wait because there is
            a time window between end_page_writeback() and ext4_add_complete_io()
            As result integrity fsync is broken in case of buffered write
            to fallocated region:
            fsync                                      blkdev_completion
    	 ->filemap_write_and_wait_range
                                                       ->ext4_end_bio
                                                         ->end_page_writeback
              <-- filemap_write_and_wait_range return
    	 ->ext4_flush_completed_IO
       	 sees empty i_completed_io_list but pended
       	 conversion still exist
                                                         ->ext4_add_complete_io
    
    BUG #2) Race window becomes wider due to the 'ext4: completed_io
    locking cleanup V4' patch series
    
    This patch make following changes:
    1) ext4_flush_completed_io() now first try to flush completed io and when
       wait for any outstanding unwritten io via ext4_unwritten_wait()
    2) Rename function to more appropriate name.
    3) Assert that all callers of ext4_flush_unwritten_io should hold i_mutex to
       prevent endless wait
    Signed-off-by: default avatarDmitry Monakhov <dmonakhov@openvz.org>
    Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
    Reviewed-by: default avatarJan Kara <jack@suse.cz>
    c278531d
indirect.c 44.5 KB