    zonefs: fix synchronous direct writes to sequential files · fe9da61f
    Damien Le Moal authored
    Commit 16d7fd3c ("zonefs: use iomap for synchronous direct writes")
    changes zonefs code from a self-built zone append BIO to using iomap for
    synchronous direct writes. This change relies on iomap submit BIO
    callback to change the write BIO built by iomap to a zone append BIO.
    However, this change overlooked the fact that a write BIO built by iomap
    may be very large, since it is normally split when issued. Changing such
    a BIO from a regular write to a zone append operation can result in a
    block layer warning, as zone append BIOs are not allowed to be split.
    
    WARNING: CPU: 18 PID: 202210 at block/bio.c:1644 bio_split+0x288/0x350
    Call Trace:
    ? __warn+0xc9/0x2b0
    ? bio_split+0x288/0x350
    ? report_bug+0x2e6/0x390
    ? handle_bug+0x41/0x80
    ? exc_invalid_op+0x13/0x40
    ? asm_exc_invalid_op+0x16/0x20
    ? bio_split+0x288/0x350
    bio_split_rw+0x4bc/0x810
    ? __pfx_bio_split_rw+0x10/0x10
    ? lockdep_unlock+0xf2/0x250
    __bio_split_to_limits+0x1d8/0x900
    blk_mq_submit_bio+0x1cf/0x18a0
    ? __pfx_iov_iter_extract_pages+0x10/0x10
    ? __pfx_blk_mq_submit_bio+0x10/0x10
    ? find_held_lock+0x2d/0x110
    ? lock_release+0x362/0x620
    ? mark_held_locks+0x9e/0xe0
    __submit_bio+0x1ea/0x290
    ? __pfx___submit_bio+0x10/0x10
    ? seqcount_lockdep_reader_access.constprop.0+0x82/0x90
    submit_bio_noacct_nocheck+0x675/0xa20
    ? __pfx_bio_iov_iter_get_pages+0x10/0x10
    ? __pfx_submit_bio_noacct_nocheck+0x10/0x10
    iomap_dio_bio_iter+0x624/0x1280
    __iomap_dio_rw+0xa22/0x18a0
    ? lock_is_held_type+0xe3/0x140
    ? __pfx___iomap_dio_rw+0x10/0x10
    ? lock_release+0x362/0x620
    ? zonefs_file_write_iter+0x74c/0xc80 [zonefs]
    ? down_write+0x13d/0x1e0
    iomap_dio_rw+0xe/0x40
    zonefs_file_write_iter+0x5ea/0xc80 [zonefs]
    do_iter_readv_writev+0x18b/0x2c0
    ? __pfx_do_iter_readv_writev+0x10/0x10
    ? inode_security+0x54/0xf0
    do_iter_write+0x13b/0x7c0
    ? lock_is_held_type+0xe3/0x140
    vfs_writev+0x185/0x550
    ? __pfx_vfs_writev+0x10/0x10
    ? __handle_mm_fault+0x9bd/0x1c90
    ? find_held_lock+0x2d/0x110
    ? lock_release+0x362/0x620
    ? find_held_lock+0x2d/0x110
    ? lock_release+0x362/0x620
    ? __up_read+0x1ea/0x720
    ? do_pwritev+0x136/0x1f0
    do_pwritev+0x136/0x1f0
    ? __pfx_do_pwritev+0x10/0x10
    ? syscall_enter_from_user_mode+0x22/0x90
    ? lockdep_hardirqs_on+0x7d/0x100
    do_syscall_64+0x58/0x80
    
    This error depends on the hardware used, specifically on the max zone
    append bytes and max_[hw_]sectors limits. Tests using AMD Epyc machines
    that have low limits did not reveal this issue while runs on Intel Xeon
    machines with larger limits trigger it.
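
    To compare these limits across machines, the values can be read from
    the block device queue attributes in sysfs. The snippet below is a
    minimal sketch, not part of this patch: the helper name
    read_queue_limits() and its sysfs_root parameter are made up for
    illustration, and it assumes a kernel that exposes the
    zone_append_max_bytes, max_hw_sectors_kb and max_sectors_kb queue
    attributes.

```python
from pathlib import Path

# Queue attributes relevant to the splitting behavior described above.
# Attributes missing on a given kernel or device are simply skipped.
QUEUE_ATTRS = ("zone_append_max_bytes", "max_hw_sectors_kb", "max_sectors_kb")


def read_queue_limits(dev, sysfs_root="/sys/block"):
    """Return {attribute: int} for the attributes found under <root>/<dev>/queue.

    `dev` is a block device name such as "sdb" (hypothetical example).
    `sysfs_root` is overridable so the helper can be exercised without
    real hardware.
    """
    queue_dir = Path(sysfs_root) / dev / "queue"
    limits = {}
    for name in QUEUE_ATTRS:
        attr = queue_dir / name
        if attr.is_file():
            limits[name] = int(attr.read_text().strip())
    return limits
```

    Calling read_queue_limits() with the name of the zoned device on both
    machines gives the values to compare. Note that
    zone_append_max_bytes is reported in bytes while the two *_sectors_kb
    attributes are in KiB.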
    
    Manually splitting the zone append BIO using bio_split_rw() can solve
    this issue but also requires issuing the fragment BIOs synchronously
    with submit_bio_wait(), to avoid potential reordering of the zone append
    BIO fragments, which would lead to data corruption. That is, this
    solution is not better than using regular write BIOs which are subject
    to serialization using zone write locking at the IO scheduler level.
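
    The reordering hazard can be illustrated with a toy userspace model.
    This is only a sketch under simplifying assumptions, not kernel code:
    ToyZone and append_fragments() are hypothetical names, and the zone is
    modeled as a plain byte buffer whose length is the write pointer. A
    zone append lands wherever the write pointer currently is, so the
    final layout follows the order in which fragments reach the device,
    whereas a regular write must target the write pointer exactly.

```python
class ToyZone:
    """Toy sequential zone: data can only land at the write pointer."""

    def __init__(self):
        self.data = bytearray()

    def zone_append(self, payload):
        """The device picks the location: always the current write pointer."""
        offset = len(self.data)
        self.data += payload
        return offset

    def regular_write(self, offset, payload):
        """A regular write must start exactly at the write pointer."""
        if offset != len(self.data):
            raise OSError("unaligned write: offset does not match write pointer")
        self.data += payload


def append_fragments(zone, fragments, arrival_order):
    """Issue fragments as zone appends in the order the device sees them."""
    for index in arrival_order:
        zone.zone_append(fragments[index])
    return bytes(zone.data)


fragments = [b"AAAA", b"BBBB", b"CCCC"]

# Fragments reach the device in submission order: layout matches the file.
in_order = append_fragments(ToyZone(), fragments, [0, 1, 2])

# The first two fragments are swapped in flight: every append still
# succeeds, but the data is silently stored in the wrong order.
reordered = append_fragments(ToyZone(), fragments, [1, 0, 2])
```

    In this model in_order holds the fragments in their intended order,
    while reordered holds the second fragment first and no error is
    raised. An out-of-order regular_write() fails instead of corrupting
    the layout, which is why regular writes need to be serialized.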
    
    Given this, fix the issue by removing zone append support and using
    regular write BIOs for synchronous direct writes. This allows preserving
    the use of iomap and having identical synchronous and asynchronous
    sequential file write paths. Zone append support will be reintroduced
    later through io_uring commands to ensure that the needed special
    handling is done correctly.
    
    Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
    Fixes: 16d7fd3c ("zonefs: use iomap for synchronous direct writes")
    Cc: stable@vger.kernel.org
    Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
    Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
zonefs.h 7.33 KB