    btrfs: ensure pages are unlocked on cow_file_range() failure · 9ce7466f
    Naohiro Aota authored
    There is a hung_task report on zoned btrfs like below.
    
    https://github.com/naota/linux/issues/59
    
      [726.328648] INFO: task rocksdb:high0:11085 blocked for more than 241 seconds.
      [726.329839]       Not tainted 5.16.0-rc1+ #1
      [726.330484] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [726.331603] task:rocksdb:high0   state:D stack:    0 pid:11085 ppid: 11082 flags:0x00000000
      [726.331608] Call Trace:
      [726.331611]  <TASK>
      [726.331614]  __schedule+0x2e5/0x9d0
      [726.331622]  schedule+0x58/0xd0
      [726.331626]  io_schedule+0x3f/0x70
      [726.331629]  __folio_lock+0x125/0x200
      [726.331634]  ? find_get_entries+0x1bc/0x240
      [726.331638]  ? filemap_invalidate_unlock_two+0x40/0x40
      [726.331642]  truncate_inode_pages_range+0x5b2/0x770
      [726.331649]  truncate_inode_pages_final+0x44/0x50
      [726.331653]  btrfs_evict_inode+0x67/0x480
      [726.331658]  evict+0xd0/0x180
      [726.331661]  iput+0x13f/0x200
      [726.331664]  do_unlinkat+0x1c0/0x2b0
      [726.331668]  __x64_sys_unlink+0x23/0x30
      [726.331670]  do_syscall_64+0x3b/0xc0
      [726.331674]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [726.331677] RIP: 0033:0x7fb9490a171b
      [726.331681] RSP: 002b:00007fb943ffac68 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
      [726.331684] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb9490a171b
      [726.331686] RDX: 00007fb943ffb040 RSI: 000055a6bbe6ec20 RDI: 00007fb94400d300
      [726.331687] RBP: 00007fb943ffad00 R08: 0000000000000000 R09: 0000000000000000
      [726.331688] R10: 0000000000000031 R11: 0000000000000246 R12: 00007fb943ffb000
      [726.331690] R13: 00007fb943ffb040 R14: 0000000000000000 R15: 00007fb943ffd260
      [726.331693]  </TASK>
    
    While debugging the issue, we found that running fstests generic/551
    on a 5GB non-zoned null_blk device in emulated zoned mode also
    produced a similar hang.
    
    We can also reproduce the same symptom by injecting an error into
    cow_file_range().
    
    The hang occurs when cow_file_range() fails in the middle of
    allocation. cow_file_range() called from do_allocation_zoned() can
    split the given region ([start, end]) for allocation depending on
    current block group usage. When btrfs can allocate bytes for one part
    of the split regions but fails for another (e.g. because of
    -ENOSPC), we return the error and leave the pages in the successfully
    allocated regions locked. Technically, this occurs only when
    @unlock == 0. Otherwise, we unlock the pages in an allocated region
    after creating an ordered extent.
    
    Since the callers of cow_file_range(unlock=0) won't write out
    the pages when it fails, we can unlock the pages on the error exit
    from cow_file_range(). This ensures that all the pages except
    @locked_page are unlocked in the error case.
    
    In summary, cow_file_range() now behaves like this:
    
    - page_started == 1 (return value)
      - All the pages are unlocked. IO is started.
    - unlock == 1
      - All the pages except @locked_page are unlocked in any case
    - unlock == 0
      - On success, all the pages are kept locked for writing them out
      - On failure, all the pages except @locked_page are unlocked
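
    The unlock == 1 and unlock == 0 cases above can be illustrated with a
    small userspace toy model. This is only a sketch, not the kernel
    code: struct toy_range, toy_lock_all(), toy_count_locked(),
    toy_cow_range() and its chunk/fail_chunk parameters are hypothetical
    names invented for this illustration, pages are modeled as an array
    of booleans, and the page_started case is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 8

/* Toy stand-in for a range of page-cache pages; true means "locked". */
struct toy_range {
	bool locked[NR_PAGES];
};

/* Lock every page, as a caller would before asking for allocation. */
static void toy_lock_all(struct toy_range *r)
{
	for (size_t i = 0; i < NR_PAGES; i++)
		r->locked[i] = true;
}

/* Count the pages that are still locked. */
static size_t toy_count_locked(const struct toy_range *r)
{
	size_t n = 0;

	for (size_t i = 0; i < NR_PAGES; i++)
		n += r->locked[i] ? 1 : 0;
	return n;
}

/*
 * Hypothetical model of the behavior described in the summary.
 * The range is allocated in pieces of "chunk" pages, and the piece
 * with index "fail_chunk" fails (pass NR_PAGES to never fail).
 * Returns 0 on success and -1 on failure.
 */
static int toy_cow_range(struct toy_range *r, size_t locked_page,
			 size_t chunk, size_t fail_chunk, bool unlock)
{
	size_t idx = 0;

	if (chunk == 0)
		return -1;

	for (size_t start = 0; start < NR_PAGES; start += chunk, idx++) {
		size_t end = start + chunk < NR_PAGES ? start + chunk : NR_PAGES;

		if (idx == fail_chunk) {
			/* Error exit: unlock everything except locked_page. */
			for (size_t i = 0; i < NR_PAGES; i++)
				if (i != locked_page)
					r->locked[i] = false;
			return -1;
		}

		if (unlock) {
			/* unlock == 1: release each piece once it is allocated. */
			for (size_t i = start; i < end; i++)
				if (i != locked_page)
					r->locked[i] = false;
		}
		/* unlock == 0: keep the piece locked for the caller. */
	}
	return 0;
}
```

    In this model, a failure in a later piece with unlock == 0 leaves only
    @locked_page locked, whereas before this change the earlier, already
    allocated pieces would have stayed locked.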
    
    Fixes: 42c01100 ("btrfs: zoned: introduce dedicated data write path for zoned filesystems")
    CC: stable@vger.kernel.org # 5.12+
    Reviewed-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
inode.c 325 KB