1. 09 Feb, 2021 23 commits
  2. 08 Feb, 2021 17 commits
    • iomap: support REQ_OP_ZONE_APPEND · c3b0e880
      Naohiro Aota authored
      A ZONE_APPEND bio must follow hardware restrictions (e.g. not exceeding
      max_zone_append_sectors) so that it is never split.
      bio_iov_iter_get_pages() builds such a restricted bio using
      __bio_iov_append_get_pages() if bio_op(bio) == REQ_OP_ZONE_APPEND.
      
      To utilize it, we need to set the bio_op before calling
      bio_iov_iter_get_pages(). This commit introduces IOMAP_F_ZONE_APPEND, so
      that an iomap user can set the flag to indicate that it wants
      REQ_OP_ZONE_APPEND and a restricted bio.
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
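To illustrate why the op has to be known before the pages are gathered, here is a minimal C sketch. It is not iomap code: `enum req_op`, the bit value chosen for `IOMAP_F_ZONE_APPEND`, and the helpers `iomap_choose_op()` and `gather_sectors()` are hypothetical stand-ins for the flow described above.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's request ops and iomap flag. */
enum req_op { REQ_OP_WRITE, REQ_OP_ZONE_APPEND };

#define IOMAP_F_ZONE_APPEND (1u << 0) /* illustrative bit value only */

/* Decide the bio op from the iomap flags before any pages are gathered. */
static enum req_op iomap_choose_op(unsigned int iomap_flags)
{
	return (iomap_flags & IOMAP_F_ZONE_APPEND) ? REQ_OP_ZONE_APPEND
						   : REQ_OP_WRITE;
}

/*
 * Sectors to gather into one bio. A zone-append bio must never be split,
 * so it is capped at max_zone_append_sectors up front; a regular write may
 * still be split later by the block layer, so it is left as requested.
 */
static unsigned int gather_sectors(enum req_op op, unsigned int want,
				   unsigned int max_zone_append_sectors)
{
	if (op == REQ_OP_ZONE_APPEND && want > max_zone_append_sectors)
		return max_zone_append_sectors;
	return want;
}
```

With the flag set, a 512-sector request is capped at a 256-sector zone-append limit while the bio is being built, instead of relying on a split that a zone-append bio cannot tolerate.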
    • block: add bio_add_zone_append_page · ae29333f
      Johannes Thumshirn authored
      Add bio_add_zone_append_page(), a wrapper around bio_add_hw_page() which
      is intended to be used by file systems that directly add pages to a bio
      instead of using bio_iov_iter_get_pages().
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Acked-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
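A reduced model of the wrapper idea, assuming a simplified bio that only tracks its size in bytes. `struct mock_bio`, `mock_add_hw_page()` and `mock_add_zone_append_page()` are hypothetical; they only mirror the shape described above (a thin wrapper that feeds the zone-append limit into a generic, hardware-limited add). Both return the number of bytes added, or 0 if the page does not fit.

```c
#include <assert.h>
#include <stdbool.h>

#define SECTOR_SHIFT 9 /* 512-byte sectors */

/* Hypothetical, reduced bio: only what the size check needs. */
struct mock_bio {
	bool is_zone_append;
	unsigned int size; /* bytes already in the bio */
};

/* Generic add: refuse the page if the bio would exceed max_sectors. */
static unsigned int mock_add_hw_page(struct mock_bio *bio, unsigned int len,
				     unsigned int max_sectors)
{
	unsigned long limit = (unsigned long)max_sectors << SECTOR_SHIFT;

	if ((unsigned long)bio->size + len > limit)
		return 0;
	bio->size += len;
	return len;
}

/* Zone-append wrapper: only for zone-append bios, using their own limit. */
static unsigned int mock_add_zone_append_page(struct mock_bio *bio,
					      unsigned int len,
					      unsigned int max_zone_append_sectors)
{
	if (!bio->is_zone_append)
		return 0;
	return mock_add_hw_page(bio, len, max_zone_append_sectors);
}
```

A file system that builds bios page by page would call the wrapper in a loop and submit the bio as soon as it returns 0, so no zone-append bio ever needs splitting.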
    • btrfs: fix extent buffer leak on failure to copy root · 72c9925f
      Filipe Manana authored
      At btrfs_copy_root(), if the call to btrfs_inc_ref() fails we end up
      returning without unlocking and releasing our reference on the extent
      buffer named "cow" we previously allocated with btrfs_alloc_tree_block().
      
      So fix that by unlocking the extent buffer and dropping our reference on
      it before returning.
      
      Fixes: be20aa9d ("Btrfs: Add mount option to turn off data cow")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
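The fix follows the usual "undo what you did before returning" error-path pattern. A minimal model, assuming a hypothetical `struct mock_eb` that has only a lock flag and a refcount; `mock_alloc_tree_block()`, `mock_free_extent_buffer()` and `mock_copy_root()` are illustrative stand-ins, and the `live_ebs` counter exists only to make a leak observable.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, reduced extent buffer: a lock flag and a reference count. */
struct mock_eb {
	bool locked;
	int refs;
};

static int live_ebs; /* counts allocated buffers so leaks are observable */

static struct mock_eb *mock_alloc_tree_block(void)
{
	struct mock_eb *eb = malloc(sizeof(*eb));

	if (eb) {
		eb->locked = true; /* handed back locked */
		eb->refs = 1;      /* caller owns one reference */
		live_ebs++;
	}
	return eb;
}

static void mock_free_extent_buffer(struct mock_eb *eb)
{
	if (--eb->refs == 0) {
		free(eb);
		live_ebs--;
	}
}

/* Skeleton of the copy; inc_ref_ret simulates the ref-increment result. */
static int mock_copy_root(int inc_ref_ret, struct mock_eb **out)
{
	struct mock_eb *cow = mock_alloc_tree_block();

	if (!cow)
		return -ENOMEM;
	if (inc_ref_ret < 0) {
		cow->locked = false;          /* unlock ... */
		mock_free_extent_buffer(cow); /* ... and drop our reference */
		return inc_ref_ret;
	}
	*out = cow;
	return 0;
}
```

Without the two cleanup lines in the failure branch, `live_ebs` would stay at 1 after the failed call, which is the leak the commit describes.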
    • btrfs: explain page locking and readahead in read_extent_buffer_pages() · 2c4d8cb7
      Qu Wenruo authored
      In read_extent_buffer_pages(), if we fail to lock the page atomically,
      we just exit with return value 0.
      
      This is counter-intuitive, as normally if we can't lock what we need, we
      would return something like EAGAIN.
      
      But that return sits under the (wait == WAIT_NONE) branch, which is only
      taken for readahead.
      
      And for readahead, if we fail to lock the page, it means the extent
      buffer is either being read by another thread, or has been read and is
      under modification.  Either way the eb will be or has already been
      cached, thus readahead has no need to wait for it.
      
      Add a comment on this counter-intuitive behavior.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
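A minimal model of the readahead branch, assuming a hypothetical `struct mock_page` with a non-blocking lock. `mock_trylock_page()` and `mock_readahead_eb_page()` are illustrative only, and the return convention (1 = this caller would submit the read, 0 = backed off) is this sketch's own, chosen so that the two outcomes can be told apart.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page with a non-blocking lock. */
struct mock_page {
	bool locked;
};

static bool mock_trylock_page(struct mock_page *page)
{
	if (page->locked)
		return false;
	page->locked = true;
	return true;
}

/*
 * Readahead branch only (the WAIT_NONE case). Returns 1 if this caller took
 * the lock and would submit the read, 0 if it backed off. Backing off is not
 * an error: the lock holder is either reading the eb or it is already cached.
 */
static int mock_readahead_eb_page(struct mock_page *page)
{
	if (!mock_trylock_page(page))
		return 0; /* deliberately not -EAGAIN */
	return 1;
}
```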
    • btrfs: allow read-only mount of 4K sector size fs on 64K page system · 0bb3eb3e
      Qu Wenruo authored
      This adds basic read-only mount support for a 4K sector size fs on a
      64K page system.
      
      Currently we only plan to support 4K and 64K page systems.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
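The policy stated in this commit can be summarized as a small decision function. This is a sketch of the rule as described above, not the btrfs mount code; `can_mount()` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical mount policy: sectorsize == page size mounts normally; a 4K
 * sectorsize on a 64K page system is accepted only read-only; any other
 * combination is refused.
 */
static bool can_mount(unsigned int sectorsize, unsigned long page_size,
		      bool read_only)
{
	if (sectorsize == page_size)
		return true;
	if (sectorsize == 4096 && page_size == 65536)
		return read_only;
	return false;
}
```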
    • btrfs: integrate page status update for data read path into begin/end_page_read · 92082d40
      Qu Wenruo authored
      In the btrfs data page read path, the page status updates are handled in
      two different locations:
      
        btrfs_do_read_page()
        {
            while (cur <= end) {
                /* No need to read from disk */
                if (HOLE/PREALLOC/INLINE) {
                    memset();
                    set_extent_uptodate();
                    continue;
                }
                /* Read from disk */
                ret = submit_extent_page(end_bio_extent_readpage);
            }
        }
      
        end_bio_extent_readpage()
        {
            endio_readpage_uptodate_page_status();
        }
      
      This is fine for the sectorsize == PAGE_SIZE case, as in the above loop
      we should only hit one branch and then exit.
      
      But for subpage, there is more work to be done in the page status update:
      
      - Page unlock condition
        Unlike the regular page size == sectorsize case, we can no longer just
        unlock a page.
        Only the last reader of the page can unlock the page.
        This means we can unlock the page either in the while() loop or in
        the endio function.
      
      - Page uptodate condition
        Since we have multiple sectors to read for a page, we can only mark
        the full page uptodate if all sectors are uptodate.
      
      To handle both subpage and regular cases, introduce a pair of functions
      to help handle page status updates:
      
      - begin_page_read()
        For the regular case, it does nothing.
        For the subpage case, it updates the reader counters so that a later
        end_page_read() can know who is the last one to unlock the page.
      
      - end_page_read()
        This is just endio_readpage_uptodate_page_status() renamed.
        The original name is a little too long and too specific for endio.
      
        The new thing added is the condition for page unlock.
        Now for subpage data, we unlock the page if we're the last reader.
      
      This not only provides the basis for subpage data read, but also
      hides the special handling of page read from the main read loop.
      
      Also, since we're changing how the page lock is handled, there are two
      existing error paths where we need to manually unlock the page before
      calling begin_page_read().
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
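The reader-counter idea behind begin_page_read()/end_page_read() can be sketched with C11 atomics. Everything here is hypothetical: `struct mock_page`, `mock_page_init()`, `mock_begin_page_read()` and `mock_end_page_read()` only model the unlock rule described above (the regular case unlocks at once; for subpage, only the reader that brings the count to zero unlocks the page).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical page state; not the btrfs structures. */
struct mock_page {
	bool locked;
	bool subpage;       /* sectorsize < PAGE_SIZE */
	atomic_int readers; /* outstanding sector reads (subpage only) */
};

static void mock_page_init(struct mock_page *page, bool subpage)
{
	page->locked = true; /* pages enter the read path locked */
	page->subpage = subpage;
	atomic_init(&page->readers, 0);
}

/* Regular case: nothing to do. Subpage: account the sectors to be read. */
static void mock_begin_page_read(struct mock_page *page, int nr_sectors)
{
	if (page->subpage)
		atomic_fetch_add(&page->readers, nr_sectors);
}

/* Called when a range finishes; only the last reader unlocks the page. */
static void mock_end_page_read(struct mock_page *page, int nr_sectors)
{
	if (!page->subpage) {
		page->locked = false;
		return;
	}
	/* atomic_fetch_sub() returns the old value: equal means we were last. */
	if (atomic_fetch_sub(&page->readers, nr_sectors) == nr_sectors)
		page->locked = false;
}
```

For a 64K page with 4K sectors, accounting all 16 sectors up front and finishing them in two batches (4 then 12) leaves the page locked after the first batch and unlocked only after the second.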
    • btrfs: introduce btrfs_subpage for data inodes · 32443de3
      Qu Wenruo authored
      To support subpage sector size, data also needs extra info to track
      which sectors in a page are uptodate/dirty/...
      
      This patch makes pages of data inodes get a btrfs_subpage structure
      attached, which is detached when the page is freed.
      
      This patch also slightly changes the timing when
      set_page_extent_mapped() is called to make sure:
      
      - We have page->mapping set
        page->mapping->host is used to grab btrfs_fs_info, thus we can only
        call this function after the page is mapped to an inode.
      
        One call site attaches pages to the inode manually, thus we have to
        modify the timing of set_page_extent_mapped() a bit.
      
      - As soon as possible, before other operations
        Since memory allocation can fail, we have to do extra error handling.
        Calling set_page_extent_mapped() as soon as possible can simplify the
        error handling for several call sites.
      
      The idea is pretty much the same as iomap_page, but with more bitmaps
      for btrfs specific cases.
      
      Currently the plan is to switch to iomap if iomap can provide sector
      aligned write back (only write back dirty sectors, not the full page;
      data balance requires this feature).
      
      So we will stick to btrfs specific bitmaps for now.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
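The attach/detach lifecycle and the two timing constraints above can be modeled as follows. This is a minimal sketch under stated assumptions: `struct mock_subpage`, `struct mock_page`, `mock_set_page_extent_mapped()` and `mock_detach_subpage()` are hypothetical, a 16-bit bitmap assumes 64K pages with 4K sectors, and returning -EINVAL for an unmapped page is this sketch's choice.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical per-page tracking, one bit per sector. A 64K page holds
 * sixteen 4K sectors, so a uint16_t per state is enough in this sketch.
 */
struct mock_subpage {
	uint16_t uptodate_bitmap;
	uint16_t dirty_bitmap;
};

struct mock_page {
	void *mapping; /* non-NULL once the page belongs to an inode */
	void *private; /* struct mock_subpage * while attached */
};

/*
 * Attach tracking as early as possible. The mapping is needed to reach the
 * fs-wide info, and allocation may fail, so both are checked up front.
 */
static int mock_set_page_extent_mapped(struct mock_page *page)
{
	if (!page->mapping)
		return -EINVAL;
	if (page->private)
		return 0; /* already attached */
	page->private = calloc(1, sizeof(struct mock_subpage));
	return page->private ? 0 : -ENOMEM;
}

/* Detach when the page is freed. */
static void mock_detach_subpage(struct mock_page *page)
{
	free(page->private);
	page->private = NULL;
}
```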
    • btrfs: introduce subpage metadata validation check · 371cdc07
      Qu Wenruo authored
      For subpage metadata validation check, there are some differences:
      
      - Read must finish in one bvec
        Since we're just reading one subpage range in one page, it should
        never be split into two bios nor two bvecs.
      
      - How to grab the existing eb
        Instead of grabbing the eb via page->private, we have to search the
        radix tree, as we don't have any direct pointer at hand.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
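The "read must finish in one bvec" rule boils down to checking that a metadata range does not cross a page boundary. A sketch, assuming a power-of-two page size; `range_in_one_page()` is a hypothetical helper, not a btrfs function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if [start, start + len) lies inside a single page of page_size bytes.
 * A subpage metadata read that fails this check would have to span two
 * pages, i.e. two bvecs.
 */
static bool range_in_one_page(uint64_t start, uint32_t len, uint32_t page_size)
{
	assert(len > 0 && (page_size & (page_size - 1)) == 0);
	return (start / page_size) == ((start + len - 1) / page_size);
}
```

With 64K pages, a 16K tree block at offset 80K (81920) stays inside the second page, while one starting at 60K (61440) would straddle the boundary.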
    • btrfs: support subpage in endio_readpage_update_page_status() · 4325cb22
      Qu Wenruo authored
      To handle subpage status update, add the following:
      
      - Use btrfs_page_*() subpage-aware helpers to update page status
        Now we can handle both cases well.
      
      - No page unlock for subpage metadata
        Since subpage metadata doesn't utilize page locking at all, skip it.
        Subpage data locking is handled in later commits.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: introduce read_extent_buffer_subpage() · 4012daf7
      Qu Wenruo authored
      Introduce a helper, read_extent_buffer_subpage(), to do the subpage
      extent buffer read.
      
      The differences between the regular and subpage routines are:
      
      - No page locking
        Here we completely rely on extent locking.
        Page locking can reduce the concurrency greatly, as if we lock one
        page to read one extent buffer, all the other extent buffers in the
        same page will have to wait.
      
      - Extent uptodate condition
        Besides the existing PageUptodate() and EXTENT_BUFFER_UPTODATE checks,
        we also need to check btrfs_subpage::uptodate_bitmap.
      
      - No page iteration
        There is just one page, so there is no need to loop; this greatly
        simplifies the subpage routine.
      
      This patch only implements the bio submit part, no endio support yet.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: support subpage in try_release_extent_buffer() · d1e86e3f
      Qu Wenruo authored
      Unlike the original try_release_extent_buffer(),
      try_release_subpage_extent_buffer() will iterate through all the ebs in
      the page, and try to release each.
      
      We can release the full page only once no private is attached any more,
      which means all ebs of that page have been released as well.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
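A toy model of the iterate-then-release rule, assuming a hypothetical page with a fixed number of eb slots. `EBS_PER_PAGE`, `struct mock_page` and `mock_try_release_subpage_ebs()` are illustrative; an eb whose refcount is exactly 1 is treated as held only by the tree and therefore releasable.

```c
#include <assert.h>
#include <stdbool.h>

#define EBS_PER_PAGE 4 /* e.g. 64K page / 16K nodesize; illustrative */

/*
 * Hypothetical page: one slot per possible eb. refs == 0 means no eb,
 * refs == 1 means only the tree holds it, refs > 1 means it is in use.
 */
struct mock_page {
	int eb_refs[EBS_PER_PAGE];
	int attached; /* ebs still counted in page private */
};

/* Try to release each eb; the page may go only when none are left. */
static bool mock_try_release_subpage_ebs(struct mock_page *page)
{
	for (int i = 0; i < EBS_PER_PAGE; i++) {
		if (page->eb_refs[i] == 1) {
			page->eb_refs[i] = 0;
			page->attached--;
		}
	}
	return page->attached == 0;
}
```

If even one eb in the page is still in use, the page stays, because its private data is shared by every eb in it.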
    • btrfs: support subpage in btrfs_clone_extent_buffer · 92d83e94
      Qu Wenruo authored
      btrfs_clone_extent_buffer() is mostly the same code as
      __alloc_dummy_extent_buffer(), except that it has an extra page copy.
      
      So to make it subpage compatible, we only need to:
      
      - Call set_extent_buffer_uptodate() instead of SetPageUptodate()
        This will set the correct uptodate bits for both the subpage and the
        regular sector size cases.
      
      Since we're calling set_extent_buffer_uptodate(), which also sets the
      EXTENT_BUFFER_UPTODATE bit, we don't need to set that bit manually
      either.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: support subpage in set/clear_extent_buffer_uptodate() · 251f2acc
      Qu Wenruo authored
      To support subpage in set_extent_buffer_uptodate() and
      clear_extent_buffer_uptodate(), we only need to use the subpage-aware
      helpers to update the page bits.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: introduce helpers for subpage error status · 03a816b3
      Qu Wenruo authored
      Introduce the following functions to handle subpage error status:
      
      - btrfs_subpage_set_error()
      - btrfs_subpage_clear_error()
      - btrfs_subpage_test_error()
        These helpers can only be called when the page has subpage attached
        and the range is ensured to be inside the page.
      
      - btrfs_page_set_error()
      - btrfs_page_clear_error()
      - btrfs_page_test_error()
        These helpers can handle both regular sector size and subpage without
        problem.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
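The two tiers of helpers can be modeled as follows: the `mock_subpage_*()` functions assert their preconditions (subpage attached, range inside the page), while the `mock_page_*()` functions dispatch on the page layout. All names, the 16-bit bitmap, and the choice that one bad sector also flags the whole page are assumptions of this sketch; the clear helper is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical page: sectorsize < page_size means subpage is attached. */
struct mock_page {
	unsigned int page_size;
	unsigned int sectorsize;
	bool error;            /* whole-page error flag */
	uint16_t error_bitmap; /* per-sector bits, subpage case only */
};

/* Bits covering [off, off + len) within the page (sector aligned). */
static uint16_t mock_range_mask(const struct mock_page *page,
				unsigned int off, unsigned int len)
{
	unsigned int first = off / page->sectorsize;
	unsigned int nbits = len / page->sectorsize;

	return (uint16_t)(((1u << nbits) - 1) << first);
}

/* Subpage-only helpers: the preconditions are asserted, not handled. */
static void mock_subpage_set_error(struct mock_page *page, unsigned int off,
				   unsigned int len)
{
	assert(page->sectorsize < page->page_size);
	assert(off + len <= page->page_size);
	page->error_bitmap |= mock_range_mask(page, off, len);
	page->error = true; /* in this sketch one bad sector flags the page */
}

static bool mock_subpage_test_error(const struct mock_page *page,
				    unsigned int off, unsigned int len)
{
	uint16_t mask;

	assert(page->sectorsize < page->page_size);
	assert(off + len <= page->page_size);
	mask = mock_range_mask(page, off, len);
	return (page->error_bitmap & mask) == mask;
}

/* Generic helpers: safe for both regular and subpage layouts. */
static void mock_page_set_error(struct mock_page *page, unsigned int off,
				unsigned int len)
{
	if (page->sectorsize == page->page_size)
		page->error = true;
	else
		mock_subpage_set_error(page, off, len);
}

static bool mock_page_test_error(const struct mock_page *page,
				 unsigned int off, unsigned int len)
{
	if (page->sectorsize == page->page_size)
		return page->error;
	return mock_subpage_test_error(page, off, len);
}
```

Callers that cannot know the layout use the generic pair; code already inside a subpage-only path can call the narrower helpers directly.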
    • btrfs: introduce helpers for subpage uptodate status · a1d767c1
      Qu Wenruo authored
      Introduce the following functions to handle subpage uptodate status:
      
      - btrfs_subpage_set_uptodate()
      - btrfs_subpage_clear_uptodate()
      - btrfs_subpage_test_uptodate()
        These helpers can only be called when the page has subpage attached
        and the range is ensured to be inside the page.
      
      - btrfs_page_set_uptodate()
      - btrfs_page_clear_uptodate()
      - btrfs_page_test_uptodate()
        These helpers can handle both regular sector size and subpage,
        although the caller should still ensure that the range is inside the
        page.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
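A sketch of the range-to-bitmap arithmetic behind uptodate helpers like these, assuming 64K pages and 4K sectors (so the hypothetical `SECTORSIZE_BITS` is 12 and a `uint16_t` holds one bit per sector). All names are illustrative. The sketch also encodes the rule from the data read path: the page-level flag is set only once every sector is uptodate, and clearing any sector clears it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_64K 65536u
#define SECTORSIZE_BITS 12 /* 4K sectors: 16 per 64K page */

/* Hypothetical subpage-attached page. */
struct mock_page {
	bool uptodate;            /* whole-page flag */
	uint16_t uptodate_bitmap; /* one bit per 4K sector */
};

/* Bits for [off, off + len) inside the page; both sector aligned. */
static uint16_t mock_calc_bitmap(unsigned int off, unsigned int len)
{
	unsigned int start_bit = off >> SECTORSIZE_BITS;
	unsigned int nbits = len >> SECTORSIZE_BITS;

	assert(off + len <= PAGE_SIZE_64K);
	return (uint16_t)(((1u << nbits) - 1) << start_bit);
}

/* The page flag is set only once every sector has become uptodate. */
static void mock_subpage_set_uptodate(struct mock_page *page, unsigned int off,
				      unsigned int len)
{
	page->uptodate_bitmap |= mock_calc_bitmap(off, len);
	if (page->uptodate_bitmap == UINT16_MAX)
		page->uptodate = true;
}

/* Clearing any sector means the page as a whole is no longer uptodate. */
static void mock_subpage_clear_uptodate(struct mock_page *page,
					unsigned int off, unsigned int len)
{
	page->uptodate_bitmap &= (uint16_t)~mock_calc_bitmap(off, len);
	page->uptodate = false;
}

static bool mock_subpage_test_uptodate(const struct mock_page *page,
				       unsigned int off, unsigned int len)
{
	uint16_t mask = mock_calc_bitmap(off, len);

	return (page->uptodate_bitmap & mask) == mask;
}
```

Marking the first 32K uptodate sets bits 0-7 (0x00FF) but leaves the page flag clear; only after the second half is marked does the page become uptodate.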
    • btrfs: attach private to dummy extent buffer pages · 09bc1f0f
      Qu Wenruo authored
      There are locations where we allocate dummy extent buffers for temporary
      usage, like in tree_mod_log_rewind() or get_old_root().
      
      These dummy extent buffers are handled by the same eb accessors, and if
      they don't have page::private, the subpage eb accessors could fail.
      
      To address such problems, make __alloc_dummy_extent_buffer() attach
      page private for dummy extent buffers too.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: support subpage for extent buffer page release · 8ff8466d
      Qu Wenruo authored
      In btrfs_release_extent_buffer_pages(), we need to add extra handling
      for subpage.
      
      Introduce a helper, detach_extent_buffer_page(), to do different
      handling for regular and subpage cases.
      
      For the subpage case, handle detaching page private.
      
      For unmapped (dummy or cloned) ebs, we can detach the page private
      immediately as the page can only be attached to one unmapped eb.
      
      For mapped ebs, we have to ensure there are no ebs in the page range
      before we detach page private, as page->private is shared between all
      ebs in the same page.
      
      But there is a subpage-specific race, where we can race with extent
      buffer allocation and clear the page private while the new eb is still
      being utilized, like this:
      
        Extent buffer A is the new extent buffer which will be allocated,
        while extent buffer B is the last existing extent buffer of the page.
      
                T1 (eb A)                 |          T2 (eb B)
        ----------------------------------+------------------------------------
        alloc_extent_buffer()             | btrfs_release_extent_buffer_pages()
        |- p = find_or_create_page()      | |
        |- attach_extent_buffer_page()    | |
        |                                 | |- detach_extent_buffer_page()
        |                                 |    |- if (!page_range_has_eb())
        |                                 |    |  No new eb in the page range yet
        |                                 |    |  As new eb A hasn't yet been
        |                                 |    |  inserted into radix tree.
        |                                 |    |- btrfs_detach_subpage()
        |                                 |       |- detach_page_private();
        |- radix_tree_insert()            |
      
        Then we have a metadata eb whose page has no private bit.
      
      To avoid such a race, we introduce a subpage metadata-specific member,
      btrfs_subpage::eb_refs.
      
      In alloc_extent_buffer() we increase eb_refs in the critical section of
      private_lock.  Then page_range_has_eb() will return true for
      detach_extent_buffer_page(), which will then not detach page private.
      
      The section is marked by:
      
      - btrfs_page_inc_eb_refs()
      - btrfs_page_dec_eb_refs()
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
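The eb_refs fix can be modeled with a C11 `atomic_flag` spinlock standing in for private_lock. This is a hypothetical, single-threaded sketch of the ordering only: `struct mock_page`, `mock_attach_new_eb()` and `mock_release_eb()` are illustrative, and the plain `eb_refs == 0` test stands in for page_range_has_eb(). The key point is that the new eb is counted under the lock before it becomes visible in the radix tree, so the releasing side sees a non-zero count and keeps page private.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical page state shared by every eb that lives in the page. */
struct mock_page {
	atomic_flag private_lock; /* spinlock, in the role of private_lock */
	int eb_refs;              /* ebs attached or being attached */
	bool has_private;
};

static void mock_spin_lock(atomic_flag *lock)
{
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		; /* spin */
}

static void mock_spin_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/*
 * T1: the new eb is counted while holding the lock, before it is visible
 * in the radix tree.
 */
static void mock_attach_new_eb(struct mock_page *page)
{
	mock_spin_lock(&page->private_lock);
	page->has_private = true;
	page->eb_refs++; /* role of btrfs_page_inc_eb_refs() */
	mock_spin_unlock(&page->private_lock);
}

/* T2: releasing an eb detaches private only if no eb is counted any more. */
static void mock_release_eb(struct mock_page *page)
{
	mock_spin_lock(&page->private_lock);
	page->eb_refs--; /* role of btrfs_page_dec_eb_refs() */
	if (page->eb_refs == 0) /* stands in for !page_range_has_eb() */
		page->has_private = false;
	mock_spin_unlock(&page->private_lock);
}
```

Replaying the diagram's interleaving (eb B present, eb A attached, then eb B released) leaves page private in place, and only releasing eb A as well detaches it.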