1. 17 Dec, 2018 10 commits
    • Qu Wenruo's avatar
      btrfs: volumes: Make sure no dev extent is beyond device boundary · 05a37c48
      Qu Wenruo authored
      Add extra dev extent end check against device boundary.
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      05a37c48
    • Qu Wenruo's avatar
      btrfs: volumes: Make sure there is no overlap of dev extents at mount time · 5eb19381
      Qu Wenruo authored
      Enhance btrfs_verify_dev_extents() to remember previous checked dev
      extents, so it can verify no dev extents can overlap.
      
      Analysis from Hans:
      
      "Imagine allocating a DATA|DUP chunk.
      
       In the chunk allocator, we first set...
         max_stripe_size = SZ_1G;
         max_chunk_size = BTRFS_MAX_DATA_CHUNK_SIZE
       ... which is 10GiB.
      
       Then...
         /* we don't want a chunk larger than 10% of writeable space */
         max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
             		 max_chunk_size);
      
       Imagine we only have one 7880MiB block device in this filesystem. Now
       max_chunk_size is down to 788MiB.
      
       The next step in the code is to search for max_stripe_size * dev_stripes
       amount of free space on the device, which is in our example 1GiB * 2 =
       2GiB. Imagine the device has exactly 1578MiB free in one contiguous
       piece. This amount of bytes will be put in devices_info[ndevs - 1].max_avail
      
       Next we recalculate the stripe_size (which is actually the device extent
       length), based on the actual maximum amount of available raw disk space:
         stripe_size = div_u64(devices_info[ndevs - 1].max_avail, dev_stripes);
      
       stripe_size is now 789MiB
      
       Next we do...
         data_stripes = num_stripes / ncopies
       ...where data_stripes ends up as 1, because num_stripes is 2 (the amount
       of device extents we're going to have), and DUP has ncopies 2.
      
       Next there's a check...
         if (stripe_size * data_stripes > max_chunk_size)
       ...which matches because 789MiB * 1 > 788MiB.
      
       We go into the if code, and next is...
         stripe_size = div_u64(max_chunk_size, data_stripes);
       ...which resets stripe_size to max_chunk_size: 788MiB
      
       Next is a fun one...
         /* bump the answer up to a 16MB boundary */
         stripe_size = round_up(stripe_size, SZ_16M);
       ...which changes stripe_size from 788MiB to 800MiB.
      
       We're not done changing stripe_size yet...
         /* But don't go higher than the limits we found while searching
          * for free extents
          */
         stripe_size = min(devices_info[ndevs - 1].max_avail,
             	      stripe_size);
      
       This is bad. max_avail is twice the stripe_size (we need to fit 2 device
       extents on the same device for DUP).
      
       The result here is that 800MiB < 1578MiB, so it's unchanged. However,
       the resulting DUP chunk will need 1600MiB disk space, which isn't there,
       and the second dev_extent might extend into the next thing (next
       dev_extent? end of device?) for 22MiB.
      
       The last shown line of code relies on a situation where there's twice
       the value of stripe_size present as value for the variable stripe_size
       when it's DUP. This was actually the case before commit 92e222df
       "btrfs: alloc_chunk: fix DUP stripe size handling", from which I quote:
         "[...] in the meantime there's a check to see if the stripe_size does
       not exceed max_chunk_size. Since during this check stripe_size is twice
       the amount as intended, the check will reduce the stripe_size to
       max_chunk_size if the actual correct to be used stripe_size is more than
       half the amount of max_chunk_size."
      
       In the previous version of the code, the 16MiB alignment (why is this
       done, by the way?) would result in a 50% chance that it would actually
       do an 8MiB alignment for the individual dev_extents, since it was
       operating on double the size. Does this matter?
      
       Does it matter that stripe_size can be set to anything which is not
       16MiB aligned because of the amount of remaining available disk space
       which is just taken?
      
       What is the main purpose of this round_up?
      
       The most straightforward thing to do seems something like...
         stripe_size = min(
             div_u64(devices_info[ndevs - 1].max_avail, dev_stripes),
             stripe_size
         )
       ..just putting half of the max_avail into stripe_size."
      
      Link: https://lore.kernel.org/linux-btrfs/b3461a38-e5f8-f41d-c67c-2efac8129054@mendix.com/Reported-by: default avatarHans van Kranenburg <hans.van.kranenburg@mendix.com>
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      [ add analysis from report ]
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      5eb19381
    • Qu Wenruo's avatar
      btrfs: Refactor find_free_extent loops update into find_free_extent_update_loop · e72d79d6
      Qu Wenruo authored
      We have a complex loop design for find_free_extent(), that has different
      behavior for each loop, some even includes new chunk allocation.
      
      Instead of putting such a long code into find_free_extent() and makes it
      harder to read, just extract them into find_free_extent_update_loop().
      
      With all the cleanups, the main find_free_extent() should be pretty
      barebone:
      
      find_free_extent()
      |- Iterate through all block groups
      |  |- Get a valid block group
      |  |- Try to do clustered allocation in that block group
      |  |- Try to do unclustered allocation in that block group
      |  |- Check if the result is valid
      |  |  |- If valid, then exit
      |  |- Jump to next block group
      |
      |- Push harder to find free extents
         |- If not found, re-iterate all block groups
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Reviewed-by: default avatarSu Yue <suy.fnst@cn.fujitsu.com>
      [ copy callchain from changelog to function comment ]
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      e72d79d6
    • Qu Wenruo's avatar
      btrfs: Refactor unclustered extent allocation into find_free_extent_unclustered() · e1a41848
      Qu Wenruo authored
      This patch will extract unclsutered extent allocation code into
      find_free_extent_unclustered().
      
      And this helper function will use return value to indicate what to do
      next.
      
      This should make find_free_extent() a little easier to read.
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Reviewed-by: default avatarSu Yue <suy.fnst@cn.fujitsu.com>
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      [Update merge conflict with fb5c39d7 ("btrfs: don't use ctl->free_space for max_extent_size")]
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      e1a41848
    • Qu Wenruo's avatar
      btrfs: Refactor clustered extent allocation into find_free_extent_clustered · d06e3bb6
      Qu Wenruo authored
      We have two main methods to find free extents inside a block group:
      
      1) clustered allocation
      2) unclustered allocation
      
      This patch will extract the clustered allocation into
      find_free_extent_clustered() to make it a little easier to read.
      
      Instead of jumping between different labels in find_free_extent(), the
      helper function will use return value to indicate different behavior.
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Reviewed-by: default avatarSu Yue <suy.fnst@cn.fujitsu.com>
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      d06e3bb6
    • Qu Wenruo's avatar
      btrfs: Introduce find_free_extent_ctl structure for later rework · b4bd745d
      Qu Wenruo authored
      Instead of tons of different local variables in find_free_extent(),
      extract them into find_free_extent_ctl structure, and add better
      explanation for them.
      
      Some modification may looks redundant, but will later greatly simplify
      function parameter list during find_free_extent() refactor.
      
      Also add two comments to co-operate with fb5c39d7 ("btrfs: don't use
      ctl->free_space for max_extent_size"), to make ffe_ctl->max_extent_size
      update more reader-friendly.
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Reviewed-by: default avatarSu Yue <suy.fnst@cn.fujitsu.com>
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      b4bd745d
    • Lu Fengqi's avatar
      btrfs: extent-tree: Detect bytes_pinned underflow earlier · e2907c1a
      Lu Fengqi authored
      Introduce a new wrapper update_bytes_pinned to replace open coded
      bytes_pinned modifiers. Now the underflows of space_info::bytes_pinned
      get detected and reported.
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarLu Fengqi <lufq.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      e2907c1a
    • Qu Wenruo's avatar
      btrfs: extent-tree: Detect bytes_may_use underflow earlier · 9f9b8e8d
      Qu Wenruo authored
      Although we have space_info::bytes_may_use underflow detection in
      btrfs_free_reserved_data_space_noquota(), we have more callers who are
      subtracting number from space_info::bytes_may_use.
      
      So instead of doing underflow detection for every caller, introduce a
      new wrapper update_bytes_may_use() to replace open coded bytes_may_use
      modifiers.
      
      This also introduce a macro to declare more wrappers, but currently
      space_info::bytes_may_use is the mostly interesting one.
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarLu Fengqi <lufq.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      9f9b8e8d
    • Filipe Manana's avatar
      Btrfs: remove no longer used stuff for tracking pending ordered extents · 85dd506c
      Filipe Manana authored
      Tracking pending ordered extents per transaction was introduced in commit
      50d9aa99 ("Btrfs: make sure logged extents complete in the current
      transaction V3") and later updated in commit 161c3549 ("Btrfs: change
      how we wait for pending ordered extents").
      
      However now that on fsync we always wait for ordered extents to complete
      before logging, done in commit 5636cf7d ("btrfs: remove the logged
      extents infrastructure"), we no longer need the stuff to track for pending
      ordered extents, which was not completely removed in the mentioned commit.
      So remove the remaining of the pending ordered extents infrastructure.
      Reviewed-by: default avatarLiu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      85dd506c
    • Filipe Manana's avatar
      Btrfs: remove no longer used logged range variables when logging extents · ce02f032
      Filipe Manana authored
      The logged_start and logged_end variables, at btrfs_log_changed_extents,
      were added in commit 8c6c5928 ("btrfs: log csums for all modified
      extents"). However since the recent simplification for fsync, which makes
      us wait for all ordered extents to complete before logging extents, we
      no longer need those variables. Commit a2120a47 ("btrfs: clean up the
      left over logged_list usage") forgot to remove them.
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      ce02f032
  2. 16 Dec, 2018 1 commit
  3. 14 Dec, 2018 20 commits
  4. 13 Dec, 2018 9 commits