1. 02 Jul, 2013 16 commits
    • Btrfs: only do the tree_mod_log_free_eb if this is our last ref · 7fb7d76f
      Josef Bacik authored
      There is another bug in the tree mod log stuff in that we're calling
      tree_mod_log_free_eb every single time a block is cow'ed.  The problem with this
      is that if this block is shared by multiple snapshots we will call this multiple
      times per block, so if we go to rewind the mod log for this block we'll BUG_ON()
      in __tree_mod_log_rewind because we try to rewind a free twice.  We only want to
      call tree_mod_log_free_eb if we are actually freeing the block.  With this patch
      I no longer hit the panic in __tree_mod_log_rewind.  Thanks,
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: hold the tree mod lock in __tree_mod_log_rewind · f1ca7e98
      Josef Bacik authored
      We need to hold the tree mod log lock in __tree_mod_log_rewind since we walk
      forward in the tree mod entries, otherwise we'll end up with random entries and
      trip the BUG_ON() at the front of __tree_mod_log_rewind.  This fixes the panics
      people were seeing when running
      
      find /whatever -type f -exec btrfs fi defrag {} \;
      
      Thanks,
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: make backref walking code handle skinny metadata · 261c84b6
      Josef Bacik authored
      I missed fixing the backref stuff when I introduced the skinny metadata.  If you
      try and do things like snapshot aware defrag with skinny metadata you are going
      to see tons of warnings related to the backref count being less than 0.  This is
      because the delayed refs will be found for stuff just fine, but it won't find
      the skinny metadata extent refs.  With this patch I'm not seeing warnings
      anymore.  Thanks,
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix crash regarding to ulist_add_merge · 35f0399d
      Liu Bo authored
      Several users reported crashes from NULL pointer dereferences or general
      protection faults.  The story is that we added an rbtree to speed up ulist
      iteration, and we use krealloc() to handle ulist growth.  krealloc() uses
      memcpy to copy the old data to the new memory area, which is OK for an
      array because it doesn't use pointers, but not OK for an rbtree because it
      does.

      So krealloc() messes up our rbtree and we end up crashing.
      Reviewed-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
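The failure mode described in the ulist_add_merge commit can be reproduced in user space. The following is a minimal sketch, not the kernel code: `struct node`, `grow_by_copy()` and `rebase_links()` are hypothetical names, and a plain `next` pointer stands in for the pointers an rbtree node carries. It shows that a byte-wise copy into a new area leaves embedded pointers aimed at the old area, and one possible repair (which is not necessarily what the patch does).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type: stands in for a ulist node that embeds an rb_node. */
struct node {
    int val;
    struct node *next;   /* points at another element of the same array */
};

/*
 * Grow the array the way a moving realloc does: allocate a new area and
 * memcpy the old bytes into it. The old area is deliberately NOT freed here
 * so that the caller can inspect the stale pointers without undefined
 * behavior.
 */
static struct node *grow_by_copy(const struct node *old, size_t old_n, size_t new_n)
{
    struct node *new_area = calloc(new_n, sizeof(*new_area));

    assert(new_area != NULL);
    memcpy(new_area, old, old_n * sizeof(*old));
    return new_area;
}

/*
 * One possible repair: translate every embedded pointer from the old area
 * to the same element index in the new area. Assumes every non-NULL link
 * points inside the old array.
 */
static void rebase_links(struct node *new_area, const struct node *old, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (new_area[i].next)
            new_area[i].next = new_area + (new_area[i].next - old);
}
```

After `grow_by_copy()`, the copied element still links into the old allocation; once that allocation is freed, following the link is a use-after-free, which matches the kind of crash the commit describes.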
    • Btrfs: fix several potential problems in copy_nocow_pages_for_inode · edd1400b
      Miao Xie authored
      - It makes no sense to deal with an inode in the dead tree.
      - fix the race between dio and page copy by waiting for the dio completion
      - avoid the page copy vs truncate/punch hole
      - check if the page is in the page cache or not
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: cleanup the code of copy_nocow_pages_for_inode() · 826aa0a8
      Miao Xie authored
      - It makes no sense to continue after an error has happened; with this
        patch we just return.
      - remove some checks in copy_nocow_pages_for_inode(), such as the page
        check after the write and the inode check at the end of the function,
        because we are sure they exist.
      - remove the unnecessary goto in the return value check of the write
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix oops when recovering the file data by scrub function · 26b25891
      Miao Xie authored
      We get an oops while running the btrfs replace start test:
      ------------[ cut here ]------------
      kernel BUG at mm/filemap.c:608!
      [SNIP]
      Call Trace:
        [<ffffffffa04b36c7>] copy_nocow_pages_for_inode+0x217/0x3f0 [btrfs]
        [<ffffffffa04b34b0>] ? scrub_print_warning_inode+0x230/0x230 [btrfs]
        [<ffffffffa04b34b0>] ? scrub_print_warning_inode+0x230/0x230 [btrfs]
        [<ffffffffa04bb8ce>] iterate_extent_inodes+0x1ae/0x300 [btrfs]
        [<ffffffffa04bbab2>] iterate_inodes_from_logical+0x92/0xb0 [btrfs]
        [<ffffffffa04b34b0>] ? scrub_print_warning_inode+0x230/0x230 [btrfs]
        [<ffffffffa04b3b07>] copy_nocow_pages_worker+0x97/0x150 [btrfs]
        [<ffffffffa048eed4>] worker_loop+0x134/0x540 [btrfs]
        [<ffffffff816274ea>] ? __schedule+0x3ca/0x7f0
        [<ffffffffa048eda0>] ? btrfs_queue_worker+0x300/0x300 [btrfs]
        [<ffffffff8106f2f0>] kthread+0xc0/0xd0
        [<ffffffff8106f230>] ? flush_kthread_worker+0x80/0x80
        [<ffffffff8163181c>] ret_from_fork+0x7c/0xb0
        [<ffffffff8106f230>] ? flush_kthread_worker+0x80/0x80
      [SNIP]
       RIP  [<ffffffff8111f4c5>] unlock_page+0x35/0x40
        RSP <ffff88010316bb98>
       ---[ end trace 421e79ad0dd72c7d ]---
      
      This is because we forgot to lock the page again after reading data into
      the page. Fix it.
      Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: make the chunk allocator completely tree lockless · 6df9a95e
      Josef Bacik authored
      When adjusting the enospc rules for relocation I ran into a deadlock because we
      were relocating the only system chunk and that forced us to try and allocate a
      new system chunk while holding locks in the chunk tree, which caused us to
      deadlock.  To fix this I've moved all of the dev extent addition and chunk
      addition out to the delayed chunk completion stuff.  We still keep the in-memory
      stuff which makes sure everything is consistent.
      
      One change I had to make was to search the commit root of the device tree to
      find a free dev extent, and hold onto any chunk em's that we allocated in that
      transaction so we do not allocate the same dev extent twice.  This has the side
      effect of fixing a bug with balance that has been there ever since balance
      existed.  Basically you can free a block group and its dev extent and then
      immediately allocate that dev extent for a new block group and write stuff to
      that dev extent, all within the same transaction.  So if you happen to crash
      during a balance you could come back to a completely broken file system.  This
      patch should keep these sort of things from happening in the future since we
      won't be able to allocate free'd dev extents until after the transaction
      commits.  This has passed all of the xfstests and my super annoying stress test
      followed by a balance.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: cleanup orphaned root orphan item · 68a7342c
      Josef Bacik authored
      I hit a weird problem where my root item had been deleted but the orphan item had
      not.  This isn't necessarily a problem, but it keeps the file system from being
      mounted.  To fix this we just need to axe the orphan item if we can't find the
      fs root when we're putting them all together.  With this patch I was able to
      successfully mount my file system.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix wrong mirror number tuning · a70c6172
      Miao Xie authored
      Reading data from the target device of a replace operation is now allowed, so
      a mirror number greater than the number of stripes in a chunk is valid; we
      will tune it later, when we find that there is no target device. Fix it.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Miao Xie's avatar
      e6da5d2e
    • Btrfs: remove btrfs_sector_sum structure · f51a4a18
      Miao Xie authored
      Using the structure btrfs_sector_sum to keep the checksum value is
      unnecessary: because the extents that btrfs_sector_sum points to are
      contiguous, we can find the expected checksums from btrfs_ordered_sum's
      bytenr and the offset, so we can remove btrfs_sector_sum's bytenr. After
      removing bytenr, there is only one member left in the structure, so it
      makes no sense to keep the structure; just remove it and use a u32 array
      to store the checksum values.

      With this change we no longer use a while loop to get the checksums one by
      one. We can now get several checksum values at a time, which improved
      performance by ~74% on my SSD (31MB/s -> 54MB/s).
      
      test command:
       # dd if=/dev/zero of=/mnt/btrfs/file0 bs=1M count=1024 oflag=sync
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
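The btrfs_sector_sum commit relies on the covered range being contiguous: a sector's checksum can then be found by position alone. A rough sketch of that arithmetic follows, assuming a simplified stand-in structure; `struct ordered_sum` and `lookup_sum()` are hypothetical names and do not mirror the real btrfs_ordered_sum layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for btrfs_ordered_sum after the change. */
struct ordered_sum {
    uint64_t bytenr;       /* disk byte number of the first covered sector */
    uint32_t sectorsize;   /* bytes per sector, e.g. 4096 */
    size_t nsums;          /* number of entries in sums[] */
    const uint32_t *sums;  /* one checksum per sector, in disk order */
};

/*
 * Because the covered range is contiguous, the checksum for a sector is
 * found by index arithmetic; no per-entry bytenr needs to be stored.
 * Returns 0 and fills *out on success, -1 if the sector is not covered.
 */
static int lookup_sum(const struct ordered_sum *os, uint64_t disk_bytenr,
                      uint32_t *out)
{
    uint64_t idx;

    assert(os->sectorsize != 0);
    if (disk_bytenr < os->bytenr)
        return -1;
    idx = (disk_bytenr - os->bytenr) / os->sectorsize;
    if (idx >= os->nsums)
        return -1;
    *out = os->sums[idx];
    return 0;
}
```

With the per-sector bytenr gone, a run of checksums is just a slice of the `sums` array, which is what allows several values to be fetched at once.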
    • Btrfs: check if we can nocow if we don't have data space · 7ee9e440
      Josef Bacik authored
      We always just try and reserve data space when we write, but if we are out of
      space but have prealloc'ed extents we should still successfully write.  This
      patch will try and see if we can write to prealloc'ed space and if we can go
      ahead and allow the write to continue.  With this patch we now pass xfstests
      generic/274.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: stop using try_to_writeback_inodes_sb_nr to flush delalloc · 925a6efb
      Josef Bacik authored
      try_to_writeback_inodes_sb_nr returns 1 if writeback is already underway, which
      is completely fraking useless for us as we need to make sure pages are actually
      written before we go and check if there are ordered extents.  So replace this
      with an open coding of try_to_writeback_inodes_sb_nr minus the writeback
      underway check so that we are sure to actually have flushed some dirty pages out
      and will have ordered extents to use.  With this patch xfstests generic/273 now
      passes.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: use a percpu to keep track of possibly pinned bytes · b150a4f1
      Josef Bacik authored
      There are all of these checks in the ENOSPC code to see if committing the
      transaction would free up enough space to make the allocation.  This is because
      early on we just committed the transaction and hoped and prayed, which resulted
      in cases where it took _forever_ to get an ENOSPC when we really were out of
      space.  So we check space_info->bytes_pinned, except this isn't completely true
      because it doesn't account for space we may free but are stuck in delayed refs.
      So tests like xfstests 226 would fail because we wouldn't commit the transaction
      to free up the data space.  So instead add a percpu counter that will be a
      little fuzzier, it will add bytes as soon as we try to free up the space, and
      remove any space it doesn't actually free up when we get around to doing the
      actual free.  We then 0 out this counter every transaction period so we have a
      better idea of how much space we will actually free up by committing this
      transaction.  With this patch we now pass xfstests 226.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: check for actual acls rather than just xattrs when caching no acl · f23b5a59
      Josef Bacik authored
      We have an optimization that will go ahead and cache no acls on an inode if
      there are no xattrs on the inode.  This saves us a lookup later to check the
      acls for writes or any other access.  The problem is I use selinux so I always
      have an xattr on inodes, so make this test a little smarter and check for the
      actual acl hash on the key and if it isn't there then we still get to cache no
      acl which makes everybody who uses selinux a little happier.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  2. 01 Jul, 2013 13 commits
    • Btrfs: move btrfs_truncate_page to btrfs_cont_expand instead of btrfs_truncate · a71754fc
      Josef Bacik authored
      This has plagued us forever and I'm so over working around it.  When we truncate
      down to a non-page aligned offset we will call btrfs_truncate_page to zero out
      the end of the page and write it back to disk, this will keep us from exposing
      stale data if we truncate back up from that point.  The problem with this is it
      requires data space to do this, and people don't really expect to get ENOSPC
      from truncate() for these sort of things.  This also tends to bite the orphan
      cleanup stuff too which keeps people from mounting.  To get around this we can
      just move this into btrfs_cont_expand() to make sure if we are truncating up
      from a non-page size aligned i_size we will zero out the rest of this page so
      that we don't expose stale data.  This will give ENOSPC if you try to truncate()
      up or if you try to write past the end of isize, which is much more reasonable.
      This fixes xfstests generic/083 failing to mount because of the orphan cleanup
      failing.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
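The zeroing step that the btrfs_truncate_page commit moves into btrfs_cont_expand() can be illustrated in isolation. This is a user-space sketch under stated assumptions: `SKETCH_PAGE_SIZE` and `zero_page_tail()` are hypothetical, the page is a plain byte buffer, and none of the kernel's locking, space reservation or writeback is modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size for this sketch */

/*
 * Zero the part of the last page that lies beyond i_size, so that growing
 * the file later cannot expose stale bytes. `page` points at the data of
 * the page that contains i_size. Returns the number of bytes zeroed, which
 * is 0 when i_size is already page aligned.
 */
static size_t zero_page_tail(unsigned char *page, unsigned long long i_size)
{
    size_t off = (size_t)(i_size % SKETCH_PAGE_SIZE);

    assert(page != NULL);
    if (off == 0)
        return 0;
    memset(page + off, 0, SKETCH_PAGE_SIZE - off);
    return SKETCH_PAGE_SIZE - off;
}
```

Only the expanding path needs this work, which is why the commit can drop it from truncate-down and return ENOSPC only when the file grows.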
    • Btrfs: optimize reada_for_balance · 0b08851f
      Josef Bacik authored
      This patch does two things.  First we no longer explicitly read in the blocks
      we're trying to readahead.  For things like balance_level we may never actually
      use the blocks so this just adds unneeded latency, and balance_level and
      split_node will both read in the blocks they care about explicitly so if the
      blocks need to be waited on it will be done there.  Secondly we no longer drop
      the path if we do readahead, we just set the path blocking before we call
      reada_for_balance() and then we're good to go.  Hopefully this will cut down on
      the number of re-searches.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: optimize read_block_for_search · bdf7c00e
      Josef Bacik authored
      This patch does two things, first it only does one call to
      btrfs_buffer_uptodate() with the gen specified instead of once with 0 and then
      again with gen specified.  The other thing is to call btrfs_read_buffer() on the
      buffer we've found instead of dropping it and then calling read_tree_block().
      This will keep us from doing yet another radix tree lookup for a buffer we've
      already found.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: unlock extent range on enospc in compressed submit · fdf8e2ea
      Josef Bacik authored
      A user reported a deadlock where the async submit thread was blocked on the
      lock_extent() lock, and then everybody behind him was locked on the page lock
      for the page he was holding.  Looking at the code I noticed we do not unlock the
      extent range when we get ENOSPC and goto retry.  This is bad because we
      immediately try to lock that range again to do the cow, which will cause a
      deadlock.  Fix this by unlocking the range.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix the comment typo for btrfs_attach_transaction_barrier · 90b6d283
      Wang Sheng-Hui authored
      The comment is for btrfs_attach_transaction_barrier, not for
      btrfs_attach_transaction. Fix the typo.
      Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
      Acked-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix not being able to find skinny extents during relocate · aee68ee5
      Josef Bacik authored
      We unconditionally search for the EXTENT_ITEM_KEY for metadata during balance,
      and then check the key that we found to see if it is actually a
      METADATA_ITEM_KEY, but this doesn't work right because METADATA is a higher key
      value, so if what we are looking for happens to be the first item in the leaf
      the search will dump us out at the previous leaf, and we won't find our item.
      So instead do what we do everywhere else, search for the skinny extent first and
      if we don't find it go back and re-search for the extent item.  This patch fixes
      the panic I was hitting when balancing a large file system with skinny extents.
      Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
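The lookup order adopted in the skinny-extent relocate commit (skinny extent first, full extent item as the fallback) can be sketched over a flat sorted key array. This is an illustration only: `struct key`, `find_exact()` and `find_tree_block()` are hypothetical, a single array stands in for the extent tree, and the leaf-boundary behavior that triggered the original bug is not modeled. The two key type values match BTRFS_EXTENT_ITEM_KEY (168) and BTRFS_METADATA_ITEM_KEY (169).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXTENT_ITEM_KEY   168   /* value of BTRFS_EXTENT_ITEM_KEY */
#define METADATA_ITEM_KEY 169   /* value of BTRFS_METADATA_ITEM_KEY */

/* Simplified key: ordered by (objectid, type, offset). */
struct key {
    uint64_t objectid;
    uint8_t type;
    uint64_t offset;
};

static int key_cmp(const struct key *a, const struct key *b)
{
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/* Exact-match binary search over a sorted array; returns index or -1. */
static long find_exact(const struct key *keys, size_t n, const struct key *want)
{
    size_t lo = 0, hi = n;

    assert(keys != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = key_cmp(&keys[mid], want);

        if (c == 0)
            return (long)mid;
        if (c < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}

/*
 * Look for the skinny record (bytenr, METADATA_ITEM, level) first; only if
 * that misses, re-search for the full record (bytenr, EXTENT_ITEM, size).
 */
static long find_tree_block(const struct key *keys, size_t n, uint64_t bytenr,
                            uint64_t level, uint64_t nodesize)
{
    struct key want = { bytenr, METADATA_ITEM_KEY, level };
    long idx = find_exact(keys, n, &want);

    if (idx >= 0)
        return idx;
    want.type = EXTENT_ITEM_KEY;
    want.offset = nodesize;
    return find_exact(keys, n, &want);
}
```

Searching for each key type explicitly avoids depending on where a search for the lower key happens to land.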
    • Btrfs: cleanup backref search commit root flag stuff · da61d31a
      Josef Bacik authored
      Looking into this backref problem I noticed we're using a macro to do what turns
      out to essentially be a NULL check to see if we need to search the commit root.
      I'm killing this, let's just do what everybody else does and check if trans ==
      NULL.  I've also made it so we pass in the path to __resolve_indirect_refs which
      will have the search_commit_root flag set properly already and that way we can
      avoid allocating another path when we have a perfectly good one to use.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: free csums when we're done scrubbing an extent · d88d46c6
      Josef Bacik authored
      A user reported scrub taking up an unreasonable amount of ram as it ran.  This
      is because we lookup the csums for the extent we're scrubbing but don't free it
      up until after we're done with the scrub, which means we can take up a whole lot
      of ram.  This patch fixes this by dropping the csums once we're done with the
      extent we've scrubbed.  The user reported this to fix their problem.  Thanks,
      Reported-and-tested-by: Remco Hosman <remco@hosman.xs4all.nl>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix transaction throttling for delayed refs · 1be41b78
      Josef Bacik authored
      Dave has this fs_mark script that can make btrfs abort with a sufficient amount of
      ram.  This is because with more ram we can keep more dirty metadata in cache
      which in a roundabout way makes for many more pending delayed refs.  What
      happens is we end up not throttling the transaction enough so when we go to
      commit the transaction when we've completely filled the file system we'll
      abort() because we use all of the space in the global reserve and we still have
      delayed refs to run.  To fix this we need to make the delayed ref flushing and
      the transaction throttling dependent upon the number of delayed refs that we
      have instead of how much reserved space is left in the global reserve.  With
      this patch we not only stop aborting transactions but we also get a smoother run
      speed with fs_mark and it makes us about 10% faster.  Thanks,
      Reported-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: stop waiting on current trans if we aborted · 501407aa
      Josef Bacik authored
      I hit a hang when run_delayed_refs returned an error in the beginning of
      btrfs_commit_transaction.  If we decide we need to commit the transaction in
      btrfs_end_transaction we'll set BLOCKED and start to commit, but if we get an
      error this early on we'll just exit without committing.  This is fine, except
      that anybody else who tried to start a transaction will sit in
      wait_current_trans() since we're set to BLOCKED and we never set it to something
      else and woke people up.  To fix this we want to check for trans->aborted
      everywhere we wait for the transaction state to change, and make
      btrfs_abort_transaction() wake up any waiters there may be.  All the callers
      will notice that the transaction has aborted and exit out properly.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: wake up delayed ref flushing waiters on abort · f971fe29
      Josef Bacik authored
      I hit a deadlock because we aborted when flushing delayed refs but didn't wake
      any of the other flushers up and so everybody was just sleeping forever.  This
      should fix the problem.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • btrfs: fix the code comments for LZO compression workspace · 3fb40375
      Jie Liu authored
      Fix the code comments for lzo compression workspace.
      The buf item is used to store the decompressed data
      and cbuf is used to store the compressed data.
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix broken nocow after balance · 5bc7247a
      Miao Xie authored
      Balance will create reloc_root for each fs root, and it's going to
      record last_snapshot to filter shared blocks.  The side effect of
      setting last_snapshot is to break nocow attributes of files.
      
      Since the extents are not shared by the relocation tree after the balance,
      we can recover the old last_snapshot safely if no one snapshotted the
      source tree. We fix the above problem this way.
      Reported-by: Kyle Gates <kylegates@hotmail.com>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  3. 14 Jun, 2013 11 commits
    • Btrfs: exclude logged extents before replying when we are mixed · 8c2a1a30
      Josef Bacik authored
      With non-mixed block groups we replay the logs before we're allowed to do any
      writes, so we get away with not pinning/removing the data extents until right
      when we replay them.  However with mixed block groups we allocate out of the
      same pool, so we could easily allocate a metadata block that was logged in our
      tree log.  To deal with this we just need to notice that we have mixed block
      groups and do the normal excluding/removal dance during the pin stage of the log
      replay and that way we don't allocate metadata blocks from areas we have logged
      data extents.  With this patch we now pass xfstests generic/311 with mixed
      block groups turned on.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: put our inode if orphan cleanup fails · 01cd3367
      Josef Bacik authored
      When we cross into a different subvol when doing a lookup we will run the orphan
      cleanup.  If this fails however we do not drop the ref to the inode we were
      looking up before we return an error, which leads to busy inodes on umount.
      Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: add some missing iput()'s in btrfs_orphan_cleanup · c69b26b0
      Josef Bacik authored
      There are some error cases that we don't do an iput() on our inode, fix this.
      Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: do not pin while under spin lock · e78417d1
      Josef Bacik authored
      When testing a corrupted fs I noticed I was getting sleep while atomic errors
      when the transaction aborted.  This is because btrfs_pin_extent may need to
      allocate memory and we are calling this under the spin lock.  Fix this by moving
      it out and doing the pin after dropping the spin lock but before dropping the
      mutex, the same way it works when delayed refs run normally.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: Cocci spatch "memdup.spatch" · a5959bc0
      Thomas Meyer authored
      Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: Cocci spatch "ptr_ret.spatch" · 97a184fe
      Thomas Meyer authored
      Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix qgroup rescan resume on mount · b382a324
      Jan Schmidt authored
      When called during mount, we cannot start the rescan worker thread until
      open_ctree is done. This commit restructures the qgroup rescan internals to
      enable a clean deferral of the rescan resume operation.
      
      First of all, the struct qgroup_rescan is removed, saving us a malloc and
      some initialization synchronization problems. Its only element (the worker
      struct) now lives within fs_info just as the rest of the rescan code.
      
      Then setting up a rescan worker is split into several reusable stages.
      Currently we have three different rescan startup scenarios:
      	(A) rescan ioctl
      	(B) rescan resume by mount
      	(C) rescan by quota enable
      
      Each case needs its own combination of the four following steps:
      	(1) set the progress [A, C: zero; B: state of umount]
      	(2) commit the transaction [A]
      	(3) set the counters [A, C: zero; B: state of umount]
      	(4) start worker [A, B, C]
      
      qgroup_rescan_init does step (1). There's no extra function added to commit
      a transaction, we've got that already. qgroup_rescan_zero_tracking does
      step (3). Step (4) is nothing more than a call to the generic
      btrfs_queue_worker.
      
      We also get rid of a double check for the rescan progress during
      btrfs_qgroup_account_ref, which is no longer required due to having step 2
      from the list above.
      
      As a side effect, this commit prepares to move the rescan start code from
      btrfs_run_qgroups (which is run during commit) to a less time critical
      section.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
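The scenario/step matrix from the qgroup rescan commit maps naturally onto a bitmask table. The sketch below only encodes which steps each startup scenario needs, as listed in that message; the enum names and the `rescan_steps()` and `starts_from_zero()` helpers are hypothetical and are not kernel identifiers.

```c
#include <assert.h>

/* The three startup scenarios (A), (B), (C) from the commit message. */
enum rescan_scenario {
    RESCAN_IOCTL,         /* (A) rescan ioctl */
    RESCAN_RESUME_MOUNT,  /* (B) rescan resume by mount */
    RESCAN_QUOTA_ENABLE   /* (C) rescan by quota enable */
};

/* The four steps (1)..(4), one bit each. */
enum rescan_step {
    STEP_SET_PROGRESS = 1 << 0,  /* (1) */
    STEP_COMMIT_TRANS = 1 << 1,  /* (2) */
    STEP_SET_COUNTERS = 1 << 2,  /* (3) */
    STEP_START_WORKER = 1 << 3   /* (4) */
};

/* Which steps a scenario performs. Only (A) commits the transaction. */
static unsigned rescan_steps(enum rescan_scenario s)
{
    unsigned common = STEP_SET_PROGRESS | STEP_SET_COUNTERS | STEP_START_WORKER;

    switch (s) {
    case RESCAN_IOCTL:
        return common | STEP_COMMIT_TRANS;
    case RESCAN_RESUME_MOUNT:
    case RESCAN_QUOTA_ENABLE:
        return common;
    }
    assert(0 && "unknown scenario");
    return 0;
}

/*
 * Progress and counters start at zero for (A) and (C); for (B) they are
 * restored from the state saved at umount.
 */
static int starts_from_zero(enum rescan_scenario s)
{
    return s != RESCAN_RESUME_MOUNT;
}
```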
    • Btrfs: avoid double free of fs_info->qgroup_ulist · eb1716af
      Jan Schmidt authored
      When btrfs_read_qgroup_config or btrfs_quota_enable return non-zero, we've
      already freed the fs_info->qgroup_ulist. The final btrfs_free_qgroup_config
      called from quota_disable makes another ulist_free(fs_info->qgroup_ulist)
      call.
      
      We set fs_info->qgroup_ulist to NULL on the mentioned error paths, turning
      the ulist_free in btrfs_free_qgroup_config into a noop.
      
      Cc: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
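The double-free fix uses the common "NULL the pointer after freeing it" pattern. A minimal user-space sketch, with standard `free()` standing in for ulist_free and hypothetical names (`struct fs_info_sketch`, `read_config()`, `free_config()`):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the part of fs_info that owns the ulist. */
struct fs_info_sketch {
    int *qgroup_ulist;
};

/*
 * Allocate the list; on the error path free it AND clear the pointer so
 * that a later cleanup call cannot free it a second time. `fail` simulates
 * an error after the allocation succeeded.
 */
static int read_config(struct fs_info_sketch *fs, int fail)
{
    assert(fs != NULL);
    fs->qgroup_ulist = malloc(sizeof(*fs->qgroup_ulist));
    if (!fs->qgroup_ulist)
        return -1;
    if (fail) {
        free(fs->qgroup_ulist);
        fs->qgroup_ulist = NULL;   /* makes the final cleanup a no-op */
        return -1;
    }
    return 0;
}

/* Final cleanup; safe after a failed read_config() because free(NULL) does nothing. */
static void free_config(struct fs_info_sketch *fs)
{
    free(fs->qgroup_ulist);
    fs->qgroup_ulist = NULL;
}
```

Without the `fs->qgroup_ulist = NULL` assignment on the error path, `free_config()` would release the same allocation again.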
    • Btrfs: fix memory patcher through fs_info->qgroup_ulist · 4373519d
      Jan Schmidt authored
      Commit 5b7c665e introduced fs_info->qgroup_ulist, that is allocated during
      btrfs_read_qgroup_config and meant to be used later by the qgroup accounting
      code. However, it is always freed before btrfs_read_qgroup_config returns,
      because the commit mentioned above adds a check for (ret), where a check
      for (ret < 0) would have been the right choice. This commit fixes the check.
      
      Cc: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
      Signed-off-by: default avatarJan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: default avatarJosef Bacik <jbacik@fusionio.com>
      4373519d
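The difference between testing `(ret)` and `(ret < 0)` matters whenever a function uses positive return values for non-error outcomes. The sketch below assumes such a convention (negative for error, zero or positive for success); `load_buggy()` and `load_fixed()` are hypothetical and take the status as a parameter to keep the example self-contained.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Both loaders allocate a list and then inspect a status code, where
 * status < 0 means error and status >= 0 means success (assumed convention).
 */

/* Buggy variant: any non-zero status, including a positive one, frees the list. */
static int *load_buggy(int status)
{
    int *list = malloc(sizeof(*list));

    if (list && status) {
        free(list);
        list = NULL;
    }
    return list;
}

/* Fixed variant: only a negative status is treated as failure. */
static int *load_fixed(int status)
{
    int *list = malloc(sizeof(*list));

    if (list && status < 0) {
        free(list);
        list = NULL;
    }
    return list;
}
```

If the status is positive on the normal path, the buggy variant always discards the list, which matches the "always freed" behavior the commit describes.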
    • Btrfs: simplify unlink reservations · d52be818
      Josef Bacik authored
      Dave pointed out a problem where if you filled up a file system as much as
      possible you couldn't remove any files.  The whole unlink reservation thing is
      convoluted because it tries to guess if it's going to add space to unlink
      something or not, and has all these odd uncommented cases where it simply does
      not try.  So to fix this I've added a way to conditionally steal from the global
      reserve if we can't make our normal reservation.  If we have more than half the
      space in the global reserve free we will go ahead and steal from the global
      reserve.  With this patch Dave's reproducer now works and I can rm all the files
      on the file system.  Thanks,
      Reported-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: merge pending IO for tree log write back · c6adc9cc
      Miao Xie authored
      Before this patch, we first flushed the log tree of the fs/file tree and
      then flushed the log root tree. This is inefficient, especially on hard
      disks. This patch improves things by wrapping the two flushes in the
      same blk_plug.

      In testing, sync write performance went up ~60% (2.9MB/s -> 4.6MB/s)
      on my scsi disk whose disk buffer was enabled.
      
      Test step:
       # mkfs.btrfs -f -m single <disk>
       # mount <disk> <mnt>
       # dd if=/dev/zero of=<mnt>/file0 bs=32K count=1024 oflag=sync
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>