1. 11 Oct, 2013 2 commits
    • Josef Bacik's avatar
      Btrfs: limit delalloc pages outside of find_delalloc_range · 7bf811a5
      Josef Bacik authored
      Liu fixed part of this problem and unfortunately I steered him in slightly the
      wrong direction and so didn't completely fix the problem.  The problem is we
      limit the size of the delalloc range we are looking for to max bytes and then we
      try to lock that range.  If we fail to lock the pages in that range we will
      shrink the max bytes to a single page and re loop.  However if our first page is
      inside of the delalloc range then we will end up limiting the end of the range
      to a period before our first page.  This is illustrated below
      
      [0 -------- delalloc range --------- 256mb]
                                        [page]
      
      So find_delalloc_range will return with delalloc_start as 0 and end as 128mb,
      and then we will notice that delalloc_start < *start and adjust it up, but not
      adjust delalloc_end up, so things go sideways.  To fix this we need to not limit
      the max bytes in find_delalloc_range, but in find_lock_delalloc_range and that
      way we don't end up with this confusion.  Thanks,
      Signed-off-by: default avatarJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: default avatarChris Mason <chris.mason@fusionio.com>
      7bf811a5
    • Josef Bacik's avatar
      Btrfs: use right root when checking for hash collision · 4871c158
      Josef Bacik authored
      btrfs_rename was using the root of the old dir instead of the root of the new
      dir when checking for a hash collision, so if you tried to move a file into a
      subvol it would freak out because it would see the file you are trying to move
      in its current root.  This fixes the bug where this would fail
      
      btrfs subvol create test1
      btrfs subvol create test2
      mv test1 test2.
      
      Thanks to Chris Murphy for catching this,
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarChris Murphy <lists@colorremedies.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: default avatarChris Mason <chris.mason@fusionio.com>
      4871c158
  2. 04 Oct, 2013 4 commits
    • Ilya Dryomov's avatar
      Btrfs: fix a use-after-free bug in btrfs_dev_replace_finishing · 1357272f
      Ilya Dryomov authored
      free_device rcu callback, scheduled from btrfs_rm_dev_replace_srcdev,
      can be processed before btrfs_scratch_superblock is called, which would
      result in a use-after-free on btrfs_device contents.  Fix this by
      zeroing the superblock before the rcu callback is registered.
      
      Cc: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fusionio.com>
      1357272f
    • Ilya Dryomov's avatar
      Btrfs: eliminate races in worker stopping code · 964fb15a
      Ilya Dryomov authored
      The current implementation of worker threads in Btrfs has races in
      worker stopping code, which cause all kinds of panics and lockups when
      running btrfs/011 xfstest in a loop.  The problem is that
      btrfs_stop_workers is unsynchronized with respect to check_idle_worker,
      check_busy_worker and __btrfs_start_workers.
      
      E.g., check_idle_worker race flow:
      
             btrfs_stop_workers():            check_idle_worker(aworker):
      - grabs the lock
      - splices the idle list into the
        working list
      - removes the first worker from the
        working list
      - releases the lock to wait for
        its kthread's completion
                                        - grabs the lock
                                        - if aworker is on the working list,
                                          moves aworker from the working list
                                          to the idle list
                                        - releases the lock
      - grabs the lock
      - puts the worker
      - removes the second worker from the
        working list
                                    ......
              btrfs_stop_workers returns, aworker is on the idle list
                       FS is umounted, memory is freed
                                    ......
                    aworker is waken up, fireworks ensue
      
      With this applied, I wasn't able to trigger the problem in 48 hours,
      whereas previously I could reliably reproduce at least one of these
      races within an hour.
      Reported-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fusionio.com>
      964fb15a
    • Liu Bo's avatar
      Btrfs: fix crash of compressed writes · 385fe0be
      Liu Bo authored
      The crash[1] is found by xfstests/generic/208 with "-o compress",
      it's not reproduced everytime, but it does panic.
      
      The bug is quite interesting, it's actually introduced by a recent commit
      (573aecaf,
      Btrfs: actually limit the size of delalloc range).
      
      Btrfs implements delay allocation, so during writeback, we
      (1) get a page A and lock it
      (2) search the state tree for delalloc bytes and lock all pages within the range
      (3) process the delalloc range, including find disk space and create
          ordered extent and so on.
      (4) submit the page A.
      
      It runs well in normal cases, but if we're in a racy case, eg.
      buffered compressed writes and aio-dio writes,
      sometimes we may fail to lock all pages in the 'delalloc' range,
      in which case, we need to fall back to search the state tree again with
      a smaller range limit(max_bytes = PAGE_CACHE_SIZE - offset).
      
      The mentioned commit has a side effect, that is, in the fallback case,
      we can find delalloc bytes before the index of the page we already have locked,
      so we're in the case of (delalloc_end <= *start) and return with (found > 0).
      
      This ends with not locking delalloc pages but making ->writepage still
      process them, and the crash happens.
      
      This fixes it by just thinking that we find nothing and returning to caller
      as the caller knows how to deal with it properly.
      
      [1]:
      ------------[ cut here ]------------
      kernel BUG at mm/page-writeback.c:2170!
      [...]
      CPU: 2 PID: 11755 Comm: btrfs-delalloc- Tainted: G           O 3.11.0+ #8
      [...]
      RIP: 0010:[<ffffffff810f5093>]  [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83
      [...]
      [ 4934.248731] Stack:
      [ 4934.248731]  ffff8801477e5dc8 ffffea00049b9f00 ffff8801869f9ce8 ffffffffa02b841a
      [ 4934.248731]  0000000000000000 0000000000000000 0000000000000fff 0000000000000620
      [ 4934.248731]  ffff88018db59c78 ffffea0005da8d40 ffffffffa02ff860 00000001810016c0
      [ 4934.248731] Call Trace:
      [ 4934.248731]  [<ffffffffa02b841a>] extent_range_clear_dirty_for_io+0xcf/0xf5 [btrfs]
      [ 4934.248731]  [<ffffffffa02a8889>] compress_file_range+0x1dc/0x4cb [btrfs]
      [ 4934.248731]  [<ffffffff8104f7af>] ? detach_if_pending+0x22/0x4b
      [ 4934.248731]  [<ffffffffa02a8bad>] async_cow_start+0x35/0x53 [btrfs]
      [ 4934.248731]  [<ffffffffa02c694b>] worker_loop+0x14b/0x48c [btrfs]
      [ 4934.248731]  [<ffffffffa02c6800>] ? btrfs_queue_worker+0x25c/0x25c [btrfs]
      [ 4934.248731]  [<ffffffff810608f5>] kthread+0x8d/0x95
      [ 4934.248731]  [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43
      [ 4934.248731]  [<ffffffff814fe09c>] ret_from_fork+0x7c/0xb0
      [ 4934.248731]  [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43
      [ 4934.248731] Code: ff 85 c0 0f 94 c0 0f b6 c0 59 5b 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 2c de 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 52 49 8b 84 24 80 00 00 00 f6 40 20 01 75 44
      [ 4934.248731] RIP  [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83
      [ 4934.248731]  RSP <ffff8801869f9c48>
      [ 4934.280307] ---[ end trace 36f06d3f8750236a ]---
      Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fusionio.com>
      385fe0be
    • Josef Bacik's avatar
      Btrfs: fix transid verify errors when recovering log tree · 60e7cd3a
      Josef Bacik authored
      If we crash with a log, remount and recover that log, and then crash before we
      can commit another transaction we will get transid verify errors on the next
      mount.  This is because we were not zero'ing out the log when we committed the
      transaction after recovery.  This is ok as long as we commit another transaction
      at some point in the future, but if you abort or something else goes wrong you
      can end up in this weird state because the recovery stuff says that the tree log
      should have a generation+1 of the super generation, which won't be the case of
      the transaction that was started for recovery.  Fix this by removing the check
      and _always_ zero out the log portion of the super when we commit a transaction.
      This fixes the transid verify issues I was seeing with my force errors tests.
      Thanks,
      Signed-off-by: default avatarJosef Bacik <jbacik@fusionio.com>
      60e7cd3a
  3. 21 Sep, 2013 26 commits
  4. 02 Sep, 2013 4 commits
  5. 01 Sep, 2013 4 commits
    • Filipe David Borba Manana's avatar
      Btrfs: optimize key searches in btrfs_search_slot · d7396f07
      Filipe David Borba Manana authored
      When the binary search returns 0 (exact match), the target key
      will necessarily be at slot 0 of all nodes below the current one,
      so in this case the binary search is not needed because it will
      always return 0, and we waste time doing it, holding node locks
      for longer than necessary, etc.
      
      Below follow histograms with the times spent on the current approach of
      doing a binary search when the previous binary search returned 0, and
      times for the new approach, which directly picks the first item/child
      node in the leaf/node.
      
      Current approach:
      
      Count: 6682
      Range: 35.000 - 8370.000; Mean: 85.837; Median: 75.000; Stddev: 106.429
      Percentiles:  90th: 124.000; 95th: 145.000; 99th: 206.000
        35.000 -   61.080:  1235 ################
        61.080 -  106.053:  4207 #####################################################
       106.053 -  183.606:  1122 ##############
       183.606 -  317.341:   111 #
       317.341 -  547.959:     6 |
       547.959 - 8370.000:     1 |
      
      Approach proposed by this patch:
      
      Count: 6682
      Range:  6.000 - 135.000; Mean: 16.690; Median: 16.000; Stddev:  7.160
      Percentiles:  90th: 23.000; 95th: 27.000; 99th: 40.000
         6.000 -    8.418:    58 #
         8.418 -   11.670:  1149 #########################
        11.670 -   16.046:  2418 #####################################################
        16.046 -   21.934:  2098 ##############################################
        21.934 -   29.854:   744 ################
        29.854 -   40.511:   154 ###
        40.511 -   54.848:    41 #
        54.848 -   74.136:     5 |
        74.136 -  100.087:     9 |
       100.087 -  135.000:     6 |
      
      These samples were captured during a run of the btrfs tests 001, 002 and
      004 in the xfstests, with a leaf/node size of 4Kb.
      Signed-off-by: default avatarFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: default avatarChris Mason <chris.mason@fusionio.com>
      d7396f07
    • Josef Bacik's avatar
      Btrfs: don't use an async starter for most of our workers · 45d5fd14
      Josef Bacik authored
      We only need an async starter if we can't make a GFP_NOFS allocation in our
      current path.  This is the case for the endio stuff since it happens in IRQ
      context, but things like the caching thread workers and the delalloc flushers we
      can easily make this allocation and start threads right away.  Also change the
      worker count for the caching thread pool.  Traditionally we limited this to 2
      since we took read locks while caching, but nowadays we do this lockless so
      there's no reason to limit the number of caching threads.  Thanks,
      Signed-off-by: default avatarJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: default avatarChris Mason <chris.mason@fusionio.com>
      45d5fd14
    • Josef Bacik's avatar
      Btrfs: only update disk_i_size as we remove extents · 7f4f6e0a
      Josef Bacik authored
      This fixes a problem where if we fail a truncate we will leave the i_size set
      where we wanted to truncate to instead of where we were able to truncate to.
      Fix this by making btrfs_truncate_inode_items do the disk_i_size update as it
      removes extents, that way it will always be consistent with where its extents
      are.  Then if the truncate fails at all we can update the in-ram i_size with
      what we have on disk and delete the orphan item.  Thanks,
      Signed-off-by: default avatarJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: default avatarChris Mason <chris.mason@fusionio.com>
      7f4f6e0a
    • Filipe David Borba Manana's avatar
      Btrfs: fix deadlock in uuid scan kthread · f45388f3
      Filipe David Borba Manana authored
      If there's an ongoing transaction when the uuid scan kthread attempts
      to create one, the kthread will block, waiting for that transaction to
      finish while it's keeping locks on the tree root, and in turn the existing
      transaction is waiting for those locks to be free.
      
      The stack trace reported by the kernel follows.
      
      [36700.671601] INFO: task btrfs-uuid:15480 blocked for more than 120 seconds.
      [36700.671602] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [36700.671602] btrfs-uuid      D 0000000000000000     0 15480      2 0x00000000
      [36700.671604]  ffff880710bd5b88 0000000000000046 ffff8803d36ba850 0000000000030000
      [36700.671605]  ffff8806d76dc530 ffff880710bd5fd8 ffff880710bd5fd8 ffff880710bd5fd8
      [36700.671607]  ffff8808098ac530 ffff8806d76dc530 ffff880710bd5b98 ffff8805e4508e40
      [36700.671608] Call Trace:
      [36700.671610]  [<ffffffff816f36b9>] schedule+0x29/0x70
      [36700.671620]  [<ffffffffa05a3bdf>] wait_current_trans.isra.33+0xbf/0x120 [btrfs]
      [36700.671623]  [<ffffffff81066760>] ? add_wait_queue+0x60/0x60
      [36700.671629]  [<ffffffffa05a5b06>] start_transaction+0x3d6/0x530 [btrfs]
      [36700.671636]  [<ffffffffa05bb1f4>] ? btrfs_get_token_32+0x64/0xf0 [btrfs]
      [36700.671642]  [<ffffffffa05a5fbb>] btrfs_start_transaction+0x1b/0x20 [btrfs]
      [36700.671649]  [<ffffffffa05c8a81>] btrfs_uuid_scan_kthread+0x211/0x3d0 [btrfs]
      [36700.671655]  [<ffffffffa05c8870>] ? __btrfs_open_devices+0x2a0/0x2a0 [btrfs]
      [36700.671657]  [<ffffffff81065fa0>] kthread+0xc0/0xd0
      [36700.671659]  [<ffffffff81065ee0>] ? flush_kthread_worker+0xb0/0xb0
      [36700.671661]  [<ffffffff816fcd1c>] ret_from_fork+0x7c/0xb0
      [36700.671662]  [<ffffffff81065ee0>] ? flush_kthread_worker+0xb0/0xb0
      [36700.671663] INFO: task btrfs:15481 blocked for more than 120 seconds.
      [36700.671664] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [36700.671665] btrfs           D 0000000000000000     0 15481  15212 0x00000004
      [36700.671666]  ffff880248cbf4c8 0000000000000086 ffff8803d36ba700 ffff8801dbd5c280
      [36700.671668]  ffff880807815c40 ffff880248cbffd8 ffff880248cbffd8 ffff880248cbffd8
      [36700.671669]  ffff8805e86a0000 ffff880807815c40 ffff880248cbf4d8 ffff8801dbd5c280
      [36700.671670] Call Trace:
      [36700.671672]  [<ffffffff816f36b9>] schedule+0x29/0x70
      [36700.671679]  [<ffffffffa05d9b0d>] btrfs_tree_lock+0x6d/0x230 [btrfs]
      [36700.671680]  [<ffffffff81066760>] ? add_wait_queue+0x60/0x60
      [36700.671685]  [<ffffffffa0582829>] btrfs_search_slot+0x999/0xb00 [btrfs]
      [36700.671691]  [<ffffffffa05bd9de>] ? btrfs_lookup_first_ordered_extent+0x5e/0xb0 [btrfs]
      [36700.671698]  [<ffffffffa05e3e54>] __btrfs_write_out_cache+0x8c4/0xa80 [btrfs]
      [36700.671704]  [<ffffffffa05e4362>] btrfs_write_out_cache+0xb2/0xf0 [btrfs]
      [36700.671710]  [<ffffffffa05c4441>] ? free_extent_buffer+0x61/0xc0 [btrfs]
      [36700.671716]  [<ffffffffa0594c82>] btrfs_write_dirty_block_groups+0x562/0x650 [btrfs]
      [36700.671723]  [<ffffffffa0610092>] commit_cowonly_roots+0x171/0x24b [btrfs]
      [36700.671729]  [<ffffffffa05a4dde>] btrfs_commit_transaction+0x4fe/0xa10 [btrfs]
      [36700.671735]  [<ffffffffa0610af3>] create_subvol+0x5c0/0x636 [btrfs]
      [36700.671742]  [<ffffffffa05d49ff>] btrfs_mksubvol.isra.60+0x33f/0x3f0 [btrfs]
      [36700.671747]  [<ffffffffa05d4bf2>] btrfs_ioctl_snap_create_transid+0x142/0x190 [btrfs]
      [36700.671752]  [<ffffffffa05d4c6c>] ? btrfs_ioctl_snap_create+0x2c/0x80 [btrfs]
      [36700.671757]  [<ffffffffa05d4c9e>] btrfs_ioctl_snap_create+0x5e/0x80 [btrfs]
      [36700.671759]  [<ffffffff8113a764>] ? handle_pte_fault+0x84/0x920
      [36700.671764]  [<ffffffffa05d87eb>] btrfs_ioctl+0xf0b/0x1d00 [btrfs]
      [36700.671766]  [<ffffffff8113c120>] ? handle_mm_fault+0x210/0x310
      [36700.671768]  [<ffffffff816f83a4>] ? __do_page_fault+0x284/0x4e0
      [36700.671770]  [<ffffffff81180aa6>] do_vfs_ioctl+0x96/0x550
      [36700.671772]  [<ffffffff81170fe3>] ? __sb_end_write+0x33/0x70
      [36700.671774]  [<ffffffff81180ff1>] SyS_ioctl+0x91/0xb0
      [36700.671775]  [<ffffffff816fcdc2>] system_call_fastpath+0x16/0x1b
      Signed-off-by: default avatarFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: default avatarChris Mason <chris.mason@fusionio.com>
      f45388f3