1. 13 May, 2013 1 commit
  2. 11 May, 2013 1 commit
  3. 06 May, 2013 1 commit
    • Lachlan McIlroy's avatar
      ext4: limit group search loop for non-extent files · e6155736
      Lachlan McIlroy authored
      In the case where we are allocating for a non-extent file,
      we must limit the groups we allocate from to those below
      2^32 blocks, and ext4_mb_regular_allocator() attempts to
      do this initially by putting a cap on ngroups for the
      subsequent search loop.
      
      However, the initial target group comes in from the 
      allocation context (ac), and it may already be beyond
      the artificially limited ngroups.  In this case,
      the limit
      
      	if (group == ngroups)
      		group = 0;
      
      at the top of the loop is never true, and the loop will
      run away.
      
      Catch this case inside the loop and reset the search to
      start at group 0.
      
      [sandeen@redhat.com: add commit msg & comments]
      Signed-off-by: default avatarLachlan McIlroy <lmcilroy@redhat.com>
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      e6155736
  4. 03 May, 2013 1 commit
    • Yan, Zheng's avatar
      ext4: fix fio regression · e30b5dca
      Yan, Zheng authored
      We (Linux Kernel Performance project) found a regression introduced
      by commit:
      
        f7fec032 ext4: track all extent status in extent status tree
      
      The commit causes about 20% performance decrease in fio random write
      test. Profiler shows that rb_next() uses a lot of CPU time. The call
      stack is:
      
        rb_next
        ext4_es_find_delayed_extent
        ext4_map_blocks
        _ext4_get_block
        ext4_get_block_write
        __blockdev_direct_IO
        ext4_direct_IO
        generic_file_direct_write
        __generic_file_aio_write
        ext4_file_write
        aio_rw_vect_retry
        aio_run_iocb
        do_io_submit
        sys_io_submit
        system_call_fastpath
        io_submit
        td_io_getevents
        io_u_queued_complete
        thread_main
        main
        __libc_start_main
      
      The cause is that ext4_es_find_delayed_extent() doesn't have an
      upper bound, it keeps searching until a delayed extent is found.
      When there are a lots of non-delayed entries in the extent state
      tree, ext4_es_find_delayed_extent() may uses a lot of CPU time.
      Reported-by: default avatarLKP project <lkp@linux.intel.com>
      Signed-off-by: default avatarYan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: default avatarZheng Liu <wenqing.lz@taobao.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      e30b5dca
  5. 23 Apr, 2013 1 commit
  6. 22 Apr, 2013 3 commits
  7. 21 Apr, 2013 2 commits
    • Theodore Ts'o's avatar
      jbd2: trace when lock_buffer in do_get_write_access takes a long time · f783f091
      Theodore Ts'o authored
      While investigating interactivity problems it was clear that processes
      sometimes stall for long periods of times if an attempt is made to
      lock a buffer which is undergoing writeback.  It would stall in
      a trace looking something like
      
      [<ffffffff811a39de>] __lock_buffer+0x2e/0x30
      [<ffffffff8123a60f>] do_get_write_access+0x43f/0x4b0
      [<ffffffff8123a7cb>] jbd2_journal_get_write_access+0x2b/0x50
      [<ffffffff81220f79>] __ext4_journal_get_write_access+0x39/0x80
      [<ffffffff811f3198>] ext4_reserve_inode_write+0x78/0xa0
      [<ffffffff811f3209>] ext4_mark_inode_dirty+0x49/0x220
      [<ffffffff811f57d1>] ext4_dirty_inode+0x41/0x60
      [<ffffffff8119ac3e>] __mark_inode_dirty+0x4e/0x2d0
      [<ffffffff8118b9b9>] update_time+0x79/0xc0
      [<ffffffff8118ba98>] file_update_time+0x98/0x100
      [<ffffffff81110ffc>] __generic_file_aio_write+0x17c/0x3b0
      [<ffffffff811112aa>] generic_file_aio_write+0x7a/0xf0
      [<ffffffff811ea853>] ext4_file_write+0x83/0xd0
      [<ffffffff81172b23>] do_sync_write+0xa3/0xe0
      [<ffffffff811731ae>] vfs_write+0xae/0x180
      [<ffffffff8117361d>] sys_write+0x4d/0x90
      [<ffffffff8159d62d>] system_call_fastpath+0x1a/0x1f
      [<ffffffffffffffff>] 0xffffffffffffffff
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      f783f091
    • Theodore Ts'o's avatar
      ext4: mark metadata blocks using bh flags · 13fca323
      Theodore Ts'o authored
      This allows metadata writebacks which are issued via block device
      writeback to be sent with the current write request flags.
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      13fca323
  8. 20 Apr, 2013 2 commits
  9. 19 Apr, 2013 5 commits
    • Tao Ma's avatar
      ext4: fix readdir error in case inline_data+^dir_index. · c4d8b023
      Tao Ma authored
      Zach reported a problem that if inline data is enabled, we don't
      tell the difference between the offset of '.' and '..'. And a
      getdents will fail if the user only want to get '.'. And what's
      worse, we may meet with duplicate dir entries as the offset
      for inline dir and non-inline one is quite different.
      
      This patch just try to resolve this problem if dir_index
      is disabled. In this case, f_pos is the real offset with
      the dir block, so for inline dir, we just pretend as if
      we are a dir block and returns the offset like a norml
      dir block does.
      Reported-by: default avatarZach Brown <zab@redhat.com>
      Signed-off-by: default avatarTao Ma <boyu.mt@taobao.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      c4d8b023
    • Tao Ma's avatar
      ext4: fix readdir error in the case of inline_data+dir_index · 8af0f082
      Tao Ma authored
      Zach reported a problem that if inline data is enabled, we don't
      tell the difference between the offset of '.' and '..'. And a
      getdents will fail if the user only want to get '.' and what's worse,
      if there is a conversion happens when the user calls getdents
      many times, he/she may get the same entry twice.
      
      In theory, a dir block would also fail if it is converted to a
      hashed-index based dir since f_pos will become a hash value, not the
      real one, but it doesn't happen.  And a deep investigation shows that
      we uses a hash based solution even for a normal dir if the dir_index
      feature is enabled.
      
      So this patch just adds a new htree_inlinedir_to_tree for inline dir,
      and if we find that the hash index is supported, we will do like what
      we do for a dir block.
      Reported-by: default avatarZach Brown <zab@redhat.com>
      Signed-off-by: default avatarTao Ma <boyu.mt@taobao.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      8af0f082
    • Zheng Liu's avatar
      jbd2: use kmem_cache_zalloc instead of kmem_cache_alloc/memset · 28daf4fa
      Zheng Liu authored
      The jbd2_alloc_handle() function is only called by new_handle().  So
      this commit uses kmem_cache_zalloc() instead of
      kmem_cache_alloc()/memset().
      Signed-off-by: default avatarZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      28daf4fa
    • Darrick J. Wong's avatar
    • Jan Kara's avatar
      ext4: move quota initialization out of inode allocation transaction · eb9cc7e1
      Jan Kara authored
      Inode allocation transaction is pretty heavy (246 credits with quotas
      and extents before previous patch, still around 200 after it).  This is
      mostly due to credits required for allocation of quota structures
      (credits there are heavily overestimated but it's difficult to make
      better estimates if we don't want to wire non-trivial assumptions about
      quota format into filesystem).
      
      So move quota initialization out of allocation transaction. That way
      transaction for quota structure allocation will be started only if we
      need to look up quota structure on disk (rare) and furthermore it will
      be started for each quota type separately, not for all of them at once.
      This reduces maximum transaction size to 34 is most cases and to 73 in
      the worst case.
      
      [ Modified by tytso to clean up the cleanup paths for error handling.
        Also use a separate call to ext4_std_error() for each failure so it
        is easier for someone who is debugging a problem in this function to
        determine which function call failed. ]
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      eb9cc7e1
  10. 18 Apr, 2013 1 commit
  11. 12 Apr, 2013 6 commits
  12. 11 Apr, 2013 2 commits
  13. 10 Apr, 2013 5 commits
  14. 09 Apr, 2013 4 commits
  15. 08 Apr, 2013 2 commits
    • Dmitry Monakhov's avatar
      ext4: fix incorrect lock ordering for ext4_ind_migrate · e8238f9a
      Dmitry Monakhov authored
      existing locking ordering: journal-> i_data_sem, but
      ext4_ind_migrate() grab locks in opposite order which may result in
      deadlock.
      Signed-off-by: default avatarDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      e8238f9a
    • Dr. Tilmann Bubeck's avatar
      ext4: implementation of a new ioctl called EXT4_IOC_SWAP_BOOT · 393d1d1d
      Dr. Tilmann Bubeck authored
      Add a new ioctl, EXT4_IOC_SWAP_BOOT which swaps i_blocks and
      associated attributes (like i_blocks, i_size, i_flags, ...) from the
      specified inode with inode EXT4_BOOT_LOADER_INO (#5). This is
      typically used to store a boot loader in a secure part of the
      filesystem, where it can't be changed by a normal user by accident.
      The data blocks of the previous boot loader will be associated with
      the given inode.
      
      This usercode program is a simple example of the usage:
      
      int main(int argc, char *argv[])
      {
        int fd;
        int err;
      
        if ( argc != 2 ) {
          printf("usage: ext4-swap-boot-inode FILE-TO-SWAP\n");
          exit(1);
        }
      
        fd = open(argv[1], O_WRONLY);
        if ( fd < 0 ) {
          perror("open");
          exit(1);
        }
      
        err = ioctl(fd, EXT4_IOC_SWAP_BOOT);
        if ( err < 0 ) {
          perror("ioctl");
          exit(1);
        }
      
        close(fd);
        exit(0);
      }
      
      [ Modified by Theodore Ts'o to fix a number of bugs in the original code.]
      Signed-off-by: default avatarDr. Tilmann Bubeck <t.bubeck@reinform.de>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      393d1d1d
  16. 04 Apr, 2013 3 commits
    • Lukas Czerner's avatar
      ext4: print more info in ext4_print_free_blocks() · f78ee70d
      Lukas Czerner authored
      Additionally print i_allocated_meta_blocks information as well.
      Signed-off-by: default avatarLukas Czerner <lczerner@redhat.com>
      f78ee70d
    • Lukas Czerner's avatar
      ext4: try to prepend extent to the existing one · be8981be
      Lukas Czerner authored
      Currently when inserting extent in ext4_ext_insert_extent() we would
      only try to to see if we can append new extent to the found extent. If
      we can not, then we proceed with adding new extent into the extent tree,
      but then possibly merging it back again.
      
      We can avoid this situation by trying to append and prepend new extent
      to the existing ones. However since the new extent can be on either
      sides of the existing extent, we have to pick the right extent to try to
      append/prepend to.
      
      This patch adds the conditions to pick the right extent to
      append/prepend to and adds the actual prepending condition as well. This
      will also eliminate the need to use "reserved" block for possibly
      growing extent tree.
      Signed-off-by: default avatarLukas Czerner <lczerner@redhat.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      be8981be
    • Lukas Czerner's avatar
      ext4: Transfer initialized block to right neighbor if possible · bc2d9db4
      Lukas Czerner authored
      Currently when converting extent to initialized we attempt to transfer
      initialized block to the left neighbour if possible when certain
      criteria are met. However we do not attempt to do the same for the
      right neighbor.
      
      This commit adds the possibility to transfer initialized block to the
      right neighbour if:
      
      1. We're not converting the whole extent
      2. Both extents are stored in the same extent tree node
      3. Right neighbor is initialized
      4. Right neighbor is logically abutting the current one
      5. Right neighbor is physically abutting the current one
      6. Right neighbor would not overflow the length limit
      
      This is basically the same logic as with transferring to the left. This
      will gain us some performance benefits since it is faster than inserting
      extent and then merging it.
      
      It would also prevent some situation in delalloc patch when we might run
      out of metadata reservation. This is due to the fact that we would
      attempt to split the extent first (possibly allocating new metadata
      block) even though we did not counted for that because it can (and will)
      be merged again. This commit fix that scenario, because we no longer
      need to split the extent in such case.
      Signed-off-by: default avatarLukas Czerner <lczerner@redhat.com>
      bc2d9db4