1. 18 Feb, 2013 1 commit
    • ext4: refine extent status tree · 06b0c886
      Zheng Liu authored
      This commit refines the extent status tree code.
      
      1) A prefix 'es_' is added to the extent status tree structure
      members.
      
      2) es_remove_extent() is refactored so that __es_remove_extent() can be
      used by es_insert_extent() to remove the old extent entry(-ies) before
      inserting a new one.
      
      3) extent_status_end() is renamed to ext4_es_end().
      
      4) ext4_es_can_be_merged() is defined to check whether two extents can
      be merged or not.
      
      5) Comments are updated and clarified.
      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      06b0c886
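The merge check described in item 4 can be illustrated with a minimal userspace sketch. The struct below is a simplified, hypothetical stand-in for the kernel's extent status entry (only the 'es_'-prefixed start and length fields are assumed), and the helper names are illustrative rather than the kernel's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's extent status entry. */
struct extent_status {
	uint32_t es_lblk;  /* first logical block covered by the extent */
	uint32_t es_len;   /* length of the extent in blocks */
};

/* Last logical block covered by the extent (inclusive). */
static uint32_t es_end(const struct extent_status *es)
{
	assert(es->es_len > 0);
	return es->es_lblk + es->es_len - 1;
}

/* Two extents are mergeable here when es2 starts right after es1 ends. */
static bool es_can_be_merged(const struct extent_status *es1,
			     const struct extent_status *es2)
{
	return es_end(es1) + 1 == es2->es_lblk;
}
```

Under these assumptions, an extent covering blocks 0..7 can be merged with one starting at block 8, but not with one starting at block 16.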
  2. 15 Feb, 2013 2 commits
    • ext4: use ERR_PTR() abstraction for ext4_append() · 0f70b406
      Theodore Ts'o authored
      Use ERR_PTR()/IS_ERR() abstraction instead of passing in a separate
      pointer to an integer for the error code, as a code cleanup.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      0f70b406
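The ERR_PTR()/IS_ERR() idea mentioned above can be imitated in userspace as follows. This is a sketch only: the lowercase helpers mimic the kernel macros, and append_block() is a hypothetical function invented for illustration, not ext4_append().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer (userspace imitation). */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the negative errno value from an encoded pointer. */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top MAX_ERRNO addresses of the address space. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator: returns a buffer or an encoded error. */
static void *append_block(size_t size)
{
	void *buf;

	if (size == 0)
		return err_ptr(-EINVAL);
	buf = malloc(size);
	if (!buf)
		return err_ptr(-ENOMEM);
	return buf;
}
```

The caller checks the single return value with is_err() and extracts the code with ptr_err(), so no separate "int *err" output argument is needed.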
    • ext4: refactor code to read directory blocks into ext4_read_dirblock() · dc6982ff
      Theodore Ts'o authored
      The code to read in directory blocks and verify their metadata
      checksums was replicated in ten different places across
      fs/ext4/namei.c, and the code was buggy in subtle ways in a number of
      those replicated sites.  In some cases, ext4_error() was called with a
      trailing newline.  In others, in particular in empty_dir(), it was
      possible to call ext4_dirent_csum_verify() on an index block, which
      would trigger false warnings requesting the system administrator to
      run e2fsck.
      
      By refactoring the code, we make it more readable, shrink the
      compiled object file by over 700 bytes, and remove 50 lines of
      code.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      dc6982ff
  3. 14 Feb, 2013 2 commits
  4. 09 Feb, 2013 10 commits
    • jbd2: use module parameters instead of debugfs for jbd_debug · b6e96d00
      Theodore Ts'o authored
      There are multiple reasons to move away from debugfs.  First of all,
      we are only using it for a single parameter, and it is much more
      complicated to set up (some 30 lines of code compared to 3), and one
      more thing that might fail while loading the jbd2 module.
      
      Secondly, as a module parameter it can be specified as a boot option if
      jbd2 is built into the kernel, or as a parameter when the module is
      loaded, and it can also be manipulated dynamically under
      /sys/module/jbd2/parameters/jbd2_debug.  So it is more flexible.
      
      Ultimately we want to move away from using jbd_debug() towards
      tracepoints, but for now this is still a useful simplification of the
      code base.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      b6e96d00
    • ext4: use module parameters instead of debugfs for mballoc_debug · a0b30c12
      Theodore Ts'o authored
      There are multiple reasons to move away from debugfs.  First of all,
      we are only using it for a single parameter, and it is much more
      complicated to set up (some 30 lines of code compared to 3), and one
      more thing that might fail while loading the ext4 module.
      
      Secondly, as a module parameter it can be specified as a boot option if
      ext4 is built into the kernel, or as a parameter when the module is
      loaded, and it can also be manipulated dynamically under
      /sys/module/ext4/parameters/mballoc_debug.  So it is more flexible.
      
      Ultimately we want to move away from using mb_debug() towards
      tracepoints, but for now this is still a useful simplification of the
      code base.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      a0b30c12
    • ext4: start handle at the last possible moment when creating inodes · 1139575a
      Theodore Ts'o authored
      In ext4_{create,mknod,mkdir,symlink}(), don't start the journal handle
      until the inode has been successfully allocated.  In order to do this,
      we need to start the handle in ext4_new_inode().  So create a new
      variant of this function, ext4_new_inode_start_handle(), so the handle
      can be created at the last possible minute, before we need to modify
      the inode allocation bitmap block.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      1139575a
    • ext4: fix the number of credits needed for acl ops with inline data · 95eaefbd
      Theodore Ts'o authored
      Operations which modify extended attributes may need extra journal
      credits if inline data is used, since there is a chance that some
      extended attributes may need to get pushed to an external attribute
      block.
      
      Changes to reflect this were made in xattr.c, but they were missed in
      fs/ext4/acl.c.  To fix this, abstract the calculation of the number of
      credits needed for xattr operations to an inline function defined in
      ext4_jbd2.h, and use it in acl.c and xattr.c.
      
      Also move the function declarations used in inline.c out of xattr.h
      (where they were non-obviously hidden, and caused problems since
      ext4_jbd2.h needs to use the function ext4_has_inline_data()) and into
      ext4.h.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Tao Ma <boyu.mt@taobao.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      95eaefbd
    • ext4: fix the number of credits needed for ext4_unlink() and ext4_rmdir() · 64044abf
      Theodore Ts'o authored
      ext4_unlink() and ext4_rmdir() don't actually release the blocks
      associated with the file/directory.  This gets done in a separate jbd2
      handle called via ext4_evict_inode().  Thus, we don't need to reserve
      lots of journal credits for the truncate.
      
      Note that using too many journal credits is non-optimal because it can
      lead to the journal transaction getting closed too early, before it is
      strictly necessary.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      64044abf
    • ext4: fix the number of credits needed for ext4_ext_migrate() · 4b217630
      Theodore Ts'o authored
      The migration ioctl creates a temporary inode.  Since this inode is
      never linked to a directory, we don't need to reserve journal credits
      required for modifying the directory.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      4b217630
    • ext4: start handle at the last possible moment in ext4_rmdir() · 8dcfaad2
      Theodore Ts'o authored
      Don't start the jbd2 transaction handle until after the directory
      entry has been found, to minimize the amount of time that a handle is
      held active.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      8dcfaad2
    • ext4: start handle at the last possible moment in ext4_unlink() · 931b6864
      Theodore Ts'o authored
      Don't start the jbd2 transaction handle until after the directory
      entry has been found, to minimize the amount of time that a handle is
      held active.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      931b6864
    • ext4: grab page before starting transaction handle in write_begin() · 47564bfb
      Theodore Ts'o authored
      The grab_cache_page_write_begin() function can potentially sleep for a
      long time, since it may need to do memory allocation which can block
      if the system is under significant memory pressure, and because it may
      be blocked on page writeback.  If it does take a long time to grab the
      page, it's better that we not hold an active jbd2 handle.
      
      So grab a handle on the page first, and _then_ start the transaction
      handle.
      
      This commit fixes the following long transaction handle hold time:
      
      postmark-2917  [000] ....   196.435786: jbd2_handle_stats: dev 254,32
         tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
         dirtied_blocks 0
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      47564bfb
    • ext4: pass context information to jbd2__journal_start() · 9924a92a
      Theodore Ts'o authored
      So we can better understand what bits of ext4 are responsible for
      long-running jbd2 handles, use jbd2__journal_start() so we can pass
      context information for logging purposes.
      
      The recommended way for finding the longer-running handles is:
      
         T=/sys/kernel/debug/tracing
         EVENT=$T/events/jbd2/jbd2_handle_stats
         echo "interval > 5" > $EVENT/filter
         echo 1 > $EVENT/enable
      
         ./run-my-fs-benchmark
      
         cat $T/trace > /tmp/problem-handles
      
      This will list handles that were active for longer than 20ms (an
      interval of more than 5 jiffies, which implies HZ=250).  Having
      longer-running handles is bad, because a commit started at the wrong
      time could stall for those 20+ milliseconds, which could delay an
      fsync() or an O_SYNC operation.  Here is an example line from the
      trace file describing a handle which lived on for 311 jiffies, or over
      1.2 seconds:
      
      postmark-2917  [000] ....   196.435786: jbd2_handle_stats: dev 254,32 
         tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
         dirtied_blocks 0
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      9924a92a
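The figures in this message (a filter of 5 jiffies for 20ms, and 311 jiffies for just over 1.2 seconds) are consistent with a tick rate of HZ=250. That value is an inference from the numbers, not something the commit states, and the helper below is a hypothetical userspace sketch of the conversion rather than the kernel's own jiffies helpers.

```c
#include <assert.h>

/* Convert a jiffies interval to milliseconds for a given tick rate (HZ). */
static unsigned long jiffies_to_ms(unsigned long jiffies, unsigned int hz)
{
	assert(hz > 0);
	return jiffies * 1000UL / hz;
}
```

With hz set to 250, an interval of 5 maps to 20 ms and the 311-jiffy handle in the trace line maps to 1244 ms. On a kernel built with a different HZ, the filter threshold would need to be scaled accordingly.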
  5. 08 Feb, 2013 2 commits
  6. 07 Feb, 2013 2 commits
  7. 04 Feb, 2013 1 commit
    • ext4: optimize mballoc for large allocations · 40ae3487
      Theodore Ts'o authored
      The ext4 block allocator only maintains buddy bitmaps for chunks which
      are less than or equal to one quarter of a block group.  That is, for
      a file system with a 1k blocksize, and where the number of blocks in a
      block group is 8192 blocks, the largest chunk size tracked by buddy
      bitmaps is 2048 blocks.
      
      For a file system with a 4k blocksize, and where the number of blocks
      in a block group is 32768 blocks, the largest chunk size tracked by
      buddy bitmaps is 8192 blocks.
      
      To work around this limitation, mballoc.c before this commit would truncate
      allocation requests to the number of blocks in a block group minus 10.
      Why 10?  Aside from being a completely arbitrary number, it keeps the
      block allocation from being a power of two larger than 25% of the block
      group.  If you try to explicitly fallocate 50% of the block group
      size, this will demonstrate the problem; the block allocation code
      will scan all of the blocks in the file system with cr==0 (since
      the request is for a natural power of two), but then completely fail
      for all block groups, since the buddy bitmaps don't track chunk sizes
      of 50% of the block group.
      
      To fix this, in these cases we use ext4_mb_complex_scan_group() instead of
      ext4_mb_simple_scan_group().
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger@dilger.ca>
      40ae3487
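The arithmetic in this message can be checked with a small sketch. The function names are hypothetical and the logic is a simplified illustration of the quarter-of-a-group limit described above, not the actual mballoc.c code.

```c
#include <assert.h>
#include <stdbool.h>

/* Largest chunk tracked by buddy bitmaps: a quarter of the block group. */
static unsigned int max_buddy_chunk(unsigned int blocks_per_group)
{
	return blocks_per_group / 4;
}

static bool is_power_of_two(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * True when a request is a power of two that is larger than anything
 * the buddy bitmaps track, i.e. the problematic case in the message.
 */
static bool exceeds_buddy_limit(unsigned int request,
				unsigned int blocks_per_group)
{
	return is_power_of_two(request) &&
	       request > max_buddy_chunk(blocks_per_group);
}
```

For a 32768-block group, a request of 16384 blocks (50% of the group) hits the problematic case, while 8192 blocks does not; a request of 32768 minus 10 is not a power of two, which is how the old truncation sidestepped the issue.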
  8. 03 Feb, 2013 4 commits
  9. 02 Feb, 2013 4 commits
  10. 30 Jan, 2013 2 commits
  11. 29 Jan, 2013 8 commits
  12. 28 Jan, 2013 2 commits
    • ext4: simplify mpage_add_bh_to_extent() · b6a8e62f
      Jan Kara authored
      The argument b_size of mpage_add_bh_to_extent() was bogus since it was
      always == blocksize (which we can easily derive from inode->i_blkbits).
      Also, the second branch of the condition:
      	if (nrblocks >= EXT4_MAX_TRANS_DATA) {
      	} else if ((nrblocks + (b_size >> mpd->inode->i_blkbits)) >
      						EXT4_MAX_TRANS_DATA) {
      	}
      was never taken because (b_size >> mpd->inode->i_blkbits) == 1.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      b6a8e62f
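The dead-branch argument can be verified with a short sketch. MAX_TRANS_DATA below is a placeholder value chosen for illustration (the real EXT4_MAX_TRANS_DATA is defined in the ext4 headers), and second_branch_taken() is a hypothetical helper that mirrors only the shape of the quoted condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder limit for illustration; see the ext4 headers for the real one. */
#define MAX_TRANS_DATA 64u

/* Returns true if the "else if" branch of the quoted condition would run. */
static bool second_branch_taken(unsigned int nrblocks, unsigned int b_size,
				unsigned int blkbits)
{
	if (nrblocks >= MAX_TRANS_DATA)
		return false;  /* the first branch is taken instead */
	return nrblocks + (b_size >> blkbits) > MAX_TRANS_DATA;
}
```

When b_size equals the block size, the shift yields 1, and an integer nrblocks below the limit can never exceed it after adding 1, so the helper always returns false. It could only return true if b_size spanned more than one block.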
    • ext4: dirty page has always buffers attached · f8bec370
      Jan Kara authored
      ext4_writepage(), write_cache_pages_da(), and mpage_da_submit_io()
      don't have to deal with the case when a page doesn't have buffers. We
      attach buffers to a page in ->write_begin() and ->page_mkwrite() which
      covers all places where a page can become dirty.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      f8bec370