1. 11 Sep, 2009 2 commits
    • ext4: Always set dx_node's fake_dirent explicitly. · 1f7bebb9
      Andreas Schlick authored
      When ext4_dx_add_entry() has to split an index node, it has to ensure that
      name_len of dx_node's fake_dirent is also zero, because otherwise e2fsck
      won't recognise it as an intermediate htree node and will consider the
      htree to be corrupted.
      Signed-off-by: Andreas Schlick <schlick@lavabit.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
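A minimal sketch of the idea, not the kernel code: the struct below is a simplified stand-in for the fake dirent at the start of an index block (on-disk little-endian conversion is left out), and `init_fake_dirent()` is a hypothetical helper. The point is that every field, name_len included, is written explicitly instead of being inherited from whatever the block held before the split.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the fake dirent that starts an htree index
 * block; endianness handling is deliberately omitted in this sketch. */
struct fake_dirent {
    uint32_t inode;
    uint16_t rec_len;
    uint8_t  name_len;
    uint8_t  file_type;
};

/* Hypothetical helper: set every field explicitly so that no stale
 * value (in particular name_len) survives from the old block contents. */
static void init_fake_dirent(struct fake_dirent *fake, unsigned blocksize)
{
    assert(blocksize <= UINT16_MAX);   /* keeps rec_len representable here */
    memset(fake, 0, sizeof(*fake));
    fake->inode = 0;
    fake->rec_len = (uint16_t)blocksize; /* the entry spans the whole block */
    fake->name_len = 0;                  /* zero marks "not a real entry" */
    fake->file_type = 0;
}
```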
    • ext4: Fix async commit mode to be safe by using a barrier · 0e3d2a63
      Theodore Ts'o authored
      Previously the journal_async_commit mount option was equivalent to
      using barrier=0 (and just as unsafe).  This patch fixes it so that we
      eliminate the barrier before the commit block (by not using ordered
      mode) and explicitly issue an empty barrier bio after writing the
      commit block.  Because of the journal checksum, it is safe to do this;
      if the journal blocks are not all written before a power failure, the
      checksum in the commit block will prevent the last transaction from
      being replayed.
      
      Using the fs_mark benchmark, using journal_async_commit shows a 50%
      improvement:
      
      FSUse%        Count         Size    Files/sec     App Overhead
           8         1000        10240         30.5            28242
      
      vs.
      
      FSUse%        Count         Size    Files/sec     App Overhead
           8         1000        10240         45.8            28620
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
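The checksum argument can be illustrated with a toy model. Everything here is an assumption made for illustration: the Adler-32-style sum is not the checksum jbd2 uses, and `commit_block`, `write_commit()` and `transaction_replayable()` are hypothetical names. The sketch only shows why a commit block carrying a checksum of the journal blocks lets recovery reject a transaction whose blocks did not all reach the disk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy Adler-32-style checksum; NOT the checksum jbd2 actually uses. */
static uint32_t toy_checksum(const uint8_t *data, size_t len)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* Hypothetical commit block holding the checksum of the journal blocks. */
struct commit_block {
    uint32_t checksum;
};

/* Record the checksum of the transaction's blocks in the commit block. */
static void write_commit(struct commit_block *c,
                         const uint8_t *blocks, size_t len)
{
    assert(c != NULL);
    c->checksum = toy_checksum(blocks, len);
}

/* At recovery time, replay only if what is on disk matches the checksum. */
static int transaction_replayable(const struct commit_block *c,
                                  const uint8_t *blocks, size_t len)
{
    return toy_checksum(blocks, len) == c->checksum;
}
```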
  2. 10 Sep, 2009 5 commits
  3. 12 Sep, 2009 1 commit
  4. 10 Sep, 2009 1 commit
    • ext4: Use bforget() in no journal mode for ext4_journal_{forget,revoke}() · c7acb4c1
      Theodore Ts'o authored
      When ext4 is using a journal, a metadata block which is deallocated
      must be passed into the journal layer so it can be dropped from the
      current transaction and/or revoked.  This is done by calling the
      functions ext4_journal_forget() and ext4_journal_revoke(), which call
      jbd2_journal_forget(), and jbd2_journal_revoke(), respectively.
      
      Since jbd2_journal_forget() and jbd2_journal_revoke() call
      bforget(), if ext4 is not using a journal, ext4_journal_forget() and
      ext4_journal_revoke() must call bforget() to avoid a dirty metadata
      block overwriting a block after it has been reallocated and reused for
      another inode's data block.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  5. 08 Sep, 2009 1 commit
  6. 10 Sep, 2009 1 commit
  7. 06 Sep, 2009 3 commits
  8. 16 Sep, 2009 1 commit
  9. 06 Sep, 2009 1 commit
  10. 05 Sep, 2009 1 commit
  11. 17 Sep, 2009 1 commit
    • ext4: fix tracepoint format string warnings · a3710fd1
      Theodore Ts'o authored
      Unlike on some other architectures, ino_t is an unsigned int on s390.
      So add an explicit cast to avoid lots of compile warnings:
      
      In file included from include/trace/ftrace.h:285,
                       from include/trace/define_trace.h:61,
                       from include/trace/events/ext4.h:711,
                       from fs/ext4/super.c:50:
      include/trace/events/ext4.h: In function 'ftrace_raw_output_ext4_free_inode':
      include/trace/events/ext4.h:12: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
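The same pattern in userspace C, as a sketch: `format_ino()` is a hypothetical helper, and the cast to unsigned long is what keeps the '%lu' conversion correct regardless of how wide ino_t is on a given platform.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: ino_t has no portable printf conversion, so cast
 * it to the type that "%lu" expects before formatting. */
static int format_ino(char *buf, size_t len, ino_t ino)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "ino %lu", (unsigned long)ino);
}
```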
  12. 05 Sep, 2009 1 commit
  13. 01 Sep, 2009 1 commit
  14. 31 Aug, 2009 1 commit
    • ext4: Restore wbc->range_start in ext4_da_writepages() · de89de6e
      Theodore Ts'o authored
      To solve a lock inversion problem, we implement part of the
      range_cyclic algorithm in ext4_da_writepages().  (See commit 2acf2c26
      for more details.)
      
      As part of that change wbc->range_start was modified by ext4's
      writepages function, which causes its callers to get confused since
      they aren't expecting the filesystem to modify it.  The simplest fix
      is to save and restore wbc->range_start in ext4_da_writepages.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  15. 17 Sep, 2009 1 commit
  16. 30 Aug, 2009 1 commit
  17. 29 Aug, 2009 1 commit
  18. 28 Aug, 2009 1 commit
    • ext4: fix extent sanity checking code with AGGRESSIVE_TEST · 55ad63bf
      Theodore Ts'o authored
      The extents sanity-checking code depends on the ext4_ext_space_*()
      functions returning the maximum allowable size for eh_max; however,
      when the debugging #ifdef AGGRESSIVE_TEST is enabled to test the
      extent tree handling code, those functions return smaller values,
      which prevents a normally created ext4 filesystem from being mounted,
      with the errors:
      
      Aug 26 15:43:50 bsd086 kernel: [   96.070277] EXT4-fs error (device sda8): ext4_ext_check_inode: bad header/extent in inode #8: too large eh_max - magic f30a, entries 1, max 4(3), depth 0(0)
      Aug 26 15:43:50 bsd086 kernel: [   96.070526] EXT4-fs (sda8): no journal found
      
      Bug reported by Akira Fujita.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  19. 26 Aug, 2009 3 commits
    • ext4: use ext4_grpblk_t more extensively · a36b4498
      Eric Sandeen authored
      unsigned short is potentially too small to track blocks within
      a group; today it is safe due to restrictions in e2fsprogs but
      we have _lo / _hi bits for group blocks with the intent to go
      up to 32 bits, so clean this up now.
      
      There are many more places where we use unsigned/int/unsigned int
      to contain a group block but this should at least fix all the
      short types.
      
      I added a few comments to the struct ext4_group_info definition
      as well.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
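Why a short is risky here, as a small sketch: `grpblk_sketch_t` is a hypothetical stand-in for a wider group-block type, and the helper names are made up. With a 16-bit unsigned short, a block offset above 65535 silently wraps, while the wider type keeps the value intact.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for a group-relative block type that is wide
 * enough for block offsets beyond 16 bits. */
typedef int grpblk_sketch_t;

/* Storing an offset in an unsigned short reduces it modulo USHRT_MAX + 1. */
static unsigned short as_short(unsigned int block_in_group)
{
    return (unsigned short)block_in_group;
}

/* Storing the same offset in the wider type preserves it. */
static grpblk_sketch_t as_grpblk(unsigned int block_in_group)
{
    assert(block_in_group <= (unsigned int)INT_MAX);
    return (grpblk_sketch_t)block_in_group;
}
```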
    • ext4: use variables not types in sizeofs() for allocations · 1927805e
      Eric Sandeen authored
      Precursor to changing some types; to keep things in sync, it
      seems better to allocate/memset based on the size of the
      variables we are using rather than on some disconnected
      basic type like "unsigned short".
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
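A userspace sketch of the same habit, with a hypothetical struct and helper: sizing the allocation from the variable (`sizeof(*g->counters)`) means the call stays correct if the element type is later changed, whereas `sizeof(unsigned short)` would silently go stale.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-group bookkeeping; the element type of `counters`
 * may change later without touching the allocation below. */
struct group_info_sketch {
    unsigned short *counters;
    size_t n;
};

/* Allocate n zeroed counters, sized from the variable, not from a type. */
static int alloc_counters(struct group_info_sketch *g, size_t n)
{
    assert(g != NULL);
    g->counters = calloc(n, sizeof(*g->counters));
    if (g->counters == NULL)
        return -1;
    g->n = n;
    return 0;
}
```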
    • ext4: Add missing unlock_new_inode() call in extent migration code · a8526e84
      Aneesh Kumar K.V authored
      We need to unlock the new inode before iput.  This patch fixes the
      following warning when calling chattr +e to migrate a file to use
      extents.  It also fixes problems when e4defrag attempts to
      defragment an inode.
      
      [  470.400044] ------------[ cut here ]------------
      [  470.400065] WARNING: at fs/inode.c:1210 generic_delete_inode+0x65/0x16a()
      [  470.400072] Hardware name: N/A
      .....
      ...
      [  470.400353] Pid: 4451, comm: chattr Not tainted 2.6.31-rc7-red-debug #4
      [  470.400359] Call Trace:
      [  470.400372]  [<ffffffff81037771>] warn_slowpath_common+0x77/0x8f
      [  470.400385]  [<ffffffff81037798>] warn_slowpath_null+0xf/0x11
      [  470.400395]  [<ffffffff810b7f28>] generic_delete_inode+0x65/0x16a
      [  470.400405]  [<ffffffff810b8044>] generic_drop_inode+0x17/0x1bd
      [  470.400413]  [<ffffffff810b7083>] iput+0x61/0x65
      [  470.400455]  [<ffffffffa003b229>] ext4_ext_migrate+0x5eb/0x66a [ext4]
      [  470.400492]  [<ffffffffa002b1f8>] ext4_ioctl+0x340/0x756 [ext4]
      [  470.400507]  [<ffffffff810b1a91>] vfs_ioctl+0x1d/0x82
      [  470.400517]  [<ffffffff810b1ff0>] do_vfs_ioctl+0x483/0x4c9
      [  470.400527]  [<ffffffff81059c30>] ? trace_hardirqs_on+0xd/0xf
      [  470.400537]  [<ffffffff810b2087>] sys_ioctl+0x51/0x74
      [  470.400549]  [<ffffffff8100ba6b>] system_call_fastpath+0x16/0x1b
      [  470.400557] ---[ end trace ab85723542352dac ]---
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  20. 18 Aug, 2009 7 commits
    • ext4: Add feature set check helper for mount & remount paths · a13fb1a4
      Eric Sandeen authored
      A user reported that although his root ext4 filesystem was mounting
      fine, other filesystems would not mount, with the:
      
      "Filesystem with huge files cannot be mounted RDWR without CONFIG_LBDAF"
      
      error on his 32-bit box built without CONFIG_LBDAF.  This is because
      the test at mount time for this situation was not being re-checked
      on remount, and the normal boot process makes an ro->rw transition,
      so this was being missed.
      
      Refactor to make a common helper function to test the filesystem
      features against the type of mount request (RO vs. RW) so that we 
      stay consistent.
      
      Addresses Red-Hat-Bugzilla: #517650
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • simplify some logic in ext4_mb_normalize_request · 38877f4e
      Eric Sandeen authored
      While reading through some of the mballoc code it seems that a couple
      spots in the size normalization function could be streamlined.
      
      The test for non-overlapping PAs can be or'd for the start & end
      conditions, and the tests for adjacent PAs can be else-if'd - 
      it's essentially independently testing:
      
      	if (A + B <= C)
      		...
      	if (A > C)
      		...
      
      These cannot both be true so it seems like the else-if might
      be slightly more efficient and/or informative.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
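The else-if form can be written out as a sketch. The meaning assigned to A, B and C below (range start, range length, target position) is an assumption, and `classify()` and the enum are hypothetical names. For a non-negative length the two tests are mutually exclusive, so the second only needs to run when the first fails.

```c
#include <assert.h>

/* Hypothetical result names; a is a range start, b its length and c a
 * target position (an assumed reading of A, B and C). */
enum range_position {
    RANGE_ENDS_AT_OR_BEFORE,  /* a + b <= c */
    RANGE_STARTS_AFTER,       /* a > c */
    RANGE_CONTAINS            /* a <= c < a + b */
};

static enum range_position classify(unsigned long a, unsigned long b,
                                    unsigned long c)
{
    assert(a + b >= a);  /* no wraparound, so a + b <= c implies a <= c */

    if (a + b <= c)
        return RANGE_ENDS_AT_OR_BEFORE;
    else if (a > c)      /* cannot also hold when the first test is true */
        return RANGE_STARTS_AFTER;

    return RANGE_CONTAINS;
}
```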
    • ext4: open-code ext4_mb_update_group_info · 0373130d
      Eric Sandeen authored
      ext4_mb_update_group_info is only called in one place, and it's
      extremely simple.  There's no reason to have it in a separate function
      in a separate file as far as I can tell, it just obfuscates what's
      really going on.
      
      Perhaps it was intended to keep the grp->bb_* manipulation local to
      mballoc.c but we're already accessing other grp-> fields in balloc.c
      directly so this seems ok.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: reject too-large filesystems on 32-bit kernels · bf43d84b
      Eric Sandeen authored
      ext4 will happily mount a > 16T filesystem on a 32-bit box, but
      this is not safe; writes to the block device will wrap past 16T
      and the page cache can't index past 16T (2^32 index * 4k pages).
      
      Adding another test to the existing "too many sectors" test
      should do the trick.
      
      Add a comment, a relevant return value, and fix the reference
      to the CONFIG_LBD(AF) option as well.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
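The 16T figure follows from 2^32 page indexes times 4096-byte pages, which is 2^44 bytes. A sketch of the arithmetic and of a size check built on it; the function names are hypothetical and this is not the test the kernel performs at mount time.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12  /* 4k pages, as in the commit message */

/* Bytes addressable with a 32-bit page index: 2^32 pages * 4096 bytes. */
static uint64_t max_cache_bytes_32bit(void)
{
    return ((uint64_t)1 << 32) << SKETCH_PAGE_SHIFT;
}

/* Hypothetical check: does a filesystem of blocks_count blocks, each of
 * 2^blocksize_bits bytes, extend past what a 32-bit index can reach? */
static int fs_too_large_for_32bit(uint64_t blocks_count, unsigned blocksize_bits)
{
    assert(blocksize_bits >= 10 && blocksize_bits <= 16);
    /* Compare block counts rather than byte counts to avoid overflow. */
    return blocks_count > (max_cache_bytes_32bit() >> blocksize_bits);
}
```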
    • jbd2: bitfields should be unsigned · 0ccff1a4
      H Hartley Sweeten authored
      This fixes sparse noise:
        error: dubious one-bit signed bitfield
      Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Jan Kara <jack@ucw.cz>
    • ext4: Fix possible deadlock between ext4_truncate() and ext4_get_blocks() · 487caeef
      Jan Kara authored
      During truncate we are sometimes forced to start a new transaction as
      the amount of blocks to be journaled is both quite large and hard to
      predict. So far we restarted a transaction while holding i_data_sem
      and that violates lock ordering because i_data_sem ranks below a
      transaction start (and it can lead to a real deadlock with
      ext4_get_blocks() mapping blocks in some page while having a
      transaction open).
      
      We fix the problem by dropping the i_data_sem before restarting the
      transaction and reacquiring it afterwards. It's slightly subtle that this
      works:
      
      1) By the time ext4_truncate() is called, all the page cache for the
      truncated part of the file is dropped so get_block() should not be
      called on it (we only have to invalidate extent cache after we
      reacquire i_data_sem because some extent from not-truncated part could
      extend also into the part we are going to truncate).
      
      2) Writes, migrate or defrag hold i_mutex so they are stopped for all
      the time of the truncate.
      
      This bug has been found and analyzed by Theodore Tso <tytso@mit.edu>.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: Annotate transaction start also for jbd2_journal_restart() · 9599b0e5
      Jan Kara authored
      lockdep annotation for a transaction start has been at the end of
      jbd2_journal_start(). But a transaction is also started from
      jbd2_journal_restart(). Move the lockdep annotation to start_this_handle()
      which covers both cases.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  21. 18 Sep, 2009 1 commit
  22. 01 Sep, 2009 1 commit
    • ext4: Compile warning fix when EXT_DEBUG enabled · 84fe3bef
      Mingming authored
      When EXT_DEBUG is enabled I received the following compile warning on
      PPC64:
      
        CC [M]  fs/ext4/inode.o
        CC [M]  fs/ext4/extents.o
      fs/ext4/extents.c: In function ‘ext4_ext_rm_leaf’:
      fs/ext4/extents.c:2097: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 2 has type ‘ext4_lblk_t’
      fs/ext4/extents.c: In function ‘ext4_ext_get_blocks’:
      fs/ext4/extents.c:2789: warning: format ‘%u’ expects type ‘unsigned int’, but argument 4 has type ‘long unsigned int’
      fs/ext4/extents.c:2852: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 3 has type ‘ext4_lblk_t’
      fs/ext4/extents.c:2953: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 4 has type ‘unsigned int’
        CC [M]  fs/ext4/migrate.o
      
      This patch fixes the compile warnings.
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  23. 18 Sep, 2009 1 commit
    • ext4: Avoid group preallocation for closed files · 50797481
      Theodore Ts'o authored
      Currently the group preallocation code tries to find a large (512 block)
      free chunk from which to do per-cpu group allocation for small files.
      The problem with this scheme is that it leaves the filesystem horribly
      fragmented.  In the worst case, if the filesystem is unmounted and
      remounted (after a system shutdown, for example) we forget the fact
      that we were using a particular (now-partially filled) 512 block
      extent.  So the next time we try to allocate space for a small file,
      we will find *another* completely free 512 block chunk to allocate
      small files.  Given that there are 32,768 blocks in a block group,
      after 64 iterations of "mount, write one 4k file in a directory,
      unmount", the block group will have 64 files, each separated by 511
      blocks, and the block group will no longer have any completely
      free 512 block chunks for group preallocation space.
      
      So if we try to allocate blocks for a file that has been closed, such
      that we know the final size of the file, and the filesystem is not
      busy, avoid using group preallocation.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
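The worst case described above is easy to check with a toy model. The function below is a hypothetical simulation, not allocator code: each "mount, write one 4k file, unmount" cycle forgets the previously used chunk and takes one block from the first completely free 512-block chunk. With 32,768 blocks per group there are 32768 / 512 = 64 chunks, so 64 cycles leave none completely free.

```c
#include <assert.h>

#define BLOCKS_PER_GROUP 32768
#define CHUNK_BLOCKS     512
#define CHUNKS_PER_GROUP (BLOCKS_PER_GROUP / CHUNK_BLOCKS)

/* Toy simulation: returns how many completely free chunks remain after
 * `cycles` rounds of "mount, write one 4k file, unmount". */
static int free_chunks_after(int cycles)
{
    int used[CHUNKS_PER_GROUP] = {0};
    int free_chunks = 0;

    assert(cycles >= 0);
    for (int i = 0; i < cycles; i++) {
        /* The previous chunk was forgotten at unmount, so each cycle
         * dirties the first chunk that is still completely free. */
        for (int c = 0; c < CHUNKS_PER_GROUP; c++) {
            if (!used[c]) {
                used[c] = 1;
                break;
            }
        }
    }
    for (int c = 0; c < CHUNKS_PER_GROUP; c++)
        if (!used[c])
            free_chunks++;
    return free_chunks;
}
```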
  24. 10 Aug, 2009 1 commit
    • ext4: Fix bugs in mballoc's stream allocation mode · 4ba74d00
      Theodore Ts'o authored
      The logic around sbi->s_mb_last_group and sbi->s_mb_last_start was all
      screwed up.  These fields were being set unconditionally, even when
      stream allocation had not taken place, and they were being used even
      when the file was smaller than s_mb_stream_request, which is when the
      allocator should _not_ be doing stream allocation.
      
      Fix this by determining once whether or not stream allocation should
      take place, in ext4_mb_group_or_file(), and setting a flag which
      gets used in ext4_mb_regular_allocator() and ext4_mb_use_best_found().
      This simplifies the code and assures that we are consistently using
      (or not using) the stream allocation logic.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  25. 09 Aug, 2009 1 commit