1. 03 May, 2015 3 commits
    • Jan Kara's avatar
      ext4: fix growing of tiny filesystems · 2c869b26
      Jan Kara authored
      The estimate of necessary transaction credits in ext4_flex_group_add()
      is too pessimistic. It reserves credit for sb, resize inode, and resize
      inode dindirect block for each group added in a flex group although they
      are always the same block and thus it is enough to account them only
      once. Also the number of modified GDT block is overestimated since we
      fit EXT4_DESC_PER_BLOCK(sb) descriptors in one block.
      
      Make the estimation more precise. That reduces number of requested
      credits enough that we can grow 20 MB filesystem (which has 1 MB
      journal, 79 reserved GDT blocks, and flex group size 16 by default).
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarEric Sandeen <sandeen@redhat.com>
      2c869b26
    • Davide Italiano's avatar
      ext4: move check under lock scope to close a race. · 280227a7
      Davide Italiano authored
      fallocate() checks that the file is extent-based and returns
      EOPNOTSUPP in case is not. Other tasks can convert from and to
      indirect and extent so it's safe to check only after grabbing
      the inode mutex.
      Signed-off-by: default avatarDavide Italiano <dccitaliano@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      280227a7
    • Lukas Czerner's avatar
      ext4: fix data corruption caused by unwritten and delayed extents · d2dc317d
      Lukas Czerner authored
      Currently it is possible to lose whole file system block worth of data
      when we hit the specific interaction with unwritten and delayed extents
      in status extent tree.
      
      The problem is that when we insert delayed extent into extent status
      tree the only way to get rid of it is when we write out delayed buffer.
      However there is a limitation in the extent status tree implementation
      so that when inserting unwritten extent should there be even a single
      delayed block the whole unwritten extent would be marked as delayed.
      
      At this point, there is no way to get rid of the delayed extents,
      because there are no delayed buffers to write out. So when a we write
      into said unwritten extent we will convert it to written, but it still
      remains delayed.
      
      When we try to write into that block later ext4_da_map_blocks() will set
      the buffer new and delayed and map it to invalid block which causes
      the rest of the block to be zeroed loosing already written data.
      
      For now we can fix this by simply not allowing to set delayed status on
      written extent in the extent status tree. Also add WARN_ON() to make
      sure that we notice if this happens in the future.
      
      This problem can be easily reproduced by running the following xfs_io.
      
      xfs_io -f -c "pwrite -S 0xaa 4096 2048" \
                -c "falloc 0 131072" \
                -c "pwrite -S 0xbb 65536 2048" \
                -c "fsync" /mnt/test/fff
      
      echo 3 > /proc/sys/vm/drop_caches
      xfs_io -c "pwrite -S 0xdd 67584 2048" /mnt/test/fff
      
      This can be theoretically also reproduced by at random by running fsx,
      but it's not very reliable, though on machines with bigger page size
      (like ppc) this can be seen more often (especially xfstest generic/127)
      Signed-off-by: default avatarLukas Czerner <lczerner@redhat.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      d2dc317d
  2. 02 May, 2015 2 commits
  3. 01 May, 2015 2 commits
    • Theodore Ts'o's avatar
      ext4 crypto: add padding to filenames before encrypting · a44cd7a0
      Theodore Ts'o authored
      This obscures the length of the filenames, to decrease the amount of
      information leakage.  By default, we pad the filenames to the next 4
      byte boundaries.  This costs nothing, since the directory entries are
      aligned to 4 byte boundaries anyway.  Filenames can also be padded to
      8, 16, or 32 bytes, which will consume more directory space.
      
      Change-Id: Ibb7a0fb76d2c48e2061240a709358ff40b14f322
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      a44cd7a0
    • Theodore Ts'o's avatar
      ext4 crypto: simplify and speed up filename encryption · 5de0b4d0
      Theodore Ts'o authored
      Avoid using SHA-1 when calculating the user-visible filename when the
      encryption key is available, and avoid decrypting lots of filenames
      when searching for a directory entry in a directory block.
      
      Change-Id: If4655f144784978ba0305b597bfa1c8d7bb69e63
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      5de0b4d0
  4. 16 Apr, 2015 2 commits
  5. 12 Apr, 2015 12 commits
  6. 11 Apr, 2015 5 commits
  7. 08 Apr, 2015 1 commit
  8. 03 Apr, 2015 8 commits
    • Lukas Czerner's avatar
      ext4: make fsync to sync parent dir in no-journal for real this time · e12fb972
      Lukas Czerner authored
      Previously commit 14ece102 added a
      support for for syncing parent directory of newly created inodes to
      make sure that the inode is not lost after a power failure in
      no-journal mode.
      
      However this does not work in majority of cases, namely:
       - if the directory has inline data
       - if the directory is already indexed
       - if the directory already has at least one block and:
      	- the new entry fits into it
      	- or we've successfully converted it to indexed
      
      So in those cases we might lose the inode entirely even after fsync in
      the no-journal mode. This also includes ext2 default mode obviously.
      
      I've noticed this while running xfstest generic/321 and even though the
      test should fail (we need to run fsck after a crash in no-journal mode)
      I could not find a newly created entries even when if it was fsynced
      before.
      
      Fix this by adjusting the ext4_add_entry() successful exit paths to set
      the inode EXT4_STATE_NEWENTRY so that fsync has the chance to fsync the
      parent directory as well.
      Signed-off-by: default avatarLukas Czerner <lczerner@redhat.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: Frank Mayhar <fmayhar@google.com>
      Cc: stable@vger.kernel.org
      e12fb972
    • Eric Whitney's avatar
      ext4: don't release reserved space for previously allocated cluster · 9d21c9fa
      Eric Whitney authored
      When xfstests' auto group is run on a bigalloc filesystem with a
      4.0-rc3 kernel, e2fsck failures and kernel warnings occur for some
      tests. e2fsck reports incorrect iblocks values, and the warnings
      indicate that the space reserved for delayed allocation is being
      overdrawn at allocation time.
      
      Some of these errors occur because the reserved space is incorrectly
      decreased by one cluster when ext4_ext_map_blocks satisfies an
      allocation request by mapping an unused portion of a previously
      allocated cluster.  Because a cluster's worth of reserved space was
      already released when it was first allocated, it should not be released
      again.
      
      This patch appears to correct the e2fsck failure reported for
      generic/232 and the kernel warnings produced by ext4/001, generic/009,
      and generic/033.  Failures and warnings for some other tests remain to
      be addressed.
      Signed-off-by: default avatarEric Whitney <enwlinux@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      9d21c9fa
    • Eric Whitney's avatar
      ext4: fix loss of delalloc extent info in ext4_zero_range() · 94426f4b
      Eric Whitney authored
      In ext4_zero_range(), removing a file's entire block range from the
      extent status tree removes all records of that file's delalloc extents.
      The delalloc accounting code uses this information, and its loss can
      then lead to accounting errors and kernel warnings at writeback time and
      subsequent file system damage.  This is most noticeable on bigalloc
      file systems where code in ext4_ext_map_blocks() handles cases where
      delalloc extents share clusters with a newly allocated extent.
      
      Because we're not deleting a block range and are correctly updating the
      status of its associated extent, there is no need to remove anything
      from the extent status tree.
      
      When this patch is combined with an unrelated bug fix for
      ext4_zero_range(), kernel warnings and e2fsck errors reported during
      xfstests runs on bigalloc filesystems are greatly reduced without
      introducing regressions on other xfstests-bld test scenarios.
      Signed-off-by: default avatarEric Whitney <enwlinux@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      94426f4b
    • Lukas Czerner's avatar
      ext4: allocate entire range in zero range · 0f2af21a
      Lukas Czerner authored
      Currently there is a bug in zero range code which causes zero range
      calls to only allocate block aligned portion of the range, while
      ignoring the rest in some cases.
      
      In some cases, namely if the end of the range is past i_size, we do
      attempt to preallocate the last nonaligned block. However this might
      cause kernel to BUG() in some carefully designed zero range requests
      on setups where page size > block size.
      
      Fix this problem by first preallocating the entire range, including
      the nonaligned edges and converting the written extents to unwritten
      in the next step. This approach will also give us the advantage of
      having the range to be as linearly contiguous as possible.
      Signed-off-by: default avatarLukas Czerner <lczerner@redhat.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      0f2af21a
    • Maurizio Lombardi's avatar
      5a4f3145
    • Christoph Hellwig's avatar
      ext4: remove block_device_ejected · 08439fec
      Christoph Hellwig authored
      bdi->dev now never goes away, so this function became useless.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      08439fec
    • Wei Yuan's avatar
      ext4: remove useless condition in if statement. · 5f80f62a
      Wei Yuan authored
      In this if statement, the previous condition is useless, the later one
      has covered it.
      Signed-off-by: default avatarWeiyuan <weiyuan.wei@huawei.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarLukas Czerner <lczerner@redhat.com>
      5f80f62a
    • Sheng Yong's avatar
      ext4: remove unused header files · 72b8e0f9
      Sheng Yong authored
      Remove unused header files and header files which are included in
      ext4.h.
      Signed-off-by: default avatarSheng Yong <shengyong1@huawei.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      72b8e0f9
  9. 02 Apr, 2015 3 commits
  10. 17 Mar, 2015 2 commits
    • Theodore Ts'o's avatar
      fs: add dirtytime_expire_seconds sysctl · 1efff914
      Theodore Ts'o authored
      Add a tuning knob so we can adjust the dirtytime expiration timeout,
      which is very useful for testing lazytime.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      1efff914
    • Theodore Ts'o's avatar
      fs: make sure the timestamps for lazytime inodes eventually get written · a2f48706
      Theodore Ts'o authored
      Jan Kara pointed out that if there is an inode which is constantly
      getting dirtied with I_DIRTY_PAGES, an inode with an updated timestamp
      will never be written since inode->dirtied_when is constantly getting
      updated.  We fix this by adding an extra field to the inode,
      dirtied_time_when, so inodes with a stale dirtytime can get detected
      and handled.
      
      In addition, if we have a dirtytime inode caused by an atime update,
      and there is no write activity on the file system, we need to have a
      secondary system to make sure these inodes get written out.  We do
      this by setting up a second delayed work structure which wakes up the
      CPU much more rarely compared to writeback_expire_centisecs.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      a2f48706