Commits · 0a2179b169089f871e071c74316371ed43e6c8eb · Kirill Smelkov / linux

11 Jan, 2011 1 commit

ext4: revert buggy trim overflow patch · 0a2179b1

Theodore Ts'o authored Jan 11, 2011

This reverts commit 4f531501: ext4: fix possible overflow in
ext4_trim_fs()
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0a2179b1

10 Jan, 2011 24 commits

ext4: don't pass entire map to check_eofblocks_fl · d002ebf1

Eric Sandeen authored Jan 10, 2011

Since check_eofblocks_fl() only uses the m_lblk portion of the map
structure, we may as well pass that directly, rather than passing the
entire map, which IMHO obfuscates what parameters check_eofblocks_fl()
cares about.  Not a big deal, but seems tidier and less confusing, to
me.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

d002ebf1

ext4: fix memory leak in ext4_free_branches · 1c5b9e90

Theodore Ts'o authored Jan 10, 2011

Commit 40389687 moved a call to ext4_forget() out of
ext4_free_branches and let ext4_free_blocks() handle calling
bforget().  But that change unfortunately did not replace the call to
ext4_forget() with brelse(), which was needed to drop the in-use count
of the indirect block's buffer head, which led to a memory leak when
deleting files that used indirect blocks.  Fix this.

Thanks to Hugh Dickins for pointing this out.

Cc: stable@kernel.org
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

1c5b9e90

ext4: remove ext4_mb_return_to_preallocation() · a5196f8c

Theodore Ts'o authored Jan 10, 2011

This function was never implemented, except for a BUG_ON which was
tripping when ext4 is run without a journal.  The problem is that
although the comment asserts that "truncate (which is the only way to
free block) discards all preallocations", ext4_free_blocks() is also
called in various error recovery paths when blocks have been
allocated, but for various reasons, we were not able to use those data
blocks (for example, because we ran out of memory while trying to
manipulate the extent tree, or some other similar situation).

In addition to the fact that this function isn't implemented except
for the incorrect BUG_ON, the single caller of this function,
ext4_free_blocks(), doesn't use it at all if the journal is enabled.

So remove the (stub) function entirely for now.  If we decide it's
better to add it back, it's only going to be useful with a relatively
large number of code changes anyway.

Google-Bug-Id: 3236408

Cc: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

a5196f8c

ext4: flush the i_completed_io_list during ext4_truncate · 3889fd57

Jiaying Zhang authored Jan 10, 2011

Ted first found the bug when running a 2.6.36 kernel with the dioread_nolock
mount option: xfstests #13 complained about a wrong file size during fsck.
However, the bug exists in older kernels as well, although it is
somewhat harder to trigger.

The problem is that ext4_end_io_work() can happen after we have truncated an
inode to a smaller size. Then when ext4_end_io_work() calls
ext4_convert_unwritten_extents(), we may reallocate some blocks that have
been truncated, so the inode size becomes inconsistent with the allocated
blocks.

The following patch flushes the i_completed_io_list during truncate to reduce
the risk that some pending end_io requests are executed later and convert
already truncated blocks to initialized.

Note that although the fix reduces the problem considerably, there may still
be a race window between vmtruncate() and ext4_end_io_work(). The fundamental
problem is that if vmtruncate() is called without either i_mutex or i_alloc_sem
held, it can race with an ongoing write request so that the io_end request is
processed later when the corresponding blocks have been truncated.

Ted and I have discussed the problem offline and we saw a few ways to fix
the race completely:

a) We guarantee that i_mutex lock and i_alloc_sem write lock are both held
whenever vmtruncate() is called. The i_mutex lock prevents any new write
requests from entering writeback and the i_alloc_sem prevents the race
from ext4_page_mkwrite(). Currently we hold both locks if vmtruncate()
is called from do_truncate(), which is probably the most common case.
However, there are places where we may call vmtruncate() without holding
either i_mutex or i_alloc_sem. I would like to ask for other people's
opinions on what locks are expected to be held before calling vmtruncate().
There seems to be disagreement among the callers of that function.

b) We change the ext4 write path so that we change the extent tree to contain
the newly allocated blocks and update i_size both at the same time --- when
the write of the data blocks is completed.

c) We add some additional locking to synchronize vmtruncate() and
ext4_end_io_work(). This approach may have performance implications so we
need to be careful.

All of the above proposals may require more substantial changes, so
we may consider taking the following patch as a band-aid.
Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

3889fd57

ext4: add error checking to calls to ext4_handle_dirty_metadata() · b4097142

Theodore Ts'o authored Jan 10, 2011

Call ext4_std_error() in various places when we can't bail out
cleanly, so the file system can be marked as in error.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b4097142

ext4: fix trimming of a single group · ca6e909f

Jan Kara authored Jan 10, 2011

When ext4_trim_fs() is called to trim a part of a single group, the
logic will wrongly set last block of the interval to 'len' instead
of 'first_block + len'. Thus a shorter interval is possibly trimmed.
Fix it.

CC: Lukas Czerner <lczerner@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ca6e909f
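
The interval arithmetic described in ca6e909f can be illustrated with a small userspace sketch. This is not the kernel code: the helper names, the exclusive-end convention, and the clamp to the end of the group are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from fs/ext4): returns the exclusive end of
 * the interval to trim inside a single block group.
 */
static inline uint32_t trim_interval_end(uint32_t first_block, uint32_t len,
                                         uint32_t blocks_per_group)
{
	assert(first_block < blocks_per_group);

	/* Never run past the end of the group. */
	if (len >= blocks_per_group - first_block)
		return blocks_per_group;

	/* The end is relative to first_block, not to block 0. */
	return first_block + len;
}

/* The mistake described in the commit message: the offset is dropped. */
static inline uint32_t trim_interval_end_buggy(uint32_t first_block,
                                               uint32_t len)
{
	(void)first_block;
	return len;
}
```

With first_block = 1000 and len = 500, the buggy variant ends the interval at 500, which lies before the start, while the corrected arithmetic ends it at 1500.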

ext4: fix uninitialized variable in ext4_register_li_request · 6c5a6cb9

Andrew Morton authored Jan 10, 2011

fs/ext4/super.c: In function 'ext4_register_li_request':
fs/ext4/super.c:2936: warning: 'ret' may be used uninitialized in this function

It looks buggy to me, too.

Cc: Lukas Czerner <lczerner@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

6c5a6cb9

ext4: dynamically allocate the jbd2_inode in ext4_inode_info as necessary · 8aefcd55

Theodore Ts'o authored Jan 10, 2011

Replace the jbd2_inode structure (which is 48 bytes) with a pointer
and only allocate the jbd2_inode when it is needed --- that is, when
the file system has a journal present and the inode has been opened
for writing.  This allows us to further slim down the ext4_inode_info
structure.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

8aefcd55
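
The idea behind 8aefcd55, embedding a pointer and allocating the structure only on first use, might look roughly like the following userspace sketch. All type and function names here are hypothetical stand-ins, and calloc() takes the place of whatever kernel allocator the real patch uses.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Stand-in for the journal's per-inode bookkeeping; layout is hypothetical. */
struct jinode_sketch {
	unsigned long fields[6];
};

struct lazy_inode_sketch {
	struct jinode_sketch *jinode;	/* NULL until first needed */
	int has_journal;		/* nonzero if the fs has a journal */
};

/* Allocate the journal inode on demand; a no-op when it is not needed. */
static inline int attach_jinode(struct lazy_inode_sketch *ei)
{
	assert(ei != NULL);

	if (!ei->has_journal || ei->jinode)
		return 0;

	ei->jinode = calloc(1, sizeof(*ei->jinode));
	return ei->jinode ? 0 : -ENOMEM;
}

static inline void release_jinode(struct lazy_inode_sketch *ei)
{
	assert(ei != NULL);
	free(ei->jinode);
	ei->jinode = NULL;
}
```

An inode that is never opened for writing, or that lives on a journal-less file system, then pays only for the pointer rather than for the whole embedded structure.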

ext4: drop i_state_flags on architectures with 64-bit longs · 353eb83c

Theodore Ts'o authored Jan 10, 2011

We can store the dynamic inode state flags in the high bits of
EXT4_I(inode)->i_flags, and eliminate i_state_flags.  This saves 8
bytes from the size of ext4_inode_info structure, which when
multiplied by the number of inodes in the inode cache, can save
a lot of memory.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

353eb83c
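
A minimal sketch of the bit-packing idea in 353eb83c, assuming a 64-bit flags word whose low 32 bits hold the persistent flags and whose high 32 bits hold the dynamic state. The shift value and all names are assumptions, and the plain (non-atomic) bit operations are a simplification of whatever the kernel actually does.

```c
#include <assert.h>
#include <stdint.h>

#define STATE_SHIFT_SKETCH 32	/* assumption: state lives in bits 32..63 */

struct flags_inode_sketch {
	uint64_t i_flags;	/* low half: on-disk flags, high half: state */
};

static inline void set_state(struct flags_inode_sketch *ei, unsigned int bit)
{
	assert(bit < 32);
	ei->i_flags |= (uint64_t)1 << (bit + STATE_SHIFT_SKETCH);
}

static inline void clear_state(struct flags_inode_sketch *ei, unsigned int bit)
{
	assert(bit < 32);
	ei->i_flags &= ~((uint64_t)1 << (bit + STATE_SHIFT_SKETCH));
}

static inline int test_state(const struct flags_inode_sketch *ei,
                             unsigned int bit)
{
	assert(bit < 32);
	return (ei->i_flags >> (bit + STATE_SHIFT_SKETCH)) & 1;
}

/* The persistent flags are untouched by the state helpers. */
static inline uint32_t persistent_flags(const struct flags_inode_sketch *ei)
{
	return (uint32_t)ei->i_flags;
}
```

Because the two halves never overlap, a separate state word (and the padding around it) becomes unnecessary on platforms where the flags word is 64 bits wide.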

ext4: reorder ext4_inode_info structure elements to remove unneeded padding · 8a2005d3

Theodore Ts'o authored Jan 10, 2011

By reordering the elements in the ext4_inode_info structure, we can
reduce the padding needed on an x86_64 system by 16 bytes.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

8a2005d3
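
The effect described in 8a2005d3 can be reproduced with any structure that mixes member sizes. The two structures below are invented for illustration and have nothing to do with the real ext4_inode_info layout; exact sizes depend on the ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small members interleaved with large ones force alignment padding. */
struct padded_sketch {
	char     tag;
	uint64_t big;
	char     flag;
	uint32_t count;
};

/* Same members, ordered from largest to smallest alignment. */
struct reordered_sketch {
	uint64_t big;
	uint32_t count;
	char     tag;
	char     flag;
};

/* Bytes saved purely by changing the member order. */
static inline size_t bytes_saved_by_reordering(void)
{
	assert(sizeof(struct padded_sketch) >= sizeof(struct reordered_sketch));
	return sizeof(struct padded_sketch) - sizeof(struct reordered_sketch);
}
```

On a typical x86_64 ABI the first layout occupies 24 bytes and the second 16, although the C standard does not guarantee either figure.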

ext4: drop ec_type from the ext4_ext_cache structure · b05e6ae5

Theodore Ts'o authored Jan 10, 2011

We can encode the ec_type information by using ee_len == 0 to denote
EXT4_EXT_CACHE_NO, ee_start == 0 to denote EXT4_EXT_CACHE_GAP, and if
neither is true, then the cache type must be EXT4_EXT_CACHE_EXTENT.
This allows us to reduce the size of ext4_ext_inode by another 8
bytes. (ec_type is 4 bytes, plus another 4 bytes of padding)
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b05e6ae5
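
The encoding described in b05e6ae5 can be sketched as a small decoding function. The structure and enum below are hypothetical stand-ins that reuse the ee_len and ee_start names from the commit message; the real cache structure may name and size its fields differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum cache_kind_sketch {
	CACHE_KIND_NO,		/* nothing cached */
	CACHE_KIND_GAP,		/* cached range is a hole */
	CACHE_KIND_EXTENT	/* cached range maps to real blocks */
};

struct ext_cache_sketch {
	uint64_t ee_start;	/* physical start; 0 denotes a gap */
	uint32_t ee_block;	/* first logical block covered */
	uint32_t ee_len;	/* length in blocks; 0 denotes "no cache" */
};

/* Derive the cache type from the other fields instead of storing it. */
static inline enum cache_kind_sketch
cache_kind(const struct ext_cache_sketch *c)
{
	assert(c != NULL);

	if (c->ee_len == 0)
		return CACHE_KIND_NO;
	if (c->ee_start == 0)
		return CACHE_KIND_GAP;
	return CACHE_KIND_EXTENT;
}
```

The order of the checks matters: a zero length means "no cache" regardless of what the start field holds.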

ext4: use ext4_lblk_t instead of sector_t for logical blocks · 01f49d0b

Theodore Ts'o authored Jan 10, 2011

This fixes a number of places where we used sector_t instead of
ext4_lblk_t for logical blocks, which for ext4 are still 32-bit data
types.  No point wasting space in the ext4_inode_info structure, and
requiring 64-bit arithmetic on 32-bit systems, when it isn't
necessary.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

01f49d0b

ext4: replace i_delalloc_reserved_flag with EXT4_STATE_DELALLOC_RESERVED · f2321097

Theodore Ts'o authored Jan 10, 2011

Remove the short element i_delalloc_reserved_flag from the
ext4_inode_info structure and replace it with a new bit in i_state_flags.
Since we have an ext4_inode_info for every ext4 inode cached in the
inode cache, any savings we can produce here is a very good thing from
a memory utilization perspective.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f2321097

ext4: fix 32bit overflow in ext4_ext_find_goal() · ad4fb9ca

Kazuya Mio authored Jan 10, 2011

ext4_ext_find_goal() returns an ideal physical block number that the block
allocator tries to allocate first. However, if a required file offset is
smaller than the existing extent's one, ext4_ext_find_goal() returns
a wrong block number because it may overflow at
"block - le32_to_cpu(ex->ee_block)". This patch fixes the problem.

ext4_ext_find_goal() will also return a wrong block number in case
a file offset of the existing extent is too big. In this case,
the ideal physical block number is fixed in ext4_mb_initialize_context(),
so it's no problem.

reproduce:
# dd if=/dev/zero of=/mnt/mp1/tmp bs=127M count=1 oflag=sync
# dd if=/dev/zero of=/mnt/mp1/file bs=512K count=1 seek=1 oflag=sync
# filefrag -v /mnt/mp1/file
Filesystem type is: ef53
File size of /mnt/mp1/file is 1048576 (256 blocks, blocksize 4096)
 ext logical physical expected length flags
   0     128    67456             128 eof
/mnt/mp1/file: 2 extents found
# rm -rf /mnt/mp1/tmp
# echo $((512*4096)) > /sys/fs/ext4/loop0/mb_stream_req
# dd if=/dev/zero of=/mnt/mp1/file bs=512K count=1 oflag=sync conv=notrunc

result (linux-2.6.37-rc2 + ext4 patch queue):
# filefrag -v /mnt/mp1/file
Filesystem type is: ef53
File size of /mnt/mp1/file is 1048576 (256 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0    33280             128 
   1     128    67456    33407    128 eof
/mnt/mp1/file: 2 extents found

result (with this patch applied):
# filefrag -v /mnt/mp1/file
Filesystem type is: ef53
File size of /mnt/mp1/file is 1048576 (256 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0    66560             128 
   1     128    67456    66687    128 eof
/mnt/mp1/file: 2 extents found
Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ad4fb9ca
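
The wraparound described in ad4fb9ca comes from subtracting two unsigned 32-bit logical block numbers when the requested block lies before the extent. The userspace sketch below contrasts a naive computation with one that branches on the direction; the function names are hypothetical, and the assert on the lower bound is an addition for the sketch, not something the commit message mentions.

```c
#include <assert.h>
#include <stdint.h>

/* Naive: the 32-bit difference wraps when block < ex_block. */
static inline uint64_t goal_naive(uint32_t block, uint32_t ex_block,
                                  uint64_t ex_pblk)
{
	return ex_pblk + (uint32_t)(block - ex_block);
}

/* Direction-aware: subtract when the request lies before the extent. */
static inline uint64_t goal_checked(uint32_t block, uint32_t ex_block,
                                    uint64_t ex_pblk)
{
	if (block > ex_block)
		return ex_pblk + (block - ex_block);

	/* Sketch-only precondition: the goal may not go below block 0. */
	assert(ex_pblk >= (uint64_t)(ex_block - block));
	return ex_pblk - (ex_block - block);
}
```

Using the numbers from the reproducer (existing extent at logical 128, physical 67456, request for logical 0), the naive version yields 67456 + 4294967168, far beyond the device, while the direction-aware version yields 67328. The allocator may still pick a different block than this goal, as the filefrag output in the commit message shows.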

ext4: add more error checks to ext4_mkdir() · dabd991f

Namhyung Kim authored Jan 10, 2011

Check return value of ext4_journal_get_write_access,
ext4_journal_dirty_metadata and ext4_mark_inode_dirty. Move brelse()
under 'out_stop' to release bh properly in case of journal error.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

dabd991f

ext4: ext4_ext_migrate should use NULL not 0 · f1dffc4c

Eric Paris authored Jan 10, 2011

ext4_ext_migrate() calls ext4_new_inode() and passes 0 instead of a pointer
to a struct qstr.  This patch uses NULL, to make it obvious to the caller
that this was a pointer.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f1dffc4c

ext4: Use ext4_error_file() to print the pathname to the corrupted inode · f7c21177

Theodore Ts'o authored Jan 10, 2011

Where the file pointer is available, use ext4_error_file() instead of
ext4_error_inode().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f7c21177

ext4: use IS_ERR() to check for errors in ext4_error_file · f9a62d09

Dan Carpenter authored Jan 10, 2011

d_path() returns an ERR_PTR and it doesn't return NULL.  This is in
ext4_error_file() and no one actually calls ext4_error_file().
Signed-off-by: Dan Carpenter <error27@gmail.com>

f9a62d09
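
The distinction in f9a62d09 between a NULL check and an IS_ERR() check can be shown with a userspace imitation of the kernel's error-pointer convention. The sketch_ helpers below are re-implementations written for this note, not the kernel macros themselves, and the 4095 limit is an assumption about the reserved errno range.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095	/* assumed size of the reserved range */

/* Encode a negative errno value in the top of the address space. */
static inline void *sketch_err_ptr(long err)
{
	assert(err < 0 && err >= -SKETCH_MAX_ERRNO);
	return (void *)(intptr_t)err;
}

/* True if the pointer is really an encoded errno. */
static inline int sketch_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Recover the errno value from an error pointer. */
static inline long sketch_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}
```

An error pointer is never NULL, so a caller that only tests for NULL would treat the encoded errno as a valid string and dereference it.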

ext4: test the correct variable in ext4_init_pageio() · 13195184

Dan Carpenter authored Jan 10, 2011

This is a copy and paste error.  The intent was to check
"io_end_cachep".  We tested "io_page_cachep" earlier.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

13195184

ext2: remove dead code in ext2_xattr_get · 1f605b30

Wang Sheng-Hui authored Jan 10, 2011

Reviewed-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Wang Sheng-Hui <crosslonelyover@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

1f605b30

ext2,ext3,ext4: clarify comment for extN_xattr_set_handle · 6e9510b0

Wang Sheng-Hui authored Jan 10, 2011

Signed-off-by: Wang Sheng-Hui <crosslonelyover@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

6e9510b0

ext4: clean up ext4_xattr_list()'s error code checking and return strategy · eaeef867

Theodore Ts'o authored Jan 10, 2011

Any time you see code that tries to add error codes together, you
should want to claw your eyes out...
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

eaeef867
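
Why adding return codes is a problem, as eaeef867 complains, can be seen in a two-function sketch. Both functions are hypothetical and assume the common convention that a callee returns a byte count on success and a negative errno on failure; they are not taken from ext4_xattr_list().

```c
#include <errno.h>

/* Anti-pattern: a failure in one part is blended into the other's count. */
static inline int list_total_summed(int inode_part, int block_part)
{
	return inode_part + block_part;
}

/* Check each result first; only add values known to be byte counts. */
static inline int list_total_checked(int inode_part, int block_part)
{
	if (inode_part < 0)
		return inode_part;
	if (block_part < 0)
		return block_part;
	return inode_part + block_part;
}
```

If one part fails with -ENOMEM and the other returns 20 bytes, the summed variant returns a small positive number that looks like success, while the checked variant propagates -ENOMEM.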

ext4: remove warning message from ext4_issue_discard helper · 93259636

Lukas Czerner authored Jan 10, 2011

ext4_issue_discard is supposed to be a helper for calling discard; however,
if the underlying device does not support discard, it prints a
warning message and clears the DISCARD t_mount_opt flag. Since it
can be (and is) used by others, it should do neither and should let the
caller handle the error case.

This commit removes the warning message and flag handling from
ext4_issue_discard and does them only in the place where they are really
needed (release_blocks_on_commit). The FITRIM ioctl should neither set any
flags nor print warning messages, so get rid of the warning there as well.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>

93259636

ext4: fix possible overflow in ext4_trim_fs() · 4f531501

Lukas Czerner authored Jan 10, 2011

When determining last group through ext4_get_group_no_and_offset() the
result may be wrong in cases when range->start and range->len are too
big, because it may overflow when summing up those two numbers.

Fix that by checking range->len and limiting its value to
ext4_blocks_count(). I tested this commit and got the expected
result.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>

4f531501
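
The overflow that 4f531501 set out to address is the generic start + len wraparound on 64-bit unsigned values. The sketch below shows one common way to clamp a range without ever computing the sum. It is emphatically not the code of this patch, which the revert listed under 11 Jan, 2011 calls buggy; the function name and the choice to return 0 for an out-of-range start are assumptions.

```c
#include <stdint.h>

/*
 * Clamp len so that [start, start + len) stays within [0, total),
 * without computing start + len (which could wrap around).
 */
static inline uint64_t clamp_range_len(uint64_t start, uint64_t len,
                                       uint64_t total)
{
	if (start >= total)
		return 0;
	if (len > total - start)
		return total - start;
	return len;
}
```

For start = 10 and len = UINT64_MAX, the naive sum wraps to 9, which would pass a "sum <= total" test; comparing len against total - start avoids that.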

20 Dec, 2010 7 commits

ext4: Add error checking to kmem_cache_alloc() call in ext4_free_blocks() · b72143ab

Theodore Ts'o authored Dec 20, 2010

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b72143ab

ext4: Use printf extension %pV · 0ff2ea7d

Joe Perches authored Dec 19, 2010

Using %pV reduces the number of printk calls and eliminates any
possible message interleaving from other printk calls.

In function __ext4_grp_locked_error also added KERN_CONT to some
printks.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0ff2ea7d

ext4: Use vzalloc in ext4_fill_flex_info() · 94de56ab

Joe Perches authored Dec 19, 2010

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

94de56ab

ext4: zero out nanosecond timestamps for small inodes · af0b44a1

Eric Sandeen authored Dec 19, 2010

When nanosecond timestamp resolution isn't supported on an ext4
partition (inode size = 128), stat() appears to be returning
uninitialized garbage in the nanosecond component of timestamps.

EXT4_INODE_GET_XTIME should zero out tv_nsec when EXT4_FITS_IN_INODE
evaluates to false.
Reported-by: Jordan Russell <jr-list-2010@quo.to>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

af0b44a1
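
The behaviour described in af0b44a1 amounts to an explicit else branch that clears the nanosecond field. The sketch below is a simplified function rather than the real EXT4_INODE_GET_XTIME macro; the structure, the fits_in_inode parameter, and the assumption that the nanoseconds sit above two reserved low bits of the extra field are all illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct timespec_sketch {
	int64_t tv_sec;
	long    tv_nsec;
};

/*
 * fits_in_inode is nonzero when the on-disk inode is large enough to
 * hold the extra timestamp field (it is not for 128-byte inodes).
 */
static inline void get_xtime_sketch(struct timespec_sketch *ts,
                                    int32_t raw_sec, uint32_t raw_extra,
                                    int fits_in_inode)
{
	assert(ts != NULL);

	ts->tv_sec = raw_sec;
	if (fits_in_inode)
		ts->tv_nsec = (long)(raw_extra >> 2);	/* assumed encoding */
	else
		ts->tv_nsec = 0;	/* without this, stale data leaks out */
}
```

Without the else branch, tv_nsec keeps whatever value the caller's structure happened to contain, which matches the garbage seen through stat().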

ext4: optimize ext4_check_dir_entry() with unlikely() annotations · cad3f007

Theodore Ts'o authored Dec 19, 2010

This function gets called a lot for large directories, and the answer
is almost always "no, no, there's no problem".  This means using
unlikely() is a good thing.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

cad3f007
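
The annotation used in cad3f007 is a branch-prediction hint. A userspace approximation can be built on the GCC/Clang builtin __builtin_expect; the macro name, the validation function, and its specific checks below are hypothetical and only loosely modelled on directory-entry validation.

```c
/* Requires GCC or Clang: hint that the condition is almost never true. */
#define sketch_unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Hypothetical sanity checks on a directory entry.
 * Returns 0 when the entry looks fine, a negative code otherwise.
 */
static inline int check_entry_sketch(unsigned int rec_len,
                                     unsigned int name_len,
                                     unsigned int remaining)
{
	if (sketch_unlikely(rec_len < 12))
		return -1;	/* record too small */
	if (sketch_unlikely(rec_len % 4 != 0))
		return -2;	/* record misaligned */
	if (sketch_unlikely(rec_len < name_len + 8))
		return -3;	/* record too small for its name */
	if (sketch_unlikely(rec_len > remaining))
		return -4;	/* record runs past the block */
	return 0;
}
```

The hint does not change the result of any check; it only lets the compiler lay out the common "no problem" path as the fall-through case.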

ext4: use kmem_cache_zalloc() in ext4_init_io_end() · b17b35ec

Jesper Juhl authored Dec 19, 2010

Take advantage of kmem_cache_zalloc() to remove a memset() call in
ext4_init_io_end() and save a few bytes.

Before:
 [jj@dragon linux-2.6]$ size fs/ext4/page-io.o
    text    data     bss     dec     hex filename
    3016       0     624    3640     e38 fs/ext4/page-io.o
After:
 [jj@dragon linux-2.6]$ size fs/ext4/page-io.o
    text    data     bss     dec     hex filename
    3000       0     624    3624     e28 fs/ext4/page-io.o
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b17b35ec
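
The simplification in b17b35ec has a direct userspace analogue: replacing malloc() plus memset() with calloc(). The sketch below uses that analogue rather than the kernel slab API, and the structure is an invented stand-in for the I/O completion record.

```c
#include <stdlib.h>
#include <string.h>

struct io_end_sketch {
	int    flag;
	size_t size;
	void  *inode;
};

/* Before: allocate, then clear in a separate step. */
static inline struct io_end_sketch *alloc_then_memset(void)
{
	struct io_end_sketch *io = malloc(sizeof(*io));

	if (io)
		memset(io, 0, sizeof(*io));
	return io;
}

/* After: one call that returns already-zeroed memory. */
static inline struct io_end_sketch *alloc_zeroed(void)
{
	return calloc(1, sizeof(struct io_end_sketch));
}
```

Both variants hand back identical zero-filled memory; the second simply has one call fewer at each allocation site, which is where the few bytes of text in the size output come from.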

ext4: Remove redundant unlikely() · 6ca7b13d

Tobias Klauser authored Dec 19, 2010

IS_ERR() already implies unlikely(), so it can be omitted here.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

6ca7b13d

18 Dec, 2010 5 commits

jbd2: simplify return path of journal_init_common · b7271b0a

Theodore Ts'o authored Dec 18, 2010

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b7271b0a

jbd2: move debug message into debug #ifdef · 9a4f6271

Theodore Ts'o authored Dec 18, 2010

This is a port to jbd2 of a patch which Namhyung Kim <namhyung@gmail.com>
originally made to fs/jbd.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

9a4f6271

jbd2: remove unnecessary goto statement · ae00b267

Theodore Ts'o authored Dec 18, 2010

This is a port to jbd2 of a patch which Namhyung Kim <namhyung@gmail.com>
originally made to fs/jbd.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ae00b267

jbd2: use offset_in_page() instead of manual calculation · a1dd5331

Theodore Ts'o authored Dec 18, 2010

This is a port to jbd2 of a patch which Namhyung Kim <namhyung@gmail.com>
originally made to fs/jbd.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

a1dd5331
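
What a helper like offset_in_page() computes, as used in a1dd5331, can be sketched in userspace as a mask of the low address bits. The 4096-byte page size and the sketch_ names are assumptions; the real kernel macro takes its page size from the architecture, and the exact form of the manual calculation it replaced is not shown on this page.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL			/* assumed page size */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Named helper: offset of an address within its page. */
static inline unsigned long sketch_offset_in_page(const void *p)
{
	return (unsigned long)((uintptr_t)p & ~SKETCH_PAGE_MASK);
}

/* The same value, computed by hand at the call site. */
static inline unsigned long manual_offset(const void *p)
{
	return (unsigned long)((uintptr_t)p & (SKETCH_PAGE_SIZE - 1));
}
```

The two functions always agree; the named helper just states the intent at the call site instead of repeating the mask arithmetic.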

jbd2: Fix a debug message in do_get_write_access() · cfef2c6a

Theodore Ts'o authored Dec 18, 2010

'buffer_head' should be 'journal_head'

This is a port of a patch which Namhyung Kim <namhyung@gmail.com> made
to fs/jbd to jbd2.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

cfef2c6a

17 Dec, 2010 2 commits

jbd2: Use pr_notice_ratelimited() in journal_alloc_journal_head() · 670be5a7

Theodore Ts'o authored Dec 17, 2010

We had an open-coded version of printk_ratelimited(); use the provided
abstraction to make the code cleaner and easier to understand.

Based on a similar patch for fs/jbd from Namhyung Kim <namhyung@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

670be5a7

ext4: Use pr_warning_ratelimited() instead of printk_ratelimit() · a8901d34

Theodore Ts'o authored Dec 17, 2010

printk_ratelimit() is deprecated since it is a global instead of a
per-printk ratelimit.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

a8901d34

16 Dec, 2010 1 commit

ext4: Fix up comments in inode.c · 225db7d3

Theodore Ts'o authored Dec 16, 2010

This fixes up some broken argument descriptions that Namhyung Kim had
originally submitted for ext3.  This fixes the comments that were
still applicable in ext4.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

225db7d3