Commits · 0cfc9255a1efb0467de2162950197750570ecec0 · Kirill Smelkov / linux

05 Aug, 2010 1 commit

ext4: re-inline ext4_rec_len_(to|from)_disk functions · 0cfc9255

Eric Sandeen authored Aug 05, 2010

commit 3d0518f4, "ext4: New rec_len encoding for very
large blocksizes" made several changes to this path, but from
a perf perspective, un-inlining ext4_rec_len_from_disk() seems
most significant.  This function is called from ext4_check_dir_entry(),
which on a file-creation workload is called extremely often.

I tested this with bonnie:

# bonnie++ -u root -s 0 -f -x 200 -d /mnt/test -n 32

(this does 200 iterations) and got this for the file creations:

ext4 stock:   Average =  21206.8 files/s
ext4 inlined: Average =  22346.7 files/s  (+5%)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0cfc9255
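
The helpers re-inlined by the commit above are small enough that call overhead can matter on a hot path. The following is a minimal userspace sketch of such an encode/decode pair, assuming the encoding from commit 3d0518f4 keeps bits 16-17 of the length in the two low bits (free because record lengths are multiples of 4). The names, the MAX_REC_LEN sentinel and the host-byte-order simplification are assumptions for illustration, not the kernel source.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_REC_LEN ((1u << 16) - 1)	/* assumed sentinel for 64 KiB blocks */

/* Decode a 16-bit on-disk record length (host byte order assumed). */
static inline unsigned rec_len_from_disk(uint16_t dlen, unsigned blocksize)
{
	unsigned len = dlen;

	/* Both sentinels mean "this entry spans the whole block". */
	if (len == MAX_REC_LEN || len == 0)
		return blocksize;
	/* Low two bits carry bits 16-17 of the real length. */
	return (len & 65532) | ((len & 3) << 16);
}

/* Encode a record length into 16 bits. */
static inline uint16_t rec_len_to_disk(unsigned len, unsigned blocksize)
{
	assert(len <= blocksize && blocksize <= (1u << 18) && (len & 3) == 0);
	if (len < 65536)
		return (uint16_t)len;
	if (len == blocksize)
		return (uint16_t)(blocksize == 65536 ? MAX_REC_LEN : 0);
	return (uint16_t)((len & 65532) | ((len >> 16) & 3));
}
```

Marking both helpers `static inline` lets the compiler fold them into the directory-entry check instead of emitting a call per entry.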

04 Aug, 2010 2 commits

jbd2: Remove t_handle_lock from start_this_handle() · 8dd42046

Theodore Ts'o authored Aug 03, 2010

This should remove the last exclusive lock from start_this_handle(),
so that we should now be able to start multiple transactions at the
same time on large SMP systems.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

8dd42046

jbd2: Change j_state_lock to be a rwlock_t · a931da6a

Theodore Ts'o authored Aug 03, 2010

Lockstat reports have shown that j_state_lock is a major source of
lock contention, especially on systems with more than 4 CPU cores.  So
change it to be a read/write spinlock.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

a931da6a
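
The idea behind the j_state_lock change above is that readers of shared state no longer exclude one another. A rough userspace analogue is sketched here with a POSIX pthread_rwlock_t, which is a sleeping lock and not the kernel's spinning rwlock_t; the struct and function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L	/* expose pthread rwlocks under strict ISO C */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a journal with lock-protected state. */
struct journal {
	pthread_rwlock_t state_lock;
	int flags;
};

/* Readers take the lock shared, so many can run at once. */
static int journal_read_flags(struct journal *j)
{
	int rc, flags;

	rc = pthread_rwlock_rdlock(&j->state_lock);
	assert(rc == 0);
	flags = j->flags;
	pthread_rwlock_unlock(&j->state_lock);
	return flags;
}

/* Writers still take the lock exclusively. */
static void journal_set_flags(struct journal *j, int flags)
{
	int rc = pthread_rwlock_wrlock(&j->state_lock);

	assert(rc == 0);
	j->flags |= flags;
	pthread_rwlock_unlock(&j->state_lock);
}

/* Non-blocking writer attempt; fails while any reader or writer holds the lock. */
static bool journal_try_set_flags(struct journal *j, int flags)
{
	if (pthread_rwlock_trywrlock(&j->state_lock) != 0)
		return false;
	j->flags |= flags;
	pthread_rwlock_unlock(&j->state_lock);
	return true;
}
```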

02 Aug, 2010 2 commits

jbd2: Use atomic variables to avoid taking t_handle_lock in jbd2_journal_stop · a51dca9c

Theodore Ts'o authored Aug 02, 2010

By using an atomic_t for t_updates and t_outstanding credits, this
should allow us to not need to take transaction t_handle_lock in
jbd2_journal_stop().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

a51dca9c
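
The commit above replaces lock-protected counters with atomic ones. A minimal sketch of that pattern using C11 atomics instead of the kernel's atomic_t follows; the struct, field and function names are hypothetical, and the "last handle" return value stands in for whatever wake-up the real code performs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two per-transaction counters. */
struct txn {
	atomic_int updates;		/* handles currently attached */
	atomic_int outstanding_credits;	/* buffer credits reserved */
};

/* Attach a handle that reserves `credits` buffer credits. */
static void handle_start(struct txn *t, int credits)
{
	assert(credits >= 0);
	atomic_fetch_add(&t->outstanding_credits, credits);
	atomic_fetch_add(&t->updates, 1);
}

/*
 * Detach a handle and give back its unused credits, without any lock.
 * Returns true if this was the last attached handle, i.e. the point at
 * which a waiter for "no more updates" could be woken.
 */
static bool handle_stop(struct txn *t, int unused_credits)
{
	assert(unused_credits >= 0);
	atomic_fetch_sub(&t->outstanding_credits, unused_credits);
	/* atomic_fetch_sub returns the value held before the subtraction. */
	return atomic_fetch_sub(&t->updates, 1) == 1;
}
```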

ext4: Add mount options in superblock · 8b67f04a

Theodore Ts'o authored Aug 01, 2010

Allow mount options to be stored in the superblock. Also add default
mount option bits for nobarrier, block_validity, discard, and nodelalloc.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

8b67f04a

01 Aug, 2010 2 commits

ext4: force block allocation on quota_off · ca0e05e4

Dmitry Monakhov authored Aug 01, 2010

Perform full sync procedure so that any delayed allocation blocks are
allocated so quota will be consistent.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ca0e05e4

ext4: fix freeze deadlock under IO · 437f88cc

Eric Sandeen authored Aug 01, 2010

Commit 6b0310fb caused a regression resulting in deadlocks
when freezing a filesystem which had active IO; the vfs_check_frozen
level (SB_FREEZE_WRITE) did not let the freeze-related IO syncing
through.  Duh.

Changing the test to FREEZE_TRANS should let the normal freeze
syncing get through the fs, but still block any transactions from
starting once the fs is completely frozen.

I tested this by running fsstress in the background while periodically
snapshotting the fs and running fsck on the result.  I ran into
occasional deadlocks, but different ones.  I think this is a
fine fix for the problem at hand, and the other deadlocky things
will need more investigation.
Reported-by: Phillip Susi <psusi@cfl.rr.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

437f88cc
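
The difference between the two freeze levels in the commit above can be pictured as a simple ordering check. This sketch assumes that a caller checking at a given level has to wait once the filesystem is frozen to that level or beyond; the enum and function names are hypothetical simplifications of the VFS freeze states.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed ordering of freeze states, from least to most frozen. */
enum freeze_state { UNFROZEN = 0, FREEZE_WRITE = 1, FREEZE_TRANS = 2 };

/* True if a caller that checks at `level` would block in `state`. */
static bool must_wait(enum freeze_state state, enum freeze_state level)
{
	assert(level != UNFROZEN);
	return state >= level;
}
```

Under this model, a check at FREEZE_WRITE blocks the sync IO issued while the freeze is still in progress, whereas a check at FREEZE_TRANS lets it through and only blocks once the freeze has completed.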

29 Jul, 2010 1 commit

ext4: drop inode from orphan list if ext4_delete_inode() fails · 45388219

Theodore Ts'o authored Jul 29, 2010

There were some error paths in ext4_delete_inode() which were not
dropping the inode from the orphan list.  This could lead to a BUG_ON
on umount when the orphan list is discovered to be non-empty.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

45388219

27 Jul, 2010 21 commits

ext4: check to make sure bd_dev is set before dereferencing it · f613dfcb

Theodore Ts'o authored Jul 27, 2010

There are some drivers which may not set bdev->bd_dev.  So make sure
it is non-NULL before dereferencing it.

Google-Bug-Id: 1773557
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f613dfcb

jbd2: Make barrier messages less scary · cc937db7

Eric Sandeen authored Jul 27, 2010

Saying things like "sync failed" when a device does
not support barriers makes users slightly more worried than
they need to be; rather than talking about sync failures,
let's just state the barrier-based facts.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

cc937db7

ext4: don't print scary messages for allocation failures post-abort · e3570639

Eric Sandeen authored Jul 27, 2010

I often get emails containing the "This should not happen!!" message,
conveniently trimmed to remove things like:

sd 0:0:0:0: [sda] Unhandled error code
sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 03 13 c9 70 00 00 28 00
end_request: I/O error, dev sda, sector 51628400
Aborting journal on device dm-0-8.
EXT4-fs error (device dm-0): ext4_journal_start_sb: Detected aborted journal
EXT4-fs (dm-0): Remounting filesystem read-only

I don't think there is any value to the verbosity if the reason is
due to a filesystem abort; it just obfuscates the root cause.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

e3570639

ext4: fix EFBIG edge case when writing to large non-extent file · d889dc83

Toshiyuki Okajima authored Jul 27, 2010

By running the following reproducer, we can confirm that the write 
system call returns with 0 when it should return the error EFBIG.

#!/bin/sh

/bin/dd if=/dev/zero of=./img bs=1k count=1 seek=1024k > /dev/null 2>&1
/sbin/mkfs.ext3 -Fq ./img
/bin/mount -o loop -t ext4 ./img /mnt
/bin/touch /mnt/file
strace /bin/dd if=/dev/zero of=/mnt/file conv=notrunc bs=1k count=1 seek=$((2194719883264/1024)) 2>&1 | /bin/egrep "write.* 1024\) = "
/bin/umount /mnt
exit
Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Eric Sandeen <sandeen@redhat.com>

d889dc83

ext4: fix ext4_get_blocks references · 79e83036

Eric Sandeen authored Jul 27, 2010

ext4_get_blocks got renamed to ext4_map_blocks, but left stale
comments and a prototype littered around.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

79e83036

ext4: Always journal quota file modifications · 62d2b5f2

Jan Kara authored Jul 27, 2010

When journaled quota options are not specified, we do writes
to quota files just in data=ordered mode. This actually causes
warnings from JBD2 about dirty journaled buffer because ext4_getblk
unconditionally treats a block allocated by it as metadata. Since
quota actually is filesystem metadata, the easiest way to get rid
of the warning is to always treat quota writes as metadata...
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

62d2b5f2

ext4: Fix potential memory leak in ext4_fill_super · dcc7dae3

Cyrill Gorcunov authored Jul 27, 2010

Under heavy memory pressure we may hit out of memory
situation and as result kstrdup'ed options will not be
freed. Fix it.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

dcc7dae3
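
The leak fixed by the commit above is the common case of an early return that skips a cleanup label. A userspace sketch of the fixed shape follows, with a tracked allocator so a leak would be visible; all names here are hypothetical and the copy made with tracked_alloc() stands in for kstrdup().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Counts allocations still outstanding; non-zero after a call means a leak. */
static int live_allocs;

static void *tracked_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/*
 * Hypothetical mount routine.  `simulate_oom` forces the second
 * allocation to fail, standing in for heavy memory pressure.
 */
static int fill_super_sketch(const char *options, bool simulate_oom)
{
	size_t len;
	char *orig;
	void *sbi;
	int ret = -ENOMEM;

	assert(options != NULL);
	len = strlen(options) + 1;
	orig = tracked_alloc(len);	/* copy of the mount options */
	if (!orig)
		return -ENOMEM;
	memcpy(orig, options, len);

	sbi = simulate_oom ? NULL : tracked_alloc(64);
	if (!sbi)
		goto out_free_orig;	/* do not return directly here */

	/* ... the rest of the mount would happen here ... */
	tracked_free(sbi);		/* released only to keep the sketch leak-free */
	ret = 0;
out_free_orig:
	tracked_free(orig);
	return ret;
}
```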

ext4: Don't error out the fs if the user tries to make a file too big · 0c095c7f

Theodore Ts'o authored Jul 27, 2010

If the user attempts to make a non-extent-mapped file to be too large,
return EFBIG, but don't call ext4_std_err() which will end up marking
the file system as containing an error.

Thanks to Toshiyuki Okajima-san at Fujitsu for pointing this out.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0c095c7f

ext4: allocate stripe-multiple IOs on stripe boundaries · 506bf2d8

Eric Sandeen authored Jul 27, 2010

For some reason, today mballoc only allocates IOs which are exactly
stripe-sized on a stripe boundary.  If you have a multiple (say, a
128k IO on a 64k stripe) you may end up unaligned.

It seems to me that a simple change to align stripe-multiple IOs
on stripe boundaries would be a very good idea, unless this breaks
some other mballoc heuristic for some reason...
Reported-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

506bf2d8
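
The change described in the commit above amounts to relaxing an equality test into a divisibility test. A small sketch of both rules follows, with lengths counted in blocks; the function names are hypothetical, and align_to_stripe() is only an illustration of what "on a stripe boundary" means, not how mballoc searches for space.

```c
#include <assert.h>
#include <stdbool.h>

/* Old rule: only a request of exactly one stripe is stripe-aligned. */
static bool wants_stripe_align_exact(unsigned len, unsigned stripe)
{
	return stripe != 0 && len == stripe;
}

/* Relaxed rule: any non-zero multiple of the stripe is stripe-aligned. */
static bool wants_stripe_align_multiple(unsigned len, unsigned stripe)
{
	return stripe != 0 && len != 0 && len % stripe == 0;
}

/* Round a starting block up to the next stripe boundary. */
static unsigned long long align_to_stripe(unsigned long long start,
					  unsigned stripe)
{
	assert(stripe != 0);
	return (start + stripe - 1) / stripe * stripe;
}
```

With 4k blocks, the 128k IO on a 64k stripe from the commit message is a 32-block request on a 16-block stripe: the exact rule rejects it, the multiple rule accepts it.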

ext4: move aio completion after unwritten extent conversion · 5b3ff237

jiayingz@google.com (Jiaying Zhang) authored Jul 27, 2010

This patch is to be applied upon Christoph's "direct-io: move aio_complete
into ->end_io" patch. It adds iocb and result fields to struct ext4_io_end_t,
so that we can call aio_complete from ext4_end_io_nolock() after the extent
conversion has finished.

I have verified with Christoph's aio-dio test that used to fail after a few
runs on an original kernel but now succeeds on the patched kernel.

See http://thread.gmane.org/gmane.comp.file-systems.ext4/19659 for details.
Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

5b3ff237

direct-io: move aio_complete into ->end_io · 552ef802

Christoph Hellwig authored Jul 27, 2010

Filesystems with unwritten extent support must not complete an AIO request
until the transaction to convert the extent has been committed.  That means
the aio_complete calls need to be moved into the ->end_io callback so
that the filesystem can control when to call it exactly.

This makes a bit of a mess out of dio_complete and the ->end_io callback
prototype even more complicated. 
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz> 
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

552ef802

ext4: Support discard requests when running in no-journal mode · 5c521830

Jiaying Zhang authored Jul 27, 2010

Issue discard request in ext4_free_blocks() when ext4 has no journal and
is mounted with discard option.
Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

5c521830

jbd2: Remove __GFP_NOFAIL from jbd2 layer · 47def826

Theodore Ts'o authored Jul 27, 2010

__GFP_NOFAIL is going away, so add our own retry loop.  Also add
jbd2__journal_start() and jbd2__journal_restart() which take a gfp
mask, so that file systems can optionally (re)start transaction
handles using GFP_KERNEL.  If they do this, then they need to be
prepared to handle receiving an ERR_PTR(-ENOMEM) error, and be ready
to reflect that error up to userspace.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

47def826
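
The commit above describes two behaviours: retry until the allocation succeeds, or let the caller opt in to seeing -ENOMEM. A minimal userspace sketch of that split follows. The allocator hook, the `may_fail` flag (standing in for the gfp mask choice) and all names are hypothetical, and a real retry loop would back off between attempts.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator hook so the failure path can be exercised. */
typedef void *(*alloc_fn)(size_t size);

/* Demo allocator: fails the first `fail_budget` calls, then uses malloc. */
static int fail_budget;

static void *flaky_alloc(size_t size)
{
	if (fail_budget > 0) {
		fail_budget--;
		return NULL;
	}
	return malloc(size);
}

/*
 * Allocate `size` bytes into *out.  If `may_fail` is false, keep retrying
 * until the allocator succeeds (the open-coded no-fail loop).  If it is
 * true, give up on the first failure and return -ENOMEM so the caller can
 * pass the error up.
 */
static int alloc_or_retry(alloc_fn alloc, size_t size, bool may_fail,
			  void **out)
{
	assert(alloc != NULL && out != NULL);
	for (;;) {
		*out = alloc(size);
		if (*out)
			return 0;
		if (may_fail)
			return -ENOMEM;
		/* A real implementation would wait briefly before retrying. */
	}
}
```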

ext4: Fix block bitmap inconsistencies after a crash when deleting files · 40389687

Amir G authored Jul 27, 2010

We have experienced bitmap inconsistencies after a crash during file
deletion under heavy load.  The crash is not file system related, and
the following patch in ext4_free_branches() fixes the recovery
problem.

If the transaction is restarted and there is a crash before the new
transaction is committed, then after recovery, the blocks that this
indirect block points to have been freed, but the indirect block
itself has not been freed and may still point to some of the free
blocks (because of the ext4_forget()).

So ext4_forget() should be called inside ext4_free_blocks() to avoid
this problem.
Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

40389687

ext4: Remove unnecessary casts of private_data · a271fe85

Joe Perches authored Jul 27, 2010

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

a271fe85

ext4: fix potential NULL dereference while tracing · e5880d76

Theodore Ts'o authored Jul 27, 2010

The allocation_context pointer can be NULL.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

e5880d76

ext4: Define s_jnl_backup_type in superblock · 89eeddf0

Theodore Ts'o authored Jul 27, 2010

This has been in use by e2fsprogs for a while; define it to keep the
super block fields in sync.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

89eeddf0

ext4: Once a day, printk file system error information to dmesg · 66e61a9e

Theodore Ts'o authored Jul 27, 2010

This allows us to grab any file system error messages by scraping
/var/log/messages.  This will make it easy for us to do error analysis
across the very large number of machines as we deploy ext4 across the
fleet.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

66e61a9e

ext4: Save error information to the superblock for analysis · 1c13d5c0

Theodore Ts'o authored Jul 27, 2010

Save the number of file system errors, and the time, function name, line
number, block number, and inode number of the first and most recent
errors reported on the file system, in the superblock.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

1c13d5c0
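
The bookkeeping described in the commit above (a running count plus details of the first and the most recent error) can be sketched as follows. The struct layout and field names are hypothetical and do not reproduce the on-disk superblock fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Details kept for one reported error (hypothetical layout). */
struct err_info {
	uint64_t time;		/* when the error was reported */
	const char *func;	/* reporting function */
	unsigned line;		/* source line of the report */
	uint64_t block;		/* block involved, 0 if none */
	uint32_t ino;		/* inode involved, 0 if none */
};

/* Running summary: count, first error, most recent error. */
struct err_log {
	uint32_t count;
	struct err_info first;
	struct err_info last;
};

/* Record one error: `first` is written once, `last` every time. */
static void err_log_record(struct err_log *log, const struct err_info *e)
{
	assert(log != NULL && e != NULL);
	if (log->count == 0)
		log->first = *e;
	log->last = *e;
	log->count++;
}
```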

ext4: Pass line numbers to ext4_error() and friends · c398eda0

Theodore Ts'o authored Jul 27, 2010

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c398eda0

ext4: Cleanup ext4_check_dir_entry so __func__ is now implicit · 60fd4da3

Theodore Ts'o authored Jul 27, 2010

Also start passing the line number to ext4_check_dir since we're going
to need it in an upcoming patch.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

60fd4da3
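
Making `__func__` and the line number implicit, as in the commits above, is usually done by wrapping the real function in a macro that expands at the call site. A minimal sketch of that pattern follows; record_error_at(), record_error() and struct err_record are hypothetical names, not the ext4 ones.

```c
#include <assert.h>
#include <stddef.h>

/* Where and what was reported (hypothetical record type). */
struct err_record {
	const char *func;
	unsigned line;
	const char *msg;
};

/* The real worker takes the caller's location explicitly. */
static void record_error_at(struct err_record *r, const char *func,
			    unsigned line, const char *msg)
{
	assert(r != NULL);
	r->func = func;
	r->line = line;
	r->msg = msg;
}

/*
 * Callers use this macro, so __func__ and __LINE__ are filled in at the
 * call site and no longer appear in the .c files.
 */
#define record_error(r, msg) \
	record_error_at((r), __func__, __LINE__, (msg))
```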

29 Jun, 2010 4 commits

ext4: Pass line number to ext4_journal_abort_handle() · 90c7201b

Theodore Ts'o authored Jun 29, 2010

This allows the error messages to include the line number.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

90c7201b

ext4: Enhance ext4_grp_locked_error() to take block and function numbers · e29136f8

Theodore Ts'o authored Jun 29, 2010

Also use a macro definition so that __func__ and __LINE__ are implicit.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

e29136f8

ext4: clean up ext4_abort() so __func__ is now implicit · c67d859e

Theodore Ts'o authored Jun 29, 2010

Use a macro definition for ext4_abort() to clean up the .c files a wee
bit.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c67d859e

ext4: Add new superblock fields reserved for the Next3 snapshot feature · 4a9cdec7

Theodore Ts'o authored Jun 29, 2010

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

4a9cdec7

15 Jun, 2010 1 commit

ext4: update ctime when changing the file's permission by setfacl · c6ac12a6

Jan Kara authored Jun 15, 2010

ext4 didn't update the ctime of the file when its permission was
changed.

Steps to reproduce:
 # touch aaa
 # stat -c %Z aaa
 1275289822
 # setfacl -m  'u::x,g::x,o::x' aaa
 # stat -c %Z aaa
 1275289822                         <- unchanged

But, according to the spec of the ctime, ext4 must update it.

Port of ext3 patch by Miao Xie <miaox@cn.fujitsu.com>.

CC: linux-ext4@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c6ac12a6

14 Jun, 2010 3 commits

ext4: remove vestiges of nobh support · 206f7ab4

Christoph Hellwig authored Jun 14, 2010

The nobh option was only supported for writeback mode, but given that all
write paths actually create buffer heads it effectively was a no-op already.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

206f7ab4

ext4: remove initialized but not read variables · 5a0790c2

Andi Kleen authored Jun 14, 2010

No real bugs found, just removed some dead code.

Found by gcc 4.6's new warnings.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

5a0790c2

ext4: Convert more i_flags references to use accessor functions · 07a03824

Theodore Ts'o authored Jun 14, 2010

These changes are not ones which are likely to result in races, but
they should be fixed.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

07a03824
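
Accessor functions for flag words, as mentioned in the commit above, typically hide an atomic bit operation behind a bit number so that concurrent updates cannot lose each other's changes. A minimal sketch using C11 atomics follows; the struct, the function names and the 32-bit limit are assumptions for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical inode with an atomically updated flag word. */
struct inode_sketch {
	atomic_ulong flags;
};

/* Set one flag, identified by bit number, without a read-modify-write race. */
static void inode_set_flag(struct inode_sketch *i, unsigned bit)
{
	assert(bit < 32);
	atomic_fetch_or(&i->flags, 1UL << bit);
}

/* Clear one flag atomically. */
static void inode_clear_flag(struct inode_sketch *i, unsigned bit)
{
	assert(bit < 32);
	atomic_fetch_and(&i->flags, ~(1UL << bit));
}

/* Test one flag. */
static bool inode_test_flag(struct inode_sketch *i, unsigned bit)
{
	assert(bit < 32);
	return (atomic_load(&i->flags) >> bit) & 1UL;
}
```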

12 Jun, 2010 2 commits

ext4: Clean up s_dirt handling · a0375156

Theodore Ts'o authored Jun 11, 2010

We don't need to set s_dirt in most of the ext4 code when journaling
is enabled.  In ext3/4 some of the summary statistics for # of free
inodes, blocks, and directories are calculated from the per-block
group statistics when the file system is mounted or unmounted.  As a
result the superblock doesn't have to be updated, either via the
journal or by setting s_dirt.  There are a few exceptions, most
notably when resizing the file system, where the superblock needs to
be modified --- and in that case it should be done as a journalled
operation if possible, and s_dirt set only in no-journal mode.

This patch will optimize out some unneeded disk writes when using ext4
with a journal.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

a0375156

Linux 2.6.35-rc3 · 7e27d6e7

Linus Torvalds authored Jun 11, 2010

7e27d6e7

11 Jun, 2010 1 commit

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · 4cea8706

Linus Torvalds authored Jun 11, 2010

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  wimax/i2400m: fix missing endian correction read in fw loader
  net8139: fix a race at the end of NAPI
  pktgen: Fix accuracy of inter-packet delay.
  pkt_sched: gen_estimator: add a new lock
  net: deliver skbs on inactive slaves to exact matches
  ipv6: fix ICMP6_MIB_OUTERRORS
  r8169: fix mdio_read and update mdio_write according to hw specs
  gianfar: Revive the driver for eTSEC devices (disable timestamping)
  caif: fix a couple range checks
  phylib: Add support for the LXT973 phy.
  net: Print num_rx_queues imbalance warning only when there are allocated queues

4cea8706