- 24 Apr, 2016 2 commits
-
-
Jan Kara authored
Currently we ask jbd2 to write all dirty allocated buffers before committing a transaction when doing writeback of delay allocated blocks. However this is unnecessary since we move all pages to writeback state before dropping a transaction handle and then submit all the necessary IO. We still need the transaction commit to wait for all the outstanding writeback before flushing disk caches during transaction commit to avoid data exposure issues though. Use the new jbd2 capability and ask it to only wait for outstanding writeback during transaction commit when writing back data in ext4_writepages(). Tested-by:
"HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com> Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Jan Kara authored
This flag is just duplicating what ext4_should_order_data() tells you and is used in a single place. Furthermore it doesn't reflect changes to inode data journalling flag so it may be possibly misleading. Just remove it. Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
- 04 Apr, 2016 1 commit
-
-
Kirill A. Shutemov authored
Mostly direct substitution with occasional adjustment or removing outdated comments. Signed-off-by:
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by:
Michal Hocko <mhocko@suse.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 01 Apr, 2016 1 commit
-
-
Theodore Ts'o authored
With the internal Quota feature, mke2fs creates empty quota inodes and quota usage tracking is enabled as soon as the file system is mounted. Since quotacheck is no longer preallocating all of the blocks in the quota inode that are likely needed to be written to, we are now seeing a lockdep false positive caused by needing to allocate a quota block from inside ext4_map_blocks(), while holding i_data_sem for a data inode. This results in this complaint: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ei->i_data_sem); lock(&s->s_dquot.dqio_mutex); lock(&ei->i_data_sem); lock(&s->s_dquot.dqio_mutex); Google-Bug-Id: 27907753 Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 26 Mar, 2016 1 commit
-
-
Theodore Ts'o authored
We don't want the writeback triggered from the journal commit (in data=writeback mode) to cause the journal to abort due to generic_writepages() returning an ENOMEM error. In addition, if fsync() fails with ENOMEM, most applications will probably not do the right thing. So if we are doing a data integrity sync, and ext4_encrypt() returns ENOMEM, we will submit any queued I/O to date, and then retry the allocation using GFP_NOFAIL. Google-Bug-Id: 27641567 Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
- 13 Mar, 2016 1 commit
-
-
Aihua Zhang authored
the error is: fs/ext4/mballoc.c:475:43: error: 'struct ext4_group_info' has no member named 'bb_bitmap'. so, the definition of macro DOUBLE_CHECK should before 'struct ext4_group_info', I fixed it, and I moved the macro AGGRESSIVE_CHECK together, because I think they shoule be together. Signed-off-by:
Aihua Zhang <zhangaihua1@huawei.com> Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
- 10 Mar, 2016 1 commit
-
-
Jan Kara authored
Using SEEK_DATA in a huge sparse file can easily lead to sotflockups as ext4_seek_data() iterates hole block-by-block. Fix the problem by using returned hole size from ext4_map_blocks() and thus skip the hole in one go. Update also SEEK_HOLE implementation to follow the same pattern as SEEK_DATA to make future maintenance easier. Furthermore we add cond_resched() to both ext4_seek_data() and ext4_seek_hole() to avoid softlockups in case evil user creates huge fragmented file and we have to go through lots of extents. Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
- 09 Mar, 2016 5 commits
-
-
Jan Kara authored
Remove counter of pending io ends as it is unused. Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Jan Kara authored
When mapping blocks for direct IO, we allocate io_end structure before mapping blocks and store pointer to it in the inode. This creates a requirement that any AIO DIO using io_end must be protected by i_mutex. This created problems in the past with dioread_nolock mode which was corrupting io_end pointers. Also io_end is allocated unnecessarily in case where we don't need to convert any extents (which is a common case for example when overwriting file). We fix the problem by allocating io_end only once we return unwritten extent from block mapping function for AIO DIO (so we can save some pointless io_end allocations) and we pass pointer to it in bh->b_private which generic DIO code later passes to our end IO callback. That way we remove any need for global pointer to io_end structure and thus fix the races. The downside of this change is that the checking for unwritten IO in flight in ext4_extents_can_be_merged() is more racy since we now increment i_unwritten / set EXT4_STATE_DIO_UNWRITTEN only after dropping i_data_sem. However the check has been racy already before because ext4_writepages() already increment i_unwritten after dropping i_data_sem and reserved blocks save us from hitting ENOSPC in the worst case. Signed-off-by:
Jan Kara <jack@suse.cz>
-
Jan Kara authored
Rename ext4_get_blocks_write() to ext4_get_blocks_unwritten() to better describe what it does. Also split out get blocks functions for direct IO. Later we move functionality from _ext4_get_blocks() there. There's no functional change in this patch. Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Jan Kara authored
Currently we've used hashed aio_mutex to serialize unaligned AIO DIO. However the code cleanups that happened after 2011 when the lock was introduced made aio_mutex acquired at almost the same places where we already have exclusion using i_mutex. So just use i_mutex for the exclusion of unaligned AIO DIO. The change moves waiting for pending unwritten extent conversion under i_mutex. That makes special handling of O_APPEND writes unnecessary and also avoids possible livelocking of unaligned AIO DIO with aligned one (nothing was preventing contiguous stream of aligned AIO DIOs to let unaligned AIO DIO wait forever). Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Jan Kara authored
On 64-bit architectures we have two 4-byte holes in struct ext4_io_end. Order entries better to avoid this and thus make the structure occupy 64 instead of 72 bytes for 64-bit architectures. Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
- 28 Feb, 2016 1 commit
-
-
Jan Kara authored
When AIO DIO fails e.g. due to IO error, we must not convert unwritten extents as that will expose uninitialized data. Handle this case by clearing unwritten flag from io_end in case of error and thus preventing extent conversion. Signed-off-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Dave Chinner <david@fromorbit.com>
-
- 23 Feb, 2016 1 commit
-
-
Jan Kara authored
Since old mbcache code is gone, let's rename new code to mbcache since number 2 is now meaningless. This is just a mechanical replacement. Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
- 22 Feb, 2016 1 commit
-
-
Jan Kara authored
The conversion is generally straightforward. The only tricky part is that xattr block corresponding to found mbcache entry can get freed before we get buffer lock for that block. So we have to check whether the entry is still valid after getting buffer lock. Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
- 08 Feb, 2016 1 commit
-
-
Theodore Ts'o authored
Add a validation check for dentries for encrypted directory to make sure we're not caching stale data after a key has been added or removed. Also check to make sure that status of the encryption key is updated when readdir(2) is executed. Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
- 22 Jan, 2016 1 commit
-
-
Al Viro authored
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 08 Jan, 2016 4 commits
-
-
Li Xi authored
This patch adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR ioctl interface support for ext4. The interface is kept consistent with XFS_IOC_FSGETXATTR/XFS_IOC_FSGETXATTR. Signed-off-by:
Li Xi <lixi@ddn.com> Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Reviewed-by:
Andreas Dilger <adilger@dilger.ca> Reviewed-by:
Jan Kara <jack@suse.cz>
-
Li Xi authored
This patch adds mount options for enabling/disabling project quota accounting and enforcement. A new specific inode is also used for project quota accounting. [ Includes fix from Dan Carpenter to crrect error checking from dqget(). ] Signed-off-by:
Li Xi <lixi@ddn.com> Signed-off-by:
Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Reviewed-by:
Andreas Dilger <adilger@dilger.ca> Reviewed-by:
Jan Kara <jack@suse.cz>
-
Li Xi authored
Signed-off-by:
Li Xi <lixi@ddn.com> Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Reviewed-by:
Andreas Dilger <adilger@dilger.ca> Reviewed-by:
Jan Kara <jack@suse.cz>
-
Theodore Ts'o authored
A number of functions include ext4_add_dx_entry, make_indexed_dir, etc. are being passed a dentry even though the only thing they use is the containing parent. We can shrink the code size slightly by making this replacement. This will also be useful in cases where we don't have a dentry as the argument to the directory entry insert functions. Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
- 07 Dec, 2015 6 commits
-
-
Jan Kara authored
Make DAX fault path use pre-zeroed blocks to avoid races with extent conversion and zeroing when two page faults to the same block happen. Signed-off-by:
Jan Kara <jack@suse.com> Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Jan Kara authored
DAX page fault path needs to get blocks that are pre-zeroed to avoid races when two concurrent page faults happen in the same block of a file. Implement support for this in ext4_map_blocks(). Signed-off-by:
Jan Kara <jack@suse.com> Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Jan Kara authored
Create new function ext4_issue_zeroout() to zeroout contiguous (both logically and physically) part of inode data. We will need to issue zeroout when extent structure is not readily available and this function will allow us to do it without making up fake extent structures. Signed-off-by:
Jan Kara <jack@suse.com> Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Jan Kara authored
When dioread_nolock mode is enabled, we grab i_data_sem in ext4_ext_direct_IO() and therefore we need to instruct _ext4_get_block() not to grab i_data_sem again using EXT4_GET_BLOCKS_NO_LOCK. However holding i_data_sem over overwrite direct IO isn't needed these days. We have exclusion against truncate / hole punching because we increase i_dio_count under i_mutex in ext4_ext_direct_IO() so once ext4_file_write_iter() verifies blocks are allocated & written, they are guaranteed to stay so during the whole direct IO even after we drop i_mutex. So we can just remove this locking abuse and the no longer necessary EXT4_GET_BLOCKS_NO_LOCK flag. Signed-off-by:
Jan Kara <jack@suse.com> Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Jan Kara authored
When doing delayed allocation, update of on-disk inode size is postponed until IO submission time. However hole punch or zero range fallocate calls can end up discarding the tail page cache page and thus on-disk inode size would never be properly updated. Make sure the on-disk inode size is updated before truncating page cache. Signed-off-by:
Jan Kara <jack@suse.com> Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Jan Kara authored
Currently, page faults and hole punching are completely unsynchronized. This can result in page fault faulting in a page into a range that we are punching after truncate_pagecache_range() has been called and thus we can end up with a page mapped to disk blocks that will be shortly freed. Filesystem corruption will shortly follow. Note that the same race is avoided for truncate by checking page fault offset against i_size but there isn't similar mechanism available for punching holes. Fix the problem by creating new rw semaphore i_mmap_sem in inode and grab it for writing over truncate, hole punching, and other functions removing blocks from extent tree and for read over page faults. We cannot easily use i_data_sem for this since that ranks below transaction start and we need something ranking above it so that it can be held over the whole truncate / hole punching operation. Also remove various workarounds we had in the code to reduce race window when page fault could have created pages with stale mapping information. Signed-off-by:
Jan Kara <jack@suse.com> Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
- 24 Nov, 2015 1 commit
-
-
David Turner authored
In ext4, the bottom two bits of {a,c,m}time_extra are used to extend the {a,c,m}time fields, deferring the year 2038 problem to the year 2446. When decoding these extended fields, for times whose bottom 32 bits would represent a negative number, sign extension causes the 64-bit extended timestamp to be negative as well, which is not what's intended. This patch corrects that issue, so that the only negative {a,c,m}times are those between 1901 and 1970 (as per 32-bit signed timestamps). Some older kernels might have written pre-1970 dates with 1,1 in the extra bits. This patch treats those incorrectly-encoded dates as pre-1970, instead of post-2311, until kernel 4.20 is released. Hopefully by then e2fsck will have fixed up the bad data. Also add a comment explaining the encoding of ext4's extra {a,c,m}time bits. Signed-off-by:
David Turner <novalis@novalis.org> Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Reported-by:
Mark Harris <mh8928@yahoo.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23732 Cc: stable@vger.kernel.org
-
- 19 Oct, 2015 1 commit
-
-
Dmitry Monakhov authored
It is appeared that we can pass journal related mount options and such options be shown in /proc/mounts Example: #mkfs.ext4 -F /dev/vdb #tune2fs -O ^has_journal /dev/vdb #mount /dev/vdb /mnt/ -ocommit=20,journal_async_commit #cat /proc/mounts | grep /mnt /dev/vdb /mnt ext4 rw,relatime,journal_checksum,journal_async_commit,commit=20,data=ordered 0 0 But options:"journal_checksum,journal_async_commit,commit=20,data=ordered" has nothing with reality because there is no journal at all. This patch disallow following options for journalless configurations: - journal_checksum - journal_async_commit - commit=%ld - data={writeback,ordered,journal} Signed-off-by:
Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Reviewed-by:
Andreas Dilger <adilger@dilger.ca>
-
- 17 Oct, 2015 4 commits
-
-
Darrick J. Wong authored
Create separate predicate functions to test/set/clear feature flags, thereby replacing the wordy old macros. Furthermore, clean out the places where we open-coded feature tests. Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com>
-
Darrick J. Wong authored
Instead of overloading EIO for CRC errors and corrupt structures, return the same error codes that XFS returns for the same issues. Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Darrick J. Wong authored
Allow the filesystem to store the metadata checksum seed in the superblock and add an incompat feature to say that we're using it. This enables tune2fs to change the UUID on a mounted metadata_csum FS without having to (racy!) rewrite all disk metadata. Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Theodore Ts'o authored
Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
- 03 Oct, 2015 1 commit
-
-
Theodore Ts'o authored
Since ext4_page_crypto() doesn't need an encryption context (at least not any more), this allows us to simplify a number function signature and also allows us to avoid needing to allocate a context in ext4_block_write_begin(). It also means we no longer need a separate ext4_decrypt_one() function. Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
- 23 Sep, 2015 2 commits
-
-
Theodore Ts'o authored
This allows us to refactor the procfs code, which saves a bit of compiled space. More importantly it isolates most of the procfs support code into a single file, so it's easier to #ifdef it out if the proc file system has been disabled. Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Theodore Ts'o authored
Also statically allocate the ext4_kset and ext4_feat objects, since we only need exactly one of each, and it's simpler and less code if we drop the dynamic allocation and deallocation when it's not needed. Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
- 08 Sep, 2015 1 commit
-
-
Matthew Wilcox authored
DAX wants different semantics from any currently-existing ext4 get_block callback. Unlike ext4_get_block_write(), it needs to honour the 'create' flag, and unlike ext4_get_block(), it needs to be able to return unwritten extents. So introduce a new ext4_get_block_dax() which has those semantics. We could also change ext4_get_block_write() to honour the 'create' flag, but that might have consequences on other users that I do not currently understand. Signed-off-by:
Matthew Wilcox <willy@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 22 Jul, 2015 1 commit
-
-
Tejun Heo authored
ext4_io_submit_init() takes the pointer to writeback_control to test its sync_mode and determine between WRITE and WRITE_SYNC and records the result in ->io_op. This patch makes it record the pointer directly and moves the test to ext4_io_submit(). This doesn't cause any noticeable differences now but having writeback_control available throughout IO submission path will be depended upon by the planned cgroup writeback support. Signed-off-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
- 15 Jun, 2015 1 commit
-
-
Andreas Dilger authored
Several ext4_warning() messages in the directory handling code do not report the inode number of the (potentially corrupt) directory where a problem is seen, and others report this in an ad-hoc manner. Add an ext4_warning_inode() helper to print the inode number and command name consistent with ext4_error_inode(). Consolidate the place in ext4.h that these macros are defined. Clean up some other directory error and warning messages to print the calling function name. Minor code style fixes in nearby lines. Signed-off-by:
Andreas Dilger <adilger@dilger.ca> Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
- 09 Jun, 2015 1 commit
-
-
Namjae Jeon authored
This patch implements fallocate's FALLOC_FL_INSERT_RANGE for Ext4. 1) Make sure that both offset and len are block size aligned. 2) Update the i_size of inode by len bytes. 3) Compute the file's logical block number against offset. If the computed block number is not the starting block of the extent, split the extent such that the block number is the starting block of the extent. 4) Shift all the extents which are lying between [offset, last allocated extent] towards right by len bytes. This step will make a hole of len bytes at offset. Signed-off-by:
Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by:
Ashish Sangwan <a.sangwan@samsung.com>
-