An error occurred fetching the project authors.
- 05 Sep, 2014 1 commit
-
-
Theodore Ts'o authored
commit 86f0afd4 upstream. If there is a failure while allocating the preallocation structure, a number of blocks can end up getting marked in the in-memory buddy bitmap, and then not getting released. This can result in the following corruption getting reported by the kernel: EXT4-fs error (device sda3): ext4_mb_generate_buddy:758: group 1126, 12793 clusters in bitmap, 12729 in gd In that case, we need to release the blocks using mb_free_blocks(). Tested: fs smoke test; also demonstrated that with injected errors, the file system is no longer getting corrupted Google-Bug-Id: 16657874 Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 01 Jul, 2014 1 commit
-
-
Maurizio Lombardi authored
commit b5b60778 upstream. The variable "size" is expressed as number of blocks and not as number of clusters, this could trigger a kernel panic when using ext4 with the size of a cluster different from the size of a block. Signed-off-by:
Maurizio Lombardi <mlombard@redhat.com> Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 09 Jan, 2014 3 commits
-
-
Lukas Czerner authored
commit 8f9ff189 upstream. When using FITRIM ioctl on a file system without journal it will only trim the block group once, no matter how many times you invoke FITRIM ioctl and how many block you release from the block group. It is because we only clear EXT4_GROUP_INFO_WAS_TRIMMED_BIT in journal callback. Fix this by clearing the bit in no journal mode as well. Signed-off-by:
Lukas Czerner <lczerner@redhat.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Reported-by:
Jorge Fábregas <jorge.fabregas@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Theodore Ts'o authored
commit f5a44db5 upstream. The missing casts can cause the high 64-bits of the physical blocks to be lost. Set up new macros which allows us to make sure the right thing happen, even if at some point we end up supporting larger logical block numbers. Thanks to the Emese Revfy and the PaX security team for reporting this issue. Reported-by:
PaX Team <pageexec@freemail.hu> Reported-by:
Emese Revfy <re.emese@gmail.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Junho Ryu authored
commit 4e8d2139 upstream. ext4_mb_put_pa should hold pa->pa_lock before accessing pa->pa_count. While ext4_mb_use_preallocated checks pa->pa_deleted first and then increments pa->count later, ext4_mb_put_pa decrements pa->pa_count before holding pa->pa_lock and then sets pa->pa_deleted. * Free sequence ext4_mb_put_pa (1): atomic_dec_and_test pa->pa_count ext4_mb_put_pa (2): lock pa->pa_lock ext4_mb_put_pa (3): check pa->pa_deleted ext4_mb_put_pa (4): set pa->pa_deleted=1 ext4_mb_put_pa (5): unlock pa->pa_lock ext4_mb_put_pa (6): remove pa from a list ext4_mb_pa_callback: free pa * Use sequence ext4_mb_use_preallocated (1): iterate over preallocation ext4_mb_use_preallocated (2): lock pa->pa_lock ext4_mb_use_preallocated (3): check pa->pa_deleted ext4_mb_use_preallocated (4): increase pa->pa_count ext4_mb_use_preallocated (5): unlock pa->pa_lock ext4_mb_release_context: access pa * Use-after-free sequence [initial status] <pa->pa_deleted = 0, pa_count = 1> ext4_mb_use_preallocated (1): iterate over preallocation ext4_mb_use_preallocated (2): lock pa->pa_lock ext4_mb_use_preallocated (3): check pa->pa_deleted ext4_mb_put_pa (1): atomic_dec_and_test pa->pa_count [pa_count decremented] <pa->pa_deleted = 0, pa_count = 0> ext4_mb_use_preallocated (4): increase pa->pa_count [pa_count incremented] <pa->pa_deleted = 0, pa_count = 1> ext4_mb_use_preallocated (5): unlock pa->pa_lock ext4_mb_put_pa (2): lock pa->pa_lock ext4_mb_put_pa (3): check pa->pa_deleted ext4_mb_put_pa (4): set pa->pa_deleted=1 [race condition!] <pa->pa_deleted = 1, pa_count = 1> ext4_mb_put_pa (5): unlock pa->pa_lock ext4_mb_put_pa (6): remove pa from a list ext4_mb_pa_callback: free pa ext4_mb_release_context: access pa AddressSanitizer has detected use-after-free in ext4_mb_new_blocks Bug report: http://goo.gl/rG1On3Signed-off-by:
Junho Ryu <jayr@google.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 22 Jul, 2013 1 commit
-
-
Theodore Ts'o authored
commit e7676a70 upstream. The filesystem should not be marked inconsistent if ext4_free_blocks() is not able to allocate memory. Unfortunately some callers (most notably ext4_truncate) don't have a way to reflect an error back up to the VFS. And even if we did, most userspace applications won't deal with most system calls returning ENOMEM anyway. Reported-by:
Nagachandra P <nagachandra@gmail.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 06 May, 2013 1 commit
-
-
Lachlan McIlroy authored
In the case where we are allocating for a non-extent file, we must limit the groups we allocate from to those below 2^32 blocks, and ext4_mb_regular_allocator() attempts to do this initially by putting a cap on ngroups for the subsequent search loop. However, the initial target group comes in from the allocation context (ac), and it may already be beyond the artificially limited ngroups. In this case, the limit if (group == ngroups) group = 0; at the top of the loop is never true, and the loop will run away. Catch this case inside the loop and reset the search to start at group 0. [sandeen@redhat.com: add commit msg & comments] Signed-off-by:
Lachlan McIlroy <lmcilroy@redhat.com> Signed-off-by:
Eric Sandeen <sandeen@redhat.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 10 Apr, 2013 1 commit
-
-
Dmitri Monakho authored
This patch should fix sparse complains about shadow declatations. Signed-off-by:
Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 09 Apr, 2013 2 commits
-
-
Al Viro authored
The only part of proc_dir_entry the code outside of fs/proc really cares about is PDE(inode)->data. Provide a helper for that; static inline for now, eventually will be moved to fs/proc, along with the knowledge of struct proc_dir_entry layout. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Andrey Sidorov authored
Improve mb_free_blocks speed by clearing entire range at once instead of iterating over each bit. Freeing block-by-block also makes buddy bitmap subtree flip twice making most of the work a no-op. Very few bits in buddy bitmap require change, e.g. freeing entire group is a 1 bit flip only. As a result, releasing blocks of 60G file now takes 5ms instead of 2.7s. This is especially good for non-preemptive kernels as there is no rescheduling during release. Signed-off-by:
Andrey Sidorov <qrxd43@motorola.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 04 Apr, 2013 3 commits
-
-
Lukas Czerner authored
Currently on many places in ext4 we're using ext4_get_group_no_and_offset() even though we're only interested in knowing the block group of the particular block, not the offset within the block group so we can use more efficient way to compute block group. This patch introduces ext4_get_group_number() which computes block group for a given block much more efficiently. Use this function instead of ext4_get_group_no_and_offset() everywhere where we're only interested in knowing the block group. Signed-off-by:
Lukas Czerner <lczerner@redhat.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
Dmitry Monakhov authored
It is incorrect to use list_for_each_entry_safe() for journal callback traversial because ->next may be removed by other task: ->ext4_mb_free_metadata() ->ext4_mb_free_metadata() ->ext4_journal_callback_del() This results in the following issue: WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250() Hardware name: list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod Pid: 16400, comm: jbd2/dm-1-8 Tainted: G W 3.8.0-rc3+ #107 Call Trace: [<ffffffff8106fb0d>] warn_slowpath_common+0xad/0xf0 [<ffffffff8106fc06>] warn_slowpath_fmt+0x46/0x50 [<ffffffff813637e9>] ? ext4_journal_commit_callback+0x99/0xc0 [<ffffffff8148cae0>] __list_del_entry+0x1c0/0x250 [<ffffffff813637bf>] ext4_journal_commit_callback+0x6f/0xc0 [<ffffffff813ca336>] jbd2_journal_commit_transaction+0x23a6/0x2570 [<ffffffff8108aa42>] ? try_to_del_timer_sync+0x82/0xa0 [<ffffffff8108b491>] ? del_timer_sync+0x91/0x1e0 [<ffffffff813d3ecf>] kjournald2+0x19f/0x6a0 [<ffffffff810ad630>] ? wake_up_bit+0x40/0x40 [<ffffffff813d3d30>] ? bit_spin_lock+0x80/0x80 [<ffffffff810ac6be>] kthread+0x10e/0x120 [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70 [<ffffffff818ff6ac>] ret_from_fork+0x7c/0xb0 [<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70 This patch fix the issue as follows: - ext4_journal_commit_callback() make list truly traversial safe simply by always starting from list_head - fix race between two ext4_journal_callback_del() and ext4_journal_callback_try_del() Signed-off-by:
Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Reviewed-by:
Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.com
-
Theodore Ts'o authored
Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Reviewed-by:
Lukas Czerner <lczerner@redhat.com>
-
- 12 Mar, 2013 1 commit
-
-
Theodore Ts'o authored
A user who was using a 8TB+ file system and with a very large flexbg size (> 65536) could cause the atomic_t used in the struct flex_groups to overflow. This was detected by PaX security patchset: http://forums.grsecurity.net/viewtopic.php?f=3&t=3289&p=12551#p12551 This bug was introduced in commit 9f24e420, so it's been around since 2.6.30. :-( Fix this by using an atomic64_t for struct orlav_stats's free_clusters. Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Reviewed-by:
Lukas Czerner <lczerner@redhat.com> Cc: stable@vger.kernel.org
-
- 11 Mar, 2013 2 commits
-
-
Lukas Czerner authored
Using yield() is strongly discouraged (see sched/core.c) especially since we can just use cond_resched(). Replace all use of yield() with cond_resched(). Signed-off-by:
Lukas Czerner <lczerner@redhat.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
Lukas Czerner authored
Remove unused variable 'freed' in ext4_free_blocks(). Signed-off-by:
Lukas Czerner <lczerner@redhat.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 02 Mar, 2013 1 commit
-
-
Lukas Czerner authored
We're using macro EXT4_B2C() to convert number of blocks to number of clusters for bigalloc file systems. However, we should be using EXT4_NUM_B2C(). Signed-off-by:
Lukas Czerner <lczerner@redhat.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 09 Feb, 2013 1 commit
-
-
Theodore Ts'o authored
There are multiple reasons to move away from debugfs. First of all, we are only using it for a single parameter, and it is much more complicated to set up (some 30 lines of code compared to 3), and one more thing that might fail while loading the ext4 module. Secondly, as a module paramter it can be specified as a boot option if ext4 is built into the kernel, or as a parameter when the module is loaded, and it can also be manipulated dynamically under /sys/module/ext4/parameters/mballoc_debug. So it is more flexible. Ultimately we want to move away from using mb_debug() towards tracepoints, but for now this is still a useful simplification of the code base. Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 04 Feb, 2013 1 commit
-
-
Theodore Ts'o authored
The ext4 block allocator only maintains buddy bitmaps for chunks which are less than or equal to one quarter of a block group. That is, for a file aystem with a 1k blocksize, and where the number of blocks in a block group is 8192 blocks, the largest chunk size tracked by buddy bitmaps is 2048 blocks. For a file system with a 4k blocksize, and where the number of blocks in a block group is 32768 blocks, the largest chunk size tracked by buddy bitmaps is 8192 blocks. To work around this code, mballoc.c before this commit would truncate allocation requests to the number of blocks in a block group minus 10. Why 10? Aside from being a completely arbitrary number, it avoids block allocation to be a power of two larger than 25% of the block group. If you try to explicitly fallocate 50% of the block group size, this will demonstrate the problem; the block allocation code will scan the all of the blocks in the file system with cr==0 (since the request is for a natural power of two), but then completely fail for all blocks groups, since the buddy bitmaps don't track chunk sizes of 50% of the block group. To fix this, in these we use ext4_mb_complex_scan_group() instead of ext4_mb_simple_scan_group(). Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger@dilger.ca>
-
- 02 Feb, 2013 1 commit
-
-
Niu Yawei authored
In ext4_mb_add_n_trim(), lg_prealloc_lock should be taken when changing the lg_prealloc_list. Signed-off-by:
Niu Yawei <yawei.niu@intel.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 08 Nov, 2012 3 commits
-
-
Lukas Czerner authored
We should warn user then the discard request fails. However we need to exclude -EOPNOTSUPP case since parts of the device might not support it while other parts can. So print the kernel warning when the error != -EOPNOTSUPP is returned from ext4_issue_discard(). We should also handle error cases in batched discard, again excluding EOPNOTSUPP. Reviewed-by:
Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by:
Lukas Czerner <lczerner@redhat.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
Alan Cox authored
Signed-off-by:
Alan Cox <alan@linux.intel.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
Eric Sandeen authored
I think the whole function could be made prettier, but that goto really took the cake for too-clever-by-half. Signed-off-by:
Eric Sandeen <sandeen@redhat.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 22 Oct, 2012 2 commits
-
-
Lukas Czerner authored
Currently if len argument in ext4_trim_fs() is smaller than one block, the 'end' variable underflow. Avoid that by returning EINVAL if len is smaller than file system block. Also remove useless unlikely(). Signed-off-by:
Lukas Czerner <lczerner@redhat.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
Tao Ma authored
In mke2fs, we only checksum the whole bitmap block and it is right. While in the kernel, we use EXT4_BLOCKS_PER_GROUP to indicate the size of the checksumed bitmap which is wrong when we enable bigalloc. The right size should be EXT4_CLUSTERS_PER_GROUP and this patch fixes it. Also as every caller of ext4_block_bitmap_csum_set and ext4_block_bitmap_csum_verify pass in EXT4_BLOCKS_PER_GROUP(sb)/8, we'd better removes this parameter and sets it in the function itself. Signed-off-by:
Tao Ma <boyu.mt@taobao.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Reviewed-by:
Lukas Czerner <lczerner@redhat.com> Cc: stable@vger.kernel.org
-
- 27 Sep, 2012 2 commits
-
-
Lukas Czerner authored
With a minor tweaks regarding minimum extent size to discard and discarded bytes reporting the FITRIM can be enabled on bigalloc file system and it works without any problem. This patch fixes minlen handling and discarded bytes reporting to take into consideration bigalloc enabled file systems and finally removes the restriction and allow FITRIM to be used on file system with bigalloc feature enabled. Reviewed-by:
Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by:
Lukas Czerner <lczerner@redhat.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
Wei Yongjun authored
Using kmem_cache_zalloc() instead of kmem_cache_alloc() and memset(). spatch with a semantic match is used to found this problem. (http://coccinelle.lip6.fr/) Signed-off-by:
Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 24 Sep, 2012 1 commit
-
-
Yongqiang Yang authored
Free block counters should be checked before doing allocation. Signed-off-by:
Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 18 Sep, 2012 1 commit
-
-
Theodore Ts'o authored
This is a revert of commit b56ff9d3, which removed the call to ext4_issue_discard() to fix a BUG reported because ext4_issue_discard() was being called from inside a block group spinlock. As it turns out this bug had already been fixed by Lukas Czerner in commit 53fdcf99 by the simple expedient of moving when we call ext4_issue_discard() outside the spinlock. So it should be safe to re-enable this functionality, which I tested by putting an BUG_ON(in_atomic) just after the restored callsite to ext4_issue_discard(). Addresses-Google-Bug: #6750518 Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: Anatol Pomozov <anatol.pomozov@gmail.com>
-
- 05 Sep, 2012 1 commit
-
-
Theodore Ts'o authored
Previously we allocated the s_group_info array with enough space for any future possible growth of the file system via online resize. This is unfortunate because it wastes memory, and it doesn't work for the meta_bg scheme, since there is no limit based on the number of reserved gdt blocks. So add the code to grow the s_group_info array as needed. Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 01 Sep, 2012 1 commit
-
-
Anatol Pomozov authored
Signed-off-by:
Anatol Pomozov <anatol.pomozov@gmail.com> Signed-off-by:
Jiri Kosina <jkosina@suse.cz>
-
- 17 Aug, 2012 2 commits
-
-
Robin Dong authored
All the routines call mb_find_extent are setting argument 'order' to 0 just like: mb_find_extent(e4b, 0, ex.fe_start, ex.fe_len, &ex); therefore the useless argument should be removed. Signed-off-by:
Robin Dong <sanbai@taobao.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
Theodore Ts'o authored
Add a short circuit check to ext4_mb_group_group() so that we don't bother to load the block bitmap for a block group which does not have any space available. (Or which does not have enough space until we are in desperation mode, i.e., when cr == 3.) Resolves-bug: https://bugzilla.kernel.org/show_bug.cgi?id=45741 Reported-by: mirek@me.com Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 23 Jul, 2012 1 commit
-
-
Jan Kara authored
Commit a0375156 properly notes that superblock doesn't need to be marked as dirty when only number of free inodes / blocks / number of directories changes since that is recomputed on each mount anyway. However that comment leaves some unnecessary markings as dirty in place. Remove these. Artem: tested using xfstests for both journalled and non-journalled ext4. Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Tested-by:
Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
-
- 09 Jul, 2012 1 commit
-
-
Haibo Liu authored
In this patch, the statement "poff = block % blocks_per_page" in ext4_mb_get_buddy_page_lock has no effect. It will be optimized out by the compiler, but it's better to remove it. Signed-off-by:
Haibo Liu <HaiboLiu6@gmail.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 30 Jun, 2012 1 commit
-
-
Aditya Kali authored
Currently ext4_mb_load_buddy is called for every group, irrespective of whether the group info is already in memory, while reading /proc/fs/ext4/<partition>/mb_groups proc file. For the purpose of mb_groups proc file, it is unnecessary to load the file group info from disk if it was loaded in past. These calls to ext4_mb_load_buddy make reading the mb_groups proc file expensive. Also, the locks around ext4_get_group_info are not required. This patch modifies the code to call ext4_mb_load_buddy only if the group info had never been loaded into memory in past. It also removes the mb group locking around ext4_get_group_info call. Signed-off-by:
Aditya Kali <adityakali@google.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 01 Jun, 2012 2 commits
-
-
Salman Qazi authored
We can't have references held on pages in the s_buddy_cache while we are trying to truncate its pages and put the inode. All the pages must be gone before we reach clear_inode. This can only be gauranteed if we can prevent new users from grabbing references to s_buddy_cache's pages. The original bug can be reproduced and the bug fix can be verified by: while true; do mount -t ext4 /dev/ram0 /export/hda3/ram0; \ umount /export/hda3/ram0; done & while true; do cat /proc/fs/ext4/ram0/mb_groups; done Signed-off-by:
Salman Qazi <sqazi@google.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
-
Salman Qazi authored
ext4_free_blocks fails to pair an ext4_mb_load_buddy with a matching ext4_mb_unload_buddy when it fails a memory allocation. Signed-off-by:
Salman Qazi <sqazi@google.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
-
- 28 May, 2012 2 commits
-
-
Zheng Liu authored
remove 'len' variable in ext4_discard_allocated_blocks() because it is useless. Signed-off-by:
Zheng Liu <wenqing.lz@taobao.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
Akira Fujita authored
needs_recovery in ext4_mb_init() is not used, remove it. Signed-off-by:
Akira Fujita <a-fujita@rs.jp.ne.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-