- 15 Mar, 2022 6 commits
-
-
Ritesh Harjani authored
All ext4 & jbd2 trace events starts with "dev Major:Minor". While we are still improving/adding the ftrace events for FC, let's fix last two remaining trace events to follow the same convention. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/8f33b163f0f29df2491c03b79f8ac96890ea5184.1647057583.git.riteshh@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Ritesh Harjani authored
This adds commit_tid info in ext4_fc_commit_start/stop which is helpful in debugging fast_commit issues. For e.g. issues where due to jbd2 journal full commit, FC miss to commit updates to a file. Also improves TP_prink format string i.e. all ext4 and jbd2 trace events starts with "dev MAjOR,MINOR". Let's follow the same convention while we are still at it. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/ebcd6b9ab5b718db30f90854497886801ce38c63.1647057583.git.riteshh@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Ritesh Harjani authored
This adds commit_tid argument in ext4_fc_update_stats() so that we can add this information too in jbd_debug logs. This is also required in a later patch to pass the commit_tid info in ext4_fc_commit_start/stop() trace events. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/dabda3f2919a60e01887e798bf5915216b451733.1647057583.git.riteshh@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Ritesh Harjani authored
This patch adds the transaction & inode tid info in trace events for callers of ext4_fc_track_template(). This is helpful in debugging race conditions where an inode could belong to two different transaction tids. It also fixes the checkpatch warnings which says use tabs instead of spaces. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/c203c09dc11bb372803c430f621f25a4b8c2c8b4.1647057583.git.riteshh@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Ritesh Harjani authored
This adds a new trace event in ext4_fc_cleanup() which is helpful in debugging some fast_commit issues. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/794cdb1d5d3622f3f80d30c222ff6652ea68c375.1647057583.git.riteshh@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Ritesh Harjani authored
Currently ext4_fc_track_template() checks, whether the trace event path belongs to replay or does sb has ineligible set, if yes it simply returns. This patch pulls those checks before calling ext4_fc_track_template() in the callers of ext4_fc_track_template(). [ Add checks to ext4_rename() which calls the __ext4_fc_track_*() functions directly. -- TYT ] Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/3cd025d9c490218a92e6d8fb30b6123e693373e3.1647057583.git.riteshh@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
- 13 Mar, 2022 8 commits
-
-
Ritesh Harjani authored
This just puts trace_ext4_fc_commit_start(sb) & ktime_get() for measuring FC commit time, after the check of whether sb supports JOURNAL_FAST_COMMIT or not. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/d53cf3e535924ec0a1eb41a560e96561b0727e7a.1647057583.git.riteshh@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Ritesh Harjani authored
One should use DECLARE_EVENT_CLASS for similar event types instead of defining TRACE_EVENT for each event type. This is helpful in reducing the text section footprint for e.g. [1] [1]: https://lwn.net/Articles/381064/Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/a019cb46219ef4b30e4d98d7ced7d8819a2fc61d.1647057583.git.riteshh@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Ritesh Harjani authored
ftrace's __print_symbolic() requires that any enum values used in the symbol to string translation table be wrapped in a TRACE_DEFINE_ENUM so that the enum value can be decoded from the ftrace ring buffer by user space tooling. This patch also fixes few other problems found in this trace point. e.g. dereferencing structures in TP_printk which should not be done at any cost. Also to avoid checkpatch warnings, this patch removes those whitespaces/tab stops issues. Cc: stable@kernel.org Fixes: aa75f4d3 ("ext4: main fast-commit commit path") Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/b4b9691414c35c62e570b723e661c80674169f9a.1647057583.git.riteshh@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Ritesh Harjani authored
Below commit removed all references of EXT4_FC_COMMIT_FAILED. commit 0915e464 ("ext4: simplify updating of fast commit stats") Just remove it since it is not used anymore. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/c941357e476be07a1138c7319ca5faab7fb80fc6.1647057583.git.riteshh@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Jan Kara authored
Recently I've got a report of BUG_ON trigerring during transaction commit in ext4_journalled_writepage_callback() because we've spotted a dirty page without buffers. Add WARN_ON_ONCE to ext4_journalled_set_page_dirty() to catch the problematic condition earlier where we have better chance of understanding which code path is creating dirty data without preparing the page properly. Also update the comment with current information when we are at it. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220310101832.5645-1-jack@suse.czSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
lianzhi chang authored
The unit of file system size should be TiB, not PiB Signed-off-by: lianzhi chang <changlianzhi@uniontech.com> Link: https://lore.kernel.org/r/20220310014415.29937-1-changlianzhi@uniontech.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Ojaswin Mujoo authored
Currently mb_optimize_scan scan feature which improves filesystem performance heavily (when FS is fragmented), seems to be not working with files with extents (ext4 by default has files with extents). This patch fixes that and makes mb_optimize_scan feature work for files with extents. Below are some performance numbers obtained when allocating a 10M and 100M file with and w/o this patch on a filesytem with no 1M contiguous block. <perf numbers> =============== Workload: dd if=/dev/urandom of=test conv=fsync bs=1M count=10/100 Time taken ===================================================== no. Size without-patch with-patch Diff(%) 1 10M 0m8.401s 0m5.623s 33.06% 2 100M 1m40.465s 1m14.737s 25.6% <debug stats> ============= w/o patch: mballoc: reqs: 17056 success: 11407 groups_scanned: 13643 cr0_stats: hits: 37 groups_considered: 9472 useless_loops: 36 bad_suggestions: 0 cr1_stats: hits: 11418 groups_considered: 908560 useless_loops: 1894 bad_suggestions: 0 cr2_stats: hits: 1873 groups_considered: 6913 useless_loops: 21 cr3_stats: hits: 21 groups_considered: 5040 useless_loops: 21 extents_scanned: 417364 goal_hits: 3707 2^n_hits: 37 breaks: 1873 lost: 0 buddies_generated: 239/240 buddies_time_used: 651080 preallocated: 705 discarded: 478 with patch: mballoc: reqs: 12768 success: 11305 groups_scanned: 12768 cr0_stats: hits: 1 groups_considered: 18 useless_loops: 0 bad_suggestions: 0 cr1_stats: hits: 5829 groups_considered: 50626 useless_loops: 0 bad_suggestions: 0 cr2_stats: hits: 6938 groups_considered: 580363 useless_loops: 0 cr3_stats: hits: 0 groups_considered: 0 useless_loops: 0 extents_scanned: 309059 goal_hits: 0 2^n_hits: 1 breaks: 1463 lost: 0 buddies_generated: 239/240 buddies_time_used: 791392 preallocated: 673 discarded: 446 Fixes: 196e402a (ext4: improve cr 0 / cr 1 group scanning) Cc: stable@kernel.org Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com> Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com> Suggested-by: Ritesh Harjani <riteshh@linux.ibm.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/fc9a48f7f8dcfc83891a8b21f6dd8cdf056ed810.1646732698.git.ojaswin@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Ojaswin Mujoo authored
After moving to the new mount API, mb_optimize_scan mount option handling was not working as expected due to the parsed value always being overwritten by default. Refactor and fix this to the expected behavior described below: * mb_optimize_scan=1 - On * mb_optimize_scan=0 - Off * mb_optimize_scan not passed - On if no. of BGs > threshold else off * Remounts retain previous value unless we explicitly pass the option with a new value Fixes: cebe85d5 ("ext4: switch to the new mount api") Cc: stable@kernel.org Reported-by: Ritesh Harjani <riteshh@linux.ibm.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/c98970fe99f26718586d02e942f293300fb48ef3.1646732698.git.ojaswin@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
- 03 Mar, 2022 7 commits
-
-
Theodore Ts'o authored
[un]pin_user_pages_remote is dirtying pages without properly warning the file system in advance. A related race was noted by Jan Kara in 2018[1]; however, more recently instead of it being a very hard-to-hit race, it could be reliably triggered by process_vm_writev(2) which was discovered by Syzbot[2]. This is technically a bug in mm/gup.c, but arguably ext4 is fragile in that if some other kernel subsystem dirty pages without properly notifying the file system using page_mkwrite(), ext4 will BUG, while other file systems will not BUG (although data will still be lost). So instead of crashing with a BUG, issue a warning (since there may be potential data loss) and just mark the page as clean to avoid unprivileged denial of service attacks until the problem can be properly fixed. More discussion and background can be found in the thread starting at [2]. [1] https://lore.kernel.org/linux-mm/20180103100430.GE4911@quack2.suse.cz [2] https://lore.kernel.org/r/Yg0m6IjcNmfaSokM@google.com Reported-by: syzbot+d59332e2db681cf18f0318a06e994ebbb529a8db@syzkaller.appspotmail.com Reported-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/YiDS9wVfq4mM2jGK@mit.edu
-
Colin Ian King authored
Variable split_flag1 is being assigned a value that is never read, it is being re-assigned a new value in the following code block. The assignment is redundant and can be removed. Cleans up clang scan build warning: fs/ext4/extents.c:3371:2: warning: Value stored to 'split_flag1' is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20220301121644.997833-1-colin.i.king@gmail.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Zhang Yi authored
when ext4 filesystem is created with 64k block size, ^extent and ^huge_file features. the upper_limit would underflow during the computations in ext4_max_bitmap_size(). The problem is the size of block index tree for such large block size is more than i_blocks can carry. So fix the computation to count with this possibility. After this fix, the 'res' cannot overflow loff_t on the extreme case of filesystem with huge_files and 64K block size, so this patch also revert commit 75ca6ad4 ("ext4: fix loff_t overflow in ext4_max_bitmap_size()"). Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220301111704.2153829-1-yi.zhang@huawei.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Yang Li authored
Remove the excess description of @bh in ext4_mb_clear_bb() kernel-doc comment to remove warnings found by running scripts/kernel-doc, which is caused by using 'make W=1'. fs/ext4/mballoc.c:5895: warning: Excess function parameter 'bh' description in 'ext4_mb_clear_bb' Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Link: https://lore.kernel.org/r/20220301092136.34764-1-yang.lee@linux.alibaba.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Ye Bin authored
We inject IO error when rmdir non empty direcory, then got issue as follows: step1: mkfs.ext4 -F /dev/sda step2: mount /dev/sda test step3: cd test step4: mkdir -p 1/2 step5: rmdir 1 [ 110.920551] ext4_empty_dir: inject fault [ 110.921926] EXT4-fs warning (device sda): ext4_rmdir:3113: inode #12: comm rmdir: empty directory '1' has too many links (3) step6: cd .. step7: umount test step8: fsck.ext4 -f /dev/sda e2fsck 1.42.9 (28-Dec-2013) Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Entry '..' in .../??? (13) has deleted/unused inode 12. Clear<y>? yes Pass 3: Checking directory connectivity Unconnected directory inode 13 (...) Connect to /lost+found<y>? yes Pass 4: Checking reference counts Inode 13 ref count is 3, should be 2. Fix<y>? yes Pass 5: Checking group summary information /dev/sda: ***** FILE SYSTEM WAS MODIFIED ***** /dev/sda: 12/131072 files (0.0% non-contiguous), 26157/524288 blocks ext4_rmdir if (!ext4_empty_dir(inode)) goto end_rmdir; ext4_empty_dir bh = ext4_read_dirblock(inode, 0, DIRENT_HTREE); if (IS_ERR(bh)) return true; Now if read directory block failed, 'ext4_empty_dir' will return true, assume directory is empty. Obviously, it will lead to above issue. To solve this issue, if read directory block failed 'ext4_empty_dir' just return false. To avoid making things worse when file system is already corrupted, 'ext4_empty_dir' also return false. Signed-off-by: Ye Bin <yebin10@huawei.com> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20220228024815.3952506-1-yebin10@huawei.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Wang Qing authored
Use the helper function time_is_{before,after}_jiffies() to improve code readability. Signed-off-by: Wang Qing <wangqing@vivo.com> Link: https://lore.kernel.org/r/1646018120-61462-1-git-send-email-wangqing@vivo.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Ritesh Harjani authored
Currently ext4_fc_commit_dentry_updates() is of quadratic time complexity, which is causing performance bottlenecks with high threads/file/dir count with fs_mark. This patch makes commit dentry updates (and hence ext4_fc_commit()) path to linear time complexity. Hence improves the performance of workloads which does fsync on multiple threads/open files one-by-one. Absolute numbers in avg file creates per sec (from fs_mark in 1K order) ======================================================================= no. Order without-patch(K) with-patch(K) Diff(%) 1 1 16.90 17.51 +3.60 2 2,2 32.08 31.80 -0.87 3 3,3 53.97 55.01 +1.92 4 4,4 78.94 76.90 -2.58 5 5,5 95.82 95.37 -0.46 6 6,6 87.92 103.38 +17.58 7 6,10 0.73 126.13 +17178.08 8 6,14 2.33 143.19 +6045.49 workload type ============== For e.g. 7th row order of 6,10 (2^6 == 64 && 2^10 == 1024) echo /run/riteshh/mnt/{1..64} |sed -E 's/[[:space:]]+/ -d /g' \ | xargs -I {} bash -c "sudo fs_mark -L 100 -D 1024 -n 1024 -s0 -S5 -d {}" Perf profile (w/o patches) ============================= 87.15% [kernel] [k] ext4_fc_commit --> Heavy contention/bottleneck 1.98% [kernel] [k] perf_event_interrupt 0.96% [kernel] [k] power_pmu_enable 0.91% [kernel] [k] update_sd_lb_stats.constprop.0 0.67% [kernel] [k] ktime_get Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/930f35d4fd5f83e2673c868781d9ebf15e91bf4e.1645426817.git.riteshh@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
- 26 Feb, 2022 13 commits
-
-
Ritesh Harjani authored
This patch adds an extra checks in ext4_mb_mark_bb() function to make sure we mark & report error if we were to mark/clear any of the critical FS metadata specific bitmaps (&bail out) to prevent from any accidental corruption. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/53cbb6f2573db162a57f935365050d8b1df202ee.1644992610.git.riteshh@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Ritesh Harjani authored
Currently ext4_mb_clear_bb() & ext4_group_add_blocks() only checks whether the given block ranges (which is to be freed) belongs to any FS metadata blocks or not, of the block's respective block group. But to detect any FS error early, it is better to add more strict checkings in those functions which checks whether the given blocks belongs to any critical FS metadata or not within system-zone. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/ddd9143d064774e32d6364a99667817c6e8bfdc0.1644992610.git.riteshh@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Ritesh Harjani authored
This API will be needed at places where we don't have an inode for e.g. while freeing blocks in ext4_group_add_blocks() Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/dd34a236543ad5ae7123eeebe0cb69e6bdd44f34.1644992610.git.riteshh@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Ritesh Harjani authored
We don't need the return value of mb_test_and_clear_bits() in ext4_mb_mark_bb() So simply use mb_clear_bits() instead. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/a971935306dafb124da0193c7dad1aa485210b62.1644992610.git.riteshh@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Ritesh Harjani authored
ext4_set_bits() should actually be mb_set_bits() for uniform API naming convention. This is via below cmd - grep -nr "ext4_set_bits" fs/ext4/ | cut -d ":" -f 1 | xargs sed -i 's/ext4_set_bits/mb_set_bits/g' Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/f1f6ece1405b76a7a987e9145d1adfaf71e30695.1644992610.git.riteshh@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Ritesh Harjani authored
Instead of open coding it, use in_range() function instead. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/8e5526ef14150778871ac7c937c8993c6a09cd3e.1644992610.git.riteshh@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Ritesh Harjani authored
ext4_free_blocks() function became too long and confusing, this patch just pulls out the ext4_mb_clear_bb() function logic from it which clears the block bitmap and frees it. No functionality change in this patch Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/22c30fbb26ba409cf8aa5f0c7912970272c459e8.1644992610.git.riteshh@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Ritesh Harjani authored
In case of flex_bg feature (which is by default enabled), extents for any given inode might span across blocks from two different block group. ext4_mb_mark_bb() only reads the buffer_head of block bitmap once for the starting block group, but it fails to read it again when the extent length boundary overflows to another block group. Then in this below loop it accesses memory beyond the block group bitmap buffer_head and results into a data abort. for (i = 0; i < clen; i++) if (!mb_test_bit(blkoff + i, bitmap_bh->b_data) == !state) already++; This patch adds this functionality for checking block group boundary in ext4_mb_mark_bb() and update the buffer_head(bitmap_bh) for every different block group. w/o this patch, I was easily able to hit a data access abort using Power platform. <...> [ 74.327662] EXT4-fs error (device loop3): ext4_mb_generate_buddy:1141: group 11, block bitmap and bg descriptor inconsistent: 21248 vs 23294 free clusters [ 74.533214] EXT4-fs (loop3): shut down requested (2) [ 74.536705] Aborting journal on device loop3-8. [ 74.702705] BUG: Unable to handle kernel data access on read at 0xc00000005e980000 [ 74.703727] Faulting instruction address: 0xc0000000007bffb8 cpu 0xd: Vector: 300 (Data Access) at [c000000015db7060] pc: c0000000007bffb8: ext4_mb_mark_bb+0x198/0x5a0 lr: c0000000007bfeec: ext4_mb_mark_bb+0xcc/0x5a0 sp: c000000015db7300 msr: 800000000280b033 dar: c00000005e980000 dsisr: 40000000 current = 0xc000000027af6880 paca = 0xc00000003ffd5200 irqmask: 0x03 irq_happened: 0x01 pid = 5167, comm = mount <...> enter ? for help [c000000015db7380] c000000000782708 ext4_ext_clear_bb+0x378/0x410 [c000000015db7400] c000000000813f14 ext4_fc_replay+0x1794/0x2000 [c000000015db7580] c000000000833f7c do_one_pass+0xe9c/0x12a0 [c000000015db7710] c000000000834504 jbd2_journal_recover+0x184/0x2d0 [c000000015db77c0] c000000000841398 jbd2_journal_load+0x188/0x4a0 [c000000015db7880] c000000000804de8 ext4_fill_super+0x2638/0x3e10 [c000000015db7a40] c0000000005f8404 get_tree_bdev+0x2b4/0x350 [c000000015db7ae0] c0000000007ef058 ext4_get_tree+0x28/0x40 [c000000015db7b00] c0000000005f6344 vfs_get_tree+0x44/0x100 [c000000015db7b70] c00000000063c408 path_mount+0xdd8/0xe70 [c000000015db7c40] c00000000063c8f0 sys_mount+0x450/0x550 [c000000015db7d50] c000000000035770 system_call_exception+0x4a0/0x4e0 [c000000015db7e10] c00000000000c74c system_call_common+0xec/0x250 Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/2609bc8f66fc15870616ee416a18a3d392a209c4.1644992609.git.riteshh@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Ritesh Harjani authored
ext4_mb_mark_bb() currently wrongly calculates cluster len (clen) and flex_group->free_clusters. This patch fixes that. Identified based on code review of ext4_mb_mark_bb() function. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/a0b035d536bafa88110b74456853774b64c8ac40.1644992609.git.riteshh@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Ritesh Harjani authored
CONFIG_JBD2_DEBUG and jbd2_journal_enable_debug knobs were added in update_t_max_wait(), since earlier it used to take a spinlock for updating t_max_wait, which could cause a bottleneck while starting a txn (start_this_handle()). Since in previous patch, we have killed t_handle_lock completely, we could get rid of this debug config and knob to let t_max_wait be updated by default again. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/ad7319a601fd501079310747ce87d908e0944763.1644992076.git.riteshh@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Ritesh Harjani authored
This patch kills t_handle_lock transaction spinlock completely from jbd2. To explain the reasoning, currently there were three sites at which this spinlock was used. 1. jbd2_journal_wait_updates() a. Based on careful code review it can be seen that, we don't need this lock here. This is since we wait for any currently ongoing updates based on a atomic variable t_updates. And we anyway don't take any t_handle_lock while in stop_this_handle(). i.e. write_lock(&journal->j_state_lock() jbd2_journal_wait_updates() stop_this_handle() while (atomic_read(txn->t_updates) { | DEFINE_WAIT(wait); | prepare_to_wait(); | if (atomic_read(txn->t_updates) if (atomic_dec_and_test(txn->t_updates)) write_unlock(&journal->j_state_lock); schedule(); wake_up() write_lock(&journal->j_state_lock); finish_wait(); } txn->t_state = T_COMMIT write_unlock(&journal->j_state_lock); b. Also note that between atomic_inc(&txn->t_updates) in start_this_handle() and jbd2_journal_wait_updates(), the synchronization happens via read_lock(journal->j_state_lock) in start_this_handle(); 2. jbd2_journal_extend() a. jbd2_journal_extend() is called with the handle of each process from task_struct. So no lock required in updating member fields of handle_t b. For member fields of h_transaction, all updates happens only via atomic APIs (which is also within read_lock()). So, no need of this transaction spinlock. 3. update_t_max_wait() Based on Jan suggestion, this can be carefully removed using atomic cmpxchg API. Note that there can be several processes which are waiting for a new transaction to be allocated and started. For doing this only one process will succeed in taking write_lock() and allocating a new txn. After that all of the process will be updating the t_max_wait (max transaction wait time). This can be done via below method w/o taking any locks using atomic cmpxchg. For more details refer [1] new = get_new_val(); old = READ_ONCE(ptr->max_val); while (old < new) old = cmpxchg(&ptr->max_val, old, new); [1]: https://lwn.net/Articles/849237/Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/d89e599658b4a1f3893a48c6feded200073037fc.1644992076.git.riteshh@linux.ibm.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
Ritesh Harjani authored
jbd2_journal_wait_updates() is called with j_state_lock held. But if there is a commit in progress, then this transaction might get committed and freed via jbd2_journal_commit_transaction() -> jbd2_journal_free_transaction(), when we release j_state_lock. So check for journal->j_running_transaction everytime we release and acquire j_state_lock to avoid use-after-free issue. Link: https://lore.kernel.org/r/948c2fed518ae739db6a8f7f83f1d58b504f87d0.1644497105.git.ritesh.list@gmail.com Fixes: 4f981868 ("jbd2: refactor wait logic for transaction updates into a common function") Cc: stable@kernel.org Reported-and-tested-by: syzbot+afa2ca5171d93e44b348@syzkaller.appspotmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Lukas Czerner authored
After commit 6e47a3cc ("ext4: get rid of super block and sbi from handle_mount_ops()") the 'abort' options stopped working. This is because we're using ctx_set_mount_flags() helper that's expecting an argument with the appropriate bit set, but instead got EXT4_MF_FS_ABORTED which is a bit position. ext4_set_mount_flag() is using set_bit() while ctx_set_mount_flags() was using bitwise OR. Create a separate helper ctx_set_mount_flag() to handle setting the mount_flags correctly. While we're at it clean up the EXT4_SET_CTX macros so that we're only creating helpers that we actually use to avoid warnings. Fixes: 6e47a3cc ("ext4: get rid of super block and sbi from handle_mount_ops()") Signed-off-by: Lukas Czerner <lczerner@redhat.com> Cc: Ye Bin <yebin10@huawei.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Tested-by: Gabriel Krisman Bertazi <krisman@collabora.com> Link: https://lore.kernel.org/r/20220201131345.77591-1-lczerner@redhat.comSigned-off-by: Theodore Ts'o <tytso@mit.edu>
-
- 20 Feb, 2022 6 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull locking fix from Borislav Petkov: "Fix a NULL ptr dereference when dumping lockdep chains through /proc/lockdep_chains" * tag 'locking_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lockdep: Correct lock_classes index mapping
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 fixes from Borislav Petkov: - Fix the ptrace regset xfpregs_set() callback to behave according to the ABI - Handle poisoned pages properly in the SGX reclaimer code * tag 'x86_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ptrace: Fix xfpregs_set()'s incorrect xmm clearing x86/sgx: Fix missing poison handling in reclaimer
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull scheduler fix from Borislav Petkov: "Fix task exposure order when forking tasks" * tag 'sched_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix yet more sched_fork() races
-
git://git.kernel.org/pub/scm/linux/kernel/git/ras/rasLinus Torvalds authored
Pull EDAC fix from Borislav Petkov: "Fix a long-standing struct alignment bug in the EDAC struct allocation code" * tag 'edac_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC: Fix calculation of returned address and next offset in edac_align_ptr()
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds authored
Pull SCSI fixes from James Bottomley: "Three fixes, all in drivers. The ufs and qedi fixes are minor; the lpfc one is a bit bigger because it involves adding a heuristic to detect and deal with common but not standards compliant behaviour" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ufs: core: Fix divide by zero in ufshcd_map_queues() scsi: lpfc: Fix pt2pt NVMe PRLI reject LOGO loop scsi: qedi: Fix ABBA deadlock in qedi_process_tmf_resp() and qedi_process_cmd_cleanup_resp()
-