- 10 Mar, 2024 38 commits
-
-
Kent Overstreet authored
Our proliferation of memalloc_*_{save,restore} APIs is getting a bit silly, this adds a generic version and converts the existing save/restore functions to wrappers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Darrick J. Wong <djwong@kernel.org> Cc: linux-mm@kvack.org Acked-by: Vlastimil Babka <vbabka@suse.cz>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Going to be adding more code here for checking subvol structure. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Repurposing standard error codes in bcachefs code is banned in new code, and we need to get rid of the remaining ones - private error codes give us much better error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
WQ_UNBOUND with max_active 1 means ordered workqueue, but we don't actually need or want ordered semantics - and probably want a higher concurrency limit anyways. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Need to fix this oversight for the new FS_IOC_(GET|SET)UUID ioctls. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
switch the statfs code from something horrible and open coded to the more standard uuid_to_fsid() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Files within a subvolume cannot be renamed into another subvolume, but subvolumes themselves were intended to be. This implements subvolume renaming - we need to ensure that there's only a single dirent that points to a subvolume key (not multiple versions in different snapshots), and we need to ensure that dirent.d_parent_subol and inode.bi_parent_subvol are updated. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
btree_and_journal_iter is old code that we want to get rid of, but we're not ready to yet. lack of btree node prefetching is, it turns out, a real performance issue for fsck on spinning rust, so - add it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
we now always have a btree_trans when using a btree_and_journal_iter; prep work for adding prefetching to btree_and_journal_iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Recently a severe performance regression was discovered, which bisected to a6548c8b bcachefs: Avoid flushing the journal in the discard path It turns out the old behaviour, which issued excessive journal flushes, worked around a performance issue where queueing delays would cause the journal to not be able to write quickly enough and stall. The journal flushes masked the issue because they periodically flushed the device write cache, reducing write latency for non flushes. This patch reworks the journalling code to allow more than one (non-flush) write to be in flight at a time. With this patch, doing 4k random writes and an iodepth of 128, we are now able to hit 560k iops to a Samsung 970 EVO Plus - previously, we were stuck in the ~200k range. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Prep work for having multiple journal writes in flight. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Prep work for having multiple journal writes in flight. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This gives us a way to record the date and time every journal entry was written - useful for debugging. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Eliminates some error paths - no longer have a hardcoded BCH_REPLICAS_MAX limit. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Drop an unnecessary bch2_subvolume_get_snapshot() call, and drop the __ from the name - this is a normal interface. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Minor renaming for clarity, bit of refactoring. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Most bcachefs workqueues are used for completions, and should be WQ_HIGHPRI - this helps reduce queuing delays, we want to complete quickly once we can no longer signal backpressure by blocking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
For DT_SUBVOL, we now print both parent and child subvol IDs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Previously, any time we failed to get a journal reservation we'd retry, with the journal lock held; but this isn't necessary given wait_event()/wake_up() ordering. This avoids performance cliffs when the journal starts to get backed up and lock contention shoots up. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We don't want journal write completions to be blocked behind btree transactions - io_complete_wq is used for btree updates after data and metadata writes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Guoyu Ou authored
When we are checking whether a subvolume is empty in the specified snapshot, entries that do not belong to this subvolume should be skipped. This fixes the following case: $ bcachefs subvolume create ./sub $ cd sub $ bcachefs subvolume create ./sub2 $ bcachefs subvolume snapshot . ./snap $ ls -a snap . .. $ rmdir snap rmdir: failed to remove 'snap': Directory not empty As Kent suggested, we pass 0 in may_delete_deleted_inode() to ignore subvols in the subvol we are checking, because inode.bi_subvol is only set on subvolume roots, and we can't go through every inode in the subvolume and change bi_subvol when taking a snapshot. It makes the check less strict, but that's ok, the rest of fsck will still catch it. Signed-off-by: Guoyu Ou <benogy@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We were failing to set path->uptodate when reaching the end of a btree node iterator, causing the new prefetch code for backpointers gc to go into an infinite loop. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
validate_bset_keys() never properly validated k->u64s; it checked if it was 0, but not if it was smaller than keys for the given packed format; this fixes that small oversight. This patch was backported, so it's adding quite a few error enums so that they don't get renumbered and we don't have confusing gaps. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We don't know where the superblock and journal lives on offline devices; that means if a device is offline fsck can't check those buckets. Previously, fsck would incorrectly clear bucket data types for those buckets on offline devices; now we just use the previous state. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
When a btree root is unreadable, we still might be able to get some data back by replaying what's in the journal. Previously though, we got confused when journal replay would attempt to replay a key for a level that didn't exist. This adds bch2_btree_increase_depth(), so that journal replay can handle this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
check_inode_deleted_list() returns true if the inode is on the deleted list; check_inode() was checking the return code incorrectly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This adds an option to disable kicking out devices when splitbrain is detected - it seems there's some issues with splitbrain detection and we're kicking out devices erronously. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We need to be able to iterate over extent ptrs that may be corrupted in order to print them - this fixes a bug where we'd pop an assert in bch2_bkey_durability_safe(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
bch2_journal_seq_blacklist_add() was bugged when the new entry overlapped with multiple existing entries, and it also assumed new entries are being added in increasing order. This is true on any sane filesystem, but when trying to recover from very badly mangled filesystems we might end up with the journal sequence number rewinding vs. what the blacklist list knows about - easiest to just handle that here. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Li Zetao authored
There is a null-ptr-deref issue reported by kasan: KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] Call Trace: <TASK> bch2_fs_alloc+0x1092/0x2170 [bcachefs] bch2_fs_open+0x683/0xe10 [bcachefs] ... When initializing the name of bch_fs, it needs to dynamically alloc memory to meet the length of the name. However, when name allocation failed, it will cause a null-ptr-deref access exception in subsequent string copy. Fix this issue by checking if name allocation is successful. Fixes: 401ec4db ("bcachefs: Printbuf rework") Signed-off-by: Li Zetao <lizetao1@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
- 25 Feb, 2024 2 commits
-
-
Linus Torvalds authored
-
https://evilpiepirate.org/git/bcachefsLinus Torvalds authored
Pull bcachefs fixes from Kent Overstreet: "Some more mostly boring fixes, but some not User reported ones: - the BTREE_ITER_FILTER_SNAPSHOTS one fixes a really nasty performance bug; user reported an untar initially taking two seconds and then ~2 minutes - kill a __GFP_NOFAIL in the buffered read path; this was a leftover from the trickier fix to kill __GFP_NOFAIL in readahead, where we can't return errors (and have to silently truncate the read ourselves). bcachefs can't use GFP_NOFAIL for folio state unlike iomap based filesystems because our folio state is just barely too big, 2MB hugepages cause us to exceed the 2 page threshhold for GFP_NOFAIL. additionally, the flags argument was just buggy, we weren't supplying GFP_KERNEL previously (!)" * tag 'bcachefs-2024-02-25' of https://evilpiepirate.org/git/bcachefs: bcachefs: fix bch2_save_backtrace() bcachefs: Fix check_snapshot() memcpy bcachefs: Fix bch2_journal_flush_device_pins() bcachefs: fix iov_iter count underflow on sub-block dio read bcachefs: Fix BTREE_ITER_FILTER_SNAPSHOTS on inodes btree bcachefs: Kill __GFP_NOFAIL in buffered read path bcachefs: fix backpointer_to_text() when dev does not exist
-