- 22 Oct, 2023 40 commits
-
-
Kent Overstreet authored
On failure to get a write lock (because we had a conflicting read lock), we need to make sure to upgrade the read lock to an intent lock - or we could end up spinning. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
The bch2_btree_path_upgrade() call was failing and tripping an assert - path->level + 1 is in this case not necessarily exactly what we want, fix it by upgrading exactly the locks we want. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This allows triggers to distinguish between a key entering the btree - i.e. being called from the trans commit path - vs. being called on a key that already exists, i.e. by GC. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This helps to unify the interface between bch2_mark_key() and bch2_trans_mark_key() - and it also gives access to the journal reservation and journal seq in the mark_key path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
- The backpointer that ec_stripe_update_ptrs() uses now needs to include the snapshot ID, which means we have to change where we add the backpointer to after getting the snapshot ID for the new extents - ec_stripe_update_ptrs() needs to be calling bch2_trans_begin() - improve error message in bch2_mark_stripe() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
When the old or new key doesn't exist, we should still pass in a deleted key with the correct pos. This fixes a bug in the ec code, when bch2_mark_stripe() was looking up the wrong in-memory stripe. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This tweaks the journal code to always act as if there's space available in nochanges mode, when we're not going to be doing any writes. This helps in recovering filesystems that won't mount because they need journal replay and the journal has gotten stuck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
The fsck code has been handling transaction restarts locally, to avoid calling fsck_err() multiple times (and asking the user/logging the error multiple times) on transaction restart. However, with our improving assertions about iterator validity, this isn't working anymore - the code wasn't entirely correct, in ways that are fine for now but are going to matter once we start wanting online fsck. This code converts much of the fsck code to handle transaction restarts in a more rigorously correct way - moving restart handling up to the top level of check_dirent, check_xattr and others - at the cost of logging errors multiple times on transaction restart. Fixing the issues with logging errors multiple times is probably going to require memoizing calls to fsck_err() - we'll leave that for future improvements. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Was popping an assertion on !BTREE_ITER_ALL_SNAPSHOTS iters when getting to the end. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This fixes building in userspace - code that's coupled to the kernel VFS interface should live in fs.c Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
nochanges mode is often used for getting data off of otherwise nonrecoverable filesystems, which is often because of errors hit during fsck. Don't force version upgrade & fsck in nochanges mode, so that it's more likely to mount. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Back when we relied on the journal sequence number blacklist machinery for consistency between btree and the journal, we needed to ensure a new journal entry was written before any btree writes were done. But, this had the side effect of consuming some space in the journal prior to doing journal replay - which could lead to a very wedged filesystem, since we don't yet have a way to grow the journal prior to going RW. Fortunately, the journal sequence number blacklist machinery isn't needed anymore, as btree node pointers now record the numer of sectors currently written to that node - that code should all be ripped out. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Snapshot deletion needs to become a multi step process, where we unlink, then tear down the page cache, then delete the subvolume - the deleting flag is equivalent to an inode with i_nlink = 0. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
We should always print out the key we were marking. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
It seems some users have reflink pointers which span many indirect extents, from a short window in time when merging of reflink pointers was allowed. Now, we're seeing transaction path overflows in fix_reflink_p(), the code path to clear out the reflink_p fields now used for front/back pad - but, we don't actually need to be running triggers in that path, which is an easy partial fix. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
for_each_btree_key() now calls bch2_trans_begin() as needed; that means, we can also call it when we're in danger of overflowing transaction paths. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
The way __bch2_mark_reflink_p returns errors was clashing with returning the number of sectors processed - we weren't returning FSCK_ERR_EXIT correctly. Fix this by only using the return code for errors, which actually ends up simplifying the overall logic. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This fixes a possible race where we fail to remove a device because of btree nodes still on it, that are being deleted by in flight btree updates. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
We have been getting away from handling transaction restarts locally - convert bch2_btree_node_rewrite() to the newer style. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
We were modifying state, then return -EINTR, causing us to skip nodes - ouch. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
But we don't need to call it from outside the btree iterator code anymore, since it's called by bch2_trans_begin() and bch2_btree_path_traverse(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This is a hacky but effective fix to device usage stats for superblock and journal being wrong on a newly added device (following the comment that already told us how it needed to be done!) Reported-by: Chris Webb <chris@arachsys.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
readdir() in a directory with many subvolumes could overflow transaction paths - this is a simple hack around the issue. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
The back merging case wasn't returning errors correctly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This changes the on disk format for dirents that point to subvols so that they also record the subvolid of the parent subvol, so that we can filter them out in other subvolumes. This also updates the dirent code to do that filtering, and in particular tweaks the rename code - we need to ensure that there's only ever one dirent (counting multiplicities in different snapshots) that point to a subvolume. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Code that uses for_each_btree_key often wants transaction restarts to be handled locally and not returned. Originally, we wouldn't return transaction restarts if there was a single iterator in the transaction - the reasoning being if there weren't other iterators being invalidated, and the current iterator was being advanced/retraversed, there weren't any locks or iterators we were required to preserve. But with the btree_path conversion that approach doesn't work anymore - even when we're using for_each_btree_key() with a single iterator there will still be two paths in the transaction, since we now always preserve the path at the pos the iterator was initialized at - the reason being that on restart we often restart from the same place. And it turns out there's now a lot of for_each_btree_key() uses that _do not_ want transaction restarts handled locally, and should be returning them. This patch splits out for_each_btree_key_norestart() and for_each_btree_key_continue_norestart(), and converts existing users as appropriate. for_each_btree_key(), for_each_btree_key_continue(), and for_each_btree_node() now handle transaction restarts themselves by calling bch2_trans_begin() when necessary - and the old hack to not return transaction restarts when there's a single path in the transaction has been deleted. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
It's not an error if we don't have cached data - skip BCH_DATA_cached in bch2_have_enough_devs(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This fixes a bug where subsequently doing creates with the same name fails. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
check_path() wasn't checking the snapshot ID when checking for directory structure loops - so, snapshots would cause us to detect a loop that wasn't actually a loop. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
When a reflink pointer points to only part of an indirect extent, and then that indirect extent is fragmented (e.g. by copygc), if the reflink pointer only points to one of the fragments we leak a reference. Fix this by storing front/back pad values in reflink pointers - when inserting reflink pointesr, we initialize them to cover the full range of the indirect extents we reference. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
We had a bug where reflink_p pointers weren't being initialized to 0, and when we started using the second word, things broke badly. This patch revs the on disk format version and adds cleanup code to zero out the second word of reflink_p pointers before we start using it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
It shouldn't be necessary when we're only using a single iterator and not doing updates, but that's harder to debug at the moment. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Now that peek_node()/next_node() are converted to return errors directly, we don't need bch2_trans_exit() to return errors - it's cleaner this way and wasn't used much anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This changes for_each_btree_node() to work like for_each_btree_key(), and to that end bch2_btree_iter_peek_node() and next_node() also return error ptrs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
When a reflink pointer points to an indirect extent that doesn't exist, we need to replace it with a KEY_TYPE_error key. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Checking of directory structure across subvolumes was broken - we need to look up the snapshot ID of the parent subvolume when crossing subvol boundaries. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Subvolume deletion doesn't flush & evict the btree key cache - ideally it would, but that's tricky, so instead bch2_subvolume_create() needs to make sure the slot doesn't exist in the key cache to avoid triggering assertions. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Brett Holman authored
Type size_t is architecture-specific. Fix warnings for some non-amd64 arches. Signed-off-by: Brett Holman <bholman.devel@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This bug was only discovered when we started using the 2nd word in the val, which should have been zeroed out as those fields had never been used before - ouch. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-