- 22 Oct, 2023 40 commits
-
Kent Overstreet authored
Btree node merging now happens prior to transaction commit, not after, so we don't need to pay attention to BTREE_INSERT_NOUNLOCK. Also, foreground_maybe_merge shouldn't be calling bch2_btree_iter_traverse_all() - this is becoming private to the btree iterator code and should only be called by bch2_trans_begin(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Upcoming patch will require that a transaction restart is always immediately followed by bch2_trans_begin(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
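The rule stated above can be pictured as a retry loop. This is a minimal sketch, not bcachefs code: `struct txn`, `txn_begin()`, `do_work()` and `run_txn()` are hypothetical stand-ins, and signalling a restart with `-EINTR` is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical transaction state; not the real struct btree_trans. */
struct txn {
	int begun;	/* set by txn_begin(), cleared when a restart occurs */
	int attempts;	/* number of times txn_begin() has been called */
};

static void txn_begin(struct txn *t)
{
	t->begun = 1;
	t->attempts++;
}

/* Simulated work: restarts on the first two attempts, then succeeds. */
static int do_work(struct txn *t)
{
	assert(t->begun);	/* work must only run after txn_begin() */
	if (t->attempts < 3) {
		t->begun = 0;
		return -EINTR;	/* assumed restart signal */
	}
	return 0;
}

/* Every restart is immediately followed by txn_begin(). */
static int run_txn(struct txn *t)
{
	int ret;

	do {
		txn_begin(t);
		ret = do_work(t);
	} while (ret == -EINTR);

	return ret;
}
```

Putting the begin call at the top of the loop means no code path can touch the transaction between a restart and the next begin.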
-
Kent Overstreet authored
On transaction restart iterators won't be locked anymore - make sure we're always checking for errors. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
bch2_btree_iter_traverse_all() may loop, and it needs to clear iter->should_be_locked on every iteration. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This avoids unexpected lock restarts in bch2_btree_iter_traverse_all(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
They should already be traversed, and we're asserting that since the introduction of iter->should_be_locked. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
bch2_btree_node_ptr_v2 has a field for stashing a pointer to the in memory btree node; this is safe because we clear this field when reading in nodes from disk and we never free in memory btree nodes - but, we have bug reports that indicate something might be faulty with this optimization, so let's add an option for it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Btree iterator tracepoints should print whether they're for the key cache. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This adds a new helper for btree_cache.c that does what we want when the iterator is still being traversed - and also eliminates some unnecessary transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
We're working to standardize handling of transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Don't print out the ": " when there isn't a value to print. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
We were squashing BCH_FSCK_ERRORS_NOT_FIXED. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This is needed for snapshots because we need to start handling lock restarts even when just calling bch2_inode_peek(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Snapshots add another btree lookup, thus we need to handle lock restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Downgrading of btree iterators is something that should only happen explicitly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Add a field to struct bset for the sector offset within the btree node where it was written. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This reverts commit 664f9847bec525d396d62d2db094ca9020289ae0. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This closes a significant hole (and last known hole) in our ability to verify metadata. Previously, since btree nodes are log structured, we couldn't detect lost btree writes that weren't the first write to a given node. Additionally, this seems to have led to some significant metadata corruption on multi-device filesystems with metadata replication: since a write may have made it to one device and not another, if we read that btree node back from the replica that did have that write and started appending after that point, the other replica would have a gap in the bset entries and reading from that replica wouldn't find the rest of the bsets. But, since updates to interior btree nodes are now journalled, we can close this hole by updating pointers to btree nodes after every write with the currently written number of sectors, without negatively affecting performance. This means we will always detect lost or corrupt metadata - it also means that our btree is now a curious hybrid of COW and non-COW btrees, with all the benefits of both (excluding complexity). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
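The check this enables can be sketched as follows. This is an illustration under assumptions, not the bcachefs implementation: `struct node_ptr`, its `sectors_written` field and `check_node_read()` are hypothetical names, and the sketch only treats a short read as an error.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical pointer to a btree node, as stored in its parent. */
struct node_ptr {
	unsigned sectors_written;	/* sectors recorded after the last write */
};

/*
 * Compare what was found on disk with what the pointer recorded.
 * Returns 0 if the node holds at least the recorded number of sectors,
 * -EIO if some later write to the node is missing.
 */
static int check_node_read(const struct node_ptr *ptr, unsigned sectors_found)
{
	assert(ptr);

	if (sectors_found < ptr->sectors_written)
		return -EIO;	/* a non-initial write was lost */

	return 0;
}
```

Because the recorded count lives in the parent's pointer rather than in the node itself, a replica that missed a write cannot pass as complete.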
-
Kent Overstreet authored
We should always print out the full btree node ptr. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
The unit tests hadn't been updated for various recent btree changes - this patch makes them work again. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
We'd hit a BUG() when rewinding at the start of the btree on btrees with snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
The fsck code handles transaction restarts in a very ad hoc way, and not always correctly. This patch makes some improvements to check_dirents(), but more work needs to be done to figure out how this kind of code should be structured. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
We weren't correctly verifying that we had interior node intent locks - this patch also fixes bugs uncovered by the new assertions. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
There were some error paths where we were leaking page refs - oops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We probably don't ever want to flip this off in production, but it may be useful for certain kinds of testing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
On fstest generic/388, we were seeing sporadic deadlocks in the emergency shutdown, where we'd get stuck shutting down the allocator because bch2_btree_update_start() -> bch2_btree_reserve_get() allocated and then deallocated some btree nodes, putting them back on the btree_reserve_cache, after the allocator shutdown code had already cleared out that cache. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This adds safe versions of bch2_varint_(encode|decode) that don't read or write past the end of the buffer, or past the end of the varint being encoded. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
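To illustrate what a bounds-checked varint codec looks like, here is a minimal sketch. It uses the generic LEB128 encoding (7 data bits per byte, high bit as a continuation flag), which is an assumption chosen for brevity and is not the on-disk format used by bch2_varint_encode. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode v into [out, end). Returns bytes written, or -1 if it won't fit. */
static int varint_encode_safe(uint8_t *out, const uint8_t *end, uint64_t v)
{
	uint8_t *p = out;

	assert(out <= end);

	do {
		uint8_t byte;

		if (p >= end)
			return -1;	/* would write past the buffer */

		byte = v & 0x7f;
		v >>= 7;
		if (v)
			byte |= 0x80;	/* more bytes follow */
		*p++ = byte;
	} while (v);

	return (int)(p - out);
}

/* Decode from [in, end). Returns bytes consumed, or -1 if truncated. */
static int varint_decode_safe(const uint8_t *in, const uint8_t *end, uint64_t *v)
{
	const uint8_t *p = in;
	uint64_t result = 0;
	unsigned shift = 0;

	while (p < end && shift < 64) {
		uint8_t byte = *p++;

		result |= (uint64_t)(byte & 0x7f) << shift;
		if (!(byte & 0x80)) {
			*v = result;
			return (int)(p - in);
		}
		shift += 7;
	}

	return -1;	/* ran out of input, or the varint is too long */
}
```

Both functions take an explicit `end` pointer, so the caller states the buffer limit once and every byte access is checked against it.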
-
Kent Overstreet authored
This is to help debug a rare shutdown deadlock in the allocator code - the btree code is leaking open_buckets. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This improves performance by removing the need to wait for the in-flight btree write to complete before kicking off another, which is going to be needed to avoid a performance regression with the upcoming patch to update btree ptrs after every btree write. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Compat features should be cleared if the filesystem was touched by a version that doesn't support them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This is something we've attempted to stick to for quite some time, as it helps guarantee filesystem latency - but there are a few remaining paths that this patch fixes. This is also necessary for an upcoming patch to update btree pointers after every btree write - since the btree write completion path will now be doing btree operations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
btree_trans should always be passed when we have one - iter->trans is disfavoured. This mainly updates old code in btree_update_interior.c, some of which predates btree_trans. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Dan Robertson authored
Add basic kernel docs for bch2_trans_reset and bch2_trans_begin. Signed-off-by: Dan Robertson <dan@dlrobertson.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Dan Robertson authored
A new device state that is not a valid state should return -EINVAL in the disk set state ioctl. Signed-off-by: Dan Robertson <dan@dlrobertson.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
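The validation described above amounts to a range check before the state is stored. A minimal sketch, assuming a contiguous enum with a trailing count value; `enum dev_state` and `set_dev_state()` are hypothetical names, not the ioctl handler itself.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device states; DEV_STATE_NR is the number of valid states. */
enum dev_state {
	DEV_STATE_RW,
	DEV_STATE_RO,
	DEV_STATE_FAILED,
	DEV_STATE_SPARE,
	DEV_STATE_NR,
};

/* Reject out-of-range values from userspace before touching any state. */
static int set_dev_state(unsigned *cur, unsigned new_state)
{
	assert(cur);

	if (new_state >= DEV_STATE_NR)
		return -EINVAL;

	*cur = new_state;
	return 0;
}
```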
-
Kent Overstreet authored
Add a new flag to control assertions about updates to internal snapshot nodes, which normally should not be written to - to be used in an upcoming patch. Also do some renaming - trigger_flags is now update_flags. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Add readable names for d_type, and use them in dirent_to_text(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
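A name table of this kind might look like the sketch below. The array indices are the Linux `DT_*` values (DT_UNKNOWN = 0, DT_FIFO = 1, DT_CHR = 2, DT_DIR = 4, DT_BLK = 6, DT_REG = 8, DT_LNK = 10, DT_SOCK = 12, DT_WHT = 14). The strings and the `d_type_str()` helper are assumptions for illustration, not the names the patch uses.

```c
#include <assert.h>
#include <stddef.h>

/* Sparse table indexed by d_type; gaps are left NULL. */
static const char * const d_type_names[] = {
	[0]  = "unknown",
	[1]  = "fifo",
	[2]  = "chr",
	[4]  = "dir",
	[6]  = "blk",
	[8]  = "reg",
	[10] = "lnk",
	[12] = "sock",
	[14] = "whiteout",
};

/* Return a printable name, falling back for values with no entry. */
static const char *d_type_str(unsigned d_type)
{
	size_t nr = sizeof(d_type_names) / sizeof(d_type_names[0]);
	const char *name = d_type < nr ? d_type_names[d_type] : NULL;

	assert(nr == 15);	/* highest index above is 14 */

	return name ? name : "(bad d_type)";
}
```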
-
Kent Overstreet authored
This assertion is checking that what the iterator points to is consistent with iter->real_pos, and since it's an internal btree ordering property it should be using bpos_cmp. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
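The distinction can be sketched with two comparison functions. This assumes a position is an (inode, offset, snapshot) triple and that the internal btree ordering includes the snapshot field. `struct pos`, `pos_cmp_all()` and `pos_cmp_nosnap()` are hypothetical names and are not the bcachefs implementations of bpos_cmp or its counterpart.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical key position. */
struct pos {
	uint64_t inode;
	uint64_t offset;
	uint32_t snapshot;
};

static int cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

/* Full ordering: every field participates, as an internal check needs. */
static int pos_cmp_all(struct pos l, struct pos r)
{
	int c = cmp_u64(l.inode, r.inode);

	if (!c)
		c = cmp_u64(l.offset, r.offset);
	if (!c)
		c = cmp_u64(l.snapshot, r.snapshot);
	return c;
}

/* Ordering that ignores the snapshot field. */
static int pos_cmp_nosnap(struct pos l, struct pos r)
{
	int c = cmp_u64(l.inode, r.inode);

	if (!c)
		c = cmp_u64(l.offset, r.offset);
	return c;
}
```

Two positions that differ only in snapshot compare equal under the second function, so an assertion about exact iterator position would pass when it should not.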
-
Kent Overstreet authored
Internal btree code really wants a POS_MAX with all fields ~0; external code more likely wants the snapshot field to be 0, because when we're passing it to bch2_trans_get_iter() it's used for the snapshot we're operating in, which should be 0 for most btrees that don't use snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
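The two maxima described above can be written as a pair of constants. A minimal sketch, assuming a position is an (inode, offset, snapshot) triple; `struct key_pos`, `POS_MAX_EXTERNAL` and `POS_MAX_INTERNAL` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical key position. */
struct key_pos {
	uint64_t inode;
	uint64_t offset;
	uint32_t snapshot;
};

/* For callers outside the btree code: snapshot left at 0. */
static const struct key_pos POS_MAX_EXTERNAL = {
	UINT64_MAX, UINT64_MAX, 0,
};

/* For internal btree code: every field is all ones. */
static const struct key_pos POS_MAX_INTERNAL = {
	UINT64_MAX, UINT64_MAX, UINT32_MAX,
};

/* The two constants differ only in the snapshot field. */
static void check_pos_max(void)
{
	assert(POS_MAX_EXTERNAL.inode == POS_MAX_INTERNAL.inode);
	assert(POS_MAX_EXTERNAL.offset == POS_MAX_INTERNAL.offset);
	assert(POS_MAX_EXTERNAL.snapshot == 0);
	assert(POS_MAX_INTERNAL.snapshot == UINT32_MAX);
}
```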
-