- 22 Oct, 2023 40 commits
-
Kent Overstreet authored
This improves __bch2_trans_commit - early in the recovery process, when we're running btree_gc and before we want to go RW, it now uses bch2_journal_key_insert() to add the update to the list of updates for journal replay to do, instead of btree_gc having to use separate interfaces depending on whether we're running at bringup or, later, runtime. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
The old .debugcheck methods are no more and this just calls the .invalid method, which doesn't add much since we already check that when doing btree updates and when reading metadata in. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Like the previous patches, this converts bch2_gc_gens() to use the alloc btree directly, and private arrays of generation numbers for its own recalculation of oldest_gen. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This changes the btree_gc code to only use the second bucket array, the one dedicated to GC. On completion, it compares what's in its in memory bucket array to the allocation information in the btree and writes it directly, instead of updating the main in-memory bucket array and writing that. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Add a new helper that returns true if the given btree ID uses the btree key cache. This enables some new cleanups, since the helper can check the options for whether caching is enabled on a given btree. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
We were double-freeing old_buckets and not freeing old_buckets_gens: also, the code was supposed to free buckets, not old_buckets; old_buckets is only needed because we have to use rcu_assign_pointer() instead of swap(), and won't be set if we hit the error path. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Implement a hash table, using cuckoo hashing, for empty buckets that are waiting on a journal commit before they can be reused. This replaces the journal_seq field of bucket_mark, and is part of eventually getting rid of the in memory bucket array. We may need to make bch2_bucket_needs_journal_commit() lockless, pending profiling and testing. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
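As a rough illustration of the idea in this commit, here is a minimal userspace sketch of a two-table cuckoo hash that tracks which buckets are still waiting on a journal sequence number. It is not the bcachefs implementation: the struct names, table size, hash multipliers, and the use of bucket 0 as an "empty slot" marker are all assumptions made for the example. The point it shows is that a lookup inspects at most two slots, and that entries whose journal sequence has already been flushed can simply be overwritten.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WAIT_TABLE_BITS 6                       /* hypothetical size: 64 slots per table */
#define WAIT_TABLE_SIZE (1u << WAIT_TABLE_BITS)
#define WAIT_MAX_KICKS  32                      /* displacement limit before giving up */

struct wait_entry {
	uint64_t bucket;       /* 0 marks an empty slot (assumption of this sketch) */
	uint64_t journal_seq;  /* bucket is reusable once this seq has been flushed */
};

struct wait_table {
	struct wait_entry t[2][WAIT_TABLE_SIZE];
};

/* Two independent multiplicative hashes, one per table. */
static inline size_t wait_hash(int which, uint64_t bucket)
{
	static const uint64_t mult[2] = {
		0x9E3779B97F4A7C15ull, 0xC2B2AE3D27D4EB4Full
	};
	return (size_t)((bucket * mult[which]) >> (64 - WAIT_TABLE_BITS));
}

/* True if the bucket is present and its journal seq is not yet flushed. */
static inline bool sketch_needs_journal_commit(const struct wait_table *w,
					       uint64_t flushed_seq,
					       uint64_t bucket)
{
	for (int i = 0; i < 2; i++) {
		const struct wait_entry *e = &w->t[i][wait_hash(i, bucket)];

		if (e->bucket == bucket)
			return e->journal_seq > flushed_seq;
	}
	return false;
}

/*
 * Insert *cur. On failure (false), *cur holds an entry that was displaced
 * and still needs a home; the caller would grow the table and reinsert it.
 */
static inline bool sketch_wait_insert(struct wait_table *w,
				      uint64_t flushed_seq,
				      struct wait_entry *cur)
{
	/* Already present: keep the newer sequence number. */
	for (int i = 0; i < 2; i++) {
		struct wait_entry *e = &w->t[i][wait_hash(i, cur->bucket)];

		if (e->bucket == cur->bucket) {
			if (cur->journal_seq > e->journal_seq)
				e->journal_seq = cur->journal_seq;
			return true;
		}
	}

	for (int kick = 0; kick < WAIT_MAX_KICKS; kick++) {
		int which = kick & 1;
		struct wait_entry *e = &w->t[which][wait_hash(which, cur->bucket)];
		struct wait_entry tmp;

		/* Empty slot, or a stale entry whose seq was already flushed. */
		if (!e->bucket || e->journal_seq <= flushed_seq) {
			*e = *cur;
			return true;
		}

		/* Evict the occupant and try to place it in the other table. */
		tmp = *e;
		*e = *cur;
		*cur = tmp;
	}
	return false;
}
```

A caller would insert a bucket when it is emptied, then consult sketch_needs_journal_commit() with the last flushed journal sequence before reusing it.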
-
Kent Overstreet authored
This reverts commit f95b61228efd04c9c158123da5827c96e9773b29. It turns out, we're seeing filesystems in the wild end up with blacklisted btree node bsets - this should not be happening, and until we understand why and fix it we need to keep this code around. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
- Add a shim uuid_unparse_lower() in the kernel, since %pU doesn't work in userspace
- We don't need to print the bcachefs: or the filesystem name prefix in userspace
- Improve a few error messages
Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Add a field to bch_dev for the dev_t of the underlying block device - this fixes a null ptr deref in tracepoints. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
With BTREE_ITER_WITH_JOURNAL, there are no longer any restrictions on the order we have to replay keys from the journal in, and we can also start up journal reclaim right away - and delete a bunch of code. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This adds a new btree iterator flag, BTREE_ITER_WITH_JOURNAL, that is automatically enabled when initializing a btree iterator before journal replay has completed - it overlays the contents of the journal with the btree. This lets us delete bch2_btree_and_journal_walk() and just use the normal btree iterator interface instead - which also lets us delete a significant amount of duplicated code. Note that BTREE_ITER_WITH_JOURNAL is still unoptimized in this patch - we're redoing the binary search over keys in the journal every time we call bch2_btree_iter_peek(). Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
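The overlay described in this commit can be pictured with a small userspace sketch: a peek over two sorted key arrays, where a journal key at the same position shadows the btree key and a journal deletion hides it. All names and types here are hypothetical and far simpler than the real iterator code. Like the unoptimized version the message mentions, this sketch redoes a binary search on every peek.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_key {
	uint64_t pos;
	int      val;
	bool     deleted;   /* only meaningful for journal keys */
};

/* Index of the first key with pos >= target, or n if there is none. */
static inline size_t sketch_lower_bound(const struct sketch_key *k, size_t n,
					uint64_t target)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (k[mid].pos < target)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Return the first live key at or after pos, taking journal keys in
 * preference to btree keys at the same position.
 */
static inline bool sketch_peek(const struct sketch_key *btree, size_t nb,
			       const struct sketch_key *journal, size_t nj,
			       uint64_t pos, struct sketch_key *out)
{
	for (;;) {
		size_t b = sketch_lower_bound(btree, nb, pos);
		size_t j = sketch_lower_bound(journal, nj, pos);
		bool have_b = b < nb, have_j = j < nj;

		if (!have_b && !have_j)
			return false;

		if (have_j && (!have_b || journal[j].pos <= btree[b].pos)) {
			if (!journal[j].deleted) {
				*out = journal[j];
				return true;
			}
			/* Journal deletion: skip this position entirely. */
			if (journal[j].pos == UINT64_MAX)
				return false;
			pos = journal[j].pos + 1;
			continue;
		}

		*out = btree[b];
		return true;
	}
}
```

An optimized version would presumably remember the journal index between calls instead of searching again each time.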
-
Kent Overstreet authored
Add a flag to indicate whether a journal replay key has been overwritten, and set/test it with appropriate btree locks held. This fixes a race between the allocator - invalidating buckets, and doing btree updates - and journal replay, which before this patch could clobber the allocator thread's update with an older version of the key from the journal. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
The main in-memory bucket array is going away, but we'll still need to keep bucket generations in memory, at least for now - ptr_stale() needs to be an efficient operation. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
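To show why a compact in-memory array of generations keeps a staleness check cheap, here is a minimal sketch. It assumes 8-bit generation numbers compared with wraparound; the function names and the flat array layout are illustrative assumptions, not the actual bcachefs code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Wraparound-aware comparison of 8-bit generations: positive if a is newer
 * than b. Relies on the usual two's-complement narrowing done by GCC/Clang.
 */
static inline int sketch_gen_cmp(uint8_t a, uint8_t b)
{
	return (int8_t)(uint8_t)(a - b);
}

/*
 * A pointer is stale if its bucket has been reused since the pointer was
 * created, i.e. the bucket's current generation is newer than the one
 * recorded in the pointer. One array load and one subtraction.
 */
static inline bool sketch_ptr_stale(const uint8_t *bucket_gens, size_t bucket,
				    uint8_t ptr_gen)
{
	return sketch_gen_cmp(bucket_gens[bucket], ptr_gen) > 0;
}
```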
-
Kent Overstreet authored
This is so that the copygc code doesn't have to refer to bucket_mark.owned_by_allocator - assisting in getting rid of the in memory bucket array. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Since metadata version bcachefs_metadata_version_btree_ptr_sectors_written, we haven't needed the journal seq blacklist mechanism for ignoring blacklisted btree node writes - we now only need it for ignoring journal entries that were written after the newest flush journal entry, and then we only need to keep those blacklist entries around until journal replay is finished. That means we can delete the code for scanning btree nodes to GC journal_seq_blacklist entries. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
If the allocator threads start before journal replay has finished replaying alloc keys, journal replay might overwrite the allocator's btree updates. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This changes bch2_bucket_alloc_new_fs() to a simple bump allocator that doesn't need to use the in memory bucket array, part of a larger patch series to entirely get rid of the in memory bucket array, except for gc/fsck. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
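A bump allocator in this sense needs nothing beyond a cursor. The sketch below is a hypothetical userspace illustration (the struct and field names are assumptions, and it ignores the superblock and journal buckets a real allocator has to skip): it hands out buckets in increasing order and never frees them.

```c
#include <stdint.h>

struct sketch_dev {
	uint64_t first_bucket;  /* first bucket usable for data */
	uint64_t nbuckets;      /* total number of buckets on the device */
	uint64_t next_bucket;   /* bump cursor: next bucket to hand out */
};

/* Returns the allocated bucket number, or -1 if the device is exhausted. */
static inline int64_t sketch_bucket_alloc_new_fs(struct sketch_dev *d)
{
	if (d->next_bucket < d->first_bucket)
		d->next_bucket = d->first_bucket;

	if (d->next_bucket >= d->nbuckets)
		return -1;

	return (int64_t)d->next_bucket++;
}
```

Because the only state is the cursor, no per-bucket in-memory array has to be consulted.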
-
Kent Overstreet authored
It'll now be handled at format time and in sysfs like other options - it still can only be set at format time, though. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This adds flags for options that must be a power of two (block size and btree node size), and options that are stored in the superblock as a power of two (encoded extent max). Also: options are now stored in memory in the same units they're displayed in (bytes): we now convert when getting and setting from the superblock. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
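The two kinds of flags described here might look roughly like the following sketch. The flag names and helper functions are illustrative assumptions rather than the actual option table: one flag rejects values that are not a power of two, the other converts between the in-memory value in bytes and the log2 value stored in the superblock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical option flags for this sketch. */
enum sketch_opt_flags {
	SKETCH_OPT_MUST_BE_POW2 = 1 << 0,  /* value must be a power of two */
	SKETCH_OPT_SB_ILOG2     = 1 << 1,  /* superblock stores log2(value) */
};

static inline bool sketch_is_pow2(uint64_t v)
{
	return v && !(v & (v - 1));
}

static inline unsigned sketch_ilog2(uint64_t v)
{
	unsigned r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Validate a value given in bytes, the unit options are displayed in. */
static inline bool sketch_opt_validate(unsigned flags, uint64_t bytes)
{
	return !(flags & SKETCH_OPT_MUST_BE_POW2) || sketch_is_pow2(bytes);
}

/* Convert when setting the superblock field from the in-memory value. */
static inline uint64_t sketch_opt_to_sb(unsigned flags, uint64_t bytes)
{
	return (flags & SKETCH_OPT_SB_ILOG2) ? sketch_ilog2(bytes) : bytes;
}

/* Convert when reading the superblock field; assumes sb < 64 for log2 fields. */
static inline uint64_t sketch_opt_from_sb(unsigned flags, uint64_t sb)
{
	return (flags & SKETCH_OPT_SB_ILOG2) ? (uint64_t)1 << sb : sb;
}
```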
-
Kent Overstreet authored
This fixes some bugs when we hit an error very early in the filesystem startup path, before most things have been initialized. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This adds more latency/event measurements and breaks some apart into more events. Journal writes are broken apart into flush writes and noflush writes, btree compactions are broken out from btree splits, btree mergers are added, as well as btree_interior_updates - foreground and total. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
We have two radix trees of stripes - one that mirrors some information from the stripes btree in normal operation, and another that GC uses to recalculate block usage counts. The normal one is now only used for finding partially empty stripes in order to reuse them - the normal stripes radix tree and the GC stripes radix tree are used significantly differently, so this patch splits them into separate types. In an upcoming patch we'll be replacing c->stripes with a btree that indexes stripes by the order we want to reuse them. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Start a new header, errcode.h, for bcachefs-private error codes - more error codes will be converted later. This patch just converts bucket_alloc_ret so that they can be mixed with standard error codes and passed as ERR_PTR errors - the ec.c code was doing this already, but incorrectly. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
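For readers unfamiliar with the ERR_PTR convention, the following userspace sketch shows how private error codes can share the pointer-encoded error range with standard errno values. The helpers are simplified re-implementations of the kernel's ERR_PTR(), PTR_ERR(), and IS_ERR(); the private code names and the starting value 2048 are assumptions chosen so the codes sit above the standard errno values but below MAX_ERRNO.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Hypothetical private error codes, placed above the standard errno range. */
enum {
	SKETCH_OPEN_BUCKETS_EMPTY = 2048,
	SKETCH_FREELIST_EMPTY,
	SKETCH_INSUFFICIENT_DEVICES,
};

/* Userspace stand-ins for the kernel helpers of the same names. */
static inline void *ERR_PTR(long err)
{
	return (void *)err;
}

static inline long PTR_ERR(const void *p)
{
	return (long)p;
}

static inline bool IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct sketch_open_bucket {
	int dev;
};

/* Returns a bucket from pool, or a private error encoded as a pointer. */
static inline struct sketch_open_bucket *
sketch_bucket_alloc(struct sketch_open_bucket *pool,
		    unsigned open_buckets_free, unsigned buckets_free)
{
	if (!open_buckets_free)
		return ERR_PTR(-SKETCH_OPEN_BUCKETS_EMPTY);
	if (!buckets_free)
		return ERR_PTR(-SKETCH_FREELIST_EMPTY);
	return &pool[0];
}

/* Callers can switch on private and standard codes side by side. */
static inline bool sketch_should_wait_and_retry(long err)
{
	switch (err) {
	case -SKETCH_OPEN_BUCKETS_EMPTY:
	case -SKETCH_FREELIST_EMPTY:
		return true;
	case -ENOMEM:
	default:
		return false;
	}
}
```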
-
Kent Overstreet authored
The filesystem initialization path first marks superblock and journal buckets non transactionally, since the btree isn't functional yet. That path was updating the per-journal-buf percpu counters via bch2_dev_usage_update(), and updating the wrong set of counters so those updates didn't get written out until journal entry 4. The relevant code is going to get significantly rewritten in the future as we transition away from the in memory bucket array, so this just hacks around it for now. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Change log messages in userspace to be closer to what they are in kernel space, and include the device name - it's also useful in userspace. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Snapshot deletion needs to become a multi step process, where we unlink, then tear down the page cache, then delete the subvolume - the deleting flag is equivalent to an inode with i_nlink = 0. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This patch adds subvolume.c - support for the subvolumes and snapshots btrees and related data types and on disk data structures. The next patches will start hooking up this new code to existing code. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This splits btree_iter into two components: btree_iter is now the externally visible component, and it points to a btree_path which is now reference counted. This means we no longer have to clone iterators up front if they might be mutated - btree_path can be shared by multiple iterators, and cloned if an iterator would mutate a shared btree_path. This will help us use iterators more efficiently, as well as slimming down the main long lived state in btree_trans, and significantly cleans up the logic for iterator lifetimes. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
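The sharing scheme described above is essentially copy-on-write with a reference count. The following userspace sketch uses hypothetical, heavily simplified types (a path here is just a position): copying an iterator shares the path, and the path is cloned only when an iterator that shares it is about to change position.

```c
#include <stdlib.h>

struct sketch_path {
	unsigned      ref;  /* number of iterators pointing at this path */
	unsigned long pos;
};

struct sketch_iter {
	struct sketch_path *path;
};

static inline struct sketch_path *sketch_path_alloc(unsigned long pos)
{
	struct sketch_path *p = malloc(sizeof(*p));

	if (p) {
		p->ref = 1;
		p->pos = pos;
	}
	return p;
}

static inline void sketch_path_put(struct sketch_path *p)
{
	if (p && !--p->ref)
		free(p);
}

/* Copying an iterator is cheap: both iterators share one path. */
static inline void sketch_iter_copy(struct sketch_iter *dst,
				    const struct sketch_iter *src)
{
	dst->path = src->path;
	dst->path->ref++;
}

/* Mutating clones the path only if another iterator still shares it. */
static inline int sketch_iter_set_pos(struct sketch_iter *it, unsigned long pos)
{
	if (it->path->ref > 1) {
		struct sketch_path *clone = sketch_path_alloc(pos);

		if (!clone)
			return -1;
		sketch_path_put(it->path);
		it->path = clone;
	} else {
		it->path->pos = pos;
	}
	return 0;
}
```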
-
Brett Holman authored
This adds progress stats to sysfs for copygc, rebalance, recovery, and the cmd_job ioctls. Signed-off-by:
Brett Holman <bholman.devel@gmail.com> Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This closes a significant hole (and last known hole) in our ability to verify metadata. Previously, since btree nodes are log structured, we couldn't detect lost btree writes that weren't the first write to a given node. Additionally, this seems to have led to some significant metadata corruption on multi-device filesystems with metadata replication: since a write may have made it to one device and not another, if we read that btree node back from the replica that did have that write and started appending after that point, the other replica would have a gap in the bset entries and reading from that replica wouldn't find the rest of the bsets. But, since updates to interior btree nodes are now journalled, we can close this hole by updating pointers to btree nodes after every write with the currently written number of sectors, without negatively affecting performance. This means we will always detect lost or corrupt metadata - it also means that our btree is now a curious hybrid of COW and non COW btrees, with all the benefits of both (excluding complexity). Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
We can't use btree_update_wq because btree updates may be waiting on btree writes to complete. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
%pU for printing out pointers to uuids doesn't work in perf trace. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Also, clean up workqueue usage - we shouldn't be using system workqueues, pretty much everything we do needs to be on our own WQ_MEM_RECLAIM workqueues. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
There's a new module parameter, verify_all_btree_replicas, that enables reading from every btree replica when reading in btree nodes and comparing them against each other. We've been seeing some strange btree corruption - this will hopefully aid in tracking it down and catching it more often. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Writeback throttling is a kernel config option and not always enabled. When it's not enabled we need a fallback, to avoid unbounded memory pinning and work item backlogs. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-