- 22 Oct, 2023 40 commits
-
-
Kent Overstreet authored
Btree updates before we go RW work by inserting into the array of keys that journal replay will insert - but inserting into a flat array is O(n), meaning if btree_gc needs to update many alloc keys, we're O(n^2). Fortunately, the updates btree_gc does happens in sequential order, which means a gap buffer works nicely here - this patch implements a gap buffer for journal keys. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This behavior dates from the early, early days of bcache, and upon further delving appears to not make any sense. The shrinker only works in terms of 'objects' of unknown size; normalizing to pages only had the effect of changing the batch size, which we could do directly - if we wanted; we probably don't. Normalizing to pages meant our batch size was very small, which seems to have been keeping us from doing as much shrinking as we should be under heavy memory pressure; this patch appears to alleviate some OOMs we've been seeing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
mark_stripe_bucket() was busted; it was using @new unitialized. Also, clean up all the gc mark functions, and convert them to the same style. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This neatly avoids bugs where we fail partway through initializing a new filesystem, if we just don't write out partly-initialized state. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
With printbufs, it's now easy to build up multi-line log messages and emit them with one call, which is good because it prevents multiple multi-line log messages from getting Interspersed in the log buffer; this patch also improves the formatting and converts it to latest style. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Trivial cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
In a few places we were passing a variable to pr_buf() for the format string - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This switches struct bucket to using a lock, instead of cmpxchg. And now that the protected members no longer need to fit into a u64, we can expand the sector counts to 32 bits. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
All code using the in-memory bucket array, excluding GC, has now been converted to use the alloc btree directly - so we can finally delete it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This is one of the last steps in getting rid of the main in-memory bucket array. This changes bch2_dev_usage_update() to take bkey_alloc_unpacked instead of bucket_mark, and for the places where we are in fact working with bucket_mark and don't have bkey_alloc_unpacked, we add a wrapper that takes bucket_mark and converts to bkey_alloc_unpacked. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
In the old allocator code, preparing an existing empty bucket was part of the same code path that invalidated buckets containing cached data. In the new allocator code this is no longer the case: the main allocator path finds empty buckets (via the new freespace btree), and can't allocate buckets that contain cached data. We now need a separate code path to invalidate buckets containing cached data when we're low on empty buckets, which this patch implements. When the number of free buckets decreases that triggers the new invalidate path to run, which uses the LRU btree to pick cached data buckets to invalidate until we're above our watermark. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
In the old allocator code, buckets would be discarded just prior to being used - this made sense in bcache where we were discarding buckets just after invalidating the cached data they contain, but in a filesystem where we typically have more free space we want to be discarding buckets when they become empty. This patch implements the new behaviour - it checks the need_discard btree for buckets awaiting discards, and then clears the appropriate bit in the alloc btree, which moves the buckets to the freespace btree. Additionally, discards are now enabled by default. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Now that we have new persistent data structures for the allocator, this patch converts the allocator to use them. Now, foreground bucket allocation uses the freespace btree to find buckets to allocate, instead of popping buckets off the freelist. The background allocator threads are no longer needed and are deleted, as well as the allocator freelists. Now we only need background tasks for invalidating buckets containing cached data (when we are low on empty buckets), and for issuing discards. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This adds two new btrees for the upcoming allocator rewrite: an extents btree of free buckets, and a btree for buckets awaiting discards. We also add a new trigger for alloc keys to keep the new btrees up to date, and a compatibility path to initialize them on existing filesystems. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This introduces a new alloc key which doesn't use varints. Soon we'll be adding backpointers and storing them in alloc keys, which means our pack/unpack workflow for alloc keys won't really work - we'll need to be mutating alloc keys in place. Instead of bch2_alloc_unpack(), we now have bch2_alloc_to_v4() that converts older types of alloc keys to v4 if needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This implements new persistent LRUs, to be used for buckets containing cached data, as well as stripes ordered by time when a block became empty. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
A new empty key type, to be used when using a btree as a set. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Add a new superblock field which represents journal buckets as ranges: also move code for the superblock journal fields to journal_sb.c. This also reworks the code for resizing the journal to write the new superblock before using the new journal buckets, and thus be a bit safer. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
In the write path, after the write to the block device(s) complete we have to punt to process context to do the btree update. Instead of using the work item embedded in op->cl, this patch switches to a per write-point work item. This helps with two different issues: - lock contention: btree updates to the same writepoint will (usually) be updating the same alloc keys - context switch overhead: when we're bottlenecked on btree updates, having a thread (running out of a work item) checking the write point for completed ops is cheaper than queueing up a new work item and waking up a kworker. In an arbitrary benchmark, 4k random writes with fio running inside a VM, this patch resulted in a 10% improvement in total iops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This simplifies the logic in bch2_btree_update_start() a bit, handling the unlock/block logic more locally. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Since journal reclaim -> btree key cache flushing may require the allocation of new btree nodes, it has an implicit dependency on copygc in order to make forward progress - so we should avoid blocking copygc unless the journal is really close to full. This introduces watermarks to replace our single MAY_GET_UNRESERVED bit in the journal, and adds a watermark for copygc and plumbs it through. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We don't actually want copygc allocations to be nowait - an allocation for copygc might fail and then later succeed due to a bucket needing to wait on journal commit, or to be discarded. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
When bch2_journal_pin_set() is updating an existing pin, we shouldn't call bch2_journal_reclaim_fast() after dropping the old pin and before dropping the new pin - that could reclaim the entry we're trying to pin. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This makes an array of strings available, like our other enums. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
For backpointers, we'll need to delete old backpointers before adding new backpointers - otherwise we'll run into spurious duplicate backpointer errors. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
For backpointers, we need to switch the order triggers are run in: we need to run triggers for deletions/overwrites before triggers for inserts. To avoid breaking the reflink triggers, this patch moves deleting of indirect extents with refcount=0 to their triggers, instead of doing it when we update those keys. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Print bucket:offset when the filesystem is online; this makes debugging easier when correlating with alloc updates. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Add a new helper for logging messages to the journal - a new debugging tool, an alternative to trace_printk(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This fixes a bug where __bch2_btree_node_update_key() wasn't clearing should_be_locked, leading to bch2_btree_path_traverse() always failing - all callers of btree_path_make_mut() want should_be_locked cleared, so do it there. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
bch2_btree_iter_next_node() was mucking with other btree_path state without setting path->update to be consistent with the fact that the path is very much no longer uptodate - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
We weren't properly catching errors from snapshot_live() - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
bch2_journal_space_available -> bch2_journal_halt() self deadlocks on journal lock; work around this by dropping/retaking journal lock before we call bch2_fatal_error(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
When deleting an entry from a heap that was at entry h->used - 1, we'd end up calling heap_sift() on an entry outside the heap - the entry we just removed - which would end up re-adding it to the heap and deleting something we didn't want to delete. Oops... Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This tells the compiler to check printf format strings, and catches a few bugs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
We've been seeing a very strange bug where journal flush & reclaim delay end up getting inexplicably zeroed, in the superblock. We're now validating all the options in bch2_validate_super(), and 0 is no longer a valid value for those options, but we need to be careful not to prevent people's filesystems from mounting because of the new validation. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Something funny is going on with the new code for restoring the journal write point, and it's hard to reproduce. We do want to debug this because resuming writing to the journal in the wrong spot could be something serious. For now, replace the assertion with an error message and revert to old behaviour when it happens. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-