- 22 Oct, 2023 40 commits
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Our types are exported to the tracepoint code, so it's not necessary to break things out individually when passing them to tracepoints - we can also call other functions from TP_fast_assign().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
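A minimal sketch of the pattern this enables, using the standard kernel tracepoint macros; the event name and fields here are hypothetical (not ones added by this commit), and it assumes a trans->fn string field as elsewhere in this series:

```c
#include <linux/tracepoint.h>

TRACE_EVENT(btree_trans_example,		/* hypothetical event */
	TP_PROTO(struct btree_trans *trans),
	TP_ARGS(trans),

	TP_STRUCT__entry(
		__array(char, fn, 32)
	),

	TP_fast_assign(
		/* our types are visible here, so we can dereference the
		 * struct and call helpers, instead of passing every field
		 * through TP_PROTO()/TP_ARGS() individually */
		strscpy(__entry->fn, trans->fn, sizeof(__entry->fn));
	),

	TP_printk("%s", __entry->fn)
);
```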
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
It doesn't make any sense to set should_be_locked on btree_paths that aren't locked, and doing so is often a bug - this patch adds assertions and fixes some of those bugs.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Now that we have error codes, with subtypes, we can switch to our own error code for transaction restarts - and even better, a distinct error code for each transaction restart reason: clearer code and better debugging.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
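A rough sketch of the scheme, assuming the bcachefs convention of private error codes above the errno range; the code list is abridged and the helper is illustrative:

```c
enum bch_errcode {
	BCH_ERR_START = 2048,		/* above the normal errno range */
	BCH_ERR_transaction_restart = BCH_ERR_START,
	BCH_ERR_transaction_restart_fault_inject,
	BCH_ERR_transaction_restart_relock,
	/* ... one distinct code per restart reason ... */
	BCH_ERR_transaction_restart_MAX,
};

/* any specific restart reason still matches the class check: */
static inline bool trans_was_restarted(int ret)
{
	return ret < 0 &&
	       -ret >= BCH_ERR_transaction_restart &&
	       -ret <  BCH_ERR_transaction_restart_MAX;
}
```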
-
Daniel Hill authored
We need the caller name and a place to store our results; btree_trans provides this.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Better/more descriptive naming, and prep for adding nested_lockrestart_do() and nested_commit_do().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
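For context, a sketch of the lockrestart_do() shape this renaming prepares for; it assumes the restart error codes sketched earlier and is not the verbatim macro:

```c
#define lockrestart_do(_trans, _do)					\
({									\
	int _ret;							\
									\
	do {								\
		bch2_trans_begin(_trans);	/* restart point */	\
		_ret = (_do);						\
	} while (trans_was_restarted(_ret));				\
									\
	_ret;								\
})
```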
-
Kent Overstreet authored
If we're trying to get a ref and the refcount has been killed, it means we're doing an emergency shutdown - we always want tryget_live().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
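The distinction, per the standard percpu refcount API in linux/percpu-refcount.h; the wrapper name below is hypothetical:

```c
#include <linux/percpu-refcount.h>

/* percpu_ref_tryget() can still succeed after percpu_ref_kill();
 * percpu_ref_tryget_live() fails once the ref has been killed. Since a
 * killed ref here means emergency shutdown, live is always what we want: */
static bool example_tryget(struct percpu_ref *ref)	/* hypothetical */
{
	return percpu_ref_tryget_live(ref);
}
```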
-
Kent Overstreet authored
This converts bcachefs to the modern printbuf interface/implementation, synced with the version to be submitted upstream.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
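A minimal usage sketch of that interface, assuming the PRINTBUF initializer, prt_printf() and printbuf_exit() described in this series; the message contents are illustrative:

```c
struct printbuf buf = PRINTBUF;		/* empty; grows on demand */

prt_printf(&buf, "invalid bkey: %s", reason);	/* 'reason' is illustrative */
pr_err("%s", buf.buf);

printbuf_exit(&buf);		/* required - the buffer is heap allocated */
```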
-
Kent Overstreet authored
Delete some obsolete tracepoints, organize alloc tracepoints better, make a few tracepoints more consistent.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
We can't take btree node locks while holding btree_reserve_cache_lock - it would be nice if we could check this with lockdep.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
We're seeing occasional firings of the assertion in the key cache shutdown code that nr_dirty == 0, which means we must sometimes be doing transaction commits after we've gone read only.

Cleanups & changes:
- BCH_FS_ALLOC_CLEAN renamed to BCH_FS_CLEAN_SHUTDOWN
- new helper bch2_btree_interior_updates_flush(), which returns true if it had to wait
- bch2_btree_flush_writes() now also returns true if there were btree writes in flight
- __bch2_fs_read_only now checks if btree writes were in flight in the shutdown loop: btree write completion does a transaction update, to update the pointer in the parent node
- assert that !BCH_FS_CLEAN_SHUTDOWN in __bch2_trans_commit
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
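A hedged sketch of the resulting shutdown loop, using only the helpers and flag named in the message; the surrounding function is illustrative:

```c
static void example_read_only_flush(struct bch_fs *c)
{
	bool wrote;

	/* keep flushing until nothing was in flight on either front: */
	do {
		wrote  = bch2_btree_interior_updates_flush(c);
		wrote |= bch2_btree_flush_writes(c);
	} while (wrote);

	/* from here on, __bch2_trans_commit() asserts: */
	set_bit(BCH_FS_CLEAN_SHUTDOWN, &c->flags);
}
```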
-
Kent Overstreet authored
Previously, btree_update_interior.c passed keys to bch2_trans_mark_* that hadn't been fully initialized - they didn't have the key field filled out, just the value. With backpointers, we need to make sure keys are fully initialized before marking them - because the backpointer points back to the original key. This patch tweaks the interior update paths to fix this.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
For backpointers, we'll need the full key location - that means btree_id and btree level. This patch plumbs it through.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This adds a new parameter to .key_invalid() methods for whether the key is being read or written; the idea being that methods can do more aggressive checks when a key is newly created and being written - checks we wouldn't want to apply on read, since failing them would mean deleting existing keys.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
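An approximate sketch of the new signature; the value type and the stricter check are hypothetical, and READ/WRITE are the usual kernel constants:

```c
static const char *example_key_invalid(const struct bch_fs *c,
				       struct bkey_s_c k, int rw)
{
	if (bkey_val_bytes(k.k) < sizeof(struct bch_example_val))
		return "value too small";	/* checked on read and write */

	if (rw == WRITE && !example_stricter_check(k))
		return "fails write-time check";	/* new keys only -
							 * failing this on read
							 * would delete existing
							 * keys */
	return NULL;
}
```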
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Now that we have new persistent data structures for the allocator, this patch converts the allocator to use them.

Now, foreground bucket allocation uses the freespace btree to find buckets to allocate, instead of popping buckets off the freelist. The background allocator threads are no longer needed and are deleted, as well as the allocator freelists. Now we only need background tasks for invalidating buckets containing cached data (when we are low on empty buckets), and for issuing discards.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
In the write path, after the write to the block device(s) completes we have to punt to process context to do the btree update. Instead of using the work item embedded in op->cl, this patch switches to a per write-point work item.

This helps with two different issues:
- lock contention: btree updates to the same writepoint will (usually) be updating the same alloc keys
- context switch overhead: when we're bottlenecked on btree updates, having a thread (running out of a work item) checking the write point for completed ops is cheaper than queueing up a new work item and waking up a kworker

In an arbitrary benchmark - 4k random writes with fio running inside a VM - this patch resulted in a 10% improvement in total iops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
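A sketch of the per-write-point work item; the field and function names are assumptions, not necessarily the ones in the patch:

```c
struct write_point {
	struct work_struct	index_update_work;
	struct list_head	writes;		/* completed ops awaiting btree update */
	spinlock_t		writes_lock;
	/* ... */
};

static void example_index_update_fn(struct work_struct *work)
{
	struct write_point *wp =
		container_of(work, struct write_point, index_update_work);

	/* walk wp->writes, doing the btree update for each completed op;
	 * one running worker picking up further completions is cheaper
	 * than queueing a fresh work item per op */
}
```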
-
Kent Overstreet authored
This simplifies the logic in bch2_btree_update_start() a bit, handling the unlock/block logic more locally.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Since journal reclaim -> btree key cache flushing may require the allocation of new btree nodes, it has an implicit dependency on copygc in order to make forward progress - so we should avoid blocking copygc unless the journal is really close to full.

This introduces watermarks to replace our single MAY_GET_UNRESERVED bit in the journal, and adds a watermark for copygc and plumbs it through.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
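A sketch of the watermark scheme; the enum values follow the naming convention in the message, but the exact set is an assumption:

```c
enum journal_watermark {
	JOURNAL_WATERMARK_any,		/* ordinary transaction commits */
	JOURNAL_WATERMARK_copygc,	/* may dip further into reserved space */
	JOURNAL_WATERMARK_reserved,	/* journal reclaim itself */
};

/* a reservation succeeds only while journal space is above its watermark,
 * so copygc keeps making progress after ordinary commits start blocking */
```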
-
Kent Overstreet authored
This makes an array of strings available, like our other enums.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
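The x-macro pattern bcachefs uses for this; the list below is illustrative, not the one this commit adds:

```c
#define EXAMPLE_STATES()	\
	x(clean)		\
	x(dirty)		\
	x(error)

enum example_state {
#define x(n)	EXAMPLE_STATE_##n,
	EXAMPLE_STATES()
#undef x
};

const char * const example_state_strs[] = {
#define x(n)	#n,
	EXAMPLE_STATES()
#undef x
	NULL
};
```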
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
bch2_btree_iter_next_node() was mucking with other btree_path state without setting path->uptodate to be consistent with the fact that the path is very much no longer uptodate - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Six locks have a percpu mode, which we use for interior btree nodes, as well as btree key cache keys for the subvolumes btree. We've been switching locks back and forth between percpu and non-percpu mode as needed, but it turns out this is racy - when we're reusing an existing node, other threads could be attempting to lock it while we're switching it between modes.

This patch fixes this by never switching 'struct btree' between the two modes, and instead segregating them between two different freed lists.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
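A sketch of the segregation; the field names are assumptions:

```c
struct btree_cache {
	struct list_head	freed_pcpu;	/* six lock in percpu mode */
	struct list_head	freed_nonpcpu;	/* six lock in ordinary atomic mode */
	/* ... */
};

/* a freed node is only ever reused from the matching list, so its lock
 * mode never changes while other threads might be attempting to lock it */
```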
-
Kent Overstreet authored
We don't need to pass the number of nodes required to bch2_btree_update_start, just whether we're doing a split at @level. This is prep work for a fix to our usage of six lock's percpu mode, which is going to require us to count up and allocate interior nodes and leaf nodes separately.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
When we do an interior btree update, we create new btree nodes and link them into the btree in memory, but they don't become reachable on disk until later, when btree_update_nodes_written_trans() runs.

Updates to the new nodes can thus happen before they're reachable on disk, and if the updates to those new nodes are written before the nodes become reachable, we would then drop the journal pin for those updates before the btree has them. This is what the journal pin in bch2_btree_update_start() was protecting against. However, it's not actually needed because we don't allow subsequent append writes to btree nodes until the node is reachable on disk.

Dropping this unneeded pin also fixes a bug introduced by "bcachefs: Journal seq now incremented at entry open, not close" - in the new code, if the journal is completely empty a journal pin list for journal_cur_seq() won't exist.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Checking btree_node_may_write() isn't atomic with the other btree flags, dirty and need_write in particular. There was a rare race where we'd unblock a node from writing while __btree_node_flush() was setting need_write, and no thread would notice that the node was now both able to write and needed to be written.

Fix this by adding btree node flags for will_make_reachable and write_blocked that can be checked in the cmpxchg loop in __bch2_btree_node_write.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
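A sketch of the cmpxchg loop this enables, with the flag names from the message; the loop shape and surrounding function are approximated:

```c
static void example_maybe_write(struct btree *b)
{
	unsigned long old, new;

	do {
		old = new = READ_ONCE(b->flags);

		/* every "may we write this node now?" condition is decided
		 * against a single atomically-read flags word: */
		if (!(old & (1UL << BTREE_NODE_dirty)) ||
		     (old & (1UL << BTREE_NODE_write_blocked)) ||
		     (old & (1UL << BTREE_NODE_will_make_reachable)))
			return;

		new &= ~(1UL << BTREE_NODE_dirty);
		new |=   1UL << BTREE_NODE_write_in_flight;
	} while (cmpxchg(&b->flags, old, new) != old);

	/* we now own the write: kick off the actual IO */
}
```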
-
Kent Overstreet authored
btree_node_write_if_need() kicks off a btree node write only if need_write is set; this makes the locking easier to reason about by moving the check into the cmpxchg loop in __bch2_btree_node_write().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This is for adding an array of strings for btree node flag names.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This was just dead code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
In btree_update_interior.c, we were changing a path's level directly - which affects path sort order - without re-sorting paths, leading to assertions when bch2_path_get() verified paths were sorted correctly.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This patch changes printbufs to dynamically allocate and reallocate a buffer as needed. Stack usage has become a bit of a problem, and a major cause of that has been static size string buffers on the stack.

The most involved part of this refactoring is that printbufs must now be exited with printbuf_exit().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We're now coming up with triggers that modify the update being done. A bkey_s_c is const - bkey_i is the correct type to be using here.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
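The type distinction at issue, per the bcachefs bkey definitions:

```c
struct bkey_s_c {			/* "split, const": a read-only view */
	const struct bkey	*k;
	const struct bch_val	*v;
};

struct bkey_i {				/* "inline": key directly followed by
					 * value, and mutable - what a trigger
					 * needs in order to modify the update */
	struct bkey		k;
	struct bch_val		v;
};
```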
-
Kent Overstreet authored
BTREE_NODE_SEQ() is supposed to give us a time ordering of btree nodes on disk, so that we can tell which btree node is newer if we ever have to scan the entire device to find btree nodes. The btree node merge path wasn't setting it correctly on the new node - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This consolidates some of the btree node locking code, so that when we're blocked taking a write lock on a node it shows up in bch2_btree_trans_to_text(), along with intent and read locks.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
These nodes aren't reachable by other threads, so there's no need to keep them locked - and this fixes a bug with the assertion in bch2_trans_unlock() firing on transaction restart.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
bch2_btree_update_key() is used in the btree node write path - before delivering the completion we have to update the parent pointer with the number of sectors written.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Symbol decoding, via %ps, isn't supported in userspace - this will also be faster when we're using trans->fn in the fast path, as with the new BCH_JSET_ENTRY_log journal messages.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
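A sketch of the approach; the init macro and function signature here are assumptions about the shape of the change, not the verbatim code:

```c
/* before: save an instruction pointer and decode it at print time with
 * "%ps", which needs kernel symbol tables that userspace builds lack */

/* after: capture the caller's name as a plain string at init time: */
#define bch2_trans_init(_trans, _c, _nr_iters, _mem)			\
	__bch2_trans_init(_trans, _c, _nr_iters, _mem, __func__)

void __bch2_trans_init(struct btree_trans *trans, struct bch_fs *c,
		       unsigned nr_iters, size_t mem, const char *fn)
{
	trans->fn = fn;		/* constant string: cheap to store and print */
	/* ... */
}
```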
-
Kent Overstreet authored
With BTREE_ITER_WITH_JOURNAL, there are no longer any restrictions on the order we have to replay keys from the journal in, and we can also start up journal reclaim right away - and delete a bunch of code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This adds a new btree iterator flag, BTREE_ITER_WITH_JOURNAL, that is automatically enabled when initializing a btree iterator before journal replay has completed - it overlays the contents of the journal with the btree.

This lets us delete bch2_btree_and_journal_walk() and just use the normal btree iterator interface instead - which also lets us delete a significant amount of duplicated code.

Note that BTREE_ITER_WITH_JOURNAL is still unoptimized in this patch - we're redoing the binary search over keys in the journal every time we call bch2_btree_iter_peek().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
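A conceptual sketch only, not the real implementation: peek returns whichever of the btree key and the journal-overlay key sorts first, with the journal key (being newer) winning ties. The helper names are hypothetical:

```c
static struct bkey_s_c example_peek(struct btree_iter *iter)
{
	struct bkey_s_c btree_k   = example_btree_peek(iter);		/* hypothetical */
	struct bkey_i  *journal_k = example_journal_keys_peek(iter);	/* hypothetical */

	if (journal_k &&
	    (!btree_k.k || bpos_cmp(journal_k->k.p, btree_k.k->p) <= 0))
		return bkey_i_to_s_c(journal_k);	/* journal overlays the btree */

	return btree_k;
}
```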
-