- 22 Oct, 2023 40 commits
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Our types are exported to the tracepoint code, so it's not necessary to break things out individually when passing them to tracepoints - we can also call other functions from TP_fast_assign().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
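A minimal sketch of the pattern this enables, using the standard kernel tracepoint macros; the event name and fields here are hypothetical (not ones added by this commit), and it assumes a trans->fn string field as elsewhere in this series:

```c
#include <linux/tracepoint.h>

TRACE_EVENT(btree_trans_example,		/* hypothetical event */
	TP_PROTO(struct btree_trans *trans),
	TP_ARGS(trans),

	TP_STRUCT__entry(
		__array(char, fn, 32)
	),

	TP_fast_assign(
		/* our types are visible here, so we can dereference the
		 * struct and call helpers, instead of passing every field
		 * through TP_PROTO()/TP_ARGS() individually */
		strscpy(__entry->fn, trans->fn, sizeof(__entry->fn));
	),

	TP_printk("%s", __entry->fn)
);
```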
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
It doesn't make any sense to set should_be_locked on btree_paths that aren't locked, and doing so is often a bug - this patch adds assertions and fixes some of those bugs.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Now that we have error codes, with subtypes, we can switch to our own error code for transaction restarts - and even better, a distinct error code for each transaction restart reason: clearer code and better debugging.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
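A rough sketch of the scheme, assuming the bcachefs convention of private error codes above the errno range; the code list is abridged and the helper is illustrative:

```c
enum bch_errcode {
	BCH_ERR_START = 2048,		/* above the normal errno range */
	BCH_ERR_transaction_restart = BCH_ERR_START,
	BCH_ERR_transaction_restart_fault_inject,
	BCH_ERR_transaction_restart_relock,
	/* ... one distinct code per restart reason ... */
	BCH_ERR_transaction_restart_MAX,
};

/* any specific restart reason still matches the class check: */
static inline bool trans_was_restarted(int ret)
{
	return ret < 0 &&
	       -ret >= BCH_ERR_transaction_restart &&
	       -ret <  BCH_ERR_transaction_restart_MAX;
}
```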
-
Daniel Hill authored
We need the caller name and a place to store our results; btree_trans provides this.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Better/more descriptive naming, and prep for adding nested_lockrestart_do() and nested_commit_do().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
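For context, a sketch of the lockrestart_do() shape this renaming prepares for; it assumes the restart error codes sketched earlier and is not the verbatim macro:

```c
#define lockrestart_do(_trans, _do)					\
({									\
	int _ret;							\
									\
	do {								\
		bch2_trans_begin(_trans);	/* restart point */	\
		_ret = (_do);						\
	} while (trans_was_restarted(_ret));				\
									\
	_ret;								\
})
```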
-
Kent Overstreet authored
If we're trying to get a ref and the refcount has been killed, it means we're doing an emergency shutdown - we always want tryget_live().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
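The distinction, per the standard percpu refcount API in linux/percpu-refcount.h; the wrapper name below is hypothetical:

```c
#include <linux/percpu-refcount.h>

/* percpu_ref_tryget() can still succeed after percpu_ref_kill();
 * percpu_ref_tryget_live() fails once the ref has been killed. Since a
 * killed ref here means emergency shutdown, live is always what we want: */
static bool example_tryget(struct percpu_ref *ref)	/* hypothetical */
{
	return percpu_ref_tryget_live(ref);
}
```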
-
Kent Overstreet authored
This converts bcachefs to the modern printbuf interface/implementation, synced with the version to be submitted upstream.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
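A minimal usage sketch of that interface, assuming the PRINTBUF initializer, prt_printf() and printbuf_exit() described in this series; the message contents are illustrative:

```c
struct printbuf buf = PRINTBUF;		/* empty; grows on demand */

prt_printf(&buf, "invalid bkey: %s", reason);	/* 'reason' is illustrative */
pr_err("%s", buf.buf);

printbuf_exit(&buf);		/* required - the buffer is heap allocated */
```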
-
Kent Overstreet authored
Delete some obsolete tracepoints, organize alloc tracepoints better, make a few tracepoints more consistent.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
We can't take btree node locks while holding btree_reserve_cache_lock - it would be nice if we could check this with lockdep.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
We're seeing occasional firings of the assertion in the key cache shutdown code that nr_dirty == 0, which means we must sometimes be doing transaction commits after we've gone read only.

Cleanups & changes:
- BCH_FS_ALLOC_CLEAN renamed to BCH_FS_CLEAN_SHUTDOWN
- new helper bch2_btree_interior_updates_flush(), which returns true if it had to wait
- bch2_btree_flush_writes() now also returns true if there were btree writes in flight
- __bch2_fs_read_only now checks if btree writes were in flight in the shutdown loop: btree write completion does a transaction update, to update the pointer in the parent node
- assert that !BCH_FS_CLEAN_SHUTDOWN in __bch2_trans_commit
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
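A hedged sketch of the resulting shutdown loop, using only the helpers and flag named in the message; the surrounding function is illustrative:

```c
static void example_read_only_flush(struct bch_fs *c)
{
	bool wrote;

	/* keep flushing until nothing was in flight on either front: */
	do {
		wrote  = bch2_btree_interior_updates_flush(c);
		wrote |= bch2_btree_flush_writes(c);
	} while (wrote);

	/* from here on, __bch2_trans_commit() asserts: */
	set_bit(BCH_FS_CLEAN_SHUTDOWN, &c->flags);
}
```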
-
Kent Overstreet authored
Previously, btree_update_interior.c passed keys to bch2_trans_mark_* that hadn't been fully initialized - they didn't have the key field filled out, just the value. With backpointers, we need to make sure keys are fully initialized before marking them - because the backpointer points back to the original key. This patch tweaks the interior update paths to fix this.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
For backpointers, we'll need the full key location - that means btree_id and btree level. This patch plumbs it through.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This adds a new parameter to .key_invalid() methods for whether the key is being read or written; the idea being that methods can do more aggressive checks when a key is newly created and being written - checks we wouldn't want to apply on read, since failing them would mean deleting existing keys.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
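An approximate sketch of the new signature; the value type and the stricter check are hypothetical, and READ/WRITE are the usual kernel constants:

```c
static const char *example_key_invalid(const struct bch_fs *c,
				       struct bkey_s_c k, int rw)
{
	if (bkey_val_bytes(k.k) < sizeof(struct bch_example_val))
		return "value too small";	/* checked on read and write */

	if (rw == WRITE && !example_stricter_check(k))
		return "fails write-time check";	/* new keys only -
							 * failing this on read
							 * would delete existing
							 * keys */
	return NULL;
}
```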
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Now that we have new persistent data structures for the allocator, this patch converts the allocator to use them.

Now, foreground bucket allocation uses the freespace btree to find buckets to allocate, instead of popping buckets off the freelist. The background allocator threads are no longer needed and are deleted, as well as the allocator freelists. Now we only need background tasks for invalidating buckets containing cached data (when we are low on empty buckets), and for issuing discards.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
In the write path, after the write to the block device(s) completes we have to punt to process context to do the btree update. Instead of using the work item embedded in op->cl, this patch switches to a per write-point work item.

This helps with two different issues:
- lock contention: btree updates to the same writepoint will (usually) be updating the same alloc keys
- context switch overhead: when we're bottlenecked on btree updates, having a thread (running out of a work item) checking the write point for completed ops is cheaper than queueing up a new work item and waking up a kworker

In an arbitrary benchmark - 4k random writes with fio running inside a VM - this patch resulted in a 10% improvement in total iops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
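A sketch of the per-write-point work item; the field and function names are assumptions, not necessarily the ones in the patch:

```c
struct write_point {
	struct work_struct	index_update_work;
	struct list_head	writes;		/* completed ops awaiting btree update */
	spinlock_t		writes_lock;
	/* ... */
};

static void example_index_update_fn(struct work_struct *work)
{
	struct write_point *wp =
		container_of(work, struct write_point, index_update_work);

	/* walk wp->writes, doing the btree update for each completed op;
	 * one running worker picking up further completions is cheaper
	 * than queueing a fresh work item per op */
}
```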
-
Kent Overstreet authored
This simplifies the logic in bch2_btree_update_start() a bit, handling the unlock/block logic more locally.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Since journal reclaim -> btree key cache flushing may require the allocation of new btree nodes, it has an implicit dependency on copygc in order to make forward progress - so we should avoid blocking copygc unless the journal is really close to full.

This introduces watermarks to replace our single MAY_GET_UNRESERVED bit in the journal, and adds a watermark for copygc and plumbs it through.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
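A sketch of the watermark scheme; the enum values follow the naming convention in the message, but the exact set is an assumption:

```c
enum journal_watermark {
	JOURNAL_WATERMARK_any,		/* ordinary transaction commits */
	JOURNAL_WATERMARK_copygc,	/* may dip further into reserved space */
	JOURNAL_WATERMARK_reserved,	/* journal reclaim itself */
};

/* a reservation succeeds only while journal space is above its watermark,
 * so copygc keeps making progress after ordinary commits start blocking */
```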
-
Kent Overstreet authored
This makes an array of strings available, like our other enums.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
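The x-macro pattern bcachefs uses for this; the list below is illustrative, not the one this commit adds:

```c
#define EXAMPLE_STATES()	\
	x(clean)		\
	x(dirty)		\
	x(error)

enum example_state {
#define x(n)	EXAMPLE_STATE_##n,
	EXAMPLE_STATES()
#undef x
};

const char * const example_state_strs[] = {
#define x(n)	#n,
	EXAMPLE_STATES()
#undef x
	NULL
};
```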
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
bch2_btree_iter_next_node() was mucking with other btree_path state without setting path->uptodate to be consistent with the fact that the path is very much no longer uptodate - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Six locks have a percpu mode, which we use for interior btree nodes, as well as btree key cache keys for the subvolumes btree. We've been switching locks back and forth between percpu and non-percpu mode as needed, but it turns out this is racy - when we're reusing an existing node, other threads could be attempting to lock it while we're switching it between modes.

This patch fixes this by never switching 'struct btree' between the two modes, and instead segregating them between two different freed lists.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
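A sketch of the segregation; the field names are assumptions:

```c
struct btree_cache {
	struct list_head	freed_pcpu;	/* six lock in percpu mode */
	struct list_head	freed_nonpcpu;	/* six lock in ordinary atomic mode */
	/* ... */
};

/* a freed node is only ever reused from the matching list, so its lock
 * mode never changes while other threads might be attempting to lock it */
```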
-
Kent Overstreet authored
We don't need to pass the number of nodes required to bch2_btree_update_start, just whether we're doing a split at @level. This is prep work for a fix to our usage of six lock's percpu mode, which is going to require us to count up and allocate interior nodes and leaf nodes separately.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
When we do an interior btree update, we create new btree nodes and link them into the btree in memory, but they don't become reachable on disk until later, when btree_update_nodes_written_trans() runs.

Updates to the new nodes can thus happen before they're reachable on disk, and if the updates to those new nodes are written before the nodes become reachable, we would then drop the journal pin for those updates before the btree has them. This is what the journal pin in bch2_btree_update_start() was protecting against. However, it's not actually needed because we don't allow subsequent append writes to btree nodes until the node is reachable on disk.

Dropping this unneeded pin also fixes a bug introduced by "bcachefs: Journal seq now incremented at entry open, not close" - in the new code, if the journal is completely empty a journal pin list for journal_cur_seq() won't exist.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Checking btree_node_may_write() isn't atomic with the other btree flags, dirty and need_write in particular. There was a rare race where we'd unblock a node from writing while __btree_node_flush() was setting need_write, and no thread would notice that the node was now both able to write and needed to be written.

Fix this by adding btree node flags for will_make_reachable and write_blocked that can be checked in the cmpxchg loop in __bch2_btree_node_write.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
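A sketch of the cmpxchg loop this enables, with the flag names from the message; the loop shape and surrounding function are approximated:

```c
static void example_maybe_write(struct btree *b)
{
	unsigned long old, new;

	do {
		old = new = READ_ONCE(b->flags);

		/* every "may we write this node now?" condition is decided
		 * against a single atomically-read flags word: */
		if (!(old & (1UL << BTREE_NODE_dirty)) ||
		     (old & (1UL << BTREE_NODE_write_blocked)) ||
		     (old & (1UL << BTREE_NODE_will_make_reachable)))
			return;

		new &= ~(1UL << BTREE_NODE_dirty);
		new |=   1UL << BTREE_NODE_write_in_flight;
	} while (cmpxchg(&b->flags, old, new) != old);

	/* we now own the write: kick off the actual IO */
}
```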
-
Kent Overstreet authored
btree_node_write_if_need() kicks off a btree node write only if need_write is set; this makes the locking easier to reason about by moving the check into the cmpxchg loop in __bch2_btree_node_write().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This is for adding an array of strings for btree node flag names.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This was just dead code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
In btree_update_interior.c, we were changing a path's level directly - which affects path sort order - without re-sorting paths, leading to assertions when bch2_path_get() verified paths were sorted correctly.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This patch changes printbufs to dynamically allocate and reallocate a buffer as needed. Stack usage has become a bit of a problem, and a major cause of that has been static size string buffers on the stack.

The most involved part of this refactoring is that printbufs must now be exited with printbuf_exit().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We're now coming up with triggers that modify the update being done. A bkey_s_c is const - bkey_i is the correct type to be using here.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
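The type distinction at issue, per the bcachefs bkey definitions:

```c
struct bkey_s_c {			/* "split, const": a read-only view */
	const struct bkey	*k;
	const struct bch_val	*v;
};

struct bkey_i {				/* "inline": key directly followed by
					 * value, and mutable - what a trigger
					 * needs in order to modify the update */
	struct bkey		k;
	struct bch_val		v;
};
```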
-
Kent Overstreet authored
BTREE_NODE_SEQ() is supposed to give us a time ordering of btree nodes on disk, so that we can tell which btree node is newer if we ever have to scan the entire device to find btree nodes. The btree node merge path wasn't setting it correctly on the new node - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This consolidates some of the btree node locking code, so that when we're blocked taking a write lock on a node it shows up in bch2_btree_trans_to_text(), along with intent and read locks.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
These nodes aren't reachable by other threads, so there's no need to keep them locked - and this fixes a bug with the assertion in bch2_trans_unlock() firing on transaction restart.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
bch2_btree_update_key() is used in the btree node write path - before delivering the completion we have to update the parent pointer with the number of sectors written.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Symbol decoding, via %ps, isn't supported in userspace - this will also be faster when we're using trans->fn in the fast path, as with the new BCH_JSET_ENTRY_log journal messages.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
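A sketch of the approach; the init macro and function signature here are assumptions about the shape of the change, not the verbatim code:

```c
/* before: save an instruction pointer and decode it at print time with
 * "%ps", which needs kernel symbol tables that userspace builds lack */

/* after: capture the caller's name as a plain string at init time: */
#define bch2_trans_init(_trans, _c, _nr_iters, _mem)			\
	__bch2_trans_init(_trans, _c, _nr_iters, _mem, __func__)

void __bch2_trans_init(struct btree_trans *trans, struct bch_fs *c,
		       unsigned nr_iters, size_t mem, const char *fn)
{
	trans->fn = fn;		/* constant string: cheap to store and print */
	/* ... */
}
```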
-
Kent Overstreet authored
With BTREE_ITER_WITH_JOURNAL, there are no longer any restrictions on the order we have to replay keys from the journal in, and we can also start up journal reclaim right away - and delete a bunch of code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This adds a new btree iterator flag, BTREE_ITER_WITH_JOURNAL, that is automatically enabled when initializing a btree iterator before journal replay has completed - it overlays the contents of the journal with the btree.

This lets us delete bch2_btree_and_journal_walk() and just use the normal btree iterator interface instead - which also lets us delete a significant amount of duplicated code.

Note that BTREE_ITER_WITH_JOURNAL is still unoptimized in this patch - we're redoing the binary search over keys in the journal every time we call bch2_btree_iter_peek().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
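A conceptual sketch only, not the real implementation: peek returns whichever of the btree key and the journal-overlay key sorts first, with the journal key (being newer) winning ties. The helper names are hypothetical:

```c
static struct bkey_s_c example_peek(struct btree_iter *iter)
{
	struct bkey_s_c btree_k   = example_btree_peek(iter);		/* hypothetical */
	struct bkey_i  *journal_k = example_journal_keys_peek(iter);	/* hypothetical */

	if (journal_k &&
	    (!btree_k.k || bpos_cmp(journal_k->k.p, btree_k.k->p) <= 0))
		return bkey_i_to_s_c(journal_k);	/* journal overlays the btree */

	return btree_k;
}
```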
-