Commits · f299d57350b2450c522dc7780400ce811f4847ec · Kirill Smelkov / linux

22 Oct, 2023 40 commits

bcachefs: Refactor filesystem usage accounting · f299d573

Kent Overstreet authored Nov 13, 2020

Various filesystem usage counters are kept in percpu counters, with one
set per in-flight journal buffer. Right now all the code that deals with
them assumes that there are only two buffers/sets of counters, but the
number of journal bufs is getting increased to 4 in the next patch - so
refactor that code to not assume a constant.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
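
As a rough illustration of the refactoring described in this commit message, the sketch below keeps one set of counters per in-flight journal buffer and sums them with a loop over the array rather than hard-coding two sets. All names here (JOURNAL_BUF_NR, struct fs_usage, usage_add, usage_read) are hypothetical and not taken from the patch, and plain integers stand in for the percpu counters.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: number of in-flight journal buffers. */
#define JOURNAL_BUF_NR 4

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Simplified stand-in for one set of usage counters (not per-CPU here). */
struct fs_usage {
	uint64_t sectors_used;
};

struct fs {
	struct fs_usage base;                    /* counters already folded in */
	struct fs_usage pending[JOURNAL_BUF_NR]; /* one set per journal buffer */
};

/* Account a delta against the set that belongs to journal sequence seq. */
static void usage_add(struct fs *c, uint64_t seq, uint64_t sectors)
{
	c->pending[seq % ARRAY_SIZE(c->pending)].sectors_used += sectors;
}

/* Total usage: visit every set instead of assuming there are exactly two. */
static uint64_t usage_read(const struct fs *c)
{
	uint64_t total = c->base.sectors_used;

	for (size_t i = 0; i < ARRAY_SIZE(c->pending); i++)
		total += c->pending[i].sectors_used;
	return total;
}
```

With this shape, changing the number of journal buffers only means changing the one constant; no caller needs to know how many sets exist.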

bcachefs: Fix spurious alloc errors on forced shutdown · 7bfbbd88

Kent Overstreet authored Dec 02, 2020

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix some spurious gcc warnings · b206df6e

Kent Overstreet authored Dec 03, 2020

These only come up when building in userspace, for some reason.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix journal_flush_seq() · c5bb1690

Kent Overstreet authored Dec 02, 2020

The error check was inverted - leading fsyncs to get stuck and hang,
oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_trans_get_iter() no longer returns errors · 3eb26d01

Kent Overstreet authored Dec 01, 2020

Since we now always preallocate the maximum number of iterators when we
initialize a btree transaction, getting an iterator never fails - we can
delete a fair amount of error path code.

This patch also simplifies the iterator allocation code a bit.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add error handling to unit & perf tests · ec3d21a9

Kent Overstreet authored Dec 01, 2020

This way, these tests can be used with tests that inject IO errors and
shut down the filesystem.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Journal pin refactoring · 231db03c

Kent Overstreet authored Dec 01, 2020

This deletes some duplicated code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix for fsck spuriously finding duplicate extents · 34c1cd6a

Kent Overstreet authored Dec 01, 2020

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Use BTREE_ITER_PREFETCH in journal+btree iter · 2e9f3b88

Kent Overstreet authored Dec 01, 2020

Adding the journal+btree iter introduced a regression where we
stopped using BTREE_ITER_PREFETCH - this is a performance regression on
rotating disks.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Ensure we always have a journal pin in interior update path · 04e23a56

Kent Overstreet authored Nov 30, 2020

For the new nodes an interior btree update makes reachable, updates to
those nodes may be journalled after the btree update starts but before
the transactional part - where we make those nodes reachable. Those
updates need to be kept in the journal until after the btree update
completes, hence we should always get a journal pin at the start of the
interior update.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Change a BUG_ON() to a fatal error · d7b04163

Kent Overstreet authored Nov 30, 2020

In the btree key cache code, failing to flush a dirty key is a serious
error, but it doesn't need to be a BUG_ON(), we can stop the filesystem
instead.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix error in filesystem initialization · d0022290

Kent Overstreet authored Nov 29, 2020

The rhashtable code doesn't like it when we destroy an rhashtable that
was never initialized.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix journal reclaim spinning in recovery · 5731cf01

Kent Overstreet authored Nov 29, 2020

We can't run journal reclaim until we've finished replaying updates to
interior btree nodes - the check for this was in the wrong place though,
leading to journal reclaim spinning before it was allowed to proceed.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix for __readahead_batch getting partial batch · 89931472

Kent Overstreet authored Nov 29, 2020

We were incorrectly ignoring the return value of __readahead_batch,
leading to a null ptr deref in __bch2_page_state_create().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Optimize bch2_journal_flush_seq_async() · 33b3b1dc

Kent Overstreet authored Nov 20, 2020

Avoid taking the journal lock if we don't have to.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Delete dead code · 7b489207

Kent Overstreet authored Nov 20, 2020

The interior btree node update path has changed, this is no longer
needed.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_btree_delete_range_trans() · 087c2019

Kent Overstreet authored Nov 20, 2020

This helps reduce stack usage by avoiding multiple btree_trans on the
stack.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't use bkey cache for inode update in fsck · 6584e84a

Kent Overstreet authored Nov 20, 2020

fsck doesn't know about the btree key cache, and non-cached iterators
aren't cache coherent (yet?)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix an rcu splat · f3020550

Kent Overstreet authored Nov 20, 2020

bch2_bucket_alloc() requires rcu_read_lock() to be held.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Move journal reclaim to a kthread · b7a9bbfc

Kent Overstreet authored Nov 19, 2020

This is to make tracing easier.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Throttle updates when btree key cache is too dirty · d5425a3b

Kent Overstreet authored Nov 19, 2020

This is needed to ensure we don't deadlock because journal reclaim and
thus memory reclaim isn't making forward progress.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Journal reclaim requires memalloc_noreclaim_save() · 9d4582ff

Kent Overstreet authored Nov 19, 2020

Memory reclaim requires journal reclaim to make forward progress - it's
what cleans our caches - thus, while we're in journal reclaim or holding
the journal reclaim lock we can't recurse into memory reclaim.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Simplify transaction commit error path · b3c2a06b

Kent Overstreet authored Nov 20, 2020

The transaction restart path traverses all iterators, we don't need to
do it here.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Ensure journal reclaim runs when btree key cache is too dirty · 8a92e545

Kent Overstreet authored Nov 19, 2020

Ensuring the key cache isn't too dirty is critical for ensuring that the
shrinker can reclaim memory.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve btree key cache shrinker · 12590720

Kent Overstreet authored Nov 19, 2020

The shrinker should start scanning for entries that can be freed oldest
to newest - this way, we can avoid scanning a lot of entries that are
too new to be freed.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
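
The idea of scanning oldest to newest can be sketched as follows. This is only an illustration under stated assumptions, not the shrinker from the patch: struct entry, age_stamp, freeable_upto and shrink_oldest_first are hypothetical names, and the entries are assumed to be kept in an array sorted oldest first.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache entry: age_stamp grows as entries get newer. */
struct entry {
	uint64_t age_stamp;
	int      freed;
};

/*
 * entries[] is assumed to be sorted oldest first. An entry may be freed
 * only if its stamp is <= freeable_upto. Returns the number freed.
 */
static size_t shrink_oldest_first(struct entry *entries, size_t nr,
				  uint64_t freeable_upto, size_t nr_to_scan)
{
	size_t freed = 0;

	for (size_t i = 0; i < nr && i < nr_to_scan; i++) {
		if (entries[i].age_stamp > freeable_upto)
			break; /* every later entry is newer still */
		entries[i].freed = 1;
		freed++;
	}
	return freed;
}
```

Because the scan stops at the first entry that is too new, the entries that cannot be freed yet are never visited.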

bcachefs: More debug code improvements · 4e92cbb6

Kent Overstreet authored Nov 19, 2020

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add a kmem_cache for btree_key_cache objects · 14ba3706

Kent Overstreet authored Nov 18, 2020

We allocate a lot of these, and we're seeing sporadic OOMs - this will
help with tracking those down.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Be more precise with journal error reporting · ed0e24c0

Kent Overstreet authored Nov 18, 2020

We were incorrectly detecting a journal deadlock - the journal filling
up - when only the journal pin fifo had filled up; if the journal pin
fifo is full that just means we need to wait on reclaim.

This plumbs through better error reporting so we can better discriminate
in the journal_res_get path what's going on.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add btree cache stats to sysfs · d8ebed7d

Kent Overstreet authored Nov 19, 2020

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add an ioctl for resizing journal on a device · e8c851b3

Kent Overstreet authored Nov 16, 2020

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add more debug checks · 1c74cec1

Kent Overstreet authored Nov 16, 2020

Tracking down a bug where we see a btree node pointer in the wrong node.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Dump journal state when the journal deadlocks · e8bd002b

Kent Overstreet authored Nov 16, 2020

Currently tracking down one of these bugs.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't use percpu btree_iter buf in userspace · dbd1e825

Kent Overstreet authored Nov 16, 2020

bcachefs-tools doesn't have a real percpu (per thread) implementation
yet.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Set preallocated transaction mem to avoid restarts · 0b5c9f59

Kent Overstreet authored Nov 15, 2020

Based on observation of tracepoints, this will reduce transaction restarts.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Convert tracepoints to use %ps, not %pf · 3dc5fcfc

Kent Overstreet authored Nov 16, 2020

Symbol decoding was changed from %pf to %ps
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix journal entry repair code · 4d54337c

Kent Overstreet authored Nov 16, 2020

When we detect bad keys in the journal that have to be dropped, the flow
control was wrong - we ended up not checking the next key in that entry.
Oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
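
The commit message does not show the faulty loop, so the sketch below only illustrates one common way this class of bug arises: when a bad item is dropped in place, the next item slides into the current slot, and advancing the cursor anyway skips it. The names key_is_bad and drop_bad_keys are hypothetical, and plain integers stand in for journal keys.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical validity check: negative values stand in for bad keys. */
static int key_is_bad(int k)
{
	return k < 0;
}

/* Drop bad keys in place; returns the new number of keys. */
static size_t drop_bad_keys(int *keys, size_t nr)
{
	size_t i = 0;

	while (i < nr) {
		if (key_is_bad(keys[i])) {
			/* Close the gap left by the dropped key. */
			memmove(&keys[i], &keys[i + 1],
				(nr - i - 1) * sizeof(keys[0]));
			nr--;
			continue; /* slot i now holds the next key: check it too */
		}
		i++;
	}
	return nr;
}
```

Incrementing i after a drop would leave a bad key unchecked whenever two bad keys are adjacent.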

bcachefs: Add a shrinker for the btree key cache · 628a3ad2

Kent Overstreet authored Nov 12, 2020

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Take a SRCU lock in btree transactions · 876c7af3

Kent Overstreet authored Nov 15, 2020

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Check for errors from register_shrinker() · d8b46004

Kent Overstreet authored Nov 15, 2020

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assorted journal refactoring · 158eecb8

Kent Overstreet authored Nov 14, 2020

Improved the way we track various state by adding j->err_seq, which
records the first journal sequence number that encountered an error
being written, and j->last_empty_seq, which records the most recent
journal entry that was completely empty.

Also, use the low bits of the journal sequence number to index the
corresponding journal_buf.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
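
Indexing a journal_buf by the low bits of the sequence number can be sketched as below. This is a minimal illustration that assumes the number of buffers is a power of two; the macro names, the struct layouts and the helper seq_to_buf are chosen for the example and are not quoted from the patch.

```c
#include <stdint.h>

/* Assumed for illustration: 2 bits of sequence number, so 4 buffers. */
#define JOURNAL_BUF_BITS 2
#define JOURNAL_BUF_NR   (1U << JOURNAL_BUF_BITS)
#define JOURNAL_BUF_MASK (JOURNAL_BUF_NR - 1)

/* Simplified stand-ins for the real structures. */
struct journal_buf {
	uint64_t seq;
};

struct journal {
	struct journal_buf buf[JOURNAL_BUF_NR];
};

/* The low bits of seq select the buffer; consecutive seqs rotate through. */
static struct journal_buf *seq_to_buf(struct journal *j, uint64_t seq)
{
	return &j->buf[seq & JOURNAL_BUF_MASK];
}
```

With a power-of-two buffer count, the mask is equivalent to seq % JOURNAL_BUF_NR, and no separate index needs to be tracked alongside the sequence number.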
