- 22 Oct, 2023 40 commits
-
Kent Overstreet authored
Previously, we were missing accounting for buckets in the need_gc_gens and need_discard states. This matters because buckets in those states need other btree operations done before they can be used, so they can't be counted when checking the current number of free buckets against the allocation watermark. Also, we weren't directly counting free buckets at all. Now, data type 0 == BCH_DATA_free, and free buckets are counted; this means we can get rid of the separate (poorly defined) count of unavailable buckets. This is a new on-disk format version, with upgrade and fsck required for the accounting changes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
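A minimal sketch of the new accounting, with illustrative names (only BCH_DATA_free is taken from the commit message; the rest are stand-ins):

    /* Illustrative sketch, not the kernel enum: data type 0 is "free",
     * and only those buckets count against the allocation watermark -
     * buckets needing gc_gens or discard work are tracked separately. */
    enum data_type {
        DATA_free = 0,          /* mirrors BCH_DATA_free == 0 */
        DATA_need_gc_gens,      /* gen must be reset by gc before reuse */
        DATA_need_discard,      /* must be discarded before reuse */
        DATA_NR,
    };

    static int may_allocate(const unsigned long nr_buckets[DATA_NR],
                            unsigned long watermark)
    {
        /* no separate "unavailable" count needed anymore */
        return nr_buckets[DATA_free] >= watermark;
    }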
-
Kent Overstreet authored
We're currently debugging an issue with discards not getting run; this patch adds a manual trigger so we can then watch the tracepoint while it runs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
- We were failing to start topology repair, because we hadn't set the superblock flag indicating it needed to run
- set_node_min() forgot to update the btree node's key
- bch2_gc_alloc_reset() didn't reset the data type, leading to inserting an invalid key that was empty but had a nonzero data type

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This gets us better error messages. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
.key_invalid is a better place for this assertion. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
In the future printbufs will be mempool-ified, so we shouldn't be using more than one at a time if we don't have to. This also fixes an extra trailing newline. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
We've been seeing this error in fsck and we weren't able to track down where it came from - but now that .key_invalid methods take a rw argument, we can safely check for this. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
In check_extents() and check_dirents(), we're working towards only handling transaction restarts in one place, at the top level - but we're not there yet. check_i_sectors() and check_subdir_count() handle transaction restarts locally, which means the iterator for the dirent/extent is left unlocked (should_be_locked == 0), leading to asserts popping when we go to do updates. This patch hacks around this for now, until we can delete the offending code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This adds a new parameter to .key_invalid() methods for whether the key is being read or written; the idea being that methods can do more aggressive checks when a key is newly created and being written, when we wouldn't want to delete the key because of those checks. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
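A sketch of the pattern with simplified types (the struct and the check here are stand-ins, not bcachefs's actual bkey or methods):

    #define READ  0
    #define WRITE 1

    struct bkey { unsigned type, size; };   /* stand-in, heavily simplified */

    static const char *example_key_invalid(const struct bkey *k, int rw)
    {
        if (!k->size)
            return "zero-size key";

        /* stricter check, enforced only at write time: existing keys on
         * disk may predate it, and we don't want to delete those */
        if (rw == WRITE && !k->type)
            return "invalid type";

        return NULL;    /* valid */
    }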
-
Kent Overstreet authored
- Move checks for whether the device & bucket are valid from the .key_invalid method to bch2_check_alloc_key(). This is because .key_invalid() is called on keys that may no longer exist (post journal replay), which is a problem when removing/resizing devices.
- We weren't checking the need_discard btree to ensure that every set bucket has a corresponding alloc key. This refactors the code for checking the freespace btree, so that it now checks both.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Btree updates before we go RW work by inserting into the array of keys that journal replay will insert - but inserting into a flat array is O(n), meaning if btree_gc needs to update many alloc keys, we're O(n^2). Fortunately, the updates btree_gc does happens in sequential order, which means a gap buffer works nicely here - this patch implements a gap buffer for journal keys. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
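An illustrative gap buffer in plain C, not the kernel implementation; it shows why sequential inserts stay linear:

    #include <stdint.h>
    #include <string.h>

    /* One array with a movable gap: moving the gap costs O(distance),
     * inserting at the gap is O(1), so a run of sequential inserts -
     * like btree_gc updating alloc keys in order - is linear overall. */
    struct gap_buf {
        uint64_t    *d;
        size_t      gap_start, gap_end;     /* gap is [gap_start, gap_end) */
    };

    static void gap_move(struct gap_buf *b, size_t pos)
    {
        size_t width = b->gap_end - b->gap_start;

        if (pos < b->gap_start)     /* slide elements right, gap moves left */
            memmove(b->d + pos + width, b->d + pos,
                    (b->gap_start - pos) * sizeof(*b->d));
        else if (pos > b->gap_start)    /* slide elements left, gap moves right */
            memmove(b->d + b->gap_start, b->d + b->gap_end,
                    (pos - b->gap_start) * sizeof(*b->d));

        b->gap_start = pos;
        b->gap_end   = pos + width;
    }

    /* caller must ensure the gap is nonempty (growing the array is elided) */
    static void gap_insert(struct gap_buf *b, size_t pos, uint64_t v)
    {
        gap_move(b, pos);       /* O(1) when pos is where we last inserted */
        b->d[b->gap_start++] = v;
    }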
-
Kent Overstreet authored
This behavior dates from the early, early days of bcache, and upon further delving appears to not make any sense. The shrinker only works in terms of 'objects' of unknown size; normalizing to pages only had the effect of changing the batch size, which we could do directly - if we wanted; we probably don't. Normalizing to pages meant our batch size was very small, which seems to have been keeping us from doing as much shrinking as we should be under heavy memory pressure; this patch appears to alleviate some OOMs we've been seeing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
mark_stripe_bucket() was busted; it was using @new uninitialized. Also, clean up all the gc mark functions, and convert them to the same style. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This neatly avoids bugs where we fail partway through initializing a new filesystem, if we just don't write out partly-initialized state. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
With printbufs, it's now easy to build up multi-line log messages and emit them with one call, which is good because it prevents multiple multi-line log messages from getting interspersed in the log buffer; this patch also improves the formatting and converts it to the latest style. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
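A self-contained userspace analogue of the pattern (the printbuf API itself is kernel code, so this uses snprintf in its place):

    #include <stdio.h>

    /* Build the whole multi-line message in one buffer, emit it with a
     * single call - concurrent messages can then never interleave
     * line-by-line. */
    static void log_entry(unsigned long long seq, unsigned version)
    {
        char buf[256];
        int n = 0;

        n += snprintf(buf + n, sizeof(buf) - n, "journal entry %llu:\n", seq);
        n += snprintf(buf + n, sizeof(buf) - n, "  version %u", version);

        fprintf(stderr, "%s\n", buf);   /* one call, one message */
    }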
-
Kent Overstreet authored
Trivial cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
In a few places we were passing a variable to pr_buf() for the format string - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
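The bug pattern in miniature, using printf in place of pr_buf:

    #include <stdio.h>

    int main(void)
    {
        const char *msg = "cache 100% full";

        /* printf(msg);         buggy: the '%' is parsed as a conversion */
        printf("%s\n", msg);    /* fixed: literal format string */
        return 0;
    }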
-
Kent Overstreet authored
This switches struct bucket to using a lock, instead of cmpxchg. And now that the protected members no longer need to fit into a u64, we can expand the sector counts to 32 bits. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
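An illustrative layout, not the exact kernel struct, showing why a lock lifts the u64 packing constraint:

    #include <stdint.h>

    /* With a lock guarding the fields, they no longer need to be packed
     * into one u64 for cmpxchg, so the sector counts can widen from
     * cramped bitfields to full 32 bits. */
    struct bucket {
        uint8_t     lock;           /* bit spinlock protecting the rest */
        uint8_t     gen;
        uint8_t     data_type;
        uint32_t    dirty_sectors;
        uint32_t    cached_sectors;
    };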
-
Kent Overstreet authored
All code using the in-memory bucket array, excluding GC, has now been converted to use the alloc btree directly - so we can finally delete it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This is one of the last steps in getting rid of the main in-memory bucket array. This changes bch2_dev_usage_update() to take bkey_alloc_unpacked instead of bucket_mark, and for the places where we are in fact working with bucket_mark and don't have bkey_alloc_unpacked, we add a wrapper that takes bucket_mark and converts to bkey_alloc_unpacked. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
In the old allocator code, preparing an existing empty bucket was part of the same code path that invalidated buckets containing cached data. In the new allocator code this is no longer the case: the main allocator path finds empty buckets (via the new freespace btree), and can't allocate buckets that contain cached data. We now need a separate code path to invalidate buckets containing cached data when we're low on empty buckets, which this patch implements. When the number of free buckets falls too low, the new invalidate path runs; it uses the LRU btree to pick buckets containing cached data to invalidate until we're back above our watermark. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
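A control-flow sketch; the helper names here are hypothetical, not the kernel API:

    while (nr_free_buckets(ca) < ca->free_watermark) {
        u64 bucket;

        /* the LRU btree yields cached-data buckets, oldest first */
        if (!lru_pop_oldest(ca, &bucket))
            break;                      /* no cached data left to evict */

        invalidate_bucket(ca, bucket);  /* drop cached data, free the bucket */
    }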
-
Kent Overstreet authored
In the old allocator code, buckets would be discarded just prior to being used - this made sense in bcache where we were discarding buckets just after invalidating the cached data they contain, but in a filesystem where we typically have more free space we want to be discarding buckets when they become empty. This patch implements the new behaviour - it checks the need_discard btree for buckets awaiting discards, and then clears the appropriate bit in the alloc btree, which moves the buckets to the freespace btree. Additionally, discards are now enabled by default. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
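A sketch of the background discard pass; the iterator and helpers are hypothetical names for the steps the message describes:

    for_each_need_discard_bucket(ca, bucket) {
        issue_device_discard(ca, bucket);   /* discard the now-empty bucket */
        clear_need_discard(ca, bucket);     /* alloc-key update; its trigger
                                             * moves the bucket to freespace */
    }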
-
Kent Overstreet authored
Now that we have new persistent data structures for the allocator, this patch converts the allocator to use them. Now, foreground bucket allocation uses the freespace btree to find buckets to allocate, instead of popping buckets off the freelist. The background allocator threads are no longer needed and are deleted, as well as the allocator freelists. Now we only need background tasks for invalidating buckets containing cached data (when we are low on empty buckets), and for issuing discards. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This adds two new btrees for the upcoming allocator rewrite: an extents btree of free buckets, and a btree for buckets awaiting discards. We also add a new trigger for alloc keys to keep the new btrees up to date, and a compatibility path to initialize them on existing filesystems. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This introduces a new alloc key which doesn't use varints. Soon we'll be adding backpointers and storing them in alloc keys, which means our pack/unpack workflow for alloc keys won't really work - we'll need to be mutating alloc keys in place. Instead of bch2_alloc_unpack(), we now have bch2_alloc_to_v4() that converts older types of alloc keys to v4 if needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
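A sketch of the conversion helper; bch2_alloc_to_v4() is the function named above, but the body and unpack_old_alloc_key() here are illustrative:

    void bch2_alloc_to_v4(struct bkey_s_c k, struct bch_alloc_v4 *out)
    {
        if (k.k->type == KEY_TYPE_alloc_v4)
            *out = *bkey_s_c_to_alloc_v4(k).v;  /* fixed layout: copy as-is */
        else
            unpack_old_alloc_key(k, out);       /* v1-v3: decode varint fields */
    }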
-
Kent Overstreet authored
This implements new persistent LRUs, to be used for buckets containing cached data, as well as stripes ordered by time when a block became empty. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
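An illustrative key encoding, not the exact on-disk format:

    #include <stdint.h>

    /* Entries sort by time, so iterating the LRU btree in key order
     * visits least-recently-used buckets first. */
    struct lru_key {
        uint64_t    lru_id;     /* which LRU (e.g. per-device cached data) */
        uint64_t    time;       /* last-use / became-empty time; oldest first */
    };

    struct lru_val {
        uint64_t    idx;        /* the bucket (or stripe block) referred to */
    };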
-
Kent Overstreet authored
A new empty key type, to be used when using a btree as a set. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Add a new superblock field which represents journal buckets as ranges, and move the code for the superblock journal fields to journal_sb.c. This also reworks the code for resizing the journal to write the new superblock before using the new journal buckets, and thus be a bit safer. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
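An illustrative sketch of the range representation (field names are stand-ins for whatever the superblock field actually uses):

    #include <stdint.h>

    /* A contiguous run of journal buckets costs one (start, nr) pair
     * instead of one superblock entry per bucket. */
    struct journal_bucket_range {
        uint64_t    start;      /* first bucket in the run */
        uint64_t    nr;         /* number of contiguous buckets */
    };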
-
Kent Overstreet authored
In the write path, after the writes to the block device(s) complete we have to punt to process context to do the btree update. Instead of using the work item embedded in op->cl, this patch switches to a per-write-point work item. This helps with two different issues:
- lock contention: btree updates to the same write point will (usually) be updating the same alloc keys
- context switch overhead: when we're bottlenecked on btree updates, having a thread (running out of a work item) checking the write point for completed ops is cheaper than queueing up a new work item and waking up a kworker

In an arbitrary benchmark, 4k random writes with fio running inside a VM, this patch resulted in a 10% improvement in total iops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
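A sketch of the shape of the change; the field names are illustrative, though struct work_struct and friends are the real kernel types:

    #include <linux/workqueue.h>
    #include <linux/list.h>
    #include <linux/spinlock.h>

    /* Completed writes queue on their write point, and one work item
     * per write point drains them, so btree updates touching the same
     * alloc keys run from a single thread. */
    struct write_point {
        spinlock_t          lock;               /* protects the list below */
        struct list_head    completed_writes;   /* writes awaiting btree update */
        struct work_struct  index_update_work;  /* drains completed_writes */
    };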
-
Kent Overstreet authored
This simplifies the logic in bch2_btree_update_start() a bit, handling the unlock/block logic more locally. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Since journal reclaim -> btree key cache flushing may require the allocation of new btree nodes, it has an implicit dependency on copygc in order to make forward progress - so we should avoid blocking copygc unless the journal is really close to full. This introduces watermarks to replace our single MAY_GET_UNRESERVED bit in the journal, and adds a watermark for copygc and plumbs it through. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
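A sketch of the watermark idea; the enumerator names are illustrative:

    /* Reservations carry an ordered watermark, and only fail once the
     * journal is fuller than that watermark allows. copygc gets its own
     * level below "reserved", since journal reclaim indirectly depends
     * on copygc progress. */
    enum journal_watermark {
        JOURNAL_WATERMARK_any,          /* normal operation */
        JOURNAL_WATERMARK_copygc,       /* still succeeds when nearly full */
        JOURNAL_WATERMARK_reserved,     /* last resort, e.g. reclaim itself */
    };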
-
Kent Overstreet authored
We don't actually want copygc allocations to be nowait - an allocation for copygc might fail and then later succeed due to a bucket needing to wait on journal commit, or to be discarded. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
When bch2_journal_pin_set() is updating an existing pin, we shouldn't call bch2_journal_reclaim_fast() after dropping the old pin and before adding the new pin - that could reclaim the entry we're trying to pin. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This makes an array of strings available, like our other enums. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
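The usual x-macro way to get such a string array (a generic sketch, not the specific enum this commit touches):

    /* Define the list once, expand it twice, and the enum and its
     * string array can never drift out of sync. */
    #define EXAMPLE_STATES()    \
        x(free)                 \
        x(need_gc_gens)         \
        x(need_discard)

    enum example_state {
    #define x(n)    STATE_##n,
        EXAMPLE_STATES()
    #undef x
        STATE_NR,
    };

    static const char * const example_state_strs[] = {
    #define x(n)    #n,
        EXAMPLE_STATES()
    #undef x
        NULL,
    };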
-
Kent Overstreet authored
For backpointers, we'll need to delete old backpointers before adding new backpointers - otherwise we'll run into spurious duplicate backpointer errors. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
For backpointers, we need to switch the order triggers are run in: we need to run triggers for deletions/overwrites before triggers for inserts. To avoid breaking the reflink triggers, this patch moves deleting of indirect extents with refcount=0 to their triggers, instead of doing it when we update those keys. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-