Commits · 21aec962dfec2df11694350e5b2d3a9a9c298e7d · Kirill Smelkov / linux

22 Oct, 2023 40 commits

bcachefs: New data structure for buckets waiting on journal commit · 21aec962

Kent Overstreet authored 3 years ago

Implement a hash table, using cuckoo hashing, for empty buckets that are
waiting on a journal commit before they can be reused.

This replaces the journal_seq field of bucket_mark, and is part of
eventually getting rid of the in-memory bucket array.

We may need to make bch2_bucket_needs_journal_commit() lockless, pending
profiling and testing.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
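
A minimal userspace sketch of a two-table cuckoo hash along these lines. It is not the kernel implementation: keys are a hypothetical packed (device, bucket) number with 0 reserved to mean "empty slot", each entry stores the journal sequence number that must be flushed before the bucket can be reused, and the table size, hash functions and eviction limit are all assumptions. Growing and rehashing the table is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define BW_SLOTS     64  /* slots per table (hypothetical fixed size) */
#define BW_MAX_KICKS 32  /* evictions before giving up so the caller can grow the table */

struct bw_entry {
	uint64_t bucket;      /* hypothetical packed key; 0 means empty slot */
	uint64_t journal_seq; /* seq that must be flushed before reuse */
};

struct bucket_wait {
	struct bw_entry t[2][BW_SLOTS];
};

/* One multiplicative hash per table; the top 6 bits pick a slot. */
static unsigned bw_hash(uint64_t key, unsigned which)
{
	static const uint64_t mul[2] = { 0x9e3779b97f4a7c15ULL, 0xc2b2ae3d27d4eb4fULL };

	return (unsigned)((key * mul[which]) >> 58) % BW_SLOTS;
}

/* A key can only live in one of two slots, so lookup probes at most two. */
static bool bw_lookup(const struct bucket_wait *h, uint64_t bucket, uint64_t *seq)
{
	for (unsigned i = 0; i < 2; i++) {
		const struct bw_entry *e = &h->t[i][bw_hash(bucket, i)];

		if (e->bucket == bucket) {
			*seq = e->journal_seq;
			return true;
		}
	}
	return false;
}

/*
 * Insert or update.  On failure, *displaced holds an entry that no longer
 * has a slot; the caller must grow the table, rehash and reinsert it.
 */
static bool bw_insert(struct bucket_wait *h, uint64_t bucket, uint64_t seq,
		      struct bw_entry *displaced)
{
	for (unsigned i = 0; i < 2; i++) {
		struct bw_entry *e = &h->t[i][bw_hash(bucket, i)];

		if (e->bucket == bucket) {
			e->journal_seq = seq;   /* already present: update in place */
			return true;
		}
	}

	struct bw_entry cur = { .bucket = bucket, .journal_seq = seq };
	unsigned which = 0;

	for (unsigned kicks = 0; kicks < BW_MAX_KICKS; kicks++) {
		struct bw_entry *slot = &h->t[which][bw_hash(cur.bucket, which)];
		struct bw_entry old = *slot;

		*slot = cur;
		if (!old.bucket)
			return true;    /* landed in an empty slot */
		cur = old;              /* re-home the evicted entry ... */
		which ^= 1;             /* ... in its slot in the other table */
	}

	*displaced = cur;
	return false;
}

/* Must we wait for a journal flush before reusing this bucket? */
static bool bw_needs_journal_commit(const struct bucket_wait *h, uint64_t bucket,
				    uint64_t flushed_seq)
{
	uint64_t seq;

	return bw_lookup(h, bucket, &seq) && seq > flushed_seq;
}
```

Lookups, the hot path for a check like bch2_bucket_needs_journal_commit(), stay bounded at two probes; inserts pay for that by occasionally evicting and re-homing existing entries.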

bcachefs: Also print out in-memory gen on stale dirty pointer · f443fa66

Kent Overstreet authored 3 years ago

We're trying to track down a bug that shows itself as newly-created
extents having stale dirty pointers - possibly due to the in-memory gen
and the btree gen being inconsistent. This patch changes the error
message to also print out the in-memory bucket gen when this happens.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add error messages for memory allocation failures · f0f41a6d

Kent Overstreet authored 3 years ago

This adds some missing diagnostics to rare, but annoying to debug,
runtime allocation failure paths.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Optimize bucket reuse · e3ad2937

Kent Overstreet authored 3 years ago

If the btree updates pointing to a bucket were never flushed by the
journal before the bucket became empty again, we can reuse the bucket
without a journal flush.

This tweaks the tracking of journal sequence numbers in alloc keys to
implement this optimization: now, we only update the journal sequence
number in alloc keys on transitions to and from empty. When a bucket
becomes empty, we check if we can tell the journal not to flush entries
starting from when the bucket was used.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
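
A simplified model of that rule, for illustration only. The struct and function names are hypothetical, and "never flushed" is taken to mean that the sequence number recorded when the bucket last became non-empty is newer than the last flushed journal sequence number. The step of telling the journal not to flush those entries afterwards is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified alloc key state for one bucket. */
struct alloc_state {
	uint32_t dirty_sectors;
	uint64_t journal_seq;   /* only updated on empty <-> non-empty transitions */
};

/*
 * Apply an update made in journal entry @seq.  Returns true if the bucket
 * just became empty and can be reused without waiting for a journal flush.
 */
static bool alloc_update(struct alloc_state *a, uint32_t new_dirty,
			 uint64_t seq, uint64_t last_flushed_seq)
{
	bool was_empty = a->dirty_sectors == 0;
	bool now_empty = new_dirty == 0;
	bool reuse_without_flush = false;

	/* a->journal_seq still holds the seq at which the bucket became non-empty */
	if (!was_empty && now_empty)
		reuse_without_flush = a->journal_seq > last_flushed_seq;

	if (was_empty != now_empty)
		a->journal_seq = seq;

	a->dirty_sectors = new_dirty;
	return reuse_without_flush;
}
```

Because the sequence number is only touched on transitions, ordinary updates to a non-empty bucket leave the "became non-empty at" value intact for this check.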

bcachefs: Kill bch2_ec_mem_alloc() · 13f914ec

Kent Overstreet authored 3 years ago

bch2_ec_mem_alloc() was only used by GC, and there's no real need to
preallocate the stripes radix tree since we can cope fine with memory
allocation failure when we use the radix tree. This deletes a fair bit
of code, and it's also needed for the upcoming patch because
bch2_btree_iter_peek_prev() won't be working before journal replay
completes (and using it was incorrect previously, as well).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix allocator + journal interaction · 36f035e9

Kent Overstreet authored 3 years ago

The allocator needs to wait until the last update touching a bucket has
been committed before writing to it again. However, the code was checking
against the last dirty journal sequence number, not the last flushed
journal sequence number.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: New in-memory array for bucket gens · a7860877

Kent Overstreet authored 3 years ago

The main in-memory bucket array is going away, but we'll still need to
keep bucket generations in memory, at least for now - ptr_stale() needs
to be an efficient operation.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
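
Bucket generations are 8-bit counters that wrap around, so a staleness check has to compare them modulo 256. A sketch of how a dense per-device array of gens keeps ptr_stale() cheap; struct bucket_gens and ptr_stale_sketch() are hypothetical, and gen_cmp()/gen_after() illustrate the wraparound comparison rather than quoting the kernel source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dense array of bucket generations for one device. */
struct bucket_gens {
	size_t   nbuckets;
	uint8_t *gens;
};

/* Signed distance between two wrapping 8-bit generation numbers. */
static inline int gen_cmp(uint8_t a, uint8_t b)
{
	return (int8_t)(a - b);
}

/* How far @a is ahead of @b, or 0 if it is not ahead. */
static inline int gen_after(uint8_t a, uint8_t b)
{
	int r = gen_cmp(a, b);

	return r > 0 ? r : 0;
}

/* Nonzero if the bucket's gen has moved past the gen stored in the pointer. */
static inline int ptr_stale_sketch(const struct bucket_gens *g, size_t bucket,
				   uint8_t ptr_gen)
{
	return gen_after(g->gens[bucket], ptr_gen);
}
```

A stale check is then one array load and a subtraction, without touching any larger per-bucket structure.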

bcachefs: Separate out gc_bucket() · 47ac34ec

Kent Overstreet authored 3 years ago

Since the main in-memory bucket array is going away, we don't want to be
calling bucket() or __bucket() when what we want is the GC in-memory
bucket.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_journal_key_insert() no longer transfers ownership · e75b2d4c

Kent Overstreet authored 3 years ago

bch2_journal_key_insert() used to assume that the key passed to it was
allocated with kmalloc(), and on success took ownership. This patch
deletes that behaviour, making it more similar to
bch2_trans_update()/bch2_trans_commit().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
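
A sketch of the new ownership convention, using a hypothetical variable-length key type and list rather than the real journal keys structure: the insert function makes its own copy, so the caller keeps ownership of the key it passed in whether or not the insert succeeds.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-length key, standing in for struct bkey_i. */
struct key {
	unsigned u64s;                /* payload length in 64-bit words */
	unsigned long long v[];
};

/* Hypothetical growable list of owned key copies. */
struct key_list {
	struct key **keys;
	size_t nr, size;
};

static size_t key_bytes(const struct key *k)
{
	return sizeof(*k) + k->u64s * sizeof(k->v[0]);
}

/*
 * Insert a copy of @k.  The caller still owns @k afterwards and may free it,
 * or may have passed a key that was never kmalloc()ed in the first place.
 */
static int key_list_insert(struct key_list *l, const struct key *k)
{
	if (l->nr == l->size) {
		size_t new_size = l->size ? l->size * 2 : 8;
		struct key **n = realloc(l->keys, new_size * sizeof(*n));

		if (!n)
			return -1;    /* kernel code would return -ENOMEM */
		l->keys = n;
		l->size = new_size;
	}

	struct key *copy = malloc(key_bytes(k));

	if (!copy)
		return -1;
	memcpy(copy, k, key_bytes(k));
	l->keys[l->nr++] = copy;
	return 0;
}
```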

bcachefs: bch2_bucket_alloc_new_fs() no longer depends on bucket marks · 77170d0d

Kent Overstreet authored 3 years ago

Now that bch2_bucket_alloc_new_fs() isn't looking at bucket marks to
decide what buckets are eligible to allocate, we can clean up the
filesystem initialization and device add paths. Previously, we had to
use ancient code to mark superblock/journal buckets in the in-memory
bucket marks as we allocated them, and then zero that out and re-do that
marking using the newer transactional bucket mark paths. Now, we can
simply delete the in-memory bucket marking.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Option improvements · 8244f320

Kent Overstreet authored 3 years ago

This adds flags for options that must be a power of two (block size and
btree node size), and options that are stored in the superblock as a
power of two (encoded extent max).

Also: options are now stored in memory in the same units they're
displayed in (bytes), and we now convert when reading from and writing to
the superblock.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
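
A sketch of the conversion this implies, with bytes as the in-memory unit. The flag names and values are illustrative, and which options use a sectors field, a log2 field, or both in the real superblock is an assumption here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical option flags modeled on the description above. */
enum opt_flags {
	OPT_MUST_BE_POW_2    = 1 << 0, /* e.g. block size, btree node size */
	OPT_SB_FIELD_ILOG2   = 1 << 1, /* superblock stores log2 of the value */
	OPT_SB_FIELD_SECTORS = 1 << 2, /* superblock stores 512-byte sectors */
};

static bool is_power_of_2(uint64_t v)
{
	return v && !(v & (v - 1));
}

static unsigned ilog2_u64(uint64_t v)
{
	unsigned r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Validate an in-memory value (bytes) against the option's flags. */
static bool opt_validate(unsigned flags, uint64_t bytes)
{
	return !(flags & OPT_MUST_BE_POW_2) || is_power_of_2(bytes);
}

/* In-memory value (bytes) -> superblock field. */
static uint64_t opt_to_sb(unsigned flags, uint64_t v)
{
	if (flags & OPT_SB_FIELD_SECTORS)
		v >>= 9;
	if (flags & OPT_SB_FIELD_ILOG2)
		v = ilog2_u64(v);
	return v;
}

/* Superblock field -> in-memory value (bytes); inverse of opt_to_sb(). */
static uint64_t opt_from_sb(unsigned flags, uint64_t v)
{
	if (flags & OPT_SB_FIELD_ILOG2)
		v = 1ULL << v;
	if (flags & OPT_SB_FIELD_SECTORS)
		v <<= 9;
	return v;
}
```

Doing the conversions in reverse order in opt_from_sb() is what makes the round trip exact for valid power-of-two values.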

bcachefs: Improve alloc_mem_to_key() · 20572300

Kent Overstreet authored 3 years ago

This moves some common code into alloc_mem_to_key(), which translates
from the in-memory format for a bucket to the btree key format.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_alloc_write() · fb0e4808

Kent Overstreet authored 3 years ago

This adds a new helper, much like the one we have for inode updates,
that allocates the packed alloc key, packs it and calls
bch2_trans_update().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Split out struct gc_stripe from struct stripe · 990d42d1

Kent Overstreet authored 3 years ago

We have two radix trees of stripes - one that mirrors some information
from the stripes btree in normal operation, and another that GC uses to
recalculate block usage counts.

The normal one is now only used for finding partially empty stripes in
order to reuse them - the normal stripes radix tree and the GC stripes
radix tree are used significantly differently, so this patch splits them
into separate types.

In an upcoming patch we'll be replacing c->stripes with a btree that
indexes stripes by the order we want to reuse them.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: bch2_trans_update() is now __must_check · 94a3e1a6

Kent Overstreet authored 3 years ago

With snapshots, bch2_trans_update() has to check if we need a whiteout,
which can cause a transaction restart, so checking its return value is
important now.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
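
In the kernel, __must_check expands to the compiler's warn_unused_result attribute. A userspace sketch of the effect, where trans_update() and ERR_TRANS_RESTART are hypothetical stand-ins: ignoring the return value now produces a compiler warning, which pushes callers to propagate a restart.

```c
/* Roughly how the kernel defines it. */
#define __must_check __attribute__((__warn_unused_result__))

/* Hypothetical error code standing in for a transaction restart. */
#define ERR_TRANS_RESTART 1

/* Hypothetical update: needing a whiteout can force a restart. */
static __must_check int trans_update(int needs_whiteout)
{
	return needs_whiteout ? -ERR_TRANS_RESTART : 0;
}

static int do_update(int needs_whiteout)
{
	/* trans_update(needs_whiteout);  <- would now warn: result ignored */
	int ret = trans_update(needs_whiteout);

	if (ret)
		return ret;   /* propagate so the caller can restart the transaction */
	return 0;
}
```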

bcachefs: Erasure coding fixes · b547d005

Kent Overstreet authored 3 years ago

When we added the stripe and stripe_redundancy fields to alloc keys, we
neglected to add them to the functions that convert back and forth with
the in-memory types.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Handle replica marking fsck errors locally · 181fe42a

Kent Overstreet authored 3 years ago

This simplifies the code quite a bit and eliminates an inconsistency - a
given bkey doesn't necessarily translate to a single replicas entry for
disk space accounting.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Push c->mark_lock usage down to where it is needed · 58e1ea4b

Kent Overstreet authored 3 years ago

This changes the bch2_mark_key() and related paths to take mark lock
where it is needed, instead of taking it in the upper transaction commit
path - by pushing down locking we'll be able to handle fsck errors
locally instead of requiring a separate check in the btree_gc code for
replicas being marked.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Kill bch2_replicas_delta_list_marked() · 502cfb35

Kent Overstreet authored 3 years ago

This changes bch2_trans_fs_usage_apply() to handle failure (replicas
entry missing) by reverting the changes it made - meaning we can make
the main transaction commit path a bit slimmer, and perhaps also
simplify some locking in upcoming patches.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
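
A sketch of the apply-or-revert pattern with a hypothetical fixed-size usage table; the real function works on replicas entries and percpu counters, which are not modeled here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_REPLICAS_ENTRIES 4   /* hypothetical table size */

/* Hypothetical usage table: one counter per replicas entry. */
struct fs_usage {
	bool    entry_exists[NR_REPLICAS_ENTRIES];
	int64_t sectors[NR_REPLICAS_ENTRIES];
};

struct replicas_delta {
	unsigned entry;   /* may name an entry that doesn't exist */
	int64_t  delta;
};

/*
 * Apply all deltas.  If one refers to a missing replicas entry, undo the
 * deltas already applied and fail, leaving @u exactly as it was.
 */
static int fs_usage_apply(struct fs_usage *u, const struct replicas_delta *d,
			  size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (d[i].entry >= NR_REPLICAS_ENTRIES || !u->entry_exists[d[i].entry])
			goto revert;
		u->sectors[d[i].entry] += d[i].delta;
	}
	return 0;
revert:
	while (i--)
		u->sectors[d[i].entry] -= d[i].delta;
	return -1;    /* replicas entry missing */
}
```

Since failure leaves no partial state behind, the commit path does not need a separate "are all entries marked" pass before applying.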

bcachefs: Run insert triggers before overwrite triggers · f0c3f88b

Kent Overstreet authored 3 years ago

Currently, btree triggers are run in natural key order, which presents a
problem for fallocate in INSERT_RANGE mode: since we're moving existing
extents to higher offsets, the trigger for deleting the old extent runs
before the trigger that adds the new extent, potentially leading to
indirect extents being deleted that shouldn't be when the delete causes
the refcount to hit 0.

This changes the order we run triggers so that for a given btree, we run
all insert triggers before overwrite triggers, nicely sidestepping this
issue.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
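
A toy model of why the order matters, assuming a single indirect extent with a refcount and an INSERT_RANGE that moves one reflink pointer to a higher offset. The types and functions are hypothetical and model the triggers for a single btree only.

```c
#include <stdbool.h>
#include <stddef.h>

enum trigger_kind { TRIGGER_INSERT, TRIGGER_OVERWRITE };

/* Hypothetical trigger for one update, listed in natural key order. */
struct trigger {
	enum trigger_kind kind;
	size_t key_pos;
};

struct indirect_extent {
	unsigned refcount;
	bool     deleted;   /* set once the refcount hits 0; can't be undone */
};

static void run_trigger(struct indirect_extent *ie, const struct trigger *t)
{
	if (ie->deleted)
		return;
	if (t->kind == TRIGGER_INSERT)
		ie->refcount++;
	else if (--ie->refcount == 0)
		ie->deleted = true;
}

/* Old behaviour: triggers run in natural key order. */
static void run_in_key_order(struct indirect_extent *ie,
			     const struct trigger *t, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		run_trigger(ie, &t[i]);
}

/* New behaviour: all insert triggers, then all overwrite triggers. */
static void run_inserts_first(struct indirect_extent *ie,
			      const struct trigger *t, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (t[i].kind == TRIGGER_INSERT)
			run_trigger(ie, &t[i]);
	for (size_t i = 0; i < nr; i++)
		if (t[i].kind == TRIGGER_OVERWRITE)
			run_trigger(ie, &t[i]);
}
```

With the pointer moving from offset 10 to offset 20, key order runs the overwrite first and drops the refcount to 0; running inserts first takes the new reference before the old one is dropped.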

bcachefs: Disk space accounting fix on brand-new fs · c714614b

Kent Overstreet authored 3 years ago

The filesystem initialization path first marks superblock and journal
buckets non-transactionally, since the btree isn't functional yet. That
path was updating the per-journal-buf percpu counters via
bch2_dev_usage_update(), and updating the wrong set of counters so those
updates didn't get written out until journal entry 4.

The relevant code is going to get significantly rewritten in the future
as we transition away from the in-memory bucket array, so this just
hacks around it for now.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix upgrade path for reflink_p fix · 076c783c

Kent Overstreet authored 3 years ago

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Add journal_seq to inode & alloc keys · 3e52c222

Kent Overstreet authored 3 years ago

Add fields to inode & alloc keys that record the journal sequence number
when they were most recently modified.

For alloc keys, this is needed to know what journal sequence number we
have to flush before the bucket can be reused. Currently this is tracked
in memory, but we'll be getting rid of the in-memory bucket array.

For inodes, this is needed for fsync when the inode has been evicted
from the vfs cache. Currently we use a bloom filter per outstanding
journal buf - but that mechanism has been broken since we added the
ability to not issue a flush/fua for every journal write.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: BTREE_TRIGGER_INSERT now only means insert · 2debb1b8

Kent Overstreet authored 3 years ago

This allows triggers to distinguish between a key entering the btree -
i.e. being called from the trans commit path - vs. being called on a key
that already exists, i.e. by GC.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Convert bch2_mark_key() to take a btree_trans * · 904823de

Kent Overstreet authored 3 years ago

This helps to unify the interface between bch2_mark_key() and
bch2_trans_mark_key() - and it also gives access to the journal
reservation and journal seq in the mark_key path.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Assorted ec fixes · 961b2d62

Kent Overstreet authored 3 years ago

- The backpointer that ec_stripe_update_ptrs() uses now needs to include
  the snapshot ID, which means we have to change where we add the
  backpointer to after getting the snapshot ID for the new extents

- ec_stripe_update_ptrs() needs to be calling bch2_trans_begin()

- improve error message in bch2_mark_stripe()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix bch2_mark_update() · 37f72492

Kent Overstreet authored 3 years ago

When the old or new key doesn't exist, we should still pass in a deleted
key with the correct pos. This fixes a bug in the ec code where
bch2_mark_stripe() was looking up the wrong in-memory stripe.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve error messages in trans_mark_reflink_p() · f3b1e193

Kent Overstreet authored 3 years ago

We should always print out the key we were marking.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix fsck path for reflink pointers · 396a887d

Kent Overstreet authored 3 years ago

The way __bch2_mark_reflink_p returns errors was clashing with returning
the number of sectors processed - we weren't returning FSCK_ERR_EXIT
correctly.

Fix this by only using the return code for errors, which actually ends
up simplifying the overall logic.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
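
A sketch of the two conventions with hypothetical names. ERR_EXIT_SKETCH stands in for FSCK_ERR_EXIT and is assumed to be a positive value, which is what makes it indistinguishable from a sector count in the old style.

```c
#include <stdint.h>

/* Hypothetical positive error code, standing in for FSCK_ERR_EXIT. */
#define ERR_EXIT_SKETCH 2

/*
 * Old style (ambiguous): returns sectors processed, or an error code.
 * An error of ERR_EXIT_SKETCH looks exactly like "processed 2 sectors".
 */
static int64_t mark_one_old(int64_t remaining, int fail)
{
	if (fail)
		return ERR_EXIT_SKETCH;
	return remaining < 8 ? remaining : 8;   /* process up to 8 sectors */
}

/* New style: the return value is only an error; progress goes via @idx. */
static int mark_one_new(uint64_t *idx, uint64_t end, int fail)
{
	if (fail)
		return ERR_EXIT_SKETCH;

	uint64_t n = end - *idx < 8 ? end - *idx : 8;

	*idx += n;
	return 0;
}

/* Walk [start, end) with the new convention; fail when reaching @fail_at. */
static int mark_range(uint64_t start, uint64_t end, uint64_t fail_at)
{
	uint64_t idx = start;

	while (idx < end) {
		int ret = mark_one_new(&idx, end, idx == fail_at);

		if (ret)
			return ret;   /* unambiguous: propagate the error */
	}
	return 0;
}
```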

bcachefs: Fix for leaking of reflinked extents · 6d76aefe

Kent Overstreet authored 3 years ago

When a reflink pointer points to only part of an indirect extent, and
then that indirect extent is fragmented (e.g. by copygc), if the reflink
pointer only points to one of the fragments we leak a reference.

Fix this by storing front/back pad values in reflink pointers - when
inserting reflink pointers, we initialize them to cover the full range
of the indirect extents we reference.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
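
A simplified model of the padding, using hypothetical types and half-open ranges in the indirect extent address space: on insert, the pointer records how far the indirect extents it references extend before and after its own range; on delete, it drops references across the padded range, so fragments created later are still released.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical indirect extent covering [start, end). */
struct indirect_extent {
	uint64_t start, end;
	unsigned refcount;
};

/* Hypothetical reflink pointer to [idx, idx + size), plus padding. */
struct reflink_ptr {
	uint64_t idx, size;
	uint64_t front_pad, back_pad;
};

static int overlaps(uint64_t s1, uint64_t e1, uint64_t s2, uint64_t e2)
{
	return s1 < e2 && s2 < e1;
}

/* Insert: take a ref on each overlapping extent and pad out to cover it. */
static void reflink_ptr_insert(struct reflink_ptr *p,
			       struct indirect_extent *ie, size_t nr)
{
	uint64_t start = p->idx, end = p->idx + p->size;

	p->front_pad = p->back_pad = 0;
	for (size_t i = 0; i < nr; i++) {
		if (!overlaps(start, end, ie[i].start, ie[i].end))
			continue;
		ie[i].refcount++;
		if (ie[i].start < p->idx - p->front_pad)
			p->front_pad = p->idx - ie[i].start;
		if (ie[i].end > end + p->back_pad)
			p->back_pad = ie[i].end - end;
	}
}

/* Delete: drop a ref on every extent overlapping the padded range. */
static void reflink_ptr_delete(const struct reflink_ptr *p,
			       struct indirect_extent *ie, size_t nr)
{
	uint64_t start = p->idx - p->front_pad;
	uint64_t end = p->idx + p->size + p->back_pad;

	for (size_t i = 0; i < nr; i++)
		if (overlaps(start, end, ie[i].start, ie[i].end))
			ie[i].refcount--;
}
```

For a pointer to [10, 30) inside an indirect extent [0, 100), the pads become 10 and 70. If the extent is later split into [0, 50) and [50, 100), deleting the pointer without padding would only release [0, 50) and leak the reference on [50, 100).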

bcachefs: Improve reflink repair code · dfc276df

Kent Overstreet authored 3 years ago

When a reflink pointer points to an indirect extent that doesn't exist,
we need to replace it with a KEY_TYPE_error key.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Subvolumes, snapshots · 14b393ee

Kent Overstreet authored 3 years ago

This patch adds subvolume.c - support for the subvolumes and snapshots
btrees and related data types and on disk data structures. The next
patches will start hooking up this new code to existing code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: btree_path · 67e0dd8f

Kent Overstreet authored 3 years ago

This splits btree_iter into two components: btree_iter is now the
externally visible component, and it points to a btree_path which is now
reference counted.

This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently and slim down the main long-lived state in
btree_trans, and it significantly cleans up the logic for iterator
lifetimes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
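
A minimal model of the sharing scheme with hypothetical, heavily simplified types: iterators hold a counted reference to a path and clone it only when they are about to modify a path that another iterator also references (copy-on-write).

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified btree_path: a position plus a refcount. */
struct path {
	unsigned ref;
	unsigned long long pos;
};

/* Hypothetical iterator: the externally visible part, pointing at a path. */
struct iter {
	struct path *path;
};

static struct path *path_get(struct path *p)
{
	p->ref++;
	return p;
}

static void path_put(struct path *p)
{
	if (--p->ref == 0)
		free(p);
}

/* Share an existing path instead of cloning it up front. */
static void iter_init_shared(struct iter *it, struct path *p)
{
	it->path = path_get(p);
}

/* Before mutating, clone the path if anyone else holds a reference. */
static int iter_set_pos(struct iter *it, unsigned long long pos)
{
	if (it->path->ref > 1) {
		struct path *n = malloc(sizeof(*n));

		if (!n)
			return -1;
		*n = *it->path;
		n->ref = 1;
		path_put(it->path);
		it->path = n;
	}
	it->path->pos = pos;
	return 0;
}
```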

bcachefs: Prefer using btree_insert_entry to btree_iter · 6fba6b83

Kent Overstreet authored 3 years ago

This moves some data dependencies forward, to improve pipelining.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Fix 32 bit build failures · fd0bd123

Brett Holman authored 3 years ago

This fix replaces multiple 64-bit divisions with do_div() equivalents.
Signed-off-by: Brett Holman <bholman.devel@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
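
On 32-bit architectures, a plain `/` or `%` on a 64-bit operand makes the compiler emit a call to a libgcc helper (such as __udivdi3) that the kernel does not link against, which is why such code fails to build. The kernel's do_div(n, base) is a macro that divides the 64-bit lvalue n in place by a 32-bit base and evaluates to the remainder. A userspace stand-in for those semantics, with a pointer in place of the lvalue (do_div_sketch() is hypothetical):

```c
#include <stdint.h>

/* Divide *n in place by @base and return the remainder, like do_div(). */
static uint32_t do_div_sketch(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}
```

So an expression like `x / y` with a u64 `x` and u32 `y` becomes `do_div(x, y)`, after which `x` holds the quotient.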

bcachefs: Disk space accounting fix · 62df3d44

Kent Overstreet authored 3 years ago

DIV_ROUND_UP() wasn't doing what we wanted when passing it negative
numbers - fix it by just not passing it negative numbers anymore.

Also, no need to do the scaling by compression ratio for incompressible
data.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
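
One way to see the problem, under the assumption that adding a key and later removing it should cancel out exactly in the accounting. DIV_ROUND_UP() is ((n) + (d) - 1) / (d), and C division truncates toward zero, so for negative n it rounds the magnitude down: +3/2 rounds to 2, but -3/2 rounds to -1. disk_sectors_sketch() is a hypothetical helper that rounds the magnitude and then reapplies the sign.

```c
#include <stdint.h>

/* Equivalent to the kernel's DIV_ROUND_UP(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical: scale a signed sector delta by a compression ratio, never
 * passing a negative number to DIV_ROUND_UP().
 */
static int64_t disk_sectors_sketch(int64_t sectors, int64_t compressed_size,
				   int64_t uncompressed_size, int incompressible)
{
	if (incompressible)
		return sectors;   /* no scaling needed for incompressible data */

	int64_t mag = sectors < 0 ? -sectors : sectors;

	mag = DIV_ROUND_UP(mag * compressed_size, uncompressed_size);
	return sectors < 0 ? -mag : mag;
}
```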

bcachefs: Extensive triggers cleanups · 297d8934

Kent Overstreet authored 3 years ago

 - We no longer mark subsets of extents, they're marked like regular
   keys now - which means we can drop the offset & sectors arguments
   to trigger functions
 - Drop other arguments that are no longer needed in various
   places - fs_usage
 - Drop the logic for handling extents in bch2_mark_update() that isn't
   needed anymore, to match bch2_trans_mark_update()
 - Better logic for handling the BTREE_ITER_CACHED_NOFILL case, where we
   don't have an old key to mark
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Improve iter->should_be_locked · 8c3f6da9

Kent Overstreet authored 3 years ago

Adding iter->should_be_locked introduced a regression where it ended up
not being set on the iterator passed to bch2_btree_update_start(), which
is definitely not what we want.

This patch requires it to be set when calling bch2_trans_update(), and
adds various fixups to make that happen.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Make sure bch2_trans_mark_update uses correct iter flags · 8ee529e9

Kent Overstreet authored 3 years ago

Now that bch2_btree_iter_peek_with_updates() has been removed in favor
of BTREE_ITER_WITH_UPDATES, we need to make sure it's not used where we
don't want it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

bcachefs: Don't underflow c->sectors_available · 290448ed

Kent Overstreet authored 3 years ago

This rarely used error path should've been checking for underflow -
oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
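
A sketch of an underflow-safe subtraction. The counter is modeled as a plain signed 64-bit value behind a pointer (an assumption; the real c->sectors_available field and its locking are not shown), and sectors_available_sub() is hypothetical: it takes at most what is available and clamps the counter at zero.

```c
#include <stdint.h>

/* Take up to @sectors from *@avail, never letting it drop below zero. */
static int64_t sectors_available_sub(int64_t *avail, int64_t sectors)
{
	int64_t taken = sectors < *avail ? sectors : *avail;

	*avail -= taken;
	return taken;
}
```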