- 22 Oct, 2023 40 commits
-
Kent Overstreet authored
The new for_each_btree_key2() macro handles transaction retries, allowing us to avoid nested transactions - which we want to avoid since they're tricky to do completely correctly and upcoming assertions are going to be checking for that. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
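For readers new to the macro, a hedged sketch of what a converted call site looks like - the call shape is approximated from bcachefs code of this era, and check_extent() is a hypothetical stand-in for the per-key work:

    struct btree_trans trans;
    struct btree_iter iter;
    struct bkey_s_c k;
    int ret;

    bch2_trans_init(&trans, c, 0, 0);

    /*
     * The macro runs the body once per key; if the body returns a
     * transaction-restart error, iteration re-seeks to the same key
     * and retries, rather than making every caller loop by hand.
     */
    ret = for_each_btree_key2(&trans, iter, BTREE_ID_extents,
                              POS_MIN, BTREE_ITER_PREFETCH, k,
            check_extent(&trans, &iter, k));

    bch2_trans_exit(&trans);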
-
Kent Overstreet authored
The new for_each_btree_key2() macro handles transaction retries, allowing us to avoid nested transactions - which we want to avoid since they're tricky to do completely correctly and upcoming assertions are going to be checking for that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This adds a new helper, bch2_trans_run(), that runs a function with a btree_transaction context but without handling transaction restarts. We're adding checks for nested transaction restart handling: when an inner transaction handles a transaction restart it will still have to return it to the outer transaction, or else assertions will be popped in the outer transaction. But some places don't need restart handling at the outer scope, so this helper does what they need. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
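In rough outline the helper just brackets its argument with transaction init/exit - a simplified model, not the exact macro body, and do_work() is a hypothetical callee:

    /* simplified model of bch2_trans_run() */
    #define bch2_trans_run(_c, _do)                         \
    ({                                                      \
            struct btree_trans trans;                       \
            int _ret;                                       \
                                                            \
            bch2_trans_init(&trans, (_c), 0, 0);            \
            _ret = (_do);                                   \
            bch2_trans_exit(&trans);                        \
            _ret;                                           \
    })

    /* usage: the callee gets a ready-made transaction context, and
     * any transaction restart it returns goes straight to the caller */
    ret = bch2_trans_run(c, do_work(&trans));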
-
Kent Overstreet authored
This converts bch2_gc_stripes_done() and bch2_gc_reflink_done() to the new for_each_btree_key_commit() macro. The new for_each_btree_key2() and for_each_btree_key_commit() macros handle transaction retries, allowing us to avoid nested transactions - which we want to avoid since they're tricky to do completely correctly and upcoming assertions are going to be checking for that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
The new for_each_btree_key2() macro handles transaction retries, allowing us to avoid nested transactions - which we want to avoid since they're tricky to do completely correctly and upcoming assertions are going to be checking for that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
The new for_each_btree_key2() macro handles transaction retries, allowing us to avoid nested transactions - which we want to avoid since they're tricky to do completely correctly and upcoming assertions are going to be checking for that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
The new for_each_btree_key2() macro handles transaction retries, allowing us to avoid nested transactions - which we want to avoid since they're tricky to do completely correctly and upcoming assertions are going to be checking for that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
The new for_each_btree_key2() macro handles transaction retries, allowing us to avoid nested transactions - which we want to avoid since they're tricky to do completely correctly and upcoming assertions are going to be checking for that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
The new for_each_btree_key2() macro handles transaction retries, allowing us to avoid nested transactions - which we want to avoid since they're tricky to do completely correctly and upcoming assertions are going to be checking for that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We should be printing the number of free buckets, not just the number of available buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We have an obvious wakeup race if we do the wakeup _before_ updating the counters that the waiter is reading. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
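A generic sketch of the race and the fix (counter and waitqueue names are illustrative, not the exact bcachefs fields):

    /* buggy: the waiter can check the counter, see the stale value,
     * and sleep forever - the wakeup already came and went */
    wake_up(&c->freelist_wait);
    atomic_long_add(nr_freed, &c->buckets_available);

    /* fixed: publish the new counter value first, then wake */
    atomic_long_add(nr_freed, &c->buckets_available);
    smp_mb__after_atomic();  /* order the store before the waiter's check */
    wake_up(&c->freelist_wait);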
-
Daniel Hill authored
We now record the length of time btree locks are held and expose this in debugfs. Enabled via CONFIG_BCACHEFS_LOCK_TIME_STATS. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
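A sketch of the mechanism (field and array names approximate, not the exact patch): timestamp the lock on acquisition, and feed the delta into per-call-site time stats on release:

    /* on acquiring a btree node lock */
    lock->hold_start = local_clock();

    /* on release: accumulate into this transaction's stats slot */
    __bch2_time_stats_update(&c->lock_held_stats[trans->fn_idx],
                             lock->hold_start, local_clock());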
-
Daniel Hill authored
The printbuf indentation feature doesn't yet work with '\n' and '\t', so we've replaced all instances of '\n' with prt_newline(). Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
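Concretely, call sites change shape like this (sketch, using the prt_*() printbuf API of this era):

    /* before: indentation isn't re-applied after the embedded '\n' */
    prt_printf(&buf, "held for %llu ns\n", duration);

    /* after: prt_newline() emits the newline and re-applies the indent */
    prt_printf(&buf, "held for %llu ns", duration);
    prt_newline(&buf);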
-
Daniel Hill authored
We need the caller name and a place to store our results; btree_trans provides this. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We try to ensure we never hold btree locks for too long - bcachefs tries to be soft realtime. This adds a check when restarting a transaction, a point where a restart is cheap: if we've been holding locks for too long, drop and retake them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
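The idea in sketch form (the timestamp field and threshold are illustrative names, not the actual identifiers): a restart is a point where everything can be dropped cheaply, so that's where we check:

    u64 now = local_clock();

    if (now - trans->locks_held_since > MAX_LOCK_HOLD_NS) {
            bch2_trans_unlock(trans);  /* drop all btree node locks */
            cond_resched();            /* let anyone waiting on them run */
            /* locks are retaken lazily as the restarted transaction
             * touches the btree again */
    }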
-
Kent Overstreet authored
This introduces two new macros for iterating through the btree, with transaction restart handling:
- for_each_btree_key2()
- for_each_btree_key_commit()
Every iteration is now in an implicit transaction, and - as with lockrestart_do() and commit_do() - returning -EINTR will cause the transaction to be restarted, at the same key. This patch converts a bunch of code that was open-coding this to these new macros, saving a substantial amount of code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
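A before/after sketch of the conversion this patch performs across call sites (schematic; fn() is a placeholder for the per-key work):

    /* before: open-coded restart handling, repeated at every call site */
    retry:
            bch2_trans_begin(&trans);
            for_each_btree_key(&trans, iter, id, pos, flags, k, ret) {
                    ret = fn(&trans, &iter, k);
                    if (ret)
                            break;
            }
            if (ret == -EINTR)
                    goto retry;        /* restarts from the beginning */

    /* after: the macro owns the loop, and -EINTR restarts iteration
     * at the same key rather than from the start */
    ret = for_each_btree_key2(&trans, iter, id, pos, flags, k,
            fn(&trans, &iter, k));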
-
Kent Overstreet authored
When we find an extent past an inode's i_size, we need to do the deletion in the inode's snapshot (which will emit a whiteout if necessary); and we also need to note that we now have a key at that position and snapshot, so that we don't go into an infinite loop. Also, switch to walking inodes in reverse order, oldest snapshot to newest, so that we emit the fewest whiteouts possible. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Fsck now checks for keys in different snapshot IDs that have become redundant due to other snapshots being deleted - it needs to do this so its own algorithms don't get confused. When it detects this, it should re-run the post-snapshot-deletion cleanup - this patch does that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
- Bunch of refactoring, and move some code out of bch2_snapshots_start() and into bch2_snapshots_check(), for consistency with the rest of fsck
- Interior snapshot nodes no longer point to a subvolume; this is so we don't end up with dangling subvol references when deleting, or have to scan the full snapshots btree
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This makes the snapshots_seen data structure fsck private and improves it; we now also track the equivalence class for each snapshot id we've seen, which means we can detect when snapshot deletion hasn't finished or run correctly (which will otherwise confuse fsck). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
fsck doesn't want to run while we're cleaning up deleted snapshots - if that work needs to be done, we want it to have finished before fsck runs; otherwise fsck will get confused when it finds multiple keys in the same snapshot ID equivalence class (i.e. the mechanism that snapshot deletion uses for cleaning up redundant keys). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
We should never see an inode marked as unlinked that's a subvolume root (or a directory) in fsck, but even if we do it's not correct for fsck to delete the subvolume: subvolumes are owned by dirents, and if we find a dangling subvolume (not marked as unlinked) we want fsck to reattach it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
snapshots_seen is becoming private to fsck, and snapshot_id_list is actually what the data update path needs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Snapshots being deleted won't in general have a corresponding subvolume: this fixes a spurious fsck error where we'd complain about a snapshot pointing to a missing subvolume - but the subvolume had been deleted, and the snapshot was pending deletion as well. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Better/more descriptive naming, and prep for adding nested_lockrestart_do() and nested_commit_do(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
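Roughly what the helpers amount to (simplified; the real commit_do() also threads a disk reservation and journal-seq pointer through to bch2_trans_commit()):

    /* lockrestart_do(): re-run _do until it stops asking for a restart */
    #define lockrestart_do(_trans, _do)                     \
    ({                                                      \
            int _ret;                                       \
                                                            \
            do {                                            \
                    bch2_trans_begin(_trans);               \
                    _ret = (_do);                           \
            } while (_ret == -EINTR);                       \
            _ret;                                           \
    })

    /* commit_do(): the same loop, but additionally commits the
     * transaction when _do succeeds */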
-
Kent Overstreet authored
There's no need to print fsck errors for errors that are expected, and the user has already opted to repair. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
These messages log the updates we're doing in bch2_check_fix_ptrs(), which is useful when debugging but not usually needed. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This isn't done very often, but it is legitimate. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
There's no point reading an extent in order to move it if the write is going to fail because we're shutting down. This patch changes the move path so that moving_io now owns a ref on c->writes - as a bonus, rebalance and copygc will now notice that we're shutting down and exit quicker. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
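The ownership change, sketched with error handling elided: the ref is taken before the read is issued and only dropped when the write completes, so a filesystem that's shutting down fails the move up front:

    if (!percpu_ref_tryget_live(&c->writes))
            return -EROFS;  /* shutting down: skip the read entirely */

    /* ... moving_io submits the read, then the write ... */

    /* in the write-completion path: */
    percpu_ref_put(&c->writes);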
-
Kent Overstreet authored
- add bch2_moving_ctxt_(init|exit)
- split out __bch2_evacuate_bucket(), which takes an existing moving_ctxt; this will be used for improving copygc performance by pipelining across multiple buckets
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Daniel Hill authored
move_ratelimit() now has a bool that specifies whether we want to wait for copygc to finish. When copygc is running, we're probably low on free buckets; instead of consuming the remaining buckets, we want to wait for copygc to finish. This should help with performance and runaway bucket fragmentation. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This patch significantly cleans up and simplifies the data_update interface. Instead of only being able to specify a single pointer by device to rewrite, we're now able to specify any or all of the pointers in the original extent to be rewritten, as a bitmask. data_cmd is no more: the various pred functions now just return true if the extent should be moved/updated. All the data_update path does is rewrite existing replicas, or add new ones. This fixes a bug with background compression on replicated filesystems, where rebalance -> data_update would incorrectly drop the wrong old replica, and keep trying to recompress an extent pointer, each time failing to drop the right replica. Oops. Now, the data update path doesn't look at the io options to decide which pointers to keep and which to drop - it only goes off of the data_update_options passed to it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
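A sketch of the new interface shape (field names approximate): callers mark which of the extent's replicas to rewrite with a bitmask, e.g. to migrate everything off one device:

    struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(k);
    struct data_update_opts update_opts = { 0 };
    const struct bch_extent_ptr *ptr;
    unsigned i = 0;

    bkey_for_each_ptr(ptrs, ptr) {
            if (ptr->dev == evacuate_dev)
                    update_opts.rewrite_ptrs |= BIT(i);
            i++;
    }
    /* the update path rewrites exactly these replicas - it no longer
     * consults io options to decide what to keep or drop */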
-
Kent Overstreet authored
bch2_check_alloc_key() was failing to check buckets that didn't have alloc keys yet (because they'd never been used) - they still need to be added to the freespace btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
- In check_alloc_key(), previously we were re-initializing iterators for the need_discard and freespace btrees for every alloc key we checked. But this was causing us to redo lookups into the journal keys every time, since those lookups are cached in struct btree_iter. This initializes the iterators in bch2_check_alloc_info() and passes them into check_alloc_key().
- Make the looping more consistent/efficient in bch2_check_alloc_info()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
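Approximately the post-patch structure - the two iterators are initialized once, passed in, and advanced inside the per-key check (schematic, argument lists abbreviated):

    struct btree_iter discard_iter, freespace_iter;

    bch2_trans_iter_init(&trans, &discard_iter,
                         BTREE_ID_need_discard, POS_MIN, 0);
    bch2_trans_iter_init(&trans, &freespace_iter,
                         BTREE_ID_freespace, POS_MIN, 0);

    for_each_btree_key(&trans, iter, BTREE_ID_alloc, POS_MIN,
                       BTREE_ITER_PREFETCH, k, ret) {
            ret = check_alloc_key(&trans, &iter,
                                  &discard_iter, &freespace_iter);
            if (ret)
                    break;
    }

    bch2_trans_iter_exit(&trans, &freespace_iter);
    bch2_trans_iter_exit(&trans, &discard_iter);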
-
Kent Overstreet authored
This runs before we go rw for journal replay, but after we're allowed to go rw. It might be time to consider killing BTREE_INSERT_LAZY_RW, though. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
- invalidate_one_bucket() now returns 1 when we don't have any buckets on this device to invalidate, ensuring we don't spin
- the tracepoint invocation is moved to after the transaction commit, and we now include the number of cached sectors in the tracepoint
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
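The no-spin contract from the first bullet, sketched (caller shape illustrative):

    while (nr_to_invalidate-- > 0) {
            int ret = invalidate_one_bucket(&trans, ca, &iter);

            if (ret < 0)
                    break;  /* real error */
            if (ret == 1)
                    break;  /* nothing left to invalidate on this
                             * device: stop rather than spin */
    }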
-
Kent Overstreet authored
This switches that assertion to a bch2_trans_inconsistent() call, as it should be. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
If a btree node is unreadable, it's topology repair that fixes that, and it's kicked off by btree_gc - so btree_gc needs to touch every node and verify that they can be read. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-
Daniel Hill authored
__dev_available() now calculates available buckets correctly. Previously it would almost always return 0 when we have cached data. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-