- 22 Oct, 2023 40 commits
-
Kent Overstreet authored
Chasing down a strange locking bug. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This brings back an important optimization, to avoid touching the wait lists an extra time, while preserving the property that a thread is on a lock waitlist iff it is waiting - it is never removed from the waitlist until it has the lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
There was a lost wakeup between a read unlock in percpu mode and a write lock. The unlock path unlocks, then executes a barrier, then checks for waiters; correspondingly, the lock side should set the wait bit and execute a barrier, then attempt to take the lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
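The barrier pairing described above is the classic fix for a lost wakeup. Below is a minimal userspace sketch of the idea using C11 atomics and a made-up lock word - READERS_MASK, WAITING_BIT and the function names are illustrative, not the actual six-lock layout:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative only: a simplified lock word, not the real six-lock layout. */
#define READERS_MASK 0x0000ffffu
#define WAITING_BIT  0x00010000u

struct fake_lock {
    _Atomic unsigned v;
};

/* Unlock side: drop the read count, then - after the full barrier implied by
 * the seq_cst RMW - check whether a would-be writer set the waiting bit. */
static void read_unlock(struct fake_lock *l)
{
    unsigned old = atomic_fetch_sub(&l->v, 1);

    if ((old & WAITING_BIT) && (old & READERS_MASK) == 1) {
        /* last reader: wake the blocked writer (wakeup elided here) */
    }
}

/* Lock side: set the waiting bit first, then - after the barrier - check
 * whether the lock is actually free.  If the check happened before the bit
 * was visible, the unlock above could miss us entirely: a lost wakeup. */
static bool write_trylock_or_wait(struct fake_lock *l)
{
    unsigned old = atomic_fetch_or(&l->v, WAITING_BIT);

    return (old & READERS_MASK) == 0;   /* else: go to sleep, we'll be woken */
}
```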
-
Kent Overstreet authored
Now that we have lockdep_set_no_check_recursion(), we can enable lockdep checking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This is needed by the cycle detector in bcachefs - we need a way to iterate over waitlist entries while dropping and retaking the waitlist lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This allows passing in the wait list entry - to be used for a deadlock cycle detector. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This switches to a single list of waiters, instead of separate lists for read and intent, and switches write locks to also use the wait lists instead of being handled differently. Also, removal from the wait list is now done by the process waiting on the lock, not the process doing the wakeup. This is needed for the new deadlock cycle detector - we need tasks to stay on the waitlist until they've successfully acquired the lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
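As a rough illustration of the invariant this enables - with hypothetical names, not the actual six-lock code - the slowpath below keeps the waiter on the shared wait list for the entire time it is blocked, and only the waiter itself unlinks the entry, after it owns the lock:

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <sys/queue.h>

/* Hypothetical names: a single wait list shared by read, intent and write
 * waiters, as the commit describes. */
struct lock_waiter {
    LIST_ENTRY(lock_waiter) list;
    int                     lock_want;  /* read / intent / write */
};

struct waitable_lock {
    pthread_mutex_t             wait_lock;
    LIST_HEAD(, lock_waiter)    wait_list;
};

static bool try_acquire(struct waitable_lock *l, int want)
{
    return true;    /* stand-in: pretend the lock-word cmpxchg succeeded */
}

/* The invariant: the waiter is on wait_list the entire time it's blocked and
 * removes *itself* only once it holds the lock - the waker never unlinks it.
 * That is what lets a deadlock cycle detector walk wait_list and see every
 * task still waiting. */
static void lock_slowpath(struct waitable_lock *l, struct lock_waiter *w)
{
    pthread_mutex_lock(&l->wait_lock);
    LIST_INSERT_HEAD(&l->wait_list, w, list);
    pthread_mutex_unlock(&l->wait_lock);

    while (!try_acquire(l, w->lock_want))
        sched_yield();                  /* real code sleeps until woken */

    pthread_mutex_lock(&l->wait_lock);
    LIST_REMOVE(w, list);               /* only now, with the lock held */
    pthread_mutex_unlock(&l->wait_lock);
}
```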
-
Kent Overstreet authored
Continuing the saga of introducing private dedicated error codes for each error path, this patch converts ENOSPC to error codes that are subtypes of ENOSPC. We've recently had a test failure where we got -ENOSPC where we shouldn't have, and didn't have enough information to tell where it came from, so this patch will solve that problem. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
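A minimal sketch of the scheme, with made-up error names and values rather than bcachefs's actual errcode table: private codes live in their own numeric range, each one knows which standard errno it is a subtype of, and a small helper collapses them back to that errno at module boundaries:

```c
#include <errno.h>
#include <stdio.h>

/* Illustrative private codes in a made-up range - not bcachefs's errcode table. */
enum {
    ERR_START = 2048,
    ERR_nospc_alloc,            /* subtype of ENOSPC */
    ERR_nospc_journal,          /* subtype of ENOSPC */
    ERR_nospc_stripe_create,    /* subtype of ENOSPC */
};

/* Map a private subtype back to the standard errno it specializes, for
 * returning across a module boundary. */
static int err_class(int err)
{
    switch (err) {
    case ERR_nospc_alloc:
    case ERR_nospc_journal:
    case ERR_nospc_stripe_create:
        return ENOSPC;
    default:
        return err;     /* already a standard errno */
    }
}

int main(void)
{
    int ret = -ERR_nospc_journal;

    /* Internally we can report exactly which path ran out of space ... */
    fprintf(stderr, "allocation failed: private error %d\n", -ret);

    /* ... while callers that only understand errno still see -ENOSPC. */
    return err_class(-ret) == ENOSPC ? 0 : 1;
}
```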
-
Kent Overstreet authored
The next patch is going to be adding private error codes for all the places we return -ENOSPC. Additionally, this patch updates return paths at all module boundaries to call bch2_err_class(), to return the standard error code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
With the new deadlock cycle detector, it's critical that all held locks be marked in a btree_path, because that's what the cycle detector traverses - any locks that aren't correctly marked will cause deadlocks. This changes the update path to allocate btree_paths for the new nodes, since until the final update is done we otherwise don't have a path referencing them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Centralizing the transaction restart/tracepoint in bch2_btree_path_upgrade() lets us improve the tracepoint - now it emits old and new locks_want. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Spotted a lockup once that appeared to be a lost wakeup. Adding a manual trigger for lock wakeups will make it easy to tell if that's what it is next time it occurs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We have counters with longer names now, so adjust the tabstop - also, make sure there's always a space printed between the name and the number. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
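For illustration only - the real code uses bcachefs's printbuf, but the effect is the same: pad the counter name out to a wider tabstop, while always emitting at least one separating space so a long name can never run into its value:

```c
#include <stdio.h>
#include <string.h>

/* Illustrative: pad names to a tabstop, but guarantee at least one space
 * between name and number even when the name overflows the tabstop. */
static void print_counter(const char *name, unsigned long long nr)
{
    const int tabstop = 32;             /* wider, for the longer names */
    int pad = tabstop - (int) strlen(name);

    printf("%s%*s%llu\n", name, pad > 1 ? pad : 1, " ", nr);
}

int main(void)
{
    print_counter("transaction_restart_would_deadlock", 42);
    print_counter("bucket_alloc", 7);
    return 0;
}
```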
-
Kent Overstreet authored
When subvolumes & snapshots were rolled out, hash_redo_key() was disabled due to some new complications - namely, bch2_hash_set() works at the subvolume level, and fsck does not run in a defined subvolume, instead working at the snapshot ID level. This patch splits out bch2_hash_set_snapshot() from bch2_hash_set(), and makes one small tweak for fsck: - Normally, bch2_hash_set() (and other dirent code) needs to know what subvolume we're in, because dirents that point to other subvolumes should only be visible in the subvolume they were created in, not other snapshots. We can't check that in fsck, so we just assume that all dirents are visible. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This removes an optimization that didn't actually save us any memory, due to alignment, but did make the code more complicated than it needed to be. We were also seeing a bug where journal_seq_base wasn't getting correctly initialized, so hopefully it'll fix that too. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
A little bit of tidying up - this makes the counters a bit clearer as to what's happening. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Locks must be correctly marked for the cycle detector to work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
With the upcoming cycle detector, we have to be careful about using btree_node_lock_nopath - in particular, using it to take write locks can cause deadlocks. All held locks need to be tracked in a btree_path, so that the cycle detector knows about them - unless we know that we cannot cause deadlocks for other reasons: e.g. we are only taking read locks, or we're in very early fsck (topology repair). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Similar to "bcachefs: Fix usage of six lock's percpu mode", six locks have a percpu mode, but we can't switch between percpu and non percpu modes while a lock is in use: threads attempting to take a read lock may race, and we'll end up with the read count permanently off. Fixing this the "correct" way, in six_lock_pcpu_(alloc|free) would require an RCU barrier, and we don't want to do that - instead, we have to permanently segragate percpu and non percpu objects, including when on freelists. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Clean up the arguments passed and make them more consistent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Ideally, all the code in btree_locking.c should be converted, but then we'd want to convert btree_path to point to btree_key_cached_common too, and then we'd be in for a much bigger cleanup - but a bit of incremental cleanup will still be helpful for the next patches. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Add a type descriptor to btree_bkey_cached_common - there's no reason not to since we've got padding that was otherwise unused, and this is a nice cleanup (and helpful in later patches). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Have to be careful with bit fields - when subtracting, this was overflowing into the write_locking bit. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
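A small self-contained example of the failure mode - the field layout is invented, but the mechanism is the same: subtracting from the raw packed word when a bit field is already zero borrows into the neighbouring bit:

```c
#include <stdio.h>
#include <stdint.h>

/* Illustrative layout only: a packed lock word manipulated with raw
 * arithmetic, the way six-lock state is. */
union lock_state {
    uint64_t v;
    struct {
        uint64_t readers       : 6;     /* bits 0..5 */
        uint64_t write_locking : 1;     /* bit 6 */
    };
};

int main(void)
{
    union lock_state s = { .v = 0 };

    s.write_locking = 1;

    /* Subtracting from the raw word while the readers field is already zero
     * borrows out of the field and clobbers the adjacent write_locking bit: */
    s.v -= 1;

    printf("readers=%u write_locking=%u\n",
           (unsigned) s.readers, (unsigned) s.write_locking);
    /* prints readers=63 write_locking=0 - the flag was silently cleared */
    return 0;
}
```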
-
Kent Overstreet authored
With the new cycle detector, taking a write lock will be able to fail - unless we pass it nofail, which is possible but not preferred. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
In the future, with the new deadlock cycle detector, we won't be using bare six_lock_* anymore: lock wait entries will all be embedded in btree_trans, and we will need a btree_trans context whenever locking a btree node. This patch plumbs a btree_trans to the few places that need it, and adds two new locking functions: btree_node_lock_nopath(), which may fail, returning a transaction restart, and btree_node_lock_nopath_nofail(), to be used in places where we know we cannot deadlock (i.e. because we're holding no other locks). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
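A sketch of the distinction between the two variants, with stand-in types and simplified signatures (the real functions take considerably more context):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in types and names - a sketch of the distinction only, not the real
 * bcachefs signatures. */
struct trans_ctx { int in_restart; };
struct node_lock { int held; };

#define ERR_transaction_restart_would_deadlock 2048

static bool lock_or_detect_cycle(struct trans_ctx *trans, struct node_lock *l)
{
    l->held = 1;    /* stand-in: block, run the cycle detector, take the lock */
    return true;
}

/* May fail: if blocking here would complete a deadlock cycle, the caller gets
 * a transaction restart error and has to unwind. */
static int node_lock_nopath(struct trans_ctx *trans, struct node_lock *l)
{
    if (!lock_or_detect_cycle(trans, l))
        return -ERR_transaction_restart_would_deadlock;
    return 0;
}

/* For callers that hold no other locks and so cannot be part of a cycle:
 * failure is impossible, so assert instead of propagating an error. */
static void node_lock_nopath_nofail(struct trans_ctx *trans, struct node_lock *l)
{
    int ret = node_lock_nopath(trans, l);

    assert(!ret);
}
```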
-
Kent Overstreet authored
six locks are unfair: while a thread is blocked trying to take a write lock, new read locks will fail. The new deadlock cycle detector makes use of our existing lock tracing, so we need to tell it we're holding a write lock before we take the lock for it to work correctly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Since we've now got time_stats for lock hold times (per btree transaction), we don't need this anymore. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This fixes a small memory leak. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Didn't have any users, and wasn't a good idea to begin with - delete it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Also, do some reorganizing/renaming, convert atomic counters in bch_fs to persistent counters, and add a few missing counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
On failure to get a journal pre-reservation because we're called from journal reclaim, we're not supposed to return a transaction restart error - this fixes a livelock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This moves the IS_ERR_OR_NULL() check to the inline part, since that's in the fast path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
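The general shape of that change, as a hedged sketch rather than the actual bcachefs function: keep the cheap IS_ERR_OR_NULL()-style test in a static inline wrapper so the common case never pays for an out-of-line call:

```c
#include <stdbool.h>
#include <stdint.h>

struct node { int refcount; };

/* Roughly what IS_ERR_OR_NULL() checks: NULL or a kernel-style error pointer. */
static inline bool is_err_or_null(const void *p)
{
    return !p || (uintptr_t) p >= (uintptr_t) -4095;
}

/* Out-of-line slow path - only ever reached with a real node in hand. */
static void node_put_slowpath(struct node *n)
{
    n->refcount--;  /* stand-in for the real out-of-line work */
}

/* Inline fast path: the cheap check lives in the caller, so the common
 * "error pointer or NULL, nothing to do" case costs a couple of instructions
 * and no function call. */
static inline void node_put(struct node *n)
{
    if (!is_err_or_null(n))
        node_put_slowpath(n);
}
```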
-
Kent Overstreet authored
It now includes journal_flags. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
It now prints the error name when the btree node is an error pointer; also, don't trace failures when the btree node is BCH_ERR_no_btree_node_up. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Don't decrease BTREE_ITER_MAX when building with CONFIG_LOCKDEP anymore - the lockdep table sizes are configurable now, so we don't need this. Also, btree_trans_too_many_iters() is less conservative: previously it caused a transaction restart once we had used more than BTREE_ITER_MAX / 2 paths; change this to BTREE_ITER_MAX - 8, which helps with excessive transaction restarts/livelocks in the bucket allocator path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
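A sketch of the new threshold (constants and fields here are illustrative, not the actual definitions):

```c
#include <stdbool.h>

/* Illustrative value and fields - not the actual bcachefs definitions. */
#define BTREE_ITER_MAX 64

struct trans_ctx { unsigned nr_paths_used; };

/* Previously: restart once more than BTREE_ITER_MAX / 2 paths were in use.
 * Now: only restart within 8 paths of the hard limit, leaving far more
 * headroom before forcing a transaction restart. */
static bool trans_too_many_iters(struct trans_ctx *trans)
{
    return trans->nr_paths_used > BTREE_ITER_MAX - 8;
}
```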
-
Kent Overstreet authored
We need to use the right class for some assertions to work correctly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
The upcoming lock cycle detection code will need to know precisely which locks every btree_trans is holding, including write locks - this patch updates btree_node_locked_type to include write locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Improve our debugfs output, to help in debugging deadlocks: this shows, for every btree node we print, the current number of readers/intent locks/write locks held. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This is just some type safety cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
-