Commits · ae10fe017bf54653a61a93e49fac1c3e2b474e20 · Kirill Smelkov / linux

22 Oct, 2023 40 commits

Kent Overstreet authored Nov 04, 2022

This refactoring puts our various allocation path counters into a
dedicated struct - the upcoming nocow patch is going to add another
counter.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ae10fe01

bcachefs: Fix bch2_btree_path_up_until_good_node() · 29cea6f4

Kent Overstreet authored Sep 27, 2022

There was a rare bug when path->locks_want was nonzero, but not
BTREE_MAX_DEPTH, where we'd return on a valid node that wasn't locked -
oops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

29cea6f4

bcachefs: Factor out bch2_write_drop_io_error_ptrs() · e0eaf862

Kent Overstreet authored Sep 27, 2022

Move slowpath code to a separate, non-inline function.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e0eaf862

bcachefs: Break out bch2_btree_path_traverse_cached_slowpath() · 99e2146b

Kent Overstreet authored Sep 26, 2022

Prep work for further refactoring.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

99e2146b

bcachefs: Kill io_in_flight semaphore · 2d848dac

Kent Overstreet authored Sep 26, 2022

This used to be needed more for buffered IO, but now the block layer has
writeback throttling - we can delete this now.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

2d848dac

bcachefs: Improve bucket_alloc tracepoint · 68b6cd19

Kent Overstreet authored Sep 26, 2022

It now includes more info - whether the bucket was for metadata or data
- and is also called in the same place as the bucket_alloc_fail
tracepoint.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

68b6cd19

bcachefs: Mark __bch2_trans_iter_init as inline · c298fd7d

Kent Overstreet authored Sep 26, 2022

This function is fairly small and only used in two places: one very hot,
the other cold, so it should definitely be inlined.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

c298fd7d

bcachefs: Inline fast path of check_pos_snapshot_overwritten() · 25b4b330

Kent Overstreet authored Sep 26, 2022

This moves the slowpath of check_pos_snapshot_overwritten() to a
separate function, and inlines the fast path - helping performance on
btrees that don't use snapshots and for users that aren't using
snapshots.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

25b4b330

bcachefs: Improve jset_validate() · c23a9e08

Kent Overstreet authored Sep 26, 2022

Previously, jset_validate() was formatting the initial part of an error
string for every entry it was validating - expensive.

This moves that code to journal_entry_err_msg(), which is now only
called if there's an actual error.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

c23a9e08
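
The idea in the jset_validate() change above can be illustrated with a small userspace sketch. Everything here is hypothetical (struct entry, entry_err_msg(), validate_entries() and the counter are illustration names, not the bcachefs code); it only shows the pattern of formatting an error string on the error path instead of once per entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Counts how often the (comparatively expensive) formatting actually runs. */
static int nr_prefixes_formatted;

/* Hypothetical stand-in for a journal entry. */
struct entry {
	unsigned type;
	unsigned len;
};

/* Only called on the error path, so the formatting cost is paid rarely. */
static void entry_err_msg(char *buf, size_t size, unsigned idx, const char *msg)
{
	assert(size > 0);
	nr_prefixes_formatted++;
	snprintf(buf, size, "entry %u: %s", idx, msg);
}

/* Returns 0 if every entry is valid, -1 (with err filled in) otherwise. */
static int validate_entries(const struct entry *e, unsigned nr,
			    char *err, size_t err_size)
{
	for (unsigned i = 0; i < nr; i++) {
		/* Fast path: a valid entry does no string formatting at all. */
		if (e[i].len == 0) {
			entry_err_msg(err, err_size, i, "zero length");
			return -1;
		}
	}
	return 0;
}
```

With only valid entries, nr_prefixes_formatted stays at zero; the string is built once, for the first invalid entry.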

bcachefs: Optimize btree_path_alloc() · 3f3bc66e

Kent Overstreet authored Sep 26, 2022

 - move slowpath code to a separate function, btree_path_overflow()
 - no need to use hweight64
 - copy nr_max_paths from btree_transaction_stats to btree_trans,
   avoiding a data dependency in the fast path
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

3f3bc66e

bcachefs: Inline bch2_trans_kmalloc() fast path · 14d8f26a

Kent Overstreet authored Sep 26, 2022

Small performance optimization.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

14d8f26a

bcachefs: Run bch2_fs_counters_init() earlier · f3b8403e

Kent Overstreet authored Sep 25, 2022

We need counters to be initialized before initializing shrinkers - the
shrinker callbacks will update those counters. This fixes a segfault in
userspace.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

f3b8403e

bcachefs: btree_err() now uses bch2_print_string_as_lines() · d704d623

Kent Overstreet authored Sep 25, 2022

We've seen long error messages get truncated here, so convert to the new
bch2_print_string_as_lines().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

d704d623

bcachefs: Improve bch2_fsck_err() · dbb9936b

Kent Overstreet authored Sep 25, 2022

 - factor out fsck_err_get()
 - if the "bcachefs (%s):" prefix has already been applied, don't
   duplicate it
 - convert to printbufs instead of static char arrays
 - tidy up control flow a bit
 - use bch2_print_string_as_lines(), to avoid messages getting truncated
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

dbb9936b

bcachefs: bch2_print_string_as_lines() · a8f35428

Kent Overstreet authored Sep 25, 2022

This adds a helper for printing a large buffer one line at a time, to
avoid the 1k printk limit.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

a8f35428
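
As a rough illustration of the helper described above, here is a userspace sketch that splits a buffer on newlines and emits each line with its own output call. The function name, the FILE-based interface and the prefix argument are assumptions for the example; the kernel helper prints via printk and may differ in signature and details.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in: emit a large buffer one line at a time,
 * so that no single output call carries more than one line.
 * Returns the number of lines emitted.
 */
static int print_string_as_lines(FILE *out, const char *prefix, const char *buf)
{
	int nr_lines = 0;

	assert(out && prefix && buf);

	while (*buf) {
		/* Length of the current line, not counting the newline. */
		size_t len = strcspn(buf, "\n");

		fprintf(out, "%s%.*s\n", prefix, (int) len, buf);
		nr_lines++;

		/* Skip the line and, if present, its terminating newline. */
		buf += len;
		if (*buf == '\n')
			buf++;
	}

	return nr_lines;
}
```

Because each call carries a single line, a per-call length limit such as the 1k printk limit mentioned above applies per line rather than to the whole message.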

bcachefs: bch2_btree_node_relock_notrace() · e9174370

Kent Overstreet authored Sep 25, 2022

Most of the node_relock_fail trace events are generated from
bch2_btree_path_verify_level(), when debugcheck_iterators is enabled -
but we're not interested in these trace events, they don't indicate that
we're in a slowpath.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e9174370

bcachefs: bch2_btree_cache_scan() improvement · c36ff038

Kent Overstreet authored Sep 25, 2022

We're still seeing OOM issues caused by the btree node cache shrinker
not sufficiently freeing memory: thus, this patch changes the shrinker
to not exit if __GFP_FS was not supplied.

Instead, tweak btree node memory allocation so that we never invoke
memory reclaim while holding the btree node cache lock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

c36ff038

bcachefs: Fix blocking with locks held · c6cf49a9

Kent Overstreet authored Sep 23, 2022

This is a major oopsy - we should always be unlocking before calling
closure_sync(), else we'll cause a deadlock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

c6cf49a9

bcachefs: btree_update_nodes_written() needs BTREE_INSERT_USE_RESERVE · 01ed3359

Kent Overstreet authored Sep 23, 2022

This fixes an obvious deadlock - whoops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

01ed3359

bcachefs: Fix error handling in bch2_btree_update_start() · d602657c

Kent Overstreet authored Sep 22, 2022

We were checking for -EAGAIN, but that isn't returned when we don't
pass a closure to wait with - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

d602657c

bcachefs: Improve bch2_btree_trans_to_text() · afbc7194

Kent Overstreet authored Sep 01, 2022

This is just a formatting/readability improvement.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

afbc7194

bcachefs: Kill normalize_read_intent_locks() · 8b31e4fc

Kent Overstreet authored Aug 22, 2022

Before we had the deadlock cycle detector, we didn't want to be holding
read locks when taking intent locks, because blocking on an intent lock
while holding a read lock was a lock ordering violation that could
cause a deadlock.

With the cycle detector this is no longer an issue, so this code can be
deleted.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

8b31e4fc

bcachefs: Ensure bch2_btree_node_lock_write_nofail() never fails · 2ec254c0

Kent Overstreet authored Mar 06, 2023

In order for bch2_btree_node_lock_write_nofail() to never produce a
deadlock, we must ensure we're never holding read locks when using it.
Fortunately, it's only used from code paths where any read locks may be
safely dropped.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

2ec254c0

bcachefs: Delete old deadlock avoidance code · 0d7009d7

Kent Overstreet authored Aug 22, 2022

This deletes our old lock ordering based deadlock avoidance code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

0d7009d7

bcachefs: Print deadlock cycle in debugfs · 96d994b3

Kent Overstreet authored Aug 22, 2022

In the event that we're not finished debugging the cycle detector, this
adds a new file to debugfs that shows what the cycle detector finds, if
anything. By comparing this with btree_transactions, which shows held
locks for every btree_transaction, we'll be able to determine if it's
the cycle detector that's buggy or something else.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

96d994b3

bcachefs: Deadlock cycle detector · 33bd5d06

Kent Overstreet authored Aug 22, 2022

We've outgrown our own deadlock avoidance strategy.

The btree iterator API provides an interface where the user doesn't need
to concern themselves with lock ordering - different btree iterators can
be traversed in any order. Without special care, this will lead to
deadlocks.

Our previous strategy was to define a lock ordering internally, and
whenever we attempt to take a lock and trylock() fails, we'd check if
the current btree transaction is holding any locks that cause a lock
ordering violation. If so, we'd issue a transaction restart, and then
bch2_trans_begin() would re-traverse all previously used iterators, but
in the correct order.

That approach had some issues, though.
 - Sometimes we'd issue transaction restarts unnecessarily, when no
   deadlock would have actually occurred. Lock ordering restarts have
   become our primary cause of transaction restarts, on some workloads
   totalling 20% of actual transaction commits.

 - To avoid deadlock or livelock, we'd often have to take intent locks
   when we only wanted a read lock: with the lock ordering approach, it
   is actually illegal to hold _any_ read lock while blocking on an intent
   lock, and this has been causing us unnecessary lock contention.

 - It was getting fragile - the various lock ordering rules are not
   trivial, and we'd been seeing occasional livelock issues related to
   this machinery.

So, since bcachefs is already a relational database masquerading as a
filesystem, we're stealing the next traditional database technique and
switching to a cycle detector for avoiding deadlocks.

When we block taking a btree lock, after adding ourself to the waitlist
but before sleeping, we do a DFS of btree transactions waiting on other
btree transactions, starting with the current transaction and walking
our held locks, and transactions blocking on our held locks.

If we find a cycle, we emit a transaction restart. Occasionally (e.g.
the btree split path) we cannot allow the lock() operation to fail, so
if necessary we'll tell another transaction that it has to fail.

Result: trans_restart_would_deadlock events are reduced by a factor of
10 to 100, and we'll be able to delete a whole bunch of grotty, fragile
code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

33bd5d06
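
The cycle check described in the commit message above can be sketched in plain C. This is a simplified model under stated assumptions, not the bcachefs implementation: the wait-for relation is a fixed-size adjacency matrix rather than a walk over held locks and lock waitlists, and struct wait_graph, dfs() and would_deadlock() are hypothetical names. The "tell another transaction that it has to fail" case is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TRANS 8

/* waits_for[a][b] is true when transaction a is blocked on a lock held by b. */
struct wait_graph {
	bool waits_for[MAX_TRANS][MAX_TRANS];
};

/* Depth-first search: can we get from cur back to start? */
static bool dfs(const struct wait_graph *g, int start, int cur, bool *visited)
{
	for (int next = 0; next < MAX_TRANS; next++) {
		if (!g->waits_for[cur][next])
			continue;
		if (next == start)
			return true;	/* walked back to ourselves: cycle */
		if (visited[next])
			continue;
		visited[next] = true;
		if (dfs(g, start, next, visited))
			return true;
	}
	return false;
}

/*
 * Called when trans is about to block on a lock held by holder. The wait
 * edge is recorded first (adding ourselves to the waitlist); if that closes
 * a cycle the edge is removed again and the caller should restart instead
 * of sleeping.
 */
static bool would_deadlock(struct wait_graph *g, int trans, int holder)
{
	bool visited[MAX_TRANS] = { false };

	assert(trans >= 0 && trans < MAX_TRANS);
	assert(holder >= 0 && holder < MAX_TRANS);

	g->waits_for[trans][holder] = true;
	if (dfs(g, trans, trans, visited)) {
		g->waits_for[trans][holder] = false;
		return true;
	}
	return false;
}
```

In this model, transaction 0 waiting on 1 and 1 waiting on 2 is fine; a request for 2 to wait on 0 is the one that gets told to restart, since it would close the cycle 2 -> 0 -> 1 -> 2.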

bcachefs: Fix bch2_btree_node_upgrade() · 62448afe

Kent Overstreet authored Aug 05, 2022

Previously, if we were trying to upgrade from a read to an intent lock
but we held an additional read lock via another btree_path,
bch2_btree_node_upgrade() would always fail, in six_lock_tryupgrade().

This patch factors out the code that __bch2_btree_node_lock_write() uses
to temporarily drop extra read locks, so that six_lock_tryupgrade() can
succeed.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

62448afe

bcachefs: Add a debug assert · 845cffed

Kent Overstreet authored Sep 19, 2022

Chasing down a strange locking bug.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

845cffed

six locks: Wakeup now takes lock on behalf of waiter · 84a37cbf

Kent Overstreet authored Aug 26, 2022

This brings back an important optimization, to avoid touching the wait
lists an extra time, while preserving the property that a thread is on a
lock waitlist iff it is waiting - it is never removed from the waitlist
until it has the lock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

84a37cbf

six locks: Fix a lost wakeup · e4b7254c

Kent Overstreet authored Oct 15, 2022

There was a lost wakeup between a read unlock in percpu mode and a write
lock. The unlock path unlocks, then executes a barrier, then checks for
waiters; correspondingly, the lock side should set the wait bit and
execute a barrier, then attempt to take the lock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e4b7254c
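
The barrier pairing described above is the classic store/barrier/load pattern on both sides. A minimal C11 sketch follows, assuming a single shared reader count rather than the percpu counters of the real lock; struct lock_sketch and both function names are hypothetical. With a full fence between the store and the load on each side, at least one of the two sides is guaranteed to observe the other's store, so a wakeup cannot be missed by both.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct lock_sketch {
	atomic_int  readers;	/* number of read locks currently held */
	atomic_bool waiting;	/* the "wait bit": someone is on the waitlist */
};

/*
 * Unlock side: unlock, then barrier, then check for waiters.
 * Returns true if the caller must issue a wakeup.
 */
static bool read_unlock(struct lock_sketch *l)
{
	int old = atomic_fetch_sub_explicit(&l->readers, 1, memory_order_release);

	assert(old > 0);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&l->waiting, memory_order_relaxed);
}

/*
 * Lock side: set the wait bit, then barrier, then look at the lock.
 * Returns true if no readers remain, i.e. the writer may proceed;
 * otherwise the writer sleeps and relies on read_unlock() to wake it.
 */
static bool write_lock_attempt(struct lock_sketch *l)
{
	atomic_store_explicit(&l->waiting, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&l->readers, memory_order_acquire) == 0;
}
```

The sketch only shows the ordering; it does not take the write lock or manage a real waitlist.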

six locks: Enable lockdep · 5b254da5

Kent Overstreet authored Sep 24, 2022

Now that we have lockdep_set_no_check_recursion(), we can enable lockdep
checking.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

5b254da5

six locks: Add start_time to six_lock_waiter · f6ea2d57

Kent Overstreet authored Sep 24, 2022

This is needed by the cycle detector in bcachefs - we need a way to
iterate over waitlist entries while dropping and retaking the waitlist
lock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

f6ea2d57

six locks: six_lock_waiter() · 0bfb9f42

Kent Overstreet authored Aug 27, 2022

This allows passing in the wait list entry - to be used for a deadlock
cycle detector.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

0bfb9f42

six locks: Simplify wait lists · ebc6f76a

Kent Overstreet authored Aug 25, 2022

This switches to a single list of waiters, instead of separate lists for
read and intent, and switches write locks to also use the wait lists
instead of being handled differently.

Also, removal from the wait list is now done by the process waiting on
the lock, not the process doing the wakeup. This is needed for the new
deadlock cycle detector - we need tasks to stay on the waitlist until
they've successfully acquired the lock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ebc6f76a

bcachefs: Add private error codes for ENOSPC · 098ef98d

Kent Overstreet authored Sep 18, 2022

Continuing the saga of introducing private dedicated error codes for
each error path, this patch converts ENOSPC to error codes that are
subtypes of ENOSPC. We've recently had a test failure where we got
-ENOSPC where we shouldn't have, and didn't have enough information to
tell where it came from, so this patch will solve that problem.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

098ef98d

bcachefs: Errcodes can now subtype standard error codes · 5c1ef830

Kent Overstreet authored Sep 18, 2022

The next patch is going to be adding private error codes for all the
places we return -ENOSPC.

Additionally, this patch updates return paths at all module boundaries
to call bch2_err_class(), to return the standard error code.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

5c1ef830
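
One way to picture error codes that subtype standard ones is a parent table walked until a standard errno is reached. The sketch below is an assumption-laden illustration, not the bcachefs errcode machinery: the ERR_* names, the 2048 base, err_parent[], err_class() and err_matches() are all hypothetical, with err_class() playing the role the commit message assigns to bch2_err_class().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical private codes, numbered above the standard errno range. */
enum {
	ERR_START = 2048,
	ERR_ENOSPC_disk_reservation = ERR_START,
	ERR_ENOSPC_bucket_alloc,
	ERR_ENOSPC_btree_slot,
	ERR_MAX,
};

/* Parent (class) of each private code; a parent may itself be private. */
static const int err_parent[] = {
	[ERR_ENOSPC_disk_reservation - ERR_START] = ENOSPC,
	[ERR_ENOSPC_bucket_alloc - ERR_START]     = ENOSPC,
	[ERR_ENOSPC_btree_slot - ERR_START]       = ERR_ENOSPC_bucket_alloc,
};

/* Map a (negative) error to the standard errno it is a subtype of. */
static int err_class(int err)
{
	int e = -err;

	assert(err <= 0);
	while (e >= ERR_START && e < ERR_MAX)
		e = err_parent[e - ERR_START];
	return -e;
}

/* Is err equal to class, or a (transitive) subtype of it? */
static bool err_matches(int err, int class)
{
	int e = -err, c = -class;

	while (e != c && e >= ERR_START && e < ERR_MAX)
		e = err_parent[e - ERR_START];
	return e == c;
}
```

Internal code can then test for a specific private code or for the whole -ENOSPC class, while a module boundary returns err_class(ret) so callers outside only ever see standard values.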

bcachefs: Make an assertion more informative · 57ce8274

Kent Overstreet authored Sep 18, 2022

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

57ce8274

bcachefs: All held locks must be in a btree path · e4215d0f

Kent Overstreet authored Sep 16, 2022

With the new deadlock cycle detector, it's critical that all held locks
be marked in a btree_path, because that's what the cycle detector
traverses - any locks that aren't correctly marked will cause deadlocks.

This changes the btree_path to allocate some btree_paths for the new
nodes, since until the final update is done we otherwise don't have a
path referencing them.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e4215d0f

bcachefs: bch2_btree_path_upgrade() now emits transaction restart · 367d72dd

Kent Overstreet authored Sep 17, 2022

Centralizing the transaction restart/tracepoint in
bch2_btree_path_upgrade() lets us improve the tracepoint - now it emits
old and new locks_want.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

367d72dd

bcachefs: Add a manual trigger for lock wakeups · b8eec675

Kent Overstreet authored Sep 17, 2022

Spotted a lockup once that appeared to be a lost wakeup. Adding a manual
trigger for lock wakeups will make it easy to tell if that's what it is
next time it occurs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

b8eec675