Commits · b0c5b15cc8969f79b410a825efe9894cdec85738 · Kirill Smelkov / linux

22 Oct, 2023 40 commits

bcachefs: Optimize __bkey_unpack_key_format_checked() · b0c5b15c

Kent Overstreet authored Oct 17, 2022

Delete some code when CONFIG_BCACHEFS_DEBUG=n
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

b0c5b15c

bcachefs: Inline bch2_inode_pack() · 3e8b4b3a

Kent Overstreet authored Oct 17, 2022

It's mainly used from bch2_inode_write(), so inline it there.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

3e8b4b3a

bcachefs: bucket_alloc_fail tracepoint should only fire when we have to block · adf16c6d

Kent Overstreet authored Oct 17, 2022

We don't want to fire the bucket_alloc_fail tracepoint on transaction
restart, when we can retry immediately - only when the allocation
actually has to block.

Also, switch from sched_clock() to local_clock(), as we've been doing
elsewhere.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

adf16c6d

bcachefs: Optimize bch2_trans_init() · 307e3c13

Kent Overstreet authored Oct 17, 2022

Now we store the transaction's fn idx in a local variable, instead of
redoing the lookup every time we call bch2_trans_init().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

307e3c13

bcachefs: Split out __btree_path_up_until_good_node() · 29aa78f1

Kent Overstreet authored Oct 17, 2022

This breaks up btree_path_up_until_good_node() so that only the fastpath
gets inlined.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

29aa78f1

bcachefs: Btree key cache shrinker fix · b2f83e76

Kent Overstreet authored Oct 17, 2022

The shrinker assumes freed key cache items are ordered by age, so that
it doesn't have to scan the full list to find items that are old enough
(according to the srcu code) to be freed.

But percpu freelists broke this ordering; this patch fixes this by
ensuring we insert items into the proper position.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

b2f83e76
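
The ordering requirement in the shrinker fix above can be illustrated with a minimal sketch. The names `freed_item`, `seq`, `freed_list_insert_sorted()` and `freed_list_count_reclaimable()` are hypothetical and are not taken from the bcachefs source; `seq` stands in for whatever grace-period sequence number is recorded when an item is freed, and wraparound is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical freed item: seq stands in for the sequence number
 * recorded at the time the item was freed. */
struct freed_item {
	unsigned long seq;
	struct freed_item *next;
};

/* Insert item so the list stays sorted by ascending seq (oldest first). */
void freed_list_insert_sorted(struct freed_item **head,
			      struct freed_item *item)
{
	struct freed_item **p = head;

	assert(head && item);

	/* Skip every item that is at least as old as the new one. */
	while (*p && (*p)->seq <= item->seq)
		p = &(*p)->next;

	item->next = *p;
	*p = item;
}

/* Count items old enough to free. Because the list is ordered, the scan
 * can stop at the first item that is still too new. */
size_t freed_list_count_reclaimable(const struct freed_item *head,
				    unsigned long completed_seq)
{
	size_t n = 0;

	for (; head && head->seq <= completed_seq; head = head->next)
		n++;
	return n;
}
```

If items were simply pushed onto the head of the list from several sources, the early exit in the second function would miss older items sitting behind a newer one, which is the kind of breakage the commit message describes.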

bcachefs: __bio_compress() fix up. · be75bb7a

Daniel Hill authored Oct 16, 2022

A single block can't be compressed, so it's incompressible.
This stops rebalance repeatedly marking extents as uncompressed.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

be75bb7a

bcachefs: make durability a read-write sysfs option · 597c6d17

Daniel Hill authored May 25, 2022

Sometimes the user may need to change durability after formatting to
match the current hardware setup; this option provides a quick and flexible
alternative to removing then adding the device.
It is HIGHLY ADVISED TO RUN REREPLICATE after changing this value so the
system doesn't remain degraded.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

597c6d17

bcachefs: improve behaviour of btree_cache_scan() · b5ac23c4

Daniel Hill authored Oct 06, 2022

Appending new nodes to the end of the list means we're more likely to
evict old entries when btree_cache_scan() is started.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

b5ac23c4

bcachefs: Quota fixes · bd954215

Kent Overstreet authored Oct 15, 2022

 - We now correctly allow soft limits to be exceeded, instead of always
   returning -EDQUOT
 - Disk quota grace times/warnings can now be set, not just the
   systemwide defaults
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bd954215
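
As a rough illustration of the soft-limit behaviour described above, here is a sketch of conventional disk-quota semantics: the hard limit is never exceeded, while the soft limit may be exceeded until a grace period runs out. The struct, its fields and `quota_charge()` are hypothetical and are not the bcachefs implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-ID quota state; a limit of 0 means "no limit". */
struct quota_counter {
	uint64_t used;
	uint64_t softlimit;
	uint64_t hardlimit;
	uint64_t grace_expiry;	/* 0 = grace timer not running */
};

/*
 * Charge nr units at time now. Returns 0 on success or -EDQUOT.
 * grace_period is how long usage may stay above the soft limit.
 */
int quota_charge(struct quota_counter *q, uint64_t nr,
		 uint64_t now, uint64_t grace_period)
{
	uint64_t new_used;

	assert(q);
	new_used = q->used + nr;

	/* The hard limit can never be exceeded. */
	if (q->hardlimit && new_used > q->hardlimit)
		return -EDQUOT;

	if (q->softlimit && new_used > q->softlimit) {
		if (!q->grace_expiry)
			q->grace_expiry = now + grace_period;	/* start timer */
		else if (now > q->grace_expiry)
			return -EDQUOT;		/* grace period ran out */
	} else {
		q->grace_expiry = 0;	/* back under the soft limit */
	}

	q->used = new_used;
	return 0;
}
```

Returning -EDQUOT as soon as the soft limit is crossed would make the soft limit behave like a second hard limit, which is the behaviour the first bullet says was corrected.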

bcachefs: Switch to local_clock() for fastpath time source · d7e4e513

Kent Overstreet authored Oct 15, 2022

local_clock() isn't always completely accurate - e.g. on machines with
TSC drift - but ktime_get_ns() overhead is too high, unfortunately.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

d7e4e513

bcachefs: Btree key cache improvements · fe5b37f6

Kent Overstreet authored Oct 15, 2022

 - In userspace, we don't have real percpu variables; this patch
   disables the percpu freelists in userspace
 - add some error messages for the asserts in
   bch2_fs_btree_key_cache_exit(); we've been hitting this (only in
   userspace, oddly), perhaps this will help us track down the error.
 - bkey_cached_reuse() should likely be taking the key cache lock, and
   it's a slowpath so it doesn't hurt to
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

fe5b37f6

bcachefs: Fix btree node prefetching · dccedaaa

Kent Overstreet authored Oct 14, 2022

We were forgetting to count down the number of nodes to prefetch, firing
off _way_ more than intended - whoops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

dccedaaa
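
The class of bug described above, a prefetch budget that is never decremented, can be shown with a small sketch. `prefetch_siblings()` and its parameters are hypothetical; the function records which node indices would be prefetched instead of issuing real I/O.

```c
#include <assert.h>

/*
 * Record up to nr_to_prefetch node indices following position pos in out,
 * which must have room for nr_to_prefetch entries.
 * Returns the number of prefetches that would be issued.
 */
unsigned prefetch_siblings(unsigned pos, unsigned nr_nodes,
			   unsigned nr_to_prefetch, unsigned *out)
{
	unsigned issued = 0;

	assert(out);

	for (unsigned i = pos + 1; i < nr_nodes && nr_to_prefetch; i++) {
		out[issued++] = i;
		/* Without this decrement the loop would run to the last node. */
		nr_to_prefetch--;
	}
	return issued;
}
```

With the decrement in place, a budget of 2 on a 100-node range issues exactly two prefetches; without it, the loop condition never changes and every remaining node is requested.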

bcachefs: bch2_btree_key_cache_scan() doesn't need trylock · 0196eb89

Kent Overstreet authored Oct 14, 2022

We don't actually allocate memory under the btree key cache lock - so
there are no recursion concerns, and the shrinker can just use
mutex_lock().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

0196eb89

bcachefs: Defer full journal entry validation · d1b2c864

Kent Overstreet authored Oct 13, 2022

On journal read, previously we would do full journal entry validation
immediately after reading a journal entry.

However, this would lead to errors for journal entries we weren't
actually going to use, either because they were too old or too new
(newer than the most recent flush).

We've observed write tearing on journal entries newer than the newest
flush - which makes sense, since prior to a flush there are no guarantees
about write persistence.

This patch defers full journal entry validation until the end of the
journal read path, when we know which journal entries we'll want to use.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

d1b2c864
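
The idea of deferring validation until the usable range is known can be sketched as follows. This is a simplified model, not the bcachefs journal read path: `journal_entry_info`, its fields and `count_errors_in_used_entries()` are hypothetical, and `passes_full_validation` stands in for the result of the expensive check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one journal entry found during journal read. */
struct journal_entry_info {
	uint64_t seq;
	bool	 is_flush;
	bool	 passes_full_validation;
};

/*
 * Count validation failures, but only among entries that will be used:
 * those with oldest_needed <= seq <= sequence of the newest flush entry.
 */
size_t count_errors_in_used_entries(const struct journal_entry_info *e,
				    size_t nr, uint64_t oldest_needed)
{
	uint64_t last_flush = 0;
	bool have_flush = false;
	size_t errors = 0;

	assert(e || !nr);

	/* First pass: find the newest flush entry. */
	for (size_t i = 0; i < nr; i++)
		if (e[i].is_flush && (!have_flush || e[i].seq > last_flush)) {
			last_flush = e[i].seq;
			have_flush = true;
		}

	if (!have_flush)
		return 0;

	/* Second pass: only entries inside the used range can raise errors. */
	for (size_t i = 0; i < nr; i++)
		if (e[i].seq >= oldest_needed &&
		    e[i].seq <= last_flush &&
		    !e[i].passes_full_validation)
			errors++;

	return errors;
}
```

In this model a torn entry newer than the last flush is never reported, because it falls outside the range that replay would use.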

bcachefs: Improve journal_entry_add() · 17fe3b64

Kent Overstreet authored Oct 14, 2022

Prep work for the next patch, to defer journal entry validation: we now
track for each replica whether we had a good checksum.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

17fe3b64

bcachefs: time stats now uses the mean_and_variance module. · bf8f8b20

Daniel Hill authored Aug 12, 2022

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bf8f8b20

bcachefs: Mean and variance · 92095781

Daniel Hill authored Aug 06, 2022

This module provides a fast 64bit implementation of basic statistics
functions, including mean, variance and standard deviation in both
weighted and unweighted variants. The unweighted variant has a 32bit
limitation per sample to prevent overflow when squaring.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

92095781
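
To show why a 32-bit limit per sample keeps squaring safe in 64-bit arithmetic, here is a minimal sketch of an unweighted running mean and variance. It is not the module's API: `mean_var` and its functions are hypothetical names, and the sketch does not guard against overflow of the accumulated sums over very long runs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical running statistics over 32-bit samples. */
struct mean_var {
	uint64_t n;
	int64_t  sum;
	uint64_t sum_squares;
};

void mean_var_update(struct mean_var *s, int32_t v)
{
	s->n++;
	s->sum += v;
	/* |v| <= 2^31, so v * v <= 2^62 and cannot overflow 64 bits. */
	s->sum_squares += (uint64_t)((int64_t)v * v);
}

int64_t mean_var_mean(const struct mean_var *s)
{
	assert(s->n > 0);
	return s->sum / (int64_t)s->n;
}

/* Population variance, E[x^2] - E[x]^2, in integer arithmetic. */
uint64_t mean_var_variance(const struct mean_var *s)
{
	uint64_t sum_sq_over_n;

	assert(s->n > 0);
	sum_sq_over_n = (uint64_t)(s->sum * s->sum) / s->n;
	return (s->sum_squares - sum_sq_over_n) / s->n;
}
```

For the samples 2, 4, 4, 4, 5, 5, 7, 9 the sum is 40 and the sum of squares is 232, giving a mean of 5 and a variance of (232 - 1600/8) / 8 = 4.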

bcachefs: Fix for not dropping privs in fallocate · 07bfcc0b

Kent Overstreet authored Oct 13, 2022

When modifying a file, we may be required to drop the suid/sgid bits -
we were missing a file_modified() call to do this.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

07bfcc0b

bcachefs: Fix bch2_write_begin() · 3a4d3656

Kent Overstreet authored Oct 13, 2022

An error case was jumping to the wrong label, creating an infinite loop
- oops.

This fixes fstests generic/648.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

3a4d3656

fixup bcachefs: Deadlock cycle detector · 40405557

Kent Overstreet authored Jan 20, 2023

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

40405557

fixup bcachefs: Deadlock cycle detector · 80df5b8c

Kent Overstreet authored Jan 20, 2023

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

80df5b8c

bcachefs: Fix lock_graph_remove_non_waiters() · 896f1b31

Kent Overstreet authored Oct 12, 2022

We were removing 1 more entry than we were supposed to - oops.

Also some other simplifications and cleanups, and bring back the abort
preference code in a better fashion.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

896f1b31

bcachefs: Support FS_XFLAG_PROJINHERIT · 65ff2d3a

Kent Overstreet authored Oct 12, 2022

We already have support for the flag's semantics: inode options are
inherited by children if they were explicitly set on the parent. This
patch just maps the FS_XFLAG_PROJINHERIT flag to the "this option was
explicitly set" bit.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

65ff2d3a

bcachefs: Don't allow hardlinks when inherited attrs would change · bf9cb250

Kent Overstreet authored Oct 12, 2022

This is the right thing to do, and conforms with our own behaviour on
rename and xfs's behaviour on hardlink.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bf9cb250

bcachefs: Initialize sb_quota with default 1 week timer · f866870f

Kent Overstreet authored Oct 12, 2022

For compliance with other quota implementations, we should be
initializing quota information with a default 1 week timelimit: this
fixes fstests generic/235.

Also, this adds to_text() functions for some quota structs - useful
debugging aids.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

f866870f

bcachefs: Call bch2_btree_update_add_new_node() before dropping write lock · de107dc8

Kent Overstreet authored Oct 12, 2022

btree nodes can be written by other threads (shrinker, journal reclaim)
with only a read lock, but brand new nodes should only be written by the
thread doing the split/interior update. bch2_btree_update_add_new_node()
sets btree node flags to indicate that this is a new node and should not
be written out by other threads, thus we need to call it before dropping
our write lock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

de107dc8

bcachefs: Reflink now respects quotas · e8540e56

Kent Overstreet authored Oct 11, 2022

This adds a new helper, quota_reserve_range(), which takes a quota
reservation for unallocated blocks in a given file range, and uses it in
bch2_remap_file_range().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e8540e56

bcachefs: Fix a rare path in bch2_btree_path_peek_slot() · f42238b5

Kent Overstreet authored Oct 12, 2022

In the drop_alloc tests, we may end up calling
bch2_btree_iter_peek_slot() on an interior level that doesn't exist.
Previously, this would hit the path->uptodate assertion in
bch2_btree_path_peek_slot(); this patch first checks for a NULL btree node,
which is how we know we're at the end of the btree.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

f42238b5

bcachefs: bch2_path_put_nokeep() · 7dcbdbd8

Kent Overstreet authored Oct 11, 2022

The btree iterator code may allocate extra btree paths, temporarily,
that do not refer to keys being returned: we don't need to wait until
transaction restart to drop these, when they're not referenced they
should be deleted right away.

This fixes a transaction path overflow bug.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

7dcbdbd8

bcachefs: Fix cached data accounting · 5b3243cb

Kent Overstreet authored Oct 11, 2022

Negating without casting to a signed integer means the value wasn't
getting sign extended properly - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

5b3243cb
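
The sign-extension problem described above is a general C pitfall and can be reproduced in isolation. The function names and the choice of a 32-bit unsigned input feeding a 64-bit signed delta are assumptions made for illustration; the actual types in the patched code may differ.

```c
#include <assert.h>
#include <stdint.h>

/* The buggy behaviour shown here relies on unsigned int being 32 bits. */
static_assert(sizeof(unsigned int) == sizeof(uint32_t),
	      "sketch assumes 32-bit unsigned int");

/* Buggy: the negation happens in 32-bit unsigned arithmetic, so the
 * result is a large positive value that is zero-extended to 64 bits. */
int64_t sectors_delta_buggy(uint32_t sectors)
{
	return -sectors;
}

/* Fixed: widen to a signed 64-bit type first, then negate. */
int64_t sectors_delta_fixed(uint32_t sectors)
{
	return -(int64_t)sectors;
}
```

For an input of 8, the buggy version returns 4294967288 (2^32 - 8) while the fixed version returns -8, so an accounting counter that should shrink would instead grow by roughly 4 billion.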

bcachefs: Btree splits now only take the locks they need · 1f0f731f

Kent Overstreet authored Sep 27, 2022

Previously, bch2_btree_update_start() would always take all intent
locks, all the way up to the root.

We've finally got data from users where this became a scalability issue
- so, this patch fixes bch2_btree_update_start() to only take the locks
we need.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

1f0f731f

bcachefs: bch2_btree_iter_peek() now works with interior nodes · 969576ec

Kent Overstreet authored Oct 09, 2022

Needed by the next patch, which will be iterating over keys in nodes at
level 1.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

969576ec

bcachefs: bch2_btree_insert_node() no longer uses lock_write_nofail · 1ff7849f

Kent Overstreet authored Oct 09, 2022

Now that we have an error path plumbed through, there's no need to be
using bch2_btree_node_lock_write_nofail().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

1ff7849f

bcachefs: Add error path to btree_split() · a8eefbd3

Kent Overstreet authored Oct 01, 2022

The next patch in the series is (finally!) going to change btree splits
(and interior updates in general) to not take intent locks all the way
up to the root - instead only locking the nodes they'll need to modify.

However, this will be introducing a race since if we're not holding a
write lock on a btree node it can be written out by another thread, and
then we might not have enough space for a new bset entry.

We can handle this by retrying - we just need to introduce a new error
path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

a8eefbd3

bcachefs: Write new btree nodes after parent update · 8cbb0002

Kent Overstreet authored Oct 01, 2022

In order to avoid locking all btree nodes up to the root for btree node
splits, we're going to have to introduce a new error path into
bch2_btree_insert_node(); this means we can't have done any writes or
modified global state before that point.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

8cbb0002

bcachefs: Simplify break_cycle() · fe2de9a8

Kent Overstreet authored Oct 09, 2022

We'd like to prioritize aborting transactions that have done less work -
however, it appears breaking cycles by telling other threads to abort
may still be buggy, so disable that for now.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

fe2de9a8

bcachefs: Print cycle on unrecoverable deadlock · 1148a97f

Kent Overstreet authored Oct 09, 2022

Some lock operations can't fail; a cycle of nofail locks is impossible
to recover from. So we want to get rid of these nofail locking
operations, but as this is tricky it'll be done incrementally.

If such a cycle happens, this patch prints out which codepaths are
involved so we know what to work on next.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

1148a97f
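
Finding and printing a cycle in a wait-for graph can be sketched in a much simplified form. This model assumes each transaction waits on at most one other transaction, which is a simplification and not how the bcachefs lock graph is structured; `find_wait_cycle()` and `print_wait_cycle()` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/*
 * waits_for[i] is the index of the transaction that i is blocked on,
 * or -1 if i is not blocked. If start is part of a cycle, store the
 * cycle's members in cycle (room for nr entries) and return its length;
 * otherwise return 0.
 */
int find_wait_cycle(const int *waits_for, int nr, int start, int *cycle)
{
	int len = 0;
	int cur = start;

	assert(start >= 0 && start < nr);

	do {
		cycle[len++] = cur;
		cur = waits_for[cur];
	} while (cur >= 0 && cur != start && len < nr);

	return cur == start ? len : 0;
}

/* Print the cycle as "trans a -> trans b -> ... -> trans a". */
void print_wait_cycle(const int *cycle, int len)
{
	for (int i = 0; i < len; i++)
		printf("trans %d -> ", cycle[i]);
	if (len)
		printf("trans %d\n", cycle[0]);
}
```

The walk is bounded by `nr` steps, so a chain that leads into a cycle not containing `start` terminates and reports no cycle for that starting point.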

bcachefs: Handle dropping pointers in data_update path · 1be88797

Kent Overstreet authored Oct 09, 2022

Cached pointers are generally dropped, not moved: this led to an
assertion firing in the data update path when there were no new replicas
being written.

This patch adds a data_options field for pointers to be dropped, and
tweaks move_extent() to check if we're only dropping pointers, not
writing new ones, before kicking off a data update operation.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

1be88797

bcachefs: Ratelimit ec error message · 160dff6d

Kent Overstreet authored Oct 09, 2022

We should fix this, but for now this makes this more usable.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

160dff6d