Commits · e264b2f62a8fdf571e9ca9a741719a9b567573f5 · Kirill Smelkov / linux

22 Oct, 2023 40 commits

bcachefs: Improve bch2_btree_update_start() · e264b2f6

Kent Overstreet authored Mar 31, 2021

bch2_btree_update_start() is now responsible for taking gc_lock and
upgrading the iterator to lock parent nodes - greatly simplifying error
handling and all of the callers.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e264b2f6

bcachefs: Add a sysfs var for average btree write size · ba5f03d3

Kent Overstreet authored Mar 31, 2021

Useful number for performance tuning.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ba5f03d3

bcachefs: Improve bch2_trans_relock() · d5a43661

Kent Overstreet authored Mar 30, 2021

We're getting away from relying on iter->uptodate - this changes
bch2_trans_relock() to more directly specify which iterators should be
relocked.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

d5a43661

bcachefs: Move btree lock debugging to slowpath fn · acb3b26e

Kent Overstreet authored Mar 31, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

acb3b26e

bcachefs: Don't make foreground writes wait behind journal reclaim too long · 24db24c7

Kent Overstreet authored Mar 31, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

24db24c7

buckets.c fixups XXX squash · 65bcd657

Kent Overstreet authored Mar 28, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

65bcd657

bcachefs: Add repair code for out of order keys in a btree node. · 5f65d74d

Kent Overstreet authored Mar 29, 2021

This just drops the offending key - in the bug report where this was
seen, it was clearly a single bit memory error, and fsck will fix the
missing key.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

5f65d74d

bcachefs: Free iterator in bch2_btree_delete_range_trans() · a84b6c50

Kent Overstreet authored Mar 28, 2021

This is specifically to speed up bch2_inode_rm(), so that we're not
traversing iterators we're done with.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

a84b6c50

bcachefs: Have journal reclaim thread flush more aggressively · c5f51cdd

Kent Overstreet authored Mar 28, 2021

This adds a new watermark for the journal reclaim when flushing btree
key cache entries - it should try and stay ahead of where foreground
threads doing transaction commits will enter direct journal reclaim.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

c5f51cdd

bcachefs: Don't use bch2_inode_find_by_inum() in move.c · 883d9701

Kent Overstreet authored Mar 16, 2021

Since move.c isn't aware of what subvolume we're in, we can't use the
standard inode lookup code - fortunately, we're just using it for
reading IO options.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

883d9701

bcachefs: Change inode allocation code for snapshots · e6ae2727

Kent Overstreet authored Mar 15, 2021

For snapshots, when we allocate a new inode we want to allocate an inode
number that isn't in use in any other subvolume. We won't be able to use
ITER_SLOTS for this, so inode allocation needs to change to use
BTREE_ITER_ALL_SNAPSHOTS.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e6ae2727

bcachefs: Inode backpointers · ab2a29cc

Kent Overstreet authored Mar 02, 2021

This patch adds two new inode fields, bi_dir and bi_dir_offset, that
point back to the inode's dirent.

Since we're only adding fields for a single backpointer, files that have
been hardlinked won't necessarily have valid backpointers: we also add a
new inode flag, BCH_INODE_BACKPTR_UNTRUSTED, that's set if an inode has
ever had multiple links to it. That's ok, because we only really need
this functionality for directories, which can never have multiple
hardlinks - when we add subvolumes, we'll need a way to enumerate and
print subvolumes, and this will let us reconstruct a path to a subvolume
root given a subvolume root inode.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ab2a29cc

bcachefs: Start using bpos.snapshot field · e751c01a

Kent Overstreet authored Mar 24, 2021

This patch starts treating the bpos.snapshot field like part of the key
in the btree code:

* bpos_successor() and bpos_predecessor() now include the snapshot field
* Keys in btrees that will be using snapshots (extents, inodes, dirents
  and xattrs) now always have their snapshot field set to U32_MAX

The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that
determines whether we're iterating over keys in all snapshots or not -
internally, this controls whether bkey_(successor|predecessor)
increment/decrement the snapshot field, or only the higher bits of the
key.

We add a new member to struct btree_iter, iter->snapshot: when
BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always
equal iter->snapshot, which will be 0 for btrees that don't use
snapshots, and always U32_MAX for btrees that will use snapshots
(until we enable snapshot creation).

This patch also introduces a new metadata version number, and compat
code for reading from/writing to older versions - this isn't a forced
upgrade (yet).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e751c01a
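
A minimal sketch of the successor behaviour described in the message above, not the bcachefs code itself. It assumes a position made of inode, offset and snapshot fields with snapshot as the lowest-order part; the names `sketch_bpos` and `sketch_bpos_successor` are hypothetical, and leaving the snapshot field unchanged when the flag is clear is an assumption based on the statement that iter->pos.snapshot should stay equal to iter->snapshot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a btree position; layout is an assumption. */
struct sketch_bpos {
	uint64_t inode;
	uint64_t offset;
	uint32_t snapshot;
};

/*
 * Return the next position after p.
 *
 * all_snapshots == true:  the snapshot field is part of the key, so it is
 *                         incremented first and carries into offset/inode.
 * all_snapshots == false: only the higher-order fields (offset, then inode)
 *                         are incremented; snapshot is left as it was.
 */
static struct sketch_bpos sketch_bpos_successor(struct sketch_bpos p,
						bool all_snapshots)
{
	if (all_snapshots && ++p.snapshot != 0)
		return p;		/* no carry out of the snapshot field */

	if (++p.offset != 0)
		return p;		/* no carry out of the offset field */

	++p.inode;
	assert(p.inode != 0);		/* the maximum position has no successor */
	return p;
}
```

With the flag set, the successor of (inode 1, offset 5, snapshot U32_MAX) in this sketch is (1, 6, 0); with the flag clear, the successor of (1, 5, 7) is (1, 6, 7).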

bcachefs: Split out bpos_cmp() and bkey_cmp() · 4cf91b02

Kent Overstreet authored Mar 04, 2021

With snapshots, we're going to need to differentiate between comparisons
that should and shouldn't include the snapshot field. bpos_cmp is now
the comparison function that does include the snapshot field, used by
core btree code.

Upper level filesystem code generally does _not_ want to compare against
the snapshot field - that code wants keys to compare as equal even when
one of them is in an ancestor snapshot.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

4cf91b02
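
The two kinds of comparison can be sketched as follows. This is an illustration under assumptions, not the functions from the patch: the struct layout and the `sketch_` names are hypothetical, and the only property taken from the message is that one comparison includes the snapshot field and the other ignores it.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for a btree position; layout is an assumption. */
struct sketch_bpos {
	uint64_t inode;
	uint64_t offset;
	uint32_t snapshot;
};

/* Three-way compare: returns -1, 0 or 1. */
static int sketch_cmp_u64(uint64_t l, uint64_t r)
{
	return (l > r) - (l < r);
}

/* Comparison for filesystem-level code: the snapshot field is ignored. */
static int sketch_bkey_cmp(struct sketch_bpos l, struct sketch_bpos r)
{
	int c = sketch_cmp_u64(l.inode, r.inode);

	return c ? c : sketch_cmp_u64(l.offset, r.offset);
}

/* Comparison for core btree code: the snapshot field breaks ties. */
static int sketch_bpos_cmp(struct sketch_bpos l, struct sketch_bpos r)
{
	int c = sketch_bkey_cmp(l, r);

	return c ? c : sketch_cmp_u64(l.snapshot, r.snapshot);
}

/* Same inode and offset, different snapshots. */
static void sketch_cmp_demo(void)
{
	struct sketch_bpos a = { .inode = 42, .offset = 8, .snapshot = 3 };
	struct sketch_bpos b = { .inode = 42, .offset = 8, .snapshot = 7 };

	assert(sketch_bkey_cmp(a, b) == 0);	/* equal above the btree */
	assert(sketch_bpos_cmp(a, b) < 0);	/* distinct inside the btree */
}
```

In this sketch the btree still sees two distinct keys, while code above the btree treats them as the same position.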

bcachefs: Add a mechanism for running callbacks at trans commit time · 43d00243

Kent Overstreet authored Feb 03, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

43d00243

bcachefs: btree key cache locking improvements · 331194a2

Kent Overstreet authored Mar 24, 2021

The btree key cache mutex was becoming a significant bottleneck - it was
mainly used to protect the lists of dirty, clean and freed cached keys.

This patch eliminates the dirty and clean lists - instead, when we need
to scan for keys to drop from the cache we iterate over the rhashtable,
and thus we're able to remove most uses of that lock.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

331194a2

bcachefs: Simplify btree_node_iter_init_pack_failed() · 2649b514

Kent Overstreet authored Mar 27, 2021

Since we now make sure to always generate packed bkey formats that can
pack the min_key of a btree node, this path should actually never
happen.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

2649b514

bcachefs: Fix for bch2_trans_commit() unlocking when it's not supposed to · f793fd85

Kent Overstreet authored Mar 27, 2021

When we pass BTREE_INSERT_NOUNLOCK, bch2_trans_commit() isn't supposed
to unlock after a successful commit, but it was calling
bch2_trans_cond_resched() - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

f793fd85

bcachefs: Fix packed bkey format calculation for new btree roots · 3bf57160

Kent Overstreet authored Mar 26, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

3bf57160

bcachefs: Fix building of aux search trees · c7e04e22

Kent Overstreet authored Mar 26, 2021

We weren't packing the min/max keys, which was a major oversight and
completely disabled generating bkey_floats for adjacent nodes.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

c7e04e22

bcachefs: Generate better bkey formats when splitting nodes · 2da5d000

Kent Overstreet authored Mar 26, 2021

On btree node split, we weren't ensuring the min_key of the new larger
node packs in the new format for this node. This triggers some painful
slowpaths in the bset.c aux search tree code - this patch fixes that by
calculating a new format for the new node with the new min_key.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

2da5d000

bcachefs: Drop bkey noops · 0390ea8a

Kent Overstreet authored Mar 24, 2021

Bkey noops were introduced to deal with trimming inline data extents in
place in the btree: if the u64s field of a bkey was 0, that u64 was a
noop and we'd start looking for the next bkey immediately after it.

But extent handling has been lifted above the btree - we no longer
modify existing extents in place in the btree, and the compatibility code
for old style extent btree nodes is gone, so we can completely drop this
code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

0390ea8a

bcachefs: Increase default journal size · 7c8b166e

Kent Overstreet authored Mar 24, 2021

The default was 1/256th of the device and capped at 512MB, which is
fairly tiny these days.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

7c8b166e
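
A worked example of the old default only, as the message does not state the new value. The function name is hypothetical, "MB" is read as MiB, and sizes are in bytes; these are assumptions for illustration (the real code may round or work in other units).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MIB (1024ULL * 1024ULL)

/* Old default as described: 1/256th of the device, capped at 512 MiB. */
static uint64_t sketch_old_default_journal_bytes(uint64_t device_bytes)
{
	uint64_t size = device_bytes / 256;
	uint64_t cap  = 512 * SKETCH_MIB;

	assert(device_bytes > 0);
	return size < cap ? size : cap;
}
```

Under these assumptions a 64 GiB device gets a 256 MiB journal, and every device of 128 GiB or more is held at the 512 MiB cap, so a 1 TiB device ends up with a journal of 1/2048th of its capacity.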

bcachefs: Use pcpu mode of six locks for interior nodes · a9d79c6e

Kent Overstreet authored Mar 23, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

a9d79c6e

bcachefs: Split btree_iter_traverse and bch2_btree_iter_traverse() · 08070cba

Kent Overstreet authored Mar 23, 2021

External (to the btree iterator code) users of bch2_btree_iter_traverse
expect that on success the iterator will be pointed at iter->pos and
have that position locked - but since we split iter->pos and
iter->real_pos, that means it has to update iter->real_pos if necessary.

Internal users don't expect it to modify iter->real_pos, so we need two
separate functions.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

08070cba

bcachefs: Improve inode deletion code · d3e6b9a1

Kent Overstreet authored Mar 21, 2021

It had some silly redundancies.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

d3e6b9a1

bcachefs: Add an .invalid method for bch2_btree_ptr_v2 · fad7cfed

Kent Overstreet authored Mar 22, 2021

It was using the method for btree_ptr_v1, but that wasn't checking all
the fields.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

fad7cfed

bcachefs: Include snapshot field in bch2_bpos_to_text · 1fe9b1d3

Kent Overstreet authored Mar 22, 2021

More prep work for snapshots.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

1fe9b1d3

bcachefs: Update iter->real_pos lazily · bcad5622

Kent Overstreet authored Mar 21, 2021

peek() has to update iter->real_pos - there's no need for
bch2_btree_iter_set_pos() to update it as well.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcad5622

bcachefs: Consolidate bch2_btree_iter_peek() and peek_with_updates() · 818664f5

Kent Overstreet authored Mar 21, 2021

Ideally we'll be getting rid of peek_with_updates(), but the callers
will need to be checked.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

818664f5

bcachefs: Improve iter->real_pos handling · ca58cbd4

Kent Overstreet authored Mar 21, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ca58cbd4

bcachefs: Internal btree iterator renaming · 3b0baf6f

Kent Overstreet authored Mar 21, 2021

This just gives some internal helpers some better names.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

3b0baf6f

bcachefs: Kill btree_iter_peek_uptodate() · 07fc72e1

Kent Overstreet authored Mar 21, 2021

Since we're no longer doing next() immediately followed by peek(), this
optimization isn't doing anything anymore.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

07fc72e1

bcachefs: Iterators are now always consistent with iter->real_pos · 5cde51cd

Kent Overstreet authored Mar 21, 2021

This means bch2_btree_iter_traverse_one() can be made more efficient.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

5cde51cd

bcachefs: Have btree_iter_next_node() use btree_iter_set_search_pos() · 345ca825

Kent Overstreet authored Mar 21, 2021

btree node iterators need to obey the regular btree node invariants
w.r.t. iter->real_pos; once they do, bch2_btree_iter_traverse will have
less that it needs to check.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

345ca825

bcachefs: Replace bch2_btree_iter_next() calls with bch2_btree_iter_advance · e0ba3b64

Kent Overstreet authored Mar 21, 2021

The way btree iterators work internally has been changing, particularly
with the iter->real_pos changes, and bch2_btree_iter_next() is no longer
hyper optimized - it's just advance followed by peek, so it's more
efficient to just call advance where we're not using the return value of
bch2_btree_iter_next().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e0ba3b64
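
The "advance followed by peek" relationship can be sketched over a plain array. The iterator type and the `sketch_` functions are hypothetical and only mirror the shape described in the message; they are not the bcachefs iterator API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy iterator over a sorted array of keys. */
struct sketch_iter {
	const int *keys;
	size_t nr;
	size_t pos;
};

/* Return the key at the current position, or NULL at the end. */
static const int *sketch_peek(const struct sketch_iter *it)
{
	assert(it && it->keys);
	return it->pos < it->nr ? &it->keys[it->pos] : NULL;
}

/* Step to the next position; returns false if already at the end. */
static bool sketch_advance(struct sketch_iter *it)
{
	assert(it && it->keys);
	if (it->pos >= it->nr)
		return false;
	it->pos++;
	return true;
}

/* next() is just advance() followed by peek(). */
static const int *sketch_next(struct sketch_iter *it)
{
	return sketch_advance(it) ? sketch_peek(it) : NULL;
}
```

A caller that ignores the result of sketch_next() pays for a peek it never uses, so calling sketch_advance() directly does less work for the same effect on the iterator.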

bcachefs: Get disk reservation when overwriting data in old snapshot · cb16bfaa

Kent Overstreet authored Mar 21, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

cb16bfaa

bcachefs: Switch extent_handle_overwrites() to one key at a time · 4cfb722c

Kent Overstreet authored Mar 20, 2021

Prep work for snapshots
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

4cfb722c

bcachefs: Optimize bch2_btree_iter_verify_level() · 4ce41957

Kent Overstreet authored Mar 20, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

4ce41957

bcachefs: Fix iterator picking · 5c1ec980

Kent Overstreet authored Mar 20, 2021

comparison was wrong
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

5c1ec980