Commits · a84b6c50f18e197070e35a04252fcc5c0abf2904 · Kirill Smelkov / linux

22 Oct, 2023 40 commits

bcachefs: Free iterator in bch2_btree_delete_range_trans() · a84b6c50

Kent Overstreet authored Mar 28, 2021

This is specifically to speed up bch2_inode_rm(), so that we're not
traversing iterators we're done with.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Have journal reclaim thread flush more aggressively · c5f51cdd

Kent Overstreet authored Mar 28, 2021

This adds a new watermark for the journal reclaim when flushing btree
key cache entries - it should try and stay ahead of where foreground
threads doing transaction commits will enter direct journal reclaim.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't use bch2_inode_find_by_inum() in move.c · 883d9701

Kent Overstreet authored Mar 16, 2021

Since move.c isn't aware of what subvolume we're in, we can't use the
standard inode lookup code - fortunately, we're just using it for
reading IO options.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Change inode allocation code for snapshots · e6ae2727

Kent Overstreet authored Mar 15, 2021

For snapshots, when we allocate a new inode we want to allocate an inode
number that isn't in use in any other subvolume. We won't be able to use
ITER_SLOTS for this, so inode allocation needs to change to use
BTREE_ITER_ALL_SNAPSHOTS.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Inode backpointers · ab2a29cc

Kent Overstreet authored Mar 02, 2021

This patch adds two new inode fields, bi_dir and bi_dir_offset, that
point back to the inode's dirent.

Since we're only adding fields for a single backpointer, files that have
been hardlinked won't necessarily have valid backpointers: we also add a
new inode flag, BCH_INODE_BACKPTR_UNTRUSTED, that's set if an inode has
ever had multiple links to it. That's ok, because we only really need
this functionality for directories, which can never have multiple
hardlinks - when we add subvolumes, we'll need a way to enumerate and
print subvolumes, and this will let us reconstruct a path to a subvolume
root given a subvolume root inode.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
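
The path reconstruction described above can be sketched roughly as follows. This is a toy in-memory model, not the bcachefs implementation: struct toy_inode, struct toy_dirent, struct toy_fs, TOY_BACKPTR_UNTRUSTED and toy_inode_path() are hypothetical names, and only bi_dir, bi_dir_offset and the idea of an "untrusted" flag come from the commit message. It assumes the root inode has bi_dir == 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for BCH_INODE_BACKPTR_UNTRUSTED; the real bit value is not shown here. */
#define TOY_BACKPTR_UNTRUSTED (1u << 0)

struct toy_inode {
	uint64_t inum;
	uint64_t bi_dir;        /* inode number of the parent directory, 0 for the root */
	uint64_t bi_dir_offset; /* position of our dirent within that directory */
	uint32_t flags;
};

struct toy_dirent {
	uint64_t dir;           /* directory inode the dirent lives in */
	uint64_t offset;        /* position within that directory */
	const char *name;
};

struct toy_fs {
	const struct toy_inode *inodes;
	size_t nr_inodes;
	const struct toy_dirent *dirents;
	size_t nr_dirents;
};

static const struct toy_inode *find_inode(const struct toy_fs *fs, uint64_t inum)
{
	for (size_t i = 0; i < fs->nr_inodes; i++)
		if (fs->inodes[i].inum == inum)
			return &fs->inodes[i];
	return NULL;
}

static const struct toy_dirent *find_dirent(const struct toy_fs *fs,
					    uint64_t dir, uint64_t offset)
{
	for (size_t i = 0; i < fs->nr_dirents; i++)
		if (fs->dirents[i].dir == dir && fs->dirents[i].offset == offset)
			return &fs->dirents[i];
	return NULL;
}

/*
 * Build the path of @inum in @buf by following backpointers up to the root.
 * Returns 0 on success, -1 if a backpointer is missing or untrusted, if the
 * backpointers form a cycle, or if the buffer is too small.
 */
int toy_inode_path(const struct toy_fs *fs, uint64_t inum, char *buf, size_t len)
{
	size_t pos = len;
	size_t hops = 0;

	assert(fs && buf);
	if (len < 2)
		return -1;
	buf[--pos] = '\0';

	for (;;) {
		const struct toy_inode *inode = find_inode(fs, inum);

		if (!inode || (inode->flags & TOY_BACKPTR_UNTRUSTED))
			return -1;
		if (!inode->bi_dir)		/* reached the root */
			break;
		if (++hops > fs->nr_inodes)	/* backpointer cycle */
			return -1;

		const struct toy_dirent *d =
			find_dirent(fs, inode->bi_dir, inode->bi_dir_offset);
		size_t n = d ? strlen(d->name) : 0;

		if (!d || pos < n + 1)
			return -1;
		pos -= n;
		memcpy(buf + pos, d->name, n);	/* prepend "/name" */
		buf[--pos] = '/';
		inum = inode->bi_dir;
	}

	if (pos == len - 1)			/* the root itself */
		buf[--pos] = '/';
	memmove(buf, buf + pos, len - pos);
	return 0;
}
```

The path is built back to front because each step only knows the name of the current component and the inode number of its parent. An inode flagged as untrusted stops the walk, which mirrors the caveat about hardlinked files.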

bcachefs: Start using bpos.snapshot field · e751c01a

Kent Overstreet authored Mar 24, 2021

This patch starts treating the bpos.snapshot field like part of the key
in the btree code:

* bpos_successor() and bpos_predecessor() now include the snapshot field
* Keys in btrees that will be using snapshots (extents, inodes, dirents
  and xattrs) now always have their snapshot field set to U32_MAX

The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that
determines whether we're iterating over keys in all snapshots or not -
internally, this controls whether bkey_(successor|predecessor)
increment/decrement the snapshot field, or only the higher bits of the
key.

We add a new member to struct btree_iter, iter->snapshot: when
BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always
equal iter->snapshot, which will be 0 for btrees that don't use
snapshots, and always U32_MAX for btrees that will use snapshots
(until we enable snapshot creation).

This patch also introduces a new metadata version number, and compat
code for reading from/writing to older versions - this isn't a forced
upgrade (yet).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
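
To illustrate the difference between stepping over all snapshots and stepping only the higher bits of the key, here is a minimal sketch. It assumes a simplified position type with the snapshot as the lowest-order field; struct toy_bpos and the toy_* helpers are hypothetical names, and leaving the snapshot field unchanged in the non-ALL_SNAPSHOTS case is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_bpos {
	uint64_t inode;
	uint64_t offset;
	uint32_t snapshot;	/* lowest-order part of the position */
};

/* Successor over the full position: snapshot carries into offset, offset into inode. */
struct toy_bpos toy_bpos_successor(struct toy_bpos p)
{
	if (!++p.snapshot && !++p.offset && !++p.inode)
		assert(0 && "position overflow");
	return p;
}

/* Successor that steps only the higher bits and leaves the snapshot untouched. */
struct toy_bpos toy_bpos_nosnap_successor(struct toy_bpos p)
{
	if (!++p.offset && !++p.inode)
		assert(0 && "position overflow");
	return p;
}

/* Pick the stepping rule the way an ALL_SNAPSHOTS-style iterator flag would. */
struct toy_bpos toy_iter_successor(struct toy_bpos p, bool all_snapshots)
{
	return all_snapshots ? toy_bpos_successor(p)
			     : toy_bpos_nosnap_successor(p);
}
```

With the snapshot field at U32_MAX, the all-snapshots successor wraps the snapshot to 0 and carries into the offset, while the other variant moves straight to the next offset and keeps the snapshot as it was.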

bcachefs: Split out bpos_cmp() and bkey_cmp() · 4cf91b02

Kent Overstreet authored Mar 04, 2021

With snapshots, we're going to need to differentiate between comparisons
that should and shouldn't include the snapshot field. bpos_cmp is now
the comparison function that does include the snapshot field, used by
core btree code.

Upper level filesystem code generally does _not_ want to compare against
the snapshot field - that code wants keys to compare as equal even when
one of them is in an ancestor snapshot.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
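
The two kinds of comparison can be sketched as below. This is a simplified model rather than the kernel code: struct toy_bpos, toy_bpos_cmp() and toy_bkey_cmp() are hypothetical stand-ins for the position type and for bpos_cmp()/bkey_cmp(), and the field layout is an assumption.

```c
#include <assert.h>
#include <stdint.h>

struct toy_bpos {
	uint64_t inode;
	uint64_t offset;
	uint32_t snapshot;
};

/* Three-way compare: returns -1, 0 or 1. */
static int cmp_u64(uint64_t l, uint64_t r)
{
	return (l > r) - (l < r);
}

/* Includes the snapshot field: the ordering core btree code would use. */
int toy_bpos_cmp(struct toy_bpos l, struct toy_bpos r)
{
	int c = cmp_u64(l.inode, r.inode);

	if (!c)
		c = cmp_u64(l.offset, r.offset);
	if (!c)
		c = cmp_u64(l.snapshot, r.snapshot);
	return c;
}

/* Ignores the snapshot field: what upper level filesystem code would use. */
int toy_bkey_cmp(struct toy_bpos l, struct toy_bpos r)
{
	int c = cmp_u64(l.inode, r.inode);

	return c ? c : cmp_u64(l.offset, r.offset);
}

/* Two positions that differ only in snapshot: distinct keys, same file offset. */
void toy_cmp_demo(void)
{
	struct toy_bpos a = { 1, 8, 3 };
	struct toy_bpos b = { 1, 8, 7 };

	assert(toy_bpos_cmp(a, b) < 0);
	assert(toy_bkey_cmp(a, b) == 0);
}
```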

bcachefs: Add a mechanism for running callbacks at trans commit time · 43d00243

Kent Overstreet authored Feb 03, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: btree key cache locking improvements · 331194a2

Kent Overstreet authored Mar 24, 2021

The btree key cache mutex was becoming a significant bottleneck - it was
mainly used to protect the lists of dirty, clean and freed cached keys.

This patch eliminates the dirty and clean lists - instead, when we need
to scan for keys to drop from the cache we iterate over the rhashtable,
and thus we're able to remove most uses of that lock.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Simplify btree_node_iter_init_pack_failed() · 2649b514

Kent Overstreet authored Mar 27, 2021

Since we now make sure to always generate packed bkey formats that can
pack the min_key of a btree node, this path should actually never
happen.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix for bch2_trans_commit() unlocking when it's not supposed to · f793fd85

Kent Overstreet authored Mar 27, 2021

When we pass BTREE_INSERT_NOUNLOCK, bch2_trans_commit() isn't supposed to
unlock after a successful commit, but it was calling
bch2_trans_cond_resched() - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix packed bkey format calculation for new btree roots · 3bf57160

Kent Overstreet authored Mar 26, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix building of aux search trees · c7e04e22

Kent Overstreet authored Mar 26, 2021

We weren't packing the min/max keys, which was a major oversight and
completely disabled generating bkey_floats for adjacent nodes.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Generate better bkey formats when splitting nodes · 2da5d000

Kent Overstreet authored Mar 26, 2021

On btree node split, we weren't ensuring the min_key of the new larger
node packs in the new format for this node. This triggers some painful
slowpaths in the bset.c aux search tree code - this patch fixes that by
calculating a new format for the new node with the new min_key.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Drop bkey noops · 0390ea8a

Kent Overstreet authored Mar 24, 2021

Bkey noops were introduced to deal with trimming inline data extents in
place in the btree: if the u64s field of a bkey was 0, that u64 was a
noop and we'd start looking for the next bkey immediately after it.

But extent handling has been lifted above the btree - we no longer
modify existing extents in place in the btree, and the compatibility code
for old style extent btree nodes is gone, so we can completely drop this
code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
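
For readers unfamiliar with the behaviour being removed, a rough sketch of noop skipping follows. The layout is a toy one assumed for illustration (the low byte of a key's first u64 holds its length in u64s), and toy_count_keys_skip_noops() is a hypothetical helper, not a function from the bcachefs tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the keys in a buffer of u64 words. In this toy layout the low byte
 * of a key's first word is its length in words (u64s); a length of 0 marks
 * a single-word noop that is skipped.
 */
size_t toy_count_keys_skip_noops(const uint64_t *buf, size_t nr_u64s)
{
	size_t i = 0, keys = 0;

	assert(buf || !nr_u64s);
	while (i < nr_u64s) {
		uint8_t u64s = buf[i] & 0xff;

		if (!u64s) {		/* noop: step over one word and retry */
			i++;
			continue;
		}
		if (u64s > nr_u64s - i)	/* truncated key: stop */
			break;
		keys++;
		i += u64s;
	}
	return keys;
}
```

Without noops, every walk over a node can advance by the key's u64s field unconditionally, which is the simplification this commit describes.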

bcachefs: Increase default journal size · 7c8b166e

Kent Overstreet authored Mar 24, 2021

The default was 1/256th of the device and capped at 512MB, which is
fairly tiny these days.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
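
The old default works out as in the sketch below. It assumes "512MB" means 512 MiB and ignores bucket granularity and any minimum journal size; toy_old_default_journal_bytes() and the two macros are hypothetical names. The new default is not stated in the commit message, so it is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_OLD_JOURNAL_DIVISOR   256ULL		/* 1/256th of the device */
#define TOY_OLD_JOURNAL_CAP_BYTES (512ULL << 20)	/* assumed: 512 MiB */

/* Old default journal size in bytes for a device of @dev_bytes. */
uint64_t toy_old_default_journal_bytes(uint64_t dev_bytes)
{
	uint64_t size = dev_bytes / TOY_OLD_JOURNAL_DIVISOR;

	return size < TOY_OLD_JOURNAL_CAP_BYTES ? size : TOY_OLD_JOURNAL_CAP_BYTES;
}

/* The cap is reached at 128 GiB; anything larger got the same 512 MiB. */
void toy_journal_size_demo(void)
{
	assert(toy_old_default_journal_bytes(128ULL << 30) == (512ULL << 20));
	assert(toy_old_default_journal_bytes(1ULL << 40) == (512ULL << 20));
}
```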

bcachefs: Use pcpu mode of six locks for interior nodes · a9d79c6e

Kent Overstreet authored Mar 23, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Split btree_iter_traverse and bch2_btree_iter_traverse() · 08070cba

Kent Overstreet authored Mar 23, 2021

External (to the btree iterator code) users of bch2_btree_iter_traverse
expect that on success the iterator will be pointed at iter->pos and
have that position locked - but since we split iter->pos and
iter->real_pos, that means it has to update iter->real_pos if necessary.

Internal users don't expect it to modify iter->real_pos, so we need two
separate functions.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve inode deletion code · d3e6b9a1

Kent Overstreet authored Mar 21, 2021

It had some silly redundancies.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add an .invalid method for bch2_btree_ptr_v2 · fad7cfed

Kent Overstreet authored Mar 22, 2021

It was using the method for btree_ptr_v1, but that wasn't checking all
the fields.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Include snapshot field in bch2_bpos_to_text · 1fe9b1d3

Kent Overstreet authored Mar 22, 2021

More prep work for snapshots.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Update iter->real_pos lazily · bcad5622

Kent Overstreet authored Mar 21, 2021

peek() has to update iter->real_pos - there's no need for
bch2_btree_iter_set_pos() to update it as well.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Consolidate bch2_btree_iter_peek() and peek_with_updates() · 818664f5

Kent Overstreet authored Mar 21, 2021

Ideally we'll be getting rid of peek_with_updates(), but the callers
will need to be checked.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve iter->real_pos handling · ca58cbd4

Kent Overstreet authored Mar 21, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Internal btree iterator renaming · 3b0baf6f

Kent Overstreet authored Mar 21, 2021

This just gives some internal helpers some better names.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill btree_iter_peek_uptodate() · 07fc72e1

Kent Overstreet authored Mar 21, 2021

Since we're no longer doing next() immediately followed by peek(), this
optimization isn't doing anything anymore.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Iterators are now always consistent with iter->real_pos · 5cde51cd

Kent Overstreet authored Mar 21, 2021

This means bch2_btree_iter_traverse_one() can be made more efficient.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Have btree_iter_next_node() use btree_iter_set_search_pos() · 345ca825

Kent Overstreet authored Mar 21, 2021

Btree node iterators need to obey the regular btree node invariants
w.r.t. iter->real_pos; once they do, bch2_btree_iter_traverse will have
less to check.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Replace bch2_btree_iter_next() calls with bch2_btree_iter_advance · e0ba3b64

Kent Overstreet authored Mar 21, 2021

The way btree iterators work internally has been changing, particularly
with the iter->real_pos changes, and bch2_btree_iter_next() is no longer
hyper optimized - it's just advance followed by peek, so it's more
efficient to just call advance where we're not using the return value of
bch2_btree_iter_next().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
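
The point about next() being advance followed by peek can be seen in a toy iterator over a sorted array. This is only a sketch: struct toy_iter and the toy_iter_* functions are hypothetical, and the peeks counter exists purely to show the work a caller avoids by calling advance directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_iter {
	const int *keys;
	size_t nr;
	size_t pos;
	unsigned peeks;	/* how many lookups have been done */
};

/* Return the key at the current position, or NULL at the end. */
const int *toy_iter_peek(struct toy_iter *it)
{
	assert(it);
	it->peeks++;
	return it->pos < it->nr ? &it->keys[it->pos] : NULL;
}

/* Move to the next position without looking anything up. */
bool toy_iter_advance(struct toy_iter *it)
{
	assert(it);
	if (it->pos >= it->nr)
		return false;
	it->pos++;
	return true;
}

/* next() is just advance followed by peek. */
const int *toy_iter_next(struct toy_iter *it)
{
	if (!toy_iter_advance(it))
		return NULL;
	return toy_iter_peek(it);
}
```

A loop that peeks at the top of each iteration and ignores the value returned by toy_iter_next() pays for two lookups per key; switching the loop's step to toy_iter_advance() removes the redundant one.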

bcachefs: Get disk reservation when overwriting data in old snapshot · cb16bfaa

Kent Overstreet authored Mar 21, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Switch extent_handle_overwrites() to one key at a time · 4cfb722c

Kent Overstreet authored Mar 20, 2021

Prep work for snapshots
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Optimize bch2_btree_iter_verify_level() · 4ce41957

Kent Overstreet authored Mar 20, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix iterator picking · 5c1ec980

Kent Overstreet authored Mar 20, 2021

The comparison was wrong.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't unconditionally version_upgrade in initialize · 73590619

Kent Overstreet authored Mar 21, 2021

This is mkfs's job. Also, clean up the handling of feature bits some.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Validate bset version field against sb version fields · 84cc758d

Kent Overstreet authored Mar 21, 2021

The superblock version fields need to be accurate to know whether a
filesystem is supported, thus we should be verifying them.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't overwrite snapshot field in bch2_cut_back() · d361a26d

Kent Overstreet authored Mar 19, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Kill bkey ops->debugcheck method · 7e6dbac9

Kent Overstreet authored Mar 19, 2021

This code used to be used for running some assertions on alloc info at
runtime, but it long predates fsck and hasn't been good for much in
ages - we can delete it now.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Assert that iterators aren't being double freed · e9895f0a

Kent Overstreet authored Mar 19, 2021

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Require all btree iterators to be freed · 50dc0f69

Kent Overstreet authored Mar 19, 2021

We keep running into occasional bugs with btree transaction iterators
overflowing - this will make those bugs more visible.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
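
One way to make leaked iterators visible is to track live slots in a bitmap and check it when the transaction ends. The sketch below assumes a fixed pool of 64 slots; struct toy_trans, TOY_MAX_ITERS and the toy_trans_* functions are hypothetical and are not the bcachefs data structures.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_ITERS 64	/* hypothetical per-transaction limit */

struct toy_trans {
	uint64_t iters_live;	/* bit i set: iterator slot i is allocated */
};

/* Allocate an iterator slot; returns its index, or -1 when all slots are in use. */
int toy_trans_get_iter(struct toy_trans *trans)
{
	for (int i = 0; i < TOY_MAX_ITERS; i++)
		if (!(trans->iters_live & (UINT64_C(1) << i))) {
			trans->iters_live |= UINT64_C(1) << i;
			return i;
		}
	return -1;	/* overflow: every slot is still held */
}

/* Free a slot; freeing a slot that is not live would be a double free. */
void toy_trans_put_iter(struct toy_trans *trans, int idx)
{
	assert(idx >= 0 && idx < TOY_MAX_ITERS);
	assert(trans->iters_live & (UINT64_C(1) << idx));
	trans->iters_live &= ~(UINT64_C(1) << idx);
}

/* Count iterators still live; a strict exit path would assert this is 0. */
unsigned toy_trans_leaked_iters(const struct toy_trans *trans)
{
	unsigned nr = 0;

	for (uint64_t v = trans->iters_live; v; v &= v - 1)
		nr++;
	return nr;
}
```

Checking the leak count at transaction exit reports a forgotten free at the call site that caused it, rather than later as an overflow in unrelated code.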

bcachefs: btree_iter_set_dontneed() · 8d956c2f

Kent Overstreet authored Mar 19, 2021

This is a bit clearer than using bch2_btree_iter_free().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
