An error occurred fetching the project authors.
- 22 Oct, 2023 40 commits
-
-
Kent Overstreet authored
Implement a hash table, using cuckoo hashing, for empty buckets that are waiting on a journal commit before they can be reused. This replaces the journal_seq field of bucket_mark, and is part of eventually getting rid of the in memory bucket array. We may need to make bch2_bucket_needs_journal_commit() lockless, pending profiling and testing. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
We're trying to track down a bug that shows itself as newly-created extents having stale dirty pointers - possibly due to the in memory gen and the btree gen being inconsistent. This patch changes the error message to also print out the in memory bucket gen when this happens. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This adds some missing diagnostics from rare but annoying to debug runtime allocation failure paths. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
If the btree updates pointing to a bucket were never flushed by the journal before the bucket became empty again, we can reuse the bucket without a journal flush. This tweaks the tracking of journal sequence numbers in alloc keys to implement this optimization: now, we only update the journal sequence number in alloc keys on transitions to and from empty. When a bucket becomes empty, we check if we can tell the journal not to flush entries starting from when the bucket was used. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
bch2_ec_mem_alloc() was only used by GC, and there's no real need to preallocate the stripes radix tree since we can cope fine with memory allocation failure when we use the radix tree. This deletes a fair bit of code, and it's also needed for the upcoming patch because bch2_btree_iter_peek_prev() won't be working before journal replay completes (and using it was incorrect previously, as well). Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
The allocator needs to wait until the last update touching a bucket has been commited before writing to it again. However, the code was checking against the last dirty journal sequence number, not the last flushed journal sequence number. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
The main in-memory bucket array is going away, but we'll still need to keep bucket generations in memory, at least for now - ptr_stale() needs to be an efficient operation. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Since the main in memory bucket array is going away, we don't want to be calling bucket() or __bucket() when what we want is the GC in-memory bucket. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
bch2_journal_key_insert() used to assume that the key passed to it was allocated with kmalloc(), and on success took ownership. This patch deletes that behaviour, making it more similar to bch2_trans_update()/bch2_trans_commit(). Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Now that bch2_bucket_alloc_new_fs() isn't looking at bucket marks to decide what buckets are eligible to allocate, we can clean up the filesystem initialization and device add paths. Previously, we had to use ancient code to mark superblock/journal buckets in the in memory bucket marks as we allocated them, and then zero that out and re-do that marking using the newer transational bucket mark paths. Now, we can simply delete the in-memory bucket marking. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This adds flags for options that must be a power of two (block size and btree node size), and options that are stored in the superblock as a power of two (encoded extent max). Also: options are now stored in memory in the same units they're displayed in (bytes): we now convert when getting and setting from the superblock. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This moves some common code into alloc_mem_to_key(), which translates from the in-memory format for a bucket to the btree key format. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This adds a new helper that much like the one we have for inode updates, that allocates the packed alloc key, packs it and calls bch2_trans_update. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
We have two radix trees of stripes - one that mirrors some information from the stripes btree in normal operation, and another that GC uses to recalculate block usage counts. The normal one is now only used for finding partially empty stripes in order to reuse them - the normal stripes radix tree and the GC stripes radix tree are used significantly differently, so this patch splits them into separate types. In an upcoming patch we'll be replacing c->stripes with a btree that indexes stripes by the order we want to reuse them. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
With snapshots, bch2_trans_update() has to check if we need a whitout, which can cause a transaction restart, so this is important now. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
When we added the stripe and stripe_redundancy fields to alloc keys, we neglected to add them to the functions that convert back and forth with the in-memory types. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This simplifies the code quite a bit and eliminates an inconsistency - a given bkey doesn't necessarily translate to a single replicas entry for disk space accounting. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This changes the bch2_mark_key() and related paths to take mark lock where it is needed, instead of taking it in the upper transaction commit path - by pushing down locking we'll be able to handle fsck errors locally instead of requiring a separate check in the btree_gc code for replicas being marked. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This changes bch2_trans_fs_usage_apply() to handle failure (replicas entry missing) by reverting the changes it made - meaning we can make the main transaction commit path a bit slimmer, and perhaps also simplify some locking in upcoming patches. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Currently, btree triggers are run in natural key order, which presents a problem for fallocate in INSERT_RANGE mode: since we're moving existing extents to higher offsets, the trigger for deleting the old extent runs before the trigger that adds the new extent, potentially leading to indirect extents being deleted that shouldn't be when the delete causes the refcount to hit 0. This changes the order we run triggers so that for a givin btree, we run all insert triggers before overwrite triggers, nicely sidestepping this issue. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
The filesystem initialization path first marks superblock and journal buckets non transactionally, since the btree isn't functional yet. That path was updating the per-journal-buf percpu counters via bch2_dev_usage_update(), and updating the wrong set of counters so those updates didn't get written out until journal entry 4. The relevant code is going to get significantly rewritten in the future as we transition away from the in memory bucket array, so this just hacks around it for now. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Add fields to inode & alloc keys that record the journal sequence number when they were most recently modified. For alloc keys, this is needed to know what journal sequence number we have to flush before the bucket can be reused. Currently this is tracked in memory, but we'll be getting rid of the in memory bucket array. For inodes, this is needed for fsync when the inode has been evicted from the vfs cache. Currently we use a bloom filter per outstanding journal buf - but that mechanism has been broken since we added the ability to not issue a flush/fua for every journal write. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This allows triggers to distinguish between a key entering the btree - i.e. being called from the trans commit path - vs. being called on a key that already exists, i.e. by GC. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This helps to unify the interface between bch2_mark_key() and bch2_trans_mark_key() - and it also gives access to the journal reservation and journal seq in the mark_key path. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
- The backpointer that ec_stripe_update_ptrs() uses now needs to include the snapshot ID, which means we have to change where we add the backpointer to after getting the snapshot ID for the new extents - ec_stripe_update_ptrs() needs to be calling bch2_trans_begin() - improve error message in bch2_mark_stripe() Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
When the old or new key doesn't exist, we should still pass in a deleted key with the correct pos. This fixes a bug in the ec code, when bch2_mark_stripe() was looking up the wrong in-memory stripe. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
We should always print out the key we were marking. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
The way __bch2_mark_reflink_p returns errors was clashing with returning the number of sectors processed - we weren't returning FSCK_ERR_EXIT correctly. Fix this by only using the return code for errors, which actually ends up simplifying the overall logic. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
When a reflink pointer points to only part of an indirect extent, and then that indirect extent is fragmented (e.g. by copygc), if the reflink pointer only points to one of the fragments we leak a reference. Fix this by storing front/back pad values in reflink pointers - when inserting reflink pointesr, we initialize them to cover the full range of the indirect extents we reference. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
When a reflink pointer points to an indirect extent that doesn't exist, we need to replace it with a KEY_TYPE_error key. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This patch adds subvolume.c - support for the subvolumes and snapshots btrees and related data types and on disk data structures. The next patches will start hooking up this new code to existing code. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This splits btree_iter into two components: btree_iter is now the externally visible componont, and it points to a btree_path which is now reference counted. This means we no longer have to clone iterators up front if they might be mutated - btree_path can be shared by multiple iterators, and cloned if an iterator would mutate a shared btree_path. This will help us use iterators more efficiently, as well as slimming down the main long lived state in btree_trans, and significantly cleans up the logic for iterator lifetimes. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This moves some data dependencies forward, to improve pipelining. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Brett Holman authored
This fix replaces multiple 64 bit divisions with do_div() equivalents. Signed-off-by:
Brett Holman <bholman.devel@gmail.com> Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
DIV_ROUND_UP() wasn't doing what we wanted when passing it negative numbers - fix it by just not passing it negative numbers anymore. Also, no need to do the scaling by compression ratio for incompressible data. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
- We no longer mark subsets of extents, they're marked like regular keys now - which means we can drop the offset & sectors arguments to trigger functions - Drop other arguments that are no longer needed anymore in various places - fs_usage - Drop the logic for handling extents in bch2_mark_update() that isn't needed anymore, to match bch2_trans_mark_update() - Better logic for hanlding the BTREE_ITER_CACHED_NOFILL case, where we don't have an old key to mark Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Adding iter->should_be_locked introduced a regression where it ended up not being set on the iterator passed to bch2_btree_update_start(), which is definitely not what we want. This patch requires it to be set when calling bch2_trans_update(), and adds various fixups to make that happen. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
Now that bch2_btree_iter_peek_with_updates() has been removed in favor of BTREE_ITER_WITH_UPDATES, we need to make sure it's not used where we don't want it. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-
Kent Overstreet authored
This rarely used error path should've been checking for underflow - oops. Signed-off-by:
Kent Overstreet <kent.overstreet@gmail.com>
-