197 Commits

Author SHA1 Message Date
Kent Overstreet
1b30ed5fd8 bcachefs: Use btree write buffer for LRU btree
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:53 -04:00
Daniel Hill
8ffa11a2c5 bcachefs: let __bch2_btree_insert() pass in flags
This patch is prep work for the following patch.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:52 -04:00
Kent Overstreet
629a21b621 bcachefs: Improve invalidate_one_bucket() error messages
Make sure to check for lru entries that point to buckets that don't
exist as well as buckets in the wrong state, and improve the error
message we print out.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:52 -04:00
Kent Overstreet
dbe17f1883 bcachefs: BKEY_INVALID_FROM_JOURNAL
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:52 -04:00
Kent Overstreet
facafdcbc1 bcachefs: Change bkey_invalid() rw param to flags
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:52 -04:00
Kent Overstreet
83f33d6865 bcachefs: Rework lru btree
This patch changes how the LRU index works:

Instead of using KEY_TYPE_lru where the bucket the lru entry points to
is part of the value, this switches to KEY_TYPE_set and encoding the
bucket we refer to in the low bits of the key.

This means that we no longer have to check for collisions when inserting
LRU entries. We'll be making using of this in the next patch, which adds
a btree write buffer - a pure write buffer for btree updates, where
updates are appended to a simple array and then periodically sorted and
batch inserted.

This is a new on disk format version, and a forced upgrade.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:52 -04:00
Kent Overstreet
5250b74d55 bcachefs: bucket_gens btree
To improve mount times, add a btree for just bucket gens, 256 of them
per key: this means we'll have to scan drastically less metadata at
startup.

This adds
 - trigger for keeping it in sync with the all btree
 - initialization code, for filesystems from previous versions
 - new path for reading bucket gens
 - new fsck code

And a new on disk format version.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:51 -04:00
Kent Overstreet
d23124c757 bcachefs: Improve bch2_check_alloc_info()
This factors out a new helper from bch2_dev_freespace_init(),
bch2_get_key_or_hole(), and uses it in bch2_check_alloc_info(): we're
now able to process holes in the alloc btree as ranges, instead of one
bucket at a time.

This will improve fsck performance on new filesystems, or filesystems
where not every bucket has been used yet.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:51 -04:00
Kent Overstreet
cc65f56599 bcachefs: Improve bch2_dev_freespace_init()
This makes bch2_dev_freespace_init() much faster: instead of processing
every bucket on the device one at a time, we handle ranges of missing
keys all at once: the freespace btree is an extents style btree, so we
only have to insert one freespace key for every range of missing keys
in the alloc btree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:51 -04:00
Kent Overstreet
a8c752bb1d bcachefs: New on disk format: Backpointers
This patch adds backpointers: we now have a reverse index from device
and offset on that device (specifically, offset within a bucket) back to
btree nodes and (non cached) data extents.

The first 40 backpointers within a bucket are stored in the alloc key;
after that backpointers spill over to the next backpointers btree. This
is to help avoid performance regressions from additional btree updates
on large streaming workloads.

This patch adds all the code for creating, checking and repairing
backpointers. The next patch in the series is going to use backpointers
for copygc - finally getting rid of the need to scan all extents to do
copygc.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:50 -04:00
Kent Overstreet
f2b542ba42 bcachefs: Go RW before check_alloc_info()
It's possible to do btree updates before going RW by adding them to the
list of updates for journal replay to do, but this is limited by what
fits in RAM. This patch switches the second alloc info phase to run
after going RW - btree_gc has already ensured the alloc btree itself is
correct - and tweaks the allocation path to deal with the potential
small inconsistencies.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:50 -04:00
Kent Overstreet
d94189ad56 bcachefs: Debug mode for c->writes references
This adds a debug mode where we split up the c->writes refcount into
distinct refcounts for every codepath that takes a reference, and adds
sysfs code to print the value of each ref.

This will make it easier to debug shutdown hangs due to refcount leaks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:50 -04:00
Kent Overstreet
f52dd1ae20 bcachefs: Fix bch_alloc_to_text()
We weren't guarding against the alloc key having an invalid data type.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:49 -04:00
Kent Overstreet
19a614d2e4 bcachefs: Better inlining for bch2_alloc_to_v4_mut
This separates out the slowpath into a separate function, and inlines
bch2_alloc_v4_mut into bch2_trans_start_alloc_update(), the main place
it's called.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:49 -04:00
Kent Overstreet
313816363a bcachefs: bch2_trans_relock_notrace()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:49 -04:00
Kent Overstreet
994ba47543 bcachefs: New btree helpers
This introduces some new conveniences, to help cut down on boilerplate:

 - bch2_trans_kmalloc_nomemzero() - performance optimiation
 - bch2_bkey_make_mut()
 - bch2_bkey_get_mut()
 - bch2_bkey_get_mut_typed()
 - bch2_bkey_alloc()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:48 -04:00
Kent Overstreet
78c0b75c34 bcachefs: More errcode cleanup
We shouldn't be overloading standard error codes now that we have
provisions for bcachefs-specific errorcodes: this patch converts super.c
and super-io.c to per error site errcodes, with a bit of cleanup.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:48 -04:00
Kent Overstreet
e88a75ebe8 bcachefs: New bpos_cmp(), bkey_cmp() replacements
This patch introduces
 - bpos_eq()
 - bpos_lt()
 - bpos_le()
 - bpos_gt()
 - bpos_ge()

and equivalent replacements for bkey_cmp().

Looking at the generated assembly these could probably be improved
further, but we already see a significant code size improvement with
this patch.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:47 -04:00
Kent Overstreet
674cfc2624 bcachefs: Add persistent counters for all tracepoints
Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:39 -04:00
Kent Overstreet
549d173c1b bcachefs: EINTR -> BCH_ERR_transaction_restart
Now that we have error codes, with subtypes, we can switch to our own
error code for transaction restarts - and even better, a distinct error
code for each transaction restart reason: clearer code and better
debugging.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:37 -04:00
Kent Overstreet
d4bf5eecd7 bcachefs: Use bch2_err_str() in error messages
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:36 -04:00
Kent Overstreet
615f867c14 bcachefs: Improved errcodes
Instead of overloading standard error codes (EINTR/EAGAIN), and defining
short lists of error codes in multiple places that potentially end up
overlapping & conflicting, we're now going to have one master list of
error codes.

Error codes are defined with an x-macro: thus we also have
bch2_err_str() now.

Also, error codes have a class field. Now, instead of checking for
errors with ==, code should use bch2_err_matches(), which returns true
if the error is equal to or a sub-error of the error class.

This means we can define unique errors for every source location where
an error is generated, which will help improve our error messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:36 -04:00
Kent Overstreet
445d184af2 bcachefs: Convert alloc code to for_each_btree_key_commit()
The new for_each_btree_key2() macro handles transaction retries,
allowing us to avoid nested transactions - which we want to avoid since
they're tricky to do completely correctly and upcoming assertions are
going to be checking for that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:36 -04:00
Kent Overstreet
d04801a0f4 bcachefs: Convert bch2_do_invalidates_work() to for_each_btree_key2()
The new for_each_btree_key2() macro handles transaction retries,
allowing us to avoid nested transactions - which we want to avoid since
they're tricky to do completely correctly and upcoming assertions are
going to be checking for that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:36 -04:00
Kent Overstreet
ca91f40ff7 bcachefs: Convert bch2_dev_freespace_init() to for_each_btree_key_commit()
The new for_each_btree_key2() macro handles transaction retries,
allowing us to avoid nested transactions - which we want to avoid since
they're tricky to do completely correctly and upcoming assertions are
going to be checking for that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:36 -04:00
Kent Overstreet
4910a9506c bcachefs: Convert bch2_do_discards_work() to for_each_btree_key2()
The new for_each_btree_key2() macro handles transaction retries,
allowing us to avoid nested transactions - which we want to avoid since
they're tricky to do completely correctly and upcoming assertions are
going to be checking for that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:36 -04:00
Kent Overstreet
a1783320d4 bcachefs: for_each_btree_key2()
This introduces two new macros for iterating through the btree, with
transaction restart handling
 - for_each_btree_key2()
 - for_each_btree_key_commit()

Every iteration is now in an implicit transaction, and - as with
lockrestart_do() and commit_do() - returning -EINTR will cause the
transaction to be restarted, at the same key.

This patch converts a bunch of code that was open coding this to these
new macros, saving a substantial amount of code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:35 -04:00
Kent Overstreet
e68914ca84 bcachefs: Rename __bch2_trans_do() -> commit_do()
Better/more descriptive naming, and prep for adding
nested_lockrestart_do() and nested_commit_do().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:35 -04:00
Kent Overstreet
80b3bf33d3 bcachefs: Silence some fsck errors when reconstructing alloc info
There's no need to print fsck errors for errors that are expected, and
the user has already opted to repair.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:35 -04:00
Kent Overstreet
47ab0c5f6a bcachefs: Fix bch2_check_alloc_key()
bch2_check_alloc_key() was failing to check buckets that didn't have
alloc keys yet (because they'd never been used) - they still need to be
added to the freespace btree.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
e34da43e33 bcachefs: Improve bch2_check_alloc_info
- In check_alloc_key(), previously we were re-initializing iterators
   for the need_discard and freespace btrees for every alloc key we
   checked. But this was causing us to redo lookups into the journal
   keys every time, since those lookups are cached in struct btree_iter.
   This initializes the iterators in bch2_check_alloc_info and passes
   them into check_alloc_key().

 - Make the looping more consistent/efficient in bch2_check_alloc_info()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
22add2ec67 bcachefs: Use BTREE_INSERT_LAZY_RW in bch2_check_alloc_info()
This runs before we go rw for journal replay, but after we're allowed to
go rw. It might be time to consider killing BTREE_INSERT_LAZY_RW,
though.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
3858536744 bcachefs: Bucket invalidate path improvements
- invalidate_one_bucket() now returns 1 when we don't have any buckets
   on this device to invalidate, ensuring we don't spin
 - the tracepoint invocation is moved to after the transaction commit,
   and we now include the number of cached sectors in the tracepoint

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
1c6ff39445 bcachefs: Fix refcount leak in bch2_do_invalidates()
If we fail to queue the work item because it's already in process, we
need to drop the ref we just took.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
a3d7afa5c1 bcachefs: Always use percpu_ref_tryget_live() on c->writes
If we're trying to get a ref and the refcount has been killed, it means
we're doing an emergency shutdown - we always want tryget_live().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
6f44a9940c bcachefs: Add a persistent counter for bucket discards
Like the previous patch for bucket invalidates, add another counter for
a core allocator path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet
440c15cc91 bcachefs: Add a persistent counter for bucket invalidation
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet
df8c2ccb93 bcachefs: Fix freespace initialization
bch2_dev_freespace_init() was using __bch2_trans_do() incorrectly, and
calling bch2_bucket_do_index() with a stale alloc key.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:33 -04:00
Kent Overstreet
401ec4db63 bcachefs: Printbuf rework
This converts bcachefs to the modern printbuf interface/implementation,
synced with the version to be submitted upstream.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:33 -04:00
Kent Overstreet
1cab5a82cc bcachefs: Go RW before bch2_check_lrus()
btree updates before going RW are expensive if they're in random order,
since they use the list of keys for journal replay to insert, which is
just a gap buffer.

This patch improves the bucket invalidate path so that if
bch2_check_lrus() hasn't finished it only prints warnings instead of
doing an emergency shutdown, which means we can now set BCH_FS_MAY_GO_RW
before bch2_check_lrus().

Also, the filesystem state bits are reorganized a bit.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:32 -04:00
Kent Overstreet
1f93726e63 bcachefs: Tracepoint improvements
Delete some obsolete tracepoints, organize alloc tracepoints better,
make a few tracepoints more consistent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:32 -04:00
Kent Overstreet
e1b8f5f5ca bcachefs: Plumb btree_id & level to trans_mark
For backpointers, we'll need the full key location - that means btree_id
and btree level. This patch plumbs it through.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:32 -04:00
Kent Overstreet
0b09032653 bcachefs: Improve bch2_lru_delete() error messages
When we detect a filesystem inconsistency, we should include the
relevent keys in the error message. This patch adds a parameter to pass
the key with the lru entry to bch2_lru_delete(), so that it can be
printed.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:31 -04:00
Kent Overstreet
9b93596c33 bcachefs: Improve error message when alloc key doesn't match lru entry
Error messages should always print out the full key when available -
this gives us a starting point when looking through the journal to debug
what went wrong.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:31 -04:00
Kent Overstreet
7003589dab bcachefs: Ensure buckets have io_time[READ] set
It's an error if a bucket is in state BCH_DATA_cached but not on the LRU
btree - i.e io_time[READ] == 0 - so, make sure it's set before adding
it.

Also, make some of the LRU code a bit clearer and more direct.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:31 -04:00
Kent Overstreet
84befe8ef9 bcachefs: Use bch2_trans_inconsistent_on() in more places
This gets us better error messages.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:31 -04:00
Kent Overstreet
a9c0a4cbf1 bcachefs: Minor device removal fixes
- We weren't clearing the LRU btree
 - bch2_alloc_read() runs before bch2_check_alloc_key() deletes alloc
   keys for devices/buckets that don't exists, so it needs to check for
   that
 - bch2_check_lrus() needs to check that buckets exists
 - improve some error messages

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:31 -04:00
Kent Overstreet
aae29082c6 bcachefs: bch2_btree_delete_extent_at()
New helper, for deleting extents.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:31 -04:00
Kent Overstreet
822835ffea bcachefs: Fold bucket_state in to BCH_DATA_TYPES()
Previously, we were missing accounting for buckets in need_gc_gens and
need_discard states. This matters because buckets in those states need
other btree operations done before they can be used, so they can't be
conuted when checking current number of free buckets against the
allocation watermark.

Also, we weren't directly counting free buckets at all. Now, data type 0
== BCH_DATA_free, and free buckets are counted; this means we can get
rid of the separate (poorly defined) count of unavailable buckets.

This is a new on disk format version, with upgrade and fsck required for
the accounting changes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:30 -04:00
Kent Overstreet
62491956f4 bcachefs: Move alloc assertion to .key_invalid()
.key_invalid is a better place for this assertion.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:30 -04:00