74 Commits

Author SHA1 Message Date
Kent Overstreet
953ee28a3e bcachefs: Kill bch2_btree_iter_peek_cached()
It's now been rolled into bch2_btree_iter_peek_slot()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:06 -04:00
Kent Overstreet
5288e66a7b bcachefs: BTREE_ITER_WITH_UPDATES
This drops bch2_btree_iter_peek_with_updates() and replaces it with a
new flag, BTREE_ITER_WITH_UPDATES, and also reworks
bch2_btree_iter_peek_slot() to respect it too.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:05 -04:00
Kent Overstreet
509d3e0a8d bcachefs: Child btree iterators
This adds the ability for btree iterators to own child iterators - to be
used by an upcoming rework of bch2_btree_iter_peek_slot(), so we can
scan forwards while maintaining our current position.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
66a0a49750 bcachefs: btree_iter->should_be_locked
Add a field to struct btree_iter for tracking whether it should be
locked - this fixes spurious transaction restarts in
bch2_trans_relock().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
531a0095c9 bcachefs: Improve btree iterator tracepoints
This patch adds some new tracepoints to the btree iterator code, and
adds new fields to the existing tracepoints - primarily for the iterator
position.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
2527dd9158 bcachefs: Improve bch2_btree_iter_traverse_all()
By changing it to upgrade iterators to intent locks to avoid lock
restarts we can simplify __bch2_btree_node_lock() quite a bit - this
fixes a probable bug where it could potentially drop a lock on an
unrelated error but still succeed instead of causing a transaction
restart.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:00 -04:00
Kent Overstreet
08e337618f bcachefs: Drop some memset() calls
gcc is emitting rep stos here, which is silly (and slow) for an 8 byte
memset.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:59 -04:00
Kent Overstreet
2d587674ba bcachefs: Increase commality between BTREE_ITER_NODES and BTREE_ITER_KEYS
Eventually BTREE_ITER_NODES should be going away. This patch is to fix a
transaction iterator overflow in the btree node merge path because
BTREE_ITER_NODES iterators couldn't be reused.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:58 -04:00
Kent Overstreet
5c1d808ad8 bcachefs: Drop trans->nounlock
Since we're no longer doing btree node merging post commit, we can now
delete a bunch of code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:58 -04:00
Kent Overstreet
e751c01a8e bcachefs: Start using bpos.snapshot field
This patch starts treating the bpos.snapshot field like part of the key
in the btree code:

* bpos_successor() and bpos_predecessor() now include the snapshot field
* Keys in btrees that will be using snapshots (extents, inodes, dirents
  and xattrs) now always have their snapshot field set to U32_MAX

The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that
determines whether we're iterating over keys in all snapshots or not -
internally, this controlls whether bkey_(successor|predecessor)
increment/decrement the snapshot field, or only the higher bits of the
key.

We add a new member to struct btree_iter, iter->snapshot: when
BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always
equal iter->snapshot, which will be 0 for btrees that don't use
snapshots, and alsways U32_MAX for btrees that will use snapshots
(until we enable snapshot creation).

This patch also introduces a new metadata version number, and compat
code for reading from/writing to older versions - this isn't a forced
upgrade (yet).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:57 -04:00
Kent Overstreet
f793fd85dc bcachefs: Fix for bch2_trans_commit() unlocking when it's not supposed to
When we pass BTREE_INSERT_NOUNLOCK bch2_trans_commit isn't supposed to
unlock after a successful commit, but it was calling
bch2_trans_cond_resched() - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:57 -04:00
Kent Overstreet
08070cba4a bcachefs: Split btree_iter_traverse and bch2_btree_iter_traverse()
External (to the btree iterator code) users of bch2_btree_iter_traverse
expect that on success the iterator will be pointed at iter->pos and
have that position locked - but since we split iter->pos and
iter->real_pos, that means it has to update iter->real_pos if necessary.

Internal users don't expect it to modify iter->real_pos, so we need two
separate functions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:57 -04:00
Kent Overstreet
bcad562259 bcachefs: Update iter->real_pos lazily
peek() has to update iter->real_pos - there's no need for
bch2_btree_iter_set_pos() to update it as well.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:57 -04:00
Kent Overstreet
e0ba3b6429 bcachefs: Replace bch2_btree_iter_next() calls with bch2_btree_iter_advance
The way btree iterators work internally has been changing, particularly
with the iter->real_pos changes, and bch2_btree_iter_next() is no longer
hyper optimized - it's just advance followed by peek, so it's more
efficient to just call advance where we're not using the return value of
bch2_btree_iter_next().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:56 -04:00
Kent Overstreet
8d956c2fb8 bcachefs: btree_iter_set_dontneed()
This is a bit clearer than using bch2_btree_iter_free().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:56 -04:00
Kent Overstreet
abcecb49f5 bcachefs: Fsck code refactoring
Change fsck code to always put btree iterators - also, make some flow
control improvements to deal with lock restarts better, and refactor
check_extents() to not walk extents twice for counting/checking
i_sectors.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:56 -04:00
Kent Overstreet
c7bb769c81 bcachefs: __bch2_trans_get_iter() refactoring, BTREE_ITER_NOT_EXTENTS
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:55 -04:00
Kent Overstreet
27ace9cc01 bcachefs: Simplify for_each_btree_key()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:55 -04:00
Kent Overstreet
18fc6ae503 bcachefs: btree_iter_prev_slot()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:55 -04:00
Kent Overstreet
1f7fdc0abd bcachefs: btree_iter_live()
New helper to clean things up a bit - also, improve iter->flags
handling.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:55 -04:00
Kent Overstreet
792e2c4c85 bcachefs: Kill bch2_btree_iter_set_pos_same_leaf()
The only reason we were keeping this around was for
BTREE_INSERT_NOUNLOCK semantics - if bch2_btree_iter_set_pos() advances
to the next leaf node, it'll drop the lock on the node that we just
inserted to.

But we don't rely on BTREE_INSERT_NOUNLOCK semantics for the extents
btree, just the inodes btree, and if we do need it for the extents btree
in the future we can do it more cleanly by cloning the iterator - this
lets us delete some special cases in the btree iterator code, which is
complicated enough as it is.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:53 -04:00
Kent Overstreet
cc578a36f9 bcachefs: Fix __btree_iter_next() when all iters are in use_next() when all iters are in use
Also, print out more information on btree transaction iterator overflow.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:49 -04:00
Kent Overstreet
3eb26d0157 bcachefs: bch2_trans_get_iter() no longer returns errors
Since we now always preallocate the maximum number of iterators when we
initialize a btree transaction, getting an iterator never fails - we can
delete a fair amount of error path code.

This patch also simplifies the iterator allocation code a bit.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:48 -04:00
Kent Overstreet
7e7ae6ca57 bcachefs: Fix spurious transaction restarts
The checks for lock ordering violations weren't quite right.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:46 -04:00
Kent Overstreet
645d72aa36 bcachefs: Fix btree updates when mixing cached and non cached iterators
There was a bug where bch2_trans_update() would incorrectly delete a
pending update where the new update did not actually overwrite the
existing update, because we were incorrectly using BTREE_ITER_TYPE when
sorting pending btree updates.

This affects the pending patch to use cached iterators for inode
updates.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet
2ca88e5ad9 bcachefs: Btree key cache
This introduces a new kind of btree iterator, cached iterators, which
point to keys cached in a hash table. The cache also acts as a write
cache - in the update path, we journal the update but defer updating the
btree until the cached entry is flushed by journal reclaim.

Cache coherency is for now up to the users to handle, which isn't ideal
but should be good enough for now.

These new iterators will be used for updating inodes and alloc info (the
alloc and stripes btrees).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:41 -04:00
Kent Overstreet
72545b5e76 bcachefs: bch2_trans_downgrade()
bch2_btree_iter_downgrade() was looping over all iterators in a
transaction; bch2_trans_downgrade() should be doing that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:40 -04:00
Kent Overstreet
f96c0df4db bcachefs: Fix a deadlock in bch2_btree_node_get_sibling()
There was a bad interaction with bch2_btree_iter_set_pos_same_leaf(),
which can leave a btree node locked that is just outside iter->pos,
breaking the lock ordering checks in __bch2_btree_node_lock(). Ideally
we should get rid of this corner case, but for now fix it locally with
verbose comments.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:40 -04:00
Kent Overstreet
495aabede3 bcachefs: Add debug code to print btree transactions
Intented to help debug deadlocks, since we can't use lockdep to check
btree node lock ordering.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:40 -04:00
Kent Overstreet
0329b1507d bcachefs: Trace where btree iterators are allocated
This will help with iterator overflow bugs.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:38 -04:00
Kent Overstreet
39fb2983c5 bcachefs: Kill bkey_type_successor
Previously, BTREE_ID_INODES was special - inodes were indexed by the
inode field, which meant the offset field of struct bpos wasn't used,
which led to special cases in e.g. the btree iterator code.

Now, inodes in the inodes btree are indexed by the offset field.

Also: prevously min_key was special for extents btrees, min_key for
extents would equal max_key for the previous node. Now, min_key =
bkey_successor() of the previous node, same as non extent btrees.

This means we can completely get rid of
btree_type_sucessor/predecessor.

Also make some improvements to the metadata IO validate/compat code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:37 -04:00
Kent Overstreet
57b0b3db47 bcachefs: btree_iter_peek_with_updates()
Introduce a new iterator method that provides a consistent view of the
btree plus uncommitted updates.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:36 -04:00
Kent Overstreet
2dac0eae78 bcachefs: Iterator debug code improvements
More aggressively checking iterator invariants, and fixing the resulting
bugs. Also greatly simplifying iter_next() and iter_next_slot() - they
were hyper optimized before, but the optimizations were getting too
brittle.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:36 -04:00
Kent Overstreet
163e885a0a bcachefs: Kill TRANS_RESET_MEM|TRANS_RESET_ITERS
All iterators should be released now with bch2_trans_iter_put(), so
TRANS_RESET_ITERS shouldn't be needed anymore, and TRANS_RESET_MEM is
always used.

Also convert more code to __bch2_trans_do().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:35 -04:00
Kent Overstreet
6a9ec82826 bcachefs: __bch2_btree_iter_set_pos()
This one takes an additional argument for whether we're searching for >=
or > the search key.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:35 -04:00
Kent Overstreet
f21539a56d bcachefs: Use bch2_trans_reset in bch2_trans_commit()
Clean up a bit of duplicated code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:33 -04:00
Kent Overstreet
a8abd3a7f6 bcachefs: bch2_trans_reset() calls should be at the tops of loops
It needs to be called when we get -EINTR due to e.g. lock restart - this
fixes a transaction iterators overflow bug.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:33 -04:00
Kent Overstreet
887c2a4ee5 bcachefs: bch2_btree_iter_fix_key_modified()
This is considerably cheaper than bch2_btree_node_iter_fix(), for cases
where the key was only modified and key ordering isn't changing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
Kent Overstreet
2a9101a989 bcachefs: Refactor bch2_trans_commit() path
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
Kent Overstreet
64bc001153 bcachefs: Rework btree iterator lifetimes
The btree_trans struct needs to memoize/cache btree iterators, so that
on transaction restart we don't have to completely redo btree lookups,
and so that we can do them all at once in the correct order when the
transaction had to restart to avoid a deadlock.

This switches the btree iterator lookups to work based on iterator
position, instead of trying to match them up based on the stack trace.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:28 -04:00
Kent Overstreet
ef9f95ba41 bcachefs: Improve error handling for for_each_btree_key_continue()
Change it to match for_each_btree_key()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:28 -04:00
Kent Overstreet
ccf5a10958 bcachefs: bch2_btree_iter_peek_prev()
Last of the basic operations for iterating forwards and backwards over
the btree: we now have
 - peek(),	returns key >= iter->pos
 - next(),	returns key >  iter->pos
 - peek_prev(),	returns key <= iter->pos
 - prev(),	returns key < iter->pos

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:27 -04:00
Kent Overstreet
9b02d1c49a bcachefs: Optimize calls to bch2_btree_iter_traverse()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:27 -04:00
Kent Overstreet
36e9d69854 bcachefs: Do updates in order they were queued up in
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:26 -04:00
Kent Overstreet
c8b18c37b2 bcachefs: fix for_each_btree_key()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:25 -04:00
Kent Overstreet
61011ea237 bcachefs: Rip out old hacky transaction restart tracing
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:22 -04:00
Kent Overstreet
20bceecb31 bcachefs: More work to avoid transaction restarts
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:22 -04:00
Kent Overstreet
7d82586660 bcachefs: Avoid spurious transaction restarts
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:22 -04:00
Kent Overstreet
0e6dd8fba0 bcachefs: Ensure bch2_btree_iter_next() always advances
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:22 -04:00
Kent Overstreet
58fbf80834 bcachefs: Delete duplicate code
Also rename for consistency

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:22 -04:00