356 Commits

Author SHA1 Message Date
Kent Overstreet
c96f108b05 bcachefs: Optimize bch2_trans_iter_init()
When flags & btree_id are constants, we can constant fold the entire
calculation of the actual iterator flags - and the whole thing becomes
small enough to inline.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:46 -04:00
Kent Overstreet
c9ee99ad8c bcachefs: Move some asserts behind CONFIG_BCACHEFS_DEBUG
Convert some non-critical asserts in long-stable code to debug asserts.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:46 -04:00
Kent Overstreet
0f35e0860a bcachefs: Fix return code from btree_path_traverse_one()
trans->restarted is a positive error code, not the usual negative

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:46 -04:00
Kent Overstreet
b2d1d56b1d bcachefs: Fixes for building in userspace
- Marking a non-static function as inline doesn't actually work and is
   now causing problems - drop that

 - Introduce BCACHEFS_LOG_PREFIX for when we want to prefix log messages
   with bcachefs (filesystem name)

 - Userspace doesn't have real percpu variables (maybe we can get this
   fixed someday), put an #ifdef around bch2_disk_reservation_add()
   fastpath

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:46 -04:00
Kent Overstreet
984dc67e3b bcachefs: Improve __bch2_btree_path_make_mut()
btree_path_copy() doesn't need to call
bch2_btree_path_check_sort_fast() - the newly allocated path will always
be in the correct position, post copy; also delete some redundant
branches from __bch2_btree_path_make_mut().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:45 -04:00
Kent Overstreet
0cc455b3ca bcachefs: Inlining improvements
- Don't call into bch2_encrypt_bio() when we're not encrypting
 - Pull slowpath out of trans_lock_write()
 - Make sure bc2h_trans_journal_res_get() gets inlined.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:45 -04:00
Kent Overstreet
a101957649 bcachefs: More style fixes
Fixes for various checkpatch errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:45 -04:00
Kent Overstreet
c167f9e541 bcachefs: Journal keys overlay fixes
- In the btree iterator code that overlays keys from the journal, we
   were incorrectly specifying level=0 instead of the btree_path's
   current level in a few places
 - When we didn't do journal replay, we shouldn't free the journal keys:
   this fixes cmd_list and cmd_dump, which run in norecovery mode

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:45 -04:00
Kent Overstreet
1f69368c5c bcachefs: Fix an out-of-bounds shift
roundup_pow_of_two() is undefined for 0 - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:45 -04:00
Kent Overstreet
c81f5836a4 bcachefs: Don't touch c->flags in bch2_trans_iter_init()
This moves the JOURNAL_REPLAY_DONE flag check from
bch2_trans_iter_init() to bch2_trans_init(), where we stash a copy in
btree_trans - gaining us a small performance improvement.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:44 -04:00
Kent Overstreet
3e3e02e6bc bcachefs: Assorted checkpatch fixes
checkpatch.pl gives lots of warnings that we don't want - suggested
ignore list:

 ASSIGN_IN_IF
 UNSPECIFIED_INT	- bcachefs coding style prefers single token type names
 NEW_TYPEDEFS		- typedefs are occasionally good
 FUNCTION_ARGUMENTS	- we prefer to look at functions in .c files
			  (hopefully with docbook documentation), not .h
			  file prototypes
 MULTISTATEMENT_MACRO_USE_DO_WHILE
			- we have _many_ x-macros and other macros where
			  we can't do this

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:44 -04:00
Kent Overstreet
307e3c1319 bcachefs: Optimize bch2_trans_init()
Now we store the transaction's fn idx in a local variable, instead of
redoing the lookup every time we call bch2_trans_init().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:44 -04:00
Kent Overstreet
29aa78f15e bcachefs: Split out __btree_path_up_until_good_node()
This breaks up btree_path_up_until_good_node() so that only the fastpath
gets inlined.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:44 -04:00
Kent Overstreet
d7e4e51370 bcachefs: Switch to local_clock() for fastpath time source
local_clock() isn't always completely accurate - e.g. on machines with
TSC drift - but ktime_get_ns() overhead is too high, unfortunately.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:44 -04:00
Kent Overstreet
dccedaaa52 bcachefs: Fix btree node prefetchig
We were forgetting to count down the number of nodes to prefetch, firing
off _way_ more than intended - whoops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:44 -04:00
Kent Overstreet
f42238b5cd bcachefs: Fix a rare path in bch2_btree_path_peek_slot()
In the drop_alloc tests, we may end up calling
bch2_btree_iter_peek_slot() on an interior level that doesn't exist.
Previously, this would hit the path->uptodate assertion in
bch2_btree_path_peek_slot(); this path first checks a NULL btree node,
which is how we know we're at the end of the btree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet
7dcbdbd85c bcachefs: bch2_path_put_nokeep()
The btree iterator code may allocate extra btree paths, temporarily,
that do not refer to keys being returned: we don't need to wait until
transaction restart to drop these, when they're not referenced they
should be deleted right away.

This fixes a transaction path overflow bug.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet
969576ecae bcachefs: bch2_btree_iter_peek() now works with interior nodes
Needed by the next patch, which will be iterating over keys in nodes at
level 1.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet
29cea6f483 bcachefs: Fix bch2_btree_path_up_until_good_node()
There was a rare bug when path->locks_want was nonzero, but not
BTREE_MAX_DEPTH, where we'd return on a valid node that wasn't locked -
oops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
c298fd7d34 bcachefs; Mark __bch2_trans_iter_init as inline
This function is fairly small and only used in two places: one very hot,
the other cold, so it should definitely be inlined.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
3f3bc66ef0 bcachefs: Optimize btree_path_alloc()
- move slowpath code to a separate function, btree_path_overflow()
 - no need to use hweight64
 - copy nr_max_paths from btree_transaction_stats to btree_trans,
   avoiding a data dependency in the fast path

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
14d8f26ad0 bcachefs: Inline bch2_trans_kmalloc() fast path
Small performance optimization.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
a8f3542843 bcachefs: bch2_print_string_as_lines()
This adds a helper for printing a large buffer one line at a time, to
avoid the 1k printk limit.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet
e9174370d0 bcachefs: bch2_btree_node_relock_notrace()
Most of the node_relock_fail trace events are generated from
bch2_btree_path_verify_level(), when debugcheck_iterators is enabled -
but we're not interested in these trace events, they don't indicate that
we're in a slowpath.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet
afbc719468 bcachefs: Improve bch2_btree_trans_to_text()
This is just a formatting/readability improvement.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet
0d7009d7ca bcachefs: Delete old deadlock avoidance code
This deletes our old lock ordering based deadlock avoidance code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:41 -04:00
Kent Overstreet
96d994b37c bcachefs: Print deadlock cycle in debugfs
In the event that we're not finished debugging the cycle detector, this
adds a new file to debugfs that shows what the cycle detector finds, if
anything. By comparing this with btree_transactions, which shows held
locks for every btree_transaction, we'll be able to determine if it's
the cycle detector that's buggy or something else.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet
33bd5d0686 bcachefs: Deadlock cycle detector
We've outgrown our own deadlock avoidance strategy.

The btree iterator API provides an interface where the user doesn't need
to concern themselves with lock ordering - different btree iterators can
be traversed in any order. Without special care, this will lead to
deadlocks.

Our previous strategy was to define a lock ordering internally, and
whenever we attempt to take a lock and trylock() fails, we'd check if
the current btree transaction is holding any locks that cause a lock
ordering violation. If so, we'd issue a transaction restart, and then
bch2_trans_begin() would re-traverse all previously used iterators, but
in the correct order.

That approach had some issues, though.
 - Sometimes we'd issue transaction restarts unnecessarily, when no
   deadlock would have actually occured. Lock ordering restarts have
   become our primary cause of transaction restarts, on some workloads
   totally 20% of actual transaction commits.

 - To avoid deadlock or livelock, we'd often have to take intent locks
   when we only wanted a read lock: with the lock ordering approach, it
   is actually illegal to hold _any_ read lock while blocking on an intent
   lock, and this has been causing us unnecessary lock contention.

 - It was getting fragile - the various lock ordering rules are not
   trivial, and we'd been seeing occasional livelock issues related to
   this machinery.

So, since bcachefs is already a relational database masquerading as a
filesystem, we're stealing the next traditional database technique and
switching to a cycle detector for avoiding deadlocks.

When we block taking a btree lock, after adding ourself to the waitlist
but before sleeping, we do a DFS of btree transactions waiting on other
btree transactions, starting with the current transaction and walking
our held locks, and transactions blocking on our held locks.

If we find a cycle, we emit a transaction restart. Occasionally (e.g.
the btree split path) we can not allow the lock() operation to fail, so
if necessary we'll tell another transaction that it has to fail.

Result: trans_restart_would_deadlock events are reduced by a factor of
10 to 100, and we'll be able to delete a whole bunch of grotty, fragile
code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:41 -04:00
Kent Overstreet
845cffed0d bcachefs: Add a debug assert
Chasing down a strange locking bug.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet
57ce827442 bcachefs: Make an assertion more informative
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:40 -04:00
Kent Overstreet
e4215d0fec bcachefs: All held locks must be in a btree path
With the new deadlock cycle detector, it's critical that all held locks
be marked in a btree_path, because that's what the cycle detector
traverses - any locks that aren't correctly marked will cause deadlocks.

This changes the btree_path to allocate some btree_paths for the new
nodes, since until the final update is done we otherwise don't have a
path referencing them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:40 -04:00
Kent Overstreet
38474c2642 bcachefs: Avoid using btree_node_lock_nopath()
With the upcoming cycle detector, we have to be careful about using
btree_node_lock_nopath - in particular, using it to take write locks can
cause deadlocks.

All held locks need to be tracked in a btree_path, so that the cycle
detector knows about them - unless we know that we cannot cause
deadlocks for other reasons: e.g. we are only taking read locks, or
we're in very early fsck (topology repair).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:40 -04:00
Kent Overstreet
4e6defd106 bcachefs: btree_bkey_cached_common->cached
Add a type descriptor to btree_bkey_cached_common - there's no reason
not to since we've got padding that was otherwise unused, and this is a
nice cleanup (and helpful in later patches).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:40 -04:00
Kent Overstreet
674cfc2624 bcachefs: Add persistent counters for all tracepoints
Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:39 -04:00
Kent Overstreet
c240c3a944 bcachefs: Print lock counts in debugs btree_transactions
Improve our debugfs output, to help in debugging deadlocks: this shows,
for every btree node we print, the current number of readers/intent
locks/write locks held.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:39 -04:00
Kent Overstreet
14599cce44 bcachefs: Switch btree locking code to struct btree_bkey_cached_common
This is just some type safety cleanup.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:39 -04:00
Kent Overstreet
616928c30f bcachefs: Track maximum transaction memory
This patch
 - tracks maximum bch2_trans_kmalloc() memory used in btree_transaction_stats
 - makes it available in debugfs
 - switches bch2_trans_init() to using that for the amount of memory to
   preallocate, instead of the parameter passed in

This drastically reduces transaction restarts, and means we no longer
need to track this in the source code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:39 -04:00
Kent Overstreet
2e27f6567b bcachefs: Kill nodes_intent_locked
Previously, we used two different bit arrays for tracking held btree
node locks. This patch switches to an array of two bit integers, which
will let us track, in a future patch, when we hold a write lock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:39 -04:00
Kent Overstreet
d4263e5638 bcachefs: Better use of locking helpers
Held btree locks are tracked in btree_path->nodes_locked and
btree_path->nodes_intent_locked. Upcoming patches are going to change
the representation in struct btree_path, so this patch switches to
proper helpers instead of direct access to these fields.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:39 -04:00
Kent Overstreet
cd5afabea1 bcachefs: btree_locking.c
Start to centralize some of the locking code in a new file; more locking
code will be moving here in the future.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:39 -04:00
Kent Overstreet
efd0d03816 bcachefs: Minor transaction restart handling fix
- fsck_inode_rm() wasn't returning BCH_ERR_transaction_restart_nested
 - change bch2_trans_verify_not_restarted() to call panic() - we don't
   want these errors to be missed

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:38 -04:00
Kent Overstreet
23dfb3a2f7 bcachefs: Fix bch2_btree_iter_peek_slot() error path
iter->k needs to be consistent with iter->pos - required for
bch2_btree_iter_(rewind|advance) to work correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:38 -04:00
Kent Overstreet
8192f8a586 bcachefs: Another should_be_locked fixup
When returning a key from the key cache, in BTREE_ITER_WITH_KEY_CACHE
mode, we don't want to set should_be_locked on iter->path; we're not
returning a key from that path, so we donn't need to, and also since we
traversed the key cache iterator before setting should_be_locked on that
path it might be unlocked (if we unlocked, bch2_trans_relock() won't
have relocked it).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:38 -04:00
Kent Overstreet
223b560e02 bcachefs: btree_path_down() optimization
We should be calling btree_node_mem_ptr_set() before path_level_init(),
since we already touched the key that btree_node_mem_ptr_set() will
modify and path_level_init() will be doing the lookup in the child btree
node we're recursing to.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:38 -04:00
Kent Overstreet
c497df8b85 bcachefs: Increment restart count in bch2_trans_begin()
Instead of counting transaction restarts, count when the transaction is
restarted: if bch2_trans_begin() was called when the transaction wasn't
restarted we need to ensure restart_count is still incremented.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:38 -04:00
Kent Overstreet
5c0bb66ae3 bcachefs: Track the maximum btree_paths ever allocated by each transaction
We need a way to check if the machinery for handling btree_paths with in
a transaction is behaving reasonably, as it often has not been - we've
had bugs with transaction path overflows caused by duplicate paths and
plenty of other things.

This patch tracks, per transaction fn, the most btree paths ever
allocated by that transaction and makes it available in debugfs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:38 -04:00
Kent Overstreet
4aba7d4569 bcachefs: Rename lock_held_stats -> btree_transaction_stats
Going to be adding more things to this in the next patch.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:38 -04:00
Kent Overstreet
a300261ad1 bcachefs: Fix duplicate paths left by bch2_path_put()
bch2_path_put() is supposed to drop paths that aren't needed on
transaction restart, or to hold locks that we're supposed to keep until
transaction commit: but it was failing to free paths in some cases that
it should have, leading to transaction path overflows with lots of
duplicate paths.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:38 -04:00
Kent Overstreet
6fae65c112 bcachefs: Kill BTREE_ITER_CACHED_(NOFILL|NOCREATE)
These were used more prior to getting rid of the in-memory bucket arrays
- they don't serve much purpose anymore, and deleting them lets us write
better assertions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:38 -04:00
Kent Overstreet
9f96568c0a bcachefs: Tracepoint improvements
Our types are exported to the tracepoint code, so it's not necessary to
break things out individually when passing them to tracepoints - we can
also call other functions from TP_fast_assign().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:38 -04:00