linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2025-01-04 04:06:26 +00:00

Author	SHA1	Message	Date
Kent Overstreet	aae15aafcd	bcachefs: New and improved topology repair code This splits out btree topology repair into a separate pass, and makes some improvements: - When we have to pick which of two overlapping nodes to drop keys from, we use the btree node header sequence number to preserve the newer node - the gc code has been changed so that it doesn't bail out if we're continuing/ignoring on fsck error - this way the dump tool can skip running the repair pass but still walk all reachable metadata - add a new superblock flag indicating when a filesystem is known to have btree topology issues, and the topology repair pass should be run - changing the start/end of a node might mean keys in that node have to be deleted: this patch handles that better by splitting it out into a separate function and running it explicitly in the topology repair code, previously those keys were only being dropped when the btree node was read in. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	0098376f03	bcachefs: New helper __bch2_btree_insert_keys_interior() Consolidate common parts of bch2_btree_insert_keys_interior() and btree_split_insert_keys() - prep work for adding some new topology assertions. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	bcd25dac53	bcachefs: Rewrite btree nodes with errors This patch adds self healing functionality for btree nodes - if we notice a problem when reading a btree node, we just rewrite it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	537c32f521	bcachefs: Don't BUG_ON() btree topology error This replaces an assertion in the btree merge path with a bch2_inconsistent_error() - fsck will fix it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	4d47b21c4d	bcachefs: Fix a use after free Turns out, we weren't waiting on in flight btree writes when freeing existing btree nodes. This lead to stray btree writes overwriting newly allocated buckets, but only started showing itself with some of the recent allocator work and another patch to move submitting of btree writes to worqueues. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	241e26369e	bcachefs: Don't flush btree writes more aggressively because of btree key cache We need to flush the btree key cache when it's too dirty, because otherwise the shrinker won't be able to reclaim memory - this is done by journal reclaim. But journal reclaim also kicks btree node writes: this meant that btree node writes were getting kicked much too often just because we needed to flush btree key cache keys. This patch splits journal pins into two different lists, and teaches journal reclaim to not flush btree node writes when it only needs to flush key cache keys. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	35d5aff263	bcachefs: Kill bch2_fs_usage_scratch_get() This is an important cleanup, eliminating an unnecessary copy in the transaction commit path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	2940295c97	bcachefs: Be more careful about JOURNAL_RES_GET_RESERVED JOURNAL_RES_GET_RESERVED should only be used for updatse that need to be done to free up space in the journal. In particular, when we're flushing keys from the key cache, if we're flushing them out of order we shouldn't be using it, since we're using up our remaining space in the journal without dropping a pin that will let us make forward progress. With this patch, BTREE_INSERT_JOURNAL_RECLAIM without BTREE_INSERT_JOURNAL_RESERVED may return -EAGAIN - we can't wait on journal reclaim if we're already in journal reclaim. This means we need to propagate these errors up to journal reclaim, indicating that flushing a journal pin should be retried in the future. This is prep work for a patch to change the way journal reclaim works, to split out flushing key cache keys because the btree key cache is too dirty from journal reclaim because we need space in the journal. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	ecab6be7e5	bcachefs: bch2_foreground_maybe_merge() now correctly reports lock restarts This means that btree node splits don't have to automatically trigger a transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	54ca47e114	bcachefs: Kill bch2_btree_node_get_sibling() This patch reworks the btree node merge path to use a second btree iterator to get the sibling node - which means bch2_btree_iter_get_sibling() can be deleted. Also, it uses bch2_btree_iter_traverse_all() if necessary - which means it should be more reliable. We don't currently even try to make it work when trans->nounlock is set - after a BTREE_INSERT_NOUNLOCK transaction commit, hopefully this will be a worthwhile tradeoff. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	1259cc31b2	bcachefs: Change where merging of interior btree nodes is trigger from Previously, we were doing btree node merging from bch2_btree_insert_node() - but this is called from the split path, when we're in the middle of creating new nodes and deleting new nodes and the iterators are in a weird state. Also, this means we're starting a new btree_update while in the middle of an existing one, and that's asking for deadlocks. Much simpler and saner to trigger btree node merging _after_ the whole btree node split path is finished. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	e264b2f62a	bcachefs: Improve bch2_btree_update_start() bch2_btree_update_start() is now responsible for taking gc_lock and upgrading the iterator to lock parent nodes - greatly simplifying error handling and all of the callers. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	e751c01a8e	bcachefs: Start using bpos.snapshot field This patch starts treating the bpos.snapshot field like part of the key in the btree code: * bpos_successor() and bpos_predecessor() now include the snapshot field * Keys in btrees that will be using snapshots (extents, inodes, dirents and xattrs) now always have their snapshot field set to U32_MAX The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that determines whether we're iterating over keys in all snapshots or not - internally, this controlls whether bkey_(successor\|predecessor) increment/decrement the snapshot field, or only the higher bits of the key. We add a new member to struct btree_iter, iter->snapshot: when BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always equal iter->snapshot, which will be 0 for btrees that don't use snapshots, and alsways U32_MAX for btrees that will use snapshots (until we enable snapshot creation). This patch also introduces a new metadata version number, and compat code for reading from/writing to older versions - this isn't a forced upgrade (yet). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:57 -04:00
Kent Overstreet	4cf91b0270	bcachefs: Split out bpos_cmp() and bkey_cmp() With snapshots, we're going to need to differentiate between comparisons that should and shouldn't include the snapshot field. bpos_cmp is now the comparison function that does include the snapshot field, used by core btree code. Upper level filesystem code generally does _not_ want to compare against the snapshot field - that code wants keys to compare as equal even when one of them is in an ancestor snapshot. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:57 -04:00
Kent Overstreet	3bf57160c2	bcachefs: Fix packed bkey format calculation for new btree roots Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:57 -04:00
Kent Overstreet	2da5d000b9	bcachefs: Generate better bkey formats when splitting nodes On btree node split, we weren't ensuring the min_key of the new larger node packs in the new format for this node. This triggers some painful slowpaths in the bset.c aux search tree code - this patch fixes that by calculating a new format for the new node with the new min_key. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:57 -04:00
Kent Overstreet	0390ea8ad8	bcachefs: Drop bkey noops Bkey noops were introduced to deal with trimming inline data extents in place in the btree: if the u64s field of a bkey was 0, that u64 was a noop and we'd start looking for the next bkey immediately after it. But extent handling has been lifted above the btree - we no longer modify existing extents in place in the btree, and the compatibilty code for old style extent btree nodes is gone, so we can completely drop this code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:57 -04:00
Kent Overstreet	a9d79c6e8b	bcachefs: Use pcpu mode of six locks for interior nodes Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:57 -04:00
Kent Overstreet	f020bfcdb0	bcachefs: Use bch2_bpos_to_text() more consistently Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:55 -04:00
Kent Overstreet	41f8b09edc	bcachefs: Rename BTREE_ID enums for consistency with other enums Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:55 -04:00
Kent Overstreet	c052cf82f3	bcachefs: KEY_TYPE_discard is no longer used KEY_TYPE_discard used to be used for extent whiteouts, but when handling over overlapping extents was lifted above the core btree code it became unused. This patch updates various code to reflect that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:55 -04:00
Kent Overstreet	f2785955bb	bcachefs: Kill support for !BTREE_NODE_NEW_EXTENT_OVERWRITE() bcachefs has been aggressively migrating filesystems and btree nodes to the new format for quite some time - this shouldn't affect anyone anymore, and lets us delete a _lot_ of code. Also, it frees up KEY_TYPE_discard for a new whiteout key type for snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:55 -04:00
Kent Overstreet	1889ad5a12	bcachefs: Add code to scan for/rewite old btree nodes This adds a new data job type to scan for btree nodes in the old extent format, and rewrite them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:54 -04:00
Kent Overstreet	d042b0402c	bcachefs: Add an option for metadata_target Also, make journal writes obey foreground_target and metadata_target. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	51d2dfb82d	bcachefs: Add BTREE_PTR_RANGE_UPDATED This is so that when we discover btree topology issues, we can just update the pointer to a btree node and signal btree read path that the min/max keys in the node header should be updated from the node pointer. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	dcf64dfbbc	bcachefs: Fix btree node split after merge operations A btree node merge operation deletes a key in the parent node; if when inserting into the parent node we split the parent node, we can end up with a whiteout in the parent node that we don't want. The existing code drops them before doing the split, because they can screw up picking the pivot, but we forgot about the unwritten writeouts area - that needs to be cleared out too. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:50 -04:00
Kent Overstreet	890e3f5bf7	bcachefs: Reserve some open buckets for btree allocations This reverts part of the change from "bcachefs: Don't use BTREE_INSERT_USE_RESERVE so much" - it turns out we still should be reserving open buckets for btree node allocations, because otherwise data bucket allocations (especially with erasure coding enabled) can use up all our open buckets and we won't be able to do the metadata update that lets us release those open bucket references. Oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:50 -04:00
Kent Overstreet	07a1006ae8	bcachefs: Reduce/kill BKEY_PADDED use With various newer key types - stripe keys, inline data extents - the old approach of calculating the maximum size of the value is becoming more and more error prone. Better to switch to bkey_on_stack, which can dynamically allocate if necessary to handle any size bkey. In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:50 -04:00
Kent Overstreet	3187aa8d57	bcachefs: Don't use BTREE_INSERT_USE_RESERVE so much Previously, we were using BTREE_INSERT_RESERVE in a lot of places where it no longer makes sense. - we now have more open_buckets than we used to, and the reserves work better, so we shouldn't need to use BTREE_INSERT_RESERVE just because we're holding open_buckets pinned anymore. - We have the btree key cache for updates to the alloc btree, meaning we no longer need the btree reserve to ensure the allocator can make forward progress. This means that we should only need a reserve for btree updates to ensure that copygc can make forward progress. Since it's now just for copygc, we can also fold RESERVE_BTREE into RESERVE_MOVINGGC (the allocator's freelist reserve). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:50 -04:00
Kent Overstreet	719fe7fb55	bcachefs: Update transactional triggers interface to pass old & new keys This is needed to fix a bug where we're overflowing iterators within a btree transaction, because we're updating the stripes btree (to update block counts) and the stripes btree trigger is unnecessarily updating the alloc btree - it doesn't need to update the alloc btree when the pointers within a stripe aren't changing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:49 -04:00
Kent Overstreet	5db43418d5	bcachefs: Don't issue btree writes that weren't journalled If we have an error in the btree interior update path that prevents us from journalling the update, we can't issue the corresponding btree node write - we didn't get a journal sequence number that would cause it to be ignored in recovery. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:49 -04:00
Kent Overstreet	7bfbbd8802	bcachefs: Fix spurious alloc errors on forced shutdown Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:49 -04:00
Kent Overstreet	04e23a566f	bcachefs: Ensure we always have a journal pin in interior update path For the new nodes an interior btree update makes reachable, updates to those nodes may be journalled after the btree update starts but before the transactional part - where we make those nodes reachable. Those updates need to be kept in the journal until after the btree update completes, hence we should always get a journal pin at the start of the interior update. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:48 -04:00
Kent Overstreet	7b48920770	bcachefs: Delete dead code The interior btree node update path has changed, this is no longer needed. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:48 -04:00
Kent Overstreet	4e92cbb642	bcachefs: More debug code improvements Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:48 -04:00
Kent Overstreet	1c74cec10c	bcachefs: Add more debug checks tracking down a bug where we see a btree node pointer in the wrong node Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:47 -04:00
Kent Overstreet	0b5c9f5940	bcachefs: Set preallocated transaction mem to avoid restarts this will reduce transaction restarts, from observation of tracepoints. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:47 -04:00
Kent Overstreet	6a747c4683	bcachefs: Add accounting for dirty btree nodes/keys This lets us improve journal reclaim, so that it now tries to make sure no more than 3/4s of the btree node cache and btree key cache are dirty - ensuring the shrinkers can free memory. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:46 -04:00
Kent Overstreet	811d2bcd85	bcachefs: Drop typechecking from bkey_cmp_packed() This only did anything in two places, and those can just be replaced wiht bkey_cmp_left_packed()). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:46 -04:00
Kent Overstreet	7807e14384	bcachefs: Convert various code to printbuf printbufs know how big the buffer is that was allocated, so we can get rid of the random PAGE_SIZEs all over the place. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:43 -04:00
Kent Overstreet	760992aac8	bcachefs: Ensure we wake up threads locking node when reusing it Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:43 -04:00
Kent Overstreet	4fe7efa177	bcachefs: Fix an error path We were missing a 'goto retry' and continuing on with an error pointer. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:43 -04:00
Kent Overstreet	a2b5313a39	bcachefs: Fix a faulty assertion Now that updates to interior nodes are journalled, we shouldn't be checking topology of interior nodes until we've finished replaying updates to that node. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:42 -04:00
Kent Overstreet	fff899b1d9	bcachefs: Mark btree nodes as needing rewrite when not all replicas are RW This fixes a bug where recovery fails when one of the devices is read only. Also - consolidate the "must rewrite this node to insert it" behind a new btree node flag. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:42 -04:00
Kent Overstreet	937f503605	bcachefs: Use btree reserve when appropriate Whenever we're doing an update that has pointers, that generally means we need to do the update in order to release open bucket references - so we should be using the btree open bucket reserve. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:42 -04:00
Kent Overstreet	2ca88e5ad9	bcachefs: Btree key cache This introduces a new kind of btree iterator, cached iterators, which point to keys cached in a hash table. The cache also acts as a write cache - in the update path, we journal the update but defer updating the btree until the cached entry is flushed by journal reclaim. Cache coherency is for now up to the users to handle, which isn't ideal but should be good enough for now. These new iterators will be used for updating inodes and alloc info (the alloc and stripes btrees). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:41 -04:00
Kent Overstreet	bd2bb273a0	bcachefs: Don't deadlock when btree node reuse changes lock ordering Btree node lock ordering is based on the logical key. However, 'struct btree' may be reused for a different btree node under memory pressure. This patch uses the new six lock callback to check if a btree node is no longer the node we wanted to lock before blocking. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:41 -04:00
Kent Overstreet	4efe71a646	bcachefs: Always give out journal pre-res if we already have one This is better than skipping the journal pre-reservation if we already have one - we should still acount for the journal reservation we're going to have to get. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:41 -04:00
Kent Overstreet	966885ee40	bcachefs: Fix a linked list bug Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:41 -04:00
Kent Overstreet	40ca39b564	bcachefs: btree_update_nodes_written() requires alloc reserve Also, in the btree_update_start() path, if we already have a journal pre-reservation we don't want to take another - that's a deadlock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:40 -04:00

1 2 3

123 Commits