linux-stable

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git synced 2025-01-17 18:56:24 +00:00

Author	SHA1	Message	Date
Kent Overstreet	e96f5a61cb	bcachefs: Fix bch2_check_discard_freespace_key() We weren't correctly checking the freespace btree - it's an extents btree, which means we need to iterate over each bucket in a freespace extent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Kent Overstreet	faa62a2036	bcachefs: alloc_v4_u64s() fix With the recent bkey_ops.min_val_size addition, bkey values are automatically extended to the size of the current version. The check in bch2_alloc_v4_invalid() needs to be updated to take this into account. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:01 -04:00
Kent Overstreet	dbda63bbb0	bcachefs: bch2_bkey_make_mut() now calls bch2_trans_update() It's safe to call bch2_trans_update with a k/v pair where the value hasn't been filled out, as long as the key part has been and the value is filled out by transaction commit time. This patch folds the bch2_trans_update() call into bch2_bkey_make_mut(), eliminating a bit of boilerplate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:01 -04:00
Kent Overstreet	34dfa5db19	bcachefs: bch2_bkey_get_mut() improvements - bch2_bkey_get_mut() now handles types increasing in size, allocating a buffer for the type's current size when necessary - bch2_bkey_make_mut_typed() - bch2_bkey_get_mut() now initializes the iterator, like bch2_bkey_get_iter() Also, refactor so that most of the code is in functions - now macros are only used for wrappers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:01 -04:00
Kent Overstreet	bcb79a51cb	bcachefs: bch2_bkey_get_iter() helpers Introduce new helpers for a common pattern: bch2_trans_iter_init(); bch2_btree_iter_peek_slot(); - bch2_bkey_get_iter_type() returns -ENOENT if it doesn't find a key of the correct type - bch2_bkey_get_val_typed() copies the val out of the btree to a (typically stack allocated) variable; it handles the case where the value in the btree is smaller than the current version of the type, zeroing out the remainder. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:00 -04:00
Brian Foster	251babb55d	bcachefs: fix NULL bch_dev deref when checking bucket_gens keys fsck removes bucket_gens keys for devices that do not exist in the volume (i.e., if the device was removed). In 'fsck -n' mode, the associated fsck_err_on() wrapper returns false to skip the key removal. This proceeds on to the rest of the function, which eventually segfaults on a NULL bch_dev because the device does not exist. Update bch2_check_bucket_gens_key() to skip out of the rest of the function when the associated device does not exist, regardless of running fsck in check or repair mode. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:00 -04:00
Kent Overstreet	615fccada5	bcachefs: Fix a slab-out-of-bounds In __bch2_alloc_to_v4_mut(), we overrun the buffer we allocate if the alloc key had backpointers stored in it (which we no longer support). Fix this with a max() call. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:00 -04:00
Kent Overstreet	62a03559d6	bcachefs: Rip out code for storing backpointers in alloc keys We don't store backpointers in alloc keys anymore, since we gained the btree write buffer. This patch drops support for backpointers in alloc keys, and revs the on disk format version so that we know a fsck is required. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	1546cf9727	bcachefs: Fix bch2_get_key_or_hole() This fixes an off by one error, due to confusing closed vs. half open intervals. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Kent Overstreet	e9b9e475ea	bcachefs: bch2_dev_freespace_init() Print out status every 10 seconds It appears freespace init can still take awhile, and we've had a report or two of it getting stuck - let's have it print out where it's at every 10 seconds. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Brian Foster	8bff9875a6	bcachefs: use dedicated workqueue for tasks holding write refs A workqueue resource deadlock has been observed when running fsck on a filesystem with a full/stuck journal. fsck is not currently able to repair the fs due to fairly rapid emergency shutdown, but rather than exit gracefully the fsck process hangs during the shutdown sequence. Fortunately this is easily recoverable from userspace, but the root cause involves code shared between the kernel and userspace and so should be addressed. The deadlock scenario involves the main task in the bch2_fs_stop() -> bch2_fs_read_only() path waiting on write references to drain with the fs state lock held. A bch2_read_only_work() workqueue task is scheduled on the system_long_wq, blocked on the state lock. Finally, various other write ref holding workqueue tasks are scheduled to run on the same workqueue and must complete in order to release references that the initial task is waiting on. To avoid this problem, we can split the dependent workqueue tasks across different workqueues. It's a bit of a waste to create a dedicated wq for the read-only worker, but there are several tasks throughout the fs that follow the pattern of acquiring a write reference and then scheduling to the system wq. Use a local wq for such tasks to break the subtle dependency between these and the read-only worker. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Kent Overstreet	b40901b0f7	bcachefs: New erasure coding shutdown path This implements a new shutdown path for erasure coding, which is needed for the upcoming BCH_WRITE_WAIT_FOR_EC write path. The process is: - Cancel new stripes being built up - Close out/cancel open buckets on write points or the partial list that are for stripes - Shutdown rebalance/copygc - Then wait for in flight new stripes to finish With BCH_WRITE_WAIT_FOR_EC, move ops will be waiting on stripes to fill up before they complete; the new ec shutdown path is needed for shutting down copygc/rebalance without deadlocking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:57 -04:00
Kent Overstreet	46e14854fc	bcachefs: Fix next_bucket() This fixes an infinite loop in bch2_get_key_or_real_bucket_hole(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:57 -04:00
Kent Overstreet	39a1ea129a	bcachefs: Single open_bucket_partial list Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:54 -04:00
Kent Overstreet	84ddb8b98e	bcachefs: Don't invalidate open buckets Like bch2_trans_mark_bucket(), we shouldn't be incrementing a bucket gen while it's still open - erasure coding was hitting this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:53 -04:00
Kent Overstreet	80c3308578	bcachefs: Fragmentation LRU Now that we have much more efficient updates to the LRU btree, this patch adds a new LRU that indexes buckets by fragmentation. This means copygc no longer has to scan every bucket to find buckets that need to be evacuated. Changes: - A new field in bch_alloc_v4, fragmentation_lru - this corresponds to the bucket's position in the fragmentation LRU. We add a new field for this instead of calculating it as needed because we may make the fragmentation LRU optional; this field indicates whether a bucket is on the fragmentation LRU. Also, zoned devices will introduce variable bucket sizes; explicitly recording the LRU position will be safer for them. - A new copygc path for using the fragmentation LRU instead of scanning every bucket and building up an in-memory heap. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:53 -04:00
Kent Overstreet	1b30ed5fd8	bcachefs: Use btree write buffer for LRU btree Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:53 -04:00
Daniel Hill	8ffa11a2c5	bcachefs: let __bch2_btree_insert() pass in flags This patch is prep work for the following patch. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:52 -04:00
Kent Overstreet	629a21b621	bcachefs: Improve invalidate_one_bucket() error messages Make sure to check for lru entries that point to buckets that don't exist as well as buckets in the wrong state, and improve the error message we print out. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:52 -04:00
Kent Overstreet	dbe17f1883	bcachefs: BKEY_INVALID_FROM_JOURNAL Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:52 -04:00
Kent Overstreet	facafdcbc1	bcachefs: Change bkey_invalid() rw param to flags Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:52 -04:00
Kent Overstreet	83f33d6865	bcachefs: Rework lru btree This patch changes how the LRU index works: Instead of using KEY_TYPE_lru where the bucket the lru entry points to is part of the value, this switches to KEY_TYPE_set and encoding the bucket we refer to in the low bits of the key. This means that we no longer have to check for collisions when inserting LRU entries. We'll be making using of this in the next patch, which adds a btree write buffer - a pure write buffer for btree updates, where updates are appended to a simple array and then periodically sorted and batch inserted. This is a new on disk format version, and a forced upgrade. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:52 -04:00
Kent Overstreet	5250b74d55	bcachefs: bucket_gens btree To improve mount times, add a btree for just bucket gens, 256 of them per key: this means we'll have to scan drastically less metadata at startup. This adds - trigger for keeping it in sync with the all btree - initialization code, for filesystems from previous versions - new path for reading bucket gens - new fsck code And a new on disk format version. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:51 -04:00
Kent Overstreet	d23124c757	bcachefs: Improve bch2_check_alloc_info() This factors out a new helper from bch2_dev_freespace_init(), bch2_get_key_or_hole(), and uses it in bch2_check_alloc_info(): we're now able to process holes in the alloc btree as ranges, instead of one bucket at a time. This will improve fsck performance on new filesystems, or filesystems where not every bucket has been used yet. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:51 -04:00
Kent Overstreet	cc65f56599	bcachefs: Improve bch2_dev_freespace_init() This makes bch2_dev_freespace_init() much faster: instead of processing every bucket on the device one at a time, we handle ranges of missing keys all at once: the freespace btree is an extents style btree, so we only have to insert one freespace key for every range of missing keys in the alloc btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:51 -04:00
Kent Overstreet	a8c752bb1d	bcachefs: New on disk format: Backpointers This patch adds backpointers: we now have a reverse index from device and offset on that device (specifically, offset within a bucket) back to btree nodes and (non cached) data extents. The first 40 backpointers within a bucket are stored in the alloc key; after that backpointers spill over to the next backpointers btree. This is to help avoid performance regressions from additional btree updates on large streaming workloads. This patch adds all the code for creating, checking and repairing backpointers. The next patch in the series is going to use backpointers for copygc - finally getting rid of the need to scan all extents to do copygc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:50 -04:00
Kent Overstreet	f2b542ba42	bcachefs: Go RW before check_alloc_info() It's possible to do btree updates before going RW by adding them to the list of updates for journal replay to do, but this is limited by what fits in RAM. This patch switches the second alloc info phase to run after going RW - btree_gc has already ensured the alloc btree itself is correct - and tweaks the allocation path to deal with the potential small inconsistencies. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:50 -04:00
Kent Overstreet	d94189ad56	bcachefs: Debug mode for c->writes references This adds a debug mode where we split up the c->writes refcount into distinct refcounts for every codepath that takes a reference, and adds sysfs code to print the value of each ref. This will make it easier to debug shutdown hangs due to refcount leaks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:50 -04:00
Kent Overstreet	f52dd1ae20	bcachefs: Fix bch_alloc_to_text() We weren't guarding against the alloc key having an invalid data type. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:49 -04:00
Kent Overstreet	19a614d2e4	bcachefs: Better inlining for bch2_alloc_to_v4_mut This separates out the slowpath into a separate function, and inlines bch2_alloc_v4_mut into bch2_trans_start_alloc_update(), the main place it's called. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:49 -04:00
Kent Overstreet	313816363a	bcachefs: bch2_trans_relock_notrace() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:49 -04:00
Kent Overstreet	994ba47543	bcachefs: New btree helpers This introduces some new conveniences, to help cut down on boilerplate: - bch2_trans_kmalloc_nomemzero() - performance optimiation - bch2_bkey_make_mut() - bch2_bkey_get_mut() - bch2_bkey_get_mut_typed() - bch2_bkey_alloc() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:48 -04:00
Kent Overstreet	78c0b75c34	bcachefs: More errcode cleanup We shouldn't be overloading standard error codes now that we have provisions for bcachefs-specific errorcodes: this patch converts super.c and super-io.c to per error site errcodes, with a bit of cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:48 -04:00
Kent Overstreet	e88a75ebe8	bcachefs: New bpos_cmp(), bkey_cmp() replacements This patch introduces - bpos_eq() - bpos_lt() - bpos_le() - bpos_gt() - bpos_ge() and equivalent replacements for bkey_cmp(). Looking at the generated assembly these could probably be improved further, but we already see a significant code size improvement with this patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:47 -04:00
Kent Overstreet	674cfc2624	bcachefs: Add persistent counters for all tracepoints Also, do some reorganizing/renaming, convert atomic counters in bch_fs to persistent counters, and add a few missing counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:39 -04:00
Kent Overstreet	549d173c1b	bcachefs: EINTR -> BCH_ERR_transaction_restart Now that we have error codes, with subtypes, we can switch to our own error code for transaction restarts - and even better, a distinct error code for each transaction restart reason: clearer code and better debugging. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:37 -04:00
Kent Overstreet	d4bf5eecd7	bcachefs: Use bch2_err_str() in error messages Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:36 -04:00
Kent Overstreet	615f867c14	bcachefs: Improved errcodes Instead of overloading standard error codes (EINTR/EAGAIN), and defining short lists of error codes in multiple places that potentially end up overlapping & conflicting, we're now going to have one master list of error codes. Error codes are defined with an x-macro: thus we also have bch2_err_str() now. Also, error codes have a class field. Now, instead of checking for errors with ==, code should use bch2_err_matches(), which returns true if the error is equal to or a sub-error of the error class. This means we can define unique errors for every source location where an error is generated, which will help improve our error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:36 -04:00
Kent Overstreet	445d184af2	bcachefs: Convert alloc code to for_each_btree_key_commit() The new for_each_btree_key2() macro handles transaction retries, allowing us to avoid nested transactions - which we want to avoid since they're tricky to do completely correctly and upcoming assertions are going to be checking for that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:36 -04:00
Kent Overstreet	d04801a0f4	bcachefs: Convert bch2_do_invalidates_work() to for_each_btree_key2() The new for_each_btree_key2() macro handles transaction retries, allowing us to avoid nested transactions - which we want to avoid since they're tricky to do completely correctly and upcoming assertions are going to be checking for that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:36 -04:00
Kent Overstreet	ca91f40ff7	bcachefs: Convert bch2_dev_freespace_init() to for_each_btree_key_commit() The new for_each_btree_key2() macro handles transaction retries, allowing us to avoid nested transactions - which we want to avoid since they're tricky to do completely correctly and upcoming assertions are going to be checking for that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:36 -04:00
Kent Overstreet	4910a9506c	bcachefs: Convert bch2_do_discards_work() to for_each_btree_key2() The new for_each_btree_key2() macro handles transaction retries, allowing us to avoid nested transactions - which we want to avoid since they're tricky to do completely correctly and upcoming assertions are going to be checking for that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:36 -04:00
Kent Overstreet	a1783320d4	bcachefs: for_each_btree_key2() This introduces two new macros for iterating through the btree, with transaction restart handling - for_each_btree_key2() - for_each_btree_key_commit() Every iteration is now in an implicit transaction, and - as with lockrestart_do() and commit_do() - returning -EINTR will cause the transaction to be restarted, at the same key. This patch converts a bunch of code that was open coding this to these new macros, saving a substantial amount of code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:35 -04:00
Kent Overstreet	e68914ca84	bcachefs: Rename __bch2_trans_do() -> commit_do() Better/more descriptive naming, and prep for adding nested_lockrestart_do() and nested_commit_do(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:35 -04:00
Kent Overstreet	80b3bf33d3	bcachefs: Silence some fsck errors when reconstructing alloc info There's no need to print fsck errors for errors that are expected, and the user has already opted to repair. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:35 -04:00
Kent Overstreet	47ab0c5f6a	bcachefs: Fix bch2_check_alloc_key() bch2_check_alloc_key() was failing to check buckets that didn't have alloc keys yet (because they'd never been used) - they still need to be added to the freespace btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00
Kent Overstreet	e34da43e33	bcachefs: Improve bch2_check_alloc_info - In check_alloc_key(), previously we were re-initializing iterators for the need_discard and freespace btrees for every alloc key we checked. But this was causing us to redo lookups into the journal keys every time, since those lookups are cached in struct btree_iter. This initializes the iterators in bch2_check_alloc_info and passes them into check_alloc_key(). - Make the looping more consistent/efficient in bch2_check_alloc_info() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00
Kent Overstreet	22add2ec67	bcachefs: Use BTREE_INSERT_LAZY_RW in bch2_check_alloc_info() This runs before we go rw for journal replay, but after we're allowed to go rw. It might be time to consider killing BTREE_INSERT_LAZY_RW, though. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00
Kent Overstreet	3858536744	bcachefs: Bucket invalidate path improvements - invalidate_one_bucket() now returns 1 when we don't have any buckets on this device to invalidate, ensuring we don't spin - the tracepoint invocation is moved to after the transaction commit, and we now include the number of cached sectors in the tracepoint Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00
Kent Overstreet	1c6ff39445	bcachefs: Fix refcount leak in bch2_do_invalidates() If we fail to queue the work item because it's already in process, we need to drop the ref we just took. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00

1 2 3 4 5

213 Commits