linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-01-10 07:10:27 +00:00

Author	SHA1	Message	Date
Kent Overstreet	2776369266	bcachefs: Add a cond_resched() call to journal_keys_sort() We're just doing cpu work here and it could take awhile, a cond_resched() is definitely needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	62a03559d6	bcachefs: Rip out code for storing backpointers in alloc keys We don't store backpointers in alloc keys anymore, since we gained the btree write buffer. This patch drops support for backpointers in alloc keys, and revs the on disk format version so that we know a fsck is required. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Brian Foster	349b1d832b	bcachefs: use reservation for log messages during recovery If we block on journal reservation attempting to log journal messages during recovery, particularly for the first message(s) before we start doing actual work, chances are the filesystem ends up deadlocked. Allow logged messages to use reserved journal space to mitigate this problem. In the worst case where no space is available whatsoever, this at least allows the fs to recognize that the journal is stuck and fail the mount gracefully. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	26559553e4	bcachefs: Add a fallback when journal_keys doesn't fit in ram We may end up in a situation where allocating the buffer for the sorted journal_keys fails - but it would likely succeed, post compaction where we drop duplicates. We've had reports of this allocation failing, so this adds a slowpath to do the compaction incrementally. This is only a band-aid fix; we need to look at limiting the number of keys in the journal based on the amount of system RAM. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:57 -04:00
Kent Overstreet	40a18fe273	bcachefs: Add error message for failing to allocate sorted journal keys Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:57 -04:00
Kent Overstreet	65d48e3525	bcachefs: Private error codes: ENOMEM This adds private error codes for most (but not all) of our ENOMEM uses, which makes it easier to track down assorted allocation failures. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:57 -04:00
Kent Overstreet	ac2ccddc26	bcachefs: Drop some anonymous structs, unions Rust bindgen doesn't cope well with anonymous structs and unions. This patch drops the fancy anonymous structs & unions in bkey_i that let us use the same helpers for bkey_i and bkey_packed; since bkey_packed is an internal type that's never exposed to outside code, it's only a minor inconvenienc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	27616a3124	bcachefs: Simplify ec stripes heap Now that we have a separate data structure for tracking open stripes, the stripes heap can track all existing stripes, which is a nice simplification. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:54 -04:00
Kent Overstreet	80c3308578	bcachefs: Fragmentation LRU Now that we have much more efficient updates to the LRU btree, this patch adds a new LRU that indexes buckets by fragmentation. This means copygc no longer has to scan every bucket to find buckets that need to be evacuated. Changes: - A new field in bch_alloc_v4, fragmentation_lru - this corresponds to the bucket's position in the fragmentation LRU. We add a new field for this instead of calculating it as needed because we may make the fragmentation LRU optional; this field indicates whether a bucket is on the fragmentation LRU. Also, zoned devices will introduce variable bucket sizes; explicitly recording the LRU position will be safer for them. - A new copygc path for using the fragmentation LRU instead of scanning every bucket and building up an in-memory heap. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:53 -04:00
Kent Overstreet	806c8a6aa8	bcachefs: Fix failure to read btree roots If failed to read a btree root - or if we're not using a btree root, because of the reconstruct_alloc option - make sure we update the corresponding info for the key/level for the root on disk. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:52 -04:00
Kent Overstreet	83f33d6865	bcachefs: Rework lru btree This patch changes how the LRU index works: Instead of using KEY_TYPE_lru where the bucket the lru entry points to is part of the value, this switches to KEY_TYPE_set and encoding the bucket we refer to in the low bits of the key. This means that we no longer have to check for collisions when inserting LRU entries. We'll be making using of this in the next patch, which adds a btree write buffer - a pure write buffer for btree updates, where updates are appended to a simple array and then periodically sorted and batch inserted. This is a new on disk format version, and a forced upgrade. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:52 -04:00
Kent Overstreet	5250b74d55	bcachefs: bucket_gens btree To improve mount times, add a btree for just bucket gens, 256 of them per key: this means we'll have to scan drastically less metadata at startup. This adds - trigger for keeping it in sync with the all btree - initialization code, for filesystems from previous versions - new path for reading bucket gens - new fsck code And a new on disk format version. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:51 -04:00
Kent Overstreet	8dd69d9f64	bcachefs: KEY_TYPE_inode_v3, metadata_version_inode_v3 Move bi_size and bi_sectors into the non-varint portion of the inode, so that the write path can update them without going through the relatively expensive unpack/pack operations. Other changes: - Add a field for the offset of the varint section, so we can add new non-varint fields without needing a new inode type, like alloc_v3 - Move bi_mode into the flags field, so that the varint section can be u64 aligned Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:51 -04:00
Kent Overstreet	47b323a0b0	bcachefs: Start snapshots before bch2_gc() bch2_gc may require snapshots to be started - the repair path when checking the reflink btree may do updates to the extents btree. This moves bch2_fs_initialize_subvolumes() and bch2_fs_snapshots_start() to before bch2_gc() - since we haven't gone RW yet, the updates in bch2_fs_initialize_subvolumes() are done via the journal replay keys list, so it's fine to do this before bch2_gc(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:51 -04:00
Kent Overstreet	a8c752bb1d	bcachefs: New on disk format: Backpointers This patch adds backpointers: we now have a reverse index from device and offset on that device (specifically, offset within a bucket) back to btree nodes and (non cached) data extents. The first 40 backpointers within a bucket are stored in the alloc key; after that backpointers spill over to the next backpointers btree. This is to help avoid performance regressions from additional btree updates on large streaming workloads. This patch adds all the code for creating, checking and repairing backpointers. The next patch in the series is going to use backpointers for copygc - finally getting rid of the need to scan all extents to do copygc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:50 -04:00
Kent Overstreet	f2b542ba42	bcachefs: Go RW before check_alloc_info() It's possible to do btree updates before going RW by adding them to the list of updates for journal replay to do, but this is limited by what fits in RAM. This patch switches the second alloc info phase to run after going RW - btree_gc has already ensured the alloc btree itself is correct - and tweaks the allocation path to deal with the potential small inconsistencies. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:50 -04:00
Kent Overstreet	5f5c746617	bcachefs: Start copygc when first going read-write In the distant past, it wasn't possible to start copygc until after journal replay had finished. Now, the btree iterator code overlays keys from the journal, so there's no reason not to start it earlier - and it solves a rare deadlock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:50 -04:00
Kent Overstreet	858536c7ce	bcachefs: Convert EROFS errors to private error codes More error code improvements - this gets us more useful error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:49 -04:00
Kent Overstreet	5bbe3f2d0e	bcachefs: Log more messages in the journal This patch - Adds a mechanism for queuing up journal entries prior to the journal being started, which will be used for early journal log messages - Adds bch2_fs_log_msg() and improves bch2_trans_log_msg(), which now take format strings. bch2_fs_log_msg() can be used before or after the journal has been started, and will use the appropriate mechanism. - Deletes the now obsolete bch2_journal_log_msg() - And adds more log messages to the recovery path - messages for journal/filesystem started, journal entries being blacklisted, and journal replay starting/finishing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:48 -04:00
Kent Overstreet	67ace27246	bcachefs: Add a missing bch2_err_str() call Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:48 -04:00
Kent Overstreet	1ba8a796b4	bcachefs: Recover from blacklisted journal entries If it so happens that we crash while dirty, meaning we don't have the superblock clean section, and we erroneously mark a journal entry we wrote as blacklisted, we won't be able to recover. This patch fixes this by adding a fallback: if we've got no superblock clean section, and no non-ignored journal entries, we try the most recent ignored journal entry. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:48 -04:00
Kent Overstreet	4f948723ed	bcachefs: Fix bch2_journal_keys_peek_upto() bch2_journal_keys_peek_upto() was comparing against btree_id & level incorrectly - fix this by using __journal_key_cmp(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:47 -04:00
Kent Overstreet	e0de429a3a	bcachefs: Don't error out when just reading the journal This tweaks the recovery and journal paths so that we don't error out before we need to: the list_journal command should work, even if we wouldn't be able to replay successfully. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:47 -04:00
Kent Overstreet	e88a75ebe8	bcachefs: New bpos_cmp(), bkey_cmp() replacements This patch introduces - bpos_eq() - bpos_lt() - bpos_le() - bpos_gt() - bpos_ge() and equivalent replacements for bkey_cmp(). Looking at the generated assembly these could probably be improved further, but we already see a significant code size improvement with this patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:47 -04:00
Kent Overstreet	b2d1d56b1d	bcachefs: Fixes for building in userspace - Marking a non-static function as inline doesn't actually work and is now causing problems - drop that - Introduce BCACHEFS_LOG_PREFIX for when we want to prefix log messages with bcachefs (filesystem name) - Userspace doesn't have real percpu variables (maybe we can get this fixed someday), put an #ifdef around bch2_disk_reservation_add() fastpath Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:46 -04:00
Kent Overstreet	a101957649	bcachefs: More style fixes Fixes for various checkpatch errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:45 -04:00
Kent Overstreet	c167f9e541	bcachefs: Journal keys overlay fixes - In the btree iterator code that overlays keys from the journal, we were incorrectly specifying level=0 instead of the btree_path's current level in a few places - When we didn't do journal replay, we shouldn't free the journal keys: this fixes cmd_list and cmd_dump, which run in norecovery mode Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:45 -04:00
Kent Overstreet	3e3e02e6bc	bcachefs: Assorted checkpatch fixes checkpatch.pl gives lots of warnings that we don't want - suggested ignore list: ASSIGN_IN_IF UNSPECIFIED_INT - bcachefs coding style prefers single token type names NEW_TYPEDEFS - typedefs are occasionally good FUNCTION_ARGUMENTS - we prefer to look at functions in .c files (hopefully with docbook documentation), not .h file prototypes MULTISTATEMENT_MACRO_USE_DO_WHILE - we have _many_ x-macros and other macros where we can't do this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:44 -04:00
Kent Overstreet	1ffb876fb0	bcachefs: Kill journal_keys->journal_seq_base This removes an optimization that didn't actually save us any memory, due to alignment, but did make the code more complicated than it needed to be. We were also seeing a bug where journal_seq_base wasn't getting correctly initailized, so hopefully it'll fix that too. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:40 -04:00
Kent Overstreet	1ed0a5d280	bcachefs: Convert fsck errors to errcode.h Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:37 -04:00
Kent Overstreet	d4bf5eecd7	bcachefs: Use bch2_err_str() in error messages Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:36 -04:00
Kent Overstreet	3ab25c1b4e	bcachefs: We can handle missing btree roots for all alloc btrees We can rebuild alloc info if these btree roots are missing - no need to bail out and say the filesystem is unrecoverable Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:36 -04:00
Kent Overstreet	4ab35c34d5	bcachefs: Fix subvol/snapshot deleting in recovery fsck doesn't want to run while we're cleaning up deleted snapshots - if that work needs to be done, we want it to have finished before fsck runs, otherwise fsck will get confused when it finds multiple keys in the same snapshot ID equivalence class (i.e. the mechanism that snapshot deletion uses for cleaning up redundant keys). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:35 -04:00
Kent Overstreet	401ec4db63	bcachefs: Printbuf rework This converts bcachefs to the modern printbuf interface/implementation, synced with the version to be submitted upstream. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:33 -04:00
Kent Overstreet	576179021c	bcachefs: Fix btree_and_journal_iter We had a bug where btree_and_journal_iter would return the same key twice - after deleting it (perhaps because it was present in both the btree and the journal?) This reworks btree_and_journal_iter to track the current position, much like btree_paths, which makes the logic considerably simpler and more robust. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:33 -04:00
Kent Overstreet	f2aa026575	bcachefs: Fix for cmd_list_journal cmd_list_journal wasn't correctly listing the most recent journal entries as blacklisted - because in the recovery path when just reading the journal, we were failing to add those to the blacklist table. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:33 -04:00
Kent Overstreet	30525f6863	bcachefs: Fix journal_keys_search() overhead Previously, on every btree_iter_peek() operation we were searching the journal keys, doing a full binary search - which was slow. This patch fixes that by saving our position in the journal keys, so that we only do a full binary search when moving our position backwards or a large jump forwards. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:33 -04:00
Kent Overstreet	11f5e595bf	bcachefs: Always print when doing journal replay in fsck This logging improvement helps see when the previous fsck pass has completed. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:33 -04:00
Kent Overstreet	d8a161ad54	bcachefs: LRU repair tweaks - Drop old unneeded parameter for whether we're in initial GC - which was from when btree updates had to be done differently before we went RW. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:32 -04:00
Kent Overstreet	099989c1b2	bcachefs: Fix journal_iters_fix() journal_iters_fix() was incorrectly rewinding iterators past keys they had already returned, leading to those keys being double counted in the bch2_gc() path - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:32 -04:00
Kent Overstreet	1cab5a82cc	bcachefs: Go RW before bch2_check_lrus() btree updates before going RW are expensive if they're in random order, since they use the list of keys for journal replay to insert, which is just a gap buffer. This patch improves the bucket invalidate path so that if bch2_check_lrus() hasn't finished it only prints warnings instead of doing an emergency shutdown, which means we can now set BCH_FS_MAY_GO_RW before bch2_check_lrus(). Also, the filesystem state bits are reorganized a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:32 -04:00
Kent Overstreet	75c8d0305a	bcachefs: Kill old rebuild_replicas option This option was useful when the replicas mechism was new and still being debugged, but hasn't been used in ages - let's delete it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:31 -04:00
Kent Overstreet	c609947b5e	bcachefs: Fix for getting stuck in journal replay In journal replay, we weren't immediately dropping journal pins when we start doing updates that ewern't from journal replay - leading to journal reclaim getting stuck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:31 -04:00
Kent Overstreet	5650bb46be	bcachefs: Introduce bch2_journal_keys_peek_(upto\|slot)() When many journal replay keys have been overwritten, bch2_journal_keys_peek() was taking excessively long to scan before it found a key to return. Fix this by introducing bch2_journal_keys_peek_upto() which takes a parameter for the end of the range we want, so that we can terminate the search much sooner, and replace all uses of bch2_journal_keys_peek() with peek_upto() or peek_slot(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:31 -04:00
Kent Overstreet	502f973dba	bcachefs: Fix a few warnings on 32 bit These showed up when building for mips. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:31 -04:00
Kent Overstreet	ce6201c456	bcachefs: Use a genradix for reading journal entries Previously, the journal read path used a linked list for storing the journal entries we read from disk. But there's been a bug that's been causing journal_flush_delay to incorrectly be set to 0, leading to far more journal entries than is normal being written out, which then means filesystems are no longer able to start due to the O(n^2) behaviour of inserting into/searching that linked list. Fix this by switching to a radix tree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:30 -04:00
Kent Overstreet	95752a02cb	bcachefs: Refactor journal_keys_sort() to return an error code When there weren't any keys in the journal there's no need to allocate the buffer - but doing that causes a spurious -ENOMEM. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:30 -04:00
Kent Overstreet	822835ffea	bcachefs: Fold bucket_state in to BCH_DATA_TYPES() Previously, we were missing accounting for buckets in need_gc_gens and need_discard states. This matters because buckets in those states need other btree operations done before they can be used, so they can't be conuted when checking current number of free buckets against the allocation watermark. Also, we weren't directly counting free buckets at all. Now, data type 0 == BCH_DATA_free, and free buckets are counted; this means we can get rid of the separate (poorly defined) count of unavailable buckets. This is a new on disk format version, with upgrade and fsck required for the accounting changes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:30 -04:00
Kent Overstreet	e1effd42a1	bcachefs: More improvements for alloc info checks - Move checks for whether the device & bucket are valid from the .key_invalid method to bch2_check_alloc_key(). This is because .key_invalid() is called on keys that may no longer exist (post journal replay), which is a problem when removing/resizing devices. - We weren't checking the need_discard btree to ensure that every set bucket has a corresponding alloc key. This refactors the code for checking the freespace btree, so that it now checks both. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:30 -04:00
Kent Overstreet	d1d7737fd9	bcachefs: Gap buffer for journal keys Btree updates before we go RW work by inserting into the array of keys that journal replay will insert - but inserting into a flat array is O(n), meaning if btree_gc needs to update many alloc keys, we're O(n^2). Fortunately, the updates btree_gc does happens in sequential order, which means a gap buffer works nicely here - this patch implements a gap buffer for journal keys. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:30 -04:00

1 2 3 4 5 ...

288 Commits