linux-next

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git synced 2025-01-06 14:05:39 +00:00

Author	SHA1	Message	Date
Filipe Manana	318eee0328	btrfs: remove btrfs_lru_cache_is_full() inline function It's not used anywhere at the moment, but it was used in earlier version of a patch that removed its use in the second version. So just remove btrfs_lru_cache_is_full(). Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:18 +02:00
Christoph Hellwig	43fa4219bc	btrfs: simplify adding pages in btrfs_add_compressed_bio_pages btrfs_add_compressed_bio_pages is needlessly complicated. Instead of iterating over the logic disk offset just to add pages to the bio use a simple offset starting at 0, which also removes most of the claiming. Additionally __bio_add_pages already takes care of the assert that the bio is always properly sized, and btrfs_submit_bio called right after asserts that the bio size is non-zero. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:18 +02:00
Christoph Hellwig	4513cb0c40	btrfs: move the bi_sector assignment out of btrfs_add_compressed_bio_pages Adding pages to a bio has nothing to do with the sector. Move the assignment to the two callers in preparation for cleaning up btrfs_add_compressed_bio_pages. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:18 +02:00
Naohiro Aota	d1cc579383	btrfs: sysfs: relax bg_reclaim_threshold for debugging purposes Currently, /sys/fs/btrfs/<UUID>/bg_reclaim_threshold is limited to 0 (disable) or [50 .. 100]%, so we need to fill 50% of a device to start the auto reclaim process. It is cumbersome to do so when we want to shake out possible race issues of normal write vs reclaim. Relax the threshold check under the BTRFS_DEBUG option. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:18 +02:00
Christoph Hellwig	2cef0c79bb	btrfs: make btrfs_split_bio work on struct btrfs_bio btrfs_split_bio expects a btrfs_bio as argument and always allocates one. Type both the orig_bio argument and the return value as struct btrfs_bio to improve type safety. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:18 +02:00
Christoph Hellwig	b41bbd293e	btrfs: return a btrfs_bio from btrfs_bio_alloc Return the containing struct btrfs_bio instead of the less type safe struct bio from btrfs_bio_alloc. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:17 +02:00
Christoph Hellwig	9dfde1b47b	btrfs: store a pointer to a btrfs_bio in struct btrfs_bio_ctrl The bio in struct btrfs_bio_ctrl must be a btrfs_bio, so store a pointer to the btrfs_bio for better type checking. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:17 +02:00
Christoph Hellwig	d733ea012d	btrfs: simplify finding the inode in submit_one_bio struct btrfs_bio now has an always valid inode pointer that can be used to find the inode in submit_one_bio, so use that and initialize all variables for which it is possible at declaration time. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:17 +02:00
Christoph Hellwig	b7d463a1d1	btrfs: store a pointer to the original btrfs_bio in struct compressed_bio The original bio must be a btrfs_bio, so store a pointer to the btrfs_bio for better type checking. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:17 +02:00
Christoph Hellwig	690834e47c	btrfs: pass a btrfs_bio to btrfs_submit_compressed_read btrfs_submit_compressed_read expects the bio passed to it to be embedded into a btrfs_bio structure. Pass the btrfs_bio directly to increase type safety and make the code self-documenting. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:17 +02:00
Christoph Hellwig	ae42a154ca	btrfs: pass a btrfs_bio to btrfs_submit_bio btrfs_submit_bio expects the bio passed to it to be embedded into a btrfs_bio structure. Pass the btrfs_bio directly to increase type safety and make the code self-documenting. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:17 +02:00
Christoph Hellwig	7edb9a3e72	btrfs: move zero filling of compressed read bios into common code All algorithms have to fill the remainder of the orig_bio with zeroes, so do it in common code. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:17 +02:00
Christoph Hellwig	34f888ce3a	btrfs: cleanup main loop in btrfs_encoded_read_regular_fill_pages btrfs_encoded_read_regular_fill_pages has a pretty odd control flow. Unwind it so that there is a single loop over the pages array. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:17 +02:00
Christoph Hellwig	b665affe93	btrfs: remove unused members from struct btrfs_encoded_read_private The inode and file_offset members in struct btrfs_encoded_read_private are unused, so remove them. Last used in commit `7959bd4411` ("btrfs: remove the start argument to check_data_csum and export") and commit `7609afac67` ("btrfs: handle checksum validation and repair at the storage layer"). Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:17 +02:00
David Sterba	0b5485391d	btrfs: locking: use atomic for DREW lock writers The DREW lock uses percpu variable to track lock counters and for that it needs to allocate the structure. In btrfs_read_tree_root() or btrfs_init_fs_root() this may add another error case or requires the NOFS scope protection. One way is to preallocate the structure as was suggested in https://lore.kernel.org/linux-btrfs/20221214021125.28289-1-robbieko@synology.com/ We may avoid the allocation altogether if we don't use the percpu variables but an atomic for the writer counter. This should not make any difference, the DREW lock is used for truncate and NOCOW writes along with other IO operations. The percpu counter for writers has been there since the original commit `8257b2dc3c` "Btrfs: introduce btrfs_{start, end}_nocow_write() for each subvolume". The reason could be to avoid hammering the same cacheline from all the readers but then the writers do that anyway. Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:17 +02:00
Anand Jain	ce4cf3793e	btrfs: remove redundant clearing of NODISCARD If no discard mount option is specified, including the NODISCARD option, we make the async discard the default option then we don't have to call the clear_opt again to clear the NODISCARD flag. Though this makes no difference, that the call is redundant has been pointed out several times so we better remove it. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:17 +02:00
Anand Jain	0f202b256a	btrfs: avoid repetitive define BTRFS_FEATURE_INCOMPAT_SUPP BTRFS_FEATURE_INCOMPAT_SUPP is defined twice, once under CONFIG_BTRFS_DEBUG and once without it, resulting in repetitive code. The reason for this is to add experimental features under CONFIG_BTRFS_DEBUG. To avoid repetitive code, add a common list BTRFS_FEATURE_INCOMPAT_SUPP_STABLE, and append experimental features only under CONFIG_BTRFS_DEBUG. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:16 +02:00
Qu Wenruo	6b4d375a81	btrfs: scrub: remove root and csum_root arguments from scrub_simple_mirror() We don't need to pass the roots as arguments, reading them from the rb-tree is cheap. Thus there is really not much need to pre-fetch it and pass it all the way from scrub_stripe(). And we already have more than enough arguments in scrub_simple_mirror() and scrub_simple_stripe(), it's better to remove them and only grab those roots in scrub_simple_mirror(). Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:16 +02:00
Qu Wenruo	1d40329736	btrfs: scrub: remove unused path inside scrub_stripe() The variable @path is no longer passed into any call sites after commit `18d30ab961` ("btrfs: scrub: use scrub_simple_mirror() to handle RAID56 data stripe scrub"), thus we can remove the variable completely. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:16 +02:00
Qu Wenruo	5f50fa918f	btrfs: do not use replace target device as an extra mirror [BUG] Currently btrfs can use dev-replace device as an extra mirror for read-repair. But it can lead to NODATASUM corruption in the following case: There is a RAID1 data chunk, and dev-replace is running from dev2 to dev0. \|//\| = Replaced data X X+1MB X+2MB Dev 2: \| \| \| <- Source dev Dev 0: \|///////\| \| <- Target dev Then a read on dev 2 X+2MB happens. And something wrong happened inside devid 2, causing an -EIO. In that case, read-repair would try the next mirror, and since we can use target device as an extra mirror, we will use that mirror instead. But unfortunately since the read is beyond the current replace cursor, we should not trust it at all, what we get would be just uninitialized garbage. But if this read is for NODATASUM range, then we just trust them and cause data corruption. [CAUSE] We used to have some checks to make sure we only return such extra mirror when the range is before our left cursor. The first commit introducing this behavior is `ad6d620e2a` ("Btrfs: allow repair code to include target disk when searching mirrors"). But later a fix, `22ab04e814` ("Btrfs: fix race between device replace and chunk allocation") changed the behavior, to always let btrfs_map_block() include the extra mirror to address a race in dev-replace which can cause missing writes to target device. This means, we lose the tracking of cursor for the extra mirror, thus can lead to above corruption. [FIX] The extra mirror is never a reliable one, at the beginning of dev-replace, the reliability is zero, while only at the end of the replace it's a fully reliable mirror. We either do the complex tracking, or never trust it. IMHO it's much easier to maintain if we don't trust it at all, and the extra mirror can only benefit for a limited period of time (during replace). Thus this patch would completely remove the ability to use target device as an extra mirror. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:16 +02:00
Qu Wenruo	4871c33baf	btrfs: open_ctree() error handling cleanup Currently open_ctree() still uses two variables for error handling, err and ret. This can be confusing and missing some errors and does not conform to current coding style. This patch will fix the problems by: - Use only ret for error handling - Add proper ret assignment Originally we rely on the default value (-EINVAL) of err to handle errors, but that doesn't really reflects the error. This will change it use the correct error number for the following call sites: * subpage_info allocation * btrfs_free_extra_devids() * btrfs_check_rw_degradable() * cleaner_kthread allocation * transaction_kthread allocation - Add an extra ASSERT() To make sure we error out instead of returning 0. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:16 +02:00
Christoph Hellwig	e2eb02480c	btrfs: cleanup the main loop in btrfs_lookup_bio_sums Introduce a bio_offset variable for the current offset into the bio instead of recalculating it over and over. Remove the now only used once search_len and sector_offset variables, and reduce the scope for count and cur_disk_bytenr. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:16 +02:00
Christoph Hellwig	65886d2b1f	btrfs: remove search_file_offset_in_bio There is no need to search for a file offset in a bio, it is now always provided in bbio->file_offset (set at bio allocation time since `0d495430db` ("btrfs: set bbio->file_offset in alloc_new_bio")). Just use that with the offset into the bio. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:16 +02:00
Johannes Thumshirn	198bd49e5f	btrfs: sink calc_bio_boundaries into its only caller Nowadays calc_bio_boundaries() is a relatively simple function that only guarantees the one bio equals to one ordered extent rule for uncompressed Zone Append bios. Sink it into it's only caller alloc_new_bio(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:16 +02:00
Christoph Hellwig	24e6c80822	btrfs: simplify main loop in submit_extent_page bio_add_page always adds either the entire range passed to it or nothing. Based on that btrfs_bio_add_page can only return a length smaller than the passed in one when hitting the ordered extent limit, which can only happen for writes. Given that compressed writes never even use this code path, this means that all the special cases for compressed extent offset handling are dead code. Reflow submit_extent_page to take advantage of this by inlining btrfs_bio_add_page and handling the ordered extent limit by decrementing it for each added range and thus significantly simplifying the loop. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:16 +02:00
Christoph Hellwig	78a2ef1b7b	btrfs: check for contiguity in submit_extent_page Different loop iterations in btrfs_bio_add_page not only have the same contiguity parameters, but also any non-initial operation operates on a fresh bio anyway. Factor out the contiguity check into a new btrfs_bio_is_contig and only call it once in submit_extent_page before descending into the bio_add_page loop. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:16 +02:00
Christoph Hellwig	5380311fc8	btrfs: simplify the error handling in __extent_writepage_io Remove the has_error and saved_ret variables, and just jump to a goto label for error handling from the only place returning an error from the main loop. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:16 +02:00
Christoph Hellwig	551733372f	btrfs: remove the submit_extent_page return value submit_extent_page always returns 0 since commit `d5e4377d50` ("btrfs: split zone append bios in btrfs_submit_bio"). Change it to a void return type and remove all the unreachable error handling code in the callers. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:16 +02:00
Christoph Hellwig	f8ed4852f3	btrfs: remove the compress_type argument to submit_extent_page Update the compress_type in the btrfs_bio_ctrl after forcing out the previous bio in btrfs_do_readpage, so that alloc_new_bio can just use the compress_type member in struct btrfs_bio_ctrl instead of passing the same information redundantly as a function argument. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:15 +02:00
Christoph Hellwig	a140453bf9	btrfs: rename the this_bio_flag variable in btrfs_do_readpage Rename this_bio_flag to compress_type to match the surrounding code and better document the intent. Also use the proper enum type instead of unsigned long. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:15 +02:00
Christoph Hellwig	c9bc621fb4	btrfs: move the compress_type check out of btrfs_bio_add_page The compress_type can only change on a per-extent basis. So instead of checking it for every page in btrfs_bio_add_page, do the check once in btrfs_do_readpage, which is the only caller of btrfs_bio_add_page and submit_extent_page that deals with compressed extents. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:15 +02:00
Christoph Hellwig	72b505dc57	btrfs: add a wbc pointer to struct btrfs_bio_ctrl Instead of passing down the wbc pointer the deep call chain, just add it to the btrfs_bio_ctrl structure. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:15 +02:00
Christoph Hellwig	794c26e214	btrfs: remove the sync_io flag in struct btrfs_bio_ctrl The sync_io flag is equivalent to wbc->sync_mode == WB_SYNC_ALL, so just check for that and remove the separate flag. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:15 +02:00
Christoph Hellwig	c000bc04ba	btrfs: store the bio opf in struct btrfs_bio_ctrl The bio op and flags never change over the life time of a bio_ctrl, so move it in there instead of passing it down the deep call chain all the way down to alloc_new_bio. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:15 +02:00
Christoph Hellwig	eb8d0c6d04	btrfs: remove the force_bio_submit to submit_extent_page If force_bio_submit, submit_extent_page simply calls submit_one_bio as the first thing. This can just be moved to the only caller that sets force_bio_submit to true. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:15 +02:00
Christoph Hellwig	67998cf438	btrfs: don't set force_bio_submit in read_extent_buffer_subpage When read_extent_buffer_subpage calls submit_extent_page, it does so on a freshly initialized btrfs_bio_ctrl structure that can't have a valid bio to submit. Clear the force_bio_submit parameter to false as there is nothing to submit. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:15 +02:00
Anand Jain	fdf8d595f4	btrfs: open code btrfs_bin_search() btrfs_bin_search() is a simple wrapper that searches for the whole slots by calling btrfs_generic_bin_search() with the starting slot/first_slot preset to 0. This simple wrapper can be open coded as btrfs_bin_search(). Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:15 +02:00
Qu Wenruo	7b31e0451d	btrfs: dev-replace: properly follow its read mode [BUG] Although dev replace ioctl has a way to specify the mode on whether we should read from the source device, it's not properly followed. # mkfs.btrfs -f -d raid1 -m raid1 $dev1 $dev2 # mount $dev1 $mnt # xfs_io -f -c "pwrite 0 32M" $mnt/file # sync # btrfs replace start -r -f 1 $dev3 $mnt And one extra trace is added to scrub_submit(), showing the detail about the bio: btrfs-11569 [005] ... 37.0270: scrub_submit.part.0: devid=1 logical=22036480 phy=22036480 len=16384 btrfs-11569 [005] ... 37.0273: scrub_submit.part.0: devid=1 logical=30457856 phy=30457856 len=32768 btrfs-11569 [005] ... 37.0274: scrub_submit.part.0: devid=1 logical=30507008 phy=30507008 len=49152 btrfs-11569 [005] ... 37.0274: scrub_submit.part.0: devid=1 logical=30605312 phy=30605312 len=32768 btrfs-11569 [005] ... 37.0275: scrub_submit.part.0: devid=1 logical=30703616 phy=30703616 len=65536 btrfs-11569 [005] ... 37.0281: scrub_submit.part.0: devid=1 logical=298844160 phy=298844160 len=131072 ... btrfs-11569 [005] ... 37.0762: scrub_submit.part.0: devid=1 logical=322961408 phy=322961408 len=131072 btrfs-11569 [005] ... 37.0762: scrub_submit.part.0: devid=1 logical=323092480 phy=323092480 len=131072 One can see that all the reads are submitted to devid 1, even if we have specified "-r" option to avoid reading from the source device. [CAUSE] The dev-replace read mode is only set but not followed by scrub code at all. In fact, only common read path is properly following the read mode, but scrub itself has its own read path, thus not following the mode. [FIX] Here we enhance scrub_find_good_copy() to also follow the read mode. The idea is pretty simple, in the first loop, we avoid the following devices: - Missing devices This is the existing condition - The source device if the replace wants to avoid it. And if above loop found no candidate (e.g. replace a single device), then we discard the 2nd condition, and try again. Since we're here, also enhance the function scrub_find_good_copy() by: - Remove the forward declaration - Makes it return int To indicates errors, e.g. no good mirror found. - Add extra error messages Now with the same trace, "btrfs replace start -r" works as expected: btrfs-1213 [000] ... 991.9059: scrub_submit.part.0: devid=2 logical=22036480 phy=1064960 len=16384 btrfs-1213 [000] ... 991.9062: scrub_submit.part.0: devid=2 logical=30457856 phy=9486336 len=32768 btrfs-1213 [000] ... 991.9063: scrub_submit.part.0: devid=2 logical=30507008 phy=9535488 len=49152 btrfs-1213 [000] ... 991.9064: scrub_submit.part.0: devid=2 logical=30605312 phy=9633792 len=32768 btrfs-1213 [000] ... 991.9065: scrub_submit.part.0: devid=2 logical=30703616 phy=9732096 len=65536 btrfs-1213 [000] ... 991.9073: scrub_submit.part.0: devid=2 logical=298844160 phy=277872640 len=131072 btrfs-1213 [000] ... 991.9075: scrub_submit.part.0: devid=2 logical=298975232 phy=278003712 len=131072 btrfs-1213 [000] ... 991.9078: scrub_submit.part.0: devid=2 logical=299106304 phy=278134784 len=131072 ... btrfs-1213 [000] ... 991.9474: scrub_submit.part.0: devid=2 logical=318504960 phy=297533440 len=131072 btrfs-1213 [000] ... 991.9476: scrub_submit.part.0: devid=2 logical=318636032 phy=297664512 len=131072 btrfs-1213 [000] ... 991.9479: scrub_submit.part.0: devid=2 logical=318767104 phy=297795584 len=131072 Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:15 +02:00
Christoph Hellwig	f9327a70c1	btrfs: fold finish_compressed_bio_write into btrfs_finish_compressed_write_work Fold finish_compressed_bio_write into its only caller as there is no reason to keep them separate. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:15 +02:00
Christoph Hellwig	a959a1745d	btrfs: don't clear page->mapping in btrfs_free_compressed_pages No one ever set ->mapping on these pages, so don't bother clearing it. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:15 +02:00
Christoph Hellwig	32586c5bca	btrfs: factor out a btrfs_free_compressed_pages helper Share the code to free the compressed pages and the array to hold them into a common helper. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:14 +02:00
Christoph Hellwig	10e924bc32	btrfs: factor out a btrfs_add_compressed_bio_pages helper Factor out a common helper to add the compressed_bio pages to the bio that is shared by the compressed read and write path. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:14 +02:00
Christoph Hellwig	d7294e4dee	btrfs: use the bbio file offset in add_ra_bio_pages struct btrfs_bio now has a file_offset field set up by all submitters. Use that value combined with the bio size in add_ra_bio_pages to calculate the last offset in the bio. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:14 +02:00
Christoph Hellwig	e7aff33e31	btrfs: use the bbio file offset in btrfs_submit_compressed_read struct btrfs_bio now has a file_offset field set up by all submitters. Use that in btrfs_submit_compressed_read instead of recalculating the value. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:14 +02:00
Christoph Hellwig	798c9fc74d	btrfs: remove redundant free_extent_map in btrfs_submit_compressed_read em can't be non-NULL after the free_extent_map label. Also remove the now pointless clearing of em to NULL after freeing it. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:14 +02:00
Christoph Hellwig	544fe4a903	btrfs: embed a btrfs_bio into struct compressed_bio Embed a btrfs_bio into struct compressed_bio. This avoids potential (so far theoretical) deadlocks due to nesting of btrfs_bioset allocations for the original read bio and the compressed bio, and avoids an extra memory allocation in the I/O path. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:14 +02:00
Qu Wenruo	18d758a2d8	btrfs: replace btrfs_io_context::raid_map with a fixed u64 value In btrfs_io_context structure, we have a pointer raid_map, which indicates the logical bytenr for each stripe. But considering we always call sort_parity_stripes(), the result raid_map[] is always sorted, thus raid_map[0] is always the logical bytenr of the full stripe. So why we waste the space and time (for sorting) for raid_map? This patch will replace btrfs_io_context::raid_map with a single u64 number, full_stripe_start, by: - Replace btrfs_io_context::raid_map with full_stripe_start - Replace call sites using raid_map[0] to use full_stripe_start - Replace call sites using raid_map[i] to compare with nr_data_stripes. The benefits are: - Less memory wasted on raid_map It's sizeof(u64) * num_stripes vs sizeof(u64). It'll always save at least one u64, and the benefit grows larger with num_stripes. - No more weird alloc_btrfs_io_context() behavior As there is only one fixed size + one variable length array. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:14 +02:00
Qu Wenruo	1faf388506	btrfs: use an efficient way to represent source of duplicated stripes For btrfs dev-replace, we have to duplicate writes to the source device into the target device. For non-RAID56, all writes into the same mapped ranges are sharing the same content, thus they don't really need to bother anything. (E.g. in btrfs_submit_bio() for non-RAID56 range we just submit the same write to all involved devices). But for RAID56, all stripes contain different content, thus we must have a clear mapping of which stripe is duplicated from which original stripe. Currently we use a complex way using tgtdev_map[] array, e.g: num_tgtdevs = 1 tgtdev_map[0] = 0 <- Means stripes[0] is not involved in replace. tgtdev_map[1] = 3 <- Means stripes[1] is involved in replace, and it's duplicated to stripes[3]. tgtdev_map[2] = 0 <- Means stripes[2] is not involved in replace. But this is wasting some space, and ignores one important thing for dev-replace, there is at most one running replace. Thus we can change it to a fixed array to represent the mapping: replace_nr_stripes = 1 replace_stripe_src = 1 <- Means stripes[1] is involved in replace. thus the extra stripe is a copy of stripes[1] By this we can save some space for bioc on RAID56 chunks with many devices. And we get rid of one variable sized array from bioc. Thus the patch involves the following changes: - Replace @num_tgtdevs and @tgtdev_map[] with @replace_nr_stripes and @replace_stripe_src. @num_tgtdevs is just renamed to @replace_nr_stripes. While the mapping is completely changed. - Add extra ASSERT()s for RAID56 code - Only add two more extra stripes for dev-replace cases. As we have an upper limit on how many dev-replace stripes we can have. - Unify the behavior of handle_ops_on_dev_replace() Previously handle_ops_on_dev_replace() go two different paths for WRITE and GET_READ_MIRRORS. Now unify them by always going the WRITE path first (with at most 2 replace stripes), then if we're doing GET_READ_MIRRORS and we have 2 extra stripes, just drop one stripe. - Remove the @real_stripes argument from alloc_btrfs_io_context() As we don't need the old variable length array any more. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:14 +02:00
Qu Wenruo	4ced85f81a	btrfs: reduce type width of btrfs_io_contexts That structure is our ultimate object for all __btrfs_map_block() related functions. We have some hard to understand members, like tgtdev_map, but without any comments. This patch will improve the situation: - Add extra comments for num_stripes, mirror_num, num_tgtdevs and tgtdev_map[] Especially for the last two members, add a dedicated (thus very long) comments for them, with example to explain it. - Shrink those int members to u16. In fact our on-disk format is only using u16 for num_stripes, thus no need to use int at all. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:14 +02:00
Qu Wenruo	be5c7edbfd	btrfs: simplify the bioc argument for handle_ops_on_dev_replace() There is no memory re-allocation for handle_ops_on_dev_replace(), thus we don't need to pass a btrfs_io_context pointer. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-17 18:01:14 +02:00

1 2 3 4 5 ...

1171182 Commits