The old page API is being gradually replaced and converted to use folios
to improve code readability and avoid repeated conversions between pages
and folios. Based on the previous patch, the compression path can now use
folios directly, without converting to pages.
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The old page API is being gradually replaced and converted to use folios
to improve code readability and avoid repeated conversions between pages
and folios. memcpy_to_page() can be replaced with memcpy_to_folio().
There is no memzero_folio(), but memzero_page() can be replaced
equivalently by folio_zero_range().
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The old page API is being gradually replaced and converted to use folios
to improve code readability and avoid repeated conversions between pages
and folios. memcpy_to_page() can be replaced with memcpy_to_folio().
There is no memzero_folio(), but memzero_page() can be replaced
equivalently by folio_zero_range().
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The old page API is being gradually replaced and converted to use folios
to improve code readability and avoid repeated conversions between pages
and folios. memcpy_to_page() can be replaced with memcpy_to_folio().
There is no memzero_folio(), but memzero_page() can be replaced
equivalently by folio_zero_range().
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Historically we've held the extent lock throughout the entire read.
There have been a few reasons for this, but mostly it has just caused us
problems. For example, this prevents us from allowing page faults
during direct io reads, because we could deadlock. This has forced us
to only allow 4k reads at a time for io_uring NOWAIT requests because we
have no idea if we'll be forced to page fault and thus have to do a
whole lot of work.
On the buffered side we are protected by the page lock: as long as we're
reading, things like buffered writes, punch hole, and even direct IO to a
certain degree will get hung up on the page lock while the page is in
flight.
On the direct side we have the dio extent lock, which acts much like the
extent lock did before this patch, but only for direct reads. This
protects direct reads from concurrent direct writes,
while we're protected from buffered writes via the inode lock.
Now that we're protected in all cases, narrow the extent lock to the
part where we're getting the extent map to submit the reads, no longer
holding the extent lock for the entire read operation. Push the extent
lock down into do_readpage() so that we're only grabbing it when looking
up the extent map. This portion was contributed by Goldwyn.
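To illustrate the narrowed locking, here is a minimal sketch of the lookup
step described above (the helper name and the exact signatures of
lock_extent(), unlock_extent() and btrfs_get_extent() are assumptions for
illustration, not the actual do_readpage() code):

  /* Hedged sketch: hold the extent lock only across the extent map lookup. */
  static struct extent_map *get_extent_map_for_read(struct btrfs_inode *inode,
                                                    u64 start, u64 len)
  {
          struct extent_map *em;

          lock_extent(&inode->io_tree, start, start + len - 1, NULL);
          em = btrfs_get_extent(inode, NULL, start, len);
          unlock_extent(&inode->io_tree, start, start + len - 1, NULL);

          /* The read itself is submitted without the extent lock held. */
          return em;
  }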
Co-developed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The function name is a bit misleading as it submits the btrfs_bio
(bbio). Rename it so we can use btrfs_submit_bio() when an actual bio is
submitted.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Willy is going to get rid of page->index, and add_ra_bio_pages uses
page->index. Make his life easier by converting add_ra_bio_pages to use
folios so that we are no longer using page->index.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
At add_ra_bio_pages() we are accessing the extent map to calculate
'add_size' after we dropped our reference on the extent map, resulting
in a use-after-free. Fix this by computing 'add_size' before dropping our
extent map reference.
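The fixed ordering boils down to the following pattern (a hedged sketch,
the expression for 'add_size' only approximates the add_ra_bio_pages()
code):

  /* Compute add_size while we still hold our extent map reference. */
  add_size = min(em->start + em->len, page_end + 1) - cur;
  free_extent_map(em);
  /* From here on 'em' must not be dereferenced anymore. */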
Reported-by: syzbot+853d80cba98ce1157ae6@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/000000000000038144061c6d18f2@google.com/
Fixes: 6a4049102055 ("btrfs: subpage: make add_ra_bio_pages() compatible")
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The function btrfs_alloc_folio_array() is only utilized in
btrfs_submit_compressed_read() and no other location, and the only
caller does not use the @extra_gfp parameter, so remove it.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pass a struct btrfs_inode to btrfs_compress_heuristic() as it's an
internal interface, allowing us to remove some uses of BTRFS_I().
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
The structure is internal, so we should use struct btrfs_inode for it,
allowing us to remove some uses of BTRFS_I().
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
The member extent_map::block_start can be calculated as
extent_map::disk_bytenr + extent_map::offset for regular extents, and is
otherwise just extent_map::disk_bytenr.
This is already validated by validate_extent_map(), so now we can
remove the member.
However there is a special case in btrfs_create_dio_extent() where, for
NOCOW/PREALLOC ordered extents, we cannot directly use the resulting
btrfs_file_extent, as btrfs_split_ordered_extent() cannot handle them
yet.
So for that call site, we pass file_extent->disk_bytenr +
file_extent->num_bytes as disk_bytenr for the ordered extent, and 0 for
offset.
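The relation described above amounts to roughly the following helper (a
hedged sketch; em_is_regular() is a hypothetical predicate standing in for
the real hole/inline checks):

  /* Hedged sketch of deriving the old block_start from the new members. */
  static u64 em_block_start(const struct extent_map *em)
  {
          if (em_is_regular(em))
                  return em->disk_bytenr + em->offset;
          /* Holes and inline extents keep the special disk_bytenr values. */
          return em->disk_bytenr;
  }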
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The extent_map::block_len is either extent_map::len (non-compressed
extent) or extent_map::disk_num_bytes (compressed extent).
Since we already have sanity checks to do the cross-checks between the
new and old members, we can drop the old extent_map::block_len now.
Most call sites can manually select extent_map::len or
extent_map::disk_num_bytes, since most if not all of them have already
checked whether the extent is compressed.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since we have extent_map::offset, the old extent_map::orig_start is just
extent_map::start - extent_map::offset for non-hole/inline extents.
And since the new extent_map::offset is already verified by
validate_extent_map() while the old orig_start is not, let's just remove
the old member from all call sites.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Fix typo in the end IO compression callbacks, from "comprssed" to
"compressed".
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
A comment from Filipe on one of my previous cleanups brought my
attention to a new helper we have for getting the root id of a root,
which makes the code easier to read.
The changes were made with the following Coccinelle semantic patch:
// <smpl>
@@
expression E,E1;
@@
(
E->root_key.objectid = E1
|
- E->root_key.objectid
+ btrfs_root_id(E)
)
// </smpl>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor style fixups ]
Signed-off-by: David Sterba <dsterba@suse.com>
For both compression and decompression paths, we always require a
"struct page **pages" and an "unsigned long nr_pages". This involves
quite a few parts of the btrfs compression paths:
- All the compression entry points
- compressed_bio structure
This affects both compression and decompression.
- async_extent structure
Unfortunately with all those involved parts, there is no good way to
split the conversion into smaller patches while still compiling, so do
it as one big conversion in one go.
Please note this is a direct page->folio conversion, with no change to
the page sized folio requirement yet.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor style fixups ]
Signed-off-by: David Sterba <dsterba@suse.com>
Currently we have two wrappers to allocate and free a page for
compression usage:
- btrfs_alloc_compr_page()
- btrfs_free_compr_page()
The allocator would try to grab a page from the pool, and only allocate
a new page if the pool is empty.
The reclaimer would check if the pool is full, and if not full it would
put the page into the pool.
This patch converts both helpers to use folio interfaces, allowing
further conversion of the compression path to folios.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For all the supported compression algorithms, the compression path
always needs to grab pages from the page cache, then do the compression.
Normally we would get a page reference without any problem, since the
write path should have already locked the pages in the write range.
For the sake of error handling, we should handle the page cache miss
case.
Add a common wrapper, btrfs_compress_find_get_page(), which calls
find_get_page() and does the error handling along with an error message.
Callers inside the compression path only need to call
btrfs_compress_find_get_page(), and error out if it returns an error.
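A rough sketch of the wrapper's shape (hedged; the error reporting and
return convention here are assumptions, not the exact implementation):

  /* Hedged sketch: grab the page that the write path should have locked. */
  static struct page *btrfs_compress_find_get_page(struct address_space *mapping,
                                                   u64 start)
  {
          struct page *page;

          page = find_get_page(mapping, start >> PAGE_SHIFT);
          if (page)
                  return page;

          /* Should not happen, the write path has the range locked. */
          WARN(1, "missing page at offset %llu in the write range", start);
          return ERR_PTR(-ENOENT);
  }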
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since commit a440d48c7f93 ("Btrfs: heuristic: implement sampling
logic"), btrfs_compress_heuristic() is no longer a simple "return true",
but does more complex checks to determine if we should compress.
Thus the comment is dead and can be confusing, so just remove it.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a convenience helper to get a fs_info from a VFS inode pointer
instead of open coding the chain or using btrfs_sb() that in some cases
does one more pointer hop. This is implemented as a macro (still with
type checking) so we don't need full definitions of struct btrfs_inode,
btrfs_root or btrfs_fs_info.
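One possible shape of such a macro (a sketch; the _Generic() based type
check is an assumption of how the type checking can be kept without full
struct definitions):

  /* Hedged sketch: accept only a struct inode pointer and chase the chain. */
  #define inode_to_fs_info(_inode)                                        \
          (BTRFS_I(_Generic((_inode), struct inode *: (_inode)))->root->fs_info)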
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add convenience helpers to get a fs_info from a page or folio pointer
instead of open coding the chain or using btrfs_sb() that in some cases
does one more pointer hop. This is implemented as a macro (still with
type checking) so we don't need full definitions of struct page, folio,
btrfs_root and btrfs_fs_info. The helpers can't be static inlines as this
would create a loop between ctree.h <-> fs.h, or the headers would have
to be restructured.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
With the help of neovim, LSP and clangd we can identify header files
that do not actually need to be included in the .c files. This is focused
only on removal (with minor fixups); further cleanups are possible but
will require doing the header files properly with forward declarations,
minimized includes and include-what-you-use care.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
If we have a filesystem with 4k sectorsize, and an inlined compressed
extent created like this:
  item 4 key (257 INODE_ITEM 0) itemoff 15863 itemsize 160
          generation 8 transid 8 size 4096 nbytes 4096
          block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
          sequence 1 flags 0x0(none)
  item 5 key (257 INODE_REF 256) itemoff 15839 itemsize 24
          index 2 namelen 14 name: source_inlined
  item 6 key (257 EXTENT_DATA 0) itemoff 15770 itemsize 69
          generation 8 type 0 (inline)
          inline extent data size 48 ram_bytes 4096 compression 1 (zlib)
The inode has an inline compressed extent at file offset 0, and its
decompressed size is 4K, allowing us to reflink that 4K range to another
location (which will not be compressed).
If we do such reflink on a subpage system, it would fail like this:
# xfs_io -f -c "reflink $mnt/source_inlined 0 60k 4k" $mnt/dest
XFS_IOC_CLONE_RANGE: Input/output error
[CAUSE]
In zlib_decompress(), we didn't treat @start_byte as just a page offset,
but also used it as an indicator of whether we should switch our output
buffer.
In reality, for subpage cases, although @start_byte can be non-zero,
we should never switch input/output buffer, since the whole input/output
buffer should never exceed one sector.
Note: the above assumption only breaks if we're going to support
multi-page sectorsize.
Thus the current code using @start_byte as a condition to switch
input/output buffer or finish the decompression is completely incorrect.
[FIX]
The fix involves several modifications:
- Rename @start_byte to @dest_pgoff to properly express its meaning
- Add an extra ASSERT() inside btrfs_decompress() to make sure the
input/output size never exceeds one sector.
- Use Z_FINISH flag to make sure the decompression happens in one go
- Remove the loop needed to switch input/output buffers
- Use correct destination offset inside the destination page
- Consider early end as an error
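The single-shot approach boils down to roughly the following (a sketch
using the regular zlib API rather than the kernel's zlib_* wrappers;
buffer names like data_in, kaddr, dest_pgoff and sectorsize are
assumptions):

  /* Hedged sketch: decompress an inline extent into one sector in one go. */
  z_stream strm = { 0 };
  int ret;

  inflateInit(&strm);
  strm.next_in = data_in;              /* inline extent payload */
  strm.avail_in = compressed_len;
  strm.next_out = kaddr + dest_pgoff;  /* offset inside the destination page */
  strm.avail_out = sectorsize;         /* output never exceeds one sector */

  ret = inflate(&strm, Z_FINISH);      /* must finish in a single call */
  if (ret != Z_STREAM_END)             /* early end is treated as an error */
          ret = -EIO;
  else
          ret = 0;
  inflateEnd(&strm);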
After the fix, even on 64K page sized aarch64, the above reflink now
works as expected:
# xfs_io -f -c "reflink $mnt/source_inlined 0 60k 4k" $mnt/dest
linked 4096/4096 bytes at offset 61440
And it results in a correct file layout:
  item 9 key (258 INODE_ITEM 0) itemoff 15542 itemsize 160
          generation 10 transid 10 size 65536 nbytes 4096
          block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
          sequence 1 flags 0x0(none)
  item 10 key (258 INODE_REF 256) itemoff 15528 itemsize 14
          index 3 namelen 4 name: dest
  item 11 key (258 XATTR_ITEM 3817753667) itemoff 15445 itemsize 83
          location key (0 UNKNOWN.0 0) type XATTR
          transid 10 data_len 37 name_len 16
          name: security.selinux
          data unconfined_u:object_r:unlabeled_t:s0
  item 12 key (258 EXTENT_DATA 61440) itemoff 15392 itemsize 53
          generation 10 type 1 (regular)
          extent data disk byte 13631488 nr 4096
          extent data offset 0 nr 4096 ram 4096
          extent compression 0 (none)
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If we still use the old page based iterator functions, like
bio_for_each_segment_all(), we can hit middle pages of a folio (compound
page).
In that case, if we set any page flag on those middle pages, we can
easily trigger VM_BUG_ON(), as compound page flags should follow their
flag policies (normally only set on the head or tail pages).
To avoid such problem in the future full folio migration, here we do:
- Change from bio_for_each_segment_all() to bio_for_each_folio_all()
This completely removes the ability to access the middle page.
- Add extra ASSERT()s for data read/write paths
To ensure we only get single page backed folios for data for now.
- Rename those end io functions to follow a certain schema
* end_bbio_compressed_read()
* end_bbio_compressed_write()
These two endio functions don't set any page flags, as they use pages
not mapped to any address space.
They can be very good candidates for higher order folio testing.
And they are shared between compression and encoded IO.
* end_bbio_data_read()
* end_bbio_data_write()
* end_bbio_meta_read()
* end_bbio_meta_write()
The old function names are not unified:
- end_bio_extent_writepage()
- end_bio_extent_readpage()
- extent_buffer_write_end_io()
- extent_buffer_read_end_io()
They share no schema on where the "end_*io" string should be, and using
just "extent_buffer" and "extent" to distinguish data and metadata paths
can be confusing.
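A minimal sketch of the folio based iteration pattern (illustrative only,
not the actual btrfs end IO code):

  /*
   * Hedged sketch: walk the completed bio folio by folio, so middle pages
   * of a compound page are never touched directly.
   */
  static void end_bbio_data_read_sketch(struct bio *bio)
  {
          struct folio_iter fi;

          bio_for_each_folio_all(fi, bio) {
                  struct folio *folio = fi.folio;

                  /* Data folios are expected to be single page for now. */
                  ASSERT(folio_order(folio) == 0);

                  if (!bio->bi_status)
                          folio_mark_uptodate(folio);
                  folio_unlock(folio);
          }
          bio_put(bio);
  }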
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Although subpage itself conflicts with higher order folios, since subpage
(sectorsize < PAGE_SIZE and nodesize < PAGE_SIZE) means we will never
need higher order folios, there is a hidden pitfall:
- btrfs_page_*() helpers
Those helpers are an abstraction to handle both subpage and non-subpage
cases, which means we're going to pass page pointers to those helpers.
And since those helpers are shared between data and metadata paths, it's
unavoidable to let them handle folios, including higher order folios.
Meanwhile for the true subpage case, we should only have single page
backed folios anyway, thus add a new ASSERT() in btrfs_subpage_assert()
to ensure that.
Also since those helpers are shared between both data and metadata, add
some extra ASSERT()s for the data path to make sure we only get single
page backed folios for now.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently alloc_extent_buffer() utilizes find_or_create_page() to
allocate one page at a time for an extent buffer.
This method has the following disadvantages:
- find_or_create_page() is the legacy way of allocating new pages
With the new folio infrastructure, find_or_create_page() is just
redirected to filemap_get_folio().
- Lacks a way to support higher order (order >= 1) folios
As we can not yet let filemap give us a higher order folio.
This patch changes the workflow in the following way:
Old                                | new
-----------------------------------+-------------------------------------
                                   | ret = btrfs_alloc_page_array();
for (i = 0; i < num_pages; i++) {  | for (i = 0; i < num_pages; i++) {
    p = find_or_create_page();     |     ret = filemap_add_folio();
    /* Attach page private */      |     /* Reuse page cache if needed */
    /* Reused eb if needed */      |
                                   |     /* Attach page private and
                                   |        reuse eb if needed */
                                   | }
By this we split the page allocation and private attaching into two
parts, allowing us to update each part more easily in the future and to
migrate to folio interfaces (especially for possible higher order
folios).
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently, in struct extent_map, we use an unsigned int (32 bits) to
identify the compression type of an extent and an unsigned long (64 bits
on a 64 bits platform, 32 bits otherwise) for flags. We are only using
6 different flags, so an unsigned long is excessive and we can use flags
to identify the compression type instead of using a dedicated 32 bits
field.
We can easily have tens or hundreds of thousands (or more) of extent maps
on busy and large filesystems, specially with compression enabled or many
or large files with tons of small extents. So it's convenient to have the
extent_map structure as small as possible in order to use less memory.
So remove the compression type field from struct extent_map, use flags
to identify the compression type and shorten the flags field from an
unsigned long to a u32. This saves 8 bytes (on 64 bits platforms) and
reduces the size of the structure from 136 bytes down to 128 bytes, using
now only two cache lines, and increases the number of extent maps we can
have per 4K page from 30 to 32. By using a u32 for the flags instead of
an unsigned long, we no longer use test_bit(), set_bit() and clear_bit(),
but that level of atomicity is not needed as most flags are never cleared
once set (before adding an extent map to the tree), and the ones that can
be cleared or set after an extent map is added to the tree are always
modified while holding the write lock on the extent map tree, while the
reader holds a lock on the tree or tests for a flag that never changes
once the extent map is in the tree (such as compression flags).
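The idea of encoding the compression type in the (now u32) flags field can
be sketched as follows (the flag and helper names here are illustrative,
not necessarily the ones used in the code):

  /* Hedged sketch: compression types become individual bits in a u32. */
  #define EXTENT_FLAG_COMPRESS_ZLIB       (1U << 0)
  #define EXTENT_FLAG_COMPRESS_LZO        (1U << 1)
  #define EXTENT_FLAG_COMPRESS_ZSTD       (1U << 2)

  static inline bool em_is_compressed(u32 flags)
  {
          return flags & (EXTENT_FLAG_COMPRESS_ZLIB |
                          EXTENT_FLAG_COMPRESS_LZO |
                          EXTENT_FLAG_COMPRESS_ZSTD);
  }

  /* Plain bit operations replace the atomic test_bit()/set_bit() helpers. */
  static inline void em_set_zstd(u32 *flags)
  {
          *flags |= EXTENT_FLAG_COMPRESS_ZSTD;
  }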
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The pages are now allocated and freed centrally, so we can extend the
logic to manage the lifetime. The main idea is to keep a few recently
used pages and hand them to all writers. Ideally we won't have to go to
the allocator at all (a slight performance gain) and also raise the
chance that we'll have the pages available (slightly increased
reliability).
In order to avoid gathering too many pages, the shrinker is attached to
the cache so we can free them when MM demands it. The first
implementation will drain the whole cache. Further this can be refined
to keep some minimal number of pages for emergency purposes. The
ultimate goal is to avoid memory allocation failures on the write out
path from the compression.
The pool threshold is set to cover a full BTRFS_MAX_COMPRESSED / PAGE_SIZE
for the minimal thread pool, which is 8 (btrfs_init_fs_info()). This is
128K / 4K * 8 = 256 pages at maximum, which is 1MiB.
This is for all filesystems currently mounted; with heavy use of
compression IO the allocator is still needed. The cache helps for short
burst IO.
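The pool logic amounts to roughly the following (a hedged sketch with a
simple spinlock protected list and an assumed structure layout, not the
exact implementation):

  #define COMPR_POOL_THRESHOLD  256  /* BTRFS_MAX_COMPRESSED / PAGE_SIZE * 8 */

  static struct {
          spinlock_t lock;
          struct list_head list;
          int count;
  } compr_pool;

  struct page *btrfs_alloc_compr_page(void)
  {
          struct page *page = NULL;

          spin_lock(&compr_pool.lock);
          if (compr_pool.count > 0) {
                  page = list_first_entry(&compr_pool.list, struct page, lru);
                  list_del_init(&page->lru);
                  compr_pool.count--;
          }
          spin_unlock(&compr_pool.lock);

          if (page)
                  return page;
          return alloc_page(GFP_NOFS);
  }

  void btrfs_free_compr_page(struct page *page)
  {
          bool cached = false;

          spin_lock(&compr_pool.lock);
          if (compr_pool.count < COMPR_POOL_THRESHOLD) {
                  list_add(&page->lru, &compr_pool.list);
                  compr_pool.count++;
                  cached = true;
          }
          spin_unlock(&compr_pool.lock);

          if (!cached)
                  put_page(page);
  }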
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is a preparation for managing compression pages in a cache-like
manner, instead of asking the allocator each time. The common allocation
and free wrappers are introduced and are functionally equivalent to the
current code.
The freeing helpers need to be carefully placed where the last reference
is dropped. This is either after directly allocating (error handling)
or when there are no other users of the pages (after copying the contents).
It's safe to not use the helper and use put_page(), which will handle
the reference count. Not using the helper only means a lower number of
pages could be reused without passing them back to the allocator.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We sync the kernel files to userspace, and the 'errno' symbol is defined
by the standard library. This does not matter in the kernel, but the
parameters or local variables could clash. Rename them all.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The 'btrfs_compressed_bioset' struct isn't exported outside of the
fs/btrfs/compression.c file, so make it static to fix the following
sparse warning:
fs/btrfs/compression.c:40:16: warning: symbol 'btrfs_compressed_bioset' was not declared. Should it be static?
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Use the btrfs_finish_ordered_extent helper to complete compressed writes
using the bbio->ordered pointer instead of requiring an rbtree lookup
in the otherwise equivalent btrfs_mark_ordered_io_finished called from
btrfs_writepage_endio_finish_ordered.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add a pointer to the ordered_extent to the existing union in struct
btrfs_bio, so all code dealing with data write bios can just use a
pointer dereference to retrieve the ordered_extent instead of doing
multiple rbtree lookups per I/O.
The reference to this ordered_extent is dropped at end I/O time,
which implies that an extra one must be acquired when the bio is split.
This also requires moving the btrfs_extract_ordered_extent call into
btrfs_split_bio so that the invariant of always having a valid
ordered_extent reference for the btrfs_bio is kept.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_submit_compressed_write always operates on a single ordered_extent.
Make that explicit by using btrfs_alloc_ordered_extent in the callers
and passing the ordered_extent to btrfs_submit_compressed_write. This
will help with storing an ordered_extent pointer in the btrfs_bio in
subsequent patches.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently buffered writeback bios are allowed to span multiple
ordered_extents, although that basically never actually happens since
commit 4a445b7b6178 ("btrfs: don't merge pages into bio if their page
offset is not contiguous").
Supporting bios that span ordered_extents complicates the file
checksumming code, and prevents us from adding an ordered_extent pointer
to the btrfs_bio structure. Use the existing code that limits a bio to a
single ordered_extent for zoned device writes for all writes.
This allows removing the REQ_BTRFS_ONE_ORDERED flag, and the
handling of multiple ordered_extents in btrfs_csum_one_bio.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
PageError is not used by the VFS/MM and deprecated because it uses up a
page bit and has no coherent rules. Instead read errors are usually
propagated by not setting or clearing the uptodate bit, and write errors
are propagated through the address_space. Btrfs now only sets the flag
and never clears it for data pages, so just remove all places setting it,
and the subpage error bit.
Note that the error propagation for superblock writes that work on the
block device mapping still uses PageError for now, but that will be
addressed in a separate series.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Given that read recovery for data I/O is handled in the storage layer,
the mirror_num argument to btrfs_submit_compressed_read is always 0,
so remove it.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Use SECTOR_SHIFT while converting a physical address to an LBA; this
makes it more readable.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently we're doing a lot of work for btrfs_bio:
- Checksum verification for data read bios
- Bio splits if it crosses stripe boundary
- Read repair for data read bios
However for the incoming scrub patches, we don't want this extra
functionality at all, just plain logical + mirror -> physical mapping
ability.
Thus here we do the following changes:
- Introduce btrfs_bio::fs_info
This is for the new scrub specific btrfs_bio, which would not populate
btrfs_bio::inode.
Thus we need such a new member to grab a fs_info.
This new member will always be populated.
- Replace @inode argument with @fs_info for btrfs_bio_init() and its
caller
Since @inode is no longer a mandatory member, replace it with
@fs_info, and let involved users populate @inode.
- Skip checksum verification and generation if @bbio->inode is NULL
- Add extra ASSERT()s
To make sure:
* bbio->inode is properly set for involved read repair path
* if @file_offset is set, bbio->inode is also populated
- Grab @fs_info from @bbio directly
We can no longer go @bbio->inode->root->fs_info, as bbio->inode can be
NULL. This involves:
* btrfs_simple_end_io()
* should_async_write()
* btrfs_wq_submit_bio()
* btrfs_use_zone_append()
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_submit_compressed_write should not have to care if it is called
from a helper thread or not. Move the kthread_associate_blkcg handling
into submit_one_async_extent, as that is the one caller that needs it.
Also move the assignment of REQ_CGROUP_PUNT into cow_file_range_async,
as that is the routine that sets up the helper thread offload.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_add_compressed_bio_pages is needlessly complicated. Instead
of iterating over the logical disk offset just to add pages to the bio,
use a simple offset starting at 0, which also removes most of the
claiming. Additionally __bio_add_page already takes care of the
assert that the bio is always properly sized, and btrfs_submit_bio
called right after asserts that the bio size is non-zero.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Adding pages to a bio has nothing to do with the sector. Move the
assignment to the two callers in preparation for cleaning up
btrfs_add_compressed_bio_pages.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The original bio must be a btrfs_bio, so store a pointer to the
btrfs_bio for better type checking.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_submit_compressed_read expects the bio passed to it to be embedded
into a btrfs_bio structure. Pass the btrfs_bio directly to increase type
safety and make the code self-documenting.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_submit_bio expects the bio passed to it to be embedded into a
btrfs_bio structure. Pass the btrfs_bio directly to increase type
safety and make the code self-documenting.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
All algorithms have to fill the remainder of the orig_bio with zeroes,
so do it in common code.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Fold finish_compressed_bio_write into its only caller as there is no
reason to keep them separate.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
No one ever set ->mapping on these pages, so don't bother clearing it.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Factor the code that frees the compressed pages and the array holding
them into a common helper.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>