We can't trust write pointers when the previous mount was not
successfully unmounted.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Rather than in range of [0, max_file_blocks()), since data after EOF
is alwasy zero, it's unnecessary to preload mapping info of the data.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
With gcc and W=1 option, there's a warning like this:
fs/f2fs/compress.c: In function ‘f2fs_init_page_array_cache’:
fs/f2fs/compress.c:1984:47: error: ‘%u’ directive writing between
1 and 7 bytes into a region of size between 5 and 8
[-Werror=format-overflow=]
1984 | sprintf(slab_name, "f2fs_page_array_entry-%u:%u", MAJOR(dev),
MINOR(dev));
| ^~
String "f2fs_page_array_entry-%u:%u" can up to 35. The first "%u" can up
to 4 and the second "%u" can up to 7, so total size is "24 + 4 + 7 = 35".
slab_name's size should be 35 rather than 32.
Cc: stable@vger.kernel.org
Signed-off-by: Su Hui <suhui@nfschina.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Otherwise, it may print random physical block address in tracepoint
of f2fs_map_blocks() as below:
f2fs_map_blocks: dev = (253,16), ino = 2297, file offset = 0, start blkaddr = 0xa356c421, len = 0x0, flags = 0
Fixes: c4020b2da4c9 ("f2fs: support F2FS_IOC_PRECACHE_EXTENTS")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This allows f2fs to support cases where the block size = page size for
both 4K and 16K block sizes. Other sizes should work as well, should the
need arise. This does not currently support 4K Block size filesystems if
the page size is 16K.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch tries to preload extent_cache given POSIX_FADV_WILLNEED, which is
more useful for generic usecases.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If file has both cold and compress flag, during f2fs_ioc_compress_file(),
f2fs will trigger IPU for non-compress cluster and OPU for compress
cluster, so that, data of the file may be fragmented.
Fix it by always triggering OPU for IOs from user mode compression.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
syzbot reports a kernel bug as below:
F2FS-fs (loop1): detect filesystem reference count leak during umount, type: 10, count: 1
kernel BUG at fs/f2fs/super.c:1639!
CPU: 0 PID: 15451 Comm: syz-executor.1 Not tainted 6.5.0-syzkaller-09338-ge0152e7481c6 #0
RIP: 0010:f2fs_put_super+0xce1/0xed0 fs/f2fs/super.c:1639
Call Trace:
generic_shutdown_super+0x161/0x3c0 fs/super.c:693
kill_block_super+0x3b/0x70 fs/super.c:1646
kill_f2fs_super+0x2b7/0x3d0 fs/f2fs/super.c:4879
deactivate_locked_super+0x9a/0x170 fs/super.c:481
deactivate_super+0xde/0x100 fs/super.c:514
cleanup_mnt+0x222/0x3d0 fs/namespace.c:1254
task_work_run+0x14d/0x240 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x210/0x240 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
syscall_exit_to_user_mode+0x1d/0x60 kernel/entry/common.c:296
do_syscall_64+0x44/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x63/0xcd
In f2fs_put_super(), it tries to do sanity check on dirty and IO
reference count of f2fs, once there is any reference count leak,
it will trigger panic.
The root case is, during f2fs_put_super(), if there is any IO error
in f2fs_wait_on_all_pages(), we missed to truncate meta_inode's page
cache later, result in panic, fix this case.
Fixes: 20872584b8c0 ("f2fs: fix to drop all dirty meta/node pages during umount()")
Reported-by: syzbot+ebd7072191e2eddd7d6e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/000000000000a14f020604a62a98@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
With below script, redundant compress extension will be parsed and added
by parse_options(), because parse_options() doesn't check whether the
extension is existed or not, fix it.
1. mount -t f2fs -o compress_extension=so /dev/vdb /mnt/f2fs
2. mount -t f2fs -o remount,compress_extension=so /mnt/f2fs
3. mount|grep f2fs
/dev/vdb on /mnt/f2fs type f2fs (...,compress_extension=so,compress_extension=so,...)
Fixes: 4c8ff7095bef ("f2fs: support data compression")
Fixes: 151b1982be5d ("f2fs: compress: add nocompress extensions support")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch covers sanity check logic on cluster w/ CONFIG_F2FS_CHECK_FS,
otherwise, there will be performance regression while querying cluster
mapping info.
Callers of f2fs_is_compressed_cluster() only care about whether cluster
is compressed or not, rather than # of valid blocks in compressed cluster,
so, let's adjust f2fs_is_compressed_cluster()'s logic according to
caller's requirement.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Call trace:
__memcpy+0x128/0x250
f2fs_read_multi_pages+0x940/0xf7c
f2fs_mpage_readpages+0x5a8/0x624
f2fs_readahead+0x5c/0x110
page_cache_ra_unbounded+0x1b8/0x590
do_sync_mmap_readahead+0x1dc/0x2e4
filemap_fault+0x254/0xa8c
f2fs_filemap_fault+0x2c/0x104
__do_fault+0x7c/0x238
do_handle_mm_fault+0x11bc/0x2d14
do_mem_abort+0x3a8/0x1004
el0_da+0x3c/0xa0
el0t_64_sync_handler+0xc4/0xec
el0t_64_sync+0x1b4/0x1b8
In f2fs_read_multi_pages(), once f2fs_decompress_cluster() was called if
we hit cached page in compress_inode's cache, dic may be released, it needs
break the loop rather than continuing it, in order to avoid accessing
invalid dic pointer.
Fixes: 6ce19aff0b8c ("f2fs: compress: add compress_inode to cache compressed blocks")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
With below mount option and testcase, it hangs kernel.
1. mount -t f2fs -o compress_log_size=5 /dev/vdb /mnt/f2fs
2. touch /mnt/f2fs/file
3. chattr +c /mnt/f2fs/file
4. dd if=/dev/zero of=/mnt/f2fs/file bs=1MB count=1
5. sync
6. dd if=/dev/zero of=/mnt/f2fs/file bs=111 count=11 conv=notrunc
7. sync
INFO: task sync:4788 blocked for more than 120 seconds.
Not tainted 6.5.0-rc1+ #322
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:sync state:D stack:0 pid:4788 ppid:509 flags:0x00000002
Call Trace:
<TASK>
__schedule+0x335/0xf80
schedule+0x6f/0xf0
wb_wait_for_completion+0x5e/0x90
sync_inodes_sb+0xd8/0x2a0
sync_inodes_one_sb+0x1d/0x30
iterate_supers+0x99/0xf0
ksys_sync+0x46/0xb0
__do_sys_sync+0x12/0x20
do_syscall_64+0x3f/0x90
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
The reason is f2fs_all_cluster_page_ready() assumes that pages array should
cover at least one cluster, otherwise, it will always return false, result
in deadloop.
By default, pages array size is 16, and it can cover the case cluster_size
is equal or less than 16, for the case cluster_size is larger than 16, let's
allocate memory of pages array dynamically.
Fixes: 4c8ff7095bef ("f2fs: support data compression")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In this cycle, we don't have a highlighted feature enhancement, but mostly
have fixed issues mainly in two parts: 1) zoned block device, 2) compression
support. For zoned block device, we've tried to improve the power-off recovery
flow as much as possible. For compression, we found some corner cases caused by
wrong compression policy and logics. Other than them, there were some reverts
and stat corrections.
Bug fix:
- use finish zone command when closing a zone
- check zone type before sending async reset zone command
- fix to assign compress_level for lz4 correctly
- fix error path of f2fs_submit_page_read()
- don't {,de}compress non-full cluster
- send small discard commands during checkpoint back
- flush inode if atomic file is aborted
- correct to account gc/cp stats
And, there are minor bug fixes, avoiding false lockdep warning, and clean-ups.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmTyemoACgkQQBSofoJI
UNLLKQ//bupYGPOqAgKbd/s7FhtULMiiRmFVy7W2eoMIc/oeeXOGrzDAF/1NifLC
WLV4uBNVTS4PS8D1vRzxZNEZt9aqPS0vQ8hxW/3nTI9Z425NX3nz7gLSxxmwIkIe
xj++V6tvKPcCH0BfKvfFCtcxj09PsflgdEuT8w/sIkH6p4o+VEMFs1Lc9PQsjUmh
epznK7JGBwpAxmHqI74n1eAw2CI6W+oKx23YDTNMBD6hmXTU0fkTeBURrOlSsUHZ
nhafPecsrCEI+OpAj03G/7e/zt+iTUKdmHx9O5ir/P00vF/c+SU2vSwB97FiHqBi
B4UmocTM0MAsU80PQcmE6aU3zgQFI0Yun5yZ24VeWjKTu76ssZSmT2HA/4RL+LLf
AeAW4FSyfh76pls8X5IWfilxGLWq6kTzSZA0MF7dH2q7qlj5apL5wKpm/XH6POqn
qELY/Y9+P1QuCcNL8BiYrgA5xBqVJ7Uw/6/6U3Y77PElc+Pwl3vI8UZ7uCOBrsXL
e0TLXy23AJA6AS2DyLLziy669nXAZRb95B8TWMfEeVZIMFvCeeqYc74N8jOFa0T8
q6uQFZs+0cETLZA8MSZdlNhzvhJmbW6wgSIz++CEdikWSLBZMKWxBVjCPkkCY9uc
DMh8zruSVbYPZWBTcxkMFEBJKKrU43++e7pb8ZoqTj4Pq1317b0=
=Qa8+
-----END PGP SIGNATURE-----
Merge tag 'f2fs-for-6-6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this cycle, we don't have a highlighted feature enhancement, but
mostly have fixed issues mainly in two parts: 1) zoned block device,
and 2) compression support.
For zoned block device, we've tried to improve the power-off recovery
flow as much as possible. For compression, we found some corner cases
caused by wrong compression policy and logics. Other than them, there
were some reverts and stat corrections.
Bug fixes:
- use finish zone command when closing a zone
- check zone type before sending async reset zone command
- fix to assign compress_level for lz4 correctly
- fix error path of f2fs_submit_page_read()
- don't {,de}compress non-full cluster
- send small discard commands during checkpoint back
- flush inode if atomic file is aborted
- correct to account gc/cp stats
And, there are minor bug fixes, avoiding false lockdep warning, and
clean-ups"
* tag 'f2fs-for-6-6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (25 commits)
f2fs: use finish zone command when closing a zone
f2fs: compress: fix to assign compress_level for lz4 correctly
f2fs: fix error path of f2fs_submit_page_read()
f2fs: clean up error handling in sanity_check_{compress_,}inode()
f2fs: avoid false alarm of circular locking
Revert "f2fs: do not issue small discard commands during checkpoint"
f2fs: doc: fix description of max_small_discards
f2fs: should update REQ_TIME for direct write
f2fs: fix to account cp stats correctly
f2fs: fix to account gc stats correctly
f2fs: remove unneeded check condition in __f2fs_setxattr()
f2fs: fix to update i_ctime in __f2fs_setxattr()
Revert "f2fs: fix to do sanity check on extent cache correctly"
f2fs: increase usage of folio_next_index() helper
f2fs: Only lfs mode is allowed with zoned block device feature
f2fs: check zone type before sending async reset zone command
f2fs: compress: don't {,de}compress non-full cluster
f2fs: allow f2fs_ioc_{,de}compress_file to be interrupted
f2fs: don't reopen the main block device in f2fs_scan_devices
f2fs: fix to avoid mmap vs set_compress_option case
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmTs08EQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpqa4EACu/zKE+omGXBV0Q7kEpVsChjp0ElGtSDIJ
tJfTuvnWqQjrqRv4ksmZvGdx8SkqFuXri4/7oBXlsaqeUVbIQdWJUpLErBye6nxa
lUb6nXOFWwyG94cMRYs71lN0loosjb7aiVw7oVLAIhntq3p3doFl/cyy3ndMZrUE
pZbsrWSt4QiOKhcO0TtIjfAwsr31AN51qFiNNITEiZl3UjXfkGRCK81X0yM2N8zZ
7Y0h1ldPBsZ/olNWeRyaW1uB64nKM0buR7/nDxCV/NI05nndJ34bIgo/JIj4xy0v
SiBj2+y86+oMJZt17yYENwOQdtX3hbyESGuVm9dCrO0t9/byVQxkUk0OMm65BM/l
l2d+gmMQZTbHziqfLlgq9i3i9+B4C2hsb7iBpuo7SW/FPbM45POgi3lpiZycaZyu
krQo1qwL4KSGXzGN9CabEuKDcJcXqLxqMDOyEDA3R5Kz06V9tNuM+Di/mr4vuZHK
sVHUfHuWBO9ionLlGPdc3fH/CuMqic8SHjumiAm2menBZV6cSzRDxpm6H4CyLt7y
tWmw7BNU7dfHFGd+Jw0Ld49sAuEybszEXq6qYv5uYBVfJNqDvOvEeVoQp0RN2jJA
AG30hymcZgxn9n7gkIgkPQDgIGUjnzUR8B2mE2UFU1CYVHXYXAXU55CCI5oeTkbs
d0Y/zCZf1A==
=p1bd
-----END PGP SIGNATURE-----
Merge tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:
"Pretty quiet round for this release. This contains:
- Add support for zoned storage to ublk (Andreas, Ming)
- Series improving performance for drivers that mark themselves as
needing a blocking context for issue (Bart)
- Cleanup the flush logic (Chengming)
- sed opal keyring support (Greg)
- Fixes and improvements to the integrity support (Jinyoung)
- Add some exports for bcachefs that we can hopefully delete again in
the future (Kent)
- deadline throttling fix (Zhiguo)
- Series allowing building the kernel without buffer_head support
(Christoph)
- Sanitize the bio page adding flow (Christoph)
- Write back cache fixes (Christoph)
- MD updates via Song:
- Fix perf regression for raid0 large sequential writes (Jan)
- Fix split bio iostat for raid0 (David)
- Various raid1 fixes (Heinz, Xueshi)
- raid6test build fixes (WANG)
- Deprecate bitmap file support (Christoph)
- Fix deadlock with md sync thread (Yu)
- Refactor md io accounting (Yu)
- Various non-urgent fixes (Li, Yu, Jack)
- Various fixes and cleanups (Arnd, Azeem, Chengming, Damien, Li,
Ming, Nitesh, Ruan, Tejun, Thomas, Xu)"
* tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux: (113 commits)
block: use strscpy() to instead of strncpy()
block: sed-opal: keyring support for SED keys
block: sed-opal: Implement IOC_OPAL_REVERT_LSP
block: sed-opal: Implement IOC_OPAL_DISCOVERY
blk-mq: prealloc tags when increase tagset nr_hw_queues
blk-mq: delete redundant tagset map update when fallback
blk-mq: fix tags leak when shrink nr_hw_queues
ublk: zoned: support REQ_OP_ZONE_RESET_ALL
md: raid0: account for split bio in iostat accounting
md/raid0: Fix performance regression for large sequential writes
md/raid0: Factor out helper for mapping and submitting a bio
md raid1: allow writebehind to work on any leg device set WriteMostly
md/raid1: hold the barrier until handle_read_error() finishes
md/raid1: free the r1bio before waiting for blocked rdev
md/raid1: call free_r1bio() before allow_barrier() in raid_end_bio_io()
blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init
drivers/rnbd: restore sysfs interface to rnbd-client
md/raid5-cache: fix null-ptr-deref for r5l_flush_stripe_to_raid()
raid6: test: only check for Altivec if building on powerpc hosts
raid6: test: make sure all intermediate and artifact files are .gitignored
...
* Make large writes to the page cache fill sparse parts of the cache
with large folios, then use large memcpy calls for the large folio.
* Track the per-block dirty state of each large folio so that a
buffered write to a single byte on a large folio does not result in a
(potentially) multi-megabyte writeback IO.
* Allow some directio completions to be performed in the initiating
task's context instead of punting through a workqueue. This will
reduce latency for some io_uring requests.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZM0Z1AAKCRBKO3ySh0YR
pp7BAQCzkKejCM0185tNIH/faHjzidSisNQkJ5HoB4Opq9U66AEA6IPuAdlPlM/J
FPW1oPq33Yn7AV4wXjUNFfDLzVb/Fgg=
=dFBU
-----END PGP SIGNATURE-----
Merge tag 'iomap-6.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap updates from Darrick Wong:
"We've got some big changes for this release -- I'm very happy to be
landing willy's work to enable large folios for the page cache for
general read and write IOs when the fs can make contiguous space
allocations, and Ritesh's work to track sub-folio dirty state to
eliminate the write amplification problems inherent in using large
folios.
As a bonus, io_uring can now process write completions in the caller's
context instead of bouncing through a workqueue, which should reduce
io latency dramatically. IOWs, XFS should see a nice performance bump
for both IO paths.
Summary:
- Make large writes to the page cache fill sparse parts of the cache
with large folios, then use large memcpy calls for the large folio.
- Track the per-block dirty state of each large folio so that a
buffered write to a single byte on a large folio does not result in
a (potentially) multi-megabyte writeback IO.
- Allow some directio completions to be performed in the initiating
task's context instead of punting through a workqueue. This will
reduce latency for some io_uring requests"
* tag 'iomap-6.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (26 commits)
iomap: support IOCB_DIO_CALLER_COMP
io_uring/rw: add write support for IOCB_DIO_CALLER_COMP
fs: add IOCB flags related to passing back dio completions
iomap: add IOMAP_DIO_INLINE_COMP
iomap: only set iocb->private for polled bio
iomap: treat a write through cache the same as FUA
iomap: use an unsigned type for IOMAP_DIO_* defines
iomap: cleanup up iomap_dio_bio_end_io()
iomap: Add per-block dirty state tracking to improve performance
iomap: Allocate ifs in ->write_begin() early
iomap: Refactor iomap_write_delalloc_punch() function out
iomap: Use iomap_punch_t typedef
iomap: Fix possible overflow condition in iomap_write_delalloc_scan
iomap: Add some uptodate state handling helpers for ifs state bitmap
iomap: Drop ifs argument from iomap_set_range_uptodate()
iomap: Rename iomap_page to iomap_folio_state and others
iomap: Copy larger chunks from userspace
iomap: Create large folios in the buffered write path
filemap: Allow __filemap_get_folio to allocate large folios
filemap: Add fgf_t typedef
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZOXpbgAKCRCRxhvAZXjc
oi8PAQCtXelGZHmTcmevsO8p4Qz7hFpkonZ/TnxKf+RdnlNgPgD+NWi+LoRBpaAj
xk4z8SqJaTTP4WXrG5JZ6o7EQkUL8gE=
=2e9I
-----END PGP SIGNATURE-----
Merge tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull superblock updates from Christian Brauner:
"This contains the super rework that was ready for this cycle. The
first part changes the order of how we open block devices and allocate
superblocks, contains various cleanups, simplifications, and a new
mechanism to wait on superblock state changes.
This unblocks work to ultimately limit the number of writers to a
block device. Jan has already scheduled follow-up work that will be
ready for v6.7 and allows us to restrict the number of writers to a
given block device. That series builds on this work right here.
The second part contains filesystem freezing updates.
Overview:
The generic superblock changes are rougly organized as follows
(ignoring additional minor cleanups):
(1) Removal of the bd_super member from struct block_device.
This was a very odd back pointer to struct super_block with
unclear rules. For all relevant places we have other means to get
the same information so just get rid of this.
(2) Simplify rules for superblock cleanup.
Roughly, everything that is allocated during fs_context
initialization and that's stored in fs_context->s_fs_info needs
to be cleaned up by the fs_context->free() implementation before
the superblock allocation function has been called successfully.
After sget_fc() returned fs_context->s_fs_info has been
transferred to sb->s_fs_info at which point sb->kill_sb() if
fully responsible for cleanup. Adhering to these rules means that
cleanup of sb->s_fs_info in fill_super() is to be avoided as it's
brittle and inconsistent.
Cleanup shouldn't be duplicated between sb->put_super() as
sb->put_super() is only called if sb->s_root has been set aka
when the filesystem has been successfully born (SB_BORN). That
complexity should be avoided.
This also means that block devices are to be closed in
sb->kill_sb() instead of sb->put_super(). More details in the
lower section.
(3) Make it possible to lookup or create a superblock before opening
block devices
There's a subtle dependency on (2) as some filesystems did rely
on fill_super() to be called in order to correctly clean up
sb->s_fs_info. All these filesystems have been fixed.
(4) Switch most filesystem to follow the same logic as the generic
mount code now does as outlined in (3).
(5) Use the superblock as the holder of the block device. We can now
easily go back from block device to owning superblock.
(6) Export and extend the generic fs_holder_ops and use them as
holder ops everywhere and remove the filesystem specific holder
ops.
(7) Call from the block layer up into the filesystem layer when the
block device is removed, allowing to shut down the filesystem
without risk of deadlocks.
(8) Get rid of get_super().
We can now easily go back from the block device to owning
superblock and can call up from the block layer into the
filesystem layer when the device is removed. So no need to wade
through all registered superblock to find the owning superblock
anymore"
Link: https://lore.kernel.org/lkml/20230824-prall-intakt-95dbffdee4a0@brauner/
* tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (47 commits)
super: use higher-level helper for {freeze,thaw}
super: wait until we passed kill super
super: wait for nascent superblocks
super: make locking naming consistent
super: use locking helpers
fs: simplify invalidate_inodes
fs: remove get_super
block: call into the file system for ioctl BLKFLSBUF
block: call into the file system for bdev_mark_dead
block: consolidate __invalidate_device and fsync_bdev
block: drop the "busy inodes on changed media" log message
dasd: also call __invalidate_device when setting the device offline
amiflop: don't call fsync_bdev in FDFMTBEG
floppy: call disk_force_media_change when changing the format
block: simplify the disk_force_media_change interface
nbd: call blk_mark_disk_dead in nbd_clear_sock_ioctl
xfs use fs_holder_ops for the log and RT devices
xfs: drop s_umount over opening the log and RT devices
ext4: use fs_holder_ops for the log device
ext4: drop s_umount over opening the log device
...
Use the finish zone command first when a zone should be closed.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
After remount, F2FS_OPTION().compress_level was assgin to
LZ4HC_DEFAULT_CLEVEL incorrectly, result in lz4hc:9 was enabled, fix it.
1. mount /dev/vdb
/dev/vdb on /mnt/f2fs type f2fs (...,compress_algorithm=lz4,compress_log_size=2,...)
2. mount -t f2fs -o remount,compress_log_size=3 /mnt/f2fs/
3. mount|grep f2fs
/dev/vdb on /mnt/f2fs type f2fs (...,compress_algorithm=lz4:9,compress_log_size=3,...)
Fixes: 00e120b5e4b5 ("f2fs: assign default compression level")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In error path of f2fs_submit_page_read(), it missed to call
iostat_update_and_unbind_ctx() and free bio_post_read_ctx, fix it.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In sanity_check_{compress_,}inode(), it doesn't need to set SBI_NEED_FSCK
in each error case, instead, we can set the flag in do_read_inode() only
once when sanity_check_inode() fails.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* Allow the kernel to initiate a freeze of a filesystem. The kernel
and userspace can both hold a freeze on a filesystem at the same
time; the freeze is not lifted until /both/ holders lift it. This
will enable us to fix a longstanding bug in XFS online fsck.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZLVnJwAKCRBKO3ySh0YR
pqVIAP9u9CZEJ2Zcc7YpBj1MLUQGr2xBmz8RJEVJbQHKVgYcQwEA9BNb4eH4i2Af
K7Qp0OGNgyzZw37lN23Uf/SDuBK2QgM=
=seMl
-----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZOXo2AAKCRCRxhvAZXjc
ojDfAQDguc2saF8WLeXtn2O0pGOW8vTrhpwiFHNI6hwdzf07/AD+LGBpFEqYKyX5
NHPzdR7YYpJoTsQzR4JFJVZqN9Q1xgU=
=wDq0
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.6-merge-2' of ssh://gitolite.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull filesystem freezing updates from Darrick Wong:
New code for 6.6:
* Allow the kernel to initiate a freeze of a filesystem. The kernel
and userspace can both hold a freeze on a filesystem at the same
time; the freeze is not lifted until /both/ holders lift it. This
will enable us to fix a longstanding bug in XFS online fsck.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Message-Id: <20230822182604.GB11286@frogsfrogsfrogs>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Previously, we have two mechanisms to cache & submit small discards:
a) set max small discard number in /sys/fs/f2fs/vdb/max_small_discards,
and checkpoint will cache small discard candidates w/ configured maximum
number.
b) call FITRIM ioctl, also, checkpoint in f2fs_trim_fs() will cache small
discard candidates w/ configured discard granularity, but w/o limitation
of number. FSTRIM interface is asynchronized, so it won't submit discard
directly.
Finally, discard thread will submit them in background periodically.
However, after commit 9ac00e7cef10 ("f2fs: do not issue small discard
commands during checkpoint"), the mechanism a) is broken, since no matter
how we configure the sysfs entry /sys/fs/f2fs/vdb/max_small_discards,
checkpoint will not cache small discard candidates any more.
echo 0 > /sys/fs/f2fs/vdb/max_small_discards
xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
sync
cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
echo 100 > /sys/fs/f2fs/vdb/max_small_discards
rm /mnt/f2fs/file
xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
sync
cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
Before the patch:
Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
0 . . . . . . . .
Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
0 3 1 . . . . . .
After the patch:
Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
0 . . . . . . . .
Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
0 . . . . . . . .
This patch reverts commit 9ac00e7cef10 ("f2fs: do not issue small discard
commands during checkpoint") in order to fix this issue.
Fixes: 9ac00e7cef10 ("f2fs: do not issue small discard commands during checkpoint")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The sending interval of discard and GC should also
consider direct write requests; filesystem is not
idle if there is direct write.
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As reported, status debugfs entry shows inconsistent GC stats as below:
GC calls: 6008 (BG: 6161)
- data segments : 3053 (BG: 3053)
- node segments : 2955 (BG: 2955)
Total GC calls is larger than BGGC calls, the reason is:
- f2fs_stat_info.call_count accounts total migrated section count
by f2fs_gc()
- f2fs_stat_info.bg_gc accounts total call times of f2fs_gc() from
background gc_thread
Another issue is gc_foreground_calls sysfs entry shows total GC call
count rather than FGGC call count.
This patch changes as below for fix:
- account GC calls and migrated segment count separately
- support to account migrated section count if it enables large section
mode
- fix to show correct value in gc_foreground_calls sysfs entry
Fixes: fc7100ea2a52 ("f2fs: Add f2fs stats to sysfs")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
It has checked return value of write_all_xattrs(), remove unneeded
following check condition.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
generic/728 - output mismatch (see /media/fstests/results//generic/728.out.bad)
--- tests/generic/728.out 2023-07-19 07:10:48.362711407 +0000
+++ /media/fstests/results//generic/728.out.bad 2023-07-19 08:39:57.000000000 +0000
QA output created by 728
+Expected ctime to change after setxattr.
+Expected ctime to change after removexattr.
Silence is golden
...
(Run 'diff -u /media/fstests/tests/generic/728.out /media/fstests/results//generic/728.out.bad' to see the entire diff)
generic/729 1s
It needs to update i_ctime after {set,remove}xattr, fix it.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
syzbot reports a f2fs bug as below:
UBSAN: array-index-out-of-bounds in fs/f2fs/f2fs.h:3275:19
index 1409 is out of range for type '__le32[923]' (aka 'unsigned int[923]')
Call Trace:
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
ubsan_epilogue lib/ubsan.c:217 [inline]
__ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348
inline_data_addr fs/f2fs/f2fs.h:3275 [inline]
__recover_inline_status fs/f2fs/inode.c:113 [inline]
do_read_inode fs/f2fs/inode.c:480 [inline]
f2fs_iget+0x4730/0x48b0 fs/f2fs/inode.c:604
f2fs_fill_super+0x640e/0x80c0 fs/f2fs/super.c:4601
mount_bdev+0x276/0x3b0 fs/super.c:1391
legacy_get_tree+0xef/0x190 fs/fs_context.c:611
vfs_get_tree+0x8c/0x270 fs/super.c:1519
do_new_mount+0x28f/0xae0 fs/namespace.c:3335
do_mount fs/namespace.c:3675 [inline]
__do_sys_mount fs/namespace.c:3884 [inline]
__se_sys_mount+0x2d9/0x3c0 fs/namespace.c:3861
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The issue was bisected to:
commit d48a7b3a72f121655d95b5157c32c7d555e44c05
Author: Chao Yu <chao@kernel.org>
Date: Mon Jan 9 03:49:20 2023 +0000
f2fs: fix to do sanity check on extent cache correctly
The root cause is we applied both v1 and v2 of the patch, v2 is the right
fix, so it needs to revert v1 in order to fix reported issue.
v1:
commit d48a7b3a72f1 ("f2fs: fix to do sanity check on extent cache correctly")
https://lore.kernel.org/lkml/20230109034920.492914-1-chao@kernel.org/
v2:
commit 269d11948100 ("f2fs: fix to do sanity check on extent cache correctly")
https://lore.kernel.org/lkml/20230207134808.1827869-1-chao@kernel.org/
Reported-by: syzbot+601018296973a481f302@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/000000000000fcf0690600e4d04d@google.com/
Fixes: d48a7b3a72f1 ("f2fs: fix to do sanity check on extent cache correctly")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Simplify code pattern of 'folio->index + folio_nr_pages(folio)' by using
the existing helper folio_next_index().
Signed-off-by: Minjie Du <duminjie@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Now f2fs support four block allocation modes: lfs, adaptive,
fragment:segment, fragment:block. Only lfs mode is allowed with zoned block
device feature.
Fixes: 6691d940b0e0 ("f2fs: introduce fragment allocation mode mount option")
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The commit 25f9080576b9 ("f2fs: add async reset zone command support")
introduced "async reset zone commands" by calling
__submit_zone_reset_cmd() in async discard operations. However,
__submit_zone_reset_cmd() is called regardless of zone type of discard
target zone. When devices have conventional zones, zone reset commands
are sent to the conventional zones and cause I/O errors.
Avoid the I/O errors by checking that the discard target zone type is
sequential write required. If not, handle the discard operation in same
manner as non-zoned, regular block devices. For that purpose, add a new
helper function f2fs_bdev_index() which gets index of the zone reset
target device.
Fixes: 25f9080576b9 ("f2fs: add async reset zone command support")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
f2fs won't compress non-full cluster in tail of file, let's skip
dirtying and rewrite such cluster during f2fs_ioc_{,de}compress_file.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch allows f2fs_ioc_{,de}compress_file() to be interrupted, so that,
userspace won't be blocked when manual {,de}compression on large file is
interrupted by signal.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
f2fs_scan_devices reopens the main device since the very beginning, which
has always been useless, and also means that we don't pass the right
holder for the reopen, which now leads to a warning as the core super.c
holder ops aren't passed in for the reopen.
Fixes: 3c62be17d4f5 ("f2fs: support multiple devices")
Fixes: 0718afd47f70 ("block: introduce holder ops")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Compression option in inode should not be changed after they have
been used, however, it may happen in below race case:
Thread A Thread B
- f2fs_ioc_set_compress_option
- check f2fs_is_mmap_file()
- check get_dirty_pages()
- check F2FS_HAS_BLOCKS()
- f2fs_file_mmap
- set_inode_flag(FI_MMAP_FILE)
- fault
- do_page_mkwrite
- f2fs_vm_page_mkwrite
- f2fs_get_block_locked
- fault_dirty_shared_page
- set_page_dirty
- update i_compress_algorithm
- update i_log_cluster_size
- update i_cluster_size
Avoid such race condition by covering f2fs_file_mmap() w/ i_sem lock,
meanwhile add mmap file check condition in f2fs_may_compress() as well.
Fixes: e1e8debec656 ("f2fs: add F2FS_IOC_SET_COMPRESS_OPTION ioctl")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
https://bugzilla.kernel.org/show_bug.cgi?id=216050
Somehow we're getting a page which has a different mapping.
Let's avoid the infinite loop.
Cc: <stable@vger.kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
f2fs_compress_alloc_page() uses mempool to allocate memory, it never
fail, don't handle error case in its callers.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This reverts commit bfd476623999118d9c509cb0fa9380f2912bc225.
Shinichiro Kawasaki reported:
When I ran workloads on f2fs using v6.5-rcX with fixes [1][2] and a zoned block
devices with 4kb logical block size, I observe mount failure as follows. When
I revert this commit, the failure goes away.
[ 167.781975][ T1555] F2FS-fs (dm-0): IO Block Size: 4 KB
[ 167.890728][ T1555] F2FS-fs (dm-0): Found nat_bits in checkpoint
[ 171.482588][ T1555] F2FS-fs (dm-0): Zone without valid block has non-zero write pointer. Reset the write pointer: wp[0x1300,0x8]
[ 171.496000][ T1555] F2FS-fs (dm-0): (0) : Unaligned zone reset attempted (block 280000 + 80000)
[ 171.505037][ T1555] F2FS-fs (dm-0): Discard zone failed: (errno=-5)
The patch replaced "sbi->log_blocksize - SECTOR_SHIFT" with
"sbi->log_sectors_per_block". However, I think these two are not equal when the
device has 4k logical block size. The former uses Linux kernel sector size 512
byte. The latter use 512b sector size or 4kb sector size depending on the
device. mkfs.f2fs obtains logical block size via BLKSSZGET ioctl from the device
and reflects it to the value sbi->log_sector_size_per_block. This causes
unexpected write pointer calculations in check_zone_write_pointer(). This
resulted in unexpected zone reset and the mount failure.
[1] https://lkml.kernel.org/linux-f2fs-devel/20230711050101.GA19128@lst.de/
[2] https://lore.kernel.org/linux-f2fs-devel/20230804091556.2372567-1-shinichiro.kawasaki@wdc.com/
Cc: stable@vger.kernel.org
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: bfd476623999 ("f2fs: clean up w/ sbi->log_sectors_per_block")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The file system type is not a very useful holder as it doesn't allow us
to go back to the actual file system instance. Pass the super_block instead
which is useful when passed back to the file system driver.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Message-Id: <20230802154131.2221419-7-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
generic_fillattr just fills in the entire stat struct indiscriminately
today, copying data from the inode. There is at least one attribute
(STATX_CHANGE_COOKIE) that can have side effects when it is reported,
and we're looking at adding more with the addition of multigrain
timestamps.
Add a request_mask argument to generic_fillattr and have most callers
just pass in the value that is passed to getattr. Have other callers
(e.g. ksmbd) just pass in STATX_BASIC_STATS. Also move the setting of
STATX_CHANGE_COOKIE into generic_fillattr.
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: "Paulo Alcantara (SUSE)" <pc@manguebit.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Message-Id: <20230807-mgctime-v7-2-d1dec143a704@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Add a new config option that controls building the buffer_head code, and
select it from all file systems and stacking drivers that need it.
For the block device nodes and alternative iomap based buffered I/O path
is provided when buffer_head support is not enabled, and iomap needs a
a small tweak to define the IOMAP_F_BUFFER_HEAD flag to 0 to not call
into the buffer_head code when it doesn't exist.
Otherwise this is just Kconfig and ifdef changes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20230801172201.1923299-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
block_page_mkwrite_return is neither block nor mkwrite specific, and
should not be under CONFIG_BLOCK. Move it to mm.h and rename it to
vmf_fs_error.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230801172201.1923299-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Similarly to gfp_t, define fgf_t as its own type to prevent various
misuses and confusion. Leave the flags as FGP_* for now to reduce the
size of this patch; they will be converted to FGF_* later. Move the
documentation to the definition of the type insted of burying it in the
__filemap_get_folio() documentation.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>
In later patches, we're going to change how the inode's ctime field is
used. Switch to using accessor functions instead of raw accesses of
inode->i_ctime.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230705190309.579783-41-jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Userspace can freeze a filesystem using the FIFREEZE ioctl or by
suspending the block device; this state persists until userspace thaws
the filesystem with the FITHAW ioctl or resuming the block device.
Since commit 18e9e5104fcd ("Introduce freeze_super and thaw_super for
the fsfreeze ioctl") we only allow the first freeze command to succeed.
The kernel may decide that it is necessary to freeze a filesystem for
its own internal purposes, such as suspends in progress, filesystem fsck
activities, or quiescing a device prior to removal. Userspace thaw
commands must never break a kernel freeze, and kernel thaw commands
shouldn't undo userspace's freeze command.
Introduce a couple of freeze holder flags and wire it into the
sb_writers state. One kernel and one userspace freeze are allowed to
coexist at the same time; the filesystem will not thaw until both are
lifted.
I wonder if the f2fs/gfs2 code should be using a kernel freeze here, but
for now we'll use FREEZE_HOLDER_USERSPACE to preserve existing
behaviors.
Cc: mcgrof@kernel.org
Cc: jack@suse.cz
Cc: hch@infradead.org
Cc: ruansy.fnst@fujitsu.com
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>