Commit Graph

4245 Commits

Author SHA1 Message Date
Stephen Rothwell
9db4945673 Merge branch 'crc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git 2024-12-20 15:11:55 +11:00
Dmitry Antipov
76f01376df f2fs: ensure that node info flags are always initialized
Syzbot has reported the following KMSAN splat:

BUG: KMSAN: uninit-value in f2fs_new_node_page+0x1494/0x1630
 f2fs_new_node_page+0x1494/0x1630
 f2fs_new_inode_page+0xb9/0x100
 f2fs_init_inode_metadata+0x176/0x1e90
 f2fs_add_inline_entry+0x723/0xc90
 f2fs_do_add_link+0x48f/0xa70
 f2fs_symlink+0x6af/0xfc0
 vfs_symlink+0x1f1/0x470
 do_symlinkat+0x471/0xbc0
 __x64_sys_symlink+0xcf/0x140
 x64_sys_call+0x2fcc/0x3d90
 do_syscall_64+0xd9/0x1b0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Local variable new_ni created at:
 f2fs_new_node_page+0x9d/0x1630
 f2fs_new_inode_page+0xb9/0x100

So adjust 'f2fs_get_node_info()' to ensure that 'flag'
field of 'struct node_info' is always initialized.

Reported-by: syzbot+5141f6db57a2f7614352@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=5141f6db57a2f7614352
Fixes: e05df3b115 ("f2fs: add node operations")
Suggested-by: Chao Yu <chao@kernel.org>
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:54 +00:00
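The KMSAN report above comes down to a struct field that one code path never writes. A minimal user-space sketch of that pattern follows; `struct node_info_sketch` and both functions are hypothetical stand-ins, not the kernel's actual `struct node_info` or `f2fs_get_node_info()`, and the 0xAA fill only simulates stale stack contents.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the kernel's struct node_info. */
struct node_info_sketch {
	unsigned int nid;
	unsigned int ino;
	unsigned int blk_addr;
	unsigned char version;
	unsigned char flag;
};

/* Buggy shape: fills most fields but never writes ->flag. */
static void get_node_info_buggy(unsigned int nid, struct node_info_sketch *ni)
{
	ni->nid = nid;
	ni->ino = nid;
	ni->blk_addr = 0;
	ni->version = 0;
}

/* Fixed shape: ->flag is written on every path before returning. */
static void get_node_info_fixed(unsigned int nid, struct node_info_sketch *ni)
{
	assert(ni != NULL);
	ni->flag = 0;
	ni->nid = nid;
	ni->ino = nid;
	ni->blk_addr = 0;
	ni->version = 0;
}

/* Pre-fill the struct with a known pattern, then report what ->flag holds. */
static unsigned char flag_after(void (*fill)(unsigned int,
					     struct node_info_sketch *))
{
	struct node_info_sketch ni;

	memset(&ni, 0xAA, sizeof(ni));	/* simulated stale stack contents */
	fill(5, &ni);
	return ni.flag;
}
```

With the buggy filler the caller reads back whatever was in memory before the call; with the fixed one it always reads 0.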
Yongpeng Yang
e9a844f6e4 f2fs: The GC triggered by ioctl also needs to mark the segno as victim
In SSR mode, the segment selected for allocation might be the same as
the target segment of the GC triggered by ioctl, resulting in the GC
moving the CURSEG_I(sbi, type)->segno.
Thread A				Thread B or Thread A
- f2fs_ioc_gc_range
 - __f2fs_ioc_gc_range(.victim_segno=segno#N)
  - f2fs_gc
   - __get_victim
    - f2fs_get_victim
    : segno#N is valid, return segno#N as source segment of GC
					- f2fs_allocate_data_block
						- need_new_seg
						- get_ssr_segment
						- f2fs_get_victim
						: get segno #N as destination segment
						- change_curseg

Fixes: e066b83c9b ("f2fs: add ioctl to flush data from faster device to cold area")
Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:29 +00:00
zangyangyang1
5f65945427 f2fs: cache more dentry pages
While traversing directory entries in a dentry page, it is better to refresh
the currently accessed page in the LRU list by using the FGP_ACCESSED flag.
Otherwise, the page has less chance of surviving memory reclaim, which
causes additional IO when the dentry page is revisited.

Signed-off-by: zangyangyang1 <zangyangyang1@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:28 +00:00
Matthew Wilcox (Oracle)
c910a64bc4 f2fs: Remove calls to folio_file_mapping()
All folios that f2fs sees belong to f2fs and not to the swapcache
so it can dereference folio->mapping directly like all other
filesystems do.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:26 +00:00
Matthew Wilcox (Oracle)
19bbd306dd f2fs: Convert __read_io_type() to take a folio
Remove the last call to page_file_mapping() as both callers can now pass
in a folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:23 +00:00
Matthew Wilcox (Oracle)
f58d864582 f2fs: Use a data folio in f2fs_submit_page_bio()
Remove a call to compound_head().  We can call bio_add_folio_nofail()
here because we just allocated the bio, so we know it can't fail and
thus the error path can never be taken.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:19 +00:00
Matthew Wilcox (Oracle)
0765b3f989 f2fs: Use a folio more in f2fs_submit_page_bio()
Cache the result of page_folio(fio->page) in a local variable so
we don't have to keep calling it.  Saves a couple of calls to
compound_head() and removes an access to page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:16 +00:00
Matthew Wilcox (Oracle)
e0821645dd f2fs: Convert f2fs_finish_read_bio() to use folios
Use bio_for_each_folio_all() to iterate over each folio in the bio.
This lets us use folio_end_read() which saves an atomic operation and
memory barrier compared to marking the folio uptodate and unlocking
it as two separate operations.  This also removes a few hidden calls
to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:13 +00:00
Matthew Wilcox (Oracle)
1cf7460070 f2fs: Add F2FS_F_SB()
This is the folio equivalent of F2FS_P_SB().  Removes a call to
page_file_mapping() as we know folios seen by f2fs are never part of
the swap cache.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:10 +00:00
Matthew Wilcox (Oracle)
87e2a15bc0 f2fs: Convert submit tracepoints to take a folio
Remove accesses to page->index and page->mapping as well as
unnecessary calls to page_file_mapping().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:07 +00:00
Matthew Wilcox (Oracle)
ac866908d7 f2fs: Use a folio in f2fs_write_compressed_pages()
Remove accesses to page->index and an unnecessary reference to
page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:04 +00:00
Matthew Wilcox (Oracle)
1cda5bc0b2 f2fs: Use a folio in f2fs_truncate_partial_cluster()
Convert the incoming page to a folio and use it throughout.
Removes an access to page->index.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:12:01 +00:00
Matthew Wilcox (Oracle)
ff6c82a934 f2fs: Use a folio in f2fs_compress_write_end()
This removes an access of page->index.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:11:58 +00:00
Matthew Wilcox (Oracle)
a909c17953 f2fs: Use a folio in f2fs_all_cluster_page_ready()
Remove references to page->index and use folio_test_uptodate()
instead of PageUptodate().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-12-16 16:11:51 +00:00
Eric Biggers
3ca4bec40e f2fs: switch to using the crc32 library
Now that the crc32() library function takes advantage of
architecture-specific optimizations, it is unnecessary to go through the
crypto API.  Just use crc32().  This is much simpler, and it improves
performance due to eliminating the crypto API overhead.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20241202010844.144356-19-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2024-12-01 17:23:02 -08:00
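To illustrate why a plain library call is simpler than going through a crypto API, here is a bitwise user-space sketch of a little-endian CRC-32. It is not the kernel's `crc32()` implementation (which is table-driven and may use architecture-specific code); the function names are hypothetical, and the seed convention f2fs uses is not shown in this log, so none is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise little-endian CRC-32 over the reflected polynomial 0xEDB88320.
 * The caller supplies the seed; no inversion is applied here.
 */
static uint32_t crc32_le_sketch(uint32_t crc, const unsigned char *p,
				size_t len)
{
	assert(p != NULL || len == 0);
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
	}
	return crc;
}

/* Conventional CRC-32 (as used by zlib): invert the seed and the result. */
static uint32_t crc32_ieee_sketch(const void *buf, size_t len)
{
	return ~crc32_le_sketch(~0u, buf, len);
}
```

The whole computation is one function call on a buffer, with no transform allocation or descriptor setup. The standard check value for the ASCII string "123456789" under the conventional CRC-32 is 0xCBF43926.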
Linus Torvalds
44b4d13b70 f2fs-for-6.13-rc1
This series introduces a device aliasing feature where user can carve out
 partitions but reclaim the space back by deleting aliased file in root dir.
 In addition to that, there're numerous minor bug fixes in zoned device support,
 checkpoint=disable, extent cache management, fiemap, and lazytime mount option.
 The full list of noticeable changes can be found below.
 
 Enhancement:
  - introduce device aliasing file
  - add stats in debugfs to show multiple devices
  - add a sysfs node to limit max read extent count per-inode
  - modify f2fs_is_checkpoint_ready logic to allow more data to be written with the CP disable
  - decrease spare area for pinned files for zoned devices
 
 Bug fix:
  - Revert "f2fs: remove unreachable lazytime mount option parsing"
  - adjust unusable cap before checkpoint=disable mode
  - fix to drop all discards after creating snapshot on lvm device
  - fix to shrink read extent node in batches
  - fix changing cursegs if recovery fails on zoned device
  - fix to adjust appropriate length for fiemap
  - fix fiemap failure issue when page size is 16KB
  - fix to avoid forcing direct write to use buffered IO on inline_data inode
  - fix to map blocks correctly for direct write
  - fix to account dirty data in __get_secs_required()
  - fix null-ptr-deref in f2fs_submit_page_bio()
  - f2fs: compress: fix inconsistent update of i_blocks in release_compress_blocks and reserve_compress_blocks

Merge tag 'f2fs-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "This series introduces a device aliasing feature where user can carve
  out partitions but reclaim the space back by deleting aliased file in
  root dir.

  In addition to that, there're numerous minor bug fixes in zoned device
  support, checkpoint=disable, extent cache management, fiemap, and
  lazytime mount option. The full list of noticeable changes can be
  found below.

  Enhancements:
   - introduce device aliasing file
   - add stats in debugfs to show multiple devices
   - add a sysfs node to limit max read extent count per-inode
   - modify f2fs_is_checkpoint_ready logic to allow more data to be
     written with the CP disable
   - decrease spare area for pinned files for zoned devices

  Fixes:
   - Revert "f2fs: remove unreachable lazytime mount option parsing"
   - adjust unusable cap before checkpoint=disable mode
   - fix to drop all discards after creating snapshot on lvm device
   - fix to shrink read extent node in batches
   - fix changing cursegs if recovery fails on zoned device
   - fix to adjust appropriate length for fiemap
   - fix fiemap failure issue when page size is 16KB
   - fix to avoid forcing direct write to use buffered IO on inline_data
     inode
   - fix to map blocks correctly for direct write
   - fix to account dirty data in __get_secs_required()
   - fix null-ptr-deref in f2fs_submit_page_bio()
   - fix inconsistent update of i_blocks in release_compress_blocks and
     reserve_compress_blocks"

* tag 'f2fs-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits)
  f2fs: fix to drop all discards after creating snapshot on lvm device
  f2fs: add a sysfs node to limit max read extent count per-inode
  f2fs: fix to shrink read extent node in batches
  f2fs: print message if fscorrupted was found in f2fs_new_node_page()
  f2fs: clear SBI_POR_DOING before initing inmem curseg
  f2fs: fix changing cursegs if recovery fails on zoned device
  f2fs: adjust unusable cap before checkpoint=disable mode
  f2fs: fix to requery extent which cross boundary of inquiry
  f2fs: fix to adjust appropriate length for fiemap
  f2fs: clean up w/ F2FS_{BLK_TO_BYTES,BTYES_TO_BLK}
  f2fs: fix to do cast in F2FS_{BLK_TO_BYTES, BTYES_TO_BLK} to avoid overflow
  f2fs: replace deprecated strcpy with strscpy
  Revert "f2fs: remove unreachable lazytime mount option parsing"
  f2fs: fix to avoid forcing direct write to use buffered IO on inline_data inode
  f2fs: fix to map blocks correctly for direct write
  f2fs: fix race in concurrent f2fs_stop_gc_thread
  f2fs: fix fiemap failure issue when page size is 16KB
  f2fs: remove redundant atomic file check in defragment
  f2fs: fix to convert log type to segment data type correctly
  f2fs: clean up the unused variable additional_reserved_segments
  ...
2024-11-26 12:50:58 -08:00
Chao Yu
bc8aeb04fd f2fs: fix to drop all discards after creating snapshot on lvm device
Piergiorgio reported a bug in bugzilla as below:

------------[ cut here ]------------
WARNING: CPU: 2 PID: 969 at fs/f2fs/segment.c:1330
RIP: 0010:__submit_discard_cmd+0x27d/0x400 [f2fs]
Call Trace:
 __issue_discard_cmd+0x1ca/0x350 [f2fs]
 issue_discard_thread+0x191/0x480 [f2fs]
 kthread+0xcf/0x100
 ret_from_fork+0x31/0x50
 ret_from_fork_asm+0x1a/0x30

w/ below testcase, it can reproduce this bug quickly:
- pvcreate /dev/vdb
- vgcreate myvg1 /dev/vdb
- lvcreate -L 1024m -n mylv1 myvg1
- mount /dev/myvg1/mylv1 /mnt/f2fs
- dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=20
- sync
- rm /mnt/f2fs/file
- sync
- lvcreate -L 1024m -s -n mylv1-snapshot /dev/myvg1/mylv1
- umount /mnt/f2fs

The root cause is: after a snapshot is created on a mounted lvm
device, discard_max_bytes of that device is updated to zero. Then
__submit_discard_cmd() passes @nr_sects with a zero value
to __blkdev_issue_discard(), which returns a NULL bio pointer and
results in a panic.

This patch fixes it as follows:
1. Drop all remaining discards in f2fs_unfreeze() if a snapshot
of the lvm device is created.
2. Check discard_max_bytes before submitting a discard in
__submit_discard_cmd().

Cc: stable@vger.kernel.org
Fixes: 35ec7d5748 ("f2fs: split discard command in prior to block layer")
Reported-by: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219484
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-23 15:48:15 +00:00
Chao Yu
009a8241a8 f2fs: add a sysfs node to limit max read extent count per-inode
Quoted:
"at this time, there are still 1086911 extent nodes in this zombie
extent tree that need to be cleaned up.

crash_arm64_sprd_v8.0.3++> extent_tree.node_cnt ffffff80896cc500
  node_cnt = {
    counter = 1086911
  },
"

As reported by Xiuhong, an extent tree can hold a huge number of
extent nodes, which may potentially cause:
- slab memory fragmentation
- extremely long shrink times on the extent tree
- low mapping efficiency

Let's add a sysfs node to limit the max read extent count for each inode.
By default, the value of this threshold is 10240; it can be updated
according to the user's requirement.

Reported-by: Xiuhong Wang <xiuhong.wang@unisoc.com>
Closes: https://lore.kernel.org/linux-f2fs-devel/20241112110627.1314632-1-xiuhong.wang@unisoc.com/
Signed-off-by: Xiuhong Wang <xiuhong.wang@unisoc.com>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-23 15:48:13 +00:00
Chao Yu
3fc5d5a182 f2fs: fix to shrink read extent node in batches
We use an rwlock to protect the core structure data of the extent tree
during its shrink. However, if there is a huge number of extent nodes in
the extent tree, the shrink may hold the rwlock for a very long time,
which may trigger a kernel hang.

This patch shrinks read extent nodes in batches, so that the
critical region of the rwlock is shortened and the lock is not held
for an extremely long time.

Reported-by: Xiuhong Wang <xiuhong.wang@unisoc.com>
Closes: https://lore.kernel.org/linux-f2fs-devel/20241112110627.1314632-1-xiuhong.wang@unisoc.com/
Signed-off-by: Xiuhong Wang <xiuhong.wang@unisoc.com>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:16:54 +00:00
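The batching idea described above can be sketched in user space as follows. This is an illustration only: `struct lock_sketch` is a hypothetical stand-in that merely counts acquisitions (it is not an rwlock), the tree is reduced to a node counter, and the batch size is an arbitrary parameter rather than a value taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a lock; it only tracks state and counts holds. */
struct lock_sketch {
	int held;
	unsigned long acquisitions;
};

static void lock_acquire(struct lock_sketch *l)
{
	assert(!l->held);
	l->held = 1;
	l->acquisitions++;
}

static void lock_release(struct lock_sketch *l)
{
	assert(l->held);
	l->held = 0;
}

/* Hypothetical extent tree reduced to a node counter. */
struct tree_sketch {
	struct lock_sketch lock;
	size_t node_cnt;
};

/*
 * Free at most `batch` nodes per lock hold and drop the lock between
 * batches, so no single critical section grows with the tree size.
 * Returns the total number of nodes freed.
 */
static size_t shrink_in_batches(struct tree_sketch *t, size_t batch)
{
	size_t freed = 0;
	int done;

	assert(batch > 0);
	do {
		size_t n;

		lock_acquire(&t->lock);
		n = t->node_cnt < batch ? t->node_cnt : batch;
		t->node_cnt -= n;
		freed += n;
		done = (t->node_cnt == 0);
		lock_release(&t->lock);
		/* Other lock users can run here before the next batch. */
	} while (!done);

	return freed;
}
```

Shrinking 10 nodes with a batch size of 4 takes three short lock holds (4, 4, 2) instead of one long one.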
Chao Yu
81520c684c f2fs: print message if fscorrupted was found in f2fs_new_node_page()
If fs corruption occurs in f2fs_new_node_page(), let's print
more information about corrupted metadata into kernel log.

Meanwhile, record ERROR_INCONSISTENT_NAT instead of
ERROR_INVALID_BLKADDR if blkaddr in the nat entry is not NULL_ADDR,
which means the nat bitmap and the nat entry are inconsistent.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:16:39 +00:00
Sheng Yong
f88c7904b5 f2fs: clear SBI_POR_DOING before initing inmem curseg
SBI_POR_DOING can be cleared after recovery is completed, so that
changes made before recovery can be persistent, and subsequent
errors can be recorded into cp/sb.

Signed-off-by: Song Feng <songfeng@oppo.com>
Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Signed-off-by: Sheng Yong <shengyong@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:14:07 +00:00
Sheng Yong
1015035609 f2fs: fix changing cursegs if recovery fails on zoned device
Fsync data recovery attempts to check and fix write pointer consistency
of cursegs and all other zones. If the write pointers of cursegs are
unaligned, cursegs are changed to new sections.

If recovery fails, zone write pointers are still checked and fixed,
but the latest checkpoint cannot be written back. Additionally, retry-
mount skips recovery and rolls back to reuse the old cursegs whose
zones are already finished. This can lead to unaligned write later.

This patch addresses the issue by leaving write pointers untouched if
recovery fails. When retry-mount is performed, cursegs and other zones
are checked and fixed after skipping recovery.

Signed-off-by: Song Feng <songfeng@oppo.com>
Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Signed-off-by: Sheng Yong <shengyong@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:13:53 +00:00
Daeho Jeong
a35749b1ed f2fs: adjust unusable cap before checkpoint=disable mode
The unusable cap value must be adjusted before checking whether
checkpoint=disable is feasible.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:12:30 +00:00
Chao Yu
6787a82245 f2fs: fix to requery extent which cross boundary of inquiry
dd if=/dev/zero of=file bs=4k count=5
xfs_io file -c "fiemap -v 2 16384"
file:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..31]:         139272..139303      32 0x1000
     1: [32..39]:        139304..139311       8 0x1001
xfs_io file -c "fiemap -v 0 16384"
file:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..31]:         139272..139303      32 0x1000
xfs_io file -c "fiemap -v 0 16385"
file:
   EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
     0: [0..39]:         139272..139311      40 0x1001

There are two problems:
- a continuous extent is split into two
- FIEMAP_EXTENT_LAST is missing in the last extent

The root cause is: if the upper boundary of the inquiry crosses an extent,
f2fs_map_blocks() truncates the length of the returned extent to
F2FS_BYTES_TO_BLK(len), and it also stops querying the following
extent or hole, so it cannot tell whether the current extent is the last one.

To fix this issue, once f2fs_map_blocks() finds an extent located
at the end of the inquiry range, we need to expand the inquiry range
and requery.

Cc: stable@vger.kernel.org
Fixes: 7f63eb77af ("f2fs: report unwritten area in f2fs_fiemap")
Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:12:30 +00:00
Zhiguo Niu
77569f785c f2fs: fix to adjust appropriate length for fiemap
If the user gives a file size as the "length" parameter for a fiemap
operation and this size is not block-size aligned, fiemap shows
2 segments in its results even if the whole file
is contiguous on disk, as in the following results:

 ./f2fs_io fiemap 0 19034 ylog/analyzer.py
Fiemap: offset = 0 len = 19034
        logical addr.    physical addr.   length           flags
0       0000000000000000 0000000020baa000 0000000000004000 00001000
1       0000000000004000 0000000020bae000 0000000000001000 00001001

after this patch:
./f2fs_io fiemap 0 19034 ylog/analyzer.py
Fiemap: offset = 0 len = 19034
    logical addr.    physical addr.   length           flags
0    0000000000000000 00000000315f3000 0000000000005000 00001001

Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:12:30 +00:00
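The arithmetic behind the two outputs above can be sketched as follows. This assumes 4 KiB blocks (the 0x1000-byte second extent in the first output suggests that size) and uses hypothetical helper names; it shows only the rounding involved, not the code of the patch itself.

```c
#include <assert.h>
#include <stdint.h>

#define BLKSIZE_BITS_SKETCH 12	/* assumption: 4 KiB blocks */
#define BLKSIZE_SKETCH (1ULL << BLKSIZE_BITS_SKETCH)

/* Truncate a byte length to a whole number of blocks. */
static uint64_t bytes_round_down(uint64_t len)
{
	return len & ~(BLKSIZE_SKETCH - 1);
}

/* Extend a byte length to cover the last, partially used block. */
static uint64_t bytes_round_up(uint64_t len)
{
	assert(len <= UINT64_MAX - (BLKSIZE_SKETCH - 1));
	return (len + BLKSIZE_SKETCH - 1) & ~(BLKSIZE_SKETCH - 1);
}
```

For the length 19034 used above, rounding down gives 0x4000 bytes, which leaves the final block to be reported as a separate 0x1000-byte extent; rounding up gives 0x5000 bytes, matching the single extent in the second output.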
Chao Yu
7461f37094 f2fs: clean up w/ F2FS_{BLK_TO_BYTES,BTYES_TO_BLK}
f2fs doesn't support different block sizes in one instance, so
bytes_to_blks() and blks_to_bytes() are equivalent to F2FS_BYTES_TO_BLK
and F2FS_BLK_TO_BYTES. Let's use F2FS_BYTES_TO_BLK/F2FS_BLK_TO_BYTES
instead for cleanup.

Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:12:29 +00:00
Chao Yu
3273d8ad94 f2fs: fix to do cast in F2FS_{BLK_TO_BYTES, BTYES_TO_BLK} to avoid overflow
The macros missed casting the variable to unsigned long long before
the bit shift, which can cause overflow. Fix it.

Fixes: f7ef9b83b5 ("f2fs: introduce macros to convert bytes and blocks in f2fs")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:12:29 +00:00
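The overflow described above is easy to reproduce in user space. In this sketch the function names and the 12-bit shift (4 KiB blocks) are assumptions for illustration; the actual macros are not shown in this log.

```c
#include <assert.h>
#include <stdint.h>

#define BLKSIZE_BITS_SKETCH 12	/* assumption: 4 KiB blocks */

/*
 * Shift performed in 32-bit arithmetic: bits shifted past bit 31 are
 * lost before the result is widened. The explicit (uint32_t) models
 * a 32-bit intermediate regardless of the platform's int width.
 */
static uint64_t blk_to_bytes_no_cast(uint32_t blk)
{
	return (uint32_t)(blk << BLKSIZE_BITS_SKETCH);
}

/* Widen first, then shift: the full 64-bit result is kept. */
static uint64_t blk_to_bytes_cast(uint32_t blk)
{
	assert(BLKSIZE_BITS_SKETCH < 32);
	return (uint64_t)blk << BLKSIZE_BITS_SKETCH;
}
```

For block number 2^21 the byte offset is 2^33. Without the cast the result wraps to 0, while with the cast it is 8589934592. Small block numbers give the same answer either way, which is why this kind of bug tends to surface only on large devices or files.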
Daniel Yang
789ca0eb47 f2fs: replace deprecated strcpy with strscpy
strcpy is deprecated. Kernel docs recommend replacing strcpy with
strscpy. The return value of strcpy() isn't used here, so there
shouldn't be an issue replacing it with the safer alternative strscpy.

Signed-off-by: Daniel Yang <danielyangkang@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:12:29 +00:00
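For readers outside the kernel, the difference between the two functions can be sketched in user space. `strscpy_sketch()` below is a hypothetical approximation of the documented strscpy() behavior (bounded copy, always NUL-terminated for a non-zero size, -E2BIG on truncation); it is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Copy at most size - 1 bytes of src into dst and always terminate dst.
 * Returns the number of bytes copied (excluding the NUL), or -E2BIG if
 * src did not fit or size is 0.
 */
static long strscpy_sketch(char *dst, const char *src, size_t size)
{
	size_t len;

	assert(dst != NULL && src != NULL);
	if (size == 0)
		return -E2BIG;

	for (len = 0; len < size - 1 && src[len] != '\0'; len++)
		dst[len] = src[len];
	dst[len] = '\0';

	return src[len] == '\0' ? (long)len : -E2BIG;
}
```

Unlike strcpy(), the copy can never run past the destination buffer, and a caller that does check the return value can detect truncation.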
Jaegeuk Kim
acff9409dd Revert "f2fs: remove unreachable lazytime mount option parsing"
This reverts commit 54f43a10fa.

The above commit broke the lazytime mount, given

mount("/dev/vdb", "/mnt/test", "f2fs", 0, "lazytime");

CC: stable@vger.kernel.org # 6.11+
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-21 16:12:15 +00:00
Linus Torvalds
0f25f0e4ef the bulk of struct fd memory safety stuff
Making sure that struct fd instances are destroyed in the same
 scope where they'd been created, getting rid of reassignments
 and passing them by reference, converting to CLASS(fd{,_pos,_raw}).
 
 We are getting very close to having the memory safety of that stuff
 trivial to verify.
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull 'struct fd' class updates from Al Viro:
 "The bulk of struct fd memory safety stuff

  Making sure that struct fd instances are destroyed in the same scope
  where they'd been created, getting rid of reassignments and passing
  them by reference, converting to CLASS(fd{,_pos,_raw}).

  We are getting very close to having the memory safety of that stuff
  trivial to verify"

* tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits)
  deal with the last remaing boolean uses of fd_file()
  css_set_fork(): switch to CLASS(fd_raw, ...)
  memcg_write_event_control(): switch to CLASS(fd)
  assorted variants of irqfd setup: convert to CLASS(fd)
  do_pollfd(): convert to CLASS(fd)
  convert do_select()
  convert vfs_dedupe_file_range().
  convert cifs_ioctl_copychunk()
  convert media_request_get_by_fd()
  convert spu_run(2)
  switch spufs_calls_{get,put}() to CLASS() use
  convert cachestat(2)
  convert do_preadv()/do_pwritev()
  fdget(), more trivial conversions
  fdget(), trivial conversions
  privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget()
  o2hb_region_dev_store(): avoid goto around fdget()/fdput()
  introduce "fd_pos" class, convert fdget_pos() users to it.
  fdget_raw() users: switch to CLASS(fd_raw)
  convert vmsplice() to CLASS(fd)
  ...
2024-11-18 12:24:06 -08:00
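The scope-bound lifetime that this series moves toward can be sketched in user space with the GCC/Clang `cleanup` variable attribute, which is the compiler feature the kernel's CLASS() helpers build on. Everything named `*_sketch` here, the `SCOPED_FD` macro, and the release counter are hypothetical; the real `struct fd` and `CLASS(fd, ...)` definitions are not reproduced.

```c
#include <assert.h>
#include <stddef.h>

static int put_count;	/* counts releases, for demonstration only */

/* Hypothetical stand-in for a reference-holding handle. */
struct fd_sketch {
	int fd;
};

static struct fd_sketch fd_get_sketch(int fd)
{
	return (struct fd_sketch){ .fd = fd };
}

/* Runs automatically when a SCOPED_FD variable goes out of scope. */
static void fd_put_sketch(struct fd_sketch *f)
{
	assert(f != NULL);
	if (f->fd >= 0)
		put_count++;
}

/* Declare a handle that is released in the scope where it was created. */
#define SCOPED_FD(name, fd) \
	struct fd_sketch name __attribute__((cleanup(fd_put_sketch))) = \
		fd_get_sketch(fd)

static int use_fd(int fd)
{
	SCOPED_FD(f, fd);

	if (f.fd < 0)
		return -1;	/* early return: cleanup still runs */
	return f.fd * 2;	/* normal return: cleanup runs here too */
}
```

Because the release is attached to the variable's scope, there is no exit path on which it can be forgotten, which is what makes this pattern easy to verify by inspection.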
Linus Torvalds
70e7730c2a vfs-6.13.misc

Merge tag 'vfs-6.13.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
 "Features:

   - Fixup and improve NLM and kNFSD file lock callbacks

     Last year both GFS2 and OCFS2 had some work done to make their
     locking more robust when exported over NFS. Unfortunately, part of
     that work caused both NLM (for NFS v3 exports) and kNFSD (for
     NFSv4.1+ exports) to no longer send lock notifications to clients

     This in itself is not a huge problem because most NFS clients will
     still poll the server in order to acquire a conflicted lock

     It's important for NLM and kNFSD that they do not block their
     kernel threads inside filesystem's file_lock implementations
     because that can produce deadlocks. We used to make sure of this by
     only trusting that posix_lock_file() can correctly handle blocking
     lock calls asynchronously, so the lock managers would only setup
     their file_lock requests for async callbacks if the filesystem did
     not define its own lock() file operation

     However, when GFS2 and OCFS2 grew the capability to correctly
     handle blocking lock requests asynchronously, they started
     signalling this behavior with EXPORT_OP_ASYNC_LOCK, and the check
     for also trusting posix_lock_file() was inadvertently dropped, so
     now most filesystems no longer produce lock notifications when
     exported over NFS

     Fix this by using an fop_flag which greatly simplifies the problem
     and grooms the way for future uses by both filesystems and lock
     managers alike

   - Add a sysctl to delete the dentry when a file is removed instead of
     making it a negative dentry

     Commit 681ce86235 ("vfs: Delete the associated dentry when
     deleting a file") introduced an unconditional deletion of the
     associated dentry when a file is removed. However, this led to
     performance regressions in specific benchmarks, such as
     filebench.sum_operations/s, prompting a revert in commit
     4a4be1ad3a ("Revert "vfs: Delete the associated dentry when
     deleting a file""). This reintroduces the concept conditionally
     through a sysctl

   - Expand the statmount() system call:

       * Report the filesystem subtype in a new fs_subtype field to
         e.g., report fuse filesystem subtypes

       * Report the superblock source in a new sb_source field

       * Add a new way to return filesystem specific mount options in an
         option array that returns filesystem specific mount options
         separated by zero bytes and unescaped. This allows callers to
         retrieve filesystem specific mount options and immediately pass
         them to e.g., fsconfig() without having to unescape or split
         them

       * Report security (LSM) specific mount options in a separate
         security option array. We don't lump them together with
         filesystem specific mount options as security mount options are
         generic and most users aren't interested in them

         The format is the same as for the filesystem specific mount
         option array

   - Support relative paths in fsconfig()'s FSCONFIG_SET_STRING command

   - Optimize acl_permission_check() to avoid costly {g,u}id ownership
     checks if possible

   - Use smp_mb__after_spinlock() to avoid full smp_mb() in evict()

   - Add synchronous wakeup support for ep_poll_callback.

     Currently, epoll only uses wake_up() to wake up task. But sometimes
     there are epoll users which want to use the synchronous wakeup flag
     to give a hint to the scheduler, e.g., the Android binder driver.
     So add a wake_up_sync() define, and use wake_up_sync() when sync is
     true in ep_poll_callback()

  Fixes:

   - Fix kernel documentation for inode_insert5() and iget5_locked()

   - Annotate racy epoll check on file->f_ep

   - Make F_DUPFD_QUERY associative

   - Avoid filename buffer overrun in initramfs

   - Don't let statmount() return empty strings

   - Add a cond_resched() to dump_user_range() to avoid hogging the CPU

   - Don't query the device logical blocksize multiple times for hfsplus

   - Make filemap_read() check that the offset is positive or zero

  Cleanups:

   - Various typo fixes

   - Cleanup wbc_attach_fdatawrite_inode()

   - Add __releases annotation to wbc_attach_and_unlock_inode()

   - Add hugetlbfs tracepoints

   - Fix various vfs kernel doc parameters

   - Remove obsolete TODO comment from io_cancel()

   - Convert wbc_account_cgroup_owner() to take a folio

   - Fix comments for BANDWIDTH_INTERVAL and wb_domain_writeout_add()

   - Reorder struct posix_acl to save 8 bytes

   - Annotate struct posix_acl with __counted_by()

   - Replace one-element array with flexible array member in freevxfs

   - Use idiomatic atomic64_inc_return() in alloc_mnt_ns()"

* tag 'vfs-6.13.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits)
  statmount: retrieve security mount options
  vfs: make evict() use smp_mb__after_spinlock instead of smp_mb
  statmount: add flag to retrieve unescaped options
  fs: add the ability for statmount() to report the sb_source
  writeback: wbc_attach_fdatawrite_inode out of line
  writeback: add a __releases annoation to wbc_attach_and_unlock_inode
  fs: add the ability for statmount() to report the fs_subtype
  fs: don't let statmount return empty strings
  fs:aio: Remove TODO comment suggesting hash or array usage in io_cancel()
  hfsplus: don't query the device logical block size multiple times
  freevxfs: Replace one-element array with flexible array member
  fs: optimize acl_permission_check()
  initramfs: avoid filename buffer overrun
  fs/writeback: convert wbc_account_cgroup_owner to take a folio
  acl: Annotate struct posix_acl with __counted_by()
  acl: Realign struct posix_acl to save 8 bytes
  epoll: Add synchronous wakeup support for ep_poll_callback
  coredump: add cond_resched() to dump_user_range
  mm/page-writeback.c: Fix comment of wb_domain_writeout_add()
  mm/page-writeback.c: Update comment for BANDWIDTH_INTERVAL
  ...
2024-11-18 09:35:30 -08:00
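The statmount() item in the pull message above describes mount options returned as an array of strings separated by zero bytes. A consumer of such a buffer might walk it as in the sketch below. The buffer layout (one NUL after every option, including the last) and the separately supplied option count are assumptions of this example; the actual statmount() structure fields are not shown in this log and are not used here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Collect pointers to `num` NUL-separated option strings stored back to
 * back in `buf`. Stores at most `max` pointers in `out` and returns how
 * many were stored. The strings are not copied or modified.
 */
static size_t collect_opts_sketch(const char *buf, size_t num,
				  const char **out, size_t max)
{
	size_t i;

	assert(buf != NULL && out != NULL);
	for (i = 0; i < num && i < max; i++) {
		out[i] = buf;
		buf += strlen(buf) + 1;	/* skip the option and its NUL */
	}
	return i;
}
```

Since the options arrive unescaped and already split, each pointer can be handed on as is, which is the convenience the pull message describes for passing them to fsconfig().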
Chao Yu
26e6f59d0b f2fs: fix to avoid forcing direct write to use buffered IO on inline_data inode
Jinsu Lee reported a performance regression: after commit
5c8764f867 ("f2fs: fix to force buffered IO on inline_data
inode"), we forced direct write to use buffered IO on inline_data
inodes, which causes a performance regression due to memory copy
and data flush.

It's fine to not force direct write to use buffered IO, as it
can convert inline inode before committing direct write IO.

Fixes: 5c8764f867 ("f2fs: fix to force buffered IO on inline_data inode")
Reported-by: Jinsu Lee <jinsu1.lee@samsung.com>
Closes: https://lore.kernel.org/linux-f2fs-devel/af03dd2c-e361-4f80-b2fd-39440766cf6e@kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-05 18:30:22 +00:00
Chao Yu
5dd00ebda3 f2fs: fix to map blocks correctly for direct write
f2fs_map_blocks() supports mapping continuous holes or preallocated
addresses. We should avoid setting F2FS_MAP_MAPPED only for these
cases; otherwise f2fs_iomap_begin() may fail, making direct writes
fall back to buffered IO and flush, which results in a performance
regression.

Fixes: 9f0f6bf427 ("f2fs: support to map continuous holes or preallocated address")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202409122103.e45aa13b-oliver.sang@intel.com
Cc: Cyril Hrubis <chrubis@suse.cz>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-05 18:29:57 +00:00
Long Li
7b0033dbc4 f2fs: fix race in concurrent f2fs_stop_gc_thread
In my test case, concurrent calls to f2fs shutdown report the following
stack trace:

 Oops: general protection fault, probably for non-canonical address 0xc6cfff63bb5513fc: 0000 [#1] PREEMPT SMP PTI
 CPU: 0 UID: 0 PID: 678 Comm: f2fs_rep_shutdo Not tainted 6.12.0-rc5-next-20241029-g6fb2fa9805c5-dirty #85
 Call Trace:
  <TASK>
  ? show_regs+0x8b/0xa0
  ? __die_body+0x26/0xa0
  ? die_addr+0x54/0x90
  ? exc_general_protection+0x24b/0x5c0
  ? asm_exc_general_protection+0x26/0x30
  ? kthread_stop+0x46/0x390
  f2fs_stop_gc_thread+0x6c/0x110
  f2fs_do_shutdown+0x309/0x3a0
  f2fs_ioc_shutdown+0x150/0x1c0
  __f2fs_ioctl+0xffd/0x2ac0
  f2fs_ioctl+0x76/0xe0
  vfs_ioctl+0x23/0x60
  __x64_sys_ioctl+0xce/0xf0
  x64_sys_call+0x2b1b/0x4540
  do_syscall_64+0xa7/0x240
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

The root cause is a race condition in f2fs_stop_gc_thread() called from
different f2fs shutdown paths:

  [CPU0]                       [CPU1]
  ----------------------       -----------------------
  f2fs_stop_gc_thread          f2fs_stop_gc_thread
                                 gc_th = sbi->gc_thread
    gc_th = sbi->gc_thread
    kfree(gc_th)
    sbi->gc_thread = NULL
                                 < gc_th != NULL >
                                 kthread_stop(gc_th->f2fs_gc_task) //UAF

Commit c7f114d864 ("f2fs: fix to avoid use-after-free in
f2fs_stop_gc_thread()") attempted to fix this issue by using a read
semaphore to prevent races between shutdown and remount threads, but
it does not prevent all race conditions.

Fix it by taking the write lock of s_umount in f2fs_do_shutdown().
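
To illustrate why an exclusive lock closes the race shown in the diagram, here is a minimal user-space sketch. It is not the kernel patch: the class and function names are hypothetical stand-ins, a plain mutex stands in for the write side of s_umount, and a counter stands in for kthread_stop() plus kfree().

```python
import threading


class SuperBlockInfo:
    """Hypothetical stand-in for struct f2fs_sb_info."""

    def __init__(self):
        self.gc_thread = object()            # stands in for sbi->gc_thread
        self.umount_lock = threading.Lock()  # stands in for s_umount held for write
        self.stops = 0                       # how many times the thread was "stopped"


def stop_gc_thread(sbi):
    """Stop and free the GC thread once; later callers see None and return."""
    gc_th = sbi.gc_thread
    if gc_th is None:
        return
    sbi.stops += 1          # stands in for kthread_stop() and kfree()
    sbi.gc_thread = None


def do_shutdown(sbi):
    # The exclusive lock serializes the read of gc_thread, the stop and
    # the reset to None, so no caller can act on a stale pointer.
    with sbi.umount_lock:
        stop_gc_thread(sbi)


sbi = SuperBlockInfo()
workers = [threading.Thread(target=do_shutdown, args=(sbi,)) for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(sbi.stops)
```

With a shared (read) lock, several callers could be inside stop_gc_thread() at the same time, which is the interleaving in the diagram above.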

Fixes: 7950e9ac63 ("f2fs: stop gc/discard thread after fs shutdown")
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-05 18:27:51 +00:00
Xiuhong Wang
a7a7c1d423 f2fs: fix fiemap failure issue when page size is 16KB
After enabling 16K page size, an infinite loop may occur in
fiemap (fm_length=UINT64_MAX) on a file, such as the 16KB
scratch.img during the remount operation in Android.

When there are holes, fiemap decides whether to continue mapping by
checking whether the number of bytes corresponding to the next
map.m_lblk exceeds blks_to_bytes(inode,max_inode_blocks(inode)).
The latter does not take into account the maximum size of a file with
16KB page size, so the loop never exits.

The failure proceeds as follows:
When f2fs_map_blocks reaches map.m_lblk=3936, it needs to go to the
first direct node block, so the next map is 3936 + 4090 = 8026.
The next map is the second direct node block, that is,
8026 + 4090 = 12116.
The next map is the first indirect node block, that is,
12116 + 4090 * 4090 = 16740216.
The next map is the second indirect node block, that is,
16740216 + 4090 * 4090 = 33468316.
The next map is the first double indirect node block, that is,
33468316 + 4090 * 4090 * 4090 = 68451397316.
Since map.m_lblk is a 32-bit block address, truncation occurs:
68451397316 becomes 4026887876, and the number of bytes corresponding
to that block number does not exceed
blks_to_bytes(inode,max_inode_blocks(inode)), so the loop does not exit.
The next iteration still treats the position as a double indirect node
block, that is, 4026887876 + 4090 * 4090 * 4090 = 72444816876, which is
truncated to 3725340140, and again the loop does not exit.

156.374871: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 0, start blkaddr = 0x8e00, len = 0x200, flags = 2,seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.374916: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 512, start blkaddr = 0x0, len = 0x0, flags = 0 , seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.374920: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 513, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
......
156.385747: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 3935, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385752: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 3936, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385755: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 8026, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385758: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 12116, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385761: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 16740216, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385764: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 33468316, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385767: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 4026887876, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385770: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 3725340140, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385772: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 4026887876, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
156.385775: f2fs_map_blocks: dev = (254,57), ino = 7449, file offset = 3725340140, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 8, may_create = 0, multidevice = 0, flag = 1, err = 0
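
The wrap-around described above can be reproduced with a few lines of arithmetic. This is only an illustrative sketch, not kernel code: the helper next_lblk() and the names ADDRS_PER_BLOCK and U32_MASK are made up here, and the step of 4090 is taken from the figures in this message.

```python
# Step size used in the trace above for a 16KB page (assumed from this message).
ADDRS_PER_BLOCK = 4090
# map.m_lblk is 32 bits wide, so additions wrap modulo 2**32.
U32_MASK = 0xFFFFFFFF


def next_lblk(lblk: int, step: int) -> int:
    """Advance a block offset the way an unsigned 32-bit addition would."""
    return (lblk + step) & U32_MASK


steps = [
    ADDRS_PER_BLOCK,       # first direct node block
    ADDRS_PER_BLOCK,       # second direct node block
    ADDRS_PER_BLOCK ** 2,  # first indirect node block
    ADDRS_PER_BLOCK ** 2,  # second indirect node block
    ADDRS_PER_BLOCK ** 3,  # double indirect node block (wraps)
    ADDRS_PER_BLOCK ** 3,  # double indirect node block again (wraps)
]

offsets = [3936]
for step in steps:
    offsets.append(next_lblk(offsets[-1], step))

for off in offsets:
    print(off)
```

Every value stays below 2**32, so a termination check that compares against a larger block count can never be satisfied by the wrapped offset.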

Commit a6a010f5de ("f2fs: Restrict max filesize for 16K f2fs")
has set the maximum allowed file size to (U32_MAX + 1) * F2FS_BLKSIZE,
so max_file_blocks should be used here as the limit, that is,
maxbytes defined above. The max_inode_blocks function has no other
callers, so remove it.

Signed-off-by: Xiuhong Wang <xiuhong.wang@unisoc.com>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-05 18:26:51 +00:00
Al Viro
8152f82010 fdget(), more trivial conversions
all failure exits prior to fdget() leave the scope, all matching fdput()
are immediately followed by leaving the scope.

[xfs_ioc_commit_range() chunk moved here as well]

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-03 01:28:06 -05:00
Zhiguo Niu
744e66cb87 f2fs: remove redundant atomic file check in defragment
f2fs_is_atomic_file(inode) is checked in f2fs_defragment_range,
so remove the redundant checking in f2fs_ioc_defragment.

Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:24:42 +00:00
Chao Yu
51d3d952c5 f2fs: fix to convert log type to segment data type correctly
This patch introduces a new helper log_type_to_seg_type() to convert
log type to segment data type, uses it to clean up the open-coded
conversion in build_curseg(), and also fixes do_write_page() to
convert the log type before use.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:24:42 +00:00
LongPing Wei
1df2bc3c82 f2fs: clean up the unused variable additional_reserved_segments
additional_reserved_segments was introduced by
commit 300a842937 ("f2fs: fix to reserve space for IO align feature"),
and its initialization was deleted by
commit 87161a2b0a ("f2fs: deprecate io_bits").

Signed-off-by: LongPing Wei <weilongping@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:24:41 +00:00
Chao Yu
cffaa0976f f2fs: clean up opened code w/ {get,set}_nid()
Just cleanup, no logic change.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:24:41 +00:00
Zhiguo Niu
296b8cb34e f2fs: fix to avoid use GC_AT when setting gc_mode as GC_URGENT_LOW or GC_URGENT_MID
If gc_mode is set to GC_URGENT_LOW or GC_URGENT_MID, the cost-benefit
(CB) GC approach should be used. However, if ATGC is enabled at the
same time, the age-threshold approach is selected instead, which can
only do a limited amount of GC, much less than the CB approach.

Some traces:
  f2fs_gc-254:48-396     [007] ..... 2311600.684028: f2fs_gc_begin: dev = (254,48), gc_type = Background GC, no_background_GC = 0, nr_free_secs = 0, nodes = 1053, dents = 2, imeta = 18, free_sec:44898, free_seg:44898, rsv_seg:239, prefree_seg:0
  f2fs_gc-254:48-396     [007] ..... 2311600.684527: f2fs_get_victim: dev = (254,48), type = No TYPE, policy = (Background GC, LFS-mode, Age-threshold), victim = 10, cost = 4294364975, ofs_unit = 1, pre_victim_secno = -1, prefree = 0, free = 44898
  f2fs_gc-254:48-396     [007] ..... 2311600.714835: f2fs_gc_end: dev = (254,48), ret = 0, seg_freed = 0, sec_freed = 0, nodes = 1562, dents = 2, imeta = 18, free_sec:44898, free_seg:44898, rsv_seg:239, prefree_seg:0
  f2fs_gc-254:48-396     [007] ..... 2311600.714843: f2fs_background_gc: dev = (254,48), wait_ms = 50, prefree = 0, free = 44898
  f2fs_gc-254:48-396     [007] ..... 2311600.771785: f2fs_gc_begin: dev = (254,48), gc_type = Background GC, no_background_GC = 0, nr_free_secs = 0, nodes = 1562, dents = 2, imeta = 18, free_sec:44898, free_seg:44898, rsv_seg:239, prefree_seg:
  f2fs_gc-254:48-396     [007] ..... 2311600.772275: f2fs_gc_end: dev = (254,48), ret = -61, seg_freed = 0, sec_freed = 0, nodes = 1562, dents = 2, imeta = 18, free_sec:44898, free_seg:44898, rsv_seg:239, prefree_seg:0
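
The intended selection for background GC can be sketched as below. This is a hypothetical illustration of the behavior described in this message, not the patched kernel function: the function name and the string constants are assumptions, and only the background GC case is modeled.

```python
def select_victim_policy(gc_mode: str, atgc_enabled: bool) -> str:
    """Pick the victim selection approach for background GC (sketch)."""
    # Urgent low/mid modes must use cost-benefit even when ATGC is enabled.
    if gc_mode in ("GC_URGENT_LOW", "GC_URGENT_MID"):
        return "cost-benefit"
    # Otherwise ATGC, when enabled, selects the age-threshold approach.
    if atgc_enabled:
        return "age-threshold"
    return "cost-benefit"


for mode in ("GC_NORMAL", "GC_URGENT_LOW", "GC_URGENT_MID"):
    print(mode, select_victim_policy(mode, atgc_enabled=True))
```

The point of the ordering is that the urgent-mode check comes before the ATGC check, so enabling ATGC cannot override an explicit urgent mode.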

Fixes: 0e5e81114d ("f2fs: add GC_URGENT_LOW mode in gc_urgent")
Fixes: d98af5f455 ("f2fs: introduce gc_urgent_mid mode")
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:24:41 +00:00
Chao Yu
9395fb09e8 f2fs: fix to parse temperature correctly in f2fs_get_segment_temp()
In __get_segment_type(), __get_segment_type_6() may return the
CURSEG_COLD_DATA_PINNED or CURSEG_ALL_DATA_ATGC log type, but the
subsequent f2fs_get_segment_temp() can only handle persistent
log types. Fix it.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:24:41 +00:00
Chao Yu
f10a890308 f2fs: fix to avoid potential deadlock in f2fs_record_stop_reason()
syzbot reports a deadlock issue in f2fs, as below:

======================================================
WARNING: possible circular locking dependency detected
6.12.0-rc3-syzkaller-00087-gc964ced77262 #0 Not tainted
------------------------------------------------------
kswapd0/79 is trying to acquire lock:
ffff888011824088 (&sbi->sb_lock){++++}-{3:3}, at: f2fs_down_write fs/f2fs/f2fs.h:2199 [inline]
ffff888011824088 (&sbi->sb_lock){++++}-{3:3}, at: f2fs_record_stop_reason+0x52/0x1d0 fs/f2fs/super.c:4068

but task is already holding lock:
ffff88804bd92610 (sb_internal#2){.+.+}-{0:0}, at: f2fs_evict_inode+0x662/0x15c0 fs/f2fs/inode.c:842

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (sb_internal#2){.+.+}-{0:0}:
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825
       percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
       __sb_start_write include/linux/fs.h:1716 [inline]
       sb_start_intwrite+0x4d/0x1c0 include/linux/fs.h:1899
       f2fs_evict_inode+0x662/0x15c0 fs/f2fs/inode.c:842
       evict+0x4e8/0x9b0 fs/inode.c:725
       f2fs_evict_inode+0x1a4/0x15c0 fs/f2fs/inode.c:807
       evict+0x4e8/0x9b0 fs/inode.c:725
       dispose_list fs/inode.c:774 [inline]
       prune_icache_sb+0x239/0x2f0 fs/inode.c:963
       super_cache_scan+0x38c/0x4b0 fs/super.c:223
       do_shrink_slab+0x701/0x1160 mm/shrinker.c:435
       shrink_slab+0x1093/0x14d0 mm/shrinker.c:662
       shrink_one+0x43b/0x850 mm/vmscan.c:4818
       shrink_many mm/vmscan.c:4879 [inline]
       lru_gen_shrink_node mm/vmscan.c:4957 [inline]
       shrink_node+0x3799/0x3de0 mm/vmscan.c:5937
       kswapd_shrink_node mm/vmscan.c:6765 [inline]
       balance_pgdat mm/vmscan.c:6957 [inline]
       kswapd+0x1ca3/0x3700 mm/vmscan.c:7226
       kthread+0x2f0/0x390 kernel/kthread.c:389
       ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

-> #1 (fs_reclaim){+.+.}-{0:0}:
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825
       __fs_reclaim_acquire mm/page_alloc.c:3834 [inline]
       fs_reclaim_acquire+0x88/0x130 mm/page_alloc.c:3848
       might_alloc include/linux/sched/mm.h:318 [inline]
       prepare_alloc_pages+0x147/0x5b0 mm/page_alloc.c:4493
       __alloc_pages_noprof+0x16f/0x710 mm/page_alloc.c:4722
       alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265
       alloc_pages_noprof mm/mempolicy.c:2345 [inline]
       folio_alloc_noprof+0x128/0x180 mm/mempolicy.c:2352
       filemap_alloc_folio_noprof+0xdf/0x500 mm/filemap.c:1010
       do_read_cache_folio+0x2eb/0x850 mm/filemap.c:3787
       read_mapping_folio include/linux/pagemap.h:1011 [inline]
       f2fs_commit_super+0x3c0/0x7d0 fs/f2fs/super.c:4032
       f2fs_record_stop_reason+0x13b/0x1d0 fs/f2fs/super.c:4079
       f2fs_handle_critical_error+0x2ac/0x5c0 fs/f2fs/super.c:4174
       f2fs_write_inode+0x35f/0x4d0 fs/f2fs/inode.c:785
       write_inode fs/fs-writeback.c:1503 [inline]
       __writeback_single_inode+0x711/0x10d0 fs/fs-writeback.c:1723
       writeback_single_inode+0x1f3/0x660 fs/fs-writeback.c:1779
       sync_inode_metadata+0xc4/0x120 fs/fs-writeback.c:2849
       f2fs_release_file+0xa8/0x100 fs/f2fs/file.c:1941
       __fput+0x23f/0x880 fs/file_table.c:431
       task_work_run+0x24f/0x310 kernel/task_work.c:228
       resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
       exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
       __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
       syscall_exit_to_user_mode+0x168/0x370 kernel/entry/common.c:218
       do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #0 (&sbi->sb_lock){++++}-{3:3}:
       check_prev_add kernel/locking/lockdep.c:3161 [inline]
       check_prevs_add kernel/locking/lockdep.c:3280 [inline]
       validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904
       __lock_acquire+0x1384/0x2050 kernel/locking/lockdep.c:5202
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825
       down_write+0x99/0x220 kernel/locking/rwsem.c:1577
       f2fs_down_write fs/f2fs/f2fs.h:2199 [inline]
       f2fs_record_stop_reason+0x52/0x1d0 fs/f2fs/super.c:4068
       f2fs_handle_critical_error+0x2ac/0x5c0 fs/f2fs/super.c:4174
       f2fs_evict_inode+0xa61/0x15c0 fs/f2fs/inode.c:883
       evict+0x4e8/0x9b0 fs/inode.c:725
       f2fs_evict_inode+0x1a4/0x15c0 fs/f2fs/inode.c:807
       evict+0x4e8/0x9b0 fs/inode.c:725
       dispose_list fs/inode.c:774 [inline]
       prune_icache_sb+0x239/0x2f0 fs/inode.c:963
       super_cache_scan+0x38c/0x4b0 fs/super.c:223
       do_shrink_slab+0x701/0x1160 mm/shrinker.c:435
       shrink_slab+0x1093/0x14d0 mm/shrinker.c:662
       shrink_one+0x43b/0x850 mm/vmscan.c:4818
       shrink_many mm/vmscan.c:4879 [inline]
       lru_gen_shrink_node mm/vmscan.c:4957 [inline]
       shrink_node+0x3799/0x3de0 mm/vmscan.c:5937
       kswapd_shrink_node mm/vmscan.c:6765 [inline]
       balance_pgdat mm/vmscan.c:6957 [inline]
       kswapd+0x1ca3/0x3700 mm/vmscan.c:7226
       kthread+0x2f0/0x390 kernel/kthread.c:389
       ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

other info that might help us debug this:

Chain exists of:
  &sbi->sb_lock --> fs_reclaim --> sb_internal#2

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  rlock(sb_internal#2);
                               lock(fs_reclaim);
                               lock(sb_internal#2);
  lock(&sbi->sb_lock);

The root cause is a potential deadlock between the tasks
below:

Thread A				Kswapd
- f2fs_ioc_commit_atomic_write
 - mnt_want_write_file -- down_read lock A
					- balance_pgdat
					 - __fs_reclaim_acquire  -- lock B
					  - shrink_node
					   - prune_icache_sb
					    - dispose_list
					     - f2fs_evict_inode
					      - sb_start_intwrite  -- down_read lock A
 - f2fs_do_sync_file
  - f2fs_write_inode
   - f2fs_handle_critical_error
    - f2fs_record_stop_reason
     - f2fs_commit_super
      - read_mapping_folio
       - filemap_alloc_folio_noprof
        - fs_reclaim_acquire  -- lock B

Both threads try to acquire lock A for read, so an upcoming writer
of lock A will trigger a deadlock.

Let's always create an asynchronous task in f2fs_handle_critical_error()
rather than calling f2fs_record_stop_reason() synchronously to avoid
this potential deadlock issue.
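
The idea of deferring the work can be sketched in user space as follows. This is only an illustration under stated assumptions, not the kernel change: a queue plus a worker thread stands in for the asynchronous task, a mutex stands in for sbi->sb_lock, and all function names and the reason string are hypothetical.

```python
import queue
import threading

sb_lock = threading.Lock()   # stands in for sbi->sb_lock
work_queue = queue.Queue()   # stands in for the asynchronous task mechanism
recorded = []                # stop reasons written by the worker


def record_stop_reason(reason):
    """Takes sb_lock; only ever runs in the worker context."""
    with sb_lock:
        recorded.append(reason)


def worker():
    while True:
        reason = work_queue.get()
        if reason is None:   # sentinel: shut the worker down
            break
        record_stop_reason(reason)


def handle_critical_error(reason):
    # The caller may already hold other locks, so do not take sb_lock
    # here; hand the work to the worker instead.
    work_queue.put(reason)


t = threading.Thread(target=worker)
t.start()
handle_critical_error("hypothetical stop reason")
work_queue.put(None)
t.join()
print(recorded)
```

Because the error path never takes sb_lock itself, the lock is no longer acquired while the caller's own locks are held, which is the dependency the report above complains about.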

Fixes: b62e71be21 ("f2fs: support errors=remount-ro|continue|panic mountoption")
Reported-by: syzbot+be4a9983e95a5e25c8d3@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6704d667.050a0220.1e4d62.0081.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daejun Park <daejun7.park@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:24:41 +00:00
Zeng Heng
0c3a38a4b4 f2fs: Fix not used variable 'index'
Fix the following compilation warning:
fs/f2fs/data.c:2391:10: warning: variable ‘index’ set but not used
[-Wunused-but-set-variable]
 2391 |  pgoff_t index;

Only define and set the variable index when CONFIG_F2FS_FS_COMPRESSION
is enabled.

Fixes: db92e6c729 ("f2fs: convert f2fs_mpage_readpages() to use folio")
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:24:41 +00:00
Andrew Kreimer
e63ce120b4 f2fs: fix typos
Fix typos: datas -> data.

Via codespell.

Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:24:41 +00:00
Qi Han
84b5bb8bf0 f2fs: modify f2fs_is_checkpoint_ready logic to allow more data to be written with the CP disable
When free segments are used up while CP is disabled, many write or
ioctl operations will get ENOSPC error codes, even if there are
still many blocks available. We can reproduce it with the following
steps:

dd if=/dev/zero of=f2fs.img bs=1M count=65
mkfs.f2fs -f f2fs.img
mount f2fs.img f2fs_dir -o checkpoint=disable:10%
cd f2fs_dir
i=1 ; while [[ $i -lt 50 ]] ; do (file_name=./2M_file$i ; dd \
if=/dev/random of=$file_name bs=1M count=2); i=$((i+1)); done
sync
i=1 ; while [[ $i -lt 50 ]] ; do (file_name=./2M_file$i ; truncate \
-s 1K $file_name); i=$((i+1)); done
sync
dd if=/dev/zero of=./file bs=1M count=20

f2fs_need_SSR() allows SSR to be used to allocate blocks when CP is
disabled, so f2fs_is_checkpoint_ready() can check the number of
invalid blocks when free segments are not enough, and return ENOSPC
only if the number of invalid blocks is also not enough.
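
The decision described above can be sketched as a small truth function. This is a hypothetical illustration, not the kernel implementation: the function names and the three boolean parameters are assumptions that stand in for the checks on CP state, free segments and invalid blocks.

```python
import errno


def checkpoint_ready(cp_disabled: bool,
                     enough_free_segments: bool,
                     enough_invalid_blocks: bool) -> bool:
    """Return True when writes may proceed (sketch of the described logic)."""
    if not cp_disabled:
        return True
    if enough_free_segments:
        return True
    # Free segments are short: SSR can still reuse invalid blocks.
    return enough_invalid_blocks


def check_write(cp_disabled, enough_free_segments, enough_invalid_blocks) -> int:
    """Return 0 on success or -ENOSPC, mirroring the error named above."""
    if checkpoint_ready(cp_disabled, enough_free_segments, enough_invalid_blocks):
        return 0
    return -errno.ENOSPC


print(check_write(True, False, True))
print(check_write(True, False, False) == -errno.ENOSPC)
```

ENOSPC is returned only in the last case, when CP is disabled and both free segments and invalid blocks are insufficient.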

Signed-off-by: Qi Han <hanqi@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:24:40 +00:00
Yongpeng Yang
43563069e1 f2fs: check curseg->inited before write_sum_page in change_curseg
In the __f2fs_init_atgc_curseg->get_atssr_segment call path,
curseg->segno is NULL_SEGNO, indicating that there is no summary
block that needs to be written.

Fixes: 093749e296 ("f2fs: support age threshold based garbage collection")
Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:19:01 +00:00
LongPing Wei
c3af1f1347 f2fs: fix the wrong f2fs_bug_on condition in f2fs_do_replace_block
This f2fs_bug_on was introduced by commit 2c1905042c ("f2fs: check
segment type in __f2fs_replace_block") when there were only 6 curseg types.
After commit d0b9e42ab6 ("f2fs: introduce inmem curseg") was introduced,
the condition should be changed to checking curseg->seg_type.

Fixes: d0b9e42ab6 ("f2fs: introduce inmem curseg")
Signed-off-by: LongPing Wei <weilongping@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:19:01 +00:00
Chao Yu
5bc5aae843 f2fs: zone: introduce first_zoned_segno in f2fs_sb_info
first_zoned_segno() returns a fixed value, so cache it in
struct f2fs_sb_info to avoid redundant calculation.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:19:00 +00:00