linux-next/fs
Long Li 51d20d1dac
iomap: fix zero padding data issue in concurrent append writes
During concurrent append writes to XFS filesystem, zero padding data
may appear in the file after power failure. This happens due to imprecise
disk size updates when handling write completion.

Consider this scenario with concurrent append writes same file:

  Thread 1:                  Thread 2:
  ------------               -----------
  write [A, A+B]
  update inode size to A+B
  submit I/O [A, A+BS]
                             write [A+B, A+B+C]
                             update inode size to A+B+C
  <I/O completes, updates disk size to min(A+B+C, A+BS)>
  <power failure>

After reboot:
  1) with A+B+C < A+BS, the file has zero padding in range [A+B, A+B+C]

  |<         Block Size (BS)      >|
  |DDDDDDDDDDDDDDDD0000000000000000|
  ^               ^        ^
  A              A+B     A+B+C
                         (EOF)

  2) with A+B+C > A+BS, the file has zero padding in range [A+B, A+BS]

  |<         Block Size (BS)      >|<           Block Size (BS)    >|
  |DDDDDDDDDDDDDDDD0000000000000000|00000000000000000000000000000000|
  ^               ^                ^               ^
  A              A+B              A+BS           A+B+C
                                  (EOF)

  D = Valid Data
  0 = Zero Padding

The issue stems from disk size being set to min(io_offset + io_size,
inode->i_size) at I/O completion. Since io_offset+io_size is block
size granularity, it may exceed the actual valid file data size. In
the case of concurrent append writes, inode->i_size may be larger
than the actual range of valid file data written to disk, leading to
inaccurate disk size updates.

This patch modifies the meaning of io_size to represent the size of
valid data within EOF in an ioend. If the ioend spans beyond i_size,
io_size will be trimmed to provide the file with more accurate size
information. This is particularly useful for on-disk size updates
at completion time.

After this change, ioends that span i_size will not grow or merge with
other ioends in concurrent scenarios. However, these cases that need
growth/merging rarely occur and it seems no noticeable performance impact.
Although rounding up io_size could enable ioend growth/merging in these
scenarios, we decided to keep the code simple after discussion [1].

Another benefit is that it makes the xfs_ioend_is_append() check more
accurate, which can reduce unnecessary end bio callbacks of xfs_end_bio()
in certain scenarios, such as repeated writes at the file tail without
extending the file size.

Link [1]: https://patchwork.kernel.org/project/xfs/patch/20241113091907.56937-1-leo.lilong@huawei.com

Fixes: ae259a9c85 ("fs: introduce iomap infrastructure") # goes further back than this
Signed-off-by: Long Li <leo.lilong@huawei.com>
Link: https://lore.kernel.org/r/20241209114241.3725722-3-leo.lilong@huawei.com
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-11 11:09:05 +01:00
..
9p fs/9p: replace functions v9fs_cache_{register|unregister} with direct calls 2024-11-16 17:23:19 +09:00
adfs Merge patch series "adfs, affs, befs, hfs, hfsplus: convert to new mount api" 2024-10-08 14:41:53 +02:00
affs Merge patch series "adfs, affs, befs, hfs, hfsplus: convert to new mount api" 2024-10-08 14:41:53 +02:00
afs vfs-6.12-rc6.fixes 2024-11-01 07:37:10 -10:00
autofs autofs: fix thinko in validate_dev_ioctl() 2024-10-28 13:16:56 +01:00
bcachefs - The series "resource: A couple of cleanups" from Andy Shevchenko 2024-11-25 16:09:48 -08:00
befs befs: convert befs to use the new mount api 2024-09-18 11:44:43 +02:00
bfs fs: Convert aops->write_begin to take a folio 2024-08-07 11:33:21 +02:00
btrfs cxl changes for v6.13 2024-11-22 12:33:52 -08:00
cachefiles cachefiles: Fix NULL pointer dereference in object->file 2024-11-11 14:39:38 +01:00
ceph A fix for the mount "device" string parser from Patrick and two cred 2024-11-30 10:22:38 -08:00
coda coda: use param->file for FSCONFIG_SET_FD 2024-08-19 13:45:03 +02:00
configfs configfs: improve item creation performance 2024-11-14 07:45:20 +01:00
cramfs vfs-6.11.module.description 2024-07-15 11:14:59 -07:00
crypto Random number generator updates for Linux 6.13-rc1. 2024-11-19 10:43:44 -08:00
debugfs debugfs: add small file operations for most files 2024-10-23 16:47:01 +02:00
devpts
dlm dlm: fix dlm_recover_members refcount on error 2024-11-18 10:05:57 -06:00
ecryptfs vfs-6.13.ecryptfs.mount.api 2024-11-26 13:39:02 -08:00
efivarfs [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
efs efs: fix the efs new mount api implementation 2024-10-15 15:58:36 +02:00
erofs erofs: handle NONHEAD !delta[1] lclusters gracefully 2024-11-18 18:50:14 +08:00
exfat exfat: reduce FAT chain traversal 2024-11-25 17:08:27 +09:00
exportfs fs: prepare for "explicit connectable" file handles 2024-11-15 11:34:57 +01:00
ext2 vfs-6.12.file 2024-09-16 09:14:02 +02:00
ext4 A lot of miscellaneous ext4 bug fixes and cleanups this cycle, most 2024-11-18 16:32:58 -08:00
f2fs f2fs-for-6.13-rc1 2024-11-26 12:50:58 -08:00
fat fat: fix uninitialized variable 2024-10-17 00:28:06 -07:00
freevxfs freevxfs: Replace one-element array with flexible array member 2024-11-06 10:42:06 +01:00
fuse virtio: features, fixes, cleanups 2024-11-27 13:11:58 -08:00
gfs2 gfs2 changes 2024-11-26 12:34:50 -08:00
hfs hfs: Sanity check the root record 2024-12-02 15:32:19 +01:00
hfsplus vfs-6.13.misc 2024-11-18 09:35:30 -08:00
hostfs hostfs: Fix the NULL vs IS_ERR() bug for __filemap_get_folio() 2024-11-15 20:55:32 +01:00
hpfs hpfs: convert hpfs to use the new mount api 2024-10-08 14:41:53 +02:00
hugetlbfs - The series "zram: optimal post-processing target selection" from 2024-11-23 09:58:07 -08:00
iomap iomap: fix zero padding data issue in concurrent append writes 2024-12-11 11:09:05 +01:00
isofs isofs: avoid memory leak in iocharset 2024-11-06 20:24:41 +01:00
jbd2 jbd2: flush filesystem device before updating tail sequence 2024-12-04 12:00:05 +01:00
jffs2 jffs2: Prevent rtime decompress memory corruption 2024-11-14 20:56:19 +01:00
jfs A few more patches to add sanity checks in jfs 2024-11-21 09:59:59 -08:00
kernfs
lockd NFSD 6.13 Release Notes 2024-11-26 12:59:30 -08:00
minix buffer: Convert __block_write_begin() to take a folio 2024-08-07 11:33:36 +02:00
netfs fscache: Remove duplicate included header 2024-11-21 09:35:25 +01:00
nfs NFS client updates for Linux 6.13 2024-11-30 10:17:53 -08:00
nfs_common nfs_common: must not hold RCU while calling nfsd_file_put_local 2024-11-18 20:23:12 -05:00
nfsd NFSD 6.13 Release Notes 2024-11-26 12:59:30 -08:00
nilfs2 - The series "resource: A couple of cleanups" from Andy Shevchenko 2024-11-25 16:09:48 -08:00
nls move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
notify \n 2024-11-21 09:55:45 -08:00
ntfs3 fs/ntfs3: Accumulated refactoring changes 2024-11-01 11:19:53 +03:00
ocfs2 - The series "resource: A couple of cleanups" from Andy Shevchenko 2024-11-25 16:09:48 -08:00
omfs fs: Convert aops->write_begin to take a folio 2024-08-07 11:33:21 +02:00
openpromfs openpromfs: add missing MODULE_DESCRIPTION() macro 2024-06-20 09:46:01 +02:00
orangefs move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
overlayfs overlayfs updates for 6.13 2024-11-22 20:55:42 -08:00
proc vfs-6.13-rc1.fixes 2024-11-27 08:11:46 -08:00
pstore drm next for 6.12-rc1 2024-09-19 10:18:15 +02:00
qnx4 qnx4: add MODULE_DESCRIPTION() 2024-05-28 11:52:53 +02:00
qnx6 fs/qnx6: Fix building with GCC 15 2024-12-03 10:40:36 +01:00
quota \n 2024-11-21 09:50:18 -08:00
ramfs
romfs romfs: fix romfs_read_folio() 2024-08-21 22:32:58 +02:00
smb 22 SMB3 client fixes 2024-11-30 10:14:42 -08:00
squashfs Squashfs: fix variable overflow in squashfs_readpage_block 2024-10-30 20:14:12 -07:00
sysfs sysfs: bin_attribute: add const read/write callback variants 2024-11-05 14:00:28 +01:00
sysv buffer: Convert __block_write_begin() to take a folio 2024-08-07 11:33:36 +02:00
tests execve: Move KUnit tests to tests/ subdirectory 2024-07-22 18:25:47 -07:00
tracefs tracing: Fix tracefs mount options 2024-11-01 08:38:14 -04:00
ubifs This pull request contains updates for JFFS2, UBI and UBIFS: 2024-11-30 10:32:47 -08:00
udf udf: fix uninit-value use in udf_get_fileshortad 2024-10-02 14:32:37 +02:00
ufs ufs: ufs_sb_private_info: remove unused s_{2,3}apb fields 2024-11-12 19:02:12 -05:00
unicode unicode updates 2024-11-22 20:50:55 -08:00
vboxsf fs: Convert aops->write_end to take a folio 2024-08-07 11:32:02 +02:00
verity fsverity: expose verified fsverity built-in signatures to LSMs 2024-08-20 14:03:18 -04:00
xfs - The series "zram: optimal post-processing target selection" from 2024-11-23 09:58:07 -08:00
zonefs zonefs fixes for 6.12-rc2 2024-10-02 12:02:15 -07:00
aio.c A rather large update for timekeeping and timers: 2024-11-19 16:35:06 -08:00
anon_inodes.c
attr.c fs: handle delegated timestamps in setattr_copy_mgtime 2024-10-10 10:20:51 +02:00
backing-file.c fs/backing_file: fix wrong argument in callback 2024-11-26 18:13:29 +01:00
bad_inode.c
binfmt_elf_fdpic.c Revert "fs: don't block i_writecount during exec" 2024-11-27 12:51:30 +01:00
binfmt_elf.c Revert "fs: don't block i_writecount during exec" 2024-11-27 12:51:30 +01:00
binfmt_flat.c move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
binfmt_misc.c Revert "fs: don't block i_writecount during exec" 2024-11-27 12:51:30 +01:00
binfmt_script.c fs: binfmt: add missing MODULE_DESCRIPTION() macros 2024-05-28 12:06:51 +02:00
bpf_fs_kfuncs.c bpf: Add kfunc bpf_get_dentry_xattr() to read xattr from dentry 2024-08-07 11:26:54 -07:00
buffer.c - The series "zram: optimal post-processing target selection" from 2024-11-23 09:58:07 -08:00
char_dev.c fs: Reorganize kerneldoc parameter names 2024-10-22 11:16:57 +02:00
compat_binfmt_elf.c binfmt_elf: Wire up AT_HWCAP3 at AT_HWCAP4 2024-10-17 18:38:49 +01:00
coredump.c coredump: add cond_resched() to dump_user_range 2024-10-22 11:16:58 +02:00
d_path.c
dax.c fsdax: dax_unshare_iter needs to copy entire blocks 2024-10-07 13:51:47 +02:00
dcache.c - The series "zram: optimal post-processing target selection" from 2024-11-23 09:58:07 -08:00
direct-io.c fs/direct-io: Remove linux/prefetch.h include 2024-08-19 13:45:02 +02:00
drop_caches.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
eventfd.c fdget(), trivial conversions 2024-11-03 01:28:06 -05:00
eventpoll.c Networking changes for 6.13. 2024-11-21 08:28:08 -08:00
exec.c Revert "fs: don't block i_writecount during exec" 2024-11-27 12:51:30 +01:00
fcntl.c fs: require inode_owner_or_capable for F_SET_RW_HINT 2024-11-25 15:16:49 +01:00
fhandle.c vfs-6.13.exportfs 2024-11-26 13:26:15 -08:00
file_table.c Merge branch 'work.fdtable' into vfs.file 2024-10-30 09:58:02 +01:00
file.c vfs-6.13.file 2024-11-18 10:30:29 -08:00
filesystems.c
fs_context.c
fs_parser.c vfs-6.13.ovl 2024-11-18 10:45:06 -08:00
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c Merge patch series "two little writeback cleanups v2" 2024-11-13 14:08:34 +01:00
fsopen.c fdget(), more trivial conversions 2024-11-03 01:28:06 -05:00
init.c
inode.c - The series "zram: optimal post-processing target selection" from 2024-11-23 09:58:07 -08:00
internal.h sanitize struct filename and lookup flags handling in statx 2024-11-18 14:54:10 -08:00
ioctl.c fdget(), trivial conversions 2024-11-03 01:28:06 -05:00
Kconfig reiserfs: The last commit 2024-10-21 16:29:38 +02:00
Kconfig.binfmt exec: Add KUnit test for bprm_stack_limits() 2024-06-19 13:13:55 -07:00
kernel_read_file.c fdget(), trivial conversions 2024-11-03 01:28:06 -05:00
libfs.c sanitize struct filename and lookup flags handling in statx 2024-11-18 14:54:10 -08:00
locks.c fdget(), more trivial conversions 2024-11-03 01:28:06 -05:00
Makefile reiserfs: The last commit 2024-10-21 16:29:38 +02:00
mbcache.c
mnt_idmapping.c fuse update for 6.12 2024-09-24 15:29:42 -07:00
mount.h vfs-6.12.mount 2024-09-16 11:15:26 +02:00
mpage.c fs/writeback: convert wbc_account_cgroup_owner to take a folio 2024-10-28 13:26:54 +01:00
namei.c sanitize xattr and io_uring interactions with it, 2024-11-18 12:44:25 -08:00
namespace.c statmount: fix security option retrieval 2024-11-21 09:35:31 +01:00
nsfs.c [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
open.c \n 2024-11-21 09:55:45 -08:00
pidfs.c pidfd: add ioctl to retrieve pid info 2024-10-24 13:54:51 +02:00
pipe.c [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
pnode.c
pnode.h
posix_acl.c acl: Annotate struct posix_acl with __counted_by() 2024-10-22 11:16:59 +02:00
proc_namespace.c fs: rename show_mnt_opts -> show_vfsmnt_opts 2024-06-28 14:36:43 +02:00
read_write.c the bulk of struct fd memory safety stuff 2024-11-18 12:24:06 -08:00
readdir.c introduce "fd_pos" class, convert fdget_pos() users to it. 2024-11-03 01:28:06 -05:00
remap_range.c convert vfs_dedupe_file_range(). 2024-11-03 01:28:07 -05:00
select.c do_pollfd(): convert to CLASS(fd) 2024-11-03 01:28:07 -05:00
seq_file.c fs: Reorganize kerneldoc parameter names 2024-10-22 11:16:57 +02:00
signalfd.c fdget(), trivial conversions 2024-11-03 01:28:06 -05:00
splice.c fdget(), more trivial conversions 2024-11-03 01:28:06 -05:00
stack.c
stat.c sanitize struct filename and lookup flags handling in statx 2024-11-18 14:54:10 -08:00
statfs.c fdget_raw() users: switch to CLASS(fd_raw) 2024-11-03 01:28:06 -05:00
super.c fs/super.c: introduce get_tree_bdev_flags() 2024-10-21 14:30:26 +02:00
sync.c fdget(), trivial conversions 2024-11-03 01:28:06 -05:00
sysctls.c
timerfd.c A rather large update for timekeeping and timers: 2024-11-19 16:35:06 -08:00
userfaultfd.c fork: do not invoke uffd on fork if error occurs 2024-10-28 21:40:38 -07:00
utimes.c fdget(), more trivial conversions 2024-11-03 01:28:06 -05:00
xattr.c xattr: remove redundant check on variable err 2024-11-06 13:00:01 -05:00