linux-stable/fs/ext4
Christian Brauner b40508ca5d
Merge patch series "timekeeping/fs: multigrain timestamp redux"
Jeff Layton <jlayton@kernel.org> says:

The VFS has always used coarse-grained timestamps when updating the
ctime and mtime after a change. This has the benefit of allowing
filesystems to optimize away a lot of metadata updates, down to around 1
per jiffy, even when a file is under heavy writes.

Unfortunately, this has always been an issue when we're exporting via
NFSv3, which relies on timestamps to validate caches. A lot of changes
can happen in a jiffy, so timestamps aren't sufficient to help the
client decide when to invalidate the cache. Even with NFSv4, a lot of
exported filesystems don't properly support a change attribute and are
subject to the same problems with timestamp granularity. Other
applications have similar issues with timestamps (e.g. backup
applications).

If we were to always use fine-grained timestamps, that would improve the
situation, but that becomes rather expensive, as the underlying
filesystem would have to log a lot more metadata updates.

What we need is a way to only use fine-grained timestamps when they are
being actively queried. Use the (unused) top bit in inode->i_ctime_nsec
as a flag that indicates whether the current timestamps have been
queried via stat() or the like. When it's set, we allow the kernel to
use a fine-grained timestamp iff it's necessary to make the ctime show
a different value.
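
The flag idea above can be illustrated with a small userspace sketch. It is
not the kernel code: struct sketch_inode, the SKETCH_* macros and the
sketch_* helpers are hypothetical names made up for this example, and C11
atomics stand in for the kernel's own primitives. It relies only on the fact
that a valid nanosecond value is below 10^9 and therefore never needs bit 31.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration; not the kernel's definitions. */
#define SKETCH_NSEC_PER_SEC  UINT32_C(1000000000)
#define SKETCH_CTIME_QUERIED (UINT32_C(1) << 31)

struct sketch_inode {
    int64_t ctime_sec;
    _Atomic uint32_t ctime_nsec; /* nanoseconds, plus the queried flag in bit 31 */
};

/* Store a new ctime. A valid nsec never has bit 31 set, so this also
 * clears the queried flag. */
static void sketch_set_ctime(struct sketch_inode *inode, int64_t sec, uint32_t nsec)
{
    assert(nsec < SKETCH_NSEC_PER_SEC);
    inode->ctime_sec = sec;
    atomic_store(&inode->ctime_nsec, nsec);
}

/* stat()-like read: atomically set the queried flag and return the
 * nanoseconds with the flag masked off. */
static uint32_t sketch_query_ctime_nsec(struct sketch_inode *inode)
{
    uint32_t raw = atomic_fetch_or(&inode->ctime_nsec, SKETCH_CTIME_QUERIED);
    return raw & ~SKETCH_CTIME_QUERIED;
}

/* Has anyone looked at the current timestamp since it was last set? */
static bool sketch_ctime_was_queried(struct sketch_inode *inode)
{
    return (atomic_load(&inode->ctime_nsec) & SKETCH_CTIME_QUERIED) != 0;
}
```

A writer would check sketch_ctime_was_queried() to decide whether a
fine-grained timestamp might be needed; readers never see the flag because it
is masked off before the value is returned.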

This solves the problem of being able to distinguish the timestamp
between updates, but introduces a new problem: it's now possible for a
file being changed to get a fine-grained timestamp. A file that is
altered just a bit later can then get a coarse-grained one that appears
older than the earlier fine-grained time. This violates timestamp
ordering guarantees.

To remedy this, keep a global monotonic atomic64_t value that acts as a
timestamp floor.  When we go to stamp a file, we first get the later of
the current floor value and the current coarse-grained time. If the
inode ctime hasn't been queried then we just attempt to stamp it with
that value.

If it has been queried, then first see whether the current coarse time
is later than the existing ctime. If it is, then we accept that value.
If it isn't, then we get a fine-grained time and try to swap that into
the global floor. Whether that succeeds or fails, we take the resulting
floor time, convert it to realtime and try to swap that into the ctime.

We take the result of the ctime swap whether it succeeds or fails, since
either is just as valid.
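
The stamping procedure described above can be sketched in userspace as
follows. This is a simplified model under several assumptions, not the kernel
implementation: timestamps are plain nanosecond counts in one 64-bit word
(with bit 63 standing in for the queried flag), the monotonic-to-realtime
conversion is left out, the coarse and fine clock readings are passed in by
the caller so the example is deterministic, and no retry logic is attempted.
All sketch_* and SKETCH_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_QUERIED (UINT64_C(1) << 63) /* stand-in for the queried flag */

static _Atomic uint64_t sketch_floor_ns;   /* global timestamp floor */

struct sketch_mg_inode {
    _Atomic uint64_t ctime; /* nanoseconds, plus SKETCH_QUERIED */
};

/* Coarse time, but never earlier than the floor. */
static uint64_t sketch_coarse_mg(uint64_t coarse_ns)
{
    uint64_t floor = atomic_load(&sketch_floor_ns);
    return coarse_ns > floor ? coarse_ns : floor;
}

/* Try to advance the floor to the fine-grained time. Whether or not the
 * swap happens, return the floor value that results. */
static uint64_t sketch_fine_mg(uint64_t fine_ns)
{
    uint64_t old = atomic_load(&sketch_floor_ns);
    if (fine_ns > old &&
        atomic_compare_exchange_strong(&sketch_floor_ns, &old, fine_ns))
        return fine_ns;
    return old; /* floor was already later, or another task advanced it */
}

/* stat()-like read: mark the timestamp as queried and return it. */
static uint64_t sketch_query_ctime(struct sketch_mg_inode *inode)
{
    return atomic_fetch_or(&inode->ctime, SKETCH_QUERIED) & ~SKETCH_QUERIED;
}

/* Stamp the inode and return the ctime it ends up with. */
static uint64_t sketch_stamp_ctime(struct sketch_mg_inode *inode,
                                   uint64_t coarse_ns, uint64_t fine_ns)
{
    assert(coarse_ns <= fine_ns && fine_ns < SKETCH_QUERIED);

    uint64_t cur = atomic_load(&inode->ctime);
    uint64_t now = sketch_coarse_mg(coarse_ns);

    /* Only pay for a fine-grained time if the old value was observed and
     * the coarse time would not make the ctime move forward. */
    if ((cur & SKETCH_QUERIED) && now <= (cur & ~SKETCH_QUERIED))
        now = sketch_fine_mg(fine_ns);

    if (atomic_compare_exchange_strong(&inode->ctime, &cur, now))
        return now;
    /* Lost a race: accept whatever the other task installed. */
    return cur & ~SKETCH_QUERIED;
}
```

In this model, once one inode has received a fine-grained stamp, a later
coarse stamp on any other inode is raised to at least the floor, so it cannot
appear older than the fine-grained one.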

Filesystems can opt into this by setting the FS_MGTIME fstype flag.
Others should be unaffected (other than being subject to the same floor
value as multigrain filesystems).
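
As a rough illustration of the opt-in, the following self-contained mock
shows a per-filesystem-type flag being set and tested. The struct, the flag
values and the "otherfs" entry are stand-ins invented for this sketch; only
the name FS_MGTIME and the idea of an fstype flag come from the text above.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins; the numeric values are assumptions. */
#define SKETCH_FS_REQUIRES_DEV 1
#define SKETCH_FS_MGTIME       64

static_assert((SKETCH_FS_REQUIRES_DEV & SKETCH_FS_MGTIME) == 0,
              "flag bits must not overlap");

struct sketch_fs_type {
    const char *name;
    int fs_flags;
};

/* A filesystem type that opts in to multigrain timestamps. */
static const struct sketch_fs_type sketch_ext4 = {
    .name     = "ext4",
    .fs_flags = SKETCH_FS_REQUIRES_DEV | SKETCH_FS_MGTIME,
};

/* A hypothetical filesystem type that does not opt in. */
static const struct sketch_fs_type sketch_otherfs = {
    .name     = "otherfs",
    .fs_flags = SKETCH_FS_REQUIRES_DEV,
};

/* Should timestamps for this filesystem type use the multigrain path? */
static bool sketch_is_mgtime(const struct sketch_fs_type *type)
{
    return (type->fs_flags & SKETCH_FS_MGTIME) != 0;
}
```

Keeping the decision in a type-level flag means filesystems that do nothing
keep their existing coarse-grained behaviour.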

* patches from https://lore.kernel.org/r/20241002-mgtime-v10-0-d1c4717f5284@kernel.org:
  tmpfs: add support for multigrain timestamps
  btrfs: convert to multigrain timestamps
  ext4: switch to multigrain timestamps
  xfs: switch to multigrain timestamps
  Documentation: add a new file documenting multigrain timestamps
  fs: add percpu counters for significant multigrain timestamp events
  fs: tracepoints around multigrain timestamp events
  fs: handle delegated timestamps in setattr_copy_mgtime
  fs: have setattr_copy handle multigrain timestamps appropriately
  fs: add infrastructure for multigrain timestamps

Link: https://lore.kernel.org/r/20241002-mgtime-v10-0-d1c4717f5284@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-10 10:20:57 +02:00
.kunitconfig ext4: add .kunitconfig fragment to enable ext4-specific tests 2021-02-11 23:16:30 -05:00
acl.c ext4: convert to ctime accessor functions 2023-07-24 10:29:54 +02:00
acl.h Revert "ext4: apply umask if ACL support is disabled" 2024-05-02 18:25:39 -04:00
balloc.c ext4: add some kunit stub for mballoc kunit test 2023-10-05 22:32:16 -04:00
bitmap.c ext4: move checksum length calculation of inode bitmap into ext4_inode_bitmap_csum_[verify/set]() functions 2024-09-03 22:12:15 -04:00
block_validity.c ext4: block_validity: Remove unnecessary ‘NULL’ values from new_node 2024-06-27 09:34:00 -04:00
crypto.c ext4: Move CONFIG_UNICODE defguards into the code flow 2024-06-07 17:00:45 +02:00
dir.c Lots of cleanups and bug fixes this cycle, primarily in the block 2024-09-20 19:26:45 -07:00
ext4_extents.h ext4: fix sparse warnings 2021-08-30 23:36:50 -04:00
ext4_jbd2.c use ->bd_mapping instead of ->bd_inode->i_mapping 2024-05-03 02:36:51 -04:00
ext4_jbd2.h ext4: split ext4_journal_start trace for debug 2022-12-01 10:46:54 -05:00
ext4.h Lots of cleanups and bug fixes this cycle, primarily in the block 2024-09-20 19:26:45 -07:00
extents_status.c ext4: drop all delonly descriptions 2024-09-02 15:26:15 -04:00
extents_status.h ext4: drop ext4_es_is_delonly() 2024-09-02 15:26:14 -04:00
extents.c ext4: save unnecessary indentation in ext4_ext_create_new_leaf() 2024-09-03 22:14:16 -04:00
fast_commit.c ext4: use handle to mark fc as ineligible in __track_dentry_update() 2024-10-04 17:35:54 -04:00
fast_commit.h ext4: add missing validation of fast-commit record lengths 2022-12-08 21:49:24 -05:00
file.c ext4: dax: keep orphan list before truncate overflow allocated blocks 2024-09-03 22:14:16 -04:00
fsmap.c ext4: port block device access to file 2024-02-25 12:05:26 +01:00
fsmap.h ext4: fsmap: fix the block/inode bitmap comment 2021-06-24 09:48:29 -04:00
fsync.c ext4: drop EXT4_MF_FS_ABORTED flag 2023-07-29 18:37:53 -04:00
hash.c ext4: remove redundant checks of s_encoding 2023-08-27 11:27:13 -04:00
ialloc.c ext4: check buffer_verified in advance to avoid unneeded ext4_get_group_info() 2024-09-03 22:12:16 -04:00
indirect.c ext4: update delalloc data reserve spcae in ext4_es_insert_extent() 2024-09-02 15:26:14 -04:00
inline.c Lots of cleanups and bug fixes this cycle, primarily in the block 2024-09-20 19:26:45 -07:00
inode-test.c ext4: add missing MODULE_DESCRIPTION() 2024-07-05 16:07:24 -04:00
inode.c Lots of cleanups and bug fixes this cycle, primarily in the block 2024-09-20 19:26:45 -07:00
ioctl.c introduce fd_file(), convert all accessors to it. 2024-08-12 22:00:43 -04:00
Kconfig fs: add CONFIG_BUFFER_HEAD 2023-08-02 09:13:09 -06:00
Makefile ext4: move ext4 crypto code to its own file crypto.c 2022-05-21 22:24:24 -04:00
mballoc-test.c ext4: add test_mb_mark_used_cost to estimate cost of mb_mark_used 2024-05-03 00:12:32 -04:00
mballoc.c ext4: convert EXT4_B2C(sbi->s_stripe) users to EXT4_NUM_B2C 2024-09-03 22:14:17 -04:00
mballoc.h ext4: convert ac_buddy_page to ac_buddy_folio 2024-05-07 15:38:17 -04:00
migrate.c ext4: fix i_data_sem unlock order in ext4_ind_migrate() 2024-09-03 22:14:17 -04:00
mmp.c ext4: replace read-only check for shutdown check in mmp code 2023-07-29 18:37:53 -04:00
move_extent.c ext4: get rid of ppath in get_ext_path() 2024-09-03 22:12:17 -04:00
namei.c ext4: explicitly exit when ext4_find_inline_entry returns an error 2024-09-03 22:12:16 -04:00
orphan.c ext4: remove trailing newline from ext4_msg() message 2022-12-08 21:49:23 -05:00
page-io.c ext4: remove calls to to set/clear the folio error flag 2024-05-09 00:23:51 -04:00
readpage.c ext4: reduce stack usage in ext4_mpage_readpages() 2024-08-26 21:47:03 -04:00
resize.c ext4: fix off by one issue in alloc_flex_gd() 2024-10-04 17:36:28 -04:00
super.c Merge patch series "timekeeping/fs: multigrain timestamp redux" 2024-10-10 10:20:57 +02:00
symlink.c ext4_get_link(): fix breakage in RCU mode 2024-02-25 02:10:32 -05:00
sysfs.c ext4: add positive int attr pointer to avoid sysfs variables overflow 2024-05-02 23:48:30 -04:00
truncate.h ext4: Convert to use mapping->invalidate_lock 2021-07-13 14:29:00 +02:00
verity.c fs: Convert aops->write_begin to take a folio 2024-08-07 11:33:21 +02:00
xattr_hurd.c fs: port xattr to mnt_idmap 2023-01-19 09:24:28 +01:00
xattr_security.c fs: port xattr to mnt_idmap 2023-01-19 09:24:28 +01:00
xattr_trusted.c fs: port xattr to mnt_idmap 2023-01-19 09:24:28 +01:00
xattr_user.c fs: port xattr to mnt_idmap 2023-01-19 09:24:28 +01:00
xattr.c ext4: mark fc as ineligible using an handle in ext4_xattr_set() 2024-10-04 17:36:09 -04:00
xattr.h ext4: annotate struct ext4_xattr_inode_array with __counted_by() 2024-08-26 23:40:06 -04:00