linux-stable/fs
Filipe Manana 732d591a5d btrfs: stop copying old dir items when logging a directory
When logging a directory, we go over every leaf of the subvolume tree that
was changed in the current transaction and copy all its dir index keys to
the log tree.

That includes copying dir index keys created in past transactions. This is
done mostly for simplicity, as after logging the keys we log an item that
specifies the start and end ranges of the keys we logged. That item is
then used during log replay to figure out which keys need to be deleted -
every key in that range that we find in the subvolume tree and is not in
the log tree, needs to be deleted.

Now that we log only dir index keys, and not dir item keys anymore, when
we remove dentries from a directory (due to unlink and rename operations),
we can get entire leaves that we changed only for deleting old dir index
keys, or that have few dir index keys that are new - this is due to the
fact that the offset for new index keys comes from a monotonically
increasing counter.

We can avoid logging dir index keys from past transactions, and in order
to track the deletions, only log range items (BTRFS_DIR_LOG_INDEX_KEY key
type) when we find gaps between consecutive index keys. This massively
reduces the amount of logged metadata when we have deleted directory
entries, even if it's a small percentage of the total number of entries.
The reduction comes from both less items that are logged and instead of
logging many dir index items (struct btrfs_dir_item), which have a size
of 30 bytes plus a file name, we typically log just a few range items
(struct btrfs_dir_log_item), which take only 8 bytes each.

Even if no entries were deleted from a directory and only new entries
were added, we typically still get a reduction on the amount of logged
metadata, because it's very likely the first leaf that got the new
dir index entries also has several old dir index entries.

So change the logging logic to not log dir index keys created in past
transactions and log a range item for every gap it finds between each
pair of consecutive index keys, to ensure deletions are tracked and
replayed on log replay.

This patch is part of a patchset comprised of the following patches:

 1/4 btrfs: don't log unnecessary boundary keys when logging directory
 2/4 btrfs: put initial index value of a directory in a constant
 3/4 btrfs: stop copying old dir items when logging a directory
 4/4 btrfs: stop trying to log subdirectories created in past transactions

The following test was run on a branch without this patchset and on a
branch with the first three patches applied:

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/nvme0n1
  MNT=/mnt/nvme0n1

  NUM_FILES=1000000
  NUM_FILE_DELETES=10000

  MKFS_OPTIONS="-O no-holes -R free-space-tree"
  MOUNT_OPTIONS="-o ssd"

  mkfs.btrfs -f $MKFS_OPTIONS $DEV
  mount $MOUNT_OPTIONS $DEV $MNT

  mkdir $MNT/testdir
  for ((i = 1; i <= $NUM_FILES; i++)); do
      echo -n > $MNT/testdir/file_$i
  done

  sync

  del_inc=$(( $NUM_FILES / $NUM_FILE_DELETES ))
  for ((i = 1; i <= $NUM_FILES; i += $del_inc)); do
      rm -f $MNT/testdir/file_$i
  done

  start=$(date +%s%N)
  xfs_io -c "fsync" $MNT/testdir
  end=$(date +%s%N)

  dur=$(( (end - start) / 1000000 ))
  echo "dir fsync took $dur ms after deleting $NUM_FILE_DELETES files"
  echo

  umount $MNT

The test was run on a non-debug kernel (Debian's default kernel config),
and the results were the following for various values of NUM_FILES and
NUM_FILE_DELETES:

** before, NUM_FILES = 1 000 000, NUM_FILE_DELETES = 10 000 **

dir fsync took 585 ms after deleting 10000 files

** after, NUM_FILES = 1 000 000, NUM_FILE_DELETES = 10 000 **

dir fsync took 34 ms after deleting 10000 files   (-94.2%)

** before, NUM_FILES = 100 000, NUM_FILE_DELETES = 1 000 **

dir fsync took 50 ms after deleting 1000 files

** after, NUM_FILES = 100 000, NUM_FILE_DELETES = 1 000 **

dir fsync took 7 ms after deleting 1000 files    (-86.0%)

** before, NUM_FILES = 10 000, NUM_FILE_DELETES = 100 **

dir fsync took 9 ms after deleting 100 files

** after, NUM_FILES = 10 000, NUM_FILE_DELETES = 100 **

dir fsync took 5 ms after deleting 100 files     (-44.4%)

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-14 13:13:46 +01:00
..
9p Revert "fs/9p: search open fids first" 2022-01-30 22:13:37 +09:00
adfs fs/adfs: remove unneeded variable make code cleaner 2022-01-20 08:52:55 +02:00
affs affs: use bdev_nr_sectors instead of open coding it 2021-10-18 14:43:22 -06:00
afs afs: Fix potential thrashing in afs writeback 2022-03-11 10:24:37 -08:00
autofs autofs: fix wait name hash calculation in autofs_wait() 2021-10-20 21:09:02 -04:00
befs isystem: ship and use stdarg.h 2021-08-19 09:02:55 +09:00
bfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
btrfs btrfs: stop copying old dir items when logging a directory 2022-03-14 13:13:46 +01:00
cachefiles cachefiles: Fix volume coherency attribute 2022-03-11 10:24:37 -08:00
ceph ceph: set pool_ns in new inode layout for async creates 2022-01-26 20:17:50 +01:00
cifs cifs: fix confusing unneeded warning message on smb2.1 and earlier 2022-02-16 17:16:49 -06:00
coda coda: bump module version to 7.2 2021-11-09 10:02:51 -08:00
configfs configfs: fix a race in configfs_{,un}register_subsystem() 2022-02-22 18:30:28 +01:00
cramfs cramfs: use bdev_nr_bytes instead of open coding it 2021-10-18 14:43:22 -06:00
crypto fscrypt: improve a few comments 2021-10-25 19:11:50 -07:00
debugfs debugfs: lockdown: Allow reading debugfs files that are not world readable 2022-01-06 15:47:41 +01:00
devpts fsnotify: fix fsnotify hooks in pseudo filesystems 2022-01-24 14:17:02 +01:00
dlm driver core changes for 5.17-rc1 2022-01-12 11:11:34 -08:00
ecryptfs fs: add is_idmapped_mnt() helper 2021-12-03 18:44:06 +01:00
efivarfs
efs
erofs erofs: fix ztailpacking on > 4GiB filesystems 2022-03-02 21:58:45 +08:00
exfat exfat: fix missing REQ_SYNC in exfat_update_bhs() 2022-01-10 11:00:04 +09:00
exportfs
ext2 fsdax: shift partition offset handling into the file systems 2021-12-04 08:58:54 -08:00
ext4 Various bug fixes for ext4 fast commit and inline data handling. Also 2022-02-06 10:34:45 -08:00
f2fs Fix from Christoph Hellwig merging the CONFIG_UNICODE_UTF8_DATA into the 2022-02-01 11:13:24 -08:00
fat FAT: use io_schedule_timeout() instead of congestion_wait() 2022-01-20 08:52:54 +02:00
freevxfs
fscache fscache: Fix the volume collision wait condition 2022-01-21 21:36:28 +00:00
fuse fuse: fix pipe buffer lifetime for direct_io 2022-03-07 16:30:44 +01:00
gfs2 gfs2 fixes: 2022-02-11 11:36:32 -08:00
hfs Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
hfsplus hfsplus: use struct_group_attr() for memcpy() region 2022-01-20 08:52:54 +02:00
hostfs hostfs: Fix writeback of dirty pages 2021-12-21 21:44:27 +01:00
hpfs treewide: Replace open-coded flex arrays in unions 2021-10-18 12:28:53 -07:00
hugetlbfs hugetlbfs: fix off-by-one error in hugetlb_vmdelete_list() 2022-01-15 16:30:30 +02:00
iomap xfs, iomap: limit individual ioend chain lengths in writeback 2022-01-26 09:19:20 -08:00
isofs isofs: Fix out of bound access for corrupted isofs image 2021-10-19 12:51:02 +02:00
jbd2 Various bug fixes for ext4 fast commit and inline data handling. Also 2022-02-06 10:34:45 -08:00
jffs2 Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2022-01-17 05:49:30 +02:00
jfs Just one JFS patch 2021-11-03 09:23:25 -07:00
kernfs kernfs: prevent early freeing of root node 2021-12-03 14:36:21 +01:00
ksmbd ksmbd: add support for key exchange 2022-02-04 00:12:22 -06:00
lockd Notable bug fixes: 2022-02-02 10:14:31 -08:00
minix mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
netfs netfs: Make ops->init_rreq() optional 2022-01-21 21:36:28 +00:00
nfs NFS: Do not report writeback errors in nfs_getattr() 2022-02-16 15:15:22 -05:00
nfs_common nfs: Fix kerneldoc warning shown up by W=1 2021-10-04 22:02:17 +01:00
nfsd Notable bug fixes: 2022-02-09 09:56:57 -08:00
nilfs2 Merge branch 'akpm' (patches from Andrew) 2022-01-20 10:41:01 +02:00
nls
notify fanotify: Fix stale file descriptor in copy_event_to_user() 2022-02-01 12:52:07 +01:00
ntfs fs/ntfs/attrib.c: fix one kernel-doc comment 2022-01-15 16:30:24 +02:00
ntfs3 mm: remove cleancache 2022-01-22 08:33:38 +02:00
ocfs2 ocfs2: fix a deadlock when commit trans 2022-01-30 09:56:58 +02:00
omfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
openpromfs
orangefs orangefs: Fix the size of a memory allocation in orangefs_bufmap_alloc() 2021-12-31 14:37:43 -05:00
overlayfs overlayfs fixes for 5.17-rc3 2022-02-01 11:23:02 -08:00
proc proc: fix documentation and description of pagemap 2022-03-05 11:08:33 -08:00
pstore pstore update for v5.17-rc1 2022-01-10 11:48:37 -08:00
qnx4 qnx4: work around gcc false positive warning bug 2021-09-21 08:36:48 -07:00
qnx6
quota quota: make dquot_quota_sync return errors from ->sync_fs 2022-01-30 08:59:47 -08:00
ramfs Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
reiserfs reiserfs: don't use congestion_wait() 2021-11-18 11:52:22 +01:00
romfs
smbfs_common smb3: add new defines from protocol specification 2022-01-18 16:50:47 -06:00
squashfs squashfs: provide backing_dev_info in order to disable read-ahead 2022-01-15 16:30:24 +02:00
sysfs fs/sysfs/dir.c: replace S_IRWXU|S_IRUGO|S_IXUGO with 0755 sysfs_create_dir_ns() 2021-10-05 16:35:05 +02:00
sysv sysv: use BUILD_BUG_ON instead of runtime check 2021-11-09 10:02:52 -08:00
tracefs tracefs: Set the group ownership in apply_options() not parse_options() 2022-02-25 21:05:04 -05:00
ubifs ubifs: read-only if LEB may always be taken in ubifs_garbage_collect 2021-12-23 22:30:38 +01:00
udf udf: Restore i_lenAlloc when inode expansion fails 2022-01-24 14:45:02 +01:00
ufs isystem: ship and use stdarg.h 2021-08-19 09:02:55 +09:00
unicode Fix from Christoph Hellwig merging the CONFIG_UNICODE_UTF8_DATA into the 2022-02-01 11:13:24 -08:00
vboxsf vboxfs: fix broken legacy mount signature checking 2021-09-27 11:26:21 -07:00
verity fs-verity: fix signed integer overflow with i_size near S64_MAX 2021-09-22 10:56:34 -07:00
xfs Bug fixes for 5.17-rc4: 2022-02-26 09:53:19 -08:00
zonefs zonefs: add MODULE_ALIAS_FS 2021-12-17 16:56:35 +09:00
aio.c aio: move aio sysctl to aio.c 2022-01-22 08:33:34 +02:00
anon_inodes.c fs: add anon_inode_getfile_secure() similar to anon_inode_getfd_secure() 2021-09-19 22:35:37 -04:00
attr.c fs: handle circular mappings correctly 2021-11-17 09:26:09 +01:00
bad_inode.c vfs: add rcu argument to ->get_acl() callback 2021-08-18 22:08:24 +02:00
binfmt_aout.c binfmt: a.out: Fix bogus semicolon 2021-09-05 10:15:05 -07:00
binfmt_elf_fdpic.c coredump: Limit coredumps to a single thread group 2021-10-08 12:06:02 -05:00
binfmt_elf.c binfmt_elf fix for v5.17-rc7 2022-03-01 11:31:37 -08:00
binfmt_flat.c binfmt: remove in-tree usage of MAP_EXECUTABLE 2021-06-29 10:53:50 -07:00
binfmt_misc.c Fix regression due to "fs: move binfmt_misc sysctl to its own file" 2022-02-09 09:50:02 -08:00
binfmt_script.c
buffer.c fs/buffer: Convert __block_write_begin_int() to take a folio 2021-12-16 15:49:51 -05:00
char_dev.c
compat_binfmt_elf.c
coredump.c fs/coredump: move coredump sysctls into its own file 2022-01-22 08:33:36 +02:00
d_path.c d_path: fix Kernel doc validator complaining 2021-11-06 13:30:32 -07:00
dax.c dax: remove the copy_from_iter and copy_to_iter methods 2021-12-18 08:04:53 -08:00
dcache.c fs: move dcache sysctls to its own file 2022-01-22 08:33:36 +02:00
direct-io.c fs: get rid of the res2 iocb->ki_complete argument 2021-10-25 10:36:24 -06:00
drop_caches.c fs: drop_caches: fix skipping over shadow cache inodes 2021-09-03 09:58:10 -07:00
eventfd.c eventfd: Export eventfd_wake_count to modules 2021-09-06 07:20:56 -04:00
eventpoll.c eventpoll: simplify sysctl declaration with register_sysctl() 2022-01-22 08:33:35 +02:00
exec.c fs/coredump: move coredump sysctls into its own file 2022-01-22 08:33:36 +02:00
fcntl.c Merge branch 'akpm' (patches from Andrew) 2021-09-03 10:08:28 -07:00
fhandle.c
file_table.c fs/file_table: fix adding missing kmemleak_not_leak() 2022-02-17 10:23:19 -08:00
file.c fget: clarify and improve __fget_files() implementation 2021-12-13 10:55:30 -08:00
filesystems.c fs: simplify get_filesystem_list / get_all_fs_names 2021-08-23 01:25:40 -04:00
fs_context.c vfs: fs_context: fix up param length parsing in legacy_parse_param 2022-01-18 09:23:19 +02:00
fs_parser.c fs_parse: allow parameter value to be empty 2021-12-09 14:09:36 -05:00
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c fscache rewrite 2022-01-12 13:45:12 -08:00
fsopen.c
init.c
inode.c fs: move inode sysctls to its own file 2022-01-22 08:33:35 +02:00
internal.h fs/buffer: Convert __block_write_begin_int() to take a folio 2021-12-16 15:49:51 -05:00
io_uring.c io_uring: disallow modification of rsrc_data during quiesce 2022-02-22 09:57:32 -07:00
io-wq.c io_uring-5.17-2022-01-21 2022-01-21 16:07:21 +02:00
io-wq.h Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2022-01-17 05:49:30 +02:00
ioctl.c fs/ioctl: remove unnecessary __user annotation 2022-01-15 16:30:25 +02:00
Kconfig ksmbd: add support for key exchange 2022-02-04 00:12:22 -06:00
Kconfig.binfmt binfmt: remove support for em86 (alpha only) 2021-07-25 22:33:03 -07:00
kernel_read_file.c vfs: check fd has read access in kernel_read_file_from_fd() 2021-10-18 20:22:03 -10:00
libfs.c unicode: clean up the Kconfig symbol confusion 2022-01-20 19:57:24 -05:00
locks.c fs: move locking sysctls where they are used 2022-01-22 08:33:36 +02:00
Makefile Fix from Christoph Hellwig merging the CONFIG_UNICODE_UTF8_DATA into the 2022-02-01 11:13:24 -08:00
mbcache.c
mount.h
mpage.c mm: remove cleancache 2022-01-22 08:33:38 +02:00
namei.c \n 2022-01-28 17:51:31 +02:00
namespace.c fs: add kernel doc for mnt_{hold,unhold}_writers() 2022-02-14 08:35:32 +01:00
no-block.c
nsfs.c
open.c fs: support mapped mounts of mapped filesystems 2021-12-05 10:28:57 +01:00
pipe.c watch_queue: Fix lack of barrier/sync/lock between post and read 2022-03-11 10:17:13 -08:00
pnode.c
pnode.h
posix_acl.c fs: support mapped mounts of mapped filesystems 2021-12-05 10:28:57 +01:00
proc_namespace.c fs: add is_idmapped_mnt() helper 2021-12-03 18:44:06 +01:00
read_write.c fs: remove leftover comments from mandatory locking removal 2021-10-26 12:20:50 -04:00
readdir.c
remap_range.c fs: Convert vfs_dedupe_file_range_compare to folios 2022-01-08 00:28:41 -05:00
select.c select: Fix indefinitely sleeping task in poll_schedule_timeout() 2022-01-11 09:03:05 -08:00
seq_file.c seq_file: move seq_escape() to a header 2021-11-09 10:02:52 -08:00
signalfd.c Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2022-01-17 05:49:30 +02:00
splice.c
stack.c
stat.c fs: add generic helper for filling statx attribute flags 2021-08-17 11:47:43 +02:00
statfs.c
super.c vfs: make freeze_super abort when sync_filesystem returns error 2022-01-30 08:59:47 -08:00
sync.c vfs: make sync_filesystem return errors from ->sync_fs 2022-01-30 08:59:47 -08:00
sysctls.c fs: move namespace sysctls and declare fs base directory 2022-01-22 08:33:36 +02:00
timerfd.c timerfd: Provide timerfd_resume() 2021-08-10 17:57:22 +02:00
userfaultfd.c mm: refactor vm_area_struct::anon_vma_name usage code 2022-03-05 11:08:32 -08:00
utimes.c
xattr.c