linux-stable/Documentation/filesystems
Al Viro 1c18edd1b7 __dentry_kill(): new locking scheme
Currently we enter __dentry_kill() with parent (along with the victim
dentry and victim's inode) held locked.  Then we
	mark dentry refcount as dead
	call ->d_prune()
	remove dentry from hash
	remove it from the parent's list of children
	unlock the parent, don't need it from that point on
	detach dentry from inode, unlock dentry and drop the inode
(via ->d_iput())
	call ->d_release()
	regain the lock on dentry
	check if it's on a shrink list (in which case freeing its empty husk
has to be left to shrink_dentry_list()) or not (in which case we can free it
ourselves).  In the former case, mark it as an empty husk, so that
shrink_dentry_list() would know it can free the sucker.
	drop the lock on dentry
... and usually the caller proceeds to drop a reference on the parent,
possibly retaking the lock on it.

That is painful for a bunch of reasons, starting with the need to take locks
out of order, but not limited to that - the parent of positive dentry can
change if we drop its ->d_lock, so getting these locks has to be done with
care.  Moreover, as soon as dentry is out of the parent's list of children,
shrink_dcache_for_umount() won't see it anymore, making it appear as if
the parent is inexplicably busy.  We do work around that by having
shrink_dentry_list() decrement the parent's refcount first and put it on
shrink list to be evicted once we are done with __dentry_kill() of child,
but that may in some cases lead to ->d_iput() on child called after the
parent got killed.  That doesn't happen in cases where in-tree ->d_iput()
instances might want to look at the parent, but that's brittle as hell.

Solution: do removal from the parent's list of children in the very
end of __dentry_kill().  As the result, the callers do not need to
lock the parent and by the time we really need the parent locked,
dentry is negative and is guaranteed not to be moved around.

It does mean that ->d_prune() will be called with parent not locked.
It also means that we might see dentries in process of being torn
down while going through the parent's list of children; those dentries
will be unhashed, negative and with refcount marked dead.  In practice,
that's enough for in-tree code that looks through the list of children
to do the right thing as-is.  Out-of-tree code might need to be adjusted.

Calling conventions: __dentry_kill(dentry) is called with dentry->d_lock
held, along with ->i_lock of its inode (if any).  It either returns
the parent (locked, with refcount decremented to 0) or NULL (if there'd
been no parent or if refcount decrement for parent hadn't reached 0).

lock_for_kill() is adjusted for new requirements - it doesn't touch
the parent's ->d_lock at all.

Callers adjusted.  Note that for dput() we don't need to bother with
fast_dput() for the parent - we just need to check retain_dentry()
for it, since its ->d_lock is still held since the moment when
__dentry_kill() had taken it to remove the victim from the list of
children.

The kludge with early decrement of parent's refcount in
shrink_dentry_list() is no longer needed - shrink_dcache_for_umount()
sees the half-killed dentries in the list of children for as long
as they are pinning the parent.  They are easily recognized and
accounted for by select_collect(), so we know we are not done yet.

As the result, we always have the expected ordering for ->d_iput()/->d_release()
vs. __dentry_kill() of the parent, no exceptions.  Moreover, the current
rules for shrink lists (one must make sure that shrink_dcache_for_umount()
won't happen while any dentries from the superblock in question are on
any shrink lists) are gone - shrink_dcache_for_umount() will do the
right thing in all cases, taking such dentries out.  Their empty
husks (memory occupied by struct dentry itself + its external name,
if any) will remain on the shrink lists, but they are no obstacles
to filesystem shutdown.  And such husks will get freed as soon as
shrink_dentry_list() of the list they are on gets to them.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-11-25 02:34:49 -05:00
..
caching Documentation: Fix typos 2023-08-18 11:29:03 -06:00
ext4 Documentation: Fix typos 2023-08-18 11:29:03 -06:00
nfs vfs-6.7.fsid 2023-11-07 12:11:26 -08:00
smb smb3: move Documentation/filesystems/cifs to Documentation/filesystems/smb 2023-05-24 16:29:21 -05:00
spufs Documentation: spufs: correct a duplicate word typo 2022-09-27 13:21:44 -06:00
9p.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
adfs.rst docs: filesystems: convert adfs.txt to ReST 2020-03-02 13:58:44 -07:00
affs.rst affs: fix basic permission bits to actually work 2020-08-31 12:20:31 +02:00
afs.rst afs: Documentation: correct reference to CONFIG_AFS_FS 2023-07-21 13:46:02 -06:00
api-summary.rst block: move fs/block_dev.c to block/bdev.c 2021-09-07 08:39:40 -06:00
autofs-mount-control.rst autofs: use flexible array in ioctl structure 2023-05-30 16:42:00 -07:00
autofs.rst autofs: use flexible array in ioctl structure 2023-05-30 16:42:00 -07:00
automount-support.rst docs: filesystems: convert automount-support.txt to ReST 2020-05-05 09:22:21 -06:00
befs.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
bfs.rst docs: filesystems: convert bfs.txt to ReST 2020-03-02 14:01:26 -07:00
btrfs.rst MAINTAINERS: remove links to obsolete btrfs.wiki.kernel.org 2023-09-08 14:21:27 +02:00
ceph.rst ceph: update documentation regarding snapshot naming limitations 2023-08-24 11:24:36 +02:00
coda.rst Documentation: coda: annotate duplicated words 2020-07-13 10:02:32 -06:00
configfs.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
cramfs.rst docs: filesystems: convert cramfs.txt to ReST 2020-03-02 14:02:07 -07:00
dax.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
debugfs.rst debugfs: small Documentation cleaning 2022-11-09 13:58:55 -07:00
devpts.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
directory-locking.rst fs: Lock moved directories 2023-06-02 15:00:18 +02:00
dlmfs.rst docs: update ocfs2-devel mailing list address 2023-07-08 09:29:29 -07:00
dnotify.rst docs: filesystems: convert dnotify.txt to ReST 2020-05-05 09:22:22 -06:00
ecryptfs.rst docs: prevent warnings due to autosectionlabel 2020-03-20 17:01:29 -06:00
efivarfs.rst docs: filesystems: add info about efivars content 2020-05-25 18:59:59 -06:00
erofs.rst erofs: fix inode metadata space layout description in documentation 2023-10-12 11:04:14 +08:00
ext2.rst ext2: remove nobh support 2022-08-02 12:34:04 -04:00
ext3.rst docs: filesystems: convert ext3.txt to ReST 2020-03-02 14:03:16 -07:00
f2fs.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
fiemap.rst A lot of bug fixes and cleanups for ext4, including: 2020-06-05 16:19:28 -07:00
files.rst file: convert to SLAB_TYPESAFE_BY_RCU 2023-10-19 11:02:48 +02:00
fscrypt.rst fscrypt: track master key presence separately from secret 2023-10-16 21:23:45 -07:00
fsverity.rst ovl: Add framework for verity support 2023-08-12 19:02:38 +03:00
fuse-io.rst docs: filesystems: convert fuse-io.txt to ReST 2020-05-05 09:22:22 -06:00
fuse.rst fuse: Add module param for CAP_SYS_ADMIN access bypassing allow_other 2022-07-21 16:06:19 +02:00
gfs2-glocks.rst gfs2 fixes 2023-09-05 13:00:28 -07:00
gfs2-uevents.rst docs: filesystems: convert gfs2-uevents.txt to ReST 2020-03-02 14:03:35 -07:00
gfs2.rst Documentation: Update filesystems/gfs2.rst 2020-12-01 00:25:20 +01:00
hfs.rst Replace HTTP links with HTTPS ones: Documentation/filesystems 2020-06-26 11:14:12 -06:00
hfsplus.rst docs: filesystems: convert hfsplus.txt to ReST 2020-03-02 14:03:47 -07:00
hpfs.rst Replace HTTP links with HTTPS ones: Documentation/filesystems 2020-06-26 11:14:12 -06:00
idmappings.rst Documentation work keeps chugging along; stuff for 6.6 includes: 2023-08-30 20:05:42 -07:00
index.rst docs: add maintainer entry profile for XFS 2023-08-10 07:47:53 -07:00
inotify.rst docs: filesystems: convert inotify.txt to ReST 2020-03-02 14:03:55 -07:00
isofs.rst docs: filesystems: convert isofs.txt to ReST 2020-03-02 14:04:06 -07:00
journalling.rst jbd2: drop jbd2_fc_init documentation 2020-11-06 23:01:03 -05:00
locking.rst Documentation work keeps chugging along; stuff for 6.6 includes: 2023-08-30 20:05:42 -07:00
locks.rst docs: fs: locks.rst: update comment about mandatory file locking 2021-10-19 06:48:21 -04:00
mount_api.rst fs_context: drop the unused lsm_flags member 2023-03-16 14:38:28 +01:00
netfs_library.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
nilfs2.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
ntfs3.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
ntfs.rst docs: filesystems: convert ntfs.txt to ReST 2020-03-02 14:04:06 -07:00
ocfs2-online-filecheck.rst docs: filesystems: convert ocfs2-online-filecheck.txt to ReST 2020-03-02 14:04:06 -07:00
ocfs2.rst docs: update ocfs2-devel mailing list address 2023-07-08 09:29:29 -07:00
omfs.rst Replace HTTP links with HTTPS ones: OMFS 2020-07-13 11:24:43 -06:00
orangefs.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
overlayfs.rst ovl: add support for appending lowerdirs one by one 2023-10-31 00:13:02 +02:00
path-lookup.rst Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-07-03 11:41:14 -07:00
path-lookup.txt Replace HTTP links with HTTPS ones: documentation 2020-06-08 09:30:19 -06:00
porting.rst __dentry_kill(): new locking scheme 2023-11-25 02:34:49 -05:00
proc.rst doc: Add /proc/bootconfig to proc.rst 2023-10-31 18:46:43 +09:00
qnx6.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
quota.rst quota: Fixup http links in quota doc 2020-07-09 08:14:01 +02:00
ramfs-rootfs-initramfs.rst Documentation/filesystems: ramfs-rootfs-initramfs: use :Author: 2023-05-16 12:55:35 -06:00
relay.rst docs: filesystems: convert relay.txt to ReST 2020-03-02 14:04:41 -07:00
romfs.rst docs: filesystems: convert romfs.txt to ReST 2020-03-02 14:04:41 -07:00
seq_file.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
sharedsubtree.rst Documentation/filesystems: sharedsubtree: add section headings 2023-05-16 12:50:05 -06:00
splice.rst docs: Bring some order to filesystem documentation 2019-03-06 09:46:10 -07:00
squashfs.rst docs: filesystems: convert squashfs.txt to ReST 2020-03-02 14:04:41 -07:00
sysfs.rst driver core: bus: mark the struct bus_type for sysfs callbacks as constant 2023-03-23 13:20:40 +01:00
sysv-fs.rst docs: filesystems: convert sysv-fs.txt to ReST 2020-03-02 14:04:41 -07:00
tmpfs.rst tmpfs,xattr: enable limited user extended attributes 2023-08-10 12:06:04 +02:00
ubifs-authentication.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
ubifs.rst Documentation: ubifs: Fix compression idiom 2022-10-10 13:01:10 -06:00
udf.rst udf: Replace HTTP links with HTTPS ones 2020-07-14 14:37:39 +02:00
vfat.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
vfs.rst Documentation work keeps chugging along; stuff for 6.6 includes: 2023-08-30 20:05:42 -07:00
virtiofs.rst virtiofs: Add mount option and atime behavior to the doc 2020-04-20 17:01:34 +02:00
xfs-delayed-logging-design.rst Documentation: filesystems: correct possessive "its" 2022-09-27 13:21:44 -06:00
xfs-maintainer-entry-profile.rst docs: add maintainer entry profile for XFS 2023-08-10 07:47:53 -07:00
xfs-online-fsck-design.rst Documentation: xfs: Remove repeated word in comments 2023-09-22 05:20:09 -06:00
xfs-self-describing-metadata.rst xfs: document the filesystem metadata checking strategy 2023-04-11 18:59:47 -07:00
zonefs.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00