linux/fs
Lorenzo Stoakes f64e67e5d3 fork: do not invoke uffd on fork if error occurs
Patch series "fork: do not expose incomplete mm on fork".

During fork we may place the virtual memory address space into an
inconsistent state before the fork operation is complete.

In addition, we may encounter an error during the fork operation that
indicates that the virtual memory address space is invalidated.

As a result, we should not be exposing it in any way to external machinery
that might interact with the mm or VMAs, machinery that is not designed to
deal with incomplete state.

We specifically update the fork logic to defer khugepaged and ksm to the
end of the operation and only to be invoked if no error arose, and
disallow uffd from observing fork events should an error have occurred.


This patch (of 2):

Currently on fork we expose the virtual address space of a process to
userland unconditionally if uffd is registered in VMAs, regardless of
whether an error arose in the fork.

This is performed in dup_userfaultfd_complete() which is invoked
unconditionally, and performs two duties - invoking registered handlers
for the UFFD_EVENT_FORK event via dup_fctx(), and clearing down
userfaultfd_fork_ctx objects established in dup_userfaultfd().

This is problematic, because the virtual address space may not yet be
correctly initialised if an error arose.

The change in commit d240629148 ("fork: use __mt_dup() to duplicate
maple tree in dup_mmap()") makes this more pertinent as we may be in a
state where entries in the maple tree are not yet consistent.

We address this by, on fork error, ensuring that we roll back state that
we would otherwise expect to clean up through the event being handled by
userland and perform the memory freeing duty otherwise performed by
dup_userfaultfd_complete().

We do this by implementing a new function, dup_userfaultfd_fail(), which
performs the same loop, only decrementing reference counts.

Note that we perform mmgrab() on the parent and child mm's, however
userfaultfd_ctx_put() will mmdrop() this once the reference count drops to
zero, so we will avoid memory leaks correctly here.

Link: https://lkml.kernel.org/r/cover.1729014377.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/d3691d58bb58712b6fb3df2be441d175bd3cdf07.1729014377.git.lorenzo.stoakes@oracle.com
Fixes: d240629148 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Jann Horn <jannh@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-28 21:40:38 -07:00
..
9p Revert patches causing inode collision problems 2024-10-25 15:25:02 -07:00
adfs move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
affs affs-for-6.12-tag 2024-09-16 13:07:59 +02:00
afs afs: Fix lock recursion 2024-10-17 15:33:46 +02:00
autofs autofs: add per dentry expire timeout 2024-08-30 08:22:36 +02:00
bcachefs bcachefs fixes for 6.12-rc5 2024-10-24 12:38:59 -07:00
befs befs: Convert befs_symlink_read_folio() to use folio_end_read() 2024-05-31 12:31:39 +02:00
bfs fs: Convert aops->write_begin to take a folio 2024-08-07 11:33:21 +02:00
btrfs for-6.12-rc4-tag 2024-10-24 13:04:15 -07:00
cachefiles cachefiles: fix dentry leak in cachefiles_open_file() 2024-09-27 18:29:19 +02:00
ceph A fix from Patrick for a variety of CephFS lockup scenarios caused by 2024-10-04 10:10:23 -07:00
coda coda: use param->file for FSCONFIG_SET_FD 2024-08-19 13:45:03 +02:00
configfs fs/configfs: Add a callback to determine attribute visibility 2024-06-17 20:42:57 +02:00
cramfs vfs-6.11.module.description 2024-07-15 11:14:59 -07:00
crypto move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
debugfs [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
devpts
dlm [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
ecryptfs move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
efivarfs [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
efs vfs-6.11.module.description 2024-07-15 11:14:59 -07:00
erofs Changes since last update: 2024-10-14 11:12:09 -07:00
exfat move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
exportfs fhandle: relax open_by_handle_at() permission checks 2024-05-28 15:57:23 +02:00
ext2 vfs-6.12.file 2024-09-16 09:14:02 +02:00
ext4 ext4: fix off by one issue in alloc_flex_gd() 2024-10-04 17:36:28 -04:00
f2fs f2fs: allow parallel DIO reads 2024-10-11 15:12:07 +00:00
fat fat: fix uninitialized variable 2024-10-17 00:28:06 -07:00
freevxfs freevxfs: Convert freevxfs to the new mount API. 2024-03-26 09:04:53 +01:00
fuse fuse: remove stray debug line 2024-10-25 17:05:49 +02:00
gfs2 gfs2 changes 2024-09-23 11:55:17 -07:00
hfs fs: Convert aops->write_begin to take a folio 2024-08-07 11:33:21 +02:00
hfsplus move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
hostfs fs: Convert aops->write_begin to take a folio 2024-08-07 11:33:21 +02:00
hpfs move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
hugetlbfs fs: Convert aops->write_begin to take a folio 2024-08-07 11:33:21 +02:00
iomap iomap: move locking out of iomap_write_delalloc_release 2024-10-15 11:37:42 +02:00
isofs move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
jbd2 jbd2: remove unneeded check of ret in jbd2_fc_get_buf 2024-08-26 23:49:15 -04:00
jffs2 jffs2: Use a folio in jffs2_garbage_collect_dnode() 2024-08-19 13:40:00 +02:00
jfs jfs: Fix sanity check in dbMount 2024-10-22 09:40:37 -05:00
kernfs kernfs: mount: Remove unnecessary ‘NULL’ values from knparent 2024-05-04 19:02:39 +02:00
lockd move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
minix buffer: Convert __block_write_begin() to take a folio 2024-08-07 11:33:36 +02:00
netfs netfs: Downgrade i_rwsem for a buffered write 2024-10-17 15:33:42 +02:00
nfs NFS: remove revoked delegation from server's delegation list 2024-10-09 15:39:22 -04:00
nfs_common nfs_common: fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put() 2024-10-03 16:19:43 -04:00
nfsd nfsd-6.12 fixes: 2024-10-25 11:38:15 -07:00
nilfs2 vfs-6.12-rc5.fixes 2024-10-21 10:48:24 -07:00
nls move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
notify inotify: Fix possible deadlock in fsnotify_destroy_mark 2024-10-02 15:14:29 +02:00
ntfs3 Changes for 6.12-rc3 2024-10-08 10:53:06 -07:00
ocfs2 fs: Fix uninitialized value issue in from_kuid and from_kgid 2024-10-17 15:33:43 +02:00
omfs fs: Convert aops->write_begin to take a folio 2024-08-07 11:33:21 +02:00
openpromfs openpromfs: add missing MODULE_DESCRIPTION() macro 2024-06-20 09:46:01 +02:00
orangefs move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
overlayfs fs: pass offset and result to backing_file end_write() callback 2024-10-16 13:17:45 +02:00
proc vfs-6.12-rc5.fixes 2024-10-21 10:48:24 -07:00
pstore drm next for 6.12-rc1 2024-09-19 10:18:15 +02:00
qnx4 qnx4: add MODULE_DESCRIPTION() 2024-05-28 11:52:53 +02:00
qnx6 qnx6: Convert directory handling to use kmap_local 2024-08-07 11:31:56 +02:00
quota \n 2024-09-23 10:49:28 -07:00
ramfs mm: switch mm->get_unmapped_area() to a flag 2024-04-25 20:56:25 -07:00
reiserfs move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
romfs romfs: fix romfs_read_folio() 2024-08-21 22:32:58 +02:00
smb cifs: fix warning when destroy 'cifs_io_request_pool' 2024-10-23 07:42:44 -05:00
squashfs Many singleton patches - please see the various changelogs for details. 2024-09-21 08:20:50 -07:00
sysfs Merge 6.9-rc5 into driver-core-next 2024-04-23 13:27:43 +02:00
sysv buffer: Convert __block_write_begin() to take a folio 2024-08-07 11:33:36 +02:00
tests execve: Move KUnit tests to tests/ subdirectory 2024-07-22 18:25:47 -07:00
tracefs eventfs: Use list_del_rcu() for SRCU protected list variable 2024-09-05 10:18:48 -04:00
ubifs [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
udf udf: fix uninit-value use in udf_get_fileshortad 2024-10-02 14:32:37 +02:00
ufs ufs_rename(): fix bogus argument of folio_release_kmap() 2024-10-02 00:05:09 -04:00
unicode unicode: Don't special case ignorable code points 2024-10-09 13:34:01 -04:00
vboxsf fs: Convert aops->write_end to take a folio 2024-08-07 11:32:02 +02:00
verity fsverity: expose verified fsverity built-in signatures to LSMs 2024-08-20 14:03:18 -04:00
xfs xfs: update the pag for the last AG at recovery time 2024-10-22 13:37:19 +02:00
zonefs zonefs fixes for 6.12-rc2 2024-10-02 12:02:15 -07:00
aio.c fs/aio: Fix __percpu annotation of *cpu pointer in struct kioctx 2024-08-19 13:45:03 +02:00
anon_inodes.c fs: Create anon_inode_getfile_fmode() 2024-04-26 10:33:05 +02:00
attr.c nfsd-6.11 fixes: 2024-08-29 06:20:44 +12:00
backing-file.c fs: pass offset and result to backing_file end_write() callback 2024-10-16 13:17:45 +02:00
bad_inode.c
binfmt_elf_fdpic.c binfmt_elf_fdpic: fix AUXV size calculation when ELF_HWCAP2 is defined 2024-08-26 13:00:38 -07:00
binfmt_elf.c Revert "binfmt_elf, coredump: Log the reason of the failed core dumps" 2024-09-26 11:39:02 -07:00
binfmt_flat.c move asm/unaligned.h to linux/unaligned.h 2024-10-02 17:23:23 -04:00
binfmt_misc.c vfs-6.11.module.description 2024-07-15 11:14:59 -07:00
binfmt_script.c fs: binfmt: add missing MODULE_DESCRIPTION() macros 2024-05-28 12:06:51 +02:00
bpf_fs_kfuncs.c bpf: Add kfunc bpf_get_dentry_xattr() to read xattr from dentry 2024-08-07 11:26:54 -07:00
buffer.c vfs-6.12.folio 2024-09-16 08:54:30 +02:00
char_dev.c
compat_binfmt_elf.c
coredump.c Revert "binfmt_elf, coredump: Log the reason of the failed core dumps" 2024-09-26 11:39:02 -07:00
d_path.c
dax.c iomap: constrain the file range passed to iomap_file_unshare 2024-10-03 10:22:28 +02:00
dcache.c vfs-6.12.misc 2024-09-16 08:35:09 +02:00
direct-io.c fs/direct-io: Remove linux/prefetch.h include 2024-08-19 13:45:02 +02:00
drop_caches.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
eventfd.c introduce fd_file(), convert all accessors to it. 2024-08-12 22:00:43 -04:00
eventpoll.c struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
exec.c ALong with the usual shower of singleton patches, notable patch series in 2024-09-21 07:29:05 -07:00
fcntl.c struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
fhandle.c struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
file_table.c slab updates for 6.12 2024-09-18 08:53:53 +02:00
file.c close_range(): fix the logics in descriptor table trimming 2024-09-29 21:52:29 -04:00
filesystems.c
fs_context.c
fs_parser.c fs_parse: add uid & gid option option parsing helpers 2024-07-02 06:20:49 +02:00
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c inode: port __I_SYNC to var event 2024-08-30 08:22:39 +02:00
fsopen.c [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
init.c
inode.c bcachefs: do not use PF_MEMALLOC_NORECLAIM 2024-10-09 12:47:18 -07:00
internal.h file: reclaim 24 bytes from f_owner 2024-08-28 13:05:39 +02:00
ioctl.c introduce fd_file(), convert all accessors to it. 2024-08-12 22:00:43 -04:00
Kconfig nfs_common: fix Kconfig for NFS_COMMON_LOCALIO_SUPPORT 2024-10-03 16:19:51 -04:00
Kconfig.binfmt exec: Add KUnit test for bprm_stack_limits() 2024-06-19 13:13:55 -07:00
kernel_read_file.c introduce fd_file(), convert all accessors to it. 2024-08-12 22:00:43 -04:00
libfs.c vfs-6.12.folio 2024-09-16 08:54:30 +02:00
locks.c struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
Makefile bpf: introduce new VFS based BPF kfuncs 2024-08-06 09:01:41 -07:00
mbcache.c
mnt_idmapping.c fuse update for 6.12 2024-09-24 15:29:42 -07:00
mount.h vfs-6.12.mount 2024-09-16 11:15:26 +02:00
mpage.c buffer: Remove calls to set and clear the folio error flag 2024-05-31 12:31:43 +02:00
namei.c struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
namespace.c fs: don't try and remove empty rbtree node 2024-10-17 15:33:43 +02:00
nsfs.c [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
open.c openat2: explicitly return -E2BIG for (usize > PAGE_SIZE) 2024-10-10 12:09:03 +02:00
pidfs.c pidfs: check for valid pid namespace 2024-09-27 18:29:19 +02:00
pipe.c [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
pnode.c
pnode.h
posix_acl.c fs: Use in_group_or_capable() helper to simplify the code 2024-08-30 08:22:37 +02:00
proc_namespace.c fs: rename show_mnt_opts -> show_vfsmnt_opts 2024-06-28 14:36:43 +02:00
read_write.c struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
readdir.c introduce fd_file(), convert all accessors to it. 2024-08-12 22:00:43 -04:00
remap_range.c introduce fd_file(), convert all accessors to it. 2024-08-12 22:00:43 -04:00
select.c struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
seq_file.c seq_file: Simplify __seq_puts() 2024-05-02 16:28:20 +02:00
signalfd.c struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
splice.c introduce fd_file(), convert all accessors to it. 2024-08-12 22:00:43 -04:00
stack.c
stat.c introduce fd_file(), convert all accessors to it. 2024-08-12 22:00:43 -04:00
statfs.c introduce fd_file(), convert all accessors to it. 2024-08-12 22:00:43 -04:00
super.c vfs-6.12.misc 2024-09-16 08:35:09 +02:00
sync.c introduce fd_file(), convert all accessors to it. 2024-08-12 22:00:43 -04:00
sysctls.c
timerfd.c introduce fd_file(), convert all accessors to it. 2024-08-12 22:00:43 -04:00
userfaultfd.c fork: do not invoke uffd on fork if error occurs 2024-10-28 21:40:38 -07:00
utimes.c introduce fd_file(), convert all accessors to it. 2024-08-12 22:00:43 -04:00
xattr.c introduce fd_file(), convert all accessors to it. 2024-08-12 22:00:43 -04:00