linux/kernel
Christian Brauner 71bc356f93
commoncap: handle idmapped mounts
When interacting with user namespace and non-user namespace aware
filesystem capabilities the vfs will perform various security checks to
determine whether or not the filesystem capabilities can be used by the
caller, whether they need to be removed and so on. The main
infrastructure for this resides in the capability codepaths but they are
called through the LSM security infrastructure even though they are not
technically an LSM or optional. This extends the existing security hooks
security_inode_removexattr(), security_inode_killpriv(),
security_inode_getsecurity() to pass down the mount's user namespace and
makes them aware of idmapped mounts.

In order to actually get filesystem capabilities from disk the
capability infrastructure exposes the get_vfs_caps_from_disk() helper.
For user namespace aware filesystem capabilities a root uid is stored
alongside the capabilities.

In order to determine whether the caller can make use of the filesystem
capability or whether it needs to be ignored it is translated according
to the superblock's user namespace. If it can be translated to uid 0
according to that id mapping the caller can use the filesystem
capabilities stored on disk. If we are accessing the inode that holds
the filesystem capabilities through an idmapped mount we map the root
uid according to the mount's user namespace. Afterwards the checks are
identical to non-idmapped mounts: reading filesystem caps from disk
enforces that the root uid associated with the filesystem capability
must have a mapping in the superblock's user namespace and that the
caller is either in the same user namespace or is a descendant of the
superblock's user namespace. For filesystems that are mountable inside
user namespace the caller can just mount the filesystem and won't
usually need to idmap it. If they do want to idmap it they can create an
idmapped mount and mark it with a user namespace they created and which
is thus a descendant of s_user_ns. For filesystems that are not
mountable inside user namespaces the descendant rule is trivially true
because the s_user_ns will be the initial user namespace.

If the initial user namespace is passed nothing changes so non-idmapped
mounts will see identical behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-11-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:17 +01:00
..
bpf inode: make init and permission helpers idmapped mount aware 2021-01-24 14:27:16 +01:00
cgroup namei: make permission helpers idmapped mount aware 2021-01-24 14:27:16 +01:00
configs staging: ION: remove some references to CONFIG_ION 2021-01-06 17:39:38 +01:00
debug smp: Cleanup smp_call_function*() 2020-11-24 16:47:49 +01:00
dma dma-mapping updates for 5.11: 2020-12-22 13:19:43 -08:00
entry The new preemtible kmap_local() implementation: 2020-12-14 18:35:53 -08:00
events Merge branch 'exec-update-lock-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-12-15 19:36:48 -08:00
gcov gcov: fix kernel-doc markup issue 2020-12-15 22:46:18 -08:00
irq genirq: Fix export of irq_to_desc() for powerpc KVM 2020-12-25 11:02:39 -08:00
kcsan kcsan: Fix encoding masks and regain address bit 2020-11-06 17:19:26 -08:00
livepatch livepatch: Use the default ftrace_ops instead of REGS when ARGS is available 2020-11-13 12:15:28 -05:00
locking A moderate set of locking updates: 2020-12-14 17:27:47 -08:00
power Power management updates for 5.11-rc1 2020-12-15 16:30:31 -08:00
printk printk changes for 5.11 2020-12-16 10:45:11 -08:00
rcu Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu 2021-01-04 10:55:19 -08:00
sched Fix a context switch performance regression. 2020-12-27 09:00:47 -08:00
time Update/fix two CPU sanity checks in the hotplug and the boot code, 2020-12-27 09:03:41 -08:00
trace tracing/kprobes: Do the notrace functions check without kprobes on ftrace 2021-01-11 16:09:53 -05:00
.gitignore .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
acct.c kernel/acct.c: use #elif instead of #end and #elif 2020-12-15 22:46:15 -08:00
async.c treewide: Remove uninitialized_var() usage 2020-07-16 12:35:15 -07:00
audit_fsnotify.c fsnotify: generalize handle_inode_event() 2020-12-03 14:58:35 +01:00
audit_tree.c fsnotify: generalize handle_inode_event() 2020-12-03 14:58:35 +01:00
audit_watch.c fsnotify: generalize handle_inode_event() 2020-12-03 14:58:35 +01:00
audit.c audit: replace atomic_add_return() 2020-12-02 22:52:16 -05:00
audit.h audit: change unnecessary globals into statics 2020-08-17 20:26:58 -04:00
auditfilter.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
auditsc.c commoncap: handle idmapped mounts 2021-01-24 14:27:17 +01:00
backtracetest.c treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD() 2020-07-30 11:15:58 -07:00
bounds.c
capability.c capability: handle idmapped mounts 2021-01-24 14:27:16 +01:00
compat.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
configs.c proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
context_tracking.c context_tracking: Ensure that the critical path cannot be instrumented 2020-06-11 15:14:36 +02:00
cpu_pm.c notifier: Fix broken error handling pattern 2020-09-01 09:58:03 +02:00
cpu.c Scheduler updates: 2020-12-14 18:29:11 -08:00
crash_core.c kdump: append uts_namespace.name offset to VMCOREINFO 2020-12-15 22:46:18 -08:00
crash_dump.c crash_dump: Remove no longer used saved_max_pfn 2020-04-15 11:21:54 +02:00
cred.c exec: Teach prepare_exec_creds how exec treats uids & gids 2020-05-20 14:44:21 -05:00
delayacct.c
dma.c
exec_domain.c
exit.c kernel/io_uring: cancel io_uring before task works 2020-12-30 19:36:54 -07:00
extable.c kernel/extable.c: use address-of operator on section symbols 2020-04-07 10:43:42 -07:00
fail_function.c fault-injection: handle EI_ETYPE_TRUE 2020-12-15 22:46:19 -08:00
fork.c kasan: rename (un)poison_shadow to (un)poison_range 2020-12-22 12:55:06 -08:00
freezer.c
futex.c Merge branch 'locking/rwsem' 2020-12-09 17:08:45 +01:00
gen_kheaders.sh kbuild: add variables for compression tools 2020-06-06 23:42:01 +09:00
groups.c LSM: Signal to SafeSetID when setting group IDs 2020-10-13 09:17:34 -07:00
hung_task.c kernel/hung_task.c: make type annotations consistent 2020-11-02 12:14:19 -08:00
iomem.c
irq_work.c irq_work: Optimize irq_work_single() 2020-11-24 16:47:49 +01:00
jump_label.c jump_label: Fix usage in module __init 2020-12-18 16:53:12 +01:00
kallsyms.c treewide: Convert macro and uses of __section(foo) to __section("foo") 2020-10-25 14:51:49 -07:00
kcmp.c Merge branch 'exec-update-lock-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-12-15 19:36:48 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks sched/rt, locking: Use CONFIG_PREEMPTION 2019-12-08 14:37:36 +01:00
Kconfig.preempt
kcov.c kernel: make kcov_common_handle consider the current context 2020-11-02 18:00:20 -08:00
kexec_core.c crypto: sha - split sha.h into sha1.h and sha2.h 2020-11-20 14:45:33 +11:00
kexec_elf.c
kexec_file.c crypto: sha - split sha.h into sha1.h and sha2.h 2020-11-20 14:45:33 +11:00
kexec_internal.h kexec: add machine_kexec_post_load() 2020-01-08 16:32:55 +00:00
kexec.c LSM: Introduce kernel_post_load_data() hook 2020-10-05 13:37:03 +02:00
kheaders.c
kmod.c kmod: remove redundant "be an" in the comment 2020-08-12 10:58:01 -07:00
kprobes.c Merge branch 'linus' into perf/kprobes 2020-11-07 13:18:49 +01:00
ksysfs.c
kthread.c Merge branch 'akpm' (patches from Andrew) 2020-12-15 12:53:37 -08:00
latencytop.c sysctl: pass kernel pointers to ->proc_handler 2020-04-27 02:07:40 -04:00
Makefile kcov: don't instrument with UBSAN 2020-12-15 22:46:19 -08:00
module_signature.c
module_signing.c
module-internal.h
module.c Modules updates for v5.11 2020-12-17 13:01:31 -08:00
notifier.c notifier: Fix broken error handling pattern 2020-09-01 09:58:03 +02:00
nsproxy.c fixes-v5.11 2020-12-14 16:40:27 -08:00
padata.c padata: fix possible padata_works_lock deadlock 2020-09-04 17:51:55 +10:00
panic.c panic: don't dump stack twice on warn 2020-11-14 11:26:04 -08:00
params.c Modules updates for v5.11 2020-12-17 13:01:31 -08:00
pid_namespace.c fixes-v5.11 2020-12-14 16:40:27 -08:00
pid.c Merge branch 'exec-update-lock-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-12-15 19:36:48 -08:00
profile.c proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
ptrace.c Merge branch 'akpm' (patches from Andrew) 2020-12-15 12:53:37 -08:00
range.c kernel.h: split out min()/max() et al. helpers 2020-10-16 11:11:19 -07:00
reboot.c reboot: hide from sysfs not applicable settings 2020-12-15 22:46:19 -08:00
regset.c regset: kill ->get() 2020-07-27 14:31:12 -04:00
relay.c relay: allow the use of const callback structs 2020-12-15 22:46:18 -08:00
resource_kunit.c resource: provide meaningful MODULE_LICENSE() in test suite 2020-11-25 18:52:35 +01:00
resource.c kernel/resource.c: fix kernel-doc markups 2020-12-15 22:46:18 -08:00
rseq.c rseq: Reject unknown flags on rseq unregister 2019-12-25 10:41:20 +01:00
scftorture.c scftorture: Add full-test stutter capability 2020-11-06 17:13:56 -08:00
scs.c scs: switch to vmapped shadow stacks 2020-12-01 10:30:28 +00:00
seccomp.c seccomp updates for v5.11-rc1 2020-12-16 11:30:10 -08:00
signal.c tif-task_work.arch-2020-12-14 2020-12-16 12:33:35 -08:00
smp.c Merge branch 'linus' into sched/core, to resolve semantic conflict 2020-11-27 11:10:50 +01:00
smpboot.c
smpboot.h
softirq.c Misc fixes/updates: 2020-12-27 09:06:10 -08:00
stackleak.c stackleak: let stack_erasing_sysctl take a kernel pointer buffer 2020-09-19 13:13:39 -07:00
stacktrace.c stacktrace: Remove reliable argument from arch_stack_walk() callback 2020-09-18 14:24:16 +01:00
static_call.c static_call: Fix return type of static_call_init 2020-10-02 21:18:25 +02:00
stop_machine.c Merge branch 'linus' into sched/core, to resolve semantic conflict 2020-11-27 11:10:50 +01:00
sys_ni.c epoll: wire up syscall epoll_pwait2 2020-12-19 11:18:38 -08:00
sys.c fs: add file and path permissions helpers 2021-01-24 14:27:16 +01:00
sysctl-test.c kunit: allow kunit tests to be loaded as a module 2020-01-09 16:42:29 -07:00
sysctl.c rcu: Panic after fixed number of stalls 2020-11-19 19:37:16 -08:00
task_work.c task_work: remove legacy TWA_SIGNAL path 2020-12-12 09:17:38 -07:00
taskstats.c treewide: rename nla_strlcpy to nla_strscpy. 2020-11-16 08:08:54 -08:00
test_kprobes.c
torture.c rcutorture: Make stutter_wait() caller restore priority 2020-11-06 17:13:54 -08:00
tracepoint.c tracepoints: Migrate to use SYSCALL_WORK flag 2020-11-16 21:53:15 +01:00
tsacct.c tsacct: add 64-bit btime field 2019-12-18 18:07:31 +01:00
ucount.c ucount: Make sure ucounts in /proc/sys/user don't regress again 2020-04-07 21:51:27 +02:00
uid16.c
uid16.h
umh.c usermodehelper: reset umask to default before executing user process 2020-10-06 10:31:52 -07:00
up.c smp/up: Make smp_call_function_single() match SMP semantics 2020-02-07 15:34:12 +01:00
user_namespace.c fixes-v5.11 2020-12-14 16:40:27 -08:00
user-return-notifier.c
user.c user: Use generic ns_common::count 2020-08-19 14:14:12 +02:00
usermode_driver.c umd: Stop using split_argv 2020-07-07 11:58:59 -05:00
utsname_sysctl.c sysctl: pass kernel pointers to ->proc_handler 2020-04-27 02:07:40 -04:00
utsname.c uts: Use generic ns_common::count 2020-08-19 14:13:20 +02:00
watch_queue.c watch_queue: Limit the number of watches a user can hold 2020-08-17 09:39:18 -07:00
watchdog_hld.c
watchdog.c kernel/watchdog: fix watchdog_allowed_mask not used warning 2020-11-14 11:26:03 -08:00
workqueue_internal.h
workqueue.c Merge branch 'for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2020-12-28 11:23:02 -08:00