Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git (synced 2025-01-14 17:14:09 +00:00)
cgroup: Add documentation for cgroup namespaces
Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
parent ed82571b1a
commit d4021f6cd4
@@ -47,6 +47,11 @@ CONTENTS
  5-3. IO
    5-3-1. IO Interface Files
    5-3-2. Writeback
6. Namespace
  6-1. Basics
  6-2. The Root and Views
  6-3. Migration and setns(2)
  6-4. Interaction with Other Namespaces
P. Information on Kernel Programming
  P-1. Filesystem Support for Writeback
D. Deprecated v1 Core Features
@@ -1085,6 +1090,148 @@ writeback as follows.
vm.dirty[_background]_ratio.


6. Namespace

6-1. Basics

A cgroup namespace provides a mechanism to virtualize the view of the
"/proc/$PID/cgroup" file and cgroup mounts. The CLONE_NEWCGROUP clone
flag can be used with clone(2) and unshare(2) to create a new cgroup
namespace. A process running inside the cgroup namespace has its
"/proc/$PID/cgroup" output restricted to the cgroupns root. The
cgroupns root is the cgroup of the process at the time the cgroup
namespace was created.

Without a cgroup namespace, the "/proc/$PID/cgroup" file shows the
complete path of the cgroup of a process. In a container setup where
a set of cgroups and namespaces is intended to isolate processes, the
"/proc/$PID/cgroup" file may leak system-level information to the
isolated processes. For example:

  # cat /proc/self/cgroup
  0::/batchjobs/container_id1

The path '/batchjobs/container_id1' can be considered system data
that is undesirable to expose to the isolated processes. A cgroup
namespace can be used to restrict the visibility of this path. For
example, before creating a cgroup namespace, one would see:

  # ls -l /proc/self/ns/cgroup
  lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835]
  # cat /proc/self/cgroup
  0::/batchjobs/container_id1

After unsharing a new namespace, the view changes:

  # ls -l /proc/self/ns/cgroup
  lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183]
  # cat /proc/self/cgroup
  0::/
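
The "/proc/$PID/cgroup" lines shown above have the form
"hierarchy-ID:controller-list:cgroup-path". As a rough illustration,
the following Python sketch splits such lines into their fields. The
names CgroupEntry, parse_proc_cgroup and read_cgroup are made up for
this example and are not part of any kernel or library interface.

```python
from typing import List, NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int       # 0 for the v2 (unified) hierarchy
    controllers: List[str]  # empty for the v2 hierarchy
    path: str               # as seen from the reader's cgroupns root


def parse_proc_cgroup(text: str) -> List[CgroupEntry]:
    """Parse the contents of a /proc/$PID/cgroup file (hypothetical helper)."""
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        # Split at most twice so that a ':' inside the path is preserved.
        hier, controllers, path = line.split(":", 2)
        names = controllers.split(",") if controllers else []
        entries.append(CgroupEntry(int(hier), names, path))
    return entries


def read_cgroup(pid="self") -> List[CgroupEntry]:
    """Read and parse /proc/<pid>/cgroup for the given PID."""
    with open(f"/proc/{pid}/cgroup") as f:
        return parse_proc_cgroup(f.read())
```

Because the kernel already reports the path relative to the reader's
cgroupns root, the same parser works unchanged inside and outside a
cgroup namespace; only the path field differs.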

When some thread from a multi-threaded process unshares its cgroup
namespace, the new cgroupns gets applied to the entire process (all
the threads). This is natural for the v2 hierarchy; however, for the
legacy hierarchies, this may be unexpected.

A cgroup namespace is alive as long as there are processes inside or
mounts pinning it. When the last usage goes away, the cgroup
namespace is destroyed. The cgroupns root and the actual cgroups
remain.


6-2. The Root and Views

The 'cgroupns root' for a cgroup namespace is the cgroup in which the
process calling unshare(2) is running. For example, if a process in
the /batchjobs/container_id1 cgroup calls unshare, the cgroup
/batchjobs/container_id1 becomes the cgroupns root. For the
init_cgroup_ns, this is the real root ('/') cgroup.

The cgroupns root cgroup does not change even if the namespace creator
process later moves to a different cgroup.

  # ~/unshare -c    # unshare cgroupns in some cgroup
  # cat /proc/self/cgroup
  0::/
  # mkdir sub_cgrp_1
  # echo 0 > sub_cgrp_1/cgroup.procs
  # cat /proc/self/cgroup
  0::/sub_cgrp_1

Each process gets its namespace-specific view of "/proc/$PID/cgroup".

Processes running inside the cgroup namespace will be able to see
cgroup paths (in /proc/self/cgroup) only inside their root cgroup.
From within an unshared cgroupns:

  # sleep 100000 &
  [1] 7353
  # echo 7353 > sub_cgrp_1/cgroup.procs
  # cat /proc/7353/cgroup
  0::/sub_cgrp_1

From the initial cgroup namespace, the real cgroup path will be
visible:

  $ cat /proc/7353/cgroup
  0::/batchjobs/container_id1/sub_cgrp_1

From a sibling cgroup namespace (that is, a namespace rooted at a
different cgroup), the cgroup path relative to its own cgroup
namespace root will be shown. For instance, if the reading process's
cgroup namespace root is at '/batchjobs/container_id2', then it will
see:

  # cat /proc/7353/cgroup
  0::/../container_id1/sub_cgrp_1

Note that the relative path always starts with '/' to indicate that
it is relative to the cgroup namespace root of the caller.
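
The view rules in this section can be modelled in a few lines of
Python. This is only a sketch of the observable behaviour, not the
kernel's implementation; the function name cgroup_view is invented
for the example, and both arguments are assumed to be absolute paths
as seen from the initial cgroup namespace.

```python
import posixpath


def cgroup_view(real_path: str, ns_root: str) -> str:
    """Model the path a reader rooted at ns_root sees for real_path."""
    rel = posixpath.relpath(real_path, start=ns_root)
    # relpath() returns "." when both arguments name the same cgroup.
    # The reported path always starts with '/'.
    return "/" if rel == "." else "/" + rel


# A cgroup below the reader's root is shown relative to that root.
print(cgroup_view("/batchjobs/container_id1/sub_cgrp_1",
                  "/batchjobs/container_id1"))
# The initial namespace (root '/') sees the full path.
print(cgroup_view("/batchjobs/container_id1/sub_cgrp_1", "/"))
# A cgroup outside the reader's root shows up with '..' components.
print(cgroup_view("/a/b", "/a/c"))
```

The three calls print "/sub_cgrp_1", the full
"/batchjobs/container_id1/sub_cgrp_1" path, and "/../b" respectively.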


6-3. Migration and setns(2)

Processes inside a cgroup namespace can move into and out of the
namespace root if they have proper access to external cgroups. For
example, from inside a namespace with cgroupns root at
/batchjobs/container_id1, and assuming that the global hierarchy is
still accessible inside cgroupns:

  # cat /proc/7353/cgroup
  0::/sub_cgrp_1
  # echo 7353 > batchjobs/container_id2/cgroup.procs
  # cat /proc/7353/cgroup
  0::/../container_id2

Note that this kind of setup is not encouraged. A task inside a cgroup
namespace should only be exposed to its own cgroupns hierarchy.
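
The migration in the example above is nothing more than a write to
the destination's cgroup.procs file. A minimal Python sketch of that
step follows; move_to_cgroup is a hypothetical helper, and the
directory passed in is assumed to be a cgroup directory on a mounted
cgroup2 filesystem that the caller is allowed to write to.

```python
import os


def move_to_cgroup(cgroup_dir: str, pid: int = 0) -> None:
    """Migrate a process by writing its PID to <cgroup_dir>/cgroup.procs.

    A PID of 0 stands for the writing process itself, as in
    "echo 0 > sub_cgrp_1/cgroup.procs".
    """
    procs = os.path.join(cgroup_dir, "cgroup.procs")
    with open(procs, "w") as f:
        # If the kernel rejects the migration, the write or the close
        # raises OSError.
        f.write(f"{pid}\n")
```

For instance, move_to_cgroup("sub_cgrp_1", 7353) corresponds to
"echo 7353 > sub_cgrp_1/cgroup.procs".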

setns(2) to another cgroup namespace is allowed when:

  (a) the process has CAP_SYS_ADMIN against its current user
      namespace, and
  (b) the process has CAP_SYS_ADMIN against the target cgroup
      namespace's userns

No implicit cgroup changes happen when attaching to another cgroup
namespace. It is expected that someone moves the attaching process
under the target cgroup namespace root.
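
As an illustration of the attach step, the sketch below calls
setns(2) through ctypes on the namespace file of another process.
The helper name join_cgroupns is invented here, the constant is
copied from <linux/sched.h>, and the call is assumed to be made by a
process that satisfies conditions (a) and (b).

```python
import ctypes
import os

CLONE_NEWCGROUP = 0x02000000  # from <linux/sched.h>


def join_cgroupns(target_pid: int) -> None:
    """Attach the calling process to target_pid's cgroup namespace."""
    fd = os.open(f"/proc/{target_pid}/ns/cgroup", os.O_RDONLY)
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        # setns() returns 0 on success and -1 with errno set on failure.
        if libc.setns(fd, CLONE_NEWCGROUP) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)
```

Attaching only changes the caller's view. The caller's cgroup
membership is untouched, so moving it under the target cgroupns root
remains a separate step.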


6-4. Interaction with Other Namespaces

A namespace-specific cgroup hierarchy can be mounted by a process
running inside a non-init cgroup namespace:

  # mount -t cgroup2 none $MOUNT_POINT

This mounts the unified cgroup hierarchy with the cgroupns root as the
filesystem root. The process needs CAP_SYS_ADMIN against its user and
mount namespaces.

The virtualization of the /proc/self/cgroup file, combined with
restricting the view of the cgroup hierarchy through a
namespace-private cgroupfs mount, provides a properly isolated cgroup
view inside the container.


P. Information on Kernel Programming

This section contains kernel programming information in the areas