linux/rust/kernel/pid_namespace.rs
Christian Brauner e0020ba6cb
rust: add PidNamespace
The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.

The `PidNamespace` of a `Task` never changes once the `Task` is alive.
An `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` has
no effect on the calling `Task`'s pid namespace; it only affects the pid
namespace of children created by the calling `Task`. This invariant
guarantees that a `Task`'s pid namespace, once a reference to it has
been acquired, will remain unchanged.
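A minimal userspace sketch of this invariant follows. It assumes `Rc`-based stand-ins for the kernel structures. The model keeps the task's own pid namespace separate from the namespace its future children will use; the kernel keeps the latter in `nsproxy->pid_ns_for_children`. `Task`, `PidNs`, `unshare_newpid()` and `fork()` are hypothetical names, not kernel API.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for `struct pid_namespace`; `level` is its depth.
pub struct PidNs {
    pub level: u32,
}

/// Hypothetical task. `pid_ns` is fixed when the task is created;
/// `pid_ns_for_children` is the namespace future children are placed in.
pub struct Task {
    pid_ns: Rc<PidNs>,
    pid_ns_for_children: Rc<PidNs>,
}

impl Task {
    /// A first task living in the root namespace (level 0).
    pub fn init() -> Task {
        let root = Rc::new(PidNs { level: 0 });
        Task { pid_ns: Rc::clone(&root), pid_ns_for_children: root }
    }

    /// Models `unshare(CLONE_NEWPID)`: only the namespace used for future
    /// children changes; the caller's own `pid_ns` is left untouched.
    pub fn unshare_newpid(&mut self) {
        let level = self.pid_ns.level + 1;
        self.pid_ns_for_children = Rc::new(PidNs { level });
    }

    /// Models fork: the child's own namespace is the parent's
    /// `pid_ns_for_children`, and nothing in this model ever reassigns it.
    pub fn fork(&self) -> Task {
        let ns = Rc::clone(&self.pid_ns_for_children);
        Task { pid_ns: Rc::clone(&ns), pid_ns_for_children: ns }
    }

    /// Models `task_active_pid_ns()` for a live task.
    pub fn active_pid_ns(&self) -> &PidNs {
        &self.pid_ns
    }
}
```

After `unshare_newpid()` the parent still reports level 0. A child created afterwards reports level 1.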

When a task has exited and been reaped, `release_task()` will be called.
This sets the `PidNamespace` of the task to `NULL`, so retrieving the
`PidNamespace` of a dead task returns `NULL`. Note that neither holding
the RCU lock nor holding a reference count to the `Task` will prevent
`release_task()` from being called.
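A userspace model of this behaviour follows. It assumes that an `Rc<Task>` stands in for a task reference count and a `RefCell<Option<_>>` stands in for the pointer reached through `task->thread_pid`. All names are hypothetical. The model shows that an extra reference to the task does not keep the namespace link alive.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for `struct pid_namespace`.
pub struct PidNs {
    pub level: u32,
}

/// Hypothetical task. Each `Rc<Task>` plays the role of a task reference,
/// and `thread_pid_ns` the namespace reached through `task->thread_pid`.
pub struct Task {
    thread_pid_ns: RefCell<Option<Rc<PidNs>>>,
}

impl Task {
    pub fn new(ns: Rc<PidNs>) -> Rc<Task> {
        Rc::new(Task { thread_pid_ns: RefCell::new(Some(ns)) })
    }

    /// Models `task_active_pid_ns()`: `None` plays the role of `NULL` once
    /// the task has been released.
    pub fn active_pid_ns(&self) -> Option<Rc<PidNs>> {
        self.thread_pid_ns.borrow().clone()
    }
}

/// Models `release_task()`: the namespace link is cleared no matter how many
/// references to the task itself are still held.
pub fn release_task(task: &Task) {
    *task.thread_pid_ns.borrow_mut() = None;
}
```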

To retrieve the `PidNamespace` of a `Task`, the `task_active_pid_ns()`
function can be used. There are two cases to consider:

(1) retrieving the `PidNamespace` of the `current` task
(2) retrieving the `PidNamespace` of a non-`current` task

From system call context, retrieving the `PidNamespace` for case (1) is
always safe and requires neither RCU locking nor a reference count to be
held. Retrieving the `PidNamespace` of `current` after `release_task()`
would return `NULL`, but no such codepath is exposed to Rust.

Retrieving the `PidNamespace` from system call context for case (2)
requires RCU protection. Accessing the `PidNamespace` outside of RCU
protection requires a reference count that must have been acquired while
holding the RCU lock. Note that for a non-`current` task `NULL` can be
returned, as that task could already have passed through
`release_task()`.

For (1), the `current_pid_ns!()` macro should be used, which ensures
that the returned `PidNamespace` cannot outlive the calling scope. The
associated `current_pid_ns()` function should not be called directly, as
it could be abused to create an unbounded lifetime for `PidNamespace`.
The `current_pid_ns!()` macro allows Rust to handle the common case of
accessing `current`'s `PidNamespace` without RCU protection and without
having to acquire a reference count.
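A userspace sketch of how a macro can bound the lifetime follows. It assumes a `static` stands in for `current`'s namespace. Here `current_pid_ns()` returns a guard whose lifetime the caller picks freely. The macro immediately borrows that guard as a temporary, so the resulting reference lives only as long as the enclosing block. The names echo the ones above, but this is a hypothetical model, not the kernel implementation.

```rust
use std::marker::PhantomData;
use std::ops::Deref;

/// Hypothetical stand-in for `struct pid_namespace`.
pub struct PidNamespace {
    pub level: u32,
}

// Plays the role of `current`'s pid namespace in this userspace model.
static CURRENT_NS: PidNamespace = PidNamespace { level: 0 };

/// Guard whose lifetime `'a` is chosen by whoever calls `current_pid_ns()`.
/// The raw pointer also makes it `!Send`.
pub struct PidNamespaceRef<'a> {
    ptr: *const PidNamespace,
    _lifetime: PhantomData<&'a PidNamespace>,
}

impl Deref for PidNamespaceRef<'_> {
    type Target = PidNamespace;

    fn deref(&self) -> &PidNamespace {
        // SAFETY (model): `ptr` always points at `CURRENT_NS`.
        unsafe { &*self.ptr }
    }
}

/// # Safety
///
/// A direct caller may pick any `'a`, including `'static`, so it must not
/// let the guard outlive the current scope. Prefer `current_pid_ns!()`.
pub unsafe fn current_pid_ns<'a>() -> PidNamespaceRef<'a> {
    PidNamespaceRef { ptr: &CURRENT_NS, _lifetime: PhantomData }
}

/// Borrows the guard as a temporary, so the resulting reference is confined
/// to the block that invokes the macro and cannot be returned from it.
macro_rules! current_pid_ns {
    () => {
        &*unsafe { current_pid_ns() }
    };
}

pub fn current_level() -> u32 {
    let ns: &PidNamespace = current_pid_ns!();
    ns.level
}
```

The borrow checker rejects a function such as `fn leak() -> &'static PidNamespace { current_pid_ns!() }`, because the result would reference a dropped temporary. A direct `unsafe { current_pid_ns() }` call, by contrast, lets the caller choose `'static`.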

For (2), the `task_get_pid_ns()` method must be used. It always acquires
a reference on the `PidNamespace` and returns an `Option`, forcing the
caller to explicitly handle the case where the `PidNamespace` is `None`,
something that tends to be forgotten when doing the equivalent operation
in C. The missing Rust RCU primitives make it difficult to perform
operations that would be safe without a reference count as long as RCU
protection is guaranteed. This is not important currently, but it is
wanted in the future.
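A sketch of this calling convention follows. It assumes an `RwLock` read guard stands in for the RCU read-side critical section and an `Arc` clone stands in for the acquired reference. Here `get_pid_ns()` models `task_get_pid_ns()`; the types and helpers are hypothetical. Because the result is an `Option`, `describe()` cannot reach the namespace without first handling the released-task case.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for `struct pid_namespace`.
pub struct PidNamespace {
    pub level: u32,
}

/// Hypothetical task. Holding the `RwLock` read guard stands in for the RCU
/// read-side critical section around `task_active_pid_ns()`.
pub struct Task {
    pid_ns: RwLock<Option<Arc<PidNamespace>>>,
}

impl Task {
    pub fn new(ns: PidNamespace) -> Task {
        Task { pid_ns: RwLock::new(Some(Arc::new(ns))) }
    }

    /// Models `task_get_pid_ns()`: a reference is taken while "under RCU",
    /// and `None` is returned if the task has already been released.
    pub fn get_pid_ns(&self) -> Option<Arc<PidNamespace>> {
        // Cloning the `Arc` under the read lock models `get_pid_ns()`.
        self.pid_ns.read().unwrap().clone()
    }

    /// Models `release_task()` clearing the task's namespace.
    pub fn release(&self) {
        *self.pid_ns.write().unwrap() = None;
    }
}

/// The `Option` forces callers to handle the released-task case explicitly.
pub fn describe(task: &Task) -> String {
    match task.get_pid_ns() {
        Some(ns) => format!("pid namespace at level {}", ns.level),
        None => String::from("task already released"),
    }
}
```

A reference acquired before `release()` stays usable afterwards, which mirrors the reference count that `task_get_pid_ns()` hands back.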

Note that for (2), the required RCU protection around calling
`task_active_pid_ns()` synchronizes against putting the last reference
of the `struct pid` associated with `task->thread_pid`. The `struct pid`
stored in that field is used to retrieve the `PidNamespace` of the task.
When `release_task()` is called, `task->thread_pid` is set to `NULL` and
the `put_pid()` on that `struct pid` is delayed in `free_pid()` via
`call_rcu()`, allowing everyone with RCU-protected access to the
`struct pid` acquired from `task->thread_pid` to finish.
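A deliberately tiny, single-threaded model of RCU can illustrate this ordering. The model is an assumption of this sketch and not how RCU is implemented. Callbacks passed to `call_rcu` run only once no reader is inside a read-side critical section. `ToyRcu`, `Pid`, `Task` and `grace_period_demo()` are hypothetical, and an `Rc` clone stands in for the raw `task->thread_pid` pointer a reader would load.

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

/// Hypothetical stand-in for `struct pid`; `put` records whether the
/// deferred `put_pid()` has run.
pub struct Pid {
    pub ns_level: u32,
    pub put: Cell<bool>,
}

/// Toy single-threaded stand-in for RCU (not how RCU works internally):
/// callbacks queued with `call_rcu` only run once no reader is inside a
/// read-side critical section.
#[derive(Default)]
pub struct ToyRcu {
    readers: Cell<usize>,
    pending: RefCell<Vec<Box<dyn FnOnce()>>>,
}

impl ToyRcu {
    pub fn read_lock(&self) {
        self.readers.set(self.readers.get() + 1);
    }

    pub fn read_unlock(&self) {
        self.readers.set(self.readers.get() - 1);
        if self.readers.get() == 0 {
            // "Grace period" over: run every deferred callback.
            let callbacks = std::mem::take(&mut *self.pending.borrow_mut());
            for cb in callbacks {
                cb();
            }
        }
    }

    pub fn call_rcu(&self, cb: Box<dyn FnOnce()>) {
        if self.readers.get() == 0 {
            cb();
        } else {
            self.pending.borrow_mut().push(cb);
        }
    }
}

/// Hypothetical task holding its `thread_pid`.
pub struct Task {
    pub thread_pid: RefCell<Option<Rc<Pid>>>,
}

/// Models the ordering in `release_task()`: clear `thread_pid` first, then
/// defer the put until current readers have finished.
pub fn release_task(rcu: &ToyRcu, task: &Task) {
    let old = task.thread_pid.borrow_mut().take();
    if let Some(pid) = old {
        rcu.call_rcu(Box::new(move || pid.put.set(true)));
    }
}

/// Returns whether the put had run (a) while a reader was still inside its
/// critical section and (b) after that reader left it.
pub fn grace_period_demo() -> (bool, bool) {
    let rcu = ToyRcu::default();
    let pid = Pid { ns_level: 1, put: Cell::new(false) };
    let task = Task { thread_pid: RefCell::new(Some(Rc::new(pid))) };

    rcu.read_lock();
    // The reader "loads" `task->thread_pid`; an `Rc` clone stands in for the pointer.
    let seen = task.thread_pid.borrow().clone().expect("task not yet released");
    release_task(&rcu, &task);
    let put_during_read = seen.put.get();
    assert_eq!(seen.ns_level, 1); // still safe to use inside the critical section
    rcu.read_unlock();

    (put_during_read, seen.put.get())
}
```

`grace_period_demo()` returns `(false, true)`. The deferred put has not run while the reader is still inside its critical section, and it runs once the reader leaves.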

Link: https://lore.kernel.org/r/20241002-brauner-rust-pid_namespace-v5-1-a90e70d44fde@kernel.org
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-08 15:44:36 +02:00

69 lines
2.4 KiB
Rust

// SPDX-License-Identifier: GPL-2.0

// Copyright (c) 2024 Christian Brauner <brauner@kernel.org>

//! Pid namespaces.
//!
//! C header: [`include/linux/pid_namespace.h`](srctree/include/linux/pid_namespace.h) and
//! [`include/linux/pid.h`](srctree/include/linux/pid.h)

use crate::{
    bindings,
    types::{AlwaysRefCounted, Opaque},
};
use core::ptr;

/// Wraps the kernel's `struct pid_namespace`. Thread safe.
///
/// This structure represents the Rust abstraction for a C `struct pid_namespace`. This
/// implementation abstracts the usage of an already existing C `struct pid_namespace` within Rust
/// code that we get passed from the C side.
#[repr(transparent)]
pub struct PidNamespace {
    inner: Opaque<bindings::pid_namespace>,
}

impl PidNamespace {
    /// Returns a raw pointer to the inner C struct.
    #[inline]
    pub fn as_ptr(&self) -> *mut bindings::pid_namespace {
        self.inner.get()
    }

    /// Creates a reference to a [`PidNamespace`] from a valid pointer.
    ///
    /// # Safety
    ///
    /// The caller must ensure that `ptr` is valid and remains valid for the lifetime of the
    /// returned [`PidNamespace`] reference.
    pub unsafe fn from_ptr<'a>(ptr: *const bindings::pid_namespace) -> &'a Self {
        // SAFETY: The safety requirements guarantee the validity of the dereference, while the
        // `PidNamespace` type being transparent makes the cast ok.
        unsafe { &*ptr.cast() }
    }
}

// SAFETY: Instances of `PidNamespace` are always reference-counted.
unsafe impl AlwaysRefCounted for PidNamespace {
    #[inline]
    fn inc_ref(&self) {
        // SAFETY: The existence of a shared reference means that the refcount is nonzero.
        unsafe { bindings::get_pid_ns(self.as_ptr()) };
    }

    #[inline]
    unsafe fn dec_ref(obj: ptr::NonNull<PidNamespace>) {
        // SAFETY: The safety requirements guarantee that the refcount is non-zero.
        unsafe { bindings::put_pid_ns(obj.cast().as_ptr()) }
    }
}

// SAFETY:
// - `PidNamespace::dec_ref` can be called from any thread.
// - It is okay to send ownership of `PidNamespace` across thread boundaries.
unsafe impl Send for PidNamespace {}

// SAFETY: It's OK to access `PidNamespace` through shared references from other threads because
// we're either accessing properties that don't change or that are properly synchronised by C code.
unsafe impl Sync for PidNamespace {}
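Because of the `AlwaysRefCounted` implementation above, Rust code can hold long-lived `PidNamespace` references through the kernel's `ARef<T>` smart pointer. Cloning and dropping an `ARef<T>` call `inc_ref` and `dec_ref`, which for `PidNamespace` are `get_pid_ns()` and `put_pid_ns()`. The sketch below is a simplified userspace re-creation of that contract. It assumes a `Cell` counter in place of the real refcount. `ARef::from_ref` and `ToyPidNs` are hypothetical names, not the kernel API.

```rust
use std::cell::Cell;
use std::ops::Deref;
use std::ptr::NonNull;

/// Simplified userspace mirror of the `AlwaysRefCounted` contract.
///
/// # Safety
///
/// Implementers must keep the object alive while its count is non-zero.
pub unsafe trait AlwaysRefCounted {
    fn inc_ref(&self);
    /// # Safety
    ///
    /// `obj` must point to a live object whose count is non-zero.
    unsafe fn dec_ref(obj: NonNull<Self>);
}

/// Hypothetical owning handle in the spirit of `ARef<T>`.
pub struct ARef<T: AlwaysRefCounted> {
    ptr: NonNull<T>,
}

impl<T: AlwaysRefCounted> ARef<T> {
    /// Takes a new reference on an object the caller can already see.
    /// In this toy the storage is a local, so handles must be dropped first.
    pub fn from_ref(obj: &T) -> Self {
        obj.inc_ref();
        ARef { ptr: NonNull::from(obj) }
    }
}

impl<T: AlwaysRefCounted> Clone for ARef<T> {
    fn clone(&self) -> Self {
        Self::from_ref(&**self)
    }
}

impl<T: AlwaysRefCounted> Deref for ARef<T> {
    type Target = T;

    fn deref(&self) -> &T {
        // SAFETY: this handle owns a count, so the object is alive.
        unsafe { self.ptr.as_ref() }
    }
}

impl<T: AlwaysRefCounted> Drop for ARef<T> {
    fn drop(&mut self) {
        // SAFETY: this handle owns exactly one count.
        unsafe { T::dec_ref(self.ptr) }
    }
}

/// Hypothetical namespace whose count is merely recorded, never freed.
pub struct ToyPidNs {
    count: Cell<usize>,
}

// SAFETY (toy): the object outlives every handle in `refcount_demo`.
unsafe impl AlwaysRefCounted for ToyPidNs {
    fn inc_ref(&self) {
        self.count.set(self.count.get() + 1); // plays the role of `get_pid_ns()`
    }

    unsafe fn dec_ref(obj: NonNull<Self>) {
        // SAFETY: the caller guarantees `obj` is live.
        let ns = unsafe { obj.as_ref() };
        ns.count.set(ns.count.get() - 1); // plays the role of `put_pid_ns()`
    }
}

/// Records the count after taking one handle, after cloning it, and after
/// both handles have been dropped.
pub fn refcount_demo() -> Vec<usize> {
    let ns = ToyPidNs { count: Cell::new(1) }; // the creator's reference
    let mut counts = Vec::new();
    {
        let a = ARef::from_ref(&ns);
        counts.push(ns.count.get());
        let b = a.clone();
        counts.push(b.count.get());
    } // `b` then `a` are dropped: two `dec_ref` calls
    counts.push(ns.count.get());
    counts
}
```

`refcount_demo()` returns `[2, 3, 1]`. Each handle adds one reference, and dropping both restores the creator's count.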