vfs-6.13.rust.pid_namespace

-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcWKgAKCRCRxhvAZXjc
 osPAAP9bLzOPIF51IgP9mQTBlKKrpCWCMQVss5xRDseyNEfCEQD/fR9TSSnX9Suw
 iad9oBkxkzCjyxWIH46rvbdnc38lRwo=
 =aawA
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.13.rust.pid_namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull pid_namespace rust bindings from Christian Brauner:
 "This contains my Rust bindings for pid namespaces needed for various
  rust drivers. Here's a description of the basic C semantics and how
  they are mapped to Rust.

  The pid namespace of a task doesn't ever change once the task is
  alive. A unshare(CLONE_NEWPID) or setns(fd_pidns/pidfd, CLONE_NEWPID)
  will not have an effect on the calling task's pid namespace. It will
  only effect the pid namespace of children created by the calling task.
  This invariant guarantees that after having acquired a reference to a
  task's pid namespace it will remain unchanged.

  When a task has exited and been reaped release_task() will be called.
  This will set the pid namespace of the task to NULL. So retrieving the
  pid namespace of a task that is dead will return NULL. Note, that
  neither holding the RCU lock nor holding a reference count to the task
  will prevent release_task() from being called.

  In order to retrieve the pid namespace of a task the
  task_active_pid_ns() function can be used. There are two cases to
  consider:

   (1) retrieving the pid namespace of the current task
   (2) retrieving the pid namespace of a non-current task

  From system call context retrieving the pid namespace for case (1) is
  always safe and requires neither RCU locking nor a reference count to
  be held. Retrieving the pid namespace after release_task() for current
  will return NULL but no codepath like that is exposed to Rust.

  Retrieving the pid namespace from system call context for (2) requires
  RCU protection. Accessing a pid namespace outside of RCU protection
  requires a reference count that must've been acquired while holding
  the RCU lock. Note that accessing a non-current task means NULL can be
  returned as the non-current task could have already passed through
  release_task().

  To retrieve (1) the current_pid_ns!() macro should be used. It ensures
  that the returned pid namespace cannot outlive the calling scope. The
  associated current_pid_ns() function should not be called directly as
  it could be abused to created an unbounded lifetime for the pid
  namespace. The current_pid_ns!() macro allows Rust to handle the
  common case of accessing current's pid namespace without RCU
  protection and without having to acquire a reference count.

  For (2) the task_get_pid_ns() method must be used. This will always
  acquire a reference on the pid namespace and will return an Option to
  force the caller to explicitly handle the case where pid namespace is
  None. Something that tends to be forgotten when doing the equivalent
  operation in C.

  Missing RCU primitives make it difficult to perform operations that
  are otherwise safe without holding a reference count as long as RCU
  protection is guaranteed. But it is not important currently. But we do
  want it in the future.

  Note that for (2) the required RCU protection around calling
  task_active_pid_ns() synchronizes against putting the last reference
  of the associated struct pid of task->thread_pid. The struct pid
  stored in that field is used to retrieve the pid namespace of the
  caller. When release_task() is called task->thread_pid will be NULLed
  and put_pid() on said struct pid will be delayed in free_pid() via
  call_rcu() allowing everyone with an RCU protected access to the
  struct pid acquired from task->thread_pid to finish"

* tag 'vfs-6.13.rust.pid_namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  rust: add PidNamespace
This commit is contained in:
Linus Torvalds 2024-11-26 13:18:00 -08:00
commit 9ad8d22f2f
5 changed files with 225 additions and 6 deletions

View File

@ -18,6 +18,7 @@
#include "kunit.c"
#include "mutex.c"
#include "page.c"
#include "pid_namespace.c"
#include "rbtree.c"
#include "refcount.c"
#include "security.c"

View File

@ -0,0 +1,26 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/pid_namespace.h>
#include <linux/cleanup.h>
struct pid_namespace *rust_helper_get_pid_ns(struct pid_namespace *ns)
{
return get_pid_ns(ns);
}
void rust_helper_put_pid_ns(struct pid_namespace *ns)
{
put_pid_ns(ns);
}
/* Get a reference on a task's pid namespace. */
struct pid_namespace *rust_helper_task_get_pid_ns(struct task_struct *task)
{
struct pid_namespace *pid_ns;
guard(rcu)();
pid_ns = task_active_pid_ns(task);
if (pid_ns)
get_pid_ns(pid_ns);
return pid_ns;
}

View File

@ -45,6 +45,7 @@
#[cfg(CONFIG_NET)]
pub mod net;
pub mod page;
pub mod pid_namespace;
pub mod prelude;
pub mod print;
pub mod rbtree;

View File

@ -0,0 +1,68 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2024 Christian Brauner <brauner@kernel.org>
//! Pid namespaces.
//!
//! C header: [`include/linux/pid_namespace.h`](srctree/include/linux/pid_namespace.h) and
//! [`include/linux/pid.h`](srctree/include/linux/pid.h)
use crate::{
bindings,
types::{AlwaysRefCounted, Opaque},
};
use core::ptr;
/// Wraps the kernel's `struct pid_namespace`. Thread safe.
///
/// This structure represents the Rust abstraction for a C `struct pid_namespace`. This
/// implementation abstracts the usage of an already existing C `struct pid_namespace` within Rust
/// code that we get passed from the C side.
#[repr(transparent)]
pub struct PidNamespace {
inner: Opaque<bindings::pid_namespace>,
}
impl PidNamespace {
/// Returns a raw pointer to the inner C struct.
#[inline]
pub fn as_ptr(&self) -> *mut bindings::pid_namespace {
self.inner.get()
}
/// Creates a reference to a [`PidNamespace`] from a valid pointer.
///
/// # Safety
///
/// The caller must ensure that `ptr` is valid and remains valid for the lifetime of the
/// returned [`PidNamespace`] reference.
pub unsafe fn from_ptr<'a>(ptr: *const bindings::pid_namespace) -> &'a Self {
// SAFETY: The safety requirements guarantee the validity of the dereference, while the
// `PidNamespace` type being transparent makes the cast ok.
unsafe { &*ptr.cast() }
}
}
// SAFETY: Instances of `PidNamespace` are always reference-counted.
unsafe impl AlwaysRefCounted for PidNamespace {
#[inline]
fn inc_ref(&self) {
// SAFETY: The existence of a shared reference means that the refcount is nonzero.
unsafe { bindings::get_pid_ns(self.as_ptr()) };
}
#[inline]
unsafe fn dec_ref(obj: ptr::NonNull<PidNamespace>) {
// SAFETY: The safety requirements guarantee that the refcount is non-zero.
unsafe { bindings::put_pid_ns(obj.cast().as_ptr()) }
}
}
// SAFETY:
// - `PidNamespace::dec_ref` can be called from any thread.
// - It is okay to send ownership of `PidNamespace` across thread boundaries.
unsafe impl Send for PidNamespace {}
// SAFETY: It's OK to access `PidNamespace` through shared references from other threads because
// we're either accessing properties that don't change or that are properly synchronised by C code.
unsafe impl Sync for PidNamespace {}

View File

@ -6,7 +6,8 @@
use crate::{
bindings,
types::{NotThreadSafe, Opaque},
pid_namespace::PidNamespace,
types::{ARef, NotThreadSafe, Opaque},
};
use core::{
cmp::{Eq, PartialEq},
@ -36,6 +37,16 @@ macro_rules! current {
};
}
/// Returns the currently running task's pid namespace.
#[macro_export]
macro_rules! current_pid_ns {
() => {
// SAFETY: Deref + addr-of below create a temporary `PidNamespaceRef` that cannot outlive
// the caller.
unsafe { &*$crate::task::Task::current_pid_ns() }
};
}
/// Wraps the kernel's `struct task_struct`.
///
/// # Invariants
@ -145,6 +156,97 @@ fn deref(&self) -> &Self::Target {
}
}
/// Returns a PidNamespace reference for the currently executing task's/thread's pid namespace.
///
/// This function can be used to create an unbounded lifetime by e.g., storing the returned
/// PidNamespace in a global variable which would be a bug. So the recommended way to get the
/// current task's/thread's pid namespace is to use the [`current_pid_ns`] macro because it is
/// safe.
///
/// # Safety
///
/// Callers must ensure that the returned object doesn't outlive the current task/thread.
pub unsafe fn current_pid_ns() -> impl Deref<Target = PidNamespace> {
struct PidNamespaceRef<'a> {
task: &'a PidNamespace,
_not_send: NotThreadSafe,
}
impl Deref for PidNamespaceRef<'_> {
type Target = PidNamespace;
fn deref(&self) -> &Self::Target {
self.task
}
}
// The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
//
// The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A
// `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect
// on the calling `Task`'s pid namespace. It will only effect the pid namespace of children
// created by the calling `Task`. This invariant guarantees that after having acquired a
// reference to a `Task`'s pid namespace it will remain unchanged.
//
// When a task has exited and been reaped `release_task()` will be called. This will set
// the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task
// that is dead will return `NULL`. Note, that neither holding the RCU lock nor holding a
// referencing count to
// the `Task` will prevent `release_task()` being called.
//
// In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function
// can be used. There are two cases to consider:
//
// (1) retrieving the `PidNamespace` of the `current` task
// (2) retrieving the `PidNamespace` of a non-`current` task
//
// From system call context retrieving the `PidNamespace` for case (1) is always safe and
// requires neither RCU locking nor a reference count to be held. Retrieving the
// `PidNamespace` after `release_task()` for current will return `NULL` but no codepath
// like that is exposed to Rust.
//
// Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
// Accessing `PidNamespace` outside of RCU protection requires a reference count that
// must've been acquired while holding the RCU lock. Note that accessing a non-`current`
// task means `NULL` can be returned as the non-`current` task could have already passed
// through `release_task()`.
//
// To retrieve (1) the `current_pid_ns!()` macro should be used which ensure that the
// returned `PidNamespace` cannot outlive the calling scope. The associated
// `current_pid_ns()` function should not be called directly as it could be abused to
// created an unbounded lifetime for `PidNamespace`. The `current_pid_ns!()` macro allows
// Rust to handle the common case of accessing `current`'s `PidNamespace` without RCU
// protection and without having to acquire a reference count.
//
// For (2) the `task_get_pid_ns()` method must be used. This will always acquire a
// reference on `PidNamespace` and will return an `Option` to force the caller to
// explicitly handle the case where `PidNamespace` is `None`, something that tends to be
// forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it
// difficult to perform operations that are otherwise safe without holding a reference
// count as long as RCU protection is guaranteed. But it is not important currently. But we
// do want it in the future.
//
// Note for (2) the required RCU protection around calling `task_active_pid_ns()`
// synchronizes against putting the last reference of the associated `struct pid` of
// `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the
// `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be
// `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via
// `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired
// from `task->thread_pid` to finish.
//
// SAFETY: The current task's pid namespace is valid as long as the current task is running.
let pidns = unsafe { bindings::task_active_pid_ns(Task::current_raw()) };
PidNamespaceRef {
// SAFETY: If the current thread is still running, the current task and its associated
// pid namespace are valid. `PidNamespaceRef` is not `Send`, so we know it cannot be
// transferred to another thread (where it could potentially outlive the current
// `Task`). The caller needs to ensure that the PidNamespaceRef doesn't outlive the
// current task/thread.
task: unsafe { PidNamespace::from_ptr(pidns) },
_not_send: NotThreadSafe,
}
}
/// Returns a raw pointer to the task.
#[inline]
pub fn as_ptr(&self) -> *mut bindings::task_struct {
@ -188,11 +290,32 @@ pub fn signal_pending(&self) -> bool {
unsafe { bindings::signal_pending(self.as_ptr()) != 0 }
}
/// Returns the given task's pid in the current pid namespace.
pub fn pid_in_current_ns(&self) -> Pid {
// SAFETY: It's valid to pass a null pointer as the namespace (defaults to current
// namespace). The task pointer is also valid.
unsafe { bindings::task_tgid_nr_ns(self.as_ptr(), ptr::null_mut()) }
/// Returns task's pid namespace with elevated reference count
pub fn get_pid_ns(&self) -> Option<ARef<PidNamespace>> {
// SAFETY: By the type invariant, we know that `self.0` is valid.
let ptr = unsafe { bindings::task_get_pid_ns(self.as_ptr()) };
if ptr.is_null() {
None
} else {
// SAFETY: `ptr` is valid by the safety requirements of this function. And we own a
// reference count via `task_get_pid_ns()`.
// CAST: `Self` is a `repr(transparent)` wrapper around `bindings::pid_namespace`.
Some(unsafe { ARef::from_raw(ptr::NonNull::new_unchecked(ptr.cast::<PidNamespace>())) })
}
}
/// Returns the given task's pid in the provided pid namespace.
#[doc(alias = "task_tgid_nr_ns")]
pub fn tgid_nr_ns(&self, pidns: Option<&PidNamespace>) -> Pid {
let pidns = match pidns {
Some(pidns) => pidns.as_ptr(),
None => core::ptr::null_mut(),
};
// SAFETY: By the type invariant, we know that `self.0` is valid. We received a valid
// PidNamespace that we can use as a pointer or we received an empty PidNamespace and
// thus pass a null pointer. The underlying C function is safe to be used with NULL
// pointers.
unsafe { bindings::task_tgid_nr_ns(self.as_ptr(), pidns) }
}
/// Wakes up the task.