linux-stable/arch
Lorenzo Stoakes 662df3e5c3 mm: madvise: implement lightweight guard page mechanism
Implement a new lightweight guard page feature: regions of userland
virtual memory that, when accessed, cause a fatal signal to be raised.

Currently users must establish PROT_NONE ranges to achieve this.

However this is very costly memory-wise - we need a VMA for each and every
one of these regions AND they become unmergeable with surrounding VMAs.

In addition, repeated mmap() calls require repeated kernel context switches
and contention on the mmap lock to install these ranges, potentially also
having to unmap memory if installed over existing ranges.
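
For comparison, the existing PROT_NONE approach described above might look
like the sketch below. The helper name map_with_prot_none_guard() is
hypothetical and not part of this patch, and the sketch assumes `usable` is a
multiple of the page size.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on strict -std= builds */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): map `usable` bytes of
 * read/write anonymous memory followed by one PROT_NONE page.
 * Assumes `usable` is a multiple of the page size.
 * Returns the base address, or MAP_FAILED on error.
 */
static void *map_with_prot_none_guard(size_t usable)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	char *base = mmap(NULL, usable + page, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (base == MAP_FAILED)
		return MAP_FAILED;

	/* Changing protection on the last page gives it a VMA of its own. */
	if (mprotect(base + usable, page, PROT_NONE) != 0) {
		munmap(base, usable + page);
		return MAP_FAILED;
	}
	return base;
}
```

Each guard established this way costs an extra system call and a separate
VMA, which is the overhead the lightweight mechanism is meant to avoid.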

The lightweight guard approach eliminates the VMA cost altogether - rather
than establishing a PROT_NONE VMA, it operates at the level of page table
entries - establishing PTE markers such that accesses to them cause a
fault followed by a SIGSEGV signal being raised.

This is achieved through the PTE marker mechanism, which we have already
extended to provide PTE_MARKER_GUARD, and which we install via the generic
page walking logic, itself extended for this purpose.

These guard ranges are established with MADV_GUARD_INSTALL.  If the range
in which they are installed contains any existing mappings, these will be
zapped, i.e. the range is freed and the memory unmapped (thus mimicking
the behaviour of MADV_DONTNEED in this respect).
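
A minimal userspace sketch of installing a guard page follows. The helper
name install_tail_guard() is hypothetical, and the fallback value 102 for
MADV_GUARD_INSTALL is assumed to match the asm-generic UAPI header; other
architectures should be checked against their own <asm/mman.h>.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for madvise() on strict -std= builds */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Fallback for libc headers that predate the feature. 102 is assumed to
 * be the asm-generic UAPI value; verify it for your architecture.
 */
#ifndef MADV_GUARD_INSTALL
#define MADV_GUARD_INSTALL 102
#endif

/*
 * Hypothetical helper (not from the patch): mark the page starting at
 * base + usable as a guard page inside an existing anonymous mapping.
 * Assumes base + usable is page-aligned.
 * Returns 0 on success, or -1 with errno set (typically EINVAL on
 * kernels that do not know this advice value).
 */
static int install_tail_guard(void *base, size_t usable)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);

	return madvise((char *)base + usable, page, MADV_GUARD_INSTALL);
}
```

Callers should treat failure as "feature unavailable" and fall back to a
PROT_NONE mapping if they still need a guard.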

Any existing guard entries will be left untouched.  There is therefore no
nesting of guarded pages.

Guarded ranges are NOT cleared by MADV_DONTNEED nor MADV_FREE (in both
instances the memory range may be reused at which point a user would
expect guards to still be in place), but they are cleared via
MADV_GUARD_REMOVE, process teardown or unmapping of memory ranges.

The guard property can be removed from ranges via MADV_GUARD_REMOVE.  Any
non-guard entries in the range over which this is applied are left
untouched, with only guard entries being cleared.
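
The install/remove pairing can be sketched as follows. The function name
guard_round_trip() is hypothetical, the fallback values 102 and 103 are
assumed to match the asm-generic UAPI header, and a failed install is treated
here as "kernel lacks support" rather than as an error.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for madvise() and MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumed asm-generic UAPI values, used only if libc lacks them. */
#ifndef MADV_GUARD_INSTALL
#define MADV_GUARD_INSTALL 102
#endif
#ifndef MADV_GUARD_REMOVE
#define MADV_GUARD_REMOVE 103
#endif

/*
 * Hypothetical demo (not from the patch): guard the second of two pages,
 * then remove guards over the whole range.
 * Returns 1 if the round trip worked, 0 if guard install is unsupported,
 * -1 on any other failure.
 */
static int guard_round_trip(void)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int ret = 0;

	if (p == MAP_FAILED)
		return -1;

	p[0] = 5; /* populate the first page: a non-guard entry */

	if (madvise(p + page, page, MADV_GUARD_INSTALL) != 0)
		goto out; /* ret stays 0: assume no kernel support */

	/* Touching p[page] at this point would raise SIGSEGV. */

	if (madvise(p, 2 * page, MADV_GUARD_REMOVE) != 0) {
		ret = -1;
		goto out;
	}

	/* The populated page keeps its data; the former guard is usable. */
	p[page] = 7;
	ret = (p[0] == 5 && p[page] == 7) ? 1 : -1;
out:
	munmap(p, 2 * page);
	return ret;
}
```

The removal deliberately spans both pages to illustrate that only the guard
entry is cleared while the populated first page is left alone.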

We permit this operation on anonymous memory only, and only on VMAs which
are non-special, non-huge and not mlock()'d (if we permitted mlock()'d
VMAs we'd have to drop locked pages, which would be rather
counterintuitive).

Racing page faults can interrupt an attempt to install guard pages, which
results in a zap followed by another attempt, and this cycle can repeat.
If it happens more often than would be expected in normal operation, we
rescind locks and retry the whole thing, which avoids lock contention in
this scenario.

Link: https://lkml.kernel.org/r/6aafb5821bf209f277dfae0787abb2ef87a37542.1730123433.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Suggested-by: Jann Horn <jannh@google.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-11 00:26:45 -08:00
..
alpha       mm: madvise: implement lightweight guard page mechanism         2024-11-11 00:26:45 -08:00
arc         asm-generic: introduce text-patching.h                          2024-11-07 14:25:15 -08:00
arm         asm-generic: introduce text-patching.h                          2024-11-07 14:25:15 -08:00
arm64       arch: introduce set_direct_map_valid_noflush()                  2024-11-07 14:25:15 -08:00
csky        asm-generic: introduce text-patching.h                          2024-11-07 14:25:15 -08:00
hexagon     asm-generic: introduce text-patching.h                          2024-11-07 14:25:15 -08:00
loongarch   arch: introduce set_direct_map_valid_noflush()                  2024-11-07 14:25:15 -08:00
m68k        asm-generic: introduce text-patching.h                          2024-11-07 14:25:15 -08:00
microblaze  asm-generic: introduce text-patching.h                          2024-11-07 14:25:15 -08:00
mips        mm: madvise: implement lightweight guard page mechanism         2024-11-11 00:26:45 -08:00
nios2       asm-generic: introduce text-patching.h                          2024-11-07 14:25:15 -08:00
openrisc    asm-generic: introduce text-patching.h                          2024-11-07 14:25:15 -08:00
parisc      mm: madvise: implement lightweight guard page mechanism         2024-11-11 00:26:45 -08:00
powerpc     asm-generic: introduce text-patching.h                          2024-11-07 14:25:15 -08:00
riscv       arch: introduce set_direct_map_valid_noflush()                  2024-11-07 14:25:15 -08:00
s390        arch: introduce set_direct_map_valid_noflush()                  2024-11-07 14:25:15 -08:00
sh          asm-generic: introduce text-patching.h                          2024-11-07 14:25:15 -08:00
sparc       asm-generic: introduce text-patching.h                          2024-11-07 14:25:15 -08:00
um          x86/module: prepare module loading for ROX allocations of text  2024-11-07 14:25:16 -08:00
x86         bootmem: stop using page->index                                 2024-11-07 14:38:07 -08:00
xtensa      mm: madvise: implement lightweight guard page mechanism         2024-11-11 00:26:45 -08:00
.gitignore
Kconfig     execmem: add support for cache of large ROX pages               2024-11-07 14:25:16 -08:00