linux-next/mm
Lorenzo Stoakes 662df3e5c3 mm: madvise: implement lightweight guard page mechanism
Implement a new lightweight guard page feature, that is regions of
userland virtual memory that, when accessed, cause a fatal signal to
arise.

Currently users must establish PROT_NONE ranges to achieve this.

However this is very costly memory-wise - we need a VMA for each and every
one of these regions AND they become unmergeable with surrounding VMAs.

In addition repeated mmap() calls require repeated kernel context switches
and contention of the mmap lock to install these ranges, potentially also
having to unmap memory if installed over existing ranges.

The lightweight guard approach eliminates the VMA cost altogether - rather
than establishing a PROT_NONE VMA, it operates at the level of page table
entries - establishing PTE markers such that accesses to them cause a
fault followed by a SIGSGEV signal being raised.

This is achieved through the PTE marker mechanism, which we have already
extended to provide PTE_MARKER_GUARD, which we installed via the generic
page walking logic which we have extended for this purpose.

These guard ranges are established with MADV_GUARD_INSTALL.  If the range
in which they are installed contain any existing mappings, they will be
zapped, i.e.  free the range and unmap memory (thus mimicking the
behaviour of MADV_DONTNEED in this respect).

Any existing guard entries will be left untouched.  There is therefore no
nesting of guarded pages.

Guarded ranges are NOT cleared by MADV_DONTNEED nor MADV_FREE (in both
instances the memory range may be reused at which point a user would
expect guards to still be in place), but they are cleared via
MADV_GUARD_REMOVE, process teardown or unmapping of memory ranges.

The guard property can be removed from ranges via MADV_GUARD_REMOVE.  The
ranges over which this is applied, should they contain non-guard entries,
will be untouched, with only guard entries being cleared.

We permit this operation on anonymous memory only, and only VMAs which are
non-special, non-huge and not mlock()'d (if we permitted this we'd have to
drop locked pages which would be rather counterintuitive).

Racing page faults can cause repeated attempts to install guard pages that
are interrupted, result in a zap, and this process can end up being
repeated.  If this happens more than would be expected in normal
operation, we rescind locks and retry the whole thing, which avoids lock
contention in this scenario.

Link: https://lkml.kernel.org/r/6aafb5821bf209f277dfae0787abb2ef87a37542.1730123433.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Suggested-by: Jann Horn <jannh@google.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Suggested-by: Jann Horn <jannh@google.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-11 00:26:45 -08:00
..
damon Merge branch 'mm-hotfixes-stable' into mm-stable 2024-11-11 00:04:10 -08:00
kasan kasan: delete CONFIG_KASAN_MODULE_TEST 2024-11-11 00:26:44 -08:00
kfence mm: kfence: fix elapsed time for allocated/freed track 2024-09-26 14:01:44 -07:00
kmsan mm, kasan, kmsan: instrument copy_from/to_kernel_nofault 2024-11-06 20:11:14 -08:00
backing-dev.c writeback: support retrieving per group debug writeback stats of bdi 2024-05-05 17:53:51 -07:00
balloon_compaction.c mm: remove MIGRATE_SYNC_NO_COPY mode 2024-07-03 19:30:00 -07:00
bootmem_info.c bootmem: stop using page->index 2024-11-07 14:38:07 -08:00
cma_debug.c mm/cma_debug: show complete cma name in debugfs directories 2022-09-11 20:25:50 -07:00
cma_sysfs.c mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
cma.c mm/cma: fix useless return in void function 2024-11-05 16:56:30 -08:00
cma.h mm/cma: add sysfs file 'release_pages_success' 2024-02-22 10:24:57 -08:00
compaction.c mm:page_alloc: fix the NULL ac->nodemask in __alloc_pages_slowpath() 2024-09-03 21:15:47 -07:00
debug_page_alloc.c mm: page_alloc: consolidate free page accounting 2024-04-25 20:56:04 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries 2024-09-17 01:07:01 -07:00
debug.c mm: support only one page_type per page 2024-09-03 21:15:43 -07:00
dmapool_test.c mm/dmapool: add MODULE_DESCRIPTION() 2024-07-03 19:29:58 -07:00
dmapool.c mm/mempool/dmapool: remove CONFIG_DEBUG_SLAB ifdefs 2023-12-05 11:17:58 +01:00
early_ioremap.c mm/early_ioremap.c: improve the execution efficiency of early_ioremap_setup() 2023-06-09 16:25:56 -07:00
execmem.c alloc_tag: populate memory for module tags as needed 2024-11-07 14:25:16 -08:00
fadvise.c introduce fd_file(), convert all accessors to it. 2024-08-12 22:00:43 -04:00
fail_page_alloc.c fault-inject: improve build for CONFIG_FAULT_INJECTION=n 2024-09-01 20:43:33 -07:00
failslab.c fault-inject: improve build for CONFIG_FAULT_INJECTION=n 2024-09-01 20:43:33 -07:00
filemap.c memcg-v1: remove memcg move locking code 2024-11-06 20:11:19 -08:00
folio-compat.c mm: remove putback_lru_page() 2024-09-09 16:38:59 -07:00
gup_test.c Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes. 2023-06-23 16:58:19 -07:00
gup_test.h mm/gup_test: start/stop/read functionality for PIN LONGTERM test 2022-11-08 17:37:15 -08:00
gup.c gup: convert FOLL_TOUCH case in follow_page_pte() to folio 2024-11-06 20:11:08 -08:00
highmem.c mm/highmem: make nr_free_highpages() return "unsigned long" 2024-07-03 19:30:06 -07:00
hmm.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
huge_memory.c mm: add per-order mTHP swpin counters 2024-11-11 00:26:43 -08:00
hugetlb_cgroup.c mm: memcg: don't call propagate_protected_usage() needlessly 2024-09-01 20:25:50 -07:00
hugetlb_vmemmap.c mm/hugetlb_vmemmap: don't synchronize_rcu() without HVO 2024-09-01 20:25:45 -07:00
hugetlb_vmemmap.h mm: hugetlb_vmemmap: fix reference to nonexistent file 2023-10-25 16:47:14 -07:00
hugetlb.c mm: add PTE_MARKER_GUARD PTE marker 2024-11-11 00:26:44 -08:00
hwpoison-inject.c mm/hwpoison: add MODULE_DESCRIPTION() 2024-07-03 19:29:58 -07:00
init-mm.c mm: Deprecate pasid field 2023-12-12 10:11:32 +01:00
internal.h mm: pagewalk: add the ability to install PTEs 2024-11-11 00:26:44 -08:00
interval_tree.c
io-mapping.c
ioremap.c mm: ioremap: remove unneeded ioremap_allowed and iounmap_allowed 2023-08-18 10:12:36 -07:00
Kconfig resource: remove dependency on SPARSEMEM from GET_FREE_REGION 2024-10-28 21:40:39 -07:00
Kconfig.debug slub: Introduce CONFIG_SLUB_RCU_DEBUG 2024-08-27 14:12:51 +02:00
khugepaged.c mm: khugepaged: collapse_pte_mapped_thp() use pte_offset_map_rw_nolock() 2024-11-05 16:56:27 -08:00
kmemleak.c mm/kmemleak: fix typo in object_no_scan() comment 2024-11-06 20:11:12 -08:00
ksm.c mm: mass constification of folio/page pointers 2024-11-07 14:38:07 -08:00
list_lru.c mm: list_lru: fix UAF for memory cgroup 2024-08-07 18:33:56 -07:00
maccess.c kasan: migrate copy_user_test to kunit 2024-11-11 00:26:44 -08:00
madvise.c mm: madvise: implement lightweight guard page mechanism 2024-11-11 00:26:45 -08:00
Makefile mm: introduce numa_emulation 2024-09-03 21:15:31 -07:00
mapping_dirty_helpers.c mm: fix clean_record_shared_mapping_range kernel-doc 2023-08-24 16:20:30 -07:00
memblock.c memblock: updates for 6.12-rc1 2024-09-25 11:35:19 -07:00
memcontrol-v1.c memcg-v1: remove memcg move locking code 2024-11-06 20:11:19 -08:00
memcontrol-v1.h memcg-v1: remove charge move code 2024-11-06 20:11:18 -08:00
memcontrol.c Merge branch 'mm-hotfixes-stable' into mm-stable 2024-11-11 00:04:10 -08:00
memfd.c mm/hugetlb: simplify refs in memfd_alloc_folio 2024-09-26 14:01:44 -07:00
memory_hotplug.c kaslr: rename physmem_end and PHYSMEM_END to direct_map_physmem_end 2024-11-06 20:11:11 -08:00
memory-failure.c mm: mass constification of folio/page pointers 2024-11-07 14:38:07 -08:00
memory-tiers.c memory tiers: use default_dram_perf_ref_source in log message 2024-09-26 14:01:44 -07:00
memory.c mm: add PTE_MARKER_GUARD PTE marker 2024-11-11 00:26:44 -08:00
mempolicy.c mm: renovate page_address_in_vma() 2024-11-07 14:38:07 -08:00
mempool.c mm: fix xyz_noprof functions calling profiled functions 2024-06-05 19:19:26 -07:00
memremap.c mm: convert put_devmap_managed_page_refs() to put_devmap_managed_folio_refs() 2024-05-05 17:53:49 -07:00
memtest.c memtest: use {READ,WRITE}_ONCE in memory scanning 2024-03-13 12:12:21 -07:00
migrate_device.c mm: remap unused subpages to shared zeropage when splitting isolated thp 2024-09-09 16:39:03 -07:00
migrate.c mm: remove redundant condition for THP folio 2024-11-06 20:11:16 -08:00
mincore.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
mlock.c mm/mlock: set the correct prev on failure 2024-11-07 14:14:58 -08:00
mm_init.c alloc_tag: support for page allocation tag compression 2024-11-07 14:25:16 -08:00
mm_slot.h mm: introduce common struct mm_slot 2022-10-03 14:02:43 -07:00
mmap_lock.c mm: mmap_lock: replace get_memcg_path_buf() with on-stack buffer 2024-07-03 19:30:26 -07:00
mmap.c mm: isolate mmap internal logic to mm/vma.c 2024-11-06 20:11:19 -08:00
mmu_gather.c mm/mmu_gather: improve cond_resched() handling with large folios and expensive page freeing 2024-02-22 15:27:17 -08:00
mmu_notifier.c mm: move internal core VMA manipulation functions to own file 2024-09-01 20:25:54 -07:00
mmzone.c mm: improve code consistency with zonelist_* helper functions 2024-09-01 20:25:55 -07:00
mprotect.c mm: add PTE_MARKER_GUARD PTE marker 2024-11-11 00:26:44 -08:00
mremap.c mm/mremap: remove goto from mremap_to() 2024-11-06 20:11:16 -08:00
mseal.c mm: madvise: implement lightweight guard page mechanism 2024-11-11 00:26:45 -08:00
msync.c mm/msync: use vma_find() instead of vma linked list 2022-09-26 19:46:25 -07:00
nommu.c mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling 2024-11-05 16:49:55 -08:00
numa_emulation.c mm: introduce numa_emulation 2024-09-03 21:15:31 -07:00
numa_memblks.c mm: numa_clear_kernel_node_hotplug: Add NUMA_NO_NODE check for node id 2024-10-28 21:40:40 -07:00
numa.c mm: make range-to-target_node lookup facility a part of numa_memblks 2024-09-03 21:15:32 -07:00
oom_kill.c mm: move mm flags to mm_types.h 2024-11-05 16:56:26 -08:00
page_alloc.c Merge branch 'mm-hotfixes-stable' into mm-stable 2024-11-11 00:04:10 -08:00
page_counter.c mm, memcg: cg2 memory{.swap,}.peak write handlers 2024-09-01 20:25:53 -07:00
page_ext.c mm: don't account memmap per-node 2024-08-15 22:16:14 -07:00
page_idle.c mm: page_idle: convert page idle to use a folio 2023-01-18 17:12:52 -08:00
page_io.c mm: add per-order mTHP swpin counters 2024-11-11 00:26:43 -08:00
page_isolation.c mm: remove migration for HugePage in isolate_single_pageblock() 2024-09-03 21:15:40 -07:00
page_owner.c mm/page-owner: use gfp_nested_mask() instead of open coded masking 2024-05-19 14:40:44 -07:00
page_poison.c mm/page_poison: replace kmap_atomic() with kmap_local_page() 2023-12-10 16:51:50 -08:00
page_reporting.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
page_reporting.h
page_table_check.c mm/page_table_check: fix crash on ZONE_DEVICE 2024-06-15 10:43:04 -07:00
page_vma_mapped.c mm: mass constification of folio/page pointers 2024-11-07 14:38:07 -08:00
page-writeback.c memcg-v1: no need for memcg locking for writeback tracking 2024-11-06 20:11:19 -08:00
pagewalk.c mm: pagewalk: add the ability to install PTEs 2024-11-11 00:26:44 -08:00
percpu-internal.h mm: remove CONFIG_MEMCG_KMEM 2024-07-10 12:14:54 -07:00
percpu-km.c
percpu-stats.c mm: use vmalloc_array and vcalloc for array allocations 2022-03-08 09:30:46 -05:00
percpu-vm.c percpu: clean up all mappings when pcpu_map_pages() fails 2024-04-25 20:55:49 -07:00
percpu.c mm: use page->private instead of page->index in percpu 2024-11-07 14:38:07 -08:00
pgalloc-track.h
pgtable-generic.c mm: pgtable: remove pte_offset_map_nolock() 2024-11-05 16:56:29 -08:00
process_vm_access.c mm: refactor mm_access() to not return NULL 2024-11-05 16:56:23 -08:00
ptdump.c mm: ptdump: add check_wx_pages debugfs attribute 2024-02-22 10:24:47 -08:00
readahead.c mm: don't set readahead flag on a folio when lookahead_size > nr_to_read 2024-11-06 20:11:15 -08:00
rmap.c mm: mass constification of folio/page pointers 2024-11-07 14:38:07 -08:00
rodata_test.c mm/rodata_test: use PAGE_ALIGNED() helper 2022-10-03 14:03:05 -07:00
secretmem.c secretmem: disable memfd_secret() if arch cannot set direct map 2024-10-09 12:47:19 -07:00
shmem_quota.c shmem_quota: build the object file conditionally to the config option 2024-09-01 20:25:45 -07:00
shmem.c mm: shmem: fallback to page size splice if large folio has poisoned pages 2024-11-06 20:11:18 -08:00
show_mem.c mm/show_mem: use str_yes_no() helper in show_free_areas() 2024-11-07 14:38:08 -08:00
shrinker_debug.c mm: shrinker: use min() to improve shrinker_debugfs_scan_write() 2024-09-03 21:15:40 -07:00
shrinker.c mm: shrinker: avoid memleak in alloc_shrinker_info 2024-10-31 20:27:04 -07:00
shuffle.c mm/shuffle: convert module_param_call to module_param_cb 2022-10-03 14:03:07 -07:00
shuffle.h mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
slab_common.c mm: krealloc: Fix MTE false alarm in __do_krealloc 2024-10-29 10:40:53 +01:00
slab.h mm, slab: suppress warnings in test_leak_destroy kunit test 2024-10-02 16:28:46 +02:00
slub.c mm, slab: suppress warnings in test_leak_destroy kunit test 2024-10-02 16:28:46 +02:00
sparse-vmemmap.c LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space 2024-10-21 22:11:19 +08:00
sparse.c bootmem: stop using page->index 2024-11-07 14:38:07 -08:00
swap_cgroup.c mm: attempt to batch free swap entries for zap_pte_range() 2024-09-03 21:15:33 -07:00
swap_slots.c mm: swap: update get_swap_pages() to take folio order 2024-04-25 20:56:37 -07:00
swap_state.c mm: swap: use str_true_false() helper function 2024-11-06 20:11:14 -08:00
swap.c mm/thp: fix deferred split unqueue naming and locking 2024-11-05 16:49:54 -08:00
swap.h mm: fix swap_read_folio_zeromap() for large folios with partial zeromap 2024-09-17 01:07:01 -07:00
swapfile.c mm, swap: avoid over reclaim of full clusters 2024-10-30 20:14:11 -07:00
truncate.c mm/truncate: reset xa_has_values flag on each iteration 2024-11-06 20:11:09 -08:00
usercopy.c mm: Fix copy_from_user_nofault(). 2023-04-12 17:36:23 -07:00
userfaultfd.c mm: remove unused hugepage for vma_alloc_folio() 2024-11-06 20:11:12 -08:00
util.c mm: renovate page_address_in_vma() 2024-11-07 14:38:07 -08:00
vma_internal.h mm: isolate mmap internal logic to mm/vma.c 2024-11-06 20:11:19 -08:00
vma.c mm/vma: the pgoff is correct if can_merge_right 2024-11-06 20:11:20 -08:00
vma.h mm: isolate mmap internal logic to mm/vma.c 2024-11-06 20:11:19 -08:00
vmalloc.c alloc_tag: populate memory for module tags as needed 2024-11-07 14:25:16 -08:00
vmpressure.c eventfd: simplify eventfd_signal() 2023-11-28 14:08:38 +01:00
vmscan.c mm/vmscan: wake up flushers conditionally to avoid cgroup OOM 2024-11-07 14:38:07 -08:00
vmstat.c Merge branch 'mm-hotfixes-stable' into mm-stable 2024-11-11 00:04:10 -08:00
workingset.c memcg: workingset: remove folio_memcg_rcu usage 2024-11-06 20:11:20 -08:00
z3fold.c mm/z3fold: add __percpu annotation to *unbuddied pointer in struct z3fold_pool 2024-09-01 20:25:56 -07:00
zbud.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zpool.c mm: zpool: return pool size in pages 2024-04-25 20:55:48 -07:00
zsmalloc.c mm/zsmalloc: use memcpy_from/to_page whereever possible 2024-11-07 14:38:07 -08:00
zswap.c mm: zswap: zswap_store_page() will initialize entry after adding to xarray. 2024-11-11 00:26:43 -08:00