linux-stable/include
Jason A. Donenfeld 9651fcedf7 mm: add MAP_DROPPABLE for designating always lazily freeable mappings
The vDSO getrandom() implementation works with a buffer allocated with a
new system call that has certain requirements:

- It shouldn't be written to core dumps.
  * Easy: VM_DONTDUMP.
- It should be zeroed on fork.
  * Easy: VM_WIPEONFORK.

- It shouldn't be written to swap.
  * Uh-oh: mlock is rlimited.
  * Uh-oh: mlock isn't inherited by forks.

- It shouldn't reserve actual memory, but it also shouldn't crash when
  page faulting in memory if none is available.
  * Uh-oh: VM_NORESERVE means segfaults.

It turns out that the vDSO getrandom() function has three really nice
characteristics that we can exploit to solve this problem:

1) Due to being wiped during fork(), the vDSO code is already robust to
   having the contents of the pages it reads zeroed out midway through
   the function's execution.

2) In the absolute worst case of whatever contingency we're coding for,
   we have the option to fall back to the getrandom() syscall, and
   everything is fine.

3) The buffers the function uses are only ever useful for a maximum of
   60 seconds -- a sort of cache, rather than a long term allocation.
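
As a rough userspace illustration of how characteristics 1) and 2) combine
(this is not the vDSO code; struct rand_cache, cache_get() and
fill_from_syscall() are hypothetical names made up for this sketch), a reader
can copy bytes out first and only afterwards check a marker that reads as zero
once the page has been wiped, falling back to the getrandom() syscall on a miss:

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>
#include <sys/random.h>
#include <sys/types.h>

#define CACHE_BYTES 64

/* Hypothetical layout; small enough to sit inside a single page. */
struct rand_cache {
	unsigned char bytes[CACHE_BYTES];
	size_t pos;      /* offset of the next unused byte */
	unsigned ready;  /* nonzero once filled; reads as 0 after a wipe */
};

/* Fill buf from the getrandom() syscall, retrying on EINTR and short reads. */
static int fill_from_syscall(void *buf, size_t len)
{
	unsigned char *p = buf;

	while (len > 0) {
		ssize_t n = getrandom(p, len, 0);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Returns 0 on success, -1 if even the syscall fallback failed. */
static int cache_get(struct rand_cache *c, void *out, size_t len)
{
	size_t pos = c->pos;

	if (len <= CACHE_BYTES && pos <= CACHE_BYTES - len) {
		/* Copy first, validate afterwards: if the page was zeroed
		 * in the meantime, ready reads as 0 and the copy is thrown
		 * away. The fence is a compiler barrier only. */
		memcpy(out, c->bytes + pos, len);
		atomic_signal_fence(memory_order_seq_cst);
		if (c->ready) {
			c->pos = pos + len;
			return 0;
		}
	}

	/* Slow path: serve the caller from the syscall, then try to refill. */
	if (fill_from_syscall(out, len) < 0)
		return -1;
	if (fill_from_syscall(c->bytes, CACHE_BYTES) == 0) {
		c->pos = 0;
		c->ready = 1;
	}
	return 0;
}
```

The ordering is the point of the sketch: because validity is checked after the
copy, a wipe at any moment leads to a fallback rather than to zeroed bytes
being returned as random data.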

These characteristics mean that we can introduce VM_DROPPABLE, which
has the following semantics:

a) It is never written out to swap.
b) Under memory pressure, mm can just drop the pages (so that they're
   zero when read back again).
c) It is inherited by fork.
d) It doesn't count against the mlock budget, since nothing is locked.
e) If there's not enough memory to service a page fault, it's not fatal,
   and no signal is sent.

This way, allocations used by vDSO getrandom() can use:

    VM_DROPPABLE | VM_DONTDUMP | VM_WIPEONFORK | VM_NORESERVE

And there will be no problem with OOMing, crashing on overcommitment,
using memory when not in use, not wiping on fork(), coredumps, or
writing out to swap.

In order to let vDSO getrandom() use this, expose these via mmap(2) as
MAP_DROPPABLE.
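
A minimal sketch of what a userspace request for such a mapping might look
like. It assumes that MAP_DROPPABLE takes the place of MAP_PRIVATE in the
mmap(2) flags and that its value is 0x08 when the installed headers do not
define it; both should be checked against the uapi headers of the kernel in
use. map_droppable() is a hypothetical helper name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_DROPPABLE
/* Assumption: value taken from the kernel's uapi mman.h; verify locally. */
#define MAP_DROPPABLE 0x08
#endif

/*
 * Ask for an anonymous droppable mapping of len bytes.
 * Returns NULL if the kernel rejects the request (for example, because
 * it predates MAP_DROPPABLE), so the caller can pick another strategy.
 */
static void *map_droppable(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_ANONYMOUS | MAP_DROPPABLE, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}
```

Since the kernel may zero these pages at any time under memory pressure, a
caller should treat whatever it stores here as a cache that can vanish, and
should be prepared for map_droppable() to return NULL on kernels that do not
know the flag.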

Note that this involves removing the MADV_FREE special case from
sort_folio(), which according to Yu Zhao is unnecessary and will simply
result in an extra call to shrink_folio_list() in the worst case. The
chunk removed reenables the swapbacked flag, which we don't want for
VM_DROPPABLE, and we can't conditionalize it here because there isn't a
vma reference available.

Finally, the provided self test ensures that this is working as desired.

Cc: linux-mm@kvack.org
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-07-19 20:22:12 +02:00
acpi ACPI: EC: Evaluate orphan _REG under EC device 2024-06-13 11:28:54 +02:00
asm-generic syscalls: mmap(): use unsigned offset type consistently 2024-06-25 15:57:38 +02:00
clocksource
crypto This push fixes a bug in the new ecc P521 code as well as a buggy 2024-05-20 08:47:54 -07:00
drm Short summary of fixes pull: 2024-05-27 13:47:14 +10:00
dt-bindings dt-bindings: net: dp8386x: Add MIT license along with GPL-2.0 2024-06-07 12:16:22 +01:00
keys Hi, 2024-05-13 10:40:15 -07:00
kunit kunit: Print last test location on fault 2024-05-06 14:22:02 -06:00
kvm Merge branch kvm-arm64/misc-6.10 into kvmarm-master/next 2024-05-08 16:41:50 +01:00
linux mm: add MAP_DROPPABLE for designating always lazily freeable mappings 2024-07-19 20:22:12 +02:00
math-emu
media
memory
misc
net bpf: Fix too early release of tcx_entry 2024-07-08 14:07:31 -07:00
pcmcia
ras tracing/treewide: Remove second parameter of __assign_str() 2024-05-22 20:14:47 -04:00
rdma The usual shower of singleton fixes and minor series all over MM, 2024-05-19 09:21:03 -07:00
rv
scsi scsi: core: Introduce the BLIST_SKIP_IO_HINTS flag 2024-06-13 21:03:13 -04:00
soc I'm actually surprised this time. There aren't any new Qualcomm SoC clk 2024-05-18 12:48:37 -07:00
sound ALSA: dmaengine: Synchronize dma channel after drop() 2024-06-11 17:13:31 +01:00
target
trace mm: add MAP_DROPPABLE for designating always lazily freeable mappings 2024-07-19 20:22:12 +02:00
uapi mm: add MAP_DROPPABLE for designating always lazily freeable mappings 2024-07-19 20:22:12 +02:00
ufs
vdso
video
xen