mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-01-04 04:04:19 +00:00
242d12c981
Currently, we have mTHP features, but unfortunately, without support for
large folio swap-ins, once these large folios are swapped out, they are
lost because mTHP swap is a one-way process. The lack of mTHP swap-in
functionality prevents mTHP from being used on devices like Android that
heavily rely on swap.
This patch introduces mTHP swap-in support. It starts from sync devices
such as zRAM. This is probably the simplest and most common use case,
benefiting billions of Android phones and similar devices with minimal
implementation cost. In this straightforward scenario, large folios are
always exclusive, eliminating the need to handle complex rmap and
swapcache issues.
It offers several benefits:
1. Enables bidirectional mTHP swapping, allowing retrieval of mTHP after
swap-out and swap-in. Large folios in the buddy system are also
preserved as much as possible, rather than being fragmented due
to swap-in.
2. Eliminates fragmentation in swap slots and supports successful
THP_SWPOUT.
w/o this patch (Refer to the data from Chris's and Kairui's latest
swap allocator optimization while running ./thp_swap_allocator_test
w/o "-a" option [1]):
./thp_swap_allocator_test
Iteration 1: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 2: swpout inc: 131, swpout fallback inc: 101, Fallback percentage: 43.53%
Iteration 3: swpout inc: 71, swpout fallback inc: 155, Fallback percentage: 68.58%
Iteration 4: swpout inc: 55, swpout fallback inc: 168, Fallback percentage: 75.34%
Iteration 5: swpout inc: 35, swpout fallback inc: 191, Fallback percentage: 84.51%
Iteration 6: swpout inc: 25, swpout fallback inc: 199, Fallback percentage: 88.84%
Iteration 7: swpout inc: 23, swpout fallback inc: 205, Fallback percentage: 89.91%
Iteration 8: swpout inc: 9, swpout fallback inc: 219, Fallback percentage: 96.05%
Iteration 9: swpout inc: 13, swpout fallback inc: 213, Fallback percentage: 94.25%
Iteration 10: swpout inc: 12, swpout fallback inc: 216, Fallback percentage: 94.74%
Iteration 11: swpout inc: 16, swpout fallback inc: 213, Fallback percentage: 93.01%
Iteration 12: swpout inc: 10, swpout fallback inc: 210, Fallback percentage: 95.45%
Iteration 13: swpout inc: 16, swpout fallback inc: 212, Fallback percentage: 92.98%
Iteration 14: swpout inc: 12, swpout fallback inc: 212, Fallback percentage: 94.64%
Iteration 15: swpout inc: 15, swpout fallback inc: 211, Fallback percentage: 93.36%
Iteration 16: swpout inc: 15, swpout fallback inc: 200, Fallback percentage: 93.02%
Iteration 17: swpout inc: 9, swpout fallback inc: 220, Fallback percentage: 96.07%
w/ this patch (always 0%):
Iteration 1: swpout inc: 948, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 2: swpout inc: 953, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 3: swpout inc: 950, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 4: swpout inc: 952, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 5: swpout inc: 950, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 6: swpout inc: 950, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 7: swpout inc: 947, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 8: swpout inc: 950, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 9: swpout inc: 950, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 10: swpout inc: 945, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 11: swpout inc: 947, swpout fallback inc: 0, Fallback percentage: 0.00%
...
3. With both mTHP swap-out and swap-in supported, we offer the option to enable
zsmalloc compression/decompression with larger granularity[2]. The upcoming
optimization in zsmalloc will significantly increase swap speed and improve
compression efficiency. Tested by running 100 iterations of swapping 100MiB
of anon memory, the swap speed improved dramatically:
time consumption of swapin(ms) time consumption of swapout(ms)
lz4 4k 45274 90540
lz4 64k 22942 55667
zstdn 4k 85035 186585
zstdn 64k 46558 118533
The compression ratio also improved, as evaluated with 1 GiB of data:
granularity orig_data_size compr_data_size
4KiB-zstd 1048576000 246876055
64KiB-zstd 1048576000 199763892
Without mTHP swap-in, the potential optimizations in zsmalloc cannot be
realized.
4. Even mTHP swap-in itself can reduce swap-in page faults by a factor
of nr_pages. Swapping in content filled with the same data 0x11, w/o
and w/ the patch for five rounds (Since the content is the same,
decompression will be very fast. This primarily assesses the impact of
reduced page faults):
swp in bandwidth(bytes/ms) w/o w/
round1 624152 1127501
round2 631672 1127501
round3 620459 1139756
round4 606113 1139756
round5 624152
|
||
---|---|---|
.. | ||
damon | ||
kasan | ||
kfence | ||
kmsan | ||
backing-dev.c | ||
balloon_compaction.c | ||
bootmem_info.c | ||
cma_debug.c | ||
cma_sysfs.c | ||
cma.c | ||
cma.h | ||
compaction.c | ||
debug_page_alloc.c | ||
debug_page_ref.c | ||
debug_vm_pgtable.c | ||
debug.c | ||
dmapool_test.c | ||
dmapool.c | ||
early_ioremap.c | ||
execmem.c | ||
fadvise.c | ||
fail_page_alloc.c | ||
failslab.c | ||
filemap.c | ||
folio-compat.c | ||
gup_test.c | ||
gup_test.h | ||
gup.c | ||
highmem.c | ||
hmm.c | ||
huge_memory.c | ||
hugetlb_cgroup.c | ||
hugetlb_vmemmap.c | ||
hugetlb_vmemmap.h | ||
hugetlb.c | ||
hwpoison-inject.c | ||
init-mm.c | ||
internal.h | ||
interval_tree.c | ||
io-mapping.c | ||
ioremap.c | ||
Kconfig | ||
Kconfig.debug | ||
khugepaged.c | ||
kmemleak.c | ||
ksm.c | ||
list_lru.c | ||
maccess.c | ||
madvise.c | ||
Makefile | ||
mapping_dirty_helpers.c | ||
memblock.c | ||
memcontrol-v1.c | ||
memcontrol-v1.h | ||
memcontrol.c | ||
memfd.c | ||
memory_hotplug.c | ||
memory-failure.c | ||
memory-tiers.c | ||
memory.c | ||
mempolicy.c | ||
mempool.c | ||
memremap.c | ||
memtest.c | ||
migrate_device.c | ||
migrate.c | ||
mincore.c | ||
mlock.c | ||
mm_init.c | ||
mm_slot.h | ||
mmap_lock.c | ||
mmap.c | ||
mmu_gather.c | ||
mmu_notifier.c | ||
mmzone.c | ||
mprotect.c | ||
mremap.c | ||
mseal.c | ||
msync.c | ||
nommu.c | ||
numa_emulation.c | ||
numa_memblks.c | ||
numa.c | ||
oom_kill.c | ||
page_alloc.c | ||
page_counter.c | ||
page_ext.c | ||
page_idle.c | ||
page_io.c | ||
page_isolation.c | ||
page_owner.c | ||
page_poison.c | ||
page_reporting.c | ||
page_reporting.h | ||
page_table_check.c | ||
page_vma_mapped.c | ||
page-writeback.c | ||
pagewalk.c | ||
percpu-internal.h | ||
percpu-km.c | ||
percpu-stats.c | ||
percpu-vm.c | ||
percpu.c | ||
pgalloc-track.h | ||
pgtable-generic.c | ||
process_vm_access.c | ||
ptdump.c | ||
readahead.c | ||
rmap.c | ||
rodata_test.c | ||
secretmem.c | ||
shmem_quota.c | ||
shmem.c | ||
show_mem.c | ||
shrinker_debug.c | ||
shrinker.c | ||
shuffle.c | ||
shuffle.h | ||
slab_common.c | ||
slab.h | ||
slub.c | ||
sparse-vmemmap.c | ||
sparse.c | ||
swap_cgroup.c | ||
swap_slots.c | ||
swap_state.c | ||
swap.c | ||
swap.h | ||
swapfile.c | ||
truncate.c | ||
usercopy.c | ||
userfaultfd.c | ||
util.c | ||
vma_internal.h | ||
vma.c | ||
vma.h | ||
vmalloc.c | ||
vmpressure.c | ||
vmscan.c | ||
vmstat.c | ||
workingset.c | ||
z3fold.c | ||
zbud.c | ||
zpool.c | ||
zsmalloc.c | ||
zswap.c |