linux-next/Documentation/mm
Qi Zheng 1f07a84d2e mm: khugepaged: recheck pmd state in retract_page_tables()
Patch series "synchronously scan and reclaim empty user PTE pages", v4.

Previously, we tried to use a completely asynchronous method to reclaim
empty user PTE pages [1].  After discussing with David Hildenbrand, we
decided to implement synchronous reclamation in the case of
madvise(MADV_DONTNEED) as the first step.

So this series aims to synchronously free the empty PTE pages in
madvise(MADV_DONTNEED) case.  We will detect and free empty PTE pages in
zap_pte_range(), and will add zap_details.reclaim_pt to exclude cases
other than madvise(MADV_DONTNEED).
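
For illustration only, here is a minimal user-space sketch of the case this
series targets: a private anonymous range is faulted in and then dropped with
madvise(MADV_DONTNEED). The helper name touch_then_dontneed() is hypothetical
and not part of the series. Whether the now-empty PTE pages are actually freed
depends on the kernel and its configuration, and the program cannot observe
that directly; it only checks the well-known MADV_DONTNEED semantics for
private anonymous memory.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from the patch series).
 *
 * Map `len` bytes of private anonymous memory, write to every byte so the
 * kernel populates PTEs (and the PTE pages holding them), then zap the
 * range with madvise(MADV_DONTNEED).
 *
 * Returns 0 if every step succeeded and the range reads back as zeros,
 * -1 otherwise.
 */
static int touch_then_dontneed(size_t len)
{
	unsigned char *p;
	int ok = 1;

	assert(len > 0);

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	memset(p, 0xaa, len);	/* fault in the pages, filling PTEs */

	if (madvise(p, len, MADV_DONTNEED) != 0) {
		munmap(p, len);
		return -1;
	}

	/* Private anonymous memory is zero-filled on the next access. */
	for (size_t i = 0; i < len; i++) {
		if (p[i] != 0) {
			ok = 0;
			break;
		}
	}

	munmap(p, len);
	return ok ? 0 : -1;
}
```

Calling touch_then_dontneed(4UL << 20) covers 4 MiB, which on x86_64 with
4 KiB pages spans at least two PTE pages, so the zap leaves whole PTE pages
empty.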

In zap_pte_range(), mmu_gather is used to perform batched TLB flushing and
page freeing operations.  Therefore, if we want to free the empty PTE page
in this path, the most natural way is to add it to mmu_gather as well.
Now, if CONFIG_MMU_GATHER_RCU_TABLE_FREE is selected, mmu_gather will free
page table pages in a semi-RCU manner:

 - batch table freeing: asynchronous free by RCU
 - single table freeing: IPI + synchronous free

But this is not enough to free the empty PTE page table pages in paths
other than the munmap and exit_mmap paths, because an IPI cannot be
synchronized with rcu_read_lock() in pte_offset_map{_lock}().  So single
tables should also be freed by RCU, like batch table freeing.

As a first step, we support this feature on x86_64 and select the newly
introduced CONFIG_ARCH_SUPPORTS_PT_RECLAIM there.

For other cases such as madvise(MADV_FREE), we will consider scanning and
freeing empty PTE pages asynchronously in the future.

Note: issues related to TLB flushing are not new to this series and are tracked
      in the separate RFC patch [3]. For more context, please refer to this
      thread [4].

[1]. https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/
[2]. https://lore.kernel.org/lkml/cover.1727332572.git.zhengqi.arch@bytedance.com/
[3]. https://lore.kernel.org/lkml/20240815120715.14516-1-zhengqi.arch@bytedance.com/
[4]. https://lore.kernel.org/lkml/6f38cb19-9847-4f70-bbe7-06881bb016be@bytedance.com/


This patch (of 12):

In retract_page_tables(), the lock of new_folio is still held, so the page
fault path will be blocked, which prevents the PTE entries from being set
again.  So even though the old empty PTE page may be concurrently freed and
a new PTE page filled into the pmd entry, the new one is still empty and
can be removed.

So just refactor retract_page_tables() a little bit and recheck the pmd
state after taking the pmd lock.
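
The "check, lock, recheck" pattern described above can be sketched in user
space as follows. This is only a model under stated assumptions, not the
kernel code: struct slot, slot_is_table() and retract_slot() are hypothetical
names, a pthread mutex stands in for the pmd lock, and an atomic integer
stands in for the pmd entry.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the possible states of a pmd entry. */
enum slot_state { SLOT_NONE, SLOT_TABLE, SLOT_HUGE };

/* Hypothetical model: one entry plus the lock that protects updates. */
struct slot {
	pthread_mutex_t lock;	/* plays the role of the pmd lock */
	_Atomic int state;	/* plays the role of the pmd entry */
};

/* The state check, factored out so it can run before and after locking. */
static bool slot_is_table(struct slot *s)
{
	return atomic_load(&s->state) == SLOT_TABLE;
}

/*
 * Clear the entry only if it still points to a table once the lock is
 * held.  Returns true if this call cleared it.
 */
static bool retract_slot(struct slot *s)
{
	bool cleared = false;

	/* Cheap lockless check: skip entries that are clearly unsuitable. */
	if (!slot_is_table(s))
		return false;

	pthread_mutex_lock(&s->lock);
	/* The entry may have changed before the lock was taken: recheck. */
	if (slot_is_table(s)) {
		atomic_store(&s->state, SLOT_NONE);
		cleared = true;
	}
	pthread_mutex_unlock(&s->lock);

	return cleared;
}
```

The first check only avoids taking the lock needlessly; the decision to clear
the entry relies solely on the second check made under the lock.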

Link: https://lkml.kernel.org/r/cover.1733305182.git.zhengqi.arch@bytedance.com
Link: https://lkml.kernel.org/r/70a51804cd19d44ccaf031825d9fb6eaf92f2bad.1733305182.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Suggested-by: Jann Horn <jannh@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-18 19:50:45 -08:00
..
damon Docs/mm/damon: recommend academic papers to read and/or cite 2024-11-11 17:22:27 -08:00
active_mm.rst lazy tlb: allow lazy tlb mm refcounting to be configurable 2023-03-28 16:20:08 -07:00
allocation-profiling.rst alloc_tag: support for page allocation tag compression 2024-11-07 14:25:16 -08:00
arch_pgtable_helpers.rst mm: drop leftover comment references to pxx_huge() 2024-07-03 19:30:02 -07:00
balance.rst - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
bootmem.rst docs: rename Documentation/vm to Documentation/mm 2022-06-27 12:52:53 -07:00
free_page_reporting.rst docs/mm: remove useless markup 2023-02-02 10:18:05 -07:00
highmem.rst Documentation work keeps chugging along; stuff for 6.6 includes: 2023-08-30 20:05:42 -07:00
hmm.rst docs:mm: fix spelling mistakes in heterogeneous memory management page 2024-09-10 15:14:57 -06:00
hugetlbfs_reserv.rst mm: convert free_huge_page() to free_huge_folio() 2023-08-21 14:28:43 -07:00
hwpoison.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
index.rst Docs/mm/index: move allocation profiling document to unsorted documents chapter 2024-07-03 16:19:15 -06:00
ksm.rst docs/mm: remove useless markup 2023-02-02 10:18:05 -07:00
memory-model.rst docs/mm: remove useless markup 2023-02-02 10:18:05 -07:00
mmu_notifier.rst docs/mm: remove useless markup 2023-02-02 10:18:05 -07:00
multigen_lru.rst mm: multi-gen LRU: improve design doc 2023-03-28 16:20:07 -07:00
numa.rst docs/mm: remove useless markup 2023-02-02 10:18:05 -07:00
oom.rst docs: rename Documentation/vm to Documentation/mm 2022-06-27 12:52:53 -07:00
overcommit-accounting.rst docs: mm: fix vm overcommit documentation for OVERCOMMIT_GUESS 2023-10-10 13:35:55 -06:00
page_allocation.rst docs: rename Documentation/vm to Documentation/mm 2022-06-27 12:52:53 -07:00
page_cache.rst ubifs: Convert ubifs_vm_page_mkwrite() to use a folio 2024-02-25 21:08:00 +01:00
page_frags.rst net: remove gfp_mask from napi_alloc_skb() 2024-03-28 18:30:40 -07:00
page_migration.rst mm: remove isolate_lru_page() 2024-09-09 16:38:59 -07:00
page_owner.rst mm,page_owner: fix refcount imbalance 2024-04-16 15:39:49 -07:00
page_reclaim.rst docs/mm: Physical Memory: remove useless markup 2023-02-02 10:18:04 -07:00
page_table_check.rst mm/page_table_check: support userfault wr-protect entries 2024-05-05 17:53:41 -07:00
page_tables.rst Docs/mm: Fix a mistake for pfn in page_tables.rst 2024-10-14 10:16:16 -06:00
physical_memory.rst docs/mm: Physical Memory: Fix grammar 2023-04-11 16:16:50 -06:00
process_addrs.rst mm: khugepaged: recheck pmd state in retract_page_tables() 2024-12-18 19:50:45 -08:00
remap_file_pages.rst docs/mm: remove useless markup 2023-02-02 10:18:05 -07:00
shmfs.rst docs: rename Documentation/vm to Documentation/mm 2022-06-27 12:52:53 -07:00
slab.rst docs: rename Documentation/vm to Documentation/mm 2022-06-27 12:52:53 -07:00
slub.rst SLUB: Add support for per object memory policies 2024-10-29 10:43:53 +01:00
split_page_table_lock.rst mm: pgtable: remove pte_offset_map_nolock() 2024-11-05 16:56:29 -08:00
swap.rst docs: rename Documentation/vm to Documentation/mm 2022-06-27 12:52:53 -07:00
transhuge.rst mm: remove follow_page() 2024-09-01 20:26:01 -07:00
unevictable-lru.rst mm: remove isolate_lru_page() 2024-09-09 16:38:59 -07:00
vmalloc.rst docs: rename Documentation/vm to Documentation/mm 2022-06-27 12:52:53 -07:00
vmalloced-kernel-stacks.rst docs:mm: fixed spelling and grammar mistakes on vmalloc kernel stack page 2024-09-10 15:31:45 -06:00
vmemmap_dedup.rst remove references to page->flags in documentation 2024-04-25 20:56:15 -07:00
z3fold.rst docs/mm: remove useless markup 2023-02-02 10:18:05 -07:00
zsmalloc.rst mm: add orphaned kernel-doc to the rst files. 2023-08-24 16:20:31 -07:00