mm: memory_hotplug: make hugetlb_optimize_vmemmap compatible with memmap_on_memory

Currently, hugetlb_free_vmemmap is not compatible with
memory_hotplug.memmap_on_memory, and hugetlb_free_vmemmap takes
precedence over memory_hotplug.memmap_on_memory.  However, some users
want memory_hotplug.memmap_on_memory to take precedence over
hugetlb_free_vmemmap, since memmap_on_memory makes memory hotplug more
likely to succeed in close-to-OOM situations.  So always letting
hugetlb_free_vmemmap take precedence is neither wise nor elegant.

The proper approach is to have hugetlb_vmemmap.c check whether the
section that the HugeTLB pages belong to can be optimized.  If the
section's vmemmap pages are allocated from the added memory block itself,
hugetlb_free_vmemmap should refuse to optimize the vmemmap; otherwise, it
should do the optimization.  Then the two kernel parameters are
compatible.  So this patch introduces VmemmapSelfHosted to mark any
non-optimizable vmemmap pages.  hugetlb_vmemmap can use this flag to
detect whether a vmemmap page can be optimized.

[songmuchun@bytedance.com: walk vmemmap page tables to avoid false-positive]
  Link: https://lkml.kernel.org/r/20220620110616.12056-3-songmuchun@bytedance.com
Link: https://lkml.kernel.org/r/20220617135650.74901-3-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Co-developed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song 2022-06-17 21:56:50 +08:00 committed by akpm
parent ed7802dd48
commit 6636109512
6 changed files with 93 additions and 47 deletions
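
The decision described in the commit message can be sketched as a small userspace model. This is only an illustration under stated assumptions: every name prefixed with `model_` or `MODEL_` is hypothetical and does not exist in the kernel, the flag is modelled as a plain bit in a struct, and the count 7 is an arbitrary value chosen for the example. The real code walks the kernel page tables to find the page backing the vmemmap, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these names exist in the kernel. */
#define MODEL_PG_VMEMMAP_SELF_HOSTED (1UL << 0)

struct model_page {
	unsigned long flags;
};

/*
 * Models the loop the patch adds to mhp_init_memmap_on_memory(): every
 * memmap page carved out of the hot-added block is tagged as self-hosted.
 */
static void model_mark_self_hosted(struct model_page *pages, size_t nr_pages)
{
	for (size_t i = 0; i < nr_pages; i++)
		pages[i].flags |= MODEL_PG_VMEMMAP_SELF_HOSTED;
}

static bool model_page_self_hosted(const struct model_page *page)
{
	return (page->flags & MODEL_PG_VMEMMAP_SELF_HOSTED) != 0;
}

/*
 * Models the shape of vmemmap_optimizable_pages(): return 0 when the
 * optimization is switched off or when the page backing the vmemmap is
 * self-hosted; otherwise return the caller-supplied candidate count.
 */
static unsigned int model_optimizable_pages(bool optimize_on,
					    const struct model_page *backing_page,
					    unsigned int nr_candidate)
{
	if (!optimize_on)
		return 0;
	if (model_page_self_hosted(backing_page))
		return 0;
	return nr_candidate;
}

/* Small self-check exercising the three possible outcomes. */
static void model_self_check(void)
{
	struct model_page pages[4] = { { 0 } };

	/* Pretend the first two pages hold the block's own memmap. */
	model_mark_self_hosted(pages, 2);

	assert(model_optimizable_pages(true, &pages[0], 7) == 0);  /* self-hosted */
	assert(model_optimizable_pages(true, &pages[3], 7) == 7);  /* optimizable */
	assert(model_optimizable_pages(false, &pages[3], 7) == 0); /* feature off */
}
```

Calling `model_self_check()` from a test harness exercises all three branches. The point of the per-page flag is that the refusal is local: only vmemmap backed by the hot-added block itself is skipped, while other HugeTLB pages are still optimized even with both parameters enabled.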


@@ -1722,9 +1722,11 @@
 			Built with CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON=y,
 			the default is on.

-			This is not compatible with memory_hotplug.memmap_on_memory.
-			If both parameters are enabled, hugetlb_free_vmemmap takes
-			precedence over memory_hotplug.memmap_on_memory.
+			Note that the vmemmap pages may be allocated from the added
+			memory block itself when memory_hotplug.memmap_on_memory is
+			enabled, those vmemmap pages cannot be optimized even if this
+			feature is enabled. Other vmemmap pages not allocated from
+			the added memory block itself do not be affected.

 	hung_task_panic=
 			[KNL] Should the hung task detector generate panics.
@@ -3068,10 +3070,12 @@
 			[KNL,X86,ARM] Boolean flag to enable this feature.
 			Format: {on | off (default)}
 			When enabled, runtime hotplugged memory will
-			allocate its internal metadata (struct pages)
-			from the hotadded memory which will allow to
-			hotadd a lot of memory without requiring
-			additional memory to do so.
+			allocate its internal metadata (struct pages,
+			those vmemmap pages cannot be optimized even
+			if hugetlb_free_vmemmap is enabled) from the
+			hotadded memory which will allow to hotadd a
+			lot of memory without requiring additional
+			memory to do so.
 			This feature is disabled by default because it
 			has some implication on large (e.g. GB)
 			allocations in some configurations (e.g. small
@@ -3081,10 +3085,6 @@
 			Note that even when enabled, there are a few cases where
 			the feature is not effective.

-			This is not compatible with hugetlb_free_vmemmap. If
-			both parameters are enabled, hugetlb_free_vmemmap takes
-			precedence over memory_hotplug.memmap_on_memory.
-
 	memtest=	[KNL,X86,ARM,M68K,PPC,RISCV] Enable memtest
 			Format: <integer>
 			default : 0 <disable>


@@ -565,9 +565,8 @@ See Documentation/admin-guide/mm/hugetlbpage.rst
 hugetlb_optimize_vmemmap
 ========================

-This knob is not available when memory_hotplug.memmap_on_memory (kernel parameter)
-is configured or the size of 'struct page' (a structure defined in
-include/linux/mm_types.h) is not power of two (an unusual system config could
+This knob is not available when the size of 'struct page' (a structure defined
+in include/linux/mm_types.h) is not power of two (an unusual system config could
 result in this).

 Enable (set to 1) or disable (set to 0) the feature of optimizing vmemmap pages


@@ -351,13 +351,4 @@ void arch_remove_linear_mapping(u64 start, u64 size);
 extern bool mhp_supports_memmap_on_memory(unsigned long size);
 #endif /* CONFIG_MEMORY_HOTPLUG */

-#ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
-bool mhp_memmap_on_memory(void);
-#else
-static inline bool mhp_memmap_on_memory(void)
-{
-	return false;
-}
-#endif
-
 #endif /* __LINUX_MEMORY_HOTPLUG_H */


@@ -193,6 +193,11 @@ enum pageflags {

 	/* Only valid for buddy pages. Used to track pages that are reported */
 	PG_reported = PG_uptodate,
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+	/* For self-hosted memmap pages */
+	PG_vmemmap_self_hosted = PG_owner_priv_1,
+#endif
 };

 #define PAGEFLAGS_MASK		((1UL << NR_PAGEFLAGS) - 1)
@@ -628,6 +633,12 @@ PAGEFLAG_FALSE(SkipKASanPoison, skip_kasan_poison)
  */
 __PAGEFLAG(Reported, reported, PF_NO_COMPOUND)

+#ifdef CONFIG_MEMORY_HOTPLUG
+PAGEFLAG(VmemmapSelfHosted, vmemmap_self_hosted, PF_ANY)
+#else
+PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted)
+#endif
+
 /*
  * On an anonymous page mapped into a user virtual memory area,
  * page->mapping points to its anon_vma, not to a struct address_space;


@@ -10,7 +10,7 @@
  */
 #define pr_fmt(fmt)	"HugeTLB: " fmt

-#include <linux/memory_hotplug.h>
+#include <linux/memory.h>
 #include "hugetlb_vmemmap.h"

 /*
@@ -97,18 +97,68 @@ int hugetlb_vmemmap_alloc(struct hstate *h, struct page *head)
 	return ret;
 }

+static unsigned int vmemmap_optimizable_pages(struct hstate *h,
+					      struct page *head)
+{
+	if (READ_ONCE(vmemmap_optimize_mode) == VMEMMAP_OPTIMIZE_OFF)
+		return 0;
+
+	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) {
+		pmd_t *pmdp, pmd;
+		struct page *vmemmap_page;
+		unsigned long vaddr = (unsigned long)head;
+
+		/*
+		 * Only the vmemmap page's vmemmap page can be self-hosted.
+		 * Walking the page tables to find the backing page of the
+		 * vmemmap page.
+		 */
+		pmdp = pmd_off_k(vaddr);
+		/*
+		 * The READ_ONCE() is used to stabilize *pmdp in a register or
+		 * on the stack so that it will stop changing under the code.
+		 * The only concurrent operation where it can be changed is
+		 * split_vmemmap_huge_pmd() (*pmdp will be stable after this
+		 * operation).
+		 */
+		pmd = READ_ONCE(*pmdp);
+		if (pmd_leaf(pmd))
+			vmemmap_page = pmd_page(pmd) + pte_index(vaddr);
+		else
+			vmemmap_page = pte_page(*pte_offset_kernel(pmdp, vaddr));
+		/*
+		 * Due to HugeTLB alignment requirements and the vmemmap pages
+		 * being at the start of the hotplugged memory region in
+		 * memory_hotplug.memmap_on_memory case. Checking any vmemmap
+		 * page's vmemmap page if it is marked as VmemmapSelfHosted is
+		 * sufficient.
+		 *
+		 * [                  hotplugged memory                  ]
+		 * [        section        ][...][        section        ]
+		 * [ vmemmap ][              usable memory               ]
+		 *   ^   |     |                                         |
+		 *   +---+     |                                         |
+		 *     ^       |                                         |
+		 *     +-------+                                         |
+		 *           ^                                           |
+		 *           +-------------------------------------------+
+		 */
+		if (PageVmemmapSelfHosted(vmemmap_page))
+			return 0;
+	}
+
+	return hugetlb_optimize_vmemmap_pages(h);
+}
+
 void hugetlb_vmemmap_free(struct hstate *h, struct page *head)
 {
 	unsigned long vmemmap_addr = (unsigned long)head;
 	unsigned long vmemmap_end, vmemmap_reuse, vmemmap_pages;

-	vmemmap_pages = hugetlb_optimize_vmemmap_pages(h);
+	vmemmap_pages = vmemmap_optimizable_pages(h, head);
 	if (!vmemmap_pages)
 		return;

-	if (READ_ONCE(vmemmap_optimize_mode) == VMEMMAP_OPTIMIZE_OFF)
-		return;
-
 	static_branch_inc(&hugetlb_optimize_vmemmap_key);

 	vmemmap_addr += RESERVE_VMEMMAP_SIZE;
@@ -199,10 +249,10 @@ static struct ctl_table hugetlb_vmemmap_sysctls[] = {
 static __init int hugetlb_vmemmap_sysctls_init(void)
 {
 	/*
-	 * If "memory_hotplug.memmap_on_memory" is enabled or "struct page"
-	 * crosses page boundaries, the vmemmap pages cannot be optimized.
+	 * If "struct page" crosses page boundaries, the vmemmap pages cannot
+	 * be optimized.
 	 */
-	if (!mhp_memmap_on_memory() && is_power_of_2(sizeof(struct page)))
+	if (is_power_of_2(sizeof(struct page)))
 		register_sysctl_init("vm", hugetlb_vmemmap_sysctls);

 	return 0;


@@ -43,30 +43,22 @@
 #include "shuffle.h"

 #ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
-static int memmap_on_memory_set(const char *val, const struct kernel_param *kp)
-{
-	if (hugetlb_optimize_vmemmap_enabled())
-		return 0;
-	return param_set_bool(val, kp);
-}
-
-static const struct kernel_param_ops memmap_on_memory_ops = {
-	.flags = KERNEL_PARAM_OPS_FL_NOARG,
-	.set = memmap_on_memory_set,
-	.get = param_get_bool,
-};
-
 /*
  * memory_hotplug.memmap_on_memory parameter
  */
 static bool memmap_on_memory __ro_after_init;
-module_param_cb(memmap_on_memory, &memmap_on_memory_ops, &memmap_on_memory, 0444);
+module_param(memmap_on_memory, bool, 0444);
 MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug");

-bool mhp_memmap_on_memory(void)
+static inline bool mhp_memmap_on_memory(void)
 {
 	return memmap_on_memory;
 }
+#else
+static inline bool mhp_memmap_on_memory(void)
+{
+	return false;
+}
 #endif

 enum {
@@ -1035,7 +1027,7 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
 			      struct zone *zone)
 {
 	unsigned long end_pfn = pfn + nr_pages;
-	int ret;
+	int ret, i;

 	ret = kasan_add_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages));
 	if (ret)
@@ -1043,6 +1035,9 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,

 	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE);

+	for (i = 0; i < nr_pages; i++)
+		SetPageVmemmapSelfHosted(pfn_to_page(pfn + i));
+
 	/*
 	 * It might be that the vmemmap_pages fully span sections. If that is
 	 * the case, mark those sections online here as otherwise they will be