Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git (synced 2025-01-01 02:33:57 +00:00)
mm: multi-gen LRU: improve design doc

This patch improves the design doc. Specifically,
  1. add a section for the per-memcg mm_struct list, and
  2. add a section for the PID controller.

Link: https://lkml.kernel.org/r/20230214035445.1250139-2-talumbau@google.com
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
parent 9a52b2f32a
commit 32d32ef140
@@ -103,7 +103,8 @@ moving across tiers only involves atomic operations on
 ``folio->flags`` and therefore has a negligible cost. A feedback loop
 modeled after the PID controller monitors refaults over all the tiers
 from anon and file types and decides which tiers from which types to
-evict or protect.
+evict or protect. The desired effect is to balance refault percentages
+between anon and file types proportional to the swappiness level.
 
 There are two conceptually independent procedures: the aging and the
 eviction. They form a closed-loop system, i.e., the page reclaim.
@@ -156,6 +157,27 @@ This time-based approach has the following advantages:
 and memory sizes.
 2. It is more reliable because it is directly wired to the OOM killer.
 
+``mm_struct`` list
+------------------
+An ``mm_struct`` list is maintained for each memcg, and an
+``mm_struct`` follows its owner task to the new memcg when this task
+is migrated.
+
+A page table walker iterates ``lruvec_memcg()->mm_list`` and calls
+``walk_page_range()`` with each ``mm_struct`` on this list to scan
+PTEs. When multiple page table walkers iterate the same list, each of
+them gets a unique ``mm_struct``, and therefore they can run in
+parallel.
+
+Page table walkers ignore any misplaced pages, e.g., if an
+``mm_struct`` was migrated, pages left in the previous memcg will be
+ignored when the current memcg is under reclaim. Similarly, page table
+walkers will ignore pages from nodes other than the one under reclaim.
+
+This infrastructure also tracks the usage of ``mm_struct`` between
+context switches so that page table walkers can skip processes that
+have been sleeping since the last iteration.
+
 Rmap/PT walk feedback
 ---------------------
 Searching the rmap for PTEs mapping each page on an LRU list (to test
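
The ``mm_struct`` list section added in the hunk above says that concurrent walkers each receive a unique ``mm_struct`` and that sleeping processes can be skipped. A minimal userspace sketch of that hand-out idea follows. It is not the kernel implementation: ``toy_mm``, ``toy_mm_list`` and ``toy_next_mm()`` are hypothetical names, the list is a plain array, and the locking a real shared cursor would need is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an mm_struct on a per-memcg list. */
struct toy_mm {
	int id;
	bool used_since_last_walk;	/* assumed to be set on context switch */
};

/* Hypothetical iteration state shared by all walkers of one list. */
struct toy_mm_list {
	struct toy_mm *mms;
	size_t nr;
	size_t cursor;			/* next entry to hand out */
};

/*
 * Hand out the next entry that was used since the last walk. Each call
 * advances the shared cursor, so no two callers receive the same entry.
 * Real code would have to protect the cursor with a lock.
 */
static struct toy_mm *toy_next_mm(struct toy_mm_list *list)
{
	assert(list && list->cursor <= list->nr);

	while (list->cursor < list->nr) {
		struct toy_mm *mm = &list->mms[list->cursor++];

		if (!mm->used_since_last_walk)
			continue;	/* owner slept since last walk: skip */

		mm->used_since_last_walk = false;
		return mm;
	}

	return NULL;			/* this iteration is finished */
}
```

In this sketch, two walkers calling ``toy_next_mm()`` in turn never see the same entry, which is the property that lets them scan page tables in parallel.
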
@@ -170,7 +192,7 @@ promotes hot pages. If the scan was done cacheline efficiently, it
 adds the PMD entry pointing to the PTE table to the Bloom filter. This
 forms a feedback loop between the eviction and the aging.
 
-Bloom Filters
+Bloom filters
 -------------
 Bloom filters are a space and memory efficient data structure for set
 membership test, i.e., test if an element is not in the set or may be
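
The Bloom filter text in the hunk above describes a membership test that answers either "not in the set" or "may be in the set". The sketch below illustrates that asymmetry in userspace C. The filter size, the multiplicative hash, the second seed and all ``toy_`` names are assumptions made for illustration and are not taken from ``mm/vmscan.c``.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_BLOOM_BITS	1024u		/* illustrative size */
#define TOY_SEED2	0x5bd1e995ull	/* arbitrary second seed */

static_assert((TOY_BLOOM_BITS & (TOY_BLOOM_BITS - 1)) == 0,
	      "size must be a power of two");

struct toy_bloom {
	uint64_t bits[TOY_BLOOM_BITS / 64];
};

/* Cheap illustrative hash: mix the key with a seed, keep the high bits. */
static uint32_t toy_hash(uint64_t key, uint64_t seed)
{
	key ^= seed;
	key *= 0x9E3779B97F4A7C15ull;
	return (uint32_t)(key >> 32) & (TOY_BLOOM_BITS - 1);
}

/* Set the two bits selected by the key. */
static void toy_bloom_add(struct toy_bloom *f, uint64_t key)
{
	uint32_t a = toy_hash(key, 0);
	uint32_t b = toy_hash(key, TOY_SEED2);

	f->bits[a / 64] |= 1ull << (a % 64);
	f->bits[b / 64] |= 1ull << (b % 64);
}

/* false: the key was never added; true: the key may have been added. */
static bool toy_bloom_may_contain(const struct toy_bloom *f, uint64_t key)
{
	uint32_t a = toy_hash(key, 0);
	uint32_t b = toy_hash(key, TOY_SEED2);

	return ((f->bits[a / 64] >> (a % 64)) & 1) &&
	       ((f->bits[b / 64] >> (b % 64)) & 1);
}
```

A ``false`` result is always correct, while a ``true`` result can be a false positive when two other keys happen to set both bits; a larger bit array or more hashes lowers that rate.
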
@@ -186,6 +208,18 @@ is false positive, the cost is an additional scan of a range of PTEs,
 which may yield hot pages anyway. Parameters of the filter itself can
 control the false positive rate in the limit.
 
+PID controller
+--------------
+A feedback loop modeled after the Proportional-Integral-Derivative
+(PID) controller monitors refaults over anon and file types and
+decides which type to evict when both types are available from the
+same generation.
+
+The PID controller uses generations rather than the wall clock as the
+time domain because a CPU can scan pages at different rates under
+varying memory pressure. It calculates a moving average for each new
+generation to avoid being permanently locked in a suboptimal state.
+
 Memcg LRU
 ---------
 An memcg LRU is a per-node LRU of memcgs. It is also an LRU of LRUs,
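
The PID controller section added in the hunk above mentions two mechanisms: a moving average recomputed for each new generation, and a comparison of refaults between the anon and file types. The sketch below shows one way such a loop could look, assuming a halving average and a weighted ratio comparison. The structure, the ``toy_`` names and the weights are hypothetical and are not claimed to match the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-type counters; one instance each for anon and file. */
struct toy_ctrl {
	unsigned long avg_refaulted;	/* moving average over generations */
	unsigned long avg_total;
	unsigned long refaulted;	/* counts for the current generation */
	unsigned long total;
};

/*
 * On each new generation, fold the current counts into the averages by
 * halving, so that old samples decay instead of dominating forever.
 */
static void toy_new_generation(struct toy_ctrl *c)
{
	assert(c);

	c->avg_refaulted = (c->avg_refaulted + c->refaulted) / 2;
	c->avg_total = (c->avg_total + c->total) / 2;
	c->refaulted = 0;
	c->total = 0;
}

/*
 * Return true if type a's weighted refault ratio does not exceed type
 * b's, i.e. avg_refaulted_a / avg_total_a * weight_a <=
 * avg_refaulted_b / avg_total_b * weight_b. Cross multiplication keeps
 * the comparison in integer arithmetic.
 */
static bool toy_prefer_evicting(const struct toy_ctrl *a, unsigned long weight_a,
				const struct toy_ctrl *b, unsigned long weight_b)
{
	assert(a && b);

	return a->avg_refaulted * weight_a * b->avg_total <=
	       b->avg_refaulted * weight_b * a->avg_total;
}
```

Because time advances only when ``toy_new_generation()`` is called, the averages follow generations rather than the wall clock, which mirrors the time-domain choice described in the hunk.
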
@@ -223,9 +257,9 @@ parts:
 
 * Generations
 * Rmap walks
-* Page table walks
-* Bloom filters
-* PID controller
+* Page table walks via ``mm_struct`` list
+* Bloom filters for rmap/PT walk feedback
+* PID controller for refault feedback
 
 The aging and the eviction form a producer-consumer model;
 specifically, the latter drives the former by the sliding window over

@@ -3604,7 +3604,7 @@ static bool iterate_mm_list_nowalk(struct lruvec *lruvec, unsigned long max_seq)
 }
 
 /******************************************************************************
- * refault feedback loop
+ * PID controller
  ******************************************************************************/
 
 /*