linux-next/Documentation
Aneesh Kumar K.V c6018b4b25 mm/mempolicy: add set_mempolicy_home_node syscall
This syscall can be used to set a home node for the MPOL_BIND and
MPOL_PREFERRED_MANY memory policies.  Users should call it after
setting up a memory policy for the specified range, as shown below.

  mbind(p, nr_pages * page_size, MPOL_BIND, new_nodes->maskp,
        new_nodes->size + 1, 0);
  sys_set_mempolicy_home_node((unsigned long)p, nr_pages * page_size,
				home_node, 0);
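
The snippet above calls sys_set_mempolicy_home_node(), which is not a
standard libc function.  A minimal sketch of such a wrapper, assuming
the C library does not already provide one, is shown here.  The
fallback syscall number 450 is an assumption that holds for x86_64 and
the generic syscall table; check the headers for your architecture.

```c
#define _GNU_SOURCE
#include <assert.h>      /* only for callers that assert on the result */
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Assumption: older kernel headers may not define the syscall number.
 * 450 is the value in the generic table and on x86_64; verify it for
 * other architectures before relying on it.
 */
#ifndef __NR_set_mempolicy_home_node
#define __NR_set_mempolicy_home_node 450
#endif

/*
 * Hypothetical wrapper named after the call used in the commit message.
 * Returns 0 on success, or -1 with errno set on failure (for example
 * ENOSYS on kernels without the syscall, EINVAL for bad arguments).
 */
long sys_set_mempolicy_home_node(unsigned long start, unsigned long len,
                                 unsigned long home_node,
                                 unsigned long flags)
{
    return syscall(__NR_set_mempolicy_home_node, start, len,
                   home_node, flags);
}
```

The flags argument must currently be 0.  Callers should check the
return value and errno, since the call fails on kernels that predate
the syscall.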

The syscall allows specifying a home node/preferred node from which the
kernel will fulfill memory allocation requests first.

For an address range with the MPOL_BIND memory policy, if the nodemask
specifies more than one node, page allocations will come from the node
in the nodemask with sufficient free memory that is closest to the home
node/preferred node.

For MPOL_PREFERRED_MANY, if the nodemask specifies more than one node,
page allocations will come from the node in the nodemask with sufficient
free memory that is closest to the home node/preferred node.  If none of
the nodes specified in the nodemask has enough memory, the allocation
will be attempted from the NUMA node in the system that is closest to
the home node.

This helps applications hint at a preferred node for memory allocation
and fall back to _only_ a set of nodes if memory is not available on
the preferred node.  Fallback allocation is attempted from the node
which is nearest to the preferred node.

This gives applications control over which NUMA nodes memory is
allocated from and avoids the default fallback to slow memory NUMA
nodes.  For example, consider a system with NUMA nodes 1, 2 and 3 backed
by DRAM and nodes 10, 11 and 12 backed by slow memory:

 new_nodes = numa_bitmask_alloc(nr_nodes);

 numa_bitmask_setbit(new_nodes, 1);
 numa_bitmask_setbit(new_nodes, 2);
 numa_bitmask_setbit(new_nodes, 3);

 p = mmap(NULL, nr_pages * page_size, protflag, mapflag, -1, 0);
 mbind(p, nr_pages * page_size, MPOL_BIND, new_nodes->maskp,
       new_nodes->size + 1, 0);

 sys_set_mempolicy_home_node((unsigned long)p, nr_pages * page_size, 2, 0);

This will allocate from nodes closer to node 2 and will make sure the
kernel only allocates from nodes 1, 2, and 3.  Memory will not be
allocated from slow memory nodes 10, 11, and 12.  This differs from the
default MPOL_BIND behavior, in which the allocation is attempted from
the node closest to the local node.  One reason to specify a home node
is to allow allocations from a CPU-less NUMA node and its nearby NUMA
nodes.

MPOL_PREFERRED_MANY, on the other hand, will first try to allocate from
the node among nodes 1, 2 and 3 that is closest to node 2.  If those
nodes don't have enough memory, the kernel will allocate from whichever
of the slow memory nodes 10, 11 and 12 is closest to node 2.
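
The difference between the two policies can be illustrated with a small
userspace model.  This is only a sketch of the selection rule described
above, not the kernel's allocator: the struct, the function names and
the distance values are all hypothetical, and real allocation goes
through zonelists and watermarks rather than a simple free-page count.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy description of one NUMA node; every field is hypothetical. */
struct toy_node {
    int id;            /* NUMA node id */
    int dist_to_home;  /* distance from the home node, smaller is closer */
    long free_pages;   /* pages currently free on this node */
    bool in_mask;      /* true if the node is in the policy nodemask */
};

enum toy_policy { TOY_BIND, TOY_PREFERRED_MANY };

/*
 * Return the id of the node closest to the home node that has at least
 * `need` free pages, or -1 if none qualifies.  When mask_only is true,
 * nodes outside the nodemask are skipped.
 */
int toy_closest_node(const struct toy_node *nodes, size_t n,
                     long need, bool mask_only)
{
    int best = -1;
    int best_dist = INT_MAX;

    for (size_t i = 0; i < n; i++) {
        if (mask_only && !nodes[i].in_mask)
            continue;
        if (nodes[i].free_pages < need)
            continue;
        if (nodes[i].dist_to_home < best_dist) {
            best = nodes[i].id;
            best_dist = nodes[i].dist_to_home;
        }
    }
    return best;
}

/*
 * Model of the rule in the text: both policies try the nodemask first;
 * only TOY_PREFERRED_MANY falls back to nodes outside the nodemask.
 */
int toy_pick_node(enum toy_policy policy, const struct toy_node *nodes,
                  size_t n, long need)
{
    int node = toy_closest_node(nodes, n, need, true);

    if (node < 0 && policy == TOY_PREFERRED_MANY)
        node = toy_closest_node(nodes, n, need, false);
    return node;
}
```

With nodes 1, 2 and 3 in the nodemask and node 2 as the home node, both
policies pick node 2 while it has free memory.  Once nodes 1, 2 and 3
are all exhausted, the TOY_BIND case returns -1 (the allocation fails),
while the TOY_PREFERRED_MANY case returns whichever of nodes 10, 11 and
12 has the smallest distance to node 2.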

Link: https://lkml.kernel.org/r/20211202123810.267175-3-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Ben Widawsky <ben.widawsky@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: <linux-api@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:30 +02:00
ABI f2fs-for-5.16-rc1 2021-11-13 11:20:22 -08:00
accounting
admin-guide mm/mempolicy: add set_mempolicy_home_node syscall 2022-01-15 16:30:30 +02:00
arm Documentation: arm: marvell: Fix link to armada_1000_pb.pdf document 2021-11-15 02:49:56 -07:00
arm64 arm64: update PAC description for kernel 2021-12-02 10:13:35 +00:00
block This is a relatively unexciting cycle for documentation. 2021-11-02 22:11:39 -07:00
bpf libbpf: update index.rst reference 2021-11-17 06:12:14 -07:00
cdrom drivers/cdrom: improved ioctl for media change detection 2021-09-14 20:05:26 -06:00
core-api Merge branch 'akpm' (patches from Andrew) 2021-11-06 14:08:17 -07:00
cpu-freq cpufreq: docs: Update core.rst 2021-12-01 20:02:11 +01:00
crypto crypto: engine - Add KPP Support to Crypto Engine 2021-10-29 21:04:03 +08:00
dev-tools Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
devicetree dt-bindings: spi: cadence-quadspi: document "intel,socfpga-qspi" 2021-12-27 04:20:05 -06:00
doc-guide docs: Update Sphinx requirements 2021-11-15 02:47:22 -07:00
driver-api cxl for v5.16 2021-11-08 11:49:48 -08:00
fault-injection
fb
features parisc: Move thread_info into task struct 2021-11-01 07:35:59 +01:00
filesystems mm: add a field to store names for private anonymous memory 2022-01-15 16:30:27 +02:00
firmware_class
firmware-guide Documentation: ACPI: Fix non-D0 probe _DSC object example 2021-11-10 13:59:12 +01:00
fpga
gpu drm-misc-next for 5.16: 2021-11-05 13:50:15 +10:00
hid
hwmon Driver core changes for 5.16-rc1 2021-11-04 08:32:38 -07:00
i2c Docs: Fixes link to I2C specification 2021-12-31 14:39:28 +01:00
ia64
ide
iio
infiniband
input
isdn
kbuild Kbuild updates for v5.16 2021-11-08 09:15:45 -08:00
kernel-hacking docs: futex: Fix kernel-doc references 2021-10-19 17:27:05 +02:00
leds leds: add new LED_FUNCTION_PLAYER for player LEDs for game controllers. 2021-10-27 09:49:29 +02:00
litmus-tests
livepatch
locking Documentation/locking/locktypes: Update migrate_disable() bits. 2021-11-30 15:40:31 +01:00
m68k
maintainer docs: use the lore redirector everywhere 2021-10-12 13:58:19 -06:00
mhi
mips
misc-devices
netlabel
networking Documentation: fix outdated interpretation of ip_no_pmtu_disc 2021-12-30 13:28:04 +00:00
nios2
nvdimm
openrisc
parisc
PCI
pcmcia
power Documentation: power: Describe 'advanced' and 'simple' EM models 2021-11-10 21:26:34 +01:00
powerpc
process Documentation: Add minimum pahole version 2021-11-29 14:48:00 -07:00
RCU rcu: Fix undefined Kconfig macros 2021-09-13 16:32:46 -07:00
riscv
s390
scheduler sched/fair: Add document for burstable CFS bandwidth 2021-10-05 15:51:41 +02:00
scsi
security net,lsm,selinux: revert the security_sctp_assoc_established() hook 2021-11-14 12:21:53 +00:00
sh
sound ALSA: hda/realtek: Add new alc285-hp-amp-init model 2021-12-14 10:44:26 +01:00
sparc
sphinx
sphinx-static
spi spi: Remove unused function spi_busnum_to_master() 2021-10-07 15:45:57 +01:00
staging
target
timers Documentation/no_hz: Introduce "dyntick-idle mode" before using it 2021-09-27 11:31:11 -06:00
trace docs: ftrace: fix the wrong path of tracefs 2021-11-15 02:50:39 -07:00
translations doc/zh_CN: fix a translation error in management-style 2021-11-15 02:53:30 -07:00
usb
userspace-api Char/Misc driver update for 5.16-rc1 2021-11-04 08:21:47 -07:00
virt Merge branch 'kvm-sev-move-context' into kvm-master 2021-11-11 11:02:58 -05:00
vm mm: page table check 2022-01-15 16:30:28 +02:00
w1 dt-bindings: w1: update w1-gpio.yaml reference 2021-09-16 21:02:09 -05:00
watchdog
x86 - Add the model number of a new, Raptor Lake CPU, to intel-family.h 2021-11-14 09:29:03 -08:00
xtensa
.gitignore
arch.rst
asm-annotations.rst docs: use the lore redirector everywhere 2021-10-12 13:58:19 -06:00
atomic_bitops.txt
atomic_t.txt
Changes
CodingStyle
conf.py docs: conf.py: fix support for Readthedocs v 1.0.0 2021-11-29 14:27:52 -07:00
COPYING-logo
docutils.conf
dontdiff
index.rst
Kconfig
logo.gif
Makefile
memory-barriers.txt
SubmittingPatches
watch_queue.rst