Linux kernel stable tree
Go to file
Kalesh Singh 249608ee47 mm: respect mmap hint address when aligning for THP
Commit efa7df3e3b ("mm: align larger anonymous mappings on THP
boundaries") updated __get_unmapped_area() to align the start address for
the VMA to a PMD boundary if CONFIG_TRANSPARENT_HUGEPAGE=y.

It does this by effectively looking up a region that is of size,
request_size + PMD_SIZE, and aligning up the start to a PMD boundary.

Commit 4ef9ad19e1 ("mm: huge_memory: don't force huge page alignment on
32 bit") opted out of this for 32bit due to regressions in mmap base
randomization.

Commit d4148aeab4 ("mm, mmap: limit THP alignment of anonymous mappings
to PMD-aligned sizes") restricted this to only mmap sizes that are
multiples of the PMD_SIZE due to reported regressions in some performance
benchmarks -- which seemed mostly due to the reduced spatial locality of
related mappings due to the forced PMD-alignment.

Another unintended side effect has emerged: When a user specifies an mmap
hint address, the THP alignment logic modifies the behavior, potentially
ignoring the hint even if a sufficiently large gap exists at the requested
hint location.

Example Scenario:

Consider the following simplified virtual address (VA) space:

    ...

    0x200000-0x400000 --- VMA A
    0x400000-0x600000 --- Hole
    0x600000-0x800000 --- VMA B

    ...

A call to mmap() with hint=0x400000 and len=0x200000 behaves differently:

  - Before THP alignment: The requested region (size 0x200000) fits into
    the gap at 0x400000, so the hint is respected.

  - After alignment: The logic searches for a region of size
    0x400000 (len + PMD_SIZE) starting at 0x400000.
    This search fails due to the mapping at 0x600000 (VMA B), and the hint
    is ignored, falling back to arch_get_unmapped_area[_topdown]().

In general the hint is effectively ignored, if there is any existing
mapping in the below range:

     [mmap_hint + mmap_size, mmap_hint + mmap_size + PMD_SIZE)

This changes the semantics of mmap hint; from ""Respect the hint if a
sufficiently large gap exists at the requested location" to "Respect the
hint only if an additional PMD-sized gap exists beyond the requested
size".

This has performance implications for allocators that allocate their heap
using mmap but try to keep it "as contiguous as possible" by using the end
of the exisiting heap as the address hint.  With the new behavior it's
more likely to get a much less contiguous heap, adding extra fragmentation
and performance overhead.

To restore the expected behavior; don't use
thp_get_unmapped_area_vmflags() when the user provided a hint address, for
anonymous mappings.

Note: As Yang Shi pointed out: the issue still remains for filesystems
which are using thp_get_unmapped_area() for their get_unmapped_area() op. 
It is unclear what worklaods will regress for if we ignore THP alignment
when the hint address is provided for such file backed mappings -- so this
fix will be handled separately.

Link: https://lkml.kernel.org/r/20241118214650.3667577-1-kaleshsingh@google.com
Fixes: efa7df3e3b ("mm: align larger anonymous mappings on THP boundaries")
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reviewed-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hans Boehm <hboehm@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-05 19:54:46 -08:00
arch i2c-for-6.13-rc1-part3 2024-12-01 13:38:24 -08:00
block block-6.13-20242901 2024-11-30 15:47:29 -08:00
certs sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3 2024-09-20 19:52:48 +03:00
crypto Random number generator updates for Linux 6.13-rc1. 2024-11-19 10:43:44 -08:00
Documentation - Move the ->select callback to the correct ops structure in irq-mvebu-sei to 2024-12-01 12:37:58 -08:00
drivers i2c-for-6.13-rc1-part3 2024-12-01 13:38:24 -08:00
fs ocfs2: update seq_file index in ocfs2_dlm_seq_next 2024-12-05 19:54:45 -08:00
include mm/codetag: swap tags when migrate pages 2024-12-05 19:54:46 -08:00
init - The series "resource: A couple of cleanups" from Andy Shevchenko 2024-11-25 16:09:48 -08:00
io_uring io_uring-6.13-20242901 2024-11-30 15:43:02 -08:00
ipc - The series "resource: A couple of cleanups" from Andy Shevchenko 2024-11-25 16:09:48 -08:00
kernel - Fix a case where posix timers with a thread-group-wide target would miss 2024-12-01 12:41:21 -08:00
lib mm/codetag: swap tags when migrate pages 2024-12-05 19:54:46 -08:00
LICENSES LICENSES: add 0BSD license text 2024-09-01 20:43:24 -07:00
mm mm: respect mmap hint address when aligning for THP 2024-12-05 19:54:46 -08:00
net 9p update for 6.13-rc1 2024-11-30 10:28:14 -08:00
rust block-6.13-20242901 2024-11-30 15:47:29 -08:00
samples TTY / Serial driver updates for 6.13-rc1 2024-11-30 09:03:16 -08:00
scripts Kbuild updates for v6.13 2024-11-30 13:41:50 -08:00
security lsm/stable-6.13 PR 20241129 2024-11-30 18:14:56 -08:00
sound sound fixes for 6.13-rc1 2024-11-29 13:01:05 -08:00
tools selftests/damon: add _damon_sysfs.py to TEST_FILES 2024-12-05 19:54:44 -08:00
usr initramfs: shorten cmd_initfs in usr/Makefile 2024-07-16 01:07:52 +09:00
virt VFIO updates for v6.13 2024-11-27 12:57:03 -08:00
.clang-format clang-format: Update with v6.11-rc1's for_each macro list 2024-08-02 13:20:31 +02:00
.clippy.toml rust: enable Clippy's check-private-items 2024-10-07 21:39:57 +02:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.editorconfig .editorconfig: remove trim_trailing_whitespace option 2024-06-13 16:47:52 +02:00
.get_maintainer.ignore MAINTAINERS: Retire Ralf Baechle 2024-11-12 15:48:59 +01:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore Kbuild updates for v6.13 2024-11-30 13:41:50 -08:00
.mailmap media updates for v6.13-rc1 2024-11-20 14:01:15 -08:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS cgroup: Changes for v6.13 2024-11-20 09:54:49 -08:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS i2c-for-6.13-rc1-part3 2024-12-01 13:38:24 -08:00
Makefile Linux 6.13-rc1 2024-12-01 14:28:56 -08:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.