linux/tools
Linus Torvalds 4f43ade45d memblock: always release pages to the buddy allocator in memblock_free_late()
If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages()
 only releases pages to the buddy allocator if they are not in the
 deferred range. This is correct for free pages (as defined by
 for_each_free_mem_pfn_range_in_zone()) because free pages in the
 deferred range will be initialized and released as part of the deferred
 init process. memblock_free_pages() is called by memblock_free_late(),
 which is used to free reserved ranges after memblock_free_all() has
 run. All pages in reserved ranges have been initialized at that point,
 and accordingly, those pages are not touched by the deferred init
 process. This means that currently, if the pages that
 memblock_free_late() intends to release are in the deferred range, they
 will never be released to the buddy allocator. They will forever be
 reserved.
 
 In addition, memblock_free_pages() calls kmsan_memblock_free_pages(),
 which is also correct for free pages but is not correct for reserved
 pages. KMSAN metadata for reserved pages is initialized by
 kmsan_init_shadow(), which runs shortly before memblock_free_all().
 
 For both of these reasons, memblock_free_pages() should only be called
 for free pages, and memblock_free_late() should call __free_pages_core()
 directly instead.
 
 One case where this issue can occur in the wild is EFI boot on
 x86_64. The x86 EFI code reserves all EFI boot services memory ranges
 via memblock_reserve() and frees them later via memblock_free_late()
 (efi_reserve_boot_services() and efi_free_boot_services(),
 respectively). If any of those ranges happens to fall within the
 deferred init range, the pages will not be released and that memory will
 be unavailable.
 
 For example, on an Amazon EC2 t3.micro VM (1 GB) booting via EFI:
 
 v6.2-rc2:
 Node 0, zone      DMA
       spanned  4095
       present  3999
       managed  3840
 Node 0, zone    DMA32
       spanned  246652
       present  245868
       managed  178867
 
 v6.2-rc2 + patch:
 Node 0, zone      DMA
       spanned  4095
       present  3999
       managed  3840
 Node 0, zone    DMA32
       spanned  246652
       present  245868
       managed  222816   # +43,949 pages
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmPCrI8QHHJwcHRAa2Vy
 bmVsLm9yZwAKCRA5A4Ymyw79kT1lB/wPbLpePLzZfDGyV/NR9gi4FuJiaRfhlklV
 rbxnJce050GERbSQoF/r4zrxn2pzvIWGMh1xWZBGi/q8mT2rOIYtVqUahY9YuL/Z
 7+xqdCOALIxEj+cXqYocqp8/NFgUWLGuMoomc9lWvEkUs+zOvkD8Z/bRecfPYvOa
 BftPALmtXgx46Ecce0gZvvh4YULpVLNdDPPiwZTabV+47Cl8+cJ0Y+iEHsUfOesU
 hQG0unWJH77O3IU4QxiirLekLP/6a5O5f0W7u3PZmNNv7N+UdwE+De+QF0aamfgA
 LZDO1qOakflegFZvK0JchCzS4hc6dtRKqIvNM3cCBMXLvV4REHKP
 =geNh
 -----END PGP SIGNATURE-----

Merge tag 'fixes-2023-01-14' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock

Pull memblock fix from Mike Rapoport:
 "memblock: always release pages to the buddy allocator in
  memblock_free_late()

  If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages()
  only releases pages to the buddy allocator if they are not in the
  deferred range. This is correct for free pages (as defined by
  for_each_free_mem_pfn_range_in_zone()) because free pages in the
  deferred range will be initialized and released as part of the
  deferred init process.

  memblock_free_pages() is called by memblock_free_late(), which is used
  to free reserved ranges after memblock_free_all() has run. All pages
  in reserved ranges have been initialized at that point, and
  accordingly, those pages are not touched by the deferred init process.

  This means that currently, if the pages that memblock_free_late()
  intends to release are in the deferred range, they will never be
  released to the buddy allocator. They will forever be reserved.

  In addition, memblock_free_pages() calls kmsan_memblock_free_pages(),
  which is also correct for free pages but is not correct for reserved
  pages. KMSAN metadata for reserved pages is initialized by
  kmsan_init_shadow(), which runs shortly before memblock_free_all().

  For both of these reasons, memblock_free_pages() should only be called
  for free pages, and memblock_free_late() should call
  __free_pages_core() directly instead.

  One case where this issue can occur in the wild is EFI boot on x86_64.
  The x86 EFI code reserves all EFI boot services memory ranges via
  memblock_reserve() and frees them later via memblock_free_late()
  (efi_reserve_boot_services() and efi_free_boot_services(),
  respectively).

  If any of those ranges happens to fall within the deferred init range,
  the pages will not be released and that memory will be unavailable.

  For example, on an Amazon EC2 t3.micro VM (1 GB) booting via EFI:

    v6.2-rc2:
    Node 0, zone      DMA
          spanned  4095
          present  3999
          managed  3840
    Node 0, zone    DMA32
          spanned  246652
          present  245868
          managed  178867

    v6.2-rc2 + patch:
    Node 0, zone      DMA
          spanned  4095
          present  3999
          managed  3840
    Node 0, zone    DMA32
          spanned  246652
          present  245868
          managed  222816   # +43,949 pages"

* tag 'fixes-2023-01-14' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  mm: Always release pages to the buddy allocator in memblock_free_late().
2023-01-14 10:08:08 -06:00
..
accounting tools/accounting/procacct: remove some unused variables 2022-11-18 13:55:09 -08:00
arch perf tools fixes and improvements for v6.2: 2nd batch 2022-12-22 11:07:29 -08:00
bootconfig
bpf bpftool: Fix linkage with statically built libllvm 2022-12-22 20:09:43 +01:00
build perf bpf: Fix build with libbpf 0.7.0 by checking if bpf_program__set_insns() is available 2022-10-25 17:40:48 -03:00
certs
cgroup iocost_monitor: reorder BlkgIterator 2022-09-23 16:57:10 -10:00
counter
debugging tools: Add new "test" taint to kernel-chktaint 2022-09-07 14:51:12 -06:00
edid
firewire
firmware
gpio
hv tools: hv: kvp: remove unnecessary (void*) conversions 2022-09-05 16:55:20 +00:00
iio tools: iio: iio_generic_buffer: Fix read size 2022-11-01 08:48:13 +00:00
include tools/nolibc: fix the O_* fcntl/open macro definitions for riscv 2023-01-09 09:36:05 -08:00
io_uring
kvm/kvm_stat tools/kvm_stat: update exit reasons for vmx/svm/aarch64/userspace 2022-11-09 12:26:52 -05:00
laptop
leds
lib libperf: Fix install_pkgconfig target 2022-12-16 10:04:06 -03:00
memory-model tools/memory-model: Weaken ctrl dependency definition in explanation.txt 2022-10-18 15:14:52 -07:00
objtool objtool: Tolerate STT_NOTYPE symbols at end of section 2023-01-09 17:53:46 +01:00
pci
pcmcia
perf perf auxtrace: Fix address filter duplicate symbol selection 2023-01-11 14:03:44 -03:00
power ACPI updates for 6.2-rc1 2022-12-12 13:38:17 -08:00
rcu
scripts
spi spi: spidev_test: Warn when the mode is not the requested mode 2022-06-13 15:56:03 +01:00
testing memblock: always release pages to the buddy allocator in memblock_free_late() 2023-01-14 10:08:08 -06:00
thermal tools/thermal: Fix possible path truncations 2022-08-03 19:28:46 +02:00
time
tracing rtla: Fix exit status when returning from calls to usage() 2022-12-09 18:06:24 -05:00
usb tools: usb: ffs-aio-example: Fix build error with aarch64-*-gnu-gcc toolchain(s) 2022-11-09 12:37:56 +01:00
verification Tracing fix for 6.2: 2022-12-21 19:03:42 -08:00
virtio tools/virtio: remove smp_read_barrier_depends() 2022-12-28 05:28:11 -05:00
vm MM patches for 6.2-rc1. 2022-12-13 19:29:45 -08:00
wmi
Makefile tools/nolibc: make the default target build the headers 2022-06-20 09:43:19 -07:00