linux-next/include
Liu Shixin 51aaee3948 mm: hugetlb: independent PMD page table shared count
The folio refcount may be increased unexpectedly through try_get_folio() by
callers such as split_huge_pages.  In huge_pmd_unshare(), we use the refcount
to check whether a pmd page table is shared.  The check is incorrect if
the refcount is increased by such a caller, and this can cause the page
table to be leaked:

 BUG: Bad page state in process sh  pfn:109324
 page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x66 pfn:0x109324
 flags: 0x17ffff800000000(node=0|zone=2|lastcpupid=0xfffff)
 page_type: f2(table)
 raw: 017ffff800000000 0000000000000000 0000000000000000 0000000000000000
 raw: 0000000000000066 0000000000000000 00000000f2000000 0000000000000000
 page dumped because: nonzero mapcount
 ...
 CPU: 31 UID: 0 PID: 7515 Comm: sh Kdump: loaded Tainted: G    B              6.13.0-rc2master+ #7
 Tainted: [B]=BAD_PAGE
 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
 Call trace:
  show_stack+0x20/0x38 (C)
  dump_stack_lvl+0x80/0xf8
  dump_stack+0x18/0x28
  bad_page+0x8c/0x130
  free_page_is_bad_report+0xa4/0xb0
  free_unref_page+0x3cc/0x620
  __folio_put+0xf4/0x158
  split_huge_pages_all+0x1e0/0x3e8
  split_huge_pages_write+0x25c/0x2d8
  full_proxy_write+0x64/0xd8
  vfs_write+0xcc/0x280
  ksys_write+0x70/0x110
  __arm64_sys_write+0x24/0x38
  invoke_syscall+0x50/0x120
  el0_svc_common.constprop.0+0xc8/0xf0
  do_el0_svc+0x24/0x38
  el0_svc+0x34/0x128
  el0t_64_sync_handler+0xc8/0xd0
  el0t_64_sync+0x190/0x198

The issue may be triggered by damon, offline_page, page_idle, etc., which
will increase the refcount of the page table.

1. The page table itself will be discarded after reporting the
   "nonzero mapcount".

2. The HugeTLB page mapped by the page table is never freed, since we
   treat the page table as shared and a shared page table will not be
   unmapped.

Fix it by introducing an independent PMD page table shared count.  As
described by the comment, pt_index/pt_mm/pt_frag_refcount are used for s390
gmap, x86 pgds and powerpc, while pt_share_count is used for x86/arm64/riscv
pmds, so we can reuse the field as pt_share_count.
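
The difference between the two checks can be sketched in a few lines of
userspace C.  This is only an illustration under stated assumptions, not the
kernel code: struct pt_page and all helper names below are hypothetical
stand-ins, plain ints replace the kernel's atomic counters, and only the
field name pt_share_count is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a PMD page table page (not the kernel's type). */
struct pt_page {
	int refcount;       /* generic reference count; any transient user may hold one */
	int pt_share_count; /* number of extra sharers of this page table */
};

/* Transient reference, as a try_get_folio()-style caller would take. */
static void pt_get(struct pt_page *pt)
{
	pt->refcount++;
}

static void pt_put(struct pt_page *pt)
{
	assert(pt->refcount > 0);
	pt->refcount--;
}

/* Record and drop one sharer in the dedicated counter. */
static void pt_share(struct pt_page *pt)
{
	pt->pt_share_count++;
}

static void pt_unshare(struct pt_page *pt)
{
	assert(pt->pt_share_count > 0);
	pt->pt_share_count--;
}

/* Refcount-based check: a transient reference looks like a sharer. */
static bool shared_by_refcount(const struct pt_page *pt)
{
	return pt->refcount > 1;
}

/* Dedicated-counter check: unaffected by transient references. */
static bool shared_by_count(const struct pt_page *pt)
{
	return pt->pt_share_count > 0;
}
```

With an unshared table (refcount 1, pt_share_count 0), a single pt_get()
makes shared_by_refcount() report true while shared_by_count() still reports
false, which is the misclassification described above.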

Link: https://lkml.kernel.org/r/20241216071147.3984217-1-liushixin2@huawei.com
Fixes: 39dde65c99 ("[PATCH] shared page table for hugetlb page")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Ken Chen <kenneth.w.chen@intel.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-18 19:49:57 -08:00
..
acpi common: switch back from remove_new() to remove() callback 2024-11-25 17:31:39 -08:00
asm-generic - Fix a case where posix timers with a thread-group-wide target would miss 2024-12-01 12:41:21 -08:00
clocksource
crypto This update includes the following changes: 2024-11-19 10:28:41 -08:00
cxl
drm Merge tag 'drm-misc-fixes-2024-12-05' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes 2024-12-06 08:40:47 +10:00
dt-bindings Char/Misc/IIO/Whatever driver subsystem updates for 6.13-rc1 2024-11-29 11:58:27 -08:00
keys
kunit module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
kvm KVM: arm64: vgic: Kill VGIC_MAX_PRIVATE definition 2024-11-20 17:21:08 -08:00
linux mm: hugetlb: independent PMD page table shared count 2024-12-18 19:49:57 -08:00
math-emu
media media: replace obsolete hans.verkuil@cisco.com alias 2024-11-08 13:38:09 +01:00
memory
misc
net bluetooth pull request for net: 2024-12-12 07:10:40 -08:00
pcmcia
ras
rdma
rv
scsi Random number generator updates for Linux 6.13-rc1. 2024-11-19 10:43:44 -08:00
soc ARC fixes for 6.13-r32 or rc4 2024-12-15 15:38:12 -08:00
sound ALSA: hda: cs35l56: Remove calls to cs35l56_force_sync_asp1_registers_from_cache() 2024-12-06 13:54:06 +01:00
target
trace mm/damon: fix order of arguments in damos_before_apply tracepoint 2024-12-05 19:54:47 -08:00
uapi iommu/arm-smmu-v3: Improve uAPI comment for IOMMU_HW_INFO_TYPE_ARM_SMMUV3 2024-12-03 13:30:31 -04:00
ufs scsi: ufs: core: Add missing post notify for power mode change 2024-12-04 13:22:59 -05:00
vdso
video - Improved handling of LCD power states and interactions with the fbdev subsystem. 2024-11-22 16:29:57 -08:00
xen