linux-next/tools
Kemeng Shi e6b6abc540 Xarray: do not return sibling entries from xas_find_marked()
Patch series "Fixes and cleanups to xarray", v3.

This series contains some random fixes and cleanups to xarray. Patches 1-2
are fixes and patches 3-6 are cleanups. More details can be found in the
respective patches.


This patch (of 5):

Similar to the issue fixed in commit cbc0285433 ("XArray: Do not return
sibling entries from xa_load()"), xas_find_marked() may return sibling
entries, as follows:

    Thread A:               Thread B:
                            xa_store_range(xa, entry, 6, 7, gfp);
                            xa_set_mark(xa, 6, mark);
    XA_STATE(xas, xa, 6);
    xas_find_marked(&xas, 7, mark);
    offset = xas_find_chunk(xas, advance, mark);
    [offset is 6 which points to a valid entry]
                            xa_store_range(xa, entry, 4, 7, gfp);
    entry = xa_entry(xa, node, 6);
    [entry is a sibling of 4]
    if (!xa_is_node(entry))
        return entry;

Skip sibling entries, as xas_find() already does, so that callers never see
a sibling entry from xas_find_marked(). Otherwise a caller may treat the
sibling entry as a valid entry and crash the kernel.
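
The effect of the skip can be illustrated with a small, self-contained C
model. This is a sketch under stated assumptions, not the kernel code:
toy_node, toy_mk_sibling(), toy_find_chunk(), toy_load() and
toy_replay_race() are hypothetical names, the sibling encoding only loosely
resembles XArray internal entries, and the real fix lives in
xas_find_marked() in lib/xarray.c. The model splits the lookup into the two
phases from the diagram (find a marked offset, then load that slot) and
replays the Thread A / Thread B interleaving single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical toy model of a single XArray node with eight slots.
 * This is NOT struct xa_node; all names here are illustrative.
 */
#define TOY_SLOTS 8

typedef struct {
	uintptr_t slot[TOY_SLOTS];	/* 0 = empty, else object or sibling */
	bool mark[TOY_SLOTS];		/* one search mark per slot */
} toy_node;

/*
 * A sibling entry stores the offset of its canonical slot, tagged with
 * low bits 0b10 (an assumption loosely modeled on XArray internal
 * entries). Objects are assumed 4-byte aligned, so never carry the tag.
 */
static uintptr_t toy_mk_sibling(unsigned int canon)
{
	return ((uintptr_t)canon << 2) | 2;
}

static bool toy_is_sibling(uintptr_t entry)
{
	return (entry & 3) == 2;
}

/* Phase 1: first marked offset in [start, max], or TOY_SLOTS if none. */
static unsigned int toy_find_chunk(const toy_node *n, unsigned int start,
				   unsigned int max)
{
	for (unsigned int i = start; i <= max && i < TOY_SLOTS; i++)
		if (n->mark[i])
			return i;
	return TOY_SLOTS;
}

/*
 * Phase 2: load the slot chosen in phase 1; it may have changed since.
 * With skip_siblings set, a sibling is reported as 0 ("skip this slot
 * and keep searching"), in the spirit of the fix. Without it, the
 * sibling is handed to the caller as if it were an object.
 */
static uintptr_t toy_load(const toy_node *n, unsigned int offset,
			  bool skip_siblings)
{
	uintptr_t entry;

	if (offset >= TOY_SLOTS)
		return 0;
	entry = n->slot[offset];
	if (skip_siblings && toy_is_sibling(entry))
		return 0;
	return entry;
}

/* Single-threaded replay of the Thread A / Thread B interleaving. */
static uintptr_t toy_replay_race(toy_node *n, void *old_obj, void *new_obj,
				 bool skip_siblings)
{
	unsigned int off, i;

	/* B: store old_obj over indices 6-7, then mark index 6 */
	n->slot[6] = (uintptr_t)old_obj;
	n->slot[7] = toy_mk_sibling(6);
	n->mark[6] = true;

	/* A: the chunk search finds marked offset 6 */
	off = toy_find_chunk(n, 6, 7);

	/* B: store new_obj over 4-7; slots 5-7 become siblings of 4 */
	n->slot[4] = (uintptr_t)new_obj;
	for (i = 5; i <= 7; i++)
		n->slot[i] = toy_mk_sibling(4);

	/* A: load the slot it found earlier */
	return toy_load(n, off, skip_siblings);
}
```

In this model, passing skip_siblings = false makes toy_replay_race() return
toy_mk_sibling(4), the analogue of the sibling entry Thread A hands back in
the diagram. With skip_siblings = true the stale slot is treated like an
empty one, so a real search loop would move on to the next candidate
instead of returning it.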

In addition, the load_race() test is modified to catch this issue; the
modified load_race() passes only with this fix applied.

Here is an example of how this bug could, in theory, be triggered in NFS,
which enables large folios in its mapping. The racing paths involved are:
1. How pages are created and dirtied in NFS:
write
 ksys_write
  vfs_write
   new_sync_write
    nfs_file_write
     generic_perform_write
      nfs_write_begin
       fgf_set_order
       __filemap_get_folio
      nfs_write_end
       nfs_update_folio
        nfs_writepage_setup
         nfs_mark_request_dirty
          filemap_dirty_folio
           __folio_mark_dirty
            __xa_set_mark

2. How dirty pages are deleted in NFS:
ioctl
 do_vfs_ioctl
  file_ioctl
   ioctl_preallocate
    vfs_fallocate
     nfs42_fallocate
      nfs42_proc_deallocate
       truncate_pagecache_range
        truncate_inode_pages_range
         truncate_inode_folio
          filemap_remove_folio
           page_cache_delete
            xas_store(&xas, NULL);

3. How dirty pages are searched locklessly:
sync_file_range
 ksys_sync_file_range
  __filemap_fdatawrite_range
   filemap_fdatawrite_wbc
    do_writepages
     writeback_use_writepage
      writeback_iter
       writeback_get_folio
        filemap_get_folios_tag
         find_get_entry
          folio = xas_find_marked()
          folio_try_get(folio)

In theory, the kernel can crash as follows:
1. Create              2. Search            3. Delete
/* write page 2,3 */
write
 ...
  nfs_write_begin
   fgf_set_order
   __filemap_get_folio
    ...
     /* index = 2, order = 1 */
     xas_store(&xas, folio)
  nfs_write_end
   ...
    __folio_mark_dirty

                       /* sync page 2 and page 3 */
                       sync_file_range
                        ...
                         find_get_entry
                          folio = xas_find_marked()
                          /* offset will be 2 */
                          offset = xas_find_chunk()

                                             /* delete page 2 and page 3 */
                                             ioctl
                                              ...
                                               xas_store(&xas, NULL);

/* write page 0-3 */
write
 ...
  nfs_write_begin
   fgf_set_order
   __filemap_get_folio
    ...
     /* index = 0, order = 2 */
     xas_store(&xas, folio)
  nfs_write_end
   ...
    __folio_mark_dirty

                          /* get sibling entry from offset 2 */
                          entry = xa_entry(.., 2)
                          /* use sibling entry as folio and crash kernel */
                          folio_try_get(folio)

Link: https://lkml.kernel.org/r/20241218154613.58754-2-shikemeng@huaweicloud.com
Link: https://lkml.kernel.org/r/20241213122523.12764-1-shikemeng@huaweicloud.com
Link: https://lkml.kernel.org/r/20241213122523.12764-2-shikemeng@huaweicloud.com
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-18 19:51:39 -08:00
accounting tools/accounting/procacct: fix minor errors 2024-12-18 19:51:31 -08:00
arch tools headers: Sync arm64 kvm header with the kernel sources 2024-12-04 14:34:49 -08:00
bootconfig
bpf BPF fixes: 2024-12-06 15:07:48 -08:00
build perf tools changes for v6.13 2024-11-26 14:54:00 -08:00
certs
cgroup
counter
crypto crypto: tools/ccp - Remove unused variable 2024-08-30 18:22:30 +08:00
debugging
firewire tools/firewire: Fix several incorrect format specifiers 2024-11-14 09:12:04 +09:00
firmware
gpio tools: gpio: Fix several incorrect format specifiers 2024-11-13 16:30:05 +01:00
hv hyperv-next for v6.12 2024-09-19 08:15:30 +02:00
iio iio: Add channel type for attention 2024-11-03 20:33:43 +00:00
include tools headers: Sync uapi/asm-generic/mman.h with the kernel sources 2024-12-04 14:34:50 -08:00
kvm/kvm_stat
laptop
leds
lib libperf: evlist: Fix --cpu argument on hybrid platform 2024-12-11 09:19:44 -08:00
memory-model tools/memory-model: simple.txt: Fix stale reference to recipes-pairs.txt 2024-09-13 23:56:44 -07:00
mm - The series "zram: optimal post-processing target selection" from 2024-11-23 09:58:07 -08:00
net NFSD 6.13 Release Notes 2024-11-26 12:59:30 -08:00
objtool Kbuild updates for v6.13 2024-11-30 13:41:50 -08:00
pci tools: PCI: Fix incorrect printf format specifiers 2024-11-20 14:20:51 -06:00
pcmcia
perf perf probe: Fix uninitialized variable 2024-12-11 21:40:46 -08:00
power turbostat version 2024.11.30 2024-11-30 18:30:22 -08:00
rcu tools/rcu: Remove RCU Tasks Rude asynchronous APIs from rcu-updaters.sh 2024-07-29 07:39:32 +05:30
sched_ext sched_ext: Rename scx_bpf_dispatch[_vtime]_from_dsq*() -> scx_bpf_dsq_move[_vtime]*() 2024-11-11 07:06:16 -10:00
scripts tools: Override makefile ARCH variable if defined, but empty 2024-11-29 17:04:25 +01:00
sound ASoC: dapm-graph: show path name for non-static routes 2024-08-23 11:03:00 +01:00
spi spi: spidev_test: add support for word delay 2024-11-07 15:25:50 +00:00
testing Xarray: do not return sibling entries from xas_find_marked() 2024-12-18 19:51:39 -08:00
thermal tools/thermal: Fix common realloc mistake 2024-11-15 14:29:03 +01:00
time
tracing tracing/tools: Updates for 6.13 2024-11-22 13:24:22 -08:00
usb usbip: tools: Fix detach_port() invalid port error path 2024-10-29 04:23:23 +01:00
verification verification/dot2: Improve dot parser robustness 2024-11-19 08:57:13 -05:00
virtio Fix typo in vringh_test.c 2024-11-06 04:40:07 -05:00
wmi
workqueue
writeback
Makefile sched_ext: Add scx_simple and scx_example_qmap example schedulers 2024-06-18 10:09:17 -10:00