linux-stable/mm
Jan Kara ab4443fe3c readahead: avoid multiple marked readahead pages
ra_alloc_folio() marks a page that should trigger the next round of async
readahead.  However, it rounds the computed mark index up to the order of
the page being allocated.  This can lead to multiple consecutive pages
being marked with the readahead flag.  Consider the situation with
index == 1, mark == 1, order == 0.  We insert an order 0 page at index 1
and mark it.  Then we bump order to 1, index to 2, mark (still == 1) is
rounded up to 2 so the page at index 2 is marked as well.  Then we bump
order to 2, index is incremented to 4, mark gets rounded to 4 so the page
at index 4 is marked as well.  The fact that multiple pages get marked
within a single readahead window confuses the readahead logic and results
in the readahead window being trimmed back to 1.  This situation is
triggered in particular when the maximum readahead window size is not a
power of two (in the observed case it was 768 KB) and as a result
sequential read throughput suffers.

Fix the problem by rounding 'mark' down instead of up.  Because the index
is naturally aligned to 'order', we are guaranteed 'rounded mark' == index
iff 'mark' is within the page we are allocating at 'index' and thus
exactly one page is marked with readahead flag as required by the
readahead code and sequential read performance is restored.

This effectively reverts part of commit b9ff43dd27 ("mm/readahead: Fix
readahead with large folios").  The commit changed the rounding with the
rationale:

"...  we were setting the readahead flag on the folio which contains the
last byte read from the block.  This is wrong because we will trigger
readahead at the end of the read without waiting to see if a subsequent
read is going to use the pages we just read."

Although this is true, this was always the case with read sizes not
aligned to folio boundaries; large folios in the page cache just make the
situation more obvious (and frequent).  Also, for sequential read
workloads it is better to trigger the readahead earlier rather than
later.  It is true that the difference in the rounding, and thus earlier
triggering of the readahead, can result in reading more for semi-random
workloads.  However, workloads really suffering from this seem to be rare.
In particular I have verified that the workload described in commit
b9ff43dd27 ("mm/readahead: Fix readahead with large folios") of reading
random 100k blocks from a file like:

[reader]
bs=100k
rw=randread
numjobs=1
size=64g
runtime=60s

is not impacted by the rounding change and achieves ~70MB/s in both cases.

[jack@suse.cz: fix one more place where mark rounding was done as well]
  Link: https://lkml.kernel.org/r/20240123153254.5206-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240104085839.21029-1-jack@suse.cz
Fixes: b9ff43dd27 ("mm/readahead: Fix readahead with large folios")
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Guo Xuenan <guoxuenan@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-25 23:52:20 -08:00
damon mm/damon/vaddr: change asm-generic/mman-common.h to linux/mman.h 2023-12-29 11:58:57 -08:00
kasan kasan: avoid resetting aux_lock 2024-01-12 15:20:45 -08:00
kfence KFENCE: cleanup kfence_guarded_alloc() after CONFIG_SLAB removal 2023-12-05 11:17:58 +01:00
kmsan mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
backing-dev.c writeback: remove redundant checks for root memcg 2023-08-21 13:37:48 -07:00
balloon_compaction.c mm: Convert all PageMovable users to movable_operations 2022-08-02 12:34:03 -04:00
bootmem_info.c bootmem: use kmemleak_free_part_phys in put_page_bootmem 2023-10-25 16:47:13 -07:00
cma_debug.c mm/cma_debug: show complete cma name in debugfs directories 2022-09-11 20:25:50 -07:00
cma_sysfs.c mm: cma: make kobj_type structure constant 2023-03-28 16:20:06 -07:00
cma.c mm: cma: remove unnecessary initialization of ret 2023-12-12 10:57:08 -08:00
cma.h mm/cma: provide option to opt out from exposing pages on activation failure 2022-03-22 15:57:09 -07:00
compaction.c Generic: 2024-01-17 13:03:37 -08:00
debug_page_alloc.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
debug_page_ref.c
debug_vm_pgtable.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
debug.c mm: update validate_mm() to use vma iterator 2023-06-09 16:25:31 -07:00
dmapool_test.c dmapool: add alloc/free performance test 2023-04-05 19:42:38 -07:00
dmapool.c mm/mempool/dmapool: remove CONFIG_DEBUG_SLAB ifdefs 2023-12-05 11:17:58 +01:00
early_ioremap.c mm/early_ioremap.c: improve the execution efficiency of early_ioremap_setup() 2023-06-09 16:25:56 -07:00
fadvise.c mm: remove unnecessary pagevec includes 2023-06-23 16:59:31 -07:00
fail_page_alloc.c mm: page_alloc: split out FAIL_PAGE_ALLOC 2023-06-09 16:25:23 -07:00
failslab.c mm: fix unexpected changes to {failslab|fail_page_alloc}.attr 2022-11-22 18:50:44 -08:00
filemap.c vfs-6.8.netfs 2024-01-19 09:10:23 -08:00
folio-compat.c mm: remove page_add_new_anon_rmap and lru_cache_add_inactive_or_unevictable 2023-12-29 11:58:27 -08:00
gup_test.c Merge mm-hotfixes-stable into mm-stable to pick up depended-upon changes. 2023-06-23 16:58:19 -07:00
gup_test.h mm/gup_test: start/stop/read functionality for PIN LONGTERM test 2022-11-08 17:37:15 -08:00
gup.c mm: convert page_try_share_anon_rmap() to folio_try_share_anon_rmap_[pte|pmd]() 2023-12-29 11:58:56 -08:00
highmem.c x86/kexec: use pr_err() instead of kexec_dprintk() when an error occurs 2023-12-29 12:22:28 -08:00
hmm.c mm: enable page walking API to lock vmas during the walk 2023-08-21 13:07:20 -07:00
huge_memory.c Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
hugetlb_cgroup.c mm, hugetlb: remove HUGETLB_CGROUP_MIN_ORDER 2023-10-18 14:34:17 -07:00
hugetlb_vmemmap.c mm: hugetlb_vmemmap: move mmap lock to vmemmap_remap_range() 2023-12-12 10:57:08 -08:00
hugetlb_vmemmap.h mm: hugetlb_vmemmap: fix reference to nonexistent file 2023-10-25 16:47:14 -07:00
hugetlb.c Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
hwpoison-inject.c mm/hwpoison: add __init/__exit annotations to module init/exit funcs 2022-10-03 14:03:05 -07:00
init-mm.c mm: Deprecate pasid field 2023-12-12 10:11:32 +01:00
internal.h mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
interval_tree.c
io-mapping.c
ioremap.c mm: ioremap: remove unneeded ioremap_allowed and iounmap_allowed 2023-08-18 10:12:36 -07:00
Kconfig IOMMU Updates for Linux v6.8 2024-01-18 15:16:57 -08:00
Kconfig.debug mm/slab: remove CONFIG_SLAB from all Kconfig and Makefile 2023-12-05 11:14:40 +01:00
khugepaged.c header cleanups for 6.8 2024-01-10 16:43:55 -08:00
kmemleak.c kmemleak: avoid RCU stalls when freeing metadata for per-CPU pointers 2023-12-12 10:57:07 -08:00
ksm.c mm: convert page_try_share_anon_rmap() to folio_try_share_anon_rmap_[pte|pmd]() 2023-12-29 11:58:56 -08:00
list_lru.c mm/list_lru.c: remove unused list_lru_from_kmem() 2023-12-20 14:48:11 -08:00
maccess.c mm: Fix copy_from_user_nofault(). 2023-04-12 17:36:23 -07:00
madvise.c mm: return a folio from read_swap_cache_async() 2023-12-29 11:58:32 -08:00
Makefile mm/slab: remove CONFIG_SLAB from all Kconfig and Makefile 2023-12-05 11:14:40 +01:00
mapping_dirty_helpers.c mm: fix clean_record_shared_mapping_range kernel-doc 2023-08-24 16:20:30 -07:00
memblock.c memblock: code readability improvement 2024-01-18 16:46:18 -08:00
memcontrol.c Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
memfd.c memfd: drop warning for missing exec-related flags 2023-10-04 10:32:22 -07:00
memory_hotplug.c mm/memory_hotplug: fix memmap_on_memory sysfs value retrieval 2024-01-12 15:20:48 -08:00
memory-failure.c New code for 6.8: 2024-01-10 08:45:22 -08:00
memory-tiers.c base/node / acpi: Change 'node_hmem_attrs' to 'access_coordinates' 2023-12-22 14:23:13 -08:00
memory.c Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
mempolicy.c Many singleton patches against the MM code. The patch series which are 2023-11-02 19:38:47 -10:00
mempool.c Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
memremap.c mm: remove stale example from comment 2023-12-29 11:58:26 -08:00
memtest.c mm: memtest: convert to memtest_report_meminfo() 2023-08-21 13:37:47 -07:00
migrate_device.c mm: convert page_try_share_anon_rmap() to folio_try_share_anon_rmap_[pte|pmd]() 2023-12-29 11:58:56 -08:00
migrate.c Generic: 2024-01-17 13:03:37 -08:00
mincore.c mm: enable page walking API to lock vmas during the walk 2023-08-21 13:07:20 -07:00
mlock.c mm: mlock: avoid folio_within_range() on KSM pages 2023-10-25 16:47:14 -07:00
mm_init.c efi: disable mirror feature during crashkernel 2024-01-12 15:20:47 -08:00
mm_slot.h mm: introduce common struct mm_slot 2022-10-03 14:02:43 -07:00
mmap_lock.c
mmap.c Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
mmu_gather.c mm/memory: page_remove_rmap() -> folio_remove_rmap_pte() 2023-12-29 11:58:54 -08:00
mmu_notifier.c mmu_notifiers: rename invalidate_range notifier 2023-08-18 10:12:41 -07:00
mmzone.c zswap: shrink zswap pool based on memory pressure 2023-12-12 10:57:02 -08:00
mprotect.c mm: mprotect: use a folio in change_pte_range() 2023-10-25 16:47:12 -07:00
mremap.c mm: abstract VMA merge and extend into vma_merge_extend() helper 2023-10-18 14:34:18 -07:00
msync.c mm/msync: use vma_find() instead of vma linked list 2022-09-26 19:46:25 -07:00
nommu.c Many singleton patches against the MM code. The patch series which are 2023-11-02 19:38:47 -10:00
oom_kill.c mm, oom:dump_tasks add rss detailed information printing 2023-12-10 16:51:53 -08:00
page_alloc.c Networking changes for 6.8. 2024-01-11 10:07:29 -08:00
page_counter.c mm: page_counter: remove unneeded atomic ops for low/min 2022-09-11 20:26:01 -07:00
page_ext.c mm/page_ext: move functions around for minor cleanups to page_ext 2023-08-18 10:12:31 -07:00
page_idle.c mm: page_idle: convert page idle to use a folio 2023-01-18 17:12:52 -08:00
page_io.c zswap: memcontrol: implement zswap writeback disabling 2023-12-29 20:22:11 -08:00
page_isolation.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
page_owner.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
page_poison.c mm/page_poison: replace kmap_atomic() with kmap_local_page() 2023-12-10 16:51:50 -08:00
page_reporting.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
page_reporting.h
page_table_check.c mm: convert page_table_check_pte_set() to page_table_check_ptes_set() 2023-08-24 16:20:18 -07:00
page_vma_mapped.c mm: thp: introduce multi-size THP sysfs interface 2023-12-20 14:48:12 -08:00
page-writeback.c Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
pagewalk.c mm: pagewalk: assert write mmap lock only for walking the user page tables 2023-12-10 16:51:53 -08:00
percpu-internal.h percpu-internal/pcpu_chunk: re-layout pcpu_chunk structure to reduce false sharing 2023-06-19 16:19:29 -07:00
percpu-km.c
percpu-stats.c mm: use vmalloc_array and vcalloc for array allocations 2022-03-08 09:30:46 -05:00
percpu-vm.c
percpu.c mm: Introduce flush_cache_vmap_early() 2023-12-14 00:23:17 -08:00
pgalloc-track.h
pgtable-generic.c mm/pgtable: notes on pte_offset_map[_lock]() 2023-08-18 10:12:25 -07:00
process_vm_access.c mm: fix process_vm_rw page counts 2023-12-10 16:51:39 -08:00
ptdump.c mm: ptdump should use ptep_get_lockless() 2023-06-19 16:19:24 -07:00
readahead.c readahead: avoid multiple marked readahead pages 2024-01-25 23:52:20 -08:00
rmap.c mm/rmap: rename COMPOUND_MAPPED to ENTIRELY_MAPPED 2023-12-29 11:58:56 -08:00
rodata_test.c mm/rodata_test: use PAGE_ALIGNED() helper 2022-10-03 14:03:05 -07:00
secretmem.c mm/secretmem: use a folio in secretmem_fault() 2023-08-21 13:38:02 -07:00
shmem_quota.c shmem: Add default quota limit mount options 2023-08-09 09:15:40 +02:00
shmem.c header cleanups for 6.8 2024-01-10 16:43:55 -08:00
show_mem.c mm, treewide: introduce NR_PAGE_ORDERS 2024-01-08 15:27:15 -08:00
shrinker_debug.c mm: shrinker: convert shrinker_rwsem to mutex 2023-10-04 10:32:26 -07:00
shrinker.c mm: shrinker: use kvzalloc_node() from expand_one_shrinker_info() 2024-01-05 09:58:32 -08:00
shuffle.c mm/shuffle: convert module_param_call to module_param_cb 2022-10-03 14:03:07 -07:00
shuffle.h mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
slab_common.c slub: use a folio in __kmalloc_large_node 2024-01-05 10:17:46 -08:00
slab.h mm/slab: move kmalloc() functions from slab_common.c to slub.c 2023-12-06 11:57:21 +01:00
slub.c Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
sparse-vmemmap.c mm/vmemmap: allow architectures to override how vmemmap optimization works 2023-08-18 10:12:53 -07:00
sparse.c mm/sparsemem: fix race in accessing memory_section->usage 2023-12-29 11:58:43 -08:00
swap_cgroup.c mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled 2022-10-03 14:03:36 -07:00
swap_slots.c mm/swap: convert put_swap_page() to put_swap_folio() 2022-10-03 14:02:46 -07:00
swap_state.c mm: convert swap_cluster_readahead and swap_vma_readahead to return a folio 2023-12-29 11:58:32 -08:00
swap.c mm: remove references to pagevec 2023-06-23 16:59:30 -07:00
swap.h mm: convert swap_cluster_readahead and swap_vma_readahead to return a folio 2023-12-29 11:58:32 -08:00
swapfile.c header cleanups for 6.8 2024-01-10 16:43:55 -08:00
truncate.c fs: convert error_remove_page to error_remove_folio 2023-12-10 16:51:42 -08:00
usercopy.c mm: Fix copy_from_user_nofault(). 2023-04-12 17:36:23 -07:00
userfaultfd.c userfaultfd: avoid huge_zero_page in UFFDIO_MOVE 2024-01-12 15:20:49 -08:00
util.c mm/util: use kmap_local_page() in memcmp_pages() 2023-12-10 16:51:49 -08:00
vmalloc.c mm/vmalloc: fix the unchecked dereference warning in vread_iter() 2023-11-01 12:38:35 -07:00
vmpressure.c eventfd: simplify eventfd_signal() 2023-11-28 14:08:38 +01:00
vmscan.c Many singleton patches against the MM code. The patch series which 2024-01-09 11:18:47 -08:00
vmstat.c mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER 2024-01-08 15:27:15 -08:00
workingset.c mm: ratelimit stat flush from workingset shrinker 2024-01-05 10:17:45 -08:00
z3fold.c mm/z3fold: remove obsolete comment for struct z3fold_pool 2023-08-21 13:37:51 -07:00
zbud.c mm: zswap: remove shrink from zpool interface 2023-06-19 16:19:27 -07:00
zpool.c mm: zswap: remove shrink from zpool interface 2023-06-19 16:19:27 -07:00
zsmalloc.c mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large 2024-01-05 10:17:47 -08:00
zswap.c zswap: memcontrol: implement zswap writeback disabling 2023-12-29 20:22:11 -08:00