linux-stable/fs/xfs
Dave Chinner c85007e2e3 xfs: don't use BMBT btree split workers for IO completion
When we split a BMBT due to record insertion, we offload it to a
worker thread because we can be deep in the stack when we try to
allocate a new block for the BMBT. Allocation can use several
kilobytes of stack (full memory reclaim, swap and/or IO path can
end up on the stack during allocation) and we can already be several
kilobytes deep in the stack when we need to split the BMBT.

A recent workload demonstrated a deadlock in this BMBT split
offload. It requires several things to happen at once:

1. two inodes need a BMBT split at the same time, one must be
unwritten extent conversion from IO completion, the other must be
from extent allocation.

2. there must be a no available xfs_alloc_wq worker threads
available in the worker pool.

3. There must be sustained severe memory shortages such that new
kworker threads cannot be allocated to the xfs_alloc_wq pool for
both threads that need split work to be run

4. The split work from the unwritten extent conversion must run
first.

5. when the BMBT block allocation runs from the split work, it must
loop over all AGs and not be able to either trylock an AGF
successfully, or each AGF is is able to lock has no space available
for a single block allocation.

6. The BMBT allocation must then attempt to lock the AGF that the
second task queued to the rescuer thread already has locked before
it finds an AGF it can allocate from.

At this point, we have an ABBA deadlock between tasks queued on the
xfs_alloc_wq rescuer thread and a locked AGF. i.e. The queued task
holding the AGF lock can't be run by the rescuer thread until the
task the rescuer thread is runing gets the AGF lock....

This is a highly improbably series of events, but there it is.

There's a couple of ways to fix this, but the easiest way to ensure
that we only punt tasks with a locked AGF that holds enough space
for the BMBT block allocations to the worker thread.

This works for unwritten extent conversion in IO completion (which
doesn't have a locked AGF and space reservations) because we have
tight control over the IO completion stack. It is typically only 6
functions deep when xfs_btree_split() is called because we've
already offloaded the IO completion work to a worker thread and
hence we don't need to worry about stack overruns here.

The other place we can be called for a BMBT split without a
preceeding allocation is __xfs_bunmapi() when punching out the
center of an existing extent. We don't remove extents in the IO
path, so these operations don't tend to be called with a lot of
stack consumed. Hence we don't really need to ship the split off to
a worker thread in these cases, either.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-02-05 08:48:24 -08:00
..
libxfs xfs: don't use BMBT btree split workers for IO completion 2023-02-05 08:48:24 -08:00
scrub xfs: check inode core when scrubbing metadata files 2022-11-16 16:11:51 -08:00
Kconfig xfs: fix Kconfig asking about XFS_SUPPORT_V4 when XFS_FS=n 2020-10-16 15:34:28 -07:00
kmem.c mm: introduce memalloc_retry_wait() 2022-01-15 16:30:29 +02:00
kmem.h xfs: remove kmem_zone typedef 2021-10-22 16:00:31 -07:00
Makefile - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
mrlock.h xfs: convert to SPDX license tags 2018-06-06 14:17:53 -07:00
xfs_acl.c fs: pass dentry to set acl method 2022-10-19 12:55:42 +02:00
xfs_acl.h fs: pass dentry to set acl method 2022-10-19 12:55:42 +02:00
xfs_aops.c xfs: add debug knob to slow down writeback for fun 2022-11-28 17:24:35 -08:00
xfs_aops.h xfs: add a xfs_inode_buftarg helper 2019-10-28 08:37:54 -07:00
xfs_attr_inactive.c xfs: don't leak memory when attr fork loading fails 2022-07-20 16:40:39 -07:00
xfs_attr_item.c xfs: dump corrupt recovered log intent items to dmesg consistently 2022-10-31 08:58:20 -07:00
xfs_attr_item.h xfs: share xattr name and value buffers when logging xattr updates 2022-05-23 08:43:46 +10:00
xfs_attr_list.c xfs: use XFS_IFORK_Q to determine the presence of an xattr fork 2022-07-09 15:17:21 -07:00
xfs_bio_io.c fs/xfs: Use the enum req_op and blk_opf_t types 2022-07-14 12:14:33 -06:00
xfs_bmap_item.c xfs: fix confusing variable names in xfs_bmap_item.c 2023-02-05 08:48:11 -08:00
xfs_bmap_item.h xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_bmap_util.c xfs: xfs_bmap_punch_delalloc_range() should take a byte range 2022-11-29 09:09:17 +11:00
xfs_bmap_util.h xfs: xfs_bmap_punch_delalloc_range() should take a byte range 2022-11-29 09:09:17 +11:00
xfs_buf_item_recover.c xfs: convert buf_cancel_table allocation to kmalloc_array 2022-05-27 10:27:19 +10:00
xfs_buf_item.c xfs: fix super block buf log item UAF during force shutdown 2022-11-30 09:25:46 -08:00
xfs_buf_item.h xfs: convert buffer log item flags to unsigned. 2022-04-21 10:46:40 +10:00
xfs_buf.c xfs: invalidate block device page cache during unmount 2022-11-30 08:55:18 -08:00
xfs_buf.h xfs: xfs_buf cache destroy isn't RCU safe 2022-07-20 16:40:39 -07:00
xfs_dir2_readdir.c xfs: rearrange the logic and remove the broken comment for xfs_dir2_isxx 2022-10-04 16:39:58 +11:00
xfs_discard.c xfs: pass perag to xfs_alloc_read_agf() 2022-07-07 19:07:40 +10:00
xfs_discard.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
xfs_dquot_item_recover.c xfs: replace xfs_sb_version checks with feature flag checks 2021-08-19 10:07:12 -07:00
xfs_dquot_item.c xfs: remove support for disabling quota accounting on a mounted file system 2021-08-06 11:05:36 -07:00
xfs_dquot_item.h xfs: remove support for disabling quota accounting on a mounted file system 2021-08-06 11:05:36 -07:00
xfs_dquot.c xfs: Fix comment typo 2022-07-22 10:58:39 -07:00
xfs_dquot.h xfs: remove warning counters from struct xfs_dquot_res 2022-05-11 17:12:09 +10:00
xfs_error.c New XFS code for 6.2: 2022-12-14 10:11:51 -08:00
xfs_error.h xfs: add debug knob to slow down writeback for fun 2022-11-28 17:24:35 -08:00
xfs_export.c xfs: convert remaining mount flags to state flags 2021-08-19 10:07:13 -07:00
xfs_export.h xfs: convert to SPDX license tags 2018-06-06 14:17:53 -07:00
xfs_extent_busy.c xfs: fix extent busy updating 2023-01-05 07:34:21 -08:00
xfs_extent_busy.h xfs: pass perags through to the busy extent code 2021-06-02 10:48:24 +10:00
xfs_extfree_item.c xfs: fix confusing xfs_extent_item variable names 2023-02-05 08:48:11 -08:00
xfs_extfree_item.h xfs: refactor all the EFI/EFD log item sizeof logic 2022-10-31 08:58:20 -07:00
xfs_file.c xfs: write page faults in iomap are not buffered writes 2022-11-07 10:09:11 +11:00
xfs_filestream.c xfs: pass perag to xfs_alloc_read_agf() 2022-07-07 19:07:40 +10:00
xfs_filestream.h xfs: convert mount flags to features 2021-08-19 10:07:12 -07:00
xfs_fsmap.c xfs: make rtbitmap ILOCKing consistent when scanning the rt bitmap file 2022-11-16 15:25:03 -08:00
xfs_fsmap.h xfs: fix deadlock and streamline xfs_getfsmap performance 2020-10-07 08:40:29 -07:00
xfs_fsops.c - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
xfs_fsops.h xfs: get rid of xfs_growfs_{data,log}_t 2021-02-03 09:18:50 -08:00
xfs_globals.c xfs: Add larp debug option 2022-05-11 17:01:22 +10:00
xfs_health.c xfs: replace XFS_FORCED_SHUTDOWN with xfs_is_shutdown 2021-08-19 10:07:13 -07:00
xfs_icache.c xfs: Fix deadlock on xfs_inodegc_worker 2023-01-03 10:23:07 -08:00
xfs_icache.h xfs: introduce xfs_inodegc_push() 2022-06-23 13:34:38 -07:00
xfs_icreate_item.c xfs: fix potential log item leak 2022-05-04 11:45:11 +10:00
xfs_icreate_item.h xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_inode_item_recover.c xfs: clean up "%Ld/%Lu" which doesn't meet C standard 2022-09-19 06:47:14 +10:00
xfs_inode_item.c xfs: remove the redundant word in comment 2022-09-19 06:45:14 +10:00
xfs_inode_item.h xfs: aborting inodes on shutdown may need buffer lock 2022-03-29 18:21:59 -07:00
xfs_inode.c xfs: fix incorrect error-out in xfs_remove 2022-11-16 19:20:20 -08:00
xfs_inode.h - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
xfs_ioctl32.c xfs: Set up infrastructure for log attribute replay 2022-05-04 12:41:02 +10:00
xfs_ioctl32.h xfs: remove unused xfs_ioctl32.h declarations 2022-01-18 10:18:36 -08:00
xfs_ioctl.c xfs: get root inode correctly at bulkstat 2023-01-03 10:23:07 -08:00
xfs_ioctl.h xfs: kill the XFS_IOC_{ALLOC,FREE}SP* ioctls 2022-01-17 09:16:41 -08:00
xfs_iomap.c xfs: make xfs_iomap_page_ops static 2022-12-26 10:11:18 -08:00
xfs_iomap.h xfs: use iomap_valid method to detect stale cached iomaps 2022-11-29 09:09:17 +11:00
xfs_iops.c MM patches for 6.2-rc1. 2022-12-13 19:29:45 -08:00
xfs_iops.h xfs: remove xfs_setattr_time() declaration 2022-09-19 06:53:14 +10:00
xfs_itable.c xfs: port to vfs{g,u}id_t and associated helpers 2022-09-19 06:54:14 +10:00
xfs_itable.h xfs: Enable bulkstat ioctl to support 64-bit per-inode extent counters 2022-04-13 07:02:45 +00:00
xfs_iunlink_item.c xfs: add in-memory iunlink log item 2022-07-14 11:47:42 +10:00
xfs_iunlink_item.h xfs: add in-memory iunlink log item 2022-07-14 11:47:42 +10:00
xfs_iwalk.c xfs: avoid buffer deadlocks when walking fs inodes 2021-08-09 11:13:16 -07:00
xfs_iwalk.h xfs: Decouple XFS_IBULK flags from XFS_IWALK flags 2022-04-13 07:02:44 +00:00
xfs_linux.h fs/xfs: Use the enum req_op and blk_opf_t types 2022-07-14 12:14:33 -06:00
xfs_log_cil.c xfs: xlog_sync() manually adjusts grant head space 2022-07-07 18:56:09 +10:00
xfs_log_priv.h xfs: xlog_sync() manually adjusts grant head space 2022-07-07 18:56:09 +10:00
xfs_log_recover.c xfs: avoid a UAF when log intent item recovery fails 2022-10-18 14:39:29 -07:00
xfs_log.c xfs: wait iclog complete before tearing down AIL 2022-11-30 09:25:46 -08:00
xfs_log.h xfs: move CIL ordering to the logvec chain 2022-07-07 18:56:08 +10:00
xfs_message.c Merge branch 'guilt/xfs-unsigned-flags-5.18' into xfs-5.19-for-next 2022-04-21 16:45:03 +10:00
xfs_message.h xfs: implement per-mount warnings for scrub and shrink usage 2022-05-27 10:31:34 +10:00
xfs_mount.c xfs: fix sb write verify for lazysbcount 2022-11-16 19:20:20 -08:00
xfs_mount.h - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe 2022-08-05 16:32:45 -07:00
xfs_mru_cache.c xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_mru_cache.h xfs: convert to SPDX license tags 2018-06-06 14:17:53 -07:00
xfs_notify_failure.c xfs: changes for 6.1-rc1 2022-10-10 20:32:10 -07:00
xfs_ondisk.h xfs: fix memcpy fortify errors in EFI log format copying 2022-10-31 08:58:20 -07:00
xfs_pnfs.c xfs: use iomap_valid method to detect stale cached iomaps 2022-11-29 09:09:17 +11:00
xfs_pnfs.h xfs: prepare xfs_break_layouts() for another layout type 2018-05-22 07:19:08 -07:00
xfs_pwork.c xfs: increase the default parallelism levels of pwork clients 2021-02-03 09:18:49 -08:00
xfs_pwork.h xfs: increase the default parallelism levels of pwork clients 2021-02-03 09:18:49 -08:00
xfs_qm_bhv.c xfs: replace xfs_sb_version checks with feature flag checks 2021-08-19 10:07:12 -07:00
xfs_qm_syscalls.c xfs: introduce xfs_inodegc_push() 2022-06-23 13:34:38 -07:00
xfs_qm.c xfs: xfs_qm: remove unnecessary ‘0’ values from error 2023-01-03 10:23:07 -08:00
xfs_qm.h xfs: remove quota warning limit from struct xfs_quota_limits 2022-05-11 17:12:09 +10:00
xfs_quota.h xfs: queue inactivation immediately when quota is nearing enforcement 2021-08-09 10:52:18 -07:00
xfs_quotaops.c xfs: don't set quota warning values 2022-05-11 17:12:09 +10:00
xfs_refcount_item.c xfs: fix confusing variable names in xfs_refcount_item.c 2023-02-05 08:48:12 -08:00
xfs_refcount_item.h xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_reflink.c xfs: don't assert if cmap covers imap after cycling lock 2022-12-26 10:11:17 -08:00
xfs_reflink.h xfs: pass perag to xfs_alloc_read_agf() 2022-07-07 19:07:40 +10:00
xfs_rmap_item.c xfs: fix confusing variable names in xfs_rmap_item.c 2023-02-05 08:48:11 -08:00
xfs_rmap_item.h xfs: rename _zone variables to _cache 2021-10-22 16:04:20 -07:00
xfs_rtalloc.c xfs: make rtbitmap ILOCKing consistent when scanning the rt bitmap file 2022-11-16 15:25:03 -08:00
xfs_rtalloc.h xfs: recalculate free rt extents after log recovery 2022-04-12 06:49:42 +10:00
xfs_stats.c xfs: replace unnecessary seq_printf with seq_puts 2022-09-19 06:48:14 +10:00
xfs_stats.h xfs: periodically relog deferred intent items 2020-10-07 08:40:28 -07:00
xfs_super.c xfs: Print XFS UUID on mount and umount events. 2022-11-16 19:20:21 -08:00
xfs_super.h xfs: implement ->notify_failure() for XFS 2022-07-17 17:14:30 -07:00
xfs_symlink.c xfs: replace inode fork size macros with functions 2022-07-12 11:17:27 -07:00
xfs_symlink.h xfs: support idmapped mounts 2021-01-24 14:43:46 +01:00
xfs_sysctl.c xfs: restore speculative_cow_prealloc_lifetime sysctl 2021-02-24 10:16:08 -08:00
xfs_sysctl.h xfs: Add larp debug option 2022-05-11 17:01:22 +10:00
xfs_sysfs.c xfs: Add larp debug option 2022-05-11 17:01:22 +10:00
xfs_sysfs.h xfs: Fix unreferenced object reported by kmemleak in xfs_sysfs_init() 2022-10-20 09:42:56 -07:00
xfs_trace.c xfs: add debug knob to slow down writeback for fun 2022-11-28 17:24:35 -08:00
xfs_trace.h xfs: pass refcount intent directly through the log intent code 2023-02-05 08:48:11 -08:00
xfs_trans_ail.c xfs: shut up -Wuninitialized in xfsaild_push 2022-11-30 08:55:18 -08:00
xfs_trans_buf.c xfs: introduce xfs_buf_daddr() 2021-08-19 10:07:14 -07:00
xfs_trans_dquot.c xfs: remove quota warning limit from struct xfs_quota_limits 2022-05-11 17:12:09 +10:00
xfs_trans_priv.h xfs: convert log vector chain to use list heads 2022-07-07 18:55:59 +10:00
xfs_trans.c xfs: introduce in-memory inode unlink log items 2022-07-14 09:21:42 -07:00
xfs_trans.h xfs: introduce in-memory inode unlink log items 2022-07-14 09:21:42 -07:00
xfs_xattr.c xfs: use strscpy() to instead of strncpy() 2023-02-05 08:48:11 -08:00
xfs_xattr.h xfs: move xfs_attr_use_log_assist usage out of libxfs 2022-05-27 10:34:04 +10:00
xfs.h xfs: remove b_last_holder & associated macros 2018-08-12 08:37:31 -07:00