Linux kernel source tree
Go to file
Damien Le Moal fe0418eb9b block: Prevent potential deadlocks in zone write plug error recovery
Zone write plugging for handling writes to zones of a zoned block
device always execute a zone report whenever a write BIO to a zone
fails. The intent of this is to ensure that the tracking of a zone write
pointer is always correct to ensure that the alignment to a zone write
pointer of write BIOs can be checked on submission and that we can
always correctly emulate zone append operations using regular write
BIOs.

However, this error recovery scheme introduces a potential deadlock if a
device queue freeze is initiated while BIOs are still plugged in a zone
write plug and one of these write operation fails. In such case, the
disk zone write plug error recovery work is scheduled and executes a
report zone. This in turn can result in a request allocation in the
underlying driver to issue the report zones command to the device. But
with the device queue freeze already started, this allocation will
block, preventing the report zone execution and the continuation of the
processing of the plugged BIOs. As plugged BIOs hold a queue usage
reference, the queue freeze itself will never complete, resulting in a
deadlock.

Avoid this problem by completely removing from the zone write plugging
code the use of report zones operations after a failed write operation,
instead relying on the device user to either execute a report zones,
reset the zone, finish the zone, or give up writing to the device (which
is a fairly common pattern for file systems which degrade to read-only
after write failures). This is not an unreasonnable requirement as all
well-behaved applications, FSes and device mapper already use report
zones to recover from write errors whenever possible by comparing the
current position of a zone write pointer with what their assumption
about the position is.

The changes to remove the automatic error recovery are as follows:
 - Completely remove the error recovery work and its associated
   resources (zone write plug list head, disk error list, and disk
   zone_wplugs_work work struct). This also removes the functions
   disk_zone_wplug_set_error() and disk_zone_wplug_clear_error().

 - Change the BLK_ZONE_WPLUG_ERROR zone write plug flag into
   BLK_ZONE_WPLUG_NEED_WP_UPDATE. This new flag is set for a zone write
   plug whenever a write opration targetting the zone of the zone write
   plug fails. This flag indicates that the zone write pointer offset is
   not reliable and that it must be updated when the next report zone,
   reset zone, finish zone or disk revalidation is executed.

 - Modify blk_zone_write_plug_bio_endio() to set the
   BLK_ZONE_WPLUG_NEED_WP_UPDATE flag for the target zone of a failed
   write BIO.

 - Modify the function disk_zone_wplug_set_wp_offset() to clear this
   new flag, thus implementing recovery of a correct write pointer
   offset with the reset (all) zone and finish zone operations.

 - Modify blkdev_report_zones() to always use the disk_report_zones_cb()
   callback so that disk_zone_wplug_sync_wp_offset() can be called for
   any zone marked with the BLK_ZONE_WPLUG_NEED_WP_UPDATE flag.
   This implements recovery of a correct write pointer offset for zone
   write plugs marked with BLK_ZONE_WPLUG_NEED_WP_UPDATE and within
   the range of the report zones operation executed by the user.

 - Modify blk_revalidate_seq_zone() to call
   disk_zone_wplug_sync_wp_offset() for all sequential write required
   zones when a zoned block device is revalidated, thus always resolving
   any inconsistency between the write pointer offset of zone write
   plugs and the actual write pointer position of sequential zones.

Fixes: dd291d77cc ("block: Introduce zone write plugging")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20241209122357.47838-5-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-10 09:15:33 -07:00
arch module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
block block: Prevent potential deadlocks in zone write plug error recovery 2024-12-10 09:15:33 -07:00
certs sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3 2024-09-20 19:52:48 +03:00
crypto module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
Documentation module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
drivers dm: Fix dm-zoned-reclaim zone write pointer alignment 2024-12-10 09:15:33 -07:00
fs module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
include block: Prevent potential deadlocks in zone write plug error recovery 2024-12-10 09:15:33 -07:00
init - The series "resource: A couple of cleanups" from Andy Shevchenko 2024-11-25 16:09:48 -08:00
io_uring io_uring-6.13-20242901 2024-11-30 15:43:02 -08:00
ipc - The series "resource: A couple of cleanups" from Andy Shevchenko 2024-11-25 16:09:48 -08:00
kernel module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
lib module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
LICENSES LICENSES: add 0BSD license text 2024-09-01 20:43:24 -07:00
mm module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
net module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
rust block-6.13-20242901 2024-11-30 15:47:29 -08:00
samples module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
scripts module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
security module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
sound module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
tools module: Convert symbol namespace to string literal 2024-12-02 11:34:44 -08:00
usr initramfs: shorten cmd_initfs in usr/Makefile 2024-07-16 01:07:52 +09:00
virt VFIO updates for v6.13 2024-11-27 12:57:03 -08:00
.clang-format clang-format: Update with v6.11-rc1's for_each macro list 2024-08-02 13:20:31 +02:00
.clippy.toml rust: enable Clippy's check-private-items 2024-10-07 21:39:57 +02:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.editorconfig .editorconfig: remove trim_trailing_whitespace option 2024-06-13 16:47:52 +02:00
.get_maintainer.ignore MAINTAINERS: Retire Ralf Baechle 2024-11-12 15:48:59 +01:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore Kbuild updates for v6.13 2024-11-30 13:41:50 -08:00
.mailmap media updates for v6.13-rc1 2024-11-20 14:01:15 -08:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS cgroup: Changes for v6.13 2024-11-20 09:54:49 -08:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS i2c-for-6.13-rc1-part3 2024-12-01 13:38:24 -08:00
Makefile Linux 6.13-rc1 2024-12-01 14:28:56 -08:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.