block: move cache control settings out of queue->flags

Move the cache control settings into the queue_limits so that the flags
can be set atomically with the device queue frozen.

Add new features and flags field for the driver set flags, and internal
(usually sysfs-controlled) flags in the block layer.  Note that we'll
eventually remove enough field from queue_limits to bring it back to the
previous size.

The disable flag is inverted compared to the previous meaning, which
means it now survives a rescan, similar to the max_sectors and
max_discard_sectors user limits.

The FLUSH and FUA flags are now inherited by blk_stack_limits, which
simplified the code in dm a lot, but also causes a slight behavior
change in that dm-switch and dm-unstripe now advertise a write cache
despite setting num_flush_bios to 0.  The I/O path will handle this
gracefully, but as far as I can tell the lack of num_flush_bios
and thus flush support is a pre-existing data integrity bug in those
targets that really needs fixing, after which a non-zero num_flush_bios
should be required in dm for targets that map to underlying devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This commit is contained in:
Christoph Hellwig 2024-06-17 08:04:40 +02:00 committed by Jens Axboe
parent 70905f8706
commit 1122c0c1cc
29 changed files with 228 additions and 207 deletions

View File

@ -46,41 +46,50 @@ worry if the underlying devices need any explicit cache flushing and how
the Forced Unit Access is implemented. The REQ_PREFLUSH and REQ_FUA flags
may both be set on a single bio.
Implementation details for bio based block drivers
--------------------------------------------------------------
These drivers will always see the REQ_PREFLUSH and REQ_FUA bits as they sit
directly below the submit_bio interface. For remapping drivers the REQ_FUA
bits need to be propagated to underlying devices, and a global flush needs
to be implemented for bios with the REQ_PREFLUSH bit set. For real device
drivers that do not have a volatile cache the REQ_PREFLUSH and REQ_FUA bits
on non-empty bios can simply be ignored, and REQ_PREFLUSH requests without
data can be completed successfully without doing any work. Drivers for
devices with volatile caches need to implement the support for these
flags themselves without any help from the block layer.
Implementation details for request_fn based block drivers
---------------------------------------------------------
Feature settings for block drivers
----------------------------------
For devices that do not support volatile write caches there is no driver
support required, the block layer completes empty REQ_PREFLUSH requests before
entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from
requests that have a payload. For devices with volatile write caches the
driver needs to tell the block layer that it supports flushing caches by
doing::
requests that have a payload.
blk_queue_write_cache(sdkp->disk->queue, true, false);
For devices with volatile write caches the driver needs to tell the block layer
that it supports flushing caches by setting the
and handle empty REQ_OP_FLUSH requests in its prep_fn/request_fn. Note that
REQ_PREFLUSH requests with a payload are automatically turned into a sequence
of an empty REQ_OP_FLUSH request followed by the actual write by the block
layer. For devices that also support the FUA bit the block layer needs
to be told to pass through the REQ_FUA bit using::
BLK_FEAT_WRITE_CACHE
blk_queue_write_cache(sdkp->disk->queue, true, true);
flag in the queue_limits feature field. For devices that also support the FUA
bit the block layer needs to be told to pass on the REQ_FUA bit by also setting
the
and the driver must handle write requests that have the REQ_FUA bit set
in prep_fn/request_fn. If the FUA bit is not natively supported the block
layer turns it into an empty REQ_OP_FLUSH request after the actual write.
BLK_FEAT_FUA
flag in the features field of the queue_limits structure.
Implementation details for bio based block drivers
--------------------------------------------------
For bio based drivers the REQ_PREFLUSH and REQ_FUA bit are simplify passed on
to the driver if the drivers sets the BLK_FEAT_WRITE_CACHE flag and the drivers
needs to handle them.
*NOTE*: The REQ_FUA bit also gets passed on when the BLK_FEAT_FUA flags is
_not_ set. Any bio based driver that sets BLK_FEAT_WRITE_CACHE also needs to
handle REQ_FUA.
For remapping drivers the REQ_FUA bits need to be propagated to underlying
devices, and a global flush needs to be implemented for bios with the
REQ_PREFLUSH bit set.
Implementation details for blk-mq drivers
-----------------------------------------
When the BLK_FEAT_WRITE_CACHE flag is set, REQ_OP_WRITE | REQ_PREFLUSH requests
with a payload are automatically turned into a sequence of a REQ_OP_FLUSH
request followed by the actual write by the block layer.
When the BLK_FEAT_FUA flags is set, the REQ_FUA bit simplify passed on for the
REQ_OP_WRITE request, else a REQ_OP_FLUSH request is sent by the block layer
after the completion of the write request for bio submissions with the REQ_FUA
bit set.

View File

@ -835,6 +835,7 @@ static int ubd_add(int n, char **error_out)
struct queue_limits lim = {
.max_segments = MAX_SG,
.seg_boundary_mask = PAGE_SIZE - 1,
.features = BLK_FEAT_WRITE_CACHE,
};
struct gendisk *disk;
int err = 0;
@ -882,7 +883,6 @@ static int ubd_add(int n, char **error_out)
}
blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
blk_queue_write_cache(disk->queue, true, false);
disk->major = UBD_MAJOR;
disk->first_minor = n << UBD_SHIFT;
disk->minors = 1 << UBD_SHIFT;

View File

@ -782,7 +782,7 @@ void submit_bio_noacct(struct bio *bio)
if (WARN_ON_ONCE(bio_op(bio) != REQ_OP_WRITE &&
bio_op(bio) != REQ_OP_ZONE_APPEND))
goto end_io;
if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags)) {
if (!bdev_write_cache(bdev)) {
bio->bi_opf &= ~(REQ_PREFLUSH | REQ_FUA);
if (!bio_sectors(bio)) {
status = BLK_STS_OK;

View File

@ -381,8 +381,8 @@ static void blk_rq_init_flush(struct request *rq)
bool blk_insert_flush(struct request *rq)
{
struct request_queue *q = rq->q;
unsigned long fflags = q->queue_flags; /* may change, cache */
struct blk_flush_queue *fq = blk_get_flush_queue(q, rq->mq_ctx);
bool supports_fua = q->limits.features & BLK_FEAT_FUA;
unsigned int policy = 0;
/* FLUSH/FUA request must never be merged */
@ -394,11 +394,10 @@ bool blk_insert_flush(struct request *rq)
/*
* Check which flushes we need to sequence for this operation.
*/
if (fflags & (1UL << QUEUE_FLAG_WC)) {
if (blk_queue_write_cache(q)) {
if (rq->cmd_flags & REQ_PREFLUSH)
policy |= REQ_FSEQ_PREFLUSH;
if (!(fflags & (1UL << QUEUE_FLAG_FUA)) &&
(rq->cmd_flags & REQ_FUA))
if ((rq->cmd_flags & REQ_FUA) && !supports_fua)
policy |= REQ_FSEQ_POSTFLUSH;
}
@ -407,7 +406,7 @@ bool blk_insert_flush(struct request *rq)
* REQ_PREFLUSH and FUA for the driver.
*/
rq->cmd_flags &= ~REQ_PREFLUSH;
if (!(fflags & (1UL << QUEUE_FLAG_FUA)))
if (!supports_fua)
rq->cmd_flags &= ~REQ_FUA;
/*

View File

@ -93,8 +93,6 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(INIT_DONE),
QUEUE_FLAG_NAME(STABLE_WRITES),
QUEUE_FLAG_NAME(POLL),
QUEUE_FLAG_NAME(WC),
QUEUE_FLAG_NAME(FUA),
QUEUE_FLAG_NAME(DAX),
QUEUE_FLAG_NAME(STATS),
QUEUE_FLAG_NAME(REGISTERED),

View File

@ -261,6 +261,9 @@ static int blk_validate_limits(struct queue_limits *lim)
lim->misaligned = 0;
}
if (!(lim->features & BLK_FEAT_WRITE_CACHE))
lim->features &= ~BLK_FEAT_FUA;
err = blk_validate_integrity_limits(lim);
if (err)
return err;
@ -454,6 +457,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
{
unsigned int top, bottom, alignment, ret = 0;
t->features |= (b->features & BLK_FEAT_INHERIT_MASK);
t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
t->max_user_sectors = min_not_zero(t->max_user_sectors,
b->max_user_sectors);
@ -711,30 +716,6 @@ void blk_set_queue_depth(struct request_queue *q, unsigned int depth)
}
EXPORT_SYMBOL(blk_set_queue_depth);
/**
* blk_queue_write_cache - configure queue's write cache
* @q: the request queue for the device
* @wc: write back cache on or off
* @fua: device supports FUA writes, if true
*
* Tell the block layer about the write cache of @q.
*/
void blk_queue_write_cache(struct request_queue *q, bool wc, bool fua)
{
if (wc) {
blk_queue_flag_set(QUEUE_FLAG_HW_WC, q);
blk_queue_flag_set(QUEUE_FLAG_WC, q);
} else {
blk_queue_flag_clear(QUEUE_FLAG_HW_WC, q);
blk_queue_flag_clear(QUEUE_FLAG_WC, q);
}
if (fua)
blk_queue_flag_set(QUEUE_FLAG_FUA, q);
else
blk_queue_flag_clear(QUEUE_FLAG_FUA, q);
}
EXPORT_SYMBOL_GPL(blk_queue_write_cache);
int bdev_alignment_offset(struct block_device *bdev)
{
struct request_queue *q = bdev_get_queue(bdev);

View File

@ -423,32 +423,41 @@ static ssize_t queue_io_timeout_store(struct request_queue *q, const char *page,
static ssize_t queue_wc_show(struct request_queue *q, char *page)
{
if (test_bit(QUEUE_FLAG_WC, &q->queue_flags))
return sprintf(page, "write back\n");
return sprintf(page, "write through\n");
if (q->limits.features & BLK_FLAGS_WRITE_CACHE_DISABLED)
return sprintf(page, "write through\n");
return sprintf(page, "write back\n");
}
static ssize_t queue_wc_store(struct request_queue *q, const char *page,
size_t count)
{
struct queue_limits lim;
bool disable;
int err;
if (!strncmp(page, "write back", 10)) {
if (!test_bit(QUEUE_FLAG_HW_WC, &q->queue_flags))
return -EINVAL;
blk_queue_flag_set(QUEUE_FLAG_WC, q);
disable = false;
} else if (!strncmp(page, "write through", 13) ||
!strncmp(page, "none", 4)) {
blk_queue_flag_clear(QUEUE_FLAG_WC, q);
!strncmp(page, "none", 4)) {
disable = true;
} else {
return -EINVAL;
}
lim = queue_limits_start_update(q);
if (disable)
lim.flags |= BLK_FLAGS_WRITE_CACHE_DISABLED;
else
lim.flags &= ~BLK_FLAGS_WRITE_CACHE_DISABLED;
err = queue_limits_commit_update(q, &lim);
if (err)
return err;
return count;
}
static ssize_t queue_fua_show(struct request_queue *q, char *page)
{
return sprintf(page, "%u\n", test_bit(QUEUE_FLAG_FUA, &q->queue_flags));
return sprintf(page, "%u\n", !!(q->limits.features & BLK_FEAT_FUA));
}
static ssize_t queue_dax_show(struct request_queue *q, char *page)

View File

@ -206,8 +206,8 @@ static void wbt_rqw_done(struct rq_wb *rwb, struct rq_wait *rqw,
*/
if (wb_acct & WBT_DISCARD)
limit = rwb->wb_background;
else if (test_bit(QUEUE_FLAG_WC, &rwb->rqos.disk->queue->queue_flags) &&
!wb_recent_wait(rwb))
else if (blk_queue_write_cache(rwb->rqos.disk->queue) &&
!wb_recent_wait(rwb))
limit = 0;
else
limit = rwb->wb_normal;

View File

@ -2697,6 +2697,7 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
* connect.
*/
.max_hw_sectors = DRBD_MAX_BIO_SIZE_SAFE >> 8,
.features = BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA,
};
device = minor_to_device(minor);
@ -2736,7 +2737,6 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
disk->private_data = device;
blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, disk->queue);
blk_queue_write_cache(disk->queue, true, true);
device->md_io.page = alloc_page(GFP_KERNEL);
if (!device->md_io.page)

View File

@ -985,6 +985,9 @@ static int loop_reconfigure_limits(struct loop_device *lo, unsigned short bsize)
lim.logical_block_size = bsize;
lim.physical_block_size = bsize;
lim.io_min = bsize;
lim.features &= ~BLK_FEAT_WRITE_CACHE;
if (file->f_op->fsync && !(lo->lo_flags & LO_FLAGS_READ_ONLY))
lim.features |= BLK_FEAT_WRITE_CACHE;
if (!backing_bdev || bdev_nonrot(backing_bdev))
blk_queue_flag_set(QUEUE_FLAG_NONROT, lo->lo_queue);
else
@ -1078,9 +1081,6 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
lo->old_gfp_mask = mapping_gfp_mask(mapping);
mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
if (!(lo->lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync)
blk_queue_write_cache(lo->lo_queue, true, false);
error = loop_reconfigure_limits(lo, config->block_size);
if (WARN_ON_ONCE(error))
goto out_unlock;
@ -1131,9 +1131,6 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
struct file *filp;
gfp_t gfp = lo->old_gfp_mask;
if (test_bit(QUEUE_FLAG_WC, &lo->lo_queue->queue_flags))
blk_queue_write_cache(lo->lo_queue, false, false);
/*
* Freeze the request queue when unbinding on a live file descriptor and
* thus an open device. When called from ->release we are guaranteed

View File

@ -342,12 +342,14 @@ static int __nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
lim.max_hw_discard_sectors = UINT_MAX;
else
lim.max_hw_discard_sectors = 0;
if (!(nbd->config->flags & NBD_FLAG_SEND_FLUSH))
blk_queue_write_cache(nbd->disk->queue, false, false);
else if (nbd->config->flags & NBD_FLAG_SEND_FUA)
blk_queue_write_cache(nbd->disk->queue, true, true);
else
blk_queue_write_cache(nbd->disk->queue, true, false);
if (!(nbd->config->flags & NBD_FLAG_SEND_FLUSH)) {
lim.features &= ~(BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA);
} else if (nbd->config->flags & NBD_FLAG_SEND_FUA) {
lim.features |= BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA;
} else {
lim.features |= BLK_FEAT_WRITE_CACHE;
lim.features &= ~BLK_FEAT_FUA;
}
lim.logical_block_size = blksize;
lim.physical_block_size = blksize;
error = queue_limits_commit_update(nbd->disk->queue, &lim);

View File

@ -1928,6 +1928,13 @@ static int null_add_dev(struct nullb_device *dev)
goto out_cleanup_tags;
}
if (dev->cache_size > 0) {
set_bit(NULLB_DEV_FL_CACHE, &nullb->dev->flags);
lim.features |= BLK_FEAT_WRITE_CACHE;
if (dev->fua)
lim.features |= BLK_FEAT_FUA;
}
nullb->disk = blk_mq_alloc_disk(nullb->tag_set, &lim, nullb);
if (IS_ERR(nullb->disk)) {
rv = PTR_ERR(nullb->disk);
@ -1940,11 +1947,6 @@ static int null_add_dev(struct nullb_device *dev)
nullb_setup_bwtimer(nullb);
}
if (dev->cache_size > 0) {
set_bit(NULLB_DEV_FL_CACHE, &nullb->dev->flags);
blk_queue_write_cache(nullb->q, true, dev->fua);
}
nullb->q->queuedata = nullb;
blk_queue_flag_set(QUEUE_FLAG_NONROT, nullb->q);

View File

@ -388,9 +388,8 @@ static int ps3disk_probe(struct ps3_system_bus_device *_dev)
.max_segments = -1,
.max_segment_size = dev->bounce_size,
.dma_alignment = dev->blk_size - 1,
.features = BLK_FEAT_WRITE_CACHE,
};
struct request_queue *queue;
struct gendisk *gendisk;
if (dev->blk_size < 512) {
@ -447,10 +446,6 @@ static int ps3disk_probe(struct ps3_system_bus_device *_dev)
goto fail_free_tag_set;
}
queue = gendisk->queue;
blk_queue_write_cache(queue, true, false);
priv->gendisk = gendisk;
gendisk->major = ps3disk_major;
gendisk->first_minor = devidx * PS3DISK_MINORS;

View File

@ -1389,6 +1389,12 @@ static int rnbd_client_setup_device(struct rnbd_clt_dev *dev,
le32_to_cpu(rsp->max_discard_sectors);
}
if (rsp->cache_policy & RNBD_WRITEBACK) {
lim.features |= BLK_FEAT_WRITE_CACHE;
if (rsp->cache_policy & RNBD_FUA)
lim.features |= BLK_FEAT_FUA;
}
dev->gd = blk_mq_alloc_disk(&dev->sess->tag_set, &lim, dev);
if (IS_ERR(dev->gd))
return PTR_ERR(dev->gd);
@ -1397,10 +1403,6 @@ static int rnbd_client_setup_device(struct rnbd_clt_dev *dev,
blk_queue_flag_set(QUEUE_FLAG_SAME_COMP, dev->queue);
blk_queue_flag_set(QUEUE_FLAG_SAME_FORCE, dev->queue);
blk_queue_write_cache(dev->queue,
!!(rsp->cache_policy & RNBD_WRITEBACK),
!!(rsp->cache_policy & RNBD_FUA));
return rnbd_clt_setup_gen_disk(dev, rsp, idx);
}

View File

@ -487,8 +487,6 @@ static void ublk_dev_param_basic_apply(struct ublk_device *ub)
struct request_queue *q = ub->ub_disk->queue;
const struct ublk_param_basic *p = &ub->params.basic;
blk_queue_write_cache(q, p->attrs & UBLK_ATTR_VOLATILE_CACHE,
p->attrs & UBLK_ATTR_FUA);
if (p->attrs & UBLK_ATTR_ROTATIONAL)
blk_queue_flag_clear(QUEUE_FLAG_NONROT, q);
else
@ -2210,6 +2208,12 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, struct io_uring_cmd *cmd)
lim.max_zone_append_sectors = p->max_zone_append_sectors;
}
if (ub->params.basic.attrs & UBLK_ATTR_VOLATILE_CACHE) {
lim.features |= BLK_FEAT_WRITE_CACHE;
if (ub->params.basic.attrs & UBLK_ATTR_FUA)
lim.features |= BLK_FEAT_FUA;
}
if (wait_for_completion_interruptible(&ub->completion) != 0)
return -EINTR;

View File

@ -1100,6 +1100,7 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
struct gendisk *disk = dev_to_disk(dev);
struct virtio_blk *vblk = disk->private_data;
struct virtio_device *vdev = vblk->vdev;
struct queue_limits lim;
int i;
BUG_ON(!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_CONFIG_WCE));
@ -1108,7 +1109,17 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
return i;
virtio_cwrite8(vdev, offsetof(struct virtio_blk_config, wce), i);
blk_queue_write_cache(disk->queue, virtblk_get_cache_mode(vdev), false);
lim = queue_limits_start_update(disk->queue);
if (virtblk_get_cache_mode(vdev))
lim.features |= BLK_FEAT_WRITE_CACHE;
else
lim.features &= ~BLK_FEAT_WRITE_CACHE;
blk_mq_freeze_queue(disk->queue);
i = queue_limits_commit_update(disk->queue, &lim);
blk_mq_unfreeze_queue(disk->queue);
if (i)
return i;
return count;
}
@ -1504,6 +1515,9 @@ static int virtblk_probe(struct virtio_device *vdev)
if (err)
goto out_free_tags;
if (virtblk_get_cache_mode(vdev))
lim.features |= BLK_FEAT_WRITE_CACHE;
vblk->disk = blk_mq_alloc_disk(&vblk->tag_set, &lim, vblk);
if (IS_ERR(vblk->disk)) {
err = PTR_ERR(vblk->disk);
@ -1519,10 +1533,6 @@ static int virtblk_probe(struct virtio_device *vdev)
vblk->disk->fops = &virtblk_fops;
vblk->index = index;
/* configure queue flush support */
blk_queue_write_cache(vblk->disk->queue, virtblk_get_cache_mode(vdev),
false);
/* If disk is read-only in the host, the guest should obey */
if (virtio_has_feature(vdev, VIRTIO_BLK_F_RO))
set_disk_ro(vblk->disk, 1);

View File

@ -959,6 +959,12 @@ static void blkif_set_queue_limits(const struct blkfront_info *info,
lim->max_secure_erase_sectors = UINT_MAX;
}
if (info->feature_flush) {
lim->features |= BLK_FEAT_WRITE_CACHE;
if (info->feature_fua)
lim->features |= BLK_FEAT_FUA;
}
/* Hard sector size and max sectors impersonate the equiv. hardware. */
lim->logical_block_size = info->sector_size;
lim->physical_block_size = info->physical_sector_size;
@ -987,8 +993,6 @@ static const char *flush_info(struct blkfront_info *info)
static void xlvbd_flush(struct blkfront_info *info)
{
blk_queue_write_cache(info->rq, info->feature_flush ? true : false,
info->feature_fua ? true : false);
pr_info("blkfront: %s: %s %s %s %s %s %s %s\n",
info->gd->disk_name, flush_info(info),
"persistent grants:", info->feature_persistent ?

View File

@ -897,7 +897,6 @@ static int bcache_device_init(struct bcache_device *d, unsigned int block_size,
sector_t sectors, struct block_device *cached_bdev,
const struct block_device_operations *ops)
{
struct request_queue *q;
const size_t max_stripes = min_t(size_t, INT_MAX,
SIZE_MAX / sizeof(atomic_t));
struct queue_limits lim = {
@ -909,6 +908,7 @@ static int bcache_device_init(struct bcache_device *d, unsigned int block_size,
.io_min = block_size,
.logical_block_size = block_size,
.physical_block_size = block_size,
.features = BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA,
};
uint64_t n;
int idx;
@ -975,12 +975,7 @@ static int bcache_device_init(struct bcache_device *d, unsigned int block_size,
d->disk->fops = ops;
d->disk->private_data = d;
q = d->disk->queue;
blk_queue_flag_set(QUEUE_FLAG_NONROT, d->disk->queue);
blk_queue_write_cache(q, true, true);
return 0;
out_bioset_exit:

View File

@ -1686,34 +1686,16 @@ int dm_calculate_queue_limits(struct dm_table *t,
return validate_hardware_logical_block_alignment(t, limits);
}
static int device_flush_capable(struct dm_target *ti, struct dm_dev *dev,
sector_t start, sector_t len, void *data)
/*
* Check if a target requires flush support even if none of the underlying
* devices need it (e.g. to persist target-specific metadata).
*/
static bool dm_table_supports_flush(struct dm_table *t)
{
unsigned long flush = (unsigned long) data;
struct request_queue *q = bdev_get_queue(dev->bdev);
return (q->queue_flags & flush);
}
static bool dm_table_supports_flush(struct dm_table *t, unsigned long flush)
{
/*
* Require at least one underlying device to support flushes.
* t->devices includes internal dm devices such as mirror logs
* so we need to use iterate_devices here, which targets
* supporting flushes must provide.
*/
for (unsigned int i = 0; i < t->num_targets; i++) {
struct dm_target *ti = dm_table_get_target(t, i);
if (!ti->num_flush_bios)
continue;
if (ti->flush_supported)
return true;
if (ti->type->iterate_devices &&
ti->type->iterate_devices(ti, device_flush_capable, (void *) flush))
if (ti->num_flush_bios && ti->flush_supported)
return true;
}
@ -1855,7 +1837,6 @@ static int device_requires_stable_pages(struct dm_target *ti,
int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
struct queue_limits *limits)
{
bool wc = false, fua = false;
int r;
if (dm_table_supports_nowait(t))
@ -1876,12 +1857,8 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
if (!dm_table_supports_secure_erase(t))
limits->max_secure_erase_sectors = 0;
if (dm_table_supports_flush(t, (1UL << QUEUE_FLAG_WC))) {
wc = true;
if (dm_table_supports_flush(t, (1UL << QUEUE_FLAG_FUA)))
fua = true;
}
blk_queue_write_cache(q, wc, fua);
if (dm_table_supports_flush(t))
limits->features |= BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA;
if (dm_table_supports_dax(t, device_not_dax_capable)) {
blk_queue_flag_set(QUEUE_FLAG_DAX, q);

View File

@ -5785,7 +5785,10 @@ struct mddev *md_alloc(dev_t dev, char *name)
int partitioned;
int shift;
int unit;
int error ;
int error;
struct queue_limits lim = {
.features = BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA,
};
/*
* Wait for any previous instance of this device to be completely
@ -5825,7 +5828,7 @@ struct mddev *md_alloc(dev_t dev, char *name)
*/
mddev->hold_active = UNTIL_STOP;
disk = blk_alloc_disk(NULL, NUMA_NO_NODE);
disk = blk_alloc_disk(&lim, NUMA_NO_NODE);
if (IS_ERR(disk)) {
error = PTR_ERR(disk);
goto out_free_mddev;
@ -5843,7 +5846,6 @@ struct mddev *md_alloc(dev_t dev, char *name)
disk->fops = &md_fops;
disk->private_data = mddev;
blk_queue_write_cache(disk->queue, true, true);
disk->events |= DISK_EVENT_MEDIA_CHANGE;
mddev->gendisk = disk;
error = add_disk(disk);

View File

@ -2466,8 +2466,7 @@ static struct mmc_blk_data *mmc_blk_alloc_req(struct mmc_card *card,
struct mmc_blk_data *md;
int devidx, ret;
char cap_str[10];
bool cache_enabled = false;
bool fua_enabled = false;
unsigned int features = 0;
devidx = ida_alloc_max(&mmc_blk_ida, max_devices - 1, GFP_KERNEL);
if (devidx < 0) {
@ -2499,7 +2498,24 @@ static struct mmc_blk_data *mmc_blk_alloc_req(struct mmc_card *card,
*/
md->read_only = mmc_blk_readonly(card);
md->disk = mmc_init_queue(&md->queue, card);
if (mmc_host_cmd23(card->host)) {
if ((mmc_card_mmc(card) &&
card->csd.mmca_vsn >= CSD_SPEC_VER_3) ||
(mmc_card_sd(card) &&
card->scr.cmds & SD_SCR_CMD23_SUPPORT))
md->flags |= MMC_BLK_CMD23;
}
if (md->flags & MMC_BLK_CMD23 &&
((card->ext_csd.rel_param & EXT_CSD_WR_REL_PARAM_EN) ||
card->ext_csd.rel_sectors)) {
md->flags |= MMC_BLK_REL_WR;
features |= (BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA);
} else if (mmc_cache_enabled(card->host)) {
features |= BLK_FEAT_WRITE_CACHE;
}
md->disk = mmc_init_queue(&md->queue, card, features);
if (IS_ERR(md->disk)) {
ret = PTR_ERR(md->disk);
goto err_kfree;
@ -2539,26 +2555,6 @@ static struct mmc_blk_data *mmc_blk_alloc_req(struct mmc_card *card,
set_capacity(md->disk, size);
if (mmc_host_cmd23(card->host)) {
if ((mmc_card_mmc(card) &&
card->csd.mmca_vsn >= CSD_SPEC_VER_3) ||
(mmc_card_sd(card) &&
card->scr.cmds & SD_SCR_CMD23_SUPPORT))
md->flags |= MMC_BLK_CMD23;
}
if (md->flags & MMC_BLK_CMD23 &&
((card->ext_csd.rel_param & EXT_CSD_WR_REL_PARAM_EN) ||
card->ext_csd.rel_sectors)) {
md->flags |= MMC_BLK_REL_WR;
fua_enabled = true;
cache_enabled = true;
}
if (mmc_cache_enabled(card->host))
cache_enabled = true;
blk_queue_write_cache(md->queue.queue, cache_enabled, fua_enabled);
string_get_size((u64)size, 512, STRING_UNITS_2,
cap_str, sizeof(cap_str));
pr_info("%s: %s %s %s%s\n",

View File

@ -344,10 +344,12 @@ static const struct blk_mq_ops mmc_mq_ops = {
};
static struct gendisk *mmc_alloc_disk(struct mmc_queue *mq,
struct mmc_card *card)
struct mmc_card *card, unsigned int features)
{
struct mmc_host *host = card->host;
struct queue_limits lim = { };
struct queue_limits lim = {
.features = features,
};
struct gendisk *disk;
if (mmc_can_erase(card))
@ -413,10 +415,12 @@ static inline bool mmc_merge_capable(struct mmc_host *host)
* mmc_init_queue - initialise a queue structure.
* @mq: mmc queue
* @card: mmc card to attach this queue
* @features: block layer features (BLK_FEAT_*)
*
* Initialise a MMC card request queue.
*/
struct gendisk *mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card)
struct gendisk *mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
unsigned int features)
{
struct mmc_host *host = card->host;
struct gendisk *disk;
@ -460,7 +464,7 @@ struct gendisk *mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card)
return ERR_PTR(ret);
disk = mmc_alloc_disk(mq, card);
disk = mmc_alloc_disk(mq, card, features);
if (IS_ERR(disk))
blk_mq_free_tag_set(&mq->tag_set);
return disk;

View File

@ -94,7 +94,8 @@ struct mmc_queue {
struct work_struct complete_work;
};
struct gendisk *mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card);
struct gendisk *mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
unsigned int features);
extern void mmc_cleanup_queue(struct mmc_queue *);
extern void mmc_queue_suspend(struct mmc_queue *);
extern void mmc_queue_resume(struct mmc_queue *);

View File

@ -336,6 +336,8 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
lim.logical_block_size = tr->blksize;
if (tr->discard)
lim.max_hw_discard_sectors = UINT_MAX;
if (tr->flush)
lim.features |= BLK_FEAT_WRITE_CACHE;
/* Create gendisk */
gd = blk_mq_alloc_disk(new->tag_set, &lim, new);
@ -373,9 +375,6 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
spin_lock_init(&new->queue_lock);
INIT_LIST_HEAD(&new->rq_list);
if (tr->flush)
blk_queue_write_cache(new->rq, true, false);
blk_queue_flag_set(QUEUE_FLAG_NONROT, new->rq);
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, new->rq);

View File

@ -455,6 +455,7 @@ static int pmem_attach_disk(struct device *dev,
.logical_block_size = pmem_sector_size(ndns),
.physical_block_size = PAGE_SIZE,
.max_hw_sectors = UINT_MAX,
.features = BLK_FEAT_WRITE_CACHE,
};
int nid = dev_to_node(dev), fua;
struct resource *res = &nsio->res;
@ -495,6 +496,8 @@ static int pmem_attach_disk(struct device *dev,
dev_warn(dev, "unable to guarantee persistence of writes\n");
fua = 0;
}
if (fua)
lim.features |= BLK_FEAT_FUA;
if (!devm_request_mem_region(dev, res->start, resource_size(res),
dev_name(&ndns->dev))) {
@ -543,7 +546,6 @@ static int pmem_attach_disk(struct device *dev,
}
pmem->virt_addr = addr;
blk_queue_write_cache(q, true, fua);
blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, q);
if (pmem->pfn_flags & PFN_MAP)

View File

@ -2056,7 +2056,6 @@ static int nvme_update_ns_info_generic(struct nvme_ns *ns,
static int nvme_update_ns_info_block(struct nvme_ns *ns,
struct nvme_ns_info *info)
{
bool vwc = ns->ctrl->vwc & NVME_CTRL_VWC_PRESENT;
struct queue_limits lim;
struct nvme_id_ns_nvm *nvm = NULL;
struct nvme_zone_info zi = {};
@ -2106,6 +2105,11 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
ns->head->ids.csi == NVME_CSI_ZNS)
nvme_update_zone_info(ns, &lim, &zi);
if (ns->ctrl->vwc & NVME_CTRL_VWC_PRESENT)
lim.features |= BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA;
else
lim.features &= ~(BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA);
/*
* Register a metadata profile for PI, or the plain non-integrity NVMe
* metadata masquerading as Type 0 if supported, otherwise reject block
@ -2132,7 +2136,6 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
if ((id->dlfeat & 0x7) == 0x1 && (id->dlfeat & (1 << 3)))
ns->head->features |= NVME_NS_DEAC;
set_disk_ro(ns->disk, nvme_ns_is_readonly(ns, info));
blk_queue_write_cache(ns->disk->queue, vwc, vwc);
set_bit(NVME_NS_READY, &ns->flags);
blk_mq_unfreeze_queue(ns->disk->queue);

View File

@ -521,7 +521,6 @@ static void nvme_requeue_work(struct work_struct *work)
int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
{
struct queue_limits lim;
bool vwc = false;
mutex_init(&head->lock);
bio_list_init(&head->requeue_list);
@ -562,11 +561,6 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
if (ctrl->tagset->nr_maps > HCTX_TYPE_POLL &&
ctrl->tagset->map[HCTX_TYPE_POLL].nr_queues)
blk_queue_flag_set(QUEUE_FLAG_POLL, head->disk->queue);
/* we need to propagate up the VMC settings */
if (ctrl->vwc & NVME_CTRL_VWC_PRESENT)
vwc = true;
blk_queue_write_cache(head->disk->queue, vwc, vwc);
return 0;
}

View File

@ -120,17 +120,18 @@ static const char *sd_cache_types[] = {
"write back, no read (daft)"
};
static void sd_set_flush_flag(struct scsi_disk *sdkp)
static void sd_set_flush_flag(struct scsi_disk *sdkp,
struct queue_limits *lim)
{
bool wc = false, fua = false;
if (sdkp->WCE) {
wc = true;
lim->features |= BLK_FEAT_WRITE_CACHE;
if (sdkp->DPOFUA)
fua = true;
lim->features |= BLK_FEAT_FUA;
else
lim->features &= ~BLK_FEAT_FUA;
} else {
lim->features &= ~(BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA);
}
blk_queue_write_cache(sdkp->disk->queue, wc, fua);
}
static ssize_t
@ -168,9 +169,18 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
wce = (ct & 0x02) && !sdkp->write_prot ? 1 : 0;
if (sdkp->cache_override) {
struct queue_limits lim;
sdkp->WCE = wce;
sdkp->RCD = rcd;
sd_set_flush_flag(sdkp);
lim = queue_limits_start_update(sdkp->disk->queue);
sd_set_flush_flag(sdkp, &lim);
blk_mq_freeze_queue(sdkp->disk->queue);
ret = queue_limits_commit_update(sdkp->disk->queue, &lim);
blk_mq_unfreeze_queue(sdkp->disk->queue);
if (ret)
return ret;
return count;
}
@ -3663,7 +3673,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
* We now have all cache related info, determine how we deal
* with flush requests.
*/
sd_set_flush_flag(sdkp);
sd_set_flush_flag(sdkp, &lim);
/* Initial block count limit based on CDB TRANSFER LENGTH field size. */
dev_max = sdp->use_16_for_rw ? SD_MAX_XFER_BLOCKS : SD_DEF_XFER_BLOCKS;

View File

@ -282,6 +282,28 @@ static inline bool blk_op_is_passthrough(blk_opf_t op)
return op == REQ_OP_DRV_IN || op == REQ_OP_DRV_OUT;
}
/* flags set by the driver in queue_limits.features */
enum {
/* supports a volatile write cache */
BLK_FEAT_WRITE_CACHE = (1u << 0),
/* supports passing on the FUA bit */
BLK_FEAT_FUA = (1u << 1),
};
/*
* Flags automatically inherited when stacking limits.
*/
#define BLK_FEAT_INHERIT_MASK \
(BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA)
/* internal flags in queue_limits.flags */
enum {
/* do not send FLUSH or FUA command despite advertised write cache */
BLK_FLAGS_WRITE_CACHE_DISABLED = (1u << 31),
};
/*
* BLK_BOUNCE_NONE: never bounce (default)
* BLK_BOUNCE_HIGH: bounce all highmem pages
@ -292,6 +314,8 @@ enum blk_bounce {
};
struct queue_limits {
unsigned int features;
unsigned int flags;
enum blk_bounce bounce;
unsigned long seg_boundary_mask;
unsigned long virt_boundary_mask;
@ -536,12 +560,9 @@ struct request_queue {
#define QUEUE_FLAG_ADD_RANDOM 10 /* Contributes to random pool */
#define QUEUE_FLAG_SYNCHRONOUS 11 /* always completes in submit context */
#define QUEUE_FLAG_SAME_FORCE 12 /* force complete on same CPU */
#define QUEUE_FLAG_HW_WC 13 /* Write back caching supported */
#define QUEUE_FLAG_INIT_DONE 14 /* queue is initialized */
#define QUEUE_FLAG_STABLE_WRITES 15 /* don't modify blks until WB is done */
#define QUEUE_FLAG_POLL 16 /* IO polling enabled if set */
#define QUEUE_FLAG_WC 17 /* Write back caching */
#define QUEUE_FLAG_FUA 18 /* device supports FUA writes */
#define QUEUE_FLAG_DAX 19 /* device supports DAX */
#define QUEUE_FLAG_STATS 20 /* track IO start and completion times */
#define QUEUE_FLAG_REGISTERED 22 /* queue has been registered to a disk */
@ -951,7 +972,6 @@ void queue_limits_stack_bdev(struct queue_limits *t, struct block_device *bdev,
sector_t offset, const char *pfx);
extern void blk_queue_update_dma_pad(struct request_queue *, unsigned int);
extern void blk_queue_rq_timeout(struct request_queue *, unsigned int);
extern void blk_queue_write_cache(struct request_queue *q, bool enabled, bool fua);
struct blk_independent_access_ranges *
disk_alloc_independent_access_ranges(struct gendisk *disk, int nr_ia_ranges);
@ -1304,14 +1324,20 @@ static inline bool bdev_stable_writes(struct block_device *bdev)
return test_bit(QUEUE_FLAG_STABLE_WRITES, &q->queue_flags);
}
static inline bool blk_queue_write_cache(struct request_queue *q)
{
return (q->limits.features & BLK_FEAT_WRITE_CACHE) &&
!(q->limits.flags & BLK_FLAGS_WRITE_CACHE_DISABLED);
}
static inline bool bdev_write_cache(struct block_device *bdev)
{
return test_bit(QUEUE_FLAG_WC, &bdev_get_queue(bdev)->queue_flags);
return blk_queue_write_cache(bdev_get_queue(bdev));
}
static inline bool bdev_fua(struct block_device *bdev)
{
return test_bit(QUEUE_FLAG_FUA, &bdev_get_queue(bdev)->queue_flags);
return bdev_get_queue(bdev)->limits.features & BLK_FEAT_FUA;
}
static inline bool bdev_nowait(struct block_device *bdev)