Merge branch 'for-6.14/block' into for-next

* for-6.14/block: (85 commits)
  nvme-pci: use correct size to free the hmb buffer
  nvme: Add error path for xa_store in nvme_init_effects
  nvme-pci: fix comment typo
  Documentation: Document the NVMe PCI endpoint target driver
  nvmet: New NVMe PCI endpoint function target driver
  nvmet: Implement arbitration feature support
  nvmet: Implement interrupt config feature support
  nvmet: Implement interrupt coalescing feature support
  nvmet: Implement host identifier set feature support
  nvmet: Introduce get/set_feature controller operations
  nvmet: Do not require SGL for PCI target controller commands
  nvmet: Add support for I/O queue management admin commands
  nvmet: Introduce nvmet_sq_create() and nvmet_cq_create()
  nvmet: Introduce nvmet_req_transfer_len()
  nvmet: Improve nvmet_alloc_ctrl() interface and implementation
  nvme: Add PCI transport type
  nvmet: Add drvdata field to struct nvmet_ctrl
  nvmet: Introduce nvmet_get_cmd_effects_admin()
  nvmet: Export nvmet_update_cc() and nvmet_cc_xxx() helpers
  nvmet: Add vendor_id and subsys_vendor_id subsystem attributes
  ...

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe 2025-01-13 07:26:24 -07:00
commit 654081b023
123 changed files with 4648 additions and 1193 deletions


@ -15,6 +15,7 @@ PCI Endpoint Framework
pci-ntb-howto
pci-vntb-function
pci-vntb-howto
pci-nvme-function
function/binding/pci-test
function/binding/pci-ntb


@ -0,0 +1,13 @@
.. SPDX-License-Identifier: GPL-2.0
=================
PCI NVMe Function
=================
:Author: Damien Le Moal <dlemoal@kernel.org>
The PCI NVMe endpoint function implements a PCI NVMe controller using the NVMe
subsystem target core code. The driver for this function resides with the NVMe
subsystem as drivers/nvme/target/nvmet-pciep.c.
See Documentation/nvme/nvme-pci-endpoint-target.rst for more details.


@ -0,0 +1,12 @@
.. SPDX-License-Identifier: GPL-2.0
==============
NVMe Subsystem
==============
.. toctree::
:maxdepth: 2
:numbered:
feature-and-quirk-policy
nvme-pci-endpoint-target


@ -0,0 +1,368 @@
.. SPDX-License-Identifier: GPL-2.0
=================================
NVMe PCI Endpoint Function Target
=================================
:Author: Damien Le Moal <dlemoal@kernel.org>
The NVMe PCI endpoint function target driver implements an NVMe PCIe controller
using an NVMe fabrics target controller configured with the PCI transport type.
Overview
========
The NVMe PCI endpoint function target driver allows exposing an NVMe target
controller over a PCIe link, thus implementing an NVMe PCIe device similar to a
regular M.2 SSD. The target controller is created in the same manner as when
using NVMe over fabrics: the controller represents the interface to an NVMe
subsystem using a port. The port transport type must be configured to be
"pci". The subsystem can be configured to have namespaces backed by regular
files or block devices, or can use NVMe passthrough to expose to the PCI host an
existing physical NVMe device or an NVMe fabrics host controller (e.g. an NVMe
TCP host controller).
The NVMe PCI endpoint function target driver relies as much as possible on the
NVMe target core code to parse and execute NVMe commands submitted by the PCIe
host. However, using the PCI endpoint framework API and DMA API, the driver is
also responsible for managing all data transfers over the PCIe link. This
implies that the NVMe PCI endpoint function target driver implements management
of several NVMe data structures as well as some NVMe command parsing.
1) The driver manages retrieval of NVMe commands in submission queues using DMA
if supported, or MMIO otherwise. Each command retrieved is then executed
using a work item to maximize performance with the parallel execution of
multiple commands on different CPUs. The driver uses a work item to
constantly poll the doorbells of all submission queues to detect command
submissions from the PCIe host.
2) The driver transfers completion queue entries of completed commands to the
PCIe host using an MMIO copy of the entries into the host completion queue.
After posting completion entries in a completion queue, the driver uses the
PCI endpoint framework API to raise an interrupt to the host to signal
command completion.
3) For any command that has a data buffer, the NVMe PCI endpoint target driver
parses the command PRP or SGL list to create a list of PCI address segments
representing the mapping of the command data buffer on the host (see the
sketch after this list). The command data buffer is transferred over the
PCIe link using this list of PCI address segments, with DMA if supported.
If DMA is not supported, MMIO is used instead, which results in poor
performance. For write commands, the command data buffer is transferred from
the host into a local memory buffer before executing the command using the
target core code. For read commands, a local memory buffer is allocated to
execute the command and the content of that buffer is transferred to the
host once the command completes.
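As an illustration of the third point, the following is a minimal, hypothetical
sketch of how page-sized PRP entries could be coalesced into contiguous PCI
address segments. The names (``struct pci_seg``, ``prp_to_segments()``) and the
caller-provided page size are assumptions made for illustration; they are not
the driver's actual data structures or code.

.. code-block:: c

   #include <linux/minmax.h>
   #include <linux/types.h>

   struct pci_seg {
           u64     pci_addr;       /* host PCI address of the segment */
           size_t  len;            /* segment length in bytes */
   };

   /*
    * Coalesce PRP entries (PRP1 first, already read from host memory) into
    * contiguous PCI address segments. Returns the number of segments built;
    * @segs must have room for @nr_prps entries.
    */
   static int prp_to_segments(const u64 *prps, int nr_prps, size_t length,
                              size_t page_size, struct pci_seg *segs)
   {
           int nr_segs = 0, i;

           for (i = 0; i < nr_prps && length; i++) {
                   u64 addr = prps[i];
                   /* Only PRP1 may carry an offset inside a memory page. */
                   size_t len = min_t(size_t, length,
                                      page_size - (addr & (page_size - 1)));

                   if (nr_segs && segs[nr_segs - 1].pci_addr +
                                  segs[nr_segs - 1].len == addr) {
                           /* Contiguous with the previous segment: extend. */
                           segs[nr_segs - 1].len += len;
                   } else {
                           segs[nr_segs].pci_addr = addr;
                           segs[nr_segs].len = len;
                           nr_segs++;
                   }
                   length -= len;
           }
           return nr_segs;
   }

DMA descriptors (or MMIO copies, in the fallback case) can then be issued per
segment rather than per PRP entry.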
Controller Capabilities
-----------------------
The NVMe capabilities exposed to the PCIe host through the BAR 0 registers
are almost identical to the capabilities of the NVMe target controller
implemented by the target core code. There are some exceptions.
1) The NVMe PCI endpoint target driver always sets the controller capability
CQR bit to request "Contiguous Queues Required". This is to facilitate the
mapping of a queue PCI address range to the local CPU address space.
2) The doorbell stride (DSTRD) is always set to 4B (see the sketch after this
list).
3) Since the PCI endpoint framework does not provide a way to handle PCI level
resets, the controller capability NSSRS bit (NVM Subsystem Reset Supported)
is always cleared.
4) The boot partition support (BPS), Persistent Memory Region Supported (PMRS)
and Controller Memory Buffer Supported (CMBS) capabilities are never
reported.
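The sketch below summarizes how these exceptions translate into the CAP register
value exposed through BAR 0. The bit positions follow the NVMe base
specification; the helper itself is hypothetical and is not the driver's actual
code.

.. code-block:: c

   #include <linux/types.h>

   /* Hypothetical helper: post-process the CAP value produced by the NVMe
    * target core code before exposing it through BAR 0. */
   static u64 adjust_ctrl_cap(u64 cap)
   {
           cap |= 1ULL << 16;       /* CQR: contiguous queues required */
           cap &= ~(0xfULL << 32);  /* DSTRD = 0: 4B doorbell stride */
           cap &= ~(1ULL << 36);    /* NSSRS: subsystem reset not supported */
           cap &= ~(1ULL << 45);    /* BPS: no boot partitions */
           cap &= ~(1ULL << 56);    /* PMRS: no persistent memory region */
           cap &= ~(1ULL << 57);    /* CMBS: no controller memory buffer */
           return cap;
   }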
Supported Features
------------------
The NVMe PCI endpoint target driver implements support for both PRPs and SGLs.
The driver also implements IRQ vector coalescing and submission queue
arbitration burst.
The maximum number of queues and the maximum data transfer size (MDTS) are
configurable through configfs before starting the controller. To avoid issues
with excessive local memory usage for executing commands, MDTS defaults to 512
KB and is limited to a maximum of 2 MB (arbitrary limit).
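For reference, MDTS is reported to the host as a power of two of the minimum
memory page size (4 KiB here), so the 512 KB default corresponds to an MDTS
field value of 7. This matches the ``mdts : 7`` and ``mdts 524288 B`` outputs
shown later in this document. A minimal conversion sketch, assuming a 4 KiB
minimum page size (the helper name is illustrative only):

.. code-block:: c

   #include <linux/types.h>

   /* Convert the MDTS field to a size in bytes: mdts_to_bytes(7) == 524288. */
   static inline size_t mdts_to_bytes(unsigned int mdts)
   {
           return (size_t)4096 << mdts;
   }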
Minimum number of PCI Address Mapping Windows Required
------------------------------------------------------
Most PCI endpoint controllers provide a limited number of mapping windows for
mapping a PCI address range to local CPU memory addresses. The NVMe PCI
endpoint target controller uses mapping windows for the following.
1) One memory window for raising MSI or MSI-X interrupts
2) One memory window for MMIO transfers
3) One memory window for each completion queue
Given the highly asynchronous nature of the NVMe PCI endpoint target driver
operation, the memory windows as described above will generally not be used
simultaneously, but that may happen. So a safe maximum number of completion
queues that can be supported is equal to the total number of memory mapping
windows of the PCI endpoint controller minus two. E.g. for an endpoint PCI
controller with 32 outbound memory windows available, up to 30 completion
queues can be safely operated without any risk of getting PCI address mapping
errors due to the lack of memory windows.
Maximum Number of Queue Pairs
-----------------------------
Upon binding of the NVMe PCI endpoint target driver to the PCI endpoint
controller, BAR 0 is allocated with enough space to accommodate the admin queue
and multiple I/O queues. The maximum number of I/O queue pairs that can be
supported is limited by several factors.
1) The NVMe target core code limits the maximum number of I/O queues to the
number of online CPUs.
2) The total number of queue pairs, including the admin queue, cannot exceed
the number of MSI-X or MSI vectors available.
3) The total number of completion queues must not exceed the total number of
PCI mapping windows minus 2 (see above).
The NVMe endpoint function driver allows configuring the maximum number of
queue pairs through configfs.
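Combining these constraints, the effective limit is the minimum of the three
values, minus the admin queue where it applies. The helper below is a
hypothetical sketch; the parameter names are illustrative and do not correspond
to the driver's actual code.

.. code-block:: c

   #include <linux/minmax.h>

   static unsigned int max_io_queue_pairs(unsigned int nr_online_cpus,
                                          unsigned int nr_irq_vectors,
                                          unsigned int nr_map_windows)
   {
           /* 1) Target core: at most one I/O queue pair per online CPU. */
           unsigned int max_qp = nr_online_cpus;

           /* 2) One MSI or MSI-X vector is needed for the admin queue. */
           max_qp = min(max_qp, nr_irq_vectors - 1);

           /* 3) Two windows are reserved for IRQ/MMIO, and one completion
            * queue slot goes to the admin queue. */
           max_qp = min(max_qp, nr_map_windows - 2 - 1);

           return max_qp;
   }

The value configured through configfs caps this result further.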
Limitations and NVMe Specification Non-Compliance
-------------------------------------------------
Similar to the NVMe target core code, the NVMe PCI endpoint target driver does
not support multiple submission queues using the same completion queue. All
submission queues must specify a unique completion queue.
User Guide
==========
This section describes the hardware requirements and how to set up an NVMe PCI
endpoint target device.
Kernel Requirements
-------------------
The kernel must be compiled with the configuration options CONFIG_PCI_ENDPOINT,
CONFIG_PCI_ENDPOINT_CONFIGFS, and CONFIG_NVME_TARGET_PCI_EPF enabled.
CONFIG_PCI, CONFIG_BLK_DEV_NVME and CONFIG_NVME_TARGET must also be enabled
(obviously).
In addition to this, at least one PCI endpoint controller driver should be
available for the endpoint hardware used.
To facilitate testing, enabling the null-blk driver (CONFIG_BLK_DEV_NULL_BLK)
is also recommended. With this, a simple setup using a null_blk block device
as a subsystem namespace can be used.
Hardware Requirements
---------------------
To use the NVMe PCI endpoint target driver, at least one endpoint controller
device is required.
To find the list of endpoint controller devices in the system::
# ls /sys/class/pci_epc/
a40000000.pcie-ep
If PCI_ENDPOINT_CONFIGFS is enabled::
# ls /sys/kernel/config/pci_ep/controllers
a40000000.pcie-ep
The endpoint board must of course also be connected to a host with a PCI cable
with RX-TX signal swapped. If the host PCI slot used does not have
plug-and-play capabilities, the host should be powered off when the NVMe PCI
endpoint device is configured.
NVMe Endpoint Device
--------------------
Creating an NVMe endpoint device is a two-step process. First, an NVMe target
subsystem and port must be defined. Second, the NVMe PCI endpoint device must
be set up and bound to the subsystem and port created.
Creating an NVMe Subsystem and Port
-----------------------------------
Details about how to configure an NVMe target subsystem and port are outside the
scope of this document. The following only provides a simple example of a port
and subsystem with a single namespace backed by a null_blk device.
First, make sure that configfs is enabled::
# mount -t configfs none /sys/kernel/config
Next, create a null_blk device (default settings give a 250 GB device without
memory backing). The block device created will be /dev/nullb0 by default::
# modprobe null_blk
# ls /dev/nullb0
/dev/nullb0
The NVMe PCI endpoint function target driver must be loaded::
# modprobe nvmet_pci_epf
# lsmod | grep nvmet
nvmet_pci_epf 32768 0
nvmet 118784 1 nvmet_pci_epf
nvme_core 131072 2 nvmet_pci_epf,nvmet
Now, create a subsystem and a port that we will use to create a PCI target
controller when setting up the NVMe PCI endpoint target device. In this
example, the subsystem is created with a maximum of 4 I/O queue pairs::
# cd /sys/kernel/config/nvmet/subsystems
# mkdir nvmepf.0.nqn
# echo -n "Linux-pci-epf" > nvmepf.0.nqn/attr_model
# echo "0x1b96" > nvmepf.0.nqn/attr_vendor_id
# echo "0x1b96" > nvmepf.0.nqn/attr_subsys_vendor_id
# echo 1 > nvmepf.0.nqn/attr_allow_any_host
# echo 4 > nvmepf.0.nqn/attr_qid_max
Next, create and enable the subsystem namespace using the null_blk block
device::
# mkdir nvmepf.0.nqn/namespaces/1
# echo -n "/dev/nullb0" > nvmepf.0.nqn/namespaces/1/device_path
# echo 1 > "nvmepf.0.nqn/namespaces/1/enable"
Finally, create the target port and link it to the subsystem::
# cd /sys/kernel/config/nvmet/ports
# mkdir 1
# echo -n "pci" > 1/addr_trtype
# ln -s /sys/kernel/config/nvmet/subsystems/nvmepf.0.nqn \
/sys/kernel/config/nvmet/ports/1/subsystems/nvmepf.0.nqn
Creating an NVMe PCI Endpoint Device
------------------------------------
With the NVMe target subsystem and port ready for use, the NVMe PCI endpoint
device can now be created and enabled. The NVMe PCI endpoint target driver
should already be loaded (that is done automatically when the port is created)::
# ls /sys/kernel/config/pci_ep/functions
nvmet_pci_epf
Next, create function 0::
# cd /sys/kernel/config/pci_ep/functions/nvmet_pci_epf
# mkdir nvmepf.0
# ls nvmepf.0/
baseclass_code msix_interrupts secondary
cache_line_size nvme subclass_code
deviceid primary subsys_id
interrupt_pin progif_code subsys_vendor_id
msi_interrupts revid vendorid
Configure the function using any device ID (the vendor ID for the device will
be automatically set to the same value as the NVMe target subsystem vendor
ID)::
# cd /sys/kernel/config/pci_ep/functions/nvmet_pci_epf
# echo 0xBEEF > nvmepf.0/deviceid
# echo 32 > nvmepf.0/msix_interrupts
If the PCI endpoint controller used does not support MSI-X, MSI can be
configured instead::
# echo 32 > nvmepf.0/msi_interrupts
Next, let's bind our endpoint device with the target subsystem and port that we
created::
# echo 1 > nvmepf.0/nvme/portid
# echo "nvmepf.0.nqn" > nvmepf.0/nvme/subsysnqn
The endpoint function can then be bound to the endpoint controller and the
controller started::
# cd /sys/kernel/config/pci_ep
# ln -s functions/nvmet_pci_epf/nvmepf.0 controllers/a40000000.pcie-ep/
# echo 1 > controllers/a40000000.pcie-ep/start
On the endpoint machine, kernel messages will show information as the NVMe
target device and endpoint device are created and connected.
.. code-block:: text
null_blk: disk nullb0 created
null_blk: module loaded
nvmet: adding nsid 1 to subsystem nvmepf.0.nqn
nvmet_pci_epf nvmet_pci_epf.0: PCI endpoint controller supports MSI-X, 32 vectors
nvmet: Created nvm controller 1 for subsystem nvmepf.0.nqn for NQN nqn.2014-08.org.nvmexpress:uuid:2ab90791-2246-4fbb-961d-4c3d5a5a0176.
nvmet_pci_epf nvmet_pci_epf.0: New PCI ctrl "nvmepf.0.nqn", 4 I/O queues, mdts 524288 B
PCI Root-Complex Host
---------------------
Booting the PCI host will result in the initialization of the PCIe link (this
may be signaled by the PCI endpoint driver with a kernel message). A kernel
message on the endpoint will also signal when the host NVMe driver enables the
device controller::
nvmet_pci_epf nvmet_pci_epf.0: Enabling controller
On the host side, the NVMe PCI endpoint function target device is
discoverable as a PCI device, with the vendor ID and device ID as configured::
# lspci -n
0000:01:00.0 0108: 1b96:beef
And this device will be recognized as an NVMe device with a single namespace::
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
nvme0n1 259:0 0 250G 0 disk
The NVMe endpoint block device can then be used as any other regular NVMe
namespace block device. The *nvme* command line utility can be used to get more
detailed information about the endpoint device::
# nvme id-ctrl /dev/nvme0
NVME Identify Controller:
vid : 0x1b96
ssvid : 0x1b96
sn : 94993c85650ef7bcd625
mn : Linux-pci-epf
fr : 6.13.0-r
rab : 6
ieee : 000000
cmic : 0xb
mdts : 7
cntlid : 0x1
ver : 0x20100
...
Endpoint Bindings
=================
The NVMe PCI endpoint target driver uses the PCI endpoint configfs device
attributes as follows.
================ ===========================================================
vendorid Ignored (the vendor id of the NVMe target subsystem is used)
deviceid Anything is OK (e.g. PCI_ANY_ID)
revid Do not care
progif_code Must be 0x02 (NVM Express)
baseclass_code Must be 0x01 (PCI_BASE_CLASS_STORAGE)
subclass_code Must be 0x08 (Non-Volatile Memory controller)
cache_line_size Do not care
subsys_vendor_id Ignored (the subsystem vendor id of the NVMe target subsystem
is used)
subsys_id Anything is OK (e.g. PCI_ANY_ID)
msi_interrupts At least equal to the number of queue pairs desired
msix_interrupts At least equal to the number of queue pairs desired
interrupt_pin Interrupt PIN to use if MSI and MSI-X are not supported
================ ===========================================================
The NVMe PCI endpoint target function also has some specific configurable
fields defined in the *nvme* subdirectory of the function directory. These
fields are as follows.
================ ===========================================================
mdts_kb Maximum data transfer size in KiB (default: 512)
portid The ID of the target port to use
subsysnqn The NQN of the target subsystem to use
================ ===========================================================


@ -60,6 +60,7 @@ Storage interfaces
cdrom/index
scsi/index
target/index
nvme/index
Other subsystems
----------------


@ -865,7 +865,6 @@ static int ubd_add(int n, char **error_out)
ubd_dev->tag_set.ops = &ubd_mq_ops;
ubd_dev->tag_set.queue_depth = 64;
ubd_dev->tag_set.numa_node = NUMA_NO_NODE;
ubd_dev->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
ubd_dev->tag_set.driver_data = ubd_dev;
ubd_dev->tag_set.nr_hw_queues = 1;


@ -27,8 +27,6 @@ bfq-y := bfq-iosched.o bfq-wf2q.o bfq-cgroup.o
obj-$(CONFIG_IOSCHED_BFQ) += bfq.o
obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o t10-pi.o
obj-$(CONFIG_BLK_MQ_PCI) += blk-mq-pci.o
obj-$(CONFIG_BLK_MQ_VIRTIO) += blk-mq-virtio.o
obj-$(CONFIG_BLK_DEV_ZONED) += blk-zoned.o
obj-$(CONFIG_BLK_WBT) += blk-wbt.o
obj-$(CONFIG_BLK_DEBUG_FS) += blk-mq-debugfs.o


@ -7622,7 +7622,7 @@ static ssize_t bfq_low_latency_store(struct elevator_queue *e,
#define BFQ_ATTR(name) \
__ATTR(name, 0644, bfq_##name##_show, bfq_##name##_store)
static struct elv_fs_entry bfq_attrs[] = {
static const struct elv_fs_entry bfq_attrs[] = {
BFQ_ATTR(fifo_expire_sync),
BFQ_ATTR(fifo_expire_async),
BFQ_ATTR(back_seek_max),


@ -946,8 +946,11 @@ static bool bvec_try_merge_page(struct bio_vec *bv, struct page *page,
/*
* Try to merge a page into a segment, while obeying the hardware segment
* size limit. This is not for normal read/write bios, but for passthrough
* or Zone Append operations that we can't split.
* size limit.
*
* This is kept around for the integrity metadata, which still tries
* to build the initial bio to the hardware limit and doesn't have proper
* helpers to split. Hopefully this will go away soon.
*/
bool bvec_try_merge_hw_page(struct request_queue *q, struct bio_vec *bv,
struct page *page, unsigned len, unsigned offset,
@ -964,106 +967,6 @@ bool bvec_try_merge_hw_page(struct request_queue *q, struct bio_vec *bv,
return bvec_try_merge_page(bv, page, len, offset, same_page);
}
/**
* bio_add_hw_page - attempt to add a page to a bio with hw constraints
* @q: the target queue
* @bio: destination bio
* @page: page to add
* @len: vec entry length
* @offset: vec entry offset
* @max_sectors: maximum number of sectors that can be added
* @same_page: return if the segment has been merged inside the same page
*
* Add a page to a bio while respecting the hardware max_sectors, max_segment
* and gap limitations.
*/
int bio_add_hw_page(struct request_queue *q, struct bio *bio,
struct page *page, unsigned int len, unsigned int offset,
unsigned int max_sectors, bool *same_page)
{
unsigned int max_size = max_sectors << SECTOR_SHIFT;
if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)))
return 0;
len = min3(len, max_size, queue_max_segment_size(q));
if (len > max_size - bio->bi_iter.bi_size)
return 0;
if (bio->bi_vcnt > 0) {
struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
if (bvec_try_merge_hw_page(q, bv, page, len, offset,
same_page)) {
bio->bi_iter.bi_size += len;
return len;
}
if (bio->bi_vcnt >=
min(bio->bi_max_vecs, queue_max_segments(q)))
return 0;
/*
* If the queue doesn't support SG gaps and adding this segment
* would create a gap, disallow it.
*/
if (bvec_gap_to_prev(&q->limits, bv, offset))
return 0;
}
bvec_set_page(&bio->bi_io_vec[bio->bi_vcnt], page, len, offset);
bio->bi_vcnt++;
bio->bi_iter.bi_size += len;
return len;
}
/**
* bio_add_hw_folio - attempt to add a folio to a bio with hw constraints
* @q: the target queue
* @bio: destination bio
* @folio: folio to add
* @len: vec entry length
* @offset: vec entry offset in the folio
* @max_sectors: maximum number of sectors that can be added
* @same_page: return if the segment has been merged inside the same folio
*
* Add a folio to a bio while respecting the hardware max_sectors, max_segment
* and gap limitations.
*/
int bio_add_hw_folio(struct request_queue *q, struct bio *bio,
struct folio *folio, size_t len, size_t offset,
unsigned int max_sectors, bool *same_page)
{
if (len > UINT_MAX || offset > UINT_MAX)
return 0;
return bio_add_hw_page(q, bio, folio_page(folio, 0), len, offset,
max_sectors, same_page);
}
/**
* bio_add_pc_page - attempt to add page to passthrough bio
* @q: the target queue
* @bio: destination bio
* @page: page to add
* @len: vec entry length
* @offset: vec entry offset
*
* Attempt to add a page to the bio_vec maplist. This can fail for a
* number of reasons, such as the bio being full or target block device
* limitations. The target block device must allow bio's up to PAGE_SIZE,
* so it is always possible to add a single page to an empty bio.
*
* This should only be used by passthrough bios.
*/
int bio_add_pc_page(struct request_queue *q, struct bio *bio,
struct page *page, unsigned int len, unsigned int offset)
{
bool same_page = false;
return bio_add_hw_page(q, bio, page, len, offset,
queue_max_hw_sectors(q), &same_page);
}
EXPORT_SYMBOL(bio_add_pc_page);
/**
* __bio_add_page - add page(s) to a bio in a new segment
* @bio: destination bio


@ -630,7 +630,13 @@ static void __submit_bio(struct bio *bio)
} else if (likely(bio_queue_enter(bio) == 0)) {
struct gendisk *disk = bio->bi_bdev->bd_disk;
if ((bio->bi_opf & REQ_POLLED) &&
!(disk->queue->limits.features & BLK_FEAT_POLL)) {
bio->bi_status = BLK_STS_NOTSUPP;
bio_endio(bio);
} else {
disk->fops->submit_bio(bio);
}
blk_queue_exit(disk->queue);
}
@ -805,12 +811,6 @@ void submit_bio_noacct(struct bio *bio)
}
}
if (!(q->limits.features & BLK_FEAT_POLL) &&
(bio->bi_opf & REQ_POLLED)) {
bio_clear_polled(bio);
goto not_supported;
}
switch (bio_op(bio)) {
case REQ_OP_READ:
break;
@ -935,7 +935,7 @@ int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags)
return 0;
q = bdev_get_queue(bdev);
if (cookie == BLK_QC_T_NONE || !(q->limits.features & BLK_FEAT_POLL))
if (cookie == BLK_QC_T_NONE)
return 0;
blk_flush_plug(current->plug, false);
@ -956,7 +956,8 @@ int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags)
} else {
struct gendisk *disk = q->disk;
if (disk && disk->fops->poll_bio)
if ((q->limits.features & BLK_FEAT_POLL) && disk &&
disk->fops->poll_bio)
ret = disk->fops->poll_bio(bio, iob, flags);
}
blk_queue_exit(q);


@ -226,9 +226,7 @@ static ssize_t flag_store(struct device *dev, const char *page, size_t count,
else
lim.integrity.flags |= flag;
blk_mq_freeze_queue(q);
err = queue_limits_commit_update(q, &lim);
blk_mq_unfreeze_queue(q);
err = queue_limits_commit_update_frozen(q, &lim);
if (err)
return err;
return count;


@ -189,7 +189,7 @@ static int bio_copy_user_iov(struct request *rq, struct rq_map_data *map_data,
}
}
if (bio_add_pc_page(rq->q, bio, page, bytes, offset) < bytes) {
if (bio_add_page(bio, page, bytes, offset) < bytes) {
if (!map_data)
__free_page(page);
break;
@ -272,86 +272,27 @@ static struct bio *blk_rq_map_bio_alloc(struct request *rq,
static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
gfp_t gfp_mask)
{
iov_iter_extraction_t extraction_flags = 0;
unsigned int max_sectors = queue_max_hw_sectors(rq->q);
unsigned int nr_vecs = iov_iter_npages(iter, BIO_MAX_VECS);
struct bio *bio;
int ret;
int j;
if (!iov_iter_count(iter))
return -EINVAL;
bio = blk_rq_map_bio_alloc(rq, nr_vecs, gfp_mask);
if (bio == NULL)
if (!bio)
return -ENOMEM;
if (blk_queue_pci_p2pdma(rq->q))
extraction_flags |= ITER_ALLOW_P2PDMA;
if (iov_iter_extract_will_pin(iter))
bio_set_flag(bio, BIO_PAGE_PINNED);
while (iov_iter_count(iter)) {
struct page *stack_pages[UIO_FASTIOV];
struct page **pages = stack_pages;
ssize_t bytes;
size_t offs;
int npages;
if (nr_vecs > ARRAY_SIZE(stack_pages))
pages = NULL;
bytes = iov_iter_extract_pages(iter, &pages, LONG_MAX,
nr_vecs, extraction_flags, &offs);
if (unlikely(bytes <= 0)) {
ret = bytes ? bytes : -EFAULT;
goto out_unmap;
}
npages = DIV_ROUND_UP(offs + bytes, PAGE_SIZE);
if (unlikely(offs & queue_dma_alignment(rq->q)))
j = 0;
else {
for (j = 0; j < npages; j++) {
struct page *page = pages[j];
unsigned int n = PAGE_SIZE - offs;
bool same_page = false;
if (n > bytes)
n = bytes;
if (!bio_add_hw_page(rq->q, bio, page, n, offs,
max_sectors, &same_page))
break;
if (same_page)
bio_release_page(bio, page);
bytes -= n;
offs = 0;
}
}
/*
* release the pages we didn't map into the bio, if any
*/
while (j < npages)
bio_release_page(bio, pages[j++]);
if (pages != stack_pages)
kvfree(pages);
/* couldn't stuff something into bio? */
if (bytes) {
iov_iter_revert(iter, bytes);
break;
}
}
ret = bio_iov_iter_get_pages(bio, iter);
if (ret)
goto out_put;
ret = blk_rq_append_bio(rq, bio);
if (ret)
goto out_unmap;
goto out_release;
return 0;
out_unmap:
out_release:
bio_release_pages(bio, false);
out_put:
blk_mq_map_bio_put(bio);
return ret;
}
@ -422,8 +363,7 @@ static struct bio *bio_map_kern(struct request_queue *q, void *data,
page = virt_to_page(data);
else
page = vmalloc_to_page(data);
if (bio_add_pc_page(q, bio, page, bytes,
offset) < bytes) {
if (bio_add_page(bio, page, bytes, offset) < bytes) {
/* we don't support partial mappings */
bio_uninit(bio);
kfree(bio);
@ -507,7 +447,7 @@ static struct bio *bio_copy_kern(struct request_queue *q, void *data,
if (!reading)
memcpy(page_address(page), p, bytes);
if (bio_add_pc_page(q, bio, page, bytes, 0) < bytes)
if (bio_add_page(bio, page, bytes, 0) < bytes)
break;
len -= bytes;
@ -536,24 +476,33 @@ cleanup:
*/
int blk_rq_append_bio(struct request *rq, struct bio *bio)
{
struct bvec_iter iter;
struct bio_vec bv;
const struct queue_limits *lim = &rq->q->limits;
unsigned int max_bytes = lim->max_hw_sectors << SECTOR_SHIFT;
unsigned int nr_segs = 0;
int ret;
bio_for_each_bvec(bv, bio, iter)
nr_segs++;
/* check that the data layout matches the hardware restrictions */
ret = bio_split_rw_at(bio, lim, &nr_segs, max_bytes);
if (ret) {
/* if we would have to split the bio, copy instead */
if (ret > 0)
ret = -EREMOTEIO;
return ret;
}
if (!rq->bio) {
blk_rq_bio_prep(rq, bio, nr_segs);
} else {
if (rq->bio) {
if (!ll_back_merge_fn(rq, bio, nr_segs))
return -EINVAL;
rq->biotail->bi_next = bio;
rq->biotail = bio;
rq->__data_len += (bio)->bi_iter.bi_size;
rq->__data_len += bio->bi_iter.bi_size;
bio_crypt_free_ctx(bio);
return 0;
}
rq->nr_phys_segments = nr_segs;
rq->bio = rq->biotail = bio;
rq->__data_len = bio->bi_iter.bi_size;
return 0;
}
EXPORT_SYMBOL(blk_rq_append_bio);
@ -561,9 +510,7 @@ EXPORT_SYMBOL(blk_rq_append_bio);
/* Prepare bio for passthrough IO given ITER_BVEC iter */
static int blk_rq_map_user_bvec(struct request *rq, const struct iov_iter *iter)
{
const struct queue_limits *lim = &rq->q->limits;
unsigned int max_bytes = lim->max_hw_sectors << SECTOR_SHIFT;
unsigned int nsegs;
unsigned int max_bytes = rq->q->limits.max_hw_sectors << SECTOR_SHIFT;
struct bio *bio;
int ret;
@ -576,18 +523,10 @@ static int blk_rq_map_user_bvec(struct request *rq, const struct iov_iter *iter)
return -ENOMEM;
bio_iov_bvec_set(bio, iter);
/* check that the data layout matches the hardware restrictions */
ret = bio_split_rw_at(bio, lim, &nsegs, max_bytes);
if (ret) {
/* if we would have to split the bio, copy instead */
if (ret > 0)
ret = -EREMOTEIO;
ret = blk_rq_append_bio(rq, bio);
if (ret)
blk_mq_map_bio_put(bio);
return ret;
}
blk_rq_bio_prep(rq, bio, nsegs);
return 0;
}
/**
@ -644,8 +583,11 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
ret = bio_copy_user_iov(rq, map_data, &i, gfp_mask);
else
ret = bio_map_user_iov(rq, &i, gfp_mask);
if (ret)
if (ret) {
if (ret == -EREMOTEIO)
ret = -EINVAL;
goto unmap_rq;
}
if (!bio)
bio = rq->bio;
} while (iov_iter_count(&i));


@ -473,6 +473,63 @@ unsigned int blk_recalc_rq_segments(struct request *rq)
return nr_phys_segs;
}
struct phys_vec {
phys_addr_t paddr;
u32 len;
};
static bool blk_map_iter_next(struct request *req,
struct req_iterator *iter, struct phys_vec *vec)
{
unsigned int max_size;
struct bio_vec bv;
if (req->rq_flags & RQF_SPECIAL_PAYLOAD) {
if (!iter->bio)
return false;
vec->paddr = bvec_phys(&req->special_vec);
vec->len = req->special_vec.bv_len;
iter->bio = NULL;
return true;
}
if (!iter->iter.bi_size)
return false;
bv = mp_bvec_iter_bvec(iter->bio->bi_io_vec, iter->iter);
vec->paddr = bvec_phys(&bv);
max_size = get_max_segment_size(&req->q->limits, vec->paddr, UINT_MAX);
bv.bv_len = min(bv.bv_len, max_size);
bio_advance_iter_single(iter->bio, &iter->iter, bv.bv_len);
/*
* If we are entirely done with this bi_io_vec entry, check if the next
* one could be merged into it. This typically happens when moving to
* the next bio, but some callers also don't pack bvecs tight.
*/
while (!iter->iter.bi_size || !iter->iter.bi_bvec_done) {
struct bio_vec next;
if (!iter->iter.bi_size) {
if (!iter->bio->bi_next)
break;
iter->bio = iter->bio->bi_next;
iter->iter = iter->bio->bi_iter;
}
next = mp_bvec_iter_bvec(iter->bio->bi_io_vec, iter->iter);
if (bv.bv_len + next.bv_len > max_size ||
!biovec_phys_mergeable(req->q, &bv, &next))
break;
bv.bv_len += next.bv_len;
bio_advance_iter_single(iter->bio, &iter->iter, next.bv_len);
}
vec->len = bv.bv_len;
return true;
}
static inline struct scatterlist *blk_next_sg(struct scatterlist **sg,
struct scatterlist *sglist)
{
@ -490,120 +547,26 @@ static inline struct scatterlist *blk_next_sg(struct scatterlist **sg,
return sg_next(*sg);
}
static unsigned blk_bvec_map_sg(struct request_queue *q,
struct bio_vec *bvec, struct scatterlist *sglist,
struct scatterlist **sg)
{
unsigned nbytes = bvec->bv_len;
unsigned nsegs = 0, total = 0;
while (nbytes > 0) {
unsigned offset = bvec->bv_offset + total;
unsigned len = get_max_segment_size(&q->limits,
bvec_phys(bvec) + total, nbytes);
struct page *page = bvec->bv_page;
/*
* Unfortunately a fair number of drivers barf on scatterlists
* that have an offset larger than PAGE_SIZE, despite other
* subsystems dealing with that invariant just fine. For now
* stick to the legacy format where we never present those from
* the block layer, but the code below should be removed once
* these offenders (mostly MMC/SD drivers) are fixed.
*/
page += (offset >> PAGE_SHIFT);
offset &= ~PAGE_MASK;
*sg = blk_next_sg(sg, sglist);
sg_set_page(*sg, page, len, offset);
total += len;
nbytes -= len;
nsegs++;
}
return nsegs;
}
static inline int __blk_bvec_map_sg(struct bio_vec bv,
struct scatterlist *sglist, struct scatterlist **sg)
{
*sg = blk_next_sg(sg, sglist);
sg_set_page(*sg, bv.bv_page, bv.bv_len, bv.bv_offset);
return 1;
}
/* only try to merge bvecs into one sg if they are from two bios */
static inline bool
__blk_segment_map_sg_merge(struct request_queue *q, struct bio_vec *bvec,
struct bio_vec *bvprv, struct scatterlist **sg)
{
int nbytes = bvec->bv_len;
if (!*sg)
return false;
if ((*sg)->length + nbytes > queue_max_segment_size(q))
return false;
if (!biovec_phys_mergeable(q, bvprv, bvec))
return false;
(*sg)->length += nbytes;
return true;
}
static int __blk_bios_map_sg(struct request_queue *q, struct bio *bio,
struct scatterlist *sglist,
struct scatterlist **sg)
{
struct bio_vec bvec, bvprv = { NULL };
struct bvec_iter iter;
int nsegs = 0;
bool new_bio = false;
for_each_bio(bio) {
bio_for_each_bvec(bvec, bio, iter) {
/*
* Only try to merge bvecs from two bios given we
* have done bio internal merge when adding pages
* to bio
*/
if (new_bio &&
__blk_segment_map_sg_merge(q, &bvec, &bvprv, sg))
goto next_bvec;
if (bvec.bv_offset + bvec.bv_len <= PAGE_SIZE)
nsegs += __blk_bvec_map_sg(bvec, sglist, sg);
else
nsegs += blk_bvec_map_sg(q, &bvec, sglist, sg);
next_bvec:
new_bio = false;
}
if (likely(bio->bi_iter.bi_size)) {
bvprv = bvec;
new_bio = true;
}
}
return nsegs;
}
/*
* map a request to scatterlist, return number of sg entries setup. Caller
* must make sure sg can hold rq->nr_phys_segments entries
* Map a request to scatterlist, return number of sg entries setup. Caller
* must make sure sg can hold rq->nr_phys_segments entries.
*/
int __blk_rq_map_sg(struct request_queue *q, struct request *rq,
struct scatterlist *sglist, struct scatterlist **last_sg)
{
struct req_iterator iter = {
.bio = rq->bio,
.iter = rq->bio->bi_iter,
};
struct phys_vec vec;
int nsegs = 0;
if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
nsegs = __blk_bvec_map_sg(rq->special_vec, sglist, last_sg);
else if (rq->bio)
nsegs = __blk_bios_map_sg(q, rq->bio, sglist, last_sg);
while (blk_map_iter_next(rq, &iter, &vec)) {
*last_sg = blk_next_sg(last_sg, sglist);
sg_set_page(*last_sg, phys_to_page(vec.paddr), vec.len,
offset_in_page(vec.paddr));
nsegs++;
}
if (*last_sg)
sg_mark_end(*last_sg);


@ -11,6 +11,7 @@
#include <linux/smp.h>
#include <linux/cpu.h>
#include <linux/group_cpus.h>
#include <linux/device/bus.h>
#include "blk.h"
#include "blk-mq.h"
@ -54,3 +55,39 @@ int blk_mq_hw_queue_to_node(struct blk_mq_queue_map *qmap, unsigned int index)
return NUMA_NO_NODE;
}
/**
* blk_mq_map_hw_queues - Create CPU to hardware queue mapping
* @qmap: CPU to hardware queue map
* @dev: The device to map queues
* @offset: Queue offset to use for the device
*
* Create a CPU to hardware queue mapping in @qmap. The struct bus_type
* irq_get_affinity callback will be used to retrieve the affinity.
*/
void blk_mq_map_hw_queues(struct blk_mq_queue_map *qmap,
struct device *dev, unsigned int offset)
{
const struct cpumask *mask;
unsigned int queue, cpu;
if (!dev->bus->irq_get_affinity)
goto fallback;
for (queue = 0; queue < qmap->nr_queues; queue++) {
mask = dev->bus->irq_get_affinity(dev, queue + offset);
if (!mask)
goto fallback;
for_each_cpu(cpu, mask)
qmap->mq_map[cpu] = qmap->queue_offset + queue;
}
return;
fallback:
WARN_ON_ONCE(qmap->nr_queues > 1);
blk_mq_clear_mq_map(qmap);
}
EXPORT_SYMBOL_GPL(blk_mq_map_hw_queues);
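For reference, a usage sketch of the new helper (the driver and structure names
below are hypothetical, not part of this series): a PCI driver's .map_queues
callback can pass the generic struct device instead of calling the removed
blk_mq_pci_map_queues()/blk_mq_virtio_map_queues() helpers.
#include <linux/blk-mq.h>
#include <linux/pci.h>
/* Hypothetical driver glue: map hardware queues using the bus-provided
 * irq_get_affinity callback via the generic helper. */
struct foo_dev {
        struct pci_dev *pdev;
        struct blk_mq_tag_set tag_set;
};
static void foo_map_queues(struct blk_mq_tag_set *set)
{
        struct foo_dev *foo = set->driver_data;
        blk_mq_map_hw_queues(&set->map[HCTX_TYPE_DEFAULT],
                             &foo->pdev->dev, 0);
}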


@ -172,21 +172,13 @@ static int hctx_state_show(void *data, struct seq_file *m)
return 0;
}
#define BLK_TAG_ALLOC_NAME(name) [BLK_TAG_ALLOC_##name] = #name
static const char *const alloc_policy_name[] = {
BLK_TAG_ALLOC_NAME(FIFO),
BLK_TAG_ALLOC_NAME(RR),
};
#undef BLK_TAG_ALLOC_NAME
#define HCTX_FLAG_NAME(name) [ilog2(BLK_MQ_F_##name)] = #name
static const char *const hctx_flag_name[] = {
HCTX_FLAG_NAME(SHOULD_MERGE),
HCTX_FLAG_NAME(TAG_QUEUE_SHARED),
HCTX_FLAG_NAME(STACKING),
HCTX_FLAG_NAME(TAG_HCTX_SHARED),
HCTX_FLAG_NAME(BLOCKING),
HCTX_FLAG_NAME(NO_SCHED),
HCTX_FLAG_NAME(TAG_RR),
HCTX_FLAG_NAME(NO_SCHED_BY_DEFAULT),
};
#undef HCTX_FLAG_NAME
@ -194,22 +186,11 @@ static const char *const hctx_flag_name[] = {
static int hctx_flags_show(void *data, struct seq_file *m)
{
struct blk_mq_hw_ctx *hctx = data;
const int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(hctx->flags);
BUILD_BUG_ON(ARRAY_SIZE(hctx_flag_name) !=
BLK_MQ_F_ALLOC_POLICY_START_BIT);
BUILD_BUG_ON(ARRAY_SIZE(alloc_policy_name) != BLK_TAG_ALLOC_MAX);
BUILD_BUG_ON(ARRAY_SIZE(hctx_flag_name) != ilog2(BLK_MQ_F_MAX));
seq_puts(m, "alloc_policy=");
if (alloc_policy < ARRAY_SIZE(alloc_policy_name) &&
alloc_policy_name[alloc_policy])
seq_puts(m, alloc_policy_name[alloc_policy]);
else
seq_printf(m, "%d", alloc_policy);
seq_puts(m, " ");
blk_flags_show(m,
hctx->flags ^ BLK_ALLOC_POLICY_TO_MQ_FLAG(alloc_policy),
hctx_flag_name, ARRAY_SIZE(hctx_flag_name));
blk_flags_show(m, hctx->flags, hctx_flag_name,
ARRAY_SIZE(hctx_flag_name));
seq_puts(m, "\n");
return 0;
}


@ -1,46 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (c) 2016 Christoph Hellwig.
*/
#include <linux/kobject.h>
#include <linux/blkdev.h>
#include <linux/blk-mq-pci.h>
#include <linux/pci.h>
#include <linux/module.h>
#include "blk-mq.h"
/**
* blk_mq_pci_map_queues - provide a default queue mapping for PCI device
* @qmap: CPU to hardware queue map.
* @pdev: PCI device associated with @set.
* @offset: Offset to use for the pci irq vector
*
* This function assumes the PCI device @pdev has at least as many available
* interrupt vectors as @set has queues. It will then query the vector
* corresponding to each queue for it's affinity mask and built queue mapping
* that maps a queue to the CPUs that have irq affinity for the corresponding
* vector.
*/
void blk_mq_pci_map_queues(struct blk_mq_queue_map *qmap, struct pci_dev *pdev,
int offset)
{
const struct cpumask *mask;
unsigned int queue, cpu;
for (queue = 0; queue < qmap->nr_queues; queue++) {
mask = pci_irq_get_affinity(pdev, queue + offset);
if (!mask)
goto fallback;
for_each_cpu(cpu, mask)
qmap->mq_map[cpu] = qmap->queue_offset + queue;
}
return;
fallback:
WARN_ON_ONCE(qmap->nr_queues > 1);
blk_mq_clear_mq_map(qmap);
}
EXPORT_SYMBOL_GPL(blk_mq_pci_map_queues);


@ -351,8 +351,7 @@ bool blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
ctx = blk_mq_get_ctx(q);
hctx = blk_mq_map_queue(q, bio->bi_opf, ctx);
type = hctx->type;
if (!(hctx->flags & BLK_MQ_F_SHOULD_MERGE) ||
list_empty_careful(&ctx->rq_lists[type]))
if (list_empty_careful(&ctx->rq_lists[type]))
goto out_put;
/* default per sw-queue merge */


@ -544,30 +544,11 @@ static int bt_alloc(struct sbitmap_queue *bt, unsigned int depth,
node);
}
int blk_mq_init_bitmaps(struct sbitmap_queue *bitmap_tags,
struct sbitmap_queue *breserved_tags,
unsigned int queue_depth, unsigned int reserved,
int node, int alloc_policy)
{
unsigned int depth = queue_depth - reserved;
bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
if (bt_alloc(bitmap_tags, depth, round_robin, node))
return -ENOMEM;
if (bt_alloc(breserved_tags, reserved, round_robin, node))
goto free_bitmap_tags;
return 0;
free_bitmap_tags:
sbitmap_queue_free(bitmap_tags);
return -ENOMEM;
}
struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
unsigned int reserved_tags,
int node, int alloc_policy)
unsigned int reserved_tags, unsigned int flags, int node)
{
unsigned int depth = total_tags - reserved_tags;
bool round_robin = flags & BLK_MQ_F_TAG_RR;
struct blk_mq_tags *tags;
if (total_tags > BLK_MQ_TAG_MAX) {
@ -582,14 +563,18 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
tags->nr_tags = total_tags;
tags->nr_reserved_tags = reserved_tags;
spin_lock_init(&tags->lock);
if (bt_alloc(&tags->bitmap_tags, depth, round_robin, node))
goto out_free_tags;
if (bt_alloc(&tags->breserved_tags, reserved_tags, round_robin, node))
goto out_free_bitmap_tags;
if (blk_mq_init_bitmaps(&tags->bitmap_tags, &tags->breserved_tags,
total_tags, reserved_tags, node,
alloc_policy) < 0) {
return tags;
out_free_bitmap_tags:
sbitmap_queue_free(&tags->bitmap_tags);
out_free_tags:
kfree(tags);
return NULL;
}
return tags;
}
void blk_mq_free_tags(struct blk_mq_tags *tags)


@ -1,46 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (c) 2016 Christoph Hellwig.
*/
#include <linux/device.h>
#include <linux/blk-mq-virtio.h>
#include <linux/virtio_config.h>
#include <linux/module.h>
#include "blk-mq.h"
/**
* blk_mq_virtio_map_queues - provide a default queue mapping for virtio device
* @qmap: CPU to hardware queue map.
* @vdev: virtio device to provide a mapping for.
* @first_vec: first interrupt vectors to use for queues (usually 0)
*
* This function assumes the virtio device @vdev has at least as many available
* interrupt vectors as @set has queues. It will then query the vector
* corresponding to each queue for it's affinity mask and built queue mapping
* that maps a queue to the CPUs that have irq affinity for the corresponding
* vector.
*/
void blk_mq_virtio_map_queues(struct blk_mq_queue_map *qmap,
struct virtio_device *vdev, int first_vec)
{
const struct cpumask *mask;
unsigned int queue, cpu;
if (!vdev->config->get_vq_affinity)
goto fallback;
for (queue = 0; queue < qmap->nr_queues; queue++) {
mask = vdev->config->get_vq_affinity(vdev, first_vec + queue);
if (!mask)
goto fallback;
for_each_cpu(cpu, mask)
qmap->mq_map[cpu] = qmap->queue_offset + queue;
}
return;
fallback:
blk_mq_map_queues(qmap);
}
EXPORT_SYMBOL_GPL(blk_mq_virtio_map_queues);


@ -131,6 +131,10 @@ static bool blk_freeze_set_owner(struct request_queue *q,
if (!q->mq_freeze_depth) {
q->mq_freeze_owner = owner;
q->mq_freeze_owner_depth = 1;
q->mq_freeze_disk_dead = !q->disk ||
test_bit(GD_DEAD, &q->disk->state) ||
!blk_queue_registered(q);
q->mq_freeze_queue_dying = blk_queue_dying(q);
return true;
}
@ -142,8 +146,6 @@ static bool blk_freeze_set_owner(struct request_queue *q,
/* verify the last unfreeze in owner context */
static bool blk_unfreeze_check_owner(struct request_queue *q)
{
if (!q->mq_freeze_owner)
return false;
if (q->mq_freeze_owner != current)
return false;
if (--q->mq_freeze_owner_depth == 0) {
@ -189,7 +191,7 @@ bool __blk_freeze_queue_start(struct request_queue *q,
void blk_freeze_queue_start(struct request_queue *q)
{
if (__blk_freeze_queue_start(q, current))
blk_freeze_acquire_lock(q, false, false);
blk_freeze_acquire_lock(q);
}
EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
@ -237,7 +239,7 @@ bool __blk_mq_unfreeze_queue(struct request_queue *q, bool force_atomic)
void blk_mq_unfreeze_queue(struct request_queue *q)
{
if (__blk_mq_unfreeze_queue(q, false))
blk_unfreeze_release_lock(q, false, false);
blk_unfreeze_release_lock(q);
}
EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
@ -2656,8 +2658,10 @@ static void blk_mq_bio_to_request(struct request *rq, struct bio *bio,
if (bio->bi_opf & REQ_RAHEAD)
rq->cmd_flags |= REQ_FAILFAST_MASK;
rq->bio = rq->biotail = bio;
rq->__sector = bio->bi_iter.bi_sector;
blk_rq_bio_prep(rq, bio, nr_segs);
rq->__data_len = bio->bi_iter.bi_size;
rq->nr_phys_segments = nr_segs;
if (bio_integrity(bio))
rq->nr_integrity_segments = blk_rq_count_integrity_sg(rq->q,
bio);
@ -3092,14 +3096,21 @@ void blk_mq_submit_bio(struct bio *bio)
}
/*
* Device reconfiguration may change logical block size, so alignment
* check has to be done with queue usage counter held
* Device reconfiguration may change logical block size or reduce the
* number of poll queues, so the checks for alignment and poll support
* have to be done with queue usage counter held.
*/
if (unlikely(bio_unaligned(bio, q))) {
bio_io_error(bio);
goto queue_exit;
}
if ((bio->bi_opf & REQ_POLLED) && !blk_mq_can_poll(q)) {
bio->bi_status = BLK_STS_NOTSUPP;
bio_endio(bio);
goto queue_exit;
}
bio = __bio_split_to_limits(bio, &q->limits, &nr_segs);
if (!bio)
goto queue_exit;
@ -3472,8 +3483,7 @@ static struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
if (node == NUMA_NO_NODE)
node = set->numa_node;
tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags));
tags = blk_mq_init_tags(nr_tags, reserved_tags, set->flags, node);
if (!tags)
return NULL;
@ -4317,12 +4327,6 @@ void blk_mq_release(struct request_queue *q)
blk_mq_sysfs_deinit(q);
}
static bool blk_mq_can_poll(struct blk_mq_tag_set *set)
{
return set->nr_maps > HCTX_TYPE_POLL &&
set->map[HCTX_TYPE_POLL].nr_queues;
}
struct request_queue *blk_mq_alloc_queue(struct blk_mq_tag_set *set,
struct queue_limits *lim, void *queuedata)
{
@ -4333,7 +4337,7 @@ struct request_queue *blk_mq_alloc_queue(struct blk_mq_tag_set *set,
if (!lim)
lim = &default_lim;
lim->features |= BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT;
if (blk_mq_can_poll(set))
if (set->nr_maps > HCTX_TYPE_POLL)
lim->features |= BLK_FEAT_POLL;
q = blk_alloc_queue(lim, set->numa_node);
@ -5021,8 +5025,6 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
fallback:
blk_mq_update_queue_map(set);
list_for_each_entry(q, &set->tag_list, tag_set_list) {
struct queue_limits lim;
blk_mq_realloc_hw_ctxs(set, q);
if (q->nr_hw_queues != set->nr_hw_queues) {
@ -5036,13 +5038,6 @@ fallback:
set->nr_hw_queues = prev_nr_hw_queues;
goto fallback;
}
lim = queue_limits_start_update(q);
if (blk_mq_can_poll(set))
lim.features |= BLK_FEAT_POLL;
else
lim.features &= ~BLK_FEAT_POLL;
if (queue_limits_commit_update(q, &lim) < 0)
pr_warn("updating the poll flag failed\n");
blk_mq_map_swqueue(q);
}
@ -5102,9 +5097,9 @@ static int blk_hctx_poll(struct request_queue *q, struct blk_mq_hw_ctx *hctx,
int blk_mq_poll(struct request_queue *q, blk_qc_t cookie,
struct io_comp_batch *iob, unsigned int flags)
{
struct blk_mq_hw_ctx *hctx = xa_load(&q->hctx_table, cookie);
return blk_hctx_poll(q, hctx, iob, flags);
if (!blk_mq_can_poll(q))
return 0;
return blk_hctx_poll(q, xa_load(&q->hctx_table, cookie), iob, flags);
}
int blk_rq_poll(struct request *rq, struct io_comp_batch *iob,


@ -163,11 +163,8 @@ struct blk_mq_alloc_data {
};
struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
unsigned int reserved_tags, int node, int alloc_policy);
unsigned int reserved_tags, unsigned int flags, int node);
void blk_mq_free_tags(struct blk_mq_tags *tags);
int blk_mq_init_bitmaps(struct sbitmap_queue *bitmap_tags,
struct sbitmap_queue *breserved_tags, unsigned int queue_depth,
unsigned int reserved, int node, int alloc_policy);
unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
unsigned long blk_mq_get_tags(struct blk_mq_alloc_data *data, int nr_tags,
@ -451,4 +448,10 @@ do { \
#define blk_mq_run_dispatch_ops(q, dispatch_ops) \
__blk_mq_run_dispatch_ops(q, true, dispatch_ops) \
static inline bool blk_mq_can_poll(struct request_queue *q)
{
return (q->limits.features & BLK_FEAT_POLL) &&
q->tag_set->map[HCTX_TYPE_POLL].nr_queues;
}
#endif


@ -413,7 +413,8 @@ int blk_set_default_limits(struct queue_limits *lim)
* @lim: limits to apply
*
* Apply the limits in @lim that were obtained from queue_limits_start_update()
* and updated by the caller to @q.
* and updated by the caller to @q. The caller must have frozen the queue or
* ensure that there are no outstanding I/Os by other means.
*
* Returns 0 if successful, else a negative error code.
*/
@ -443,6 +444,30 @@ out_unlock:
}
EXPORT_SYMBOL_GPL(queue_limits_commit_update);
/**
* queue_limits_commit_update_frozen - commit an atomic update of queue limits
* @q: queue to update
* @lim: limits to apply
*
* Apply the limits in @lim that were obtained from queue_limits_start_update()
* and updated with the new values by the caller to @q. Freezes the queue
* before the update and unfreezes it after.
*
* Returns 0 if successful, else a negative error code.
*/
int queue_limits_commit_update_frozen(struct request_queue *q,
struct queue_limits *lim)
{
int ret;
blk_mq_freeze_queue(q);
ret = queue_limits_commit_update(q, lim);
blk_mq_unfreeze_queue(q);
return ret;
}
EXPORT_SYMBOL_GPL(queue_limits_commit_update_frozen);
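A usage sketch of the new helper (the wrapper function below is hypothetical):
callers that previously open-coded the freeze/commit/unfreeze sequence around
queue_limits_commit_update() can now write:
#include <linux/blkdev.h>
/* Hypothetical caller: update a single limit; the helper freezes the queue
 * around the commit and unfreezes it afterwards. */
static int foo_set_max_user_sectors(struct request_queue *q,
                                    unsigned int sectors)
{
        struct queue_limits lim = queue_limits_start_update(q);
        lim.max_user_sectors = sectors;
        return queue_limits_commit_update_frozen(q, &lim);
}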
/**
* queue_limits_set - apply queue limits to queue
* @q: queue to update


@ -24,6 +24,8 @@ struct queue_sysfs_entry {
struct attribute attr;
ssize_t (*show)(struct gendisk *disk, char *page);
ssize_t (*store)(struct gendisk *disk, const char *page, size_t count);
int (*store_limit)(struct gendisk *disk, const char *page,
size_t count, struct queue_limits *lim);
void (*load_module)(struct gendisk *disk, const char *page, size_t count);
};
@ -153,13 +155,11 @@ QUEUE_SYSFS_SHOW_CONST(discard_zeroes_data, 0)
QUEUE_SYSFS_SHOW_CONST(write_same_max, 0)
QUEUE_SYSFS_SHOW_CONST(poll_delay, -1)
static ssize_t queue_max_discard_sectors_store(struct gendisk *disk,
const char *page, size_t count)
static int queue_max_discard_sectors_store(struct gendisk *disk,
const char *page, size_t count, struct queue_limits *lim)
{
unsigned long max_discard_bytes;
struct queue_limits lim;
ssize_t ret;
int err;
ret = queue_var_store(&max_discard_bytes, page, count);
if (ret < 0)
@ -171,38 +171,28 @@ static ssize_t queue_max_discard_sectors_store(struct gendisk *disk,
if ((max_discard_bytes >> SECTOR_SHIFT) > UINT_MAX)
return -EINVAL;
lim = queue_limits_start_update(disk->queue);
lim.max_user_discard_sectors = max_discard_bytes >> SECTOR_SHIFT;
err = queue_limits_commit_update(disk->queue, &lim);
if (err)
return err;
return ret;
lim->max_user_discard_sectors = max_discard_bytes >> SECTOR_SHIFT;
return 0;
}
static ssize_t
queue_max_sectors_store(struct gendisk *disk, const char *page, size_t count)
static int
queue_max_sectors_store(struct gendisk *disk, const char *page, size_t count,
struct queue_limits *lim)
{
unsigned long max_sectors_kb;
struct queue_limits lim;
ssize_t ret;
int err;
ret = queue_var_store(&max_sectors_kb, page, count);
if (ret < 0)
return ret;
lim = queue_limits_start_update(disk->queue);
lim.max_user_sectors = max_sectors_kb << 1;
err = queue_limits_commit_update(disk->queue, &lim);
if (err)
return err;
return ret;
lim->max_user_sectors = max_sectors_kb << 1;
return 0;
}
static ssize_t queue_feature_store(struct gendisk *disk, const char *page,
size_t count, blk_features_t feature)
size_t count, struct queue_limits *lim, blk_features_t feature)
{
struct queue_limits lim;
unsigned long val;
ssize_t ret;
@ -210,15 +200,11 @@ static ssize_t queue_feature_store(struct gendisk *disk, const char *page,
if (ret < 0)
return ret;
lim = queue_limits_start_update(disk->queue);
if (val)
lim.features |= feature;
lim->features |= feature;
else
lim.features &= ~feature;
ret = queue_limits_commit_update(disk->queue, &lim);
if (ret)
return ret;
return count;
lim->features &= ~feature;
return 0;
}
#define QUEUE_SYSFS_FEATURE(_name, _feature) \
@ -227,10 +213,10 @@ static ssize_t queue_##_name##_show(struct gendisk *disk, char *page) \
return sysfs_emit(page, "%u\n", \
!!(disk->queue->limits.features & _feature)); \
} \
static ssize_t queue_##_name##_store(struct gendisk *disk, \
const char *page, size_t count) \
static int queue_##_name##_store(struct gendisk *disk, \
const char *page, size_t count, struct queue_limits *lim) \
{ \
return queue_feature_store(disk, page, count, _feature); \
return queue_feature_store(disk, page, count, lim, _feature); \
}
QUEUE_SYSFS_FEATURE(rotational, BLK_FEAT_ROTATIONAL)
@ -245,10 +231,17 @@ static ssize_t queue_##_name##_show(struct gendisk *disk, char *page) \
!!(disk->queue->limits.features & _feature)); \
}
QUEUE_SYSFS_FEATURE_SHOW(poll, BLK_FEAT_POLL);
QUEUE_SYSFS_FEATURE_SHOW(fua, BLK_FEAT_FUA);
QUEUE_SYSFS_FEATURE_SHOW(dax, BLK_FEAT_DAX);
static ssize_t queue_poll_show(struct gendisk *disk, char *page)
{
if (queue_is_mq(disk->queue))
return sysfs_emit(page, "%u\n", blk_mq_can_poll(disk->queue));
return sysfs_emit(page, "%u\n",
!!(disk->queue->limits.features & BLK_FEAT_POLL));
}
static ssize_t queue_zoned_show(struct gendisk *disk, char *page)
{
if (blk_queue_is_zoned(disk->queue))
@ -266,10 +259,9 @@ static ssize_t queue_iostats_passthrough_show(struct gendisk *disk, char *page)
return queue_var_show(!!blk_queue_passthrough_stat(disk->queue), page);
}
static ssize_t queue_iostats_passthrough_store(struct gendisk *disk,
const char *page, size_t count)
static int queue_iostats_passthrough_store(struct gendisk *disk,
const char *page, size_t count, struct queue_limits *lim)
{
struct queue_limits lim;
unsigned long ios;
ssize_t ret;
@ -277,18 +269,13 @@ static ssize_t queue_iostats_passthrough_store(struct gendisk *disk,
if (ret < 0)
return ret;
lim = queue_limits_start_update(disk->queue);
if (ios)
lim.flags |= BLK_FLAG_IOSTATS_PASSTHROUGH;
lim->flags |= BLK_FLAG_IOSTATS_PASSTHROUGH;
else
lim.flags &= ~BLK_FLAG_IOSTATS_PASSTHROUGH;
ret = queue_limits_commit_update(disk->queue, &lim);
if (ret)
return ret;
return count;
lim->flags &= ~BLK_FLAG_IOSTATS_PASSTHROUGH;
return 0;
}
static ssize_t queue_nomerges_show(struct gendisk *disk, char *page)
{
return queue_var_show((blk_queue_nomerges(disk->queue) << 1) |
@ -391,12 +378,10 @@ static ssize_t queue_wc_show(struct gendisk *disk, char *page)
return sysfs_emit(page, "write through\n");
}
static ssize_t queue_wc_store(struct gendisk *disk, const char *page,
size_t count)
static int queue_wc_store(struct gendisk *disk, const char *page,
size_t count, struct queue_limits *lim)
{
struct queue_limits lim;
bool disable;
int err;
if (!strncmp(page, "write back", 10)) {
disable = false;
@ -407,15 +392,11 @@ static ssize_t queue_wc_store(struct gendisk *disk, const char *page,
return -EINVAL;
}
lim = queue_limits_start_update(disk->queue);
if (disable)
lim.flags |= BLK_FLAG_WRITE_CACHE_DISABLED;
lim->flags |= BLK_FLAG_WRITE_CACHE_DISABLED;
else
lim.flags &= ~BLK_FLAG_WRITE_CACHE_DISABLED;
err = queue_limits_commit_update(disk->queue, &lim);
if (err)
return err;
return count;
lim->flags &= ~BLK_FLAG_WRITE_CACHE_DISABLED;
return 0;
}
#define QUEUE_RO_ENTRY(_prefix, _name) \
@ -431,6 +412,13 @@ static struct queue_sysfs_entry _prefix##_entry = { \
.store = _prefix##_store, \
};
#define QUEUE_LIM_RW_ENTRY(_prefix, _name) \
static struct queue_sysfs_entry _prefix##_entry = { \
.attr = { .name = _name, .mode = 0644 }, \
.show = _prefix##_show, \
.store_limit = _prefix##_store, \
}
#define QUEUE_RW_LOAD_MODULE_ENTRY(_prefix, _name) \
static struct queue_sysfs_entry _prefix##_entry = { \
.attr = { .name = _name, .mode = 0644 }, \
@ -441,7 +429,7 @@ static struct queue_sysfs_entry _prefix##_entry = { \
QUEUE_RW_ENTRY(queue_requests, "nr_requests");
QUEUE_RW_ENTRY(queue_ra, "read_ahead_kb");
QUEUE_RW_ENTRY(queue_max_sectors, "max_sectors_kb");
QUEUE_LIM_RW_ENTRY(queue_max_sectors, "max_sectors_kb");
QUEUE_RO_ENTRY(queue_max_hw_sectors, "max_hw_sectors_kb");
QUEUE_RO_ENTRY(queue_max_segments, "max_segments");
QUEUE_RO_ENTRY(queue_max_integrity_segments, "max_integrity_segments");
@ -457,7 +445,7 @@ QUEUE_RO_ENTRY(queue_io_opt, "optimal_io_size");
QUEUE_RO_ENTRY(queue_max_discard_segments, "max_discard_segments");
QUEUE_RO_ENTRY(queue_discard_granularity, "discard_granularity");
QUEUE_RO_ENTRY(queue_max_hw_discard_sectors, "discard_max_hw_bytes");
QUEUE_RW_ENTRY(queue_max_discard_sectors, "discard_max_bytes");
QUEUE_LIM_RW_ENTRY(queue_max_discard_sectors, "discard_max_bytes");
QUEUE_RO_ENTRY(queue_discard_zeroes_data, "discard_zeroes_data");
QUEUE_RO_ENTRY(queue_atomic_write_max_sectors, "atomic_write_max_bytes");
@ -477,11 +465,11 @@ QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones");
QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones");
QUEUE_RW_ENTRY(queue_nomerges, "nomerges");
QUEUE_RW_ENTRY(queue_iostats_passthrough, "iostats_passthrough");
QUEUE_LIM_RW_ENTRY(queue_iostats_passthrough, "iostats_passthrough");
QUEUE_RW_ENTRY(queue_rq_affinity, "rq_affinity");
QUEUE_RW_ENTRY(queue_poll, "io_poll");
QUEUE_RW_ENTRY(queue_poll_delay, "io_poll_delay");
QUEUE_RW_ENTRY(queue_wc, "write_cache");
QUEUE_LIM_RW_ENTRY(queue_wc, "write_cache");
QUEUE_RO_ENTRY(queue_fua, "fua");
QUEUE_RO_ENTRY(queue_dax, "dax");
QUEUE_RW_ENTRY(queue_io_timeout, "io_timeout");
@ -494,10 +482,10 @@ static struct queue_sysfs_entry queue_hw_sector_size_entry = {
.show = queue_logical_block_size_show,
};
QUEUE_RW_ENTRY(queue_rotational, "rotational");
QUEUE_RW_ENTRY(queue_iostats, "iostats");
QUEUE_RW_ENTRY(queue_add_random, "add_random");
QUEUE_RW_ENTRY(queue_stable_writes, "stable_writes");
QUEUE_LIM_RW_ENTRY(queue_rotational, "rotational");
QUEUE_LIM_RW_ENTRY(queue_iostats, "iostats");
QUEUE_LIM_RW_ENTRY(queue_add_random, "add_random");
QUEUE_LIM_RW_ENTRY(queue_stable_writes, "stable_writes");
#ifdef CONFIG_BLK_WBT
static ssize_t queue_var_store64(s64 *var, const char *page)
@ -695,7 +683,7 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
struct request_queue *q = disk->queue;
ssize_t res;
if (!entry->store)
if (!entry->store_limit && !entry->store)
return -EIO;
/*
@ -706,11 +694,26 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
if (entry->load_module)
entry->load_module(disk, page, length);
blk_mq_freeze_queue(q);
if (entry->store_limit) {
struct queue_limits lim = queue_limits_start_update(q);
res = entry->store_limit(disk, page, length, &lim);
if (res < 0) {
queue_limits_cancel_update(q);
return res;
}
res = queue_limits_commit_update_frozen(q, &lim);
if (res)
return res;
return length;
}
mutex_lock(&q->sysfs_lock);
blk_mq_freeze_queue(q);
res = entry->store(disk, page, length);
mutex_unlock(&q->sysfs_lock);
blk_mq_unfreeze_queue(q);
mutex_unlock(&q->sysfs_lock);
return res;
}


@ -11,12 +11,8 @@
*/
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/blkdev.h>
#include <linux/blk-mq.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>
#include <linux/sched/mm.h>
#include <linux/spinlock.h>
#include <linux/refcount.h>
#include <linux/mempool.h>
@ -463,6 +459,8 @@ static inline void disk_put_zone_wplug(struct blk_zone_wplug *zwplug)
static inline bool disk_should_remove_zone_wplug(struct gendisk *disk,
struct blk_zone_wplug *zwplug)
{
lockdep_assert_held(&zwplug->lock);
/* If the zone write plug was already removed, we are done. */
if (zwplug->flags & BLK_ZONE_WPLUG_UNHASHED)
return false;
@ -584,6 +582,7 @@ static inline void blk_zone_wplug_bio_io_error(struct blk_zone_wplug *zwplug,
bio_clear_flag(bio, BIO_ZONE_WRITE_PLUGGING);
bio_io_error(bio);
disk_put_zone_wplug(zwplug);
/* Drop the reference taken by disk_zone_wplug_add_bio(). */
blk_queue_exit(q);
}
@ -895,10 +894,7 @@ void blk_zone_write_plug_init_request(struct request *req)
break;
}
/*
* Drop the extra reference on the queue usage we got when
* plugging the BIO and advance the write pointer offset.
*/
/* Drop the reference taken by disk_zone_wplug_add_bio(). */
blk_queue_exit(q);
zwplug->wp_offset += bio_sectors(bio);
@ -917,6 +913,8 @@ static bool blk_zone_wplug_prepare_bio(struct blk_zone_wplug *zwplug,
{
struct gendisk *disk = bio->bi_bdev->bd_disk;
lockdep_assert_held(&zwplug->lock);
/*
* If we lost track of the zone write pointer due to a write error,
* the user must either execute a report zones, reset the zone or finish
@ -1446,7 +1444,6 @@ static int disk_update_zone_resources(struct gendisk *disk,
unsigned int nr_seq_zones, nr_conv_zones;
unsigned int pool_size;
struct queue_limits lim;
int ret;
disk->nr_zones = args->nr_zones;
disk->zone_capacity = args->zone_capacity;
@ -1497,11 +1494,7 @@ static int disk_update_zone_resources(struct gendisk *disk,
}
commit:
blk_mq_freeze_queue(q);
ret = queue_limits_commit_update(q, &lim);
blk_mq_unfreeze_queue(q);
return ret;
return queue_limits_commit_update_frozen(q, &lim);
}
static int blk_revalidate_conv_zone(struct blk_zone *zone, unsigned int idx,
@ -1776,24 +1769,14 @@ int blk_zone_issue_zeroout(struct block_device *bdev, sector_t sector,
EXPORT_SYMBOL_GPL(blk_zone_issue_zeroout);
#ifdef CONFIG_BLK_DEBUG_FS
int queue_zone_wplugs_show(void *data, struct seq_file *m)
static void queue_zone_wplug_show(struct blk_zone_wplug *zwplug,
struct seq_file *m)
{
struct request_queue *q = data;
struct gendisk *disk = q->disk;
struct blk_zone_wplug *zwplug;
unsigned int zwp_wp_offset, zwp_flags;
unsigned int zwp_zone_no, zwp_ref;
unsigned int zwp_bio_list_size, i;
unsigned int zwp_bio_list_size;
unsigned long flags;
if (!disk->zone_wplugs_hash)
return 0;
rcu_read_lock();
for (i = 0; i < disk_zone_wplugs_hash_size(disk); i++) {
hlist_for_each_entry_rcu(zwplug,
&disk->zone_wplugs_hash[i], node) {
spin_lock_irqsave(&zwplug->lock, flags);
zwp_zone_no = zwplug->zone_no;
zwp_flags = zwplug->flags;
@ -1802,11 +1785,25 @@ int queue_zone_wplugs_show(void *data, struct seq_file *m)
zwp_bio_list_size = bio_list_size(&zwplug->bio_list);
spin_unlock_irqrestore(&zwplug->lock, flags);
seq_printf(m, "%u 0x%x %u %u %u\n",
zwp_zone_no, zwp_flags, zwp_ref,
seq_printf(m, "%u 0x%x %u %u %u\n", zwp_zone_no, zwp_flags, zwp_ref,
zwp_wp_offset, zwp_bio_list_size);
}
}
}
int queue_zone_wplugs_show(void *data, struct seq_file *m)
{
struct request_queue *q = data;
struct gendisk *disk = q->disk;
struct blk_zone_wplug *zwplug;
unsigned int i;
if (!disk->zone_wplugs_hash)
return 0;
rcu_read_lock();
for (i = 0; i < disk_zone_wplugs_hash_size(disk); i++)
hlist_for_each_entry_rcu(zwplug, &disk->zone_wplugs_hash[i],
node)
queue_zone_wplug_show(zwplug, m);
rcu_read_unlock();
return 0;

View File

@ -556,14 +556,6 @@ void bdev_set_nr_sectors(struct block_device *bdev, sector_t sectors);
struct gendisk *__alloc_disk_node(struct request_queue *q, int node_id,
struct lock_class_key *lkclass);
int bio_add_hw_page(struct request_queue *q, struct bio *bio,
struct page *page, unsigned int len, unsigned int offset,
unsigned int max_sectors, bool *same_page);
int bio_add_hw_folio(struct request_queue *q, struct bio *bio,
struct folio *folio, size_t len, size_t offset,
unsigned int max_sectors, bool *same_page);
/*
* Clean up a page appropriately, where the page may be pinned, may have a
* ref taken on it or neither.
@ -720,22 +712,29 @@ void blk_integrity_verify(struct bio *bio);
void blk_integrity_prepare(struct request *rq);
void blk_integrity_complete(struct request *rq, unsigned int nr_bytes);
static inline void blk_freeze_acquire_lock(struct request_queue *q, bool
disk_dead, bool queue_dying)
#ifdef CONFIG_LOCKDEP
static inline void blk_freeze_acquire_lock(struct request_queue *q)
{
if (!disk_dead)
if (!q->mq_freeze_disk_dead)
rwsem_acquire(&q->io_lockdep_map, 0, 1, _RET_IP_);
if (!queue_dying)
if (!q->mq_freeze_queue_dying)
rwsem_acquire(&q->q_lockdep_map, 0, 1, _RET_IP_);
}
static inline void blk_unfreeze_release_lock(struct request_queue *q, bool
disk_dead, bool queue_dying)
static inline void blk_unfreeze_release_lock(struct request_queue *q)
{
if (!queue_dying)
if (!q->mq_freeze_queue_dying)
rwsem_release(&q->q_lockdep_map, _RET_IP_);
if (!disk_dead)
if (!q->mq_freeze_disk_dead)
rwsem_release(&q->io_lockdep_map, _RET_IP_);
}
#else
static inline void blk_freeze_acquire_lock(struct request_queue *q)
{
}
static inline void blk_unfreeze_release_lock(struct request_queue *q)
{
}
#endif
#endif /* BLK_INTERNAL_H */

View File

@ -381,7 +381,7 @@ struct request_queue *bsg_setup_queue(struct device *dev, const char *name,
set->queue_depth = 128;
set->numa_node = NUMA_NO_NODE;
set->cmd_size = sizeof(struct bsg_job) + dd_job_size;
set->flags = BLK_MQ_F_NO_SCHED | BLK_MQ_F_BLOCKING;
set->flags = BLK_MQ_F_BLOCKING;
if (blk_mq_alloc_tag_set(set))
goto out_tag_set;

View File

@ -405,12 +405,12 @@ struct request *elv_former_request(struct request_queue *q, struct request *rq)
return NULL;
}
#define to_elv(atr) container_of((atr), struct elv_fs_entry, attr)
#define to_elv(atr) container_of_const((atr), struct elv_fs_entry, attr)
static ssize_t
elv_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
{
struct elv_fs_entry *entry = to_elv(attr);
const struct elv_fs_entry *entry = to_elv(attr);
struct elevator_queue *e;
ssize_t error;
@ -428,7 +428,7 @@ static ssize_t
elv_attr_store(struct kobject *kobj, struct attribute *attr,
const char *page, size_t length)
{
struct elv_fs_entry *entry = to_elv(attr);
const struct elv_fs_entry *entry = to_elv(attr);
struct elevator_queue *e;
ssize_t error;
@ -461,7 +461,7 @@ int elv_register_queue(struct request_queue *q, bool uevent)
error = kobject_add(&e->kobj, &q->disk->queue_kobj, "iosched");
if (!error) {
struct elv_fs_entry *attr = e->type->elevator_attrs;
const struct elv_fs_entry *attr = e->type->elevator_attrs;
if (attr) {
while (attr->attr.name) {
if (sysfs_create_file(&e->kobj, &attr->attr))
@ -547,14 +547,6 @@ void elv_unregister(struct elevator_type *e)
}
EXPORT_SYMBOL_GPL(elv_unregister);
static inline bool elv_support_iosched(struct request_queue *q)
{
if (!queue_is_mq(q) ||
(q->tag_set->flags & BLK_MQ_F_NO_SCHED))
return false;
return true;
}
/*
* For single queue devices, default to using mq-deadline. If we have multiple
* queues or mq-deadline is not available, default to "none".
@ -580,9 +572,6 @@ void elevator_init_mq(struct request_queue *q)
struct elevator_type *e;
int err;
if (!elv_support_iosched(q))
return;
WARN_ON_ONCE(blk_queue_registered(q));
if (unlikely(q->elevator))
@ -601,16 +590,13 @@ void elevator_init_mq(struct request_queue *q)
*
* Disk isn't added yet, so verifying queue lock only manually.
*/
blk_freeze_queue_start_non_owner(q);
blk_freeze_acquire_lock(q, true, false);
blk_mq_freeze_queue_wait(q);
blk_mq_freeze_queue(q);
blk_mq_cancel_work_sync(q);
err = blk_mq_init_sched(q, e);
blk_unfreeze_release_lock(q, true, false);
blk_mq_unfreeze_queue_non_owner(q);
blk_mq_unfreeze_queue(q);
if (err) {
pr_warn("\"%s\" elevator initialization failed, "
@ -717,9 +703,6 @@ void elv_iosched_load_module(struct gendisk *disk, const char *buf,
struct elevator_type *found;
const char *name;
if (!elv_support_iosched(disk->queue))
return;
strscpy(elevator_name, buf, sizeof(elevator_name));
name = strstrip(elevator_name);
@ -737,9 +720,6 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
char elevator_name[ELV_NAME_MAX];
int ret;
if (!elv_support_iosched(disk->queue))
return count;
strscpy(elevator_name, buf, sizeof(elevator_name));
ret = elevator_change(disk->queue, strstrip(elevator_name));
if (!ret)
@ -754,9 +734,6 @@ ssize_t elv_iosched_show(struct gendisk *disk, char *name)
struct elevator_type *cur = NULL, *e;
int len = 0;
if (!elv_support_iosched(q))
return sprintf(name, "none\n");
if (!q->elevator) {
len += sprintf(name+len, "[none] ");
} else {

View File

@ -71,7 +71,7 @@ struct elevator_type
size_t icq_size; /* see iocontext.h */
size_t icq_align; /* ditto */
struct elv_fs_entry *elevator_attrs;
const struct elv_fs_entry *elevator_attrs;
const char *elevator_name;
const char *elevator_alias;
struct module *elevator_owner;

View File

@ -400,21 +400,23 @@ int __must_check add_disk_fwnode(struct device *parent, struct gendisk *disk,
struct device *ddev = disk_to_dev(disk);
int ret;
/* Only makes sense for bio-based to set ->poll_bio */
if (queue_is_mq(disk->queue) && disk->fops->poll_bio)
if (queue_is_mq(disk->queue)) {
/*
* ->submit_bio and ->poll_bio are bypassed for blk-mq drivers.
*/
if (disk->fops->submit_bio || disk->fops->poll_bio)
return -EINVAL;
/*
* The disk queue should now be all set with enough information about
* the device for the elevator code to pick an adequate default
* elevator if one is needed, that is, for devices requesting queue
* registration.
* Initialize the I/O scheduler code and pick a default one if
* needed.
*/
elevator_init_mq(disk->queue);
/* Mark bdev as having a submit_bio, if needed */
if (disk->fops->submit_bio)
} else {
if (!disk->fops->submit_bio)
return -EINVAL;
bdev_set_flag(disk->part0, BD_HAS_SUBMIT_BIO);
}
/*
* If the driver provides an explicit major number it also must provide
@ -661,7 +663,7 @@ void del_gendisk(struct gendisk *disk)
struct request_queue *q = disk->queue;
struct block_device *part;
unsigned long idx;
bool start_drain, queue_dying;
bool start_drain;
might_sleep();
@ -690,9 +692,8 @@ void del_gendisk(struct gendisk *disk)
*/
mutex_lock(&disk->open_mutex);
start_drain = __blk_mark_disk_dead(disk);
queue_dying = blk_queue_dying(q);
if (start_drain)
blk_freeze_acquire_lock(q, true, queue_dying);
blk_freeze_acquire_lock(q);
xa_for_each_start(&disk->part_tbl, idx, part, 1)
drop_partition(part);
mutex_unlock(&disk->open_mutex);
@ -748,7 +749,7 @@ void del_gendisk(struct gendisk *disk)
blk_mq_exit_queue(q);
if (start_drain)
blk_unfreeze_release_lock(q, true, queue_dying);
blk_unfreeze_release_lock(q);
}
EXPORT_SYMBOL(del_gendisk);
@ -798,7 +799,7 @@ static ssize_t disk_badblocks_store(struct device *dev,
}
#ifdef CONFIG_BLOCK_LEGACY_AUTOLOAD
void blk_request_module(dev_t devt)
static bool blk_probe_dev(dev_t devt)
{
unsigned int major = MAJOR(devt);
struct blk_major_name **n;
@ -808,14 +809,26 @@ void blk_request_module(dev_t devt)
if ((*n)->major == major && (*n)->probe) {
(*n)->probe(devt);
mutex_unlock(&major_names_lock);
return;
return true;
}
}
mutex_unlock(&major_names_lock);
return false;
}
if (request_module("block-major-%d-%d", MAJOR(devt), MINOR(devt)) > 0)
void blk_request_module(dev_t devt)
{
int error;
if (blk_probe_dev(devt))
return;
error = request_module("block-major-%d-%d", MAJOR(devt), MINOR(devt));
/* Make old-style 2.4 aliases work */
request_module("block-major-%d", MAJOR(devt));
if (error > 0)
error = request_module("block-major-%d", MAJOR(devt));
if (!error)
blk_probe_dev(devt);
}
#endif /* CONFIG_BLOCK_LEGACY_AUTOLOAD */
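
For context, blk_probe_dev() walks the probe callbacks that legacy drivers register with __register_blkdev(); blk_request_module() now retries that lookup after a successful module load instead of ignoring the request_module() result. A short sketch of the registration side, with a hypothetical driver name and a made-up major from the local/experimental range:

/*
 * Hedged sketch: a legacy block driver registering an on-demand probe.
 * "example" and EXAMPLE_MAJOR are hypothetical; blk_request_module()
 * above ends up calling example_probe() for an unclaimed dev_t.
 */
#define EXAMPLE_MAJOR	240	/* local/experimental major range */

static void example_probe(dev_t devt)
{
	/* Allocate and add the gendisk for MINOR(devt) here. */
}

static int __init example_init(void)
{
	return __register_blkdev(EXAMPLE_MAJOR, "example", example_probe);
}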

View File

@ -889,7 +889,7 @@ KYBER_LAT_SHOW_STORE(KYBER_WRITE, write);
#undef KYBER_LAT_SHOW_STORE
#define KYBER_LAT_ATTR(op) __ATTR(op##_lat_nsec, 0644, kyber_##op##_lat_show, kyber_##op##_lat_store)
static struct elv_fs_entry kyber_sched_attrs[] = {
static const struct elv_fs_entry kyber_sched_attrs[] = {
KYBER_LAT_ATTR(read),
KYBER_LAT_ATTR(write),
__ATTR_NULL

View File

@ -834,7 +834,7 @@ STORE_INT(deadline_fifo_batch_store, &dd->fifo_batch, 0, INT_MAX);
#define DD_ATTR(name) \
__ATTR(name, 0644, deadline_##name##_show, deadline_##name##_store)
static struct elv_fs_entry deadline_attrs[] = {
static const struct elv_fs_entry deadline_attrs[] = {
DD_ATTR(read_expire),
DD_ATTR(write_expire),
DD_ATTR(writes_starved),

View File

@ -396,7 +396,7 @@ extern const struct attribute_group *ahci_sdev_groups[];
.shost_groups = ahci_shost_groups, \
.sdev_groups = ahci_sdev_groups, \
.change_queue_depth = ata_scsi_change_queue_depth, \
.tag_alloc_policy = BLK_TAG_ALLOC_RR, \
.tag_alloc_policy_rr = true, \
.device_configure = ata_scsi_device_configure
extern struct ata_port_operations ahci_ops;

View File

@ -935,7 +935,7 @@ static const struct scsi_host_template pata_macio_sht = {
.device_configure = pata_macio_device_configure,
.sdev_groups = ata_common_sdev_groups,
.can_queue = ATA_DEF_QUEUE,
.tag_alloc_policy = BLK_TAG_ALLOC_RR,
.tag_alloc_policy_rr = true,
};
static struct ata_port_operations pata_macio_ops = {

View File

@ -672,7 +672,7 @@ static const struct scsi_host_template mv6_sht = {
.dma_boundary = MV_DMA_BOUNDARY,
.sdev_groups = ata_ncq_sdev_groups,
.change_queue_depth = ata_scsi_change_queue_depth,
.tag_alloc_policy = BLK_TAG_ALLOC_RR,
.tag_alloc_policy_rr = true,
.device_configure = ata_scsi_device_configure
};

View File

@ -385,7 +385,7 @@ static const struct scsi_host_template nv_adma_sht = {
.device_configure = nv_adma_device_configure,
.sdev_groups = ata_ncq_sdev_groups,
.change_queue_depth = ata_scsi_change_queue_depth,
.tag_alloc_policy = BLK_TAG_ALLOC_RR,
.tag_alloc_policy_rr = true,
};
static const struct scsi_host_template nv_swncq_sht = {
@ -396,7 +396,7 @@ static const struct scsi_host_template nv_swncq_sht = {
.device_configure = nv_swncq_device_configure,
.sdev_groups = ata_ncq_sdev_groups,
.change_queue_depth = ata_scsi_change_queue_depth,
.tag_alloc_policy = BLK_TAG_ALLOC_RR,
.tag_alloc_policy_rr = true,
};
/*

View File

@ -378,7 +378,6 @@ static const struct scsi_host_template sil24_sht = {
.can_queue = SIL24_MAX_CMDS,
.sg_tablesize = SIL24_MAX_SGE,
.dma_boundary = ATA_DMA_BOUNDARY,
.tag_alloc_policy = BLK_TAG_ALLOC_FIFO,
.sdev_groups = ata_ncq_sdev_groups,
.change_queue_depth = ata_scsi_change_queue_depth,
.device_configure = ata_scsi_device_configure

View File

@ -1819,7 +1819,6 @@ static int fd_alloc_drive(int drive)
unit[drive].tag_set.nr_maps = 1;
unit[drive].tag_set.queue_depth = 2;
unit[drive].tag_set.numa_node = NUMA_NO_NODE;
unit[drive].tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
if (blk_mq_alloc_tag_set(&unit[drive].tag_set))
goto out_cleanup_trackbuf;

View File

@ -368,7 +368,6 @@ aoeblk_gdalloc(void *vp)
set->nr_hw_queues = 1;
set->queue_depth = 128;
set->numa_node = NUMA_NO_NODE;
set->flags = BLK_MQ_F_SHOULD_MERGE;
err = blk_mq_alloc_tag_set(set);
if (err) {
pr_err("aoe: cannot allocate tag set for %ld.%d\n",

View File

@ -2088,7 +2088,6 @@ static int __init atari_floppy_init (void)
unit[i].tag_set.nr_maps = 1;
unit[i].tag_set.queue_depth = 2;
unit[i].tag_set.numa_node = NUMA_NO_NODE;
unit[i].tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
ret = blk_mq_alloc_tag_set(&unit[i].tag_set);
if (ret)
goto err;

View File

@ -4596,7 +4596,6 @@ static int __init do_floppy_init(void)
tag_sets[drive].nr_maps = 1;
tag_sets[drive].queue_depth = 2;
tag_sets[drive].numa_node = NUMA_NO_NODE;
tag_sets[drive].flags = BLK_MQ_F_SHOULD_MERGE;
err = blk_mq_alloc_tag_set(&tag_sets[drive]);
if (err)
goto out_put_disk;

View File

@ -68,7 +68,6 @@ struct loop_device {
struct list_head idle_worker_list;
struct rb_root worker_tree;
struct timer_list timer;
bool use_dio;
bool sysfs_inited;
struct request_queue *lo_queue;
@ -182,41 +181,44 @@ static bool lo_bdev_can_use_dio(struct loop_device *lo,
return true;
}
static void __loop_update_dio(struct loop_device *lo, bool dio)
static bool lo_can_use_dio(struct loop_device *lo)
{
struct file *file = lo->lo_backing_file;
struct inode *inode = file->f_mapping->host;
struct block_device *backing_bdev = NULL;
bool use_dio;
struct inode *inode = lo->lo_backing_file->f_mapping->host;
if (!(lo->lo_backing_file->f_mode & FMODE_CAN_ODIRECT))
return false;
if (S_ISBLK(inode->i_mode))
backing_bdev = I_BDEV(inode);
else if (inode->i_sb->s_bdev)
backing_bdev = inode->i_sb->s_bdev;
return lo_bdev_can_use_dio(lo, I_BDEV(inode));
if (inode->i_sb->s_bdev)
return lo_bdev_can_use_dio(lo, inode->i_sb->s_bdev);
return true;
}
use_dio = dio && (file->f_mode & FMODE_CAN_ODIRECT) &&
(!backing_bdev || lo_bdev_can_use_dio(lo, backing_bdev));
if (lo->use_dio == use_dio)
return;
/* flush dirty pages before changing direct IO */
vfs_fsync(file, 0);
/*
* The flag of LO_FLAGS_DIRECT_IO is handled similarly with
* LO_FLAGS_READ_ONLY, both are set from kernel, and losetup
* will get updated by ioctl(LOOP_GET_STATUS)
*/
/*
* Direct I/O can be enabled either by using an O_DIRECT file descriptor, or by
* passing in the LO_FLAGS_DIRECT_IO flag from userspace. It will be silently
* disabled when the device block size is too small or the offset is unaligned.
*
* loop_get_status will always report the effective LO_FLAGS_DIRECT_IO flag and
* not the originally passed in one.
*/
if (lo->lo_state == Lo_bound)
blk_mq_freeze_queue(lo->lo_queue);
lo->use_dio = use_dio;
if (use_dio)
static inline void loop_update_dio(struct loop_device *lo)
{
bool dio_in_use = lo->lo_flags & LO_FLAGS_DIRECT_IO;
lockdep_assert_held(&lo->lo_mutex);
WARN_ON_ONCE(lo->lo_state == Lo_bound &&
lo->lo_queue->mq_freeze_depth == 0);
if (lo->lo_backing_file->f_flags & O_DIRECT)
lo->lo_flags |= LO_FLAGS_DIRECT_IO;
else
if ((lo->lo_flags & LO_FLAGS_DIRECT_IO) && !lo_can_use_dio(lo))
lo->lo_flags &= ~LO_FLAGS_DIRECT_IO;
if (lo->lo_state == Lo_bound)
blk_mq_unfreeze_queue(lo->lo_queue);
/* flush dirty pages before starting to issue direct I/O */
if ((lo->lo_flags & LO_FLAGS_DIRECT_IO) && !dio_in_use)
vfs_fsync(lo->lo_backing_file, 0);
}
/**
@ -311,6 +313,13 @@ static void loop_clear_limits(struct loop_device *lo, int mode)
lim.discard_granularity = 0;
}
/*
* XXX: this updates the queue limits without freezing the queue, which
* is against the locking protocol and dangerous. But we can't just
* freeze the queue as we're inside the ->queue_rq method here. So this
* should move out into a workqueue unless we get the file operations to
* advertise if they support specific fallocate operations.
*/
queue_limits_commit_update(lo->lo_queue, &lim);
}
@ -520,12 +529,6 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
}
}
static inline void loop_update_dio(struct loop_device *lo)
{
__loop_update_dio(lo, (lo->lo_backing_file->f_flags & O_DIRECT) |
lo->use_dio);
}
static void loop_reread_partitions(struct loop_device *lo)
{
int rc;
@ -964,7 +967,6 @@ loop_set_status_from_info(struct loop_device *lo,
memcpy(lo->lo_file_name, info->lo_file_name, LO_NAME_SIZE);
lo->lo_file_name[LO_NAME_SIZE-1] = 0;
lo->lo_flags = info->lo_flags;
return 0;
}
@ -977,12 +979,12 @@ static unsigned int loop_default_blocksize(struct loop_device *lo,
return SECTOR_SIZE;
}
static int loop_reconfigure_limits(struct loop_device *lo, unsigned int bsize)
static void loop_update_limits(struct loop_device *lo, struct queue_limits *lim,
unsigned int bsize)
{
struct file *file = lo->lo_backing_file;
struct inode *inode = file->f_mapping->host;
struct block_device *backing_bdev = NULL;
struct queue_limits lim;
u32 granularity = 0, max_discard_sectors = 0;
if (S_ISBLK(inode->i_mode))
@ -995,22 +997,20 @@ static int loop_reconfigure_limits(struct loop_device *lo, unsigned int bsize)
loop_get_discard_config(lo, &granularity, &max_discard_sectors);
lim = queue_limits_start_update(lo->lo_queue);
lim.logical_block_size = bsize;
lim.physical_block_size = bsize;
lim.io_min = bsize;
lim.features &= ~(BLK_FEAT_WRITE_CACHE | BLK_FEAT_ROTATIONAL);
lim->logical_block_size = bsize;
lim->physical_block_size = bsize;
lim->io_min = bsize;
lim->features &= ~(BLK_FEAT_WRITE_CACHE | BLK_FEAT_ROTATIONAL);
if (file->f_op->fsync && !(lo->lo_flags & LO_FLAGS_READ_ONLY))
lim.features |= BLK_FEAT_WRITE_CACHE;
lim->features |= BLK_FEAT_WRITE_CACHE;
if (backing_bdev && !bdev_nonrot(backing_bdev))
lim.features |= BLK_FEAT_ROTATIONAL;
lim.max_hw_discard_sectors = max_discard_sectors;
lim.max_write_zeroes_sectors = max_discard_sectors;
lim->features |= BLK_FEAT_ROTATIONAL;
lim->max_hw_discard_sectors = max_discard_sectors;
lim->max_write_zeroes_sectors = max_discard_sectors;
if (max_discard_sectors)
lim.discard_granularity = granularity;
lim->discard_granularity = granularity;
else
lim.discard_granularity = 0;
return queue_limits_commit_update(lo->lo_queue, &lim);
lim->discard_granularity = 0;
}
static int loop_configure(struct loop_device *lo, blk_mode_t mode,
@ -1019,6 +1019,7 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
{
struct file *file = fget(config->fd);
struct address_space *mapping;
struct queue_limits lim;
int error;
loff_t size;
bool partscan;
@ -1063,6 +1064,7 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
error = loop_set_status_from_info(lo, &config->info);
if (error)
goto out_unlock;
lo->lo_flags = config->info.lo_flags;
if (!(file->f_mode & FMODE_WRITE) || !(mode & BLK_OPEN_WRITE) ||
!file->f_op->write_iter)
@ -1084,13 +1086,15 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
disk_force_media_change(lo->lo_disk);
set_disk_ro(lo->lo_disk, (lo->lo_flags & LO_FLAGS_READ_ONLY) != 0);
lo->use_dio = lo->lo_flags & LO_FLAGS_DIRECT_IO;
lo->lo_device = bdev;
lo->lo_backing_file = file;
lo->old_gfp_mask = mapping_gfp_mask(mapping);
mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
error = loop_reconfigure_limits(lo, config->block_size);
lim = queue_limits_start_update(lo->lo_queue);
loop_update_limits(lo, &lim, config->block_size);
/* No need to freeze the queue as the device isn't bound yet. */
error = queue_limits_commit_update(lo->lo_queue, &lim);
if (error)
goto out_unlock;
@ -1150,7 +1154,12 @@ static void __loop_clr_fd(struct loop_device *lo)
lo->lo_sizelimit = 0;
memset(lo->lo_file_name, 0, LO_NAME_SIZE);
/* reset the block size to the default */
/*
* Reset the block size to the default.
*
* No queue freezing needed because this is called from the final
* ->release call only, so there can't be any outstanding I/O.
*/
lim = queue_limits_start_update(lo->lo_queue);
lim.logical_block_size = SECTOR_SIZE;
lim.physical_block_size = SECTOR_SIZE;
@ -1244,7 +1253,6 @@ static int
loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
{
int err;
int prev_lo_flags;
bool partscan = false;
bool size_changed = false;
@ -1263,21 +1271,19 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
invalidate_bdev(lo->lo_device);
}
/* I/O need to be drained during transfer transition */
/* I/O needs to be drained before changing lo_offset or lo_sizelimit */
blk_mq_freeze_queue(lo->lo_queue);
prev_lo_flags = lo->lo_flags;
err = loop_set_status_from_info(lo, info);
if (err)
goto out_unfreeze;
/* Mask out flags that can't be set using LOOP_SET_STATUS. */
lo->lo_flags &= LOOP_SET_STATUS_SETTABLE_FLAGS;
/* For those flags, use the previous values instead */
lo->lo_flags |= prev_lo_flags & ~LOOP_SET_STATUS_SETTABLE_FLAGS;
/* For flags that can't be cleared, use previous values too */
lo->lo_flags |= prev_lo_flags & ~LOOP_SET_STATUS_CLEARABLE_FLAGS;
partscan = !(lo->lo_flags & LO_FLAGS_PARTSCAN) &&
(info->lo_flags & LO_FLAGS_PARTSCAN);
lo->lo_flags &= ~(LOOP_SET_STATUS_SETTABLE_FLAGS |
LOOP_SET_STATUS_CLEARABLE_FLAGS);
lo->lo_flags |= (info->lo_flags & LOOP_SET_STATUS_SETTABLE_FLAGS);
if (size_changed) {
loff_t new_size = get_size(lo->lo_offset, lo->lo_sizelimit,
@ -1285,17 +1291,13 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
loop_set_size(lo, new_size);
}
/* update dio if lo_offset or transfer is changed */
__loop_update_dio(lo, lo->use_dio);
/* update the direct I/O flag if lo_offset changed */
loop_update_dio(lo);
out_unfreeze:
blk_mq_unfreeze_queue(lo->lo_queue);
if (!err && (lo->lo_flags & LO_FLAGS_PARTSCAN) &&
!(prev_lo_flags & LO_FLAGS_PARTSCAN)) {
if (partscan)
clear_bit(GD_SUPPRESS_PART_SCAN, &lo->lo_disk->state);
partscan = true;
}
out_unlock:
mutex_unlock(&lo->lo_mutex);
if (partscan)
@ -1444,20 +1446,32 @@ static int loop_set_capacity(struct loop_device *lo)
static int loop_set_dio(struct loop_device *lo, unsigned long arg)
{
int error = -ENXIO;
if (lo->lo_state != Lo_bound)
goto out;
bool use_dio = !!arg;
__loop_update_dio(lo, !!arg);
if (lo->use_dio == !!arg)
if (lo->lo_state != Lo_bound)
return -ENXIO;
if (use_dio == !!(lo->lo_flags & LO_FLAGS_DIRECT_IO))
return 0;
if (use_dio) {
if (!lo_can_use_dio(lo))
return -EINVAL;
/* flush dirty pages before starting to use direct I/O */
vfs_fsync(lo->lo_backing_file, 0);
}
blk_mq_freeze_queue(lo->lo_queue);
if (use_dio)
lo->lo_flags |= LO_FLAGS_DIRECT_IO;
else
lo->lo_flags &= ~LO_FLAGS_DIRECT_IO;
blk_mq_unfreeze_queue(lo->lo_queue);
return 0;
error = -EINVAL;
out:
return error;
}
static int loop_set_block_size(struct loop_device *lo, unsigned long arg)
{
struct queue_limits lim;
int err = 0;
if (lo->lo_state != Lo_bound)
@ -1469,8 +1483,11 @@ static int loop_set_block_size(struct loop_device *lo, unsigned long arg)
sync_blockdev(lo->lo_device);
invalidate_bdev(lo->lo_device);
lim = queue_limits_start_update(lo->lo_queue);
loop_update_limits(lo, &lim, arg);
blk_mq_freeze_queue(lo->lo_queue);
err = loop_reconfigure_limits(lo, arg);
err = queue_limits_commit_update(lo->lo_queue, &lim);
loop_update_dio(lo);
blk_mq_unfreeze_queue(lo->lo_queue);
@ -1854,7 +1871,7 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
cmd->use_aio = false;
break;
default:
cmd->use_aio = lo->use_dio;
cmd->use_aio = lo->lo_flags & LO_FLAGS_DIRECT_IO;
break;
}
@ -2023,8 +2040,7 @@ static int loop_add(int i)
lo->tag_set.queue_depth = hw_queue_depth;
lo->tag_set.numa_node = NUMA_NO_NODE;
lo->tag_set.cmd_size = sizeof(struct loop_cmd);
lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_STACKING |
BLK_MQ_F_NO_SCHED_BY_DEFAULT;
lo->tag_set.flags = BLK_MQ_F_STACKING | BLK_MQ_F_NO_SCHED_BY_DEFAULT;
lo->tag_set.driver_data = lo;
err = blk_mq_alloc_tag_set(&lo->tag_set);
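
With the loop changes above, LO_FLAGS_DIRECT_IO in lo_flags is the single source of truth (the cached use_dio field is gone), loop_set_dio() rejects the switch when lo_can_use_dio() says no, and LOOP_GET_STATUS reports the effective flag rather than the requested one. A small userspace sketch, assuming an already-bound /dev/loop0 (path chosen for illustration), that toggles direct I/O and reads back what the driver settled on:

/* Hedged userspace sketch: enable direct I/O on a bound loop device and
 * read back the effective LO_FLAGS_DIRECT_IO. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/loop.h>

int main(void)
{
	struct loop_info64 info;
	int fd = open("/dev/loop0", O_RDWR);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (ioctl(fd, LOOP_SET_DIRECT_IO, 1UL))
		perror("LOOP_SET_DIRECT_IO");	/* e.g. fails if dio unusable */
	if (ioctl(fd, LOOP_GET_STATUS64, &info) == 0)
		printf("direct I/O: %s\n",
		       (info.lo_flags & LO_FLAGS_DIRECT_IO) ? "on" : "off");
	close(fd);
	return 0;
}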

View File

@ -3416,7 +3416,6 @@ static int mtip_block_initialize(struct driver_data *dd)
dd->tags.reserved_tags = 1;
dd->tags.cmd_size = sizeof(struct mtip_cmd);
dd->tags.numa_node = dd->numa_node;
dd->tags.flags = BLK_MQ_F_SHOULD_MERGE;
dd->tags.driver_data = dd;
dd->tags.timeout = MTIP_NCQ_CMD_TIMEOUT_MS;

View File

@ -327,8 +327,7 @@ static void nbd_mark_nsock_dead(struct nbd_device *nbd, struct nbd_sock *nsock,
nsock->sent = 0;
}
static int __nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
loff_t blksize)
static int nbd_set_size(struct nbd_device *nbd, loff_t bytesize, loff_t blksize)
{
struct queue_limits lim;
int error;
@ -368,7 +367,7 @@ static int __nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
lim.logical_block_size = blksize;
lim.physical_block_size = blksize;
error = queue_limits_commit_update(nbd->disk->queue, &lim);
error = queue_limits_commit_update_frozen(nbd->disk->queue, &lim);
if (error)
return error;
@ -379,18 +378,6 @@ static int __nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
return 0;
}
static int nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
loff_t blksize)
{
int error;
blk_mq_freeze_queue(nbd->disk->queue);
error = __nbd_set_size(nbd, bytesize, blksize);
blk_mq_unfreeze_queue(nbd->disk->queue);
return error;
}
static void nbd_complete_rq(struct request *req)
{
struct nbd_cmd *cmd = blk_mq_rq_to_pdu(req);
@ -1841,8 +1828,7 @@ static struct nbd_device *nbd_dev_add(int index, unsigned int refs)
nbd->tag_set.queue_depth = 128;
nbd->tag_set.numa_node = NUMA_NO_NODE;
nbd->tag_set.cmd_size = sizeof(struct nbd_cmd);
nbd->tag_set.flags = BLK_MQ_F_SHOULD_MERGE |
BLK_MQ_F_BLOCKING;
nbd->tag_set.flags = BLK_MQ_F_BLOCKING;
nbd->tag_set.driver_data = nbd;
INIT_WORK(&nbd->remove_work, nbd_dev_remove_work);
nbd->backend = NULL;
@ -2180,6 +2166,7 @@ static void nbd_disconnect_and_put(struct nbd_device *nbd)
flush_workqueue(nbd->recv_workq);
nbd_clear_que(nbd);
nbd->task_setup = NULL;
clear_bit(NBD_RT_BOUND, &nbd->config->runtime_flags);
mutex_unlock(&nbd->config_lock);
if (test_and_clear_bit(NBD_RT_HAS_CONFIG_REF,

View File

@ -266,6 +266,10 @@ static bool g_zone_full;
module_param_named(zone_full, g_zone_full, bool, S_IRUGO);
MODULE_PARM_DESC(zone_full, "Initialize the sequential write required zones of a zoned device to be full. Default: false");
static bool g_rotational;
module_param_named(rotational, g_rotational, bool, S_IRUGO);
MODULE_PARM_DESC(rotational, "Set the rotational feature for the device. Default: false");
static struct nullb_device *null_alloc_dev(void);
static void null_free_dev(struct nullb_device *dev);
static void null_del_dev(struct nullb *nullb);
@ -468,6 +472,7 @@ NULLB_DEVICE_ATTR(no_sched, bool, NULL);
NULLB_DEVICE_ATTR(shared_tags, bool, NULL);
NULLB_DEVICE_ATTR(shared_tag_bitmap, bool, NULL);
NULLB_DEVICE_ATTR(fua, bool, NULL);
NULLB_DEVICE_ATTR(rotational, bool, NULL);
static ssize_t nullb_device_power_show(struct config_item *item, char *page)
{
@ -621,6 +626,7 @@ static struct configfs_attribute *nullb_device_attrs[] = {
&nullb_device_attr_shared_tags,
&nullb_device_attr_shared_tag_bitmap,
&nullb_device_attr_fua,
&nullb_device_attr_rotational,
NULL,
};
@ -706,7 +712,8 @@ static ssize_t memb_group_features_show(struct config_item *item, char *page)
"shared_tags,size,submit_queues,use_per_node_hctx,"
"virt_boundary,zoned,zone_capacity,zone_max_active,"
"zone_max_open,zone_nr_conv,zone_offline,zone_readonly,"
"zone_size,zone_append_max_sectors,zone_full\n");
"zone_size,zone_append_max_sectors,zone_full,"
"rotational\n");
}
CONFIGFS_ATTR_RO(memb_group_, features);
@ -793,6 +800,7 @@ static struct nullb_device *null_alloc_dev(void)
dev->shared_tags = g_shared_tags;
dev->shared_tag_bitmap = g_shared_tag_bitmap;
dev->fua = g_fua;
dev->rotational = g_rotational;
return dev;
}
@ -899,7 +907,7 @@ static struct nullb_page *null_radix_tree_insert(struct nullb *nullb, u64 idx,
if (radix_tree_insert(root, idx, t_page)) {
null_free_page(t_page);
t_page = radix_tree_lookup(root, idx);
WARN_ON(!t_page || t_page->page->index != idx);
WARN_ON(!t_page || t_page->page->private != idx);
} else if (is_cache)
nullb->dev->curr_cache += PAGE_SIZE;
@ -922,7 +930,7 @@ static void null_free_device_storage(struct nullb_device *dev, bool is_cache)
(void **)t_pages, pos, FREE_BATCH);
for (i = 0; i < nr_pages; i++) {
pos = t_pages[i]->page->index;
pos = t_pages[i]->page->private;
ret = radix_tree_delete_item(root, pos, t_pages[i]);
WARN_ON(ret != t_pages[i]);
null_free_page(ret);
@ -948,7 +956,7 @@ static struct nullb_page *__null_lookup_page(struct nullb *nullb,
root = is_cache ? &nullb->dev->cache : &nullb->dev->data;
t_page = radix_tree_lookup(root, idx);
WARN_ON(t_page && t_page->page->index != idx);
WARN_ON(t_page && t_page->page->private != idx);
if (t_page && (for_write || test_bit(sector_bit, t_page->bitmap)))
return t_page;
@ -991,7 +999,7 @@ static struct nullb_page *null_insert_page(struct nullb *nullb,
spin_lock_irq(&nullb->lock);
idx = sector >> PAGE_SECTORS_SHIFT;
t_page->page->index = idx;
t_page->page->private = idx;
t_page = null_radix_tree_insert(nullb, idx, t_page, !ignore_cache);
radix_tree_preload_end();
@ -1011,7 +1019,7 @@ static int null_flush_cache_page(struct nullb *nullb, struct nullb_page *c_page)
struct nullb_page *t_page, *ret;
void *dst, *src;
idx = c_page->page->index;
idx = c_page->page->private;
t_page = null_insert_page(nullb, idx << PAGE_SECTORS_SHIFT, true);
@ -1070,7 +1078,7 @@ again:
* avoid race, we don't allow page free
*/
for (i = 0; i < nr_pages; i++) {
nullb->cache_flush_pos = c_pages[i]->page->index;
nullb->cache_flush_pos = c_pages[i]->page->private;
/*
* We found the page which is being flushed to disk by other
* threads
@ -1783,9 +1791,8 @@ static int null_init_global_tag_set(void)
tag_set.nr_hw_queues = g_submit_queues;
tag_set.queue_depth = g_hw_queue_depth;
tag_set.numa_node = g_home_node;
tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
if (g_no_sched)
tag_set.flags |= BLK_MQ_F_NO_SCHED;
tag_set.flags |= BLK_MQ_F_NO_SCHED_BY_DEFAULT;
if (g_shared_tag_bitmap)
tag_set.flags |= BLK_MQ_F_TAG_HCTX_SHARED;
if (g_blocking)
@ -1809,9 +1816,8 @@ static int null_setup_tagset(struct nullb *nullb)
nullb->tag_set->nr_hw_queues = nullb->dev->submit_queues;
nullb->tag_set->queue_depth = nullb->dev->hw_queue_depth;
nullb->tag_set->numa_node = nullb->dev->home_node;
nullb->tag_set->flags = BLK_MQ_F_SHOULD_MERGE;
if (nullb->dev->no_sched)
nullb->tag_set->flags |= BLK_MQ_F_NO_SCHED;
nullb->tag_set->flags |= BLK_MQ_F_NO_SCHED_BY_DEFAULT;
if (nullb->dev->shared_tag_bitmap)
nullb->tag_set->flags |= BLK_MQ_F_TAG_HCTX_SHARED;
if (nullb->dev->blocking)
@ -1938,6 +1944,9 @@ static int null_add_dev(struct nullb_device *dev)
lim.features |= BLK_FEAT_FUA;
}
if (dev->rotational)
lim.features |= BLK_FEAT_ROTATIONAL;
nullb->disk = blk_mq_alloc_disk(nullb->tag_set, &lim, nullb);
if (IS_ERR(nullb->disk)) {
rv = PTR_ERR(nullb->disk);

View File

@ -107,6 +107,7 @@ struct nullb_device {
bool shared_tags; /* share tag set between devices for blk-mq */
bool shared_tag_bitmap; /* use hostwide shared tags */
bool fua; /* Support FUA */
bool rotational; /* Fake rotational device */
};
struct nullb {

View File

@ -384,9 +384,9 @@ static int ps3disk_probe(struct ps3_system_bus_device *_dev)
unsigned int devidx;
struct queue_limits lim = {
.logical_block_size = dev->blk_size,
.max_hw_sectors = dev->bounce_size >> 9,
.max_hw_sectors = BOUNCE_SIZE >> 9,
.max_segments = -1,
.max_segment_size = dev->bounce_size,
.max_segment_size = BOUNCE_SIZE,
.dma_alignment = dev->blk_size - 1,
.features = BLK_FEAT_WRITE_CACHE |
BLK_FEAT_ROTATIONAL,
@ -434,8 +434,7 @@ static int ps3disk_probe(struct ps3_system_bus_device *_dev)
ps3disk_identify(dev);
error = blk_mq_alloc_sq_tag_set(&priv->tag_set, &ps3disk_mq_ops, 1,
BLK_MQ_F_SHOULD_MERGE);
error = blk_mq_alloc_sq_tag_set(&priv->tag_set, &ps3disk_mq_ops, 1, 0);
if (error)
goto fail_teardown;

View File

@ -4964,7 +4964,6 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
rbd_dev->tag_set.ops = &rbd_mq_ops;
rbd_dev->tag_set.queue_depth = rbd_dev->opts->queue_depth;
rbd_dev->tag_set.numa_node = NUMA_NO_NODE;
rbd_dev->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
rbd_dev->tag_set.nr_hw_queues = num_present_cpus();
rbd_dev->tag_set.cmd_size = sizeof(struct rbd_img_request);

View File

@ -1209,8 +1209,7 @@ static int setup_mq_tags(struct rnbd_clt_session *sess)
tag_set->ops = &rnbd_mq_ops;
tag_set->queue_depth = sess->queue_depth;
tag_set->numa_node = NUMA_NO_NODE;
tag_set->flags = BLK_MQ_F_SHOULD_MERGE |
BLK_MQ_F_TAG_QUEUE_SHARED;
tag_set->flags = BLK_MQ_F_TAG_QUEUE_SHARED;
tag_set->cmd_size = sizeof(struct rnbd_iu) + RNBD_RDMA_SGL_SIZE;
/* for HCTX_TYPE_DEFAULT, HCTX_TYPE_READ, HCTX_TYPE_POLL */

View File

@ -167,7 +167,7 @@ static int process_rdma(struct rnbd_srv_session *srv_sess,
bio->bi_iter.bi_sector = le64_to_cpu(msg->sector);
prio = srv_sess->ver < RNBD_PROTO_VER_MAJOR ||
usrlen < sizeof(*msg) ? 0 : le16_to_cpu(msg->prio);
bio_set_prio(bio, prio);
bio->bi_ioprio = prio;
submit_bio(bio);

View File

@ -32,25 +32,31 @@ module! {
license: "GPL v2",
}
#[pin_data]
struct NullBlkModule {
_disk: Pin<KBox<Mutex<GenDisk<NullBlkDevice>>>>,
#[pin]
_disk: Mutex<GenDisk<NullBlkDevice>>,
}
impl kernel::Module for NullBlkModule {
fn init(_module: &'static ThisModule) -> Result<Self> {
impl kernel::InPlaceModule for NullBlkModule {
fn init(_module: &'static ThisModule) -> impl PinInit<Self, Error> {
pr_info!("Rust null_blk loaded\n");
// Use an immediately-called closure as a stable `try` block
let disk = /* try */ (|| {
let tagset = Arc::pin_init(TagSet::new(1, 256, 1), flags::GFP_KERNEL)?;
let disk = gen_disk::GenDiskBuilder::new()
gen_disk::GenDiskBuilder::new()
.capacity_sectors(4096 << 11)
.logical_block_size(4096)?
.physical_block_size(4096)?
.rotational(false)
.build(format_args!("rnullb{}", 0), tagset)?;
.build(format_args!("rnullb{}", 0), tagset)
})();
let disk = KBox::pin_init(new_mutex!(disk, "nullb:disk"), flags::GFP_KERNEL)?;
Ok(Self { _disk: disk })
try_pin_init!(Self {
_disk <- new_mutex!(disk?, "nullb:disk"),
})
}
}

View File

@ -829,7 +829,7 @@ static int probe_disk(struct vdc_port *port)
}
err = blk_mq_alloc_sq_tag_set(&port->tag_set, &vdc_mq_ops,
VDC_TX_RING_SIZE, BLK_MQ_F_SHOULD_MERGE);
VDC_TX_RING_SIZE, 0);
if (err)
return err;

View File

@ -818,7 +818,7 @@ static int swim_floppy_init(struct swim_priv *swd)
for (drive = 0; drive < swd->floppy_count; drive++) {
err = blk_mq_alloc_sq_tag_set(&swd->unit[drive].tag_set,
&swim_mq_ops, 2, BLK_MQ_F_SHOULD_MERGE);
&swim_mq_ops, 2, 0);
if (err)
goto exit_put_disks;

View File

@ -1208,8 +1208,7 @@ static int swim3_attach(struct macio_dev *mdev,
fs = &floppy_states[floppy_count];
memset(fs, 0, sizeof(*fs));
rc = blk_mq_alloc_sq_tag_set(&fs->tag_set, &swim3_mq_ops, 2,
BLK_MQ_F_SHOULD_MERGE);
rc = blk_mq_alloc_sq_tag_set(&fs->tag_set, &swim3_mq_ops, 2, 0);
if (rc)
goto out_unregister;

View File

@ -2213,7 +2213,6 @@ static int ublk_add_tag_set(struct ublk_device *ub)
ub->tag_set.queue_depth = ub->dev_info.queue_depth;
ub->tag_set.numa_node = NUMA_NO_NODE;
ub->tag_set.cmd_size = sizeof(struct ublk_rq_data);
ub->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
ub->tag_set.driver_data = ub;
return blk_mq_alloc_tag_set(&ub->tag_set);
}

View File

@ -13,7 +13,6 @@
#include <linux/string_helpers.h>
#include <linux/idr.h>
#include <linux/blk-mq.h>
#include <linux/blk-mq-virtio.h>
#include <linux/numa.h>
#include <linux/vmalloc.h>
#include <uapi/linux/virtio_ring.h>
@ -1106,9 +1105,7 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
lim.features |= BLK_FEAT_WRITE_CACHE;
else
lim.features &= ~BLK_FEAT_WRITE_CACHE;
blk_mq_freeze_queue(disk->queue);
i = queue_limits_commit_update(disk->queue, &lim);
blk_mq_unfreeze_queue(disk->queue);
i = queue_limits_commit_update_frozen(disk->queue, &lim);
if (i)
return i;
return count;
@ -1181,7 +1178,8 @@ static void virtblk_map_queues(struct blk_mq_tag_set *set)
if (i == HCTX_TYPE_POLL)
blk_mq_map_queues(&set->map[i]);
else
blk_mq_virtio_map_queues(&set->map[i], vblk->vdev, 0);
blk_mq_map_hw_queues(&set->map[i],
&vblk->vdev->dev, 0);
}
}
@ -1481,7 +1479,6 @@ static int virtblk_probe(struct virtio_device *vdev)
vblk->tag_set.ops = &virtio_mq_ops;
vblk->tag_set.queue_depth = queue_depth;
vblk->tag_set.numa_node = NUMA_NO_NODE;
vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
vblk->tag_set.cmd_size =
sizeof(struct virtblk_req) +
sizeof(struct scatterlist) * VIRTIO_BLK_INLINE_SG_CNT;
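
The virtio-blk hunk above (and the nvme-pci one earlier) drops the transport-specific blk_mq_virtio_map_queues()/blk_mq_pci_map_queues() helpers in favour of the generic blk_mq_map_hw_queues(), which works from the parent struct device. A sketch of a driver's ->map_queues callback using the new helper; example_dev and its parent_dev member are hypothetical, and the poll-queue fallback mirrors virtblk_map_queues():

/*
 * Hedged sketch of a ->map_queues callback built on the generic
 * blk_mq_map_hw_queues() helper. The example_dev structure is made up;
 * a real driver would pass the PCI or virtio parent device.
 */
struct example_dev {
	struct device *parent_dev;
};

static void example_map_queues(struct blk_mq_tag_set *set)
{
	struct example_dev *edev = set->driver_data;
	int i;

	for (i = 0; i < set->nr_maps; i++) {
		if (i == HCTX_TYPE_POLL)
			blk_mq_map_queues(&set->map[i]);
		else
			blk_mq_map_hw_queues(&set->map[i],
					     edev->parent_dev, 0);
	}
}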

View File

@ -1131,7 +1131,6 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
} else
info->tag_set.queue_depth = BLK_RING_SIZE(info);
info->tag_set.numa_node = NUMA_NO_NODE;
info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
info->tag_set.cmd_size = sizeof(struct blkif_req);
info->tag_set.driver_data = info;

View File

@ -354,7 +354,6 @@ static int __init z2_init(void)
tag_set.nr_maps = 1;
tag_set.queue_depth = 16;
tag_set.numa_node = NUMA_NO_NODE;
tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
ret = blk_mq_alloc_tag_set(&tag_set);
if (ret)
goto out_unregister_blkdev;

View File

@ -777,7 +777,7 @@ static int probe_gdrom(struct platform_device *devptr)
probe_gdrom_setupcd();
err = blk_mq_alloc_sq_tag_set(&gd.tag_set, &gdrom_mq_ops, 1,
BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING);
BLK_MQ_F_BLOCKING);
if (err)
goto probe_fail_free_cd_info;

View File

@ -82,7 +82,7 @@ static void moving_init(struct moving_io *io)
bio_init(bio, NULL, bio->bi_inline_vecs,
DIV_ROUND_UP(KEY_SIZE(&io->w->key), PAGE_SECTORS), 0);
bio_get(bio);
bio_set_prio(bio, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0));
bio->bi_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0);
bio->bi_iter.bi_size = KEY_SIZE(&io->w->key) << 9;
bio->bi_private = &io->cl;

View File

@ -334,7 +334,7 @@ static void dirty_init(struct keybuf_key *w)
bio_init(bio, NULL, bio->bi_inline_vecs,
DIV_ROUND_UP(KEY_SIZE(&w->key), PAGE_SECTORS), 0);
if (!io->dc->writeback_percent)
bio_set_prio(bio, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0));
bio->bi_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0);
bio->bi_iter.bi_size = KEY_SIZE(&w->key) << 9;
bio->bi_private = w;

View File

@ -547,7 +547,7 @@ int dm_mq_init_request_queue(struct mapped_device *md, struct dm_table *t)
md->tag_set->ops = &dm_mq_ops;
md->tag_set->queue_depth = dm_get_blk_mq_queue_depth();
md->tag_set->numa_node = md->numa_node_id;
md->tag_set->flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_STACKING;
md->tag_set->flags = BLK_MQ_F_STACKING;
md->tag_set->nr_hw_queues = dm_get_blk_mq_nr_hw_queues();
md->tag_set->driver_data = md;

View File

@ -122,7 +122,7 @@ static int fec_decode_bufs(struct dm_verity *v, struct dm_verity_io *io,
struct bio *bio = dm_bio_from_per_bio_data(io, v->ti->per_io_data_size);
par = fec_read_parity(v, rsb, block_offset, &offset,
par_buf_offset, &buf, bio_prio(bio));
par_buf_offset, &buf, bio->bi_ioprio);
if (IS_ERR(par))
return PTR_ERR(par);
@ -164,7 +164,7 @@ static int fec_decode_bufs(struct dm_verity *v, struct dm_verity_io *io,
dm_bufio_release(buf);
par = fec_read_parity(v, rsb, block_offset, &offset,
par_buf_offset, &buf, bio_prio(bio));
par_buf_offset, &buf, bio->bi_ioprio);
if (IS_ERR(par))
return PTR_ERR(par);
}
@ -254,7 +254,7 @@ static int fec_read_bufs(struct dm_verity *v, struct dm_verity_io *io,
bufio = v->bufio;
}
bbuf = dm_bufio_read_with_ioprio(bufio, block, &buf, bio_prio(bio));
bbuf = dm_bufio_read_with_ioprio(bufio, block, &buf, bio->bi_ioprio);
if (IS_ERR(bbuf)) {
DMWARN_LIMIT("%s: FEC %llu: read failed (%llu): %ld",
v->data_dev->name,

View File

@ -321,7 +321,7 @@ static int verity_verify_level(struct dm_verity *v, struct dm_verity_io *io,
}
} else {
data = dm_bufio_read_with_ioprio(v->bufio, hash_block,
&buf, bio_prio(bio));
&buf, bio->bi_ioprio);
}
if (IS_ERR(data))
@ -789,7 +789,7 @@ static int verity_map(struct dm_target *ti, struct bio *bio)
verity_fec_init_io(io);
verity_submit_prefetch(v, io, bio_prio(bio));
verity_submit_prefetch(v, io, bio->bi_ioprio);
submit_bio_noacct(bio);

View File

@ -2094,8 +2094,7 @@ static int msb_init_disk(struct memstick_dev *card)
if (msb->disk_id < 0)
return msb->disk_id;
rc = blk_mq_alloc_sq_tag_set(&msb->tag_set, &msb_mq_ops, 2,
BLK_MQ_F_SHOULD_MERGE);
rc = blk_mq_alloc_sq_tag_set(&msb->tag_set, &msb_mq_ops, 2, 0);
if (rc)
goto out_release_id;

View File

@ -1139,8 +1139,7 @@ static int mspro_block_init_disk(struct memstick_dev *card)
if (disk_id < 0)
return disk_id;
rc = blk_mq_alloc_sq_tag_set(&msb->tag_set, &mspro_mq_ops, 2,
BLK_MQ_F_SHOULD_MERGE);
rc = blk_mq_alloc_sq_tag_set(&msb->tag_set, &mspro_mq_ops, 2, 0);
if (rc)
goto out_release_id;

View File

@ -441,7 +441,7 @@ struct gendisk *mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
else
mq->tag_set.queue_depth = MMC_QUEUE_DEPTH;
mq->tag_set.numa_node = NUMA_NO_NODE;
mq->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING;
mq->tag_set.flags = BLK_MQ_F_BLOCKING;
mq->tag_set.nr_hw_queues = 1;
mq->tag_set.cmd_size = sizeof(struct mmc_queue_req);
mq->tag_set.driver_data = mq;

View File

@ -329,7 +329,7 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
goto out_list_del;
ret = blk_mq_alloc_sq_tag_set(new->tag_set, &mtd_mq_ops, 2,
BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING);
BLK_MQ_F_BLOCKING);
if (ret)
goto out_kfree_tag_set;

View File

@ -383,7 +383,7 @@ int ubiblock_create(struct ubi_volume_info *vi)
dev->tag_set.ops = &ubiblock_mq_ops;
dev->tag_set.queue_depth = 64;
dev->tag_set.numa_node = NUMA_NO_NODE;
dev->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING;
dev->tag_set.flags = BLK_MQ_F_BLOCKING;
dev->tag_set.cmd_size = sizeof(struct ubiblock_pdu);
dev->tag_set.driver_data = dev;
dev->tag_set.nr_hw_queues = 1;

View File

@ -1251,7 +1251,6 @@ static int apple_nvme_alloc_tagsets(struct apple_nvme *anv)
anv->admin_tagset.timeout = NVME_ADMIN_TIMEOUT;
anv->admin_tagset.numa_node = NUMA_NO_NODE;
anv->admin_tagset.cmd_size = sizeof(struct apple_nvme_iod);
anv->admin_tagset.flags = BLK_MQ_F_NO_SCHED;
anv->admin_tagset.driver_data = &anv->adminq;
ret = blk_mq_alloc_tag_set(&anv->admin_tagset);
@ -1275,7 +1274,6 @@ static int apple_nvme_alloc_tagsets(struct apple_nvme *anv)
anv->tagset.timeout = NVME_IO_TIMEOUT;
anv->tagset.numa_node = NUMA_NO_NODE;
anv->tagset.cmd_size = sizeof(struct apple_nvme_iod);
anv->tagset.flags = BLK_MQ_F_SHOULD_MERGE;
anv->tagset.driver_data = &anv->ioq;
ret = blk_mq_alloc_tag_set(&anv->tagset);

View File

@ -2133,9 +2133,10 @@ static int nvme_update_ns_info_generic(struct nvme_ns *ns,
struct queue_limits lim;
int ret;
blk_mq_freeze_queue(ns->disk->queue);
lim = queue_limits_start_update(ns->disk->queue);
nvme_set_ctrl_limits(ns->ctrl, &lim);
blk_mq_freeze_queue(ns->disk->queue);
ret = queue_limits_commit_update(ns->disk->queue, &lim);
set_disk_ro(ns->disk, nvme_ns_is_readonly(ns, info));
blk_mq_unfreeze_queue(ns->disk->queue);
@ -2182,12 +2183,12 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
goto out;
}
lim = queue_limits_start_update(ns->disk->queue);
blk_mq_freeze_queue(ns->disk->queue);
ns->head->lba_shift = id->lbaf[lbaf].ds;
ns->head->nuse = le64_to_cpu(id->nuse);
capacity = nvme_lba_to_sect(ns->head, le64_to_cpu(id->nsze));
lim = queue_limits_start_update(ns->disk->queue);
nvme_set_ctrl_limits(ns->ctrl, &lim);
nvme_configure_metadata(ns->ctrl, ns->head, id, nvm, info);
nvme_set_chunk_sectors(ns, id, &lim);
@ -2290,6 +2291,7 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
struct queue_limits *ns_lim = &ns->disk->queue->limits;
struct queue_limits lim;
lim = queue_limits_start_update(ns->head->disk->queue);
blk_mq_freeze_queue(ns->head->disk->queue);
/*
* queue_limits mixes values that are the hardware limitations
@ -2306,7 +2308,6 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
* the splitting limits in to make sure we still obey possibly
* lower limitations of other controllers.
*/
lim = queue_limits_start_update(ns->head->disk->queue);
lim.logical_block_size = ns_lim->logical_block_size;
lim.physical_block_size = ns_lim->physical_block_size;
lim.io_min = ns_lim->io_min;
@ -3097,7 +3098,7 @@ int nvme_get_log(struct nvme_ctrl *ctrl, u32 nsid, u8 log_page, u8 lsp, u8 csi,
static int nvme_get_effects_log(struct nvme_ctrl *ctrl, u8 csi,
struct nvme_effects_log **log)
{
struct nvme_effects_log *cel = xa_load(&ctrl->cels, csi);
struct nvme_effects_log *old, *cel = xa_load(&ctrl->cels, csi);
int ret;
if (cel)
@ -3114,7 +3115,11 @@ static int nvme_get_effects_log(struct nvme_ctrl *ctrl, u8 csi,
return ret;
}
xa_store(&ctrl->cels, csi, cel, GFP_KERNEL);
old = xa_store(&ctrl->cels, csi, cel, GFP_KERNEL);
if (xa_is_err(old)) {
kfree(cel);
return xa_err(old);
}
out:
*log = cel;
return 0;
@ -3176,6 +3181,25 @@ free_data:
return ret;
}
static int nvme_init_effects_log(struct nvme_ctrl *ctrl,
u8 csi, struct nvme_effects_log **log)
{
struct nvme_effects_log *effects, *old;
effects = kzalloc(sizeof(*effects), GFP_KERNEL);
if (!effects)
return -ENOMEM;
old = xa_store(&ctrl->cels, csi, effects, GFP_KERNEL);
if (xa_is_err(old)) {
kfree(effects);
return xa_err(old);
}
*log = effects;
return 0;
}
static void nvme_init_known_nvm_effects(struct nvme_ctrl *ctrl)
{
struct nvme_effects_log *log = ctrl->effects;
@ -3222,10 +3246,9 @@ static int nvme_init_effects(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
}
if (!ctrl->effects) {
ctrl->effects = kzalloc(sizeof(*ctrl->effects), GFP_KERNEL);
if (!ctrl->effects)
return -ENOMEM;
xa_store(&ctrl->cels, NVME_CSI_NVM, ctrl->effects, GFP_KERNEL);
ret = nvme_init_effects_log(ctrl, NVME_CSI_NVM, &ctrl->effects);
if (ret < 0)
return ret;
}
nvme_init_known_nvm_effects(ctrl);
@ -4569,7 +4592,6 @@ int nvme_alloc_admin_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
/* Reserved for fabric connect and keep alive */
set->reserved_tags = 2;
set->numa_node = ctrl->numa_node;
set->flags = BLK_MQ_F_NO_SCHED;
if (ctrl->ops->flags & NVME_F_BLOCKING)
set->flags |= BLK_MQ_F_BLOCKING;
set->cmd_size = cmd_size;
@ -4644,7 +4666,6 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
/* Reserved for fabric connect */
set->reserved_tags = 1;
set->numa_node = ctrl->numa_node;
set->flags = BLK_MQ_F_SHOULD_MERGE;
if (ctrl->ops->flags & NVME_F_BLOCKING)
set->flags |= BLK_MQ_F_BLOCKING;
set->cmd_size = cmd_size;

View File

@ -16,7 +16,6 @@
#include <linux/nvme-fc.h>
#include "fc.h"
#include <scsi/scsi_transport_fc.h>
#include <linux/blk-mq-pci.h>
/* *************************** Data Structures/Defines ****************** */

View File

@ -1187,43 +1187,4 @@ static inline bool nvme_multi_css(struct nvme_ctrl *ctrl)
return (ctrl->ctrl_config & NVME_CC_CSS_MASK) == NVME_CC_CSS_CSI;
}
#ifdef CONFIG_NVME_VERBOSE_ERRORS
const char *nvme_get_error_status_str(u16 status);
const char *nvme_get_opcode_str(u8 opcode);
const char *nvme_get_admin_opcode_str(u8 opcode);
const char *nvme_get_fabrics_opcode_str(u8 opcode);
#else /* CONFIG_NVME_VERBOSE_ERRORS */
static inline const char *nvme_get_error_status_str(u16 status)
{
return "I/O Error";
}
static inline const char *nvme_get_opcode_str(u8 opcode)
{
return "I/O Cmd";
}
static inline const char *nvme_get_admin_opcode_str(u8 opcode)
{
return "Admin Cmd";
}
static inline const char *nvme_get_fabrics_opcode_str(u8 opcode)
{
return "Fabrics Cmd";
}
#endif /* CONFIG_NVME_VERBOSE_ERRORS */
static inline const char *nvme_opcode_str(int qid, u8 opcode)
{
return qid ? nvme_get_opcode_str(opcode) :
nvme_get_admin_opcode_str(opcode);
}
static inline const char *nvme_fabrics_opcode_str(
int qid, const struct nvme_command *cmd)
{
if (nvme_is_fabrics(cmd))
return nvme_get_fabrics_opcode_str(cmd->fabrics.fctype);
return nvme_opcode_str(qid, cmd->common.opcode);
}
#endif /* _NVME_H */

View File

@ -8,7 +8,6 @@
#include <linux/async.h>
#include <linux/blkdev.h>
#include <linux/blk-mq.h>
#include <linux/blk-mq-pci.h>
#include <linux/blk-integrity.h>
#include <linux/dmi.h>
#include <linux/init.h>
@ -373,7 +372,7 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, __le32 *dbbuf_db,
/*
* Ensure that the doorbell is updated before reading the event
* index from memory. The controller needs to provide similar
* ordering to ensure the envent index is updated before reading
* ordering to ensure the event index is updated before reading
* the doorbell.
*/
mb();
@ -463,7 +462,7 @@ static void nvme_pci_map_queues(struct blk_mq_tag_set *set)
*/
map->queue_offset = qoff;
if (i != HCTX_TYPE_POLL && offset)
blk_mq_pci_map_queues(map, to_pci_dev(dev->dev), offset);
blk_mq_map_hw_queues(map, dev->dev, offset);
else
blk_mq_map_queues(map);
qoff += map->nr_queues;
@ -1148,13 +1147,13 @@ static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
}
}
static inline int nvme_poll_cq(struct nvme_queue *nvmeq,
static inline bool nvme_poll_cq(struct nvme_queue *nvmeq,
struct io_comp_batch *iob)
{
int found = 0;
bool found = false;
while (nvme_cqe_pending(nvmeq)) {
found++;
found = true;
/*
* load-load control dependency between phase and the rest of
* the cqe requires a full read memory barrier
@ -2086,8 +2085,8 @@ static int nvme_alloc_host_mem_single(struct nvme_dev *dev, u64 size)
sizeof(*dev->host_mem_descs), &dev->host_mem_descs_dma,
GFP_KERNEL);
if (!dev->host_mem_descs) {
dma_free_noncontiguous(dev->dev, dev->host_mem_size,
dev->hmb_sgt, DMA_BIDIRECTIONAL);
dma_free_noncontiguous(dev->dev, size, dev->hmb_sgt,
DMA_BIDIRECTIONAL);
dev->hmb_sgt = NULL;
return -ENOMEM;
}

View File

@ -54,6 +54,8 @@ MODULE_PARM_DESC(tls_handshake_timeout,
"nvme TLS handshake timeout in seconds (default 10)");
#endif
static atomic_t nvme_tcp_cpu_queues[NR_CPUS];
#ifdef CONFIG_DEBUG_LOCK_ALLOC
/* lockdep can detect a circular dependency of the form
* sk_lock -> mmap_lock (page fault) -> fs locks -> sk_lock
@ -127,6 +129,7 @@ enum nvme_tcp_queue_flags {
NVME_TCP_Q_ALLOCATED = 0,
NVME_TCP_Q_LIVE = 1,
NVME_TCP_Q_POLLING = 2,
NVME_TCP_Q_IO_CPU_SET = 3,
};
enum nvme_tcp_recv_state {
@ -1562,23 +1565,56 @@ static bool nvme_tcp_poll_queue(struct nvme_tcp_queue *queue)
ctrl->io_queues[HCTX_TYPE_POLL];
}
/**
* Track the number of queues assigned to each cpu using a global per-cpu
* counter and select the least used cpu from the mq_map. Our goal is to spread
* different controllers' I/O threads across different cpu cores.
*
* Note that the accounting is not 100% perfect, but it does not need to be;
* we simply put in our best effort to select the best candidate cpu core
* that we find at any given point.
*/
static void nvme_tcp_set_queue_io_cpu(struct nvme_tcp_queue *queue)
{
struct nvme_tcp_ctrl *ctrl = queue->ctrl;
int qid = nvme_tcp_queue_id(queue);
int n = 0;
struct blk_mq_tag_set *set = &ctrl->tag_set;
int qid = nvme_tcp_queue_id(queue) - 1;
unsigned int *mq_map = NULL;
int cpu, min_queues = INT_MAX, io_cpu;
if (wq_unbound)
goto out;
if (nvme_tcp_default_queue(queue))
n = qid - 1;
mq_map = set->map[HCTX_TYPE_DEFAULT].mq_map;
else if (nvme_tcp_read_queue(queue))
n = qid - ctrl->io_queues[HCTX_TYPE_DEFAULT] - 1;
mq_map = set->map[HCTX_TYPE_READ].mq_map;
else if (nvme_tcp_poll_queue(queue))
n = qid - ctrl->io_queues[HCTX_TYPE_DEFAULT] -
ctrl->io_queues[HCTX_TYPE_READ] - 1;
if (wq_unbound)
queue->io_cpu = WORK_CPU_UNBOUND;
else
queue->io_cpu = cpumask_next_wrap(n - 1, cpu_online_mask, -1, false);
mq_map = set->map[HCTX_TYPE_POLL].mq_map;
if (WARN_ON(!mq_map))
goto out;
/* Search for the least used cpu from the mq_map */
io_cpu = WORK_CPU_UNBOUND;
for_each_online_cpu(cpu) {
int num_queues = atomic_read(&nvme_tcp_cpu_queues[cpu]);
if (mq_map[cpu] != qid)
continue;
if (num_queues < min_queues) {
io_cpu = cpu;
min_queues = num_queues;
}
}
if (io_cpu != WORK_CPU_UNBOUND) {
queue->io_cpu = io_cpu;
atomic_inc(&nvme_tcp_cpu_queues[io_cpu]);
set_bit(NVME_TCP_Q_IO_CPU_SET, &queue->flags);
}
out:
dev_dbg(ctrl->ctrl.device, "queue %d: using cpu %d\n",
qid, queue->io_cpu);
}
static void nvme_tcp_tls_done(void *data, int status, key_serial_t pskid)
@ -1722,7 +1758,7 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
queue->sock->sk->sk_allocation = GFP_ATOMIC;
queue->sock->sk->sk_use_task_frag = false;
nvme_tcp_set_queue_io_cpu(queue);
queue->io_cpu = WORK_CPU_UNBOUND;
queue->request = NULL;
queue->data_remaining = 0;
queue->ddgst_remaining = 0;
@ -1844,6 +1880,9 @@ static void nvme_tcp_stop_queue(struct nvme_ctrl *nctrl, int qid)
if (!test_bit(NVME_TCP_Q_ALLOCATED, &queue->flags))
return;
if (test_and_clear_bit(NVME_TCP_Q_IO_CPU_SET, &queue->flags))
atomic_dec(&nvme_tcp_cpu_queues[queue->io_cpu]);
mutex_lock(&queue->queue_lock);
if (test_and_clear_bit(NVME_TCP_Q_LIVE, &queue->flags))
__nvme_tcp_stop_queue(queue);
@ -1878,9 +1917,10 @@ static int nvme_tcp_start_queue(struct nvme_ctrl *nctrl, int idx)
nvme_tcp_init_recv_ctx(queue);
nvme_tcp_setup_sock_ops(queue);
if (idx)
if (idx) {
nvme_tcp_set_queue_io_cpu(queue);
ret = nvmf_connect_io_queue(nctrl, idx);
else
} else
ret = nvmf_connect_admin_queue(nctrl);
if (!ret) {
@ -2845,6 +2885,7 @@ static struct nvmf_transport_ops nvme_tcp_transport = {
static int __init nvme_tcp_init_module(void)
{
unsigned int wq_flags = WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_SYSFS;
int cpu;
BUILD_BUG_ON(sizeof(struct nvme_tcp_hdr) != 8);
BUILD_BUG_ON(sizeof(struct nvme_tcp_cmd_pdu) != 72);
@ -2862,6 +2903,9 @@ static int __init nvme_tcp_init_module(void)
if (!nvme_tcp_wq)
return -ENOMEM;
for_each_possible_cpu(cpu)
atomic_set(&nvme_tcp_cpu_queues[cpu], 0);
nvmf_register_transport(&nvme_tcp_transport);
return 0;
}

View File

@ -115,3 +115,14 @@ config NVME_TARGET_AUTH
target side.
If unsure, say N.
config NVME_TARGET_PCI_EPF
tristate "NVMe PCI Endpoint Function target support"
depends on NVME_TARGET && PCI_ENDPOINT
depends on NVME_CORE=y || NVME_CORE=NVME_TARGET
help
This enables the NVMe PCI Endpoint Function target driver support,
which allows creating a NVMe PCI controller using an endpoint mode
capable PCI controller.
If unsure, say N.

View File

@ -8,6 +8,7 @@ obj-$(CONFIG_NVME_TARGET_RDMA) += nvmet-rdma.o
obj-$(CONFIG_NVME_TARGET_FC) += nvmet-fc.o
obj-$(CONFIG_NVME_TARGET_FCLOOP) += nvme-fcloop.o
obj-$(CONFIG_NVME_TARGET_TCP) += nvmet-tcp.o
obj-$(CONFIG_NVME_TARGET_PCI_EPF) += nvmet-pci-epf.o
nvmet-y += core.o configfs.o admin-cmd.o fabrics-cmd.o \
discovery.o io-cmd-file.o io-cmd-bdev.o pr.o
@ -20,4 +21,5 @@ nvmet-rdma-y += rdma.o
nvmet-fc-y += fc.o
nvme-fcloop-y += fcloop.o
nvmet-tcp-y += tcp.o
nvmet-pci-epf-y += pci-epf.o
nvmet-$(CONFIG_TRACING) += trace.o

View File

@ -12,6 +12,142 @@
#include <linux/unaligned.h>
#include "nvmet.h"
static void nvmet_execute_delete_sq(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
u16 sqid = le16_to_cpu(req->cmd->delete_queue.qid);
u16 status;
if (!nvmet_is_pci_ctrl(ctrl)) {
status = nvmet_report_invalid_opcode(req);
goto complete;
}
if (!sqid) {
status = NVME_SC_QID_INVALID | NVME_STATUS_DNR;
goto complete;
}
status = nvmet_check_sqid(ctrl, sqid, false);
if (status != NVME_SC_SUCCESS)
goto complete;
status = ctrl->ops->delete_sq(ctrl, sqid);
complete:
nvmet_req_complete(req, status);
}
static void nvmet_execute_create_sq(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
struct nvme_command *cmd = req->cmd;
u16 sqid = le16_to_cpu(cmd->create_sq.sqid);
u16 cqid = le16_to_cpu(cmd->create_sq.cqid);
u16 sq_flags = le16_to_cpu(cmd->create_sq.sq_flags);
u16 qsize = le16_to_cpu(cmd->create_sq.qsize);
u64 prp1 = le64_to_cpu(cmd->create_sq.prp1);
u16 status;
if (!nvmet_is_pci_ctrl(ctrl)) {
status = nvmet_report_invalid_opcode(req);
goto complete;
}
if (!sqid) {
status = NVME_SC_QID_INVALID | NVME_STATUS_DNR;
goto complete;
}
status = nvmet_check_sqid(ctrl, sqid, true);
if (status != NVME_SC_SUCCESS)
goto complete;
/*
* Note: The NVMe specification allows multiple SQs to use the same CQ.
* However, the target code does not really support that. So for now,
* prevent this and fail the command if sqid and cqid are different.
*/
if (!cqid || cqid != sqid) {
pr_err("SQ %u: Unsupported CQID %u\n", sqid, cqid);
status = NVME_SC_CQ_INVALID | NVME_STATUS_DNR;
goto complete;
}
if (!qsize || qsize > NVME_CAP_MQES(ctrl->cap)) {
status = NVME_SC_QUEUE_SIZE | NVME_STATUS_DNR;
goto complete;
}
status = ctrl->ops->create_sq(ctrl, sqid, sq_flags, qsize, prp1);
complete:
nvmet_req_complete(req, status);
}
static void nvmet_execute_delete_cq(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
u16 cqid = le16_to_cpu(req->cmd->delete_queue.qid);
u16 status;
if (!nvmet_is_pci_ctrl(ctrl)) {
status = nvmet_report_invalid_opcode(req);
goto complete;
}
if (!cqid) {
status = NVME_SC_QID_INVALID | NVME_STATUS_DNR;
goto complete;
}
status = nvmet_check_cqid(ctrl, cqid);
if (status != NVME_SC_SUCCESS)
goto complete;
status = ctrl->ops->delete_cq(ctrl, cqid);
complete:
nvmet_req_complete(req, status);
}
static void nvmet_execute_create_cq(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
struct nvme_command *cmd = req->cmd;
u16 cqid = le16_to_cpu(cmd->create_cq.cqid);
u16 cq_flags = le16_to_cpu(cmd->create_cq.cq_flags);
u16 qsize = le16_to_cpu(cmd->create_cq.qsize);
u16 irq_vector = le16_to_cpu(cmd->create_cq.irq_vector);
u64 prp1 = le64_to_cpu(cmd->create_cq.prp1);
u16 status;
if (!nvmet_is_pci_ctrl(ctrl)) {
status = nvmet_report_invalid_opcode(req);
goto complete;
}
if (!cqid) {
status = NVME_SC_QID_INVALID | NVME_STATUS_DNR;
goto complete;
}
status = nvmet_check_cqid(ctrl, cqid);
if (status != NVME_SC_SUCCESS)
goto complete;
if (!qsize || qsize > NVME_CAP_MQES(ctrl->cap)) {
status = NVME_SC_QUEUE_SIZE | NVME_STATUS_DNR;
goto complete;
}
status = ctrl->ops->create_cq(ctrl, cqid, cq_flags, qsize,
prp1, irq_vector);
complete:
nvmet_req_complete(req, status);
}
u32 nvmet_get_log_page_len(struct nvme_command *cmd)
{
u32 len = le16_to_cpu(cmd->get_log_page.numdu);
@ -230,8 +366,18 @@ out:
nvmet_req_complete(req, status);
}
static void nvmet_get_cmd_effects_nvm(struct nvme_effects_log *log)
static void nvmet_get_cmd_effects_admin(struct nvmet_ctrl *ctrl,
struct nvme_effects_log *log)
{
/*
 * For a PCI target controller, advertise support for the I/O queue
 * management admin commands.
 */
if (nvmet_is_pci_ctrl(ctrl)) {
log->acs[nvme_admin_delete_sq] =
log->acs[nvme_admin_create_sq] =
log->acs[nvme_admin_delete_cq] =
log->acs[nvme_admin_create_cq] =
cpu_to_le32(NVME_CMD_EFFECTS_CSUPP);
}
log->acs[nvme_admin_get_log_page] =
log->acs[nvme_admin_identify] =
log->acs[nvme_admin_abort_cmd] =
@ -240,7 +386,10 @@ static void nvmet_get_cmd_effects_nvm(struct nvme_effects_log *log)
log->acs[nvme_admin_async_event] =
log->acs[nvme_admin_keep_alive] =
cpu_to_le32(NVME_CMD_EFFECTS_CSUPP);
}
static void nvmet_get_cmd_effects_nvm(struct nvme_effects_log *log)
{
log->iocs[nvme_cmd_read] =
log->iocs[nvme_cmd_flush] =
log->iocs[nvme_cmd_dsm] =
@ -265,6 +414,7 @@ static void nvmet_get_cmd_effects_zns(struct nvme_effects_log *log)
static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
struct nvme_effects_log *log;
u16 status = NVME_SC_SUCCESS;
@ -276,6 +426,7 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
switch (req->cmd->get_log_page.csi) {
case NVME_CSI_NVM:
nvmet_get_cmd_effects_admin(ctrl, log);
nvmet_get_cmd_effects_nvm(log);
break;
case NVME_CSI_ZNS:
@ -283,6 +434,7 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
status = NVME_SC_INVALID_IO_CMD_SET;
goto free;
}
nvmet_get_cmd_effects_admin(ctrl, log);
nvmet_get_cmd_effects_nvm(log);
nvmet_get_cmd_effects_zns(log);
break;
@ -508,7 +660,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
struct nvmet_ctrl *ctrl = req->sq->ctrl;
struct nvmet_subsys *subsys = ctrl->subsys;
struct nvme_id_ctrl *id;
u32 cmd_capsule_size;
u32 cmd_capsule_size, ctratt;
u16 status = 0;
if (!subsys->subsys_discovered) {
@ -523,9 +675,8 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
goto out;
}
/* XXX: figure out how to assign real vendors IDs. */
id->vid = 0;
id->ssvid = 0;
id->vid = cpu_to_le16(subsys->vendor_id);
id->ssvid = cpu_to_le16(subsys->subsys_vendor_id);
memcpy(id->sn, ctrl->subsys->serial, NVMET_SN_MAX_SIZE);
memcpy_and_pad(id->mn, sizeof(id->mn), subsys->model_number,
@ -557,8 +708,10 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
/* XXX: figure out what to do about RTD3R/RTD3 */
id->oaes = cpu_to_le32(NVMET_AEN_CFG_OPTIONAL);
id->ctratt = cpu_to_le32(NVME_CTRL_ATTR_HID_128_BIT |
NVME_CTRL_ATTR_TBKAS);
ctratt = NVME_CTRL_ATTR_HID_128_BIT | NVME_CTRL_ATTR_TBKAS;
if (nvmet_is_pci_ctrl(ctrl))
ctratt |= NVME_CTRL_ATTR_RHII;
id->ctratt = cpu_to_le32(ctratt);
id->oacs = 0;
@ -1105,6 +1258,92 @@ u16 nvmet_set_feat_async_event(struct nvmet_req *req, u32 mask)
return 0;
}
static u16 nvmet_set_feat_host_id(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
if (!nvmet_is_pci_ctrl(ctrl))
return NVME_SC_CMD_SEQ_ERROR | NVME_STATUS_DNR;
/*
 * The NVMe base specification v2.1 recommends supporting 128-bit host
 * IDs (section 5.1.25.1.28.1). However, that same section also says
 * that "The controller may support a 64-bit Host Identifier and/or an
 * extended 128-bit Host Identifier". So keep this simple and support
 * only 128-bit host IDs, to avoid having to check that all controllers
 * associated with the same subsystem use the same host ID size.
 */
if (!(req->cmd->common.cdw11 & cpu_to_le32(1 << 0))) {
req->error_loc = offsetof(struct nvme_common_command, cdw11);
return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
}
return nvmet_copy_from_sgl(req, 0, &req->sq->ctrl->hostid,
sizeof(req->sq->ctrl->hostid));
}
static u16 nvmet_set_feat_irq_coalesce(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
u32 cdw11 = le32_to_cpu(req->cmd->common.cdw11);
struct nvmet_feat_irq_coalesce irqc = {
.time = (cdw11 >> 8) & 0xff,
.thr = cdw11 & 0xff,
};
/*
 * This feature is not supported for fabrics controllers but is
 * mandatory for PCI controllers.
 */
if (!nvmet_is_pci_ctrl(ctrl)) {
req->error_loc = offsetof(struct nvme_common_command, cdw10);
return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
}
return ctrl->ops->set_feature(ctrl, NVME_FEAT_IRQ_COALESCE, &irqc);
}
static u16 nvmet_set_feat_irq_config(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
u32 cdw11 = le32_to_cpu(req->cmd->common.cdw11);
struct nvmet_feat_irq_config irqcfg = {
.iv = cdw11 & 0xffff,
.cd = (cdw11 >> 16) & 0x1,
};
/*
 * This feature is not supported for fabrics controllers but is
 * mandatory for PCI controllers.
 */
if (!nvmet_is_pci_ctrl(ctrl)) {
req->error_loc = offsetof(struct nvme_common_command, cdw10);
return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
}
return ctrl->ops->set_feature(ctrl, NVME_FEAT_IRQ_CONFIG, &irqcfg);
}
static u16 nvmet_set_feat_arbitration(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
u32 cdw11 = le32_to_cpu(req->cmd->common.cdw11);
struct nvmet_feat_arbitration arb = {
.hpw = (cdw11 >> 24) & 0xff,
.mpw = (cdw11 >> 16) & 0xff,
.lpw = (cdw11 >> 8) & 0xff,
.ab = cdw11 & 0x3,
};
if (!ctrl->ops->set_feature) {
req->error_loc = offsetof(struct nvme_common_command, cdw10);
return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
}
return ctrl->ops->set_feature(ctrl, NVME_FEAT_ARBITRATION, &arb);
}
void nvmet_execute_set_features(struct nvmet_req *req)
{
struct nvmet_subsys *subsys = nvmet_req_subsys(req);
@ -1118,6 +1357,9 @@ void nvmet_execute_set_features(struct nvmet_req *req)
return;
switch (cdw10 & 0xff) {
case NVME_FEAT_ARBITRATION:
status = nvmet_set_feat_arbitration(req);
break;
case NVME_FEAT_NUM_QUEUES:
ncqr = (cdw11 >> 16) & 0xffff;
nsqr = cdw11 & 0xffff;
@ -1128,6 +1370,12 @@ void nvmet_execute_set_features(struct nvmet_req *req)
nvmet_set_result(req,
(subsys->max_qid - 1) | ((subsys->max_qid - 1) << 16));
break;
case NVME_FEAT_IRQ_COALESCE:
status = nvmet_set_feat_irq_coalesce(req);
break;
case NVME_FEAT_IRQ_CONFIG:
status = nvmet_set_feat_irq_config(req);
break;
case NVME_FEAT_KATO:
status = nvmet_set_feat_kato(req);
break;
@ -1135,7 +1383,7 @@ void nvmet_execute_set_features(struct nvmet_req *req)
status = nvmet_set_feat_async_event(req, NVMET_AEN_CFG_ALL);
break;
case NVME_FEAT_HOST_ID:
status = NVME_SC_CMD_SEQ_ERROR | NVME_STATUS_DNR;
status = nvmet_set_feat_host_id(req);
break;
case NVME_FEAT_WRITE_PROTECT:
status = nvmet_set_feat_write_protect(req);
@ -1172,6 +1420,79 @@ static u16 nvmet_get_feat_write_protect(struct nvmet_req *req)
return 0;
}
static u16 nvmet_get_feat_irq_coalesce(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
struct nvmet_feat_irq_coalesce irqc = { };
u16 status;
/*
 * This feature is not supported for fabrics controllers but is
 * mandatory for PCI controllers.
 */
if (!nvmet_is_pci_ctrl(ctrl)) {
req->error_loc = offsetof(struct nvme_common_command, cdw10);
return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
}
status = ctrl->ops->get_feature(ctrl, NVME_FEAT_IRQ_COALESCE, &irqc);
if (status != NVME_SC_SUCCESS)
return status;
nvmet_set_result(req, ((u32)irqc.time << 8) | (u32)irqc.thr);
return NVME_SC_SUCCESS;
}
static u16 nvmet_get_feat_irq_config(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
u32 iv = le32_to_cpu(req->cmd->common.cdw11) & 0xffff;
struct nvmet_feat_irq_config irqcfg = { .iv = iv };
u16 status;
/*
 * This feature is not supported for fabrics controllers but is
 * mandatory for PCI controllers.
 */
if (!nvmet_is_pci_ctrl(ctrl)) {
req->error_loc = offsetof(struct nvme_common_command, cdw10);
return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
}
status = ctrl->ops->get_feature(ctrl, NVME_FEAT_IRQ_CONFIG, &irqcfg);
if (status != NVME_SC_SUCCESS)
return status;
nvmet_set_result(req, ((u32)irqcfg.cd << 16) | iv);
return NVME_SC_SUCCESS;
}
static u16 nvmet_get_feat_arbitration(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
struct nvmet_feat_arbitration arb = { };
u16 status;
if (!ctrl->ops->get_feature) {
req->error_loc = offsetof(struct nvme_common_command, cdw10);
return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
}
status = ctrl->ops->get_feature(ctrl, NVME_FEAT_ARBITRATION, &arb);
if (status != NVME_SC_SUCCESS)
return status;
nvmet_set_result(req,
((u32)arb.hpw << 24) |
((u32)arb.mpw << 16) |
((u32)arb.lpw << 8) |
(arb.ab & 0x3));
return NVME_SC_SUCCESS;
}
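
For context, a hypothetical sketch of the transport side of the ->get_feature() operation that nvmet_get_feat_irq_coalesce() above relies on. Only the operation signature and struct nvmet_feat_irq_coalesce come from this series; the pciep_example_* names, and where the values are kept, are invented.

#include "nvmet.h"	/* struct nvmet_ctrl, struct nvmet_feat_irq_coalesce, ... */

/* Invented per-controller state a PCI transport might keep. */
struct pciep_example_ctrl {
	u8 irq_coalesce_thr;
	u8 irq_coalesce_time;
};

static u16 pciep_example_get_feature(const struct nvmet_ctrl *ctrl, u8 feat,
				     void *feat_data)
{
	struct pciep_example_ctrl *pctrl = ctrl->drvdata;
	struct nvmet_feat_irq_coalesce *irqc;

	switch (feat) {
	case NVME_FEAT_IRQ_COALESCE:
		/* Hand back whatever the transport currently has programmed. */
		irqc = feat_data;
		irqc->thr = pctrl->irq_coalesce_thr;
		irqc->time = pctrl->irq_coalesce_time;
		return NVME_SC_SUCCESS;
	default:
		return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
	}
}
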
void nvmet_get_feat_kato(struct nvmet_req *req)
{
nvmet_set_result(req, req->sq->ctrl->kato * 1000);
@ -1198,21 +1519,24 @@ void nvmet_execute_get_features(struct nvmet_req *req)
* need to come up with some fake values for these.
*/
#if 0
case NVME_FEAT_ARBITRATION:
break;
case NVME_FEAT_POWER_MGMT:
break;
case NVME_FEAT_TEMP_THRESH:
break;
case NVME_FEAT_ERR_RECOVERY:
break;
case NVME_FEAT_IRQ_COALESCE:
break;
case NVME_FEAT_IRQ_CONFIG:
break;
case NVME_FEAT_WRITE_ATOMIC:
break;
#endif
case NVME_FEAT_ARBITRATION:
status = nvmet_get_feat_arbitration(req);
break;
case NVME_FEAT_IRQ_COALESCE:
status = nvmet_get_feat_irq_coalesce(req);
break;
case NVME_FEAT_IRQ_CONFIG:
status = nvmet_get_feat_irq_config(req);
break;
case NVME_FEAT_ASYNC_EVENT:
nvmet_get_feat_async_event(req);
break;
@ -1293,6 +1617,27 @@ out:
nvmet_req_complete(req, status);
}
u32 nvmet_admin_cmd_data_len(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;
if (nvme_is_fabrics(cmd))
return nvmet_fabrics_admin_cmd_data_len(req);
if (nvmet_is_disc_subsys(nvmet_req_subsys(req)))
return nvmet_discovery_cmd_data_len(req);
switch (cmd->common.opcode) {
case nvme_admin_get_log_page:
return nvmet_get_log_page_len(cmd);
case nvme_admin_identify:
return NVME_IDENTIFY_DATA_SIZE;
case nvme_admin_get_features:
return nvmet_feat_data_len(req, le32_to_cpu(cmd->common.cdw10));
default:
return 0;
}
}
u16 nvmet_parse_admin_cmd(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;
@ -1307,13 +1652,30 @@ u16 nvmet_parse_admin_cmd(struct nvmet_req *req)
if (unlikely(ret))
return ret;
/* For PCI controllers, admin commands shall not use SGL. */
if (nvmet_is_pci_ctrl(req->sq->ctrl) && !req->sq->qid &&
cmd->common.flags & NVME_CMD_SGL_ALL)
return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
if (nvmet_is_passthru_req(req))
return nvmet_parse_passthru_admin_cmd(req);
switch (cmd->common.opcode) {
case nvme_admin_delete_sq:
req->execute = nvmet_execute_delete_sq;
return 0;
case nvme_admin_create_sq:
req->execute = nvmet_execute_create_sq;
return 0;
case nvme_admin_get_log_page:
req->execute = nvmet_execute_get_log_page;
return 0;
case nvme_admin_delete_cq:
req->execute = nvmet_execute_delete_cq;
return 0;
case nvme_admin_create_cq:
req->execute = nvmet_execute_create_cq;
return 0;
case nvme_admin_identify:
req->execute = nvmet_execute_identify;
return 0;


@ -37,6 +37,7 @@ static struct nvmet_type_name_map nvmet_transport[] = {
{ NVMF_TRTYPE_RDMA, "rdma" },
{ NVMF_TRTYPE_FC, "fc" },
{ NVMF_TRTYPE_TCP, "tcp" },
{ NVMF_TRTYPE_PCI, "pci" },
{ NVMF_TRTYPE_LOOP, "loop" },
};
@ -46,6 +47,7 @@ static const struct nvmet_type_name_map nvmet_addr_family[] = {
{ NVMF_ADDR_FAMILY_IP6, "ipv6" },
{ NVMF_ADDR_FAMILY_IB, "ib" },
{ NVMF_ADDR_FAMILY_FC, "fc" },
{ NVMF_ADDR_FAMILY_PCI, "pci" },
{ NVMF_ADDR_FAMILY_LOOP, "loop" },
};
@ -1400,6 +1402,49 @@ out_unlock:
}
CONFIGFS_ATTR(nvmet_subsys_, attr_cntlid_max);
static ssize_t nvmet_subsys_attr_vendor_id_show(struct config_item *item,
char *page)
{
return snprintf(page, PAGE_SIZE, "0x%x\n", to_subsys(item)->vendor_id);
}
static ssize_t nvmet_subsys_attr_vendor_id_store(struct config_item *item,
const char *page, size_t count)
{
u16 vid;
if (kstrtou16(page, 0, &vid))
return -EINVAL;
down_write(&nvmet_config_sem);
to_subsys(item)->vendor_id = vid;
up_write(&nvmet_config_sem);
return count;
}
CONFIGFS_ATTR(nvmet_subsys_, attr_vendor_id);
static ssize_t nvmet_subsys_attr_subsys_vendor_id_show(struct config_item *item,
char *page)
{
return snprintf(page, PAGE_SIZE, "0x%x\n",
to_subsys(item)->subsys_vendor_id);
}
static ssize_t nvmet_subsys_attr_subsys_vendor_id_store(struct config_item *item,
const char *page, size_t count)
{
u16 ssvid;
if (kstrtou16(page, 0, &ssvid))
return -EINVAL;
down_write(&nvmet_config_sem);
to_subsys(item)->subsys_vendor_id = ssvid;
up_write(&nvmet_config_sem);
return count;
}
CONFIGFS_ATTR(nvmet_subsys_, attr_subsys_vendor_id);
static ssize_t nvmet_subsys_attr_model_show(struct config_item *item,
char *page)
{
@ -1628,6 +1673,8 @@ static struct configfs_attribute *nvmet_subsys_attrs[] = {
&nvmet_subsys_attr_attr_serial,
&nvmet_subsys_attr_attr_cntlid_min,
&nvmet_subsys_attr_attr_cntlid_max,
&nvmet_subsys_attr_attr_vendor_id,
&nvmet_subsys_attr_attr_subsys_vendor_id,
&nvmet_subsys_attr_attr_model,
&nvmet_subsys_attr_attr_qid_max,
&nvmet_subsys_attr_attr_ieee_oui,
@ -1782,6 +1829,7 @@ static struct config_group *nvmet_referral_make(
return ERR_PTR(-ENOMEM);
INIT_LIST_HEAD(&port->entry);
port->disc_addr.trtype = NVMF_TRTYPE_MAX;
config_group_init_type_name(&port->group, name, &nvmet_referral_type);
return &port->group;
@ -2007,6 +2055,7 @@ static struct config_group *nvmet_ports_make(struct config_group *group,
port->inline_data_size = -1; /* < 0 == let the transport choose */
port->max_queue_size = -1; /* < 0 == let the transport choose */
port->disc_addr.trtype = NVMF_TRTYPE_MAX;
port->disc_addr.portid = cpu_to_le16(portid);
port->disc_addr.adrfam = NVMF_ADDR_FAMILY_MAX;
port->disc_addr.treq = NVMF_TREQ_DISABLE_SQFLOW;


@ -836,6 +836,89 @@ static void nvmet_confirm_sq(struct percpu_ref *ref)
complete(&sq->confirm_done);
}
u16 nvmet_check_cqid(struct nvmet_ctrl *ctrl, u16 cqid)
{
if (!ctrl->sqs)
return NVME_SC_INTERNAL | NVME_STATUS_DNR;
if (cqid > ctrl->subsys->max_qid)
return NVME_SC_QID_INVALID | NVME_STATUS_DNR;
/*
 * Note: For PCI controllers, the NVMe specification allows multiple
 * SQs to share a single CQ. However, we do not support this yet, so
 * check that no SQ is defined for the CQ. If one exists, then the
 * CQ ID is invalid both for creation and for deletion (as that would
 * mean the SQ was not deleted before the CQ).
 */
if (ctrl->sqs[cqid])
return NVME_SC_QID_INVALID | NVME_STATUS_DNR;
return NVME_SC_SUCCESS;
}
u16 nvmet_cq_create(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq,
u16 qid, u16 size)
{
u16 status;
status = nvmet_check_cqid(ctrl, qid);
if (status != NVME_SC_SUCCESS)
return status;
nvmet_cq_setup(ctrl, cq, qid, size);
return NVME_SC_SUCCESS;
}
EXPORT_SYMBOL_GPL(nvmet_cq_create);
u16 nvmet_check_sqid(struct nvmet_ctrl *ctrl, u16 sqid,
bool create)
{
if (!ctrl->sqs)
return NVME_SC_INTERNAL | NVME_STATUS_DNR;
if (sqid > ctrl->subsys->max_qid)
return NVME_SC_QID_INVALID | NVME_STATUS_DNR;
if ((create && ctrl->sqs[sqid]) ||
(!create && !ctrl->sqs[sqid]))
return NVME_SC_QID_INVALID | NVME_STATUS_DNR;
return NVME_SC_SUCCESS;
}
u16 nvmet_sq_create(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq,
u16 sqid, u16 size)
{
u16 status;
int ret;
if (!kref_get_unless_zero(&ctrl->ref))
return NVME_SC_INTERNAL | NVME_STATUS_DNR;
status = nvmet_check_sqid(ctrl, sqid, true);
if (status != NVME_SC_SUCCESS)
return status;
ret = nvmet_sq_init(sq);
if (ret) {
status = NVME_SC_INTERNAL | NVME_STATUS_DNR;
goto ctrl_put;
}
nvmet_sq_setup(ctrl, sq, sqid, size);
sq->ctrl = ctrl;
return NVME_SC_SUCCESS;
ctrl_put:
nvmet_ctrl_put(ctrl);
return status;
}
EXPORT_SYMBOL_GPL(nvmet_sq_create);
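
For context, a hypothetical sketch of how a PCI transport's ->create_cq()/->create_sq() operations (dispatched by the nvmet_execute_create_cq()/nvmet_execute_create_sq() handlers added earlier in this section) could sit on top of the nvmet_cq_create()/nvmet_sq_create() helpers exported here. The pciep_example_* names and the per-queue bookkeeping are invented, and the PRP mapping, doorbell and interrupt work a real endpoint driver needs is left out.

#include "nvmet.h"

/* Invented per-queue bookkeeping; sized arbitrarily for the sketch. */
struct pciep_example_queue {
	struct nvmet_cq nvme_cq;
	struct nvmet_sq nvme_sq;
	u64 pci_addr;		/* PRP1 of the queue memory on the host side */
	u16 irq_vector;		/* only meaningful for CQs */
};
static struct pciep_example_queue pciep_example_queues[32];

static u16 pciep_example_create_cq(struct nvmet_ctrl *ctrl, u16 cqid,
				   u16 flags, u16 qsize, u64 prp1,
				   u16 irq_vector)
{
	struct pciep_example_queue *q = &pciep_example_queues[cqid];
	u16 status;

	status = nvmet_cq_create(ctrl, &q->nvme_cq, cqid, qsize);
	if (status != NVME_SC_SUCCESS)
		return status;

	q->pci_addr = prp1;
	q->irq_vector = irq_vector;
	return NVME_SC_SUCCESS;
}

static u16 pciep_example_create_sq(struct nvmet_ctrl *ctrl, u16 sqid,
				   u16 flags, u16 qsize, u64 prp1)
{
	struct pciep_example_queue *q = &pciep_example_queues[sqid];
	u16 status;

	status = nvmet_sq_create(ctrl, &q->nvme_sq, sqid, qsize);
	if (status != NVME_SC_SUCCESS)
		return status;

	q->pci_addr = prp1;
	return NVME_SC_SUCCESS;
}
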
void nvmet_sq_destroy(struct nvmet_sq *sq)
{
struct nvmet_ctrl *ctrl = sq->ctrl;
@ -929,6 +1012,33 @@ static inline u16 nvmet_io_cmd_check_access(struct nvmet_req *req)
return 0;
}
static u32 nvmet_io_cmd_transfer_len(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;
u32 metadata_len = 0;
if (nvme_is_fabrics(cmd))
return nvmet_fabrics_io_cmd_data_len(req);
if (!req->ns)
return 0;
switch (req->cmd->common.opcode) {
case nvme_cmd_read:
case nvme_cmd_write:
case nvme_cmd_zone_append:
if (req->sq->ctrl->pi_support && nvmet_ns_has_pi(req->ns))
metadata_len = nvmet_rw_metadata_len(req);
return nvmet_rw_data_len(req) + metadata_len;
case nvme_cmd_dsm:
return nvmet_dsm_len(req);
case nvme_cmd_zone_mgmt_recv:
return (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;
default:
return 0;
}
}
static u16 nvmet_parse_io_cmd(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;
@ -1030,13 +1140,16 @@ bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
/*
* For fabrics, PSDT field shall describe metadata pointer (MPTR) that
* contains an address of a single contiguous physical buffer that is
* byte aligned.
* byte aligned. For PCI controllers, this is optional so not enforced.
*/
if (unlikely((flags & NVME_CMD_SGL_ALL) != NVME_CMD_SGL_METABUF)) {
req->error_loc = offsetof(struct nvme_common_command, flags);
if (!req->sq->ctrl || !nvmet_is_pci_ctrl(req->sq->ctrl)) {
req->error_loc =
offsetof(struct nvme_common_command, flags);
status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
goto fail;
}
}
if (unlikely(!req->sq->ctrl))
/* will return an error for any non-connect command: */
@ -1077,11 +1190,27 @@ void nvmet_req_uninit(struct nvmet_req *req)
}
EXPORT_SYMBOL_GPL(nvmet_req_uninit);
size_t nvmet_req_transfer_len(struct nvmet_req *req)
{
if (likely(req->sq->qid != 0))
return nvmet_io_cmd_transfer_len(req);
if (unlikely(!req->sq->ctrl))
return nvmet_connect_cmd_data_len(req);
return nvmet_admin_cmd_data_len(req);
}
EXPORT_SYMBOL_GPL(nvmet_req_transfer_len);
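
A hypothetical sketch of where nvmet_req_transfer_len() fits for a transport that has no fabrics capsule describing the data: once nvmet_req_init() has accepted the command, the transport derives the data length from the command itself, sets up its buffers, and only then executes. The pciep_example_* name and the elided buffer handling are invented.

#include "nvmet.h"

static void pciep_example_exec_command(struct nvmet_req *req,
				       struct nvmet_cq *cq,
				       struct nvmet_sq *sq,
				       const struct nvmet_fabrics_ops *ops)
{
	if (!nvmet_req_init(req, cq, sq, ops))
		return;	/* nvmet_req_init() already completed the request */

	/*
	 * No capsule here: work out how much host data the command moves so
	 * that the right amount of host memory can be mapped before execution.
	 */
	req->transfer_len = nvmet_req_transfer_len(req);

	/* ... allocate SGLs / map host buffers for req->transfer_len ... */

	req->execute(req);
}
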
bool nvmet_check_transfer_len(struct nvmet_req *req, size_t len)
{
if (unlikely(len != req->transfer_len)) {
u16 status;
req->error_loc = offsetof(struct nvme_common_command, dptr);
nvmet_req_complete(req, NVME_SC_SGL_INVALID_DATA | NVME_STATUS_DNR);
if (req->cmd->common.flags & NVME_CMD_SGL_ALL)
status = NVME_SC_SGL_INVALID_DATA;
else
status = NVME_SC_INVALID_FIELD;
nvmet_req_complete(req, status | NVME_STATUS_DNR);
return false;
}
@ -1092,8 +1221,14 @@ EXPORT_SYMBOL_GPL(nvmet_check_transfer_len);
bool nvmet_check_data_len_lte(struct nvmet_req *req, size_t data_len)
{
if (unlikely(data_len > req->transfer_len)) {
u16 status;
req->error_loc = offsetof(struct nvme_common_command, dptr);
nvmet_req_complete(req, NVME_SC_SGL_INVALID_DATA | NVME_STATUS_DNR);
if (req->cmd->common.flags & NVME_CMD_SGL_ALL)
status = NVME_SC_SGL_INVALID_DATA;
else
status = NVME_SC_INVALID_FIELD;
nvmet_req_complete(req, status | NVME_STATUS_DNR);
return false;
}
@ -1184,41 +1319,6 @@ void nvmet_req_free_sgls(struct nvmet_req *req)
}
EXPORT_SYMBOL_GPL(nvmet_req_free_sgls);
static inline bool nvmet_cc_en(u32 cc)
{
return (cc >> NVME_CC_EN_SHIFT) & 0x1;
}
static inline u8 nvmet_cc_css(u32 cc)
{
return (cc >> NVME_CC_CSS_SHIFT) & 0x7;
}
static inline u8 nvmet_cc_mps(u32 cc)
{
return (cc >> NVME_CC_MPS_SHIFT) & 0xf;
}
static inline u8 nvmet_cc_ams(u32 cc)
{
return (cc >> NVME_CC_AMS_SHIFT) & 0x7;
}
static inline u8 nvmet_cc_shn(u32 cc)
{
return (cc >> NVME_CC_SHN_SHIFT) & 0x3;
}
static inline u8 nvmet_cc_iosqes(u32 cc)
{
return (cc >> NVME_CC_IOSQES_SHIFT) & 0xf;
}
static inline u8 nvmet_cc_iocqes(u32 cc)
{
return (cc >> NVME_CC_IOCQES_SHIFT) & 0xf;
}
static inline bool nvmet_css_supported(u8 cc_css)
{
switch (cc_css << NVME_CC_CSS_SHIFT) {
@ -1295,6 +1395,7 @@ void nvmet_update_cc(struct nvmet_ctrl *ctrl, u32 new)
ctrl->csts &= ~NVME_CSTS_SHST_CMPLT;
mutex_unlock(&ctrl->lock);
}
EXPORT_SYMBOL_GPL(nvmet_update_cc);
static void nvmet_init_cap(struct nvmet_ctrl *ctrl)
{
@ -1402,15 +1503,15 @@ bool nvmet_host_allowed(struct nvmet_subsys *subsys, const char *hostnqn)
* Note: ctrl->subsys->lock should be held when calling this function
*/
static void nvmet_setup_p2p_ns_map(struct nvmet_ctrl *ctrl,
struct nvmet_req *req)
struct device *p2p_client)
{
struct nvmet_ns *ns;
unsigned long idx;
if (!req->p2p_client)
if (!p2p_client)
return;
ctrl->p2p_client = get_device(req->p2p_client);
ctrl->p2p_client = get_device(p2p_client);
nvmet_for_each_enabled_ns(&ctrl->subsys->namespaces, idx, ns)
nvmet_p2pmem_ns_add_p2p(ctrl, ns);
@ -1439,45 +1540,44 @@ static void nvmet_fatal_error_handler(struct work_struct *work)
ctrl->ops->delete_ctrl(ctrl);
}
u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
struct nvmet_req *req, u32 kato, struct nvmet_ctrl **ctrlp,
uuid_t *hostid)
struct nvmet_ctrl *nvmet_alloc_ctrl(struct nvmet_alloc_ctrl_args *args)
{
struct nvmet_subsys *subsys;
struct nvmet_ctrl *ctrl;
u32 kato = args->kato;
u8 dhchap_status;
int ret;
u16 status;
status = NVME_SC_CONNECT_INVALID_PARAM | NVME_STATUS_DNR;
subsys = nvmet_find_get_subsys(req->port, subsysnqn);
args->status = NVME_SC_CONNECT_INVALID_PARAM | NVME_STATUS_DNR;
subsys = nvmet_find_get_subsys(args->port, args->subsysnqn);
if (!subsys) {
pr_warn("connect request for invalid subsystem %s!\n",
subsysnqn);
req->cqe->result.u32 = IPO_IATTR_CONNECT_DATA(subsysnqn);
req->error_loc = offsetof(struct nvme_common_command, dptr);
goto out;
args->subsysnqn);
args->result = IPO_IATTR_CONNECT_DATA(subsysnqn);
args->error_loc = offsetof(struct nvme_common_command, dptr);
return NULL;
}
down_read(&nvmet_config_sem);
if (!nvmet_host_allowed(subsys, hostnqn)) {
if (!nvmet_host_allowed(subsys, args->hostnqn)) {
pr_info("connect by host %s for subsystem %s not allowed\n",
hostnqn, subsysnqn);
req->cqe->result.u32 = IPO_IATTR_CONNECT_DATA(hostnqn);
args->hostnqn, args->subsysnqn);
args->result = IPO_IATTR_CONNECT_DATA(hostnqn);
up_read(&nvmet_config_sem);
status = NVME_SC_CONNECT_INVALID_HOST | NVME_STATUS_DNR;
req->error_loc = offsetof(struct nvme_common_command, dptr);
args->status = NVME_SC_CONNECT_INVALID_HOST | NVME_STATUS_DNR;
args->error_loc = offsetof(struct nvme_common_command, dptr);
goto out_put_subsystem;
}
up_read(&nvmet_config_sem);
status = NVME_SC_INTERNAL;
args->status = NVME_SC_INTERNAL;
ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
if (!ctrl)
goto out_put_subsystem;
mutex_init(&ctrl->lock);
ctrl->port = req->port;
ctrl->ops = req->ops;
ctrl->port = args->port;
ctrl->ops = args->ops;
#ifdef CONFIG_NVME_TARGET_PASSTHRU
/* By default, set loop targets to clear IDS */
@ -1491,8 +1591,8 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
INIT_WORK(&ctrl->fatal_err_work, nvmet_fatal_error_handler);
INIT_DELAYED_WORK(&ctrl->ka_work, nvmet_keep_alive_timer);
memcpy(ctrl->subsysnqn, subsysnqn, NVMF_NQN_SIZE);
memcpy(ctrl->hostnqn, hostnqn, NVMF_NQN_SIZE);
memcpy(ctrl->subsysnqn, args->subsysnqn, NVMF_NQN_SIZE);
memcpy(ctrl->hostnqn, args->hostnqn, NVMF_NQN_SIZE);
kref_init(&ctrl->ref);
ctrl->subsys = subsys;
@ -1515,12 +1615,12 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
subsys->cntlid_min, subsys->cntlid_max,
GFP_KERNEL);
if (ret < 0) {
status = NVME_SC_CONNECT_CTRL_BUSY | NVME_STATUS_DNR;
args->status = NVME_SC_CONNECT_CTRL_BUSY | NVME_STATUS_DNR;
goto out_free_sqs;
}
ctrl->cntlid = ret;
uuid_copy(&ctrl->hostid, hostid);
uuid_copy(&ctrl->hostid, args->hostid);
/*
* Discovery controllers may use some arbitrary high value
@ -1542,12 +1642,35 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
if (ret)
goto init_pr_fail;
list_add_tail(&ctrl->subsys_entry, &subsys->ctrls);
nvmet_setup_p2p_ns_map(ctrl, req);
nvmet_setup_p2p_ns_map(ctrl, args->p2p_client);
nvmet_debugfs_ctrl_setup(ctrl);
mutex_unlock(&subsys->lock);
*ctrlp = ctrl;
return 0;
if (args->hostid)
uuid_copy(&ctrl->hostid, args->hostid);
dhchap_status = nvmet_setup_auth(ctrl);
if (dhchap_status) {
pr_err("Failed to setup authentication, dhchap status %u\n",
dhchap_status);
nvmet_ctrl_put(ctrl);
if (dhchap_status == NVME_AUTH_DHCHAP_FAILURE_FAILED)
args->status =
NVME_SC_CONNECT_INVALID_HOST | NVME_STATUS_DNR;
else
args->status = NVME_SC_INTERNAL;
return NULL;
}
args->status = NVME_SC_SUCCESS;
pr_info("Created %s controller %d for subsystem %s for NQN %s%s%s.\n",
nvmet_is_disc_subsys(ctrl->subsys) ? "discovery" : "nvm",
ctrl->cntlid, ctrl->subsys->subsysnqn, ctrl->hostnqn,
ctrl->pi_support ? " T10-PI is enabled" : "",
nvmet_has_auth(ctrl) ? " with DH-HMAC-CHAP" : "");
return ctrl;
init_pr_fail:
mutex_unlock(&subsys->lock);
@ -1561,9 +1684,9 @@ out_free_ctrl:
kfree(ctrl);
out_put_subsystem:
nvmet_subsys_put(subsys);
out:
return status;
return NULL;
}
EXPORT_SYMBOL_GPL(nvmet_alloc_ctrl);
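
A hypothetical sketch of creating a controller through the new argument structure, as a transport that has no fabrics Connect command might do it. Only struct nvmet_alloc_ctrl_args, nvmet_alloc_ctrl() and the drvdata field come from this series; the pciep_example_* symbols are invented.

#include "nvmet.h"

static struct nvmet_ctrl *
pciep_example_create_ctrl(struct nvmet_port *port,
			  const struct nvmet_fabrics_ops *ops,
			  struct device *dma_dev, char *subsysnqn,
			  char *hostnqn, uuid_t *hostid, u32 kato)
{
	struct nvmet_alloc_ctrl_args args = {
		.port		= port,
		.subsysnqn	= subsysnqn,
		.hostnqn	= hostnqn,
		.hostid		= hostid,
		.ops		= ops,
		.p2p_client	= dma_dev,
		.kato		= kato,
	};
	struct nvmet_ctrl *ctrl;

	ctrl = nvmet_alloc_ctrl(&args);
	if (!ctrl) {
		/* args.status holds the NVMe status explaining the failure. */
		pr_err("controller creation failed, status 0x%x\n",
		       args.status);
		return NULL;
	}

	/* ctrl->drvdata is where the transport would hang its private state. */
	return ctrl;
}
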
static void nvmet_ctrl_free(struct kref *ref)
{
@ -1599,6 +1722,7 @@ void nvmet_ctrl_put(struct nvmet_ctrl *ctrl)
{
kref_put(&ctrl->ref, nvmet_ctrl_free);
}
EXPORT_SYMBOL_GPL(nvmet_ctrl_put);
void nvmet_ctrl_fatal_error(struct nvmet_ctrl *ctrl)
{


@ -224,6 +224,9 @@ static void nvmet_execute_disc_get_log_page(struct nvmet_req *req)
}
list_for_each_entry(r, &req->port->referrals, entry) {
if (r->disc_addr.trtype == NVMF_TRTYPE_PCI)
continue;
nvmet_format_discovery_entry(hdr, r,
NVME_DISC_SUBSYS_NAME,
r->disc_addr.traddr,
@ -352,6 +355,20 @@ static void nvmet_execute_disc_get_features(struct nvmet_req *req)
nvmet_req_complete(req, stat);
}
u32 nvmet_discovery_cmd_data_len(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;
switch (cmd->common.opcode) {
case nvme_admin_get_log_page:
return nvmet_get_log_page_len(req->cmd);
case nvme_admin_identify:
return NVME_IDENTIFY_DATA_SIZE;
default:
return 0;
}
}
u16 nvmet_parse_discovery_cmd(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;


@ -179,6 +179,11 @@ static u8 nvmet_auth_failure2(void *d)
return data->rescode_exp;
}
u32 nvmet_auth_send_data_len(struct nvmet_req *req)
{
return le32_to_cpu(req->cmd->auth_send.tl);
}
void nvmet_execute_auth_send(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
@ -206,7 +211,7 @@ void nvmet_execute_auth_send(struct nvmet_req *req)
offsetof(struct nvmf_auth_send_command, spsp1);
goto done;
}
tl = le32_to_cpu(req->cmd->auth_send.tl);
tl = nvmet_auth_send_data_len(req);
if (!tl) {
status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
req->error_loc =
@ -429,6 +434,11 @@ static void nvmet_auth_failure1(struct nvmet_req *req, void *d, int al)
data->rescode_exp = req->sq->dhchap_status;
}
u32 nvmet_auth_receive_data_len(struct nvmet_req *req)
{
return le32_to_cpu(req->cmd->auth_receive.al);
}
void nvmet_execute_auth_receive(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
@ -454,7 +464,7 @@ void nvmet_execute_auth_receive(struct nvmet_req *req)
offsetof(struct nvmf_auth_receive_command, spsp1);
goto done;
}
al = le32_to_cpu(req->cmd->auth_receive.al);
al = nvmet_auth_receive_data_len(req);
if (!al) {
status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
req->error_loc =


@ -85,6 +85,22 @@ static void nvmet_execute_prop_get(struct nvmet_req *req)
nvmet_req_complete(req, status);
}
u32 nvmet_fabrics_admin_cmd_data_len(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;
switch (cmd->fabrics.fctype) {
#ifdef CONFIG_NVME_TARGET_AUTH
case nvme_fabrics_type_auth_send:
return nvmet_auth_send_data_len(req);
case nvme_fabrics_type_auth_receive:
return nvmet_auth_receive_data_len(req);
#endif
default:
return 0;
}
}
u16 nvmet_parse_fabrics_admin_cmd(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;
@ -114,6 +130,22 @@ u16 nvmet_parse_fabrics_admin_cmd(struct nvmet_req *req)
return 0;
}
u32 nvmet_fabrics_io_cmd_data_len(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;
switch (cmd->fabrics.fctype) {
#ifdef CONFIG_NVME_TARGET_AUTH
case nvme_fabrics_type_auth_send:
return nvmet_auth_send_data_len(req);
case nvme_fabrics_type_auth_receive:
return nvmet_auth_receive_data_len(req);
#endif
default:
return 0;
}
}
u16 nvmet_parse_fabrics_io_cmd(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;
@ -213,73 +245,67 @@ static void nvmet_execute_admin_connect(struct nvmet_req *req)
struct nvmf_connect_command *c = &req->cmd->connect;
struct nvmf_connect_data *d;
struct nvmet_ctrl *ctrl = NULL;
u16 status;
u8 dhchap_status;
struct nvmet_alloc_ctrl_args args = {
.port = req->port,
.ops = req->ops,
.p2p_client = req->p2p_client,
.kato = le32_to_cpu(c->kato),
};
if (!nvmet_check_transfer_len(req, sizeof(struct nvmf_connect_data)))
return;
d = kmalloc(sizeof(*d), GFP_KERNEL);
if (!d) {
status = NVME_SC_INTERNAL;
args.status = NVME_SC_INTERNAL;
goto complete;
}
status = nvmet_copy_from_sgl(req, 0, d, sizeof(*d));
if (status)
args.status = nvmet_copy_from_sgl(req, 0, d, sizeof(*d));
if (args.status)
goto out;
if (c->recfmt != 0) {
pr_warn("invalid connect version (%d).\n",
le16_to_cpu(c->recfmt));
req->error_loc = offsetof(struct nvmf_connect_command, recfmt);
status = NVME_SC_CONNECT_FORMAT | NVME_STATUS_DNR;
args.error_loc = offsetof(struct nvmf_connect_command, recfmt);
args.status = NVME_SC_CONNECT_FORMAT | NVME_STATUS_DNR;
goto out;
}
if (unlikely(d->cntlid != cpu_to_le16(0xffff))) {
pr_warn("connect attempt for invalid controller ID %#x\n",
d->cntlid);
status = NVME_SC_CONNECT_INVALID_PARAM | NVME_STATUS_DNR;
req->cqe->result.u32 = IPO_IATTR_CONNECT_DATA(cntlid);
args.status = NVME_SC_CONNECT_INVALID_PARAM | NVME_STATUS_DNR;
args.result = IPO_IATTR_CONNECT_DATA(cntlid);
goto out;
}
d->subsysnqn[NVMF_NQN_FIELD_LEN - 1] = '\0';
d->hostnqn[NVMF_NQN_FIELD_LEN - 1] = '\0';
status = nvmet_alloc_ctrl(d->subsysnqn, d->hostnqn, req,
le32_to_cpu(c->kato), &ctrl, &d->hostid);
if (status)
args.subsysnqn = d->subsysnqn;
args.hostnqn = d->hostnqn;
args.hostid = &d->hostid;
args.kato = le32_to_cpu(c->kato);
ctrl = nvmet_alloc_ctrl(&args);
if (!ctrl)
goto out;
dhchap_status = nvmet_setup_auth(ctrl);
if (dhchap_status) {
pr_err("Failed to setup authentication, dhchap status %u\n",
dhchap_status);
nvmet_ctrl_put(ctrl);
if (dhchap_status == NVME_AUTH_DHCHAP_FAILURE_FAILED)
status = (NVME_SC_CONNECT_INVALID_HOST | NVME_STATUS_DNR);
else
status = NVME_SC_INTERNAL;
goto out;
}
status = nvmet_install_queue(ctrl, req);
if (status) {
args.status = nvmet_install_queue(ctrl, req);
if (args.status) {
nvmet_ctrl_put(ctrl);
goto out;
}
pr_info("creating %s controller %d for subsystem %s for NQN %s%s%s.\n",
nvmet_is_disc_subsys(ctrl->subsys) ? "discovery" : "nvm",
ctrl->cntlid, ctrl->subsys->subsysnqn, ctrl->hostnqn,
ctrl->pi_support ? " T10-PI is enabled" : "",
nvmet_has_auth(ctrl) ? " with DH-HMAC-CHAP" : "");
req->cqe->result.u32 = cpu_to_le32(nvmet_connect_result(ctrl));
args.result = cpu_to_le32(nvmet_connect_result(ctrl));
out:
kfree(d);
complete:
nvmet_req_complete(req, status);
req->error_loc = args.error_loc;
req->cqe->result.u32 = args.result;
nvmet_req_complete(req, args.status);
}
static void nvmet_execute_io_connect(struct nvmet_req *req)
@ -343,6 +369,17 @@ out_ctrl_put:
goto out;
}
u32 nvmet_connect_cmd_data_len(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;
if (!nvme_is_fabrics(cmd) ||
cmd->fabrics.fctype != nvme_fabrics_type_connect)
return 0;
return sizeof(struct nvmf_connect_data);
}
u16 nvmet_parse_connect_cmd(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;


@ -272,6 +272,9 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
iter_flags = SG_MITER_FROM_SG;
}
if (req->cmd->rw.control & NVME_RW_LR)
opf |= REQ_FAILFAST_DEV;
if (is_pci_p2pdma_page(sg_page(req->sg)))
opf |= REQ_NOMERGE;


@ -245,6 +245,8 @@ struct nvmet_ctrl {
struct nvmet_subsys *subsys;
struct nvmet_sq **sqs;
void *drvdata;
bool reset_tbkas;
struct mutex lock;
@ -331,6 +333,8 @@ struct nvmet_subsys {
struct config_group namespaces_group;
struct config_group allowed_hosts_group;
u16 vendor_id;
u16 subsys_vendor_id;
char *model_number;
u32 ieee_oui;
char *firmware_rev;
@ -411,6 +415,18 @@ struct nvmet_fabrics_ops {
void (*discovery_chg)(struct nvmet_port *port);
u8 (*get_mdts)(const struct nvmet_ctrl *ctrl);
u16 (*get_max_queue_size)(const struct nvmet_ctrl *ctrl);
/* Operations mandatory for PCI target controllers */
u16 (*create_sq)(struct nvmet_ctrl *ctrl, u16 sqid, u16 flags,
u16 qsize, u64 prp1);
u16 (*delete_sq)(struct nvmet_ctrl *ctrl, u16 sqid);
u16 (*create_cq)(struct nvmet_ctrl *ctrl, u16 cqid, u16 flags,
u16 qsize, u64 prp1, u16 irq_vector);
u16 (*delete_cq)(struct nvmet_ctrl *ctrl, u16 cqid);
u16 (*set_feature)(const struct nvmet_ctrl *ctrl, u8 feat,
void *feat_data);
u16 (*get_feature)(const struct nvmet_ctrl *ctrl, u8 feat,
void *feat_data);
};
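
A partial, hypothetical operations table tying the callbacks marked mandatory above together, assuming handlers along the lines of the pciep_example_* sketches elsewhere in this section. Only members visible in this diff are initialized; this is not the in-tree nvmet-pci-epf driver.

#include <linux/module.h>
#include "nvmet.h"

/* Assumed to be implemented along the lines of the earlier sketches. */
static u16 pciep_example_create_sq(struct nvmet_ctrl *ctrl, u16 sqid,
				   u16 flags, u16 qsize, u64 prp1);
static u16 pciep_example_delete_sq(struct nvmet_ctrl *ctrl, u16 sqid);
static u16 pciep_example_create_cq(struct nvmet_ctrl *ctrl, u16 cqid,
				   u16 flags, u16 qsize, u64 prp1,
				   u16 irq_vector);
static u16 pciep_example_delete_cq(struct nvmet_ctrl *ctrl, u16 cqid);
static u16 pciep_example_set_feature(const struct nvmet_ctrl *ctrl, u8 feat,
				     void *feat_data);
static u16 pciep_example_get_feature(const struct nvmet_ctrl *ctrl, u8 feat,
				     void *feat_data);

static const struct nvmet_fabrics_ops pciep_example_ops = {
	.owner		= THIS_MODULE,
	.type		= NVMF_TRTYPE_PCI,
	.create_sq	= pciep_example_create_sq,
	.delete_sq	= pciep_example_delete_sq,
	.create_cq	= pciep_example_create_cq,
	.delete_cq	= pciep_example_delete_cq,
	.set_feature	= pciep_example_set_feature,
	.get_feature	= pciep_example_get_feature,
};
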
#define NVMET_MAX_INLINE_BIOVEC 8
@ -520,18 +536,24 @@ void nvmet_start_keep_alive_timer(struct nvmet_ctrl *ctrl);
void nvmet_stop_keep_alive_timer(struct nvmet_ctrl *ctrl);
u16 nvmet_parse_connect_cmd(struct nvmet_req *req);
u32 nvmet_connect_cmd_data_len(struct nvmet_req *req);
void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id);
u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req);
u16 nvmet_file_parse_io_cmd(struct nvmet_req *req);
u16 nvmet_bdev_zns_parse_io_cmd(struct nvmet_req *req);
u32 nvmet_admin_cmd_data_len(struct nvmet_req *req);
u16 nvmet_parse_admin_cmd(struct nvmet_req *req);
u32 nvmet_discovery_cmd_data_len(struct nvmet_req *req);
u16 nvmet_parse_discovery_cmd(struct nvmet_req *req);
u16 nvmet_parse_fabrics_admin_cmd(struct nvmet_req *req);
u32 nvmet_fabrics_admin_cmd_data_len(struct nvmet_req *req);
u16 nvmet_parse_fabrics_io_cmd(struct nvmet_req *req);
u32 nvmet_fabrics_io_cmd_data_len(struct nvmet_req *req);
bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
struct nvmet_sq *sq, const struct nvmet_fabrics_ops *ops);
void nvmet_req_uninit(struct nvmet_req *req);
size_t nvmet_req_transfer_len(struct nvmet_req *req);
bool nvmet_check_transfer_len(struct nvmet_req *req, size_t len);
bool nvmet_check_data_len_lte(struct nvmet_req *req, size_t data_len);
void nvmet_req_complete(struct nvmet_req *req, u16 status);
@ -542,19 +564,37 @@ void nvmet_execute_set_features(struct nvmet_req *req);
void nvmet_execute_get_features(struct nvmet_req *req);
void nvmet_execute_keep_alive(struct nvmet_req *req);
u16 nvmet_check_cqid(struct nvmet_ctrl *ctrl, u16 cqid);
void nvmet_cq_setup(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq, u16 qid,
u16 size);
u16 nvmet_cq_create(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq, u16 qid,
u16 size);
u16 nvmet_check_sqid(struct nvmet_ctrl *ctrl, u16 sqid, bool create);
void nvmet_sq_setup(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq, u16 qid,
u16 size);
u16 nvmet_sq_create(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq, u16 qid,
u16 size);
void nvmet_sq_destroy(struct nvmet_sq *sq);
int nvmet_sq_init(struct nvmet_sq *sq);
void nvmet_ctrl_fatal_error(struct nvmet_ctrl *ctrl);
void nvmet_update_cc(struct nvmet_ctrl *ctrl, u32 new);
u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
struct nvmet_req *req, u32 kato, struct nvmet_ctrl **ctrlp,
uuid_t *hostid);
struct nvmet_alloc_ctrl_args {
struct nvmet_port *port;
char *subsysnqn;
char *hostnqn;
uuid_t *hostid;
const struct nvmet_fabrics_ops *ops;
struct device *p2p_client;
u32 kato;
u32 result;
u16 error_loc;
u16 status;
};
struct nvmet_ctrl *nvmet_alloc_ctrl(struct nvmet_alloc_ctrl_args *args);
struct nvmet_ctrl *nvmet_ctrl_find_get(const char *subsysnqn,
const char *hostnqn, u16 cntlid,
struct nvmet_req *req);
@ -696,6 +736,11 @@ static inline bool nvmet_is_disc_subsys(struct nvmet_subsys *subsys)
return subsys->type != NVME_NQN_NVME;
}
static inline bool nvmet_is_pci_ctrl(struct nvmet_ctrl *ctrl)
{
return ctrl->port->disc_addr.trtype == NVMF_TRTYPE_PCI;
}
#ifdef CONFIG_NVME_TARGET_PASSTHRU
void nvmet_passthru_subsys_free(struct nvmet_subsys *subsys);
int nvmet_passthru_ctrl_enable(struct nvmet_subsys *subsys);
@ -737,6 +782,41 @@ void nvmet_passthrough_override_cap(struct nvmet_ctrl *ctrl);
u16 errno_to_nvme_status(struct nvmet_req *req, int errno);
u16 nvmet_report_invalid_opcode(struct nvmet_req *req);
static inline bool nvmet_cc_en(u32 cc)
{
return (cc >> NVME_CC_EN_SHIFT) & 0x1;
}
static inline u8 nvmet_cc_css(u32 cc)
{
return (cc >> NVME_CC_CSS_SHIFT) & 0x7;
}
static inline u8 nvmet_cc_mps(u32 cc)
{
return (cc >> NVME_CC_MPS_SHIFT) & 0xf;
}
static inline u8 nvmet_cc_ams(u32 cc)
{
return (cc >> NVME_CC_AMS_SHIFT) & 0x7;
}
static inline u8 nvmet_cc_shn(u32 cc)
{
return (cc >> NVME_CC_SHN_SHIFT) & 0x3;
}
static inline u8 nvmet_cc_iosqes(u32 cc)
{
return (cc >> NVME_CC_IOSQES_SHIFT) & 0xf;
}
static inline u8 nvmet_cc_iocqes(u32 cc)
{
return (cc >> NVME_CC_IOCQES_SHIFT) & 0xf;
}
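
As a quick illustration of what these helpers extract, here is a small standalone C program (userspace, for illustration only) decoding 0x00460001, a CC value hosts commonly write to enable a controller. The shift positions mirror the NVME_CC_*_SHIFT constants used above, following the CC register layout in the NVMe base specification.

#include <stdio.h>

int main(void)
{
	unsigned int cc = 0x00460001;

	printf("EN=%u\n",     (cc >> 0)  & 0x1);	/* 1: enable            */
	printf("CSS=%u\n",    (cc >> 4)  & 0x7);	/* 0: NVM command set   */
	printf("MPS=%u\n",    (cc >> 7)  & 0xf);	/* 0: 4 KiB pages       */
	printf("AMS=%u\n",    (cc >> 11) & 0x7);	/* 0: round robin       */
	printf("SHN=%u\n",    (cc >> 14) & 0x3);	/* 0: no shutdown       */
	printf("IOSQES=%u\n", (cc >> 16) & 0xf);	/* 6: 64 B SQ entries   */
	printf("IOCQES=%u\n", (cc >> 20) & 0xf);	/* 4: 16 B CQ entries   */
	return 0;
}
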
/* Convert a 32-bit number to a 16-bit 0's based number */
static inline __le16 to0based(u32 a)
{
@ -773,7 +853,9 @@ static inline void nvmet_req_bio_put(struct nvmet_req *req, struct bio *bio)
}
#ifdef CONFIG_NVME_TARGET_AUTH
u32 nvmet_auth_send_data_len(struct nvmet_req *req);
void nvmet_execute_auth_send(struct nvmet_req *req);
u32 nvmet_auth_receive_data_len(struct nvmet_req *req);
void nvmet_execute_auth_receive(struct nvmet_req *req);
int nvmet_auth_set_key(struct nvmet_host *host, const char *secret,
bool set_ctrl);
@ -831,4 +913,26 @@ static inline void nvmet_pr_put_ns_pc_ref(struct nvmet_pr_per_ctrl_ref *pc_ref)
{
percpu_ref_put(&pc_ref->ref);
}
/*
* Data for the get_feature() and set_feature() operations of PCI target
* controllers.
*/
struct nvmet_feat_irq_coalesce {
u8 thr;
u8 time;
};
struct nvmet_feat_irq_config {
u16 iv;
bool cd;
};
struct nvmet_feat_arbitration {
u8 hpw;
u8 mpw;
u8 lpw;
u8 ab;
};
#endif /* _NVMET_H */


@ -261,6 +261,7 @@ static int nvmet_passthru_map_sg(struct nvmet_req *req, struct request *rq)
{
struct scatterlist *sg;
struct bio *bio;
int ret = -EINVAL;
int i;
if (req->sg_cnt > BIO_MAX_VECS)
@ -277,16 +278,19 @@ static int nvmet_passthru_map_sg(struct nvmet_req *req, struct request *rq)
}
for_each_sg(req->sg, sg, req->sg_cnt, i) {
if (bio_add_pc_page(rq->q, bio, sg_page(sg), sg->length,
sg->offset) < sg->length) {
nvmet_req_bio_put(req, bio);
return -EINVAL;
}
if (bio_add_page(bio, sg_page(sg), sg->length, sg->offset) <
sg->length)
goto out_bio_put;
}
blk_rq_bio_prep(rq, bio, req->sg_cnt);
ret = blk_rq_append_bio(rq, bio);
if (ret)
goto out_bio_put;
return 0;
out_bio_put:
nvmet_req_bio_put(req, bio);
return ret;
}
static void nvmet_passthru_execute_cmd(struct nvmet_req *req)

File diff suppressed because it is too large.


@ -586,8 +586,7 @@ void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
for_each_sg(req->sg, sg, req->sg_cnt, sg_cnt) {
unsigned int len = sg->length;
if (bio_add_pc_page(bdev_get_queue(bio->bi_bdev), bio,
sg_page(sg), len, sg->offset) != len) {
if (bio_add_page(bio, sg_page(sg), len, sg->offset) != len) {
status = NVME_SC_INTERNAL;
goto out_put_bio;
}


@ -1670,6 +1670,19 @@ static void pci_dma_cleanup(struct device *dev)
iommu_device_unuse_default_domain(dev);
}
/*
* pci_device_irq_get_affinity - get IRQ affinity mask for device
* @dev: ptr to dev structure
* @irq_vec: interrupt vector number
*
* Return the CPU affinity mask for @dev and @irq_vec.
*/
static const struct cpumask *pci_device_irq_get_affinity(struct device *dev,
unsigned int irq_vec)
{
return pci_irq_get_affinity(to_pci_dev(dev), irq_vec);
}
const struct bus_type pci_bus_type = {
.name = "pci",
.match = pci_bus_match,
@ -1677,6 +1690,7 @@ const struct bus_type pci_bus_type = {
.probe = pci_device_probe,
.remove = pci_device_remove,
.shutdown = pci_device_shutdown,
.irq_get_affinity = pci_device_irq_get_affinity,
.dev_groups = pci_dev_groups,
.bus_groups = pci_bus_groups,
.drv_groups = pci_drv_groups,


@ -56,7 +56,6 @@ int dasd_gendisk_alloc(struct dasd_block *block)
block->tag_set.cmd_size = sizeof(struct dasd_ccw_req);
block->tag_set.nr_hw_queues = nr_hw_queues;
block->tag_set.queue_depth = queue_depth;
block->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
block->tag_set.numa_node = NUMA_NO_NODE;
rc = blk_mq_alloc_tag_set(&block->tag_set);
if (rc)


@ -461,7 +461,6 @@ int scm_blk_dev_setup(struct scm_blk_dev *bdev, struct scm_device *scmdev)
bdev->tag_set.cmd_size = sizeof(blk_status_t);
bdev->tag_set.nr_hw_queues = nr_requests;
bdev->tag_set.queue_depth = nr_requests_per_io * nr_requests;
bdev->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
bdev->tag_set.numa_node = NUMA_NO_NODE;
ret = blk_mq_alloc_tag_set(&bdev->tag_set);


@ -16,7 +16,6 @@
#include <linux/spinlock.h>
#include <linux/workqueue.h>
#include <linux/if_ether.h>
#include <linux/blk-mq-pci.h>
#include <scsi/fc/fc_fip.h>
#include <scsi/scsi_host.h>
#include <scsi/scsi_transport.h>
@ -601,7 +600,7 @@ void fnic_mq_map_queues_cpus(struct Scsi_Host *host)
return;
}
blk_mq_pci_map_queues(qmap, l_pdev, FNIC_PCI_OFFSET);
blk_mq_map_hw_queues(qmap, &l_pdev->dev, FNIC_PCI_OFFSET);
}
static int fnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)


@ -9,7 +9,6 @@
#include <linux/acpi.h>
#include <linux/blk-mq.h>
#include <linux/blk-mq-pci.h>
#include <linux/clk.h>
#include <linux/debugfs.h>
#include <linux/dmapool.h>


@ -3328,7 +3328,7 @@ static void hisi_sas_map_queues(struct Scsi_Host *shost)
if (i == HCTX_TYPE_POLL)
blk_mq_map_queues(qmap);
else
blk_mq_pci_map_queues(qmap, hisi_hba->pci_dev,
blk_mq_map_hw_queues(qmap, hisi_hba->dev,
BASE_VECTORS_V3_HW);
qoff += qmap->nr_queues;
}
@ -3345,7 +3345,7 @@ static const struct scsi_host_template sht_v3_hw = {
.slave_alloc = hisi_sas_slave_alloc,
.shost_groups = host_v3_hw_groups,
.sdev_groups = sdev_groups_v3_hw,
.tag_alloc_policy = BLK_TAG_ALLOC_RR,
.tag_alloc_policy_rr = true,
.host_reset = hisi_sas_host_reset,
.host_tagset = 1,
.mq_poll = queue_complete_v3_hw,


@ -37,7 +37,6 @@
#include <linux/poll.h>
#include <linux/vmalloc.h>
#include <linux/irq_poll.h>
#include <linux/blk-mq-pci.h>
#include <scsi/scsi.h>
#include <scsi/scsi_cmnd.h>
@ -3193,7 +3192,7 @@ static void megasas_map_queues(struct Scsi_Host *shost)
map = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
map->nr_queues = instance->msix_vectors - offset;
map->queue_offset = 0;
blk_mq_pci_map_queues(map, instance->pdev, offset);
blk_mq_map_hw_queues(map, &instance->pdev->dev, offset);
qoff += map->nr_queues;
offset += map->nr_queues;


@ -12,7 +12,6 @@
#include <linux/blkdev.h>
#include <linux/blk-mq.h>
#include <linux/blk-mq-pci.h>
#include <linux/delay.h>
#include <linux/dmapool.h>
#include <linux/errno.h>


@ -4042,7 +4042,7 @@ static void mpi3mr_map_queues(struct Scsi_Host *shost)
*/
map->queue_offset = qoff;
if (i != HCTX_TYPE_POLL)
blk_mq_pci_map_queues(map, mrioc->pdev, offset);
blk_mq_map_hw_queues(map, &mrioc->pdev->dev, offset);
else
blk_mq_map_queues(map);


@ -53,7 +53,6 @@
#include <linux/pci.h>
#include <linux/interrupt.h>
#include <linux/raid_class.h>
#include <linux/blk-mq-pci.h>
#include <linux/unaligned.h>
#include "mpt3sas_base.h"
@ -11890,7 +11889,7 @@ static void scsih_map_queues(struct Scsi_Host *shost)
*/
map->queue_offset = qoff;
if (i != HCTX_TYPE_POLL)
blk_mq_pci_map_queues(map, ioc->pdev, offset);
blk_mq_map_hw_queues(map, &ioc->pdev->dev, offset);
else
blk_mq_map_queues(map);


@ -105,7 +105,7 @@ static void pm8001_map_queues(struct Scsi_Host *shost)
struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
if (pm8001_ha->number_of_intr > 1) {
blk_mq_pci_map_queues(qmap, pm8001_ha->pdev, 1);
blk_mq_map_hw_queues(qmap, &pm8001_ha->pdev->dev, 1);
return;
}


@ -56,7 +56,6 @@
#include <scsi/sas_ata.h>
#include <linux/atomic.h>
#include <linux/blk-mq.h>
#include <linux/blk-mq-pci.h>
#include "pm8001_defs.h"
#define DRV_NAME "pm80xx"


@ -8,7 +8,6 @@
#include <linux/delay.h>
#include <linux/nvme.h>
#include <linux/nvme-fc.h>
#include <linux/blk-mq-pci.h>
#include <linux/blk-mq.h>
static struct nvme_fc_port_template qla_nvme_fc_transport;
@ -841,7 +840,7 @@ static void qla_nvme_map_queues(struct nvme_fc_local_port *lport,
{
struct scsi_qla_host *vha = lport->private;
blk_mq_pci_map_queues(map, vha->hw->pdev, vha->irq_offset);
blk_mq_map_hw_queues(map, &vha->hw->pdev->dev, vha->irq_offset);
}
static void qla_nvme_localport_delete(struct nvme_fc_local_port *lport)
