Merge tag 'nvme-6.14-2025-01-12' of git://git.infradead.org/nvme into for-6.14/block

Pull NVMe updates from Keith:

"nvme updates for Linux 6.14

 - Target support for PCI-Endpoint transport (Damien)
 - TCP IO queue spreading fixes (Sagi, Chaitanya)
 - Target handling for "limited retry" flags (Guixen)
 - Poll type fix (Yongsoo)
 - Xarray storage error handling (Keisuke)
 - Host memory buffer free size fix on error (Francis)"

* tag 'nvme-6.14-2025-01-12' of git://git.infradead.org/nvme: (25 commits)
  nvme-pci: use correct size to free the hmb buffer
  nvme: Add error path for xa_store in nvme_init_effects
  nvme-pci: fix comment typo
  Documentation: Document the NVMe PCI endpoint target driver
  nvmet: New NVMe PCI endpoint function target driver
  nvmet: Implement arbitration feature support
  nvmet: Implement interrupt config feature support
  nvmet: Implement interrupt coalescing feature support
  nvmet: Implement host identifier set feature support
  nvmet: Introduce get/set_feature controller operations
  nvmet: Do not require SGL for PCI target controller commands
  nvmet: Add support for I/O queue management admin commands
  nvmet: Introduce nvmet_sq_create() and nvmet_cq_create()
  nvmet: Introduce nvmet_req_transfer_len()
  nvmet: Improve nvmet_alloc_ctrl() interface and implementation
  nvme: Add PCI transport type
  nvmet: Add drvdata field to struct nvmet_ctrl
  nvmet: Introduce nvmet_get_cmd_effects_admin()
  nvmet: Export nvmet_update_cc() and nvmet_cc_xxx() helpers
  nvmet: Add vendor_id and subsys_vendor_id subsystem attributes
  ...
Jens Axboe 2025-01-13 07:12:15 -07:00
commit 9752b55035
21 changed files with 3962 additions and 188 deletions

View File

@ -15,6 +15,7 @@ PCI Endpoint Framework
pci-ntb-howto
pci-vntb-function
pci-vntb-howto
pci-nvme-function
function/binding/pci-test
function/binding/pci-ntb

View File

@ -0,0 +1,13 @@
.. SPDX-License-Identifier: GPL-2.0
=================
PCI NVMe Function
=================
:Author: Damien Le Moal <dlemoal@kernel.org>
The PCI NVMe endpoint function implements a PCI NVMe controller using the NVMe
subsystem target core code. The driver for this function resides with the NVMe
subsystem as drivers/nvme/target/pci-epf.c.
See Documentation/nvme/nvme-pci-endpoint-target.rst for more details.

View File

@ -0,0 +1,12 @@
.. SPDX-License-Identifier: GPL-2.0
==============
NVMe Subsystem
==============
.. toctree::
:maxdepth: 2
:numbered:
feature-and-quirk-policy
nvme-pci-endpoint-target

View File

@ -0,0 +1,368 @@
.. SPDX-License-Identifier: GPL-2.0
=================================
NVMe PCI Endpoint Function Target
=================================
:Author: Damien Le Moal <dlemoal@kernel.org>
The NVMe PCI endpoint function target driver implements a NVMe PCIe controller
using a NVMe fabrics target controller configured with the PCI transport type.
Overview
========
The NVMe PCI endpoint function target driver allows exposing a NVMe target
controller over a PCIe link, thus implementing an NVMe PCIe device similar to a
regular M.2 SSD. The target controller is created in the same manner as when
using NVMe over fabrics: the controller represents the interface to an NVMe
subsystem using a port. The port transfer type must be configured to be
"pci". The subsystem can be configured to have namespaces backed by regular
files or block devices, or can use NVMe passthrough to expose to the PCI host an
existing physical NVMe device or a NVMe fabrics host controller (e.g. a NVMe TCP
host controller).
The NVMe PCI endpoint function target driver relies as much as possible on the
NVMe target core code to parse and execute NVMe commands submitted by the PCIe
host. However, using the PCI endpoint framework API and DMA API, the driver is
also responsible for managing all data transfers over the PCIe link. This
implies that the NVMe PCI endpoint function target driver implements some
NVMe data structure management and command parsing of its own:
1) The driver manages retrieval of NVMe commands in submission queues using DMA
if supported, or MMIO otherwise. Each command retrieved is then executed
using a work item to maximize performance with the parallel execution of
multiple commands on different CPUs. The driver uses a work item to
constantly poll the doorbells of all submission queues to detect command
submissions from the PCIe host (a simplified sketch of this polling loop
follows this list).
2) The driver transfers the completion queue entries of completed commands to
the PCIe host by copying the entries into the host completion queue using
MMIO. After posting completion entries to a completion queue, the driver uses
the PCI endpoint framework API to raise an interrupt to the host to signal
the completion of the commands.
3) For any command that has a data buffer, the NVMe PCI endpoint target driver
parses the command PRP or SGL lists to create a list of PCI address
segments representing the mapping of the command data buffer on the host.
The command data buffer is then transferred over the PCIe link using this
list of PCI address segments, with DMA if supported. If DMA is not supported,
MMIO is used, which results in poor performance. For write commands, the command
data buffer is transferred from the host into a local memory buffer before
executing the command using the target core code. For read commands, a local
memory buffer is allocated to execute the command and the content of that
buffer is transferred to the host once the command completes.
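The following is a minimal, self-contained sketch of the submission queue
polling described in item 1 above. The types and names are illustrative
assumptions, not the actual driver symbols: a work item repeatedly compares
each submission queue tail doorbell written by the host with the tail last
seen by the driver and fetches any new entries.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   struct sketch_sq {
           volatile uint32_t *db;  /* SQ tail doorbell register in BAR 0 */
           uint16_t tail;          /* last tail value seen by the driver */
           uint16_t depth;         /* number of entries in the queue */
   };

   /* Returns true if at least one new command was fetched on this pass. */
   static bool sketch_poll_sq(struct sketch_sq *sq,
                              void (*exec_cmd)(uint16_t slot))
   {
           uint16_t new_tail = (uint16_t)*sq->db;
           bool found = false;

           while (sq->tail != new_tail) {
                   /* Fetch the SQE at sq->tail (DMA or MMIO) and hand it to
                    * a work item so it executes in parallel on another CPU. */
                   exec_cmd(sq->tail);
                   sq->tail = (sq->tail + 1) % sq->depth;
                   found = true;
           }
           return found;
   }

As described in item 1, the actual driver runs such a polling pass over all
submission queues from a work item and executes each fetched command through
the target core code.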
Controller Capabilities
-----------------------
The NVMe capabilities exposed to the PCIe host through the BAR 0 registers
are almost identical to the capabilities of the NVMe target controller
implemented by the target core code. There are some exceptions (see the
sketch after this list).
1) The NVMe PCI endpoint target driver always sets the controller capability
CQR bit to request "Contiguous Queues Required". This is to facilitate the
mapping of a queue PCI address range to the local CPU address space.
2) The doorbell stride (DSTRD) is always set to 4 bytes.
3) Since the PCI endpoint framework does not provide a way to handle PCI level
resets, the controller capability NSSRS bit (NVM Subsystem Reset Supported)
is always cleared.
4) The boot partition support (BPS), Persistent Memory Region Supported (PMRS)
and Controller Memory Buffer Supported (CMBS) capabilities are never
reported.
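To make these exceptions concrete, the sketch below shows how they would map
onto the Controller Capabilities (CAP) register, assuming the NVMe 2.x bit
layout (the bit positions are taken from the specification, not from the
driver sources in this series).

.. code-block:: c

   #include <stdint.h>

   /* Illustrative only: adjust a CAP value to reflect the exceptions above. */
   static uint64_t sketch_adjust_cap(uint64_t cap)
   {
           cap |= 1ULL << 16;        /* CQR: contiguous queues required */
           cap &= ~(0xfULL << 32);   /* DSTRD = 0: 4 byte doorbell stride */
           cap &= ~(1ULL << 36);     /* NSSRS: no NVM subsystem reset */
           cap &= ~(1ULL << 45);     /* BPS: no boot partition support */
           cap &= ~(1ULL << 56);     /* PMRS: no persistent memory region */
           cap &= ~(1ULL << 57);     /* CMBS: no controller memory buffer */
           return cap;
   }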
Supported Features
------------------
The NVMe PCI endpoint target driver implements support for both PRPs and SGLs.
The driver also implements IRQ vector coalescing and submission queue
arbitration burst.
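As an illustration of the interrupt coalescing support, the Interrupt
Coalescing feature packs the aggregation time in bits 15:8 of CDW11 and the
aggregation threshold in bits 7:0, which is how the target code added by this
series encodes it. A minimal sketch with illustrative names:

.. code-block:: c

   #include <stdint.h>

   struct sketch_irq_coalesce {
           uint8_t time;   /* aggregation time (CDW11 bits 15:8) */
           uint8_t thr;    /* aggregation threshold (CDW11 bits 7:0) */
   };

   static uint32_t sketch_irqc_to_cdw11(struct sketch_irq_coalesce c)
   {
           return ((uint32_t)c.time << 8) | c.thr;
   }

   static struct sketch_irq_coalesce sketch_cdw11_to_irqc(uint32_t cdw11)
   {
           struct sketch_irq_coalesce c = {
                   .time = (cdw11 >> 8) & 0xff,
                   .thr  = cdw11 & 0xff,
           };
           return c;
   }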
The maximum number of queues and the maximum data transfer size (MDTS) are
configurable through configfs before starting the controller. To avoid issues
with excessive local memory usage for executing commands, MDTS defaults to 512
KB and is limited to a maximum of 2 MB (arbitrary limit).
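As a worked example of the MDTS limit: the MDTS field reported in Identify
Controller is the log2 of the maximum transfer size in units of the minimum
memory page size (CAP.MPSMIN, 4 KiB here), so the default of 512 KiB
(524288 B) corresponds to MDTS = 7, matching the "mdts : 7" value reported by
*nvme id-ctrl* later in this document. A small sketch with illustrative names:

.. code-block:: c

   #include <stdint.h>

   /* MDTS is a power of two of CAP.MPSMIN units (assumed 4 KiB here). */
   static uint8_t sketch_mdts_field(uint32_t max_xfer_kib, uint32_t min_page_kib)
   {
           uint32_t units = max_xfer_kib / min_page_kib;  /* 512 / 4 = 128 */
           uint8_t mdts = 0;

           while ((1u << mdts) < units)
                   mdts++;
           return mdts;                                   /* 7 for 512 KiB */
   }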
Minimum number of PCI Address Mapping Windows Required
------------------------------------------------------
Most PCI endpoint controllers provide a limited number of mapping windows for
mapping a PCI address range to local CPU memory addresses. The NVMe PCI
endpoint target controller uses mapping windows for the following:
1) One memory window for raising MSI or MSI-X interrupts
2) One memory window for MMIO transfers
3) One memory window for each completion queue
Given the highly asynchronous nature of the NVMe PCI endpoint target driver
operation, the memory windows as described above will generally not be used
simultaneously, but that may happen. So a safe maximum number of completion
queues that can be supported is equal to the total number of memory mapping
windows of the PCI endpoint controller minus two. E.g. for an endpoint PCI
controller with 32 outbound memory windows available, up to 30 completion
queues can be safely operated without any risk of getting PCI address mapping
errors due to the lack of memory windows.
Maximum Number of Queue Pairs
-----------------------------
Upon binding of the NVMe PCI endpoint target driver to the PCI endpoint
controller, BAR 0 is allocated with enough space to accommodate the admin queue
and multiple I/O queues. The maximum number of I/O queue pairs that can be
supported is limited by several factors, combined in the sketch after this
list.
1) The NVMe target core code limits the maximum number of I/O queues to the
number of online CPUs.
2) The total number of queue pairs, including the admin queue, cannot exceed
the number of MSI-X or MSI vectors available.
3) The total number of completion queues must not exceed the total number of
PCI mapping windows minus 2 (see above).
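A minimal sketch combining the three constraints above (illustrative names
only, not driver code):

.. code-block:: c

   static unsigned int sketch_max_io_qpairs(unsigned int online_cpus,
                                            unsigned int irq_vectors,
                                            unsigned int mapping_windows)
   {
           /* 1) no more I/O queue pairs than online CPUs */
           unsigned int qp = online_cpus;

           /* 2) admin + I/O queue pairs cannot exceed the MSI/MSI-X vectors */
           if (qp > irq_vectors - 1)
                   qp = irq_vectors - 1;

           /* 3) admin + I/O completion queues cannot exceed the number of
            * PCI mapping windows minus two (one window is reserved for
            * raising interrupts and one for MMIO transfers) */
           if (qp > mapping_windows - 2 - 1)
                   qp = mapping_windows - 2 - 1;

           return qp;
   }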
The NVMe endpoint function driver allows configuring the maximum number of
queue pairs through configfs.
Limitations and NVMe Specification Non-Compliance
-------------------------------------------------
Similar to the NVMe target core code, the NVMe PCI endpoint target driver does
not support multiple submission queues using the same completion queue. All
submission queues must specify a unique completion queue.
User Guide
==========
This section describes the hardware requirements and how to set up an NVMe PCI
endpoint target device.
Kernel Requirements
-------------------
The kernel must be compiled with the configuration options CONFIG_PCI_ENDPOINT,
CONFIG_PCI_ENDPOINT_CONFIGFS, and CONFIG_NVME_TARGET_PCI_EPF enabled.
CONFIG_PCI, CONFIG_BLK_DEV_NVME and CONFIG_NVME_TARGET must also be enabled
(obviously).
In addition to this, at least one PCI endpoint controller driver should be
available for the endpoint hardware used.
To facilitate testing, enabling the null-blk driver (CONFIG_BLK_DEV_NULL_BLK)
is also recommended. With this, a simple setup using a null_blk block device
as a subsystem namespace can be used.
Hardware Requirements
---------------------
To use the NVMe PCI endpoint target driver, at least one endpoint controller
device is required.
To find the list of endpoint controller devices in the system::
# ls /sys/class/pci_epc/
a40000000.pcie-ep
If PCI_ENDPOINT_CONFIGFS is enabled::
# ls /sys/kernel/config/pci_ep/controllers
a40000000.pcie-ep
The endpoint board must of course also be connected to a host with a PCI cable
with RX-TX signal swapped. If the host PCI slot used does not have
plug-and-play capabilities, the host should be powered off when the NVMe PCI
endpoint device is configured.
NVMe Endpoint Device
--------------------
Creating an NVMe endpoint device is a two-step process. First, an NVMe target
subsystem and port must be defined. Second, the NVMe PCI endpoint device must
be set up and bound to the subsystem and port created.
Creating a NVMe Subsystem and Port
----------------------------------
Details about how to configure a NVMe target subsystem and port are outside the
scope of this document. The following only provides a simple example of a port
and subsystem with a single namespace backed by a null_blk device.
First, make sure that configfs is enabled::
# mount -t configfs none /sys/kernel/config
Next, create a null_blk device (default settings give a 250 GB device without
memory backing). The block device created will be /dev/nullb0 by default::
# modprobe null_blk
# ls /dev/nullb0
/dev/nullb0
The NVMe PCI endpoint function target driver must be loaded::
# modprobe nvmet_pci_epf
# lsmod | grep nvmet
nvmet_pci_epf 32768 0
nvmet 118784 1 nvmet_pci_epf
nvme_core 131072 2 nvmet_pci_epf,nvmet
Now, create a subsystem and a port that we will use to create a PCI target
controller when setting up the NVMe PCI endpoint target device. In this
example, the port is created with a maximum of 4 I/O queue pairs::
# cd /sys/kernel/config/nvmet/subsystems
# mkdir nvmepf.0.nqn
# echo -n "Linux-pci-epf" > nvmepf.0.nqn/attr_model
# echo "0x1b96" > nvmepf.0.nqn/attr_vendor_id
# echo "0x1b96" > nvmepf.0.nqn/attr_subsys_vendor_id
# echo 1 > nvmepf.0.nqn/attr_allow_any_host
# echo 4 > nvmepf.0.nqn/attr_qid_max
Next, create and enable the subsystem namespace using the null_blk block
device::
# mkdir nvmepf.0.nqn/namespaces/1
# echo -n "/dev/nullb0" > nvmepf.0.nqn/namespaces/1/device_path
# echo 1 > "nvmepf.0.nqn/namespaces/1/enable"
Finally, create the target port and link it to the subsystem::
# cd /sys/kernel/config/nvmet/ports
# mkdir 1
# echo -n "pci" > 1/addr_trtype
# ln -s /sys/kernel/config/nvmet/subsystems/nvmepf.0.nqn \
/sys/kernel/config/nvmet/ports/1/subsystems/nvmepf.0.nqn
Creating a NVMe PCI Endpoint Device
-----------------------------------
With the NVMe target subsystem and port ready for use, the NVMe PCI endpoint
device can now be created and enabled. The NVMe PCI endpoint target driver
should already be loaded (that is done automatically when the port is created)::
# ls /sys/kernel/config/pci_ep/functions
nvmet_pci_epf
Next, create function 0::
# cd /sys/kernel/config/pci_ep/functions/nvmet_pci_epf
# mkdir nvmepf.0
# ls nvmepf.0/
baseclass_code msix_interrupts secondary
cache_line_size nvme subclass_code
deviceid primary subsys_id
interrupt_pin progif_code subsys_vendor_id
msi_interrupts revid vendorid
Configure the function using any device ID (the vendor ID for the device will
be automatically set to the same value as the NVMe target subsystem vendor
ID)::
# cd /sys/kernel/config/pci_ep/functions/nvmet_pci_epf
# echo 0xBEEF > nvmepf.0/deviceid
# echo 32 > nvmepf.0/msix_interrupts
If the PCI endpoint controller used does not support MSI-X, MSI can be
configured instead::
# echo 32 > nvmepf.0/msi_interrupts
Next, let's bind our endpoint device with the target subsystem and port that we
created::
# echo 1 > nvmepf.0/nvme/portid
# echo "nvmepf.0.nqn" > nvmepf.0/nvme/subsysnqn
The endpoint function can then be bound to the endpoint controller and the
controller started::
# cd /sys/kernel/config/pci_ep
# ln -s functions/nvmet_pci_epf/nvmepf.0 controllers/a40000000.pcie-ep/
# echo 1 > controllers/a40000000.pcie-ep/start
On the endpoint machine, kernel messages will show information as the NVMe
target device and endpoint device are created and connected.
.. code-block:: text
null_blk: disk nullb0 created
null_blk: module loaded
nvmet: adding nsid 1 to subsystem nvmepf.0.nqn
nvmet_pci_epf nvmet_pci_epf.0: PCI endpoint controller supports MSI-X, 32 vectors
nvmet: Created nvm controller 1 for subsystem nvmepf.0.nqn for NQN nqn.2014-08.org.nvmexpress:uuid:2ab90791-2246-4fbb-961d-4c3d5a5a0176.
nvmet_pci_epf nvmet_pci_epf.0: New PCI ctrl "nvmepf.0.nqn", 4 I/O queues, mdts 524288 B
PCI Root-Complex Host
---------------------
Booting the PCI host will result in the initialization of the PCIe link (this
may be signaled by the PCI endpoint driver with a kernel message). A kernel
message on the endpoint will also signal when the host NVMe driver enables the
device controller::
nvmet_pci_epf nvmet_pci_epf.0: Enabling controller
On the host side, the NVMe PCI endpoint function target device is
discoverable as a PCI device, with the vendor ID and device ID as configured::
# lspci -n
0000:01:00.0 0108: 1b96:beef
And this device will be recognized as an NVMe device with a single namespace::
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
nvme0n1 259:0 0 250G 0 disk
The NVMe endpoint block device can then be used as any other regular NVMe
namespace block device. The *nvme* command line utility can be used to get more
detailed information about the endpoint device::
# nvme id-ctrl /dev/nvme0
NVME Identify Controller:
vid : 0x1b96
ssvid : 0x1b96
sn : 94993c85650ef7bcd625
mn : Linux-pci-epf
fr : 6.13.0-r
rab : 6
ieee : 000000
cmic : 0xb
mdts : 7
cntlid : 0x1
ver : 0x20100
...
Endpoint Bindings
=================
The NVMe PCI endpoint target driver uses the PCI endpoint configfs device
attributes as follows.
================ ===========================================================
vendorid Ignored (the vendor id of the NVMe target subsystem is used)
deviceid Anything is OK (e.g. PCI_ANY_ID)
revid Do not care
progif_code Must be 0x02 (NVM Express)
baseclass_code Must be 0x01 (PCI_BASE_CLASS_STORAGE)
subclass_code Must be 0x08 (Non-Volatile Memory controller)
cache_line_size Do not care
subsys_vendor_id Ignored (the subsystem vendor id of the NVMe target subsystem
is used)
subsys_id Anything is OK (e.g. PCI_ANY_ID)
msi_interrupts At least equal to the number of queue pairs desired
msix_interrupts At least equal to the number of queue pairs desired
interrupt_pin Interrupt PIN to use if MSI and MSI-X are not supported
================ ===========================================================
The NVMe PCI endpoint target function also has some specific configurable
fields defined in the *nvme* subdirectory of the function directory. These
fields are as follows.
================ ===========================================================
mdts_kb Maximum data transfer size in KiB (default: 512)
portid The ID of the target port to use
subsysnqn The NQN of the target subsystem to use
================ ===========================================================

View File

@ -60,6 +60,7 @@ Storage interfaces
cdrom/index
scsi/index
target/index
nvme/index
Other subsystems
----------------

View File

@ -3093,7 +3093,7 @@ int nvme_get_log(struct nvme_ctrl *ctrl, u32 nsid, u8 log_page, u8 lsp, u8 csi,
static int nvme_get_effects_log(struct nvme_ctrl *ctrl, u8 csi,
struct nvme_effects_log **log)
{
struct nvme_effects_log *cel = xa_load(&ctrl->cels, csi);
struct nvme_effects_log *old, *cel = xa_load(&ctrl->cels, csi);
int ret;
if (cel)
@ -3110,7 +3110,11 @@ static int nvme_get_effects_log(struct nvme_ctrl *ctrl, u8 csi,
return ret;
}
xa_store(&ctrl->cels, csi, cel, GFP_KERNEL);
old = xa_store(&ctrl->cels, csi, cel, GFP_KERNEL);
if (xa_is_err(old)) {
kfree(cel);
return xa_err(old);
}
out:
*log = cel;
return 0;
@ -3172,6 +3176,25 @@ free_data:
return ret;
}
static int nvme_init_effects_log(struct nvme_ctrl *ctrl,
u8 csi, struct nvme_effects_log **log)
{
struct nvme_effects_log *effects, *old;
effects = kzalloc(sizeof(*effects), GFP_KERNEL);
if (!effects)
return -ENOMEM;
old = xa_store(&ctrl->cels, csi, effects, GFP_KERNEL);
if (xa_is_err(old)) {
kfree(effects);
return xa_err(old);
}
*log = effects;
return 0;
}
static void nvme_init_known_nvm_effects(struct nvme_ctrl *ctrl)
{
struct nvme_effects_log *log = ctrl->effects;
@ -3218,10 +3241,9 @@ static int nvme_init_effects(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
}
if (!ctrl->effects) {
ctrl->effects = kzalloc(sizeof(*ctrl->effects), GFP_KERNEL);
if (!ctrl->effects)
return -ENOMEM;
xa_store(&ctrl->cels, NVME_CSI_NVM, ctrl->effects, GFP_KERNEL);
ret = nvme_init_effects_log(ctrl, NVME_CSI_NVM, &ctrl->effects);
if (ret < 0)
return ret;
}
nvme_init_known_nvm_effects(ctrl);

View File

@ -1182,43 +1182,4 @@ static inline bool nvme_multi_css(struct nvme_ctrl *ctrl)
return (ctrl->ctrl_config & NVME_CC_CSS_MASK) == NVME_CC_CSS_CSI;
}
#ifdef CONFIG_NVME_VERBOSE_ERRORS
const char *nvme_get_error_status_str(u16 status);
const char *nvme_get_opcode_str(u8 opcode);
const char *nvme_get_admin_opcode_str(u8 opcode);
const char *nvme_get_fabrics_opcode_str(u8 opcode);
#else /* CONFIG_NVME_VERBOSE_ERRORS */
static inline const char *nvme_get_error_status_str(u16 status)
{
return "I/O Error";
}
static inline const char *nvme_get_opcode_str(u8 opcode)
{
return "I/O Cmd";
}
static inline const char *nvme_get_admin_opcode_str(u8 opcode)
{
return "Admin Cmd";
}
static inline const char *nvme_get_fabrics_opcode_str(u8 opcode)
{
return "Fabrics Cmd";
}
#endif /* CONFIG_NVME_VERBOSE_ERRORS */
static inline const char *nvme_opcode_str(int qid, u8 opcode)
{
return qid ? nvme_get_opcode_str(opcode) :
nvme_get_admin_opcode_str(opcode);
}
static inline const char *nvme_fabrics_opcode_str(
int qid, const struct nvme_command *cmd)
{
if (nvme_is_fabrics(cmd))
return nvme_get_fabrics_opcode_str(cmd->fabrics.fctype);
return nvme_opcode_str(qid, cmd->common.opcode);
}
#endif /* _NVME_H */

View File

@ -372,7 +372,7 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, __le32 *dbbuf_db,
/*
* Ensure that the doorbell is updated before reading the event
* index from memory. The controller needs to provide similar
* ordering to ensure the envent index is updated before reading
* ordering to ensure the event index is updated before reading
* the doorbell.
*/
mb();
@ -1147,13 +1147,13 @@ static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
}
}
static inline int nvme_poll_cq(struct nvme_queue *nvmeq,
struct io_comp_batch *iob)
static inline bool nvme_poll_cq(struct nvme_queue *nvmeq,
struct io_comp_batch *iob)
{
int found = 0;
bool found = false;
while (nvme_cqe_pending(nvmeq)) {
found++;
found = true;
/*
* load-load control dependency between phase and the rest of
* the cqe requires a full read memory barrier
@ -2085,8 +2085,8 @@ static int nvme_alloc_host_mem_single(struct nvme_dev *dev, u64 size)
sizeof(*dev->host_mem_descs), &dev->host_mem_descs_dma,
GFP_KERNEL);
if (!dev->host_mem_descs) {
dma_free_noncontiguous(dev->dev, dev->host_mem_size,
dev->hmb_sgt, DMA_BIDIRECTIONAL);
dma_free_noncontiguous(dev->dev, size, dev->hmb_sgt,
DMA_BIDIRECTIONAL);
dev->hmb_sgt = NULL;
return -ENOMEM;
}

View File

@ -54,6 +54,8 @@ MODULE_PARM_DESC(tls_handshake_timeout,
"nvme TLS handshake timeout in seconds (default 10)");
#endif
static atomic_t nvme_tcp_cpu_queues[NR_CPUS];
#ifdef CONFIG_DEBUG_LOCK_ALLOC
/* lockdep can detect a circular dependency of the form
* sk_lock -> mmap_lock (page fault) -> fs locks -> sk_lock
@ -127,6 +129,7 @@ enum nvme_tcp_queue_flags {
NVME_TCP_Q_ALLOCATED = 0,
NVME_TCP_Q_LIVE = 1,
NVME_TCP_Q_POLLING = 2,
NVME_TCP_Q_IO_CPU_SET = 3,
};
enum nvme_tcp_recv_state {
@ -1562,23 +1565,56 @@ static bool nvme_tcp_poll_queue(struct nvme_tcp_queue *queue)
ctrl->io_queues[HCTX_TYPE_POLL];
}
/**
* Track the number of queues assigned to each cpu using a global per-cpu
* counter and select the least used cpu from the mq_map. Our goal is to spread
* different controllers I/O threads across different cpu cores.
*
* Note that the accounting is not 100% perfect, but we don't need to be, we're
* simply putting our best effort to select the best candidate cpu core that we
* find at any given point.
*/
static void nvme_tcp_set_queue_io_cpu(struct nvme_tcp_queue *queue)
{
struct nvme_tcp_ctrl *ctrl = queue->ctrl;
int qid = nvme_tcp_queue_id(queue);
int n = 0;
struct blk_mq_tag_set *set = &ctrl->tag_set;
int qid = nvme_tcp_queue_id(queue) - 1;
unsigned int *mq_map = NULL;
int cpu, min_queues = INT_MAX, io_cpu;
if (wq_unbound)
goto out;
if (nvme_tcp_default_queue(queue))
n = qid - 1;
mq_map = set->map[HCTX_TYPE_DEFAULT].mq_map;
else if (nvme_tcp_read_queue(queue))
n = qid - ctrl->io_queues[HCTX_TYPE_DEFAULT] - 1;
mq_map = set->map[HCTX_TYPE_READ].mq_map;
else if (nvme_tcp_poll_queue(queue))
n = qid - ctrl->io_queues[HCTX_TYPE_DEFAULT] -
ctrl->io_queues[HCTX_TYPE_READ] - 1;
if (wq_unbound)
queue->io_cpu = WORK_CPU_UNBOUND;
else
queue->io_cpu = cpumask_next_wrap(n - 1, cpu_online_mask, -1, false);
mq_map = set->map[HCTX_TYPE_POLL].mq_map;
if (WARN_ON(!mq_map))
goto out;
/* Search for the least used cpu from the mq_map */
io_cpu = WORK_CPU_UNBOUND;
for_each_online_cpu(cpu) {
int num_queues = atomic_read(&nvme_tcp_cpu_queues[cpu]);
if (mq_map[cpu] != qid)
continue;
if (num_queues < min_queues) {
io_cpu = cpu;
min_queues = num_queues;
}
}
if (io_cpu != WORK_CPU_UNBOUND) {
queue->io_cpu = io_cpu;
atomic_inc(&nvme_tcp_cpu_queues[io_cpu]);
set_bit(NVME_TCP_Q_IO_CPU_SET, &queue->flags);
}
out:
dev_dbg(ctrl->ctrl.device, "queue %d: using cpu %d\n",
qid, queue->io_cpu);
}
static void nvme_tcp_tls_done(void *data, int status, key_serial_t pskid)
@ -1722,7 +1758,7 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
queue->sock->sk->sk_allocation = GFP_ATOMIC;
queue->sock->sk->sk_use_task_frag = false;
nvme_tcp_set_queue_io_cpu(queue);
queue->io_cpu = WORK_CPU_UNBOUND;
queue->request = NULL;
queue->data_remaining = 0;
queue->ddgst_remaining = 0;
@ -1844,6 +1880,9 @@ static void nvme_tcp_stop_queue(struct nvme_ctrl *nctrl, int qid)
if (!test_bit(NVME_TCP_Q_ALLOCATED, &queue->flags))
return;
if (test_and_clear_bit(NVME_TCP_Q_IO_CPU_SET, &queue->flags))
atomic_dec(&nvme_tcp_cpu_queues[queue->io_cpu]);
mutex_lock(&queue->queue_lock);
if (test_and_clear_bit(NVME_TCP_Q_LIVE, &queue->flags))
__nvme_tcp_stop_queue(queue);
@ -1878,9 +1917,10 @@ static int nvme_tcp_start_queue(struct nvme_ctrl *nctrl, int idx)
nvme_tcp_init_recv_ctx(queue);
nvme_tcp_setup_sock_ops(queue);
if (idx)
if (idx) {
nvme_tcp_set_queue_io_cpu(queue);
ret = nvmf_connect_io_queue(nctrl, idx);
else
} else
ret = nvmf_connect_admin_queue(nctrl);
if (!ret) {
@ -2849,6 +2889,7 @@ static struct nvmf_transport_ops nvme_tcp_transport = {
static int __init nvme_tcp_init_module(void)
{
unsigned int wq_flags = WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_SYSFS;
int cpu;
BUILD_BUG_ON(sizeof(struct nvme_tcp_hdr) != 8);
BUILD_BUG_ON(sizeof(struct nvme_tcp_cmd_pdu) != 72);
@ -2866,6 +2907,9 @@ static int __init nvme_tcp_init_module(void)
if (!nvme_tcp_wq)
return -ENOMEM;
for_each_possible_cpu(cpu)
atomic_set(&nvme_tcp_cpu_queues[cpu], 0);
nvmf_register_transport(&nvme_tcp_transport);
return 0;
}

View File

@ -115,3 +115,14 @@ config NVME_TARGET_AUTH
target side.
If unsure, say N.
config NVME_TARGET_PCI_EPF
tristate "NVMe PCI Endpoint Function target support"
depends on NVME_TARGET && PCI_ENDPOINT
depends on NVME_CORE=y || NVME_CORE=NVME_TARGET
help
This enables the NVMe PCI Endpoint Function target driver support,
which allows creating a NVMe PCI controller using an endpoint mode
capable PCI controller.
If unsure, say N.

View File

@ -8,6 +8,7 @@ obj-$(CONFIG_NVME_TARGET_RDMA) += nvmet-rdma.o
obj-$(CONFIG_NVME_TARGET_FC) += nvmet-fc.o
obj-$(CONFIG_NVME_TARGET_FCLOOP) += nvme-fcloop.o
obj-$(CONFIG_NVME_TARGET_TCP) += nvmet-tcp.o
obj-$(CONFIG_NVME_TARGET_PCI_EPF) += nvmet-pci-epf.o
nvmet-y += core.o configfs.o admin-cmd.o fabrics-cmd.o \
discovery.o io-cmd-file.o io-cmd-bdev.o pr.o
@ -20,4 +21,5 @@ nvmet-rdma-y += rdma.o
nvmet-fc-y += fc.o
nvme-fcloop-y += fcloop.o
nvmet-tcp-y += tcp.o
nvmet-pci-epf-y += pci-epf.o
nvmet-$(CONFIG_TRACING) += trace.o

View File

@ -12,6 +12,142 @@
#include <linux/unaligned.h>
#include "nvmet.h"
static void nvmet_execute_delete_sq(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
u16 sqid = le16_to_cpu(req->cmd->delete_queue.qid);
u16 status;
if (!nvmet_is_pci_ctrl(ctrl)) {
status = nvmet_report_invalid_opcode(req);
goto complete;
}
if (!sqid) {
status = NVME_SC_QID_INVALID | NVME_STATUS_DNR;
goto complete;
}
status = nvmet_check_sqid(ctrl, sqid, false);
if (status != NVME_SC_SUCCESS)
goto complete;
status = ctrl->ops->delete_sq(ctrl, sqid);
complete:
nvmet_req_complete(req, status);
}
static void nvmet_execute_create_sq(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
struct nvme_command *cmd = req->cmd;
u16 sqid = le16_to_cpu(cmd->create_sq.sqid);
u16 cqid = le16_to_cpu(cmd->create_sq.cqid);
u16 sq_flags = le16_to_cpu(cmd->create_sq.sq_flags);
u16 qsize = le16_to_cpu(cmd->create_sq.qsize);
u64 prp1 = le64_to_cpu(cmd->create_sq.prp1);
u16 status;
if (!nvmet_is_pci_ctrl(ctrl)) {
status = nvmet_report_invalid_opcode(req);
goto complete;
}
if (!sqid) {
status = NVME_SC_QID_INVALID | NVME_STATUS_DNR;
goto complete;
}
status = nvmet_check_sqid(ctrl, sqid, true);
if (status != NVME_SC_SUCCESS)
goto complete;
/*
* Note: The NVMe specification allows multiple SQs to use the same CQ.
* However, the target code does not really support that. So for now,
* prevent this and fail the command if sqid and cqid are different.
*/
if (!cqid || cqid != sqid) {
pr_err("SQ %u: Unsupported CQID %u\n", sqid, cqid);
status = NVME_SC_CQ_INVALID | NVME_STATUS_DNR;
goto complete;
}
if (!qsize || qsize > NVME_CAP_MQES(ctrl->cap)) {
status = NVME_SC_QUEUE_SIZE | NVME_STATUS_DNR;
goto complete;
}
status = ctrl->ops->create_sq(ctrl, sqid, sq_flags, qsize, prp1);
complete:
nvmet_req_complete(req, status);
}
static void nvmet_execute_delete_cq(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
u16 cqid = le16_to_cpu(req->cmd->delete_queue.qid);
u16 status;
if (!nvmet_is_pci_ctrl(ctrl)) {
status = nvmet_report_invalid_opcode(req);
goto complete;
}
if (!cqid) {
status = NVME_SC_QID_INVALID | NVME_STATUS_DNR;
goto complete;
}
status = nvmet_check_cqid(ctrl, cqid);
if (status != NVME_SC_SUCCESS)
goto complete;
status = ctrl->ops->delete_cq(ctrl, cqid);
complete:
nvmet_req_complete(req, status);
}
static void nvmet_execute_create_cq(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
struct nvme_command *cmd = req->cmd;
u16 cqid = le16_to_cpu(cmd->create_cq.cqid);
u16 cq_flags = le16_to_cpu(cmd->create_cq.cq_flags);
u16 qsize = le16_to_cpu(cmd->create_cq.qsize);
u16 irq_vector = le16_to_cpu(cmd->create_cq.irq_vector);
u64 prp1 = le64_to_cpu(cmd->create_cq.prp1);
u16 status;
if (!nvmet_is_pci_ctrl(ctrl)) {
status = nvmet_report_invalid_opcode(req);
goto complete;
}
if (!cqid) {
status = NVME_SC_QID_INVALID | NVME_STATUS_DNR;
goto complete;
}
status = nvmet_check_cqid(ctrl, cqid);
if (status != NVME_SC_SUCCESS)
goto complete;
if (!qsize || qsize > NVME_CAP_MQES(ctrl->cap)) {
status = NVME_SC_QUEUE_SIZE | NVME_STATUS_DNR;
goto complete;
}
status = ctrl->ops->create_cq(ctrl, cqid, cq_flags, qsize,
prp1, irq_vector);
complete:
nvmet_req_complete(req, status);
}
u32 nvmet_get_log_page_len(struct nvme_command *cmd)
{
u32 len = le16_to_cpu(cmd->get_log_page.numdu);
@ -230,8 +366,18 @@ out:
nvmet_req_complete(req, status);
}
static void nvmet_get_cmd_effects_nvm(struct nvme_effects_log *log)
static void nvmet_get_cmd_effects_admin(struct nvmet_ctrl *ctrl,
struct nvme_effects_log *log)
{
/* For a PCI target controller, advertise support for the queue management admin commands. */
if (nvmet_is_pci_ctrl(ctrl)) {
log->acs[nvme_admin_delete_sq] =
log->acs[nvme_admin_create_sq] =
log->acs[nvme_admin_delete_cq] =
log->acs[nvme_admin_create_cq] =
cpu_to_le32(NVME_CMD_EFFECTS_CSUPP);
}
log->acs[nvme_admin_get_log_page] =
log->acs[nvme_admin_identify] =
log->acs[nvme_admin_abort_cmd] =
@ -240,7 +386,10 @@ static void nvmet_get_cmd_effects_nvm(struct nvme_effects_log *log)
log->acs[nvme_admin_async_event] =
log->acs[nvme_admin_keep_alive] =
cpu_to_le32(NVME_CMD_EFFECTS_CSUPP);
}
static void nvmet_get_cmd_effects_nvm(struct nvme_effects_log *log)
{
log->iocs[nvme_cmd_read] =
log->iocs[nvme_cmd_flush] =
log->iocs[nvme_cmd_dsm] =
@ -265,6 +414,7 @@ static void nvmet_get_cmd_effects_zns(struct nvme_effects_log *log)
static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
struct nvme_effects_log *log;
u16 status = NVME_SC_SUCCESS;
@ -276,6 +426,7 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
switch (req->cmd->get_log_page.csi) {
case NVME_CSI_NVM:
nvmet_get_cmd_effects_admin(ctrl, log);
nvmet_get_cmd_effects_nvm(log);
break;
case NVME_CSI_ZNS:
@ -283,6 +434,7 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
status = NVME_SC_INVALID_IO_CMD_SET;
goto free;
}
nvmet_get_cmd_effects_admin(ctrl, log);
nvmet_get_cmd_effects_nvm(log);
nvmet_get_cmd_effects_zns(log);
break;
@ -507,7 +659,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
struct nvmet_ctrl *ctrl = req->sq->ctrl;
struct nvmet_subsys *subsys = ctrl->subsys;
struct nvme_id_ctrl *id;
u32 cmd_capsule_size;
u32 cmd_capsule_size, ctratt;
u16 status = 0;
if (!subsys->subsys_discovered) {
@ -522,9 +674,8 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
goto out;
}
/* XXX: figure out how to assign real vendors IDs. */
id->vid = 0;
id->ssvid = 0;
id->vid = cpu_to_le16(subsys->vendor_id);
id->ssvid = cpu_to_le16(subsys->subsys_vendor_id);
memcpy(id->sn, ctrl->subsys->serial, NVMET_SN_MAX_SIZE);
memcpy_and_pad(id->mn, sizeof(id->mn), subsys->model_number,
@ -556,8 +707,10 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
/* XXX: figure out what to do about RTD3R/RTD3 */
id->oaes = cpu_to_le32(NVMET_AEN_CFG_OPTIONAL);
id->ctratt = cpu_to_le32(NVME_CTRL_ATTR_HID_128_BIT |
NVME_CTRL_ATTR_TBKAS);
ctratt = NVME_CTRL_ATTR_HID_128_BIT | NVME_CTRL_ATTR_TBKAS;
if (nvmet_is_pci_ctrl(ctrl))
ctratt |= NVME_CTRL_ATTR_RHII;
id->ctratt = cpu_to_le32(ctratt);
id->oacs = 0;
@ -1104,6 +1257,92 @@ u16 nvmet_set_feat_async_event(struct nvmet_req *req, u32 mask)
return 0;
}
static u16 nvmet_set_feat_host_id(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
if (!nvmet_is_pci_ctrl(ctrl))
return NVME_SC_CMD_SEQ_ERROR | NVME_STATUS_DNR;
/*
* The NVMe base specifications v2.1 recommends supporting 128-bits host
* IDs (section 5.1.25.1.28.1). However, that same section also says
* that "The controller may support a 64-bit Host Identifier and/or an
* extended 128-bit Host Identifier". So simplify this support and do
* not support 64-bits host IDs to avoid needing to check that all
* controllers associated with the same subsystem all use the same host
* ID size.
*/
if (!(req->cmd->common.cdw11 & cpu_to_le32(1 << 0))) {
req->error_loc = offsetof(struct nvme_common_command, cdw11);
return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
}
return nvmet_copy_from_sgl(req, 0, &req->sq->ctrl->hostid,
sizeof(req->sq->ctrl->hostid));
}
static u16 nvmet_set_feat_irq_coalesce(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
u32 cdw11 = le32_to_cpu(req->cmd->common.cdw11);
struct nvmet_feat_irq_coalesce irqc = {
.time = (cdw11 >> 8) & 0xff,
.thr = cdw11 & 0xff,
};
/*
* This feature is not supported for fabrics controllers and mandatory
* for PCI controllers.
*/
if (!nvmet_is_pci_ctrl(ctrl)) {
req->error_loc = offsetof(struct nvme_common_command, cdw10);
return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
}
return ctrl->ops->set_feature(ctrl, NVME_FEAT_IRQ_COALESCE, &irqc);
}
static u16 nvmet_set_feat_irq_config(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
u32 cdw11 = le32_to_cpu(req->cmd->common.cdw11);
struct nvmet_feat_irq_config irqcfg = {
.iv = cdw11 & 0xffff,
.cd = (cdw11 >> 16) & 0x1,
};
/*
* This feature is not supported for fabrics controllers and mandatory
* for PCI controllers.
*/
if (!nvmet_is_pci_ctrl(ctrl)) {
req->error_loc = offsetof(struct nvme_common_command, cdw10);
return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
}
return ctrl->ops->set_feature(ctrl, NVME_FEAT_IRQ_CONFIG, &irqcfg);
}
static u16 nvmet_set_feat_arbitration(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
u32 cdw11 = le32_to_cpu(req->cmd->common.cdw11);
struct nvmet_feat_arbitration arb = {
.hpw = (cdw11 >> 24) & 0xff,
.mpw = (cdw11 >> 16) & 0xff,
.lpw = (cdw11 >> 8) & 0xff,
.ab = cdw11 & 0x3,
};
if (!ctrl->ops->set_feature) {
req->error_loc = offsetof(struct nvme_common_command, cdw10);
return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
}
return ctrl->ops->set_feature(ctrl, NVME_FEAT_ARBITRATION, &arb);
}
void nvmet_execute_set_features(struct nvmet_req *req)
{
struct nvmet_subsys *subsys = nvmet_req_subsys(req);
@ -1117,6 +1356,9 @@ void nvmet_execute_set_features(struct nvmet_req *req)
return;
switch (cdw10 & 0xff) {
case NVME_FEAT_ARBITRATION:
status = nvmet_set_feat_arbitration(req);
break;
case NVME_FEAT_NUM_QUEUES:
ncqr = (cdw11 >> 16) & 0xffff;
nsqr = cdw11 & 0xffff;
@ -1127,6 +1369,12 @@ void nvmet_execute_set_features(struct nvmet_req *req)
nvmet_set_result(req,
(subsys->max_qid - 1) | ((subsys->max_qid - 1) << 16));
break;
case NVME_FEAT_IRQ_COALESCE:
status = nvmet_set_feat_irq_coalesce(req);
break;
case NVME_FEAT_IRQ_CONFIG:
status = nvmet_set_feat_irq_config(req);
break;
case NVME_FEAT_KATO:
status = nvmet_set_feat_kato(req);
break;
@ -1134,7 +1382,7 @@ void nvmet_execute_set_features(struct nvmet_req *req)
status = nvmet_set_feat_async_event(req, NVMET_AEN_CFG_ALL);
break;
case NVME_FEAT_HOST_ID:
status = NVME_SC_CMD_SEQ_ERROR | NVME_STATUS_DNR;
status = nvmet_set_feat_host_id(req);
break;
case NVME_FEAT_WRITE_PROTECT:
status = nvmet_set_feat_write_protect(req);
@ -1171,6 +1419,79 @@ static u16 nvmet_get_feat_write_protect(struct nvmet_req *req)
return 0;
}
static u16 nvmet_get_feat_irq_coalesce(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
struct nvmet_feat_irq_coalesce irqc = { };
u16 status;
/*
* This feature is not supported for fabrics controllers and mandatory
* for PCI controllers.
*/
if (!nvmet_is_pci_ctrl(ctrl)) {
req->error_loc = offsetof(struct nvme_common_command, cdw10);
return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
}
status = ctrl->ops->get_feature(ctrl, NVME_FEAT_IRQ_COALESCE, &irqc);
if (status != NVME_SC_SUCCESS)
return status;
nvmet_set_result(req, ((u32)irqc.time << 8) | (u32)irqc.thr);
return NVME_SC_SUCCESS;
}
static u16 nvmet_get_feat_irq_config(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
u32 iv = le32_to_cpu(req->cmd->common.cdw11) & 0xffff;
struct nvmet_feat_irq_config irqcfg = { .iv = iv };
u16 status;
/*
* This feature is not supported for fabrics controllers and mandatory
* for PCI controllers.
*/
if (!nvmet_is_pci_ctrl(ctrl)) {
req->error_loc = offsetof(struct nvme_common_command, cdw10);
return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
}
status = ctrl->ops->get_feature(ctrl, NVME_FEAT_IRQ_CONFIG, &irqcfg);
if (status != NVME_SC_SUCCESS)
return status;
nvmet_set_result(req, ((u32)irqcfg.cd << 16) | iv);
return NVME_SC_SUCCESS;
}
static u16 nvmet_get_feat_arbitration(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
struct nvmet_feat_arbitration arb = { };
u16 status;
if (!ctrl->ops->get_feature) {
req->error_loc = offsetof(struct nvme_common_command, cdw10);
return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
}
status = ctrl->ops->get_feature(ctrl, NVME_FEAT_ARBITRATION, &arb);
if (status != NVME_SC_SUCCESS)
return status;
nvmet_set_result(req,
((u32)arb.hpw << 24) |
((u32)arb.mpw << 16) |
((u32)arb.lpw << 8) |
(arb.ab & 0x3));
return NVME_SC_SUCCESS;
}
void nvmet_get_feat_kato(struct nvmet_req *req)
{
nvmet_set_result(req, req->sq->ctrl->kato * 1000);
@ -1197,21 +1518,24 @@ void nvmet_execute_get_features(struct nvmet_req *req)
* need to come up with some fake values for these.
*/
#if 0
case NVME_FEAT_ARBITRATION:
break;
case NVME_FEAT_POWER_MGMT:
break;
case NVME_FEAT_TEMP_THRESH:
break;
case NVME_FEAT_ERR_RECOVERY:
break;
case NVME_FEAT_IRQ_COALESCE:
break;
case NVME_FEAT_IRQ_CONFIG:
break;
case NVME_FEAT_WRITE_ATOMIC:
break;
#endif
case NVME_FEAT_ARBITRATION:
status = nvmet_get_feat_arbitration(req);
break;
case NVME_FEAT_IRQ_COALESCE:
status = nvmet_get_feat_irq_coalesce(req);
break;
case NVME_FEAT_IRQ_CONFIG:
status = nvmet_get_feat_irq_config(req);
break;
case NVME_FEAT_ASYNC_EVENT:
nvmet_get_feat_async_event(req);
break;
@ -1292,6 +1616,27 @@ out:
nvmet_req_complete(req, status);
}
u32 nvmet_admin_cmd_data_len(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;
if (nvme_is_fabrics(cmd))
return nvmet_fabrics_admin_cmd_data_len(req);
if (nvmet_is_disc_subsys(nvmet_req_subsys(req)))
return nvmet_discovery_cmd_data_len(req);
switch (cmd->common.opcode) {
case nvme_admin_get_log_page:
return nvmet_get_log_page_len(cmd);
case nvme_admin_identify:
return NVME_IDENTIFY_DATA_SIZE;
case nvme_admin_get_features:
return nvmet_feat_data_len(req, le32_to_cpu(cmd->common.cdw10));
default:
return 0;
}
}
u16 nvmet_parse_admin_cmd(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;
@ -1306,13 +1651,30 @@ u16 nvmet_parse_admin_cmd(struct nvmet_req *req)
if (unlikely(ret))
return ret;
/* For PCI controllers, admin commands shall not use SGL. */
if (nvmet_is_pci_ctrl(req->sq->ctrl) && !req->sq->qid &&
cmd->common.flags & NVME_CMD_SGL_ALL)
return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
if (nvmet_is_passthru_req(req))
return nvmet_parse_passthru_admin_cmd(req);
switch (cmd->common.opcode) {
case nvme_admin_delete_sq:
req->execute = nvmet_execute_delete_sq;
return 0;
case nvme_admin_create_sq:
req->execute = nvmet_execute_create_sq;
return 0;
case nvme_admin_get_log_page:
req->execute = nvmet_execute_get_log_page;
return 0;
case nvme_admin_delete_cq:
req->execute = nvmet_execute_delete_cq;
return 0;
case nvme_admin_create_cq:
req->execute = nvmet_execute_create_cq;
return 0;
case nvme_admin_identify:
req->execute = nvmet_execute_identify;
return 0;

View File

@ -37,6 +37,7 @@ static struct nvmet_type_name_map nvmet_transport[] = {
{ NVMF_TRTYPE_RDMA, "rdma" },
{ NVMF_TRTYPE_FC, "fc" },
{ NVMF_TRTYPE_TCP, "tcp" },
{ NVMF_TRTYPE_PCI, "pci" },
{ NVMF_TRTYPE_LOOP, "loop" },
};
@ -46,6 +47,7 @@ static const struct nvmet_type_name_map nvmet_addr_family[] = {
{ NVMF_ADDR_FAMILY_IP6, "ipv6" },
{ NVMF_ADDR_FAMILY_IB, "ib" },
{ NVMF_ADDR_FAMILY_FC, "fc" },
{ NVMF_ADDR_FAMILY_PCI, "pci" },
{ NVMF_ADDR_FAMILY_LOOP, "loop" },
};
@ -1412,6 +1414,49 @@ out_unlock:
}
CONFIGFS_ATTR(nvmet_subsys_, attr_cntlid_max);
static ssize_t nvmet_subsys_attr_vendor_id_show(struct config_item *item,
char *page)
{
return snprintf(page, PAGE_SIZE, "0x%x\n", to_subsys(item)->vendor_id);
}
static ssize_t nvmet_subsys_attr_vendor_id_store(struct config_item *item,
const char *page, size_t count)
{
u16 vid;
if (kstrtou16(page, 0, &vid))
return -EINVAL;
down_write(&nvmet_config_sem);
to_subsys(item)->vendor_id = vid;
up_write(&nvmet_config_sem);
return count;
}
CONFIGFS_ATTR(nvmet_subsys_, attr_vendor_id);
static ssize_t nvmet_subsys_attr_subsys_vendor_id_show(struct config_item *item,
char *page)
{
return snprintf(page, PAGE_SIZE, "0x%x\n",
to_subsys(item)->subsys_vendor_id);
}
static ssize_t nvmet_subsys_attr_subsys_vendor_id_store(struct config_item *item,
const char *page, size_t count)
{
u16 ssvid;
if (kstrtou16(page, 0, &ssvid))
return -EINVAL;
down_write(&nvmet_config_sem);
to_subsys(item)->subsys_vendor_id = ssvid;
up_write(&nvmet_config_sem);
return count;
}
CONFIGFS_ATTR(nvmet_subsys_, attr_subsys_vendor_id);
static ssize_t nvmet_subsys_attr_model_show(struct config_item *item,
char *page)
{
@ -1640,6 +1685,8 @@ static struct configfs_attribute *nvmet_subsys_attrs[] = {
&nvmet_subsys_attr_attr_serial,
&nvmet_subsys_attr_attr_cntlid_min,
&nvmet_subsys_attr_attr_cntlid_max,
&nvmet_subsys_attr_attr_vendor_id,
&nvmet_subsys_attr_attr_subsys_vendor_id,
&nvmet_subsys_attr_attr_model,
&nvmet_subsys_attr_attr_qid_max,
&nvmet_subsys_attr_attr_ieee_oui,
@ -1794,6 +1841,7 @@ static struct config_group *nvmet_referral_make(
return ERR_PTR(-ENOMEM);
INIT_LIST_HEAD(&port->entry);
port->disc_addr.trtype = NVMF_TRTYPE_MAX;
config_group_init_type_name(&port->group, name, &nvmet_referral_type);
return &port->group;
@ -2019,6 +2067,7 @@ static struct config_group *nvmet_ports_make(struct config_group *group,
port->inline_data_size = -1; /* < 0 == let the transport choose */
port->max_queue_size = -1; /* < 0 == let the transport choose */
port->disc_addr.trtype = NVMF_TRTYPE_MAX;
port->disc_addr.portid = cpu_to_le16(portid);
port->disc_addr.adrfam = NVMF_ADDR_FAMILY_MAX;
port->disc_addr.treq = NVMF_TREQ_DISABLE_SQFLOW;

View File

@ -818,6 +818,89 @@ static void nvmet_confirm_sq(struct percpu_ref *ref)
complete(&sq->confirm_done);
}
u16 nvmet_check_cqid(struct nvmet_ctrl *ctrl, u16 cqid)
{
if (!ctrl->sqs)
return NVME_SC_INTERNAL | NVME_STATUS_DNR;
if (cqid > ctrl->subsys->max_qid)
return NVME_SC_QID_INVALID | NVME_STATUS_DNR;
/*
* Note: For PCI controllers, the NVMe specifications allows multiple
* SQs to share a single CQ. However, we do not support this yet, so
* check that there is no SQ defined for a CQ. If one exist, then the
* CQ ID is invalid for creation as well as when the CQ is being
* deleted (as that would mean that the SQ was not deleted before the
* CQ).
*/
if (ctrl->sqs[cqid])
return NVME_SC_QID_INVALID | NVME_STATUS_DNR;
return NVME_SC_SUCCESS;
}
u16 nvmet_cq_create(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq,
u16 qid, u16 size)
{
u16 status;
status = nvmet_check_cqid(ctrl, qid);
if (status != NVME_SC_SUCCESS)
return status;
nvmet_cq_setup(ctrl, cq, qid, size);
return NVME_SC_SUCCESS;
}
EXPORT_SYMBOL_GPL(nvmet_cq_create);
u16 nvmet_check_sqid(struct nvmet_ctrl *ctrl, u16 sqid,
bool create)
{
if (!ctrl->sqs)
return NVME_SC_INTERNAL | NVME_STATUS_DNR;
if (sqid > ctrl->subsys->max_qid)
return NVME_SC_QID_INVALID | NVME_STATUS_DNR;
if ((create && ctrl->sqs[sqid]) ||
(!create && !ctrl->sqs[sqid]))
return NVME_SC_QID_INVALID | NVME_STATUS_DNR;
return NVME_SC_SUCCESS;
}
u16 nvmet_sq_create(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq,
u16 sqid, u16 size)
{
u16 status;
int ret;
if (!kref_get_unless_zero(&ctrl->ref))
return NVME_SC_INTERNAL | NVME_STATUS_DNR;
status = nvmet_check_sqid(ctrl, sqid, true);
if (status != NVME_SC_SUCCESS)
return status;
ret = nvmet_sq_init(sq);
if (ret) {
status = NVME_SC_INTERNAL | NVME_STATUS_DNR;
goto ctrl_put;
}
nvmet_sq_setup(ctrl, sq, sqid, size);
sq->ctrl = ctrl;
return NVME_SC_SUCCESS;
ctrl_put:
nvmet_ctrl_put(ctrl);
return status;
}
EXPORT_SYMBOL_GPL(nvmet_sq_create);
void nvmet_sq_destroy(struct nvmet_sq *sq)
{
struct nvmet_ctrl *ctrl = sq->ctrl;
@ -911,6 +994,33 @@ static inline u16 nvmet_io_cmd_check_access(struct nvmet_req *req)
return 0;
}
static u32 nvmet_io_cmd_transfer_len(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;
u32 metadata_len = 0;
if (nvme_is_fabrics(cmd))
return nvmet_fabrics_io_cmd_data_len(req);
if (!req->ns)
return 0;
switch (req->cmd->common.opcode) {
case nvme_cmd_read:
case nvme_cmd_write:
case nvme_cmd_zone_append:
if (req->sq->ctrl->pi_support && nvmet_ns_has_pi(req->ns))
metadata_len = nvmet_rw_metadata_len(req);
return nvmet_rw_data_len(req) + metadata_len;
case nvme_cmd_dsm:
return nvmet_dsm_len(req);
case nvme_cmd_zone_mgmt_recv:
return (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;
default:
return 0;
}
}
static u16 nvmet_parse_io_cmd(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;
@ -1012,12 +1122,15 @@ bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
/*
* For fabrics, PSDT field shall describe metadata pointer (MPTR) that
* contains an address of a single contiguous physical buffer that is
* byte aligned.
* byte aligned. For PCI controllers, this is optional so not enforced.
*/
if (unlikely((flags & NVME_CMD_SGL_ALL) != NVME_CMD_SGL_METABUF)) {
req->error_loc = offsetof(struct nvme_common_command, flags);
status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
goto fail;
if (!req->sq->ctrl || !nvmet_is_pci_ctrl(req->sq->ctrl)) {
req->error_loc =
offsetof(struct nvme_common_command, flags);
status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
goto fail;
}
}
if (unlikely(!req->sq->ctrl))
@ -1059,11 +1172,27 @@ void nvmet_req_uninit(struct nvmet_req *req)
}
EXPORT_SYMBOL_GPL(nvmet_req_uninit);
size_t nvmet_req_transfer_len(struct nvmet_req *req)
{
if (likely(req->sq->qid != 0))
return nvmet_io_cmd_transfer_len(req);
if (unlikely(!req->sq->ctrl))
return nvmet_connect_cmd_data_len(req);
return nvmet_admin_cmd_data_len(req);
}
EXPORT_SYMBOL_GPL(nvmet_req_transfer_len);
bool nvmet_check_transfer_len(struct nvmet_req *req, size_t len)
{
if (unlikely(len != req->transfer_len)) {
u16 status;
req->error_loc = offsetof(struct nvme_common_command, dptr);
nvmet_req_complete(req, NVME_SC_SGL_INVALID_DATA | NVME_STATUS_DNR);
if (req->cmd->common.flags & NVME_CMD_SGL_ALL)
status = NVME_SC_SGL_INVALID_DATA;
else
status = NVME_SC_INVALID_FIELD;
nvmet_req_complete(req, status | NVME_STATUS_DNR);
return false;
}
@ -1074,8 +1203,14 @@ EXPORT_SYMBOL_GPL(nvmet_check_transfer_len);
bool nvmet_check_data_len_lte(struct nvmet_req *req, size_t data_len)
{
if (unlikely(data_len > req->transfer_len)) {
u16 status;
req->error_loc = offsetof(struct nvme_common_command, dptr);
nvmet_req_complete(req, NVME_SC_SGL_INVALID_DATA | NVME_STATUS_DNR);
if (req->cmd->common.flags & NVME_CMD_SGL_ALL)
status = NVME_SC_SGL_INVALID_DATA;
else
status = NVME_SC_INVALID_FIELD;
nvmet_req_complete(req, status | NVME_STATUS_DNR);
return false;
}
@ -1166,41 +1301,6 @@ void nvmet_req_free_sgls(struct nvmet_req *req)
}
EXPORT_SYMBOL_GPL(nvmet_req_free_sgls);
static inline bool nvmet_cc_en(u32 cc)
{
return (cc >> NVME_CC_EN_SHIFT) & 0x1;
}
static inline u8 nvmet_cc_css(u32 cc)
{
return (cc >> NVME_CC_CSS_SHIFT) & 0x7;
}
static inline u8 nvmet_cc_mps(u32 cc)
{
return (cc >> NVME_CC_MPS_SHIFT) & 0xf;
}
static inline u8 nvmet_cc_ams(u32 cc)
{
return (cc >> NVME_CC_AMS_SHIFT) & 0x7;
}
static inline u8 nvmet_cc_shn(u32 cc)
{
return (cc >> NVME_CC_SHN_SHIFT) & 0x3;
}
static inline u8 nvmet_cc_iosqes(u32 cc)
{
return (cc >> NVME_CC_IOSQES_SHIFT) & 0xf;
}
static inline u8 nvmet_cc_iocqes(u32 cc)
{
return (cc >> NVME_CC_IOCQES_SHIFT) & 0xf;
}
static inline bool nvmet_css_supported(u8 cc_css)
{
switch (cc_css << NVME_CC_CSS_SHIFT) {
@ -1277,6 +1377,7 @@ void nvmet_update_cc(struct nvmet_ctrl *ctrl, u32 new)
ctrl->csts &= ~NVME_CSTS_SHST_CMPLT;
mutex_unlock(&ctrl->lock);
}
EXPORT_SYMBOL_GPL(nvmet_update_cc);
static void nvmet_init_cap(struct nvmet_ctrl *ctrl)
{
@ -1384,15 +1485,15 @@ bool nvmet_host_allowed(struct nvmet_subsys *subsys, const char *hostnqn)
* Note: ctrl->subsys->lock should be held when calling this function
*/
static void nvmet_setup_p2p_ns_map(struct nvmet_ctrl *ctrl,
struct nvmet_req *req)
struct device *p2p_client)
{
struct nvmet_ns *ns;
unsigned long idx;
if (!req->p2p_client)
if (!p2p_client)
return;
ctrl->p2p_client = get_device(req->p2p_client);
ctrl->p2p_client = get_device(p2p_client);
xa_for_each(&ctrl->subsys->namespaces, idx, ns)
nvmet_p2pmem_ns_add_p2p(ctrl, ns);
@ -1421,45 +1522,44 @@ static void nvmet_fatal_error_handler(struct work_struct *work)
ctrl->ops->delete_ctrl(ctrl);
}
u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
struct nvmet_req *req, u32 kato, struct nvmet_ctrl **ctrlp,
uuid_t *hostid)
struct nvmet_ctrl *nvmet_alloc_ctrl(struct nvmet_alloc_ctrl_args *args)
{
struct nvmet_subsys *subsys;
struct nvmet_ctrl *ctrl;
u32 kato = args->kato;
u8 dhchap_status;
int ret;
u16 status;
status = NVME_SC_CONNECT_INVALID_PARAM | NVME_STATUS_DNR;
subsys = nvmet_find_get_subsys(req->port, subsysnqn);
args->status = NVME_SC_CONNECT_INVALID_PARAM | NVME_STATUS_DNR;
subsys = nvmet_find_get_subsys(args->port, args->subsysnqn);
if (!subsys) {
pr_warn("connect request for invalid subsystem %s!\n",
subsysnqn);
req->cqe->result.u32 = IPO_IATTR_CONNECT_DATA(subsysnqn);
req->error_loc = offsetof(struct nvme_common_command, dptr);
goto out;
args->subsysnqn);
args->result = IPO_IATTR_CONNECT_DATA(subsysnqn);
args->error_loc = offsetof(struct nvme_common_command, dptr);
return NULL;
}
down_read(&nvmet_config_sem);
if (!nvmet_host_allowed(subsys, hostnqn)) {
if (!nvmet_host_allowed(subsys, args->hostnqn)) {
pr_info("connect by host %s for subsystem %s not allowed\n",
hostnqn, subsysnqn);
req->cqe->result.u32 = IPO_IATTR_CONNECT_DATA(hostnqn);
args->hostnqn, args->subsysnqn);
args->result = IPO_IATTR_CONNECT_DATA(hostnqn);
up_read(&nvmet_config_sem);
status = NVME_SC_CONNECT_INVALID_HOST | NVME_STATUS_DNR;
req->error_loc = offsetof(struct nvme_common_command, dptr);
args->status = NVME_SC_CONNECT_INVALID_HOST | NVME_STATUS_DNR;
args->error_loc = offsetof(struct nvme_common_command, dptr);
goto out_put_subsystem;
}
up_read(&nvmet_config_sem);
status = NVME_SC_INTERNAL;
args->status = NVME_SC_INTERNAL;
ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
if (!ctrl)
goto out_put_subsystem;
mutex_init(&ctrl->lock);
ctrl->port = req->port;
ctrl->ops = req->ops;
ctrl->port = args->port;
ctrl->ops = args->ops;
#ifdef CONFIG_NVME_TARGET_PASSTHRU
/* By default, set loop targets to clear IDS by default */
@ -1473,8 +1573,8 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
INIT_WORK(&ctrl->fatal_err_work, nvmet_fatal_error_handler);
INIT_DELAYED_WORK(&ctrl->ka_work, nvmet_keep_alive_timer);
memcpy(ctrl->subsysnqn, subsysnqn, NVMF_NQN_SIZE);
memcpy(ctrl->hostnqn, hostnqn, NVMF_NQN_SIZE);
memcpy(ctrl->subsysnqn, args->subsysnqn, NVMF_NQN_SIZE);
memcpy(ctrl->hostnqn, args->hostnqn, NVMF_NQN_SIZE);
kref_init(&ctrl->ref);
ctrl->subsys = subsys;
@ -1497,12 +1597,12 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
subsys->cntlid_min, subsys->cntlid_max,
GFP_KERNEL);
if (ret < 0) {
status = NVME_SC_CONNECT_CTRL_BUSY | NVME_STATUS_DNR;
args->status = NVME_SC_CONNECT_CTRL_BUSY | NVME_STATUS_DNR;
goto out_free_sqs;
}
ctrl->cntlid = ret;
uuid_copy(&ctrl->hostid, hostid);
uuid_copy(&ctrl->hostid, args->hostid);
/*
* Discovery controllers may use some arbitrary high value
@ -1524,12 +1624,35 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
if (ret)
goto init_pr_fail;
list_add_tail(&ctrl->subsys_entry, &subsys->ctrls);
nvmet_setup_p2p_ns_map(ctrl, req);
nvmet_setup_p2p_ns_map(ctrl, args->p2p_client);
nvmet_debugfs_ctrl_setup(ctrl);
mutex_unlock(&subsys->lock);
*ctrlp = ctrl;
return 0;
if (args->hostid)
uuid_copy(&ctrl->hostid, args->hostid);
dhchap_status = nvmet_setup_auth(ctrl);
if (dhchap_status) {
pr_err("Failed to setup authentication, dhchap status %u\n",
dhchap_status);
nvmet_ctrl_put(ctrl);
if (dhchap_status == NVME_AUTH_DHCHAP_FAILURE_FAILED)
args->status =
NVME_SC_CONNECT_INVALID_HOST | NVME_STATUS_DNR;
else
args->status = NVME_SC_INTERNAL;
return NULL;
}
args->status = NVME_SC_SUCCESS;
pr_info("Created %s controller %d for subsystem %s for NQN %s%s%s.\n",
nvmet_is_disc_subsys(ctrl->subsys) ? "discovery" : "nvm",
ctrl->cntlid, ctrl->subsys->subsysnqn, ctrl->hostnqn,
ctrl->pi_support ? " T10-PI is enabled" : "",
nvmet_has_auth(ctrl) ? " with DH-HMAC-CHAP" : "");
return ctrl;
init_pr_fail:
mutex_unlock(&subsys->lock);
@ -1543,9 +1666,9 @@ out_free_ctrl:
kfree(ctrl);
out_put_subsystem:
nvmet_subsys_put(subsys);
out:
return status;
return NULL;
}
EXPORT_SYMBOL_GPL(nvmet_alloc_ctrl);
static void nvmet_ctrl_free(struct kref *ref)
{
@ -1581,6 +1704,7 @@ void nvmet_ctrl_put(struct nvmet_ctrl *ctrl)
{
kref_put(&ctrl->ref, nvmet_ctrl_free);
}
EXPORT_SYMBOL_GPL(nvmet_ctrl_put);
void nvmet_ctrl_fatal_error(struct nvmet_ctrl *ctrl)
{

View File

@ -224,6 +224,9 @@ static void nvmet_execute_disc_get_log_page(struct nvmet_req *req)
}
list_for_each_entry(r, &req->port->referrals, entry) {
if (r->disc_addr.trtype == NVMF_TRTYPE_PCI)
continue;
nvmet_format_discovery_entry(hdr, r,
NVME_DISC_SUBSYS_NAME,
r->disc_addr.traddr,
@ -352,6 +355,20 @@ static void nvmet_execute_disc_get_features(struct nvmet_req *req)
nvmet_req_complete(req, stat);
}
u32 nvmet_discovery_cmd_data_len(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;
switch (cmd->common.opcode) {
case nvme_admin_get_log_page:
return nvmet_get_log_page_len(req->cmd);
case nvme_admin_identify:
return NVME_IDENTIFY_DATA_SIZE;
default:
return 0;
}
}
u16 nvmet_parse_discovery_cmd(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;

View File

@ -179,6 +179,11 @@ static u8 nvmet_auth_failure2(void *d)
return data->rescode_exp;
}
u32 nvmet_auth_send_data_len(struct nvmet_req *req)
{
return le32_to_cpu(req->cmd->auth_send.tl);
}
void nvmet_execute_auth_send(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
@ -206,7 +211,7 @@ void nvmet_execute_auth_send(struct nvmet_req *req)
offsetof(struct nvmf_auth_send_command, spsp1);
goto done;
}
tl = le32_to_cpu(req->cmd->auth_send.tl);
tl = nvmet_auth_send_data_len(req);
if (!tl) {
status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
req->error_loc =
@ -429,6 +434,11 @@ static void nvmet_auth_failure1(struct nvmet_req *req, void *d, int al)
data->rescode_exp = req->sq->dhchap_status;
}
u32 nvmet_auth_receive_data_len(struct nvmet_req *req)
{
return le32_to_cpu(req->cmd->auth_receive.al);
}
void nvmet_execute_auth_receive(struct nvmet_req *req)
{
struct nvmet_ctrl *ctrl = req->sq->ctrl;
@ -454,7 +464,7 @@ void nvmet_execute_auth_receive(struct nvmet_req *req)
offsetof(struct nvmf_auth_receive_command, spsp1);
goto done;
}
al = le32_to_cpu(req->cmd->auth_receive.al);
al = nvmet_auth_receive_data_len(req);
if (!al) {
status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
req->error_loc =

View File

@ -85,6 +85,22 @@ static void nvmet_execute_prop_get(struct nvmet_req *req)
nvmet_req_complete(req, status);
}
u32 nvmet_fabrics_admin_cmd_data_len(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;
switch (cmd->fabrics.fctype) {
#ifdef CONFIG_NVME_TARGET_AUTH
case nvme_fabrics_type_auth_send:
return nvmet_auth_send_data_len(req);
case nvme_fabrics_type_auth_receive:
return nvmet_auth_receive_data_len(req);
#endif
default:
return 0;
}
}
u16 nvmet_parse_fabrics_admin_cmd(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;
@ -114,6 +130,22 @@ u16 nvmet_parse_fabrics_admin_cmd(struct nvmet_req *req)
return 0;
}
u32 nvmet_fabrics_io_cmd_data_len(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;
switch (cmd->fabrics.fctype) {
#ifdef CONFIG_NVME_TARGET_AUTH
case nvme_fabrics_type_auth_send:
return nvmet_auth_send_data_len(req);
case nvme_fabrics_type_auth_receive:
return nvmet_auth_receive_data_len(req);
#endif
default:
return 0;
}
}
u16 nvmet_parse_fabrics_io_cmd(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;
@ -213,73 +245,67 @@ static void nvmet_execute_admin_connect(struct nvmet_req *req)
struct nvmf_connect_command *c = &req->cmd->connect;
struct nvmf_connect_data *d;
struct nvmet_ctrl *ctrl = NULL;
u16 status;
u8 dhchap_status;
struct nvmet_alloc_ctrl_args args = {
.port = req->port,
.ops = req->ops,
.p2p_client = req->p2p_client,
.kato = le32_to_cpu(c->kato),
};
if (!nvmet_check_transfer_len(req, sizeof(struct nvmf_connect_data)))
return;
d = kmalloc(sizeof(*d), GFP_KERNEL);
if (!d) {
status = NVME_SC_INTERNAL;
args.status = NVME_SC_INTERNAL;
goto complete;
}
status = nvmet_copy_from_sgl(req, 0, d, sizeof(*d));
if (status)
args.status = nvmet_copy_from_sgl(req, 0, d, sizeof(*d));
if (args.status)
goto out;
if (c->recfmt != 0) {
pr_warn("invalid connect version (%d).\n",
le16_to_cpu(c->recfmt));
req->error_loc = offsetof(struct nvmf_connect_command, recfmt);
status = NVME_SC_CONNECT_FORMAT | NVME_STATUS_DNR;
args.error_loc = offsetof(struct nvmf_connect_command, recfmt);
args.status = NVME_SC_CONNECT_FORMAT | NVME_STATUS_DNR;
goto out;
}
if (unlikely(d->cntlid != cpu_to_le16(0xffff))) {
pr_warn("connect attempt for invalid controller ID %#x\n",
d->cntlid);
status = NVME_SC_CONNECT_INVALID_PARAM | NVME_STATUS_DNR;
req->cqe->result.u32 = IPO_IATTR_CONNECT_DATA(cntlid);
args.status = NVME_SC_CONNECT_INVALID_PARAM | NVME_STATUS_DNR;
args.result = IPO_IATTR_CONNECT_DATA(cntlid);
goto out;
}
d->subsysnqn[NVMF_NQN_FIELD_LEN - 1] = '\0';
d->hostnqn[NVMF_NQN_FIELD_LEN - 1] = '\0';
status = nvmet_alloc_ctrl(d->subsysnqn, d->hostnqn, req,
le32_to_cpu(c->kato), &ctrl, &d->hostid);
if (status)
args.subsysnqn = d->subsysnqn;
args.hostnqn = d->hostnqn;
args.hostid = &d->hostid;
args.kato = c->kato;
ctrl = nvmet_alloc_ctrl(&args);
if (!ctrl)
goto out;
dhchap_status = nvmet_setup_auth(ctrl);
if (dhchap_status) {
pr_err("Failed to setup authentication, dhchap status %u\n",
dhchap_status);
nvmet_ctrl_put(ctrl);
if (dhchap_status == NVME_AUTH_DHCHAP_FAILURE_FAILED)
status = (NVME_SC_CONNECT_INVALID_HOST | NVME_STATUS_DNR);
else
status = NVME_SC_INTERNAL;
goto out;
}
status = nvmet_install_queue(ctrl, req);
if (status) {
args.status = nvmet_install_queue(ctrl, req);
if (args.status) {
nvmet_ctrl_put(ctrl);
goto out;
}
pr_info("creating %s controller %d for subsystem %s for NQN %s%s%s.\n",
nvmet_is_disc_subsys(ctrl->subsys) ? "discovery" : "nvm",
ctrl->cntlid, ctrl->subsys->subsysnqn, ctrl->hostnqn,
ctrl->pi_support ? " T10-PI is enabled" : "",
nvmet_has_auth(ctrl) ? " with DH-HMAC-CHAP" : "");
req->cqe->result.u32 = cpu_to_le32(nvmet_connect_result(ctrl));
args.result = cpu_to_le32(nvmet_connect_result(ctrl));
out:
kfree(d);
complete:
nvmet_req_complete(req, status);
req->error_loc = args.error_loc;
req->cqe->result.u32 = args.result;
nvmet_req_complete(req, args.status);
}
static void nvmet_execute_io_connect(struct nvmet_req *req)
@ -343,6 +369,17 @@ out_ctrl_put:
goto out;
}
u32 nvmet_connect_cmd_data_len(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;
if (!nvme_is_fabrics(cmd) ||
cmd->fabrics.fctype != nvme_fabrics_type_connect)
return 0;
return sizeof(struct nvmf_connect_data);
}
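
nvmet_connect_cmd_data_len() reports the fixed size of the connect data block and, together with the discovery, auth and fabrics *_data_len() helpers, backs the new nvmet_req_transfer_len() interface, so a memory-mapped transport can size its data buffer before executing the command. A hypothetical transport-side sketch (only nvmet_req_transfer_len() and NVME_SC_SUCCESS are real; example_alloc_and_map_sgl() is an invented placeholder, not the actual nvmet-pciep code):

/* Hypothetical sketch of a transport preparing its data buffer. */
static u16 example_prepare_data_buffer(struct nvmet_req *req)
{
	size_t len = nvmet_req_transfer_len(req);

	if (!len)
		return NVME_SC_SUCCESS;

	/* Invented helper: allocate 'len' bytes and build req->sg from it. */
	return example_alloc_and_map_sgl(req, len);
}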
u16 nvmet_parse_connect_cmd(struct nvmet_req *req)
{
struct nvme_command *cmd = req->cmd;

View File

@ -272,6 +272,9 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
iter_flags = SG_MITER_FROM_SG;
}
if (req->cmd->rw.control & NVME_RW_LR)
opf |= REQ_FAILFAST_DEV;
if (is_pci_p2pdma_page(sg_page(req->sg)))
opf |= REQ_NOMERGE;
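
The NVME_RW_LR check added above is the target side of the "limited retry" handling called out in the pull summary: when the host sets the LR bit in a read/write command, the backing bio is marked fail-fast so the block layer does not retry on the device's behalf. A standalone illustration of the bit test (constant values are mirrored or invented purely for the example):

#include <stdint.h>
#include <stdio.h>

#define NVME_RW_LR		(1 << 15)	/* Limited Retry bit, command dword 12 */
#define REQ_FAILFAST_DEV	(1u << 8)	/* placeholder value for illustration */

static unsigned int rw_control_to_opf(uint16_t control)
{
	unsigned int opf = 0;

	if (control & NVME_RW_LR)
		opf |= REQ_FAILFAST_DEV;
	return opf;
}

int main(void)
{
	printf("opf=%#x\n", rw_control_to_opf(NVME_RW_LR));
	return 0;
}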

View File

@ -238,6 +238,8 @@ struct nvmet_ctrl {
struct nvmet_subsys *subsys;
struct nvmet_sq **sqs;
void *drvdata;
bool reset_tbkas;
struct mutex lock;
@ -324,6 +326,8 @@ struct nvmet_subsys {
struct config_group namespaces_group;
struct config_group allowed_hosts_group;
u16 vendor_id;
u16 subsys_vendor_id;
char *model_number;
u32 ieee_oui;
char *firmware_rev;
@ -404,6 +408,18 @@ struct nvmet_fabrics_ops {
void (*discovery_chg)(struct nvmet_port *port);
u8 (*get_mdts)(const struct nvmet_ctrl *ctrl);
u16 (*get_max_queue_size)(const struct nvmet_ctrl *ctrl);
/* Operations mandatory for PCI target controllers */
u16 (*create_sq)(struct nvmet_ctrl *ctrl, u16 sqid, u16 flags,
u16 qsize, u64 prp1);
u16 (*delete_sq)(struct nvmet_ctrl *ctrl, u16 sqid);
u16 (*create_cq)(struct nvmet_ctrl *ctrl, u16 cqid, u16 flags,
u16 qsize, u64 prp1, u16 irq_vector);
u16 (*delete_cq)(struct nvmet_ctrl *ctrl, u16 cqid);
u16 (*set_feature)(const struct nvmet_ctrl *ctrl, u8 feat,
void *feat_data);
u16 (*get_feature)(const struct nvmet_ctrl *ctrl, u8 feat,
void *feat_data);
};
#define NVMET_MAX_INLINE_BIOVEC 8
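
The callbacks added to struct nvmet_fabrics_ops above are what a PCI target controller has to provide on top of the existing ops, since queue creation and several features are driven by the host through admin commands rather than by the fabrics layer. A minimal, hypothetical wiring (the pciep_example_* handler names are invented; the required non-PCI ops are elided):

/* Hypothetical ops table; field names are from nvmet_fabrics_ops, handlers are invented. */
static const struct nvmet_fabrics_ops pciep_example_ops = {
	.owner		= THIS_MODULE,
	.type		= NVMF_TRTYPE_PCI,
	/* .add_port, .remove_port, .queue_response, ... elided */
	.create_sq	= pciep_example_create_sq,
	.delete_sq	= pciep_example_delete_sq,
	.create_cq	= pciep_example_create_cq,
	.delete_cq	= pciep_example_delete_cq,
	.get_feature	= pciep_example_get_feature,
	.set_feature	= pciep_example_set_feature,
};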
@ -513,18 +529,24 @@ void nvmet_start_keep_alive_timer(struct nvmet_ctrl *ctrl);
void nvmet_stop_keep_alive_timer(struct nvmet_ctrl *ctrl);
u16 nvmet_parse_connect_cmd(struct nvmet_req *req);
u32 nvmet_connect_cmd_data_len(struct nvmet_req *req);
void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id);
u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req);
u16 nvmet_file_parse_io_cmd(struct nvmet_req *req);
u16 nvmet_bdev_zns_parse_io_cmd(struct nvmet_req *req);
u32 nvmet_admin_cmd_data_len(struct nvmet_req *req);
u16 nvmet_parse_admin_cmd(struct nvmet_req *req);
u32 nvmet_discovery_cmd_data_len(struct nvmet_req *req);
u16 nvmet_parse_discovery_cmd(struct nvmet_req *req);
u16 nvmet_parse_fabrics_admin_cmd(struct nvmet_req *req);
u32 nvmet_fabrics_admin_cmd_data_len(struct nvmet_req *req);
u16 nvmet_parse_fabrics_io_cmd(struct nvmet_req *req);
u32 nvmet_fabrics_io_cmd_data_len(struct nvmet_req *req);
bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
struct nvmet_sq *sq, const struct nvmet_fabrics_ops *ops);
void nvmet_req_uninit(struct nvmet_req *req);
size_t nvmet_req_transfer_len(struct nvmet_req *req);
bool nvmet_check_transfer_len(struct nvmet_req *req, size_t len);
bool nvmet_check_data_len_lte(struct nvmet_req *req, size_t data_len);
void nvmet_req_complete(struct nvmet_req *req, u16 status);
@ -535,19 +557,37 @@ void nvmet_execute_set_features(struct nvmet_req *req);
void nvmet_execute_get_features(struct nvmet_req *req);
void nvmet_execute_keep_alive(struct nvmet_req *req);
u16 nvmet_check_cqid(struct nvmet_ctrl *ctrl, u16 cqid);
void nvmet_cq_setup(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq, u16 qid,
u16 size);
u16 nvmet_cq_create(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq, u16 qid,
u16 size);
u16 nvmet_check_sqid(struct nvmet_ctrl *ctrl, u16 sqid, bool create);
void nvmet_sq_setup(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq, u16 qid,
u16 size);
u16 nvmet_sq_create(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq, u16 qid,
u16 size);
void nvmet_sq_destroy(struct nvmet_sq *sq);
int nvmet_sq_init(struct nvmet_sq *sq);
void nvmet_ctrl_fatal_error(struct nvmet_ctrl *ctrl);
void nvmet_update_cc(struct nvmet_ctrl *ctrl, u32 new);
u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
struct nvmet_req *req, u32 kato, struct nvmet_ctrl **ctrlp,
uuid_t *hostid);
struct nvmet_alloc_ctrl_args {
struct nvmet_port *port;
char *subsysnqn;
char *hostnqn;
uuid_t *hostid;
const struct nvmet_fabrics_ops *ops;
struct device *p2p_client;
u32 kato;
u32 result;
u16 error_loc;
u16 status;
};
struct nvmet_ctrl *nvmet_alloc_ctrl(struct nvmet_alloc_ctrl_args *args);
struct nvmet_ctrl *nvmet_ctrl_find_get(const char *subsysnqn,
const char *hostnqn, u16 cntlid,
struct nvmet_req *req);
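
The args structure above replaces the old positional nvmet_alloc_ctrl() signature: a transport fills in the identity and transport fields, calls nvmet_alloc_ctrl(), and reads status/result back instead of receiving a u16 return. A hedged caller sketch (the surrounding function and the kato value are illustrative):

/* Illustrative caller; nvmet_alloc_ctrl() and the args fields are from the patch. */
static struct nvmet_ctrl *example_create_ctrl(struct nvmet_port *port,
					      const struct nvmet_fabrics_ops *ops,
					      char *subsysnqn, char *hostnqn)
{
	struct nvmet_alloc_ctrl_args args = {
		.port		= port,
		.ops		= ops,
		.subsysnqn	= subsysnqn,
		.hostnqn	= hostnqn,
		.kato		= 15000,	/* placeholder keep-alive, in ms */
	};
	struct nvmet_ctrl *ctrl;

	ctrl = nvmet_alloc_ctrl(&args);
	if (!ctrl)
		pr_err("controller allocation failed, status %#x\n", args.status);
	return ctrl;
}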
@ -689,6 +729,11 @@ static inline bool nvmet_is_disc_subsys(struct nvmet_subsys *subsys)
return subsys->type != NVME_NQN_NVME;
}
static inline bool nvmet_is_pci_ctrl(struct nvmet_ctrl *ctrl)
{
return ctrl->port->disc_addr.trtype == NVMF_TRTYPE_PCI;
}
#ifdef CONFIG_NVME_TARGET_PASSTHRU
void nvmet_passthru_subsys_free(struct nvmet_subsys *subsys);
int nvmet_passthru_ctrl_enable(struct nvmet_subsys *subsys);
@ -730,6 +775,41 @@ void nvmet_passthrough_override_cap(struct nvmet_ctrl *ctrl);
u16 errno_to_nvme_status(struct nvmet_req *req, int errno);
u16 nvmet_report_invalid_opcode(struct nvmet_req *req);
static inline bool nvmet_cc_en(u32 cc)
{
return (cc >> NVME_CC_EN_SHIFT) & 0x1;
}
static inline u8 nvmet_cc_css(u32 cc)
{
return (cc >> NVME_CC_CSS_SHIFT) & 0x7;
}
static inline u8 nvmet_cc_mps(u32 cc)
{
return (cc >> NVME_CC_MPS_SHIFT) & 0xf;
}
static inline u8 nvmet_cc_ams(u32 cc)
{
return (cc >> NVME_CC_AMS_SHIFT) & 0x7;
}
static inline u8 nvmet_cc_shn(u32 cc)
{
return (cc >> NVME_CC_SHN_SHIFT) & 0x3;
}
static inline u8 nvmet_cc_iosqes(u32 cc)
{
return (cc >> NVME_CC_IOSQES_SHIFT) & 0xf;
}
static inline u8 nvmet_cc_iocqes(u32 cc)
{
return (cc >> NVME_CC_IOCQES_SHIFT) & 0xf;
}
/* Convert a 32-bit number to a 16-bit 0's based number */
static inline __le16 to0based(u32 a)
{
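
The nvmet_cc_*() accessors above decode the Controller Configuration register value a host writes, one shift-and-mask per field, so the PCI target can honor enable, shutdown and queue-entry-size settings the same way the fabrics path does. A standalone decode of a typical enable value (the shift constants are copied here only for illustration):

#include <stdint.h>
#include <stdio.h>

/* Illustrative copies of the CC field shifts from include/linux/nvme.h. */
#define CC_EN_SHIFT	0
#define CC_MPS_SHIFT	7
#define CC_IOSQES_SHIFT	16
#define CC_IOCQES_SHIFT	20

int main(void)
{
	uint32_t cc = 0x00460001;	/* EN=1, MPS=0 (4K pages), IOSQES=6, IOCQES=4 */

	printf("EN=%u MPS=%u IOSQES=%u IOCQES=%u\n",
	       (cc >> CC_EN_SHIFT) & 0x1,
	       (cc >> CC_MPS_SHIFT) & 0xf,
	       (cc >> CC_IOSQES_SHIFT) & 0xf,
	       (cc >> CC_IOCQES_SHIFT) & 0xf);
	return 0;
}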
@ -766,7 +846,9 @@ static inline void nvmet_req_bio_put(struct nvmet_req *req, struct bio *bio)
}
#ifdef CONFIG_NVME_TARGET_AUTH
u32 nvmet_auth_send_data_len(struct nvmet_req *req);
void nvmet_execute_auth_send(struct nvmet_req *req);
u32 nvmet_auth_receive_data_len(struct nvmet_req *req);
void nvmet_execute_auth_receive(struct nvmet_req *req);
int nvmet_auth_set_key(struct nvmet_host *host, const char *secret,
bool set_ctrl);
@ -824,4 +906,26 @@ static inline void nvmet_pr_put_ns_pc_ref(struct nvmet_pr_per_ctrl_ref *pc_ref)
{
percpu_ref_put(&pc_ref->ref);
}
/*
* Data for the get_feature() and set_feature() operations of PCI target
* controllers.
*/
struct nvmet_feat_irq_coalesce {
u8 thr;
u8 time;
};
struct nvmet_feat_irq_config {
u16 iv;
bool cd;
};
struct nvmet_feat_arbitration {
u8 hpw;
u8 mpw;
u8 lpw;
u8 ab;
};
#endif /* _NVMET_H */
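
The three nvmet_feat_* structures above are the in-kernel payloads the new get_feature()/set_feature() ops exchange for the interrupt coalescing, interrupt config and arbitration features. A hypothetical get_feature() handler for one of them (the pciep_example_* name is invented; the feature ID and status codes are standard):

/* Hypothetical transport callback; struct nvmet_feat_irq_coalesce is from the patch. */
static u16 pciep_example_get_feature(const struct nvmet_ctrl *ctrl, u8 feat,
				     void *feat_data)
{
	struct nvmet_feat_irq_coalesce *irqc = feat_data;

	if (feat != NVME_FEAT_IRQ_COALESCE)
		return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;

	irqc->thr = 0;	/* aggregation threshold: coalescing disabled */
	irqc->time = 0;	/* aggregation time: coalescing disabled */
	return NVME_SC_SUCCESS;
}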

File diff suppressed because it is too large

View File

@ -64,6 +64,7 @@ enum {
/* Transport Type codes for Discovery Log Page entry TRTYPE field */
enum {
NVMF_TRTYPE_PCI = 0, /* PCI */
NVMF_TRTYPE_RDMA = 1, /* RDMA */
NVMF_TRTYPE_FC = 2, /* Fibre Channel */
NVMF_TRTYPE_TCP = 3, /* TCP/IP */
@ -275,6 +276,7 @@ enum nvme_ctrl_attr {
NVME_CTRL_ATTR_HID_128_BIT = (1 << 0),
NVME_CTRL_ATTR_TBKAS = (1 << 6),
NVME_CTRL_ATTR_ELBAS = (1 << 15),
NVME_CTRL_ATTR_RHII = (1 << 18),
};
struct nvme_id_ctrl {
@ -1896,6 +1898,46 @@ static inline bool nvme_is_fabrics(const struct nvme_command *cmd)
return cmd->common.opcode == nvme_fabrics_command;
}
#ifdef CONFIG_NVME_VERBOSE_ERRORS
const char *nvme_get_error_status_str(u16 status);
const char *nvme_get_opcode_str(u8 opcode);
const char *nvme_get_admin_opcode_str(u8 opcode);
const char *nvme_get_fabrics_opcode_str(u8 opcode);
#else /* CONFIG_NVME_VERBOSE_ERRORS */
static inline const char *nvme_get_error_status_str(u16 status)
{
return "I/O Error";
}
static inline const char *nvme_get_opcode_str(u8 opcode)
{
return "I/O Cmd";
}
static inline const char *nvme_get_admin_opcode_str(u8 opcode)
{
return "Admin Cmd";
}
static inline const char *nvme_get_fabrics_opcode_str(u8 opcode)
{
return "Fabrics Cmd";
}
#endif /* CONFIG_NVME_VERBOSE_ERRORS */
static inline const char *nvme_opcode_str(int qid, u8 opcode)
{
return qid ? nvme_get_opcode_str(opcode) :
nvme_get_admin_opcode_str(opcode);
}
static inline const char *nvme_fabrics_opcode_str(
int qid, const struct nvme_command *cmd)
{
if (nvme_is_fabrics(cmd))
return nvme_get_fabrics_opcode_str(cmd->fabrics.fctype);
return nvme_opcode_str(qid, cmd->common.opcode);
}
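
nvme_fabrics_opcode_str() above saves callers from special-casing fabrics commands when printing opcode names, and with CONFIG_NVME_VERBOSE_ERRORS disabled it falls back to the fixed strings defined in the stub helpers. A hedged usage sketch (the wrapper function and message are illustrative, not quoted from any driver):

/* Illustrative helper; only nvme_fabrics_opcode_str() comes from this header. */
static void example_log_failed_cmd(struct device *dev, int qid,
				   const struct nvme_command *cmd, u16 status)
{
	dev_warn(dev, "%s (qid %d) failed, status %#x\n",
		 nvme_fabrics_opcode_str(qid, cmd), qid, status);
}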
struct nvme_error_slot {
__le64 error_count;
__le16 sqid;