933 Commits

Author SHA1 Message Date
James Smart
0a02e39fd1 nvme-fc: correct io termination handling
The io completion handling for i/o's that are failing due to
to a transport error or association termination had issues, causing
io failures (DNR set so retries didn't kick in) or long stalls.

Change the io completion handler for the following items:

When an io has been completed due to a transport abort (based on an
exchange error) or when marked as aborted as part of an association
termination (FCOP_FLAGS_TERMIO), set the NVME completion status to
NVME_SC_ABORTED. By default, do not set DNR on the status so that a
retry can be attempted after association recreate.

In cases where an io is failed (non-successful nvme status including
aborted), if the controller is being deleted (blk_queue_dying) or
the io was part of the ios used for association creation (ctrl state
is NEW or RECONNECTING), then additionally set the DNR bit so the io
will not be retried. If the failed io was part of association creation,
the failure will tear down the partially completioned association and
typically restart a new reconnect attempt (another create association
later).

Rearranged code flow to remove a largely unneeded local variable.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-20 12:16:59 +02:00
Chaitanya Kulkarni
a7a7cbe353 nvme-pci: add SGL support
This adds SGL support for NVMe PCIe driver, based on an earlier patch
from Rajiv Shanmugam Madeswaran <smrajiv15 at gmail.com>. This patch
refactors the original code and adds new module parameter sgl_threshold
to determine whether to use SGL or PRP for IOs.

The usage of SGLs is controlled by the sgl_threshold module parameter,
which allows to conditionally use SGLs if average request segment
size (avg_seg_size) is greater than sgl_threshold. In the original patch,
the decision of using SGLs was dependent only on the IO size,
with the new approach we consider not only IO size but also the
number of physical segments present in the IO.

We calculate avg_seg_size based on request payload bytes and number
of physical segments present in the request.

For e.g.:-

1. blk_rq_nr_phys_segments = 2 blk_rq_payload_bytes = 8k
avg_seg_size = 4K use sgl if avg_seg_size >= sgl_threshold.

2. blk_rq_nr_phys_segments = 2 blk_rq_payload_bytes = 64k
avg_seg_size = 32K use sgl if avg_seg_size >= sgl_threshold.

3. blk_rq_nr_phys_segments = 16 blk_rq_payload_bytes = 64k
avg_seg_size = 4K use sgl if avg_seg_size >= sgl_threshold.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-20 12:16:58 +02:00
Christoph Hellwig
9843f685ae nvme: use ida_simple_{get,remove} for the controller instance
Switch to the ida_simple_* helpers instead of opencoding them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2017-10-20 12:16:57 +02:00
Roy Shterman
ba2dec35e4 nvmet: Change max_nsid in subsystem due to ns_disable if needed
In case we disable namespaces which has the nsid like
subsystem max_nsid we need to search for the next largest nsid
in this subsystem. If the subsystem don't has more namespaces
we set it to 0, else we take nsid from the last namespace in
namespaces list because the list is sorted while inserting.

Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Roy Shterman <roys@lightbitslabs.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
[hch: slight refactor]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-20 12:06:02 +02:00
Sagi Grimberg
f04b9cc87b nvme-rdma: Fix error status return in tagset allocation failure
We should make sure to escelate allocation failures to prevent a
use-after-free in nvmf_create_ctrl.

Fixes: b28a308ee777 ("nvme-rdma: move tagset allocation to a dedicated routine")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 17:13:51 +02:00
Sagi Grimberg
bd9f07590a nvme-rdma: Fix possible double free in reconnect flow
The fact that we free the async event buffer in
nvme_rdma_destroy_admin_queue can cause us to free it
more than once because this happens in every reconnect
attempt since commit 31fdf1840170. we rely on the queue
state flags DELETING to avoid this for other resources.

A more complete fix is to not destroy the admin/io queues
unconditionally on every reconnect attempt, but its a bit
more extensive and will go in the next release.

Fixes: 31fdf1840170 ("nvme-rdma: reuse configure/destroy_admin_queue")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 17:07:22 +02:00
Bhumika Goyal
66603a31f8 nvmet: make config_item_type const
Make config_item_type structures const as they are either passed to a
function having the argument as const or used inside an if statement or
stored in the const "ci_type" field of a config_item structure.

Done using Coccinelle

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 16:15:28 +02:00
Israel Rukshin
86f36b9c6c nvme-loop: Add BLK_MQ_F_NO_SCHED flag to admin tag set
This flag is useful for admin queues that aren't used for normal IO.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 10:58:54 +02:00
Israel Rukshin
5a22e2bf44 nvme-fc: Add BLK_MQ_F_NO_SCHED flag to admin tag set
Since commit b86dd81
"block: get rid of blk-mq default scheduler choice Kconfig entries",
when setting nr_hw_queues to 1 the admin tag set uses mq-deadline scheduler.
This flag is useful for admin queues that aren't used for normal IO.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart  <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 10:58:54 +02:00
Israel Rukshin
94f29d4f78 nvme-rdma: Add BLK_MQ_F_NO_SCHED flag to admin tag set
Since commit b86dd81
"block: get rid of blk-mq default scheduler choice Kconfig entries",
when setting nr_hw_queues to 1 the admin tag set uses mq-deadline scheduler.
This flag is useful for admin queues that aren't used for normal IO.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 10:58:53 +02:00
Minwoo Im
16772ae6d9 nvme-pci: fix typos in comments
fixed comment typos in adapter_alloc_cq() and adapter_alloc_sq().
'the the' duplications are replaced with 'that the'.

Signed-off-by: Minwoo Im <dn3108@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 10:56:07 +02:00
James Smart
f9cf2a6491 nvmet: synchronize sqhd update
In testing target io in read write mix, we did indeed get into cases where
sqhd didn't update properly and slowly missed enough updates to shutdown
the queue.

Protect the updating sqhd by using cmpxchg, and for that turn the sqhd
field into a u32 so that cmpxchg works on it for all architectures.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 09:16:12 +02:00
James Smart
17c4dc6eb7 nvme-fc: retry initial controller connections 3 times
Currently, if a frame is lost of command fails as part of initial
association create for a new controller, the new controller connection
request will immediately fail.

Add in an immediate 3 retry loop before giving up.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:30:01 +02:00
James Smart
8a82dbf191 nvme-fc: fix iowait hang
Add missing iowait head initialization.
Fix irqsave vs irq: wait_event_lock_irq() doesn't do irq save/restore

Fixes: 36715cf4b366 ("nvme_fc: replace ioabort msleep loop with completion”)
Cc: <stable@vger.kernel.org> # 4.13
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Tested-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:30:00 +02:00
Sagi Grimberg
0ad0bfa298 nvme-rdma: stop controller reset if the controller is deleting
If the controller is deleting (in case the user decided to delete it), we
have no point to continue reset sequence.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:29:39 +02:00
Sagi Grimberg
5013e98b5e nvme-rdma: change queue flag semantics DELETING -> ALLOCATED
Instead of marking we are deleting, mark we are allocated and check that
instead. This makes the logic symmetrical to connected mark check.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:28:55 +02:00
Sagi Grimberg
60a5188633 nvme-rdma: Don't local invalidate if the queue is not live
No chance for the local invalidate to succeed if the queue-pair
is in error state. Most likely the target will do a remote
invalidation of our mr so not a big loss on the test_bit.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:27:54 +02:00
Sagi Grimberg
5e1fe61d41 nvme-rdma: teardown admin/io queues once on error recovery
Relying on the queue state while tearing down on every reconnect
attempt is not a good design. We should do it once in err_work
and simply try to establish the queues for each reconnect attempt.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:27:53 +02:00
Sagi Grimberg
0fc176dfda nvme-rdma: Check that reinit_request got a proper mr
Warn if req->mr is NULL as it should never happen.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:27:52 +02:00
Sagi Grimberg
0c5b43b9c1 nvme-rdma: move assignment to declaration
No need for the extra line for trivial assignments.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:27:52 +02:00
Sagi Grimberg
d8bfceebc4 nvme-rdma: fix wrong logging message
Not necessarily address resolution failed.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:27:51 +02:00
Sagi Grimberg
60070c78ef nvme-rdma: pass tagset to directly nvme_rdma_free_tagset
Instead of flagging admin/io.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:27:50 +02:00
Sagi Grimberg
31b8446079 nvme: introduce nvme_reinit_tagset
Move blk_mq_reinit_tagset from blk-mq to nvme core
as the only user of it. Current transports that use
it (rdma, fc) simply implement .reinit_request op.

This patch does not change any functionality.

Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:27:48 +02:00
Christoph Hellwig
761f2e1ed8 nvme: simplify compat_ioctl handling
We can just use our normal ioctl handler for the compat case and remove
the boilerplate code for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
2017-10-16 14:54:10 +02:00
Javier González
1a94b2d484 lightnvm: implement generic path for sync I/O
Implement a generic path for sending sync I/O on LightNVM. This allows
to reuse the standard synchronous path trough blk_execute_rq(), instead
of implementing a wait_for_completion on the target side (e.g., pblk).

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-13 08:34:57 -06:00
Javier González
1b839187db lightnvm: fail fast on passthrough commands
Make LightNVM passhtrough commands fail fast. User space will then take
care of re-submitting.

Fixes: 84d4add793c6 ('lightnvm: add ioctls for vector I/Os')
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-13 08:34:57 -06:00
James Smart
469d0ef06d nvme-fc: move remote port get/put/free location
move nvme_fc_rport_get/put and rport free to higher in the file to
avoid adding prototypes to resolve references in upcoming code additions

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-05 10:02:47 +02:00
Christoph Hellwig
8969f1f829 nvme-pci: Use PCI bus address for data/queues in CMB
Currently, NVMe PCI host driver is programming CMB dma address as
I/O SQs addresses. This results in failures on systems where 1:1
outbound mapping is not used (example Broadcom iProc SOCs) because
CMB BAR will be progammed with PCI bus address but NVMe PCI EP will
try to access CMB using dma address.

To have CMB working on systems without 1:1 outbound mapping, we
program PCI bus address for I/O SQs instead of dma address. This
approach will work on systems with/without 1:1 outbound mapping.

Based on a report and previous patch from Abhishek Shah.

Fixes: 8ffaadf7 ("NVMe: Use CMB for the IO SQes if available")
Cc: stable@vger.kernel.org
Reported-by: Abhishek Shah <abhishek.shah@broadcom.com>
Tested-by: Abhishek Shah <abhishek.shah@broadcom.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-04 11:42:53 +02:00
James Smart
5f5685569a nvme-fc: create fc class and transport device
Added a new fc class and a device node for udev events under it.  I
expect the fc class will eventually be the location where the FC SCSI and
FC NVME merge in the future. Therefore names are kept somewhat generic.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-04 09:48:23 +02:00
James Smart
eaefd5abf6 nvme-fc: add uevent for auto-connect
To support auto-connecting to FC-NVME devices upon their dynamic
appearance, add a uevent that can kick off connection scripts.
uevent is posted against the fc_udev device.

patch set tested with the following rule to kick an nvme-cli connect-all
for the FC initiator and FC target ports. This is just an example for
testing and not intended for real life use.

ACTION=="change", SUBSYSTEM=="fc", ENV{FC_EVENT}=="nvmediscovery", \
        ENV{NVMEFC_HOST_TRADDR}=="*", ENV{NVMEFC_TRADDR}=="*", \
	RUN+="/bin/sh -c '/usr/local/sbin/nvme connect-all --transport=fc --host-traddr=$env{NVMEFC_HOST_TRADDR} --traddr=$env{NVMEFC_TRADDR} >> /tmp/nvme_fc.log'"

I will post proposed udev/systemd scripts for possible kernel support.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-04 09:48:20 +02:00
Sagi Grimberg
d1f1071f81 nvme-fabrics: request transport module
Help userspace to make sure transport module is loaded.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-04 09:43:58 +02:00
James Smart
3375e29c28 nvmet: bump NVMET_NR_QUEUES to 128
Raise the max number of IO queues to 128. There are several hosts with
more than 64 cpus/threads.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-04 09:43:58 +02:00
Randy Dunlap
786456325b nvme: use menu Kconfig interface
Add a menu interface for NVME host and target support so that it is
presented to users more like other Kconfig symbols.
This makes the Device Driver menu less cluttered (easier to read)
and keeps all of these symbols grouped together.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-04 09:43:57 +02:00
Marc Olson
8ae4e4477d nvme: update timeout module parameter type
The underlying blk_mq_tag_set, and request timeout parameters support an
unsigned int. Extend the size of the nvme module parameters for io and
admin commands to match.

Signed-off-by: Marc Olson <marcolso@amazon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-04 09:43:09 +02:00
Martin Wilck
007a61ae2f nvme: fix visibility of "uuid" ns attribute
"uuid" must be invisible if both ns->uuid and ns->nguid are unset,
not if either one is.

Fixes: d934f9848a77 "nvme: provide UUID value to userspace"
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-01 09:58:04 +02:00
James Smart
fddc9923c6 nvme-fcloop: fix port deletes and callbacks
Now that there are potentially long delays between when a remoteport or
targetport delete calls is made and when the callback occurs (dev_loss_tmo
timeout), no longer block in the delete routines and move the final nport
puts to the callbacks.

Moved the fcloop_nport_get/put/free routines to avoid forward declarations.

Ensure port_info structs used in registrations are nulled in case fields
are not set (ex: devloss_tmo values).

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 12:42:11 -06:00
James Smart
0c319d3a14 nvmet-fc: ensure target queue id within range.
When searching for queue id's ensure they are within the expected range.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 12:42:11 -06:00
James Smart
3688feb582 nvmet-fc: on port remove call put outside lock
Avoid calling the put routine, as it may traverse to free routines while
holding the target lock.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 12:42:11 -06:00
Sagi Grimberg
e4d753d7e5 nvme-rdma: don't fully stop the controller in error recovery
By calling nvme_stop_ctrl on a already failed controller will wait for the
scan work to complete (only by identify timeout expiration which is 60
seconds). This is unnecessary when we already know that the controller has
failed.

Reported-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 12:42:11 -06:00
Sagi Grimberg
0a960afd60 nvme-rdma: give up reconnect if state change fails
If we failed to transition to state LIVE after a successful reconnect,
then controller deletion already started. In this case there is no
point moving forward with reconnect.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 12:42:11 -06:00
Sagi Grimberg
1a40d97288 nvme-core: Use nvme_wq to queue async events and fw activation
async_event_work might race as it is executed from two different
workqueues at the moment.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 12:42:11 -06:00
James Smart
8cbd96a628 nvme: fix sqhd reference when admin queue connect fails
Fix bug in sqhd patch.

It wasn't the sq that was at risk. In the case where the admin queue
connect command fails, the sq->size field is not set. Therefore, this
becomes a divide by zero error.

Add a quick check to bypass under this failure condition.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 12:42:11 -06:00
James Smart
bb1cc74790 nvmet: implement valid sqhd values in completions
To support sqhd, for initiators that are following the spec and
paying attention to sqhd vs their sqtail values:

- add sqhd to struct nvmet_sq
- initialize sqhd to 0 in nvmet_sq_setup
- rather than propagate the 0's-based qsize value from the connect message
  which requires a +1 in every sqhd update, and as nothing else references
  it, convert to 1's-based value in nvmt_sq/cq_setup() calls.
- validate connect message sqsize being non-zero per spec.
- updated assign sqhd for every completion that goes back.

Also remove handling the NULL sq case in __nvmet_req_complete, as it can't
happen with the current code.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 08:56:05 -06:00
Guilherme G. Piccoli
8edd11c9ad nvme-fabrics: Allow 0 as KATO value
Currently, driver code allows user to set 0 as KATO
(Keep Alive TimeOut), but this is not being respected.
This patch enforces the expected behavior.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 08:56:05 -06:00
James Smart
0951338d96 nvme: allow timed-out ios to retry
Currently the nvme_req_needs_retry() applies several checks to see if
a retry is allowed. On of those is whether the current time has exceeded
the start time of the io plus the timeout length. This check, if an io
times out, means there is never a retry allowed for the io. Which means
applications see the io failure.

Remove this check and allow the io to timeout, like it does on other
protocols, and retries to be made.

On the FC transport, a frame can be lost for an individual io, and there
may be no other errors that escalate for the connection/association.
The io will timeout, which causes the transport to escalate into creating
a new association, but the io that timed out, due to this retry logic, has
already failed back to the application and things are hosed.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 08:56:05 -06:00
James Smart
cd48282cc7 nvme: stop aer posting if controller state not live
If an nvme async_event command completes, in most cases, a new
async event is posted. However, if the controller enters a
resetting or reconnecting state, there is nothing to block the
scheduled work element from posting the async event again. Nor are
there calls from the transport to stop async events when an
association dies.

In the case of FC, where the association is torn down, the aer must
be aborted on the FC link and completes through the normal job
completion path. Thus the terminated async event ends up being
rescheduled even though the controller isn't in a valid state for
the aer, and the reposting gets the transport into a partially torn
down data structure.

It's possible to hit the scenario on rdma, although much less likely
due to an aer completing right as the association is terminated and
as the association teardown reclaims the blk requests via
nvme_cancel_request() so its immediate, not a link-related action
like on FC.

Fix by putting controller state checks in both the async event
completion routine where it schedules the async event and in the
async event work routine before it calls into the transport. It's
effectively a "stop_async_events()" behavior.  The transport, when
it creates a new association with the subsystem will transition
the state back to live and is already restarting the async event
posting.

Signed-off-by: James Smart <james.smart@broadcom.com>
[hch: remove taking a lock over reading the controller state]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 08:56:05 -06:00
Keith Busch
d087747384 nvme-pci: Print invalid SGL only once
The WARN_ONCE macro returns true if the condition is true, not if the
warn was raised, so we're printing the scatter list every time it's
invalid. This is excessive and makes debugging harder, so this patch
prints it just once.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 08:56:05 -06:00
Keith Busch
161b8be2bd nvme-pci: initialize queue memory before interrupts
A spurious interrupt before the nvme driver has initialized the completion
queue may inadvertently cause the driver to believe it has a completion
to process. This may result in a NULL dereference since the nvmeq's tags
are not set at this point.

The patch initializes the host's CQ memory so that a spurious interrupt
isn't mistaken for a real completion.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 08:56:05 -06:00
James Smart
deb61742e0 nvmet-fc: fix failing max io queue connections
fc transport is treating NVMET_NR_QUEUES as maximum queue count, e.g.
admin queue plus NVMET_NR_QUEUES-1 io queues.  But NVMET_NR_QUEUES is
the number of io queues, so maximum queue count is really
NVMET_NR_QUEUES+1.

Fix the handling in the target fc transport

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 08:56:05 -06:00
James Smart
d9d34c0b23 nvme-fc: use transport-specific sgl format
Sync with NVM Express spec change and FC-NVME 1.18.

FC transport sets SGL type to Transport SGL Data Block Descriptor and
subtype to transport-specific value 0x0A.

Removed the warn-on's on the PRP fields. They are unneeded. They were
to check for values from the upper layer that weren't set right, and
for the most part were fine. But, with Async events, which reuse the
same structure and 2nd time issued the SGL overlay converted them to
the Transport SGL values - the warn-on's were errantly firing.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 08:56:05 -06:00