NVMe: Unbind driver on failure

Instead of removing the PCI device from the kernel's topology on
controller failure, this patch simply requests unbinding the device
from the driver. This avoids concurrently running pci removal with the
hot plug event, which has been reported to be problematic when multiple
surprise events occur near simultaneously.

The other benefit is that we will have PCI config and memory space
available to poke around for debugging a failed controller, assuming
the device was not physically removed.

The down side occurs if the platform and/or kernel do not support any
type of surprise hot removal. The device will remain visible through
sysfs (and therefore lspci), and some manual work is necessary to get
the logical topology corrected. But if your platform and/or kernel don't
support surprise removal, you probably shouldn't be doing that anyway.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
This commit is contained in:
Keith Busch 2016-03-28 16:03:21 -06:00 committed by Jens Axboe
parent 014a0d609e
commit 921920ab32

View File

@ -1857,7 +1857,7 @@ static void nvme_remove_dead_ctrl_work(struct work_struct *work)
nvme_kill_queues(&dev->ctrl);
if (pci_get_drvdata(pdev))
pci_stop_and_remove_bus_device_locked(pdev);
device_release_driver(&pdev->dev);
nvme_put_ctrl(&dev->ctrl);
}