-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmc7S40QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpjHVD/43rDZ8ehs+IAAr6S0RemNX1SRG0mK2UOEb
kMoNogS7StO/c4JYW3JuzCyLRn5ZsgeWV/muqxwDEWQrmTGrvi+V45KikrZPwm3k
p0ump33qV9EU2jiR1MKZjtwK2P0CI7/DD3W8ww6IOvKbTT7RcqQcdHznvXArFBtc
xCuQPpayFG7ZasC+N9VaBwtiUEVgU3Ek9AFT7UVZRWajjHPNalQwaooJWayO0rEG
KdoW5yG0ryLrgCY2ACSvRLS+2s14EJtb8hgT08WKHTNgd5LxhSKxfsTapamua+7U
FdVS6Ij0tEkgu2jpvgj7QKO0Uw10Cnep2gj7RHts/LVewvkliS6XcheOzqRS1jWU
I2EI+UaGOZ11OUiw52VIveEVS5zV/NWhgy5BSP9LYEvXw0BUAHRDYGMem8o5G1V1
SWqjIM1UWvcQDlAnMF9FDVzojvjVUmYWvcAlFFztO8J0B7SavHR3NcfHwEf57reH
rNoUbi/9c4/wjJJF33gejiR5pU+ewy/Mk75GrtX3xpEqlztfRbf9/FbPCMEAO1KR
DF/b3lkUV9i2/BRW6a0SpZ5RDSmSYMnateel6TrPyVSRnpiSSFO8FrbynwUOa17b
6i49YDFWzzXOrR1YWDg6IEtTrcmBEmvi7F6aoDs020qUnL0hwLn1ZuoIxuiFEpor
Z0iFF1B/nw==
=PWTH
-----END PGP SIGNATURE-----
Merge tag 'for-6.13/block-20241118' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:
- NVMe updates via Keith:
- Use uring_cmd helper (Pavel)
- Host Memory Buffer allocation enhancements (Christoph)
- Target persistent reservation support (Guixin)
- Persistent reservation tracing (Guixen)
- NVMe 2.1 specification support (Keith)
- Rotational Meta Support (Matias, Wang, Keith)
- Volatile cache detection enhancment (Guixen)
- MD updates via Song:
- Maintainers update
- raid5 sync IO fix
- Enhance handling of faulty and blocked devices
- raid5-ppl atomic improvement
- md-bitmap fix
- Support for manually defining embedded partition tables
- Zone append fixes and cleanups
- Stop sending the queued requests in the plug list to the driver
->queue_rqs() handle in reverse order.
- Zoned write plug cleanups
- Cleanups disk stats tracking and add support for disk stats for
passthrough IO
- Add preparatory support for file system atomic writes
- Add lockdep support for queue freezing. Already found a bunch of
issues, and some fixes for that are in here. More will be coming.
- Fix race between queue stopping/quiescing and IO queueing
- ublk recovery improvements
- Fix ublk mmap for 64k pages
- Various fixes and cleanups
* tag 'for-6.13/block-20241118' of git://git.kernel.dk/linux: (118 commits)
MAINTAINERS: Update git tree for mdraid subsystem
block: make struct rq_list available for !CONFIG_BLOCK
block/genhd: use seq_put_decimal_ull for diskstats decimal values
block: don't reorder requests in blk_mq_add_to_batch
block: don't reorder requests in blk_add_rq_to_plug
block: add a rq_list type
block: remove rq_list_move
virtio_blk: reverse request order in virtio_queue_rqs
nvme-pci: reverse request order in nvme_queue_rqs
btrfs: validate queue limits
block: export blk_validate_limits
nvmet: add tracing of reservation commands
nvme: parse reservation commands's action and rtype to string
nvmet: report ns's vwc not present
md/raid5: Increase r5conf.cache_name size
block: remove the ioprio field from struct request
block: remove the write_hint field from struct request
nvme: check ns's volatile write cache not present
nvme: add rotational support
nvme: use command set independent id ns if available
...
max_zone_append_sectors differs from all other queue limits in that the
final value used is not stored in the queue_limits but needs to be
obtained using queue_limits_max_zone_append_sectors helper. This not
only adds (tiny) extra overhead to the I/O path, but also can be easily
forgotten in file system code.
Add a new max_hw_zone_append_sectors value to queue_limits which is
set by the driver, and calculate max_zone_append_sectors from that and
the other inputs in blk_validate_zoned_limits, similar to how
max_sectors is calculated to fix this.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241104073955.112324-3-hch@lst.de
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20241108154657.845768-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Two small fixes, the drivers one in ufs simply delays running a work
queue and the generic one in zoned storage switches to a more correct
API that tries the standard buddy allocator first (for small
allocations); this fixes an allocation problem with small allocations
seen under memory pressure.
Signed-off-by: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZy5pZSYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishdonAQCeHKnp
BkxbSls2uPQn4DAxak1tCfxTujMFWxQHP2Y92gEA2q3chF7P136z2rYp0wGfm7P4
ivxddAWZth4tV7TUUw8=
=by5B
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small fixes, the drivers one in ufs simply delays running a work
queue and the generic one in zoned storage switches to a more correct
API that tries the standard buddy allocator first (for small
allocations); this fixes an allocation problem with small allocations
seen under memory pressure"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Start the RTC update work later
scsi: sd_zbc: Use kvzalloc() to allocate REPORT ZONES buffer
max_zone_append_sectors differs from all other queue limits in that the
final value used is not stored in the queue_limits but needs to be
obtained using queue_limits_max_zone_append_sectors helper. This not
only adds (tiny) extra overhead to the I/O path, but also can be easily
forgotten in file system code.
Add a new max_hw_zone_append_sectors value to queue_limits which is
set by the driver, and calculate max_zone_append_sectors from that and
the other inputs in blk_validate_zoned_limits, similar to how
max_sectors is calculated to fix this.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241104073955.112324-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We have two reports of failed memory allocation in btrfs' code which is
calling into report zones.
Both of these reports have the following signature coming from
__vmalloc_area_node():
kworker/u17:5: vmalloc error: size 0, failed to allocate pages, mode:0x10dc2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_NORETRY|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
Further debugging showed these where allocations of one sector (512
bytes) and at least one of the reporter's systems where low on memory,
so going through the overhead of allocating a vm area failed.
Switching the allocation from __vmalloc() to kvzalloc() avoids the
overhead of vmalloc() on small allocations and succeeds.
Note: the buffer is already freed using kvfree() so there's no need to
adjust the free path.
Cc: Qu Wenru <wqu@suse.com>
Cc: Naohiro Aota <naohiro.aota@wdc.com>
Link: https://github.com/kdave/btrfs-progs/issues/779
Link: https://github.com/kdave/btrfs-progs/issues/915
Fixes: 23a50861adda ("scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer()")
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20241030110253.11718-1-jth@kernel.org
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
Now that device mapper can handle resetting all zones of a mapped zoned
device using REQ_OP_ZONE_RESET_ALL, all zoned block device drivers
support this operation. With this, the request queue feature
BLK_FEAT_ZONE_RESETALL is not necessary and the emulation code in
blk-zone.c can be removed.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240704052816.623865-5-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move the zone_resetall flag into the queue_limits feature field so that
it can be set atomically with the queue frozen.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-24-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move the zoned flags into the features field to reclaim a little
bit of space.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-23-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move a bit of code that sets up the zone flag and the write granularity
into sd_zbc_read_zones to be with the rest of the zoned limits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Since commit 7437bb73f087 ("block: remove support for the host aware zone
model"), only ZBC devices expose a zoned access model. sd_is_zoned is
used to check for that and thus return false for host aware devices.
Replace the helper with the simple open coded TYPE_ZBC check to fix this.
Fixes: 7437bb73f087 ("block: remove support for the host aware zone model")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20240617060532.127975-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Assign all queue limits through a local queue_limits variable and
queue_limits_commit_update so that we can't race updating them from
multiple places, and freeze the queue when updating them so that
in-progress I/O submissions don't see half-updated limits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240531074837.1648501-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Consolidate setting zone-related queue limits in sd_zbc_read_zones
instead of splitting them between sd_zbc_revalidate_zones and
sd_zbc_read_zones, and move the early_zone_information initialization
in sd_zbc_read_zones above setting up the queue limits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240531074837.1648501-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The only user of blk_revalidate_disk_zones() second argument was the
SCSI disk driver (sd). Now that this driver does not require this
update_driver_data argument, remove it to simplify the interface of
blk_revalidate_disk_zones(). Also update the function kdoc comment to
be more accurate (i.e. there is no gendisk ->revalidate method).
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
Tested-by: Dennis Maisenbacher <dennis.maisenbacher@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240408014128.205141-21-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Set the request queue of a TYPE_ZBC device as needing zone append
emulation by setting the device queue max_zone_append_sectors limit to
0. This enables the block layer generic implementation provided by zone
write plugging. With this, the sd driver will never see a
REQ_OP_ZONE_APPEND request and the zone append emulation code
implemented in sd_zbc.c can be removed.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
Tested-by: Dennis Maisenbacher <dennis.maisenbacher@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240408014128.205141-14-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When zones were first added the SCSI and ATA specs, two different
models were supported (in addition to the drive managed one that
is invisible to the host):
- host managed where non-conventional zones there is strict requirement
to write at the write pointer, or else an error is returned
- host aware where a write point is maintained if writes always happen
at it, otherwise it is left in an under-defined state and the
sequential write preferred zones behave like conventional zones
(probably very badly performing ones, though)
Not surprisingly this lukewarm model didn't prove to be very useful and
was finally removed from the ZBC and SBC specs (NVMe never implemented
it). Due to to the easily disappearing write pointer host software
could never rely on the write pointer to actually be useful for say
recovery.
Fortunately only a few HDD prototypes shipped using this model which
never made it to mass production. Drop the support before it is too
late. Note that any such host aware prototype HDD can still be used
with Linux as we'll now treat it as a conventional HDD.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20231217165359.604246-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In sd_zbc_revalidate_zones(), execute blk_queue_chunk_sectors() and
blk_queue_max_zone_append_sectors() to respectively set a ZBC device
zone size and maximum zone append sector limit before executing
blk_revalidate_disk_zones(). This is to allow the block layer zone
reavlidation to check these device characteristics prior to checking all
zones of the device.
Since blk_queue_max_zone_append_sectors() already caps the device
maximum zone append limit to the zone size and to the maximum command
size, the max_append value passed to blk_queue_max_zone_append_sectors()
is simplified to the maximum number of segments times the number of
sectors per page.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230703024812.76778-2-dlemoal@kernel.org
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When the sd driver revalidates host-managed SMR disks, it calls
disk_set_zoned() which changes the zone_write_granularity attribute value
to the logical block size regardless of the device type. After that, the sd
driver overwrites the value in sd_zbc_read_zone() with the physical block
size, since ZBC/ZAC requires this for host-managed disks. Between the calls
to disk_set_zoned() and sd_zbc_read_zone(), there exists a window where the
attribute shows the logical block size as the zone_write_granularity value,
which is wrong for host-managed disks. The duration of the window is from
20ms to 200ms, depending on report zone command execution time.
To avoid the wrong zone_write_granularity value between disk_set_zoned()
and sd_zbc_read_zone(), modify the value not in sd_zbc_read_zone() but
just after disk_set_zoned() call.
Fixes: a805a4fa4fa3 ("block: introduce zone_write_granularity limit")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20230306063024.3376959-1-shinichiro.kawasaki@wdc.com
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi_execute_req() is going to be removed. Conver zbc to
scsi_execute_cmd().
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add tracepoints to the SCSI zone append emulation in order to trace the
zone start to write-pointer aligned LBA translation and the corresponding
completion.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/d103bcf5f90139143469f2a0084c74bd9e03ad4a.1669804487.git.johannes.thumshirn@wdc.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
ZBC Zoned Block Commands specification mandates SYNCHRONIZE CACHE(16) for
host-managed zoned block devices, but does not mandate SYNCHRONIZE
CACHE(10). Call SYNCHRONIZE CACHE(16) in place of SYNCHRONIZE CACHE(10) to
ensure that the command is always supported. For this purpose, add
use_16_for_sync flag to struct scsi_device in same manner as use_16_for_rw
flag.
To be precise, ZBC does not mandate SYNCHRONIZE CACHE(16) for host-aware
zoned block devices. However, modern devices should support 16-byte
commands. Hence, call SYNCHRONIZE CACHE (16) on both types of ZBC devices,
host-aware and host-managed. Of note is that READ(16) and WRITE(16) have
same story and they are already called for both types of ZBC devices.
Another note is that this patch depends on the fix commit ea045fd344cb
("ata: libata-scsi: fix SYNCHRONIZE CACHE (16) command failure").
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20221115002905.1709006-1-shinichiro.kawasaki@wdc.com
Reviewed-by: Damien Le Moal <damien.lemoal@opendource.wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The type name enum req_opf is misleading since it suggests that values of
this type include both an operation type and flags. Since values of this
type represent an operation only, change the type name into enum req_op.
Convert the enum req_op documentation into kernel-doc format. Move a few
definitions such that the enum req_op documentation occurs just above
the enum req_op definition.
The name "req_opf" was introduced by commit ef295ecf090d ("block: better op
and flags encoding").
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-2-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move the zone related fields that are currently stored in
struct request_queue to struct gendisk as these are part of the highlevel
block layer API and are only used for non-passthrough I/O that requires
the gendisk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220706070350.1703384-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Switch to a gendisk based API in preparation for moving all zone related
fields from the request_queue to the gendisk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220706070350.1703384-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Prepare for storing the zone related field in struct gendisk instead
of struct request_queue.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220706070350.1703384-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Make sure to always free a scsi disk zone information, even for regular
disks. This ensures that there is no memory leak, even in the case of a
zoned disk changing type to a regular disk (e.g. with a reformat using the
FORMAT WITH PRESET command or other vendor proprietary command).
To do this, rename sd_zbc_clear_zone_info() to sd_zbc_free_zone_info() and
remove sd_zbc_release_disk(). A call to sd_zbc_free_zone_info() is added to
sd_zbc_read_zones() for drives for which sd_is_zoned() returns
false. Furthermore, sd_zbc_free_zone_info() code make s sure that the sdkp
rev_mutex is never used while not being initialized by gating the cleanup
code with a a check on the zone_wp_update_buf field as it is never NULL
when rev_mutex has been initialized.
Link: https://lore.kernel.org/r/20220601062544.905141-3-damien.lemoal@opensource.wdc.com
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
ZBC-2 allows host-managed disks to report gap zones. This allow zoned disks
to report an offset between data zone starts that is a power of two even if
the number of logical blocks with data per zone is not a power of two.
Another new feature in ZBC-2 is support for constant zone starting LBA
offsets. For zoned disks that report a constant zone starting LBA offset,
hide the gap zones from the block layer. Report the offset between data
zone starts as zone size and report the number of logical blocks with data
per zone as the zone capacity.
Link: https://lore.kernel.org/r/20220421183023.3462291-7-bvanassche@acm.org
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
[ bvanassche: Reworked this patch ]
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Return early in sd_zbc_check_zoned_characteristics() for host-aware
disks. This patch does not change any functionality but makes a later patch
easier to read.
Link: https://lore.kernel.org/r/20220421183023.3462291-6-bvanassche@acm.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
[ bvanassche: extracted this change from a larger patch ]
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Deriving the meaning of the nr_zones, rev_nr_zones, zone_blocks and
rev_zone_blocks member variables requires careful analysis of the source
code. Make the meaning of these member variables easier to understand by
introducing struct zoned_disk_info.
Link: https://lore.kernel.org/r/20220421183023.3462291-5-bvanassche@acm.org
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When querying zones, track the position in logical blocks instead of in
sectors. This change slightly simplifies sd_zbc_report_zones().
Link: https://lore.kernel.org/r/20220421183023.3462291-4-bvanassche@acm.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
[ bvanassche: extracted this change from a larger patch ]
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The following check in sd_zbc_cmnd_checks() can only work correctly if the
zone size is a power of two:
if (sector & (sd_zbc_zone_sectors(sdkp) - 1))
/* Unaligned request */
return BLK_STS_IOERR;
Hence this patch that verifies that the zone size is a power of two.
Link: https://lore.kernel.org/r/20220421183023.3462291-3-bvanassche@acm.org
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add several kernel-doc headers. Declare input arrays const. Specify the
array size in function declarations.
Link: https://lore.kernel.org/r/20220421183023.3462291-2-bvanassche@acm.org
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There are no more end-users of REQ_OP_WRITE_SAME left, so we can start
deleting it.
Link: https://lore.kernel.org/r/20220209082828.2629273-5-hch@lst.de
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This series consists of the usual driver updates (ufs, pm80xx, lpfc,
mpi3mr, mpt3sas, hisi_sas, libsas) and minor updates and bug fixes.
The most impactful change is likely the switch from GFP_DMA to
GFP_KERNEL in a bunch of drivers, but even that shouldn't affect too
many people.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYeCJZyYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishfMnAQCsERG9
V4yX8LDpBjD7leIccf+6krJNNWaIWYYkEdxpzQD9FShB7/yDakFq3erW2y5mVqac
dZ065M0ckE4bxk9uMIE=
=gPHF
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This series consists of the usual driver updates (ufs, pm80xx, lpfc,
mpi3mr, mpt3sas, hisi_sas, libsas) and minor updates and bug fixes.
The most impactful change is likely the switch from GFP_DMA to
GFP_KERNEL in a bunch of drivers, but even that shouldn't affect too
many people"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (121 commits)
scsi: mpi3mr: Bump driver version to 8.0.0.61.0
scsi: mpi3mr: Fixes around reply request queues
scsi: mpi3mr: Enhanced Task Management Support Reply handling
scsi: mpi3mr: Use TM response codes from MPI3 headers
scsi: mpi3mr: Add io_uring interface support in I/O-polled mode
scsi: mpi3mr: Print cable mngnt and temp threshold events
scsi: mpi3mr: Support Prepare for Reset event
scsi: mpi3mr: Add Event acknowledgment logic
scsi: mpi3mr: Gracefully handle online FW update operation
scsi: mpi3mr: Detect async reset that occurred in firmware
scsi: mpi3mr: Add IOC reinit function
scsi: mpi3mr: Handle offline FW activation in graceful manner
scsi: mpi3mr: Code refactor of IOC init - part2
scsi: mpi3mr: Code refactor of IOC init - part1
scsi: mpi3mr: Fault IOC when internal command gets timeout
scsi: mpi3mr: Display IOC firmware package version
scsi: mpi3mr: Handle unaligned PLL in unmap cmnds
scsi: mpi3mr: Increase internal cmnds timeout to 60s
scsi: mpi3mr: Do access status validation before adding devices
scsi: mpi3mr: Add support for PCIe Managed Switch SES device
...
Make sd_zbc_parse_report() use if/else when setting the write pointer,
instead of setting it unconditionally and then conditionally updating it.
Link: https://lore.kernel.org/r/20211201142821.64650-2-Niklas.Cassel@wdc.com
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
According to the ZBC (and ZAC) specification, a zone that has Zone Type set
to Conventional, must also have its Zone Condition set to "Not Write
Pointer".
Therefore, a conventional zone will never have Zone Condition set to
"Full", which means that we can omit the non-conventional prerequisite from
the zone full condition check.
Link: https://lore.kernel.org/r/20211201142821.64650-1-Niklas.Cassel@wdc.com
Suggested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Just use the disk attached to the request_queue instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211126121802.2090656-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch addresses the following Coverity report about the zno *
sdkp->zone_blocks expression:
CID 1475514 (#1 of 1): Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)
overflow_before_widen: Potentially overflowing expression zno *
sdkp->zone_blocks with type unsigned int (32 bits, unsigned) is evaluated
using 32-bit arithmetic, and then used in a context that expects an
expression of type sector_t (64 bits, unsigned).
Link: https://lore.kernel.org/r/20210917212314.2362324-1-bvanassche@acm.org
Fixes: 5795eb443060 ("scsi: sd_zbc: emulate ZONE_APPEND commands")
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Damien Le Moal <Damien.LeMoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reporting zones on a SCSI device sometimes fail with the following error:
[76248.516390] ata16.00: invalid transfer count 131328
[76248.523618] sd 15:0:0:0: [sda] REPORT ZONES start lba 536870912 failed
The error (from drivers/ata/libata-scsi.c:ata_scsi_zbc_in_xlat()) indicates
that buffer size is not aligned to SECTOR_SIZE.
This happens when the __vmalloc() failed. Consider we are reporting 4096
zones, then we will have "bufsize = roundup((4096 + 1) * 64,
SECTOR_SIZE)" = (513 * 512) = 262656. Then, __vmalloc() failure halves
the bufsize to 131328, which is no longer aligned to SECTOR_SIZE.
Use rounddown() to ensure the size is always aligned to SECTOR_SIZE and fix
the comment as well.
Link: https://lore.kernel.org/r/20210906140642.2267569-1-naohiro.aota@wdc.com
Fixes: 23a50861adda ("scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer()")
Cc: stable@vger.kernel.org # 5.5+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Prepare for removal of the request pointer by using scsi_cmd_to_rq()
instead. This patch does not change any functionality.
Link: https://lore.kernel.org/r/20210809230355.8186-4-bvanassche@acm.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Replace the check for DRIVER_SENSE with a check for
scsi_status_is_check_condition().
Audit all callsites to ensure the SAM status is set correctly. For
backwards compability move the DRIVER_SENSE definition to sg.h, and update
sg, bsg, and scsi_ioctl to set the DRIVER_SENSE driver_status whenever
SAM_STAT_CHECK_CONDITION is present.
[mkp: fix zeroday srp warning]
Link: https://lore.kernel.org/r/20210427083046.31620-10-hare@suse.de
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
fix
Return the actual error code in __scsi_execute() (which, according to the
documentation, should have happened anyway). And audit all callers to cope
with negative return values from __scsi_execute() and friends.
[mkp: resolve conflict and return bool]
Link: https://lore.kernel.org/r/20210427083046.31620-7-hare@suse.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull 5.12/scsi-fixes into the 5.13 SCSI tree to provide a baseline for
some UFS changes that would otherwise cause conflicts during the
merge.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes the following W=1 kernel build warning(s):
drivers/scsi/sd_zbc.c:137: warning: wrong kernel-doc identifier on line:
Link: https://lore.kernel.org/r/20210317091230.2912389-5-lee.jones@linaro.org
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Le Moal <damien.lemoal@hgst.com>
Cc: Shaun Tancheff <shaun.tancheff@seagate.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is a few driver updates (iscsi, mpt3sas) that were still in the
staging queue when the merge window opened (all committed on or before
8 Feb) and some small bug fixes which came in during the merge window
(all committed on 22 Feb).
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYDvKnSYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishXtMAQDsmEay
V/GLOBE3lvURgUz/AANItBVQ+RTbo+3r25Q7PAEA/zqxVW3NitlJPzLA/MkWQ7p6
gnfxL088xAtZr24sdHA=
=whGs
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"This is a few driver updates (iscsi, mpt3sas) that were still in the
staging queue when the merge window opened (all committed on or before
8 Feb) and some small bug fixes which came in during the merge window
(all committed on 22 Feb)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (30 commits)
scsi: hpsa: Correct dev cmds outstanding for retried cmds
scsi: sd: Fix Opal support
scsi: target: tcmu: Fix memory leak caused by wrong uio usage
scsi: target: tcmu: Move some functions without code change
scsi: sd: sd_zbc: Don't pass GFP_NOIO to kvcalloc
scsi: aic7xxx: Remove unused function pointer typedef ahc_bus_suspend/resume_t
scsi: bnx2fc: Fix Kconfig warning & CNIC build errors
scsi: ufs: Fix a duplicate dev quirk number
scsi: aic79xx: Fix spelling of version
scsi: target: core: Prevent underflow for service actions
scsi: target: core: Add cmd length set before cmd complete
scsi: iscsi: Drop session lock in iscsi_session_chkready()
scsi: qla4xxx: Use iscsi_is_session_online()
scsi: libiscsi: Reset max/exp cmdsn during recovery
scsi: iscsi_tcp: Fix shost can_queue initialization
scsi: libiscsi: Add helper to calculate max SCSI cmds per session
scsi: libiscsi: Fix iSCSI host workq destruction
scsi: libiscsi: Fix iscsi_task use after free()
scsi: libiscsi: Drop taskqueuelock
scsi: libiscsi: Fix iscsi_prep_scsi_cmd_pdu() error handling
...
Dan reported we're passing in GFP_NOIO to kvmalloc() which will then
fallback to doing kmalloc() instead of an optional vmalloc() if the size
exceeds kmalloc()s limits. This will break with drives that have zone
numbers exceeding PAGE_SIZE/sizeof(u32).
Instead of passing in GFP_NOIO, enter an implicit GFP_NOIO allocation
scope.
Link: https://lore.kernel.org/r/YCuvSfKw4qEQBr/t@mwanda
Link: https://lore.kernel.org/r/5a6345e2989fd06c049ac4e4627f6acb492c15b8.1613569821.git.johannes.thumshirn@wdc.com
Fixes: 5795eb443060: ("scsi: sd_zbc: emulate ZONE_APPEND commands")
Cc: Damien Le Moal <Damien.LeMoal@wdc.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For host-aware ZBC disk, setting the device zoned model to BLK_ZONED_HA
using blk_queue_set_zoned() in sd_read_block_characteristics() may
result in the block device effective zoned model to be "none"
(BLK_ZONED_NONE) if partitions are present on the device. In this case,
sd_zbc_read_zones() should not setup the zone related queue limits for
the disk so that the device limits and configuration is consistent with
a regular disk and resources not uselessly allocated (e.g. the zone
write pointer tracking array for zone append emulation).
Furthermore, if the disk zoned model changes at run time due to the
creation of a partition by the user, the zone related resources can be
released.
Fix both problems by introducing the function sd_zbc_clear_zone_info()
to reset the scsi disk zone information and free resources and by
returning early in sd_zbc_read_zones() for a block device that has a
zoned model equal to BLK_ZONED_NONE.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@edc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>