License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier should be applied
to a file was done in a spreadsheet of side-by-side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
should be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
The criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source.
- Files that already had some variant of a license header in them were
included (even if <5 lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, the file was
considered to have no license information in it, and the top-level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* one, it was tagged "GPL-2.0 WITH
Linux-syscall-note", otherwise "GPL-2.0". The results were:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL-family license was found in the file, or if it had no licensing
in it (per the prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they detected different licenses), the
file was inspected manually.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
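The per-file decision described in the list above can be sketched as a small C function. This is a minimal model under stated assumptions, not the script that was actually used:
- `conclude_spdx()` and `is_uapi_path()` are hypothetical names.
- A uapi file is assumed to be recognizable by a `/uapi/` path component.
- Each scanner's result is reduced to a single identifier string, or NULL when it found no license traces.
- The rule that adds the syscall note to uapi files which already carried a GPL-family license is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: assumes uapi files always have a "/uapi/" path component. */
static bool is_uapi_path(const char *path)
{
	assert(path != NULL);
	return strstr(path, "/uapi/") != NULL;
}

/*
 * Hypothetical model of the heuristics above. scanner_a and scanner_b are
 * the identifiers reported by the two scanners, or NULL if a scanner found
 * no license traces. Returns the concluded SPDX identifier, or NULL when
 * the scanners disagree and the file needs manual inspection.
 */
static const char *conclude_spdx(const char *path,
				 const char *scanner_a, const char *scanner_b)
{
	/* No license traces at all: the top-level COPYING license applies. */
	if (scanner_a == NULL && scanner_b == NULL)
		return is_uapi_path(path) ? "GPL-2.0 WITH Linux-syscall-note"
					  : "GPL-2.0";

	/* Both scanners agree: their result becomes the concluded license. */
	if (scanner_a != NULL && scanner_b != NULL &&
	    strcmp(scanner_a, scanner_b) == 0)
		return scanner_a;

	/* One found a license and the other did not, or they differ. */
	return NULL;
}
```

For example, `conclude_spdx("include/uapi/linux/fs.h", NULL, NULL)` yields `"GPL-2.0 WITH Linux-syscall-note"`. A file where only one scanner reports a license yields NULL, which corresponds to the manual-review path.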
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks of about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifiers in the
files he inspected. For the non-uapi files Thomas did random spot checks
of about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally, Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types). Finally, Greg ran the script using the .csv files to
generate the patches.
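The header-versus-source distinction the script had to make can be illustrated with a short C sketch. It follows the kernel's documented convention: .c files carry a `//` line comment and .h headers carry a block comment. `format_spdx_tag()` and `has_suffix()` are hypothetical names. Treating every other file type (Makefiles, Kconfig, scripts) as `#`-commented is a simplifying assumption, not a description of what the actual script did.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return nonzero if name ends with suffix. */
static int has_suffix(const char *name, const char *suffix)
{
	size_t n = strlen(name), s = strlen(suffix);

	return n >= s && strcmp(name + n - s, suffix) == 0;
}

/*
 * Hypothetical helper: write the SPDX tag line for filename into buf.
 * C sources get a C99 line comment and headers a block comment, as in the
 * kernel's license rules. Every other file is assumed to use '#' comments
 * (Makefile, Kconfig, shell scripts), which is a simplification.
 * Returns the snprintf() result.
 */
static int format_spdx_tag(char *buf, size_t len, const char *filename,
			   const char *id)
{
	assert(buf != NULL && filename != NULL && id != NULL);

	if (has_suffix(filename, ".c"))
		return snprintf(buf, len, "// SPDX-License-Identifier: %s", id);
	if (has_suffix(filename, ".h"))
		return snprintf(buf, len, "/* SPDX-License-Identifier: %s */", id);
	return snprintf(buf, len, "# SPDX-License-Identifier: %s", id);
}
```

For a .c file with "GPL-2.0" this produces `// SPDX-License-Identifier: GPL-2.0`, which matches the tag at the top of the source listing here.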
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 14:07:57 +00:00
// SPDX-License-Identifier: GPL-2.0
2010-04-28 13:55:08 +00:00
/*
 * Functions related to generic helpers functions
 */
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/scatterlist.h>

#include "blk.h"

2016-04-16 18:55:28 +00:00
int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
2016-06-09 14:00:36 +00:00
		sector_t nr_sects, gfp_t gfp_mask, int flags,
2016-06-05 19:31:49 +00:00
		struct bio **biop)
2010-04-28 13:55:08 +00:00
{
	struct request_queue *q = bdev_get_queue(bdev);
2016-04-16 18:55:28 +00:00
	struct bio *bio = *biop;
2016-10-28 14:48:16 +00:00
	unsigned int op;
block: improve discard bio alignment in __blkdev_issue_discard()
This patch improves discard bio splitting for address and size alignment
in __blkdev_issue_discard(). An aligned discard bio may help the
underlying device controller perform better discard and internal garbage
collection, and avoid unnecessary internal fragmentation.
The current discard bio split algorithm in __blkdev_issue_discard() may
leave non-discarded fragments on the device even when the discard bio
LBA and size are both aligned to the device's discard granularity.
Here are example steps to reproduce the above problem.
- On a VMware ESXi 6.5 update3 installation, create a 51GB virtual disk
in thin mode and give it to a Linux virtual machine.
- Inside the Linux virtual machine, if the virtual disk shows up as
/dev/sdb, fill the first 50GB with data:
# dd if=/dev/zero of=/dev/sdb bs=4096 count=13107200
- Discard the 50GB range from offset 0 on /dev/sdb:
# blkdiscard /dev/sdb -o 0 -l 53687091200
- Observe the underlying mapping status of the device:
# sg_get_lba_status /dev/sdb -m 1048 --lba=0
descriptor LBA: 0x0000000000000000 blocks: 2048 mapped (or unknown)
descriptor LBA: 0x0000000000000800 blocks: 16773120 deallocated
descriptor LBA: 0x0000000000fff800 blocks: 2048 mapped (or unknown)
descriptor LBA: 0x0000000001000000 blocks: 8386560 deallocated
descriptor LBA: 0x00000000017ff800 blocks: 2048 mapped (or unknown)
descriptor LBA: 0x0000000001800000 blocks: 8386560 deallocated
descriptor LBA: 0x0000000001fff800 blocks: 2048 mapped (or unknown)
descriptor LBA: 0x0000000002000000 blocks: 8386560 deallocated
descriptor LBA: 0x00000000027ff800 blocks: 2048 mapped (or unknown)
descriptor LBA: 0x0000000002800000 blocks: 8386560 deallocated
descriptor LBA: 0x0000000002fff800 blocks: 2048 mapped (or unknown)
descriptor LBA: 0x0000000003000000 blocks: 8386560 deallocated
descriptor LBA: 0x00000000037ff800 blocks: 2048 mapped (or unknown)
descriptor LBA: 0x0000000003800000 blocks: 8386560 deallocated
descriptor LBA: 0x0000000003fff800 blocks: 2048 mapped (or unknown)
descriptor LBA: 0x0000000004000000 blocks: 8386560 deallocated
descriptor LBA: 0x00000000047ff800 blocks: 2048 mapped (or unknown)
descriptor LBA: 0x0000000004800000 blocks: 8386560 deallocated
descriptor LBA: 0x0000000004fff800 blocks: 2048 mapped (or unknown)
descriptor LBA: 0x0000000005000000 blocks: 8386560 deallocated
descriptor LBA: 0x00000000057ff800 blocks: 2048 mapped (or unknown)
descriptor LBA: 0x0000000005800000 blocks: 8386560 deallocated
descriptor LBA: 0x0000000005fff800 blocks: 2048 mapped (or unknown)
descriptor LBA: 0x0000000006000000 blocks: 6291456 deallocated
descriptor LBA: 0x0000000006600000 blocks: 0 deallocated
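As a sanity check on the reproduction, the sizes in the commands and in the sg_get_lba_status output can be cross-checked with C11 static assertions. The macro names are illustrative. The constants are copied from the steps above, and 512-byte sectors are assumed.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT	9			/* assumes 512-byte sectors */
#define GiB(n)		((uint64_t)(n) << 30)

/* Constants copied from the reproduction steps (macro names are illustrative). */
#define DD_BYTES	(4096ULL * 13107200ULL)	/* dd bs=4096 count=13107200 */
#define DISCARD_BYTES	53687091200ULL		/* blkdiscard -l 53687091200 */
#define DISK_END_LBA	0x6600000ULL		/* LBA of the final descriptor */
#define GRANULARITY	2048ULL			/* discard granularity in sectors */

/* dd wrote exactly the range that blkdiscard later discards: 50 GiB. */
static_assert(DD_BYTES == DISCARD_BYTES, "dd and blkdiscard lengths differ");
static_assert(DISCARD_BYTES == GiB(50), "discard length is not 50 GiB");

/* In sectors the range is a whole number of 2048-sector granules. */
static_assert((DISCARD_BYTES >> SECTOR_SHIFT) % GRANULARITY == 0,
	      "discard length is not granularity aligned");

/* The descriptors end at LBA 0x6600000, i.e. at the end of the 51 GiB disk. */
static_assert((DISK_END_LBA << SECTOR_SHIFT) == GiB(51), "disk is not 51 GiB");
```

0x6600000 is 106954752 sectors. That is the same count reported by the single deallocated descriptor after the patch.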
Although the discard bio starts at LBA 0 and its size of 50<<30 bytes is
perfectly aligned to the discard granularity, the list above shows many
unexpected 1MB (2048-sector) internal fragments.
The problem is in __blkdev_issue_discard(): an improper split algorithm
produces bio sizes that are not aligned to the discard granularity.
25 int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
26 sector_t nr_sects, gfp_t gfp_mask, int flags,
27 struct bio **biop)
28 {
29 struct request_queue *q = bdev_get_queue(bdev);
[snipped]
56
57 while (nr_sects) {
58 sector_t req_sects = min_t(sector_t, nr_sects,
59 bio_allowed_max_sectors(q));
60
61 WARN_ON_ONCE((req_sects << 9) > UINT_MAX);
62
63 bio = blk_next_bio(bio, 0, gfp_mask);
64 bio->bi_iter.bi_sector = sector;
65 bio_set_dev(bio, bdev);
66 bio_set_op_attrs(bio, op, 0);
67
68 bio->bi_iter.bi_size = req_sects << 9;
69 sector += req_sects;
70 nr_sects -= req_sects;
[snipped]
79 }
80
81 *biop = bio;
82 return 0;
83 }
84 EXPORT_SYMBOL(__blkdev_issue_discard);
At lines 58-59, when discarding a 50GB range, req_sects is set to the
return value of bio_allowed_max_sectors(q), which is 8388607 sectors. In
the above case the discard granularity is 2048 sectors. Although the
start LBA and discard length are aligned to the discard granularity,
req_sects (8388607 = 4096 * 2048 - 1) never has a chance to be aligned to
it. This is why a still-mapped 2048-sector fragment remains in every 4 or
8 GB range.
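The arithmetic in the paragraph above can be replayed in userspace. The sketch below is a simplified model, not kernel code, and its function names are hypothetical:
- `allowed_max_sectors()` assumes bio_allowed_max_sectors(q) is computed by rounding UINT_MAX down to the logical block size (512 bytes here) and converting to sectors.
- `count_misaligned_starts()` runs the pre-patch split loop. It counts the split bios whose start sector falls off the 2048-sector granularity grid.

It does not model how the device splits or handles those bios internally.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Userspace stand-in for round_down(); only valid for power-of-two align. */
static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/* Assumed model of bio_allowed_max_sectors(): the largest byte count
 * <= UINT_MAX that is a multiple of the logical block size, in sectors. */
static sector_t allowed_max_sectors(unsigned int logical_block_size)
{
	return round_down_pow2(UINT_MAX, logical_block_size) >> 9;
}

/* Run the pre-patch split loop and count split bios whose start sector
 * is not a multiple of granularity (given in sectors). */
static unsigned int count_misaligned_starts(sector_t sector, sector_t nr_sects,
					    sector_t max_sects,
					    sector_t granularity)
{
	unsigned int misaligned = 0;

	while (nr_sects) {
		sector_t req_sects = nr_sects < max_sects ? nr_sects : max_sects;

		if (sector % granularity != 0)
			misaligned++;
		sector += req_sects;
		nr_sects -= req_sects;
	}
	return misaligned;
}
```

For the 50GB discard from LBA 0 (104857600 sectors), the loop issues 13 bios. Every start after the first is misaligned, because k * 8388607 is congruent to -k modulo 2048.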
If req_sects at line 58 is set to a value that is aligned to
discard_granularity and close to UINT_MAX, then all subsequent split bios
inside the device driver are (almost always) aligned to the
discard_granularity of the device queue, and the still-mapped 2048-sector
fragments disappear.
This patch introduces bio_aligned_discard_max_sectors() to return the
value that is aligned to q->limits.discard_granularity and closest to
UINT_MAX. It then replaces bio_allowed_max_sectors() with this new
routine to choose a more suitable split bio length.
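Here is a userspace model of the new limit. It assumes the granularity is expressed in bytes and is a power of two, as a mask-based round_down() requires. `aligned_discard_max_sectors()` is a hypothetical stand-in for bio_aligned_discard_max_sectors(), not its actual source.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical model: round UINT_MAX down to a multiple of the discard
 * granularity (bytes, power of two assumed), then convert to 512-byte
 * sectors. */
static sector_t aligned_discard_max_sectors(unsigned int granularity_bytes)
{
	uint64_t g = granularity_bytes;

	assert(g != 0 && (g & (g - 1)) == 0);
	return ((uint64_t)UINT_MAX & ~(g - 1)) >> 9;
}
```

With a 1MB (2048-sector) granularity this gives 8386560 sectors, which is 4095 * 2048. Shifted back to bytes (4293918720), it still fits in a 32-bit bi_size.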
But we still need to handle the case where the discard start LBA is not
aligned to q->limits.discard_granularity; otherwise, even if the length
is aligned, the current code may still leave a 2048-sector fragment
around every 4GB range. Therefore, to calculate req_sects, the start LBA
of the discard range (including the partition offset) is checked first.
If it is not aligned to the discard granularity, the first split is
sized so that the following bio has a bi_sector aligned to the discard
granularity. Then no still-mapped fragment is left in the middle of the
discard range.
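The first-split rule can be sketched the same way. `next_req_sects()` and `count_misaligned_after_first()` are hypothetical helpers, and this is a model, not the kernel loop:
- Granularities are given in sectors.
- Rounding up uses integer division rather than the kernel's round_up().
- The aligned maximum (8386560 in the usage below) is assumed to come from a routine like bio_aligned_discard_max_sectors().
- The 63-sector partition offset used below is only an illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

/*
 * Hypothetical model of the per-iteration length choice described above.
 * sector is relative to the partition, part_offset is the partition start,
 * gran is the discard granularity in sectors, aligned_max is the
 * granularity-aligned per-bio maximum.
 */
static sector_t next_req_sects(sector_t sector, sector_t part_offset,
			       sector_t nr_sects, sector_t gran,
			       sector_t aligned_max)
{
	sector_t mapped = sector + part_offset;	/* LBA on the whole device */
	sector_t aligned_lba, limit;

	assert(gran != 0 && aligned_max != 0);
	aligned_lba = (mapped + gran - 1) / gran * gran;	/* round up */

	/* Aligned start: use the full aligned maximum. Otherwise stop at the
	 * next granularity boundary so that the following bio starts aligned. */
	limit = (aligned_lba == mapped) ? aligned_max : aligned_lba - mapped;
	return nr_sects < limit ? nr_sects : limit;
}

/* Split a discard range with next_req_sects() and count the bios, other
 * than the first, whose device LBA is not a multiple of gran. */
static unsigned int count_misaligned_after_first(sector_t sector,
						 sector_t part_offset,
						 sector_t nr_sects, sector_t gran,
						 sector_t aligned_max)
{
	unsigned int misaligned = 0;
	int first = 1;

	while (nr_sects) {
		sector_t req = next_req_sects(sector, part_offset, nr_sects,
					      gran, aligned_max);

		if (!first && (sector + part_offset) % gran != 0)
			misaligned++;
		first = 0;
		sector += req;
		nr_sects -= req;
	}
	return misaligned;
}
```

With a partition starting at sector 63, the first bio is 1985 sectors long (2048 - 63). The second bio starts at device LBA 2048, and no later bio starts off the granularity grid.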
The above is how this patch improves discard bio alignment in
__blkdev_issue_discard(). With this patch applied, after a discard with
the same command line mentioned previously, sg_get_lba_status returns:
descriptor LBA: 0x0000000000000000 blocks: 106954752 deallocated
descriptor LBA: 0x0000000006600000 blocks: 0 deallocated
We can see there are no 2048-sector segments anymore; everything is clean.
Reported-and-tested-by: Acshai Manoj <acshai.manoj@microfocus.com>
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Enzo Matsumiya <ematsumiya@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-17 02:42:30 +00:00
	sector_t bs_mask, part_offset = 0;
2010-04-28 13:55:08 +00:00

2018-01-11 13:09:12 +00:00
	if (bdev_read_only(bdev))
		return -EPERM;

2016-06-09 14:00:36 +00:00
	if (flags & BLKDEV_DISCARD_SECURE) {
		if (!blk_queue_secure_erase(q))
			return -EOPNOTSUPP;
		op = REQ_OP_SECURE_ERASE;
	} else {
		if (!blk_queue_discard(q))
			return -EOPNOTSUPP;
		op = REQ_OP_DISCARD;
	}
2010-04-28 13:55:08 +00:00

2020-08-05 17:25:03 +00:00
	/* In case the discard granularity isn't set by buggy device driver */
	if (WARN_ON_ONCE(!q->limits.discard_granularity)) {
		char dev_name[BDEVNAME_SIZE];

		bdevname(bdev, dev_name);
		pr_err_ratelimited("%s: Error: discard_granularity is 0.\n", dev_name);
		return -EOPNOTSUPP;
	}

2016-10-11 20:51:08 +00:00
	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
	if ((sector | nr_sects) & bs_mask)
		return -EINVAL;

2018-10-29 12:57:18 +00:00
	if (!nr_sects)
		return -EINVAL;
2015-10-22 16:59:42 +00:00

2020-07-17 02:42:30 +00:00
	/* In case the discard request is in a partition */
2020-09-03 05:40:57 +00:00
	if (bdev_is_partition(bdev))
2020-11-24 08:34:24 +00:00
		part_offset = bdev->bd_start_sect;
2020-07-17 02:42:30 +00:00

2018-10-29 12:57:18 +00:00
	while (nr_sects) {
block: improve discard bio alignment in __blkdev_issue_discard()
This patch improves discard bio split for address and size alignment in
__blkdev_issue_discard(). The aligned discard bio may help underlying
device controller to perform better discard and internal garbage
collection, and avoid unnecessary internal fragment.
Current discard bio split algorithm in __blkdev_issue_discard() may have
non-discarded fregment on device even the discard bio LBA and size are
both aligned to device's discard granularity size.
Here is the example steps on how to reproduce the above problem.
- On a VMWare ESXi 6.5 update3 installation, create a 51GB virtual disk
with thin mode and give it to a Linux virtual machine.
- Inside the Linux virtual machine, if the 50GB virtual disk shows up as
/dev/sdb, fill data into the first 50GB by,
# dd if=/dev/zero of=/dev/sdb bs=4096 count=13107200
- Discard the 50GB range from offset 0 on /dev/sdb,
# blkdiscard /dev/sdb -o 0 -l 53687091200
- Observe the underlying mapping status of the device
# sg_get_lba_status /dev/sdb -m 1048 --lba=0
descriptor LBA: 0x0000000000000000 blocks: 2048 mapped (or unknown)
descriptor LBA: 0x0000000000000800 blocks: 16773120 deallocated
descriptor LBA: 0x0000000000fff800 blocks: 2048 mapped (or unknown)
descriptor LBA: 0x0000000001000000 blocks: 8386560 deallocated
descriptor LBA: 0x00000000017ff800 blocks: 2048 mapped (or unknown)
descriptor LBA: 0x0000000001800000 blocks: 8386560 deallocated
descriptor LBA: 0x0000000001fff800 blocks: 2048 mapped (or unknown)
descriptor LBA: 0x0000000002000000 blocks: 8386560 deallocated
descriptor LBA: 0x00000000027ff800 blocks: 2048 mapped (or unknown)
descriptor LBA: 0x0000000002800000 blocks: 8386560 deallocated
descriptor LBA: 0x0000000002fff800 blocks: 2048 mapped (or unknown)
descriptor LBA: 0x0000000003000000 blocks: 8386560 deallocated
descriptor LBA: 0x00000000037ff800 blocks: 2048 mapped (or unknown)
descriptor LBA: 0x0000000003800000 blocks: 8386560 deallocated
descriptor LBA: 0x0000000003fff800 blocks: 2048 mapped (or unknown)
descriptor LBA: 0x0000000004000000 blocks: 8386560 deallocated
descriptor LBA: 0x00000000047ff800 blocks: 2048 mapped (or unknown)
descriptor LBA: 0x0000000004800000 blocks: 8386560 deallocated
descriptor LBA: 0x0000000004fff800 blocks: 2048 mapped (or unknown)
descriptor LBA: 0x0000000005000000 blocks: 8386560 deallocated
descriptor LBA: 0x00000000057ff800 blocks: 2048 mapped (or unknown)
descriptor LBA: 0x0000000005800000 blocks: 8386560 deallocated
descriptor LBA: 0x0000000005fff800 blocks: 2048 mapped (or unknown)
descriptor LBA: 0x0000000006000000 blocks: 6291456 deallocated
descriptor LBA: 0x0000000006600000 blocks: 0 deallocated
Although the discard bio starts at LBA 0 and has 50<<30 bytes size which
are perfect aligned to the discard granularity, from the above list
these are many 1MB (2048 sectors) internal fragments exist unexpectedly.
The problem is in __blkdev_issue_discard(), an improper algorithm causes
an improper bio size which is not aligned.
25 int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
26 sector_t nr_sects, gfp_t gfp_mask, int flags,
27 struct bio **biop)
28 {
29 struct request_queue *q = bdev_get_queue(bdev);
[snipped]
56
57 while (nr_sects) {
58 sector_t req_sects = min_t(sector_t, nr_sects,
59 bio_allowed_max_sectors(q));
60
61 WARN_ON_ONCE((req_sects << 9) > UINT_MAX);
62
63 bio = blk_next_bio(bio, 0, gfp_mask);
64 bio->bi_iter.bi_sector = sector;
65 bio_set_dev(bio, bdev);
66 bio_set_op_attrs(bio, op, 0);
67
68 bio->bi_iter.bi_size = req_sects << 9;
69 sector += req_sects;
70 nr_sects -= req_sects;
[snipped]
79 }
80
81 *biop = bio;
82 return 0;
83 }
84 EXPORT_SYMBOL(__blkdev_issue_discard);
At line 58-59, to discard a 50GB range, req_sects is set as return value
of bio_allowed_max_sectors(q), which is 8388607 sectors. In the above
case, the discard granularity is 2048 sectors, although the start LBA
and discard length are aligned to discard granularity, req_sects never
has chance to be aligned to discard granularity. This is why there are
some still-mapped 2048 sectors fragment in every 4 or 8 GB range.
If req_sects at line 58 is set to a value aligned to discard_granularity
and close to UNIT_MAX, then all consequent split bios inside device
driver are (almostly) aligned to discard_granularity of the device
queue. The 2048 sectors still-mapped fragment will disappear.
This patch introduces bio_aligned_discard_max_sectors() to return the
value that is aligned to q->limits.discard_granularity and closest to
UINT_MAX. It then replaces bio_allowed_max_sectors() with this new
routine to choose a more suitable split bio length.
But we still need to handle the case where the discard start LBA is not
aligned to q->limits.discard_granularity; otherwise, even if the length
is aligned, the current code may still leave a 2048-sector fragment
around every 4GB range. Therefore, to calculate req_sects, the start LBA
of the discard range (including the partition offset) is checked first.
If it is not aligned to the discard granularity, the first split is
placed so that the following bio has a bi_sector aligned to the discard
granularity. Then no still-mapped fragment remains in the middle of the
discard range.
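The per-iteration rule described above can be modelled the same way. Again, this is a hedged user-space sketch, not the kernel implementation: model_patched_split() is hypothetical, the aligned maximum is approximated as max_sects rounded down to a multiple of gran, and part_offset stands for the partition start that the patch adds to sector. The model returns the number of bios and reports how many internal bio boundaries still fall inside a granule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t; /* stand-in for the kernel's sector_t */

/*
 * Hypothetical model of the patched loop. gran is the discard granularity in
 * sectors, max_sects the per-bio limit before alignment, part_offset the
 * partition start. Returns the number of bios; *misaligned receives the
 * number of internal bio boundaries that are not granule-aligned.
 */
static size_t model_patched_split(sector_t sector, sector_t nr_sects,
				  sector_t part_offset, sector_t max_sects,
				  sector_t gran, size_t *misaligned)
{
	/* Largest multiple of gran not above max_sects. */
	sector_t aligned_max = max_sects - max_sects % gran;
	size_t nr_bios = 0;

	assert(gran > 0 && aligned_max > 0);
	*misaligned = 0;
	while (nr_sects) {
		sector_t mapped = sector + part_offset;
		/* round_up(mapped, gran) */
		sector_t aligned_lba = (mapped + gran - 1) / gran * gran;
		/* Aligned start: take a full aligned chunk; else stop at the boundary. */
		sector_t limit = (aligned_lba == mapped) ? aligned_max
							 : aligned_lba - mapped;
		sector_t req_sects = nr_sects < limit ? nr_sects : limit;

		sector += req_sects;
		nr_sects -= req_sects;
		nr_bios++;
		/* Count boundaries inside the range that split a granule. */
		if (nr_sects && (sector + part_offset) % gran)
			(*misaligned)++;
	}
	return nr_bios;
}
```

With the 50<<30-byte range from the report (104857600 sectors), an 8388607-sector limit and a 2048-sector granularity, the model produces 13 bios and no misaligned boundaries. Starting at sector 100 instead adds one short leading bio (14 in total) and still leaves every later boundary aligned.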
The above is how this patch improves discard bio alignment in
__blkdev_issue_discard(). With this patch applied, after running the
discard with the same command line mentioned previously,
sg_get_lba_status returns:
descriptor LBA: 0x0000000000000000 blocks: 106954752 deallocated
descriptor LBA: 0x0000000006600000 blocks: 0 deallocated
We can see there is no 2048-sector segment anymore; everything is clean.
Reported-and-tested-by: Acshai Manoj <acshai.manoj@microfocus.com>
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Enzo Matsumiya <ematsumiya@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-17 02:42:30 +00:00
		sector_t granularity_aligned_lba, req_sects;
		sector_t sector_mapped = sector + part_offset;

		granularity_aligned_lba = round_up(sector_mapped,
				q->limits.discard_granularity >> SECTOR_SHIFT);

		/*
		 * Check whether the discard bio starts at a discard_granularity
		 * aligned LBA,
		 * - If no: set (granularity_aligned_lba - sector_mapped) to
		 *   bi_size of the first split bio, then the second bio will
		 *   start at a discard_granularity aligned LBA on the device.
		 * - If yes: use bio_aligned_discard_max_sectors() as the max
		 *   possible bi_size of the first split bio. Then when this bio
		 *   is split in the device driver, the split ones are very likely
		 *   to be aligned to discard_granularity of the device's queue.
		 */
		if (granularity_aligned_lba == sector_mapped)
			req_sects = min_t(sector_t, nr_sects,
					  bio_aligned_discard_max_sectors(q));
		else
			req_sects = min_t(sector_t, nr_sects,
					  granularity_aligned_lba - sector_mapped);
block: split discard into aligned requests
When a disk has large discard_granularity and small max_discard_sectors,
discards are not split with optimal alignment. In the limit case of
discard_granularity == max_discard_sectors, no request could be aligned
correctly, so in fact you might end up with no discarded logical blocks
at all.
Another example that helps illustrate the condition in the patch uses
discard_granularity == 64, max_discard_sectors == 128. A request that is
submitted for 256 sectors 2..257 will be split in two: 2..129, 130..257.
However, only 2 aligned blocks out of 3 are included in the request;
128..191 may be left intact and not discarded. With this patch, the
first request will be truncated to ensure good alignment of what's left,
and the split will be 2..127, 128..255, 256..257. The patch will also
take into account the discard_alignment.
At most one extra request will be introduced, because the first request
will be reduced by at most granularity-1 sectors, and granularity
must be less than max_discard_sectors. Subsequent requests will run
on round_down(max_discard_sectors, granularity) sectors, as in the
current code.
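The split described in this message can be reproduced with a short model. It is a sketch, not the code from this patch: split_discard() is a hypothetical helper, and it assumes alignment < gran and max_sects >= gran. Every request except the last is shortened, when necessary, so that the next one starts on a sector whose offset modulo the granularity equals the alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t; /* stand-in for the kernel type */

/*
 * Hypothetical model: split the discard [sector, sector + nr_sects) into
 * requests of at most round_down(max_sects, gran) sectors. If a request is
 * not the last one and the next one would start misaligned, pull its end
 * back to the previous aligned sector. First/last sectors of each request
 * go to first[]/last[]; returns the number of requests (at most cap).
 */
static size_t split_discard(sector_t sector, sector_t nr_sects,
			    sector_t gran, sector_t alignment,
			    sector_t max_sects, sector_t first[],
			    sector_t last[], size_t cap)
{
	/* All requests are capped at round_down(max_sects, gran) sectors. */
	sector_t max_aligned = max_sects - max_sects % gran;
	size_t n = 0;

	assert(gran > 0 && alignment < gran && max_sects >= gran);
	while (nr_sects && n < cap) {
		sector_t req = nr_sects < max_aligned ? nr_sects : max_aligned;
		sector_t end = sector + req;

		if (req < nr_sects && end % gran != alignment) {
			/* Largest aligned sector below end; still > sector. */
			end = (end - alignment) / gran * gran + alignment;
			req = end - sector;
		}
		first[n] = sector;
		last[n] = end - 1;
		n++;
		sector = end;
		nr_sects -= req;
	}
	return n;
}
```

With gran = 64, alignment = 0 and max_sects = 128, the 256-sector request starting at sector 2 comes back as 2..127, 128..255 and 256..257, matching the split given above.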
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-08-02 07:48:50 +00:00

2018-11-14 15:17:18 +00:00
		WARN_ON_ONCE((req_sects << 9) > UINT_MAX);

2022-01-24 09:11:02 +00:00
		bio = blk_next_bio(bio, bdev, 0, op, gfp_mask);
2013-10-11 22:44:27 +00:00
		bio->bi_iter.bi_sector = sector;
		bio->bi_iter.bi_size = req_sects << 9;
2018-10-29 12:57:18 +00:00
		sector += req_sects;
block: split discard into aligned requests
2012-08-02 07:48:50 +00:00
		nr_sects -= req_sects;
2010-04-28 13:55:08 +00:00

2014-02-12 16:34:01 +00:00
		/*
		 * We can loop for a long time in here, if someone does
		 * full device discards (like mkfs). Be nice and allow
		 * us to schedule out to avoid softlocking if preempt
		 * is disabled.
		 */
		cond_resched();
2011-05-07 01:26:27 +00:00
	}
2016-04-16 18:55:28 +00:00

	*biop = bio;
	return 0;
}
EXPORT_SYMBOL(__blkdev_issue_discard);

/**
 * blkdev_issue_discard - queue a discard
 * @bdev: blockdev to issue discard for
 * @sector: start sector
 * @nr_sects: number of sectors to discard
 * @gfp_mask: memory allocation flags (for bio_alloc)
2017-01-23 19:41:39 +00:00
 * @flags: BLKDEV_DISCARD_* flags to control behaviour
2016-04-16 18:55:28 +00:00
 *
 * Description:
 * Issue a discard request for the sectors in question.
 */
int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
		sector_t nr_sects, gfp_t gfp_mask, unsigned long flags)
{
	struct bio *bio = NULL;
	struct blk_plug plug;
	int ret;

	blk_start_plug(&plug);
2016-06-09 14:00:36 +00:00
	ret = __blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, flags,
2016-04-16 18:55:28 +00:00
			&bio);
2016-05-05 15:54:21 +00:00
	if (!ret && bio) {
2016-06-05 19:31:41 +00:00
		ret = submit_bio_wait(bio);
2017-04-05 17:21:23 +00:00
		if (ret == -EOPNOTSUPP)
2016-05-05 15:54:21 +00:00
			ret = 0;
2016-06-07 16:32:13 +00:00
		bio_put(bio);
2016-05-05 15:54:21 +00:00
	}
2012-12-14 03:15:51 +00:00
	blk_finish_plug(&plug);
2010-04-28 13:55:08 +00:00

2016-05-05 15:54:21 +00:00
	return ret;
2010-04-28 13:55:08 +00:00
}
EXPORT_SYMBOL(blkdev_issue_discard);
2010-04-28 13:55:09 +00:00

2012-09-18 16:19:27 +00:00
/**
2016-11-30 20:28:58 +00:00
 * __blkdev_issue_write_same - generate number of bios with same page
2012-09-18 16:19:27 +00:00
 * @bdev: target blockdev
 * @sector: start sector
 * @nr_sects: number of sectors to write
 * @gfp_mask: memory allocation flags (for bio_alloc)
 * @page: page containing data to write
2016-11-30 20:28:58 +00:00
 * @biop: pointer to anchor bio
2012-09-18 16:19:27 +00:00
 *
 * Description:
2016-11-30 20:28:58 +00:00
 * Generate and issue number of bios(REQ_OP_WRITE_SAME) with same page.
2012-09-18 16:19:27 +00:00
 */
2016-11-30 20:28:58 +00:00
static int __blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
		sector_t nr_sects, gfp_t gfp_mask, struct page *page,
		struct bio **biop)
2012-09-18 16:19:27 +00:00
{
	struct request_queue *q = bdev_get_queue(bdev);
	unsigned int max_write_same_sectors;
2016-11-30 20:28:58 +00:00
	struct bio *bio = *biop;
2016-10-11 20:51:08 +00:00
	sector_t bs_mask;
2012-09-18 16:19:27 +00:00

2018-01-11 13:09:12 +00:00
	if (bdev_read_only(bdev))
		return -EPERM;

2016-10-11 20:51:08 +00:00
	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
	if ((sector | nr_sects) & bs_mask)
		return -EINVAL;

2016-11-30 20:28:58 +00:00
	if (!bdev_write_same(bdev))
		return -EOPNOTSUPP;

2015-05-22 07:46:56 +00:00
	/* Ensure that max_write_same_sectors doesn't overflow bi_size */
2018-10-29 12:57:19 +00:00
	max_write_same_sectors = bio_allowed_max_sectors(q);
2012-09-18 16:19:27 +00:00

	while (nr_sects) {
2022-01-24 09:11:02 +00:00
		bio = blk_next_bio(bio, bdev, 1, REQ_OP_WRITE_SAME, gfp_mask);
2013-10-11 22:44:27 +00:00
		bio->bi_iter.bi_sector = sector;
2012-09-18 16:19:27 +00:00
		bio->bi_vcnt = 1;
		bio->bi_io_vec->bv_page = page;
		bio->bi_io_vec->bv_offset = 0;
		bio->bi_io_vec->bv_len = bdev_logical_block_size(bdev);

		if (nr_sects > max_write_same_sectors) {
2013-10-11 22:44:27 +00:00
			bio->bi_iter.bi_size = max_write_same_sectors << 9;
2012-09-18 16:19:27 +00:00
			nr_sects -= max_write_same_sectors;
			sector += max_write_same_sectors;
		} else {
2013-10-11 22:44:27 +00:00
			bio->bi_iter.bi_size = nr_sects << 9;
2012-09-18 16:19:27 +00:00
			nr_sects = 0;
		}
2016-11-30 20:28:58 +00:00
		cond_resched();
2012-09-18 16:19:27 +00:00
	}

2016-11-30 20:28:58 +00:00
	*biop = bio;
	return 0;
}

/**
 * blkdev_issue_write_same - queue a write same operation
 * @bdev: target blockdev
 * @sector: start sector
 * @nr_sects: number of sectors to write
 * @gfp_mask: memory allocation flags (for bio_alloc)
 * @page: page containing data
 *
 * Description:
 * Issue a write same request for the sectors in question.
 */
int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
		sector_t nr_sects, gfp_t gfp_mask,
		struct page *page)
{
	struct bio *bio = NULL;
	struct blk_plug plug;
	int ret;

	blk_start_plug(&plug);
	ret = __blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask, page,
			&bio);
	if (ret == 0 && bio) {
2016-06-05 19:31:41 +00:00
		ret = submit_bio_wait(bio);
2016-06-07 16:32:13 +00:00
		bio_put(bio);
	}
2016-11-30 20:28:58 +00:00
	blk_finish_plug(&plug);
2016-07-19 09:23:34 +00:00
	return ret;
2012-09-18 16:19:27 +00:00
}
EXPORT_SYMBOL(blkdev_issue_write_same);

2016-11-30 20:28:59 +00:00
static int __blkdev_issue_write_zeroes(struct block_device *bdev,
		sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
2017-04-05 17:21:09 +00:00
		struct bio **biop, unsigned flags)
2016-11-30 20:28:59 +00:00
{
	struct bio *bio = *biop;
	unsigned int max_write_zeroes_sectors;

2018-01-11 13:09:12 +00:00
	if (bdev_read_only(bdev))
		return -EPERM;

2016-11-30 20:28:59 +00:00
	/* Ensure that max_write_zeroes_sectors doesn't overflow bi_size */
	max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev);

	if (max_write_zeroes_sectors == 0)
		return -EOPNOTSUPP;

	while (nr_sects) {
2022-01-24 09:11:02 +00:00
		bio = blk_next_bio(bio, bdev, 0, REQ_OP_WRITE_ZEROES, gfp_mask);
2016-11-30 20:28:59 +00:00
		bio->bi_iter.bi_sector = sector;
2017-04-05 17:21:09 +00:00
		if (flags & BLKDEV_ZERO_NOUNMAP)
			bio->bi_opf |= REQ_NOUNMAP;
2016-11-30 20:28:59 +00:00

		if (nr_sects > max_write_zeroes_sectors) {
			bio->bi_iter.bi_size = max_write_zeroes_sectors << 9;
			nr_sects -= max_write_zeroes_sectors;
			sector += max_write_zeroes_sectors;
		} else {
			bio->bi_iter.bi_size = nr_sects << 9;
			nr_sects = 0;
		}
		cond_resched();
	}

	*biop = bio;
	return 0;
}

2017-07-06 11:21:15 +00:00
/*
 * Convert a number of 512B sectors to a number of pages.
 * The result is limited to a number of pages that can fit into a BIO.
 * Also make sure that the result is always at least 1 (page) for the cases
 * where nr_sects is lower than the number of sectors in a page.
 */
static unsigned int __blkdev_sectors_to_bio_pages(sector_t nr_sects)
{
2017-09-11 15:46:49 +00:00
	sector_t pages = DIV_ROUND_UP_SECTOR_T(nr_sects, PAGE_SIZE / 512);
2017-07-06 11:21:15 +00:00

2021-03-11 11:01:37 +00:00
	return min(pages, (sector_t)BIO_MAX_VECS);
2017-07-06 11:21:15 +00:00
}

2017-10-16 13:59:09 +00:00
static int __blkdev_issue_zero_pages(struct block_device *bdev,
		sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
		struct bio **biop)
{
	struct bio *bio = *biop;
	int bi_size = 0;
	unsigned int sz;

2018-01-11 13:09:12 +00:00
	if (bdev_read_only(bdev))
		return -EPERM;

2017-10-16 13:59:09 +00:00
	while (nr_sects != 0) {
2022-01-24 09:11:02 +00:00
		bio = blk_next_bio(bio, bdev, __blkdev_sectors_to_bio_pages(nr_sects),
				   REQ_OP_WRITE, gfp_mask);
2017-10-16 13:59:09 +00:00
		bio->bi_iter.bi_sector = sector;

		while (nr_sects != 0) {
			sz = min((sector_t) PAGE_SIZE, nr_sects << 9);
			bi_size = bio_add_page(bio, ZERO_PAGE(0), sz, 0);
			nr_sects -= bi_size >> 9;
			sector += bi_size >> 9;
			if (bi_size < sz)
				break;
		}
		cond_resched();
	}

	*biop = bio;
	return 0;
}

2010-04-28 13:55:09 +00:00
/**
2016-11-30 20:28:58 +00:00
 * __blkdev_issue_zeroout - generate number of zero-filled write bios
2010-04-28 13:55:09 +00:00
 * @bdev: blockdev to issue
 * @sector: start sector
 * @nr_sects: number of sectors to write
 * @gfp_mask: memory allocation flags (for bio_alloc)
2016-11-30 20:28:58 +00:00
 * @biop: pointer to anchor bio
2017-04-05 17:21:08 +00:00
 * @flags: controls detailed behavior
2010-04-28 13:55:09 +00:00
 *
 * Description:
2017-04-05 17:21:08 +00:00
 * Zero-fill a block range, either using hardware offload or by explicitly
 * writing zeroes to the device.
 *
 * If a device is using logical block provisioning, the underlying space will
 * not be released if %flags contains BLKDEV_ZERO_NOUNMAP.
2017-04-05 17:21:10 +00:00
 *
 * If %flags contains BLKDEV_ZERO_NOFALLBACK, the function will return
 * -EOPNOTSUPP if no explicit hardware offload for zeroing is provided.
2010-04-28 13:55:09 +00:00
 */
2016-11-30 20:28:58 +00:00
int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
		sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,
2017-04-05 17:21:08 +00:00
		unsigned flags)
2010-04-28 13:55:09 +00:00
{
2010-08-06 11:23:25 +00:00
	int ret;
2016-10-11 20:51:08 +00:00
	sector_t bs_mask;

	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
	if ((sector | nr_sects) & bs_mask)
		return -EINVAL;
2010-04-28 13:55:09 +00:00

2016-11-30 20:28:59 +00:00
	ret = __blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp_mask,
2017-04-05 17:21:09 +00:00
			biop, flags);
2017-04-05 17:21:10 +00:00
	if (ret != -EOPNOTSUPP || (flags & BLKDEV_ZERO_NOFALLBACK))
2017-10-16 13:59:09 +00:00
		return ret;
2010-04-28 13:55:09 +00:00

2017-10-16 13:59:09 +00:00
	return __blkdev_issue_zero_pages(bdev, sector, nr_sects, gfp_mask,
					 biop);
2010-04-28 13:55:09 +00:00
}
2016-11-30 20:28:58 +00:00
EXPORT_SYMBOL(__blkdev_issue_zeroout);
2012-09-18 16:19:28 +00:00

/**
 * blkdev_issue_zeroout - zero-fill a block range
 * @bdev: blockdev to write
 * @sector: start sector
 * @nr_sects: number of sectors to write
 * @gfp_mask: memory allocation flags (for bio_alloc)
2017-04-05 17:21:08 +00:00
 * @flags: controls detailed behavior
2012-09-18 16:19:28 +00:00
 *
 * Description:
2017-04-05 17:21:08 +00:00
 * Zero-fill a block range, either using hardware offload or by explicitly
 * writing zeroes to the device. See __blkdev_issue_zeroout() for the
 * valid values for %flags.
2012-09-18 16:19:28 +00:00
 */
int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
2017-04-05 17:21:08 +00:00
		sector_t nr_sects, gfp_t gfp_mask, unsigned flags)
2012-09-18 16:19:28 +00:00
{
2017-10-16 13:59:10 +00:00
	int ret = 0;
	sector_t bs_mask;
	struct bio *bio;
2016-11-30 20:28:58 +00:00
	struct blk_plug plug;
2017-10-16 13:59:10 +00:00
	bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev);
2015-01-21 01:06:30 +00:00

2017-10-16 13:59:10 +00:00
	bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
	if ((sector | nr_sects) & bs_mask)
		return -EINVAL;

retry:
	bio = NULL;
2016-11-30 20:28:58 +00:00
	blk_start_plug(&plug);
2017-10-16 13:59:10 +00:00
	if (try_write_zeroes) {
		ret = __blkdev_issue_write_zeroes(bdev, sector, nr_sects,
						  gfp_mask, &bio, flags);
	} else if (!(flags & BLKDEV_ZERO_NOFALLBACK)) {
		ret = __blkdev_issue_zero_pages(bdev, sector, nr_sects,
						gfp_mask, &bio);
	} else {
		/* No zeroing offload support */
		ret = -EOPNOTSUPP;
	}
2016-11-30 20:28:58 +00:00
	if (ret == 0 && bio) {
		ret = submit_bio_wait(bio);
		bio_put(bio);
	}
	blk_finish_plug(&plug);
2017-10-16 13:59:10 +00:00
	if (ret && try_write_zeroes) {
		if (!(flags & BLKDEV_ZERO_NOFALLBACK)) {
			try_write_zeroes = false;
			goto retry;
		}
		if (!bdev_write_zeroes_sectors(bdev)) {
			/*
			 * Zeroing offload support was indicated, but the
			 * device reported ILLEGAL REQUEST (for some devices
			 * there is no non-destructive way to verify whether
			 * WRITE ZEROES is actually supported).
			 */
			ret = -EOPNOTSUPP;
		}
	}
2012-09-18 16:19:28 +00:00

2016-11-30 20:28:58 +00:00
	return ret;
2012-09-18 16:19:28 +00:00
}
2010-04-28 13:55:09 +00:00
EXPORT_SYMBOL(blkdev_issue_zeroout);