mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-01-12 08:09:56 +00:00
e836007089
We've found that using raid0 with the 'original' layout and discard enabled with different disk sizes (such that at least two zones are created) can result in data corruption. This is due to the fact that the discard handling in 'raid0_handle_discard()' assumes the 'alternate' layout. We've seen this corruption using ext4 but other filesystems are likely susceptible as well.

More specifically, while multiple zones are necessary to create the corruption, the corruption may not occur with multiple zones if they are laid out in such a way that the layout matches what the 'alternate' layout would have produced. Thus, not all raid0 devices with the 'original' layout, different size disks and discard enabled will encounter this corruption.

The 3.14 kernel inadvertently changed the raid0 disk layout for different size disks. Thus, running a pre-3.14 kernel and post-3.14 kernel on the same raid0 array could corrupt data. This led to the creation of the 'original' layout (to match the pre-3.14 layout) and the 'alternate' layout (to match the post-3.14 layout) in the 5.4 kernel time frame and an option to tell the kernel which layout to use (since it couldn't be autodetected). However, when the 'original' layout was added back to 5.4, discard support for the 'original' layout was not added, leading to this issue.

I've been able to reliably reproduce the corruption with the following test case:

1. create raid0 array with different size disks using original layout
2. mkfs
3. mount -o discard
4. create lots of files
5. remove 1/2 the files
6. fstrim -a (or just the mount point for the raid0 array)
7. umount
8. fsck -fn /dev/md0 (spews all sorts of corruptions)

Let's fix this by adding proper discard support to the 'original' layout. The fix 'maps' the 'original' layout disks to the order in which they are read/written such that we can compare the disks in the same way that the current 'alternate' layout does. A 'disk_shift' field is added to 'struct strip_zone'. This could be computed on the fly in raid0_handle_discard() but by adding this field, we save some computation in the discard path.

Note we could also potentially fix this by re-ordering the disks in the zones that follow the first one, and then always reading/writing them using the 'alternate' layout. However, that is seen as a more substantial change, and we are attempting the least invasive fix at this time to remedy the corruption.

I've verified the change using the reproducer mentioned above. Typically, the corruption is seen after less than 3 iterations, while the patch has run 500+ iterations.

Cc: NeilBrown <neilb@suse.de>
Cc: Song Liu <song@kernel.org>
Fixes: c84a1372df92 ("md/raid0: avoid RAID0 data corruption due to layout confusion.")
Cc: stable@vger.kernel.org
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230623180523.1901230-1-jbaron@akamai.com
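The 'disk_shift' mapping described in the message can be illustrated with a small userspace sketch. It is not the kernel implementation: the helper names zone_disk_shift() and map_disk_shift() are hypothetical, and the sketch assumes that the 'original' layout selects a disk as the absolute chunk number modulo the number of devices in the zone, so that the first disk used by a zone depends on where the previous zone ended.

```c
#include <assert.h>

/*
 * Illustrative userspace sketch, not the kernel code.
 * Both helper names below are hypothetical.
 */

/*
 * Index of the disk that receives the first chunk of a zone.
 * Assumption: in the 'original' layout the disk is chosen as
 * (absolute chunk number % nb_dev), so the start disk follows
 * from the chunk number at which the previous zone ended.
 */
static int zone_disk_shift(unsigned long long prev_zone_end,
			   unsigned int chunk_sectors, int nb_dev)
{
	assert(chunk_sectors > 0 && nb_dev > 0);
	return (int)((prev_zone_end / chunk_sectors) %
		     (unsigned long long)nb_dev);
}

/*
 * Rotate a disk index so that the zone's start disk becomes
 * position 0. After this mapping, disks can be compared in
 * read/write order, as the 'alternate' layout already allows.
 */
static int map_disk_shift(int disk_index, int nb_dev, int disk_shift)
{
	assert(nb_dev > 0 && disk_index >= 0 && disk_index < nb_dev);
	assert(disk_shift >= 0 && disk_shift < nb_dev);
	return (disk_index + nb_dev - disk_shift) % nb_dev;
}
```

For example, if the previous zone ends at sector 2048 with 128-sector chunks and the zone has 3 devices, the zone starts at chunk 16, so the shift is 1: disk 1 maps to position 0, disk 2 to position 1, and disk 0 to position 2. Storing the shift per zone, rather than recomputing it, matches the message's stated goal of saving work in the discard path.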
34 lines
951 B
C
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _RAID0_H
#define _RAID0_H

struct strip_zone {
	sector_t zone_end;	/* Start of the next zone (in sectors) */
	sector_t dev_start;	/* Zone offset in real dev (in sectors) */
	int nb_dev;		/* # of devices attached to the zone */
	int disk_shift;		/* start disk for the original layout */
};

/* Linux 3.14 (20d0189b101) made an unintended change to
 * the RAID0 layout for multi-zone arrays (where devices aren't all
 * the same size).
 * RAID0_ORIG_LAYOUT restores the original layout
 * RAID0_ALT_MULTIZONE_LAYOUT uses the altered layout
 * The layouts are identical when there is only one zone (all
 * devices the same size).
 */

enum r0layout {
	RAID0_ORIG_LAYOUT = 1,
	RAID0_ALT_MULTIZONE_LAYOUT = 2,
};
struct r0conf {
	struct strip_zone *strip_zone;
	struct md_rdev **devlist;	/* lists of rdevs, pointed to
					 * by strip_zone->dev */
	int nr_strip_zones;
	enum r0layout layout;
};

#endif
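As a rough illustration of how the 'zone_end' field of 'struct strip_zone' can be used, the following userspace sketch locates the zone that contains an array sector. It is a simplified model and not the kernel's lookup code: 'sector_t' is replaced by a plain typedef, the struct is a local copy of the one in this header, and find_zone_index() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's sector_t (assumption). */
typedef unsigned long long sector_t;

/* Local copy of the struct from raid0.h, for illustration only. */
struct strip_zone {
	sector_t zone_end;	/* Start of the next zone (in sectors) */
	sector_t dev_start;	/* Zone offset in real dev (in sectors) */
	int nb_dev;		/* # of devices attached to the zone */
	int disk_shift;		/* start disk for the original layout */
};

/*
 * Hypothetical helper: return the index of the zone containing
 * 'sector', or -1 if the sector lies beyond the last zone.
 * On success, *zone_offset is the sector's offset from the start
 * of that zone (the previous zone's zone_end, or 0 for zone 0).
 */
static int find_zone_index(const struct strip_zone *zones, int nr_zones,
			   sector_t sector, sector_t *zone_offset)
{
	assert(zones != NULL && zone_offset != NULL);

	for (int i = 0; i < nr_zones; i++) {
		if (sector < zones[i].zone_end) {
			sector_t zone_start = i ? zones[i - 1].zone_end : 0;

			*zone_offset = sector - zone_start;
			return i;
		}
	}
	return -1;
}
```

With two zones ending at sectors 300 and 500 (for instance, three disks contributing 100 sectors each, then two larger disks contributing a further 100 each), sector 350 falls in zone 1 at offset 50. A linear scan is used here because the number of zones is bounded by the number of distinct disk sizes, which is normally small.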