Commit Graph

1200128 Commits

Author SHA1 Message Date
Christian Brauner
dc3216b141
super: ensure valid info
For keyed filesystems that recycle superblocks based on s_fs_info or
information contained therein s_fs_info must be kept as long as the
superblock is on the filesystem type super list. This isn't guaranteed
as s_fs_info will be freed latest in sb->kill_sb().

The fix is simply to perform notification and list removal in
kill_anon_super(). Any filesystem needs to free s_fs_info after they
call the kill_*() helpers. If they don't they risk use-after-free right
now so fixing it here is guaranteed that s_fs_info remain valid.

For block backed filesystems notifying in pass sb->kill_sb() in
deactivate_locked_super() remains unproblematic and is required because
multiple other block devices can be shut down after kill_block_super()
has been called from a filesystem's sb->kill_sb() handler. For example,
ext4 and xfs close additional devices. Block based filesystems don't
depend on s_fs_info (btrfs does use s_fs_info but also uses
kill_anon_super() and not kill_block_super().).

Sorry for that braino. Goal should be to unify this behavior during this
cycle obviously. But let's please do a simple bugfix now.

Fixes: 2c18a63b76 ("super: wait until we passed kill super")
Fixes: syzbot+5b64180f8d9e39d3f061@syzkaller.appspotmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reported-by: syzbot+5b64180f8d9e39d3f061@syzkaller.appspotmail.com
Message-Id: <20230828-vfs-super-fixes-v1-2-b37a4a04a88f@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-29 10:13:04 +02:00
Christian Brauner
345a5c4a0b
super: move lockdep assert
Fix braino and move the lockdep assertion after put_super() otherwise we
risk a use-after-free.

Fixes: 2c18a63b76 ("super: wait until we passed kill super")
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Message-Id: <20230828-vfs-super-fixes-v1-1-b37a4a04a88f@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-29 10:13:04 +02:00
Christian Brauner
cd4284cfd3 New code for 6.6:
* Allow the kernel to initiate a freeze of a filesystem.  The kernel
    and userspace can both hold a freeze on a filesystem at the same
    time; the freeze is not lifted until /both/ holders lift it.  This
    will enable us to fix a longstanding bug in XFS online fsck.
  * Use kernel-initated fsfreeze to fix some longstanding false negatives
    in onlin fsck of the free space and inode counters.
 
 Signed-off-by: Darrick J. Wong <djwong@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZM0XzQAKCRBKO3ySh0YR
 phSCAQD9hQmd9tngbNGos44XthgHDIfVHLQLWLt6lwcD0WNfIgEAwMWKLzI9hi7G
 SmX3NWDQBj7kvC96HYizIvdSsdkvHw0=
 =ulEr
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZOXpMgAKCRCRxhvAZXjc
 ovFBAP97HEUSf78XXTQehluJgkbSVu208DFC4mCyFA6rRihskQD/Yz0uosr/51zJ
 FdUPNg8MNkQCRtqx5LQ7yClNSr9Sxg4=
 =uIAe
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.6-merge-3' of ssh://gitolite.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs online fsck update from Darrick Wong:

New code for 6.6:

 * Allow the kernel to initiate a freeze of a filesystem.  The kernel
   and userspace can both hold a freeze on a filesystem at the same
   time; the freeze is not lifted until /both/ holders lift it.  This
   will enable us to fix a longstanding bug in XFS online fsck.
 * Use kernel-initated fsfreeze to fix some longstanding false negatives
   in online fsck of the free space and inode counters.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Message-Id: <20230822182604.GB11286@frogsfrogsfrogs>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-23 13:09:22 +02:00
Christian Brauner
3fb5a6562a New code for 6.6:
* Allow the kernel to initiate a freeze of a filesystem.  The kernel
    and userspace can both hold a freeze on a filesystem at the same
    time; the freeze is not lifted until /both/ holders lift it.  This
    will enable us to fix a longstanding bug in XFS online fsck.
 
 Signed-off-by: Darrick J. Wong <djwong@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZLVnJwAKCRBKO3ySh0YR
 pqVIAP9u9CZEJ2Zcc7YpBj1MLUQGr2xBmz8RJEVJbQHKVgYcQwEA9BNb4eH4i2Af
 K7Qp0OGNgyzZw37lN23Uf/SDuBK2QgM=
 =seMl
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZOXo2AAKCRCRxhvAZXjc
 ojDfAQDguc2saF8WLeXtn2O0pGOW8vTrhpwiFHNI6hwdzf07/AD+LGBpFEqYKyX5
 NHPzdR7YYpJoTsQzR4JFJVZqN9Q1xgU=
 =wDq0
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.6-merge-2' of ssh://gitolite.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull filesystem freezing updates from Darrick Wong:

New code for 6.6:

 * Allow the kernel to initiate a freeze of a filesystem.  The kernel
   and userspace can both hold a freeze on a filesystem at the same
   time; the freeze is not lifted until /both/ holders lift it.  This
   will enable us to fix a longstanding bug in XFS online fsck.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Message-Id: <20230822182604.GB11286@frogsfrogsfrogs>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-23 13:06:55 +02:00
Christian Brauner
051178c366
super: use higher-level helper for {freeze,thaw}
It's not necessary to use low-level locking helpers here. Use the
higher-level locking helpers and log if the superblock is dying. Since
the caller is assumed to already hold an active reference it isn't
possible to observe a dying superblock.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-22 13:32:50 +02:00
Christian Brauner
2c18a63b76 super: wait until we passed kill super
Recent rework moved block device closing out of sb->put_super() and into
sb->kill_sb() to avoid deadlocks as s_umount is held in put_super() and
blkdev_put() can end up taking s_umount again.

That means we need to move the removal of the superblock from @fs_supers
out of generic_shutdown_super() and into deactivate_locked_super() to
ensure that concurrent mounters don't fail to open block devices that
are still in use because blkdev_put() in sb->kill_sb() hasn't been
called yet.

We can now do this as we can make iterators through @fs_super and
@super_blocks wait without holding s_umount. Concurrent mounts will wait
until a dying superblock is fully dead so until sb->kill_sb() has been
called and SB_DEAD been set. Concurrent iterators can already discard
any SB_DYING superblock.

Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230818-vfs-super-fixes-v3-v3-4-9f0b1876e46b@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-21 18:09:08 +02:00
Christian Brauner
5e87491415 super: wait for nascent superblocks
Recent patches experiment with making it possible to allocate a new
superblock before opening the relevant block device. Naturally this has
intricate side-effects that we get to learn about while developing this.

Superblock allocators such as sget{_fc}() return with s_umount of the
new superblock held and lock ordering currently requires that block
level locks such as bdev_lock and open_mutex rank above s_umount.

Before aca740cecb ("fs: open block device after superblock creation")
ordering was guaranteed to be correct as block devices were opened prior
to superblock allocation and thus s_umount wasn't held. But now s_umount
must be dropped before opening block devices to avoid locking
violations.

This has consequences. The main one being that iterators over
@super_blocks and @fs_supers that grab a temporary reference to the
superblock can now also grab s_umount before the caller has managed to
open block devices and called fill_super(). So whereas before such
iterators or concurrent mounts would have simply slept on s_umount until
SB_BORN was set or the superblock was discard due to initalization
failure they can now needlessly spin through sget{_fc}().

If the caller is sleeping on bdev_lock or open_mutex one caller waiting
on SB_BORN will always spin somewhere and potentially this can go on for
quite a while.

It should be possible to drop s_umount while allowing iterators to wait
on a nascent superblock to either be born or discarded. This patch
implements a wait_var_event() mechanism allowing iterators to sleep
until they are woken when the superblock is born or discarded.

This also allows us to avoid relooping through @fs_supers and
@super_blocks if a superblock isn't yet born or dying.

Link: aca740cecb ("fs: open block device after superblock creation")
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230818-vfs-super-fixes-v3-v3-3-9f0b1876e46b@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-21 18:08:03 +02:00
Christian Brauner
d8ce82efde super: make locking naming consistent
Make the naming consistent with the earlier introduced
super_lock_{read,write}() helpers.

Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230818-vfs-super-fixes-v3-v3-2-9f0b1876e46b@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-21 14:36:57 +02:00
Christian Brauner
0ed33598dd super: use locking helpers
Replace the open-coded {down,up}_{read,write}() calls with simple
wrappers. Follow-up patches will benefit from this as well.

Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230818-vfs-super-fixes-v3-v3-1-9f0b1876e46b@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-21 14:36:57 +02:00
Christoph Hellwig
e127b9bccd fs: simplify invalidate_inodes
kill_dirty has always been true for a long time, so hard code it and
remove the unused return value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Message-Id: <20230811100828.1897174-18-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-21 14:35:32 +02:00
Christoph Hellwig
38bcdd3893 fs: remove get_super
get_super is unused now, remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Message-Id: <20230811100828.1897174-17-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-21 14:35:32 +02:00
Christoph Hellwig
2142b88c37 block: call into the file system for ioctl BLKFLSBUF
BLKFLSBUF is a historic ioctl that is called on a file handle to a
block device and syncs either the file system mounted on that block
device if there is one, or otherwise the just the data on the block
device.

Replace the get_super based syncing with a holder operation to remove
the last usage of get_super, and to also support syncing the file system
if the block device is not the main block device stored in s_dev.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Message-Id: <20230811100828.1897174-16-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-21 14:35:32 +02:00
Christoph Hellwig
d8530de5a6 block: call into the file system for bdev_mark_dead
Combine the newly merged bdev_mark_dead helper with the existing
mark_dead holder operation so that all operations that invalidate
a device that is dead or being removed now go through the holder
ops.  This allows file systems to explicitly shutdown either ASAP
(for a surprise removal) or after writing back data (for an orderly
removal), and do so not only for the main device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Message-Id: <20230811100828.1897174-15-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-21 14:35:32 +02:00
Christoph Hellwig
560e20e4bf block: consolidate __invalidate_device and fsync_bdev
We currently have two interfaces that take a block_devices and the find
a mounted file systems to flush or invaldidate data on it.  Both are a
bit problematic because they only work for the "main" block devices
that is used as s_dev for the super_block, and because they don't call
into the file system at all.

Merge the two into a new bdev_mark_dead helper that does both the
syncing and invalidation and which is properly documented.  This is
in preparation of merging the functionality into the ->mark_dead
holder operation so that it will work on additional block devices
used by a file systems and give us a single entry point for invalidation
of dead devices or media.

Note that a single standalone fsync_bdev call for an obscure ioctl
remains for now, but that one will also be deal with in a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Message-Id: <20230811100828.1897174-14-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-21 14:35:31 +02:00
Christoph Hellwig
127a5093c7 block: drop the "busy inodes on changed media" log message
This message isn't exactly helpful, and file systems already print way more
useful messages when shut down while active.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Message-Id: <20230811100828.1897174-13-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-21 14:35:31 +02:00
Christoph Hellwig
2527fd3877 dasd: also call __invalidate_device when setting the device offline
Don't just write out the data, but also invalidate all caches when setting
the device offline.  Stop canceling the offlining when writeback fails
as there is no way to recover from that anyway.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Message-Id: <20230811100828.1897174-12-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-21 14:35:31 +02:00
Christoph Hellwig
2c0326c587 amiflop: don't call fsync_bdev in FDFMTBEG
FDFMTBEG is used by fdformat to calibrate before formatting a disk.
Neither the atari nor PC floppy driver sync data, which also seems
a bit pointless for a disk hat is about to get formatted.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Message-Id: <20230811100828.1897174-11-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-21 14:35:31 +02:00
Christoph Hellwig
a47145f236 floppy: call disk_force_media_change when changing the format
While changing the format of a floppy isn't strictly speaking a media
change, the effects are the same in that the content of the media
changes and the diskseq should be increased and uevent should be
sent.  Switch from calling __invalidate_device to
disk_force_media_change to do so.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Message-Id: <20230811100828.1897174-10-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-21 14:35:31 +02:00
Christoph Hellwig
ab6860f62b block: simplify the disk_force_media_change interface
Hard code the events to DISK_EVENT_MEDIA_CHANGE as that is the only
useful use case, and drop the superfluous return value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Message-Id: <20230811100828.1897174-9-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-21 14:35:30 +02:00
Christoph Hellwig
0c1c9a27ce nbd: call blk_mark_disk_dead in nbd_clear_sock_ioctl
nbd_clear_sock_ioctl kills the socket and with that the block
device.  Instead of just invalidating file system buffers,
mark the device as dead, which will also invalidate the buffers
as part of the proper shutdown sequence.  This also includes
invalidating partitions if there are any.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Message-Id: <20230811100828.1897174-8-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-21 14:35:30 +02:00
Christoph Hellwig
8ffa54e337 xfs use fs_holder_ops for the log and RT devices
Use the generic fs_holder_ops to shut down the file system when the
log or RT device goes away instead of duplicating the logic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Message-Id: <20230802154131.2221419-13-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-11 13:15:04 +02:00
Christoph Hellwig
8d945b595e xfs: drop s_umount over opening the log and RT devices
Just like get_tree_bdev needs to drop s_umount when opening the main
device, we need to do the same for the xfs log and RT devices to avoid a
potential lock order reversal with s_unmount for the mark_dead path.

It might be preferable to just drop s_umount over ->fill_super entirely,
but that will require a fairly massive audit first, so we'll do the easy
version here first.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Message-Id: <20230802154131.2221419-12-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-11 13:15:04 +02:00
Christoph Hellwig
8bed178375 ext4: use fs_holder_ops for the log device
Use the generic fs_holder_ops to shut down the file system when the
log device goes away instead of duplicating the logic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230802154131.2221419-11-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-11 13:15:04 +02:00
Christoph Hellwig
6f5fc7de98 ext4: drop s_umount over opening the log device
Just like get_tree_bdev needs to drop s_umount when opening the main
device, we need to do the same for the ext4 log device to avoid a
potential lock order reversal with s_unmount for the mark_dead path.

It might be preferable to just drop s_umount over ->fill_super entirely,
but that will require a fairly massive audit first, so we'll do the easy
version here first.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Message-Id: <20230802154131.2221419-10-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-11 13:15:04 +02:00
Christoph Hellwig
7ecd0b6f51 fs: export fs_holder_ops
Export fs_holder_ops so that file systems that open additional block
devices can use it as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Message-Id: <20230802154131.2221419-9-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-11 13:15:03 +02:00
Christoph Hellwig
9c09a7cf62 fs: stop using get_super in fs_mark_dead
fs_mark_dead currently uses get_super to find the superblock for the
block device that is going away.  This means it is limited to the
main device stored in sb->s_dev, leading to a lot of code duplication
for file systems that can use multiple block devices.

Now that the holder for all block devices used by file systems is set
to the super_block, we can instead look at that holder and then check
if the file system is born and active, so do that instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Message-Id: <20230802154131.2221419-8-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-11 13:15:03 +02:00
Christoph Hellwig
2ea6f68932 fs: use the super_block as holder when mounting file systems
The file system type is not a very useful holder as it doesn't allow us
to go back to the actual file system instance.  Pass the super_block instead
which is useful when passed back to the file system driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Message-Id: <20230802154131.2221419-7-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-11 13:15:03 +02:00
Christoph Hellwig
4b41828be2 ext4: make the IS_EXT2_SB/IS_EXT3_SB checks more robust
Check for sb->s_type which is the right place to look at the file system
type, not the holder, which is just an implementation detail in the VFS
helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Message-Id: <20230802154131.2221419-6-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10 10:34:57 +02:00
Christoph Hellwig
c1e012ea9e nilfs2: use setup_bdev_super to de-duplicate the mount code
Use the generic setup_bdev_super helper to open the main block device
and do various bits of superblock setup instead of duplicating the
logic.  This includes moving to the new scheme implemented in common
code that only opens the block device after the superblock has allocated.

It does not yet convert nilfs2 to the new mount API, but doing so will
become a bit simpler after this first step.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Message-Id: <20230802154131.2221419-3-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10 10:34:56 +02:00
Christoph Hellwig
cf6da236c2 fs: export setup_bdev_super
We'll want to use setup_bdev_super instead of duplicating it in nilfs2.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Message-Id: <20230802154131.2221419-2-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10 10:34:56 +02:00
Jan Kara
aca740cecb fs: open block device after superblock creation
Currently get_tree_bdev and mount_bdev open the block device before
committing to allocating a super block. That creates problems for
restricting the number of writers to a device, and also leads to a
unusual and not very helpful holder (the fs_type).

Reorganize the super block code to first look whether the superblock for
a particular device does already exist and open the block device only if
it doesn't.

[hch: port to before the bdev_handle changes,
      duplicate the bdev read-only check from blkdev_get_by_path,
      extend the fsfree_mutex coverage to protect against freezes,
      fix an open bdev leak when the bdev is frozen,
      use the bdev local variable more,
      rename the s variable to sb to be more descriptive]
[brauner: remove references to mounts as they're mostly irrelevant]
[brauner & hch: fold fixes for romfs and cramfs for
                syzbot+2faac0423fdc9692822b@syzkaller.appspotmail.com]

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Message-Id: <20230724175145.201318-1-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10 10:34:56 +02:00
Christoph Hellwig
a4f64a300a ntfs3: free the sbi in ->kill_sb
As a rule of thumb everything allocated to the fs_context and moved into
the super_block should be freed by ->kill_sb so that the teardown
handling doesn't need to be duplicated between the fill_super error
path and put_super.  Implement an ntfs3-specific kill_sb method to do
that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Message-Id: <20230809220545.1308228-14-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10 10:34:56 +02:00
Christoph Hellwig
5f0fb2210b ntfs3: don't call sync_blockdev in ntfs_put_super
kill_block_super will call sync_blockdev just a tad later already.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Message-Id: <20230809220545.1308228-13-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10 10:34:56 +02:00
Christoph Hellwig
126dbf8a1b ntfs3: rename put_ntfs ntfs3_free_sbi
put_ntfs is a rather unconventional name for a function that frees the
sbi and associated resources.  Give it a more descriptive name and drop
the duplicate name in the top of the function comment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Message-Id: <20230809220545.1308228-12-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10 10:34:55 +02:00
Christoph Hellwig
4abc9a43d9 exfat: free the sbi and iocharset in ->kill_sb
As a rule of thumb everything allocated to the fs_context and moved into
the super_block should be freed by ->kill_sb so that the teardown
handling doesn't need to be duplicated between the fill_super error
path and put_super.  Implement an exfat-specific kill_sb method to do
that and share the code with the mount contex free helper for the
mount error handling case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Message-Id: <20230809220545.1308228-11-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10 10:34:55 +02:00
Christoph Hellwig
c934dc927e exfat: don't RCU-free the sbi
There are no RCU critical sections for accessing any information in the
sbi, so drop the call_rcu indirection for freeing the sbi.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Message-Id: <20230809220545.1308228-10-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10 10:34:55 +02:00
Christoph Hellwig
1489dffd51 ext4: close the external journal device in ->kill_sb
blkdev_put must not be called under sb->s_umount to avoid a lock order
reversal with disk->open_mutex.  Move closing the external journal device
into ->kill_sb to archive that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Message-Id: <20230809220545.1308228-9-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10 10:34:55 +02:00
Christoph Hellwig
1a0a5dad67 xfs: document the invalidate_bdev call in invalidate_bdev
Copy and paste the commit message from Darrick into a comment to explain
the seemingly odd invalidate_bdev in xfs_shutdown_devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Message-Id: <20230809220545.1308228-8-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10 10:34:55 +02:00
Christoph Hellwig
35a93b148b xfs: close the external block devices in xfs_mount_free
blkdev_put must not be called under sb->s_umount to avoid a lock order
reversal with disk->open_mutex.  Move closing the buftargs into ->kill_sb
to archive that.  Note that the flushing of the disk caches and
block device mapping invalidated needs to stay in ->put_super as the main
block device is closed in kill_block_super already.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Message-Id: <20230809220545.1308228-7-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10 10:34:54 +02:00
Christoph Hellwig
41233576e9 xfs: close the RT and log block devices in xfs_free_buftarg
Closing the block devices logically belongs into xfs_free_buftarg,  So
instead of open coding it in the caller move it there and add a check
for the s_bdev so that the main device isn't close as that's done by the
VFS helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Message-Id: <20230809220545.1308228-6-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10 10:34:54 +02:00
Christoph Hellwig
d3ef7e94ee xfs: remove xfs_blkdev_put
There isn't much use for this trivial wrapper, especially as the NULL
check is only needed in a single call site.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Message-Id: <20230809220545.1308228-5-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10 10:34:54 +02:00
Christoph Hellwig
2a9311adb8 xfs: free the xfs_mount in ->kill_sb
As a rule of thumb everything allocated to the fs_context and moved into
the super_block should be freed by ->kill_sb so that the teardown
handling doesn't need to be duplicated between the fill_super error
path and put_super.  Implement a XFS-specific kill_sb method to do that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Message-Id: <20230809220545.1308228-4-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10 10:34:54 +02:00
Christoph Hellwig
1aa2d074d4 xfs: remove a superfluous s_fs_info NULL check in xfs_fs_put_super
->put_super is only called when sb->s_root is set, and thus when
fill_super succeeds.  Thus drop the NULL check that can't happen in
xfs_fs_put_super.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Message-Id: <20230809220545.1308228-3-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10 10:34:54 +02:00
Christoph Hellwig
dbbff48906 xfs: reformat the xfs_fs_free prototype
The xfs_fs_free prototype formatting is a weird mix of the classic XFS
style and the Linux style.  Fix it up to be consistent.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Message-Id: <20230809220545.1308228-2-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10 10:34:53 +02:00
Christoph Hellwig
6a32073955 fs, block: remove bdev->bd_super
bdev->bd_super is unused now, remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Message-Id: <20230807112625.652089-5-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09 08:21:22 +02:00
Christoph Hellwig
8887b94d93 ocfs2: stop using bdev->bd_super for journal error logging
All ocfs2 journal error handling and logging is based on buffer_heads,
and the owning inode and thus super_block can be retrieved through
bh->b_assoc_map->host.  Switch to using that to remove the last users
of bdev->bd_super.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Message-Id: <20230807112625.652089-4-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09 08:16:56 +02:00
Christoph Hellwig
01efe93a5a ext4: don't use bdev->bd_super in __ext4_journal_get_write_access
__ext4_journal_get_write_access already has a super_block available,
and there is no need to go from that to the bdev to go back to the
owning super_block.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Message-Id: <20230807112625.652089-3-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09 08:16:55 +02:00
Christoph Hellwig
4b2201dad2 fs: stop using bdev->bd_super in mark_buffer_write_io_error
bdev->bd_super is a somewhat awkward backpointer from a block device to
an owning file system with unclear rules.

For the buffer_head code we already have a good backpointer for the
inode that the buffer_head is associated with, even if it lives on the
block device mapping: b_assoc_map. It is used track dirty buffers
associated with an inode but living on the block device mapping like
directory buffers in ext4.

mark_buffer_write_io_error already uses it for the call to
mapping_set_error, and should be doing the same for the per-sb error
sequence.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Message-Id: <20230807112625.652089-2-hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09 08:16:55 +02:00
Darrick J. Wong
ce85a1e046 xfs: stabilize fs summary counters for online fsck
If the fscounters scrubber notices incorrect summary counters, it's
entirely possible that scrub is simply racing with other threads that
are updating the incore counters.  There isn't a good way to stabilize
percpu counters or set ourselves up to observe live updates with hooks
like we do for the quotacheck or nlinks scanners, so we instead choose
to freeze the filesystem long enough to walk the incore per-AG
structures.

Past me thought that it was going to be commonplace to have to freeze
the filesystem to perform some kind of repair and set up a whole
separate infrastructure to freeze the filesystem in such a way that
userspace could not unfreeze while we were running.  This involved
adding a mutex and freeze_super/thaw_super functions and dealing with
the fact that the VFS freeze/thaw functions can free the VFS superblock
references on return.

This was all very overwrought, since fscounters turned out to be the
only user of scrub freezes, and it doesn't require the log to quiesce,
only the incore superblock counters.  We prevent other threads from
changing the freeze level by calling freeze_super_excl with a custom
freeze cookie to keep everyone else out of the filesystem.

The end result is that fscounters should be much more efficient.  When
we're checking a busy system and we can't stabilize the counters, the
custom freeze will do less work, which should result in less downtime.
Repair should be similarly speedy, but that's in a later patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-08-04 08:20:57 -07:00
Darrick J. Wong
59ba4fdd2d fs: wait for partially frozen filesystems
Jan Kara suggested that when one thread is in the middle of freezing a
filesystem, another thread trying to freeze the same fs but with a
different freeze_holder should wait until the freezer reaches either end
state (UNFROZEN or COMPLETE) instead of returning EBUSY immediately.

Neither caller can do anything sensible with this race other than retry
but they cannot really distinguish EBUSY as in "some other holder of the
same type has the sb already frozen" from "freezing raced with holder of
a different type".

Plumb in the extra code needed to wait for the fs freezer to reach an
end state and try the freeze again.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
2023-07-17 09:06:37 -07:00