porting: document superblock as block device holder

The holder of a block device is now the superblock rather than the
file_system_type, which has consequences for concurrent mounters.
Document this clearly and in detail so filesystem and vfs developers
have a proper digital paper trail.

Signed-off-by: Christian Brauner <brauner@kernel.org>
Christian Brauner 2023-09-15 16:01:40 +02:00
parent 2ba0dd6562
commit 060e6c7d17

@@ -975,3 +975,73 @@ was discarded due to initialization failure.
Since the new logic drops s_umount concurrent mounters could grab s_umount and
would spin. Instead they are now made to wait using an explicit wait-wake
mechanism without having to hold s_umount.
---
**mandatory**
The holder of a block device is now the superblock.
The holder of a block device used to be the file_system_type, which wasn't
particularly useful. It wasn't possible to go from block device to owning
superblock without matching on the device pointer stored in the superblock.
This mechanism would only work for a single device, so the block layer couldn't
find the owning superblock of any additional devices.
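The difference between the two lookup schemes can be sketched with a small userspace model. This is not kernel code: the demo_* types and functions are hypothetical stand-ins, with demo_bdev.holder playing the role of bd_holder and demo_sb.s_bdev the single device pointer stored in the superblock.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model, not kernel code: every demo_* name is hypothetical. */
struct demo_bdev {
	void *holder;			/* models bd_holder */
};

struct demo_sb {
	struct demo_bdev *s_bdev;	/* the one device pointer stored in the superblock */
	struct demo_sb *next;		/* next superblock on the list */
};

/* Old scheme: walk the superblocks and match on the stored device pointer. */
static struct demo_sb *demo_sb_by_match(struct demo_sb *list,
					struct demo_bdev *bdev)
{
	assert(bdev != NULL);
	for (struct demo_sb *sb = list; sb; sb = sb->next)
		if (sb->s_bdev == bdev)
			return sb;
	return NULL;	/* additional devices are never found this way */
}

/* New scheme: the holder of the device is the owning superblock. */
static struct demo_sb *demo_sb_by_holder(struct demo_bdev *bdev)
{
	assert(bdev != NULL);
	return bdev->holder;
}
```

In this model a second device (say, an external log device) claimed by the same superblock is found by demo_sb_by_holder() but not by demo_sb_by_match(), because only the main device is recorded in s_bdev.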
In the old mechanism, reusing or creating a superblock for a racing mount(2) and
umount(2) relied on the file_system_type as the holder. This was severely
underdocumented, however:
(1) Any concurrent mounter that managed to grab an active reference on an
existing superblock was made to wait until the superblock either became
ready or until the superblock was removed from the list of superblocks of
the filesystem type. If the superblock was ready the caller would simply
reuse it.
(2) If the mounter came after deactivate_locked_super() but before
the superblock had been removed from the list of superblocks of the
filesystem type the mounter would wait until the superblock was shut down,
reuse the block device and allocate a new superblock.
(3) If the mounter came after deactivate_locked_super() and after
the superblock had been removed from the list of superblocks of the
filesystem type the mounter would reuse the block device and allocate a new
superblock (the bd_holder pointer may still be set to the filesystem type).
Because the holder of the block device was the file_system_type, any concurrent
mounter could open the block devices of any superblock of the same
file_system_type without risking seeing EBUSY because the block device was
still in use by another superblock.
Making the superblock the owner of the block device changes this as the holder
is now a unique superblock and thus block devices associated with it cannot be
reused by concurrent mounters. So a concurrent mounter in (2) could suddenly
see EBUSY when trying to open a block device whose holder was a different
superblock.
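The change in claim semantics can be illustrated with a toy userspace model. This is a sketch under stated assumptions, not the block layer's implementation: toy_bdev, toy_claim() and toy_release() are hypothetical names, and the only rule modelled is that a device with a holder can be claimed again by that same holder but not by a different one.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy userspace model, not kernel code: every toy_* name is hypothetical. */
struct toy_bdev {
	const void *holder;	/* current holder, NULL if unclaimed */
	int claims;		/* number of claims taken by that holder */
};

/* Claim the device for @holder; a different holder gets -EBUSY. */
static int toy_claim(struct toy_bdev *bdev, const void *holder)
{
	assert(holder != NULL);
	if (bdev->holder && bdev->holder != holder)
		return -EBUSY;
	bdev->holder = holder;
	bdev->claims++;
	return 0;
}

/* Drop one claim; the last release clears the holder. */
static void toy_release(struct toy_bdev *bdev, const void *holder)
{
	assert(bdev->holder == holder && bdev->claims > 0);
	if (--bdev->claims == 0)
		bdev->holder = NULL;
}
```

Passing the same filesystem-type pointer from two mounters always succeeds in this model, which mirrors the old behaviour. Passing two distinct superblock pointers makes the second toy_claim() return -EBUSY until the first superblock has released the device.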
The new logic thus waits until the superblock and the devices are shut down in
->kill_sb(). Removal of the superblock from the list of superblocks of the
filesystem type is now moved to a later point when the devices are closed:
(1) Any concurrent mounter managing to grab an active reference on an existing
superblock is made to wait until the superblock is either ready or until
the superblock and all devices are shut down in ->kill_sb(). If the
superblock is ready the caller will simply reuse it.
(2) If the mounter comes after deactivate_locked_super() but before
the superblock has been removed from the list of superblocks of the
filesystem type the mounter is made to wait until the superblock and the
devices are shut down in ->kill_sb() and the superblock is removed from the
list of superblocks of the filesystem type. The mounter will allocate a new
superblock and grab ownership of the block device (the bd_holder pointer of
the block device will be set to the newly allocated superblock).
(3) This case is now collapsed into (2) as the superblock is left on the list
of superblocks of the filesystem type until all devices are shut down in
->kill_sb(). In other words, if the superblock isn't on the list of
superblocks of the filesystem type anymore then it has given up ownership of
all associated block devices (the bd_holder pointer is NULL).
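The decision a concurrent mounter faces under the new rules can be summarised as a small state model. This is a userspace sketch only: the demo_super fields and function names are hypothetical and do not correspond to actual VFS fields; they merely encode the three cases described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, not kernel code: all names here are hypothetical. */
enum demo_action { DEMO_REUSE, DEMO_WAIT, DEMO_ALLOCATE_NEW };

struct demo_super {
	bool born;		/* setup finished, superblock is ready */
	bool dying;		/* teardown started via deactivate_locked_super() */
	bool on_fs_list;	/* still on the filesystem type's superblock list */
	bool holds_devices;	/* still the holder of its block devices */
};

/* What a concurrent mounter does when it finds @sb. */
static enum demo_action demo_mounter_step(const struct demo_super *sb)
{
	assert(sb != NULL);
	if (!sb->on_fs_list)
		return DEMO_ALLOCATE_NEW;	/* off the list: devices were given up */
	if (sb->dying)
		return DEMO_WAIT;		/* wait for ->kill_sb() to finish */
	if (sb->born)
		return DEMO_REUSE;		/* case (1): ready, reuse it */
	return DEMO_WAIT;			/* still being set up */
}

/* Models the end of ->kill_sb(): close devices first, then leave the list. */
static void demo_kill_sb_finish(struct demo_super *sb)
{
	assert(sb != NULL);
	sb->holds_devices = false;
	sb->on_fs_list = false;
}
```

The ordering in demo_kill_sb_finish() is the point of the model: a superblock that is no longer on the list can never still hold a device, so a mounter that allocates a new superblock does not race with the old holder.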
As this is a VFS-level change, it has no practical consequences for filesystems
other than that all of them must use one of the provided kill_litter_super(),
kill_anon_super(), or kill_block_super() helpers.