module: warn about excessively long module waits

Russell King reported that the arm cbc(aes) crypto module hangs when
loaded, and Herbert Xu bisected it to commit 9b9879fc03 ("modules:
catch concurrent module loads, treat them as idempotent"), and noted:

 "So what's happening here is that the first modprobe tries to load a
  fallback CBC implementation, in doing so it triggers a load of the
  exact same module due to module aliases.

  IOW we're loading aes-arm-bs which provides cbc(aes). However, this
  needs a fallback of cbc(aes) to operate, which is made out of the
  generic cbc module + any implementation of aes, or ecb(aes). The
  latter happens to also be provided by aes-arm-cb so that's why it
  tries to load the same module again"

So loading the aes-arm-bs module ends up wanting to recursively load
itself, and the recursive load then ends up waiting for the original
module load to complete.

This is a regression, in that it used to be that we just tried to load
the module multiple times, and then as we went on to install it the
second time we would instead just error out because the module name
already existed.

That is actually also exactly what the original "catch concurrent loads"
patch did in commit 9828ed3f69 ("module: error out early on concurrent
load of the same module file"), but it turns out that it ends up being
racy, in that erroring out before the module has been fully initialized
will cause failures in dependent module loading.

See commit ac2263b588 (which was the revert of that "error out early")
commit for details about why erroring out before the module has been
initialized is actually fundamentally racy.

Now, for the actual recursive module load (as opposed to just
concurrently loading the same module twice), the race is not an issue.

At the same time it's hard for the kernel to see that this is recursion,
because the module load is always done from a usermode helper, so the
recursion is not some simple callchain within the kernel.

End result: this is not the real fix, but this at least adds a warning
for the situation (admittedly much too late for all the debugging pain
that Russell and Herbert went through) and if we can come to a
resolution on how to detect the recursion properly, this re-organizes
the code to make that easier.

Link: https://lore.kernel.org/all/ZrFHLqvFqhzykuYw@shell.armlinux.org.uk/
Reported-by: Russell King <linux@armlinux.org.uk>
Debugged-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This commit is contained in:
Linus Torvalds 2024-08-08 12:29:40 -07:00
parent cf6d429eb6
commit cb5b81bc9a

View File

@ -3183,15 +3183,28 @@ static int idempotent_init_module(struct file *f, const char __user * uargs, int
if (!f || !(f->f_mode & FMODE_READ))
return -EBADF;
/* See if somebody else is doing the operation? */
if (idempotent(&idem, file_inode(f))) {
wait_for_completion(&idem.complete);
return idem.ret;
/* Are we the winners of the race and get to do this? */
if (!idempotent(&idem, file_inode(f))) {
int ret = init_module_from_file(f, uargs, flags);
return idempotent_complete(&idem, ret);
}
/* Otherwise, we'll do it and complete others */
return idempotent_complete(&idem,
init_module_from_file(f, uargs, flags));
/*
* Somebody else won the race and is loading the module.
*
* We have to wait for it forever, since our 'idem' is
* on the stack and the list entry stays there until
* completed (but we could fix it under the idem_lock)
*
* It's also unclear what a real timeout might be,
* but we could maybe at least make this killable
* and remove the idem entry in that case?
*/
for (;;) {
if (wait_for_completion_timeout(&idem.complete, 10*HZ))
return idem.ret;
pr_warn_once("module '%pD' taking a long time to load", f);
}
}
SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)