This patch reverts two TOMOYO patches that were merged into Linus' tree
during the v6.12 merge window:
8b985bbfab ("tomoyo: allow building as a loadable LSM module")
268225a1de ("tomoyo: preparation step for building as a loadable LSM module")
Together these two patches introduced the CONFIG_SECURITY_TOMOYO_LKM
Kconfig build option which enabled a TOMOYO specific dynamic LSM loading
mechanism (see the original commits for more details). Unfortunately,
this approach was widely rejected by the LSM community as well as some
members of the general kernel community. Objections included concerns
over setting a bad precedent regarding individual LSMs managing their
LSM callback registrations as well as general kernel symbol exporting
practices. With little to no support for the CONFIG_SECURITY_TOMOYO_LKM
approach outside of Tetsuo, and multiple objections, we need to revert
these changes.
Link: https://lore.kernel.org/all/0c4b443a-9c72-4800-97e8-a3816b6a9ae2@I-love.SAKURA.ne.jp
Link: https://lore.kernel.org/all/CAHC9VhR=QjdoHG3wJgHFJkKYBg7vkQH2MpffgVzQ0tAByo_wRg@mail.gmail.com
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
This tool is only used in security/selinux/Makefile.
Move it to security/selinux/ so that 'make clean' can clean it up.
Please note 'make clean' does not clean scripts/ because tools under
scripts/ are often used for external module builds. Obviously, genheaders
is not the case here.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The header, security/selinux/include/classmap.h, is included not only
from kernel space but also from host programs.
It includes <linux/capability.h> and <linux/socket.h>, which pull in
more <linux/*.h> headers. This makes the host programs less portable,
specifically causing build errors on macOS.
Those headers are included for the following purposes:
- <linux/capability.h> for checking CAP_LAST_CAP
- <linux/socket.h> for checking PF_MAX
These checks can be guarded by __KERNEL__ so they are skipped when
building host programs. Testing them when building the kernel should
be sufficient.
The header, security/selinux/include/initial_sid_to_string.h, includes
<linux/stddef.h> for the NULL definition, but this is not portable
either. Instead, <stddef.h> should be included for host programs.
Reported-by: Daniel Gomez <da.gomez@samsung.com>
Closes: https://lore.kernel.org/lkml/20240807-macos-build-support-v1-6-4cd1ded85694@samsung.com/
Closes: https://lore.kernel.org/lkml/20240807-macos-build-support-v1-7-4cd1ded85694@samsung.com/
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
MODVERSIONS recently grew a dependency on !COMPILE_TEST so that Rust
could be more easily tested. However, this introduces a Kconfig warning
when building allmodconfig with a clang version that supports RANDSTRUCT
natively because RANDSTRUCT_FULL and RANDSTRUCT_PERFORMANCE select
MODVERSIONS when MODULES is enabled, bypassing the !COMPILE_TEST
dependency:
WARNING: unmet direct dependencies detected for MODVERSIONS
Depends on [n]: MODULES [=y] && !COMPILE_TEST [=y]
Selected by [y]:
- RANDSTRUCT_FULL [=y] && (CC_HAS_RANDSTRUCT [=y] || GCC_PLUGINS [=n]) && MODULES [=y]
Add the !COMPILE_TEST dependency to the selections to clear up the
warning.
Fixes: 1f9c4a9967 ("Kbuild: make MODVERSIONS support depend on not being a compile test build")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20240928-fix-randstruct-modversions-kconfig-warning-v1-1-27d3edc8571e@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
TOMOYO is useful as an analysis tool for learning how a Linux system works.
My boss was hoping that SELinux's policy is generated from what TOMOYO has
observed. A translated paper describing it is available at
https://master.dl.sourceforge.net/project/tomoyo/docs/nsf2003-en.pdf/nsf2003-en.pdf?viasf=1 .
Although that attempt failed due to mapping problem between inode and pathname,
TOMOYO remains as an access restriction tool due to ability to write custom
policy by individuals.
I was delivering pure LKM version of TOMOYO (named AKARI) to users who
cannot afford rebuilding their distro kernels with TOMOYO enabled. But
since the LSM framework was converted to static calls, it became more
difficult to deliver AKARI to such users. Therefore, I decided to update
TOMOYO so that people can use mostly LKM version of TOMOYO with minimal
burden for both distributors and users.
Tetsuo Handa (3):
tomoyo: preparation step for building as a loadable LSM module
tomoyo: allow building as a loadable LSM module
tomoyo: fallback to realpath if symlink's pathname does not exist
security/tomoyo/Kconfig | 15 ++++++++
security/tomoyo/Makefile | 10 ++++-
security/tomoyo/common.c | 14 ++++++-
security/tomoyo/common.h | 72 ++++++++++++++++++++++++++++++++++++++
security/tomoyo/domain.c | 9 +++-
security/tomoyo/gc.c | 3 +
security/tomoyo/hooks.h | 110 -----------------------------------------------------------
security/tomoyo/init.c | 366 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
security/tomoyo/load_policy.c | 12 ++++++
security/tomoyo/proxy.c | 82 ++++++++++++++++++++++++++++++++++++++++++++
security/tomoyo/securityfs_if.c | 12 ++++--
security/tomoyo/util.c | 3 -
12 files changed, 585 insertions(+), 123 deletions(-)
-----BEGIN PGP SIGNATURE-----
iQJXBAABCABBFiEEQ8gzaWI9etOpbC/HQl8SjQxk9SoFAmb2iQUjHHBlbmd1aW4t
a2VybmVsQGktbG92ZS5zYWt1cmEubmUuanAACgkQQl8SjQxk9Soxaw/+N3CEsVbs
npXTNsj515QWSd66Eg8UkAeE2YmwTnxQ5ZRR83SCHf65MVhhkwIKZvFy8VcnJnBu
Xsl9p2Vme8klopTUXmjABzba2OyfU4qwHCBv6dJehTgKNUW1NVCAgliS0C20k//j
63iGAWJ1cjQl6fbMXFS/JW/KR49EJLMKtybVdH0ldwo+fk3YivQCalxzB9FlnqKA
LrhLI2BPq3CMDlC64572XiGQrLY31PR7WDwAUux4CYjE/cMubLpRzL/NjgGO2DJs
V6fXwqRqdywZhM2ltzbWiSz5US0hvWO0E5Ql3dKX180LLlVGlhs0eCiuZGwPUY+j
vjZEB5ygjOI4SO4AfWxJgqprjDAEPzfyo7+txENwW8q4c/li9woY9B8ma7S1VMGQ
6vKx2N8kw0bCZi6FkBu6ymc4sQJcG1acWJT/1IEbcfHOHDAEbmPeHAtvJsn/JPLN
yiKCfCXM7ecNuCyRlMvDssIK3qRh+E2cFC0WVUQc9MEEkCYneGqU1R2lUKZHqEsN
94kdhjiUroQZjcQVrtJwDv5Baf2OgMw7DGuhUi0CwKm8Hs7NH5lZ+i+xtvnZCdG7
GoF62GKGEc91WN9OzPmvWRAKZkr+KrfNvVm/GJE2P6l0MaCgyDztrn7xduuSD/Rr
sazREZUXqmqltyzgAisth4/BPkqvhCSPMEM=
=7347
-----END PGP SIGNATURE-----
Merge tag 'tomoyo-pr-20240927' of git://git.code.sf.net/p/tomoyo/tomoyo
Pull tomoyo updates from Tetsuo Handa:
"One bugfix patch, one preparation patch, and one conversion patch.
TOMOYO is useful as an analysis tool for learning how a Linux system
works. My boss was hoping that SELinux's policy is generated from what
TOMOYO has observed. A translated paper describing it is available at
https://master.dl.sourceforge.net/project/tomoyo/docs/nsf2003-en.pdf/nsf2003-en.pdf?viasf=1
Although that attempt failed due to mapping problem between inode and
pathname, TOMOYO remains as an access restriction tool due to ability
to write custom policy by individuals.
I was delivering pure LKM version of TOMOYO (named AKARI) to users who
cannot afford rebuilding their distro kernels with TOMOYO enabled. But
since the LSM framework was converted to static calls, it became more
difficult to deliver AKARI to such users. Therefore, I decided to
update TOMOYO so that people can use mostly LKM version of TOMOYO with
minimal burden for both distributors and users"
* tag 'tomoyo-pr-20240927' of git://git.code.sf.net/p/tomoyo/tomoyo:
tomoyo: fallback to realpath if symlink's pathname does not exist
tomoyo: allow building as a loadable LSM module
tomoyo: preparation step for building as a loadable LSM module
Alfred Agrell found that TOMOYO cannot handle execveat(AT_EMPTY_PATH)
inside chroot environment where /dev and /proc are not mounted, for
commit 51f39a1f0c ("syscalls: implement execveat() system call") missed
that TOMOYO tries to canonicalize argv[0] when the filename fed to the
executed program as argv[0] is supplied using potentially nonexistent
pathname.
Since "/dev/fd/<fd>" already lost symlink information used for obtaining
that <fd>, it is too late to reconstruct symlink's pathname. Although
<filename> part of "/dev/fd/<fd>/<filename>" might not be canonicalized,
TOMOYO cannot use tomoyo_realpath_nofollow() when /dev or /proc is not
mounted. Therefore, fallback to tomoyo_realpath_from_path() when
tomoyo_realpath_nofollow() failed.
Reported-by: Alfred Agrell <blubban@gmail.com>
Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1082001
Fixes: 51f39a1f0c ("syscalls: implement execveat() system call")
Cc: stable@vger.kernel.org # v3.19+
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
-----BEGIN PGP SIGNATURE-----
iQIyBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmbyniwACgkQ6rmadz2v
bTqE0w/2J8TJWfR+1Z0Bf2Nzt3kFd/wLNn6FpWsq+z0/pzoP5AzborvmLzNiZmeh
0vJFieOL7pV4+NcaIHBPqfW1eMsXu+BlrtkHGLLYiCPJUr8o5jU9SrVKfF3arMZS
a6+zcX6EivX0MYWobZ2F7/8XF0nRQADxzInLazFmtJmLmOAyIch417KOg9ylwr3m
WVqhtCImUFyVz83XMFgbf2jXrvL9xD08iHN62GzcAioRF5LeJSPX0U/N15gWDqF7
V68F0PnvUf6/hkFvYVynhpMivE8u+8VXCHX+heZ8yUyf4ExV/+KSZrImupJ0WLeO
iX/qJ/9XP+g6ad9Olqpu6hmPi/6c6epQgbSOchpG04FGBGmJv1j9w4wnlHCgQDdB
i2oKHRtMKdqNZc0sOSfvw/KyxZXJuD1VQ9YgGVpZbHUbSZDoj7T40zWziUp8VgyR
nNtOmfJLDbtYlPV7/cQY5Ui4ccMJm6GzxxLBcqcMWxBu/90Ng0wTSubLbg3RHmWu
d9cCL6IprjJnliEUqC4k4gqZy6RJlHvQ8+NDllaW+4iPnz7B2WaUbwRX/oZ5yiYK
bLjWCWo+SzntVPAzTsmAYs2G47vWoALxo2NpNXLfmhJiWwfakJaQu7fwrDxsY11M
OgByiOzcbAcvkJzeVIDhfLVq5z49KF6k4D8Qu0uvXHDeC8Mraw==
=zzmh
-----END PGP SIGNATURE-----
Merge tag 'bpf-next-6.12-struct-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf 'struct fd' updates from Alexei Starovoitov:
"This includes struct_fd BPF changes from Al and Andrii"
* tag 'bpf-next-6.12-struct-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
bpf: convert bpf_token_create() to CLASS(fd, ...)
security,bpf: constify struct path in bpf_token_create() LSM hook
bpf: more trivial fdget() conversions
bpf: trivial conversions for fdget()
bpf: switch maps to CLASS(fd, ...)
bpf: factor out fetching bpf_map from FD and adding it to used_maps list
bpf: switch fdget_raw() uses to CLASS(fd_raw, ...)
bpf: convert __bpf_prog_get() to CLASS(fd, ...)
-----BEGIN PGP SIGNATURE-----
iIYEABYKAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCZvGpchAcbWljQGRpZ2lr
b2QubmV0AAoJEOXj0OiMgvbSTzMBAIpcYKf75IyC4DXqiXlko508YdyI2YfYeWdd
5yVZbSHgAP0aEFO4AOvJ26pPlGF+8zVIHq+HNAhrAalZBulxASePCA==
=nsAF
-----END PGP SIGNATURE-----
Merge tag 'landlock-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
"We can now scope a Landlock domain thanks to a new "scoped" field that
can deny interactions with resources outside of this domain.
The LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET flag denies connections to an
abstract UNIX socket created outside of the current scoped domain, and
the LANDLOCK_SCOPE_SIGNAL flag denies sending a signal to processes
outside of the current scoped domain.
These restrictions also apply to nested domains according to their
scope. The related changes will also be useful to support other kind
of IPC isolations"
* tag 'landlock-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
landlock: Document LANDLOCK_SCOPE_SIGNAL
samples/landlock: Add support for signal scoping
selftests/landlock: Test signal created by out-of-bound message
selftests/landlock: Test signal scoping for threads
selftests/landlock: Test signal scoping
landlock: Add signal scoping
landlock: Document LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET
samples/landlock: Add support for abstract UNIX socket scoping
selftests/landlock: Test inherited restriction of abstract UNIX socket
selftests/landlock: Test connected and unconnected datagram UNIX socket
selftests/landlock: Test UNIX sockets with any address formats
selftests/landlock: Test abstract UNIX socket scoping
selftests/landlock: Test handling of unknown scope
landlock: Add abstract UNIX socket scoping
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmbxyVEUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXObWA//dDTn1UMEE2zBE5oF46Yw6FDIipEc
TL7ulL6fXHKZnAGOfkNREKydkLddZVH+mG7AyJQL6A/06s3/tl3J6i8yLdYZ67iD
6khZzXvwTA41oLKNB/gVCF3xUUIcifnEqoCIRA9AFg7ck+W/gjtXbHD1xaWYjpqX
rAorbAu3YA1Rv+sOe2NWZ0EDUPUzfzBPJEZT27TSwCVoWED9r9BxMvQgdhijf0XZ
a0T8wk1RfAvP4+Cf8XPLUkrgu/x9OauLAdx/a48TeODxQ6KjcFUTUtujRsBduzq/
cnJEeXAJwD7YqbuoNmidwTul/RGZS3nsWhEr2i8JBVdWYSDACpahO1Ls3WtJuQt3
oCEQGwrXyPlL4LlcSmRjxL+PLc+MIihjWetIOqgujxKQe82rG+fJlu42zBxbmqnI
BglJ3Ps+kcHPdUh216NAiKwJXw00IsUsldCZpAe+ck7Tz3H1OhMtjKNa0H7nqYtn
dMV3ieIKj+PVLJTjSeoLSQ3lxx8JFdH7owV7zO++NLsX05dQx8LTUeqSzL6skUk2
ocn0ekBmH4GRSph2nUBsr5W575Zx2VKdGS8nS9d/TxXOzuwflOZpX81kAzwCX+Ru
VN9wVlM8qgFwoeK8SlaOD94Jsy7nAeaBu0/H3fYdB5TX1MnNTIOqTtZgxpotr2Gw
Z295YFAklGMv7zo=
=KDfa
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20240923' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull LSM fixes from Paul Moore:
- Add a missing security_mmap_file() check to the remap_file_pages()
syscall
- Properly reference the SELinux and Smack LSM blobs in the
security_watch_key() LSM hook
- Fix a random IPE selftest crash caused by a missing list terminator
in the test
* tag 'lsm-pr-20240923' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
ipe: Add missing terminator to list of unit tests
selinux,smack: properly reference the LSM blob in security_watch_key()
mm: call the security_mmap_file() LSM hook in remap_file_pages()
One of concerns for enabling TOMOYO in prebuilt kernels is that distributor
wants to avoid bloating kernel packages. Although boot-time kernel command
line options allows selecting built-in LSMs to enable, file size increase
of vmlinux and memory footprint increase of vmlinux caused by builtin-but-
not-enabled LSMs remains. If it becomes possible to make LSMs dynamically
appendable after boot using loadable kernel modules, these problems will
go away.
Another of concerns for enabling TOMOYO in prebuilt kernels is that who can
provide support when distributor cannot provide support. Due to "those who
compiled kernel code is expected to provide support for that kernel code"
spell, TOMOYO is failing to get enabled in Fedora distribution [1]. The
point of loadable kernel module is to share the workload. If it becomes
possible to make LSMs dynamically appendable after boot using loadable
kernel modules, as with people can use device drivers not supported by
distributors but provided by third party device vendors, we can break
this spell and can lower the barrier for using TOMOYO.
This patch is intended for demonstrating that there is nothing difficult
for supporting TOMOYO-like loadable LSM modules. For now we need to live
with a mixture of built-in part and loadable part because fully loadable
LSM modules are not supported since Linux 2.6.24 [2] and number of LSMs
which can reserve static call slots is determined at compile time in
Linux 6.12.
Major changes in this patch are described below.
There are no behavior changes as long as TOMOYO is built into vmlinux.
Add CONFIG_SECURITY_TOMOYO_LKM as "bool" instead of changing
CONFIG_SECURITY_TOMOYO from "bool" to "tristate", for something went
wrong with how Makefile is evaluated if I choose "tristate".
Add proxy.c for serving as a bridge between vmlinux and tomoyo.ko .
Move callback functions from init.c to proxy.c when building as a loadable
LSM module. init.c is built-in part and remains for reserving static call
slots. proxy.c contains module's init function and tells init.c location of
callback functions, making it possible to use static call for tomoyo.ko .
By deferring initialization of "struct tomoyo_task" until tomoyo.ko is
loaded, threads created between init.c reserved LSM hooks and proxy.c
updates LSM hooks will have NULL "struct tomoyo_task" instances. Assuming
that tomoyo.ko is loaded by the moment when the global init process starts,
initialize "struct tomoyo_task" instance for current thread as a kernel
thread when tomoyo_task(current) is called for the first time.
There is a hack for exporting currently not-exported functions.
This hack will be removed after all relevant functions are exported.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=542986 [1]
Link: https://lkml.kernel.org/r/caafb609-8bef-4840-a080-81537356fc60@I-love.SAKURA.ne.jp [2]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Add missing terminator to list of unit tests to avoid random crashes seen
when running the test.
Fixes: 10ca05a760 ("ipe: kunit test for parser")
Cc: Deven Bowers <deven.desai@linux.microsoft.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Fan Wu <wufan@linux.microsoft.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Fan Wu <wufan@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZvDNmgAKCRBZ7Krx/gZQ
63zrAP9vI0rf55v27twiabe9LnI7aSx5ckoqXxFIFxyT3dOYpQD/bPmoApnWDD3d
592+iDgLsema/H/0/CqfqlaNtDNY8Q0=
=HUl5
-----END PGP SIGNATURE-----
Merge tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull 'struct fd' updates from Al Viro:
"Just the 'struct fd' layout change, with conversion to accessor
helpers"
* tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
add struct fd constructors, get rid of __to_fd()
struct fd: representation change
introduce fd_file(), convert all accessors to it.
In order to allow Makefile to generate tomoyo.ko as output, rename
tomoyo.c to hooks.h and cut out LSM hook registration part that will be
built into vmlinux from hooks.h to init.c . Also, update comments and
relocate some variables. No behavior changes.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmbk/nIACgkQ6rmadz2v
bTqxuBAAnqW81Rr0nORIxeJMbyo4EiFuYHGk6u5BYP9NPzqHroUPCLVmSP7Hp/Ta
CJjsiZeivZsGa6Qlc3BCa4hHNpqP5WE1C/73svSDn7/99EfxdSBtirpMVFUPsUtn
DDb5chNpvnxKNS8Mw5Ty8wBrdbXHMlSx+IfaFHpv0Yn6EAcuF4UdoEUq2l3PqhfD
Il9Zm127eViPGAP+o+TBZFfW+rRw8d0ngqeRq2GvJ8ibNEDWss+GmBI1Dod7d+fC
dUDg96Ipdm1a5Xz7dnH80eXz9JHdpu6qhQrQMKKArnlpJElrKiOf9b17ZcJoPQOR
ZnstEnUyVnrWROZxUuKY72+2tx3TuSf+L9uZqFHNx3Ix5FIoS+tFbHf4b8SxtsOb
hb2X7SigdGqhQDxUT+IPeO5hsJlIvG1/VYxMXxgc++rh9DjL06hDLUSH1WBSU0fC
kFQ7HrcpAlVHtWmGbwwUyVjD+KC/qmZBTAnkcYT4C62WZVytSCnihIuSFAvV1tpZ
SSIhVPyQ599UoZIiQYihp0S4qP74FotCtErWSrThneh2Cl8kDsRq//lV1nj/PTV8
CpTvz4VCFDFTgthCfd62fP95EwW5K+aE3NjGTPW/9Hx/0+J/1tT+yqWsrToGaruf
TbrqtzQhpclz9UEqA+696cVAXNj9uRU4AoD3YIg72kVnRlkgYd0=
=MDwh
-----END PGP SIGNATURE-----
Merge tag 'bpf-next-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
- Introduce '__attribute__((bpf_fastcall))' for helpers and kfuncs with
corresponding support in LLVM.
It is similar to existing 'no_caller_saved_registers' attribute in
GCC/LLVM with a provision for backward compatibility. It allows
compilers generate more efficient BPF code assuming the verifier or
JITs will inline or partially inline a helper/kfunc with such
attribute. bpf_cast_to_kern_ctx, bpf_rdonly_cast,
bpf_get_smp_processor_id are the first set of such helpers.
- Harden and extend ELF build ID parsing logic.
When called from sleepable context the relevants parts of ELF file
will be read to find and fetch .note.gnu.build-id information. Also
harden the logic to avoid TOCTOU, overflow, out-of-bounds problems.
- Improvements and fixes for sched-ext:
- Allow passing BPF iterators as kfunc arguments
- Make the pointer returned from iter_next method trusted
- Fix x86 JIT convergence issue due to growing/shrinking conditional
jumps in variable length encoding
- BPF_LSM related:
- Introduce few VFS kfuncs and consolidate them in
fs/bpf_fs_kfuncs.c
- Enforce correct range of return values from certain LSM hooks
- Disallow attaching to other LSM hooks
- Prerequisite work for upcoming Qdisc in BPF:
- Allow kptrs in program provided structs
- Support for gen_epilogue in verifier_ops
- Important fixes:
- Fix uprobe multi pid filter check
- Fix bpf_strtol and bpf_strtoul helpers
- Track equal scalars history on per-instruction level
- Fix tailcall hierarchy on x86 and arm64
- Fix signed division overflow to prevent INT_MIN/-1 trap on x86
- Fix get kernel stack in BPF progs attached to tracepoint:syscall
- Selftests:
- Add uprobe bench/stress tool
- Generate file dependencies to drastically improve re-build time
- Match JIT-ed and BPF asm with __xlated/__jited keywords
- Convert older tests to test_progs framework
- Add support for RISC-V
- Few fixes when BPF programs are compiled with GCC-BPF backend
(support for GCC-BPF in BPF CI is ongoing in parallel)
- Add traffic monitor
- Enable cross compile and musl libc
* tag 'bpf-next-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (260 commits)
btf: require pahole 1.21+ for DEBUG_INFO_BTF with default DWARF version
btf: move pahole check in scripts/link-vmlinux.sh to lib/Kconfig.debug
btf: remove redundant CONFIG_BPF test in scripts/link-vmlinux.sh
bpf: Call the missed kfree() when there is no special field in btf
bpf: Call the missed btf_record_free() when map creation fails
selftests/bpf: Add a test case to write mtu result into .rodata
selftests/bpf: Add a test case to write strtol result into .rodata
selftests/bpf: Rename ARG_PTR_TO_LONG test description
selftests/bpf: Fix ARG_PTR_TO_LONG {half-,}uninitialized test
bpf: Zero former ARG_PTR_TO_{LONG,INT} args in case of error
bpf: Improve check_raw_mode_ok test for MEM_UNINIT-tagged types
bpf: Fix helper writes to read-only maps
bpf: Remove truncation test in bpf_strtol and bpf_strtoul helpers
bpf: Fix bpf_strtol and bpf_strtoul helpers for 32bit
selftests/bpf: Add tests for sdiv/smod overflow cases
bpf: Fix a sdiv overflow issue
libbpf: Add bpf_object__token_fd accessor
docs/bpf: Add missing BPF program types to docs
docs/bpf: Add constant values for linkages
bpf: Use fake pt_regs when doing bpf syscall tracepoint tracing
...
Unfortunately when we migrated the lifecycle management of the key LSM
blob to the LSM framework we forgot to convert the security_watch_key()
callbacks for SELinux and Smack. This patch corrects this by making use
of the selinux_key() and smack_key() helper functions respectively.
This patch also removes some input checking in the Smack callback as it
is no longer needed.
Fixes: 5f8d28f6d7 ("lsm: infrastructure management of the key security blob")
Reported-by: syzbot+044fdf24e96093584232@syzkaller.appspotmail.com
Tested-by: syzbot+044fdf24e96093584232@syzkaller.appspotmail.com
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
- rcu pointer assignment in smk_set_cipso
- indentation in smack_ip_output
-----BEGIN PGP SIGNATURE-----
iQJLBAABCAA1FiEEC+9tH1YyUwIQzUIeOKUVfIxDyBEFAmbk0uIXHGNhc2V5QHNj
aGF1Zmxlci1jYS5jb20ACgkQOKUVfIxDyBG60g//akcCh6Pu93lwer7vtbWRe2dL
88cziDLadaJbSlGl5G8GBUS6J0j5ZGRMrT2G6Ch7dnZyNUr3irwULKL3E5xAz62U
UIzp5EqOWX3fChO5H/uKysjm0JtUsTaJALsiZzqacYuC9PHFho8+PBAAB4ALHVoZ
0ar4IW5T7DfJk1rKVfy1z+ClPLQbQVHloS/0gkstHkBTlB3WaHOCvgmv03JxD1Nj
+TxLBuin1DYvm0eJtYkIvP8ODv6FRDGY+6ziwHBSCT3dLpOQDgXozGzO48xa6iQa
l8UMq9HbzYAd7/bbHiySS99mZ38Yq9E4Tg1SXD9WlgmtBhP6KkYYSzFp8o2lee8x
NkR8lOE7Xp8b4lqVqk6HAwpQOfUErXc6BnZ7CHTX5cWmsFI4X9kJQCxJ0CgfnG+W
LMByLa3x6V3BnFDG4nP6na6jKBd5iQiika0+VermDrIG/BZ4O4dJMMhwYW+AeA0w
45zKMucUckUA+AOMcbqFCh5EL8C0zWM2PQY8OlfW5oRIzZSeOiyo5/ourtE6ccx0
TZKrqowhuzMqXlVW6891+0EphMIzn3au1MRoPw2t3Xr/BodtgMV9MB6VBfpJ/nax
rxBq2YsH9S18Wb+wo7HRekDc2AIbFwG5VvqY8dV4LdFnUbJ+5tq+q+6Yqn648m9K
Nkx7UVSXwOImUt0O4KI=
=V/iJ
-----END PGP SIGNATURE-----
Merge tag 'Smack-for-6.12' of https://github.com/cschaufler/smack-next
Pull smack updates from Casey Schaufler:
"Two patches: one is a simple indentation correction, the other
corrects a potentially rcu unsafe pointer assignment"
* tag 'Smack-for-6.12' of https://github.com/cschaufler/smack-next:
smackfs: Use rcu_assign_pointer() to ensure safe assignment in smk_set_cipso
security: smack: Fix indentation in smack_netfilter.c
Currently, a sandbox process is not restricted to sending a signal (e.g.
SIGKILL) to a process outside the sandbox environment. The ability to
send a signal for a sandboxed process should be scoped the same way
abstract UNIX sockets are scoped. Therefore, we extend the "scoped"
field in a ruleset with LANDLOCK_SCOPE_SIGNAL to specify that a ruleset
will deny sending any signal from within a sandbox process to its parent
(i.e. any parent sandbox or non-sandboxed processes).
This patch adds file_set_fowner and file_free_security hooks to set and
release a pointer to the file owner's domain. This pointer, fown_domain
in landlock_file_security will be used in file_send_sigiotask to check
if the process can send a signal.
The ruleset_with_unknown_scope test is updated to support
LANDLOCK_SCOPE_SIGNAL.
This depends on two new changes:
- commit 1934b21261 ("file: reclaim 24 bytes from f_owner"): replace
container_of(fown, struct file, f_owner) with fown->file .
- commit 26f204380a ("fs: Fix file_set_fowner LSM hook
inconsistencies"): lock before calling the hook.
Signed-off-by: Tahera Fahimi <fahimitahera@gmail.com>
Closes: https://github.com/landlock-lsm/linux/issues/8
Link: https://lore.kernel.org/r/df2b4f880a2ed3042992689a793ea0951f6798a5.1725657727.git.fahimitahera@gmail.com
[mic: Update landlock_get_current_domain()'s return type, improve and
fix locking in hook_file_set_fowner(), simplify and fix sleepable call
and locking issue in hook_file_send_sigiotask() and rebase on the latest
VFS tree, simplify hook_task_kill() and quickly return when not
sandboxed, improve comments, rename LANDLOCK_SCOPED_SIGNAL]
Co-developed-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Introduce a new "scoped" member to landlock_ruleset_attr that can
specify LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET to restrict connection to
abstract UNIX sockets from a process outside of the socket's domain.
Two hooks are implemented to enforce these restrictions:
unix_stream_connect and unix_may_send.
Closes: https://github.com/landlock-lsm/linux/issues/7
Signed-off-by: Tahera Fahimi <fahimitahera@gmail.com>
Link: https://lore.kernel.org/r/5f7ad85243b78427242275b93481cfc7c127764b.1725494372.git.fahimitahera@gmail.com
[mic: Fix commit message formatting, improve documentation, simplify
hook_unix_may_send(), and cosmetic fixes including rename of
LANDLOCK_SCOPED_ABSTRACT_UNIX_SOCKET]
Co-developed-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmbiGGAUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPU8BAA1+A15pmS34I9pq7c8TmRz3rNEs/a
zrW1aWJ0X/+axNS7sW3Pwtt1EKuaOhskKU8gNSieRhljC8rgXIVjZzLw6Atgcr5k
upulGbU9TXyVisYN+PWv9/84ito6/nYsKb7Mg3nUVsdodtIFVnsk1fxYLPHQEBig
Pl3i26U3VqH93Kz0W5vs/QR2uduPB8ZyscdTgcbrY9Vv1Y7IDZ2g9QsJVKLvbQKL
qcPK1JkHa+sBPJxDqS9A40zgbLbdPQgWQzsXX3dz822w1Ga7FIHSqxMBA6HwHZ+L
kV4P58wVfavhwt/cQSKMWI/yiGPMMd0B6yD+m8ojOvGfOfRCWxGMmEMqHNuZ3m7k
Bfll5ZgZTY8phUUhiNf3nxO3F3MM/5bHdhPOj3RReqbAbS6uWr4/fThPDYY/zIo6
NCY3HGxx3Ae64uQ01gC2p/czC50jDsMwlbXiZbrgdBhjBm/CVk5ozb80mLVcGrLB
+6XMzzSbC8IaNAH2fDmUJ2ABdwyNPgsSOTGZVzIanpxu1SU2/yk3SMxkp8fv5s36
wLeODUVcLgsjVV538Mkm6PGTE4TlXaH9yi6apMyJAGp0vPYx5c3Xxk2y5A5cur5p
hcrbDiX2QgeqFbwsz36incmPmbef2NU2c8feR8XLtPJuwNIeRcMSje0pnkaFlRmb
TAUJ1sDQAzZ8Fy0=
=HIAO
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
- Move the LSM framework to static calls
This transitions the vast majority of the LSM callbacks into static
calls. Those callbacks which haven't been converted were left as-is
due to the general ugliness of the changes required to support the
static call conversion; we can revisit those callbacks at a future
date.
- Add the Integrity Policy Enforcement (IPE) LSM
This adds a new LSM, Integrity Policy Enforcement (IPE). There is
plenty of documentation about IPE in this patches, so I'll refrain
from going into too much detail here, but the basic motivation behind
IPE is to provide a mechanism such that administrators can restrict
execution to only those binaries which come from integrity protected
storage, e.g. a dm-verity protected filesystem. You will notice that
IPE requires additional LSM hooks in the initramfs, dm-verity, and
fs-verity code, with the associated patches carrying ACK/review tags
from the associated maintainers. We couldn't find an obvious
maintainer for the initramfs code, but the IPE patchset has been
widely posted over several years.
Both Deven Bowers and Fan Wu have contributed to IPE's development
over the past several years, with Fan Wu agreeing to serve as the IPE
maintainer moving forward. Once IPE is accepted into your tree, I'll
start working with Fan to ensure he has the necessary accounts, keys,
etc. so that he can start submitting IPE pull requests to you
directly during the next merge window.
- Move the lifecycle management of the LSM blobs to the LSM framework
Management of the LSM blobs (the LSM state buffers attached to
various kernel structs, typically via a void pointer named "security"
or similar) has been mixed, some blobs were allocated/managed by
individual LSMs, others were managed by the LSM framework itself.
Starting with this pull we move management of all the LSM blobs,
minus the XFRM blob, into the framework itself, improving consistency
across LSMs, and reducing the amount of duplicated code across LSMs.
Due to some additional work required to migrate the XFRM blob, it has
been left as a todo item for a later date; from a practical
standpoint this omission should have little impact as only SELinux
provides a XFRM LSM implementation.
- Fix problems with the LSM's handling of F_SETOWN
The LSM hook for the fcntl(F_SETOWN) operation had a couple of
problems: it was racy with itself, and it was disconnected from the
associated DAC related logic in such a way that the LSM state could
be updated in cases where the DAC state would not. We fix both of
these problems by moving the security_file_set_fowner() hook into the
same section of code where the DAC attributes are updated. Not only
does this resolve the DAC/LSM synchronization issue, but as that code
block is protected by a lock, it also resolve the race condition.
- Fix potential problems with the security_inode_free() LSM hook
Due to use of RCU to protect inodes and the placement of the LSM hook
associated with freeing the inode, there is a bit of a challenge when
it comes to managing any LSM state associated with an inode. The VFS
folks are not open to relocating the LSM hook so we have to get
creative when it comes to releasing an inode's LSM state.
Traditionally we have used a single LSM callback within the hook that
is triggered when the inode is "marked for death", but not actually
released due to RCU.
Unfortunately, this causes problems for LSMs which want to take an
action when the inode's associated LSM state is actually released; so
we add an additional LSM callback, inode_free_security_rcu(), that is
called when the inode's LSM state is released in the RCU free
callback.
- Refactor two LSM hooks to better fit the LSM return value patterns
The vast majority of the LSM hooks follow the "return 0 on success,
negative values on failure" pattern, however, there are a small
handful that have unique return value behaviors which has caused
confusion in the past and makes it difficult for the BPF verifier to
properly vet BPF LSM programs. This includes patches to
convert two of these"special" LSM hooks to the common 0/-ERRNO pattern.
- Various cleanups and improvements
A handful of patches to remove redundant code, better leverage the
IS_ERR_OR_NULL() helper, add missing "static" markings, and do some
minor style fixups.
* tag 'lsm-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (40 commits)
security: Update file_set_fowner documentation
fs: Fix file_set_fowner LSM hook inconsistencies
lsm: Use IS_ERR_OR_NULL() helper function
lsm: remove LSM_COUNT and LSM_CONFIG_COUNT
ipe: Remove duplicated include in ipe.c
lsm: replace indirect LSM hook calls with static calls
lsm: count the LSMs enabled at compile time
kernel: Add helper macros for loop unrolling
init/main.c: Initialize early LSMs after arch code, static keys and calls.
MAINTAINERS: add IPE entry with Fan Wu as maintainer
documentation: add IPE documentation
ipe: kunit test for parser
scripts: add boot policy generation program
ipe: enable support for fs-verity as a trust provider
fsverity: expose verified fsverity built-in signatures to LSMs
lsm: add security_inode_setintegrity() hook
ipe: add support for dm-verity as a trust provider
dm-verity: expose root hash digest and signature data to LSMs
block,lsm: add LSM blob and new LSM hooks for block devices
ipe: add permissive toggle
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmbiGE0UHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMeZA/+KwrK8bHSm+y9USrYaI4S2biiomsb
GxNS6j0yIvg6uogWI2q8uTLXDdKMuJy88i7DHAMze+k6sSg8w6yEpFngFKeSAFpa
7X6iF/4EU2ZjwHnKRbL5r5DDGyGeKm+GxCmjkwx/Xo+Qfk85D0mzjcXYiXkwRa2h
DGdL34XztCfJNhJpPnnHDwh6OvVTY/c20g684D/7RMAXCkOq5r5SCfRK4SX1SpaT
ge9DEm1Oz7cC4zY0yUMby6ibBmCsfjIIO1aIXFgf1IHjKOIuMzESIG6YwphnU2zp
mI+7Zy6vvMd3dWDTxeMKqSsu43R3jkaclUnxyORmRD2noe7ehTvgPsQp31C9mmu1
JF+50TjkiONGkuWoYsCdRDAZnpA1GLU5cU0Y3ENDcXazV5xt9omXIek4En2MlV/S
DsXznvyaEJrAlZUBHZcJQwao394ZsPd+4nAelBTrbu+Ok2YD1p/GIv0va+lHIgZp
xUsRNbOs/24bxW0k6XXgv8nFhsiBuXctB4GF1x4Dw2rvUqYtSJEK7tpq5B3yWAPs
R57xKyELZrNTkf/2jcoCRQb9EODmhefYxYvN0fVgAKrzBbtVlOLltncKu3PYD8Vl
yQPLKlu2NaER3ipJqMIFMi+O945YWPB47pNbKFVQJmyneGgc7++It1fVmvnFqWlt
xP+p81tIM5E++Gw=
=VwaX
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
- Ensure that both IPv4 and IPv6 connections are properly initialized
While we always properly initialized IPv4 connections early in their
life, we missed the necessary IPv6 change when we were adding IPv6
support.
- Annotate the SELinux inode revalidation function to quiet KCSAN
KCSAN correctly identifies a race in __inode_security_revalidate()
when we check to see if an inode's SELinux has been properly
initialized. While KCSAN is correct, it is an intentional choice made
for performance reasons; if necessary, we check the state a second
time, this time with a lock held, before initializing the inode's
state.
- Code cleanups, simplification, etc.
A handful of individual patches to simplify some SELinux kernel
logic, improve return code granularity via ERR_PTR(), follow the
guidance on using KMEM_CACHE(), and correct some minor style
problems.
* tag 'selinux-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: fix style problems in security/selinux/include/audit.h
selinux: simplify avc_xperms_audit_required()
selinux: mark both IPv4 and IPv6 accepted connection sockets as labeled
selinux: replace kmem_cache_create() with KMEM_CACHE()
selinux: annotate false positive data race to avoid KCSAN warnings
selinux: refactor code to return ERR_PTR in selinux_netlbl_sock_genattr
selinux: Streamline type determination in security_compute_sid
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZuQEwAAKCRCRxhvAZXjc
onI2AQDXa5XhIx0VpLWE9uVImVy3QuUKc/5pI1e1DKMgxLhKCgEAh15a4ETqmVaw
Zp3ZSzoLD8Ez1WwWb6cWQuHFYRSjtwU=
=+LKG
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.12.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull procfs updates from Christian Brauner:
"This contains the following changes for procfs:
- Add config options and parameters to block forcing memory writes.
This adds a Kconfig option and boot param to allow removing the
FOLL_FORCE flag from /proc/<pid>/mem write calls as this can be
used in various attacks.
The traditional forcing behavior is kept as default because it can
break GDB and some other use cases.
This is the simpler version that you had requested.
- Restrict overmounting of ephemeral entities.
It is currently possible to mount on top of various ephemeral
entities in procfs. This specifically includes magic links. To
recap, magic links are links of the form /proc/<pid>/fd/<nr>. They
serve as references to a target file and during path lookup they
cause a jump to the target path. Such magic links disappear if the
corresponding file descriptor is closed.
Currently it is possible to overmount such magic links. This is
mostly interesting for an attacker that wants to somehow trick a
process into e.g., reopening something that it didn't intend to
reopen or to hide a malicious file descriptor.
But also it risks leaking mounts for long-running processes. When
overmounting a magic link like above, the mount will not be
detached when the file descriptor is closed. Only the target
mountpoint will disappear. Which has the consequence of making it
impossible to unmount that mount afterwards. So the mount will
stick around until the process exits and the /proc/<pid>/ directory
is cleaned up during proc_flush_pid() when the dentries are pruned
and invalidated.
That in turn means it's possible for a program to accidentally leak
mounts and it's also possible to make a task leak mounts without
it's knowledge if the attacker just keeps overmounting things under
/proc/<pid>/fd/<nr>.
Disallow overmounting of such ephemeral entities.
- Cleanup the readdir method naming in some procfs file operations.
- Replace kmalloc() and strcpy() with a simple kmemdup() call"
* tag 'vfs-6.12.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
proc: fold kmalloc() + strcpy() into kmemdup()
proc: block mounting on top of /proc/<pid>/fdinfo/*
proc: block mounting on top of /proc/<pid>/fd/*
proc: block mounting on top of /proc/<pid>/map_files/*
proc: add proc_splice_unmountable()
proc: proc_readfdinfo() -> proc_fdinfo_iterate()
proc: proc_readfd() -> proc_fd_iterate()
proc: add config & param to block forcing mem writes
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZuQEwAAKCRCRxhvAZXjc
osS0AQCgIpvey9oW5DMyMw6Bv0hFMRv95gbNQZfHy09iK+NMNAD9GALhb/4cMIVB
7YrZGXEz454lpgcs8AnrOVjVNfctOQg=
=e9s9
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.12.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs file updates from Christian Brauner:
"This is the work to cleanup and shrink struct file significantly.
Right now, (focusing on x86) struct file is 232 bytes. After this
series struct file will be 184 bytes aka 3 cacheline and a spare 8
bytes for future extensions at the end of the struct.
With struct file being as ubiquitous as it is this should make a
difference for file heavy workloads and allow further optimizations in
the future.
- struct fown_struct was embedded into struct file letting it take up
32 bytes in total when really it shouldn't even be embedded in
struct file in the first place. Instead, actual users of struct
fown_struct now allocate the struct on demand. This frees up 24
bytes.
- Move struct file_ra_state into the union containg the cleanup hooks
and move f_iocb_flags out of the union. This closes a 4 byte hole
we created earlier and brings struct file to 192 bytes. Which means
struct file is 3 cachelines and we managed to shrink it by 40
bytes.
- Reorder struct file so that nothing crosses a cacheline.
I suspect that in the future we will end up reordering some members
to mitigate false sharing issues or just because someone does
actually provide really good perf data.
- Shrinking struct file to 192 bytes is only part of the work.
Files use a slab that is SLAB_TYPESAFE_BY_RCU and when a kmem cache
is created with SLAB_TYPESAFE_BY_RCU the free pointer must be
located outside of the object because the cache doesn't know what
part of the memory can safely be overwritten as it may be needed to
prevent object recycling.
That has the consequence that SLAB_TYPESAFE_BY_RCU may end up
adding a new cacheline.
So this also contains work to add a new kmem_cache_create_rcu()
function that allows the caller to specify an offset where the
freelist pointer is supposed to be placed. Thus avoiding the
implicit addition of a fourth cacheline.
- And finally this removes the f_version member in struct file.
The f_version member isn't particularly well-defined. It is mainly
used as a cookie to detect concurrent seeks when iterating
directories. But it is also abused by some subsystems for
completely unrelated things.
It is mostly a directory and filesystem specific thing that doesn't
really need to live in struct file and with its wonky semantics it
really lacks a specific function.
For pipes, f_version is (ab)used to defer poll notifications until
a write has happened. And struct pipe_inode_info is used by
multiple struct files in their ->private_data so there's no chance
of pushing that down into file->private_data without introducing
another pointer indirection.
But pipes don't rely on f_pos_lock so this adds a union into struct
file encompassing f_pos_lock and a pipe specific f_pipe member that
pipes can use. This union of course can be extended to other file
types and is similar to what we do in struct inode already"
* tag 'vfs-6.12.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (26 commits)
fs: remove f_version
pipe: use f_pipe
fs: add f_pipe
ubifs: store cookie in private data
ufs: store cookie in private data
udf: store cookie in private data
proc: store cookie in private data
ocfs2: store cookie in private data
input: remove f_version abuse
ext4: store cookie in private data
ext2: store cookie in private data
affs: store cookie in private data
fs: add generic_llseek_cookie()
fs: use must_set_pos()
fs: add must_set_pos()
fs: add vfs_setpos_cookie()
s390: remove unused f_version
ceph: remove unused f_version
adi: remove unused f_version
mm: Removed @freeptr_offset to prevent doc warning
...
There is no reason why struct path pointer shouldn't be const-qualified
when being passed into bpf_token_create() LSM hook. Add that const.
Acked-by: Paul Moore <paul@paul-moore.com> (LSM/SELinux)
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
bpf task local storage is now using task_struct->bpf_storage, so
bpf_lsm_blob_sizes.lbs_task is no longer needed. Remove it to save some
memory.
Fixes: a10787e6d5 ("bpf: Enable task local storage for tracing programs")
Cc: stable@vger.kernel.org
Cc: KP Singh <kpsingh@kernel.org>
Cc: Matt Bobrowski <mattbobrowski@google.com>
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://lore.kernel.org/r/20240911055508.9588-1-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Highlight that the file_set_fowner hook is now called with a lock held.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
In the `smk_set_cipso` function, the `skp->smk_netlabel.attr.mls.cat`
field is directly assigned to a new value without using the appropriate
RCU pointer assignment functions. According to RCU usage rules, this is
illegal and can lead to unpredictable behavior, including data
inconsistencies and impossible-to-diagnose memory corruption issues.
This possible bug was identified using a static analysis tool developed
by myself, specifically designed to detect RCU-related issues.
To address this, the assignment is now done using rcu_assign_pointer(),
which ensures that the pointer assignment is done safely, with the
necessary memory barriers and synchronization. This change prevents
potential RCU dereference issues by ensuring that the `cat` field is
safely updated while still adhering to RCU's requirements.
Fixes: 0817534ff9 ("smackfs: Fix use-after-free in netlbl_catmap_walk()")
Signed-off-by: Jiawei Ye <jiawei.ye@foxmail.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Pull misc fixes from Guenter Roeck.
These are fixes for regressions that Guenther has been reporting, and
the maintainers haven't picked up and sent in. With rc6 fairly imminent,
I'm taking them directly from Guenter.
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
apparmor: fix policy_unpack_test on big endian systems
Revert "MIPS: csrc-r4k: Apply verification clocksource flags"
microblaze: don't treat zero reserved memory regions as error
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmbR4owUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXN4ABAAg9wk2oKUY6sH317v89/ejnMbxn/F
1LaoRaYZZ8mz3d9Ph3s+6a29cRIn7w/Nefwy7E78wHChD6yiNMXK79AyZ676/AEW
6TewqmOeIzmAP76aTyLy3MQCDiw8VG3EI5tMUl3oNhd8XPtNwyQBy+sLU6EylllI
XBmZ0w6mz42LLo33ApY71edXxi3J967Dk8YSlkIRVgrDOcvYHpyKqUU6L5w3aWF1
XoTooFpZZDOYEJGie16POmLFtfgmxUV20XqdqNsuADPnOIamwGuVE7v3a2/bSMvt
G797KQlRzKBoBjYcl/fCBFRhIMcpn91Ig0nvj+gX02LgfAmg7Xkp2ZDdEV40A32H
mEAxFhsvV0mEzTZWRgYYmr3HGF7xiiZUyhu9uIatiK3yb3MAK7Ow7L1+pQsUku99
EwAexxT5+1kKY5t//Ech1jX36jm1OAVrVLWrWDl8cERKeCeuhs9mXjiKCP62rD28
wotd1R4s/O50NflsP0ywVxAshZ4EFVvHjoSgr/kcHyX8nAwzMF5GcvDrwjXQB5u7
3QYW74USIwXT0zcEvqaprVLhekTEtt2EieHPKIit97p718R582YC9Fxm0mgFKlgp
lGelo2g+JJPB4Y/T8EpUGPIW6nHK9Iw9cp7K07yhhKX0O+EO7bZE9ShunFf8INN5
EJUf4oAUTIc2XH4=
=Li4s
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20240830' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm fix from Paul Moore:
"One small patch to correct a NFS permissions problem with SELinux and
Smack"
* tag 'lsm-pr-20240830' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
selinux,smack: don't bypass permissions check in inode_setsecctx hook
This adds a Kconfig option and boot param to allow removing
the FOLL_FORCE flag from /proc/pid/mem write calls because
it can be abused.
The traditional forcing behavior is kept as default because
it can break GDB and some other use cases.
Previously we tried a more sophisticated approach allowing
distributions to fine-tune /proc/pid/mem behavior, however
that got NAK-ed by Linus [1], who prefers this simpler
approach with semantics also easier to understand for users.
Link: https://lore.kernel.org/lkml/CAHk-=wiGWLChxYmUA5HrT5aopZrB7_2VTa0NLZcxORgkUe5tEQ@mail.gmail.com/ [1]
Cc: Doug Anderson <dianders@chromium.org>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Link: https://lore.kernel.org/r/20240802080225.89408-1-adrian.ratiu@collabora.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Use the IS_ERR_OR_NULL() helper instead of open-coding a
NULL and an error pointer checks to simplify the code and
improve readability.
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Marek Gresko reports that the root user on an NFS client is able to
change the security labels on files on an NFS filesystem that is
exported with root squashing enabled.
The end of the kerneldoc comment for __vfs_setxattr_noperm() states:
* This function requires the caller to lock the inode's i_mutex before it
* is executed. It also assumes that the caller will make the appropriate
* permission checks.
nfsd_setattr() does do permissions checking via fh_verify() and
nfsd_permission(), but those don't do all the same permissions checks
that are done by security_inode_setxattr() and its related LSM hooks do.
Since nfsd_setattr() is the only consumer of security_inode_setsecctx(),
simplest solution appears to be to replace the call to
__vfs_setxattr_noperm() with a call to __vfs_setxattr_locked(). This
fixes the above issue and has the added benefit of causing nfsd to
recall conflicting delegations on a file when a client tries to change
its security label.
Cc: stable@kernel.org
Reported-by: Marek Gresko <marek.gresko@protonmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218809
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
By associative and commutative laws, the result of the two 'audited' is
zero. Take the second 'audited' as an example:
1) audited = requested & avd->auditallow;
2) audited &= ~requested;
==> audited = ~requested & (requested & avd->auditallow);
==> audited = (~requested & requested) & avd->auditallow;
==> audited = 0 & avd->auditallow;
==> audited = 0;
In fact, it is more readable to directly write zero. The value of the
first 'audited' is 0 because AUDIT is not allowed. The second 'audited'
is zero because there is no AUDITALLOW permission.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The current partial labeling was introduced in 389fb800ac ("netlabel:
Label incoming TCP connections correctly in SELinux") due to the fact
that IPv6 labeling was not supported yet at the time.
Signed-off-by: Guido Trentalancia <guido@trentalancia.com>
[PM: properly format the referenced commit ID, adjust subject]
Signed-off-by: Paul Moore <paul@paul-moore.com>
We do embedd struct fown_struct into struct file letting it take up 32
bytes in total. We could tweak struct fown_struct to be more compact but
really it shouldn't even be embedded in struct file in the first place.
Instead, actual users of struct fown_struct should allocate the struct
on demand. This frees up 24 bytes in struct file.
That will have some potentially user-visible changes for the ownership
fcntl()s. Some of them can now fail due to allocation failures.
Practically, that probably will almost never happen as the allocations
are small and they only happen once per file.
The fown_struct is used during kill_fasync() which is used by e.g.,
pipes to generate a SIGIO signal. Sending of such signals is conditional
on userspace having set an owner for the file using one of the F_OWNER
fcntl()s. Such users will be unaffected if struct fown_struct is
allocated during the fcntl() call.
There are a few subsystems that call __f_setown() expecting
file->f_owner to be allocated:
(1) tun devices
file->f_op->fasync::tun_chr_fasync()
-> __f_setown()
There are no callers of tun_chr_fasync().
(2) tty devices
file->f_op->fasync::tty_fasync()
-> __tty_fasync()
-> __f_setown()
tty_fasync() has no additional callers but __tty_fasync() has. Note
that __tty_fasync() only calls __f_setown() if the @on argument is
true. It's called from:
file->f_op->release::tty_release()
-> tty_release()
-> __tty_fasync()
-> __f_setown()
tty_release() calls __tty_fasync() with @on false
=> __f_setown() is never called from tty_release().
=> All callers of tty_release() are safe as well.
file->f_op->release::tty_open()
-> tty_release()
-> __tty_fasync()
-> __f_setown()
__tty_hangup() calls __tty_fasync() with @on false
=> __f_setown() is never called from tty_release().
=> All callers of __tty_hangup() are safe as well.
From the callchains it's obvious that (1) and (2) end up getting called
via file->f_op->fasync(). That can happen either through the F_SETFL
fcntl() with the FASYNC flag raised or via the FIOASYNC ioctl(). If
FASYNC is requested and the file isn't already FASYNC then
file->f_op->fasync() is called with @on true which ends up causing both
(1) and (2) to call __f_setown().
(1) and (2) are the only subsystems that call __f_setown() from the
file->f_op->fasync() handler. So both (1) and (2) have been updated to
allocate a struct fown_struct prior to calling fasync_helper() to
register with the fasync infrastructure. That's safe as they both call
fasync_helper() which also does allocations if @on is true.
The other interesting case are file leases:
(3) file leases
lease_manager_ops->lm_setup::lease_setup()
-> __f_setown()
Which in turn is called from:
generic_add_lease()
-> lease_manager_ops->lm_setup::lease_setup()
-> __f_setown()
So here again we can simply make generic_add_lease() allocate struct
fown_struct prior to the lease_manager_ops->lm_setup::lease_setup()
which happens under a spinlock.
With that the two remaining subsystems that call __f_setown() are:
(4) dnotify
(5) sockets
Both have their own custom ioctls to set struct fown_struct and both
have been converted to allocate a struct fown_struct on demand from
their respective ioctls.
Interactions with O_PATH are fine as well e.g., when opening a /dev/tty
as O_PATH then no file->f_op->open() happens thus no file->f_owner is
allocated. That's fine as no file operation will be set for those and
the device has never been opened. fcntl()s called on such things will
just allocate a ->f_owner on demand. Although I have zero idea why'd you
care about f_owner on an O_PATH fd.
Link: https://lore.kernel.org/r/20240813-work-f_owner-v2-1-4e9343a79f9f@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Based on guidance in include/linux/slab.h, replace kmem_cache_create()
with KMEM_CACHE() for sources under security/selinux to simplify creation
of SLAB caches.
Signed-off-by: Eric Suen <ericsu@linux.microsoft.com>
[PM: minor grammar nits in the description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Because these are equals to MAX_LSM_COUNT. Also, we can avoid dynamic
memory allocation for ordered_lsms because MAX_LSM_COUNT is a constant.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Paul Moore <paul@paul-moore.com>
KCSAN flags the check of isec->initialized by
__inode_security_revalidate() as a data race. This is indeed a racy
check, but inode_doinit_with_dentry() will recheck with isec->lock held.
Annotate the check with the data_race() macro to silence the KCSAN false
positive.
Reported-by: syzbot+319ed1769c0078257262@syzkaller.appspotmail.com
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
policy_unpack_test fails on big endian systems because data byte order
is expected to be little endian but is generated in host byte order.
This results in test failures such as:
# policy_unpack_test_unpack_array_with_null_name: EXPECTATION FAILED at security/apparmor/policy_unpack_test.c:150
Expected array_size == (u16)16, but
array_size == 4096 (0x1000)
(u16)16 == 16 (0x10)
# policy_unpack_test_unpack_array_with_null_name: pass:0 fail:1 skip:0 total:1
not ok 3 policy_unpack_test_unpack_array_with_null_name
# policy_unpack_test_unpack_array_with_name: EXPECTATION FAILED at security/apparmor/policy_unpack_test.c:164
Expected array_size == (u16)16, but
array_size == 4096 (0x1000)
(u16)16 == 16 (0x10)
# policy_unpack_test_unpack_array_with_name: pass:0 fail:1 skip:0 total:1
Add the missing endianness conversions when generating test data.
Fixes: 4d944bcd4e ("apparmor: add AppArmor KUnit tests for policy unpack")
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Aligned parameters in the function declaration of smack_ip_output
to adhere to the Linux kernel coding style guidelines.
The parameters of the smack_ip_output function were previously misaligned,
with the second and third parameters not aligned under the first parameter.
This change corrects the indentation, improving code readability and
maintaining consistency with the rest of the codebase.
Signed-off-by: GiSeong Ji <jiggyjiggy0323@gmail.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
The header files eval.h is included twice in ipe.c,
so one inclusion of each can be removed.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9796
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
LSM hooks are currently invoked from a linked list as indirect calls
which are invoked using retpolines as a mitigation for speculative
attacks (Branch History / Target injection) and add extra overhead which
is especially bad in kernel hot paths:
security_file_ioctl:
0xff...0320 <+0>: endbr64
0xff...0324 <+4>: push %rbp
0xff...0325 <+5>: push %r15
0xff...0327 <+7>: push %r14
0xff...0329 <+9>: push %rbx
0xff...032a <+10>: mov %rdx,%rbx
0xff...032d <+13>: mov %esi,%ebp
0xff...032f <+15>: mov %rdi,%r14
0xff...0332 <+18>: mov $0xff...7030,%r15
0xff...0339 <+25>: mov (%r15),%r15
0xff...033c <+28>: test %r15,%r15
0xff...033f <+31>: je 0xff...0358 <security_file_ioctl+56>
0xff...0341 <+33>: mov 0x18(%r15),%r11
0xff...0345 <+37>: mov %r14,%rdi
0xff...0348 <+40>: mov %ebp,%esi
0xff...034a <+42>: mov %rbx,%rdx
0xff...034d <+45>: call 0xff...2e0 <__x86_indirect_thunk_array+352>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Indirect calls that use retpolines leading to overhead, not just due
to extra instruction but also branch misses.
0xff...0352 <+50>: test %eax,%eax
0xff...0354 <+52>: je 0xff...0339 <security_file_ioctl+25>
0xff...0356 <+54>: jmp 0xff...035a <security_file_ioctl+58>
0xff...0358 <+56>: xor %eax,%eax
0xff...035a <+58>: pop %rbx
0xff...035b <+59>: pop %r14
0xff...035d <+61>: pop %r15
0xff...035f <+63>: pop %rbp
0xff...0360 <+64>: jmp 0xff...47c4 <__x86_return_thunk>
The indirect calls are not really needed as one knows the addresses of
enabled LSM callbacks at boot time and only the order can possibly
change at boot time with the lsm= kernel command line parameter.
An array of static calls is defined per LSM hook and the static calls
are updated at boot time once the order has been determined.
With the hook now exposed as a static call, one can see that the
retpolines are no longer there and the LSM callbacks are invoked
directly:
security_file_ioctl:
0xff...0ca0 <+0>: endbr64
0xff...0ca4 <+4>: nopl 0x0(%rax,%rax,1)
0xff...0ca9 <+9>: push %rbp
0xff...0caa <+10>: push %r14
0xff...0cac <+12>: push %rbx
0xff...0cad <+13>: mov %rdx,%rbx
0xff...0cb0 <+16>: mov %esi,%ebp
0xff...0cb2 <+18>: mov %rdi,%r14
0xff...0cb5 <+21>: jmp 0xff...0cc7 <security_file_ioctl+39>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Static key enabled for SELinux
0xffffffff818f0cb7 <+23>: jmp 0xff...0cde <security_file_ioctl+62>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Static key enabled for BPF LSM. This is something that is changed to
default to false to avoid the existing side effect issues of BPF LSM
[1] in a subsequent patch.
0xff...0cb9 <+25>: xor %eax,%eax
0xff...0cbb <+27>: xchg %ax,%ax
0xff...0cbd <+29>: pop %rbx
0xff...0cbe <+30>: pop %r14
0xff...0cc0 <+32>: pop %rbp
0xff...0cc1 <+33>: cs jmp 0xff...0000 <__x86_return_thunk>
0xff...0cc7 <+39>: endbr64
0xff...0ccb <+43>: mov %r14,%rdi
0xff...0cce <+46>: mov %ebp,%esi
0xff...0cd0 <+48>: mov %rbx,%rdx
0xff...0cd3 <+51>: call 0xff...3230 <selinux_file_ioctl>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Direct call to SELinux.
0xff...0cd8 <+56>: test %eax,%eax
0xff...0cda <+58>: jne 0xff...0cbd <security_file_ioctl+29>
0xff...0cdc <+60>: jmp 0xff...0cb7 <security_file_ioctl+23>
0xff...0cde <+62>: endbr64
0xff...0ce2 <+66>: mov %r14,%rdi
0xff...0ce5 <+69>: mov %ebp,%esi
0xff...0ce7 <+71>: mov %rbx,%rdx
0xff...0cea <+74>: call 0xff...e220 <bpf_lsm_file_ioctl>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Direct call to BPF LSM.
0xff...0cef <+79>: test %eax,%eax
0xff...0cf1 <+81>: jne 0xff...0cbd <security_file_ioctl+29>
0xff...0cf3 <+83>: jmp 0xff...0cb9 <security_file_ioctl+25>
0xff...0cf5 <+85>: endbr64
0xff...0cf9 <+89>: mov %r14,%rdi
0xff...0cfc <+92>: mov %ebp,%esi
0xff...0cfe <+94>: mov %rbx,%rdx
0xff...0d01 <+97>: pop %rbx
0xff...0d02 <+98>: pop %r14
0xff...0d04 <+100>: pop %rbp
0xff...0d05 <+101>: ret
0xff...0d06 <+102>: int3
0xff...0d07 <+103>: int3
0xff...0d08 <+104>: int3
0xff...0d09 <+105>: int3
While this patch uses static_branch_unlikely indicating that an LSM hook
is likely to be not present. In most cases this is still a better choice
as even when an LSM with one hook is added, empty slots are created for
all LSM hooks (especially when many LSMs that do not initialize most
hooks are present on the system).
There are some hooks that don't use the call_int_hook or
call_void_hook. These hooks are updated to use a new macro called
lsm_for_each_hook where the lsm_callback is directly invoked as an
indirect call.
Below are results of the relevant Unixbench system benchmarks with BPF LSM
and SELinux enabled with default policies enabled with and without these
patches.
Benchmark Delta(%): (+ is better)
==========================================================================
Execl Throughput +1.9356
File Write 1024 bufsize 2000 maxblocks +6.5953
Pipe Throughput +9.5499
Pipe-based Context Switching +3.0209
Process Creation +2.3246
Shell Scripts (1 concurrent) +1.4975
System Call Overhead +2.7815
System Benchmarks Index Score (Partial Only): +3.4859
In the best case, some syscalls like eventfd_create benefitted to about
~10%.
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add various happy/unhappy unit tests for both IPE's policy parser.
Besides, a test suite for IPE functionality is available at
https://github.com/microsoft/ipe/tree/test-suite
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Enables an IPE policy to be enforced from kernel start, enabling access
control based on trust from kernel startup. This is accomplished by
transforming an IPE policy indicated by CONFIG_IPE_BOOT_POLICY into a
c-string literal that is parsed at kernel startup as an unsigned policy.
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Enable IPE policy authors to indicate trust for a singular fsverity
file, identified by the digest information, through "fsverity_digest"
and all files using valid fsverity builtin signatures via
"fsverity_signature".
This enables file-level integrity claims to be expressed in IPE,
allowing individual files to be authorized, giving some flexibility
for policy authors. Such file-level claims are important to be expressed
for enforcing the integrity of packages, as well as address some of the
scalability issues in a sole dm-verity based solution (# of loop back
devices, etc).
This solution cannot be done in userspace as the minimum threat that
IPE should mitigate is an attacker downloads malicious payload with
all required dependencies. These dependencies can lack the userspace
check, bypassing the protection entirely. A similar attack succeeds if
the userspace component is replaced with a version that does not
perform the check. As a result, this can only be done in the common
entry point - the kernel.
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
This patch introduces a new hook to save inode's integrity
data. For example, for fsverity enabled files, LSMs can use this hook to
save the existence of verified fsverity builtin signature into the inode's
security blob, and LSMs can make access decisions based on this data.
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
[PM: subject line tweak, removed changelog]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Allows author of IPE policy to indicate trust for a singular dm-verity
volume, identified by roothash, through "dmverity_roothash" and all
signed and validated dm-verity volumes, through "dmverity_signature".
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
[PM: fixed some line length issues in the comments]
Signed-off-by: Paul Moore <paul@paul-moore.com>
This patch introduces a new LSM blob to the block_device structure,
enabling the security subsystem to store security-sensitive data related
to block devices. Currently, for a device mapper's mapped device containing
a dm-verity target, critical security information such as the roothash and
its signing state are not readily accessible. Specifically, while the
dm-verity volume creation process passes the dm-verity roothash and its
signature from userspace to the kernel, the roothash is stored privately
within the dm-verity target, and its signature is discarded
post-verification. This makes it extremely hard for the security subsystem
to utilize these data.
With the addition of the LSM blob to the block_device structure, the
security subsystem can now retain and manage important security metadata
such as the roothash and the signing state of a dm-verity by storing them
inside the blob. Access decisions can then be based on these stored data.
The implementation follows the same approach used for security blobs in
other structures like struct file, struct inode, and struct superblock.
The initialization of the security blob occurs after the creation of the
struct block_device, performed by the security subsystem. Similarly, the
security blob is freed by the security subsystem before the struct
block_device is deallocated or freed.
This patch also introduces a new hook security_bdev_setintegrity() to save
block device's integrity data to the new LSM blob. For example, for
dm-verity, it can use this hook to expose its roothash and signing state
to LSMs, then LSMs can save these data into the LSM blob.
Please note that the new hook should be invoked every time the security
information is updated to keep these data current. For example, in
dm-verity, if the mapping table is reloaded and configured to use a
different dm-verity target with a new roothash and signing information,
the previously stored data in the LSM blob will become obsolete. It is
crucial to re-invoke the hook to refresh these data and ensure they are up
to date. This necessity arises from the design of device-mapper, where a
device-mapper device is first created, and then targets are subsequently
loaded into it. These targets can be modified multiple times during the
device's lifetime. Therefore, while the LSM blob is allocated during the
creation of the block device, its actual contents are not initialized at
this stage and can change substantially over time. This includes
alterations from data that the LSM 'trusts' to those it does not, making
it essential to handle these changes correctly. Failure to address this
dynamic aspect could potentially allow for bypassing LSM checks.
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
[PM: merge fuzz, subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
IPE, like SELinux, supports a permissive mode. This mode allows policy
authors to test and evaluate IPE policy without it affecting their
programs. When the mode is changed, a 1404 AUDIT_MAC_STATUS will
be reported.
This patch adds the following audit records:
audit: MAC_STATUS enforcing=0 old_enforcing=1 auid=4294967295
ses=4294967295 enabled=1 old-enabled=1 lsm=ipe res=1
audit: MAC_STATUS enforcing=1 old_enforcing=0 auid=4294967295
ses=4294967295 enabled=1 old-enabled=1 lsm=ipe res=1
The audit record only emit when the value from the user input is
different from the current enforce value.
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Users of IPE require a way to identify when and why an operation fails,
allowing them to both respond to violations of policy and be notified
of potentially malicious actions on their systems with respect to IPE
itself.
This patch introduces 3 new audit events.
AUDIT_IPE_ACCESS(1420) indicates the result of an IPE policy evaluation
of a resource.
AUDIT_IPE_CONFIG_CHANGE(1421) indicates the current active IPE policy
has been changed to another loaded policy.
AUDIT_IPE_POLICY_LOAD(1422) indicates a new IPE policy has been loaded
into the kernel.
This patch also adds support for success auditing, allowing users to
identify why an allow decision was made for a resource. However, it is
recommended to use this option with caution, as it is quite noisy.
Here are some examples of the new audit record types:
AUDIT_IPE_ACCESS(1420):
audit: AUDIT1420 ipe_op=EXECUTE ipe_hook=BPRM_CHECK enforcing=1
pid=297 comm="sh" path="/root/vol/bin/hello" dev="tmpfs"
ino=3897 rule="op=EXECUTE boot_verified=TRUE action=ALLOW"
audit: AUDIT1420 ipe_op=EXECUTE ipe_hook=BPRM_CHECK enforcing=1
pid=299 comm="sh" path="/mnt/ipe/bin/hello" dev="dm-0"
ino=2 rule="DEFAULT action=DENY"
audit: AUDIT1420 ipe_op=EXECUTE ipe_hook=BPRM_CHECK enforcing=1
pid=300 path="/tmp/tmpdp2h1lub/deny/bin/hello" dev="tmpfs"
ino=131 rule="DEFAULT action=DENY"
The above three records were generated when the active IPE policy only
allows binaries from the initramfs to run. The three identical `hello`
binary were placed at different locations, only the first hello from
the rootfs(initramfs) was allowed.
Field ipe_op followed by the IPE operation name associated with the log.
Field ipe_hook followed by the name of the LSM hook that triggered the IPE
event.
Field enforcing followed by the enforcement state of IPE. (it will be
introduced in the next commit)
Field pid followed by the pid of the process that triggered the IPE
event.
Field comm followed by the command line program name of the process that
triggered the IPE event.
Field path followed by the file's path name.
Field dev followed by the device name as found in /dev where the file is
from.
Note that for device mappers it will use the name `dm-X` instead of
the name in /dev/mapper.
For a file in a temp file system, which is not from a device, it will use
`tmpfs` for the field.
The implementation of this part is following another existing use case
LSM_AUDIT_DATA_INODE in security/lsm_audit.c
Field ino followed by the file's inode number.
Field rule followed by the IPE rule made the access decision. The whole
rule must be audited because the decision is based on the combination of
all property conditions in the rule.
Along with the syscall audit event, user can know why a blocked
happened. For example:
audit: AUDIT1420 ipe_op=EXECUTE ipe_hook=BPRM_CHECK enforcing=1
pid=2138 comm="bash" path="/mnt/ipe/bin/hello" dev="dm-0"
ino=2 rule="DEFAULT action=DENY"
audit[1956]: SYSCALL arch=c000003e syscall=59
success=no exit=-13 a0=556790138df0 a1=556790135390 a2=5567901338b0
a3=ab2a41a67f4f1f4e items=1 ppid=147 pid=1956 auid=4294967295 uid=0
gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0
ses=4294967295 comm="bash" exe="/usr/bin/bash" key=(null)
The above two records showed bash used execve to run "hello" and got
blocked by IPE. Note that the IPE records are always prior to a SYSCALL
record.
AUDIT_IPE_CONFIG_CHANGE(1421):
audit: AUDIT1421
old_active_pol_name="Allow_All" old_active_pol_version=0.0.0
old_policy_digest=sha256:E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649
new_active_pol_name="boot_verified" new_active_pol_version=0.0.0
new_policy_digest=sha256:820EEA5B40CA42B51F68962354BA083122A20BB846F
auid=4294967295 ses=4294967295 lsm=ipe res=1
The above record showed the current IPE active policy switch from
`Allow_All` to `boot_verified` along with the version and the hash
digest of the two policies. Note IPE can only have one policy active
at a time, all access decision evaluation is based on the current active
policy.
The normal procedure to deploy a policy is loading the policy to deploy
into the kernel first, then switch the active policy to it.
AUDIT_IPE_POLICY_LOAD(1422):
audit: AUDIT1422 policy_name="boot_verified" policy_version=0.0.0
policy_digest=sha256:820EEA5B40CA42B51F68962354BA083122A20BB846F2676
auid=4294967295 ses=4294967295 lsm=ipe res=1
The above record showed a new policy has been loaded into the kernel
with the policy name, policy version and policy hash.
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
As is typical with LSMs, IPE uses securityfs as its interface with
userspace. for a complete list of the interfaces and the respective
inputs/outputs, please see the documentation under
admin-guide/LSM/ipe.rst
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
When deleting a directory in the security file system, the existing
securityfs_remove requires the directory to be empty, otherwise
it will do nothing. This leads to a potential risk that the security
file system might be in an unclean state when the intended deletion
did not happen.
This commit introduces a new function securityfs_recursive_remove
to recursively delete a directory without leaving an unclean state.
Co-developed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
IPE is designed to provide system level trust guarantees, this usually
implies that trust starts from bootup with a hardware root of trust,
which validates the bootloader. After this, the bootloader verifies
the kernel and the initramfs.
As there's no currently supported integrity method for initramfs, and
it's typically already verified by the bootloader. This patch introduces
a new IPE property `boot_verified` which allows author of IPE policy to
indicate trust for files from initramfs.
The implementation of this feature utilizes the newly added
`initramfs_populated` hook. This hook marks the superblock of the rootfs
after the initramfs has been unpacked into it.
Before mounting the real rootfs on top of the initramfs, initramfs
script will recursively remove all files and directories on the
initramfs. This is typically implemented by using switch_root(8)
(https://man7.org/linux/man-pages/man8/switch_root.8.html).
Therefore the initramfs will be empty and not accessible after the real
rootfs takes over. It is advised to switch to a different policy
that doesn't rely on the `boot_verified` property after this point.
This ensures that the trust policies remain relevant and effective
throughout the system's operation.
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
This patch introduces a new hook to notify security system that the
content of initramfs has been unpacked into the rootfs.
Upon receiving this notification, the security system can activate
a policy to allow only files that originated from the initramfs to
execute or load into kernel during the early stages of booting.
This approach is crucial for minimizing the attack surface by
ensuring that only trusted files from the initramfs are operational
in the critical boot phase.
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
IPE's initial goal is to control both execution and the loading of
kernel modules based on the system's definition of trust. It
accomplishes this by plugging into the security hooks for
bprm_check_security, file_mprotect, mmap_file, kernel_load_data,
and kernel_read_data.
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Introduce a core evaluation function in IPE that will be triggered by
various security hooks (e.g., mmap, bprm_check, kexec). This function
systematically assesses actions against the defined IPE policy, by
iterating over rules specific to the action being taken. This critical
addition enables IPE to enforce its security policies effectively,
ensuring that actions intercepted by these hooks are scrutinized for policy
compliance before they are allowed to proceed.
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
IPE's interpretation of the what the user trusts is accomplished through
its policy. IPE's design is to not provide support for a single trust
provider, but to support multiple providers to enable the end-user to
choose the best one to seek their needs.
This requires the policy to be rather flexible and modular so that
integrity providers, like fs-verity, dm-verity, or some other system,
can plug into the policy with minimal code changes.
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
[PM: added NULL check in parse_rule() as discussed]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Integrity Policy Enforcement (IPE) is an LSM that provides an
complimentary approach to Mandatory Access Control than existing LSMs
today.
Existing LSMs have centered around the concept of access to a resource
should be controlled by the current user's credentials. IPE's approach,
is that access to a resource should be controlled by the system's trust
of a current resource.
The basis of this approach is defining a global policy to specify which
resource can be trusted.
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Trusted keys unseal the key blob on load, but keep the sealed payload in
the blob field so that every subsequent read (export) will simply
convert this field to hex and send it to userspace.
With DCP-based trusted keys, we decrypt the blob encryption key (BEK)
in the Kernel due hardware limitations and then decrypt the blob payload.
BEK decryption is done in-place which means that the trusted key blob
field is modified and it consequently holds the BEK in plain text.
Every subsequent read of that key thus send the plain text BEK instead
of the encrypted BEK to userspace.
This issue only occurs when importing a trusted DCP-based key and
then exporting it again. This should rarely happen as the common use cases
are to either create a new trusted key and export it, or import a key
blob and then just use it without exporting it again.
Fix this by performing BEK decryption and encryption in a dedicated
buffer. Further always wipe the plain text BEK buffer to prevent leaking
the key via uninitialized memory.
Cc: stable@vger.kernel.org # v6.10+
Fixes: 2e8a0f40a3 ("KEYS: trusted: Introduce NXP DCP-backed trusted keys")
Signed-off-by: David Gstir <david@sigma-star.at>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
The DCP trusted key type uses the wrong helper function to store
the blob's payload length which can lead to the wrong byte order
being used in case this would ever run on big endian architectures.
Fix by using correct helper function.
Cc: stable@vger.kernel.org # v6.10+
Fixes: 2e8a0f40a3 ("KEYS: trusted: Introduce NXP DCP-backed trusted keys")
Suggested-by: Richard Weinberger <richard@nod.at>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405240610.fj53EK0q-lkp@intel.com/
Signed-off-by: David Gstir <david@sigma-star.at>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Fix sparse warning:
security/lockdown/lockdown.c:79:21: warning:
symbol 'lockdown_lsmid' was not declared. Should it be static?
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
For any changes of struct fd representation we need to
turn existing accesses to fields into calls of wrappers.
Accesses to struct fd::flags are very few (3 in linux/file.h,
1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in
explicit initializers).
Those can be dealt with in the commit converting to
new layout; accesses to struct fd::file are too many for that.
This commit converts (almost) all of f.file to
fd_file(f). It's not entirely mechanical ('file' is used as
a member name more than just in struct fd) and it does not
even attempt to distinguish the uses in pointer context from
those in boolean context; the latter will be eventually turned
into a separate helper (fd_empty()).
NOTE: mass conversion to fd_empty(), tempting as it
might be, is a bad idea; better do that piecewise in commit
that convert from fdget...() to CLASS(...).
[conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c
caught by git; fs/stat.c one got caught by git grep]
[fs/xattr.c conflict]
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The LSM framework has an existing inode_free_security() hook which
is used by LSMs that manage state associated with an inode, but
due to the use of RCU to protect the inode, special care must be
taken to ensure that the LSMs do not fully release the inode state
until it is safe from a RCU perspective.
This patch implements a new inode_free_security_rcu() implementation
hook which is called when it is safe to free the LSM's internal inode
state. Unfortunately, this new hook does not have access to the inode
itself as it may already be released, so the existing
inode_free_security() hook is retained for those LSMs which require
access to the inode.
Cc: stable@vger.kernel.org
Reported-by: syzbot+5446fbf332b0602ede0b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/00000000000076ba3b0617f65cc8@google.com
Signed-off-by: Paul Moore <paul@paul-moore.com>
Some cleanup and style corrections for lsm_hooks.h.
* Drop the lsm_inode_alloc() extern declaration, it is not needed.
* Relocate lsm_get_xattr_slot() and extern variables in the file to
improve grouping of related objects.
* Don't use tabs to needlessly align structure fields.
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Unfortunately it appears that vma_is_initial_heap() is currently broken
for applications that do not currently have any heap allocated, e.g.
brk == start_brk. The breakage is such that it will cause SELinux to
check for the process/execheap permission on memory regions that cross
brk/start_brk even when there is no heap.
The proper fix would be to correct vma_is_initial_heap(), but as there
are multiple callers I am hesitant to unilaterally modify the helper
out of concern that I would end up breaking some other subsystem. The
mm developers have been made aware of the situation and hopefully they
will have a fix at some point in the future, but we need a fix soon so
we are simply going to revert our use of vma_is_initial_heap() in favor
of our old logic/code which works as expected, even in the face of a
zero size heap. We can return to using vma_is_initial_heap() at some
point in the future when it is fixed.
Cc: stable@vger.kernel.org
Reported-by: Marc Reisner <reisner.marc@gmail.com>
Closes: https://lore.kernel.org/all/ZrPmoLKJEf1wiFmM@marcreisner.com
Fixes: 68df1baf15 ("selinux: use vma_is_initial_stack() and vma_is_initial_heap()")
Signed-off-by: Paul Moore <paul@paul-moore.com>
When avc_add_xperms_decision() fails, the information recorded by the new
avc node is incomplete. In this case, the new avc node should be released
instead of replacing the old avc node.
Cc: stable@vger.kernel.org
Fixes: fa1aa143ac ("selinux: extended permissions for ioctls")
Suggested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The count increases only when a node is successfully added to
the linked list.
Cc: stable@vger.kernel.org
Fixes: fa1aa143ac ("selinux: extended permissions for ioctls")
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
To be consistent with most LSM hooks, convert the return value of
hook inode_copy_up_xattr to 0 or a negative error code.
Before:
- Hook inode_copy_up_xattr returns 0 when accepting xattr, 1 when
discarding xattr, -EOPNOTSUPP if it does not know xattr, or any
other negative error code otherwise.
After:
- Hook inode_copy_up_xattr returns 0 when accepting xattr, *-ECANCELED*
when discarding xattr, -EOPNOTSUPP if it does not know xattr, or
any other negative error code otherwise.
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
To be consistent with most LSM hooks, convert the return value of
hook vm_enough_memory to 0 or a negative error code.
Before:
- Hook vm_enough_memory returns 1 if permission is granted, 0 if not.
- LSM_RET_DEFAULT(vm_enough_memory_mm) is 1.
After:
- Hook vm_enough_memory reutrns 0 if permission is granted, negative
error code if not.
- LSM_RET_DEFAULT(vm_enough_memory_mm) is 0.
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Move management of the perf_event->security blob out of the individual
security modules and into the security infrastructure. Instead of
allocating the blobs from within the modules the modules tell the
infrastructure how much space is required, and the space is allocated
there. There are no longer any modules that require the perf_event_free()
hook. The hook definition has been removed.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johansen@canonical.com>
[PM: subject tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Move management of the infiniband security blob out of the individual
security modules and into the LSM infrastructure. The security modules
tell the infrastructure how much space they require at initialization.
There are no longer any modules that require the ib_free() hook.
The hook definition has been removed.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johansen@canonical.com>
[PM: subject tweak, selinux style fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Move management of the dev_tun security blob out of the individual
security modules and into the LSM infrastructure. The security modules
tell the infrastructure how much space they require at initialization.
There are no longer any modules that require the dev_tun_free hook.
The hook definition has been removed.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johansen@canonical.com>
[PM: subject tweak, selinux style fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Create a helper function lsm_blob_alloc() for general use in the hook
specific functions that allocate LSM blobs. Change the hook specific
functions to use this helper. This reduces the code size by a small
amount and will make adding new instances of infrastructure managed
security blobs easier.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johansen@canonical.com>
[PM: subject tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Move management of the key->security blob out of the individual security
modules and into the security infrastructure. Instead of allocating the
blobs from within the modules the modules tell the infrastructure how
much space is required, and the space is allocated there. There are
no existing modules that require a key_free hook, so the call to it and
the definition for it have been removed.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johansen@canonical.com>
[PM: subject tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Move management of the sock->sk_security blob out
of the individual security modules and into the security
infrastructure. Instead of allocating the blobs from within
the modules the modules tell the infrastructure how much
space is required, and the space is allocated there.
Acked-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subject tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Refactor the code in selinux_netlbl_sock_genattr to return ERR_PTR
when an error occurs.
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Simplifies the logic for determining the security context type in
security_compute_sid, enhancing readability and efficiency.
Consolidates default type assignment logic next to type transition
checks, removing redundancy and improving code flow.
Signed-off-by: Canfeng Guo <guocanfeng@uniontech.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
+ Cleanups
- optimization: try to avoid refing the label in apparmor_file_open
- remove useless static inline function is_deleted
- use kvfree_sensitive to free data->data
- fix typo in kernel doc
+ Bug fixes
- unpack transition table if dfa is not present
- test: add MODULE_DESCRIPTION()
- take nosymfollow flag into account
- fix possible NULL pointer dereference
- fix null pointer deref when receiving skb during sock creation
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE7cSDD705q2rFEEf7BS82cBjVw9gFAmaikGkACgkQBS82cBjV
w9gasBAAsnHikPshnqnCnIyA/Am9/LeOstKf8jQCbPWtjt1Fw3y8gmuJJOhRalM1
R343rFUF3Z5wb+Yy1xELww3eET91QgC9s5NLJceDXvKOnAppHD+YOphjfmRbpgaF
rWHGN6/rD30HALKpsMw7Z3jTe9xOPhygh+lWlaiJIoXZ2hZwv2Chd6TDVR8BSFyq
OLhuf++DLcZNEcvg1bUxccK49J+iAVeC0VtdzNXw+BRZU5zM/US8n+8EStxuY65n
cAwBM+cJn6yhX3bsazUacw32SCgWePTJr87Wbfn1oF5znoU/HMM9CtPvivTwQSRN
+nT0qm57CJW2IJdRfP2OhRpAbRvUgMjGjyIf7PJHn0zCXffiDNZhSuFOcKy9wAOQ
H9qi7lkAC9KYIs1jI57Fogp+sgo11q+fPdzgGuuCgFnZp7gEPChsTewO346jNYES
fORj62CsWL3lvhADq1cC/kgO9NZHwI7SLH5orobvrIoP9SGH2pGi0Nzdch62MNXo
BwJctpsgS8QPl2XiQhf38+LIczf+USOkDlXO5tmYsTTSUNmjd+7fDc1pcBv0WCf+
2DOEpI3DuaCr1oTr+jJ3zfysYZdeubFI/8uq5dCrMGQZZneEx0RSZAOGAbrWaPCy
XrmkvrjDwGMPocBF0RVHgda60OofZRa/aJnUxhIkOXTDDWpdQPM=
=kO+r
-----END PGP SIGNATURE-----
Merge tag 'apparmor-pr-2024-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor
Pull apparmor updates from John Johansen:
"Cleanups
- optimization: try to avoid refing the label in apparmor_file_open
- remove useless static inline function is_deleted
- use kvfree_sensitive to free data->data
- fix typo in kernel doc
Bug fixes:
- unpack transition table if dfa is not present
- test: add MODULE_DESCRIPTION()
- take nosymfollow flag into account
- fix possible NULL pointer dereference
- fix null pointer deref when receiving skb during sock creation"
* tag 'apparmor-pr-2024-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: unpack transition table if dfa is not present
apparmor: try to avoid refing the label in apparmor_file_open
apparmor: test: add MODULE_DESCRIPTION()
apparmor: take nosymfollow flag into account
apparmor: fix possible NULL pointer dereference
apparmor: fix typo in kernel doc
apparmor: remove useless static inline function is_deleted
apparmor: use kvfree_sensitive to free data->data
apparmor: Fix null pointer deref when receiving skb during sock creation
-----BEGIN PGP SIGNATURE-----
iIYEABYKAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCZqFEchAcbWljQGRpZ2lr
b2QubmV0AAoJEOXj0OiMgvbSULcBAPEV5Viu/zox2FdS87EGTqWxEQJcBRvc3ahj
MQk44WtMAP4o2CnwrOoMyZXeq9npteL5lQsVhEzeI+p8oN9C9bThBg==
=zizo
-----END PGP SIGNATURE-----
Merge tag 'landlock-6.11-rc1-houdini-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock fix from Mickaël Salaün:
"Jann Horn reported a sandbox bypass for Landlock. This includes the
fix and new tests. This should be backported"
* tag 'landlock-6.11-rc1-houdini-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
selftests/landlock: Add cred_transfer test
landlock: Don't lose track of restrictions on cred_transfer
const qualify the struct ctl_table argument in the proc_handler function
signatures. This is a prerequisite to moving the static ctl_table
structs into .rodata data which will ensure that proc_handler function
pointers cannot be modified.
This patch has been generated by the following coccinelle script:
```
virtual patch
@r1@
identifier ctl, write, buffer, lenp, ppos;
identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)";
@@
int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int write, void *buffer, size_t *lenp, loff_t *ppos);
@r2@
identifier func, ctl, write, buffer, lenp, ppos;
@@
int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int write, void *buffer, size_t *lenp, loff_t *ppos)
{ ... }
@r3@
identifier func;
@@
int func(
- struct ctl_table *
+ const struct ctl_table *
,int , void *, size_t *, loff_t *);
@r4@
identifier func, ctl;
@@
int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int , void *, size_t *, loff_t *);
@r5@
identifier func, write, buffer, lenp, ppos;
@@
int func(
- struct ctl_table *
+ const struct ctl_table *
,int write, void *buffer, size_t *lenp, loff_t *ppos);
```
* Code formatting was adjusted in xfs_sysctl.c to comply with code
conventions. The xfs_stats_clear_proc_handler,
xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where
adjusted.
* The ctl_table argument in proc_watchdog_common was const qualified.
This is called from a proc_handler itself and is calling back into
another proc_handler, making it necessary to change it as part of the
proc_handler migration.
Co-developed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Co-developed-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Joel Granados <j.granados@samsung.com>
Due to a bug in earlier userspaces, a transition table may be present
even when the dfa is not. Commit 7572fea31e
("apparmor: convert fperm lookup to use accept as an index") made the
verification check more rigourous regressing old userspaces with
the bug. For compatibility reasons allow the orphaned transition table
during unpack and discard.
Fixes: 7572fea31e ("apparmor: convert fperm lookup to use accept as an index")
Signed-off-by: Georgia Garcia <georgia.garcia@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
If the label is not stale (which is the common case), the fact that the
passed file object holds a reference can be leverged to avoid the
ref/unref cycle. Doing so reduces performance impact of apparmor on
parallel open() invocations.
When benchmarking on a 24-core vm using will-it-scale's open1_process
("Separate file open"), the results are (ops/s):
before: 6092196
after: 8309726 (+36%)
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Fix the 'make W=1' warning:
WARNING: modpost: missing MODULE_DESCRIPTION() in security/apparmor/apparmor_policy_unpack_test.o
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
A "nosymfollow" flag was added in commit
dab741e0e0 ("Add a "nosymfollow" mount option.")
While we don't need to implement any special logic on
the AppArmor kernel side to handle it, we should provide
user with a correct list of mount flags in audit logs.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
When a process' cred struct is replaced, this _almost_ always invokes
the cred_prepare LSM hook; but in one special case (when
KEYCTL_SESSION_TO_PARENT updates the parent's credentials), the
cred_transfer LSM hook is used instead. Landlock only implements the
cred_prepare hook, not cred_transfer, so KEYCTL_SESSION_TO_PARENT causes
all information on Landlock restrictions to be lost.
This basically means that a process with the ability to use the fork()
and keyctl() syscalls can get rid of all Landlock restrictions on
itself.
Fix it by adding a cred_transfer hook that does the same thing as the
existing cred_prepare hook. (Implemented by having hook_cred_prepare()
call hook_cred_transfer() so that the two functions are less likely to
accidentally diverge in the future.)
Cc: stable@kernel.org
Fixes: 385975dca5 ("landlock: Set up the security framework and manage credentials")
Signed-off-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/r/20240724-landlock-houdini-fix-v1-1-df89a4560ca3@google.com
Signed-off-by: Mickaël Salaün <mic@digikod.net>
API:
- Test setkey in no-SIMD context.
- Add skcipher speed test for user-specified algorithm.
Algorithms:
- Add x25519 support on ppc64le.
- Add VAES and AVX512 / AVX10 optimized AES-GCM on x86.
- Remove sm2 algorithm.
Drivers:
- Add Allwinner H616 support to sun8i-ce.
- Use DMA in stm32.
- Add Exynos850 hwrng support to exynos.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmaZFsgACgkQxycdCkmx
i6f76Q//ej7akY9fo6/qsn8UFK16O0SCEMkx7TrkxqHV8R6uwy4ret3+b5dbckY6
hBjDabiL/BAdNzo8hvta+BOtN6ToEqquSVwNCpX0U3YMLf9dIzcMA4Uri3LbxUHi
x9Qa8klI5x62Kg+RW+ovaJC4C11oKTpjVeDn4S57MudlBnhEa3DYcEADKiUowkEz
aigtLx8HrZYjwkQxwgWeS0xzeojhW1P20yaghOd6hTCD7vKw18JaKdD8r4YFGOBu
39eDaM/0vR+wWokk3NNl6NmXieBT8qLFt+OIbQs6b3gX9K37daahRs1VoShcL+ix
l8GaqLpo1n1llVrV1OWzyVLVLtYK849QEo6OmlusnbK7e5pQKEOXoACQ0VB8ElNE
1u7KNW6CBWGzr33dWPgl9yYBrT3BmMXABIK4dNmTicJsK2zk2FPKbLDZNi8fWah/
D46mv7Rb8EtTdhN56EzceUJpd1ZfmP9S4vY1Hu8YdmI1pxex11US/XppKLoyymqp
vNOzf85VuZ/GkUPfHdyWAFBnTaCjXtSBrlXD6+0nxavU9KGli0PLLX5tKNNWGw0l
51Z0tbNsDbo3Z+sMmtfvBXR2V8NwiAT5f775W0lLvpq/44mbDpdN3jGvfy9y9C7u
1DUC6F0XtUhZjR7e6/EhvHh3lB/a3w/m3+XC+XzDeox/VYTrC3Q=
=x80X
-----END PGP SIGNATURE-----
Merge tag 'v6.11-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
"API:
- Test setkey in no-SIMD context
- Add skcipher speed test for user-specified algorithm
Algorithms:
- Add x25519 support on ppc64le
- Add VAES and AVX512 / AVX10 optimized AES-GCM on x86
- Remove sm2 algorithm
Drivers:
- Add Allwinner H616 support to sun8i-ce
- Use DMA in stm32
- Add Exynos850 hwrng support to exynos"
* tag 'v6.11-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (81 commits)
hwrng: core - remove (un)register_miscdev()
crypto: lib/mpi - delete unnecessary condition
crypto: testmgr - generate power-of-2 lengths more often
crypto: mxs-dcp - Ensure payload is zero when using key slot
hwrng: Kconfig - Do not enable by default CN10K driver
crypto: starfive - Fix nent assignment in rsa dec
crypto: starfive - Align rsa input data to 32-bit
crypto: qat - fix unintentional re-enabling of error interrupts
crypto: qat - extend scope of lock in adf_cfg_add_key_value_param()
Documentation: qat: fix auto_reset attribute details
crypto: sun8i-ce - add Allwinner H616 support
crypto: sun8i-ce - wrap accesses to descriptor address fields
dt-bindings: crypto: sun8i-ce: Add compatible for H616
hwrng: core - Fix wrong quality calculation at hw rng registration
hwrng: exynos - Enable Exynos850 support
hwrng: exynos - Add SMC based TRNG operation
hwrng: exynos - Implement bus clock control
hwrng: exynos - Use devm_clk_get_enabled() to get the clock
hwrng: exynos - Improve coding style
dt-bindings: rng: Add Exynos850 support to exynos-trng
...
* Fix some typos, incomplete or confusing phrases.
* Split paragraphs where appropriate.
* List the same error code multiple times,
if it has multiple possible causes.
* Bring wording closer to the man page wording,
which has undergone more thorough review
(esp. for LANDLOCK_ACCESS_FS_WRITE_FILE).
* Small semantic clarifications
* Call the ephemeral port range "ephemeral"
* Clarify reasons for EFAULT in landlock_add_rule()
* Clarify @rule_type doc for landlock_add_rule()
This is a collection of small fixes which I collected when preparing the
corresponding man pages [1].
Cc: Alejandro Colomar <alx@kernel.org>
Cc: Konstantin Meskhidze <konstantin.meskhidze@huawei.com>
Link: https://lore.kernel.org/r/20240715155554.2791018-1-gnoack@google.com [1]
Signed-off-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20240715160328.2792835-2-gnoack@google.com
[mic: Add label to link, fix formatting spotted by make htmldocs,
synchronize userspace-api documentation's date]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
- Intel PT support enhancements & fixes
- Fix leaked SIGTRAP events
- Improve and fix the Intel uncore driver
- Add support for Intel HBM and CXL uncore counters
- Add Intel Lake and Arrow Lake support
- AMD uncore driver fixes
- Make SIGTRAP and __perf_pending_irq() work on RT
- Micro-optimizations
- Misc cleanups and fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmaWjncRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1iZyg//TSafjCK4N9fyXrPdPqf8L7ntX5uYf0rd
uVZpEo/+VGvuFhznHnZIV2DLetvuwYZcUWszCqQMYfokGGi6WI1/k4MeZkSpN5QE
p5mFk6gW3cmpHT9bECg7mKQH+w7Qna/b6mnA0HYTFxPGmQKdQDl1/S+ZsgWedxpC
4V3re7/FzenFVS45DwSMPi9s7uZzZhVhTSgb4XLy+0Da4S0iRULItBa8HT8HmqE5
v5aQlw3mmwKPUWvyPMi3Sw6RRWK3C+n5ZxWswSYoLSM3dsp1ZD+YYqtOv2GqAx8v
JoL0SOnGnNCfxGHh0kz5D2hztDvq61Enotih2gz7HxvdWh2DasNp4yS1USGQhu5h
VJnKNA0TfOUaYqWFVj0EgRVhDX79lMwSHTkR1DZd4vM2GDigHeRPh0zGSn2w/koV
oCRxFfBoktHBnX0Te1NE2BhojbuKp25vTGK6GriVcHt/RNpuz6hTxsjdJzHCAlVX
M349l0EpUJafvfaIN9zF22uw22J8P9y9JYqI6ebkUIKiuoT9LuafVYhQupSE9H4u
IqlozPCTNw6eAQcUo03gkl3n+SY/DZH6eU2ycKgEp3r7TDGYbJPwxY1BgOHbwi4U
lySM07leso2accSVAz7GDMI3ejj6Sx64asWS1FSwbajDflouaIK2jtey+1IOdXfv
hHY65tomV8U=
=gguT
-----END PGP SIGNATURE-----
Merge tag 'perf-core-2024-07-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance events updates from Ingo Molnar:
- Intel PT support enhancements & fixes
- Fix leaked SIGTRAP events
- Improve and fix the Intel uncore driver
- Add support for Intel HBM and CXL uncore counters
- Add Intel Lake and Arrow Lake support
- AMD uncore driver fixes
- Make SIGTRAP and __perf_pending_irq() work on RT
- Micro-optimizations
- Misc cleanups and fixes
* tag 'perf-core-2024-07-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
perf/x86/intel: Add a distinct name for Granite Rapids
perf/x86/intel/ds: Fix non 0 retire latency on Raptorlake
perf/x86/intel: Hide Topdown metrics events if the feature is not enumerated
perf/x86/intel/uncore: Fix the bits of the CHA extended umask for SPR
perf: Split __perf_pending_irq() out of perf_pending_irq()
perf: Don't disable preemption in perf_pending_task().
perf: Move swevent_htable::recursion into task_struct.
perf: Shrink the size of the recursion counter.
perf: Enqueue SIGTRAP always via task_work.
task_work: Add TWA_NMI_CURRENT as an additional notify mode.
perf: Move irq_work_queue() where the event is prepared.
perf: Fix event leak upon exec and file release
perf: Fix event leak upon exit
task_work: Introduce task_work_cancel() again
task_work: s/task_work_cancel()/task_work_cancel_func()/
perf/x86/amd/uncore: Fix DF and UMC domain identification
perf/x86/amd/uncore: Avoid PMU registration if counters are unavailable
perf/x86/intel: Support Perfmon MSRs aliasing
perf/x86/intel: Support PERFEVTSEL extension
perf/x86: Add config_mask to represent EVENTSEL bitmask
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmaVT+gUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPr0BAA1JDxr3dcRnylZgQYy9LU9fIbqPhH
OnV3LsPvuH+dQujluKqjOSaHpxx5EM/DZ+Heot59eY81S/shYnkEg2Z7TvvCAqBD
2rNjufU4wwisU1RgCTCXMdQ5qgtX+ya5p93Q0r1zRRzkE7qcEZxIbxuN/yRotgiF
ie5FFA5tk7bEwt4MKCqlvfn8p6yu+YZNLINHcBmOYVVlBd1BPoOlK0TlvT4p/dtM
z9RSnpuu52HU35w5UJ0hj5I9i/M1LH6cPID5/Uj2/PXVPJkw5FMwNLPGgKVpmswm
PQZQPsSC7+eKlNLmI98BAfQfYgdk0yShKQkQgnJ1UfZkSx/scLLY/kr7LrbzkOkN
KI5jhCFsQibQ4CWuOD4mjSJJZCIyKj6TPzXTq0meEAePblnoyAVWO6sUP71tCtFn
503a3a/kDpvRqv5BRqjOQsxvLTjPHkZCDfZMTnlSU0+U3Ww/gbcnXmkHh9HdUdgQ
qkyWJmfgiIk2V0aebXchDTxCT3qclvLisxcCJSZm/9EQkc6xKWIQE/YPa4y4UGdk
tI2U8nKwGimEU4oi6dP3DBYFZc3RNBB3iF+iIveHX2k5+TBqNvESYCN6wRY/TBBb
TMFjEYyYDEBwvJhzNxioVuJ7u1gxc3EUMfwQPmuJk2oUFEWxbWXucdGeBbpkddrV
wbUjT9FTVOcS9b0=
=w5mj
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20240715' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
"Two LSM patches focused on cleaning up the inode xattr capability
handling"
* tag 'lsm-pr-20240715' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
selinux,smack: remove the capability checks in the removexattr hooks
lsm: fixup the inode xattr capability handling
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmaVT80UHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPyTBAAhRyIqQ33E8XKkUsfcSkOq9jHynVJ
STXZ5ZKct3KZ4huQRYvUI7t435thaPxz/zPcm9HQesz9gU1BsGzAzQdGcmH1utPh
/bsR042Q4yxpHLaIPbVNCDrR4DFhjY5i+uLVyegZMYmM94HAlk+yWfrDJ/4w3ZXX
5iWw1ezxsER4KDuDCX6mrHnfoBh9DbMYhGmMeTy1pAIs14XIIWNmwXtYw1AUTdSe
Hhyhi0TidUwsLtmWqwBcFW88nka0dpSLiiWWapvTxv7SNp7PMMWLEtGrbpXNpPzl
t6tjhRe/DFLGEXWbWZSfO0bVvxdMRRbQNWw7iTg2xaom3GNzI0upbehkcLoBrFr5
+kjwHR7n3f4CBM/kgpPSQqgHcFc0OIJj0WsMUQ131M7RMeadqCzUDxNzwkOfjUy5
fRexcigzY/PmgLp16/OGEB35DdBSzw3j9FrUkQG0fVTOhSR3tDAWpoYh7GhRVyZV
eRrQhw/BTeK+ryL3bqVvAUAHmOlGZfYsRS4untqt6wO5GS92C9TT4FOFXRKIvmrI
3pjyz2atGg/2U42lWUPJPqe8GNj4NbLgfJBJNOz++wvYjQo+e8i/6IQGdigQhDEY
rJdfwxB7vmH55rEMLVB9swyPi2glAb3F0JDH14UUnGVm5WFKn7UN70T/XkexCno9
dZotwrPoytUjF6o=
=+gTM
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20240715' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux update from Paul Moore:
"A single SELinux patch to change the type of a pre-processor constant
to better match its use"
* tag 'selinux-pr-20240715' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: Use 1UL for EBITMAP_BIT to match maps type
Commit 61df7b8282 ("lsm: fixup the inode xattr capability handling")
moved the responsibility of doing the inode xattr capability checking
out of the individual LSMs and into the LSM framework itself.
Unfortunately, while the original commit added the capability checks
to both the setxattr and removexattr code in the LSM framework, it
only removed the setxattr capability checks from the individual LSMs,
leaving duplicated removexattr capability checks in both the SELinux
and Smack code.
This patch removes the duplicated code from SELinux and Smack.
Fixes: 61df7b8282 ("lsm: fixup the inode xattr capability handling")
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
A proper task_work_cancel() API that actually cancels a callback and not
*any* callback pointing to a given function is going to be needed for
perf events event freeing. Do the appropriate rename to prepare for
that.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240621091601.18227-2-frederic@kernel.org
When defined using bit-fields, the compiler takes care of packing the
bits in a memory-efficient way and frees us from defining
LANDLOCK_SHIFT_ACCESS_* by hand. The exact memory layout does not
matter in our use case.
The manual definition of LANDLOCK_SHIFT_ACCESS_* has resulted in bugs in
at least two recent patch sets [1] [2] where new kinds of handled access
rights were introduced.
Cc: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
Cc: Tahera Fahimi <fahimitahera@gmail.com>
Link: https://lore.kernel.org/r/ebd680cc-25d6-ee14-4856-310f5e5e28e4@huawei-partners.com [1]
Link: https://lore.kernel.org/r/ZmLEoBfHyUR3nKAV@google.com [2]
Signed-off-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20240610082115.1693267-1-gnoack@google.com
Signed-off-by: Mickaël Salaün <mic@digikod.net>
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQQdXVVFGN5XqKr1Hj7LwZzRsCrn5QUCZohO5hQcem9oYXJAbGlu
dXguaWJtLmNvbQAKCRDLwZzRsCrn5dwCAQD981vQUMZysfy6xpQhVDjrn34tt/F9
lBDvrFuLWmiP7QD/c6p6RBNgxDlyj0wn2XA9/AhObpppC2yl4lvieZ6IwwE=
=exIi
-----END PGP SIGNATURE-----
Merge tag 'integrity-v6.10-fix' of ssh://ra.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity fix from Mimi Zohar:
"A single bug fix to properly remove all of the securityfs IMA
measurement lists"
* tag 'integrity-v6.10-fix' of ssh://ra.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: fix wrong zero-assignment during securityfs dentry remove
This patch modifies the definition of EBITMAP_BIT in
security/selinux/ss/ebitmap.h from 1ULL to 1UL to match the type
of elements in the ebitmap_node maps array.
This change does not affect the functionality or correctness of
the code but aims to enhance code quality by adhering to good
programming practices and avoiding unnecessary type conversions.
Signed-off-by: Canfeng Guo <guocanfeng@uniontech.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
During kbuild, with W=1, modpost will warn when a module doesn't have
a MODULE_DESCRIPTION(). The encrypted-keys module does not have a
MODULE_DESCRIPTION(). But currently, even with an allmodconfig
configuration, this module is built-in, and as a result, kbuild does
not currently warn about the missing MODULE_DESCRIPTION().
However, just in case it is built as a module in the future, add the
missing MODULE_DESCRIPTION() macro invocation.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
kbuild reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in security/keys/trusted-keys/trusted.o
Add the missing MODULE_DESCRIPTION() macro invocation.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
When a process accept()s connection from a unix socket
(either stream or seqpacket)
it gets the socket with the label of the connecting process.
For example, if a connecting process has a label 'foo',
the accept()ed socket will also have 'in' and 'out' labels 'foo',
regardless of the label of the listener process.
This is because kernel creates unix child sockets
in the context of the connecting process.
I do not see any obvious way for the listener to abuse
alien labels coming with the new socket, but,
to be on the safe side, it's better fix new socket labels.
Signed-off-by: Konstantin Andreev <andreev@swemel.ru>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmZwh44UHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNRBA//Q09J/SADHi63fjpStx+Gvo5h6TbM
L4gqYsjxpi7CfXFwlBtFRjk9Q0osRDxbDWTuZ8gMcJONlRdHpZFil2gYSEacImsn
tAkrQpV32U1oNua+kgoIkQTHwNIKjA9odYZ4pyJ0AZvnB5Z62B841r8GAaTADg++
fGOuCBYZeuioCAjPUN2KZtkCKdhiu823Gwe2z9U6SJyCdPqRFjpBuumDoNvCTrCB
UJuc5DqWSNk2rZXZQG6RSLeOOZZwRf9s2ATU96T/9Lp0m6qqxPPisHkWscjhx5Ve
W7z2IWGFrNzJ8ABKwBK/NUMQbs3WzsepyPqZdoo//PkhMjQlfb+5iPitJWM6qmdM
6jgj2HkDzX2OtR9u6VOcOKKwz4NQnf4JcHRUDjq8vQ3eKYOTcDLx4VR8O/Ullmhf
pZL4klNXpBrw7DLYurTlpbm9jUmMCev9DvuSYJmyRjq7jA+8Cph6+clGriIbljqn
9hCqSnbufDxySwB0unYu9zwnC4bN+Yzcgr4qYFoA+zdj5eYloaJvPhwOh6MPsQaO
DJlCt6Wfw4SqD3afxaJnzw4/SBRuPA8ISoxTXVJUg7Q+NfUI8HBDO4YihiqJ7cm0
yvD0mFvweJVEpX2slDyob58xYgkmL8TaIPErJ9A/EO30W0nm+nQzXDR+cOa9VqAc
txcTscOv5YMLLMk=
=nYky
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20240617' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm fix from Paul Moore:
"A single LSM/IMA patch to fix a problem caused by sleeping while in a
RCU critical section"
* tag 'lsm-pr-20240617' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
ima: Avoid blocking in RCU read-side critical section
Mainly MM singleton fixes. And a couple of ocfs2 regression fixes.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZnCEQAAKCRDdBJ7gKXxA
jmgSAQDk3BYs1n67cnwx/Zi04yMYDyfYTCYg2udPfT2a+GpmbwD+N5dJd/vCztXH
5eLpP11xd/yr2+I9FefyZeUuA80KtgQ=
=2agY
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2024-06-17-11-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Mainly MM singleton fixes. And a couple of ocfs2 regression fixes"
* tag 'mm-hotfixes-stable-2024-06-17-11-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
kcov: don't lose track of remote references during softirqs
mm: shmem: fix getting incorrect lruvec when replacing a shmem folio
mm/debug_vm_pgtable: drop RANDOM_ORVALUE trick
mm: fix possible OOB in numa_rebuild_large_mapping()
mm/migrate: fix kernel BUG at mm/compaction.c:2761!
selftests: mm: make map_fixed_noreplace test names stable
mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC
mm: mmap: allow for the maximum number of bits for randomizing mmap_base by default
gcov: add support for GCC 14
zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING
mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
lib/alloc_tag: fix RCU imbalance in pgalloc_tag_get()
lib/alloc_tag: do not register sysctl interface when CONFIG_SYSCTL=n
MAINTAINERS: remove Lorenzo as vmalloc reviewer
Revert "mm: init_mlocked_on_free_v3"
mm/page_table_check: fix crash on ZONE_DEVICE
gcc: disable '-Warray-bounds' for gcc-9
ocfs2: fix NULL pointer dereference in ocfs2_abort_trigger()
ocfs2: fix NULL pointer dereference in ocfs2_journal_dirty()
There was insufficient review and no agreement that this is the right
approach.
There are serious flaws with the implementation that make processes using
mlock() not even work with simple fork() [1] and we get reliable crashes
when rebooting.
Further, simply because we might be unmapping a single PTE of a large
mlocked folio, we shouldn't zero out the whole folio.
... especially because the code can also *corrupt* urelated memory because
kernel_init_pages(page, folio_nr_pages(folio));
Could end up writing outside of the actual folio if we work with a tail
page.
Let's revert it. Once there is agreement that this is the right approach,
the issues were fixed and there was reasonable review and proper testing,
we can consider it again.
[1] https://lkml.kernel.org/r/4da9da2f-73e4-45fd-b62f-a8a513314057@redhat.com
Link: https://lkml.kernel.org/r/20240605091710.38961-1-david@redhat.com
Fixes: ba42b524a0 ("mm: init_mlocked_on_free_v3")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: David Wang <00107082@163.com>
Closes: https://lore.kernel.org/lkml/20240528151340.4282-1-00107082@163.com/
Reported-by: Lance Yang <ioworker0@gmail.com>
Closes: https://lkml.kernel.org/r/20240601140917.43562-1-ioworker0@gmail.com
Acked-by: Lance Yang <ioworker0@gmail.com>
Cc: York Jasper Niebuhr <yjnworkstation@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The SM2 algorithm has a single user in the kernel. However, it's
never been integrated properly with that user: asymmetric_keys.
The crux of the issue is that the way it computes its digest with
sm3 does not fit into the architecture of asymmetric_keys. As no
solution has been proposed, remove this algorithm.
It can be resubmitted when it is integrated properly into the
asymmetric_keys subsystem.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Document the unused function parameter of yama_relation_cleanup() to
please kernel doc warnings.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Reviewed-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240315125418.273104-2-cgzones@googlemail.com
Signed-off-by: Kees Cook <kees@kernel.org>
Currently, Smack mirrors the label of incoming tcp/ipv4 connections:
when a label 'foo' connects to a label 'bar' with tcp/ipv4,
'foo' always gets 'foo' in returned ipv4 packets. So,
1) returned packets are incorrectly labeled ('foo' instead of 'bar')
2) 'bar' can write to 'foo' without being authorized to write.
Here is a scenario how to see this:
* Take two machines, let's call them C and S,
with active Smack in the default state
(no settings, no rules, no labeled hosts, only builtin labels)
* At S, add Smack rule 'foo bar w'
(labels 'foo' and 'bar' are instantiated at S at this moment)
* At S, at label 'bar', launch a program
that listens for incoming tcp/ipv4 connections
* From C, at label 'foo', connect to the listener at S.
(label 'foo' is instantiated at C at this moment)
Connection succeedes and works.
* Send some data in both directions.
* Collect network traffic of this connection.
All packets in both directions are labeled with the CIPSO
of the label 'foo'. Hence, label 'bar' writes to 'foo' without
being authorized, and even without ever being known at C.
If anybody cares: exactly the same happens with DCCP.
This behavior 1st manifested in release 2.6.29.4 (see Fixes below)
and it looks unintentional. At least, no explanation was provided.
I changed returned packes label into the 'bar',
to bring it into line with the Smack documentation claims.
Signed-off-by: Konstantin Andreev <andreev@swemel.ru>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
The current security_inode_setxattr() and security_inode_removexattr()
hooks rely on individual LSMs to either call into the associated
capability hooks (cap_inode_setxattr() or cap_inode_removexattr()), or
return a magic value of 1 to indicate that the LSM layer itself should
perform the capability checks. Unfortunately, with the default return
value for these LSM hooks being 0, an individual LSM hook returning a
1 will cause the LSM hook processing to exit early, potentially
skipping a LSM. Thankfully, with the exception of the BPF LSM, none
of the LSMs which currently register inode xattr hooks should end up
returning a value of 1, and in the BPF LSM case, with the BPF LSM hooks
executing last there should be no real harm in stopping processing of
the LSM hooks. However, the reliance on the individual LSMs to either
call the capability hooks themselves, or signal the LSM with a return
value of 1, is fragile and relies on a specific set of LSMs being
enabled. This patch is an effort to resolve, or minimize, these
issues.
Before we discuss the solution, there are a few observations and
considerations that we need to take into account:
* BPF LSM registers an implementation for every LSM hook, and that
implementation simply returns the hook's default return value, a
0 in this case. We want to ensure that the default BPF LSM behavior
results in the capability checks being called.
* SELinux and Smack do not expect the traditional capability checks
to be applied to the xattrs that they "own".
* SELinux and Smack are currently written in such a way that the
xattr capability checks happen before any additional LSM specific
access control checks. SELinux does apply SELinux specific access
controls to all xattrs, even those not "owned" by SELinux.
* IMA and EVM also register xattr hooks but assume that the LSM layer
and specific LSMs have already authorized the basic xattr operation.
In order to ensure we perform the capability based access controls
before the individual LSM access controls, perform only one capability
access control check for each operation, and clarify the logic around
applying the capability controls, we need a mechanism to determine if
any of the enabled LSMs "own" a particular xattr and want to take
responsibility for controlling access to that xattr. The solution in
this patch is to create a new LSM hook, 'inode_xattr_skipcap', that is
not exported to the rest of the kernel via a security_XXX() function,
but is used by the LSM layer to determine if a LSM wants to control
access to a given xattr and avoid the traditional capability controls.
Registering an inode_xattr_skipcap hook is optional, if a LSM declines
to register an implementation, or uses an implementation that simply
returns the default value (0), there is no effect as the LSM continues
to enforce the capability based controls (unless another LSM takes
ownership of the xattr). If none of the LSMs signal that the
capability checks should be skipped, the capability check is performed
and if access is granted the individual LSM xattr access control hooks
are executed, keeping with the DAC-before-LSM convention.
Cc: stable@vger.kernel.org
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
In case of error during ima_fs_init() all the dentry already created
are removed. {ascii, binary}_securityfs_measurement_lists are freed
calling for each array the remove_securityfs_measurement_lists(). This
function, at the end, assigns to zero the securityfs_measurement_list_count.
This causes during the second call of remove_securityfs_measurement_lists()
to leave the dentry of the array pending, not removing them correctly,
because the securityfs_measurement_list_count is already zero.
Move the securityfs_measurement_list_count = 0 after the two
remove_securityfs_measurement_lists() calls to correctly remove all the
dentry already allocated.
Fixes: 9fa8e76250 ("ima: add crypto agility support for template-hash algorithm")
Signed-off-by: Enrico Bravi <enrico.bravi@polito.it>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
When asn1_encode_sequence() fails, WARN is not the correct solution.
1. asn1_encode_sequence() is not an internal function (located
in lib/asn1_encode.c).
2. Location is known, which makes the stack trace useless.
3. Results a crash if panic_on_warn is set.
It is also noteworthy that the use of WARN is undocumented, and it
should be avoided unless there is a carefully considered rationale to
use it.
Replace WARN with pr_err, and print the return value instead, which is
only useful piece of information.
Cc: stable@vger.kernel.org # v5.13+
Fixes: f221974525 ("security: keys: trusted: use ASN.1 TPM2 key format for the blobs")
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
'scratch' is never freed. Fix this by calling kfree() in the success, and
in the error case.
Cc: stable@vger.kernel.org # +v5.13
Fixes: f221974525 ("security: keys: trusted: use ASN.1 TPM2 key format for the blobs")
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
documented (hopefully adequately) in the respective changelogs. Notable
series include:
- Lucas Stach has provided some page-mapping
cleanup/consolidation/maintainability work in the series "mm/treewide:
Remove pXd_huge() API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one
test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being allocated:
number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in largely
similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene" Johannes
Weiner has fixed the page allocator's handling of migratetype requests,
with resulting improvements in compaction efficiency.
- In the series "make the hugetlb migration strategy consistent" Baolin
Wang has fixed a hugetlb migration issue, which should improve hugetlb
allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when memory
almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting" Kairui
Song has optimized pagecache insertion, yielding ~10% performance
improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initializatio code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various page->flags
cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb
functions to work on folis".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series
"mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs. This
is a simple first-cut implementation for now. The series is "support
multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in the
series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts in
the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call it
GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to
use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes
the initialization code so that migration between different memory types
works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant driver
in the series "mm: follow_pte() improvements and acrn follow_pte()
fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to folio
in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size THP's
in the series "mm: add per-order mTHP alloc and swpout counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled
and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His series
"mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes
the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback instrumentation
in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series "Fix
and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the
series "Improve anon_vma scalability for anon VMAs". Intel's test bot
reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests ni the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZkgQYwAKCRDdBJ7gKXxA
jrdKAP9WVJdpEcXxpoub/vVE0UWGtffr8foifi9bCwrQrGh5mgEAx7Yf0+d/oBZB
nvA4E0DcPrUAFy144FNM0NTCb7u9vAw=
=V3R/
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:
"The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs.
Notable series include:
- Lucas Stach has provided some page-mapping cleanup/consolidation/
maintainability work in the series "mm/treewide: Remove pXd_huge()
API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
one test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being
allocated: number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in
largely similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene"
Johannes Weiner has fixed the page allocator's handling of
migratetype requests, with resulting improvements in compaction
efficiency.
- In the series "make the hugetlb migration strategy consistent"
Baolin Wang has fixed a hugetlb migration issue, which should
improve hugetlb allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when
memory almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting"
Kairui Song has optimized pagecache insertion, yielding ~10%
performance improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initializatio code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various
page->flags cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series:
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert
hugetlb functions to work on folis".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
series "mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs.
This is a simple first-cut implementation for now. The series is
"support multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in
the series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts
in the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call
it GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
path to use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes".
Fixes the initialization code so that migration between different
memory types works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant
driver in the series "mm: follow_pte() improvements and acrn
follow_pte() fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to
folio in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size
THP's in the series "mm: add per-order mTHP alloc and swpout
counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap
same-filled and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His
series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
optimizes the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback
instrumentation in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series
"Fix and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
the series "Improve anon_vma scalability for anon VMAs". Intel's
test bot reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests ni the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as
XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking""
* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
selftests: cgroup: add tests to verify the zswap writeback path
mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
mm/damon/core: fix return value from damos_wmark_metric_value
mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
selftests: cgroup: remove redundant enabling of memory controller
Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
Docs/mm/damon/design: use a list for supported filters
Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
selftests/damon: classify tests for functionalities and regressions
selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
selftests/damon: add a test for DAMOS quota goal
...
If modules are built compressed, and LoadPin is enforcing by default, we
must have in-kernel module decompression enabled (MODULE_DECOMPRESS).
Modules will fail to load without decompression built into the kernel
because they'll be blocked by LoadPin. Add a depends on clause to
prevent this combination.
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20240514224839.2526112-1-swboyd@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
- Avoid 'constexpr', which is a keyword in C23
- Allow 'dtbs_check' and 'dt_compatible_check' run independently of
'dt_binding_check'
- Fix weak references to avoid GOT entries in position-independent
code generation
- Convert the last use of 'optional' property in arch/sh/Kconfig
- Remove support for the 'optional' property in Kconfig
- Remove support for Clang's ThinLTO caching, which does not work with
the .incbin directive
- Change the semantics of $(src) so it always points to the source
directory, which fixes Makefile inconsistencies between upstream and
downstream
- Fix 'make tar-pkg' for RISC-V to produce a consistent package
- Provide reasonable default coverage for objtool, sanitizers, and
profilers
- Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc.
- Remove the last use of tristate choice in drivers/rapidio/Kconfig
- Various cleanups and fixes in Kconfig
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmZFlGcVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsG8voQALC8NtFpduWVfLRj2Qg6Ll/xf1vX
2igcTJEOFHkeqXLGoT8dTDKLEipUBUvKyguPq66CGwVTe2g6zy/nUSXeVtFrUsIa
msLTi8FqhqUo5lodNvGMRf8qqmuqcvnXoiQwIocF92jtsFy14bhiFY+n4HfcFNjj
GOKwqBZYQUwY/VVb090efc7RfS9c7uwABJSBelSoxg3AGZriwjGy7Pw5aSKGgVYi
inqL1eR6qwPP6z7CgQWM99soP+zwybFZmnQrsD9SniRBI4rtAat8Ih5jQFaSUFUQ
lk2w0NQBRFN88/uR2IJ2GWuIlQ74WeJ+QnCqVuQ59tV5zw90wqSmLzngfPD057Dv
JjNuhk0UyXVtpIg3lRtd4810ppNSTe33b9OM4O2H846W/crju5oDRNDHcflUXcwm
Rmn5ho1rb5QVzDVejJbgwidnUInSgJ9PZcvXQ/RJVZPhpgsBzAY9pQexG1G3hviw
y9UDrt6KP6bF9tHjmolmtdIes9Pj0c4dN6/Rdj4HS4hIQ/GDar0tnwvOvtfUctNL
orJlBsA6GeMmDVXKkR0ytOCWRYqWWbyt8g70RVKQJfuHX7/hGyAQPaQ2/u4mQhC2
aevYfbNJMj0VDfGz81HDBKFtkc5n+Ite8l157dHEl2LEabkOkRdNVcn7SNbOvZmd
ZCSnZ31h7woGfNho
=D5B/
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Avoid 'constexpr', which is a keyword in C23
- Allow 'dtbs_check' and 'dt_compatible_check' run independently of
'dt_binding_check'
- Fix weak references to avoid GOT entries in position-independent code
generation
- Convert the last use of 'optional' property in arch/sh/Kconfig
- Remove support for the 'optional' property in Kconfig
- Remove support for Clang's ThinLTO caching, which does not work with
the .incbin directive
- Change the semantics of $(src) so it always points to the source
directory, which fixes Makefile inconsistencies between upstream and
downstream
- Fix 'make tar-pkg' for RISC-V to produce a consistent package
- Provide reasonable default coverage for objtool, sanitizers, and
profilers
- Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc.
- Remove the last use of tristate choice in drivers/rapidio/Kconfig
- Various cleanups and fixes in Kconfig
* tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (46 commits)
kconfig: use sym_get_choice_menu() in sym_check_prop()
rapidio: remove choice for enumeration
kconfig: lxdialog: remove initialization with A_NORMAL
kconfig: m/nconf: merge two item_add_str() calls
kconfig: m/nconf: remove dead code to display value of bool choice
kconfig: m/nconf: remove dead code to display children of choice members
kconfig: gconf: show checkbox for choice correctly
kbuild: use GCOV_PROFILE and KCSAN_SANITIZE in scripts/Makefile.modfinal
Makefile: remove redundant tool coverage variables
kbuild: provide reasonable defaults for tool coverage
modules: Drop the .export_symbol section from the final modules
kconfig: use menu_list_for_each_sym() in sym_check_choice_deps()
kconfig: use sym_get_choice_menu() in conf_write_defconfig()
kconfig: add sym_get_choice_menu() helper
kconfig: turn defaults and additional prompt for choice members into error
kconfig: turn missing prompt for choice members into error
kconfig: turn conf_choice() into void function
kconfig: use linked list in sym_set_changed()
kconfig: gconf: use MENU_CHANGED instead of SYMBOL_CHANGED
kconfig: gconf: remove debug code
...
-----BEGIN PGP SIGNATURE-----
iIYEABYKAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCZkYGahAcbWljQGRpZ2lr
b2QubmV0AAoJEOXj0OiMgvbSaxUBAL1USJI2DUo+hJTDHqXdVZGEE6VfYnmEhIur
NZfsti4UAQCabAej7+NHzh1jAnnKMJFbcyl4LRy99anJGlEWq9ExDA==
=fViz
-----END PGP SIGNATURE-----
Merge tag 'landlock-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
"This brings ioctl control to Landlock, contributed by Günther Noack.
This also adds him as a Landlock reviewer, and fixes an issue in the
sample"
* tag 'landlock-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
MAINTAINERS: Add Günther Noack as Landlock reviewer
fs/ioctl: Add a comment to keep the logic in sync with LSM policies
MAINTAINERS: Notify Landlock maintainers about changes to fs/ioctl.c
landlock: Document IOCTL support
samples/landlock: Add support for LANDLOCK_ACCESS_FS_IOCTL_DEV
selftests/landlock: Exhaustive test for the IOCTL allow-list
selftests/landlock: Check IOCTL restrictions for named UNIX domain sockets
selftests/landlock: Test IOCTLs on named pipes
selftests/landlock: Test ioctl(2) and ftruncate(2) with open(O_PATH)
selftests/landlock: Test IOCTL with memfds
selftests/landlock: Test IOCTL support
landlock: Add IOCTL access right for character and block devices
samples/landlock: Fix incorrect free in populate_ruleset_net
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQQdXVVFGN5XqKr1Hj7LwZzRsCrn5QUCZkQD2xQcem9oYXJAbGlu
dXguaWJtLmNvbQAKCRDLwZzRsCrn5b81APwINCoiLDB0L9KkyUhQjqvLLZCS85u9
Qo3/wdUGAb2tygD/RYKJgtGxvqO1Ap1IysytKHId0yo+vokdXG5stpMPJQE=
=dIrw
-----END PGP SIGNATURE-----
Merge tag 'integrity-v6.10' of ssh://ra.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar:
"Two IMA changes, one EVM change, a use after free bug fix, and a code
cleanup to address "-Wflex-array-member-not-at-end" warnings:
- The existing IMA {ascii, binary}_runtime_measurements lists include
a hard coded SHA1 hash. To address this limitation, define per TPM
enabled hash algorithm {ascii, binary}_runtime_measurements lists
- Close an IMA integrity init_module syscall measurement gap by
defining a new critical-data record
- Enable (partial) EVM support on stacked filesystems (overlayfs).
Only EVM portable & immutable file signatures are copied up, since
they do not contain filesystem specific metadata"
* tag 'integrity-v6.10' of ssh://ra.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: add crypto agility support for template-hash algorithm
evm: Rename is_unsupported_fs to is_unsupported_hmac_fs
fs: Rename SB_I_EVM_UNSUPPORTED to SB_I_EVM_HMAC_UNSUPPORTED
evm: Enforce signatures on unsupported filesystem for EVM_INIT_X509
ima: re-evaluate file integrity on file metadata change
evm: Store and detect metadata inode attributes changes
ima: Move file-change detection variables into new structure
evm: Use the metadata inode to calculate metadata hash
evm: Implement per signature type decision in security_inode_copy_up_xattr
security: allow finer granularity in permitting copy-up of security xattrs
ima: Rename backing_inode to real_inode
integrity: Avoid -Wflex-array-member-not-at-end warnings
ima: define an init_module critical data record
ima: Fix use-after-free on a dentry's dname.name
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmZCWd4UHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMFMQ/8DlQhbe+NzeAjBg2O7RqNKoE/UU2l
JAkzgQ7h35/sgsfDIEhPv0VOzq2cwLSaw5/qSGGrQDWR4AKz55HvBSJQDElgzPo2
wtL2lozjJ03+P391aeXEHdkqzgp2F3c6oNthIcFvRmi3EVcWr7eqnjWU+bJzFtMA
z4IgX01yiUF6EMdTJcWjSLclk3dOqHWPV657RadYyp2pr3u9rc9gqgxichs0dQW0
X/vexUBg81TdSeHS+u9Jgrz+TCsQdP3CHSFibXDv1rX89bqCbG9wYNTyTBHH9f0j
LPpV2bKaesxIZV6dZyIKKSqCtHGBcjcJEjUbl7JFkQOS/a8NPMDH2ZYkHqbyq4BH
SX+kRhthmXZiZ2IWhMRikaE4FBIWI7JAC4riKB2Dd0RzJrNEgaawU0wM+evKai9F
s3GCk/P7wnTT4GlDHSkYEiiRF9c3DNkfVb9mqG78PRMR4X+z/0dsX5iaJTwlyMxo
69JSoTNzMxVdDtvD+K89V7GCm8NmGFjLwQUqb282L/6ibzC/5CCUQXn8ZWvNRPql
IUoQhKbveCWR3m9vAMeGpE61oNgFp0mHj3Zb/vE5Yu7HxOn2y1lFv2XGxR6bGyTF
T53AaHM15uFomhqhG+iM3b4FDhNIJpbUc4MJsq3iMgFC941n1ceXyeBEgf31+Z2x
k1ZlxqLAWn05auA=
=2Wbi
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20240513' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
- Attempt to pre-allocate the SELinux status page so it doesn't appear
to userspace that we are skipping SELinux policy sequence numbers
- Reject invalid SELinux policy bitmaps with an error at policy load
time
- Consistently use the same type, u32, for ebitmap offsets
- Improve the "symhash" hash function for better distribution on common
policies
- Correct a number of printk format specifiers in the ebitmap code
- Improved error checking in sel_write_load()
- Ensure we have a proper return code in the
filename_trans_read_helper_compat() function
- Make better use of the current_sid() helper function
- Allow for more hash table statistics when debugging is enabled
- Migrate from printk_ratelimit() to pr_warn_ratelimited()
- Miscellaneous cleanups and tweaks to selinux_lsm_getattr()
- More consitification work in the conditional policy space
* tag 'selinux-pr-20240513' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: constify source policy in cond_policydb_dup()
selinux: avoid printk_ratelimit()
selinux: pre-allocate the status page
selinux: clarify return code in filename_trans_read_helper_compat()
selinux: use u32 as bit position type in ebitmap code
selinux: improve symtab string hashing
selinux: dump statistics for more hash tables
selinux: make more use of current_sid()
selinux: update numeric format specifiers for ebitmaps
selinux: improve error checking in sel_write_load()
selinux: cleanup selinux_lsm_getattr()
selinux: reject invalid ebitmaps
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmZCWeoUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOb/Q//V68LTU4pJF/2DaxC4rA9mDXC/iey
AF1v8ajzWOuPA9X0xTkYRKkXY7llgqVCjJmKhtqA9elPEfZyRXhvZikoBnfb7TvS
dh3nEh/3u2I09PnewBZdj4Bq1owQ+JDfxVzTpMQ433lNri23zcuszI8IB3ePvHe2
CoMs/UxbB5D6JEBOIiK1GFqiv2gt+293cw0RorD4DmUinAPU1WOqk9pdxuIwwONM
PyJ4kHcQJ7aU6nAfSPQbFzrqaZCvoy2gKaCtslQ9ASlOWRtFbSbccKu29g8vbPcg
Pql70xzYcjr5iXcSpD3d84LHzFOYBuSWrdqRFeP4w9tZzCAYGTlfvltAnQImbg2B
GbDFogNB2hZbUq2zf9hhXcmhGVr4tqlxeujsGPCiivZfVSPR/9VyWvL9dH6R/ooE
YZbmc1DNSNPkvozbejJ48rcI+nb5iCe7FVMeKODSvQFMpzAn4WsCb+svMhRzQ1h4
f1XkJ9TGxNEkBoVd0LpOfiPgB+HsFWRy/h9I5lQdAYr/H1MU4bAnKT761k8zPdXr
Ne03zv0uvoOPHp5ObeKc5krWjJsuRffT/r9hegUiWfww/sANo7dfx2Oo22YoTo86
siK27qa/tiWn11/WGM8ozW3Ni8GcU496gO4n7RkVrVjdzpl6JG42CtRwaNYM7q6v
1Ntmg9H/wZllMg4=
=F3Tz
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20240513' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
- The security/* portion of the effort to remove the empty sentinel
elements at the end of the ctl_table arrays
- Update the file list associated with the LSM / "SECURITY SUBSYSTEM"
entry in the MAINTAINERS file (and then fix a typo in then update)
* tag 'lsm-pr-20240513' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
MAINTAINERS: repair file entry in SECURITY SUBSYSTEM
MAINTAINERS: update the LSM file list
lsm: remove the now superfluous sentinel element from ctl_table array
Core & protocols
----------------
- Complete rework of garbage collection of AF_UNIX sockets.
AF_UNIX is prone to forming reference count cycles due to fd passing
functionality. New method based on Tarjan's Strongly Connected Components
algorithm should be both faster and remove a lot of workarounds
we accumulated over the years.
- Add TCP fraglist GRO support, allowing chaining multiple TCP packets
and forwarding them together. Useful for small switches / routers which
lack basic checksum offload in some scenarios (e.g. PPPoE).
- Support using SMP threads for handling packet backlog i.e. packet
processing from software interfaces and old drivers which don't
use NAPI. This helps move the processing out of the softirq jumble.
- Continue work of converting from rtnl lock to RCU protection.
Don't require rtnl lock when reading: IPv6 routing FIB, IPv6 address
labels, netdev threaded NAPI sysfs files, bonding driver's sysfs files,
MPLS devconf, IPv4 FIB rules, netns IDs, tcp metrics, TC Qdiscs,
neighbor entries, ARP entries via ioctl(SIOCGARP), a lot of the link
information available via rtnetlink.
- Small optimizations from Eric to UDP wake up handling, memory accounting,
RPS/RFS implementation, TCP packet sizing etc.
- Allow direct page recycling in the bulk API used by XDP, for +2% PPS.
- Support peek with an offset on TCP sockets.
- Add MPTCP APIs for querying last time packets were received/sent/acked,
and whether MPTCP "upgrade" succeeded on a TCP socket.
- Add intra-node communication shortcut to improve SMC performance.
- Add IPv6 (and IPv{4,6}-over-IPv{4,6}) support to the GTP protocol driver.
- Add HSR-SAN (RedBOX) mode of operation to the HSR protocol driver.
- Add reset reasons for tracing what caused a TCP reset to be sent.
- Introduce direction attribute for xfrm (IPSec) states.
State can be used either for input or output packet processing.
Things we sprinkled into general kernel code
--------------------------------------------
- Add bitmap_{read,write}(), bitmap_size(), expose BYTES_TO_BITS().
This required touch-ups and renaming of a few existing users.
- Add Endian-dependent __counted_by_{le,be} annotations.
- Make building selftests "quieter" by printing summaries like
"CC object.o" rather than full commands with all the arguments.
Netfilter
---------
- Use GFP_KERNEL to clone elements, to deal better with OOM situations
and avoid failures in the .commit step.
BPF
---
- Add eBPF JIT for ARCv2 CPUs.
- Support attaching kprobe BPF programs through kprobe_multi link in
a session mode, meaning, a BPF program is attached to both function entry
and return, the entry program can decide if the return program gets
executed and the entry program can share u64 cookie value with return
program. "Session mode" is a common use-case for tetragon and bpftrace.
- Add the ability to specify and retrieve BPF cookie for raw tracepoint
programs in order to ease migration from classic to raw tracepoints.
- Add an internal-only BPF per-CPU instruction for resolving per-CPU
memory addresses and implement support in x86, ARM64 and RISC-V JITs.
This allows inlining functions which need to access per-CPU state.
- Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
atomics in bpf_arena which can be JITed as a single x86 instruction.
Support BPF arena on ARM64.
- Add a new bpf_wq API for deferring events and refactor process-context
bpf_timer code to keep common code where possible.
- Harden the BPF verifier's and/or/xor value tracking.
- Introduce crypto kfuncs to let BPF programs call kernel crypto APIs.
- Support bpf_tail_call_static() helper for BPF programs with GCC 13.
- Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
program to have code sections where preemption is disabled.
Driver API
----------
- Skip software TC processing completely if all installed rules are
marked as HW-only, instead of checking the HW-only flag rule by rule.
- Add support for configuring PoE (Power over Ethernet), similar to
the already existing support for PoDL (Power over Data Line) config.
- Initial bits of a queue control API, for now allowing a single queue
to be reset without disturbing packet flow to other queues.
- Common (ethtool) statistics for hardware timestamping.
Tests and tooling
-----------------
- Remove the need to create a config file to run the net forwarding tests
so that a naive "make run_tests" can exercise them.
- Define a method of writing tests which require an external endpoint
to communicate with (to send/receive data towards the test machine).
Add a few such tests.
- Create a shared code library for writing Python tests. Expose the YAML
Netlink library from tools/ to the tests for easy Netlink access.
- Move netfilter tests under net/, extend them, separate performance tests
from correctness tests, and iron out issues found by running them
"on every commit".
- Refactor BPF selftests to use common network helpers.
- Further work filling in YAML definitions of Netlink messages for:
nftables, team driver, bonding interfaces, vlan interfaces, VF info,
TC u32 mark, TC police action.
- Teach Python YAML Netlink to decode attribute policies.
- Extend the definition of the "indexed array" construct in the specs
to cover arrays of scalars rather than just nests.
- Add hyperlinks between definitions in generated Netlink docs.
Drivers
-------
- Make sure unsupported flower control flags are rejected by drivers,
and make more drivers report errors directly to the application rather
than dmesg (large number of driver changes from Asbjørn Sloth Tønnesen).
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support multiple RSS contexts and steering traffic to them
- support XDP metadata
- make page pool allocations more NUMA aware
- Intel (100G, ice, idpf):
- extract datapath code common among Intel drivers into a library
- use fewer resources in switchdev by sharing queues with the PF
- add PFCP filter support
- add Ethernet filter support
- use a spinlock instead of HW lock in PTP clock ops
- support 5 layer Tx scheduler topology
- nVidia/Mellanox:
- 800G link modes and 100G SerDes speeds
- per-queue IRQ coalescing configuration
- Marvell Octeon:
- support offloading TC packet mark action
- Ethernet NICs consumer, embedded and virtual:
- stop lying about skb->truesize in USB Ethernet drivers, it messes up
TCP memory calculations
- Google cloud vNIC:
- support changing ring size via ethtool
- support ring reset using the queue control API
- VirtIO net:
- expose flow hash from RSS to XDP
- per-queue statistics
- add selftests
- Synopsys (stmmac):
- support controllers which require an RX clock signal from the MII
bus to perform their hardware initialization
- TI:
- icssg_prueth: support ICSSG-based Ethernet on AM65x SR1.0 devices
- icssg_prueth: add SW TX / RX Coalescing based on hrtimers
- cpsw: minimal XDP support
- Renesas (ravb):
- support describing the MDIO bus
- Realtek (r8169):
- add support for RTL8168M
- Microchip Sparx5:
- matchall and flower actions mirred and redirect
- Ethernet switches:
- nVidia/Mellanox:
- improve events processing performance
- Marvell:
- add support for MV88E6250 family internal PHYs
- Microchip:
- add DCB and DSCP mapping support for KSZ switches
- vsc73xx: convert to PHYLINK
- Realtek:
- rtl8226b/rtl8221b: add C45 instances and SerDes switching
- Many driver changes related to PHYLIB and PHYLINK deprecated API cleanup.
- Ethernet PHYs:
- Add a new driver for Airoha EN8811H 2.5 Gigabit PHY.
- micrel: lan8814: add support for PPS out and external timestamp trigger
- WiFi:
- Disable Wireless Extensions (WEXT) in all Wi-Fi 7 devices drivers.
Modern devices can only be configured using nl80211.
- mac80211/cfg80211
- handle color change per link for WiFi 7 Multi-Link Operation
- Intel (iwlwifi):
- don't support puncturing in 5 GHz
- support monitor mode on passive channels
- BZ-W device support
- P2P with HE/EHT support
- re-add support for firmware API 90
- provide channel survey information for Automatic Channel Selection
- MediaTek (mt76):
- mt7921 LED control
- mt7925 EHT radiotap support
- mt7920e PCI support
- Qualcomm (ath11k):
- P2P support for QCA6390, WCN6855 and QCA2066
- support hibernation
- ieee80211-freq-limit Device Tree property support
- Qualcomm (ath12k):
- refactoring in preparation of multi-link support
- suspend and hibernation support
- ACPI support
- debugfs support, including dfs_simulate_radar support
- RealTek:
- rtw88: RTL8723CS SDIO device support
- rtw89: RTL8922AE Wi-Fi 7 PCI device support
- rtw89: complete features of new WiFi 7 chip 8922AE including
BT-coexistence and Wake-on-WLAN
- rtw89: use BIOS ACPI settings to set TX power and channels
- rtl8xxxu: enable Management Frame Protection (MFP) support
- Bluetooth:
- support for Intel BlazarI and Filmore Peak2 (BE201)
- support for MediaTek MT7921S SDIO
- initial support for Intel PCIe BT driver
- remove HCI_AMP support
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmZD6sQACgkQMUZtbf5S
IrtLYw/+I73ePGIye37o2jpbodcLAUZVfF3r6uYUzK8hokEcKD0QVJa9w7PizLZ3
UO45ClOXFLJCkfP4reFenLfxGCel2AJI+F7VFl2xaO2XgrcH/lnVrHqKZEAEXjls
KoYMnShIolv7h2MKP6hHtyTi2j1wvQUKsZC71o9/fuW+4fUT8gECx1YtYcL73wrw
gEMdlUgBYC3jiiCUHJIFX6iPJ2t/TC+q1eIIF2K/Osrk2kIqQhzoozcL4vpuAZQT
99ljx/qRelXa8oppDb7nM5eulg7WY8ZqxEfFZphTMC5nLEGzClxuOTTl2kDYI/D/
UZmTWZDY+F5F0xvNk2gH84qVJXBOVDoobpT7hVA/tDuybobc/kvGDzRayEVqVzKj
Q0tPlJs+xBZpkK5TVnxaFLJVOM+p1Xosxy3kNVXmuYNBvT/R89UbJiCrUKqKZF+L
z/1mOYUv8UklHqYAeuJSptHvqJjTGa/fsEYP7dAUBbc1N2eVB8mzZ4mgU5rYXbtC
E6UXXiWnoSRm8bmco9QmcWWoXt5UGEizHSJLz6t1R5Df/YmXhWlytll5aCwY1ksf
FNoL7S4u7AZThL1Nwi7yUs4CAjhk/N4aOsk+41S0sALCx30BJuI6UdesAxJ0lu+Z
fwCQYbs27y4p7mBLbkYwcQNxAxGm7PSK4yeyRIy2njiyV4qnLf8=
=EsC2
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Complete rework of garbage collection of AF_UNIX sockets.
AF_UNIX is prone to forming reference count cycles due to fd
passing functionality. New method based on Tarjan's Strongly
Connected Components algorithm should be both faster and remove a
lot of workarounds we accumulated over the years.
- Add TCP fraglist GRO support, allowing chaining multiple TCP
packets and forwarding them together. Useful for small switches /
routers which lack basic checksum offload in some scenarios (e.g.
PPPoE).
- Support using SMP threads for handling packet backlog i.e. packet
processing from software interfaces and old drivers which don't use
NAPI. This helps move the processing out of the softirq jumble.
- Continue work of converting from rtnl lock to RCU protection.
Don't require rtnl lock when reading: IPv6 routing FIB, IPv6
address labels, netdev threaded NAPI sysfs files, bonding driver's
sysfs files, MPLS devconf, IPv4 FIB rules, netns IDs, tcp metrics,
TC Qdiscs, neighbor entries, ARP entries via ioctl(SIOCGARP), a lot
of the link information available via rtnetlink.
- Small optimizations from Eric to UDP wake up handling, memory
accounting, RPS/RFS implementation, TCP packet sizing etc.
- Allow direct page recycling in the bulk API used by XDP, for +2%
PPS.
- Support peek with an offset on TCP sockets.
- Add MPTCP APIs for querying last time packets were received/sent/acked
and whether MPTCP "upgrade" succeeded on a TCP socket.
- Add intra-node communication shortcut to improve SMC performance.
- Add IPv6 (and IPv{4,6}-over-IPv{4,6}) support to the GTP protocol
driver.
- Add HSR-SAN (RedBOX) mode of operation to the HSR protocol driver.
- Add reset reasons for tracing what caused a TCP reset to be sent.
- Introduce direction attribute for xfrm (IPSec) states. State can be
used either for input or output packet processing.
Things we sprinkled into general kernel code:
- Add bitmap_{read,write}(), bitmap_size(), expose BYTES_TO_BITS().
This required touch-ups and renaming of a few existing users.
- Add Endian-dependent __counted_by_{le,be} annotations.
- Make building selftests "quieter" by printing summaries like
"CC object.o" rather than full commands with all the arguments.
Netfilter:
- Use GFP_KERNEL to clone elements, to deal better with OOM
situations and avoid failures in the .commit step.
BPF:
- Add eBPF JIT for ARCv2 CPUs.
- Support attaching kprobe BPF programs through kprobe_multi link in
a session mode, meaning, a BPF program is attached to both function
entry and return, the entry program can decide if the return
program gets executed and the entry program can share u64 cookie
value with return program. "Session mode" is a common use-case for
tetragon and bpftrace.
- Add the ability to specify and retrieve BPF cookie for raw
tracepoint programs in order to ease migration from classic to raw
tracepoints.
- Add an internal-only BPF per-CPU instruction for resolving per-CPU
memory addresses and implement support in x86, ARM64 and RISC-V
JITs. This allows inlining functions which need to access per-CPU
state.
- Optimize x86 BPF JIT's emit_mov_imm64, and add support for various
atomics in bpf_arena which can be JITed as a single x86
instruction. Support BPF arena on ARM64.
- Add a new bpf_wq API for deferring events and refactor
process-context bpf_timer code to keep common code where possible.
- Harden the BPF verifier's and/or/xor value tracking.
- Introduce crypto kfuncs to let BPF programs call kernel crypto
APIs.
- Support bpf_tail_call_static() helper for BPF programs with GCC 13.
- Add bpf_preempt_{disable,enable}() kfuncs in order to allow a BPF
program to have code sections where preemption is disabled.
Driver API:
- Skip software TC processing completely if all installed rules are
marked as HW-only, instead of checking the HW-only flag rule by
rule.
- Add support for configuring PoE (Power over Ethernet), similar to
the already existing support for PoDL (Power over Data Line)
config.
- Initial bits of a queue control API, for now allowing a single
queue to be reset without disturbing packet flow to other queues.
- Common (ethtool) statistics for hardware timestamping.
Tests and tooling:
- Remove the need to create a config file to run the net forwarding
tests so that a naive "make run_tests" can exercise them.
- Define a method of writing tests which require an external endpoint
to communicate with (to send/receive data towards the test
machine). Add a few such tests.
- Create a shared code library for writing Python tests. Expose the
YAML Netlink library from tools/ to the tests for easy Netlink
access.
- Move netfilter tests under net/, extend them, separate performance
tests from correctness tests, and iron out issues found by running
them "on every commit".
- Refactor BPF selftests to use common network helpers.
- Further work filling in YAML definitions of Netlink messages for:
nftables, team driver, bonding interfaces, vlan interfaces, VF
info, TC u32 mark, TC police action.
- Teach Python YAML Netlink to decode attribute policies.
- Extend the definition of the "indexed array" construct in the specs
to cover arrays of scalars rather than just nests.
- Add hyperlinks between definitions in generated Netlink docs.
Drivers:
- Make sure unsupported flower control flags are rejected by drivers,
and make more drivers report errors directly to the application
rather than dmesg (large number of driver changes from Asbjørn
Sloth Tønnesen).
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support multiple RSS contexts and steering traffic to them
- support XDP metadata
- make page pool allocations more NUMA aware
- Intel (100G, ice, idpf):
- extract datapath code common among Intel drivers into a library
- use fewer resources in switchdev by sharing queues with the PF
- add PFCP filter support
- add Ethernet filter support
- use a spinlock instead of HW lock in PTP clock ops
- support 5 layer Tx scheduler topology
- nVidia/Mellanox:
- 800G link modes and 100G SerDes speeds
- per-queue IRQ coalescing configuration
- Marvell Octeon:
- support offloading TC packet mark action
- Ethernet NICs consumer, embedded and virtual:
- stop lying about skb->truesize in USB Ethernet drivers, it
messes up TCP memory calculations
- Google cloud vNIC:
- support changing ring size via ethtool
- support ring reset using the queue control API
- VirtIO net:
- expose flow hash from RSS to XDP
- per-queue statistics
- add selftests
- Synopsys (stmmac):
- support controllers which require an RX clock signal from the
MII bus to perform their hardware initialization
- TI:
- icssg_prueth: support ICSSG-based Ethernet on AM65x SR1.0 devices
- icssg_prueth: add SW TX / RX Coalescing based on hrtimers
- cpsw: minimal XDP support
- Renesas (ravb):
- support describing the MDIO bus
- Realtek (r8169):
- add support for RTL8168M
- Microchip Sparx5:
- matchall and flower actions mirred and redirect
- Ethernet switches:
- nVidia/Mellanox:
- improve events processing performance
- Marvell:
- add support for MV88E6250 family internal PHYs
- Microchip:
- add DCB and DSCP mapping support for KSZ switches
- vsc73xx: convert to PHYLINK
- Realtek:
- rtl8226b/rtl8221b: add C45 instances and SerDes switching
- Many driver changes related to PHYLIB and PHYLINK deprecated API
cleanup
- Ethernet PHYs:
- Add a new driver for Airoha EN8811H 2.5 Gigabit PHY.
- micrel: lan8814: add support for PPS out and external timestamp trigger
- WiFi:
- Disable Wireless Extensions (WEXT) in all Wi-Fi 7 devices
drivers. Modern devices can only be configured using nl80211.
- mac80211/cfg80211
- handle color change per link for WiFi 7 Multi-Link Operation
- Intel (iwlwifi):
- don't support puncturing in 5 GHz
- support monitor mode on passive channels
- BZ-W device support
- P2P with HE/EHT support
- re-add support for firmware API 90
- provide channel survey information for Automatic Channel Selection
- MediaTek (mt76):
- mt7921 LED control
- mt7925 EHT radiotap support
- mt7920e PCI support
- Qualcomm (ath11k):
- P2P support for QCA6390, WCN6855 and QCA2066
- support hibernation
- ieee80211-freq-limit Device Tree property support
- Qualcomm (ath12k):
- refactoring in preparation of multi-link support
- suspend and hibernation support
- ACPI support
- debugfs support, including dfs_simulate_radar support
- RealTek:
- rtw88: RTL8723CS SDIO device support
- rtw89: RTL8922AE Wi-Fi 7 PCI device support
- rtw89: complete features of new WiFi 7 chip 8922AE including
BT-coexistence and Wake-on-WLAN
- rtw89: use BIOS ACPI settings to set TX power and channels
- rtl8xxxu: enable Management Frame Protection (MFP) support
- Bluetooth:
- support for Intel BlazarI and Filmore Peak2 (BE201)
- support for MediaTek MT7921S SDIO
- initial support for Intel PCIe BT driver
- remove HCI_AMP support"
* tag 'net-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1827 commits)
selftests: netfilter: fix packetdrill conntrack testcase
net: gro: fix napi_gro_cb zeroed alignment
Bluetooth: btintel_pcie: Refactor and code cleanup
Bluetooth: btintel_pcie: Fix warning reported by sparse
Bluetooth: hci_core: Fix not handling hdev->le_num_of_adv_sets=1
Bluetooth: btintel: Fix compiler warning for multi_v7_defconfig config
Bluetooth: btintel_pcie: Fix compiler warnings
Bluetooth: btintel_pcie: Add *setup* function to download firmware
Bluetooth: btintel_pcie: Add support for PCIe transport
Bluetooth: btintel: Export few static functions
Bluetooth: HCI: Remove HCI_AMP support
Bluetooth: L2CAP: Fix div-by-zero in l2cap_le_flowctl_init()
Bluetooth: qca: Fix error code in qca_read_fw_build_info()
Bluetooth: hci_conn: Use __counted_by() and avoid -Wfamnae warning
Bluetooth: btintel: Add support for Filmore Peak2 (BE201)
Bluetooth: btintel: Add support for BlazarI
LE Create Connection command timeout increased to 20 secs
dt-bindings: net: bluetooth: Add MediaTek MT7921S SDIO Bluetooth
Bluetooth: compute LE flow credits based on recvbuf space
Bluetooth: hci_sync: Use cmd->num_cis instead of magic number
...
2nd trial of the earlier PR with more appropriate tag:
1. Do no overwrite the key expiration once it is set.
2. Early to quota updates for keys to key_put(), instead of
updating them in key_gc_unused_keys().
[1] Earlier PR:
https://lore.kernel.org/linux-integrity/20240326143838.15076-1-jarkko@kernel.org/
BR, Jarkko
-----BEGIN PGP SIGNATURE-----
iJYEABYKAD4WIQRE6pSOnaBC00OEHEIaerohdGur0gUCZjzYqSAcamFya2tvLnNh
a2tpbmVuQGxpbnV4LmludGVsLmNvbQAKCRAaerohdGur0nQJAQDN4r2K/QgryPyS
fSfVPeeIPkBIOat2sTq8IOQLKlCybwD8Duo6gJuZWmQmdbn2m2Pn/Dg3xUy4ljba
IwHRro/kcwY=
=3V25
-----END PGP SIGNATURE-----
Merge tag 'keys-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull keys updates from Jarkko Sakkinen:
- do not overwrite the key expiration once it is set
- move key quota updates earlier into key_put(), instead of updating
them in key_gc_unused_keys()
* tag 'keys-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
keys: Fix overwrite of key expiration on instantiation
keys: update key quotas in key_put()
These are the changes for the TPM driver with a single major new
feature: TPM bus encryption and integrity protection. The key pair
on TPM side is generated from so called null random seed per power
on of the machine [1]. This supports the TPM encryption of the hard
drive by adding layer of protection against bus interposer attacks.
Other than the pull request a few minor fixes and documentation for
tpm_tis to clarify basics of TPM localities for future patch review
discussions (will be extended and refined over times, just a seed).
[1] https://lore.kernel.org/linux-integrity/20240429202811.13643-1-James.Bottomley@HansenPartnership.com/
BR, Jarkko
-----BEGIN PGP SIGNATURE-----
iJYEABYKAD4WIQRE6pSOnaBC00OEHEIaerohdGur0gUCZj0l2iAcamFya2tvLnNh
a2tpbmVuQGxpbnV4LmludGVsLmNvbQAKCRAaerohdGur0m8yAP4hBjMtpgAJZ4eZ
5o9tEQJrh/1JFZJ+8HU5IKPc4RU8BAEAyyYOCtxtS/C5B95iP+LvNla0KWi0pprU
HsCLULnV2Aw=
=RTXJ
-----END PGP SIGNATURE-----
Merge tag 'tpmdd-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull TPM updates from Jarkko Sakkinen:
"These are the changes for the TPM driver with a single major new
feature: TPM bus encryption and integrity protection. The key pair on
TPM side is generated from so called null random seed per power on of
the machine [1]. This supports the TPM encryption of the hard drive by
adding layer of protection against bus interposer attacks.
Other than that, a few minor fixes and documentation for tpm_tis to
clarify basics of TPM localities for future patch review discussions
(will be extended and refined over times, just a seed)"
Link: https://lore.kernel.org/linux-integrity/20240429202811.13643-1-James.Bottomley@HansenPartnership.com/ [1]
* tag 'tpmdd-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: (28 commits)
Documentation: tpm: Add TPM security docs toctree entry
tpm: disable the TPM if NULL name changes
Documentation: add tpm-security.rst
tpm: add the null key name as a sysfs export
KEYS: trusted: Add session encryption protection to the seal/unseal path
tpm: add session encryption protection to tpm2_get_random()
tpm: add hmac checks to tpm2_pcr_extend()
tpm: Add the rest of the session HMAC API
tpm: Add HMAC session name/handle append
tpm: Add HMAC session start and end functions
tpm: Add TCG mandated Key Derivation Functions (KDFs)
tpm: Add NULL primary creation
tpm: export the context save and load commands
tpm: add buffer function to point to returned parameters
crypto: lib - implement library version of AES in CFB mode
KEYS: trusted: tpm2: Use struct tpm_buf for sized buffers
tpm: Add tpm_buf_read_{u8,u16,u32}
tpm: TPM2B formatted buffers
tpm: Store the length of the tpm_buf data separately.
tpm: Update struct tpm_buf documentation comments
...
Introduces the LANDLOCK_ACCESS_FS_IOCTL_DEV right
and increments the Landlock ABI version to 5.
This access right applies to device-custom IOCTL commands
when they are invoked on block or character device files.
Like the truncate right, this right is associated with a file
descriptor at the time of open(2), and gets respected even when the
file descriptor is used outside of the thread which it was originally
opened in.
Therefore, a newly enabled Landlock policy does not apply to file
descriptors which are already open.
If the LANDLOCK_ACCESS_FS_IOCTL_DEV right is handled, only a small
number of safe IOCTL commands will be permitted on newly opened device
files. These include FIOCLEX, FIONCLEX, FIONBIO and FIOASYNC, as well
as other IOCTL commands for regular files which are implemented in
fs/ioctl.c.
Noteworthy scenarios which require special attention:
TTY devices are often passed into a process from the parent process,
and so a newly enabled Landlock policy does not retroactively apply to
them automatically. In the past, TTY devices have often supported
IOCTL commands like TIOCSTI and some TIOCLINUX subcommands, which were
letting callers control the TTY input buffer (and simulate
keypresses). This should be restricted to CAP_SYS_ADMIN programs on
modern kernels though.
Known limitations:
The LANDLOCK_ACCESS_FS_IOCTL_DEV access right is a coarse-grained
control over IOCTL commands.
Landlock users may use path-based restrictions in combination with
their knowledge about the file system layout to control what IOCTLs
can be done.
Cc: Paul Moore <paul@paul-moore.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20240419161122.2023765-2-gnoack@google.com
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Fix the typo in the function documentation to please kernel doc
warnings.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: John Johansen <john.johansen@canonical.com>
The inlined function is_deleted is redundant, it is not called at all
from any function in security/apparmor/file.c and so it can be removed.
Cleans up clang scan build warning:
security/apparmor/file.c:153:20: warning: unused function
'is_deleted' [-Wunused-function]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Inside unpack_profile() data->data is allocated using kvmemdup() so it
should be freed with the corresponding kvfree_sensitive().
Also add missing data->data release for rhashtable insertion failure path
in unpack_profile().
Found by Linux Verification Center (linuxtesting.org).
Fixes: e025be0f26 ("apparmor: support querying extended trusted helper extra data")
Cc: stable@vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Kbuild conventionally uses $(obj)/ for generated files, and $(src)/ for
checked-in source files. It is merely a convention without any functional
difference. In fact, $(obj) and $(src) are exactly the same, as defined
in scripts/Makefile.build:
src := $(obj)
When the kernel is built in a separate output directory, $(src) does
not accurately reflect the source directory location. While Kbuild
resolves this discrepancy by specifying VPATH=$(srctree) to search for
source files, it does not cover all cases. For example, when adding a
header search path for local headers, -I$(srctree)/$(src) is typically
passed to the compiler.
This introduces inconsistency between upstream and downstream Makefiles
because $(src) is used instead of $(srctree)/$(src) for the latter.
To address this inconsistency, this commit changes the semantics of
$(src) so that it always points to the directory in the source tree.
Going forward, the variables used in Makefiles will have the following
meanings:
$(obj) - directory in the object tree
$(src) - directory in the source tree (changed by this commit)
$(objtree) - the top of the kernel object tree
$(srctree) - the top of the kernel source tree
Consequently, $(srctree)/$(src) in upstream Makefiles need to be replaced
with $(src).
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
If some entity is snooping the TPM bus, the can see the data going in
to be sealed and the data coming out as it is unsealed. Add parameter
and response encryption to these cases to ensure that no secrets are
leaked even if the bus is snooped.
As part of doing this conversion it was discovered that policy
sessions can't work with HMAC protected authority because of missing
pieces (the tpm Nonce). I've added code to work the same way as
before, which will result in potential authority exposure (while still
adding security for the command and the returned blob), and a fixme to
redo the API to get rid of this security hole.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Take advantage of the new sized buffer (TPM2B) mode of struct tpm_buf in
tpm2_seal_trusted(). This allows to add robustness to the command
construction without requiring to calculate buffer sizes manually.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
TPM2B buffers, or sized buffers, have a two byte header, which contains the
length of the payload as a 16-bit big-endian number, without counting in
the space taken by the header. This differs from encoding in the TPM header
where the length includes also the bytes taken by the header.
Unbound the length of a tpm_buf from the value stored to the TPM command
header. A separate encoding and decoding step so that different buffer
types can be supported, with variant header format and length encoding.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Open code the last remaining call site for tpm_send().
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Update the documentation for trusted and encrypted KEYS with DCP as new
trust source:
- Describe security properties of DCP trust source
- Describe key usage
- Document blob format
Co-developed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Co-developed-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
Signed-off-by: David Gstir <david@sigma-star.at>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
DCP (Data Co-Processor) is the little brother of NXP's CAAM IP.
Beside of accelerated crypto operations, it also offers support for
hardware-bound keys. Using this feature it is possible to implement a blob
mechanism similar to what CAAM offers. Unlike on CAAM, constructing and
parsing the blob has to happen in software (i.e. the kernel).
The software-based blob format used by DCP trusted keys encrypts
the payload using AES-128-GCM with a freshly generated random key and nonce.
The random key itself is AES-128-ECB encrypted using the DCP unique
or OTP key.
The DCP trusted key blob format is:
/*
* struct dcp_blob_fmt - DCP BLOB format.
*
* @fmt_version: Format version, currently being %1
* @blob_key: Random AES 128 key which is used to encrypt @payload,
* @blob_key itself is encrypted with OTP or UNIQUE device key in
* AES-128-ECB mode by DCP.
* @nonce: Random nonce used for @payload encryption.
* @payload_len: Length of the plain text @payload.
* @payload: The payload itself, encrypted using AES-128-GCM and @blob_key,
* GCM auth tag of size AES_BLOCK_SIZE is attached at the end of it.
*
* The total size of a DCP BLOB is sizeof(struct dcp_blob_fmt) + @payload_len +
* AES_BLOCK_SIZE.
*/
struct dcp_blob_fmt {
__u8 fmt_version;
__u8 blob_key[AES_KEYSIZE_128];
__u8 nonce[AES_KEYSIZE_128];
__le32 payload_len;
__u8 payload[];
} __packed;
By default the unique key is used. It is also possible to use the
OTP key. While the unique key should be unique it is not documented how
this key is derived. Therefore selection the OTP key is supported as
well via the use_otp_key module parameter.
Co-developed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Co-developed-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
Signed-off-by: David Gstir <david@sigma-star.at>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Enabling trusted keys requires at least one trust source implementation
(currently TPM, TEE or CAAM) to be enabled. Currently, this is
done by checking each trust source's config option individually.
This does not scale when more trust sources like the one for DCP
are added, because the condition will get long and hard to read.
Add config HAVE_TRUSTED_KEYS which is set to true by each trust source
once its enabled and adapt the check for having at least one active trust
source to use this option. Whenever a new trust source is added, it now
needs to select HAVE_TRUSTED_KEYS.
Signed-off-by: David Gstir <david@sigma-star.at>
Tested-by: Jarkko Sakkinen <jarkko@kernel.org> # for TRUSTED_KEYS_TPM
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
The expiry time of a key is unconditionally overwritten during
instantiation, defaulting to turn it permanent. This causes a problem
for DNS resolution as the expiration set by user-space is overwritten to
TIME64_MAX, disabling further DNS updates. Fix this by restoring the
condition that key_set_expiry is only called when the pre-parser sets a
specific expiry.
Fixes: 39299bdd25 ("keys, dns: Allow key types (eg. DNS) to be reclaimed immediately on expiry")
Signed-off-by: Silvio Gissi <sifonsec@amazon.com>
cc: David Howells <dhowells@redhat.com>
cc: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com>
cc: linux-afs@lists.infradead.org
cc: linux-cifs@vger.kernel.org
cc: keyrings@vger.kernel.org
cc: netdev@vger.kernel.org
cc: stable@vger.kernel.org
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Delaying key quotas update when key's refcount reaches 0 in key_put() has
been causing some issues in fscrypt testing, specifically in fstest
generic/581. This commit fixes this test flakiness by dealing with the
quotas immediately, and leaving all the other clean-ups to the key garbage
collector.
This is done by moving the updates to the qnkeys and qnbytes fields in
struct key_user from key_gc_unused_keys() into key_put(). Unfortunately,
this also means that we need to switch to the irq-version of the spinlock
that protects these fields and use spin_lock_{irqsave,irqrestore} in all
the code that touches these fields.
Signed-off-by: Luis Henriques <lhenriques@suse.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@kernel.org>
cond_policydb_dup() duplicates conditional parts of an existing policy.
Declare the source policy const, since it should not be modified.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: various line length fixups]
Signed-off-by: Paul Moore <paul@paul-moore.com>
The usage of printk_ratelimit() is discouraged, see
include/linux/printk.h, thus use pr_warn_ratelimited().
While editing this line address the following checkpatch warning:
WARNING: Integer promotion: Using 'h' in '%hu' is unnecessary
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Since the status page is currently only allocated on first use, the
sequence number of the initial policyload (i.e. 1) is not stored,
leading to the observable sequence of 0, 2, 3, 4, ...
Try to pre-allocate the status page during the initialization of the
selinuxfs, so selinux_status_update_policyload() will set the sequence
number.
This brings the status page to return the actual sequence number for the
initial policy load, which is also observable via the netlink socket.
I could not find any occurrence where userspace depends on the actual
value returned by selinux_status_policyload(3), thus the breakage should
be unnoticed.
Closes: https://lore.kernel.org/selinux/87o7fmua12.fsf@redhat.com/
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: trimmed 'reported-by' that was missing an email]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Implements the "init_mlocked_on_free" boot option. When this boot option
is enabled, any mlock'ed pages are zeroed on free. If
the pages are munlock'ed beforehand, no initialization takes place.
This boot option is meant to combat the performance hit of
"init_on_free" as reported in commit 6471384af2 ("mm: security:
introduce init_on_alloc=1 and init_on_free=1 boot options"). With
"init_mlocked_on_free=1" only relevant data is freed while everything
else is left untouched by the kernel. Correspondingly, this patch
introduces no performance hit for unmapping non-mlock'ed memory. The
unmapping overhead for purely mlocked memory was measured to be
approximately 13%. Realistically, most systems mlock only a fraction of
the total memory so the real-world system overhead should be close to
zero.
Optimally, userspace programs clear any key material or other
confidential memory before exit and munlock the according memory
regions. If a program crashes, userspace key managers fail to do this
job. Accordingly, no munlock operations are performed so the data is
caught and zeroed by the kernel. Should the program not crash, all
memory will ideally be munlocked so no overhead is caused.
CONFIG_INIT_MLOCKED_ON_FREE_DEFAULT_ON can be set to enable
"init_mlocked_on_free" by default.
Link: https://lkml.kernel.org/r/20240329145605.149917-1-yjnworkstation@gmail.com
Signed-off-by: York Jasper Niebuhr <yjnworkstation@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: York Jasper Niebuhr <yjnworkstation@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which will
reduce the overall build time size of the kernel and run time memory
bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)
Remove the sentinel from all files under security/ that register a
sysctl table.
Signed-off-by: Joel Granados <j.granados@samsung.com>
Acked-by: Kees Cook <keescook@chromium.org> # loadpin & yama
Tested-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
The template hash showed by the ascii_runtime_measurements and
binary_runtime_measurements is the one calculated using sha1 and there is
no possibility to change this value, despite the fact that the template
hash is calculated using the hash algorithms corresponding to all the PCR
banks configured in the TPM.
Add the support to retrieve the ima log with the template data hash
calculated with a specific hash algorithm.
Add a new file in the securityfs ima directory for each hash algo
configured in a PCR bank of the TPM. Each new file has the name with
the following structure:
{binary, ascii}_runtime_measurements_<hash_algo_name>
Legacy files are kept, to avoid breaking existing applications, but as
symbolic links which point to {binary, ascii}_runtime_measurements_sha1
files. These two files are created even if a TPM chip is not detected or
the sha1 bank is not configured in the TPM.
As example, in the case a TPM chip is present and sha256 is the only
configured PCR bank, the listing of the securityfs ima directory is the
following:
lr--r--r-- [...] ascii_runtime_measurements -> ascii_runtime_measurements_sha1
-r--r----- [...] ascii_runtime_measurements_sha1
-r--r----- [...] ascii_runtime_measurements_sha256
lr--r--r-- [...] binary_runtime_measurements -> binary_runtime_measurements_sha1
-r--r----- [...] binary_runtime_measurements_sha1
-r--r----- [...] binary_runtime_measurements_sha256
--w------- [...] policy
-r--r----- [...] runtime_measurements_count
-r--r----- [...] violations
Signed-off-by: Enrico Bravi <enrico.bravi@polito.it>
Signed-off-by: Silvia Sisinni <silvia.sisinni@polito.it>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>