We do embedd struct fown_struct into struct file letting it take up 32
bytes in total. We could tweak struct fown_struct to be more compact but
really it shouldn't even be embedded in struct file in the first place.
Instead, actual users of struct fown_struct should allocate the struct
on demand. This frees up 24 bytes in struct file.
That will have some potentially user-visible changes for the ownership
fcntl()s. Some of them can now fail due to allocation failures.
Practically, that probably will almost never happen as the allocations
are small and they only happen once per file.
The fown_struct is used during kill_fasync() which is used by e.g.,
pipes to generate a SIGIO signal. Sending of such signals is conditional
on userspace having set an owner for the file using one of the F_OWNER
fcntl()s. Such users will be unaffected if struct fown_struct is
allocated during the fcntl() call.
There are a few subsystems that call __f_setown() expecting
file->f_owner to be allocated:
(1) tun devices
file->f_op->fasync::tun_chr_fasync()
-> __f_setown()
There are no callers of tun_chr_fasync().
(2) tty devices
file->f_op->fasync::tty_fasync()
-> __tty_fasync()
-> __f_setown()
tty_fasync() has no additional callers but __tty_fasync() has. Note
that __tty_fasync() only calls __f_setown() if the @on argument is
true. It's called from:
file->f_op->release::tty_release()
-> tty_release()
-> __tty_fasync()
-> __f_setown()
tty_release() calls __tty_fasync() with @on false
=> __f_setown() is never called from tty_release().
=> All callers of tty_release() are safe as well.
file->f_op->release::tty_open()
-> tty_release()
-> __tty_fasync()
-> __f_setown()
__tty_hangup() calls __tty_fasync() with @on false
=> __f_setown() is never called from tty_release().
=> All callers of __tty_hangup() are safe as well.
From the callchains it's obvious that (1) and (2) end up getting called
via file->f_op->fasync(). That can happen either through the F_SETFL
fcntl() with the FASYNC flag raised or via the FIOASYNC ioctl(). If
FASYNC is requested and the file isn't already FASYNC then
file->f_op->fasync() is called with @on true which ends up causing both
(1) and (2) to call __f_setown().
(1) and (2) are the only subsystems that call __f_setown() from the
file->f_op->fasync() handler. So both (1) and (2) have been updated to
allocate a struct fown_struct prior to calling fasync_helper() to
register with the fasync infrastructure. That's safe as they both call
fasync_helper() which also does allocations if @on is true.
The other interesting case are file leases:
(3) file leases
lease_manager_ops->lm_setup::lease_setup()
-> __f_setown()
Which in turn is called from:
generic_add_lease()
-> lease_manager_ops->lm_setup::lease_setup()
-> __f_setown()
So here again we can simply make generic_add_lease() allocate struct
fown_struct prior to the lease_manager_ops->lm_setup::lease_setup()
which happens under a spinlock.
With that the two remaining subsystems that call __f_setown() are:
(4) dnotify
(5) sockets
Both have their own custom ioctls to set struct fown_struct and both
have been converted to allocate a struct fown_struct on demand from
their respective ioctls.
Interactions with O_PATH are fine as well e.g., when opening a /dev/tty
as O_PATH then no file->f_op->open() happens thus no file->f_owner is
allocated. That's fine as no file operation will be set for those and
the device has never been opened. fcntl()s called on such things will
just allocate a ->f_owner on demand. Although I have zero idea why'd you
care about f_owner on an O_PATH fd.
Link: https://lore.kernel.org/r/20240813-work-f_owner-v2-1-4e9343a79f9f@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Based on guidance in include/linux/slab.h, replace kmem_cache_create()
with KMEM_CACHE() for sources under security/selinux to simplify creation
of SLAB caches.
Signed-off-by: Eric Suen <ericsu@linux.microsoft.com>
[PM: minor grammar nits in the description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Because these are equals to MAX_LSM_COUNT. Also, we can avoid dynamic
memory allocation for ordered_lsms because MAX_LSM_COUNT is a constant.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Paul Moore <paul@paul-moore.com>
KCSAN flags the check of isec->initialized by
__inode_security_revalidate() as a data race. This is indeed a racy
check, but inode_doinit_with_dentry() will recheck with isec->lock held.
Annotate the check with the data_race() macro to silence the KCSAN false
positive.
Reported-by: syzbot+319ed1769c0078257262@syzkaller.appspotmail.com
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
policy_unpack_test fails on big endian systems because data byte order
is expected to be little endian but is generated in host byte order.
This results in test failures such as:
# policy_unpack_test_unpack_array_with_null_name: EXPECTATION FAILED at security/apparmor/policy_unpack_test.c:150
Expected array_size == (u16)16, but
array_size == 4096 (0x1000)
(u16)16 == 16 (0x10)
# policy_unpack_test_unpack_array_with_null_name: pass:0 fail:1 skip:0 total:1
not ok 3 policy_unpack_test_unpack_array_with_null_name
# policy_unpack_test_unpack_array_with_name: EXPECTATION FAILED at security/apparmor/policy_unpack_test.c:164
Expected array_size == (u16)16, but
array_size == 4096 (0x1000)
(u16)16 == 16 (0x10)
# policy_unpack_test_unpack_array_with_name: pass:0 fail:1 skip:0 total:1
Add the missing endianness conversions when generating test data.
Fixes: 4d944bcd4e73 ("apparmor: add AppArmor KUnit tests for policy unpack")
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Aligned parameters in the function declaration of smack_ip_output
to adhere to the Linux kernel coding style guidelines.
The parameters of the smack_ip_output function were previously misaligned,
with the second and third parameters not aligned under the first parameter.
This change corrects the indentation, improving code readability and
maintaining consistency with the rest of the codebase.
Signed-off-by: GiSeong Ji <jiggyjiggy0323@gmail.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
The header files eval.h is included twice in ipe.c,
so one inclusion of each can be removed.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9796
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
LSM hooks are currently invoked from a linked list as indirect calls
which are invoked using retpolines as a mitigation for speculative
attacks (Branch History / Target injection) and add extra overhead which
is especially bad in kernel hot paths:
security_file_ioctl:
0xff...0320 <+0>: endbr64
0xff...0324 <+4>: push %rbp
0xff...0325 <+5>: push %r15
0xff...0327 <+7>: push %r14
0xff...0329 <+9>: push %rbx
0xff...032a <+10>: mov %rdx,%rbx
0xff...032d <+13>: mov %esi,%ebp
0xff...032f <+15>: mov %rdi,%r14
0xff...0332 <+18>: mov $0xff...7030,%r15
0xff...0339 <+25>: mov (%r15),%r15
0xff...033c <+28>: test %r15,%r15
0xff...033f <+31>: je 0xff...0358 <security_file_ioctl+56>
0xff...0341 <+33>: mov 0x18(%r15),%r11
0xff...0345 <+37>: mov %r14,%rdi
0xff...0348 <+40>: mov %ebp,%esi
0xff...034a <+42>: mov %rbx,%rdx
0xff...034d <+45>: call 0xff...2e0 <__x86_indirect_thunk_array+352>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Indirect calls that use retpolines leading to overhead, not just due
to extra instruction but also branch misses.
0xff...0352 <+50>: test %eax,%eax
0xff...0354 <+52>: je 0xff...0339 <security_file_ioctl+25>
0xff...0356 <+54>: jmp 0xff...035a <security_file_ioctl+58>
0xff...0358 <+56>: xor %eax,%eax
0xff...035a <+58>: pop %rbx
0xff...035b <+59>: pop %r14
0xff...035d <+61>: pop %r15
0xff...035f <+63>: pop %rbp
0xff...0360 <+64>: jmp 0xff...47c4 <__x86_return_thunk>
The indirect calls are not really needed as one knows the addresses of
enabled LSM callbacks at boot time and only the order can possibly
change at boot time with the lsm= kernel command line parameter.
An array of static calls is defined per LSM hook and the static calls
are updated at boot time once the order has been determined.
With the hook now exposed as a static call, one can see that the
retpolines are no longer there and the LSM callbacks are invoked
directly:
security_file_ioctl:
0xff...0ca0 <+0>: endbr64
0xff...0ca4 <+4>: nopl 0x0(%rax,%rax,1)
0xff...0ca9 <+9>: push %rbp
0xff...0caa <+10>: push %r14
0xff...0cac <+12>: push %rbx
0xff...0cad <+13>: mov %rdx,%rbx
0xff...0cb0 <+16>: mov %esi,%ebp
0xff...0cb2 <+18>: mov %rdi,%r14
0xff...0cb5 <+21>: jmp 0xff...0cc7 <security_file_ioctl+39>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Static key enabled for SELinux
0xffffffff818f0cb7 <+23>: jmp 0xff...0cde <security_file_ioctl+62>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Static key enabled for BPF LSM. This is something that is changed to
default to false to avoid the existing side effect issues of BPF LSM
[1] in a subsequent patch.
0xff...0cb9 <+25>: xor %eax,%eax
0xff...0cbb <+27>: xchg %ax,%ax
0xff...0cbd <+29>: pop %rbx
0xff...0cbe <+30>: pop %r14
0xff...0cc0 <+32>: pop %rbp
0xff...0cc1 <+33>: cs jmp 0xff...0000 <__x86_return_thunk>
0xff...0cc7 <+39>: endbr64
0xff...0ccb <+43>: mov %r14,%rdi
0xff...0cce <+46>: mov %ebp,%esi
0xff...0cd0 <+48>: mov %rbx,%rdx
0xff...0cd3 <+51>: call 0xff...3230 <selinux_file_ioctl>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Direct call to SELinux.
0xff...0cd8 <+56>: test %eax,%eax
0xff...0cda <+58>: jne 0xff...0cbd <security_file_ioctl+29>
0xff...0cdc <+60>: jmp 0xff...0cb7 <security_file_ioctl+23>
0xff...0cde <+62>: endbr64
0xff...0ce2 <+66>: mov %r14,%rdi
0xff...0ce5 <+69>: mov %ebp,%esi
0xff...0ce7 <+71>: mov %rbx,%rdx
0xff...0cea <+74>: call 0xff...e220 <bpf_lsm_file_ioctl>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Direct call to BPF LSM.
0xff...0cef <+79>: test %eax,%eax
0xff...0cf1 <+81>: jne 0xff...0cbd <security_file_ioctl+29>
0xff...0cf3 <+83>: jmp 0xff...0cb9 <security_file_ioctl+25>
0xff...0cf5 <+85>: endbr64
0xff...0cf9 <+89>: mov %r14,%rdi
0xff...0cfc <+92>: mov %ebp,%esi
0xff...0cfe <+94>: mov %rbx,%rdx
0xff...0d01 <+97>: pop %rbx
0xff...0d02 <+98>: pop %r14
0xff...0d04 <+100>: pop %rbp
0xff...0d05 <+101>: ret
0xff...0d06 <+102>: int3
0xff...0d07 <+103>: int3
0xff...0d08 <+104>: int3
0xff...0d09 <+105>: int3
While this patch uses static_branch_unlikely indicating that an LSM hook
is likely to be not present. In most cases this is still a better choice
as even when an LSM with one hook is added, empty slots are created for
all LSM hooks (especially when many LSMs that do not initialize most
hooks are present on the system).
There are some hooks that don't use the call_int_hook or
call_void_hook. These hooks are updated to use a new macro called
lsm_for_each_hook where the lsm_callback is directly invoked as an
indirect call.
Below are results of the relevant Unixbench system benchmarks with BPF LSM
and SELinux enabled with default policies enabled with and without these
patches.
Benchmark Delta(%): (+ is better)
==========================================================================
Execl Throughput +1.9356
File Write 1024 bufsize 2000 maxblocks +6.5953
Pipe Throughput +9.5499
Pipe-based Context Switching +3.0209
Process Creation +2.3246
Shell Scripts (1 concurrent) +1.4975
System Call Overhead +2.7815
System Benchmarks Index Score (Partial Only): +3.4859
In the best case, some syscalls like eventfd_create benefitted to about
~10%.
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add various happy/unhappy unit tests for both IPE's policy parser.
Besides, a test suite for IPE functionality is available at
https://github.com/microsoft/ipe/tree/test-suite
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Enables an IPE policy to be enforced from kernel start, enabling access
control based on trust from kernel startup. This is accomplished by
transforming an IPE policy indicated by CONFIG_IPE_BOOT_POLICY into a
c-string literal that is parsed at kernel startup as an unsigned policy.
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Enable IPE policy authors to indicate trust for a singular fsverity
file, identified by the digest information, through "fsverity_digest"
and all files using valid fsverity builtin signatures via
"fsverity_signature".
This enables file-level integrity claims to be expressed in IPE,
allowing individual files to be authorized, giving some flexibility
for policy authors. Such file-level claims are important to be expressed
for enforcing the integrity of packages, as well as address some of the
scalability issues in a sole dm-verity based solution (# of loop back
devices, etc).
This solution cannot be done in userspace as the minimum threat that
IPE should mitigate is an attacker downloads malicious payload with
all required dependencies. These dependencies can lack the userspace
check, bypassing the protection entirely. A similar attack succeeds if
the userspace component is replaced with a version that does not
perform the check. As a result, this can only be done in the common
entry point - the kernel.
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
This patch introduces a new hook to save inode's integrity
data. For example, for fsverity enabled files, LSMs can use this hook to
save the existence of verified fsverity builtin signature into the inode's
security blob, and LSMs can make access decisions based on this data.
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
[PM: subject line tweak, removed changelog]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Allows author of IPE policy to indicate trust for a singular dm-verity
volume, identified by roothash, through "dmverity_roothash" and all
signed and validated dm-verity volumes, through "dmverity_signature".
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
[PM: fixed some line length issues in the comments]
Signed-off-by: Paul Moore <paul@paul-moore.com>
This patch introduces a new LSM blob to the block_device structure,
enabling the security subsystem to store security-sensitive data related
to block devices. Currently, for a device mapper's mapped device containing
a dm-verity target, critical security information such as the roothash and
its signing state are not readily accessible. Specifically, while the
dm-verity volume creation process passes the dm-verity roothash and its
signature from userspace to the kernel, the roothash is stored privately
within the dm-verity target, and its signature is discarded
post-verification. This makes it extremely hard for the security subsystem
to utilize these data.
With the addition of the LSM blob to the block_device structure, the
security subsystem can now retain and manage important security metadata
such as the roothash and the signing state of a dm-verity by storing them
inside the blob. Access decisions can then be based on these stored data.
The implementation follows the same approach used for security blobs in
other structures like struct file, struct inode, and struct superblock.
The initialization of the security blob occurs after the creation of the
struct block_device, performed by the security subsystem. Similarly, the
security blob is freed by the security subsystem before the struct
block_device is deallocated or freed.
This patch also introduces a new hook security_bdev_setintegrity() to save
block device's integrity data to the new LSM blob. For example, for
dm-verity, it can use this hook to expose its roothash and signing state
to LSMs, then LSMs can save these data into the LSM blob.
Please note that the new hook should be invoked every time the security
information is updated to keep these data current. For example, in
dm-verity, if the mapping table is reloaded and configured to use a
different dm-verity target with a new roothash and signing information,
the previously stored data in the LSM blob will become obsolete. It is
crucial to re-invoke the hook to refresh these data and ensure they are up
to date. This necessity arises from the design of device-mapper, where a
device-mapper device is first created, and then targets are subsequently
loaded into it. These targets can be modified multiple times during the
device's lifetime. Therefore, while the LSM blob is allocated during the
creation of the block device, its actual contents are not initialized at
this stage and can change substantially over time. This includes
alterations from data that the LSM 'trusts' to those it does not, making
it essential to handle these changes correctly. Failure to address this
dynamic aspect could potentially allow for bypassing LSM checks.
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
[PM: merge fuzz, subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
IPE, like SELinux, supports a permissive mode. This mode allows policy
authors to test and evaluate IPE policy without it affecting their
programs. When the mode is changed, a 1404 AUDIT_MAC_STATUS will
be reported.
This patch adds the following audit records:
audit: MAC_STATUS enforcing=0 old_enforcing=1 auid=4294967295
ses=4294967295 enabled=1 old-enabled=1 lsm=ipe res=1
audit: MAC_STATUS enforcing=1 old_enforcing=0 auid=4294967295
ses=4294967295 enabled=1 old-enabled=1 lsm=ipe res=1
The audit record only emit when the value from the user input is
different from the current enforce value.
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Users of IPE require a way to identify when and why an operation fails,
allowing them to both respond to violations of policy and be notified
of potentially malicious actions on their systems with respect to IPE
itself.
This patch introduces 3 new audit events.
AUDIT_IPE_ACCESS(1420) indicates the result of an IPE policy evaluation
of a resource.
AUDIT_IPE_CONFIG_CHANGE(1421) indicates the current active IPE policy
has been changed to another loaded policy.
AUDIT_IPE_POLICY_LOAD(1422) indicates a new IPE policy has been loaded
into the kernel.
This patch also adds support for success auditing, allowing users to
identify why an allow decision was made for a resource. However, it is
recommended to use this option with caution, as it is quite noisy.
Here are some examples of the new audit record types:
AUDIT_IPE_ACCESS(1420):
audit: AUDIT1420 ipe_op=EXECUTE ipe_hook=BPRM_CHECK enforcing=1
pid=297 comm="sh" path="/root/vol/bin/hello" dev="tmpfs"
ino=3897 rule="op=EXECUTE boot_verified=TRUE action=ALLOW"
audit: AUDIT1420 ipe_op=EXECUTE ipe_hook=BPRM_CHECK enforcing=1
pid=299 comm="sh" path="/mnt/ipe/bin/hello" dev="dm-0"
ino=2 rule="DEFAULT action=DENY"
audit: AUDIT1420 ipe_op=EXECUTE ipe_hook=BPRM_CHECK enforcing=1
pid=300 path="/tmp/tmpdp2h1lub/deny/bin/hello" dev="tmpfs"
ino=131 rule="DEFAULT action=DENY"
The above three records were generated when the active IPE policy only
allows binaries from the initramfs to run. The three identical `hello`
binary were placed at different locations, only the first hello from
the rootfs(initramfs) was allowed.
Field ipe_op followed by the IPE operation name associated with the log.
Field ipe_hook followed by the name of the LSM hook that triggered the IPE
event.
Field enforcing followed by the enforcement state of IPE. (it will be
introduced in the next commit)
Field pid followed by the pid of the process that triggered the IPE
event.
Field comm followed by the command line program name of the process that
triggered the IPE event.
Field path followed by the file's path name.
Field dev followed by the device name as found in /dev where the file is
from.
Note that for device mappers it will use the name `dm-X` instead of
the name in /dev/mapper.
For a file in a temp file system, which is not from a device, it will use
`tmpfs` for the field.
The implementation of this part is following another existing use case
LSM_AUDIT_DATA_INODE in security/lsm_audit.c
Field ino followed by the file's inode number.
Field rule followed by the IPE rule made the access decision. The whole
rule must be audited because the decision is based on the combination of
all property conditions in the rule.
Along with the syscall audit event, user can know why a blocked
happened. For example:
audit: AUDIT1420 ipe_op=EXECUTE ipe_hook=BPRM_CHECK enforcing=1
pid=2138 comm="bash" path="/mnt/ipe/bin/hello" dev="dm-0"
ino=2 rule="DEFAULT action=DENY"
audit[1956]: SYSCALL arch=c000003e syscall=59
success=no exit=-13 a0=556790138df0 a1=556790135390 a2=5567901338b0
a3=ab2a41a67f4f1f4e items=1 ppid=147 pid=1956 auid=4294967295 uid=0
gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0
ses=4294967295 comm="bash" exe="/usr/bin/bash" key=(null)
The above two records showed bash used execve to run "hello" and got
blocked by IPE. Note that the IPE records are always prior to a SYSCALL
record.
AUDIT_IPE_CONFIG_CHANGE(1421):
audit: AUDIT1421
old_active_pol_name="Allow_All" old_active_pol_version=0.0.0
old_policy_digest=sha256:E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649
new_active_pol_name="boot_verified" new_active_pol_version=0.0.0
new_policy_digest=sha256:820EEA5B40CA42B51F68962354BA083122A20BB846F
auid=4294967295 ses=4294967295 lsm=ipe res=1
The above record showed the current IPE active policy switch from
`Allow_All` to `boot_verified` along with the version and the hash
digest of the two policies. Note IPE can only have one policy active
at a time, all access decision evaluation is based on the current active
policy.
The normal procedure to deploy a policy is loading the policy to deploy
into the kernel first, then switch the active policy to it.
AUDIT_IPE_POLICY_LOAD(1422):
audit: AUDIT1422 policy_name="boot_verified" policy_version=0.0.0
policy_digest=sha256:820EEA5B40CA42B51F68962354BA083122A20BB846F2676
auid=4294967295 ses=4294967295 lsm=ipe res=1
The above record showed a new policy has been loaded into the kernel
with the policy name, policy version and policy hash.
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
As is typical with LSMs, IPE uses securityfs as its interface with
userspace. for a complete list of the interfaces and the respective
inputs/outputs, please see the documentation under
admin-guide/LSM/ipe.rst
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
When deleting a directory in the security file system, the existing
securityfs_remove requires the directory to be empty, otherwise
it will do nothing. This leads to a potential risk that the security
file system might be in an unclean state when the intended deletion
did not happen.
This commit introduces a new function securityfs_recursive_remove
to recursively delete a directory without leaving an unclean state.
Co-developed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
IPE is designed to provide system level trust guarantees, this usually
implies that trust starts from bootup with a hardware root of trust,
which validates the bootloader. After this, the bootloader verifies
the kernel and the initramfs.
As there's no currently supported integrity method for initramfs, and
it's typically already verified by the bootloader. This patch introduces
a new IPE property `boot_verified` which allows author of IPE policy to
indicate trust for files from initramfs.
The implementation of this feature utilizes the newly added
`initramfs_populated` hook. This hook marks the superblock of the rootfs
after the initramfs has been unpacked into it.
Before mounting the real rootfs on top of the initramfs, initramfs
script will recursively remove all files and directories on the
initramfs. This is typically implemented by using switch_root(8)
(https://man7.org/linux/man-pages/man8/switch_root.8.html).
Therefore the initramfs will be empty and not accessible after the real
rootfs takes over. It is advised to switch to a different policy
that doesn't rely on the `boot_verified` property after this point.
This ensures that the trust policies remain relevant and effective
throughout the system's operation.
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
This patch introduces a new hook to notify security system that the
content of initramfs has been unpacked into the rootfs.
Upon receiving this notification, the security system can activate
a policy to allow only files that originated from the initramfs to
execute or load into kernel during the early stages of booting.
This approach is crucial for minimizing the attack surface by
ensuring that only trusted files from the initramfs are operational
in the critical boot phase.
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
IPE's initial goal is to control both execution and the loading of
kernel modules based on the system's definition of trust. It
accomplishes this by plugging into the security hooks for
bprm_check_security, file_mprotect, mmap_file, kernel_load_data,
and kernel_read_data.
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Introduce a core evaluation function in IPE that will be triggered by
various security hooks (e.g., mmap, bprm_check, kexec). This function
systematically assesses actions against the defined IPE policy, by
iterating over rules specific to the action being taken. This critical
addition enables IPE to enforce its security policies effectively,
ensuring that actions intercepted by these hooks are scrutinized for policy
compliance before they are allowed to proceed.
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
IPE's interpretation of the what the user trusts is accomplished through
its policy. IPE's design is to not provide support for a single trust
provider, but to support multiple providers to enable the end-user to
choose the best one to seek their needs.
This requires the policy to be rather flexible and modular so that
integrity providers, like fs-verity, dm-verity, or some other system,
can plug into the policy with minimal code changes.
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
[PM: added NULL check in parse_rule() as discussed]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Integrity Policy Enforcement (IPE) is an LSM that provides an
complimentary approach to Mandatory Access Control than existing LSMs
today.
Existing LSMs have centered around the concept of access to a resource
should be controlled by the current user's credentials. IPE's approach,
is that access to a resource should be controlled by the system's trust
of a current resource.
The basis of this approach is defining a global policy to specify which
resource can be trusted.
Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@linux.microsoft.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Trusted keys unseal the key blob on load, but keep the sealed payload in
the blob field so that every subsequent read (export) will simply
convert this field to hex and send it to userspace.
With DCP-based trusted keys, we decrypt the blob encryption key (BEK)
in the Kernel due hardware limitations and then decrypt the blob payload.
BEK decryption is done in-place which means that the trusted key blob
field is modified and it consequently holds the BEK in plain text.
Every subsequent read of that key thus send the plain text BEK instead
of the encrypted BEK to userspace.
This issue only occurs when importing a trusted DCP-based key and
then exporting it again. This should rarely happen as the common use cases
are to either create a new trusted key and export it, or import a key
blob and then just use it without exporting it again.
Fix this by performing BEK decryption and encryption in a dedicated
buffer. Further always wipe the plain text BEK buffer to prevent leaking
the key via uninitialized memory.
Cc: stable@vger.kernel.org # v6.10+
Fixes: 2e8a0f40a39c ("KEYS: trusted: Introduce NXP DCP-backed trusted keys")
Signed-off-by: David Gstir <david@sigma-star.at>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
The DCP trusted key type uses the wrong helper function to store
the blob's payload length which can lead to the wrong byte order
being used in case this would ever run on big endian architectures.
Fix by using correct helper function.
Cc: stable@vger.kernel.org # v6.10+
Fixes: 2e8a0f40a39c ("KEYS: trusted: Introduce NXP DCP-backed trusted keys")
Suggested-by: Richard Weinberger <richard@nod.at>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405240610.fj53EK0q-lkp@intel.com/
Signed-off-by: David Gstir <david@sigma-star.at>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Fix sparse warning:
security/lockdown/lockdown.c:79:21: warning:
symbol 'lockdown_lsmid' was not declared. Should it be static?
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
For any changes of struct fd representation we need to
turn existing accesses to fields into calls of wrappers.
Accesses to struct fd::flags are very few (3 in linux/file.h,
1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in
explicit initializers).
Those can be dealt with in the commit converting to
new layout; accesses to struct fd::file are too many for that.
This commit converts (almost) all of f.file to
fd_file(f). It's not entirely mechanical ('file' is used as
a member name more than just in struct fd) and it does not
even attempt to distinguish the uses in pointer context from
those in boolean context; the latter will be eventually turned
into a separate helper (fd_empty()).
NOTE: mass conversion to fd_empty(), tempting as it
might be, is a bad idea; better do that piecewise in commit
that convert from fdget...() to CLASS(...).
[conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c
caught by git; fs/stat.c one got caught by git grep]
[fs/xattr.c conflict]
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The LSM framework has an existing inode_free_security() hook which
is used by LSMs that manage state associated with an inode, but
due to the use of RCU to protect the inode, special care must be
taken to ensure that the LSMs do not fully release the inode state
until it is safe from a RCU perspective.
This patch implements a new inode_free_security_rcu() implementation
hook which is called when it is safe to free the LSM's internal inode
state. Unfortunately, this new hook does not have access to the inode
itself as it may already be released, so the existing
inode_free_security() hook is retained for those LSMs which require
access to the inode.
Cc: stable@vger.kernel.org
Reported-by: syzbot+5446fbf332b0602ede0b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/00000000000076ba3b0617f65cc8@google.com
Signed-off-by: Paul Moore <paul@paul-moore.com>
Some cleanup and style corrections for lsm_hooks.h.
* Drop the lsm_inode_alloc() extern declaration, it is not needed.
* Relocate lsm_get_xattr_slot() and extern variables in the file to
improve grouping of related objects.
* Don't use tabs to needlessly align structure fields.
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Unfortunately it appears that vma_is_initial_heap() is currently broken
for applications that do not currently have any heap allocated, e.g.
brk == start_brk. The breakage is such that it will cause SELinux to
check for the process/execheap permission on memory regions that cross
brk/start_brk even when there is no heap.
The proper fix would be to correct vma_is_initial_heap(), but as there
are multiple callers I am hesitant to unilaterally modify the helper
out of concern that I would end up breaking some other subsystem. The
mm developers have been made aware of the situation and hopefully they
will have a fix at some point in the future, but we need a fix soon so
we are simply going to revert our use of vma_is_initial_heap() in favor
of our old logic/code which works as expected, even in the face of a
zero size heap. We can return to using vma_is_initial_heap() at some
point in the future when it is fixed.
Cc: stable@vger.kernel.org
Reported-by: Marc Reisner <reisner.marc@gmail.com>
Closes: https://lore.kernel.org/all/ZrPmoLKJEf1wiFmM@marcreisner.com
Fixes: 68df1baf158f ("selinux: use vma_is_initial_stack() and vma_is_initial_heap()")
Signed-off-by: Paul Moore <paul@paul-moore.com>
When avc_add_xperms_decision() fails, the information recorded by the new
avc node is incomplete. In this case, the new avc node should be released
instead of replacing the old avc node.
Cc: stable@vger.kernel.org
Fixes: fa1aa143ac4a ("selinux: extended permissions for ioctls")
Suggested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The count increases only when a node is successfully added to
the linked list.
Cc: stable@vger.kernel.org
Fixes: fa1aa143ac4a ("selinux: extended permissions for ioctls")
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
To be consistent with most LSM hooks, convert the return value of
hook inode_copy_up_xattr to 0 or a negative error code.
Before:
- Hook inode_copy_up_xattr returns 0 when accepting xattr, 1 when
discarding xattr, -EOPNOTSUPP if it does not know xattr, or any
other negative error code otherwise.
After:
- Hook inode_copy_up_xattr returns 0 when accepting xattr, *-ECANCELED*
when discarding xattr, -EOPNOTSUPP if it does not know xattr, or
any other negative error code otherwise.
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
To be consistent with most LSM hooks, convert the return value of
hook vm_enough_memory to 0 or a negative error code.
Before:
- Hook vm_enough_memory returns 1 if permission is granted, 0 if not.
- LSM_RET_DEFAULT(vm_enough_memory_mm) is 1.
After:
- Hook vm_enough_memory reutrns 0 if permission is granted, negative
error code if not.
- LSM_RET_DEFAULT(vm_enough_memory_mm) is 0.
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Move management of the perf_event->security blob out of the individual
security modules and into the security infrastructure. Instead of
allocating the blobs from within the modules the modules tell the
infrastructure how much space is required, and the space is allocated
there. There are no longer any modules that require the perf_event_free()
hook. The hook definition has been removed.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johansen@canonical.com>
[PM: subject tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Move management of the infiniband security blob out of the individual
security modules and into the LSM infrastructure. The security modules
tell the infrastructure how much space they require at initialization.
There are no longer any modules that require the ib_free() hook.
The hook definition has been removed.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johansen@canonical.com>
[PM: subject tweak, selinux style fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Move management of the dev_tun security blob out of the individual
security modules and into the LSM infrastructure. The security modules
tell the infrastructure how much space they require at initialization.
There are no longer any modules that require the dev_tun_free hook.
The hook definition has been removed.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johansen@canonical.com>
[PM: subject tweak, selinux style fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Create a helper function lsm_blob_alloc() for general use in the hook
specific functions that allocate LSM blobs. Change the hook specific
functions to use this helper. This reduces the code size by a small
amount and will make adding new instances of infrastructure managed
security blobs easier.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johansen@canonical.com>
[PM: subject tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Move management of the key->security blob out of the individual security
modules and into the security infrastructure. Instead of allocating the
blobs from within the modules the modules tell the infrastructure how
much space is required, and the space is allocated there. There are
no existing modules that require a key_free hook, so the call to it and
the definition for it have been removed.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: John Johansen <john.johansen@canonical.com>
[PM: subject tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Move management of the sock->sk_security blob out
of the individual security modules and into the security
infrastructure. Instead of allocating the blobs from within
the modules the modules tell the infrastructure how much
space is required, and the space is allocated there.
Acked-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subject tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Refactor the code in selinux_netlbl_sock_genattr to return ERR_PTR
when an error occurs.
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Simplifies the logic for determining the security context type in
security_compute_sid, enhancing readability and efficiency.
Consolidates default type assignment logic next to type transition
checks, removing redundancy and improving code flow.
Signed-off-by: Canfeng Guo <guocanfeng@uniontech.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
+ Cleanups
- optimization: try to avoid refing the label in apparmor_file_open
- remove useless static inline function is_deleted
- use kvfree_sensitive to free data->data
- fix typo in kernel doc
+ Bug fixes
- unpack transition table if dfa is not present
- test: add MODULE_DESCRIPTION()
- take nosymfollow flag into account
- fix possible NULL pointer dereference
- fix null pointer deref when receiving skb during sock creation
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE7cSDD705q2rFEEf7BS82cBjVw9gFAmaikGkACgkQBS82cBjV
w9gasBAAsnHikPshnqnCnIyA/Am9/LeOstKf8jQCbPWtjt1Fw3y8gmuJJOhRalM1
R343rFUF3Z5wb+Yy1xELww3eET91QgC9s5NLJceDXvKOnAppHD+YOphjfmRbpgaF
rWHGN6/rD30HALKpsMw7Z3jTe9xOPhygh+lWlaiJIoXZ2hZwv2Chd6TDVR8BSFyq
OLhuf++DLcZNEcvg1bUxccK49J+iAVeC0VtdzNXw+BRZU5zM/US8n+8EStxuY65n
cAwBM+cJn6yhX3bsazUacw32SCgWePTJr87Wbfn1oF5znoU/HMM9CtPvivTwQSRN
+nT0qm57CJW2IJdRfP2OhRpAbRvUgMjGjyIf7PJHn0zCXffiDNZhSuFOcKy9wAOQ
H9qi7lkAC9KYIs1jI57Fogp+sgo11q+fPdzgGuuCgFnZp7gEPChsTewO346jNYES
fORj62CsWL3lvhADq1cC/kgO9NZHwI7SLH5orobvrIoP9SGH2pGi0Nzdch62MNXo
BwJctpsgS8QPl2XiQhf38+LIczf+USOkDlXO5tmYsTTSUNmjd+7fDc1pcBv0WCf+
2DOEpI3DuaCr1oTr+jJ3zfysYZdeubFI/8uq5dCrMGQZZneEx0RSZAOGAbrWaPCy
XrmkvrjDwGMPocBF0RVHgda60OofZRa/aJnUxhIkOXTDDWpdQPM=
=kO+r
-----END PGP SIGNATURE-----
Merge tag 'apparmor-pr-2024-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor
Pull apparmor updates from John Johansen:
"Cleanups
- optimization: try to avoid refing the label in apparmor_file_open
- remove useless static inline function is_deleted
- use kvfree_sensitive to free data->data
- fix typo in kernel doc
Bug fixes:
- unpack transition table if dfa is not present
- test: add MODULE_DESCRIPTION()
- take nosymfollow flag into account
- fix possible NULL pointer dereference
- fix null pointer deref when receiving skb during sock creation"
* tag 'apparmor-pr-2024-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: unpack transition table if dfa is not present
apparmor: try to avoid refing the label in apparmor_file_open
apparmor: test: add MODULE_DESCRIPTION()
apparmor: take nosymfollow flag into account
apparmor: fix possible NULL pointer dereference
apparmor: fix typo in kernel doc
apparmor: remove useless static inline function is_deleted
apparmor: use kvfree_sensitive to free data->data
apparmor: Fix null pointer deref when receiving skb during sock creation
-----BEGIN PGP SIGNATURE-----
iIYEABYKAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCZqFEchAcbWljQGRpZ2lr
b2QubmV0AAoJEOXj0OiMgvbSULcBAPEV5Viu/zox2FdS87EGTqWxEQJcBRvc3ahj
MQk44WtMAP4o2CnwrOoMyZXeq9npteL5lQsVhEzeI+p8oN9C9bThBg==
=zizo
-----END PGP SIGNATURE-----
Merge tag 'landlock-6.11-rc1-houdini-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock fix from Mickaël Salaün:
"Jann Horn reported a sandbox bypass for Landlock. This includes the
fix and new tests. This should be backported"
* tag 'landlock-6.11-rc1-houdini-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
selftests/landlock: Add cred_transfer test
landlock: Don't lose track of restrictions on cred_transfer
const qualify the struct ctl_table argument in the proc_handler function
signatures. This is a prerequisite to moving the static ctl_table
structs into .rodata data which will ensure that proc_handler function
pointers cannot be modified.
This patch has been generated by the following coccinelle script:
```
virtual patch
@r1@
identifier ctl, write, buffer, lenp, ppos;
identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)";
@@
int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int write, void *buffer, size_t *lenp, loff_t *ppos);
@r2@
identifier func, ctl, write, buffer, lenp, ppos;
@@
int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int write, void *buffer, size_t *lenp, loff_t *ppos)
{ ... }
@r3@
identifier func;
@@
int func(
- struct ctl_table *
+ const struct ctl_table *
,int , void *, size_t *, loff_t *);
@r4@
identifier func, ctl;
@@
int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
,int , void *, size_t *, loff_t *);
@r5@
identifier func, write, buffer, lenp, ppos;
@@
int func(
- struct ctl_table *
+ const struct ctl_table *
,int write, void *buffer, size_t *lenp, loff_t *ppos);
```
* Code formatting was adjusted in xfs_sysctl.c to comply with code
conventions. The xfs_stats_clear_proc_handler,
xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where
adjusted.
* The ctl_table argument in proc_watchdog_common was const qualified.
This is called from a proc_handler itself and is calling back into
another proc_handler, making it necessary to change it as part of the
proc_handler migration.
Co-developed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Co-developed-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Joel Granados <j.granados@samsung.com>
Due to a bug in earlier userspaces, a transition table may be present
even when the dfa is not. Commit 7572fea31e3e
("apparmor: convert fperm lookup to use accept as an index") made the
verification check more rigourous regressing old userspaces with
the bug. For compatibility reasons allow the orphaned transition table
during unpack and discard.
Fixes: 7572fea31e3e ("apparmor: convert fperm lookup to use accept as an index")
Signed-off-by: Georgia Garcia <georgia.garcia@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
If the label is not stale (which is the common case), the fact that the
passed file object holds a reference can be leverged to avoid the
ref/unref cycle. Doing so reduces performance impact of apparmor on
parallel open() invocations.
When benchmarking on a 24-core vm using will-it-scale's open1_process
("Separate file open"), the results are (ops/s):
before: 6092196
after: 8309726 (+36%)
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Fix the 'make W=1' warning:
WARNING: modpost: missing MODULE_DESCRIPTION() in security/apparmor/apparmor_policy_unpack_test.o
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
A "nosymfollow" flag was added in commit
dab741e0e02b ("Add a "nosymfollow" mount option.")
While we don't need to implement any special logic on
the AppArmor kernel side to handle it, we should provide
user with a correct list of mount flags in audit logs.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>