Commit Graph

1320357 Commits

Author SHA1 Message Date
Hou Tao
3d5611b4d7 bpf: Remove unnecessary kfree(im_node) in lpm_trie_update_elem
There is no need to call kfree(im_node) when updating element fails,
because im_node must be NULL. Remove the unnecessary kfree() for
im_node.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241206110622.1161752-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-06 09:14:25 -08:00
Hou Tao
156c977c53 bpf: Remove unnecessary check when updating LPM trie
When "node->prefixlen == matchlen" is true, it means that the node is
fully matched. If "node->prefixlen == key->prefixlen" is false, it means
the prefix length of key is greater than the prefix length of node,
otherwise, matchlen will not be equal with node->prefixlen. However, it
also implies that the prefix length of node must be less than
max_prefixlen.

Therefore, "node->prefixlen == trie->max_prefixlen" will always be false
when the check of "node->prefixlen == key->prefixlen" returns false.
Remove this unnecessary comparison.

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241206110622.1161752-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-06 09:14:25 -08:00
Alexei Starovoitov
e2cf913314 Merge branch 'fixes-for-stack-with-allow_ptr_leaks'
Kumar Kartikeya Dwivedi says:

====================
Fixes for stack with allow_ptr_leaks

Two fixes for usability/correctness gaps when interacting with the stack
without CAP_PERFMON (i.e. with allow_ptr_leaks = false). See the commits
for details. I've verified that the tests fail when run without the fixes.

Changelog:
----------
v3 -> v4
v3: https://lore.kernel.org/bpf/20241202083814.1888784-1-memxor@gmail.com

 * Address Andrii's comments
   * Fix bug paperered over by missing CAP_NET_ADMIN in verifier_mtu
     test
   * Add warning when undefined CAP_ constant is specified, and fail
     test
   * Reorder annotations to be more clear
   * Verify that fixes fail without patches again
 * Add Acked-by from Andrii

v2 -> v3
v2: https://lore.kernel.org/bpf/20241127212026.3580542-1-memxor@gmail.com

 * Address comments from Eduard
   * Fix comment for mark_stack_slot_misc
   * We can simply always return early when stype == STACK_INVALID
   * Drop allow_ptr_leaks conditionals
   * Add Eduard's __caps_unpriv patch into the series
   * Convert test_verifier_mtu to use it
   * Move existing tests to __caps_unpriv annotation and verifier_spill_fill.c
   * Add Acked-by from Eduard

v1 -> v2
v1: https://lore.kernel.org/bpf/20241127185135.2753982-1-memxor@gmail.com

 * Fix CI errors in selftest by removing dependence on BPF_ST
====================

Link: https://patch.msgid.link/20241204044757.1483141-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-04 09:19:51 -08:00
Kumar Kartikeya Dwivedi
19b6dbc006 selftests/bpf: Add test for narrow spill into 64-bit spilled scalar
Add a test case to verify that without CAP_PERFMON, the test now
succeeds instead of failing due to a verification error.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241204044757.1483141-6-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-04 09:19:50 -08:00
Kumar Kartikeya Dwivedi
f513c36350 selftests/bpf: Add test for reading from STACK_INVALID slots
Ensure that when CAP_PERFMON is dropped, and the verifier sees
allow_ptr_leaks as false, we are not permitted to read from a
STACK_INVALID slot. Without the fix, the test will report unexpected
success in loading.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241204044757.1483141-5-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-04 09:19:50 -08:00
Eduard Zingerman
adfdd9c685 selftests/bpf: Introduce __caps_unpriv annotation for tests
Add a __caps_unpriv annotation so that tests requiring specific
capabilities while dropping the rest can conveniently specify them
during selftest declaration instead of munging with capabilities at
runtime from the testing binary.

While at it, let us convert test_verifier_mtu to use this new support
instead.

Since we do not want to include linux/capability.h, we only defined the
four main capabilities BPF subsystem deals with in bpf_misc.h for use in
tests. If the user passes a CAP_SYS_NICE or anything else that's not
defined in the header, capability parsing code will return a warning.

Also reject strtol returning 0. CAP_CHOWN = 0 but we'll never need to
use it, and strtol doesn't errno on failed conversion. Fail the test in
such a case.

The original diff for this idea is available at link [0].

  [0]: https://lore.kernel.org/bpf/a1e48f5d9ae133e19adc6adf27e19d585e06bab4.camel@gmail.com

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
[ Kartikeya: rebase on bpf-next, add warn to parse_caps, convert test_verifier_mtu ]
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241204044757.1483141-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-04 09:19:50 -08:00
Tao Lyu
b0e66977dc bpf: Fix narrow scalar spill onto 64-bit spilled scalar slots
When CAP_PERFMON and CAP_SYS_ADMIN (allow_ptr_leaks) are disabled, the
verifier aims to reject partial overwrite on an 8-byte stack slot that
contains a spilled pointer.

However, in such a scenario, it rejects all partial stack overwrites as
long as the targeted stack slot is a spilled register, because it does
not check if the stack slot is a spilled pointer.

Incomplete checks will result in the rejection of valid programs, which
spill narrower scalar values onto scalar slots, as shown below.

0: R1=ctx() R10=fp0
; asm volatile ( @ repro.bpf.c:679
0: (7a) *(u64 *)(r10 -8) = 1          ; R10=fp0 fp-8_w=1
1: (62) *(u32 *)(r10 -8) = 1
attempt to corrupt spilled pointer on stack
processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0.

Fix this by expanding the check to not consider spilled scalar registers
when rejecting the write into the stack.

Previous discussion on this patch is at link [0].

  [0]: https://lore.kernel.org/bpf/20240403202409.2615469-1-tao.lyu@epfl.ch

Fixes: ab125ed3ec ("bpf: fix check for attempt to corrupt spilled pointer")
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Tao Lyu <tao.lyu@epfl.ch>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241204044757.1483141-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-04 09:19:50 -08:00
Kumar Kartikeya Dwivedi
69772f509e bpf: Don't mark STACK_INVALID as STACK_MISC in mark_stack_slot_misc
Inside mark_stack_slot_misc, we should not upgrade STACK_INVALID to
STACK_MISC when allow_ptr_leaks is false, since invalid contents
shouldn't be read unless the program has the relevant capabilities.
The relaxation only makes sense when env->allow_ptr_leaks is true.

However, such conversion in privileged mode becomes unnecessary, as
invalid slots can be read without being upgraded to STACK_MISC.

Currently, the condition is inverted (i.e. checking for true instead of
false), simply remove it to restore correct behavior.

Fixes: eaf18febd6 ("bpf: preserve STACK_ZERO slots on partial reg spills")
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reported-by: Tao Lyu <tao.lyu@epfl.ch>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241204044757.1483141-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-04 09:19:50 -08:00
Eduard Zingerman
5a6ea7022f samples/bpf: Remove unnecessary -I flags from libbpf EXTRA_CFLAGS
Commit [0] breaks samples/bpf build:

    $ make M=samples/bpf
    ...
    make -C /path/to/kernel/samples/bpf/../../tools/lib/bpf \
     ...
     EXTRA_CFLAGS=" \
     ...
     -fsanitize=bounds \
     -I/path/to/kernel/usr/include \
     ...
    	/path/to/kernel/samples/bpf/libbpf/libbpf.a install_headers
      CC      /path/to/kernel/samples/bpf/libbpf/staticobjs/libbpf.o
    In file included from libbpf.c:29:
    /path/to/kernel/tools/include/linux/err.h:35:8: error: 'inline' can only appear on functions
       35 | static inline void * __must_check ERR_PTR(long error_)
          |        ^

The error is caused by `objtree` variable changing definition from `.`
(dot) to an absolute path:
- The variable TPROGS_CFLAGS is constructed as follows:
  ...
  TPROGS_CFLAGS += -I$(objtree)/usr/include
- It is passed as EXTRA_CFLAGS for libbpf compilation:
  $(LIBBPF): ...
    ...
	$(MAKE) -C $(LIBBPF_SRC) RM='rm -rf' EXTRA_CFLAGS="$(TPROGS_CFLAGS)"
- Before commit [0], the line passed to libbpf makefile was
  '-I./usr/include', where '.' referred to LIBBPF_SRC due to -C flag.
  The directory $(LIBBPF_SRC)/usr/include does not exist and thus
  was never resolved by C compiler.
- After commit [0], the line passed to libbpf makefile became:
  '<output-dir>/usr/include', this directory exists and is resolved by
  C compiler.
- Both 'tools/include' and 'usr/include' define files err.h and types.h.
- libbpf expects headers like 'linux/err.h' and 'linux/types.h'
  defined in 'tools/include', not 'usr/include', hence the compilation
  error.

This commit removes unnecessary -I flags from libbpf compilation.
(libbpf sets up the necessary includes at lib/bpf/Makefile:63).

Changes v1 [1] -> v2:
- dropped unnecessary replacement of KBUILD_OUTPUT with $(objtree)
  (Andrii)
Changes v2 [2] -> v3:
- make sure --sysroot option is set for libbpf's EXTRA_CFLAGS,
  if $(SYSROOT) is set (Stanislav)

[0] commit 13b25489b6 ("kbuild: change working directory to external module directory with M=")
[1] https://lore.kernel.org/bpf/20241202212154.3174402-1-eddyz87@gmail.com/
[2] https://lore.kernel.org/bpf/20241202234741.3492084-1-eddyz87@gmail.com/

Fixes: 13b25489b6 ("kbuild: change working directory to external module directory with M=")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://lore.kernel.org/bpf/20241203182222.3915763-1-eddyz87@gmail.com
2024-12-03 13:29:04 -08:00
Kumar Kartikeya Dwivedi
bd74e238ae bpf: Zero index arg error string for dynptr and iter
Andrii spotted that process_dynptr_func's rejection of incorrect
argument register type will print an error string where argument numbers
are not zero-indexed, unlike elsewhere in the verifier.  Fix this by
subtracting 1 from regno. The same scenario exists for iterator
messages. Fix selftest error strings that match on the exact argument
number while we're at it to ensure clean bisection.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241203002235.3776418-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-02 18:47:41 -08:00
Alexei Starovoitov
d4c44354bc Merge branch 'fix-missing-process_iter_arg-type-check'
Kumar Kartikeya Dwivedi says:

====================
Fix missing process_iter_arg type check

I am taking over Tao's earlier patch set that can be found at [0], after
an offline discussion. The bug reported in that thread is that
process_iter_arg missed a reg->type == PTR_TO_STACK check. Fix this by
adding it in, and also address comments from Andrii on the earlier
attempt. Include more selftests to ensure the error is caught.

  [0]: https://lore.kernel.org/bpf/20241107214736.347630-1-tao.lyu@epfl.ch

Changelog:
----------
v1 -> v2:
v1: https://lore.kernel.org/bpf/20241127230147.4158201-1-memxor@gmail.com
====================

Link: https://patch.msgid.link/20241203000238.3602922-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-02 17:48:06 -08:00
Kumar Kartikeya Dwivedi
7f71197001 selftests/bpf: Add tests for iter arg check
Add selftests to cover argument type check for iterator kfuncs, and
cover all three kinds (new, next, destroy). Without the fix in the
previous patch, the selftest would not cause a verifier error.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241203000238.3602922-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-02 17:47:56 -08:00
Tao Lyu
12659d2861 bpf: Ensure reg is PTR_TO_STACK in process_iter_arg
Currently, KF_ARG_PTR_TO_ITER handling missed checking the reg->type and
ensuring it is PTR_TO_STACK. Instead of enforcing this in the caller of
process_iter_arg, move the check into it instead so that all callers
will gain the check by default. This is similar to process_dynptr_func.

An existing selftest in verifier_bits_iter.c fails due to this change,
but it's because it was passing a NULL pointer into iter_next helper and
getting an error further down the checks, but probably meant to pass an
uninitialized iterator on the stack (as is done in the subsequent test
below it). We will gain coverage for non-PTR_TO_STACK arguments in later
patches hence just change the declaration to zero-ed stack object.

Fixes: 06accc8779 ("bpf: add support for open-coded iterator loops")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Tao Lyu <tao.lyu@epfl.ch>
[ Kartikeya: move check into process_iter_arg, rewrite commit log ]
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241203000238.3602922-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-02 17:47:56 -08:00
Björn Töpel
537a2525ea tools: Override makefile ARCH variable if defined, but empty
There are a number of tools (bpftool, selftests), that require a
"bootstrap" build. Here, a bootstrap build is a build host variant of
a target. E.g., assume that you're performing a bpftool cross-build on
x86 to riscv, a bootstrap build would then be an x86 variant of
bpftool. The typical way to perform the host build variant, is to pass
"ARCH=" in a sub-make. However, if a variable has been set with a
command argument, then ordinary assignments in the makefile are
ignored.

This side-effect results in that ARCH, and variables depending on ARCH
are not set. Workaround by overriding ARCH to the host arch, if ARCH
is empty.

Fixes: 8859b0da5a ("tools/bpftool: Fix cross-build")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Quentin Monnet <qmo@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/bpf/20241127101748.165693-1-bjorn@kernel.org
2024-11-29 17:04:25 +01:00
Zijian Zhang
3448ad23b3 selftests/bpf: Add apply_bytes test to test_txmsg_redir_wait_sndmem in test_sockmap
Add this to more comprehensively test the socket memory accounting logic
in the __SK_REDIRECT and __SK_DROP cases of tcp_bpf_sendmsg. We don't have
test when apply_bytes are not zero in test_txmsg_redir_wait_sndmem.
test_send_large has opt->rate=2, it will invoke sendmsg two times.

Specifically, the first sendmsg will trigger the case where the ret value
of tcp_bpf_sendmsg_redir is less than 0; while the second sendmsg happens
after the 3 seconds timeout, and it will trigger __SK_DROP because socket
c2 has been removed from the sockmap/hash.

Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241016234838.3167769-2-zijianzhang@bytedance.com
2024-11-26 20:42:03 +01:00
Zijian Zhang
ca70b8baf2 tcp_bpf: Fix the sk_mem_uncharge logic in tcp_bpf_sendmsg
The current sk memory accounting logic in __SK_REDIRECT is pre-uncharging
tosend bytes, which is either msg->sg.size or a smaller value apply_bytes.

Potential problems with this strategy are as follows:

- If the actual sent bytes are smaller than tosend, we need to charge some
  bytes back, as in line 487, which is okay but seems not clean.

- When tosend is set to apply_bytes, as in line 417, and (ret < 0), we may
  miss uncharging (msg->sg.size - apply_bytes) bytes.

[...]
415 tosend = msg->sg.size;
416 if (psock->apply_bytes && psock->apply_bytes < tosend)
417   tosend = psock->apply_bytes;
[...]
443 sk_msg_return(sk, msg, tosend);
444 release_sock(sk);
446 origsize = msg->sg.size;
447 ret = tcp_bpf_sendmsg_redir(sk_redir, redir_ingress,
448                             msg, tosend, flags);
449 sent = origsize - msg->sg.size;
[...]
454 lock_sock(sk);
455 if (unlikely(ret < 0)) {
456   int free = sk_msg_free_nocharge(sk, msg);
458   if (!cork)
459     *copied -= free;
460 }
[...]
487 if (eval == __SK_REDIRECT)
488   sk_mem_charge(sk, tosend - sent);
[...]

When running the selftest test_txmsg_redir_wait_sndmem with txmsg_apply,
the following warning will be reported:

------------[ cut here ]------------
WARNING: CPU: 6 PID: 57 at net/ipv4/af_inet.c:156 inet_sock_destruct+0x190/0x1a0
Modules linked in:
CPU: 6 UID: 0 PID: 57 Comm: kworker/6:0 Not tainted 6.12.0-rc1.bm.1-amd64+ #43
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Workqueue: events sk_psock_destroy
RIP: 0010:inet_sock_destruct+0x190/0x1a0
RSP: 0018:ffffad0a8021fe08 EFLAGS: 00010206
RAX: 0000000000000011 RBX: ffff9aab4475b900 RCX: ffff9aab481a0800
RDX: 0000000000000303 RSI: 0000000000000011 RDI: ffff9aab4475b900
RBP: ffff9aab4475b990 R08: 0000000000000000 R09: ffff9aab40050ec0
R10: 0000000000000000 R11: ffff9aae6fdb1d01 R12: ffff9aab49c60400
R13: ffff9aab49c60598 R14: ffff9aab49c60598 R15: dead000000000100
FS:  0000000000000000(0000) GS:ffff9aae6fd80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffec7e47bd8 CR3: 00000001a1a1c004 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
? __warn+0x89/0x130
? inet_sock_destruct+0x190/0x1a0
? report_bug+0xfc/0x1e0
? handle_bug+0x5c/0xa0
? exc_invalid_op+0x17/0x70
? asm_exc_invalid_op+0x1a/0x20
? inet_sock_destruct+0x190/0x1a0
__sk_destruct+0x25/0x220
sk_psock_destroy+0x2b2/0x310
process_scheduled_works+0xa3/0x3e0
worker_thread+0x117/0x240
? __pfx_worker_thread+0x10/0x10
kthread+0xcf/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x40
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
---[ end trace 0000000000000000 ]---

In __SK_REDIRECT, a more concise way is delaying the uncharging after sent
bytes are finalized, and uncharge this value. When (ret < 0), we shall
invoke sk_msg_free.

Same thing happens in case __SK_DROP, when tosend is set to apply_bytes,
we may miss uncharging (msg->sg.size - apply_bytes) bytes. The same
warning will be reported in selftest.

[...]
468 case __SK_DROP:
469 default:
470 sk_msg_free_partial(sk, msg, tosend);
471 sk_msg_apply_bytes(psock, tosend);
472 *copied -= (tosend + delta);
473 return -EACCES;
[...]

So instead of sk_msg_free_partial we can do sk_msg_free here.

Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Fixes: 8ec95b9471 ("bpf, sockmap: Fix the sk->sk_forward_alloc warning of sk_stream_kill_queues")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241016234838.3167769-3-zijianzhang@bytedance.com
2024-11-26 20:37:46 +01:00
Sebastian Andrzej Siewior
6b64128a74 selftests/bpf: Check for PREEMPTION instead of PREEMPT
CONFIG_PREEMPT is a preemtion model the so called "Low-Latency Desktop".
A different preemption model is PREEMPT_RT the so called "Real-Time".
Both implement preemption in kernel and set CONFIG_PREEMPTION.
There is also the so called "LAZY PREEMPT" which the "Scheduler
controlled preemption model". Here we have also preemption in the kernel
the rules are slightly different.

Therefore the testsuite should not check for CONFIG_PREEMPT (as one
model) but for CONFIG_PREEMPTION to figure out if preemption in the
kernel is possible.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20241119161819.qvEcs-n_@linutronix.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-26 08:55:01 -08:00
Amir Mohammadi
ef3ba8c258 bpftool: fix potential NULL pointer dereferencing in prog_dump()
A NULL pointer dereference could occur if ksyms
is not properly checked before usage in the prog_dump() function.

Fixes: b053b439b7 ("bpf: libbpf: bpftool: Print bpf_line_info during prog dump")
Signed-off-by: Amir Mohammadi <amiremohamadi@yahoo.com>
Reviewed-by: Quentin Monnet <qmo@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20241121083413.7214-1-amiremohamadi@yahoo.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-25 14:28:46 -08:00
Larysa Zaremba
ac9a48a6f1 xsk: always clear DMA mapping information when unmapping the pool
When the umem is shared, the DMA mapping is also shared between the xsk
pools, therefore it should stay valid as long as at least 1 user remains.
However, the pool also keeps the copies of DMA-related information that are
initialized in the same way in xp_init_dma_info(), but cleared by
xp_dma_unmap() only for the last remaining pool, this causes the problems
below.

The first one is that the commit adbf5a4234 ("ice: remove af_xdp_zc_qps
bitmap") relies on pool->dev to determine the presence of a ZC pool on a
given queue, avoiding internal bookkeeping. This works perfectly fine if
the UMEM is not shared, but reliably fails otherwise as stated in the
linked report.

The second one is pool->dma_pages which is dynamically allocated and
only freed in xp_dma_unmap(), this leads to a small memory leak. kmemleak
does not catch it, but by printing the allocation results after terminating
the userspace program it is possible to see that all addresses except the
one belonging to the last detached pool are still accessible through the
kmemleak dump functionality.

Always clear the DMA mapping information from the pool and free
pool->dma_pages when unmapping the pool, so that the only difference
between results of the last remaining user's call and the ones before would
be the destruction of the DMA mapping.

Fixes: adbf5a4234 ("ice: remove af_xdp_zc_qps bitmap")
Fixes: 921b68692a ("xsk: Enable sharing of dma mappings")
Reported-by: Alasdair McWilliam <alasdair.mcwilliam@outlook.com>
Closes: https://lore.kernel.org/PA4P194MB10056F208AF221D043F57A3D86512@PA4P194MB1005.EURP194.PROD.OUTLOOK.COM
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20241122112912.89881-1-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-25 14:27:37 -08:00
Alexei Starovoitov
7ca0884200 Merge branch 'bpf-fix-oob-accesses-in-map_delete_elem-callbacks'
Maciej Fijalkowski says:

====================
bpf: fix OOB accesses in map_delete_elem callbacks

v1->v2:
- CC stable and collect tags from Toke & John

Hi,

Jordy reported that for big enough XSKMAPs and DEVMAPs, when deleting
elements, OOB writes occur.

Reproducer below:

// compile with gcc -o map_poc map_poc.c -lbpf
#include <errno.h>
#include <linux/bpf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

int main() {
  // Create a large enough BPF XSK map
  int map_fd;
  union bpf_attr create_attr = {
      .map_type = BPF_MAP_TYPE_XSKMAP,
      .key_size = sizeof(int),
      .value_size = sizeof(int),
      .max_entries = 0x80000000 + 2,
  };

  map_fd = syscall(SYS_bpf, BPF_MAP_CREATE, &create_attr, sizeof(create_attr));
  if (map_fd < 0) {
    fprintf(stderr, "Failed to create BPF map: %s\n", strerror(errno));
    return 1;
  }

  // Delete an element from the map using syscall
  unsigned int key = 0x80000000 + 1;
  if (syscall(SYS_bpf, BPF_MAP_DELETE_ELEM,
              &(union bpf_attr){
                  .map_fd = map_fd,
                  .key = &key,
              },
              sizeof(union bpf_attr)) < 0) {
    fprintf(stderr, "Failed to delete element from BPF map: %s\n",
            strerror(errno));
    return 1;
  }

  close(map_fd);
  return 0;
}

This tiny series changes data types from int to u32 of keys being used
for map accesses.

Thanks,
Maciej
====================

Link: https://patch.msgid.link/20241122121030.716788-1-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-25 14:25:49 -08:00
Maciej Fijalkowski
ab244dd7cf bpf: fix OOB devmap writes when deleting elements
Jordy reported issue against XSKMAP which also applies to DEVMAP - the
index used for accessing map entry, due to being a signed integer,
causes the OOB writes. Fix is simple as changing the type from int to
u32, however, when compared to XSKMAP case, one more thing needs to be
addressed.

When map is released from system via dev_map_free(), we iterate through
all of the entries and an iterator variable is also an int, which
implies OOB accesses. Again, change it to be u32.

Example splat below:

[  160.724676] BUG: unable to handle page fault for address: ffffc8fc2c001000
[  160.731662] #PF: supervisor read access in kernel mode
[  160.736876] #PF: error_code(0x0000) - not-present page
[  160.742095] PGD 0 P4D 0
[  160.744678] Oops: Oops: 0000 [#1] PREEMPT SMP
[  160.749106] CPU: 1 UID: 0 PID: 520 Comm: kworker/u145:12 Not tainted 6.12.0-rc1+ #487
[  160.757050] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[  160.767642] Workqueue: events_unbound bpf_map_free_deferred
[  160.773308] RIP: 0010:dev_map_free+0x77/0x170
[  160.777735] Code: 00 e8 fd 91 ed ff e8 b8 73 ed ff 41 83 7d 18 19 74 6e 41 8b 45 24 49 8b bd f8 00 00 00 31 db 85 c0 74 48 48 63 c3 48 8d 04 c7 <48> 8b 28 48 85 ed 74 30 48 8b 7d 18 48 85 ff 74 05 e8 b3 52 fa ff
[  160.796777] RSP: 0018:ffffc9000ee1fe38 EFLAGS: 00010202
[  160.802086] RAX: ffffc8fc2c001000 RBX: 0000000080000000 RCX: 0000000000000024
[  160.809331] RDX: 0000000000000000 RSI: 0000000000000024 RDI: ffffc9002c001000
[  160.816576] RBP: 0000000000000000 R08: 0000000000000023 R09: 0000000000000001
[  160.823823] R10: 0000000000000001 R11: 00000000000ee6b2 R12: dead000000000122
[  160.831066] R13: ffff88810c928e00 R14: ffff8881002df405 R15: 0000000000000000
[  160.838310] FS:  0000000000000000(0000) GS:ffff8897e0c40000(0000) knlGS:0000000000000000
[  160.846528] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  160.852357] CR2: ffffc8fc2c001000 CR3: 0000000005c32006 CR4: 00000000007726f0
[  160.859604] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  160.866847] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  160.874092] PKRU: 55555554
[  160.876847] Call Trace:
[  160.879338]  <TASK>
[  160.881477]  ? __die+0x20/0x60
[  160.884586]  ? page_fault_oops+0x15a/0x450
[  160.888746]  ? search_extable+0x22/0x30
[  160.892647]  ? search_bpf_extables+0x5f/0x80
[  160.896988]  ? exc_page_fault+0xa9/0x140
[  160.900973]  ? asm_exc_page_fault+0x22/0x30
[  160.905232]  ? dev_map_free+0x77/0x170
[  160.909043]  ? dev_map_free+0x58/0x170
[  160.912857]  bpf_map_free_deferred+0x51/0x90
[  160.917196]  process_one_work+0x142/0x370
[  160.921272]  worker_thread+0x29e/0x3b0
[  160.925082]  ? rescuer_thread+0x4b0/0x4b0
[  160.929157]  kthread+0xd4/0x110
[  160.932355]  ? kthread_park+0x80/0x80
[  160.936079]  ret_from_fork+0x2d/0x50
[  160.943396]  ? kthread_park+0x80/0x80
[  160.950803]  ret_from_fork_asm+0x11/0x20
[  160.958482]  </TASK>

Fixes: 546ac1ffb7 ("bpf: add devmap, a map for storing net device references")
CC: stable@vger.kernel.org
Reported-by: Jordy Zomer <jordyzomer@google.com>
Suggested-by: Jordy Zomer <jordyzomer@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20241122121030.716788-3-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-25 14:25:48 -08:00
Maciej Fijalkowski
32cd3db7de xsk: fix OOB map writes when deleting elements
Jordy says:

"
In the xsk_map_delete_elem function an unsigned integer
(map->max_entries) is compared with a user-controlled signed integer
(k). Due to implicit type conversion, a large unsigned value for
map->max_entries can bypass the intended bounds check:

	if (k >= map->max_entries)
		return -EINVAL;

This allows k to hold a negative value (between -2147483648 and -2),
which is then used as an array index in m->xsk_map[k], which results
in an out-of-bounds access.

	spin_lock_bh(&m->lock);
	map_entry = &m->xsk_map[k]; // Out-of-bounds map_entry
	old_xs = unrcu_pointer(xchg(map_entry, NULL));  // Oob write
	if (old_xs)
		xsk_map_sock_delete(old_xs, map_entry);
	spin_unlock_bh(&m->lock);

The xchg operation can then be used to cause an out-of-bounds write.
Moreover, the invalid map_entry passed to xsk_map_sock_delete can lead
to further memory corruption.
"

It indeed results in following splat:

[76612.897343] BUG: unable to handle page fault for address: ffffc8fc2e461108
[76612.904330] #PF: supervisor write access in kernel mode
[76612.909639] #PF: error_code(0x0002) - not-present page
[76612.914855] PGD 0 P4D 0
[76612.917431] Oops: Oops: 0002 [#1] PREEMPT SMP
[76612.921859] CPU: 11 UID: 0 PID: 10318 Comm: a.out Not tainted 6.12.0-rc1+ #470
[76612.929189] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[76612.939781] RIP: 0010:xsk_map_delete_elem+0x2d/0x60
[76612.944738] Code: 00 00 41 54 55 53 48 63 2e 3b 6f 24 73 38 4c 8d a7 f8 00 00 00 48 89 fb 4c 89 e7 e8 2d bf 05 00 48 8d b4 eb 00 01 00 00 31 ff <48> 87 3e 48 85 ff 74 05 e8 16 ff ff ff 4c 89 e7 e8 3e bc 05 00 31
[76612.963774] RSP: 0018:ffffc9002e407df8 EFLAGS: 00010246
[76612.969079] RAX: 0000000000000000 RBX: ffffc9002e461000 RCX: 0000000000000000
[76612.976323] RDX: 0000000000000001 RSI: ffffc8fc2e461108 RDI: 0000000000000000
[76612.983569] RBP: ffffffff80000001 R08: 0000000000000000 R09: 0000000000000007
[76612.990812] R10: ffffc9002e407e18 R11: ffff888108a38858 R12: ffffc9002e4610f8
[76612.998060] R13: ffff888108a38858 R14: 00007ffd1ae0ac78 R15: ffffc9002e4610c0
[76613.005303] FS:  00007f80b6f59740(0000) GS:ffff8897e0ec0000(0000) knlGS:0000000000000000
[76613.013517] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[76613.019349] CR2: ffffc8fc2e461108 CR3: 000000011e3ef001 CR4: 00000000007726f0
[76613.026595] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[76613.033841] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[76613.041086] PKRU: 55555554
[76613.043842] Call Trace:
[76613.046331]  <TASK>
[76613.048468]  ? __die+0x20/0x60
[76613.051581]  ? page_fault_oops+0x15a/0x450
[76613.055747]  ? search_extable+0x22/0x30
[76613.059649]  ? search_bpf_extables+0x5f/0x80
[76613.063988]  ? exc_page_fault+0xa9/0x140
[76613.067975]  ? asm_exc_page_fault+0x22/0x30
[76613.072229]  ? xsk_map_delete_elem+0x2d/0x60
[76613.076573]  ? xsk_map_delete_elem+0x23/0x60
[76613.080914]  __sys_bpf+0x19b7/0x23c0
[76613.084555]  __x64_sys_bpf+0x1a/0x20
[76613.088194]  do_syscall_64+0x37/0xb0
[76613.091832]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[76613.096962] RIP: 0033:0x7f80b6d1e88d
[76613.100592] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
[76613.119631] RSP: 002b:00007ffd1ae0ac68 EFLAGS: 00000206 ORIG_RAX: 0000000000000141
[76613.131330] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f80b6d1e88d
[76613.142632] RDX: 0000000000000098 RSI: 00007ffd1ae0ad20 RDI: 0000000000000003
[76613.153967] RBP: 00007ffd1ae0adc0 R08: 0000000000000000 R09: 0000000000000000
[76613.166030] R10: 00007f80b6f77040 R11: 0000000000000206 R12: 00007ffd1ae0aed8
[76613.177130] R13: 000055ddf42ce1e9 R14: 000055ddf42d0d98 R15: 00007f80b6fab040
[76613.188129]  </TASK>

Fix this by simply changing key type from int to u32.

Fixes: fbfc504a24 ("bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP")
CC: stable@vger.kernel.org
Reported-by: Jordy Zomer <jordyzomer@google.com>
Suggested-by: Jordy Zomer <jordyzomer@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20241122121030.716788-2-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-25 14:25:48 -08:00
Alexei Starovoitov
20a39ea377 Merge branch 'bpf-vsock-fix-poll-and-close'
Michal Luczaj says:

====================
bpf, vsock: Fix poll() and close()

Two small fixes for vsock: poll() missing a queue check, and close() not
invoking sockmap cleanup.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Acked-by: John Fastabend <john.fastabend@gmail.com>
---
====================

Link: https://patch.msgid.link/20241118-vsock-bpf-poll-close-v1-0-f1b9669cacdc@rbox.co
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-25 14:19:15 -08:00
Michal Luczaj
515745445e selftest/bpf: Add test for vsock removal from sockmap on close()
Make sure the proto::close callback gets invoked on vsock release.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20241118-vsock-bpf-poll-close-v1-4-f1b9669cacdc@rbox.co
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
2024-11-25 14:19:14 -08:00
Michal Luczaj
135ffc7bec bpf, vsock: Invoke proto::close on close()
vsock defines a BPF callback to be invoked when close() is called. However,
this callback is never actually executed. As a result, a closed vsock
socket is not automatically removed from the sockmap/sockhash.

Introduce a dummy vsock_close() and make vsock_release() call proto::close.

Note: changes in __vsock_release() look messy, but it's only due to indent
level reduction and variables xmas tree reorder.

Fixes: 634f1a7110 ("vsock: support sockmap")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Link: https://lore.kernel.org/r/20241118-vsock-bpf-poll-close-v1-3-f1b9669cacdc@rbox.co
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
2024-11-25 14:19:14 -08:00
Michal Luczaj
9c2a2a4513 selftest/bpf: Add test for af_vsock poll()
Verify that vsock's poll() notices when sk_psock::ingress_msg isn't empty.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20241118-vsock-bpf-poll-close-v1-2-f1b9669cacdc@rbox.co
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
2024-11-25 14:19:14 -08:00
Michal Luczaj
9f0fc98145 bpf, vsock: Fix poll() missing a queue
When a verdict program simply passes a packet without redirection, sk_msg
is enqueued on sk_psock::ingress_msg. Add a missing check to poll().

Fixes: 634f1a7110 ("vsock: support sockmap")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Luigi Leonardi <leonardi@redhat.com>
Link: https://lore.kernel.org/r/20241118-vsock-bpf-poll-close-v1-1-f1b9669cacdc@rbox.co
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
2024-11-25 14:19:14 -08:00
Thomas Weißschuh
8618f5ffba bpf, lsm: Remove getlsmprop hooks BTF IDs
These hooks are not useful for BPF LSM currently.
Furthermore a recent renaming introduced build warnings:

  BTFIDS  vmlinux
WARN: resolve_btfids: unresolved symbol bpf_lsm_task_getsecid_obj
WARN: resolve_btfids: unresolved symbol bpf_lsm_current_getsecid_subj

Link: https://lore.kernel.org/lkml/20241123-bpf_lsm_task_getsecid_obj-v1-1-0d0f94649e05@weissschuh.net/
Fixes: 37f670aacd ("lsm: use lsm_prop in security_current_getsecid")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20241125-bpf_lsm_task_getsecid_obj-v2-1-c8395bde84e0@weissschuh.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-25 14:14:17 -08:00
Linus Torvalds
9f16d5e6f2 The biggest change here is eliminating the awful idea that KVM had, of
essentially guessing which pfns are refcounted pages.  The reason to
 do so was that KVM needs to map both non-refcounted pages (for example
 BARs of VFIO devices) and VM_PFNMAP/VM_MIXMEDMAP VMAs that contain
 refcounted pages.  However, the result was security issues in the past,
 and more recently the inability to map VM_IO and VM_PFNMAP memory
 that _is_ backed by struct page but is not refcounted.  In particular
 this broke virtio-gpu blob resources (which directly map host graphics
 buffers into the guest as "vram" for the virtio-gpu device) with the
 amdgpu driver, because amdgpu allocates non-compound higher order pages
 and the tail pages could not be mapped into KVM.
 
 This requires adjusting all uses of struct page in the per-architecture
 code, to always work on the pfn whenever possible.  The large series that
 did this, from David Stevens and Sean Christopherson, also cleaned up
 substantially the set of functions that provided arch code with the
 pfn for a host virtual addresses.  The previous maze of twisty little
 passages, all different, is replaced by five functions (__gfn_to_page,
 __kvm_faultin_pfn, the non-__ versions of these two, and kvm_prefetch_pages)
 saving almost 200 lines of code.
 
 ARM:
 
 * Support for stage-1 permission indirection (FEAT_S1PIE) and
   permission overlays (FEAT_S1POE), including nested virt + the
   emulated page table walker
 
 * Introduce PSCI SYSTEM_OFF2 support to KVM + client driver. This call
   was introduced in PSCIv1.3 as a mechanism to request hibernation,
   similar to the S4 state in ACPI
 
 * Explicitly trap + hide FEAT_MPAM (QoS controls) from KVM guests. As
   part of it, introduce trivial initialization of the host's MPAM
   context so KVM can use the corresponding traps
 
 * PMU support under nested virtualization, honoring the guest
   hypervisor's trap configuration and event filtering when running a
   nested guest
 
 * Fixes to vgic ITS serialization where stale device/interrupt table
   entries are not zeroed when the mapping is invalidated by the VM
 
 * Avoid emulated MMIO completion if userspace has requested synchronous
   external abort injection
 
 * Various fixes and cleanups affecting pKVM, vCPU initialization, and
   selftests
 
 LoongArch:
 
 * Add iocsr and mmio bus simulation in kernel.
 
 * Add in-kernel interrupt controller emulation.
 
 * Add support for virtualization extensions to the eiointc irqchip.
 
 PPC:
 
 * Drop lingering and utterly obsolete references to PPC970 KVM, which was
   removed 10 years ago.
 
 * Fix incorrect documentation references to non-existing ioctls
 
 RISC-V:
 
 * Accelerate KVM RISC-V when running as a guest
 
 * Perf support to collect KVM guest statistics from host side
 
 s390:
 
 * New selftests: more ucontrol selftests and CPU model sanity checks
 
 * Support for the gen17 CPU model
 
 * List registers supported by KVM_GET/SET_ONE_REG in the documentation
 
 x86:
 
 * Cleanup KVM's handling of Accessed and Dirty bits to dedup code, improve
   documentation, harden against unexpected changes.  Even if the hardware
   A/D tracking is disabled, it is possible to use the hardware-defined A/D
   bits to track if a PFN is Accessed and/or Dirty, and that removes a lot
   of special cases.
 
 * Elide TLB flushes when aging secondary PTEs, as has been done in x86's
   primary MMU for over 10 years.
 
 * Recover huge pages in-place in the TDP MMU when dirty page logging is
   toggled off, instead of zapping them and waiting until the page is
   re-accessed to create a huge mapping.  This reduces vCPU jitter.
 
 * Batch TLB flushes when dirty page logging is toggled off.  This reduces
   the time it takes to disable dirty logging by ~3x.
 
 * Remove the shrinker that was (poorly) attempting to reclaim shadow page
   tables in low-memory situations.
 
 * Clean up and optimize KVM's handling of writes to MSR_IA32_APICBASE.
 
 * Advertise CPUIDs for new instructions in Clearwater Forest
 
 * Quirk KVM's misguided behavior of initialized certain feature MSRs to
   their maximum supported feature set, which can result in KVM creating
   invalid vCPU state.  E.g. initializing PERF_CAPABILITIES to a non-zero
   value results in the vCPU having invalid state if userspace hides PDCM
   from the guest, which in turn can lead to save/restore failures.
 
 * Fix KVM's handling of non-canonical checks for vCPUs that support LA57
   to better follow the "architecture", in quotes because the actual
   behavior is poorly documented.  E.g. most MSR writes and descriptor
   table loads ignore CR4.LA57 and operate purely on whether the CPU
   supports LA57.
 
 * Bypass the register cache when querying CPL from kvm_sched_out(), as
   filling the cache from IRQ context is generally unsafe; harden the
   cache accessors to try to prevent similar issues from occuring in the
   future.  The issue that triggered this change was already fixed in 6.12,
   but was still kinda latent.
 
 * Advertise AMD_IBPB_RET to userspace, and fix a related bug where KVM
   over-advertises SPEC_CTRL when trying to support cross-vendor VMs.
 
 * Minor cleanups
 
 * Switch hugepage recovery thread to use vhost_task.  These kthreads can
   consume significant amounts of CPU time on behalf of a VM or in response
   to how the VM behaves (for example how it accesses its memory); therefore
   KVM tried to place the thread in the VM's cgroups and charge the CPU
   time consumed by that work to the VM's container.  However the kthreads
   did not process SIGSTOP/SIGCONT, and therefore cgroups which had KVM
   instances inside could not complete freezing.  Fix this by replacing the
   kthread with a PF_USER_WORKER thread, via the vhost_task abstraction.
   Another 100+ lines removed, with generally better behavior too like
   having these threads properly parented in the process tree.
 
 * Revert a workaround for an old CPU erratum (Nehalem/Westmere) that didn't
   really work; there was really nothing to work around anyway: the broken
   patch was meant to fix nested virtualization, but the PERF_GLOBAL_CTRL
   MSR is virtualized and therefore unaffected by the erratum.
 
 * Fix 6.12 regression where CONFIG_KVM will be built as a module even
   if asked to be builtin, as long as neither KVM_INTEL nor KVM_AMD is 'y'.
 
 x86 selftests:
 
 * x86 selftests can now use AVX.
 
 Documentation:
 
 * Use rST internal links
 
 * Reorganize the introduction to the API document
 
 Generic:
 
 * Protect vcpu->pid accesses outside of vcpu->mutex with a rwlock instead
   of RCU, so that running a vCPU on a different task doesn't encounter long
   due to having to wait for all CPUs become quiescent.  In general both reads
   and writes are rare, but userspace that supports confidential computing is
   introducing the use of "helper" vCPUs that may jump from one host processor
   to another.  Those will be very happy to trigger a synchronize_rcu(), and
   the effect on performance is quite the disaster.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmc9MRYUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroP00QgArxqxBIGLCW5t7bw7vtNq63QYRyh4
 dTiDguLiYQJ+AXmnRu11R6aPC7HgMAvlFCCmH+GEce4WEgt26hxCmncJr/aJOSwS
 letCS7TrME16PeZvh25A1nhPBUw6mTF1qqzgcdHMrqXG8LuHoGcKYGSRVbkf3kfI
 1ZoMq1r8ChXbVVmCx9DQ3gw1TVr5Dpjs2voLh8rDSE9Xpw0tVVabHu3/NhQEz/F+
 t8/nRaqH777icCHIf9PCk5HnarHxLAOvhM2M0Yj09PuBcE5fFQxpxltw/qiKQqqW
 ep4oquojGl87kZnhlDaac2UNtK90Ws+WxxvCwUmbvGN0ZJVaQwf4FvTwig==
 =lWpE
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "The biggest change here is eliminating the awful idea that KVM had of
  essentially guessing which pfns are refcounted pages.

  The reason to do so was that KVM needs to map both non-refcounted
  pages (for example BARs of VFIO devices) and VM_PFNMAP/VM_MIXMEDMAP
  VMAs that contain refcounted pages.

  However, the result was security issues in the past, and more recently
  the inability to map VM_IO and VM_PFNMAP memory that _is_ backed by
  struct page but is not refcounted. In particular this broke virtio-gpu
  blob resources (which directly map host graphics buffers into the
  guest as "vram" for the virtio-gpu device) with the amdgpu driver,
  because amdgpu allocates non-compound higher order pages and the tail
  pages could not be mapped into KVM.

  This requires adjusting all uses of struct page in the
  per-architecture code, to always work on the pfn whenever possible.
  The large series that did this, from David Stevens and Sean
  Christopherson, also cleaned up substantially the set of functions
  that provided arch code with the pfn for a host virtual addresses.

  The previous maze of twisty little passages, all different, is
  replaced by five functions (__gfn_to_page, __kvm_faultin_pfn, the
  non-__ versions of these two, and kvm_prefetch_pages) saving almost
  200 lines of code.

  ARM:

   - Support for stage-1 permission indirection (FEAT_S1PIE) and
     permission overlays (FEAT_S1POE), including nested virt + the
     emulated page table walker

   - Introduce PSCI SYSTEM_OFF2 support to KVM + client driver. This
     call was introduced in PSCIv1.3 as a mechanism to request
     hibernation, similar to the S4 state in ACPI

   - Explicitly trap + hide FEAT_MPAM (QoS controls) from KVM guests. As
     part of it, introduce trivial initialization of the host's MPAM
     context so KVM can use the corresponding traps

   - PMU support under nested virtualization, honoring the guest
     hypervisor's trap configuration and event filtering when running a
     nested guest

   - Fixes to vgic ITS serialization where stale device/interrupt table
     entries are not zeroed when the mapping is invalidated by the VM

   - Avoid emulated MMIO completion if userspace has requested
     synchronous external abort injection

   - Various fixes and cleanups affecting pKVM, vCPU initialization, and
     selftests

  LoongArch:

   - Add iocsr and mmio bus simulation in kernel.

   - Add in-kernel interrupt controller emulation.

   - Add support for virtualization extensions to the eiointc irqchip.

  PPC:

   - Drop lingering and utterly obsolete references to PPC970 KVM, which
     was removed 10 years ago.

   - Fix incorrect documentation references to non-existing ioctls

  RISC-V:

   - Accelerate KVM RISC-V when running as a guest

   - Perf support to collect KVM guest statistics from host side

  s390:

   - New selftests: more ucontrol selftests and CPU model sanity checks

   - Support for the gen17 CPU model

   - List registers supported by KVM_GET/SET_ONE_REG in the
     documentation

  x86:

   - Cleanup KVM's handling of Accessed and Dirty bits to dedup code,
     improve documentation, harden against unexpected changes.

     Even if the hardware A/D tracking is disabled, it is possible to
     use the hardware-defined A/D bits to track if a PFN is Accessed
     and/or Dirty, and that removes a lot of special cases.

   - Elide TLB flushes when aging secondary PTEs, as has been done in
     x86's primary MMU for over 10 years.

   - Recover huge pages in-place in the TDP MMU when dirty page logging
     is toggled off, instead of zapping them and waiting until the page
     is re-accessed to create a huge mapping. This reduces vCPU jitter.

   - Batch TLB flushes when dirty page logging is toggled off. This
     reduces the time it takes to disable dirty logging by ~3x.

   - Remove the shrinker that was (poorly) attempting to reclaim shadow
     page tables in low-memory situations.

   - Clean up and optimize KVM's handling of writes to
     MSR_IA32_APICBASE.

   - Advertise CPUIDs for new instructions in Clearwater Forest

   - Quirk KVM's misguided behavior of initialized certain feature MSRs
     to their maximum supported feature set, which can result in KVM
     creating invalid vCPU state. E.g. initializing PERF_CAPABILITIES to
     a non-zero value results in the vCPU having invalid state if
     userspace hides PDCM from the guest, which in turn can lead to
     save/restore failures.

   - Fix KVM's handling of non-canonical checks for vCPUs that support
     LA57 to better follow the "architecture", in quotes because the
     actual behavior is poorly documented. E.g. most MSR writes and
     descriptor table loads ignore CR4.LA57 and operate purely on
     whether the CPU supports LA57.

   - Bypass the register cache when querying CPL from kvm_sched_out(),
     as filling the cache from IRQ context is generally unsafe; harden
     the cache accessors to try to prevent similar issues from occuring
     in the future. The issue that triggered this change was already
     fixed in 6.12, but was still kinda latent.

   - Advertise AMD_IBPB_RET to userspace, and fix a related bug where
     KVM over-advertises SPEC_CTRL when trying to support cross-vendor
     VMs.

   - Minor cleanups

   - Switch hugepage recovery thread to use vhost_task.

     These kthreads can consume significant amounts of CPU time on
     behalf of a VM or in response to how the VM behaves (for example
     how it accesses its memory); therefore KVM tried to place the
     thread in the VM's cgroups and charge the CPU time consumed by that
     work to the VM's container.

     However the kthreads did not process SIGSTOP/SIGCONT, and therefore
     cgroups which had KVM instances inside could not complete freezing.

     Fix this by replacing the kthread with a PF_USER_WORKER thread, via
     the vhost_task abstraction. Another 100+ lines removed, with
     generally better behavior too like having these threads properly
     parented in the process tree.

   - Revert a workaround for an old CPU erratum (Nehalem/Westmere) that
     didn't really work; there was really nothing to work around anyway:
     the broken patch was meant to fix nested virtualization, but the
     PERF_GLOBAL_CTRL MSR is virtualized and therefore unaffected by the
     erratum.

   - Fix 6.12 regression where CONFIG_KVM will be built as a module even
     if asked to be builtin, as long as neither KVM_INTEL nor KVM_AMD is
     'y'.

  x86 selftests:

   - x86 selftests can now use AVX.

  Documentation:

   - Use rST internal links

   - Reorganize the introduction to the API document

  Generic:

   - Protect vcpu->pid accesses outside of vcpu->mutex with a rwlock
     instead of RCU, so that running a vCPU on a different task doesn't
     encounter long due to having to wait for all CPUs become quiescent.

     In general both reads and writes are rare, but userspace that
     supports confidential computing is introducing the use of "helper"
     vCPUs that may jump from one host processor to another. Those will
     be very happy to trigger a synchronize_rcu(), and the effect on
     performance is quite the disaster"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (298 commits)
  KVM: x86: Break CONFIG_KVM_X86's direct dependency on KVM_INTEL || KVM_AMD
  KVM: x86: add back X86_LOCAL_APIC dependency
  Revert "KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config()"
  KVM: x86: switch hugepage recovery thread to vhost_task
  KVM: x86: expose MSR_PLATFORM_INFO as a feature MSR
  x86: KVM: Advertise CPUIDs for new instructions in Clearwater Forest
  Documentation: KVM: fix malformed table
  irqchip/loongson-eiointc: Add virt extension support
  LoongArch: KVM: Add irqfd support
  LoongArch: KVM: Add PCHPIC user mode read and write functions
  LoongArch: KVM: Add PCHPIC read and write functions
  LoongArch: KVM: Add PCHPIC device support
  LoongArch: KVM: Add EIOINTC user mode read and write functions
  LoongArch: KVM: Add EIOINTC read and write functions
  LoongArch: KVM: Add EIOINTC device support
  LoongArch: KVM: Add IPI user mode read and write function
  LoongArch: KVM: Add IPI read and write function
  LoongArch: KVM: Add IPI device support
  LoongArch: KVM: Add iocsr and mmio bus simulation in kernel
  KVM: arm64: Pass on SVE mapping failures
  ...
2024-11-23 16:00:50 -08:00
Linus Torvalds
42d9e8b7cc powerpc updates for 6.13
- Rework kfence support for the HPT MMU to work on systems with >= 16TB of RAM.
 
  - Remove the powerpc "maple" platform, used by the "Yellow Dog Powerstation".
 
  - Add support for DYNAMIC_FTRACE_WITH_CALL_OPS,
    DYNAMIC_FTRACE_WITH_DIRECT_CALLS & BPF Trampolines.
 
  - Add support for running KVM nested guests on Power11.
 
  - Other small features, cleanups and fixes.
 
 Thanks to: Amit Machhiwal, Arnd Bergmann, Christophe Leroy, Costa Shulyupin,
 David Hunter, David Wang, Disha Goel, Gautam Menghani, Geert Uytterhoeven,
 Hari Bathini, Julia Lawall, Kajol Jain, Keith Packard, Lukas Bulwahn, Madhavan
 Srinivasan, Markus Elfring, Michal Suchanek, Ming Lei, Mukesh Kumar Chaurasiya,
 Nathan Chancellor, Naveen N Rao, Nicholas Piggin, Nysal Jan K.A, Paulo Miguel
 Almeida, Pavithra Prakash, Ritesh Harjani (IBM), Rob Herring (Arm), Sachin P
 Bappalige, Shen Lichuan, Simon Horman, Sourabh Jain, Thomas Weißschuh, Thorsten
 Blum, Thorsten Leemhuis, Venkat Rao Bagalkote, Zhang Zekun,
 zhang jiao.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRjvi15rv0TSTaE+SIF0oADX8seIQUCZ0Fi5AAKCRAF0oADX8se
 IeI0AQCAkNWRYzGNzPM6aMwDpq5qdeZzvp0rZxuNsRSnIKJlxAD+PAOxOietgjbQ
 Lxt3oizg+UcH/304Y/iyT8IrwI4n+gE=
 =xNtu
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Rework kfence support for the HPT MMU to work on systems with >= 16TB
   of RAM.

 - Remove the powerpc "maple" platform, used by the "Yellow Dog
   Powerstation".

 - Add support for DYNAMIC_FTRACE_WITH_CALL_OPS,
   DYNAMIC_FTRACE_WITH_DIRECT_CALLS & BPF Trampolines.

 - Add support for running KVM nested guests on Power11.

 - Other small features, cleanups and fixes.

Thanks to Amit Machhiwal, Arnd Bergmann, Christophe Leroy, Costa
Shulyupin, David Hunter, David Wang, Disha Goel, Gautam Menghani, Geert
Uytterhoeven, Hari Bathini, Julia Lawall, Kajol Jain, Keith Packard,
Lukas Bulwahn, Madhavan Srinivasan, Markus Elfring, Michal Suchanek,
Ming Lei, Mukesh Kumar Chaurasiya, Nathan Chancellor, Naveen N Rao,
Nicholas Piggin, Nysal Jan K.A, Paulo Miguel Almeida, Pavithra Prakash,
Ritesh Harjani (IBM), Rob Herring (Arm), Sachin P Bappalige, Shen
Lichuan, Simon Horman, Sourabh Jain, Thomas Weißschuh, Thorsten Blum,
Thorsten Leemhuis, Venkat Rao Bagalkote, Zhang Zekun, and zhang jiao.

* tag 'powerpc-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (89 commits)
  EDAC/powerpc: Remove PPC_MAPLE drivers
  powerpc/perf: Add per-task/process monitoring to vpa_pmu driver
  powerpc/kvm: Add vpa latency counters to kvm_vcpu_arch
  docs: ABI: sysfs-bus-event_source-devices-vpa-pmu: Document sysfs event format entries for vpa_pmu
  powerpc/perf: Add perf interface to expose vpa counters
  MAINTAINERS: powerpc: Mark Maddy as "M"
  powerpc/Makefile: Allow overriding CPP
  powerpc-km82xx.c: replace of_node_put() with __free
  ps3: Correct some typos in comments
  powerpc/kexec: Fix return of uninitialized variable
  macintosh: Use common error handling code in via_pmu_led_init()
  powerpc/powermac: Use of_property_match_string() in pmac_has_backlight_type()
  powerpc: remove dead config options for MPC85xx platform support
  powerpc/xive: Use cpumask_intersects()
  selftests/powerpc: Remove the path after initialization.
  powerpc/xmon: symbol lookup length fixed
  powerpc/ep8248e: Use %pa to format resource_size_t
  powerpc/ps3: Reorganize kerneldoc parameter names
  KVM: PPC: Book3S HV: Fix kmv -> kvm typo
  powerpc/sstep: make emulate_vsx_load and emulate_vsx_store static
  ...
2024-11-23 10:44:31 -08:00
Linus Torvalds
5c00ff742b - The series "zram: optimal post-processing target selection" from
Sergey Senozhatsky improves zram's post-processing selection algorithm.
   This leads to improved memory savings.
 
 - Wei Yang has gone to town on the mapletree code, contributing several
   series which clean up the implementation:
 
 	- "refine mas_mab_cp()"
 	- "Reduce the space to be cleared for maple_big_node"
 	- "maple_tree: simplify mas_push_node()"
 	- "Following cleanup after introduce mas_wr_store_type()"
 	- "refine storing null"
 
 - The series "selftests/mm: hugetlb_fault_after_madv improvements" from
   David Hildenbrand fixes this selftest for s390.
 
 - The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng
   implements some rationaizations and cleanups in the page mapping code.
 
 - The series "mm: optimize shadow entries removal" from Shakeel Butt
   optimizes the file truncation code by speeding up the handling of shadow
   entries.
 
 - The series "Remove PageKsm()" from Matthew Wilcox completes the
   migration of this flag over to being a folio-based flag.
 
 - The series "Unify hugetlb into arch_get_unmapped_area functions" from
   Oscar Salvador implements a bunch of consolidations and cleanups in the
   hugetlb code.
 
 - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain
   takes away the wp-fault time practice of turning a huge zero page into
   small pages.  Instead we replace the whole thing with a THP.  More
   consistent cleaner and potentiall saves a large number of pagefaults.
 
 - The series "percpu: Add a test case and fix for clang" from Andy
   Shevchenko enhances and fixes the kernel's built in percpu test code.
 
 - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett
   optimizes mremap() by avoiding doing things which we didn't need to do.
 
 - The series "Improve the tmpfs large folio read performance" from
   Baolin Wang teaches tmpfs to copy data into userspace at the folio size
   rather than as individual pages.  A 20% speedup was observed.
 
 - The series "mm/damon/vaddr: Fix issue in
   damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON splitting.
 
 - The series "memcg-v1: fully deprecate charge moving" from Shakeel Butt
   removes the long-deprecated memcgv2 charge moving feature.
 
 - The series "fix error handling in mmap_region() and refactor" from
   Lorenzo Stoakes cleanup up some of the mmap() error handling and
   addresses some potential performance issues.
 
 - The series "x86/module: use large ROX pages for text allocations" from
   Mike Rapoport teaches x86 to use large pages for read-only-execute
   module text.
 
 - The series "page allocation tag compression" from Suren Baghdasaryan
   is followon maintenance work for the new page allocation profiling
   feature.
 
 - The series "page->index removals in mm" from Matthew Wilcox remove
   most references to page->index in mm/.  A slow march towards shrinking
   struct page.
 
 - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs
   interface tests" from Andrew Paniakin performs maintenance work for
   DAMON's self testing code.
 
 - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar
   improves zswap's batching of compression and decompression.  It is a
   step along the way towards using Intel IAA hardware acceleration for
   this zswap operation.
 
 - The series "kasan: migrate the last module test to kunit" from
   Sabyrzhan Tasbolatov completes the migration of the KASAN built-in tests
   over to the KUnit framework.
 
 - The series "implement lightweight guard pages" from Lorenzo Stoakes
   permits userapace to place fault-generating guard pages within a single
   VMA, rather than requiring that multiple VMAs be created for this.
   Improved efficiencies for userspace memory allocators are expected.
 
 - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses
   tracepoints to provide increased visibility into memcg stats flushing
   activity.
 
 - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky
   fixes a zram buglet which potentially affected performance.
 
 - The series "mm: add more kernel parameters to control mTHP" from
   Maíra Canal enhances our ability to control/configuremultisize THP from
   the kernel boot command line.
 
 - The series "kasan: few improvements on kunit tests" from Sabyrzhan
   Tasbolatov has a couple of fixups for the KASAN KUnit tests.
 
 - The series "mm/list_lru: Split list_lru lock into per-cgroup scope"
   from Kairui Song optimizes list_lru memory utilization when lockdep is
   enabled.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZzwFqgAKCRDdBJ7gKXxA
 jkeuAQCkl+BmeYHE6uG0hi3pRxkupseR6DEOAYIiTv0/l8/GggD/Z3jmEeqnZaNq
 xyyenpibWgUoShU2wZ/Ha8FE5WDINwg=
 =JfWR
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - The series "zram: optimal post-processing target selection" from
   Sergey Senozhatsky improves zram's post-processing selection
   algorithm. This leads to improved memory savings.

 - Wei Yang has gone to town on the mapletree code, contributing several
   series which clean up the implementation:
	- "refine mas_mab_cp()"
	- "Reduce the space to be cleared for maple_big_node"
	- "maple_tree: simplify mas_push_node()"
	- "Following cleanup after introduce mas_wr_store_type()"
	- "refine storing null"

 - The series "selftests/mm: hugetlb_fault_after_madv improvements" from
   David Hildenbrand fixes this selftest for s390.

 - The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng
   implements some rationaizations and cleanups in the page mapping
   code.

 - The series "mm: optimize shadow entries removal" from Shakeel Butt
   optimizes the file truncation code by speeding up the handling of
   shadow entries.

 - The series "Remove PageKsm()" from Matthew Wilcox completes the
   migration of this flag over to being a folio-based flag.

 - The series "Unify hugetlb into arch_get_unmapped_area functions" from
   Oscar Salvador implements a bunch of consolidations and cleanups in
   the hugetlb code.

 - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain
   takes away the wp-fault time practice of turning a huge zero page
   into small pages. Instead we replace the whole thing with a THP. More
   consistent cleaner and potentiall saves a large number of pagefaults.

 - The series "percpu: Add a test case and fix for clang" from Andy
   Shevchenko enhances and fixes the kernel's built in percpu test code.

 - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett
   optimizes mremap() by avoiding doing things which we didn't need to
   do.

 - The series "Improve the tmpfs large folio read performance" from
   Baolin Wang teaches tmpfs to copy data into userspace at the folio
   size rather than as individual pages. A 20% speedup was observed.

 - The series "mm/damon/vaddr: Fix issue in
   damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON
   splitting.

 - The series "memcg-v1: fully deprecate charge moving" from Shakeel
   Butt removes the long-deprecated memcgv2 charge moving feature.

 - The series "fix error handling in mmap_region() and refactor" from
   Lorenzo Stoakes cleanup up some of the mmap() error handling and
   addresses some potential performance issues.

 - The series "x86/module: use large ROX pages for text allocations"
   from Mike Rapoport teaches x86 to use large pages for
   read-only-execute module text.

 - The series "page allocation tag compression" from Suren Baghdasaryan
   is followon maintenance work for the new page allocation profiling
   feature.

 - The series "page->index removals in mm" from Matthew Wilcox remove
   most references to page->index in mm/. A slow march towards shrinking
   struct page.

 - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs
   interface tests" from Andrew Paniakin performs maintenance work for
   DAMON's self testing code.

 - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar
   improves zswap's batching of compression and decompression. It is a
   step along the way towards using Intel IAA hardware acceleration for
   this zswap operation.

 - The series "kasan: migrate the last module test to kunit" from
   Sabyrzhan Tasbolatov completes the migration of the KASAN built-in
   tests over to the KUnit framework.

 - The series "implement lightweight guard pages" from Lorenzo Stoakes
   permits userapace to place fault-generating guard pages within a
   single VMA, rather than requiring that multiple VMAs be created for
   this. Improved efficiencies for userspace memory allocators are
   expected.

 - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses
   tracepoints to provide increased visibility into memcg stats flushing
   activity.

 - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky
   fixes a zram buglet which potentially affected performance.

 - The series "mm: add more kernel parameters to control mTHP" from
   Maíra Canal enhances our ability to control/configuremultisize THP
   from the kernel boot command line.

 - The series "kasan: few improvements on kunit tests" from Sabyrzhan
   Tasbolatov has a couple of fixups for the KASAN KUnit tests.

 - The series "mm/list_lru: Split list_lru lock into per-cgroup scope"
   from Kairui Song optimizes list_lru memory utilization when lockdep
   is enabled.

* tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (215 commits)
  cma: enforce non-zero pageblock_order during cma_init_reserved_mem()
  mm/kfence: add a new kunit test test_use_after_free_read_nofault()
  zram: fix NULL pointer in comp_algorithm_show()
  memcg/hugetlb: add hugeTLB counters to memcg
  vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event
  mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount
  zram: ZRAM_DEF_COMP should depend on ZRAM
  MAINTAINERS/MEMORY MANAGEMENT: add document files for mm
  Docs/mm/damon: recommend academic papers to read and/or cite
  mm: define general function pXd_init()
  kmemleak: iommu/iova: fix transient kmemleak false positive
  mm/list_lru: simplify the list_lru walk callback function
  mm/list_lru: split the lock to per-cgroup scope
  mm/list_lru: simplify reparenting and initial allocation
  mm/list_lru: code clean up for reparenting
  mm/list_lru: don't export list_lru_add
  mm/list_lru: don't pass unnecessary key parameters
  kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_caller
  kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOW
  kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbols
  ...
2024-11-23 09:58:07 -08:00
Linus Torvalds
228a1157fb 15 smb3 client fixes, most also for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmdA/W0ACgkQiiy9cAdy
 T1HN6Av/ai3dqbNO0d9IrmM7JONITa6CYqBA+DC2zyv7LtNmSOVGAeP+LyEuM0rE
 jHbQBNSRVhoNKCv5ywT6GhjMNnDeO0ctZG2aXoQnGfCFdZ5dE/08r8Cc24xQW+x5
 cdLbBGR22HBi4MjqqWALL2goL9TpLYo9ht31P1xmiM1pw3/pbh2lsEDLVMS/veeG
 RQ1pIg5YpWcWQWAnuwI6RDxGHF2taj7tcm8gZ1mYRUlaPHmsCHeNg6lgHdDXLFKw
 0HA4dx8AeH2rdMWgwP42UVIVoxG/H//xPpLym+A8yV+11EJcurjYkskhEsSUxeWq
 vzsf0xFxN2MhjwI5DYfr5kIQknjS3qCTOofRsJi6s5GhxJy2hl0Wfto0AJs6+Fl4
 /sukKWjWSgrkcpILLokDUZptYEBH7pUNcBVpfcQOScnUeV4qQk8oko3bTKDSfgTq
 q3zrh2mPanTOpM6UgcBRfywR4r5tAKzoQOUfJpQuItLvLhgrgAm+WGCWFepztJlT
 LPEGTxrG
 =K4Ai
 -----END PGP SIGNATURE-----

Merge tag '6.13-rc-part1-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client updates from Steve French:

 - Fix two SMB3.1.1 POSIX Extensions problems

 - Fixes for special file handling (symlinks and FIFOs)

 - Improve compounding

 - Four cleanup patches

 - Fix use after free in signing

 - Add support for handling namespaces for reconnect related upcalls
   (e.g. for DNS names resolution and auth)

 - Fix various directory lease problems (directory entry caching),
   including some important potential use after frees

* tag '6.13-rc-part1-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: prevent use-after-free due to open_cached_dir error paths
  smb: Don't leak cfid when reconnect races with open_cached_dir
  smb: client: handle max length for SMB symlinks
  smb: client: get rid of bounds check in SMB2_ioctl_init()
  smb: client: improve compound padding in encryption
  smb3: request handle caching when caching directories
  cifs: Recognize SFU char/block devices created by Windows NFS server on Windows Server <<2012
  CIFS: New mount option for cifs.upcall namespace resolution
  smb/client: Prevent error pointer dereference
  fs/smb/client: implement chmod() for SMB3 POSIX Extensions
  smb: cached directories can be more than root file handle
  smb: client: fix use-after-free of signing key
  smb: client: Use str_yes_no() helper function
  smb: client: memcpy() with surrounding object base address
  cifs: Remove pre-historic unused CIFSSMBCopy
2024-11-22 21:54:14 -08:00
Linus Torvalds
e7675238b9 overlayfs updates for 6.13
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmc90jsACgkQEVvVyTe/
 1Wol0A//RhzFCG8geR7Grbptp40CUm9kVISvkr50mPBdvVk3jX9WvH9m/10qapGP
 tcGHSdHt+q5qabqutKLmQRiFbwpGEaBMaFOe7JH8na8xWvmSa3p7sJC5kLByS3rm
 D2F+cVx3Di7MTscz/Ma724bHdHOUO5RbDuMIcjp7uXRvaNWJ0uZg5xWlBKsNa3h8
 DbNSYi5ICihLYpUxI9NglHZ6iqcS2jHsUHSAw52/GJ2Zon1LAAmKoSn6s7hZ27ZJ
 f8Rv5fFuYmkRV7nYo/gjLY1gt7KXZFcfUtMT05yd7zcnqDayKEFXEiwI/Bz5fXZL
 HmZpOP4RV2M9B8HzhReVR/yG8gZaaUezX+aVQp7plZSc73GhMdFFd1bUyjgJ4Lzf
 C2BlBMWafc/Zc7a7r0+X5577i34nED8lGuVMEdYMtjSjstpzIP+1Wlzn2cGi4+5K
 VAb+kEravjP9ck7YrmbruRYfVhDaE37BDs4XML4S8gzcZgdaTcEMyGw1ifEhvPjA
 vLbRs24a5VO7/cKlks7PWS6i9uExaz7g4re0jUPwUuc+nS+Hv+y8kLSPqLS4CtNY
 MxhS2IhKK5gp1Z9XGpLsak+ancTYLSV0OJ15qsAChpqoqSG5Xd9Lt4CWACnF33Ea
 ny8z5QpOAHWVb97k6xaEvu/r0dl+PHdG7vfb0MNhXaajNF8SKiU=
 =pgoX
 -----END PGP SIGNATURE-----

Merge tag 'ovl-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs

Pull overlayfs updates from Amir Goldstein:

 - Fix a syzbot reported NULL pointer deref with bfs lower layers

 - Fix a copy up failure of large file from lower fuse fs

 - Followup cleanup of backing_file API from Miklos

 - Introduction and use of revert/override_creds_light() helpers, that
   were suggested by Christian as a mitigation to cache line bouncing
   and false sharing of fields in overlayfs creator_cred long lived
   struct cred copy.

 - Store up to two backing file references (upper and lower) in an
   ovl_file container instead of storing a single backing file in
   file->private_data.

   This is used to avoid the practice of opening a short lived backing
   file for the duration of some file operations and to avoid the
   specialized use of FDPUT_FPUT in such occasions, that was getting in
   the way of Al's fd_file() conversions.

* tag 'ovl-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
  ovl: Filter invalid inodes with missing lookup function
  ovl: convert ovl_real_fdget() callers to ovl_real_file()
  ovl: convert ovl_real_fdget_path() callers to ovl_real_file_path()
  ovl: store upper real file in ovl_file struct
  ovl: allocate a container struct ovl_file for ovl private context
  ovl: do not open non-data lower file for fsync
  ovl: Optimize override/revert creds
  ovl: pass an explicit reference of creators creds to callers
  ovl: use wrapper ovl_revert_creds()
  fs/backing-file: Convert to revert/override_creds_light()
  cred: Add a light version of override/revert_creds()
  backing-file: clean up the API
  ovl: properly handle large files in ovl_security_fileattr
2024-11-22 20:55:42 -08:00
Linus Torvalds
060fc106b6 unicode updates
This update includes:
 
   - A patch by Thomas Weißschuh constifying a read-only struct.
 
   - A patch by André Almeida fixing the error path of unicode_load,
   which might trigger a kernel oops if it fails to find the unicode
   module.
 
   - One documentation fix by Gan Jie, updating a filename in the README.
 
   - A patch by André Almeida adding the link of my tree to MAINTAINERS.
 
 All but the MAINTAINERS patch have been sitting on my tree and in
 linux-next since early in the 6.12 cycle.
 
 Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQS3XO7QfvpFoONBhH1OwQgI3t8RJgUCZ0D4JwAKCRBOwQgI3t8R
 JmjZAP988O9eB4ITF6KHKsHyY3pOhxSRXU5jpr78v7ofDDuGwAD/UBJZyF35wgJz
 S2q295kCAEP8bUKxj6RJtyMyQnamQg8=
 =irw2
 -----END PGP SIGNATURE-----

Merge tag 'unicode-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode

Pull unicode updates from Gabriel Krisman Bertazi:

 - constify a read-only struct (Thomas Weißschuh)

 - fix the error path of unicode_load, avoiding a possible kernel oops
   if it fails to find the unicode module (André Almeida)

 - documentation fix, updating a filename in the README (Gan Jie)

 - add the link of my tree to MAINTAINERS (André Almeida)

* tag 'unicode-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
  MAINTAINERS: Add Unicode tree
  unicode: change the reference of database file
  unicode: Fix utf8_load() error path
  unicode: constify utf8 data table
2024-11-22 20:50:55 -08:00
Linus Torvalds
980f8f8fd4 Summary
* sysctl ctl_table constification
 
   Constifying ctl_table structs prevents the modification of proc_handler
   function pointers. All ctl_table struct arguments are const qualified in the
   sysctl API in such a way that the ctl_table arrays being defined elsewhere
   and passed through sysctl can be constified one-by-one. We kick the
   constification off by qualifying user_table in kernel/ucount.c and expect all
   the ctl_tables to be constified in the coming releases.
 
 * Misc fixes
 
   Adjust comments in two places to better reflect the code. Remove superfluous
   dput calls. Remove Luis from sysctl maintainership. Replace comments about
   holding a lock with calls to lockdep_assert_held.
 
 * Testing
 
   All these went through 0-day and they have all been in linux-next for at
   least 1 month (since Oct-24). I also rand these through the sysctl selftest
   for x86_64.
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmdAXMsACgkQupfNUreW
 QU/KfQv8Daq9sew98ohmS/lkdoE1dfpI72motzEn1993CbLjN2h3CZauaHjBPFnr
 rpr8qPrphdWTyDbDMgx63oxcNxM07g7a9H0y/K3IwdUsx7fGINgHF5kfWeVn09ov
 X8I3NuL/+xSHAZRsLQeBykbY6BD5e0uuxL6ayGzkejrgRd+80dmC3MzXqX207v1z
 rlrUFXEXwqKYgxP/H+pxmvmVWKAeFsQt/E49GOkg2qSg9mVFhtKpxHwMJVqS2a8u
 qAKHgcZhB5T8TQSb1eKnyCzXLDLpzqUBj9ejqJSsQm16fweawv221Ji6a1k53QYG
 chreoB9R8qCZ/jGoWI3ZKGRZ/Vl37l+GF/82X/sDrMbKwVlxvaERpb1KXrnh/D1v
 qNze1Eea0eYv22weGGEa3J5N2tKfgX6NcRFioDNe9VEXX6zDcAtJKTKZtbMB3gXX
 CzQicH5yXApyAk3aNCq0S3s+WRQR0syGAYCmtxhaRgXRnSu9qifKZ1XhZQyhgKIG
 Flt9MsU2
 =bOJ0
 -----END PGP SIGNATURE-----

Merge tag 'sysctl-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl

Pull sysctl updates from Joel Granados:
 "sysctl ctl_table constification:

   - Constifying ctl_table structs prevents the modification of
     proc_handler function pointers. All ctl_table struct arguments are
     const qualified in the sysctl API in such a way that the ctl_table
     arrays being defined elsewhere and passed through sysctl can be
     constified one-by-one.

     We kick the constification off by qualifying user_table in
     kernel/ucount.c and expect all the ctl_tables to be constified in
     the coming releases.

  Misc fixes:

   - Adjust comments in two places to better reflect the code

   - Remove superfluous dput calls

   - Remove Luis from sysctl maintainership

   - Replace comments about holding a lock with calls to
     lockdep_assert_held"

* tag 'sysctl-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
  sysctl: Reduce dput(child) calls in proc_sys_fill_cache()
  sysctl: Reorganize kerneldoc parameter names
  ucounts: constify sysctl table user_table
  sysctl: update comments to new registration APIs
  MAINTAINERS: remove me from sysctl
  sysctl: Convert locking comments to lockdep assertions
  const_structs.checkpatch: add ctl_table
  sysctl: make internal ctl_tables const
  sysctl: allow registration of const struct ctl_table
  sysctl: move internal interfaces to const struct ctl_table
  bpf: Constify ctl_table argument of filter function
2024-11-22 20:36:11 -08:00
Linus Torvalds
2a163a4cea RDMA v6.13 merge window pull request
Seveal fixes scattered across the drivers and a few new features:
 
 - Minor updates and bug fixes to hfi1, efa, iopob, bnxt, hns
 
 - Force disassociate the userspace FD when hns does an async reset
 
 - bnxt new features for optimized modify QP to skip certain stayes, CQ
   coalescing, better debug dumping
 
 - mlx5 new data placement ordering feature
 
 - Faster destruction of mlx5 devx HW objects
 
 - Improvements to RDMA CM mad handling
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZz4ENwAKCRCFwuHvBreF
 YQYQAP9R54r5J1Iylg+zqhCc+e/9oveuuZbfLvy/EJiEpmdprQEAgPs1RrB0z7U6
 1xrVStUKNPhGd5XeVVZGkIV0zYv6Tw4=
 =V5xI
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Seveal fixes scattered across the drivers and a few new features:

   - Minor updates and bug fixes to hfi1, efa, iopob, bnxt, hns

   - Force disassociate the userspace FD when hns does an async reset

   - bnxt new features for optimized modify QP to skip certain stayes,
     CQ coalescing, better debug dumping

   - mlx5 new data placement ordering feature

   - Faster destruction of mlx5 devx HW objects

   - Improvements to RDMA CM mad handling"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (51 commits)
  RDMA/bnxt_re: Correct the sequence of device suspend
  RDMA/bnxt_re: Use the default mode of congestion control
  RDMA/bnxt_re: Support different traffic class
  IB/cm: Rework sending DREQ when destroying a cm_id
  IB/cm: Do not hold reference on cm_id unless needed
  IB/cm: Explicitly mark if a response MAD is a retransmission
  RDMA/mlx5: Move events notifier registration to be after device registration
  RDMA/bnxt_re: Cache MSIx info to a local structure
  RDMA/bnxt_re: Refurbish CQ to NQ hash calculation
  RDMA/bnxt_re: Refactor NQ allocation
  RDMA/bnxt_re: Fail probe early when not enough MSI-x vectors are reserved
  RDMA/hns: Fix different dgids mapping to the same dip_idx
  RDMA/bnxt_re: Add set_func_resources support for P5/P7 adapters
  RDMA/bnxt_re: Enhance RoCE SRIOV resource configuration design
  bnxt_en: Add support for RoCE sriov configuration
  RDMA/hns: Fix NULL pointer derefernce in hns_roce_map_mr_sg()
  RDMA/hns: Fix out-of-order issue of requester when setting FENCE
  RDMA/nldev: Add IB device and net device rename events
  RDMA/mlx5: Add implementation for ufile_hw_cleanup device operation
  RDMA/core: Move ib_uverbs_file struct to uverbs_types.h
  ...
2024-11-22 20:03:57 -08:00
Linus Torvalds
ceba6f6f33 IOMMU Updates for Linux v6.13:
Including:
 
 	- Core Updates:
 	  - Convert call-sites using iommu_domain_alloc() to more specific
 	    versions and remove function.
 	  - Introduce iommu_paging_domain_alloc_flags().
 	  - Extend support for allocating PASID-capable domains to more
 	    drivers.
 	  - Remove iommu_present().
 	  - Some smaller improvements.
 
 	- New IOMMU driver for RISC-V.
 
 	- Intel VT-d Updates:
 	  - Add domain_alloc_paging support.
 	  - Enable user space IOPFs in non-PASID and non-svm cases.
 	  - Small code refactoring and cleanups.
 	  - Add domain replacement support for pasid.
 
 	- AMD-Vi Updates:
 	  - Adapt to iommu_paging_domain_alloc_flags() interface and alloc V2
 	    page-tables by default.
 	  - Replace custom domain ID allocator with IDA allocator.
 	  - Add ops->release_domain() support.
 	  - Other improvements to device attach and domain allocation code
 	    paths.
 
 	- ARM-SMMU Updates:
 	  - SMMUv2:
 	    - Return -EPROBE_DEFER for client devices probing before their SMMU.
 	    - Devicetree binding updates for Qualcomm MMU-500 implementations.
 	  - SMMUv3:
 	    - Minor fixes and cleanup for NVIDIA's virtual command queue driver.
 	  - IO-PGTable:
 	    - Fix indexing of concatenated PGDs and extend selftest coverage.
 	    - Remove unused block-splitting support.
 
 	- S390 IOMMU:
 	  - Implement support for blocking domain.
 
 	- Mediatek IOMMU:
 	  - Enable 35-bit physical address support for mt8186.
 
 	- OMAP IOMMU driver:
 	  - Adapt to recent IOMMU core changes and unbreak driver.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmdAPOoACgkQK/BELZcB
 GuOs1w/+PoLbOYUjmJiOfpI6YNSEfF2tE4z2al/YYIBcNoAmTTRauuhv6+S0gVRy
 NTfSucw7OuLlbE9vGsdY02UL1PK58NGfUF8Z2rZSf+RRgLACc47cjZWh0vzDlNbP
 4LTdqJXmIWiYcmDtY7LmHtwTSiB900YFZwZOHmTSfNyJt8UC4tBPRh8k2YD3vuxc
 QZlxSihEf+F+vm8GtW40Ia9BiG3YhCYAcHq6Y4dKxI0JWN+7oRiPN8CF+z/vcdjV
 VpCDBcbHjvqqpXJvddQHA0SrGDBMHz1AXYhRXnfe7Ogh6SbaSWDSsdaIS27DsOzC
 L6fxW3+sNmfEOO1RmJoizkHzAtkLWCLNjBvjOb1hUCpwLcKf5nhgE3wOQSwzqumn
 KbxpoQpHFJutikDBGRsKJCsNqS8ZNWd4Z8rHhTnq2ctuYUFvurkcwX4WXOSRpsoA
 iJ+x1ezk9FxObHj/B+1nIAwKoeaLyFEwJe7Etom/E2m/2mq2oQOrq1bvfIGCms5h
 mqLYJ9L9MDanhEiOshHooy6ROPD842XmWILfq3HUi9JcrB/BvILPRsESQnNAn3Zl
 8ImbR5VijGGDy50KBE8I9abRwDTIn9c2JJVDSh3tAz1aicGnRLcIeqNeuJ4IEQZf
 IQb7qcZQge17ie/Pwr24GlwrKG7DhOg5NXvl3DiVUum2NFGjuBc=
 =V9hb
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu updates from Joerg Roedel:
 "Core Updates:
   - Convert call-sites using iommu_domain_alloc() to more specific
     versions and remove function
   - Introduce iommu_paging_domain_alloc_flags()
   - Extend support for allocating PASID-capable domains to more drivers
   - Remove iommu_present()
   - Some smaller improvements

  New IOMMU driver for RISC-V

  Intel VT-d Updates:
   - Add domain_alloc_paging support
   - Enable user space IOPFs in non-PASID and non-svm cases
   - Small code refactoring and cleanups
   - Add domain replacement support for pasid

  AMD-Vi Updates:
   - Adapt to iommu_paging_domain_alloc_flags() interface and alloc V2
     page-tables by default
   - Replace custom domain ID allocator with IDA allocator
   - Add ops->release_domain() support
   - Other improvements to device attach and domain allocation code
     paths

  ARM-SMMU Updates:
   - SMMUv2:
      - Return -EPROBE_DEFER for client devices probing before their
        SMMU
      - Devicetree binding updates for Qualcomm MMU-500 implementations
   - SMMUv3:
      - Minor fixes and cleanup for NVIDIA's virtual command queue
        driver
   - IO-PGTable:
      - Fix indexing of concatenated PGDs and extend selftest coverage
      - Remove unused block-splitting support

  S390 IOMMU:
   - Implement support for blocking domain

  Mediatek IOMMU:
   - Enable 35-bit physical address support for mt8186

  OMAP IOMMU driver:
   - Adapt to recent IOMMU core changes and unbreak driver"

* tag 'iommu-updates-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (92 commits)
  iommu/tegra241-cmdqv: Fix alignment failure at max_n_shift
  iommu: Make set_dev_pasid op support domain replacement
  iommu/arm-smmu-v3: Make set_dev_pasid() op support replace
  iommu/vt-d: Add set_dev_pasid callback for nested domain
  iommu/vt-d: Make identity_domain_set_dev_pasid() to handle domain replacement
  iommu/vt-d: Make intel_svm_set_dev_pasid() support domain replacement
  iommu/vt-d: Limit intel_iommu_set_dev_pasid() for paging domain
  iommu/vt-d: Make intel_iommu_set_dev_pasid() to handle domain replacement
  iommu/vt-d: Add iommu_domain_did() to get did
  iommu/vt-d: Consolidate the struct dev_pasid_info add/remove
  iommu/vt-d: Add pasid replace helpers
  iommu/vt-d: Refactor the pasid setup helpers
  iommu/vt-d: Add a helper to flush cache for updating present pasid entry
  iommu: Pass old domain to set_dev_pasid op
  iommu/iova: Fix typo 'adderss'
  iommu: Add a kdoc to iommu_unmap()
  iommu/io-pgtable-arm-v7s: Remove split on unmap behavior
  iommu/io-pgtable-arm: Remove split on unmap behavior
  iommu/vt-d: Drain PRQs when domain removed from RID
  iommu/vt-d: Drop pasid requirement for prq initialization
  ...
2024-11-22 19:55:10 -08:00
Linus Torvalds
eb78332b10 More thermal control updates for 6.13-rc1
- Add SAR2130P compatible to DT bindings in the QCom Tsens driver (Dmitry
    Baryshkov).
 
  - Add static annotation to arrays describing platform sensors in the
    LVTS Mediatek driver (Colin Ian King).
 
  - Switch back to struct platform_driver::remove() from the previous
    callbacks prototype rework (Uwe Kleine-König).
 
  - Add MSM8937 compatible to DT bindings and its support in the QCom
    Tsens driver (Barnabás Czémán).
 
  - Remove a pointless sign test on an unsigned value in k3_bgp_read_temp()
    in the k3_j72xx_bandgap driver (Rex Nie).
 
  - Fix a pointer reference loss when realloc() fails in the thermal
    library (Zhang Jiao).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmdA4iQSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxIzEP/2ldTTMI81LBlWMfdHe1TZgqart20LJB
 2Zsl45awrNgWCMqWMyFtTq3Mzim0Jss8RWt1B3HP05w/ig2O6svI2FyxKtqZTEhg
 w8VRrbFR4klg8apWbf5d4MojhMAtEmHiBJ/8OkPYR/3Sl+712TE35YA0dxhn8vvV
 4I9h7585nOhAskm0pIqBjBSvbdUq4ake27TSeB/EZk2lVwAXL8LAMT9h1uwlWTWz
 6B79TxQoJ6SYB/Xl/uIgQ5ShULKcBjlc9ofAsyNX41BCDVwSso/I9J9dW1y/VcRt
 RcNglAtWWMcAs6e6+iySvO1Vbn6FIbj7xapDtnRLePV5fes2PuLFrMAvxyjk55Sx
 LGcU+oTBUI9tEf0ywxj9/kDK9mn+zvAbv2OEp9VihT1AdTyBnQ5o70MhwAhFs7h2
 IfpbTs9flxvM6KlT9/qFdD25uqelhuV4ElnTjKuUEapMxn55YcIt5dszZcCf25T+
 L7ybCQ3PV9mYCkGOrig+dpcZeo9YDBevhb5F6jG3yR/SXInxalRHYiia9gIh43p2
 qgGIYYW/B/nk+TSq8bYowXslgREvwjEQEmkZpQVyQWyO1kTnwdeaTRo7IaKnEk58
 1LVcFkD6UVy5dZu0Q93h+dEO18wWkOFX4/uJJ7trw9DMr68bw+gzgQV/JFk5R077
 mfXi95PR3VgW
 =cXi4
 -----END PGP SIGNATURE-----

Merge tag 'thermal-6.13-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more thermal control updates from Rafael Wysocki:
 "These update a few thermal drivers used on ARM platforms and thermal
  tools:

   - Add SAR2130P compatible to DT bindings in the QCom Tsens driver
     (Dmitry Baryshkov)

   - Add static annotation to arrays describing platform sensors in the
     LVTS Mediatek driver (Colin Ian King)

   - Switch back to struct platform_driver::remove() from the previous
     callbacks prototype rework (Uwe Kleine-König)

   - Add MSM8937 compatible to DT bindings and its support in the QCom
     Tsens driver (Barnabás Czémán)

   - Remove a pointless sign test on an unsigned value in
     k3_bgp_read_temp() in the k3_j72xx_bandgap driver (Rex Nie)

   - Fix a pointer reference loss when realloc() fails in the thermal
     library (Zhang Jiao)"

* tag 'thermal-6.13-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  tools/thermal: Fix common realloc mistake
  thermal/drivers/k3_j72xx_bandgap: Simplify code in k3_bgp_read_temp()
  thermal/drivers/qcom/tsens-v1: Add support for MSM8937 tsens
  dt-bindings: thermal: tsens: Add MSM8937
  thermal: Switch back to struct platform_driver::remove()
  thermal/drivers/mediatek/lvts_thermal: Make read-only arrays static const
  dt-bindings: thermal: qcom-tsens: Add SAR2130P compatible
2024-11-22 19:43:18 -08:00
Linus Torvalds
6f9baa9b92 More power management updates for 6.13-rc1
- Add virtual cpufreq driver for guest kernels (David Dai).
 
  - Minor cleanup to various cpufreq drivers (Andy Shevchenko, Dhruva
    Gole, Jie Zhan, Jinjie Ruan, Shuosheng Huang, Sibi Sankar, and Yuan
    Can).
 
  - Revert "cpufreq: brcmstb-avs-cpufreq: Fix initial command check" (Colin
    Ian King).
 
  - Improve DT bindings for qcom-hw driver (Dmitry Baryshkov, Konrad
    Dybcio, and Nikunj Kela).
 
  - Make cpuidle_play_dead() try all idle states with :enter_dead()
    callbacks and change their return type to void (Rafael Wysocki).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmdA6AcSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxSnEP/AkRFAczWdQNqOEw3H7VPZJwqnN6BIpw
 0ewhxx5QHP5TfQ/u/VXxgcja0nuTDVIDJSjvnypyxKJnZa5+sl0Kj9ieaIBlPKAl
 m0UmiqT4mbw4mEIBse/6pZR6DCHOYNfmYTo6BdA1JDfVPM9fEmmicKmk1NW5BEt3
 FOPPZpGACzlMxiFHLhEoDsMJZKGdGZQsu5mLhkHW3pmMVU8+pgKiypO0yVVl535l
 vcgvrXjBRgHVdoUi99zK3ORW3n9O/UADvampEV0e4pxpTNd0kC3U3igaOAT+S/EI
 1wGOxC7WAlMJju+5aUX8qEwg19kalg8ecU49ahhE9o/dRN9lOzLYm4xurpcVWJLH
 pH88XJMTZGHAgfLWD+isR4NRnIYVxdG+K85YpWd560xD1Be+oJXAlrg1PcR2PLea
 CcvyH8a4fGqnpvLxx3teYFYZElXSFkARxEXrfDx0KLDsy+q7DGaoiAOf+AonXipP
 8HjhuLdh2jUPqWwdZR+4Td3HMh6sS0soLGHiURA+lGVS5ucek8ZLnbNoQGS+9NAU
 u64QwAqH9kiW4VxLUC17ErfZLMBU1jj+C79LaMRUoc5wfv6WU9oOZ6Xg9JcGZUGV
 cEsCQqtir9o0snRXHdQv5QTij8IFqTr2iNV8vZ0cmCp8UxSMN8/41p016BZ+Ahd/
 yb7PJ7hHZGm6
 =aB0d
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.13-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "These mostly are updates of cpufreq drivers used on ARM platforms plus
  one new DT-based cpufreq driver for virtualized guests and two cpuidle
  changes that should not make any difference on systems currently in
  the field, but will be needed for future development:

   - Add virtual cpufreq driver for guest kernels (David Dai)

   - Minor cleanup to various cpufreq drivers (Andy Shevchenko, Dhruva
     Gole, Jie Zhan, Jinjie Ruan, Shuosheng Huang, Sibi Sankar, and Yuan
     Can)

   - Revert "cpufreq: brcmstb-avs-cpufreq: Fix initial command check"
     (Colin Ian King)

   - Improve DT bindings for qcom-hw driver (Dmitry Baryshkov, Konrad
     Dybcio, and Nikunj Kela)

   - Make cpuidle_play_dead() try all idle states with :enter_dead()
     callbacks and change their return type to void (Rafael Wysocki)"

* tag 'pm-6.13-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (22 commits)
  cpuidle: Change :enter_dead() driver callback return type to void
  cpuidle: Do not return from cpuidle_play_dead() on callback failures
  arm64: dts: qcom: sc8180x: Add a SoC-specific compatible to cpufreq-hw
  dt-bindings: cpufreq: cpufreq-qcom-hw: Add SC8180X compatible
  cpufreq: sun50i: add a100 cpufreq support
  cpufreq: mediatek-hw: Fix wrong return value in mtk_cpufreq_get_cpu_power()
  cpufreq: CPPC: Fix wrong return value in cppc_get_cpu_power()
  cpufreq: CPPC: Fix wrong return value in cppc_get_cpu_cost()
  cpufreq: loongson3: Check for error code from devm_mutex_init() call
  cpufreq: scmi: Fix cleanup path when boost enablement fails
  cpufreq: CPPC: Fix possible null-ptr-deref for cppc_get_cpu_cost()
  cpufreq: CPPC: Fix possible null-ptr-deref for cpufreq_cpu_get_raw()
  Revert "cpufreq: brcmstb-avs-cpufreq: Fix initial command check"
  dt-bindings: cpufreq: cpufreq-qcom-hw: Add SAR2130P compatible
  cpufreq: add virtual-cpufreq driver
  dt-bindings: cpufreq: add virtual cpufreq device
  cpufreq: loongson2: Unregister platform_driver on failure
  cpufreq: ti-cpufreq: Remove revision offsets in AM62 family
  cpufreq: ti-cpufreq: Allow backward compatibility for efuse syscon
  cppc_cpufreq: Remove HiSilicon CPPC workaround
  ...
2024-11-22 19:29:48 -08:00
Linus Torvalds
619d996c86 Hi,
This pull request has only some minor tweaks.
 
 BR, Jarkko
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRE6pSOnaBC00OEHEIaerohdGur0gUCZz+7VQAKCRAaerohdGur
 0up2AQCVWdhQq3JdGyU6CHEz28ZrcFCR5Mlwr6XHhtXeDb/jlAEA1CjVtSFho2ir
 bfaTfirN7RWEFdZyDpzqwROYu03rfwg=
 =cm2z
 -----END PGP SIGNATURE-----

Merge tag 'tpmdd-next-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

Pull TPM updates from Jarkko Sakkinen:
 "Only some minor tweaks"

* tag 'tpmdd-next-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  tpm: atmel: Drop PPC64 specific MMIO setup
  char: tpm: cr50: Add new device/vendor ID 0x50666666
  char: tpm: cr50: Move i2c locking to request/relinquish locality ops
  char: tpm: cr50: Use generic request/relinquish locality ops
  tpm: ibmvtpm: Set TPM_OPS_AUTO_STARTUP flag on driver
2024-11-22 17:25:32 -08:00
Linus Torvalds
d0c9a21c8e MTD device changes: Aside from the platform_driver::remove() switch, two
misc issues got fixed.
 
 SPI-NAND changes:
 A load of fixes to Winbond manufacturer driver have been done, plus a
 structure constification.
 
 Raw NAND changes:
 The GPMI driver has been improved on the power management side.
 The Davinci driver has been cleaned up.
 A leak in the Atmel driver plus some typos in the core have been fixed.
 
 SPI NOR changes:
 Introduce byte swap support for 8D-8D-8D mode and a user for it:
 macronix. SPI NOR flashes may swap the bytes on a 16-bit boundary when
 configured in Octal DTR mode. For such cases the byte order is
 propagated through SPI MEM to the SPI controllers so that the
 controllers swap the bytes back at runtime. This avoids breaking the
 boot sequence because of the endianness problems that appear when the
 bootloaders use 1-1-1 and the kernel uses 8D-8D-8D with byte swap
 support. Along with the SPI MEM byte swap support we queue a patch for
 the SPI MXIC controller that swaps the bytes back at runtime.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE9HuaYnbmDhq/XIDIJWrqGEe9VoQFAmc/WusACgkQJWrqGEe9
 VoR0Zgf/admMDFN51dtkz950bnOkZfot/4uLgUQCDenhbugHrom7KWQ6+oh1+HSN
 9EAjLoLNQzq4vxKx1WoI/99iJO86zg/DiyVD3nQidv9JkqHRDp2t13ZLclr4gGyW
 Kh1lDQ+9GwpB8CQQnxVaPL39NjjqR3RiEfEP/1fVgGYQvCt4yedhVsDT3WThJeVb
 1n7l54RBpZji88mT0chFB9CoSLnzrYZFh2MvzJaW/i1v02yZLXHFxFiKiKo+WysY
 FGQTY3x0j20H2Ib8RSP7ECegvNb1HtfIxAPsTIqDBGbrA+ahvBr0J/XxX3NbV3RT
 Ee4rXqL257zH9dC9Rr1LJAZCqiyx7w==
 =p+y9
 -----END PGP SIGNATURE-----

Merge tag 'mtd/for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux

Pull MTD updates from Miquel Raynal:
 "MTD device changes:
   - switch platform_driver back to remove()
   - misc fixes

  SPI-NAND changes:
   - a load of fixes to Winbond manufacturer driver
   - structure constification

  Raw NAND changes:
   - improve the power management of the GPMI driver
   - Davinci driver clean-ups
   - fix leak in the Atmel driver
   - fix some typos in the core

  SPI NOR changes:
   - Introduce byte swap support for 8D-8D-8D mode and a user for it:
     macronix.

     SPI NOR flashes may swap the bytes on a 16-bit boundary when
     configured in Octal DTR mode. For such cases the byte order is
     propagated through SPI MEM to the SPI controllers so that the
     controllers swap the bytes back at runtime. This avoids breaking
     the boot sequence because of the endianness problems that appear
     when the bootloaders use 1-1-1 and the kernel uses 8D-8D-8D with
     byte swap support. Along with the SPI MEM byte swap support we
     queue a patch for the SPI MXIC controller that swaps the bytes back
     at runtime"

* tag 'mtd/for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (25 commits)
  mtd: spi-nor: core: replace dummy buswidth from addr to data
  mtd: spi-nor: winbond: add "w/ and w/o SFDP" comment
  mtd: spi-nor: spansion: Use nor->addr_nbytes in octal DTR mode in RD_ANY_REG_OP
  mtd: Switch back to struct platform_driver::remove()
  mtd: cfi_cmdset_0002: remove redundant assignment to variable ret
  mtd: spinand: Constify struct nand_ecc_engine_ops
  MAINTAINERS: add mailing list for GPMI NAND driver
  mtd: spinand: winbond: Sort the devices
  mtd: spinand: winbond: Ignore the last ID characters
  mtd: spinand: winbond: Fix 512GW, 01GW, 01JW and 02JW ECC information
  mtd: spinand: winbond: Fix 512GW and 02JW OOB layout
  mtd: nand: raw: gpmi: improve power management handling
  mtd: nand: raw: gpmi: switch to SYSTEM_SLEEP_PM_OPS
  mtd: rawnand: davinci: use generic device property helpers
  mtd: rawnand: davinci: break the line correctly
  mtd: rawnand: davinci: order headers alphabetically
  mtd: rawnand: atmel: Fix possible memory leak
  mtd: rawnand: Correct multiple typos in comments
  mtd: hyperbus: rpc-if: Add missing MODULE_DEVICE_TABLE
  mtd: spi-nor: add support for Macronix Octal flash
  ...
2024-11-22 17:06:59 -08:00
Linus Torvalds
9f3a2ba62c The core framework gained a clk provider helper, a clk consumer helper, and
some unit tests for the assigned clk rates feature in DeviceTree. On the vendor
 driver side, we gained a whole pile of SoC driver support detailed below. The
 majority in the diffstat is Qualcomm, but there's also quite a few Samsung and
 Mediatek clk driver additions in here as well. The top vendors is quite common,
 but the sheer amount of new drivers is uncommon, so I'm anticipating a larger
 number of fixes for clk drivers this cycle.
 
 Core:
  - devm_clk_bulk_get_all_enabled() to return number of clks acquired
  - devm_clk_hw_register_gate_parent_hw() helper to modernize drivers
  - KUnit tests for clk-assigned-rates{,-u64}
 
 New Drivers:
  - Marvell PXA1908 SoC clks
  - Mobileye EyeQ5, EyeQ6L and EyeQ6H clk driver
  - TWL6030 clk driver
  - Nuvoton Arbel BMC NPCM8XX SoC clks
  - MediaTek MT6735 SoC clks
  - MediaTek MT7620, MT7628 and MT7688 MMC clks
  - Add a driver for gated fixed rate clocks
  - Global clock controllers for Qualcomm QCS8300 and IPQ5424 SoCs
  - Camera, display and video clock controllers for Qualcomm SA8775P SoCs
  - Global, display, GPU, TCSR, and RPMh clock controllers for Qualcomm SAR2130P
  - Global, camera, display, GPU, and video clock controllers for Qualcomm
    SM8475 SoCs
  - RTC power domain and Battery Backup Function (VBATTB) clock support for the
    Renesas RZ/G3S SoC
  - Qualcomm IPQ9574 alpha PLLs
  - Support for i.MX91 CCM in the i.MX93 driver
  - Microchip LAN969X SoC clks
  - Cortex-A55 core clocks and Interrupt Control Unit (ICU) clock and reset on
    Renesas RZ/V2H(P)
  - Samsung ExynosAutov920 clk drivers for PERIC1, MISC, HSI0 and HSI1
  - Samsung Exynos8895 clk drivers for FSYS0/1, PERIC0/1, PERIS and TOP
 
 Updates:
  - Convert more clk bindings to YAML
  - Various clk driver cleanups: NULL checks, add const, etc.
  - Remove END/NUM #defines that count number of clks in various binding headers
  - Continue moving reset drivers to drivers/reset via auxiliary bus
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE9L57QeeUxqYDyoaDrQKIl8bklSUFAmc/r1kRHHNib3lkQGtl
 cm5lbC5vcmcACgkQrQKIl8bklSUlaw/+NkmTMPSpgKy8NfZi6KoCk3U5llaknXvj
 Y/Y2pB7UpOFDTsSCKRcFrZ6JWS6GIogE70W9w+zxIht4QA4Ekd9vKT7VRhMl+8t/
 pz2i0c0Pm24hSye9LKM7JCVIVL8SNYonOs3wC1sfMVMDoUikVwupj6Bmj0nAYrBo
 hbJFBXtn/LbyYImJQ9hYqHnUtJKGp/N7hhpGu6kT/lbzcaWsBMp4lhH+s20DJz5e
 kdJVJGaLOELerAG/SHIxh9obtfznvex6x3itTB0o/d6/1DSDjjlxnZH8YV8eQWk0
 kK+ORuewA+qCi3RiPReHCPBIfPI4HL0z3k5JFA5eI7eD4VZIis+YBOa/Y8bQR9bG
 wDg5qh5su0fdeWBUvkFB03igNoMdtH68iYd2q3YE0ka95FGulcyvbqoyxTJnjIxW
 328PizYZV8LQ4+LGSdIFyp9f/SrjF0pAt7yTF8Dis3jq3ul/6ELX9G6OCNgtGKQz
 p0Hb01fKC4s7w48QI5OXQKfS382vS8G8a2NIwt2xxorc4+Dr2rjPvlDhErshCOAT
 nDEerIjGWr/0rQeTGxg+SLUx5ytq2aBkysg95/9WVe3b8kZeePiW9gEH4tgealY8
 eHzFvbqXutlKer0xLOYiLd3hOeHhkCJNj48QS8jVXtRGGeLjZONw5F1mjTNskPpx
 9jbKMcDjGyc=
 =FqLm
 -----END PGP SIGNATURE-----

Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk updates from Stephen Boyd:
 "The core framework gained a clk provider helper, a clk consumer
  helper, and some unit tests for the assigned clk rates feature in
  DeviceTree. On the vendor driver side, we gained a whole pile of SoC
  driver support detailed below. The majority in the diffstat is
  Qualcomm, but there's also quite a few Samsung and Mediatek clk driver
  additions in here as well. The top vendors is quite common, but the
  sheer amount of new drivers is uncommon, so I'm anticipating a larger
  number of fixes for clk drivers this cycle.

  Core:
   - devm_clk_bulk_get_all_enabled() to return number of clks acquired
   - devm_clk_hw_register_gate_parent_hw() helper to modernize drivers
   - KUnit tests for clk-assigned-rates{,-u64}

  New Drivers:
   - Marvell PXA1908 SoC clks
   - Mobileye EyeQ5, EyeQ6L and EyeQ6H clk driver
   - TWL6030 clk driver
   - Nuvoton Arbel BMC NPCM8XX SoC clks
   - MediaTek MT6735 SoC clks
   - MediaTek MT7620, MT7628 and MT7688 MMC clks
   - Add a driver for gated fixed rate clocks
   - Global clock controllers for Qualcomm QCS8300 and IPQ5424 SoCs
   - Camera, display and video clock controllers for Qualcomm SA8775P
     SoCs
   - Global, display, GPU, TCSR, and RPMh clock controllers for Qualcomm
     SAR2130P
   - Global, camera, display, GPU, and video clock controllers for
     Qualcomm SM8475 SoCs
   - RTC power domain and Battery Backup Function (VBATTB) clock support
     for the Renesas RZ/G3S SoC
   - Qualcomm IPQ9574 alpha PLLs
   - Support for i.MX91 CCM in the i.MX93 driver
   - Microchip LAN969X SoC clks
   - Cortex-A55 core clocks and Interrupt Control Unit (ICU) clock and
     reset on Renesas RZ/V2H(P)
   - Samsung ExynosAutov920 clk drivers for PERIC1, MISC, HSI0 and HSI1
   - Samsung Exynos8895 clk drivers for FSYS0/1, PERIC0/1, PERIS and TOP

  Updates:
   - Convert more clk bindings to YAML
   - Various clk driver cleanups: NULL checks, add const, etc.
   - Remove END/NUM #defines that count number of clks in various
     binding headers
   - Continue moving reset drivers to drivers/reset via auxiliary bus"

* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (162 commits)
  clk: clk-loongson2: Fix potential buffer overflow in flexible-array member access
  clk: Fix invalid execution of clk_set_rate
  clk: clk-loongson2: Fix memory corruption bug in struct loongson2_clk_provider
  clk: lan966x: make it selectable for ARCH_LAN969X
  clk: eyeq: add EyeQ6H west fixed factor clocks
  clk: eyeq: add EyeQ6H central fixed factor clocks
  clk: eyeq: add EyeQ5 fixed factor clocks
  clk: eyeq: add fixed factor clocks infrastructure
  clk: eyeq: require clock index with phandle in all cases
  clk: fixed-factor: add clk_hw_register_fixed_factor_index() function
  dt-bindings: clock: eyeq: add more Mobileye EyeQ5/EyeQ6H clocks
  dt-bindings: soc: mobileye: set `#clock-cells = <1>` for all compatibles
  clk: clk-axi-clkgen: make sure to enable the AXI bus clock
  dt-bindings: clock: axi-clkgen: include AXI clk
  clk: mmp: Add Marvell PXA1908 MPMU driver
  clk: mmp: Add Marvell PXA1908 APMU driver
  clk: mmp: Add Marvell PXA1908 APBCP driver
  clk: mmp: Add Marvell PXA1908 APBC driver
  dt-bindings: clock: Add Marvell PXA1908 clock bindings
  clk: mmp: Switch to use struct u32_fract instead of custom one
  ...
2024-11-22 17:02:25 -08:00
Linus Torvalds
2fb7eb3d7e - Improved handling of LCD power states and interactions with the fbdev subsystem.
- Introduced new LCD_POWER_ constants to decouple the LCD subsystem from fbdev.
 - Several drivers were updated to use the new LCD_POWER_ constants.
 - Clarified the semantics of the lcd_ops.controls_device callback.
 - Removed unnecessary includes and dependencies.
 - Removed unused notifier functionality.
 - Simplified code with scoped for-each loops.
 - Fixed module autoloading for the ktz8866 driver.
 - Updated device tree bindings to yaml format.
 - Minor cleanups and improvements in various drivers.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdrbJNaO+IJqU8IdIUa+KL4f8d2EFAmc/KTkACgkQUa+KL4f8
 d2HzWA//QtnboEtMOOt5hlY/MyGnwDG2nWWwbnlEVQ1zN3lQut4WWOocuyqIu6Wr
 tfRyLKOsiAWdjFi7NBQlV3ylNyyD4XU9/qNduOAMPe3VGj/lZB70coH+RL4L4JvX
 cSiue8vF9StX7gB7qxh1uVBastlCu9ofVZV4dFMD4hX+hA3SgBX5itmOfue9AMkm
 LXdA1DeDvigH5DMPIStHuQS0HVOxv06TJd1syzVBesa8G3raTwluTc16uPJkIwHd
 MnDqdvevaiW1ZnE3+8o2CmkLfFsB27iltwutXn9MbrUKBz0S7/ruNTib+RguFOaP
 Eo+ls0KIaiPH4xi0FuHlSiXKwszIvZ3GGeu3fO66k9m+vM2/IesBGgf0GR3Y8Lxr
 fwl4iNkLguKsBnLHBxhDnrQ3ltQiBw4q8Q2m/zQ4OoaW/sMX+514D3jOZmx3snE1
 ODE22bhoznRLyEAHisv4wA9snoAyvEItejczGC7HLJ0SbWUGBbqzHgvhjwMkh6R6
 0+6dJ22DTlSwZsyXcSkzA85ayiuxFLhwvD0nfxyGwUD//swg8L7ymnpVLOqpdJcc
 EgdVQeuaTij0UH9NutMDoIC1Z0SUMvpli1Wk0/Y1Hmg0vICZUUjewcli6VEbZGFu
 PKkkAeTPJaxfYWW64VHk6flSaW67VjqfiQ01F4oLlRwONcihpxM=
 =9n4l
 -----END PGP SIGNATURE-----

Merge tag 'backlight-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight

Pull backlight updates from Lee Jones:

 - Improve handling of LCD power states and interactions with the fbdev
   subsystem

 - Introduce new LCD_POWER_ constants to decouple the LCD subsystem from
   fbdev

 - Update several drivers to use the new LCD_POWER_ constants

 - Clarify the semantics of the lcd_ops.controls_device callback

 - Remove unnecessary includes and dependencies

 - Remove unused notifier functionality

 - Simplify code with scoped for-each loops

 - Fix module autoloading for the ktz8866 driver

 - Update device tree bindings to yaml format

 - Minor cleanups and improvements in various drivers

* tag 'backlight-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight: (33 commits)
  MAINTAINERS: Use Daniel Thompson's korg address for Backlight work
  dt-bindings: backlight: Convert zii,rave-sp-backlight.txt to yaml
  backlight: Remove notifier
  backlight: ktz8866: Fix module autoloading
  backlight: 88pm860x_bl: Simplify with scoped for each OF child loop
  backlight: lcd: Do not include <linux/fb.h> in lcd header
  backlight: lcd: Remove struct fb_videomode from set_mode callback
  backlight: lcd: Replace check_fb with controls_device
  HID: picoLCD: Replace check_fb in favor of struct fb_info.lcd_dev
  fbdev: omap: Use lcd power constants
  fbdev: imxfb: Use lcd power constants
  fbdev: imxfb: Replace check_fb in favor of struct fb_info.lcd_dev
  fbdev: clps711x-fb: Use lcd power constants
  fbdev: clps711x-fb: Replace check_fb in favor of struct fb_info.lcd_dev
  backlight: tdo24m: Use lcd power constants
  backlight: platform_lcd: Use lcd power constants
  backlight: platform_lcd: Remove match_fb from struct plat_lcd_data
  backlight: platform_lcd: Remove include statement for <linux/backlight.h>
  backlight: otm3225a: Use lcd power constants
  backlight: ltv350qv: Use lcd power constants
  ...
2024-11-22 16:29:57 -08:00
Linus Torvalds
93251bdf7a - Removed unused local header files from various drivers.
- Reverted platform driver removal to the original method for consistency.
 - Introduced ordered workqueues for LED events, replacing the less efficient system_wq.
 - Switched to a safer iteration macro in several drivers to prevent potential memory leaks.
 - Fixed a refcounting bug in the mt6360 flash LED driver.
 - Fixed an uninitialized variable in the mt6370_mc_pattern_clear() function.
 - Resolved Smatch warnings in the leds-bcm6328 driver.
 - Addressed a potential NULL pointer dereference in the brightness_show() function.
 - Fixed an incorrect format specifier in the ss4200 driver.
 - Prevented a resource leak in the max5970 driver's probe function.
 - Added support for specifying the number of serial shift bits in the device tree for the BCM63138 family.
 - Implemented multicolor brightness control in the lp5562 driver.
 - Added a device tree property to override the default LED pin polarity.
 - Added a property to specify the default brightness value when the LED is initially on.
 - Set missing timing properties for the ktd2692 driver.
 - Documented the "rc-feedback" trigger for controlling LEDs based on remote control activity.
 - Converted text bindings to YAML for the pca955x driver to enable device tree validation.
 - Removed redundant checks for invalid channel numbers in the lp55xx driver.
 - Updated the MAINTAINERS file with current contact information.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdrbJNaO+IJqU8IdIUa+KL4f8d2EFAmc/KSgACgkQUa+KL4f8
 d2HvLRAAsTuxvq89Ooe5eVWTY39fADykEx8sLgKTJWBs+A4EkD6a6DZAU0QQKZeX
 o7ERtasl5E0WFXF2BaVYPWNDjcSFlqG3AB5PbQLpzUUEkllJWEaWz5ZthpTMct5n
 Q/ylbTA8dYS+bw3bztTKWRWX8RMXKSHdjymJoSuoFLY7T1MLqNHZbu+whNKfC++h
 UMMXyx2M9B+3/0cDghy1EvCWkBTVOL9GzhQhAc+guQTCA8VeK+7LilnCtDOwymdO
 lScTM87S5gD6n22tcDFHhHH+qXVG8LfNpRRKiZdv6BmFJwHpXD+nAMReJOmyReDp
 jepy1aX/W/1cw/GMzd/nOMx/8u8rbO5Up9euCS/jFvgSpn9bKYIlMy2CVsAznOGN
 elt/kCD3nIxY8pzMTnvxCEVYj+hfbCKQWq51cm6G5hhQbfVw72fwiVIGTy8fadO3
 kLOiWM3EKvtpjbdMl0BzWVOIPLO0gQOeYH7ZYb7TM0g3mtFo47DhoGHYQCiaGgko
 Bpb+4fcXpeFLTutM5HGvqqfOCnl/uk4Gf3KhdECSmjFb3L0JbWE/JN3CMq+AR7oK
 FcOhYhNVEGu7VILQYScmChz/DzJT267lyLKF7y1jkmz5ZKMD2r+8XZ2xEb5ks6PG
 +gU7lMY2OXWs/lB9rGTRfCkS50DFJfNG2F5oN7fZ5UJLMWL1EY8=
 =qZhx
 -----END PGP SIGNATURE-----

Merge tag 'leds-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds

PULL LED updates from Lee Jones:

 - Remove unused local header files from various drivers

 - Revert platform driver removal to the original method for consistency

 - Introduce ordered workqueues for LED events, replacing the less
   efficient system_wq

 - Switch to a safer iteration macro in several drivers to prevent
   potential memory leaks

 - Fix a refcounting bug in the mt6360 flash LED driver

 - Fix an uninitialized variable in the mt6370_mc_pattern_clear()
   function

 - Resolve Smatch warnings in the leds-bcm6328 driver

 - Address a potential NULL pointer dereference in the brightness_show()
   function

 - Fix an incorrect format specifier in the ss4200 driver

 - Prevent a resource leak in the max5970 driver's probe function

 - Add support for specifying the number of serial shift bits in the
   device tree for the BCM63138 family

 - Implement multicolor brightness control in the lp5562 driver

 - Add a device tree property to override the default LED pin polarity

 - Add a property to specify the default brightness value when the LED
   is initially on

 - Set missing timing properties for the ktd2692 driver

 - Document the "rc-feedback" trigger for controlling LEDs based on
   remote control activity

 - Convert text bindings to YAML for the pca955x driver to enable device
   tree validation

 - Remove redundant checks for invalid channel numbers in the lp55xx
   driver

 - Update the MAINTAINERS file with current contact information

* tag 'leds-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds: (46 commits)
  leds: ss4200: Fix the wrong format specifier for 'blinking'
  leds: pwm: Add optional DT property default-brightness
  dt-bindings: leds: pwm: Add default-brightness property
  leds: class: Protect brightness_show() with led_cdev->led_access mutex
  leds: ktd2692: Set missing timing properties
  leds: max5970: Fix unreleased fwnode_handle in probe function
  leds: Introduce ordered workqueue for LEDs events instead of system_wq
  MAINTAINERS: Replace Siemens IPC related bouncing maintainers
  leds: bcm6328: Replace divide condition with comparison for shift value
  leds: lp55xx: Remove redundant test for invalid channel number
  dt-bindings: leds: pca955x: Convert text bindings to YAML
  leds: rgb: leds-mt6370-rgb: Fix uninitialized variable 'ret' in mt6370_mc_pattern_clear
  leds: lp5562: Add multicolor brightness control
  dt-bindings: leds: Add 'active-high' property
  leds: Switch back to struct platform_driver::remove()
  leds: bcm63138: Add some register defines
  leds: bcm63138: Handle shift register config
  leds: bcm63138: Use scopes and guards
  dt-bindings: leds: bcm63138: Add shift register bits
  leds: leds-gpio-register: Reorganize kerneldoc parameter names
  ...
2024-11-22 16:25:20 -08:00
Linus Torvalds
80739fd00c - Several drivers, including atmel-flexcom/rk8xx-core, palmas, and
tps65010, have undergone minor code improvements to enhance consistency and
   fix race conditions.
 
 - The syscon driver now utilizes the regmap max_register_is_0 capability
   for consistent register map configuration across syscons of all sizes.
 
 - New device support has been added for QCS8300, qcs615, SA8255p, and
   samsung,s2dos05, expanding the range of compatible hardware.
 
 - The cros_ec driver now supports loading cros_ec_ucsi on supported ECs
   and avoids loading the charger with UCSI, streamlining functionality.
 
 - The bd96801 driver now utilizes the more modern maple tree register
   cache, improving performance.
 
 - The da9052-spi driver has undergone a fix to change the read-mask to
   write-mask, preventing potential issues.
 
 - Unused declarations in max77693 have been removed, and support for
   samsung,s2dos05 has been added, enhancing code clarity and device compatibility.
 
 - Error handling in cs42l43 has been fixed to avoid unbalanced regulator
   put and ensure proper synchronization during driver removal.
 
 - The wcd934x driver now uses MODULE_DEVICE_TABLE() instead of
   MODULE_ALIAS(), improving code consistency.
 
 - Documentation for qcom,tcsr, syscon, and atmel-smc has been updated
   and reorganized for better clarity and maintainability.
 
 - The intel_soc_pmic_bxtwc driver has undergone significant improvements,
   including the use of IRQ domains for various devices, fixing IRQ domain names
   duplication, and code refactoring for better consistency and maintainability.
 
 - The ipaq-micro driver has received a fix for a missing break statement in
   the default case, enhancing code robustness.
 
 - Support for the AXP323 PMIC has been added to the axp20x driver, along
   with ensuring a clear relationship between IDs and model names, and allowing
   multiple regulators, broadening hardware compatibility.
 
 - The cs42l43 driver now disables IRQs during suspend for improved power
   management.
 
 - The adp5585 driver has reduced its dependencies by dropping the obsolete
   dependency on COMPILE_TEST.
 
 - Initial support for the MT6328 PMIC has been added to the mt6397 driver,
   expanding the range of supported hardware.
 
 - The rtc-bd70528 driver has been simplified by dropping the IC name from
   IRQ, improving code readability.
 
 - Documentation for qcom,spmi-pmic, ti,twl, and zii,rave-sp has been
   updated to enhance clarity and incorporate new features.
 
 - The rt5033 driver has received a fix for a missing regmap_del_irq_chip()
   in the error handling path.
 
 - New device support has been added for MSM8917, and the
   intel_soc_pmic_crc driver now supports non-ACPI instantiated i2c_client.
 
 - The 88pm886 driver has added support for the RTC cell, and the tqmx86
   driver has improved its GPIO IRQ setup and added I2C IRQ support,
   increasing functionality.
 
 - The sprd,sc2731 DT schema has been updated and converted to YAML format
   for better readability and maintainability.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEdrbJNaO+IJqU8IdIUa+KL4f8d2EFAmc/KQQACgkQUa+KL4f8
 d2E0Gg//blIYrtUgGy5xEwR8WIobYtAxBo+AMX1tSgh4Hs4u/SFhy4cE7no+M3J2
 Id5gQJscuz3k4sH0raoUMp2NRFyI8lD8Y9xRBTDE+KV/FbdL1KHfmTun2NY1zG7U
 LIop39HsJkJ0Z06lnyBf61QK6SmlzT8vXbJmK4mYf8wgBX7iFDZ0FMZHP2uW5k/Z
 UV8nyQalwerG+jOGXfQkVDXF8YKToqPtqsFWTJ1Yn5gs1SCd6dyusDNYqUDuW4Ng
 dbu/4wt3mspliTOnBTPnXlcVsCNefhtbCWxyBpaA3luK9ciMdX7cZ8wei1xkFcwK
 5bXPjXsFiiUbDX0l/6eS1h676k1JQl5iABlhGXHJm/GMcN9fdNFCQL/2rtJ4iSfW
 0CoYjERfm6OyHF0Wiuk3I8x/AARWKXtDEjktGXUL0do7NBqJgB3ISme8x8b5hW4l
 HO6MmsFmHxHbIlb+kCTTCtXa5R1Sdca/8qrPxMb+B89X3eOtF7sjVgS9dwkLNCGp
 hqP0K2IGNaRw+EDlXCBaWrbq7x0kpup6o+nooViU0Pj9fFjEdZlCLyu22+kjl04V
 Lfe3x9wMXBrHVrPynoaQp6+57QlWfpM0uuKJWoaKlCoJTh8UbFcWWkDqr6I/pDur
 EtfSwOO8uVuS8m/FMAs0m/+zrWfHAvjAbAHFCKBu/vKaD5DvxeI=
 =YP3r
 -----END PGP SIGNATURE-----

Merge tag 'mfd-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd

Pull MFD updates from Lee Jones:

 - Several drivers, including atmel-flexcom/rk8xx-core, palmas, and
   tps65010, have undergone minor code improvements to enhance
   consistency and fix race conditions.

 - The syscon driver now utilizes the regmap max_register_is_0
   capability for consistent register map configuration across syscons
   of all sizes.

 - New device support has been added for QCS8300, qcs615, SA8255p, and
   samsung,s2dos05, expanding the range of compatible hardware.

 - The cros_ec driver now supports loading cros_ec_ucsi on supported ECs
   and avoids loading the charger with UCSI, streamlining functionality.

 - The bd96801 driver now utilizes the more modern maple tree register
   cache, improving performance.

 - The da9052-spi driver has undergone a fix to change the read-mask to
   write-mask, preventing potential issues.

 - Unused declarations in max77693 have been removed, and support for
   samsung,s2dos05 has been added, enhancing code clarity and device
   compatibility.

 - Error handling in cs42l43 has been fixed to avoid unbalanced
   regulator put and ensure proper synchronization during driver
   removal.

 - The wcd934x driver now uses MODULE_DEVICE_TABLE() instead of
   MODULE_ALIAS(), improving code consistency.

 - Documentation for qcom,tcsr, syscon, and atmel-smc has been updated
   and reorganized for better clarity and maintainability.

 - The intel_soc_pmic_bxtwc driver has undergone significant
   improvements, including the use of IRQ domains for various devices,
   fixing IRQ domain names duplication, and code refactoring for better
   consistency and maintainability.

 - The ipaq-micro driver has received a fix for a missing break
   statement in the default case, enhancing code robustness.

 - Support for the AXP323 PMIC has been added to the axp20x driver,
   along with ensuring a clear relationship between IDs and model names,
   and allowing multiple regulators, broadening hardware compatibility.

 - The cs42l43 driver now disables IRQs during suspend for improved
   power management.

 - The adp5585 driver has reduced its dependencies by dropping the
   obsolete dependency on COMPILE_TEST.

 - Initial support for the MT6328 PMIC has been added to the mt6397
   driver, expanding the range of supported hardware.

 - The rtc-bd70528 driver has been simplified by dropping the IC name
   from IRQ, improving code readability.

 - Documentation for qcom,spmi-pmic, ti,twl, and zii,rave-sp has been
   updated to enhance clarity and incorporate new features.

 - The rt5033 driver has received a fix for a missing
   regmap_del_irq_chip() in the error handling path.

 - New device support has been added for MSM8917, and the
   intel_soc_pmic_crc driver now supports non-ACPI instantiated
   i2c_client.

 - The 88pm886 driver has added support for the RTC cell, and the tqmx86
   driver has improved its GPIO IRQ setup and added I2C IRQ support,
   increasing functionality.

 - The sprd,sc2731 DT schema has been updated and converted to YAML
   format for better readability and maintainability.

* tag 'mfd-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (62 commits)
  dt-bindings: mfd: bd71828: Use charger resistor in mOhm instead of MOhm
  dt-bindings: mfd: sprd,sc2731: Convert to YAML
  mfd: tqmx86: Add I2C IRQ support
  mfd: tqmx86: Make IRQ setup errors non-fatal
  mfd: tqmx86: Refactor GPIO IRQ setup
  mfd: tqmx86: Improve gpio_irq module parameter description
  mfd: tqmx86: Add board definitions for TQMx120UC, TQMx130UC and TQMxE41S
  mfd: 88pm886: Add the RTC cell
  dt-bindings: mfd: Add Realtek RTL9300 switch peripherals
  mfd: intel_soc_pmic_crc: Add support for non ACPI instantiated i2c_client
  mfd: intel_soc_pmic_*: Consistently use filename as driver name
  dt-bindings: mfd: qcom,tcsr: Add compatible for MSM8917
  mfd: rt5033: Fix missing regmap_del_irq_chip()
  mfd: cgbc-core: Fix error handling paths in cgbc_init_device()
  dt-bindings: mfd: aspeed: Support for AST2700
  mfd: Switch back to struct platform_driver::remove()
  dt-bindings: mfd: qcom,spmi-pmic: Document PMICs added in SM8750
  mfd: rtc: bd7xxxx Drop IC name from IRQ
  mfd: mt6397: Add initial support for MT6328
  mfd: adp5585: Drop obsolete dependency on COMPILE_TEST
  ...
2024-11-22 16:19:47 -08:00
Linus Torvalds
e288c352a4 linux_kselftest-kunit-6.13-rc1-fixed
kunit update for Linux 6.13-rc1
 
 -- fixes user-after-free (UAF) bug in kunit_init_suite()
 
 -- adds option to kunit tool to print just the summary of test results
 
 -- adds option to kunit tool to print just the failed test results
 
 -- fixes kunit_zalloc_skb() to use user passed in gfp value instead of
    hardcoding GFP_KERNEL
 
 -- fixes kunit_zalloc_skb() kernel doc to include allocation flags variable
 
 -- updates KUnit email address for Brendan Higgins
 
 -- adds LoongArch config to qemu_configs
 
 -- changes tool to allow overriding the shutdown mode from qemu config
 
 -- enables shutdown in loongarch qemu_config
 
 -- fixes potential null dereference in kunit_device_driver_test()
 
 -- fixes debugfs to use IS_ERR() for alloc_string_stream() error check
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmc/zUAACgkQCwJExA0N
 QxyHtRAAo439aoXwyLs/tTezI3K8O3gT0QJTqOCuOdE3OtuqfHOn5LpAXEeEfAHo
 Mb0G0jXuMJSHrt3psUQFBEJ3+wKax8FtNoUJN7Tc8gbQJ1Ue8UBAVKdTUmca6lqB
 SwIaZA4hZBujj9z1YwnMYkTnuGAG03rzOjE1biZq+GNuo7cw5OQ/49yQA2BBR9fR
 zVxMcc2C0JfMkGezR74/gV7uWHyVmG/SOofoxLs2JsgGPN4hMAqtjLgVjqZ9CuQ+
 YJWKxrRNrhMJrfYxGHPI9Yvab919Vftkf+oUlJArjwCBCZQihkScuuwDvBdZx+u3
 Npt81j2lpglxJNiXIm2K5lsKs6djhT6omueOWVIlibF6Ee3Hmd3b7Xc+AJbJMUMe
 xQyro059spIP7ezgrGeC2hETkt6CAGD7K6R9iedh2oOozC/Tp03a+8jE6mux2EQm
 nhrQIqazbNRdnv3Pe3+G09peqgQ9FlH2hHoMEoSPM4JQNJWGYhDG19YwLO8fnzsO
 HHb4xS8DLxk0uu87HWI89GEptVi/s0LrqQeUJb0ZlAIdU842TsTSOWmxu6M6FBsx
 9hFkBVsFkAKt5912gnm4qQSgkwkNw4Ou7MnMbHPm/gFpmIkGPhqpCmQSScct3lZx
 YWNxQ5CcQeNk4ngWJz3d1Voq27tL1yd5Zvbg97v4wHnSyo4W+JY=
 =gRq+
 -----END PGP SIGNATURE-----

Merge tag 'linux_kselftest-kunit-6.13-rc1-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kunit updates from Shuah Khan:

 - fix user-after-free (UAF) bug in kunit_init_suite()

 - add option to kunit tool to print just the summary of test results

 - add option to kunit tool to print just the failed test results

 - fix kunit_zalloc_skb() to use user passed in gfp value instead of
   hardcoding GFP_KERNEL

 - fixe kunit_zalloc_skb() kernel doc to include allocation flags
   variable

 - update KUnit email address for Brendan Higgins

 - add LoongArch config to qemu_configs

 - allow overriding the shutdown mode from qemu config

 - enable shutdown in loongarch qemu_config

 - fix potential null dereference in kunit_device_driver_test()

 - fix debugfs to use IS_ERR() for alloc_string_stream() error check

* tag 'linux_kselftest-kunit-6.13-rc1-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: qemu_configs: loongarch: Enable shutdown
  kunit: tool: Allow overriding the shutdown mode from qemu config
  kunit: qemu_configs: Add LoongArch config
  kunit: debugfs: Use IS_ERR() for alloc_string_stream() error check
  kunit: Fix potential null dereference in kunit_device_driver_test()
  MAINTAINERS: Update KUnit email address for Brendan Higgins
  kunit: string-stream: Fix a UAF bug in kunit_init_suite()
  kunit: tool: print failed tests only
  kunit: tool: Only print the summary
  kunit: skb: add gfp to kernel doc for kunit_zalloc_skb()
  kunit: skb: use "gfp" variable instead of hardcoding GFP_KERNEL
2024-11-22 16:11:56 -08:00
Linus Torvalds
06afb0f361 tracing updates for v6.13:
- Addition of faultable tracepoints
 
   There's a tracepoint attached to both a system call entry and exit. This
   location is known to allow page faults. The tracepoints are called under
   an rcu_read_lock() which does not allow faults that can sleep. This limits
   the ability of tracepoint handlers to page fault in user space system call
   parameters. Now these tracepoints have been made "faultable", allowing the
   callbacks to fault in user space parameters and record them.
 
   Note, only the infrastructure has been implemented. The consumers (perf,
   ftrace, BPF) now need to have their code modified to allow faults.
 
 - Fix up of BPF code for the tracepoint faultable logic
 
 - Update tracepoints to use the new static branch API
 
 - Remove trace_*_rcuidle() variants and the SRCU protection they used
 
 - Remove unused TRACE_EVENT_FL_FILTERED logic
 
 - Replace strncpy() with strscpy() and memcpy()
 
 - Use replace per_cpu_ptr(smp_processor_id()) with this_cpu_ptr()
 
 - Fix perf events to not duplicate samples when tracing is enabled
 
 - Replace atomic64_add_return(1, counter) with atomic64_inc_return(counter)
 
 - Make stack trace buffer 4K instead of PAGE_SIZE
 
 - Remove TRACE_FLAG_IRQS_NOSUPPORT flag as it was never used
 
 - Get the true return address for function tracer when function graph tracer
   is also running.
 
   When function_graph trace is running along with function tracer,
   the parent function of the function tracer sometimes is
   "return_to_handler", which is the function graph trampoline to record
   the exit of the function. Use existing logic that calls into the
   fgraph infrastructure to find the real return address.
 
 - Remove (un)regfunc pointers out of tracepoint structure
 
 - Added last minute bug fix for setting pending modules in stack function
   filter.
 
   echo "write*:mod:ext3" > /sys/kernel/tracing/stack_trace_filter
 
   Would cause a kernel NULL dereference.
 
 - Minor clean ups
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZz6dehQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qlQsAP9aB0XGUV3UykvjZuKK84VDZ26a2hZH
 X2JDYsNA4luuPAEAz/BG2rnslfMZ04WTMAl8h1eh10lxcuHG0wQMHVBXIwI=
 =lzb5
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

 - Addition of faultable tracepoints

   There's a tracepoint attached to both a system call entry and exit.
   This location is known to allow page faults. The tracepoints are
   called under an rcu_read_lock() which does not allow faults that can
   sleep. This limits the ability of tracepoint handlers to page fault
   in user space system call parameters. Now these tracepoints have been
   made "faultable", allowing the callbacks to fault in user space
   parameters and record them.

   Note, only the infrastructure has been implemented. The consumers
   (perf, ftrace, BPF) now need to have their code modified to allow
   faults.

 - Fix up of BPF code for the tracepoint faultable logic

 - Update tracepoints to use the new static branch API

 - Remove trace_*_rcuidle() variants and the SRCU protection they used

 - Remove unused TRACE_EVENT_FL_FILTERED logic

 - Replace strncpy() with strscpy() and memcpy()

 - Use replace per_cpu_ptr(smp_processor_id()) with this_cpu_ptr()

 - Fix perf events to not duplicate samples when tracing is enabled

 - Replace atomic64_add_return(1, counter) with
   atomic64_inc_return(counter)

 - Make stack trace buffer 4K instead of PAGE_SIZE

 - Remove TRACE_FLAG_IRQS_NOSUPPORT flag as it was never used

 - Get the true return address for function tracer when function graph
   tracer is also running.

   When function_graph trace is running along with function tracer, the
   parent function of the function tracer sometimes is
   "return_to_handler", which is the function graph trampoline to record
   the exit of the function. Use existing logic that calls into the
   fgraph infrastructure to find the real return address.

 - Remove (un)regfunc pointers out of tracepoint structure

 - Added last minute bug fix for setting pending modules in stack
   function filter.

     echo "write*:mod:ext3" > /sys/kernel/tracing/stack_trace_filter

   Would cause a kernel NULL dereference.

 - Minor clean ups

* tag 'trace-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (31 commits)
  ftrace: Fix regression with module command in stack_trace_filter
  tracing: Fix function name for trampoline
  ftrace: Get the true parent ip for function tracer
  tracing: Remove redundant check on field->field in histograms
  bpf: ensure RCU Tasks Trace GP for sleepable raw tracepoint BPF links
  bpf: decouple BPF link/attach hook and BPF program sleepable semantics
  bpf: put bpf_link's program when link is safe to be deallocated
  tracing: Replace strncpy() with strscpy() when copying comm
  tracing: Add might_fault() check in __DECLARE_TRACE_SYSCALL
  tracing: Fix syscall tracepoint use-after-free
  tracing: Introduce tracepoint_is_faultable()
  tracing: Introduce tracepoint extended structure
  tracing: Remove TRACE_FLAG_IRQS_NOSUPPORT
  tracing: Replace multiple deprecated strncpy with memcpy
  tracing: Make percpu stack trace buffer invariant to PAGE_SIZE
  tracing: Use atomic64_inc_return() in trace_clock_counter()
  trace/trace_event_perf: remove duplicate samples on the first tracepoint event
  tracing/bpf: Add might_fault check to syscall probes
  tracing/perf: Add might_fault check to syscall probes
  tracing/ftrace: Add might_fault check to syscall probes
  ...
2024-11-22 13:27:01 -08:00
Linus Torvalds
4b01712311 tracing/tools: Updates for 6.13
- Add ':' to getopt option 'trace-buffer-size' in timerlat_hist for
   consistency
 
 - Remove unused sched_getattr define
 
 - Rename sched_setattr() helper to syscall_sched_setattr() to avoid
   conflicts
 
 - Update counters to long from int to avoid overflow
 
 - Add libcpupower dependency detection
 
 - Add --deepest-idle-state to timerlat to limit deep idle sleeps
 
 - Other minor clean ups and documentation changes
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZz5O/hQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qkLlAQDAJ0MASrdbJRDrLrfmKX6sja582MLe
 3MvevdSkOeXRdQEA0tzm46KOb5/aYNotzpntQVkTjuZiPBHSgn1JzASiaAI=
 =OZ1w
 -----END PGP SIGNATURE-----

Merge tag 'trace-tools-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing tools updates from Steven Rostedt:

 - Add ':' to getopt option 'trace-buffer-size' in timerlat_hist for
   consistency

 - Remove unused sched_getattr define

 - Rename sched_setattr() helper to syscall_sched_setattr() to avoid
   conflicts

 - Update counters to long from int to avoid overflow

 - Add libcpupower dependency detection

 - Add --deepest-idle-state to timerlat to limit deep idle sleeps

 - Other minor clean ups and documentation changes

* tag 'trace-tools-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  verification/dot2: Improve dot parser robustness
  tools/rtla: Improve exception handling in timerlat_load.py
  tools/rtla: Enhance argument parsing in timerlat_load.py
  tools/rtla: Improve code readability in timerlat_load.py
  rtla/timerlat: Do not set params->user_workload with -U
  rtla: Documentation: Mention --deepest-idle-state
  rtla/timerlat: Add --deepest-idle-state for hist
  rtla/timerlat: Add --deepest-idle-state for top
  rtla/utils: Add idle state disabling via libcpupower
  rtla: Add optional dependency on libcpupower
  tools/build: Add libcpupower dependency detection
  rtla/timerlat: Make timerlat_hist_cpu->*_count unsigned long long
  rtla/timerlat: Make timerlat_top_cpu->*_count unsigned long long
  tools/rtla: fix collision with glibc sched_attr/sched_set_attr
  tools/rtla: drop __NR_sched_getattr
  rtla: Fix consistency in getopt_long for timerlat_hist
  rv: Fix a typo
  tools/rv: Correct the grammatical errors in the comments
  tools/rv: Correct the grammatical errors in the comments
  rtla: use the definition for stdout fd when calling isatty()
2024-11-22 13:24:22 -08:00
Linus Torvalds
f1db825805 trace ring-buffer updates for v6.13
- Limit time interrupts are disabled in rb_check_pages()
 
   The rb_check_pages() is called after the ring buffer size is updated to
   make sure that the ring buffer has not been corrupted. Commit
   c2274b908d ("ring-buffer: Fix a race between readers and resize
   checks") fixed a race with the check pages and simultaneous resizes to the
   ring buffer by adding a raw_spin_lock_irqsave() around the check
   operation. Although this was a simple fix, it would hold interrupts
   disabled for non determinative amount of time. This could harm PREEMPT_RT
   operations.
 
   Instead, modify the logic by adding a counter when the buffer is modified
   and to release the raw_spin_lock() at each iteration. It checks the
   counter under the lock to see if a modification happened during the loop,
   and if it did, it would restart the loop up to 3 times. After 3 times, it
   will simply exit the check, as it is unlikely that would ever happen as
   buffer resizes are rare occurrences.
 
 - Replace some open coded str_low_high() with the helper
 
 - Fix some documentation/comments
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZz5KNxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qiANAP4/6cSGOhQgIkaN8UsKmWTfBqU89JK2
 a4tqAZWKsQormgEAkDLPD0Lda0drmu/Dwnr/klS21yyLcQBzyX1CYw9G4gY=
 =jkLz
 -----END PGP SIGNATURE-----

Merge tag 'trace-ring-buffer-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull trace ring-buffer updates from Steven Rostedt:

 - Limit time interrupts are disabled in rb_check_pages()

   rb_check_pages() is called after the ring buffer size is updated to
   make sure that the ring buffer has not been corrupted. Commit
   c2274b908d ("ring-buffer: Fix a race between readers and resize
   checks") fixed a race with the check pages and simultaneous resizes
   to the ring buffer by adding a raw_spin_lock_irqsave() around the
   check operation. Although this was a simple fix, it would hold
   interrupts disabled for non determinative amount of time. This could
   harm PREEMPT_RT operations.

   Instead, modify the logic by adding a counter when the buffer is
   modified and to release the raw_spin_lock() at each iteration. It
   checks the counter under the lock to see if a modification happened
   during the loop, and if it did, it would restart the loop up to 3
   times. After 3 times, it will simply exit the check, as it is
   unlikely that would ever happen as buffer resizes are rare
   occurrences.

 - Replace some open coded str_low_high() with the helper

 - Fix some documentation/comments

* tag 'trace-ring-buffer-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Correct a grammatical error in a comment
  ring-buffer: Use str_low_high() helper in ring_buffer_producer()
  ring-buffer: Reorganize kerneldoc parameter names
  ring-buffer: Limit time with disabled interrupts in rb_check_pages()
2024-11-22 13:11:17 -08:00
Linus Torvalds
be4202228e - Add new infrastructure for reading TDX metadata
- Use the newly-available metadata to:
   - Disable potentially nasty #VE exceptions
   - Get more complete CPU topology information from the VMM
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmc9PCoACgkQaDWVMHDJ
 krDIjA//Wo2GVkPkviCdRxAsd0lWyKN52mVHPy1JAlT8H/idOpreMHod1Hn/Jga9
 UqSH2lKZe1JM9KcyOW/1xTVxlJtYxj+aIbDkIVj4qk84e34yOXlJXvBiXbNgscSU
 ZWDN6C20LJuv5ZNb7rA1hGRrPlPS8C5g74oBWQX6qAhBCPS6n9ir0wgixZUwV4kw
 eH3QWyVzhNOKzKqGcHY4+DCe5odn8VDxGXtC97NNRzg1kfu4s0OaCF4EpCAOhZgA
 V/K7eFJQdRsnXtQ0eOq9O6l04cd6aqEyYp76Y6FL9Ztfno6P/YopmfVwXL1wYoHr
 aXO/Qi0km+Vz8xXr6blHC/s/amiLHxn6Ou1ZqUmniOcZiYva9GfyUoHttegH4IcV
 /kDfG52A5T1rREPwakYPTMUhan04nTk7rbjwyXbATyRtbKatl5bbsli4kw0T2R5P
 laCPabvXv3eeBCk+PYPnuVCstfpN5VsLr4VUz3wSMoIEbq3l4lvO/yKaok+KCtyD
 qQjk3H09GDvlR7gWB20Tg5ajyOeoK99gD3qwtXZ7APvvAQ9MewN+fgj2z5QPYnrd
 Bea88gjsSYbJ1RYG68c378rRI0uM6RKXjXu3wSro+QN2oTuE7jfNUh8/SiDi/Ssi
 fdYTFcrVLA3/b9dEqhCOSCVyEifgLMDorAODrdf0oJDuCpeoa4E=
 =NIl1
 -----END PGP SIGNATURE-----

Merge tag 'x86_tdx_for_6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull tdx updates from Dave Hansen:
 "These essentially refine some interactions between TDX guests and
  VMMs.

  The first leverages a new TDX module feature to runtime disable the
  ability for a VM to inject #VE exceptions. Before this feature, there
  was only a static on/off switch and the guest had to panic if it was
  configured in a bad state.

  The second lets the guest opt in to be able to access the topology
  CPUID leaves. Before this, accesses to those leaves would #VE.

  For both of these, it would have been nicest to just change the
  default behavior, but some pesky "other" OSes evidently need to retain
  the legacy behavior.

  Summary:

   - Add new infrastructure for reading TDX metadata

   - Use the newly-available metadata to:
      - Disable potentially nasty #VE exceptions
      - Get more complete CPU topology information from the VMM"

* tag 'x86_tdx_for_6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tdx: Enable CPU topology enumeration
  x86/tdx: Dynamically disable SEPT violations from causing #VEs
  x86/tdx: Rename tdx_parse_tdinfo() to tdx_setup()
  x86/tdx: Introduce wrappers to read and write TD metadata
2024-11-22 13:07:19 -08:00