These two callers of bpf_local_storage_alloc() are the same as
bpf_selem_alloc(): bpf_sk_storage_clone() and
bpf_local_storage_update(). The running contexts of these two callers
have already disabled migration, therefore, there is no need to add
extra migrate_{disable|enable} pair in bpf_local_storage_alloc().
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-15-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpf_selem_alloc() has two callers:
(1) bpf_sk_storage_clone_elem()
bpf_sk_storage_clone() has already disabled migration before invoking
bpf_sk_storage_clone_elem().
(2) bpf_local_storage_update()
Its callers include: cgrp/task/inode/sock storage ->map_update_elem()
callbacks and bpf_{cgrp|task|inode|sk}_storage_get() helpers. These
running contexts have already disabled migration
Therefore, there is no need to add extra migrate_{disable|enable} pair
in bpf_selem_alloc().
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-14-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When BPF program invokes bpf_cpumask_release(), the migration must have
been disabled. When bpf_cpumask_release_dtor() invokes
bpf_cpumask_release(), the caller bpf_obj_free_fields() also has
disabled migration, therefore, it is OK to remove the unnecessary
migrate_{disable|enable} pair in bpf_cpumask_release().
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-13-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The callers of bpf_obj_free_fields() have already guaranteed that the
migration is disabled, therefore, there is no need to invoke
migrate_{disable,enable} pair in bpf_obj_free_fields()'s underly
implementation.
This patch removes unnecessary migrate_{disable|enable} pairs from
bpf_obj_free_fields() and its callees: bpf_list_head_free() and
bpf_rb_root_free().
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-12-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The freeing of all map elements may invoke bpf_obj_free_fields() to free
the special fields in the map value. Since these special fields may be
allocated from bpf memory allocator, migrate_{disable|enable} pairs are
necessary for the freeing of these special fields.
To simplify reasoning about when migrate_disable() is needed for the
freeing of these special fields, let the caller to guarantee migration
is disabled before invoking bpf_obj_free_fields(). Therefore, disabling
migration before calling ops->map_free() to simplify the freeing of map
values or special fields allocated from bpf memory allocator.
After disabling migration in bpf_map_free(), there is no need for
additional migration_{disable|enable} pairs in these ->map_free()
callbacks. Remove these redundant invocations.
The migrate_{disable|enable} pairs in the underlying implementation of
bpf_obj_free_fields() will be removed by the following patch.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-11-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpf_selem_free_rcu() calls bpf_obj_free_fields() to free the special
fields in map value (e.g., kptr). Since kptrs may be allocated from bpf
memory allocator, migrate_{disable|enable} pairs are necessary for the
freeing of these kptrs.
To simplify reasoning about when migrate_disable() is needed for the
freeing of these dynamically-allocated kptrs, let the caller to
guarantee migration is disabled before invoking bpf_obj_free_fields().
Therefore, the patch adds migrate_{disable|enable} pair in
bpf_selem_free_rcu(). The migrate_{disable|enable} pairs in the
underlying implementation of bpf_obj_free_fields() will be removed by
the following patch.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-10-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpf_sk_storage_clone() will call bpf_selem_free() to free the clone
element when the allocation of new sock storage fails. bpf_selem_free()
will call check_and_free_fields() to free the special fields in the
element. Since the allocated element is not visible to bpf syscall or
bpf program when bpf_local_storage_alloc() fails, these special fields
in the element must be all zero when invoking bpf_selem_free().
To be uniform with other callers of bpf_selem_free(), disabling
migration when cloning sock storage. Adding migrate_{disable|enable}
pair also benefits the potential switching from kzalloc to bpf memory
allocator for sock storage.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-9-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When destroying sock storage, it invokes bpf_local_storage_destroy() to
remove all storage elements saved in the sock storage. The destroy
procedure will call bpf_selem_free() to free the element, and
bpf_selem_free() calls bpf_obj_free_fields() to free the special fields
in map value (e.g., kptr). Since kptrs may be allocated from bpf memory
allocator, migrate_{disable|enable} pairs are necessary for the freeing
of these kptrs.
To simplify reasoning about when migrate_disable() is needed for the
freeing of these dynamically-allocated kptrs, let the caller to
guarantee migration is disabled before invoking bpf_obj_free_fields().
Therefore, the patch adds migrate_{disable|enable} pair in
bpf_sock_storage_free(). The migrate_{disable|enable} pairs in the
underlying implementation of bpf_obj_free_fields() will be removed by
The following patch.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-8-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When destroying inode storage, it invokes bpf_local_storage_destroy() to
remove all storage elements saved in the inode storage. The destroy
procedure will call bpf_selem_free() to free the element, and
bpf_selem_free() calls bpf_obj_free_fields() to free the special fields
in map value (e.g., kptr). Since kptrs may be allocated from bpf memory
allocator, migrate_{disable|enable} pairs are necessary for the freeing
of these kptrs.
To simplify reasoning about when migrate_disable() is needed for the
freeing of these dynamically-allocated kptrs, let the caller to
guarantee migration is disabled before invoking bpf_obj_free_fields().
Therefore, the patch adds migrate_{disable|enable} pair in
bpf_inode_storage_free(). The migrate_{disable|enable} pairs in the
underlying implementation of bpf_obj_free_fields() will be removed by
the following patch.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-7-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Three callers of bpf_task_storage_lock() are ->map_lookup_elem,
->map_update_elem, ->map_delete_elem from bpf syscall. BPF syscall for
these three operations of task storage has already disabled migration.
Another two callers are bpf_task_storage_get() and
bpf_task_storage_delete() helpers which will be used by BPF program.
Two callers of bpf_task_storage_trylock() are bpf_task_storage_get() and
bpf_task_storage_delete() helpers. The running contexts of these helpers
have already disabled migration.
Therefore, it is safe to remove migrate_{disable|enable} from task
storage lock helpers for these call sites. However,
bpf_task_storage_free() also invokes bpf_task_storage_lock() and its
running context doesn't disable migration, therefore, add the missed
migrate_{disable|enable} in bpf_task_storage_free().
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-6-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Three callers of bpf_cgrp_storage_lock() are ->map_lookup_elem,
->map_update_elem, ->map_delete_elem from bpf syscall. BPF syscall for
these three operations of cgrp storage has already disabled migration.
Two call sites of bpf_cgrp_storage_trylock() are bpf_cgrp_storage_get(),
and bpf_cgrp_storage_delete() helpers. The running contexts of these
helpers have already disabled migration.
Therefore, it is safe to remove migrate_disable() for these callers.
However, bpf_cgrp_storage_free() also invokes bpf_cgrp_storage_lock()
and its running context doesn't disable migration. Therefore, also add
the missed migrate_{disabled|enable} in bpf_cgrp_storage_free().
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-5-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
htab_elem_free() has two call-sites: delete_all_elements() has already
disabled migration, free_htab_elem() is invoked by other 4 functions:
__htab_map_lookup_and_delete_elem, __htab_map_lookup_and_delete_batch,
htab_map_update_elem and htab_map_delete_elem.
BPF syscall has already disabled migration before invoking
->map_update_elem, ->map_delete_elem, and ->map_lookup_and_delete_elem
callbacks for hash map. __htab_map_lookup_and_delete_batch() also
disables migration before invoking free_htab_elem(). ->map_update_elem()
and ->map_delete_elem() of hash map may be invoked by BPF program and
the running context of BPF program has already disabled migration.
Therefore, it is safe to remove the migration_{disable|enable} pair in
htab_elem_free()
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
BPF program may call bpf_for_each_map_elem(), and it will call
the ->map_for_each_callback callback of related bpf map. Considering the
running context of bpf program has already disabled migration, remove
the unnecessary migrate_{disable|enable} pair in the implementations of
->map_for_each_callback. To ensure the guarantee will not be voilated
later, also add cant_migrate() check in the implementations.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Both bpf program and bpf syscall may invoke ->update or ->delete
operation for LPM trie. For bpf program, its running context has already
disabled migration explicitly through (migrate_disable()) or implicitly
through (preempt_disable() or disable irq). For bpf syscall, the
migration is disabled through the use of bpf_disable_instrumentation()
before invoking the corresponding map operation callback.
Therefore, it is safe to remove the migrate_{disable|enable){} pair from
LPM trie.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adding kprobe.session probe to bpf_kfunc_common_test that misses bpf
program execution due to recursion check and making sure it increases
the program missed count properly.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20250106175048.1443905-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When kprobe multi bpf program can't be executed due to recursion check,
we currently return 0 (success) to fprobe layer where it's ignored for
standard kprobe multi probes.
For kprobe session the success return value will make fprobe layer to
install return probe and try to execute it as well.
But the return session probe should not get executed, because the entry
part did not run. FWIW the return probe bpf program most likely won't get
executed, because its recursion check will likely fail as well, but we
don't need to run it in the first place.. also we can make this clear
and obvious.
It also affects missed counts for kprobe session program execution, which
are now doubled (extra count for not executed return probe).
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20250106175048.1443905-1-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commit ef1b808e3b7c ("bpf: Fix UAF via mismatching bpf_prog/attachment
RCU flavors") resolved a possible UAF issue in uprobes that attach
non-sleepable bpf prog by explicitly waiting for a tasks-trace-RCU grace
period. But, in the current implementation, synchronize_rcu_tasks_trace
is included within the mutex critical section, which increases the
length of the critical section and may affect performance. So let's move
out synchronize_rcu_tasks_trace from mutex CS.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20250104013946.1111785-1-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Emil Tsalapatis says:
====================
In BPF programs, kfunc calls while holding a lock are not allowed
because kfuncs may sleep by default. The exception to this rule are the
functions in special_kfunc_list, which are guaranteed to not sleep. The
bpf_iter_num_* functions used by the bpf_for and bpf_repeat macros make
no function calls themselves, and as such are guaranteed to not sleep.
Add them to special_kfunc_list to allow them within BPF spinlock
critical sections.
Signed-off-by: Emil Tsalapatis (Meta) <emil@etsalapatis.com>
====================
Link: https://patch.msgid.link/20250104202528.882482-1-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a selftest to ensure BPF for loops within critical sections are
accepted by the verifier.
Signed-off-by: Emil Tsalapatis (Meta) <emil@etsalapatis.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250104202528.882482-3-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add the bpf_iter_num_* kfuncs called by bpf_for in special_kfunc_list,
and allow the calls even while holding a spin lock.
Signed-off-by: Emil Tsalapatis (Meta) <emil@etsalapatis.com>
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250104202528.882482-2-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commit f1517eb790f9 ("bpf/tests: Expand branch conversion JIT test")
introduced "Long conditional jump tests" but due to those tests making
use of 64 bits DIV and MOD, they don't get jited on powerpc/32,
leading to the long conditional jump test being skiped for unrelated
reason.
Add 4 new tests that are restricted to 32 bits ALU so that the jump
tests can also be performed on platforms that do no support 64 bits
operations.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/609f87a2d84e032c8d9ccb9ba7aebef893698f1e.1736154762.git.christophe.leroy@csgroup.eu
Currently in emit_{lse,ll_sc}_atomic(), if there is an offset, we add it
to the base address by doing e.g.:
if (off) {
emit_a64_mov_i(1, tmp, off, ctx);
emit(A64_ADD(1, tmp, tmp, dst), ctx);
[...]
As pointed out by Xu, we can use emit_a64_add_i() (added in the previous
patch) instead, which tries to combine the above into a single A64_ADD_I
or A64_SUB_I when possible.
Suggested-by: Xu Kuohai <xukuohai@huaweicloud.com>
Signed-off-by: Peilin Ye <yepeilin@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/bpf/9ad3034a62361d91a99af24efa03f48c4c9e13ea.1735868489.git.yepeilin@google.com
Currently, when we run the BPF selftests with the following command:
make -C tools/testing/selftests TARGETS=bpf SKIP_TARGETS=""
The command generates untracked files and directories with make version
less than 4.4:
'''
Untracked files:
(use "git add <file>..." to include in what will be committed)
tools/testing/selftests/bpfFEATURE-DUMP.selftests
tools/testing/selftests/bpffeature/
'''
We lost slash after word "bpf". The reason is slash appending code is as
follow:
'''
OUTPUT := $(OUTPUT)/
$(eval include ../../../build/Makefile.feature)
OUTPUT := $(patsubst %/,%,$(OUTPUT))
'''
This way of assigning values to OUTPUT will never be effective for the
variable OUTPUT provided via the command argument [1] and BPF makefile
is called from parent Makfile(tools/testing/selftests/Makefile) like:
'''
all:
...
$(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET
'''
According to GNU make, we can use override Directive to fix this issue [2].
[1] https://www.gnu.org/software/make/manual/make.html#Overriding
[2] https://www.gnu.org/software/make/manual/make.html#Override-Directive
Fixes: dc3a8804d790 ("selftests/bpf: Adapt OUTPUT appending logic to lower versions of Make")
Signed-off-by: Jiayuan Chen <mrpre@163.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20241224075957.288018-1-mrpre@163.com
There is a UAF report in the bpf_struct_ops when CONFIG_MODULES=n.
In particular, the report is on tcp_congestion_ops that has
a "struct module *owner" member.
For struct_ops that has a "struct module *owner" member,
it can be extended either by the regular kernel module or
by the bpf_struct_ops. bpf_try_module_get() will be used
to do the refcounting and different refcount is done
based on the owner pointer. When CONFIG_MODULES=n,
the btf_id of the "struct module" is missing:
WARN: resolve_btfids: unresolved symbol module
Thus, the bpf_try_module_get() cannot do the correct refcounting.
Not all subsystem's struct_ops requires the "struct module *owner" member.
e.g. the recent sched_ext_ops.
This patch is to disable bpf_struct_ops registration if
the struct_ops has the "struct module *" member and the
"struct module" btf_id is missing. The btf_type_is_fwd() helper
is moved to the btf.h header file for this test.
This has happened since the beginning of bpf_struct_ops which has gone
through many changes. The Fixes tag is set to a recent commit that this
patch can apply cleanly. Considering CONFIG_MODULES=n is not
common and the age of the issue, targeting for bpf-next also.
Fixes: 1611603537a4 ("bpf: Create argument information for nullable arguments.")
Reported-by: Robert Morris <rtm@csail.mit.edu>
Closes: https://lore.kernel.org/bpf/74665.1733669976@localhost/
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20241220201818.127152-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The range tree introduction removed the need for maple tree usage
but missed removing the MT_ENTRY defined value that was used to
mark maple tree allocated entries.
Remove the MT_ENTRY define.
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Link: https://lore.kernel.org/r/20241223115901.14207-1-lpieralisi@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Commit 82c1f13de315 ("selftests/bpf: Add more stats into veristat")
introduced new stats, added by default in the CSV output, that were not
added to parse_stat_value, used in parse_stats_csv which is used in
comparison mode. Thus it broke comparison mode altogether making it fail
with "Unrecognized stat #7" and EINVAL.
One quirk is that PROG_TYPE and ATTACH_TYPE have been transformed to
strings using libbpf_bpf_prog_type_str and libbpf_bpf_attach_type_str
respectively. Since we might not want to compare those string values, we
just skip the parsing in this patch. We might want to translate it back
to the enum value or compare the string value directly.
Fixes: 82c1f13de315 ("selftests/bpf: Add more stats into veristat")
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
Tested-by: Mykyta Yatsenko<yatsenko@meta.com>
Link: https://lore.kernel.org/r/20241220152218.28405-1-mahe.tardy@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Matan Shachnai says:
====================
This patch-set aims to improve precision of BPF_MUL and add testcases
to illustrate precision gains using signed and unsigned bounds.
Changes from v1:
- Fixed typo made in patch.
Changes from v2:
- Added signed multiplication to BPF_MUL.
- Added test cases to exercise BPF_MUL.
- Reordered patches in the series.
Changes from v3:
- Coding style fixes.
====================
Link: https://patch.msgid.link/20241218032337.12214-1-m.shachnai@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch improves (or maintains) the precision of register value tracking
in BPF_MUL across all possible inputs. It also simplifies
scalar32_min_max_mul() and scalar_min_max_mul().
As it stands, BPF_MUL is composed of three functions:
case BPF_MUL:
tnum_mul();
scalar32_min_max_mul();
scalar_min_max_mul();
The current implementation of scalar_min_max_mul() restricts the u64 input
ranges of dst_reg and src_reg to be within [0, U32_MAX]:
/* Both values are positive, so we can work with unsigned and
* copy the result to signed (unless it exceeds S64_MAX).
*/
if (umax_val > U32_MAX || dst_reg->umax_value > U32_MAX) {
/* Potential overflow, we know nothing */
__mark_reg64_unbounded(dst_reg);
return;
}
This restriction is done to avoid unsigned overflow, which could otherwise
wrap the result around 0, and leave an unsound output where umin > umax. We
also observe that limiting these u64 input ranges to [0, U32_MAX] leads to
a loss of precision. Consider the case where the u64 bounds of dst_reg are
[0, 2^34] and the u64 bounds of src_reg are [0, 2^2]. While the
multiplication of these two bounds doesn't overflow and is sound [0, 2^36],
the current scalar_min_max_mul() would set the entire register state to
unbounded.
Importantly, we update BPF_MUL to allow signed bound multiplication
(i.e. multiplying negative bounds) as well as allow u64 inputs to take on
values from [0, U64_MAX]. We perform signed multiplication on two bounds
[a,b] and [c,d] by multiplying every combination of the bounds
(i.e. a*c, a*d, b*c, and b*d) and checking for overflow of each product. If
there is an overflow, we mark the signed bounds unbounded [S64_MIN, S64_MAX].
In the case of no overflow, we take the minimum of these products to
be the resulting smin, and the maximum to be the resulting smax.
The key idea here is that if there’s no possibility of overflow, either
when multiplying signed bounds or unsigned bounds, we can safely multiply the
respective bounds; otherwise, we set the bounds that exhibit overflow
(during multiplication) to unbounded.
if (check_mul_overflow(*dst_umax, src_reg->umax_value, dst_umax) ||
(check_mul_overflow(*dst_umin, src_reg->umin_value, dst_umin))) {
/* Overflow possible, we know nothing */
*dst_umin = 0;
*dst_umax = U64_MAX;
}
...
Below, we provide an example BPF program (below) that exhibits the
imprecision in the current BPF_MUL, where the outputs are all unbounded. In
contrast, the updated BPF_MUL produces a bounded register state:
BPF_LD_IMM64(BPF_REG_1, 11),
BPF_LD_IMM64(BPF_REG_2, 4503599627370624),
BPF_ALU64_IMM(BPF_NEG, BPF_REG_2, 0),
BPF_ALU64_IMM(BPF_NEG, BPF_REG_2, 0),
BPF_ALU64_REG(BPF_AND, BPF_REG_1, BPF_REG_2),
BPF_LD_IMM64(BPF_REG_3, 809591906117232263),
BPF_ALU64_REG(BPF_MUL, BPF_REG_3, BPF_REG_1),
BPF_MOV64_IMM(BPF_REG_0, 1),
BPF_EXIT_INSN(),
Verifier log using the old BPF_MUL:
func#0 @0
0: R1=ctx() R10=fp0
0: (18) r1 = 0xb ; R1_w=11
2: (18) r2 = 0x10000000000080 ; R2_w=0x10000000000080
4: (87) r2 = -r2 ; R2_w=scalar()
5: (87) r2 = -r2 ; R2_w=scalar()
6: (5f) r1 &= r2 ; R1_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=11,var_off=(0x0; 0xb)) R2_w=scalar()
7: (18) r3 = 0xb3c3f8c99262687 ; R3_w=0xb3c3f8c99262687
9: (2f) r3 *= r1 ; R1_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=11,var_off=(0x0; 0xb)) R3_w=scalar()
...
Verifier using the new updated BPF_MUL (more precise bounds at label 9)
func#0 @0
0: R1=ctx() R10=fp0
0: (18) r1 = 0xb ; R1_w=11
2: (18) r2 = 0x10000000000080 ; R2_w=0x10000000000080
4: (87) r2 = -r2 ; R2_w=scalar()
5: (87) r2 = -r2 ; R2_w=scalar()
6: (5f) r1 &= r2 ; R1_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=11,var_off=(0x0; 0xb)) R2_w=scalar()
7: (18) r3 = 0xb3c3f8c99262687 ; R3_w=0xb3c3f8c99262687
9: (2f) r3 *= r1 ; R1_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=11,var_off=(0x0; 0xb)) R3_w=scalar(smin=0,smax=umax=0x7b96bb0a94a3a7cd,var_off=(0x0; 0x7fffffffffffffff))
...
Finally, we proved the soundness of the new scalar_min_max_mul() and
scalar32_min_max_mul() functions. Typically, multiplication operations are
expensive to check with bitvector-based solvers. We were able to prove the
soundness of these functions using Non-Linear Integer Arithmetic (NIA)
theory. Additionally, using Agni [2,3], we obtained the encodings for
scalar32_min_max_mul() and scalar_min_max_mul() in bitvector theory, and
were able to prove their soundness using 8-bit bitvectors (instead of
64-bit bitvectors that the functions actually use).
In conclusion, with this patch,
1. We were able to show that we can improve the overall precision of
BPF_MUL. We proved (using an SMT solver) that this new version of
BPF_MUL is at least as precise as the current version for all inputs
and more precise for some inputs.
2. We are able to prove the soundness of the new scalar_min_max_mul() and
scalar32_min_max_mul(). By leveraging the existing proof of tnum_mul
[1], we can say that the composition of these three functions within
BPF_MUL is sound.
[1] https://ieeexplore.ieee.org/abstract/document/9741267
[2] https://link.springer.com/chapter/10.1007/978-3-031-37709-9_12
[3] https://people.cs.rutgers.edu/~sn349/papers/sas24-preprint.pdf
Co-developed-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>
Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>
Co-developed-by: Srinivas Narayana <srinivas.narayana@rutgers.edu>
Signed-off-by: Srinivas Narayana <srinivas.narayana@rutgers.edu>
Co-developed-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu>
Signed-off-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu>
Signed-off-by: Matan Shachnai <m.shachnai@gmail.com>
Link: https://lore.kernel.org/r/20241218032337.12214-2-m.shachnai@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Starting from 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and
MFD_EXEC") and until 1717449b4417 ("memfd: drop warning for missing
exec-related flags"), the kernel would print a warning if neither
MFD_NOEXEC_SEAL nor MFD_EXEC is set in memfd_create().
If libbpf runs on on a kernel between these two commits (eg. on an
improperly backported system), it'll trigger this warning.
To avoid this warning (and also be more secure), explicitly set
MFD_NOEXEC_SEAL. But since libbpf can be run on potentially very old
kernels, leave a fallback for kernels without MFD_NOEXEC_SEAL support.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/6e62c2421ad7eb1da49cbf16da95aaaa7f94d394.1735594195.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In PREEMPT_RT, kmalloc(GFP_ATOMIC) is still not safe in non preemptible
context. bpf_mem_alloc must be used in PREEMPT_RT. This patch is
to enforce bpf_mem_alloc in the bpf_local_storage when CONFIG_PREEMPT_RT
is enabled.
[ 35.118559] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[ 35.118566] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1832, name: test_progs
[ 35.118569] preempt_count: 1, expected: 0
[ 35.118571] RCU nest depth: 1, expected: 1
[ 35.118577] INFO: lockdep is turned off.
...
[ 35.118647] __might_resched+0x433/0x5b0
[ 35.118677] rt_spin_lock+0xc3/0x290
[ 35.118700] ___slab_alloc+0x72/0xc40
[ 35.118723] __kmalloc_noprof+0x13f/0x4e0
[ 35.118732] bpf_map_kzalloc+0xe5/0x220
[ 35.118740] bpf_selem_alloc+0x1d2/0x7b0
[ 35.118755] bpf_local_storage_update+0x2fa/0x8b0
[ 35.118784] bpf_sk_storage_get_tracing+0x15a/0x1d0
[ 35.118791] bpf_prog_9a118d86fca78ebb_trace_inet_sock_set_state+0x44/0x66
[ 35.118795] bpf_trace_run3+0x222/0x400
[ 35.118820] __bpf_trace_inet_sock_set_state+0x11/0x20
[ 35.118824] trace_inet_sock_set_state+0x112/0x130
[ 35.118830] inet_sk_state_store+0x41/0x90
[ 35.118836] tcp_set_state+0x3b3/0x640
There is no need to adjust the gfp_flags passing to the
bpf_mem_cache_alloc_flags() which only honors the GFP_KERNEL.
The verifier has ensured GFP_KERNEL is passed only in sleepable context.
It has been an old issue since the first introduction of the
bpf_local_storage ~5 years ago, so this patch targets the bpf-next.
bpf_mem_alloc is needed to solve it, so the Fixes tag is set
to the commit when bpf_mem_alloc was first used in the bpf_local_storage.
Fixes: 08a7ce384e33 ("bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage_elem")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20241218193000.2084281-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fix comparator implementation to return most popular source code
lines instead of least.
Introduce min/max macro for building veristat outside of Linux
repository.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241217181113.364651-1-mykyta.yatsenko5@gmail.com
free_task() already calls bpf_task_storage_free(). It is not necessary
to call it again on security_task_free(). Remove the hook.
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://patch.msgid.link/20241212075956.2614894-1-song@kernel.org
- Limit EFI zboot to GZIP and ZSTD before it comes in wider use
- Fix inconsistent error when looking up a non-existent file in efivarfs
with a name that does not adhere to the NAME-GUID format
- Drop some unused code
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQQm/3uucuRGn1Dmh0wbglWLn0tXAUCZ17ajwAKCRAwbglWLn0t
XGkQAQCuIi5yPony5hJf6vrYXm7rnHN2NS9Wg7q3rKNR7TIGMQD/YHRdNJbJ4nO5
BrOVS4eVXvSzvWrYxB/W4EAMJ1uyLgs=
=LNFy
-----END PGP SIGNATURE-----
Merge tag 'efi-fixes-for-v6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
- Limit EFI zboot to GZIP and ZSTD before it comes in wider use
- Fix inconsistent error when looking up a non-existent file in
efivarfs with a name that does not adhere to the NAME-GUID format
- Drop some unused code
* tag 'efi-fixes-for-v6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi/esrt: remove esre_attribute::store()
efivarfs: Fix error on non-existent file
efi/zboot: Limit compression options to GZIP and ZSTD
We have these fixes for hosts: PNX used the wrong unit for timeouts,
Nomadik was missing a sentinel, and RIIC was missing rounding up.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEOZGx6rniZ1Gk92RdFA3kzBSgKbYFAmdemHwACgkQFA3kzBSg
KbYypRAAid1NeA2THz8lrwMN9iyiRj+ANcczbLcGKZXibF04wyHHAREAlP6e+w1N
bkAYP7Fl2bHDvXkbQtro2U9BAngT+reXbtka2/nMQXIz2X+kop+WO81oBAi1Iat3
AmGzS5rBuUguyNocb2q6tnSOFBHIJkix0wXJTstW4OLO4q20pjY3O7NUGVZ6bHCO
FYunN2pDnAKadCz3Iy2nhA3J9fRBaZnVK2SWX/wL1lYjnQUykWnlC70ADZOgUsoH
r3IOPtakHYUIOX9YZZKmIYi5TaWHsORSvhC4htFNPt627+pwL4jfiikOv8PmBm3d
KRr2R1bXwpG7xHXh23iMw4FcHlpWauojQEVUYETEXDKqSAUGW3RzTxKV9ug93nLr
EO3gGF1Jy+XKAnHrAbgjML+rZNYTdFKkwmI4KauSg3UoORGEHDVtoGfGuRM5BVB5
ZjVECPSc/swJ/MVdT74Mdc8dxpetYsD01w1AdzJA1lC42+jKvU7wl7V4jVaGjA65
UM8xJoRKlG8/5LM+VNo4Sg8ZaOVOu3gX980VSkbGHoN6TrfFMsYhDcP8Zd7R2x8W
8tBpurSe++O3aj9HhzzkrWTx5EgDFRRyw0O0+7o2/FCEuyXIuh+YxrTwcyLUIH9I
c2BHnJJCB69ZzlASwlrXPeKFG2V564ZS/wdA/mbmm1mUkufs+z4=
=WM5N
-----END PGP SIGNATURE-----
Merge tag 'i2c-for-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"i2c host fixes: PNX used the wrong unit for timeouts, Nomadik was
missing a sentinel, and RIIC was missing rounding up"
* tag 'i2c-for-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: riic: Always round-up when calculating bus period
i2c: nomadik: Add missing sentinel to match table
i2c: pnx: Fix timeout in wait functions
SoC to fix interrupt priority assignment and even make a dead machine boot
again when the gic-v3 driver enables pseudo NMIs
- Correct the declaration of a percpu variable to fix several sparse warnings
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmdexi0ACgkQEsHwGGHe
VUoy2A/9EJkTmjoFL+AeDY1nGxjCiPJREZgxKmOgKX9uzjBF/airHs8m5RzYteYK
bUBbnrc3LEXMX1sOPGAfAvXTyfrIlWYqv8hVWcaAUs80S7Mm/aSnydA33NP6mj3/
m/113+CnhtBsTThMb/D/Cz4mTq2BrbTFqiUpMSDIA624Zr+XwD4rP1vMUmKDiYGW
8EeW8ym6OnCNQYhd9CMBA/BeFyF4blSb+onwM4rMm3xXgGQJ5ywfp9Ry6wU1x0Q8
EC0Rwz4yHcMYRjlrT940ZVDN6u+i3HPPHrhipJyua9awnDBc3oBT5rmqEg1s99TO
P5YemyDHEaTub91HHyHcXL3X6/Enk2mtwA/+RViUywVsiPti2m1k/hvUK5JECoyw
MtOZ4Br4KnbKOH2qLyg9S4eWcNLNdlB8Q+At63yssqFpOCaF7LCXnTIzX9by+z4K
qriS7UGVqzTFZNtf8oiM++7IkL0zP+P6IlNKiuZVbZilAgAT1KHFoqkVtWhpHdkj
UZjmEPxjMQYVVG29OG9rdwAlPu7vyHJsZRaT07GhJIv+QwfufLk7hxMsvPB0Inm5
1rG+JARzBv1eU+91KWvA3LW5CySjASoEQtsrGlh0Ns/Mkduvc68txaperXRf9Fg5
j6kYriZNK85JyDUM6GQ561doxlpVZWMkC/GAKZRP8ZlaUCCokug=
=SlQ1
-----END PGP SIGNATURE-----
Merge tag 'irq_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Disable the secure programming interface of the GIC500 chip in the
RK3399 SoC to fix interrupt priority assignment and even make a dead
machine boot again when the gic-v3 driver enables pseudo NMIs
- Correct the declaration of a percpu variable to fix several sparse
warnings
* tag 'irq_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3: Work around insecure GIC integrations
irqchip/gic: Correct declaration of *percpu_base pointer in union gic_base
its time accounting
- Properly track the CFS runqueue runnable stats
- Check the total number of all queued tasks in a sched fair's runqueue
hierarchy before deciding to stop the tick
- Fix the scheduling of the task that got woken last (NEXT_BUDDY) by
preventing those from being delayed
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmdexEsACgkQEsHwGGHe
VUpFqA//SIIbNJEIQEwGkFrYpGwVpSISm94L4ENsrkWbJWQlALwQEBJF9Me/DOZH
vHaX3o+cMxt26W7o0NKyPcvYtulnOr33HZA/uxK35MDaUinSA3Spt3jXHfR3n0mL
ljNQQraWHGaJh7dzKMZoxP6DR78/Z0yotXjt33xeBFMSJuzGsklrbIiSJ6c4m/3u
Y1lrQT8LncsxJMYIPAKtBAc9hvJfGFV6IOTaTfxP0oTuDo/2qTNVHm7to40wk3NW
kb0lf2kjVtE6mwMfEm49rtjE3h0VnPJKGKoEkLi9IQoPbQq9Uf4i9VSmRe3zqPAz
yBxV8BAu2koscMZzqw1CTnd9c/V+/A9qOOHfDo72I5MriJ1qVWCEsqB1y3u2yT6n
XjwFDbPiVKI8H9YlsZpWERocCRypshevPNlYOF93PlK+YTXoMWaXMQhec5NDzLLw
Se1K2sCi3U8BMdln0dH6nhk0unzNKQ8UKzrMFncSjnpWhpJ69uxyUZ/jL//6bvfi
Z+7G4U54mUhGyOAaUSGH/20TnZRWJ7NJC542omFgg9v0VLxx+wnZyX4zJIV0jvRr
6voYmYDCO8zn/hO67VBJuei97ayIzxDNP1tVl15LzcvRcIGWNUPOwp5jijv8vDJG
lJhQrMF6w4fgPItC20FvptlDvpP9cItSzyyOeg074HjDS53QN2Y=
=jOb3
-----END PGP SIGNATURE-----
Merge tag 'sched_urgent_for_v6.13_rc3-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:
- Prevent incorrect dequeueing of the deadline dlserver helper task and
fix its time accounting
- Properly track the CFS runqueue runnable stats
- Check the total number of all queued tasks in a sched fair's runqueue
hierarchy before deciding to stop the tick
- Fix the scheduling of the task that got woken last (NEXT_BUDDY) by
preventing those from being delayed
* tag 'sched_urgent_for_v6.13_rc3-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/dlserver: Fix dlserver time accounting
sched/dlserver: Fix dlserver double enqueue
sched/eevdf: More PELT vs DELAYED_DEQUEUE
sched/fair: Fix sched_can_stop_tick() for fair tasks
sched/fair: Fix NEXT_BUDDY