linux-next/kernel
Hou Tao 7ac5c53e00 bpf: Use c->unit_size to select target cache during free
At present, bpf memory allocator uses check_obj_size() to ensure that
ksize() of allocated pointer is equal with the unit_size of used
bpf_mem_cache. Its purpose is to prevent bpf_mem_free() from selecting
a bpf_mem_cache which has different unit_size compared with the
bpf_mem_cache used for allocation. But as reported by lkp, the return
value of ksize() or kmalloc_size_roundup() may change due to slab merge
and it will lead to the warning report in check_obj_size().

The reported warning happened as follows:
(1) in bpf_mem_cache_adjust_size(), kmalloc_size_roundup(96) returns the
object_size of kmalloc-96 instead of kmalloc-cg-96. The object_size of
kmalloc-96 is 96, so size_index for 96 is not adjusted accordingly.
(2) the object_size of kmalloc-cg-96 is adjust from 96 to 128 due to
slab merge in __kmem_cache_alias(). For SLAB, SLAB_HWCACHE_ALIGN is
enabled by default for kmalloc slab, so align is 64 and size is 128 for
kmalloc-cg-96. SLUB has a similar merge logic, but its object_size will
not be changed, because its align is 8 under x86-64.
(3) when unit_alloc() does kmalloc_node(96, __GFP_ACCOUNT, node),
ksize() returns 128 instead of 96 for the returned pointer.
(4) the warning in check_obj_size() is triggered.

Considering the slab merge can happen in anytime (e.g, a slab created in
a new module), the following case is also possible: during the
initialization of bpf_global_ma, there is no slab merge and ksize() for
a 96-bytes object returns 96. But after that a new slab created by a
kernel module is merged to kmalloc-cg-96 and the object_size of
kmalloc-cg-96 is adjust from 96 to 128 (which is possible for x86-64 +
CONFIG_SLAB, because its alignment requirement is 64 for 96-bytes slab).
So soon or later, when bpf_global_ma frees a 96-byte-sized pointer
which is allocated from bpf_mem_cache with unit_size=96, bpf_mem_free()
will free the pointer through a bpf_mem_cache in which unit_size is 128,
because the return value of ksize() changes. The warning for the
mismatch will be triggered again.

A feasible fix is introducing similar APIs compared with ksize() and
kmalloc_size_roundup() to return the actually-allocated size instead of
size which may change due to slab merge, but it will introduce
unnecessary dependency on the implementation details of mm subsystem.

As for now the pointer of bpf_mem_cache is saved in the 8-bytes area
(or 4-bytes under 32-bit host) above the returned pointer, using
unit_size in the saved bpf_mem_cache to select the target cache instead
of inferring the size from the pointer itself. Beside no extra
dependency on mm subsystem, the performance for bpf_mem_free_rcu() is
also improved as shown below.

Before applying the patch, the performances of bpf_mem_alloc() and
bpf_mem_free_rcu() on 8-CPUs VM with one producer are as follows:

kmalloc : alloc 11.69 ± 0.28M/s free 29.58 ± 0.93M/s
percpu  : alloc 14.11 ± 0.52M/s free 14.29 ± 0.99M/s

After apply the patch, the performance for bpf_mem_free_rcu() increases
9% and 146% for kmalloc memory and per-cpu memory respectively:

kmalloc: alloc 11.01 ± 0.03M/s free   32.42 ± 0.48M/s
percpu:  alloc 12.84 ± 0.12M/s free   35.24 ± 0.23M/s

After the fixes, there is no need to adjust size_index to fix the
mismatch between allocation and free, so remove it as well. Also return
NULL instead of ZERO_SIZE_PTR for zero-sized alloc in bpf_mem_alloc(),
because there is no bpf_mem_cache pointer saved above ZERO_SIZE_PTR.

Fixes: 9077fc228f ("bpf: Use kmalloc_size_roundup() to adjust size_index")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/bpf/202310302113.9f8fe705-oliver.sang@intel.com
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231216131052.27621-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-20 13:25:46 -08:00
..
bpf bpf: Use c->unit_size to select target cache during free 2023-12-20 13:25:46 -08:00
cgroup Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-12-07 17:53:17 -08:00
configs hardening: Provide Kconfig fragments for basic options 2023-09-22 09:50:55 -07:00
debug kdb: Corrects comment for kdballocenv 2023-11-06 17:13:55 +00:00
dma swiotlb: fix out-of-bounds TLB allocations with CONFIG_SWIOTLB_DYNAMIC 2023-11-08 16:27:05 +01:00
entry entry: Remove empty addr_limit_user_check() 2023-08-23 10:32:39 +02:00
events perf: Fix perf_event_validate_size() 2023-11-29 15:43:50 +01:00
futex futex: Fix hardcoded flags 2023-11-15 04:02:25 +01:00
gcov gcov: annotate struct gcov_iterator with __counted_by 2023-10-18 14:43:22 -07:00
irq As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
kcsan mm: delete checks for xor_unlock_is_negative_byte() 2023-10-18 14:34:17 -07:00
livepatch livepatch: Fix missing newline character in klp_resolve_symbols() 2023-09-20 11:24:18 +02:00
locking lockdep: Fix block chain corruption 2023-11-24 11:04:54 +01:00
module This update includes the following changes: 2023-11-02 16:15:30 -10:00
power Power management updates for 6.7-rc1 2023-10-31 15:38:12 -10:00
printk TTY/Serial changes for 6.7-rc1 2023-11-03 15:44:25 -10:00
rcu RCU fixes for v6.7 2023-11-08 09:47:52 -08:00
sched sched/fair: Fix the decision for load balance 2023-11-14 22:27:01 +01:00
time - Do the push of pending hrtimers away from a CPU which is being 2023-11-19 13:35:07 -08:00
trace Revert BPF token-related functionality 2023-12-19 08:23:03 -08:00
.gitignore
acct.c fs: rename __mnt_{want,drop}_write*() helpers 2023-09-11 15:05:50 +02:00
async.c
audit_fsnotify.c
audit_tree.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
audit_watch.c audit: don't WARN_ON_ONCE(!current->mm) in audit_exe_compare() 2023-11-14 17:34:27 -05:00
audit.c audit: move trailing statements to next line 2023-08-15 18:16:14 -04:00
audit.h audit: correct audit_filter_inodes() definition 2023-07-21 12:17:25 -04:00
auditfilter.c audit: move trailing statements to next line 2023-08-15 18:16:14 -04:00
auditsc.c audit,io_uring: io_uring openat triggers audit reference count underflow 2023-10-13 18:34:46 +02:00
backtracetest.c
bounds.c mm: multi-gen LRU: minimal implementation 2022-09-26 19:46:09 -07:00
capability.c lsm: constify the 'target' parameter in security_capget() 2023-08-08 16:48:47 -04:00
cfi.c cfi: Switch to -fsanitize=kcfi 2022-09-26 10:13:13 -07:00
compat.c sched_getaffinity: don't assume 'cpumask_size()' is fully initialized 2023-03-14 19:32:38 -07:00
configs.c
context_tracking.c locking/atomic: treewide: use raw_atomic*_<op>() 2023-06-05 09:57:20 +02:00
cpu_pm.c cpuidle, cpu_pm: Remove RCU fiddling from cpu_pm_{enter,exit}() 2023-01-13 11:48:15 +01:00
cpu.c - Do the push of pending hrtimers away from a CPU which is being 2023-11-19 13:35:07 -08:00
crash_core.c crash_core.c: remove unneeded functions 2023-10-04 10:41:58 -07:00
crash_dump.c
cred.c lsm/stable-6.7 PR 20231030 2023-10-30 20:13:17 -10:00
delayacct.c delayacct: track delays from IRQ/SOFTIRQ 2023-04-18 16:39:34 -07:00
dma.c
exec_domain.c
exit.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
exit.h exit: add internal include file with helpers 2023-09-21 12:03:50 -06:00
extable.c
fail_function.c kernel/fail_function: fix memory leak with using debugfs_lookup() 2023-02-08 13:36:22 +01:00
fork.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
freezer.c freezer,sched: Do not restore saved_state of a thawed task 2023-11-29 15:43:48 +01:00
gen_kheaders.sh Revert "kheaders: substituting --sort in archive creation" 2023-05-28 16:20:21 +09:00
groups.c groups: Convert group_info.usage to refcount_t 2023-09-29 11:28:39 -07:00
hung_task.c kernel/hung_task.c: set some hung_task.c variables storage-class-specifier to static 2023-04-08 13:45:37 -07:00
iomem.c kernel/iomem.c: remove __weak ioremap_cache helper 2023-08-21 13:37:28 -07:00
irq_work.c trace: Add trace_ipi_send_cpu() 2023-03-24 11:01:29 +01:00
jump_label.c jump_label: Prevent key->enabled int overflow 2022-12-01 15:53:05 -08:00
kallsyms_internal.h kallsyms: Reduce the memory occupied by kallsyms_seqs_of_names[] 2022-11-12 18:47:36 -08:00
kallsyms_selftest.c Modules changes for v6.6-rc1 2023-08-29 17:32:32 -07:00
kallsyms_selftest.h kallsyms: Add self-test facility 2022-11-15 00:42:02 -08:00
kallsyms.c kallsyms: Change func signature for cleanup_symbol_name() 2023-08-25 15:00:36 -07:00
kcmp.c file: convert to SLAB_TYPESAFE_BY_RCU 2023-10-19 11:02:48 +02:00
Kconfig.freezer
Kconfig.hz
Kconfig.kexec kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP 2023-12-06 16:12:48 -08:00
Kconfig.locks
Kconfig.preempt
kcov.c kcov: add prototypes for helper functions 2023-06-09 17:44:17 -07:00
kexec_core.c crash_core: move crashk_*res definition into crash_core.c 2023-10-04 10:41:58 -07:00
kexec_elf.c
kexec_file.c integrity-v6.6 2023-08-30 09:16:56 -07:00
kexec_internal.h panic, kexec: make __crash_kexec() NMI safe 2022-09-11 21:55:06 -07:00
kexec.c kernel: kexec: copy user-array safely 2023-10-09 16:59:47 +10:00
kheaders.c kheaders: Use array declaration instead of char 2023-03-24 20:10:59 -07:00
kprobes.c kprobes: consistent rcu api usage for kretprobe holder 2023-12-01 14:53:55 +09:00
ksyms_common.c kallsyms: make kallsyms_show_value() as generic function 2023-06-08 12:27:20 -07:00
ksysfs.c crash: hotplug support for kexec_load() 2023-08-24 16:25:14 -07:00
kthread.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
latencytop.c latencytop: use the last element of latency_record of system 2022-09-11 21:55:12 -07:00
Makefile v6.5-rc1-modules-next 2023-06-28 15:51:08 -07:00
module_signature.c
notifier.c notifiers: add tracepoints to the notifiers infrastructure 2023-04-08 13:45:38 -07:00
nsproxy.c nsproxy: Convert nsproxy.count to refcount_t 2023-08-21 11:29:12 -07:00
padata.c padata: Fix refcnt handling in padata_free_shell() 2023-10-27 18:04:24 +08:00
panic.c panic: use atomic_try_cmpxchg in panic() and nmi_panic() 2023-10-04 10:41:56 -07:00
params.c kernel: params: Remove unnecessary ‘0’ values from err 2023-07-10 12:47:01 -07:00
pid_namespace.c pid: pid_ns_ctl_handler: remove useless comment 2023-10-04 10:41:57 -07:00
pid_sysctl.h memfd: replace ratcheting feature from vm.memfd_noexec with hierarchy 2023-08-21 13:37:59 -07:00
pid.c pidfd: prevent a kernel-doc warning 2023-09-19 13:21:33 -07:00
profile.c kernel/profile.c: simplify duplicated code in profile_setup() 2022-09-11 21:55:12 -07:00
ptrace.c mm: make __access_remote_vm() static 2023-10-18 14:34:15 -07:00
range.c
reboot.c kernel/reboot: Add device to sys_off_handler 2023-07-28 11:33:09 +01:00
regset.c
relay.c kernel: relay: remove unnecessary NULL values from relay_open_buf 2023-08-18 10:18:55 -07:00
resource_kunit.c
resource.c resource: Unify next_resource() and next_resource_skip_children() 2023-10-05 13:58:07 +02:00
rseq.c rseq: Extend struct rseq with per-memory-map concurrency ID 2022-12-27 12:52:12 +01:00
scftorture.c scftorture: Pause testing after memory-allocation failure 2023-07-14 15:02:57 -07:00
scs.c scs: add support for dynamic shadow call stacks 2022-11-09 18:06:35 +00:00
seccomp.c seccomp: Add missing kerndoc notations 2023-08-17 12:32:15 -07:00
signal.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
smp.c CSD lock commits for v6.7 2023-10-30 17:56:53 -10:00
smpboot.c kthread: add kthread_stop_put 2023-10-04 10:41:57 -07:00
smpboot.h
softirq.c sched/core: introduce sched_core_idle_cpu() 2023-07-13 15:21:50 +02:00
stackleak.c stackleak: allow to specify arch specific stackleak poison function 2023-04-20 11:36:35 +02:00
stacktrace.c stacktrace: Export stack_trace_save_tsk 2023-09-11 23:59:47 -04:00
static_call_inline.c static_call: Add call depth tracking support 2022-10-17 16:41:16 +02:00
static_call.c
stop_machine.c
sys_ni.c asm-generic updates for v6.7 2023-11-01 15:28:33 -10:00
sys.c prctl: Disable prctl(PR_SET_MDWE) on parisc 2023-11-18 19:35:31 +01:00
sysctl-test.c kernel/sysctl-test: use SYSCTL_{ZERO/ONE_HUNDRED} instead of i_{zero/one_hundred} 2022-09-08 16:56:45 -07:00
sysctl.c asm-generic updates for v6.7 2023-11-01 15:28:33 -10:00
task_work.c task_work: add kerneldoc annotation for 'data' argument 2023-09-19 13:21:32 -07:00
taskstats.c taskstats: fill_stats_for_tgid: use for_each_thread() 2023-10-04 10:41:57 -07:00
torture.c torture: Print out torture module parameters 2023-09-24 17:24:01 +02:00
tracepoint.c tracepoint: Allow livepatch module add trace event 2023-02-18 14:34:36 -05:00
tsacct.c
ucount.c sysctl: Add size to register_sysctl 2023-08-15 15:26:17 -07:00
uid16.c
uid16.h
umh.c sysctl: fix unused proc_cap_handler() function warning 2023-06-29 15:19:43 -07:00
up.c smp: Change function signatures to use call_single_data_t 2023-09-13 14:59:24 +02:00
user_namespace.c As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
user-return-notifier.c
user.c binfmt_misc: enable sandboxed mounts 2023-10-11 08:46:01 -07:00
usermode_driver.c
utsname_sysctl.c utsname: simplify one-level sysctl registration for uts_kern_table 2023-04-13 11:49:35 -07:00
utsname.c
vhost_task.c vhost: Fix worker hangs due to missed wake up calls 2023-06-08 15:43:09 -04:00
watch_queue.c kernel: watch_queue: copy user-array safely 2023-10-09 16:59:48 +10:00
watchdog_buddy.c watchdog/hardlockup: move SMP barriers from common code to buddy code 2023-06-19 16:25:28 -07:00
watchdog_perf.c watchdog/perf: add a weak function for an arch to detect if perf can use NMIs 2023-06-09 17:44:21 -07:00
watchdog.c watchdog: move softlockup_panic back to early_param 2023-11-01 12:10:02 -07:00
workqueue_internal.h workqueue: Drop the special locking rule for worker->flags and worker_pool->flags 2023-08-07 15:57:22 -10:00
workqueue.c workqueue: Make sure that wq_unbound_cpumask is never empty 2023-11-22 06:17:26 -10:00