bpf-next-for-netdev

-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZbQV+gAKCRDbK58LschI
 g2OeAP0VvhZS9SPiS+/AMAFuw2W1BkMrFNbfBTc3nzRnyJSmNAD+NG4CLLJvsKI9
 olu7VC20B8pLTGLUGIUSwqnjOC+Kkgc=
 =wVMl
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-01-26

We've added 107 non-merge commits during the last 4 day(s) which contain
a total of 101 files changed, 6009 insertions(+), 1260 deletions(-).

The main changes are:

1) Add BPF token support to delegate a subset of BPF subsystem
   functionality from privileged system-wide daemons such as systemd
   through special mount options for userns-bound BPF fs to a trusted
   & unprivileged application. With addressed changes from Christian
   and Linus' reviews, from Andrii Nakryiko.

2) Support registration of struct_ops types from modules which helps
   projects like fuse-bpf that seek to implement a new struct_ops type,
   from Kui-Feng Lee.

3) Add support for retrieval of cookies for perf/kprobe multi links,
   from Jiri Olsa.

4) Bigger batch of prep-work for the BPF verifier to eventually support
   preserving boundaries and tracking scalars on narrowing fills,
   from Maxim Mikityanskiy.

5) Extend the tc BPF flavor to support arbitrary TCP SYN cookies to help
   with the scenario of SYN floods, from Kuniyuki Iwashima.

6) Add code generation to inline the bpf_kptr_xchg() helper which
   improves performance when stashing/popping the allocated BPF objects,
   from Hou Tao.

7) Extend BPF verifier to track aligned ST stores as imprecise spilled
   registers, from Yonghong Song.

8) Several fixes to BPF selftests around inline asm constraints and
   unsupported VLA code generation, from Jose E. Marchesi.

9) Various updates to the BPF IETF instruction set draft document such
   as the introduction of conformance groups for instructions,
   from Dave Thaler.

10) Fix BPF verifier to make infinite loop detection in is_state_visited()
    exact to catch some too lax spill/fill corner cases,
    from Eduard Zingerman.

11) Refactor the BPF verifier pointer ALU check to allow ALU explicitly
    instead of implicitly for various register types, from Hao Sun.

12) Fix the flaky tc_redirect_dtime BPF selftest due to slowness
    in neighbor advertisement at setup time, from Martin KaFai Lau.

13) Change BPF selftests to skip callback tests for the case when the
    JIT is disabled, from Tiezhu Yang.

14) Add a small extension to libbpf which allows auto-creating
    a map-in-map's inner map, from Andrey Grafin.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (107 commits)
  selftests/bpf: Add missing line break in test_verifier
  bpf, docs: Clarify definitions of various instructions
  bpf: Fix error checks against bpf_get_btf_vmlinux().
  bpf: One more maintainer for libbpf and BPF selftests
  selftests/bpf: Incorporate LSM policy to token-based tests
  selftests/bpf: Add tests for LIBBPF_BPF_TOKEN_PATH envvar
  libbpf: Support BPF token path setting through LIBBPF_BPF_TOKEN_PATH envvar
  selftests/bpf: Add tests for BPF object load with implicit token
  selftests/bpf: Add BPF object loading tests with explicit token passing
  libbpf: Wire up BPF token support at BPF object level
  libbpf: Wire up token_fd into feature probing logic
  libbpf: Move feature detection code into its own file
  libbpf: Further decouple feature checking logic from bpf_object
  libbpf: Split feature detectors definitions from cached results
  selftests/bpf: Utilize string values for delegate_xxx mount options
  bpf: Support symbolic BPF FS delegation mount options
  bpf: Fail BPF_TOKEN_CREATE if no delegation option was set on BPF FS
  bpf,selinux: Allocate bpf_security_struct per BPF token
  selftests/bpf: Add BPF token-enabled tests
  libbpf: Add BPF token support to bpf_prog_load() API
  ...
====================

Link: https://lore.kernel.org/r/20240126215710.19855-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski 2024-01-26 21:08:21 -08:00
commit 92046e83c0
101 changed files with 6021 additions and 1272 deletions


@ -97,6 +97,28 @@ Definitions
A: 10000110
B: 11111111 10000110
Conformance groups
------------------
An implementation does not need to support all instructions specified in this
document (e.g., deprecated instructions). Instead, a number of conformance
groups are specified. An implementation must support the "basic" conformance
group and may support additional conformance groups, where supporting a
conformance group means it must support all instructions in that conformance
group.
The use of named conformance groups enables interoperability between a runtime
that executes instructions, and tools such as compilers that generate
instructions for the runtime. Thus, capability discovery in terms of
conformance groups might be done manually by users or automatically by tools.
Each conformance group has a short ASCII label (e.g., "basic") that
corresponds to a set of instructions that are mandatory. That is, each
instruction has one or more conformance groups of which it is a member.
The "basic" conformance group includes all instructions defined in this
specification unless otherwise noted.
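As a rough illustration of the conformance-group rules above, the following C sketch models groups as a bitmask and checks whether a runtime can execute a program. Only the labels "basic" and "legacy" come from the document. The bit assignments, `struct insn_desc`, and all function names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit assignment; only the labels come from the text. */
enum conformance_group {
	GROUP_BASIC  = 1u << 0,
	GROUP_LEGACY = 1u << 1,
};

/* An instruction as a tool might describe it (hypothetical type). */
struct insn_desc {
	const char *name;
	uint32_t groups;	/* one or more conformance_group bits */
};

/* An implementation must support the "basic" group. */
static bool runtime_is_conformant(uint32_t supported)
{
	return (supported & GROUP_BASIC) != 0;
}

/* Supporting a group means supporting every instruction in it, so an
 * instruction is usable if any group it belongs to is supported. */
static bool insn_supported(const struct insn_desc *insn, uint32_t supported)
{
	return (insn->groups & supported) != 0;
}

/* A program is runnable if the runtime is conformant and every
 * instruction is covered by a supported group. */
static bool program_supported(const struct insn_desc *insns, size_t n,
			      uint32_t supported)
{
	if (!runtime_is_conformant(supported))
		return false;
	for (size_t i = 0; i < n; i++)
		if (!insn_supported(&insns[i], supported))
			return false;
	return true;
}
```

A compiler could use a check of this shape to refuse emitting instructions from a group the target runtime did not advertise.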
Instruction encoding
====================
@ -152,12 +174,12 @@ and imm containing the high 32 bits of the immediate value.
This is depicted in the following figure::
basic_instruction
.-----------------------------.
| |
code:8 regs:8 offset:16 imm:32 unused:32 imm:32
| |
'--------------'
pseudo instruction
.------------------------------.
| |
opcode:8 regs:8 offset:16 imm:32 unused:32 imm:32
| |
'--------------'
pseudo instruction
Thus the 64-bit immediate value is constructed as follows:
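The hunk ends before the construction itself is shown. As a hedged sketch of what the surrounding text describes (low 32 bits in the basic instruction's imm, high 32 bits in the pseudo instruction's imm), the combination might look like this in C. The struct and function names are hypothetical and are not taken from the document.

```c
#include <assert.h>
#include <stdint.h>

/* One 64-bit instruction slot, mirroring the figure:
 * opcode:8 regs:8 offset:16 imm:32 (hypothetical type name). */
struct bpf_insn_slot {
	uint8_t opcode;
	uint8_t regs;
	int16_t offset;
	int32_t imm;
};

/* Combine the basic instruction's imm (low half) with the pseudo
 * instruction's imm (high half). Both halves are cast to uint32_t
 * first so that a negative imm does not sign-extend into the result. */
static uint64_t wide_imm64(const struct bpf_insn_slot *basic,
			   const struct bpf_insn_slot *pseudo)
{
	return ((uint64_t)(uint32_t)pseudo->imm << 32) |
	       (uint64_t)(uint32_t)basic->imm;
}
```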
@ -295,7 +317,11 @@ The ``BPF_MOVSX`` instruction does a move operation with sign extension.
``BPF_ALU | BPF_MOVSX`` :term:`sign extends<Sign Extend>` 8-bit and 16-bit operands into 32
bit operands, and zeroes the remaining upper 32 bits.
``BPF_ALU64 | BPF_MOVSX`` :term:`sign extends<Sign Extend>` 8-bit, 16-bit, and 32-bit
operands into 64 bit operands.
operands into 64 bit operands. Unlike other arithmetic instructions,
``BPF_MOVSX`` is only defined for register source operands (``BPF_X``).
The ``BPF_NEG`` instruction is only defined when the source bit is clear
(``BPF_K``).
Shift operations use a mask of 0x3F (63) for 64-bit operations and 0x1F (31)
for 32-bit operations.
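The sign-extension and shift-mask rules above can be modelled in plain C as follows. This is a minimal sketch with hypothetical function names. It relies on the wrap-around narrowing conversions that GCC and Clang implement, which the C standard leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* BPF_ALU | BPF_MOVSX: sign extend to 32 bits, zero the upper 32 bits. */
static uint64_t movsx32_from8(uint64_t src)
{
	return (uint64_t)(uint32_t)(int32_t)(int8_t)src;
}

static uint64_t movsx32_from16(uint64_t src)
{
	return (uint64_t)(uint32_t)(int32_t)(int16_t)src;
}

/* BPF_ALU64 | BPF_MOVSX: sign extend all the way to 64 bits. */
static uint64_t movsx64_from8(uint64_t src)
{
	return (uint64_t)(int64_t)(int8_t)src;
}

static uint64_t movsx64_from32(uint64_t src)
{
	return (uint64_t)(int64_t)(int32_t)src;
}

/* Shift amounts are masked: 0x3F for 64-bit, 0x1F for 32-bit operations. */
static uint64_t lsh64(uint64_t dst, uint64_t src)
{
	return dst << (src & 0x3F);
}

static uint32_t lsh32(uint32_t dst, uint32_t src)
{
	return dst << (src & 0x1F);
}
```

The masking means a shift by 64 in a 64-bit operation leaves the value unchanged instead of invoking undefined behaviour.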
@ -352,27 +378,27 @@ Jump instructions
otherwise identical operations.
The 'code' field encodes the operation as below:
======== ===== === =========================================== =========================================
code value src description notes
======== ===== === =========================================== =========================================
BPF_JA 0x0 0x0 PC += offset BPF_JMP class
BPF_JA 0x0 0x0 PC += imm BPF_JMP32 class
======== ===== === =============================== =============================================
code value src description notes
======== ===== === =============================== =============================================
BPF_JA 0x0 0x0 PC += offset BPF_JMP | BPF_K only
BPF_JA 0x0 0x0 PC += imm BPF_JMP32 | BPF_K only
BPF_JEQ 0x1 any PC += offset if dst == src
BPF_JGT 0x2 any PC += offset if dst > src unsigned
BPF_JGE 0x3 any PC += offset if dst >= src unsigned
BPF_JGT 0x2 any PC += offset if dst > src unsigned
BPF_JGE 0x3 any PC += offset if dst >= src unsigned
BPF_JSET 0x4 any PC += offset if dst & src
BPF_JNE 0x5 any PC += offset if dst != src
BPF_JSGT 0x6 any PC += offset if dst > src signed
BPF_JSGE 0x7 any PC += offset if dst >= src signed
BPF_CALL 0x8 0x0 call helper function by address see `Helper functions`_
BPF_CALL 0x8 0x1 call PC += imm see `Program-local functions`_
BPF_CALL 0x8 0x2 call helper function by BTF ID see `Helper functions`_
BPF_EXIT 0x9 0x0 return BPF_JMP only
BPF_JLT 0xa any PC += offset if dst < src unsigned
BPF_JLE 0xb any PC += offset if dst <= src unsigned
BPF_JSLT 0xc any PC += offset if dst < src signed
BPF_JSLE 0xd any PC += offset if dst <= src signed
======== ===== === =========================================== =========================================
BPF_JSGT 0x6 any PC += offset if dst > src signed
BPF_JSGE 0x7 any PC += offset if dst >= src signed
BPF_CALL 0x8 0x0 call helper function by address BPF_JMP | BPF_K only, see `Helper functions`_
BPF_CALL 0x8 0x1 call PC += imm BPF_JMP | BPF_K only, see `Program-local functions`_
BPF_CALL 0x8 0x2 call helper function by BTF ID BPF_JMP | BPF_K only, see `Helper functions`_
BPF_EXIT 0x9 0x0 return BPF_JMP | BPF_K only
BPF_JLT 0xa any PC += offset if dst < src unsigned
BPF_JLE 0xb any PC += offset if dst <= src unsigned
BPF_JSLT 0xc any PC += offset if dst < src signed
BPF_JSLE 0xd any PC += offset if dst <= src signed
======== ===== === =============================== =============================================
The BPF program needs to store the return value into register R0 before doing a
``BPF_EXIT``.
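To make the "only" notes in the table concrete, here is a small C sketch of an opcode validity check for jump instructions. It assumes the opcode layout from the wider instruction set document (4-bit code in the high bits, one source bit, 3-bit class) and the class and source values defined there. The macro and function names are hypothetical. The table's `src` column for `BPF_CALL` refers to the src_reg field and is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Class and source-bit values as defined elsewhere in the ISA document. */
#define CLS_JMP   0x05
#define CLS_JMP32 0x06
#define SRC_K     0x00
#define SRC_X     0x08

/* Code values from the table that carry extra restrictions. */
#define CODE_JA   0x0
#define CODE_CALL 0x8
#define CODE_EXIT 0x9
#define CODE_MAX  0xd	/* BPF_JSLE, the last code in the table */

static bool jump_opcode_valid(uint8_t opcode)
{
	uint8_t cls = opcode & 0x07;
	uint8_t src = opcode & 0x08;
	uint8_t code = opcode >> 4;

	if (cls != CLS_JMP && cls != CLS_JMP32)
		return false;
	if (code > CODE_MAX)
		return false;

	switch (code) {
	case CODE_JA:
		/* Either class, but BPF_K only. */
		return src == SRC_K;
	case CODE_CALL:
	case CODE_EXIT:
		/* BPF_JMP | BPF_K only. */
		return cls == CLS_JMP && src == SRC_K;
	default:
		/* Conditional jumps accept both classes and both sources. */
		return true;
	}
}
```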
@ -610,4 +636,6 @@ Legacy BPF Packet access instructions
BPF previously introduced special instructions for access to packet data that were
carried over from classic BPF. However, these instructions are
deprecated and should no longer be used.
deprecated and should no longer be used. All legacy packet access
instructions belong to the "legacy" conformance group instead of the "basic"
conformance group.


@ -562,7 +562,7 @@ works::
* ``checkpoint[0].r1`` is marked as read;
* At instruction #5 exit is reached and ``checkpoint[0]`` can now be processed
by ``clean_live_states()``. After this processing ``checkpoint[0].r0`` has a
by ``clean_live_states()``. After this processing ``checkpoint[0].r1`` has a
read mark and all other registers and stack slots are marked as ``NOT_INIT``
or ``STACK_INVALID``


@ -3799,6 +3799,7 @@ M: Alexei Starovoitov <ast@kernel.org>
M: Daniel Borkmann <daniel@iogearbox.net>
M: Andrii Nakryiko <andrii@kernel.org>
R: Martin KaFai Lau <martin.lau@linux.dev>
R: Eduard Zingerman <eddyz87@gmail.com>
R: Song Liu <song@kernel.org>
R: Yonghong Song <yonghong.song@linux.dev>
R: John Fastabend <john.fastabend@gmail.com>
@ -3859,6 +3860,7 @@ F: net/unix/unix_bpf.c
BPF [LIBRARY] (libbpf)
M: Andrii Nakryiko <andrii@kernel.org>
M: Eduard Zingerman <eddyz87@gmail.com>
L: bpf@vger.kernel.org
S: Maintained
F: tools/lib/bpf/
@ -3916,6 +3918,7 @@ F: security/bpf/
BPF [SELFTESTS] (Test Runners & Infrastructure)
M: Andrii Nakryiko <andrii@kernel.org>
M: Eduard Zingerman <eddyz87@gmail.com>
R: Mykola Lysenko <mykolal@fb.com>
L: bpf@vger.kernel.org
S: Maintained


@ -2305,3 +2305,8 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type,
return ret;
}
bool bpf_jit_supports_ptr_xchg(void)
{
return true;
}


@ -3242,3 +3242,8 @@ void bpf_arch_poke_desc_update(struct bpf_jit_poke_descriptor *poke,
BUG_ON(ret < 0);
}
}
bool bpf_jit_supports_ptr_xchg(void)
{
return true;
}


@ -110,7 +110,7 @@ lirc_mode2_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_FUNC_get_prandom_u32:
return &bpf_get_prandom_u32_proto;
case BPF_FUNC_trace_printk:
if (perfmon_capable())
if (bpf_token_capable(prog->aux->token, CAP_PERFMON))
return bpf_get_trace_printk_proto();
fallthrough;
default:


@ -52,6 +52,10 @@ struct module;
struct bpf_func_state;
struct ftrace_ops;
struct cgroup;
struct bpf_token;
struct user_namespace;
struct super_block;
struct inode;
extern struct idr btf_idr;
extern spinlock_t btf_idr_lock;
@ -1485,6 +1489,7 @@ struct bpf_prog_aux {
#ifdef CONFIG_SECURITY
void *security;
#endif
struct bpf_token *token;
struct bpf_prog_offload *offload;
struct btf *btf;
struct bpf_func_info *func_info;
@ -1609,6 +1614,31 @@ struct bpf_link_primer {
u32 id;
};
struct bpf_mount_opts {
kuid_t uid;
kgid_t gid;
umode_t mode;
/* BPF token-related delegation options */
u64 delegate_cmds;
u64 delegate_maps;
u64 delegate_progs;
u64 delegate_attachs;
};
struct bpf_token {
struct work_struct work;
atomic64_t refcnt;
struct user_namespace *userns;
u64 allowed_cmds;
u64 allowed_maps;
u64 allowed_progs;
u64 allowed_attachs;
#ifdef CONFIG_SECURITY
void *security;
#endif
};
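The four `allowed_*` fields are 64-bit masks. A user-space sketch of how such masks could gate commands, map types, and program types is shown below. It assumes that bit N corresponds to enum value N, which this hunk does not show, and it leaves out the LSM hooks entirely. `struct token_model` and the `model_*` functions are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for the allowed_* fields of struct bpf_token. */
struct token_model {
	uint64_t allowed_cmds;
	uint64_t allowed_maps;
	uint64_t allowed_progs;
	uint64_t allowed_attachs;
};

/* Assumption: enum value N is represented by bit N of the mask. */
static bool mask_allows(uint64_t mask, unsigned int id)
{
	return id < 64 && (mask & (UINT64_C(1) << id)) != 0;
}

static bool model_allow_cmd(const struct token_model *t, unsigned int cmd)
{
	return t != NULL && mask_allows(t->allowed_cmds, cmd);
}

static bool model_allow_map_type(const struct token_model *t,
				 unsigned int map_type)
{
	return t != NULL && mask_allows(t->allowed_maps, map_type);
}

/* Both the program type and the attach type must be delegated. */
static bool model_allow_prog_type(const struct token_model *t,
				  unsigned int prog_type,
				  unsigned int attach_type)
{
	return t != NULL &&
	       mask_allows(t->allowed_progs, prog_type) &&
	       mask_allows(t->allowed_attachs, attach_type);
}
```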
struct bpf_struct_ops_value;
struct btf_member;
@ -1673,19 +1703,48 @@ struct bpf_struct_ops {
void (*unreg)(void *kdata);
int (*update)(void *kdata, void *old_kdata);
int (*validate)(void *kdata);
const struct btf_type *type;
const struct btf_type *value_type;
void *cfi_stubs;
struct module *owner;
const char *name;
struct btf_func_model func_models[BPF_STRUCT_OPS_MAX_NR_MEMBERS];
};
struct bpf_struct_ops_desc {
struct bpf_struct_ops *st_ops;
const struct btf_type *type;
const struct btf_type *value_type;
u32 type_id;
u32 value_id;
void *cfi_stubs;
};
enum bpf_struct_ops_state {
BPF_STRUCT_OPS_STATE_INIT,
BPF_STRUCT_OPS_STATE_INUSE,
BPF_STRUCT_OPS_STATE_TOBEFREE,
BPF_STRUCT_OPS_STATE_READY,
};
struct bpf_struct_ops_common_value {
refcount_t refcnt;
enum bpf_struct_ops_state state;
};
#if defined(CONFIG_BPF_JIT) && defined(CONFIG_BPF_SYSCALL)
/* This macro helps developers register a struct_ops type and generate
* type information correctly. Developers should use this macro to register
* a struct_ops type instead of calling __register_bpf_struct_ops() directly.
*/
#define register_bpf_struct_ops(st_ops, type) \
({ \
struct bpf_struct_ops_##type { \
struct bpf_struct_ops_common_value common; \
struct type data ____cacheline_aligned_in_smp; \
}; \
BTF_TYPE_EMIT(struct bpf_struct_ops_##type); \
__register_bpf_struct_ops(st_ops); \
})
#define BPF_MODULE_OWNER ((void *)((0xeB9FUL << 2) + POISON_POINTER_DELTA))
const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id);
void bpf_struct_ops_init(struct btf *btf, struct bpf_verifier_log *log);
bool bpf_struct_ops_get(const void *kdata);
void bpf_struct_ops_put(const void *kdata);
int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, void *key,
@ -1727,15 +1786,12 @@ struct bpf_dummy_ops {
int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
union bpf_attr __user *uattr);
#endif
int bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc,
struct btf *btf,
struct bpf_verifier_log *log);
void bpf_map_struct_ops_info_fill(struct bpf_map_info *info, struct bpf_map *map);
#else
static inline const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id)
{
return NULL;
}
static inline void bpf_struct_ops_init(struct btf *btf,
struct bpf_verifier_log *log)
{
}
#define register_bpf_struct_ops(st_ops, type) ({ (void *)(st_ops); 0; })
static inline bool bpf_try_module_get(const void *data, struct module *owner)
{
return try_module_get(owner);
@ -1754,6 +1810,9 @@ static inline int bpf_struct_ops_link_create(union bpf_attr *attr)
{
return -EOPNOTSUPP;
}
static inline void bpf_map_struct_ops_info_fill(struct bpf_map_info *info, struct bpf_map *map)
{
}
#endif
@ -2068,6 +2127,7 @@ static inline void bpf_enable_instrumentation(void)
migrate_enable();
}
extern const struct super_operations bpf_super_ops;
extern const struct file_operations bpf_map_fops;
extern const struct file_operations bpf_prog_fops;
extern const struct file_operations bpf_iter_fops;
@ -2202,24 +2262,26 @@ static inline void bpf_map_dec_elem_count(struct bpf_map *map)
extern int sysctl_unprivileged_bpf_disabled;
static inline bool bpf_allow_ptr_leaks(void)
bool bpf_token_capable(const struct bpf_token *token, int cap);
static inline bool bpf_allow_ptr_leaks(const struct bpf_token *token)
{
return perfmon_capable();
return bpf_token_capable(token, CAP_PERFMON);
}
static inline bool bpf_allow_uninit_stack(void)
static inline bool bpf_allow_uninit_stack(const struct bpf_token *token)
{
return perfmon_capable();
return bpf_token_capable(token, CAP_PERFMON);
}
static inline bool bpf_bypass_spec_v1(void)
static inline bool bpf_bypass_spec_v1(const struct bpf_token *token)
{
return cpu_mitigations_off() || perfmon_capable();
return cpu_mitigations_off() || bpf_token_capable(token, CAP_PERFMON);
}
static inline bool bpf_bypass_spec_v4(void)
static inline bool bpf_bypass_spec_v4(const struct bpf_token *token)
{
return cpu_mitigations_off() || perfmon_capable();
return cpu_mitigations_off() || bpf_token_capable(token, CAP_PERFMON);
}
int bpf_map_new_fd(struct bpf_map *map, int flags);
@ -2236,8 +2298,21 @@ int bpf_link_new_fd(struct bpf_link *link);
struct bpf_link *bpf_link_get_from_fd(u32 ufd);
struct bpf_link *bpf_link_get_curr_or_next(u32 *id);
void bpf_token_inc(struct bpf_token *token);
void bpf_token_put(struct bpf_token *token);
int bpf_token_create(union bpf_attr *attr);
struct bpf_token *bpf_token_get_from_fd(u32 ufd);
bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type type);
bool bpf_token_allow_prog_type(const struct bpf_token *token,
enum bpf_prog_type prog_type,
enum bpf_attach_type attach_type);
int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
struct inode *bpf_get_inode(struct super_block *sb, const struct inode *dir,
umode_t mode);
#define BPF_ITER_FUNC_PREFIX "bpf_iter_"
#define DEFINE_BPF_ITER_FUNC(target, args...) \
@ -2472,11 +2547,14 @@ int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *pr
struct btf *btf, const struct btf_type *t);
const char *btf_find_decl_tag_value(const struct btf *btf, const struct btf_type *pt,
int comp_idx, const char *tag_key);
int btf_find_next_decl_tag(const struct btf *btf, const struct btf_type *pt,
int comp_idx, const char *tag_key, int last_id);
struct bpf_prog *bpf_prog_by_id(u32 id);
struct bpf_link *bpf_link_by_id(u32 id);
const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id);
const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id,
const struct bpf_prog *prog);
void bpf_task_storage_free(struct task_struct *task);
void bpf_cgrp_storage_free(struct cgroup *cgroup);
bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog);
@ -2595,6 +2673,24 @@ static inline int bpf_obj_get_user(const char __user *pathname, int flags)
return -EOPNOTSUPP;
}
static inline bool bpf_token_capable(const struct bpf_token *token, int cap)
{
return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
}
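The fallback above treats CAP_SYS_ADMIN as a superset of the other capabilities. A user-space model of that rule, with the capability set represented as a bitmask, might look like the sketch below. The `MODEL_CAP_*` constants use the numeric values from the Linux capability header, but the helper names and the bitmask representation are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Numeric values match linux/capability.h; names are local to this sketch. */
#define MODEL_CAP_SYS_ADMIN 21
#define MODEL_CAP_PERFMON   38
#define MODEL_CAP_BPF       39

/* Stand-in for capable(): test one bit in a capability set. */
static bool has_cap(uint64_t caps, int cap)
{
	return (caps & (UINT64_C(1) << cap)) != 0;
}

/* Same shape as the stub above: the requested capability suffices, and
 * CAP_SYS_ADMIN covers any capability other than itself. */
static bool model_token_capable(uint64_t caps, int cap)
{
	return has_cap(caps, cap) ||
	       (cap != MODEL_CAP_SYS_ADMIN && has_cap(caps, MODEL_CAP_SYS_ADMIN));
}
```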
static inline void bpf_token_inc(struct bpf_token *token)
{
}
static inline void bpf_token_put(struct bpf_token *token)
{
}
static inline struct bpf_token *bpf_token_get_from_fd(u32 ufd)
{
return ERR_PTR(-EOPNOTSUPP);
}
static inline void __dev_flush(void)
{
}
@ -2718,7 +2814,7 @@ static inline int btf_struct_access(struct bpf_verifier_log *log,
}
static inline const struct bpf_func_proto *
bpf_base_func_proto(enum bpf_func_id func_id)
bpf_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
return NULL;
}


@ -453,7 +453,7 @@ struct bpf_verifier_state {
#define bpf_get_spilled_reg(slot, frame, mask) \
(((slot < frame->allocated_stack / BPF_REG_SIZE) && \
((1 << frame->stack[slot].slot_type[0]) & (mask))) \
((1 << frame->stack[slot].slot_type[BPF_REG_SIZE - 1]) & (mask))) \
? &frame->stack[slot].spilled_ptr : NULL)
/* Iterate over 'frame', setting 'reg' to either NULL or a spilled register. */
@ -662,6 +662,7 @@ struct bpf_verifier_env {
u32 prev_insn_idx;
struct bpf_prog *prog; /* eBPF program being verified */
const struct bpf_verifier_ops *ops;
struct module *attach_btf_mod; /* The owner module of prog->aux->attach_btf */
struct bpf_verifier_stack_elem *head; /* stack of verifier states to be processed */
int stack_size; /* number of states to be processed */
bool strict_alignment; /* perform strict pointer alignment checks */


@ -137,6 +137,7 @@ struct btf_struct_metas {
extern const struct file_operations btf_fops;
const char *btf_get_name(const struct btf *btf);
void btf_get(struct btf *btf);
void btf_put(struct btf *btf);
int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr, u32 uattr_sz);
@ -496,6 +497,18 @@ static inline void *btf_id_set8_contains(const struct btf_id_set8 *set, u32 id)
struct bpf_verifier_log;
#if defined(CONFIG_BPF_JIT) && defined(CONFIG_BPF_SYSCALL)
struct bpf_struct_ops;
int __register_bpf_struct_ops(struct bpf_struct_ops *st_ops);
const struct bpf_struct_ops_desc *bpf_struct_ops_find_value(struct btf *btf, u32 value_id);
const struct bpf_struct_ops_desc *bpf_struct_ops_find(struct btf *btf, u32 type_id);
#else
static inline const struct bpf_struct_ops_desc *bpf_struct_ops_find(struct btf *btf, u32 type_id)
{
return NULL;
}
#endif
#ifdef CONFIG_BPF_SYSCALL
const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
const char *btf_name_by_offset(const struct btf *btf, u32 offset);


@ -955,6 +955,7 @@ bool bpf_jit_supports_subprog_tailcalls(void);
bool bpf_jit_supports_kfunc_call(void);
bool bpf_jit_supports_far_kfunc_call(void);
bool bpf_jit_supports_exceptions(void);
bool bpf_jit_supports_ptr_xchg(void);
void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie);
bool bpf_helper_changes_pkt_data(void *func);
@ -1139,7 +1140,7 @@ static inline bool bpf_jit_blinding_enabled(struct bpf_prog *prog)
return false;
if (!bpf_jit_harden)
return false;
if (bpf_jit_harden == 1 && bpf_capable())
if (bpf_jit_harden == 1 && bpf_token_capable(prog->aux->token, CAP_BPF))
return false;
return true;


@ -404,10 +404,17 @@ LSM_HOOK(void, LSM_RET_VOID, audit_rule_free, void *lsmrule)
LSM_HOOK(int, 0, bpf, int cmd, union bpf_attr *attr, unsigned int size)
LSM_HOOK(int, 0, bpf_map, struct bpf_map *map, fmode_t fmode)
LSM_HOOK(int, 0, bpf_prog, struct bpf_prog *prog)
LSM_HOOK(int, 0, bpf_map_alloc_security, struct bpf_map *map)
LSM_HOOK(void, LSM_RET_VOID, bpf_map_free_security, struct bpf_map *map)
LSM_HOOK(int, 0, bpf_prog_alloc_security, struct bpf_prog_aux *aux)
LSM_HOOK(void, LSM_RET_VOID, bpf_prog_free_security, struct bpf_prog_aux *aux)
LSM_HOOK(int, 0, bpf_map_create, struct bpf_map *map, union bpf_attr *attr,
struct bpf_token *token)
LSM_HOOK(void, LSM_RET_VOID, bpf_map_free, struct bpf_map *map)
LSM_HOOK(int, 0, bpf_prog_load, struct bpf_prog *prog, union bpf_attr *attr,
struct bpf_token *token)
LSM_HOOK(void, LSM_RET_VOID, bpf_prog_free, struct bpf_prog *prog)
LSM_HOOK(int, 0, bpf_token_create, struct bpf_token *token, union bpf_attr *attr,
struct path *path)
LSM_HOOK(void, LSM_RET_VOID, bpf_token_free, struct bpf_token *token)
LSM_HOOK(int, 0, bpf_token_cmd, const struct bpf_token *token, enum bpf_cmd cmd)
LSM_HOOK(int, 0, bpf_token_capable, const struct bpf_token *token, int cap)
#endif /* CONFIG_BPF_SYSCALL */
LSM_HOOK(int, 0, locked_down, enum lockdown_reason what)


@ -32,6 +32,7 @@
#include <linux/string.h>
#include <linux/mm.h>
#include <linux/sockptr.h>
#include <linux/bpf.h>
#include <uapi/linux/lsm.h>
struct linux_binprm;
@ -2064,15 +2065,22 @@ static inline void securityfs_remove(struct dentry *dentry)
union bpf_attr;
struct bpf_map;
struct bpf_prog;
struct bpf_prog_aux;
struct bpf_token;
#ifdef CONFIG_SECURITY
extern int security_bpf(int cmd, union bpf_attr *attr, unsigned int size);
extern int security_bpf_map(struct bpf_map *map, fmode_t fmode);
extern int security_bpf_prog(struct bpf_prog *prog);
extern int security_bpf_map_alloc(struct bpf_map *map);
extern int security_bpf_map_create(struct bpf_map *map, union bpf_attr *attr,
struct bpf_token *token);
extern void security_bpf_map_free(struct bpf_map *map);
extern int security_bpf_prog_alloc(struct bpf_prog_aux *aux);
extern void security_bpf_prog_free(struct bpf_prog_aux *aux);
extern int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
struct bpf_token *token);
extern void security_bpf_prog_free(struct bpf_prog *prog);
extern int security_bpf_token_create(struct bpf_token *token, union bpf_attr *attr,
struct path *path);
extern void security_bpf_token_free(struct bpf_token *token);
extern int security_bpf_token_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
extern int security_bpf_token_capable(const struct bpf_token *token, int cap);
#else
static inline int security_bpf(int cmd, union bpf_attr *attr,
unsigned int size)
@ -2090,7 +2098,8 @@ static inline int security_bpf_prog(struct bpf_prog *prog)
return 0;
}
static inline int security_bpf_map_alloc(struct bpf_map *map)
static inline int security_bpf_map_create(struct bpf_map *map, union bpf_attr *attr,
struct bpf_token *token)
{
return 0;
}
@ -2098,13 +2107,33 @@ static inline int security_bpf_map_alloc(struct bpf_map *map)
static inline void security_bpf_map_free(struct bpf_map *map)
{ }
static inline int security_bpf_prog_alloc(struct bpf_prog_aux *aux)
static inline int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
struct bpf_token *token)
{
return 0;
}
static inline void security_bpf_prog_free(struct bpf_prog_aux *aux)
static inline void security_bpf_prog_free(struct bpf_prog *prog)
{ }
static inline int security_bpf_token_create(struct bpf_token *token, union bpf_attr *attr,
struct path *path)
{
return 0;
}
static inline void security_bpf_token_free(struct bpf_token *token)
{ }
static inline int security_bpf_token_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
{
return 0;
}
static inline int security_bpf_token_capable(const struct bpf_token *token, int cap)
{
return 0;
}
#endif /* CONFIG_SECURITY */
#endif /* CONFIG_BPF_SYSCALL */


@ -83,6 +83,45 @@ static inline struct sock *req_to_sk(struct request_sock *req)
return (struct sock *)req;
}
/**
* skb_steal_sock - steal a socket from an sk_buff
* @skb: sk_buff to steal the socket from
* @refcounted: is set to true if the socket is reference-counted
* @prefetched: is set to true if the socket was assigned from bpf
*/
static inline struct sock *skb_steal_sock(struct sk_buff *skb,
bool *refcounted, bool *prefetched)
{
struct sock *sk = skb->sk;
if (!sk) {
*prefetched = false;
*refcounted = false;
return NULL;
}
*prefetched = skb_sk_is_prefetched(skb);
if (*prefetched) {
#if IS_ENABLED(CONFIG_SYN_COOKIES)
if (sk->sk_state == TCP_NEW_SYN_RECV && inet_reqsk(sk)->syncookie) {
struct request_sock *req = inet_reqsk(sk);
*refcounted = false;
sk = req->rsk_listener;
req->rsk_listener = NULL;
return sk;
}
#endif
*refcounted = sk_is_refcounted(sk);
} else {
*refcounted = true;
}
skb->destructor = NULL;
skb->sk = NULL;
return sk;
}
static inline struct request_sock *
reqsk_alloc(const struct request_sock_ops *ops, struct sock *sk_listener,
bool attach_listener)


@ -2830,31 +2830,6 @@ sk_is_refcounted(struct sock *sk)
return !sk_fullsock(sk) || !sock_flag(sk, SOCK_RCU_FREE);
}
/**
* skb_steal_sock - steal a socket from an sk_buff
* @skb: sk_buff to steal the socket from
* @refcounted: is set to true if the socket is reference-counted
* @prefetched: is set to true if the socket was assigned from bpf
*/
static inline struct sock *
skb_steal_sock(struct sk_buff *skb, bool *refcounted, bool *prefetched)
{
if (skb->sk) {
struct sock *sk = skb->sk;
*refcounted = true;
*prefetched = skb_sk_is_prefetched(skb);
if (*prefetched)
*refcounted = sk_is_refcounted(sk);
skb->destructor = NULL;
skb->sk = NULL;
return sk;
}
*prefetched = false;
*refcounted = false;
return NULL;
}
/* Checks if this SKB belongs to an HW offloaded socket
* and whether any SW fallbacks are required based on dev.
* Check decrypted mark in case skb_orphan() cleared socket.


@ -498,6 +498,22 @@ struct request_sock *cookie_tcp_reqsk_alloc(const struct request_sock_ops *ops,
struct tcp_options_received *tcp_opt,
int mss, u32 tsoff);
#if IS_ENABLED(CONFIG_BPF)
struct bpf_tcp_req_attrs {
u32 rcv_tsval;
u32 rcv_tsecr;
u16 mss;
u8 rcv_wscale;
u8 snd_wscale;
u8 ecn_ok;
u8 wscale_ok;
u8 sack_ok;
u8 tstamp_ok;
u8 usec_ts_ok;
u8 reserved[3];
};
#endif
#ifdef CONFIG_SYN_COOKIES
/* Syncookies use a monotonic timer which increments every 60 seconds.
@ -577,6 +593,15 @@ static inline u32 tcp_cookie_time(void)
return val;
}
/* Convert one nsec 64bit timestamp to ts (ms or usec resolution) */
static inline u64 tcp_ns_to_ts(bool usec_ts, u64 val)
{
if (usec_ts)
return div_u64(val, NSEC_PER_USEC);
return div_u64(val, NSEC_PER_MSEC);
}
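For readers outside the kernel, the conversion above is plain integer division with truncation. The sketch below is a user-space equivalent. The function name `ns_to_ts` is hypothetical, and the two constants are redefined locally with their usual kernel values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_USEC UINT64_C(1000)
#define NSEC_PER_MSEC UINT64_C(1000000)

/* Convert a nanosecond timestamp to microsecond or millisecond
 * resolution, truncating any remainder. */
static uint64_t ns_to_ts(bool usec_ts, uint64_t ns)
{
	return ns / (usec_ts ? NSEC_PER_USEC : NSEC_PER_MSEC);
}
```

For example, 1,500,000 ns is 1500 in usec resolution and 1 in ms resolution.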
u32 __cookie_v4_init_sequence(const struct iphdr *iph, const struct tcphdr *th,
u16 *mssp);
__u32 cookie_v4_init_sequence(const struct sk_buff *skb, __u16 *mss);
@ -590,6 +615,26 @@ static inline bool cookie_ecn_ok(const struct net *net, const struct dst_entry *
dst_feature(dst, RTAX_FEATURE_ECN);
}
#if IS_ENABLED(CONFIG_BPF)
static inline bool cookie_bpf_ok(struct sk_buff *skb)
{
return skb->sk;
}
struct request_sock *cookie_bpf_check(struct sock *sk, struct sk_buff *skb);
#else
static inline bool cookie_bpf_ok(struct sk_buff *skb)
{
return false;
}
static inline struct request_sock *cookie_bpf_check(struct sock *sk, struct sk_buff *skb)
{
return NULL;
}
#endif
/* From net/ipv6/syncookies.c */
int __cookie_v6_check(const struct ipv6hdr *iph, const struct tcphdr *th);
struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb);


@ -847,6 +847,36 @@ union bpf_iter_link_info {
* Returns zero on success. On error, -1 is returned and *errno*
* is set appropriately.
*
* BPF_TOKEN_CREATE
* Description
* Create BPF token with embedded information about what
* BPF-related functionality it allows:
* - a set of allowed bpf() syscall commands;
* - a set of allowed BPF map types to be created with
* BPF_MAP_CREATE command, if BPF_MAP_CREATE itself is allowed;
* - a set of allowed BPF program types and BPF program attach
* types to be loaded with BPF_PROG_LOAD command, if
* BPF_PROG_LOAD itself is allowed.
*
* BPF token is created (derived) from an instance of BPF FS,
* assuming it has necessary delegation mount options specified.
* This BPF token can be passed as an extra parameter to various
* bpf() syscall commands to grant BPF subsystem functionality to
* unprivileged processes.
*
* When created, BPF token is "associated" with the owning
* user namespace of BPF FS instance (super block) that it was
* derived from, and subsequent BPF operations performed with
* BPF token would be performing capabilities checks (i.e.,
* CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN) within
* that user namespace. Without BPF token, such capabilities
* have to be granted in init user namespace, making bpf()
* syscall incompatible with user namespace, for the most part.
*
* Return
* A new file descriptor (a nonnegative integer), or -1 if an
* error occurred (in which case, *errno* is set appropriately).
*
* NOTES
* eBPF objects (maps and programs) can be shared between processes.
*
@ -901,6 +931,8 @@ enum bpf_cmd {
BPF_ITER_CREATE,
BPF_LINK_DETACH,
BPF_PROG_BIND_MAP,
BPF_TOKEN_CREATE,
__MAX_BPF_CMD,
};
enum bpf_map_type {
@ -951,6 +983,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_BLOOM_FILTER,
BPF_MAP_TYPE_USER_RINGBUF,
BPF_MAP_TYPE_CGRP_STORAGE,
__MAX_BPF_MAP_TYPE
};
/* Note that tracing related programs such as
@ -995,6 +1028,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_SK_LOOKUP,
BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
BPF_PROG_TYPE_NETFILTER,
__MAX_BPF_PROG_TYPE
};
enum bpf_attach_type {
@ -1330,6 +1364,12 @@ enum {
/* Get path from provided FD in BPF_OBJ_PIN/BPF_OBJ_GET commands */
BPF_F_PATH_FD = (1U << 14),
/* Flag for value_type_btf_obj_fd, the fd is available */
BPF_F_VTYPE_BTF_OBJ_FD = (1U << 15),
/* BPF token FD is passed in a corresponding command's token_fd field */
BPF_F_TOKEN_FD = (1U << 16),
};
/* Flags for BPF_PROG_QUERY. */
@ -1403,6 +1443,15 @@ union bpf_attr {
* to using 5 hash functions).
*/
__u64 map_extra;
__s32 value_type_btf_obj_fd; /* fd pointing to a BTF
* type data for
* btf_vmlinux_value_type_id.
*/
/* BPF token FD to use with BPF_MAP_CREATE operation.
* If provided, map_flags should have BPF_F_TOKEN_FD flag set.
*/
__s32 map_token_fd;
};
struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
@ -1472,6 +1521,10 @@ union bpf_attr {
* truncated), or smaller (if log buffer wasn't filled completely).
*/
__u32 log_true_size;
/* BPF token FD to use with BPF_PROG_LOAD operation.
* If provided, prog_flags should have BPF_F_TOKEN_FD flag set.
*/
__s32 prog_token_fd;
};
struct { /* anonymous struct used by BPF_OBJ_* commands */
@ -1584,6 +1637,11 @@ union bpf_attr {
* truncated), or smaller (if log buffer wasn't filled completely).
*/
__u32 btf_log_true_size;
__u32 btf_flags;
/* BPF token FD to use with BPF_BTF_LOAD operation.
* If provided, btf_flags should have BPF_F_TOKEN_FD flag set.
*/
__s32 btf_token_fd;
};
struct {
@ -1714,6 +1772,11 @@ union bpf_attr {
__u32 flags; /* extra flags */
} prog_bind_map;
struct { /* struct used by BPF_TOKEN_CREATE command */
__u32 flags;
__u32 bpffs_fd;
} token_create;
} __attribute__((aligned(8)));
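
The hunks above pair each new `*_token_fd` field with the `BPF_F_TOKEN_FD` flag. The sketch below shows that pairing for a map-creation request. It is only an illustration: `struct sketch_map_create_attr` and the `sketch_` helpers are hypothetical, trimmed-down stand-ins rather than the real `union bpf_attr`, and no bpf() syscall is issued.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value copied from the hunk above: BPF_F_TOKEN_FD = (1U << 16). */
#define SKETCH_BPF_F_TOKEN_FD (1U << 16)

/* Hypothetical, trimmed-down mirror of the BPF_MAP_CREATE attributes. */
struct sketch_map_create_attr {
	uint32_t map_type;
	uint32_t key_size;
	uint32_t value_size;
	uint32_t max_entries;
	uint32_t map_flags;
	int32_t  map_token_fd;
};

/* Zero the request, as callers of bpf() conventionally do. */
static void sketch_map_attr_init(struct sketch_map_create_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
}

/* Record the token FD and set the flag that tells the kernel to read it. */
static void sketch_map_attr_set_token(struct sketch_map_create_attr *attr,
				      int token_fd)
{
	assert(token_fd >= 0);
	attr->map_token_fd = token_fd;
	attr->map_flags |= SKETCH_BPF_F_TOKEN_FD;
}
```

Setting the flag explicitly matters because 0 is a valid file descriptor, so a zeroed `map_token_fd` alone cannot signal "no token".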
/* The description below is an attempt at providing documentation to eBPF
@ -4839,9 +4902,9 @@ union bpf_attr {
* going through the CPU's backlog queue.
*
* The *flags* argument is reserved and must be 0. The helper is
* currently only supported for tc BPF program types at the ingress
* hook and for veth device types. The peer device must reside in a
* different network namespace.
* currently only supported for tc BPF program types at the
* ingress hook and for veth and netkit target device types. The
* peer device must reside in a different network namespace.
* Return
* The helper returns **TC_ACT_REDIRECT** on success or
* **TC_ACT_SHOT** on error.
@ -6487,7 +6550,7 @@ struct bpf_map_info {
__u32 btf_id;
__u32 btf_key_type_id;
__u32 btf_value_type_id;
__u32 :32; /* alignment pad */
__u32 btf_vmlinux_id;
__u64 map_extra;
} __attribute__((aligned(8)));
@ -6563,6 +6626,7 @@ struct bpf_link_info {
__u32 count; /* in/out: kprobe_multi function count */
__u32 flags;
__u64 missed;
__aligned_u64 cookies;
} kprobe_multi;
struct {
__aligned_u64 path;
@ -6582,6 +6646,7 @@ struct bpf_link_info {
__aligned_u64 file_name; /* in/out */
__u32 name_len;
__u32 offset; /* offset from file_name */
__u64 cookie;
} uprobe; /* BPF_PERF_EVENT_UPROBE, BPF_PERF_EVENT_URETPROBE */
struct {
__aligned_u64 func_name; /* in/out */
@ -6589,14 +6654,19 @@ struct bpf_link_info {
__u32 offset; /* offset from func_name */
__u64 addr;
__u64 missed;
__u64 cookie;
} kprobe; /* BPF_PERF_EVENT_KPROBE, BPF_PERF_EVENT_KRETPROBE */
struct {
__aligned_u64 tp_name; /* in/out */
__u32 name_len;
__u32 :32;
__u64 cookie;
} tracepoint; /* BPF_PERF_EVENT_TRACEPOINT */
struct {
__u64 config;
__u32 type;
__u32 :32;
__u64 cookie;
} event; /* BPF_PERF_EVENT_EVENT */
};
} perf_event;


@ -6,7 +6,7 @@ cflags-nogcse-$(CONFIG_X86)$(CONFIG_CC_IS_GCC) := -fno-gcse
endif
CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o
obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o token.o
obj-$(CONFIG_BPF_SYSCALL) += bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o


@ -82,7 +82,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
bool percpu = attr->map_type == BPF_MAP_TYPE_PERCPU_ARRAY;
int numa_node = bpf_map_attr_numa_node(attr);
u32 elem_size, index_mask, max_entries;
bool bypass_spec_v1 = bpf_bypass_spec_v1();
bool bypass_spec_v1 = bpf_bypass_spec_v1(NULL);
u64 array_size, mask64;
struct bpf_array *array;


@ -260,9 +260,15 @@ bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
BTF_SET_START(sleepable_lsm_hooks)
BTF_ID(func, bpf_lsm_bpf)
BTF_ID(func, bpf_lsm_bpf_map)
BTF_ID(func, bpf_lsm_bpf_map_alloc_security)
BTF_ID(func, bpf_lsm_bpf_map_free_security)
BTF_ID(func, bpf_lsm_bpf_map_create)
BTF_ID(func, bpf_lsm_bpf_map_free)
BTF_ID(func, bpf_lsm_bpf_prog)
BTF_ID(func, bpf_lsm_bpf_prog_load)
BTF_ID(func, bpf_lsm_bpf_prog_free)
BTF_ID(func, bpf_lsm_bpf_token_create)
BTF_ID(func, bpf_lsm_bpf_token_free)
BTF_ID(func, bpf_lsm_bpf_token_cmd)
BTF_ID(func, bpf_lsm_bpf_token_capable)
BTF_ID(func, bpf_lsm_bprm_check_security)
BTF_ID(func, bpf_lsm_bprm_committed_creds)
BTF_ID(func, bpf_lsm_bprm_committing_creds)
@ -357,9 +363,8 @@ BTF_ID(func, bpf_lsm_userns_create)
BTF_SET_END(sleepable_lsm_hooks)
BTF_SET_START(untrusted_lsm_hooks)
BTF_ID(func, bpf_lsm_bpf_map_free_security)
BTF_ID(func, bpf_lsm_bpf_prog_alloc_security)
BTF_ID(func, bpf_lsm_bpf_prog_free_security)
BTF_ID(func, bpf_lsm_bpf_map_free)
BTF_ID(func, bpf_lsm_bpf_prog_free)
BTF_ID(func, bpf_lsm_file_alloc_security)
BTF_ID(func, bpf_lsm_file_free_security)
#ifdef CONFIG_SECURITY_NETWORK


@ -13,26 +13,15 @@
#include <linux/btf_ids.h>
#include <linux/rcupdate_wait.h>
enum bpf_struct_ops_state {
BPF_STRUCT_OPS_STATE_INIT,
BPF_STRUCT_OPS_STATE_INUSE,
BPF_STRUCT_OPS_STATE_TOBEFREE,
BPF_STRUCT_OPS_STATE_READY,
};
#define BPF_STRUCT_OPS_COMMON_VALUE \
refcount_t refcnt; \
enum bpf_struct_ops_state state
struct bpf_struct_ops_value {
BPF_STRUCT_OPS_COMMON_VALUE;
struct bpf_struct_ops_common_value common;
char data[] ____cacheline_aligned_in_smp;
};
struct bpf_struct_ops_map {
struct bpf_map map;
struct rcu_head rcu;
const struct bpf_struct_ops *st_ops;
const struct bpf_struct_ops_desc *st_ops_desc;
/* protect map_update */
struct mutex lock;
/* link has all the bpf_links that is populated
@ -40,12 +29,15 @@ struct bpf_struct_ops_map {
* (in kvalue.data).
*/
struct bpf_link **links;
u32 links_cnt;
/* image is a page that has all the trampolines
* that stores the func args before calling the bpf_prog.
* A PAGE_SIZE "image" is enough to store all trampoline for
* "links[]".
*/
void *image;
/* The owner module's btf. */
struct btf *btf;
/* uvalue->data stores the kernel struct
* (e.g. tcp_congestion_ops) that is more useful
* to userspace than the kvalue. For example,
@ -70,35 +62,6 @@ static DEFINE_MUTEX(update_mutex);
#define VALUE_PREFIX "bpf_struct_ops_"
#define VALUE_PREFIX_LEN (sizeof(VALUE_PREFIX) - 1)
/* bpf_struct_ops_##_name (e.g. bpf_struct_ops_tcp_congestion_ops) is
* the map's value exposed to the userspace and its btf-type-id is
* stored at the map->btf_vmlinux_value_type_id.
*
*/
#define BPF_STRUCT_OPS_TYPE(_name) \
extern struct bpf_struct_ops bpf_##_name; \
\
struct bpf_struct_ops_##_name { \
BPF_STRUCT_OPS_COMMON_VALUE; \
struct _name data ____cacheline_aligned_in_smp; \
};
#include "bpf_struct_ops_types.h"
#undef BPF_STRUCT_OPS_TYPE
enum {
#define BPF_STRUCT_OPS_TYPE(_name) BPF_STRUCT_OPS_TYPE_##_name,
#include "bpf_struct_ops_types.h"
#undef BPF_STRUCT_OPS_TYPE
__NR_BPF_STRUCT_OPS_TYPE,
};
static struct bpf_struct_ops * const bpf_struct_ops[] = {
#define BPF_STRUCT_OPS_TYPE(_name) \
[BPF_STRUCT_OPS_TYPE_##_name] = &bpf_##_name,
#include "bpf_struct_ops_types.h"
#undef BPF_STRUCT_OPS_TYPE
};
const struct bpf_verifier_ops bpf_struct_ops_verifier_ops = {
};
@ -108,138 +71,139 @@ const struct bpf_prog_ops bpf_struct_ops_prog_ops = {
#endif
};
static const struct btf_type *module_type;
BTF_ID_LIST(st_ops_ids)
BTF_ID(struct, module)
BTF_ID(struct, bpf_struct_ops_common_value)
void bpf_struct_ops_init(struct btf *btf, struct bpf_verifier_log *log)
{
s32 type_id, value_id, module_id;
const struct btf_member *member;
struct bpf_struct_ops *st_ops;
const struct btf_type *t;
char value_name[128];
const char *mname;
u32 i, j;
/* Ensure BTF type is emitted for "struct bpf_struct_ops_##_name" */
#define BPF_STRUCT_OPS_TYPE(_name) BTF_TYPE_EMIT(struct bpf_struct_ops_##_name);
#include "bpf_struct_ops_types.h"
#undef BPF_STRUCT_OPS_TYPE
module_id = btf_find_by_name_kind(btf, "module", BTF_KIND_STRUCT);
if (module_id < 0) {
pr_warn("Cannot find struct module in btf_vmlinux\n");
return;
}
module_type = btf_type_by_id(btf, module_id);
for (i = 0; i < ARRAY_SIZE(bpf_struct_ops); i++) {
st_ops = bpf_struct_ops[i];
if (strlen(st_ops->name) + VALUE_PREFIX_LEN >=
sizeof(value_name)) {
pr_warn("struct_ops name %s is too long\n",
st_ops->name);
continue;
}
sprintf(value_name, "%s%s", VALUE_PREFIX, st_ops->name);
value_id = btf_find_by_name_kind(btf, value_name,
BTF_KIND_STRUCT);
if (value_id < 0) {
pr_warn("Cannot find struct %s in btf_vmlinux\n",
value_name);
continue;
}
type_id = btf_find_by_name_kind(btf, st_ops->name,
BTF_KIND_STRUCT);
if (type_id < 0) {
pr_warn("Cannot find struct %s in btf_vmlinux\n",
st_ops->name);
continue;
}
t = btf_type_by_id(btf, type_id);
if (btf_type_vlen(t) > BPF_STRUCT_OPS_MAX_NR_MEMBERS) {
pr_warn("Cannot support #%u members in struct %s\n",
btf_type_vlen(t), st_ops->name);
continue;
}
for_each_member(j, t, member) {
const struct btf_type *func_proto;
mname = btf_name_by_offset(btf, member->name_off);
if (!*mname) {
pr_warn("anon member in struct %s is not supported\n",
st_ops->name);
break;
}
if (__btf_member_bitfield_size(t, member)) {
pr_warn("bit field member %s in struct %s is not supported\n",
mname, st_ops->name);
break;
}
func_proto = btf_type_resolve_func_ptr(btf,
member->type,
NULL);
if (func_proto &&
btf_distill_func_proto(log, btf,
func_proto, mname,
&st_ops->func_models[j])) {
pr_warn("Error in parsing func ptr %s in struct %s\n",
mname, st_ops->name);
break;
}
}
if (j == btf_type_vlen(t)) {
if (st_ops->init(btf)) {
pr_warn("Error in init bpf_struct_ops %s\n",
st_ops->name);
} else {
st_ops->type_id = type_id;
st_ops->type = t;
st_ops->value_id = value_id;
st_ops->value_type = btf_type_by_id(btf,
value_id);
}
}
}
}
enum {
IDX_MODULE_ID,
IDX_ST_OPS_COMMON_VALUE_ID,
};
extern struct btf *btf_vmlinux;
static const struct bpf_struct_ops *
bpf_struct_ops_find_value(u32 value_id)
static bool is_valid_value_type(struct btf *btf, s32 value_id,
const struct btf_type *type,
const char *value_name)
{
unsigned int i;
const struct btf_type *common_value_type;
const struct btf_member *member;
const struct btf_type *vt, *mt;
if (!value_id || !btf_vmlinux)
return NULL;
for (i = 0; i < ARRAY_SIZE(bpf_struct_ops); i++) {
if (bpf_struct_ops[i]->value_id == value_id)
return bpf_struct_ops[i];
vt = btf_type_by_id(btf, value_id);
if (btf_vlen(vt) != 2) {
pr_warn("The number of %s's members should be 2, but we get %d\n",
value_name, btf_vlen(vt));
return false;
}
member = btf_type_member(vt);
mt = btf_type_by_id(btf, member->type);
common_value_type = btf_type_by_id(btf_vmlinux,
st_ops_ids[IDX_ST_OPS_COMMON_VALUE_ID]);
if (mt != common_value_type) {
pr_warn("The first member of %s should be bpf_struct_ops_common_value\n",
value_name);
return false;
}
member++;
mt = btf_type_by_id(btf, member->type);
if (mt != type) {
pr_warn("The second member of %s should be %s\n",
value_name, btf_name_by_offset(btf, type->name_off));
return false;
}
return NULL;
return true;
}
const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id)
int bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc,
struct btf *btf,
struct bpf_verifier_log *log)
{
unsigned int i;
struct bpf_struct_ops *st_ops = st_ops_desc->st_ops;
const struct btf_member *member;
const struct btf_type *t;
s32 type_id, value_id;
char value_name[128];
const char *mname;
int i;
if (!type_id || !btf_vmlinux)
return NULL;
if (strlen(st_ops->name) + VALUE_PREFIX_LEN >=
sizeof(value_name)) {
pr_warn("struct_ops name %s is too long\n",
st_ops->name);
return -EINVAL;
}
sprintf(value_name, "%s%s", VALUE_PREFIX, st_ops->name);
for (i = 0; i < ARRAY_SIZE(bpf_struct_ops); i++) {
if (bpf_struct_ops[i]->type_id == type_id)
return bpf_struct_ops[i];
type_id = btf_find_by_name_kind(btf, st_ops->name,
BTF_KIND_STRUCT);
if (type_id < 0) {
pr_warn("Cannot find struct %s in %s\n",
st_ops->name, btf_get_name(btf));
return -EINVAL;
}
t = btf_type_by_id(btf, type_id);
if (btf_type_vlen(t) > BPF_STRUCT_OPS_MAX_NR_MEMBERS) {
pr_warn("Cannot support #%u members in struct %s\n",
btf_type_vlen(t), st_ops->name);
return -EINVAL;
}
return NULL;
value_id = btf_find_by_name_kind(btf, value_name,
BTF_KIND_STRUCT);
if (value_id < 0) {
pr_warn("Cannot find struct %s in %s\n",
value_name, btf_get_name(btf));
return -EINVAL;
}
if (!is_valid_value_type(btf, value_id, t, value_name))
return -EINVAL;
for_each_member(i, t, member) {
const struct btf_type *func_proto;
mname = btf_name_by_offset(btf, member->name_off);
if (!*mname) {
pr_warn("anon member in struct %s is not supported\n",
st_ops->name);
return -EOPNOTSUPP;
}
if (__btf_member_bitfield_size(t, member)) {
pr_warn("bit field member %s in struct %s is not supported\n",
mname, st_ops->name);
return -EOPNOTSUPP;
}
func_proto = btf_type_resolve_func_ptr(btf,
member->type,
NULL);
if (func_proto &&
btf_distill_func_proto(log, btf,
func_proto, mname,
&st_ops->func_models[i])) {
pr_warn("Error in parsing func ptr %s in struct %s\n",
mname, st_ops->name);
return -EINVAL;
}
}
if (i == btf_type_vlen(t)) {
if (st_ops->init(btf)) {
pr_warn("Error in init bpf_struct_ops %s\n",
st_ops->name);
return -EINVAL;
} else {
st_ops_desc->type_id = type_id;
st_ops_desc->type = t;
st_ops_desc->value_id = value_id;
st_ops_desc->value_type = btf_type_by_id(btf,
value_id);
}
}
return 0;
}
static int bpf_struct_ops_map_get_next_key(struct bpf_map *map, void *key,
@ -265,7 +229,7 @@ int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, void *key,
kvalue = &st_map->kvalue;
/* Pair with smp_store_release() during map_update */
state = smp_load_acquire(&kvalue->state);
state = smp_load_acquire(&kvalue->common.state);
if (state == BPF_STRUCT_OPS_STATE_INIT) {
memset(value, 0, map->value_size);
return 0;
@ -276,7 +240,7 @@ int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, void *key,
*/
uvalue = value;
memcpy(uvalue, st_map->uvalue, map->value_size);
uvalue->state = state;
uvalue->common.state = state;
/* This value offers the user space a general estimate of how
* many sockets are still utilizing this struct_ops for TCP
@ -284,7 +248,7 @@ int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, void *key,
* should sufficiently meet our present goals.
*/
refcnt = atomic64_read(&map->refcnt) - atomic64_read(&map->usercnt);
refcount_set(&uvalue->refcnt, max_t(s64, refcnt, 0));
refcount_set(&uvalue->common.refcnt, max_t(s64, refcnt, 0));
return 0;
}
@ -296,10 +260,9 @@ static void *bpf_struct_ops_map_lookup_elem(struct bpf_map *map, void *key)
static void bpf_struct_ops_map_put_progs(struct bpf_struct_ops_map *st_map)
{
const struct btf_type *t = st_map->st_ops->type;
u32 i;
for (i = 0; i < btf_type_vlen(t); i++) {
for (i = 0; i < st_map->links_cnt; i++) {
if (st_map->links[i]) {
bpf_link_put(st_map->links[i]);
st_map->links[i] = NULL;
@ -307,7 +270,7 @@ static void bpf_struct_ops_map_put_progs(struct bpf_struct_ops_map *st_map)
}
}
static int check_zero_holes(const struct btf_type *t, void *data)
static int check_zero_holes(const struct btf *btf, const struct btf_type *t, void *data)
{
const struct btf_member *member;
u32 i, moff, msize, prev_mend = 0;
@ -319,8 +282,8 @@ static int check_zero_holes(const struct btf_type *t, void *data)
memchr_inv(data + prev_mend, 0, moff - prev_mend))
return -EINVAL;
mtype = btf_type_by_id(btf_vmlinux, member->type);
mtype = btf_resolve_size(btf_vmlinux, mtype, &msize);
mtype = btf_type_by_id(btf, member->type);
mtype = btf_resolve_size(btf, mtype, &msize);
if (IS_ERR(mtype))
return PTR_ERR(mtype);
prev_mend = moff + msize;
@ -376,10 +339,12 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
void *value, u64 flags)
{
struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map;
const struct bpf_struct_ops *st_ops = st_map->st_ops;
const struct bpf_struct_ops_desc *st_ops_desc = st_map->st_ops_desc;
const struct bpf_struct_ops *st_ops = st_ops_desc->st_ops;
struct bpf_struct_ops_value *uvalue, *kvalue;
const struct btf_type *module_type;
const struct btf_member *member;
const struct btf_type *t = st_ops->type;
const struct btf_type *t = st_ops_desc->type;
struct bpf_tramp_links *tlinks;
void *udata, *kdata;
int prog_fd, err;
@ -392,16 +357,16 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
if (*(u32 *)key != 0)
return -E2BIG;
err = check_zero_holes(st_ops->value_type, value);
err = check_zero_holes(st_map->btf, st_ops_desc->value_type, value);
if (err)
return err;
uvalue = value;
err = check_zero_holes(t, uvalue->data);
err = check_zero_holes(st_map->btf, t, uvalue->data);
if (err)
return err;
if (uvalue->state || refcount_read(&uvalue->refcnt))
if (uvalue->common.state || refcount_read(&uvalue->common.refcnt))
return -EINVAL;
tlinks = kcalloc(BPF_TRAMP_MAX, sizeof(*tlinks), GFP_KERNEL);
@ -413,7 +378,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
mutex_lock(&st_map->lock);
if (kvalue->state != BPF_STRUCT_OPS_STATE_INIT) {
if (kvalue->common.state != BPF_STRUCT_OPS_STATE_INIT) {
err = -EBUSY;
goto unlock;
}
@ -425,6 +390,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
image = st_map->image;
image_end = st_map->image + PAGE_SIZE;
module_type = btf_type_by_id(btf_vmlinux, st_ops_ids[IDX_MODULE_ID]);
for_each_member(i, t, member) {
const struct btf_type *mtype, *ptype;
struct bpf_prog *prog;
@ -432,7 +398,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
u32 moff;
moff = __btf_member_bit_offset(t, member) / 8;
ptype = btf_type_resolve_ptr(btf_vmlinux, member->type, NULL);
ptype = btf_type_resolve_ptr(st_map->btf, member->type, NULL);
if (ptype == module_type) {
if (*(void **)(udata + moff))
goto reset_unlock;
@ -457,8 +423,8 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
if (!ptype || !btf_type_is_func_proto(ptype)) {
u32 msize;
mtype = btf_type_by_id(btf_vmlinux, member->type);
mtype = btf_resolve_size(btf_vmlinux, mtype, &msize);
mtype = btf_type_by_id(st_map->btf, member->type);
mtype = btf_resolve_size(st_map->btf, mtype, &msize);
if (IS_ERR(mtype)) {
err = PTR_ERR(mtype);
goto reset_unlock;
@ -484,7 +450,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
}
if (prog->type != BPF_PROG_TYPE_STRUCT_OPS ||
prog->aux->attach_btf_id != st_ops->type_id ||
prog->aux->attach_btf_id != st_ops_desc->type_id ||
prog->expected_attach_type != i) {
bpf_prog_put(prog);
err = -EINVAL;
@ -527,7 +493,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
*
* Pair with smp_load_acquire() during lookup_elem().
*/
smp_store_release(&kvalue->state, BPF_STRUCT_OPS_STATE_READY);
smp_store_release(&kvalue->common.state, BPF_STRUCT_OPS_STATE_READY);
goto unlock;
}
@ -545,7 +511,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
* It ensures the above udata updates (e.g. prog->aux->id)
* can be seen once BPF_STRUCT_OPS_STATE_INUSE is set.
*/
smp_store_release(&kvalue->state, BPF_STRUCT_OPS_STATE_INUSE);
smp_store_release(&kvalue->common.state, BPF_STRUCT_OPS_STATE_INUSE);
goto unlock;
}
@ -575,12 +541,12 @@ static long bpf_struct_ops_map_delete_elem(struct bpf_map *map, void *key)
if (st_map->map.map_flags & BPF_F_LINK)
return -EOPNOTSUPP;
prev_state = cmpxchg(&st_map->kvalue.state,
prev_state = cmpxchg(&st_map->kvalue.common.state,
BPF_STRUCT_OPS_STATE_INUSE,
BPF_STRUCT_OPS_STATE_TOBEFREE);
switch (prev_state) {
case BPF_STRUCT_OPS_STATE_INUSE:
st_map->st_ops->unreg(&st_map->kvalue.data);
st_map->st_ops_desc->st_ops->unreg(&st_map->kvalue.data);
bpf_map_put(map);
return 0;
case BPF_STRUCT_OPS_STATE_TOBEFREE:
@ -597,6 +563,7 @@ static long bpf_struct_ops_map_delete_elem(struct bpf_map *map, void *key)
static void bpf_struct_ops_map_seq_show_elem(struct bpf_map *map, void *key,
struct seq_file *m)
{
struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map;
void *value;
int err;
@ -606,7 +573,8 @@ static void bpf_struct_ops_map_seq_show_elem(struct bpf_map *map, void *key,
err = bpf_struct_ops_map_sys_lookup_elem(map, key, value);
if (!err) {
btf_type_seq_show(btf_vmlinux, map->btf_vmlinux_value_type_id,
btf_type_seq_show(st_map->btf,
map->btf_vmlinux_value_type_id,
value, m);
seq_puts(m, "\n");
}
@ -631,6 +599,15 @@ static void __bpf_struct_ops_map_free(struct bpf_map *map)
static void bpf_struct_ops_map_free(struct bpf_map *map)
{
struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map;
/* st_ops->owner was acquired during map_alloc to implicitly hold
* the btf's refcnt. The acquire was only done when btf_is_module().
* st_map->btf cannot be NULL here.
*/
if (btf_is_module(st_map->btf))
module_put(st_map->st_ops_desc->st_ops->owner);
/* The struct_ops's function may switch to another struct_ops.
*
* For example, bpf_tcp_cc_x->init() may switch to
@ -654,29 +631,61 @@ static void bpf_struct_ops_map_free(struct bpf_map *map)
static int bpf_struct_ops_map_alloc_check(union bpf_attr *attr)
{
if (attr->key_size != sizeof(unsigned int) || attr->max_entries != 1 ||
(attr->map_flags & ~BPF_F_LINK) || !attr->btf_vmlinux_value_type_id)
(attr->map_flags & ~(BPF_F_LINK | BPF_F_VTYPE_BTF_OBJ_FD)) ||
!attr->btf_vmlinux_value_type_id)
return -EINVAL;
return 0;
}
static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
{
const struct bpf_struct_ops *st_ops;
const struct bpf_struct_ops_desc *st_ops_desc;
size_t st_map_size;
struct bpf_struct_ops_map *st_map;
const struct btf_type *t, *vt;
struct module *mod = NULL;
struct bpf_map *map;
struct btf *btf;
int ret;
st_ops = bpf_struct_ops_find_value(attr->btf_vmlinux_value_type_id);
if (!st_ops)
return ERR_PTR(-ENOTSUPP);
if (attr->map_flags & BPF_F_VTYPE_BTF_OBJ_FD) {
/* The map holds btf for its whole life time. */
btf = btf_get_by_fd(attr->value_type_btf_obj_fd);
if (IS_ERR(btf))
return ERR_CAST(btf);
if (!btf_is_module(btf)) {
btf_put(btf);
return ERR_PTR(-EINVAL);
}
vt = st_ops->value_type;
if (attr->value_size != vt->size)
return ERR_PTR(-EINVAL);
mod = btf_try_get_module(btf);
/* mod holds a refcnt to btf. We don't need an extra refcnt
* here.
*/
btf_put(btf);
if (!mod)
return ERR_PTR(-EINVAL);
} else {
btf = bpf_get_btf_vmlinux();
if (IS_ERR(btf))
return ERR_CAST(btf);
if (!btf)
return ERR_PTR(-ENOTSUPP);
}
t = st_ops->type;
st_ops_desc = bpf_struct_ops_find_value(btf, attr->btf_vmlinux_value_type_id);
if (!st_ops_desc) {
ret = -ENOTSUPP;
goto errout;
}
vt = st_ops_desc->value_type;
if (attr->value_size != vt->size) {
ret = -EINVAL;
goto errout;
}
t = st_ops_desc->type;
st_map_size = sizeof(*st_map) +
/* kvalue stores the
@ -685,17 +694,17 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
(vt->size - sizeof(struct bpf_struct_ops_value));
st_map = bpf_map_area_alloc(st_map_size, NUMA_NO_NODE);
if (!st_map)
return ERR_PTR(-ENOMEM);
if (!st_map) {
ret = -ENOMEM;
goto errout;
}
st_map->st_ops = st_ops;
st_map->st_ops_desc = st_ops_desc;
map = &st_map->map;
ret = bpf_jit_charge_modmem(PAGE_SIZE);
if (ret) {
__bpf_struct_ops_map_free(map);
return ERR_PTR(ret);
}
if (ret)
goto errout_free;
st_map->image = arch_alloc_bpf_trampoline(PAGE_SIZE);
if (!st_map->image) {
@ -704,29 +713,38 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
* here.
*/
bpf_jit_uncharge_modmem(PAGE_SIZE);
__bpf_struct_ops_map_free(map);
return ERR_PTR(-ENOMEM);
ret = -ENOMEM;
goto errout_free;
}
st_map->uvalue = bpf_map_area_alloc(vt->size, NUMA_NO_NODE);
st_map->links_cnt = btf_type_vlen(t);
st_map->links =
bpf_map_area_alloc(btf_type_vlen(t) * sizeof(struct bpf_links *),
bpf_map_area_alloc(st_map->links_cnt * sizeof(struct bpf_links *),
NUMA_NO_NODE);
if (!st_map->uvalue || !st_map->links) {
__bpf_struct_ops_map_free(map);
return ERR_PTR(-ENOMEM);
ret = -ENOMEM;
goto errout_free;
}
st_map->btf = btf;
mutex_init(&st_map->lock);
bpf_map_init_from_attr(map, attr);
return map;
errout_free:
__bpf_struct_ops_map_free(map);
errout:
module_put(mod);
return ERR_PTR(ret);
}
static u64 bpf_struct_ops_map_mem_usage(const struct bpf_map *map)
{
struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map;
const struct bpf_struct_ops *st_ops = st_map->st_ops;
const struct btf_type *vt = st_ops->value_type;
const struct bpf_struct_ops_desc *st_ops_desc = st_map->st_ops_desc;
const struct btf_type *vt = st_ops_desc->value_type;
u64 usage;
usage = sizeof(*st_map) +
@ -785,7 +803,7 @@ static bool bpf_struct_ops_valid_to_reg(struct bpf_map *map)
return map->map_type == BPF_MAP_TYPE_STRUCT_OPS &&
map->map_flags & BPF_F_LINK &&
/* Pair with smp_store_release() during map_update */
smp_load_acquire(&st_map->kvalue.state) == BPF_STRUCT_OPS_STATE_READY;
smp_load_acquire(&st_map->kvalue.common.state) == BPF_STRUCT_OPS_STATE_READY;
}
static void bpf_struct_ops_map_link_dealloc(struct bpf_link *link)
@ -800,7 +818,7 @@ static void bpf_struct_ops_map_link_dealloc(struct bpf_link *link)
/* st_link->map can be NULL if
* bpf_struct_ops_link_create() fails to register.
*/
st_map->st_ops->unreg(&st_map->kvalue.data);
st_map->st_ops_desc->st_ops->unreg(&st_map->kvalue.data);
bpf_map_put(&st_map->map);
}
kfree(st_link);
@ -847,7 +865,7 @@ static int bpf_struct_ops_map_link_update(struct bpf_link *link, struct bpf_map
if (!bpf_struct_ops_valid_to_reg(new_map))
return -EINVAL;
if (!st_map->st_ops->update)
if (!st_map->st_ops_desc->st_ops->update)
return -EOPNOTSUPP;
mutex_lock(&update_mutex);
@ -860,12 +878,12 @@ static int bpf_struct_ops_map_link_update(struct bpf_link *link, struct bpf_map
old_st_map = container_of(old_map, struct bpf_struct_ops_map, map);
/* The new and old struct_ops must be the same type. */
if (st_map->st_ops != old_st_map->st_ops) {
if (st_map->st_ops_desc != old_st_map->st_ops_desc) {
err = -EINVAL;
goto err_out;
}
err = st_map->st_ops->update(st_map->kvalue.data, old_st_map->kvalue.data);
err = st_map->st_ops_desc->st_ops->update(st_map->kvalue.data, old_st_map->kvalue.data);
if (err)
goto err_out;
@ -916,7 +934,7 @@ int bpf_struct_ops_link_create(union bpf_attr *attr)
if (err)
goto err_out;
err = st_map->st_ops->reg(st_map->kvalue.data);
err = st_map->st_ops_desc->st_ops->reg(st_map->kvalue.data);
if (err) {
bpf_link_cleanup(&link_primer);
link = NULL;
@ -931,3 +949,10 @@ int bpf_struct_ops_link_create(union bpf_attr *attr)
kfree(link);
return err;
}
void bpf_map_struct_ops_info_fill(struct bpf_map_info *info, struct bpf_map *map)
{
struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map;
info->btf_vmlinux_id = btf_obj_id(st_map->btf);
}


@ -1,12 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0 */
/* internal file - do not include directly */
#ifdef CONFIG_BPF_JIT
#ifdef CONFIG_NET
BPF_STRUCT_OPS_TYPE(bpf_dummy_ops)
#endif
#ifdef CONFIG_INET
#include <net/tcp.h>
BPF_STRUCT_OPS_TYPE(tcp_congestion_ops)
#endif
#endif


@ -19,6 +19,7 @@
#include <linux/bpf_verifier.h>
#include <linux/btf.h>
#include <linux/btf_ids.h>
#include <linux/bpf.h>
#include <linux/bpf_lsm.h>
#include <linux/skmsg.h>
#include <linux/perf_event.h>
@ -241,6 +242,12 @@ struct btf_id_dtor_kfunc_tab {
struct btf_id_dtor_kfunc dtors[];
};
struct btf_struct_ops_tab {
u32 cnt;
u32 capacity;
struct bpf_struct_ops_desc ops[];
};
struct btf {
void *data;
struct btf_type **types;
@ -258,6 +265,7 @@ struct btf {
struct btf_kfunc_set_tab *kfunc_set_tab;
struct btf_id_dtor_kfunc_tab *dtor_kfunc_tab;
struct btf_struct_metas *struct_meta_tab;
struct btf_struct_ops_tab *struct_ops_tab;
/* split BTF support */
struct btf *base_btf;
@ -1688,11 +1696,20 @@ static void btf_free_struct_meta_tab(struct btf *btf)
btf->struct_meta_tab = NULL;
}
static void btf_free_struct_ops_tab(struct btf *btf)
{
struct btf_struct_ops_tab *tab = btf->struct_ops_tab;
kfree(tab);
btf->struct_ops_tab = NULL;
}
static void btf_free(struct btf *btf)
{
btf_free_struct_meta_tab(btf);
btf_free_dtor_kfunc_tab(btf);
btf_free_kfunc_set_tab(btf);
btf_free_struct_ops_tab(btf);
kvfree(btf->types);
kvfree(btf->resolved_sizes);
kvfree(btf->resolved_ids);
@ -1707,6 +1724,11 @@ static void btf_free_rcu(struct rcu_head *rcu)
btf_free(btf);
}
const char *btf_get_name(const struct btf *btf)
{
return btf->name;
}
void btf_get(struct btf *btf)
{
refcount_inc(&btf->refcnt);
@ -3310,30 +3332,48 @@ static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
return BTF_FIELD_FOUND;
}
int btf_find_next_decl_tag(const struct btf *btf, const struct btf_type *pt,
int comp_idx, const char *tag_key, int last_id)
{
int len = strlen(tag_key);
int i, n;
for (i = last_id + 1, n = btf_nr_types(btf); i < n; i++) {
const struct btf_type *t = btf_type_by_id(btf, i);
if (!btf_type_is_decl_tag(t))
continue;
if (pt != btf_type_by_id(btf, t->type))
continue;
if (btf_type_decl_tag(t)->component_idx != comp_idx)
continue;
if (strncmp(__btf_name_by_offset(btf, t->name_off), tag_key, len))
continue;
return i;
}
return -ENOENT;
}
const char *btf_find_decl_tag_value(const struct btf *btf, const struct btf_type *pt,
int comp_idx, const char *tag_key)
{
const char *value = NULL;
int i;
const struct btf_type *t;
int len, id;
for (i = 1; i < btf_nr_types(btf); i++) {
const struct btf_type *t = btf_type_by_id(btf, i);
int len = strlen(tag_key);
id = btf_find_next_decl_tag(btf, pt, comp_idx, tag_key, 0);
if (id < 0)
return ERR_PTR(id);
t = btf_type_by_id(btf, id);
len = strlen(tag_key);
value = __btf_name_by_offset(btf, t->name_off) + len;
/* Prevent duplicate entries for same type */
id = btf_find_next_decl_tag(btf, pt, comp_idx, tag_key, id);
if (id >= 0)
return ERR_PTR(-EEXIST);
if (!btf_type_is_decl_tag(t))
continue;
if (pt != btf_type_by_id(btf, t->type) ||
btf_type_decl_tag(t)->component_idx != comp_idx)
continue;
if (strncmp(__btf_name_by_offset(btf, t->name_off), tag_key, len))
continue;
/* Prevent duplicate entries for same type */
if (value)
return ERR_PTR(-EEXIST);
value = __btf_name_by_offset(btf, t->name_off) + len;
}
if (!value)
return ERR_PTR(-ENOENT);
return value;
}
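
The refactoring above splits tag lookup into a resumable "find next" step plus a single-value wrapper that rejects duplicates. A minimal standalone sketch of that shape follows, assuming a plain array in place of BTF. `struct sketch_tag` and both helpers are hypothetical. The kernel code starts its scan from `last_id = 0` because BTF type ID 0 is void, whereas this array-based sketch starts from -1.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a decl tag record; not a kernel type. */
struct sketch_tag {
	int comp_idx;		/* argument or member the tag is attached to */
	const char *name;	/* full tag string, e.g. "arg:ctx" */
};

/* Return the index of the next matching tag after last_id, or -ENOENT. */
static int sketch_find_next_tag(const struct sketch_tag *tags, int n,
				int comp_idx, const char *tag_key, int last_id)
{
	size_t len = strlen(tag_key);
	int i;

	assert(last_id >= -1);
	for (i = last_id + 1; i < n; i++) {
		if (tags[i].comp_idx != comp_idx)
			continue;
		if (strncmp(tags[i].name, tag_key, len))
			continue;
		return i;
	}
	return -ENOENT;
}

/* Single-value lookup: exactly one match is allowed for a component. */
static int sketch_find_tag_value(const struct sketch_tag *tags, int n,
				 int comp_idx, const char *tag_key,
				 const char **value)
{
	int id = sketch_find_next_tag(tags, n, comp_idx, tag_key, -1);

	if (id < 0)
		return id;
	/* A second match for the same component is reported as a duplicate. */
	if (sketch_find_next_tag(tags, n, comp_idx, tag_key, id) >= 0)
		return -EEXIST;
	*value = tags[id].name + strlen(tag_key);
	return 0;
}
```

Callers that accept several tags per component can loop on the "find next" helper directly, which is what the later `btf_prepare_func_args()` hunk does.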
@ -5933,8 +5973,6 @@ struct btf *btf_parse_vmlinux(void)
/* btf_parse_vmlinux() runs under bpf_verifier_lock */
bpf_ctx_convert.t = btf_type_by_id(btf, bpf_ctx_convert_btf_id[0]);
bpf_struct_ops_init(btf, log);
refcount_set(&btf->refcnt, 1);
err = btf_alloc_id(btf);
@ -6284,6 +6322,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
__btf_name_by_offset(btf, t->name_off));
return true;
}
EXPORT_SYMBOL_GPL(btf_ctx_access);
enum bpf_struct_walk_result {
/* < 0 error */
@ -6946,6 +6985,11 @@ static bool btf_is_dynptr_ptr(const struct btf *btf, const struct btf_type *t)
return false;
}
enum btf_arg_tag {
ARG_TAG_CTX = 0x1,
ARG_TAG_NONNULL = 0x2,
};
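
The two tag bits defined above are accumulated per argument in `btf_prepare_func_args()` and then checked against the argument kind. The following sketch imitates only that accumulate-then-validate idea. The `sketch_` names are hypothetical, the tag strings are assumed to arrive with the "arg:" prefix already stripped, and the validity rule shown covers just the context-pointer case from this diff.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Same values as enum btf_arg_tag above, renamed to mark this as a sketch. */
enum sketch_arg_tag {
	SKETCH_ARG_TAG_CTX     = 0x1,
	SKETCH_ARG_TAG_NONNULL = 0x2,
};

/* Fold tag strings into a bitmask; unknown tags yield -EOPNOTSUPP. */
static int sketch_collect_tags(const char *const *tags, int n,
			       unsigned int *out)
{
	unsigned int mask = 0;
	int i;

	assert(out != NULL);
	for (i = 0; i < n; i++) {
		if (strcmp(tags[i], "ctx") == 0)
			mask |= SKETCH_ARG_TAG_CTX;
		else if (strcmp(tags[i], "nonnull") == 0)
			mask |= SKETCH_ARG_TAG_NONNULL;
		else
			return -EOPNOTSUPP;
	}
	*out = mask;	/* only written on success */
	return 0;
}

/* A context argument may carry the ctx tag and nothing else. */
static int sketch_ctx_tags_valid(unsigned int tags)
{
	return (tags & ~(unsigned int)SKETCH_ARG_TAG_CTX) == 0;
}
```

Collecting into a bitmask first lets conflicting combinations be rejected in one place instead of depending on the order in which tags are encountered.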
/* Process BTF of a function to produce high-level expectation of function
* arguments (like ARG_PTR_TO_CTX, or ARG_PTR_TO_MEM, etc). This information
* is cached in subprog info for reuse.
@ -7027,70 +7071,86 @@ int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog)
* Only PTR_TO_CTX and SCALAR are supported atm.
*/
for (i = 0; i < nargs; i++) {
bool is_nonnull = false;
const char *tag;
u32 tags = 0;
int id = 0;
t = btf_type_by_id(btf, args[i].type);
tag = btf_find_decl_tag_value(btf, fn_t, i, "arg:");
if (IS_ERR(tag) && PTR_ERR(tag) == -ENOENT) {
tag = NULL;
} else if (IS_ERR(tag)) {
bpf_log(log, "arg#%d type's tag fetching failure: %ld\n", i, PTR_ERR(tag));
return PTR_ERR(tag);
}
/* 'arg:<tag>' decl_tag takes precedence over derivation of
* register type from BTF type itself
*/
if (tag) {
while ((id = btf_find_next_decl_tag(btf, fn_t, i, "arg:", id)) > 0) {
const struct btf_type *tag_t = btf_type_by_id(btf, id);
const char *tag = __btf_name_by_offset(btf, tag_t->name_off) + 4;
/* disallow arg tags in static subprogs */
if (!is_global) {
bpf_log(log, "arg#%d type tag is not supported in static functions\n", i);
return -EOPNOTSUPP;
}
if (strcmp(tag, "ctx") == 0) {
sub->args[i].arg_type = ARG_PTR_TO_CTX;
continue;
tags |= ARG_TAG_CTX;
} else if (strcmp(tag, "nonnull") == 0) {
tags |= ARG_TAG_NONNULL;
} else {
bpf_log(log, "arg#%d has unsupported set of tags\n", i);
return -EOPNOTSUPP;
}
if (strcmp(tag, "nonnull") == 0)
is_nonnull = true;
}
if (id != -ENOENT) {
bpf_log(log, "arg#%d type tag fetching failure: %d\n", i, id);
return id;
}
t = btf_type_by_id(btf, args[i].type);
while (btf_type_is_modifier(t))
t = btf_type_by_id(btf, t->type);
if (btf_type_is_int(t) || btf_is_any_enum(t)) {
sub->args[i].arg_type = ARG_ANYTHING;
continue;
}
if (btf_type_is_ptr(t) && btf_get_prog_ctx_type(log, btf, t, prog_type, i)) {
if (!btf_type_is_ptr(t))
goto skip_pointer;
if ((tags & ARG_TAG_CTX) || btf_get_prog_ctx_type(log, btf, t, prog_type, i)) {
if (tags & ~ARG_TAG_CTX) {
bpf_log(log, "arg#%d has invalid combination of tags\n", i);
return -EINVAL;
}
sub->args[i].arg_type = ARG_PTR_TO_CTX;
continue;
}
if (btf_type_is_ptr(t) && btf_is_dynptr_ptr(btf, t)) {
if (btf_is_dynptr_ptr(btf, t)) {
if (tags) {
bpf_log(log, "arg#%d has invalid combination of tags\n", i);
return -EINVAL;
}
sub->args[i].arg_type = ARG_PTR_TO_DYNPTR | MEM_RDONLY;
continue;
}
if (is_global && btf_type_is_ptr(t)) {
if (is_global) { /* generic user data pointer */
u32 mem_size;
t = btf_type_skip_modifiers(btf, t->type, NULL);
ref_t = btf_resolve_size(btf, t, &mem_size);
if (IS_ERR(ref_t)) {
bpf_log(log,
"arg#%d reference type('%s %s') size cannot be determined: %ld\n",
i, btf_type_str(t), btf_name_by_offset(btf, t->name_off),
bpf_log(log, "arg#%d reference type('%s %s') size cannot be determined: %ld\n",
i, btf_type_str(t), btf_name_by_offset(btf, t->name_off),
PTR_ERR(ref_t));
return -EINVAL;
}
sub->args[i].arg_type = is_nonnull ? ARG_PTR_TO_MEM : ARG_PTR_TO_MEM_OR_NULL;
sub->args[i].arg_type = ARG_PTR_TO_MEM | PTR_MAYBE_NULL;
if (tags & ARG_TAG_NONNULL)
sub->args[i].arg_type &= ~PTR_MAYBE_NULL;
sub->args[i].mem_size = mem_size;
continue;
}
if (is_nonnull) {
bpf_log(log, "arg#%d marked as non-null, but is not a pointer type\n", i);
skip_pointer:
if (tags) {
bpf_log(log, "arg#%d has pointer tag, but is not a pointer type\n", i);
return -EINVAL;
}
if (btf_type_is_int(t) || btf_is_any_enum(t)) {
sub->args[i].arg_type = ARG_ANYTHING;
continue;
}
bpf_log(log, "Arg#%d type %s in %s() is not supported yet.\n",
i, btf_type_str(t), tname);
return -EINVAL;
@ -8645,3 +8705,121 @@ bool btf_type_ids_nocast_alias(struct bpf_verifier_log *log,
return !strncmp(reg_name, arg_name, cmp_len);
}
#ifdef CONFIG_BPF_JIT
static int
btf_add_struct_ops(struct btf *btf, struct bpf_struct_ops *st_ops,
struct bpf_verifier_log *log)
{
struct btf_struct_ops_tab *tab, *new_tab;
int i, err;
tab = btf->struct_ops_tab;
if (!tab) {
tab = kzalloc(offsetof(struct btf_struct_ops_tab, ops[4]),
GFP_KERNEL);
if (!tab)
return -ENOMEM;
tab->capacity = 4;
btf->struct_ops_tab = tab;
}
for (i = 0; i < tab->cnt; i++)
if (tab->ops[i].st_ops == st_ops)
return -EEXIST;
if (tab->cnt == tab->capacity) {
new_tab = krealloc(tab,
offsetof(struct btf_struct_ops_tab,
ops[tab->capacity * 2]),
GFP_KERNEL);
if (!new_tab)
return -ENOMEM;
tab = new_tab;
tab->capacity *= 2;
btf->struct_ops_tab = tab;
}
tab->ops[btf->struct_ops_tab->cnt].st_ops = st_ops;
err = bpf_struct_ops_desc_init(&tab->ops[btf->struct_ops_tab->cnt], btf, log);
if (err)
return err;
btf->struct_ops_tab->cnt++;
return 0;
}
const struct bpf_struct_ops_desc *
bpf_struct_ops_find_value(struct btf *btf, u32 value_id)
{
const struct bpf_struct_ops_desc *st_ops_list;
unsigned int i;
u32 cnt;
if (!value_id)
return NULL;
if (!btf->struct_ops_tab)
return NULL;
cnt = btf->struct_ops_tab->cnt;
st_ops_list = btf->struct_ops_tab->ops;
for (i = 0; i < cnt; i++) {
if (st_ops_list[i].value_id == value_id)
return &st_ops_list[i];
}
return NULL;
}
const struct bpf_struct_ops_desc *
bpf_struct_ops_find(struct btf *btf, u32 type_id)
{
const struct bpf_struct_ops_desc *st_ops_list;
unsigned int i;
u32 cnt;
if (!type_id)
return NULL;
if (!btf->struct_ops_tab)
return NULL;
cnt = btf->struct_ops_tab->cnt;
st_ops_list = btf->struct_ops_tab->ops;
for (i = 0; i < cnt; i++) {
if (st_ops_list[i].type_id == type_id)
return &st_ops_list[i];
}
return NULL;
}
int __register_bpf_struct_ops(struct bpf_struct_ops *st_ops)
{
struct bpf_verifier_log *log;
struct btf *btf;
int err = 0;
btf = btf_get_module_btf(st_ops->owner);
if (!btf)
return -EINVAL;
log = kzalloc(sizeof(*log), GFP_KERNEL | __GFP_NOWARN);
if (!log) {
err = -ENOMEM;
goto errout;
}
log->level = BPF_LOG_KERNEL;
err = btf_add_struct_ops(btf, st_ops, log);
errout:
kfree(log);
btf_put(btf);
return err;
}
EXPORT_SYMBOL_GPL(__register_bpf_struct_ops);
#endif
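
Reviewer note: the reworked tag handling in btf_prepare_func_args() above first folds every 'arg:<tag>' decl tag into a bitmask and only then decides what the argument is. The following is a minimal userspace sketch of that decision order, not the kernel code. The names `struct arg_desc`, `enum arg_class`, `tag_bit()` and `classify_arg()` are hypothetical, and the boolean fields stand in for the BTF queries (btf_type_is_ptr(), btf_get_prog_ctx_type(), btf_is_dynptr_ptr(), and so on).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { TAG_CTX = 0x1, TAG_NONNULL = 0x2 };

enum arg_class {
	CLASS_INVALID = -1,
	CLASS_SCALAR,
	CLASS_CTX,
	CLASS_DYNPTR,
	CLASS_MEM,
	CLASS_MEM_OR_NULL,
};

/* Hypothetical summary of what BTF says about one argument. */
struct arg_desc {
	bool is_ptr;      /* pointer type after skipping modifiers */
	bool is_ctx_type; /* pointee matches the program's context type */
	bool is_dynptr;   /* pointer to a dynptr */
	bool is_scalar;   /* int or enum */
	bool is_global;   /* argument of a global (non-static) subprog */
};

/* Map a tag suffix ("ctx", "nonnull") to its bit; 0 means unsupported. */
static unsigned int tag_bit(const char *tag)
{
	if (strcmp(tag, "ctx") == 0)
		return TAG_CTX;
	if (strcmp(tag, "nonnull") == 0)
		return TAG_NONNULL;
	return 0;
}

/* Pointer cases are checked first; tags on non-pointers are rejected. */
static enum arg_class classify_arg(const struct arg_desc *a, unsigned int tags)
{
	assert(a != NULL);

	if (a->is_ptr) {
		if ((tags & TAG_CTX) || a->is_ctx_type)
			return (tags & ~TAG_CTX) ? CLASS_INVALID : CLASS_CTX;
		if (a->is_dynptr)
			return tags ? CLASS_INVALID : CLASS_DYNPTR;
		if (a->is_global)
			return (tags & TAG_NONNULL) ? CLASS_MEM : CLASS_MEM_OR_NULL;
	}
	if (tags)
		return CLASS_INVALID;
	return a->is_scalar ? CLASS_SCALAR : CLASS_INVALID;
}
```

In this model a generic pointer in a global subprog is nullable by default, and the "nonnull" tag only clears the nullable bit. That mirrors the `ARG_PTR_TO_MEM | PTR_MAYBE_NULL` followed by `&= ~PTR_MAYBE_NULL` sequence in the hunk.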


@ -1630,7 +1630,7 @@ cgroup_dev_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_FUNC_perf_event_output:
return &bpf_event_output_data_proto;
default:
return bpf_base_func_proto(func_id);
return bpf_base_func_proto(func_id, prog);
}
}
@ -2191,7 +2191,7 @@ sysctl_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_FUNC_perf_event_output:
return &bpf_event_output_data_proto;
default:
return bpf_base_func_proto(func_id);
return bpf_base_func_proto(func_id, prog);
}
}
@ -2348,7 +2348,7 @@ cg_sockopt_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_FUNC_perf_event_output:
return &bpf_event_output_data_proto;
default:
return bpf_base_func_proto(func_id);
return bpf_base_func_proto(func_id, prog);
}
}


@ -682,7 +682,7 @@ static bool bpf_prog_kallsyms_candidate(const struct bpf_prog *fp)
void bpf_prog_kallsyms_add(struct bpf_prog *fp)
{
if (!bpf_prog_kallsyms_candidate(fp) ||
!bpf_capable())
!bpf_token_capable(fp->aux->token, CAP_BPF))
return;
bpf_prog_ksym_set_addr(fp);
@ -2779,6 +2779,7 @@ void bpf_prog_free(struct bpf_prog *fp)
if (aux->dst_prog)
bpf_prog_put(aux->dst_prog);
bpf_token_put(aux->token);
INIT_WORK(&aux->work, bpf_prog_free_deferred);
schedule_work(&aux->work);
}
@ -2925,6 +2926,16 @@ bool __weak bpf_jit_supports_far_kfunc_call(void)
return false;
}
/* Return TRUE if the JIT backend satisfies the following two conditions:
* 1) JIT backend supports atomic_xchg() on pointer-sized words.
* 2) Under the specific arch, the implementation of xchg() is the same
* as atomic_xchg() on pointer-sized words.
*/
bool __weak bpf_jit_supports_ptr_xchg(void)
{
return false;
}
/* To execute LD_ABS/LD_IND instructions __bpf_prog_run() may call
* skb_copy_bits(), so provide a weak definition of it for NET-less config.
*/


@ -1414,6 +1414,7 @@ BPF_CALL_2(bpf_kptr_xchg, void *, map_value, void *, ptr)
{
unsigned long *kptr = map_value;
/* This helper may be inlined by verifier. */
return xchg(kptr, (unsigned long)ptr);
}
@ -1679,7 +1680,7 @@ const struct bpf_func_proto bpf_probe_read_kernel_str_proto __weak;
const struct bpf_func_proto bpf_task_pt_regs_proto __weak;
const struct bpf_func_proto *
bpf_base_func_proto(enum bpf_func_id func_id)
bpf_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
switch (func_id) {
case BPF_FUNC_map_lookup_elem:
@ -1730,7 +1731,7 @@ bpf_base_func_proto(enum bpf_func_id func_id)
break;
}
if (!bpf_capable())
if (!bpf_token_capable(prog->aux->token, CAP_BPF))
return NULL;
switch (func_id) {
@ -1788,7 +1789,7 @@ bpf_base_func_proto(enum bpf_func_id func_id)
break;
}
if (!perfmon_capable())
if (!bpf_token_capable(prog->aux->token, CAP_PERFMON))
return NULL;
switch (func_id) {
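
Reviewer note: bpf_kptr_xchg() in the hunk above is a single exchange on a pointer-sized word, which is why the verifier can inline it on JITs where bpf_jit_supports_ptr_xchg() returns true. A rough userspace analogue using C11 atomics is sketched below. `slot_xchg()` is a hypothetical name, and the sketch does not model the kernel's xchg() ordering guarantees beyond what atomic_exchange() provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Store new_ptr into the slot and return whatever was there before.
 * Passing NULL "pops" the stashed object; passing an object "stashes" it.
 */
static void *slot_xchg(_Atomic(void *) *slot, void *new_ptr)
{
	assert(slot != NULL);
	return atomic_exchange(slot, new_ptr);
}
```

The caller owns whatever pointer comes back, so a stash followed by a pop hands the same object back exactly once.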


@ -20,6 +20,7 @@
#include <linux/filter.h>
#include <linux/bpf.h>
#include <linux/bpf_trace.h>
#include <linux/kstrtox.h>
#include "preload/bpf_preload.h"
enum bpf_type {
@ -98,9 +99,9 @@ static const struct inode_operations bpf_prog_iops = { };
static const struct inode_operations bpf_map_iops = { };
static const struct inode_operations bpf_link_iops = { };
static struct inode *bpf_get_inode(struct super_block *sb,
const struct inode *dir,
umode_t mode)
struct inode *bpf_get_inode(struct super_block *sb,
const struct inode *dir,
umode_t mode)
{
struct inode *inode;
@ -594,6 +595,136 @@ struct bpf_prog *bpf_prog_get_type_path(const char *name, enum bpf_prog_type typ
}
EXPORT_SYMBOL(bpf_prog_get_type_path);
struct bpffs_btf_enums {
const struct btf *btf;
const struct btf_type *cmd_t;
const struct btf_type *map_t;
const struct btf_type *prog_t;
const struct btf_type *attach_t;
};
static int find_bpffs_btf_enums(struct bpffs_btf_enums *info)
{
const struct btf *btf;
const struct btf_type *t;
const char *name;
int i, n;
memset(info, 0, sizeof(*info));
btf = bpf_get_btf_vmlinux();
if (IS_ERR(btf))
return PTR_ERR(btf);
if (!btf)
return -ENOENT;
info->btf = btf;
for (i = 1, n = btf_nr_types(btf); i < n; i++) {
t = btf_type_by_id(btf, i);
if (!btf_type_is_enum(t))
continue;
name = btf_name_by_offset(btf, t->name_off);
if (!name)
continue;
if (strcmp(name, "bpf_cmd") == 0)
info->cmd_t = t;
else if (strcmp(name, "bpf_map_type") == 0)
info->map_t = t;
else if (strcmp(name, "bpf_prog_type") == 0)
info->prog_t = t;
else if (strcmp(name, "bpf_attach_type") == 0)
info->attach_t = t;
else
continue;
if (info->cmd_t && info->map_t && info->prog_t && info->attach_t)
return 0;
}
return -ESRCH;
}
static bool find_btf_enum_const(const struct btf *btf, const struct btf_type *enum_t,
const char *prefix, const char *str, int *value)
{
const struct btf_enum *e;
const char *name;
int i, n, pfx_len = strlen(prefix);
*value = 0;
if (!btf || !enum_t)
return false;
for (i = 0, n = btf_vlen(enum_t); i < n; i++) {
e = &btf_enum(enum_t)[i];
name = btf_name_by_offset(btf, e->name_off);
if (!name || strncasecmp(name, prefix, pfx_len) != 0)
continue;
/* match symbolic name case insensitive and ignoring prefix */
if (strcasecmp(name + pfx_len, str) == 0) {
*value = e->val;
return true;
}
}
return false;
}
static void seq_print_delegate_opts(struct seq_file *m,
const char *opt_name,
const struct btf *btf,
const struct btf_type *enum_t,
const char *prefix,
u64 delegate_msk, u64 any_msk)
{
const struct btf_enum *e;
bool first = true;
const char *name;
u64 msk;
int i, n, pfx_len = strlen(prefix);
delegate_msk &= any_msk; /* clear unknown bits */
if (delegate_msk == 0)
return;
seq_printf(m, ",%s", opt_name);
if (delegate_msk == any_msk) {
seq_printf(m, "=any");
return;
}
if (btf && enum_t) {
for (i = 0, n = btf_vlen(enum_t); i < n; i++) {
e = &btf_enum(enum_t)[i];
name = btf_name_by_offset(btf, e->name_off);
if (!name || strncasecmp(name, prefix, pfx_len) != 0)
continue;
msk = 1ULL << e->val;
if (delegate_msk & msk) {
/* emit lower-case name without prefix */
seq_printf(m, "%c", first ? '=' : ':');
name += pfx_len;
while (*name) {
seq_printf(m, "%c", tolower(*name));
name++;
}
delegate_msk &= ~msk;
first = false;
}
}
}
if (delegate_msk)
seq_printf(m, "%c0x%llx", first ? '=' : ':', delegate_msk);
}
/*
* Display the mount options in /proc/mounts.
*/
@ -601,6 +732,8 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
{
struct inode *inode = d_inode(root);
umode_t mode = inode->i_mode & S_IALLUGO & ~S_ISVTX;
struct bpf_mount_opts *opts = root->d_sb->s_fs_info;
u64 mask;
if (!uid_eq(inode->i_uid, GLOBAL_ROOT_UID))
seq_printf(m, ",uid=%u",
@ -610,6 +743,35 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
from_kgid_munged(&init_user_ns, inode->i_gid));
if (mode != S_IRWXUGO)
seq_printf(m, ",mode=%o", mode);
if (opts->delegate_cmds || opts->delegate_maps ||
opts->delegate_progs || opts->delegate_attachs) {
struct bpffs_btf_enums info;
/* ignore errors, fallback to hex */
(void)find_bpffs_btf_enums(&info);
mask = (1ULL << __MAX_BPF_CMD) - 1;
seq_print_delegate_opts(m, "delegate_cmds",
info.btf, info.cmd_t, "BPF_",
opts->delegate_cmds, mask);
mask = (1ULL << __MAX_BPF_MAP_TYPE) - 1;
seq_print_delegate_opts(m, "delegate_maps",
info.btf, info.map_t, "BPF_MAP_TYPE_",
opts->delegate_maps, mask);
mask = (1ULL << __MAX_BPF_PROG_TYPE) - 1;
seq_print_delegate_opts(m, "delegate_progs",
info.btf, info.prog_t, "BPF_PROG_TYPE_",
opts->delegate_progs, mask);
mask = (1ULL << __MAX_BPF_ATTACH_TYPE) - 1;
seq_print_delegate_opts(m, "delegate_attachs",
info.btf, info.attach_t, "BPF_",
opts->delegate_attachs, mask);
}
return 0;
}
@ -624,7 +786,7 @@ static void bpf_free_inode(struct inode *inode)
free_inode_nonrcu(inode);
}
static const struct super_operations bpf_super_ops = {
const struct super_operations bpf_super_ops = {
.statfs = simple_statfs,
.drop_inode = generic_delete_inode,
.show_options = bpf_show_options,
@ -635,28 +797,30 @@ enum {
OPT_UID,
OPT_GID,
OPT_MODE,
OPT_DELEGATE_CMDS,
OPT_DELEGATE_MAPS,
OPT_DELEGATE_PROGS,
OPT_DELEGATE_ATTACHS,
};
static const struct fs_parameter_spec bpf_fs_parameters[] = {
fsparam_u32 ("uid", OPT_UID),
fsparam_u32 ("gid", OPT_GID),
fsparam_u32oct ("mode", OPT_MODE),
fsparam_string ("delegate_cmds", OPT_DELEGATE_CMDS),
fsparam_string ("delegate_maps", OPT_DELEGATE_MAPS),
fsparam_string ("delegate_progs", OPT_DELEGATE_PROGS),
fsparam_string ("delegate_attachs", OPT_DELEGATE_ATTACHS),
{}
};
struct bpf_mount_opts {
kuid_t uid;
kgid_t gid;
umode_t mode;
};
static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
{
struct bpf_mount_opts *opts = fc->fs_private;
struct bpf_mount_opts *opts = fc->s_fs_info;
struct fs_parse_result result;
kuid_t uid;
kgid_t gid;
int opt;
int opt, err;
opt = fs_parse(fc, bpf_fs_parameters, param, &result);
if (opt < 0) {
@ -708,6 +872,67 @@ static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
case OPT_MODE:
opts->mode = result.uint_32 & S_IALLUGO;
break;
case OPT_DELEGATE_CMDS:
case OPT_DELEGATE_MAPS:
case OPT_DELEGATE_PROGS:
case OPT_DELEGATE_ATTACHS: {
struct bpffs_btf_enums info;
const struct btf_type *enum_t;
const char *enum_pfx;
u64 *delegate_msk, msk = 0;
char *p;
int val;
/* ignore errors, fallback to hex */
(void)find_bpffs_btf_enums(&info);
switch (opt) {
case OPT_DELEGATE_CMDS:
delegate_msk = &opts->delegate_cmds;
enum_t = info.cmd_t;
enum_pfx = "BPF_";
break;
case OPT_DELEGATE_MAPS:
delegate_msk = &opts->delegate_maps;
enum_t = info.map_t;
enum_pfx = "BPF_MAP_TYPE_";
break;
case OPT_DELEGATE_PROGS:
delegate_msk = &opts->delegate_progs;
enum_t = info.prog_t;
enum_pfx = "BPF_PROG_TYPE_";
break;
case OPT_DELEGATE_ATTACHS:
delegate_msk = &opts->delegate_attachs;
enum_t = info.attach_t;
enum_pfx = "BPF_";
break;
default:
return -EINVAL;
}
while ((p = strsep(&param->string, ":"))) {
if (strcmp(p, "any") == 0) {
msk |= ~0ULL;
} else if (find_btf_enum_const(info.btf, enum_t, enum_pfx, p, &val)) {
msk |= 1ULL << val;
} else {
err = kstrtou64(p, 0, &msk);
if (err)
return err;
}
}
/* Setting delegation mount options requires privileges */
if (msk && !capable(CAP_SYS_ADMIN))
return -EPERM;
*delegate_msk |= msk;
break;
}
default:
/* ignore unknown mount options */
break;
}
return 0;
@ -784,10 +1009,14 @@ static int populate_bpffs(struct dentry *parent)
static int bpf_fill_super(struct super_block *sb, struct fs_context *fc)
{
static const struct tree_descr bpf_rfiles[] = { { "" } };
struct bpf_mount_opts *opts = fc->fs_private;
struct bpf_mount_opts *opts = sb->s_fs_info;
struct inode *inode;
int ret;
/* Mounting an instance of BPF FS requires privileges */
if (fc->user_ns != &init_user_ns && !capable(CAP_SYS_ADMIN))
return -EPERM;
ret = simple_fill_super(sb, BPF_FS_MAGIC, bpf_rfiles);
if (ret)
return ret;
@ -811,7 +1040,7 @@ static int bpf_get_tree(struct fs_context *fc)
static void bpf_free_fc(struct fs_context *fc)
{
kfree(fc->fs_private);
kfree(fc->s_fs_info);
}
static const struct fs_context_operations bpf_context_ops = {
@ -835,17 +1064,32 @@ static int bpf_init_fs_context(struct fs_context *fc)
opts->uid = current_fsuid();
opts->gid = current_fsgid();
fc->fs_private = opts;
/* start out with no BPF token delegation enabled */
opts->delegate_cmds = 0;
opts->delegate_maps = 0;
opts->delegate_progs = 0;
opts->delegate_attachs = 0;
fc->s_fs_info = opts;
fc->ops = &bpf_context_ops;
return 0;
}
static void bpf_kill_super(struct super_block *sb)
{
struct bpf_mount_opts *opts = sb->s_fs_info;
kill_litter_super(sb);
kfree(opts);
}
static struct file_system_type bpf_fs_type = {
.owner = THIS_MODULE,
.name = "bpf",
.init_fs_context = bpf_init_fs_context,
.parameters = bpf_fs_parameters,
.kill_sb = kill_litter_super,
.kill_sb = bpf_kill_super,
.fs_flags = FS_USERNS_MOUNT,
};
static int __init bpf_init(void)
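
Reviewer note: the delegate_* mount options above accept a colon-separated list in which each element is "any", a symbolic enum name (matched case-insensitively with its prefix stripped), or a number. The userspace sketch below illustrates that grammar only. `struct bit_name`, `name_eq()` and `parse_delegate_spec()` are hypothetical, the name table stands in for the vmlinux BTF enum lookup, and strtoull() stands in for kstrtou64(). The sketch ORs a numeric element into the accumulated mask, which is an assumption about the intended semantics; the hunk passes `&msk` to kstrtou64() directly.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the BTF enum lookup: name -> bit index. */
struct bit_name {
	const char *name;
	int bit;
};

/* Case-insensitive string equality, written out to stay within ISO C. */
static int name_eq(const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	return *a == *b;
}

/*
 * Parse e.g. "prog_load:map_create:0x100" into a bitmask and OR it into
 * *mask. Modifies spec in place. Returns 0 or a negative errno value.
 */
static int parse_delegate_spec(char *spec, const struct bit_name *names,
			       size_t nr_names, uint64_t *mask)
{
	uint64_t msk = 0;
	char *p = spec;

	assert(spec != NULL && mask != NULL);

	while (p) {
		char *next = strchr(p, ':');
		size_t i;

		if (next)
			*next++ = '\0';

		if (strcmp(p, "any") == 0) {
			msk |= ~0ULL;
		} else {
			for (i = 0; i < nr_names; i++)
				if (name_eq(p, names[i].name))
					break;
			if (i < nr_names) {
				msk |= 1ULL << names[i].bit;
			} else {
				char *end;
				unsigned long long v;

				/* fall back to a raw number (hex, octal or decimal) */
				errno = 0;
				v = strtoull(p, &end, 0);
				if (errno || end == p || *end != '\0')
					return -EINVAL;
				msk |= v;
			}
		}
		p = next;
	}

	*mask |= msk;
	return 0;
}
```

Unknown names fail the parse instead of being silently dropped, so a typo in a mount option does not quietly grant less (or more) than intended.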


@ -1011,8 +1011,8 @@ int map_check_no_btf(const struct bpf_map *map,
return -ENOTSUPP;
}
static int map_check_btf(struct bpf_map *map, const struct btf *btf,
u32 btf_key_id, u32 btf_value_id)
static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
const struct btf *btf, u32 btf_key_id, u32 btf_value_id)
{
const struct btf_type *key_type, *value_type;
u32 key_size, value_size;
@ -1040,7 +1040,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
if (!IS_ERR_OR_NULL(map->record)) {
int i;
if (!bpf_capable()) {
if (!bpf_token_capable(token, CAP_BPF)) {
ret = -EPERM;
goto free_map_tab;
}
@ -1123,14 +1123,21 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
return ret;
}
#define BPF_MAP_CREATE_LAST_FIELD map_extra
static bool bpf_net_capable(void)
{
return capable(CAP_NET_ADMIN) || capable(CAP_SYS_ADMIN);
}
#define BPF_MAP_CREATE_LAST_FIELD map_token_fd
/* called via syscall */
static int map_create(union bpf_attr *attr)
{
const struct bpf_map_ops *ops;
struct bpf_token *token = NULL;
int numa_node = bpf_map_attr_numa_node(attr);
u32 map_type = attr->map_type;
struct bpf_map *map;
bool token_flag;
int f_flags;
int err;
@ -1138,6 +1145,12 @@ static int map_create(union bpf_attr *attr)
if (err)
return -EINVAL;
/* check BPF_F_TOKEN_FD flag, remember if it's set, and then clear it
* to avoid per-map type checks tripping on unknown flag
*/
token_flag = attr->map_flags & BPF_F_TOKEN_FD;
attr->map_flags &= ~BPF_F_TOKEN_FD;
if (attr->btf_vmlinux_value_type_id) {
if (attr->map_type != BPF_MAP_TYPE_STRUCT_OPS ||
attr->btf_key_type_id || attr->btf_value_type_id)
@ -1178,14 +1191,32 @@ static int map_create(union bpf_attr *attr)
if (!ops->map_mem_usage)
return -EINVAL;
if (token_flag) {
token = bpf_token_get_from_fd(attr->map_token_fd);
if (IS_ERR(token))
return PTR_ERR(token);
/* if current token doesn't grant map creation permissions,
* then we can't use this token, so ignore it and rely on
* system-wide capabilities checks
*/
if (!bpf_token_allow_cmd(token, BPF_MAP_CREATE) ||
!bpf_token_allow_map_type(token, attr->map_type)) {
bpf_token_put(token);
token = NULL;
}
}
err = -EPERM;
/* Intent here is for unprivileged_bpf_disabled to block BPF map
* creation for unprivileged users; other actions depend
* on fd availability and access to bpffs, so are dependent on
* object creation success. Even with unprivileged BPF disabled,
* capability checks are still carried out.
*/
if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
return -EPERM;
if (sysctl_unprivileged_bpf_disabled && !bpf_token_capable(token, CAP_BPF))
goto put_token;
/* check privileged map type permissions */
switch (map_type) {
@ -1218,25 +1249,27 @@ static int map_create(union bpf_attr *attr)
case BPF_MAP_TYPE_LRU_PERCPU_HASH:
case BPF_MAP_TYPE_STRUCT_OPS:
case BPF_MAP_TYPE_CPUMAP:
if (!bpf_capable())
return -EPERM;
if (!bpf_token_capable(token, CAP_BPF))
goto put_token;
break;
case BPF_MAP_TYPE_SOCKMAP:
case BPF_MAP_TYPE_SOCKHASH:
case BPF_MAP_TYPE_DEVMAP:
case BPF_MAP_TYPE_DEVMAP_HASH:
case BPF_MAP_TYPE_XSKMAP:
if (!capable(CAP_NET_ADMIN))
return -EPERM;
if (!bpf_token_capable(token, CAP_NET_ADMIN))
goto put_token;
break;
default:
WARN(1, "unsupported map type %d", map_type);
return -EPERM;
goto put_token;
}
map = ops->map_alloc(attr);
if (IS_ERR(map))
return PTR_ERR(map);
if (IS_ERR(map)) {
err = PTR_ERR(map);
goto put_token;
}
map->ops = ops;
map->map_type = map_type;
@ -1273,7 +1306,7 @@ static int map_create(union bpf_attr *attr)
map->btf = btf;
if (attr->btf_value_type_id) {
err = map_check_btf(map, btf, attr->btf_key_type_id,
err = map_check_btf(map, token, btf, attr->btf_key_type_id,
attr->btf_value_type_id);
if (err)
goto free_map;
@ -1285,15 +1318,16 @@ static int map_create(union bpf_attr *attr)
attr->btf_vmlinux_value_type_id;
}
err = security_bpf_map_alloc(map);
err = security_bpf_map_create(map, attr, token);
if (err)
goto free_map;
goto free_map_sec;
err = bpf_map_alloc_id(map);
if (err)
goto free_map_sec;
bpf_map_save_memcg(map);
bpf_token_put(token);
err = bpf_map_new_fd(map, f_flags);
if (err < 0) {
@ -1314,6 +1348,8 @@ static int map_create(union bpf_attr *attr)
free_map:
btf_put(map->btf);
map->ops->map_free(map);
put_token:
bpf_token_put(token);
return err;
}
@ -2144,7 +2180,7 @@ static void __bpf_prog_put_rcu(struct rcu_head *rcu)
kvfree(aux->func_info);
kfree(aux->func_info_aux);
free_uid(aux->user);
security_bpf_prog_free(aux);
security_bpf_prog_free(aux->prog);
bpf_prog_free(aux->prog);
}
@ -2590,13 +2626,15 @@ static bool is_perfmon_prog_type(enum bpf_prog_type prog_type)
}
/* last field in 'union bpf_attr' used by this command */
#define BPF_PROG_LOAD_LAST_FIELD log_true_size
#define BPF_PROG_LOAD_LAST_FIELD prog_token_fd
static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
{
enum bpf_prog_type type = attr->prog_type;
struct bpf_prog *prog, *dst_prog = NULL;
struct btf *attach_btf = NULL;
struct bpf_token *token = NULL;
bool bpf_cap;
int err;
char license[128];
@ -2610,13 +2648,35 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
BPF_F_TEST_RND_HI32 |
BPF_F_XDP_HAS_FRAGS |
BPF_F_XDP_DEV_BOUND_ONLY |
BPF_F_TEST_REG_INVARIANTS))
BPF_F_TEST_REG_INVARIANTS |
BPF_F_TOKEN_FD))
return -EINVAL;
bpf_prog_load_fixup_attach_type(attr);
if (attr->prog_flags & BPF_F_TOKEN_FD) {
token = bpf_token_get_from_fd(attr->prog_token_fd);
if (IS_ERR(token))
return PTR_ERR(token);
/* if current token doesn't grant prog loading permissions,
* then we can't use this token, so ignore it and rely on
* system-wide capabilities checks
*/
if (!bpf_token_allow_cmd(token, BPF_PROG_LOAD) ||
!bpf_token_allow_prog_type(token, attr->prog_type,
attr->expected_attach_type)) {
bpf_token_put(token);
token = NULL;
}
}
bpf_cap = bpf_token_capable(token, CAP_BPF);
err = -EPERM;
if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
(attr->prog_flags & BPF_F_ANY_ALIGNMENT) &&
!bpf_capable())
return -EPERM;
!bpf_cap)
goto put_token;
/* Intent here is for unprivileged_bpf_disabled to block BPF program
* creation for unprivileged users; other actions depend
@ -2625,21 +2685,23 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
* capability checks are still carried out for these
* and other operations.
*/
if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
return -EPERM;
if (sysctl_unprivileged_bpf_disabled && !bpf_cap)
goto put_token;
if (attr->insn_cnt == 0 ||
attr->insn_cnt > (bpf_capable() ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS))
return -E2BIG;
attr->insn_cnt > (bpf_cap ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS)) {
err = -E2BIG;
goto put_token;
}
if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
type != BPF_PROG_TYPE_CGROUP_SKB &&
!bpf_capable())
return -EPERM;
!bpf_cap)
goto put_token;
if (is_net_admin_prog_type(type) && !capable(CAP_NET_ADMIN) && !capable(CAP_SYS_ADMIN))
return -EPERM;
if (is_perfmon_prog_type(type) && !perfmon_capable())
return -EPERM;
if (is_net_admin_prog_type(type) && !bpf_token_capable(token, CAP_NET_ADMIN))
goto put_token;
if (is_perfmon_prog_type(type) && !bpf_token_capable(token, CAP_PERFMON))
goto put_token;
/* attach_prog_fd/attach_btf_obj_fd can specify fd of either bpf_prog
* or btf, we need to check which one it is
@ -2649,27 +2711,33 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
if (IS_ERR(dst_prog)) {
dst_prog = NULL;
attach_btf = btf_get_by_fd(attr->attach_btf_obj_fd);
if (IS_ERR(attach_btf))
return -EINVAL;
if (IS_ERR(attach_btf)) {
err = -EINVAL;
goto put_token;
}
if (!btf_is_kernel(attach_btf)) {
/* attaching through specifying bpf_prog's BTF
* objects directly might be supported eventually
*/
btf_put(attach_btf);
return -ENOTSUPP;
err = -ENOTSUPP;
goto put_token;
}
}
} else if (attr->attach_btf_id) {
/* fall back to vmlinux BTF, if BTF type ID is specified */
attach_btf = bpf_get_btf_vmlinux();
if (IS_ERR(attach_btf))
return PTR_ERR(attach_btf);
if (!attach_btf)
return -EINVAL;
if (IS_ERR(attach_btf)) {
err = PTR_ERR(attach_btf);
goto put_token;
}
if (!attach_btf) {
err = -EINVAL;
goto put_token;
}
btf_get(attach_btf);
}
bpf_prog_load_fixup_attach_type(attr);
if (bpf_prog_load_check_attach(type, attr->expected_attach_type,
attach_btf, attr->attach_btf_id,
dst_prog)) {
@ -2677,7 +2745,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
bpf_prog_put(dst_prog);
if (attach_btf)
btf_put(attach_btf);
return -EINVAL;
err = -EINVAL;
goto put_token;
}
/* plain bpf_prog allocation */
@ -2687,7 +2756,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
bpf_prog_put(dst_prog);
if (attach_btf)
btf_put(attach_btf);
return -ENOMEM;
err = -EINVAL;
goto put_token;
}
prog->expected_attach_type = attr->expected_attach_type;
@ -2698,9 +2768,9 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE;
prog->aux->xdp_has_frags = attr->prog_flags & BPF_F_XDP_HAS_FRAGS;
err = security_bpf_prog_alloc(prog->aux);
if (err)
goto free_prog;
/* move token into prog->aux, reuse taken refcnt */
prog->aux->token = token;
token = NULL;
prog->aux->user = get_current_user();
prog->len = attr->insn_cnt;
@ -2709,12 +2779,12 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
if (copy_from_bpfptr(prog->insns,
make_bpfptr(attr->insns, uattr.is_kernel),
bpf_prog_insn_size(prog)) != 0)
goto free_prog_sec;
goto free_prog;
/* copy eBPF program license from user space */
if (strncpy_from_bpfptr(license,
make_bpfptr(attr->license, uattr.is_kernel),
sizeof(license) - 1) < 0)
goto free_prog_sec;
goto free_prog;
license[sizeof(license) - 1] = 0;
/* eBPF programs must be GPL compatible to use GPL-ed functions */
@ -2728,14 +2798,14 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
if (bpf_prog_is_dev_bound(prog->aux)) {
err = bpf_prog_dev_bound_init(prog, attr);
if (err)
goto free_prog_sec;
goto free_prog;
}
if (type == BPF_PROG_TYPE_EXT && dst_prog &&
bpf_prog_is_dev_bound(dst_prog->aux)) {
err = bpf_prog_dev_bound_inherit(prog, dst_prog);
if (err)
goto free_prog_sec;
goto free_prog;
}
/*
@ -2757,12 +2827,16 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
/* find program type: socket_filter vs tracing_filter */
err = find_prog_type(type, prog);
if (err < 0)
goto free_prog_sec;
goto free_prog;
prog->aux->load_time = ktime_get_boottime_ns();
err = bpf_obj_name_cpy(prog->aux->name, attr->prog_name,
sizeof(attr->prog_name));
if (err < 0)
goto free_prog;
err = security_bpf_prog_load(prog, attr, token);
if (err)
goto free_prog_sec;
/* run eBPF verifier */
@ -2808,13 +2882,16 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
*/
__bpf_prog_put_noref(prog, prog->aux->real_func_cnt);
return err;
free_prog_sec:
free_uid(prog->aux->user);
security_bpf_prog_free(prog->aux);
security_bpf_prog_free(prog);
free_prog:
free_uid(prog->aux->user);
if (prog->aux->attach_btf)
btf_put(prog->aux->attach_btf);
bpf_prog_free(prog);
put_token:
bpf_token_put(token);
return err;
}
@ -3501,6 +3578,7 @@ static int bpf_perf_link_fill_kprobe(const struct perf_event *event,
if (!kallsyms_show_value(current_cred()))
addr = 0;
info->perf_event.kprobe.addr = addr;
info->perf_event.kprobe.cookie = event->bpf_cookie;
return 0;
}
#endif
@ -3526,6 +3604,7 @@ static int bpf_perf_link_fill_uprobe(const struct perf_event *event,
else
info->perf_event.type = BPF_PERF_EVENT_UPROBE;
info->perf_event.uprobe.offset = offset;
info->perf_event.uprobe.cookie = event->bpf_cookie;
return 0;
}
#endif
@ -3553,6 +3632,7 @@ static int bpf_perf_link_fill_tracepoint(const struct perf_event *event,
uname = u64_to_user_ptr(info->perf_event.tracepoint.tp_name);
ulen = info->perf_event.tracepoint.name_len;
info->perf_event.type = BPF_PERF_EVENT_TRACEPOINT;
info->perf_event.tracepoint.cookie = event->bpf_cookie;
return bpf_perf_link_fill_common(event, uname, ulen, NULL, NULL, NULL, NULL);
}
@ -3561,6 +3641,7 @@ static int bpf_perf_link_fill_perf_event(const struct perf_event *event,
{
info->perf_event.event.type = event->attr.type;
info->perf_event.event.config = event->attr.config;
info->perf_event.event.cookie = event->bpf_cookie;
info->perf_event.type = BPF_PERF_EVENT_EVENT;
return 0;
}
@ -3818,7 +3899,7 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
case BPF_PROG_TYPE_SK_LOOKUP:
return attach_type == prog->expected_attach_type ? 0 : -EINVAL;
case BPF_PROG_TYPE_CGROUP_SKB:
if (!capable(CAP_NET_ADMIN))
if (!bpf_token_capable(prog->aux->token, CAP_NET_ADMIN))
/* cg-skb progs can be loaded by unpriv user.
* check permissions at attach time.
*/
@ -4021,7 +4102,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
static int bpf_prog_query(const union bpf_attr *attr,
union bpf_attr __user *uattr)
{
if (!capable(CAP_NET_ADMIN))
if (!bpf_net_capable())
return -EPERM;
if (CHECK_ATTR(BPF_PROG_QUERY))
return -EINVAL;
@ -4687,6 +4768,8 @@ static int bpf_map_get_info_by_fd(struct file *file,
info.btf_value_type_id = map->btf_value_type_id;
}
info.btf_vmlinux_value_type_id = map->btf_vmlinux_value_type_id;
if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS)
bpf_map_struct_ops_info_fill(&info, map);
if (bpf_map_is_offloaded(map)) {
err = bpf_map_offload_info_fill(&info, map);
@ -4789,15 +4872,34 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
return err;
}
#define BPF_BTF_LOAD_LAST_FIELD btf_log_true_size
#define BPF_BTF_LOAD_LAST_FIELD btf_token_fd
static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
{
struct bpf_token *token = NULL;
if (CHECK_ATTR(BPF_BTF_LOAD))
return -EINVAL;
if (!bpf_capable())
if (attr->btf_flags & ~BPF_F_TOKEN_FD)
return -EINVAL;
if (attr->btf_flags & BPF_F_TOKEN_FD) {
token = bpf_token_get_from_fd(attr->btf_token_fd);
if (IS_ERR(token))
return PTR_ERR(token);
if (!bpf_token_allow_cmd(token, BPF_BTF_LOAD)) {
bpf_token_put(token);
token = NULL;
}
}
if (!bpf_token_capable(token, CAP_BPF)) {
bpf_token_put(token);
return -EPERM;
}
bpf_token_put(token);
return btf_new_fd(attr, uattr, uattr_size);
}
@ -5415,6 +5517,20 @@ static int bpf_prog_bind_map(union bpf_attr *attr)
return ret;
}
#define BPF_TOKEN_CREATE_LAST_FIELD token_create.bpffs_fd
static int token_create(union bpf_attr *attr)
{
if (CHECK_ATTR(BPF_TOKEN_CREATE))
return -EINVAL;
/* no flags are supported yet */
if (attr->token_create.flags)
return -EINVAL;
return bpf_token_create(attr);
}
static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
{
union bpf_attr attr;
@ -5548,6 +5664,9 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
case BPF_PROG_BIND_MAP:
err = bpf_prog_bind_map(&attr);
break;
case BPF_TOKEN_CREATE:
err = token_create(&attr);
break;
default:
err = -EINVAL;
break;
@ -5654,7 +5773,7 @@ static const struct bpf_func_proto bpf_sys_bpf_proto = {
const struct bpf_func_proto * __weak
tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
return bpf_base_func_proto(func_id);
return bpf_base_func_proto(func_id, prog);
}
BPF_CALL_1(bpf_sys_close, u32, fd)
@ -5704,7 +5823,8 @@ syscall_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
switch (func_id) {
case BPF_FUNC_sys_bpf:
return !perfmon_capable() ? NULL : &bpf_sys_bpf_proto;
return !bpf_token_capable(prog->aux->token, CAP_PERFMON)
? NULL : &bpf_sys_bpf_proto;
case BPF_FUNC_btf_find_by_name_kind:
return &bpf_btf_find_by_name_kind_proto;
case BPF_FUNC_sys_close:
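
Reviewer note: map_create(), bpf_prog_load() and bpf_btf_load() above share one pattern. A token that does not delegate the requested command is dropped, and the capability check then falls back to the system-wide (init user namespace) answer. The sketch below models just that fallback. `struct token_model`, `token_for_cmd()` and `cmd_permitted()` are hypothetical, the `allowed_cmds` bit test is an assumption about how bpf_token_allow_cmd() behaves, and the LSM hook is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a BPF token for one capability. */
struct token_model {
	uint64_t allowed_cmds; /* bit N set => command N is delegated */
	bool userns_capable;   /* stand-in for ns_capable(token->userns, cap) */
};

/* Return the token if it covers cmd, else NULL (ignore the token). */
static const struct token_model *
token_for_cmd(const struct token_model *token, unsigned int cmd)
{
	assert(cmd < 64);
	if (token && !(token->allowed_cmds & (1ULL << cmd)))
		return NULL;
	return token;
}

/* With a usable token, check its userns; otherwise use the init-ns answer. */
static bool cmd_permitted(const struct token_model *token, unsigned int cmd,
			  bool init_ns_capable)
{
	token = token_for_cmd(token, cmd);
	return token ? token->userns_capable : init_ns_capable;
}
```

A token that does not cover the command is therefore not an error in this model. The caller is simply treated as if no token had been passed.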

kernel/bpf/token.c (new file)

@ -0,0 +1,278 @@
#include <linux/bpf.h>
#include <linux/vmalloc.h>
#include <linux/fdtable.h>
#include <linux/file.h>
#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/idr.h>
#include <linux/namei.h>
#include <linux/user_namespace.h>
#include <linux/security.h>
static bool bpf_ns_capable(struct user_namespace *ns, int cap)
{
return ns_capable(ns, cap) || (cap != CAP_SYS_ADMIN && ns_capable(ns, CAP_SYS_ADMIN));
}
bool bpf_token_capable(const struct bpf_token *token, int cap)
{
struct user_namespace *userns;
/* BPF token allows ns_capable() level of capabilities */
userns = token ? token->userns : &init_user_ns;
if (!bpf_ns_capable(userns, cap))
return false;
if (token && security_bpf_token_capable(token, cap) < 0)
return false;
return true;
}
void bpf_token_inc(struct bpf_token *token)
{
atomic64_inc(&token->refcnt);
}
static void bpf_token_free(struct bpf_token *token)
{
security_bpf_token_free(token);
put_user_ns(token->userns);
kfree(token);
}
static void bpf_token_put_deferred(struct work_struct *work)
{
struct bpf_token *token = container_of(work, struct bpf_token, work);
bpf_token_free(token);
}
void bpf_token_put(struct bpf_token *token)
{
if (!token)
return;
if (!atomic64_dec_and_test(&token->refcnt))
return;
INIT_WORK(&token->work, bpf_token_put_deferred);
schedule_work(&token->work);
}
static int bpf_token_release(struct inode *inode, struct file *filp)
{
struct bpf_token *token = filp->private_data;
bpf_token_put(token);
return 0;
}
static void bpf_token_show_fdinfo(struct seq_file *m, struct file *filp)
{
struct bpf_token *token = filp->private_data;
u64 mask;
BUILD_BUG_ON(__MAX_BPF_CMD >= 64);
mask = (1ULL << __MAX_BPF_CMD) - 1;
if ((token->allowed_cmds & mask) == mask)
seq_printf(m, "allowed_cmds:\tany\n");
else
seq_printf(m, "allowed_cmds:\t0x%llx\n", token->allowed_cmds);
BUILD_BUG_ON(__MAX_BPF_MAP_TYPE >= 64);
mask = (1ULL << __MAX_BPF_MAP_TYPE) - 1;
if ((token->allowed_maps & mask) == mask)
seq_printf(m, "allowed_maps:\tany\n");
else
seq_printf(m, "allowed_maps:\t0x%llx\n", token->allowed_maps);
BUILD_BUG_ON(__MAX_BPF_PROG_TYPE >= 64);
mask = (1ULL << __MAX_BPF_PROG_TYPE) - 1;
if ((token->allowed_progs & mask) == mask)
seq_printf(m, "allowed_progs:\tany\n");
else
seq_printf(m, "allowed_progs:\t0x%llx\n", token->allowed_progs);
BUILD_BUG_ON(__MAX_BPF_ATTACH_TYPE >= 64);
mask = (1ULL << __MAX_BPF_ATTACH_TYPE) - 1;
if ((token->allowed_attachs & mask) == mask)
seq_printf(m, "allowed_attachs:\tany\n");
else
seq_printf(m, "allowed_attachs:\t0x%llx\n", token->allowed_attachs);
}
#define BPF_TOKEN_INODE_NAME "bpf-token"
static const struct inode_operations bpf_token_iops = { };
static const struct file_operations bpf_token_fops = {
.release = bpf_token_release,
.show_fdinfo = bpf_token_show_fdinfo,
};
int bpf_token_create(union bpf_attr *attr)
{
struct bpf_mount_opts *mnt_opts;
struct bpf_token *token = NULL;
struct user_namespace *userns;
struct inode *inode;
struct file *file;
struct path path;
struct fd f;
umode_t mode;
int err, fd;
f = fdget(attr->token_create.bpffs_fd);
if (!f.file)
return -EBADF;
path = f.file->f_path;
path_get(&path);
fdput(f);
if (path.dentry != path.mnt->mnt_sb->s_root) {
err = -EINVAL;
goto out_path;
}
if (path.mnt->mnt_sb->s_op != &bpf_super_ops) {
err = -EINVAL;
goto out_path;
}
err = path_permission(&path, MAY_ACCESS);
if (err)
goto out_path;
userns = path.dentry->d_sb->s_user_ns;
/*
* Enforce that creators of BPF tokens are in the same user
* namespace as the BPF FS instance. This makes reasoning about
* permissions a lot easier and we can always relax this later.
*/
if (current_user_ns() != userns) {
err = -EPERM;
goto out_path;
}
if (!ns_capable(userns, CAP_BPF)) {
err = -EPERM;
goto out_path;
}
/* Creating BPF token in init_user_ns doesn't make much sense. */
if (current_user_ns() == &init_user_ns) {
err = -EOPNOTSUPP;
goto out_path;
}
mnt_opts = path.dentry->d_sb->s_fs_info;
if (mnt_opts->delegate_cmds == 0 &&
mnt_opts->delegate_maps == 0 &&
mnt_opts->delegate_progs == 0 &&
mnt_opts->delegate_attachs == 0) {
err = -ENOENT; /* no BPF token delegation is set up */
goto out_path;
}
mode = S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask());
inode = bpf_get_inode(path.mnt->mnt_sb, NULL, mode);
if (IS_ERR(inode)) {
err = PTR_ERR(inode);
goto out_path;
}
inode->i_op = &bpf_token_iops;
inode->i_fop = &bpf_token_fops;
clear_nlink(inode); /* make sure it is unlinked */
file = alloc_file_pseudo(inode, path.mnt, BPF_TOKEN_INODE_NAME, O_RDWR, &bpf_token_fops);
if (IS_ERR(file)) {
iput(inode);
err = PTR_ERR(file);
goto out_path;
}
token = kzalloc(sizeof(*token), GFP_USER);
if (!token) {
err = -ENOMEM;
goto out_file;
}
atomic64_set(&token->refcnt, 1);
/* remember bpffs owning userns for future ns_capable() checks */
token->userns = get_user_ns(userns);
token->allowed_cmds = mnt_opts->delegate_cmds;
token->allowed_maps = mnt_opts->delegate_maps;
token->allowed_progs = mnt_opts->delegate_progs;
token->allowed_attachs = mnt_opts->delegate_attachs;
err = security_bpf_token_create(token, attr, &path);
if (err)
goto out_token;
fd = get_unused_fd_flags(O_CLOEXEC);
if (fd < 0) {
err = fd;
goto out_token;
}
file->private_data = token;
fd_install(fd, file);
path_put(&path);
return fd;
out_token:
bpf_token_free(token);
out_file:
fput(file);
out_path:
path_put(&path);
return err;
}
struct bpf_token *bpf_token_get_from_fd(u32 ufd)
{
struct fd f = fdget(ufd);
struct bpf_token *token;
if (!f.file)
return ERR_PTR(-EBADF);
if (f.file->f_op != &bpf_token_fops) {
fdput(f);
return ERR_PTR(-EINVAL);
}
token = f.file->private_data;
bpf_token_inc(token);
fdput(f);
return token;
}
bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
{
if (!token)
return false;
if (!(token->allowed_cmds & (1ULL << cmd)))
return false;
return security_bpf_token_cmd(token, cmd) == 0;
}
bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type type)
{
if (!token || type >= __MAX_BPF_MAP_TYPE)
return false;
return token->allowed_maps & (1ULL << type);
}
bool bpf_token_allow_prog_type(const struct bpf_token *token,
enum bpf_prog_type prog_type,
enum bpf_attach_type attach_type)
{
if (!token || prog_type >= __MAX_BPF_PROG_TYPE || attach_type >= __MAX_BPF_ATTACH_TYPE)
return false;
return (token->allowed_progs & (1ULL << prog_type)) &&
(token->allowed_attachs & (1ULL << attach_type));
}
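
The bpf_token_allow_map_type() and bpf_token_allow_prog_type() helpers above share one pattern: reject a missing token or an out-of-range enum value, then test a single bit in a 64-bit delegation mask. A minimal userspace sketch of that pattern follows. The names demo_token, demo_allow_map_type(), demo_allows_any_map() and the limit DEMO_MAX_MAP_TYPE are hypothetical and exist only for illustration; the kernel uses struct bpf_token and __MAX_BPF_MAP_TYPE, and bpf_token_allow_cmd() additionally consults an LSM hook that is not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative limit only; NOT the kernel's __MAX_BPF_MAP_TYPE. */
#define DEMO_MAX_MAP_TYPE 34

/* Hypothetical stand-in for struct bpf_token: just one delegation mask. */
struct demo_token {
	uint64_t allowed_maps;
};

/* Mirrors the shape of bpf_token_allow_map_type(): no token or an
 * out-of-range type is denied, otherwise bit 'type' decides.
 */
static bool demo_allow_map_type(const struct demo_token *token, unsigned int type)
{
	if (!token || type >= DEMO_MAX_MAP_TYPE)
		return false;
	return (token->allowed_maps & (1ULL << type)) != 0;
}

/* Mirrors the fdinfo logic: "any" is reported when every bit below the
 * limit is set in the mask.
 */
static bool demo_allows_any_map(const struct demo_token *token)
{
	uint64_t mask = (1ULL << DEMO_MAX_MAP_TYPE) - 1;

	return (token->allowed_maps & mask) == mask;
}
```

The range check before the shift matters: shifting 1ULL by 64 or more is undefined in C, which is presumably also why the fdinfo code guards each limit with BUILD_BUG_ON(... >= 64).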


@ -4403,6 +4403,18 @@ static bool __is_pointer_value(bool allow_ptr_leaks,
return reg->type != SCALAR_VALUE;
}
static void assign_scalar_id_before_mov(struct bpf_verifier_env *env,
struct bpf_reg_state *src_reg)
{
if (src_reg->type == SCALAR_VALUE && !src_reg->id &&
!tnum_is_const(src_reg->var_off))
/* Ensure that src_reg has a valid ID that will be copied to
* dst_reg and then will be used by find_equal_scalars() to
* propagate min/max range.
*/
src_reg->id = ++env->id_gen;
}
/* Copy src state preserving dst->parent and dst->live fields */
static void copy_register_state(struct bpf_reg_state *dst, const struct bpf_reg_state *src)
{
@ -4438,6 +4450,11 @@ static bool is_bpf_st_mem(struct bpf_insn *insn)
return BPF_CLASS(insn->code) == BPF_ST && BPF_MODE(insn->code) == BPF_MEM;
}
static int get_reg_width(struct bpf_reg_state *reg)
{
return fls64(reg->umax_value);
}
/* check_stack_{read,write}_fixed_off functions track spill/fill of registers,
* stack boundary and alignment are checked in check_mem_access()
*/
@ -4488,12 +4505,18 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
mark_stack_slot_scratched(env, spi);
if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) && env->bpf_capable) {
bool reg_value_fits;
reg_value_fits = get_reg_width(reg) <= BITS_PER_BYTE * size;
/* Make sure that reg had an ID to build a relation on spill. */
if (reg_value_fits)
assign_scalar_id_before_mov(env, reg);
save_register_state(env, state, spi, reg, size);
/* Break the relation on a narrowing spill. */
if (fls64(reg->umax_value) > BITS_PER_BYTE * size)
if (!reg_value_fits)
state->stack[spi].spilled_ptr.id = 0;
} else if (!reg && !(off % BPF_REG_SIZE) && is_bpf_st_mem(insn) &&
insn->imm != 0 && env->bpf_capable) {
env->bpf_capable) {
struct bpf_reg_state fake_reg = {};
__mark_reg_known(&fake_reg, insn->imm);
@ -4640,7 +4663,20 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
return -EINVAL;
}
/* Erase all spilled pointers. */
/* If writing_zero and the spi slot contains a spill of value 0,
* maintain the spill type.
*/
if (writing_zero && *stype == STACK_SPILL &&
is_spilled_scalar_reg(&state->stack[spi])) {
struct bpf_reg_state *spill_reg = &state->stack[spi].spilled_ptr;
if (tnum_is_const(spill_reg->var_off) && spill_reg->var_off.value == 0) {
zero_used = true;
continue;
}
}
/* Erase all other spilled pointers. */
state->stack[spi].spilled_ptr.type = NOT_INIT;
/* Update the slot type. */
@ -12826,6 +12862,19 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
}
switch (base_type(ptr_reg->type)) {
case PTR_TO_CTX:
case PTR_TO_MAP_VALUE:
case PTR_TO_MAP_KEY:
case PTR_TO_STACK:
case PTR_TO_PACKET_META:
case PTR_TO_PACKET:
case PTR_TO_TP_BUFFER:
case PTR_TO_BTF_ID:
case PTR_TO_MEM:
case PTR_TO_BUF:
case PTR_TO_FUNC:
case CONST_PTR_TO_DYNPTR:
break;
case PTR_TO_FLOW_KEYS:
if (known)
break;
@ -12835,16 +12884,10 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
if (known && smin_val == 0 && opcode == BPF_ADD)
break;
fallthrough;
case PTR_TO_PACKET_END:
case PTR_TO_SOCKET:
case PTR_TO_SOCK_COMMON:
case PTR_TO_TCP_SOCK:
case PTR_TO_XDP_SOCK:
default:
verbose(env, "R%d pointer arithmetic on %s prohibited\n",
dst, reg_type_str(env, ptr_reg->type));
return -EACCES;
default:
break;
}
/* In case of 'scalar += pointer', dst_reg inherits pointer type and id.
@ -13905,20 +13948,13 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
if (BPF_SRC(insn->code) == BPF_X) {
struct bpf_reg_state *src_reg = regs + insn->src_reg;
struct bpf_reg_state *dst_reg = regs + insn->dst_reg;
bool need_id = src_reg->type == SCALAR_VALUE && !src_reg->id &&
!tnum_is_const(src_reg->var_off);
if (BPF_CLASS(insn->code) == BPF_ALU64) {
if (insn->off == 0) {
/* case: R1 = R2
* copy register state to dest reg
*/
if (need_id)
/* Assign src and dst registers the same ID
* that will be used by find_equal_scalars()
* to propagate min/max range.
*/
src_reg->id = ++env->id_gen;
assign_scalar_id_before_mov(env, src_reg);
copy_register_state(dst_reg, src_reg);
dst_reg->live |= REG_LIVE_WRITTEN;
dst_reg->subreg_def = DEF_NOT_SUBREG;
@ -13933,8 +13969,8 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
bool no_sext;
no_sext = src_reg->umax_value < (1ULL << (insn->off - 1));
if (no_sext && need_id)
src_reg->id = ++env->id_gen;
if (no_sext)
assign_scalar_id_before_mov(env, src_reg);
copy_register_state(dst_reg, src_reg);
if (!no_sext)
dst_reg->id = 0;
@ -13954,10 +13990,10 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
return -EACCES;
} else if (src_reg->type == SCALAR_VALUE) {
if (insn->off == 0) {
bool is_src_reg_u32 = src_reg->umax_value <= U32_MAX;
bool is_src_reg_u32 = get_reg_width(src_reg) <= 32;
if (is_src_reg_u32 && need_id)
src_reg->id = ++env->id_gen;
if (is_src_reg_u32)
assign_scalar_id_before_mov(env, src_reg);
copy_register_state(dst_reg, src_reg);
/* Make sure ID is cleared if src_reg is not in u32
* range otherwise dst_reg min/max could be incorrectly
@ -13971,8 +14007,8 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
/* case: W1 = (s8, s16)W2 */
bool no_sext = src_reg->umax_value < (1ULL << (insn->off - 1));
if (no_sext && need_id)
src_reg->id = ++env->id_gen;
if (no_sext)
assign_scalar_id_before_mov(env, src_reg);
copy_register_state(dst_reg, src_reg);
if (!no_sext)
dst_reg->id = 0;
@ -17027,7 +17063,7 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
}
/* attempt to detect infinite loop to avoid unnecessary doomed work */
if (states_maybe_looping(&sl->state, cur) &&
states_equal(env, &sl->state, cur, false) &&
states_equal(env, &sl->state, cur, true) &&
!iter_active_depths_differ(&sl->state, cur) &&
sl->state.callback_unroll_depth == cur->callback_unroll_depth) {
verbose_linfo(env, insn_idx, "; ");
@ -19809,6 +19845,23 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
continue;
}
/* Implement bpf_kptr_xchg inline */
if (prog->jit_requested && BITS_PER_LONG == 64 &&
insn->imm == BPF_FUNC_kptr_xchg &&
bpf_jit_supports_ptr_xchg()) {
insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_2);
insn_buf[1] = BPF_ATOMIC_OP(BPF_DW, BPF_XCHG, BPF_REG_1, BPF_REG_0, 0);
cnt = 2;
new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
if (!new_prog)
return -ENOMEM;
delta += cnt - 1;
env->prog = prog = new_prog;
insn = new_prog->insnsi + i + delta;
continue;
}
patch_call_imm:
fn = env->ops->get_func_proto(insn->imm, env->prog);
/* all functions that have prototype and verifier allowed
@ -20041,7 +20094,6 @@ static int do_check_common(struct bpf_verifier_env *env, int subprog)
state->first_insn_idx = env->subprog_info[subprog].start;
state->last_insn_idx = -1;
regs = state->frame[state->curframe]->regs;
if (subprog || env->prog->type == BPF_PROG_TYPE_EXT) {
const char *sub_name = subprog_name(env, subprog);
@ -20233,10 +20285,12 @@ static void print_verification_stats(struct bpf_verifier_env *env)
static int check_struct_ops_btf_id(struct bpf_verifier_env *env)
{
const struct btf_type *t, *func_proto;
const struct bpf_struct_ops_desc *st_ops_desc;
const struct bpf_struct_ops *st_ops;
const struct btf_member *member;
struct bpf_prog *prog = env->prog;
u32 btf_id, member_idx;
struct btf *btf;
const char *mname;
if (!prog->gpl_compatible) {
@ -20244,15 +20298,30 @@ static int check_struct_ops_btf_id(struct bpf_verifier_env *env)
return -EINVAL;
}
if (!prog->aux->attach_btf_id)
return -ENOTSUPP;
btf = prog->aux->attach_btf;
if (btf_is_module(btf)) {
/* Make sure st_ops is valid through the lifetime of env */
env->attach_btf_mod = btf_try_get_module(btf);
if (!env->attach_btf_mod) {
verbose(env, "struct_ops module %s is not found\n",
btf_get_name(btf));
return -ENOTSUPP;
}
}
btf_id = prog->aux->attach_btf_id;
st_ops = bpf_struct_ops_find(btf_id);
if (!st_ops) {
st_ops_desc = bpf_struct_ops_find(btf, btf_id);
if (!st_ops_desc) {
verbose(env, "attach_btf_id %u is not a supported struct\n",
btf_id);
return -ENOTSUPP;
}
st_ops = st_ops_desc->st_ops;
t = st_ops->type;
t = st_ops_desc->type;
member_idx = prog->expected_attach_type;
if (member_idx >= btf_type_vlen(t)) {
verbose(env, "attach to invalid member idx %u of struct %s\n",
@ -20261,8 +20330,8 @@ static int check_struct_ops_btf_id(struct bpf_verifier_env *env)
}
member = &btf_type_member(t)[member_idx];
mname = btf_name_by_offset(btf_vmlinux, member->name_off);
func_proto = btf_type_resolve_func_ptr(btf_vmlinux, member->type,
mname = btf_name_by_offset(btf, member->name_off);
func_proto = btf_type_resolve_func_ptr(btf, member->type,
NULL);
if (!func_proto) {
verbose(env, "attach to invalid member %s(@idx %u) of struct %s\n",
@ -20764,7 +20833,12 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
env->prog = *prog;
env->ops = bpf_verifier_ops[env->prog->type];
env->fd_array = make_bpfptr(attr->fd_array, uattr.is_kernel);
is_priv = bpf_capable();
env->allow_ptr_leaks = bpf_allow_ptr_leaks(env->prog->aux->token);
env->allow_uninit_stack = bpf_allow_uninit_stack(env->prog->aux->token);
env->bypass_spec_v1 = bpf_bypass_spec_v1(env->prog->aux->token);
env->bypass_spec_v4 = bpf_bypass_spec_v4(env->prog->aux->token);
env->bpf_capable = is_priv = bpf_token_capable(env->prog->aux->token, CAP_BPF);
bpf_get_btf_vmlinux();
@ -20796,12 +20870,6 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
if (attr->prog_flags & BPF_F_ANY_ALIGNMENT)
env->strict_alignment = false;
env->allow_ptr_leaks = bpf_allow_ptr_leaks();
env->allow_uninit_stack = bpf_allow_uninit_stack();
env->bypass_spec_v1 = bpf_bypass_spec_v1();
env->bypass_spec_v4 = bpf_bypass_spec_v4();
env->bpf_capable = bpf_capable();
if (is_priv)
env->test_state_freq = attr->prog_flags & BPF_F_TEST_STATE_FREQ;
env->test_reg_invariants = attr->prog_flags & BPF_F_TEST_REG_INVARIANTS;
@ -20967,6 +21035,8 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
env->prog->expected_attach_type = 0;
*prog = env->prog;
module_put(env->attach_btf_mod);
err_unlock:
if (!is_priv)
mutex_unlock(&bpf_verifier_lock);
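
The spill-tracking hunks in this file hinge on get_reg_width(), which returns fls64(reg->umax_value), and on comparing that width against BITS_PER_BYTE * size. A rough userspace sketch of the check is below. demo_fls64() is a portable stand-in written for this example (the kernel's fls64() is usually an architecture-optimized primitive), and demo_value_fits() is a hypothetical name for the reg_value_fits computation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Portable stand-in for fls64(): 1-based position of the highest set
 * bit, or 0 when no bit is set.
 */
static int demo_fls64(uint64_t x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* True when a scalar whose unsigned maximum is 'umax' is representable
 * in 'size_bytes' bytes, i.e. a spill of that size does not narrow it.
 */
static bool demo_value_fits(uint64_t umax, int size_bytes)
{
	return demo_fls64(umax) <= 8 * size_bytes;
}
```

Under this definition a width of at most 32 is the same condition as umax_value <= U32_MAX, so the is_src_reg_u32 rewrite in check_alu_op() reads as a refactoring rather than a behavior change.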


@ -1629,7 +1629,7 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_FUNC_trace_vprintk:
return bpf_get_trace_vprintk_proto();
default:
return bpf_base_func_proto(func_id);
return bpf_base_func_proto(func_id, prog);
}
}
@ -2679,6 +2679,7 @@ static void bpf_kprobe_multi_link_dealloc(struct bpf_link *link)
static int bpf_kprobe_multi_link_fill_link_info(const struct bpf_link *link,
struct bpf_link_info *info)
{
u64 __user *ucookies = u64_to_user_ptr(info->kprobe_multi.cookies);
u64 __user *uaddrs = u64_to_user_ptr(info->kprobe_multi.addrs);
struct bpf_kprobe_multi_link *kmulti_link;
u32 ucount = info->kprobe_multi.count;
@ -2686,6 +2687,8 @@ static int bpf_kprobe_multi_link_fill_link_info(const struct bpf_link *link,
if (!uaddrs ^ !ucount)
return -EINVAL;
if (ucookies && !ucount)
return -EINVAL;
kmulti_link = container_of(link, struct bpf_kprobe_multi_link, link);
info->kprobe_multi.count = kmulti_link->cnt;
@ -2699,6 +2702,18 @@ static int bpf_kprobe_multi_link_fill_link_info(const struct bpf_link *link,
else
ucount = kmulti_link->cnt;
if (ucookies) {
if (kmulti_link->cookies) {
if (copy_to_user(ucookies, kmulti_link->cookies, ucount * sizeof(u64)))
return -EFAULT;
} else {
for (i = 0; i < ucount; i++) {
if (put_user(0, ucookies + i))
return -EFAULT;
}
}
}
if (kallsyms_show_value(current_cred())) {
if (copy_to_user(uaddrs, kmulti_link->addrs, ucount * sizeof(u64)))
return -EFAULT;
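
The cookie retrieval added in this file has two branches: copy the link's cookie array when one exists, otherwise report a zero cookie for every entry. A simplified sketch that uses plain memory instead of copy_to_user()/put_user() might look like this. demo_fill_cookies() and its parameters are hypothetical; the -EINVAL check mirrors the `ucookies && !ucount` test, the count is clamped to the link's count on the assumption that this matches the surrounding context, and fault handling is omitted.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Fill 'ucookies' with up to 'ucount' cookies. 'link_cookies' may be
 * NULL, meaning the link was created without cookies.
 */
static int demo_fill_cookies(uint64_t *ucookies, uint32_t ucount,
			     const uint64_t *link_cookies, uint32_t link_cnt)
{
	uint32_t i, n;

	if (!ucookies)
		return 0;		/* caller did not ask for cookies */
	if (!ucount)
		return -EINVAL;		/* a buffer needs a non-zero count */

	n = ucount < link_cnt ? ucount : link_cnt;

	if (link_cookies) {
		memcpy(ucookies, link_cookies, n * sizeof(uint64_t));
	} else {
		for (i = 0; i < n; i++)
			ucookies[i] = 0;
	}
	return 0;
}
```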


@ -7,7 +7,7 @@
#include <linux/bpf.h>
#include <linux/btf.h>
extern struct bpf_struct_ops bpf_bpf_dummy_ops;
static struct bpf_struct_ops bpf_bpf_dummy_ops;
/* A common type for test_N with return value in bpf_dummy_ops */
typedef int (*dummy_ops_test_ret_fn)(struct bpf_dummy_ops_state *state, ...);
@ -22,6 +22,8 @@ struct bpf_dummy_ops_test_args {
struct bpf_dummy_ops_state state;
};
static struct btf *bpf_dummy_ops_btf;
static struct bpf_dummy_ops_test_args *
dummy_ops_init_args(const union bpf_attr *kattr, unsigned int nr)
{
@ -90,9 +92,15 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
void *image = NULL;
unsigned int op_idx;
int prog_ret;
s32 type_id;
int err;
if (prog->aux->attach_btf_id != st_ops->type_id)
type_id = btf_find_by_name_kind(bpf_dummy_ops_btf,
bpf_bpf_dummy_ops.name,
BTF_KIND_STRUCT);
if (type_id < 0)
return -EINVAL;
if (prog->aux->attach_btf_id != type_id)
return -EOPNOTSUPP;
func_proto = prog->aux->attach_func_proto;
@ -148,6 +156,7 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr,
static int bpf_dummy_init(struct btf *btf)
{
bpf_dummy_ops_btf = btf;
return 0;
}
@ -247,7 +256,7 @@ static struct bpf_dummy_ops __bpf_bpf_dummy_ops = {
.test_sleepable = bpf_dummy_test_sleepable,
};
struct bpf_struct_ops bpf_bpf_dummy_ops = {
static struct bpf_struct_ops bpf_bpf_dummy_ops = {
.verifier_ops = &bpf_dummy_verifier_ops,
.init = bpf_dummy_init,
.check_member = bpf_dummy_ops_check_member,
@ -256,4 +265,11 @@ struct bpf_struct_ops bpf_bpf_dummy_ops = {
.unreg = bpf_dummy_unreg,
.name = "bpf_dummy_ops",
.cfi_stubs = &__bpf_bpf_dummy_ops,
.owner = THIS_MODULE,
};
static int __init bpf_dummy_struct_ops_init(void)
{
return register_bpf_struct_ops(&bpf_bpf_dummy_ops, bpf_dummy_ops);
}
late_initcall(bpf_dummy_struct_ops_init);


@ -88,7 +88,7 @@
#include "dev.h"
static const struct bpf_func_proto *
bpf_sk_base_func_proto(enum bpf_func_id func_id);
bpf_sk_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog);
int copy_bpf_fprog_from_user(struct sock_fprog *dst, sockptr_t src, int len)
{
@ -778,7 +778,7 @@ static int bpf_convert_filter(struct sock_filter *prog, int len,
BPF_EMIT_JMP;
break;
/* ldxb 4 * ([14] & 0xf) is remaped into 6 insns. */
/* ldxb 4 * ([14] & 0xf) is remapped into 6 insns. */
case BPF_LDX | BPF_MSH | BPF_B: {
struct sock_filter tmp = {
.code = BPF_LD | BPF_ABS | BPF_B,
@ -804,7 +804,7 @@ static int bpf_convert_filter(struct sock_filter *prog, int len,
*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_TMP);
break;
}
/* RET_K is remaped into 2 insns. RET_A case doesn't need an
/* RET_K is remapped into 2 insns. RET_A case doesn't need an
* extra mov as BPF_REG_0 is already mapped into BPF_REG_A.
*/
case BPF_RET | BPF_A:
@ -2968,7 +2968,7 @@ BPF_CALL_4(bpf_msg_pop_data, struct sk_msg *, msg, u32, start,
*
* Then if B is non-zero AND there is no space allocate space and
* compact A, B regions into page. If there is space shift ring to
* the rigth free'ing the next element in ring to place B, leaving
* the right free'ing the next element in ring to place B, leaving
* A untouched except to reduce length.
*/
if (start != offset) {
@ -7894,7 +7894,7 @@ sock_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_FUNC_ktime_get_coarse_ns:
return &bpf_ktime_get_coarse_ns_proto;
default:
return bpf_base_func_proto(func_id);
return bpf_base_func_proto(func_id, prog);
}
}
@ -7987,7 +7987,7 @@ sock_addr_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return NULL;
}
default:
return bpf_sk_base_func_proto(func_id);
return bpf_sk_base_func_proto(func_id, prog);
}
}
@ -8006,7 +8006,7 @@ sk_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_FUNC_perf_event_output:
return &bpf_skb_event_output_proto;
default:
return bpf_sk_base_func_proto(func_id);
return bpf_sk_base_func_proto(func_id, prog);
}
}
@ -8193,7 +8193,7 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
#endif
#endif
default:
return bpf_sk_base_func_proto(func_id);
return bpf_sk_base_func_proto(func_id, prog);
}
}
@ -8252,13 +8252,13 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
#endif
#endif
default:
return bpf_sk_base_func_proto(func_id);
return bpf_sk_base_func_proto(func_id, prog);
}
#if IS_MODULE(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES)
/* The nf_conn___init type is used in the NF_CONNTRACK kfuncs. The
* kfuncs are defined in two different modules, and we want to be able
* to use them interchangably with the same BTF type ID. Because modules
* to use them interchangeably with the same BTF type ID. Because modules
* can't de-duplicate BTF IDs between each other, we need the type to be
* referenced in the vmlinux BTF or the verifier will get confused about
* the different types. So we add this dummy type reference which will
@ -8313,7 +8313,7 @@ sock_ops_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_tcp_sock_proto;
#endif /* CONFIG_INET */
default:
return bpf_sk_base_func_proto(func_id);
return bpf_sk_base_func_proto(func_id, prog);
}
}
@ -8355,7 +8355,7 @@ sk_msg_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_get_cgroup_classid_curr_proto;
#endif
default:
return bpf_sk_base_func_proto(func_id);
return bpf_sk_base_func_proto(func_id, prog);
}
}
@ -8399,7 +8399,7 @@ sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_skc_lookup_tcp_proto;
#endif
default:
return bpf_sk_base_func_proto(func_id);
return bpf_sk_base_func_proto(func_id, prog);
}
}
@ -8410,7 +8410,7 @@ flow_dissector_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_FUNC_skb_load_bytes:
return &bpf_flow_dissector_load_bytes_proto;
default:
return bpf_sk_base_func_proto(func_id);
return bpf_sk_base_func_proto(func_id, prog);
}
}
@ -8437,7 +8437,7 @@ lwt_out_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_FUNC_skb_under_cgroup:
return &bpf_skb_under_cgroup_proto;
default:
return bpf_sk_base_func_proto(func_id);
return bpf_sk_base_func_proto(func_id, prog);
}
}
@ -8612,7 +8612,7 @@ static bool cg_skb_is_valid_access(int off, int size,
return false;
case bpf_ctx_range(struct __sk_buff, data):
case bpf_ctx_range(struct __sk_buff, data_end):
if (!bpf_capable())
if (!bpf_token_capable(prog->aux->token, CAP_BPF))
return false;
break;
}
@ -8624,7 +8624,7 @@ static bool cg_skb_is_valid_access(int off, int size,
case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
break;
case bpf_ctx_range(struct __sk_buff, tstamp):
if (!bpf_capable())
if (!bpf_token_capable(prog->aux->token, CAP_BPF))
return false;
break;
default:
@ -11268,7 +11268,7 @@ sk_reuseport_func_proto(enum bpf_func_id func_id,
case BPF_FUNC_ktime_get_coarse_ns:
return &bpf_ktime_get_coarse_ns_proto;
default:
return bpf_base_func_proto(func_id);
return bpf_base_func_proto(func_id, prog);
}
}
@ -11450,7 +11450,7 @@ sk_lookup_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_FUNC_sk_release:
return &bpf_sk_release_proto;
default:
return bpf_sk_base_func_proto(func_id);
return bpf_sk_base_func_proto(func_id, prog);
}
}
@ -11784,7 +11784,7 @@ const struct bpf_func_proto bpf_sock_from_file_proto = {
};
static const struct bpf_func_proto *
bpf_sk_base_func_proto(enum bpf_func_id func_id)
bpf_sk_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
const struct bpf_func_proto *func;
@ -11813,10 +11813,10 @@ bpf_sk_base_func_proto(enum bpf_func_id func_id)
case BPF_FUNC_ktime_get_coarse_ns:
return &bpf_ktime_get_coarse_ns_proto;
default:
return bpf_base_func_proto(func_id);
return bpf_base_func_proto(func_id, prog);
}
if (!perfmon_capable())
if (!bpf_token_capable(prog->aux->token, CAP_PERFMON))
return NULL;
return func;
@ -11869,6 +11869,103 @@ __bpf_kfunc int bpf_sock_addr_set_sun_path(struct bpf_sock_addr_kern *sa_kern,
return 0;
}
__bpf_kfunc int bpf_sk_assign_tcp_reqsk(struct sk_buff *skb, struct sock *sk,
struct bpf_tcp_req_attrs *attrs, int attrs__sz)
{
#if IS_ENABLED(CONFIG_SYN_COOKIES)
const struct request_sock_ops *ops;
struct inet_request_sock *ireq;
struct tcp_request_sock *treq;
struct request_sock *req;
struct net *net;
__u16 min_mss;
u32 tsoff = 0;
if (attrs__sz != sizeof(*attrs) ||
attrs->reserved[0] || attrs->reserved[1] || attrs->reserved[2])
return -EINVAL;
if (!skb_at_tc_ingress(skb))
return -EINVAL;
net = dev_net(skb->dev);
if (net != sock_net(sk))
return -ENETUNREACH;
switch (skb->protocol) {
case htons(ETH_P_IP):
ops = &tcp_request_sock_ops;
min_mss = 536;
break;
#if IS_BUILTIN(CONFIG_IPV6)
case htons(ETH_P_IPV6):
ops = &tcp6_request_sock_ops;
min_mss = IPV6_MIN_MTU - 60;
break;
#endif
default:
return -EINVAL;
}
if (sk->sk_type != SOCK_STREAM || sk->sk_state != TCP_LISTEN ||
sk_is_mptcp(sk))
return -EINVAL;
if (attrs->mss < min_mss)
return -EINVAL;
if (attrs->wscale_ok) {
if (!READ_ONCE(net->ipv4.sysctl_tcp_window_scaling))
return -EINVAL;
if (attrs->snd_wscale > TCP_MAX_WSCALE ||
attrs->rcv_wscale > TCP_MAX_WSCALE)
return -EINVAL;
}
if (attrs->sack_ok && !READ_ONCE(net->ipv4.sysctl_tcp_sack))
return -EINVAL;
if (attrs->tstamp_ok) {
if (!READ_ONCE(net->ipv4.sysctl_tcp_timestamps))
return -EINVAL;
tsoff = attrs->rcv_tsecr - tcp_ns_to_ts(attrs->usec_ts_ok, tcp_clock_ns());
}
req = inet_reqsk_alloc(ops, sk, false);
if (!req)
return -ENOMEM;
ireq = inet_rsk(req);
treq = tcp_rsk(req);
req->rsk_listener = sk;
req->syncookie = 1;
req->mss = attrs->mss;
req->ts_recent = attrs->rcv_tsval;
ireq->snd_wscale = attrs->snd_wscale;
ireq->rcv_wscale = attrs->rcv_wscale;
ireq->tstamp_ok = !!attrs->tstamp_ok;
ireq->sack_ok = !!attrs->sack_ok;
ireq->wscale_ok = !!attrs->wscale_ok;
ireq->ecn_ok = !!attrs->ecn_ok;
treq->req_usec_ts = !!attrs->usec_ts_ok;
treq->ts_off = tsoff;
skb_orphan(skb);
skb->sk = req_to_sk(req);
skb->destructor = sock_pfree;
return 0;
#else
return -EOPNOTSUPP;
#endif
}
__bpf_kfunc_end_defs();
int bpf_dynptr_from_skb_rdonly(struct sk_buff *skb, u64 flags,
@ -11897,6 +11994,10 @@ BTF_SET8_START(bpf_kfunc_check_set_sock_addr)
BTF_ID_FLAGS(func, bpf_sock_addr_set_sun_path)
BTF_SET8_END(bpf_kfunc_check_set_sock_addr)
BTF_SET8_START(bpf_kfunc_check_set_tcp_reqsk)
BTF_ID_FLAGS(func, bpf_sk_assign_tcp_reqsk, KF_TRUSTED_ARGS)
BTF_SET8_END(bpf_kfunc_check_set_tcp_reqsk)
static const struct btf_kfunc_id_set bpf_kfunc_set_skb = {
.owner = THIS_MODULE,
.set = &bpf_kfunc_check_set_skb,
@ -11912,6 +12013,11 @@ static const struct btf_kfunc_id_set bpf_kfunc_set_sock_addr = {
.set = &bpf_kfunc_check_set_sock_addr,
};
static const struct btf_kfunc_id_set bpf_kfunc_set_tcp_reqsk = {
.owner = THIS_MODULE,
.set = &bpf_kfunc_check_set_tcp_reqsk,
};
static int __init bpf_kfunc_init(void)
{
int ret;
@ -11927,8 +12033,9 @@ static int __init bpf_kfunc_init(void)
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_LWT_SEG6LOCAL, &bpf_kfunc_set_skb);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_NETFILTER, &bpf_kfunc_set_skb);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &bpf_kfunc_set_xdp);
return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
&bpf_kfunc_set_sock_addr);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
&bpf_kfunc_set_sock_addr);
return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_tcp_reqsk);
}
late_initcall(bpf_kfunc_init);
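
bpf_sk_assign_tcp_reqsk() in this file rejects bad attribute combinations before it allocates a request socket. The option-range part of that validation can be sketched in userspace as follows. struct demo_req_attrs and demo_check_attrs() are hypothetical; the constants 1280 and 14 are assumed here to match IPV6_MIN_MTU and TCP_MAX_WSCALE; and the per-netns sysctl checks, the listener state checks, and the reserved-field check are left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_IPV4_MIN_MSS	536
#define DEMO_IPV6_MIN_MTU	1280	/* assumed value of IPV6_MIN_MTU */
#define DEMO_TCP_MAX_WSCALE	14	/* assumed value of TCP_MAX_WSCALE */

/* Hypothetical subset of struct bpf_tcp_req_attrs. */
struct demo_req_attrs {
	uint16_t mss;
	uint8_t wscale_ok;
	uint8_t snd_wscale;
	uint8_t rcv_wscale;
};

/* Returns 0 when the MSS and window-scale values are acceptable for the
 * given address family, -EINVAL otherwise.
 */
static int demo_check_attrs(const struct demo_req_attrs *attrs, bool is_ipv6)
{
	uint16_t min_mss = is_ipv6 ? DEMO_IPV6_MIN_MTU - 60 : DEMO_IPV4_MIN_MSS;

	if (attrs->mss < min_mss)
		return -EINVAL;

	/* Window-scale values are only validated when scaling is requested. */
	if (attrs->wscale_ok &&
	    (attrs->snd_wscale > DEMO_TCP_MAX_WSCALE ||
	     attrs->rcv_wscale > DEMO_TCP_MAX_WSCALE))
		return -EINVAL;

	return 0;
}
```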


@ -2583,8 +2583,18 @@ EXPORT_SYMBOL(sock_efree);
#ifdef CONFIG_INET
void sock_pfree(struct sk_buff *skb)
{
if (sk_is_refcounted(skb->sk))
sock_gen_put(skb->sk);
struct sock *sk = skb->sk;
if (!sk_is_refcounted(sk))
return;
if (sk->sk_state == TCP_NEW_SYN_RECV && inet_reqsk(sk)->syncookie) {
inet_reqsk(sk)->rsk_listener = NULL;
reqsk_free(inet_reqsk(sk));
return;
}
sock_gen_put(sk);
}
EXPORT_SYMBOL(sock_pfree);
#endif /* CONFIG_INET */


@ -12,7 +12,7 @@
#include <net/bpf_sk_storage.h>
/* "extern" is to avoid sparse warning. It is only used in bpf_struct_ops.c. */
extern struct bpf_struct_ops bpf_tcp_congestion_ops;
static struct bpf_struct_ops bpf_tcp_congestion_ops;
static u32 unsupported_ops[] = {
offsetof(struct tcp_congestion_ops, get_info),
@ -20,6 +20,7 @@ static u32 unsupported_ops[] = {
static const struct btf_type *tcp_sock_type;
static u32 tcp_sock_id, sock_id;
static const struct btf_type *tcp_congestion_ops_type;
static int bpf_tcp_ca_init(struct btf *btf)
{
@ -36,6 +37,11 @@ static int bpf_tcp_ca_init(struct btf *btf)
tcp_sock_id = type_id;
tcp_sock_type = btf_type_by_id(btf, tcp_sock_id);
type_id = btf_find_by_name_kind(btf, "tcp_congestion_ops", BTF_KIND_STRUCT);
if (type_id < 0)
return -EINVAL;
tcp_congestion_ops_type = btf_type_by_id(btf, type_id);
return 0;
}
@ -149,7 +155,7 @@ static u32 prog_ops_moff(const struct bpf_prog *prog)
u32 midx;
midx = prog->expected_attach_type;
t = bpf_tcp_congestion_ops.type;
t = tcp_congestion_ops_type;
m = &btf_type_member(t)[midx];
return __btf_member_bit_offset(t, m) / 8;
@ -191,7 +197,7 @@ bpf_tcp_ca_get_func_proto(enum bpf_func_id func_id,
case BPF_FUNC_ktime_get_coarse_ns:
return &bpf_ktime_get_coarse_ns_proto;
default:
return bpf_base_func_proto(func_id);
return bpf_base_func_proto(func_id, prog);
}
}
@ -339,7 +345,7 @@ static struct tcp_congestion_ops __bpf_ops_tcp_congestion_ops = {
.release = __bpf_tcp_ca_release,
};
struct bpf_struct_ops bpf_tcp_congestion_ops = {
static struct bpf_struct_ops bpf_tcp_congestion_ops = {
.verifier_ops = &bpf_tcp_ca_verifier_ops,
.reg = bpf_tcp_ca_reg,
.unreg = bpf_tcp_ca_unreg,
@ -350,10 +356,16 @@ struct bpf_struct_ops bpf_tcp_congestion_ops = {
.validate = bpf_tcp_ca_validate,
.name = "tcp_congestion_ops",
.cfi_stubs = &__bpf_ops_tcp_congestion_ops,
.owner = THIS_MODULE,
};
static int __init bpf_tcp_ca_kfunc_init(void)
{
return register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &bpf_tcp_ca_kfunc_set);
int ret;
ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &bpf_tcp_ca_kfunc_set);
ret = ret ?: register_bpf_struct_ops(&bpf_tcp_congestion_ops, tcp_congestion_ops);
return ret;
}
late_initcall(bpf_tcp_ca_kfunc_init);


@ -51,15 +51,6 @@ static u32 cookie_hash(__be32 saddr, __be32 daddr, __be16 sport, __be16 dport,
count, &syncookie_secret[c]);
}
/* Convert one nsec 64bit timestamp to ts (ms or usec resolution) */
static u64 tcp_ns_to_ts(bool usec_ts, u64 val)
{
if (usec_ts)
return div_u64(val, NSEC_PER_USEC);
return div_u64(val, NSEC_PER_MSEC);
}
/*
* when syncookies are in effect and tcp timestamps are enabled we encode
* tcp options in the lower bits of the timestamp value that will be
@ -304,6 +295,24 @@ static int cookie_tcp_reqsk_init(struct sock *sk, struct sk_buff *skb,
return 0;
}
#if IS_ENABLED(CONFIG_BPF)
struct request_sock *cookie_bpf_check(struct sock *sk, struct sk_buff *skb)
{
struct request_sock *req = inet_reqsk(skb->sk);
skb->sk = NULL;
skb->destructor = NULL;
if (cookie_tcp_reqsk_init(sk, skb, req)) {
reqsk_free(req);
req = NULL;
}
return req;
}
EXPORT_SYMBOL_GPL(cookie_bpf_check);
#endif
struct request_sock *cookie_tcp_reqsk_alloc(const struct request_sock_ops *ops,
struct sock *sk, struct sk_buff *skb,
struct tcp_options_received *tcp_opt,
@ -404,9 +413,13 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
!th->ack || th->rst)
goto out;
req = cookie_tcp_check(net, sk, skb);
if (IS_ERR(req))
goto out;
if (cookie_bpf_ok(skb)) {
req = cookie_bpf_check(sk, skb);
} else {
req = cookie_tcp_check(net, sk, skb);
if (IS_ERR(req))
goto out;
}
if (!req)
goto out_drop;
@ -454,7 +467,8 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
ireq->wscale_ok, &rcv_wscale,
dst_metric(&rt->dst, RTAX_INITRWND));
ireq->rcv_wscale = rcv_wscale;
if (!req->syncookie)
ireq->rcv_wscale = rcv_wscale;
ireq->ecn_ok &= cookie_ecn_ok(net, &rt->dst);
ret = tcp_get_cookie_sock(sk, skb, req, &rt->dst);


@ -182,9 +182,13 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
!th->ack || th->rst)
goto out;
req = cookie_tcp_check(net, sk, skb);
if (IS_ERR(req))
goto out;
if (cookie_bpf_ok(skb)) {
req = cookie_bpf_check(sk, skb);
} else {
req = cookie_tcp_check(net, sk, skb);
if (IS_ERR(req))
goto out;
}
if (!req)
goto out_drop;
@ -247,7 +251,8 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
ireq->wscale_ok, &rcv_wscale,
dst_metric(dst, RTAX_INITRWND));
ireq->rcv_wscale = rcv_wscale;
if (!req->syncookie)
ireq->rcv_wscale = rcv_wscale;
ireq->ecn_ok &= cookie_ecn_ok(net, dst);
ret = tcp_get_cookie_sock(sk, skb, req, dst);


@ -314,7 +314,7 @@ static bool nf_is_valid_access(int off, int size, enum bpf_access_type type,
static const struct bpf_func_proto *
bpf_nf_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
return bpf_base_func_proto(func_id);
return bpf_base_func_proto(func_id, prog);
}
const struct bpf_verifier_ops netfilter_verifier_ops = {


@ -5410,29 +5410,87 @@ int security_bpf_prog(struct bpf_prog *prog)
}
/**
* security_bpf_map_alloc() - Allocate a bpf map LSM blob
* @map: bpf map
* security_bpf_map_create() - Check if BPF map creation is allowed
* @map: BPF map object
* @attr: BPF syscall attributes used to create BPF map
* @token: BPF token used to grant user access
*
* Initialize the security field inside bpf map.
* Do a check when the kernel creates a new BPF map. This is also the
* point where LSM blob is allocated for LSMs that need them.
*
* Return: Returns 0 on success, error on failure.
*/
int security_bpf_map_alloc(struct bpf_map *map)
int security_bpf_map_create(struct bpf_map *map, union bpf_attr *attr,
struct bpf_token *token)
{
return call_int_hook(bpf_map_alloc_security, 0, map);
return call_int_hook(bpf_map_create, 0, map, attr, token);
}
/**
* security_bpf_prog_alloc() - Allocate a bpf program LSM blob
* @aux: bpf program aux info struct
* security_bpf_prog_load() - Check if loading of BPF program is allowed
* @prog: BPF program object
* @attr: BPF syscall attributes used to create BPF program
* @token: BPF token used to grant user access to BPF subsystem
*
* Initialize the security field inside bpf program.
* Perform an access control check when the kernel loads a BPF program and
* allocates associated BPF program object. This hook is also responsible for
* allocating any required LSM state for the BPF program.
*
* Return: Returns 0 on success, error on failure.
*/
int security_bpf_prog_alloc(struct bpf_prog_aux *aux)
int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
struct bpf_token *token)
{
return call_int_hook(bpf_prog_alloc_security, 0, aux);
return call_int_hook(bpf_prog_load, 0, prog, attr, token);
}
/**
* security_bpf_token_create() - Check if creating of BPF token is allowed
* @token: BPF token object
* @attr: BPF syscall attributes used to create BPF token
* @path: path pointing to BPF FS mount point from which BPF token is created
*
* Do a check when the kernel instantiates a new BPF token object from BPF FS
* instance. This is also the point where LSM blob can be allocated for LSMs.
*
* Return: Returns 0 on success, error on failure.
*/
int security_bpf_token_create(struct bpf_token *token, union bpf_attr *attr,
struct path *path)
{
return call_int_hook(bpf_token_create, 0, token, attr, path);
}
/**
* security_bpf_token_cmd() - Check if BPF token is allowed to delegate
* requested BPF syscall command
* @token: BPF token object
* @cmd: BPF syscall command requested to be delegated by BPF token
*
* Do a check when the kernel decides whether provided BPF token should allow
* delegation of requested BPF syscall command.
*
* Return: Returns 0 on success, error on failure.
*/
int security_bpf_token_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
{
return call_int_hook(bpf_token_cmd, 0, token, cmd);
}
/**
* security_bpf_token_capable() - Check if BPF token is allowed to delegate
* requested BPF-related capability
* @token: BPF token object
* @cap: capabilities requested to be delegated by BPF token
*
* Do a check when the kernel decides whether provided BPF token should allow
* delegation of requested BPF-related capabilities.
*
* Return: Returns 0 on success, error on failure.
*/
int security_bpf_token_capable(const struct bpf_token *token, int cap)
{
return call_int_hook(bpf_token_capable, 0, token, cap);
}
/**
@ -5443,18 +5501,29 @@ int security_bpf_prog_alloc(struct bpf_prog_aux *aux)
*/
void security_bpf_map_free(struct bpf_map *map)
{
call_void_hook(bpf_map_free_security, map);
call_void_hook(bpf_map_free, map);
}
/**
* security_bpf_prog_free() - Free a bpf program's LSM blob
* @aux: bpf program aux info struct
* security_bpf_prog_free() - Free a BPF program's LSM blob
* @prog: BPF program struct
*
* Clean up the security information stored inside bpf prog.
* Clean up the security information stored inside BPF program.
*/
void security_bpf_prog_free(struct bpf_prog_aux *aux)
void security_bpf_prog_free(struct bpf_prog *prog)
{
call_void_hook(bpf_prog_free_security, aux);
call_void_hook(bpf_prog_free, prog);
}
/**
* security_bpf_token_free() - Free a BPF token's LSM blob
* @token: BPF token struct
*
* Clean up the security information stored inside BPF token.
*/
void security_bpf_token_free(struct bpf_token *token)
{
call_void_hook(bpf_token_free, token);
}
#endif /* CONFIG_BPF_SYSCALL */
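
Note on the hook wrappers above: each `security_bpf_*()` function forwards to `call_int_hook()`, which walks the registered LSM callbacks and stops at the first nonzero return value. A simplified model of that dispatch, as a sketch only; `demo_token`, `demo_call_int_hook`, `hook_allow` and `hook_mask` are hypothetical, and the real macro iterates a kernel hlist rather than an array:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical token object; the kernel's struct bpf_token carries more state. */
struct demo_token {
	unsigned long long allowed_cmds;	/* bit N set: command N may be delegated */
};

typedef int (*token_cmd_hook)(const struct demo_token *token, int cmd);

/* Walk the hooks in order; the first nonzero return wins, 0 means nobody objected. */
static int demo_call_int_hook(const token_cmd_hook *hooks, size_t n,
			      const struct demo_token *token, int cmd)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int rc = hooks[i](token, cmd);

		if (rc != 0)
			return rc;
	}
	return 0;
}

/* Example hook: allow everything. */
static int hook_allow(const struct demo_token *token, int cmd)
{
	(void)token;
	(void)cmd;
	return 0;
}

/* Example hook: deny commands whose bit is not set in the token. */
static int hook_mask(const struct demo_token *token, int cmd)
{
	return (token->allowed_cmds >> cmd) & 1 ? 0 : -EPERM;
}
```

In this model a single denying hook is enough to reject the request, and an empty hook list yields the default of 0, which corresponds to the `0` passed as the second argument of `call_int_hook()` in the wrappers.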


@ -6920,7 +6920,8 @@ static int selinux_bpf_prog(struct bpf_prog *prog)
BPF__PROG_RUN, NULL);
}
static int selinux_bpf_map_alloc(struct bpf_map *map)
static int selinux_bpf_map_create(struct bpf_map *map, union bpf_attr *attr,
struct bpf_token *token)
{
struct bpf_security_struct *bpfsec;
@ -6942,7 +6943,8 @@ static void selinux_bpf_map_free(struct bpf_map *map)
kfree(bpfsec);
}
static int selinux_bpf_prog_alloc(struct bpf_prog_aux *aux)
static int selinux_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
struct bpf_token *token)
{
struct bpf_security_struct *bpfsec;
@ -6951,16 +6953,39 @@ static int selinux_bpf_prog_alloc(struct bpf_prog_aux *aux)
return -ENOMEM;
bpfsec->sid = current_sid();
aux->security = bpfsec;
prog->aux->security = bpfsec;
return 0;
}
static void selinux_bpf_prog_free(struct bpf_prog_aux *aux)
static void selinux_bpf_prog_free(struct bpf_prog *prog)
{
struct bpf_security_struct *bpfsec = aux->security;
struct bpf_security_struct *bpfsec = prog->aux->security;
aux->security = NULL;
prog->aux->security = NULL;
kfree(bpfsec);
}
static int selinux_bpf_token_create(struct bpf_token *token, union bpf_attr *attr,
struct path *path)
{
struct bpf_security_struct *bpfsec;
bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL);
if (!bpfsec)
return -ENOMEM;
bpfsec->sid = current_sid();
token->security = bpfsec;
return 0;
}
static void selinux_bpf_token_free(struct bpf_token *token)
{
struct bpf_security_struct *bpfsec = token->security;
token->security = NULL;
kfree(bpfsec);
}
#endif
@ -7324,8 +7349,9 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
LSM_HOOK_INIT(bpf, selinux_bpf),
LSM_HOOK_INIT(bpf_map, selinux_bpf_map),
LSM_HOOK_INIT(bpf_prog, selinux_bpf_prog),
LSM_HOOK_INIT(bpf_map_free_security, selinux_bpf_map_free),
LSM_HOOK_INIT(bpf_prog_free_security, selinux_bpf_prog_free),
LSM_HOOK_INIT(bpf_map_free, selinux_bpf_map_free),
LSM_HOOK_INIT(bpf_prog_free, selinux_bpf_prog_free),
LSM_HOOK_INIT(bpf_token_free, selinux_bpf_token_free),
#endif
#ifdef CONFIG_PERF_EVENTS
@ -7382,8 +7408,9 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
LSM_HOOK_INIT(audit_rule_init, selinux_audit_rule_init),
#endif
#ifdef CONFIG_BPF_SYSCALL
LSM_HOOK_INIT(bpf_map_alloc_security, selinux_bpf_map_alloc),
LSM_HOOK_INIT(bpf_prog_alloc_security, selinux_bpf_prog_alloc),
LSM_HOOK_INIT(bpf_map_create, selinux_bpf_map_create),
LSM_HOOK_INIT(bpf_prog_load, selinux_bpf_prog_load),
LSM_HOOK_INIT(bpf_token_create, selinux_bpf_token_create),
#endif
#ifdef CONFIG_PERF_EVENTS
LSM_HOOK_INIT(perf_event_alloc, selinux_perf_event_alloc),


@ -249,18 +249,44 @@ static int get_prog_info(int prog_id, struct bpf_prog_info *info)
return err;
}
static int cmp_u64(const void *A, const void *B)
{
const __u64 *a = A, *b = B;
struct addr_cookie {
__u64 addr;
__u64 cookie;
};
return *a - *b;
static int cmp_addr_cookie(const void *A, const void *B)
{
const struct addr_cookie *a = A, *b = B;
if (a->addr == b->addr)
return 0;
return a->addr < b->addr ? -1 : 1;
}
static struct addr_cookie *
get_addr_cookie_array(__u64 *addrs, __u64 *cookies, __u32 count)
{
struct addr_cookie *data;
__u32 i;
data = calloc(count, sizeof(data[0]));
if (!data) {
p_err("mem alloc failed");
return NULL;
}
for (i = 0; i < count; i++) {
data[i].addr = addrs[i];
data[i].cookie = cookies[i];
}
qsort(data, count, sizeof(data[0]), cmp_addr_cookie);
return data;
}
static void
show_kprobe_multi_json(struct bpf_link_info *info, json_writer_t *wtr)
{
struct addr_cookie *data;
__u32 i, j = 0;
__u64 *addrs;
jsonw_bool_field(json_wtr, "retprobe",
info->kprobe_multi.flags & BPF_F_KPROBE_MULTI_RETURN);
@ -268,14 +294,20 @@ show_kprobe_multi_json(struct bpf_link_info *info, json_writer_t *wtr)
jsonw_uint_field(json_wtr, "missed", info->kprobe_multi.missed);
jsonw_name(json_wtr, "funcs");
jsonw_start_array(json_wtr);
addrs = u64_to_ptr(info->kprobe_multi.addrs);
qsort(addrs, info->kprobe_multi.count, sizeof(addrs[0]), cmp_u64);
data = get_addr_cookie_array(u64_to_ptr(info->kprobe_multi.addrs),
u64_to_ptr(info->kprobe_multi.cookies),
info->kprobe_multi.count);
if (!data)
return;
/* Load it once for all. */
if (!dd.sym_count)
kernel_syms_load(&dd);
if (!dd.sym_count)
goto error;
for (i = 0; i < dd.sym_count; i++) {
if (dd.sym_mapping[i].address != addrs[j])
if (dd.sym_mapping[i].address != data[j].addr)
continue;
jsonw_start_object(json_wtr);
jsonw_uint_field(json_wtr, "addr", dd.sym_mapping[i].address);
@ -287,11 +319,14 @@ show_kprobe_multi_json(struct bpf_link_info *info, json_writer_t *wtr)
} else {
jsonw_string_field(json_wtr, "module", dd.sym_mapping[i].module);
}
jsonw_uint_field(json_wtr, "cookie", data[j].cookie);
jsonw_end_object(json_wtr);
if (j++ == info->kprobe_multi.count)
break;
}
jsonw_end_array(json_wtr);
error:
free(data);
}
static __u64 *u64_to_arr(__u64 val)
@ -334,6 +369,7 @@ show_perf_event_kprobe_json(struct bpf_link_info *info, json_writer_t *wtr)
u64_to_ptr(info->perf_event.kprobe.func_name));
jsonw_uint_field(wtr, "offset", info->perf_event.kprobe.offset);
jsonw_uint_field(wtr, "missed", info->perf_event.kprobe.missed);
jsonw_uint_field(wtr, "cookie", info->perf_event.kprobe.cookie);
}
static void
@ -343,6 +379,7 @@ show_perf_event_uprobe_json(struct bpf_link_info *info, json_writer_t *wtr)
jsonw_string_field(wtr, "file",
u64_to_ptr(info->perf_event.uprobe.file_name));
jsonw_uint_field(wtr, "offset", info->perf_event.uprobe.offset);
jsonw_uint_field(wtr, "cookie", info->perf_event.uprobe.cookie);
}
static void
@ -350,6 +387,7 @@ show_perf_event_tracepoint_json(struct bpf_link_info *info, json_writer_t *wtr)
{
jsonw_string_field(wtr, "tracepoint",
u64_to_ptr(info->perf_event.tracepoint.tp_name));
jsonw_uint_field(wtr, "cookie", info->perf_event.tracepoint.cookie);
}
static char *perf_config_hw_cache_str(__u64 config)
@ -426,6 +464,8 @@ show_perf_event_event_json(struct bpf_link_info *info, json_writer_t *wtr)
else
jsonw_uint_field(wtr, "event_config", config);
jsonw_uint_field(wtr, "cookie", info->perf_event.event.cookie);
if (type == PERF_TYPE_HW_CACHE && perf_config)
free((void *)perf_config);
}
@ -670,8 +710,8 @@ void netfilter_dump_plain(const struct bpf_link_info *info)
static void show_kprobe_multi_plain(struct bpf_link_info *info)
{
struct addr_cookie *data;
__u32 i, j = 0;
__u64 *addrs;
if (!info->kprobe_multi.count)
return;
@ -683,21 +723,24 @@ static void show_kprobe_multi_plain(struct bpf_link_info *info)
printf("func_cnt %u ", info->kprobe_multi.count);
if (info->kprobe_multi.missed)
printf("missed %llu ", info->kprobe_multi.missed);
addrs = (__u64 *)u64_to_ptr(info->kprobe_multi.addrs);
qsort(addrs, info->kprobe_multi.count, sizeof(__u64), cmp_u64);
data = get_addr_cookie_array(u64_to_ptr(info->kprobe_multi.addrs),
u64_to_ptr(info->kprobe_multi.cookies),
info->kprobe_multi.count);
if (!data)
return;
/* Load it once for all. */
if (!dd.sym_count)
kernel_syms_load(&dd);
if (!dd.sym_count)
return;
goto error;
printf("\n\t%-16s %s", "addr", "func [module]");
printf("\n\t%-16s %-16s %s", "addr", "cookie", "func [module]");
for (i = 0; i < dd.sym_count; i++) {
if (dd.sym_mapping[i].address != addrs[j])
if (dd.sym_mapping[i].address != data[j].addr)
continue;
printf("\n\t%016lx %s",
dd.sym_mapping[i].address, dd.sym_mapping[i].name);
printf("\n\t%016lx %-16llx %s",
dd.sym_mapping[i].address, data[j].cookie, dd.sym_mapping[i].name);
if (dd.sym_mapping[i].module[0] != '\0')
printf(" [%s] ", dd.sym_mapping[i].module);
else
@ -706,6 +749,8 @@ static void show_kprobe_multi_plain(struct bpf_link_info *info)
if (j++ == info->kprobe_multi.count)
break;
}
error:
free(data);
}
static void show_uprobe_multi_plain(struct bpf_link_info *info)
@ -754,6 +799,8 @@ static void show_perf_event_kprobe_plain(struct bpf_link_info *info)
printf("+%#x", info->perf_event.kprobe.offset);
if (info->perf_event.kprobe.missed)
printf(" missed %llu", info->perf_event.kprobe.missed);
if (info->perf_event.kprobe.cookie)
printf(" cookie %llu", info->perf_event.kprobe.cookie);
printf(" ");
}
@ -770,6 +817,8 @@ static void show_perf_event_uprobe_plain(struct bpf_link_info *info)
else
printf("\n\tuprobe ");
printf("%s+%#x ", buf, info->perf_event.uprobe.offset);
if (info->perf_event.uprobe.cookie)
printf("cookie %llu ", info->perf_event.uprobe.cookie);
}
static void show_perf_event_tracepoint_plain(struct bpf_link_info *info)
@ -781,6 +830,8 @@ static void show_perf_event_tracepoint_plain(struct bpf_link_info *info)
return;
printf("\n\ttracepoint %s ", buf);
if (info->perf_event.tracepoint.cookie)
printf("cookie %llu ", info->perf_event.tracepoint.cookie);
}
static void show_perf_event_event_plain(struct bpf_link_info *info)
@ -802,6 +853,9 @@ static void show_perf_event_event_plain(struct bpf_link_info *info)
else
printf("%llu ", config);
if (info->perf_event.event.cookie)
printf("cookie %llu ", info->perf_event.event.cookie);
if (type == PERF_TYPE_HW_CACHE && perf_config)
free((void *)perf_config);
}
@ -952,6 +1006,14 @@ static int do_show_link(int fd)
return -ENOMEM;
}
info.kprobe_multi.addrs = ptr_to_u64(addrs);
cookies = calloc(count, sizeof(__u64));
if (!cookies) {
p_err("mem alloc failed");
free(addrs);
close(fd);
return -ENOMEM;
}
info.kprobe_multi.cookies = ptr_to_u64(cookies);
goto again;
}
}
@ -977,7 +1039,7 @@ static int do_show_link(int fd)
cookies = calloc(count, sizeof(__u64));
if (!cookies) {
p_err("mem alloc failed");
free(cookies);
free(ref_ctr_offsets);
free(offsets);
close(fd);
return -ENOMEM;
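
Note on the comparator change in this file: the removed `cmp_u64()` returned `*a - *b`, a 64-bit difference converted to `int`, so two addresses that differ by a multiple of 2^32 can compare as equal and the sign can be wrong for large gaps. The replacement compares explicitly and sorts address/cookie pairs together, so each cookie stays attached to its address. A standalone sketch of both points, using `uint64_t` in place of `__u64`; `cmp_truncating` and `sort_pairs` are hypothetical names, and the truncation behaviour shown is what GCC and Clang do on common targets:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct addr_cookie {
	uint64_t addr;
	uint64_t cookie;
};

/* Shape of the removed comparator: the 64-bit difference is truncated to int. */
static int cmp_truncating(const void *A, const void *B)
{
	const uint64_t *a = A, *b = B;

	return (int)(*a - *b);
}

/* Three-way compare as in the new cmp_addr_cookie(): no arithmetic, no truncation. */
static int cmp_addr_cookie(const void *A, const void *B)
{
	const struct addr_cookie *a = A, *b = B;

	if (a->addr == b->addr)
		return 0;
	return a->addr < b->addr ? -1 : 1;
}

/* Zip two parallel arrays into pairs and sort by address; the caller frees the result. */
static struct addr_cookie *sort_pairs(const uint64_t *addrs,
				      const uint64_t *cookies, size_t count)
{
	struct addr_cookie *data = calloc(count, sizeof(data[0]));
	size_t i;

	if (!data)
		return NULL;
	for (i = 0; i < count; i++) {
		data[i].addr = addrs[i];
		data[i].cookie = cookies[i];
	}
	qsort(data, count, sizeof(data[0]), cmp_addr_cookie);
	return data;
}
```

Sorting only the address array, as the old code did, would have been enough while no cookies were printed; once cookies are shown next to each function, the two arrays have to be permuted together.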


@ -2298,7 +2298,7 @@ static int profile_open_perf_events(struct profiler_bpf *obj)
int map_fd;
profile_perf_events = calloc(
sizeof(int), obj->rodata->num_cpu * obj->rodata->num_metric);
obj->rodata->num_cpu * obj->rodata->num_metric, sizeof(int));
if (!profile_perf_events) {
p_err("failed to allocate memory for perf_event array: %s",
strerror(errno));


@ -847,6 +847,36 @@ union bpf_iter_link_info {
* Returns zero on success. On error, -1 is returned and *errno*
* is set appropriately.
*
* BPF_TOKEN_CREATE
* Description
* Create BPF token with embedded information about what
* BPF-related functionality it allows:
* - a set of allowed bpf() syscall commands;
* - a set of allowed BPF map types to be created with
* BPF_MAP_CREATE command, if BPF_MAP_CREATE itself is allowed;
* - a set of allowed BPF program types and BPF program attach
* types to be loaded with BPF_PROG_LOAD command, if
* BPF_PROG_LOAD itself is allowed.
*
* BPF token is created (derived) from an instance of BPF FS,
* assuming it has necessary delegation mount options specified.
* This BPF token can be passed as an extra parameter to various
* bpf() syscall commands to grant BPF subsystem functionality to
* unprivileged processes.
*
* When created, BPF token is "associated" with the owning
* user namespace of BPF FS instance (super block) that it was
* derived from, and subsequent BPF operations performed with
* BPF token would be performing capabilities checks (i.e.,
* CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN) within
* that user namespace. Without BPF token, such capabilities
* have to be granted in init user namespace, making bpf()
* syscall incompatible with user namespace, for the most part.
*
* Return
* A new file descriptor (a nonnegative integer), or -1 if an
* error occurred (in which case, *errno* is set appropriately).
*
* NOTES
* eBPF objects (maps and programs) can be shared between processes.
*
@ -901,6 +931,8 @@ enum bpf_cmd {
BPF_ITER_CREATE,
BPF_LINK_DETACH,
BPF_PROG_BIND_MAP,
BPF_TOKEN_CREATE,
__MAX_BPF_CMD,
};
enum bpf_map_type {
@ -951,6 +983,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_BLOOM_FILTER,
BPF_MAP_TYPE_USER_RINGBUF,
BPF_MAP_TYPE_CGRP_STORAGE,
__MAX_BPF_MAP_TYPE
};
/* Note that tracing related programs such as
@ -995,6 +1028,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_SK_LOOKUP,
BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
BPF_PROG_TYPE_NETFILTER,
__MAX_BPF_PROG_TYPE
};
enum bpf_attach_type {
@ -1330,6 +1364,12 @@ enum {
/* Get path from provided FD in BPF_OBJ_PIN/BPF_OBJ_GET commands */
BPF_F_PATH_FD = (1U << 14),
/* Flag for value_type_btf_obj_fd, the fd is available */
BPF_F_VTYPE_BTF_OBJ_FD = (1U << 15),
/* BPF token FD is passed in a corresponding command's token_fd field */
BPF_F_TOKEN_FD = (1U << 16),
};
/* Flags for BPF_PROG_QUERY. */
@ -1403,6 +1443,15 @@ union bpf_attr {
* to using 5 hash functions).
*/
__u64 map_extra;
__s32 value_type_btf_obj_fd; /* fd pointing to a BTF
* type data for
* btf_vmlinux_value_type_id.
*/
/* BPF token FD to use with BPF_MAP_CREATE operation.
* If provided, map_flags should have BPF_F_TOKEN_FD flag set.
*/
__s32 map_token_fd;
};
struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
@ -1472,6 +1521,10 @@ union bpf_attr {
* truncated), or smaller (if log buffer wasn't filled completely).
*/
__u32 log_true_size;
/* BPF token FD to use with BPF_PROG_LOAD operation.
* If provided, prog_flags should have BPF_F_TOKEN_FD flag set.
*/
__s32 prog_token_fd;
};
struct { /* anonymous struct used by BPF_OBJ_* commands */
@ -1584,6 +1637,11 @@ union bpf_attr {
* truncated), or smaller (if log buffer wasn't filled completely).
*/
__u32 btf_log_true_size;
__u32 btf_flags;
/* BPF token FD to use with BPF_BTF_LOAD operation.
* If provided, btf_flags should have BPF_F_TOKEN_FD flag set.
*/
__s32 btf_token_fd;
};
struct {
@ -1714,6 +1772,11 @@ union bpf_attr {
__u32 flags; /* extra flags */
} prog_bind_map;
struct { /* struct used by BPF_TOKEN_CREATE command */
__u32 flags;
__u32 bpffs_fd;
} token_create;
} __attribute__((aligned(8)));
/* The description below is an attempt at providing documentation to eBPF
@ -4839,9 +4902,9 @@ union bpf_attr {
* going through the CPU's backlog queue.
*
* The *flags* argument is reserved and must be 0. The helper is
* currently only supported for tc BPF program types at the ingress
* hook and for veth device types. The peer device must reside in a
* different network namespace.
* currently only supported for tc BPF program types at the
* ingress hook and for veth and netkit target device types. The
* peer device must reside in a different network namespace.
* Return
* The helper returns **TC_ACT_REDIRECT** on success or
* **TC_ACT_SHOT** on error.
@ -6487,7 +6550,7 @@ struct bpf_map_info {
__u32 btf_id;
__u32 btf_key_type_id;
__u32 btf_value_type_id;
__u32 :32; /* alignment pad */
__u32 btf_vmlinux_id;
__u64 map_extra;
} __attribute__((aligned(8)));
@ -6563,6 +6626,7 @@ struct bpf_link_info {
__u32 count; /* in/out: kprobe_multi function count */
__u32 flags;
__u64 missed;
__aligned_u64 cookies;
} kprobe_multi;
struct {
__aligned_u64 path;
@ -6582,6 +6646,7 @@ struct bpf_link_info {
__aligned_u64 file_name; /* in/out */
__u32 name_len;
__u32 offset; /* offset from file_name */
__u64 cookie;
} uprobe; /* BPF_PERF_EVENT_UPROBE, BPF_PERF_EVENT_URETPROBE */
struct {
__aligned_u64 func_name; /* in/out */
@ -6589,14 +6654,19 @@ struct bpf_link_info {
__u32 offset; /* offset from func_name */
__u64 addr;
__u64 missed;
__u64 cookie;
} kprobe; /* BPF_PERF_EVENT_KPROBE, BPF_PERF_EVENT_KRETPROBE */
struct {
__aligned_u64 tp_name; /* in/out */
__u32 name_len;
__u32 :32;
__u64 cookie;
} tracepoint; /* BPF_PERF_EVENT_TRACEPOINT */
struct {
__u64 config;
__u32 type;
__u32 :32;
__u64 cookie;
} event; /* BPF_PERF_EVENT_EVENT */
};
} perf_event;
@ -6904,6 +6974,7 @@ enum {
BPF_TCP_LISTEN,
BPF_TCP_CLOSING, /* Now a valid state */
BPF_TCP_NEW_SYN_RECV,
BPF_TCP_BOUND_INACTIVE,
BPF_TCP_MAX_STATES /* Leave at the end! */
};


@ -1,4 +1,4 @@
libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o \
netlink.o bpf_prog_linfo.o libbpf_probes.o hashmap.o \
btf_dump.o ringbuf.o strset.o linker.o gen_loader.o relo_core.o \
usdt.o zip.o elf.o
usdt.o zip.o elf.o features.o


@ -103,7 +103,7 @@ int sys_bpf_prog_load(union bpf_attr *attr, unsigned int size, int attempts)
* [0] https://lore.kernel.org/bpf/20201201215900.3569844-1-guro@fb.com/
* [1] d05512618056 ("bpf: Add bpf_ktime_get_coarse_ns helper")
*/
int probe_memcg_account(void)
int probe_memcg_account(int token_fd)
{
const size_t attr_sz = offsetofend(union bpf_attr, attach_btf_obj_fd);
struct bpf_insn insns[] = {
@ -120,6 +120,9 @@ int probe_memcg_account(void)
attr.insns = ptr_to_u64(insns);
attr.insn_cnt = insn_cnt;
attr.license = ptr_to_u64("GPL");
attr.prog_token_fd = token_fd;
if (token_fd)
attr.prog_flags |= BPF_F_TOKEN_FD;
prog_fd = sys_bpf_fd(BPF_PROG_LOAD, &attr, attr_sz);
if (prog_fd >= 0) {
@ -146,7 +149,7 @@ int bump_rlimit_memlock(void)
struct rlimit rlim;
/* if kernel supports memcg-based accounting, skip bumping RLIMIT_MEMLOCK */
if (memlock_bumped || kernel_supports(NULL, FEAT_MEMCG_ACCOUNT))
if (memlock_bumped || feat_supported(NULL, FEAT_MEMCG_ACCOUNT))
return 0;
memlock_bumped = true;
@ -169,7 +172,7 @@ int bpf_map_create(enum bpf_map_type map_type,
__u32 max_entries,
const struct bpf_map_create_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, map_extra);
const size_t attr_sz = offsetofend(union bpf_attr, map_token_fd);
union bpf_attr attr;
int fd;
@ -181,7 +184,7 @@ int bpf_map_create(enum bpf_map_type map_type,
return libbpf_err(-EINVAL);
attr.map_type = map_type;
if (map_name && kernel_supports(NULL, FEAT_PROG_NAME))
if (map_name && feat_supported(NULL, FEAT_PROG_NAME))
libbpf_strlcpy(attr.map_name, map_name, sizeof(attr.map_name));
attr.key_size = key_size;
attr.value_size = value_size;
@ -191,6 +194,7 @@ int bpf_map_create(enum bpf_map_type map_type,
attr.btf_key_type_id = OPTS_GET(opts, btf_key_type_id, 0);
attr.btf_value_type_id = OPTS_GET(opts, btf_value_type_id, 0);
attr.btf_vmlinux_value_type_id = OPTS_GET(opts, btf_vmlinux_value_type_id, 0);
attr.value_type_btf_obj_fd = OPTS_GET(opts, value_type_btf_obj_fd, 0);
attr.inner_map_fd = OPTS_GET(opts, inner_map_fd, 0);
attr.map_flags = OPTS_GET(opts, map_flags, 0);
@ -198,6 +202,8 @@ int bpf_map_create(enum bpf_map_type map_type,
attr.numa_node = OPTS_GET(opts, numa_node, 0);
attr.map_ifindex = OPTS_GET(opts, map_ifindex, 0);
attr.map_token_fd = OPTS_GET(opts, token_fd, 0);
fd = sys_bpf_fd(BPF_MAP_CREATE, &attr, attr_sz);
return libbpf_err_errno(fd);
}
@ -232,7 +238,7 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
const struct bpf_insn *insns, size_t insn_cnt,
struct bpf_prog_load_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, log_true_size);
const size_t attr_sz = offsetofend(union bpf_attr, prog_token_fd);
void *finfo = NULL, *linfo = NULL;
const char *func_info, *line_info;
__u32 log_size, log_level, attach_prog_fd, attach_btf_obj_fd;
@ -261,8 +267,9 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
attr.prog_flags = OPTS_GET(opts, prog_flags, 0);
attr.prog_ifindex = OPTS_GET(opts, prog_ifindex, 0);
attr.kern_version = OPTS_GET(opts, kern_version, 0);
attr.prog_token_fd = OPTS_GET(opts, token_fd, 0);
if (prog_name && kernel_supports(NULL, FEAT_PROG_NAME))
if (prog_name && feat_supported(NULL, FEAT_PROG_NAME))
libbpf_strlcpy(attr.prog_name, prog_name, sizeof(attr.prog_name));
attr.license = ptr_to_u64(license);
@ -1182,7 +1189,7 @@ int bpf_raw_tracepoint_open(const char *name, int prog_fd)
int bpf_btf_load(const void *btf_data, size_t btf_size, struct bpf_btf_load_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, btf_log_true_size);
const size_t attr_sz = offsetofend(union bpf_attr, btf_token_fd);
union bpf_attr attr;
char *log_buf;
size_t log_size;
@ -1207,6 +1214,10 @@ int bpf_btf_load(const void *btf_data, size_t btf_size, struct bpf_btf_load_opts
attr.btf = ptr_to_u64(btf_data);
attr.btf_size = btf_size;
attr.btf_flags = OPTS_GET(opts, btf_flags, 0);
attr.btf_token_fd = OPTS_GET(opts, token_fd, 0);
/* log_level == 0 and log_buf != NULL means "try loading without
* log_buf, but retry with log_buf and log_level=1 on error", which is
* consistent across low-level and high-level BTF and program loading
@ -1287,3 +1298,20 @@ int bpf_prog_bind_map(int prog_fd, int map_fd,
ret = sys_bpf(BPF_PROG_BIND_MAP, &attr, attr_sz);
return libbpf_err_errno(ret);
}
int bpf_token_create(int bpffs_fd, struct bpf_token_create_opts *opts)
{
const size_t attr_sz = offsetofend(union bpf_attr, token_create);
union bpf_attr attr;
int fd;
if (!OPTS_VALID(opts, bpf_token_create_opts))
return libbpf_err(-EINVAL);
memset(&attr, 0, attr_sz);
attr.token_create.bpffs_fd = bpffs_fd;
attr.token_create.flags = OPTS_GET(opts, flags, 0);
fd = sys_bpf_fd(BPF_TOKEN_CREATE, &attr, attr_sz);
return libbpf_err_errno(fd);
}
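
Note on the `attr_sz` changes in this file: each wrapper sizes the attribute block with `offsetofend()` of the last field it fills in, so moving from `map_extra` to `map_token_fd` (and likewise for the prog and BTF variants) is what makes the new trailing field reach the kernel. The sketch below illustrates that, together with the "set the flag only when a token FD is given" convention used by `probe_memcg_account()`. `mock_map_attr`, `MOCK_F_TOKEN_FD`, `later_field` and `fill_map_attr` are hypothetical stand-ins, not the real `union bpf_attr` layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offset of the byte just past FIELD, as libbpf and the kernel define it. */
#define offsetofend(TYPE, FIELD) \
	(offsetof(TYPE, FIELD) + sizeof(((TYPE *)0)->FIELD))

/* Value taken from the uapi hunk in this diff (BPF_F_TOKEN_FD). */
#define MOCK_F_TOKEN_FD (1U << 16)

/* Hypothetical, heavily trimmed stand-in for the BPF_MAP_CREATE attributes. */
struct mock_map_attr {
	uint32_t map_type;
	uint32_t map_flags;
	uint64_t map_extra;
	int32_t  value_type_btf_obj_fd;
	int32_t  map_token_fd;
	uint64_t later_field;	/* stands in for anything laid out after this part */
};

/*
 * Zero and fill only the bytes up to the last field this caller knows about,
 * and return that size, the way attr_sz is handed to the syscall wrapper.
 */
static size_t fill_map_attr(struct mock_map_attr *attr, int token_fd)
{
	const size_t attr_sz = offsetofend(struct mock_map_attr, map_token_fd);

	memset(attr, 0, attr_sz);
	attr->map_token_fd = token_fd;
	if (token_fd)	/* same convention as the libbpf probes: 0 means "no token" */
		attr->map_flags |= MOCK_F_TOKEN_FD;
	return attr_sz;
}
```

The explicit flag presumably exists because 0 is itself a valid file descriptor number, so the kernel cannot infer "no token" from the field value alone. Note that `bpf_map_create()` in the diff copies `token_fd` from the options without touching `map_flags`; setting `BPF_F_TOKEN_FD` there is left to the caller.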


@ -51,8 +51,12 @@ struct bpf_map_create_opts {
__u32 numa_node;
__u32 map_ifindex;
__s32 value_type_btf_obj_fd;
__u32 token_fd;
size_t :0;
};
#define bpf_map_create_opts__last_field map_ifindex
#define bpf_map_create_opts__last_field token_fd
LIBBPF_API int bpf_map_create(enum bpf_map_type map_type,
const char *map_name,
@ -102,9 +106,10 @@ struct bpf_prog_load_opts {
* If kernel doesn't support this feature, log_size is left unchanged.
*/
__u32 log_true_size;
__u32 token_fd;
size_t :0;
};
#define bpf_prog_load_opts__last_field log_true_size
#define bpf_prog_load_opts__last_field token_fd
LIBBPF_API int bpf_prog_load(enum bpf_prog_type prog_type,
const char *prog_name, const char *license,
@ -130,9 +135,12 @@ struct bpf_btf_load_opts {
* If kernel doesn't support this feature, log_size is left unchanged.
*/
__u32 log_true_size;
__u32 btf_flags;
__u32 token_fd;
size_t :0;
};
#define bpf_btf_load_opts__last_field log_true_size
#define bpf_btf_load_opts__last_field token_fd
LIBBPF_API int bpf_btf_load(const void *btf_data, size_t btf_size,
struct bpf_btf_load_opts *opts);
@ -640,6 +648,30 @@ struct bpf_test_run_opts {
LIBBPF_API int bpf_prog_test_run_opts(int prog_fd,
struct bpf_test_run_opts *opts);
struct bpf_token_create_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
__u32 flags;
size_t :0;
};
#define bpf_token_create_opts__last_field flags
/**
* @brief **bpf_token_create()** creates a new instance of BPF token derived
* from specified BPF FS mount point.
*
* BPF token created with this API can be passed to bpf() syscall for
* commands like BPF_PROG_LOAD, BPF_MAP_CREATE, etc.
*
* @param bpffs_fd FD for BPF FS instance from which to derive a BPF token
* instance.
* @param opts optional BPF token creation options, can be NULL
*
* @return BPF token FD > 0, on success; negative error code, otherwise (errno
* is also set to the error code)
*/
LIBBPF_API int bpf_token_create(int bpffs_fd,
struct bpf_token_create_opts *opts);
#ifdef __cplusplus
} /* extern "C" */
#endif
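
Note on the options structs in this header: every `*_opts` struct starts with `sz`, new fields such as `token_fd` are appended at the end, and the matching `__last_field` macro is bumped, which lets the library tell whether a caller compiled against an older layout. A simplified sketch of the field lookup, assuming a hypothetical `demo_opts` struct and a cut-down `DEMO_OPTS_GET()` macro (libbpf's own `OPTS_GET()` and `OPTS_VALID()` do additional validation):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define offsetofend(TYPE, FIELD) \
	(offsetof(TYPE, FIELD) + sizeof(((TYPE *)0)->FIELD))

/* Hypothetical options struct following the layout convention in this header. */
struct demo_opts {
	size_t sz;		/* size of this struct as the caller compiled it */
	uint32_t log_level;
	uint32_t token_fd;	/* newer field, appended at the end */
	size_t :0;
};

/* Simplified stand-in for OPTS_GET(): read a field only if the caller's struct has it. */
#define DEMO_OPTS_GET(opts, field, fallback) \
	(((opts) && (opts)->sz >= offsetofend(struct demo_opts, field)) ? \
	 (opts)->field : (fallback))

/* Returns the token FD the callee would use, or 0 when the caller did not supply one. */
static uint32_t demo_token_fd(const struct demo_opts *opts)
{
	return DEMO_OPTS_GET(opts, token_fd, 0);
}
```

A caller built against an older header reports a smaller `sz`, so the lookup falls back to 0 instead of reading memory the caller never initialised; passing NULL options behaves the same way.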


@ -268,7 +268,7 @@ enum bpf_enum_value_kind {
* a relocation, which records BTF type ID describing root struct/union and an
* accessor string which describes exact embedded field that was used to take
* an address. See detailed description of this relocation format and
* semantics in comments to struct bpf_field_reloc in libbpf_internal.h.
* semantics in comments to struct bpf_core_relo in include/uapi/linux/bpf.h.
*
* This relocation allows libbpf to adjust BPF instruction to use correct
* actual field offset, based on target kernel BTF type that matches original


@ -1317,7 +1317,9 @@ struct btf *btf__parse_split(const char *path, struct btf *base_btf)
static void *btf_get_raw_data(const struct btf *btf, __u32 *size, bool swap_endian);
int btf_load_into_kernel(struct btf *btf, char *log_buf, size_t log_sz, __u32 log_level)
int btf_load_into_kernel(struct btf *btf,
char *log_buf, size_t log_sz, __u32 log_level,
int token_fd)
{
LIBBPF_OPTS(bpf_btf_load_opts, opts);
__u32 buf_sz = 0, raw_size;
@ -1367,6 +1369,10 @@ int btf_load_into_kernel(struct btf *btf, char *log_buf, size_t log_sz, __u32 lo
opts.log_level = log_level;
}
opts.token_fd = token_fd;
if (token_fd)
opts.btf_flags |= BPF_F_TOKEN_FD;
btf->fd = bpf_btf_load(raw_data, raw_size, &opts);
if (btf->fd < 0) {
/* time to turn on verbose mode and try again */
@ -1394,7 +1400,7 @@ int btf_load_into_kernel(struct btf *btf, char *log_buf, size_t log_sz, __u32 lo
int btf__load_into_kernel(struct btf *btf)
{
return btf_load_into_kernel(btf, NULL, 0, 0);
return btf_load_into_kernel(btf, NULL, 0, 0, 0);
}
int btf__fd(const struct btf *btf)


@ -11,8 +11,6 @@
#include "libbpf_internal.h"
#include "str_error.h"
#define STRERR_BUFSIZE 128
/* A SHT_GNU_versym section holds 16-bit words. This bit is set if
* the symbol is hidden and can only be seen when referenced using an
* explicit version number. This is a GNU extension.

tools/lib/bpf/features.c (new file)

@ -0,0 +1,503 @@
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
#include <linux/kernel.h>
#include <linux/filter.h>
#include "bpf.h"
#include "libbpf.h"
#include "libbpf_common.h"
#include "libbpf_internal.h"
#include "str_error.h"
static inline __u64 ptr_to_u64(const void *ptr)
{
return (__u64)(unsigned long)ptr;
}
int probe_fd(int fd)
{
if (fd >= 0)
close(fd);
return fd >= 0;
}
static int probe_kern_prog_name(int token_fd)
{
const size_t attr_sz = offsetofend(union bpf_attr, prog_name);
struct bpf_insn insns[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
union bpf_attr attr;
int ret;
memset(&attr, 0, attr_sz);
attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
attr.license = ptr_to_u64("GPL");
attr.insns = ptr_to_u64(insns);
attr.insn_cnt = (__u32)ARRAY_SIZE(insns);
attr.prog_token_fd = token_fd;
if (token_fd)
attr.prog_flags |= BPF_F_TOKEN_FD;
libbpf_strlcpy(attr.prog_name, "libbpf_nametest", sizeof(attr.prog_name));
/* make sure loading with name works */
ret = sys_bpf_prog_load(&attr, attr_sz, PROG_LOAD_ATTEMPTS);
return probe_fd(ret);
}
static int probe_kern_global_data(int token_fd)
{
char *cp, errmsg[STRERR_BUFSIZE];
struct bpf_insn insns[] = {
BPF_LD_MAP_VALUE(BPF_REG_1, 0, 16),
BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 42),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
LIBBPF_OPTS(bpf_map_create_opts, map_opts,
.token_fd = token_fd,
.map_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
LIBBPF_OPTS(bpf_prog_load_opts, prog_opts,
.token_fd = token_fd,
.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
int ret, map, insn_cnt = ARRAY_SIZE(insns);
map = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_global", sizeof(int), 32, 1, &map_opts);
if (map < 0) {
ret = -errno;
cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg));
pr_warn("Error in %s():%s(%d). Couldn't create simple array map.\n",
__func__, cp, -ret);
return ret;
}
insns[0].imm = map;
ret = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, &prog_opts);
close(map);
return probe_fd(ret);
}
static int probe_kern_btf(int token_fd)
{
static const char strs[] = "\0int";
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_btf_func(int token_fd)
{
static const char strs[] = "\0int\0x\0a";
/* void x(int a) {} */
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
/* FUNC_PROTO */ /* [2] */
BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_FUNC_PROTO, 0, 1), 0),
BTF_PARAM_ENC(7, 1),
/* FUNC x */ /* [3] */
BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, 0), 2),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_btf_func_global(int token_fd)
{
static const char strs[] = "\0int\0x\0a";
/* static void x(int a) {} */
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
/* FUNC_PROTO */ /* [2] */
BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_FUNC_PROTO, 0, 1), 0),
BTF_PARAM_ENC(7, 1),
/* FUNC x BTF_FUNC_GLOBAL */ /* [3] */
BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, BTF_FUNC_GLOBAL), 2),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_btf_datasec(int token_fd)
{
static const char strs[] = "\0x\0.data";
/* static int a; */
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
/* VAR x */ /* [2] */
BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_VAR, 0, 0), 1),
BTF_VAR_STATIC,
/* DATASEC val */ /* [3] */
BTF_TYPE_ENC(3, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4),
BTF_VAR_SECINFO_ENC(2, 0, 4),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_btf_float(int token_fd)
{
static const char strs[] = "\0float";
__u32 types[] = {
/* float */
BTF_TYPE_FLOAT_ENC(1, 4),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_btf_decl_tag(int token_fd)
{
static const char strs[] = "\0tag";
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
/* VAR x */ /* [2] */
BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_VAR, 0, 0), 1),
BTF_VAR_STATIC,
/* attr */
BTF_TYPE_DECL_TAG_ENC(1, 2, -1),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_btf_type_tag(int token_fd)
{
static const char strs[] = "\0tag";
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
/* attr */
BTF_TYPE_TYPE_TAG_ENC(1, 1), /* [2] */
/* ptr */
BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), 2), /* [3] */
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
static int probe_kern_array_mmap(int token_fd)
{
LIBBPF_OPTS(bpf_map_create_opts, opts,
.map_flags = BPF_F_MMAPABLE | (token_fd ? BPF_F_TOKEN_FD : 0),
.token_fd = token_fd,
);
int fd;
fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_mmap", sizeof(int), sizeof(int), 1, &opts);
return probe_fd(fd);
}
static int probe_kern_exp_attach_type(int token_fd)
{
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.expected_attach_type = BPF_CGROUP_INET_SOCK_CREATE,
.token_fd = token_fd,
.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
struct bpf_insn insns[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
int fd, insn_cnt = ARRAY_SIZE(insns);
/* use any valid combination of program type and (optional)
* non-zero expected attach type (i.e., not a BPF_CGROUP_INET_INGRESS)
* to see if kernel supports expected_attach_type field for
* BPF_PROG_LOAD command
*/
fd = bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, NULL, "GPL", insns, insn_cnt, &opts);
return probe_fd(fd);
}
static int probe_kern_probe_read_kernel(int token_fd)
{
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.token_fd = token_fd,
.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
struct bpf_insn insns[] = {
BPF_MOV64_REG(BPF_REG_1, BPF_REG_10), /* r1 = r10 (fp) */
BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8), /* r1 += -8 */
BPF_MOV64_IMM(BPF_REG_2, 8), /* r2 = 8 */
BPF_MOV64_IMM(BPF_REG_3, 0), /* r3 = 0 */
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_probe_read_kernel),
BPF_EXIT_INSN(),
};
int fd, insn_cnt = ARRAY_SIZE(insns);
fd = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL", insns, insn_cnt, &opts);
return probe_fd(fd);
}
static int probe_prog_bind_map(int token_fd)
{
char *cp, errmsg[STRERR_BUFSIZE];
struct bpf_insn insns[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
LIBBPF_OPTS(bpf_map_create_opts, map_opts,
.token_fd = token_fd,
.map_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
LIBBPF_OPTS(bpf_prog_load_opts, prog_opts,
.token_fd = token_fd,
.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
int ret, map, prog, insn_cnt = ARRAY_SIZE(insns);
map = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_det_bind", sizeof(int), 32, 1, &map_opts);
if (map < 0) {
ret = -errno;
cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg));
pr_warn("Error in %s():%s(%d). Couldn't create simple array map.\n",
__func__, cp, -ret);
return ret;
}
prog = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, &prog_opts);
if (prog < 0) {
close(map);
return 0;
}
ret = bpf_prog_bind_map(prog, map, NULL);
close(map);
close(prog);
return ret >= 0;
}
static int probe_module_btf(int token_fd)
{
static const char strs[] = "\0int";
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4),
};
struct bpf_btf_info info;
__u32 len = sizeof(info);
char name[16];
int fd, err;
fd = libbpf__load_raw_btf((char *)types, sizeof(types), strs, sizeof(strs), token_fd);
if (fd < 0)
return 0; /* BTF not supported at all */
memset(&info, 0, sizeof(info));
info.name = ptr_to_u64(name);
info.name_len = sizeof(name);
/* check that BPF_OBJ_GET_INFO_BY_FD supports specifying name pointer;
* kernel's module BTF support coincides with support for
* name/name_len fields in struct bpf_btf_info.
*/
err = bpf_btf_get_info_by_fd(fd, &info, &len);
close(fd);
return !err;
}
static int probe_perf_link(int token_fd)
{
struct bpf_insn insns[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.token_fd = token_fd,
.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
int prog_fd, link_fd, err;
prog_fd = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL",
insns, ARRAY_SIZE(insns), &opts);
if (prog_fd < 0)
return -errno;
/* use invalid perf_event FD to get EBADF, if link is supported;
* otherwise EINVAL should be returned
*/
link_fd = bpf_link_create(prog_fd, -1, BPF_PERF_EVENT, NULL);
err = -errno; /* close() can clobber errno */
if (link_fd >= 0)
close(link_fd);
close(prog_fd);
return link_fd < 0 && err == -EBADF;
}
static int probe_uprobe_multi_link(int token_fd)
{
LIBBPF_OPTS(bpf_prog_load_opts, load_opts,
.expected_attach_type = BPF_TRACE_UPROBE_MULTI,
.token_fd = token_fd,
.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
LIBBPF_OPTS(bpf_link_create_opts, link_opts);
struct bpf_insn insns[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
int prog_fd, link_fd, err;
unsigned long offset = 0;
prog_fd = bpf_prog_load(BPF_PROG_TYPE_KPROBE, NULL, "GPL",
insns, ARRAY_SIZE(insns), &load_opts);
if (prog_fd < 0)
return -errno;
/* Creating uprobe in '/' binary should fail with -EBADF. */
link_opts.uprobe_multi.path = "/";
link_opts.uprobe_multi.offsets = &offset;
link_opts.uprobe_multi.cnt = 1;
link_fd = bpf_link_create(prog_fd, -1, BPF_TRACE_UPROBE_MULTI, &link_opts);
err = -errno; /* close() can clobber errno */
if (link_fd >= 0)
close(link_fd);
close(prog_fd);
return link_fd < 0 && err == -EBADF;
}
static int probe_kern_bpf_cookie(int token_fd)
{
struct bpf_insn insns[] = {
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_attach_cookie),
BPF_EXIT_INSN(),
};
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.token_fd = token_fd,
.prog_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
int ret, insn_cnt = ARRAY_SIZE(insns);
ret = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL", insns, insn_cnt, &opts);
return probe_fd(ret);
}
static int probe_kern_btf_enum64(int token_fd)
{
static const char strs[] = "\0enum64";
__u32 types[] = {
BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_ENUM64, 0, 0), 8),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs), token_fd));
}
typedef int (*feature_probe_fn)(int /* token_fd */);
static struct kern_feature_cache feature_cache;
static struct kern_feature_desc {
const char *desc;
feature_probe_fn probe;
} feature_probes[__FEAT_CNT] = {
[FEAT_PROG_NAME] = {
"BPF program name", probe_kern_prog_name,
},
[FEAT_GLOBAL_DATA] = {
"global variables", probe_kern_global_data,
},
[FEAT_BTF] = {
"minimal BTF", probe_kern_btf,
},
[FEAT_BTF_FUNC] = {
"BTF functions", probe_kern_btf_func,
},
[FEAT_BTF_GLOBAL_FUNC] = {
"BTF global function", probe_kern_btf_func_global,
},
[FEAT_BTF_DATASEC] = {
"BTF data section and variable", probe_kern_btf_datasec,
},
[FEAT_ARRAY_MMAP] = {
"ARRAY map mmap()", probe_kern_array_mmap,
},
[FEAT_EXP_ATTACH_TYPE] = {
"BPF_PROG_LOAD expected_attach_type attribute",
probe_kern_exp_attach_type,
},
[FEAT_PROBE_READ_KERN] = {
"bpf_probe_read_kernel() helper", probe_kern_probe_read_kernel,
},
[FEAT_PROG_BIND_MAP] = {
"BPF_PROG_BIND_MAP support", probe_prog_bind_map,
},
[FEAT_MODULE_BTF] = {
"module BTF support", probe_module_btf,
},
[FEAT_BTF_FLOAT] = {
"BTF_KIND_FLOAT support", probe_kern_btf_float,
},
[FEAT_PERF_LINK] = {
"BPF perf link support", probe_perf_link,
},
[FEAT_BTF_DECL_TAG] = {
"BTF_KIND_DECL_TAG support", probe_kern_btf_decl_tag,
},
[FEAT_BTF_TYPE_TAG] = {
"BTF_KIND_TYPE_TAG support", probe_kern_btf_type_tag,
},
[FEAT_MEMCG_ACCOUNT] = {
"memcg-based memory accounting", probe_memcg_account,
},
[FEAT_BPF_COOKIE] = {
"BPF cookie support", probe_kern_bpf_cookie,
},
[FEAT_BTF_ENUM64] = {
"BTF_KIND_ENUM64 support", probe_kern_btf_enum64,
},
[FEAT_SYSCALL_WRAPPER] = {
"Kernel using syscall wrapper", probe_kern_syscall_wrapper,
},
[FEAT_UPROBE_MULTI_LINK] = {
"BPF multi-uprobe link support", probe_uprobe_multi_link,
},
};
bool feat_supported(struct kern_feature_cache *cache, enum kern_feature_id feat_id)
{
struct kern_feature_desc *feat = &feature_probes[feat_id];
int ret;
/* assume global feature cache, unless custom one is provided */
if (!cache)
cache = &feature_cache;
if (READ_ONCE(cache->res[feat_id]) == FEAT_UNKNOWN) {
ret = feat->probe(cache->token_fd);
if (ret > 0) {
WRITE_ONCE(cache->res[feat_id], FEAT_SUPPORTED);
} else if (ret == 0) {
WRITE_ONCE(cache->res[feat_id], FEAT_MISSING);
} else {
pr_warn("Detection of kernel %s support failed: %d\n", feat->desc, ret);
WRITE_ONCE(cache->res[feat_id], FEAT_MISSING);
}
}
return READ_ONCE(cache->res[feat_id]) == FEAT_SUPPORTED;
}
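
Note on the cache logic above: feat_supported() runs each probe at most once per cache and stores a tri-state result, so an object that holds a BPF token gets its own probe results rather than sharing the global ones. The following is a minimal standalone sketch of that pattern, not the libbpf code itself. All sketch_* names are hypothetical, the probes are stubs that make no bpf() syscalls, and the READ_ONCE/WRITE_ONCE accessors used in the real function are left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Tri-state result; zero-initialized storage means "not probed yet". */
enum sketch_feat_result {
	SKETCH_UNKNOWN = 0,
	SKETCH_SUPPORTED = 1,
	SKETCH_MISSING = 2,
};

/* Hypothetical feature IDs, for illustration only. */
enum sketch_feat_id {
	SKETCH_FEAT_A,
	SKETCH_FEAT_B,
	SKETCH_FEAT_C,
	SKETCH_FEAT_CNT,
};

/* One cache per token: results probed with a token may differ from
 * results probed without one, so they are stored separately.
 */
struct sketch_feat_cache {
	enum sketch_feat_result res[SKETCH_FEAT_CNT];
	int token_fd;
};

typedef int (*sketch_probe_fn)(int token_fd);

static int sketch_probe_calls; /* counts how often any stub probe ran */

/* Stub probes: > 0 means supported, 0 means missing, < 0 means error. */
static int sketch_probe_yes(int token_fd) { (void)token_fd; sketch_probe_calls++; return 1; }
static int sketch_probe_no(int token_fd) { (void)token_fd; sketch_probe_calls++; return 0; }
static int sketch_probe_err(int token_fd) { (void)token_fd; sketch_probe_calls++; return -1; }

static const sketch_probe_fn sketch_probes[SKETCH_FEAT_CNT] = {
	[SKETCH_FEAT_A] = sketch_probe_yes,
	[SKETCH_FEAT_B] = sketch_probe_no,
	[SKETCH_FEAT_C] = sketch_probe_err,
};

static struct sketch_feat_cache sketch_global_cache;

static bool sketch_feat_supported(struct sketch_feat_cache *cache,
				  enum sketch_feat_id id)
{
	assert((int)id >= 0 && id < SKETCH_FEAT_CNT);

	/* fall back to the global cache unless a custom one is given */
	if (!cache)
		cache = &sketch_global_cache;

	if (cache->res[id] == SKETCH_UNKNOWN) {
		int ret = sketch_probes[id](cache->token_fd);

		/* a failed probe is recorded as "missing", like a 0 result */
		cache->res[id] = ret > 0 ? SKETCH_SUPPORTED : SKETCH_MISSING;
	}
	return cache->res[id] == SKETCH_SUPPORTED;
}
```

Passing NULL selects the global cache. Passing a separately allocated cache (as bpf_object_prepare_token() does for obj->feat_cache) makes the same feature get probed again with that cache's token_fd.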

@ -59,6 +59,8 @@
#define BPF_FS_MAGIC 0xcafe4a11
#endif
#define BPF_FS_DEFAULT_PATH "/sys/fs/bpf"
#define BPF_INSN_SZ (sizeof(struct bpf_insn))
/* vsprintf() in __base_pr() uses nonliteral format string. It may break
@ -70,6 +72,7 @@
static struct bpf_map *bpf_object__add_map(struct bpf_object *obj);
static bool prog_is_subprog(const struct bpf_object *obj, const struct bpf_program *prog);
static int map_set_def_max_entries(struct bpf_map *map);
static const char * const attach_type_name[] = {
[BPF_CGROUP_INET_INGRESS] = "cgroup_inet_ingress",
@ -527,6 +530,7 @@ struct bpf_map {
struct bpf_map_def def;
__u32 numa_node;
__u32 btf_var_idx;
int mod_btf_fd;
__u32 btf_key_type_id;
__u32 btf_value_type_id;
__u32 btf_vmlinux_value_type_id;
@ -693,6 +697,10 @@ struct bpf_object {
struct usdt_manager *usdt_man;
struct kern_feature_cache *feat_cache;
char *token_path;
int token_fd;
char path[];
};
@ -930,22 +938,29 @@ find_member_by_name(const struct btf *btf, const struct btf_type *t,
return NULL;
}
static int find_ksym_btf_id(struct bpf_object *obj, const char *ksym_name,
__u16 kind, struct btf **res_btf,
struct module_btf **res_mod_btf);
#define STRUCT_OPS_VALUE_PREFIX "bpf_struct_ops_"
static int find_btf_by_prefix_kind(const struct btf *btf, const char *prefix,
const char *name, __u32 kind);
static int
find_struct_ops_kern_types(const struct btf *btf, const char *tname,
find_struct_ops_kern_types(struct bpf_object *obj, const char *tname,
struct module_btf **mod_btf,
const struct btf_type **type, __u32 *type_id,
const struct btf_type **vtype, __u32 *vtype_id,
const struct btf_member **data_member)
{
const struct btf_type *kern_type, *kern_vtype;
const struct btf_member *kern_data_member;
struct btf *btf;
__s32 kern_vtype_id, kern_type_id;
__u32 i;
kern_type_id = btf__find_by_name_kind(btf, tname, BTF_KIND_STRUCT);
kern_type_id = find_ksym_btf_id(obj, tname, BTF_KIND_STRUCT,
&btf, mod_btf);
if (kern_type_id < 0) {
pr_warn("struct_ops init_kern: struct %s is not found in kernel BTF\n",
tname);
@ -999,14 +1014,16 @@ static bool bpf_map__is_struct_ops(const struct bpf_map *map)
}
/* Init the map's fields that depend on kern_btf */
static int bpf_map__init_kern_struct_ops(struct bpf_map *map,
const struct btf *btf,
const struct btf *kern_btf)
static int bpf_map__init_kern_struct_ops(struct bpf_map *map)
{
const struct btf_member *member, *kern_member, *kern_data_member;
const struct btf_type *type, *kern_type, *kern_vtype;
__u32 i, kern_type_id, kern_vtype_id, kern_data_off;
struct bpf_object *obj = map->obj;
const struct btf *btf = obj->btf;
struct bpf_struct_ops *st_ops;
const struct btf *kern_btf;
struct module_btf *mod_btf;
void *data, *kern_data;
const char *tname;
int err;
@ -1014,16 +1031,19 @@ static int bpf_map__init_kern_struct_ops(struct bpf_map *map,
st_ops = map->st_ops;
type = st_ops->type;
tname = st_ops->tname;
err = find_struct_ops_kern_types(kern_btf, tname,
err = find_struct_ops_kern_types(obj, tname, &mod_btf,
&kern_type, &kern_type_id,
&kern_vtype, &kern_vtype_id,
&kern_data_member);
if (err)
return err;
kern_btf = mod_btf ? mod_btf->btf : obj->btf_vmlinux;
pr_debug("struct_ops init_kern %s: type_id:%u kern_type_id:%u kern_vtype_id:%u\n",
map->name, st_ops->type_id, kern_type_id, kern_vtype_id);
map->mod_btf_fd = mod_btf ? mod_btf->fd : -1;
map->def.value_size = kern_vtype->size;
map->btf_vmlinux_value_type_id = kern_vtype_id;
@ -1099,6 +1119,8 @@ static int bpf_map__init_kern_struct_ops(struct bpf_map *map,
return -ENOTSUP;
}
if (mod_btf)
prog->attach_btf_obj_fd = mod_btf->fd;
prog->attach_btf_id = kern_type_id;
prog->expected_attach_type = kern_member_idx;
@ -1141,8 +1163,7 @@ static int bpf_object__init_kern_struct_ops_maps(struct bpf_object *obj)
if (!bpf_map__is_struct_ops(map))
continue;
err = bpf_map__init_kern_struct_ops(map, obj->btf,
obj->btf_vmlinux);
err = bpf_map__init_kern_struct_ops(map);
if (err)
return err;
}
@ -2216,7 +2237,7 @@ static int build_map_pin_path(struct bpf_map *map, const char *path)
int err;
if (!path)
path = "/sys/fs/bpf";
path = BPF_FS_DEFAULT_PATH;
err = pathname_concat(buf, sizeof(buf), path, bpf_map__name(map));
if (err)
@ -3225,7 +3246,7 @@ static int bpf_object__sanitize_and_load_btf(struct bpf_object *obj)
} else {
/* currently BPF_BTF_LOAD only supports log_level 1 */
err = btf_load_into_kernel(kern_btf, obj->log_buf, obj->log_size,
obj->log_level ? 1 : 0);
obj->log_level ? 1 : 0, obj->token_fd);
}
if (sanitize) {
if (!err) {
@ -4546,6 +4567,58 @@ int bpf_map__set_max_entries(struct bpf_map *map, __u32 max_entries)
return 0;
}
static int bpf_object_prepare_token(struct bpf_object *obj)
{
const char *bpffs_path;
int bpffs_fd = -1, token_fd, err;
bool mandatory;
enum libbpf_print_level level;
/* token is explicitly prevented */
if (obj->token_path && obj->token_path[0] == '\0') {
pr_debug("object '%s': token is prevented, skipping...\n", obj->name);
return 0;
}
mandatory = obj->token_path != NULL;
level = mandatory ? LIBBPF_WARN : LIBBPF_DEBUG;
bpffs_path = obj->token_path ?: BPF_FS_DEFAULT_PATH;
bpffs_fd = open(bpffs_path, O_DIRECTORY, O_RDWR);
if (bpffs_fd < 0) {
err = -errno;
__pr(level, "object '%s': failed (%d) to open BPF FS mount at '%s'%s\n",
obj->name, err, bpffs_path,
mandatory ? "" : ", skipping optional step...");
return mandatory ? err : 0;
}
token_fd = bpf_token_create(bpffs_fd, 0);
close(bpffs_fd);
if (token_fd < 0) {
if (!mandatory && token_fd == -ENOENT) {
pr_debug("object '%s': BPF FS at '%s' doesn't have BPF token delegation set up, skipping...\n",
obj->name, bpffs_path);
return 0;
}
__pr(level, "object '%s': failed (%d) to create BPF token from '%s'%s\n",
obj->name, token_fd, bpffs_path,
mandatory ? "" : ", skipping optional step...");
return mandatory ? token_fd : 0;
}
obj->feat_cache = calloc(1, sizeof(*obj->feat_cache));
if (!obj->feat_cache) {
close(token_fd);
return -ENOMEM;
}
obj->token_fd = token_fd;
obj->feat_cache->token_fd = token_fd;
return 0;
}
static int
bpf_object__probe_loading(struct bpf_object *obj)
{
@ -4555,6 +4628,10 @@ bpf_object__probe_loading(struct bpf_object *obj)
BPF_EXIT_INSN(),
};
int ret, insn_cnt = ARRAY_SIZE(insns);
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.token_fd = obj->token_fd,
.prog_flags = obj->token_fd ? BPF_F_TOKEN_FD : 0,
);
if (obj->gen_loader)
return 0;
@ -4564,9 +4641,9 @@ bpf_object__probe_loading(struct bpf_object *obj)
pr_warn("Failed to bump RLIMIT_MEMLOCK (err = %d), you might need to do it explicitly!\n", ret);
/* make sure basic loading works */
ret = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, NULL);
ret = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, &opts);
if (ret < 0)
ret = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL", insns, insn_cnt, NULL);
ret = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL", insns, insn_cnt, &opts);
if (ret < 0) {
ret = errno;
cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg));
@ -4581,462 +4658,18 @@ bpf_object__probe_loading(struct bpf_object *obj)
return 0;
}
static int probe_fd(int fd)
{
if (fd >= 0)
close(fd);
return fd >= 0;
}
static int probe_kern_prog_name(void)
{
const size_t attr_sz = offsetofend(union bpf_attr, prog_name);
struct bpf_insn insns[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
union bpf_attr attr;
int ret;
memset(&attr, 0, attr_sz);
attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
attr.license = ptr_to_u64("GPL");
attr.insns = ptr_to_u64(insns);
attr.insn_cnt = (__u32)ARRAY_SIZE(insns);
libbpf_strlcpy(attr.prog_name, "libbpf_nametest", sizeof(attr.prog_name));
/* make sure loading with name works */
ret = sys_bpf_prog_load(&attr, attr_sz, PROG_LOAD_ATTEMPTS);
return probe_fd(ret);
}
static int probe_kern_global_data(void)
{
char *cp, errmsg[STRERR_BUFSIZE];
struct bpf_insn insns[] = {
BPF_LD_MAP_VALUE(BPF_REG_1, 0, 16),
BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 42),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
int ret, map, insn_cnt = ARRAY_SIZE(insns);
map = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_global", sizeof(int), 32, 1, NULL);
if (map < 0) {
ret = -errno;
cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg));
pr_warn("Error in %s():%s(%d). Couldn't create simple array map.\n",
__func__, cp, -ret);
return ret;
}
insns[0].imm = map;
ret = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, NULL);
close(map);
return probe_fd(ret);
}
static int probe_kern_btf(void)
{
static const char strs[] = "\0int";
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs)));
}
static int probe_kern_btf_func(void)
{
static const char strs[] = "\0int\0x\0a";
/* void x(int a) {} */
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
/* FUNC_PROTO */ /* [2] */
BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_FUNC_PROTO, 0, 1), 0),
BTF_PARAM_ENC(7, 1),
/* FUNC x */ /* [3] */
BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, 0), 2),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs)));
}
static int probe_kern_btf_func_global(void)
{
static const char strs[] = "\0int\0x\0a";
/* static void x(int a) {} */
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
/* FUNC_PROTO */ /* [2] */
BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_FUNC_PROTO, 0, 1), 0),
BTF_PARAM_ENC(7, 1),
/* FUNC x BTF_FUNC_GLOBAL */ /* [3] */
BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, BTF_FUNC_GLOBAL), 2),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs)));
}
static int probe_kern_btf_datasec(void)
{
static const char strs[] = "\0x\0.data";
/* static int a; */
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
/* VAR x */ /* [2] */
BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_VAR, 0, 0), 1),
BTF_VAR_STATIC,
/* DATASEC val */ /* [3] */
BTF_TYPE_ENC(3, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4),
BTF_VAR_SECINFO_ENC(2, 0, 4),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs)));
}
static int probe_kern_btf_float(void)
{
static const char strs[] = "\0float";
__u32 types[] = {
/* float */
BTF_TYPE_FLOAT_ENC(1, 4),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs)));
}
static int probe_kern_btf_decl_tag(void)
{
static const char strs[] = "\0tag";
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
/* VAR x */ /* [2] */
BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_VAR, 0, 0), 1),
BTF_VAR_STATIC,
/* attr */
BTF_TYPE_DECL_TAG_ENC(1, 2, -1),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs)));
}
static int probe_kern_btf_type_tag(void)
{
static const char strs[] = "\0tag";
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */
/* attr */
BTF_TYPE_TYPE_TAG_ENC(1, 1), /* [2] */
/* ptr */
BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), 2), /* [3] */
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs)));
}
static int probe_kern_array_mmap(void)
{
LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_MMAPABLE);
int fd;
fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_mmap", sizeof(int), sizeof(int), 1, &opts);
return probe_fd(fd);
}
static int probe_kern_exp_attach_type(void)
{
LIBBPF_OPTS(bpf_prog_load_opts, opts, .expected_attach_type = BPF_CGROUP_INET_SOCK_CREATE);
struct bpf_insn insns[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
int fd, insn_cnt = ARRAY_SIZE(insns);
/* use any valid combination of program type and (optional)
* non-zero expected attach type (i.e., not a BPF_CGROUP_INET_INGRESS)
* to see if kernel supports expected_attach_type field for
* BPF_PROG_LOAD command
*/
fd = bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, NULL, "GPL", insns, insn_cnt, &opts);
return probe_fd(fd);
}
static int probe_kern_probe_read_kernel(void)
{
struct bpf_insn insns[] = {
BPF_MOV64_REG(BPF_REG_1, BPF_REG_10), /* r1 = r10 (fp) */
BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8), /* r1 += -8 */
BPF_MOV64_IMM(BPF_REG_2, 8), /* r2 = 8 */
BPF_MOV64_IMM(BPF_REG_3, 0), /* r3 = 0 */
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_probe_read_kernel),
BPF_EXIT_INSN(),
};
int fd, insn_cnt = ARRAY_SIZE(insns);
fd = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL", insns, insn_cnt, NULL);
return probe_fd(fd);
}
static int probe_prog_bind_map(void)
{
char *cp, errmsg[STRERR_BUFSIZE];
struct bpf_insn insns[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
int ret, map, prog, insn_cnt = ARRAY_SIZE(insns);
map = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_det_bind", sizeof(int), 32, 1, NULL);
if (map < 0) {
ret = -errno;
cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg));
pr_warn("Error in %s():%s(%d). Couldn't create simple array map.\n",
__func__, cp, -ret);
return ret;
}
prog = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, NULL);
if (prog < 0) {
close(map);
return 0;
}
ret = bpf_prog_bind_map(prog, map, NULL);
close(map);
close(prog);
return ret >= 0;
}
static int probe_module_btf(void)
{
static const char strs[] = "\0int";
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4),
};
struct bpf_btf_info info;
__u32 len = sizeof(info);
char name[16];
int fd, err;
fd = libbpf__load_raw_btf((char *)types, sizeof(types), strs, sizeof(strs));
if (fd < 0)
return 0; /* BTF not supported at all */
memset(&info, 0, sizeof(info));
info.name = ptr_to_u64(name);
info.name_len = sizeof(name);
/* check that BPF_OBJ_GET_INFO_BY_FD supports specifying name pointer;
* kernel's module BTF support coincides with support for
* name/name_len fields in struct bpf_btf_info.
*/
err = bpf_btf_get_info_by_fd(fd, &info, &len);
close(fd);
return !err;
}
static int probe_perf_link(void)
{
struct bpf_insn insns[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
int prog_fd, link_fd, err;
prog_fd = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL",
insns, ARRAY_SIZE(insns), NULL);
if (prog_fd < 0)
return -errno;
/* use invalid perf_event FD to get EBADF, if link is supported;
* otherwise EINVAL should be returned
*/
link_fd = bpf_link_create(prog_fd, -1, BPF_PERF_EVENT, NULL);
err = -errno; /* close() can clobber errno */
if (link_fd >= 0)
close(link_fd);
close(prog_fd);
return link_fd < 0 && err == -EBADF;
}
static int probe_uprobe_multi_link(void)
{
LIBBPF_OPTS(bpf_prog_load_opts, load_opts,
.expected_attach_type = BPF_TRACE_UPROBE_MULTI,
);
LIBBPF_OPTS(bpf_link_create_opts, link_opts);
struct bpf_insn insns[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
int prog_fd, link_fd, err;
unsigned long offset = 0;
prog_fd = bpf_prog_load(BPF_PROG_TYPE_KPROBE, NULL, "GPL",
insns, ARRAY_SIZE(insns), &load_opts);
if (prog_fd < 0)
return -errno;
/* Creating uprobe in '/' binary should fail with -EBADF. */
link_opts.uprobe_multi.path = "/";
link_opts.uprobe_multi.offsets = &offset;
link_opts.uprobe_multi.cnt = 1;
link_fd = bpf_link_create(prog_fd, -1, BPF_TRACE_UPROBE_MULTI, &link_opts);
err = -errno; /* close() can clobber errno */
if (link_fd >= 0)
close(link_fd);
close(prog_fd);
return link_fd < 0 && err == -EBADF;
}
static int probe_kern_bpf_cookie(void)
{
struct bpf_insn insns[] = {
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_attach_cookie),
BPF_EXIT_INSN(),
};
int ret, insn_cnt = ARRAY_SIZE(insns);
ret = bpf_prog_load(BPF_PROG_TYPE_KPROBE, NULL, "GPL", insns, insn_cnt, NULL);
return probe_fd(ret);
}
static int probe_kern_btf_enum64(void)
{
static const char strs[] = "\0enum64";
__u32 types[] = {
BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_ENUM64, 0, 0), 8),
};
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs)));
}
static int probe_kern_syscall_wrapper(void);
enum kern_feature_result {
FEAT_UNKNOWN = 0,
FEAT_SUPPORTED = 1,
FEAT_MISSING = 2,
};
typedef int (*feature_probe_fn)(void);
static struct kern_feature_desc {
const char *desc;
feature_probe_fn probe;
enum kern_feature_result res;
} feature_probes[__FEAT_CNT] = {
[FEAT_PROG_NAME] = {
"BPF program name", probe_kern_prog_name,
},
[FEAT_GLOBAL_DATA] = {
"global variables", probe_kern_global_data,
},
[FEAT_BTF] = {
"minimal BTF", probe_kern_btf,
},
[FEAT_BTF_FUNC] = {
"BTF functions", probe_kern_btf_func,
},
[FEAT_BTF_GLOBAL_FUNC] = {
"BTF global function", probe_kern_btf_func_global,
},
[FEAT_BTF_DATASEC] = {
"BTF data section and variable", probe_kern_btf_datasec,
},
[FEAT_ARRAY_MMAP] = {
"ARRAY map mmap()", probe_kern_array_mmap,
},
[FEAT_EXP_ATTACH_TYPE] = {
"BPF_PROG_LOAD expected_attach_type attribute",
probe_kern_exp_attach_type,
},
[FEAT_PROBE_READ_KERN] = {
"bpf_probe_read_kernel() helper", probe_kern_probe_read_kernel,
},
[FEAT_PROG_BIND_MAP] = {
"BPF_PROG_BIND_MAP support", probe_prog_bind_map,
},
[FEAT_MODULE_BTF] = {
"module BTF support", probe_module_btf,
},
[FEAT_BTF_FLOAT] = {
"BTF_KIND_FLOAT support", probe_kern_btf_float,
},
[FEAT_PERF_LINK] = {
"BPF perf link support", probe_perf_link,
},
[FEAT_BTF_DECL_TAG] = {
"BTF_KIND_DECL_TAG support", probe_kern_btf_decl_tag,
},
[FEAT_BTF_TYPE_TAG] = {
"BTF_KIND_TYPE_TAG support", probe_kern_btf_type_tag,
},
[FEAT_MEMCG_ACCOUNT] = {
"memcg-based memory accounting", probe_memcg_account,
},
[FEAT_BPF_COOKIE] = {
"BPF cookie support", probe_kern_bpf_cookie,
},
[FEAT_BTF_ENUM64] = {
"BTF_KIND_ENUM64 support", probe_kern_btf_enum64,
},
[FEAT_SYSCALL_WRAPPER] = {
"Kernel using syscall wrapper", probe_kern_syscall_wrapper,
},
[FEAT_UPROBE_MULTI_LINK] = {
"BPF multi-uprobe link support", probe_uprobe_multi_link,
},
};
bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id)
{
struct kern_feature_desc *feat = &feature_probes[feat_id];
int ret;
if (obj && obj->gen_loader)
/* To generate loader program assume the latest kernel
* to avoid doing extra prog_load, map_create syscalls.
*/
return true;
if (READ_ONCE(feat->res) == FEAT_UNKNOWN) {
ret = feat->probe();
if (ret > 0) {
WRITE_ONCE(feat->res, FEAT_SUPPORTED);
} else if (ret == 0) {
WRITE_ONCE(feat->res, FEAT_MISSING);
} else {
pr_warn("Detection of kernel %s support failed: %d\n", feat->desc, ret);
WRITE_ONCE(feat->res, FEAT_MISSING);
}
}
if (obj->token_fd)
return feat_supported(obj->feat_cache, feat_id);
return READ_ONCE(feat->res) == FEAT_SUPPORTED;
return feat_supported(NULL, feat_id);
}
static bool map_is_reuse_compat(const struct bpf_map *map, int map_fd)
@ -5160,9 +4793,17 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b
create_attr.map_flags = def->map_flags;
create_attr.numa_node = map->numa_node;
create_attr.map_extra = map->map_extra;
create_attr.token_fd = obj->token_fd;
if (obj->token_fd)
create_attr.map_flags |= BPF_F_TOKEN_FD;
if (bpf_map__is_struct_ops(map))
if (bpf_map__is_struct_ops(map)) {
create_attr.btf_vmlinux_value_type_id = map->btf_vmlinux_value_type_id;
if (map->mod_btf_fd >= 0) {
create_attr.value_type_btf_obj_fd = map->mod_btf_fd;
create_attr.map_flags |= BPF_F_VTYPE_BTF_OBJ_FD;
}
}
if (obj->btf && btf__fd(obj->btf) >= 0) {
create_attr.btf_fd = btf__fd(obj->btf);
@ -5172,6 +4813,9 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b
if (bpf_map_type__is_map_in_map(def->type)) {
if (map->inner_map) {
err = map_set_def_max_entries(map->inner_map);
if (err)
return err;
err = bpf_object__create_map(obj, map->inner_map, true);
if (err) {
pr_warn("map '%s': failed to create inner map: %d\n",
@ -6864,7 +6508,7 @@ static int probe_kern_arg_ctx_tag(void)
if (cached_result >= 0)
return cached_result;
btf_fd = libbpf__load_raw_btf((char *)types, sizeof(types), strs, sizeof(strs));
btf_fd = libbpf__load_raw_btf((char *)types, sizeof(types), strs, sizeof(strs), 0);
if (btf_fd < 0)
return 0;
@ -7473,6 +7117,10 @@ static int bpf_object_load_prog(struct bpf_object *obj, struct bpf_program *prog
load_attr.prog_flags = prog->prog_flags;
load_attr.fd_array = obj->fd_array;
load_attr.token_fd = obj->token_fd;
if (obj->token_fd)
load_attr.prog_flags |= BPF_F_TOKEN_FD;
/* adjust load_attr if sec_def provides custom preload callback */
if (prog->sec_def && prog->sec_def->prog_prepare_load_fn) {
err = prog->sec_def->prog_prepare_load_fn(prog, &load_attr, prog->sec_def->cookie);
@ -7918,7 +7566,7 @@ static int bpf_object_init_progs(struct bpf_object *obj, const struct bpf_object
static struct bpf_object *bpf_object_open(const char *path, const void *obj_buf, size_t obj_buf_sz,
const struct bpf_object_open_opts *opts)
{
const char *obj_name, *kconfig, *btf_tmp_path;
const char *obj_name, *kconfig, *btf_tmp_path, *token_path;
struct bpf_object *obj;
char tmp_name[64];
int err;
@ -7955,6 +7603,16 @@ static struct bpf_object *bpf_object_open(const char *path, const void *obj_buf,
if (log_size && !log_buf)
return ERR_PTR(-EINVAL);
token_path = OPTS_GET(opts, bpf_token_path, NULL);
/* if user didn't specify bpf_token_path explicitly, check if
* LIBBPF_BPF_TOKEN_PATH envvar was set and treat it as bpf_token_path
* option
*/
if (!token_path)
token_path = getenv("LIBBPF_BPF_TOKEN_PATH");
if (token_path && strlen(token_path) >= PATH_MAX)
return ERR_PTR(-ENAMETOOLONG);
obj = bpf_object__new(path, obj_buf, obj_buf_sz, obj_name);
if (IS_ERR(obj))
return obj;
@ -7963,6 +7621,14 @@ static struct bpf_object *bpf_object_open(const char *path, const void *obj_buf,
obj->log_size = log_size;
obj->log_level = log_level;
if (token_path) {
obj->token_path = strdup(token_path);
if (!obj->token_path) {
err = -ENOMEM;
goto out;
}
}
btf_tmp_path = OPTS_GET(opts, btf_custom_path, NULL);
if (btf_tmp_path) {
if (strlen(btf_tmp_path) >= PATH_MAX) {
@ -8473,7 +8139,8 @@ static int bpf_object_load(struct bpf_object *obj, int extra_log_level, const ch
if (obj->gen_loader)
bpf_gen__init(obj->gen_loader, extra_log_level, obj->nr_programs, obj->nr_maps);
err = bpf_object__probe_loading(obj);
err = bpf_object_prepare_token(obj);
err = err ? : bpf_object__probe_loading(obj);
err = err ? : bpf_object__load_vmlinux_btf(obj, false);
err = err ? : bpf_object__resolve_externs(obj, obj->kconfig);
err = err ? : bpf_object__sanitize_maps(obj);
@ -9008,6 +8675,11 @@ void bpf_object__close(struct bpf_object *obj)
}
zfree(&obj->programs);
zfree(&obj->feat_cache);
zfree(&obj->token_path);
if (obj->token_fd > 0)
close(obj->token_fd);
free(obj);
}
@ -9966,7 +9638,9 @@ static int libbpf_find_attach_btf_id(struct bpf_program *prog, const char *attac
*btf_obj_fd = 0;
*btf_type_id = 1;
} else {
err = find_kernel_btf_id(prog->obj, attach_name, attach_type, btf_obj_fd, btf_type_id);
err = find_kernel_btf_id(prog->obj, attach_name,
attach_type, btf_obj_fd,
btf_type_id);
}
if (err) {
pr_warn("prog '%s': failed to find kernel BTF type ID of '%s': %d\n",
@ -11028,7 +10702,7 @@ static const char *arch_specific_syscall_pfx(void)
#endif
}
static int probe_kern_syscall_wrapper(void)
int probe_kern_syscall_wrapper(int token_fd)
{
char syscall_name[64];
const char *ksys_pfx;

@ -177,10 +177,29 @@ struct bpf_object_open_opts {
* logs through its print callback.
*/
__u32 kernel_log_level;
/* Path to BPF FS mount point to derive BPF token from.
*
* Created BPF token will be used for all bpf() syscall operations
* that accept BPF token (e.g., map creation, BTF and program loads,
* etc) automatically within instantiated BPF object.
*
* If bpf_token_path is not specified, libbpf will consult
* LIBBPF_BPF_TOKEN_PATH environment variable. If set, it will be
* taken as a value of bpf_token_path option and will force libbpf to
* either create BPF token from provided custom BPF FS path, or will
* disable implicit BPF token creation, if envvar value is an empty
* string. bpf_token_path overrides LIBBPF_BPF_TOKEN_PATH, if both are
* set at the same time.
*
* Setting bpf_token_path option to empty string disables libbpf's
* automatic attempt to create BPF token from default BPF FS mount
* point (/sys/fs/bpf), in case this default behavior is undesirable.
*/
const char *bpf_token_path;
size_t :0;
};
#define bpf_object_open_opts__last_field kernel_log_level
#define bpf_object_open_opts__last_field bpf_token_path
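The precedence rules described in the bpf_token_path comment above can be sketched as a small pure function. This is an illustration under stated assumptions, not libbpf code: `resolve_token_path()` is a hypothetical helper, the caller is assumed to pass the value of `getenv("LIBBPF_BPF_TOKEN_PATH")` as `env_path`, and `default_path` stands in for the default BPF FS mount point (/sys/fs/bpf).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a libbpf API): returns the BPF FS path a token
 * should be derived from, or NULL if token creation is disabled.
 */
const char *resolve_token_path(const char *opt_path, const char *env_path,
			       const char *default_path)
{
	const char *path;

	assert(default_path != NULL);

	/* the explicit option overrides the environment variable */
	path = opt_path ? opt_path : env_path;

	/* neither is set: fall back to the default mount point */
	if (!path)
		return default_path;

	/* an empty string disables implicit token creation */
	if (path[0] == '\0')
		return NULL;

	return path;
}
```

A caller would pass `OPTS_GET(opts, bpf_token_path, NULL)` as the first argument in the same spirit as bpf_object_open() does. The sketch leaves out the PATH_MAX length check that the real code performs.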
/**
* @brief **bpf_object__open()** creates a bpf_object by opening

@ -411,4 +411,5 @@ LIBBPF_1.3.0 {
} LIBBPF_1.2.0;
LIBBPF_1.4.0 {
bpf_token_create;
} LIBBPF_1.3.0;

@ -15,6 +15,7 @@
#include <linux/err.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <libelf.h>
#include "relo_core.h"
@ -360,15 +361,32 @@ enum kern_feature_id {
__FEAT_CNT,
};
int probe_memcg_account(void);
enum kern_feature_result {
FEAT_UNKNOWN = 0,
FEAT_SUPPORTED = 1,
FEAT_MISSING = 2,
};
struct kern_feature_cache {
enum kern_feature_result res[__FEAT_CNT];
int token_fd;
};
bool feat_supported(struct kern_feature_cache *cache, enum kern_feature_id feat_id);
bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id);
int probe_kern_syscall_wrapper(int token_fd);
int probe_memcg_account(int token_fd);
int bump_rlimit_memlock(void);
int parse_cpu_mask_str(const char *s, bool **mask, int *mask_sz);
int parse_cpu_mask_file(const char *fcpu, bool **mask, int *mask_sz);
int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
const char *str_sec, size_t str_len);
int btf_load_into_kernel(struct btf *btf, char *log_buf, size_t log_sz, __u32 log_level);
const char *str_sec, size_t str_len,
int token_fd);
int btf_load_into_kernel(struct btf *btf,
char *log_buf, size_t log_sz, __u32 log_level,
int token_fd);
struct btf *btf_get_from_fd(int btf_fd, struct btf *base_btf);
void btf_get_kernel_prefix_kind(enum bpf_attach_type attach_type,
@ -532,6 +550,17 @@ static inline bool is_ldimm64_insn(struct bpf_insn *insn)
return insn->code == (BPF_LD | BPF_IMM | BPF_DW);
}
/* Unconditionally dup FD, ensuring it doesn't use [0, 2] range.
* Original FD is not closed or altered in any other way.
* Preserves original FD value, if it's invalid (negative).
*/
static inline int dup_good_fd(int fd)
{
if (fd < 0)
return fd;
return fcntl(fd, F_DUPFD_CLOEXEC, 3);
}
/* if fd is stdin, stdout, or stderr, dup to a fd greater than 2
* Takes ownership of the fd passed in, and closes it if calling
* fcntl(fd, F_DUPFD_CLOEXEC, 3).
@ -543,7 +572,7 @@ static inline int ensure_good_fd(int fd)
if (fd < 0)
return fd;
if (fd < 3) {
fd = fcntl(fd, F_DUPFD_CLOEXEC, 3);
fd = dup_good_fd(fd);
saved_errno = errno;
close(old_fd);
errno = saved_errno;
@ -555,6 +584,15 @@ static inline int ensure_good_fd(int fd)
return fd;
}
static inline int sys_dup2(int oldfd, int newfd)
{
#ifdef __NR_dup2
return syscall(__NR_dup2, oldfd, newfd);
#else
return syscall(__NR_dup3, oldfd, newfd, 0);
#endif
}
/* Point *fixed_fd* to the same file that *tmp_fd* points to.
* Regardless of success, *tmp_fd* is closed.
* Whatever *fixed_fd* pointed to is closed silently.
@ -563,7 +601,7 @@ static inline int reuse_fd(int fixed_fd, int tmp_fd)
{
int err;
err = dup2(tmp_fd, fixed_fd);
err = sys_dup2(tmp_fd, fixed_fd);
err = err < 0 ? -errno : 0;
close(tmp_fd); /* clean up temporary FD */
return err;
@ -613,4 +651,6 @@ int elf_resolve_syms_offsets(const char *binary_path, int cnt,
int elf_resolve_pattern_offsets(const char *binary_path, const char *pattern,
unsigned long **poffsets, size_t *pcnt);
int probe_fd(int fd);
#endif /* __LIBBPF_LIBBPF_INTERNAL_H */
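The behaviour documented for dup_good_fd() in this header can be tried out in user space with a standalone sketch. The names `dup_above_stdio()` and `fd_is_cloexec()` are hypothetical and exist only for this example. The first function has the same body as the helper in the diff; the second just inspects the descriptor flags.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for F_DUPFD_CLOEXEC under strict -std= modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Duplicate fd into the >= 3 range; the original FD is left open and untouched.
 * Invalid (negative) FDs are passed through unchanged.
 */
int dup_above_stdio(int fd)
{
	if (fd < 0)
		return fd;
	return fcntl(fd, F_DUPFD_CLOEXEC, 3);
}

/* Hypothetical check helper: non-zero if fd has the close-on-exec flag set */
int fd_is_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	assert(flags >= 0);
	return (flags & FD_CLOEXEC) != 0;
}
```

Unlike ensure_good_fd(), this variant never closes the FD it was given, so the caller still owns both descriptors and has to close each of them.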

@ -219,7 +219,8 @@ int libbpf_probe_bpf_prog_type(enum bpf_prog_type prog_type, const void *opts)
}
int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
const char *str_sec, size_t str_len)
const char *str_sec, size_t str_len,
int token_fd)
{
struct btf_header hdr = {
.magic = BTF_MAGIC,
@ -229,6 +230,10 @@ int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
.str_off = types_len,
.str_len = str_len,
};
LIBBPF_OPTS(bpf_btf_load_opts, opts,
.token_fd = token_fd,
.btf_flags = token_fd ? BPF_F_TOKEN_FD : 0,
);
int btf_fd, btf_len;
__u8 *raw_btf;
@ -241,7 +246,7 @@ int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
memcpy(raw_btf + hdr.hdr_len, raw_types, hdr.type_len);
memcpy(raw_btf + hdr.hdr_len + hdr.type_len, str_sec, hdr.str_len);
btf_fd = bpf_btf_load(raw_btf, btf_len, NULL);
btf_fd = bpf_btf_load(raw_btf, btf_len, &opts);
free(raw_btf);
return btf_fd;
@ -271,7 +276,7 @@ static int load_local_storage_btf(void)
};
return libbpf__load_raw_btf((char *)types, sizeof(types),
strs, sizeof(strs));
strs, sizeof(strs), 0);
}
static int probe_map_create(enum bpf_map_type map_type)
@ -326,6 +331,7 @@ static int probe_map_create(enum bpf_map_type map_type)
case BPF_MAP_TYPE_STRUCT_OPS:
/* we'll get -ENOTSUPP for invalid BTF type ID for struct_ops */
opts.btf_vmlinux_value_type_id = 1;
opts.value_type_btf_obj_fd = -1;
exp_err = -524; /* -ENOTSUPP */
break;
case BPF_MAP_TYPE_BLOOM_FILTER:

@ -2,5 +2,8 @@
#ifndef __LIBBPF_STR_ERROR_H
#define __LIBBPF_STR_ERROR_H
#define STRERR_BUFSIZE 128
char *libbpf_strerror_r(int err, char *dst, int len);
#endif /* __LIBBPF_STR_ERROR_H */

@ -115,7 +115,7 @@ the insn 20 undoes map_value addition. It is currently impossible for the
verifier to understand such speculative pointer arithmetic.
Hence `this patch`__ addresses it on the compiler side. It was committed on llvm 12.
__ https://reviews.llvm.org/D85570
__ https://github.com/llvm/llvm-project/commit/ddf1864ace484035e3cde5e83b3a31ac81e059c6
The corresponding C code
@ -165,7 +165,7 @@ This is due to a llvm BPF backend bug. `The fix`__
has been pushed to llvm 10.x release branch and will be
available in 10.0.1. The patch is available in llvm 11.0.0 trunk.
__ https://reviews.llvm.org/D78466
__ https://github.com/llvm/llvm-project/commit/3cb7e7bf959dcd3b8080986c62e10a75c7af43f0
bpf_verif_scale/loop6.bpf.o test failure with Clang 12
======================================================
@ -204,7 +204,7 @@ r5(w5) is eventually saved on stack at insn #24 for later use.
This causes a later verifier failure. The bug has been `fixed`__ in
Clang 13.
__ https://reviews.llvm.org/D97479
__ https://github.com/llvm/llvm-project/commit/1959ead525b8830cc8a345f45e1c3ef9902d3229
BPF CO-RE-based tests and Clang version
=======================================
@ -221,11 +221,11 @@ failures:
- __builtin_btf_type_id() [0_, 1_, 2_];
- __builtin_preserve_type_info(), __builtin_preserve_enum_value() [3_, 4_].
.. _0: https://reviews.llvm.org/D74572
.. _1: https://reviews.llvm.org/D74668
.. _2: https://reviews.llvm.org/D85174
.. _3: https://reviews.llvm.org/D83878
.. _4: https://reviews.llvm.org/D83242
.. _0: https://github.com/llvm/llvm-project/commit/6b01b465388b204d543da3cf49efd6080db094a9
.. _1: https://github.com/llvm/llvm-project/commit/072cde03aaa13a2c57acf62d79876bf79aa1919f
.. _2: https://github.com/llvm/llvm-project/commit/00602ee7ef0bf6c68d690a2bd729c12b95c95c99
.. _3: https://github.com/llvm/llvm-project/commit/6d218b4adb093ff2e9764febbbc89f429412006c
.. _4: https://github.com/llvm/llvm-project/commit/6d6750696400e7ce988d66a1a00e1d0cb32815f8
Floating-point tests and Clang version
======================================
@ -234,7 +234,7 @@ Certain selftests, e.g. core_reloc, require support for the floating-point
types, which was introduced in `Clang 13`__. The older Clang versions will
either crash when compiling these tests, or generate an incorrect BTF.
__ https://reviews.llvm.org/D83289
__ https://github.com/llvm/llvm-project/commit/a7137b238a07d9399d3ae96c0b461571bd5aa8b2
Kernel function call test and Clang version
===========================================
@ -248,7 +248,7 @@ Without it, the error from compiling bpf selftests looks like:
libbpf: failed to find BTF for extern 'tcp_slow_start' [25] section: -2
__ https://reviews.llvm.org/D93563
__ https://github.com/llvm/llvm-project/commit/886f9ff53155075bd5f1e994f17b85d1e1b7470c
btf_tag test and Clang version
==============================
@ -264,8 +264,8 @@ Without them, the btf_tag selftest will be skipped and you will observe:
#<test_num> btf_tag:SKIP
.. _0: https://reviews.llvm.org/D111588
.. _1: https://reviews.llvm.org/D111199
.. _0: https://github.com/llvm/llvm-project/commit/a162b67c98066218d0d00aa13b99afb95d9bb5e6
.. _1: https://github.com/llvm/llvm-project/commit/3466e00716e12e32fdb100e3fcfca5c2b3e8d784
Clang dependencies for static linking tests
===========================================
@ -274,7 +274,7 @@ linked_vars, linked_maps, and linked_funcs tests depend on `Clang fix`__ to
generate valid BTF information for weak variables. Please make sure you use
Clang that contains the fix.
__ https://reviews.llvm.org/D100362
__ https://github.com/llvm/llvm-project/commit/968292cb93198442138128d850fd54dc7edc0035
Clang relocation changes
========================
@ -292,7 +292,7 @@ Here, ``type 2`` refers to new relocation type ``R_BPF_64_ABS64``.
To fix this issue, use a newer libbpf.
.. Links
.. _clang reloc patch: https://reviews.llvm.org/D102712
.. _clang reloc patch: https://github.com/llvm/llvm-project/commit/6a2ea84600ba4bd3b2733bd8f08f5115eb32164b
.. _kernel llvm reloc: /Documentation/bpf/llvm_reloc.rst
Clang dependencies for the u32 spill test (xdpwall)
@ -304,6 +304,6 @@ from running test_progs will look like:
.. code-block:: console
test_xdpwall:FAIL:Does LLVM have https://reviews.llvm.org/D109073? unexpected error: -4007
test_xdpwall:FAIL:Does LLVM have https://github.com/llvm/llvm-project/commit/ea72b0319d7b0f0c2fcf41d121afa5d031b319d5? unexpected error: -4007
__ https://reviews.llvm.org/D109073
__ https://github.com/llvm/llvm-project/commit/ea72b0319d7b0f0c2fcf41d121afa5d031b319d5

@ -260,11 +260,11 @@ extern void bpf_throw(u64 cookie) __ksym;
#define __is_signed_type(type) (((type)(-1)) < (type)1)
#define __bpf_cmp(LHS, OP, SIGN, PRED, RHS, DEFAULT) \
#define __bpf_cmp(LHS, OP, PRED, RHS, DEFAULT) \
({ \
__label__ l_true; \
bool ret = DEFAULT; \
asm volatile goto("if %[lhs] " SIGN #OP " %[rhs] goto %l[l_true]" \
asm volatile goto("if %[lhs] " OP " %[rhs] goto %l[l_true]" \
:: [lhs] "r"((short)LHS), [rhs] PRED (RHS) :: l_true); \
ret = !DEFAULT; \
l_true: \
@ -276,7 +276,7 @@ l_true: \
* __lhs OP __rhs below will catch the mistake.
* Be aware that we check only __lhs to figure out the sign of compare.
*/
#define _bpf_cmp(LHS, OP, RHS, NOFLIP) \
#define _bpf_cmp(LHS, OP, RHS, UNLIKELY) \
({ \
typeof(LHS) __lhs = (LHS); \
typeof(RHS) __rhs = (RHS); \
@ -285,14 +285,17 @@ l_true: \
(void)(__lhs OP __rhs); \
if (__cmp_cannot_be_signed(OP) || !__is_signed_type(typeof(__lhs))) { \
if (sizeof(__rhs) == 8) \
ret = __bpf_cmp(__lhs, OP, "", "r", __rhs, NOFLIP); \
/* "i" will truncate 64-bit constant into s32, \
* so we have to use extra register via "r". \
*/ \
ret = __bpf_cmp(__lhs, #OP, "r", __rhs, UNLIKELY); \
else \
ret = __bpf_cmp(__lhs, OP, "", "i", __rhs, NOFLIP); \
ret = __bpf_cmp(__lhs, #OP, "ri", __rhs, UNLIKELY); \
} else { \
if (sizeof(__rhs) == 8) \
ret = __bpf_cmp(__lhs, OP, "s", "r", __rhs, NOFLIP); \
ret = __bpf_cmp(__lhs, "s"#OP, "r", __rhs, UNLIKELY); \
else \
ret = __bpf_cmp(__lhs, OP, "s", "i", __rhs, NOFLIP); \
ret = __bpf_cmp(__lhs, "s"#OP, "ri", __rhs, UNLIKELY); \
} \
ret; \
})
@ -304,7 +307,7 @@ l_true: \
#ifndef bpf_cmp_likely
#define bpf_cmp_likely(LHS, OP, RHS) \
({ \
bool ret; \
bool ret = 0; \
if (__builtin_strcmp(#OP, "==") == 0) \
ret = _bpf_cmp(LHS, !=, RHS, false); \
else if (__builtin_strcmp(#OP, "!=") == 0) \
@ -318,7 +321,7 @@ l_true: \
else if (__builtin_strcmp(#OP, ">=") == 0) \
ret = _bpf_cmp(LHS, <, RHS, false); \
else \
(void) "bug"; \
asm volatile("r0 " #OP " invalid compare"); \
ret; \
})
#endif

@ -51,6 +51,16 @@ extern int bpf_dynptr_clone(const struct bpf_dynptr *ptr, struct bpf_dynptr *clo
extern int bpf_sock_addr_set_sun_path(struct bpf_sock_addr_kern *sa_kern,
const __u8 *sun_path, __u32 sun_path__sz) __ksym;
/* Description
* Allocate and configure a reqsk and link it with a listener and skb.
* Returns
* Error code
*/
struct sock;
struct bpf_tcp_req_attrs;
extern int bpf_sk_assign_tcp_reqsk(struct __sk_buff *skb, struct sock *sk,
struct bpf_tcp_req_attrs *attrs, int attrs__sz) __ksym;
void *bpf_cast_to_kern_ctx(void *) __ksym;
void *bpf_rdonly_cast(void *obj, __u32 btf_id) __ksym;

@ -1,7 +1,9 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2020 Facebook */
#include <linux/bpf.h>
#include <linux/btf.h>
#include <linux/btf_ids.h>
#include <linux/delay.h>
#include <linux/error-injection.h>
#include <linux/init.h>
#include <linux/module.h>
@ -520,11 +522,75 @@ BTF_ID_FLAGS(func, bpf_kfunc_call_test_static_unused_arg)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_offset)
BTF_SET8_END(bpf_testmod_check_kfunc_ids)
static int bpf_testmod_ops_init(struct btf *btf)
{
return 0;
}
static bool bpf_testmod_ops_is_valid_access(int off, int size,
enum bpf_access_type type,
const struct bpf_prog *prog,
struct bpf_insn_access_aux *info)
{
return bpf_tracing_btf_ctx_access(off, size, type, prog, info);
}
static int bpf_testmod_ops_init_member(const struct btf_type *t,
const struct btf_member *member,
void *kdata, const void *udata)
{
return 0;
}
static const struct btf_kfunc_id_set bpf_testmod_kfunc_set = {
.owner = THIS_MODULE,
.set = &bpf_testmod_check_kfunc_ids,
};
static const struct bpf_verifier_ops bpf_testmod_verifier_ops = {
.is_valid_access = bpf_testmod_ops_is_valid_access,
};
static int bpf_dummy_reg(void *kdata)
{
struct bpf_testmod_ops *ops = kdata;
int r;
r = ops->test_2(4, 3);
return 0;
}
static void bpf_dummy_unreg(void *kdata)
{
}
static int bpf_testmod_test_1(void)
{
return 0;
}
static int bpf_testmod_test_2(int a, int b)
{
return 0;
}
static struct bpf_testmod_ops __bpf_testmod_ops = {
.test_1 = bpf_testmod_test_1,
.test_2 = bpf_testmod_test_2,
};
struct bpf_struct_ops bpf_bpf_testmod_ops = {
.verifier_ops = &bpf_testmod_verifier_ops,
.init = bpf_testmod_ops_init,
.init_member = bpf_testmod_ops_init_member,
.reg = bpf_dummy_reg,
.unreg = bpf_dummy_unreg,
.cfi_stubs = &__bpf_testmod_ops,
.name = "bpf_testmod_ops",
.owner = THIS_MODULE,
};
extern int bpf_fentry_test1(int a);
static int bpf_testmod_init(void)
@ -535,6 +601,7 @@ static int bpf_testmod_init(void)
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_testmod_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_testmod_kfunc_set);
ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &bpf_testmod_kfunc_set);
ret = ret ?: register_bpf_struct_ops(&bpf_bpf_testmod_ops, bpf_testmod_ops);
if (ret < 0)
return ret;
if (bpf_fentry_test1(0) < 0)
@ -544,6 +611,14 @@ static int bpf_testmod_init(void)
static void bpf_testmod_exit(void)
{
/* Need to wait for all references to be dropped because
* bpf_kfunc_call_test_release() which currently resides in kernel can
* be called after bpf_testmod is unloaded. Once release function is
* moved into the module this wait can be removed.
*/
while (refcount_read(&prog_test_struct.cnt) > 1)
msleep(20);
return sysfs_remove_bin_file(kernel_kobj, &bin_attr_bpf_testmod_file);
}

@ -28,4 +28,9 @@ struct bpf_iter_testmod_seq {
int cnt;
};
struct bpf_testmod_ops {
int (*test_1)(void);
int (*test_2)(int a, int b);
};
#endif /* _BPF_TESTMOD_H */

@ -81,6 +81,7 @@ CONFIG_NF_NAT=y
CONFIG_RC_CORE=y
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
CONFIG_SYN_COOKIES=y
CONFIG_TEST_BPF=m
CONFIG_USERFAULTFD=y
CONFIG_VSOCKETS=y

@ -35,7 +35,7 @@ static int check_load(const char *file, enum bpf_prog_type type)
}
bpf_program__set_type(prog, type);
bpf_program__set_flags(prog, BPF_F_TEST_RND_HI32 | BPF_F_TEST_REG_INVARIANTS);
bpf_program__set_flags(prog, testing_prog_flags());
bpf_program__set_log_level(prog, 4 | extra_prog_load_log_flags);
err = bpf_object__load(obj);

@ -626,50 +626,6 @@ static bool match_pattern(struct btf *btf, char *pattern, char *text, char *reg_
return false;
}
/* Request BPF program instructions after all rewrites are applied,
* e.g. verifier.c:convert_ctx_access() is done.
*/
static int get_xlated_program(int fd_prog, struct bpf_insn **buf, __u32 *cnt)
{
struct bpf_prog_info info = {};
__u32 info_len = sizeof(info);
__u32 xlated_prog_len;
__u32 buf_element_size = sizeof(struct bpf_insn);
if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) {
perror("bpf_prog_get_info_by_fd failed");
return -1;
}
xlated_prog_len = info.xlated_prog_len;
if (xlated_prog_len % buf_element_size) {
printf("Program length %d is not multiple of %d\n",
xlated_prog_len, buf_element_size);
return -1;
}
*cnt = xlated_prog_len / buf_element_size;
*buf = calloc(*cnt, buf_element_size);
if (!buf) {
perror("can't allocate xlated program buffer");
return -ENOMEM;
}
bzero(&info, sizeof(info));
info.xlated_prog_len = xlated_prog_len;
info.xlated_prog_insns = (__u64)(unsigned long)*buf;
if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) {
perror("second bpf_prog_get_info_by_fd failed");
goto out_free_buf;
}
return 0;
out_free_buf:
free(*buf);
return -1;
}
static void print_insn(void *private_data, const char *fmt, ...)
{
va_list args;

@ -19,6 +19,7 @@ static const char *kmulti_syms[] = {
};
#define KMULTI_CNT ARRAY_SIZE(kmulti_syms)
static __u64 kmulti_addrs[KMULTI_CNT];
static __u64 kmulti_cookies[] = { 3, 1, 2 };
#define KPROBE_FUNC "bpf_fentry_test1"
static __u64 kprobe_addr;
@ -31,6 +32,8 @@ static noinline void uprobe_func(void)
asm volatile ("");
}
#define PERF_EVENT_COOKIE 0xdeadbeef
static int verify_perf_link_info(int fd, enum bpf_perf_event_type type, long addr,
ssize_t offset, ssize_t entry_offset)
{
@ -62,6 +65,8 @@ static int verify_perf_link_info(int fd, enum bpf_perf_event_type type, long add
ASSERT_EQ(info.perf_event.kprobe.addr, addr + entry_offset,
"kprobe_addr");
ASSERT_EQ(info.perf_event.kprobe.cookie, PERF_EVENT_COOKIE, "kprobe_cookie");
if (!info.perf_event.kprobe.func_name) {
ASSERT_EQ(info.perf_event.kprobe.name_len, 0, "name_len");
info.perf_event.kprobe.func_name = ptr_to_u64(&buf);
@ -81,6 +86,8 @@ static int verify_perf_link_info(int fd, enum bpf_perf_event_type type, long add
goto again;
}
ASSERT_EQ(info.perf_event.tracepoint.cookie, PERF_EVENT_COOKIE, "tracepoint_cookie");
err = strncmp(u64_to_ptr(info.perf_event.tracepoint.tp_name), TP_NAME,
strlen(TP_NAME));
ASSERT_EQ(err, 0, "cmp_tp_name");
@ -96,10 +103,17 @@ static int verify_perf_link_info(int fd, enum bpf_perf_event_type type, long add
goto again;
}
ASSERT_EQ(info.perf_event.uprobe.cookie, PERF_EVENT_COOKIE, "uprobe_cookie");
err = strncmp(u64_to_ptr(info.perf_event.uprobe.file_name), UPROBE_FILE,
strlen(UPROBE_FILE));
ASSERT_EQ(err, 0, "cmp_file_name");
break;
case BPF_PERF_EVENT_EVENT:
ASSERT_EQ(info.perf_event.event.type, PERF_TYPE_SOFTWARE, "event_type");
ASSERT_EQ(info.perf_event.event.config, PERF_COUNT_SW_PAGE_FAULTS, "event_config");
ASSERT_EQ(info.perf_event.event.cookie, PERF_EVENT_COOKIE, "event_cookie");
break;
default:
err = -1;
break;
@ -139,6 +153,7 @@ static void test_kprobe_fill_link_info(struct test_fill_link_info *skel,
DECLARE_LIBBPF_OPTS(bpf_kprobe_opts, opts,
.attach_mode = PROBE_ATTACH_MODE_LINK,
.retprobe = type == BPF_PERF_EVENT_KRETPROBE,
.bpf_cookie = PERF_EVENT_COOKIE,
);
ssize_t entry_offset = 0;
struct bpf_link *link;
@ -163,10 +178,13 @@ static void test_kprobe_fill_link_info(struct test_fill_link_info *skel,
static void test_tp_fill_link_info(struct test_fill_link_info *skel)
{
DECLARE_LIBBPF_OPTS(bpf_tracepoint_opts, opts,
.bpf_cookie = PERF_EVENT_COOKIE,
);
struct bpf_link *link;
int link_fd, err;
link = bpf_program__attach_tracepoint(skel->progs.tp_run, TP_CAT, TP_NAME);
link = bpf_program__attach_tracepoint_opts(skel->progs.tp_run, TP_CAT, TP_NAME, &opts);
if (!ASSERT_OK_PTR(link, "attach_tp"))
return;
@ -176,16 +194,53 @@ static void test_tp_fill_link_info(struct test_fill_link_info *skel)
bpf_link__destroy(link);
}
static void test_event_fill_link_info(struct test_fill_link_info *skel)
{
DECLARE_LIBBPF_OPTS(bpf_perf_event_opts, opts,
.bpf_cookie = PERF_EVENT_COOKIE,
);
struct bpf_link *link;
int link_fd, err, pfd;
struct perf_event_attr attr = {
.type = PERF_TYPE_SOFTWARE,
.config = PERF_COUNT_SW_PAGE_FAULTS,
.freq = 1,
.sample_freq = 1,
.size = sizeof(struct perf_event_attr),
};
pfd = syscall(__NR_perf_event_open, &attr, -1 /* pid */, 0 /* cpu 0 */,
-1 /* group id */, 0 /* flags */);
if (!ASSERT_GE(pfd, 0, "perf_event_open"))
return;
link = bpf_program__attach_perf_event_opts(skel->progs.event_run, pfd, &opts);
if (!ASSERT_OK_PTR(link, "attach_event"))
goto error;
link_fd = bpf_link__fd(link);
err = verify_perf_link_info(link_fd, BPF_PERF_EVENT_EVENT, 0, 0, 0);
ASSERT_OK(err, "verify_perf_link_info");
bpf_link__destroy(link);
error:
close(pfd);
}
static void test_uprobe_fill_link_info(struct test_fill_link_info *skel,
enum bpf_perf_event_type type)
{
DECLARE_LIBBPF_OPTS(bpf_uprobe_opts, opts,
.retprobe = type == BPF_PERF_EVENT_URETPROBE,
.bpf_cookie = PERF_EVENT_COOKIE,
);
struct bpf_link *link;
int link_fd, err;
link = bpf_program__attach_uprobe(skel->progs.uprobe_run,
type == BPF_PERF_EVENT_URETPROBE,
0, /* self pid */
UPROBE_FILE, uprobe_offset);
link = bpf_program__attach_uprobe_opts(skel->progs.uprobe_run,
0, /* self pid */
UPROBE_FILE, uprobe_offset,
&opts);
if (!ASSERT_OK_PTR(link, "attach_uprobe"))
return;
@ -195,11 +250,11 @@ static void test_uprobe_fill_link_info(struct test_fill_link_info *skel,
bpf_link__destroy(link);
}
static int verify_kmulti_link_info(int fd, bool retprobe)
static int verify_kmulti_link_info(int fd, bool retprobe, bool has_cookies)
{
__u64 addrs[KMULTI_CNT], cookies[KMULTI_CNT];
struct bpf_link_info info;
__u32 len = sizeof(info);
__u64 addrs[KMULTI_CNT];
int flags, i, err;
memset(&info, 0, sizeof(info));
@ -221,18 +276,22 @@ static int verify_kmulti_link_info(int fd, bool retprobe)
if (!info.kprobe_multi.addrs) {
info.kprobe_multi.addrs = ptr_to_u64(addrs);
info.kprobe_multi.cookies = ptr_to_u64(cookies);
goto again;
}
for (i = 0; i < KMULTI_CNT; i++)
for (i = 0; i < KMULTI_CNT; i++) {
ASSERT_EQ(addrs[i], kmulti_addrs[i], "kmulti_addrs");
ASSERT_EQ(cookies[i], has_cookies ? kmulti_cookies[i] : 0,
"kmulti_cookies_value");
}
return 0;
}
static void verify_kmulti_invalid_user_buffer(int fd)
{
__u64 addrs[KMULTI_CNT], cookies[KMULTI_CNT];
struct bpf_link_info info;
__u32 len = sizeof(info);
__u64 addrs[KMULTI_CNT];
int err, i;
memset(&info, 0, sizeof(info));
@ -266,7 +325,20 @@ static void verify_kmulti_invalid_user_buffer(int fd)
info.kprobe_multi.count = KMULTI_CNT;
info.kprobe_multi.addrs = 0x1; /* invalid addr */
err = bpf_link_get_info_by_fd(fd, &info, &len);
ASSERT_EQ(err, -EFAULT, "invalid_buff");
ASSERT_EQ(err, -EFAULT, "invalid_buff_addrs");
info.kprobe_multi.count = KMULTI_CNT;
info.kprobe_multi.addrs = ptr_to_u64(addrs);
info.kprobe_multi.cookies = 0x1; /* invalid addr */
err = bpf_link_get_info_by_fd(fd, &info, &len);
ASSERT_EQ(err, -EFAULT, "invalid_buff_cookies");
/* cookies && !count */
info.kprobe_multi.count = 0;
info.kprobe_multi.addrs = ptr_to_u64(NULL);
info.kprobe_multi.cookies = ptr_to_u64(cookies);
err = bpf_link_get_info_by_fd(fd, &info, &len);
ASSERT_EQ(err, -EINVAL, "invalid_cookies_count");
}
static int symbols_cmp_r(const void *a, const void *b)
@ -278,13 +350,15 @@ static int symbols_cmp_r(const void *a, const void *b)
}
static void test_kprobe_multi_fill_link_info(struct test_fill_link_info *skel,
bool retprobe, bool invalid)
bool retprobe, bool cookies,
bool invalid)
{
LIBBPF_OPTS(bpf_kprobe_multi_opts, opts);
struct bpf_link *link;
int link_fd, err;
opts.syms = kmulti_syms;
opts.cookies = cookies ? kmulti_cookies : NULL;
opts.cnt = KMULTI_CNT;
opts.retprobe = retprobe;
link = bpf_program__attach_kprobe_multi_opts(skel->progs.kmulti_run, NULL, &opts);
@ -293,7 +367,7 @@ static void test_kprobe_multi_fill_link_info(struct test_fill_link_info *skel,
link_fd = bpf_link__fd(link);
if (!invalid) {
err = verify_kmulti_link_info(link_fd, retprobe);
err = verify_kmulti_link_info(link_fd, retprobe, cookies);
ASSERT_OK(err, "verify_kmulti_link_info");
} else {
verify_kmulti_invalid_user_buffer(link_fd);
@ -513,6 +587,8 @@ void test_fill_link_info(void)
test_kprobe_fill_link_info(skel, BPF_PERF_EVENT_KPROBE, true);
if (test__start_subtest("tracepoint_link_info"))
test_tp_fill_link_info(skel);
if (test__start_subtest("event_link_info"))
test_event_fill_link_info(skel);
uprobe_offset = get_uprobe_offset(&uprobe_func);
if (test__start_subtest("uprobe_link_info"))
@ -523,12 +599,16 @@ void test_fill_link_info(void)
qsort(kmulti_syms, KMULTI_CNT, sizeof(kmulti_syms[0]), symbols_cmp_r);
for (i = 0; i < KMULTI_CNT; i++)
kmulti_addrs[i] = ksym_get_addr(kmulti_syms[i]);
if (test__start_subtest("kprobe_multi_link_info"))
test_kprobe_multi_fill_link_info(skel, false, false);
if (test__start_subtest("kretprobe_multi_link_info"))
test_kprobe_multi_fill_link_info(skel, true, false);
if (test__start_subtest("kprobe_multi_link_info")) {
test_kprobe_multi_fill_link_info(skel, false, false, false);
test_kprobe_multi_fill_link_info(skel, false, true, false);
}
if (test__start_subtest("kretprobe_multi_link_info")) {
test_kprobe_multi_fill_link_info(skel, true, false, false);
test_kprobe_multi_fill_link_info(skel, true, true, false);
}
if (test__start_subtest("kprobe_multi_invalid_ubuff"))
test_kprobe_multi_fill_link_info(skel, true, true);
test_kprobe_multi_fill_link_info(skel, true, true, true);
if (test__start_subtest("uprobe_multi_link_info"))
test_uprobe_multi_fill_link_info(skel, false, false);

@ -0,0 +1,51 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (C) 2023. Huawei Technologies Co., Ltd */
#include <test_progs.h>
#include "linux/filter.h"
#include "kptr_xchg_inline.skel.h"
void test_kptr_xchg_inline(void)
{
struct kptr_xchg_inline *skel;
struct bpf_insn *insn = NULL;
struct bpf_insn exp;
unsigned int cnt;
int err;
#if !(defined(__x86_64__) || defined(__aarch64__))
test__skip();
return;
#endif
skel = kptr_xchg_inline__open_and_load();
if (!ASSERT_OK_PTR(skel, "open_load"))
return;
err = get_xlated_program(bpf_program__fd(skel->progs.kptr_xchg_inline), &insn, &cnt);
if (!ASSERT_OK(err, "prog insn"))
goto out;
/* The original instructions are:
* r1 = map[id:xxx][0]+0
* r2 = 0
* call bpf_kptr_xchg#yyy
*
* call bpf_kptr_xchg#yyy will be inlined as:
* r0 = r2
* r0 = atomic64_xchg((u64 *)(r1 +0), r0)
*/
if (!ASSERT_GT(cnt, 5, "insn cnt"))
goto out;
exp = BPF_MOV64_REG(BPF_REG_0, BPF_REG_2);
if (!ASSERT_OK(memcmp(&insn[3], &exp, sizeof(exp)), "mov"))
goto out;
exp = BPF_ATOMIC_OP(BPF_DW, BPF_XCHG, BPF_REG_1, BPF_REG_0, 0);
if (!ASSERT_OK(memcmp(&insn[4], &exp, sizeof(exp)), "xchg"))
goto out;
out:
free(insn);
kptr_xchg_inline__destroy(skel);
}

@ -30,6 +30,8 @@ void test_libbpf_probe_prog_types(void)
if (prog_type == BPF_PROG_TYPE_UNSPEC)
continue;
if (strcmp(prog_type_name, "__MAX_BPF_PROG_TYPE") == 0)
continue;
if (!test__start_subtest(prog_type_name))
continue;
@ -68,6 +70,8 @@ void test_libbpf_probe_map_types(void)
if (map_type == BPF_MAP_TYPE_UNSPEC)
continue;
if (strcmp(map_type_name, "__MAX_BPF_MAP_TYPE") == 0)
continue;
if (!test__start_subtest(map_type_name))
continue;

@ -132,6 +132,9 @@ static void test_libbpf_bpf_map_type_str(void)
const char *map_type_str;
char buf[256];
if (map_type == __MAX_BPF_MAP_TYPE)
continue;
map_type_name = btf__str_by_offset(btf, e->name_off);
map_type_str = libbpf_bpf_map_type_str(map_type);
ASSERT_OK_PTR(map_type_str, map_type_name);
@ -186,6 +189,9 @@ static void test_libbpf_bpf_prog_type_str(void)
const char *prog_type_str;
char buf[256];
if (prog_type == __MAX_BPF_PROG_TYPE)
continue;
prog_type_name = btf__str_by_offset(btf, e->name_off);
prog_type_str = libbpf_bpf_prog_type_str(prog_type);
ASSERT_OK_PTR(prog_type_str, prog_type_name);

@ -840,7 +840,7 @@ static int load_range_cmp_prog(struct range x, struct range y, enum op op,
.log_level = 2,
.log_buf = log_buf,
.log_size = log_sz,
.prog_flags = BPF_F_TEST_REG_INVARIANTS,
.prog_flags = testing_prog_flags(),
);
/* ; skip exit block below

@ -188,6 +188,7 @@ static int netns_setup_links_and_routes(struct netns_setup_result *result)
{
struct nstoken *nstoken = NULL;
char src_fwd_addr[IFADDR_STR_LEN+1] = {};
char src_addr[IFADDR_STR_LEN + 1] = {};
int err;
if (result->dev_mode == MODE_VETH) {
@ -208,6 +209,9 @@ static int netns_setup_links_and_routes(struct netns_setup_result *result)
if (get_ifaddr("src_fwd", src_fwd_addr))
goto fail;
if (get_ifaddr("src", src_addr))
goto fail;
result->ifindex_src = if_nametoindex("src");
if (!ASSERT_GT(result->ifindex_src, 0, "ifindex_src"))
goto fail;
@ -270,6 +274,13 @@ static int netns_setup_links_and_routes(struct netns_setup_result *result)
SYS(fail, "ip route add " IP4_DST "/32 dev dst_fwd scope global");
SYS(fail, "ip route add " IP6_DST "/128 dev dst_fwd scope global");
if (result->dev_mode == MODE_VETH) {
SYS(fail, "ip neigh add " IP4_SRC " dev src_fwd lladdr %s", src_addr);
SYS(fail, "ip neigh add " IP6_SRC " dev src_fwd lladdr %s", src_addr);
SYS(fail, "ip neigh add " IP4_DST " dev dst_fwd lladdr %s", MAC_DST);
SYS(fail, "ip neigh add " IP6_DST " dev dst_fwd lladdr %s", MAC_DST);
}
close_netns(nstoken);
/** setup in 'dst' namespace */
@ -280,6 +291,7 @@ static int netns_setup_links_and_routes(struct netns_setup_result *result)
SYS(fail, "ip addr add " IP4_DST "/32 dev dst");
SYS(fail, "ip addr add " IP6_DST "/128 dev dst nodad");
SYS(fail, "ip link set dev dst up");
SYS(fail, "ip link set dev lo up");
SYS(fail, "ip route add " IP4_SRC "/32 dev dst scope global");
SYS(fail, "ip route add " IP4_NET "/16 dev dst scope global");
@ -457,7 +469,7 @@ static int set_forwarding(bool enable)
return 0;
}
static void rcv_tstamp(int fd, const char *expected, size_t s)
static int __rcv_tstamp(int fd, const char *expected, size_t s, __u64 *tstamp)
{
struct __kernel_timespec pkt_ts = {};
char ctl[CMSG_SPACE(sizeof(pkt_ts))];
@ -478,7 +490,7 @@ static void rcv_tstamp(int fd, const char *expected, size_t s)
ret = recvmsg(fd, &msg, 0);
if (!ASSERT_EQ(ret, s, "recvmsg"))
return;
return -1;
ASSERT_STRNEQ(data, expected, s, "expected rcv data");
cmsg = CMSG_FIRSTHDR(&msg);
@ -487,6 +499,12 @@ static void rcv_tstamp(int fd, const char *expected, size_t s)
memcpy(&pkt_ts, CMSG_DATA(cmsg), sizeof(pkt_ts));
pkt_ns = pkt_ts.tv_sec * NSEC_PER_SEC + pkt_ts.tv_nsec;
if (tstamp) {
/* caller will check the tstamp itself */
*tstamp = pkt_ns;
return 0;
}
ASSERT_NEQ(pkt_ns, 0, "pkt rcv tstamp");
ret = clock_gettime(CLOCK_REALTIME, &now_ts);
@ -496,6 +514,60 @@ static void rcv_tstamp(int fd, const char *expected, size_t s)
if (ASSERT_GE(now_ns, pkt_ns, "check rcv tstamp"))
ASSERT_LT(now_ns - pkt_ns, 5 * NSEC_PER_SEC,
"check rcv tstamp");
return 0;
}
static void rcv_tstamp(int fd, const char *expected, size_t s)
{
__rcv_tstamp(fd, expected, s, NULL);
}
static int wait_netstamp_needed_key(void)
{
int opt = 1, srv_fd = -1, cli_fd = -1, nretries = 0, err, n;
char buf[] = "testing testing";
struct nstoken *nstoken;
__u64 tstamp = 0;
nstoken = open_netns(NS_DST);
if (!nstoken)
return -1;
srv_fd = start_server(AF_INET6, SOCK_DGRAM, "::1", 0, 0);
if (!ASSERT_GE(srv_fd, 0, "start_server"))
goto done;
err = setsockopt(srv_fd, SOL_SOCKET, SO_TIMESTAMPNS_NEW,
&opt, sizeof(opt));
if (!ASSERT_OK(err, "setsockopt(SO_TIMESTAMPNS_NEW)"))
goto done;
cli_fd = connect_to_fd(srv_fd, TIMEOUT_MILLIS);
if (!ASSERT_GE(cli_fd, 0, "connect_to_fd"))
goto done;
again:
n = write(cli_fd, buf, sizeof(buf));
if (!ASSERT_EQ(n, sizeof(buf), "send to server"))
goto done;
err = __rcv_tstamp(srv_fd, buf, sizeof(buf), &tstamp);
if (!ASSERT_OK(err, "__rcv_tstamp"))
goto done;
if (!tstamp && nretries++ < 5) {
sleep(1);
printf("netstamp_needed_key retry#%d\n", nretries);
goto again;
}
done:
if (!tstamp && srv_fd != -1) {
close(srv_fd);
srv_fd = -1;
}
if (cli_fd != -1)
close(cli_fd);
close_netns(nstoken);
return srv_fd;
}
static void snd_tstamp(int fd, char *b, size_t s)
@ -832,11 +904,20 @@ static void test_tc_redirect_dtime(struct netns_setup_result *setup_result)
{
struct test_tc_dtime *skel;
struct nstoken *nstoken;
int err;
int hold_tstamp_fd, err;
/* Hold a sk with the SOCK_TIMESTAMP set to ensure there
* is no delay in the kernel net_enable_timestamp().
* This ensures the following tests must have
* non zero rcv tstamp in the recvmsg().
*/
hold_tstamp_fd = wait_netstamp_needed_key();
if (!ASSERT_GE(hold_tstamp_fd, 0, "wait_netstamp_needed_key"))
return;
skel = test_tc_dtime__open();
if (!ASSERT_OK_PTR(skel, "test_tc_dtime__open"))
return;
goto done;
skel->rodata->IFINDEX_SRC = setup_result->ifindex_src_fwd;
skel->rodata->IFINDEX_DST = setup_result->ifindex_dst_fwd;
@ -881,6 +962,7 @@ static void test_tc_redirect_dtime(struct netns_setup_result *setup_result)
done:
test_tc_dtime__destroy(skel);
close(hold_tstamp_fd);
}
static void test_tc_redirect_neigh_fib(struct netns_setup_result *setup_result)

@ -0,0 +1,150 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright Amazon.com Inc. or its affiliates. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>
#include <net/if.h>
#include "test_progs.h"
#include "cgroup_helpers.h"
#include "network_helpers.h"
#include "test_tcp_custom_syncookie.skel.h"
static struct test_tcp_custom_syncookie_case {
int family, type;
char addr[16];
char name[10];
} test_cases[] = {
{
.name = "IPv4 TCP",
.family = AF_INET,
.type = SOCK_STREAM,
.addr = "127.0.0.1",
},
{
.name = "IPv6 TCP",
.family = AF_INET6,
.type = SOCK_STREAM,
.addr = "::1",
},
};
static int setup_netns(void)
{
if (!ASSERT_OK(unshare(CLONE_NEWNET), "create netns"))
return -1;
if (!ASSERT_OK(system("ip link set dev lo up"), "ip"))
goto err;
if (!ASSERT_OK(write_sysctl("/proc/sys/net/ipv4/tcp_ecn", "1"),
"write_sysctl"))
goto err;
return 0;
err:
return -1;
}
static int setup_tc(struct test_tcp_custom_syncookie *skel)
{
LIBBPF_OPTS(bpf_tc_hook, qdisc_lo, .attach_point = BPF_TC_INGRESS);
LIBBPF_OPTS(bpf_tc_opts, tc_attach,
.prog_fd = bpf_program__fd(skel->progs.tcp_custom_syncookie));
qdisc_lo.ifindex = if_nametoindex("lo");
if (!ASSERT_OK(bpf_tc_hook_create(&qdisc_lo), "qdisc add dev lo clsact"))
goto err;
if (!ASSERT_OK(bpf_tc_attach(&qdisc_lo, &tc_attach),
"filter add dev lo ingress"))
goto err;
return 0;
err:
return -1;
}
#define msg "Hello World"
#define msglen 11
static void transfer_message(int sender, int receiver)
{
char buf[msglen];
int ret;
ret = send(sender, msg, msglen, 0);
if (!ASSERT_EQ(ret, msglen, "send"))
return;
memset(buf, 0, sizeof(buf));
ret = recv(receiver, buf, msglen, 0);
if (!ASSERT_EQ(ret, msglen, "recv"))
return;
ret = strncmp(buf, msg, msglen);
if (!ASSERT_EQ(ret, 0, "strncmp"))
return;
}
static void create_connection(struct test_tcp_custom_syncookie_case *test_case)
{
int server, client, child;
server = start_server(test_case->family, test_case->type, test_case->addr, 0, 0);
if (!ASSERT_NEQ(server, -1, "start_server"))
return;
client = connect_to_fd(server, 0);
if (!ASSERT_NEQ(client, -1, "connect_to_fd"))
goto close_server;
child = accept(server, NULL, 0);
if (!ASSERT_NEQ(child, -1, "accept"))
goto close_client;
transfer_message(client, child);
transfer_message(child, client);
close(child);
close_client:
close(client);
close_server:
close(server);
}
void test_tcp_custom_syncookie(void)
{
struct test_tcp_custom_syncookie *skel;
int i;
if (setup_netns())
return;
skel = test_tcp_custom_syncookie__open_and_load();
if (!ASSERT_OK_PTR(skel, "open_and_load"))
return;
if (setup_tc(skel))
goto destroy_skel;
for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
if (!test__start_subtest(test_cases[i].name))
continue;
skel->bss->handled_syn = false;
skel->bss->handled_ack = false;
create_connection(&test_cases[i]);
ASSERT_EQ(skel->bss->handled_syn, true, "SYN is not handled at tc.");
ASSERT_EQ(skel->bss->handled_ack, true, "ACK is not handled at tc");
}
destroy_skel:
system("tc qdisc del dev lo clsact");
test_tcp_custom_syncookie__destroy(skel);
}

@ -0,0 +1,75 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
#include <test_progs.h>
#include <time.h>
#include "struct_ops_module.skel.h"
static void check_map_info(struct bpf_map_info *info)
{
struct bpf_btf_info btf_info;
char btf_name[256];
u32 btf_info_len = sizeof(btf_info);
int err, fd;
fd = bpf_btf_get_fd_by_id(info->btf_vmlinux_id);
if (!ASSERT_GE(fd, 0, "get_value_type_btf_obj_fd"))
return;
memset(&btf_info, 0, sizeof(btf_info));
btf_info.name = ptr_to_u64(btf_name);
btf_info.name_len = sizeof(btf_name);
err = bpf_btf_get_info_by_fd(fd, &btf_info, &btf_info_len);
if (!ASSERT_OK(err, "get_value_type_btf_obj_info"))
goto cleanup;
if (!ASSERT_EQ(strcmp(btf_name, "bpf_testmod"), 0, "get_value_type_btf_obj_name"))
goto cleanup;
cleanup:
close(fd);
}
static void test_struct_ops_load(void)
{
DECLARE_LIBBPF_OPTS(bpf_object_open_opts, opts);
struct struct_ops_module *skel;
struct bpf_map_info info = {};
struct bpf_link *link;
int err;
u32 len;
skel = struct_ops_module__open_opts(&opts);
if (!ASSERT_OK_PTR(skel, "struct_ops_module_open"))
return;
err = struct_ops_module__load(skel);
if (!ASSERT_OK(err, "struct_ops_module_load"))
goto cleanup;
len = sizeof(info);
err = bpf_map_get_info_by_fd(bpf_map__fd(skel->maps.testmod_1), &info,
&len);
if (!ASSERT_OK(err, "bpf_map_get_info_by_fd"))
goto cleanup;
link = bpf_map__attach_struct_ops(skel->maps.testmod_1);
ASSERT_OK_PTR(link, "attach_test_mod_1");
/* test_2() will be called from bpf_dummy_reg() in bpf_testmod.c */
ASSERT_EQ(skel->bss->test_2_result, 7, "test_2_result");
bpf_link__destroy(link);
check_map_info(&info);
cleanup:
struct_ops_module__destroy(skel);
}
void serial_test_struct_ops_module(void)
{
if (test__start_subtest("test_struct_ops_load"))
test_struct_ops_load();
}

File diff suppressed because it is too large

@ -9,7 +9,7 @@ void test_xdpwall(void)
struct xdpwall *skel;
skel = xdpwall__open_and_load();
ASSERT_OK_PTR(skel, "Does LLMV have https://reviews.llvm.org/D109073?");
ASSERT_OK_PTR(skel, "Does LLVM have https://github.com/llvm/llvm-project/commit/ea72b0319d7b0f0c2fcf41d121afa5d031b319d5?");
xdpwall__destroy(skel);
}

@ -80,7 +80,7 @@
#define __imm(name) [name]"i"(name)
#define __imm_const(name, expr) [name]"i"(expr)
#define __imm_addr(name) [name]"i"(&name)
#define __imm_ptr(name) [name]"p"(&name)
#define __imm_ptr(name) [name]"r"(&name)
#define __imm_insn(name, expr) [name]"i"(*(long *)&(expr))
/* Magic constants used with __retval() */

@ -51,9 +51,25 @@
#define ICSK_TIME_LOSS_PROBE 5
#define ICSK_TIME_REO_TIMEOUT 6
#define ETH_ALEN 6
#define ETH_HLEN 14
#define ETH_P_IP 0x0800
#define ETH_P_IPV6 0x86DD
#define NEXTHDR_TCP 6
#define TCPOPT_NOP 1
#define TCPOPT_EOL 0
#define TCPOPT_MSS 2
#define TCPOPT_WINDOW 3
#define TCPOPT_TIMESTAMP 8
#define TCPOPT_SACK_PERM 4
#define TCPOLEN_MSS 4
#define TCPOLEN_WINDOW 3
#define TCPOLEN_TIMESTAMP 10
#define TCPOLEN_SACK_PERM 2
#define CHECKSUM_NONE 0
#define CHECKSUM_PARTIAL 3

@ -78,8 +78,8 @@ int iter_err_unsafe_asm_loop(const void *ctx)
"*(u32 *)(r1 + 0) = r6;" /* invalid */
:
: [it]"r"(&it),
[small_arr]"p"(small_arr),
[zero]"p"(zero),
[small_arr]"r"(small_arr),
[zero]"r"(zero),
__imm(bpf_iter_num_new),
__imm(bpf_iter_num_next),
__imm(bpf_iter_num_destroy)

@ -0,0 +1,48 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (C) 2023. Huawei Technologies Co., Ltd */
#include <linux/types.h>
#include <bpf/bpf_helpers.h>
#include "bpf_experimental.h"
#include "bpf_misc.h"
char _license[] SEC("license") = "GPL";
struct bin_data {
char blob[32];
};
#define private(name) SEC(".bss." #name) __hidden __attribute__((aligned(8)))
private(kptr) struct bin_data __kptr * ptr;
SEC("tc")
__naked int kptr_xchg_inline(void)
{
asm volatile (
"r1 = %[ptr] ll;"
"r2 = 0;"
"call %[bpf_kptr_xchg];"
"if r0 == 0 goto 1f;"
"r1 = r0;"
"r2 = 0;"
"call %[bpf_obj_drop_impl];"
"1:"
"r0 = 0;"
"exit;"
:
: __imm_addr(ptr),
__imm(bpf_kptr_xchg),
__imm(bpf_obj_drop_impl)
: __clobber_all
);
}
/* BTF FUNC records are not generated for kfuncs referenced
* from inline assembly. These records are necessary for
* libbpf to link the program. The function below is a hack
* to ensure that BTF FUNC records are generated.
*/
void __btf_root(void)
{
bpf_obj_drop(NULL);
}

@ -0,0 +1,13 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
char _license[] SEC("license") = "GPL";
struct {
__uint(type, BPF_MAP_TYPE_QUEUE);
__uint(max_entries, 1);
__type(value, __u32);
} priv_map SEC(".maps");

@ -0,0 +1,13 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
char _license[] SEC("license") = "GPL";
SEC("kprobe")
int kprobe_prog(void *ctx)
{
return 1;
}

@ -0,0 +1,30 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include "../bpf_testmod/bpf_testmod.h"
char _license[] SEC("license") = "GPL";
int test_2_result = 0;
SEC("struct_ops/test_1")
int BPF_PROG(test_1)
{
return 0xdeadbeef;
}
SEC("struct_ops/test_2")
int BPF_PROG(test_2, int a, int b)
{
test_2_result = a + b;
return a + b;
}
SEC(".struct_ops.link")
struct bpf_testmod_ops testmod_1 = {
.test_1 = (void *)test_1,
.test_2 = (void *)test_2,
};

@ -80,7 +80,7 @@ int test_core_type_id(void *ctx)
* to detect whether this test has to be executed, however strange
* that might look like.
*
* [0] https://reviews.llvm.org/D85174
* [0] https://github.com/llvm/llvm-project/commit/00602ee7ef0bf6c68d690a2bd729c12b95c95c99
*/
#if __has_builtin(__builtin_preserve_type_info)
struct core_reloc_type_id_output *out = (void *)&data.out;

@ -33,6 +33,12 @@ int BPF_PROG(tp_run)
return 0;
}
SEC("perf_event")
int event_run(void *ctx)
{
return 0;
}
SEC("kprobe.multi")
int BPF_PROG(kmulti_run)
{

@ -21,6 +21,32 @@ struct {
__type(value, __u32);
} mim_hash SEC(".maps");
/* The following three maps are used to test
* perf_event_array map can be an inner
* map of hash/array_of_maps.
*/
struct perf_event_array {
__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
__type(key, __u32);
__type(value, __u32);
} inner_map0 SEC(".maps");
struct {
__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
__uint(max_entries, 1);
__type(key, __u32);
__array(values, struct perf_event_array);
} mim_array_pe SEC(".maps") = {
.values = {&inner_map0}};
struct {
__uint(type, BPF_MAP_TYPE_HASH_OF_MAPS);
__uint(max_entries, 1);
__type(key, __u32);
__array(values, struct perf_event_array);
} mim_hash_pe SEC(".maps") = {
.values = {&inner_map0}};
SEC("xdp")
int xdp_mimtest0(struct xdp_md *ctx)
{

@ -0,0 +1,64 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright Amazon.com Inc. or its affiliates. */
#ifndef _TEST_SIPHASH_H
#define _TEST_SIPHASH_H
/* include/linux/bitops.h */
static inline u64 rol64(u64 word, unsigned int shift)
{
return (word << (shift & 63)) | (word >> ((-shift) & 63));
}
/* include/linux/siphash.h */
#define SIPHASH_PERMUTATION(a, b, c, d) ( \
(a) += (b), (b) = rol64((b), 13), (b) ^= (a), (a) = rol64((a), 32), \
(c) += (d), (d) = rol64((d), 16), (d) ^= (c), \
(a) += (d), (d) = rol64((d), 21), (d) ^= (a), \
(c) += (b), (b) = rol64((b), 17), (b) ^= (c), (c) = rol64((c), 32))
#define SIPHASH_CONST_0 0x736f6d6570736575ULL
#define SIPHASH_CONST_1 0x646f72616e646f6dULL
#define SIPHASH_CONST_2 0x6c7967656e657261ULL
#define SIPHASH_CONST_3 0x7465646279746573ULL
/* lib/siphash.c */
#define SIPROUND SIPHASH_PERMUTATION(v0, v1, v2, v3)
#define PREAMBLE(len) \
u64 v0 = SIPHASH_CONST_0; \
u64 v1 = SIPHASH_CONST_1; \
u64 v2 = SIPHASH_CONST_2; \
u64 v3 = SIPHASH_CONST_3; \
u64 b = ((u64)(len)) << 56; \
v3 ^= key->key[1]; \
v2 ^= key->key[0]; \
v1 ^= key->key[1]; \
v0 ^= key->key[0];
#define POSTAMBLE \
v3 ^= b; \
SIPROUND; \
SIPROUND; \
v0 ^= b; \
v2 ^= 0xff; \
SIPROUND; \
SIPROUND; \
SIPROUND; \
SIPROUND; \
return (v0 ^ v1) ^ (v2 ^ v3);
static inline u64 siphash_2u64(const u64 first, const u64 second, const siphash_key_t *key)
{
PREAMBLE(16)
v3 ^= first;
SIPROUND;
SIPROUND;
v0 ^= first;
v3 ^= second;
SIPROUND;
SIPROUND;
v0 ^= second;
POSTAMBLE
}
#endif
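
As a reading aid for the header above, here is a minimal userspace sketch of the same two-word SipHash-2-4 construction, written with plain functions instead of the PREAMBLE/POSTAMBLE macros. The names `sip_key`, `sip_round`, `sip_2u64`, and `sip_selfcheck` are hypothetical and not part of the patch; the constants and the order of operations are copied from the header, and `<stdint.h>` types stand in for the kernel's `u64`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's siphash_key_t. */
struct sip_key { uint64_t key[2]; };

/* Rotate left; masking both shifts keeps shift == 0 well defined. */
static inline uint64_t rol64(uint64_t word, unsigned int shift)
{
	return (word << (shift & 63)) | (word >> ((-shift) & 63));
}

/* One SipRound, same step order as SIPHASH_PERMUTATION in the header. */
static inline void sip_round(uint64_t v[4])
{
	v[0] += v[1]; v[1] = rol64(v[1], 13); v[1] ^= v[0]; v[0] = rol64(v[0], 32);
	v[2] += v[3]; v[3] = rol64(v[3], 16); v[3] ^= v[2];
	v[0] += v[3]; v[3] = rol64(v[3], 21); v[3] ^= v[0];
	v[2] += v[1]; v[1] = rol64(v[1], 17); v[1] ^= v[2]; v[2] = rol64(v[2], 32);
}

/* Hash two 64-bit words: two rounds per input word, four final rounds. */
static inline uint64_t sip_2u64(uint64_t first, uint64_t second,
				const struct sip_key *key)
{
	uint64_t v[4] = {
		0x736f6d6570736575ULL ^ key->key[0],
		0x646f72616e646f6dULL ^ key->key[1],
		0x6c7967656e657261ULL ^ key->key[0],
		0x7465646279746573ULL ^ key->key[1],
	};
	uint64_t b = (uint64_t)16 << 56; /* input length (16 bytes) in the top byte */
	int i;

	v[3] ^= first;  sip_round(v); sip_round(v); v[0] ^= first;
	v[3] ^= second; sip_round(v); sip_round(v); v[0] ^= second;
	v[3] ^= b;      sip_round(v); sip_round(v); v[0] ^= b;
	v[2] ^= 0xff;
	for (i = 0; i < 4; i++)
		sip_round(v);
	return (v[0] ^ v[1]) ^ (v[2] ^ v[3]);
}

/* Sanity checks on the rotate helper and on determinism of the hash. */
static void sip_selfcheck(void)
{
	struct sip_key k = { { 0x0706050403020100ULL, 0x0f0e0d0c0b0a0908ULL } };

	assert(rol64(1, 1) == 2);
	assert(rol64(0x8000000000000000ULL, 1) == 1);
	assert(rol64(0x1234, 0) == 0x1234);
	assert(sip_2u64(1, 2, &k) == sip_2u64(1, 2, &k));
}
```

The self-check only covers the rotate helper and determinism; it does not compare against a published SipHash test vector.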

@ -0,0 +1,572 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright Amazon.com Inc. or its affiliates. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>
#include "bpf_tracing_net.h"
#include "bpf_kfuncs.h"
#include "test_siphash.h"
#include "test_tcp_custom_syncookie.h"
/* Hash is calculated for each client and split into ISN and TS.
*
* MSB LSB
* ISN: | 31 ... 8 | 7 6 | 5 | 4 | 3 2 1 0 |
* | Hash_1 | MSS | ECN | SACK | WScale |
*
* TS: | 31 ... 8 | 7 ... 0 |
* | Random | Hash_2 |
*/
#define COOKIE_BITS 8
#define COOKIE_MASK (((__u32)1 << COOKIE_BITS) - 1)
enum {
/* 0xf is invalid thus means that SYN did not have WScale. */
BPF_SYNCOOKIE_WSCALE_MASK = (1 << 4) - 1,
BPF_SYNCOOKIE_SACK = (1 << 4),
BPF_SYNCOOKIE_ECN = (1 << 5),
};
#define MSS_LOCAL_IPV4 65495
#define MSS_LOCAL_IPV6 65476
const __u16 msstab4[] = {
536,
1300,
1460,
MSS_LOCAL_IPV4,
};
const __u16 msstab6[] = {
1280 - 60, /* IPV6_MIN_MTU - 60 */
1480 - 60,
9000 - 60,
MSS_LOCAL_IPV6,
};
static siphash_key_t test_key_siphash = {
{ 0x0706050403020100ULL, 0x0f0e0d0c0b0a0908ULL }
};
struct tcp_syncookie {
struct __sk_buff *skb;
void *data_end;
struct ethhdr *eth;
struct iphdr *ipv4;
struct ipv6hdr *ipv6;
struct tcphdr *tcp;
union {
char *ptr;
__be32 *ptr32;
};
struct bpf_tcp_req_attrs attrs;
u32 cookie;
u64 first;
};
bool handled_syn, handled_ack;
static int tcp_load_headers(struct tcp_syncookie *ctx)
{
ctx->data_end = (void *)(long)ctx->skb->data_end;
ctx->eth = (struct ethhdr *)(long)ctx->skb->data;
if (ctx->eth + 1 > ctx->data_end)
goto err;
switch (bpf_ntohs(ctx->eth->h_proto)) {
case ETH_P_IP:
ctx->ipv4 = (struct iphdr *)(ctx->eth + 1);
if (ctx->ipv4 + 1 > ctx->data_end)
goto err;
if (ctx->ipv4->ihl != sizeof(*ctx->ipv4) / 4)
goto err;
if (ctx->ipv4->version != 4)
goto err;
if (ctx->ipv4->protocol != IPPROTO_TCP)
goto err;
ctx->tcp = (struct tcphdr *)(ctx->ipv4 + 1);
break;
case ETH_P_IPV6:
ctx->ipv6 = (struct ipv6hdr *)(ctx->eth + 1);
if (ctx->ipv6 + 1 > ctx->data_end)
goto err;
if (ctx->ipv6->version != 6)
goto err;
if (ctx->ipv6->nexthdr != NEXTHDR_TCP)
goto err;
ctx->tcp = (struct tcphdr *)(ctx->ipv6 + 1);
break;
default:
goto err;
}
if (ctx->tcp + 1 > ctx->data_end)
goto err;
return 0;
err:
return -1;
}
static int tcp_reload_headers(struct tcp_syncookie *ctx)
{
/* Without volatile,
* R3 32-bit pointer arithmetic prohibited
*/
volatile u64 data_len = ctx->skb->data_end - ctx->skb->data;
if (ctx->tcp->doff < sizeof(*ctx->tcp) / 4)
goto err;
/* Needed to calculate csum and parse TCP options. */
if (bpf_skb_change_tail(ctx->skb, data_len + 60 - ctx->tcp->doff * 4, 0))
goto err;
ctx->data_end = (void *)(long)ctx->skb->data_end;
ctx->eth = (struct ethhdr *)(long)ctx->skb->data;
if (ctx->ipv4) {
ctx->ipv4 = (struct iphdr *)(ctx->eth + 1);
ctx->ipv6 = NULL;
ctx->tcp = (struct tcphdr *)(ctx->ipv4 + 1);
} else {
ctx->ipv4 = NULL;
ctx->ipv6 = (struct ipv6hdr *)(ctx->eth + 1);
ctx->tcp = (struct tcphdr *)(ctx->ipv6 + 1);
}
if ((void *)ctx->tcp + 60 > ctx->data_end)
goto err;
return 0;
err:
return -1;
}
static __sum16 tcp_v4_csum(struct tcp_syncookie *ctx, __wsum csum)
{
return csum_tcpudp_magic(ctx->ipv4->saddr, ctx->ipv4->daddr,
ctx->tcp->doff * 4, IPPROTO_TCP, csum);
}
static __sum16 tcp_v6_csum(struct tcp_syncookie *ctx, __wsum csum)
{
return csum_ipv6_magic(&ctx->ipv6->saddr, &ctx->ipv6->daddr,
ctx->tcp->doff * 4, IPPROTO_TCP, csum);
}
static int tcp_validate_header(struct tcp_syncookie *ctx)
{
s64 csum;
if (tcp_reload_headers(ctx))
goto err;
csum = bpf_csum_diff(0, 0, (void *)ctx->tcp, ctx->tcp->doff * 4, 0);
if (csum < 0)
goto err;
if (ctx->ipv4) {
/* check tcp_v4_csum(csum) is 0 if not on lo. */
csum = bpf_csum_diff(0, 0, (void *)ctx->ipv4, ctx->ipv4->ihl * 4, 0);
if (csum < 0)
goto err;
if (csum_fold(csum) != 0)
goto err;
} else if (ctx->ipv6) {
/* check tcp_v6_csum(csum) is 0 if not on lo. */
}
return 0;
err:
return -1;
}
static int tcp_parse_option(__u32 index, struct tcp_syncookie *ctx)
{
char opcode, opsize;
if (ctx->ptr + 1 > ctx->data_end)
goto stop;
opcode = *ctx->ptr++;
if (opcode == TCPOPT_EOL)
goto stop;
if (opcode == TCPOPT_NOP)
goto next;
if (ctx->ptr + 1 > ctx->data_end)
goto stop;
opsize = *ctx->ptr++;
if (opsize < 2)
goto stop;
switch (opcode) {
case TCPOPT_MSS:
if (opsize == TCPOLEN_MSS && ctx->tcp->syn &&
ctx->ptr + (TCPOLEN_MSS - 2) < ctx->data_end)
ctx->attrs.mss = get_unaligned_be16(ctx->ptr);
break;
case TCPOPT_WINDOW:
if (opsize == TCPOLEN_WINDOW && ctx->tcp->syn &&
ctx->ptr + (TCPOLEN_WINDOW - 2) < ctx->data_end) {
ctx->attrs.wscale_ok = 1;
ctx->attrs.snd_wscale = *ctx->ptr;
}
break;
case TCPOPT_TIMESTAMP:
if (opsize == TCPOLEN_TIMESTAMP &&
ctx->ptr + (TCPOLEN_TIMESTAMP - 2) < ctx->data_end) {
ctx->attrs.rcv_tsval = get_unaligned_be32(ctx->ptr);
ctx->attrs.rcv_tsecr = get_unaligned_be32(ctx->ptr + 4);
if (ctx->tcp->syn && ctx->attrs.rcv_tsecr)
ctx->attrs.tstamp_ok = 0;
else
ctx->attrs.tstamp_ok = 1;
}
break;
case TCPOPT_SACK_PERM:
if (opsize == TCPOLEN_SACK_PERM && ctx->tcp->syn &&
ctx->ptr + (TCPOLEN_SACK_PERM - 2) < ctx->data_end)
ctx->attrs.sack_ok = 1;
break;
}
ctx->ptr += opsize - 2;
next:
return 0;
stop:
return 1;
}
static void tcp_parse_options(struct tcp_syncookie *ctx)
{
ctx->ptr = (char *)(ctx->tcp + 1);
bpf_loop(40, tcp_parse_option, ctx, 0);
}
static int tcp_validate_sysctl(struct tcp_syncookie *ctx)
{
if ((ctx->ipv4 && ctx->attrs.mss != MSS_LOCAL_IPV4) ||
(ctx->ipv6 && ctx->attrs.mss != MSS_LOCAL_IPV6))
goto err;
if (!ctx->attrs.wscale_ok || ctx->attrs.snd_wscale != 7)
goto err;
if (!ctx->attrs.tstamp_ok)
goto err;
if (!ctx->attrs.sack_ok)
goto err;
if (!ctx->tcp->ece || !ctx->tcp->cwr)
goto err;
return 0;
err:
return -1;
}
static void tcp_prepare_cookie(struct tcp_syncookie *ctx)
{
u32 seq = bpf_ntohl(ctx->tcp->seq);
u64 first = 0, second;
int mssind = 0;
u32 hash;
if (ctx->ipv4) {
for (mssind = ARRAY_SIZE(msstab4) - 1; mssind; mssind--)
if (ctx->attrs.mss >= msstab4[mssind])
break;
ctx->attrs.mss = msstab4[mssind];
first = (u64)ctx->ipv4->saddr << 32 | ctx->ipv4->daddr;
} else if (ctx->ipv6) {
for (mssind = ARRAY_SIZE(msstab6) - 1; mssind; mssind--)
if (ctx->attrs.mss >= msstab6[mssind])
break;
ctx->attrs.mss = msstab6[mssind];
first = (u64)ctx->ipv6->saddr.in6_u.u6_addr8[0] << 32 |
ctx->ipv6->daddr.in6_u.u6_addr32[0];
}
second = (u64)seq << 32 | ctx->tcp->source << 16 | ctx->tcp->dest;
hash = siphash_2u64(first, second, &test_key_siphash);
if (ctx->attrs.tstamp_ok) {
ctx->attrs.rcv_tsecr = bpf_get_prandom_u32();
ctx->attrs.rcv_tsecr &= ~COOKIE_MASK;
ctx->attrs.rcv_tsecr |= hash & COOKIE_MASK;
}
hash &= ~COOKIE_MASK;
hash |= mssind << 6;
if (ctx->attrs.wscale_ok)
hash |= ctx->attrs.snd_wscale & BPF_SYNCOOKIE_WSCALE_MASK;
if (ctx->attrs.sack_ok)
hash |= BPF_SYNCOOKIE_SACK;
if (ctx->attrs.tstamp_ok && ctx->tcp->ece && ctx->tcp->cwr)
hash |= BPF_SYNCOOKIE_ECN;
ctx->cookie = hash;
}
static void tcp_write_options(struct tcp_syncookie *ctx)
{
ctx->ptr32 = (__be32 *)(ctx->tcp + 1);
*ctx->ptr32++ = bpf_htonl(TCPOPT_MSS << 24 | TCPOLEN_MSS << 16 |
ctx->attrs.mss);
if (ctx->attrs.wscale_ok)
*ctx->ptr32++ = bpf_htonl(TCPOPT_NOP << 24 |
TCPOPT_WINDOW << 16 |
TCPOLEN_WINDOW << 8 |
ctx->attrs.snd_wscale);
if (ctx->attrs.tstamp_ok) {
if (ctx->attrs.sack_ok)
*ctx->ptr32++ = bpf_htonl(TCPOPT_SACK_PERM << 24 |
TCPOLEN_SACK_PERM << 16 |
TCPOPT_TIMESTAMP << 8 |
TCPOLEN_TIMESTAMP);
else
*ctx->ptr32++ = bpf_htonl(TCPOPT_NOP << 24 |
TCPOPT_NOP << 16 |
TCPOPT_TIMESTAMP << 8 |
TCPOLEN_TIMESTAMP);
*ctx->ptr32++ = bpf_htonl(ctx->attrs.rcv_tsecr);
*ctx->ptr32++ = bpf_htonl(ctx->attrs.rcv_tsval);
} else if (ctx->attrs.sack_ok) {
*ctx->ptr32++ = bpf_htonl(TCPOPT_NOP << 24 |
TCPOPT_NOP << 16 |
TCPOPT_SACK_PERM << 8 |
TCPOLEN_SACK_PERM);
}
}
static int tcp_handle_syn(struct tcp_syncookie *ctx)
{
s64 csum;
if (tcp_validate_header(ctx))
goto err;
tcp_parse_options(ctx);
if (tcp_validate_sysctl(ctx))
goto err;
tcp_prepare_cookie(ctx);
tcp_write_options(ctx);
swap(ctx->tcp->source, ctx->tcp->dest);
ctx->tcp->check = 0;
ctx->tcp->ack_seq = bpf_htonl(bpf_ntohl(ctx->tcp->seq) + 1);
ctx->tcp->seq = bpf_htonl(ctx->cookie);
ctx->tcp->doff = ((long)ctx->ptr32 - (long)ctx->tcp) >> 2;
ctx->tcp->ack = 1;
if (!ctx->attrs.tstamp_ok || !ctx->tcp->ece || !ctx->tcp->cwr)
ctx->tcp->ece = 0;
ctx->tcp->cwr = 0;
csum = bpf_csum_diff(0, 0, (void *)ctx->tcp, ctx->tcp->doff * 4, 0);
if (csum < 0)
goto err;
if (ctx->ipv4) {
swap(ctx->ipv4->saddr, ctx->ipv4->daddr);
ctx->tcp->check = tcp_v4_csum(ctx, csum);
ctx->ipv4->check = 0;
ctx->ipv4->tos = 0;
ctx->ipv4->tot_len = bpf_htons((long)ctx->ptr32 - (long)ctx->ipv4);
ctx->ipv4->id = 0;
ctx->ipv4->ttl = 64;
csum = bpf_csum_diff(0, 0, (void *)ctx->ipv4, sizeof(*ctx->ipv4), 0);
if (csum < 0)
goto err;
ctx->ipv4->check = csum_fold(csum);
} else if (ctx->ipv6) {
swap(ctx->ipv6->saddr, ctx->ipv6->daddr);
ctx->tcp->check = tcp_v6_csum(ctx, csum);
*(__be32 *)ctx->ipv6 = bpf_htonl(0x60000000);
ctx->ipv6->payload_len = bpf_htons((long)ctx->ptr32 - (long)ctx->tcp);
ctx->ipv6->hop_limit = 64;
}
swap_array(ctx->eth->h_source, ctx->eth->h_dest);
if (bpf_skb_change_tail(ctx->skb, (long)ctx->ptr32 - (long)ctx->eth, 0))
goto err;
return bpf_redirect(ctx->skb->ifindex, 0);
err:
return TC_ACT_SHOT;
}
static int tcp_validate_cookie(struct tcp_syncookie *ctx)
{
u32 cookie = bpf_ntohl(ctx->tcp->ack_seq) - 1;
u32 seq = bpf_ntohl(ctx->tcp->seq) - 1;
u64 first = 0, second;
int mssind;
u32 hash;
if (ctx->ipv4)
first = (u64)ctx->ipv4->saddr << 32 | ctx->ipv4->daddr;
else if (ctx->ipv6)
first = (u64)ctx->ipv6->saddr.in6_u.u6_addr8[0] << 32 |
ctx->ipv6->daddr.in6_u.u6_addr32[0];
second = (u64)seq << 32 | ctx->tcp->source << 16 | ctx->tcp->dest;
hash = siphash_2u64(first, second, &test_key_siphash);
if (ctx->attrs.tstamp_ok)
hash -= ctx->attrs.rcv_tsecr & COOKIE_MASK;
else
hash &= ~COOKIE_MASK;
hash -= cookie & ~COOKIE_MASK;
if (hash)
goto err;
mssind = (cookie & (3 << 6)) >> 6;
if (ctx->ipv4) {
if (mssind > ARRAY_SIZE(msstab4))
goto err;
ctx->attrs.mss = msstab4[mssind];
} else {
if (mssind > ARRAY_SIZE(msstab6))
goto err;
ctx->attrs.mss = msstab6[mssind];
}
ctx->attrs.snd_wscale = cookie & BPF_SYNCOOKIE_WSCALE_MASK;
ctx->attrs.rcv_wscale = ctx->attrs.snd_wscale;
ctx->attrs.wscale_ok = ctx->attrs.snd_wscale == BPF_SYNCOOKIE_WSCALE_MASK;
ctx->attrs.sack_ok = cookie & BPF_SYNCOOKIE_SACK;
ctx->attrs.ecn_ok = cookie & BPF_SYNCOOKIE_ECN;
return 0;
err:
return -1;
}
static int tcp_handle_ack(struct tcp_syncookie *ctx)
{
struct bpf_sock_tuple tuple;
struct bpf_sock *skc;
int ret = TC_ACT_OK;
struct sock *sk;
u32 tuple_size;
if (ctx->ipv4) {
tuple.ipv4.saddr = ctx->ipv4->saddr;
tuple.ipv4.daddr = ctx->ipv4->daddr;
tuple.ipv4.sport = ctx->tcp->source;
tuple.ipv4.dport = ctx->tcp->dest;
tuple_size = sizeof(tuple.ipv4);
} else if (ctx->ipv6) {
__builtin_memcpy(tuple.ipv6.saddr, &ctx->ipv6->saddr, sizeof(tuple.ipv6.saddr));
__builtin_memcpy(tuple.ipv6.daddr, &ctx->ipv6->daddr, sizeof(tuple.ipv6.daddr));
tuple.ipv6.sport = ctx->tcp->source;
tuple.ipv6.dport = ctx->tcp->dest;
tuple_size = sizeof(tuple.ipv6);
} else {
goto out;
}
skc = bpf_skc_lookup_tcp(ctx->skb, &tuple, tuple_size, -1, 0);
if (!skc)
goto out;
if (skc->state != TCP_LISTEN)
goto release;
sk = (struct sock *)bpf_skc_to_tcp_sock(skc);
if (!sk)
goto err;
if (tcp_validate_header(ctx))
goto err;
tcp_parse_options(ctx);
if (tcp_validate_cookie(ctx))
goto err;
ret = bpf_sk_assign_tcp_reqsk(ctx->skb, sk, &ctx->attrs, sizeof(ctx->attrs));
if (ret < 0)
goto err;
release:
bpf_sk_release(skc);
out:
return ret;
err:
ret = TC_ACT_SHOT;
goto release;
}
SEC("tc")
int tcp_custom_syncookie(struct __sk_buff *skb)
{
struct tcp_syncookie ctx = {
.skb = skb,
};
if (tcp_load_headers(&ctx))
return TC_ACT_OK;
if (ctx.tcp->rst)
return TC_ACT_OK;
if (ctx.tcp->syn) {
if (ctx.tcp->ack)
return TC_ACT_OK;
handled_syn = true;
return tcp_handle_syn(&ctx);
}
handled_ack = true;
return tcp_handle_ack(&ctx);
}
char _license[] SEC("license") = "GPL";
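
The ISN layout documented in the comment near the top of this file (Hash_1 in bits 31..8, MSS index in bits 7..6, ECN in bit 5, SACK in bit 4, WScale in bits 3..0) can be sketched in userspace as a pack/unpack pair. This is an illustration only: `isn_fields`, `isn_pack`, `isn_unpack`, and `isn_selfcheck` are hypothetical names, and the sketch covers just the bit packing, not the hash computation or the timestamp half of the cookie.

```c
#include <assert.h>
#include <stdint.h>

#define COOKIE_BITS 8
#define COOKIE_MASK (((uint32_t)1 << COOKIE_BITS) - 1)
#define WSCALE_MASK ((1u << 4) - 1)	/* bits 3..0 */
#define SACK_BIT    (1u << 4)		/* bit 4 */
#define ECN_BIT     (1u << 5)		/* bit 5 */

/* Hypothetical decoded view of the 32-bit ISN. */
struct isn_fields {
	uint32_t hash_1;	/* upper 24 bits of the hash */
	unsigned int mssind;	/* index into the MSS table, 0..3 */
	unsigned int ecn, sack;	/* single-bit flags */
	unsigned int wscale;	/* 4-bit window scale field */
};

/* Keep the upper 24 hash bits and place the option bits in the low byte. */
static uint32_t isn_pack(uint32_t hash, unsigned int mssind,
			 int ecn, int sack, unsigned int wscale)
{
	uint32_t isn = hash & ~COOKIE_MASK;

	isn |= (mssind & 3) << 6;
	if (ecn)
		isn |= ECN_BIT;
	if (sack)
		isn |= SACK_BIT;
	isn |= wscale & WSCALE_MASK;
	return isn;
}

/* Inverse of isn_pack(): split the ISN back into its fields. */
static struct isn_fields isn_unpack(uint32_t isn)
{
	struct isn_fields f = {
		.hash_1 = isn >> COOKIE_BITS,
		.mssind = (isn >> 6) & 3,
		.ecn    = !!(isn & ECN_BIT),
		.sack   = !!(isn & SACK_BIT),
		.wscale = isn & WSCALE_MASK,
	};
	return f;
}

/* Round-trip check on one example value. */
static void isn_selfcheck(void)
{
	uint32_t isn = isn_pack(0xdeadbeef, 3, 1, 1, 7);
	struct isn_fields f = isn_unpack(isn);

	assert(isn == 0xdeadbef7);
	assert(f.hash_1 == 0xdeadbe && f.mssind == 3);
	assert(f.ecn == 1 && f.sack == 1 && f.wscale == 7);
}
```

Because the MSS index occupies two bits, it can only take the values 0 through 3, which matches the four-entry `msstab4` and `msstab6` tables.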

@ -0,0 +1,140 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright Amazon.com Inc. or its affiliates. */
#ifndef _TEST_TCP_SYNCOOKIE_H
#define _TEST_TCP_SYNCOOKIE_H
#define __packed __attribute__((__packed__))
#define __force
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
#define swap(a, b) \
do { \
typeof(a) __tmp = (a); \
(a) = (b); \
(b) = __tmp; \
} while (0)
#define swap_array(a, b) \
do { \
typeof(a) __tmp[sizeof(a)]; \
__builtin_memcpy(__tmp, a, sizeof(a)); \
__builtin_memcpy(a, b, sizeof(a)); \
__builtin_memcpy(b, __tmp, sizeof(a)); \
} while (0)
/* asm-generic/unaligned.h */
#define __get_unaligned_t(type, ptr) ({ \
const struct { type x; } __packed * __pptr = (typeof(__pptr))(ptr); \
__pptr->x; \
})
#define get_unaligned(ptr) __get_unaligned_t(typeof(*(ptr)), (ptr))
static inline u16 get_unaligned_be16(const void *p)
{
return bpf_ntohs(__get_unaligned_t(__be16, p));
}
static inline u32 get_unaligned_be32(const void *p)
{
return bpf_ntohl(__get_unaligned_t(__be32, p));
}
/* lib/checksum.c */
static inline u32 from64to32(u64 x)
{
/* add up 32-bit and 32-bit for 32+c bit */
x = (x & 0xffffffff) + (x >> 32);
/* add up carry.. */
x = (x & 0xffffffff) + (x >> 32);
return (u32)x;
}
static inline __wsum csum_tcpudp_nofold(__be32 saddr, __be32 daddr,
__u32 len, __u8 proto, __wsum sum)
{
unsigned long long s = (__force u32)sum;
s += (__force u32)saddr;
s += (__force u32)daddr;
#ifdef __BIG_ENDIAN
s += proto + len;
#else
s += (proto + len) << 8;
#endif
return (__force __wsum)from64to32(s);
}
/* asm-generic/checksum.h */
static inline __sum16 csum_fold(__wsum csum)
{
u32 sum = (__force u32)csum;
sum = (sum & 0xffff) + (sum >> 16);
sum = (sum & 0xffff) + (sum >> 16);
return (__force __sum16)~sum;
}
static inline __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr, __u32 len,
__u8 proto, __wsum sum)
{
return csum_fold(csum_tcpudp_nofold(saddr, daddr, len, proto, sum));
}
/* net/ipv6/ip6_checksum.c */
static inline __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
const struct in6_addr *daddr,
__u32 len, __u8 proto, __wsum csum)
{
int carry;
__u32 ulen;
__u32 uproto;
__u32 sum = (__force u32)csum;
sum += (__force u32)saddr->in6_u.u6_addr32[0];
carry = (sum < (__force u32)saddr->in6_u.u6_addr32[0]);
sum += carry;
sum += (__force u32)saddr->in6_u.u6_addr32[1];
carry = (sum < (__force u32)saddr->in6_u.u6_addr32[1]);
sum += carry;
sum += (__force u32)saddr->in6_u.u6_addr32[2];
carry = (sum < (__force u32)saddr->in6_u.u6_addr32[2]);
sum += carry;
sum += (__force u32)saddr->in6_u.u6_addr32[3];
carry = (sum < (__force u32)saddr->in6_u.u6_addr32[3]);
sum += carry;
sum += (__force u32)daddr->in6_u.u6_addr32[0];
carry = (sum < (__force u32)daddr->in6_u.u6_addr32[0]);
sum += carry;
sum += (__force u32)daddr->in6_u.u6_addr32[1];
carry = (sum < (__force u32)daddr->in6_u.u6_addr32[1]);
sum += carry;
sum += (__force u32)daddr->in6_u.u6_addr32[2];
carry = (sum < (__force u32)daddr->in6_u.u6_addr32[2]);
sum += carry;
sum += (__force u32)daddr->in6_u.u6_addr32[3];
carry = (sum < (__force u32)daddr->in6_u.u6_addr32[3]);
sum += carry;
ulen = (__force u32)bpf_htonl((__u32)len);
sum += ulen;
carry = (sum < ulen);
sum += carry;
uproto = (__force u32)bpf_htonl(proto);
sum += uproto;
carry = (sum < uproto);
sum += carry;
return csum_fold((__force __wsum)sum);
}
#endif
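
The folding step in `csum_fold()` above is what the program relies on when it checks `csum_fold(csum) != 0` over a received IPv4 header. A small userspace sketch, assuming host-order 16-bit words for simplicity (the BPF program itself works on network-order data through `bpf_csum_diff()`); `fold16`, `sum_words`, and `csum_selfcheck` are hypothetical names, and the header words are the commonly used textbook example rather than data from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator to 16 bits and invert it. */
static uint16_t fold16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);	/* absorb the carry of the first fold */
	return (uint16_t)~sum;
}

/* Plain 32-bit sum of n 16-bit words (host order, illustration only). */
static uint32_t sum_words(const uint16_t *w, size_t n)
{
	uint32_t sum = 0;

	while (n--)
		sum += *w++;
	return sum;
}

/* Compute a header checksum, then verify that the filled-in header folds to 0. */
static void csum_selfcheck(void)
{
	uint16_t hdr[10] = {
		0x4500, 0x0073, 0x0000, 0x4000, 0x4011,
		0x0000,			/* checksum field, zero while computing */
		0xc0a8, 0x0001, 0xc0a8, 0x00c7,
	};

	hdr[5] = fold16(sum_words(hdr, 10));
	assert(hdr[5] == 0xb861);
	assert(fold16(sum_words(hdr, 10)) == 0);
}
```

A header whose checksum field is filled in correctly sums to 0xffff before inversion, so the fold yields 0; any other result indicates corruption.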

@ -59,7 +59,7 @@ int bpf_testcb(struct bpf_sock_ops *skops)
asm volatile (
"%[op] = *(u32 *)(%[skops] +96)"
: [op] "+r"(op)
: [op] "=r"(op)
: [skops] "r"(skops)
:);

@ -18,11 +18,11 @@
#include "test_iptunnel_common.h"
#include "bpf_kfuncs.h"
const size_t tcphdr_sz = sizeof(struct tcphdr);
const size_t udphdr_sz = sizeof(struct udphdr);
const size_t ethhdr_sz = sizeof(struct ethhdr);
const size_t iphdr_sz = sizeof(struct iphdr);
const size_t ipv6hdr_sz = sizeof(struct ipv6hdr);
#define tcphdr_sz sizeof(struct tcphdr)
#define udphdr_sz sizeof(struct udphdr)
#define ethhdr_sz sizeof(struct ethhdr)
#define iphdr_sz sizeof(struct iphdr)
#define ipv6hdr_sz sizeof(struct ipv6hdr)
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);

@ -0,0 +1,32 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
char _license[] SEC("license") = "GPL";
int my_pid;
bool reject_capable;
bool reject_cmd;
SEC("lsm/bpf_token_capable")
int BPF_PROG(token_capable, struct bpf_token *token, int cap)
{
if (my_pid == 0 || my_pid != (bpf_get_current_pid_tgid() >> 32))
return 0;
if (reject_capable)
return -1;
return 0;
}
SEC("lsm/bpf_token_cmd")
int BPF_PROG(token_cmd, struct bpf_token *token, enum bpf_cmd cmd)
{
if (my_pid == 0 || my_pid != (bpf_get_current_pid_tgid() >> 32))
return 0;
if (reject_cmd)
return -1;
return 0;
}
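
Note on the PID filter in both LSM hooks above: `bpf_get_current_pid_tgid()` returns the thread group id in the upper 32 bits and the thread id in the lower 32 bits, so `>> 32` yields what userspace calls the PID. Below is a minimal userspace sketch of that split and of the early-return condition, with hypothetical helper names (`tgid_of`, `tid_of`, `hook_applies`) that do not appear in the patch.

```c
#include <stdint.h>

/* Upper 32 bits: thread group id (the userspace-visible PID). */
static uint32_t tgid_of(uint64_t pid_tgid)
{
	return (uint32_t)(pid_tgid >> 32);
}

/* Lower 32 bits: per-thread id. */
static uint32_t tid_of(uint64_t pid_tgid)
{
	return (uint32_t)pid_tgid;
}

/* Returns 1 when the hook should apply its reject logic to the caller,
 * i.e. a target PID is configured and the caller belongs to it.
 */
static int hook_applies(uint32_t my_pid, uint64_t pid_tgid)
{
	return my_pid != 0 && my_pid == tgid_of(pid_tgid);
}
```

The hooks in the hunk return 0 (allow) whenever this condition is false, which keeps the test program from interfering with unrelated processes.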


@ -568,7 +568,7 @@ l0_%=: r0 = 0; \
SEC("tc")
__description("direct packet access: test23 (x += pkt_ptr, 4)")
__failure __msg("invalid access to packet, off=0 size=8, R5(id=2,off=0,r=0)")
__failure __msg("invalid access to packet, off=0 size=8, R5(id=3,off=0,r=0)")
__flag(BPF_F_ANY_ALIGNMENT)
__naked void test23_x_pkt_ptr_4(void)
{


@ -259,4 +259,28 @@ l0_%=: r2 += r1; \
" ::: __clobber_all);
}
SEC("xdp")
__success
__naked void not_an_inifinite_loop(void)
{
asm volatile (" \
call %[bpf_get_prandom_u32]; \
r0 &= 0xff; \
*(u64 *)(r10 - 8) = r0; \
r0 = 0; \
loop_%=: \
r0 = *(u64 *)(r10 - 8); \
if r0 > 10 goto exit_%=; \
r0 += 1; \
*(u64 *)(r10 - 8) = r0; \
r0 = 0; \
goto loop_%=; \
exit_%=: \
r0 = 0; \
exit; \
" :
: __imm(bpf_get_prandom_u32)
: __clobber_all);
}
char _license[] SEC("license") = "GPL";


@ -243,7 +243,7 @@ l0_%=: r0 = 0; \
SEC("tc")
__description("Spill u32 const scalars. Refill as u64. Offset to skb->data")
__failure __msg("invalid access to packet")
__failure __msg("math between pkt pointer and register with unbounded min value is not allowed")
__naked void u64_offset_to_skb_data(void)
{
asm volatile (" \
@ -253,13 +253,11 @@ __naked void u64_offset_to_skb_data(void)
w7 = 20; \
*(u32*)(r10 - 4) = r6; \
*(u32*)(r10 - 8) = r7; \
r4 = *(u16*)(r10 - 8); \
r4 = *(u64*)(r10 - 8); \
r0 = r2; \
/* r0 += r4 R0=pkt R2=pkt R3=pkt_end R4=umax=65535 */\
/* r0 += r4 R0=pkt R2=pkt R3=pkt_end R4= */ \
r0 += r4; \
/* if (r0 > r3) R0=pkt,umax=65535 R2=pkt R3=pkt_end R4=umax=65535 */\
if r0 > r3 goto l0_%=; \
/* r0 = *(u32 *)r2 R0=pkt,umax=65535 R2=pkt R3=pkt_end R4=20 */\
r0 = *(u32*)(r2 + 0); \
l0_%=: r0 = 0; \
exit; \
@ -495,14 +493,14 @@ char single_byte_buf[1] SEC(".data.single_byte_buf");
SEC("raw_tp")
__log_level(2)
__success
/* make sure fp-8 is all STACK_ZERO */
__msg("2: (7a) *(u64 *)(r10 -8) = 0 ; R10=fp0 fp-8_w=00000000")
/* fp-8 is spilled IMPRECISE value zero (represented by a zero value fake reg) */
__msg("2: (7a) *(u64 *)(r10 -8) = 0 ; R10=fp0 fp-8_w=0")
/* but fp-16 is spilled IMPRECISE zero const reg */
__msg("4: (7b) *(u64 *)(r10 -16) = r0 ; R0_w=0 R10=fp0 fp-16_w=0")
/* validate that assigning R2 from STACK_ZERO doesn't mark register
/* validate that assigning R2 from STACK_SPILL with zero value doesn't mark register
* precise immediately; if necessary, it will be marked precise later
*/
__msg("6: (71) r2 = *(u8 *)(r10 -1) ; R2_w=0 R10=fp0 fp-8_w=00000000")
__msg("6: (71) r2 = *(u8 *)(r10 -1) ; R2_w=0 R10=fp0 fp-8_w=0")
/* similarly, when R2 is assigned from spilled register, it is initially
* imprecise, but will be marked precise later once it is used in precise context
*/
@ -520,14 +518,14 @@ __msg("mark_precise: frame0: regs=r0 stack= before 3: (b7) r0 = 0")
__naked void partial_stack_load_preserves_zeros(void)
{
asm volatile (
/* fp-8 is all STACK_ZERO */
/* fp-8 is value zero (represented by a zero value fake reg) */
".8byte %[fp8_st_zero];" /* LLVM-18+: *(u64 *)(r10 -8) = 0; */
/* fp-16 is const zero register */
"r0 = 0;"
"*(u64 *)(r10 -16) = r0;"
/* load single U8 from non-aligned STACK_ZERO slot */
/* load single U8 from non-aligned spilled value zero slot */
"r1 = %[single_byte_buf];"
"r2 = *(u8 *)(r10 -1);"
"r1 += r2;"
@ -539,7 +537,7 @@ __naked void partial_stack_load_preserves_zeros(void)
"r1 += r2;"
"*(u8 *)(r1 + 0) = r2;" /* this should be fine */
/* load single U16 from non-aligned STACK_ZERO slot */
/* load single U16 from non-aligned spilled value zero slot */
"r1 = %[single_byte_buf];"
"r2 = *(u16 *)(r10 -2);"
"r1 += r2;"
@ -551,7 +549,7 @@ __naked void partial_stack_load_preserves_zeros(void)
"r1 += r2;"
"*(u8 *)(r1 + 0) = r2;" /* this should be fine */
/* load single U32 from non-aligned STACK_ZERO slot */
/* load single U32 from non-aligned spilled value zero slot */
"r1 = %[single_byte_buf];"
"r2 = *(u32 *)(r10 -4);"
"r1 += r2;"
@ -583,6 +581,47 @@ __naked void partial_stack_load_preserves_zeros(void)
: __clobber_common);
}
SEC("raw_tp")
__log_level(2)
__success
/* fp-4 is STACK_ZERO */
__msg("2: (62) *(u32 *)(r10 -4) = 0 ; R10=fp0 fp-8=0000????")
__msg("4: (71) r2 = *(u8 *)(r10 -1) ; R2_w=0 R10=fp0 fp-8=0000????")
__msg("5: (0f) r1 += r2")
__msg("mark_precise: frame0: last_idx 5 first_idx 0 subseq_idx -1")
__msg("mark_precise: frame0: regs=r2 stack= before 4: (71) r2 = *(u8 *)(r10 -1)")
__naked void partial_stack_load_preserves_partial_zeros(void)
{
asm volatile (
/* fp-4 is value zero */
".8byte %[fp4_st_zero];" /* LLVM-18+: *(u32 *)(r10 -4) = 0; */
/* load single U8 from non-aligned stack zero slot */
"r1 = %[single_byte_buf];"
"r2 = *(u8 *)(r10 -1);"
"r1 += r2;"
"*(u8 *)(r1 + 0) = r2;" /* this should be fine */
/* load single U16 from non-aligned stack zero slot */
"r1 = %[single_byte_buf];"
"r2 = *(u16 *)(r10 -2);"
"r1 += r2;"
"*(u8 *)(r1 + 0) = r2;" /* this should be fine */
/* load single U32 from non-aligned stack zero slot */
"r1 = %[single_byte_buf];"
"r2 = *(u32 *)(r10 -4);"
"r1 += r2;"
"*(u8 *)(r1 + 0) = r2;" /* this should be fine */
"r0 = 0;"
"exit;"
:
: __imm_ptr(single_byte_buf),
__imm_insn(fp4_st_zero, BPF_ST_MEM(BPF_W, BPF_REG_FP, -4, 0))
: __clobber_common);
}
char two_byte_buf[2] SEC(".data.two_byte_buf");
SEC("raw_tp")
@ -737,4 +776,168 @@ __naked void stack_load_preserves_const_precision_subreg(void)
: __clobber_common);
}
SEC("xdp")
__description("32-bit spilled reg range should be tracked")
__success __retval(0)
__naked void spill_32bit_range_track(void)
{
asm volatile(" \
call %[bpf_ktime_get_ns]; \
/* Make r0 bounded. */ \
r0 &= 65535; \
/* Assign an ID to r0. */ \
r1 = r0; \
/* 32-bit spill r0 to stack. */ \
*(u32*)(r10 - 8) = r0; \
/* Boundary check on r0. */ \
if r0 < 1 goto l0_%=; \
/* 32-bit fill r1 from stack. */ \
r1 = *(u32*)(r10 - 8); \
/* r1 == r0 => r1 >= 1 always. */ \
if r1 >= 1 goto l0_%=; \
/* Dead branch: the verifier should prune it. \
* Do an invalid memory access if the verifier \
* follows it. \
*/ \
r0 = *(u64*)(r9 + 0); \
l0_%=: r0 = 0; \
exit; \
" :
: __imm(bpf_ktime_get_ns)
: __clobber_all);
}
SEC("xdp")
__description("64-bit spill of 64-bit reg should assign ID")
__success __retval(0)
__naked void spill_64bit_of_64bit_ok(void)
{
asm volatile (" \
/* Roll one bit to make the register inexact. */\
call %[bpf_get_prandom_u32]; \
r0 &= 0x80000000; \
r0 <<= 32; \
/* 64-bit spill r0 to stack - should assign an ID. */\
*(u64*)(r10 - 8) = r0; \
/* 64-bit fill r1 from stack - should preserve the ID. */\
r1 = *(u64*)(r10 - 8); \
/* Compare r1 with another register to trigger find_equal_scalars.\
* Having one random bit is important here, otherwise the verifier cuts\
* the corners. \
*/ \
r2 = 0; \
if r1 != r2 goto l0_%=; \
/* The result of this comparison is predefined. */\
if r0 == r2 goto l0_%=; \
/* Dead branch: the verifier should prune it. Do an invalid memory\
* access if the verifier follows it. \
*/ \
r0 = *(u64*)(r9 + 0); \
exit; \
l0_%=: r0 = 0; \
exit; \
" :
: __imm(bpf_get_prandom_u32)
: __clobber_all);
}
SEC("xdp")
__description("32-bit spill of 32-bit reg should assign ID")
__success __retval(0)
__naked void spill_32bit_of_32bit_ok(void)
{
asm volatile (" \
/* Roll one bit to make the register inexact. */\
call %[bpf_get_prandom_u32]; \
w0 &= 0x80000000; \
/* 32-bit spill r0 to stack - should assign an ID. */\
*(u32*)(r10 - 8) = r0; \
/* 32-bit fill r1 from stack - should preserve the ID. */\
r1 = *(u32*)(r10 - 8); \
/* Compare r1 with another register to trigger find_equal_scalars.\
* Having one random bit is important here, otherwise the verifier cuts\
* the corners. \
*/ \
r2 = 0; \
if r1 != r2 goto l0_%=; \
/* The result of this comparison is predefined. */\
if r0 == r2 goto l0_%=; \
/* Dead branch: the verifier should prune it. Do an invalid memory\
* access if the verifier follows it. \
*/ \
r0 = *(u64*)(r9 + 0); \
exit; \
l0_%=: r0 = 0; \
exit; \
" :
: __imm(bpf_get_prandom_u32)
: __clobber_all);
}
SEC("xdp")
__description("16-bit spill of 16-bit reg should assign ID")
__success __retval(0)
__naked void spill_16bit_of_16bit_ok(void)
{
asm volatile (" \
/* Roll one bit to make the register inexact. */\
call %[bpf_get_prandom_u32]; \
r0 &= 0x8000; \
/* 16-bit spill r0 to stack - should assign an ID. */\
*(u16*)(r10 - 8) = r0; \
/* 16-bit fill r1 from stack - should preserve the ID. */\
r1 = *(u16*)(r10 - 8); \
/* Compare r1 with another register to trigger find_equal_scalars.\
* Having one random bit is important here, otherwise the verifier cuts\
* the corners. \
*/ \
r2 = 0; \
if r1 != r2 goto l0_%=; \
/* The result of this comparison is predefined. */\
if r0 == r2 goto l0_%=; \
/* Dead branch: the verifier should prune it. Do an invalid memory\
* access if the verifier follows it. \
*/ \
r0 = *(u64*)(r9 + 0); \
exit; \
l0_%=: r0 = 0; \
exit; \
" :
: __imm(bpf_get_prandom_u32)
: __clobber_all);
}
SEC("xdp")
__description("8-bit spill of 8-bit reg should assign ID")
__success __retval(0)
__naked void spill_8bit_of_8bit_ok(void)
{
asm volatile (" \
/* Roll one bit to make the register inexact. */\
call %[bpf_get_prandom_u32]; \
r0 &= 0x80; \
/* 8-bit spill r0 to stack - should assign an ID. */\
*(u8*)(r10 - 8) = r0; \
/* 8-bit fill r1 from stack - should preserve the ID. */\
r1 = *(u8*)(r10 - 8); \
/* Compare r1 with another register to trigger find_equal_scalars.\
* Having one random bit is important here, otherwise the verifier cuts\
* the corners. \
*/ \
r2 = 0; \
if r1 != r2 goto l0_%=; \
/* The result of this comparison is predefined. */\
if r0 == r2 goto l0_%=; \
/* Dead branch: the verifier should prune it. Do an invalid memory\
* access if the verifier follows it. \
*/ \
r0 = *(u64*)(r9 + 0); \
exit; \
l0_%=: r0 = 0; \
exit; \
" :
: __imm(bpf_get_prandom_u32)
: __clobber_all);
}
char _license[] SEC("license") = "GPL";


@ -181,7 +181,7 @@ static int parse_test_spec(struct test_loader *tester,
memset(spec, 0, sizeof(*spec));
spec->prog_name = bpf_program__name(prog);
spec->prog_flags = BPF_F_TEST_REG_INVARIANTS; /* by default be strict */
spec->prog_flags = testing_prog_flags();
btf = bpf_object__btf(obj);
if (!btf) {
@ -688,7 +688,7 @@ static void process_subtest(struct test_loader *tester,
++nr_progs;
specs = calloc(nr_progs, sizeof(struct test_spec));
if (!ASSERT_OK_PTR(specs, "Can't alloc specs array"))
if (!ASSERT_OK_PTR(specs, "specs_alloc"))
return;
i = 0;


@ -1190,7 +1190,11 @@ static void test_map_in_map(void)
goto out_map_in_map;
}
bpf_object__load(obj);
err = bpf_object__load(obj);
if (err) {
printf("Failed to load test prog\n");
goto out_map_in_map;
}
map = bpf_object__find_map_by_name(obj, "mim_array");
if (!map) {


@ -547,24 +547,6 @@ int bpf_find_map(const char *test, struct bpf_object *obj, const char *name)
return bpf_map__fd(map);
}
static bool is_jit_enabled(void)
{
const char *jit_sysctl = "/proc/sys/net/core/bpf_jit_enable";
bool enabled = false;
int sysctl_fd;
sysctl_fd = open(jit_sysctl, 0, O_RDONLY);
if (sysctl_fd != -1) {
char tmpc;
if (read(sysctl_fd, &tmpc, sizeof(tmpc)) == 1)
enabled = (tmpc != '0');
close(sysctl_fd);
}
return enabled;
}
int compare_map_keys(int map1_fd, int map2_fd)
{
__u32 key, next_key;


@ -19,6 +19,7 @@
#include <bpf/libbpf.h>
#include "cgroup_helpers.h"
#include "testing_helpers.h"
#include "bpf_util.h"
#ifndef ENOTSUPP
@ -679,7 +680,7 @@ static int load_path(const struct sock_addr_test *test, const char *path)
bpf_program__set_type(prog, BPF_PROG_TYPE_CGROUP_SOCK_ADDR);
bpf_program__set_expected_attach_type(prog, test->expected_attach_type);
bpf_program__set_flags(prog, BPF_F_TEST_RND_HI32 | BPF_F_TEST_REG_INVARIANTS);
bpf_program__set_flags(prog, testing_prog_flags());
err = bpf_object__load(obj);
if (err) {


@ -67,6 +67,7 @@
#define F_NEEDS_EFFICIENT_UNALIGNED_ACCESS (1 << 0)
#define F_LOAD_WITH_STRICT_ALIGNMENT (1 << 1)
#define F_NEEDS_JIT_ENABLED (1 << 2)
/* need CAP_BPF, CAP_NET_ADMIN, CAP_PERFMON to load progs */
#define ADMIN_CAPS (1ULL << CAP_NET_ADMIN | \
@ -74,6 +75,7 @@
1ULL << CAP_BPF)
#define UNPRIV_SYSCTL "kernel/unprivileged_bpf_disabled"
static bool unpriv_disabled = false;
static bool jit_disabled;
static int skips;
static bool verbose = false;
static int verif_log_level = 0;
@ -1341,48 +1343,6 @@ static bool cmp_str_seq(const char *log, const char *exp)
return true;
}
static struct bpf_insn *get_xlated_program(int fd_prog, int *cnt)
{
__u32 buf_element_size = sizeof(struct bpf_insn);
struct bpf_prog_info info = {};
__u32 info_len = sizeof(info);
__u32 xlated_prog_len;
struct bpf_insn *buf;
if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) {
perror("bpf_prog_get_info_by_fd failed");
return NULL;
}
xlated_prog_len = info.xlated_prog_len;
if (xlated_prog_len % buf_element_size) {
printf("Program length %d is not multiple of %d\n",
xlated_prog_len, buf_element_size);
return NULL;
}
*cnt = xlated_prog_len / buf_element_size;
buf = calloc(*cnt, buf_element_size);
if (!buf) {
perror("can't allocate xlated program buffer");
return NULL;
}
bzero(&info, sizeof(info));
info.xlated_prog_len = xlated_prog_len;
info.xlated_prog_insns = (__u64)(unsigned long)buf;
if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) {
perror("second bpf_prog_get_info_by_fd failed");
goto out_free_buf;
}
return buf;
out_free_buf:
free(buf);
return NULL;
}
static bool is_null_insn(struct bpf_insn *insn)
{
struct bpf_insn null_insn = {};
@ -1505,7 +1465,7 @@ static void print_insn(struct bpf_insn *buf, int cnt)
static bool check_xlated_program(struct bpf_test *test, int fd_prog)
{
struct bpf_insn *buf;
int cnt;
unsigned int cnt;
bool result = true;
bool check_expected = !is_null_insn(test->expected_insns);
bool check_unexpected = !is_null_insn(test->unexpected_insns);
@ -1513,8 +1473,7 @@ static bool check_xlated_program(struct bpf_test *test, int fd_prog)
if (!check_expected && !check_unexpected)
goto out;
buf = get_xlated_program(fd_prog, &cnt);
if (!buf) {
if (get_xlated_program(fd_prog, &buf, &cnt)) {
printf("FAIL: can't get xlated program\n");
result = false;
goto out;
@ -1567,6 +1526,13 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
__u32 pflags;
int i, err;
if ((test->flags & F_NEEDS_JIT_ENABLED) && jit_disabled) {
printf("SKIP (requires BPF JIT)\n");
skips++;
sched_yield();
return;
}
fd_prog = -1;
for (i = 0; i < MAX_NR_MAPS; i++)
map_fds[i] = -1;
@ -1588,7 +1554,7 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
if (fixup_skips != skips)
return;
pflags = BPF_F_TEST_RND_HI32 | BPF_F_TEST_REG_INVARIANTS;
pflags = testing_prog_flags();
if (test->flags & F_LOAD_WITH_STRICT_ALIGNMENT)
pflags |= BPF_F_STRICT_ALIGNMENT;
if (test->flags & F_NEEDS_EFFICIENT_UNALIGNED_ACCESS)
@ -1887,6 +1853,8 @@ int main(int argc, char **argv)
return EXIT_FAILURE;
}
jit_disabled = !is_jit_enabled();
/* Use libbpf 1.0 API mode */
libbpf_set_strict_mode(LIBBPF_STRICT_ALL);


@ -252,6 +252,34 @@ __u32 link_info_prog_id(const struct bpf_link *link, struct bpf_link_info *info)
int extra_prog_load_log_flags = 0;
int testing_prog_flags(void)
{
static int cached_flags = -1;
static int prog_flags[] = { BPF_F_TEST_RND_HI32, BPF_F_TEST_REG_INVARIANTS };
static struct bpf_insn insns[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
};
int insn_cnt = ARRAY_SIZE(insns), i, fd, flags = 0;
LIBBPF_OPTS(bpf_prog_load_opts, opts);
if (cached_flags >= 0)
return cached_flags;
for (i = 0; i < ARRAY_SIZE(prog_flags); i++) {
opts.prog_flags = prog_flags[i];
fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, "flag-test", "GPL",
insns, insn_cnt, &opts);
if (fd >= 0) {
flags |= prog_flags[i];
close(fd);
}
}
cached_flags = flags;
return cached_flags;
}
int bpf_prog_test_load(const char *file, enum bpf_prog_type type,
struct bpf_object **pobj, int *prog_fd)
{
@ -276,7 +304,7 @@ int bpf_prog_test_load(const char *file, enum bpf_prog_type type,
if (type != BPF_PROG_TYPE_UNSPEC && bpf_program__type(prog) != type)
bpf_program__set_type(prog, type);
flags = bpf_program__flags(prog) | BPF_F_TEST_RND_HI32 | BPF_F_TEST_REG_INVARIANTS;
flags = bpf_program__flags(prog) | testing_prog_flags();
bpf_program__set_flags(prog, flags);
err = bpf_object__load(obj);
@ -299,7 +327,7 @@ int bpf_test_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
{
LIBBPF_OPTS(bpf_prog_load_opts, opts,
.kern_version = kern_version,
.prog_flags = BPF_F_TEST_RND_HI32 | BPF_F_TEST_REG_INVARIANTS,
.prog_flags = testing_prog_flags(),
.log_level = extra_prog_load_log_flags,
.log_buf = log_buf,
.log_size = log_buf_sz,
@ -387,3 +415,63 @@ int kern_sync_rcu(void)
{
return syscall(__NR_membarrier, MEMBARRIER_CMD_SHARED, 0, 0);
}
int get_xlated_program(int fd_prog, struct bpf_insn **buf, __u32 *cnt)
{
__u32 buf_element_size = sizeof(struct bpf_insn);
struct bpf_prog_info info = {};
__u32 info_len = sizeof(info);
__u32 xlated_prog_len;
if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) {
perror("bpf_prog_get_info_by_fd failed");
return -1;
}
xlated_prog_len = info.xlated_prog_len;
if (xlated_prog_len % buf_element_size) {
printf("Program length %u is not multiple of %u\n",
xlated_prog_len, buf_element_size);
return -1;
}
*cnt = xlated_prog_len / buf_element_size;
*buf = calloc(*cnt, buf_element_size);
if (!*buf) {
perror("can't allocate xlated program buffer");
return -ENOMEM;
}
bzero(&info, sizeof(info));
info.xlated_prog_len = xlated_prog_len;
info.xlated_prog_insns = (__u64)(unsigned long)*buf;
if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) {
perror("second bpf_prog_get_info_by_fd failed");
goto out_free_buf;
}
return 0;
out_free_buf:
free(*buf);
*buf = NULL;
return -1;
}
bool is_jit_enabled(void)
{
const char *jit_sysctl = "/proc/sys/net/core/bpf_jit_enable";
bool enabled = false;
int sysctl_fd;
sysctl_fd = open(jit_sysctl, O_RDONLY);
if (sysctl_fd != -1) {
char tmpc;
if (read(sysctl_fd, &tmpc, sizeof(tmpc)) == 1)
enabled = (tmpc != '0');
close(sysctl_fd);
}
return enabled;
}
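
Note on `get_xlated_program()` above: the function hands the buffer back through an out-parameter, so the allocation check has to test the pointee (`*buf`), not the parameter itself. Below is a minimal sketch of the size-validate-then-allocate step in isolation, assuming an 8-byte stand-in type `struct insn` in place of `struct bpf_insn`. The name `alloc_insn_buf` is hypothetical and the two `bpf_prog_get_info_by_fd()` calls are deliberately left out.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for struct bpf_insn, which is also 8 bytes wide. */
struct insn {
	uint64_t raw;
};

/* Hypothetical helper: validate a byte length, then allocate a zeroed
 * instruction buffer and report the element count via out-parameters.
 * Returns 0 on success, -1 on a bad length or allocation failure.
 */
static int alloc_insn_buf(uint32_t len_bytes, struct insn **buf, uint32_t *cnt)
{
	uint32_t elem = sizeof(struct insn);

	*buf = NULL;
	*cnt = 0;

	/* Reject lengths that are not a whole number of instructions. */
	if (len_bytes % elem)
		return -1;

	*cnt = len_bytes / elem;
	*buf = calloc(*cnt, elem);
	if (!*buf) {	/* check the allocation, not the out-parameter */
		*cnt = 0;
		return -1;
	}
	return 0;
}
```

A caller owns the buffer on success and is expected to `free()` it. On failure both out-parameters are reset, so the caller never sees a stale pointer.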


@ -46,4 +46,12 @@ static inline __u64 get_time_ns(void)
return (u64)t.tv_sec * 1000000000 + t.tv_nsec;
}
struct bpf_insn;
/* Request BPF program instructions after all rewrites are applied,
* e.g. verifier.c:convert_ctx_access() is done.
*/
int get_xlated_program(int fd_prog, struct bpf_insn **buf, __u32 *cnt);
int testing_prog_flags(void);
bool is_jit_enabled(void);
#endif /* __TESTING_HELPERS_H */
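
Note on `testing_prog_flags()` declared above: the implementation in this series probes each test flag once by loading a trivial program, keeps only the flags the running kernel accepts, and caches the result. Below is a minimal sketch of that probe-and-cache pattern without libbpf, assuming a caller-supplied probe callback. `supported_flags`, `probe_fn`, `fake_probe`, and `probe_calls` are all hypothetical names used for illustration.

```c
#include <stddef.h>

/* Hypothetical probe callback: returns nonzero if the flag is accepted. */
typedef int (*probe_fn)(int flag);

/* Counts probe invocations so the caching can be observed. */
static int probe_calls;

/* Fake probe for demonstration: pretends only flag 0x1 is supported. */
static int fake_probe(int flag)
{
	probe_calls++;
	return flag == 0x1;
}

/* OR together every candidate flag the probe accepts. The result is cached
 * in a function-local static, so later calls skip probing entirely
 * (and ignore their arguments, as a process-wide cache would).
 */
static int supported_flags(const int *candidates, size_t n, probe_fn probe)
{
	static int cached = -1;
	int flags = 0;
	size_t i;

	if (cached >= 0)
		return cached;

	for (i = 0; i < n; i++) {
		if (probe(candidates[i]))
			flags |= candidates[i];
	}

	cached = flags;
	return cached;
}
```

The design choice this illustrates is that callers such as `bpf_prog_test_load()` can request "all supported test flags" without hardcoding `BPF_F_TEST_RND_HI32 | BPF_F_TEST_REG_INVARIANTS`, which lets the same selftests run on kernels that lack one of them.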


@ -57,6 +57,7 @@
.expected_insns = { PSEUDO_CALL_INSN() },
.unexpected_insns = { HELPER_CALL_INSN() },
.prog_type = BPF_PROG_TYPE_TRACEPOINT,
.flags = F_NEEDS_JIT_ENABLED,
.result = ACCEPT,
.runs = 0,
.func_info = { { 0, MAIN_TYPE }, { 12, CALLBACK_TYPE } },
@ -90,6 +91,7 @@
.expected_insns = { HELPER_CALL_INSN() },
.unexpected_insns = { PSEUDO_CALL_INSN() },
.prog_type = BPF_PROG_TYPE_TRACEPOINT,
.flags = F_NEEDS_JIT_ENABLED,
.result = ACCEPT,
.runs = 0,
.func_info = { { 0, MAIN_TYPE }, { 16, CALLBACK_TYPE } },
@ -127,6 +129,7 @@
.expected_insns = { HELPER_CALL_INSN() },
.unexpected_insns = { PSEUDO_CALL_INSN() },
.prog_type = BPF_PROG_TYPE_TRACEPOINT,
.flags = F_NEEDS_JIT_ENABLED,
.result = ACCEPT,
.runs = 0,
.func_info = {
@ -165,6 +168,7 @@
.expected_insns = { PSEUDO_CALL_INSN() },
.unexpected_insns = { HELPER_CALL_INSN() },
.prog_type = BPF_PROG_TYPE_TRACEPOINT,
.flags = F_NEEDS_JIT_ENABLED,
.result = ACCEPT,
.runs = 0,
.func_info = {
@ -235,6 +239,7 @@
},
.unexpected_insns = { HELPER_CALL_INSN() },
.prog_type = BPF_PROG_TYPE_TRACEPOINT,
.flags = F_NEEDS_JIT_ENABLED,
.result = ACCEPT,
.func_info = {
{ 0, MAIN_TYPE },
@ -252,6 +257,7 @@
.unexpected_insns = { HELPER_CALL_INSN() },
.result = ACCEPT,
.prog_type = BPF_PROG_TYPE_TRACEPOINT,
.flags = F_NEEDS_JIT_ENABLED,
.func_info = { { 0, MAIN_TYPE }, { 16, CALLBACK_TYPE } },
.func_info_cnt = 2,
BTF_TYPES
