mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
synced 2025-01-04 04:06:26 +00:00
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Minor conflict, a CHECK was placed into an if() statement in net-next, whilst a newline was added to that CHECK call in 'net'. Thanks to Daniel for the merge resolution.

Signed-off-by: David S. Miller <davem@davemloft.net>
commit
01adc4851a
297
Documentation/networking/af_xdp.rst
Normal file
@ -0,0 +1,297 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
======
|
||||
AF_XDP
|
||||
======
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
AF_XDP is an address family that is optimized for high performance
|
||||
packet processing.
|
||||
|
||||
This document assumes that the reader is familiar with BPF and XDP. If
|
||||
not, the Cilium project has an excellent reference guide at
|
||||
http://cilium.readthedocs.io/en/doc-1.0/bpf/.
|
||||
|
||||
Using the XDP_REDIRECT action from an XDP program, the program can
|
||||
redirect ingress frames to other XDP enabled netdevs, using the
|
||||
bpf_redirect_map() function. AF_XDP sockets enable the possibility for
|
||||
XDP programs to redirect frames to a memory buffer in a user-space
|
||||
application.
|
||||
|
||||
An AF_XDP socket (XSK) is created with the normal socket()
|
||||
syscall. Associated with each XSK are two rings: the RX ring and the
|
||||
TX ring. A socket can receive packets on the RX ring and it can send
|
||||
packets on the TX ring. These rings are registered and sized with the
|
||||
setsockopts XDP_RX_RING and XDP_TX_RING, respectively. It is mandatory
|
||||
to have at least one of these rings for each socket. An RX or TX
|
||||
descriptor ring points to a data buffer in a memory area called a
|
||||
UMEM. RX and TX can share the same UMEM so that a packet does not have
|
||||
to be copied between RX and TX. Moreover, if a packet needs to be kept
|
||||
for a while due to a possible retransmit, the descriptor that points
|
||||
to that packet can be changed to point to another and reused right
|
||||
away. This again avoids copying data.
|
||||
|
||||
The UMEM consists of a number of equally sized frames and each frame
|
||||
has a unique frame id. A descriptor in one of the rings references a
|
||||
frame by referencing its frame id. The user space allocates memory for
|
||||
this UMEM using whatever means it feels is most appropriate (malloc,
|
||||
mmap, huge pages, etc). This memory area is then registered with the
|
||||
kernel using the new setsockopt XDP_UMEM_REG. The UMEM also has two
|
||||
rings: the FILL ring and the COMPLETION ring. The fill ring is used by
|
||||
the application to send down frame ids for the kernel to fill in with
|
||||
RX packet data. References to these frames will then appear in the RX
|
||||
ring once each packet has been received. The completion ring, on the
|
||||
other hand, contains frame ids that the kernel has transmitted
|
||||
completely and can now be used again by user space, for either TX or
|
||||
RX. Thus, the frame ids appearing in the completion ring are ids that
|
||||
were previously transmitted using the TX ring. In summary, the RX and
|
||||
FILL rings are used for the RX path and the TX and COMPLETION rings
|
||||
are used for the TX path.
|
||||
|
||||
The socket is then finally bound with a bind() call to a device and a
|
||||
specific queue id on that device, and it is not until bind is
|
||||
completed that traffic starts to flow.
|
||||
|
||||
The UMEM can be shared between processes, if desired. If a process
|
||||
wants to do this, it simply skips the registration of the UMEM and its
|
||||
corresponding two rings, sets the XDP_SHARED_UMEM flag in the bind
|
||||
call and submits the XSK of the process it would like to share UMEM
|
||||
with as well as its own newly created XSK socket. The new process will
|
||||
then receive frame id references in its own RX ring that point to this
|
||||
shared UMEM. Note that since the ring structures are single-consumer /
|
||||
single-producer (for performance reasons), the new process has to
|
||||
create its own socket with associated RX and TX rings, since it cannot
|
||||
share this with the other process. This is also the reason that there
|
||||
is only one set of FILL and COMPLETION rings per UMEM. It is the
|
||||
responsibility of a single process to handle the UMEM.
|
||||
|
||||
How are packets then distributed from an XDP program to the XSKs? There
|
||||
is a BPF map called XSKMAP (or BPF_MAP_TYPE_XSKMAP in full). The
|
||||
user-space application can place an XSK at an arbitrary place in this
|
||||
map. The XDP program can then redirect a packet to a specific index in
|
||||
this map and at this point XDP validates that the XSK in that map was
|
||||
indeed bound to that device and ring number. If not, the packet is
|
||||
dropped. If the map is empty at that index, the packet is also
|
||||
dropped. This also means that it is currently mandatory to have an XDP
|
||||
program loaded (and one XSK in the XSKMAP) to be able to get any
|
||||
traffic to user space through the XSK.
|
||||
|
||||
AF_XDP can operate in two different modes: XDP_SKB and XDP_DRV. If the
|
||||
driver does not have support for XDP, or XDP_SKB is explicitly chosen
|
||||
when loading the XDP program, XDP_SKB mode is employed that uses SKBs
|
||||
together with the generic XDP support and copies out the data to user
|
||||
space. This is a fallback mode that works for any network device. On the other
|
||||
hand, if the driver has support for XDP, it will be used by the AF_XDP
|
||||
code to provide better performance, but there is still a copy of the
|
||||
data into user space.
|
||||
|
||||
Concepts
|
||||
========
|
||||
|
||||
In order to use an AF_XDP socket, a number of associated objects need
|
||||
to be set up.
|
||||
|
||||
Jonathan Corbet has also written an excellent article on LWN,
|
||||
"Accelerating networking with AF_XDP". It can be found at
|
||||
https://lwn.net/Articles/750845/.
|
||||
|
||||
UMEM
|
||||
----
|
||||
|
||||
UMEM is a region of virtually contiguous memory, divided into
|
||||
equal-sized frames. A UMEM is associated with a netdev and a specific
|
||||
queue id of that netdev. It is created and configured (frame size,
|
||||
frame headroom, start address and size) by using the XDP_UMEM_REG
|
||||
setsockopt system call. A UMEM is bound to a netdev and queue id, via
|
||||
the bind() system call.
|
||||
|
||||
An AF_XDP socket is linked to a single UMEM, but one UMEM can have
|
||||
multiple AF_XDP sockets. To share a UMEM created via one socket A,
|
||||
the next socket B can do this by setting the XDP_SHARED_UMEM flag in
|
||||
struct sockaddr_xdp member sxdp_flags, and passing the file descriptor
|
||||
of A to struct sockaddr_xdp member sxdp_shared_umem_fd.
|
||||
|
||||
The UMEM has two single-producer/single-consumer rings that are used
|
||||
to transfer ownership of UMEM frames between the kernel and the
|
||||
user-space application.
|
||||
|
||||
Rings
|
||||
-----
|
||||
|
||||
There are four different kinds of rings: Fill, Completion, RX and
|
||||
TX. All rings are single-producer/single-consumer, so the user-space
|
||||
application needs explicit synchronization if multiple
|
||||
processes/threads are reading/writing to them.
|
||||
|
||||
The UMEM uses two rings: Fill and Completion. Each socket associated
|
||||
with the UMEM must have an RX queue, TX queue or both. Say that there
|
||||
is a setup with four sockets (all doing TX and RX). Then there will be
|
||||
one Fill ring, one Completion ring, four TX rings and four RX rings.
|
||||
|
||||
The rings are head(producer)/tail(consumer) based rings. A producer
|
||||
writes the data ring at the index pointed out by struct xdp_ring
|
||||
producer member, and then increases the producer index. A consumer reads
|
||||
the data ring at the index pointed out by struct xdp_ring consumer
|
||||
member, and then increases the consumer index.
|
||||
|
||||
The rings are configured and created via the _RING setsockopt system
|
||||
calls and mmapped to user-space using the appropriate offset to mmap()
|
||||
(XDP_PGOFF_RX_RING, XDP_PGOFF_TX_RING, XDP_UMEM_PGOFF_FILL_RING and
|
||||
XDP_UMEM_PGOFF_COMPLETION_RING).
|
||||
|
||||
The size of each ring must be a power of two.
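
The power-of-two requirement is what allows a free-running 32-bit index to be mapped to a ring slot with a single mask, as the naive example in the Usage section does with ``RING_SIZE - 1``. A minimal sketch of that arithmetic follows; the helper names are hypothetical and are not part of the AF_XDP API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if size is a non-zero power of two. */
static bool ring_size_is_valid(uint32_t size)
{
	return size != 0 && (size & (size - 1)) == 0;
}

/*
 * Hypothetical helper: map a free-running producer/consumer index to a
 * slot in the ring. The mask only works when size is a power of two.
 */
static uint32_t ring_slot(uint32_t index, uint32_t size)
{
	assert(ring_size_is_valid(size));
	return index & (size - 1);
}
```

With a ring of 2048 entries, index 2049 maps to slot 1, and the mapping stays correct when the 32-bit index itself wraps around.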
|
||||
|
||||
UMEM Fill Ring
|
||||
~~~~~~~~~~~~~~
|
||||
|
||||
The Fill ring is used to transfer ownership of UMEM frames from
|
||||
user-space to kernel-space. The UMEM indices are passed in the
|
||||
ring. As an example, if the UMEM is 64k and each frame is 4k, then the
|
||||
UMEM has 16 frames and can pass indices between 0 and 15.
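
The arithmetic in this example can be sketched as follows. The function names are hypothetical; only the relation between UMEM size, frame size and frame index is taken from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of whole frames in a UMEM of umem_size bytes. */
static uint32_t umem_frame_count(size_t umem_size, uint32_t frame_size)
{
	assert(frame_size != 0);
	return (uint32_t)(umem_size / frame_size);
}

/* Start address of the frame with the given index. */
static void *umem_frame_addr(void *umem_base, uint32_t frame_size,
			     uint32_t idx)
{
	assert(umem_base != NULL);
	return (uint8_t *)umem_base + (size_t)idx * frame_size;
}
```

For a 64k UMEM with 4k frames, ``umem_frame_count()`` yields 16, and the last valid index, 15, starts 61440 bytes into the area.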
|
||||
|
||||
Frames passed to the kernel are used for the ingress path (RX rings).
|
||||
|
||||
The user application produces UMEM indices to this ring.
|
||||
|
||||
UMEM Completion Ring
|
||||
~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The Completion Ring is used to transfer ownership of UMEM frames from
|
||||
kernel-space to user-space. Just like the Fill ring, UMEM indices are
|
||||
used.
|
||||
|
||||
Frames passed from the kernel to user-space are frames that have been
|
||||
sent (TX ring) and can be used by user-space again.
|
||||
|
||||
The user application consumes UMEM indices from this ring.
|
||||
|
||||
|
||||
RX Ring
|
||||
~~~~~~~
|
||||
|
||||
The RX ring is the receiving side of a socket. Each entry in the ring
|
||||
is a struct xdp_desc descriptor. The descriptor contains the UMEM index
|
||||
(idx), the length of the data (len) and the offset into the frame
|
||||
(offset).
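
How an application might turn such a descriptor into a pointer to the packet data can be sketched as below. The struct is a local stand-in that models only the three fields named above; its field widths, and the helper name, are assumptions rather than the layout of the real struct xdp_desc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local stand-in for struct xdp_desc. Only the fields named in the
 * text are modelled; their widths are an assumption.
 */
struct desc_sketch {
	uint32_t idx;    /* UMEM frame index */
	uint32_t len;    /* length of the packet data */
	uint32_t offset; /* offset of the data within the frame */
};

/*
 * Return a pointer to the packet data, or NULL if the descriptor
 * does not fit inside one of the nframes frames of the UMEM.
 */
static uint8_t *desc_data(uint8_t *umem_base, uint32_t frame_size,
			  uint32_t nframes, const struct desc_sketch *d)
{
	assert(umem_base != NULL && d != NULL);
	if (d->idx >= nframes || d->offset > frame_size ||
	    d->len > frame_size - d->offset)
		return NULL;
	return umem_base + (size_t)d->idx * frame_size + d->offset;
}
```

The bounds check is a defensive choice for the sketch; it rejects descriptors whose data would run past the end of the frame.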
|
||||
|
||||
If no frames have been passed to the kernel via the Fill ring, no
|
||||
descriptors will (or can) appear on the RX ring.
|
||||
|
||||
The user application consumes struct xdp_desc descriptors from this
|
||||
ring.
|
||||
|
||||
TX Ring
|
||||
~~~~~~~
|
||||
|
||||
The TX ring is used to send frames. The struct xdp_desc descriptor is
|
||||
filled (index, length and offset) and passed into the ring.
|
||||
|
||||
To start the transfer a sendmsg() system call is required. This might
|
||||
be relaxed in the future.
|
||||
|
||||
The user application produces struct xdp_desc descriptors to this
|
||||
ring.
|
||||
|
||||
XSKMAP / BPF_MAP_TYPE_XSKMAP
|
||||
----------------------------
|
||||
|
||||
On the XDP side there is a BPF map type BPF_MAP_TYPE_XSKMAP (XSKMAP) that
|
||||
is used in conjunction with bpf_redirect_map() to pass the ingress
|
||||
frame to a socket.
|
||||
|
||||
The user application inserts the socket into the map, via the bpf()
|
||||
system call.
|
||||
|
||||
Note that if an XDP program tries to redirect to a socket that does
|
||||
not match the queue configuration and netdev, the frame will be
|
||||
dropped. E.g. an AF_XDP socket is bound to netdev eth0 and
|
||||
queue 17. Only the XDP program executing for eth0 and queue 17 will
|
||||
successfully pass data to the socket. Please refer to the sample
|
||||
application (samples/bpf/) for an example.
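
The drop rules described in this section can be illustrated with a toy user-space model. This is not kernel or BPF code: every type and function name below is hypothetical, and the model only encodes the three outcomes stated in the text (empty slot, mismatching binding, matching binding).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XSKMAP_MODEL_ENTRIES 64

/* Toy stand-in for a bound XSK: just the binding it was given. */
struct xsk_model {
	uint32_t ifindex;
	uint32_t queue_id;
};

/* Toy stand-in for an XSKMAP: a NULL slot means "empty". */
struct xskmap_model {
	const struct xsk_model *slot[XSKMAP_MODEL_ENTRIES];
};

enum verdict_model { MODEL_DROP, MODEL_DELIVER };

/*
 * Decide what happens to a frame that arrived on (ifindex, queue_id)
 * and was redirected to map index key.
 */
static enum verdict_model redirect_model(const struct xskmap_model *map,
					 uint32_t key, uint32_t ifindex,
					 uint32_t queue_id)
{
	const struct xsk_model *xs;

	assert(map != NULL);
	if (key >= XSKMAP_MODEL_ENTRIES)
		return MODEL_DROP;
	xs = map->slot[key];
	if (xs == NULL)
		return MODEL_DROP;	/* empty slot */
	if (xs->ifindex != ifindex || xs->queue_id != queue_id)
		return MODEL_DROP;	/* bound to another device/queue */
	return MODEL_DELIVER;
}
```

In the eth0/queue 17 example above, only a frame that arrived on that device and queue is delivered; the same map entry reached from any other queue results in a drop.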
|
||||
|
||||
Usage
|
||||
=====
|
||||
|
||||
In order to use AF_XDP sockets there are two parts needed. The
|
||||
user-space application and the XDP program. For a complete setup and
|
||||
usage example, please refer to the sample application. The user-space
|
||||
side is xdpsock_user.c and the XDP side xdpsock_kern.c.
|
||||
|
||||
Naive ring dequeue and enqueue could look like this::
|
||||
|
||||
    // typedef struct xdp_rxtx_ring RING;
    // typedef struct xdp_umem_ring RING;

    // typedef struct xdp_desc RING_TYPE;
    // typedef __u32 RING_TYPE;

    int dequeue_one(RING *ring, RING_TYPE *item)
    {
        __u32 entries = ring->ptrs.producer - ring->ptrs.consumer;

        if (entries == 0)
            return -1;

        // read-barrier!

        *item = ring->desc[ring->ptrs.consumer & (RING_SIZE - 1)];
        ring->ptrs.consumer++;
        return 0;
    }

    int enqueue_one(RING *ring, const RING_TYPE *item)
    {
        __u32 free_entries = RING_SIZE - (ring->ptrs.producer - ring->ptrs.consumer);

        if (free_entries == 0)
            return -1;

        ring->desc[ring->ptrs.producer & (RING_SIZE - 1)] = *item;

        // write-barrier!

        ring->ptrs.producer++;
        return 0;
    }
|
||||
|
||||
|
||||
For a more optimized version, please refer to the sample application.
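
As an illustration of where the read and write barriers sit, the naive routines can be written against a process-local ring using C11 atomics. This is a sketch under assumptions: ``struct ring_sketch`` and the ``*_c11`` function names are hypothetical, the ring holds plain ``uint32_t`` frame indices, and a real AF_XDP ring is mmapped from the socket and shared with the kernel rather than declared like this.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
	      "RING_SIZE must be a power of two");

/* Hypothetical process-local stand-in for a ring of frame indices. */
struct ring_sketch {
	_Atomic uint32_t producer;
	_Atomic uint32_t consumer;
	uint32_t desc[RING_SIZE];
};

static int enqueue_one_c11(struct ring_sketch *ring, uint32_t item)
{
	uint32_t prod = atomic_load_explicit(&ring->producer,
					     memory_order_relaxed);
	uint32_t cons = atomic_load_explicit(&ring->consumer,
					     memory_order_acquire);

	if (prod - cons == RING_SIZE)
		return -1;	/* ring is full */

	ring->desc[prod & (RING_SIZE - 1)] = item;
	/* Write barrier: publish the entry before the new index. */
	atomic_store_explicit(&ring->producer, prod + 1,
			      memory_order_release);
	return 0;
}

static int dequeue_one_c11(struct ring_sketch *ring, uint32_t *item)
{
	uint32_t cons = atomic_load_explicit(&ring->consumer,
					     memory_order_relaxed);
	/* Read barrier: pairs with the release store in enqueue. */
	uint32_t prod = atomic_load_explicit(&ring->producer,
					     memory_order_acquire);

	if (prod == cons)
		return -1;	/* ring is empty */

	*item = ring->desc[cons & (RING_SIZE - 1)];
	/* Hand the slot back only after it has been read. */
	atomic_store_explicit(&ring->consumer, cons + 1,
			      memory_order_release);
	return 0;
}
```

The acquire/release pairing is one way to express the two barriers; it is only sufficient for the single-producer/single-consumer case described above.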
|
||||
|
||||
Sample application
|
||||
==================
|
||||
|
||||
There is an xdpsock benchmarking/test application included that
|
||||
demonstrates how to use AF_XDP sockets with both private and shared
|
||||
UMEMs. Say that you would like your UDP traffic from port 4242 to end
|
||||
up in queue 16, on which we will enable AF_XDP. Here, we use ethtool
|
||||
for this::
|
||||
|
||||
ethtool -N p3p2 rx-flow-hash udp4 fn
|
||||
ethtool -N p3p2 flow-type udp4 src-port 4242 dst-port 4242 \
|
||||
action 16
|
||||
|
||||
Running the rxdrop benchmark in XDP_DRV mode can then be done
|
||||
using::
|
||||
|
||||
samples/bpf/xdpsock -i p3p2 -q 16 -r -N
|
||||
|
||||
For XDP_SKB mode, use the switch "-S" instead of "-N". All options
|
||||
can be displayed with "-h", as usual.
|
||||
|
||||
Credits
|
||||
=======
|
||||
|
||||
- Björn Töpel (AF_XDP core)
|
||||
- Magnus Karlsson (AF_XDP core)
|
||||
- Alexander Duyck
|
||||
- Alexei Starovoitov
|
||||
- Daniel Borkmann
|
||||
- Jesper Dangaard Brouer
|
||||
- John Fastabend
|
||||
- Jonathan Corbet (LWN coverage)
|
||||
- Michael S. Tsirkin
|
||||
- Qi Z Zhang
|
||||
- Willem de Bruijn
|
||||
|
@ -483,6 +483,12 @@ Example output from dmesg:
|
||||
[ 3389.935851] JIT code: 00000030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
|
||||
[ 3389.935852] JIT code: 00000040: eb 02 31 c0 c9 c3
|
||||
|
||||
When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is permanently set to 1 and
|
||||
setting any other value than that will result in failure. This is even the case for
|
||||
setting bpf_jit_enable to 2, since dumping the final JIT image into the kernel log
|
||||
is discouraged and introspection through bpftool (under tools/bpf/bpftool/) is the
|
||||
generally recommended approach instead.
|
||||
|
||||
In the kernel source tree under tools/bpf/, there's bpf_jit_disasm for
|
||||
generating disassembly out of the kernel log's hexdump:
|
||||
|
||||
|
@ -6,6 +6,7 @@ Contents:
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
af_xdp
|
||||
batman-adv
|
||||
can
|
||||
dpaa2/index
|
||||
|
@ -45,6 +45,7 @@ through bpf(2) and passing a verifier in the kernel, a JIT will then
|
||||
translate these BPF proglets into native CPU instructions. There are
|
||||
two flavors of JITs, the newer eBPF JIT currently supported on:
|
||||
- x86_64
|
||||
- x86_32
|
||||
- arm64
|
||||
- arm32
|
||||
- ppc64
|
||||
|
@ -2729,7 +2729,6 @@ F: Documentation/networking/filter.txt
|
||||
F: Documentation/bpf/
|
||||
F: include/linux/bpf*
|
||||
F: include/linux/filter.h
|
||||
F: include/trace/events/bpf.h
|
||||
F: include/trace/events/xdp.h
|
||||
F: include/uapi/linux/bpf*
|
||||
F: include/uapi/linux/filter.h
|
||||
@ -15408,6 +15407,14 @@ T: git git://linuxtv.org/media_tree.git
|
||||
S: Maintained
|
||||
F: drivers/media/tuners/tuner-xc2028.*
|
||||
|
||||
XDP SOCKETS (AF_XDP)
|
||||
M: Björn Töpel <bjorn.topel@intel.com>
|
||||
M: Magnus Karlsson <magnus.karlsson@intel.com>
|
||||
L: netdev@vger.kernel.org
|
||||
S: Maintained
|
||||
F: kernel/bpf/xskmap.c
|
||||
F: net/xdp/
|
||||
|
||||
XEN BLOCK SUBSYSTEM
|
||||
M: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
||||
M: Roger Pau Monné <roger.pau@citrix.com>
|
||||
|
@ -1452,83 +1452,6 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
|
||||
emit(ARM_LDR_I(rn, ARM_SP, STACK_VAR(src_lo)), ctx);
|
||||
emit_ldx_r(dst, rn, dstk, off, ctx, BPF_SIZE(code));
|
||||
break;
|
||||
/* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + imm)) */
|
||||
case BPF_LD | BPF_ABS | BPF_W:
|
||||
case BPF_LD | BPF_ABS | BPF_H:
|
||||
case BPF_LD | BPF_ABS | BPF_B:
|
||||
/* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + src + imm)) */
|
||||
case BPF_LD | BPF_IND | BPF_W:
|
||||
case BPF_LD | BPF_IND | BPF_H:
|
||||
case BPF_LD | BPF_IND | BPF_B:
|
||||
{
|
||||
const u8 r4 = bpf2a32[BPF_REG_6][1]; /* r4 = ptr to sk_buff */
|
||||
const u8 r0 = bpf2a32[BPF_REG_0][1]; /*r0: struct sk_buff *skb*/
|
||||
/* rtn value */
|
||||
const u8 r1 = bpf2a32[BPF_REG_0][0]; /* r1: int k */
|
||||
const u8 r2 = bpf2a32[BPF_REG_1][1]; /* r2: unsigned int size */
|
||||
const u8 r3 = bpf2a32[BPF_REG_1][0]; /* r3: void *buffer */
|
||||
const u8 r6 = bpf2a32[TMP_REG_1][1]; /* r6: void *(*func)(..) */
|
||||
int size;
|
||||
|
||||
/* Setting up first argument */
|
||||
emit(ARM_MOV_R(r0, r4), ctx);
|
||||
|
||||
/* Setting up second argument */
|
||||
emit_a32_mov_i(r1, imm, false, ctx);
|
||||
if (BPF_MODE(code) == BPF_IND)
|
||||
emit_a32_alu_r(r1, src_lo, false, sstk, ctx,
|
||||
false, false, BPF_ADD);
|
||||
|
||||
/* Setting up third argument */
|
||||
switch (BPF_SIZE(code)) {
|
||||
case BPF_W:
|
||||
size = 4;
|
||||
break;
|
||||
case BPF_H:
|
||||
size = 2;
|
||||
break;
|
||||
case BPF_B:
|
||||
size = 1;
|
||||
break;
|
||||
default:
|
||||
return -EINVAL;
|
||||
}
|
||||
emit_a32_mov_i(r2, size, false, ctx);
|
||||
|
||||
/* Setting up fourth argument */
|
||||
emit(ARM_ADD_I(r3, ARM_SP, imm8m(SKB_BUFFER)), ctx);
|
||||
|
||||
/* Setting up function pointer to call */
|
||||
emit_a32_mov_i(r6, (unsigned int)bpf_load_pointer, false, ctx);
|
||||
emit_blx_r(r6, ctx);
|
||||
|
||||
emit(ARM_EOR_R(r1, r1, r1), ctx);
|
||||
/* Check if return address is NULL or not.
|
||||
* if NULL then jump to epilogue
|
||||
* else continue to load the value from retn address
|
||||
*/
|
||||
emit(ARM_CMP_I(r0, 0), ctx);
|
||||
jmp_offset = epilogue_offset(ctx);
|
||||
check_imm24(jmp_offset);
|
||||
_emit(ARM_COND_EQ, ARM_B(jmp_offset), ctx);
|
||||
|
||||
/* Load value from the address */
|
||||
switch (BPF_SIZE(code)) {
|
||||
case BPF_W:
|
||||
emit(ARM_LDR_I(r0, r0, 0), ctx);
|
||||
emit_rev32(r0, r0, ctx);
|
||||
break;
|
||||
case BPF_H:
|
||||
emit(ARM_LDRH_I(r0, r0, 0), ctx);
|
||||
emit_rev16(r0, r0, ctx);
|
||||
break;
|
||||
case BPF_B:
|
||||
emit(ARM_LDRB_I(r0, r0, 0), ctx);
|
||||
/* No need to reverse */
|
||||
break;
|
||||
}
|
||||
break;
|
||||
}
|
||||
/* ST: *(size *)(dst + off) = imm */
|
||||
case BPF_ST | BPF_MEM | BPF_W:
|
||||
case BPF_ST | BPF_MEM | BPF_H:
|
||||
|
@ -723,71 +723,6 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
|
||||
emit(A64_CBNZ(0, tmp3, jmp_offset), ctx);
|
||||
break;
|
||||
|
||||
/* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + imm)) */
|
||||
case BPF_LD | BPF_ABS | BPF_W:
|
||||
case BPF_LD | BPF_ABS | BPF_H:
|
||||
case BPF_LD | BPF_ABS | BPF_B:
|
||||
/* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + src + imm)) */
|
||||
case BPF_LD | BPF_IND | BPF_W:
|
||||
case BPF_LD | BPF_IND | BPF_H:
|
||||
case BPF_LD | BPF_IND | BPF_B:
|
||||
{
|
||||
const u8 r0 = bpf2a64[BPF_REG_0]; /* r0 = return value */
|
||||
const u8 r6 = bpf2a64[BPF_REG_6]; /* r6 = pointer to sk_buff */
|
||||
const u8 fp = bpf2a64[BPF_REG_FP];
|
||||
const u8 r1 = bpf2a64[BPF_REG_1]; /* r1: struct sk_buff *skb */
|
||||
const u8 r2 = bpf2a64[BPF_REG_2]; /* r2: int k */
|
||||
const u8 r3 = bpf2a64[BPF_REG_3]; /* r3: unsigned int size */
|
||||
const u8 r4 = bpf2a64[BPF_REG_4]; /* r4: void *buffer */
|
||||
const u8 r5 = bpf2a64[BPF_REG_5]; /* r5: void *(*func)(...) */
|
||||
int size;
|
||||
|
||||
emit(A64_MOV(1, r1, r6), ctx);
|
||||
emit_a64_mov_i(0, r2, imm, ctx);
|
||||
if (BPF_MODE(code) == BPF_IND)
|
||||
emit(A64_ADD(0, r2, r2, src), ctx);
|
||||
switch (BPF_SIZE(code)) {
|
||||
case BPF_W:
|
||||
size = 4;
|
||||
break;
|
||||
case BPF_H:
|
||||
size = 2;
|
||||
break;
|
||||
case BPF_B:
|
||||
size = 1;
|
||||
break;
|
||||
default:
|
||||
return -EINVAL;
|
||||
}
|
||||
emit_a64_mov_i64(r3, size, ctx);
|
||||
emit(A64_SUB_I(1, r4, fp, ctx->stack_size), ctx);
|
||||
emit_a64_mov_i64(r5, (unsigned long)bpf_load_pointer, ctx);
|
||||
emit(A64_BLR(r5), ctx);
|
||||
emit(A64_MOV(1, r0, A64_R(0)), ctx);
|
||||
|
||||
jmp_offset = epilogue_offset(ctx);
|
||||
check_imm19(jmp_offset);
|
||||
emit(A64_CBZ(1, r0, jmp_offset), ctx);
|
||||
emit(A64_MOV(1, r5, r0), ctx);
|
||||
switch (BPF_SIZE(code)) {
|
||||
case BPF_W:
|
||||
emit(A64_LDR32(r0, r5, A64_ZR), ctx);
|
||||
#ifndef CONFIG_CPU_BIG_ENDIAN
|
||||
emit(A64_REV32(0, r0, r0), ctx);
|
||||
#endif
|
||||
break;
|
||||
case BPF_H:
|
||||
emit(A64_LDRH(r0, r5, A64_ZR), ctx);
|
||||
#ifndef CONFIG_CPU_BIG_ENDIAN
|
||||
emit(A64_REV16(0, r0, r0), ctx);
|
||||
#endif
|
||||
break;
|
||||
case BPF_B:
|
||||
emit(A64_LDRB(r0, r5, A64_ZR), ctx);
|
||||
break;
|
||||
}
|
||||
break;
|
||||
}
|
||||
default:
|
||||
pr_err_once("unknown opcode %02x\n", code);
|
||||
return -EINVAL;
|
||||
|
@ -1267,110 +1267,6 @@ static int build_one_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
|
||||
return -EINVAL;
|
||||
break;
|
||||
|
||||
case BPF_LD | BPF_B | BPF_ABS:
|
||||
case BPF_LD | BPF_H | BPF_ABS:
|
||||
case BPF_LD | BPF_W | BPF_ABS:
|
||||
case BPF_LD | BPF_DW | BPF_ABS:
|
||||
ctx->flags |= EBPF_SAVE_RA;
|
||||
|
||||
gen_imm_to_reg(insn, MIPS_R_A1, ctx);
|
||||
emit_instr(ctx, addiu, MIPS_R_A2, MIPS_R_ZERO, size_to_len(insn));
|
||||
|
||||
if (insn->imm < 0) {
|
||||
emit_const_to_reg(ctx, MIPS_R_T9, (u64)bpf_internal_load_pointer_neg_helper);
|
||||
} else {
|
||||
emit_const_to_reg(ctx, MIPS_R_T9, (u64)ool_skb_header_pointer);
|
||||
emit_instr(ctx, daddiu, MIPS_R_A3, MIPS_R_SP, ctx->tmp_offset);
|
||||
}
|
||||
goto ld_skb_common;
|
||||
|
||||
case BPF_LD | BPF_B | BPF_IND:
|
||||
case BPF_LD | BPF_H | BPF_IND:
|
||||
case BPF_LD | BPF_W | BPF_IND:
|
||||
case BPF_LD | BPF_DW | BPF_IND:
|
||||
ctx->flags |= EBPF_SAVE_RA;
|
||||
src = ebpf_to_mips_reg(ctx, insn, src_reg_no_fp);
|
||||
if (src < 0)
|
||||
return src;
|
||||
ts = get_reg_val_type(ctx, this_idx, insn->src_reg);
|
||||
if (ts == REG_32BIT_ZERO_EX) {
|
||||
/* sign extend */
|
||||
emit_instr(ctx, sll, MIPS_R_A1, src, 0);
|
||||
src = MIPS_R_A1;
|
||||
}
|
||||
if (insn->imm >= S16_MIN && insn->imm <= S16_MAX) {
|
||||
emit_instr(ctx, daddiu, MIPS_R_A1, src, insn->imm);
|
||||
} else {
|
||||
gen_imm_to_reg(insn, MIPS_R_AT, ctx);
|
||||
emit_instr(ctx, daddu, MIPS_R_A1, MIPS_R_AT, src);
|
||||
}
|
||||
/* truncate to 32-bit int */
|
||||
emit_instr(ctx, sll, MIPS_R_A1, MIPS_R_A1, 0);
|
||||
emit_instr(ctx, daddiu, MIPS_R_A3, MIPS_R_SP, ctx->tmp_offset);
|
||||
emit_instr(ctx, slt, MIPS_R_AT, MIPS_R_A1, MIPS_R_ZERO);
|
||||
|
||||
emit_const_to_reg(ctx, MIPS_R_T8, (u64)bpf_internal_load_pointer_neg_helper);
|
||||
emit_const_to_reg(ctx, MIPS_R_T9, (u64)ool_skb_header_pointer);
|
||||
emit_instr(ctx, addiu, MIPS_R_A2, MIPS_R_ZERO, size_to_len(insn));
|
||||
emit_instr(ctx, movn, MIPS_R_T9, MIPS_R_T8, MIPS_R_AT);
|
||||
|
||||
ld_skb_common:
|
||||
emit_instr(ctx, jalr, MIPS_R_RA, MIPS_R_T9);
|
||||
/* delay slot move */
|
||||
emit_instr(ctx, daddu, MIPS_R_A0, MIPS_R_S0, MIPS_R_ZERO);
|
||||
|
||||
/* Check the error value */
|
||||
b_off = b_imm(exit_idx, ctx);
|
||||
if (is_bad_offset(b_off)) {
|
||||
target = j_target(ctx, exit_idx);
|
||||
if (target == (unsigned int)-1)
|
||||
return -E2BIG;
|
||||
|
||||
if (!(ctx->offsets[this_idx] & OFFSETS_B_CONV)) {
|
||||
ctx->offsets[this_idx] |= OFFSETS_B_CONV;
|
||||
ctx->long_b_conversion = 1;
|
||||
}
|
||||
emit_instr(ctx, bne, MIPS_R_V0, MIPS_R_ZERO, 4 * 3);
|
||||
emit_instr(ctx, nop);
|
||||
emit_instr(ctx, j, target);
|
||||
emit_instr(ctx, nop);
|
||||
} else {
|
||||
emit_instr(ctx, beq, MIPS_R_V0, MIPS_R_ZERO, b_off);
|
||||
emit_instr(ctx, nop);
|
||||
}
|
||||
|
||||
#ifdef __BIG_ENDIAN
|
||||
need_swap = false;
|
||||
#else
|
||||
need_swap = true;
|
||||
#endif
|
||||
dst = MIPS_R_V0;
|
||||
switch (BPF_SIZE(insn->code)) {
|
||||
case BPF_B:
|
||||
emit_instr(ctx, lbu, dst, 0, MIPS_R_V0);
|
||||
break;
|
||||
case BPF_H:
|
||||
emit_instr(ctx, lhu, dst, 0, MIPS_R_V0);
|
||||
if (need_swap)
|
||||
emit_instr(ctx, wsbh, dst, dst);
|
||||
break;
|
||||
case BPF_W:
|
||||
emit_instr(ctx, lw, dst, 0, MIPS_R_V0);
|
||||
if (need_swap) {
|
||||
emit_instr(ctx, wsbh, dst, dst);
|
||||
emit_instr(ctx, rotr, dst, dst, 16);
|
||||
}
|
||||
break;
|
||||
case BPF_DW:
|
||||
emit_instr(ctx, ld, dst, 0, MIPS_R_V0);
|
||||
if (need_swap) {
|
||||
emit_instr(ctx, dsbh, dst, dst);
|
||||
emit_instr(ctx, dshd, dst, dst);
|
||||
}
|
||||
break;
|
||||
}
|
||||
|
||||
break;
|
||||
case BPF_ALU | BPF_END | BPF_FROM_BE:
|
||||
case BPF_ALU | BPF_END | BPF_FROM_LE:
|
||||
dst = ebpf_to_mips_reg(ctx, insn, dst_reg);
|
||||
|
@ -3,7 +3,7 @@
|
||||
# Arch-specific network modules
|
||||
#
|
||||
ifeq ($(CONFIG_PPC64),y)
|
||||
obj-$(CONFIG_BPF_JIT) += bpf_jit_asm64.o bpf_jit_comp64.o
|
||||
obj-$(CONFIG_BPF_JIT) += bpf_jit_comp64.o
|
||||
else
|
||||
obj-$(CONFIG_BPF_JIT) += bpf_jit_asm.o bpf_jit_comp.o
|
||||
endif
|
||||
|
@ -20,7 +20,7 @@
|
||||
* with our redzone usage.
|
||||
*
|
||||
* [ prev sp ] <-------------
|
||||
* [ nv gpr save area ] 8*8 |
|
||||
* [ nv gpr save area ] 6*8 |
|
||||
* [ tail_call_cnt ] 8 |
|
||||
* [ local_tmp_var ] 8 |
|
||||
* fp (r31) --> [ ebpf stack space ] upto 512 |
|
||||
@ -28,8 +28,8 @@
|
||||
* sp (r1) ---> [ stack pointer ] --------------
|
||||
*/
|
||||
|
||||
/* for gpr non volatile registers BPG_REG_6 to 10, plus skb cache registers */
|
||||
#define BPF_PPC_STACK_SAVE (8*8)
|
||||
/* for gpr non volatile registers BPG_REG_6 to 10 */
|
||||
#define BPF_PPC_STACK_SAVE (6*8)
|
||||
/* for bpf JIT code internal usage */
|
||||
#define BPF_PPC_STACK_LOCALS 16
|
||||
/* stack frame excluding BPF stack, ensure this is quadword aligned */
|
||||
@ -39,10 +39,8 @@
|
||||
#ifndef __ASSEMBLY__
|
||||
|
||||
/* BPF register usage */
|
||||
#define SKB_HLEN_REG (MAX_BPF_JIT_REG + 0)
|
||||
#define SKB_DATA_REG (MAX_BPF_JIT_REG + 1)
|
||||
#define TMP_REG_1 (MAX_BPF_JIT_REG + 2)
|
||||
#define TMP_REG_2 (MAX_BPF_JIT_REG + 3)
|
||||
#define TMP_REG_1 (MAX_BPF_JIT_REG + 0)
|
||||
#define TMP_REG_2 (MAX_BPF_JIT_REG + 1)
|
||||
|
||||
/* BPF to ppc register mappings */
|
||||
static const int b2p[] = {
|
||||
@ -63,40 +61,23 @@ static const int b2p[] = {
|
||||
[BPF_REG_FP] = 31,
|
||||
/* eBPF jit internal registers */
|
||||
[BPF_REG_AX] = 2,
|
||||
[SKB_HLEN_REG] = 25,
|
||||
[SKB_DATA_REG] = 26,
|
||||
[TMP_REG_1] = 9,
|
||||
[TMP_REG_2] = 10
|
||||
};
|
||||
|
||||
/* PPC NVR range -- update this if we ever use NVRs below r24 */
|
||||
#define BPF_PPC_NVR_MIN 24
|
||||
|
||||
/* Assembly helpers */
|
||||
#define DECLARE_LOAD_FUNC(func) u64 func(u64 r3, u64 r4); \
|
||||
u64 func##_negative_offset(u64 r3, u64 r4); \
|
||||
u64 func##_positive_offset(u64 r3, u64 r4);
|
||||
|
||||
DECLARE_LOAD_FUNC(sk_load_word);
|
||||
DECLARE_LOAD_FUNC(sk_load_half);
|
||||
DECLARE_LOAD_FUNC(sk_load_byte);
|
||||
|
||||
#define CHOOSE_LOAD_FUNC(imm, func) \
|
||||
(imm < 0 ? \
|
||||
(imm >= SKF_LL_OFF ? func##_negative_offset : func) : \
|
||||
func##_positive_offset)
|
||||
/* PPC NVR range -- update this if we ever use NVRs below r27 */
|
||||
#define BPF_PPC_NVR_MIN 27
|
||||
|
||||
#define SEEN_FUNC 0x1000 /* might call external helpers */
|
||||
#define SEEN_STACK 0x2000 /* uses BPF stack */
|
||||
#define SEEN_SKB 0x4000 /* uses sk_buff */
|
||||
#define SEEN_TAILCALL 0x8000 /* uses tail calls */
|
||||
#define SEEN_TAILCALL 0x4000 /* uses tail calls */
|
||||
|
||||
struct codegen_context {
|
||||
/*
|
||||
* This is used to track register usage as well
|
||||
* as calls to external helpers.
|
||||
* - register usage is tracked with corresponding
|
||||
* bits (r3-r10 and r25-r31)
|
||||
* bits (r3-r10 and r27-r31)
|
||||
* - rest of the bits can be used to track other
|
||||
* things -- for now, we use bits 16 to 23
|
||||
* encoded in SEEN_* macros above
|
||||
|
@ -1,180 +0,0 @@
|
||||
/*
|
||||
* bpf_jit_asm64.S: Packet/header access helper functions
|
||||
* for PPC64 BPF compiler.
|
||||
*
|
||||
* Copyright 2016, Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
|
||||
* IBM Corporation
|
||||
*
|
||||
* Based on bpf_jit_asm.S by Matt Evans
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or
|
||||
* modify it under the terms of the GNU General Public License
|
||||
* as published by the Free Software Foundation; version 2
|
||||
* of the License.
|
||||
*/
|
||||
|
||||
#include <asm/ppc_asm.h>
|
||||
#include <asm/ptrace.h>
|
||||
#include "bpf_jit64.h"
|
||||
|
||||
/*
|
||||
* All of these routines are called directly from generated code,
|
||||
* with the below register usage:
|
||||
* r27 skb pointer (ctx)
|
||||
* r25 skb header length
|
||||
* r26 skb->data pointer
|
||||
* r4 offset
|
||||
*
|
||||
* Result is passed back in:
|
||||
* r8 data read in host endian format (accumulator)
|
||||
*
|
||||
* r9 is used as a temporary register
|
||||
*/
|
||||
|
||||
#define r_skb r27
|
||||
#define r_hlen r25
|
||||
#define r_data r26
|
||||
#define r_off r4
|
||||
#define r_val r8
|
||||
#define r_tmp r9
|
||||
|
||||
_GLOBAL_TOC(sk_load_word)
|
||||
cmpdi r_off, 0
|
||||
blt bpf_slow_path_word_neg
|
||||
b sk_load_word_positive_offset
|
||||
|
||||
_GLOBAL_TOC(sk_load_word_positive_offset)
|
||||
/* Are we accessing past headlen? */
|
||||
subi r_tmp, r_hlen, 4
|
||||
cmpd r_tmp, r_off
|
||||
blt bpf_slow_path_word
|
||||
/* Nope, just hitting the header. cr0 here is eq or gt! */
|
||||
LWZX_BE r_val, r_data, r_off
|
||||
blr /* Return success, cr0 != LT */
|
||||
|
||||
_GLOBAL_TOC(sk_load_half)
|
||||
cmpdi r_off, 0
|
||||
blt bpf_slow_path_half_neg
|
||||
b sk_load_half_positive_offset
|
||||
|
||||
_GLOBAL_TOC(sk_load_half_positive_offset)
|
||||
subi r_tmp, r_hlen, 2
|
||||
cmpd r_tmp, r_off
|
||||
blt bpf_slow_path_half
LHZX_BE r_val, r_data, r_off
blr

_GLOBAL_TOC(sk_load_byte)
cmpdi r_off, 0
blt bpf_slow_path_byte_neg
b sk_load_byte_positive_offset

_GLOBAL_TOC(sk_load_byte_positive_offset)
cmpd r_hlen, r_off
ble bpf_slow_path_byte
lbzx r_val, r_data, r_off
blr

/*
* Call out to skb_copy_bits:
* Allocate a new stack frame here to remain ABI-compliant in
* stashing LR.
*/
#define bpf_slow_path_common(SIZE) \
mflr r0; \
std r0, PPC_LR_STKOFF(r1); \
stdu r1, -(STACK_FRAME_MIN_SIZE + BPF_PPC_STACK_LOCALS)(r1); \
mr r3, r_skb; \
/* r4 = r_off as passed */ \
addi r5, r1, STACK_FRAME_MIN_SIZE; \
li r6, SIZE; \
bl skb_copy_bits; \
nop; \
/* save r5 */ \
addi r5, r1, STACK_FRAME_MIN_SIZE; \
/* r3 = 0 on success */ \
addi r1, r1, STACK_FRAME_MIN_SIZE + BPF_PPC_STACK_LOCALS; \
ld r0, PPC_LR_STKOFF(r1); \
mtlr r0; \
cmpdi r3, 0; \
blt bpf_error; /* cr0 = LT */

bpf_slow_path_word:
bpf_slow_path_common(4)
/* Data value is on stack, and cr0 != LT */
LWZX_BE r_val, 0, r5
blr

bpf_slow_path_half:
bpf_slow_path_common(2)
LHZX_BE r_val, 0, r5
blr

bpf_slow_path_byte:
bpf_slow_path_common(1)
lbzx r_val, 0, r5
blr

/*
* Call out to bpf_internal_load_pointer_neg_helper
*/
#define sk_negative_common(SIZE) \
mflr r0; \
std r0, PPC_LR_STKOFF(r1); \
stdu r1, -STACK_FRAME_MIN_SIZE(r1); \
mr r3, r_skb; \
/* r4 = r_off, as passed */ \
li r5, SIZE; \
bl bpf_internal_load_pointer_neg_helper; \
nop; \
addi r1, r1, STACK_FRAME_MIN_SIZE; \
ld r0, PPC_LR_STKOFF(r1); \
mtlr r0; \
/* R3 != 0 on success */ \
cmpldi r3, 0; \
beq bpf_error_slow; /* cr0 = EQ */

bpf_slow_path_word_neg:
lis r_tmp, -32 /* SKF_LL_OFF */
cmpd r_off, r_tmp /* addr < SKF_* */
blt bpf_error /* cr0 = LT */
b sk_load_word_negative_offset

_GLOBAL_TOC(sk_load_word_negative_offset)
sk_negative_common(4)
LWZX_BE r_val, 0, r3
blr

bpf_slow_path_half_neg:
lis r_tmp, -32 /* SKF_LL_OFF */
cmpd r_off, r_tmp /* addr < SKF_* */
blt bpf_error /* cr0 = LT */
b sk_load_half_negative_offset

_GLOBAL_TOC(sk_load_half_negative_offset)
sk_negative_common(2)
LHZX_BE r_val, 0, r3
blr

bpf_slow_path_byte_neg:
lis r_tmp, -32 /* SKF_LL_OFF */
cmpd r_off, r_tmp /* addr < SKF_* */
blt bpf_error /* cr0 = LT */
b sk_load_byte_negative_offset

_GLOBAL_TOC(sk_load_byte_negative_offset)
sk_negative_common(1)
lbzx r_val, 0, r3
blr

bpf_error_slow:
/* fabricate a cr0 = lt */
li r_tmp, -1
cmpdi r_tmp, 0
bpf_error:
/*
* Entered with cr0 = lt
* Generated code will 'blt epilogue', returning 0.
*/
li r_val, 0
blr
@ -59,7 +59,7 @@ static inline bool bpf_has_stack_frame(struct codegen_context *ctx)
* [ prev sp ] <-------------
* [ ... ] |
* sp (r1) ---> [ stack pointer ] --------------
* [ nv gpr save area ] 8*8
* [ nv gpr save area ] 6*8
* [ tail_call_cnt ] 8
* [ local_tmp_var ] 8
* [ unused red zone ] 208 bytes protected
@ -88,21 +88,6 @@ static int bpf_jit_stack_offsetof(struct codegen_context *ctx, int reg)
BUG();
}

static void bpf_jit_emit_skb_loads(u32 *image, struct codegen_context *ctx)
{
/*
* Load skb->len and skb->data_len
* r3 points to skb
*/
PPC_LWZ(b2p[SKB_HLEN_REG], 3, offsetof(struct sk_buff, len));
PPC_LWZ(b2p[TMP_REG_1], 3, offsetof(struct sk_buff, data_len));
/* header_len = len - data_len */
PPC_SUB(b2p[SKB_HLEN_REG], b2p[SKB_HLEN_REG], b2p[TMP_REG_1]);

/* skb->data pointer */
PPC_BPF_LL(b2p[SKB_DATA_REG], 3, offsetof(struct sk_buff, data));
}

static void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
{
int i;
@ -145,18 +130,6 @@ static void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
if (bpf_is_seen_register(ctx, i))
PPC_BPF_STL(b2p[i], 1, bpf_jit_stack_offsetof(ctx, b2p[i]));

/*
* Save additional non-volatile regs if we cache skb
* Also, setup skb data
*/
if (ctx->seen & SEEN_SKB) {
PPC_BPF_STL(b2p[SKB_HLEN_REG], 1,
bpf_jit_stack_offsetof(ctx, b2p[SKB_HLEN_REG]));
PPC_BPF_STL(b2p[SKB_DATA_REG], 1,
bpf_jit_stack_offsetof(ctx, b2p[SKB_DATA_REG]));
bpf_jit_emit_skb_loads(image, ctx);
}

/* Setup frame pointer to point to the bpf stack area */
if (bpf_is_seen_register(ctx, BPF_REG_FP))
PPC_ADDI(b2p[BPF_REG_FP], 1,
@ -172,14 +145,6 @@ static void bpf_jit_emit_common_epilogue(u32 *image, struct codegen_context *ctx
if (bpf_is_seen_register(ctx, i))
PPC_BPF_LL(b2p[i], 1, bpf_jit_stack_offsetof(ctx, b2p[i]));

/* Restore non-volatile registers used for skb cache */
if (ctx->seen & SEEN_SKB) {
PPC_BPF_LL(b2p[SKB_HLEN_REG], 1,
bpf_jit_stack_offsetof(ctx, b2p[SKB_HLEN_REG]));
PPC_BPF_LL(b2p[SKB_DATA_REG], 1,
bpf_jit_stack_offsetof(ctx, b2p[SKB_DATA_REG]));
}

/* Tear down our stack frame */
if (bpf_has_stack_frame(ctx)) {
PPC_ADDI(1, 1, BPF_PPC_STACKFRAME + ctx->stack_size);
@ -753,23 +718,10 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
ctx->seen |= SEEN_FUNC;
func = (u8 *) __bpf_call_base + imm;

/* Save skb pointer if we need to re-cache skb data */
if ((ctx->seen & SEEN_SKB) &&
bpf_helper_changes_pkt_data(func))
PPC_BPF_STL(3, 1, bpf_jit_stack_local(ctx));

bpf_jit_emit_func_call(image, ctx, (u64)func);

/* move return value from r3 to BPF_REG_0 */
PPC_MR(b2p[BPF_REG_0], 3);

/* refresh skb cache */
if ((ctx->seen & SEEN_SKB) &&
bpf_helper_changes_pkt_data(func)) {
/* reload skb pointer to r3 */
PPC_BPF_LL(3, 1, bpf_jit_stack_local(ctx));
bpf_jit_emit_skb_loads(image, ctx);
}
break;

/*
@ -886,65 +838,6 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
PPC_BCC(true_cond, addrs[i + 1 + off]);
break;

/*
* Loads from packet header/data
* Assume 32-bit input value in imm and X (src_reg)
*/

/* Absolute loads */
case BPF_LD | BPF_W | BPF_ABS:
func = (u8 *)CHOOSE_LOAD_FUNC(imm, sk_load_word);
goto common_load_abs;
case BPF_LD | BPF_H | BPF_ABS:
func = (u8 *)CHOOSE_LOAD_FUNC(imm, sk_load_half);
goto common_load_abs;
case BPF_LD | BPF_B | BPF_ABS:
func = (u8 *)CHOOSE_LOAD_FUNC(imm, sk_load_byte);
common_load_abs:
/*
* Load from [imm]
* Load into r4, which can just be passed onto
* skb load helpers as the second parameter
*/
PPC_LI32(4, imm);
goto common_load;

/* Indirect loads */
case BPF_LD | BPF_W | BPF_IND:
func = (u8 *)sk_load_word;
goto common_load_ind;
case BPF_LD | BPF_H | BPF_IND:
func = (u8 *)sk_load_half;
goto common_load_ind;
case BPF_LD | BPF_B | BPF_IND:
func = (u8 *)sk_load_byte;
common_load_ind:
/*
* Load from [src_reg + imm]
* Treat src_reg as a 32-bit value
*/
PPC_EXTSW(4, src_reg);
if (imm) {
if (imm >= -32768 && imm < 32768)
PPC_ADDI(4, 4, IMM_L(imm));
else {
PPC_LI32(b2p[TMP_REG_1], imm);
PPC_ADD(4, 4, b2p[TMP_REG_1]);
}
}

common_load:
ctx->seen |= SEEN_SKB;
ctx->seen |= SEEN_FUNC;
bpf_jit_emit_func_call(image, ctx, (u64)func);

/*
* Helper returns 'lt' condition on error, and an
* appropriate return value in BPF_REG_0
*/
PPC_BCC(COND_LT, exit_addr);
break;

/*
* Tail call
*/
@ -2,4 +2,4 @@
#
# Arch-specific network modules
#
obj-$(CONFIG_BPF_JIT) += bpf_jit.o bpf_jit_comp.o
obj-$(CONFIG_BPF_JIT) += bpf_jit_comp.o
@ -1,116 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* BPF Jit compiler for s390, help functions.
*
* Copyright IBM Corp. 2012,2015
*
* Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com>
* Michael Holzheu <holzheu@linux.vnet.ibm.com>
*/

#include <linux/linkage.h>
#include "bpf_jit.h"

/*
* Calling convention:
* registers %r7-%r10, %r11,%r13, and %r15 are call saved
*
* Input (64 bit):
* %r3 (%b2) = offset into skb data
* %r6 (%b5) = return address
* %r7 (%b6) = skb pointer
* %r12 = skb data pointer
*
* Output:
* %r14= %b0 = return value (read skb value)
*
* Work registers: %r2,%r4,%r5,%r14
*
* skb_copy_bits takes 4 parameters:
* %r2 = skb pointer
* %r3 = offset into skb data
* %r4 = pointer to temp buffer
* %r5 = length to copy
* Return value in %r2: 0 = ok
*
* bpf_internal_load_pointer_neg_helper takes 3 parameters:
* %r2 = skb pointer
* %r3 = offset into data
* %r4 = length to copy
* Return value in %r2: Pointer to data
*/

#define SKF_MAX_NEG_OFF -0x200000 /* SKF_LL_OFF from filter.h */

/*
* Load SIZE bytes from SKB
*/
#define sk_load_common(NAME, SIZE, LOAD) \
ENTRY(sk_load_##NAME); \
ltgr %r3,%r3; /* Is offset negative? */ \
jl sk_load_##NAME##_slow_neg; \
ENTRY(sk_load_##NAME##_pos); \
aghi %r3,SIZE; /* Offset + SIZE */ \
clg %r3,STK_OFF_HLEN(%r15); /* Offset + SIZE > hlen? */ \
jh sk_load_##NAME##_slow; \
LOAD %r14,-SIZE(%r3,%r12); /* Get data from skb */ \
b OFF_OK(%r6); /* Return */ \
\
sk_load_##NAME##_slow:; \
lgr %r2,%r7; /* Arg1 = skb pointer */ \
aghi %r3,-SIZE; /* Arg2 = offset */ \
la %r4,STK_OFF_TMP(%r15); /* Arg3 = temp buffer */ \
lghi %r5,SIZE; /* Arg4 = size */ \
brasl %r14,skb_copy_bits; /* Get data from skb */ \
LOAD %r14,STK_OFF_TMP(%r15); /* Load from temp buffer */ \
ltgr %r2,%r2; /* Set cc to (%r2 != 0) */ \
br %r6; /* Return */

sk_load_common(word, 4, llgf) /* r14 = *(u32 *) (skb->data+offset) */
sk_load_common(half, 2, llgh) /* r14 = *(u16 *) (skb->data+offset) */

/*
* Load 1 byte from SKB (optimized version)
*/
/* r14 = *(u8 *) (skb->data+offset) */
ENTRY(sk_load_byte)
ltgr %r3,%r3 # Is offset negative?
jl sk_load_byte_slow_neg
ENTRY(sk_load_byte_pos)
clg %r3,STK_OFF_HLEN(%r15) # Offset >= hlen?
jnl sk_load_byte_slow
llgc %r14,0(%r3,%r12) # Get byte from skb
b OFF_OK(%r6) # Return OK

sk_load_byte_slow:
lgr %r2,%r7 # Arg1 = skb pointer
# Arg2 = offset
la %r4,STK_OFF_TMP(%r15) # Arg3 = pointer to temp buffer
lghi %r5,1 # Arg4 = size (1 byte)
brasl %r14,skb_copy_bits # Get data from skb
llgc %r14,STK_OFF_TMP(%r15) # Load result from temp buffer
ltgr %r2,%r2 # Set cc to (%r2 != 0)
br %r6 # Return cc

#define sk_negative_common(NAME, SIZE, LOAD) \
sk_load_##NAME##_slow_neg:; \
cgfi %r3,SKF_MAX_NEG_OFF; \
jl bpf_error; \
lgr %r2,%r7; /* Arg1 = skb pointer */ \
/* Arg2 = offset */ \
lghi %r4,SIZE; /* Arg3 = size */ \
brasl %r14,bpf_internal_load_pointer_neg_helper; \
ltgr %r2,%r2; \
jz bpf_error; \
LOAD %r14,0(%r2); /* Get data from pointer */ \
xr %r3,%r3; /* Set cc to zero */ \
br %r6; /* Return cc */

sk_negative_common(word, 4, llgf)
sk_negative_common(half, 2, llgh)
sk_negative_common(byte, 1, llgc)

bpf_error:
# force a return 0 from jit handler
ltgr %r15,%r15 # Set condition code
br %r6
@ -16,9 +16,6 @@
#include <linux/filter.h>
#include <linux/types.h>

extern u8 sk_load_word_pos[], sk_load_half_pos[], sk_load_byte_pos[];
extern u8 sk_load_word[], sk_load_half[], sk_load_byte[];

#endif /* __ASSEMBLY__ */

/*
@ -36,15 +33,6 @@ extern u8 sk_load_word[], sk_load_half[], sk_load_byte[];
* | | |
* | BPF stack | |
* | | |
* +---------------+ |
* | 8 byte skbp | |
* R15+176 -> +---------------+ |
* | 8 byte hlen | |
* R15+168 -> +---------------+ |
* | 4 byte align | |
* +---------------+ |
* | 4 byte temp | |
* | for bpf_jit.S | |
* R15+160 -> +---------------+ |
* | new backchain | |
* R15+152 -> +---------------+ |
@ -57,17 +45,11 @@ extern u8 sk_load_word[], sk_load_half[], sk_load_byte[];
* The stack size used by the BPF program ("BPF stack" above) is passed
* via "aux->stack_depth".
*/
#define STK_SPACE_ADD (8 + 8 + 4 + 4 + 160)
#define STK_SPACE_ADD (160)
#define STK_160_UNUSED (160 - 12 * 8)
#define STK_OFF (STK_SPACE_ADD - STK_160_UNUSED)
#define STK_OFF_TMP 160 /* Offset of tmp buffer on stack */
#define STK_OFF_HLEN 168 /* Offset of SKB header length on stack */
#define STK_OFF_SKBP 176 /* Offset of SKB pointer on stack */

#define STK_OFF_R6 (160 - 11 * 8) /* Offset of r6 on stack */
#define STK_OFF_TCCNT (160 - 12 * 8) /* Offset of tail_call_cnt on stack */

/* Offset to skip condition code check */
#define OFF_OK 4

#endif /* __ARCH_S390_NET_BPF_JIT_H */
@ -47,23 +47,21 @@ struct bpf_jit {
#define BPF_SIZE_MAX 0xffff /* Max size for program (16 bit branches) */

#define SEEN_SKB 1 /* skb access */
#define SEEN_MEM 2 /* use mem[] for temporary storage */
#define SEEN_RET0 4 /* ret0_ip points to a valid return 0 */
#define SEEN_LITERAL 8 /* code uses literals */
#define SEEN_FUNC 16 /* calls C functions */
#define SEEN_TAIL_CALL 32 /* code uses tail calls */
#define SEEN_REG_AX 64 /* code uses constant blinding */
#define SEEN_STACK (SEEN_FUNC | SEEN_MEM | SEEN_SKB)
#define SEEN_MEM (1 << 0) /* use mem[] for temporary storage */
#define SEEN_RET0 (1 << 1) /* ret0_ip points to a valid return 0 */
#define SEEN_LITERAL (1 << 2) /* code uses literals */
#define SEEN_FUNC (1 << 3) /* calls C functions */
#define SEEN_TAIL_CALL (1 << 4) /* code uses tail calls */
#define SEEN_REG_AX (1 << 5) /* code uses constant blinding */
#define SEEN_STACK (SEEN_FUNC | SEEN_MEM)

/*
* s390 registers
*/
#define REG_W0 (MAX_BPF_JIT_REG + 0) /* Work register 1 (even) */
#define REG_W1 (MAX_BPF_JIT_REG + 1) /* Work register 2 (odd) */
#define REG_SKB_DATA (MAX_BPF_JIT_REG + 2) /* SKB data register */
#define REG_L (MAX_BPF_JIT_REG + 3) /* Literal pool register */
#define REG_15 (MAX_BPF_JIT_REG + 4) /* Register 15 */
#define REG_L (MAX_BPF_JIT_REG + 2) /* Literal pool register */
#define REG_15 (MAX_BPF_JIT_REG + 3) /* Register 15 */
#define REG_0 REG_W0 /* Register 0 */
#define REG_1 REG_W1 /* Register 1 */
#define REG_2 BPF_REG_1 /* Register 2 */
@ -88,10 +86,8 @@ static const int reg2hex[] = {
[BPF_REG_9] = 10,
/* BPF stack pointer */
[BPF_REG_FP] = 13,
/* Register for blinding (shared with REG_SKB_DATA) */
/* Register for blinding */
[BPF_REG_AX] = 12,
/* SKB data pointer */
[REG_SKB_DATA] = 12,
/* Work registers for s390x backend */
[REG_W0] = 0,
[REG_W1] = 1,
@ -384,27 +380,6 @@ static void save_restore_regs(struct bpf_jit *jit, int op, u32 stack_depth)
} while (re <= 15);
}

/*
* For SKB access %b1 contains the SKB pointer. For "bpf_jit.S"
* we store the SKB header length on the stack and the SKB data
* pointer in REG_SKB_DATA if BPF_REG_AX is not used.
*/
static void emit_load_skb_data_hlen(struct bpf_jit *jit)
{
/* Header length: llgf %w1,<len>(%b1) */
EMIT6_DISP_LH(0xe3000000, 0x0016, REG_W1, REG_0, BPF_REG_1,
offsetof(struct sk_buff, len));
/* s %w1,<data_len>(%b1) */
EMIT4_DISP(0x5b000000, REG_W1, BPF_REG_1,
offsetof(struct sk_buff, data_len));
/* stg %w1,ST_OFF_HLEN(%r0,%r15) */
EMIT6_DISP_LH(0xe3000000, 0x0024, REG_W1, REG_0, REG_15, STK_OFF_HLEN);
if (!(jit->seen & SEEN_REG_AX))
/* lg %skb_data,data_off(%b1) */
EMIT6_DISP_LH(0xe3000000, 0x0004, REG_SKB_DATA, REG_0,
BPF_REG_1, offsetof(struct sk_buff, data));
}

/*
* Emit function prologue
*
@ -445,12 +420,6 @@ static void bpf_jit_prologue(struct bpf_jit *jit, u32 stack_depth)
EMIT6_DISP_LH(0xe3000000, 0x0024, REG_W1, REG_0,
REG_15, 152);
}
if (jit->seen & SEEN_SKB) {
emit_load_skb_data_hlen(jit);
/* stg %b1,ST_OFF_SKBP(%r0,%r15) */
EMIT6_DISP_LH(0xe3000000, 0x0024, BPF_REG_1, REG_0, REG_15,
STK_OFF_SKBP);
}
}

/*
@ -483,12 +452,12 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, int i
{
struct bpf_insn *insn = &fp->insnsi[i];
int jmp_off, last, insn_count = 1;
unsigned int func_addr, mask;
u32 dst_reg = insn->dst_reg;
u32 src_reg = insn->src_reg;
u32 *addrs = jit->addrs;
s32 imm = insn->imm;
s16 off = insn->off;
unsigned int mask;

if (dst_reg == BPF_REG_AX || src_reg == BPF_REG_AX)
jit->seen |= SEEN_REG_AX;
@ -970,13 +939,6 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, int i
EMIT2(0x0d00, REG_14, REG_W1);
/* lgr %b0,%r2: load return value into %b0 */
EMIT4(0xb9040000, BPF_REG_0, REG_2);
if ((jit->seen & SEEN_SKB) &&
bpf_helper_changes_pkt_data((void *)func)) {
/* lg %b1,ST_OFF_SKBP(%r15) */
EMIT6_DISP_LH(0xe3000000, 0x0004, BPF_REG_1, REG_0,
REG_15, STK_OFF_SKBP);
emit_load_skb_data_hlen(jit);
}
break;
}
case BPF_JMP | BPF_TAIL_CALL:
@ -1176,73 +1138,6 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, int i
jmp_off = addrs[i + off + 1] - (addrs[i + 1] - 4);
EMIT4_PCREL(0xa7040000 | mask << 8, jmp_off);
break;
/*
* BPF_LD
*/
case BPF_LD | BPF_ABS | BPF_B: /* b0 = *(u8 *) (skb->data+imm) */
case BPF_LD | BPF_IND | BPF_B: /* b0 = *(u8 *) (skb->data+imm+src) */
if ((BPF_MODE(insn->code) == BPF_ABS) && (imm >= 0))
func_addr = __pa(sk_load_byte_pos);
else
func_addr = __pa(sk_load_byte);
goto call_fn;
case BPF_LD | BPF_ABS | BPF_H: /* b0 = *(u16 *) (skb->data+imm) */
case BPF_LD | BPF_IND | BPF_H: /* b0 = *(u16 *) (skb->data+imm+src) */
if ((BPF_MODE(insn->code) == BPF_ABS) && (imm >= 0))
func_addr = __pa(sk_load_half_pos);
else
func_addr = __pa(sk_load_half);
goto call_fn;
case BPF_LD | BPF_ABS | BPF_W: /* b0 = *(u32 *) (skb->data+imm) */
case BPF_LD | BPF_IND | BPF_W: /* b0 = *(u32 *) (skb->data+imm+src) */
if ((BPF_MODE(insn->code) == BPF_ABS) && (imm >= 0))
func_addr = __pa(sk_load_word_pos);
else
func_addr = __pa(sk_load_word);
goto call_fn;
call_fn:
jit->seen |= SEEN_SKB | SEEN_RET0 | SEEN_FUNC;
REG_SET_SEEN(REG_14); /* Return address of possible func call */

/*
* Implicit input:
* BPF_REG_6 (R7) : skb pointer
* REG_SKB_DATA (R12): skb data pointer (if no BPF_REG_AX)
*
* Calculated input:
* BPF_REG_2 (R3) : offset of byte(s) to fetch in skb
* BPF_REG_5 (R6) : return address
*
* Output:
* BPF_REG_0 (R14): data read from skb
*
* Scratch registers (BPF_REG_1-5)
*/

/* Call function: llilf %w1,func_addr */
EMIT6_IMM(0xc00f0000, REG_W1, func_addr);

/* Offset: lgfi %b2,imm */
EMIT6_IMM(0xc0010000, BPF_REG_2, imm);
if (BPF_MODE(insn->code) == BPF_IND)
/* agfr %b2,%src (%src is s32 here) */
EMIT4(0xb9180000, BPF_REG_2, src_reg);

/* Reload REG_SKB_DATA if BPF_REG_AX is used */
if (jit->seen & SEEN_REG_AX)
/* lg %skb_data,data_off(%b6) */
EMIT6_DISP_LH(0xe3000000, 0x0004, REG_SKB_DATA, REG_0,
BPF_REG_6, offsetof(struct sk_buff, data));
/* basr %b5,%w1 (%b5 is call saved) */
EMIT2(0x0d00, BPF_REG_5, REG_W1);

/*
* Note: For fast access we jump directly after the
* jnz instruction from bpf_jit.S
*/
/* jnz <ret0> */
EMIT4_PCREL(0xa7740000, jit->ret0_ip - jit->prg);
break;
default: /* too complex, give up */
pr_err("Unknown opcode %02x\n", insn->code);
return -1;
@ -1,4 +1,7 @@
#
# Arch-specific network modules
#
obj-$(CONFIG_BPF_JIT) += bpf_jit_asm_$(BITS).o bpf_jit_comp_$(BITS).o
obj-$(CONFIG_BPF_JIT) += bpf_jit_comp_$(BITS).o
ifeq ($(BITS),32)
obj-$(CONFIG_BPF_JIT) += bpf_jit_asm_32.o
endif
@ -33,35 +33,6 @@
#define I5 0x1d
#define FP 0x1e
#define I7 0x1f

#define r_SKB L0
#define r_HEADLEN L4
#define r_SKB_DATA L5
#define r_TMP G1
#define r_TMP2 G3

/* assembly code in arch/sparc/net/bpf_jit_asm_64.S */
extern u32 bpf_jit_load_word[];
extern u32 bpf_jit_load_half[];
extern u32 bpf_jit_load_byte[];
extern u32 bpf_jit_load_byte_msh[];
extern u32 bpf_jit_load_word_positive_offset[];
extern u32 bpf_jit_load_half_positive_offset[];
extern u32 bpf_jit_load_byte_positive_offset[];
extern u32 bpf_jit_load_byte_msh_positive_offset[];
extern u32 bpf_jit_load_word_negative_offset[];
extern u32 bpf_jit_load_half_negative_offset[];
extern u32 bpf_jit_load_byte_negative_offset[];
extern u32 bpf_jit_load_byte_msh_negative_offset[];

#else
#define r_RESULT %o0
#define r_SKB %o0
#define r_OFF %o1
#define r_HEADLEN %l4
#define r_SKB_DATA %l5
#define r_TMP %g1
#define r_TMP2 %g3
#endif

#endif /* _BPF_JIT_H */
@ -1,162 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0 */
#include <asm/ptrace.h>

#include "bpf_jit_64.h"

#define SAVE_SZ 176
#define SCRATCH_OFF STACK_BIAS + 128
#define BE_PTR(label) be,pn %xcc, label
#define SIGN_EXTEND(reg) sra reg, 0, reg

#define SKF_MAX_NEG_OFF (-0x200000) /* SKF_LL_OFF from filter.h */

.text
.globl bpf_jit_load_word
bpf_jit_load_word:
cmp r_OFF, 0
bl bpf_slow_path_word_neg
nop
.globl bpf_jit_load_word_positive_offset
bpf_jit_load_word_positive_offset:
sub r_HEADLEN, r_OFF, r_TMP
cmp r_TMP, 3
ble bpf_slow_path_word
add r_SKB_DATA, r_OFF, r_TMP
andcc r_TMP, 3, %g0
bne load_word_unaligned
nop
retl
ld [r_TMP], r_RESULT
load_word_unaligned:
ldub [r_TMP + 0x0], r_OFF
ldub [r_TMP + 0x1], r_TMP2
sll r_OFF, 8, r_OFF
or r_OFF, r_TMP2, r_OFF
ldub [r_TMP + 0x2], r_TMP2
sll r_OFF, 8, r_OFF
or r_OFF, r_TMP2, r_OFF
ldub [r_TMP + 0x3], r_TMP2
sll r_OFF, 8, r_OFF
retl
or r_OFF, r_TMP2, r_RESULT

.globl bpf_jit_load_half
bpf_jit_load_half:
cmp r_OFF, 0
bl bpf_slow_path_half_neg
nop
.globl bpf_jit_load_half_positive_offset
bpf_jit_load_half_positive_offset:
sub r_HEADLEN, r_OFF, r_TMP
cmp r_TMP, 1
ble bpf_slow_path_half
add r_SKB_DATA, r_OFF, r_TMP
andcc r_TMP, 1, %g0
bne load_half_unaligned
nop
retl
lduh [r_TMP], r_RESULT
load_half_unaligned:
ldub [r_TMP + 0x0], r_OFF
ldub [r_TMP + 0x1], r_TMP2
sll r_OFF, 8, r_OFF
retl
or r_OFF, r_TMP2, r_RESULT

.globl bpf_jit_load_byte
bpf_jit_load_byte:
cmp r_OFF, 0
bl bpf_slow_path_byte_neg
nop
.globl bpf_jit_load_byte_positive_offset
bpf_jit_load_byte_positive_offset:
cmp r_OFF, r_HEADLEN
bge bpf_slow_path_byte
nop
retl
ldub [r_SKB_DATA + r_OFF], r_RESULT

#define bpf_slow_path_common(LEN) \
save %sp, -SAVE_SZ, %sp; \
mov %i0, %o0; \
mov %i1, %o1; \
add %fp, SCRATCH_OFF, %o2; \
call skb_copy_bits; \
mov (LEN), %o3; \
cmp %o0, 0; \
restore;

bpf_slow_path_word:
bpf_slow_path_common(4)
bl bpf_error
ld [%sp + SCRATCH_OFF], r_RESULT
retl
nop
bpf_slow_path_half:
bpf_slow_path_common(2)
bl bpf_error
lduh [%sp + SCRATCH_OFF], r_RESULT
retl
nop
bpf_slow_path_byte:
bpf_slow_path_common(1)
bl bpf_error
ldub [%sp + SCRATCH_OFF], r_RESULT
retl
nop

#define bpf_negative_common(LEN) \
save %sp, -SAVE_SZ, %sp; \
mov %i0, %o0; \
mov %i1, %o1; \
SIGN_EXTEND(%o1); \
call bpf_internal_load_pointer_neg_helper; \
mov (LEN), %o2; \
mov %o0, r_TMP; \
cmp %o0, 0; \
BE_PTR(bpf_error); \
restore;

bpf_slow_path_word_neg:
sethi %hi(SKF_MAX_NEG_OFF), r_TMP
cmp r_OFF, r_TMP
bl bpf_error
nop
.globl bpf_jit_load_word_negative_offset
bpf_jit_load_word_negative_offset:
bpf_negative_common(4)
andcc r_TMP, 3, %g0
bne load_word_unaligned
nop
retl
ld [r_TMP], r_RESULT

bpf_slow_path_half_neg:
sethi %hi(SKF_MAX_NEG_OFF), r_TMP
cmp r_OFF, r_TMP
bl bpf_error
nop
.globl bpf_jit_load_half_negative_offset
bpf_jit_load_half_negative_offset:
bpf_negative_common(2)
andcc r_TMP, 1, %g0
bne load_half_unaligned
nop
retl
lduh [r_TMP], r_RESULT

bpf_slow_path_byte_neg:
sethi %hi(SKF_MAX_NEG_OFF), r_TMP
cmp r_OFF, r_TMP
bl bpf_error
nop
.globl bpf_jit_load_byte_negative_offset
bpf_jit_load_byte_negative_offset:
bpf_negative_common(1)
retl
ldub [r_TMP], r_RESULT

bpf_error:
/* Make the JIT program itself return zero. */
ret
restore %g0, %g0, %o0
@ -48,10 +48,6 @@ static void bpf_flush_icache(void *start_, void *end_)
}
}

#define SEEN_DATAREF 1 /* might call external helpers */
#define SEEN_XREG 2 /* ebx is used */
#define SEEN_MEM 4 /* use mem[] for temporary storage */

#define S13(X) ((X) & 0x1fff)
#define S5(X) ((X) & 0x1f)
#define IMMED 0x00002000
@ -198,7 +194,6 @@ struct jit_ctx {
bool tmp_1_used;
bool tmp_2_used;
bool tmp_3_used;
bool saw_ld_abs_ind;
bool saw_frame_pointer;
bool saw_call;
bool saw_tail_call;
@ -207,9 +202,7 @@ struct jit_ctx {
#define TMP_REG_1 (MAX_BPF_JIT_REG + 0)
#define TMP_REG_2 (MAX_BPF_JIT_REG + 1)
#define SKB_HLEN_REG (MAX_BPF_JIT_REG + 2)
#define SKB_DATA_REG (MAX_BPF_JIT_REG + 3)
#define TMP_REG_3 (MAX_BPF_JIT_REG + 4)
#define TMP_REG_3 (MAX_BPF_JIT_REG + 2)

/* Map BPF registers to SPARC registers */
static const int bpf2sparc[] = {
@ -238,9 +231,6 @@ static const int bpf2sparc[] = {
[TMP_REG_1] = G1,
[TMP_REG_2] = G2,
[TMP_REG_3] = G3,

[SKB_HLEN_REG] = L4,
[SKB_DATA_REG] = L5,
};

static void emit(const u32 insn, struct jit_ctx *ctx)
@ -800,25 +790,6 @@ static int emit_compare_and_branch(const u8 code, const u8 dst, u8 src,
return 0;
}

static void load_skb_regs(struct jit_ctx *ctx, u8 r_skb)
{
const u8 r_headlen = bpf2sparc[SKB_HLEN_REG];
const u8 r_data = bpf2sparc[SKB_DATA_REG];
const u8 r_tmp = bpf2sparc[TMP_REG_1];
unsigned int off;

off = offsetof(struct sk_buff, len);
emit(LD32I | RS1(r_skb) | S13(off) | RD(r_headlen), ctx);

off = offsetof(struct sk_buff, data_len);
emit(LD32I | RS1(r_skb) | S13(off) | RD(r_tmp), ctx);

emit(SUB | RS1(r_headlen) | RS2(r_tmp) | RD(r_headlen), ctx);

off = offsetof(struct sk_buff, data);
emit(LDPTRI | RS1(r_skb) | S13(off) | RD(r_data), ctx);
}

/* Just skip the save instruction and the ctx register move. */
#define BPF_TAILCALL_PROLOGUE_SKIP 16
#define BPF_TAILCALL_CNT_SP_OFF (STACK_BIAS + 128)
@ -857,9 +828,6 @@ static void build_prologue(struct jit_ctx *ctx)
emit_reg_move(I0, O0, ctx);
/* If you add anything here, adjust BPF_TAILCALL_PROLOGUE_SKIP above. */

if (ctx->saw_ld_abs_ind)
load_skb_regs(ctx, bpf2sparc[BPF_REG_1]);
}

static void build_epilogue(struct jit_ctx *ctx)
@ -1225,16 +1193,11 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
u8 *func = ((u8 *)__bpf_call_base) + imm;

ctx->saw_call = true;
if (ctx->saw_ld_abs_ind && bpf_helper_changes_pkt_data(func))
emit_reg_move(bpf2sparc[BPF_REG_1], L7, ctx);

emit_call((u32 *)func, ctx);
emit_nop(ctx);

emit_reg_move(O0, bpf2sparc[BPF_REG_0], ctx);

if (ctx->saw_ld_abs_ind && bpf_helper_changes_pkt_data(func))
load_skb_regs(ctx, L7);
break;
}
@ -1412,43 +1375,6 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
emit_nop(ctx);
break;
}
#define CHOOSE_LOAD_FUNC(K, func) \
((int)K < 0 ? ((int)K >= SKF_LL_OFF ? func##_negative_offset : func) : func##_positive_offset)

/* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + imm)) */
case BPF_LD | BPF_ABS | BPF_W:
func = CHOOSE_LOAD_FUNC(imm, bpf_jit_load_word);
goto common_load;
case BPF_LD | BPF_ABS | BPF_H:
func = CHOOSE_LOAD_FUNC(imm, bpf_jit_load_half);
goto common_load;
case BPF_LD | BPF_ABS | BPF_B:
func = CHOOSE_LOAD_FUNC(imm, bpf_jit_load_byte);
goto common_load;
/* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + src + imm)) */
case BPF_LD | BPF_IND | BPF_W:
func = bpf_jit_load_word;
goto common_load;
case BPF_LD | BPF_IND | BPF_H:
func = bpf_jit_load_half;
goto common_load;

case BPF_LD | BPF_IND | BPF_B:
func = bpf_jit_load_byte;
common_load:
ctx->saw_ld_abs_ind = true;

emit_reg_move(bpf2sparc[BPF_REG_6], O0, ctx);
emit_loadimm(imm, O1, ctx);

if (BPF_MODE(code) == BPF_IND)
emit_alu(ADD, src, O1, ctx);

emit_call(func, ctx);
emit_alu_K(SRA, O1, 0, ctx);

emit_reg_move(O0, bpf2sparc[BPF_REG_0], ctx);
break;

default:
pr_err_once("unknown opcode %02x\n", code);
@ -1583,12 +1509,11 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
build_epilogue(&ctx);

if (bpf_jit_enable > 1)
pr_info("Pass %d: shrink = %d, seen = [%c%c%c%c%c%c%c]\n", pass,
pr_info("Pass %d: shrink = %d, seen = [%c%c%c%c%c%c]\n", pass,
image_size - (ctx.idx * 4),
ctx.tmp_1_used ? '1' : ' ',
ctx.tmp_2_used ? '2' : ' ',
ctx.tmp_3_used ? '3' : ' ',
ctx.saw_ld_abs_ind ? 'L' : ' ',
ctx.saw_frame_pointer ? 'F' : ' ',
ctx.saw_call ? 'C' : ' ',
ctx.saw_tail_call ? 'T' : ' ');
@ -138,7 +138,7 @@ config X86
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_EBPF_JIT if X86_64
select HAVE_EBPF_JIT
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select HAVE_EXIT_THREAD
select HAVE_FENTRY if X86_64 || DYNAMIC_FTRACE
@ -291,16 +291,20 @@ do { \
* lfence
* jmp spec_trap
* do_rop:
* mov %rax,(%rsp)
* mov %rax,(%rsp) for x86_64
* mov %edx,(%esp) for x86_32
* retq
*
* Without retpolines configured:
*
* jmp *%rax
* jmp *%rax for x86_64
* jmp *%edx for x86_32
*/
#ifdef CONFIG_RETPOLINE
#ifdef CONFIG_X86_64
# define RETPOLINE_RAX_BPF_JIT_SIZE 17
# define RETPOLINE_RAX_BPF_JIT() \
do { \
EMIT1_off32(0xE8, 7); /* callq do_rop */ \
/* spec_trap: */ \
EMIT2(0xF3, 0x90); /* pause */ \
@ -308,11 +312,31 @@ do { \
|
||||
EMIT2(0xEB, 0xF9); /* jmp spec_trap */ \
|
||||
/* do_rop: */ \
|
||||
EMIT4(0x48, 0x89, 0x04, 0x24); /* mov %rax,(%rsp) */ \
|
||||
EMIT1(0xC3); /* retq */
|
||||
EMIT1(0xC3); /* retq */ \
|
||||
} while (0)
|
||||
#else
|
||||
# define RETPOLINE_EDX_BPF_JIT() \
|
||||
do { \
|
||||
EMIT1_off32(0xE8, 7); /* call do_rop */ \
|
||||
/* spec_trap: */ \
|
||||
EMIT2(0xF3, 0x90); /* pause */ \
|
||||
EMIT3(0x0F, 0xAE, 0xE8); /* lfence */ \
|
||||
EMIT2(0xEB, 0xF9); /* jmp spec_trap */ \
|
||||
/* do_rop: */ \
|
||||
EMIT3(0x89, 0x14, 0x24); /* mov %edx,(%esp) */ \
|
||||
EMIT1(0xC3); /* ret */ \
|
||||
} while (0)
|
||||
#endif
|
||||
#else /* !CONFIG_RETPOLINE */
|
||||
|
||||
#ifdef CONFIG_X86_64
|
||||
# define RETPOLINE_RAX_BPF_JIT_SIZE 2
|
||||
# define RETPOLINE_RAX_BPF_JIT() \
|
||||
EMIT2(0xFF, 0xE0); /* jmp *%rax */
|
||||
#else
|
||||
# define RETPOLINE_EDX_BPF_JIT() \
|
||||
EMIT2(0xFF, 0xE2) /* jmp *%edx */
|
||||
#endif
|
||||
#endif
|
||||
|
||||
#endif /* _ASM_X86_NOSPEC_BRANCH_H_ */
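Note on the retpoline byte counts above: the displacements 7 and 0xF9 and the value of RETPOLINE_RAX_BPF_JIT_SIZE can be rechecked from the instruction lengths. The following is only a sketch; the enum and function names are hypothetical and are not part of nospec-branch.h, and the lengths are taken from the opcode bytes visible in the macros.

```c
#include <stdint.h>

/* Byte lengths of the instructions emitted by RETPOLINE_RAX_BPF_JIT(). */
enum {
	LEN_CALL_REL32  = 5,	/* E8 + rel32 */
	LEN_PAUSE       = 2,	/* F3 90 */
	LEN_LFENCE      = 3,	/* 0F AE E8 */
	LEN_JMP_REL8    = 2,	/* EB + rel8 */
	LEN_MOV_RAX_RSP = 4,	/* 48 89 04 24 */
	LEN_RET         = 1,	/* C3 */
};

/* Displacement of "callq do_rop", counted from the end of the call. */
static int retpoline_call_disp(void)
{
	return LEN_PAUSE + LEN_LFENCE + LEN_JMP_REL8;
}

/* Displacement of "jmp spec_trap", counted from the end of the jmp. */
static int8_t retpoline_jmp_disp(void)
{
	return (int8_t)-(LEN_PAUSE + LEN_LFENCE + LEN_JMP_REL8);
}

/* Total size of the x86-64 retpoline sequence in bytes. */
static int retpoline_rax_size(void)
{
	return LEN_CALL_REL32 + LEN_PAUSE + LEN_LFENCE + LEN_JMP_REL8 +
	       LEN_MOV_RAX_RSP + LEN_RET;
}
```

The x86_32 variant uses a 3-byte mov (89 14 24) in place of the 4-byte REX-prefixed one, so its sequence would be one byte shorter under the same accounting.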
|
||||
|
@ -1,6 +1,9 @@
|
||||
#
|
||||
# Arch-specific network modules
|
||||
#
|
||||
OBJECT_FILES_NON_STANDARD_bpf_jit.o += y
|
||||
|
||||
obj-$(CONFIG_BPF_JIT) += bpf_jit.o bpf_jit_comp.o
|
||||
ifeq ($(CONFIG_X86_32),y)
|
||||
obj-$(CONFIG_BPF_JIT) += bpf_jit_comp32.o
|
||||
else
|
||||
obj-$(CONFIG_BPF_JIT) += bpf_jit_comp.o
|
||||
endif
|
||||
|
@ -1,154 +0,0 @@
|
||||
/* bpf_jit.S : BPF JIT helper functions
|
||||
*
|
||||
* Copyright (C) 2011 Eric Dumazet (eric.dumazet@gmail.com)
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or
|
||||
* modify it under the terms of the GNU General Public License
|
||||
* as published by the Free Software Foundation; version 2
|
||||
* of the License.
|
||||
*/
|
||||
#include <linux/linkage.h>
|
||||
#include <asm/frame.h>
|
||||
|
||||
/*
|
||||
* Calling convention :
|
||||
* rbx : skb pointer (callee saved)
|
||||
* esi : offset of byte(s) to fetch in skb (can be scratched)
|
||||
* r10 : copy of skb->data
|
||||
* r9d : hlen = skb->len - skb->data_len
|
||||
*/
|
||||
#define SKBDATA %r10
|
||||
#define SKF_MAX_NEG_OFF $(-0x200000) /* SKF_LL_OFF from filter.h */
|
||||
|
||||
#define FUNC(name) \
|
||||
.globl name; \
|
||||
.type name, @function; \
|
||||
name:
|
||||
|
||||
FUNC(sk_load_word)
|
||||
test %esi,%esi
|
||||
js bpf_slow_path_word_neg
|
||||
|
||||
FUNC(sk_load_word_positive_offset)
|
||||
mov %r9d,%eax # hlen
|
||||
sub %esi,%eax # hlen - offset
|
||||
cmp $3,%eax
|
||||
jle bpf_slow_path_word
|
||||
mov (SKBDATA,%rsi),%eax
|
||||
bswap %eax /* ntohl() */
|
||||
ret
|
||||
|
||||
FUNC(sk_load_half)
|
||||
test %esi,%esi
|
||||
js bpf_slow_path_half_neg
|
||||
|
||||
FUNC(sk_load_half_positive_offset)
|
||||
mov %r9d,%eax
|
||||
sub %esi,%eax # hlen - offset
|
||||
cmp $1,%eax
|
||||
jle bpf_slow_path_half
|
||||
movzwl (SKBDATA,%rsi),%eax
|
||||
rol $8,%ax # ntohs()
|
||||
ret
|
||||
|
||||
FUNC(sk_load_byte)
|
||||
test %esi,%esi
|
||||
js bpf_slow_path_byte_neg
|
||||
|
||||
FUNC(sk_load_byte_positive_offset)
|
||||
cmp %esi,%r9d /* if (offset >= hlen) goto bpf_slow_path_byte */
|
||||
jle bpf_slow_path_byte
|
||||
movzbl (SKBDATA,%rsi),%eax
|
||||
ret
|
||||
|
||||
/* rsi contains offset and can be scratched */
|
||||
#define bpf_slow_path_common(LEN) \
|
||||
lea 32(%rbp), %rdx;\
|
||||
FRAME_BEGIN; \
|
||||
mov %rbx, %rdi; /* arg1 == skb */ \
|
||||
push %r9; \
|
||||
push SKBDATA; \
|
||||
/* rsi already has offset */ \
|
||||
mov $LEN,%ecx; /* len */ \
|
||||
call skb_copy_bits; \
|
||||
test %eax,%eax; \
|
||||
pop SKBDATA; \
|
||||
pop %r9; \
|
||||
FRAME_END
|
||||
|
||||
|
||||
bpf_slow_path_word:
|
||||
bpf_slow_path_common(4)
|
||||
js bpf_error
|
||||
mov 32(%rbp),%eax
|
||||
bswap %eax
|
||||
ret
|
||||
|
||||
bpf_slow_path_half:
|
||||
bpf_slow_path_common(2)
|
||||
js bpf_error
|
||||
mov 32(%rbp),%ax
|
||||
rol $8,%ax
|
||||
movzwl %ax,%eax
|
||||
ret
|
||||
|
||||
bpf_slow_path_byte:
|
||||
bpf_slow_path_common(1)
|
||||
js bpf_error
|
||||
movzbl 32(%rbp),%eax
|
||||
ret
|
||||
|
||||
#define sk_negative_common(SIZE) \
|
||||
FRAME_BEGIN; \
|
||||
mov %rbx, %rdi; /* arg1 == skb */ \
|
||||
push %r9; \
|
||||
push SKBDATA; \
|
||||
/* rsi already has offset */ \
|
||||
mov $SIZE,%edx; /* size */ \
|
||||
call bpf_internal_load_pointer_neg_helper; \
|
||||
test %rax,%rax; \
|
||||
pop SKBDATA; \
|
||||
pop %r9; \
|
||||
FRAME_END; \
|
||||
jz bpf_error
|
||||
|
||||
bpf_slow_path_word_neg:
|
||||
cmp SKF_MAX_NEG_OFF, %esi /* test range */
|
||||
jl bpf_error /* offset lower -> error */
|
||||
|
||||
FUNC(sk_load_word_negative_offset)
|
||||
sk_negative_common(4)
|
||||
mov (%rax), %eax
|
||||
bswap %eax
|
||||
ret
|
||||
|
||||
bpf_slow_path_half_neg:
|
||||
cmp SKF_MAX_NEG_OFF, %esi
|
||||
jl bpf_error
|
||||
|
||||
FUNC(sk_load_half_negative_offset)
|
||||
sk_negative_common(2)
|
||||
mov (%rax),%ax
|
||||
rol $8,%ax
|
||||
movzwl %ax,%eax
|
||||
ret
|
||||
|
||||
bpf_slow_path_byte_neg:
|
||||
cmp SKF_MAX_NEG_OFF, %esi
|
||||
jl bpf_error
|
||||
|
||||
FUNC(sk_load_byte_negative_offset)
|
||||
sk_negative_common(1)
|
||||
movzbl (%rax), %eax
|
||||
ret
|
||||
|
||||
bpf_error:
|
||||
# force a return 0 from jit handler
|
||||
xor %eax,%eax
|
||||
mov (%rbp),%rbx
|
||||
mov 8(%rbp),%r13
|
||||
mov 16(%rbp),%r14
|
||||
mov 24(%rbp),%r15
|
||||
add $40, %rbp
|
||||
leaveq
|
||||
ret
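Note on the removed helpers: the fast path of sk_load_word_positive_offset can be modelled in C roughly as below. This is a simplified sketch, not kernel code. The function name and signature are hypothetical, 'data' stands in for skb->data (r10), 'hlen' for skb->len - skb->data_len (r9d), and the slow path through skb_copy_bits() is reduced to an error return.

```c
#include <stdint.h>

/*
 * Hypothetical model of the removed fast path.
 * Returns 0 and stores the word in host byte order when at least four
 * bytes are available in the linear area; returns -1 where the assembly
 * would branch to bpf_slow_path_word.
 */
static int load_word_fast(const uint8_t *data, uint32_t hlen,
			  uint32_t offset, uint32_t *out)
{
	/* Mirrors "sub %esi,%eax; cmp $3,%eax; jle bpf_slow_path_word". */
	int64_t remaining = (int64_t)hlen - (int64_t)offset;

	if (remaining <= 3)
		return -1;

	/* Equivalent to a 32-bit load followed by bswap (ntohl). */
	*out = ((uint32_t)data[offset] << 24) |
	       ((uint32_t)data[offset + 1] << 16) |
	       ((uint32_t)data[offset + 2] << 8) |
	       (uint32_t)data[offset + 3];
	return 0;
}
```

The half-word and byte helpers follow the same pattern with thresholds of 1 and 0 remaining bytes respectively.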
|
@ -1,4 +1,5 @@
|
||||
/* bpf_jit_comp.c : BPF JIT compiler
|
||||
/*
|
||||
* bpf_jit_comp.c: BPF JIT compiler
|
||||
*
|
||||
* Copyright (C) 2011-2013 Eric Dumazet (eric.dumazet@gmail.com)
|
||||
* Internal BPF Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
|
||||
@ -16,15 +17,6 @@
|
||||
#include <asm/set_memory.h>
|
||||
#include <asm/nospec-branch.h>
|
||||
|
||||
/*
|
||||
* assembly code in arch/x86/net/bpf_jit.S
|
||||
*/
|
||||
extern u8 sk_load_word[], sk_load_half[], sk_load_byte[];
|
||||
extern u8 sk_load_word_positive_offset[], sk_load_half_positive_offset[];
|
||||
extern u8 sk_load_byte_positive_offset[];
|
||||
extern u8 sk_load_word_negative_offset[], sk_load_half_negative_offset[];
|
||||
extern u8 sk_load_byte_negative_offset[];
|
||||
|
||||
static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len)
|
||||
{
|
||||
if (len == 1)
|
||||
@ -45,14 +37,15 @@ static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len)
|
||||
#define EMIT2(b1, b2) EMIT((b1) + ((b2) << 8), 2)
|
||||
#define EMIT3(b1, b2, b3) EMIT((b1) + ((b2) << 8) + ((b3) << 16), 3)
|
||||
#define EMIT4(b1, b2, b3, b4) EMIT((b1) + ((b2) << 8) + ((b3) << 16) + ((b4) << 24), 4)
|
||||
|
||||
#define EMIT1_off32(b1, off) \
|
||||
do {EMIT1(b1); EMIT(off, 4); } while (0)
|
||||
do { EMIT1(b1); EMIT(off, 4); } while (0)
|
||||
#define EMIT2_off32(b1, b2, off) \
|
||||
do {EMIT2(b1, b2); EMIT(off, 4); } while (0)
|
||||
do { EMIT2(b1, b2); EMIT(off, 4); } while (0)
|
||||
#define EMIT3_off32(b1, b2, b3, off) \
|
||||
do {EMIT3(b1, b2, b3); EMIT(off, 4); } while (0)
|
||||
do { EMIT3(b1, b2, b3); EMIT(off, 4); } while (0)
|
||||
#define EMIT4_off32(b1, b2, b3, b4, off) \
|
||||
do {EMIT4(b1, b2, b3, b4); EMIT(off, 4); } while (0)
|
||||
do { EMIT4(b1, b2, b3, b4); EMIT(off, 4); } while (0)
|
||||
|
||||
static bool is_imm8(int value)
|
||||
{
|
||||
@ -70,9 +63,10 @@ static bool is_uimm32(u64 value)
|
||||
}
|
||||
|
||||
/* mov dst, src */
|
||||
#define EMIT_mov(DST, SRC) \
|
||||
do {if (DST != SRC) \
|
||||
EMIT3(add_2mod(0x48, DST, SRC), 0x89, add_2reg(0xC0, DST, SRC)); \
|
||||
#define EMIT_mov(DST, SRC) \
|
||||
do { \
|
||||
if (DST != SRC) \
|
||||
EMIT3(add_2mod(0x48, DST, SRC), 0x89, add_2reg(0xC0, DST, SRC)); \
|
||||
} while (0)
|
||||
|
||||
static int bpf_size_to_x86_bytes(int bpf_size)
|
||||
@ -89,7 +83,8 @@ static int bpf_size_to_x86_bytes(int bpf_size)
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* list of x86 cond jumps opcodes (. + s8)
|
||||
/*
|
||||
* List of x86 cond jumps opcodes (. + s8)
|
||||
* Add 0x10 (and an extra 0x0f) to generate far jumps (. + s32)
|
||||
*/
|
||||
#define X86_JB 0x72
|
||||
@ -103,38 +98,37 @@ static int bpf_size_to_x86_bytes(int bpf_size)
|
||||
#define X86_JLE 0x7E
|
||||
#define X86_JG 0x7F
|
||||
|
||||
#define CHOOSE_LOAD_FUNC(K, func) \
|
||||
((int)K < 0 ? ((int)K >= SKF_LL_OFF ? func##_negative_offset : func) : func##_positive_offset)
|
||||
|
||||
/* pick a register outside of BPF range for JIT internal work */
|
||||
/* Pick a register outside of BPF range for JIT internal work */
|
||||
#define AUX_REG (MAX_BPF_JIT_REG + 1)
|
||||
|
||||
/* The following table maps BPF registers to x64 registers.
|
||||
/*
|
||||
* The following table maps BPF registers to x86-64 registers.
|
||||
*
|
||||
* x64 register r12 is unused, since if used as base address
|
||||
* x86-64 register R12 is unused, since if used as base address
|
||||
* register in load/store instructions, it always needs an
|
||||
* extra byte of encoding and is callee saved.
|
||||
*
|
||||
* r9 caches skb->len - skb->data_len
|
||||
* r10 caches skb->data, and used for blinding (if enabled)
|
||||
* Also x86-64 register R9 is unused. x86-64 register R10 is
|
||||
* used for blinding (if enabled).
|
||||
*/
|
||||
static const int reg2hex[] = {
|
||||
[BPF_REG_0] = 0, /* rax */
|
||||
[BPF_REG_1] = 7, /* rdi */
|
||||
[BPF_REG_2] = 6, /* rsi */
|
||||
[BPF_REG_3] = 2, /* rdx */
|
||||
[BPF_REG_4] = 1, /* rcx */
|
||||
[BPF_REG_5] = 0, /* r8 */
|
||||
[BPF_REG_6] = 3, /* rbx callee saved */
|
||||
[BPF_REG_7] = 5, /* r13 callee saved */
|
||||
[BPF_REG_8] = 6, /* r14 callee saved */
|
||||
[BPF_REG_9] = 7, /* r15 callee saved */
|
||||
[BPF_REG_FP] = 5, /* rbp readonly */
|
||||
[BPF_REG_AX] = 2, /* r10 temp register */
|
||||
[AUX_REG] = 3, /* r11 temp register */
|
||||
[BPF_REG_0] = 0, /* RAX */
|
||||
[BPF_REG_1] = 7, /* RDI */
|
||||
[BPF_REG_2] = 6, /* RSI */
|
||||
[BPF_REG_3] = 2, /* RDX */
|
||||
[BPF_REG_4] = 1, /* RCX */
|
||||
[BPF_REG_5] = 0, /* R8 */
|
||||
[BPF_REG_6] = 3, /* RBX callee saved */
|
||||
[BPF_REG_7] = 5, /* R13 callee saved */
|
||||
[BPF_REG_8] = 6, /* R14 callee saved */
|
||||
[BPF_REG_9] = 7, /* R15 callee saved */
|
||||
[BPF_REG_FP] = 5, /* RBP readonly */
|
||||
[BPF_REG_AX] = 2, /* R10 temp register */
|
||||
[AUX_REG] = 3, /* R11 temp register */
|
||||
};
|
||||
|
||||
/* is_ereg() == true if BPF register 'reg' maps to x64 r8..r15
|
||||
/*
|
||||
* is_ereg() == true if BPF register 'reg' maps to x86-64 r8..r15
|
||||
* which need extra byte of encoding.
|
||||
* rax,rcx,...,rbp have simpler encoding
|
||||
*/
|
||||
@ -153,7 +147,7 @@ static bool is_axreg(u32 reg)
|
||||
return reg == BPF_REG_0;
|
||||
}
|
||||
|
||||
/* add modifiers if 'reg' maps to x64 registers r8..r15 */
|
||||
/* Add modifiers if 'reg' maps to x86-64 registers R8..R15 */
|
||||
static u8 add_1mod(u8 byte, u32 reg)
|
||||
{
|
||||
if (is_ereg(reg))
|
||||
@ -170,13 +164,13 @@ static u8 add_2mod(u8 byte, u32 r1, u32 r2)
|
||||
return byte;
|
||||
}
|
||||
|
||||
/* encode 'dst_reg' register into x64 opcode 'byte' */
|
||||
/* Encode 'dst_reg' register into x86-64 opcode 'byte' */
|
||||
static u8 add_1reg(u8 byte, u32 dst_reg)
|
||||
{
|
||||
return byte + reg2hex[dst_reg];
|
||||
}
|
||||
|
||||
/* encode 'dst_reg' and 'src_reg' registers into x64 opcode 'byte' */
|
||||
/* Encode 'dst_reg' and 'src_reg' registers into x86-64 opcode 'byte' */
|
||||
static u8 add_2reg(u8 byte, u32 dst_reg, u32 src_reg)
|
||||
{
|
||||
return byte + reg2hex[dst_reg] + (reg2hex[src_reg] << 3);
|
||||
@ -184,27 +178,24 @@ static u8 add_2reg(u8 byte, u32 dst_reg, u32 src_reg)
|
||||
|
||||
static void jit_fill_hole(void *area, unsigned int size)
|
||||
{
|
||||
/* fill whole space with int3 instructions */
|
||||
/* Fill whole space with INT3 instructions */
|
||||
memset(area, 0xcc, size);
|
||||
}
|
||||
|
||||
struct jit_context {
|
||||
int cleanup_addr; /* epilogue code offset */
|
||||
bool seen_ld_abs;
|
||||
bool seen_ax_reg;
|
||||
int cleanup_addr; /* Epilogue code offset */
|
||||
};
|
||||
|
||||
/* maximum number of bytes emitted while JITing one eBPF insn */
|
||||
/* Maximum number of bytes emitted while JITing one eBPF insn */
|
||||
#define BPF_MAX_INSN_SIZE 128
|
||||
#define BPF_INSN_SAFETY 64
|
||||
|
||||
#define AUX_STACK_SPACE \
|
||||
(32 /* space for rbx, r13, r14, r15 */ + \
|
||||
8 /* space for skb_copy_bits() buffer */)
|
||||
#define AUX_STACK_SPACE 40 /* Space for RBX, R13, R14, R15, tailcnt */
|
||||
|
||||
#define PROLOGUE_SIZE 37
|
||||
#define PROLOGUE_SIZE 37
|
||||
|
||||
/* emit x64 prologue code for BPF program and check it's size.
|
||||
/*
|
||||
* Emit x86-64 prologue code for BPF program and check its size.
|
||||
* bpf_tail_call helper will skip it while jumping into another program
|
||||
*/
|
||||
static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf)
|
||||
@ -212,8 +203,11 @@ static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf)
|
||||
u8 *prog = *pprog;
|
||||
int cnt = 0;
|
||||
|
||||
EMIT1(0x55); /* push rbp */
|
||||
EMIT3(0x48, 0x89, 0xE5); /* mov rbp,rsp */
|
||||
/* push rbp */
|
||||
EMIT1(0x55);
|
||||
|
||||
/* mov rbp,rsp */
|
||||
EMIT3(0x48, 0x89, 0xE5);
|
||||
|
||||
/* sub rsp, rounded_stack_depth + AUX_STACK_SPACE */
|
||||
EMIT3_off32(0x48, 0x81, 0xEC,
|
||||
@ -222,19 +216,8 @@ static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf)
|
||||
/* sub rbp, AUX_STACK_SPACE */
|
||||
EMIT4(0x48, 0x83, 0xED, AUX_STACK_SPACE);
|
||||
|
||||
/* all classic BPF filters use R6(rbx) save it */
|
||||
|
||||
/* mov qword ptr [rbp+0],rbx */
|
||||
EMIT4(0x48, 0x89, 0x5D, 0);
|
||||
|
||||
/* bpf_convert_filter() maps classic BPF register X to R7 and uses R8
|
||||
* as temporary, so all tcpdump filters need to spill/fill R7(r13) and
|
||||
* R8(r14). R9(r15) spill could be made conditional, but there is only
|
||||
* one 'bpf_error' return path out of helper functions inside bpf_jit.S
|
||||
* The overhead of extra spill is negligible for any filter other
|
||||
* than synthetic ones. Therefore not worth adding complexity.
|
||||
*/
|
||||
|
||||
/* mov qword ptr [rbp+8],r13 */
|
||||
EMIT4(0x4C, 0x89, 0x6D, 8);
|
||||
/* mov qword ptr [rbp+16],r14 */
|
||||
@ -243,9 +226,10 @@ static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf)
|
||||
EMIT4(0x4C, 0x89, 0x7D, 24);
|
||||
|
||||
if (!ebpf_from_cbpf) {
|
||||
/* Clear the tail call counter (tail_call_cnt): for eBPF tail
|
||||
/*
|
||||
* Clear the tail call counter (tail_call_cnt): for eBPF tail
|
||||
* calls we need to reset the counter to 0. It's done in two
|
||||
* instructions, resetting rax register to 0, and moving it
|
||||
* instructions, resetting RAX register to 0, and moving it
|
||||
* to the counter location.
|
||||
*/
|
||||
|
||||
@ -260,7 +244,9 @@ static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf)
|
||||
*pprog = prog;
|
||||
}
|
||||
|
||||
/* generate the following code:
|
||||
/*
|
||||
* Generate the following code:
|
||||
*
|
||||
* ... bpf_tail_call(void *ctx, struct bpf_array *array, u64 index) ...
|
||||
* if (index >= array->map.max_entries)
|
||||
* goto out;
|
||||
@ -278,23 +264,26 @@ static void emit_bpf_tail_call(u8 **pprog)
|
||||
int label1, label2, label3;
|
||||
int cnt = 0;
|
||||
|
||||
/* rdi - pointer to ctx
|
||||
/*
|
||||
* rdi - pointer to ctx
|
||||
* rsi - pointer to bpf_array
|
||||
* rdx - index in bpf_array
|
||||
*/
|
||||
|
||||
/* if (index >= array->map.max_entries)
|
||||
* goto out;
|
||||
/*
|
||||
* if (index >= array->map.max_entries)
|
||||
* goto out;
|
||||
*/
|
||||
EMIT2(0x89, 0xD2); /* mov edx, edx */
|
||||
EMIT3(0x39, 0x56, /* cmp dword ptr [rsi + 16], edx */
|
||||
offsetof(struct bpf_array, map.max_entries));
|
||||
#define OFFSET1 (41 + RETPOLINE_RAX_BPF_JIT_SIZE) /* number of bytes to jump */
|
||||
#define OFFSET1 (41 + RETPOLINE_RAX_BPF_JIT_SIZE) /* Number of bytes to jump */
|
||||
EMIT2(X86_JBE, OFFSET1); /* jbe out */
|
||||
label1 = cnt;
|
||||
|
||||
/* if (tail_call_cnt > MAX_TAIL_CALL_CNT)
|
||||
* goto out;
|
||||
/*
|
||||
* if (tail_call_cnt > MAX_TAIL_CALL_CNT)
|
||||
* goto out;
|
||||
*/
|
||||
EMIT2_off32(0x8B, 0x85, 36); /* mov eax, dword ptr [rbp + 36] */
|
||||
EMIT3(0x83, 0xF8, MAX_TAIL_CALL_CNT); /* cmp eax, MAX_TAIL_CALL_CNT */
|
||||
@ -308,8 +297,9 @@ static void emit_bpf_tail_call(u8 **pprog)
|
||||
EMIT4_off32(0x48, 0x8B, 0x84, 0xD6, /* mov rax, [rsi + rdx * 8 + offsetof(...)] */
|
||||
offsetof(struct bpf_array, ptrs));
|
||||
|
||||
/* if (prog == NULL)
|
||||
* goto out;
|
||||
/*
|
||||
* if (prog == NULL)
|
||||
* goto out;
|
||||
*/
|
||||
EMIT3(0x48, 0x85, 0xC0); /* test rax,rax */
|
||||
#define OFFSET3 (8 + RETPOLINE_RAX_BPF_JIT_SIZE)
|
||||
@ -321,7 +311,8 @@ static void emit_bpf_tail_call(u8 **pprog)
|
||||
offsetof(struct bpf_prog, bpf_func));
|
||||
EMIT4(0x48, 0x83, 0xC0, PROLOGUE_SIZE); /* add rax, prologue_size */
|
||||
|
||||
/* now we're ready to jump into next BPF program
|
||||
/*
|
||||
* Now we're ready to jump into next BPF program
|
||||
* rdi == ctx (1st arg)
|
||||
* rax == prog->bpf_func + prologue_size
|
||||
*/
|
||||
@ -334,26 +325,6 @@ static void emit_bpf_tail_call(u8 **pprog)
|
||||
*pprog = prog;
|
||||
}
|
||||
|
||||
|
||||
static void emit_load_skb_data_hlen(u8 **pprog)
|
||||
{
|
||||
u8 *prog = *pprog;
|
||||
int cnt = 0;
|
||||
|
||||
/* r9d = skb->len - skb->data_len (headlen)
|
||||
* r10 = skb->data
|
||||
*/
|
||||
/* mov %r9d, off32(%rdi) */
|
||||
EMIT3_off32(0x44, 0x8b, 0x8f, offsetof(struct sk_buff, len));
|
||||
|
||||
/* sub %r9d, off32(%rdi) */
|
||||
EMIT3_off32(0x44, 0x2b, 0x8f, offsetof(struct sk_buff, data_len));
|
||||
|
||||
/* mov %r10, off32(%rdi) */
|
||||
EMIT3_off32(0x4c, 0x8b, 0x97, offsetof(struct sk_buff, data));
|
||||
*pprog = prog;
|
||||
}
|
||||
|
||||
static void emit_mov_imm32(u8 **pprog, bool sign_propagate,
|
||||
u32 dst_reg, const u32 imm32)
|
||||
{
|
||||
@ -361,7 +332,8 @@ static void emit_mov_imm32(u8 **pprog, bool sign_propagate,
|
||||
u8 b1, b2, b3;
|
||||
int cnt = 0;
|
||||
|
||||
/* optimization: if imm32 is positive, use 'mov %eax, imm32'
|
||||
/*
|
||||
* Optimization: if imm32 is positive, use 'mov %eax, imm32'
|
||||
* (which zero-extends imm32) to save 2 bytes.
|
||||
*/
|
||||
if (sign_propagate && (s32)imm32 < 0) {
|
||||
@ -373,7 +345,8 @@ static void emit_mov_imm32(u8 **pprog, bool sign_propagate,
|
||||
goto done;
|
||||
}
|
||||
|
||||
/* optimization: if imm32 is zero, use 'xor %eax, %eax'
|
||||
/*
|
||||
* Optimization: if imm32 is zero, use 'xor %eax, %eax'
|
||||
* to save 3 bytes.
|
||||
*/
|
||||
if (imm32 == 0) {
|
||||
@ -400,7 +373,8 @@ static void emit_mov_imm64(u8 **pprog, u32 dst_reg,
|
||||
int cnt = 0;
|
||||
|
||||
if (is_uimm32(((u64)imm32_hi << 32) | (u32)imm32_lo)) {
|
||||
/* For emitting plain u32, where sign bit must not be
|
||||
/*
|
||||
* For emitting plain u32, where sign bit must not be
|
||||
* propagated LLVM tends to load imm64 over mov32
|
||||
* directly, so save couple of bytes by just doing
|
||||
* 'mov %eax, imm32' instead.
|
||||
@ -439,8 +413,6 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
|
||||
{
|
||||
struct bpf_insn *insn = bpf_prog->insnsi;
|
||||
int insn_cnt = bpf_prog->len;
|
||||
bool seen_ld_abs = ctx->seen_ld_abs | (oldproglen == 0);
|
||||
bool seen_ax_reg = ctx->seen_ax_reg | (oldproglen == 0);
|
||||
bool seen_exit = false;
|
||||
u8 temp[BPF_MAX_INSN_SIZE + BPF_INSN_SAFETY];
|
||||
int i, cnt = 0;
|
||||
@ -450,9 +422,6 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
|
||||
emit_prologue(&prog, bpf_prog->aux->stack_depth,
|
||||
bpf_prog_was_classic(bpf_prog));
|
||||
|
||||
if (seen_ld_abs)
|
||||
emit_load_skb_data_hlen(&prog);
|
||||
|
||||
for (i = 0; i < insn_cnt; i++, insn++) {
|
||||
const s32 imm32 = insn->imm;
|
||||
u32 dst_reg = insn->dst_reg;
|
||||
@ -460,13 +429,9 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
|
||||
u8 b2 = 0, b3 = 0;
|
||||
s64 jmp_offset;
|
||||
u8 jmp_cond;
|
||||
bool reload_skb_data;
|
||||
int ilen;
|
||||
u8 *func;
|
||||
|
||||
if (dst_reg == BPF_REG_AX || src_reg == BPF_REG_AX)
|
||||
ctx->seen_ax_reg = seen_ax_reg = true;
|
||||
|
||||
switch (insn->code) {
|
||||
/* ALU */
|
||||
case BPF_ALU | BPF_ADD | BPF_X:
|
||||
@ -525,7 +490,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
|
||||
else if (is_ereg(dst_reg))
|
||||
EMIT1(add_1mod(0x40, dst_reg));
|
||||
|
||||
/* b3 holds 'normal' opcode, b2 short form only valid
|
||||
/*
|
||||
* b3 holds 'normal' opcode, b2 short form only valid
|
||||
* in case dst is eax/rax.
|
||||
*/
|
||||
switch (BPF_OP(insn->code)) {
|
||||
@ -593,7 +559,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
|
||||
/* mov rax, dst_reg */
|
||||
EMIT_mov(BPF_REG_0, dst_reg);
|
||||
|
||||
/* xor edx, edx
|
||||
/*
|
||||
* xor edx, edx
|
||||
* equivalent to 'xor rdx, rdx', but one byte less
|
||||
*/
|
||||
EMIT2(0x31, 0xd2);
|
||||
@ -655,7 +622,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
|
||||
}
|
||||
break;
|
||||
}
|
||||
/* shifts */
|
||||
/* Shifts */
|
||||
case BPF_ALU | BPF_LSH | BPF_K:
|
||||
case BPF_ALU | BPF_RSH | BPF_K:
|
||||
case BPF_ALU | BPF_ARSH | BPF_K:
|
||||
@ -686,7 +653,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
|
||||
case BPF_ALU64 | BPF_RSH | BPF_X:
|
||||
case BPF_ALU64 | BPF_ARSH | BPF_X:
|
||||
|
||||
/* check for bad case when dst_reg == rcx */
|
||||
/* Check for bad case when dst_reg == rcx */
|
||||
if (dst_reg == BPF_REG_4) {
|
||||
/* mov r11, dst_reg */
|
||||
EMIT_mov(AUX_REG, dst_reg);
|
||||
@ -724,13 +691,13 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
|
||||
case BPF_ALU | BPF_END | BPF_FROM_BE:
|
||||
switch (imm32) {
|
||||
case 16:
|
||||
/* emit 'ror %ax, 8' to swap lower 2 bytes */
|
||||
/* Emit 'ror %ax, 8' to swap lower 2 bytes */
|
||||
EMIT1(0x66);
|
||||
if (is_ereg(dst_reg))
|
||||
EMIT1(0x41);
|
||||
EMIT3(0xC1, add_1reg(0xC8, dst_reg), 8);
|
||||
|
||||
/* emit 'movzwl eax, ax' */
|
||||
/* Emit 'movzwl eax, ax' */
|
||||
if (is_ereg(dst_reg))
|
||||
EMIT3(0x45, 0x0F, 0xB7);
|
||||
else
|
||||
@ -738,7 +705,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
|
||||
EMIT1(add_2reg(0xC0, dst_reg, dst_reg));
|
||||
break;
|
||||
case 32:
|
||||
/* emit 'bswap eax' to swap lower 4 bytes */
|
||||
/* Emit 'bswap eax' to swap lower 4 bytes */
|
||||
if (is_ereg(dst_reg))
|
||||
EMIT2(0x41, 0x0F);
|
||||
else
|
||||
@ -746,7 +713,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
|
||||
EMIT1(add_1reg(0xC8, dst_reg));
|
||||
break;
|
||||
case 64:
|
||||
/* emit 'bswap rax' to swap 8 bytes */
|
||||
/* Emit 'bswap rax' to swap 8 bytes */
|
||||
EMIT3(add_1mod(0x48, dst_reg), 0x0F,
|
||||
add_1reg(0xC8, dst_reg));
|
||||
break;
|
||||
@ -756,7 +723,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
|
||||
case BPF_ALU | BPF_END | BPF_FROM_LE:
|
||||
switch (imm32) {
|
||||
case 16:
|
||||
/* emit 'movzwl eax, ax' to zero extend 16-bit
|
||||
/*
|
||||
* Emit 'movzwl eax, ax' to zero extend 16-bit
|
||||
* into 64 bit
|
||||
*/
|
||||
if (is_ereg(dst_reg))
|
||||
@ -766,7 +734,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
|
||||
EMIT1(add_2reg(0xC0, dst_reg, dst_reg));
|
||||
break;
|
||||
case 32:
|
||||
/* emit 'mov eax, eax' to clear upper 32-bits */
|
||||
/* Emit 'mov eax, eax' to clear upper 32-bits */
|
||||
if (is_ereg(dst_reg))
|
||||
EMIT1(0x45);
|
||||
EMIT2(0x89, add_2reg(0xC0, dst_reg, dst_reg));
|
||||
@ -809,9 +777,9 @@ st: if (is_imm8(insn->off))
|
||||
|
||||
/* STX: *(u8*)(dst_reg + off) = src_reg */
|
||||
case BPF_STX | BPF_MEM | BPF_B:
|
||||
/* emit 'mov byte ptr [rax + off], al' */
|
||||
/* Emit 'mov byte ptr [rax + off], al' */
|
||||
if (is_ereg(dst_reg) || is_ereg(src_reg) ||
|
||||
/* have to add extra byte for x86 SIL, DIL regs */
|
||||
/* We have to add extra byte for x86 SIL, DIL regs */
|
||||
src_reg == BPF_REG_1 || src_reg == BPF_REG_2)
|
||||
EMIT2(add_2mod(0x40, dst_reg, src_reg), 0x88);
|
||||
else
|
||||
@ -840,25 +808,26 @@ stx: if (is_imm8(insn->off))
|
||||
|
||||
/* LDX: dst_reg = *(u8*)(src_reg + off) */
|
||||
case BPF_LDX | BPF_MEM | BPF_B:
|
||||
/* emit 'movzx rax, byte ptr [rax + off]' */
|
||||
/* Emit 'movzx rax, byte ptr [rax + off]' */
|
||||
EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x0F, 0xB6);
|
||||
goto ldx;
|
||||
case BPF_LDX | BPF_MEM | BPF_H:
|
||||
/* emit 'movzx rax, word ptr [rax + off]' */
|
||||
/* Emit 'movzx rax, word ptr [rax + off]' */
|
||||
EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x0F, 0xB7);
|
||||
goto ldx;
|
||||
case BPF_LDX | BPF_MEM | BPF_W:
|
||||
/* emit 'mov eax, dword ptr [rax+0x14]' */
|
||||
/* Emit 'mov eax, dword ptr [rax+0x14]' */
|
||||
if (is_ereg(dst_reg) || is_ereg(src_reg))
|
||||
EMIT2(add_2mod(0x40, src_reg, dst_reg), 0x8B);
|
||||
else
|
||||
EMIT1(0x8B);
|
||||
goto ldx;
|
||||
case BPF_LDX | BPF_MEM | BPF_DW:
|
||||
/* emit 'mov rax, qword ptr [rax+0x14]' */
|
||||
/* Emit 'mov rax, qword ptr [rax+0x14]' */
|
||||
EMIT2(add_2mod(0x48, src_reg, dst_reg), 0x8B);
|
||||
ldx: /* if insn->off == 0 we can save one extra byte, but
|
||||
* special case of x86 r13 which always needs an offset
|
||||
ldx: /*
|
||||
* If insn->off == 0 we can save one extra byte, but
|
||||
* special case of x86 R13 which always needs an offset
|
||||
* is not worth the hassle
|
||||
*/
|
||||
if (is_imm8(insn->off))
|
||||
@ -870,7 +839,7 @@ stx: if (is_imm8(insn->off))
|
||||
|
||||
/* STX XADD: lock *(u32*)(dst_reg + off) += src_reg */
|
||||
case BPF_STX | BPF_XADD | BPF_W:
|
||||
/* emit 'lock add dword ptr [rax + off], eax' */
|
||||
/* Emit 'lock add dword ptr [rax + off], eax' */
|
||||
if (is_ereg(dst_reg) || is_ereg(src_reg))
|
||||
EMIT3(0xF0, add_2mod(0x40, dst_reg, src_reg), 0x01);
|
||||
else
|
||||
@ -889,35 +858,12 @@ xadd: if (is_imm8(insn->off))
|
||||
case BPF_JMP | BPF_CALL:
|
||||
func = (u8 *) __bpf_call_base + imm32;
|
||||
jmp_offset = func - (image + addrs[i]);
|
||||
if (seen_ld_abs) {
|
||||
reload_skb_data = bpf_helper_changes_pkt_data(func);
|
||||
if (reload_skb_data) {
|
||||
EMIT1(0x57); /* push %rdi */
|
||||
jmp_offset += 22; /* pop, mov, sub, mov */
|
||||
} else {
|
||||
EMIT2(0x41, 0x52); /* push %r10 */
|
||||
EMIT2(0x41, 0x51); /* push %r9 */
|
||||
/* need to adjust jmp offset, since
|
||||
* pop %r9, pop %r10 take 4 bytes after call insn
|
||||
*/
|
||||
jmp_offset += 4;
|
||||
}
|
||||
}
|
||||
if (!imm32 || !is_simm32(jmp_offset)) {
|
||||
pr_err("unsupported bpf func %d addr %p image %p\n",
|
||||
pr_err("unsupported BPF func %d addr %p image %p\n",
|
||||
imm32, func, image);
|
||||
return -EINVAL;
|
||||
}
|
||||
EMIT1_off32(0xE8, jmp_offset);
|
||||
if (seen_ld_abs) {
|
||||
if (reload_skb_data) {
|
||||
EMIT1(0x5F); /* pop %rdi */
|
||||
emit_load_skb_data_hlen(&prog);
|
||||
} else {
|
||||
EMIT2(0x41, 0x59); /* pop %r9 */
|
||||
EMIT2(0x41, 0x5A); /* pop %r10 */
|
||||
}
|
||||
}
|
||||
break;
|
||||
|
||||
case BPF_JMP | BPF_TAIL_CALL:
|
||||
@ -970,7 +916,7 @@ xadd: if (is_imm8(insn->off))
|
||||
else
|
||||
EMIT2_off32(0x81, add_1reg(0xF8, dst_reg), imm32);
|
||||
|
||||
emit_cond_jmp: /* convert BPF opcode to x86 */
|
||||
emit_cond_jmp: /* Convert BPF opcode to x86 */
|
||||
switch (BPF_OP(insn->code)) {
|
||||
case BPF_JEQ:
|
||||
jmp_cond = X86_JE;
|
||||
@ -996,22 +942,22 @@ xadd: if (is_imm8(insn->off))
|
||||
jmp_cond = X86_JBE;
|
||||
break;
|
||||
case BPF_JSGT:
|
||||
/* signed '>', GT in x86 */
|
||||
/* Signed '>', GT in x86 */
|
||||
jmp_cond = X86_JG;
|
||||
break;
|
||||
case BPF_JSLT:
|
||||
/* signed '<', LT in x86 */
|
||||
/* Signed '<', LT in x86 */
|
||||
jmp_cond = X86_JL;
|
||||
break;
|
||||
case BPF_JSGE:
|
||||
/* signed '>=', GE in x86 */
|
||||
/* Signed '>=', GE in x86 */
|
||||
jmp_cond = X86_JGE;
|
||||
break;
|
||||
case BPF_JSLE:
|
||||
/* signed '<=', LE in x86 */
|
||||
/* Signed '<=', LE in x86 */
|
||||
jmp_cond = X86_JLE;
|
||||
break;
|
||||
default: /* to silence gcc warning */
|
||||
default: /* to silence GCC warning */
|
||||
return -EFAULT;
|
||||
}
|
||||
jmp_offset = addrs[i + insn->off] - addrs[i];
|
||||
@ -1039,7 +985,7 @@ xadd: if (is_imm8(insn->off))
|
||||
jmp_offset = addrs[i + insn->off] - addrs[i];
|
||||
|
||||
if (!jmp_offset)
|
||||
/* optimize out nop jumps */
|
||||
/* Optimize out nop jumps */
|
||||
break;
|
||||
emit_jmp:
|
||||
if (is_imm8(jmp_offset)) {
|
||||
@ -1052,66 +998,13 @@ xadd: if (is_imm8(insn->off))
|
||||
}
|
||||
break;
|
||||
|
||||
case BPF_LD | BPF_IND | BPF_W:
|
||||
func = sk_load_word;
|
||||
goto common_load;
|
||||
case BPF_LD | BPF_ABS | BPF_W:
|
||||
func = CHOOSE_LOAD_FUNC(imm32, sk_load_word);
|
||||
common_load:
|
||||
ctx->seen_ld_abs = seen_ld_abs = true;
|
||||
jmp_offset = func - (image + addrs[i]);
|
||||
if (!func || !is_simm32(jmp_offset)) {
|
||||
pr_err("unsupported bpf func %d addr %p image %p\n",
|
||||
imm32, func, image);
|
||||
return -EINVAL;
|
||||
}
|
||||
if (BPF_MODE(insn->code) == BPF_ABS) {
|
||||
/* mov %esi, imm32 */
|
||||
EMIT1_off32(0xBE, imm32);
|
||||
} else {
|
||||
/* mov %rsi, src_reg */
|
||||
EMIT_mov(BPF_REG_2, src_reg);
|
||||
if (imm32) {
|
||||
if (is_imm8(imm32))
|
||||
/* add %esi, imm8 */
|
||||
EMIT3(0x83, 0xC6, imm32);
|
||||
else
|
||||
/* add %esi, imm32 */
|
||||
EMIT2_off32(0x81, 0xC6, imm32);
|
||||
}
|
||||
}
|
||||
/* skb pointer is in R6 (%rbx), it will be copied into
|
||||
* %rdi if skb_copy_bits() call is necessary.
|
||||
* sk_load_* helpers also use %r10 and %r9d.
|
||||
* See bpf_jit.S
|
||||
*/
|
||||
if (seen_ax_reg)
|
||||
/* r10 = skb->data, mov %r10, off32(%rbx) */
|
||||
EMIT3_off32(0x4c, 0x8b, 0x93,
|
||||
offsetof(struct sk_buff, data));
|
||||
EMIT1_off32(0xE8, jmp_offset); /* call */
|
||||
break;
|
||||
|
||||
case BPF_LD | BPF_IND | BPF_H:
|
||||
func = sk_load_half;
|
||||
goto common_load;
|
||||
case BPF_LD | BPF_ABS | BPF_H:
|
||||
func = CHOOSE_LOAD_FUNC(imm32, sk_load_half);
|
||||
goto common_load;
|
||||
case BPF_LD | BPF_IND | BPF_B:
|
||||
func = sk_load_byte;
|
||||
goto common_load;
|
||||
case BPF_LD | BPF_ABS | BPF_B:
|
||||
func = CHOOSE_LOAD_FUNC(imm32, sk_load_byte);
|
||||
goto common_load;
|
||||
|
||||
case BPF_JMP | BPF_EXIT:
|
||||
if (seen_exit) {
|
||||
jmp_offset = ctx->cleanup_addr - addrs[i];
|
||||
goto emit_jmp;
|
||||
}
|
||||
seen_exit = true;
|
||||
/* update cleanup_addr */
|
||||
/* Update cleanup_addr */
|
||||
ctx->cleanup_addr = proglen;
|
||||
/* mov rbx, qword ptr [rbp+0] */
|
||||
EMIT4(0x48, 0x8B, 0x5D, 0);
|
||||
@ -1129,10 +1022,11 @@ xadd: if (is_imm8(insn->off))
|
||||
break;
|
||||
|
||||
default:
|
||||
/* By design x64 JIT should support all BPF instructions
|
||||
/*
|
||||
* By design x86-64 JIT should support all BPF instructions.
|
||||
* This error will be seen if new instruction was added
|
||||
* to interpreter, but not to JIT
|
||||
* or if there is junk in bpf_prog
|
||||
* to the interpreter, but not to the JIT, or if there is
|
||||
* junk in bpf_prog.
|
||||
*/
|
||||
pr_err("bpf_jit: unknown opcode %02x\n", insn->code);
|
||||
return -EINVAL;
|
||||
@ -1184,7 +1078,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
|
||||
return orig_prog;
|
||||
|
||||
tmp = bpf_jit_blind_constants(prog);
|
||||
/* If blinding was requested and we failed during blinding,
|
||||
/*
|
||||
* If blinding was requested and we failed during blinding,
|
||||
* we must fall back to the interpreter.
|
||||
*/
|
||||
if (IS_ERR(tmp))
|
||||
@@ -1218,8 +1113,9 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 		goto out_addrs;
 	}

-	/* Before first pass, make a rough estimation of addrs[]
-	 * each bpf instruction is translated to less than 64 bytes
+	/*
+	 * Before first pass, make a rough estimation of addrs[]
+	 * each BPF instruction is translated to less than 64 bytes
 	 */
 	for (proglen = 0, i = 0; i < prog->len; i++) {
 		proglen += 64;
@@ -1228,10 +1124,11 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	ctx.cleanup_addr = proglen;
 skip_init_addrs:

-	/* JITed image shrinks with every pass and the loop iterates
-	 * until the image stops shrinking. Very large bpf programs
+	/*
+	 * JITed image shrinks with every pass and the loop iterates
+	 * until the image stops shrinking. Very large BPF programs
 	 * may converge on the last pass. In such case do one more
-	 * pass to emit the final image
+	 * pass to emit the final image.
 	 */
 	for (pass = 0; pass < 20 || image; pass++) {
 		proglen = do_jit(prog, addrs, image, oldproglen, &ctx);
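The comment in the last hunk describes a fixed-point iteration: instruction offsets start from a pessimistic estimate, and sizing passes repeat until the total length stops changing. The following is a minimal sketch of that idea, not the kernel's do_jit(). The names `toy_insn`, `toy_pass` and `toy_converge`, and the 2-byte versus 5-byte jump sizes, are assumptions made for illustration; only the 64-byte initial estimate and the 20-pass limit are taken from the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction: either a fixed-size op or a forward jump. */
struct toy_insn {
	int is_jump;	/* non-zero: forward jump over 'skip' instructions */
	int size;	/* encoded size when not a jump */
	size_t skip;	/* number of following instructions jumped over */
};

/*
 * One sizing pass. addrs[i] holds the end offset of instruction i and is
 * updated in place. A jump takes 2 bytes when its displacement fits in a
 * signed byte and 5 bytes otherwise (assumed sizes).
 */
static int toy_pass(const struct toy_insn *prog, size_t len, int *addrs)
{
	int proglen = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		int size = prog[i].size;

		if (prog[i].is_jump) {
			int disp;

			assert(i + prog[i].skip < len);
			disp = addrs[i + prog[i].skip] - addrs[i];
			size = (disp >= -128 && disp <= 127) ? 2 : 5;
		}
		proglen += size;
		addrs[i] = proglen;
	}
	return proglen;
}

/* Iterate until the length stops changing; returns the final length. */
static int toy_converge(const struct toy_insn *prog, size_t len, int *addrs,
			int *passes)
{
	int oldproglen = 0, proglen = 0;
	size_t i;

	/* Rough first estimate: every instruction takes 64 bytes. */
	for (i = 0; i < len; i++) {
		proglen += 64;
		addrs[i] = proglen;
	}

	for (*passes = 0; *passes < 20; ) {
		proglen = toy_pass(prog, len, addrs);
		(*passes)++;
		if (proglen == oldproglen)
			break;
		oldproglen = proglen;
	}
	return proglen;
}
```

In this model a jump that looks long under the 64-byte estimate becomes short once real offsets are known, so the second pass shrinks the program and the third confirms that nothing changed. The loop in the diff additionally keeps going while `image` is set, so that one extra pass writes the final bytes after convergence; the sketch leaves that step out.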
2419
arch/x86/net/bpf_jit_comp32.c
Normal file
File diff suppressed because it is too large
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* Copyright (C) 2017 Netronome Systems, Inc.
|
||||
* Copyright (C) 2017-2018 Netronome Systems, Inc.
|
||||
*
|
||||
* This software is dual licensed under the GNU General License Version 2,
|
||||
* June 1991 as shown in the file COPYING in the top-level directory of this
|
||||
@ -102,6 +102,15 @@ nfp_bpf_cmsg_map_req_alloc(struct nfp_app_bpf *bpf, unsigned int n)
|
||||
return nfp_bpf_cmsg_alloc(bpf, size);
|
||||
}
|
||||
|
||||
static u8 nfp_bpf_cmsg_get_type(struct sk_buff *skb)
|
||||
{
|
||||
struct cmsg_hdr *hdr;
|
||||
|
||||
hdr = (struct cmsg_hdr *)skb->data;
|
||||
|
||||
return hdr->type;
|
||||
}
|
||||
|
||||
static unsigned int nfp_bpf_cmsg_get_tag(struct sk_buff *skb)
|
||||
{
|
||||
struct cmsg_hdr *hdr;
|
||||
@ -431,6 +440,11 @@ void nfp_bpf_ctrl_msg_rx(struct nfp_app *app, struct sk_buff *skb)
|
||||
goto err_free;
|
||||
}
|
||||
|
||||
if (nfp_bpf_cmsg_get_type(skb) == CMSG_TYPE_BPF_EVENT) {
|
||||
nfp_bpf_event_output(bpf, skb);
|
||||
return;
|
||||
}
|
||||
|
||||
nfp_ctrl_lock(bpf->app->ctrl);
|
||||
|
||||
tag = nfp_bpf_cmsg_get_tag(skb);
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* Copyright (C) 2017 Netronome Systems, Inc.
|
||||
* Copyright (C) 2017-2018 Netronome Systems, Inc.
|
||||
*
|
||||
* This software is dual licensed under the GNU General License Version 2,
|
||||
* June 1991 as shown in the file COPYING in the top-level directory of this
|
||||
@ -37,6 +37,14 @@
|
||||
#include <linux/bitops.h>
|
||||
#include <linux/types.h>
|
||||
|
||||
/* Kernel's enum bpf_reg_type is not uABI so people may change it breaking
|
||||
* our FW ABI. In that case we will do translation in the driver.
|
||||
*/
|
||||
#define NFP_BPF_SCALAR_VALUE 1
|
||||
#define NFP_BPF_MAP_VALUE 4
|
||||
#define NFP_BPF_STACK 6
|
||||
#define NFP_BPF_PACKET_DATA 8
|
||||
|
||||
enum bpf_cap_tlv_type {
|
||||
NFP_BPF_CAP_TYPE_FUNC = 1,
|
||||
NFP_BPF_CAP_TYPE_ADJUST_HEAD = 2,
|
||||
@ -81,6 +89,7 @@ enum nfp_bpf_cmsg_type {
|
||||
CMSG_TYPE_MAP_DELETE = 5,
|
||||
CMSG_TYPE_MAP_GETNEXT = 6,
|
||||
CMSG_TYPE_MAP_GETFIRST = 7,
|
||||
CMSG_TYPE_BPF_EVENT = 8,
|
||||
__CMSG_TYPE_MAP_MAX,
|
||||
};
|
||||
|
||||
@ -155,4 +164,13 @@ struct cmsg_reply_map_op {
|
||||
__be32 resv;
|
||||
struct cmsg_key_value_pair elem[0];
|
||||
};
|
||||
|
||||
struct cmsg_bpf_event {
|
||||
struct cmsg_hdr hdr;
|
||||
__be32 cpu_id;
|
||||
__be64 map_ptr;
|
||||
__be32 data_size;
|
||||
__be32 pkt_size;
|
||||
u8 data[0];
|
||||
};
|
||||
#endif
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* Copyright (C) 2016-2017 Netronome Systems, Inc.
|
||||
* Copyright (C) 2016-2018 Netronome Systems, Inc.
|
||||
*
|
||||
* This software is dual licensed under the GNU General License Version 2,
|
||||
* June 1991 as shown in the file COPYING in the top-level directory of this
|
||||
@ -1395,15 +1395,9 @@ static int adjust_head(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
|
||||
static int
|
||||
map_call_stack_common(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
|
||||
{
|
||||
struct bpf_offloaded_map *offmap;
|
||||
struct nfp_bpf_map *nfp_map;
|
||||
bool load_lm_ptr;
|
||||
u32 ret_tgt;
|
||||
s64 lm_off;
|
||||
swreg tid;
|
||||
|
||||
offmap = (struct bpf_offloaded_map *)meta->arg1.map_ptr;
|
||||
nfp_map = offmap->dev_priv;
|
||||
|
||||
/* We only have to reload LM0 if the key is not at start of stack */
|
||||
lm_off = nfp_prog->stack_depth;
|
||||
@ -1416,17 +1410,12 @@ map_call_stack_common(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
|
||||
if (meta->func_id == BPF_FUNC_map_update_elem)
|
||||
emit_csr_wr(nfp_prog, reg_b(3 * 2), NFP_CSR_ACT_LM_ADDR2);
|
||||
|
||||
/* Load map ID into a register, it should actually fit as an immediate
|
||||
* but in case it doesn't deal with it here, not in the delay slots.
|
||||
*/
|
||||
tid = ur_load_imm_any(nfp_prog, nfp_map->tid, imm_a(nfp_prog));
|
||||
|
||||
emit_br_relo(nfp_prog, BR_UNC, BR_OFF_RELO + meta->func_id,
|
||||
2, RELO_BR_HELPER);
|
||||
ret_tgt = nfp_prog_current_offset(nfp_prog) + 2;
|
||||
|
||||
/* Load map ID into A0 */
|
||||
wrp_mov(nfp_prog, reg_a(0), tid);
|
||||
wrp_mov(nfp_prog, reg_a(0), reg_a(2));
|
||||
|
||||
/* Load the return address into B0 */
|
||||
wrp_immed_relo(nfp_prog, reg_b(0), ret_tgt, RELO_IMMED_REL);
|
||||
@ -1456,6 +1445,31 @@ nfp_get_prandom_u32(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int
|
||||
nfp_perf_event_output(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
|
||||
{
|
||||
swreg ptr_type;
|
||||
u32 ret_tgt;
|
||||
|
||||
ptr_type = ur_load_imm_any(nfp_prog, meta->arg1.type, imm_a(nfp_prog));
|
||||
|
||||
ret_tgt = nfp_prog_current_offset(nfp_prog) + 3;
|
||||
|
||||
emit_br_relo(nfp_prog, BR_UNC, BR_OFF_RELO + meta->func_id,
|
||||
2, RELO_BR_HELPER);
|
||||
|
||||
/* Load ptr type into A1 */
|
||||
wrp_mov(nfp_prog, reg_a(1), ptr_type);
|
||||
|
||||
/* Load the return address into B0 */
|
||||
wrp_immed_relo(nfp_prog, reg_b(0), ret_tgt, RELO_IMMED_REL);
|
||||
|
||||
if (!nfp_prog_confirm_current_offset(nfp_prog, ret_tgt))
|
||||
return -EINVAL;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* --- Callbacks --- */
|
||||
static int mov_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
|
||||
{
|
||||
@ -2411,6 +2425,8 @@ static int call(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
|
||||
return map_call_stack_common(nfp_prog, meta);
|
||||
case BPF_FUNC_get_prandom_u32:
|
||||
return nfp_get_prandom_u32(nfp_prog, meta);
|
||||
case BPF_FUNC_perf_event_output:
|
||||
return nfp_perf_event_output(nfp_prog, meta);
|
||||
default:
|
||||
WARN_ONCE(1, "verifier allowed unsupported function\n");
|
||||
return -EOPNOTSUPP;
|
||||
@@ -3227,6 +3243,33 @@ static int nfp_bpf_optimize(struct nfp_prog *nfp_prog)
 	return 0;
 }

+static int nfp_bpf_replace_map_ptrs(struct nfp_prog *nfp_prog)
+{
+	struct nfp_insn_meta *meta1, *meta2;
+	struct nfp_bpf_map *nfp_map;
+	struct bpf_map *map;
+
+	nfp_for_each_insn_walk2(nfp_prog, meta1, meta2) {
+		if (meta1->skip || meta2->skip)
+			continue;
+
+		if (meta1->insn.code != (BPF_LD | BPF_IMM | BPF_DW) ||
+		    meta1->insn.src_reg != BPF_PSEUDO_MAP_FD)
+			continue;
+
+		map = (void *)(unsigned long)((u32)meta1->insn.imm |
+					      (u64)meta2->insn.imm << 32);
+		if (bpf_map_offload_neutral(map))
+			continue;
+		nfp_map = map_to_offmap(map)->dev_priv;
+
+		meta1->insn.imm = nfp_map->tid;
+		meta2->insn.imm = 0;
+	}
+
+	return 0;
+}
+
 static int nfp_bpf_ustore_calc(u64 *prog, unsigned int len)
 {
 	__le64 *ustore = (__force __le64 *)prog;
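nfp_bpf_replace_map_ptrs() above rebuilds a 64-bit map pointer from the two signed 32-bit `imm` fields of an instruction pair. The sketch below shows that split-and-join arithmetic in isolation. `struct imm_pair`, `imm64_join` and `imm64_split` are hypothetical names for illustration, not kernel APIs, and the narrowing casts assume the usual two's-complement behaviour of common compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 'imm' fields of an instruction pair. */
struct imm_pair {
	int32_t lo;	/* imm of the first instruction */
	int32_t hi;	/* imm of the second instruction */
};

/*
 * Rebuild the 64-bit value. The low half goes through uint32_t first so
 * that a negative 'lo' is not sign-extended into the upper 32 bits.
 */
static uint64_t imm64_join(struct imm_pair p)
{
	return (uint64_t)(uint32_t)p.lo | ((uint64_t)(uint32_t)p.hi << 32);
}

/* Split a 64-bit value across two signed 32-bit immediates. */
static struct imm_pair imm64_split(uint64_t val)
{
	struct imm_pair p;

	p.lo = (int32_t)(uint32_t)val;
	p.hi = (int32_t)(uint32_t)(val >> 32);

	/* Splitting and joining must round-trip. */
	assert(imm64_join(p) == val);
	return p;
}
```

Without the `uint32_t` cast on the low half, a low word with its top bit set would be sign-extended and overwrite the high word when the two halves are OR-ed together. The driver code applies the same cast, `(u32)meta1->insn.imm`.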
@ -3263,6 +3306,10 @@ int nfp_bpf_jit(struct nfp_prog *nfp_prog)
|
||||
{
|
||||
int ret;
|
||||
|
||||
ret = nfp_bpf_replace_map_ptrs(nfp_prog);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
ret = nfp_bpf_optimize(nfp_prog);
|
||||
if (ret)
|
||||
return ret;
|
||||
@ -3353,6 +3400,9 @@ void *nfp_bpf_relo_for_vnic(struct nfp_prog *nfp_prog, struct nfp_bpf_vnic *bv)
|
||||
case BPF_FUNC_map_delete_elem:
|
||||
val = nfp_prog->bpf->helpers.map_delete;
|
||||
break;
|
||||
case BPF_FUNC_perf_event_output:
|
||||
val = nfp_prog->bpf->helpers.perf_event_output;
|
||||
break;
|
||||
default:
|
||||
pr_err("relocation of unknown helper %d\n",
|
||||
val);
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* Copyright (C) 2017 Netronome Systems, Inc.
|
||||
* Copyright (C) 2017-2018 Netronome Systems, Inc.
|
||||
*
|
||||
* This software is dual licensed under the GNU General License Version 2,
|
||||
* June 1991 as shown in the file COPYING in the top-level directory of this
|
||||
@ -43,6 +43,14 @@
|
||||
#include "fw.h"
|
||||
#include "main.h"
|
||||
|
||||
const struct rhashtable_params nfp_bpf_maps_neutral_params = {
|
||||
.nelem_hint = 4,
|
||||
.key_len = FIELD_SIZEOF(struct nfp_bpf_neutral_map, ptr),
|
||||
.key_offset = offsetof(struct nfp_bpf_neutral_map, ptr),
|
||||
.head_offset = offsetof(struct nfp_bpf_neutral_map, l),
|
||||
.automatic_shrinking = true,
|
||||
};
|
||||
|
||||
static bool nfp_net_ebpf_capable(struct nfp_net *nn)
|
||||
{
|
||||
#ifdef __LITTLE_ENDIAN
|
||||
@ -290,6 +298,9 @@ nfp_bpf_parse_cap_func(struct nfp_app_bpf *bpf, void __iomem *value, u32 length)
|
||||
case BPF_FUNC_map_delete_elem:
|
||||
bpf->helpers.map_delete = readl(&cap->func_addr);
|
||||
break;
|
||||
case BPF_FUNC_perf_event_output:
|
||||
bpf->helpers.perf_event_output = readl(&cap->func_addr);
|
||||
break;
|
||||
}
|
||||
|
||||
return 0;
|
||||
@ -401,17 +412,28 @@ static int nfp_bpf_init(struct nfp_app *app)
|
||||
init_waitqueue_head(&bpf->cmsg_wq);
|
||||
INIT_LIST_HEAD(&bpf->map_list);
|
||||
|
||||
err = nfp_bpf_parse_capabilities(app);
|
||||
err = rhashtable_init(&bpf->maps_neutral, &nfp_bpf_maps_neutral_params);
|
||||
if (err)
|
||||
goto err_free_bpf;
|
||||
|
||||
err = nfp_bpf_parse_capabilities(app);
|
||||
if (err)
|
||||
goto err_free_neutral_maps;
|
||||
|
||||
return 0;
|
||||
|
||||
err_free_neutral_maps:
|
||||
rhashtable_destroy(&bpf->maps_neutral);
|
||||
err_free_bpf:
|
||||
kfree(bpf);
|
||||
return err;
|
||||
}
|
||||
|
||||
static void nfp_check_rhashtable_empty(void *ptr, void *arg)
|
||||
{
|
||||
WARN_ON_ONCE(1);
|
||||
}
|
||||
|
||||
static void nfp_bpf_clean(struct nfp_app *app)
|
||||
{
|
||||
struct nfp_app_bpf *bpf = app->priv;
|
||||
@ -419,6 +441,8 @@ static void nfp_bpf_clean(struct nfp_app *app)
|
||||
WARN_ON(!skb_queue_empty(&bpf->cmsg_replies));
|
||||
WARN_ON(!list_empty(&bpf->map_list));
|
||||
WARN_ON(bpf->maps_in_use || bpf->map_elems_in_use);
|
||||
rhashtable_free_and_destroy(&bpf->maps_neutral,
|
||||
nfp_check_rhashtable_empty, NULL);
|
||||
kfree(bpf);
|
||||
}
|
||||
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* Copyright (C) 2016-2017 Netronome Systems, Inc.
|
||||
* Copyright (C) 2016-2018 Netronome Systems, Inc.
|
||||
*
|
||||
* This software is dual licensed under the GNU General License Version 2,
|
||||
* June 1991 as shown in the file COPYING in the top-level directory of this
|
||||
@ -39,6 +39,7 @@
|
||||
#include <linux/bpf_verifier.h>
|
||||
#include <linux/kernel.h>
|
||||
#include <linux/list.h>
|
||||
#include <linux/rhashtable.h>
|
||||
#include <linux/skbuff.h>
|
||||
#include <linux/types.h>
|
||||
#include <linux/wait.h>
|
||||
@ -114,6 +115,8 @@ enum pkt_vec {
|
||||
* @maps_in_use: number of currently offloaded maps
|
||||
* @map_elems_in_use: number of elements allocated to offloaded maps
|
||||
*
|
||||
* @maps_neutral: hash table of offload-neutral maps (on pointer)
|
||||
*
|
||||
* @adjust_head: adjust head capability
|
||||
* @adjust_head.flags: extra flags for adjust head
|
||||
* @adjust_head.off_min: minimal packet offset within buffer required
|
||||
@ -133,6 +136,7 @@ enum pkt_vec {
|
||||
* @helpers.map_lookup: map lookup helper address
|
||||
* @helpers.map_update: map update helper address
|
||||
* @helpers.map_delete: map delete helper address
|
||||
* @helpers.perf_event_output: output perf event to a ring buffer
|
||||
*
|
||||
* @pseudo_random: FW initialized the pseudo-random machinery (CSRs)
|
||||
*/
|
||||
@ -150,6 +154,8 @@ struct nfp_app_bpf {
|
||||
unsigned int maps_in_use;
|
||||
unsigned int map_elems_in_use;
|
||||
|
||||
struct rhashtable maps_neutral;
|
||||
|
||||
struct nfp_bpf_cap_adjust_head {
|
||||
u32 flags;
|
||||
int off_min;
|
||||
@ -171,6 +177,7 @@ struct nfp_app_bpf {
|
||||
u32 map_lookup;
|
||||
u32 map_update;
|
||||
u32 map_delete;
|
||||
u32 perf_event_output;
|
||||
} helpers;
|
||||
|
||||
bool pseudo_random;
|
||||
@ -199,6 +206,14 @@ struct nfp_bpf_map {
|
||||
enum nfp_bpf_map_use use_map[];
|
||||
};
|
||||
|
||||
struct nfp_bpf_neutral_map {
|
||||
struct rhash_head l;
|
||||
struct bpf_map *ptr;
|
||||
u32 count;
|
||||
};
|
||||
|
||||
extern const struct rhashtable_params nfp_bpf_maps_neutral_params;
|
||||
|
||||
struct nfp_prog;
|
||||
struct nfp_insn_meta;
|
||||
typedef int (*instr_cb_t)(struct nfp_prog *, struct nfp_insn_meta *);
|
||||
@ -367,6 +382,8 @@ static inline bool is_mbpf_xadd(const struct nfp_insn_meta *meta)
|
||||
* @error: error code if something went wrong
|
||||
* @stack_depth: max stack depth from the verifier
|
||||
* @adjust_head_location: if program has single adjust head call - the insn no.
|
||||
* @map_records_cnt: the number of map pointers recorded for this prog
|
||||
* @map_records: the map record pointers from bpf->maps_neutral
|
||||
* @insns: list of BPF instruction wrappers (struct nfp_insn_meta)
|
||||
*/
|
||||
struct nfp_prog {
|
||||
@ -390,6 +407,9 @@ struct nfp_prog {
|
||||
unsigned int stack_depth;
|
||||
unsigned int adjust_head_location;
|
||||
|
||||
unsigned int map_records_cnt;
|
||||
struct nfp_bpf_neutral_map **map_records;
|
||||
|
||||
struct list_head insns;
|
||||
};
|
||||
|
||||
@ -440,5 +460,7 @@ int nfp_bpf_ctrl_lookup_entry(struct bpf_offloaded_map *offmap,
|
||||
int nfp_bpf_ctrl_getnext_entry(struct bpf_offloaded_map *offmap,
|
||||
void *key, void *next_key);
|
||||
|
||||
int nfp_bpf_event_output(struct nfp_app_bpf *bpf, struct sk_buff *skb);
|
||||
|
||||
void nfp_bpf_ctrl_msg_rx(struct nfp_app *app, struct sk_buff *skb);
|
||||
#endif
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* Copyright (C) 2016-2017 Netronome Systems, Inc.
|
||||
* Copyright (C) 2016-2018 Netronome Systems, Inc.
|
||||
*
|
||||
* This software is dual licensed under the GNU General License Version 2,
|
||||
* June 1991 as shown in the file COPYING in the top-level directory of this
|
||||
@@ -56,6 +56,126 @@
 #include "../nfp_net_ctrl.h"
 #include "../nfp_net.h"

+static int
+nfp_map_ptr_record(struct nfp_app_bpf *bpf, struct nfp_prog *nfp_prog,
+		   struct bpf_map *map)
+{
+	struct nfp_bpf_neutral_map *record;
+	int err;
+
+	/* Map record paths are entered via ndo, update side is protected. */
+	ASSERT_RTNL();
+
+	/* Reuse path - other offloaded program is already tracking this map. */
+	record = rhashtable_lookup_fast(&bpf->maps_neutral, &map,
+					nfp_bpf_maps_neutral_params);
+	if (record) {
+		nfp_prog->map_records[nfp_prog->map_records_cnt++] = record;
+		record->count++;
+		return 0;
+	}
+
+	/* Grab a single ref to the map for our record. The prog destroy ndo
+	 * happens after free_used_maps().
+	 */
+	map = bpf_map_inc(map, false);
+	if (IS_ERR(map))
+		return PTR_ERR(map);
+
+	record = kmalloc(sizeof(*record), GFP_KERNEL);
+	if (!record) {
+		err = -ENOMEM;
+		goto err_map_put;
+	}
+
+	record->ptr = map;
+	record->count = 1;
+
+	err = rhashtable_insert_fast(&bpf->maps_neutral, &record->l,
+				     nfp_bpf_maps_neutral_params);
+	if (err)
+		goto err_free_rec;
+
+	nfp_prog->map_records[nfp_prog->map_records_cnt++] = record;
+
+	return 0;
+
+err_free_rec:
+	kfree(record);
+err_map_put:
+	bpf_map_put(map);
+	return err;
+}
+
+static void
+nfp_map_ptrs_forget(struct nfp_app_bpf *bpf, struct nfp_prog *nfp_prog)
+{
+	bool freed = false;
+	int i;
+
+	ASSERT_RTNL();
+
+	for (i = 0; i < nfp_prog->map_records_cnt; i++) {
+		if (--nfp_prog->map_records[i]->count) {
+			nfp_prog->map_records[i] = NULL;
+			continue;
+		}
+
+		WARN_ON(rhashtable_remove_fast(&bpf->maps_neutral,
+					       &nfp_prog->map_records[i]->l,
+					       nfp_bpf_maps_neutral_params));
+		freed = true;
+	}
+
+	if (freed) {
+		synchronize_rcu();
+
+		for (i = 0; i < nfp_prog->map_records_cnt; i++)
+			if (nfp_prog->map_records[i]) {
+				bpf_map_put(nfp_prog->map_records[i]->ptr);
+				kfree(nfp_prog->map_records[i]);
+			}
+	}
+
+	kfree(nfp_prog->map_records);
+	nfp_prog->map_records = NULL;
+	nfp_prog->map_records_cnt = 0;
+}
+
+static int
+nfp_map_ptrs_record(struct nfp_app_bpf *bpf, struct nfp_prog *nfp_prog,
+		    struct bpf_prog *prog)
+{
+	int i, cnt, err;
+
+	/* Quickly count the maps we will have to remember */
+	cnt = 0;
+	for (i = 0; i < prog->aux->used_map_cnt; i++)
+		if (bpf_map_offload_neutral(prog->aux->used_maps[i]))
+			cnt++;
+	if (!cnt)
+		return 0;
+
+	nfp_prog->map_records = kmalloc_array(cnt,
+					      sizeof(nfp_prog->map_records[0]),
+					      GFP_KERNEL);
+	if (!nfp_prog->map_records)
+		return -ENOMEM;
+
+	for (i = 0; i < prog->aux->used_map_cnt; i++)
+		if (bpf_map_offload_neutral(prog->aux->used_maps[i])) {
+			err = nfp_map_ptr_record(bpf, nfp_prog,
+						 prog->aux->used_maps[i]);
+			if (err) {
+				nfp_map_ptrs_forget(bpf, nfp_prog);
+				return err;
+			}
+		}
+	WARN_ON(cnt != nfp_prog->map_records_cnt);
+
+	return 0;
+}
+
 static int
 nfp_prog_prepare(struct nfp_prog *nfp_prog, const struct bpf_insn *prog,
 		 unsigned int cnt)
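The record and forget helpers above implement a shared, reference-counted record per map: the first program to use a map creates the record, later programs reuse it and bump `count`, and the record goes away when the last user forgets it. The sketch below shows that lifecycle with a small fixed array standing in for the rhashtable. All names (`toy_record`, `toy_lookup`, `toy_record_get`, `toy_record_put`) are hypothetical, and the RTNL locking, RCU grace period and map reference handling of the driver are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TABLE_SIZE 8

/* Hypothetical record: one per tracked object, shared between users. */
struct toy_record {
	const void *ptr;	/* key: address of the tracked object */
	unsigned int count;	/* number of users holding this record */
};

static struct toy_record toy_table[TOY_TABLE_SIZE];

/* Find the live record for 'ptr', or NULL if it is not tracked. */
static struct toy_record *toy_lookup(const void *ptr)
{
	size_t i;

	for (i = 0; i < TOY_TABLE_SIZE; i++)
		if (toy_table[i].count && toy_table[i].ptr == ptr)
			return &toy_table[i];
	return NULL;
}

/* Reuse an existing record or claim a free slot; NULL when the table is full. */
static struct toy_record *toy_record_get(const void *ptr)
{
	struct toy_record *rec;
	size_t i;

	assert(ptr);
	rec = toy_lookup(ptr);
	if (rec) {
		rec->count++;	/* reuse path: somebody already tracks it */
		return rec;
	}
	for (i = 0; i < TOY_TABLE_SIZE; i++) {
		if (!toy_table[i].count) {
			toy_table[i].ptr = ptr;
			toy_table[i].count = 1;
			return &toy_table[i];
		}
	}
	return NULL;
}

/* Drop one user; returns 1 when the record was released, 0 otherwise. */
static int toy_record_put(struct toy_record *rec)
{
	assert(rec && rec->count);
	if (--rec->count)
		return 0;
	rec->ptr = NULL;
	return 1;
}
```

Keying on the object's address mirrors the driver, whose hash table parameters use the `ptr` field of `struct nfp_bpf_neutral_map` as the key.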
@ -151,7 +271,7 @@ static int nfp_bpf_translate(struct nfp_net *nn, struct bpf_prog *prog)
|
||||
prog->aux->offload->jited_len = nfp_prog->prog_len * sizeof(u64);
|
||||
prog->aux->offload->jited_image = nfp_prog->prog;
|
||||
|
||||
return 0;
|
||||
return nfp_map_ptrs_record(nfp_prog->bpf, nfp_prog, prog);
|
||||
}
|
||||
|
||||
static int nfp_bpf_destroy(struct nfp_net *nn, struct bpf_prog *prog)
|
||||
@ -159,6 +279,7 @@ static int nfp_bpf_destroy(struct nfp_net *nn, struct bpf_prog *prog)
|
||||
struct nfp_prog *nfp_prog = prog->aux->offload->dev_priv;
|
||||
|
||||
kvfree(nfp_prog->prog);
|
||||
nfp_map_ptrs_forget(nfp_prog->bpf, nfp_prog);
|
||||
nfp_prog_free(nfp_prog);
|
||||
|
||||
return 0;
|
||||
@@ -320,6 +441,53 @@ int nfp_ndo_bpf(struct nfp_app *app, struct nfp_net *nn, struct netdev_bpf *bpf)
 	}
 }

+static unsigned long
+nfp_bpf_perf_event_copy(void *dst, const void *src,
+			unsigned long off, unsigned long len)
+{
+	memcpy(dst, src + off, len);
+	return 0;
+}
+
+int nfp_bpf_event_output(struct nfp_app_bpf *bpf, struct sk_buff *skb)
+{
+	struct cmsg_bpf_event *cbe = (void *)skb->data;
+	u32 pkt_size, data_size;
+	struct bpf_map *map;
+
+	if (skb->len < sizeof(struct cmsg_bpf_event))
+		goto err_drop;
+
+	pkt_size = be32_to_cpu(cbe->pkt_size);
+	data_size = be32_to_cpu(cbe->data_size);
+	map = (void *)(unsigned long)be64_to_cpu(cbe->map_ptr);
+
+	if (skb->len < sizeof(struct cmsg_bpf_event) + pkt_size + data_size)
+		goto err_drop;
+	if (cbe->hdr.ver != CMSG_MAP_ABI_VERSION)
+		goto err_drop;
+
+	rcu_read_lock();
+	if (!rhashtable_lookup_fast(&bpf->maps_neutral, &map,
+				    nfp_bpf_maps_neutral_params)) {
+		rcu_read_unlock();
+		pr_warn("perf event: dest map pointer %px not recognized, dropping event\n",
+			map);
+		goto err_drop;
+	}
+
+	bpf_event_output(map, be32_to_cpu(cbe->cpu_id),
+			 &cbe->data[round_up(pkt_size, 4)], data_size,
+			 cbe->data, pkt_size, nfp_bpf_perf_event_copy);
+	rcu_read_unlock();
+
+	dev_consume_skb_any(skb);
+	return 0;
+err_drop:
+	dev_kfree_skb_any(skb);
+	return -EINVAL;
+}
+
 static int
 nfp_net_bpf_load(struct nfp_net *nn, struct bpf_prog *prog,
 		 struct netlink_ext_ack *extack)
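nfp_bpf_event_output() above validates a variable-length message: it checks the fixed header first, reads the big-endian sizes, and touches the payload only after confirming that the buffer is long enough. The sketch below follows the same sequence on a plain byte buffer. The 8-byte header layout and the `toy_*` names are assumptions for illustration and do not match the real `struct cmsg_bpf_event`. Unlike the driver, the sketch also counts the padding that rounds the packet up to 4 bytes and does the arithmetic in 64 bits, which is a design choice of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HDR_LEN 8u	/* data_size + pkt_size, both big-endian 32-bit */

/* Hypothetical parsed view of one event message. */
struct toy_event {
	const uint8_t *pkt;	/* packet bytes, pkt_size long */
	const uint8_t *data;	/* metadata bytes, data_size long */
	uint32_t pkt_size;
	uint32_t data_size;
};

/* Read a big-endian 32-bit value without assuming alignment. */
static uint32_t toy_get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 and fills *ev on success, -1 if the message is too short. */
static int toy_event_parse(const uint8_t *msg, size_t len, struct toy_event *ev)
{
	uint64_t padded_pkt, need;

	assert(msg && ev);
	if (len < TOY_HDR_LEN)
		return -1;

	ev->data_size = toy_get_be32(msg);
	ev->pkt_size = toy_get_be32(msg + 4);

	/* 64-bit arithmetic so that large sizes cannot wrap around. */
	padded_pkt = ((uint64_t)ev->pkt_size + 3) & ~(uint64_t)3;
	need = TOY_HDR_LEN + padded_pkt + ev->data_size;
	if (len < need)
		return -1;

	/* Metadata starts after the packet, rounded up to 4 bytes. */
	ev->pkt = msg + TOY_HDR_LEN;
	ev->data = ev->pkt + padded_pkt;
	return 0;
}
```

Checking the fixed header before reading any size field is what keeps the size reads themselves in bounds; the same ordering is visible in the two `skb->len` checks in the hunk.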
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* Copyright (C) 2016-2017 Netronome Systems, Inc.
|
||||
* Copyright (C) 2016-2018 Netronome Systems, Inc.
|
||||
*
|
||||
* This software is dual licensed under the GNU General License Version 2,
|
||||
* June 1991 as shown in the file COPYING in the top-level directory of this
|
||||
@ -36,6 +36,8 @@
|
||||
#include <linux/kernel.h>
|
||||
#include <linux/pkt_cls.h>
|
||||
|
||||
#include "../nfp_app.h"
|
||||
#include "../nfp_main.h"
|
||||
#include "fw.h"
|
||||
#include "main.h"
|
||||
|
||||
@ -149,15 +151,6 @@ nfp_bpf_map_call_ok(const char *fname, struct bpf_verifier_env *env,
|
||||
return false;
|
||||
}
|
||||
|
||||
/* Rest of the checks is only if we re-parse the same insn */
|
||||
if (!meta->func_id)
|
||||
return true;
|
||||
|
||||
if (meta->arg1.map_ptr != reg1->map_ptr) {
|
||||
pr_vlog(env, "%s: called for different map\n", fname);
|
||||
return false;
|
||||
}
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
@@ -216,6 +209,71 @@ nfp_bpf_check_call(struct nfp_prog *nfp_prog, struct bpf_verifier_env *env,
 		pr_vlog(env, "bpf_get_prandom_u32(): FW doesn't support random number generation\n");
 		return -EOPNOTSUPP;

+	case BPF_FUNC_perf_event_output:
+		BUILD_BUG_ON(NFP_BPF_SCALAR_VALUE != SCALAR_VALUE ||
+			     NFP_BPF_MAP_VALUE != PTR_TO_MAP_VALUE ||
+			     NFP_BPF_STACK != PTR_TO_STACK ||
+			     NFP_BPF_PACKET_DATA != PTR_TO_PACKET);
+
+		if (!bpf->helpers.perf_event_output) {
+			pr_vlog(env, "event_output: not supported by FW\n");
+			return -EOPNOTSUPP;
+		}
+
+		/* Force current CPU to make sure we can report the event
+		 * wherever we get the control message from FW.
+		 */
+		if (reg3->var_off.mask & BPF_F_INDEX_MASK ||
+		    (reg3->var_off.value & BPF_F_INDEX_MASK) !=
+		    BPF_F_CURRENT_CPU) {
+			char tn_buf[48];
+
+			tnum_strn(tn_buf, sizeof(tn_buf), reg3->var_off);
+			pr_vlog(env, "event_output: must use BPF_F_CURRENT_CPU, var_off: %s\n",
+				tn_buf);
+			return -EOPNOTSUPP;
+		}
+
+		/* Save space in meta, we don't care about arguments other
+		 * than 4th meta, shove it into arg1.
+		 */
+		reg1 = cur_regs(env) + BPF_REG_4;
+
+		if (reg1->type != SCALAR_VALUE /* NULL ptr */ &&
+		    reg1->type != PTR_TO_STACK &&
+		    reg1->type != PTR_TO_MAP_VALUE &&
+		    reg1->type != PTR_TO_PACKET) {
+			pr_vlog(env, "event_output: unsupported ptr type: %d\n",
+				reg1->type);
+			return -EOPNOTSUPP;
+		}
+
+		if (reg1->type == PTR_TO_STACK &&
+		    !nfp_bpf_stack_arg_ok("event_output", env, reg1, NULL))
+			return -EOPNOTSUPP;
+
+		/* Warn user that on offload NFP may return success even if map
+		 * is not going to accept the event, since the event output is
+		 * fully async and device won't know the state of the map.
+		 * There is also FW limitation on the event length.
+		 *
+		 * Lost events will not show up on the perf ring, driver
+		 * won't see them at all. Events may also get reordered.
+		 */
+		dev_warn_once(&nfp_prog->bpf->app->pf->pdev->dev,
+			      "bpf: note: return codes and behavior of bpf_event_output() helper differs for offloaded programs!\n");
+		pr_vlog(env, "warning: return codes and behavior of event_output helper differ for offload!\n");
+
+		if (!meta->func_id)
+			break;
+
+		if (reg1->type != meta->arg1.type) {
+			pr_vlog(env, "event_output: ptr type changed: %d %d\n",
+				meta->arg1.type, reg1->type);
+			return -EINVAL;
+		}
+		break;
+
 	default:
 		pr_vlog(env, "unsupported function id: %d\n", func_id);
 		return -EOPNOTSUPP;
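The `reg3->var_off` test in the hunk above uses the verifier's tristate numbers: a bit set in `mask` is unknown at verification time, and every other bit is known and equal to the matching bit of `value`. The sketch below isolates that check. `struct toy_tnum` and `toy_flags_force_current_cpu` are hypothetical names; the two constants are assumed to mirror `BPF_F_INDEX_MASK` and `BPF_F_CURRENT_CPU` from the BPF UAPI header.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match BPF_F_INDEX_MASK and BPF_F_CURRENT_CPU in the UAPI header. */
#define TOY_F_INDEX_MASK	0xffffffffULL
#define TOY_F_CURRENT_CPU	TOY_F_INDEX_MASK

/*
 * Tristate number: bits set in 'mask' are unknown, the remaining bits
 * are known and equal to the matching bits of 'value'.
 */
struct toy_tnum {
	uint64_t value;
	uint64_t mask;
};

/* Non-zero only if the index bits are fully known and select "current CPU". */
static int toy_flags_force_current_cpu(struct toy_tnum flags)
{
	/* A well-formed tnum never has a bit both known-set and unknown. */
	assert(!(flags.value & flags.mask));

	if (flags.mask & TOY_F_INDEX_MASK)
		return 0;	/* some index bit is unknown */
	return (flags.value & TOY_F_INDEX_MASK) == TOY_F_CURRENT_CPU;
}
```

Only the low 32 index bits are constrained, so unknown bits above them do not cause a rejection. That matches the driver's condition, which masks both `var_off.mask` and `var_off.value` with `BPF_F_INDEX_MASK`.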
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* Copyright (C) 2017 Netronome Systems, Inc.
|
||||
* Copyright (C) 2017-2018 Netronome Systems, Inc.
|
||||
*
|
||||
* This software is dual licensed under the GNU General License Version 2,
|
||||
* June 1991 as shown in the file COPYING in the top-level directory of this
|
||||
|
@ -110,6 +110,11 @@ static inline struct bpf_offloaded_map *map_to_offmap(struct bpf_map *map)
|
||||
return container_of(map, struct bpf_offloaded_map, map);
|
||||
}
|
||||
|
||||
static inline bool bpf_map_offload_neutral(const struct bpf_map *map)
|
||||
{
|
||||
return map->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY;
|
||||
}
|
||||
|
||||
static inline bool bpf_map_support_seq_show(const struct bpf_map *map)
|
||||
{
|
||||
return map->ops->map_seq_show_elem && map->ops->map_check_btf;
|
||||
@ -235,6 +240,8 @@ struct bpf_verifier_ops {
|
||||
struct bpf_insn_access_aux *info);
|
||||
int (*gen_prologue)(struct bpf_insn *insn, bool direct_write,
|
||||
const struct bpf_prog *prog);
|
||||
int (*gen_ld_abs)(const struct bpf_insn *orig,
|
||||
struct bpf_insn *insn_buf);
|
||||
u32 (*convert_ctx_access)(enum bpf_access_type type,
|
||||
const struct bpf_insn *src,
|
||||
struct bpf_insn *dst,
|
||||
@ -676,6 +683,31 @@ static inline int sock_map_prog(struct bpf_map *map,
|
||||
}
|
||||
#endif
|
||||
|
||||
#if defined(CONFIG_XDP_SOCKETS)
|
||||
struct xdp_sock;
|
||||
struct xdp_sock *__xsk_map_lookup_elem(struct bpf_map *map, u32 key);
|
||||
int __xsk_map_redirect(struct bpf_map *map, struct xdp_buff *xdp,
|
||||
struct xdp_sock *xs);
|
||||
void __xsk_map_flush(struct bpf_map *map);
|
||||
#else
|
||||
struct xdp_sock;
|
||||
static inline struct xdp_sock *__xsk_map_lookup_elem(struct bpf_map *map,
|
||||
u32 key)
|
||||
{
|
||||
return NULL;
|
||||
}
|
||||
|
||||
static inline int __xsk_map_redirect(struct bpf_map *map, struct xdp_buff *xdp,
|
||||
struct xdp_sock *xs)
|
||||
{
|
||||
return -EOPNOTSUPP;
|
||||
}
|
||||
|
||||
static inline void __xsk_map_flush(struct bpf_map *map)
|
||||
{
|
||||
}
|
||||
#endif
|
||||
|
||||
/* verifier prototypes for helper functions called from eBPF programs */
|
||||
extern const struct bpf_func_proto bpf_map_lookup_elem_proto;
|
||||
extern const struct bpf_func_proto bpf_map_update_elem_proto;
|
||||
@ -689,9 +721,8 @@ extern const struct bpf_func_proto bpf_ktime_get_ns_proto;
|
||||
extern const struct bpf_func_proto bpf_get_current_pid_tgid_proto;
|
||||
extern const struct bpf_func_proto bpf_get_current_uid_gid_proto;
|
||||
extern const struct bpf_func_proto bpf_get_current_comm_proto;
|
||||
extern const struct bpf_func_proto bpf_skb_vlan_push_proto;
|
||||
extern const struct bpf_func_proto bpf_skb_vlan_pop_proto;
|
||||
extern const struct bpf_func_proto bpf_get_stackid_proto;
|
||||
extern const struct bpf_func_proto bpf_get_stack_proto;
|
||||
extern const struct bpf_func_proto bpf_sock_map_update_proto;
|
||||
|
||||
/* Shared helpers among cBPF and eBPF. */
|
||||
|
@ -2,7 +2,6 @@
|
||||
#ifndef __LINUX_BPF_TRACE_H__
|
||||
#define __LINUX_BPF_TRACE_H__
|
||||
|
||||
#include <trace/events/bpf.h>
|
||||
#include <trace/events/xdp.h>
|
||||
|
||||
#endif /* __LINUX_BPF_TRACE_H__ */
|
||||
|
@ -49,4 +49,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_DEVMAP, dev_map_ops)
|
||||
BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKMAP, sock_map_ops)
|
||||
#endif
|
||||
BPF_MAP_TYPE(BPF_MAP_TYPE_CPUMAP, cpu_map_ops)
|
||||
#if defined(CONFIG_XDP_SOCKETS)
|
||||
BPF_MAP_TYPE(BPF_MAP_TYPE_XSKMAP, xsk_map_ops)
|
||||
#endif
|
||||
#endif
|
||||
|
@ -173,6 +173,11 @@ static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log)
|
||||
|
||||
#define BPF_MAX_SUBPROGS 256
|
||||
|
||||
struct bpf_subprog_info {
|
||||
u32 start; /* insn idx of function entry point */
|
||||
u16 stack_depth; /* max. stack depth used by this function */
|
||||
};
|
||||
|
||||
/* single container for all structs
|
||||
* one verifier_env per bpf_check() call
|
||||
*/
|
||||
@ -191,9 +196,7 @@ struct bpf_verifier_env {
|
||||
bool seen_direct_write;
|
||||
struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
|
||||
struct bpf_verifier_log log;
|
||||
u32 subprog_starts[BPF_MAX_SUBPROGS];
|
||||
/* computes the stack depth of each bpf function */
|
||||
u16 subprog_stack_depth[BPF_MAX_SUBPROGS + 1];
|
||||
struct bpf_subprog_info subprog_info[BPF_MAX_SUBPROGS + 1];
|
||||
u32 subprog_cnt;
|
||||
};
|
||||
|
||||
|
@ -47,7 +47,9 @@ struct xdp_buff;
|
||||
/* Additional register mappings for converted user programs. */
|
||||
#define BPF_REG_A BPF_REG_0
|
||||
#define BPF_REG_X BPF_REG_7
|
||||
#define BPF_REG_TMP BPF_REG_8
|
||||
#define BPF_REG_TMP BPF_REG_2 /* scratch reg */
|
||||
#define BPF_REG_D BPF_REG_8 /* data, callee-saved */
|
||||
#define BPF_REG_H BPF_REG_9 /* hlen, callee-saved */
|
||||
|
||||
/* Kernel hidden auxiliary/helper register for hardening step.
|
||||
* Only used by eBPF JITs. It's nothing more than a temporary
|
||||
@ -468,7 +470,8 @@ struct bpf_prog {
|
||||
dst_needed:1, /* Do we need dst entry? */
|
||||
blinded:1, /* Was blinded */
|
||||
is_func:1, /* program is a bpf function */
|
||||
kprobe_override:1; /* Do we override a kprobe? */
|
||||
kprobe_override:1, /* Do we override a kprobe? */
|
||||
has_callchain_buf:1; /* callchain buffer allocated? */
|
||||
enum bpf_prog_type type; /* Type of BPF program */
|
||||
enum bpf_attach_type expected_attach_type; /* For some prog types */
|
||||
u32 len; /* Number of filter blocks */
|
||||
@ -759,7 +762,7 @@ struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
|
||||
* This does not appear to be a real limitation for existing software.
|
||||
*/
|
||||
int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
|
||||
struct bpf_prog *prog);
|
||||
struct xdp_buff *xdp, struct bpf_prog *prog);
|
||||
int xdp_do_redirect(struct net_device *dev,
|
||||
struct xdp_buff *xdp,
|
||||
struct bpf_prog *prog);
|
||||
|
@ -2510,6 +2510,7 @@ void dev_disable_lro(struct net_device *dev);
|
||||
int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *newskb);
|
||||
int dev_queue_xmit(struct sk_buff *skb);
|
||||
int dev_queue_xmit_accel(struct sk_buff *skb, void *accel_priv);
|
||||
int dev_direct_xmit(struct sk_buff *skb, u16 queue_id);
|
||||
int register_netdevice(struct net_device *dev);
|
||||
void unregister_netdevice_queue(struct net_device *dev, struct list_head *head);
|
||||
void unregister_netdevice_many(struct list_head *head);
|
||||
|
@ -207,8 +207,9 @@ struct ucred {
|
||||
* PF_SMC protocol family that
|
||||
* reuses AF_INET address family
|
||||
*/
|
||||
#define AF_XDP 44 /* XDP sockets */
|
||||
|
||||
#define AF_MAX 44 /* For now.. */
|
||||
#define AF_MAX 45 /* For now.. */
|
||||
|
||||
/* Protocol families, same as address families. */
|
||||
#define PF_UNSPEC AF_UNSPEC
|
||||
@ -257,6 +258,7 @@ struct ucred {
|
||||
#define PF_KCM AF_KCM
|
||||
#define PF_QIPCRTR AF_QIPCRTR
|
||||
#define PF_SMC AF_SMC
|
||||
#define PF_XDP AF_XDP
|
||||
#define PF_MAX AF_MAX
|
||||
|
||||
/* Maximum queue length specifiable by listen. */
|
||||
@ -338,6 +340,7 @@ struct ucred {
|
||||
#define SOL_NFC 280
|
||||
#define SOL_KCM 281
|
||||
#define SOL_TLS 282
|
||||
#define SOL_XDP 283
|
||||
|
||||
/* IPX options */
|
||||
#define IPX_TYPE 1
|
||||
|
@ -23,8 +23,10 @@ struct tnum tnum_range(u64 min, u64 max);
|
||||
/* Arithmetic and logical ops */
|
||||
/* Shift a tnum left (by a fixed shift) */
|
||||
struct tnum tnum_lshift(struct tnum a, u8 shift);
|
||||
/* Shift a tnum right (by a fixed shift) */
|
||||
/* Shift (rsh) a tnum right (by a fixed shift) */
|
||||
struct tnum tnum_rshift(struct tnum a, u8 shift);
|
||||
/* Shift (arsh) a tnum right (by a fixed min_shift) */
|
||||
struct tnum tnum_arshift(struct tnum a, u8 min_shift);
|
||||
/* Add two tnums, return @a + @b */
|
||||
struct tnum tnum_add(struct tnum a, struct tnum b);
|
||||
/* Subtract two tnums, return @a - @b */
|
||||
|
@ -104,6 +104,7 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
|
||||
}
|
||||
|
||||
void xdp_return_frame(struct xdp_frame *xdpf);
|
||||
void xdp_return_buff(struct xdp_buff *xdp);
|
||||
|
||||
int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
|
||||
struct net_device *dev, u32 queue_index);
|
||||
|
66
include/net/xdp_sock.h
Normal file
@ -0,0 +1,66 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0
|
||||
* AF_XDP internal functions
|
||||
* Copyright(c) 2018 Intel Corporation.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify it
|
||||
* under the terms and conditions of the GNU General Public License,
|
||||
* version 2, as published by the Free Software Foundation.
|
||||
*
|
||||
* This program is distributed in the hope it will be useful, but WITHOUT
|
||||
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
|
||||
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
|
||||
* more details.
|
||||
*/
|
||||
|
||||
#ifndef _LINUX_XDP_SOCK_H
|
||||
#define _LINUX_XDP_SOCK_H
|
||||
|
||||
#include <linux/mutex.h>
|
||||
#include <net/sock.h>
|
||||
|
||||
struct net_device;
|
||||
struct xsk_queue;
|
||||
struct xdp_umem;
|
||||
|
||||
struct xdp_sock {
|
||||
/* struct sock must be the first member of struct xdp_sock */
|
||||
struct sock sk;
|
||||
struct xsk_queue *rx;
|
||||
struct net_device *dev;
|
||||
struct xdp_umem *umem;
|
||||
struct list_head flush_node;
|
||||
u16 queue_id;
|
||||
struct xsk_queue *tx ____cacheline_aligned_in_smp;
|
||||
/* Protects multiple processes in the control path */
|
||||
struct mutex mutex;
|
||||
u64 rx_dropped;
|
||||
};
|
||||
|
||||
struct xdp_buff;
|
||||
#ifdef CONFIG_XDP_SOCKETS
|
||||
int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp);
|
||||
int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp);
|
||||
void xsk_flush(struct xdp_sock *xs);
|
||||
bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs);
|
||||
#else
|
||||
static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
|
||||
{
|
||||
return -ENOTSUPP;
|
||||
}
|
||||
|
||||
static inline int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
|
||||
{
|
||||
return -ENOTSUPP;
|
||||
}
|
||||
|
||||
static inline void xsk_flush(struct xdp_sock *xs)
|
||||
{
|
||||
}
|
||||
|
||||
static inline bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs)
|
||||
{
|
||||
return false;
|
||||
}
|
||||
#endif /* CONFIG_XDP_SOCKETS */
|
||||
|
||||
#endif /* _LINUX_XDP_SOCK_H */
|
@ -1,355 +0,0 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
#undef TRACE_SYSTEM
|
||||
#define TRACE_SYSTEM bpf
|
||||
|
||||
#if !defined(_TRACE_BPF_H) || defined(TRACE_HEADER_MULTI_READ)
|
||||
#define _TRACE_BPF_H
|
||||
|
||||
/* These are only used within the BPF_SYSCALL code */
|
||||
#ifdef CONFIG_BPF_SYSCALL
|
||||
|
||||
#include <linux/filter.h>
|
||||
#include <linux/bpf.h>
|
||||
#include <linux/fs.h>
|
||||
#include <linux/tracepoint.h>
|
||||
|
||||
#define __PROG_TYPE_MAP(FN) \
|
||||
FN(SOCKET_FILTER) \
|
||||
FN(KPROBE) \
|
||||
FN(SCHED_CLS) \
|
||||
FN(SCHED_ACT) \
|
||||
FN(TRACEPOINT) \
|
||||
FN(XDP) \
|
||||
FN(PERF_EVENT) \
|
||||
FN(CGROUP_SKB) \
|
||||
FN(CGROUP_SOCK) \
|
||||
FN(LWT_IN) \
|
||||
FN(LWT_OUT) \
|
||||
FN(LWT_XMIT)
|
||||
|
||||
#define __MAP_TYPE_MAP(FN) \
|
||||
FN(HASH) \
|
||||
FN(ARRAY) \
|
||||
FN(PROG_ARRAY) \
|
||||
FN(PERF_EVENT_ARRAY) \
|
||||
FN(PERCPU_HASH) \
|
||||
FN(PERCPU_ARRAY) \
|
||||
FN(STACK_TRACE) \
|
||||
FN(CGROUP_ARRAY) \
|
||||
FN(LRU_HASH) \
|
||||
FN(LRU_PERCPU_HASH) \
|
||||
FN(LPM_TRIE)
|
||||
|
||||
#define __PROG_TYPE_TP_FN(x) \
|
||||
TRACE_DEFINE_ENUM(BPF_PROG_TYPE_##x);
|
||||
#define __PROG_TYPE_SYM_FN(x) \
|
||||
{ BPF_PROG_TYPE_##x, #x },
|
||||
#define __PROG_TYPE_SYM_TAB \
|
||||
__PROG_TYPE_MAP(__PROG_TYPE_SYM_FN) { -1, 0 }
|
||||
__PROG_TYPE_MAP(__PROG_TYPE_TP_FN)
|
||||
|
||||
#define __MAP_TYPE_TP_FN(x) \
|
||||
TRACE_DEFINE_ENUM(BPF_MAP_TYPE_##x);
|
||||
#define __MAP_TYPE_SYM_FN(x) \
|
||||
{ BPF_MAP_TYPE_##x, #x },
|
||||
#define __MAP_TYPE_SYM_TAB \
|
||||
__MAP_TYPE_MAP(__MAP_TYPE_SYM_FN) { -1, 0 }
|
||||
__MAP_TYPE_MAP(__MAP_TYPE_TP_FN)
|
||||
|
||||
DECLARE_EVENT_CLASS(bpf_prog_event,
|
||||
|
||||
TP_PROTO(const struct bpf_prog *prg),
|
||||
|
||||
TP_ARGS(prg),
|
||||
|
||||
TP_STRUCT__entry(
|
||||
__array(u8, prog_tag, 8)
|
||||
__field(u32, type)
|
||||
),
|
||||
|
||||
TP_fast_assign(
|
||||
BUILD_BUG_ON(sizeof(__entry->prog_tag) != sizeof(prg->tag));
|
||||
memcpy(__entry->prog_tag, prg->tag, sizeof(prg->tag));
|
||||
__entry->type = prg->type;
|
||||
),
|
||||
|
||||
TP_printk("prog=%s type=%s",
|
||||
__print_hex_str(__entry->prog_tag, 8),
|
||||
__print_symbolic(__entry->type, __PROG_TYPE_SYM_TAB))
|
||||
);
|
||||
|
||||
DEFINE_EVENT(bpf_prog_event, bpf_prog_get_type,
|
||||
|
||||
TP_PROTO(const struct bpf_prog *prg),
|
||||
|
||||
TP_ARGS(prg)
|
||||
);
|
||||
|
||||
DEFINE_EVENT(bpf_prog_event, bpf_prog_put_rcu,
|
||||
|
||||
TP_PROTO(const struct bpf_prog *prg),
|
||||
|
||||
TP_ARGS(prg)
|
||||
);
|
||||
|
||||
TRACE_EVENT(bpf_prog_load,
|
||||
|
||||
TP_PROTO(const struct bpf_prog *prg, int ufd),
|
||||
|
||||
TP_ARGS(prg, ufd),
|
||||
|
||||
TP_STRUCT__entry(
|
||||
__array(u8, prog_tag, 8)
|
||||
__field(u32, type)
|
||||
__field(int, ufd)
|
||||
),
|
||||
|
||||
TP_fast_assign(
|
||||
BUILD_BUG_ON(sizeof(__entry->prog_tag) != sizeof(prg->tag));
|
||||
memcpy(__entry->prog_tag, prg->tag, sizeof(prg->tag));
|
||||
__entry->type = prg->type;
|
||||
__entry->ufd = ufd;
|
||||
),
|
||||
|
||||
TP_printk("prog=%s type=%s ufd=%d",
|
||||
__print_hex_str(__entry->prog_tag, 8),
|
||||
__print_symbolic(__entry->type, __PROG_TYPE_SYM_TAB),
|
||||
__entry->ufd)
|
||||
);
|
||||
|
||||
TRACE_EVENT(bpf_map_create,
|
||||
|
||||
TP_PROTO(const struct bpf_map *map, int ufd),
|
||||
|
||||
TP_ARGS(map, ufd),
|
||||
|
||||
TP_STRUCT__entry(
|
||||
__field(u32, type)
|
||||
__field(u32, size_key)
|
||||
__field(u32, size_value)
|
||||
__field(u32, max_entries)
|
||||
__field(u32, flags)
|
||||
__field(int, ufd)
|
||||
),
|
||||
|
||||
TP_fast_assign(
|
||||
__entry->type = map->map_type;
|
||||
__entry->size_key = map->key_size;
|
||||
__entry->size_value = map->value_size;
|
||||
__entry->max_entries = map->max_entries;
|
||||
__entry->flags = map->map_flags;
|
||||
__entry->ufd = ufd;
|
||||
),
|
||||
|
||||
TP_printk("map type=%s ufd=%d key=%u val=%u max=%u flags=%x",
|
||||
__print_symbolic(__entry->type, __MAP_TYPE_SYM_TAB),
|
||||
__entry->ufd, __entry->size_key, __entry->size_value,
|
||||
__entry->max_entries, __entry->flags)
|
||||
);
|
||||
|
||||
DECLARE_EVENT_CLASS(bpf_obj_prog,
|
||||
|
||||
TP_PROTO(const struct bpf_prog *prg, int ufd,
|
||||
const struct filename *pname),
|
||||
|
||||
TP_ARGS(prg, ufd, pname),
|
||||
|
||||
TP_STRUCT__entry(
|
||||
__array(u8, prog_tag, 8)
|
||||
__field(int, ufd)
|
||||
__string(path, pname->name)
|
||||
),
|
||||
|
||||
TP_fast_assign(
|
||||
BUILD_BUG_ON(sizeof(__entry->prog_tag) != sizeof(prg->tag));
|
||||
memcpy(__entry->prog_tag, prg->tag, sizeof(prg->tag));
|
||||
__assign_str(path, pname->name);
|
||||
__entry->ufd = ufd;
|
||||
),
|
||||
|
||||
TP_printk("prog=%s path=%s ufd=%d",
|
||||
__print_hex_str(__entry->prog_tag, 8),
|
||||
__get_str(path), __entry->ufd)
|
||||
);
|
||||
|
||||
DEFINE_EVENT(bpf_obj_prog, bpf_obj_pin_prog,
|
||||
|
||||
TP_PROTO(const struct bpf_prog *prg, int ufd,
|
||||
const struct filename *pname),
|
||||
|
||||
TP_ARGS(prg, ufd, pname)
|
||||
);
|
||||
|
||||
DEFINE_EVENT(bpf_obj_prog, bpf_obj_get_prog,
|
||||
|
||||
TP_PROTO(const struct bpf_prog *prg, int ufd,
|
||||
const struct filename *pname),
|
||||
|
||||
TP_ARGS(prg, ufd, pname)
|
||||
);
|
||||
|
||||
DECLARE_EVENT_CLASS(bpf_obj_map,
|
||||
|
||||
TP_PROTO(const struct bpf_map *map, int ufd,
|
||||
const struct filename *pname),
|
||||
|
||||
TP_ARGS(map, ufd, pname),
|
||||
|
||||
TP_STRUCT__entry(
|
||||
__field(u32, type)
|
||||
__field(int, ufd)
|
||||
__string(path, pname->name)
|
||||
),
|
||||
|
||||
TP_fast_assign(
|
||||
__assign_str(path, pname->name);
|
||||
__entry->type = map->map_type;
|
||||
__entry->ufd = ufd;
|
||||
),
|
||||
|
||||
TP_printk("map type=%s ufd=%d path=%s",
|
||||
__print_symbolic(__entry->type, __MAP_TYPE_SYM_TAB),
|
||||
__entry->ufd, __get_str(path))
|
||||
);
|
||||
|
||||
DEFINE_EVENT(bpf_obj_map, bpf_obj_pin_map,
|
||||
|
||||
TP_PROTO(const struct bpf_map *map, int ufd,
|
||||
const struct filename *pname),
|
||||
|
||||
TP_ARGS(map, ufd, pname)
|
||||
);
|
||||
|
||||
DEFINE_EVENT(bpf_obj_map, bpf_obj_get_map,
|
||||
|
||||
TP_PROTO(const struct bpf_map *map, int ufd,
|
||||
const struct filename *pname),
|
||||
|
||||
TP_ARGS(map, ufd, pname)
|
||||
);
|
||||
|
||||
DECLARE_EVENT_CLASS(bpf_map_keyval,
|
||||
|
||||
TP_PROTO(const struct bpf_map *map, int ufd,
|
||||
const void *key, const void *val),
|
||||
|
||||
TP_ARGS(map, ufd, key, val),
|
||||
|
||||
TP_STRUCT__entry(
|
||||
__field(u32, type)
|
||||
__field(u32, key_len)
|
||||
__dynamic_array(u8, key, map->key_size)
|
||||
__field(bool, key_trunc)
|
||||
__field(u32, val_len)
|
||||
__dynamic_array(u8, val, map->value_size)
|
||||
__field(bool, val_trunc)
|
||||
__field(int, ufd)
|
||||
),
|
||||
|
||||
TP_fast_assign(
|
||||
memcpy(__get_dynamic_array(key), key, map->key_size);
|
||||
memcpy(__get_dynamic_array(val), val, map->value_size);
|
||||
__entry->type = map->map_type;
|
||||
__entry->key_len = min(map->key_size, 16U);
|
||||
__entry->key_trunc = map->key_size != __entry->key_len;
|
||||
__entry->val_len = min(map->value_size, 16U);
|
||||
__entry->val_trunc = map->value_size != __entry->val_len;
|
||||
__entry->ufd = ufd;
|
||||
),
|
||||
|
||||
TP_printk("map type=%s ufd=%d key=[%s%s] val=[%s%s]",
|
||||
__print_symbolic(__entry->type, __MAP_TYPE_SYM_TAB),
|
||||
__entry->ufd,
|
||||
__print_hex(__get_dynamic_array(key), __entry->key_len),
|
||||
__entry->key_trunc ? " ..." : "",
|
||||
__print_hex(__get_dynamic_array(val), __entry->val_len),
|
||||
__entry->val_trunc ? " ..." : "")
|
||||
);
|
||||
|
||||
DEFINE_EVENT(bpf_map_keyval, bpf_map_lookup_elem,
|
||||
|
||||
TP_PROTO(const struct bpf_map *map, int ufd,
|
||||
const void *key, const void *val),
|
||||
|
||||
TP_ARGS(map, ufd, key, val)
|
||||
);
|
||||
|
||||
DEFINE_EVENT(bpf_map_keyval, bpf_map_update_elem,
|
||||
|
||||
TP_PROTO(const struct bpf_map *map, int ufd,
|
||||
const void *key, const void *val),
|
||||
|
||||
TP_ARGS(map, ufd, key, val)
|
||||
);
|
||||
|
||||
TRACE_EVENT(bpf_map_delete_elem,
|
||||
|
||||
TP_PROTO(const struct bpf_map *map, int ufd,
|
||||
const void *key),
|
||||
|
||||
TP_ARGS(map, ufd, key),
|
||||
|
||||
TP_STRUCT__entry(
|
||||
__field(u32, type)
|
||||
__field(u32, key_len)
|
||||
__dynamic_array(u8, key, map->key_size)
|
||||
__field(bool, key_trunc)
|
||||
__field(int, ufd)
|
||||
),
|
||||
|
||||
TP_fast_assign(
|
||||
memcpy(__get_dynamic_array(key), key, map->key_size);
|
||||
__entry->type = map->map_type;
|
||||
__entry->key_len = min(map->key_size, 16U);
|
||||
__entry->key_trunc = map->key_size != __entry->key_len;
|
||||
__entry->ufd = ufd;
|
||||
),
|
||||
|
||||
TP_printk("map type=%s ufd=%d key=[%s%s]",
|
||||
__print_symbolic(__entry->type, __MAP_TYPE_SYM_TAB),
|
||||
__entry->ufd,
|
||||
__print_hex(__get_dynamic_array(key), __entry->key_len),
|
||||
__entry->key_trunc ? " ..." : "")
|
||||
);
|
||||
|
||||
TRACE_EVENT(bpf_map_next_key,
|
||||
|
||||
TP_PROTO(const struct bpf_map *map, int ufd,
|
||||
const void *key, const void *key_next),
|
||||
|
||||
TP_ARGS(map, ufd, key, key_next),
|
||||
|
||||
TP_STRUCT__entry(
|
||||
__field(u32, type)
|
||||
__field(u32, key_len)
|
||||
__dynamic_array(u8, key, map->key_size)
|
||||
__dynamic_array(u8, nxt, map->key_size)
|
||||
__field(bool, key_trunc)
|
||||
__field(bool, key_null)
|
||||
__field(int, ufd)
|
||||
),
|
||||
|
||||
TP_fast_assign(
|
||||
if (key)
|
||||
memcpy(__get_dynamic_array(key), key, map->key_size);
|
||||
__entry->key_null = !key;
|
||||
memcpy(__get_dynamic_array(nxt), key_next, map->key_size);
|
||||
__entry->type = map->map_type;
|
||||
__entry->key_len = min(map->key_size, 16U);
|
||||
__entry->key_trunc = map->key_size != __entry->key_len;
|
||||
__entry->ufd = ufd;
|
||||
),
|
||||
|
||||
TP_printk("map type=%s ufd=%d key=[%s%s] next=[%s%s]",
|
||||
__print_symbolic(__entry->type, __MAP_TYPE_SYM_TAB),
|
||||
__entry->ufd,
|
||||
__entry->key_null ? "NULL" : __print_hex(__get_dynamic_array(key),
|
||||
__entry->key_len),
|
||||
__entry->key_trunc && !__entry->key_null ? " ..." : "",
|
||||
__print_hex(__get_dynamic_array(nxt), __entry->key_len),
|
||||
__entry->key_trunc ? " ..." : "")
|
||||
);
|
||||
#endif /* CONFIG_BPF_SYSCALL */
|
||||
#endif /* _TRACE_BPF_H */
|
||||
|
||||
#include <trace/define_trace.h>
|
@ -116,6 +116,7 @@ enum bpf_map_type {
|
||||
BPF_MAP_TYPE_DEVMAP,
|
||||
BPF_MAP_TYPE_SOCKMAP,
|
||||
BPF_MAP_TYPE_CPUMAP,
|
||||
BPF_MAP_TYPE_XSKMAP,
|
||||
};
|
||||
|
||||
enum bpf_prog_type {
|
||||
@ -828,12 +829,12 @@ union bpf_attr {
|
||||
*
|
||||
* Also, be aware that the newer helper
|
||||
* **bpf_perf_event_read_value**\ () is recommended over
|
||||
* **bpf_perf_event_read*\ () in general. The latter has some ABI
|
||||
* **bpf_perf_event_read**\ () in general. The latter has some ABI
|
||||
* quirks where error and counter value are used as a return code
|
||||
* (which is wrong to do since ranges may overlap). This issue is
|
||||
* fixed with bpf_perf_event_read_value(), which at the same time
|
||||
* provides more features over the **bpf_perf_event_read**\ ()
|
||||
* interface. Please refer to the description of
|
||||
* fixed with **bpf_perf_event_read_value**\ (), which at the same
|
||||
* time provides more features over the **bpf_perf_event_read**\
|
||||
* () interface. Please refer to the description of
|
||||
* **bpf_perf_event_read_value**\ () for details.
|
||||
* Return
|
||||
* The value of the perf event counter read from the map, or a
|
||||
@ -1361,7 +1362,7 @@ union bpf_attr {
|
||||
* Return
|
||||
* 0
|
||||
*
|
||||
* int bpf_setsockopt(struct bpf_sock_ops_kern *bpf_socket, int level, int optname, char *optval, int optlen)
|
||||
* int bpf_setsockopt(struct bpf_sock_ops *bpf_socket, int level, int optname, char *optval, int optlen)
|
||||
* Description
|
||||
* Emulate a call to **setsockopt()** on the socket associated to
|
||||
* *bpf_socket*, which must be a full socket. The *level* at
|
||||
@ -1435,7 +1436,7 @@ union bpf_attr {
|
||||
* Return
|
||||
* **SK_PASS** on success, or **SK_DROP** on error.
|
||||
*
|
||||
* int bpf_sock_map_update(struct bpf_sock_ops_kern *skops, struct bpf_map *map, void *key, u64 flags)
|
||||
* int bpf_sock_map_update(struct bpf_sock_ops *skops, struct bpf_map *map, void *key, u64 flags)
|
||||
* Description
|
||||
* Add an entry to, or update a *map* referencing sockets. The
|
||||
* *skops* is used as a new value for the entry associated to
|
||||
@ -1533,7 +1534,7 @@ union bpf_attr {
|
||||
* Return
|
||||
* 0 on success, or a negative error in case of failure.
|
||||
*
|
||||
* int bpf_perf_prog_read_value(struct bpf_perf_event_data_kern *ctx, struct bpf_perf_event_value *buf, u32 buf_size)
|
||||
* int bpf_perf_prog_read_value(struct bpf_perf_event_data *ctx, struct bpf_perf_event_value *buf, u32 buf_size)
|
||||
* Description
|
||||
* For an eBPF program attached to a perf event, retrieve the
|
||||
* value of the event counter associated to *ctx* and store it in
|
||||
@ -1544,7 +1545,7 @@ union bpf_attr {
|
||||
* Return
|
||||
* 0 on success, or a negative error in case of failure.
|
||||
*
|
||||
* int bpf_getsockopt(struct bpf_sock_ops_kern *bpf_socket, int level, int optname, char *optval, int optlen)
|
||||
* int bpf_getsockopt(struct bpf_sock_ops *bpf_socket, int level, int optname, char *optval, int optlen)
|
||||
* Description
|
||||
* Emulate a call to **getsockopt()** on the socket associated to
|
||||
* *bpf_socket*, which must be a full socket. The *level* at
|
||||
@ -1588,7 +1589,7 @@ union bpf_attr {
|
||||
* Return
|
||||
* 0
|
||||
*
|
||||
* int bpf_sock_ops_cb_flags_set(struct bpf_sock_ops_kern *bpf_sock, int argval)
|
||||
* int bpf_sock_ops_cb_flags_set(struct bpf_sock_ops *bpf_sock, int argval)
|
||||
* Description
|
||||
* Attempt to set the value of the **bpf_sock_ops_cb_flags** field
|
||||
* for the full TCP socket associated to *bpf_sock_ops* to
|
||||
@ -1721,7 +1722,7 @@ union bpf_attr {
|
||||
* Return
|
||||
* 0 on success, or a negative error in case of failure.
|
||||
*
|
||||
* int bpf_bind(struct bpf_sock_addr_kern *ctx, struct sockaddr *addr, int addr_len)
|
||||
* int bpf_bind(struct bpf_sock_addr *ctx, struct sockaddr *addr, int addr_len)
|
||||
* Description
|
||||
* Bind the socket associated to *ctx* to the address pointed by
|
||||
* *addr*, of length *addr_len*. This allows for making outgoing
|
||||
@ -1767,6 +1768,64 @@ union bpf_attr {
|
||||
* **CONFIG_XFRM** configuration option.
|
||||
* Return
|
||||
* 0 on success, or a negative error in case of failure.
|
||||
*
|
||||
* int bpf_get_stack(struct pt_regs *regs, void *buf, u32 size, u64 flags)
|
||||
* Description
|
||||
* Return a user or a kernel stack in bpf program provided buffer.
|
||||
* To achieve this, the helper needs *ctx*, which is a pointer
|
||||
* to the context on which the tracing program is executed.
|
||||
* To store the stacktrace, the bpf program provides *buf* with
|
||||
* a nonnegative *size*.
|
||||
*
|
||||
* The last argument, *flags*, holds the number of stack frames to
|
||||
* skip (from 0 to 255), masked with
|
||||
* **BPF_F_SKIP_FIELD_MASK**. The next bits can be used to set
|
||||
* the following flags:
|
||||
*
|
||||
* **BPF_F_USER_STACK**
|
||||
* Collect a user space stack instead of a kernel stack.
|
||||
* **BPF_F_USER_BUILD_ID**
|
||||
* Collect buildid+offset instead of ips for user stack,
|
||||
* only valid if **BPF_F_USER_STACK** is also specified.
|
||||
*
|
||||
* **bpf_get_stack**\ () can collect up to
|
||||
* **PERF_MAX_STACK_DEPTH** both kernel and user frames, subject
|
||||
* to a sufficiently large buffer size. Note that
|
||||
* this limit can be controlled with the **sysctl** program, and
|
||||
* that it should be manually increased in order to profile long
|
||||
* user stacks (such as stacks for Java programs). To do so, use:
|
||||
*
|
||||
* ::
|
||||
*
|
||||
* # sysctl kernel.perf_event_max_stack=<new value>
|
||||
*
|
||||
* Return
|
||||
* a non-negative value equal to or less than size on success, or
|
||||
* a negative error in case of failure.
|
||||
*
|
||||
* int skb_load_bytes_relative(const struct sk_buff *skb, u32 offset, void *to, u32 len, u32 start_header)
|
||||
* Description
|
||||
* This helper is similar to **bpf_skb_load_bytes**\ () in that
|
||||
* it provides an easy way to load *len* bytes from *offset*
|
||||
* from the packet associated to *skb*, into the buffer pointed
|
||||
* by *to*. The difference to **bpf_skb_load_bytes**\ () is that
|
||||
* a fifth argument *start_header* exists in order to select a
|
||||
* base offset to start from. *start_header* can be one of:
|
||||
*
|
||||
* **BPF_HDR_START_MAC**
|
||||
* Base offset to load data from is *skb*'s mac header.
|
||||
* **BPF_HDR_START_NET**
|
||||
* Base offset to load data from is *skb*'s network header.
|
||||
*
|
||||
* In general, "direct packet access" is the preferred method to
|
||||
* access packet data, however, this helper is in particular useful
|
||||
* in socket filters where *skb*\ **->data** does not always point
|
||||
* to the start of the mac header and where "direct packet access"
|
||||
* is not available.
|
||||
*
|
||||
* Return
|
||||
* 0 on success, or a negative error in case of failure.
|
||||
*
|
||||
*/
|
||||
#define __BPF_FUNC_MAPPER(FN) \
|
||||
FN(unspec), \
|
||||
@ -1835,7 +1894,9 @@ union bpf_attr {
|
||||
FN(msg_pull_data), \
|
||||
FN(bind), \
|
||||
FN(xdp_adjust_tail), \
|
||||
FN(skb_get_xfrm_state),
|
||||
FN(skb_get_xfrm_state), \
|
||||
FN(get_stack), \
|
||||
FN(skb_load_bytes_relative),
|
||||
|
||||
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
|
||||
* function eBPF program intends to call
|
||||
@ -1869,11 +1930,14 @@ enum bpf_func_id {
|
||||
/* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
|
||||
#define BPF_F_TUNINFO_IPV6 (1ULL << 0)
|
||||
|
||||
/* BPF_FUNC_get_stackid flags. */
|
||||
/* flags for both BPF_FUNC_get_stackid and BPF_FUNC_get_stack. */
|
||||
#define BPF_F_SKIP_FIELD_MASK 0xffULL
|
||||
#define BPF_F_USER_STACK (1ULL << 8)
|
||||
/* flags used by BPF_FUNC_get_stackid only. */
|
||||
#define BPF_F_FAST_STACK_CMP (1ULL << 9)
|
||||
#define BPF_F_REUSE_STACKID (1ULL << 10)
|
||||
/* flags used by BPF_FUNC_get_stack only. */
|
||||
#define BPF_F_USER_BUILD_ID (1ULL << 11)
|
||||
|
||||
/* BPF_FUNC_skb_set_tunnel_key flags. */
|
||||
#define BPF_F_ZERO_CSUM_TX (1ULL << 1)
|
||||
@ -1893,6 +1957,12 @@ enum bpf_adj_room_mode {
|
||||
BPF_ADJ_ROOM_NET,
|
||||
};
|
||||
|
||||
/* Mode for BPF_FUNC_skb_load_bytes_relative helper. */
|
||||
enum bpf_hdr_start_off {
|
||||
BPF_HDR_START_MAC,
|
||||
BPF_HDR_START_NET,
|
||||
};
|
||||
|
||||
/* user accessible mirror of in-kernel sk_buff.
|
||||
* new fields can only be added to the end of this structure
|
||||
*/
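The flag layout described for bpf_get_stack() in the hunks above (low 8 bits hold the number of frames to skip, higher bits select options) can be sketched in plain C. This is only an illustration: the three macro values are copied from the diff, while `make_get_stack_flags()` is a hypothetical helper name that does not exist in the kernel or in any BPF library.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values copied from the bpf.h hunk above. */
#define BPF_F_SKIP_FIELD_MASK 0xffULL
#define BPF_F_USER_STACK      (1ULL << 8)
#define BPF_F_USER_BUILD_ID   (1ULL << 11)

/*
 * Hypothetical helper (an assumption for illustration, not kernel API):
 * build the 'flags' argument for bpf_get_stack().
 *  - skip:     frames to skip; only the low 8 bits are kept
 *  - user:     non-zero to request a user-space stack
 *  - build_id: non-zero to request buildid+offset entries; per the helper
 *              description this is only valid together with the user flag,
 *              so it is ignored when 'user' is zero
 */
static uint64_t make_get_stack_flags(unsigned int skip, int user, int build_id)
{
    uint64_t flags = (uint64_t)skip & BPF_F_SKIP_FIELD_MASK;

    if (user)
        flags |= BPF_F_USER_STACK;
    if (user && build_id)
        flags |= BPF_F_USER_BUILD_ID;

    /* No bits outside the documented set may be present. */
    assert((flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
                      BPF_F_USER_BUILD_ID)) == 0);
    return flags;
}
```

Dropping the build-id bit when no user stack is requested mirrors the documented constraint; a real BPF program would normally just OR the constants together at the call site.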
|
||||
|
87
include/uapi/linux/if_xdp.h
Normal file
@ -0,0 +1,87 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
|
||||
*
|
||||
* if_xdp: XDP socket user-space interface
|
||||
* Copyright(c) 2018 Intel Corporation.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify it
|
||||
* under the terms and conditions of the GNU General Public License,
|
||||
* version 2, as published by the Free Software Foundation.
|
||||
*
|
||||
* This program is distributed in the hope it will be useful, but WITHOUT
|
||||
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
|
||||
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
|
||||
* more details.
|
||||
*
|
||||
* Author(s): Björn Töpel <bjorn.topel@intel.com>
|
||||
* Magnus Karlsson <magnus.karlsson@intel.com>
|
||||
*/
|
||||
|
||||
#ifndef _LINUX_IF_XDP_H
|
||||
#define _LINUX_IF_XDP_H
|
||||
|
||||
#include <linux/types.h>
|
||||
|
||||
/* Options for the sxdp_flags field */
|
||||
#define XDP_SHARED_UMEM 1
|
||||
|
||||
struct sockaddr_xdp {
|
||||
__u16 sxdp_family;
|
||||
__u32 sxdp_ifindex;
|
||||
__u32 sxdp_queue_id;
|
||||
__u32 sxdp_shared_umem_fd;
|
||||
__u16 sxdp_flags;
|
||||
};
|
||||
|
||||
/* XDP socket options */
|
||||
#define XDP_RX_RING 1
|
||||
#define XDP_TX_RING 2
|
||||
#define XDP_UMEM_REG 3
|
||||
#define XDP_UMEM_FILL_RING 4
|
||||
#define XDP_UMEM_COMPLETION_RING 5
|
||||
#define XDP_STATISTICS 6
|
||||
|
||||
struct xdp_umem_reg {
|
||||
__u64 addr; /* Start of packet data area */
|
||||
__u64 len; /* Length of packet data area */
|
||||
__u32 frame_size; /* Frame size */
|
||||
__u32 frame_headroom; /* Frame head room */
|
||||
};
|
||||
|
||||
struct xdp_statistics {
|
||||
__u64 rx_dropped; /* Dropped for reasons other than invalid desc */
|
||||
__u64 rx_invalid_descs; /* Dropped due to invalid descriptor */
|
||||
__u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
|
||||
};
|
||||
|
||||
/* Pgoff for mmaping the rings */
|
||||
#define XDP_PGOFF_RX_RING 0
|
||||
#define XDP_PGOFF_TX_RING 0x80000000
|
||||
#define XDP_UMEM_PGOFF_FILL_RING 0x100000000
|
||||
#define XDP_UMEM_PGOFF_COMPLETION_RING 0x180000000
|
||||
|
||||
struct xdp_desc {
|
||||
__u32 idx;
|
||||
__u32 len;
|
||||
__u16 offset;
|
||||
__u8 flags;
|
||||
__u8 padding[5];
|
||||
};
|
||||
|
||||
struct xdp_ring {
|
||||
__u32 producer __attribute__((aligned(64)));
|
||||
__u32 consumer __attribute__((aligned(64)));
|
||||
};
|
||||
|
||||
/* Used for the RX and TX queues for packets */
|
||||
struct xdp_rxtx_ring {
|
||||
struct xdp_ring ptrs;
|
||||
struct xdp_desc desc[0] __attribute__((aligned(64)));
|
||||
};
|
||||
|
||||
/* Used for the fill and completion queues for buffers */
|
||||
struct xdp_umem_ring {
|
||||
struct xdp_ring ptrs;
|
||||
__u32 desc[0] __attribute__((aligned(64)));
|
||||
};
|
||||
|
||||
#endif /* _LINUX_IF_XDP_H */
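To make the ring layouts in the new header more concrete, here is a minimal sketch of how a user-space program might compute the number of bytes to map for each ring. It assumes GCC or Clang (for `__attribute__((aligned))` and zero-length arrays), mirrors the structs with `<stdint.h>` types instead of `__u32`, and assumes the ring size is a power of two. The function names `rxtx_ring_bytes()` and `umem_ring_bytes()` are hypothetical and not part of the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space mirrors of the layouts in if_xdp.h above (illustrative copies). */
struct xdp_desc {
    uint32_t idx;
    uint32_t len;
    uint16_t offset;
    uint8_t  flags;
    uint8_t  padding[5];
};

struct xdp_ring {
    uint32_t producer __attribute__((aligned(64)));
    uint32_t consumer __attribute__((aligned(64)));
};

/* RX and TX rings carry full packet descriptors. */
struct xdp_rxtx_ring {
    struct xdp_ring ptrs;
    struct xdp_desc desc[0] __attribute__((aligned(64)));
};

/* Fill and completion rings carry only 32-bit frame indices. */
struct xdp_umem_ring {
    struct xdp_ring ptrs;
    uint32_t desc[0] __attribute__((aligned(64)));
};

/* Assumption: the number of entries is a non-zero power of two. */
static int is_pow2(uint32_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical helper: bytes occupied by an RX or TX ring with 'entries' slots. */
static size_t rxtx_ring_bytes(uint32_t entries)
{
    assert(is_pow2(entries));
    return sizeof(struct xdp_rxtx_ring) + (size_t)entries * sizeof(struct xdp_desc);
}

/* Hypothetical helper: bytes occupied by a fill or completion ring. */
static size_t umem_ring_bytes(uint32_t entries)
{
    assert(is_pow2(entries));
    return sizeof(struct xdp_umem_ring) + (size_t)entries * sizeof(uint32_t);
}
```

The 64-byte alignment places the producer and consumer indices on separate cache lines, so the ring header takes 128 bytes before the first descriptor. A length computed this way would be passed to mmap() together with one of the XDP_PGOFF_* offsets defined in the header.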
|
@ -8,6 +8,9 @@ obj-$(CONFIG_BPF_SYSCALL) += btf.o
|
||||
ifeq ($(CONFIG_NET),y)
|
||||
obj-$(CONFIG_BPF_SYSCALL) += devmap.o
|
||||
obj-$(CONFIG_BPF_SYSCALL) += cpumap.o
|
||||
ifeq ($(CONFIG_XDP_SOCKETS),y)
|
||||
obj-$(CONFIG_BPF_SYSCALL) += xskmap.o
|
||||
endif
|
||||
obj-$(CONFIG_BPF_SYSCALL) += offload.o
|
||||
ifeq ($(CONFIG_STREAM_PARSER),y)
|
||||
ifeq ($(CONFIG_INET),y)
|
||||
|
@ -31,6 +31,7 @@
|
||||
#include <linux/rbtree_latch.h>
|
||||
#include <linux/kallsyms.h>
|
||||
#include <linux/rcupdate.h>
|
||||
#include <linux/perf_event.h>
|
||||
|
||||
#include <asm/unaligned.h>
|
||||
|
||||
@ -633,23 +634,6 @@ static int bpf_jit_blind_insn(const struct bpf_insn *from,
|
||||
*to++ = BPF_JMP_REG(from->code, from->dst_reg, BPF_REG_AX, off);
|
||||
break;
|
||||
|
||||
case BPF_LD | BPF_ABS | BPF_W:
|
||||
case BPF_LD | BPF_ABS | BPF_H:
|
||||
case BPF_LD | BPF_ABS | BPF_B:
|
||||
*to++ = BPF_ALU64_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ from->imm);
|
||||
*to++ = BPF_ALU64_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
|
||||
*to++ = BPF_LD_IND(from->code, BPF_REG_AX, 0);
|
||||
break;
|
||||
|
||||
case BPF_LD | BPF_IND | BPF_W:
|
||||
case BPF_LD | BPF_IND | BPF_H:
|
||||
case BPF_LD | BPF_IND | BPF_B:
|
||||
*to++ = BPF_ALU64_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ from->imm);
|
||||
*to++ = BPF_ALU64_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
|
||||
*to++ = BPF_ALU32_REG(BPF_ADD, BPF_REG_AX, from->src_reg);
|
||||
*to++ = BPF_LD_IND(from->code, BPF_REG_AX, 0);
|
||||
break;
|
||||
|
||||
case BPF_LD | BPF_IMM | BPF_DW:
|
||||
*to++ = BPF_ALU64_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ aux[1].imm);
|
||||
*to++ = BPF_ALU64_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
|
||||
@ -890,14 +874,7 @@ EXPORT_SYMBOL_GPL(__bpf_call_base);
|
||||
INSN_3(LDX, MEM, W), \
|
||||
INSN_3(LDX, MEM, DW), \
|
||||
/* Immediate based. */ \
|
||||
INSN_3(LD, IMM, DW), \
|
||||
/* Misc (old cBPF carry-over). */ \
|
||||
INSN_3(LD, ABS, B), \
|
||||
INSN_3(LD, ABS, H), \
|
||||
INSN_3(LD, ABS, W), \
|
||||
INSN_3(LD, IND, B), \
|
||||
INSN_3(LD, IND, H), \
|
||||
INSN_3(LD, IND, W)
|
||||
INSN_3(LD, IMM, DW)
|
||||
|
||||
bool bpf_opcode_in_insntable(u8 code)
|
||||
{
|
||||
@ -907,6 +884,13 @@ bool bpf_opcode_in_insntable(u8 code)
|
||||
[0 ... 255] = false,
|
||||
/* Now overwrite non-defaults ... */
|
||||
BPF_INSN_MAP(BPF_INSN_2_TBL, BPF_INSN_3_TBL),
|
||||
/* UAPI exposed, but rewritten opcodes. cBPF carry-over. */
|
||||
[BPF_LD | BPF_ABS | BPF_B] = true,
|
||||
[BPF_LD | BPF_ABS | BPF_H] = true,
|
||||
[BPF_LD | BPF_ABS | BPF_W] = true,
|
||||
[BPF_LD | BPF_IND | BPF_B] = true,
|
||||
[BPF_LD | BPF_IND | BPF_H] = true,
|
||||
[BPF_LD | BPF_IND | BPF_W] = true,
|
||||
};
|
||||
#undef BPF_INSN_3_TBL
|
||||
#undef BPF_INSN_2_TBL
|
||||
@ -937,8 +921,6 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack)
|
||||
#undef BPF_INSN_3_LBL
|
||||
#undef BPF_INSN_2_LBL
|
||||
u32 tail_call_cnt = 0;
|
||||
void *ptr;
|
||||
int off;
|
||||
|
||||
#define CONT ({ insn++; goto select_insn; })
|
||||
#define CONT_JMP ({ insn++; goto select_insn; })
|
||||
@ -1265,67 +1247,6 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack)
|
||||
atomic64_add((u64) SRC, (atomic64_t *)(unsigned long)
|
||||
(DST + insn->off));
|
||||
CONT;
|
||||
LD_ABS_W: /* BPF_R0 = ntohl(*(u32 *) (skb->data + imm32)) */
|
||||
off = IMM;
|
||||
load_word:
|
||||
/* BPF_LD + BPF_ABS and BPF_LD + BPF_IND insns are only
|
||||
* appearing in the programs where ctx == skb
|
||||
* (see may_access_skb() in the verifier). All programs
|
||||
* keep 'ctx' in regs[BPF_REG_CTX] == BPF_R6,
|
||||
* bpf_convert_filter() saves it in BPF_R6, internal BPF
|
||||
* verifier will check that BPF_R6 == ctx.
|
||||
*
|
||||
* BPF_ABS and BPF_IND are wrappers of function calls,
|
||||
* so they scratch BPF_R1-BPF_R5 registers, preserve
|
||||
* BPF_R6-BPF_R9, and store return value into BPF_R0.
|
||||
*
|
||||
* Implicit input:
|
||||
* ctx == skb == BPF_R6 == CTX
|
||||
*
|
||||
* Explicit input:
|
||||
* SRC == any register
|
||||
* IMM == 32-bit immediate
|
||||
*
|
||||
* Output:
|
||||
* BPF_R0 - 8/16/32-bit skb data converted to cpu endianness
|
||||
*/
|
||||
|
||||
ptr = bpf_load_pointer((struct sk_buff *) (unsigned long) CTX, off, 4, &tmp);
|
||||
if (likely(ptr != NULL)) {
|
||||
BPF_R0 = get_unaligned_be32(ptr);
|
||||
CONT;
|
||||
}
|
||||
|
||||
return 0;
|
||||
LD_ABS_H: /* BPF_R0 = ntohs(*(u16 *) (skb->data + imm32)) */
|
||||
off = IMM;
|
||||
load_half:
|
||||
ptr = bpf_load_pointer((struct sk_buff *) (unsigned long) CTX, off, 2, &tmp);
|
||||
if (likely(ptr != NULL)) {
|
||||
BPF_R0 = get_unaligned_be16(ptr);
|
||||
CONT;
|
||||
}
|
||||
|
||||
return 0;
|
||||
LD_ABS_B: /* BPF_R0 = *(u8 *) (skb->data + imm32) */
|
||||
off = IMM;
|
||||
load_byte:
|
||||
ptr = bpf_load_pointer((struct sk_buff *) (unsigned long) CTX, off, 1, &tmp);
|
||||
if (likely(ptr != NULL)) {
|
||||
BPF_R0 = *(u8 *)ptr;
|
||||
CONT;
|
||||
}
|
||||
|
||||
return 0;
|
||||
LD_IND_W: /* BPF_R0 = ntohl(*(u32 *) (skb->data + src_reg + imm32)) */
|
||||
off = IMM + SRC;
|
||||
goto load_word;
|
||||
LD_IND_H: /* BPF_R0 = ntohs(*(u16 *) (skb->data + src_reg + imm32)) */
|
||||
off = IMM + SRC;
|
||||
goto load_half;
|
||||
LD_IND_B: /* BPF_R0 = *(u8 *) (skb->data + src_reg + imm32) */
|
||||
off = IMM + SRC;
|
||||
goto load_byte;
|
||||
|
||||
default_label:
|
||||
/* If we ever reach this, we have a bug somewhere. Die hard here
|
||||
@ -1722,6 +1643,10 @@ static void bpf_prog_free_deferred(struct work_struct *work)
|
||||
aux = container_of(work, struct bpf_prog_aux, work);
|
||||
if (bpf_prog_is_dev_bound(aux))
|
||||
bpf_prog_offload_destroy(aux->prog);
|
||||
#ifdef CONFIG_PERF_EVENTS
|
||||
if (aux->prog->has_callchain_buf)
|
||||
put_callchain_buffers();
|
||||
#endif
|
||||
for (i = 0; i < aux->func_cnt; i++)
|
||||
bpf_jit_free(aux->func[i]);
|
||||
if (aux->func_cnt) {
|
||||
@ -1794,6 +1719,7 @@ bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
|
||||
{
|
||||
return -ENOTSUPP;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(bpf_event_output);
|
||||
|
||||
/* Always built-in helper functions. */
|
||||
const struct bpf_func_proto bpf_tail_call_proto = {
|
||||
@ -1840,9 +1766,3 @@ int __weak skb_copy_bits(const struct sk_buff *skb, int offset, void *to,
|
||||
#include <linux/bpf_trace.h>
|
||||
|
||||
EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_exception);
|
||||
|
||||
/* These are only used within the BPF_SYSCALL code */
|
||||
#ifdef CONFIG_BPF_SYSCALL
|
||||
EXPORT_TRACEPOINT_SYMBOL_GPL(bpf_prog_get_type);
|
||||
EXPORT_TRACEPOINT_SYMBOL_GPL(bpf_prog_put_rcu);
|
||||
#endif
|
||||
|
@ -429,13 +429,6 @@ int bpf_obj_pin_user(u32 ufd, const char __user *pathname)
|
||||
ret = bpf_obj_do_pin(pname, raw, type);
|
||||
if (ret != 0)
|
||||
bpf_any_put(raw, type);
|
||||
if ((trace_bpf_obj_pin_prog_enabled() ||
|
||||
trace_bpf_obj_pin_map_enabled()) && !ret) {
|
||||
if (type == BPF_TYPE_PROG)
|
||||
trace_bpf_obj_pin_prog(raw, ufd, pname);
|
||||
if (type == BPF_TYPE_MAP)
|
||||
trace_bpf_obj_pin_map(raw, ufd, pname);
|
||||
}
|
||||
out:
|
||||
putname(pname);
|
||||
return ret;
|
||||
@ -502,15 +495,8 @@ int bpf_obj_get_user(const char __user *pathname, int flags)
|
||||
else
|
||||
goto out;
|
||||
|
||||
if (ret < 0) {
|
||||
if (ret < 0)
|
||||
bpf_any_put(raw, type);
|
||||
} else if (trace_bpf_obj_get_prog_enabled() ||
|
||||
trace_bpf_obj_get_map_enabled()) {
|
||||
if (type == BPF_TYPE_PROG)
|
||||
trace_bpf_obj_get_prog(raw, ret, pname);
|
||||
if (type == BPF_TYPE_MAP)
|
||||
trace_bpf_obj_get_map(raw, ret, pname);
|
||||
}
|
||||
out:
|
||||
putname(pname);
|
||||
return ret;
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* Copyright (C) 2017 Netronome Systems, Inc.
|
||||
* Copyright (C) 2017-2018 Netronome Systems, Inc.
|
||||
*
|
||||
* This software is licensed under the GNU General License Version 2,
|
||||
* June 1991 as shown in the file COPYING in the top-level directory of this
|
||||
@ -474,8 +474,10 @@ bool bpf_offload_dev_match(struct bpf_prog *prog, struct bpf_map *map)
|
||||
struct bpf_prog_offload *offload;
|
||||
bool ret;
|
||||
|
||||
if (!bpf_prog_is_dev_bound(prog->aux) || !bpf_map_is_dev_bound(map))
|
||||
if (!bpf_prog_is_dev_bound(prog->aux))
|
||||
return false;
|
||||
if (!bpf_map_is_dev_bound(map))
|
||||
return bpf_map_offload_neutral(map);
|
||||
|
||||
down_read(&bpf_devs_lock);
|
||||
offload = prog->aux->offload;
|
||||
|
@ -262,16 +262,11 @@ static int stack_map_get_build_id(struct vm_area_struct *vma,
|
||||
return ret;
|
||||
}
|
||||
|
||||
static void stack_map_get_build_id_offset(struct bpf_map *map,
|
||||
struct stack_map_bucket *bucket,
|
||||
static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
|
||||
u64 *ips, u32 trace_nr, bool user)
|
||||
{
|
||||
int i;
|
||||
struct vm_area_struct *vma;
|
||||
struct bpf_stack_build_id *id_offs;
|
||||
|
||||
bucket->nr = trace_nr;
|
||||
id_offs = (struct bpf_stack_build_id *)bucket->data;
|
||||
|
||||
/*
|
||||
* We cannot do up_read() in nmi context, so build_id lookup is
|
||||
@ -361,8 +356,10 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
|
||||
pcpu_freelist_pop(&smap->freelist);
|
||||
if (unlikely(!new_bucket))
|
||||
return -ENOMEM;
|
||||
stack_map_get_build_id_offset(map, new_bucket, ips,
|
||||
trace_nr, user);
|
||||
new_bucket->nr = trace_nr;
|
||||
stack_map_get_build_id_offset(
|
||||
(struct bpf_stack_build_id *)new_bucket->data,
|
||||
ips, trace_nr, user);
|
||||
trace_len = trace_nr * sizeof(struct bpf_stack_build_id);
|
||||
if (hash_matches && bucket->nr == trace_nr &&
|
||||
memcmp(bucket->data, new_bucket->data, trace_len) == 0) {
|
||||
@ -405,6 +402,73 @@ const struct bpf_func_proto bpf_get_stackid_proto = {
|
||||
.arg3_type = ARG_ANYTHING,
|
||||
};
|
||||
BPF_CALL_4(bpf_get_stack, struct pt_regs *, regs, void *, buf, u32, size,
	   u64, flags)
{
	u32 init_nr, trace_nr, copy_len, elem_size, num_elem;
	bool user_build_id = flags & BPF_F_USER_BUILD_ID;
	u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
	bool user = flags & BPF_F_USER_STACK;
	struct perf_callchain_entry *trace;
	bool kernel = !user;
	int err = -EINVAL;
	u64 *ips;

	if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
			       BPF_F_USER_BUILD_ID)))
		goto clear;
	if (kernel && user_build_id)
		goto clear;

	elem_size = (user && user_build_id) ? sizeof(struct bpf_stack_build_id)
					    : sizeof(u64);
	if (unlikely(size % elem_size))
		goto clear;

	num_elem = size / elem_size;
	if (sysctl_perf_event_max_stack < num_elem)
		init_nr = 0;
	else
		init_nr = sysctl_perf_event_max_stack - num_elem;
	trace = get_perf_callchain(regs, init_nr, kernel, user,
				   sysctl_perf_event_max_stack, false, false);
	if (unlikely(!trace))
		goto err_fault;

	trace_nr = trace->nr - init_nr;
	if (trace_nr < skip)
		goto err_fault;

	trace_nr -= skip;
	trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
	copy_len = trace_nr * elem_size;
	ips = trace->ip + skip + init_nr;
	if (user && user_build_id)
		stack_map_get_build_id_offset(buf, ips, trace_nr, user);
	else
		memcpy(buf, ips, copy_len);

	if (size > copy_len)
		memset(buf + copy_len, 0, size - copy_len);
	return copy_len;

err_fault:
	err = -EFAULT;
clear:
	memset(buf, 0, size);
	return err;
}
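The length arithmetic in bpf_get_stack() can be read in isolation. The following is a minimal userspace sketch of it, not the kernel code: `initial_nr()` and `stack_copy_len()` are hypothetical names, and `captured` stands in for `trace->nr - init_nr`. It shows only how the buffer size, element size, and skip count determine the number of bytes copied or the error returned.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: slot at which the callchain walk starts so that
 * at most num_elem frames are recorded (mirrors the init_nr computation).
 */
static uint32_t initial_nr(uint32_t max_stack, uint32_t num_elem)
{
	return max_stack < num_elem ? 0 : max_stack - num_elem;
}

/* Hypothetical helper: bytes that would be copied, or a negative errno.
 * captured models trace->nr - init_nr, i.e. the frames actually recorded.
 */
static long stack_copy_len(uint32_t size, uint32_t elem_size,
			   uint32_t captured, uint32_t skip)
{
	uint32_t num_elem, trace_nr;

	/* The buffer must hold a whole number of elements. */
	if (elem_size == 0 || size % elem_size)
		return -EINVAL;
	num_elem = size / elem_size;

	/* Skipping more frames than were captured is an error. */
	if (captured < skip)
		return -EFAULT;

	/* Clamp the remaining frames to what the buffer can hold. */
	trace_nr = captured - skip;
	if (trace_nr > num_elem)
		trace_nr = num_elem;

	return (long)trace_nr * elem_size;
}
```

With a 64-byte buffer of 8-byte entries, ten captured frames and two skipped fill the buffer, while five captured frames and two skipped yield 24 bytes; in the helper above, the remainder of the buffer is zeroed by the memset() call.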
|
||||
const struct bpf_func_proto bpf_get_stack_proto = {
	.func		= bpf_get_stack,
	.gpl_only	= true,
	.ret_type	= RET_INTEGER,
	.arg1_type	= ARG_PTR_TO_CTX,
	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
	.arg3_type	= ARG_CONST_SIZE_OR_ZERO,
	.arg4_type	= ARG_ANYTHING,
};
|
||||
|
||||
/* Called from eBPF program */
|
||||
static void *stack_map_lookup_elem(struct bpf_map *map, void *key)
|
||||
{
|
||||
|
@ -282,6 +282,7 @@ void bpf_map_put(struct bpf_map *map)
|
||||
{
|
||||
__bpf_map_put(map, true);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(bpf_map_put);
|
||||
|
||||
void bpf_map_put_with_uref(struct bpf_map *map)
|
||||
{
|
||||
@ -503,7 +504,6 @@ static int map_create(union bpf_attr *attr)
|
||||
return err;
|
||||
}
|
||||
|
||||
trace_bpf_map_create(map, err);
|
||||
return err;
|
||||
|
||||
free_map:
|
||||
@ -544,6 +544,7 @@ struct bpf_map *bpf_map_inc(struct bpf_map *map, bool uref)
|
||||
atomic_inc(&map->usercnt);
|
||||
return map;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(bpf_map_inc);
|
||||
|
||||
struct bpf_map *bpf_map_get_with_uref(u32 ufd)
|
||||
{
|
||||
@ -663,7 +664,6 @@ static int map_lookup_elem(union bpf_attr *attr)
|
||||
if (copy_to_user(uvalue, value, value_size) != 0)
|
||||
goto free_value;
|
||||
|
||||
trace_bpf_map_lookup_elem(map, ufd, key, value);
|
||||
err = 0;
|
||||
|
||||
free_value:
|
||||
@ -760,8 +760,6 @@ static int map_update_elem(union bpf_attr *attr)
|
||||
__this_cpu_dec(bpf_prog_active);
|
||||
preempt_enable();
|
||||
out:
|
||||
if (!err)
|
||||
trace_bpf_map_update_elem(map, ufd, key, value);
|
||||
free_value:
|
||||
kfree(value);
|
||||
free_key:
|
||||
@ -814,8 +812,6 @@ static int map_delete_elem(union bpf_attr *attr)
|
||||
__this_cpu_dec(bpf_prog_active);
|
||||
preempt_enable();
|
||||
out:
|
||||
if (!err)
|
||||
trace_bpf_map_delete_elem(map, ufd, key);
|
||||
kfree(key);
|
||||
err_put:
|
||||
fdput(f);
|
||||
@ -879,7 +875,6 @@ static int map_get_next_key(union bpf_attr *attr)
|
||||
if (copy_to_user(unext_key, next_key, map->key_size) != 0)
|
||||
goto free_next_key;
|
||||
|
||||
trace_bpf_map_next_key(map, ufd, key, next_key);
|
||||
err = 0;
|
||||
|
||||
free_next_key:
|
||||
@ -1027,7 +1022,6 @@ static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
|
||||
if (atomic_dec_and_test(&prog->aux->refcnt)) {
|
||||
int i;
|
||||
|
||||
trace_bpf_prog_put_rcu(prog);
|
||||
/* bpf_prog_free_id() must be called first */
|
||||
bpf_prog_free_id(prog, do_idr_lock);
|
||||
|
||||
@ -1194,11 +1188,7 @@ struct bpf_prog *bpf_prog_get(u32 ufd)
|
||||
struct bpf_prog *bpf_prog_get_type_dev(u32 ufd, enum bpf_prog_type type,
|
||||
bool attach_drv)
|
||||
{
|
||||
struct bpf_prog *prog = __bpf_prog_get(ufd, &type, attach_drv);
|
||||
|
||||
if (!IS_ERR(prog))
|
||||
trace_bpf_prog_get_type(prog);
|
||||
return prog;
|
||||
return __bpf_prog_get(ufd, &type, attach_drv);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(bpf_prog_get_type_dev);
|
||||
|
||||
@ -1373,7 +1363,6 @@ static int bpf_prog_load(union bpf_attr *attr)
|
||||
}
|
||||
|
||||
bpf_prog_kallsyms_add(prog);
|
||||
trace_bpf_prog_load(prog, err);
|
||||
return err;
|
||||
|
||||
free_used_maps:
|
||||
|
@ -43,6 +43,16 @@ struct tnum tnum_rshift(struct tnum a, u8 shift)
|
||||
return TNUM(a.value >> shift, a.mask >> shift);
|
||||
}
|
||||
struct tnum tnum_arshift(struct tnum a, u8 min_shift)
{
	/* if a.value is negative, arithmetic shifting by minimum shift
	 * will have larger negative offset compared to more shifting.
	 * If a.value is nonnegative, arithmetic shifting by minimum shift
	 * will have larger positive offset compare to more shifting.
	 */
	return TNUM((s64)a.value >> min_shift, (s64)a.mask >> min_shift);
}
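A small userspace sketch may help show what the arithmetic shift does to a tristate number. It assumes the usual tnum representation (`value` holds the known bits, `mask` marks the unknown bits) and that the compiler implements `>>` on negative signed integers as an arithmetic shift, which GCC and Clang do but the C standard leaves implementation-defined. The struct and function names are hypothetical stand-ins, not the kernel definitions.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's struct tnum. */
struct tnum_sketch {
	uint64_t value;	/* bits known to be 1 */
	uint64_t mask;	/* bits whose value is unknown */
};

/* Shift both halves as signed 64-bit quantities so that the sign bit,
 * known or unknown, is replicated into the vacated high bits.
 */
static struct tnum_sketch tnum_arshift_sketch(struct tnum_sketch a,
					      uint8_t min_shift)
{
	struct tnum_sketch r;

	r.value = (uint64_t)((int64_t)a.value >> min_shift);
	r.mask = (uint64_t)((int64_t)a.mask >> min_shift);
	return r;
}
```

Shifting the mask arithmetically matters when the sign bit itself is unknown: an unknown sign bit smears into every vacated position, so all of those bits become unknown as well.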
|
||||
|
||||
struct tnum tnum_add(struct tnum a, struct tnum b)
|
||||
{
|
||||
u64 sm, sv, sigma, chi, mu;
|
||||
|
@ -22,6 +22,7 @@
|
||||
#include <linux/stringify.h>
|
||||
#include <linux/bsearch.h>
|
||||
#include <linux/sort.h>
|
||||
#include <linux/perf_event.h>
|
||||
|
||||
#include "disasm.h"
|
||||
|
||||
@ -164,6 +165,8 @@ struct bpf_call_arg_meta {
|
||||
bool pkt_access;
|
||||
int regno;
|
||||
int access_size;
|
||||
s64 msize_smax_value;
|
||||
u64 msize_umax_value;
|
||||
};
|
||||
|
||||
static DEFINE_MUTEX(bpf_verifier_lock);
|
||||
@ -738,18 +741,19 @@ enum reg_arg_type {
|
||||
|
||||
static int cmp_subprogs(const void *a, const void *b)
|
||||
{
|
||||
return *(int *)a - *(int *)b;
|
||||
return ((struct bpf_subprog_info *)a)->start -
|
||||
((struct bpf_subprog_info *)b)->start;
|
||||
}
|
||||
|
||||
static int find_subprog(struct bpf_verifier_env *env, int off)
|
||||
{
|
||||
u32 *p;
|
||||
struct bpf_subprog_info *p;
|
||||
|
||||
p = bsearch(&off, env->subprog_starts, env->subprog_cnt,
|
||||
sizeof(env->subprog_starts[0]), cmp_subprogs);
|
||||
p = bsearch(&off, env->subprog_info, env->subprog_cnt,
|
||||
sizeof(env->subprog_info[0]), cmp_subprogs);
|
||||
if (!p)
|
||||
return -ENOENT;
|
||||
return p - env->subprog_starts;
|
||||
return p - env->subprog_info;
|
||||
|
||||
}
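The lookup above relies on a sorted array of per-subprogram records searched by start offset. A minimal userspace sketch of the same idea, using the C library's bsearch(), follows. The struct and function names are hypothetical; unlike the kernel code, which passes a pointer to the bare offset as the key, this sketch builds a full key record so the comparator never depends on `start` being the first member.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct bpf_subprog_info. */
struct subprog_info_sketch {
	int start;		/* index of the first insn of the subprog */
	unsigned short stack_depth;
};

/* Order records by their start offset. */
static int cmp_subprogs_sketch(const void *a, const void *b)
{
	return ((const struct subprog_info_sketch *)a)->start -
	       ((const struct subprog_info_sketch *)b)->start;
}

/* Return the index of the subprog starting at off, or -1 if none does.
 * info[] must already be sorted by start.
 */
static int find_subprog_sketch(const struct subprog_info_sketch *info,
			       size_t cnt, int off)
{
	struct subprog_info_sketch key = { .start = off };
	const struct subprog_info_sketch *p;

	p = bsearch(&key, info, cnt, sizeof(info[0]), cmp_subprogs_sketch);
	return p ? (int)(p - info) : -1;
}
```

Because the result is a pointer difference over the record array, the returned value is directly usable as an index into the same array, which is what lets callers read fields such as the stack depth without a second table.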
|
||||
|
||||
@ -769,18 +773,24 @@ static int add_subprog(struct bpf_verifier_env *env, int off)
|
||||
verbose(env, "too many subprograms\n");
|
||||
return -E2BIG;
|
||||
}
|
||||
env->subprog_starts[env->subprog_cnt++] = off;
|
||||
sort(env->subprog_starts, env->subprog_cnt,
|
||||
sizeof(env->subprog_starts[0]), cmp_subprogs, NULL);
|
||||
env->subprog_info[env->subprog_cnt++].start = off;
|
||||
sort(env->subprog_info, env->subprog_cnt,
|
||||
sizeof(env->subprog_info[0]), cmp_subprogs, NULL);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int check_subprogs(struct bpf_verifier_env *env)
|
||||
{
|
||||
int i, ret, subprog_start, subprog_end, off, cur_subprog = 0;
|
||||
struct bpf_subprog_info *subprog = env->subprog_info;
|
||||
struct bpf_insn *insn = env->prog->insnsi;
|
||||
int insn_cnt = env->prog->len;
|
||||
|
||||
/* Add entry function. */
|
||||
ret = add_subprog(env, 0);
|
||||
if (ret < 0)
|
||||
return ret;
|
||||
|
||||
/* determine subprog starts. The end is one before the next starts */
|
||||
for (i = 0; i < insn_cnt; i++) {
|
||||
if (insn[i].code != (BPF_JMP | BPF_CALL))
|
||||
@ -800,16 +810,18 @@ static int check_subprogs(struct bpf_verifier_env *env)
|
||||
return ret;
|
||||
}
|
||||
|
||||
/* Add a fake 'exit' subprog which could simplify subprog iteration
|
||||
* logic. 'subprog_cnt' should not be increased.
|
||||
*/
|
||||
subprog[env->subprog_cnt].start = insn_cnt;
|
||||
|
||||
if (env->log.level > 1)
|
||||
for (i = 0; i < env->subprog_cnt; i++)
|
||||
verbose(env, "func#%d @%d\n", i, env->subprog_starts[i]);
|
||||
verbose(env, "func#%d @%d\n", i, subprog[i].start);
|
||||
|
||||
/* now check that all jumps are within the same subprog */
|
||||
subprog_start = 0;
|
||||
if (env->subprog_cnt == cur_subprog)
|
||||
subprog_end = insn_cnt;
|
||||
else
|
||||
subprog_end = env->subprog_starts[cur_subprog++];
|
||||
subprog_start = subprog[cur_subprog].start;
|
||||
subprog_end = subprog[cur_subprog + 1].start;
|
||||
for (i = 0; i < insn_cnt; i++) {
|
||||
u8 code = insn[i].code;
|
||||
|
||||
@ -834,10 +846,9 @@ static int check_subprogs(struct bpf_verifier_env *env)
|
||||
return -EINVAL;
|
||||
}
|
||||
subprog_start = subprog_end;
|
||||
if (env->subprog_cnt == cur_subprog)
|
||||
subprog_end = insn_cnt;
|
||||
else
|
||||
subprog_end = env->subprog_starts[cur_subprog++];
|
||||
cur_subprog++;
|
||||
if (cur_subprog < env->subprog_cnt)
|
||||
subprog_end = subprog[cur_subprog + 1].start;
|
||||
}
|
||||
}
|
||||
return 0;
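The fake 'exit' entry stored at index `subprog_cnt` acts as a sentinel: the end of subprogram i is always the start of entry i + 1, with no special case for the last one. A minimal userspace sketch of that layout, with hypothetical names, might look like this:

```c
#include <stddef.h>

/* Hypothetical record: only the start offset matters here. */
struct subprog_span {
	int start;
};

/* info[] holds cnt real entries followed by one sentinel entry whose
 * start equals the total instruction count, so info[i + 1] is always
 * valid for i < cnt.
 */
static int subprog_len(const struct subprog_span *info, size_t i)
{
	return info[i + 1].start - info[i].start;
}

/* Return the index of the subprog containing insn, or -1 if insn lies
 * at or beyond the sentinel.
 */
static int subprog_of_insn(const struct subprog_span *info, size_t cnt,
			   int insn)
{
	size_t i;

	for (i = 0; i < cnt; i++)
		if (insn >= info[i].start && insn < info[i + 1].start)
			return (int)i;
	return -1;
}
```

The cost of the sentinel is that the array needs one slot more than the number of subprograms, and any code that shifts start offsets must adjust the sentinel as well.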
|
||||
@ -1470,13 +1481,13 @@ static int update_stack_depth(struct bpf_verifier_env *env,
|
||||
const struct bpf_func_state *func,
|
||||
int off)
|
||||
{
|
||||
u16 stack = env->subprog_stack_depth[func->subprogno];
|
||||
u16 stack = env->subprog_info[func->subprogno].stack_depth;
|
||||
|
||||
if (stack >= -off)
|
||||
return 0;
|
||||
|
||||
/* update known max for given subprogram */
|
||||
env->subprog_stack_depth[func->subprogno] = -off;
|
||||
env->subprog_info[func->subprogno].stack_depth = -off;
|
||||
return 0;
|
||||
}
|
||||
|
||||
@ -1488,9 +1499,9 @@ static int update_stack_depth(struct bpf_verifier_env *env,
|
||||
*/
|
||||
static int check_max_stack_depth(struct bpf_verifier_env *env)
|
||||
{
|
||||
int depth = 0, frame = 0, subprog = 0, i = 0, subprog_end;
|
||||
int depth = 0, frame = 0, idx = 0, i = 0, subprog_end;
|
||||
struct bpf_subprog_info *subprog = env->subprog_info;
|
||||
struct bpf_insn *insn = env->prog->insnsi;
|
||||
int insn_cnt = env->prog->len;
|
||||
int ret_insn[MAX_CALL_FRAMES];
|
||||
int ret_prog[MAX_CALL_FRAMES];
|
||||
|
||||
@ -1498,17 +1509,14 @@ static int check_max_stack_depth(struct bpf_verifier_env *env)
|
||||
/* round up to 32-bytes, since this is granularity
|
||||
* of interpreter stack size
|
||||
*/
|
||||
depth += round_up(max_t(u32, env->subprog_stack_depth[subprog], 1), 32);
|
||||
depth += round_up(max_t(u32, subprog[idx].stack_depth, 1), 32);
|
||||
if (depth > MAX_BPF_STACK) {
|
||||
verbose(env, "combined stack size of %d calls is %d. Too large\n",
|
||||
frame + 1, depth);
|
||||
return -EACCES;
|
||||
}
|
||||
continue_func:
|
||||
if (env->subprog_cnt == subprog)
|
||||
subprog_end = insn_cnt;
|
||||
else
|
||||
subprog_end = env->subprog_starts[subprog];
|
||||
subprog_end = subprog[idx + 1].start;
|
||||
for (; i < subprog_end; i++) {
|
||||
if (insn[i].code != (BPF_JMP | BPF_CALL))
|
||||
continue;
|
||||
@ -1516,17 +1524,16 @@ static int check_max_stack_depth(struct bpf_verifier_env *env)
|
||||
continue;
|
||||
/* remember insn and function to return to */
|
||||
ret_insn[frame] = i + 1;
|
||||
ret_prog[frame] = subprog;
|
||||
ret_prog[frame] = idx;
|
||||
|
||||
/* find the callee */
|
||||
i = i + insn[i].imm + 1;
|
||||
subprog = find_subprog(env, i);
|
||||
if (subprog < 0) {
|
||||
idx = find_subprog(env, i);
|
||||
if (idx < 0) {
|
||||
WARN_ONCE(1, "verifier bug. No program starts at insn %d\n",
|
||||
i);
|
||||
return -EFAULT;
|
||||
}
|
||||
subprog++;
|
||||
frame++;
|
||||
if (frame >= MAX_CALL_FRAMES) {
|
||||
WARN_ONCE(1, "verifier bug. Call stack is too deep\n");
|
||||
@ -1539,10 +1546,10 @@ static int check_max_stack_depth(struct bpf_verifier_env *env)
|
||||
*/
|
||||
if (frame == 0)
|
||||
return 0;
|
||||
depth -= round_up(max_t(u32, env->subprog_stack_depth[subprog], 1), 32);
|
||||
depth -= round_up(max_t(u32, subprog[idx].stack_depth, 1), 32);
|
||||
frame--;
|
||||
i = ret_insn[frame];
|
||||
subprog = ret_prog[frame];
|
||||
idx = ret_prog[frame];
|
||||
goto continue_func;
|
||||
}
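The stack accounting in check_max_stack_depth() charges each frame its stack depth rounded up to 32 bytes (with a minimum of one byte before rounding) and rejects a call chain whose total exceeds MAX_BPF_STACK, which is 512 bytes. A minimal userspace sketch of that arithmetic for a single, already-known call chain follows. The names are hypothetical, and the walk over call instructions that discovers the chains is deliberately left out.

```c
#include <stddef.h>

/* Mirrors the kernel's MAX_BPF_STACK limit of 512 bytes. */
#define MAX_BPF_STACK_SKETCH 512u

/* Bytes charged for one frame: at least 1, rounded up to 32. */
static unsigned int frame_cost(unsigned int stack_depth)
{
	unsigned int d = stack_depth ? stack_depth : 1;

	return (d + 31u) & ~31u;
}

/* Return 1 if the frames of one call chain fit in the limit, else 0.
 * depths[] lists the stack depth of each function on the chain.
 */
static int chain_fits(const unsigned int *depths, size_t n)
{
	unsigned int total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		total += frame_cost(depths[i]);
		if (total > MAX_BPF_STACK_SKETCH)
			return 0;
	}
	return 1;
}
```

For example, three nested frames using 200, 200, and 100 bytes are charged 224 + 224 + 128 = 576 bytes and would be rejected, even though their raw sum of 500 is under the limit.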
|
||||
|
||||
@ -1558,8 +1565,7 @@ static int get_callee_stack_depth(struct bpf_verifier_env *env,
|
||||
start);
|
||||
return -EFAULT;
|
||||
}
|
||||
subprog++;
|
||||
return env->subprog_stack_depth[subprog];
|
||||
return env->subprog_info[subprog].stack_depth;
|
||||
}
|
||||
#endif
|
||||
|
||||
@ -1984,6 +1990,12 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
|
||||
} else if (arg_type_is_mem_size(arg_type)) {
|
||||
bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
|
||||
|
||||
/* remember the mem_size which may be used later
|
||||
* to refine return values.
|
||||
*/
|
||||
meta->msize_smax_value = reg->smax_value;
|
||||
meta->msize_umax_value = reg->umax_value;
|
||||
|
||||
/* The register is SCALAR_VALUE; the access check
|
||||
* happens using its boundaries.
|
||||
*/
|
||||
@ -2061,8 +2073,11 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
|
||||
if (func_id != BPF_FUNC_redirect_map)
|
||||
goto error;
|
||||
break;
|
||||
/* Restrict bpf side of cpumap, open when use-cases appear */
|
||||
/* Restrict bpf side of cpumap and xskmap, open when use-cases
|
||||
* appear.
|
||||
*/
|
||||
case BPF_MAP_TYPE_CPUMAP:
|
||||
case BPF_MAP_TYPE_XSKMAP:
|
||||
if (func_id != BPF_FUNC_redirect_map)
|
||||
goto error;
|
||||
break;
|
||||
@ -2087,7 +2102,7 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
|
||||
case BPF_FUNC_tail_call:
|
||||
if (map->map_type != BPF_MAP_TYPE_PROG_ARRAY)
|
||||
goto error;
|
||||
if (env->subprog_cnt) {
|
||||
if (env->subprog_cnt > 1) {
|
||||
verbose(env, "tail_calls are not allowed in programs with bpf-to-bpf calls\n");
|
||||
return -EINVAL;
|
||||
}
|
||||
@ -2109,7 +2124,8 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
|
||||
break;
|
||||
case BPF_FUNC_redirect_map:
|
||||
if (map->map_type != BPF_MAP_TYPE_DEVMAP &&
|
||||
map->map_type != BPF_MAP_TYPE_CPUMAP)
|
||||
map->map_type != BPF_MAP_TYPE_CPUMAP &&
|
||||
map->map_type != BPF_MAP_TYPE_XSKMAP)
|
||||
goto error;
|
||||
break;
|
||||
case BPF_FUNC_sk_redirect_map:
|
||||
@ -2259,7 +2275,7 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
|
||||
/* remember the callsite, it will be used by bpf_exit */
|
||||
*insn_idx /* callsite */,
|
||||
state->curframe + 1 /* frameno within this callchain */,
|
||||
subprog + 1 /* subprog number within this prog */);
|
||||
subprog /* subprog number within this prog */);
|
||||
|
||||
/* copy r1 - r5 args that callee can access */
|
||||
for (i = BPF_REG_1; i <= BPF_REG_5; i++)
|
||||
@ -2323,6 +2339,23 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
|
||||
return 0;
|
||||
}
|
||||
static void do_refine_retval_range(struct bpf_reg_state *regs, int ret_type,
				   int func_id,
				   struct bpf_call_arg_meta *meta)
{
	struct bpf_reg_state *ret_reg = &regs[BPF_REG_0];

	if (ret_type != RET_INTEGER ||
	    (func_id != BPF_FUNC_get_stack &&
	     func_id != BPF_FUNC_probe_read_str))
		return;

	ret_reg->smax_value = meta->msize_smax_value;
	ret_reg->umax_value = meta->msize_umax_value;
	__reg_deduce_bounds(ret_reg);
	__reg_bound_offset(ret_reg);
}
|
||||
|
||||
static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn_idx)
|
||||
{
|
||||
const struct bpf_func_proto *fn = NULL;
|
||||
@ -2446,10 +2479,30 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
do_refine_retval_range(regs, fn->ret_type, func_id, &meta);
|
||||
|
||||
err = check_map_func_compatibility(env, meta.map_ptr, func_id);
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
if (func_id == BPF_FUNC_get_stack && !env->prog->has_callchain_buf) {
|
||||
const char *err_str;
|
||||
|
||||
#ifdef CONFIG_PERF_EVENTS
|
||||
err = get_callchain_buffers(sysctl_perf_event_max_stack);
|
||||
err_str = "cannot get callchain buffer for func %s#%d\n";
|
||||
#else
|
||||
err = -ENOTSUPP;
|
||||
err_str = "func %s#%d not supported without CONFIG_PERF_EVENTS\n";
|
||||
#endif
|
||||
if (err) {
|
||||
verbose(env, err_str, func_id_name(func_id), func_id);
|
||||
return err;
|
||||
}
|
||||
|
||||
env->prog->has_callchain_buf = true;
|
||||
}
|
||||
|
||||
if (changes_data)
|
||||
clear_all_pkt_pointers(env);
|
||||
return 0;
|
||||
@ -2894,10 +2947,7 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
|
||||
dst_reg->umin_value <<= umin_val;
|
||||
dst_reg->umax_value <<= umax_val;
|
||||
}
|
||||
if (src_known)
|
||||
dst_reg->var_off = tnum_lshift(dst_reg->var_off, umin_val);
|
||||
else
|
||||
dst_reg->var_off = tnum_lshift(tnum_unknown, umin_val);
|
||||
dst_reg->var_off = tnum_lshift(dst_reg->var_off, umin_val);
|
||||
/* We may learn something more from the var_off */
|
||||
__update_reg_bounds(dst_reg);
|
||||
break;
|
||||
@ -2925,16 +2975,35 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
|
||||
*/
|
||||
dst_reg->smin_value = S64_MIN;
|
||||
dst_reg->smax_value = S64_MAX;
|
||||
if (src_known)
|
||||
dst_reg->var_off = tnum_rshift(dst_reg->var_off,
|
||||
umin_val);
|
||||
else
|
||||
dst_reg->var_off = tnum_rshift(tnum_unknown, umin_val);
|
||||
dst_reg->var_off = tnum_rshift(dst_reg->var_off, umin_val);
|
||||
dst_reg->umin_value >>= umax_val;
|
||||
dst_reg->umax_value >>= umin_val;
|
||||
/* We may learn something more from the var_off */
|
||||
__update_reg_bounds(dst_reg);
|
||||
break;
|
||||
case BPF_ARSH:
|
||||
if (umax_val >= insn_bitness) {
|
||||
/* Shifts greater than 31 or 63 are undefined.
|
||||
* This includes shifts by a negative number.
|
||||
*/
|
||||
mark_reg_unknown(env, regs, insn->dst_reg);
|
||||
break;
|
||||
}
|
||||
|
||||
/* Upon reaching here, src_known is true and
|
||||
* umax_val is equal to umin_val.
|
||||
*/
|
||||
dst_reg->smin_value >>= umin_val;
|
||||
dst_reg->smax_value >>= umin_val;
|
||||
dst_reg->var_off = tnum_arshift(dst_reg->var_off, umin_val);
|
||||
|
||||
/* blow away the dst_reg umin_value/umax_value and rely on
|
||||
* dst_reg var_off to refine the result.
|
||||
*/
|
||||
dst_reg->umin_value = 0;
|
||||
dst_reg->umax_value = U64_MAX;
|
||||
__update_reg_bounds(dst_reg);
|
||||
break;
|
||||
default:
|
||||
mark_reg_unknown(env, regs, insn->dst_reg);
|
||||
break;
|
||||
@ -3818,7 +3887,12 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
if (env->subprog_cnt) {
|
||||
if (!env->ops->gen_ld_abs) {
|
||||
verbose(env, "bpf verifier is misconfigured\n");
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
if (env->subprog_cnt > 1) {
|
||||
/* when program has LD_ABS insn JITs and interpreter assume
|
||||
* that r1 == ctx == skb which is not the case for callees
|
||||
* that can have arbitrary arguments. It's problematic
|
||||
@ -4849,15 +4923,15 @@ static int do_check(struct bpf_verifier_env *env)
|
||||
|
||||
verbose(env, "processed %d insns (limit %d), stack depth ",
|
||||
insn_processed, BPF_COMPLEXITY_LIMIT_INSNS);
|
||||
for (i = 0; i < env->subprog_cnt + 1; i++) {
|
||||
u32 depth = env->subprog_stack_depth[i];
|
||||
for (i = 0; i < env->subprog_cnt; i++) {
|
||||
u32 depth = env->subprog_info[i].stack_depth;
|
||||
|
||||
verbose(env, "%d", depth);
|
||||
if (i + 1 < env->subprog_cnt + 1)
|
||||
if (i + 1 < env->subprog_cnt)
|
||||
verbose(env, "+");
|
||||
}
|
||||
verbose(env, "\n");
|
||||
env->prog->aux->stack_depth = env->subprog_stack_depth[0];
|
||||
env->prog->aux->stack_depth = env->subprog_info[0].stack_depth;
|
||||
return 0;
|
||||
}
|
||||
|
||||
@ -4981,7 +5055,7 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
|
||||
/* hold the map. If the program is rejected by verifier,
|
||||
* the map will be released by release_maps() or it
|
||||
* will be used by the valid program until it's unloaded
|
||||
* and all maps are released in free_bpf_prog_info()
|
||||
* and all maps are released in free_used_maps()
|
||||
*/
|
||||
map = bpf_map_inc(map, false);
|
||||
if (IS_ERR(map)) {
|
||||
@ -5063,10 +5137,11 @@ static void adjust_subprog_starts(struct bpf_verifier_env *env, u32 off, u32 len
|
||||
|
||||
if (len == 1)
|
||||
return;
|
||||
for (i = 0; i < env->subprog_cnt; i++) {
|
||||
if (env->subprog_starts[i] < off)
|
||||
/* NOTE: fake 'exit' subprog should be updated as well. */
|
||||
for (i = 0; i <= env->subprog_cnt; i++) {
|
||||
if (env->subprog_info[i].start < off)
|
||||
continue;
|
||||
env->subprog_starts[i] += len - 1;
|
||||
env->subprog_info[i].start += len - 1;
|
||||
}
|
||||
}
|
||||
|
||||
@ -5230,7 +5305,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
|
||||
void *old_bpf_func;
|
||||
int err = -ENOMEM;
|
||||
|
||||
if (env->subprog_cnt == 0)
|
||||
if (env->subprog_cnt <= 1)
|
||||
return 0;
|
||||
|
||||
for (i = 0, insn = prog->insnsi; i < prog->len; i++, insn++) {
|
||||
@ -5246,7 +5321,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
|
||||
/* temporarily remember subprog id inside insn instead of
|
||||
* aux_data, since next loop will split up all insns into funcs
|
||||
*/
|
||||
insn->off = subprog + 1;
|
||||
insn->off = subprog;
|
||||
/* remember original imm in case JIT fails and fallback
|
||||
* to interpreter will be needed
|
||||
*/
|
||||
@ -5255,16 +5330,13 @@ static int jit_subprogs(struct bpf_verifier_env *env)
|
||||
insn->imm = 1;
|
||||
}
|
||||
|
||||
func = kzalloc(sizeof(prog) * (env->subprog_cnt + 1), GFP_KERNEL);
|
||||
func = kzalloc(sizeof(prog) * env->subprog_cnt, GFP_KERNEL);
|
||||
if (!func)
|
||||
return -ENOMEM;
|
||||
|
||||
for (i = 0; i <= env->subprog_cnt; i++) {
|
||||
for (i = 0; i < env->subprog_cnt; i++) {
|
||||
subprog_start = subprog_end;
|
||||
if (env->subprog_cnt == i)
|
||||
subprog_end = prog->len;
|
||||
else
|
||||
subprog_end = env->subprog_starts[i];
|
||||
subprog_end = env->subprog_info[i + 1].start;
|
||||
|
||||
len = subprog_end - subprog_start;
|
||||
func[i] = bpf_prog_alloc(bpf_prog_size(len), GFP_USER);
|
||||
@ -5281,7 +5353,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
|
||||
* Long term would need debug info to populate names
|
||||
*/
|
||||
func[i]->aux->name[0] = 'F';
|
||||
func[i]->aux->stack_depth = env->subprog_stack_depth[i];
|
||||
func[i]->aux->stack_depth = env->subprog_info[i].stack_depth;
|
||||
func[i]->jit_requested = 1;
|
||||
func[i] = bpf_int_jit_compile(func[i]);
|
||||
if (!func[i]->jited) {
|
||||
@ -5294,7 +5366,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
|
||||
* now populate all bpf_calls with correct addresses and
|
||||
* run last pass of JIT
|
||||
*/
|
||||
for (i = 0; i <= env->subprog_cnt; i++) {
|
||||
for (i = 0; i < env->subprog_cnt; i++) {
|
||||
insn = func[i]->insnsi;
|
||||
for (j = 0; j < func[i]->len; j++, insn++) {
|
||||
if (insn->code != (BPF_JMP | BPF_CALL) ||
|
||||
@ -5307,7 +5379,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
|
||||
__bpf_call_base;
|
||||
}
|
||||
}
|
||||
for (i = 0; i <= env->subprog_cnt; i++) {
|
||||
for (i = 0; i < env->subprog_cnt; i++) {
|
||||
old_bpf_func = func[i]->bpf_func;
|
||||
tmp = bpf_int_jit_compile(func[i]);
|
||||
if (tmp != func[i] || func[i]->bpf_func != old_bpf_func) {
|
||||
@ -5321,7 +5393,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
|
||||
/* finally lock prog and jit images for all functions and
|
||||
* populate kallsyms
|
||||
*/
|
||||
for (i = 0; i <= env->subprog_cnt; i++) {
|
||||
for (i = 0; i < env->subprog_cnt; i++) {
|
||||
bpf_prog_lock_ro(func[i]);
|
||||
bpf_prog_kallsyms_add(func[i]);
|
||||
}
|
||||
@ -5338,7 +5410,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
|
||||
continue;
|
||||
insn->off = env->insn_aux_data[i].call_imm;
|
||||
subprog = find_subprog(env, i + insn->off + 1);
|
||||
addr = (unsigned long)func[subprog + 1]->bpf_func;
|
||||
addr = (unsigned long)func[subprog]->bpf_func;
|
||||
addr &= PAGE_MASK;
|
||||
insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
|
||||
addr - __bpf_call_base;
|
||||
@ -5347,10 +5419,10 @@ static int jit_subprogs(struct bpf_verifier_env *env)
|
||||
prog->jited = 1;
|
||||
prog->bpf_func = func[0]->bpf_func;
|
||||
prog->aux->func = func;
|
||||
prog->aux->func_cnt = env->subprog_cnt + 1;
|
||||
prog->aux->func_cnt = env->subprog_cnt;
|
||||
return 0;
|
||||
out_free:
|
||||
for (i = 0; i <= env->subprog_cnt; i++)
|
||||
for (i = 0; i < env->subprog_cnt; i++)
|
||||
if (func[i])
|
||||
bpf_jit_free(func[i]);
|
||||
kfree(func);
|
||||
@ -5453,6 +5525,25 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
|
||||
continue;
|
||||
}
|
||||
|
||||
if (BPF_CLASS(insn->code) == BPF_LD &&
|
||||
(BPF_MODE(insn->code) == BPF_ABS ||
|
||||
BPF_MODE(insn->code) == BPF_IND)) {
|
||||
cnt = env->ops->gen_ld_abs(insn, insn_buf);
|
||||
if (cnt == 0 || cnt >= ARRAY_SIZE(insn_buf)) {
|
||||
verbose(env, "bpf verifier is misconfigured\n");
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
|
||||
if (!new_prog)
|
||||
return -ENOMEM;
|
||||
|
||||
delta += cnt - 1;
|
||||
env->prog = prog = new_prog;
|
||||
insn = new_prog->insnsi + i + delta;
|
||||
continue;
|
||||
}
|
||||
|
||||
if (insn->code != (BPF_JMP | BPF_CALL))
|
||||
continue;
|
||||
if (insn->src_reg == BPF_PSEUDO_CALL)
|
||||
@ -5650,16 +5741,16 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr)
|
||||
if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
|
||||
env->strict_alignment = true;
|
||||
|
||||
if (bpf_prog_is_dev_bound(env->prog->aux)) {
|
||||
ret = bpf_prog_offload_verifier_prep(env);
|
||||
if (ret)
|
||||
goto err_unlock;
|
||||
}
|
||||
|
||||
ret = replace_map_fd_with_map_ptr(env);
|
||||
if (ret < 0)
|
||||
goto skip_full_check;
|
||||
|
||||
if (bpf_prog_is_dev_bound(env->prog->aux)) {
|
||||
ret = bpf_prog_offload_verifier_prep(env);
|
||||
if (ret)
|
||||
goto skip_full_check;
|
||||
}
|
||||
|
||||
env->explored_states = kcalloc(env->prog->len,
|
||||
sizeof(struct bpf_verifier_state_list *),
|
||||
GFP_USER);
|
||||
@ -5730,7 +5821,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr)
|
||||
err_release_maps:
|
||||
if (!env->prog->aux->used_maps)
|
||||
/* if we didn't copy map pointers into bpf_prog_info, release
|
||||
* them now. Otherwise free_bpf_prog_info() will release them.
|
||||
* them now. Otherwise free_used_maps() will release them.
|
||||
*/
|
||||
release_maps(env);
|
||||
*prog = env->prog;
|
||||
|
241
kernel/bpf/xskmap.c
Normal file
@ -0,0 +1,241 @@
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
/* XSKMAP used for AF_XDP sockets
|
||||
* Copyright(c) 2018 Intel Corporation.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify it
|
||||
* under the terms and conditions of the GNU General Public License,
|
||||
* version 2, as published by the Free Software Foundation.
|
||||
*
|
||||
* This program is distributed in the hope it will be useful, but WITHOUT
|
||||
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
|
||||
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
|
||||
* more details.
|
||||
*/
|
||||
|
||||
#include <linux/bpf.h>
|
||||
#include <linux/capability.h>
|
||||
#include <net/xdp_sock.h>
|
||||
#include <linux/slab.h>
|
||||
#include <linux/sched.h>
|
||||
|
||||
struct xsk_map {
|
||||
struct bpf_map map;
|
||||
struct xdp_sock **xsk_map;
|
||||
struct list_head __percpu *flush_list;
|
||||
};
|
||||
|
||||
static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
{
	int cpu, err = -EINVAL;
	struct xsk_map *m;
	u64 cost;

	if (!capable(CAP_NET_ADMIN))
		return ERR_PTR(-EPERM);

	if (attr->max_entries == 0 || attr->key_size != 4 ||
	    attr->value_size != 4 ||
	    attr->map_flags & ~(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY))
		return ERR_PTR(-EINVAL);

	m = kzalloc(sizeof(*m), GFP_USER);
	if (!m)
		return ERR_PTR(-ENOMEM);

	bpf_map_init_from_attr(&m->map, attr);

	cost = (u64)m->map.max_entries * sizeof(struct xdp_sock *);
	cost += sizeof(struct list_head) * num_possible_cpus();
	if (cost >= U32_MAX - PAGE_SIZE)
		goto free_m;

	m->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;

	/* Note: returns -EPERM if map size is larger than memlock limit */
	err = bpf_map_precharge_memlock(m->map.pages);
	if (err)
		goto free_m;

	err = -ENOMEM;

	m->flush_list = alloc_percpu(struct list_head);
	if (!m->flush_list)
		goto free_m;

	for_each_possible_cpu(cpu)
		INIT_LIST_HEAD(per_cpu_ptr(m->flush_list, cpu));

	m->xsk_map = bpf_map_area_alloc(m->map.max_entries *
					sizeof(struct xdp_sock *),
					m->map.numa_node);
	if (!m->xsk_map)
		goto free_percpu;
	return &m->map;

free_percpu:
	free_percpu(m->flush_list);
free_m:
	kfree(m);
	return ERR_PTR(err);
}

static void xsk_map_free(struct bpf_map *map)
{
	struct xsk_map *m = container_of(map, struct xsk_map, map);
	int i;

	synchronize_net();

	for (i = 0; i < map->max_entries; i++) {
		struct xdp_sock *xs;

		xs = m->xsk_map[i];
		if (!xs)
			continue;

		sock_put((struct sock *)xs);
	}

	free_percpu(m->flush_list);
	bpf_map_area_free(m->xsk_map);
	kfree(m);
}

static int xsk_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
{
	struct xsk_map *m = container_of(map, struct xsk_map, map);
	u32 index = key ? *(u32 *)key : U32_MAX;
	u32 *next = next_key;

	if (index >= m->map.max_entries) {
		*next = 0;
		return 0;
	}

	if (index == m->map.max_entries - 1)
		return -ENOENT;
	*next = index + 1;
	return 0;
}

struct xdp_sock *__xsk_map_lookup_elem(struct bpf_map *map, u32 key)
{
	struct xsk_map *m = container_of(map, struct xsk_map, map);
	struct xdp_sock *xs;

	if (key >= map->max_entries)
		return NULL;

	xs = READ_ONCE(m->xsk_map[key]);
	return xs;
}

int __xsk_map_redirect(struct bpf_map *map, struct xdp_buff *xdp,
		       struct xdp_sock *xs)
{
	struct xsk_map *m = container_of(map, struct xsk_map, map);
	struct list_head *flush_list = this_cpu_ptr(m->flush_list);
	int err;

	err = xsk_rcv(xs, xdp);
	if (err)
		return err;

	if (!xs->flush_node.prev)
		list_add(&xs->flush_node, flush_list);

	return 0;
}

void __xsk_map_flush(struct bpf_map *map)
{
	struct xsk_map *m = container_of(map, struct xsk_map, map);
	struct list_head *flush_list = this_cpu_ptr(m->flush_list);
	struct xdp_sock *xs, *tmp;

	list_for_each_entry_safe(xs, tmp, flush_list, flush_node) {
		xsk_flush(xs);
		__list_del(xs->flush_node.prev, xs->flush_node.next);
		xs->flush_node.prev = NULL;
	}
}

static void *xsk_map_lookup_elem(struct bpf_map *map, void *key)
{
	return NULL;
}

static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
			       u64 map_flags)
{
	struct xsk_map *m = container_of(map, struct xsk_map, map);
	u32 i = *(u32 *)key, fd = *(u32 *)value;
	struct xdp_sock *xs, *old_xs;
	struct socket *sock;
	int err;

	if (unlikely(map_flags > BPF_EXIST))
		return -EINVAL;
	if (unlikely(i >= m->map.max_entries))
		return -E2BIG;
	if (unlikely(map_flags == BPF_NOEXIST))
		return -EEXIST;

	sock = sockfd_lookup(fd, &err);
	if (!sock)
		return err;

	if (sock->sk->sk_family != PF_XDP) {
		sockfd_put(sock);
		return -EOPNOTSUPP;
	}

	xs = (struct xdp_sock *)sock->sk;

	if (!xsk_is_setup_for_bpf_map(xs)) {
		sockfd_put(sock);
		return -EOPNOTSUPP;
	}

	sock_hold(sock->sk);

	old_xs = xchg(&m->xsk_map[i], xs);
	if (old_xs) {
		/* Make sure we've flushed everything. */
		synchronize_net();
		sock_put((struct sock *)old_xs);
	}

	sockfd_put(sock);
	return 0;
}

static int xsk_map_delete_elem(struct bpf_map *map, void *key)
{
	struct xsk_map *m = container_of(map, struct xsk_map, map);
	struct xdp_sock *old_xs;
	int k = *(u32 *)key;

	if (k >= map->max_entries)
		return -EINVAL;

	old_xs = xchg(&m->xsk_map[k], NULL);
	if (old_xs) {
		/* Make sure we've flushed everything. */
		synchronize_net();
		sock_put((struct sock *)old_xs);
	}

	return 0;
}

const struct bpf_map_ops xsk_map_ops = {
	.map_alloc = xsk_map_alloc,
	.map_free = xsk_map_free,
	.map_get_next_key = xsk_map_get_next_key,
	.map_lookup_elem = xsk_map_lookup_elem,
	.map_update_elem = xsk_map_update_elem,
	.map_delete_elem = xsk_map_delete_elem,
};
@ -20,6 +20,7 @@
#include "trace.h"

u64 bpf_get_stackid(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
u64 bpf_get_stack(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);

/**
 * trace_call_bpf - invoke BPF program
@ -474,8 +475,6 @@ BPF_CALL_2(bpf_current_task_under_cgroup, struct bpf_map *, map, u32, idx)
	struct bpf_array *array = container_of(map, struct bpf_array, map);
	struct cgroup *cgrp;

	if (unlikely(in_interrupt()))
		return -EINVAL;
	if (unlikely(idx >= array->map.max_entries))
		return -E2BIG;

@ -577,6 +576,8 @@ kprobe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
		return &bpf_perf_event_output_proto;
	case BPF_FUNC_get_stackid:
		return &bpf_get_stackid_proto;
	case BPF_FUNC_get_stack:
		return &bpf_get_stack_proto;
	case BPF_FUNC_perf_event_read_value:
		return &bpf_perf_event_read_value_proto;
#ifdef CONFIG_BPF_KPROBE_OVERRIDE
@ -664,6 +665,25 @@ static const struct bpf_func_proto bpf_get_stackid_proto_tp = {
	.arg3_type = ARG_ANYTHING,
};

BPF_CALL_4(bpf_get_stack_tp, void *, tp_buff, void *, buf, u32, size,
	   u64, flags)
{
	struct pt_regs *regs = *(struct pt_regs **)tp_buff;

	return bpf_get_stack((unsigned long) regs, (unsigned long) buf,
			     (unsigned long) size, flags, 0);
}

static const struct bpf_func_proto bpf_get_stack_proto_tp = {
	.func = bpf_get_stack_tp,
	.gpl_only = true,
	.ret_type = RET_INTEGER,
	.arg1_type = ARG_PTR_TO_CTX,
	.arg2_type = ARG_PTR_TO_UNINIT_MEM,
	.arg3_type = ARG_CONST_SIZE_OR_ZERO,
	.arg4_type = ARG_ANYTHING,
};

static const struct bpf_func_proto *
tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
@ -672,6 +692,8 @@ tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
		return &bpf_perf_event_output_proto_tp;
	case BPF_FUNC_get_stackid:
		return &bpf_get_stackid_proto_tp;
	case BPF_FUNC_get_stack:
		return &bpf_get_stack_proto_tp;
	default:
		return tracing_func_proto(func_id, prog);
	}
@ -734,6 +756,8 @@ pe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
		return &bpf_perf_event_output_proto_tp;
	case BPF_FUNC_get_stackid:
		return &bpf_get_stackid_proto_tp;
	case BPF_FUNC_get_stack:
		return &bpf_get_stack_proto_tp;
	case BPF_FUNC_perf_prog_read_value:
		return &bpf_perf_prog_read_value_proto;
	default:
@ -744,7 +768,7 @@ pe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
/*
 * bpf_raw_tp_regs are separate from bpf_pt_regs used from skb/xdp
 * to avoid potential recursive reuse issue when/if tracepoints are added
 * inside bpf_*_event_output and/or bpf_get_stack_id
 * inside bpf_*_event_output, bpf_get_stackid and/or bpf_get_stack
 */
static DEFINE_PER_CPU(struct pt_regs, bpf_raw_tp_regs);
BPF_CALL_5(bpf_perf_event_output_raw_tp, struct bpf_raw_tracepoint_args *, args,
@ -787,6 +811,26 @@ static const struct bpf_func_proto bpf_get_stackid_proto_raw_tp = {
	.arg3_type = ARG_ANYTHING,
};

BPF_CALL_4(bpf_get_stack_raw_tp, struct bpf_raw_tracepoint_args *, args,
	   void *, buf, u32, size, u64, flags)
{
	struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);

	perf_fetch_caller_regs(regs);
	return bpf_get_stack((unsigned long) regs, (unsigned long) buf,
			     (unsigned long) size, flags, 0);
}

static const struct bpf_func_proto bpf_get_stack_proto_raw_tp = {
	.func = bpf_get_stack_raw_tp,
	.gpl_only = true,
	.ret_type = RET_INTEGER,
	.arg1_type = ARG_PTR_TO_CTX,
	.arg2_type = ARG_PTR_TO_MEM,
	.arg3_type = ARG_CONST_SIZE_OR_ZERO,
	.arg4_type = ARG_ANYTHING,
};

static const struct bpf_func_proto *
raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
@ -795,6 +839,8 @@ raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
		return &bpf_perf_event_output_proto_raw_tp;
	case BPF_FUNC_get_stackid:
		return &bpf_get_stackid_proto_raw_tp;
	case BPF_FUNC_get_stack:
		return &bpf_get_stack_proto_raw_tp;
	default:
		return tracing_func_proto(func_id, prog);
	}
570
lib/test_bpf.c
@ -386,116 +386,6 @@ static int bpf_fill_ld_abs_get_processor_id(struct bpf_test *self)
	return 0;
}

#define PUSH_CNT 68
/* test: {skb->data[0], vlan_push} x 68 + {skb->data[0], vlan_pop} x 68 */
static int bpf_fill_ld_abs_vlan_push_pop(struct bpf_test *self)
{
	unsigned int len = BPF_MAXINSNS;
	struct bpf_insn *insn;
	int i = 0, j, k = 0;

	insn = kmalloc_array(len, sizeof(*insn), GFP_KERNEL);
	if (!insn)
		return -ENOMEM;

	insn[i++] = BPF_MOV64_REG(R6, R1);
loop:
	for (j = 0; j < PUSH_CNT; j++) {
		insn[i++] = BPF_LD_ABS(BPF_B, 0);
		insn[i] = BPF_JMP_IMM(BPF_JNE, R0, 0x34, len - i - 2);
		i++;
		insn[i++] = BPF_MOV64_REG(R1, R6);
		insn[i++] = BPF_MOV64_IMM(R2, 1);
		insn[i++] = BPF_MOV64_IMM(R3, 2);
		insn[i++] = BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
					 bpf_skb_vlan_push_proto.func - __bpf_call_base);
		insn[i] = BPF_JMP_IMM(BPF_JNE, R0, 0, len - i - 2);
		i++;
	}

	for (j = 0; j < PUSH_CNT; j++) {
		insn[i++] = BPF_LD_ABS(BPF_B, 0);
		insn[i] = BPF_JMP_IMM(BPF_JNE, R0, 0x34, len - i - 2);
		i++;
		insn[i++] = BPF_MOV64_REG(R1, R6);
		insn[i++] = BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
					 bpf_skb_vlan_pop_proto.func - __bpf_call_base);
		insn[i] = BPF_JMP_IMM(BPF_JNE, R0, 0, len - i - 2);
		i++;
	}
	if (++k < 5)
		goto loop;

	for (; i < len - 1; i++)
		insn[i] = BPF_ALU32_IMM(BPF_MOV, R0, 0xbef);

	insn[len - 1] = BPF_EXIT_INSN();

	self->u.ptr.insns = insn;
	self->u.ptr.len = len;

	return 0;
}

static int bpf_fill_ld_abs_vlan_push_pop2(struct bpf_test *self)
{
	struct bpf_insn *insn;

	insn = kmalloc_array(16, sizeof(*insn), GFP_KERNEL);
	if (!insn)
		return -ENOMEM;

	/* Due to func address being non-const, we need to
	 * assemble this here.
	 */
	insn[0] = BPF_MOV64_REG(R6, R1);
	insn[1] = BPF_LD_ABS(BPF_B, 0);
	insn[2] = BPF_LD_ABS(BPF_H, 0);
	insn[3] = BPF_LD_ABS(BPF_W, 0);
	insn[4] = BPF_MOV64_REG(R7, R6);
	insn[5] = BPF_MOV64_IMM(R6, 0);
	insn[6] = BPF_MOV64_REG(R1, R7);
	insn[7] = BPF_MOV64_IMM(R2, 1);
	insn[8] = BPF_MOV64_IMM(R3, 2);
	insn[9] = BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
			       bpf_skb_vlan_push_proto.func - __bpf_call_base);
	insn[10] = BPF_MOV64_REG(R6, R7);
	insn[11] = BPF_LD_ABS(BPF_B, 0);
	insn[12] = BPF_LD_ABS(BPF_H, 0);
	insn[13] = BPF_LD_ABS(BPF_W, 0);
	insn[14] = BPF_MOV64_IMM(R0, 42);
	insn[15] = BPF_EXIT_INSN();

	self->u.ptr.insns = insn;
	self->u.ptr.len = 16;

	return 0;
}

static int bpf_fill_jump_around_ld_abs(struct bpf_test *self)
{
	unsigned int len = BPF_MAXINSNS;
	struct bpf_insn *insn;
	int i = 0;

	insn = kmalloc_array(len, sizeof(*insn), GFP_KERNEL);
	if (!insn)
		return -ENOMEM;

	insn[i++] = BPF_MOV64_REG(R6, R1);
	insn[i++] = BPF_LD_ABS(BPF_B, 0);
	insn[i] = BPF_JMP_IMM(BPF_JEQ, R0, 10, len - i - 2);
	i++;
	while (i < len - 1)
		insn[i++] = BPF_LD_ABS(BPF_B, 1);
	insn[i] = BPF_EXIT_INSN();

	self->u.ptr.insns = insn;
	self->u.ptr.len = len;

	return 0;
}

static int __bpf_fill_stxdw(struct bpf_test *self, int size)
{
	unsigned int len = BPF_MAXINSNS;
@ -1987,40 +1877,6 @@ static struct bpf_test tests[] = {
		{ },
		{ { 0, -1 } }
	},
	{
		"INT: DIV + ABS",
		.u.insns_int = {
			BPF_ALU64_REG(BPF_MOV, R6, R1),
			BPF_LD_ABS(BPF_B, 3),
			BPF_ALU64_IMM(BPF_MOV, R2, 2),
			BPF_ALU32_REG(BPF_DIV, R0, R2),
			BPF_ALU64_REG(BPF_MOV, R8, R0),
			BPF_LD_ABS(BPF_B, 4),
			BPF_ALU64_REG(BPF_ADD, R8, R0),
			BPF_LD_IND(BPF_B, R8, -70),
			BPF_EXIT_INSN(),
		},
		INTERNAL,
		{ 10, 20, 30, 40, 50 },
		{ { 4, 0 }, { 5, 10 } }
	},
	{
		/* This one doesn't go through verifier, but is just raw insn
		 * as opposed to cBPF tests from here. Thus div by 0 tests are
		 * done in test_verifier in BPF kselftests.
		 */
		"INT: DIV by -1",
		.u.insns_int = {
			BPF_ALU64_REG(BPF_MOV, R6, R1),
			BPF_ALU64_IMM(BPF_MOV, R7, -1),
			BPF_LD_ABS(BPF_B, 3),
			BPF_ALU32_REG(BPF_DIV, R0, R7),
			BPF_EXIT_INSN(),
		},
		INTERNAL,
		{ 10, 20, 30, 40, 50 },
		{ { 3, 0 }, { 4, 0 } }
	},
	{
		"check: missing ret",
		.u.insns = {
@ -2383,50 +2239,6 @@ static struct bpf_test tests[] = {
		{ },
		{ { 0, 1 } }
	},
	{
		"nmap reduced",
		.u.insns_int = {
			BPF_MOV64_REG(R6, R1),
			BPF_LD_ABS(BPF_H, 12),
			BPF_JMP_IMM(BPF_JNE, R0, 0x806, 28),
			BPF_LD_ABS(BPF_H, 12),
			BPF_JMP_IMM(BPF_JNE, R0, 0x806, 26),
			BPF_MOV32_IMM(R0, 18),
			BPF_STX_MEM(BPF_W, R10, R0, -64),
			BPF_LDX_MEM(BPF_W, R7, R10, -64),
			BPF_LD_IND(BPF_W, R7, 14),
			BPF_STX_MEM(BPF_W, R10, R0, -60),
			BPF_MOV32_IMM(R0, 280971478),
			BPF_STX_MEM(BPF_W, R10, R0, -56),
			BPF_LDX_MEM(BPF_W, R7, R10, -56),
			BPF_LDX_MEM(BPF_W, R0, R10, -60),
			BPF_ALU32_REG(BPF_SUB, R0, R7),
			BPF_JMP_IMM(BPF_JNE, R0, 0, 15),
			BPF_LD_ABS(BPF_H, 12),
			BPF_JMP_IMM(BPF_JNE, R0, 0x806, 13),
			BPF_MOV32_IMM(R0, 22),
			BPF_STX_MEM(BPF_W, R10, R0, -56),
			BPF_LDX_MEM(BPF_W, R7, R10, -56),
			BPF_LD_IND(BPF_H, R7, 14),
			BPF_STX_MEM(BPF_W, R10, R0, -52),
			BPF_MOV32_IMM(R0, 17366),
			BPF_STX_MEM(BPF_W, R10, R0, -48),
			BPF_LDX_MEM(BPF_W, R7, R10, -48),
			BPF_LDX_MEM(BPF_W, R0, R10, -52),
			BPF_ALU32_REG(BPF_SUB, R0, R7),
			BPF_JMP_IMM(BPF_JNE, R0, 0, 2),
			BPF_MOV32_IMM(R0, 256),
			BPF_EXIT_INSN(),
			BPF_MOV32_IMM(R0, 0),
			BPF_EXIT_INSN(),
		},
		INTERNAL,
		{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x08, 0x06, 0, 0,
		  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
		  0x10, 0xbf, 0x48, 0xd6, 0x43, 0xd6},
		{ { 38, 256 } },
		.stack_depth = 64,
	},
	/* BPF_ALU | BPF_MOV | BPF_X */
	{
		"ALU_MOV_X: dst = 2",
@ -5485,22 +5297,6 @@ static struct bpf_test tests[] = {
		{ { 1, 0xbee } },
		.fill_helper = bpf_fill_ld_abs_get_processor_id,
	},
	{
		"BPF_MAXINSNS: ld_abs+vlan_push/pop",
		{ },
		INTERNAL,
		{ 0x34 },
		{ { ETH_HLEN, 0xbef } },
		.fill_helper = bpf_fill_ld_abs_vlan_push_pop,
	},
	{
		"BPF_MAXINSNS: jump around ld_abs",
		{ },
		INTERNAL,
		{ 10, 11 },
		{ { 2, 10 } },
		.fill_helper = bpf_fill_jump_around_ld_abs,
	},
	/*
	 * LD_IND / LD_ABS on fragmented SKBs
	 */
@ -5682,6 +5478,53 @@ static struct bpf_test tests[] = {
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x40, 0x05 } },
	},
	{
		"LD_IND byte positive offset, all ff",
		.u.insns = {
			BPF_STMT(BPF_LDX | BPF_IMM, 0x3e),
			BPF_STMT(BPF_LD | BPF_IND | BPF_B, 0x1),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0xff, [0x3d] = 0xff, [0x3e] = 0xff, [0x3f] = 0xff },
		{ {0x40, 0xff } },
	},
	{
		"LD_IND byte positive offset, out of bounds",
		.u.insns = {
			BPF_STMT(BPF_LDX | BPF_IMM, 0x3e),
			BPF_STMT(BPF_LD | BPF_IND | BPF_B, 0x1),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x3f, 0 }, },
	},
	{
		"LD_IND byte negative offset, out of bounds",
		.u.insns = {
			BPF_STMT(BPF_LDX | BPF_IMM, 0x3e),
			BPF_STMT(BPF_LD | BPF_IND | BPF_B, -0x3f),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x3f, 0 } },
	},
	{
		"LD_IND byte negative offset, multiple calls",
		.u.insns = {
			BPF_STMT(BPF_LDX | BPF_IMM, 0x3b),
			BPF_STMT(BPF_LD | BPF_IND | BPF_B, SKF_LL_OFF + 1),
			BPF_STMT(BPF_LD | BPF_IND | BPF_B, SKF_LL_OFF + 2),
			BPF_STMT(BPF_LD | BPF_IND | BPF_B, SKF_LL_OFF + 3),
			BPF_STMT(BPF_LD | BPF_IND | BPF_B, SKF_LL_OFF + 4),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x40, 0x82 }, },
	},
	{
		"LD_IND halfword positive offset",
		.u.insns = {
@ -5730,6 +5573,39 @@ static struct bpf_test tests[] = {
		},
		{ {0x40, 0x66cc } },
	},
	{
		"LD_IND halfword positive offset, all ff",
		.u.insns = {
			BPF_STMT(BPF_LDX | BPF_IMM, 0x3d),
			BPF_STMT(BPF_LD | BPF_IND | BPF_H, 0x1),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0xff, [0x3d] = 0xff, [0x3e] = 0xff, [0x3f] = 0xff },
		{ {0x40, 0xffff } },
	},
	{
		"LD_IND halfword positive offset, out of bounds",
		.u.insns = {
			BPF_STMT(BPF_LDX | BPF_IMM, 0x3e),
			BPF_STMT(BPF_LD | BPF_IND | BPF_H, 0x1),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x3f, 0 }, },
	},
	{
		"LD_IND halfword negative offset, out of bounds",
		.u.insns = {
			BPF_STMT(BPF_LDX | BPF_IMM, 0x3e),
			BPF_STMT(BPF_LD | BPF_IND | BPF_H, -0x3f),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x3f, 0 } },
	},
	{
		"LD_IND word positive offset",
		.u.insns = {
@ -5820,6 +5696,39 @@ static struct bpf_test tests[] = {
		},
		{ {0x40, 0x66cc77dd } },
	},
	{
		"LD_IND word positive offset, all ff",
		.u.insns = {
			BPF_STMT(BPF_LDX | BPF_IMM, 0x3b),
			BPF_STMT(BPF_LD | BPF_IND | BPF_W, 0x1),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0xff, [0x3d] = 0xff, [0x3e] = 0xff, [0x3f] = 0xff },
		{ {0x40, 0xffffffff } },
	},
	{
		"LD_IND word positive offset, out of bounds",
		.u.insns = {
			BPF_STMT(BPF_LDX | BPF_IMM, 0x3e),
			BPF_STMT(BPF_LD | BPF_IND | BPF_W, 0x1),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x3f, 0 }, },
	},
	{
		"LD_IND word negative offset, out of bounds",
		.u.insns = {
			BPF_STMT(BPF_LDX | BPF_IMM, 0x3e),
			BPF_STMT(BPF_LD | BPF_IND | BPF_W, -0x3f),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x3f, 0 } },
	},
	{
		"LD_ABS byte",
		.u.insns = {
@ -5837,6 +5746,68 @@ static struct bpf_test tests[] = {
		},
		{ {0x40, 0xcc } },
	},
	{
		"LD_ABS byte positive offset, all ff",
		.u.insns = {
			BPF_STMT(BPF_LD | BPF_ABS | BPF_B, 0x3f),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0xff, [0x3d] = 0xff, [0x3e] = 0xff, [0x3f] = 0xff },
		{ {0x40, 0xff } },
	},
	{
		"LD_ABS byte positive offset, out of bounds",
		.u.insns = {
			BPF_STMT(BPF_LD | BPF_ABS | BPF_B, 0x3f),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x3f, 0 }, },
	},
	{
		"LD_ABS byte negative offset, out of bounds load",
		.u.insns = {
			BPF_STMT(BPF_LD | BPF_ABS | BPF_B, -1),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC | FLAG_EXPECTED_FAIL,
		.expected_errcode = -EINVAL,
	},
	{
		"LD_ABS byte negative offset, in bounds",
		.u.insns = {
			BPF_STMT(BPF_LD | BPF_ABS | BPF_B, SKF_LL_OFF + 0x3f),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x40, 0x82 }, },
	},
	{
		"LD_ABS byte negative offset, out of bounds",
		.u.insns = {
			BPF_STMT(BPF_LD | BPF_ABS | BPF_B, SKF_LL_OFF + 0x3f),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x3f, 0 }, },
	},
	{
		"LD_ABS byte negative offset, multiple calls",
		.u.insns = {
			BPF_STMT(BPF_LD | BPF_ABS | BPF_B, SKF_LL_OFF + 0x3c),
			BPF_STMT(BPF_LD | BPF_ABS | BPF_B, SKF_LL_OFF + 0x3d),
			BPF_STMT(BPF_LD | BPF_ABS | BPF_B, SKF_LL_OFF + 0x3e),
			BPF_STMT(BPF_LD | BPF_ABS | BPF_B, SKF_LL_OFF + 0x3f),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x40, 0x82 }, },
	},
	{
		"LD_ABS halfword",
		.u.insns = {
@ -5871,6 +5842,55 @@ static struct bpf_test tests[] = {
		},
		{ {0x40, 0x99ff } },
	},
	{
		"LD_ABS halfword positive offset, all ff",
		.u.insns = {
			BPF_STMT(BPF_LD | BPF_ABS | BPF_H, 0x3e),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0xff, [0x3d] = 0xff, [0x3e] = 0xff, [0x3f] = 0xff },
		{ {0x40, 0xffff } },
	},
	{
		"LD_ABS halfword positive offset, out of bounds",
		.u.insns = {
			BPF_STMT(BPF_LD | BPF_ABS | BPF_H, 0x3f),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x3f, 0 }, },
	},
	{
		"LD_ABS halfword negative offset, out of bounds load",
		.u.insns = {
			BPF_STMT(BPF_LD | BPF_ABS | BPF_H, -1),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC | FLAG_EXPECTED_FAIL,
		.expected_errcode = -EINVAL,
	},
	{
		"LD_ABS halfword negative offset, in bounds",
		.u.insns = {
			BPF_STMT(BPF_LD | BPF_ABS | BPF_H, SKF_LL_OFF + 0x3e),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x40, 0x1982 }, },
	},
	{
		"LD_ABS halfword negative offset, out of bounds",
		.u.insns = {
			BPF_STMT(BPF_LD | BPF_ABS | BPF_H, SKF_LL_OFF + 0x3e),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x3f, 0 }, },
	},
	{
		"LD_ABS word",
		.u.insns = {
@ -5939,6 +5959,140 @@ static struct bpf_test tests[] = {
		},
		{ {0x40, 0x88ee99ff } },
	},
	{
		"LD_ABS word positive offset, all ff",
		.u.insns = {
			BPF_STMT(BPF_LD | BPF_ABS | BPF_W, 0x3c),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0xff, [0x3d] = 0xff, [0x3e] = 0xff, [0x3f] = 0xff },
		{ {0x40, 0xffffffff } },
	},
	{
		"LD_ABS word positive offset, out of bounds",
		.u.insns = {
			BPF_STMT(BPF_LD | BPF_ABS | BPF_W, 0x3f),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x3f, 0 }, },
	},
	{
		"LD_ABS word negative offset, out of bounds load",
		.u.insns = {
			BPF_STMT(BPF_LD | BPF_ABS | BPF_W, -1),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC | FLAG_EXPECTED_FAIL,
		.expected_errcode = -EINVAL,
	},
	{
		"LD_ABS word negative offset, in bounds",
		.u.insns = {
			BPF_STMT(BPF_LD | BPF_ABS | BPF_W, SKF_LL_OFF + 0x3c),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x40, 0x25051982 }, },
	},
	{
		"LD_ABS word negative offset, out of bounds",
		.u.insns = {
			BPF_STMT(BPF_LD | BPF_ABS | BPF_W, SKF_LL_OFF + 0x3c),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x3f, 0 }, },
	},
	{
		"LDX_MSH standalone, preserved A",
		.u.insns = {
			BPF_STMT(BPF_LD | BPF_IMM, 0xffeebbaa),
			BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 0x3c),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x40, 0xffeebbaa }, },
	},
	{
		"LDX_MSH standalone, preserved A 2",
		.u.insns = {
			BPF_STMT(BPF_LD | BPF_IMM, 0x175e9d63),
			BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 0x3c),
			BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 0x3d),
			BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 0x3e),
			BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 0x3f),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x40, 0x175e9d63 }, },
	},
	{
		"LDX_MSH standalone, test result 1",
		.u.insns = {
			BPF_STMT(BPF_LD | BPF_IMM, 0xffeebbaa),
			BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 0x3c),
			BPF_STMT(BPF_MISC | BPF_TXA, 0),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x40, 0x14 }, },
	},
	{
		"LDX_MSH standalone, test result 2",
		.u.insns = {
			BPF_STMT(BPF_LD | BPF_IMM, 0xffeebbaa),
			BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 0x3e),
			BPF_STMT(BPF_MISC | BPF_TXA, 0),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x40, 0x24 }, },
	},
	{
		"LDX_MSH standalone, negative offset",
		.u.insns = {
			BPF_STMT(BPF_LD | BPF_IMM, 0xffeebbaa),
			BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, -1),
			BPF_STMT(BPF_MISC | BPF_TXA, 0),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x40, 0 }, },
	},
	{
		"LDX_MSH standalone, negative offset 2",
		.u.insns = {
			BPF_STMT(BPF_LD | BPF_IMM, 0xffeebbaa),
			BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, SKF_LL_OFF + 0x3e),
			BPF_STMT(BPF_MISC | BPF_TXA, 0),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x40, 0x24 }, },
	},
	{
		"LDX_MSH standalone, out of bounds",
		.u.insns = {
			BPF_STMT(BPF_LD | BPF_IMM, 0xffeebbaa),
			BPF_STMT(BPF_LDX | BPF_B | BPF_MSH, 0x40),
			BPF_STMT(BPF_MISC | BPF_TXA, 0),
			BPF_STMT(BPF_RET | BPF_A, 0x0),
		},
		CLASSIC,
		{ [0x3c] = 0x25, [0x3d] = 0x05, [0x3e] = 0x19, [0x3f] = 0x82 },
		{ {0x40, 0 }, },
	},
	/*
	 * verify that the interpreter or JIT correctly sets A and X
	 * to 0.
@ -6127,14 +6281,6 @@ static struct bpf_test tests[] = {
		{},
		{ {0x1, 0x42 } },
	},
	{
		"LD_ABS with helper changing skb data",
		{ },
		INTERNAL,
		{ 0x34 },
		{ { ETH_HLEN, 42 } },
		.fill_helper = bpf_fill_ld_abs_vlan_push_pop2,
	},
	/* Checking interpreter vs JIT wrt signed extended imms. */
	{
		"JNE signed compare, test 1",
@ -59,6 +59,7 @@ source "net/tls/Kconfig"
source "net/xfrm/Kconfig"
source "net/iucv/Kconfig"
source "net/smc/Kconfig"
source "net/xdp/Kconfig"

config INET
	bool "TCP/IP networking"
@ -85,3 +85,4 @@ obj-y += l3mdev/
endif
obj-$(CONFIG_QRTR) += qrtr/
obj-$(CONFIG_NET_NCSI) += ncsi/
obj-$(CONFIG_XDP_SOCKETS) += xdp/
@ -3627,6 +3627,44 @@ int dev_queue_xmit_accel(struct sk_buff *skb, void *accel_priv)
}
EXPORT_SYMBOL(dev_queue_xmit_accel);

int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
{
	struct net_device *dev = skb->dev;
	struct sk_buff *orig_skb = skb;
	struct netdev_queue *txq;
	int ret = NETDEV_TX_BUSY;
	bool again = false;

	if (unlikely(!netif_running(dev) ||
		     !netif_carrier_ok(dev)))
		goto drop;

	skb = validate_xmit_skb_list(skb, dev, &again);
	if (skb != orig_skb)
		goto drop;

	skb_set_queue_mapping(skb, queue_id);
	txq = skb_get_tx_queue(dev, skb);

	local_bh_disable();

	HARD_TX_LOCK(dev, txq, smp_processor_id());
	if (!netif_xmit_frozen_or_drv_stopped(txq))
		ret = netdev_start_xmit(skb, dev, txq, false);
	HARD_TX_UNLOCK(dev, txq);

	local_bh_enable();

	if (!dev_xmit_complete(ret))
		kfree_skb(skb);

	return ret;
drop:
	atomic_long_inc(&dev->tx_dropped);
	kfree_skb_list(skb);
	return NET_XMIT_DROP;
}
EXPORT_SYMBOL(dev_direct_xmit);

/*************************************************************************
 * Receiver routines
@ -3996,12 +4034,12 @@ static struct netdev_rx_queue *netif_get_rxqueue(struct sk_buff *skb)
}

static u32 netif_receive_generic_xdp(struct sk_buff *skb,
				     struct xdp_buff *xdp,
				     struct bpf_prog *xdp_prog)
{
	struct netdev_rx_queue *rxqueue;
	void *orig_data, *orig_data_end;
	u32 metalen, act = XDP_DROP;
	struct xdp_buff xdp;
	int hlen, off;
	u32 mac_len;

@ -4036,19 +4074,19 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
	 */
	mac_len = skb->data - skb_mac_header(skb);
	hlen = skb_headlen(skb) + mac_len;
	xdp.data = skb->data - mac_len;
	xdp.data_meta = xdp.data;
	xdp.data_end = xdp.data + hlen;
	xdp.data_hard_start = skb->data - skb_headroom(skb);
	orig_data_end = xdp.data_end;
	orig_data = xdp.data;
	xdp->data = skb->data - mac_len;
	xdp->data_meta = xdp->data;
	xdp->data_end = xdp->data + hlen;
	xdp->data_hard_start = skb->data - skb_headroom(skb);
	orig_data_end = xdp->data_end;
	orig_data = xdp->data;

	rxqueue = netif_get_rxqueue(skb);
	xdp.rxq = &rxqueue->xdp_rxq;
	xdp->rxq = &rxqueue->xdp_rxq;

	act = bpf_prog_run_xdp(xdp_prog, &xdp);
	act = bpf_prog_run_xdp(xdp_prog, xdp);

	off = xdp.data - orig_data;
	off = xdp->data - orig_data;
	if (off > 0)
		__skb_pull(skb, off);
	else if (off < 0)
@ -4058,10 +4096,11 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
	/* check if bpf_xdp_adjust_tail was used. it can only "shrink"
	 * pckt.
	 */
	off = orig_data_end - xdp.data_end;
	off = orig_data_end - xdp->data_end;
	if (off != 0) {
		skb_set_tail_pointer(skb, xdp.data_end - xdp.data);
		skb_set_tail_pointer(skb, xdp->data_end - xdp->data);
		skb->len -= off;

	}

	switch (act) {
@ -4070,7 +4109,7 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
		__skb_push(skb, mac_len);
		break;
	case XDP_PASS:
		metalen = xdp.data - xdp.data_meta;
		metalen = xdp->data - xdp->data_meta;
		if (metalen)
			skb_metadata_set(skb, metalen);
		break;
@ -4120,17 +4159,19 @@ static struct static_key generic_xdp_needed __read_mostly;
int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
{
	if (xdp_prog) {
		u32 act = netif_receive_generic_xdp(skb, xdp_prog);
		struct xdp_buff xdp;
		u32 act;
		int err;

		act = netif_receive_generic_xdp(skb, &xdp, xdp_prog);
|
||||
if (act != XDP_PASS) {
|
||||
switch (act) {
|
||||
case XDP_REDIRECT:
|
||||
err = xdp_do_generic_redirect(skb->dev, skb,
|
||||
xdp_prog);
|
||||
&xdp, xdp_prog);
|
||||
if (err)
|
||||
goto out_redir;
|
||||
/* fallthru to submit skb */
|
||||
break;
|
||||
case XDP_TX:
|
||||
generic_xdp_tx(skb, xdp_prog);
|
||||
break;
|
||||
|
@ -59,6 +59,7 @@
|
||||
#include <net/tcp.h>
|
||||
#include <net/xfrm.h>
|
||||
#include <linux/bpf_trace.h>
|
||||
#include <net/xdp_sock.h>
|
||||
|
||||
/**
|
||||
* sk_filter_trim_cap - run a packet through a socket filter
|
||||
@ -112,12 +113,12 @@ int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap)
|
||||
}
|
||||
EXPORT_SYMBOL(sk_filter_trim_cap);
|
||||
|
||||
BPF_CALL_1(__skb_get_pay_offset, struct sk_buff *, skb)
|
||||
BPF_CALL_1(bpf_skb_get_pay_offset, struct sk_buff *, skb)
|
||||
{
|
||||
return skb_get_poff(skb);
|
||||
}
|
||||
|
||||
BPF_CALL_3(__skb_get_nlattr, struct sk_buff *, skb, u32, a, u32, x)
|
||||
BPF_CALL_3(bpf_skb_get_nlattr, struct sk_buff *, skb, u32, a, u32, x)
|
||||
{
|
||||
struct nlattr *nla;
|
||||
|
||||
@ -137,7 +138,7 @@ BPF_CALL_3(__skb_get_nlattr, struct sk_buff *, skb, u32, a, u32, x)
|
||||
return 0;
|
||||
}
|
||||
|
||||
BPF_CALL_3(__skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x)
|
||||
BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x)
|
||||
{
|
||||
struct nlattr *nla;
|
||||
|
||||
@ -161,13 +162,94 @@ BPF_CALL_3(__skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x)
|
||||
return 0;
|
||||
}
|
||||
|
||||
BPF_CALL_0(__get_raw_cpu_id)
|
||||
BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *,
|
||||
data, int, headlen, int, offset)
|
||||
{
|
||||
u8 tmp, *ptr;
|
||||
const int len = sizeof(tmp);
|
||||
|
||||
if (offset >= 0) {
|
||||
if (headlen - offset >= len)
|
||||
return *(u8 *)(data + offset);
|
||||
if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
|
||||
return tmp;
|
||||
} else {
|
||||
ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
|
||||
if (likely(ptr))
|
||||
return *(u8 *)ptr;
|
||||
}
|
||||
|
||||
return -EFAULT;
|
||||
}
|
||||
|
||||
BPF_CALL_2(bpf_skb_load_helper_8_no_cache, const struct sk_buff *, skb,
|
||||
int, offset)
|
||||
{
|
||||
return ____bpf_skb_load_helper_8(skb, skb->data, skb->len - skb->data_len,
|
||||
offset);
|
||||
}
|
||||
|
||||
BPF_CALL_4(bpf_skb_load_helper_16, const struct sk_buff *, skb, const void *,
|
||||
data, int, headlen, int, offset)
|
||||
{
|
||||
u16 tmp, *ptr;
|
||||
const int len = sizeof(tmp);
|
||||
|
||||
if (offset >= 0) {
|
||||
if (headlen - offset >= len)
|
||||
return get_unaligned_be16(data + offset);
|
||||
if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
|
||||
return be16_to_cpu(tmp);
|
||||
} else {
|
||||
ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
|
||||
if (likely(ptr))
|
||||
return get_unaligned_be16(ptr);
|
||||
}
|
||||
|
||||
return -EFAULT;
|
||||
}
|
||||
|
||||
BPF_CALL_2(bpf_skb_load_helper_16_no_cache, const struct sk_buff *, skb,
|
||||
int, offset)
|
||||
{
|
||||
return ____bpf_skb_load_helper_16(skb, skb->data, skb->len - skb->data_len,
|
||||
offset);
|
||||
}
|
||||
|
||||
BPF_CALL_4(bpf_skb_load_helper_32, const struct sk_buff *, skb, const void *,
|
||||
data, int, headlen, int, offset)
|
||||
{
|
||||
u32 tmp, *ptr;
|
||||
const int len = sizeof(tmp);
|
||||
|
||||
if (likely(offset >= 0)) {
|
||||
if (headlen - offset >= len)
|
||||
return get_unaligned_be32(data + offset);
|
||||
if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
|
||||
return be32_to_cpu(tmp);
|
||||
} else {
|
||||
ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
|
||||
if (likely(ptr))
|
||||
return get_unaligned_be32(ptr);
|
||||
}
|
||||
|
||||
return -EFAULT;
|
||||
}
|
||||
|
||||
BPF_CALL_2(bpf_skb_load_helper_32_no_cache, const struct sk_buff *, skb,
|
||||
int, offset)
|
||||
{
|
||||
return ____bpf_skb_load_helper_32(skb, skb->data, skb->len - skb->data_len,
|
||||
offset);
|
||||
}
|
||||
|
||||
BPF_CALL_0(bpf_get_raw_cpu_id)
|
||||
{
|
||||
return raw_smp_processor_id();
|
||||
}
|
||||
|
||||
static const struct bpf_func_proto bpf_get_raw_smp_processor_id_proto = {
|
||||
.func = __get_raw_cpu_id,
|
||||
.func = bpf_get_raw_cpu_id,
|
||||
.gpl_only = false,
|
||||
.ret_type = RET_INTEGER,
|
||||
};
|
||||
@ -317,16 +399,16 @@ static bool convert_bpf_extensions(struct sock_filter *fp,
|
||||
/* Emit call(arg1=CTX, arg2=A, arg3=X) */
|
||||
switch (fp->k) {
|
||||
case SKF_AD_OFF + SKF_AD_PAY_OFFSET:
|
||||
*insn = BPF_EMIT_CALL(__skb_get_pay_offset);
|
||||
*insn = BPF_EMIT_CALL(bpf_skb_get_pay_offset);
|
||||
break;
|
||||
case SKF_AD_OFF + SKF_AD_NLATTR:
|
||||
*insn = BPF_EMIT_CALL(__skb_get_nlattr);
|
||||
*insn = BPF_EMIT_CALL(bpf_skb_get_nlattr);
|
||||
break;
|
||||
case SKF_AD_OFF + SKF_AD_NLATTR_NEST:
|
||||
*insn = BPF_EMIT_CALL(__skb_get_nlattr_nest);
|
||||
*insn = BPF_EMIT_CALL(bpf_skb_get_nlattr_nest);
|
||||
break;
|
||||
case SKF_AD_OFF + SKF_AD_CPU:
|
||||
*insn = BPF_EMIT_CALL(__get_raw_cpu_id);
|
||||
*insn = BPF_EMIT_CALL(bpf_get_raw_cpu_id);
|
||||
break;
|
||||
case SKF_AD_OFF + SKF_AD_RANDOM:
|
||||
*insn = BPF_EMIT_CALL(bpf_user_rnd_u32);
|
||||
@ -353,26 +435,87 @@ static bool convert_bpf_extensions(struct sock_filter *fp,
|
||||
return true;
|
||||
}
|
||||
|
||||
static bool convert_bpf_ld_abs(struct sock_filter *fp, struct bpf_insn **insnp)
|
||||
{
|
||||
const bool unaligned_ok = IS_BUILTIN(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS);
|
||||
int size = bpf_size_to_bytes(BPF_SIZE(fp->code));
|
||||
bool endian = BPF_SIZE(fp->code) == BPF_H ||
|
||||
BPF_SIZE(fp->code) == BPF_W;
|
||||
bool indirect = BPF_MODE(fp->code) == BPF_IND;
|
||||
const int ip_align = NET_IP_ALIGN;
|
||||
struct bpf_insn *insn = *insnp;
|
||||
int offset = fp->k;
|
||||
|
||||
if (!indirect &&
|
||||
((unaligned_ok && offset >= 0) ||
|
||||
(!unaligned_ok && offset >= 0 &&
|
||||
offset + ip_align >= 0 &&
|
||||
offset + ip_align % size == 0))) {
|
||||
*insn++ = BPF_MOV64_REG(BPF_REG_TMP, BPF_REG_H);
|
||||
*insn++ = BPF_ALU64_IMM(BPF_SUB, BPF_REG_TMP, offset);
|
||||
*insn++ = BPF_JMP_IMM(BPF_JSLT, BPF_REG_TMP, size, 2 + endian);
|
||||
*insn++ = BPF_LDX_MEM(BPF_SIZE(fp->code), BPF_REG_A, BPF_REG_D,
|
||||
offset);
|
||||
if (endian)
|
||||
*insn++ = BPF_ENDIAN(BPF_FROM_BE, BPF_REG_A, size * 8);
|
||||
*insn++ = BPF_JMP_A(8);
|
||||
}
|
||||
|
||||
*insn++ = BPF_MOV64_REG(BPF_REG_ARG1, BPF_REG_CTX);
|
||||
*insn++ = BPF_MOV64_REG(BPF_REG_ARG2, BPF_REG_D);
|
||||
*insn++ = BPF_MOV64_REG(BPF_REG_ARG3, BPF_REG_H);
|
||||
if (!indirect) {
|
||||
*insn++ = BPF_MOV64_IMM(BPF_REG_ARG4, offset);
|
||||
} else {
|
||||
*insn++ = BPF_MOV64_REG(BPF_REG_ARG4, BPF_REG_X);
|
||||
if (fp->k)
|
||||
*insn++ = BPF_ALU64_IMM(BPF_ADD, BPF_REG_ARG4, offset);
|
||||
}
|
||||
|
||||
switch (BPF_SIZE(fp->code)) {
|
||||
case BPF_B:
|
||||
*insn++ = BPF_EMIT_CALL(bpf_skb_load_helper_8);
|
||||
break;
|
||||
case BPF_H:
|
||||
*insn++ = BPF_EMIT_CALL(bpf_skb_load_helper_16);
|
||||
break;
|
||||
case BPF_W:
|
||||
*insn++ = BPF_EMIT_CALL(bpf_skb_load_helper_32);
|
||||
break;
|
||||
default:
|
||||
return false;
|
||||
}
|
||||
|
||||
*insn++ = BPF_JMP_IMM(BPF_JSGE, BPF_REG_A, 0, 2);
|
||||
*insn++ = BPF_ALU32_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
|
||||
*insn = BPF_EXIT_INSN();
|
||||
|
||||
*insnp = insn;
|
||||
return true;
|
||||
}
|
||||
|
||||
/**
|
||||
* bpf_convert_filter - convert filter program
|
||||
* @prog: the user passed filter program
|
||||
* @len: the length of the user passed filter program
|
||||
* @new_prog: allocated 'struct bpf_prog' or NULL
|
||||
* @new_len: pointer to store length of converted program
|
||||
* @seen_ld_abs: bool whether we've seen ld_abs/ind
|
||||
*
|
||||
* Remap 'sock_filter' style classic BPF (cBPF) instruction set to 'bpf_insn'
|
||||
* style extended BPF (eBPF).
|
||||
* Conversion workflow:
|
||||
*
|
||||
* 1) First pass for calculating the new program length:
|
||||
* bpf_convert_filter(old_prog, old_len, NULL, &new_len)
|
||||
* bpf_convert_filter(old_prog, old_len, NULL, &new_len, &seen_ld_abs)
|
||||
*
|
||||
* 2) 2nd pass to remap in two passes: 1st pass finds new
|
||||
* jump offsets, 2nd pass remapping:
|
||||
* bpf_convert_filter(old_prog, old_len, new_prog, &new_len);
|
||||
* bpf_convert_filter(old_prog, old_len, new_prog, &new_len, &seen_ld_abs)
|
||||
*/
|
||||
static int bpf_convert_filter(struct sock_filter *prog, int len,
|
||||
struct bpf_prog *new_prog, int *new_len)
|
||||
struct bpf_prog *new_prog, int *new_len,
|
||||
bool *seen_ld_abs)
|
||||
{
|
||||
int new_flen = 0, pass = 0, target, i, stack_off;
|
||||
struct bpf_insn *new_insn, *first_insn = NULL;
|
||||
@ -411,12 +554,27 @@ static int bpf_convert_filter(struct sock_filter *prog, int len,
|
||||
* do this ourself. Initial CTX is present in BPF_REG_ARG1.
|
||||
*/
|
||||
*new_insn++ = BPF_MOV64_REG(BPF_REG_CTX, BPF_REG_ARG1);
|
||||
if (*seen_ld_abs) {
|
||||
/* For packet access in classic BPF, cache skb->data
|
||||
* in callee-saved BPF R8 and skb->len - skb->data_len
|
||||
* (headlen) in BPF R9. Since classic BPF is read-only
|
||||
* on CTX, we only need to cache it once.
|
||||
*/
|
||||
*new_insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, data),
|
||||
BPF_REG_D, BPF_REG_CTX,
|
||||
offsetof(struct sk_buff, data));
|
||||
*new_insn++ = BPF_LDX_MEM(BPF_W, BPF_REG_H, BPF_REG_CTX,
|
||||
offsetof(struct sk_buff, len));
|
||||
*new_insn++ = BPF_LDX_MEM(BPF_W, BPF_REG_TMP, BPF_REG_CTX,
|
||||
offsetof(struct sk_buff, data_len));
|
||||
*new_insn++ = BPF_ALU32_REG(BPF_SUB, BPF_REG_H, BPF_REG_TMP);
|
||||
}
|
||||
} else {
|
||||
new_insn += 3;
|
||||
}
|
||||
|
||||
for (i = 0; i < len; fp++, i++) {
|
||||
struct bpf_insn tmp_insns[6] = { };
|
||||
struct bpf_insn tmp_insns[32] = { };
|
||||
struct bpf_insn *insn = tmp_insns;
|
||||
|
||||
if (addrs)
|
||||
@ -459,6 +617,11 @@ static int bpf_convert_filter(struct sock_filter *prog, int len,
|
||||
BPF_MODE(fp->code) == BPF_ABS &&
|
||||
convert_bpf_extensions(fp, &insn))
|
||||
break;
|
||||
if (BPF_CLASS(fp->code) == BPF_LD &&
|
||||
convert_bpf_ld_abs(fp, &insn)) {
|
||||
*seen_ld_abs = true;
|
||||
break;
|
||||
}
|
||||
|
||||
if (fp->code == (BPF_ALU | BPF_DIV | BPF_X) ||
|
||||
fp->code == (BPF_ALU | BPF_MOD | BPF_X)) {
|
||||
@ -561,21 +724,31 @@ static int bpf_convert_filter(struct sock_filter *prog, int len,
|
||||
break;
|
||||
|
||||
/* ldxb 4 * ([14] & 0xf) is remaped into 6 insns. */
|
||||
case BPF_LDX | BPF_MSH | BPF_B:
|
||||
/* tmp = A */
|
||||
*insn++ = BPF_MOV64_REG(BPF_REG_TMP, BPF_REG_A);
|
||||
case BPF_LDX | BPF_MSH | BPF_B: {
|
||||
struct sock_filter tmp = {
|
||||
.code = BPF_LD | BPF_ABS | BPF_B,
|
||||
.k = fp->k,
|
||||
};
|
||||
|
||||
*seen_ld_abs = true;
|
||||
|
||||
/* X = A */
|
||||
*insn++ = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
|
||||
/* A = BPF_R0 = *(u8 *) (skb->data + K) */
|
||||
*insn++ = BPF_LD_ABS(BPF_B, fp->k);
|
||||
convert_bpf_ld_abs(&tmp, &insn);
|
||||
insn++;
|
||||
/* A &= 0xf */
|
||||
*insn++ = BPF_ALU32_IMM(BPF_AND, BPF_REG_A, 0xf);
|
||||
/* A <<= 2 */
|
||||
*insn++ = BPF_ALU32_IMM(BPF_LSH, BPF_REG_A, 2);
|
||||
/* tmp = X */
|
||||
*insn++ = BPF_MOV64_REG(BPF_REG_TMP, BPF_REG_X);
|
||||
/* X = A */
|
||||
*insn++ = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
|
||||
/* A = tmp */
|
||||
*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_TMP);
|
||||
break;
|
||||
|
||||
}
|
||||
/* RET_K is remaped into 2 insns. RET_A case doesn't need an
|
||||
* extra mov as BPF_REG_0 is already mapped into BPF_REG_A.
|
||||
*/
|
||||
@ -657,6 +830,8 @@ static int bpf_convert_filter(struct sock_filter *prog, int len,
|
||||
if (!new_prog) {
|
||||
/* Only calculating new length. */
|
||||
*new_len = new_insn - first_insn;
|
||||
if (*seen_ld_abs)
|
||||
*new_len += 4; /* Prologue bits. */
|
||||
return 0;
|
||||
}
|
||||
|
||||
@ -1018,6 +1193,7 @@ static struct bpf_prog *bpf_migrate_filter(struct bpf_prog *fp)
|
||||
struct sock_filter *old_prog;
|
||||
struct bpf_prog *old_fp;
|
||||
int err, new_len, old_len = fp->len;
|
||||
bool seen_ld_abs = false;
|
||||
|
||||
/* We are free to overwrite insns et al right here as it
|
||||
* won't be used at this point in time anymore internally
|
||||
@ -1039,7 +1215,8 @@ static struct bpf_prog *bpf_migrate_filter(struct bpf_prog *fp)
|
||||
}
|
||||
|
||||
/* 1st pass: calculate the new program length. */
|
||||
err = bpf_convert_filter(old_prog, old_len, NULL, &new_len);
|
||||
err = bpf_convert_filter(old_prog, old_len, NULL, &new_len,
|
||||
&seen_ld_abs);
|
||||
if (err)
|
||||
goto out_err_free;
|
||||
|
||||
@ -1058,7 +1235,8 @@ static struct bpf_prog *bpf_migrate_filter(struct bpf_prog *fp)
|
||||
fp->len = new_len;
|
||||
|
||||
/* 2nd pass: remap sock_filter insns into bpf_insn insns. */
|
||||
err = bpf_convert_filter(old_prog, old_len, fp, &new_len);
|
||||
err = bpf_convert_filter(old_prog, old_len, fp, &new_len,
|
||||
&seen_ld_abs);
|
||||
if (err)
|
||||
/* 2nd bpf_convert_filter() can fail only if it fails
|
||||
* to allocate memory, remapping must succeed. Note,
|
||||
@ -1506,6 +1684,47 @@ static const struct bpf_func_proto bpf_skb_load_bytes_proto = {
|
||||
.arg4_type = ARG_CONST_SIZE,
|
||||
};
|
||||
|
||||
BPF_CALL_5(bpf_skb_load_bytes_relative, const struct sk_buff *, skb,
|
||||
u32, offset, void *, to, u32, len, u32, start_header)
|
||||
{
|
||||
u8 *ptr;
|
||||
|
||||
if (unlikely(offset > 0xffff || len > skb_headlen(skb)))
|
||||
goto err_clear;
|
||||
|
||||
switch (start_header) {
|
||||
case BPF_HDR_START_MAC:
|
||||
ptr = skb_mac_header(skb) + offset;
|
||||
break;
|
||||
case BPF_HDR_START_NET:
|
||||
ptr = skb_network_header(skb) + offset;
|
||||
break;
|
||||
default:
|
||||
goto err_clear;
|
||||
}
|
||||
|
||||
if (likely(ptr >= skb_mac_header(skb) &&
|
||||
ptr + len <= skb_tail_pointer(skb))) {
|
||||
memcpy(to, ptr, len);
|
||||
return 0;
|
||||
}
|
||||
|
||||
err_clear:
|
||||
memset(to, 0, len);
|
||||
return -EFAULT;
|
||||
}
|
||||
|
||||
static const struct bpf_func_proto bpf_skb_load_bytes_relative_proto = {
|
||||
.func = bpf_skb_load_bytes_relative,
|
||||
.gpl_only = false,
|
||||
.ret_type = RET_INTEGER,
|
||||
.arg1_type = ARG_PTR_TO_CTX,
|
||||
.arg2_type = ARG_ANYTHING,
|
||||
.arg3_type = ARG_PTR_TO_UNINIT_MEM,
|
||||
.arg4_type = ARG_CONST_SIZE,
|
||||
.arg5_type = ARG_ANYTHING,
|
||||
};
|
||||
|
||||
BPF_CALL_2(bpf_skb_pull_data, struct sk_buff *, skb, u32, len)
|
||||
{
|
||||
/* Idea is the following: should the needed direct read/write
|
||||
@ -2180,7 +2399,7 @@ BPF_CALL_3(bpf_skb_vlan_push, struct sk_buff *, skb, __be16, vlan_proto,
|
||||
return ret;
|
||||
}
|
||||
|
||||
const struct bpf_func_proto bpf_skb_vlan_push_proto = {
|
||||
static const struct bpf_func_proto bpf_skb_vlan_push_proto = {
|
||||
.func = bpf_skb_vlan_push,
|
||||
.gpl_only = false,
|
||||
.ret_type = RET_INTEGER,
|
||||
@ -2188,7 +2407,6 @@ const struct bpf_func_proto bpf_skb_vlan_push_proto = {
|
||||
.arg2_type = ARG_ANYTHING,
|
||||
.arg3_type = ARG_ANYTHING,
|
||||
};
|
||||
EXPORT_SYMBOL_GPL(bpf_skb_vlan_push_proto);
|
||||
|
||||
BPF_CALL_1(bpf_skb_vlan_pop, struct sk_buff *, skb)
|
||||
{
|
||||
@ -2202,13 +2420,12 @@ BPF_CALL_1(bpf_skb_vlan_pop, struct sk_buff *, skb)
|
||||
return ret;
|
||||
}
|
||||
|
||||
const struct bpf_func_proto bpf_skb_vlan_pop_proto = {
|
||||
static const struct bpf_func_proto bpf_skb_vlan_pop_proto = {
|
||||
.func = bpf_skb_vlan_pop,
|
||||
.gpl_only = false,
|
||||
.ret_type = RET_INTEGER,
|
||||
.arg1_type = ARG_PTR_TO_CTX,
|
||||
};
|
||||
EXPORT_SYMBOL_GPL(bpf_skb_vlan_pop_proto);
|
||||
|
||||
static int bpf_skb_generic_push(struct sk_buff *skb, u32 off, u32 len)
|
||||
{
|
||||
@ -2801,7 +3018,8 @@ static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
|
||||
{
|
||||
int err;
|
||||
|
||||
if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
|
||||
switch (map->map_type) {
|
||||
case BPF_MAP_TYPE_DEVMAP: {
|
||||
struct net_device *dev = fwd;
|
||||
struct xdp_frame *xdpf;
|
||||
|
||||
@ -2819,14 +3037,25 @@ static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
|
||||
if (err)
|
||||
return err;
|
||||
__dev_map_insert_ctx(map, index);
|
||||
|
||||
} else if (map->map_type == BPF_MAP_TYPE_CPUMAP) {
|
||||
break;
|
||||
}
|
||||
case BPF_MAP_TYPE_CPUMAP: {
|
||||
struct bpf_cpu_map_entry *rcpu = fwd;
|
||||
|
||||
err = cpu_map_enqueue(rcpu, xdp, dev_rx);
|
||||
if (err)
|
||||
return err;
|
||||
__cpu_map_insert_ctx(map, index);
|
||||
break;
|
||||
}
|
||||
case BPF_MAP_TYPE_XSKMAP: {
|
||||
struct xdp_sock *xs = fwd;
|
||||
|
||||
err = __xsk_map_redirect(map, xdp, xs);
|
||||
return err;
|
||||
}
|
||||
default:
|
||||
break;
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
@ -2845,6 +3074,9 @@ void xdp_do_flush_map(void)
|
||||
case BPF_MAP_TYPE_CPUMAP:
|
||||
__cpu_map_flush(map);
|
||||
break;
|
||||
case BPF_MAP_TYPE_XSKMAP:
|
||||
__xsk_map_flush(map);
|
||||
break;
|
||||
default:
|
||||
break;
|
||||
}
|
||||
@ -2859,6 +3091,8 @@ static void *__xdp_map_lookup_elem(struct bpf_map *map, u32 index)
|
||||
return __dev_map_lookup_elem(map, index);
|
||||
case BPF_MAP_TYPE_CPUMAP:
|
||||
return __cpu_map_lookup_elem(map, index);
|
||||
case BPF_MAP_TYPE_XSKMAP:
|
||||
return __xsk_map_lookup_elem(map, index);
|
||||
default:
|
||||
return NULL;
|
||||
}
|
||||
@ -2956,13 +3190,14 @@ static int __xdp_generic_ok_fwd_dev(struct sk_buff *skb, struct net_device *fwd)
|
||||
|
||||
static int xdp_do_generic_redirect_map(struct net_device *dev,
|
||||
struct sk_buff *skb,
|
||||
struct xdp_buff *xdp,
|
||||
struct bpf_prog *xdp_prog)
|
||||
{
|
||||
struct redirect_info *ri = this_cpu_ptr(&redirect_info);
|
||||
unsigned long map_owner = ri->map_owner;
|
||||
struct bpf_map *map = ri->map;
|
||||
struct net_device *fwd = NULL;
|
||||
u32 index = ri->ifindex;
|
||||
void *fwd = NULL;
|
||||
int err = 0;
|
||||
|
||||
ri->ifindex = 0;
|
||||
@ -2984,6 +3219,14 @@ static int xdp_do_generic_redirect_map(struct net_device *dev,
|
||||
if (unlikely((err = __xdp_generic_ok_fwd_dev(skb, fwd))))
|
||||
goto err;
|
||||
skb->dev = fwd;
|
||||
generic_xdp_tx(skb, xdp_prog);
|
||||
} else if (map->map_type == BPF_MAP_TYPE_XSKMAP) {
|
||||
struct xdp_sock *xs = fwd;
|
||||
|
||||
err = xsk_generic_rcv(xs, xdp);
|
||||
if (err)
|
||||
goto err;
|
||||
consume_skb(skb);
|
||||
} else {
|
||||
/* TODO: Handle BPF_MAP_TYPE_CPUMAP */
|
||||
err = -EBADRQC;
|
||||
@ -2998,7 +3241,7 @@ static int xdp_do_generic_redirect_map(struct net_device *dev,
|
||||
}
|
||||
|
||||
int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
|
||||
struct bpf_prog *xdp_prog)
|
||||
struct xdp_buff *xdp, struct bpf_prog *xdp_prog)
|
||||
{
|
||||
struct redirect_info *ri = this_cpu_ptr(&redirect_info);
|
||||
u32 index = ri->ifindex;
|
||||
@ -3006,7 +3249,7 @@ int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
|
||||
int err = 0;
|
||||
|
||||
if (ri->map)
|
||||
return xdp_do_generic_redirect_map(dev, skb, xdp_prog);
|
||||
return xdp_do_generic_redirect_map(dev, skb, xdp, xdp_prog);
|
||||
|
||||
ri->ifindex = 0;
|
||||
fwd = dev_get_by_index_rcu(dev_net(dev), index);
|
||||
@ -3020,6 +3263,7 @@ int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
|
||||
|
||||
skb->dev = fwd;
|
||||
_trace_xdp_redirect(dev, xdp_prog, index);
|
||||
generic_xdp_tx(skb, xdp_prog);
|
||||
return 0;
|
||||
err:
|
||||
_trace_xdp_redirect_err(dev, xdp_prog, index, err);
|
||||
@ -3858,6 +4102,8 @@ sk_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
|
||||
switch (func_id) {
|
||||
case BPF_FUNC_skb_load_bytes:
|
||||
return &bpf_skb_load_bytes_proto;
|
||||
case BPF_FUNC_skb_load_bytes_relative:
|
||||
return &bpf_skb_load_bytes_relative_proto;
|
||||
case BPF_FUNC_get_socket_cookie:
|
||||
return &bpf_get_socket_cookie_proto;
|
||||
case BPF_FUNC_get_socket_uid:
|
||||
@ -3875,6 +4121,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
|
||||
return &bpf_skb_store_bytes_proto;
|
||||
case BPF_FUNC_skb_load_bytes:
|
||||
return &bpf_skb_load_bytes_proto;
|
||||
case BPF_FUNC_skb_load_bytes_relative:
|
||||
return &bpf_skb_load_bytes_relative_proto;
|
||||
case BPF_FUNC_skb_pull_data:
|
||||
return &bpf_skb_pull_data_proto;
|
||||
case BPF_FUNC_csum_diff:
|
||||
@ -4304,6 +4552,41 @@ static int bpf_unclone_prologue(struct bpf_insn *insn_buf, bool direct_write,
|
||||
return insn - insn_buf;
|
||||
}
|
||||
|
||||
static int bpf_gen_ld_abs(const struct bpf_insn *orig,
|
||||
struct bpf_insn *insn_buf)
|
||||
{
|
||||
bool indirect = BPF_MODE(orig->code) == BPF_IND;
|
||||
struct bpf_insn *insn = insn_buf;
|
||||
|
||||
/* We're guaranteed here that CTX is in R6. */
|
||||
*insn++ = BPF_MOV64_REG(BPF_REG_1, BPF_REG_CTX);
|
||||
if (!indirect) {
|
||||
*insn++ = BPF_MOV64_IMM(BPF_REG_2, orig->imm);
|
||||
} else {
|
||||
*insn++ = BPF_MOV64_REG(BPF_REG_2, orig->src_reg);
|
||||
if (orig->imm)
|
||||
*insn++ = BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, orig->imm);
|
||||
}
|
||||
|
||||
switch (BPF_SIZE(orig->code)) {
|
||||
case BPF_B:
|
||||
*insn++ = BPF_EMIT_CALL(bpf_skb_load_helper_8_no_cache);
|
||||
break;
|
||||
case BPF_H:
|
||||
*insn++ = BPF_EMIT_CALL(bpf_skb_load_helper_16_no_cache);
|
||||
break;
|
||||
case BPF_W:
|
||||
*insn++ = BPF_EMIT_CALL(bpf_skb_load_helper_32_no_cache);
|
||||
break;
|
||||
}
|
||||
|
||||
*insn++ = BPF_JMP_IMM(BPF_JSGE, BPF_REG_0, 0, 2);
|
||||
*insn++ = BPF_ALU32_REG(BPF_XOR, BPF_REG_0, BPF_REG_0);
|
||||
*insn++ = BPF_EXIT_INSN();
|
||||
|
||||
return insn - insn_buf;
|
||||
}
|
||||
|
||||
static int tc_cls_act_prologue(struct bpf_insn *insn_buf, bool direct_write,
|
||||
const struct bpf_prog *prog)
|
||||
{
|
||||
@ -5573,6 +5856,7 @@ const struct bpf_verifier_ops sk_filter_verifier_ops = {
|
||||
.get_func_proto = sk_filter_func_proto,
|
||||
.is_valid_access = sk_filter_is_valid_access,
|
||||
.convert_ctx_access = bpf_convert_ctx_access,
|
||||
.gen_ld_abs = bpf_gen_ld_abs,
|
||||
};
|
||||
|
||||
const struct bpf_prog_ops sk_filter_prog_ops = {
|
||||
@ -5584,6 +5868,7 @@ const struct bpf_verifier_ops tc_cls_act_verifier_ops = {
|
||||
.is_valid_access = tc_cls_act_is_valid_access,
|
||||
.convert_ctx_access = tc_cls_act_convert_ctx_access,
|
||||
.gen_prologue = tc_cls_act_prologue,
|
||||
.gen_ld_abs = bpf_gen_ld_abs,
|
||||
};
|
||||
|
||||
const struct bpf_prog_ops tc_cls_act_prog_ops = {
|
||||
|
@ -226,7 +226,8 @@ static struct lock_class_key af_family_kern_slock_keys[AF_MAX];
|
||||
x "AF_RXRPC" , x "AF_ISDN" , x "AF_PHONET" , \
|
||||
x "AF_IEEE802154", x "AF_CAIF" , x "AF_ALG" , \
|
||||
x "AF_NFC" , x "AF_VSOCK" , x "AF_KCM" , \
|
||||
x "AF_QIPCRTR", x "AF_SMC" , x "AF_MAX"
|
||||
x "AF_QIPCRTR", x "AF_SMC" , x "AF_XDP" , \
|
||||
x "AF_MAX"
|
||||
|
||||
static const char *const af_family_key_strings[AF_MAX+1] = {
|
||||
_sock_locks("sk_lock-")
|
||||
@ -262,7 +263,8 @@ static const char *const af_family_rlock_key_strings[AF_MAX+1] = {
|
||||
"rlock-AF_RXRPC" , "rlock-AF_ISDN" , "rlock-AF_PHONET" ,
|
||||
"rlock-AF_IEEE802154", "rlock-AF_CAIF" , "rlock-AF_ALG" ,
|
||||
"rlock-AF_NFC" , "rlock-AF_VSOCK" , "rlock-AF_KCM" ,
|
||||
"rlock-AF_QIPCRTR", "rlock-AF_SMC" , "rlock-AF_MAX"
|
||||
"rlock-AF_QIPCRTR", "rlock-AF_SMC" , "rlock-AF_XDP" ,
|
||||
"rlock-AF_MAX"
|
||||
};
|
||||
static const char *const af_family_wlock_key_strings[AF_MAX+1] = {
|
||||
"wlock-AF_UNSPEC", "wlock-AF_UNIX" , "wlock-AF_INET" ,
|
||||
@ -279,7 +281,8 @@ static const char *const af_family_wlock_key_strings[AF_MAX+1] = {
|
||||
"wlock-AF_RXRPC" , "wlock-AF_ISDN" , "wlock-AF_PHONET" ,
|
||||
"wlock-AF_IEEE802154", "wlock-AF_CAIF" , "wlock-AF_ALG" ,
|
||||
"wlock-AF_NFC" , "wlock-AF_VSOCK" , "wlock-AF_KCM" ,
|
||||
"wlock-AF_QIPCRTR", "wlock-AF_SMC" , "wlock-AF_MAX"
|
||||
"wlock-AF_QIPCRTR", "wlock-AF_SMC" , "wlock-AF_XDP" ,
|
||||
"wlock-AF_MAX"
|
||||
};
|
||||
static const char *const af_family_elock_key_strings[AF_MAX+1] = {
|
||||
"elock-AF_UNSPEC", "elock-AF_UNIX" , "elock-AF_INET" ,
|
||||
@ -296,7 +299,8 @@ static const char *const af_family_elock_key_strings[AF_MAX+1] = {
|
||||
"elock-AF_RXRPC" , "elock-AF_ISDN" , "elock-AF_PHONET" ,
|
||||
"elock-AF_IEEE802154", "elock-AF_CAIF" , "elock-AF_ALG" ,
|
||||
"elock-AF_NFC" , "elock-AF_VSOCK" , "elock-AF_KCM" ,
|
||||
"elock-AF_QIPCRTR", "elock-AF_SMC" , "elock-AF_MAX"
|
||||
"elock-AF_QIPCRTR", "elock-AF_SMC" , "elock-AF_XDP" ,
|
||||
"elock-AF_MAX"
|
||||
};
|
||||
|
||||
/*
|
||||
|
@ -308,11 +308,9 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model);
|
||||
|
||||
void xdp_return_frame(struct xdp_frame *xdpf)
|
||||
static void xdp_return(void *data, struct xdp_mem_info *mem)
|
||||
{
|
||||
struct xdp_mem_info *mem = &xdpf->mem;
|
||||
struct xdp_mem_allocator *xa;
|
||||
void *data = xdpf->data;
|
||||
struct page *page;
|
||||
|
||||
switch (mem->type) {
|
||||
@ -339,4 +337,15 @@ void xdp_return_frame(struct xdp_frame *xdpf)
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
void xdp_return_frame(struct xdp_frame *xdpf)
|
||||
{
|
||||
xdp_return(xdpf->data, &xdpf->mem);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(xdp_return_frame);
|
||||
|
||||
void xdp_return_buff(struct xdp_buff *xdp)
|
||||
{
|
||||
xdp_return(xdp->data, &xdp->rxq->mem);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(xdp_return_buff);
|
||||
|
@ -209,7 +209,7 @@ static void prb_clear_rxhash(struct tpacket_kbdq_core *,
|
||||
static void prb_fill_vlan_info(struct tpacket_kbdq_core *,
|
||||
struct tpacket3_hdr *);
|
||||
static void packet_flush_mclist(struct sock *sk);
|
||||
static void packet_pick_tx_queue(struct net_device *dev, struct sk_buff *skb);
|
||||
static u16 packet_pick_tx_queue(struct sk_buff *skb);
|
||||
|
||||
struct packet_skb_cb {
|
||||
union {
|
||||
@ -243,40 +243,7 @@ static void __fanout_link(struct sock *sk, struct packet_sock *po);
|
||||
|
||||
static int packet_direct_xmit(struct sk_buff *skb)
|
||||
{
|
||||
struct net_device *dev = skb->dev;
|
||||
struct sk_buff *orig_skb = skb;
|
||||
struct netdev_queue *txq;
|
||||
int ret = NETDEV_TX_BUSY;
|
||||
bool again = false;
|
||||
|
||||
if (unlikely(!netif_running(dev) ||
|
||||
!netif_carrier_ok(dev)))
|
||||
goto drop;
|
||||
|
||||
skb = validate_xmit_skb_list(skb, dev, &again);
|
||||
if (skb != orig_skb)
|
||||
goto drop;
|
||||
|
||||
packet_pick_tx_queue(dev, skb);
|
||||
txq = skb_get_tx_queue(dev, skb);
|
||||
|
||||
local_bh_disable();
|
||||
|
||||
HARD_TX_LOCK(dev, txq, smp_processor_id());
|
||||
if (!netif_xmit_frozen_or_drv_stopped(txq))
|
||||
ret = netdev_start_xmit(skb, dev, txq, false);
|
||||
HARD_TX_UNLOCK(dev, txq);
|
||||
|
||||
local_bh_enable();
|
||||
|
||||
if (!dev_xmit_complete(ret))
|
||||
kfree_skb(skb);
|
||||
|
||||
return ret;
|
||||
drop:
|
||||
atomic_long_inc(&dev->tx_dropped);
|
||||
kfree_skb_list(skb);
|
||||
return NET_XMIT_DROP;
|
||||
return dev_direct_xmit(skb, packet_pick_tx_queue(skb));
|
||||
}
|
||||
|
||||
static struct net_device *packet_cached_dev_get(struct packet_sock *po)
|
||||
@ -313,8 +280,9 @@ static u16 __packet_pick_tx_queue(struct net_device *dev, struct sk_buff *skb)
|
||||
return (u16) raw_smp_processor_id() % dev->real_num_tx_queues;
|
||||
}
|
||||
|
||||
static void packet_pick_tx_queue(struct net_device *dev, struct sk_buff *skb)
|
||||
static u16 packet_pick_tx_queue(struct sk_buff *skb)
|
||||
{
|
||||
struct net_device *dev = skb->dev;
|
||||
const struct net_device_ops *ops = dev->netdev_ops;
|
||||
u16 queue_index;
|
||||
|
||||
@ -326,7 +294,7 @@ static void packet_pick_tx_queue(struct net_device *dev, struct sk_buff *skb)
|
||||
queue_index = __packet_pick_tx_queue(dev, skb);
|
||||
}
|
||||
|
||||
skb_set_queue_mapping(skb, queue_index);
|
||||
return queue_index;
|
||||
}
|
||||
|
||||
/* __register_prot_hook must be invoked through register_prot_hook
|
||||
|
7
net/xdp/Kconfig
Normal file
@ -0,0 +1,7 @@
|
||||
config XDP_SOCKETS
|
||||
bool "XDP sockets"
|
||||
depends on BPF_SYSCALL
|
||||
default n
|
||||
help
|
||||
XDP sockets allows a channel between XDP programs and
|
||||
userspace applications.
|
2
net/xdp/Makefile
Normal file
@ -0,0 +1,2 @@
|
||||
obj-$(CONFIG_XDP_SOCKETS) += xsk.o xdp_umem.o xsk_queue.o
|
||||
|
260
net/xdp/xdp_umem.c
Normal file
@ -0,0 +1,260 @@
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
/* XDP user-space packet buffer
|
||||
* Copyright(c) 2018 Intel Corporation.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify it
|
||||
* under the terms and conditions of the GNU General Public License,
|
||||
* version 2, as published by the Free Software Foundation.
|
||||
*
|
||||
* This program is distributed in the hope it will be useful, but WITHOUT
|
||||
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
|
||||
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
|
||||
* more details.
|
||||
*/
|
||||
|
||||
#include <linux/init.h>
|
||||
#include <linux/sched/mm.h>
|
||||
#include <linux/sched/signal.h>
|
||||
#include <linux/sched/task.h>
|
||||
#include <linux/uaccess.h>
|
||||
#include <linux/slab.h>
|
||||
#include <linux/bpf.h>
|
||||
#include <linux/mm.h>
|
||||
|
||||
#include "xdp_umem.h"
|
||||
|
||||
#define XDP_UMEM_MIN_FRAME_SIZE 2048
|
||||
|
||||
int xdp_umem_create(struct xdp_umem **umem)
|
||||
{
|
||||
*umem = kzalloc(sizeof(**umem), GFP_KERNEL);
|
||||
|
||||
if (!(*umem))
|
||||
return -ENOMEM;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void xdp_umem_unpin_pages(struct xdp_umem *umem)
|
||||
{
|
||||
unsigned int i;
|
||||
|
||||
if (umem->pgs) {
|
||||
for (i = 0; i < umem->npgs; i++) {
|
||||
struct page *page = umem->pgs[i];
|
||||
|
||||
set_page_dirty_lock(page);
|
||||
put_page(page);
|
||||
}
|
||||
|
||||
kfree(umem->pgs);
|
||||
umem->pgs = NULL;
|
||||
}
|
||||
}
|
||||
|
||||
static void xdp_umem_unaccount_pages(struct xdp_umem *umem)
|
||||
{
|
||||
if (umem->user) {
|
||||
atomic_long_sub(umem->npgs, &umem->user->locked_vm);
|
||||
free_uid(umem->user);
|
||||
}
|
||||
}
|
||||
|
||||
static void xdp_umem_release(struct xdp_umem *umem)
|
||||
{
|
||||
struct task_struct *task;
|
||||
struct mm_struct *mm;
|
||||
|
||||
if (umem->fq) {
|
||||
xskq_destroy(umem->fq);
|
||||
umem->fq = NULL;
|
||||
}
|
||||
|
||||
if (umem->cq) {
|
||||
xskq_destroy(umem->cq);
|
||||
umem->cq = NULL;
|
||||
}
|
||||
|
||||
if (umem->pgs) {
|
||||
xdp_umem_unpin_pages(umem);
|
||||
|
||||
task = get_pid_task(umem->pid, PIDTYPE_PID);
|
||||
put_pid(umem->pid);
|
||||
if (!task)
|
||||
goto out;
|
||||
mm = get_task_mm(task);
|
||||
put_task_struct(task);
|
||||
if (!mm)
|
||||
goto out;
|
||||
|
||||
mmput(mm);
|
||||
umem->pgs = NULL;
|
||||
}
|
||||
|
||||
xdp_umem_unaccount_pages(umem);
|
||||
out:
|
||||
kfree(umem);
|
||||
}
|
||||
|
||||
static void xdp_umem_release_deferred(struct work_struct *work)
|
||||
{
|
||||
struct xdp_umem *umem = container_of(work, struct xdp_umem, work);
|
||||
|
||||
xdp_umem_release(umem);
|
||||
}
|
||||
|
||||
void xdp_get_umem(struct xdp_umem *umem)
|
||||
{
|
||||
atomic_inc(&umem->users);
|
||||
}
|
||||
|
||||
void xdp_put_umem(struct xdp_umem *umem)
|
||||
{
|
||||
if (!umem)
|
||||
return;
|
||||
|
||||
if (atomic_dec_and_test(&umem->users)) {
|
||||
INIT_WORK(&umem->work, xdp_umem_release_deferred);
|
||||
schedule_work(&umem->work);
|
||||
}
|
||||
}
|
||||
|
||||
static int xdp_umem_pin_pages(struct xdp_umem *umem)
|
||||
{
|
||||
unsigned int gup_flags = FOLL_WRITE;
|
||||
long npgs;
|
||||
int err;
|
||||
|
||||
umem->pgs = kcalloc(umem->npgs, sizeof(*umem->pgs), GFP_KERNEL);
|
||||
if (!umem->pgs)
|
||||
return -ENOMEM;
|
||||
|
||||
down_write(&current->mm->mmap_sem);
|
||||
npgs = get_user_pages(umem->address, umem->npgs,
|
||||
gup_flags, &umem->pgs[0], NULL);
|
||||
up_write(&current->mm->mmap_sem);
|
||||
|
||||
if (npgs != umem->npgs) {
|
||||
if (npgs >= 0) {
|
||||
umem->npgs = npgs;
|
||||
err = -ENOMEM;
|
||||
goto out_pin;
|
||||
}
|
||||
err = npgs;
|
||||
goto out_pgs;
|
||||
}
|
||||
return 0;
|
||||
|
||||
out_pin:
|
||||
xdp_umem_unpin_pages(umem);
|
||||
out_pgs:
|
||||
kfree(umem->pgs);
|
||||
umem->pgs = NULL;
|
||||
return err;
|
||||
}
|
||||
|
||||
static int xdp_umem_account_pages(struct xdp_umem *umem)
|
||||
{
|
||||
unsigned long lock_limit, new_npgs, old_npgs;
|
||||
|
||||
if (capable(CAP_IPC_LOCK))
|
||||
return 0;
|
||||
|
||||
lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
|
||||
umem->user = get_uid(current_user());
|
||||
|
||||
do {
|
||||
old_npgs = atomic_long_read(&umem->user->locked_vm);
|
||||
new_npgs = old_npgs + umem->npgs;
|
||||
if (new_npgs > lock_limit) {
|
||||
free_uid(umem->user);
|
||||
umem->user = NULL;
|
||||
return -ENOBUFS;
|
||||
}
|
||||
} while (atomic_long_cmpxchg(&umem->user->locked_vm, old_npgs,
|
||||
new_npgs) != old_npgs);
|
||||
return 0;
|
||||
}
|
||||
|
||||
int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
|
||||
{
|
||||
u32 frame_size = mr->frame_size, frame_headroom = mr->frame_headroom;
|
||||
u64 addr = mr->addr, size = mr->len;
|
||||
unsigned int nframes, nfpp;
|
||||
int size_chk, err;
|
||||
|
||||
if (!umem)
|
||||
return -EINVAL;
|
||||
|
||||
if (frame_size < XDP_UMEM_MIN_FRAME_SIZE || frame_size > PAGE_SIZE) {
|
||||
/* Strictly speaking we could support this, if:
|
||||
* - huge pages, or
|
||||
* - using an IOMMU, or
|
||||
* - making sure the memory area is consecutive
|
||||
* but for now, we simply say "computer says no".
|
||||
*/
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
if (!is_power_of_2(frame_size))
|
||||
return -EINVAL;
|
||||
|
||||
if (!PAGE_ALIGNED(addr)) {
|
||||
/* For simplicity, the memory area has to be page size
* aligned. This might change.
|
||||
*/
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
if ((addr + size) < addr)
|
||||
return -EINVAL;
|
||||
|
||||
nframes = size / frame_size;
|
||||
if (nframes == 0 || nframes > UINT_MAX)
|
||||
return -EINVAL;
|
||||
|
||||
nfpp = PAGE_SIZE / frame_size;
|
||||
if (nframes < nfpp || nframes % nfpp)
|
||||
return -EINVAL;
|
||||
|
||||
frame_headroom = ALIGN(frame_headroom, 64);
|
||||
|
||||
size_chk = frame_size - frame_headroom - XDP_PACKET_HEADROOM;
|
||||
if (size_chk < 0)
|
||||
return -EINVAL;
|
||||
|
||||
umem->pid = get_task_pid(current, PIDTYPE_PID);
|
||||
umem->size = (size_t)size;
|
||||
umem->address = (unsigned long)addr;
|
||||
umem->props.frame_size = frame_size;
|
||||
umem->props.nframes = nframes;
|
||||
umem->frame_headroom = frame_headroom;
|
||||
umem->npgs = size / PAGE_SIZE;
|
||||
umem->pgs = NULL;
|
||||
umem->user = NULL;
|
||||
|
||||
umem->frame_size_log2 = ilog2(frame_size);
|
||||
umem->nfpp_mask = nfpp - 1;
|
||||
umem->nfpplog2 = ilog2(nfpp);
|
||||
atomic_set(&umem->users, 1);
|
||||
|
||||
err = xdp_umem_account_pages(umem);
|
||||
if (err)
|
||||
goto out;
|
||||
|
||||
err = xdp_umem_pin_pages(umem);
|
||||
if (err)
|
||||
goto out_account;
|
||||
return 0;
|
||||
|
||||
out_account:
|
||||
xdp_umem_unaccount_pages(umem);
|
||||
out:
|
||||
put_pid(umem->pid);
|
||||
return err;
|
||||
}
|
||||
|
||||
bool xdp_umem_validate_queues(struct xdp_umem *umem)
|
||||
{
|
||||
return (umem->fq && umem->cq);
|
||||
}
|
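The checks performed by xdp_umem_reg() above can be restated as a small user-space sketch. This is not the kernel code: the `sketch_` names are hypothetical, and the page size (4096) and packet headroom (256) are assumed values chosen for illustration. The frame count is computed in 64 bits here so that the upper-bound check can actually fire.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; the kernel takes these from its headers. */
#define SKETCH_PAGE_SIZE       4096u
#define SKETCH_MIN_FRAME_SIZE  2048u /* mirrors XDP_UMEM_MIN_FRAME_SIZE */
#define SKETCH_PACKET_HEADROOM 256u  /* assumed value of XDP_PACKET_HEADROOM */

static bool sketch_is_pow2(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Hypothetical helper: returns 0 if a registration with these parameters
 * would pass the same sequence of checks as xdp_umem_reg(), else -EINVAL.
 */
static int sketch_umem_check(uint64_t addr, uint64_t len,
			     uint32_t frame_size, uint32_t frame_headroom)
{
	uint64_t nframes, nfpp, headroom;

	/* A frame must fit in one page and be at least the minimum size. */
	if (frame_size < SKETCH_MIN_FRAME_SIZE || frame_size > SKETCH_PAGE_SIZE)
		return -EINVAL;
	if (!sketch_is_pow2(frame_size))
		return -EINVAL;

	/* The area must start on a page boundary and must not wrap. */
	if (addr % SKETCH_PAGE_SIZE)
		return -EINVAL;
	if (addr + len < addr)
		return -EINVAL;

	/* Computed in 64 bits so the upper bound check is meaningful. */
	nframes = len / frame_size;
	if (nframes == 0 || nframes > UINT32_MAX)
		return -EINVAL;

	/* The frame count must be a whole number of pages' worth of frames. */
	nfpp = SKETCH_PAGE_SIZE / frame_size;
	if (nframes < nfpp || nframes % nfpp)
		return -EINVAL;

	/* Round the headroom up to 64 bytes, then make sure space remains. */
	headroom = ((uint64_t)frame_headroom + 63u) & ~(uint64_t)63u;
	if ((int64_t)frame_size - (int64_t)headroom -
	    (int64_t)SKETCH_PACKET_HEADROOM < 0)
		return -EINVAL;

	return 0;
}
```

Under these assumptions, a 64 KiB page-aligned area with 2048-byte frames passes, while a 1024-byte frame size, an unaligned address, or an odd number of 2048-byte frames is rejected.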
67
net/xdp/xdp_umem.h
Normal file
@@ -0,0 +1,67 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0
|
||||
* XDP user-space packet buffer
|
||||
* Copyright(c) 2018 Intel Corporation.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify it
|
||||
* under the terms and conditions of the GNU General Public License,
|
||||
* version 2, as published by the Free Software Foundation.
|
||||
*
|
||||
* This program is distributed in the hope it will be useful, but WITHOUT
|
||||
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
|
||||
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
|
||||
* more details.
|
||||
*/
|
||||
|
||||
#ifndef XDP_UMEM_H_
|
||||
#define XDP_UMEM_H_
|
||||
|
||||
#include <linux/mm.h>
|
||||
#include <linux/if_xdp.h>
|
||||
#include <linux/workqueue.h>
|
||||
|
||||
#include "xsk_queue.h"
|
||||
#include "xdp_umem_props.h"
|
||||
|
||||
struct xdp_umem {
|
||||
struct xsk_queue *fq;
|
||||
struct xsk_queue *cq;
|
||||
struct page **pgs;
|
||||
struct xdp_umem_props props;
|
||||
u32 npgs;
|
||||
u32 frame_headroom;
|
||||
u32 nfpp_mask;
|
||||
u32 nfpplog2;
|
||||
u32 frame_size_log2;
|
||||
struct user_struct *user;
|
||||
struct pid *pid;
|
||||
unsigned long address;
|
||||
size_t size;
|
||||
atomic_t users;
|
||||
struct work_struct work;
|
||||
};
|
||||
|
||||
static inline char *xdp_umem_get_data(struct xdp_umem *umem, u32 idx)
|
||||
{
|
||||
u64 pg, off;
|
||||
char *data;
|
||||
|
||||
pg = idx >> umem->nfpplog2;
|
||||
off = (idx & umem->nfpp_mask) << umem->frame_size_log2;
|
||||
|
||||
data = page_address(umem->pgs[pg]);
|
||||
return data + off;
|
||||
}
|
||||
|
||||
static inline char *xdp_umem_get_data_with_headroom(struct xdp_umem *umem,
|
||||
u32 idx)
|
||||
{
|
||||
return xdp_umem_get_data(umem, idx) + umem->frame_headroom;
|
||||
}
|
||||
|
||||
bool xdp_umem_validate_queues(struct xdp_umem *umem);
|
||||
int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr);
|
||||
void xdp_get_umem(struct xdp_umem *umem);
|
||||
void xdp_put_umem(struct xdp_umem *umem);
|
||||
int xdp_umem_create(struct xdp_umem **umem);
|
||||
|
||||
#endif /* XDP_UMEM_H_ */
|
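The index arithmetic in xdp_umem_get_data() above maps a frame index to a pinned page and a byte offset within that page using shifts and a mask. A minimal user-space sketch of the same arithmetic follows; the `sketch_` names are hypothetical, and the page and frame sizes are parameters of the sketch rather than kernel constants.

```c
#include <assert.h>
#include <stdint.h>

/* Precomputed shift/mask values, analogous to the fields set in xdp_umem_reg(). */
struct sketch_layout {
	uint32_t nfpp_mask;       /* frames per page, minus one */
	uint32_t nfpplog2;        /* log2(frames per page) */
	uint32_t frame_size_log2; /* log2(frame size in bytes) */
};

/* Where a frame lives: which page, and at what byte offset inside it. */
struct sketch_loc {
	uint64_t page;
	uint64_t offset;
};

/* Integer log2 for a power of two; asserts that the input is one. */
static uint32_t sketch_ilog2(uint32_t v)
{
	uint32_t r = 0;

	assert(v != 0 && (v & (v - 1)) == 0);
	while (v >>= 1)
		r++;
	return r;
}

static struct sketch_layout sketch_layout_init(uint32_t page_size,
					       uint32_t frame_size)
{
	uint32_t nfpp = page_size / frame_size;
	struct sketch_layout l = {
		.nfpp_mask = nfpp - 1,
		.nfpplog2 = sketch_ilog2(nfpp),
		.frame_size_log2 = sketch_ilog2(frame_size),
	};

	return l;
}

/* Same arithmetic as xdp_umem_get_data(), minus the page_address() lookup. */
static struct sketch_loc sketch_locate(const struct sketch_layout *l,
				       uint32_t idx)
{
	struct sketch_loc loc = {
		.page = idx >> l->nfpplog2,
		.offset = (uint64_t)(idx & l->nfpp_mask) << l->frame_size_log2,
	};

	return loc;
}
```

With 4096-byte pages and 2048-byte frames there are two frames per page, so frame 5 resolves to page 2 at offset 2048. Because both sizes are powers of two, no division is needed on the per-packet path.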
23
net/xdp/xdp_umem_props.h
Normal file
@@ -0,0 +1,23 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0
|
||||
* XDP user-space packet buffer
|
||||
* Copyright(c) 2018 Intel Corporation.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify it
|
||||
* under the terms and conditions of the GNU General Public License,
|
||||
* version 2, as published by the Free Software Foundation.
|
||||
*
|
||||
* This program is distributed in the hope it will be useful, but WITHOUT
|
||||
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
|
||||
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
|
||||
* more details.
|
||||
*/
|
||||
|
||||
#ifndef XDP_UMEM_PROPS_H_
|
||||
#define XDP_UMEM_PROPS_H_
|
||||
|
||||
struct xdp_umem_props {
|
||||
u32 frame_size;
|
||||
u32 nframes;
|
||||
};
|
||||
|
||||
#endif /* XDP_UMEM_PROPS_H_ */
|
656
net/xdp/xsk.c
Normal file
@@ -0,0 +1,656 @@
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
/* XDP sockets
|
||||
*
|
||||
* AF_XDP sockets allow a channel between XDP programs and userspace
|
||||
* applications.
|
||||
* Copyright(c) 2018 Intel Corporation.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify it
|
||||
* under the terms and conditions of the GNU General Public License,
|
||||
* version 2, as published by the Free Software Foundation.
|
||||
*
|
||||
* This program is distributed in the hope it will be useful, but WITHOUT
|
||||
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
|
||||
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
|
||||
* more details.
|
||||
*
|
||||
* Author(s): Björn Töpel <bjorn.topel@intel.com>
|
||||
* Magnus Karlsson <magnus.karlsson@intel.com>
|
||||
*/
|
||||
|
||||
#define pr_fmt(fmt) "AF_XDP: %s: " fmt, __func__
|
||||
|
||||
#include <linux/if_xdp.h>
|
||||
#include <linux/init.h>
|
||||
#include <linux/sched/mm.h>
|
||||
#include <linux/sched/signal.h>
|
||||
#include <linux/sched/task.h>
|
||||
#include <linux/socket.h>
|
||||
#include <linux/file.h>
|
||||
#include <linux/uaccess.h>
|
||||
#include <linux/net.h>
|
||||
#include <linux/netdevice.h>
|
||||
#include <net/xdp_sock.h>
|
||||
#include <net/xdp.h>
|
||||
|
||||
#include "xsk_queue.h"
|
||||
#include "xdp_umem.h"
|
||||
|
||||
#define TX_BATCH_SIZE 16
|
||||
|
||||
static struct xdp_sock *xdp_sk(struct sock *sk)
|
||||
{
|
||||
return (struct xdp_sock *)sk;
|
||||
}
|
||||
|
||||
bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs)
|
||||
{
|
||||
return !!xs->rx;
|
||||
}
|
||||
|
||||
static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
|
||||
{
|
||||
u32 *id, len = xdp->data_end - xdp->data;
|
||||
void *buffer;
|
||||
int err = 0;
|
||||
|
||||
if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
|
||||
return -EINVAL;
|
||||
|
||||
id = xskq_peek_id(xs->umem->fq);
|
||||
if (!id)
|
||||
return -ENOSPC;
|
||||
|
||||
buffer = xdp_umem_get_data_with_headroom(xs->umem, *id);
|
||||
memcpy(buffer, xdp->data, len);
|
||||
err = xskq_produce_batch_desc(xs->rx, *id, len,
|
||||
xs->umem->frame_headroom);
|
||||
if (!err)
|
||||
xskq_discard_id(xs->umem->fq);
|
||||
|
||||
return err;
|
||||
}
|
||||
|
||||
int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
|
||||
{
|
||||
int err;
|
||||
|
||||
err = __xsk_rcv(xs, xdp);
|
||||
if (likely(!err))
|
||||
xdp_return_buff(xdp);
|
||||
else
|
||||
xs->rx_dropped++;
|
||||
|
||||
return err;
|
||||
}
|
||||
|
||||
void xsk_flush(struct xdp_sock *xs)
|
||||
{
|
||||
xskq_produce_flush_desc(xs->rx);
|
||||
xs->sk.sk_data_ready(&xs->sk);
|
||||
}
|
||||
|
||||
int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
|
||||
{
|
||||
int err;
|
||||
|
||||
err = __xsk_rcv(xs, xdp);
|
||||
if (!err)
|
||||
xsk_flush(xs);
|
||||
else
|
||||
xs->rx_dropped++;
|
||||
|
||||
return err;
|
||||
}
|
||||
|
||||
static void xsk_destruct_skb(struct sk_buff *skb)
|
||||
{
|
||||
u32 id = (u32)(long)skb_shinfo(skb)->destructor_arg;
|
||||
struct xdp_sock *xs = xdp_sk(skb->sk);
|
||||
|
||||
WARN_ON_ONCE(xskq_produce_id(xs->umem->cq, id));
|
||||
|
||||
sock_wfree(skb);
|
||||
}
|
||||
|
||||
static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
|
||||
size_t total_len)
|
||||
{
|
||||
bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
|
||||
u32 max_batch = TX_BATCH_SIZE;
|
||||
struct xdp_sock *xs = xdp_sk(sk);
|
||||
bool sent_frame = false;
|
||||
struct xdp_desc desc;
|
||||
struct sk_buff *skb;
|
||||
int err = 0;
|
||||
|
||||
if (unlikely(!xs->tx))
|
||||
return -ENOBUFS;
|
||||
if (need_wait)
|
||||
return -EOPNOTSUPP;
|
||||
|
||||
mutex_lock(&xs->mutex);
|
||||
|
||||
while (xskq_peek_desc(xs->tx, &desc)) {
|
||||
char *buffer;
|
||||
u32 id, len;
|
||||
|
||||
if (max_batch-- == 0) {
|
||||
err = -EAGAIN;
|
||||
goto out;
|
||||
}
|
||||
|
||||
if (xskq_reserve_id(xs->umem->cq)) {
|
||||
err = -EAGAIN;
|
||||
goto out;
|
||||
}
|
||||
|
||||
len = desc.len;
|
||||
if (unlikely(len > xs->dev->mtu)) {
|
||||
err = -EMSGSIZE;
|
||||
goto out;
|
||||
}
|
||||
|
||||
skb = sock_alloc_send_skb(sk, len, !need_wait, &err);
|
||||
if (unlikely(!skb)) {
|
||||
err = -EAGAIN;
|
||||
goto out;
|
||||
}
|
||||
|
||||
skb_put(skb, len);
|
||||
id = desc.idx;
|
||||
buffer = xdp_umem_get_data(xs->umem, id) + desc.offset;
|
||||
err = skb_store_bits(skb, 0, buffer, len);
|
||||
if (unlikely(err)) {
|
||||
kfree_skb(skb);
|
||||
goto out;
|
||||
}
|
||||
|
||||
skb->dev = xs->dev;
|
||||
skb->priority = sk->sk_priority;
|
||||
skb->mark = sk->sk_mark;
|
||||
skb_shinfo(skb)->destructor_arg = (void *)(long)id;
|
||||
skb->destructor = xsk_destruct_skb;
|
||||
|
||||
err = dev_direct_xmit(skb, xs->queue_id);
|
||||
/* Ignore NET_XMIT_CN as packet might have been sent */
|
||||
if (err == NET_XMIT_DROP || err == NETDEV_TX_BUSY) {
|
||||
err = -EAGAIN;
|
||||
/* SKB consumed by dev_direct_xmit() */
|
||||
goto out;
|
||||
}
|
||||
|
||||
sent_frame = true;
|
||||
xskq_discard_desc(xs->tx);
|
||||
}
|
||||
|
||||
out:
|
||||
if (sent_frame)
|
||||
sk->sk_write_space(sk);
|
||||
|
||||
mutex_unlock(&xs->mutex);
|
||||
return err;
|
||||
}
|
||||
|
||||
static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
|
||||
{
|
||||
struct sock *sk = sock->sk;
|
||||
struct xdp_sock *xs = xdp_sk(sk);
|
||||
|
||||
if (unlikely(!xs->dev))
|
||||
return -ENXIO;
|
||||
if (unlikely(!(xs->dev->flags & IFF_UP)))
|
||||
return -ENETDOWN;
|
||||
|
||||
return xsk_generic_xmit(sk, m, total_len);
|
||||
}
|
||||
|
||||
static unsigned int xsk_poll(struct file *file, struct socket *sock,
|
||||
struct poll_table_struct *wait)
|
||||
{
|
||||
unsigned int mask = datagram_poll(file, sock, wait);
|
||||
struct sock *sk = sock->sk;
|
||||
struct xdp_sock *xs = xdp_sk(sk);
|
||||
|
||||
if (xs->rx && !xskq_empty_desc(xs->rx))
|
||||
mask |= POLLIN | POLLRDNORM;
|
||||
if (xs->tx && !xskq_full_desc(xs->tx))
|
||||
mask |= POLLOUT | POLLWRNORM;
|
||||
|
||||
return mask;
|
||||
}
|
||||
|
||||
static int xsk_init_queue(u32 entries, struct xsk_queue **queue,
|
||||
bool umem_queue)
|
||||
{
|
||||
struct xsk_queue *q;
|
||||
|
||||
if (entries == 0 || *queue || !is_power_of_2(entries))
|
||||
return -EINVAL;
|
||||
|
||||
q = xskq_create(entries, umem_queue);
|
||||
if (!q)
|
||||
return -ENOMEM;
|
||||
|
||||
*queue = q;
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void __xsk_release(struct xdp_sock *xs)
|
||||
{
|
||||
/* Wait for driver to stop using the xdp socket. */
|
||||
synchronize_net();
|
||||
|
||||
dev_put(xs->dev);
|
||||
}
|
||||
|
||||
static int xsk_release(struct socket *sock)
|
||||
{
|
||||
struct sock *sk = sock->sk;
|
||||
struct xdp_sock *xs = xdp_sk(sk);
|
||||
struct net *net;
|
||||
|
||||
if (!sk)
|
||||
return 0;
|
||||
|
||||
net = sock_net(sk);
|
||||
|
||||
local_bh_disable();
|
||||
sock_prot_inuse_add(net, sk->sk_prot, -1);
|
||||
local_bh_enable();
|
||||
|
||||
if (xs->dev) {
|
||||
__xsk_release(xs);
|
||||
xs->dev = NULL;
|
||||
}
|
||||
|
||||
sock_orphan(sk);
|
||||
sock->sk = NULL;
|
||||
|
||||
sk_refcnt_debug_release(sk);
|
||||
sock_put(sk);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static struct socket *xsk_lookup_xsk_from_fd(int fd)
|
||||
{
|
||||
struct socket *sock;
|
||||
int err;
|
||||
|
||||
sock = sockfd_lookup(fd, &err);
|
||||
if (!sock)
|
||||
return ERR_PTR(-ENOTSOCK);
|
||||
|
||||
if (sock->sk->sk_family != PF_XDP) {
|
||||
sockfd_put(sock);
|
||||
return ERR_PTR(-ENOPROTOOPT);
|
||||
}
|
||||
|
||||
return sock;
|
||||
}
|
||||
|
||||
static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
|
||||
{
|
||||
struct sockaddr_xdp *sxdp = (struct sockaddr_xdp *)addr;
|
||||
struct sock *sk = sock->sk;
|
||||
struct net_device *dev, *dev_curr;
|
||||
struct xdp_sock *xs = xdp_sk(sk);
|
||||
struct xdp_umem *old_umem = NULL;
|
||||
int err = 0;
|
||||
|
||||
if (addr_len < sizeof(struct sockaddr_xdp))
|
||||
return -EINVAL;
|
||||
if (sxdp->sxdp_family != AF_XDP)
|
||||
return -EINVAL;
|
||||
|
||||
mutex_lock(&xs->mutex);
|
||||
dev_curr = xs->dev;
|
||||
dev = dev_get_by_index(sock_net(sk), sxdp->sxdp_ifindex);
|
||||
if (!dev) {
|
||||
err = -ENODEV;
|
||||
goto out_release;
|
||||
}
|
||||
|
||||
if (!xs->rx && !xs->tx) {
|
||||
err = -EINVAL;
|
||||
goto out_unlock;
|
||||
}
|
||||
|
||||
if (sxdp->sxdp_queue_id >= dev->num_rx_queues) {
|
||||
err = -EINVAL;
|
||||
goto out_unlock;
|
||||
}
|
||||
|
||||
if (sxdp->sxdp_flags & XDP_SHARED_UMEM) {
|
||||
struct xdp_sock *umem_xs;
|
||||
struct socket *sock;
|
||||
|
||||
if (xs->umem) {
|
||||
/* We already have our own. */
|
||||
err = -EINVAL;
|
||||
goto out_unlock;
|
||||
}
|
||||
|
||||
sock = xsk_lookup_xsk_from_fd(sxdp->sxdp_shared_umem_fd);
|
||||
if (IS_ERR(sock)) {
|
||||
err = PTR_ERR(sock);
|
||||
goto out_unlock;
|
||||
}
|
||||
|
||||
umem_xs = xdp_sk(sock->sk);
|
||||
if (!umem_xs->umem) {
|
||||
/* No umem to inherit. */
|
||||
err = -EBADF;
|
||||
sockfd_put(sock);
|
||||
goto out_unlock;
|
||||
} else if (umem_xs->dev != dev ||
|
||||
umem_xs->queue_id != sxdp->sxdp_queue_id) {
|
||||
err = -EINVAL;
|
||||
sockfd_put(sock);
|
||||
goto out_unlock;
|
||||
}
|
||||
|
||||
xdp_get_umem(umem_xs->umem);
|
||||
old_umem = xs->umem;
|
||||
xs->umem = umem_xs->umem;
|
||||
sockfd_put(sock);
|
||||
} else if (!xs->umem || !xdp_umem_validate_queues(xs->umem)) {
|
||||
err = -EINVAL;
|
||||
goto out_unlock;
|
||||
} else {
|
||||
/* This xsk has its own umem. */
|
||||
xskq_set_umem(xs->umem->fq, &xs->umem->props);
|
||||
xskq_set_umem(xs->umem->cq, &xs->umem->props);
|
||||
}
|
||||
|
||||
/* Rebind? */
|
||||
if (dev_curr && (dev_curr != dev ||
|
||||
xs->queue_id != sxdp->sxdp_queue_id)) {
|
||||
__xsk_release(xs);
|
||||
if (old_umem)
|
||||
xdp_put_umem(old_umem);
|
||||
}
|
||||
|
||||
xs->dev = dev;
|
||||
xs->queue_id = sxdp->sxdp_queue_id;
|
||||
|
||||
xskq_set_umem(xs->rx, &xs->umem->props);
|
||||
xskq_set_umem(xs->tx, &xs->umem->props);
|
||||
|
||||
out_unlock:
|
||||
if (err)
|
||||
dev_put(dev);
|
||||
out_release:
|
||||
mutex_unlock(&xs->mutex);
|
||||
return err;
|
||||
}
|
||||
|
||||
static int xsk_setsockopt(struct socket *sock, int level, int optname,
|
||||
char __user *optval, unsigned int optlen)
|
||||
{
|
||||
struct sock *sk = sock->sk;
|
||||
struct xdp_sock *xs = xdp_sk(sk);
|
||||
int err;
|
||||
|
||||
if (level != SOL_XDP)
|
||||
return -ENOPROTOOPT;
|
||||
|
||||
switch (optname) {
|
||||
case XDP_RX_RING:
|
||||
case XDP_TX_RING:
|
||||
{
|
||||
struct xsk_queue **q;
|
||||
int entries;
|
||||
|
||||
if (optlen < sizeof(entries))
|
||||
return -EINVAL;
|
||||
if (copy_from_user(&entries, optval, sizeof(entries)))
|
||||
return -EFAULT;
|
||||
|
||||
mutex_lock(&xs->mutex);
|
||||
q = (optname == XDP_TX_RING) ? &xs->tx : &xs->rx;
|
||||
err = xsk_init_queue(entries, q, false);
|
||||
mutex_unlock(&xs->mutex);
|
||||
return err;
|
||||
}
|
||||
case XDP_UMEM_REG:
|
||||
{
|
||||
struct xdp_umem_reg mr;
|
||||
struct xdp_umem *umem;
|
||||
|
||||
if (xs->umem)
|
||||
return -EBUSY;
|
||||
|
||||
if (copy_from_user(&mr, optval, sizeof(mr)))
|
||||
return -EFAULT;
|
||||
|
||||
mutex_lock(&xs->mutex);
|
||||
err = xdp_umem_create(&umem);
if (err) {
mutex_unlock(&xs->mutex);
return err;
}

err = xdp_umem_reg(umem, &mr);
|
||||
if (err) {
|
||||
kfree(umem);
|
||||
mutex_unlock(&xs->mutex);
|
||||
return err;
|
||||
}
|
||||
|
||||
/* Make sure umem is ready before it can be seen by others */
|
||||
smp_wmb();
|
||||
|
||||
xs->umem = umem;
|
||||
mutex_unlock(&xs->mutex);
|
||||
return 0;
|
||||
}
|
||||
case XDP_UMEM_FILL_RING:
|
||||
case XDP_UMEM_COMPLETION_RING:
|
||||
{
|
||||
struct xsk_queue **q;
|
||||
int entries;
|
||||
|
||||
if (!xs->umem)
|
||||
return -EINVAL;
|
||||
|
||||
if (copy_from_user(&entries, optval, sizeof(entries)))
|
||||
return -EFAULT;
|
||||
|
||||
mutex_lock(&xs->mutex);
|
||||
q = (optname == XDP_UMEM_FILL_RING) ? &xs->umem->fq :
|
||||
&xs->umem->cq;
|
||||
err = xsk_init_queue(entries, q, true);
|
||||
mutex_unlock(&xs->mutex);
|
||||
return err;
|
||||
}
|
||||
default:
|
||||
break;
|
||||
}
|
||||
|
||||
return -ENOPROTOOPT;
|
||||
}
|
||||
|
||||
static int xsk_getsockopt(struct socket *sock, int level, int optname,
|
||||
char __user *optval, int __user *optlen)
|
||||
{
|
||||
struct sock *sk = sock->sk;
|
||||
struct xdp_sock *xs = xdp_sk(sk);
|
||||
int len;
|
||||
|
||||
if (level != SOL_XDP)
|
||||
return -ENOPROTOOPT;
|
||||
|
||||
if (get_user(len, optlen))
|
||||
return -EFAULT;
|
||||
if (len < 0)
|
||||
return -EINVAL;
|
||||
|
||||
switch (optname) {
|
||||
case XDP_STATISTICS:
|
||||
{
|
||||
struct xdp_statistics stats;
|
||||
|
||||
if (len < sizeof(stats))
|
||||
return -EINVAL;
|
||||
|
||||
mutex_lock(&xs->mutex);
|
||||
stats.rx_dropped = xs->rx_dropped;
|
||||
stats.rx_invalid_descs = xskq_nb_invalid_descs(xs->rx);
|
||||
stats.tx_invalid_descs = xskq_nb_invalid_descs(xs->tx);
|
||||
mutex_unlock(&xs->mutex);
|
||||
|
||||
if (copy_to_user(optval, &stats, sizeof(stats)))
|
||||
return -EFAULT;
|
||||
if (put_user(sizeof(stats), optlen))
|
||||
return -EFAULT;
|
||||
|
||||
return 0;
|
||||
}
|
||||
default:
|
||||
break;
|
||||
}
|
||||
|
||||
return -EOPNOTSUPP;
|
||||
}
|
||||
|
||||
static int xsk_mmap(struct file *file, struct socket *sock,
|
||||
struct vm_area_struct *vma)
|
||||
{
|
||||
unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
|
||||
unsigned long size = vma->vm_end - vma->vm_start;
|
||||
struct xdp_sock *xs = xdp_sk(sock->sk);
|
||||
struct xsk_queue *q = NULL;
|
||||
unsigned long pfn;
|
||||
struct page *qpg;
|
||||
|
||||
if (offset == XDP_PGOFF_RX_RING) {
|
||||
q = xs->rx;
|
||||
} else if (offset == XDP_PGOFF_TX_RING) {
|
||||
q = xs->tx;
|
||||
} else {
|
||||
if (!xs->umem)
|
||||
return -EINVAL;
|
||||
|
||||
if (offset == XDP_UMEM_PGOFF_FILL_RING)
|
||||
q = xs->umem->fq;
|
||||
else if (offset == XDP_UMEM_PGOFF_COMPLETION_RING)
|
||||
q = xs->umem->cq;
|
||||
}
|
||||
|
||||
if (!q)
|
||||
return -EINVAL;
|
||||
|
||||
qpg = virt_to_head_page(q->ring);
|
||||
if (size > (PAGE_SIZE << compound_order(qpg)))
|
||||
return -EINVAL;
|
||||
|
||||
pfn = virt_to_phys(q->ring) >> PAGE_SHIFT;
|
||||
return remap_pfn_range(vma, vma->vm_start, pfn,
|
||||
size, vma->vm_page_prot);
|
||||
}
|
||||
|
||||
static struct proto xsk_proto = {
|
||||
.name = "XDP",
|
||||
.owner = THIS_MODULE,
|
||||
.obj_size = sizeof(struct xdp_sock),
|
||||
};
|
||||
|
||||
static const struct proto_ops xsk_proto_ops = {
|
||||
.family = PF_XDP,
|
||||
.owner = THIS_MODULE,
|
||||
.release = xsk_release,
|
||||
.bind = xsk_bind,
|
||||
.connect = sock_no_connect,
|
||||
.socketpair = sock_no_socketpair,
|
||||
.accept = sock_no_accept,
|
||||
.getname = sock_no_getname,
|
||||
.poll = xsk_poll,
|
||||
.ioctl = sock_no_ioctl,
|
||||
.listen = sock_no_listen,
|
||||
.shutdown = sock_no_shutdown,
|
||||
.setsockopt = xsk_setsockopt,
|
||||
.getsockopt = xsk_getsockopt,
|
||||
.sendmsg = xsk_sendmsg,
|
||||
.recvmsg = sock_no_recvmsg,
|
||||
.mmap = xsk_mmap,
|
||||
.sendpage = sock_no_sendpage,
|
||||
};
|
||||
|
||||
static void xsk_destruct(struct sock *sk)
|
||||
{
|
||||
struct xdp_sock *xs = xdp_sk(sk);
|
||||
|
||||
if (!sock_flag(sk, SOCK_DEAD))
|
||||
return;
|
||||
|
||||
xskq_destroy(xs->rx);
|
||||
xskq_destroy(xs->tx);
|
||||
xdp_put_umem(xs->umem);
|
||||
|
||||
sk_refcnt_debug_dec(sk);
|
||||
}
|
||||
|
||||
static int xsk_create(struct net *net, struct socket *sock, int protocol,
|
||||
int kern)
|
||||
{
|
||||
struct sock *sk;
|
||||
struct xdp_sock *xs;
|
||||
|
||||
if (!ns_capable(net->user_ns, CAP_NET_RAW))
|
||||
return -EPERM;
|
||||
if (sock->type != SOCK_RAW)
|
||||
return -ESOCKTNOSUPPORT;
|
||||
|
||||
if (protocol)
|
||||
return -EPROTONOSUPPORT;
|
||||
|
||||
sock->state = SS_UNCONNECTED;
|
||||
|
||||
sk = sk_alloc(net, PF_XDP, GFP_KERNEL, &xsk_proto, kern);
|
||||
if (!sk)
|
||||
return -ENOBUFS;
|
||||
|
||||
sock->ops = &xsk_proto_ops;
|
||||
|
||||
sock_init_data(sock, sk);
|
||||
|
||||
sk->sk_family = PF_XDP;
|
||||
|
||||
sk->sk_destruct = xsk_destruct;
|
||||
sk_refcnt_debug_inc(sk);
|
||||
|
||||
xs = xdp_sk(sk);
|
||||
mutex_init(&xs->mutex);
|
||||
|
||||
local_bh_disable();
|
||||
sock_prot_inuse_add(net, &xsk_proto, 1);
|
||||
local_bh_enable();
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static const struct net_proto_family xsk_family_ops = {
|
||||
.family = PF_XDP,
|
||||
.create = xsk_create,
|
||||
.owner = THIS_MODULE,
|
||||
};
|
||||
|
||||
static int __init xsk_init(void)
|
||||
{
|
||||
int err;
|
||||
|
||||
err = proto_register(&xsk_proto, 0 /* no slab */);
|
||||
if (err)
|
||||
goto out;
|
||||
|
||||
err = sock_register(&xsk_family_ops);
|
||||
if (err)
|
||||
goto out_proto;
|
||||
|
||||
return 0;
|
||||
|
||||
out_proto:
|
||||
proto_unregister(&xsk_proto);
|
||||
out:
|
||||
return err;
|
||||
}
|
||||
|
||||
fs_initcall(xsk_init);
|
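The send loop in xsk_generic_xmit() above limits how much work a single sendmsg() call does: the budget starts at TX_BATCH_SIZE and the call returns -EAGAIN when a further descriptor is found after the budget is spent. The sketch below models only that control flow in user space. `sketch_xmit` and its parameters are hypothetical stand-ins; the real function also reserves completion-queue space, allocates an skb, and hands it to the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_TX_BATCH_SIZE 16u /* mirrors TX_BATCH_SIZE */

/*
 * Hypothetical stand-in for the batch loop: consumes up to
 * SKETCH_TX_BATCH_SIZE entries from *pending and reports how many were
 * "sent". Returns 0 when the queue was drained, or -EAGAIN when entries
 * remain after the budget was used up.
 */
static int sketch_xmit(uint32_t *pending, uint32_t *sent)
{
	uint32_t budget = SKETCH_TX_BATCH_SIZE;

	*sent = 0;
	while (*pending > 0) {     /* stands in for xskq_peek_desc() */
		if (budget-- == 0)
			return -EAGAIN;

		(*sent)++;         /* frame handed to the driver */
		(*pending)--;      /* stands in for xskq_discard_desc() */
	}

	return 0;
}
```

In this model exactly 16 pending descriptors are all sent and the call returns 0, while 17 pending descriptors yield 16 sent frames and -EAGAIN, so the caller is expected to call again to send the rest.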
73
net/xdp/xsk_queue.c
Normal file
@@ -0,0 +1,73 @@
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
/* XDP user-space ring structure
|
||||
* Copyright(c) 2018 Intel Corporation.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify it
|
||||
* under the terms and conditions of the GNU General Public License,
|
||||
* version 2, as published by the Free Software Foundation.
|
||||
*
|
||||
* This program is distributed in the hope it will be useful, but WITHOUT
|
||||
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
|
||||
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
|
||||
* more details.
|
||||
*/
|
||||
|
||||
#include <linux/slab.h>
|
||||
|
||||
#include "xsk_queue.h"
|
||||
|
||||
void xskq_set_umem(struct xsk_queue *q, struct xdp_umem_props *umem_props)
|
||||
{
|
||||
if (!q)
|
||||
return;
|
||||
|
||||
q->umem_props = *umem_props;
|
||||
}
|
||||
|
||||
static u32 xskq_umem_get_ring_size(struct xsk_queue *q)
|
||||
{
|
||||
return sizeof(struct xdp_umem_ring) + q->nentries * sizeof(u32);
|
||||
}
|
||||
|
||||
static u32 xskq_rxtx_get_ring_size(struct xsk_queue *q)
|
||||
{
|
||||
return (sizeof(struct xdp_ring) +
|
||||
q->nentries * sizeof(struct xdp_desc));
|
||||
}
|
||||
|
||||
struct xsk_queue *xskq_create(u32 nentries, bool umem_queue)
|
||||
{
|
||||
struct xsk_queue *q;
|
||||
gfp_t gfp_flags;
|
||||
size_t size;
|
||||
|
||||
q = kzalloc(sizeof(*q), GFP_KERNEL);
|
||||
if (!q)
|
||||
return NULL;
|
||||
|
||||
q->nentries = nentries;
|
||||
q->ring_mask = nentries - 1;
|
||||
|
||||
gfp_flags = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN |
|
||||
__GFP_COMP | __GFP_NORETRY;
|
||||
size = umem_queue ? xskq_umem_get_ring_size(q) :
|
||||
xskq_rxtx_get_ring_size(q);
|
||||
|
||||
q->ring = (struct xdp_ring *)__get_free_pages(gfp_flags,
|
||||
get_order(size));
|
||||
if (!q->ring) {
|
||||
kfree(q);
|
||||
return NULL;
|
||||
}
|
||||
|
||||
return q;
|
||||
}
|
||||
|
||||
void xskq_destroy(struct xsk_queue *q)
|
||||
{
|
||||
if (!q)
|
||||
return;
|
||||
|
||||
page_frag_free(q->ring);
|
||||
kfree(q);
|
||||
}
|
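xskq_create() above stores `ring_mask = nentries - 1`, which only works because xsk_init_queue() requires the entry count to be a power of two. A minimal single-threaded sketch of that scheme follows: the producer and consumer counters run freely as 32-bit values and are masked only when indexing the array. The `sketch_` names and the ring size are hypothetical, and the memory barriers a shared kernel/user ring needs are left out.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NENTRIES 8u /* assumed size; must be a power of two */

struct sketch_ring {
	uint32_t producer; /* free-running, never masked in place */
	uint32_t consumer; /* free-running, never masked in place */
	uint32_t desc[SKETCH_NENTRIES];
};

/* Unsigned subtraction stays correct when the counters wrap around. */
static uint32_t sketch_used(const struct sketch_ring *r)
{
	return r->producer - r->consumer;
}

static uint32_t sketch_free(const struct sketch_ring *r)
{
	return SKETCH_NENTRIES - sketch_used(r);
}

/* Returns 0 on success, -1 if the ring is full. */
static int sketch_push(struct sketch_ring *r, uint32_t v)
{
	if (sketch_free(r) == 0)
		return -1;

	r->desc[r->producer++ & (SKETCH_NENTRIES - 1)] = v;
	return 0;
}

/* Returns 0 on success, -1 if the ring is empty. */
static int sketch_pop(struct sketch_ring *r, uint32_t *v)
{
	if (sketch_used(r) == 0)
		return -1;

	*v = r->desc[r->consumer++ & (SKETCH_NENTRIES - 1)];
	return 0;
}
```

Starting both counters just below UINT32_MAX exercises the wraparound: the fill level is still computed correctly, and because 2^32 is a multiple of the ring size, the masked slot index continues without a gap across the wrap.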
247
net/xdp/xsk_queue.h
Normal file
@@ -0,0 +1,247 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0
|
||||
* XDP user-space ring structure
|
||||
* Copyright(c) 2018 Intel Corporation.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify it
|
||||
* under the terms and conditions of the GNU General Public License,
|
||||
* version 2, as published by the Free Software Foundation.
|
||||
*
|
||||
* This program is distributed in the hope it will be useful, but WITHOUT
|
||||
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
|
||||
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
|
||||
* more details.
|
||||
*/
|
||||
|
||||
#ifndef _LINUX_XSK_QUEUE_H
|
||||
#define _LINUX_XSK_QUEUE_H
|
||||
|
||||
#include <linux/types.h>
|
||||
#include <linux/if_xdp.h>
|
||||
|
||||
#include "xdp_umem_props.h"
|
||||
|
||||
#define RX_BATCH_SIZE 16
|
||||
|
||||
struct xsk_queue {
|
||||
struct xdp_umem_props umem_props;
|
||||
u32 ring_mask;
|
||||
u32 nentries;
|
||||
u32 prod_head;
|
||||
u32 prod_tail;
|
||||
u32 cons_head;
|
||||
u32 cons_tail;
|
||||
struct xdp_ring *ring;
|
||||
u64 invalid_descs;
|
||||
};
|
||||
|
||||
/* Common functions operating on both RXTX and umem queues */
|
||||
|
||||
static inline u64 xskq_nb_invalid_descs(struct xsk_queue *q)
|
||||
{
|
||||
return q ? q->invalid_descs : 0;
|
||||
}
|
||||
|
||||
static inline u32 xskq_nb_avail(struct xsk_queue *q, u32 dcnt)
|
||||
{
|
||||
u32 entries = q->prod_tail - q->cons_tail;
|
||||
|
||||
if (entries == 0) {
|
||||
/* Refresh the local pointer */
|
||||
q->prod_tail = READ_ONCE(q->ring->producer);
|
||||
entries = q->prod_tail - q->cons_tail;
|
||||
}
|
||||
|
||||
return (entries > dcnt) ? dcnt : entries;
|
||||
}
|
||||
|
||||
static inline u32 xskq_nb_free(struct xsk_queue *q, u32 producer, u32 dcnt)
|
||||
{
|
||||
u32 free_entries = q->nentries - (producer - q->cons_tail);
|
||||
|
||||
if (free_entries >= dcnt)
|
||||
return free_entries;
|
||||
|
||||
/* Refresh the local tail pointer */
|
||||
q->cons_tail = READ_ONCE(q->ring->consumer);
|
||||
return q->nentries - (producer - q->cons_tail);
|
||||
}
|
||||
|
||||
/* UMEM queue */
|
||||
|
||||
static inline bool xskq_is_valid_id(struct xsk_queue *q, u32 idx)
|
||||
{
|
||||
if (unlikely(idx >= q->umem_props.nframes)) {
|
||||
q->invalid_descs++;
|
||||
return false;
|
||||
}
|
||||
return true;
|
||||
}
|
||||
|
||||
static inline u32 *xskq_validate_id(struct xsk_queue *q)
|
||||
{
|
||||
while (q->cons_tail != q->cons_head) {
|
||||
struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
|
||||
unsigned int idx = q->cons_tail & q->ring_mask;
|
||||
|
||||
if (xskq_is_valid_id(q, ring->desc[idx]))
|
||||
return &ring->desc[idx];
|
||||
|
||||
q->cons_tail++;
|
||||
}
|
||||
|
||||
return NULL;
|
||||
}
|
||||
|
||||
static inline u32 *xskq_peek_id(struct xsk_queue *q)
|
||||
{
|
||||
struct xdp_umem_ring *ring;
|
||||
|
||||
if (q->cons_tail == q->cons_head) {
|
||||
WRITE_ONCE(q->ring->consumer, q->cons_tail);
|
||||
q->cons_head = q->cons_tail + xskq_nb_avail(q, RX_BATCH_SIZE);
|
||||
|
||||
/* Order consumer and data */
|
||||
smp_rmb();
|
||||
|
||||
return xskq_validate_id(q);
|
||||
}
|
||||
|
||||
ring = (struct xdp_umem_ring *)q->ring;
|
||||
return &ring->desc[q->cons_tail & q->ring_mask];
|
||||
}
|
||||
|
||||
static inline void xskq_discard_id(struct xsk_queue *q)
|
||||
{
|
||||
q->cons_tail++;
|
||||
(void)xskq_validate_id(q);
|
||||
}
|
||||
|
||||
static inline int xskq_produce_id(struct xsk_queue *q, u32 id)
|
||||
{
|
||||
struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
|
||||
|
||||
ring->desc[q->prod_tail++ & q->ring_mask] = id;
|
||||
|
||||
/* Order producer and data */
|
||||
smp_wmb();
|
||||
|
||||
WRITE_ONCE(q->ring->producer, q->prod_tail);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static inline int xskq_reserve_id(struct xsk_queue *q)
|
||||
{
|
||||
if (xskq_nb_free(q, q->prod_head, 1) == 0)
|
||||
return -ENOSPC;
|
||||
|
||||
q->prod_head++;
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* Rx/Tx queue */
|
||||
|
||||
static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
|
||||
{
|
||||
u32 buff_len;
|
||||
|
||||
if (unlikely(d->idx >= q->umem_props.nframes)) {
|
||||
q->invalid_descs++;
|
||||
return false;
|
||||
}
|
||||
|
||||
buff_len = q->umem_props.frame_size;
|
||||
if (unlikely(d->len > buff_len || d->len == 0 ||
|
||||
d->offset > buff_len || d->offset + d->len > buff_len)) {
|
||||
q->invalid_descs++;
|
||||
return false;
|
||||
}
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
static inline struct xdp_desc *xskq_validate_desc(struct xsk_queue *q,
|
||||
struct xdp_desc *desc)
|
||||
{
|
||||
while (q->cons_tail != q->cons_head) {
|
||||
struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
|
||||
unsigned int idx = q->cons_tail & q->ring_mask;
|
||||
|
||||
if (xskq_is_valid_desc(q, &ring->desc[idx])) {
|
||||
if (desc)
|
||||
*desc = ring->desc[idx];
|
||||
return desc;
|
||||
}
|
||||
|
||||
q->cons_tail++;
|
||||
}
|
||||
|
||||
return NULL;
|
||||
}
|
||||
|
||||
static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
|
||||
struct xdp_desc *desc)
|
||||
{
|
||||
struct xdp_rxtx_ring *ring;
|
||||
|
||||
if (q->cons_tail == q->cons_head) {
|
||||
WRITE_ONCE(q->ring->consumer, q->cons_tail);
|
||||
q->cons_head = q->cons_tail + xskq_nb_avail(q, RX_BATCH_SIZE);
|
||||
|
||||
/* Order consumer and data */
|
||||
smp_rmb();
|
||||
|
||||
return xskq_validate_desc(q, desc);
|
||||
}
|
||||
|
||||
ring = (struct xdp_rxtx_ring *)q->ring;
|
||||
*desc = ring->desc[q->cons_tail & q->ring_mask];
|
||||
return desc;
|
||||
}
|
||||
|
||||
static inline void xskq_discard_desc(struct xsk_queue *q)
|
||||
{
|
||||
q->cons_tail++;
|
||||
(void)xskq_validate_desc(q, NULL);
|
||||
}
|
||||
|
||||
static inline int xskq_produce_batch_desc(struct xsk_queue *q,
|
||||
u32 id, u32 len, u16 offset)
|
||||
{
|
||||
struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
|
||||
unsigned int idx;
|
||||
|
||||
if (xskq_nb_free(q, q->prod_head, 1) == 0)
|
||||
return -ENOSPC;
|
||||
|
||||
idx = (q->prod_head++) & q->ring_mask;
|
||||
ring->desc[idx].idx = id;
|
||||
ring->desc[idx].len = len;
|
||||
ring->desc[idx].offset = offset;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static inline void xskq_produce_flush_desc(struct xsk_queue *q)
|
||||
{
|
||||
/* Order producer and data */
|
||||
smp_wmb();
|
||||
|
||||
q->prod_tail = q->prod_head;
|
||||
WRITE_ONCE(q->ring->producer, q->prod_tail);
|
||||
}
|
||||
|
||||
static inline bool xskq_full_desc(struct xsk_queue *q)
|
||||
{
|
||||
return (xskq_nb_avail(q, q->nentries) == q->nentries);
|
||||
}
|
||||
|
||||
static inline bool xskq_empty_desc(struct xsk_queue *q)
|
||||
{
|
||||
return (xskq_nb_free(q, q->prod_tail, 1) == q->nentries);
|
||||
}
|
||||
|
||||
void xskq_set_umem(struct xsk_queue *q, struct xdp_umem_props *umem_props);
|
||||
struct xsk_queue *xskq_create(u32 nentries, bool umem_queue);
|
||||
void xskq_destroy(struct xsk_queue *q_ops);
|
||||
|
||||
#endif /* _LINUX_XSK_QUEUE_H */
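The queue helpers above keep free-running 32-bit producer and consumer counters and apply the ring mask only when indexing a slot. The following is a minimal user-space sketch of that arithmetic, not the kernel implementation: `struct demo_ring` and the `demo_*` functions are hypothetical names introduced here for illustration, and the sketch assumes a power-of-two ring size, as the masking in the header implies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the counters kept in struct xsk_queue. */
struct demo_ring {
	uint32_t prod;     /* free-running producer counter */
	uint32_t cons;     /* free-running consumer counter */
	uint32_t nentries; /* ring size, assumed to be a power of two */
	uint32_t mask;     /* nentries - 1, used to pick a slot */
};

static void demo_ring_init(struct demo_ring *r, uint32_t nentries)
{
	/* Masking only works when the size is a non-zero power of two. */
	assert(nentries != 0 && (nentries & (nentries - 1)) == 0);
	r->prod = 0;
	r->cons = 0;
	r->nentries = nentries;
	r->mask = nentries - 1;
}

/* Entries ready to consume, capped at max. Unsigned subtraction stays
 * correct even after the counters wrap past UINT32_MAX. */
static uint32_t demo_nb_avail(const struct demo_ring *r, uint32_t max)
{
	uint32_t entries = r->prod - r->cons;

	return entries > max ? max : entries;
}

/* Slots still free for the producer. */
static uint32_t demo_nb_free(const struct demo_ring *r)
{
	return r->nentries - (r->prod - r->cons);
}

/* Map a free-running counter value to a slot in the ring. */
static uint32_t demo_slot(const struct demo_ring *r, uint32_t counter)
{
	return counter & r->mask;
}
```

Because the counters are never reduced modulo the ring size, a full ring (`prod - cons == nentries`) and an empty ring (`prod == cons`) remain distinguishable without sacrificing a slot.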
|
@ -45,10 +45,12 @@ hostprogs-y += xdp_rxq_info
|
||||
hostprogs-y += syscall_tp
|
||||
hostprogs-y += cpustat
|
||||
hostprogs-y += xdp_adjust_tail
|
||||
hostprogs-y += xdpsock
|
||||
|
||||
# Libbpf dependencies
|
||||
LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o
|
||||
CGROUP_HELPERS := ../../tools/testing/selftests/bpf/cgroup_helpers.o
|
||||
TRACE_HELPERS := ../../tools/testing/selftests/bpf/trace_helpers.o
|
||||
|
||||
test_lru_dist-objs := test_lru_dist.o $(LIBBPF)
|
||||
sock_example-objs := sock_example.o $(LIBBPF)
|
||||
@ -65,10 +67,10 @@ tracex6-objs := bpf_load.o $(LIBBPF) tracex6_user.o
|
||||
tracex7-objs := bpf_load.o $(LIBBPF) tracex7_user.o
|
||||
load_sock_ops-objs := bpf_load.o $(LIBBPF) load_sock_ops.o
|
||||
test_probe_write_user-objs := bpf_load.o $(LIBBPF) test_probe_write_user_user.o
|
||||
trace_output-objs := bpf_load.o $(LIBBPF) trace_output_user.o
|
||||
trace_output-objs := bpf_load.o $(LIBBPF) trace_output_user.o $(TRACE_HELPERS)
|
||||
lathist-objs := bpf_load.o $(LIBBPF) lathist_user.o
|
||||
offwaketime-objs := bpf_load.o $(LIBBPF) offwaketime_user.o
|
||||
spintest-objs := bpf_load.o $(LIBBPF) spintest_user.o
|
||||
offwaketime-objs := bpf_load.o $(LIBBPF) offwaketime_user.o $(TRACE_HELPERS)
|
||||
spintest-objs := bpf_load.o $(LIBBPF) spintest_user.o $(TRACE_HELPERS)
|
||||
map_perf_test-objs := bpf_load.o $(LIBBPF) map_perf_test_user.o
|
||||
test_overhead-objs := bpf_load.o $(LIBBPF) test_overhead_user.o
|
||||
test_cgrp2_array_pin-objs := $(LIBBPF) test_cgrp2_array_pin.o
|
||||
@ -82,8 +84,8 @@ xdp2-objs := bpf_load.o $(LIBBPF) xdp1_user.o
|
||||
xdp_router_ipv4-objs := bpf_load.o $(LIBBPF) xdp_router_ipv4_user.o
|
||||
test_current_task_under_cgroup-objs := bpf_load.o $(LIBBPF) $(CGROUP_HELPERS) \
|
||||
test_current_task_under_cgroup_user.o
|
||||
trace_event-objs := bpf_load.o $(LIBBPF) trace_event_user.o
|
||||
sampleip-objs := bpf_load.o $(LIBBPF) sampleip_user.o
|
||||
trace_event-objs := bpf_load.o $(LIBBPF) trace_event_user.o $(TRACE_HELPERS)
|
||||
sampleip-objs := bpf_load.o $(LIBBPF) sampleip_user.o $(TRACE_HELPERS)
|
||||
tc_l2_redirect-objs := bpf_load.o $(LIBBPF) tc_l2_redirect_user.o
|
||||
lwt_len_hist-objs := bpf_load.o $(LIBBPF) lwt_len_hist_user.o
|
||||
xdp_tx_iptunnel-objs := bpf_load.o $(LIBBPF) xdp_tx_iptunnel_user.o
|
||||
@ -97,6 +99,7 @@ xdp_rxq_info-objs := bpf_load.o $(LIBBPF) xdp_rxq_info_user.o
|
||||
syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
|
||||
cpustat-objs := bpf_load.o $(LIBBPF) cpustat_user.o
|
||||
xdp_adjust_tail-objs := bpf_load.o $(LIBBPF) xdp_adjust_tail_user.o
|
||||
xdpsock-objs := bpf_load.o $(LIBBPF) xdpsock_user.o
|
||||
|
||||
# Tell kbuild to always build the programs
|
||||
always := $(hostprogs-y)
|
||||
@ -150,6 +153,7 @@ always += xdp2skb_meta_kern.o
|
||||
always += syscall_tp_kern.o
|
||||
always += cpustat_kern.o
|
||||
always += xdp_adjust_tail_kern.o
|
||||
always += xdpsock_kern.o
|
||||
|
||||
HOSTCFLAGS += -I$(objtree)/usr/include
|
||||
HOSTCFLAGS += -I$(srctree)/tools/lib/
|
||||
@ -196,6 +200,7 @@ HOSTLOADLIBES_xdp_rxq_info += -lelf
|
||||
HOSTLOADLIBES_syscall_tp += -lelf
|
||||
HOSTLOADLIBES_cpustat += -lelf
|
||||
HOSTLOADLIBES_xdp_adjust_tail += -lelf
|
||||
HOSTLOADLIBES_xdpsock += -lelf -pthread
|
||||
|
||||
# Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
|
||||
# make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
|
||||
|
@ -145,6 +145,9 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
|
||||
}
|
||||
|
||||
if (is_kprobe || is_kretprobe) {
|
||||
bool need_normal_check = true;
|
||||
const char *event_prefix = "";
|
||||
|
||||
if (is_kprobe)
|
||||
event += 7;
|
||||
else
|
||||
@ -158,18 +161,33 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
|
||||
if (isdigit(*event))
|
||||
return populate_prog_array(event, fd);
|
||||
|
||||
snprintf(buf, sizeof(buf),
|
||||
"echo '%c:%s %s' >> /sys/kernel/debug/tracing/kprobe_events",
|
||||
is_kprobe ? 'p' : 'r', event, event);
|
||||
err = system(buf);
|
||||
if (err < 0) {
|
||||
printf("failed to create kprobe '%s' error '%s'\n",
|
||||
event, strerror(errno));
|
||||
return -1;
|
||||
#ifdef __x86_64__
|
||||
if (strncmp(event, "sys_", 4) == 0) {
|
||||
snprintf(buf, sizeof(buf),
|
||||
"echo '%c:__x64_%s __x64_%s' >> /sys/kernel/debug/tracing/kprobe_events",
|
||||
is_kprobe ? 'p' : 'r', event, event);
|
||||
err = system(buf);
|
||||
if (err >= 0) {
|
||||
need_normal_check = false;
|
||||
event_prefix = "__x64_";
|
||||
}
|
||||
}
|
||||
#endif
|
||||
if (need_normal_check) {
|
||||
snprintf(buf, sizeof(buf),
|
||||
"echo '%c:%s %s' >> /sys/kernel/debug/tracing/kprobe_events",
|
||||
is_kprobe ? 'p' : 'r', event, event);
|
||||
err = system(buf);
|
||||
if (err < 0) {
|
||||
printf("failed to create kprobe '%s' error '%s'\n",
|
||||
event, strerror(errno));
|
||||
return -1;
|
||||
}
|
||||
}
|
||||
|
||||
strcpy(buf, DEBUGFS);
|
||||
strcat(buf, "events/kprobes/");
|
||||
strcat(buf, event_prefix);
|
||||
strcat(buf, event);
|
||||
strcat(buf, "/id");
|
||||
} else if (is_tracepoint) {
|
||||
@ -648,66 +666,3 @@ void read_trace_pipe(void)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#define MAX_SYMS 300000
|
||||
static struct ksym syms[MAX_SYMS];
|
||||
static int sym_cnt;
|
||||
|
||||
static int ksym_cmp(const void *p1, const void *p2)
|
||||
{
|
||||
return ((struct ksym *)p1)->addr - ((struct ksym *)p2)->addr;
|
||||
}
|
||||
|
||||
int load_kallsyms(void)
|
||||
{
|
||||
FILE *f = fopen("/proc/kallsyms", "r");
|
||||
char func[256], buf[256];
|
||||
char symbol;
|
||||
void *addr;
|
||||
int i = 0;
|
||||
|
||||
if (!f)
|
||||
return -ENOENT;
|
||||
|
||||
while (!feof(f)) {
|
||||
if (!fgets(buf, sizeof(buf), f))
|
||||
break;
|
||||
if (sscanf(buf, "%p %c %s", &addr, &symbol, func) != 3)
|
||||
break;
|
||||
if (!addr)
|
||||
continue;
|
||||
syms[i].addr = (long) addr;
|
||||
syms[i].name = strdup(func);
|
||||
i++;
|
||||
}
|
||||
sym_cnt = i;
|
||||
qsort(syms, sym_cnt, sizeof(struct ksym), ksym_cmp);
|
||||
return 0;
|
||||
}
|
||||
|
||||
struct ksym *ksym_search(long key)
|
||||
{
|
||||
int start = 0, end = sym_cnt;
|
||||
int result;
|
||||
|
||||
while (start < end) {
|
||||
size_t mid = start + (end - start) / 2;
|
||||
|
||||
result = key - syms[mid].addr;
|
||||
if (result < 0)
|
||||
end = mid;
|
||||
else if (result > 0)
|
||||
start = mid + 1;
|
||||
else
|
||||
return &syms[mid];
|
||||
}
|
||||
|
||||
if (start >= 1 && syms[start - 1].addr < key &&
|
||||
key < syms[start].addr)
|
||||
/* valid ksym */
|
||||
return &syms[start - 1];
|
||||
|
||||
/* out of range. return _stext */
|
||||
return &syms[0];
|
||||
}
|
||||
|
||||
|
@ -54,12 +54,5 @@ int load_bpf_file(char *path);
|
||||
int load_bpf_file_fixup_map(const char *path, fixup_map_cb fixup_map);
|
||||
|
||||
void read_trace_pipe(void);
|
||||
struct ksym {
|
||||
long addr;
|
||||
char *name;
|
||||
};
|
||||
|
||||
int load_kallsyms(void);
|
||||
struct ksym *ksym_search(long key);
|
||||
int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags);
|
||||
#endif
|
||||
|
@ -17,6 +17,7 @@
|
||||
#include <sys/resource.h>
|
||||
#include "libbpf.h"
|
||||
#include "bpf_load.h"
|
||||
#include "trace_helpers.h"
|
||||
|
||||
#define PRINT_RAW_ADDR 0
|
||||
|
||||
|
@ -22,6 +22,7 @@
|
||||
#include "libbpf.h"
|
||||
#include "bpf_load.h"
|
||||
#include "perf-sys.h"
|
||||
#include "trace_helpers.h"
|
||||
|
||||
#define DEFAULT_FREQ 99
|
||||
#define DEFAULT_SECS 5
|
||||
|
@ -7,6 +7,7 @@
|
||||
#include <sys/resource.h>
|
||||
#include "libbpf.h"
|
||||
#include "bpf_load.h"
|
||||
#include "trace_helpers.h"
|
||||
|
||||
int main(int ac, char **argv)
|
||||
{
|
||||
|
@ -21,6 +21,7 @@
|
||||
#include "libbpf.h"
|
||||
#include "bpf_load.h"
|
||||
#include "perf-sys.h"
|
||||
#include "trace_helpers.h"
|
||||
|
||||
#define SAMPLE_FREQ 50
|
||||
|
||||
|
@ -21,100 +21,10 @@
|
||||
#include "libbpf.h"
|
||||
#include "bpf_load.h"
|
||||
#include "perf-sys.h"
|
||||
#include "trace_helpers.h"
|
||||
|
||||
static int pmu_fd;
|
||||
|
||||
int page_size;
|
||||
int page_cnt = 8;
|
||||
volatile struct perf_event_mmap_page *header;
|
||||
|
||||
typedef void (*print_fn)(void *data, int size);
|
||||
|
||||
static int perf_event_mmap(int fd)
|
||||
{
|
||||
void *base;
|
||||
int mmap_size;
|
||||
|
||||
page_size = getpagesize();
|
||||
mmap_size = page_size * (page_cnt + 1);
|
||||
|
||||
base = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
|
||||
if (base == MAP_FAILED) {
|
||||
printf("mmap err\n");
|
||||
return -1;
|
||||
}
|
||||
|
||||
header = base;
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int perf_event_poll(int fd)
|
||||
{
|
||||
struct pollfd pfd = { .fd = fd, .events = POLLIN };
|
||||
|
||||
return poll(&pfd, 1, 1000);
|
||||
}
|
||||
|
||||
struct perf_event_sample {
|
||||
struct perf_event_header header;
|
||||
__u32 size;
|
||||
char data[];
|
||||
};
|
||||
|
||||
static void perf_event_read(print_fn fn)
|
||||
{
|
||||
__u64 data_tail = header->data_tail;
|
||||
__u64 data_head = header->data_head;
|
||||
__u64 buffer_size = page_cnt * page_size;
|
||||
void *base, *begin, *end;
|
||||
char buf[256];
|
||||
|
||||
asm volatile("" ::: "memory"); /* in real code it should be smp_rmb() */
|
||||
if (data_head == data_tail)
|
||||
return;
|
||||
|
||||
base = ((char *)header) + page_size;
|
||||
|
||||
begin = base + data_tail % buffer_size;
|
||||
end = base + data_head % buffer_size;
|
||||
|
||||
while (begin != end) {
|
||||
struct perf_event_sample *e;
|
||||
|
||||
e = begin;
|
||||
if (begin + e->header.size > base + buffer_size) {
|
||||
long len = base + buffer_size - begin;
|
||||
|
||||
assert(len < e->header.size);
|
||||
memcpy(buf, begin, len);
|
||||
memcpy(buf + len, base, e->header.size - len);
|
||||
e = (void *) buf;
|
||||
begin = base + e->header.size - len;
|
||||
} else if (begin + e->header.size == base + buffer_size) {
|
||||
begin = base;
|
||||
} else {
|
||||
begin += e->header.size;
|
||||
}
|
||||
|
||||
if (e->header.type == PERF_RECORD_SAMPLE) {
|
||||
fn(e->data, e->size);
|
||||
} else if (e->header.type == PERF_RECORD_LOST) {
|
||||
struct {
|
||||
struct perf_event_header header;
|
||||
__u64 id;
|
||||
__u64 lost;
|
||||
} *lost = (void *) e;
|
||||
printf("lost %lld events\n", lost->lost);
|
||||
} else {
|
||||
printf("unknown event type=%d size=%d\n",
|
||||
e->header.type, e->header.size);
|
||||
}
|
||||
}
|
||||
|
||||
__sync_synchronize(); /* smp_mb() */
|
||||
header->data_tail = data_head;
|
||||
}
|
||||
|
||||
static __u64 time_get_ns(void)
|
||||
{
|
||||
struct timespec ts;
|
||||
@ -127,7 +37,7 @@ static __u64 start_time;
|
||||
|
||||
#define MAX_CNT 100000ll
|
||||
|
||||
static void print_bpf_output(void *data, int size)
|
||||
static int print_bpf_output(void *data, int size)
|
||||
{
|
||||
static __u64 cnt;
|
||||
struct {
|
||||
@ -138,7 +48,7 @@ static void print_bpf_output(void *data, int size)
|
||||
if (e->cookie != 0x12345678) {
|
||||
printf("BUG pid %llx cookie %llx sized %d\n",
|
||||
e->pid, e->cookie, size);
|
||||
kill(0, SIGINT);
|
||||
return PERF_EVENT_ERROR;
|
||||
}
|
||||
|
||||
cnt++;
|
||||
@ -146,8 +56,10 @@ static void print_bpf_output(void *data, int size)
|
||||
if (cnt == MAX_CNT) {
|
||||
printf("recv %lld events per sec\n",
|
||||
MAX_CNT * 1000000000ll / (time_get_ns() - start_time));
|
||||
kill(0, SIGINT);
|
||||
return PERF_EVENT_DONE;
|
||||
}
|
||||
|
||||
return PERF_EVENT_CONT;
|
||||
}
|
||||
|
||||
static void test_bpf_perf_event(void)
|
||||
@ -170,6 +82,7 @@ int main(int argc, char **argv)
|
||||
{
|
||||
char filename[256];
|
||||
FILE *f;
|
||||
int ret;
|
||||
|
||||
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
|
||||
|
||||
@ -187,10 +100,7 @@ int main(int argc, char **argv)
|
||||
(void) f;
|
||||
|
||||
start_time = time_get_ns();
|
||||
for (;;) {
|
||||
perf_event_poll(pmu_fd);
|
||||
perf_event_read(print_bpf_output);
|
||||
}
|
||||
|
||||
return 0;
|
||||
ret = perf_event_poller(pmu_fd, print_bpf_output);
|
||||
kill(0, SIGINT);
|
||||
return ret;
|
||||
}
|
||||
|
11
samples/bpf/xdpsock.h
Normal file
@ -0,0 +1,11 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef XDPSOCK_H_
#define XDPSOCK_H_

/* Power-of-2 number of sockets */
#define MAX_SOCKS 4

/* Round-robin receive */
#define RR_LB 0

#endif /* XDPSOCK_H_ */
56
samples/bpf/xdpsock_kern.c
Normal file
@ -0,0 +1,56 @@
// SPDX-License-Identifier: GPL-2.0
#define KBUILD_MODNAME "foo"
#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"

#include "xdpsock.h"

struct bpf_map_def SEC("maps") qidconf_map = {
	.type = BPF_MAP_TYPE_ARRAY,
	.key_size = sizeof(int),
	.value_size = sizeof(int),
	.max_entries = 1,
};

struct bpf_map_def SEC("maps") xsks_map = {
	.type = BPF_MAP_TYPE_XSKMAP,
	.key_size = sizeof(int),
	.value_size = sizeof(int),
	.max_entries = 4,
};

struct bpf_map_def SEC("maps") rr_map = {
	.type = BPF_MAP_TYPE_PERCPU_ARRAY,
	.key_size = sizeof(int),
	.value_size = sizeof(unsigned int),
	.max_entries = 1,
};

SEC("xdp_sock")
int xdp_sock_prog(struct xdp_md *ctx)
{
	int *qidconf, key = 0, idx;
	unsigned int *rr;

	qidconf = bpf_map_lookup_elem(&qidconf_map, &key);
	if (!qidconf)
		return XDP_ABORTED;

	if (*qidconf != ctx->rx_queue_index)
		return XDP_PASS;

#if RR_LB /* NB! RR_LB is configured in xdpsock.h */
	rr = bpf_map_lookup_elem(&rr_map, &key);
	if (!rr)
		return XDP_ABORTED;

	*rr = (*rr + 1) & (MAX_SOCKS - 1);
	idx = *rr;
#else
	idx = 0;
#endif

	return bpf_redirect_map(&xsks_map, idx, 0);
}

char _license[] SEC("license") = "GPL";
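The round-robin branch of `xdp_sock_prog` advances a cursor with `(*rr + 1) & (MAX_SOCKS - 1)`, which is why `xdpsock.h` requires a power-of-two socket count. A small user-space sketch of that step follows; `demo_rr_next` is a hypothetical helper written for illustration and is not part of the sample.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of samples/bpf: advance a round-robin
 * cursor over nsocks sockets. nsocks must be a non-zero power of two so
 * that the mask (nsocks - 1) keeps the result in [0, nsocks).
 */
static unsigned int demo_rr_next(unsigned int *rr, unsigned int nsocks)
{
	assert(nsocks != 0 && (nsocks & (nsocks - 1)) == 0);

	*rr = (*rr + 1) & (nsocks - 1);
	return *rr;
}
```

Starting from a cursor of 0 with four sockets, successive calls select sockets 1, 2, 3 and then wrap back to 0. With a size that is not a power of two, the mask would skip some indices, so a modulo would be needed instead.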
|
948
samples/bpf/xdpsock_user.c
Normal file
@ -0,0 +1,948 @@
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
/* Copyright(c) 2017 - 2018 Intel Corporation.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify it
|
||||
* under the terms and conditions of the GNU General Public License,
|
||||
* version 2, as published by the Free Software Foundation.
|
||||
*
|
||||
* This program is distributed in the hope it will be useful, but WITHOUT
|
||||
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
|
||||
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
|
||||
* more details.
|
||||
*/
|
||||
|
||||
#include <assert.h>
|
||||
#include <errno.h>
|
||||
#include <getopt.h>
|
||||
#include <libgen.h>
|
||||
#include <linux/bpf.h>
|
||||
#include <linux/if_link.h>
|
||||
#include <linux/if_xdp.h>
|
||||
#include <linux/if_ether.h>
|
||||
#include <net/if.h>
|
||||
#include <signal.h>
|
||||
#include <stdbool.h>
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include <net/ethernet.h>
|
||||
#include <sys/resource.h>
|
||||
#include <sys/socket.h>
|
||||
#include <sys/mman.h>
|
||||
#include <time.h>
|
||||
#include <unistd.h>
|
||||
#include <pthread.h>
|
||||
#include <locale.h>
|
||||
#include <sys/types.h>
|
||||
#include <poll.h>
|
||||
|
||||
#include "bpf_load.h"
|
||||
#include "bpf_util.h"
|
||||
#include "libbpf.h"
|
||||
|
||||
#include "xdpsock.h"
|
||||
|
||||
#ifndef SOL_XDP
|
||||
#define SOL_XDP 283
|
||||
#endif
|
||||
|
||||
#ifndef AF_XDP
|
||||
#define AF_XDP 44
|
||||
#endif
|
||||
|
||||
#ifndef PF_XDP
|
||||
#define PF_XDP AF_XDP
|
||||
#endif
|
||||
|
||||
#define NUM_FRAMES 131072
|
||||
#define FRAME_HEADROOM 0
|
||||
#define FRAME_SIZE 2048
|
||||
#define NUM_DESCS 1024
|
||||
#define BATCH_SIZE 16
|
||||
|
||||
#define FQ_NUM_DESCS 1024
|
||||
#define CQ_NUM_DESCS 1024
|
||||
|
||||
#define DEBUG_HEXDUMP 0
|
||||
|
||||
typedef __u32 u32;
|
||||
|
||||
static unsigned long prev_time;
|
||||
|
||||
enum benchmark_type {
|
||||
BENCH_RXDROP = 0,
|
||||
BENCH_TXONLY = 1,
|
||||
BENCH_L2FWD = 2,
|
||||
};
|
||||
|
||||
static enum benchmark_type opt_bench = BENCH_RXDROP;
|
||||
static u32 opt_xdp_flags;
|
||||
static const char *opt_if = "";
|
||||
static int opt_ifindex;
|
||||
static int opt_queue;
|
||||
static int opt_poll;
|
||||
static int opt_shared_packet_buffer;
|
||||
static int opt_interval = 1;
|
||||
|
||||
struct xdp_umem_uqueue {
|
||||
u32 cached_prod;
|
||||
u32 cached_cons;
|
||||
u32 mask;
|
||||
u32 size;
|
||||
struct xdp_umem_ring *ring;
|
||||
};
|
||||
|
||||
struct xdp_umem {
|
||||
char (*frames)[FRAME_SIZE];
|
||||
struct xdp_umem_uqueue fq;
|
||||
struct xdp_umem_uqueue cq;
|
||||
int fd;
|
||||
};
|
||||
|
||||
struct xdp_uqueue {
|
||||
u32 cached_prod;
|
||||
u32 cached_cons;
|
||||
u32 mask;
|
||||
u32 size;
|
||||
struct xdp_rxtx_ring *ring;
|
||||
};
|
||||
|
||||
struct xdpsock {
|
||||
struct xdp_uqueue rx;
|
||||
struct xdp_uqueue tx;
|
||||
int sfd;
|
||||
struct xdp_umem *umem;
|
||||
u32 outstanding_tx;
|
||||
unsigned long rx_npkts;
|
||||
unsigned long tx_npkts;
|
||||
unsigned long prev_rx_npkts;
|
||||
unsigned long prev_tx_npkts;
|
||||
};
|
||||
|
||||
#define MAX_SOCKS 4
|
||||
static int num_socks;
|
||||
struct xdpsock *xsks[MAX_SOCKS];
|
||||
|
||||
static unsigned long get_nsecs(void)
|
||||
{
|
||||
struct timespec ts;
|
||||
|
||||
clock_gettime(CLOCK_MONOTONIC, &ts);
|
||||
return ts.tv_sec * 1000000000UL + ts.tv_nsec;
|
||||
}
|
||||
|
||||
static void dump_stats(void);
|
||||
|
||||
#define lassert(expr) \
|
||||
do { \
|
||||
if (!(expr)) { \
|
||||
fprintf(stderr, "%s:%s:%i: Assertion failed: " \
|
||||
#expr ": errno: %d/\"%s\"\n", \
|
||||
__FILE__, __func__, __LINE__, \
|
||||
errno, strerror(errno)); \
|
||||
dump_stats(); \
|
||||
exit(EXIT_FAILURE); \
|
||||
} \
|
||||
} while (0)
|
||||
|
||||
#define barrier() __asm__ __volatile__("": : :"memory")
|
||||
#define u_smp_rmb() barrier()
|
||||
#define u_smp_wmb() barrier()
|
||||
#define likely(x) __builtin_expect(!!(x), 1)
|
||||
#define unlikely(x) __builtin_expect(!!(x), 0)
|
||||
|
||||
static const char pkt_data[] =
|
||||
"\x3c\xfd\xfe\x9e\x7f\x71\xec\xb1\xd7\x98\x3a\xc0\x08\x00\x45\x00"
|
||||
"\x00\x2e\x00\x00\x00\x00\x40\x11\x88\x97\x05\x08\x07\x08\xc8\x14"
|
||||
"\x1e\x04\x10\x92\x10\x92\x00\x1a\x6d\xa3\x34\x33\x1f\x69\x40\x6b"
|
||||
"\x54\x59\xb6\x14\x2d\x11\x44\xbf\xaf\xd9\xbe\xaa";
|
||||
|
||||
static inline u32 umem_nb_free(struct xdp_umem_uqueue *q, u32 nb)
|
||||
{
|
||||
u32 free_entries = q->size - (q->cached_prod - q->cached_cons);
|
||||
|
||||
if (free_entries >= nb)
|
||||
return free_entries;
|
||||
|
||||
/* Refresh the local tail pointer */
|
||||
q->cached_cons = q->ring->ptrs.consumer;
|
||||
|
||||
return q->size - (q->cached_prod - q->cached_cons);
|
||||
}
|
||||
|
||||
static inline u32 xq_nb_free(struct xdp_uqueue *q, u32 ndescs)
|
||||
{
|
||||
u32 free_entries = q->cached_cons - q->cached_prod;
|
||||
|
||||
if (free_entries >= ndescs)
|
||||
return free_entries;
|
||||
|
||||
/* Refresh the local tail pointer */
|
||||
q->cached_cons = q->ring->ptrs.consumer + q->size;
|
||||
return q->cached_cons - q->cached_prod;
|
||||
}
|
||||
|
||||
static inline u32 umem_nb_avail(struct xdp_umem_uqueue *q, u32 nb)
|
||||
{
|
||||
u32 entries = q->cached_prod - q->cached_cons;
|
||||
|
||||
if (entries == 0) {
|
||||
q->cached_prod = q->ring->ptrs.producer;
|
||||
entries = q->cached_prod - q->cached_cons;
|
||||
}
|
||||
|
||||
return (entries > nb) ? nb : entries;
|
||||
}
|
||||
|
||||
static inline u32 xq_nb_avail(struct xdp_uqueue *q, u32 ndescs)
|
||||
{
|
||||
u32 entries = q->cached_prod - q->cached_cons;
|
||||
|
||||
if (entries == 0) {
|
||||
q->cached_prod = q->ring->ptrs.producer;
|
||||
entries = q->cached_prod - q->cached_cons;
|
||||
}
|
||||
|
||||
return (entries > ndescs) ? ndescs : entries;
|
||||
}
|
||||
|
||||
static inline int umem_fill_to_kernel_ex(struct xdp_umem_uqueue *fq,
|
||||
struct xdp_desc *d,
|
||||
size_t nb)
|
||||
{
|
||||
u32 i;
|
||||
|
||||
if (umem_nb_free(fq, nb) < nb)
|
||||
return -ENOSPC;
|
||||
|
||||
for (i = 0; i < nb; i++) {
|
||||
u32 idx = fq->cached_prod++ & fq->mask;
|
||||
|
||||
fq->ring->desc[idx] = d[i].idx;
|
||||
}
|
||||
|
||||
u_smp_wmb();
|
||||
|
||||
fq->ring->ptrs.producer = fq->cached_prod;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static inline int umem_fill_to_kernel(struct xdp_umem_uqueue *fq, u32 *d,
|
||||
size_t nb)
|
||||
{
|
||||
u32 i;
|
||||
|
||||
if (umem_nb_free(fq, nb) < nb)
|
||||
return -ENOSPC;
|
||||
|
||||
for (i = 0; i < nb; i++) {
|
||||
u32 idx = fq->cached_prod++ & fq->mask;
|
||||
|
||||
fq->ring->desc[idx] = d[i];
|
||||
}
|
||||
|
||||
u_smp_wmb();
|
||||
|
||||
fq->ring->ptrs.producer = fq->cached_prod;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static inline size_t umem_complete_from_kernel(struct xdp_umem_uqueue *cq,
|
||||
u32 *d, size_t nb)
|
||||
{
|
||||
u32 idx, i, entries = umem_nb_avail(cq, nb);
|
||||
|
||||
u_smp_rmb();
|
||||
|
||||
for (i = 0; i < entries; i++) {
|
||||
idx = cq->cached_cons++ & cq->mask;
|
||||
d[i] = cq->ring->desc[idx];
|
||||
}
|
||||
|
||||
if (entries > 0) {
|
||||
u_smp_wmb();
|
||||
|
||||
cq->ring->ptrs.consumer = cq->cached_cons;
|
||||
}
|
||||
|
||||
return entries;
|
||||
}
|
||||
|
||||
static inline void *xq_get_data(struct xdpsock *xsk, __u32 idx, __u32 off)
|
||||
{
|
||||
lassert(idx < NUM_FRAMES);
|
||||
return &xsk->umem->frames[idx][off];
|
||||
}
|
||||
|
||||
static inline int xq_enq(struct xdp_uqueue *uq,
|
||||
const struct xdp_desc *descs,
|
||||
unsigned int ndescs)
|
||||
{
|
||||
struct xdp_rxtx_ring *r = uq->ring;
|
||||
unsigned int i;
|
||||
|
||||
if (xq_nb_free(uq, ndescs) < ndescs)
|
||||
return -ENOSPC;
|
||||
|
||||
for (i = 0; i < ndescs; i++) {
|
||||
u32 idx = uq->cached_prod++ & uq->mask;
|
||||
|
||||
r->desc[idx].idx = descs[i].idx;
|
||||
r->desc[idx].len = descs[i].len;
|
||||
r->desc[idx].offset = descs[i].offset;
|
||||
}
|
||||
|
||||
u_smp_wmb();
|
||||
|
||||
r->ptrs.producer = uq->cached_prod;
|
||||
return 0;
|
||||
}
|
||||
|
||||
static inline int xq_enq_tx_only(struct xdp_uqueue *uq,
|
||||
__u32 idx, unsigned int ndescs)
|
||||
{
|
||||
struct xdp_rxtx_ring *q = uq->ring;
|
||||
unsigned int i;
|
||||
|
||||
if (xq_nb_free(uq, ndescs) < ndescs)
|
||||
return -ENOSPC;
|
||||
|
||||
for (i = 0; i < ndescs; i++) {
|
||||
u32 slot = uq->cached_prod++ & uq->mask;

/* 'slot' is the ring position; 'idx' is the caller's first frame index */
q->desc[slot].idx = idx + i;
q->desc[slot].len = sizeof(pkt_data) - 1;
q->desc[slot].offset = 0;
|
||||
}
|
||||
|
||||
u_smp_wmb();
|
||||
|
||||
q->ptrs.producer = uq->cached_prod;
|
||||
return 0;
|
||||
}
|
||||
|
||||
static inline int xq_deq(struct xdp_uqueue *uq,
|
||||
struct xdp_desc *descs,
|
||||
int ndescs)
|
||||
{
|
||||
struct xdp_rxtx_ring *r = uq->ring;
|
||||
unsigned int idx;
|
||||
int i, entries;
|
||||
|
||||
entries = xq_nb_avail(uq, ndescs);
|
||||
|
||||
u_smp_rmb();
|
||||
|
||||
for (i = 0; i < entries; i++) {
|
||||
idx = uq->cached_cons++ & uq->mask;
|
||||
descs[i] = r->desc[idx];
|
||||
}
|
||||
|
||||
if (entries > 0) {
|
||||
u_smp_wmb();
|
||||
|
||||
r->ptrs.consumer = uq->cached_cons;
|
||||
}
|
||||
|
||||
return entries;
|
||||
}
|
||||
|
||||
static void swap_mac_addresses(void *data)
|
||||
{
|
||||
struct ether_header *eth = (struct ether_header *)data;
|
||||
struct ether_addr *src_addr = (struct ether_addr *)&eth->ether_shost;
|
||||
struct ether_addr *dst_addr = (struct ether_addr *)&eth->ether_dhost;
|
||||
struct ether_addr tmp;
|
||||
|
||||
tmp = *src_addr;
|
||||
*src_addr = *dst_addr;
|
||||
*dst_addr = tmp;
|
||||
}
|
||||
|
||||
#if DEBUG_HEXDUMP
|
||||
static void hex_dump(void *pkt, size_t length, const char *prefix)
|
||||
{
|
||||
int i = 0;
|
||||
const unsigned char *address = (unsigned char *)pkt;
|
||||
const unsigned char *line = address;
|
||||
size_t line_size = 32;
|
||||
unsigned char c;
|
||||
|
||||
printf("length = %zu\n", length);
|
||||
printf("%s | ", prefix);
|
||||
while (length-- > 0) {
|
||||
printf("%02X ", *address++);
|
||||
if (!(++i % line_size) || (length == 0 && i % line_size)) {
|
||||
if (length == 0) {
|
||||
while (i++ % line_size)
|
||||
printf("__ ");
|
||||
}
|
||||
printf(" | "); /* right close */
|
||||
while (line < address) {
|
||||
c = *line++;
|
||||
printf("%c", (c < 33 || c == 255) ? 0x2E : c);
|
||||
}
|
||||
printf("\n");
|
||||
if (length > 0)
|
||||
printf("%s | ", prefix);
|
||||
}
|
||||
}
|
||||
printf("\n");
|
||||
}
|
||||
#endif
|
||||
|
||||
static size_t gen_eth_frame(char *frame)
|
||||
{
|
||||
memcpy(frame, pkt_data, sizeof(pkt_data) - 1);
|
||||
return sizeof(pkt_data) - 1;
|
||||
}
|
||||
|
||||
static struct xdp_umem *xdp_umem_configure(int sfd)
|
||||
{
|
||||
int fq_size = FQ_NUM_DESCS, cq_size = CQ_NUM_DESCS;
|
||||
struct xdp_umem_reg mr;
|
||||
struct xdp_umem *umem;
|
||||
void *bufs;
|
||||
|
||||
umem = calloc(1, sizeof(*umem));
|
||||
lassert(umem);
|
||||
|
||||
lassert(posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
|
||||
NUM_FRAMES * FRAME_SIZE) == 0);
|
||||
|
||||
mr.addr = (__u64)bufs;
|
||||
mr.len = NUM_FRAMES * FRAME_SIZE;
|
||||
mr.frame_size = FRAME_SIZE;
|
||||
mr.frame_headroom = FRAME_HEADROOM;
|
||||
|
||||
lassert(setsockopt(sfd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr)) == 0);
|
||||
lassert(setsockopt(sfd, SOL_XDP, XDP_UMEM_FILL_RING, &fq_size,
|
||||
sizeof(int)) == 0);
|
||||
lassert(setsockopt(sfd, SOL_XDP, XDP_UMEM_COMPLETION_RING, &cq_size,
|
||||
sizeof(int)) == 0);
|
||||
|
||||
umem->fq.ring = mmap(0, sizeof(struct xdp_umem_ring) +
|
||||
FQ_NUM_DESCS * sizeof(u32),
|
||||
PROT_READ | PROT_WRITE,
|
||||
MAP_SHARED | MAP_POPULATE, sfd,
|
||||
XDP_UMEM_PGOFF_FILL_RING);
|
||||
lassert(umem->fq.ring != MAP_FAILED);
|
||||
|
||||
umem->fq.mask = FQ_NUM_DESCS - 1;
|
||||
umem->fq.size = FQ_NUM_DESCS;
|
||||
|
||||
umem->cq.ring = mmap(0, sizeof(struct xdp_umem_ring) +
|
||||
CQ_NUM_DESCS * sizeof(u32),
|
||||
PROT_READ | PROT_WRITE,
|
||||
MAP_SHARED | MAP_POPULATE, sfd,
|
||||
XDP_UMEM_PGOFF_COMPLETION_RING);
|
||||
lassert(umem->cq.ring != MAP_FAILED);
|
||||
|
||||
umem->cq.mask = CQ_NUM_DESCS - 1;
|
||||
umem->cq.size = CQ_NUM_DESCS;
|
||||
|
||||
umem->frames = (char (*)[FRAME_SIZE])bufs;
|
||||
umem->fd = sfd;
|
||||
|
||||
if (opt_bench == BENCH_TXONLY) {
|
||||
int i;
|
||||
|
||||
for (i = 0; i < NUM_FRAMES; i++)
|
||||
(void)gen_eth_frame(&umem->frames[i][0]);
|
||||
}
|
||||
|
||||
return umem;
|
||||
}
|
||||
|
||||
static struct xdpsock *xsk_configure(struct xdp_umem *umem)
|
||||
{
|
||||
struct sockaddr_xdp sxdp = {};
|
||||
int sfd, ndescs = NUM_DESCS;
|
||||
struct xdpsock *xsk;
|
||||
bool shared = true;
|
||||
u32 i;
|
||||
|
||||
sfd = socket(PF_XDP, SOCK_RAW, 0);
|
||||
lassert(sfd >= 0);
|
||||
|
||||
xsk = calloc(1, sizeof(*xsk));
|
||||
lassert(xsk);
|
||||
|
||||
xsk->sfd = sfd;
|
||||
xsk->outstanding_tx = 0;
|
||||
|
||||
if (!umem) {
|
||||
shared = false;
|
||||
xsk->umem = xdp_umem_configure(sfd);
|
||||
} else {
|
||||
xsk->umem = umem;
|
||||
}
|
||||
|
||||
lassert(setsockopt(sfd, SOL_XDP, XDP_RX_RING,
|
||||
&ndescs, sizeof(int)) == 0);
|
||||
lassert(setsockopt(sfd, SOL_XDP, XDP_TX_RING,
|
||||
&ndescs, sizeof(int)) == 0);
|
||||
|
||||
/* Rx */
|
||||
xsk->rx.ring = mmap(NULL,
|
||||
sizeof(struct xdp_ring) +
|
||||
NUM_DESCS * sizeof(struct xdp_desc),
|
||||
PROT_READ | PROT_WRITE,
|
||||
MAP_SHARED | MAP_POPULATE, sfd,
|
||||
XDP_PGOFF_RX_RING);
|
||||
lassert(xsk->rx.ring != MAP_FAILED);
|
||||
|
||||
if (!shared) {
|
||||
for (i = 0; i < NUM_DESCS / 2; i++)
|
||||
lassert(umem_fill_to_kernel(&xsk->umem->fq, &i, 1)
|
||||
== 0);
|
||||
}
|
||||
|
||||
/* Tx */
|
||||
xsk->tx.ring = mmap(NULL,
|
||||
sizeof(struct xdp_ring) +
|
||||
NUM_DESCS * sizeof(struct xdp_desc),
|
||||
PROT_READ | PROT_WRITE,
|
||||
MAP_SHARED | MAP_POPULATE, sfd,
|
||||
XDP_PGOFF_TX_RING);
|
||||
lassert(xsk->tx.ring != MAP_FAILED);
|
||||
|
||||
xsk->rx.mask = NUM_DESCS - 1;
|
||||
xsk->rx.size = NUM_DESCS;
|
||||
|
||||
xsk->tx.mask = NUM_DESCS - 1;
|
||||
xsk->tx.size = NUM_DESCS;
|
||||
|
||||
sxdp.sxdp_family = PF_XDP;
|
||||
sxdp.sxdp_ifindex = opt_ifindex;
|
||||
sxdp.sxdp_queue_id = opt_queue;
|
||||
if (shared) {
|
||||
sxdp.sxdp_flags = XDP_SHARED_UMEM;
|
||||
sxdp.sxdp_shared_umem_fd = umem->fd;
|
||||
}
|
||||
|
||||
lassert(bind(sfd, (struct sockaddr *)&sxdp, sizeof(sxdp)) == 0);
|
||||
|
||||
return xsk;
|
||||
}
|
||||
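The ring setup in xsk_configure() stores `mask = NUM_DESCS - 1` next to `size = NUM_DESCS`, which only yields valid slot indices when the ring size is a power of two. The following is a minimal sketch of that masking idea, assuming free-running 32-bit producer/consumer counters; the helper names are hypothetical and are not part of the sample.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when size is a non-zero power of two. */
static bool ring_size_is_pow2(uint32_t size)
{
	return size != 0 && (size & (size - 1)) == 0;
}

/* Map a free-running counter to a slot index using mask = size - 1. */
static uint32_t ring_slot(uint32_t counter, uint32_t size)
{
	assert(ring_size_is_pow2(size));
	return counter & (size - 1);
}

/*
 * Number of entries between producer and consumer. Unsigned
 * subtraction stays correct even after the counters wrap.
 */
static uint32_t ring_entries(uint32_t prod, uint32_t cons)
{
	return prod - cons;
}
```

With a size of 1024, counter 1025 lands in slot 1. Masking is cheaper than a modulo, but it silently produces wrong indices if the size is not a power of two, hence the assertion.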
|
||||
static void print_benchmark(bool running)
|
||||
{
|
||||
const char *bench_str = "INVALID";
|
||||
|
||||
if (opt_bench == BENCH_RXDROP)
|
||||
bench_str = "rxdrop";
|
||||
else if (opt_bench == BENCH_TXONLY)
|
||||
bench_str = "txonly";
|
||||
else if (opt_bench == BENCH_L2FWD)
|
||||
bench_str = "l2fwd";
|
||||
|
||||
printf("%s:%d %s ", opt_if, opt_queue, bench_str);
|
||||
if (opt_xdp_flags & XDP_FLAGS_SKB_MODE)
|
||||
printf("xdp-skb ");
|
||||
else if (opt_xdp_flags & XDP_FLAGS_DRV_MODE)
|
||||
printf("xdp-drv ");
|
||||
else
|
||||
printf(" ");
|
||||
|
||||
if (opt_poll)
|
||||
printf("poll() ");
|
||||
|
||||
if (running) {
|
||||
printf("running...");
|
||||
fflush(stdout);
|
||||
}
|
||||
}
|
||||
|
||||
static void dump_stats(void)
|
||||
{
|
||||
unsigned long now = get_nsecs();
|
||||
long dt = now - prev_time;
|
||||
int i;
|
||||
|
||||
prev_time = now;
|
||||
|
||||
for (i = 0; i < num_socks; i++) {
|
||||
char *fmt = "%-15s %'-11.0f %'-11lu\n";
|
||||
double rx_pps, tx_pps;
|
||||
|
||||
rx_pps = (xsks[i]->rx_npkts - xsks[i]->prev_rx_npkts) *
|
||||
1000000000. / dt;
|
||||
tx_pps = (xsks[i]->tx_npkts - xsks[i]->prev_tx_npkts) *
|
||||
1000000000. / dt;
|
||||
|
||||
printf("\n sock%d@", i);
|
||||
print_benchmark(false);
|
||||
printf("\n");
|
||||
|
||||
printf("%-15s %-11s %-11s %-11.2f\n", "", "pps", "pkts",
|
||||
dt / 1000000000.);
|
||||
printf(fmt, "rx", rx_pps, xsks[i]->rx_npkts);
|
||||
printf(fmt, "tx", tx_pps, xsks[i]->tx_npkts);
|
||||
|
||||
xsks[i]->prev_rx_npkts = xsks[i]->rx_npkts;
|
||||
xsks[i]->prev_tx_npkts = xsks[i]->tx_npkts;
|
||||
}
|
||||
}
|
||||
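dump_stats() derives packets per second from the difference between two counter snapshots and the elapsed time in nanoseconds. A small sketch of that arithmetic follows; `rate_pps` is a hypothetical name, and the zero-interval guard is an addition that the sample itself does not have.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Packets per second from two snapshots of a monotonically increasing
 * counter and the elapsed time in nanoseconds.
 */
static double rate_pps(uint64_t cur, uint64_t prev, uint64_t dt_ns)
{
	assert(cur >= prev);	/* counters only ever grow */

	if (dt_ns == 0)		/* avoid dividing by zero */
		return 0.0;

	return (double)(cur - prev) * 1e9 / (double)dt_ns;
}
```

For example, 2000 new packets over a 2 second interval give a rate of 1000 pps.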
|
||||
static void *poller(void *arg)
|
||||
{
|
||||
(void)arg;
|
||||
for (;;) {
|
||||
sleep(opt_interval);
|
||||
dump_stats();
|
||||
}
|
||||
|
||||
return NULL;
|
||||
}
|
||||
|
||||
static void int_exit(int sig)
|
||||
{
|
||||
(void)sig;
|
||||
dump_stats();
|
||||
bpf_set_link_xdp_fd(opt_ifindex, -1, opt_xdp_flags);
|
||||
exit(EXIT_SUCCESS);
|
||||
}
|
||||
|
||||
static struct option long_options[] = {
|
||||
{"rxdrop", no_argument, 0, 'r'},
|
||||
{"txonly", no_argument, 0, 't'},
|
||||
{"l2fwd", no_argument, 0, 'l'},
|
||||
{"interface", required_argument, 0, 'i'},
|
||||
{"queue", required_argument, 0, 'q'},
|
||||
{"poll", no_argument, 0, 'p'},
|
||||
{"shared-buffer", no_argument, 0, 's'},
|
||||
{"xdp-skb", no_argument, 0, 'S'},
|
||||
{"xdp-native", no_argument, 0, 'N'},
|
||||
{"interval", required_argument, 0, 'n'},
|
||||
{0, 0, 0, 0}
|
||||
};
|
||||
|
||||
static void usage(const char *prog)
|
||||
{
|
||||
const char *str =
|
||||
" Usage: %s [OPTIONS]\n"
|
||||
" Options:\n"
|
||||
" -r, --rxdrop Discard all incoming packets (default)\n"
|
||||
" -t, --txonly Only send packets\n"
|
||||
" -l, --l2fwd MAC swap L2 forwarding\n"
|
||||
" -i, --interface=n Run on interface n\n"
|
||||
" -q, --queue=n Use queue n (default 0)\n"
|
||||
" -p, --poll Use poll syscall\n"
|
||||
" -s, --shared-buffer Use shared packet buffer\n"
|
||||
" -S, --xdp-skb Use XDP skb-mode\n"
|
||||
" -N, --xdp-native Enforce XDP native mode\n"
|
||||
" -n, --interval=n Specify statistics update interval (default 1 sec).\n"
|
||||
"\n";
|
||||
fprintf(stderr, str, prog);
|
||||
exit(EXIT_FAILURE);
|
||||
}
|
||||
|
||||
static void parse_command_line(int argc, char **argv)
|
||||
{
|
||||
int option_index, c;
|
||||
|
||||
opterr = 0;
|
||||
|
||||
for (;;) {
|
||||
c = getopt_long(argc, argv, "rtli:q:psSNn:", long_options,
|
||||
&option_index);
|
||||
if (c == -1)
|
||||
break;
|
||||
|
||||
switch (c) {
|
||||
case 'r':
|
||||
opt_bench = BENCH_RXDROP;
|
||||
break;
|
||||
case 't':
|
||||
opt_bench = BENCH_TXONLY;
|
||||
break;
|
||||
case 'l':
|
||||
opt_bench = BENCH_L2FWD;
|
||||
break;
|
||||
case 'i':
|
||||
opt_if = optarg;
|
||||
break;
|
||||
case 'q':
|
||||
opt_queue = atoi(optarg);
|
||||
break;
|
||||
case 's':
|
||||
opt_shared_packet_buffer = 1;
|
||||
break;
|
||||
case 'p':
|
||||
opt_poll = 1;
|
||||
break;
|
||||
case 'S':
|
||||
opt_xdp_flags |= XDP_FLAGS_SKB_MODE;
|
||||
break;
|
||||
case 'N':
|
||||
opt_xdp_flags |= XDP_FLAGS_DRV_MODE;
|
||||
break;
|
||||
case 'n':
|
||||
opt_interval = atoi(optarg);
|
||||
break;
|
||||
default:
|
||||
usage(basename(argv[0]));
|
||||
}
|
||||
}
|
||||
|
||||
opt_ifindex = if_nametoindex(opt_if);
|
||||
if (!opt_ifindex) {
|
||||
fprintf(stderr, "ERROR: interface \"%s\" does not exist\n",
|
||||
opt_if);
|
||||
usage(basename(argv[0]));
|
||||
}
|
||||
}
|
||||
|
||||
static void kick_tx(int fd)
|
||||
{
|
||||
int ret;
|
||||
|
||||
ret = sendto(fd, NULL, 0, MSG_DONTWAIT, NULL, 0);
|
||||
if (ret >= 0 || errno == ENOBUFS || errno == EAGAIN)
|
||||
return;
|
||||
lassert(0);
|
||||
}
|
||||
|
||||
static inline void complete_tx_l2fwd(struct xdpsock *xsk)
|
||||
{
|
||||
u32 descs[BATCH_SIZE];
|
||||
unsigned int rcvd;
|
||||
size_t ndescs;
|
||||
|
||||
if (!xsk->outstanding_tx)
|
||||
return;
|
||||
|
||||
kick_tx(xsk->sfd);
|
||||
ndescs = (xsk->outstanding_tx > BATCH_SIZE) ? BATCH_SIZE :
|
||||
xsk->outstanding_tx;
|
||||
|
||||
/* re-add completed Tx buffers */
|
||||
rcvd = umem_complete_from_kernel(&xsk->umem->cq, descs, ndescs);
|
||||
if (rcvd > 0) {
|
||||
umem_fill_to_kernel(&xsk->umem->fq, descs, rcvd);
|
||||
xsk->outstanding_tx -= rcvd;
|
||||
xsk->tx_npkts += rcvd;
|
||||
}
|
||||
}
|
||||
|
||||
static inline void complete_tx_only(struct xdpsock *xsk)
|
||||
{
|
||||
u32 descs[BATCH_SIZE];
|
||||
unsigned int rcvd;
|
||||
|
||||
if (!xsk->outstanding_tx)
|
||||
return;
|
||||
|
||||
kick_tx(xsk->sfd);
|
||||
|
||||
rcvd = umem_complete_from_kernel(&xsk->umem->cq, descs, BATCH_SIZE);
|
||||
if (rcvd > 0) {
|
||||
xsk->outstanding_tx -= rcvd;
|
||||
xsk->tx_npkts += rcvd;
|
||||
}
|
||||
}
|
||||
|
||||
static void rx_drop(struct xdpsock *xsk)
|
||||
{
|
||||
struct xdp_desc descs[BATCH_SIZE];
|
||||
unsigned int rcvd, i;
|
||||
|
||||
rcvd = xq_deq(&xsk->rx, descs, BATCH_SIZE);
|
||||
if (!rcvd)
|
||||
return;
|
||||
|
||||
for (i = 0; i < rcvd; i++) {
|
||||
u32 idx = descs[i].idx;
|
||||
|
||||
lassert(idx < NUM_FRAMES);
|
||||
#if DEBUG_HEXDUMP
|
||||
char *pkt;
|
||||
char buf[32];
|
||||
|
||||
pkt = xq_get_data(xsk, idx, descs[i].offset);
|
||||
sprintf(buf, "idx=%d", idx);
|
||||
hex_dump(pkt, descs[i].len, buf);
|
||||
#endif
|
||||
}
|
||||
|
||||
xsk->rx_npkts += rcvd;
|
||||
|
||||
umem_fill_to_kernel_ex(&xsk->umem->fq, descs, rcvd);
|
||||
}
|
||||
|
||||
static void rx_drop_all(void)
|
||||
{
|
||||
struct pollfd fds[MAX_SOCKS + 1];
|
||||
int i, ret, timeout, nfds = 1;
|
||||
|
||||
memset(fds, 0, sizeof(fds));
|
||||
|
||||
for (i = 0; i < num_socks; i++) {
|
||||
fds[i].fd = xsks[i]->sfd;
|
||||
fds[i].events = POLLIN;
|
||||
timeout = 1000; /* poll() timeout in ms (1 s) */
|
||||
}
|
||||
|
||||
for (;;) {
|
||||
if (opt_poll) {
|
||||
ret = poll(fds, nfds, timeout);
|
||||
if (ret <= 0)
|
||||
continue;
|
||||
}
|
||||
|
||||
for (i = 0; i < num_socks; i++)
|
||||
rx_drop(xsks[i]);
|
||||
}
|
||||
}
|
||||
|
||||
static void tx_only(struct xdpsock *xsk)
|
||||
{
|
||||
int timeout, ret, nfds = 1;
|
||||
struct pollfd fds[nfds + 1];
|
||||
unsigned int idx = 0;
|
||||
|
||||
memset(fds, 0, sizeof(fds));
|
||||
fds[0].fd = xsk->sfd;
|
||||
fds[0].events = POLLOUT;
|
||||
timeout = 1000; /* poll() timeout in ms (1 s) */
|
||||
|
||||
for (;;) {
|
||||
if (opt_poll) {
|
||||
ret = poll(fds, nfds, timeout);
|
||||
if (ret <= 0)
|
||||
continue;
|
||||
|
||||
if (fds[0].fd != xsk->sfd ||
|
||||
!(fds[0].revents & POLLOUT))
|
||||
continue;
|
||||
}
|
||||
|
||||
if (xq_nb_free(&xsk->tx, BATCH_SIZE) >= BATCH_SIZE) {
|
||||
lassert(xq_enq_tx_only(&xsk->tx, idx, BATCH_SIZE) == 0);
|
||||
|
||||
xsk->outstanding_tx += BATCH_SIZE;
|
||||
idx += BATCH_SIZE;
|
||||
idx %= NUM_FRAMES;
|
||||
}
|
||||
|
||||
complete_tx_only(xsk);
|
||||
}
|
||||
}
|
||||
|
||||
static void l2fwd(struct xdpsock *xsk)
|
||||
{
|
||||
for (;;) {
|
||||
struct xdp_desc descs[BATCH_SIZE];
|
||||
unsigned int rcvd, i;
|
||||
int ret;
|
||||
|
||||
for (;;) {
|
||||
complete_tx_l2fwd(xsk);
|
||||
|
||||
rcvd = xq_deq(&xsk->rx, descs, BATCH_SIZE);
|
||||
if (rcvd > 0)
|
||||
break;
|
||||
}
|
||||
|
||||
for (i = 0; i < rcvd; i++) {
|
||||
char *pkt = xq_get_data(xsk, descs[i].idx,
|
||||
descs[i].offset);
|
||||
|
||||
swap_mac_addresses(pkt);
|
||||
#if DEBUG_HEXDUMP
|
||||
char buf[32];
|
||||
u32 idx = descs[i].idx;
|
||||
|
||||
sprintf(buf, "idx=%d", idx);
|
||||
hex_dump(pkt, descs[i].len, buf);
|
||||
#endif
|
||||
}
|
||||
|
||||
xsk->rx_npkts += rcvd;
|
||||
|
||||
ret = xq_enq(&xsk->tx, descs, rcvd);
|
||||
lassert(ret == 0);
|
||||
xsk->outstanding_tx += rcvd;
|
||||
}
|
||||
}
|
||||
|
||||
int main(int argc, char **argv)
|
||||
{
|
||||
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
|
||||
char xdp_filename[256];
|
||||
int i, ret, key = 0;
|
||||
pthread_t pt;
|
||||
|
||||
parse_command_line(argc, argv);
|
||||
|
||||
if (setrlimit(RLIMIT_MEMLOCK, &r)) {
|
||||
fprintf(stderr, "ERROR: setrlimit(RLIMIT_MEMLOCK) \"%s\"\n",
|
||||
strerror(errno));
|
||||
exit(EXIT_FAILURE);
|
||||
}
|
||||
|
||||
snprintf(xdp_filename, sizeof(xdp_filename), "%s_kern.o", argv[0]);
|
||||
|
||||
if (load_bpf_file(xdp_filename)) {
|
||||
fprintf(stderr, "ERROR: load_bpf_file %s\n", bpf_log_buf);
|
||||
exit(EXIT_FAILURE);
|
||||
}
|
||||
|
||||
if (!prog_fd[0]) {
|
||||
fprintf(stderr, "ERROR: load_bpf_file: \"%s\"\n",
|
||||
strerror(errno));
|
||||
exit(EXIT_FAILURE);
|
||||
}
|
||||
|
||||
if (bpf_set_link_xdp_fd(opt_ifindex, prog_fd[0], opt_xdp_flags) < 0) {
|
||||
fprintf(stderr, "ERROR: link set xdp fd failed\n");
|
||||
exit(EXIT_FAILURE);
|
||||
}
|
||||
|
||||
ret = bpf_map_update_elem(map_fd[0], &key, &opt_queue, 0);
|
||||
if (ret) {
|
||||
fprintf(stderr, "ERROR: bpf_map_update_elem qidconf\n");
|
||||
exit(EXIT_FAILURE);
|
||||
}
|
||||
|
||||
/* Create sockets... */
|
||||
xsks[num_socks++] = xsk_configure(NULL);
|
||||
|
||||
#if RR_LB
|
||||
for (i = 0; i < MAX_SOCKS - 1; i++)
|
||||
xsks[num_socks++] = xsk_configure(xsks[0]->umem);
|
||||
#endif
|
||||
|
||||
/* ...and insert them into the map. */
|
||||
for (i = 0; i < num_socks; i++) {
|
||||
key = i;
|
||||
ret = bpf_map_update_elem(map_fd[1], &key, &xsks[i]->sfd, 0);
|
||||
if (ret) {
|
||||
fprintf(stderr, "ERROR: bpf_map_update_elem %d\n", i);
|
||||
exit(EXIT_FAILURE);
|
||||
}
|
||||
}
|
||||
|
||||
signal(SIGINT, int_exit);
|
||||
signal(SIGTERM, int_exit);
|
||||
signal(SIGABRT, int_exit);
|
||||
|
||||
setlocale(LC_ALL, "");
|
||||
|
||||
ret = pthread_create(&pt, NULL, poller, NULL);
|
||||
lassert(ret == 0);
|
||||
|
||||
prev_time = get_nsecs();
|
||||
|
||||
if (opt_bench == BENCH_RXDROP)
|
||||
rx_drop_all();
|
||||
else if (opt_bench == BENCH_TXONLY)
|
||||
tx_only(xsks[0]);
|
||||
else
|
||||
l2fwd(xsks[0]);
|
||||
|
||||
return 0;
|
||||
}
|
@ -39,9 +39,9 @@ class Helper(object):
|
||||
Break down helper function protocol into smaller chunks: return type,
|
||||
name, distinct arguments.
|
||||
"""
|
||||
arg_re = re.compile('^((const )?(struct )?(\w+|...))( (\**)(\w+))?$')
|
||||
arg_re = re.compile('((const )?(struct )?(\w+|...))( (\**)(\w+))?$')
|
||||
res = {}
|
||||
proto_re = re.compile('^(.+) (\**)(\w+)\(((([^,]+)(, )?){1,5})\)$')
|
||||
proto_re = re.compile('(.+) (\**)(\w+)\(((([^,]+)(, )?){1,5})\)$')
|
||||
|
||||
capture = proto_re.match(self.proto)
|
||||
res['ret_type'] = capture.group(1)
|
||||
@ -87,7 +87,7 @@ class HeaderParser(object):
|
||||
# - Same as above, with "const" and/or "struct" in front of type
|
||||
# - "..." (undefined number of arguments, for bpf_trace_printk())
|
||||
# There is at least one term ("void"), and at most five arguments.
|
||||
p = re.compile('^ \* ((.+) \**\w+\((((const )?(struct )?(\w+|\.\.\.)( \**\w+)?)(, )?){1,5}\))$')
|
||||
p = re.compile(' \* ?((.+) \**\w+\((((const )?(struct )?(\w+|\.\.\.)( \**\w+)?)(, )?){1,5}\))$')
|
||||
capture = p.match(self.line)
|
||||
if not capture:
|
||||
raise NoHelperFound
|
||||
@ -95,7 +95,7 @@ class HeaderParser(object):
|
||||
return capture.group(1)
|
||||
|
||||
def parse_desc(self):
|
||||
p = re.compile('^ \* \tDescription$')
|
||||
p = re.compile(' \* ?(?:\t| {6,8})Description$')
|
||||
capture = p.match(self.line)
|
||||
if not capture:
|
||||
# Helper can have empty description and we might be parsing another
|
||||
@ -109,7 +109,7 @@ class HeaderParser(object):
|
||||
if self.line == ' *\n':
|
||||
desc += '\n'
|
||||
else:
|
||||
p = re.compile('^ \* \t\t(.*)')
|
||||
p = re.compile(' \* ?(?:\t| {6,8})(?:\t| {8})(.*)')
|
||||
capture = p.match(self.line)
|
||||
if capture:
|
||||
desc += capture.group(1) + '\n'
|
||||
@ -118,7 +118,7 @@ class HeaderParser(object):
|
||||
return desc
|
||||
|
||||
def parse_ret(self):
|
||||
p = re.compile('^ \* \tReturn$')
|
||||
p = re.compile(' \* ?(?:\t| {6,8})Return$')
|
||||
capture = p.match(self.line)
|
||||
if not capture:
|
||||
# Helper can have empty retval and we might be parsing another
|
||||
@ -132,7 +132,7 @@ class HeaderParser(object):
|
||||
if self.line == ' *\n':
|
||||
ret += '\n'
|
||||
else:
|
||||
p = re.compile('^ \* \t\t(.*)')
|
||||
p = re.compile(' \* ?(?:\t| {6,8})(?:\t| {8})(.*)')
|
||||
capture = p.match(self.line)
|
||||
if capture:
|
||||
ret += capture.group(1) + '\n'
|
||||
|
@ -1471,7 +1471,9 @@ static inline u16 socket_type_to_security_class(int family, int type, int protoc
|
||||
return SECCLASS_QIPCRTR_SOCKET;
|
||||
case PF_SMC:
|
||||
return SECCLASS_SMC_SOCKET;
|
||||
#if PF_MAX > 44
|
||||
case PF_XDP:
|
||||
return SECCLASS_XDP_SOCKET;
|
||||
#if PF_MAX > 45
|
||||
#error New address family defined, please update this function.
|
||||
#endif
|
||||
}
|
||||
|
@ -240,9 +240,11 @@ struct security_class_mapping secclass_map[] = {
|
||||
{ "manage_subnet", NULL } },
|
||||
{ "bpf",
|
||||
{"map_create", "map_read", "map_write", "prog_load", "prog_run"} },
|
||||
{ "xdp_socket",
|
||||
{ COMMON_SOCK_PERMS, NULL } },
|
||||
{ NULL }
|
||||
};
|
||||
|
||||
#if PF_MAX > 44
|
||||
#if PF_MAX > 45
|
||||
#error New address family defined, please update secclass_map.
|
||||
#endif
|
||||
|
@ -22,17 +22,19 @@ MAP COMMANDS
|
||||
=============
|
||||
|
||||
| **bpftool** **map { show | list }** [*MAP*]
|
||||
| **bpftool** **map dump** *MAP*
|
||||
| **bpftool** **map update** *MAP* **key** [**hex**] *BYTES* **value** [**hex**] *VALUE* [*UPDATE_FLAGS*]
|
||||
| **bpftool** **map lookup** *MAP* **key** [**hex**] *BYTES*
|
||||
| **bpftool** **map getnext** *MAP* [**key** [**hex**] *BYTES*]
|
||||
| **bpftool** **map delete** *MAP* **key** [**hex**] *BYTES*
|
||||
| **bpftool** **map pin** *MAP* *FILE*
|
||||
| **bpftool** **map dump** *MAP*
|
||||
| **bpftool** **map update** *MAP* **key** *DATA* **value** *VALUE* [*UPDATE_FLAGS*]
|
||||
| **bpftool** **map lookup** *MAP* **key** *DATA*
|
||||
| **bpftool** **map getnext** *MAP* [**key** *DATA*]
|
||||
| **bpftool** **map delete** *MAP* **key** *DATA*
|
||||
| **bpftool** **map pin** *MAP* *FILE*
|
||||
| **bpftool** **map event_pipe** *MAP* [**cpu** *N* **index** *M*]
|
||||
| **bpftool** **map help**
|
||||
|
|
||||
| *MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
|
||||
| *DATA* := { [**hex**] *BYTES* }
|
||||
| *PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* }
|
||||
| *VALUE* := { *BYTES* | *MAP* | *PROG* }
|
||||
| *VALUE* := { *DATA* | *MAP* | *PROG* }
|
||||
| *UPDATE_FLAGS* := { **any** | **exist** | **noexist** }
|
||||
|
||||
DESCRIPTION
|
||||
@ -48,7 +50,7 @@ DESCRIPTION
|
||||
**bpftool map dump** *MAP*
|
||||
Dump all entries in a given *MAP*.
|
||||
|
||||
**bpftool map update** *MAP* **key** [**hex**] *BYTES* **value** [**hex**] *VALUE* [*UPDATE_FLAGS*]
|
||||
**bpftool map update** *MAP* **key** *DATA* **value** *VALUE* [*UPDATE_FLAGS*]
|
||||
Update map entry for a given *KEY*.
|
||||
|
||||
*UPDATE_FLAGS* can be one of: **any** update existing entry
|
||||
@ -61,13 +63,13 @@ DESCRIPTION
|
||||
the bytes are parsed as decimal values, unless a "0x" prefix
|
||||
(for hexadecimal) or a "0" prefix (for octal) is provided.
|
||||
|
||||
**bpftool map lookup** *MAP* **key** [**hex**] *BYTES*
|
||||
**bpftool map lookup** *MAP* **key** *DATA*
|
||||
Lookup **key** in the map.
|
||||
|
||||
**bpftool map getnext** *MAP* [**key** [**hex**] *BYTES*]
|
||||
**bpftool map getnext** *MAP* [**key** *DATA*]
|
||||
Get next key. If *key* is not specified, get first key.
|
||||
|
||||
**bpftool map delete** *MAP* **key** [**hex**] *BYTES*
|
||||
**bpftool map delete** *MAP* **key** *DATA*
|
||||
Remove entry from the map.
|
||||
|
||||
**bpftool map pin** *MAP* *FILE*
|
||||
@ -75,6 +77,22 @@ DESCRIPTION
|
||||
|
||||
Note: *FILE* must be located in *bpffs* mount.
|
||||
|
||||
**bpftool map event_pipe** *MAP* [**cpu** *N* **index** *M*]
|
||||
Read events from a BPF_MAP_TYPE_PERF_EVENT_ARRAY map.
|
||||
|
||||
Install perf rings into a perf event array map and dump
|
||||
output of any bpf_perf_event_output() call in the kernel.
|
||||
By default read the number of CPUs on the system and
|
||||
install a perf ring for each CPU at the corresponding index
|
||||
in the array.
|
||||
|
||||
If **cpu** and **index** are specified, install a perf ring
|
||||
for given **cpu** at **index** in the array (single ring).
|
||||
|
||||
Note that installing a perf ring into an array will silently
|
||||
replace any existing ring. Any other application will stop
|
||||
receiving events if it installed its rings earlier.
|
||||
|
||||
**bpftool map help**
|
||||
Print short help message.
|
||||
|
||||
|
@ -23,7 +23,7 @@ SYNOPSIS
|
||||
|
||||
*MAP-COMMANDS* :=
|
||||
{ **show** | **list** | **dump** | **update** | **lookup** | **getnext** | **delete**
|
||||
| **pin** | **help** }
|
||||
| **pin** | **event_pipe** | **help** }
|
||||
|
||||
*PROG-COMMANDS* := { **show** | **list** | **dump jited** | **dump xlated** | **pin**
|
||||
| **load** | **help** }
|
||||
|
@ -39,7 +39,12 @@ CC = gcc
|
||||
|
||||
CFLAGS += -O2
|
||||
CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wshadow -Wno-missing-field-initializers
|
||||
CFLAGS += -DPACKAGE='"bpftool"' -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf -I$(srctree)/kernel/bpf/
|
||||
CFLAGS += -DPACKAGE='"bpftool"' -D__EXPORTED_HEADERS__ \
|
||||
-I$(srctree)/kernel/bpf/ \
|
||||
-I$(srctree)/tools/include \
|
||||
-I$(srctree)/tools/include/uapi \
|
||||
-I$(srctree)/tools/lib/bpf \
|
||||
-I$(srctree)/tools/perf
|
||||
CFLAGS += -DBPFTOOL_VERSION='"$(BPFTOOL_VERSION)"'
|
||||
LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
# bpftool(8) bash completion -*- shell-script -*-
|
||||
#
|
||||
# Copyright (C) 2017 Netronome Systems, Inc.
|
||||
# Copyright (C) 2017-2018 Netronome Systems, Inc.
|
||||
#
|
||||
# This software is dual licensed under the GNU General License
|
||||
# Version 2, June 1991 as shown in the file COPYING in the top-level
|
||||
@ -79,6 +79,14 @@ _bpftool_get_map_ids()
|
||||
command sed -n 's/.*"id": \(.*\),$/\1/p' )" -- "$cur" ) )
|
||||
}
|
||||
|
||||
_bpftool_get_perf_map_ids()
|
||||
{
|
||||
COMPREPLY+=( $( compgen -W "$( bpftool -jp map 2>&1 | \
|
||||
command grep -C2 perf_event_array | \
|
||||
command sed -n 's/.*"id": \(.*\),$/\1/p' )" -- "$cur" ) )
|
||||
}
|
||||
|
||||
|
||||
_bpftool_get_prog_ids()
|
||||
{
|
||||
COMPREPLY+=( $( compgen -W "$( bpftool -jp prog 2>&1 | \
|
||||
@ -359,10 +367,34 @@ _bpftool()
|
||||
fi
|
||||
return 0
|
||||
;;
|
||||
event_pipe)
|
||||
case $prev in
|
||||
$command)
|
||||
COMPREPLY=( $( compgen -W "$MAP_TYPE" -- "$cur" ) )
|
||||
return 0
|
||||
;;
|
||||
id)
|
||||
_bpftool_get_perf_map_ids
|
||||
return 0
|
||||
;;
|
||||
cpu)
|
||||
return 0
|
||||
;;
|
||||
index)
|
||||
return 0
|
||||
;;
|
||||
*)
|
||||
_bpftool_once_attr 'cpu'
|
||||
_bpftool_once_attr 'index'
|
||||
return 0
|
||||
;;
|
||||
esac
|
||||
;;
|
||||
*)
|
||||
[[ $prev == $object ]] && \
|
||||
COMPREPLY=( $( compgen -W 'delete dump getnext help \
|
||||
lookup pin show list update' -- "$cur" ) )
|
||||
lookup pin event_pipe show list update' -- \
|
||||
"$cur" ) )
|
||||
;;
|
||||
esac
|
||||
;;
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* Copyright (C) 2017 Netronome Systems, Inc.
|
||||
* Copyright (C) 2017-2018 Netronome Systems, Inc.
|
||||
*
|
||||
* This software is dual licensed under the GNU General License Version 2,
|
||||
* June 1991 as shown in the file COPYING in the top-level directory of this
|
||||
@ -33,6 +33,7 @@
|
||||
|
||||
/* Author: Jakub Kicinski <kubakici@wp.pl> */
|
||||
|
||||
#include <ctype.h>
|
||||
#include <errno.h>
|
||||
#include <fcntl.h>
|
||||
#include <fts.h>
|
||||
@ -330,6 +331,16 @@ char *get_fdinfo(int fd, const char *key)
|
||||
return NULL;
|
||||
}
|
||||
|
||||
void print_data_json(uint8_t *data, size_t len)
|
||||
{
|
||||
unsigned int i;
|
||||
|
||||
jsonw_start_array(json_wtr);
|
||||
for (i = 0; i < len; i++)
|
||||
jsonw_printf(json_wtr, "%d", data[i]);
|
||||
jsonw_end_array(json_wtr);
|
||||
}
|
||||
|
||||
void print_hex_data_json(uint8_t *data, size_t len)
|
||||
{
|
||||
unsigned int i;
|
||||
@ -420,6 +431,70 @@ void delete_pinned_obj_table(struct pinned_obj_table *tab)
|
||||
}
|
||||
}
|
||||
|
||||
unsigned int get_page_size(void)
|
||||
{
|
||||
static int result;
|
||||
|
||||
if (!result)
|
||||
result = getpagesize();
|
||||
return result;
|
||||
}
|
||||
|
||||
unsigned int get_possible_cpus(void)
|
||||
{
|
||||
static unsigned int result;
|
||||
char buf[128];
|
||||
long int n;
|
||||
char *ptr;
|
||||
int fd;
|
||||
|
||||
if (result)
|
||||
return result;
|
||||
|
||||
fd = open("/sys/devices/system/cpu/possible", O_RDONLY);
|
||||
if (fd < 0) {
|
||||
p_err("can't open sysfs possible cpus");
|
||||
exit(-1);
|
||||
}
|
||||
|
||||
n = read(fd, buf, sizeof(buf));
|
||||
if (n < 2) {
|
||||
p_err("can't read sysfs possible cpus");
|
||||
exit(-1);
|
||||
}
|
||||
close(fd);
|
||||
|
||||
if (n == sizeof(buf)) {
|
||||
p_err("read sysfs possible cpus overflow");
|
||||
exit(-1);
|
||||
}
|
||||
|
||||
ptr = buf;
|
||||
n = 0;
|
||||
while (*ptr && *ptr != '\n') {
|
||||
unsigned int a, b;
|
||||
|
||||
if (sscanf(ptr, "%u-%u", &a, &b) == 2) {
|
||||
n += b - a + 1;
|
||||
|
||||
ptr = strchr(ptr, '-') + 1;
|
||||
} else if (sscanf(ptr, "%u", &a) == 1) {
|
||||
n++;
|
||||
} else {
|
||||
assert(0);
|
||||
}
|
||||
|
||||
while (isdigit(*ptr))
|
||||
ptr++;
|
||||
if (*ptr == ',')
|
||||
ptr++;
|
||||
}
|
||||
|
||||
result = n;
|
||||
|
||||
return result;
|
||||
}
|
||||
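get_possible_cpus() counts CPUs by walking a sysfs list such as `0-3,8-11`. The sketch below shows the same counting on an in-memory string, assuming that list format; `count_cpus_in_list` is a hypothetical name, and unlike the function above it returns -1 on malformed input instead of asserting.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Count the CPUs named in a sysfs-style list ("0-3,8-11\n").
 * Returns the count, or -1 if the string is malformed.
 */
static long count_cpus_in_list(const char *s)
{
	long n = 0;

	while (*s && *s != '\n') {
		unsigned long a, b;
		char *end;

		if (!isdigit((unsigned char)*s))
			return -1;
		a = strtoul(s, &end, 10);
		b = a;			/* single CPU unless a range follows */

		if (*end == '-') {	/* range "a-b" */
			s = end + 1;
			if (!isdigit((unsigned char)*s))
				return -1;
			b = strtoul(s, &end, 10);
			if (b < a)
				return -1;
		}
		n += (long)(b - a + 1);

		s = end;
		if (*s == ',')
			s++;
		else if (*s && *s != '\n')
			return -1;
	}

	return n;
}
```

Passing `"0-3,8-11\n"` yields 8, and a lone `"0"` yields 1. Keeping the parser separate from the file read makes it easy to exercise without access to sysfs.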
|
||||
static char *
|
||||
ifindex_to_name_ns(__u32 ifindex, __u32 ns_dev, __u32 ns_ino, char *buf)
|
||||
{
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* Copyright (C) 2017 Netronome Systems, Inc.
|
||||
* Copyright (C) 2017-2018 Netronome Systems, Inc.
|
||||
*
|
||||
* This software is dual licensed under the GNU General License Version 2,
|
||||
* June 1991 as shown in the file COPYING in the top-level directory of this
|
||||
@ -117,14 +117,19 @@ int do_pin_fd(int fd, const char *name);
|
||||
|
||||
int do_prog(int argc, char **arg);
|
||||
int do_map(int argc, char **arg);
|
||||
int do_event_pipe(int argc, char **argv);
|
||||
int do_cgroup(int argc, char **arg);
|
||||
|
||||
int prog_parse_fd(int *argc, char ***argv);
|
||||
int map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 *info_len);
|
||||
|
||||
void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes,
|
||||
const char *arch);
|
||||
void print_data_json(uint8_t *data, size_t len);
|
||||
void print_hex_data_json(uint8_t *data, size_t len);
|
||||
|
||||
unsigned int get_page_size(void);
|
||||
unsigned int get_possible_cpus(void);
|
||||
const char *ifindex_to_bfd_name_ns(__u32 ifindex, __u64 ns_dev, __u64 ns_ino);
|
||||
|
||||
#endif
|
||||
|
@ -1,5 +1,5 @@
|
||||
/*
|
||||
* Copyright (C) 2017 Netronome Systems, Inc.
|
||||
* Copyright (C) 2017-2018 Netronome Systems, Inc.
|
||||
*
|
||||
* This software is dual licensed under the GNU General License Version 2,
|
||||
* June 1991 as shown in the file COPYING in the top-level directory of this
|
||||
@ -34,7 +34,6 @@
|
||||
/* Author: Jakub Kicinski <kubakici@wp.pl> */
|
||||
|
||||
#include <assert.h>
|
||||
#include <ctype.h>
|
||||
#include <errno.h>
|
||||
#include <fcntl.h>
|
||||
#include <stdbool.h>
|
||||
@ -69,61 +68,6 @@ static const char * const map_type_name[] = {
|
||||
[BPF_MAP_TYPE_CPUMAP] = "cpumap",
|
||||
};
|
||||
|
||||
static unsigned int get_possible_cpus(void)
|
||||
{
|
||||
static unsigned int result;
|
||||
char buf[128];
|
||||
long int n;
|
||||
char *ptr;
|
||||
int fd;
|
||||
|
||||
if (result)
|
||||
return result;
|
||||
|
||||
fd = open("/sys/devices/system/cpu/possible", O_RDONLY);
|
||||
if (fd < 0) {
|
||||
p_err("can't open sysfs possible cpus");
|
||||
exit(-1);
|
||||
}
|
||||
|
||||
n = read(fd, buf, sizeof(buf));
|
||||
if (n < 2) {
|
||||
p_err("can't read sysfs possible cpus");
|
||||
exit(-1);
|
||||
}
|
||||
close(fd);
|
||||
|
||||
if (n == sizeof(buf)) {
|
||||
p_err("read sysfs possible cpus overflow");
|
||||
exit(-1);
|
||||
}
|
||||
|
||||
ptr = buf;
|
||||
n = 0;
|
||||
while (*ptr && *ptr != '\n') {
|
||||
unsigned int a, b;
|
||||
|
||||
if (sscanf(ptr, "%u-%u", &a, &b) == 2) {
|
||||
n += b - a + 1;
|
||||
|
||||
ptr = strchr(ptr, '-') + 1;
|
||||
} else if (sscanf(ptr, "%u", &a) == 1) {
|
||||
n++;
|
||||
} else {
|
||||
assert(0);
|
||||
}
|
||||
|
||||
while (isdigit(*ptr))
|
||||
ptr++;
|
||||
if (*ptr == ',')
|
||||
ptr++;
|
||||
}
|
||||
|
||||
result = n;
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
static bool map_is_per_cpu(__u32 type)
|
||||
{
|
||||
return type == BPF_MAP_TYPE_PERCPU_HASH ||
|
||||
@ -186,8 +130,7 @@ static int map_parse_fd(int *argc, char ***argv)
|
||||
return -1;
|
||||
}
|
||||
|
||||
static int
|
||||
map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 *info_len)
|
||||
int map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 *info_len)
|
||||
{
|
||||
int err;
|
||||
int fd;
|
||||
@ -873,23 +816,25 @@ static int do_help(int argc, char **argv)
|
||||
|
||||
fprintf(stderr,
|
||||
"Usage: %s %s { show | list } [MAP]\n"
|
||||
" %s %s dump MAP\n"
|
||||
" %s %s update MAP key [hex] BYTES value [hex] VALUE [UPDATE_FLAGS]\n"
|
||||
" %s %s lookup MAP key [hex] BYTES\n"
|
||||
" %s %s getnext MAP [key [hex] BYTES]\n"
|
||||
" %s %s delete MAP key [hex] BYTES\n"
|
||||
" %s %s pin MAP FILE\n"
|
||||
" %s %s dump MAP\n"
|
||||
" %s %s update MAP key DATA value VALUE [UPDATE_FLAGS]\n"
|
||||
" %s %s lookup MAP key DATA\n"
|
||||
" %s %s getnext MAP [key DATA]\n"
|
||||
" %s %s delete MAP key DATA\n"
|
||||
" %s %s pin MAP FILE\n"
|
||||
" %s %s event_pipe MAP [cpu N index M]\n"
|
||||
" %s %s help\n"
|
||||
"\n"
|
||||
" MAP := { id MAP_ID | pinned FILE }\n"
|
||||
" DATA := { [hex] BYTES }\n"
|
||||
" " HELP_SPEC_PROGRAM "\n"
|
||||
" VALUE := { BYTES | MAP | PROG }\n"
|
||||
" VALUE := { DATA | MAP | PROG }\n"
|
||||
" UPDATE_FLAGS := { any | exist | noexist }\n"
|
||||
" " HELP_SPEC_OPTIONS "\n"
|
||||
"",
|
||||
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
|
||||
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
|
||||
bin_name, argv[-2], bin_name, argv[-2]);
|
||||
bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2]);
|
||||
|
||||
return 0;
|
||||
}
|
||||
@ -904,6 +849,7 @@ static const struct cmd cmds[] = {
|
||||
{ "getnext", do_getnext },
|
||||
{ "delete", do_delete },
|
||||
{ "pin", do_pin },
|
||||
{ "event_pipe", do_event_pipe },
|
||||
{ 0 }
|
||||
};
|
||||
|
||||
|
347
tools/bpf/bpftool/map_perf_ring.c
Normal file
@ -0,0 +1,347 @@
|
||||
// SPDX-License-Identifier: GPL-2.0-only
|
||||
/* Copyright (C) 2018 Netronome Systems, Inc. */
|
||||
/* This program is free software; you can redistribute it and/or
|
||||
* modify it under the terms of version 2 of the GNU General Public
|
||||
* License as published by the Free Software Foundation.
|
||||
*/
|
||||
#include <errno.h>
|
||||
#include <fcntl.h>
|
||||
#include <libbpf.h>
|
||||
#include <poll.h>
|
||||
#include <signal.h>
|
||||
#include <stdbool.h>
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include <time.h>
|
||||
#include <unistd.h>
|
||||
#include <linux/bpf.h>
|
||||
#include <linux/perf_event.h>
|
||||
#include <sys/ioctl.h>
|
||||
#include <sys/mman.h>
|
||||
#include <sys/syscall.h>
|
||||
|
||||
#include <bpf.h>
|
||||
#include <perf-sys.h>
|
||||
|
||||
#include "main.h"
|
||||
|
||||
#define MMAP_PAGE_CNT 16
|
||||
|
||||
static bool stop;
|
||||
|
||||
struct event_ring_info {
|
||||
int fd;
|
||||
int key;
|
||||
unsigned int cpu;
|
||||
void *mem;
|
||||
};
|
||||
|
||||
struct perf_event_sample {
|
||||
struct perf_event_header header;
|
||||
__u32 size;
|
||||
unsigned char data[];
|
||||
};
|
||||
|
||||
static void int_exit(int signo)
|
||||
{
|
||||
fprintf(stderr, "Stopping...\n");
|
||||
stop = true;
|
||||
}
|
||||
|
||||
static void
|
||||
print_bpf_output(struct event_ring_info *ring, struct perf_event_sample *e)
|
||||
{
|
||||
struct {
|
||||
struct perf_event_header header;
|
||||
__u64 id;
|
||||
__u64 lost;
|
||||
} *lost = (void *)e;
|
||||
struct timespec ts;
|
||||
|
||||
if (clock_gettime(CLOCK_MONOTONIC, &ts)) {
|
||||
perror("Can't read clock for timestamp");
|
||||
return;
|
||||
}
|
||||
|
||||
if (json_output) {
|
||||
jsonw_start_object(json_wtr);
|
||||
jsonw_name(json_wtr, "timestamp");
|
||||
jsonw_uint(json_wtr, ts.tv_sec * 1000000000ull + ts.tv_nsec);
|
||||
jsonw_name(json_wtr, "type");
|
||||
jsonw_uint(json_wtr, e->header.type);
|
||||
jsonw_name(json_wtr, "cpu");
|
||||
jsonw_uint(json_wtr, ring->cpu);
|
||||
jsonw_name(json_wtr, "index");
|
||||
jsonw_uint(json_wtr, ring->key);
|
||||
if (e->header.type == PERF_RECORD_SAMPLE) {
|
||||
jsonw_name(json_wtr, "data");
|
||||
print_data_json(e->data, e->size);
|
||||
} else if (e->header.type == PERF_RECORD_LOST) {
|
||||
jsonw_name(json_wtr, "lost");
|
||||
jsonw_start_object(json_wtr);
|
||||
jsonw_name(json_wtr, "id");
|
||||
jsonw_uint(json_wtr, lost->id);
|
||||
jsonw_name(json_wtr, "count");
|
||||
jsonw_uint(json_wtr, lost->lost);
|
||||
jsonw_end_object(json_wtr);
|
||||
}
|
||||
jsonw_end_object(json_wtr);
|
||||
} else {
|
||||
if (e->header.type == PERF_RECORD_SAMPLE) {
|
||||
printf("== @%ld.%09ld CPU: %u index: %d =====\n",
|
||||
(long)ts.tv_sec, ts.tv_nsec,
|
||||
ring->cpu, ring->key);
|
||||
fprint_hex(stdout, e->data, e->size, " ");
|
||||
printf("\n");
|
||||
} else if (e->header.type == PERF_RECORD_LOST) {
|
||||
printf("lost %llu events\n", (unsigned long long)lost->lost);
|
||||
} else {
|
||||
printf("unknown event type=%d size=%d\n",
|
||||
e->header.type, e->header.size);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
static void
|
||||
perf_event_read(struct event_ring_info *ring, void **buf, size_t *buf_len)
|
||||
{
|
||||
volatile struct perf_event_mmap_page *header = ring->mem;
|
||||
__u64 buffer_size = MMAP_PAGE_CNT * get_page_size();
|
||||
__u64 data_tail = header->data_tail;
|
||||
__u64 data_head = header->data_head;
|
||||
void *base, *begin, *end;
|
||||
|
||||
asm volatile("" ::: "memory"); /* in real code it should be smp_rmb() */
|
||||
if (data_head == data_tail)
|
||||
return;
|
||||
|
||||
base = ((char *)header) + get_page_size();
|
||||
|
||||
begin = base + data_tail % buffer_size;
|
||||
end = base + data_head % buffer_size;
|
||||
|
||||
while (begin != end) {
|
||||
struct perf_event_sample *e;
|
||||
|
||||
e = begin;
|
||||
if (begin + e->header.size > base + buffer_size) {
|
||||
long len = base + buffer_size - begin;
|
||||
|
||||
if (*buf_len < e->header.size) {
|
||||
free(*buf);
|
||||
*buf = malloc(e->header.size);
|
||||
if (!*buf) {
|
||||
fprintf(stderr,
|
||||
"can't allocate memory\n");
|
||||
stop = true;
|
||||
return;
|
||||
}
|
||||
*buf_len = e->header.size;
|
||||
}
|
||||
|
||||
memcpy(*buf, begin, len);
|
||||
memcpy(*buf + len, base, e->header.size - len);
|
||||
e = (void *)*buf;
|
||||
begin = base + e->header.size - len;
|
||||
} else if (begin + e->header.size == base + buffer_size) {
|
||||
begin = base;
|
||||
} else {
|
||||
begin += e->header.size;
|
||||
}
|
||||
|
||||
print_bpf_output(ring, e);
|
||||
}
|
||||
|
||||
__sync_synchronize(); /* smp_mb() */
|
||||
header->data_tail = data_head;
|
||||
}
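The wrap-around branch of perf_event_read() above can be read in isolation as a small copy helper. What follows is only a minimal sketch, not bpftool code: `ring_copy_record` and its parameters are hypothetical names, and it assumes `offset < size` and `len <= size`. It copies a record that may straddle the end of the circular data area into a flat buffer and returns the offset of the next record.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of bpftool): copy one record of `len` bytes
 * that starts at `offset` inside a circular data area of `size` bytes into
 * the flat buffer `out`. Returns the offset at which the next record starts.
 * Assumes offset < size and len <= size.
 */
static size_t ring_copy_record(const uint8_t *base, size_t size,
			       size_t offset, size_t len, uint8_t *out)
{
	size_t first = size - offset;	/* bytes left before the area ends */

	if (len <= first) {
		/* Record is contiguous: a single copy is enough. */
		memcpy(out, base + offset, len);
	} else {
		/* Record wraps: copy the tail piece, then the head piece. */
		memcpy(out, base + offset, first);
		memcpy(out + first, base, len - first);
	}

	/* A record ending exactly at the end of the area wraps to offset 0. */
	return (offset + len) % size;
}
```

Unlike the code in the diff, this sketch always copies; perf_event_read() only copies when the record actually wraps and otherwise reads it in place.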
|
||||
|
||||
static int perf_mmap_size(void)
|
||||
{
|
||||
return get_page_size() * (MMAP_PAGE_CNT + 1);
|
||||
}
|
||||
|
||||
static void *perf_event_mmap(int fd)
|
||||
{
|
||||
int mmap_size = perf_mmap_size();
|
||||
void *base;
|
||||
|
||||
base = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
|
||||
if (base == MAP_FAILED) {
|
||||
p_err("event mmap failed: %s\n", strerror(errno));
|
||||
return NULL;
|
||||
}
|
||||
|
||||
return base;
|
||||
}
|
||||
|
||||
static void perf_event_unmap(void *mem)
|
||||
{
|
||||
if (munmap(mem, perf_mmap_size()))
|
||||
fprintf(stderr, "Can't unmap ring memory!\n");
|
||||
}
|
||||
|
||||
static int bpf_perf_event_open(int map_fd, int key, int cpu)
|
||||
{
|
||||
struct perf_event_attr attr = {
|
||||
.sample_type = PERF_SAMPLE_RAW,
|
||||
.type = PERF_TYPE_SOFTWARE,
|
||||
.config = PERF_COUNT_SW_BPF_OUTPUT,
|
||||
};
|
||||
int pmu_fd;
|
||||
|
||||
pmu_fd = sys_perf_event_open(&attr, -1, cpu, -1, 0);
|
||||
if (pmu_fd < 0) {
|
||||
p_err("failed to open perf event %d for CPU %d", key, cpu);
|
||||
return -1;
|
||||
}
|
||||
|
||||
if (bpf_map_update_elem(map_fd, &key, &pmu_fd, BPF_ANY)) {
|
||||
p_err("failed to update map for event %d for CPU %d", key, cpu);
|
||||
goto err_close;
|
||||
}
|
||||
if (ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0)) {
|
||||
p_err("failed to enable event %d for CPU %d", key, cpu);
|
||||
goto err_close;
|
||||
}
|
||||
|
||||
return pmu_fd;
|
||||
|
||||
err_close:
|
||||
close(pmu_fd);
|
||||
return -1;
|
||||
}
|
||||
|
||||
int do_event_pipe(int argc, char **argv)
|
||||
{
|
||||
int i, nfds, map_fd, index = -1, cpu = -1;
|
||||
struct bpf_map_info map_info = {};
|
||||
struct event_ring_info *rings;
|
||||
size_t tmp_buf_sz = 0;
|
||||
void *tmp_buf = NULL;
|
||||
struct pollfd *pfds;
|
||||
__u32 map_info_len;
|
||||
bool do_all = true;
|
||||
|
||||
map_info_len = sizeof(map_info);
|
||||
map_fd = map_parse_fd_and_info(&argc, &argv, &map_info, &map_info_len);
|
||||
if (map_fd < 0)
|
||||
return -1;
|
||||
|
||||
if (map_info.type != BPF_MAP_TYPE_PERF_EVENT_ARRAY) {
|
||||
p_err("map is not a perf event array");
|
||||
goto err_close_map;
|
||||
}
|
||||
|
||||
while (argc) {
|
||||
if (argc < 2)
|
||||
BAD_ARG();
|
||||
|
||||
if (is_prefix(*argv, "cpu")) {
|
||||
char *endptr;
|
||||
|
||||
NEXT_ARG();
|
||||
cpu = strtoul(*argv, &endptr, 0);
|
||||
if (*endptr) {
|
||||
p_err("can't parse %s as CPU ID", *argv);
|
||||
goto err_close_map;
|
||||
}
|
||||
|
||||
NEXT_ARG();
|
||||
} else if (is_prefix(*argv, "index")) {
|
||||
char *endptr;
|
||||
|
||||
NEXT_ARG();
|
||||
index = strtoul(*argv, &endptr, 0);
|
||||
if (*endptr) {
|
||||
p_err("can't parse %s as index", *argv);
|
||||
goto err_close_map;
|
||||
}
|
||||
|
||||
NEXT_ARG();
|
||||
} else {
|
||||
BAD_ARG();
|
||||
}
|
||||
|
||||
do_all = false;
|
||||
}
|
||||
|
||||
if (!do_all) {
|
||||
if (index == -1 || cpu == -1) {
|
||||
p_err("cpu and index must be specified together");
|
||||
goto err_close_map;
|
||||
}
|
||||
|
||||
nfds = 1;
|
||||
} else {
|
||||
nfds = min(get_possible_cpus(), map_info.max_entries);
|
||||
cpu = 0;
|
||||
index = 0;
|
||||
}
|
||||
|
||||
rings = calloc(nfds, sizeof(rings[0]));
|
||||
if (!rings)
|
||||
goto err_close_map;
|
||||
|
||||
pfds = calloc(nfds, sizeof(pfds[0]));
|
||||
if (!pfds)
|
||||
goto err_free_rings;
|
||||
|
||||
for (i = 0; i < nfds; i++) {
|
||||
rings[i].cpu = cpu + i;
|
||||
rings[i].key = index + i;
|
||||
|
||||
rings[i].fd = bpf_perf_event_open(map_fd, rings[i].key,
|
||||
rings[i].cpu);
|
||||
if (rings[i].fd < 0)
|
||||
goto err_close_fds_prev;
|
||||
|
||||
rings[i].mem = perf_event_mmap(rings[i].fd);
|
||||
if (!rings[i].mem)
|
||||
goto err_close_fds_current;
|
||||
|
||||
pfds[i].fd = rings[i].fd;
|
||||
pfds[i].events = POLLIN;
|
||||
}
|
||||
|
||||
signal(SIGINT, int_exit);
|
||||
signal(SIGHUP, int_exit);
|
||||
signal(SIGTERM, int_exit);
|
||||
|
||||
if (json_output)
|
||||
jsonw_start_array(json_wtr);
|
||||
|
||||
while (!stop) {
|
||||
poll(pfds, nfds, 200);
|
||||
for (i = 0; i < nfds; i++)
|
||||
perf_event_read(&rings[i], &tmp_buf, &tmp_buf_sz);
|
||||
}
|
||||
free(tmp_buf);
|
||||
|
||||
if (json_output)
|
||||
jsonw_end_array(json_wtr);
|
||||
|
||||
for (i = 0; i < nfds; i++) {
|
||||
perf_event_unmap(rings[i].mem);
|
||||
close(rings[i].fd);
|
||||
}
|
||||
free(pfds);
|
||||
free(rings);
|
||||
close(map_fd);
|
||||
|
||||
return 0;
|
||||
|
||||
err_close_fds_prev:
|
||||
while (i--) {
|
||||
perf_event_unmap(rings[i].mem);
|
||||
err_close_fds_current:
|
||||
close(rings[i].fd);
|
||||
}
|
||||
free(pfds);
|
||||
err_free_rings:
|
||||
free(rings);
|
||||
err_close_map:
|
||||
close(map_fd);
|
||||
return -1;
|
||||
}
|
@ -96,7 +96,10 @@ static void print_boot_time(__u64 nsecs, char *buf, unsigned int size)
|
||||
return;
|
||||
}
|
||||
|
||||
strftime(buf, size, "%b %d/%H:%M", &load_tm);
|
||||
if (json_output)
|
||||
strftime(buf, size, "%s", &load_tm);
|
||||
else
|
||||
strftime(buf, size, "%FT%T%z", &load_tm);
|
||||
}
|
||||
|
||||
static int prog_fd_by_tag(unsigned char *tag)
|
||||
@ -245,7 +248,8 @@ static void print_prog_json(struct bpf_prog_info *info, int fd)
|
||||
print_boot_time(info->load_time, buf, sizeof(buf));
|
||||
|
||||
/* Piggy back on load_time, since 0 uid is a valid one */
|
||||
jsonw_string_field(json_wtr, "loaded_at", buf);
|
||||
jsonw_name(json_wtr, "loaded_at");
|
||||
jsonw_printf(json_wtr, "%s", buf);
|
||||
jsonw_uint_field(json_wtr, "uid", info->created_by_uid);
|
||||
}
|
||||
|
||||
|
@ -828,12 +828,12 @@ union bpf_attr {
|
||||
*
|
||||
* Also, be aware that the newer helper
|
||||
* **bpf_perf_event_read_value**\ () is recommended over
|
||||
* **bpf_perf_event_read*\ () in general. The latter has some ABI
|
||||
* **bpf_perf_event_read**\ () in general. The latter has some ABI
|
||||
* quirks where error and counter value are used as a return code
|
||||
* (which is wrong to do since ranges may overlap). This issue is
|
||||
* fixed with bpf_perf_event_read_value(), which at the same time
|
||||
* provides more features over the **bpf_perf_event_read**\ ()
|
||||
* interface. Please refer to the description of
|
||||
* fixed with **bpf_perf_event_read_value**\ (), which at the same
|
||||
* time provides more features over the **bpf_perf_event_read**\
|
||||
* () interface. Please refer to the description of
|
||||
* **bpf_perf_event_read_value**\ () for details.
|
||||
* Return
|
||||
* The value of the perf event counter read from the map, or a
|
||||
@ -1361,7 +1361,7 @@ union bpf_attr {
|
||||
* Return
|
||||
* 0
|
||||
*
|
||||
* int bpf_setsockopt(struct bpf_sock_ops_kern *bpf_socket, int level, int optname, char *optval, int optlen)
|
||||
* int bpf_setsockopt(struct bpf_sock_ops *bpf_socket, int level, int optname, char *optval, int optlen)
|
||||
* Description
|
||||
* Emulate a call to **setsockopt()** on the socket associated to
|
||||
* *bpf_socket*, which must be a full socket. The *level* at
|
||||
@ -1435,7 +1435,7 @@ union bpf_attr {
|
||||
* Return
|
||||
* **SK_PASS** on success, or **SK_DROP** on error.
|
||||
*
|
||||
* int bpf_sock_map_update(struct bpf_sock_ops_kern *skops, struct bpf_map *map, void *key, u64 flags)
|
||||
* int bpf_sock_map_update(struct bpf_sock_ops *skops, struct bpf_map *map, void *key, u64 flags)
|
||||
* Description
|
||||
* Add an entry to, or update a *map* referencing sockets. The
|
||||
* *skops* is used as a new value for the entry associated to
|
||||
@ -1533,7 +1533,7 @@ union bpf_attr {
|
||||
* Return
|
||||
* 0 on success, or a negative error in case of failure.
|
||||
*
|
||||
* int bpf_perf_prog_read_value(struct bpf_perf_event_data_kern *ctx, struct bpf_perf_event_value *buf, u32 buf_size)
|
||||
* int bpf_perf_prog_read_value(struct bpf_perf_event_data *ctx, struct bpf_perf_event_value *buf, u32 buf_size)
|
||||
* Description
|
||||
* For an eBPF program attached to a perf event, retrieve the
|
||||
* value of the event counter associated to *ctx* and store it in
|
||||
@ -1544,7 +1544,7 @@ union bpf_attr {
|
||||
* Return
|
||||
* 0 on success, or a negative error in case of failure.
|
||||
*
|
||||
* int bpf_getsockopt(struct bpf_sock_ops_kern *bpf_socket, int level, int optname, char *optval, int optlen)
|
||||
* int bpf_getsockopt(struct bpf_sock_ops *bpf_socket, int level, int optname, char *optval, int optlen)
|
||||
* Description
|
||||
* Emulate a call to **getsockopt()** on the socket associated to
|
||||
* *bpf_socket*, which must be a full socket. The *level* at
|
||||
@ -1588,7 +1588,7 @@ union bpf_attr {
|
||||
* Return
|
||||
* 0
|
||||
*
|
||||
* int bpf_sock_ops_cb_flags_set(struct bpf_sock_ops_kern *bpf_sock, int argval)
|
||||
* int bpf_sock_ops_cb_flags_set(struct bpf_sock_ops *bpf_sock, int argval)
|
||||
* Description
|
||||
* Attempt to set the value of the **bpf_sock_ops_cb_flags** field
|
||||
* for the full TCP socket associated to *bpf_sock_ops* to
|
||||
@ -1721,7 +1721,7 @@ union bpf_attr {
|
||||
* Return
|
||||
* 0 on success, or a negative error in case of failure.
|
||||
*
|
||||
* int bpf_bind(struct bpf_sock_addr_kern *ctx, struct sockaddr *addr, int addr_len)
|
||||
* int bpf_bind(struct bpf_sock_addr *ctx, struct sockaddr *addr, int addr_len)
|
||||
* Description
|
||||
* Bind the socket associated to *ctx* to the address pointed by
|
||||
* *addr*, of length *addr_len*. This allows for making outgoing
|
||||
@ -1767,6 +1767,64 @@ union bpf_attr {
|
||||
* **CONFIG_XFRM** configuration option.
|
||||
* Return
|
||||
* 0 on success, or a negative error in case of failure.
|
||||
*
|
||||
* int bpf_get_stack(struct pt_regs *regs, void *buf, u32 size, u64 flags)
|
||||
* Description
|
||||
* Return a user or a kernel stack in bpf program provided buffer.
|
||||
* To achieve this, the helper needs *ctx*, which is a pointer
|
||||
* to the context on which the tracing program is executed.
|
||||
* To store the stacktrace, the bpf program provides *buf* with
|
||||
* a nonnegative *size*.
|
||||
*
|
||||
* The last argument, *flags*, holds the number of stack frames to
|
||||
* skip (from 0 to 255), masked with
|
||||
* **BPF_F_SKIP_FIELD_MASK**. The next bits can be used to set
|
||||
* the following flags:
|
||||
*
|
||||
* **BPF_F_USER_STACK**
|
||||
* Collect a user space stack instead of a kernel stack.
|
||||
* **BPF_F_USER_BUILD_ID**
|
||||
* Collect buildid+offset instead of ips for user stack,
|
||||
* only valid if **BPF_F_USER_STACK** is also specified.
|
||||
*
|
||||
* **bpf_get_stack**\ () can collect up to
|
||||
* **PERF_MAX_STACK_DEPTH** both kernel and user frames, subject
|
||||
* to a sufficiently large buffer size. Note that
|
||||
* this limit can be controlled with the **sysctl** program, and
|
||||
* that it should be manually increased in order to profile long
|
||||
* user stacks (such as stacks for Java programs). To do so, use:
|
||||
*
|
||||
* ::
|
||||
*
|
||||
* # sysctl kernel.perf_event_max_stack=<new value>
|
||||
*
|
||||
* Return
|
||||
* a non-negative value equal to or less than size on success, or
|
||||
* a negative error in case of failure.
|
||||
*
|
||||
* int skb_load_bytes_relative(const struct sk_buff *skb, u32 offset, void *to, u32 len, u32 start_header)
|
||||
* Description
|
||||
* This helper is similar to **bpf_skb_load_bytes**\ () in that
|
||||
* it provides an easy way to load *len* bytes from *offset*
|
||||
* from the packet associated to *skb*, into the buffer pointed
|
||||
* by *to*. The difference to **bpf_skb_load_bytes**\ () is that
|
||||
* a fifth argument *start_header* exists in order to select a
|
||||
* base offset to start from. *start_header* can be one of:
|
||||
*
|
||||
* **BPF_HDR_START_MAC**
|
||||
* Base offset to load data from is *skb*'s mac header.
|
||||
* **BPF_HDR_START_NET**
|
||||
* Base offset to load data from is *skb*'s network header.
|
||||
*
|
||||
* In general, "direct packet access" is the preferred method to
|
||||
* access packet data, however, this helper is in particular useful
|
||||
* in socket filters where *skb*\ **->data** does not always point
|
||||
* to the start of the mac header and where "direct packet access"
|
||||
* is not available.
|
||||
*
|
||||
* Return
|
||||
* 0 on success, or a negative error in case of failure.
|
||||
*
|
||||
*/
|
||||
#define __BPF_FUNC_MAPPER(FN) \
|
||||
FN(unspec), \
|
||||
@ -1835,7 +1893,9 @@ union bpf_attr {
|
||||
FN(msg_pull_data), \
|
||||
FN(bind), \
|
||||
FN(xdp_adjust_tail), \
|
||||
FN(skb_get_xfrm_state),
|
||||
FN(skb_get_xfrm_state), \
|
||||
FN(get_stack), \
|
||||
FN(skb_load_bytes_relative),
|
||||
|
||||
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
|
||||
* function eBPF program intends to call
|
||||
@ -1869,11 +1929,14 @@ enum bpf_func_id {
|
||||
/* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
|
||||
#define BPF_F_TUNINFO_IPV6 (1ULL << 0)
|
||||
|
||||
/* BPF_FUNC_get_stackid flags. */
|
||||
/* flags for both BPF_FUNC_get_stackid and BPF_FUNC_get_stack. */
|
||||
#define BPF_F_SKIP_FIELD_MASK 0xffULL
|
||||
#define BPF_F_USER_STACK (1ULL << 8)
|
||||
/* flags used by BPF_FUNC_get_stackid only. */
|
||||
#define BPF_F_FAST_STACK_CMP (1ULL << 9)
|
||||
#define BPF_F_REUSE_STACKID (1ULL << 10)
|
||||
/* flags used by BPF_FUNC_get_stack only. */
|
||||
#define BPF_F_USER_BUILD_ID (1ULL << 11)
|
||||
|
||||
/* BPF_FUNC_skb_set_tunnel_key flags. */
|
||||
#define BPF_F_ZERO_CSUM_TX (1ULL << 1)
|
||||
@ -1893,6 +1956,12 @@ enum bpf_adj_room_mode {
|
||||
BPF_ADJ_ROOM_NET,
|
||||
};
|
||||
|
||||
/* Mode for BPF_FUNC_skb_load_bytes_relative helper. */
|
||||
enum bpf_hdr_start_off {
|
||||
BPF_HDR_START_MAC,
|
||||
BPF_HDR_START_NET,
|
||||
};
|
||||
|
||||
/* user accessible mirror of in-kernel sk_buff.
|
||||
* new fields can only be added to the end of this structure
|
||||
*/
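The bpf_get_stack() description in the hunks above packs the skip count and the option bits into a single *flags* word. A minimal sketch of that composition follows; it is an illustration under stated assumptions, not kernel code. The `EX_` constants are hypothetical local names carrying the values shown in this diff, and `ex_get_stack_flags` is a hypothetical helper.

```c
#include <stdint.h>

/*
 * Values copied from the flag definitions in this diff; the EX_ names are
 * hypothetical so that the sketch builds without the kernel headers.
 */
#define EX_SKIP_FIELD_MASK	0xffULL
#define EX_USER_STACK		(1ULL << 8)
#define EX_USER_BUILD_ID	(1ULL << 11)

/*
 * Hypothetical helper: build the flags word for bpf_get_stack().
 * Returns 0 and stores the result in *flags, or -1 for a combination the
 * helper description rules out (skip above 255, or build ID requested
 * without a user stack).
 */
static int ex_get_stack_flags(unsigned int skip, int user, int build_id,
			      uint64_t *flags)
{
	if (skip > EX_SKIP_FIELD_MASK || (build_id && !user))
		return -1;

	*flags = skip;			/* low 8 bits: frames to skip */
	if (user)
		*flags |= EX_USER_STACK;
	if (build_id)
		*flags |= EX_USER_BUILD_ID;
	return 0;
}
```

Rejecting the invalid combinations up front is a design choice of this sketch; the helper itself is documented as returning a negative error in case of failure.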
|
||||
|
52
tools/include/uapi/linux/erspan.h
Normal file
@ -0,0 +1,52 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
|
||||
/*
|
||||
* ERSPAN Tunnel Metadata
|
||||
*
|
||||
* Copyright (c) 2018 VMware
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify
|
||||
* it under the terms of the GNU General Public License version 2
|
||||
* as published by the Free Software Foundation.
|
||||
*
|
||||
* Userspace API for metadata mode ERSPAN tunnel
|
||||
*/
|
||||
#ifndef _UAPI_ERSPAN_H
|
||||
#define _UAPI_ERSPAN_H
|
||||
|
||||
#include <linux/types.h> /* For __beXX in userspace */
|
||||
#include <asm/byteorder.h>
|
||||
|
||||
/* ERSPAN version 2 metadata header */
|
||||
struct erspan_md2 {
|
||||
__be32 timestamp;
|
||||
__be16 sgt; /* security group tag */
|
||||
#if defined(__LITTLE_ENDIAN_BITFIELD)
|
||||
__u8 hwid_upper:2,
|
||||
ft:5,
|
||||
p:1;
|
||||
__u8 o:1,
|
||||
gra:2,
|
||||
dir:1,
|
||||
hwid:4;
|
||||
#elif defined(__BIG_ENDIAN_BITFIELD)
|
||||
__u8 p:1,
|
||||
ft:5,
|
||||
hwid_upper:2;
|
||||
__u8 hwid:4,
|
||||
dir:1,
|
||||
gra:2,
|
||||
o:1;
|
||||
#else
|
||||
#error "Please fix <asm/byteorder.h>"
|
||||
#endif
|
||||
};
|
||||
|
||||
struct erspan_metadata {
|
||||
int version;
|
||||
union {
|
||||
__be32 index; /* Version 1 (type II)*/
|
||||
struct erspan_md2 md2; /* Version 2 (type III) */
|
||||
} u;
|
||||
};
|
||||
|
||||
#endif /* _UAPI_ERSPAN_H */
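struct erspan_md2 above stores the hardware ID in two bitfields, `hwid_upper` (2 bits) and `hwid` (4 bits). Assuming these are the upper and lower parts of one 6-bit value, as the field names suggest, splitting and rejoining could look like the sketch below. `ex_split_hwid` and `ex_join_hwid` are hypothetical names, and plain integers are used instead of the bitfield struct so that the example does not depend on the kernel headers.

```c
#include <stdint.h>

/*
 * Hypothetical helper: split a 6-bit hardware ID into the two pieces that
 * would go into the hwid_upper (2 bits) and hwid (4 bits) fields.
 */
static void ex_split_hwid(uint8_t hwid, uint8_t *upper, uint8_t *lower)
{
	*lower = hwid & 0xf;		/* low 4 bits */
	*upper = (hwid >> 4) & 0x3;	/* next 2 bits */
}

/* Hypothetical helper: rebuild the 6-bit hardware ID from its two pieces. */
static uint8_t ex_join_hwid(uint8_t upper, uint8_t lower)
{
	return (uint8_t)(((upper & 0x3) << 4) | (lower & 0xf));
}
```

Working on plain integers keeps the arithmetic independent of the bitfield ordering, which the header selects through the `__LITTLE_ENDIAN_BITFIELD` and `__BIG_ENDIAN_BITFIELD` branches.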
|
@ -32,7 +32,8 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
|
||||
test_l4lb_noinline.o test_xdp_noinline.o test_stacktrace_map.o \
|
||||
sample_map_ret0.o test_tcpbpf_kern.o test_stacktrace_build_id.o \
|
||||
sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o test_adjust_tail.o \
|
||||
test_btf_haskv.o test_btf_nokv.o test_sockmap_kern.o test_tunnel_kern.o
|
||||
test_btf_haskv.o test_btf_nokv.o test_sockmap_kern.o test_tunnel_kern.o \
|
||||
test_get_stack_rawtp.o
|
||||
|
||||
# Order correspond to 'make run_tests' order
|
||||
TEST_PROGS := test_kmod.sh \
|
||||
@ -58,6 +59,7 @@ $(OUTPUT)/test_dev_cgroup: cgroup_helpers.c
|
||||
$(OUTPUT)/test_sock: cgroup_helpers.c
|
||||
$(OUTPUT)/test_sock_addr: cgroup_helpers.c
|
||||
$(OUTPUT)/test_sockmap: cgroup_helpers.c
|
||||
$(OUTPUT)/test_progs: trace_helpers.c
|
||||
|
||||
.PHONY: force
|
||||
|
||||
|
@ -101,6 +101,8 @@ static int (*bpf_xdp_adjust_tail)(void *ctx, int offset) =
|
||||
static int (*bpf_skb_get_xfrm_state)(void *ctx, int index, void *state,
|
||||
int size, int flags) =
|
||||
(void *) BPF_FUNC_skb_get_xfrm_state;
|
||||
static int (*bpf_get_stack)(void *ctx, void *buf, int size, int flags) =
|
||||
(void *) BPF_FUNC_get_stack;
|
||||
|
||||
/* llvm builtin functions that eBPF C program may use to
|
||||
* emit BPF_LD_ABS and BPF_LD_IND instructions
|
||||
|
Some files were not shown because too many files have changed in this diff.