linux-stable/net/Kconfig
Daniel Borkmann 4f3446bb80 bpf: add generic constant blinding for use in jits
This work adds a generic facility for use from eBPF JIT compilers
that allows for further hardening of JIT generated images through
blinding constants. In response to the original work on BPF JIT
spraying published by Keegan McAllister [1], most BPF JITs were
changed to make images read-only and start at a randomized offset
in the page, where the rest was filled with trap instructions. We
have this nowadays in x86, arm, arm64 and s390 JIT compilers.
Additionally, later work also made eBPF interpreter images read
only for kernels supporting DEBUG_SET_MODULE_RONX, that is, x86,
arm, arm64 and s390 archs as well currently. This is done by
default for mentioned JITs when JITing is enabled. Furthermore,
we had a generic and configurable constant blinding facility on our
todo for quite some time now to further make spraying harder, and
first implementation since around netconf 2016.

We found that for systems where untrusted users can load cBPF/eBPF
code where JIT is enabled, start offset randomization helps a bit
to make jumps into crafted payload harder, but in case where larger
programs that cross page boundary are injected, we again have some
part of the program opcodes at a page start offset. With improved
guessing and more reliable payload injection, chances can increase
to jump into such payload. Elena Reshetova recently wrote a test
case for it [2, 3]. Moreover, eBPF comes with 64 bit constants, which
can leave some more room for payloads. Note that for all this,
additional bugs in the kernel are still required to make the jump
(and of course to guess right, to not jump into a trap) and naturally
the JIT must be enabled, which is disabled by default.

For helping mitigation, the general idea is to provide an option
bpf_jit_harden that admins can tweak along with bpf_jit_enable, so
that for cases where JIT should be enabled for performance reasons,
the generated image can be further hardened with blinding constants
for unpriviledged users (bpf_jit_harden == 1), with trading off
performance for these, but not for privileged ones. We also added
the option of blinding for all users (bpf_jit_harden == 2), which
is quite helpful for testing f.e. with test_bpf.ko. There are no
further e.g. hardening levels of bpf_jit_harden switch intended,
rationale is to have it dead simple to use as on/off. Since this
functionality would need to be duplicated over and over for JIT
compilers to use, which are already complex enough, we provide a
generic eBPF byte-code level based blinding implementation, which is
then just transparently JITed. JIT compilers need to make only a few
changes to integrate this facility and can be migrated one by one.

This option is for eBPF JITs and will be used in x86, arm64, s390
without too much effort, and soon ppc64 JITs, thus that native eBPF
can be blinded as well as cBPF to eBPF migrations, so that both can
be covered with a single implementation. The rule for JITs is that
bpf_jit_blind_constants() must be called from bpf_int_jit_compile(),
and in case blinding is disabled, we follow normally with JITing the
passed program. In case blinding is enabled and we fail during the
process of blinding itself, we must return with the interpreter.
Similarly, in case the JITing process after the blinding failed, we
return normally to the interpreter with the non-blinded code. Meaning,
interpreter doesn't change in any way and operates on eBPF code as
usual. For doing this pre-JIT blinding step, we need to make use of
a helper/auxiliary register, here BPF_REG_AX. This is strictly internal
to the JIT and not in any way part of the eBPF architecture. Just like
in the same way as JITs internally make use of some helper registers
when emitting code, only that here the helper register is one
abstraction level higher in eBPF bytecode, but nevertheless in JIT
phase. That helper register is needed since f.e. manually written
program can issue loads to all registers of eBPF architecture.

The core concept with the additional register is: blind out all 32
and 64 bit constants by converting BPF_K based instructions into a
small sequence from K_VAL into ((RND ^ K_VAL) ^ RND). Therefore, this
is transformed into: BPF_REG_AX := (RND ^ K_VAL), BPF_REG_AX ^= RND,
and REG <OP> BPF_REG_AX, so actual operation on the target register
is translated from BPF_K into BPF_X one that is operating on
BPF_REG_AX's content. During rewriting phase when blinding, RND is
newly generated via prandom_u32() for each processed instruction.
64 bit loads are split into two 32 bit loads to make translation and
patching not too complex. Only basic thing required by JITs is to
call the helper bpf_jit_blind_constants()/bpf_jit_prog_release_other()
pair, and to map BPF_REG_AX into an unused register.

Small bpf_jit_disasm extract from [2] when applied to x86 JIT:

echo 0 > /proc/sys/net/core/bpf_jit_harden

  ffffffffa034f5e9 + <x>:
  [...]
  39:   mov    $0xa8909090,%eax
  3e:   mov    $0xa8909090,%eax
  43:   mov    $0xa8ff3148,%eax
  48:   mov    $0xa89081b4,%eax
  4d:   mov    $0xa8900bb0,%eax
  52:   mov    $0xa810e0c1,%eax
  57:   mov    $0xa8908eb4,%eax
  5c:   mov    $0xa89020b0,%eax
  [...]

echo 1 > /proc/sys/net/core/bpf_jit_harden

  ffffffffa034f1e5 + <x>:
  [...]
  39:   mov    $0xe1192563,%r10d
  3f:   xor    $0x4989b5f3,%r10d
  46:   mov    %r10d,%eax
  49:   mov    $0xb8296d93,%r10d
  4f:   xor    $0x10b9fd03,%r10d
  56:   mov    %r10d,%eax
  59:   mov    $0x8c381146,%r10d
  5f:   xor    $0x24c7200e,%r10d
  66:   mov    %r10d,%eax
  69:   mov    $0xeb2a830e,%r10d
  6f:   xor    $0x43ba02ba,%r10d
  76:   mov    %r10d,%eax
  79:   mov    $0xd9730af,%r10d
  7f:   xor    $0xa5073b1f,%r10d
  86:   mov    %r10d,%eax
  89:   mov    $0x9a45662b,%r10d
  8f:   xor    $0x325586ea,%r10d
  96:   mov    %r10d,%eax
  [...]

As can be seen, original constants that carry payload are hidden
when enabled, actual operations are transformed from constant-based
to register-based ones, making jumps into constants ineffective.
Above extract/example uses single BPF load instruction over and
over, but of course all instructions with constants are blinded.

Performance wise, JIT with blinding performs a bit slower than just
JIT and faster than interpreter case. This is expected, since we
still get all the performance benefits from JITing and in normal
use-cases not every single instruction needs to be blinded. Summing
up all 296 test cases averaged over multiple runs from test_bpf.ko
suite, interpreter was 55% slower than JIT only and JIT with blinding
was 8% slower than JIT only. Since there are also some extremes in
the test suite, I expect for ordinary workloads that the performance
for the JIT with blinding case is even closer to JIT only case,
f.e. nmap test case from suite has averaged timings in ns 29 (JIT),
35 (+ blinding), and 151 (interpreter).

BPF test suite, seccomp test suite, eBPF sample code and various
bigger networking eBPF programs have been tested with this and were
running fine. For testing purposes, I also adapted interpreter and
redirected blinded eBPF image to interpreter and also here all tests
pass.

  [1] http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html
  [2] https://github.com/01org/jit-spray-poc-for-ksp/
  [3] http://www.openwall.com/lists/kernel-hardening/2016/05/03/5

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Elena Reshetova <elena.reshetova@intel.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:49:32 -04:00

436 lines
13 KiB
Plaintext

#
# Network configuration
#
menuconfig NET
bool "Networking support"
select NLATTR
select GENERIC_NET_UTILS
select BPF
---help---
Unless you really know what you are doing, you should say Y here.
The reason is that some programs need kernel networking support even
when running on a stand-alone machine that isn't connected to any
other computer.
If you are upgrading from an older kernel, you
should consider updating your networking tools too because changes
in the kernel and the tools often go hand in hand. The tools are
contained in the package net-tools, the location and version number
of which are given in <file:Documentation/Changes>.
For a general introduction to Linux networking, it is highly
recommended to read the NET-HOWTO, available from
<http://www.tldp.org/docs.html#howto>.
if NET
config WANT_COMPAT_NETLINK_MESSAGES
bool
help
This option can be selected by other options that need compat
netlink messages.
config COMPAT_NETLINK_MESSAGES
def_bool y
depends on COMPAT
depends on WEXT_CORE || WANT_COMPAT_NETLINK_MESSAGES
help
This option makes it possible to send different netlink messages
to tasks depending on whether the task is a compat task or not. To
achieve this, you need to set skb_shinfo(skb)->frag_list to the
compat skb before sending the skb, the netlink code will sort out
which message to actually pass to the task.
Newly written code should NEVER need this option but do
compat-independent messages instead!
config NET_INGRESS
bool
config NET_EGRESS
bool
menu "Networking options"
source "net/packet/Kconfig"
source "net/unix/Kconfig"
source "net/xfrm/Kconfig"
source "net/iucv/Kconfig"
config INET
bool "TCP/IP networking"
select CRYPTO
select CRYPTO_AES
---help---
These are the protocols used on the Internet and on most local
Ethernets. It is highly recommended to say Y here (this will enlarge
your kernel by about 400 KB), since some programs (e.g. the X window
system) use TCP/IP even if your machine is not connected to any
other computer. You will get the so-called loopback device which
allows you to ping yourself (great fun, that!).
For an excellent introduction to Linux networking, please read the
Linux Networking HOWTO, available from
<http://www.tldp.org/docs.html#howto>.
If you say Y here and also to "/proc file system support" and
"Sysctl support" below, you can change various aspects of the
behavior of the TCP/IP code by writing to the (virtual) files in
/proc/sys/net/ipv4/*; the options are explained in the file
<file:Documentation/networking/ip-sysctl.txt>.
Short answer: say Y.
if INET
source "net/ipv4/Kconfig"
source "net/ipv6/Kconfig"
source "net/netlabel/Kconfig"
endif # if INET
config NETWORK_SECMARK
bool "Security Marking"
help
This enables security marking of network packets, similar
to nfmark, but designated for security purposes.
If you are unsure how to answer this question, answer N.
config NET_PTP_CLASSIFY
def_bool n
config NETWORK_PHY_TIMESTAMPING
bool "Timestamping in PHY devices"
select NET_PTP_CLASSIFY
help
This allows timestamping of network packets by PHYs with
hardware timestamping capabilities. This option adds some
overhead in the transmit and receive paths.
If you are unsure how to answer this question, answer N.
menuconfig NETFILTER
bool "Network packet filtering framework (Netfilter)"
---help---
Netfilter is a framework for filtering and mangling network packets
that pass through your Linux box.
The most common use of packet filtering is to run your Linux box as
a firewall protecting a local network from the Internet. The type of
firewall provided by this kernel support is called a "packet
filter", which means that it can reject individual network packets
based on type, source, destination etc. The other kind of firewall,
a "proxy-based" one, is more secure but more intrusive and more
bothersome to set up; it inspects the network traffic much more
closely, modifies it and has knowledge about the higher level
protocols, which a packet filter lacks. Moreover, proxy-based
firewalls often require changes to the programs running on the local
clients. Proxy-based firewalls don't need support by the kernel, but
they are often combined with a packet filter, which only works if
you say Y here.
You should also say Y here if you intend to use your Linux box as
the gateway to the Internet for a local network of machines without
globally valid IP addresses. This is called "masquerading": if one
of the computers on your local network wants to send something to
the outside, your box can "masquerade" as that computer, i.e. it
forwards the traffic to the intended outside destination, but
modifies the packets to make it look like they came from the
firewall box itself. It works both ways: if the outside host
replies, the Linux box will silently forward the traffic to the
correct local computer. This way, the computers on your local net
are completely invisible to the outside world, even though they can
reach the outside and can receive replies. It is even possible to
run globally visible servers from within a masqueraded local network
using a mechanism called portforwarding. Masquerading is also often
called NAT (Network Address Translation).
Another use of Netfilter is in transparent proxying: if a machine on
the local network tries to connect to an outside host, your Linux
box can transparently forward the traffic to a local server,
typically a caching proxy server.
Yet another use of Netfilter is building a bridging firewall. Using
a bridge with Network packet filtering enabled makes iptables "see"
the bridged traffic. For filtering on the lower network and Ethernet
protocols over the bridge, use ebtables (under bridge netfilter
configuration).
Various modules exist for netfilter which replace the previous
masquerading (ipmasqadm), packet filtering (ipchains), transparent
proxying, and portforwarding mechanisms. Please see
<file:Documentation/Changes> under "iptables" for the location of
these packages.
if NETFILTER
config NETFILTER_DEBUG
bool "Network packet filtering debugging"
depends on NETFILTER
help
You can say Y here if you want to get additional messages useful in
debugging the netfilter code.
config NETFILTER_ADVANCED
bool "Advanced netfilter configuration"
depends on NETFILTER
default y
help
If you say Y here you can select between all the netfilter modules.
If you say N the more unusual ones will not be shown and the
basic ones needed by most people will default to 'M'.
If unsure, say Y.
config BRIDGE_NETFILTER
tristate "Bridged IP/ARP packets filtering"
depends on BRIDGE
depends on NETFILTER && INET
depends on NETFILTER_ADVANCED
default m
---help---
Enabling this option will let arptables resp. iptables see bridged
ARP resp. IP traffic. If you want a bridging firewall, you probably
want this option enabled.
Enabling or disabling this option doesn't enable or disable
ebtables.
If unsure, say N.
source "net/netfilter/Kconfig"
source "net/ipv4/netfilter/Kconfig"
source "net/ipv6/netfilter/Kconfig"
source "net/decnet/netfilter/Kconfig"
source "net/bridge/netfilter/Kconfig"
endif
source "net/dccp/Kconfig"
source "net/sctp/Kconfig"
source "net/rds/Kconfig"
source "net/tipc/Kconfig"
source "net/atm/Kconfig"
source "net/l2tp/Kconfig"
source "net/802/Kconfig"
source "net/bridge/Kconfig"
source "net/dsa/Kconfig"
source "net/8021q/Kconfig"
source "net/decnet/Kconfig"
source "net/llc/Kconfig"
source "net/ipx/Kconfig"
source "drivers/net/appletalk/Kconfig"
source "net/x25/Kconfig"
source "net/lapb/Kconfig"
source "net/phonet/Kconfig"
source "net/6lowpan/Kconfig"
source "net/ieee802154/Kconfig"
source "net/mac802154/Kconfig"
source "net/sched/Kconfig"
source "net/dcb/Kconfig"
source "net/dns_resolver/Kconfig"
source "net/batman-adv/Kconfig"
source "net/openvswitch/Kconfig"
source "net/vmw_vsock/Kconfig"
source "net/netlink/Kconfig"
source "net/mpls/Kconfig"
source "net/hsr/Kconfig"
source "net/switchdev/Kconfig"
source "net/l3mdev/Kconfig"
source "net/qrtr/Kconfig"
config RPS
bool
depends on SMP && SYSFS
default y
config RFS_ACCEL
bool
depends on RPS
select CPU_RMAP
default y
config XPS
bool
depends on SMP
default y
config HWBM
bool
config SOCK_CGROUP_DATA
bool
default n
config CGROUP_NET_PRIO
bool "Network priority cgroup"
depends on CGROUPS
select SOCK_CGROUP_DATA
---help---
Cgroup subsystem for use in assigning processes to network priorities on
a per-interface basis.
config CGROUP_NET_CLASSID
bool "Network classid cgroup"
depends on CGROUPS
select SOCK_CGROUP_DATA
---help---
Cgroup subsystem for use as general purpose socket classid marker that is
being used in cls_cgroup and for netfilter matching.
config NET_RX_BUSY_POLL
bool
default y
config BQL
bool
depends on SYSFS
select DQL
default y
config BPF_JIT
bool "enable BPF Just In Time compiler"
depends on HAVE_CBPF_JIT || HAVE_EBPF_JIT
depends on MODULES
---help---
Berkeley Packet Filter filtering capabilities are normally handled
by an interpreter. This option allows kernel to generate a native
code when filter is loaded in memory. This should speedup
packet sniffing (libpcap/tcpdump).
Note, admin should enable this feature changing:
/proc/sys/net/core/bpf_jit_enable
/proc/sys/net/core/bpf_jit_harden (optional)
config NET_FLOW_LIMIT
bool
depends on RPS
default y
---help---
The network stack has to drop packets when a receive processing CPU's
backlog reaches netdev_max_backlog. If a few out of many active flows
generate the vast majority of load, drop their traffic earlier to
maintain capacity for the other flows. This feature provides servers
with many clients some protection against DoS by a single (spoofed)
flow that greatly exceeds average workload.
menu "Network testing"
config NET_PKTGEN
tristate "Packet Generator (USE WITH CAUTION)"
depends on INET && PROC_FS
---help---
This module will inject preconfigured packets, at a configurable
rate, out of a given interface. It is used for network interface
stress testing and performance analysis. If you don't understand
what was just said, you don't need it: say N.
Documentation on how to use the packet generator can be found
at <file:Documentation/networking/pktgen.txt>.
To compile this code as a module, choose M here: the
module will be called pktgen.
config NET_TCPPROBE
tristate "TCP connection probing"
depends on INET && PROC_FS && KPROBES
---help---
This module allows for capturing the changes to TCP connection
state in response to incoming packets. It is used for debugging
TCP congestion avoidance modules. If you don't understand
what was just said, you don't need it: say N.
Documentation on how to use TCP connection probing can be found
at:
http://www.linuxfoundation.org/collaborate/workgroups/networking/tcpprobe
To compile this code as a module, choose M here: the
module will be called tcp_probe.
config NET_DROP_MONITOR
tristate "Network packet drop alerting service"
depends on INET && TRACEPOINTS
---help---
This feature provides an alerting service to userspace in the
event that packets are discarded in the network stack. Alerts
are broadcast via netlink socket to any listening user space
process. If you don't need network drop alerts, or if you are ok
just checking the various proc files and other utilities for
drop statistics, say N here.
endmenu
endmenu
source "net/ax25/Kconfig"
source "net/can/Kconfig"
source "net/irda/Kconfig"
source "net/bluetooth/Kconfig"
source "net/rxrpc/Kconfig"
source "net/kcm/Kconfig"
config FIB_RULES
bool
menuconfig WIRELESS
bool "Wireless"
depends on !S390
default y
if WIRELESS
source "net/wireless/Kconfig"
source "net/mac80211/Kconfig"
endif # WIRELESS
source "net/wimax/Kconfig"
source "net/rfkill/Kconfig"
source "net/9p/Kconfig"
source "net/caif/Kconfig"
source "net/ceph/Kconfig"
source "net/nfc/Kconfig"
config LWTUNNEL
bool "Network light weight tunnels"
---help---
This feature provides an infrastructure to support light weight
tunnels like mpls. There is no netdevice associated with a light
weight tunnel endpoint. Tunnel encapsulation parameters are stored
with light weight tunnel state associated with fib routes.
config DST_CACHE
bool
default n
config NET_DEVLINK
tristate "Network physical/parent device Netlink interface"
help
Network physical/parent device Netlink interface provides
infrastructure to support access to physical chip-wide config and
monitoring.
config MAY_USE_DEVLINK
tristate
default m if NET_DEVLINK=m
default y if NET_DEVLINK=y || NET_DEVLINK=n
help
Drivers using the devlink infrastructure should have a dependency
on MAY_USE_DEVLINK to ensure they do not cause link errors when
devlink is a loadable module and the driver using it is built-in.
endif # if NET
# Used by archs to tell that they support BPF JIT compiler plus which flavour.
# Only one of the two can be selected for a specific arch since eBPF JIT supersedes
# the cBPF JIT.
# Classic BPF JIT (cBPF)
config HAVE_CBPF_JIT
bool
# Extended BPF JIT (eBPF)
config HAVE_EBPF_JIT
bool