Linux kernel stable tree
Go to file
Song Liu 39d8f0d102 bpf: Use raw_spin_trylock() for pcpu_freelist_push/pop in NMI
Recent improvements in LOCKDEP highlighted a potential A-A deadlock with
pcpu_freelist in NMI:

./tools/testing/selftests/bpf/test_progs -t stacktrace_build_id_nmi

[   18.984807] ================================
[   18.984807] WARNING: inconsistent lock state
[   18.984808] 5.9.0-rc6-01771-g1466de1330e1 #2967 Not tainted
[   18.984809] --------------------------------
[   18.984809] inconsistent {INITIAL USE} -> {IN-NMI} usage.
[   18.984810] test_progs/1990 [HC2[2]:SC0[0]:HE0:SE1] takes:
[   18.984810] ffffe8ffffc219c0 (&head->lock){....}-{2:2}, at: __pcpu_freelist_pop+0xe3/0x180
[   18.984813] {INITIAL USE} state was registered at:
[   18.984814]   lock_acquire+0x175/0x7c0
[   18.984814]   _raw_spin_lock+0x2c/0x40
[   18.984815]   __pcpu_freelist_pop+0xe3/0x180
[   18.984815]   pcpu_freelist_pop+0x31/0x40
[   18.984816]   htab_map_alloc+0xbbf/0xf40
[   18.984816]   __do_sys_bpf+0x5aa/0x3ed0
[   18.984817]   do_syscall_64+0x2d/0x40
[   18.984818]   entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   18.984818] irq event stamp: 12
[...]
[   18.984822] other info that might help us debug this:
[   18.984823]  Possible unsafe locking scenario:
[   18.984823]
[   18.984824]        CPU0
[   18.984824]        ----
[   18.984824]   lock(&head->lock);
[   18.984826]   <Interrupt>
[   18.984826]     lock(&head->lock);
[   18.984827]
[   18.984828]  *** DEADLOCK ***
[   18.984828]
[   18.984829] 2 locks held by test_progs/1990:
[...]
[   18.984838]  <NMI>
[   18.984838]  dump_stack+0x9a/0xd0
[   18.984839]  lock_acquire+0x5c9/0x7c0
[   18.984839]  ? lock_release+0x6f0/0x6f0
[   18.984840]  ? __pcpu_freelist_pop+0xe3/0x180
[   18.984840]  _raw_spin_lock+0x2c/0x40
[   18.984841]  ? __pcpu_freelist_pop+0xe3/0x180
[   18.984841]  __pcpu_freelist_pop+0xe3/0x180
[   18.984842]  pcpu_freelist_pop+0x17/0x40
[   18.984842]  ? lock_release+0x6f0/0x6f0
[   18.984843]  __bpf_get_stackid+0x534/0xaf0
[   18.984843]  bpf_prog_1fd9e30e1438d3c5_oncpu+0x73/0x350
[   18.984844]  bpf_overflow_handler+0x12f/0x3f0

This is because pcpu_freelist_head.lock is accessed in both NMI and
non-NMI context. Fix this issue by using raw_spin_trylock() in NMI.

Since NMI interrupts non-NMI context, when NMI context tries to lock the
raw_spinlock, non-NMI context of the same CPU may already have locked a
lock and is blocked from unlocking the lock. For a system with N CPUs,
there could be N NMIs at the same time, and they may block N non-NMI
raw_spinlocks. This is tricky for pcpu_freelist_push(), where unlike
_pop(), failing _push() means leaking memory. This issue is more likely to
trigger in non-SMP system.

Fix this issue with an extra list, pcpu_freelist.extralist. The extralist
is primarily used to take _push() when raw_spin_trylock() failed on all
the per CPU lists. It should be empty most of the time. The following
table summarizes the behavior of pcpu_freelist in NMI and non-NMI:

non-NMI pop(): 	use _lock(); check per CPU lists first;
                if all per CPU lists are empty, check extralist;
                if extralist is empty, return NULL.

non-NMI push(): use _lock(); only push to per CPU lists.

NMI pop():    use _trylock(); check per CPU lists first;
              if all per CPU lists are locked or empty, check extralist;
              if extralist is locked or empty, return NULL.

NMI push():   use _trylock(); check per CPU lists first;
              if all per CPU lists are locked; try push to extralist;
              if extralist is also locked, keep trying on per CPU lists.

Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201005165838.3735218-1-songliubraving@fb.com
2020-10-06 00:04:11 +02:00
arch Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-10-01 14:29:01 -07:00
block - Fix a regression in bdev partition locking (Christoph) 2020-09-11 11:55:28 -07:00
certs .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2020-08-30 15:53:44 -07:00
Documentation dt-bindings: net: renesas,etheravb: Convert to json-schema 2020-10-01 12:53:30 -07:00
drivers lib8390: Use netif_msg_init to initialize msg_enable bits 2020-10-01 19:08:46 -07:00
fs Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-22 16:45:34 -07:00
include bpf: Introducte bpf_this_cpu_ptr() 2020-10-02 15:00:49 -07:00
init Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-22 16:45:34 -07:00
ipc ipc: adjust proc_ipc_sem_dointvec definition to match prototype 2020-09-05 12:14:29 -07:00
kernel bpf: Use raw_spin_trylock() for pcpu_freelist_push/pop in NMI 2020-10-06 00:04:11 +02:00
lib Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-22 16:45:34 -07:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
mm Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-22 16:45:34 -07:00
net xsk: Remove internal DMA headers 2020-10-05 15:48:00 +02:00
samples bpf, selftests: Use bpf_tail_call_static where appropriate 2020-09-30 11:50:35 -07:00
scripts bpf: Add bpf_snprintf_btf helper 2020-09-28 18:26:58 -07:00
security Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-22 16:45:34 -07:00
sound sound fixes for 5.9-rc6 2020-09-18 11:38:08 -07:00
tools bpf, sockmap: Update selftests to use skb_adjust_room 2020-10-02 15:18:39 -07:00
usr Merge branch 'work.fdpic' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-08-07 13:29:39 -07:00
virt KVM: fix memory leak in kvm_io_bus_unregister_dev() 2020-09-11 13:15:11 -04:00
.clang-format clang-format: Update with the latest for_each macro list 2020-09-01 12:53:42 +02:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: Add ZSTD-compressed files 2020-07-31 11:50:49 +02:00
.mailmap mailmap: add older email addresses for Kees Cook 2020-09-19 13:13:38 -07:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Add Krzysztof Kozlowski to Samsung S3FWRN5 and remove Robert 2020-09-10 15:22:16 -07:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS drop_monitor: Convert to using devlink tracepoint 2020-09-30 18:01:26 -07:00
Makefile Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-09-23 13:11:11 -07:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.