linux

mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-01-16 18:08:20 +00:00

Frederic Weisbecker 4a8e65b0c3 srcu: Fix callbacks acceleration mishandling

SRCU callbacks acceleration might fail if the preceding callbacks
advance also fails. This can happen when the following steps are met:

1) The RCU_WAIT_TAIL segment has callbacks (say for gp_num 8) and the
   RCU_NEXT_READY_TAIL also has callbacks (say for gp_num 12).

2) The grace period for RCU_WAIT_TAIL is observed as started but not yet
   completed so rcu_seq_current() returns 4 + SRCU_STATE_SCAN1 = 5.

3) This value is passed to rcu_segcblist_advance() which can't move
   any segment forward and fails.

4) srcu_gp_start_if_needed() still proceeds with callback acceleration.
   But then the call to rcu_seq_snap() observes the grace period for the
   RCU_WAIT_TAIL segment (gp_num 8) as completed and the subsequent one
   for the RCU_NEXT_READY_TAIL segment as started
   (ie: 8 + SRCU_STATE_SCAN1 = 9) so it returns a snapshot of the
   next grace period, which is 16.

5) The value of 16 is passed to rcu_segcblist_accelerate() but the
   freshly enqueued callback in RCU_NEXT_TAIL can't move to
   RCU_NEXT_READY_TAIL which already has callbacks for a previous grace
   period (gp_num = 12). So acceleration fails.

6) Note that in all these steps, srcu_invoke_callbacks() hadn't had a
   chance to run.

A very bad outcome may then follow if this happens:

7) Some other CPU races and starts grace period number 16 before the
   CPU handling the previous steps had a chance to. Therefore
   srcu_gp_start() isn't called on the latter sdp to fix the
   acceleration leak from the previous steps with a new pair of calls
   to advance/accelerate.

8) The grace period 16 completes and srcu_invoke_callbacks() is finally
   called. All the callbacks from previous grace periods (8 and 12) are
   correctly advanced and executed but callbacks in RCU_NEXT_READY_TAIL
   still remain. Then rcu_segcblist_accelerate() is called with a
   snapshot of 20.

9) Since nothing started the grace period number 20, callbacks stay
   unhandled.
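
The sequence values quoted in the steps above (5, 16 and 20) can be
reproduced with a small userspace model. This is only a sketch:
model_seq_snap() is a hypothetical name that mirrors the arithmetic of
rcu_seq_snap() for a two-bit state field, and it leaves out the atomic
load and the memory barrier that the kernel helper performs.

```c
#include <assert.h>

/*
 * Userspace model of the grace-period sequence encoding: the low two
 * bits hold the state, the upper bits count grace periods.
 * All names here are illustrative, not kernel symbols.
 */
#define SEQ_CTR_SHIFT   2
#define SEQ_STATE_MASK  ((1UL << SEQ_CTR_SHIFT) - 1)
#define STATE_SCAN1     1UL     /* stands in for SRCU_STATE_SCAN1 */

/*
 * Given the current sequence value, return the value the counter will
 * have once a full grace period that starts after "now" has ended.
 */
unsigned long model_seq_snap(unsigned long seq)
{
	return (seq + 2 * SEQ_STATE_MASK + 1) & ~SEQ_STATE_MASK;
}
```

With this model, a counter of 4 + STATE_SCAN1 = 5 (step 2) gives a
snapshot of 12, a counter of 8 + STATE_SCAN1 = 9 (step 4) gives 16, and
an idle counter of 16 (step 8) gives 20.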

This has been reported in real load:

	[3144162.608392] INFO: task kworker/136:12:252684 blocked for more than 122 seconds.
	[3144162.615986]       Tainted: G           O  K   5.4.203-1-tlinux4-0011.1 #1
	[3144162.623053] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
	[3144162.631162] kworker/136:12  D    0 252684      2 0x90004000
	[3144162.631189] Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm]
	[3144162.631192] Call Trace:
	[3144162.631202]  __schedule+0x2ee/0x660
	[3144162.631206]  schedule+0x33/0xa0
	[3144162.631209]  schedule_timeout+0x1c4/0x340
	[3144162.631214]  ? update_load_avg+0x82/0x660
	[3144162.631217]  ? raw_spin_rq_lock_nested+0x1f/0x30
	[3144162.631218]  wait_for_completion+0x119/0x180
	[3144162.631220]  ? wake_up_q+0x80/0x80
	[3144162.631224]  __synchronize_srcu.part.19+0x81/0xb0
	[3144162.631226]  ? __bpf_trace_rcu_utilization+0x10/0x10
	[3144162.631227]  synchronize_srcu+0x5f/0xc0
	[3144162.631236]  irqfd_shutdown+0x3c/0xb0 [kvm]
	[3144162.631239]  ? __schedule+0x2f6/0x660
	[3144162.631243]  process_one_work+0x19a/0x3a0
	[3144162.631244]  worker_thread+0x37/0x3a0
	[3144162.631247]  kthread+0x117/0x140
	[3144162.631247]  ? process_one_work+0x3a0/0x3a0
	[3144162.631248]  ? __kthread_cancel_work+0x40/0x40
	[3144162.631250]  ret_from_fork+0x1f/0x30

Fix this by taking the snapshot for acceleration _before_ the read
of the current grace period number.
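
The effect of the reordering can be illustrated with a toy model of the
segmented callback list. Everything below is an assumption made for
illustration: the toy_* and enqueue_* names are hypothetical, segments
are reduced to callback counts, sequence wraparound is ignored, and no
locking or barriers are modelled. The two seq arguments stand for the
value of the grace-period counter at the first and at the second read.

```c
#include <assert.h>
#include <stdbool.h>

enum { DONE, WAIT, NEXT_READY, NEXT, NSEGS };

/* Toy segmented list: per-segment callback count and gp number. */
struct toy_cblist {
	int len[NSEGS];
	unsigned long gp_seq[NSEGS];
};

/* Same arithmetic as rcu_seq_snap() with a two-bit state field. */
unsigned long toy_seq_snap(unsigned long seq)
{
	return (seq + 7) & ~3UL;
}

/* Move segments whose grace period has ended (gp_seq <= seq) to DONE. */
bool toy_advance(struct toy_cblist *l, unsigned long seq)
{
	int i, j;

	for (i = WAIT; i < NEXT; i++) {
		if (seq < l->gp_seq[i])
			break;
		l->len[DONE] += l->len[i];
		l->len[i] = 0;
	}
	if (i == WAIT)
		return false;		/* nothing could advance */
	for (j = WAIT; i < NEXT; i++, j++) {	/* compact what is left */
		l->len[j] = l->len[i];
		l->gp_seq[j] = l->gp_seq[i];
		l->len[i] = 0;
	}
	return true;
}

/* Assign gp number seq to callbacks that do not have an earlier one. */
bool toy_accelerate(struct toy_cblist *l, unsigned long seq)
{
	int i, j, rest = 0;

	/* Last non-empty segment waiting for an earlier grace period. */
	for (i = NEXT_READY; i > DONE; i--)
		if (l->len[i] && l->gp_seq[i] < seq)
			break;
	for (j = i + 1; j <= NEXT; j++)
		rest += l->len[j];
	if (!rest || ++i >= NEXT)
		return false;	/* nothing to assign, or no free segment */
	for (j = i + 1; j <= NEXT; j++) {
		l->len[i] += l->len[j];
		l->len[j] = 0;
	}
	for (; i < NEXT; i++)
		l->gp_seq[i] = seq;
	return true;
}

/* Old order: read current gp and advance, then take the snapshot. */
bool enqueue_old(struct toy_cblist *l, unsigned long seq1, unsigned long seq2)
{
	l->len[NEXT]++;
	toy_advance(l, seq1);
	return toy_accelerate(l, toy_seq_snap(seq2));
}

/* Fixed order: take the snapshot first, then read current gp and advance. */
bool enqueue_new(struct toy_cblist *l, unsigned long seq1, unsigned long seq2)
{
	unsigned long s = toy_seq_snap(seq1);

	l->len[NEXT]++;
	toy_advance(l, seq2);
	return toy_accelerate(l, s);
}
```

Starting from one callback in WAIT (gp_num 8) and one in NEXT_READY
(gp_num 12), with the counter read as 5 and then as 9, enqueue_old()
fails to accelerate and leaves the new callback in NEXT, while
enqueue_new() succeeds because its snapshot (12) is never ahead of what
the subsequent advance has observed.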

The only side effect of this solution is that callbacks advancing then
happens _after_ the full barrier in rcu_seq_snap(). This is not a problem
because that barrier only cares about:

1) Ordering accesses of the update side before call_srcu() so they don't
   bleed.
2) Seeing all the accesses prior to the grace period of the current gp_num.

The only things callbacks advancing need to be ordered against are
carried by snp locking.

Reported-by: Yong He <alexyonghe@tencent.com>
Co-developed-by: Yong He <alexyonghe@tencent.com>
Signed-off-by: Yong He <alexyonghe@tencent.com>
Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Co-developed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Link: http://lore.kernel.org/CANZk6aR+CqZaqmMWrC2eRRPY12qAZnDZLwLnHZbNi=xXMB401g@mail.gmail.com
Fixes: da915ad5cf25 ("srcu: Parallelize callback handling")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

2023-10-10 16:32:26 +02:00

arch

rcu: Standardize explicit CPU-hotplug calls

2023-10-04 22:29:45 +02:00

block

block: fix pin count management when merging same-page segments

2023-09-06 07:32:27 -06:00

certs

certs: Reference revocation list for all keyrings

2023-08-17 20:12:41 +00:00

crypto

This update includes the following changes:

2023-08-29 11:23:29 -07:00

Documentation

rcu: Standardize explicit CPU-hotplug calls

2023-10-04 22:29:45 +02:00

drivers

drm ci for 6.6-rc1

2023-09-10 11:55:26 -07:00

fs

six smb3 client fixes, one fix for nls Kconfig, one minor spnego registry update

2023-09-09 19:56:23 -07:00

include

rcu: Standardize explicit CPU-hotplug calls

2023-10-04 22:29:45 +02:00

init

workqueue: Changes for v6.6

2023-09-01 16:06:32 -07:00

io_uring

Revert "io_uring: fix IO hang in io_wq_put_and_exit from do_exit()"

2023-09-07 09:41:49 -06:00

ipc

Add x86 shadow stack support

2023-08-31 12:20:12 -07:00

kernel

srcu: Fix callbacks acceleration mishandling

2023-10-10 16:32:26 +02:00

lib

iov_iter: Kunit tests for page extraction

2023-09-09 15:11:49 -07:00

LICENSES

LICENSES: Add the copyleft-next-0.3.1 license

2022-11-08 15:44:01 +01:00

mm

mm: Remove kmem_valid_obj()

2023-09-13 22:28:59 +02:00

net

Including fixes from netfilter and bpf.

2023-09-07 18:33:07 -07:00

rust

Documentation work keeps chugging along; stuff for 6.6 includes:

2023-08-30 20:05:42 -07:00

samples

VFIO updates for v6.6-rc1

2023-08-30 20:36:01 -07:00

scripts

Revert "checkpatch: Error out if deprecated RCU API used"

2023-09-13 22:27:15 +02:00

security

Landlock updates for v6.6-rc1

2023-09-08 12:06:51 -07:00

sound

sound fixes for 6.6-rc1

2023-09-08 13:07:50 -07:00

tools

perf tools changes for v6.6:

2023-09-09 20:06:17 -07:00

usr

initramfs: Encode dependency on KBUILD_BUILD_TIMESTAMP

2023-06-06 17:54:49 +09:00

virt

ARM:

2023-09-07 13:52:20 -07:00

.clang-format

iommu: Add for_each_group_device()

2023-05-23 08:15:51 +02:00

.cocciconfig

scripts: add Linux .cocciconfig for coccinelle

2016-07-22 12:13:39 +02:00

.get_maintainer.ignore

get_maintainer: add Alan to .get_maintainer.ignore

2022-08-20 15:17:44 -07:00

.gitattributes

.gitattributes: set diff driver for Rust source code files

2023-05-31 17:48:25 +02:00

.gitignore

kbuild: rpm-pkg: rename binkernel.spec to kernel.spec

2023-07-25 00:59:33 +09:00

.mailmap

for-linus-2023083101

2023-09-01 12:31:44 -07:00

.rustfmt.toml

rust: add .rustfmt.toml

2022-09-28 09:02:20 +02:00

COPYING

COPYING: state that all contributions really are covered by this file

2020-02-10 13:32:20 -08:00

CREDITS

USB: Remove Wireless USB and UWB documentation

2023-08-09 14:17:32 +02:00

Kbuild

Kbuild updates for v6.1

2022-10-10 12:00:45 -07:00

Kconfig

kbuild: ensure full rebuild when the compiler is updated

2020-05-12 13:28:33 +09:00

MAINTAINERS

drm ci for 6.6-rc1

2023-09-10 11:55:26 -07:00

Makefile

Linux 6.6-rc1

2023-09-10 16:28:41 -07:00

README

Drop all 00-INDEX files from Documentation/

2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.

Languages

C 97.5%

Assembly 1%

Shell 0.6%

Python 0.3%

Makefile 0.3%