linux-stable/kernel/rcu/Kconfig
Paul E. McKenney f51164a808 rcu: Employ jiffies-based backstop to callback time limit
Currently, if there are more than 100 ready-to-invoke RCU callbacks queued
on a given CPU, the rcu_do_batch() function sets a timeout for invocation
of the series.  This timeout defaults to three milliseconds, and may
be adjusted using the rcutree.rcu_resched_ns kernel boot parameter.
This timeout is checked using local_clock(), but the overhead of this
function combined with the common-case very small callback-invocation
overhead means that local_clock() is checked every 32nd invocation.

This works well except for longer-than-average callbacks.  For example,
a series of 500-microsecond-duration callbacks means that local_clock()
is checked only once every 16 milliseconds, which makes it difficult to
enforce a three-millisecond timeout.
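
The 16-millisecond figure follows directly from the numbers above. A minimal sketch of the arithmetic, assuming only the values stated in this message (a check every 32nd callback, 500-microsecond callbacks, a three-millisecond limit); the function name is hypothetical and does not come from the kernel:

```python
CALLBACKS_PER_CHECK = 32   # local_clock() is consulted every 32nd callback
LIMIT_US = 3_000           # default three-millisecond limit, in microseconds


def check_interval_us(callback_us: int, per_check: int = CALLBACKS_PER_CHECK) -> int:
    """Time between consecutive local_clock() checks, in microseconds."""
    return callback_us * per_check


interval_us = check_interval_us(500)
print(f"check interval: {interval_us / 1000:g} ms")
print(f"limit:          {LIMIT_US / 1000:g} ms")
print(f"overshoot:      {(interval_us - LIMIT_US) / 1000:g} ms")
```

With these inputs the gap between checks is several times longer than the limit being enforced, which is the problem the rest of this message addresses.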

This commit therefore adds a Kconfig option RCU_DOUBLE_CHECK_CB_TIME
that enables backup timeout checking using the coarser grained but
lighter weight jiffies.  If the jiffies counter detects a timeout,
then local_clock() is consulted even if this is not the 32nd callback.
This prevents the aforementioned 16-millisecond latency spike.
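
The effect of the backup check can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the kernel's implementation: time is simulated rather than measured, one jiffy is assumed to be one millisecond (HZ=1000), the 100-callback threshold is ignored, and the names `run_batch`, `jlimit`, and `backstop` are hypothetical.

```python
NSEC_PER_MSEC = 1_000_000


def run_batch(n_callbacks, cb_ns, *, limit_ns=3 * NSEC_PER_MSEC,
              jiffy_ns=NSEC_PER_MSEC, backstop=False):
    """Simulate one callback batch; return (elapsed_ns, callbacks_invoked).

    Each callback advances the simulated clock by cb_ns. jiffy_ns is the
    assumed length of one jiffies tick (1 ms here).
    """
    now_ns = 0
    # Coarse deadline: the limit rounded up to a whole number of jiffies.
    jlimit = -(-limit_ns // jiffy_ns)
    for count in range(1, n_callbacks + 1):
        now_ns += cb_ns                   # invoke one callback
        every_32nd = count % 32 == 0      # the usual amortized check
        jiffies_late = backstop and now_ns // jiffy_ns > jlimit
        # The precise clock is compared only when a cheap test fires.
        if (every_32nd or jiffies_late) and now_ns >= limit_ns:
            return now_ns, count
    return now_ns, n_callbacks


slow = 500_000  # 500-microsecond callbacks, as in the example above
print(run_batch(200, slow))                 # without the backstop
print(run_batch(200, slow, backstop=True))  # with the backstop
```

In this model the batch of slow callbacks stops after 32 callbacks (16 ms) without the backstop and after 8 callbacks (4 ms) with it. The remaining overshoot comes from the one-jiffy granularity of the coarse check. For very short callbacks the backstop never fires first, so the 32-callback amortization is unchanged.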

Reported-by: Domas Mituzas <dmituzas@meta.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-05-11 13:42:39 -07:00

# SPDX-License-Identifier: GPL-2.0-only
#
# RCU-related configuration options
#

menu "RCU Subsystem"

config TREE_RCU
	bool
	default y if SMP
	# Dynticks-idle tracking
	select CONTEXT_TRACKING_IDLE
	help
	  This option selects the RCU implementation that is
	  designed for very large SMP systems with hundreds or
	  thousands of CPUs. It also scales down nicely to
	  smaller systems.

config PREEMPT_RCU
	bool
	default y if PREEMPTION
	select TREE_RCU
	help
	  This option selects the RCU implementation that is
	  designed for very large SMP systems with hundreds or
	  thousands of CPUs, but for which real-time response
	  is also required. It also scales down nicely to
	  smaller systems.

	  Select this option if you are unsure.

config TINY_RCU
	bool
	default y if !PREEMPTION && !SMP
	help
	  This option selects the RCU implementation that is
	  designed for UP systems from which real-time response
	  is not required. This option greatly reduces the
	  memory footprint of RCU.

config RCU_EXPERT
	bool "Make expert-level adjustments to RCU configuration"
	default n
	help
	  This option needs to be enabled if you wish to make
	  expert-level adjustments to RCU configuration. By default,
	  no such adjustments can be made, which has the often-beneficial
	  side-effect of preventing "make oldconfig" from asking you all
	  sorts of detailed questions about how you would like numerous
	  obscure RCU options to be set up.

	  Say Y if you need to make expert-level adjustments to RCU.
	  Say N if you are unsure.

config TINY_SRCU
	bool
	default y if TINY_RCU
	help
	  This option selects the single-CPU non-preemptible version of SRCU.

config TREE_SRCU
	bool
	default y if !TINY_RCU
	help
	  This option selects the full-fledged version of SRCU.

config NEED_SRCU_NMI_SAFE
	def_bool HAVE_NMI && !ARCH_HAS_NMI_SAFE_THIS_CPU_OPS && !TINY_SRCU

config TASKS_RCU_GENERIC
	def_bool TASKS_RCU || TASKS_RUDE_RCU || TASKS_TRACE_RCU
	help
	  This option enables generic infrastructure code supporting
	  task-based RCU implementations. Not for manual selection.

config FORCE_TASKS_RCU
	bool "Force selection of TASKS_RCU"
	depends on RCU_EXPERT
	select TASKS_RCU
	default n
	help
	  This option force-enables a task-based RCU implementation
	  that uses only voluntary context switch (not preemption!),
	  idle, and user-mode execution as quiescent states. Not for
	  manual selection in most cases.

config TASKS_RCU
	bool
	default n
	select IRQ_WORK

config FORCE_TASKS_RUDE_RCU
	bool "Force selection of Tasks Rude RCU"
	depends on RCU_EXPERT
	select TASKS_RUDE_RCU
	default n
	help
	  This option force-enables a task-based RCU implementation
	  that uses only context switch (including preemption) and
	  user-mode execution as quiescent states. It forces IPIs and
	  context switches on all online CPUs, including idle ones,
	  so use with caution. Not for manual selection in most cases.

config TASKS_RUDE_RCU
	bool
	default n
	select IRQ_WORK

config FORCE_TASKS_TRACE_RCU
	bool "Force selection of Tasks Trace RCU"
	depends on RCU_EXPERT
	select TASKS_TRACE_RCU
	default n
	help
	  This option enables a task-based RCU implementation that uses
	  explicit rcu_read_lock_trace() read-side markers, and allows
	  these readers to appear in the idle loop as well as on the
	  CPU hotplug code paths. It can force IPIs on online CPUs,
	  including idle ones, so use with caution. Not for manual
	  selection in most cases.

config TASKS_TRACE_RCU
	bool
	default n
	select IRQ_WORK

config RCU_STALL_COMMON
	def_bool TREE_RCU
	help
	  This option enables RCU CPU stall code that is common between
	  the TINY and TREE variants of RCU. The purpose is to allow
	  the tiny variants to disable RCU CPU stall warnings, while
	  making these warnings mandatory for the tree variants.

config RCU_NEED_SEGCBLIST
	def_bool ( TREE_RCU || TREE_SRCU || TASKS_RCU_GENERIC )

config RCU_FANOUT
	int "Tree-based hierarchical RCU fanout value"
	range 2 64 if 64BIT
	range 2 32 if !64BIT
	depends on TREE_RCU && RCU_EXPERT
	default 64 if 64BIT
	default 32 if !64BIT
	help
	  This option controls the fanout of hierarchical implementations
	  of RCU, allowing RCU to work efficiently on machines with
	  large numbers of CPUs. This value must be at least the fourth
	  root of NR_CPUS, which allows NR_CPUS to be insanely large.
	  The default value of RCU_FANOUT should be used for production
	  systems, but if you are stress-testing the RCU implementation
	  itself, small RCU_FANOUT values allow you to test large-system
	  code paths on small(er) systems.

	  Select a specific number if testing RCU itself.
	  Take the default if unsure.
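
# Illustrative note (not part of the upstream file): a worked reading of
# the fourth-root rule in the RCU_FANOUT help text, taking that rule at
# face value.
#
#   NR_CPUS = 16     ->  RCU_FANOUT >= 2   (2^4  = 16)
#   NR_CPUS = 4096   ->  RCU_FANOUT >= 8   (8^4  = 4096)
#   NR_CPUS = 65536  ->  RCU_FANOUT >= 16  (16^4 = 65536)
#
# A stress-test build on a small machine might therefore use a .config
# fragment such as the following, where the value 2 is only an example:
#
#   CONFIG_RCU_EXPERT=y
#   CONFIG_RCU_FANOUT=2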

config RCU_FANOUT_LEAF
	int "Tree-based hierarchical RCU leaf-level fanout value"
	range 2 64 if 64BIT && !RCU_STRICT_GRACE_PERIOD
	range 2 32 if !64BIT && !RCU_STRICT_GRACE_PERIOD
	range 2 3 if RCU_STRICT_GRACE_PERIOD
	depends on TREE_RCU && RCU_EXPERT
	default 16 if !RCU_STRICT_GRACE_PERIOD
	default 2 if RCU_STRICT_GRACE_PERIOD
	help
	  This option controls the leaf-level fanout of hierarchical
	  implementations of RCU, and allows trading off cache misses
	  against lock contention. Systems that synchronize their
	  scheduling-clock interrupts for energy-efficiency reasons will
	  want the default because the smaller leaf-level fanout keeps
	  lock contention levels acceptably low. Very large systems
	  (hundreds or thousands of CPUs) will instead want to set this
	  value to the maximum value possible in order to reduce the
	  number of cache misses incurred during RCU's grace-period
	  initialization. These systems tend to run CPU-bound, and thus
	  are not helped by synchronized interrupts, and thus tend to
	  skew them, which reduces lock contention enough that large
	  leaf-level fanouts work well. That said, setting leaf-level
	  fanout to a large number will likely cause problematic
	  lock contention on the leaf-level rcu_node structures unless
	  you boot with the skew_tick kernel parameter.

	  Select a specific number if testing RCU itself.

	  Select the maximum permissible value for large systems, but
	  please understand that you may also need to set the skew_tick
	  kernel boot parameter to avoid contention on the rcu_node
	  structure's locks.

	  Take the default if unsure.

config RCU_BOOST
	bool "Enable RCU priority boosting"
	depends on (RT_MUTEXES && PREEMPT_RCU && RCU_EXPERT) || PREEMPT_RT
	default y if PREEMPT_RT
	help
	  This option boosts the priority of preempted RCU readers that
	  block the current preemptible RCU grace period for too long.
	  This option also prevents heavy loads from blocking RCU
	  callback invocation.

	  Say Y here if you are working with real-time apps or heavy loads.
	  Say N here if you are unsure.

config RCU_BOOST_DELAY
	int "Milliseconds to delay boosting after RCU grace-period start"
	range 0 3000
	depends on RCU_BOOST
	default 500
	help
	  This option specifies the time to wait after the beginning of
	  a given grace period before priority-boosting preempted RCU
	  readers blocking that grace period. Note that any RCU reader
	  blocking an expedited RCU grace period is boosted immediately.

	  Accept the default if unsure.

config RCU_EXP_KTHREAD
	bool "Perform RCU expedited work in a real-time kthread"
	depends on RCU_BOOST && RCU_EXPERT
	default !PREEMPT_RT && NR_CPUS <= 32
	help
	  Use this option to further reduce the latencies of expedited
	  grace periods at the expense of being more disruptive.

	  This option is disabled by default on PREEMPT_RT=y kernels which
	  disable expedited grace periods after boot by unconditionally
	  setting rcupdate.rcu_normal_after_boot=1.

	  Accept the default if unsure.

config RCU_NOCB_CPU
	bool "Offload RCU callback processing from boot-selected CPUs"
	depends on TREE_RCU
	depends on RCU_EXPERT || NO_HZ_FULL
	default n
	help
	  Use this option to reduce OS jitter for aggressive HPC or
	  real-time workloads. It can also be used to offload RCU
	  callback invocation to energy-efficient CPUs in battery-powered
	  asymmetric multiprocessors. The price of this reduced jitter
	  is that the overhead of call_rcu() increases and that some
	  workloads will incur significant increases in context-switch
	  rates.

	  This option offloads callback invocation from the set of CPUs
	  specified at boot time by the rcu_nocbs parameter. For each
	  such CPU, a kthread ("rcuox/N") will be created to invoke
	  callbacks, where the "N" is the CPU being offloaded, and where
	  the "x" is "p" for RCU-preempt (PREEMPTION kernels) and "s" for
	  RCU-sched (!PREEMPTION kernels). Nothing prevents this kthread
	  from running on the specified CPUs, but (1) the kthreads may be
	  preempted between each callback, and (2) affinity or cgroups can
	  be used to force the kthreads to run on whatever set of CPUs is
	  desired.

	  Say Y here if you need reduced OS jitter, despite added overhead.
	  Say N here if you are unsure.
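
# Illustrative note (not part of the upstream file): a sketch of how the
# RCU_NOCB_CPU option and the rcu_nocbs boot parameter might be combined.
# The CPU list 1-7 is an assumed example.
#
#   .config:              CONFIG_RCU_EXPERT=y
#                         CONFIG_RCU_NOCB_CPU=y
#   kernel command line:  rcu_nocbs=1-7
#
# Following the naming scheme in the help text, a PREEMPTION kernel booted
# this way would be expected to create kthreads rcuop/1 through rcuop/7,
# and a !PREEMPTION kernel rcuos/1 through rcuos/7.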

config RCU_NOCB_CPU_DEFAULT_ALL
	bool "Offload RCU callback processing from all CPUs by default"
	depends on RCU_NOCB_CPU
	default n
	help
	  Use this option to offload callback processing from all CPUs
	  by default, in the absence of the rcu_nocbs or nohz_full boot
	  parameter. This also avoids the need to use any boot parameters
	  to achieve the effect of offloading all CPUs on boot.

	  Say Y here if you want to offload all CPUs by default on boot.
	  Say N here if you are unsure.

config RCU_NOCB_CPU_CB_BOOST
	bool "Offload RCU callback from real-time kthread"
	depends on RCU_NOCB_CPU && RCU_BOOST
	default y if PREEMPT_RT
	help
	  Use this option to invoke offloaded callbacks as SCHED_FIFO
	  to avoid starvation by heavy SCHED_OTHER background load.
	  Of course, running as SCHED_FIFO during callback floods will
	  cause the rcuo[ps] kthreads to monopolize the CPU for hundreds
	  of milliseconds or more. Therefore, when enabling this option,
	  it is your responsibility to ensure that latency-sensitive
	  tasks either run with higher priority or run on some other CPU.

	  Say Y here if you want to set RT priority for offloading kthreads.
	  Say N here if you are building a !PREEMPT_RT kernel and are unsure.

config TASKS_TRACE_RCU_READ_MB
	bool "Tasks Trace RCU readers use memory barriers in user and idle"
	depends on RCU_EXPERT && TASKS_TRACE_RCU
	default PREEMPT_RT || NR_CPUS < 8
	help
	  Use this option to further reduce the number of IPIs sent
	  to CPUs executing in userspace or idle during tasks trace
	  RCU grace periods. Given that a reasonable setting of
	  the rcupdate.rcu_task_ipi_delay kernel boot parameter
	  eliminates such IPIs for many workloads, proper setting
	  of this Kconfig option is important mostly for aggressive
	  real-time installations and for battery-powered devices,
	  hence the default chosen above.

	  Say Y here if you hate IPIs.
	  Say N here if you hate read-side memory barriers.
	  Take the default if you are unsure.

config RCU_LAZY
	bool "RCU callback lazy invocation functionality"
	depends on RCU_NOCB_CPU
	default n
	help
	  To save power, batch RCU callbacks and flush after delay, memory
	  pressure, or callback list growing too big.

config RCU_DOUBLE_CHECK_CB_TIME
	bool "RCU callback-batch backup time check"
	depends on RCU_EXPERT
	default n
	help
	  Use this option to provide more precise enforcement of the
	  rcutree.rcu_resched_ns module parameter in situations where
	  a single RCU callback might run for hundreds of microseconds,
	  thus defeating the 32-callback batching used to amortize the
	  cost of the fine-grained but expensive local_clock() function.

	  This option rounds rcutree.rcu_resched_ns up to the next
	  jiffy, and overrides the 32-callback batching if this limit
	  is exceeded.

	  Say Y here if you need tighter callback-limit enforcement.
	  Say N here if you are unsure.
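
# Illustrative note (not part of the upstream file): a possible .config
# fragment enabling the backup time check, given its RCU_EXPERT dependency.
#
#   CONFIG_RCU_EXPERT=y
#   CONFIG_RCU_DOUBLE_CHECK_CB_TIME=y
#
# The limit itself is still set at boot. For example, the three-millisecond
# default mentioned in the commit message would be written in nanoseconds as:
#
#   rcutree.rcu_resched_ns=3000000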

endmenu # "RCU Subsystem"