mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-01-04 04:04:19 +00:00
memory-barriers: Replace uses of "transitive"

The current version of memory-barriers.txt misuses the term "transitive",
so this commit replaces it with multi-copy atomic, also adding a
definition of this term.

Reported-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
parent d3cf5176d0
commit f1ab25a30c
@@ -53,7 +53,7 @@ CONTENTS
      - SMP barrier pairing.
      - Examples of memory barrier sequences.
      - Read memory barriers vs load speculation.
-     - Transitivity
+     - Multicopy atomicity.
 
  (*) Explicit kernel barriers.
 
@@ -635,6 +635,11 @@ can be used to record rare error conditions and the like, and the CPUs'
 naturally occurring ordering prevents such records from being lost.
 
 
+Note well that the ordering provided by a data dependency is local to
+the CPU containing it. See the section on "Multicopy atomicity" for
+more information.
+
+
 The data dependency barrier is very important to the RCU system,
 for example. See rcu_assign_pointer() and rcu_dereference() in
 include/linux/rcupdate.h. This permits the current target of an RCU'd
@@ -851,38 +856,11 @@ In short, control dependencies apply only to the stores in the then-clause
 and else-clause of the if-statement in question (including functions
 invoked by those two clauses), not to code following that if-statement.
 
-Finally, control dependencies do -not- provide transitivity. This is
-demonstrated by two related examples, with the initial values of
-'x' and 'y' both being zero:
 
-	CPU 0                     CPU 1
-	=======================   =======================
-	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
-	if (r1 > 0)               if (r2 > 0)
-	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);
+Note well that the ordering provided by a control dependency is local
+to the CPU containing it. See the section on "Multicopy atomicity"
+for more information.
 
-	assert(!(r1 == 1 && r2 == 1));
-
-The above two-CPU example will never trigger the assert(). However,
-if control dependencies guaranteed transitivity (which they do not),
-then adding the following CPU would guarantee a related assertion:
-
-	CPU 2
-	=====================
-	WRITE_ONCE(x, 2);
-
-	assert(!(r1 == 2 && r2 == 1 && x == 2)); /* FAILS!!! */
-
-But because control dependencies do -not- provide transitivity, the above
-assertion can fail after the combined three-CPU example completes. If you
-need the three-CPU example to provide ordering, you will need smp_mb()
-between the loads and stores in the CPU 0 and CPU 1 code fragments,
-that is, just before or just after the "if" statements. Furthermore,
-the original two-CPU example is very fragile and should be avoided.
-
-These two examples are the LB and WWC litmus tests from this paper:
-http://www.cl.cam.ac.uk/users/pes20/ppc-supplemental/test6.pdf and this
-site: https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html.
 
 In summary:
 
@@ -922,8 +900,8 @@ In summary:
 
   (*) Control dependencies pair normally with other types of barriers.
 
-  (*) Control dependencies do -not- provide transitivity. If you
-      need transitivity, use smp_mb().
+  (*) Control dependencies do -not- provide multicopy atomicity. If you
+      need all the CPUs to see a given store at the same time, use smp_mb().
 
   (*) Compilers do not understand control dependencies. It is therefore
       your job to ensure that they do not break your code.
@@ -936,13 +914,14 @@ When dealing with CPU-CPU interactions, certain types of memory barrier should
 always be paired. A lack of appropriate pairing is almost certainly an error.
 
 General barriers pair with each other, though they also pair with most
-other types of barriers, albeit without transitivity. An acquire barrier
-pairs with a release barrier, but both may also pair with other barriers,
-including of course general barriers. A write barrier pairs with a data
-dependency barrier, a control dependency, an acquire barrier, a release
-barrier, a read barrier, or a general barrier. Similarly a read barrier,
-control dependency, or a data dependency barrier pairs with a write
-barrier, an acquire barrier, a release barrier, or a general barrier:
+other types of barriers, albeit without multicopy atomicity. An acquire
+barrier pairs with a release barrier, but both may also pair with other
+barriers, including of course general barriers. A write barrier pairs
+with a data dependency barrier, a control dependency, an acquire barrier,
+a release barrier, a read barrier, or a general barrier. Similarly a
+read barrier, control dependency, or a data dependency barrier pairs
+with a write barrier, an acquire barrier, a release barrier, or a
+general barrier:
 
 	CPU 1                 CPU 2
 	===============       ===============
@@ -1359,64 +1338,77 @@ the speculation will be cancelled and the value reloaded:
 	retrieved : : +-------+
 
 
-TRANSITIVITY
-------------
+MULTICOPY ATOMICITY
+--------------------
 
-Transitivity is a deeply intuitive notion about ordering that is not
-always provided by real computer systems. The following example
-demonstrates transitivity:
+Multicopy atomicity is a deeply intuitive notion about ordering that is
+not always provided by real computer systems, namely that a given store
+is visible at the same time to all CPUs, or, alternatively, that all
+CPUs agree on the order in which all stores took place. However, use of
+full multicopy atomicity would rule out valuable hardware optimizations,
+so a weaker form called ``other multicopy atomicity'' instead guarantees
+that a given store is observed at the same time by all -other- CPUs. The
+remainder of this document discusses this weaker form, but for brevity
+will call it simply ``multicopy atomicity''.
+
+The following example demonstrates multicopy atomicity:
 
 	CPU 1                   CPU 2                   CPU 3
 	======================= ======================= =======================
 	        { X = 0, Y = 0 }
-	STORE X=1               LOAD X                  STORE Y=1
-	                        <general barrier>       <general barrier>
-	                        LOAD Y                  LOAD X
+	STORE X=1               r1=LOAD X (reads 1)     LOAD Y (reads 1)
+	                        <general barrier>       <read barrier>
+	                        STORE Y=r1              LOAD X
 
-Suppose that CPU 2's load from X returns 1 and its load from Y returns 0.
-This indicates that CPU 2's load from X in some sense follows CPU 1's
-store to X and that CPU 2's load from Y in some sense preceded CPU 3's
-store to Y. The question is then "Can CPU 3's load from X return 0?"
+Suppose that CPU 2's load from X returns 1 which it then stores to Y and
+that CPU 3's load from Y returns 1. This indicates that CPU 2's load
+from X in some sense follows CPU 1's store to X and that CPU 2's store
+to Y in some sense preceded CPU 3's load from Y. The question is then
+"Can CPU 3's load from X return 0?"
 
-Because CPU 2's load from X in some sense came after CPU 1's store, it
+Because CPU 3's load from X in some sense came after CPU 2's load, it
 is natural to expect that CPU 3's load from X must therefore return 1.
-This expectation is an example of transitivity: if a load executing on
-CPU A follows a load from the same variable executing on CPU B, then
-CPU A's load must either return the same value that CPU B's load did,
-or must return some later value.
+This expectation is an example of multicopy atomicity: if a load executing
+on CPU A follows a load from the same variable executing on CPU B, then
+an understandable but incorrect expectation is that CPU A's load must
+either return the same value that CPU B's load did, or must return some
+later value.
 
-In the Linux kernel, use of general memory barriers guarantees
-transitivity. Therefore, in the above example, if CPU 2's load from X
-returns 1 and its load from Y returns 0, then CPU 3's load from X must
-also return 1.
+In the Linux kernel, the above use of a general memory barrier compensates
+for any lack of multicopy atomicity. Therefore, in the above example,
+if CPU 2's load from X returns 1 and its load from Y returns 0, and CPU 3's
+load from Y returns 1, then CPU 3's load from X must also return 1.
 
-However, transitivity is -not- guaranteed for read or write barriers.
-For example, suppose that CPU 2's general barrier in the above example
-is changed to a read barrier as shown below:
+However, dependencies, read barriers, and write barriers are not always
+able to compensate for non-multicopy atomicity. For example, suppose
+that CPU 2's general barrier is removed from the above example, leaving
+only the data dependency shown below:
 
 	CPU 1                   CPU 2                   CPU 3
 	======================= ======================= =======================
 	        { X = 0, Y = 0 }
-	STORE X=1               LOAD X                  STORE Y=1
-	                        <read barrier>          <general barrier>
-	                        LOAD Y                  LOAD X
+	STORE X=1               r1=LOAD X (reads 1)     LOAD Y (reads 1)
+	                        <data dependency>       <read barrier>
+	                        STORE Y=r1              LOAD X (reads 0)
 
-This substitution destroys transitivity: in this example, it is perfectly
-legal for CPU 2's load from X to return 1, its load from Y to return 0,
-and CPU 3's load from X to return 0.
+This substitution allows non-multicopy atomicity to run rampant: in
+this example, it is perfectly legal for CPU 2's load from X to return 1,
+CPU 3's load from Y to return 1, and its load from X to return 0.
 
-The key point is that although CPU 2's read barrier orders its pair
-of loads, it does not guarantee to order CPU 1's store. Therefore, if
-this example runs on a system where CPUs 1 and 2 share a store buffer
-or a level of cache, CPU 2 might have early access to CPU 1's writes.
-General barriers are therefore required to ensure that all CPUs agree
-on the combined order of CPU 1's and CPU 2's accesses.
+The key point is that although CPU 2's data dependency orders its load
+and store, it does not guarantee to order CPU 1's store. Therefore,
+if this example runs on a non-multicopy-atomic system where CPUs 1 and 2
+share a store buffer or a level of cache, CPU 2 might have early access
+to CPU 1's writes. A general barrier is therefore required to ensure
+that all CPUs agree on the combined order of CPU 1's and CPU 2's accesses.
 
-General barriers provide "global transitivity", so that all CPUs will
-agree on the order of operations. In contrast, a chain of release-acquire
-pairs provides only "local transitivity", so that only those CPUs on
-the chain are guaranteed to agree on the combined order of the accesses.
-For example, switching to C code in deference to Herman Hollerith:
+General barriers can compensate not only for non-multicopy atomicity,
+but can also generate additional ordering that can ensure that -all-
+CPUs will perceive the same order of -all- operations. In contrast, a
+chain of release-acquire pairs do not provide this additional ordering,
+which means that only those CPUs on the chain are guaranteed to agree
+on the combined order of the accesses. For example, switching to C code
+in deference to the ghost of Herman Hollerith:
 
 	int u, v, x, y, z;
 
@@ -1448,9 +1440,9 @@ For example, switching to C code in deference to Herman Hollerith:
 		r3 = READ_ONCE(u);
 	}
 
-Because cpu0(), cpu1(), and cpu2() participate in a local transitive
-chain of smp_store_release()/smp_load_acquire() pairs, the following
-outcome is prohibited:
+Because cpu0(), cpu1(), and cpu2() participate in a chain of
+smp_store_release()/smp_load_acquire() pairs, the following outcome
+is prohibited:
 
 	r0 == 1 && r1 == 1 && r2 == 1
 
@@ -1460,9 +1452,9 @@ outcome is prohibited:
 
 	r1 == 1 && r5 == 0
 
-However, the transitivity of release-acquire is local to the participating
-CPUs and does not apply to cpu3(). Therefore, the following outcome
-is possible:
+However, the ordering provided by a release-acquire chain is local
+to the CPUs participating in that chain and does not apply to cpu3(),
+at least aside from stores. Therefore, the following outcome is possible:
 
 	r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0
 
@@ -1490,8 +1482,8 @@ following outcome is possible:
 Note that this outcome can happen even on a mythical sequentially
 consistent system where nothing is ever reordered.
 
-To reiterate, if your code requires global transitivity, use general
-barriers throughout.
+To reiterate, if your code requires full ordering of all operations,
+use general barriers throughout.
 
 
 ========================
@@ -3101,6 +3093,9 @@ AMD64 Architecture Programmer's Manual Volume 2: System Programming
 	Chapter 7.1: Memory-Access Ordering
 	Chapter 7.4: Buffering and Combining Memory Writes
 
+ARM Architecture Reference Manual (ARMv8, for ARMv8-A architecture profile)
+	Chapter B2: The AArch64 Application Level Memory Model
+
 IA-32 Intel Architecture Software Developer's Manual, Volume 3:
 System Programming Guide
 	Chapter 7.1: Locked Atomic Operations
@@ -3112,6 +3107,8 @@ The SPARC Architecture Manual, Version 9
 	Appendix D: Formal Specification of the Memory Models
 	Appendix J: Programming with the Memory Models
 
+Storage in the PowerPC (Stone and Fitzgerald)
+
 UltraSPARC Programmer Reference Manual
 	Chapter 5: Memory Accesses and Cacheability
 	Chapter 15: Sparc-V9 Memory Models