.. SPDX-License-Identifier: GPL-2.0

Speculative Return Stack Overflow (SRSO)
========================================

This is a mitigation for the speculative return stack overflow (SRSO)
vulnerability found on AMD processors. The mechanism is the by now
well-known scenario of poisoning CPU functional units - the Branch Target
Buffer (BTB) and Return Address Predictor (RAP) in this case - and then
tricking the elevated privilege domain (the kernel) into leaking
sensitive data.

AMD CPUs predict RET instructions using a Return Address Predictor (aka
Return Address Stack/Return Stack Buffer). In some cases, a non-architectural
CALL instruction (i.e., an instruction predicted to be a CALL but which is
not actually a CALL) can create an entry in the RAP which may be used
to predict the target of a subsequent RET instruction.

The specific circumstances that lead to this vary by microarchitecture
but the concern is that an attacker can mis-train the CPU BTB to predict
non-architectural CALL instructions in kernel space and use this to
control the speculative target of a subsequent kernel RET, potentially
leading to information disclosure via a speculative side-channel.

The issue is tracked under CVE-2023-20569.

Affected processors
-------------------

AMD Zen, generations 1-4. That is, all families 0x17 and 0x19. Older
processors have not been investigated.

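As a rough illustration of the family check described above, the following
sketch inspects text in the format of /proc/cpuinfo. The function name and
the AFFECTED_FAMILIES set are made up for this example and are not kernel
interfaces; the sysfs file described in the next section remains the
authoritative source of information.

```python
import re

# Families named in the text above (Zen 1-4). Hypothetical helper data,
# not taken from kernel code.
AFFECTED_FAMILIES = {0x17, 0x19}


def is_srso_affected_family(cpuinfo_text: str) -> bool:
    """Return True if the /proc/cpuinfo text describes an AMD 0x17/0x19 CPU."""
    vendor = re.search(r"^vendor_id\s*:\s*(\S+)", cpuinfo_text, re.MULTILINE)
    # /proc/cpuinfo prints the family as a decimal number (23 == 0x17).
    family = re.search(r"^cpu family\s*:\s*(\d+)", cpuinfo_text, re.MULTILINE)
    if not vendor or not family:
        return False
    return (vendor.group(1) == "AuthenticAMD"
            and int(family.group(1)) in AFFECTED_FAMILIES)
```

One would pass it the contents of /proc/cpuinfo, for example
``is_srso_affected_family(open("/proc/cpuinfo").read())``.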
System information and options
------------------------------

First of all, it is required that the latest microcode be loaded for
mitigations to be effective.

The sysfs file showing SRSO mitigation status is:

  /sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow

The possible values in this file are:

 * 'Not affected':

   The processor is not vulnerable.

 * 'Vulnerable':

   The processor is vulnerable and no mitigations have been applied.

 * 'Vulnerable: No microcode':

   The processor is vulnerable and no microcode extending IBPB
   functionality to address the vulnerability has been applied.

 * 'Vulnerable: Safe RET, no microcode':

   The "Safe RET" mitigation (see below) has been applied to protect the
   kernel, but the IBPB-extending microcode has not been applied. User
   space tasks may still be vulnerable.

 * 'Vulnerable: Microcode, no safe RET':

   The extended IBPB functionality microcode patch has been applied. It
   does not protect User->Kernel and Guest->Host transitions but it
   does address the User->User and VM->VM attack vectors.

   Note that User->User mitigation is controlled by how the IBPB aspect in
   the Spectre v2 mitigation is selected:

    * conditional IBPB:

      where each process can select whether it needs an IBPB issued
      around it (PR_SPEC_DISABLE/_ENABLE etc.), see :doc:`spectre`

    * strict:

      i.e., always on - by supplying spectre_v2_user=on on the kernel
      command line

   (spec_rstack_overflow=microcode)

 * 'Mitigation: Safe RET':

   Combined microcode/software mitigation. It complements the
   extended IBPB microcode patch functionality by protecting
   User->Kernel and Guest->Host transitions.

   Selected by default or by spec_rstack_overflow=safe-ret

 * 'Mitigation: IBPB':

   Similar protection as "safe RET" above but employs an IBPB barrier on
   privilege domain crossings (User->Kernel, Guest->Host).

   (spec_rstack_overflow=ibpb)

 * 'Mitigation: IBPB on VMEXIT':

   Mitigation addressing the cloud provider scenario - the Guest->Host
   transitions only.

   (spec_rstack_overflow=ibpb-vmexit)

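The status strings listed above can be sorted into coarse classes by their
prefix. The following is only a sketch: the helper names and the class
labels are invented for this example. It matches on prefixes rather than
whole strings, on the assumption that a given kernel may word the rest of
the line slightly differently.

```python
from pathlib import Path

# Path taken from the text above.
SRSO_SYSFS = Path("/sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow")


def classify_srso_status(status: str) -> str:
    """Map a status line to 'not-affected', 'mitigated', 'vulnerable' or 'unknown'."""
    s = status.strip()
    if s == "Not affected":
        return "not-affected"
    if s.startswith("Mitigation:"):
        return "mitigated"
    # Covers plain 'Vulnerable' as well as the 'Vulnerable: ...' variants,
    # including the partially protected ones.
    if s.startswith("Vulnerable"):
        return "vulnerable"
    return "unknown"


def read_srso_status(path: Path = SRSO_SYSFS):
    """Return the stripped file contents, or None if the file cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None
```

Note that 'Vulnerable: Microcode, no safe RET' and 'Vulnerable: Safe RET, no
microcode' both land in the "vulnerable" class here even though each closes
some attack vectors; a finer-grained check would have to compare the full
string.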
In order to exploit the vulnerability, an attacker needs to:

 - gain local access on the machine

 - break kASLR

 - find gadgets in the running kernel in order to use them in the exploit

 - potentially create and pin an additional workload on the sibling
   thread, depending on the microarchitecture (not necessary on fam 0x19)

 - run the exploit

Considering the performance implications of each mitigation type, the
default one is 'Mitigation: Safe RET' which should take care of most
attack vectors, including the local User->Kernel one.

As always, users are advised to keep their systems up-to-date by
applying software updates regularly.

The default setting will be reevaluated when needed and especially when
new attack vectors appear.

As one can surmise, 'Mitigation: Safe RET' does come at the cost of some
performance depending on the workload. If one trusts one's userspace
and does not want to suffer the performance impact, one can always
disable the mitigation with spec_rstack_overflow=off.

Similarly, 'Mitigation: IBPB' is another full mitigation type employing
an indirect branch prediction barrier after having applied the required
microcode patch for one's system. This mitigation also comes at
a performance cost.

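To relate the command line options mentioned above to the sysfs status
strings, one could use a small lookup like the sketch below. The helper
names are hypothetical. The table contains only the pairs spelled out in
this document; spec_rstack_overflow=off is left out because the text does
not state which status line it produces. Treating the last occurrence of
the option as the effective one is an assumption of this example, and the
expected string presumes the required microcode is loaded.

```python
# Option value -> status string, as paired in the list of sysfs values.
EXPECTED_STATUS = {
    "microcode": "Vulnerable: Microcode, no safe RET",
    "safe-ret": "Mitigation: Safe RET",
    "ibpb": "Mitigation: IBPB",
    "ibpb-vmexit": "Mitigation: IBPB on VMEXIT",
}


def srso_cmdline_option(cmdline: str):
    """Return the value of spec_rstack_overflow= in cmdline, or None if absent."""
    value = None
    for token in cmdline.split():
        if token.startswith("spec_rstack_overflow="):
            # Assumption: a later occurrence overrides an earlier one.
            value = token.split("=", 1)[1]
    return value


def expected_status(option):
    """Return the status string expected for an option, or None if not listed."""
    if option is None:
        # Safe RET is selected by default according to the text.
        option = "safe-ret"
    return EXPECTED_STATUS.get(option)
```

For example, ``expected_status(srso_cmdline_option(open("/proc/cmdline").read()))``
gives the string one would hope to find in the sysfs file on a system with
up-to-date microcode.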
Mitigation: Safe RET
--------------------

The mitigation works by ensuring all RET instructions speculate to
a controlled location, similar to how speculation is controlled in the
retpoline sequence. To accomplish this, the __x86_return_thunk forces
the CPU to mispredict every function return using a 'safe return'
sequence.

To ensure the safety of this mitigation, the kernel must ensure that the
safe return sequence is itself free from attacker interference. In Zen3
and Zen4, this is accomplished by creating a BTB alias between the
untraining function srso_alias_untrain_ret() and the safe return
function srso_alias_safe_ret() which results in evicting a potentially
poisoned BTB entry and using that safe one for all function returns.

In older Zen1 and Zen2, this is accomplished using a reinterpretation
technique similar to the Retbleed one: srso_untrain_ret() and
srso_safe_ret().

Checking the safe RET mitigation actually works
-----------------------------------------------

In case one wants to validate whether the SRSO safe RET mitigation works
on a kernel, one could use two performance counters

* PMC_0xc8 - Count of RET/RET lw retired
* PMC_0xc9 - Count of RET/RET lw retired mispredicted

and compare the number of RETs retired properly vs those retired
mispredicted, in kernel mode. Another way of specifying those events
is::

  # perf list ex_ret_near_ret

  List of pre-defined events (to be used in -e or -M):

  core:
    ex_ret_near_ret
         [Retired Near Returns]
    ex_ret_near_ret_mispred
         [Retired Near Returns Mispredicted]

Either the command using the event mnemonics::

  # perf stat -e ex_ret_near_ret:k -e ex_ret_near_ret_mispred:k sleep 10s

or using the raw PMC numbers::

  # perf stat -e cpu/event=0xc8,umask=0/k -e cpu/event=0xc9,umask=0/k sleep 10s

should give the same amount. I.e., every RET retired should be
mispredicted::

  [root@brent: ~/kernel/linux/tools/perf> ./perf stat -e cpu/event=0xc8,umask=0/k -e cpu/event=0xc9,umask=0/k sleep 10s

   Performance counter stats for 'sleep 10s':

             137,167      cpu/event=0xc8,umask=0/k
             137,173      cpu/event=0xc9,umask=0/k

        10.004110303 seconds time elapsed

         0.000000000 seconds user
         0.004462000 seconds sys

vs the case when the mitigation is disabled (spec_rstack_overflow=off)
or not functioning properly, which usually shows a much smaller number of
mispredicted retired RETs vs the overall count of retired RETs during
a workload::

  [root@brent: ~/kernel/linux/tools/perf> ./perf stat -e cpu/event=0xc8,umask=0/k -e cpu/event=0xc9,umask=0/k sleep 10s

   Performance counter stats for 'sleep 10s':

             201,627      cpu/event=0xc8,umask=0/k
               4,074      cpu/event=0xc9,umask=0/k

        10.003267252 seconds time elapsed

         0.002729000 seconds user
         0.000000000 seconds sys

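The comparison of the two counters can also be scripted. The sketch below
parses text in the format of the perf stat output shown above and returns
the ratio of mispredicted to retired RETs. The function names are made up
for this example, and the parser assumes the counter lines look like the
ones above (a count with optional thousands separators followed by the
event name).

```python
import re

# A counter line: leading spaces, a count such as 137,167, then the event name.
COUNTER_LINE = re.compile(r"^\s*([\d,]+)\s+(\S+)")


def parse_perf_counts(output: str) -> dict:
    """Return {event_name: count} for every counter line in perf stat output."""
    counts = {}
    for line in output.splitlines():
        m = COUNTER_LINE.match(line)
        if m:
            counts[m.group(2)] = int(m.group(1).replace(",", ""))
    return counts


def mispredict_ratio(output: str, total_event: str, mispred_event: str) -> float:
    """Return mispredicted RETs divided by retired RETs."""
    counts = parse_perf_counts(output)
    total = counts[total_event]
    if total == 0:
        raise ValueError("no retired RETs were counted")
    return counts[mispred_event] / total
```

With the first sample output above the ratio is very close to 1, while the
second one yields roughly 0.02. Keep in mind that perf stat normally writes
its summary to stderr, so that stream is the one to capture when feeding
the output to such a script.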
Also, there is a selftest which performs the above. Go to
tools/testing/selftests/x86/ and do::

  make srso
  ./srso