2019-06-04 10:11:33 +02:00
|
|
|
// SPDX-License-Identifier: GPL-2.0-only
|
2015-11-24 12:37:35 +01:00
|
|
|
/*
|
arm64: module: split core and init PLT sections
The arm64 module PLT code allocates all PLT entries in a single core
section, since the overhead of having a separate init PLT section is
not justified by the small number of PLT entries usually required for
init code.
However, the core and init module regions are allocated independently,
and there is a corner case where the core region may be allocated from
the VMALLOC region if the dedicated module region is exhausted, but the
init region, being much smaller, can still be allocated from the module
region. This leads to relocation failures if the distance between those
regions exceeds 128 MB. (In fact, this corner case is highly unlikely to
occur on arm64, but the issue has been observed on ARM, whose module
region is much smaller).
So split the core and init PLT regions, and name the latter ".init.plt"
so it gets allocated along with (and sufficiently close to) the .init
sections that it serves. Also, given that init PLT entries may need to
be emitted for branches that target the core module, modify the logic
that disregards defined symbols to only disregard symbols that are
defined in the same section as the relocated branch instruction.
Since there may now be two PLT entries associated with each entry in
the symbol table, we can no longer hijack the symbol::st_size fields
to record the addresses of PLT entries as we emit them for zero-addend
relocations. So instead, perform an explicit comparison to check for
duplicate entries.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-02-21 22:12:57 +00:00
|
|
|
* Copyright (C) 2014-2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
|
2015-11-24 12:37:35 +01:00
|
|
|
*/
|
|
|
|
|
|
|
|
#include <linux/elf.h>
|
arm64: implement ftrace with regs
This patch implements FTRACE_WITH_REGS for arm64, which allows a traced
function's arguments (and some other registers) to be captured into a
struct pt_regs, allowing these to be inspected and/or modified. This is
a building block for live-patching, where a function's arguments may be
forwarded to another function. This is also necessary to enable ftrace
and in-kernel pointer authentication at the same time, as it allows the
LR value to be captured and adjusted prior to signing.
Using GCC's -fpatchable-function-entry=N option, we can have the
compiler insert a configurable number of NOPs between the function entry
point and the usual prologue. This also ensures functions are AAPCS
compliant (e.g. disabling inter-procedural register allocation).
For example, with -fpatchable-function-entry=2, GCC 8.1.0 compiles the
following:
| unsigned long bar(void);
|
| unsigned long foo(void)
| {
| return bar() + 1;
| }
... to:
| <foo>:
| nop
| nop
| stp x29, x30, [sp, #-16]!
| mov x29, sp
| bl 0 <bar>
| add x0, x0, #0x1
| ldp x29, x30, [sp], #16
| ret
This patch builds the kernel with -fpatchable-function-entry=2,
prefixing each function with two NOPs. To trace a function, we replace
these NOPs with a sequence that saves the LR into a GPR, then calls an
ftrace entry assembly function which saves this and other relevant
registers:
| mov x9, x30
| bl <ftrace-entry>
Since patchable functions are AAPCS compliant (and the kernel does not
use x18 as a platform register), x9-x18 can be safely clobbered in the
patched sequence and the ftrace entry code.
There are now two ftrace entry functions, ftrace_regs_entry (which saves
all GPRs), and ftrace_entry (which saves the bare minimum). A PLT is
allocated for each within modules.
Signed-off-by: Torsten Duwe <duwe@suse.de>
[Mark: rework asm, comments, PLTs, initialization, commit message]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Torsten Duwe <duwe@suse.de>
Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Tested-by: Torsten Duwe <duwe@suse.de>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Julien Thierry <jthierry@redhat.com>
Cc: Will Deacon <will@kernel.org>
2019-02-08 16:10:19 +01:00
|
|
|
#include <linux/ftrace.h>
|
2015-11-24 12:37:35 +01:00
|
|
|
#include <linux/kernel.h>
|
|
|
|
#include <linux/module.h>
|
2023-05-16 18:06:37 +02:00
|
|
|
#include <linux/moduleloader.h>
|
2015-11-24 12:37:35 +01:00
|
|
|
#include <linux/sort.h>
|
|
|
|
|
2018-11-22 09:46:46 +01:00
|
|
|
static struct plt_entry __get_adrp_add_pair(u64 dst, u64 pc,
|
|
|
|
enum aarch64_insn_register reg)
|
|
|
|
{
|
|
|
|
u32 adrp, add;
|
|
|
|
|
|
|
|
adrp = aarch64_insn_gen_adr(pc, dst, reg, AARCH64_INSN_ADR_TYPE_ADRP);
|
|
|
|
add = aarch64_insn_gen_add_sub_imm(reg, reg, dst % SZ_4K,
|
|
|
|
AARCH64_INSN_VARIANT_64BIT,
|
|
|
|
AARCH64_INSN_ADSB_ADD);
|
|
|
|
|
|
|
|
return (struct plt_entry){ cpu_to_le32(adrp), cpu_to_le32(add) };
|
|
|
|
}
|
|
|
|
|
|
|
|
struct plt_entry get_plt_entry(u64 dst, void *pc)
|
|
|
|
{
|
|
|
|
struct plt_entry plt;
|
|
|
|
static u32 br;
|
|
|
|
|
|
|
|
if (!br)
|
|
|
|
br = aarch64_insn_gen_branch_reg(AARCH64_INSN_REG_16,
|
|
|
|
AARCH64_INSN_BRANCH_NOLINK);
|
|
|
|
|
|
|
|
plt = __get_adrp_add_pair(dst, (u64)pc, AARCH64_INSN_REG_16);
|
|
|
|
plt.br = cpu_to_le32(br);
|
|
|
|
|
|
|
|
return plt;
|
|
|
|
}
|
|
|
|
|
2022-09-29 17:41:32 +08:00
|
|
|
static bool plt_entries_equal(const struct plt_entry *a,
|
|
|
|
const struct plt_entry *b)
|
2018-11-22 09:46:46 +01:00
|
|
|
{
|
|
|
|
u64 p, q;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Check whether both entries refer to the same target:
|
|
|
|
* do the cheapest checks first.
|
|
|
|
* If the 'add' or 'br' opcodes are different, then the target
|
|
|
|
* cannot be the same.
|
|
|
|
*/
|
|
|
|
if (a->add != b->add || a->br != b->br)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
p = ALIGN_DOWN((u64)a, SZ_4K);
|
|
|
|
q = ALIGN_DOWN((u64)b, SZ_4K);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If the 'adrp' opcodes are the same then we just need to check
|
|
|
|
* that they refer to the same 4k region.
|
|
|
|
*/
|
|
|
|
if (a->adrp == b->adrp && p == q)
|
|
|
|
return true;
|
|
|
|
|
|
|
|
return (p + aarch64_insn_adrp_get_offset(le32_to_cpu(a->adrp))) ==
|
|
|
|
(q + aarch64_insn_adrp_get_offset(le32_to_cpu(b->adrp)));
|
|
|
|
}
|
|
|
|
|
2018-11-05 19:53:23 +01:00
|
|
|
u64 module_emit_plt_entry(struct module *mod, Elf64_Shdr *sechdrs,
|
|
|
|
void *loc, const Elf64_Rela *rela,
|
2015-11-24 12:37:35 +01:00
|
|
|
Elf64_Sym *sym)
|
|
|
|
{
|
module: replace module_layout with module_memory
module_layout manages different types of memory (text, data, rodata, etc.)
in one allocation, which is problematic for some reasons:
1. It is hard to enable CONFIG_STRICT_MODULE_RWX.
2. It is hard to use huge pages in modules (and not break strict rwx).
3. Many archs uses module_layout for arch-specific data, but it is not
obvious how these data are used (are they RO, RX, or RW?)
Improve the scenario by replacing 2 (or 3) module_layout per module with
up to 7 module_memory per module:
MOD_TEXT,
MOD_DATA,
MOD_RODATA,
MOD_RO_AFTER_INIT,
MOD_INIT_TEXT,
MOD_INIT_DATA,
MOD_INIT_RODATA,
and allocating them separately. This adds slightly more entries to
mod_tree (from up to 3 entries per module, to up to 7 entries per
module). However, this at most adds a small constant overhead to
__module_address(), which is expected to be fast.
Various archs use module_layout for different data. These data are put
into different module_memory based on their location in module_layout.
IOW, data that used to go with text is allocated with MOD_MEM_TYPE_TEXT;
data that used to go with data is allocated with MOD_MEM_TYPE_DATA, etc.
module_memory simplifies quite some of the module code. For example,
ARCH_WANTS_MODULES_DATA_IN_VMALLOC is a lot cleaner, as it just uses a
different allocator for the data. kernel/module/strict_rwx.c is also
much cleaner with module_memory.
Signed-off-by: Song Liu <song@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-02-06 16:28:02 -08:00
|
|
|
struct mod_plt_sec *pltsec = !within_module_init((unsigned long)loc, mod) ?
|
|
|
|
&mod->arch.core : &mod->arch.init;
|
2018-11-05 19:53:23 +01:00
|
|
|
struct plt_entry *plt = (struct plt_entry *)sechdrs[pltsec->plt_shndx].sh_addr;
|
arm64: module: split core and init PLT sections
The arm64 module PLT code allocates all PLT entries in a single core
section, since the overhead of having a separate init PLT section is
not justified by the small number of PLT entries usually required for
init code.
However, the core and init module regions are allocated independently,
and there is a corner case where the core region may be allocated from
the VMALLOC region if the dedicated module region is exhausted, but the
init region, being much smaller, can still be allocated from the module
region. This leads to relocation failures if the distance between those
regions exceeds 128 MB. (In fact, this corner case is highly unlikely to
occur on arm64, but the issue has been observed on ARM, whose module
region is much smaller).
So split the core and init PLT regions, and name the latter ".init.plt"
so it gets allocated along with (and sufficiently close to) the .init
sections that it serves. Also, given that init PLT entries may need to
be emitted for branches that target the core module, modify the logic
that disregards defined symbols to only disregard symbols that are
defined in the same section as the relocated branch instruction.
Since there may now be two PLT entries associated with each entry in
the symbol table, we can no longer hijack the symbol::st_size fields
to record the addresses of PLT entries as we emit them for zero-addend
relocations. So instead, perform an explicit comparison to check for
duplicate entries.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-02-21 22:12:57 +00:00
|
|
|
int i = pltsec->plt_num_entries;
|
2018-11-22 09:46:46 +01:00
|
|
|
int j = i - 1;
|
2015-11-24 12:37:35 +01:00
|
|
|
u64 val = sym->st_value + rela->r_addend;
|
|
|
|
|
2018-11-22 09:46:46 +01:00
|
|
|
if (is_forbidden_offset_for_adrp(&plt[i].adrp))
|
|
|
|
i++;
|
|
|
|
|
|
|
|
plt[i] = get_plt_entry(val, &plt[i]);
|
2015-11-24 12:37:35 +01:00
|
|
|
|
arm64: module: split core and init PLT sections
The arm64 module PLT code allocates all PLT entries in a single core
section, since the overhead of having a separate init PLT section is
not justified by the small number of PLT entries usually required for
init code.
However, the core and init module regions are allocated independently,
and there is a corner case where the core region may be allocated from
the VMALLOC region if the dedicated module region is exhausted, but the
init region, being much smaller, can still be allocated from the module
region. This leads to relocation failures if the distance between those
regions exceeds 128 MB. (In fact, this corner case is highly unlikely to
occur on arm64, but the issue has been observed on ARM, whose module
region is much smaller).
So split the core and init PLT regions, and name the latter ".init.plt"
so it gets allocated along with (and sufficiently close to) the .init
sections that it serves. Also, given that init PLT entries may need to
be emitted for branches that target the core module, modify the logic
that disregards defined symbols to only disregard symbols that are
defined in the same section as the relocated branch instruction.
Since there may now be two PLT entries associated with each entry in
the symbol table, we can no longer hijack the symbol::st_size fields
to record the addresses of PLT entries as we emit them for zero-addend
relocations. So instead, perform an explicit comparison to check for
duplicate entries.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-02-21 22:12:57 +00:00
|
|
|
/*
|
|
|
|
* Check if the entry we just created is a duplicate. Given that the
|
|
|
|
* relocations are sorted, this will be the last entry we allocated.
|
|
|
|
* (if one exists).
|
|
|
|
*/
|
2018-11-22 09:46:46 +01:00
|
|
|
if (j >= 0 && plt_entries_equal(plt + i, plt + j))
|
|
|
|
return (u64)&plt[j];
|
arm64: module: split core and init PLT sections
The arm64 module PLT code allocates all PLT entries in a single core
section, since the overhead of having a separate init PLT section is
not justified by the small number of PLT entries usually required for
init code.
However, the core and init module regions are allocated independently,
and there is a corner case where the core region may be allocated from
the VMALLOC region if the dedicated module region is exhausted, but the
init region, being much smaller, can still be allocated from the module
region. This leads to relocation failures if the distance between those
regions exceeds 128 MB. (In fact, this corner case is highly unlikely to
occur on arm64, but the issue has been observed on ARM, whose module
region is much smaller).
So split the core and init PLT regions, and name the latter ".init.plt"
so it gets allocated along with (and sufficiently close to) the .init
sections that it serves. Also, given that init PLT entries may need to
be emitted for branches that target the core module, modify the logic
that disregards defined symbols to only disregard symbols that are
defined in the same section as the relocated branch instruction.
Since there may now be two PLT entries associated with each entry in
the symbol table, we can no longer hijack the symbol::st_size fields
to record the addresses of PLT entries as we emit them for zero-addend
relocations. So instead, perform an explicit comparison to check for
duplicate entries.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-02-21 22:12:57 +00:00
|
|
|
|
2018-11-22 09:46:46 +01:00
|
|
|
pltsec->plt_num_entries += i - j;
|
2018-03-06 17:15:31 +00:00
|
|
|
if (WARN_ON(pltsec->plt_num_entries > pltsec->plt_max_entries))
|
|
|
|
return 0;
|
2015-11-24 12:37:35 +01:00
|
|
|
|
|
|
|
return (u64)&plt[i];
|
|
|
|
}
|
|
|
|
|
arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419
Working around Cortex-A53 erratum #843419 involves special handling of
ADRP instructions that end up in the last two instruction slots of a
4k page, or whose output register gets overwritten without having been
read. (Note that the latter instruction sequence is never emitted by
a properly functioning compiler, which is why it is disregarded by the
handling of the same erratum in the bfd.ld linker which we rely on for
the core kernel)
Normally, this gets taken care of by the linker, which can spot such
sequences at final link time, and insert a veneer if the ADRP ends up
at a vulnerable offset. However, linux kernel modules are partially
linked ELF objects, and so there is no 'final link time' other than the
runtime loading of the module, at which time all the static relocations
are resolved.
For this reason, we have implemented the #843419 workaround for modules
by avoiding ADRP instructions altogether, by using the large C model,
and by passing -mpc-relative-literal-loads to recent versions of GCC
that may emit adrp/ldr pairs to perform literal loads. However, this
workaround forces us to keep literal data mixed with the instructions
in the executable .text segment, and literal data may inadvertently
turn into an exploitable speculative gadget depending on the relative
offsets of arbitrary symbols.
So let's reimplement this workaround in a way that allows us to switch
back to the small C model, and to drop the -mpc-relative-literal-loads
GCC switch, by patching affected ADRP instructions at runtime:
- ADRP instructions that do not appear at 4k relative offset 0xff8 or
0xffc are ignored
- ADRP instructions that are within 1 MB of their target symbol are
converted into ADR instructions
- remaining ADRP instructions are redirected via a veneer that performs
the load using an unaffected movn/movk sequence.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: tidied up ADRP -> ADR instruction patching.]
[will: use ULL suffix for 64-bit immediate]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-06 17:15:33 +00:00
|
|
|
#ifdef CONFIG_ARM64_ERRATUM_843419
|
2018-11-05 19:53:23 +01:00
|
|
|
u64 module_emit_veneer_for_adrp(struct module *mod, Elf64_Shdr *sechdrs,
|
|
|
|
void *loc, u64 val)
|
arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419
Working around Cortex-A53 erratum #843419 involves special handling of
ADRP instructions that end up in the last two instruction slots of a
4k page, or whose output register gets overwritten without having been
read. (Note that the latter instruction sequence is never emitted by
a properly functioning compiler, which is why it is disregarded by the
handling of the same erratum in the bfd.ld linker which we rely on for
the core kernel)
Normally, this gets taken care of by the linker, which can spot such
sequences at final link time, and insert a veneer if the ADRP ends up
at a vulnerable offset. However, linux kernel modules are partially
linked ELF objects, and so there is no 'final link time' other than the
runtime loading of the module, at which time all the static relocations
are resolved.
For this reason, we have implemented the #843419 workaround for modules
by avoiding ADRP instructions altogether, by using the large C model,
and by passing -mpc-relative-literal-loads to recent versions of GCC
that may emit adrp/ldr pairs to perform literal loads. However, this
workaround forces us to keep literal data mixed with the instructions
in the executable .text segment, and literal data may inadvertently
turn into an exploitable speculative gadget depending on the relative
offsets of arbitrary symbols.
So let's reimplement this workaround in a way that allows us to switch
back to the small C model, and to drop the -mpc-relative-literal-loads
GCC switch, by patching affected ADRP instructions at runtime:
- ADRP instructions that do not appear at 4k relative offset 0xff8 or
0xffc are ignored
- ADRP instructions that are within 1 MB of their target symbol are
converted into ADR instructions
- remaining ADRP instructions are redirected via a veneer that performs
the load using an unaffected movn/movk sequence.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: tidied up ADRP -> ADR instruction patching.]
[will: use ULL suffix for 64-bit immediate]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-06 17:15:33 +00:00
|
|
|
{
|
module: replace module_layout with module_memory
module_layout manages different types of memory (text, data, rodata, etc.)
in one allocation, which is problematic for some reasons:
1. It is hard to enable CONFIG_STRICT_MODULE_RWX.
2. It is hard to use huge pages in modules (and not break strict rwx).
3. Many archs uses module_layout for arch-specific data, but it is not
obvious how these data are used (are they RO, RX, or RW?)
Improve the scenario by replacing 2 (or 3) module_layout per module with
up to 7 module_memory per module:
MOD_TEXT,
MOD_DATA,
MOD_RODATA,
MOD_RO_AFTER_INIT,
MOD_INIT_TEXT,
MOD_INIT_DATA,
MOD_INIT_RODATA,
and allocating them separately. This adds slightly more entries to
mod_tree (from up to 3 entries per module, to up to 7 entries per
module). However, this at most adds a small constant overhead to
__module_address(), which is expected to be fast.
Various archs use module_layout for different data. These data are put
into different module_memory based on their location in module_layout.
IOW, data that used to go with text is allocated with MOD_MEM_TYPE_TEXT;
data that used to go with data is allocated with MOD_MEM_TYPE_DATA, etc.
module_memory simplifies quite some of the module code. For example,
ARCH_WANTS_MODULES_DATA_IN_VMALLOC is a lot cleaner, as it just uses a
different allocator for the data. kernel/module/strict_rwx.c is also
much cleaner with module_memory.
Signed-off-by: Song Liu <song@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-02-06 16:28:02 -08:00
|
|
|
struct mod_plt_sec *pltsec = !within_module_init((unsigned long)loc, mod) ?
|
|
|
|
&mod->arch.core : &mod->arch.init;
|
2018-11-05 19:53:23 +01:00
|
|
|
struct plt_entry *plt = (struct plt_entry *)sechdrs[pltsec->plt_shndx].sh_addr;
|
arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419
Working around Cortex-A53 erratum #843419 involves special handling of
ADRP instructions that end up in the last two instruction slots of a
4k page, or whose output register gets overwritten without having been
read. (Note that the latter instruction sequence is never emitted by
a properly functioning compiler, which is why it is disregarded by the
handling of the same erratum in the bfd.ld linker which we rely on for
the core kernel)
Normally, this gets taken care of by the linker, which can spot such
sequences at final link time, and insert a veneer if the ADRP ends up
at a vulnerable offset. However, linux kernel modules are partially
linked ELF objects, and so there is no 'final link time' other than the
runtime loading of the module, at which time all the static relocations
are resolved.
For this reason, we have implemented the #843419 workaround for modules
by avoiding ADRP instructions altogether, by using the large C model,
and by passing -mpc-relative-literal-loads to recent versions of GCC
that may emit adrp/ldr pairs to perform literal loads. However, this
workaround forces us to keep literal data mixed with the instructions
in the executable .text segment, and literal data may inadvertently
turn into an exploitable speculative gadget depending on the relative
offsets of arbitrary symbols.
So let's reimplement this workaround in a way that allows us to switch
back to the small C model, and to drop the -mpc-relative-literal-loads
GCC switch, by patching affected ADRP instructions at runtime:
- ADRP instructions that do not appear at 4k relative offset 0xff8 or
0xffc are ignored
- ADRP instructions that are within 1 MB of their target symbol are
converted into ADR instructions
- remaining ADRP instructions are redirected via a veneer that performs
the load using an unaffected movn/movk sequence.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: tidied up ADRP -> ADR instruction patching.]
[will: use ULL suffix for 64-bit immediate]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-06 17:15:33 +00:00
|
|
|
int i = pltsec->plt_num_entries++;
|
2018-11-22 09:46:46 +01:00
|
|
|
u32 br;
|
arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419
Working around Cortex-A53 erratum #843419 involves special handling of
ADRP instructions that end up in the last two instruction slots of a
4k page, or whose output register gets overwritten without having been
read. (Note that the latter instruction sequence is never emitted by
a properly functioning compiler, which is why it is disregarded by the
handling of the same erratum in the bfd.ld linker which we rely on for
the core kernel)
Normally, this gets taken care of by the linker, which can spot such
sequences at final link time, and insert a veneer if the ADRP ends up
at a vulnerable offset. However, linux kernel modules are partially
linked ELF objects, and so there is no 'final link time' other than the
runtime loading of the module, at which time all the static relocations
are resolved.
For this reason, we have implemented the #843419 workaround for modules
by avoiding ADRP instructions altogether, by using the large C model,
and by passing -mpc-relative-literal-loads to recent versions of GCC
that may emit adrp/ldr pairs to perform literal loads. However, this
workaround forces us to keep literal data mixed with the instructions
in the executable .text segment, and literal data may inadvertently
turn into an exploitable speculative gadget depending on the relative
offsets of arbitrary symbols.
So let's reimplement this workaround in a way that allows us to switch
back to the small C model, and to drop the -mpc-relative-literal-loads
GCC switch, by patching affected ADRP instructions at runtime:
- ADRP instructions that do not appear at 4k relative offset 0xff8 or
0xffc are ignored
- ADRP instructions that are within 1 MB of their target symbol are
converted into ADR instructions
- remaining ADRP instructions are redirected via a veneer that performs
the load using an unaffected movn/movk sequence.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: tidied up ADRP -> ADR instruction patching.]
[will: use ULL suffix for 64-bit immediate]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-06 17:15:33 +00:00
|
|
|
int rd;
|
|
|
|
|
|
|
|
if (WARN_ON(pltsec->plt_num_entries > pltsec->plt_max_entries))
|
|
|
|
return 0;
|
|
|
|
|
2018-11-22 09:46:46 +01:00
|
|
|
if (is_forbidden_offset_for_adrp(&plt[i].adrp))
|
|
|
|
i = pltsec->plt_num_entries++;
|
|
|
|
|
arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419
Working around Cortex-A53 erratum #843419 involves special handling of
ADRP instructions that end up in the last two instruction slots of a
4k page, or whose output register gets overwritten without having been
read. (Note that the latter instruction sequence is never emitted by
a properly functioning compiler, which is why it is disregarded by the
handling of the same erratum in the bfd.ld linker which we rely on for
the core kernel)
Normally, this gets taken care of by the linker, which can spot such
sequences at final link time, and insert a veneer if the ADRP ends up
at a vulnerable offset. However, linux kernel modules are partially
linked ELF objects, and so there is no 'final link time' other than the
runtime loading of the module, at which time all the static relocations
are resolved.
For this reason, we have implemented the #843419 workaround for modules
by avoiding ADRP instructions altogether, by using the large C model,
and by passing -mpc-relative-literal-loads to recent versions of GCC
that may emit adrp/ldr pairs to perform literal loads. However, this
workaround forces us to keep literal data mixed with the instructions
in the executable .text segment, and literal data may inadvertently
turn into an exploitable speculative gadget depending on the relative
offsets of arbitrary symbols.
So let's reimplement this workaround in a way that allows us to switch
back to the small C model, and to drop the -mpc-relative-literal-loads
GCC switch, by patching affected ADRP instructions at runtime:
- ADRP instructions that do not appear at 4k relative offset 0xff8 or
0xffc are ignored
- ADRP instructions that are within 1 MB of their target symbol are
converted into ADR instructions
- remaining ADRP instructions are redirected via a veneer that performs
the load using an unaffected movn/movk sequence.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: tidied up ADRP -> ADR instruction patching.]
[will: use ULL suffix for 64-bit immediate]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-06 17:15:33 +00:00
|
|
|
/* get the destination register of the ADRP instruction */
|
|
|
|
rd = aarch64_insn_decode_register(AARCH64_INSN_REGTYPE_RD,
|
|
|
|
le32_to_cpup((__le32 *)loc));
|
|
|
|
|
|
|
|
br = aarch64_insn_gen_branch_imm((u64)&plt[i].br, (u64)loc + 4,
|
|
|
|
AARCH64_INSN_BRANCH_NOLINK);
|
|
|
|
|
2018-11-22 09:46:46 +01:00
|
|
|
plt[i] = __get_adrp_add_pair(val, (u64)&plt[i], rd);
|
|
|
|
plt[i].br = cpu_to_le32(br);
|
arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419
Working around Cortex-A53 erratum #843419 involves special handling of
ADRP instructions that end up in the last two instruction slots of a
4k page, or whose output register gets overwritten without having been
read. (Note that the latter instruction sequence is never emitted by
a properly functioning compiler, which is why it is disregarded by the
handling of the same erratum in the bfd.ld linker which we rely on for
the core kernel)
Normally, this gets taken care of by the linker, which can spot such
sequences at final link time, and insert a veneer if the ADRP ends up
at a vulnerable offset. However, linux kernel modules are partially
linked ELF objects, and so there is no 'final link time' other than the
runtime loading of the module, at which time all the static relocations
are resolved.
For this reason, we have implemented the #843419 workaround for modules
by avoiding ADRP instructions altogether, by using the large C model,
and by passing -mpc-relative-literal-loads to recent versions of GCC
that may emit adrp/ldr pairs to perform literal loads. However, this
workaround forces us to keep literal data mixed with the instructions
in the executable .text segment, and literal data may inadvertently
turn into an exploitable speculative gadget depending on the relative
offsets of arbitrary symbols.
So let's reimplement this workaround in a way that allows us to switch
back to the small C model, and to drop the -mpc-relative-literal-loads
GCC switch, by patching affected ADRP instructions at runtime:
- ADRP instructions that do not appear at 4k relative offset 0xff8 or
0xffc are ignored
- ADRP instructions that are within 1 MB of their target symbol are
converted into ADR instructions
- remaining ADRP instructions are redirected via a veneer that performs
the load using an unaffected movn/movk sequence.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: tidied up ADRP -> ADR instruction patching.]
[will: use ULL suffix for 64-bit immediate]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-06 17:15:33 +00:00
|
|
|
|
|
|
|
return (u64)&plt[i];
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2021-02-04 09:43:49 +08:00
|
|
|
#define cmp_3way(a, b) ((a) < (b) ? -1 : (a) > (b))
|
2015-11-24 12:37:35 +01:00
|
|
|
|
|
|
|
static int cmp_rela(const void *a, const void *b)
|
|
|
|
{
|
|
|
|
const Elf64_Rela *x = a, *y = b;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
/* sort by type, symbol index and addend */
|
|
|
|
i = cmp_3way(ELF64_R_TYPE(x->r_info), ELF64_R_TYPE(y->r_info));
|
|
|
|
if (i == 0)
|
|
|
|
i = cmp_3way(ELF64_R_SYM(x->r_info), ELF64_R_SYM(y->r_info));
|
|
|
|
if (i == 0)
|
|
|
|
i = cmp_3way(x->r_addend, y->r_addend);
|
|
|
|
return i;
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool duplicate_rel(const Elf64_Rela *rela, int num)
|
|
|
|
{
|
|
|
|
/*
|
|
|
|
* Entries are sorted by type, symbol index and addend. That means
|
|
|
|
* that, if a duplicate entry exists, it must be in the preceding
|
|
|
|
* slot.
|
|
|
|
*/
|
|
|
|
return num > 0 && cmp_rela(rela + num, rela + num - 1) == 0;
|
|
|
|
}
|
|
|
|
|
arm64: module: split core and init PLT sections
The arm64 module PLT code allocates all PLT entries in a single core
section, since the overhead of having a separate init PLT section is
not justified by the small number of PLT entries usually required for
init code.
However, the core and init module regions are allocated independently,
and there is a corner case where the core region may be allocated from
the VMALLOC region if the dedicated module region is exhausted, but the
init region, being much smaller, can still be allocated from the module
region. This leads to relocation failures if the distance between those
regions exceeds 128 MB. (In fact, this corner case is highly unlikely to
occur on arm64, but the issue has been observed on ARM, whose module
region is much smaller).
So split the core and init PLT regions, and name the latter ".init.plt"
so it gets allocated along with (and sufficiently close to) the .init
sections that it serves. Also, given that init PLT entries may need to
be emitted for branches that target the core module, modify the logic
that disregards defined symbols to only disregard symbols that are
defined in the same section as the relocated branch instruction.
Since there may now be two PLT entries associated with each entry in
the symbol table, we can no longer hijack the symbol::st_size fields
to record the addresses of PLT entries as we emit them for zero-addend
relocations. So instead, perform an explicit comparison to check for
duplicate entries.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-02-21 22:12:57 +00:00
|
|
|
static unsigned int count_plts(Elf64_Sym *syms, Elf64_Rela *rela, int num,
|
arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419
Working around Cortex-A53 erratum #843419 involves special handling of
ADRP instructions that end up in the last two instruction slots of a
4k page, or whose output register gets overwritten without having been
read. (Note that the latter instruction sequence is never emitted by
a properly functioning compiler, which is why it is disregarded by the
handling of the same erratum in the bfd.ld linker which we rely on for
the core kernel)
Normally, this gets taken care of by the linker, which can spot such
sequences at final link time, and insert a veneer if the ADRP ends up
at a vulnerable offset. However, linux kernel modules are partially
linked ELF objects, and so there is no 'final link time' other than the
runtime loading of the module, at which time all the static relocations
are resolved.
For this reason, we have implemented the #843419 workaround for modules
by avoiding ADRP instructions altogether, by using the large C model,
and by passing -mpc-relative-literal-loads to recent versions of GCC
that may emit adrp/ldr pairs to perform literal loads. However, this
workaround forces us to keep literal data mixed with the instructions
in the executable .text segment, and literal data may inadvertently
turn into an exploitable speculative gadget depending on the relative
offsets of arbitrary symbols.
So let's reimplement this workaround in a way that allows us to switch
back to the small C model, and to drop the -mpc-relative-literal-loads
GCC switch, by patching affected ADRP instructions at runtime:
- ADRP instructions that do not appear at 4k relative offset 0xff8 or
0xffc are ignored
- ADRP instructions that are within 1 MB of their target symbol are
converted into ADR instructions
- remaining ADRP instructions are redirected via a veneer that performs
the load using an unaffected movn/movk sequence.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: tidied up ADRP -> ADR instruction patching.]
[will: use ULL suffix for 64-bit immediate]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-06 17:15:33 +00:00
|
|
|
Elf64_Word dstidx, Elf_Shdr *dstsec)
|
2015-11-24 12:37:35 +01:00
|
|
|
{
|
|
|
|
unsigned int ret = 0;
|
|
|
|
Elf64_Sym *s;
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < num; i++) {
|
arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419
Working around Cortex-A53 erratum #843419 involves special handling of
ADRP instructions that end up in the last two instruction slots of a
4k page, or whose output register gets overwritten without having been
read. (Note that the latter instruction sequence is never emitted by
a properly functioning compiler, which is why it is disregarded by the
handling of the same erratum in the bfd.ld linker which we rely on for
the core kernel)
Normally, this gets taken care of by the linker, which can spot such
sequences at final link time, and insert a veneer if the ADRP ends up
at a vulnerable offset. However, linux kernel modules are partially
linked ELF objects, and so there is no 'final link time' other than the
runtime loading of the module, at which time all the static relocations
are resolved.
For this reason, we have implemented the #843419 workaround for modules
by avoiding ADRP instructions altogether, by using the large C model,
and by passing -mpc-relative-literal-loads to recent versions of GCC
that may emit adrp/ldr pairs to perform literal loads. However, this
workaround forces us to keep literal data mixed with the instructions
in the executable .text segment, and literal data may inadvertently
turn into an exploitable speculative gadget depending on the relative
offsets of arbitrary symbols.
So let's reimplement this workaround in a way that allows us to switch
back to the small C model, and to drop the -mpc-relative-literal-loads
GCC switch, by patching affected ADRP instructions at runtime:
- ADRP instructions that do not appear at 4k relative offset 0xff8 or
0xffc are ignored
- ADRP instructions that are within 1 MB of their target symbol are
converted into ADR instructions
- remaining ADRP instructions are redirected via a veneer that performs
the load using an unaffected movn/movk sequence.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: tidied up ADRP -> ADR instruction patching.]
[will: use ULL suffix for 64-bit immediate]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-06 17:15:33 +00:00
|
|
|
u64 min_align;
|
|
|
|
|
2015-11-24 12:37:35 +01:00
|
|
|
switch (ELF64_R_TYPE(rela[i].r_info)) {
|
|
|
|
case R_AARCH64_JUMP26:
|
|
|
|
case R_AARCH64_CALL26:
|
|
|
|
/*
|
|
|
|
* We only have to consider branch targets that resolve
|
arm64: module: split core and init PLT sections
The arm64 module PLT code allocates all PLT entries in a single core
section, since the overhead of having a separate init PLT section is
not justified by the small number of PLT entries usually required for
init code.
However, the core and init module regions are allocated independently,
and there is a corner case where the core region may be allocated from
the VMALLOC region if the dedicated module region is exhausted, but the
init region, being much smaller, can still be allocated from the module
region. This leads to relocation failures if the distance between those
regions exceeds 128 MB. (In fact, this corner case is highly unlikely to
occur on arm64, but the issue has been observed on ARM, whose module
region is much smaller).
So split the core and init PLT regions, and name the latter ".init.plt"
so it gets allocated along with (and sufficiently close to) the .init
sections that it serves. Also, given that init PLT entries may need to
be emitted for branches that target the core module, modify the logic
that disregards defined symbols to only disregard symbols that are
defined in the same section as the relocated branch instruction.
Since there may now be two PLT entries associated with each entry in
the symbol table, we can no longer hijack the symbol::st_size fields
to record the addresses of PLT entries as we emit them for zero-addend
relocations. So instead, perform an explicit comparison to check for
duplicate entries.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-02-21 22:12:57 +00:00
|
|
|
* to symbols that are defined in a different section.
|
|
|
|
* This is not simply a heuristic, it is a fundamental
|
|
|
|
* limitation, since there is no guaranteed way to emit
|
|
|
|
* PLT entries sufficiently close to the branch if the
|
|
|
|
* section size exceeds the range of a branch
|
|
|
|
* instruction. So ignore relocations against defined
|
|
|
|
* symbols if they live in the same section as the
|
|
|
|
* relocation target.
|
2015-11-24 12:37:35 +01:00
|
|
|
*/
|
|
|
|
s = syms + ELF64_R_SYM(rela[i].r_info);
|
arm64: module: split core and init PLT sections
The arm64 module PLT code allocates all PLT entries in a single core
section, since the overhead of having a separate init PLT section is
not justified by the small number of PLT entries usually required for
init code.
However, the core and init module regions are allocated independently,
and there is a corner case where the core region may be allocated from
the VMALLOC region if the dedicated module region is exhausted, but the
init region, being much smaller, can still be allocated from the module
region. This leads to relocation failures if the distance between those
regions exceeds 128 MB. (In fact, this corner case is highly unlikely to
occur on arm64, but the issue has been observed on ARM, whose module
region is much smaller).
So split the core and init PLT regions, and name the latter ".init.plt"
so it gets allocated along with (and sufficiently close to) the .init
sections that it serves. Also, given that init PLT entries may need to
be emitted for branches that target the core module, modify the logic
that disregards defined symbols to only disregard symbols that are
defined in the same section as the relocated branch instruction.
Since there may now be two PLT entries associated with each entry in
the symbol table, we can no longer hijack the symbol::st_size fields
to record the addresses of PLT entries as we emit them for zero-addend
relocations. So instead, perform an explicit comparison to check for
duplicate entries.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-02-21 22:12:57 +00:00
|
|
|
if (s->st_shndx == dstidx)
|
2015-11-24 12:37:35 +01:00
|
|
|
break;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Jump relocations with non-zero addends against
|
|
|
|
* undefined symbols are supported by the ELF spec, but
|
|
|
|
* do not occur in practice (e.g., 'jump n bytes past
|
|
|
|
* the entry point of undefined function symbol f').
|
|
|
|
* So we need to support them, but there is no need to
|
|
|
|
* take them into consideration when trying to optimize
|
|
|
|
* this code. So let's only check for duplicates when
|
|
|
|
* the addend is zero: this allows us to record the PLT
|
|
|
|
* entry address in the symbol table itself, rather than
|
|
|
|
* having to search the list for duplicates each time we
|
|
|
|
* emit one.
|
|
|
|
*/
|
|
|
|
if (rela[i].r_addend != 0 || !duplicate_rel(rela, i))
|
|
|
|
ret++;
|
|
|
|
break;
|
arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419
Working around Cortex-A53 erratum #843419 involves special handling of
ADRP instructions that end up in the last two instruction slots of a
4k page, or whose output register gets overwritten without having been
read. (Note that the latter instruction sequence is never emitted by
a properly functioning compiler, which is why it is disregarded by the
handling of the same erratum in the bfd.ld linker which we rely on for
the core kernel)
Normally, this gets taken care of by the linker, which can spot such
sequences at final link time, and insert a veneer if the ADRP ends up
at a vulnerable offset. However, linux kernel modules are partially
linked ELF objects, and so there is no 'final link time' other than the
runtime loading of the module, at which time all the static relocations
are resolved.
For this reason, we have implemented the #843419 workaround for modules
by avoiding ADRP instructions altogether, by using the large C model,
and by passing -mpc-relative-literal-loads to recent versions of GCC
that may emit adrp/ldr pairs to perform literal loads. However, this
workaround forces us to keep literal data mixed with the instructions
in the executable .text segment, and literal data may inadvertently
turn into an exploitable speculative gadget depending on the relative
offsets of arbitrary symbols.
So let's reimplement this workaround in a way that allows us to switch
back to the small C model, and to drop the -mpc-relative-literal-loads
GCC switch, by patching affected ADRP instructions at runtime:
- ADRP instructions that do not appear at 4k relative offset 0xff8 or
0xffc are ignored
- ADRP instructions that are within 1 MB of their target symbol are
converted into ADR instructions
- remaining ADRP instructions are redirected via a veneer that performs
the load using an unaffected movn/movk sequence.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: tidied up ADRP -> ADR instruction patching.]
[will: use ULL suffix for 64-bit immediate]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-06 17:15:33 +00:00
|
|
|
case R_AARCH64_ADR_PREL_PG_HI21_NC:
|
|
|
|
case R_AARCH64_ADR_PREL_PG_HI21:
|
arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_843419
In count_plts() and is_forbidden_offset_for_adrp() we use
cpus_have_const_cap() to check for ARM64_WORKAROUND_843419, but this is
not necessary and cpus_have_final_cap() would be preferable.
For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.
Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.
It's not possible to load a module in the window between detecting the
ARM64_WORKAROUND_843419 cpucap and patching alternatives. The module VA
range limits are initialized much later in module_init_limits() which is
a subsys_initcall, and module loading cannot happen before this. Hence
it's not necessary for count_plts() or is_forbidden_offset_for_adrp() to
use cpus_have_const_cap().
This patch replaces the use of cpus_have_const_cap() with
cpus_have_final_cap() which will avoid generating code to test the
system_cpucaps bitmap and should be better for all subsequent calls at
runtime. Using cpus_have_final_cap() clearly documents that we do not
expect this code to run before cpucaps are finalized, and will make it
easier to spot issues if code is changed in future to allow modules to
be loaded earlier. The ARM64_WORKAROUND_843419 cpucap is added to
cpucap_is_possible() so that code can be elided entirely when this is not
possible, and redundant IS_ENABLED() checks are removed.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 11:24:54 +01:00
|
|
|
if (!cpus_have_final_cap(ARM64_WORKAROUND_843419))
|
arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419
Working around Cortex-A53 erratum #843419 involves special handling of
ADRP instructions that end up in the last two instruction slots of a
4k page, or whose output register gets overwritten without having been
read. (Note that the latter instruction sequence is never emitted by
a properly functioning compiler, which is why it is disregarded by the
handling of the same erratum in the bfd.ld linker which we rely on for
the core kernel)
Normally, this gets taken care of by the linker, which can spot such
sequences at final link time, and insert a veneer if the ADRP ends up
at a vulnerable offset. However, linux kernel modules are partially
linked ELF objects, and so there is no 'final link time' other than the
runtime loading of the module, at which time all the static relocations
are resolved.
For this reason, we have implemented the #843419 workaround for modules
by avoiding ADRP instructions altogether, by using the large C model,
and by passing -mpc-relative-literal-loads to recent versions of GCC
that may emit adrp/ldr pairs to perform literal loads. However, this
workaround forces us to keep literal data mixed with the instructions
in the executable .text segment, and literal data may inadvertently
turn into an exploitable speculative gadget depending on the relative
offsets of arbitrary symbols.
So let's reimplement this workaround in a way that allows us to switch
back to the small C model, and to drop the -mpc-relative-literal-loads
GCC switch, by patching affected ADRP instructions at runtime:
- ADRP instructions that do not appear at 4k relative offset 0xff8 or
0xffc are ignored
- ADRP instructions that are within 1 MB of their target symbol are
converted into ADR instructions
- remaining ADRP instructions are redirected via a veneer that performs
the load using an unaffected movn/movk sequence.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: tidied up ADRP -> ADR instruction patching.]
[will: use ULL suffix for 64-bit immediate]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-06 17:15:33 +00:00
|
|
|
break;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Determine the minimal safe alignment for this ADRP
|
|
|
|
* instruction: the section alignment at which it is
|
|
|
|
* guaranteed not to appear at a vulnerable offset.
|
|
|
|
*
|
|
|
|
* This comes down to finding the least significant zero
|
|
|
|
* bit in bits [11:3] of the section offset, and
|
|
|
|
* increasing the section's alignment so that the
|
|
|
|
* resulting address of this instruction is guaranteed
|
|
|
|
* to equal the offset in that particular bit (as well
|
2022-03-18 11:37:05 +01:00
|
|
|
* as all less significant bits). This ensures that the
|
arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419
Working around Cortex-A53 erratum #843419 involves special handling of
ADRP instructions that end up in the last two instruction slots of a
4k page, or whose output register gets overwritten without having been
read. (Note that the latter instruction sequence is never emitted by
a properly functioning compiler, which is why it is disregarded by the
handling of the same erratum in the bfd.ld linker which we rely on for
the core kernel)
Normally, this gets taken care of by the linker, which can spot such
sequences at final link time, and insert a veneer if the ADRP ends up
at a vulnerable offset. However, linux kernel modules are partially
linked ELF objects, and so there is no 'final link time' other than the
runtime loading of the module, at which time all the static relocations
are resolved.
For this reason, we have implemented the #843419 workaround for modules
by avoiding ADRP instructions altogether, by using the large C model,
and by passing -mpc-relative-literal-loads to recent versions of GCC
that may emit adrp/ldr pairs to perform literal loads. However, this
workaround forces us to keep literal data mixed with the instructions
in the executable .text segment, and literal data may inadvertently
turn into an exploitable speculative gadget depending on the relative
offsets of arbitrary symbols.
So let's reimplement this workaround in a way that allows us to switch
back to the small C model, and to drop the -mpc-relative-literal-loads
GCC switch, by patching affected ADRP instructions at runtime:
- ADRP instructions that do not appear at 4k relative offset 0xff8 or
0xffc are ignored
- ADRP instructions that are within 1 MB of their target symbol are
converted into ADR instructions
- remaining ADRP instructions are redirected via a veneer that performs
the load using an unaffected movn/movk sequence.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: tidied up ADRP -> ADR instruction patching.]
[will: use ULL suffix for 64-bit immediate]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-06 17:15:33 +00:00
|
|
|
* address modulo 4 KB != 0xfff8 or 0xfffc (which would
|
|
|
|
* have all ones in bits [11:3])
|
|
|
|
*/
|
|
|
|
min_align = 2ULL << ffz(rela[i].r_offset | 0x7);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Allocate veneer space for each ADRP that may appear
|
|
|
|
* at a vulnerable offset nonetheless. At relocation
|
|
|
|
* time, some of these will remain unused since some
|
|
|
|
* ADRP instructions can be patched to ADR instructions
|
|
|
|
* instead.
|
|
|
|
*/
|
|
|
|
if (min_align > SZ_4K)
|
|
|
|
ret++;
|
|
|
|
else
|
|
|
|
dstsec->sh_addralign = max(dstsec->sh_addralign,
|
|
|
|
min_align);
|
|
|
|
break;
|
2015-11-24 12:37:35 +01:00
|
|
|
}
|
|
|
|
}
|
2018-11-22 09:46:46 +01:00
|
|
|
|
arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_843419
In count_plts() and is_forbidden_offset_for_adrp() we use
cpus_have_const_cap() to check for ARM64_WORKAROUND_843419, but this is
not necessary and cpus_have_final_cap() would be preferable.
For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.
Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.
It's not possible to load a module in the window between detecting the
ARM64_WORKAROUND_843419 cpucap and patching alternatives. The module VA
range limits are initialized much later in module_init_limits() which is
a subsys_initcall, and module loading cannot happen before this. Hence
it's not necessary for count_plts() or is_forbidden_offset_for_adrp() to
use cpus_have_const_cap().
This patch replaces the use of cpus_have_const_cap() with
cpus_have_final_cap() which will avoid generating code to test the
system_cpucaps bitmap and should be better for all subsequent calls at
runtime. Using cpus_have_final_cap() clearly documents that we do not
expect this code to run before cpucaps are finalized, and will make it
easier to spot issues if code is changed in future to allow modules to
be loaded earlier. The ARM64_WORKAROUND_843419 cpucap is added to
cpucap_is_possible() so that code can be elided entirely when this is not
possible, and redundant IS_ENABLED() checks are removed.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 11:24:54 +01:00
|
|
|
if (cpus_have_final_cap(ARM64_WORKAROUND_843419)) {
|
2018-11-22 09:46:46 +01:00
|
|
|
/*
|
|
|
|
* Add some slack so we can skip PLT slots that may trigger
|
|
|
|
* the erratum due to the placement of the ADRP instruction.
|
|
|
|
*/
|
|
|
|
ret += DIV_ROUND_UP(ret, (SZ_4K / sizeof(struct plt_entry)));
|
arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_843419
In count_plts() and is_forbidden_offset_for_adrp() we use
cpus_have_const_cap() to check for ARM64_WORKAROUND_843419, but this is
not necessary and cpus_have_final_cap() would be preferable.
For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.
Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.
It's not possible to load a module in the window between detecting the
ARM64_WORKAROUND_843419 cpucap and patching alternatives. The module VA
range limits are initialized much later in module_init_limits() which is
a subsys_initcall, and module loading cannot happen before this. Hence
it's not necessary for count_plts() or is_forbidden_offset_for_adrp() to
use cpus_have_const_cap().
This patch replaces the use of cpus_have_const_cap() with
cpus_have_final_cap() which will avoid generating code to test the
system_cpucaps bitmap and should be better for all subsequent calls at
runtime. Using cpus_have_final_cap() clearly documents that we do not
expect this code to run before cpucaps are finalized, and will make it
easier to spot issues if code is changed in future to allow modules to
be loaded earlier. The ARM64_WORKAROUND_843419 cpucap is added to
cpucap_is_possible() so that code can be elided entirely when this is not
possible, and redundant IS_ENABLED() checks are removed.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 11:24:54 +01:00
|
|
|
}
|
2018-11-22 09:46:46 +01:00
|
|
|
|
2015-11-24 12:37:35 +01:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2020-06-22 18:18:02 -07:00
|
|
|
static bool branch_rela_needs_plt(Elf64_Sym *syms, Elf64_Rela *rela,
|
|
|
|
Elf64_Word dstidx)
|
|
|
|
{
|
|
|
|
|
|
|
|
Elf64_Sym *s = syms + ELF64_R_SYM(rela->r_info);
|
|
|
|
|
|
|
|
if (s->st_shndx == dstidx)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
return ELF64_R_TYPE(rela->r_info) == R_AARCH64_JUMP26 ||
|
|
|
|
ELF64_R_TYPE(rela->r_info) == R_AARCH64_CALL26;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Group branch PLT relas at the front end of the array. */
|
|
|
|
static int partition_branch_plt_relas(Elf64_Sym *syms, Elf64_Rela *rela,
|
|
|
|
int numrels, Elf64_Word dstidx)
|
|
|
|
{
|
|
|
|
int i = 0, j = numrels - 1;
|
|
|
|
|
|
|
|
while (i < j) {
|
|
|
|
if (branch_rela_needs_plt(syms, &rela[i], dstidx))
|
|
|
|
i++;
|
|
|
|
else if (branch_rela_needs_plt(syms, &rela[j], dstidx))
|
|
|
|
swap(rela[i], rela[j]);
|
|
|
|
else
|
|
|
|
j--;
|
|
|
|
}
|
|
|
|
|
|
|
|
return i;
|
|
|
|
}
|
|
|
|
|
2015-11-24 12:37:35 +01:00
|
|
|
int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
|
|
|
|
char *secstrings, struct module *mod)
|
|
|
|
{
|
arm64: module: split core and init PLT sections
The arm64 module PLT code allocates all PLT entries in a single core
section, since the overhead of having a separate init PLT section is
not justified by the small number of PLT entries usually required for
init code.
However, the core and init module regions are allocated independently,
and there is a corner case where the core region may be allocated from
the VMALLOC region if the dedicated module region is exhausted, but the
init region, being much smaller, can still be allocated from the module
region. This leads to relocation failures if the distance between those
regions exceeds 128 MB. (In fact, this corner case is highly unlikely to
occur on arm64, but the issue has been observed on ARM, whose module
region is much smaller).
So split the core and init PLT regions, and name the latter ".init.plt"
so it gets allocated along with (and sufficiently close to) the .init
sections that it serves. Also, given that init PLT entries may need to
be emitted for branches that target the core module, modify the logic
that disregards defined symbols to only disregard symbols that are
defined in the same section as the relocated branch instruction.
Since there may now be two PLT entries associated with each entry in
the symbol table, we can no longer hijack the symbol::st_size fields
to record the addresses of PLT entries as we emit them for zero-addend
relocations. So instead, perform an explicit comparison to check for
duplicate entries.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-02-21 22:12:57 +00:00
|
|
|
unsigned long core_plts = 0;
|
|
|
|
unsigned long init_plts = 0;
|
2015-11-24 12:37:35 +01:00
|
|
|
Elf64_Sym *syms = NULL;
|
2018-11-05 19:53:23 +01:00
|
|
|
Elf_Shdr *pltsec, *tramp = NULL;
|
2015-11-24 12:37:35 +01:00
|
|
|
int i;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Find the empty .plt section so we can expand it to store the PLT
|
|
|
|
* entries. Record the symtab address as well.
|
|
|
|
*/
|
|
|
|
for (i = 0; i < ehdr->e_shnum; i++) {
|
arm64: module: split core and init PLT sections
The arm64 module PLT code allocates all PLT entries in a single core
section, since the overhead of having a separate init PLT section is
not justified by the small number of PLT entries usually required for
init code.
However, the core and init module regions are allocated independently,
and there is a corner case where the core region may be allocated from
the VMALLOC region if the dedicated module region is exhausted, but the
init region, being much smaller, can still be allocated from the module
region. This leads to relocation failures if the distance between those
regions exceeds 128 MB. (In fact, this corner case is highly unlikely to
occur on arm64, but the issue has been observed on ARM, whose module
region is much smaller).
So split the core and init PLT regions, and name the latter ".init.plt"
so it gets allocated along with (and sufficiently close to) the .init
sections that it serves. Also, given that init PLT entries may need to
be emitted for branches that target the core module, modify the logic
that disregards defined symbols to only disregard symbols that are
defined in the same section as the relocated branch instruction.
Since there may now be two PLT entries associated with each entry in
the symbol table, we can no longer hijack the symbol::st_size fields
to record the addresses of PLT entries as we emit them for zero-addend
relocations. So instead, perform an explicit comparison to check for
duplicate entries.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-02-21 22:12:57 +00:00
|
|
|
if (!strcmp(secstrings + sechdrs[i].sh_name, ".plt"))
|
2018-11-05 19:53:23 +01:00
|
|
|
mod->arch.core.plt_shndx = i;
|
arm64: module: split core and init PLT sections
The arm64 module PLT code allocates all PLT entries in a single core
section, since the overhead of having a separate init PLT section is
not justified by the small number of PLT entries usually required for
init code.
However, the core and init module regions are allocated independently,
and there is a corner case where the core region may be allocated from
the VMALLOC region if the dedicated module region is exhausted, but the
init region, being much smaller, can still be allocated from the module
region. This leads to relocation failures if the distance between those
regions exceeds 128 MB. (In fact, this corner case is highly unlikely to
occur on arm64, but the issue has been observed on ARM, whose module
region is much smaller).
So split the core and init PLT regions, and name the latter ".init.plt"
so it gets allocated along with (and sufficiently close to) the .init
sections that it serves. Also, given that init PLT entries may need to
be emitted for branches that target the core module, modify the logic
that disregards defined symbols to only disregard symbols that are
defined in the same section as the relocated branch instruction.
Since there may now be two PLT entries associated with each entry in
the symbol table, we can no longer hijack the symbol::st_size fields
to record the addresses of PLT entries as we emit them for zero-addend
relocations. So instead, perform an explicit comparison to check for
duplicate entries.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-02-21 22:12:57 +00:00
|
|
|
else if (!strcmp(secstrings + sechdrs[i].sh_name, ".init.plt"))
|
2018-11-05 19:53:23 +01:00
|
|
|
mod->arch.init.plt_shndx = i;
|
2020-09-01 18:00:16 +02:00
|
|
|
else if (!strcmp(secstrings + sechdrs[i].sh_name,
|
2017-11-20 17:41:30 +00:00
|
|
|
".text.ftrace_trampoline"))
|
|
|
|
tramp = sechdrs + i;
|
2015-11-24 12:37:35 +01:00
|
|
|
else if (sechdrs[i].sh_type == SHT_SYMTAB)
|
|
|
|
syms = (Elf64_Sym *)sechdrs[i].sh_addr;
|
|
|
|
}
|
|
|
|
|
2018-11-05 19:53:23 +01:00
|
|
|
if (!mod->arch.core.plt_shndx || !mod->arch.init.plt_shndx) {
|
arm64: module: split core and init PLT sections
The arm64 module PLT code allocates all PLT entries in a single core
section, since the overhead of having a separate init PLT section is
not justified by the small number of PLT entries usually required for
init code.
However, the core and init module regions are allocated independently,
and there is a corner case where the core region may be allocated from
the VMALLOC region if the dedicated module region is exhausted, but the
init region, being much smaller, can still be allocated from the module
region. This leads to relocation failures if the distance between those
regions exceeds 128 MB. (In fact, this corner case is highly unlikely to
occur on arm64, but the issue has been observed on ARM, whose module
region is much smaller).
So split the core and init PLT regions, and name the latter ".init.plt"
so it gets allocated along with (and sufficiently close to) the .init
sections that it serves. Also, given that init PLT entries may need to
be emitted for branches that target the core module, modify the logic
that disregards defined symbols to only disregard symbols that are
defined in the same section as the relocated branch instruction.
Since there may now be two PLT entries associated with each entry in
the symbol table, we can no longer hijack the symbol::st_size fields
to record the addresses of PLT entries as we emit them for zero-addend
relocations. So instead, perform an explicit comparison to check for
duplicate entries.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-02-21 22:12:57 +00:00
|
|
|
pr_err("%s: module PLT section(s) missing\n", mod->name);
|
2015-11-24 12:37:35 +01:00
|
|
|
return -ENOEXEC;
|
|
|
|
}
|
|
|
|
if (!syms) {
|
|
|
|
pr_err("%s: module symtab section missing\n", mod->name);
|
|
|
|
return -ENOEXEC;
|
|
|
|
}
|
|
|
|
|
|
|
|
for (i = 0; i < ehdr->e_shnum; i++) {
|
|
|
|
Elf64_Rela *rels = (void *)ehdr + sechdrs[i].sh_offset;
|
2020-06-22 18:18:02 -07:00
|
|
|
int nents, numrels = sechdrs[i].sh_size / sizeof(Elf64_Rela);
|
2015-11-24 12:37:35 +01:00
|
|
|
Elf64_Shdr *dstsec = sechdrs + sechdrs[i].sh_info;
|
|
|
|
|
|
|
|
if (sechdrs[i].sh_type != SHT_RELA)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
/* ignore relocations that operate on non-exec sections */
|
|
|
|
if (!(dstsec->sh_flags & SHF_EXECINSTR))
|
|
|
|
continue;
|
|
|
|
|
2020-06-22 18:18:02 -07:00
|
|
|
/*
|
|
|
|
* sort branch relocations requiring a PLT by type, symbol index
|
|
|
|
* and addend
|
|
|
|
*/
|
|
|
|
nents = partition_branch_plt_relas(syms, rels, numrels,
|
|
|
|
sechdrs[i].sh_info);
|
|
|
|
if (nents)
|
|
|
|
sort(rels, nents, sizeof(Elf64_Rela), cmp_rela, NULL);
|
2015-11-24 12:37:35 +01:00
|
|
|
|
2023-08-01 14:54:08 +00:00
|
|
|
if (!module_init_layout_section(secstrings + dstsec->sh_name))
|
arm64: module: split core and init PLT sections
The arm64 module PLT code allocates all PLT entries in a single core
section, since the overhead of having a separate init PLT section is
not justified by the small number of PLT entries usually required for
init code.
However, the core and init module regions are allocated independently,
and there is a corner case where the core region may be allocated from
the VMALLOC region if the dedicated module region is exhausted, but the
init region, being much smaller, can still be allocated from the module
region. This leads to relocation failures if the distance between those
regions exceeds 128 MB. (In fact, this corner case is highly unlikely to
occur on arm64, but the issue has been observed on ARM, whose module
region is much smaller).
So split the core and init PLT regions, and name the latter ".init.plt"
so it gets allocated along with (and sufficiently close to) the .init
sections that it serves. Also, given that init PLT entries may need to
be emitted for branches that target the core module, modify the logic
that disregards defined symbols to only disregard symbols that are
defined in the same section as the relocated branch instruction.
Since there may now be two PLT entries associated with each entry in
the symbol table, we can no longer hijack the symbol::st_size fields
to record the addresses of PLT entries as we emit them for zero-addend
relocations. So instead, perform an explicit comparison to check for
duplicate entries.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-02-21 22:12:57 +00:00
|
|
|
core_plts += count_plts(syms, rels, numrels,
|
arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419
Working around Cortex-A53 erratum #843419 involves special handling of
ADRP instructions that end up in the last two instruction slots of a
4k page, or whose output register gets overwritten without having been
read. (Note that the latter instruction sequence is never emitted by
a properly functioning compiler, which is why it is disregarded by the
handling of the same erratum in the bfd.ld linker which we rely on for
the core kernel)
Normally, this gets taken care of by the linker, which can spot such
sequences at final link time, and insert a veneer if the ADRP ends up
at a vulnerable offset. However, linux kernel modules are partially
linked ELF objects, and so there is no 'final link time' other than the
runtime loading of the module, at which time all the static relocations
are resolved.
For this reason, we have implemented the #843419 workaround for modules
by avoiding ADRP instructions altogether, by using the large C model,
and by passing -mpc-relative-literal-loads to recent versions of GCC
that may emit adrp/ldr pairs to perform literal loads. However, this
workaround forces us to keep literal data mixed with the instructions
in the executable .text segment, and literal data may inadvertently
turn into an exploitable speculative gadget depending on the relative
offsets of arbitrary symbols.
So let's reimplement this workaround in a way that allows us to switch
back to the small C model, and to drop the -mpc-relative-literal-loads
GCC switch, by patching affected ADRP instructions at runtime:
- ADRP instructions that do not appear at 4k relative offset 0xff8 or
0xffc are ignored
- ADRP instructions that are within 1 MB of their target symbol are
converted into ADR instructions
- remaining ADRP instructions are redirected via a veneer that performs
the load using an unaffected movn/movk sequence.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: tidied up ADRP -> ADR instruction patching.]
[will: use ULL suffix for 64-bit immediate]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-06 17:15:33 +00:00
|
|
|
sechdrs[i].sh_info, dstsec);
|
arm64: module: split core and init PLT sections
The arm64 module PLT code allocates all PLT entries in a single core
section, since the overhead of having a separate init PLT section is
not justified by the small number of PLT entries usually required for
init code.
However, the core and init module regions are allocated independently,
and there is a corner case where the core region may be allocated from
the VMALLOC region if the dedicated module region is exhausted, but the
init region, being much smaller, can still be allocated from the module
region. This leads to relocation failures if the distance between those
regions exceeds 128 MB. (In fact, this corner case is highly unlikely to
occur on arm64, but the issue has been observed on ARM, whose module
region is much smaller).
So split the core and init PLT regions, and name the latter ".init.plt"
so it gets allocated along with (and sufficiently close to) the .init
sections that it serves. Also, given that init PLT entries may need to
be emitted for branches that target the core module, modify the logic
that disregards defined symbols to only disregard symbols that are
defined in the same section as the relocated branch instruction.
Since there may now be two PLT entries associated with each entry in
the symbol table, we can no longer hijack the symbol::st_size fields
to record the addresses of PLT entries as we emit them for zero-addend
relocations. So instead, perform an explicit comparison to check for
duplicate entries.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-02-21 22:12:57 +00:00
|
|
|
else
|
|
|
|
init_plts += count_plts(syms, rels, numrels,
|
arm64/kernel: don't ban ADRP to work around Cortex-A53 erratum #843419
Working around Cortex-A53 erratum #843419 involves special handling of
ADRP instructions that end up in the last two instruction slots of a
4k page, or whose output register gets overwritten without having been
read. (Note that the latter instruction sequence is never emitted by
a properly functioning compiler, which is why it is disregarded by the
handling of the same erratum in the bfd.ld linker which we rely on for
the core kernel)
Normally, this gets taken care of by the linker, which can spot such
sequences at final link time, and insert a veneer if the ADRP ends up
at a vulnerable offset. However, linux kernel modules are partially
linked ELF objects, and so there is no 'final link time' other than the
runtime loading of the module, at which time all the static relocations
are resolved.
For this reason, we have implemented the #843419 workaround for modules
by avoiding ADRP instructions altogether, by using the large C model,
and by passing -mpc-relative-literal-loads to recent versions of GCC
that may emit adrp/ldr pairs to perform literal loads. However, this
workaround forces us to keep literal data mixed with the instructions
in the executable .text segment, and literal data may inadvertently
turn into an exploitable speculative gadget depending on the relative
offsets of arbitrary symbols.
So let's reimplement this workaround in a way that allows us to switch
back to the small C model, and to drop the -mpc-relative-literal-loads
GCC switch, by patching affected ADRP instructions at runtime:
- ADRP instructions that do not appear at 4k relative offset 0xff8 or
0xffc are ignored
- ADRP instructions that are within 1 MB of their target symbol are
converted into ADR instructions
- remaining ADRP instructions are redirected via a veneer that performs
the load using an unaffected movn/movk sequence.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: tidied up ADRP -> ADR instruction patching.]
[will: use ULL suffix for 64-bit immediate]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-03-06 17:15:33 +00:00
|
|
|
sechdrs[i].sh_info, dstsec);
|
2015-11-24 12:37:35 +01:00
|
|
|
}
|
|
|
|
|
2018-11-05 19:53:23 +01:00
|
|
|
pltsec = sechdrs + mod->arch.core.plt_shndx;
|
|
|
|
pltsec->sh_type = SHT_NOBITS;
|
|
|
|
pltsec->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
|
|
|
|
pltsec->sh_addralign = L1_CACHE_BYTES;
|
|
|
|
pltsec->sh_size = (core_plts + 1) * sizeof(struct plt_entry);
|
arm64: module: split core and init PLT sections
The arm64 module PLT code allocates all PLT entries in a single core
section, since the overhead of having a separate init PLT section is
not justified by the small number of PLT entries usually required for
init code.
However, the core and init module regions are allocated independently,
and there is a corner case where the core region may be allocated from
the VMALLOC region if the dedicated module region is exhausted, but the
init region, being much smaller, can still be allocated from the module
region. This leads to relocation failures if the distance between those
regions exceeds 128 MB. (In fact, this corner case is highly unlikely to
occur on arm64, but the issue has been observed on ARM, whose module
region is much smaller).
So split the core and init PLT regions, and name the latter ".init.plt"
so it gets allocated along with (and sufficiently close to) the .init
sections that it serves. Also, given that init PLT entries may need to
be emitted for branches that target the core module, modify the logic
that disregards defined symbols to only disregard symbols that are
defined in the same section as the relocated branch instruction.
Since there may now be two PLT entries associated with each entry in
the symbol table, we can no longer hijack the symbol::st_size fields
to record the addresses of PLT entries as we emit them for zero-addend
relocations. So instead, perform an explicit comparison to check for
duplicate entries.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-02-21 22:12:57 +00:00
|
|
|
mod->arch.core.plt_num_entries = 0;
|
|
|
|
mod->arch.core.plt_max_entries = core_plts;
|
|
|
|
|
2018-11-05 19:53:23 +01:00
|
|
|
pltsec = sechdrs + mod->arch.init.plt_shndx;
|
|
|
|
pltsec->sh_type = SHT_NOBITS;
|
|
|
|
pltsec->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
|
|
|
|
pltsec->sh_addralign = L1_CACHE_BYTES;
|
|
|
|
pltsec->sh_size = (init_plts + 1) * sizeof(struct plt_entry);
|
arm64: module: split core and init PLT sections
The arm64 module PLT code allocates all PLT entries in a single core
section, since the overhead of having a separate init PLT section is
not justified by the small number of PLT entries usually required for
init code.
However, the core and init module regions are allocated independently,
and there is a corner case where the core region may be allocated from
the VMALLOC region if the dedicated module region is exhausted, but the
init region, being much smaller, can still be allocated from the module
region. This leads to relocation failures if the distance between those
regions exceeds 128 MB. (In fact, this corner case is highly unlikely to
occur on arm64, but the issue has been observed on ARM, whose module
region is much smaller).
So split the core and init PLT regions, and name the latter ".init.plt"
so it gets allocated along with (and sufficiently close to) the .init
sections that it serves. Also, given that init PLT entries may need to
be emitted for branches that target the core module, modify the logic
that disregards defined symbols to only disregard symbols that are
defined in the same section as the relocated branch instruction.
Since there may now be two PLT entries associated with each entry in
the symbol table, we can no longer hijack the symbol::st_size fields
to record the addresses of PLT entries as we emit them for zero-addend
relocations. So instead, perform an explicit comparison to check for
duplicate entries.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-02-21 22:12:57 +00:00
|
|
|
mod->arch.init.plt_num_entries = 0;
|
|
|
|
mod->arch.init.plt_max_entries = init_plts;
|
|
|
|
|
2017-11-20 17:41:30 +00:00
|
|
|
if (tramp) {
|
|
|
|
tramp->sh_type = SHT_NOBITS;
|
|
|
|
tramp->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
|
|
|
|
tramp->sh_addralign = __alignof__(struct plt_entry);
|
arm64: implement ftrace with regs
This patch implements FTRACE_WITH_REGS for arm64, which allows a traced
function's arguments (and some other registers) to be captured into a
struct pt_regs, allowing these to be inspected and/or modified. This is
a building block for live-patching, where a function's arguments may be
forwarded to another function. This is also necessary to enable ftrace
and in-kernel pointer authentication at the same time, as it allows the
LR value to be captured and adjusted prior to signing.
Using GCC's -fpatchable-function-entry=N option, we can have the
compiler insert a configurable number of NOPs between the function entry
point and the usual prologue. This also ensures functions are AAPCS
compliant (e.g. disabling inter-procedural register allocation).
For example, with -fpatchable-function-entry=2, GCC 8.1.0 compiles the
following:
| unsigned long bar(void);
|
| unsigned long foo(void)
| {
| return bar() + 1;
| }
... to:
| <foo>:
| nop
| nop
| stp x29, x30, [sp, #-16]!
| mov x29, sp
| bl 0 <bar>
| add x0, x0, #0x1
| ldp x29, x30, [sp], #16
| ret
This patch builds the kernel with -fpatchable-function-entry=2,
prefixing each function with two NOPs. To trace a function, we replace
these NOPs with a sequence that saves the LR into a GPR, then calls an
ftrace entry assembly function which saves this and other relevant
registers:
| mov x9, x30
| bl <ftrace-entry>
Since patchable functions are AAPCS compliant (and the kernel does not
use x18 as a platform register), x9-x18 can be safely clobbered in the
patched sequence and the ftrace entry code.
There are now two ftrace entry functions, ftrace_regs_entry (which saves
all GPRs), and ftrace_entry (which saves the bare minimum). A PLT is
allocated for each within modules.
Signed-off-by: Torsten Duwe <duwe@suse.de>
[Mark: rework asm, comments, PLTs, initialization, commit message]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Torsten Duwe <duwe@suse.de>
Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Tested-by: Torsten Duwe <duwe@suse.de>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Julien Thierry <jthierry@redhat.com>
Cc: Will Deacon <will@kernel.org>
2019-02-08 16:10:19 +01:00
|
|
|
tramp->sh_size = NR_FTRACE_PLTS * sizeof(struct plt_entry);
|
2017-11-20 17:41:30 +00:00
|
|
|
}
|
|
|
|
|
2015-11-24 12:37:35 +01:00
|
|
|
return 0;
|
|
|
|
}
|