License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.
This patch is based on work done by Thomas Gleixner, Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side-by-side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source.
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, the file was
considered to have no license information in it, and the top-level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note", otherwise it was "GPL-2.0". The results were:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL-family license was found in the file or if it had no licensing
in it (per the prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version, with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types). Finally, Greg ran the script using the .csv files to
generate the patches.
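For illustration, the tag takes the comment form each file type expects,
e.g.:

  // SPDX-License-Identifier: GPL-2.0      (first line of a .c file)
  /* SPDX-License-Identifier: GPL-2.0 */   (first line of a .h file)
  # SPDX-License-Identifier: GPL-2.0       (Makefiles, Kconfig, scripts)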
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 14:07:57 +00:00
# SPDX-License-Identifier: GPL-2.0
2007-11-12 19:54:30 +00:00
# Select 32 or 64 bit
config 64BIT
kconfig: reference environment variables directly and remove 'option env='
To get access to environment variables, Kconfig needs to define a
symbol using "option env=" syntax. It is tedious to add a symbol entry
for each environment variable given that we need to define much more
such as 'CC', 'AS', 'srctree' etc. to evaluate the compiler capability
in Kconfig.
Adding '$' for symbol references is grammatically inconsistent.
Looking at the code, the symbols prefixed with '$' are expanded by:
- conf_expand_value()
This is used to expand 'arch/$ARCH/defconfig' and 'defconfig_list'
- sym_expand_string_value()
This is used to expand strings in 'source' and 'mainmenu'
All of them are fixed values independent of user configuration. So,
they can be changed into the direct expansion instead of symbols.
This change makes the code much cleaner. The bounce symbols 'SRCARCH',
'ARCH', 'SUBARCH', 'KERNELVERSION' are gone.
sym_init() hard-coding 'UNAME_RELEASE' is also gone. 'UNAME_RELEASE'
should be replaced with an environment variable.
ARCH_DEFCONFIG is a normal symbol, so it should be simply referenced
without '$' prefix.
The new syntax is adopted from Make. The variable reference needs
parentheses, like $(FOO), but you can omit them for single-letter
variables, like $F. Yet, in Makefiles, people tend to use the
parenthetical form for consistency / clarification.
At this moment, only environment variables are supported, but I will
extend the concept of 'variable' later on.
The variables are expanded in the lexer so we can simplify the token
handling on the parser side.
For example, the following code works.
[Example code]
config MY_TOOLCHAIN_LIST
string
default "My tools: CC=$(CC), AS=$(AS), CPP=$(CPP)"
[Result]
$ make -s alldefconfig && tail -n 1 .config
CONFIG_MY_TOOLCHAIN_LIST="My tools: CC=gcc, AS=as, CPP=gcc -E"
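For comparison, a sketch of the 'option env=' bounce-symbol pattern this
change removes; the symbol existed only to forward the environment
variable, which could then be referenced with a bare '$':
[Old style]
  config SRCARCH
          string
          option env="SRCARCH"

  source "arch/$SRCARCH/Kconfig"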
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
2018-05-28 09:21:40 +00:00
bool "64-bit kernel" if "$(ARCH)" = "x86"
default "$(ARCH)" != "i386"
2020-06-13 16:50:22 +00:00
help
2007-11-12 19:54:30 +00:00
Say yes to build a 64-bit kernel - formerly known as x86_64
Say no to build a 32-bit kernel - formerly known as i386

config X86_32
2012-09-10 11:41:45 +00:00
def_bool y
depends on !64BIT
2016-11-15 09:04:55 +00:00
# Options that are inherently 32-bit kernel only:
select ARCH_WANT_IPC_PARSE_VERSION
select CLKSRC_I8253
select CLONE_BACKWARDS
2020-11-03 09:27:20 +00:00
select GENERIC_VDSO_32
2019-04-14 16:00:08 +00:00
select HAVE_DEBUG_STACKOVERFLOW
2020-11-03 09:27:20 +00:00
select KMAP_LOCAL
2016-11-15 09:04:55 +00:00
select MODULES_USE_ELF_REL
select OLD_SIGACTION
2020-11-30 22:30:59 +00:00
select ARCH_SPLIT_ARG64
2007-11-12 19:54:30 +00:00

config X86_64
2012-09-10 11:41:45 +00:00
def_bool y
depends on 64BIT
2016-11-15 09:11:57 +00:00
# Options that are inherently 64-bit kernel only:
2019-05-14 00:19:04 +00:00
select ARCH_HAS_GIGANTIC_PAGE
2019-11-08 12:22:27 +00:00
select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
2023-02-27 17:36:28 +00:00
select ARCH_SUPPORTS_PER_VMA_LOCK
2024-08-26 20:43:51 +00:00
select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
2016-11-15 09:11:57 +00:00
select HAVE_ARCH_SOFT_DIRTY
select MODULES_USE_ELF_RELA
2018-05-09 04:53:49 +00:00
select NEED_DMA_MAP_STATE
2018-04-24 07:00:54 +00:00
select SWIOTLB
2020-06-14 03:03:25 +00:00
select ARCH_HAS_ELFCORE_COMPAT
2021-07-01 01:52:20 +00:00
select ZONE_DMA32
2024-05-05 16:06:25 +00:00
select EXECMEM if DYNAMIC_FTRACE
2007-11-06 20:35:08 +00:00

2019-05-10 16:05:46 +00:00
config FORCE_DYNAMIC_FTRACE
def_bool y
depends on X86_32
depends on FUNCTION_TRACER
select DYNAMIC_FTRACE
help
2022-05-25 13:32:02 +00:00
We keep the static function tracing (!DYNAMIC_FTRACE) around
in order to test the non static function tracing in the
generic code, as other architectures still use it. But we
only need to keep it around for x86_64. No need to keep it
for x86_32. For x86_32, force DYNAMIC_FTRACE.
2016-11-15 09:11:57 +00:00
#
# Arch settings
#
# ( Note that options that are marked 'if X86_64' could in principle be
# ported to 32-bit as well. )
#
2007-11-06 22:30:30 +00:00
config X86
2008-01-30 12:31:03 +00:00
def_bool y
2016-11-15 09:26:39 +00:00
#
# Note: keep this list sorted alphabetically
#
2015-06-03 08:00:13 +00:00
select ACPI_LEGACY_TABLES_LOOKUP if ACPI
select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI
2023-11-21 13:44:15 +00:00
select ACPI_HOTPLUG_CPU if ACPI_PROCESSOR && HOTPLUG_CPU
32-bit userspace ABI: introduce ARCH_32BIT_OFF_T config option
All new 32-bit architectures should have a 64-bit userspace off_t type, but
existing architectures have 32-bit ones.
To enforce the rule, a new config option is added to arch/Kconfig that defaults
ARCH_32BIT_OFF_T to be disabled for new 32-bit architectures. All existing
32-bit architectures enable it explicitly.
The new option affects force_o_largefile() behaviour. Namely, if userspace
off_t is 64 bits long, there is no reason to prevent the user from opening big files.
Note that even if architectures have only a 64-bit off_t in the kernel
(arc, c6x, h8300, hexagon, nios2, openrisc, and unicore32),
a libc may use 32-bit off_t, and therefore want to limit the file size
to 4GB unless specified differently in the open flags.
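The resulting check is tiny; a minimal sketch of the generic
force_o_largefile() definition keyed off the new option (assuming the
architecture does not override it):

  /* Force O_LARGEFILE unless userspace off_t is only 32 bits wide. */
  #ifndef force_o_largefile
  #define force_o_largefile() (!IS_ENABLED(CONFIG_ARCH_32BIT_OFF_T))
  #endif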
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Yury Norov <ynorov@marvell.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-05-16 08:18:49 +00:00
select ARCH_32BIT_OFF_T if X86_32
2018-09-17 12:45:35 +00:00
select ARCH_CLOCKSOURCE_INIT
2024-04-20 00:05:54 +00:00
select ARCH_CONFIGURES_CPU_MITIGATIONS
2021-10-25 11:41:52 +00:00
select ARCH_CORRECT_STACKTRACE_ON_KRETPROBE
2021-05-05 01:38:21 +00:00
select ARCH_ENABLE_HUGEPAGE_MIGRATION if X86_64 && HUGETLB_PAGE && MIGRATION
2021-11-05 20:44:39 +00:00
select ARCH_ENABLE_MEMORY_HOTPLUG if X86_64
2021-05-05 01:38:17 +00:00
select ARCH_ENABLE_MEMORY_HOTREMOVE if MEMORY_HOTPLUG
2021-07-01 01:51:58 +00:00
select ARCH_ENABLE_SPLIT_PMD_PTLOCK if (PGTABLE_LEVELS > 2) && (X86_64 || X86_PAE)
2021-05-05 01:38:21 +00:00
select ARCH_ENABLE_THP_MIGRATION if X86_64 && TRANSPARENT_HUGEPAGE
2016-11-15 09:26:39 +00:00
select ARCH_HAS_ACPI_TABLE_UPGRADE if ACPI
2021-05-05 01:38:09 +00:00
select ARCH_HAS_CACHE_LINE_SIZE
memregion: Add cpu_cache_invalidate_memregion() interface
With CXL security features, and CXL dynamic provisioning, global CPU
cache flushing nvdimm requirements are no longer specific to that
subsystem, even beyond the scope of security_ops. CXL will need such
semantics for features not necessarily limited to persistent memory.
The functionality this is enabling is to be able to instantaneously
secure erase potentially terabytes of memory at once and the kernel
needs to be sure that none of the data from before the erase is still
present in the cache. It is also used when unlocking a memory device
where speculative reads and firmware accesses could have cached poison
from before the device was unlocked. Lastly this facility is used when
mapping new devices, or new capacity into an established physical
address range. I.e. when the driver switches DeviceA mapping AddressX to
DeviceB mapping AddressX then any cached data from DeviceA:AddressX
needs to be invalidated.
This capability is typically only used once per-boot (for unlock), or
once per bare metal provisioning event (secure erase), like when handing
off the system to another tenant or decommissioning a device. It may
also be used for dynamic CXL region provisioning.
Users must first call cpu_cache_has_invalidate_memregion() to know
whether this functionality is available on the architecture. On x86 this
respects the constraints of when wbinvd() is tolerable. It is already
the case that wbinvd() is problematic to allow in VMs due its global
performance impact and KVM, for example, has been known to just trap and
ignore the call. With confidential computing guest execution of wbinvd()
may even trigger an exception. Given guests should not be messing with
the bare metal address map via CXL configuration changes
cpu_cache_has_invalidate_memregion() returns false in VMs.
While this global cache invalidation facility is exported to modules,
since NVDIMM and CXL support can be built as a module, it is not for
general use. The intent is that this facility is not available outside
of specific "device-memory" use cases. To make that expectation as clear
as possible the API is scoped to a new "DEVMEM" module namespace that
only the NVDIMM and CXL subsystems are expected to import.
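A DEVMEM-importing caller would follow this pattern (a minimal sketch;
IORES_DESC_PERSISTENT_MEMORY stands in for whatever resource descriptor
the subsystem actually manages):

  MODULE_IMPORT_NS(DEVMEM);

  /* bail out where wbinvd() is not tolerable, e.g. in a VM */
  if (!cpu_cache_has_invalidate_memregion())
          return -EOPNOTSUPP;
  cpu_cache_invalidate_memregion(IORES_DESC_PERSISTENT_MEMORY);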
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Tested-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-10-28 18:34:04 +00:00
select ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
2023-06-13 23:39:24 +00:00
select ARCH_HAS_CPU_FINALIZE_INIT
2023-10-27 00:05:20 +00:00
select ARCH_HAS_CPU_PASID if IOMMU_SVA
2022-02-16 20:05:28 +00:00
select ARCH_HAS_CURRENT_STACK_POINTER
2017-01-10 21:35:40 +00:00
select ARCH_HAS_DEBUG_VIRTUAL
mm/debug: add tests validating architecture page table helpers
This adds tests which will validate architecture page table helpers and
other accessors in their compliance with expected generic MM semantics.
This will help various architectures in validating changes to existing
page table helpers or addition of new ones.
This test covers basic page table entry transformations including but not
limited to old, young, dirty, clean, write, write protect etc. at various
levels, along with populating intermediate entries with the next page table page
and validating them.
Test page table pages are allocated from system memory with required size
and alignments. The mapped pfns at page table levels are derived from a
real pfn representing a valid kernel text symbol. This test gets called
via late_initcall().
This test gets built and run when CONFIG_DEBUG_VM_PGTABLE is selected.
Any architecture which is willing to subscribe to this test will need to
select ARCH_HAS_DEBUG_VM_PGTABLE. For now this is limited to arc, arm64,
x86, s390 and powerpc platforms where the test is known to build and run
successfully. Going forward, other architectures too can subscribe to the
test after fixing any build or runtime problems with their page table helpers.
Folks interested in making sure that a given platform's page table helpers
conform to expected generic MM semantics should enable the above config
which will just trigger this test during boot. Any non-conformity here
will be reported as a warning which would need to be fixed. This test
will help catch any changes to the agreed upon semantics expected from
generic MM and enable platforms to accommodate it thereafter.
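The invariants asserted are simple identities over the helpers (a sketch
of the flavor, not the exact test list):

  /* modifying then querying a PTE must round-trip as expected */
  WARN_ON(!pte_young(pte_mkyoung(pte_mkold(pte))));
  WARN_ON(!pte_dirty(pte_mkdirty(pte_mkclean(pte))));
  WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte))));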
[anshuman.khandual@arm.com: v17]
Link: http://lkml.kernel.org/r/1587436495-22033-3-git-send-email-anshuman.khandual@arm.com
[anshuman.khandual@arm.com: v18]
Link: http://lkml.kernel.org/r/1588564865-31160-3-git-send-email-anshuman.khandual@arm.com
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> [ppc32]
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Link: http://lkml.kernel.org/r/1583919272-24178-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04 23:47:15 +00:00
select ARCH_HAS_DEBUG_VM_PGTABLE if !X86_PAE
2015-11-20 02:19:29 +00:00
select ARCH_HAS_DEVMEM_IS_ALLOWED
2024-08-28 06:02:47 +00:00
select ARCH_HAS_DMA_OPS if GART_IOMMU || XEN
2020-05-07 20:08:42 +00:00
select ARCH_HAS_EARLY_DEBUG if KGDB
2015-06-03 08:00:13 +00:00
select ARCH_HAS_ELF_RANDOMIZE
2024-10-23 16:27:11 +00:00
select ARCH_HAS_EXECMEM_ROX if X86_64
2014-09-13 18:14:53 +00:00
select ARCH_HAS_FAST_MULTIPLIER
include/linux/string.h: add the option of fortified string.h functions
This adds support for compiling with a rough equivalent to the glibc
_FORTIFY_SOURCE=1 feature, providing compile-time and runtime buffer
overflow checks for string.h functions when the compiler determines the
size of the source or destination buffer at compile-time. Unlike glibc,
it covers buffer reads in addition to writes.
GNU C __builtin_*_chk intrinsics are avoided because they would force a
much more complex implementation. They aren't designed to detect read
overflows and offer no real benefit when using an implementation based
on inline checks. Inline checks don't add up to much code size and
allow full use of the regular string intrinsics while avoiding the need
for a bunch of _chk functions and per-arch assembly to avoid wrapper
overhead.
This detects various overflows at compile-time in various drivers and
some non-x86 core kernel code. There will likely be issues caught in
regular use at runtime too.
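The shape of the inline checks, reduced to one example (a simplified
sketch of the idea, not the exact kernel implementation):

  __FORTIFY_INLINE void *memcpy(void *p, const void *q, __kernel_size_t size)
  {
          size_t p_size = __builtin_object_size(p, 0);
          size_t q_size = __builtin_object_size(q, 0);

          if (__builtin_constant_p(size)) {
                  if (p_size < size)
                          __write_overflow();     /* compile-time error */
                  if (q_size < size)
                          __read_overflow2();     /* compile-time error */
          }
          if (p_size < size || q_size < size)
                  fortify_panic(__func__);        /* runtime check */
          return __builtin_memcpy(p, q, size);
  }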
Future improvements left out of initial implementation for simplicity,
as it's all quite optional and can be done incrementally:
* Some of the fortified string functions (strncpy, strcat) don't yet
place a limit on reads from the source based on __builtin_object_size of
the source buffer.
* Extending coverage to more string functions like strlcat.
* It should be possible to optionally use __builtin_object_size(x, 1) for
some functions (C strings) to detect intra-object overflows (like
glibc's _FORTIFY_SOURCE=2), but for now this takes the conservative
approach to avoid likely compatibility issues.
* The compile-time checks should be made available via a separate config
option which can be enabled by default (or always enabled) once enough
time has passed to get the issues it catches fixed.
Kees said:
"This is great to have. While it was out-of-tree code, it would have
blocked at least CVE-2016-3858 from being exploitable (improper size
argument to strlcpy()). I've sent a number of fixes for
out-of-bounds-reads that this detected upstream already"
[arnd@arndb.de: x86: fix fortified memcpy]
Link: http://lkml.kernel.org/r/20170627150047.660360-1-arnd@arndb.de
[keescook@chromium.org: avoid panic() in favor of BUG()]
Link: http://lkml.kernel.org/r/20170626235122.GA25261@beast
[keescook@chromium.org: move from -mm, add ARCH_HAS_FORTIFY_SOURCE, tweak Kconfig help]
Link: http://lkml.kernel.org/r/20170526095404.20439-1-danielmicay@gmail.com
Link: http://lkml.kernel.org/r/1497903987-21002-8-git-send-email-keescook@chromium.org
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 21:36:10 +00:00
select ARCH_HAS_FORTIFY_SOURCE
2014-12-13 00:57:44 +00:00
select ARCH_HAS_GCOV_PROFILE_ALL
2022-01-20 02:10:31 +00:00
select ARCH_HAS_KCOV if X86_64
2024-03-29 07:18:25 +00:00
select ARCH_HAS_KERNEL_FPU_SUPPORT
2019-08-06 04:49:14 +00:00
select ARCH_HAS_MEM_ENCRYPT
membarrier/x86: Provide core serializing command
There are two places where core serialization is needed by membarrier:
1) When returning from the membarrier IPI,
2) After scheduler updates curr to a thread with a different mm, before
going back to user-space, since the curr->mm is used by membarrier to
check whether it needs to send an IPI to that CPU.
x86-32 uses IRET as return from interrupt, and both IRET and SYSEXIT to go
back to user-space. The IRET instruction is core serializing, but not
SYSEXIT.
x86-64 uses IRET as return from interrupt, which takes care of the IPI.
However, it can return to user-space through either SYSRETL (compat
code), SYSRETQ, or IRET. Given that SYSRET{L,Q} is not core serializing,
we rely instead on write_cr3() performed by switch_mm() to provide core
serialization after changing the current mm, and deal with the special
case of kthread -> uthread (temporarily keeping current mm into
active_mm) by adding a sync_core() in that specific case.
Use the new sync_core_before_usermode() to guarantee this.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Andrew Hunter <ahh@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Avi Kivity <avi@scylladb.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: David Sehr <sehr@google.com>
Cc: Greg Hackmann <ghackmann@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maged Michael <maged.michael@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/20180129202020.8515-10-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-29 20:20:18 +00:00
select ARCH_HAS_MEMBARRIER_SYNC_CORE
2022-09-28 18:11:18 +00:00
select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
bpf: Restrict bpf_probe_read{, str}() only to archs where they work
Given the legacy bpf_probe_read{,str}() BPF helpers are broken on archs
with overlapping address ranges, we should really take the next step to
disable them from BPF use there.
To generally fix the situation, we've recently added new helper variants
bpf_probe_read_{user,kernel}() and bpf_probe_read_{user,kernel}_str().
For details on them, see 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel}
and probe_read_{user,kernel}_str helpers").
Given bpf_probe_read{,str}() have been around for ~5 years by now, there
are plenty of users at least on x86 still relying on them today, so we
cannot remove them entirely w/o breaking the BPF tracing ecosystem.
However, their use should be restricted to archs with non-overlapping
address ranges where they are working in their current form. Therefore,
move this behind a CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE and
have x86, arm64, arm select it (other archs supporting it can follow-up
on it as well).
The remaining archs can easily work around this by relying on the
feature probe from bpftool which spills out defines that can be used out
of BPF C code to implement the drop-in replacement for old/new kernels
via: bpftool feature probe macro
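Such a drop-in wrapper might look like this in BPF C code (a sketch; the
HAVE_* define name below is hypothetical and stands for whatever the
bpftool feature probe emits on the target kernel):

  #ifdef HAVE_BPF_PROBE_READ_KERNEL        /* hypothetical probed define */
  # define probe_read_compat(dst, sz, src) bpf_probe_read_kernel(dst, sz, src)
  #else
  # define probe_read_compat(dst, sz, src) bpf_probe_read(dst, sz, src)
  #endif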
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/bpf/20200515101118.6508-2-daniel@iogearbox.net
2020-05-15 10:11:16 +00:00
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
2016-11-15 09:26:39 +00:00
select ARCH_HAS_PMEM_API if X86_64
2024-10-04 12:46:54 +00:00
select ARCH_HAS_PREEMPT_LAZY
2019-07-16 23:30:47 +00:00
select ARCH_HAS_PTE_DEVMAP if X86_64
2018-06-08 00:06:08 +00:00
select ARCH_HAS_PTE_SPECIAL
2023-12-27 14:12:01 +00:00
select ARCH_HAS_HW_PTE_YOUNG
2022-09-18 07:59:59 +00:00
select ARCH_HAS_NONLEAF_PMD_YOUNG if PGTABLE_LEVELS > 2
2017-05-29 19:22:50 +00:00
select ARCH_HAS_UACCESS_FLUSHCACHE if X86_64
x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()
In reaction to a proposal to introduce a memcpy_mcsafe_fast()
implementation Linus points out that memcpy_mcsafe() is poorly named
relative to communicating the scope of the interface. Specifically what
addresses are valid to pass as source, destination, and what faults /
exceptions are handled.
Of particular concern is that even though x86 might be able to handle
the semantics of copy_mc_to_user() with its common copy_user_generic()
implementation other archs likely need / want an explicit path for this
case:
On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > However now I see that copy_user_generic() works for the wrong reason.
> > It works because the exception on the source address due to poison
> > looks no different than a write fault on the user address to the
> > caller, it's still just a short copy. So it makes copy_to_user() work
> > for the wrong reason relative to the name.
>
> Right.
>
> And it won't work that way on other architectures. On x86, we have a
> generic function that can take faults on either side, and we use it
> for both cases (and for the "in_user" case too), but that's an
> artifact of the architecture oddity.
>
> In fact, it's probably wrong even on x86 - because it can hide bugs -
> but writing those things is painful enough that everybody prefers
> having just one function.
Replace a single top-level memcpy_mcsafe() with either
copy_mc_to_user(), or copy_mc_to_kernel().
Introduce an x86 copy_mc_fragile() name as the rename for the
low-level x86 implementation formerly named memcpy_mcsafe(). It is used
as the slow / careful backend that is supplanted by a fast
copy_mc_generic() in a follow-on patch.
One side-effect of this reorganization is that separating copy_mc_64.S
to its own file means that perf no longer needs to track dependencies
for its memcpy_64.S benchmarks.
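Call sites then pick the variant by destination; both return the number
of bytes left uncopied (a minimal sketch):

  unsigned long rem = copy_mc_to_kernel(dst, src, len);

  if (rem)
          return -EIO;    /* short copy, e.g. poison consumed on read */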
[ bp: Massage a bit. ]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: <stable@vger.kernel.org>
Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com
2020-10-06 03:40:16 +00:00
select ARCH_HAS_COPY_MC if X86_64
2017-02-21 15:09:33 +00:00
select ARCH_HAS_SET_MEMORY
2019-04-26 00:11:34 +00:00
select ARCH_HAS_SET_DIRECT_MAP
2017-02-07 00:31:57 +00:00
select ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_HAS_STRICT_MODULE_RWX
2018-01-29 20:20:16 +00:00
select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
2020-03-13 19:51:42 +00:00
select ARCH_HAS_SYSCALL_WRAPPER
2024-01-28 18:45:29 +00:00
select ARCH_HAS_UBSAN
2020-06-03 23:03:58 +00:00
select ARCH_HAS_DEBUG_WX
2021-07-01 01:52:20 +00:00
select ARCH_HAS_ZONE_DMA_SET if EXPERT
2015-06-03 08:00:13 +00:00
select ARCH_HAVE_NMI_SAFE_CMPXCHG
x86/elf: Add a new FPU buffer layout info to x86 core files
Add a new .note section containing type, size, offset and flags of every
xfeature that is present.
This information will be used by debuggers to understand the XSAVE layout of
the machine where the core file has been dumped, and to read XSAVE registers,
especially during cross-platform debugging.
The XSAVE layouts of modern AMD and Intel CPUs differ, especially since
Memory Protection Keys and the AVX-512 features have been incorporated into
the AMD CPUs.
Since AMD never adopted (and hence never left room in the XSAVE layout for)
the Intel MPX feature, tools like GDB had assumed a fixed XSAVE layout
matching that of Intel (based on the XCR0 mask).
Hence, core dumps from AMD CPUs didn't match the known size for the XCR0 mask.
This resulted in GDB and other tools not being able to access the values of
the AVX-512 and PKRU registers on AMD CPUs.
To solve this, an interim solution has been accepted into GDB, and is already
a part of GDB 14, see
https://sourceware.org/pipermail/gdb-patches/2023-March/198081.html.
But it depends on heuristics based on the total XSAVE register set size
and the XCR0 mask to infer the layouts of the various register blocks
for core dumps, and hence, is not a foolproof mechanism to determine the
layout of the XSAVE area.
Therefore, add a new core dump note in order to allow GDB/LLDB and other
relevant tools to determine the layout of the XSAVE area of the machine where
the corefile was dumped.
The new core dump note (which is being proposed as a per-process .note
section), NT_X86_XSAVE_LAYOUT (0x205) contains an array of structures.
Each structure describes an individual extended feature containing
offset, size and flags in this format:
struct x86_xfeat_component {
u32 type;
u32 size;
u32 offset;
u32 flags;
};
and in an independent manner, allowing for future extensions without depending
on hw arch specifics like CPUID etc.
[ bp: Massage commit message, zap trailing whitespace. ]
Co-developed-by: Jini Susan George <jinisusan.george@amd.com>
Signed-off-by: Jini Susan George <jinisusan.george@amd.com>
Co-developed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Vignesh Balasubramanian <vigbalas@amd.com>
Link: https://lore.kernel.org/r/20240725161017.112111-2-vigbalas@amd.com
2024-07-25 16:10:18 +00:00
select ARCH_HAVE_EXTRA_ELF_NOTES
2023-08-08 09:14:56 +00:00
select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
2015-06-03 08:00:13 +00:00
select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
2013-10-08 02:18:07 +00:00
select ARCH_MIGHT_HAVE_PC_PARPORT
2014-01-01 19:34:16 +00:00
select ARCH_MIGHT_HAVE_PC_SERIO
2019-04-25 09:45:22 +00:00
select ARCH_STACKWALK
2018-07-24 09:48:45 +00:00
select ARCH_SUPPORTS_ACPI
2015-06-03 08:00:13 +00:00
select ARCH_SUPPORTS_ATOMIC_RMW
2020-12-15 03:10:30 +00:00
select ARCH_SUPPORTS_DEBUG_PAGEALLOC
2022-01-14 22:06:41 +00:00
select ARCH_SUPPORTS_PAGE_TABLE_CHECK if X86_64
2015-06-03 08:00:13 +00:00
select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
2020-11-18 19:48:41 +00:00
select ARCH_SUPPORTS_KMAP_LOCAL_FORCE_MAP if NR_CPUS <= 4096
2022-09-08 21:55:04 +00:00
select ARCH_SUPPORTS_CFI_CLANG if X86_64
select ARCH_USES_CFI_TRAPS if X86_64 && CFI_CLANG
2021-04-29 23:26:12 +00:00
select ARCH_SUPPORTS_LTO_CLANG
select ARCH_SUPPORTS_LTO_CLANG_THIN
2024-09-06 10:59:04 +00:00
select ARCH_SUPPORTS_RT
kbuild: Add AutoFDO support for Clang build
Add the build support for using Clang's AutoFDO. Building the kernel
with AutoFDO does not reduce the optimization level from the
compiler. AutoFDO uses hardware sampling to gather information about
the frequency of execution of different code paths within a binary.
This information is then used to guide the compiler's optimization
decisions, resulting in a more efficient binary. Experiments
showed that kernel latency can improve by up to 10%.
The support requires a Clang compiler, LLVM 17 or later. This submission
is limited to x86 platforms that support PMU features like LBR on
Intel machines and AMD Zen3 BRS. Support for SPE and BRBE on ARM
is part of planned future work.
Here is an example workflow for AutoFDO kernel:
1) Build the kernel on the host machine with LLVM enabled, for example,
$ make menuconfig LLVM=1
Turn on AutoFDO build config:
CONFIG_AUTOFDO_CLANG=y
With a configuration that has LLVM enabled, use the following
command:
scripts/config -e AUTOFDO_CLANG
After getting the config, build with
$ make LLVM=1
2) Install the kernel on the test machine.
3) Run the load tests. The '-c' option in perf specifies the sample
event period. We suggest using a suitable prime number,
like 500009, for this purpose.
For Intel platforms:
$ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> \
-o <perf_file> -- <loadtest>
For AMD platforms:
The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2
For Zen3:
$ cat /proc/cpuinfo | grep " brs"
For Zen4:
$ cat /proc/cpuinfo | grep amd_lbr_v2
$ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a \
-N -b -c <count> -o <perf_file> -- <loadtest>
4) (Optional) Download the raw perf file to the host machine.
5) To generate an AutoFDO profile, two offline tools are available:
create_llvm_prof and llvm_profgen. The create_llvm_prof tool is part
of the AutoFDO project and can be found on GitHub
(https://github.com/google/autofdo), version v0.30.1 or later. The
llvm_profgen tool is included in the LLVM compiler itself. It's
important to note that the version of llvm_profgen doesn't need to
match the version of Clang. It needs to be the LLVM 19 release or
later, or from the LLVM trunk.
$ llvm-profgen --kernel --binary=<vmlinux> --perfdata=<perf_file> \
-o <profile_file>
or
$ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
--format=extbinary --out=<profile_file>
Note that multiple AutoFDO profile files can be merged into one via:
$ llvm-profdata merge -o <profile_file> <profile_1> ... <profile_n>
6) Rebuild the kernel using the AutoFDO profile file with the same config
as step 1, (Note CONFIG_AUTOFDO_CLANG needs to be enabled):
$ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file>
Co-developed-by: Han Shen <shenhan@google.com>
Signed-off-by: Han Shen <shenhan@google.com>
Signed-off-by: Rong Xu <xur@google.com>
Suggested-by: Sriraman Tallam <tmsriram@google.com>
Suggested-by: Krzysztof Pszeniczny <kpszeniczny@google.com>
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Suggested-by: Stephane Eranian <eranian@google.com>
Tested-by: Yonghong Song <yonghong.song@linux.dev>
Tested-by: Yabin Cui <yabinc@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Tested-by: Peter Jung <ptr1337@cachyos.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-11-02 17:51:08 +00:00
select ARCH_SUPPORTS_AUTOFDO_CLANG
kbuild: Add Propeller configuration for kernel build
Add the build support for using Clang's Propeller optimizer. Like
AutoFDO, Propeller uses hardware sampling to gather information
about the frequency of execution of different code paths within a
binary. This information is then used to guide the compiler's
optimization decisions, resulting in a more efficient binary.
The support requires a Clang compiler LLVM 19 or later, and the
create_llvm_prof tool
(https://github.com/google/autofdo/releases/tag/v0.30.1). This
commit is limited to x86 platforms that support PMU features
like LBR on Intel machines and AMD Zen3 BRS.
Here is an example workflow for building an AutoFDO+Propeller
optimized kernel:
1) Build the kernel on the host machine, with AutoFDO and Propeller
build config
CONFIG_AUTOFDO_CLANG=y
CONFIG_PROPELLER_CLANG=y
then
$ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile>
“<autofdo_profile>” is the profile collected when doing a non-Propeller
AutoFDO build. This step builds a kernel that has the same optimization
level as AutoFDO, plus a metadata section that records basic block
information. This kernel image runs as fast as an AutoFDO optimized
kernel.
2) Install the kernel on test/production machines.
3) Run the load tests. The '-c' option in perf specifies the sample
event period. We suggest using a suitable prime number,
like 500009, for this purpose.
For Intel platforms:
$ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> \
-o <perf_file> -- <loadtest>
For AMD platforms:
The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2
# To see if Zen3 supports BRS:
$ cat /proc/cpuinfo | grep " brs"
# To see if Zen4 supports amd_lbr_v2:
$ cat /proc/cpuinfo | grep amd_lbr_v2
# If the result is yes, then collect the profile using:
$ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a \
-N -b -c <count> -o <perf_file> -- <loadtest>
4) (Optional) Download the raw perf file to the host machine.
5) Generate Propeller profile:
$ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
--format=propeller --propeller_output_module_name \
--out=<propeller_profile_prefix>_cc_profile.txt \
--propeller_symorder=<propeller_profile_prefix>_ld_profile.txt
“create_llvm_prof” is the profile conversion tool, and a prebuilt
binary for linux can be found on
https://github.com/google/autofdo/releases/tag/v0.30.1 (can also build
from source).
"<propeller_profile_prefix>" can be something like
"/home/user/dir/any_string".
This command generates a pair of Propeller profiles:
"<propeller_profile_prefix>_cc_profile.txt" and
"<propeller_profile_prefix>_ld_profile.txt".
6) Rebuild the kernel using the AutoFDO and Propeller profile files.
CONFIG_AUTOFDO_CLANG=y
CONFIG_PROPELLER_CLANG=y
and
$ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile> \
CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>
Co-developed-by: Han Shen <shenhan@google.com>
Signed-off-by: Han Shen <shenhan@google.com>
Signed-off-by: Rong Xu <xur@google.com>
Suggested-by: Sriraman Tallam <tmsriram@google.com>
Suggested-by: Krzysztof Pszeniczny <kpszeniczny@google.com>
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Suggested-by: Stephane Eranian <eranian@google.com>
Tested-by: Yonghong Song <yonghong.song@linux.dev>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-11-02 17:51:14 +00:00
select ARCH_SUPPORTS_PROPELLER_CLANG if X86_64
2015-06-03 08:00:13 +00:00
select ARCH_USE_BUILTIN_BSWAP
locking/lockref/x86: Enable ARCH_USE_CMPXCHG_LOCKREF for X86_CMPXCHG64
The following commit:
bc08b449ee14 ("lockref: implement lockless reference count updates using cmpxchg()")
enabled lockless reference count updates using cmpxchg() only for x86_64,
and left x86_32 behind due to the inability to detect support for
the cmpxchg8b instruction.
Nowadays, we can use CONFIG_X86_CMPXCHG64 for this purpose. Also,
by using try_cmpxchg64() instead of cmpxchg64() in the CMPXCHG_LOOP macro,
the compiler actually produces sane code, improving the
lockref_get_not_zero() main loop from:
eb: 8d 48 01 lea 0x1(%eax),%ecx
ee: 85 c0 test %eax,%eax
f0: 7e 2f jle 121 <lockref_get_not_zero+0x71>
f2: 8b 44 24 10 mov 0x10(%esp),%eax
f6: 8b 54 24 14 mov 0x14(%esp),%edx
fa: 8b 74 24 08 mov 0x8(%esp),%esi
fe: f0 0f c7 0e lock cmpxchg8b (%esi)
102: 8b 7c 24 14 mov 0x14(%esp),%edi
106: 89 c1 mov %eax,%ecx
108: 89 c3 mov %eax,%ebx
10a: 8b 74 24 10 mov 0x10(%esp),%esi
10e: 89 d0 mov %edx,%eax
110: 31 fa xor %edi,%edx
112: 31 ce xor %ecx,%esi
114: 09 f2 or %esi,%edx
116: 75 58 jne 170 <lockref_get_not_zero+0xc0>
to:
350: 8d 4f 01 lea 0x1(%edi),%ecx
353: 85 ff test %edi,%edi
355: 7e 79 jle 3d0 <lockref_get_not_zero+0xb0>
357: f0 0f c7 0e lock cmpxchg8b (%esi)
35b: 75 53 jne 3b0 <lockref_get_not_zero+0x90>
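The improvement falls out of the try_cmpxchg64() idiom (a minimal sketch;
on failure it refills 'old' with the value it observed, so the loop needs
no explicit re-read and compare):

  u64 old = READ_ONCE(*val);
  u64 new;

  do {
          new = old + 1;
  } while (!try_cmpxchg64(val, &old, new));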
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20230918184050.9180-1-ubizjak@gmail.com
2023-09-18 18:40:27 +00:00
select ARCH_USE_CMPXCHG_LOCKREF if X86_CMPXCHG64
2021-04-30 05:55:15 +00:00
select ARCH_USE_MEMTEST
2015-06-03 08:00:13 +00:00
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
2020-04-16 18:24:02 +00:00
select ARCH_USE_SYM_ANNOTATIONS
2017-05-28 17:00:14 +00:00
select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
2019-12-09 15:08:03 +00:00
select ARCH_WANT_DEFAULT_BPF_JIT if X86_64
2016-11-15 09:26:39 +00:00
select ARCH_WANTS_DYNAMIC_TASK_STRUCT
2021-06-21 23:18:22 +00:00
select ARCH_WANTS_NO_INSTR
2022-03-22 21:45:15 +00:00
select ARCH_WANT_GENERAL_HUGETLB
2019-06-27 22:00:11 +00:00
select ARCH_WANT_HUGE_PMD_SHARE
2020-11-19 20:46:56 +00:00
select ARCH_WANT_LD_ORPHAN_WARN
2023-07-24 19:07:53 +00:00
select ARCH_WANT_OPTIMIZE_DAX_VMEMMAP if X86_64
select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP if X86_64
mm, THP, swap: delay splitting THP during swap out
Patch series "THP swap: Delay splitting THP during swapping out", v11.
This patchset is to optimize the performance of Transparent Huge Page
(THP) swap.
Recently, the performance of the storage devices improved so fast that
we cannot saturate the disk bandwidth with single logical CPU when do
page swap out even on a high-end server machine. Because the
performance of the storage device improved faster than that of single
logical CPU. And it seems that the trend will not change in the near
future. On the other hand, the THP becomes more and more popular
because of increased memory size. So it becomes necessary to optimize
THP swap performance.
The advantages of the THP swap support include:
- Batch the swap operations for the THP to reduce lock
acquiring/releasing, including allocating/freeing the swap space,
adding/deleting to/from the swap cache, and writing/reading the swap
space, etc. This will help improve the performance of the THP swap.
- The THP swap space read/write will be 2M sequential IO. It is
particularly helpful for the swap read, which are usually 4k random
IO. This will improve the performance of the THP swap too.
- It will help reduce memory fragmentation, especially when the THP is
heavily used by the applications. The 2M contiguous pages will be
freed up after THP swapping out.
- It will improve the THP utilization on the system with the swap
turned on, because the speed for khugepaged to collapse the normal
pages into the THP is quite slow. After the THP is split during the
swapping out, it will take quite a long time for the normal pages to
collapse back into the THP after being swapped in. The high THP
utilization helps the efficiency of the page based memory management
too.
There are some concerns regarding THP swap in, mainly because possible
enlarged read/write IO size (for swap in/out) may put more overhead on
the storage device. To deal with that, the THP swap in should be turned
on only when necessary. For example, it can be selected via
"always/never/madvise" logic, to be turned on globally, turned off
globally, or turned on only for VMA with MADV_HUGEPAGE, etc.
This patchset is the first step for the THP swap support. The plan is
to delay splitting THP step by step, finally avoid splitting THP during
the THP swapping out and swap out/in the THP as a whole.
As the first step, in this patchset, the splitting huge page is delayed
from almost the first step of swapping out to after allocating the swap
space for the THP and adding the THP into the swap cache. This will
reduce lock acquiring/releasing for the locks used for the swap cache
management.
With the patchset, the swap out throughput improves 15.5% (from about
3.73GB/s to about 4.31GB/s) in the vm-scalability swap-w-seq test case
with 8 processes. The test is done on a Xeon E5 v3 system. The swap
device used is a RAM simulated PMEM (persistent memory) device. To test
the sequential swapping out, the test case creates 8 processes, which
sequentially allocate and write to the anonymous pages until the RAM and
part of the swap device is used up.
This patch (of 5):
In this patch, splitting huge page is delayed from almost the first step
of swapping out to after allocating the swap space for the THP
(Transparent Huge Page) and adding the THP into the swap cache. This
will batch the corresponding operation, thus improve THP swap out
throughput.
This is the first step for the THP swap optimization. The plan is to
delay splitting the THP step by step and avoid splitting the THP
finally.
In this patch, one swap cluster is used to hold the contents of each THP
swapped out. So, the size of the swap cluster is changed to that of the
THP (Transparent Huge Page) on x86_64 architecture (512). For other
architectures which want such THP swap optimization,
ARCH_USES_THP_SWAP_CLUSTER needs to be selected in the Kconfig file for
the architecture. In effect, this will enlarge the swap cluster size by 2
times on x86_64, which may make it harder to find a free cluster when
the swap space becomes fragmented, so in theory this may reduce
continuous swap space allocation and sequential writes. The
performance test in 0day shows no regressions caused by this.
In the future of THP swap optimization, some information of the swapped
out THP (such as compound map count) will be recorded in the
swap_cluster_info data structure.
The mem cgroup swap accounting functions are enhanced to support charge
or uncharge a swap cluster backing a THP as a whole.
The swap cluster allocate/free functions are added to allocate/free a
swap cluster for a THP. A fairly simple algorithm is used for swap
cluster allocation: only the first swap device in the priority list
will be tried for allocating the swap cluster. The function will fail if
the attempt is not successful, and the caller will fall back to allocating a
single swap slot instead. This works well enough for normal cases. If
the difference of the number of the free swap clusters among multiple
swap devices is significant, it is possible that some THPs are split
earlier than necessary. For example, this could be caused by big size
difference among multiple swap devices.
The swap cache functions are enhanced to support adding/deleting a THP to/from
the swap cache as a set of (HPAGE_PMD_NR) sub-pages. This may be
enhanced in the future with multi-order radix tree. But because we will
split the THP soon during swapping out, that optimization doesn't make
much sense for this first step.
The THP splitting functions are enhanced to support to split THP in swap
cache during swapping out. The page lock will be held during allocating
the swap cluster, adding the THP into the swap cache and splitting the
THP. So in code paths other than swapping out, if the THP needs to be
split, PageSwapCache(THP) will always be false.
The swap cluster is only available for SSD, so the THP swap optimization
in this patchset has no effect for HDD.
[ying.huang@intel.com: fix two issues in THP optimize patch]
Link: http://lkml.kernel.org/r/87k25ed8zo.fsf@yhuang-dev.intel.com
[hannes@cmpxchg.org: extensive cleanups and simplifications, reduce code size]
Link: http://lkml.kernel.org/r/20170515112522.32457-2-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Andrew Morton <akpm@linux-foundation.org> [for config option]
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> [for changes in huge_memory.c and huge_mm.h]
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Ebru Akagunduz <ebru.akagunduz@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 22:37:18 +00:00
select ARCH_WANTS_THP_SWAP if X86_64
2021-04-26 19:42:30 +00:00
select ARCH_HAS_PARANOID_L1D_FLUSH
2019-12-04 00:46:31 +00:00
select BUILDTIME_TABLE_SORT
2015-06-03 08:00:13 +00:00
select CLKEVT_I8253
select CLOCKSOURCE_WATCHDOG
2022-09-15 15:04:12 +00:00
# Word-size accesses may read uninitialized data past the trailing \0
# in strings and cause false KMSAN reports.
select DCACHE_WORD_ACCESS if !KMSAN
2021-10-21 22:55:06 +00:00
select DYNAMIC_SIGFRAME
Merge tag 'edac_for_4.2_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
- New APM X-Gene SoC EDAC driver (Loc Ho)
- AMD error injection module improvements (Aravind Gopalakrishnan)
- Altera Arria 10 support (Thor Thayer)
- misc fixes and cleanups all over the place
* tag 'edac_for_4.2_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (28 commits)
EDAC: Update Documentation/edac.txt
EDAC: Fix typos in Documentation/edac.txt
EDAC, mce_amd_inj: Set MISCV on injection
EDAC, mce_amd_inj: Move bit preparations before the injection
EDAC, mce_amd_inj: Cleanup and simplify README
EDAC, altera: Do not allow suspend when EDAC is enabled
EDAC, mce_amd_inj: Make inj_type static
arm: socfpga: dts: Add Arria10 SDRAM EDAC DTS support
EDAC, altera: Add Arria10 EDAC support
EDAC, altera: Refactor for Altera CycloneV SoC
EDAC, altera: Generalize driver to use DT Memory size
EDAC, mce_amd_inj: Add README file
EDAC, mce_amd_inj: Add individual permissions field to dfs_node
EDAC, mce_amd_inj: Modify flags attribute to use string arguments
EDAC, mce_amd_inj: Read out number of MCE banks from the hardware
EDAC, mce_amd_inj: Use MCE_INJECT_GET macro for bank node too
EDAC, xgene: Fix cpuid abuse
EDAC, mpc85xx: Extend error address to 64 bit
EDAC, mpc8xxx: Adapt for FSL SoC
EDAC, edac_stub: Drop arch-specific include
...
2015-06-25 02:52:06 +00:00
select EDAC_ATOMIC_SCRUB
select EDAC_SUPPORT
2015-06-03 08:00:13 +00:00
select GENERIC_CLOCKEVENTS_BROADCAST if X86_64 || (X86_32 && X86_LOCAL_APIC)
2024-02-28 22:13:00 +00:00
select GENERIC_CLOCKEVENTS_BROADCAST_IDLE if GENERIC_CLOCKEVENTS_BROADCAST
2015-06-03 08:00:13 +00:00
select GENERIC_CLOCKEVENTS_MIN_ADJUST
select GENERIC_CMOS_UPDATE
select GENERIC_CPU_AUTOPROBE
2023-11-21 13:45:01 +00:00
select GENERIC_CPU_DEVICES
2018-01-07 21:48:01 +00:00
select GENERIC_CPU_VULNERABILITIES
2014-04-07 22:39:49 +00:00
select GENERIC_EARLY_IOREMAP
2020-07-22 22:00:04 +00:00
select GENERIC_ENTRY
2015-06-03 08:00:13 +00:00
select GENERIC_IOMAP
2017-06-19 23:37:46 +00:00
select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP
2017-09-13 21:29:38 +00:00
select GENERIC_IRQ_MATRIX_ALLOCATOR if X86_LOCAL_APIC
2017-06-19 23:37:33 +00:00
select GENERIC_IRQ_MIGRATION if SMP
2015-06-03 08:00:13 +00:00
select GENERIC_IRQ_PROBE
2017-10-17 07:54:59 +00:00
select GENERIC_IRQ_RESERVATION_MODE
2015-06-03 08:00:13 +00:00
select GENERIC_IRQ_SHOW
select GENERIC_PENDING_IRQ if SMP
2020-02-04 01:36:24 +00:00
select GENERIC_PTDUMP
2015-06-03 08:00:13 +00:00
select GENERIC_SMP_IDLE_THREAD
select GENERIC_TIME_VSYSCALL
2019-06-21 09:52:49 +00:00
select GENERIC_GETTIMEOFDAY
2019-11-12 01:27:11 +00:00
select GENERIC_VDSO_TIME_NS
2024-03-25 06:40:12 +00:00
select GENERIC_VDSO_OVERFLOW_PROTECT
2022-10-21 12:51:44 +00:00
select GUP_GET_PXX_LOW_HIGH if X86_PAE
x86: Select HARDIRQS_SW_RESEND on x86
Modern x86 laptops are starting to use GPIO pins as interrupts more
and more, e.g. touchpads and touchscreens have almost all moved away
from PS/2 and USB to using I2C with a GPIO pin as interrupt.
Modern x86 laptops also have almost all moved to using s2idle instead
of using the system S3 ACPI power state to suspend.
The Intel and AMD pinctrl drivers do not define irq_retrigger handlers
for the irqchips they register, this is causing edge triggered interrupts
which happen while suspended using s2idle to get lost.
One specific example of this is the lid switch on some devices: lid
switches used to be handled by the embedded controller, but now the
lid open/closed sensor is sometimes directly connected to a GPIO pin.
On most devices the ACPI code for this looks like this:
Method (_E00, ...) {
Notify (LID0, 0x80) // Status Change
}
Where _E00 is an ACPI event handler for changes on both edges of the GPIO
connected to the lid sensor, this event handler is then combined with an
_LID method which directly reads the pin. When the device is resumed by
opening the lid, the GPIO interrupt will wake the system, but because the
pinctrl irqchip doesn't have an irq_retrigger handler, the Notify will not
happen. This is not a problem in the case the _LID method directly reads
the GPIO, because the drivers/acpi/button.c code will call _LID on resume
anyways.
But some devices have an event handler for the GPIO connected to the
lid sensor which looks like this:
Method (_E00, ...) {
if (LID_GPIO == One)
LIDS = One
else
LIDS = Zero
Notify (LID0, 0x80) // Status Change
}
And the _LID method returns the cached LIDS value. Since on open we
do not re-run the edge-interrupt handler when we re-enable IRQs on resume
(because of the missing irq_retrigger handler), _LID will keep
reporting closed, as LIDS was never changed to reflect the open status.
This causes userspace to suspend the laptop again shortly after opening
the lid.
The Intel GPIO controllers do not allow implementing irq_retrigger without
emulating it in software, at which point we are better off just using the
generic HARDIRQS_SW_RESEND mechanism rather than re-implementing software
emulation for this separately in approx. 14 different pinctrl drivers.
Select HARDIRQS_SW_RESEND to solve the problem of edge-triggered GPIO
interrupts not being re-triggered on resume when they were triggered during
suspend (s2idle) and/or when they were the cause of the wakeup.
This requires
008f1d60fe25 ("x86/apic/vector: Force interupt handler invocation to irq context")
c16816acd086 ("genirq: Add protection against unsafe usage of generic_handle_irq()")
to protect the APIC-based interrupts from being wrecked by a software
resend.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200123210242.53367-1-hdegoede@redhat.com
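For reference, the mechanism selected here amounts to roughly the following
(a simplified sketch, not the actual kernel/irq/resend.c code;
schedule_sw_resend() is a made-up placeholder name):

	/*
	 * Sketch: replay an edge interrupt that fired while it could not be
	 * handled. With an irq_retrigger() callback the hardware replays the
	 * edge itself; HARDIRQS_SW_RESEND covers irqchips (like these pinctrl
	 * ones) that cannot, by injecting the interrupt from software.
	 */
	static void resend_if_pending(struct irq_desc *desc)
	{
		struct irq_chip *chip = desc->irq_data.chip;

		if (!(desc->istate & IRQS_PENDING))
			return;
		desc->istate &= ~IRQS_PENDING;

		if (!chip->irq_retrigger ||
		    !chip->irq_retrigger(&desc->irq_data))
			schedule_sw_resend(desc);	/* placeholder helper */
	}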
	select HARDIRQS_SW_RESEND
	select HARDLOCKUP_CHECK_TIMESTAMP if X86_64
	select HAS_IOPORT
	select HAVE_ACPI_APEI if ACPI
	select HAVE_ACPI_APEI_NMI if ACPI
	select HAVE_ALIGNED_STRUCT_PAGE
	select HAVE_ARCH_AUDITSYSCALL
	select HAVE_ARCH_HUGE_VMAP if X86_64 || X86_PAE
	select HAVE_ARCH_HUGE_VMALLOC if X86_64
	select HAVE_ARCH_JUMP_LABEL
	select HAVE_ARCH_JUMP_LABEL_RELATIVE
	select HAVE_ARCH_KASAN if X86_64
	select HAVE_ARCH_KASAN_VMALLOC if X86_64
	select HAVE_ARCH_KFENCE
	select HAVE_ARCH_KMSAN if X86_64
	select HAVE_ARCH_KGDB
	select HAVE_ARCH_MMAP_RND_BITS if MMU
	select HAVE_ARCH_MMAP_RND_COMPAT_BITS if MMU && COMPAT
	select HAVE_ARCH_COMPAT_MMAP_BASES if MMU && COMPAT
arch: enable relative relocations for arm64, power and x86
Patch series "add support for relative references in special sections", v10.
This adds support for emitting special sections such as initcall arrays,
PCI fixups and tracepoints as relative references rather than absolute
references. This reduces the size by 50% on 64-bit architectures, but
more importantly, it removes the need for carrying relocation metadata for
these sections in relocatable kernels (e.g., for KASLR) that needs to be
fixed up at boot time. On arm64, this reduces the vmlinux footprint of
such a reference by 8x (8 byte absolute reference + 24 byte RELA entry vs
4 byte relative reference).
Patch #3 was sent out before as a single patch. This series supersedes
the previous submission. This version makes relative ksymtab entries
dependent on the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS rather
than trying to infer from kbuild test robot replies for which
architectures it should be blacklisted.
Patch #1 introduces the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS,
and sets it for the main architectures that are expected to benefit the
most from this feature, i.e., 64-bit architectures or ones that use
runtime relocations.
Patch #2 adds support for #define'ing __DISABLE_EXPORTS to get rid of
ksymtab/kcrctab sections in decompressor and EFI stub objects when
rebuilding existing C files to run in a different context.
Patches #4 - #6 implement relative references for initcalls, PCI fixups
and tracepoints, respectively, all of which produce sections on the order
of ~1000 entries on an arm64 defconfig kernel with tracing enabled. This
means we save about 28 KB of vmlinux space for each of these patches.
[From the v7 series blurb, which included the jump_label patches as well]:
For the arm64 kernel, all patches combined reduce the memory footprint
of vmlinux by about 1.3 MB (using a config copied from Ubuntu that has
KASLR enabled), of which ~1 MB is the size reduction of the RELA section
in .init, and the remaining 300 KB is reduction of .text/.data.
This patch (of 6):
Before updating certain subsystems to use place-relative 32-bit
relocations in special sections, to save space and reduce the number of
absolute relocations that need to be processed at runtime by relocatable
kernels, introduce the Kconfig symbol and define it for some architectures
that should be able to support and benefit from it.
Link: http://lkml.kernel.org/r/20180704083651.24360-2-ard.biesheuvel@linaro.org
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: James Morris <jmorris@namei.org>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
Cc: James Morris <james.morris@microsoft.com>
Cc: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
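The saving comes from storing a 32-bit offset from the entry's own address
instead of a 64-bit absolute pointer, which needs no relocation at load
time. A minimal sketch of the idea (not the kernel's exact macros):

	#include <stdint.h>

	/*
	 * A place-relative reference stores (target - &entry) in 4 bytes;
	 * the pointer is recovered by adding the entry's own address back.
	 * Relocating the whole image shifts both the entry and the target,
	 * so the stored difference stays valid with no fixup needed.
	 */
	static inline void *prel32_to_ptr(const int32_t *rel)
	{
		return (void *)((intptr_t)rel + *rel);
	}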
	select HAVE_ARCH_PREL32_RELOCATIONS
	select HAVE_ARCH_SECCOMP_FILTER
	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
	select HAVE_ARCH_STACKLEAK
	select HAVE_ARCH_TRACEHOOK
	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
	select HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD if X86_64
	select HAVE_ARCH_USERFAULTFD_WP if X86_64 && USERFAULTFD
userfaultfd: add minor fault registration mode
Patch series "userfaultfd: add minor fault handling", v9.
Overview
========
This series adds a new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS.
When enabled (via the UFFDIO_API ioctl), this feature means that any
hugetlbfs VMAs registered with UFFDIO_REGISTER_MODE_MISSING will *also*
get events for "minor" faults. By "minor" fault, I mean the following
situation:
Let there exist two mappings (i.e., VMAs) to the same page(s) (shared
memory). One of the mappings is registered with userfaultfd (in minor
mode), and the other is not. Via the non-UFFD mapping, the underlying
pages have already been allocated & filled with some contents. The UFFD
mapping has not yet been faulted in; when it is touched for the first
time, this results in what I'm calling a "minor" fault. As a concrete
example, when working with hugetlbfs, we have huge_pte_none(), but
find_lock_page() finds an existing page.
We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE. The idea
is, userspace resolves the fault by either a) doing nothing if the
contents are already correct, or b) updating the underlying contents using
the second, non-UFFD mapping (via memcpy/memset or similar, or something
fancier like RDMA, or etc...). In either case, userspace issues
UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are
correct, carry on setting up the mapping".
Use Case
========
Consider the use case of VM live migration (e.g. under QEMU/KVM):
1. While a VM is still running, we copy the contents of its memory to a
target machine. The pages are populated on the target by writing to the
non-UFFD mapping, using the setup described above. The VM is still running
(and therefore its memory is likely changing), so this may be repeated
several times, until we decide the target is "up to date enough".
2. We pause the VM on the source, and start executing on the target machine.
During this gap, the VM's user(s) will *see* a pause, so it is desirable to
minimize this window.
3. Between the last time any page was copied from the source to the target, and
when the VM was paused, the contents of that page may have changed - and
therefore the copy we have on the target machine is out of date. Although we
can keep track of which pages are out of date, for VMs with large amounts of
memory, it is "slow" to transfer this information to the target machine. We
want to resume execution before such a transfer would complete.
4. So, the guest begins executing on the target machine. The first time it
touches its memory (via the UFFD-registered mapping), userspace wants to
intercept this fault. Userspace checks whether or not the page is up to date,
and if not, copies the updated page from the source machine, via the non-UFFD
mapping. Finally, whether a copy was performed or not, userspace issues a
UFFDIO_CONTINUE ioctl to tell the kernel "I have ensured the page contents
are correct, carry on setting up the mapping".
We don't have to do all of the final updates on-demand. The userfaultfd manager
can, in the background, also copy over updated pages once it receives the map of
which pages are up-to-date or not.
Interaction with Existing APIs
==============================
Because this is a feature, a registered VMA could potentially receive both
missing and minor faults. I spent some time thinking through how the
existing API interacts with the new feature:
UFFDIO_CONTINUE cannot be used to resolve non-minor faults, as it does not
allocate a new page. If UFFDIO_CONTINUE is used on a non-minor fault:
- For non-shared memory or shmem, -EINVAL is returned.
- For hugetlb, -EFAULT is returned.
UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults.
Without modifications, the existing codepath assumes a new page needs to
be allocated. This is okay, since userspace must have a second
non-UFFD-registered mapping anyway, thus there isn't much reason to want
to use these in any case (just memcpy or memset or similar).
- If UFFDIO_COPY is used on a minor fault, -EEXIST is returned.
- If UFFDIO_ZEROPAGE is used on a minor fault, -EEXIST is returned (or -EINVAL
in the case of hugetlb, as UFFDIO_ZEROPAGE is unsupported in any case).
- UFFDIO_WRITEPROTECT simply doesn't work with shared memory, and returns
-ENOENT in that case (regardless of the kind of fault).
Future Work
===========
This series only supports hugetlbfs. I have a second series in flight to
support shmem as well, extending the functionality. This series is more
mature than the shmem support at this point, and the functionality works
fully on hugetlbfs, so this series can be merged first and then shmem
support will follow.
This patch (of 6):
This feature allows userspace to intercept "minor" faults. By "minor"
faults, I mean the following situation:
Let there exist two mappings (i.e., VMAs) to the same page(s). One of the
mappings is registered with userfaultfd (in minor mode), and the other is
not. Via the non-UFFD mapping, the underlying pages have already been
allocated & filled with some contents. The UFFD mapping has not yet been
faulted in; when it is touched for the first time, this results in what
I'm calling a "minor" fault. As a concrete example, when working with
hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing
page.
This commit adds the new registration mode, and sets the relevant flag on
the VMAs being registered. In the hugetlb fault path, if we find that we
have huge_pte_none(), but find_lock_page() does indeed find an existing
page, then we have a "minor" fault, and if the VMA has the userfaultfd
registration flag, we call into userfaultfd to handle it.
This is implemented as a new registration mode, instead of an API feature.
This is because the alternative implementation has significant drawbacks
[1].
However, doing it this way requires that we allocate a VM_* flag for the
new registration mode. On 32-bit systems, there are no unused bits, so this
feature is only supported on architectures with
CONFIG_ARCH_USES_HIGH_VMA_FLAGS. When attempting to register a VMA in
MINOR mode on 32-bit architectures, we return -EINVAL.
[1] https://lore.kernel.org/patchwork/patch/1380226/
[peterx@redhat.com: fix minor fault page leak]
Link: https://lkml.kernel.org/r/20210322175132.36659-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20210301222728.176417-1-axelrasmussen@google.com
Link: https://lkml.kernel.org/r/20210301222728.176417-2-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
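A rough userspace sketch of the resolution step described above (assuming
the uapi added by this series; error handling omitted):

	#include <linux/userfaultfd.h>
	#include <sys/ioctl.h>

	/*
	 * After ensuring the page contents are correct via the second,
	 * non-UFFD mapping, tell the kernel to carry on setting up the
	 * page table entries for the faulting range.
	 */
	static int resolve_minor_fault(int uffd, unsigned long addr,
				       unsigned long len)
	{
		struct uffdio_continue cont = {
			.range = { .start = addr, .len = len },
			.mode  = 0,
		};

		return ioctl(uffd, UFFDIO_CONTINUE, &cont);
	}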
	select HAVE_ARCH_USERFAULTFD_MINOR if X86_64 && USERFAULTFD
	select HAVE_ARCH_VMAP_STACK if X86_64
	select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
	select HAVE_ARCH_WITHIN_STACK_FRAMES
	select HAVE_ASM_MODVERSIONS
	select HAVE_CMPXCHG_DOUBLE
	select HAVE_CMPXCHG_LOCAL
	select HAVE_CONTEXT_TRACKING_USER if X86_64
	select HAVE_CONTEXT_TRACKING_USER_OFFSTACK if HAVE_CONTEXT_TRACKING_USER
	select HAVE_C_RECORDMCOUNT
	select HAVE_OBJTOOL_MCOUNT if HAVE_OBJTOOL
	select HAVE_OBJTOOL_NOP_MCOUNT if HAVE_OBJTOOL_MCOUNT
	select HAVE_BUILDTIME_MCOUNT_SORT
	select HAVE_DEBUG_KMEMLEAK
	select HAVE_DMA_CONTIGUOUS
	select HAVE_DYNAMIC_FTRACE
	select HAVE_DYNAMIC_FTRACE_WITH_REGS
	select HAVE_DYNAMIC_FTRACE_WITH_ARGS if X86_64
	select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
	select HAVE_SAMPLE_FTRACE_DIRECT if X86_64
	select HAVE_SAMPLE_FTRACE_DIRECT_MULTI if X86_64
bpf, x86_32: add eBPF JIT compiler for ia32
The JIT compiler emits ia32 instructions. Currently, it supports eBPF
only; classic BPF is supported through the conversion done by the BPF core.
Almost all instructions from the eBPF ISA are supported, except the following:
BPF_ALU64 | BPF_DIV | BPF_K
BPF_ALU64 | BPF_DIV | BPF_X
BPF_ALU64 | BPF_MOD | BPF_K
BPF_ALU64 | BPF_MOD | BPF_X
BPF_STX | BPF_XADD | BPF_W
BPF_STX | BPF_XADD | BPF_DW
It doesn't support BPF_JMP|BPF_CALL with BPF_PSEUDO_CALL at the moment.
IA32 has few general purpose registers: EAX|EDX|ECX|EBX|ESI|EDI. I use
EAX|EDX|ECX|EBX as temporary registers to simulate instructions in the eBPF
ISA, and allocate ESI|EDI to BPF_REG_AX for constant blinding; all other
eBPF registers, R0-R10, are simulated through scratch space on the stack.
The reasons behind the hardware register allocation policy are:
1: MUL needs EAX:EDX and shift operations need ECX, so they aren't fit
for general eBPF 64-bit register simulation.
2: We need at least 4 registers to simulate most eBPF ISA operations
on register operands instead of on register&memory operands.
3: We need to put BPF_REG_AX on hardware registers, or constant blinding
will degrade JIT performance heavily.
Tested on PC (Intel(R) Core(TM) i5-5200U CPU).
Testing results on i5-5200U:
1) test_bpf: Summary: 349 PASSED, 0 FAILED, [319/341 JIT'ed]
2) test_progs: Summary: 83 PASSED, 0 FAILED.
3) test_lpm: OK
4) test_lru_map: OK
5) test_verifier: Summary: 828 PASSED, 0 FAILED.
Above tests are all done in following two conditions separately:
1:bpf_jit_enable=1 and bpf_jit_harden=0
2:bpf_jit_enable=1 and bpf_jit_harden=2
Below are some numbers for this jit implementation:
Note:
I run test_progs in kselftest 100 times continuously for every condition,
the numbers are in format: total/times=avg.
The numbers that test_bpf reports show almost the same relation.
a:jit_enable=0 and jit_harden=0 b:jit_enable=1 and jit_harden=0
test_pkt_access:PASS:ipv4:15622/100=156 test_pkt_access:PASS:ipv4:10674/100=106
test_pkt_access:PASS:ipv6:9130/100=91 test_pkt_access:PASS:ipv6:4855/100=48
test_xdp:PASS:ipv4:240198/100=2401 test_xdp:PASS:ipv4:138912/100=1389
test_xdp:PASS:ipv6:137326/100=1373 test_xdp:PASS:ipv6:68542/100=685
test_l4lb:PASS:ipv4:61100/100=611 test_l4lb:PASS:ipv4:37302/100=373
test_l4lb:PASS:ipv6:101000/100=1010 test_l4lb:PASS:ipv6:55030/100=550
c:jit_enable=1 and jit_harden=2
test_pkt_access:PASS:ipv4:10558/100=105
test_pkt_access:PASS:ipv6:5092/100=50
test_xdp:PASS:ipv4:131902/100=1319
test_xdp:PASS:ipv6:77932/100=779
test_l4lb:PASS:ipv4:38924/100=389
test_l4lb:PASS:ipv6:57520/100=575
The numbers show we get 30%~50% improvement.
See Documentation/networking/filter.txt for more information.
Changelog:
Changes v5-v6:
1:Add do {} while (0) to RETPOLINE_RAX_BPF_JIT for
consistency reasons.
2:Clean up non-standard comments, reported by Daniel Borkmann.
3:Fix a memory leak issue, reported by Daniel Borkmann.
Changes v4-v5:
1:Delete is_on_stack, BPF_REG_AX is the only one
on real hardware registers, so just check with
it.
2:Apply commit 1612a981b766 ("bpf, x64: fix JIT emission
for dead code"), suggested by Daniel Borkmann.
Changes v3-v4:
1:Fix changelog in commit.
I install llvm-6.0, then test_progs won't report errors.
I submit another patch:
"bpf: fix misaligned access for BPF_PROG_TYPE_PERF_EVENT program type on x86_32 platform"
to fix another problem; after that patch, test_verifier won't report errors either.
2:Fix clear r0[1] twice unnecessarily in *BPF_IND|BPF_ABS* simulation.
Changes v2-v3:
1:Move BPF_REG_AX to real hardware registers for performance reason.
3:Using bpf_load_pointer instead of bpf_jit32.S, suggested by Daniel Borkmann.
4:Delete partial codes in 1c2a088a6626, suggested by Daniel Borkmann.
5:Some bug fixes and comments improvement.
Changes v1-v2:
1:Fix bug in emit_ia32_neg64.
2:Fix bug in emit_ia32_arsh_r64.
3:Delete filename in top level comment, suggested by Thomas Gleixner.
4:Delete unnecessary boiler plate text, suggested by Thomas Gleixner.
5:Rewrite some words in changelog.
6:CodingStyle improvement and a little more comments.
Signed-off-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
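To illustrate the 64-bit-on-32-bit simulation described above (a conceptual
model only, not the emitted JIT code):

	/*
	 * Each 64-bit eBPF register is kept as two 32-bit halves; a 64-bit
	 * add then becomes two 32-bit adds with a manually propagated carry,
	 * which is what the JIT has to emit with ia32 registers.
	 */
	typedef unsigned int u32;

	static void emu_add64(u32 dst[2], const u32 src[2])
	{
		u32 lo = dst[0] + src[0];

		dst[1] += src[1] + (lo < dst[0]);	/* carry into high half */
		dst[0] = lo;
	}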
	select HAVE_EBPF_JIT
	select HAVE_EFFICIENT_UNALIGNED_ACCESS
	select HAVE_EISA
	select HAVE_EXIT_THREAD
	select HAVE_GUP_FAST
	select HAVE_FENTRY if X86_64 || DYNAMIC_FTRACE
	select HAVE_FTRACE_MCOUNT_RECORD
	select HAVE_FUNCTION_GRAPH_RETVAL if HAVE_FUNCTION_GRAPH_TRACER
	select HAVE_FUNCTION_GRAPH_TRACER if X86_32 || (X86_64 && DYNAMIC_FTRACE)
	select HAVE_FUNCTION_TRACER
	select HAVE_GCC_PLUGINS
	select HAVE_HW_BREAKPOINT
	select HAVE_IOREMAP_PROT
	select HAVE_IRQ_EXIT_ON_IRQ_STACK if X86_64
	select HAVE_IRQ_TIME_ACCOUNTING
	select HAVE_JUMP_LABEL_HACK if HAVE_OBJTOOL
	select HAVE_KERNEL_BZIP2
	select HAVE_KERNEL_GZIP
	select HAVE_KERNEL_LZ4
	select HAVE_KERNEL_LZMA
	select HAVE_KERNEL_LZO
	select HAVE_KERNEL_XZ
	select HAVE_KERNEL_ZSTD
	select HAVE_KPROBES
	select HAVE_KPROBES_ON_FTRACE
	select HAVE_FUNCTION_ERROR_INJECTION
	select HAVE_KRETPROBES
	select HAVE_RETHOOK
	select HAVE_LIVEPATCH if X86_64
	select HAVE_MIXED_BREAKPOINTS_REGS
	select HAVE_MOD_ARCH_SPECIFIC
	select HAVE_MOVE_PMD
	select HAVE_MOVE_PUD
	select HAVE_NOINSTR_HACK if HAVE_OBJTOOL
	select HAVE_NMI
	select HAVE_NOINSTR_VALIDATION if HAVE_OBJTOOL
	select HAVE_OBJTOOL if X86_64
	select HAVE_OPTPROBES
	select HAVE_PAGE_SIZE_4KB
	select HAVE_PCSPKR_PLATFORM
	select HAVE_PERF_EVENTS
	select HAVE_PERF_EVENTS_NMI
	select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && HAVE_PERF_EVENTS_NMI
	select HAVE_PCI
	select HAVE_PERF_REGS
	select HAVE_PERF_USER_STACK_DUMP
	select MMU_GATHER_RCU_TABLE_FREE if PARAVIRT
	select MMU_GATHER_MERGE_VMAS
	select HAVE_POSIX_CPU_TIMERS_TASK_WORK
	select HAVE_REGS_AND_STACK_ACCESS_API
	select HAVE_RELIABLE_STACKTRACE if UNWINDER_ORC || STACK_VALIDATION
	select HAVE_FUNCTION_ARG_ACCESS_API
mm: percpu: generalize percpu related config
Patch series "mm: percpu: Cleanup percpu first chunk function".
When supporting the page mapping percpu first chunk allocator on arm64, we
found a lot of duplicated code in the percpu embed/page first chunk
allocators. This patchset aims to clean them up and should introduce no
functional change.
The current support status for 'embed' and 'page' across architectures is
shown below,
embed: NEED_PER_CPU_EMBED_FIRST_CHUNK
page: NEED_PER_CPU_PAGE_FIRST_CHUNK
embed page
------------------------
arm64 Y Y
mips Y N
powerpc Y Y
riscv Y N
sparc Y Y
x86 Y Y
------------------------
There are two interfaces for the percpu first chunk allocator,
extern int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
size_t atom_size,
pcpu_fc_cpu_distance_fn_t cpu_distance_fn,
- pcpu_fc_alloc_fn_t alloc_fn,
- pcpu_fc_free_fn_t free_fn);
+ pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);
extern int __init pcpu_page_first_chunk(size_t reserved_size,
- pcpu_fc_alloc_fn_t alloc_fn,
- pcpu_fc_free_fn_t free_fn,
- pcpu_fc_populate_pte_fn_t populate_pte_fn);
+ pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);
The pcpu_fc_alloc_fn_t/pcpu_fc_free_fn_t callbacks are killed; we provide
generic pcpu_fc_alloc() and pcpu_fc_free() functions, which are called in
pcpu_embed/page_first_chunk().
1) For pcpu_embed_first_chunk(), pcpu_fc_cpu_to_node_fn_t needs to be
provided when the arch supports NUMA.
2) For pcpu_page_first_chunk(), the pcpu_fc_populate_pte_fn_t callback is
killed too; a generic pcpu_populate_pte() marked '__weak' is provided, and
an arch that needs a different function to populate the PTEs (like x86)
can provide its own implementation.
[1] https://github.com/kevin78/linux.git percpu-cleanup
This patch (of 4):
The HAVE_SETUP_PER_CPU_AREA/NEED_PER_CPU_EMBED_FIRST_CHUNK/
NEED_PER_CPU_PAGE_FIRST_CHUNK/USE_PERCPU_NUMA_NODE_ID configs have
duplicate definitions on the platforms that use them.
Move them into mm, drop the redundant definitions, and instead just
select them on the applicable platforms.
Link: https://lkml.kernel.org/r/20211216112359.103822-1-wangkefeng.wang@huawei.com
Link: https://lkml.kernel.org/r/20211216112359.103822-2-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
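With the new interface an arch only supplies a cpu-to-node callback; a
sketch of what that callback can look like on a NUMA-aware arch
(illustrative, using x86's early mapping helper):

	/*
	 * Sketch: the only per-arch hook pcpu_embed_first_chunk() still
	 * needs is the cpu -> NUMA node mapping; allocation and freeing
	 * are handled by the generic pcpu_fc_alloc()/pcpu_fc_free().
	 */
	static int __init pcpu_cpu_to_node(int cpu)
	{
		return early_cpu_to_node(cpu);
	}

	/* passed as: pcpu_embed_first_chunk(reserved, dyn_size, atom_size,
	 *            pcpu_cpu_distance, pcpu_cpu_to_node); */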
	select HAVE_SETUP_PER_CPU_AREA
	select HAVE_SOFTIRQ_ON_OWN_STACK
	select HAVE_STACKPROTECTOR if CC_HAS_SANE_STACKPROTECTOR
	select HAVE_STACK_VALIDATION if HAVE_OBJTOOL
	select HAVE_STATIC_CALL
	select HAVE_STATIC_CALL_INLINE if HAVE_OBJTOOL
sched/preempt: Add PREEMPT_DYNAMIC using static keys
Where an architecture selects HAVE_STATIC_CALL but not
HAVE_STATIC_CALL_INLINE, each static call has an out-of-line trampoline
which will either branch to a callee or return to the caller.
On such architectures, a number of constraints can conspire to make
those trampolines more complicated and potentially less useful than we'd
like. For example:
* Hardware and software control flow integrity schemes can require the
addition of "landing pad" instructions (e.g. `BTI` for arm64), which
will also be present at the "real" callee.
* Limited branch ranges can require that trampolines generate or load an
address into a register and perform an indirect branch (or at least
have a slow path that does so). This loses some of the benefits of
having a direct branch.
* Interaction with SW CFI schemes can be complicated and fragile, e.g.
requiring that we can recognise idiomatic codegen and remove the
indirections, at least until clang provides more helpful mechanisms
for dealing with this.
For PREEMPT_DYNAMIC, we don't need the full power of static calls, as we
really only need to enable/disable specific preemption functions. We can
achieve the same effect without a number of the pain points above by
using static keys to fold early returns into the preemption functions
themselves rather than in an out-of-line trampoline, effectively
inlining the trampoline into the start of the function.
For arm64, this results in good code generation. For example, the
dynamic_cond_resched() wrapper looks as follows when enabled. When
disabled, the first `B` is replaced with a `NOP`, resulting in an early
return.
| <dynamic_cond_resched>:
| bti c
| b <dynamic_cond_resched+0x10> // or `nop`
| mov w0, #0x0
| ret
| mrs x0, sp_el0
| ldr x0, [x0, #8]
| cbnz x0, <dynamic_cond_resched+0x8>
| paciasp
| stp x29, x30, [sp, #-16]!
| mov x29, sp
| bl <preempt_schedule_common>
| mov w0, #0x1
| ldp x29, x30, [sp], #16
| autiasp
| ret
... compared to the regular form of the function:
| <__cond_resched>:
| bti c
| mrs x0, sp_el0
| ldr x1, [x0, #8]
| cbz x1, <__cond_resched+0x18>
| mov w0, #0x0
| ret
| paciasp
| stp x29, x30, [sp, #-16]!
| mov x29, sp
| bl <preempt_schedule_common>
| mov w0, #0x1
| ldp x29, x30, [sp], #16
| autiasp
| ret
Any architecture which implements static keys should be able to use this
to implement PREEMPT_DYNAMIC with similar cost to non-inlined static
calls. Since this is likely to have greater overhead than (inlined)
static calls, PREEMPT_DYNAMIC is only defaulted to enabled when
HAVE_PREEMPT_DYNAMIC_CALL is selected.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20220214165216.2231574-6-mark.rutland@arm.com
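In C terms the static-key variant folds the early return into the function
itself, roughly like this (a sketch along the lines of what the kernel
ended up doing; details differ):

	/*
	 * Sketch: toggle the preemption function with a static key rather
	 * than re-pointing an out-of-line static-call trampoline. When the
	 * key is off, the branch is patched to fall through to "return 0".
	 */
	static DEFINE_STATIC_KEY_FALSE(sk_dynamic_cond_resched);

	int dynamic_cond_resched(void)
	{
		if (!static_branch_unlikely(&sk_dynamic_cond_resched))
			return 0;		/* disabled: early return */
		return __cond_resched();
	}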
	select HAVE_PREEMPT_DYNAMIC_CALL
	select HAVE_RSEQ
	select HAVE_RUST if X86_64
	select HAVE_SYSCALL_TRACEPOINTS
	select HAVE_UACCESS_VALIDATION if HAVE_OBJTOOL
	select HAVE_UNSTABLE_SCHED_CLOCK
	select HAVE_USER_RETURN_NOTIFIER
	select HAVE_GENERIC_VDSO
	select VDSO_GETRANDOM if X86_64
	select HOTPLUG_PARALLEL if SMP && X86_64
	select HOTPLUG_SMT if SMP
	select HOTPLUG_SPLIT_STARTUP if SMP && X86_32
	select IRQ_FORCED_THREADING
mm: introduce new 'lock_mm_and_find_vma()' page fault helper
.. and make x86 use it.
This basically extracts the existing x86 "find and expand faulting vma"
code, but extends it to also take the mmap lock for writing in case we
actually do need to expand the vma.
We've historically short-circuited that case, and have some rather ugly
special logic to serialize the stack segment expansion (since we only
hold the mmap lock for reading) that doesn't match the normal VM
locking.
That slight violation of locking worked well, right up until it didn't:
the maple tree code really does want proper locking even for simple
extension of an existing vma.
So extract the code for "look up the vma of the fault" from x86, fix it
up to do the necessary write locking, and make it available as a helper
function for other architectures that can use the common helper.
Note: I say "common helper", but it really only handles the normal
stack-grows-down case. Which is all architectures except for PA-RISC
and IA64. So some rare architectures can't use the helper, but if they
care they'll just need to open-code this logic.
It's also worth pointing out that this code really would like to have an
optimistic "mmap_upgrade_trylock()" to make it quicker to go from a
read-lock (for the common case) to taking the write lock (for having to
extend the vma) in the normal single-threaded situation where there is
no other locking activity.
But that _is_ all the very uncommon special case, so while it would be
nice to have such an operation, it probably doesn't matter in reality.
I did put in the skeleton code for such a possible future expansion,
even if it only acts as pseudo-documentation for what we're doing.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
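A simplified sketch of the helper's shape (the careful retry handling and
VM_GROWSDOWN checks are omitted here):

	/*
	 * Sketch: read-lock for the common case, write-lock only when the
	 * faulting address needs a stack vma to be expanded.
	 */
	struct vm_area_struct *lock_mm_and_find_vma(struct mm_struct *mm,
						    unsigned long addr)
	{
		struct vm_area_struct *vma;

		mmap_read_lock(mm);
		vma = find_vma(mm, addr);
		if (vma && vma->vm_start <= addr)
			return vma;			/* common case, read lock held */

		mmap_read_unlock(mm);
		mmap_write_lock(mm);			/* rare case: must expand */
		vma = find_vma(mm, addr);
		if (vma && vma->vm_start > addr && expand_stack(vma, addr))
			vma = NULL;			/* expansion failed */
		if (vma)
			mmap_write_downgrade(mm);	/* back to a read lock */
		else
			mmap_write_unlock(mm);
		return vma;
	}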
	select LOCK_MM_AND_FIND_VMA
	select NEED_PER_CPU_EMBED_FIRST_CHUNK
	select NEED_PER_CPU_PAGE_FIRST_CHUNK
	select NEED_SG_DMA_LENGTH
	select NUMA_MEMBLKS if NUMA
	select PCI_DOMAINS if PCI
	select PCI_LOCKLESS_CONFIG if PCI
	select PERF_EVENTS
x86: Do full rtc synchronization with ntp
Every 11 minutes ntp attempts to update the x86 rtc with the current
system time. Currently, the x86 code only updates the rtc if the system
time is within +/-15 minutes of the current value of the rtc. This
was done originally to avoid setting the RTC if the RTC was in localtime
mode (common with Windows dualbooting). Other architectures do a full
synchronization and now that we have better infrastructure to detect
when the RTC is in localtime, there is no reason that x86 should be
software limited to a 30 minute window.
This patch changes the behavior of the kernel to do a full synchronization
(year, month, day, hour, minute, and second) of the rtc when ntp requests
a synchronization between the system time and the rtc.
I've used the RTC library functions in this patchset as they do all the
required bounds checking.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: x86@kernel.org
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: linux-efi@vger.kernel.org
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
[jstultz: Tweak commit message, fold in build fix found by fengguang
Also add select RTC_LIB to X86, per new dependency, as found by prarit]
Signed-off-by: John Stultz <john.stultz@linaro.org>
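Conceptually the change boils down to something like this (a sketch using
the RTC library helpers, not the literal patch):

	/*
	 * Sketch: convert the system time to a broken-down rtc_time and
	 * write every field - year, month, day, hour, minute, second -
	 * instead of only adjusting within a +/-15 minute window.
	 */
	static int sync_cmos_clock_full(time64_t now)
	{
		struct rtc_time tm;

		rtc_time64_to_tm(now, &tm);	/* library does the bounds work */
		return mc146818_set_time(&tm);
	}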
	select RTC_LIB
	select RTC_MC146818_LIB
	select SPARSE_IRQ
	select SYSCTL_EXCEPTION_TRACE
	select THREAD_INFO_IN_TASK
	select TRACE_IRQFLAGS_SUPPORT
	select TRACE_IRQFLAGS_NMI_SUPPORT
	select USER_STACKTRACE_SUPPORT
	select HAVE_ARCH_KCSAN if X86_64
	select PROC_PID_ARCH_STATUS if PROC_FS
x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node
== Problem ==
The amount of SGX memory on a system is determined by the BIOS and it
varies wildly between systems. It can be as small as dozens of MBs
and as large as many GBs on servers. Just like how applications need
to know how much regular RAM is available, enclave builders need to
know how much SGX memory an enclave can consume.
== Solution ==
Introduce a new sysfs file:
/sys/devices/system/node/nodeX/x86/sgx_total_bytes
to enumerate the amount of SGX memory available in each NUMA node.
This serves the same function for SGX as /proc/meminfo or
/sys/devices/system/node/nodeX/meminfo does for normal RAM.
'sgx_total_bytes' is needed today to help drive the SGX selftests.
SGX-specific swap code is exercised by creating overcommitted enclaves
which are larger than the physical SGX memory on the system. They
currently use a CPUID-based approach which can diverge from the actual
amount of SGX memory available. 'sgx_total_bytes' ensures that the
selftests can work efficiently and do not attempt stupid things like
creating a 100,000 MB enclave on a system with 128 MB of SGX memory.
== Implementation Details ==
Introduce CONFIG_HAVE_ARCH_NODE_DEV_GROUP opt-in flag to expose an
arch specific attribute group, and add an attribute for the amount of
SGX memory in bytes to each NUMA node:
== ABI Design Discussion ==
As opposed to the per-node ABI, a single, global ABI was considered.
However, this would prevent enclaves from being able to size
themselves so that they fit on a single NUMA node. Essentially, a
single value would rule out NUMA optimizations for enclaves.
Create a new "x86/" directory inside each "nodeX/" sysfs directory.
'sgx_total_bytes' is expected to be the first of at least a few
sgx-specific files to be placed in the new directory. Just scanning
/proc/meminfo, these are the no-brainers that we have for RAM, but we
need for SGX:
MemTotal: xxxx kB // sgx_total_bytes (implemented here)
MemFree: yyyy kB // sgx_free_bytes
SwapTotal: zzzz kB // sgx_swapped_bytes
So, at *least* three. I think we will eventually end up needing
something more along the lines of a dozen. A new directory (as
opposed to being in the nodeX/ "root" directory) avoids cluttering the
root with several "sgx_*" files.
Place the new file in a new "nodeX/x86/" directory because SGX is
highly x86-specific. It is very unlikely that any other architecture
(or even non-Intel x86 vendor) will ever implement SGX. Using "sgx/"
as opposed to "x86/" was also considered. But, there is a real chance
this can get used for other arch-specific purposes.
[ dhansen: rewrite changelog ]
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211116162116.93081-2-jarkko@kernel.org
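For example, a userspace consumer (such as the selftests mentioned above)
could size an enclave against the new file like this (illustrative only):

	/* Read the per-node SGX capacity exposed by this commit. */
	#include <stdio.h>

	int main(void)
	{
		unsigned long long bytes;
		FILE *f = fopen("/sys/devices/system/node/node0/x86/sgx_total_bytes", "r");

		if (!f || fscanf(f, "%llu", &bytes) != 1)
			return 1;
		fclose(f);
		printf("node0 SGX capacity: %llu bytes\n", bytes);
		return 0;
	}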
	select HAVE_ARCH_NODE_DEV_GROUP if X86_SGX
	select FUNCTION_ALIGNMENT_16B if X86_64 || X86_ALIGNMENT_16
	select FUNCTION_ALIGNMENT_4B
	imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI
	select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE

config INSTRUCTION_DECODER
	def_bool y
	depends on KPROBES || PERF_EVENTS || UPROBES
x86: unify arch/x86/boot/compressed/vmlinux_*.lds
Look at the:
diff -u arch/x86/boot/compressed/vmlinux_*.lds
output and realize that they're basially exactly the same except for
trivial naming differences, and the fact that the 64-bit version has a
"pgtable" thing.
So unify them.
There's some trivial cleanup there (make the output format a Kconfig thing
rather than doing #ifdef's for it, and unify both 32-bit and 64-bit BSS
end to "_ebss", where 32-bit used to use the traditional "_end"), but
other than that it's really a very mindless and straight conversion.
For example, I think we should aim to remove "startup_32" vs "startup_64",
and just call it "startup", and get rid of one more difference. I didn't
do that.
Also, notice the comment in the unified vmlinux.lds.S talks about
"head_64" and "startup_32" which is an odd and incorrect mix, but that was
actually what the old 64-bit only lds file had, so the confusion isn't
new, and now that mixing is arguably more accurate thanks to the
vmlinux.lds.S file being shared between the two cases ;)
[ Impact: cleanup, unification ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
config OUTPUT_FORMAT
	string
	default "elf32-i386" if X86_32
	default "elf64-x86-64" if X86_64

config LOCKDEP_SUPPORT
	def_bool y

config STACKTRACE_SUPPORT
	def_bool y

config MMU
	def_bool y

config ARCH_MMAP_RND_BITS_MIN
	default 28 if 64BIT
	default 8

config ARCH_MMAP_RND_BITS_MAX
	default 32 if 64BIT
	default 16

config ARCH_MMAP_RND_COMPAT_BITS_MIN
	default 8

config ARCH_MMAP_RND_COMPAT_BITS_MAX
	default 16

config SBUS
	bool

config GENERIC_ISA_DMA
	def_bool y
	depends on ISA_DMA_API

config GENERIC_CSUM
	bool
	default y if KMSAN || KASAN

config GENERIC_BUG
	def_bool y
	depends on BUG
	select GENERIC_BUG_RELATIVE_POINTERS if X86_64

config GENERIC_BUG_RELATIVE_POINTERS
	bool

config ARCH_MAY_HAVE_PC_FDC
	def_bool y
	depends on ISA_DMA_API

config GENERIC_CALIBRATE_DELAY
	def_bool y

config ARCH_HAS_CPU_RELAX
	def_bool y

config ARCH_HIBERNATION_POSSIBLE
	def_bool y

config ARCH_SUSPEND_POSSIBLE
	def_bool y

config AUDIT_ARCH
	def_bool y if X86_64

config KASAN_SHADOW_OFFSET
	hex
	depends on KASAN
	default 0xdffffc0000000000
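For reference, the offset above is the constant that generic KASAN adds
when translating an address to its shadow byte (each shadow byte covers 8
bytes of address space); as a sketch:

	/* Sketch of the generic KASAN shadow mapping on x86-64. */
	static inline void *mem_to_shadow(unsigned long addr)
	{
		return (void *)((addr >> 3) + 0xdffffc0000000000UL);
	}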
config HAVE_INTEL_TXT
	def_bool y
	depends on INTEL_IOMMU && ACPI

config X86_64_SMP
	def_bool y
	depends on X86_64 && SMP
uprobes, mm, x86: Add the ability to install and remove uprobes breakpoints
Add uprobes support to the core kernel, with x86 support.
This commit adds the kernel facilities, the actual uprobes
user-space ABI and perf probe support comes in later commits.
General design:
Uprobes are maintained in an rb-tree indexed by inode and offset
(the offset here is from the start of the mapping). For a unique
(inode, offset) tuple, there can be at most one uprobe in the
rb-tree.
Since the (inode, offset) tuple identifies a unique uprobe, more
than one user may be interested in the same uprobe. This provides
the ability to connect multiple 'consumers' to the same uprobe.
Each consumer defines a handler and a filter (optional). The
'handler' is run every time the uprobe is hit, if it matches the
'filter' criteria.
The first consumer of a uprobe causes the breakpoint to be
inserted at the specified address; subsequent consumers are
appended to the existing list of consumers. The breakpoint is
removed when the last consumer unregisters. For all other
unregistrations, the consumer is simply removed from the list of
consumers.
Given an inode, we get a list of the mms that have mapped the
inode. We do the actual registration if the mm maps the page where a
probe needs to be inserted/removed.
We use a temporary list to walk through the vmas that map the
inode.
- The number of maps that map the inode is not known before we
walk the rmap, and it keeps changing.
- Extending vm_area_struct wasn't recommended; it's a
size-critical data structure.
- There can be more than one mapping of the inode in the same mm.
We add callbacks to the mmap methods to keep an eye on text vmas
that are of interest to uprobes. When a vma of interest is mapped,
we insert the breakpoint at the right address.
Uprobe works by replacing the instruction at the address defined
by (inode, offset) with the arch specific breakpoint
instruction. We save a copy of the original instruction at the
uprobed address.
This is needed for:
a. executing the instruction out-of-line (xol).
b. instruction analysis for any subsequent fixups.
c. restoring the instruction back when the uprobe is unregistered.
We insert or delete a breakpoint instruction, and this
breakpoint instruction is assumed to be the smallest instruction
available on the platform. For fixed size instruction platforms
this is trivially true, for variable size instruction platforms
the breakpoint instruction is typically the smallest (often a
single byte).
Writing the instruction is done by COWing the page and changing
the instruction during the copy, even though most platforms
allow atomic writes of the breakpoint instruction. This also
mirrors the behaviour of a ptrace() memory write to a PRIVATE
file map.
The core worker is derived from KSM's replace_page() logic.
In essence, similar to KSM:
a. allocate a new page and copy over contents of the page that
has the uprobed vaddr
b. modify the copy and insert the breakpoint at the required
address
c. switch the original page with the copy containing the
breakpoint
d. flush page tables.
replace_page() is being replicated here because of some minor
changes in the type of pages and also because Hugh Dickins had
plans to improve replace_page() for KSM specific work.
Instruction analysis on x86 is based on instruction decoder and
determines if an instruction can be probed and determines the
necessary fixups after singlestep. Instruction analysis is done
at probe insertion time so that we avoid having to repeat the
same analysis every time a probe is hit.
A lot of code here is due to the improvement/suggestions/inputs
from Peter Zijlstra.
Changelog:
(v10):
- Add code to clear REX.B prefix as suggested by Denys Vlasenko
and Masami Hiramatsu.
(v9):
- Use insn_offset_modrm as suggested by Masami Hiramatsu.
(v7):
Handle comments from Peter Zijlstra:
- Don't take a reference to the inode (expect the inode passed to uprobe_register to be sane).
- Use PTR_ERR to set the return value.
- No need to take reference to inode.
- use PTR_ERR to return error value.
- register and uprobe_unregister share code.
(v5):
- Modified del_consumer as per comments from Peter.
- Drop reference to inode before dropping reference to uprobe.
- Use i_size_read(inode) instead of inode->i_size.
- Ensure uprobe->consumers is NULL, before __uprobe_unregister() is called.
- Includes errno.h as recommended by Stephen Rothwell to fix a build issue
on sparc defconfig
- Remove restrictions while unregistering.
- Earlier code leaked inode references under some conditions while
registering/unregistering.
- Continue the vma-rmap walk even if the intermediate vma doesn't
meet the requirements.
- Validate the vma found by find_vma before inserting/removing the
breakpoint
- Call del_consumer under mutex_lock.
- Use hash locks.
- Handle mremap.
- Introduce find_least_offset_node() instead of close match logic in
find_uprobe
- Uprobes no longer depends on MM_OWNER; no reference to task_structs
while inserting/removing a probe.
- Uses read_mapping_page instead of grab_cache_page so that the pages
have valid content.
- pass NULL to get_user_pages for the task parameter.
- call SetPageUptodate on the new page allocated in write_opcode.
- fix leaking a reference to the new page under certain conditions.
- Include Instruction Decoder if Uprobes gets defined.
- Remove const attributes for instruction prefix arrays.
- Uses mm_context to know if the application is 32 bit.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Also-written-by: Jim Keniston <jkenisto@us.ibm.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Anton Arapov <anton@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linux-mm <linux-mm@kvack.org>
Link: http://lkml.kernel.org/r/20120209092642.GE16600@linux.vnet.ibm.com
[ Made various small edits to the commit log ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
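The replace_page()-style sequence (steps a-d above) in commented pseudo-C
(a sketch; every helper name except flush_tlb_page() is illustrative):

	static int install_breakpoint_cow(struct vm_area_struct *vma,
					  unsigned long vaddr,
					  struct page *old_page)
	{
		struct page *new_page;

		new_page = alloc_uprobe_page(vma, vaddr);		/* a: allocate */
		copy_uprobe_page(new_page, old_page);			/* a: copy contents */
		poke_opcode(new_page, vaddr, UPROBE_BKPT_INSN);		/* b: insert breakpoint */
		replace_mapped_page(vma, vaddr, old_page, new_page);	/* c: switch the pages */
		flush_tlb_page(vma, vaddr);				/* d: flush page tables */
		return 0;
	}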
config ARCH_SUPPORTS_UPROBES
	def_bool y

config FIX_EARLYCON_MEM
	def_bool y

config DYNAMIC_PHYSICAL_MASK
	bool

config PGTABLE_LEVELS
	int
	default 5 if X86_5LEVEL
	default 4 if X86_64
	default 3 if X86_PAE
	default 2
stack-protector: test compiler capability in Kconfig and drop AUTO mode
Move the test for -fstack-protector(-strong) option to Kconfig.
If the compiler does not support the option, the corresponding menu
is automatically hidden. If STRONG is not supported, it will fall
back to REGULAR. If REGULAR is not supported, it will be disabled.
This means, AUTO is implicitly handled by the dependency solver of
Kconfig, hence removed.
I also turned the 'choice' into only two boolean symbols. The use of
'choice' is not a good idea here, because all of all{yes,mod,no}config
would choose the first visible value, while we want allnoconfig to
disable as many features as possible.
X86 has additional shell scripts in case the compiler supports those
options, but generates broken code. I added CC_HAS_SANE_STACKPROTECTOR
to test this. I had to add -m32 to gcc-x86_32-has-stack-protector.sh
to make it work correctly.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Kees Cook <keescook@chromium.org>
2018-05-28 09:22:00 +00:00
config CC_HAS_SANE_STACKPROTECTOR
	bool
	default $(success,$(srctree)/scripts/gcc-x86_64-has-stack-protector.sh $(CC) $(CLANG_FLAGS)) if 64BIT
	default $(success,$(srctree)/scripts/gcc-x86_32-has-stack-protector.sh $(CC) $(CLANG_FLAGS))
|
|
|
help
|
2022-05-25 13:32:02 +00:00
|
|
|
We have to make sure stack protector is unconditionally disabled if
|
|
|
|
the compiler produces broken code or if it does not let us control
|
|
|
|
the segment on 32-bit kernels.
|
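A rough sketch of the pattern the commit above describes (symbol names
are illustrative, not the literal kernel source): a hidden capability
symbol records whether the compiler accepts the flag, and the
user-visible options simply depend on it, so the old AUTO fallback comes
out of Kconfig's dependency solver for free.

	# Sketch, illustrative names. Hidden symbols: y only if the
	# compiler accepts the corresponding flag.
	config CC_HAS_STACKPROTECTOR
		def_bool $(cc-option,-fstack-protector)

	config CC_HAS_STACKPROTECTOR_STRONG
		def_bool $(cc-option,-fstack-protector-strong)

	# User-visible options: STRONG is hidden when unsupported, so
	# the configuration falls back to the regular variant.
	config STACKPROTECTOR
		bool "Stack Protector buffer overflow detection"
		depends on CC_HAS_STACKPROTECTOR

	config STACKPROTECTOR_STRONG
		bool "Strong Stack Protector"
		depends on STACKPROTECTOR && CC_HAS_STACKPROTECTOR_STRONG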
2018-05-28 09:22:00 +00:00
|
|
|
|
2007-11-09 20:56:54 +00:00
|
|
|
menu "Processor type and features"
|
|
|
|
|
|
|
|
config SMP
|
|
|
|
bool "Symmetric multi-processing support"
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2007-11-09 20:56:54 +00:00
|
|
|
This enables support for systems with more than one CPU. If you have
|
2014-01-23 23:55:29 +00:00
|
|
|
a system with only one CPU, say N. If you have a system with more
|
|
|
|
than one CPU, say Y.
|
2007-11-09 20:56:54 +00:00
|
|
|
|
2014-01-23 23:55:29 +00:00
|
|
|
If you say N here, the kernel will run on uni- and multiprocessor
|
2007-11-09 20:56:54 +00:00
|
|
|
machines, but will use only one CPU of a multiprocessor machine. If
|
|
|
|
you say Y here, the kernel will run on many, but not all,
|
2014-01-23 23:55:29 +00:00
|
|
|
uniprocessor machines. On a uniprocessor machine, the kernel
|
2007-11-09 20:56:54 +00:00
|
|
|
will run faster if you say N here.
|
|
|
|
|
|
|
|
Note that if you say Y here and choose architecture "586" or
|
|
|
|
"Pentium" under "Processor family", the kernel will not work on 486
|
|
|
|
architectures. Similarly, multiprocessor kernels for the "PPro"
|
|
|
|
architecture may not work on all Pentium based boards.
|
|
|
|
|
|
|
|
People using multiprocessor machines who say Y here should also say
|
|
|
|
Y to "Enhanced Real Time Clock Support", below. The "Advanced Power
|
|
|
|
Management" code will be disabled if you say Y here.
|
|
|
|
|
2023-03-14 23:06:44 +00:00
|
|
|
See also <file:Documentation/arch/x86/i386/IO-APIC.rst>,
|
2019-06-27 17:56:51 +00:00
|
|
|
<file:Documentation/admin-guide/lockup-watchdogs.rst> and the SMP-HOWTO available at
|
2007-11-09 20:56:54 +00:00
|
|
|
<http://www.tldp.org/docs.html#howto>.
|
|
|
|
|
|
|
|
If you don't know what to do here, say N.
|
|
|
|
|
2009-02-17 01:29:58 +00:00
|
|
|
config X86_X2APIC
|
|
|
|
bool "Support x2apic"
|
2015-05-04 15:58:01 +00:00
|
|
|
depends on X86_LOCAL_APIC && X86_64 && (IRQ_REMAP || HYPERVISOR_GUEST)
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2009-02-17 01:29:58 +00:00
|
|
|
This enables x2apic support on CPUs that have this feature.
|
|
|
|
|
|
|
|
This allows 32-bit apic IDs (so it can support very large systems),
|
|
|
|
and accesses the local apic via MSRs not via mmio.
|
|
|
|
|
2022-08-16 23:19:42 +00:00
|
|
|
Some Intel systems circa 2022 and later are locked into x2APIC mode
|
|
|
|
and cannot fall back to the legacy APIC modes if SGX or TDX are
|
2022-11-29 21:50:08 +00:00
|
|
|
enabled in the BIOS. They will boot with very reduced functionality
|
|
|
|
without enabling this option.
|
2022-08-16 23:19:42 +00:00
|
|
|
|
2009-02-17 01:29:58 +00:00
|
|
|
If you don't know what to do here, say N.
|
|
|
|
|
2024-04-23 17:41:06 +00:00
|
|
|
config X86_POSTED_MSI
|
|
|
|
bool "Enable MSI and MSI-x delivery by posted interrupts"
|
|
|
|
depends on X86_64 && IRQ_REMAP
|
|
|
|
help
|
|
|
|
This enables MSIs that are under interrupt remapping to be delivered as
|
|
|
|
posted interrupts to the host kernel. Interrupt throughput can
|
|
|
|
potentially be improved by coalescing CPU notifications during high
|
|
|
|
frequency bursts.
|
2022-08-16 23:19:42 +00:00
|
|
|
|
2009-02-17 01:29:58 +00:00
|
|
|
If you don't know what to do here, say N.
|
|
|
|
|
2008-06-19 19:13:09 +00:00
|
|
|
config X86_MPPARSE
|
2021-02-11 13:40:02 +00:00
|
|
|
bool "Enable MPS table" if ACPI
|
2008-10-30 10:38:24 +00:00
|
|
|
default y
|
2008-07-10 12:42:03 +00:00
|
|
|
depends on X86_LOCAL_APIC
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2008-06-19 19:13:09 +00:00
|
|
|
For old SMP systems that do not have proper ACPI support. On newer
|
|
|
|
systems (especially with 64-bit CPUs) that have ACPI support, the MADT and DSDT will override it.
|
|
|
|
|
2019-01-29 22:44:36 +00:00
|
|
|
config X86_CPU_RESCTRL
|
|
|
|
bool "x86 CPU resource control support"
|
2018-11-21 20:28:39 +00:00
|
|
|
depends on X86 && (CPU_SUP_INTEL || CPU_SUP_AMD)
|
2016-11-15 14:17:12 +00:00
|
|
|
select KERNFS
|
2020-01-15 09:28:51 +00:00
|
|
|
select PROC_CPU_RESCTRL if PROC_FS
|
2016-10-22 13:19:53 +00:00
|
|
|
help
|
2019-01-29 22:44:36 +00:00
|
|
|
Enable x86 CPU resource control support.
|
2018-11-21 20:28:39 +00:00
|
|
|
|
|
|
|
Provide support for the allocation and monitoring of system resources
|
|
|
|
usage by the CPU.
|
|
|
|
|
|
|
|
Intel calls this Intel Resource Director Technology
|
|
|
|
(Intel(R) RDT). More information about RDT can be found in the
|
|
|
|
Intel x86 Architecture Software Developer Manual.
|
|
|
|
|
|
|
|
AMD calls this AMD Platform Quality of Service (AMD QoS).
|
|
|
|
More information about AMD QoS can be found in the AMD64 Technology
|
|
|
|
Platform Quality of Service Extensions manual.
|
2016-10-22 13:19:53 +00:00
|
|
|
|
|
|
|
Say N if unsure.
|
|
|
|
|
2023-12-05 10:49:54 +00:00
|
|
|
config X86_FRED
|
|
|
|
bool "Flexible Return and Event Delivery"
|
|
|
|
depends on X86_64
|
|
|
|
help
|
|
|
|
When enabled, try to use Flexible Return and Event Delivery
|
|
|
|
instead of the legacy SYSCALL/SYSENTER/IDT architecture for
|
|
|
|
ring transitions and exception/interrupt handling if the
|
2024-03-12 16:19:58 +00:00
|
|
|
system supports it.
|
2023-12-05 10:49:54 +00:00
|
|
|
|
2018-02-10 00:51:03 +00:00
|
|
|
config X86_BIGSMP
|
|
|
|
bool "Support for big SMP systems with more than 8 CPUs"
|
2024-02-04 10:07:19 +00:00
|
|
|
depends on SMP && X86_32
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2019-12-04 00:06:47 +00:00
|
|
|
This option is needed for systems that have more than 8 CPUs.
|
2018-02-10 00:51:03 +00:00
|
|
|
|
2009-02-10 02:18:14 +00:00
|
|
|
config X86_EXTENDED_PLATFORM
|
|
|
|
bool "Support for extended (non-PC) x86 platforms"
|
|
|
|
default y
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2009-01-27 17:11:43 +00:00
|
|
|
If you disable this option then the kernel will only support
|
|
|
|
standard PC platforms (which covers the vast majority of
|
|
|
|
systems out there).
|
|
|
|
|
2009-02-21 00:59:11 +00:00
|
|
|
If you enable this option then you'll be able to select support
|
2024-02-04 10:07:19 +00:00
|
|
|
for the following non-PC x86 platforms, depending on the value of
|
|
|
|
CONFIG_64BIT.
|
|
|
|
|
|
|
|
32-bit platforms (CONFIG_64BIT=n):
|
2013-06-24 00:05:25 +00:00
|
|
|
Goldfish (Android emulator)
|
2009-02-21 00:59:11 +00:00
|
|
|
AMD Elan
|
|
|
|
RDC R-321x SoC
|
|
|
|
SGI 320/540 (Visual Workstation)
|
2012-04-04 17:40:21 +00:00
|
|
|
STA2X11-based (e.g. Northville)
|
2009-08-29 12:54:20 +00:00
|
|
|
Moorestown MID devices
|
2009-01-27 17:11:43 +00:00
|
|
|
|
2024-02-04 10:07:19 +00:00
|
|
|
64-bit platforms (CONFIG_64BIT=y):
|
2011-12-05 16:07:26 +00:00
|
|
|
Numascale NumaChip
|
2009-02-21 00:59:11 +00:00
|
|
|
ScaleMP vSMP
|
|
|
|
SGI Ultraviolet
|
|
|
|
|
|
|
|
If you have one of these systems, or if you want to build a
|
|
|
|
generic distribution kernel, say Y here - otherwise say N.
|
2024-02-04 10:07:19 +00:00
|
|
|
|
2009-02-10 02:18:14 +00:00
|
|
|
# This is an alphabetically sorted list of 64 bit extended platforms
|
|
|
|
# Please maintain the alphabetic order if and when there are additions
|
2011-12-05 16:07:26 +00:00
|
|
|
config X86_NUMACHIP
|
|
|
|
bool "Numascale NumaChip"
|
|
|
|
depends on X86_64
|
|
|
|
depends on X86_EXTENDED_PLATFORM
|
|
|
|
depends on NUMA
|
|
|
|
depends on SMP
|
|
|
|
depends on X86_X2APIC
|
2012-12-07 21:24:32 +00:00
|
|
|
depends on PCI_MMCONFIG
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2011-12-05 16:07:26 +00:00
|
|
|
Adds support for Numascale NumaChip large-SMP systems. Needed to
|
|
|
|
enable more than ~168 cores.
|
|
|
|
If you don't have one of these, you should say N here.
|
2007-11-09 20:56:54 +00:00
|
|
|
|
2009-02-10 02:18:14 +00:00
|
|
|
config X86_VSMP
|
|
|
|
bool "ScaleMP vSMP"
|
2013-03-04 20:20:21 +00:00
|
|
|
select HYPERVISOR_GUEST
|
2009-02-10 02:18:14 +00:00
|
|
|
select PARAVIRT
|
|
|
|
depends on X86_64 && PCI
|
|
|
|
depends on X86_EXTENDED_PLATFORM
|
2012-04-16 07:39:35 +00:00
|
|
|
depends on SMP
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2009-02-10 02:18:14 +00:00
|
|
|
Support for ScaleMP vSMP systems. Say 'Y' here if this kernel is
|
|
|
|
supposed to run on these EM64T-based machines. Only choose this option
|
|
|
|
if you have one of these machines.
|
2008-01-30 12:33:36 +00:00
|
|
|
|
2009-01-20 03:36:04 +00:00
|
|
|
config X86_UV
|
|
|
|
bool "SGI Ultraviolet"
|
|
|
|
depends on X86_64
|
2009-02-10 02:18:14 +00:00
|
|
|
depends on X86_EXTENDED_PLATFORM
|
2009-04-03 20:39:42 +00:00
|
|
|
depends on NUMA
|
2016-02-12 00:13:20 +00:00
|
|
|
depends on EFI
|
2021-04-20 07:47:42 +00:00
|
|
|
depends on KEXEC_CORE
|
2009-04-20 20:02:31 +00:00
|
|
|
depends on X86_X2APIC
|
2015-05-06 04:23:59 +00:00
|
|
|
depends on PCI
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2009-01-20 03:36:04 +00:00
|
|
|
This option is needed in order to support SGI Ultraviolet systems.
|
|
|
|
If you don't have one of these, you should say N here.
|
|
|
|
|
2009-02-10 02:18:14 +00:00
|
|
|
# Following is an alphabetically sorted list of 32 bit extended platforms
|
|
|
|
# Please maintain the alphabetic order if and when there are additions
|
2007-11-09 20:56:54 +00:00
|
|
|
|
2013-01-21 17:23:09 +00:00
|
|
|
config X86_GOLDFISH
|
2019-11-21 03:21:09 +00:00
|
|
|
bool "Goldfish (Virtual Platform)"
|
|
|
|
depends on X86_EXTENDED_PLATFORM
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2022-05-25 13:32:02 +00:00
|
|
|
Enable support for the Goldfish virtual platform used primarily
|
|
|
|
for Android development. Unless you are building for the Android
|
|
|
|
Goldfish emulator, say N here.
|
2013-01-21 17:23:09 +00:00
|
|
|
|
2010-11-09 20:08:04 +00:00
|
|
|
config X86_INTEL_CE
|
|
|
|
bool "CE4100 TV platform"
|
|
|
|
depends on PCI
|
|
|
|
depends on PCI_GODIRECT
|
2014-06-09 08:19:46 +00:00
|
|
|
depends on X86_IO_APIC
|
2010-11-09 20:08:04 +00:00
|
|
|
depends on X86_32
|
|
|
|
depends on X86_EXTENDED_PLATFORM
|
2010-11-09 20:08:08 +00:00
|
|
|
select X86_REBOOTFIXUPS
|
2011-02-22 20:07:37 +00:00
|
|
|
select OF
|
|
|
|
select OF_EARLY_FLATTREE
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2010-11-09 20:08:04 +00:00
|
|
|
Select for the Intel CE media processor (CE4100) SOC.
|
|
|
|
This option compiles in support for the CE4100 SOC for set-top
|
|
|
|
boxes and media devices.
|
|
|
|
|
2013-12-17 01:37:26 +00:00
|
|
|
config X86_INTEL_MID
|
2011-07-12 16:49:29 +00:00
|
|
|
bool "Intel MID platform support"
|
|
|
|
depends on X86_EXTENDED_PLATFORM
|
2014-01-21 18:41:39 +00:00
|
|
|
depends on X86_PLATFORM_DEVICES
|
2011-11-10 13:29:14 +00:00
|
|
|
depends on PCI
|
2016-01-15 20:11:07 +00:00
|
|
|
depends on X86_64 || (PCI_GOANY && X86_32)
|
2011-11-10 13:29:14 +00:00
|
|
|
depends on X86_IO_APIC
|
2013-12-17 01:37:26 +00:00
|
|
|
select I2C
|
2011-12-29 14:43:16 +00:00
|
|
|
select DW_APB_TIMER
|
2020-04-16 08:15:33 +00:00
|
|
|
select INTEL_SCU_PCI
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2013-12-17 01:37:26 +00:00
|
|
|
Select to build a kernel capable of supporting Intel MID (Mobile
|
|
|
|
Internet Device) platform systems which do not have the PCI legacy
|
|
|
|
interfaces. If you are building for a PC class system say N here.
|
2011-11-10 13:29:14 +00:00
|
|
|
|
2013-12-17 01:37:26 +00:00
|
|
|
Intel MID platforms are based on an Intel processor and chipset which
|
|
|
|
consume less power than most of the x86 derivatives.
|
2011-07-12 16:49:29 +00:00
|
|
|
|
2015-01-30 16:29:39 +00:00
|
|
|
config X86_INTEL_QUARK
|
|
|
|
bool "Intel Quark platform support"
|
|
|
|
depends on X86_32
|
|
|
|
depends on X86_EXTENDED_PLATFORM
|
|
|
|
depends on X86_PLATFORM_DEVICES
|
|
|
|
depends on X86_TSC
|
|
|
|
depends on PCI
|
|
|
|
depends on PCI_GOANY
|
|
|
|
depends on X86_IO_APIC
|
|
|
|
select IOSF_MBI
|
|
|
|
select INTEL_IMR
|
2015-03-05 15:24:04 +00:00
|
|
|
select COMMON_CLK
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2015-01-30 16:29:39 +00:00
|
|
|
Select to include support for Quark X1000 SoC.
|
|
|
|
Say Y here if you have a Quark based system such as the Arduino
|
|
|
|
compatible Intel Galileo.
|
|
|
|
|
2013-01-18 13:45:59 +00:00
|
|
|
config X86_INTEL_LPSS
|
|
|
|
bool "Intel Low Power Subsystem Support"
|
2019-01-02 18:10:37 +00:00
|
|
|
depends on X86 && ACPI && PCI
|
2013-01-18 13:45:59 +00:00
|
|
|
select COMMON_CLK
|
2013-09-13 14:02:29 +00:00
|
|
|
select PINCTRL
|
2015-12-12 01:45:06 +00:00
|
|
|
select IOSF_MBI
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2013-01-18 13:45:59 +00:00
|
|
|
Select to build support for Intel Low Power Subsystem such as
|
|
|
|
found on Intel Lynxpoint PCH. Selecting this option enables
|
2013-09-13 14:02:29 +00:00
|
|
|
things like clock tree (common clock framework) and pincontrol
|
|
|
|
which are needed by the LPSS peripheral drivers.
|
2013-01-18 13:45:59 +00:00
|
|
|
|
2015-02-06 00:27:51 +00:00
|
|
|
config X86_AMD_PLATFORM_DEVICE
|
|
|
|
bool "AMD ACPI2Platform devices support"
|
|
|
|
depends on ACPI
|
|
|
|
select COMMON_CLK
|
|
|
|
select PINCTRL
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2015-02-06 00:27:51 +00:00
|
|
|
Select to expose AMD-specific ACPI devices as platform devices,
|
|
|
|
such as the I2C, UART and GPIO controllers found on AMD Carrizo and later chipsets.
|
|
|
|
The I2C and UART drivers depend on COMMON_CLK to set their clocks. The GPIO driver is
|
|
|
|
implemented under the PINCTRL subsystem.
|
|
|
|
|
2014-09-18 05:13:50 +00:00
|
|
|
config IOSF_MBI
|
|
|
|
tristate "Intel SoC IOSF Sideband support for SoC platforms"
|
|
|
|
depends on PCI
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2014-09-18 05:13:50 +00:00
|
|
|
This option enables sideband register access support for Intel SoC
|
|
|
|
platforms. On these platforms the IOSF sideband is used in lieu of
|
|
|
|
MSRs for some register accesses, mostly but not limited to thermal
|
|
|
|
and power. Drivers may query the availability of this device to
|
|
|
|
determine if they need the sideband in order to work on these
|
|
|
|
platforms. The sideband is available on the following SoC products.
|
|
|
|
This list is not meant to be exhaustive.
|
|
|
|
- BayTrail
|
|
|
|
- Braswell
|
|
|
|
- Quark
|
|
|
|
|
|
|
|
You should say Y if you are running a kernel on one of these SoCs.
|
|
|
|
|
2014-09-18 05:13:51 +00:00
|
|
|
config IOSF_MBI_DEBUG
|
|
|
|
bool "Enable IOSF sideband access through debugfs"
|
|
|
|
depends on IOSF_MBI && DEBUG_FS
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2014-09-18 05:13:51 +00:00
|
|
|
Select this option to expose the IOSF sideband access registers (MCR,
|
|
|
|
MDR, MCRX) through debugfs to write and read register information from
|
|
|
|
different units on the SoC. This is most useful for obtaining device
|
|
|
|
state information for debug and analysis. As this is a general access
|
|
|
|
mechanism, users of this option would have specific knowledge of the
|
|
|
|
device they want to access.
|
|
|
|
|
|
|
|
If you don't require the option or are in doubt, say N.
|
|
|
|
|
2009-02-10 02:18:14 +00:00
|
|
|
config X86_RDC321X
|
|
|
|
bool "RDC R-321x SoC"
|
2007-11-09 20:56:54 +00:00
|
|
|
depends on X86_32
|
2009-02-10 02:18:14 +00:00
|
|
|
depends on X86_EXTENDED_PLATFORM
|
|
|
|
select M486
|
|
|
|
select X86_REBOOTFIXUPS
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2009-02-10 02:18:14 +00:00
|
|
|
This option is needed for RDC R-321x system-on-chip, also known
|
|
|
|
as R-8610-(G).
|
|
|
|
If you don't have one of these chips, you should say N here.
|
|
|
|
|
2009-01-27 17:43:09 +00:00
|
|
|
config X86_32_NON_STANDARD
|
2009-01-27 17:24:57 +00:00
|
|
|
bool "Support non-standard 32-bit SMP architectures"
|
|
|
|
depends on X86_32 && SMP
|
2009-02-10 02:18:14 +00:00
|
|
|
depends on X86_EXTENDED_PLATFORM
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2014-02-25 20:14:06 +00:00
|
|
|
This option compiles in the bigsmp and STA2X11 default
|
|
|
|
subarchitectures. It is intended for a generic binary
|
|
|
|
kernel. If you select them all, the kernel will probe them one by
|
|
|
|
one and fall back to the default.
|
2008-06-09 01:31:54 +00:00
|
|
|
|
2009-02-10 02:18:14 +00:00
|
|
|
# Alphabetically sorted list of Non standard 32 bit platforms
|
2008-06-09 01:31:54 +00:00
|
|
|
|
2009-09-26 16:35:07 +00:00
|
|
|
config X86_SUPPORTS_MEMORY_FAILURE
|
2010-04-21 14:23:44 +00:00
|
|
|
def_bool y
|
2009-09-26 16:35:07 +00:00
|
|
|
# MCE code calls memory_failure():
|
|
|
|
depends on X86_MCE
|
|
|
|
# On 32-bit this adds too big of NODES_SHIFT and we run out of page flags:
|
|
|
|
# On 32-bit SPARSEMEM adds too big of SECTIONS_WIDTH:
|
|
|
|
depends on X86_64 || !SPARSEMEM
|
|
|
|
select ARCH_SUPPORTS_MEMORY_FAILURE
|
|
|
|
|
2012-04-04 17:40:21 +00:00
|
|
|
config STA2X11
|
|
|
|
bool "STA2X11 Companion Chip Support"
|
|
|
|
depends on X86_32_NON_STANDARD && PCI
|
|
|
|
select SWIOTLB
|
|
|
|
select MFD_STA2X11
|
2016-06-02 12:20:18 +00:00
|
|
|
select GPIOLIB
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2012-04-04 17:40:21 +00:00
|
|
|
This adds support for boards based on the STA2X11 IO-Hub,
|
|
|
|
a.k.a. "ConneXt". The chip is used in place of the standard
|
|
|
|
PC chipset, so all "standard" peripherals are missing. If this
|
|
|
|
option is selected the kernel will still be able to boot on
|
|
|
|
standard PC machines.
|
|
|
|
|
2010-09-25 04:06:57 +00:00
|
|
|
config X86_32_IRIS
|
|
|
|
tristate "Eurobraille/Iris poweroff module"
|
|
|
|
depends on X86_32
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2010-09-25 04:06:57 +00:00
|
|
|
The Iris machines from EuroBraille do not have APM or ACPI support
|
|
|
|
to shut themselves down properly. A special I/O sequence is
|
|
|
|
needed to do so, which is what this module does at
|
|
|
|
kernel shutdown.
|
|
|
|
|
|
|
|
This is only for Iris machines from EuroBraille.
|
|
|
|
|
|
|
|
If unused, say N.
|
|
|
|
|
2008-11-11 08:05:16 +00:00
|
|
|
config SCHED_OMIT_FRAME_POINTER
|
2008-01-30 12:31:03 +00:00
|
|
|
def_bool y
|
|
|
|
prompt "Single-depth WCHAN output"
|
2008-11-06 19:10:49 +00:00
|
|
|
depends on X86
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2007-11-09 20:56:54 +00:00
|
|
|
Calculate simpler /proc/<PID>/wchan values. If this option
|
|
|
|
is disabled then wchan values will recurse back to the
|
|
|
|
caller function. This provides more accurate wchan values,
|
|
|
|
at the expense of slightly more scheduling overhead.
|
|
|
|
|
|
|
|
If in doubt, say "Y".
|
|
|
|
|
2013-03-04 20:20:21 +00:00
|
|
|
menuconfig HYPERVISOR_GUEST
|
|
|
|
bool "Linux guest support"
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2013-03-04 20:20:21 +00:00
|
|
|
Say Y here to enable options for running Linux under various hyper-
|
|
|
|
visors. This option enables basic hypervisor detection and platform
|
|
|
|
setup.
|
2007-11-09 20:56:54 +00:00
|
|
|
|
2013-03-04 20:20:21 +00:00
|
|
|
If you say N, all options in this submenu will be skipped and
|
|
|
|
disabled, and Linux guest support won't be built in.
|
2007-11-09 20:56:54 +00:00
|
|
|
|
2013-03-04 20:20:21 +00:00
|
|
|
if HYPERVISOR_GUEST
|
2007-11-09 20:56:54 +00:00
|
|
|
|
2008-01-30 12:33:32 +00:00
|
|
|
config PARAVIRT
|
|
|
|
bool "Enable paravirtualization code"
|
2021-03-11 14:23:09 +00:00
|
|
|
depends on HAVE_STATIC_CALL
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2008-01-30 12:33:32 +00:00
|
|
|
This changes the kernel so it can modify itself when it is run
|
|
|
|
under a hypervisor, potentially improving performance significantly
|
|
|
|
over full virtualization. However, when run without a hypervisor
|
|
|
|
the kernel is theoretically slower and slightly larger.
|
|
|
|
|
2018-08-28 07:40:21 +00:00
|
|
|
config PARAVIRT_XXL
|
|
|
|
bool
|
|
|
|
|
2013-03-04 20:20:21 +00:00
|
|
|
config PARAVIRT_DEBUG
|
|
|
|
bool "paravirt-ops debugging"
|
|
|
|
depends on PARAVIRT && DEBUG_KERNEL
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2013-03-04 20:20:21 +00:00
|
|
|
Enable to debug paravirt_ops internals. Specifically, BUG if
|
|
|
|
a paravirt_op is missing when it is called.
|
|
|
|
|
x86: Fix performance regression caused by paravirt_ops on native kernels
Xiaohui Xin and some other folks at Intel have been looking into what's
behind the performance hit of paravirt_ops when running native.
It appears that the hit is entirely due to the paravirtualized
spinlocks introduced by:
| commit 8efcbab674de2bee45a2e4cdf97de16b8e609ac8
| Date: Mon Jul 7 12:07:51 2008 -0700
|
| paravirt: introduce a "lock-byte" spinlock implementation
The extra call/return in the spinlock path is somehow
causing an increase in the cycles/instruction of somewhere around 2-7%
(seems to vary quite a lot from test to test). The working theory is
that the CPU's pipeline is getting upset about the
call->call->locked-op->return->return, and seems to be failing to
speculate (though I haven't seen anything definitive about the precise
reasons). This doesn't entirely make sense, because the performance
hit is also visible on unlock and other operations which don't involve
locked instructions. But spinlock operations clearly swamp all the
other pvops operations, even though I can't imagine that they're
nearly as common (there's only a .05% increase in instructions
executed).
If I disable just the pv-spinlock calls, my tests show that pvops is
identical to non-pvops performance on native (my measurements show that
it is actually about .1% faster, but Xiaohui shows a .05% slowdown).
Summary of results, averaging 10 runs of the "mmperf" test, using a
no-pvops build as baseline:
nopv Pv-nospin Pv-spin
CPU cycles 100.00% 99.89% 102.18%
instructions 100.00% 100.10% 100.15%
CPI 100.00% 99.79% 102.03%
cache ref 100.00% 100.84% 100.28%
cache miss 100.00% 90.47% 88.56%
cache miss rate 100.00% 89.72% 88.31%
branches 100.00% 99.93% 100.04%
branch miss 100.00% 103.66% 107.72%
branch miss rt 100.00% 103.73% 107.67%
wallclock 100.00% 99.90% 102.20%
The clear effect here is that the 2% increase in CPI is
directly reflected in the final wallclock time.
(The other interesting effect is that the more ops are
out of line calls via pvops, the lower the cache access
and miss rates. Not too surprising, but it suggests that
the non-pvops kernel is over-inlined. On the flipside,
the branch misses go up correspondingly...)
So, what's the fix?
Paravirt patching turns all the pvops calls into direct calls, so
_spin_lock etc do end up having direct calls. For example, the compiler
generated code for paravirtualized _spin_lock is:
<_spin_lock+0>: mov %gs:0xb4c8,%rax
<_spin_lock+9>: incl 0xffffffffffffe044(%rax)
<_spin_lock+15>: callq *0xffffffff805a5b30
<_spin_lock+22>: retq
The indirect call will get patched to:
<_spin_lock+0>: mov %gs:0xb4c8,%rax
<_spin_lock+9>: incl 0xffffffffffffe044(%rax)
<_spin_lock+15>: callq <__ticket_spin_lock>
<_spin_lock+20>: nop; nop /* or whatever 2-byte nop */
<_spin_lock+22>: retq
One possibility is to inline _spin_lock, etc, when building an
optimised kernel (ie, when there's no spinlock/preempt
instrumentation/debugging enabled). That will remove the outer
call/return pair, returning the instruction stream to a single
call/return, which will presumably execute the same as the non-pvops
case. The downsides are: 1) it will replicate the
preempt_disable/enable code at each lock/unlock callsite; this code is
fairly small, but not nothing; and 2) the spinlock definitions are
already a very heavily tangled mass of #ifdefs and other preprocessor
magic, and making any changes will be non-trivial.
The other obvious answer is to disable pv-spinlocks. Making them a
separate config option is fairly easy, and it would be trivial to
enable them only when Xen is enabled (as the only non-default user).
But it doesn't really address the common case of a distro build which
is going to have Xen support enabled, and leaves the open question of
whether the native performance cost of pv-spinlocks is worth the
performance improvement on a loaded Xen system (10% saving of overall
system CPU when guests block rather than spin). Still it is a
reasonable short-term workaround.
[ Impact: fix pvops performance regression when running native ]
Analysed-by: "Xin Xiaohui" <xiaohui.xin@intel.com>
Analysed-by: "Li Xin" <xin.li@intel.com>
Analysed-by: "Nakajima Jun" <jun.nakajima@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Xen-devel <xen-devel@lists.xensource.com>
LKML-Reference: <4A0B62F7.5030802@goop.org>
[ fixed the help text ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-14 00:16:55 +00:00
|
|
|
config PARAVIRT_SPINLOCKS
|
|
|
|
bool "Paravirtualization layer for spinlocks"
|
2012-10-02 18:16:47 +00:00
|
|
|
depends on PARAVIRT && SMP
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2009-05-14 00:16:55 +00:00
|
|
|
Paravirtualized spinlocks allow a pvops backend to replace the
|
|
|
|
spinlock implementation with something virtualization-friendly
|
|
|
|
(for example, block the virtual CPU rather than spinning).
|
|
|
|
|
2013-10-21 16:05:08 +00:00
|
|
|
It has a minimal impact on native kernels and gives a nice performance
|
|
|
|
benefit on paravirtualized KVM / Xen kernels.
|
2009-05-14 00:16:55 +00:00
|
|
|
|
2013-10-21 16:05:08 +00:00
|
|
|
If you are unsure how to answer this question, answer Y.
|
2009-05-14 00:16:55 +00:00
|
|
|
|
2019-04-30 03:45:23 +00:00
|
|
|
config X86_HV_CALLBACK_VECTOR
|
|
|
|
def_bool n
|
|
|
|
|
2013-03-04 20:20:21 +00:00
|
|
|
source "arch/x86/xen/Kconfig"
|
2008-06-03 14:17:29 +00:00
|
|
|
|
2013-03-04 20:20:21 +00:00
|
|
|
config KVM_GUEST
|
|
|
|
bool "KVM Guest support (including kvmclock)"
|
|
|
|
depends on PARAVIRT
|
|
|
|
select PARAVIRT_CLOCK
|
2019-07-03 23:51:29 +00:00
|
|
|
select ARCH_CPUIDLE_HALTPOLL
|
2020-05-25 14:41:23 +00:00
|
|
|
select X86_HV_CALLBACK_VECTOR
|
2013-03-04 20:20:21 +00:00
|
|
|
default y
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2013-03-04 20:20:21 +00:00
|
|
|
This option enables various optimizations for running under the KVM
|
|
|
|
hypervisor. It includes a paravirtualized clock, so that instead
|
|
|
|
of relying on a PIT (or probably other) emulation by the
|
|
|
|
underlying device model, the host provides the guest with
|
|
|
|
timing infrastructure such as time of day, and system time.
|
2007-11-09 20:56:54 +00:00
|
|
|
|
2019-07-03 23:51:29 +00:00
|
|
|
config ARCH_CPUIDLE_HALTPOLL
|
2019-11-21 03:21:09 +00:00
|
|
|
def_bool n
|
|
|
|
prompt "Disable host haltpoll when loading haltpoll driver"
|
|
|
|
help
|
2019-07-03 23:51:29 +00:00
|
|
|
If virtualized under KVM, disable host haltpoll.
|
|
|
|
|
2018-12-10 19:07:28 +00:00
|
|
|
config PVH
|
|
|
|
bool "Support for running PVH guests"
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2018-12-10 19:07:28 +00:00
|
|
|
This option enables the PVH entry point for guest virtual machines
|
|
|
|
as specified in the x86/HVM direct boot ABI.
|
|
|
|
|
2013-03-04 20:20:21 +00:00
|
|
|
config PARAVIRT_TIME_ACCOUNTING
|
|
|
|
bool "Paravirtual steal time accounting"
|
|
|
|
depends on PARAVIRT
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2013-03-04 20:20:21 +00:00
|
|
|
Select this option to enable fine granularity task steal time
|
|
|
|
accounting. Time spent executing other tasks in parallel with
|
|
|
|
the current vCPU is discounted from the vCPU power. To account for
|
|
|
|
that, there can be a small performance impact.
|
|
|
|
|
|
|
|
If in doubt, say N here.
|
|
|
|
|
|
|
|
config PARAVIRT_CLOCK
|
|
|
|
bool
|
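PARAVIRT_CLOCK carries no prompt, so users cannot enable it directly;
it is only switched on when another symbol selects it, as KVM_GUEST
does above:

	config KVM_GUEST
		...
		select PARAVIRT_CLOCK

With KVM_GUEST=y, 'make olddefconfig' consequently ends up with
CONFIG_PARAVIRT_CLOCK=y in the generated .config.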
2008-06-25 04:19:14 +00:00
|
|
|
|
2017-11-27 08:11:46 +00:00
|
|
|
config JAILHOUSE_GUEST
|
|
|
|
bool "Jailhouse non-root cell support"
|
2018-01-15 15:51:20 +00:00
|
|
|
depends on X86_64 && PCI
|
2017-11-27 08:11:48 +00:00
|
|
|
select X86_PM_TIMER
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2017-11-27 08:11:46 +00:00
|
|
|
This option allows Linux to run as a guest in a Jailhouse non-root
|
|
|
|
cell. You can leave this option disabled if you only want to start
|
|
|
|
Jailhouse and run Linux afterwards in the root cell.
|
|
|
|
|
2019-04-30 03:45:24 +00:00
|
|
|
config ACRN_GUEST
|
|
|
|
bool "ACRN Guest support"
|
|
|
|
depends on X86_64
|
2019-04-30 03:45:25 +00:00
|
|
|
select X86_HV_CALLBACK_VECTOR
|
2019-04-30 03:45:24 +00:00
|
|
|
help
|
|
|
|
This option allows Linux to run as a guest in the ACRN hypervisor. ACRN is
|
|
|
|
a flexible, lightweight reference open-source hypervisor, built with
|
|
|
|
real-time and safety-criticality in mind. It is built for embedded
|
|
|
|
IoT with a small footprint and real-time features. More details can be
|
|
|
|
found in https://projectacrn.org/.
|
|
|
|
|
2022-04-05 23:29:10 +00:00
|
|
|
config INTEL_TDX_GUEST
|
|
|
|
bool "Intel TDX (Trust Domain Extensions) - Guest Support"
|
|
|
|
depends on X86_64 && CPU_SUP_INTEL
|
|
|
|
depends on X86_X2APIC
|
2023-06-06 14:26:37 +00:00
|
|
|
depends on EFI_STUB
|
2022-04-05 23:29:13 +00:00
|
|
|
select ARCH_HAS_CC_PLATFORM
|
2022-04-05 23:29:36 +00:00
|
|
|
select X86_MEM_ENCRYPT
|
x86/boot: Avoid #VE during boot for TDX platforms
There are a few MSRs and control register bits that the kernel
normally needs to modify during boot. But, TDX disallows
modification of these registers to help provide consistent security
guarantees. Fortunately, TDX ensures that these are all in the correct
state before the kernel loads, which means the kernel does not need to
modify them.
The conditions to avoid are:
* Any writes to the EFER MSR
* Clearing CR4.MCE
This theoretically makes the guest boot more fragile. If, for instance,
EFER was set up incorrectly and a WRMSR was performed, it will trigger
early exception panic or a triple fault, if it's before early
exceptions are set up. However, this is likely to trip up the guest
BIOS long before control reaches the kernel. In any case, these kinds
of problems are unlikely to occur in production environments, and
developers have good debug tools to fix them quickly.
Change the common boot code to work on TDX and non-TDX systems.
This should have no functional effect on non-TDX systems.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20220405232939.73860-24-kirill.shutemov@linux.intel.com
2022-04-05 23:29:32 +00:00
|
|
|
select X86_MCE
|
2023-06-06 14:26:37 +00:00
|
|
|
select UNACCEPTED_MEMORY
|
2022-04-05 23:29:10 +00:00
|
|
|
help
|
|
|
|
Support running as a guest under Intel TDX. Without this support,
|
|
|
|
the guest kernel cannot boot or run under TDX.
|
|
|
|
TDX includes memory encryption and integrity capabilities
|
|
|
|
which protect the confidentiality and integrity of guest
|
|
|
|
memory contents and CPU state. TDX guests are protected from
|
|
|
|
some attacks from the VMM.
|
|
|
|
|
2022-05-25 13:32:02 +00:00
|
|
|
endif # HYPERVISOR_GUEST
|
2008-06-25 04:19:14 +00:00
|
|
|
|
2007-11-09 20:56:54 +00:00
|
|
|
source "arch/x86/Kconfig.cpu"
|
|
|
|
|
|
|
|
config HPET_TIMER
|
2008-01-30 12:31:03 +00:00
|
|
|
def_bool X86_64
|
2007-11-09 20:56:54 +00:00
|
|
|
prompt "HPET Timer Support" if X86_32
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2009-02-05 15:21:53 +00:00
|
|
|
Use the IA-PC HPET (High Precision Event Timer) to manage
|
|
|
|
time in preference to the PIT and RTC, if a HPET is
|
|
|
|
present.
|
|
|
|
HPET is the next generation timer replacing legacy 8254s.
|
|
|
|
The HPET provides a stable time base on SMP
|
|
|
|
systems, unlike the TSC, but it is more expensive to access,
|
2016-02-10 23:05:01 +00:00
|
|
|
as it is off-chip. The interface used is documented
|
|
|
|
in the HPET spec, revision 1.
|
2007-11-09 20:56:54 +00:00
|
|
|
|
2009-02-05 15:21:53 +00:00
|
|
|
You can safely choose Y here. However, HPET will only be
|
|
|
|
activated if the platform and the BIOS support this feature.
|
|
|
|
Otherwise the 8254 will be used for timing services.
|
2007-11-09 20:56:54 +00:00
|
|
|
|
2009-02-05 15:21:53 +00:00
|
|
|
Choose N to continue using the legacy 8254 timer.
|
2007-11-09 20:56:54 +00:00
|
|
|
|
|
|
|
config HPET_EMULATE_RTC
|
2008-01-30 12:31:03 +00:00
|
|
|
def_bool y
|
2021-02-04 07:32:32 +00:00
|
|
|
depends on HPET_TIMER && (RTC_DRV_CMOS=m || RTC_DRV_CMOS=y)
|
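Comparing against both 'm' and 'y' is the stock Kconfig idiom for
"built in any form". Worked out for the expression above (derived from
it, not new source):

	HPET_TIMER=y, RTC_DRV_CMOS=y   ->  HPET_EMULATE_RTC=y
	HPET_TIMER=y, RTC_DRV_CMOS=m   ->  HPET_EMULATE_RTC=y
	RTC_DRV_CMOS=n or HPET_TIMER=n ->  HPET_EMULATE_RTC=n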
2007-11-09 20:56:54 +00:00
|
|
|
|
2011-01-20 22:44:16 +00:00
|
|
|
# Mark as expert because too many people got it wrong.
|
2007-11-09 20:56:54 +00:00
|
|
|
# The code disables itself when not needed.
|
2008-04-28 09:14:14 +00:00
|
|
|
config DMI
|
|
|
|
default y
|
2014-01-23 23:54:39 +00:00
|
|
|
select DMI_SCAN_MACHINE_NON_EFI_FALLBACK
|
2011-01-20 22:44:16 +00:00
|
|
|
bool "Enable DMI scanning" if EXPERT
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2008-04-28 09:14:14 +00:00
|
|
|
Enable scanning of DMI to identify machine quirks. Say Y
|
|
|
|
here unless you have verified that your setup is not
|
|
|
|
affected by entries in the DMI blacklist. Required by PNP
|
|
|
|
BIOS code.
|
|
|
|
|
2007-11-09 20:56:54 +00:00
|
|
|
config GART_IOMMU
|
2013-10-04 21:37:56 +00:00
|
|
|
bool "Old AMD GART IOMMU support"
|
2018-04-03 13:47:59 +00:00
|
|
|
select IOMMU_HELPER
|
2007-11-09 20:56:54 +00:00
|
|
|
select SWIOTLB
|
2010-09-17 16:03:43 +00:00
|
|
|
depends on X86_64 && PCI && AMD_NB
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2013-10-06 09:45:20 +00:00
|
|
|
Provides a driver for older AMD Athlon64/Opteron/Turion/Sempron
|
|
|
|
GART based hardware IOMMUs.
|
|
|
|
|
|
|
|
The GART supports full DMA access for devices with 32-bit access
|
|
|
|
limitations, on systems with more than 3 GB. This is usually needed
|
|
|
|
for USB, sound, many IDE/SATA chipsets and some other devices.
|
|
|
|
|
|
|
|
Newer systems typically have a modern AMD IOMMU, supported via
|
|
|
|
the CONFIG_AMD_IOMMU=y config option.
|
|
|
|
|
|
|
|
In normal configurations this driver is only active when needed:
|
|
|
|
there's more than 3 GB of memory and the system contains a
|
|
|
|
32-bit limited device.
|
|
|
|
|
|
|
|
If unsure, say Y.
|
2007-11-09 20:56:54 +00:00
|
|
|
|
2022-02-25 20:51:34 +00:00
|
|
|
config BOOT_VESA_SUPPORT
|
|
|
|
bool
|
|
|
|
help
|
|
|
|
If true, at least one selected framebuffer driver can take advantage
|
|
|
|
of VESA video modes set at an early boot stage via the vga= parameter.
|
2007-11-09 20:56:54 +00:00
|
|
|
|
2008-05-12 19:21:12 +00:00
|
|
|
config MAXSMP
|
2010-08-21 19:32:41 +00:00
|
|
|
bool "Enable Maximum number of SMP Processors and NUMA Nodes"
|
2012-10-02 18:16:47 +00:00
|
|
|
depends on X86_64 && SMP && DEBUG_KERNEL
|
2008-12-17 01:33:51 +00:00
|
|
|
select CPUMASK_OFFSTACK
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2010-08-21 19:32:41 +00:00
|
|
|
Enable the maximum number of CPUs and NUMA nodes for this architecture.
|
2008-05-12 19:21:12 +00:00
|
|
|
If unsure, say N.
|
2007-11-09 20:56:54 +00:00
|
|
|
|
2018-02-10 11:36:29 +00:00
|
|
|
#
|
|
|
|
# The maximum number of CPUs supported:
|
|
|
|
#
|
|
|
|
# The main config value is NR_CPUS, which defaults to NR_CPUS_DEFAULT,
|
|
|
|
# and which can be configured interactively in the
|
|
|
|
# [NR_CPUS_RANGE_BEGIN ... NR_CPUS_RANGE_END] range.
|
|
|
|
#
|
|
|
|
# The ranges are different on 32-bit and 64-bit kernels, depending on
|
|
|
|
# hardware capabilities and scalability features of the kernel.
|
|
|
|
#
|
|
|
|
# ( If MAXSMP is enabled we just use the highest possible value and disable
|
|
|
|
# interactive configuration. )
|
|
|
|
#
|
|
|
|
|
|
|
|
config NR_CPUS_RANGE_BEGIN
|
2018-02-10 00:51:03 +00:00
|
|
|
int
|
2018-02-10 11:36:29 +00:00
|
|
|
default NR_CPUS_RANGE_END if MAXSMP
|
|
|
|
default 1 if !SMP
|
|
|
|
default 2
|
2018-02-10 00:51:03 +00:00
|
|
|
|
2018-02-10 11:36:29 +00:00
|
|
|
config NR_CPUS_RANGE_END
|
2018-02-10 00:51:03 +00:00
|
|
|
int
|
2018-02-10 11:36:29 +00:00
|
|
|
depends on X86_32
|
|
|
|
default 64 if SMP && X86_BIGSMP
|
|
|
|
default 8 if SMP && !X86_BIGSMP
|
|
|
|
default 1 if !SMP
|
2018-02-10 00:51:03 +00:00
|
|
|
|
2018-02-10 11:36:29 +00:00
|
|
|
config NR_CPUS_RANGE_END
|
2018-02-10 00:51:03 +00:00
|
|
|
int
|
2018-02-10 11:36:29 +00:00
|
|
|
depends on X86_64
|
2019-10-12 07:00:54 +00:00
|
|
|
default 8192 if SMP && CPUMASK_OFFSTACK
|
|
|
|
default 512 if SMP && !CPUMASK_OFFSTACK
|
2018-02-10 11:36:29 +00:00
|
|
|
default 1 if !SMP
|
2018-02-10 00:51:03 +00:00
|
|
|
|
2018-02-10 11:36:29 +00:00
|
|
|
config NR_CPUS_DEFAULT
|
2018-02-10 00:51:03 +00:00
|
|
|
int
|
|
|
|
depends on X86_32
|
2018-02-10 11:36:29 +00:00
|
|
|
default 32 if X86_BIGSMP
|
|
|
|
default 8 if SMP
|
|
|
|
default 1 if !SMP
|
2018-02-10 00:51:03 +00:00
|
|
|
|
2018-02-10 11:36:29 +00:00
|
|
|
config NR_CPUS_DEFAULT
|
2018-02-10 00:51:03 +00:00
|
|
|
int
|
|
|
|
depends on X86_64
|
2018-02-10 11:36:29 +00:00
|
|
|
default 8192 if MAXSMP
|
|
|
|
default 64 if SMP
|
|
|
|
default 1 if !SMP
|
2018-02-10 00:51:03 +00:00
|
|
|
|
2007-11-09 20:56:54 +00:00
|
|
|
config NR_CPUS
|
2008-12-17 01:33:51 +00:00
|
|
|
int "Maximum number of CPUs" if SMP && !MAXSMP
|
2018-02-10 11:36:29 +00:00
|
|
|
range NR_CPUS_RANGE_BEGIN NR_CPUS_RANGE_END
|
|
|
|
default NR_CPUS_DEFAULT
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2007-11-09 20:56:54 +00:00
|
|
|
This allows you to specify the maximum number of CPUs which this
|
2013-11-05 14:37:29 +00:00
|
|
|
kernel will support. If CPUMASK_OFFSTACK is enabled, the maximum
|
2015-05-08 10:25:26 +00:00
|
|
|
supported value is 8192, otherwise the maximum value is 512. The
|
2007-11-09 20:56:54 +00:00
|
|
|
minimum value which makes sense is 2.
|
|
|
|
|
2018-02-10 11:36:29 +00:00
|
|
|
This is purely to save memory: each supported CPU adds about 8KB
|
|
|
|
to the kernel image.
|
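NR_CPUS_RANGE_END and NR_CPUS_DEFAULT are each defined twice above;
Kconfig merges repeated definitions of a symbol, and the per-block
'depends on X86_32'/'depends on X86_64' lines decide which set of
defaults applies. Read off the entries above, some representative
outcomes:

	x86-64, SMP=y, MAXSMP=n, CPUMASK_OFFSTACK=n:
		prompt range 2..512, CONFIG_NR_CPUS defaults to 64
	x86-64, MAXSMP=y:
		no prompt, CONFIG_NR_CPUS=8192
	x86-32, SMP=y, X86_BIGSMP=n:
		prompt range 2..8, CONFIG_NR_CPUS defaults to 8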
2007-11-09 20:56:54 +00:00
|
|
|
|
2021-09-24 08:51:04 +00:00
|
|
|
config SCHED_CLUSTER
|
|
|
|
bool "Cluster scheduler support"
|
|
|
|
depends on SMP
|
|
|
|
default y
|
|
|
|
help
|
|
|
|
Cluster scheduler support improves the CPU scheduler's decision
|
|
|
|
making when dealing with machines that have clusters of CPUs.
|
|
|
|
A cluster usually means a couple of CPUs that are placed closely
|
|
|
|
together, sharing mid-level caches, last-level cache tags or internal
|
|
|
|
busses.
|
|
|
|
|
2007-11-09 20:56:54 +00:00
|
|
|
config SCHED_SMT
|
2018-11-25 18:33:37 +00:00
|
|
|
def_bool y if SMP
|
2007-11-09 20:56:54 +00:00
|
|
|
|
|
|
|
config SCHED_MC
|
2008-01-30 12:31:03 +00:00
|
|
|
def_bool y
|
|
|
|
prompt "Multi-core scheduler support"
|
2015-06-04 16:55:25 +00:00
|
|
|
depends on SMP
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2007-11-09 20:56:54 +00:00
|
|
|
Multi-core scheduler support improves the CPU scheduler's decision
|
|
|
|
making when dealing with multi-core CPU chips at a cost of slightly
|
|
|
|
increased overhead in some places. If unsure say N here.
|
|
|
|
|
2016-11-29 18:43:27 +00:00
|
|
|
config SCHED_MC_PRIO
|
|
|
|
bool "CPU core priorities scheduler support"
|
2024-01-19 09:04:56 +00:00
|
|
|
depends on SCHED_MC
|
|
|
|
select X86_INTEL_PSTATE if CPU_SUP_INTEL
|
|
|
|
select X86_AMD_PSTATE if CPU_SUP_AMD && ACPI
|
2016-11-30 07:33:54 +00:00
|
|
|
select CPU_FREQ
|
2016-11-29 18:43:27 +00:00
|
|
|
default y
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2016-11-30 07:33:54 +00:00
|
|
|
Intel Turbo Boost Max Technology 3.0 enabled CPUs have a
|
|
|
|
core ordering determined at manufacturing time, which allows
|
|
|
|
certain cores to reach higher turbo frequencies (when running
|
|
|
|
single threaded workloads) than others.
|
2016-11-29 18:43:27 +00:00
|
|
|
|
2016-11-30 07:33:54 +00:00
|
|
|
Enabling this kernel feature teaches the scheduler about
|
|
|
|
the TBM3 (aka ITMT) priority order of the CPU cores and adjusts the
|
|
|
|
scheduler's CPU selection logic accordingly, so that higher
|
|
|
|
overall system performance can be achieved.
|
2016-11-29 18:43:27 +00:00
|
|
|
|
2016-11-30 07:33:54 +00:00
|
|
|
This option has no effect on CPUs that lack this capability.
|
2016-11-29 18:43:27 +00:00
|
|
|
|
2016-11-30 07:33:54 +00:00
|
|
|
If unsure say Y here.
|
2016-11-22 20:23:55 +00:00
|
|
|
|
2015-01-15 21:22:39 +00:00
|
|
|
config UP_LATE_INIT
|
2019-11-21 03:21:09 +00:00
|
|
|
def_bool y
|
|
|
|
depends on !SMP && X86_LOCAL_APIC
|
2015-01-15 21:22:39 +00:00
|
|
|
|
2007-11-09 20:56:54 +00:00
|
|
|
config X86_UP_APIC
|
2015-02-05 15:31:56 +00:00
|
|
|
bool "Local APIC support on uniprocessors" if !PCI_MSI
|
|
|
|
default PCI_MSI
|
2015-01-22 22:58:49 +00:00
|
|
|
depends on X86_32 && !SMP && !X86_32_NON_STANDARD
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2007-11-09 20:56:54 +00:00
|
|
|
A local APIC (Advanced Programmable Interrupt Controller) is an
|
|
|
|
integrated interrupt controller in the CPU. If you have a single-CPU
|
|
|
|
system which has a processor with a local APIC, you can say Y here to
|
|
|
|
enable and use it. If you say Y here even though your machine doesn't
|
|
|
|
have a local APIC, then the kernel will still run with no slowdown at
|
|
|
|
all. The local APIC supports CPU-generated self-interrupts (timer,
|
|
|
|
performance counters), and the NMI watchdog which detects hard
|
|
|
|
lockups.
|
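The 'bool ... if !PCI_MSI' prompt combined with 'default PCI_MSI' is a
forcing idiom: on the 32-bit UP configurations where this symbol exists,

	PCI_MSI=y  ->  no prompt, X86_UP_APIC silently forced to y
	PCI_MSI=n  ->  prompt shown, default n

so enabling MSI guarantees local APIC support without asking the user
(the commit message further below explains why).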
|
|
|
|
|
|
|
config X86_UP_IOAPIC
|
|
|
|
bool "IO-APIC support on uniprocessors"
|
|
|
|
depends on X86_UP_APIC
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2007-11-09 20:56:54 +00:00
|
|
|
An IO-APIC (I/O Advanced Programmable Interrupt Controller) is an
|
|
|
|
SMP-capable replacement for PC-style interrupt controllers. Most
|
|
|
|
SMP systems and many recent uniprocessor systems have one.
|
|
|
|
|
|
|
|
If you have a single-CPU system with an IO-APIC, you can say Y here
|
|
|
|
to use it. If you say Y here even though your machine doesn't have
|
|
|
|
an IO-APIC, then the kernel will still run with no slowdown at all.
|
|
|
|
|
|
|
|
config X86_LOCAL_APIC
	def_bool y
	depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_APIC || PCI_MSI
	select IRQ_DOMAIN_HIERARCHY

config ACPI_MADT_WAKEUP
	def_bool y
	depends on X86_64
	depends on ACPI
	depends on SMP
	depends on X86_LOCAL_APIC

config X86_IO_APIC
	def_bool y
	depends on X86_LOCAL_APIC || X86_UP_IOAPIC

config X86_REROUTE_FOR_BROKEN_BOOT_IRQS
	bool "Reroute for broken boot IRQs"
	depends on X86_IO_APIC
	help
	  This option enables a workaround that fixes a source of
	  spurious interrupts. This is recommended when threaded
	  interrupt handling is used on systems where the generation of
	  superfluous "boot interrupts" cannot be disabled.

	  Some chipsets generate a legacy INTx "boot IRQ" when the IRQ
	  entry in the chipset's IO-APIC is masked (as, e.g. the RT
	  kernel does during interrupt handling). On chipsets where this
	  boot IRQ generation cannot be disabled, this workaround keeps
	  the original IRQ line masked so that only the equivalent "boot
	  IRQ" is delivered to the CPUs. The workaround also tells the
	  kernel to set up the IRQ handler on the boot IRQ line. In this
	  way only one interrupt is delivered to the kernel. Otherwise
	  the spurious second interrupt may cause the kernel to bring
	  down (vital) interrupt lines.

	  Only affects "broken" chipsets. Interrupt sharing may be
	  increased on these systems.

config X86_MCE
	bool "Machine Check / overheating reporting"
	select GENERIC_ALLOCATOR
	default y
	help
	  Machine Check support allows the processor to notify the
	  kernel if it detects a problem (e.g. overheating, data corruption).
	  The action the kernel takes depends on the severity of the problem,
	  ranging from warning messages to halting the machine.

config X86_MCELOG_LEGACY
	bool "Support for deprecated /dev/mcelog character device"
	depends on X86_MCE
	help
	  Enable support for /dev/mcelog which is needed by the old mcelog
	  userspace logging daemon. Consider switching to the new generation
	  rasdaemon solution.

config X86_MCE_INTEL
	def_bool y
	prompt "Intel MCE features"
	depends on X86_MCE && X86_LOCAL_APIC
	help
	  Additional support for Intel-specific MCE features such as
	  the thermal monitor.

config X86_MCE_AMD
	def_bool y
	prompt "AMD MCE features"
	depends on X86_MCE && X86_LOCAL_APIC && AMD_NB
	help
	  Additional support for AMD-specific MCE features such as
	  the DRAM Error Threshold.

config X86_ANCIENT_MCE
	bool "Support for old Pentium 5 / WinChip machine checks"
	depends on X86_32 && X86_MCE
	help
	  Include support for machine check handling on old Pentium 5 or WinChip
	  systems. These typically need to be enabled explicitly on the command
	  line.

config X86_MCE_THRESHOLD
	depends on X86_MCE_AMD || X86_MCE_INTEL
	def_bool y

config X86_MCE_INJECT
	depends on X86_MCE && X86_LOCAL_APIC && DEBUG_FS
	tristate "Machine check injector support"
	help
	  Provide support for injecting machine checks for testing purposes.
	  If you don't know what a machine check is and you don't do kernel
	  QA it is safe to say n.

source "arch/x86/events/Kconfig"
|
2016-03-20 08:33:36 +00:00
|
|
|
|
2015-07-10 15:34:23 +00:00
|
|
|
config X86_LEGACY_VM86
	bool "Legacy VM86 support"
	depends on X86_32
	help
	  This option allows user programs to put the CPU into V8086
	  mode, which is an 80286-era approximation of 16-bit real mode.

	  Some very old versions of X and/or vbetool require this option
	  for user mode setting. Similarly, DOSEMU will use it if
	  available to accelerate real mode DOS programs. However, any
	  recent version of DOSEMU, X, or vbetool should be fully
	  functional even without kernel VM86 support, as they will all
	  fall back to software emulation. Nevertheless, if you are using
	  a 16-bit DOS program where 16-bit performance matters, vm86
	  mode might be faster than emulation and you might want to
	  enable this option.

	  Note that any app that works on a 64-bit kernel is unlikely to
	  need this option, as 64-bit kernels don't, and can't, support
	  V8086 mode. This option is also unrelated to 16-bit protected
	  mode and is not needed to run most 16-bit programs under Wine.

	  Enabling this option increases the complexity of the kernel
	  and slows down exception handling a tiny bit.

	  If unsure, say N here.

config VM86
	bool
	default X86_LEGACY_VM86

config X86_16BIT
	bool "Enable support for 16-bit segments" if EXPERT
	default y
	depends on MODIFY_LDT_SYSCALL
	help
	  This option is required by programs like Wine to run 16-bit
	  protected mode legacy code on x86 processors. Disabling
	  this option saves about 300 bytes on i386, or around 6K text
	  plus 16K runtime memory on x86-64.

config X86_ESPFIX32
	def_bool y
	depends on X86_16BIT && X86_32

config X86_ESPFIX64
	def_bool y
	depends on X86_16BIT && X86_64

config X86_VSYSCALL_EMULATION
	bool "Enable vsyscall emulation" if EXPERT
	default y
	depends on X86_64
	help
	  This enables emulation of the legacy vsyscall page. Disabling
	  it is roughly equivalent to booting with vsyscall=none, except
	  that it will also disable the helpful warning if a program
	  tries to use a vsyscall. With this option set to N, offending
	  programs will just segfault, citing addresses of the form
	  0xffffffffff600?00.

	  This option is required by many programs built before 2013, and
	  care should be used even with newer programs if set to N.

	  Disabling this option saves about 7K of kernel size and
	  possibly 4K of additional runtime pagetable memory.

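# Illustrative note (not part of the upstream help text): with this option
# built in, much of the effect of setting it to N can be approximated at
# boot time with the command line parameter mentioned in the help above:
#
#	vsyscall=none
#
# which likewise makes legacy vsyscall users segfault instead of being
# emulated.
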
config X86_IOPL_IOPERM
	bool "IOPERM and IOPL Emulation"
	default y
	help
	  This enables the ioperm() and iopl() syscalls which are necessary
	  for legacy applications.

	  Legacy IOPL support is an overbroad mechanism which allows user
	  space not only to access all 65536 I/O ports but also to disable
	  interrupts. To gain this access the caller needs the CAP_SYS_RAWIO
	  capability and permission from potentially active security
	  modules.

	  The emulation restricts the functionality of the syscall to
	  allowing only full-range I/O port access, but prevents the
	  ability to disable interrupts from user space, which would be
	  granted if the hardware IOPL mechanism were used.

config TOSHIBA
	tristate "Toshiba Laptop support"
	depends on X86_32
	help
	  This adds a driver to safely access the System Management Mode of
	  the CPU on Toshiba portables with a genuine Toshiba BIOS. It does
	  not work on models with a Phoenix BIOS. The System Management Mode
	  is used to set the BIOS and power saving options on Toshiba portables.

	  For information on utilities to make use of this driver see the
	  Toshiba Linux utilities web site at:
	  <http://www.buzzard.org.uk/toshiba/>.

	  Say Y if you intend to run this kernel on a Toshiba portable.
	  Say N otherwise.

config X86_REBOOTFIXUPS
	bool "Enable X86 board specific fixups for reboot"
	depends on X86_32
	help
	  This enables chipset and/or board specific fixups to be done
	  in order to get reboot to work correctly. This is only needed on
	  some combinations of hardware and BIOS. The symptom, for which
	  this config is intended, is when reboot ends with a stalled/hung
	  system.

	  Currently, the only fixup is for the Geode machines using
	  CS5530A and CS5536 chipsets and the RDC R-321x SoC.

	  Say Y if you want to enable the fixup. Currently, it's safe to
	  enable this option even if you don't need it.
	  Say N otherwise.

config MICROCODE
	def_bool y
	depends on CPU_SUP_AMD || CPU_SUP_INTEL

config MICROCODE_INITRD32
	def_bool y
	depends on MICROCODE && X86_32 && BLK_DEV_INITRD

config MICROCODE_LATE_LOADING
	bool "Late microcode loading (DANGEROUS)"
	default n
	depends on MICROCODE && SMP
	help
	  Loading microcode late, when the system is up and executing
	  instructions, is a tricky business and should be avoided if possible.
	  Just the sequence of synchronizing all cores and SMT threads is one
	  fragile dance which does not guarantee that cores will not softlock
	  after the loading. Therefore, use this at your own risk. Late loading
	  taints the kernel unless the microcode header indicates that it is
	  safe for late loading via the minimal revision check. This minimal
	  revision check can be enforced on the kernel command line with
	  "microcode.minrev=Y".

config MICROCODE_LATE_FORCE_MINREV
	bool "Enforce late microcode loading minimal revision check"
	default n
	depends on MICROCODE_LATE_LOADING
	help
	  To prevent users from late-loading microcode that modifies features
	  which are already in use, newer microcode patches have a minimum
	  revision field in the microcode header, which tells the kernel the
	  minimum revision that must be active in the CPU to safely load that
	  new microcode late into the running system. If disabled, the check
	  will not be enforced, but the kernel will be tainted when the minimal
	  revision check fails.

	  This minimal revision check can also be controlled via the
	  "microcode.minrev" parameter on the kernel command line.

	  If unsure say Y.

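# Illustrative example (not part of the upstream help text): the same
# enforcement can be toggled at boot time with the parameter named above,
# e.g. on the kernel command line:
#
#	microcode.minrev=Y	# enforce the minimal revision check
#	microcode.minrev=N	# skip the check; late loading taints the kernel
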
config X86_MSR
	tristate "/dev/cpu/*/msr - Model-specific register support"
	help
	  This device gives privileged processes access to the x86
	  Model-Specific Registers (MSRs). It is a character device with
	  major 202 and minors 0 to 31 for /dev/cpu/0/msr to /dev/cpu/31/msr.
	  MSR accesses are directed to a specific CPU on multi-processor
	  systems.

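# Illustrative usage sketch (not part of the upstream help text), assuming
# this option is built as a module and the msr-tools userspace package is
# installed; rdmsr(8) accesses MSRs through this device:
#
#	modprobe msr
#	rdmsr -p 0 0x10		# read the TSC MSR (0x10) on CPU 0
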
config X86_CPUID
	tristate "/dev/cpu/*/cpuid - CPU information support"
	help
	  This device gives processes access to the x86 CPUID instruction to
	  be executed on a specific processor. It is a character device
	  with major 203 and minors 0 to 31 for /dev/cpu/0/cpuid to
	  /dev/cpu/31/cpuid.

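# Illustrative usage sketch (not part of the upstream help text): each read
# returns the four 32-bit CPUID output registers for the leaf selected by
# the file offset; e.g. reading 16 bytes at offset 0 on CPU 0 yields
# EAX/EBX/ECX/EDX of leaf 0 (the vendor string lives in EBX, EDX, ECX):
#
#	dd if=/dev/cpu/0/cpuid bs=16 count=1 2>/dev/null | xxd
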
choice
	prompt "High Memory Support"
	default HIGHMEM4G
	depends on X86_32

config NOHIGHMEM
	bool "off"
	help
	  Linux can use up to 64 Gigabytes of physical memory on x86 systems.
	  However, the address space of 32-bit x86 processors is only 4
	  Gigabytes large. That means that, if you have a large amount of
	  physical memory, not all of it can be "permanently mapped" by the
	  kernel. The physical memory that's not permanently mapped is called
	  "high memory".

	  If you are compiling a kernel which will never run on a machine with
	  more than 1 Gigabyte total physical RAM, answer "off" here (default
	  choice and suitable for most users). This will result in a "3GB/1GB"
	  split: 3GB are mapped so that each process sees a 3GB virtual memory
	  space and the remaining part of the 4GB virtual memory space is used
	  by the kernel to permanently map as much physical memory as
	  possible.

	  If the machine has between 1 and 4 Gigabytes physical RAM, then
	  answer "4GB" here.

	  If more than 4 Gigabytes is used then answer "64GB" here. This
	  selection turns Intel PAE (Physical Address Extension) mode on.
	  PAE implements 3-level paging on IA32 processors. PAE is fully
	  supported by Linux; PAE mode is implemented on all recent Intel
	  processors (Pentium Pro and better). NOTE: If you say "64GB" here,
	  then the kernel will not boot on CPUs that don't support PAE!

	  The actual amount of total physical memory will either be
	  auto-detected or can be forced by using a kernel command line option
	  such as "mem=256M". (Try "man bootparam" or see the documentation of
	  your boot loader (lilo or loadlin) about how to pass options to the
	  kernel at boot time.)

	  If unsure, say "off".

config HIGHMEM4G
	bool "4GB"
	help
	  Select this if you have a 32-bit processor and between 1 and 4
	  gigabytes of physical RAM.

config HIGHMEM64G
	bool "64GB"
	depends on X86_HAVE_PAE
	select X86_PAE
	help
	  Select this if you have a 32-bit processor and more than 4
	  gigabytes of physical RAM.

endchoice

choice
	prompt "Memory split" if EXPERT
	default VMSPLIT_3G
	depends on X86_32
	help
	  Select the desired split between kernel and user memory.

	  If the address range available to the kernel is less than the
	  physical memory installed, the remaining memory will be available
	  as "high memory". Accessing high memory is a little more costly
	  than low memory, as it needs to be mapped into the kernel first.
	  Note that increasing the kernel address space limits the range
	  available to user programs, making the address space there
	  tighter. Selecting anything other than the default 3G/1G split
	  will also likely make your kernel incompatible with binary-only
	  kernel modules.

	  If you are not absolutely sure what you are doing, leave this
	  option alone!

config VMSPLIT_3G
	bool "3G/1G user/kernel split"
config VMSPLIT_3G_OPT
	depends on !X86_PAE
	bool "3G/1G user/kernel split (for full 1G low memory)"
config VMSPLIT_2G
	bool "2G/2G user/kernel split"
config VMSPLIT_2G_OPT
	depends on !X86_PAE
	bool "2G/2G user/kernel split (for full 2G low memory)"
config VMSPLIT_1G
	bool "1G/3G user/kernel split"
endchoice

config PAGE_OFFSET
	hex
	default 0xB0000000 if VMSPLIT_3G_OPT
	default 0x80000000 if VMSPLIT_2G
	default 0x78000000 if VMSPLIT_2G_OPT
	default 0x40000000 if VMSPLIT_1G
	default 0xC0000000
	depends on X86_32

config HIGHMEM
	def_bool y
	depends on X86_32 && (HIGHMEM64G || HIGHMEM4G)

config X86_PAE
	bool "PAE (Physical Address Extension) Support"
	depends on X86_32 && X86_HAVE_PAE
	select PHYS_ADDR_T_64BIT
	select SWIOTLB
	help
	  PAE is required for NX support, and furthermore enables
	  larger swapspace support for non-overcommit purposes. It
	  has the cost of more pagetable lookup overhead, and also
	  consumes more pagetable space per process.

config X86_5LEVEL
	bool "Enable 5-level page tables support"
	default y
	select DYNAMIC_MEMORY_LAYOUT
	select SPARSEMEM_VMEMMAP
	depends on X86_64
	help
	  5-level paging enables access to larger address space:
	  up to 128 PiB of virtual address space and 4 PiB of
	  physical address space.

	  It will be supported by future Intel CPUs.

	  A kernel with the option enabled can be booted on machines that
	  support 4- or 5-level paging.

	  See Documentation/arch/x86/x86_64/5level-paging.rst for more
	  information.

	  Say N if unsure.

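# Illustrative check (not part of the upstream help text): whether the CPU
# supports 5-level paging can be verified from userspace via the "la57"
# feature flag, e.g.:
#
#	grep -wo la57 /proc/cpuinfo | head -1
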
config X86_DIRECT_GBPAGES
	def_bool y
	depends on X86_64
	help
	  Certain kernel features effectively disable kernel
	  linear 1 GB mappings (even if the CPU otherwise
	  supports them), so don't confuse the user by printing
	  that we have them enabled.

config X86_CPA_STATISTICS
	bool "Enable statistics for Change Page Attribute"
	depends on DEBUG_FS
	help
	  Expose statistics about the Change Page Attribute mechanism, which
	  helps to determine the effectiveness of preserving large and huge
	  page mappings when mapping protections are changed.

config X86_MEM_ENCRYPT
	select ARCH_HAS_FORCE_DMA_UNENCRYPTED
	select DYNAMIC_PHYSICAL_MASK
	def_bool n

config AMD_MEM_ENCRYPT
	bool "AMD Secure Memory Encryption (SME) support"
	depends on X86_64 && CPU_SUP_AMD
	depends on EFI_STUB
	select DMA_COHERENT_POOL
	select ARCH_USE_MEMREMAP_PROT
	select INSTRUCTION_DECODER
	select ARCH_HAS_CC_PLATFORM
	select X86_MEM_ENCRYPT
	select UNACCEPTED_MEMORY
	help
	  Say yes to enable support for the encryption of system memory.
	  This requires an AMD processor that supports Secure Memory
	  Encryption (SME).

# Common NUMA Features
config NUMA
	bool "NUMA Memory Allocation and Scheduler Support"
	depends on SMP
	depends on X86_64 || (X86_32 && HIGHMEM64G && X86_BIGSMP)
	default y if X86_BIGSMP
	select USE_PERCPU_NUMA_NODE_ID
	select OF_NUMA if OF
	help
	  Enable NUMA (Non-Uniform Memory Access) support.

	  The kernel will try to allocate memory used by a CPU on the
	  local memory controller of the CPU and add some more
	  NUMA awareness to the kernel.

	  For 64-bit this is recommended if the system is Intel Core i7
	  (or later), AMD Opteron, or EM64T NUMA.

	  For 32-bit this is only needed if you boot a 32-bit
	  kernel on a 64-bit NUMA platform.

	  Otherwise, you should say N.

config AMD_NUMA
	def_bool y
	prompt "Old style AMD Opteron NUMA detection"
	depends on X86_64 && NUMA && PCI
	help
	  Enable AMD NUMA node topology detection. You should say Y here if
	  you have a multiprocessor AMD system. This uses an old method to
	  read the NUMA configuration directly from the builtin Northbridge
	  of Opteron. It is recommended to use X86_64_ACPI_NUMA instead,
	  which also takes priority if both are compiled in.

config X86_64_ACPI_NUMA
	def_bool y
	prompt "ACPI NUMA detection"
	depends on X86_64 && NUMA && ACPI && PCI
	select ACPI_NUMA
	help
	  Enable ACPI SRAT based node topology detection.

config NODES_SHIFT
	int "Maximum NUMA Nodes (as a power of 2)" if !MAXSMP
	range 1 10
	default "10" if MAXSMP
	default "6" if X86_64
	default "3"
	depends on NUMA
	help
	  Specify the maximum number of NUMA Nodes available on the target
	  system. Increases memory reserved to accommodate various tables.

config ARCH_FLATMEM_ENABLE
	def_bool y
	depends on X86_32 && !NUMA

config ARCH_SPARSEMEM_ENABLE
	def_bool y
	depends on X86_64 || NUMA || X86_32 || X86_32_NON_STANDARD
	select SPARSEMEM_STATIC if X86_32
	select SPARSEMEM_VMEMMAP_ENABLE if X86_64

config ARCH_SPARSEMEM_DEFAULT
	def_bool X86_64 || (NUMA && X86_32)

config ARCH_SELECT_MEMORY_MODEL
	def_bool y
	depends on ARCH_SPARSEMEM_ENABLE && ARCH_FLATMEM_ENABLE

config ARCH_MEMORY_PROBE
	bool "Enable sysfs memory/probe interface"
	depends on MEMORY_HOTPLUG
	help
	  This option enables a sysfs memory/probe interface for testing.
	  See Documentation/admin-guide/mm/memory-hotplug.rst for more
	  information. If you are unsure how to answer this question, answer N.

config ARCH_PROC_KCORE_TEXT
	def_bool y
	depends on X86_64 && PROC_KCORE

config ILLEGAL_POINTER_VALUE
	hex
	default 0 if X86_32
	default 0xdead000000000000 if X86_64

config X86_PMEM_LEGACY_DEVICE
	bool

config X86_PMEM_LEGACY
	tristate "Support non-standard NVDIMMs and ADR protected memory"
	depends on PHYS_ADDR_T_64BIT
	depends on BLK_DEV
	select X86_PMEM_LEGACY_DEVICE
	select NUMA_KEEP_MEMINFO if NUMA
	select LIBNVDIMM
	help
	  Treat memory marked using the non-standard e820 type of 12 as used
	  by the Intel Sandy Bridge-EP reference BIOS as protected memory.
	  The kernel will offer these regions to the 'pmem' driver so
	  they can be used for persistent storage.

	  Say Y if unsure.

config HIGHPTE
	bool "Allocate 3rd-level pagetables from highmem"
	depends on HIGHMEM
	help
	  The VM uses one page table entry for each page of physical memory.
	  For systems with a lot of RAM, this can be wasteful of precious
	  low memory. Setting this option will put user-space page table
	  entries in high memory.

config X86_CHECK_BIOS_CORRUPTION
	bool "Check for low memory corruption"
	help
	  Periodically check for memory corruption in low memory, which
	  is suspected to be caused by BIOS. Even when enabled in the
	  configuration, it is disabled at runtime. Enable it by
	  setting "memory_corruption_check=1" on the kernel command
	  line. By default it scans the low 64k of memory every 60
	  seconds; see the memory_corruption_check_size and
	  memory_corruption_check_period parameters in
	  Documentation/admin-guide/kernel-parameters.rst to adjust this.

	  When enabled with the default parameters, this option has
	  almost no overhead, as it reserves a relatively small amount
	  of memory and scans it infrequently. It both detects corruption
	  and prevents it from affecting the running system.

	  It is, however, intended as a diagnostic tool; if repeatable
	  BIOS-originated corruption always affects the same memory,
	  you can use memmap= to prevent the kernel from using that
	  memory.

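# Illustrative example (not part of the upstream help text): enabling the
# check at boot and adjusting the scan using the parameters named above,
# e.g. on the kernel command line:
#
#	memory_corruption_check=1 memory_corruption_check_size=64K \
#	memory_corruption_check_period=30
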
config X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK
	bool "Set the default setting of memory_corruption_check"
	depends on X86_CHECK_BIOS_CORRUPTION
	default y
	help
	  Set whether the default state of memory_corruption_check is
	  on or off.

config MATH_EMULATION
	bool
	depends on MODIFY_LDT_SYSCALL
	prompt "Math emulation" if X86_32 && (M486SX || MELAN)
	help
	  Linux can emulate a math coprocessor (used for floating point
	  operations) if you don't have one. 486DX and Pentium processors have
	  a math coprocessor built in, 486SX and 386 do not, unless you added
	  a 487DX or 387, respectively. (The messages during boot time can
	  give you some hints here ["man dmesg"].) Everyone needs either a
	  coprocessor or this emulation.

	  If you don't have a math coprocessor, you need to say Y here; if you
	  say Y here even though you have a coprocessor, the coprocessor will
	  be used nevertheless. (This behavior can be changed with the kernel
	  command line option "no387", which comes handy if your coprocessor
	  is broken. Try "man bootparam" or see the documentation of your boot
	  loader (lilo or loadlin) about how to pass options to the kernel at
	  boot time.) This means that it is a good idea to say Y here if you
	  intend to use this kernel on different machines.

	  More information about the internals of the Linux math coprocessor
	  emulation can be found in <file:arch/x86/math-emu/README>.

	  If you are not sure, say Y; apart from resulting in a 66 KB bigger
	  kernel, it won't hurt.

config MTRR
	def_bool y
	prompt "MTRR (Memory Type Range Register) support" if EXPERT
	help
	  On Intel P6 family processors (Pentium Pro, Pentium II and later)
	  the Memory Type Range Registers (MTRRs) may be used to control
	  processor access to memory ranges. This is most useful if you have
	  a video (VGA) card on a PCI or AGP bus. Enabling write-combining
	  allows bus write transfers to be combined into a larger transfer
	  before bursting over the PCI/AGP bus. This can increase performance
	  of image write operations 2.5 times or more. Saying Y here creates a
	  /proc/mtrr file which may be used to manipulate your processor's
	  MTRRs. Typically the X server should use this.

	  This code has a reasonably generic interface so that similar
	  control registers on other processors can be easily supported
	  as well:

	  The Cyrix 6x86, 6x86MX and M II processors have Address Range
	  Registers (ARRs) which provide a similar functionality to MTRRs. For
	  these, the ARRs are used to emulate the MTRRs.
	  The AMD K6-2 (stepping 8 and above) and K6-3 processors have two
	  MTRRs. The Centaur C6 (WinChip) has 8 MCRs, allowing
	  write-combining. All of these processors are supported by this code
	  and it makes sense to say Y here if you have one of them.

	  Saying Y here also fixes a problem with buggy SMP BIOSes which only
	  set the MTRRs for the boot CPU and not for the secondary CPUs. This
	  can lead to all sorts of problems, so it's good to say Y here.

	  You can safely say Y even if your machine doesn't have MTRRs, you'll
	  just add about 9 KB to your kernel.

	  See <file:Documentation/arch/x86/mtrr.rst> for more information.

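# Illustrative usage sketch (not part of the upstream help text): the
# /proc/mtrr interface mentioned above can be inspected and, with root
# privileges, modified from a shell, e.g.:
#
#	cat /proc/mtrr
#	echo "base=0xf8000000 size=0x400000 type=write-combining" > /proc/mtrr
#
# The base/size values here are placeholders; see
# Documentation/arch/x86/mtrr.rst for the exact write syntax.
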
config MTRR_SANITIZER
	def_bool y
	prompt "MTRR cleanup support"
	depends on MTRR
	help
	  Convert MTRR layout from continuous to discrete, so X drivers can
	  add writeback entries.

	  Can be disabled with disable_mtrr_cleanup on the kernel command line.
	  The largest MTRR entry size for a continuous block can be set with
	  mtrr_chunk_size.

	  If unsure, say Y.

config MTRR_SANITIZER_ENABLE_DEFAULT
|
2008-04-30 03:25:58 +00:00
|
|
|
int "MTRR cleanup enable value (0-1)"
|
|
|
|
range 0 1
|
|
|
|
default "0"
|
2008-04-29 10:52:33 +00:00
|
|
|
depends on MTRR_SANITIZER
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2008-04-30 03:25:58 +00:00
|
|
|
Default value for enabling the MTRR cleanup at boot (0 = off, 1 = on).
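A usage sketch (enable_mtrr_cleanup is the documented boot-time
counterpart of this default): a kernel built with the default of "0"
can still run the cleanup for a single boot by adding
enable_mtrr_cleanup
to the kernel command line.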
|
2008-04-29 10:52:33 +00:00
|
|
|
|
2008-05-02 09:40:22 +00:00
|
|
|
config MTRR_SANITIZER_SPARE_REG_NR_DEFAULT
|
|
|
|
int "MTRR cleanup spare reg num (0-7)"
|
|
|
|
range 0 7
|
|
|
|
default "1"
|
|
|
|
depends on MTRR_SANITIZER
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2008-05-02 09:40:22 +00:00
|
|
|
Default number of spare MTRR entries kept by the cleanup; it can be changed via
|
2008-07-15 12:48:48 +00:00
|
|
|
mtrr_spare_reg_nr=N on the kernel command line.
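For example (the value is hypothetical), to keep two spare entries
instead of the default one, boot with:
mtrr_spare_reg_nr=2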
|
2008-05-02 09:40:22 +00:00
|
|
|
|
2008-03-19 00:00:14 +00:00
|
|
|
config X86_PAT
|
2010-04-21 14:23:44 +00:00
|
|
|
def_bool y
|
2011-01-20 22:44:16 +00:00
|
|
|
prompt "x86 PAT support" if EXPERT
|
2008-04-26 08:26:52 +00:00
|
|
|
depends on MTRR
|
2024-08-21 19:34:43 +00:00
|
|
|
select ARCH_USES_PG_ARCH_2
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2008-03-19 00:00:14 +00:00
|
|
|
Use PAT attributes to setup page level cache control.
|
2008-03-24 21:22:35 +00:00
|
|
|
|
2008-03-19 00:00:14 +00:00
|
|
|
PATs are the modern equivalents of MTRRs and are much more
|
|
|
|
flexible than MTRRs.
|
|
|
|
|
|
|
|
Say N here if you see bootup problems (boot crash, boot hang,
|
2008-03-24 21:22:35 +00:00
|
|
|
spontaneous reboots) or a non-working video driver.
|
2008-03-19 00:00:14 +00:00
|
|
|
|
|
|
|
If unsure, say Y.
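A hedged operational note: even with this option built in, PAT can be
disabled for a single boot via the documented nopat command line
parameter:
nopat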
|
|
|
|
|
2019-11-05 21:25:32 +00:00
|
|
|
config X86_UMIP
|
2017-11-14 06:29:42 +00:00
|
|
|
def_bool y
|
2019-11-05 21:25:32 +00:00
|
|
|
prompt "User Mode Instruction Prevention" if EXPERT
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2019-11-05 21:25:32 +00:00
|
|
|
User Mode Instruction Prevention (UMIP) is a security feature in
|
|
|
|
some x86 processors. If enabled, a general protection fault is
|
|
|
|
issued if the SGDT, SLDT, SIDT, SMSW or STR instructions are
|
|
|
|
executed in user mode. These instructions unnecessarily expose
|
|
|
|
information about the hardware state.
|
2017-11-14 06:29:42 +00:00
|
|
|
|
|
|
|
The vast majority of applications do not use these instructions.
|
|
|
|
For the very few that do, software emulation is provided in
|
|
|
|
specific cases in protected and virtual-8086 modes. Emulated
|
|
|
|
results contain only dummy values.
|
2017-11-06 02:27:54 +00:00
|
|
|
|
2022-03-08 15:30:17 +00:00
|
|
|
config CC_HAS_IBT
|
|
|
|
# GCC >= 9 and binutils >= 2.29
|
|
|
|
# Retpoline check to work around https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93654
|
|
|
|
# Clang/LLVM >= 14
|
2022-03-18 23:07:46 +00:00
|
|
|
# https://github.com/llvm/llvm-project/commit/e0b89df2e0f0130881bf6c39bf31d7f6aac00e0f
|
|
|
|
# https://github.com/llvm/llvm-project/commit/dfcf69770bc522b9e411c66454934a37c1f35332
|
2022-03-08 15:30:17 +00:00
|
|
|
def_bool ((CC_IS_GCC && $(cc-option, -fcf-protection=branch -mindirect-branch-register)) || \
|
2022-03-18 23:07:46 +00:00
|
|
|
(CC_IS_CLANG && CLANG_VERSION >= 140000)) && \
|
2022-03-08 15:30:17 +00:00
|
|
|
$(as-instr,endbr64)
|
|
|
|
|
2023-06-13 00:10:32 +00:00
|
|
|
config X86_CET
|
|
|
|
def_bool n
|
|
|
|
help
|
|
|
|
CET features configured (Shadow stack or IBT)
|
|
|
|
|
2022-03-08 15:30:17 +00:00
|
|
|
config X86_KERNEL_IBT
|
|
|
|
prompt "Indirect Branch Tracking"
|
2022-11-01 17:25:07 +00:00
|
|
|
def_bool y
|
2022-04-18 16:50:36 +00:00
|
|
|
depends on X86_64 && CC_HAS_IBT && HAVE_OBJTOOL
|
2022-03-18 23:07:47 +00:00
|
|
|
# https://github.com/llvm/llvm-project/commit/9d7001eba9c4cb311e03cd8cdc231f9e579f2d0f
|
|
|
|
depends on !LD_IS_LLD || LLD_VERSION >= 140000
|
2022-04-18 16:50:36 +00:00
|
|
|
select OBJTOOL
|
2023-06-13 00:10:32 +00:00
|
|
|
select X86_CET
|
2022-03-08 15:30:17 +00:00
|
|
|
help
|
|
|
|
Build the kernel with support for Indirect Branch Tracking, a
|
|
|
|
hardware-supported coarse-grain forward-edge Control Flow Integrity
|
|
|
|
protection. It enforces that all indirect calls must land on
|
|
|
|
an ENDBR instruction; as such, the compiler will instrument the
|
|
|
|
code with them to make this happen.
|
|
|
|
|
2022-03-08 15:30:56 +00:00
|
|
|
In addition to building the kernel with IBT, seal all functions that
|
2022-04-17 19:24:54 +00:00
|
|
|
are not indirect call targets, avoiding them ever becoming one.
|
2022-03-08 15:30:56 +00:00
|
|
|
|
|
|
|
This requires an LTO-like objtool run and will slow down the build. It
|
|
|
|
does significantly reduce the number of ENDBR instructions in the
|
|
|
|
kernel image.
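To make the instrumentation concrete, a minimal sketch (illustrative
disassembly, not taken from a real build): every function that may be
reached indirectly begins with ENDBR, the only valid landing pad for an
indirect call or jump; landing anywhere else raises a
control-protection fault (#CP).
<indirect_target>:
    endbr64        # valid landing pad under IBT
    push   %rbp    # ordinary prologue continues
    ...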
|
|
|
|
|
2016-02-12 21:02:00 +00:00
|
|
|
config X86_INTEL_MEMORY_PROTECTION_KEYS
|
2020-05-28 16:08:23 +00:00
|
|
|
prompt "Memory Protection Keys"
|
2016-02-12 21:02:00 +00:00
|
|
|
def_bool y
|
2016-02-12 21:02:28 +00:00
|
|
|
# Note: only available in 64-bit mode
|
2020-05-28 16:08:23 +00:00
|
|
|
depends on X86_64 && (CPU_SUP_INTEL || CPU_SUP_AMD)
|
2016-11-15 09:15:03 +00:00
|
|
|
select ARCH_USES_HIGH_VMA_FLAGS
|
|
|
|
select ARCH_HAS_PKEYS
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2016-02-12 21:02:28 +00:00
|
|
|
Memory Protection Keys provides a mechanism for enforcing
|
|
|
|
page-based protections, but without requiring modification of the
|
|
|
|
page tables when an application changes protection domains.
|
|
|
|
|
2019-06-07 18:54:31 +00:00
|
|
|
For details, see Documentation/core-api/protection-keys.rst
|
2016-02-12 21:02:28 +00:00
|
|
|
|
|
|
|
If unsure, say y.
|
2016-02-12 21:02:00 +00:00
|
|
|
|
2024-08-22 15:10:45 +00:00
|
|
|
config ARCH_PKEY_BITS
|
|
|
|
int
|
|
|
|
default 4
|
|
|
|
|
2019-10-23 10:35:50 +00:00
|
|
|
choice
|
|
|
|
prompt "TSX enable mode"
|
|
|
|
depends on CPU_SUP_INTEL
|
|
|
|
default X86_INTEL_TSX_MODE_OFF
|
|
|
|
help
|
|
|
|
Intel's TSX (Transactional Synchronization Extensions) feature
|
|
|
|
allows locking protocols to be optimized through lock elision, which
|
|
|
|
can lead to a noticeable performance boost.
|
|
|
|
|
|
|
|
On the other hand it has been shown that TSX can be exploited
|
|
|
|
to form side channel attacks (e.g. TAA) and chances are there
|
|
|
|
will be more of those attacks discovered in the future.
|
|
|
|
|
|
|
|
Therefore TSX is not enabled by default (aka tsx=off). An admin
|
|
|
|
might override this decision via the tsx=on command line parameter.
|
|
|
|
Even with TSX enabled, the kernel will attempt to enable the best
|
|
|
|
possible TAA mitigation setting depending on the microcode available
|
|
|
|
for the particular machine.
|
|
|
|
|
|
|
|
This option allows setting the default TSX mode among tsx=on, =off
|
|
|
|
and =auto. See Documentation/admin-guide/kernel-parameters.txt for more
|
|
|
|
details.
|
|
|
|
|
|
|
|
Say off if not sure, auto if TSX is in use but should only be enabled on safe
|
|
|
|
platforms, or on if TSX is in use and the security aspect of TSX is not
|
|
|
|
relevant.
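For example, a kernel built with X86_INTEL_TSX_MODE_OFF can still
enable TSX on a machine considered safe by booting with:
tsx=on
or let the kernel decide based on the TAA mitigation status with
tsx=auto.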
|
|
|
|
|
|
|
|
config X86_INTEL_TSX_MODE_OFF
|
|
|
|
bool "off"
|
|
|
|
help
|
|
|
|
TSX is disabled if possible - equivalent to the tsx=off command line parameter.
|
|
|
|
|
|
|
|
config X86_INTEL_TSX_MODE_ON
|
|
|
|
bool "on"
|
|
|
|
help
|
|
|
|
TSX is always enabled on TSX-capable HW - equivalent to the tsx=on command
|
|
|
|
line parameter.
|
|
|
|
|
|
|
|
config X86_INTEL_TSX_MODE_AUTO
|
|
|
|
bool "auto"
|
|
|
|
help
|
|
|
|
TSX is enabled on TSX-capable HW that is believed to be safe against
|
|
|
|
side channel attacks - equivalent to the tsx=auto command line parameter.
|
|
|
|
endchoice
|
|
|
|
|
2020-11-12 22:01:16 +00:00
|
|
|
config X86_SGX
|
|
|
|
bool "Software Guard eXtensions (SGX)"
|
2022-08-16 23:19:42 +00:00
|
|
|
depends on X86_64 && CPU_SUP_INTEL && X86_X2APIC
|
2020-11-12 22:01:16 +00:00
|
|
|
depends on CRYPTO=y
|
|
|
|
depends on CRYPTO_SHA256=y
|
|
|
|
select MMU_NOTIFIER
|
2021-03-17 23:53:31 +00:00
|
|
|
select NUMA_KEEP_MEMINFO if NUMA
|
2021-10-26 22:00:45 +00:00
|
|
|
select XARRAY_MULTI
|
2020-11-12 22:01:16 +00:00
|
|
|
help
|
|
|
|
Intel(R) Software Guard eXtensions (SGX) is a set of CPU instructions
|
|
|
|
that can be used by applications to set aside private regions of code
|
|
|
|
and data, referred to as enclaves. An enclave's private memory can
|
|
|
|
only be accessed by code running within the enclave. Accesses from
|
|
|
|
outside the enclave, including other enclaves, are disallowed by
|
|
|
|
hardware.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2023-06-13 00:10:32 +00:00
|
|
|
config X86_USER_SHADOW_STACK
|
|
|
|
bool "X86 userspace shadow stack"
|
|
|
|
depends on AS_WRUSS
|
|
|
|
depends on X86_64
|
|
|
|
select ARCH_USES_HIGH_VMA_FLAGS
|
2024-10-01 22:58:40 +00:00
|
|
|
select ARCH_HAS_USER_SHADOW_STACK
|
2023-06-13 00:10:32 +00:00
|
|
|
select X86_CET
|
|
|
|
help
|
|
|
|
Shadow stack protection is a hardware feature that detects function
|
|
|
|
return address corruption. This helps mitigate ROP attacks.
|
|
|
|
Applications must be enabled to use it, and old userspace does not
|
|
|
|
get protection "for free".
|
|
|
|
|
|
|
|
CPUs supporting shadow stacks were first released in 2020.
|
|
|
|
|
2023-08-02 20:37:22 +00:00
|
|
|
See Documentation/arch/x86/shstk.rst for more information.
|
2023-06-13 00:10:32 +00:00
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2023-08-15 11:02:04 +00:00
|
|
|
config INTEL_TDX_HOST
|
|
|
|
bool "Intel Trust Domain Extensions (TDX) host support"
|
|
|
|
depends on CPU_SUP_INTEL
|
|
|
|
depends on X86_64
|
|
|
|
depends on KVM_INTEL
|
2023-12-08 17:07:23 +00:00
|
|
|
depends on X86_X2APIC
|
2023-12-08 17:07:27 +00:00
|
|
|
select ARCH_KEEP_MEMBLOCK
|
2023-12-08 17:07:31 +00:00
|
|
|
depends on CONTIG_ALLOC
|
2023-12-08 17:07:40 +00:00
|
|
|
depends on !KEXEC_CORE
|
2023-12-13 22:28:25 +00:00
|
|
|
depends on X86_MCE
|
2023-08-15 11:02:04 +00:00
|
|
|
help
|
|
|
|
Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
|
|
|
|
host and certain physical attacks. This option enables necessary TDX
|
|
|
|
support in the host kernel to run confidential VMs.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2007-11-09 20:56:54 +00:00
|
|
|
config EFI
|
2008-10-16 05:01:38 +00:00
|
|
|
bool "EFI runtime service support"
|
2008-01-30 12:31:19 +00:00
|
|
|
depends on ACPI
|
2013-04-16 14:31:08 +00:00
|
|
|
select UCS2_STRING
|
2014-06-26 10:09:05 +00:00
|
|
|
select EFI_RUNTIME_WRAPPERS
|
2021-10-20 18:02:11 +00:00
|
|
|
select ARCH_USE_MEMREMAP_PROT
|
2023-08-02 15:17:04 +00:00
|
|
|
select EFI_RUNTIME_MAP if KEXEC_CORE
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2009-02-05 15:21:53 +00:00
|
|
|
This enables the kernel to use EFI runtime services that are
|
|
|
|
available (such as the EFI variable services).
|
2007-11-09 20:56:54 +00:00
|
|
|
|
2009-02-05 15:21:53 +00:00
|
|
|
This option is only useful on systems that have EFI firmware.
|
|
|
|
In addition, you should use the latest ELILO loader available
|
|
|
|
at <http://elilo.sourceforge.net> in order to take advantage
|
|
|
|
of EFI runtime services. However, even with this option, the
|
|
|
|
resultant kernel should continue to boot on existing non-EFI
|
|
|
|
platforms.
|
2007-11-09 20:56:54 +00:00
|
|
|
|
x86, efi: EFI boot stub support
There is currently a large divide between kernel development and the
development of EFI boot loaders. The idea behind this patch is to give
the kernel developers full control over the EFI boot process. As
H. Peter Anvin put it,
"The 'kernel carries its own stub' approach been very successful in
dealing with BIOS, and would make a lot of sense to me for EFI as
well."
This patch introduces an EFI boot stub that allows an x86 bzImage to
be loaded and executed by EFI firmware. The bzImage appears to the
firmware as an EFI application. Luckily there are enough free bits
within the bzImage header so that it can masquerade as an EFI
application, thereby coercing the EFI firmware into loading it and
jumping to its entry point. The beauty of this masquerading approach
is that both BIOS and EFI boot loaders can still load and run the same
bzImage, thereby allowing a single kernel image to work in any boot
environment.
The EFI boot stub supports multiple initrds, but they must exist on
the same partition as the bzImage. Command-line arguments for the
kernel can be appended after the bzImage name when run from the EFI
shell, e.g.
Shell> bzImage console=ttyS0 root=/dev/sdb initrd=initrd.img
v7:
- Fix checkpatch warnings.
v6:
- Try to allocate initrd memory just below hdr->initrd_addr_max.
v5:
- load_options_size is UTF-16, which needs dividing by 2 to convert
to the corresponding ASCII size.
v4:
- Don't read more than image->load_options_size
v3:
- Fix following warnings when compiling CONFIG_EFI_STUB=n
arch/x86/boot/tools/build.c: In function ‘main’:
arch/x86/boot/tools/build.c:138:24: warning: unused variable ‘pe_header’
arch/x86/boot/tools/build.c:138:15: warning: unused variable ‘file_sz’
- As reported by Matthew Garrett, some Apple machines have GOPs that
don't have hardware attached. We need to weed these out by
searching for ones that handle the PCIIO protocol.
- Don't allocate memory if no initrds are on cmdline
- Don't trust image->load_options_size
Maarten Lankhorst noted:
- Don't strip first argument when booted from efibootmgr
- Don't allocate too much memory for cmdline
- Don't update cmdline_size, the kernel considers it read-only
- Don't accept '\n' for initrd names
v2:
- File alignment was too large, was 8192 should be 512. Reported by
Maarten Lankhorst on LKML.
- Added UGA support for graphics
- Use VIDEO_TYPE_EFI instead of hard-coded number.
- Move linelength assignment until after we've assigned depth
- Dynamically fill out AddressOfEntryPoint in tools/build.c
- Don't use magic number for GDT/TSS stuff. Requested by Andi Kleen
- The bzImage may need to be relocated as it may have been loaded at
a high address by the firmware. This was required to get my
macbook booting because the firmware loaded it at 0x7cxxxxxx, which
triggers this error in decompress_kernel(),
if (heap > ((-__PAGE_OFFSET-(128<<20)-1) & 0x7fffffff))
error("Destination address too large");
Cc: Mike Waychison <mikew@google.com>
Cc: Matthew Garrett <mjg@redhat.com>
Tested-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/1321383097.2657.9.camel@mfleming-mobl1.ger.corp.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-12-12 21:27:52 +00:00
|
|
|
config EFI_STUB
|
2019-12-24 15:10:12 +00:00
|
|
|
bool "EFI stub support"
|
2021-11-15 16:46:39 +00:00
|
|
|
depends on EFI
|
2019-12-24 15:10:12 +00:00
|
|
|
select RELOCATABLE
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2019-12-24 15:10:12 +00:00
|
|
|
This kernel feature allows a bzImage to be loaded directly
|
2011-12-12 21:27:52 +00:00
|
|
|
by EFI firmware without the use of a bootloader.
|
|
|
|
|
2019-06-27 17:56:51 +00:00
|
|
|
See Documentation/admin-guide/efi-stub.rst for more information.
|
2012-03-16 12:03:13 +00:00
|
|
|
|
x86/efi: Make the deprecated EFI handover protocol optional
The EFI handover protocol permits a bootloader to invoke the kernel as an
EFI PE/COFF application, while passing a bootparams struct as a third
argument to the entrypoint function call.
This has no basis in the UEFI specification, and there are better ways
to pass additional data to a UEFI application (UEFI configuration
tables, UEFI variables, UEFI protocols) than going around the
StartImage() boot service and jumping to a fixed offset in the loaded
image, just to call a different function that takes a third parameter.
The reason for handling struct bootparams in the bootloader was that the
EFI stub could only load initrd images from the EFI system partition,
and so passing it via struct bootparams was needed for loaders like
GRUB, which pass the initrd in memory, and may load it from anywhere,
including from the network. Another motivation was EFI mixed mode, which
could not use the initrd loader in the EFI stub at all due to 32/64 bit
incompatibilities (which will be fixed shortly [0]), and could not
invoke the ordinary PE/COFF entry point either, for the same reasons.
Given that loaders such as GRUB already carried the bootparams handling
in order to implement non-EFI boot, retaining that code and just passing
bootparams to the EFI stub was a reasonable choice (although defining an
alternate entrypoint could have been avoided.) However, the GRUB side
changes never made it upstream, and are only shipped by some of the
distros in their downstream versions.
In the meantime, EFI support has been added to other Linux architecture
ports, as well as to U-boot and systemd, including arch-agnostic methods
for passing initrd images in memory [1], and for doing mixed mode boot
[2], none of them requiring anything like the EFI handover protocol. So
given that only out-of-tree distro GRUB relies on this, let's permit it
to be omitted from the build, in preparation for retiring it completely
at a later date. (Note that systemd-boot does have an implementation as
well, but only uses it as a fallback for booting images that do not
implement the LoadFile2 based initrd loading method, i.e., v5.8 or older)
[0] https://lore.kernel.org/all/20220927085842.2860715-1-ardb@kernel.org/
[1] ec93fc371f01 ("efi/libstub: Add support for loading the initrd from a device path")
[2] 97aa276579b2 ("efi/x86: Add true mixed mode entry point into .compat section")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221122161017.2426828-18-ardb@kernel.org
2022-11-22 16:10:17 +00:00
|
|
|
config EFI_HANDOVER_PROTOCOL
|
|
|
|
bool "EFI handover protocol (DEPRECATED)"
|
|
|
|
depends on EFI_STUB
|
|
|
|
default y
|
|
|
|
help
|
|
|
|
Select this in order to include support for the deprecated EFI
|
|
|
|
handover protocol, which defines alternative entry points into the
|
|
|
|
EFI stub. This is a practice that has no basis in the UEFI
|
|
|
|
specification, and requires a priori knowledge on the part of the
|
|
|
|
bootloader about Linux/x86 specific ways of passing the command line
|
|
|
|
and initrd, and where in memory those assets may be loaded.
|
|
|
|
|
|
|
|
If in doubt, say Y. Even though the corresponding support is not
|
|
|
|
present in upstream GRUB or other bootloaders, most distros build
|
|
|
|
GRUB with numerous downstream patches applied, and may rely on the
|
|
|
|
handover protocol as a result.
|
|
|
|
|
2014-01-10 18:52:06 +00:00
|
|
|
config EFI_MIXED
|
|
|
|
bool "EFI mixed-mode support"
|
|
|
|
depends on EFI_STUB && X86_64
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2022-05-25 13:32:02 +00:00
|
|
|
Enabling this feature allows a 64-bit kernel to be booted
|
|
|
|
on a 32-bit firmware, provided that your CPU supports 64-bit
|
|
|
|
mode.
|
2014-01-10 18:52:06 +00:00
|
|
|
|
2022-05-25 13:32:02 +00:00
|
|
|
Note that it is not possible to boot a mixed-mode enabled
|
|
|
|
kernel via the EFI boot stub - a bootloader that supports
|
|
|
|
the EFI handover protocol must be used.
|
2014-01-10 18:52:06 +00:00
|
|
|
|
2022-05-25 13:32:02 +00:00
|
|
|
If unsure, say N.
|
2014-01-10 18:52:06 +00:00
|
|
|
|
2022-11-07 08:17:16 +00:00
|
|
|
config EFI_RUNTIME_MAP
|
|
|
|
bool "Export EFI runtime maps to sysfs" if EXPERT
|
|
|
|
depends on EFI
|
|
|
|
help
|
|
|
|
Export EFI runtime memory regions to /sys/firmware/efi/runtime-map.
|
|
|
|
That memory map is required by the 2nd kernel to set up EFI virtual
|
|
|
|
mappings after kexec, but can also be used for debugging purposes.
|
|
|
|
|
|
|
|
See also Documentation/ABI/testing/sysfs-firmware-efi-runtime-map.
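A sketch of what the exported map looks like at run time (the entry
index is illustrative; the attribute names are those listed in the ABI
document above):
# ls /sys/firmware/efi/runtime-map/0
attribute  num_pages  phys_addr  type  virt_addr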
|
|
|
|
|
2018-12-11 11:01:04 +00:00
|
|
|
source "kernel/Kconfig.hz"
|
2007-11-09 20:56:54 +00:00
|
|
|
|
2023-07-12 16:15:33 +00:00
|
|
|
config ARCH_SUPPORTS_KEXEC
|
|
|
|
def_bool y
|
2014-08-29 22:18:46 +00:00
|
|
|
|
2023-07-12 16:15:33 +00:00
|
|
|
config ARCH_SUPPORTS_KEXEC_FILE
|
kexec: fix KEXEC_FILE dependencies
The cleanup for the CONFIG_KEXEC Kconfig logic accidentally changed the
'depends on CRYPTO=y' dependency to a plain 'depends on CRYPTO', which
causes a link failure when all the crypto support is in a loadable module
and kexec_file support is built-in:
x86_64-linux-ld: vmlinux.o: in function `__x64_sys_kexec_file_load':
(.text+0x32e30a): undefined reference to `crypto_alloc_shash'
x86_64-linux-ld: (.text+0x32e58e): undefined reference to `crypto_shash_update'
x86_64-linux-ld: (.text+0x32e6ee): undefined reference to `crypto_shash_final'
Both s390 and x86 have this problem, while ppc64 and riscv have the
correct dependency already. On riscv, the dependency is only used for the
purgatory, not for the kexec_file code itself, which may be a bit
surprising as it means that with CONFIG_CRYPTO=m, it is possible to enable
KEXEC_FILE but then the purgatory code is silently left out.
Move this into the common Kconfig.kexec file in a way that is correct
everywhere, using the dependency on CRYPTO_SHA256=y only when the
purgatory code is available. This requires reversing the dependency
between ARCH_SUPPORTS_KEXEC_PURGATORY and KEXEC_FILE, but the effect
remains the same, other than making riscv behave like the other ones.
On s390, there is an additional dependency on CRYPTO_SHA256_S390, which
should technically not be required but gives better performance. Remove
this dependency here, noting that it was not present in the initial
Kconfig code but was brought in without an explanation in commit
71406883fd357 ("s390/kexec_file: Add kexec_file_load system call").
[arnd@arndb.de: fix riscv build]
Link: https://lkml.kernel.org/r/67ddd260-d424-4229-a815-e3fcfb864a77@app.fastmail.com
Link: https://lkml.kernel.org/r/20231023110308.1202042-1-arnd@kernel.org
Fixes: 6af5138083005 ("x86/kexec: refactor for kernel/Kconfig.kexec")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Eric DeVolder <eric_devolder@yahoo.com>
Tested-by: Eric DeVolder <eric_devolder@yahoo.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Conor Dooley <conor@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-23 11:01:54 +00:00
|
|
|
def_bool X86_64
|
kexec_file: make use of purgatory optional
Patch series "kexec_file, x86, powerpc: refactoring for other
architecutres", v2.
This is a preparatory patchset for adding kexec_file support on arm64.
It was originally included in an arm64 patch set[1], but Philipp is also
working on their kexec_file support on s390[2] and some changes are now
conflicting.
So these common parts were extracted and put into a separate patch set
for better integration. What's more, my original patch#4 was split into
a few small chunks for easier review after Dave's comment.
As such, the resulting code is basically identical with my original, and
the only *visible* differences are:
- renaming of _kexec_kernel_image_probe() and _kimage_file_post_load_cleanup()
- change one of types of arguments at prepare_elf64_headers()
Those, unfortunately, require a couple of trivial changes on the rest
(#1, #6 to #13) of my arm64 kexec_file patch set[1].
Patch #1 allows making the use of purgatory optional, which is particularly useful
for arm64.
Patch #2 commonalizes arch_kexec_kernel_{image_probe, image_load,
verify_sig}() and arch_kimage_file_post_load_cleanup() across
architectures.
Patches #3-#7 are also intended to generalize parse_elf64_headers(),
along with exclude_mem_range(), so they can be best re-used.
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-February/561182.html
[2] http://lkml.iu.edu//hypermail/linux/kernel/1802.1/02596.html
This patch (of 7):
On arm64, the crash dump kernel's usable memory is protected by *unmapping*
it from kernel virtual space unlike other architectures where the region
is just made read-only. It is highly unlikely that the region is
accidentally corrupted and this observation rationalizes that digest
check code can also be dropped from purgatory. The resulting code is so
simple as it doesn't require the somewhat ugly re-linking/relocation stuff,
i.e. arch_kexec_apply_relocations_add().
Please see:
http://lists.infradead.org/pipermail/linux-arm-kernel/2017-December/545428.html
All that the purgatory does is to shuffle arguments and jump into a new
kernel, while we still need to have some space for a hash value
(purgatory_sha256_digest) which is never checked against.
As such, it doesn't make sense to have trampoline code between the old kernel
and new kernel on arm64.
This patch introduces a new configuration, ARCH_HAS_KEXEC_PURGATORY, and
allows related code to be compiled in only if necessary.
[takahiro.akashi@linaro.org: fix trivial screwup]
Link: http://lkml.kernel.org/r/20180309093346.GF25863@linaro.org
Link: http://lkml.kernel.org/r/20180306102303.9063-2-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 22:35:45 +00:00
|
|
|
|
2023-07-12 16:15:33 +00:00
|
|
|
config ARCH_SELECTS_KEXEC_FILE
|
|
|
|
def_bool y
|
2014-08-29 22:18:46 +00:00
|
|
|
depends on KEXEC_FILE
|
2023-07-12 16:15:33 +00:00
|
|
|
select HAVE_IMA_KEXEC if IMA
|
2015-03-13 13:04:37 +00:00
|
|
|
|
2023-07-12 16:15:45 +00:00
|
|
|
config ARCH_SUPPORTS_KEXEC_PURGATORY
|
2023-10-23 11:01:54 +00:00
|
|
|
def_bool y
|
2019-08-20 00:17:44 +00:00
|
|
|
|
2023-07-12 16:15:33 +00:00
|
|
|
config ARCH_SUPPORTS_KEXEC_SIG
|
|
|
|
def_bool y
|
2014-08-08 21:26:13 +00:00
|
|
|
|
2023-07-12 16:15:33 +00:00
|
|
|
config ARCH_SUPPORTS_KEXEC_SIG_FORCE
|
|
|
|
def_bool y
|
2019-08-20 00:17:44 +00:00
|
|
|
|
2023-07-12 16:15:33 +00:00
|
|
|
config ARCH_SUPPORTS_KEXEC_BZIMAGE_VERIFY_SIG
|
|
|
|
def_bool y
|
2014-08-08 21:26:13 +00:00
|
|
|
|
2023-07-12 16:15:33 +00:00
|
|
|
config ARCH_SUPPORTS_KEXEC_JUMP
|
|
|
|
def_bool y
|
2007-11-09 20:56:54 +00:00
|
|
|
|
2023-07-12 16:15:33 +00:00
|
|
|
config ARCH_SUPPORTS_CRASH_DUMP
|
|
|
|
def_bool X86_64 || (X86_32 && HIGHMEM)
|
2008-07-26 02:45:07 +00:00
|
|
|
|
2024-09-17 16:37:20 +00:00
|
|
|
config ARCH_DEFAULT_CRASH_DUMP
|
|
|
|
def_bool y
|
|
|
|
|
2023-08-14 21:44:43 +00:00
|
|
|
config ARCH_SUPPORTS_CRASH_HOTPLUG
|
|
|
|
def_bool y
|
2008-07-26 02:45:07 +00:00
|
|
|
|
2023-09-14 03:31:39 +00:00
|
|
|
config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION
|
kexec: split crashkernel reservation code out from crash_core.c
Patch series "Split crash out from kexec and clean up related config
items", v3.
Motivation:
=============
Previously, LKP reported a build error. When investigating, we found it can't
be resolved reasonably with the present messy kdump config items.
https://lore.kernel.org/oe-kbuild-all/202312182200.Ka7MzifQ-lkp@intel.com/
The kdump (crash dumping) related config items can cause confusion:
Firstly,
CRASH_CORE enables codes including
- crashkernel reservation;
- elfcorehdr updating;
- vmcoreinfo exporting;
- crash hotplug handling;
Now fadump of powerpc, kcore dynamic debugging and kdump all select
CRASH_CORE, while fadump
- fadump needs crashkernel parsing, vmcoreinfo exporting, and accessing
global variable 'elfcorehdr_addr';
- kcore only needs vmcoreinfo exporting;
- kdump needs all of the current kernel/crash_core.c.
So enabling only PROC_KCORE or FA_DUMP will enable CRASH_CORE; this
misleads people into thinking crash dumping is enabled, when actually it's not.
Secondly,
It's not reasonable to allow KEXEC_CORE to select CRASH_CORE.
Because KEXEC_CORE enables codes which allocate control pages, copy
kexec/kdump segments, and prepare for switching. These codes are
shared by both kexec reboot and kdump. We could want kexec reboot,
but disable kdump. In that case, CRASH_CORE should not be selected.
--------------------
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
---------------------
Thirdly,
It's not reasonable to allow CRASH_DUMP to select KEXEC_CORE.
That could let KEXEC_CORE and CRASH_DUMP be enabled independently of
KEXEC or KEXEC_FILE. However, w/o KEXEC or KEXEC_FILE, the KEXEC_CORE
code built in doesn't make any sense because no kernel loading or
switching will happen to utilize the KEXEC_CORE code.
---------------------
CONFIG_CRASH_CORE=y
CONFIG_KEXEC_CORE=y
CONFIG_CRASH_DUMP=y
---------------------
In this case, what is worse, on arch sh and arm, KEXEC relies on MMU,
while CRASH_DUMP can still be enabled when !MMU; a compile error is then
seen, as the lkp test robot reported in the above link.
------arch/sh/Kconfig------
config ARCH_SUPPORTS_KEXEC
def_bool MMU
config ARCH_SUPPORTS_CRASH_DUMP
def_bool BROKEN_ON_SMP
---------------------------
Changes:
===========
1, split out crash_reserve.c from crash_core.c;
2, split out vmcore_infoc. from crash_core.c;
3, move crash related codes in kexec_core.c into crash_core.c;
4, remove dependency of FA_DUMP on CRASH_DUMP;
5, clean up kdump related config items;
6, wrap up crash codes in crash related ifdefs on all 8 arch-es
which support crash dumping, except for ppc;
Achievement:
===========
With the above changes, I can rearrange the config item logic as below (the right
item depends on or is selected by the left item):
PROC_KCORE -----------> VMCORE_INFO
           |----------> VMCORE_INFO
FA_DUMP----|
           |----------> CRASH_RESERVE
                                              ---->VMCORE_INFO
                                             /
                                            |---->CRASH_RESERVE
KEXEC      --|                             /|
             |--> KEXEC_CORE--> CRASH_DUMP-->/-|---->PROC_VMCORE
KEXEC_FILE --|                              \ |
                                             \---->CRASH_HOTPLUG
KEXEC      --|
             |--> KEXEC_CORE (for kexec reboot only)
KEXEC_FILE --|
Test
========
On all 8 architectures, including x86_64, arm64, s390x, sh, arm, mips,
riscv, loongarch, I did below three cases of config item setting and
building; all passed. Take the configs on x86_64 as an example here:
(1) Both CONFIG_KEXEC and KEXEC_FILE is unset, then all kexec/kdump
items are unset automatically:
# Kexec and crash features
# CONFIG_KEXEC is not set
# CONFIG_KEXEC_FILE is not set
# end of Kexec and crash features
(2) set CONFIG_KEXEC_FILE and 'make olddefconfig':
---------------
# Kexec and crash features
CONFIG_CRASH_RESERVE=y
CONFIG_VMCORE_INFO=y
CONFIG_KEXEC_CORE=y
CONFIG_KEXEC_FILE=y
CONFIG_CRASH_DUMP=y
CONFIG_CRASH_HOTPLUG=y
CONFIG_CRASH_MAX_MEMORY_RANGES=8192
# end of Kexec and crash features
---------------
(3) unset CONFIG_CRASH_DUMP in case 2 and execute 'make olddefconfig':
------------------------
# Kexec and crash features
CONFIG_KEXEC_CORE=y
CONFIG_KEXEC_FILE=y
# end of Kexec and crash features
------------------------
Note:
For ppc, it needs investigation to make clear how to split out the crash
code in the arch folder. Hope Hari and Pingfan can help have a look to see if
it's doable. Now, I make it either have both kexec and crash enabled, or
disable both of them altogether.
This patch (of 14):
Both kdump and fa_dump of ppc rely on crashkernel reservation. Move the
relevant codes into separate files: crash_reserve.c,
include/linux/crash_reserve.h.
And also add the config item CRASH_RESERVE to control enabling of this
code, and update the config items related to crashkernel
reservation.
And also change ifdeffery from CONFIG_CRASH_CORE to CONFIG_CRASH_RESERVE
when those scopes are only crashkernel reservation related.
And also rename arch/XXX/include/asm/{crash_core.h => crash_reserve.h} on
arm64, x86 and risc-v because those architectures' crash_core.h is only
related to crashkernel reservation.
[akpm@linux-foundation.org: s/CRASH_RESEERVE/CRASH_RESERVE/, per Klara Modin]
Link: https://lkml.kernel.org/r/20240124051254.67105-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20240124051254.67105-2-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Hari Bathini <hbathini@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pingfan Liu <piliu@redhat.com>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Michael Kelley <mhklinux@outlook.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-24 05:12:41 +00:00
|
|
|
def_bool CRASH_RESERVE
|
2023-09-14 03:31:39 +00:00
|
|
|
|
2007-11-09 20:56:54 +00:00
|
|
|
config PHYSICAL_START
|
2011-01-20 22:44:16 +00:00
|
|
|
hex "Physical address where the kernel is loaded" if (EXPERT || CRASH_DUMP)
|
2009-05-11 23:12:16 +00:00
|
|
|
default "0x1000000"
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2007-11-09 20:56:54 +00:00
|
|
|
This gives the physical address where the kernel is loaded.
|
|
|
|
|
2023-12-15 19:05:21 +00:00
|
|
|
If the kernel is not relocatable (CONFIG_RELOCATABLE=n) then bzImage
|
|
|
|
will decompress itself to the above physical address and run from there.
|
|
|
|
Otherwise, bzImage will run from the address where it has been loaded
|
|
|
|
by the boot loader. The only exception is if it is loaded below the
|
|
|
|
above physical address, in which case it will relocate itself there.
|
2007-11-09 20:56:54 +00:00
|
|
|
|
|
|
|
In normal kdump cases one does not have to set/change this option
|
|
|
|
as now bzImage can be compiled as a completely relocatable image
|
|
|
|
(CONFIG_RELOCATABLE=y) and be used to load and run from a different
|
|
|
|
address. This option is mainly useful for the folks who don't want
|
|
|
|
to use a bzImage for capturing the crash dump and want to use a
|
|
|
|
vmlinux instead. vmlinux is not relocatable, hence a kernel needs
|
|
|
|
to be specifically compiled to run from a specific memory area
|
|
|
|
(normally a reserved region), and this is where this option comes in handy.
|
|
|
|
|
2009-05-11 23:12:16 +00:00
|
|
|
So if you are using bzImage for capturing the crash dump,
|
|
|
|
leave the value here unchanged at 0x1000000 and set
|
|
|
|
CONFIG_RELOCATABLE=y. Otherwise if you plan to use vmlinux
|
|
|
|
for capturing the crash dump change this value to start of
|
|
|
|
the reserved region. In other words, it can be set based on
|
|
|
|
the "X" value as specified in the "crashkernel=YM@XM"
|
|
|
|
command line boot parameter passed to the panicked
|
2019-06-13 18:21:39 +00:00
|
|
|
kernel. Please take a look at Documentation/admin-guide/kdump/kdump.rst
|
2009-05-11 23:12:16 +00:00
|
|
|
for more details about crash dumps.
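A worked illustration of the "X" relationship (sizes hypothetical): if
the production kernel boots with
crashkernel=64M@16M
then 64M starting at 16M is reserved, and a vmlinux capture kernel
targeting that region would be built with
CONFIG_PHYSICAL_START=0x1000000
since 16M = 0x1000000.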
|
2007-11-09 20:56:54 +00:00
|
|
|
|
|
|
|
Usage of bzImage for capturing the crash dump is recommended as
|
|
|
|
one does not have to build two kernels. The same kernel can be used
|
|
|
|
as both the production kernel and the capture kernel. The above option should have
|
|
|
|
gone away once relocatable bzImage support was introduced. But it
|
|
|
|
is present because there are users out there who continue to use
|
|
|
|
vmlinux for dump capture. This option should go away down the
|
|
|
|
line.
|
|
|
|
|
|
|
|
Don't change this unless you know what you are doing.
|
|
|
|
|
|
|
|
config RELOCATABLE
|
2009-05-07 21:19:34 +00:00
|
|
|
bool "Build a relocatable kernel"
|
|
|
|
default y
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2007-11-09 20:56:54 +00:00
|
|
|
This builds a kernel image that retains relocation information
|
|
|
|
so it can be loaded someplace besides the default 1MB.
|
|
|
|
The relocations tend to make the kernel binary about 10% larger,
|
|
|
|
but are discarded at runtime.
|
|
|
|
|
|
|
|
One use is for the kexec on panic case where the recovery kernel
|
|
|
|
must live at a different physical address than the primary
|
|
|
|
kernel.
|
|
|
|
|
|
|
|
Note: If CONFIG_RELOCATABLE=y, then the kernel runs from the address
|
|
|
|
it has been loaded at and the compile time physical address
|
2013-10-11 00:18:14 +00:00
|
|
|
(CONFIG_PHYSICAL_START) is used as the minimum location.
|
2007-11-09 20:56:54 +00:00
|
|
|
|
2013-10-11 00:18:14 +00:00
|
|
|
config RANDOMIZE_BASE
|
2016-04-20 20:55:43 +00:00
|
|
|
bool "Randomize the address of the kernel image (KASLR)"
|
2013-10-11 00:18:14 +00:00
|
|
|
depends on RELOCATABLE
|
2017-04-18 09:08:12 +00:00
|
|
|
default y
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2016-04-20 20:55:43 +00:00
|
|
|
In support of Kernel Address Space Layout Randomization (KASLR),
|
|
|
|
this randomizes the physical address at which the kernel image
|
|
|
|
is decompressed and the virtual address where the kernel
|
|
|
|
image is mapped, as a security feature that deters exploit
|
|
|
|
attempts relying on knowledge of the location of kernel
|
|
|
|
code internals.
|
|
|
|
|
2016-05-25 22:45:33 +00:00
|
|
|
On 64-bit, the kernel physical and virtual addresses are
|
|
|
|
randomized separately. The physical address will be anywhere
|
|
|
|
between 16MB and the top of physical memory (up to 64TB). The
|
|
|
|
virtual address will be randomized from 16MB up to 1GB (9 bits
|
|
|
|
of entropy). Note that this also reduces the memory space
|
|
|
|
available to kernel modules from 1.5GB to 1GB.
|
|
|
|
|
|
|
|
On 32-bit, the kernel physical and virtual addresses are
|
|
|
|
randomized together. They will be randomized from 16MB up to
|
|
|
|
512MB (8 bits of entropy).
|
2016-04-20 20:55:43 +00:00
|
|
|
|
|
|
|
Entropy is generated using the RDRAND instruction if it is
|
|
|
|
supported. If RDTSC is supported, its value is mixed into
|
|
|
|
the entropy pool as well. If neither RDRAND nor RDTSC are
|
2016-05-25 22:45:33 +00:00
|
|
|
supported, then entropy is read from the i8254 timer. The
|
|
|
|
usable entropy is limited by the kernel being built using
|
|
|
|
2GB addressing, and by PHYSICAL_ALIGN being required to be at a
|
|
|
|
minimum of 2MB. As a result, only 10 bits of entropy are
|
|
|
|
theoretically possible, but the implementations are further
|
|
|
|
limited due to memory layouts.
|
2016-04-20 20:55:43 +00:00
|
|
|
|
2017-04-18 09:08:12 +00:00
|
|
|
If unsure, say Y.
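A hedged operational note: with this option enabled, the randomization
can still be turned off for a single boot via the documented nokaslr
command line parameter:
nokaslr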
|
2013-10-11 00:18:14 +00:00
|
|
|
|
|
|
|
# Relocation on x86 needs some additional build support
|
2009-05-06 04:20:51 +00:00
|
|
|
config X86_NEED_RELOCS
|
|
|
|
def_bool y
|
2013-10-11 00:18:14 +00:00
|
|
|
depends on RANDOMIZE_BASE || (X86_32 && RELOCATABLE)
|
2009-05-06 04:20:51 +00:00
|
|
|
|
2007-11-09 20:56:54 +00:00
|
|
|
config PHYSICAL_ALIGN
|
2013-07-08 16:15:17 +00:00
|
|
|
hex "Alignment value to which kernel should be aligned"
|
2013-10-11 00:18:14 +00:00
|
|
|
default "0x200000"
|
2013-07-08 16:15:17 +00:00
|
|
|
range 0x2000 0x1000000 if X86_32
|
|
|
|
range 0x200000 0x1000000 if X86_64
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2007-11-09 20:56:54 +00:00
|
|
|
This value puts alignment restrictions on the physical address
|
|
|
|
where the kernel is loaded and run from. The kernel is compiled for an
|
|
|
|
address which meets the above alignment restriction.
|
|
|
|
|
|
|
|
If the bootloader loads the kernel at a non-aligned address and
|
|
|
|
CONFIG_RELOCATABLE is set, the kernel will move itself to the nearest
|
|
|
|
address aligned to the above value and run from there.
|
|
|
|
|
|
|
|
If the bootloader loads the kernel at a non-aligned address and
|
|
|
|
CONFIG_RELOCATABLE is not set, the kernel will ignore the run time
|
|
|
|
load address and decompress itself to the address it has been
|
|
|
|
compiled for and run from there. The address for which the kernel is
|
|
|
|
compiled already meets the above alignment restrictions. Hence the
|
|
|
|
end result is that the kernel runs from a physical address meeting
|
|
|
|
the above alignment restrictions.
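A short worked example of the relocatable case (addresses
hypothetical): with
CONFIG_PHYSICAL_ALIGN=0x200000
a bzImage loaded at 0x1234000 would relocate itself to 0x1400000, the
next address aligned to 0x200000, and run from there.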
|
|
|
|
|
2013-07-08 16:15:17 +00:00
|
|
|
On 32-bit this value must be a multiple of 0x2000. On 64-bit
|
|
|
|
this value must be a multiple of 0x200000.
|
|
|
|
|
2007-11-09 20:56:54 +00:00
|
|
|
Don't change this unless you know what you are doing.
|
|
|
|
|
2018-02-14 11:16:50 +00:00
|
|
|
config DYNAMIC_MEMORY_LAYOUT
|
|
|
|
bool
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2018-02-14 11:16:50 +00:00
|
|
|
This option makes the base addresses of vmalloc and vmemmap, as well as
|
|
|
|
__PAGE_OFFSET, movable during boot.
|
|
|
|
|
x86/mm: Implement ASLR for kernel memory regions
Randomizes the virtual address space of kernel memory regions for
x86_64. This first patch adds the infrastructure and does not randomize
any region. The following patches will randomize the physical memory
mapping, vmalloc and vmemmap regions.
This security feature mitigates exploits relying on predictable kernel
addresses. These addresses can be used to disclose the kernel modules
base addresses or corrupt specific structures to elevate privileges
bypassing the current implementation of KASLR. This feature can be
enabled with the CONFIG_RANDOMIZE_MEMORY option.
The order of each memory region is not changed. The feature looks at the
available space for the regions based on different configuration options
and randomizes the base and space between each. The size of the physical
memory mapping is the available physical memory. No performance impact
was detected while testing the feature.
Entropy is generated using the KASLR early boot functions now shared in
the lib directory (originally written by Kees Cook). Randomization is
done on PGD & PUD page table levels to increase possible addresses. The
physical memory mapping code was adapted to support PUD level virtual
addresses. This implementation on the best configuration provides 30,000
possible virtual addresses on average for each memory region. An
additional low memory page is used to ensure each CPU can start with a
PGD aligned virtual address (for realmode).
x86/dump_pagetable was updated to correctly display each region.
Updated documentation on x86_64 memory layout accordingly.
Performance data, after all patches in the series:
Kernbench shows almost no difference (± less than 1%):
Before:
Average Optimal load -j 12 Run (std deviation): Elapsed Time 102.63 (1.2695)
User Time 1034.89 (1.18115) System Time 87.056 (0.456416) Percent CPU 1092.9
(13.892) Context Switches 199805 (3455.33) Sleeps 97907.8 (900.636)
After:
Average Optimal load -j 12 Run (std deviation): Elapsed Time 102.489 (1.10636)
User Time 1034.86 (1.36053) System Time 87.764 (0.49345) Percent CPU 1095
(12.7715) Context Switches 199036 (4298.1) Sleeps 97681.6 (1031.11)
Hackbench shows 0% difference on average (hackbench 90 repeated 10 times):
attemp,before,after 1,0.076,0.069 2,0.072,0.069 3,0.066,0.066 4,0.066,0.068
5,0.066,0.067 6,0.066,0.069 7,0.067,0.066 8,0.063,0.067 9,0.067,0.065
10,0.068,0.071 average,0.0677,0.0677
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Alexander Popov <alpopov@ptsecurity.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1466556426-32664-6-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-22 00:47:02 +00:00
|
|
|
config RANDOMIZE_MEMORY
|
|
|
|
bool "Randomize the kernel memory sections"
|
|
|
|
depends on X86_64
|
|
|
|
depends on RANDOMIZE_BASE
|
2018-02-14 11:16:50 +00:00
|
|
|
select DYNAMIC_MEMORY_LAYOUT
|
2016-06-22 00:47:02 +00:00
|
|
|
default RANDOMIZE_BASE
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2022-05-25 13:32:02 +00:00
|
|
|
Randomizes the base virtual address of kernel memory sections
(physical memory mapping, vmalloc & vmemmap). This security feature
makes exploits relying on predictable memory locations less reliable.

The order of allocations remains unchanged. Entropy is generated in
the same way as RANDOMIZE_BASE. The current implementation in the
optimal configuration has, on average, 30,000 different possible
virtual addresses for each memory section.

If unsure, say Y.

config RANDOMIZE_MEMORY_PHYSICAL_PADDING
hex "Physical memory mapping padding" if EXPERT
depends on RANDOMIZE_MEMORY
default "0xa" if MEMORY_HOTPLUG
default "0x0"
range 0x1 0x40 if MEMORY_HOTPLUG
range 0x0 0x40
help
Define the padding in terabytes added to the existing physical
memory size during kernel memory randomization. It is useful
for memory hotplug support but reduces the entropy available for
address randomization.

If unsure, leave at the default value.
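
# Illustrative .config fragment (hypothetical values): a machine that
# expects hot-pluggable DIMMs could reserve 16 TB of padding, trading
# some randomization entropy for hotplug headroom:
#   CONFIG_RANDOMIZE_MEMORY=y
#   CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING=0x10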

config ADDRESS_MASKING
bool "Linear Address Masking support"
depends on X86_64
depends on COMPILE_TEST || !CPU_MITIGATIONS # wait for LASS
help
Linear Address Masking (LAM) modifies the checking that is applied
to 64-bit linear addresses, allowing software to use the
untranslated address bits for metadata.

The capability can be used for efficient address sanitizer (ASAN)
implementations and for optimizations in JITs.
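
# A minimal userspace sketch of how LAM is consumed (hedged: assumes a
# LAM-capable CPU and kernel; ARCH_ENABLE_TAGGED_ADDR is the arch_prctl
# request that opts a process in):
#   /* ask for 6 tag bits (LAM_U57); bits 62:57 then carry metadata */
#   syscall(SYS_arch_prctl, ARCH_ENABLE_TAGGED_ADDR, 6);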

config HOTPLUG_CPU
def_bool y
depends on SMP

config COMPAT_VDSO
def_bool n
prompt "Disable the 32-bit vDSO (needed for glibc 2.3.3)"
depends on COMPAT_32
help
Certain buggy versions of glibc will crash if they are
presented with a 32-bit vDSO that is not mapped at the address
indicated in its segment table.

The bug was introduced by f866314b89d56845f55e6f365e18b31ec978ec3a
and fixed by 3b3ddb4f7db98ec9e912ccdf54d35df4aa30e04a and
49ad572a70b8aeb91e57483a11dd1b77e31c4468. Glibc 2.3.3 is
the only released version with the bug, but OpenSUSE 9
contains a buggy "glibc 2.3.2".

The symptom of the bug is that everything crashes on startup, saying:
dl_main: Assertion `(void *) ph->p_vaddr == _rtld_local._dl_sysinfo_dso' failed!

Saying Y here changes the default value of the vdso32 boot
option from 1 to 0, which turns off the 32-bit vDSO entirely.
This works around the glibc bug but hurts performance.

If unsure, say N: if you are compiling your own kernel, you
are unlikely to be using a buggy version of glibc.

choice
prompt "vsyscall table for legacy applications"
depends on X86_64
default LEGACY_VSYSCALL_XONLY
help
Legacy user code that does not know how to find the vDSO expects
to be able to issue three syscalls by calling fixed addresses in
kernel space. Since this location is not randomized with ASLR,
it can be used to assist security vulnerability exploitation.

This setting can be changed at boot time via the kernel command
line parameter vsyscall=[emulate|xonly|none]. Emulate mode
is deprecated and can only be enabled using the kernel command
line.

On a system with recent enough glibc (2.14 or newer) and no
static binaries, you can say None without a performance penalty
to improve security.

If unsure, select "Emulate execution only".

config LEGACY_VSYSCALL_XONLY
bool "Emulate execution only"
help
The kernel traps and emulates calls into the fixed vsyscall
address mapping and does not allow reads. This configuration
is recommended when userspace might use the legacy vsyscall
area but binary instrumentation of that legacy code is not
needed. It mitigates certain uses of the vsyscall area as an
ASLR-bypassing buffer.

config LEGACY_VSYSCALL_NONE
bool "None"
help
There will be no vsyscall mapping at all. This will
eliminate any risk of ASLR bypass due to the vsyscall
fixed address mapping. Attempts to use the vsyscalls
will be reported to dmesg, so that either old or
malicious userspace programs can be identified.

endchoice
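
# The built-in default chosen above can still be overridden at boot,
# e.g. (values named in the help text above):
#   vsyscall=none      remove the mapping entirely
#   vsyscall=xonly     execute-only emulation
#   vsyscall=emulate   deprecated; command line only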

config CMDLINE_BOOL
bool "Built-in kernel command line"
help
Allow for specifying boot arguments to the kernel at
build time. On some systems (e.g. embedded ones), it is
necessary or convenient to provide some or all of the
kernel boot arguments with the kernel itself (that is,
to not rely on the boot loader to provide them.)

To compile command line arguments into the kernel,
set this option to 'Y', then fill in the
boot arguments in CONFIG_CMDLINE.

Systems with fully functional boot loaders (i.e. non-embedded)
should leave this option set to 'N'.

config CMDLINE
string "Built-in kernel command string"
depends on CMDLINE_BOOL
default ""
help
Enter arguments here that should be compiled into the kernel
image and used at boot time. If the boot loader provides a
command line at boot time, it is appended to this string to
form the full kernel command line when the system boots.

However, you can use the CONFIG_CMDLINE_OVERRIDE option to
change this behavior.

In most cases, the command line (whether built-in or provided
by the boot loader) should specify the device for the root
file system.
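
# Sketch of a typical embedded use (hypothetical device names):
#   CONFIG_CMDLINE_BOOL=y
#   CONFIG_CMDLINE="console=ttyS0,115200 root=/dev/mmcblk0p2 ro"
# The boot loader's command line, if any, is appended to this string
# unless CMDLINE_OVERRIDE below is set.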

config CMDLINE_OVERRIDE
bool "Built-in command line overrides boot loader arguments"
depends on CMDLINE_BOOL && CMDLINE != ""
help
Set this option to 'Y' to have the kernel ignore the boot loader
command line, and use ONLY the built-in command line.

This is used to work around broken boot loaders. This should
be set to 'N' under normal conditions.

config MODIFY_LDT_SYSCALL
bool "Enable the LDT (local descriptor table)" if EXPERT
default y
help
Linux can allow user programs to install a per-process x86
Local Descriptor Table (LDT) using the modify_ldt(2) system
call. This is required to run 16-bit or segmented code such as
DOSEMU or some Wine programs. It is also used by some very old
threading libraries.

Enabling this feature adds a small amount of overhead to
context switches and increases the low-level kernel attack
surface. Disabling it removes the modify_ldt(2) system call.

Saying 'N' here may make sense for embedded or server kernels.
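
# Hedged userspace sketch of the syscall this option gates (x86 only;
# func 0 reads the current LDT into a caller-supplied buffer):
#   unsigned char buf[1024];
#   syscall(SYS_modify_ldt, 0, buf, sizeof(buf));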

config STRICT_SIGALTSTACK_SIZE
bool "Enforce strict size checking for sigaltstack"
depends on DYNAMIC_SIGFRAME
help
For historical reasons MINSIGSTKSZ is a constant, and it was
already too small when AVX512 support was added. This adds a
mechanism to enforce strict checking of the sigaltstack size
against the real size of the FPU frame. This option enables the
check by default; it can also be controlled via the kernel
command line option 'strict_sas_size' independent of this config
switch. Enabling it might break existing applications which
allocate a sigaltstack that is too small but 'work' because they
never get a signal delivered.

Say 'N' unless you really want to enforce this check.
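
# Boot-time equivalents of this switch (parameter named in the help
# text above):
#   strict_sas_size=true    enforce the strict check
#   strict_sas_size=false   keep the historic MINSIGSTKSZ check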

config CFI_AUTO_DEFAULT
bool "Attempt to use FineIBT by default at boot time"
depends on FINEIBT
default y
help
Attempt to use FineIBT by default at boot time. If enabled,
this is the same as booting with "cfi=auto". If disabled,
this is the same as booting with "cfi=kcfi".

source "kernel/livepatch/Kconfig"

config X86_BUS_LOCK_DETECT
bool "Split Lock Detect and Bus Lock Detect support"
depends on CPU_SUP_INTEL || CPU_SUP_AMD
default y
help
Enable Split Lock Detect and Bus Lock Detect functionalities.
See <file:Documentation/arch/x86/buslock.rst> for more information.
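
# Runtime behaviour is picked on the command line (hedged examples;
# see the buslock.rst document referenced above):
#   split_lock_detect=warn          warn and throttle the offender
#   split_lock_detect=fatal         deliver SIGBUS on a split lock
#   split_lock_detect=ratelimit:N   rate-limit bus locks system-wide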

endmenu

config CC_HAS_NAMED_AS
def_bool $(success,echo 'int __seg_fs fs; int __seg_gs gs;' | $(CC) -x c - -S -o /dev/null)
depends on CC_IS_GCC

config CC_HAS_NAMED_AS_FIXED_SANITIZERS
def_bool CC_IS_GCC && GCC_VERSION >= 130300

config USE_X86_SEG_SUPPORT
def_bool y
depends on CC_HAS_NAMED_AS
#
# -fsanitize=kernel-address (KASAN) and -fsanitize=thread
# (KCSAN) are incompatible with named address spaces with
# GCC < 13.3 - see GCC PR sanitizer/111736.
#
depends on !(KASAN || KCSAN) || CC_HAS_NAMED_AS_FIXED_SANITIZERS
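
# What the CC_HAS_NAMED_AS probe above feeds the compiler, shown as a
# standalone C sketch (GCC named address spaces; percpu-style access):
#   __seg_gs int counter;                      /* addressed via %gs */
#   int read_counter(void) { return counter; }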

config CC_HAS_SLS
def_bool $(cc-option,-mharden-sls=all)

config CC_HAS_RETURN_THUNK
def_bool $(cc-option,-mfunction-return=thunk-extern)

config CC_HAS_ENTRY_PADDING
def_bool $(cc-option,-fpatchable-function-entry=16,16)

config FUNCTION_PADDING_CFI
int
default 59 if FUNCTION_ALIGNMENT_64B
default 27 if FUNCTION_ALIGNMENT_32B
default 11 if FUNCTION_ALIGNMENT_16B
default 3 if FUNCTION_ALIGNMENT_8B
default 0

# Basically: FUNCTION_ALIGNMENT - 5*CFI_CLANG
# except Kconfig can't do arithmetic :/
config FUNCTION_PADDING_BYTES
int
default FUNCTION_PADDING_CFI if CFI_CLANG
default FUNCTION_ALIGNMENT
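
# Worked example of the comment above: the kCFI preamble is 5 bytes,
# so with FUNCTION_ALIGNMENT_64B the padding is 64 - 5 = 59 bytes;
# likewise 32 - 5 = 27, 16 - 5 = 11 and 8 - 5 = 3, matching the
# FUNCTION_PADDING_CFI defaults listed above.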

config CALL_PADDING
def_bool n
depends on CC_HAS_ENTRY_PADDING && OBJTOOL
select FUNCTION_ALIGNMENT_16B

config FINEIBT
def_bool y
depends on X86_KERNEL_IBT && CFI_CLANG && MITIGATION_RETPOLINE
select CALL_PADDING

config HAVE_CALL_THUNKS
def_bool y
depends on CC_HAS_ENTRY_PADDING && MITIGATION_RETHUNK && OBJTOOL

config CALL_THUNKS
def_bool n
select CALL_PADDING

config PREFIX_SYMBOLS
def_bool y
depends on CALL_PADDING && !CFI_CLANG

menuconfig CPU_MITIGATIONS
bool "Mitigations for CPU vulnerabilities"
default y
help
Say Y here to enable options which enable mitigations for hardware
vulnerabilities (usually related to speculative execution).
Mitigations can be disabled or restricted to SMT systems at runtime
via the "mitigations" kernel parameter.

If you say N, all mitigations will be disabled. This CANNOT be
overridden at runtime.

Say 'Y', unless you really know what you are doing.
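
# Runtime forms of the "mitigations" parameter mentioned above:
#   mitigations=off         disable all optional CPU mitigations
#   mitigations=auto        mitigate, leave SMT enabled
#   mitigations=auto,nosmt  mitigate and disable SMT where needed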

if CPU_MITIGATIONS

config MITIGATION_PAGE_TABLE_ISOLATION
bool "Remove the kernel mapping in user mode"
default y
depends on (X86_64 || X86_PAE)
help
This feature reduces the number of hardware side channels by
ensuring that the majority of kernel addresses are not mapped
into userspace.

See Documentation/arch/x86/pti.rst for more details.

config MITIGATION_RETPOLINE
bool "Avoid speculative indirect branches in kernel"
select OBJTOOL if HAVE_OBJTOOL
default y
help
Compile the kernel with the retpoline compiler options to guard
against kernel-to-user data leaks by avoiding speculative indirect
branches. Requires a compiler with -mindirect-branch=thunk-extern
support for full protection. The kernel may run slower.

config MITIGATION_RETHUNK
bool "Enable return-thunks"
depends on MITIGATION_RETPOLINE && CC_HAS_RETURN_THUNK
select OBJTOOL if HAVE_OBJTOOL
default y if X86_64
help
Compile the kernel with the return-thunks compiler option to guard
against kernel-to-user data leaks by avoiding return speculation.
Requires a compiler with -mfunction-return=thunk-extern
support for full protection. The kernel may run slower.

config MITIGATION_UNRET_ENTRY
bool "Enable UNRET on kernel entry"
depends on CPU_SUP_AMD && MITIGATION_RETHUNK && X86_64
default y
help
Compile the kernel with support for the retbleed=unret mitigation.

config MITIGATION_CALL_DEPTH_TRACKING
bool "Mitigate RSB underflow with call depth tracking"
depends on CPU_SUP_INTEL && HAVE_CALL_THUNKS
select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
select CALL_THUNKS
default y
help
Compile the kernel with call depth tracking to mitigate the Intel
SKL Return-Stack-Buffer (RSB) underflow issue. The mitigation is off
by default and needs to be enabled on the kernel command line via the
retbleed=stuff option. For non-affected systems the overhead of this
option is marginal, as call depth tracking uses run-time generated
call thunks in a compiler-generated padding area and call patching.
This increases text size by ~5%. On non-affected systems this space
is unused. On affected SKL systems this results in a significant
performance gain over the IBRS mitigation.
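
# As the help text says, this is opt-in at boot (hedged example):
#   retbleed=stuff
# selects call depth tracking instead of the default IBRS mitigation
# on affected Skylake-era parts.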

config CALL_THUNKS_DEBUG
bool "Enable call thunks and call depth tracking debugging"
depends on MITIGATION_CALL_DEPTH_TRACKING
select FUNCTION_ALIGNMENT_32B
default n
help
Enable call/ret counters for imbalance detection and build in
noisy dmesg output about call thunk generation and call patching
for troubleshooting. The debug prints need to be enabled on the
kernel command line with 'debug-callthunks'.
Only enable this when you are debugging call thunks, as it
creates a noticeable runtime overhead. If unsure, say N.

config MITIGATION_IBPB_ENTRY
bool "Enable IBPB on kernel entry"
depends on CPU_SUP_AMD && X86_64
default y
help
Compile the kernel with support for the retbleed=ibpb mitigation.

config MITIGATION_IBRS_ENTRY
bool "Enable IBRS on kernel entry"
depends on CPU_SUP_INTEL && X86_64
default y
help
Compile the kernel with support for the spectre_v2=ibrs mitigation.
This mitigates both spectre_v2 and retbleed at great cost to
performance.

config MITIGATION_SRSO
bool "Mitigate speculative RAS overflow on AMD"
depends on CPU_SUP_AMD && X86_64 && MITIGATION_RETHUNK
default y
help
Enable the SRSO mitigation needed on AMD Zen1-4 machines.

config MITIGATION_SLS
bool "Mitigate Straight-Line-Speculation"
depends on CC_HAS_SLS && X86_64
select OBJTOOL if HAVE_OBJTOOL
default n
help
Compile the kernel with straight-line-speculation options to guard
against straight line speculation. The kernel image might be slightly
larger.

config MITIGATION_GDS
bool "Mitigate Gather Data Sampling"
depends on CPU_SUP_INTEL
default y
help
Enable mitigation for Gather Data Sampling (GDS). GDS is a hardware
vulnerability which allows unprivileged speculative access to data
which was previously stored in vector registers. The attacker uses gather
instructions to infer the stale vector register data.

config MITIGATION_RFDS
bool "RFDS Mitigation"
depends on CPU_SUP_INTEL
default y
help
Enable mitigation for Register File Data Sampling (RFDS) by default.
RFDS is a hardware vulnerability which affects Intel Atom CPUs. It
allows unprivileged speculative access to stale data previously
stored in floating point, vector and integer registers.
See also <file:Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst>

config MITIGATION_SPECTRE_BHI
bool "Mitigate Spectre-BHB (Branch History Injection)"
depends on CPU_SUP_INTEL
default y
help
Enable BHI mitigations. BHI attacks are a form of Spectre V2 attacks
where the branch history buffer is poisoned to speculatively steer
indirect branches.
See <file:Documentation/admin-guide/hw-vuln/spectre.rst>

config MITIGATION_MDS
bool "Mitigate Microarchitectural Data Sampling (MDS) hardware bug"
depends on CPU_SUP_INTEL
default y
help
Enable mitigation for Microarchitectural Data Sampling (MDS). MDS is
a hardware vulnerability which allows unprivileged speculative access
to data which is available in various CPU internal buffers.
See also <file:Documentation/admin-guide/hw-vuln/mds.rst>

config MITIGATION_TAA
bool "Mitigate TSX Asynchronous Abort (TAA) hardware bug"
depends on CPU_SUP_INTEL
default y
help
Enable mitigation for TSX Asynchronous Abort (TAA). TAA is a hardware
vulnerability that allows unprivileged speculative access to data
which is available in various CPU internal buffers by using
asynchronous aborts within an Intel TSX transactional region.
See also <file:Documentation/admin-guide/hw-vuln/tsx_async_abort.rst>

config MITIGATION_MMIO_STALE_DATA
bool "Mitigate MMIO Stale Data hardware bug"
depends on CPU_SUP_INTEL
default y
help
Enable mitigation for MMIO Stale Data hardware bugs. Processor MMIO
Stale Data Vulnerabilities are a class of memory-mapped I/O (MMIO)
vulnerabilities that can expose data. The vulnerabilities require the
attacker to have access to MMIO.
See also
<file:Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst>

config MITIGATION_L1TF
bool "Mitigate L1 Terminal Fault (L1TF) hardware bug"
depends on CPU_SUP_INTEL
default y
help
Mitigate the L1 Terminal Fault (L1TF) hardware bug. L1 Terminal
Fault is a hardware vulnerability which allows unprivileged
speculative access to data available in the Level 1 Data Cache.
See <file:Documentation/admin-guide/hw-vuln/l1tf.rst>

config MITIGATION_RETBLEED
bool "Mitigate RETBleed hardware bug"
depends on (CPU_SUP_INTEL && MITIGATION_SPECTRE_V2) || MITIGATION_UNRET_ENTRY || MITIGATION_IBPB_ENTRY
default y
help
Enable mitigation for RETBleed (Arbitrary Speculative Code Execution
with Return Instructions) vulnerability. RETBleed is a speculative
execution attack which takes advantage of microarchitectural behavior
in many modern microprocessors, similar to Spectre v2. An
unprivileged attacker can use these flaws to bypass conventional
memory security restrictions to gain read access to privileged memory
that would otherwise be inaccessible.

config MITIGATION_SPECTRE_V1
bool "Mitigate SPECTRE V1 hardware bug"
default y
help
Enable mitigation for Spectre V1 (Bounds Check Bypass). Spectre V1 is a
class of side channel attacks that takes advantage of speculative
execution that bypasses conditional branch instructions used for
memory access bounds checks.
See also <file:Documentation/admin-guide/hw-vuln/spectre.rst>

config MITIGATION_SPECTRE_V2
bool "Mitigate SPECTRE V2 hardware bug"
default y
help
Enable mitigation for Spectre V2 (Branch Target Injection). Spectre
V2 is a class of side channel attacks that takes advantage of
indirect branch predictors inside the processor. In Spectre variant 2
attacks, the attacker can steer speculative indirect branches in the
victim to gadget code by poisoning the branch target buffer of a CPU
used for predicting indirect branch addresses.
See also <file:Documentation/admin-guide/hw-vuln/spectre.rst>

config MITIGATION_SRBDS
bool "Mitigate Special Register Buffer Data Sampling (SRBDS) hardware bug"
depends on CPU_SUP_INTEL
default y
help
Enable mitigation for Special Register Buffer Data Sampling (SRBDS).
SRBDS is a hardware vulnerability that allows Microarchitectural Data
Sampling (MDS) techniques to infer values returned from special
register accesses. An unprivileged user can extract values returned
from RDRAND and RDSEED executed on another core or sibling thread
using MDS techniques.
See also
<file:Documentation/admin-guide/hw-vuln/special-register-buffer-data-sampling.rst>

config MITIGATION_SSB
bool "Mitigate Speculative Store Bypass (SSB) hardware bug"
default y
help
Enable mitigation for Speculative Store Bypass (SSB). SSB is a
hardware security vulnerability and its exploitation takes advantage
of speculative execution in a similar way to the Meltdown and Spectre
security vulnerabilities.

endif

config ARCH_HAS_ADD_PAGES
def_bool y
depends on ARCH_ENABLE_MEMORY_HOTPLUG

menu "Power management and ACPI options"

config ARCH_HIBERNATION_HEADER
def_bool y
depends on HIBERNATION

source "kernel/power/Kconfig"

source "drivers/acpi/Kconfig"

config X86_APM_BOOT
def_bool y
depends on APM

menuconfig APM
tristate "APM (Advanced Power Management) BIOS support"
depends on X86_32 && PM_SLEEP
help
APM is a BIOS specification for saving power using several different
techniques. This is mostly useful for battery powered laptops with
APM compliant BIOSes. If you say Y here, the system time will be
reset after a RESUME operation, the /proc/apm device will provide
battery status information, and user-space programs will receive
notification of APM "events" (e.g. battery status change).

If you select "Y" here, you can disable actual use of the APM
BIOS by passing the "apm=off" option to the kernel at boot time.

Note that the APM support is almost completely disabled for
machines with more than one CPU.

In order to use APM, you will need supporting software. For location
and more information, read <file:Documentation/power/apm-acpi.rst>
and the Battery Powered Linux mini-HOWTO, available from
<http://www.tldp.org/docs.html#howto>.

This driver does not spin down disk drives (see the hdparm(8)
manpage ("man 8 hdparm") for that), and it doesn't turn off
VESA-compliant "green" monitors.

This driver does not support the TI 4000M TravelMate and the ACER
486/DX4/75 because they don't have compliant BIOSes. Many "green"
desktop machines also don't have compliant BIOSes, and this driver
may cause those machines to panic during the boot phase.

Generally, if you don't have a battery in your machine, there isn't
much point in using this driver and you should say N. If you get
random kernel OOPSes or reboots that don't seem to be related to
anything, try disabling/enabling this option (or disabling/enabling
APM in your BIOS).

Some other things you should try when experiencing seemingly random,
"weird" problems:

1) make sure that you have enough swap space and that it is
enabled.
2) pass the "idle=poll" option to the kernel
3) switch on floating point emulation in the kernel and pass
the "no387" option to the kernel
4) pass the "floppy=nodma" option to the kernel
5) pass the "mem=4M" option to the kernel (thereby disabling
all but the first 4 MB of RAM)
6) make sure that the CPU is not overclocked.
7) read the sig11 FAQ at <http://www.bitwizard.nl/sig11/>
8) disable the cache from your BIOS settings
9) install a fan for the video card or exchange video RAM
10) install a better fan for the CPU
11) exchange RAM chips
12) exchange the motherboard.

To compile this driver as a module, choose M here: the
module will be called apm.

if APM

config APM_IGNORE_USER_SUSPEND
bool "Ignore USER SUSPEND"
help
This option will ignore USER SUSPEND requests. On machines with a
compliant APM BIOS, you want to say N. However, on the NEC Versa M
series notebooks, it is necessary to say Y because of a BIOS bug.

config APM_DO_ENABLE
bool "Enable PM at boot time"
help
Enable APM features at boot time. From page 36 of the APM BIOS
specification: "When disabled, the APM BIOS does not automatically
power manage devices, enter the Standby State, enter the Suspend
State, or take power saving steps in response to CPU Idle calls."
This driver will make CPU Idle calls when Linux is idle (unless this
feature is turned off -- see "Do CPU IDLE calls", below). This
should always save battery power, but more complicated APM features
will be dependent on your BIOS implementation. You may need to turn
this option off if your computer hangs at boot time when using APM
support, or if it beeps continuously instead of suspending. Turn
this off if you have a NEC UltraLite Versa 33/C or a Toshiba
T400CDT. This is off by default since most machines do fine without
this feature.

config APM_CPU_IDLE
depends on CPU_IDLE
bool "Make CPU Idle calls when idle"
help
Enable calls to APM CPU Idle/CPU Busy inside the kernel's idle loop.
On some machines, this can activate improved power savings, such as
a slowed CPU clock rate, when the machine is idle. These idle calls
are made after the idle loop has run for some length of time (e.g.,
333 ms). On some machines, this will cause a hang at boot time or
whenever the CPU becomes idle. (On machines with more than one CPU,
this option does nothing.)

config APM_DISPLAY_BLANK
bool "Enable console blanking using APM"
help
Enable console blanking using the APM. Some laptops can use this to
turn off the LCD backlight when the screen blanker of the Linux
virtual console blanks the screen. Note that this is only used by
the virtual console screen blanker, and won't turn off the backlight
when using the X Window system. This also doesn't have anything to
do with your VESA-compliant power-saving monitor. Further, this
option doesn't work for all laptops -- it might not turn off your
backlight at all, or it might print a lot of errors to the console,
especially if you are using gpm.

config APM_ALLOW_INTS
bool "Allow interrupts during APM BIOS calls"
help
Normally we disable external interrupts while we are making calls to
the APM BIOS as a measure to lessen the effects of a badly behaving
BIOS implementation. The BIOS should reenable interrupts if it
needs to. Unfortunately, some BIOSes do not -- especially those in
many of the newer IBM Thinkpads. If you experience hangs when you
suspend, try setting this to Y. Otherwise, say N.

endif # APM

source "drivers/cpufreq/Kconfig"

source "drivers/cpuidle/Kconfig"

source "drivers/idle/Kconfig"

endmenu

menu "Bus options (PCI etc.)"

choice
prompt "PCI access mode"
depends on X86_32 && PCI
default PCI_GOANY
help
On PCI systems, the BIOS can be used to detect the PCI devices and
determine their configuration. However, some old PCI motherboards
have BIOS bugs and may crash if this is done. Also, some embedded
PCI-based systems don't have any BIOS at all. Linux can also try to
detect the PCI hardware directly without using the BIOS.

With this option, you can specify how Linux should detect the
PCI devices. If you choose "BIOS", the BIOS will be used,
if you choose "Direct", the BIOS won't be used, and if you
choose "MMConfig", then PCI Express MMCONFIG will be used.
If you choose "Any", the kernel will try MMCONFIG, then the
direct access method, and fall back to the BIOS if that doesn't
work. If unsure, go with the default, which is "Any".

config PCI_GOBIOS
bool "BIOS"

config PCI_GOMMCONFIG
bool "MMConfig"

config PCI_GODIRECT
bool "Direct"

config PCI_GOOLPC
bool "OLPC XO-1"
depends on OLPC

config PCI_GOANY
bool "Any"

endchoice
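
# Whatever default is chosen here, the probe order can still be forced
# at boot (hedged examples from kernel-parameters.txt):
#   pci=conf1      force direct type-1 configuration access
#   pci=nommconf   disable MMCONFIG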

config PCI_BIOS
def_bool y
depends on X86_32 && PCI && (PCI_GOBIOS || PCI_GOANY)

# x86-64 doesn't support PCI BIOS access from long mode so always go direct.
config PCI_DIRECT
def_bool y
depends on PCI && (X86_64 || (PCI_GODIRECT || PCI_GOANY || PCI_GOOLPC || PCI_GOMMCONFIG))

config PCI_MMCONFIG
bool "Support mmconfig PCI config space access" if X86_64
default y
depends on PCI && (ACPI || JAILHOUSE_GUEST)
depends on X86_64 || (PCI_GOANY || PCI_GOMMCONFIG)

config PCI_OLPC
def_bool y
depends on PCI && OLPC && (PCI_GOOLPC || PCI_GOANY)

config PCI_XEN
def_bool y
depends on PCI && XEN

config MMCONF_FAM10H
def_bool y
depends on X86_64 && PCI_MMCONFIG && ACPI

config PCI_CNB20LE_QUIRK
bool "Read CNB20LE Host Bridge Windows" if EXPERT
depends on PCI
help
Read the PCI windows out of the CNB20LE host bridge. This allows
PCI hotplug to work on systems with the CNB20LE chipset which do
not have ACPI.

There's no public spec for this chipset, and this functionality
is known to be incomplete.

You should say N unless you know you need this.

config ISA_BUS
bool "ISA bus support on modern systems" if EXPERT
help
Expose ISA bus device drivers and options available for selection and
configuration. Enable this option if your target machine has an ISA
bus. ISA is an older system, displaced by PCI and newer bus
architectures -- if your target machine is modern, it probably does
not have an ISA bus.

If unsure, say N.

# x86_64 has no ISA slots, but can have ISA-style DMA.
config ISA_DMA_API
bool "ISA-style DMA support" if (X86_64 && EXPERT)
default y
help
Enables ISA-style DMA support for devices requiring such controllers.
If unsure, say Y.

if X86_32

config ISA
bool "ISA support"
help
Find out whether you have ISA slots on your motherboard. ISA is the
name of a bus system, i.e. the way the CPU talks to the other stuff
inside your box. Other bus systems are PCI, EISA, MicroChannel
(MCA) or VESA. ISA is an older system, now being displaced by PCI;
newer boards don't support it. If you have ISA, say Y, otherwise N.

config SCx200
tristate "NatSemi SCx200 support"
help
This provides basic support for National Semiconductor's
(now AMD's) Geode processors. The driver probes for the
PCI-IDs of several on-chip devices, so it's a good dependency
for other scx200_* drivers.

If compiled as a module, the driver is named scx200.

config SCx200HR_TIMER
tristate "NatSemi SCx200 27MHz High-Resolution Timer Support"
depends on SCx200
default y
help
This driver provides a clocksource built upon the on-chip
27MHz high-resolution timer. It's also a workaround for
NSC Geode SC-1100's buggy TSC, which loses time when the
processor goes idle (as is done by the scheduler). The
other workaround is the idle=poll boot option.

config OLPC
bool "One Laptop Per Child support"
depends on !X86_PAE
select GPIOLIB
select OF
select OF_PROMTREE
select IRQ_DOMAIN
select OLPC_EC
help
Add support for detecting the unique features of the OLPC
XO hardware.

config OLPC_XO1_PM
bool "OLPC XO-1 Power Management"
depends on OLPC && MFD_CS5535=y && PM_SLEEP
help
Add support for poweroff and suspend of the OLPC XO-1 laptop.

config OLPC_XO1_RTC
bool "OLPC XO-1 Real Time Clock"
depends on OLPC_XO1_PM && RTC_DRV_CMOS
help
Add support for the XO-1 real time clock, which can be used as a
programmable wakeup source.

config OLPC_XO1_SCI
bool "OLPC XO-1 SCI extras"
depends on OLPC && OLPC_XO1_PM && GPIO_CS5535=y
depends on INPUT=y
select POWER_SUPPLY
help
Add support for SCI-based features of the OLPC XO-1 laptop:
- EC-driven system wakeups
- Power button
- Ebook switch
- Lid switch
- AC adapter status updates
- Battery status updates

config OLPC_XO15_SCI
bool "OLPC XO-1.5 SCI extras"
depends on OLPC && ACPI
select POWER_SUPPLY
help
Add support for SCI-based features of the OLPC XO-1.5 laptop:
- EC-driven system wakeups
- AC adapter status updates
- Battery status updates
2010-10-10 09:40:32 +00:00
|
|
|
|
2024-08-21 05:25:04 +00:00
|
|
|
config GEODE_COMMON
|
|
|
|
bool
|
|
|
|
|
2011-09-20 21:00:12 +00:00
|
|
|
config ALIX
|
|
|
|
bool "PCEngines ALIX System Support (LED setup)"
|
|
|
|
select GPIOLIB
|
2024-08-21 05:25:04 +00:00
|
|
|
select GEODE_COMMON
|
2020-06-13 16:50:22 +00:00
|
|
|
help
|
2011-09-20 21:00:12 +00:00
|
|
|
This option enables system support for the PCEngines ALIX.
|
|
|
|
At present this just sets up LEDs for GPIO control on
|
|
|
|
ALIX2/3/6 boards. However, other system specific setup should
|
|
|
|
get added here.

Note: You must still enable the drivers for GPIO and LED support
(GPIO_CS5535 & LEDS_GPIO) to actually use the LEDs.

Note: You have to set alix.force=1 for boards with Award BIOS.

config NET5501
bool "Soekris Engineering net5501 System Support (LEDS, GPIO, etc)"
select GPIOLIB
select GEODE_COMMON
help
This option enables system support for the Soekris Engineering net5501.

config GEOS
bool "Traverse Technologies GEOS System Support (LEDS, GPIO, etc)"
select GPIOLIB
select GEODE_COMMON
depends on DMI
help
This option enables system support for the Traverse Technologies GEOS.
|
|
|
|
|
2013-01-04 21:18:14 +00:00
|
|
|

config TS5500
	bool "Technologic Systems TS-5500 platform support"
	depends on MELAN
	select CHECK_SIGNATURE
	select NEW_LEDS
	select LEDS_CLASS
	help
	  This option enables system support for the Technologic Systems TS-5500.

endif # X86_32

config AMD_NB
	def_bool y
	depends on CPU_SUP_AMD && PCI

endmenu
menu "Binary Emulations"
|
2007-11-06 19:41:05 +00:00
|
|
|
|
|
|
|

config IA32_EMULATION
	bool "IA32 Emulation"
	depends on X86_64
	select ARCH_WANT_OLD_COMPAT_IPC
	select BINFMT_ELF
	select COMPAT_OLD_SIGACTION
	help
	  Include code to run legacy 32-bit programs under a
	  64-bit kernel. You should likely turn this on, unless you're
	  100% sure that you don't have any 32-bit programs left.

config IA32_EMULATION_DEFAULT_DISABLED
	bool "IA32 emulation disabled by default"
	default n
	depends on IA32_EMULATION
	help
	  Disable IA32 emulation by default. This prevents the kernel from
	  loading 32-bit processes and blocks access to 32-bit syscalls.
	  If unsure, leave it at its default value.

config X86_X32_ABI
	bool "x32 ABI for 64-bit mode"
	depends on X86_64
	# llvm-objcopy does not convert x86_64 .note.gnu.property or
	# compressed debug sections to x86_x32 properly:
	# https://github.com/ClangBuiltLinux/linux/issues/514
	# https://github.com/ClangBuiltLinux/linux/issues/1141
	depends on $(success,$(OBJCOPY) --version | head -n1 | grep -qv llvm)
	help
	  Include code to run binaries for the x32 native 32-bit ABI
	  for 64-bit processors. An x32 process gets access to the
	  full 64-bit register file and wide data path while leaving
	  pointers at 32 bits for a smaller memory footprint.
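
# A userspace sketch of what the help text means (assumes an x32-capable
# toolchain; build with e.g. gcc -mx32): pointers and longs shrink to
# 32 bits while the 64-bit data path remains:
#
#	#include <stdio.h>
#
#	int main(void)
#	{
#		/* x32 prints 4, 4, 8; -m64 prints 8, 8, 8; -m32 prints 4, 4, 8
#		 * (x32 differs from i386 in its registers, not these sizes) */
#		printf("void*=%zu long=%zu long long=%zu\n",
#		       sizeof(void *), sizeof(long), sizeof(long long));
#		return 0;
#	}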

config COMPAT_32
	def_bool y
	depends on IA32_EMULATION || X86_32
	select HAVE_UID16
	select OLD_SIGSUSPEND3

config COMPAT
	def_bool y
	depends on IA32_EMULATION || X86_X32_ABI
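
# A kernel-side sketch (not from this file) of what COMPAT enables: drivers
# can branch on the caller's ABI. in_compat_syscall() and compat_ptr() are
# the real helpers from <linux/compat.h>; the two handlers are hypothetical:
#
#	#include <linux/compat.h>
#	#include <linux/fs.h>
#
#	/* hypothetical handlers, one per ABI */
#	static long my_ioctl_native(struct file *f, unsigned int cmd, void __user *p);
#	static long my_ioctl_compat(struct file *f, unsigned int cmd, void __user *p);
#
#	static long my_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
#	{
#		if (in_compat_syscall())	/* 32-bit caller on a 64-bit kernel */
#			return my_ioctl_compat(f, cmd, compat_ptr(arg));
#		return my_ioctl_native(f, cmd, (void __user *)arg);
#	}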

config COMPAT_FOR_U64_ALIGNMENT
	def_bool y
	depends on COMPAT
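
# Why COMPAT_FOR_U64_ALIGNMENT exists: i386 aligns u64 to 4 bytes while
# x86-64 aligns it to 8, so the same struct lays out differently for 32-bit
# callers; the kernel's compat_u64 typedef carries aligned(4) to reproduce
# the 32-bit layout. A standalone demonstration mirroring that typedef,
# compiled for x86-64:
#
#	#include <stddef.h>
#	#include <stdint.h>
#
#	typedef uint64_t compat_u64 __attribute__((aligned(4)));
#
#	struct native_layout { uint32_t a; uint64_t b; };	/* b at offset 8 */
#	struct compat_layout { uint32_t a; compat_u64 b; };	/* b at offset 4 */
#
#	_Static_assert(offsetof(struct native_layout, b) == 8, "x86-64 pads to 8");
#	_Static_assert(offsetof(struct compat_layout, b) == 4, "i386 layout kept");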

endmenu

config HAVE_ATOMIC_IOMAP
	def_bool y
	depends on X86_32
source "arch/x86/kvm/Kconfig"
|
2020-03-26 08:00:58 +00:00
|
|
|
|
|
|
|
source "arch/x86/Kconfig.assembler"
|