mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-01-16 18:08:20 +00:00
ASoC: Fixes for v6.13
A few small fixes for v6.13, all system specific - the biggest thing is the fix for jack handling over suspend on some Intel laptops. Merge tag 'asoc-fix-v6.13-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
commit
c34e9ab9a6
9
.clippy.toml
Normal file
@ -0,0 +1,9 @@
|
||||
# SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
check-private-items = true
|
||||
|
||||
disallowed-macros = [
|
||||
# The `clippy::dbg_macro` lint only works with `std::dbg!`, thus we simulate
|
||||
# it here, see: https://github.com/rust-lang/rust-clippy/issues/11303.
|
||||
{ path = "kernel::dbg", reason = "the `dbg!` macro is intended as a debugging tool" },
|
||||
]
|
@ -3,3 +3,4 @@ Alan Cox <root@hraefn.swansea.linux.org.uk>
|
||||
Christoph Hellwig <hch@lst.de>
|
||||
Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
||||
Marc Gonzalez <marc.w.gonzalez@free.fr>
|
||||
Ralf Baechle <ralf@linux-mips.org>
|
||||
|
2
.gitignore
vendored
@ -103,6 +103,7 @@ modules.order
|
||||
# We don't want to ignore the following even if they are dot-files
|
||||
#
|
||||
!.clang-format
|
||||
!.clippy.toml
|
||||
!.cocciconfig
|
||||
!.editorconfig
|
||||
!.get_maintainer.ignore
|
||||
@ -128,6 +129,7 @@ series
|
||||
|
||||
# ctags files
|
||||
tags
|
||||
!tags/
|
||||
TAGS
|
||||
|
||||
# cscope files
|
||||
|
5
.mailmap
@ -37,6 +37,7 @@ Alexei Avshalom Lazar <quic_ailizaro@quicinc.com> <ailizaro@codeaurora.org>
|
||||
Alexei Starovoitov <ast@kernel.org> <alexei.starovoitov@gmail.com>
|
||||
Alexei Starovoitov <ast@kernel.org> <ast@fb.com>
|
||||
Alexei Starovoitov <ast@kernel.org> <ast@plumgrid.com>
|
||||
Alexey Klimov <alexey.klimov@linaro.org> <klimov.linux@gmail.com>
|
||||
Alexey Makhalov <alexey.amakhalov@broadcom.com> <amakhalov@vmware.com>
|
||||
Alex Elder <elder@kernel.org>
|
||||
Alex Elder <elder@kernel.org> <aelder@sgi.com>
|
||||
@ -251,6 +252,8 @@ Guru Das Srinagesh <quic_gurus@quicinc.com> <gurus@codeaurora.org>
|
||||
Gustavo Padovan <gustavo@las.ic.unicamp.br>
|
||||
Gustavo Padovan <padovan@profusion.mobi>
|
||||
Hanjun Guo <guohanjun@huawei.com> <hanjun.guo@linaro.org>
|
||||
Hans Verkuil <hverkuil@xs4all.nl> <hansverk@cisco.com>
|
||||
Hans Verkuil <hverkuil@xs4all.nl> <hverkuil-cisco@xs4all.nl>
|
||||
Heiko Carstens <hca@linux.ibm.com> <h.carstens@de.ibm.com>
|
||||
Heiko Carstens <hca@linux.ibm.com> <heiko.carstens@de.ibm.com>
|
||||
Heiko Stuebner <heiko@sntech.de> <heiko.stuebner@bqreaders.com>
|
||||
@ -269,6 +272,7 @@ Jack Pham <quic_jackp@quicinc.com> <jackp@codeaurora.org>
|
||||
Jaegeuk Kim <jaegeuk@kernel.org> <jaegeuk@google.com>
|
||||
Jaegeuk Kim <jaegeuk@kernel.org> <jaegeuk.kim@samsung.com>
|
||||
Jaegeuk Kim <jaegeuk@kernel.org> <jaegeuk@motorola.com>
|
||||
Jai Luthra <jai.luthra@linux.dev> <j-luthra@ti.com>
|
||||
Jakub Kicinski <kuba@kernel.org> <jakub.kicinski@netronome.com>
|
||||
James Bottomley <jejb@mulgrave.(none)>
|
||||
James Bottomley <jejb@titanic.il.steeleye.com>
|
||||
@ -730,6 +734,7 @@ Will Deacon <will@kernel.org> <will.deacon@arm.com>
|
||||
Wolfram Sang <wsa@kernel.org> <w.sang@pengutronix.de>
|
||||
Wolfram Sang <wsa@kernel.org> <wsa@the-dreams.de>
|
||||
Yakir Yang <kuankuan.y@gmail.com> <ykk@rock-chips.com>
|
||||
Yanteng Si <si.yanteng@linux.dev> <siyanteng@loongson.cn>
|
||||
Yusuke Goda <goda.yusuke@renesas.com>
|
||||
Zack Rusin <zack.rusin@broadcom.com> <zackr@vmware.com>
|
||||
Zhu Yanjun <zyjzyj2000@gmail.com> <yanjunz@nvidia.com>
|
||||
|
12
CREDITS
@ -185,6 +185,11 @@ P: 1024/AF7B30C1 CF 97 C2 CC 6D AE A7 FE C8 BA 9C FC 88 DE 32 C3
|
||||
D: Linux/MIPS port
|
||||
D: Linux/68k hacker
|
||||
D: AX25 maintainer
|
||||
D: EDAC-CAVIUM OCTEON maintainer
|
||||
D: IOC3 ETHERNET DRIVER maintainer
|
||||
D: NETROM NETWORK LAYER maintainer
|
||||
D: ROSE NETWORK LAYER maintainer
|
||||
D: TURBOCHANNEL SUBSYSTEM maintainer
|
||||
S: Hauptstrasse 19
|
||||
S: 79837 St. Blasien
|
||||
S: Germany
|
||||
@ -574,6 +579,9 @@ N: Zach Brown
|
||||
E: zab@zabbo.net
|
||||
D: maestro pci sound
|
||||
|
||||
N: Zefan Li
|
||||
D: Contribution to control group stuff
|
||||
|
||||
N: David Brownell
|
||||
D: Kernel engineer, mentor, and friend. Maintained USB EHCI and
|
||||
D: gadget layers, SPI subsystem, GPIO subsystem, and more than a few
|
||||
@ -3795,6 +3803,10 @@ S: Department of Zoology, University of Washington
|
||||
S: Seattle, WA 98195-1800
|
||||
S: USA
|
||||
|
||||
N: York Sun
|
||||
E: york.sun@nxp.com
|
||||
D: Freescale DDR EDAC
|
||||
|
||||
N: Eugene Surovegin
|
||||
E: ebs@ebshome.net
|
||||
W: https://kernel.ebshome.net/
|
||||
|
12
Documentation/ABI/obsolete/sysfs-selinux-user
Normal file
@ -0,0 +1,12 @@
|
||||
What: /sys/fs/selinux/user
|
||||
Date: April 2005 (predates git)
|
||||
KernelVersion: 2.6.12-rc2 (predates git)
|
||||
Contact: selinux@vger.kernel.org
|
||||
Description:
|
||||
|
||||
The selinuxfs "user" node allows userspace to request a list
|
||||
of security contexts that can be reached for a given SELinux
|
||||
user from a given starting context. This was used by libselinux
|
||||
when various login-style programs requested contexts for
|
||||
users, but libselinux stopped using it in 2020.
|
||||
Kernel support will be removed no sooner than Dec 2025.
|
@ -424,6 +424,13 @@ Description:
|
||||
[RW] This file is used to control (on/off) the iostats
|
||||
accounting of the disk.
|
||||
|
||||
What: /sys/block/<disk>/queue/iostats_passthrough
|
||||
Date: October 2024
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RW] This file is used to control (on/off) the iostats
|
||||
accounting of the disk for passthrough commands.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/logical_block_size
|
||||
Date: May 2009
|
||||
@ -594,6 +601,9 @@ Description:
|
||||
[RW] Maximum number of kilobytes to read-ahead for filesystems
|
||||
on this block device.
|
||||
|
||||
For MADV_HUGEPAGE, the readahead size may exceed this setting
|
||||
since its granularity is based on the hugepage size.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/rotational
|
||||
Date: January 2009
|
||||
|
@ -342,6 +342,70 @@ Description: Specific uncompressed frame descriptors
|
||||
support
|
||||
========================= =====================================
|
||||
|
||||
What: /config/usb-gadget/gadget/functions/uvc.name/streaming/framebased
|
||||
Date: Sept 2024
|
||||
KernelVersion: 5.15
|
||||
Description: Framebased format descriptors
|
||||
|
||||
What: /config/usb-gadget/gadget/functions/uvc.name/streaming/framebased/name
|
||||
Date: Sept 2024
|
||||
KernelVersion: 5.15
|
||||
Description: Specific framebased format descriptors
|
||||
|
||||
================== =======================================
|
||||
bFormatIndex unique id for this format descriptor;
|
||||
only defined after parent header is
|
||||
linked into the streaming class;
|
||||
read-only
|
||||
bmaControls this format's data for bmaControls in
|
||||
the streaming header
|
||||
bmInterlaceFlags specifies interlace information,
|
||||
read-only
|
||||
bAspectRatioY the Y dimension of the picture aspect
|
||||
ratio, read-only
|
||||
bAspectRatioX the X dimension of the picture aspect
|
||||
ratio, read-only
|
||||
bDefaultFrameIndex optimum frame index for this stream
|
||||
bBitsPerPixel number of bits per pixel used to
|
||||
specify color in the decoded video
|
||||
frame
|
||||
guidFormat globally unique id used to identify
|
||||
stream-encoding format
|
||||
================== =======================================
|
||||
|
||||
What: /config/usb-gadget/gadget/functions/uvc.name/streaming/framebased/name/name
|
||||
Date: Sept 2024
|
||||
KernelVersion: 5.15
|
||||
Description: Specific framebased frame descriptors
|
||||
|
||||
========================= =====================================
|
||||
bFrameIndex unique id for this frame descriptor;
|
||||
only defined after parent format is
|
||||
linked into the streaming header;
|
||||
read-only
|
||||
dwFrameInterval indicates how frame interval can be
|
||||
programmed; a number of values
|
||||
separated by newline can be specified
|
||||
dwDefaultFrameInterval the frame interval the device would
|
||||
like to use as default
|
||||
dwBytesPerLine Specifies the number of bytes per line
|
||||
of video for packed fixed frame size
|
||||
formats, allowing the receiver to
|
||||
perform stride alignment of the video.
|
||||
If the bVariableSize value (above) is
|
||||
TRUE (1), or if the format does not
|
||||
permit such alignment, this value shall
|
||||
be set to zero (0).
|
||||
dwMaxBitRate the maximum bit rate at the shortest
|
||||
frame interval in bps
|
||||
dwMinBitRate the minimum bit rate at the longest
|
||||
frame interval in bps
|
||||
wHeight height of decoded bitmap frame in px
|
||||
wWidth width of decoded bitmap frame in px
|
||||
bmCapabilities still image support, fixed frame-rate
|
||||
support
|
||||
========================= =====================================
|
||||
|
||||
What: /config/usb-gadget/gadget/functions/uvc.name/streaming/header
|
||||
Date: Dec 2014
|
||||
KernelVersion: 4.0
|
||||
|
@ -184,3 +184,10 @@ Date: Apr 2020
|
||||
Contact: linux-crypto@vger.kernel.org
|
||||
Description: Dump the total number of time out requests.
|
||||
Available for both PF and VF, and has no other effect on HPRE.
|
||||
|
||||
What: /sys/kernel/debug/hisi_hpre/<bdf>/cap_regs
|
||||
Date: Oct 2024
|
||||
Contact: linux-crypto@vger.kernel.org
|
||||
Description: Dump the values of the qm and hpre capability bit registers and
|
||||
support the query of device specifications to facilitate fault locating.
|
||||
Available for both PF and VF, and has no other effect on HPRE.
|
||||
|
25
Documentation/ABI/testing/debugfs-hisi-migration
Normal file
@ -0,0 +1,25 @@
|
||||
What: /sys/kernel/debug/vfio/<device>/migration/hisi_acc/dev_data
|
||||
Date: Jan 2025
|
||||
KernelVersion: 6.13
|
||||
Contact: Longfang Liu <liulongfang@huawei.com>
|
||||
Description: Read the configuration data and some status data
|
||||
required for device live migration. These data include device
|
||||
status data, queue configuration data, some task configuration
|
||||
data and device attribute data. The output format of the data
|
||||
is defined by the live migration driver.
|
||||
|
||||
What: /sys/kernel/debug/vfio/<device>/migration/hisi_acc/migf_data
|
||||
Date: Jan 2025
|
||||
KernelVersion: 6.13
|
||||
Contact: Longfang Liu <liulongfang@huawei.com>
|
||||
Description: Read the data from the last completed live migration.
|
||||
This data includes the same device status data as in "dev_data".
|
||||
The migf_data is the dev_data that is migrated.
|
||||
|
||||
What: /sys/kernel/debug/vfio/<device>/migration/hisi_acc/cmd_state
|
||||
Date: Jan 2025
|
||||
KernelVersion: 6.13
|
||||
Contact: Longfang Liu <liulongfang@huawei.com>
|
||||
Description: Used to obtain the device command sending and receiving
|
||||
channel status. Returns failure or success logs based on the
|
||||
results.
|
@ -157,3 +157,10 @@ Contact: linux-crypto@vger.kernel.org
|
||||
Description: Dump the total number of completed but marked error requests
|
||||
to be received.
|
||||
Available for both PF and VF, and has no other effect on SEC.
|
||||
|
||||
What: /sys/kernel/debug/hisi_sec2/<bdf>/cap_regs
|
||||
Date: Oct 2024
|
||||
Contact: linux-crypto@vger.kernel.org
|
||||
Description: Dump the values of the qm and sec capability bit registers and
|
||||
support the query of device specifications to facilitate fault locating.
|
||||
Available for both PF and VF, and has no other effect on SEC.
|
||||
|
@ -158,3 +158,10 @@ Contact: linux-crypto@vger.kernel.org
|
||||
Description: Dump the total number of BD type error requests
|
||||
to be received.
|
||||
Available for both PF and VF, and has no other effect on ZIP.
|
||||
|
||||
What: /sys/kernel/debug/hisi_zip/<bdf>/cap_regs
|
||||
Date: Oct 2024
|
||||
Contact: linux-crypto@vger.kernel.org
|
||||
Description: Dump the values of the qm and zip capability bit registers and
|
||||
support the query of device specifications to facilitate fault locating.
|
||||
Available for both PF and VF, and has no other effect on ZIP.
|
||||
|
@ -0,0 +1,25 @@
|
||||
What: /sys/bus/event_source/devices/vpa_pmu/format
|
||||
Date: November 2024
|
||||
Contact: Linux on PowerPC Developer List <linuxppc-dev@lists.ozlabs.org>
|
||||
Description: Read-only. Attribute group to describe the magic bits
|
||||
that go into perf_event_attr.config for a particular pmu.
|
||||
(See ABI/testing/sysfs-bus-event_source-devices-format).
|
||||
|
||||
Each attribute under this group defines a bit range of the
|
||||
perf_event_attr.config. Supported attributes are listed
|
||||
below::
|
||||
|
||||
event = "config:0-31" - event ID
|
||||
|
||||
For example::
|
||||
|
||||
l1_to_l2_lat = "event=0x1"
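To illustrate how a format entry such as "config:0-31" and an event string such as "event=0x1" could combine into a perf_event_attr.config value, here is a minimal Python sketch. The helper names parse_format and encode_event are hypothetical, and the sketch only handles the single-range "field:lo-hi" form shown above; it is not how the perf tool itself is implemented.

```python
def parse_format(spec):
    """Parse a sysfs format string like 'config:0-31' into (field, lo, hi)."""
    field, _, bits = spec.partition(":")
    lo, _, hi = bits.partition("-")
    lo = int(lo)
    hi = int(hi) if hi else lo  # a single bit may be written as 'config:5'
    return field, lo, hi


def encode_event(event_spec, formats):
    """Pack comma-separated 'name=value' terms into per-field config words."""
    config = {}
    for term in event_spec.split(","):
        name, _, value = term.partition("=")
        field, lo, hi = parse_format(formats[name])
        width = hi - lo + 1
        number = int(value, 0)  # base 0 accepts both '0x1' and '1'
        if number >= 1 << width:
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        config[field] = config.get(field, 0) | (number << lo)
    return config


# Values taken from the description above.
formats = {"event": "config:0-31"}
l1_to_l2_lat = encode_event("event=0x1", formats)
```

With the strings from the description, the event ID lands in the low 32 bits of the "config" field.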
|
||||
|
||||
What: /sys/bus/event_source/devices/vpa_pmu/events
|
||||
Date: November 2024
|
||||
Contact: Linux on PowerPC Developer List <linuxppc-dev@lists.ozlabs.org>
|
||||
Description: Read-only. Attribute group to describe performance monitoring
|
||||
events for the Virtual Processor Area events. Each attribute
|
||||
in this group describes a single performance monitoring event
|
||||
supported by vpa_pmu. The name of the file is the name of
|
||||
the event (See ABI/testing/sysfs-bus-event_source-devices-events).
|
@ -2268,6 +2268,30 @@ Description:
|
||||
An example format is 16-bytes, 2-digits-per-byte, HEX-string
|
||||
representing the sensor unique ID number.
|
||||
|
||||
What: /sys/bus/iio/devices/iio:deviceX/filter_type_available
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_voltage-voltage_filter_mode_available
|
||||
KernelVersion: 6.1
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Reading returns a list with the possible filter modes. Options
|
||||
for the attribute:
|
||||
|
||||
* "sinc3" - The digital sinc3 filter. Moderate 1st
|
||||
conversion time. Good noise performance.
|
||||
* "sinc4" - Sinc 4. Excellent noise performance. Long
|
||||
1st conversion time.
|
||||
* "sinc5" - The digital sinc5 filter. Excellent noise
|
||||
performance
|
||||
* "sinc4+sinc1" - Sinc4 + averaging by 8. Low 1st conversion
|
||||
time.
|
||||
* "sinc3+rej60" - Sinc3 + 60Hz rejection.
|
||||
* "sinc3+sinc1" - Sinc3 + averaging by 8. Low 1st conversion
|
||||
time.
|
||||
* "sinc3+pf1" - Sinc3 + device specific Post Filter 1.
|
||||
* "sinc3+pf2" - Sinc3 + device specific Post Filter 2.
|
||||
* "sinc3+pf3" - Sinc3 + device specific Post Filter 3.
|
||||
* "sinc3+pf4" - Sinc3 + device specific Post Filter 4.
|
||||
|
||||
What: /sys/.../events/in_proximity_thresh_either_runningperiod
|
||||
KernelVersion: 6.6
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
@ -2339,3 +2363,11 @@ KernelVersion: 6.10
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
The value of current sense resistor in Ohms.
|
||||
|
||||
What: /sys/.../iio:deviceX/in_attention_input
|
||||
KernelVersion: 6.13
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Value representing the user's attention to the system, expressed
|
||||
as a percentage. This usually indicates whether the user is
|
||||
looking at the screen or not.
|
||||
|
@ -1,46 +0,0 @@
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_voltage-voltage_filter_mode_available
|
||||
KernelVersion: 6.2
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Reading returns a list with the possible filter modes.
|
||||
|
||||
* "sinc4" - Sinc 4. Excellent noise performance. Long
|
||||
1st conversion time. No natural 50/60Hz rejection.
|
||||
|
||||
* "sinc4+sinc1" - Sinc4 + averaging by 8. Low 1st conversion
|
||||
time.
|
||||
|
||||
* "sinc3" - Sinc3. Moderate 1st conversion time.
|
||||
Good noise performance.
|
||||
|
||||
* "sinc3+rej60" - Sinc3 + 60Hz rejection. At a sampling
|
||||
frequency of 50Hz, achieves simultaneous 50Hz and 60Hz
|
||||
rejection.
|
||||
|
||||
* "sinc3+sinc1" - Sinc3 + averaging by 8. Low 1st conversion
|
||||
time. Best used with a sampling frequency of at least
|
||||
216.19Hz.
|
||||
|
||||
* "sinc3+pf1" - Sinc3 + Post Filter 1. 53dB rejection @
|
||||
50Hz, 58dB rejection @ 60Hz.
|
||||
|
||||
* "sinc3+pf2" - Sinc3 + Post Filter 2. 70dB rejection @
|
||||
50Hz, 70dB rejection @ 60Hz.
|
||||
|
||||
* "sinc3+pf3" - Sinc3 + Post Filter 3. 99dB rejection @
|
||||
50Hz, 103dB rejection @ 60Hz.
|
||||
|
||||
* "sinc3+pf4" - Sinc3 + Post Filter 4. 103dB rejection @
|
||||
50Hz, 109dB rejection @ 60Hz.
|
||||
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_voltageY-voltageZ_filter_mode
|
||||
KernelVersion: 6.2
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Set the filter mode of the differential channel. When the filter
|
||||
mode changes, the in_voltageY-voltageZ_sampling_frequency and
|
||||
in_voltageY-voltageZ_sampling_frequency_available attributes
|
||||
might also change to accommodate the new filter mode.
|
||||
If the current sampling frequency is out of range for the new
|
||||
filter mode, the sampling frequency will be changed to the
|
||||
closest valid one.
|
@ -163,6 +163,17 @@ Description:
|
||||
will be present in sysfs. Writing 1 to this file
|
||||
will perform reset.
|
||||
|
||||
What: /sys/bus/pci/devices/.../reset_subordinate
|
||||
Date: October 2024
|
||||
Contact: linux-pci@vger.kernel.org
|
||||
Description:
|
||||
This is visible only for bridge devices. If you want to reset
|
||||
all devices attached through the subordinate bus of a specific
|
||||
bridge device, writing 1 to this will try to do it. This will
|
||||
affect all devices attached to the system through this bridge
|
||||
similar to writing 1 to their individual "reset" file, so use
|
||||
with caution.
|
||||
|
||||
What: /sys/bus/pci/devices/.../vpd
|
||||
Date: February 2008
|
||||
Contact: Ben Hutchings <bwh@kernel.org>
|
||||
|
@ -0,0 +1,12 @@
|
||||
What: /sys/bus/platform/drivers/amd_x3d_vcache/AMDI0101:00/amd_x3d_mode
|
||||
Date: November 2024
|
||||
KernelVersion: 6.13
|
||||
Contact: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
|
||||
Description: (RW) AMD 3D V-Cache optimizer allows users to switch CPU core
|
||||
rankings dynamically.
|
||||
|
||||
This file switches between these two modes:
|
||||
- "frequency" cores within the faster CCD are prioritized before
|
||||
those in the slower CCD.
|
||||
- "cache" cores within the larger L3 CCD are prioritized before
|
||||
those in the smaller L3 CCD.
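A user-space helper for this attribute might look like the following Python sketch. The function names are hypothetical, and the caller supplies the path to the amd_x3d_mode file; the sketch only validates against the two mode strings documented above and assumes the file accepts the bare mode string.

```python
from pathlib import Path

# The two modes named in the description above.
VALID_MODES = ("frequency", "cache")


def read_mode(attr_path):
    """Return the current mode string with surrounding whitespace removed."""
    return Path(attr_path).read_text().strip()


def write_mode(attr_path, mode):
    """Write a mode after checking it against the documented values."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    Path(attr_path).write_text(mode)
```

Writing to the real sysfs file will typically need elevated privileges; checking the value first keeps typos from reaching the driver.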
|
@ -193,7 +193,7 @@ Description:
|
||||
|
||||
mechanism:
|
||||
The means of authentication. This attribute is mandatory.
|
||||
Only supported type currently is "password".
|
||||
Supported types are "password" or "certificate".
|
||||
|
||||
max_password_length:
|
||||
A file that can be read to obtain the
|
||||
@ -303,6 +303,7 @@ Description:
|
||||
being configured allowing anyone to make changes.
|
||||
After any of these operations the system must reboot for the changes to
|
||||
take effect.
|
||||
Admin and System certificates are supported from 2025 systems onward.
|
||||
|
||||
certificate_thumbprint:
|
||||
Read only attribute used to display the MD5, SHA1 and SHA256 thumbprints
|
||||
|
@ -149,6 +149,19 @@ Description:
|
||||
advertise to the partner. The currently used capabilities are in
|
||||
brackets. Selection happens by writing to the file.
|
||||
|
||||
What: /sys/class/typec/<port>/usb_capability
|
||||
Date: November 2024
|
||||
Contact: Heikki Krogerus <heikki.krogerus@linux.intel.com>
|
||||
Description: Lists the supported USB Modes. The default USB mode that is used
|
||||
next time with the Enter_USB Message is in brackets. The default
|
||||
mode can be changed by writing to the file when supported by the
|
||||
driver.
|
||||
|
||||
Valid values:
|
||||
- usb2 (USB 2.0)
|
||||
- usb3 (USB 3.2)
|
||||
- usb4 (USB4)
|
||||
|
||||
USB Type-C partner devices (eg. /sys/class/typec/port0-partner/)
|
||||
|
||||
What: /sys/class/typec/<port>-partner/accessory_mode
|
||||
@ -220,6 +233,20 @@ Description:
|
||||
directory exists, it will have an attribute file for every VDO
|
||||
in Discover Identity command result.
|
||||
|
||||
What: /sys/class/typec/<port>-partner/usb_mode
|
||||
Date: November 2024
|
||||
Contact: Heikki Krogerus <heikki.krogerus@linux.intel.com>
|
||||
Description: The USB Modes that the partner device supports. The active mode
|
||||
is displayed in brackets. The active USB mode can be changed by
|
||||
writing to this file when the port driver is able to send Data
|
||||
Reset Message to the partner. That requires USB Power Delivery
|
||||
contract between the partner and the port.
|
||||
|
||||
Valid values:
|
||||
- usb2 (USB 2.0)
|
||||
- usb3 (USB 3.2)
|
||||
- usb4 (USB4)
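Both usb_capability and usb_mode mark one entry with brackets. A minimal Python sketch of parsing such a string follows; the function name parse_bracketed and the sample string "usb2 [usb3] usb4" are assumptions for illustration, since the text above only states that the selected mode is shown in brackets.

```python
def parse_bracketed(text):
    """Return (all_values, selected_value) from e.g. 'usb2 [usb3] usb4'."""
    values, selected = [], None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # strip the brackets
            selected = token
        values.append(token)
    return values, selected


# Hypothetical attribute contents, including the trailing newline sysfs adds.
modes, active = parse_bracketed("usb2 [usb3] usb4\n")
```

If no entry is bracketed, the second element of the result is None, which a caller can treat as "no mode selected".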
|
||||
|
||||
USB Type-C cable devices (eg. /sys/class/typec/port0-cable/)
|
||||
|
||||
Note: Electronically Marked Cables will have a device also for one cable plug
|
||||
|
@ -79,3 +79,48 @@ Description:
|
||||
indicates a lane.
|
||||
crc_err_cnt: (RO) CRC err count on this port.
|
||||
============= ==== =============================================
|
||||
|
||||
What: /sys/devices/platform/HISI04Bx:00/used_types
|
||||
Date: August 2024
|
||||
KernelVersion: 6.12
|
||||
Contact: Huisong Li <lihuisong@huawei.com>
|
||||
Description:
|
||||
This interface is used to show all HCCS types used on the
|
||||
platform, such as HCCS-v1, HCCS-v2 and so on.
|
||||
|
||||
What: /sys/devices/platform/HISI04Bx:00/available_inc_dec_lane_types
|
||||
What: /sys/devices/platform/HISI04Bx:00/dec_lane_of_type
|
||||
What: /sys/devices/platform/HISI04Bx:00/inc_lane_of_type
|
||||
Date: August 2024
|
||||
KernelVersion: 6.12
|
||||
Contact: Huisong Li <lihuisong@huawei.com>
|
||||
Description:
|
||||
These interfaces under /sys/devices/platform/HISI04Bx/ are
|
||||
used to support the low power consumption feature of some
|
||||
HCCS types by changing the number of lanes used. The interfaces
|
||||
changing the number of lanes used are 'dec_lane_of_type' and
|
||||
'inc_lane_of_type' which require root privileges. These
|
||||
interfaces aren't exposed if no HCCS type on the platform supports
|
||||
this feature. Please note that decreasing lane number is only
|
||||
allowed if all the specified HCCS ports are not busy.
|
||||
|
||||
The low power consumption interfaces are as follows:
|
||||
|
||||
============================= ==== ================================
|
||||
available_inc_dec_lane_types: (RO) available HCCS types (string) to
|
||||
increase and decrease the number
|
||||
of lanes used, e.g. HCCS-v2.
|
||||
dec_lane_of_type: (WO) input HCCS type supported
|
||||
decreasing lane to decrease the
|
||||
used lane number of all specified
|
||||
HCCS type ports on platform to
|
||||
the minimum.
|
||||
You can query the 'cur_lane_num'
|
||||
to get the minimum lane number
|
||||
after executing successfully.
|
||||
inc_lane_of_type: (WO) input HCCS type supported
|
||||
increasing lane to increase the
|
||||
used lane number of all specified
|
||||
HCCS type ports on platform to
|
||||
the full lane state.
|
||||
============================= ==== ================================
|
||||
|
38
Documentation/ABI/testing/sysfs-driver-hid-corsair-void
Normal file
@ -0,0 +1,38 @@
|
||||
What: /sys/bus/hid/drivers/hid-corsair-void/<dev>/fw_version_headset
|
||||
Date: January 2024
|
||||
KernelVersion: 6.13
|
||||
Contact: Stuart Hayhurst <stuart.a.hayhurst@gmail.com>
|
||||
Description: (R) The firmware version of the headset
|
||||
* Returns -ENODATA if no version was reported
|
||||
|
||||
What: /sys/bus/hid/drivers/hid-corsair-void/<dev>/fw_version_receiver
|
||||
Date: January 2024
|
||||
KernelVersion: 6.13
|
||||
Contact: Stuart Hayhurst <stuart.a.hayhurst@gmail.com>
|
||||
Description: (R) The firmware version of the receiver
|
||||
|
||||
What: /sys/bus/hid/drivers/hid-corsair-void/<dev>/microphone_up
|
||||
Date: July 2023
|
||||
KernelVersion: 6.13
|
||||
Contact: Stuart Hayhurst <stuart.a.hayhurst@gmail.com>
|
||||
Description: (R) Get the physical position of the microphone
|
||||
* 1 -> Microphone up
|
||||
* 0 -> Microphone down
|
||||
|
||||
What: /sys/bus/hid/drivers/hid-corsair-void/<dev>/send_alert
|
||||
Date: July 2023
|
||||
KernelVersion: 6.13
|
||||
Contact: Stuart Hayhurst <stuart.a.hayhurst@gmail.com>
|
||||
Description: (W) Play a built-in notification from the headset (0 / 1)
|
||||
|
||||
What: /sys/bus/hid/drivers/hid-corsair-void/<dev>/set_sidetone
|
||||
Date: December 2023
|
||||
KernelVersion: 6.13
|
||||
Contact: Stuart Hayhurst <stuart.a.hayhurst@gmail.com>
|
||||
Description: (W) Set the sidetone volume (0 - sidetone_max)
|
||||
|
||||
What: /sys/bus/hid/drivers/hid-corsair-void/<dev>/sidetone_max
|
||||
Date: July 2024
|
||||
KernelVersion: 6.13
|
||||
Contact: Stuart Hayhurst <stuart.a.hayhurst@gmail.com>
|
||||
Description: (R) Report the maximum sidetone volume
|
@ -83,3 +83,11 @@ Contact: intel-gfx@lists.freedesktop.org
|
||||
Description: RO. Fan speed of device in RPM.
|
||||
|
||||
Only supported for particular Intel i915 graphics platforms.
|
||||
|
||||
What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/temp1_input
|
||||
Date: November 2024
|
||||
KernelVersion: 6.12
|
||||
Contact: intel-gfx@lists.freedesktop.org
|
||||
Description: RO. GPU package temperature in millidegree Celsius.
|
||||
|
||||
Only supported for particular Intel i915 graphics platforms.
|
||||
|
10
Documentation/ABI/testing/sysfs-driver-panthor-profiling
Normal file
@ -0,0 +1,10 @@
|
||||
What: /sys/bus/platform/drivers/panthor/.../profiling
|
||||
Date: September 2024
|
||||
KernelVersion: 6.11.0
|
||||
Contact: Adrian Larumbe <adrian.larumbe@collabora.com>
|
||||
Description:
|
||||
Bitmask to enable drm fdinfo's job profiling measurements.
|
||||
Valid values are:
|
||||
0: Don't enable fdinfo job profiling sources.
|
||||
1: Enable GPU cycle measurements for running jobs.
|
||||
2: Enable GPU timestamp sampling for running jobs.
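Because the attribute is described as a bitmask, the listed values can presumably be combined. The Python sketch below decodes a value under that assumption; the constant and function names are hypothetical, and treating 3 as "both sources enabled" is an inference from the word "bitmask", not something the text above states.

```python
# Bit values taken from the list above.
PROFILING_CYCLES = 1     # GPU cycle measurements
PROFILING_TIMESTAMP = 2  # GPU timestamp sampling


def decode_profiling(value):
    """Return the names of the profiling sources enabled in `value`."""
    known = PROFILING_CYCLES | PROFILING_TIMESTAMP
    if value < 0 or value & ~known:
        raise ValueError(f"unknown profiling bits in {value}")
    sources = []
    if value & PROFILING_CYCLES:
        sources.append("cycles")
    if value & PROFILING_TIMESTAMP:
        sources.append("timestamp")
    return sources
```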
|
20
Documentation/ABI/testing/sysfs-driver-spi-intel
Normal file
@ -0,0 +1,20 @@
|
||||
What: /sys/devices/.../intel_spi_protected
|
||||
Date: Feb 2025
|
||||
KernelVersion: 6.13
|
||||
Contact: Alexander Usyskin <alexander.usyskin@intel.com>
|
||||
Description: This attribute allows the userspace to check if the
|
||||
Intel SPI flash controller is write protected from the host.
|
||||
|
||||
What: /sys/devices/.../intel_spi_locked
|
||||
Date: Feb 2025
|
||||
KernelVersion: 6.13
|
||||
Contact: Alexander Usyskin <alexander.usyskin@intel.com>
|
||||
Description: This attribute allows the user space to check if the
|
||||
Intel SPI flash controller locks supported opcodes.
|
||||
|
||||
What: /sys/devices/.../intel_spi_bios_locked
|
||||
Date: Feb 2025
|
||||
KernelVersion: 6.13
|
||||
Contact: Alexander Usyskin <alexander.usyskin@intel.com>
|
||||
Description: This attribute allows the user space to check if the
|
||||
Intel SPI flash controller BIOS region is locked for writes.
|
@ -16,3 +16,14 @@ Description: Control strategy of sync decompression:
|
||||
readahead on atomic contexts only.
|
||||
- 1 (force on): enable for readpage and readahead.
|
||||
- 2 (force off): disable for all situations.
|
||||
|
||||
What: /sys/fs/erofs/<disk>/drop_caches
|
||||
Date: November 2024
|
||||
Contact: "Guo Chunhai" <guochunhai@vivo.com>
|
||||
Description: Writing to this will drop compression-related caches,
|
||||
currently used to drop in-memory pclusters and cached
|
||||
compressed folios:
|
||||
|
||||
- 1 : invalidate cached compressed folios
|
||||
- 2 : drop in-memory pclusters
|
||||
- 3 : drop in-memory pclusters and cached compressed folios
|
||||
|
@ -311,10 +311,13 @@ Description: Do background GC aggressively when set. Set to 0 by default.
|
||||
GC approach and turns SSR mode on.
|
||||
gc urgent low(2): lowers the bar of checking I/O idling in
|
||||
order to process outstanding discard commands and GC a
|
||||
little bit aggressively. uses cost benefit GC approach.
|
||||
little bit aggressively. always uses cost benefit GC approach,
|
||||
and will override age-threshold GC approach if ATGC is enabled
|
||||
at the same time.
|
||||
gc urgent mid(3): does GC forcibly in a period of given
|
||||
gc_urgent_sleep_time and executes a mid level of I/O idling check.
|
||||
uses cost benefit GC approach.
|
||||
always uses cost benefit GC approach, and will override
|
||||
age-threshold GC approach if ATGC is enabled at the same time.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/gc_urgent_sleep_time
|
||||
Date: August 2017
|
||||
@ -819,3 +822,9 @@ Description: It controls the valid block ratio threshold not to trigger excessiv
|
||||
for zoned devices. The initial value of it is 95(%). F2FS will stop the
|
||||
background GC thread from initiating GC for sections having valid blocks
|
||||
exceeding the ratio.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/max_read_extent_count
|
||||
Date: November 2024
|
||||
Contact: "Chao Yu" <chao@kernel.org>
|
||||
Description: It controls the maximum read extent count per inode; the threshold value
|
||||
is 10240 by default.
|
||||
|
@ -117,6 +117,35 @@ by the PCI endpoint function driver.
|
||||
The PCI endpoint function driver should use pci_epc_mem_free_addr() to
|
||||
free the memory space allocated using pci_epc_mem_alloc_addr().
|
||||
|
||||
* pci_epc_map_addr()
|
||||
|
||||
A PCI endpoint function driver should use pci_epc_map_addr() to map the CPU
|
||||
address of local memory obtained with pci_epc_mem_alloc_addr() to a RC PCI
|
||||
address.
|
||||
|
||||
* pci_epc_unmap_addr()
|
||||
|
||||
A PCI endpoint function driver should use pci_epc_unmap_addr() to unmap the
|
||||
CPU address of local memory mapped to a RC address with pci_epc_map_addr().
|
||||
|
||||
* pci_epc_mem_map()
|
||||
|
||||
A PCI endpoint controller may impose constraints on the RC PCI addresses that
|
||||
can be mapped. The function pci_epc_mem_map() allows endpoint function
|
||||
drivers to allocate and map controller memory while handling such
|
||||
constraints. This function will determine the size of the memory that must be
|
||||
allocated with pci_epc_mem_alloc_addr() for successfully mapping a RC PCI
|
||||
address range. This function will also indicate the size of the PCI address
|
||||
range that was actually mapped, which can be less than the requested size, as
|
||||
well as the offset into the allocated memory to use for accessing the mapped
|
||||
RC PCI address range.
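The relationship between the requested range, the allocation size and the offset can be illustrated with some arithmetic. The Python sketch below assumes the only controller constraint is a power-of-two alignment of the mapped RC PCI address; the function name plan_mapping and that constraint are assumptions for illustration, not the kernel implementation, and the sketch does not model the case where less than the requested size gets mapped.

```python
def plan_mapping(pci_addr, size, align):
    """Return (map_pci_addr, map_size, offset) for an aligned mapping window."""
    if align <= 0 or align & (align - 1):
        raise ValueError("align must be a power of two")
    # Round the start of the window down to the alignment boundary.
    map_pci_addr = pci_addr & ~(align - 1)
    # Offset of the requested address inside the aligned window.
    offset = pci_addr - map_pci_addr
    # Round the window length up so it covers offset + size.
    map_size = (offset + size + align - 1) & ~(align - 1)
    return map_pci_addr, map_size, offset


# Hypothetical request: 0x200 bytes at an unaligned RC PCI address.
base, length, off = plan_mapping(0x10000123, 0x200, 0x1000)
```

In this model the driver would allocate `length` bytes of controller memory, map it at `base`, and access the requested data starting `off` bytes into the allocation.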
|
||||
|
||||
* pci_epc_mem_unmap()
|
||||
|
||||
A PCI endpoint function driver can use pci_epc_mem_unmap() to unmap and free
|
||||
controller memory that was allocated and mapped using pci_epc_mem_map().
|
||||
|
||||
|
||||
Other EPC APIs
|
||||
~~~~~~~~~~~~~~
|
||||
|
||||
|
@ -18,3 +18,4 @@ PCI Bus Subsystem
|
||||
pcieaer-howto
|
||||
endpoint/index
|
||||
boot-interrupts
|
||||
tph
|
||||
|
@ -217,8 +217,12 @@ capability structure except the PCI Express capability structure,
|
||||
that is shared between many drivers including the service drivers.
|
||||
RMW Capability accessors (pcie_capability_clear_and_set_word(),
|
||||
pcie_capability_set_word(), and pcie_capability_clear_word()) protect
|
||||
a selected set of PCI Express Capability Registers (Link Control
|
||||
Register and Root Control Register). Any change to those registers
|
||||
should be performed using RMW accessors to avoid problems due to
|
||||
concurrent updates. For the up-to-date list of protected registers,
|
||||
see pcie_capability_clear_and_set_word().
|
||||
a selected set of PCI Express Capability Registers:
|
||||
|
||||
* Link Control Register
|
||||
* Root Control Register
|
||||
* Link Control 2 Register
|
||||
|
||||
Any change to those registers should be performed using RMW accessors to
|
||||
avoid problems due to concurrent updates. For the up-to-date list of
|
||||
protected registers, see pcie_capability_clear_and_set_word().
|
||||
|
132
Documentation/PCI/tph.rst
Normal file
@ -0,0 +1,132 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
|
||||
===========
|
||||
TPH Support
|
||||
===========
|
||||
|
||||
:Copyright: 2024 Advanced Micro Devices, Inc.
|
||||
:Authors: - Eric van Tassell <eric.vantassell@amd.com>
|
||||
- Wei Huang <wei.huang2@amd.com>
|
||||
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
TPH (TLP Processing Hints) is a PCIe feature that allows endpoint devices
|
||||
to provide optimization hints for requests that target memory space.
|
||||
These hints, in a format called Steering Tags (STs), are embedded in the
|
||||
requester's TLP headers, enabling the system hardware, such as the Root
|
||||
Complex, to better manage platform resources for these requests.
|
||||
|
||||
For example, on platforms with TPH-based direct data cache injection
|
||||
support, an endpoint device can include appropriate STs in its DMA
|
||||
traffic to specify which cache the data should be written to. This allows
|
||||
the CPU core to have a higher probability of getting data from cache,
|
||||
potentially improving performance and reducing latency in data
|
||||
processing.
|
||||
|
||||
|
||||
How to Use TPH
|
||||
==============
|
||||
|
||||
TPH is presented as an optional extended capability in PCIe. The Linux
|
||||
kernel handles TPH discovery during boot, but it is up to the device
|
||||
driver to request TPH enablement if it is to be utilized. Once enabled,
|
||||
the driver uses the provided API to obtain the Steering Tag for the
|
||||
target memory and to program the ST into the device's ST table.
|
||||
|
||||
Enable TPH support in Linux
|
||||
---------------------------
|
||||
|
||||
To support TPH, the kernel must be built with the CONFIG_PCIE_TPH option
|
||||
enabled.
|
||||
|
||||
Manage TPH
|
||||
----------
|
||||
|
||||
To enable TPH for a device, use the following function::
|
||||
|
||||
int pcie_enable_tph(struct pci_dev *pdev, int mode);
|
||||
|
||||
This function enables TPH support for a device with a specific ST mode.
|
||||
Currently supported modes include:
|
||||
|
||||
* PCI_TPH_ST_NS_MODE - No ST Mode
|
||||
* PCI_TPH_ST_IV_MODE - Interrupt Vector Mode
|
||||
* PCI_TPH_ST_DS_MODE - Device Specific Mode
|
||||
|
||||
`pcie_enable_tph()` checks whether the requested mode is actually
|
||||
supported by the device before enabling. The device driver can figure out
|
||||
which TPH mode is supported and can be properly enabled based on the
|
||||
return value of `pcie_enable_tph()`.
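One way to act on that return value is to try the modes in order of preference and keep the first one that succeeds. The following is a minimal userspace sketch of that selection logic only. It assumes that 0 means success and a negative value means failure; the ``toy_`` names, the local mode identifiers and the mock device are hypothetical, and a real driver would call `pcie_enable_tph()` with the PCI_TPH_ST_*_MODE constants instead of the callback used here.

```c
#include <stddef.h>

/* Hypothetical local mode identifiers standing in for PCI_TPH_ST_*_MODE. */
enum toy_tph_mode { TOY_TPH_NS, TOY_TPH_IV, TOY_TPH_DS };

/* Hypothetical enable callback: 0 on success, negative value on failure. */
typedef int (*toy_enable_fn)(void *dev, int mode);

/* Example preference order: device specific, then interrupt vector, then no ST. */
const int toy_preference[] = { TOY_TPH_DS, TOY_TPH_IV, TOY_TPH_NS };

/*
 * Try each mode in turn and return the first one that could be enabled,
 * or -1 if none of them is supported.
 */
int toy_enable_first_supported(void *dev, toy_enable_fn enable,
			       const int *modes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (enable(dev, modes[i]) == 0)
			return modes[i];
	}

	return -1;
}

/* Mock device that only supports Interrupt Vector Mode. */
int toy_enable_iv_only(void *dev, int mode)
{
	(void)dev;
	return mode == TOY_TPH_IV ? 0 : -22;	/* stand-in for a negative error code */
}

/* Mock device without any usable TPH mode. */
int toy_enable_none(void *dev, int mode)
{
	(void)dev;
	(void)mode;
	return -22;
}
```

With the mock device above, the helper skips the unsupported Device Specific Mode and settles on Interrupt Vector Mode.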
|
||||
|
||||
To disable TPH, use the following function::
|
||||
|
||||
void pcie_disable_tph(struct pci_dev *pdev);
|
||||
|
||||
Manage ST
|
||||
---------
|
||||
|
||||
Steering Tags are platform specific. The PCIe spec does not specify where STs
|
||||
come from. Instead, the PCI Firmware Specification defines an ACPI _DSM method
|
||||
(see the `Revised _DSM for Cache Locality TPH Features ECN
|
||||
<https://members.pcisig.com/wg/PCI-SIG/document/15470>`_) for retrieving
|
||||
STs for a target memory of various properties. This method is what is
|
||||
supported in this implementation.
|
||||
|
||||
To retrieve a Steering Tag for a target memory associated with a specific
|
||||
CPU, use the following function::
|
||||
|
||||
int pcie_tph_get_cpu_st(struct pci_dev *pdev, enum tph_mem_type type,
|
||||
unsigned int cpu_uid, u16 *tag);
|
||||
|
||||
The `type` argument is used to specify the memory type, either volatile
|
||||
or persistent, of the target memory. The `cpu_uid` argument specifies the
|
||||
CPU with which the memory is associated.
|
||||
|
||||
After the ST value is retrieved, the device driver can use the following
|
||||
function to write the ST into the device::
|
||||
|
||||
int pcie_tph_set_st_entry(struct pci_dev *pdev, unsigned int index,
|
||||
u16 tag);
|
||||
|
||||
The `index` argument is the ST table entry index the ST tag will be
|
||||
written into. `pcie_tph_set_st_entry()` will figure out the proper
|
||||
location of ST table, either in the MSI-X table or in the TPH Extended
|
||||
Capability space, and write the Steering Tag into the ST entry pointed to by
|
||||
the `index` argument.
|
||||
|
||||
It is completely up to the driver to decide how to use these TPH
|
||||
functions. For example, a network device driver can use the TPH APIs above
|
||||
to update the Steering Tag when interrupt affinity of a RX/TX queue has
|
||||
been changed. Here is sample code for an IRQ affinity notifier:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
static void irq_affinity_notified(struct irq_affinity_notify *notify,
|
||||
const cpumask_t *mask)
|
||||
{
|
||||
struct drv_irq *irq;
|
||||
unsigned int cpu_id;
|
||||
u16 tag;
|
||||
|
||||
irq = container_of(notify, struct drv_irq, affinity_notify);
|
||||
cpumask_copy(irq->cpu_mask, mask);
|
||||
|
||||
/* Pick a suitable CPU as the target - this is just an example */
|
||||
cpu_id = cpumask_first(irq->cpu_mask);
|
||||
|
||||
if (pcie_tph_get_cpu_st(irq->pdev, TPH_MEM_TYPE_VM, cpu_id,
|
||||
&tag))
|
||||
return;
|
||||
|
||||
if (pcie_tph_set_st_entry(irq->pdev, irq->msix_nr, tag))
|
||||
return;
|
||||
}
|
||||
|
||||
Disable TPH system-wide
|
||||
-----------------------
|
||||
|
||||
There is a kernel command line option available to control the TPH feature:
|
||||
* "notph": TPH will be disabled for all endpoint devices.
|
@ -249,7 +249,7 @@ ticks this GP)" indicates that this CPU has not taken any scheduling-clock
|
||||
interrupts during the current stalled grace period.
|
||||
|
||||
The "idle=" portion of the message prints the dyntick-idle state.
|
||||
The hex number before the first "/" is the low-order 12 bits of the
|
||||
The hex number before the first "/" is the low-order 16 bits of the
|
||||
dynticks counter, which will have an even-numbered value if the CPU
|
||||
is in dyntick-idle mode and an odd-numbered value otherwise. The hex
|
||||
number between the two "/"s is the value of the nesting, which will be
|
||||
|
14
Documentation/accel/qaic/aic080.rst
Normal file
@ -0,0 +1,14 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0-only
|
||||
|
||||
===============================
|
||||
Qualcomm Cloud AI 80 (AIC080)
|
||||
===============================
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
The Qualcomm Cloud AI 80/AIC080 family of products is a derivative of AIC100.
|
||||
The number of NSPs and clock rates are reduced to fit within resource
|
||||
constrained solutions. The PCIe Product ID is 0xa080.
|
||||
|
||||
As a derivative product, all AIC100 documentation applies.
|
@ -229,6 +229,8 @@ of the defined channels, and their uses.
|
||||
| _PERIODIC | | | timestamps in the device side logs with|
|
||||
| | | | the host time source. |
|
||||
+----------------+---------+----------+----------------------------------------+
|
||||
| IPCR | 24 & 25 | AMSS | AF_QIPCRTR clients and servers. |
|
||||
+----------------+---------+----------+----------------------------------------+
|
||||
|
||||
DMA Bridge
|
||||
==========
|
||||
|
@ -10,4 +10,5 @@ accelerator cards.
|
||||
.. toctree::
|
||||
|
||||
qaic
|
||||
aic080
|
||||
aic100
|
||||
|
@ -18,8 +18,11 @@ set ``CONFIG_SECURITY_APPARMOR=y``
|
||||
|
||||
If AppArmor should be selected as the default security module then set::
|
||||
|
||||
CONFIG_DEFAULT_SECURITY="apparmor"
|
||||
CONFIG_SECURITY_APPARMOR_BOOTPARAM_VALUE=1
|
||||
CONFIG_DEFAULT_SECURITY_APPARMOR=y
|
||||
|
||||
The CONFIG_LSM parameter manages the order and selection of LSMs.
|
||||
Specify apparmor as the first "major" module (e.g. AppArmor, SELinux, Smack)
|
||||
in the list.
|
||||
|
||||
Build the kernel
|
||||
|
||||
|
@ -47,6 +47,8 @@ The list of possible return codes:
|
||||
-ENOMEM zram was not able to allocate enough memory to fulfil your
|
||||
needs.
|
||||
-EINVAL invalid input has been provided.
|
||||
-EAGAIN re-try operation later (e.g. when attempting to run recompress
|
||||
and writeback simultaneously).
|
||||
======== =============================================================
|
||||
|
||||
If you use 'echo', the returned value is set by the 'echo' utility,
|
||||
|
@ -108,6 +108,27 @@ a fully reliable and straight-forward way to reproduce the regression, too.*
|
||||
With that the process is complete. Now report the regression as described by
|
||||
Documentation/admin-guide/reporting-issues.rst.
|
||||
|
||||
Bisecting linux-next
|
||||
--------------------
|
||||
|
||||
If you face a problem only happening in linux-next, bisect between the
|
||||
linux-next branches 'stable' and 'master'. The following commands will start
|
||||
the process for a linux-next tree you added as a remote called 'next'::
|
||||
|
||||
git bisect start
|
||||
git bisect good next/stable
|
||||
git bisect bad next/master
|
||||
|
||||
The 'stable' branch refers to the state of linux-mainline that the current
|
||||
linux-next release (found in the 'master' branch) is based on -- the former
|
||||
thus should be free of any problems that show up in -next, but not in Linus'
|
||||
tree.
|
||||
|
||||
This will bisect across a wide range of changes, some of which you might have
|
||||
used in earlier linux-next releases without problems. Sadly there is no simple
|
||||
way to avoid checking them: bisecting from one linux-next release to a later
|
||||
one (say between 'next-20241020' and 'next-20241021') is impossible, as they
|
||||
share no common history.
|
||||
|
||||
Additional reading material
|
||||
---------------------------
|
||||
|
@ -90,9 +90,7 @@ Brief summary of control files.
|
||||
used.
|
||||
memory.swappiness set/show swappiness parameter of vmscan
|
||||
(See sysctl's vm.swappiness)
|
||||
memory.move_charge_at_immigrate set/show controls of moving charges
|
||||
This knob is deprecated and shouldn't be
|
||||
used.
|
||||
memory.move_charge_at_immigrate This knob is deprecated.
|
||||
memory.oom_control set/show oom controls.
|
||||
This knob is deprecated and shouldn't be
|
||||
used.
|
||||
@ -243,10 +241,6 @@ behind this approach is that a cgroup that aggressively uses a shared
|
||||
page will eventually get charged for it (once it is uncharged from
|
||||
the cgroup that brought it in -- this will happen on memory pressure).
|
||||
|
||||
But see :ref:`section 8.2 <cgroup-v1-memory-movable-charges>` when moving a
|
||||
task to another cgroup, its pages may be recharged to the new cgroup, if
|
||||
move_charge_at_immigrate has been chosen.
|
||||
|
||||
2.4 Swap Extension
|
||||
--------------------------------------
|
||||
|
||||
@ -756,78 +750,8 @@ If we want to change this to 1G, we can at any time use::
|
||||
|
||||
THIS IS DEPRECATED!
|
||||
|
||||
It's expensive and unreliable! It's better practice to launch workload
|
||||
tasks directly from inside their target cgroup. Use dedicated workload
|
||||
cgroups to allow fine-grained policy adjustments without having to
|
||||
move physical pages between control domains.
|
||||
|
||||
Users can move charges associated with a task along with task migration, that
|
||||
is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
|
||||
This feature is not supported in !CONFIG_MMU environments because of lack of
|
||||
page tables.
|
||||
|
||||
8.1 Interface
|
||||
-------------
|
||||
|
||||
This feature is disabled by default. It can be enabled (and disabled again) by
|
||||
writing to memory.move_charge_at_immigrate of the destination cgroup.
|
||||
|
||||
If you want to enable it::
|
||||
|
||||
# echo (some positive value) > memory.move_charge_at_immigrate
|
||||
|
||||
.. note::
|
||||
Each bits of move_charge_at_immigrate has its own meaning about what type
|
||||
of charges should be moved. See :ref:`section 8.2
|
||||
<cgroup-v1-memory-movable-charges>` for details.
|
||||
|
||||
.. note::
|
||||
Charges are moved only when you move mm->owner, in other words,
|
||||
a leader of a thread group.
|
||||
|
||||
.. note::
|
||||
If we cannot find enough space for the task in the destination cgroup, we
|
||||
try to make space by reclaiming memory. Task migration may fail if we
|
||||
cannot make enough space.
|
||||
|
||||
.. note::
|
||||
It can take several seconds if you move charges much.
|
||||
|
||||
And if you want disable it again::
|
||||
|
||||
# echo 0 > memory.move_charge_at_immigrate
|
||||
|
||||
.. _cgroup-v1-memory-movable-charges:
|
||||
|
||||
8.2 Type of charges which can be moved
|
||||
--------------------------------------
|
||||
|
||||
Each bit in move_charge_at_immigrate has its own meaning about what type of
|
||||
charges should be moved. But in any case, it must be noted that an account of
|
||||
a page or a swap can be moved only when it is charged to the task's current
|
||||
(old) memory cgroup.
|
||||
|
||||
+---+--------------------------------------------------------------------------+
|
||||
|bit| what type of charges would be moved ? |
|
||||
+===+==========================================================================+
|
||||
| 0 | A charge of an anonymous page (or swap of it) used by the target task. |
|
||||
| | You must enable Swap Extension (see 2.4) to enable move of swap charges. |
|
||||
+---+--------------------------------------------------------------------------+
|
||||
| 1 | A charge of file pages (normal file, tmpfs file (e.g. ipc shared memory) |
|
||||
| | and swaps of tmpfs file) mmapped by the target task. Unlike the case of |
|
||||
| | anonymous pages, file pages (and swaps) in the range mmapped by the task |
|
||||
| | will be moved even if the task hasn't done page fault, i.e. they might |
|
||||
| | not be the task's "RSS", but other task's "RSS" that maps the same file. |
|
||||
| | The mapcount of the page is ignored (the page can be moved independent |
|
||||
| | of the mapcount). You must enable Swap Extension (see 2.4) to |
|
||||
| | enable move of swap charges. |
|
||||
+---+--------------------------------------------------------------------------+
|
||||
|
||||
8.3 TODO
|
||||
--------
|
||||
|
||||
- All of moving charge operations are done under cgroup_mutex. It's not good
|
||||
behavior to hold the mutex too long, so we may need some trick.
|
||||
Reading memory.move_charge_at_immigrate will always return 0 and writing
|
||||
to it will always return -EINVAL.
|
||||
|
||||
9. Memory thresholds
|
||||
====================
|
||||
|
@ -1599,6 +1599,15 @@ The following nested keys are defined.
|
||||
pglazyfreed (npn)
|
||||
Amount of reclaimed lazyfree pages
|
||||
|
||||
swpin_zero
|
||||
Number of pages swapped into memory and filled with zero, where I/O
|
||||
was optimized out because the page content was detected to be zero
|
||||
during swapout.
|
||||
|
||||
swpout_zero
|
||||
Number of zero-filled pages swapped out with I/O skipped due to the
|
||||
content being detected as zero.
|
||||
|
||||
zswpin
|
||||
Number of pages moved in to memory from zswap.
|
||||
|
||||
@ -1646,6 +1655,11 @@ The following nested keys are defined.
|
||||
pgdemote_khugepaged
|
||||
Number of pages demoted by khugepaged.
|
||||
|
||||
hugetlb
|
||||
Amount of memory used by hugetlb pages. This metric only shows
|
||||
up if hugetlb usage is accounted for in memory.current (i.e.
|
||||
cgroup is mounted with the memory_hugetlb_accounting option).
|
||||
|
||||
memory.numa_stat
|
||||
A read-only nested-keyed file which exists on non-root cgroups.
|
||||
|
||||
@ -2945,7 +2959,7 @@ following two functions.
|
||||
a queue (device) has been associated with the bio and
|
||||
before submission.
|
||||
|
||||
wbc_account_cgroup_owner(@wbc, @page, @bytes)
|
||||
wbc_account_cgroup_owner(@wbc, @folio, @bytes)
|
||||
Should be called for each data segment being written out.
|
||||
While this function doesn't care exactly when it's called
|
||||
during the writeback session, it's the easiest and most
|
||||
|
@ -27,6 +27,16 @@ kernel command line (/proc/cmdline) and collects module parameters
|
||||
when it loads a module, so the kernel command line can be used for
|
||||
loadable modules too.
|
||||
|
||||
This document may not be entirely up to date and comprehensive. The command
|
||||
"modinfo -p ${modulename}" shows a current list of all parameters of a loadable
|
||||
module. Loadable modules, after being loaded into the running kernel, also
|
||||
reveal their parameters in /sys/module/${modulename}/parameters/. Some of these
|
||||
parameters may be changed at runtime by the command
|
||||
``echo -n ${value} > /sys/module/${modulename}/parameters/${parm}``.
|
||||
|
||||
Special handling
|
||||
----------------
|
||||
|
||||
Hyphens (dashes) and underscores are equivalent in parameter names, so::
|
||||
|
||||
log_buf_len=1M print-fatal-signals=1
|
||||
@ -39,8 +49,8 @@ Double-quotes can be used to protect spaces in values, e.g.::
|
||||
|
||||
param="spaces in here"
|
||||
|
||||
cpu lists:
|
||||
----------
|
||||
cpu lists
|
||||
~~~~~~~~~
|
||||
|
||||
Some kernel parameters take a list of CPUs as a value, e.g. isolcpus,
|
||||
nohz_full, irqaffinity, rcu_nocbs. The format of this list is:
|
||||
@ -82,12 +92,17 @@ so that "nohz_full=all" is the equivalent of "nohz_full=0-N".
|
||||
The semantics of "N" and "all" is supported on a level of bitmaps and holds for
|
||||
all users of bitmap_parselist().
|
||||
|
||||
This document may not be entirely up to date and comprehensive. The command
|
||||
"modinfo -p ${modulename}" shows a current list of all parameters of a loadable
|
||||
module. Loadable modules, after being loaded into the running kernel, also
|
||||
reveal their parameters in /sys/module/${modulename}/parameters/. Some of these
|
||||
parameters may be changed at runtime by the command
|
||||
``echo -n ${value} > /sys/module/${modulename}/parameters/${parm}``.
|
||||
Metric suffixes
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
The [KMG] suffix is commonly described after a number of kernel
|
||||
parameter values. 'K', 'M', 'G', 'T', 'P', and 'E' suffixes are allowed.
|
||||
These letters represent the _binary_ multipliers 'Kilo', 'Mega', 'Giga',
|
||||
'Tera', 'Peta', and 'Exa', equaling 2^10, 2^20, 2^30, 2^40, 2^50, and
|
||||
2^60 bytes respectively. Such letter suffixes can also be entirely omitted.
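The binary multipliers above can be illustrated with a small userspace sketch. It is not the kernel's own parser: ``toy_memparse()`` is a hypothetical helper written for this example, and it accepts only the upper-case suffix letters listed in the text.

```c
#include <stdlib.h>

/*
 * Parse a number with an optional binary suffix (K, M, G, T, P or E).
 * A value without a suffix is returned unchanged.
 */
unsigned long long toy_memparse(const char *s)
{
	char *end;
	unsigned long long val = strtoull(s, &end, 0);
	unsigned int shift = 0;

	switch (*end) {
	case 'E':
		shift = 60;
		break;
	case 'P':
		shift = 50;
		break;
	case 'T':
		shift = 40;
		break;
	case 'G':
		shift = 30;
		break;
	case 'M':
		shift = 20;
		break;
	case 'K':
		shift = 10;
		break;
	default:
		/* No recognised suffix: the plain number is used as-is. */
		break;
	}

	return val << shift;
}
```

For instance, ``toy_memparse("16M")`` yields 16 * 2^20, and ``toy_memparse("4096")`` yields 4096 because the suffix is omitted. The sketch does not check for overflow.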
|
||||
|
||||
Kernel Build Options
|
||||
--------------------
|
||||
|
||||
The parameters listed below are only valid if certain kernel build options
|
||||
were enabled and if respective hardware is present. This list should be kept
|
||||
@ -159,6 +174,7 @@ is applicable::
|
||||
SCSI Appropriate SCSI support is enabled.
|
||||
A lot of drivers have their options described inside
|
||||
the Documentation/scsi/ sub-directory.
|
||||
SDW SoundWire support is enabled.
|
||||
SECURITY Different security models are enabled.
|
||||
SELINUX SELinux support is enabled.
|
||||
SERIAL Serial support is enabled.
|
||||
@ -211,10 +227,5 @@ a fixed number of characters. This limit depends on the architecture
|
||||
and is between 256 and 4096 characters. It is defined in the file
|
||||
./include/uapi/asm-generic/setup.h as COMMAND_LINE_SIZE.
|
||||
|
||||
Finally, the [KMG] suffix is commonly described after a number of kernel
|
||||
parameter values. These 'K', 'M', and 'G' letters represent the _binary_
|
||||
multipliers 'Kilo', 'Mega', and 'Giga', equaling 2^10, 2^20, and 2^30
|
||||
bytes respectively. Such letter suffixes can also be entirely omitted:
|
||||
|
||||
.. include:: kernel-parameters.txt
|
||||
:literal:
|
||||
|
@ -446,6 +446,9 @@
|
||||
arm64.nobti [ARM64] Unconditionally disable Branch Target
|
||||
Identification support
|
||||
|
||||
arm64.nogcs [ARM64] Unconditionally disable Guarded Control Stack
|
||||
support
|
||||
|
||||
arm64.nomops [ARM64] Unconditionally disable Memory Copy and Memory
|
||||
Set instructions support
|
||||
|
||||
@ -918,12 +921,16 @@
|
||||
the parameter has no effect.
|
||||
|
||||
crash_kexec_post_notifiers
|
||||
Run kdump after running panic-notifiers and dumping
|
||||
kmsg. This only for the users who doubt kdump always
|
||||
succeeds in any situation.
|
||||
Note that this also increases risks of kdump failure,
|
||||
because some panic notifiers can make the crashed
|
||||
kernel more unstable.
|
||||
Only jump to kdump kernel after running the panic
|
||||
notifiers and dumping kmsg. This option increases
|
||||
the risks of a kdump failure, since some panic
|
||||
notifiers can make the crashed kernel more unstable.
|
||||
In configurations where kdump may not be reliable,
|
||||
running the panic notifiers could allow collecting
|
||||
more data on dmesg, like stack traces from other CPUs
|
||||
or extra data dumped by panic_print. Note that some
|
||||
configurations enable this option unconditionally,
|
||||
like Hyper-V, PowerPC (fadump) and AMD SEV-SNP.
|
||||
|
||||
crashkernel=size[KMG][@offset[KMG]]
|
||||
[KNL,EARLY] Using kexec, Linux can switch to a 'crash kernel'
|
||||
@ -1546,6 +1553,7 @@
|
||||
failslab=
|
||||
fail_usercopy=
|
||||
fail_page_alloc=
|
||||
fail_skb_realloc=
|
||||
fail_make_request=[KNL]
|
||||
General fault injection mechanism.
|
||||
Format: <interval>,<probability>,<space>,<times>
|
||||
@ -4678,6 +4686,10 @@
|
||||
nomio [S390] Do not use MIO instructions.
|
||||
norid [S390] ignore the RID field and force use of
|
||||
one PCI domain per PCI function
|
||||
notph [PCIE] If the PCIE_TPH kernel config parameter
|
||||
is enabled, this kernel boot option can be used
|
||||
to disable PCIe TLP Processing Hints support
|
||||
system-wide.
|
||||
|
||||
pcie_aspm= [PCIE] Forcibly enable or ignore PCIe Active State Power
|
||||
Management.
|
||||
@ -5412,11 +5424,6 @@
|
||||
Set time (jiffies) between CPU-hotplug operations,
|
||||
or zero to disable CPU-hotplug testing.
|
||||
|
||||
rcutorture.read_exit= [KNL]
|
||||
Set the number of read-then-exit kthreads used
|
||||
to test the interaction of RCU updaters and
|
||||
task-exit processing.
|
||||
|
||||
rcutorture.read_exit_burst= [KNL]
|
||||
The number of times in a given read-then-exit
|
||||
episode that a set of read-then-exit kthreads
|
||||
@ -5426,6 +5433,14 @@
|
||||
The delay, in seconds, between successive
|
||||
read-then-exit testing episodes.
|
||||
|
||||
rcutorture.reader_flavor= [KNL]
|
||||
A bit mask indicating which readers to use.
|
||||
If there is more than one bit set, the readers
|
||||
are entered from low-order bit up, and are
|
||||
exited in the opposite order. For SRCU, the
|
||||
0x1 bit is normal readers, 0x2 NMI-safe readers,
|
||||
and 0x4 light-weight readers.
|
||||
|
||||
rcutorture.shuffle_interval= [KNL]
|
||||
Set task-shuffle interval (s). Shuffling tasks
|
||||
allows some CPUs to go into dyntick-idle mode
|
||||
@ -6060,6 +6075,10 @@
|
||||
non-zero "wait" parameter. See weight_single
|
||||
and weight_many.
|
||||
|
||||
sdw_mclk_divider=[SDW]
|
||||
Specify the MCLK divider for Intel SoundWire buses in
|
||||
case the BIOS does not provide the clock rate properly.
|
||||
|
||||
skew_tick= [KNL,EARLY] Offset the periodic timer tick per cpu to mitigate
|
||||
xtime_lock contention on larger systems, and/or RCU lock
|
||||
contention on all systems with CONFIG_MAXSMP set.
|
||||
@ -6147,6 +6166,16 @@
|
||||
For more information see Documentation/mm/slub.rst.
|
||||
(slub_nomerge legacy name also accepted for now)
|
||||
|
||||
slab_strict_numa [MM]
|
||||
Support memory policies on a per object level
|
||||
in the slab allocator. The default is for memory
|
||||
policies to be applied at the folio level when
|
||||
a new folio is needed or a partial folio is
|
||||
retrieved from the lists. Increases overhead
|
||||
in the slab fastpaths but gains more accurate
|
||||
NUMA kernel object placement which helps with slow
|
||||
interconnects in NUMA systems.
|
||||
|
||||
slram= [HW,MTD]
|
||||
|
||||
smart2= [HW]
|
||||
@ -6700,6 +6729,16 @@
|
||||
Force threading of all interrupt handlers except those
|
||||
marked explicitly IRQF_NO_THREAD.
|
||||
|
||||
thp_shmem= [KNL]
|
||||
Format: <size>[KMG],<size>[KMG]:<policy>;<size>[KMG]-<size>[KMG]:<policy>
|
||||
Control the default policy of each hugepage size for the
|
||||
internal shmem mount. <policy> is one of policies available
|
||||
for the shmem mount ("always", "inherit", "never", "within_size",
|
||||
and "advise").
|
||||
It can be used multiple times for multiple shmem THP sizes.
|
||||
See Documentation/admin-guide/mm/transhuge.rst for more
|
||||
details.
|
||||
|
||||
topology= [S390,EARLY]
|
||||
Format: {off | on}
|
||||
Specify if the kernel should make use of the cpu
|
||||
@ -6727,6 +6766,15 @@
|
||||
torture.verbose_sleep_duration= [KNL]
|
||||
Duration of each verbose-printk() sleep in jiffies.
|
||||
|
||||
tpm.disable_pcr_integrity= [HW,TPM]
|
||||
Do not protect PCR registers from unintended physical
|
||||
access, or interposers in the bus by the means of
|
||||
having an integrity protected session wrapped around
|
||||
TPM2_PCR_Extend command. Consider this in a situation
|
||||
where TPM is heavily utilized by IMA, thus protection
|
||||
causing a major performance hit, and the space where
|
||||
machines are deployed is by other means guarded.
|
||||
|
||||
tpm_suspend_pcr=[HW,TPM]
|
||||
Format: integer pcr id
|
||||
Specify that at suspend time, the tpm driver
|
||||
@ -6867,6 +6915,12 @@
|
||||
|
||||
reserve_mem=12M:4096:trace trace_instance=boot_map^traceoff^traceprintk@trace,sched,irq
|
||||
|
||||
Note, saving the trace buffer across reboots does require that the system
|
||||
is set up to not wipe memory. For instance, CONFIG_RESET_ATTACK_MITIGATION
|
||||
can force a memory reset on boot which will clear any trace that was stored.
|
||||
This is just one of many ways that can clear memory. Make sure your system
|
||||
keeps the content of memory across reboots before relying on this option.
|
||||
|
||||
See also Documentation/trace/debugging.rst
|
||||
|
||||
|
||||
@ -6926,6 +6980,13 @@
|
||||
See Documentation/admin-guide/mm/transhuge.rst
|
||||
for more details.
|
||||
|
||||
transparent_hugepage_shmem= [KNL]
|
||||
Format: [always|within_size|advise|never|deny|force]
|
||||
Can be used to control the hugepage allocation policy for
|
||||
the internal shmem mount.
|
||||
See Documentation/admin-guide/mm/transhuge.rst
|
||||
for more details.
|
||||
|
||||
trusted.source= [KEYS]
|
||||
Format: <string>
|
||||
This parameter identifies the trust source as a backend
|
||||
|
@ -315,7 +315,7 @@ To reduce its OS jitter, do at least one of the following:
|
||||
to do.
|
||||
|
||||
Name:
|
||||
rcuop/%d and rcuos/%d
|
||||
rcuop/%d, rcuos/%d, and rcuog/%d
|
||||
|
||||
Purpose:
|
||||
Offload RCU callbacks from the corresponding CPU.
|
||||
|
@ -15,7 +15,7 @@ Please notice, however, that, if:
|
||||
|
||||
you should use the main media development tree ``master`` branch:
|
||||
|
||||
https://git.linuxtv.org/media_tree.git/
|
||||
https://git.linuxtv.org/media.git/
|
||||
|
||||
In this case, you may find some useful information at the
|
||||
`LinuxTv wiki pages <https://linuxtv.org/wiki>`_:
|
||||
|
@ -20,6 +20,11 @@ Documentation/driver-api/media/index.rst
|
||||
- for driver development information and Kernel APIs used by
|
||||
media devices;
|
||||
|
||||
Documentation/process/debugging/media_specific_debugging_guide.rst
|
||||
|
||||
- for advice about essential tools and techniques to debug drivers on this
|
||||
subsystem
|
||||
|
||||
.. toctree::
|
||||
:caption: Table of Contents
|
||||
:maxdepth: 2
|
||||
|
@ -1,62 +0,0 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
OMAP4 ISS Driver
|
||||
================
|
||||
|
||||
Author: Sergio Aguirre <sergio.a.aguirre@gmail.com>
|
||||
|
||||
Copyright (C) 2012, Texas Instruments
|
||||
|
||||
Introduction
|
||||
------------
|
||||
|
||||
The OMAP44XX family of chips contains the Imaging SubSystem (a.k.a. ISS),
|
||||
Which contains several components that can be categorized in 3 big groups:
|
||||
|
||||
- Interfaces (2 Interfaces: CSI2-A & CSI2-B/CCP2)
|
||||
- ISP (Image Signal Processor)
|
||||
- SIMCOP (Still Image Coprocessor)
|
||||
|
||||
For more information, please look in [#f1]_ for latest version of:
|
||||
"OMAP4430 Multimedia Device Silicon Revision 2.x"
|
||||
|
||||
As of Revision AB, the ISS is described in detail in section 8.
|
||||
|
||||
This driver is supporting **only** the CSI2-A/B interfaces for now.
|
||||
|
||||
It makes use of the Media Controller framework [#f2]_, and inherited most of the
|
||||
code from OMAP3 ISP driver (found under drivers/media/platform/ti/omap3isp/\*),
|
||||
except that it doesn't need an IOMMU now for ISS buffers memory mapping.
|
||||
|
||||
Supports usage of MMAP buffers only (for now).
|
||||
|
||||
Tested platforms
|
||||
----------------
|
||||
|
||||
- OMAP4430SDP, w/ ES2.1 GP & SEVM4430-CAM-V1-0 (Contains IMX060 & OV5640, in
|
||||
which only the last one is supported, outputting YUV422 frames).
|
||||
|
||||
- TI Blaze MDP, w/ OMAP4430 ES2.2 EMU (Contains 1 IMX060 & 2 OV5650 sensors, in
|
||||
which only the OV5650 are supported, outputting RAW10 frames).
|
||||
|
||||
- PandaBoard, Rev. A2, w/ OMAP4430 ES2.1 GP & OV adapter board, tested with
|
||||
following sensors:
|
||||
* OV5640
|
||||
* OV5650
|
||||
|
||||
- Tested on mainline kernel:
|
||||
|
||||
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=summary
|
||||
|
||||
Tag: v3.3 (commit c16fa4f2ad19908a47c63d8fa436a1178438c7e7)
|
||||
|
||||
File list
|
||||
---------
|
||||
drivers/staging/media/omap4iss/
|
||||
include/linux/platform_data/media/omap4iss.h
|
||||
|
||||
References
|
||||
----------
|
||||
|
||||
.. [#f1] http://focus.ti.com/general/docs/wtbu/wtbudocumentcenter.tsp?navigationId=12037&templateId=6123#62
|
||||
.. [#f2] http://lwn.net/Articles/420485/
|
27
Documentation/admin-guide/media/raspberrypi-rp1-cfe.dot
Normal file
@ -0,0 +1,27 @@
|
||||
digraph board {
|
||||
rankdir=TB
|
||||
n00000001 [label="{{<port0> 0} | csi2\n/dev/v4l-subdev0 | {<port1> 1 | <port2> 2 | <port3> 3 | <port4> 4}}", shape=Mrecord, style=filled, fillcolor=green]
|
||||
n00000001:port1 -> n00000011 [style=dashed]
|
||||
n00000001:port1 -> n00000007:port0
|
||||
n00000001:port2 -> n00000015
|
||||
n00000001:port2 -> n00000007:port0 [style=dashed]
|
||||
n00000001:port3 -> n00000019 [style=dashed]
|
||||
n00000001:port3 -> n00000007:port0 [style=dashed]
|
||||
n00000001:port4 -> n0000001d [style=dashed]
|
||||
n00000001:port4 -> n00000007:port0 [style=dashed]
|
||||
n00000007 [label="{{<port0> 0 | <port1> 1} | pisp-fe\n/dev/v4l-subdev1 | {<port2> 2 | <port3> 3 | <port4> 4}}", shape=Mrecord, style=filled, fillcolor=green]
|
||||
n00000007:port2 -> n00000021
|
||||
n00000007:port3 -> n00000025 [style=dashed]
|
||||
n00000007:port4 -> n00000029
|
||||
n0000000d [label="{imx219 6-0010\n/dev/v4l-subdev2 | {<port0> 0}}", shape=Mrecord, style=filled, fillcolor=green]
|
||||
n0000000d:port0 -> n00000001:port0 [style=bold]
|
||||
n00000011 [label="rp1-cfe-csi2-ch0\n/dev/video0", shape=box, style=filled, fillcolor=yellow]
|
||||
n00000015 [label="rp1-cfe-csi2-ch1\n/dev/video1", shape=box, style=filled, fillcolor=yellow]
|
||||
n00000019 [label="rp1-cfe-csi2-ch2\n/dev/video2", shape=box, style=filled, fillcolor=yellow]
|
||||
n0000001d [label="rp1-cfe-csi2-ch3\n/dev/video3", shape=box, style=filled, fillcolor=yellow]
|
||||
n00000021 [label="rp1-cfe-fe-image0\n/dev/video4", shape=box, style=filled, fillcolor=yellow]
|
||||
n00000025 [label="rp1-cfe-fe-image1\n/dev/video5", shape=box, style=filled, fillcolor=yellow]
|
||||
n00000029 [label="rp1-cfe-fe-stats\n/dev/video6", shape=box, style=filled, fillcolor=yellow]
|
||||
n0000002d [label="rp1-cfe-fe-config\n/dev/video7", shape=box, style=filled, fillcolor=yellow]
|
||||
n0000002d -> n00000007:port1
|
||||
}
|
78
Documentation/admin-guide/media/raspberrypi-rp1-cfe.rst
Normal file
@ -0,0 +1,78 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
============================================
|
||||
Raspberry Pi PiSP Camera Front End (rp1-cfe)
|
||||
============================================
|
||||
|
||||
The PiSP Camera Front End
|
||||
=========================
|
||||
|
||||
The PiSP Camera Front End (CFE) is a module which combines a CSI-2 receiver with
|
||||
a simple ISP, called the Front End (FE).
|
||||
|
||||
The CFE has four DMA engines and can write frames from four separate streams
|
||||
received from the CSI-2 to the memory. One of those streams can also be routed
|
||||
directly to the FE, which can do minimal image processing, write two versions
|
||||
(e.g. non-scaled and downscaled versions) of the received frames to memory and
|
||||
provide statistics of the received frames.
|
||||
|
||||
The FE registers are documented in the `Raspberry Pi Image Signal Processor
|
||||
(ISP) Specification document
|
||||
<https://datasheets.raspberrypi.com/camera/raspberry-pi-image-signal-processor-specification.pdf>`_,
|
||||
and example code for FE can be found in `libpisp
|
||||
<https://github.com/raspberrypi/libpisp>`_.
|
||||
|
||||
The rp1-cfe driver
|
||||
==================
|
||||
|
||||
The Raspberry Pi PiSP Camera Front End (rp1-cfe) driver is located under
|
||||
drivers/media/platform/raspberrypi/rp1-cfe. It uses the `V4L2 API` to register
|
||||
a number of video capture and output devices, the `V4L2 subdev API` to register
|
||||
subdevices for the CSI-2 receiver and the FE that connects the video devices in
|
||||
a single media graph realized using the `Media Controller (MC) API`.
|
||||
|
||||
The media topology registered by the `rp1-cfe` driver, in this particular
|
||||
example connected to an imx219 sensor, is the following one:
|
||||
|
||||
.. _rp1-cfe-topology:
|
||||
|
||||
.. kernel-figure:: raspberrypi-rp1-cfe.dot
|
||||
:alt: Diagram of an example media pipeline topology
|
||||
:align: center
|
||||
|
||||
The media graph contains the following video device nodes:
|
||||
|
||||
- rp1-cfe-csi2-ch0: capture device for the first CSI-2 stream
|
||||
- rp1-cfe-csi2-ch1: capture device for the second CSI-2 stream
|
||||
- rp1-cfe-csi2-ch2: capture device for the third CSI-2 stream
|
||||
- rp1-cfe-csi2-ch3: capture device for the fourth CSI-2 stream
|
||||
- rp1-cfe-fe-image0: capture device for the first FE output
|
||||
- rp1-cfe-fe-image1: capture device for the second FE output
|
||||
- rp1-cfe-fe-stats: capture device for the FE statistics
|
||||
- rp1-cfe-fe-config: output device for FE configuration
|
||||
|
||||
rp1-cfe-csi2-chX
|
||||
----------------
|
||||
|
||||
The rp1-cfe-csi2-chX capture devices are normal V4L2 capture devices which
|
||||
can be used to capture video frames or metadata received from the CSI-2.
|
||||
|
||||
rp1-cfe-fe-image0, rp1-cfe-fe-image1
|
||||
------------------------------------
|
||||
|
||||
The rp1-cfe-fe-image0 and rp1-cfe-fe-image1 capture devices are used to write
|
||||
the processed frames to memory.
|
||||
|
||||
rp1-cfe-fe-stats
|
||||
----------------
|
||||
|
||||
The format of the FE statistics buffer is defined by
|
||||
:c:type:`pisp_statistics` C structure and the meaning of each parameter is
|
||||
described in the `PiSP specification` document.
|
||||
|
||||
rp1-cfe-fe-config
|
||||
-----------------
|
||||
|
||||
The format of the FE configuration buffer is defined by
|
||||
:c:type:`pisp_fe_config` C structure and the meaning of each parameter is
|
||||
described in the `PiSP specification` document.
|
@ -67,7 +67,7 @@ Changes / Fixes
|
||||
Please mail to linux-media AT vger.kernel.org unified diffs against
|
||||
the linux media git tree:
|
||||
|
||||
https://git.linuxtv.org/media_tree.git/
|
||||
https://git.linuxtv.org/media.git/
|
||||
|
||||
This is done by committing a patch at a clone of the git tree and
|
||||
submitting the patch using ``git send-email``. Don't forget to
|
||||
|
@ -20,12 +20,12 @@ Video4Linux (V4L) driver-specific documentation
|
||||
ivtv
|
||||
mgb4
|
||||
omap3isp
|
||||
omap4_camera
|
||||
philips
|
||||
qcom_camss
|
||||
raspberrypi-pisp-be
|
||||
rcar-fdp1
|
||||
rkisp1
|
||||
raspberrypi-rp1-cfe
|
||||
saa7134
|
||||
si470x
|
||||
si4713
|
||||
|
@ -326,6 +326,29 @@ PMD_ORDER THP policy will be overridden. If the policy for PMD_ORDER
|
||||
is not defined within a valid ``thp_anon``, its policy will default to
|
||||
``never``.
|
||||
|
||||
Similarly to ``transparent_hugepage``, you can control the hugepage
|
||||
allocation policy for the internal shmem mount by using the kernel parameter
|
||||
``transparent_hugepage_shmem=<policy>``, where ``<policy>`` is one of the
|
||||
six valid policies for shmem (``always``, ``within_size``, ``advise``,
|
||||
``never``, ``deny``, and ``force``).
|
||||
|
||||
In the same manner as ``thp_anon`` controls each supported anonymous THP
|
||||
size, ``thp_shmem`` controls each supported shmem THP size. ``thp_shmem``
|
||||
has the same format as ``thp_anon``, but also supports the policy
|
||||
``within_size``.
|
||||
|
||||
``thp_shmem=`` may be specified multiple times to configure all THP sizes
|
||||
as required. If ``thp_shmem=`` is specified at least once, any shmem THP
|
||||
sizes not explicitly configured on the command line are implicitly set to
|
||||
``never``.
|
||||
|
||||
``transparent_hugepage_shmem`` setting only affects the global toggle. If
|
||||
``thp_shmem`` is not specified, PMD_ORDER hugepage will default to
|
||||
``inherit``. However, if a valid ``thp_shmem`` setting is provided by the
|
||||
user, the PMD_ORDER hugepage policy will be overridden. If the policy for
|
||||
PMD_ORDER is not defined within a valid ``thp_shmem``, its policy will
|
||||
default to ``never``.
|
||||
|
||||
Hugepages in tmpfs/shmem
|
||||
========================
|
||||
|
||||
@ -530,10 +553,18 @@ anon_fault_fallback_charge
|
||||
instead falls back to using huge pages with lower orders or
|
||||
small pages even though the allocation was successful.
|
||||
|
||||
swpout
|
||||
is incremented every time a huge page is swapped out in one
|
||||
zswpout
|
||||
is incremented every time a huge page is swapped out to zswap in one
|
||||
piece without splitting.
|
||||
|
||||
swpin
|
||||
is incremented every time a huge page is swapped in from a non-zswap
|
||||
swap device in one piece.
|
||||
|
||||
swpout
|
||||
is incremented every time a huge page is swapped out to a non-zswap
|
||||
swap device in one piece without splitting.
|
||||
|
||||
swpout_fallback
|
||||
is incremented if a huge page has to be split before swapout.
|
||||
Usually because failed to allocate some continuous swap space
|
||||
|
@ -26,3 +26,4 @@ Performance monitor support
|
||||
meson-ddr-pmu
|
||||
cxl
|
||||
ampere_cspmu
|
||||
mrvl-pem-pmu
|
||||
|
56
Documentation/admin-guide/perf/mrvl-pem-pmu.rst
Normal file
@ -0,0 +1,56 @@
|
||||
=================================================================
|
||||
Marvell Odyssey PEM Performance Monitoring Unit (PMU UNCORE)
|
||||
=================================================================
|
||||
|
||||
The PCI Express Interface Units (PEM) are associated with a corresponding
|
||||
monitoring unit. This includes performance counters to track various
|
||||
characteristics of the data that is transmitted over the PCIe link.
|
||||
|
||||
The counters track inbound and outbound transactions which
|
||||
includes separate counters for posted/non-posted/completion TLPs.
|
||||
Inbound and outbound memory read requests along with their
|
||||
latencies can also be monitored. Address Translation Services (ATS) events
|
||||
such as ATS Translation, ATS Page Request, ATS Invalidation along with
|
||||
their corresponding latencies are also tracked.
|
||||
|
||||
There are separate 64-bit counters to measure posted/non-posted/completion
|
||||
TLPs in inbound and outbound transactions. ATS events are measured by
|
||||
different counters.
|
||||
|
||||
The PMU driver exposes the available events and format options under sysfs,
|
||||
/sys/bus/event_source/devices/mrvl_pcie_rc_pmu_<>/events/
|
||||
/sys/bus/event_source/devices/mrvl_pcie_rc_pmu_<>/format/
|
||||
|
||||
Examples::
|
||||
|
||||
# perf list | grep mrvl_pcie_rc_pmu
|
||||
mrvl_pcie_rc_pmu_<>/ats_inv/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ats_inv_latency/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ats_pri/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ats_pri_latency/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ats_trans/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ats_trans_latency/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ib_inflight/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ib_reads/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ib_req_no_ro_ebus/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ib_req_no_ro_ncb/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ib_tlp_cpl_partid/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ib_tlp_dwords_cpl_partid/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ib_tlp_dwords_npr/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ib_tlp_dwords_pr/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ib_tlp_npr/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ib_tlp_pr/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ob_inflight_partid/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ob_merges_cpl_partid/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ob_merges_npr_partid/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ob_merges_pr_partid/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ob_reads_partid/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ob_tlp_cpl_partid/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ob_tlp_dwords_cpl_partid/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ob_tlp_dwords_npr_partid/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ob_tlp_dwords_pr_partid/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ob_tlp_npr_partid/ [Kernel PMU event]
|
||||
mrvl_pcie_rc_pmu_<>/ob_tlp_pr_partid/ [Kernel PMU event]
|
||||
|
||||
|
||||
# perf stat -e ib_inflight,ib_reads,ib_req_no_ro_ebus,ib_req_no_ro_ncb <workload>
|
@ -38,6 +38,11 @@ requests. ``aio-max-nr`` allows you to change the maximum value
|
||||
``aio-max-nr`` does not result in the
|
||||
pre-allocation or re-sizing of any kernel data structures.
|
||||
|
||||
dentry-negative
|
||||
----------------------------
|
||||
|
||||
Policy for negative dentries. Set to 1 to always delete the dentry when a
|
||||
file is removed, and 0 to disable it. By default, this behavior is disabled.
|
||||
|
||||
dentry-state
|
||||
------------
|
||||
@ -332,3 +337,13 @@ Each "watch" costs roughly 90 bytes on a 32-bit kernel, and roughly 160 bytes
|
||||
on a 64-bit one.
|
||||
The current default value for ``max_user_watches`` is 4% of the
|
||||
available low memory, divided by the "watch" cost in bytes.
|
||||
|
||||
5. /proc/sys/fs/fuse - Configuration options for FUSE filesystems
|
||||
=====================================================================
|
||||
|
||||
This directory contains the following configuration options for FUSE
|
||||
filesystems:
|
||||
|
||||
``/proc/sys/fs/fuse/max_pages_limit`` is a read/write file for
|
||||
setting/getting the maximum number of pages that can be used for servicing
|
||||
requests in FUSE.
|
||||
|
@ -401,6 +401,15 @@ The upper bound on the number of tasks that are checked.
|
||||
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
|
||||
|
||||
|
||||
hung_task_detect_count
|
||||
======================
|
||||
|
||||
Indicates the total number of tasks that have been detected as hung since
|
||||
the system boot.
|
||||
|
||||
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
|
||||
|
||||
|
||||
hung_task_timeout_secs
|
||||
======================
|
||||
|
||||
|
69
Documentation/arch/arm64/arm-cca.rst
Normal file
@ -0,0 +1,69 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=====================================
|
||||
Arm Confidential Compute Architecture
|
||||
=====================================
|
||||
|
||||
Arm systems that support the Realm Management Extension (RME) contain
|
||||
hardware to allow a VM guest to be run in a way which protects the code
|
||||
and data of the guest from the hypervisor. It extends the older "two
|
||||
world" model (Normal and Secure World) into four worlds: Normal, Secure,
|
||||
Root and Realm. Linux can then also be run as a guest to a monitor
|
||||
running in the Realm world.
|
||||
|
||||
The monitor running in the Realm world is known as the Realm Management
|
||||
Monitor (RMM) and implements the Realm Management Monitor
|
||||
specification[1]. The monitor acts a bit like a hypervisor (e.g. it runs
|
||||
in EL2 and manages the stage 2 page tables etc of the guests running in
|
||||
Realm world), however much of the control is handled by a hypervisor
|
||||
running in the Normal World. The Normal World hypervisor uses the Realm
|
||||
Management Interface (RMI) defined by the RMM specification to request
|
||||
the RMM to perform operations (e.g. mapping memory or executing a vCPU).
|
||||
|
||||
The RMM defines an environment for guests where the address space (IPA)
|
||||
is split into two. The lower half is protected - any memory that is
|
||||
mapped in this half cannot be seen by the Normal World and the RMM
|
||||
restricts what operations the Normal World can perform on this memory
|
||||
(e.g. the Normal World cannot replace pages in this region without the
|
||||
guest's cooperation). The upper half is shared, the Normal World is free
|
||||
to make changes to the pages in this region, and is able to emulate MMIO
|
||||
devices in this region too.
|
||||
|
||||
A guest running in a Realm may also communicate with the RMM using the
|
||||
Realm Services Interface (RSI) to request changes in its environment or
|
||||
to perform attestation about its environment. In particular it may
|
||||
request that areas of the protected address space are transitioned
|
||||
between 'RAM' and 'EMPTY' (in either direction). This allows a Realm
|
||||
guest to give up memory to be returned to the Normal World, or to
|
||||
request new memory from the Normal World. Without an explicit request
|
||||
from the Realm guest the RMM will otherwise prevent the Normal World
|
||||
from making these changes.
|
||||
|
||||
Linux as a Realm Guest
|
||||
----------------------
|
||||
|
||||
To run Linux as a guest within a Realm, the following must be provided
|
||||
either by the VMM or by a `boot loader` run in the Realm before Linux:
|
||||
|
||||
* All protected RAM described to Linux (by DT or ACPI) must be marked
|
||||
RIPAS RAM before handing control over to Linux.
|
||||
|
||||
* MMIO devices must be either unprotected (e.g. emulated by the Normal
|
||||
World) or marked RIPAS DEV.
|
||||
|
||||
* MMIO devices emulated by the Normal World and used very early in boot
|
||||
(specifically earlycon) must be specified in the upper half of IPA.
|
||||
For earlycon this can be done by specifying the address on the
|
||||
command line, e.g. with an IPA size of 33 bits and the base address
|
||||
of the emulated UART at 0x1000000: ``earlycon=uart,mmio,0x101000000``
|
||||
|
||||
* Linux will use bounce buffers for communicating with unprotected
|
||||
devices. It will transition some protected memory to RIPAS EMPTY and
|
||||
expect to be able to access unprotected pages at the same IPA address
|
||||
but with the highest valid IPA bit set. The expectation is that the
|
||||
VMM will remove the physical pages from the protected mapping and
|
||||
provide those pages as unprotected pages.
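
The address arithmetic behind the earlycon and bounce buffer items above can be sketched as follows. This is a minimal illustration, not kernel code: the helper names ``realm_shared_alias()`` and ``realm_ipa_is_shared()`` are hypothetical, and it assumes the shared half of the IPA space is selected purely by the highest valid IPA bit, as the text describes.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the unprotected (shared) alias of an IPA,
 * assuming the shared half is selected by the highest valid IPA bit.
 * ipa_bits is the Realm's IPA size in bits (e.g. 33); the caller must
 * ensure 1 <= ipa_bits <= 64.
 */
static inline uint64_t realm_shared_alias(uint64_t ipa, unsigned int ipa_bits)
{
	uint64_t top_bit = UINT64_C(1) << (ipa_bits - 1);

	return ipa | top_bit;
}

/* Hypothetical helper: non-zero if the IPA lies in the upper (shared) half. */
static inline int realm_ipa_is_shared(uint64_t ipa, unsigned int ipa_bits)
{
	return (int)((ipa >> (ipa_bits - 1)) & 1);
}
```

With an IPA size of 33 bits, ``realm_shared_alias(0x1000000, 33)`` yields ``0x101000000``, which matches the ``earlycon`` address used in the example above.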
|
||||
|
||||
References
|
||||
----------
|
||||
[1] https://developer.arm.com/documentation/den0137/
|
@ -41,6 +41,9 @@ to automatically locate and size all RAM, or it may use knowledge of
|
||||
the RAM in the machine, or any other method the boot loader designer
|
||||
sees fit.)
|
||||
|
||||
For Arm Confidential Compute Realms this includes ensuring that all
|
||||
protected RAM has a Realm IPA state (RIPAS) of "RAM".
|
||||
|
||||
|
||||
2. Setup the device tree
|
||||
-------------------------
|
||||
@ -385,6 +388,9 @@ Before jumping into the kernel, the following conditions must be met:
|
||||
|
||||
- HCRX_EL2.MSCEn (bit 11) must be initialised to 0b1.
|
||||
|
||||
- HCRX_EL2.MCE2 (bit 10) must be initialised to 0b1 and the hypervisor
|
||||
must handle MOPS exceptions as described in :ref:`arm64_mops_hyp`.
|
||||
|
||||
For CPUs with the Extended Translation Control Register feature (FEAT_TCR2):
|
||||
|
||||
- If EL3 is present:
|
||||
@ -411,6 +417,38 @@ Before jumping into the kernel, the following conditions must be met:
|
||||
|
||||
- HFGRWR_EL2.nPIRE0_EL1 (bit 57) must be initialised to 0b1.
|
||||
|
||||
- For CPUs with Guarded Control Stacks (FEAT_GCS):
|
||||
|
||||
- GCSCR_EL1 must be initialised to 0.
|
||||
|
||||
- GCSCRE0_EL1 must be initialised to 0.
|
||||
|
||||
- If EL3 is present:
|
||||
|
||||
- SCR_EL3.GCSEn (bit 39) must be initialised to 0b1.
|
||||
|
||||
- If EL2 is present:
|
||||
|
||||
- GCSCR_EL2 must be initialised to 0.
|
||||
|
||||
- If the kernel is entered at EL1 and EL2 is present:
|
||||
|
||||
- HCRX_EL2.GCSEn must be initialised to 0b1.
|
||||
|
||||
- HFGITR_EL2.nGCSEPP (bit 59) must be initialised to 0b1.
|
||||
|
||||
- HFGITR_EL2.nGCSSTR_EL1 (bit 58) must be initialised to 0b1.
|
||||
|
||||
- HFGITR_EL2.nGCSPUSHM_EL1 (bit 57) must be initialised to 0b1.
|
||||
|
||||
- HFGRTR_EL2.nGCS_EL1 (bit 53) must be initialised to 0b1.
|
||||
|
||||
- HFGRTR_EL2.nGCS_EL0 (bit 52) must be initialised to 0b1.
|
||||
|
||||
- HFGWTR_EL2.nGCS_EL1 (bit 53) must be initialised to 0b1.
|
||||
|
||||
- HFGWTR_EL2.nGCS_EL0 (bit 52) must be initialised to 0b1.
|
||||
|
||||
The requirements described above for CPU mode, caches, MMUs, architected
|
||||
timers, coherency and system registers apply to all CPUs. All CPUs must
|
||||
enter the kernel in the same exception level. Where the values documented
|
||||
|
@ -152,6 +152,8 @@ infrastructure:
|
||||
+------------------------------+---------+---------+
|
||||
| DIT | [51-48] | y |
|
||||
+------------------------------+---------+---------+
|
||||
| MPAM | [43-40] | n |
|
||||
+------------------------------+---------+---------+
|
||||
| SVE | [35-32] | y |
|
||||
+------------------------------+---------+---------+
|
||||
| GIC | [27-24] | n |
|
||||
|
@ -16,9 +16,9 @@ architected discovery mechanism available to userspace code at EL0. The
|
||||
kernel exposes the presence of these features to userspace through a set
|
||||
of flags called hwcaps, exposed in the auxiliary vector.
|
||||
|
||||
Userspace software can test for features by acquiring the AT_HWCAP or
|
||||
AT_HWCAP2 entry of the auxiliary vector, and testing whether the relevant
|
||||
flags are set, e.g.::
|
||||
Userspace software can test for features by acquiring the AT_HWCAP,
|
||||
AT_HWCAP2 or AT_HWCAP3 entry of the auxiliary vector, and testing
|
||||
whether the relevant flags are set, e.g.::
|
||||
|
||||
bool floating_point_is_present(void)
|
||||
{
|
||||
@ -170,6 +170,10 @@ HWCAP_PACG
|
||||
ID_AA64ISAR1_EL1.GPI == 0b0001, as described by
|
||||
Documentation/arch/arm64/pointer-authentication.rst.
|
||||
|
||||
HWCAP_GCS
|
||||
Functionality implied by ID_AA64PFR1_EL1.GCS == 0b1, as
|
||||
described by Documentation/arch/arm64/gcs.rst.
|
||||
|
||||
HWCAP2_DCPODP
|
||||
Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0010.
|
||||
|
||||
|
227
Documentation/arch/arm64/gcs.rst
Normal file
@ -0,0 +1,227 @@
|
||||
===============================================
|
||||
Guarded Control Stack support for AArch64 Linux
|
||||
===============================================
|
||||
|
||||
This document outlines briefly the interface provided to userspace by Linux in
|
||||
order to support use of the ARM Guarded Control Stack (GCS) feature.
|
||||
|
||||
This is an outline of the most important features and issues only and not
|
||||
intended to be exhaustive.
|
||||
|
||||
|
||||
|
||||
1. General
|
||||
-----------
|
||||
|
||||
* GCS is an architecture feature intended to provide greater protection
|
||||
against return oriented programming (ROP) attacks and to simplify the
|
||||
implementation of features that need to collect stack traces such as
|
||||
profiling.
|
||||
|
||||
* When GCS is enabled a separate guarded control stack is maintained by the
|
||||
PE which is writeable only through specific GCS operations. This
|
||||
stores the call stack only, when a procedure call instruction is
|
||||
performed the current PC is pushed onto the GCS and on RET the
|
||||
address in the LR is verified against that on the top of the GCS.
|
||||
|
||||
* When active the current GCS pointer is stored in the system register
|
||||
GCSPR_EL0. This is readable by userspace but can only be updated
|
||||
via specific GCS instructions.
|
||||
|
||||
* The architecture provides instructions for switching between guarded
|
||||
control stacks with checks to ensure that the new stack is a valid
|
||||
target for switching.
|
||||
|
||||
* The functionality of GCS is similar to that provided by the x86 Shadow
|
||||
Stack feature, due to sharing of userspace interfaces the ABI refers to
|
||||
shadow stacks rather than GCS.
|
||||
|
||||
* Support for GCS is reported to userspace via HWCAP_GCS in the aux vector
|
||||
AT_HWCAP entry.
|
||||
|
||||
* GCS is enabled per thread. While there is support for disabling GCS
|
||||
at runtime this should be done with great care.
|
||||
|
||||
* GCS memory access faults are reported as normal memory access faults.
|
||||
|
||||
* GCS specific errors (those reported with EC 0x2d) will be reported as
|
||||
SIGSEGV with a si_code of SEGV_CPERR (control protection error).
|
||||
|
||||
* GCS is supported only for AArch64.
|
||||
|
||||
* On systems where GCS is supported GCSPR_EL0 is always readable by EL0
|
||||
regardless of the GCS configuration for the thread.
|
||||
|
||||
* The architecture supports enabling GCS without verifying that return values
|
||||
in LR match those in the GCS, the LR will be ignored. This is not supported
|
||||
by Linux.
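
A userspace check for GCS support might look like the sketch below. It assumes that HWCAP_GCS is reported in the AT_HWCAP word, as the flag's name and its position in the elf_hwcaps listing suggest. The fallback bit value and the helper names ``hwcap_has_gcs()`` and ``gcs_is_supported()`` are illustrative assumptions rather than part of the documented ABI. The result is only meaningful on AArch64.

```c
#include <stdbool.h>
#include <sys/auxv.h>

/*
 * Fallback for userspace headers that predate GCS. The bit position is an
 * assumption taken from the arm64 uapi hwcap header; prefer the definition
 * from <asm/hwcap.h> when it is available.
 */
#ifndef HWCAP_GCS
#define HWCAP_GCS (1ULL << 32)
#endif

/* Hypothetical helper: test the GCS flag in an AT_HWCAP value. */
static inline bool hwcap_has_gcs(unsigned long long hwcap)
{
	return (hwcap & HWCAP_GCS) != 0;
}

/* Hypothetical helper: query the auxiliary vector of the running process. */
static inline bool gcs_is_supported(void)
{
	return hwcap_has_gcs(getauxval(AT_HWCAP));
}
```

Keeping the bit test separate from the ``getauxval()`` call lets the same check be applied to an AT_HWCAP value obtained elsewhere, for example from a coredump.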
|
||||
|
||||
|
||||
|
||||
2. Enabling and disabling Guarded Control Stacks
|
||||
-------------------------------------------------
|
||||
|
||||
* GCS is enabled and disabled for a thread via the PR_SET_SHADOW_STACK_STATUS
|
||||
prctl(), this takes a single flags argument specifying which GCS features
|
||||
should be used.
|
||||
|
||||
* When set the PR_SHADOW_STACK_ENABLE flag allocates a Guarded Control Stack
|
||||
and enables GCS for the thread, enabling the functionality controlled by
|
||||
GCSCRE0_EL1.{nTR, RVCHKEN, PCRSEL}.
|
||||
|
||||
* When set the PR_SHADOW_STACK_PUSH flag enables the functionality controlled
|
||||
by GCSCRE0_EL1.PUSHMEn, allowing explicit GCS pushes.
|
||||
|
||||
* When set the PR_SHADOW_STACK_WRITE flag enables the functionality controlled
|
||||
by GCSCRE0_EL1.STREn, allowing explicit stores to the Guarded Control Stack.
|
||||
|
||||
* Any unknown flags will cause PR_SET_SHADOW_STACK_STATUS to return -EINVAL.
|
||||
|
||||
* PR_LOCK_SHADOW_STACK_STATUS is passed a bitmask of features with the same
|
||||
values as used for PR_SET_SHADOW_STACK_STATUS. Any future changes to the
|
||||
status of the specified GCS mode bits will be rejected.
|
||||
|
||||
* PR_LOCK_SHADOW_STACK_STATUS allows any bit to be locked, this allows
|
||||
userspace to prevent changes to any future features.
|
||||
|
||||
* There is no support for a process to remove a lock that has been set for
|
||||
it.
|
||||
|
||||
* PR_SET_SHADOW_STACK_STATUS and PR_LOCK_SHADOW_STACK_STATUS affect only the
|
||||
thread that called them, any other running threads will be unaffected.
|
||||
|
||||
* New threads inherit the GCS configuration of the thread that created them.
|
||||
|
||||
* GCS is disabled on exec().
|
||||
|
||||
* The current GCS configuration for a thread may be read with the
|
||||
PR_GET_SHADOW_STACK_STATUS prctl(), this returns the same flags that
|
||||
are passed to PR_SET_SHADOW_STACK_STATUS.
|
||||
|
||||
* If GCS is disabled for a thread after having previously been enabled then
|
||||
the stack will remain allocated for the lifetime of the thread. At present
|
||||
any attempt to reenable GCS for the thread will be rejected, this may be
|
||||
revisited in future.
|
||||
|
||||
* It should be noted that since enabling GCS will result in GCS becoming
|
||||
active immediately it is not normally possible to return from the function
|
||||
that invoked the prctl() that enabled GCS. It is expected that the normal
|
||||
usage will be that GCS is enabled very early in execution of a program.
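
Reading the current configuration back is the safest of these operations to demonstrate, since it does not change the thread's GCS state. The sketch below is an illustration under stated assumptions: the fallback constant values are taken from the uapi prctl header and should be replaced by the real definitions where available, and the helper names ``gcs_get_status()`` and ``gcs_format_status()`` are hypothetical.

```c
#include <stdio.h>
#include <sys/prctl.h>

/* Fallbacks for older headers; the values are assumptions from uapi prctl.h. */
#ifndef PR_GET_SHADOW_STACK_STATUS
#define PR_GET_SHADOW_STACK_STATUS 74
#endif
#ifndef PR_SHADOW_STACK_ENABLE
#define PR_SHADOW_STACK_ENABLE (1UL << 0)
#endif
#ifndef PR_SHADOW_STACK_WRITE
#define PR_SHADOW_STACK_WRITE (1UL << 1)
#endif
#ifndef PR_SHADOW_STACK_PUSH
#define PR_SHADOW_STACK_PUSH (1UL << 2)
#endif

/*
 * Hypothetical helper: read the calling thread's GCS flags.
 * Returns 0 on success, -1 with errno set if the prctl() is not supported.
 */
static inline int gcs_get_status(unsigned long *flags)
{
	return prctl(PR_GET_SHADOW_STACK_STATUS, flags, 0, 0, 0);
}

/* Hypothetical helper: render the three documented flags as text. */
static inline int gcs_format_status(unsigned long flags, char *buf, size_t len)
{
	return snprintf(buf, len, "enable=%d write=%d push=%d",
			(flags & PR_SHADOW_STACK_ENABLE) != 0,
			(flags & PR_SHADOW_STACK_WRITE) != 0,
			(flags & PR_SHADOW_STACK_PUSH) != 0);
}
```

Enabling GCS is deliberately not shown through a libc wrapper: as noted above, the function that invokes the enabling prctl() normally cannot return, so real users are expected to issue that call very early and without returning through a frame that predates GCS.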
|
||||
|
||||
|
||||
|
||||
3. Allocation of Guarded Control Stacks
|
||||
----------------------------------------
|
||||
|
||||
* When GCS is enabled for a thread a new Guarded Control Stack will be
|
||||
allocated for it of half the standard stack size or 2 gigabytes,
|
||||
whichever is smaller.
|
||||
|
||||
* When a new thread is created by a thread which has GCS enabled then a
|
||||
new Guarded Control Stack will be allocated for the new thread with
|
||||
half the size of the standard stack.
|
||||
|
||||
* When a stack is allocated by enabling GCS or during thread creation then
|
||||
the top 8 bytes of the stack will be initialised to 0 and GCSPR_EL0 will
|
||||
be set to point to the address of this 0 value, this can be used to
|
||||
detect the top of the stack.
|
||||
|
||||
* Additional Guarded Control Stacks can be allocated using the
|
||||
map_shadow_stack() system call.
|
||||
|
||||
* Stacks allocated using map_shadow_stack() can optionally have an end of
|
||||
stack marker and cap placed at the top of the stack. If the flag
|
||||
SHADOW_STACK_SET_TOKEN is specified a cap will be placed on the stack,
|
||||
if SHADOW_STACK_SET_MARKER is not specified the cap will be the top 8
|
||||
bytes of the stack and if it is specified then the cap will be the next
|
||||
8 bytes. While specifying just SHADOW_STACK_SET_MARKER by itself is
|
||||
valid since the marker is all bits 0 it has no observable effect.
|
||||
|
||||
* Stacks allocated using map_shadow_stack() must have a size which is a
|
||||
multiple of 8 bytes larger than 8 bytes and must be 8 bytes aligned.
|
||||
|
||||
* An address can be specified to map_shadow_stack(), if one is provided then
|
||||
it must be aligned to a page boundary.
|
||||
|
||||
* When a thread is freed the Guarded Control Stack initially allocated for
|
||||
that thread will be freed. Note carefully that if the stack has been
|
||||
switched this may not be the stack currently in use by the thread.
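
The sizing rules in this section can be expressed as two small helpers. This is a sketch of the arithmetic as described in the text only: it ignores any page rounding the kernel may apply, and the names ``gcs_default_size()`` and ``gcs_map_size_is_valid()`` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define GCS_MAX_DEFAULT_SIZE (UINT64_C(2) << 30)	/* 2 gigabytes */

/*
 * Hypothetical helper: default GCS size for a thread, i.e. half of the
 * standard stack size or 2 gigabytes, whichever is smaller.
 */
static inline uint64_t gcs_default_size(uint64_t stack_size)
{
	uint64_t half = stack_size / 2;

	return half < GCS_MAX_DEFAULT_SIZE ? half : GCS_MAX_DEFAULT_SIZE;
}

/*
 * Hypothetical helper: check a size intended for map_shadow_stack(),
 * which must be a multiple of 8 bytes and larger than 8 bytes.
 */
static inline bool gcs_map_size_is_valid(uint64_t size)
{
	return size > 8 && (size % 8) == 0;
}
```

For an 8 MiB stack limit ``gcs_default_size()`` gives 4 MiB, while an unlimited stack limit is capped at 2 gigabytes.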
|
||||
|
||||
|
||||
4. Signal handling
|
||||
--------------------
|
||||
|
||||
* A new signal frame record gcs_context encodes the current GCS mode and
|
||||
pointer for the interrupted context on signal delivery. This will always
|
||||
be present on systems that support GCS.
|
||||
|
||||
* The record contains a flag field which reports the current GCS configuration
|
||||
for the interrupted context as PR_GET_SHADOW_STACK_STATUS would.
|
||||
|
||||
* The signal handler is run with the same GCS configuration as the interrupted
|
||||
context.
|
||||
|
||||
* When GCS is enabled for the interrupted thread a signal handling specific
|
||||
GCS cap token will be written to the GCS, this is an architectural GCS cap
|
||||
with the token type (bits 0..11) all clear. The GCSPR_EL0 reported in the
|
||||
signal frame will point to this cap token.
|
||||
|
||||
* The signal handler will use the same GCS as the interrupted context.
|
||||
|
||||
* When GCS is enabled on signal entry a frame with the address of the signal
|
||||
return handler will be pushed onto the GCS, allowing return from the signal
|
||||
handler via RET as normal. This will not be reported in the gcs_context in
|
||||
the signal frame.
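
The layout of the signal cap token can be illustrated with a few bit operations. This sketch assumes the architectural cap format keeps the token type in bits 0..11, as stated above, and the address of the GCS slot holding the cap in the remaining upper bits. The upper-bits layout and the helper names are assumptions for illustration, not taken from this document.

```c
#include <stdint.h>

#define GCS_CAP_TOKEN_MASK UINT64_C(0xfff)	/* bits 0..11: token type */

/* Hypothetical helper: extract the token type from a cap value. */
static inline uint64_t gcs_cap_token_type(uint64_t cap)
{
	return cap & GCS_CAP_TOKEN_MASK;
}

/*
 * Hypothetical helper: value of a signal cap stored at slot_addr, i.e. the
 * slot address with the token type bits all clear (assumed layout).
 */
static inline uint64_t gcs_signal_cap_value(uint64_t slot_addr)
{
	return slot_addr & ~GCS_CAP_TOKEN_MASK;
}

/* Hypothetical helper: non-zero if cap is the signal cap for slot_addr. */
static inline int gcs_is_signal_cap(uint64_t cap, uint64_t slot_addr)
{
	return cap == gcs_signal_cap_value(slot_addr);
}
```

A debugger or unwinder could use a check of this kind to recognise the signal cap when walking a GCS, subject to the layout assumption above.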
|
||||
|
||||
|
||||
5. Signal return
|
||||
-----------------
|
||||
|
||||
When returning from a signal handler:
|
||||
|
||||
* If there is a gcs_context record in the signal frame then the GCS flags
|
||||
and GCSPR_EL0 will be restored from that context prior to further
|
||||
validation.
|
||||
|
||||
* If there is no gcs_context record in the signal frame then the GCS
|
||||
configuration will be unchanged.
|
||||
|
||||
* If GCS is enabled on return from a signal handler then GCSPR_EL0 must
|
||||
point to a valid GCS signal cap record, this will be popped from the
|
||||
GCS prior to signal return.
|
||||
|
||||
* If the GCS configuration is locked when returning from a signal then any
|
||||
attempt to change the GCS configuration will be treated as an error. This
|
||||
is true even if GCS was not enabled prior to signal entry.
|
||||
|
||||
* GCS may be disabled via signal return but any attempt to enable GCS via
|
||||
signal return will be rejected.
|
||||
|
||||
|
||||
6. ptrace extensions
|
||||
---------------------
|
||||
|
||||
* A new regset NT_ARM_GCS is defined for use with PTRACE_GETREGSET and
|
||||
PTRACE_SETREGSET.
|
||||
|
||||
* The GCS mode, including enable and disable, may be configured via ptrace.
|
||||
If GCS is enabled via ptrace no new GCS will be allocated for the thread.
|
||||
|
||||
* Configuration via ptrace ignores locking of GCS mode bits.
|
||||
|
||||
|
||||
7. ELF coredump extensions
|
||||
---------------------------
|
||||
|
||||
* NT_ARM_GCS notes will be added to each coredump for each thread of the
|
||||
dumped process. The contents will be equivalent to the data that would
|
||||
have been read if a PTRACE_GETREGSET of the corresponding type were
|
||||
executed for each thread when the coredump was generated.
|
||||
|
||||
|
||||
|
||||
8. /proc extensions
|
||||
--------------------
|
||||
|
||||
* Guarded Control Stack pages will include "ss" in their VmFlags in
|
||||
/proc/<pid>/smaps.
|
@ -10,16 +10,19 @@ ARM64 Architecture
|
||||
acpi_object_usage
|
||||
amu
|
||||
arm-acpi
|
||||
arm-cca
|
||||
asymmetric-32bit
|
||||
booting
|
||||
cpu-feature-registers
|
||||
cpu-hotplug
|
||||
elf_hwcaps
|
||||
gcs
|
||||
hugetlbpage
|
||||
kdump
|
||||
legacy_instructions
|
||||
memory
|
||||
memory-tagging-extension
|
||||
mops
|
||||
perf
|
||||
pointer-authentication
|
||||
ptdump
|
||||
|
44
Documentation/arch/arm64/mops.rst
Normal file
@ -0,0 +1,44 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===================================
|
||||
Memory copy/set instructions (MOPS)
|
||||
===================================
|
||||
|
||||
A MOPS memory copy/set operation consists of three consecutive CPY* or SET*
|
||||
instructions: a prologue, main and epilogue (for example: CPYP, CPYM, CPYE).
|
||||
|
||||
A main or epilogue instruction can take a MOPS exception for various reasons,
|
||||
for example when a task is migrated to a CPU with a different MOPS
|
||||
implementation, or when the instruction's alignment and size requirements are
|
||||
not met. The software exception handler is then expected to reset the registers
|
||||
and restart execution from the prologue instruction. Normally this is handled
|
||||
by the kernel.
|
||||
|
||||
For more details refer to "D1.3.5.7 Memory Copy and Memory Set exceptions" in
|
||||
the Arm Architecture Reference Manual DDI 0487K.a (Arm ARM).
|
||||
|
||||
.. _arm64_mops_hyp:
|
||||
|
||||
Hypervisor requirements
|
||||
-----------------------
|
||||
|
||||
A hypervisor running a Linux guest must handle all MOPS exceptions from the
|
||||
guest kernel, as Linux may not be able to handle the exception at all times.
|
||||
For example, a MOPS exception can be taken when the hypervisor migrates a vCPU
|
||||
to another physical CPU with a different MOPS implementation.
|
||||
|
||||
To do this, the hypervisor must:
|
||||
|
||||
- Set HCRX_EL2.MCE2 to 1 so that the exception is taken to the hypervisor.
|
||||
|
||||
- Have an exception handler that implements the algorithm from the Arm ARM
|
||||
rules CNTMJ and MWFQH.
|
||||
|
||||
- Set the guest's PSTATE.SS to 0 in the exception handler, to handle a
|
||||
potential step of the current instruction.
|
||||
|
||||
Note: Clearing PSTATE.SS is needed so that a single step exception is taken
|
||||
on the next instruction (the prologue instruction). Otherwise prologue
|
||||
would get silently stepped over and the single step exception taken on the
|
||||
main instruction. Note that if the guest instruction is not being stepped
|
||||
then clearing PSTATE.SS has no effect.
|
@ -258,6 +258,8 @@ stable kernels.
|
||||
| Hisilicon | Hip{08,09,10,10C| #162001900 | N/A |
|
||||
| | ,11} SMMU PMCG | | |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| Hisilicon | Hip09 | #162100801 | HISILICON_ERRATUM_162100801 |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| Qualcomm Tech. | Kryo/Falkor v1 | E1003 | QCOM_FALKOR_ERRATUM_1003 |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
|
@ -346,6 +346,10 @@ The regset data starts with struct user_za_header, containing:
|
||||
|
||||
* Writes to NT_ARM_ZT will set PSTATE.ZA to 1.
|
||||
|
||||
* If any register data is provided along with SME_PT_VL_ONEXEC then the
|
||||
register data will be interpreted with the current vector length, not
|
||||
the vector length configured for use on exec.
|
||||
|
||||
|
||||
8. ELF coredump extensions
|
||||
---------------------------
|
||||
|
@ -402,6 +402,10 @@ The regset data starts with struct user_sve_header, containing:
|
||||
streaming mode and any SETREGSET of NT_ARM_SSVE will enter streaming mode
|
||||
if the target was not in streaming mode.
|
||||
|
||||
* If any register data is provided along with SVE_PT_VL_ONEXEC then the
|
||||
register data will be interpreted with the current vector length, not
|
||||
the vector length configured for use on exec.
|
||||
|
||||
* The effect of writing a partial, incomplete payload is unspecified.
|
||||
|
||||
|
||||
|
@ -85,6 +85,70 @@ to CPUINTC directly::
|
||||
| Devices |
|
||||
+---------+
|
||||
|
||||
Virtual Extended IRQ model
|
||||
==========================
|
||||
|
||||
In this model, IPI (Inter-Processor Interrupt) and CPU Local Timer interrupt
|
||||
go to CPUINTC directly, CPU UARTS interrupts go to PCH-PIC, while all other
|
||||
devices interrupts go to PCH-PIC/PCH-MSI and are gathered by V-EIOINTC (Virtual
|
||||
Extended I/O Interrupt Controller), and then go to CPUINTC directly::
|
||||
|
||||
+-----+ +-------------------+ +-------+
|
||||
| IPI |--> | CPUINTC(0-255vcpu)| <-- | Timer |
|
||||
+-----+ +-------------------+ +-------+
|
||||
^
|
||||
|
|
||||
+-----------+
|
||||
| V-EIOINTC |
|
||||
+-----------+
|
||||
^ ^
|
||||
| |
|
||||
+---------+ +---------+
|
||||
| PCH-PIC | | PCH-MSI |
|
||||
+---------+ +---------+
|
||||
^ ^ ^
|
||||
| | |
|
||||
+--------+ +---------+ +---------+
|
||||
| UARTs | | Devices | | Devices |
|
||||
+--------+ +---------+ +---------+
|
||||
|
||||
|
||||
Description
|
||||
-----------
|
||||
V-EIOINTC (Virtual Extended I/O Interrupt Controller) is an extension of
|
||||
EIOINTC, it only works in VM mode which runs in KVM hypervisor. Interrupts can
|
||||
be routed to up to four vCPUs via standard EIOINTC, however with V-EIOINTC
|
||||
interrupts can be routed to up to 256 virtual cpus.
|
||||
|
||||
With standard EIOINTC, interrupt routing setting includes two parts: eight
|
||||
bits for CPU selection and four bits for CPU IP (Interrupt Pin) selection.
|
||||
For CPU selection there are four bits for EIOINTC node selection and four bits
|
||||
for EIOINTC CPU selection. Bitmap method is used for CPU selection and
|
||||
CPU IP selection, so interrupt can only route to CPU0 - CPU3 and IP0-IP3 in
|
||||
one EIOINTC node.
|
||||
|
||||
V-EIOINTC supports routing to more CPUs and CPU IPs (Interrupt Pins);
|
||||
there are two newly added registers with V-EIOINTC.
|
||||
|
||||
EXTIOI_VIRT_FEATURES
|
||||
--------------------
|
||||
This is a read-only register which indicates the features supported by
|
||||
V-EIOINTC. The features EXTIOI_HAS_INT_ENCODE and EXTIOI_HAS_CPU_ENCODE are added.
|
||||
|
||||
Feature EXTIOI_HAS_INT_ENCODE is part of standard EIOINTC. If it is 1, it
|
||||
indicates that CPU Interrupt Pin selection can be normal method rather than
|
||||
bitmap method, so interrupt can be routed to IP0 - IP15.
|
||||
|
||||
Feature EXTIOI_HAS_CPU_ENCODE is an extension of V-EIOINTC. If it is 1, it
|
||||
indicates that CPU selection can be normal method rather than bitmap method,
|
||||
so interrupt can be routed to CPU0 - CPU255.
|
||||
|
||||
EXTIOI_VIRT_CONFIG
|
||||
------------------
|
||||
This is a read-write register. For compatibility, interrupt routing uses
|
||||
the default method which is the same with standard EIOINTC. If the bit is set
|
||||
with 1, it indicates that HW should use the normal method rather than the bitmap method.
|
||||
|
||||
Advanced Extended IRQ model
|
||||
===========================
|
||||
|
||||
|
@ -93,8 +93,8 @@ given platform based on the content of the device-tree. Thus, you
|
||||
should:
|
||||
|
||||
a) add your platform support as a _boolean_ option in
|
||||
arch/powerpc/Kconfig, following the example of PPC_PSERIES,
|
||||
PPC_PMAC and PPC_MAPLE. The latter is probably a good
|
||||
arch/powerpc/Kconfig, following the example of PPC_PSERIES
|
||||
and PPC_PMAC. The latter is probably a good
|
||||
example of a board support to start from.
|
||||
|
||||
b) create your main platform file as
|
||||
|
@ -239,6 +239,9 @@ The following keys are defined:
|
||||
ratified in commit 98918c844281 ("Merge pull request #1217 from
|
||||
riscv/zawrs") of riscv-isa-manual.
|
||||
|
||||
* :c:macro:`RISCV_HWPROBE_EXT_SUPM`: The Supm extension is supported as
|
||||
defined in version 1.0 of the RISC-V Pointer Masking extensions.
|
||||
|
||||
* :c:macro:`RISCV_HWPROBE_KEY_CPUPERF_0`: Deprecated. Returns similar values to
|
||||
:c:macro:`RISCV_HWPROBE_KEY_MISALIGNED_SCALAR_PERF`, but the key was
|
||||
mistakenly classified as a bitmask rather than a value.
|
||||
@ -274,3 +277,19 @@ The following keys are defined:
|
||||
represent the highest userspace virtual address usable.
|
||||
|
||||
* :c:macro:`RISCV_HWPROBE_KEY_TIME_CSR_FREQ`: Frequency (in Hz) of `time CSR`.
|
||||
|
||||
* :c:macro:`RISCV_HWPROBE_KEY_MISALIGNED_VECTOR_PERF`: An enum value describing the
|
||||
performance of misaligned vector accesses on the selected set of processors.
|
||||
|
||||
* :c:macro:`RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN`: The performance of misaligned
|
||||
vector accesses is unknown.
|
||||
|
||||
* :c:macro:`RISCV_HWPROBE_MISALIGNED_VECTOR_SLOW`: 32-bit misaligned accesses using vector
|
||||
registers are slower than the equivalent quantity of byte accesses via vector registers.
|
||||
Misaligned accesses may be supported directly in hardware, or trapped and emulated by software.
|
||||
|
||||
* :c:macro:`RISCV_HWPROBE_MISALIGNED_VECTOR_FAST`: 32-bit misaligned accesses using vector
|
||||
registers are faster than the equivalent quantity of byte accesses via vector registers.
|
||||
|
||||
* :c:macro:`RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED`: Misaligned vector accesses are
|
||||
not supported at all and will generate a misaligned address fault.
|
||||
|
@ -68,3 +68,19 @@ Misaligned accesses
|
||||
Misaligned scalar accesses are supported in userspace, but they may perform
|
||||
poorly. Misaligned vector accesses are only supported if the Zicclsm extension
|
||||
is supported.
|
||||
|
||||
Pointer masking
|
||||
---------------
|
||||
|
||||
Support for pointer masking in userspace (the Supm extension) is provided via
|
||||
the ``PR_SET_TAGGED_ADDR_CTRL`` and ``PR_GET_TAGGED_ADDR_CTRL`` ``prctl()``
|
||||
operations. Pointer masking is disabled by default. To enable it, userspace
|
||||
must call ``PR_SET_TAGGED_ADDR_CTRL`` with the ``PR_PMLEN`` field set to the
|
||||
number of mask/tag bits needed by the application. ``PR_PMLEN`` is interpreted
|
||||
as a lower bound; if the kernel is unable to satisfy the request, the
|
||||
``PR_SET_TAGGED_ADDR_CTRL`` operation will fail. The actual number of tag bits
|
||||
is returned in ``PR_PMLEN`` by the ``PR_GET_TAGGED_ADDR_CTRL`` operation.
|
||||
|
||||
Additionally, when pointer masking is enabled (``PR_PMLEN`` is greater than 0),
|
||||
a tagged address ABI is supported, with the same interface and behavior as
|
||||
documented for AArch64 (Documentation/arch/arm64/tagged-address-abi.rst).
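
The ``PR_PMLEN`` handling described above can be sketched in C as follows. This is a minimal illustration, not the kernel's reference usage: the fallback values for ``PR_PMLEN_SHIFT`` and ``PR_PMLEN_MASK`` are assumptions that should be checked against the installed ``<linux/prctl.h>``, and the helper names ``pmlen_to_ctrl()``, ``ctrl_to_pmlen()`` and ``request_pointer_masking()`` are hypothetical.

```c
#include <assert.h>
#include <sys/prctl.h>

/*
 * Fallbacks for older userspace headers. The values are assumed to match
 * <linux/prctl.h> from a kernel with Supm support; verify before relying
 * on them.
 */
#ifndef PR_SET_TAGGED_ADDR_CTRL
#define PR_SET_TAGGED_ADDR_CTRL 55
#endif
#ifndef PR_GET_TAGGED_ADDR_CTRL
#define PR_GET_TAGGED_ADDR_CTRL 56
#endif
#ifndef PR_PMLEN_SHIFT
#define PR_PMLEN_SHIFT 24
#endif
#ifndef PR_PMLEN_MASK
#define PR_PMLEN_MASK (0x7fUL << PR_PMLEN_SHIFT)
#endif

/* Hypothetical helper: place a tag-bit count into the PR_PMLEN field. */
unsigned long pmlen_to_ctrl(unsigned int pmlen)
{
	assert(pmlen <= 0x7f);
	return ((unsigned long)pmlen << PR_PMLEN_SHIFT) & PR_PMLEN_MASK;
}

/* Hypothetical helper: extract the tag-bit count from a control word. */
unsigned int ctrl_to_pmlen(unsigned long ctrl)
{
	return (unsigned int)((ctrl & PR_PMLEN_MASK) >> PR_PMLEN_SHIFT);
}

/*
 * Hypothetical helper: request at least min_bits tag bits, then read back
 * how many the kernel actually granted. Returns -1 if either prctl() fails.
 */
int request_pointer_masking(unsigned int min_bits)
{
	int ctrl;

	if (prctl(PR_SET_TAGGED_ADDR_CTRL, pmlen_to_ctrl(min_bits), 0, 0, 0) != 0)
		return -1;

	ctrl = prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0);
	if (ctrl < 0)
		return -1;

	return (int)ctrl_to_pmlen((unsigned long)ctrl);
}
```

Because ``PR_PMLEN`` is a lower bound, a caller should treat the value returned by ``request_pointer_masking()`` as the authoritative tag width rather than the value it asked for.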
|
||||
|
@ -4,8 +4,9 @@
|
||||
AMD HSMP interface
|
||||
============================================
|
||||
|
||||
Newer Fam19h EPYC server line of processors from AMD support system
|
||||
management functionality via HSMP (Host System Management Port).
|
||||
Newer Fam19h(model 0x00-0x1f, 0x30-0x3f, 0x90-0x9f, 0xa0-0xaf),
|
||||
Fam1Ah(model 0x00-0x1f) EPYC server line of processors from AMD support
|
||||
system management functionality via HSMP (Host System Management Port).
|
||||
|
||||
The Host System Management Port (HSMP) is an interface to provide
|
||||
OS-level software with access to system management functions via a
|
||||
@ -16,14 +17,25 @@ More details on the interface can be found in chapter
|
||||
Eg: https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/programmer-references/55898_B1_pub_0_50.zip
|
||||
|
||||
|
||||
HSMP interface is supported on EPYC server CPU models only.
|
||||
HSMP interface is supported on EPYC line of server CPUs and MI300A (APU).
|
||||
|
||||
|
||||
HSMP device
|
||||
============================================
|
||||
|
||||
amd_hsmp driver under the drivers/platforms/x86/ creates miscdevice
|
||||
/dev/hsmp to let user space programs run hsmp mailbox commands.
|
||||
amd_hsmp driver under drivers/platform/x86/amd/hsmp/ has separate driver files
|
||||
for ACPI object based probing, platform device based probing and for the common
|
||||
code for these two drivers.
|
||||
|
||||
Kconfig option CONFIG_AMD_HSMP_PLAT compiles plat.c and creates amd_hsmp.ko.
|
||||
Kconfig option CONFIG_AMD_HSMP_ACPI compiles acpi.c and creates hsmp_acpi.ko.
|
||||
Selecting either of these two configs automatically selects CONFIG_AMD_HSMP. This
|
||||
compiles common code hsmp.c and creates hsmp_common.ko module.
|
||||
|
||||
Both the ACPI and plat drivers create the miscdevice /dev/hsmp to let
|
||||
user space programs run hsmp mailbox commands.
|
||||
|
||||
The ACPI object format supported by the driver is defined below.
|
||||
|
||||
$ ls -al /dev/hsmp
|
||||
crw-r--r-- 1 root root 10, 123 Jan 21 21:41 /dev/hsmp
|
||||
@ -59,6 +71,51 @@ Note: lseek() is not supported as entire metrics table is read.
|
||||
Metrics table definitions will be documented as part of Public PPR.
|
||||
The same is defined in the amd_hsmp.h header.
|
||||
|
||||
ACPI device object format
|
||||
=========================
|
||||
The ACPI object format expected from the amd_hsmp driver
|
||||
for socket with ID00 is given below::
|
||||
|
||||
Device(HSMP)
|
||||
{
|
||||
Name(_HID, "AMDI0097")
|
||||
Name(_UID, "ID00")
|
||||
Name(HSE0, 0x00000001)
|
||||
Name(RBF0, ResourceTemplate()
|
||||
{
|
||||
Memory32Fixed(ReadWrite, 0xxxxxxx, 0x00100000)
|
||||
})
|
||||
Method(_CRS, 0, NotSerialized)
|
||||
{
|
||||
Return(RBF0)
|
||||
}
|
||||
Method(_STA, 0, NotSerialized)
|
||||
{
|
||||
If(LEqual(HSE0, One))
|
||||
{
|
||||
Return(0x0F)
|
||||
}
|
||||
Else
|
||||
{
|
||||
Return(Zero)
|
||||
}
|
||||
}
|
||||
Name(_DSD, Package(2)
|
||||
{
|
||||
Buffer(0x10)
|
||||
{
|
||||
0x9D, 0x61, 0x4D, 0xB7, 0x07, 0x57, 0xBD, 0x48,
|
||||
0xA6, 0x9F, 0x4E, 0xA2, 0x87, 0x1F, 0xC2, 0xF6
|
||||
},
|
||||
Package(3)
|
||||
{
|
||||
Package(2) {"MsgIdOffset", 0x00010934},
|
||||
Package(2) {"MsgRspOffset", 0x00010980},
|
||||
Package(2) {"MsgArgOffset", 0x000109E0}
|
||||
}
|
||||
})
|
||||
}
|
||||
|
||||
|
||||
An example
|
||||
==========
|
||||
|
@ -896,10 +896,19 @@ Offset/size: 0x260/4
|
||||
|
||||
The kernel runtime start address is determined by the following algorithm::
|
||||
|
||||
if (relocatable_kernel)
|
||||
runtime_start = align_up(load_address, kernel_alignment)
|
||||
else
|
||||
runtime_start = pref_address
|
||||
if (relocatable_kernel) {
|
||||
if (load_address < pref_address)
|
||||
load_address = pref_address;
|
||||
runtime_start = align_up(load_address, kernel_alignment);
|
||||
} else {
|
||||
runtime_start = pref_address;
|
||||
}
|
||||
|
||||
Hence the necessary memory window location and size can be estimated by
|
||||
a boot loader as::
|
||||
|
||||
memory_window_start = runtime_start;
|
||||
memory_window_size = init_size;
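
The updated algorithm above can be written out as a small C sketch. The structure ``boot_hdr_sketch`` and the helper names are hypothetical; only the field names ``pref_address``, ``kernel_alignment``, ``init_size`` and ``relocatable_kernel`` come from the text, and ``kernel_alignment`` is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the setup header fields used below. */
struct boot_hdr_sketch {
	uint64_t pref_address;
	uint32_t kernel_alignment;	/* assumed to be a power of two */
	uint32_t init_size;
	int relocatable_kernel;
};

/* Round addr up to the next multiple of align (align must be 2^n). */
uint64_t align_up(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/* Compute the runtime start address following the algorithm in the text. */
uint64_t runtime_start(const struct boot_hdr_sketch *hdr, uint64_t load_address)
{
	if (!hdr->relocatable_kernel)
		return hdr->pref_address;

	/* A relocatable kernel is never placed below its preferred address. */
	if (load_address < hdr->pref_address)
		load_address = hdr->pref_address;

	return align_up(load_address, hdr->kernel_alignment);
}
```

A boot loader would then reserve ``init_size`` bytes starting at the value returned by ``runtime_start()``.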
|
||||
|
||||
============ ===============
|
||||
Field name: handover_offset
|
||||
|
@ -26,7 +26,8 @@ Detection
|
||||
=========
|
||||
|
||||
Intel processors may support either or both of the following hardware
|
||||
mechanisms to detect split locks and bus locks.
|
||||
mechanisms to detect split locks and bus locks. Some AMD processors also
|
||||
support bus lock detect.
|
||||
|
||||
#AC exception for split lock detection
|
||||
--------------------------------------
|
||||
|
@ -305,3 +305,8 @@ The available options are:
|
||||
|
||||
debug
|
||||
Enable debug messages.
|
||||
|
||||
nosnp
|
||||
Do not enable SEV-SNP (applies to host/hypervisor only). Setting
|
||||
'nosnp' avoids the RMP check overhead in memory accesses when
|
||||
users do not want to run SEV-SNP guests.
|
||||
|
@ -29,15 +29,27 @@ Complete virtual memory map with 4-level page tables
|
||||
Start addr | Offset | End addr | Size | VM area description
|
||||
========================================================================================================================
|
||||
| | | |
|
||||
0000000000000000 | 0 | 00007fffffffffff | 128 TB | user-space virtual memory, different per mm
|
||||
0000000000000000 | 0 | 00007fffffffefff | ~128 TB | user-space virtual memory, different per mm
|
||||
00007ffffffff000 | ~128 TB | 00007fffffffffff | 4 kB | ... guard hole
|
||||
__________________|____________|__________________|_________|___________________________________________________________
|
||||
| | | |
|
||||
0000800000000000 | +128 TB | ffff7fffffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical
|
||||
| | | | virtual memory addresses up to the -128 TB
|
||||
0000800000000000 | +128 TB | 7fffffffffffffff | ~8 EB | ... huge, almost 63 bits wide hole of non-canonical
|
||||
| | | | virtual memory addresses up to the -8 EB
|
||||
| | | | starting offset of kernel mappings.
|
||||
| | | |
|
||||
| | | | LAM relaxes the canonicality check, allowing aliases to be created
|
||||
| | | | for userspace memory here.
|
||||
__________________|____________|__________________|_________|___________________________________________________________
|
||||
|
|
||||
| Kernel-space virtual memory, shared between all processes:
|
||||
__________________|____________|__________________|_________|___________________________________________________________
|
||||
| | | |
|
||||
8000000000000000 | -8 EB | ffff7fffffffffff | ~8 EB | ... huge, almost 63 bits wide hole of non-canonical
|
||||
| | | | virtual memory addresses up to the -128 TB
|
||||
| | | | starting offset of kernel mappings.
|
||||
| | | |
|
||||
| | | | LAM_SUP relaxes the canonicality check, allowing creation of
|
||||
| | | | aliases for kernel memory here.
|
||||
____________________________________________________________|___________________________________________________________
|
||||
| | | |
|
||||
ffff800000000000 | -128 TB | ffff87ffffffffff | 8 TB | ... guard hole, also reserved for hypervisor
|
||||
@ -88,15 +100,26 @@ Complete virtual memory map with 5-level page tables
|
||||
Start addr | Offset | End addr | Size | VM area description
|
||||
========================================================================================================================
|
||||
| | | |
|
||||
0000000000000000 | 0 | 00ffffffffffffff | 64 PB | user-space virtual memory, different per mm
|
||||
0000000000000000 | 0 | 00ffffffffffefff | ~64 PB | user-space virtual memory, different per mm
|
||||
00fffffffffff000 | ~64 PB | 00ffffffffffffff | 4 kB | ... guard hole
|
||||
__________________|____________|__________________|_________|___________________________________________________________
|
||||
| | | |
|
||||
0100000000000000 | +64 PB | feffffffffffffff | ~16K PB | ... huge, still almost 64 bits wide hole of non-canonical
|
||||
| | | | virtual memory addresses up to the -64 PB
|
||||
0100000000000000 | +64 PB | 7fffffffffffffff | ~8 EB | ... huge, almost 63 bits wide hole of non-canonical
|
||||
| | | | virtual memory addresses up to the -8 EB
|
||||
| | | | starting offset of kernel mappings.
|
||||
| | | |
|
||||
| | | | LAM relaxes the canonicality check, allowing aliases to be created
|
||||
| | | | for userspace memory here.
|
||||
__________________|____________|__________________|_________|___________________________________________________________
|
||||
|
|
||||
| Kernel-space virtual memory, shared between all processes:
|
||||
____________________________________________________________|___________________________________________________________
|
||||
8000000000000000 | -8 EB | feffffffffffffff | ~8 EB | ... huge, almost 63 bits wide hole of non-canonical
|
||||
| | | | virtual memory addresses up to the -64 PB
|
||||
| | | | starting offset of kernel mappings.
|
||||
| | | |
|
||||
| | | | LAM_SUP relaxes the canonicality check, allowing creation of
|
||||
| | | | aliases for kernel memory here.
|
||||
____________________________________________________________|___________________________________________________________
|
||||
| | | |
|
||||
ff00000000000000 | -64 PB | ff0fffffffffffff | 4 PB | ... guard hole, also reserved for hypervisor
|
||||
|
@ -39,13 +39,16 @@ blkdevparts=<blkdev-def>[;<blkdev-def>]
|
||||
create a link to block device partition with the name "PARTNAME".
|
||||
User space application can access partition by partition name.
|
||||
|
||||
ro
|
||||
read-only. Flag the partition as read-only.
|
||||
|
||||
Example:
|
||||
|
||||
eMMC disk names are "mmcblk0" and "mmcblk0boot0".
|
||||
|
||||
bootargs::
|
||||
|
||||
'blkdevparts=mmcblk0:1G(data0),1G(data1),-;mmcblk0boot0:1m(boot),-(kernel)'
|
||||
'blkdevparts=mmcblk0:1G(data0),1G(data1),-;mmcblk0boot0:1m(boot)ro,-(kernel)'
|
||||
|
||||
dmesg::
|
||||
|
||||
|
@ -199,24 +199,36 @@ managing and controlling ublk devices with help of several control commands:
|
||||
|
||||
- user recovery feature description
|
||||
|
||||
Two new features are added for user recovery: ``UBLK_F_USER_RECOVERY`` and
|
||||
``UBLK_F_USER_RECOVERY_REISSUE``.
|
||||
Three new features are added for user recovery: ``UBLK_F_USER_RECOVERY``,
|
||||
``UBLK_F_USER_RECOVERY_REISSUE``, and ``UBLK_F_USER_RECOVERY_FAIL_IO``. To
|
||||
enable recovery of ublk devices after the ublk server exits, the ublk server
|
||||
should specify the ``UBLK_F_USER_RECOVERY`` flag when creating the device. The
|
||||
ublk server may additionally specify at most one of
|
||||
``UBLK_F_USER_RECOVERY_REISSUE`` and ``UBLK_F_USER_RECOVERY_FAIL_IO`` to
|
||||
modify how I/O is handled while the ublk server is dying/dead (this is called
|
||||
the ``nosrv`` case in the driver code).
|
||||
|
||||
With ``UBLK_F_USER_RECOVERY`` set, after one ubq_daemon(ublk server's io
|
||||
With just ``UBLK_F_USER_RECOVERY`` set, after one ubq_daemon(ublk server's io
|
||||
handler) is dying, ublk does not delete ``/dev/ublkb*`` during the whole
|
||||
recovery stage and ublk device ID is kept. It is ublk server's
|
||||
responsibility to recover the device context by its own knowledge.
|
||||
Requests which have not been issued to userspace are requeued. Requests
|
||||
which have been issued to userspace are aborted.
|
||||
|
||||
With ``UBLK_F_USER_RECOVERY_REISSUE`` set, after one ubq_daemon(ublk
|
||||
server's io handler) is dying, contrary to ``UBLK_F_USER_RECOVERY``,
|
||||
With ``UBLK_F_USER_RECOVERY_REISSUE`` additionally set, after one ubq_daemon
|
||||
(ublk server's io handler) is dying, contrary to ``UBLK_F_USER_RECOVERY``,
|
||||
requests which have been issued to userspace are requeued and will be
|
||||
re-issued to the new process after handling ``UBLK_CMD_END_USER_RECOVERY``.
|
||||
``UBLK_F_USER_RECOVERY_REISSUE`` is designed for backends who tolerate
|
||||
double-write since the driver may issue the same I/O request twice. It
|
||||
might be useful to a read-only FS or a VM backend.
|
||||
|
||||
With ``UBLK_F_USER_RECOVERY_FAIL_IO`` additionally set, after the ublk server
|
||||
exits, requests which have been issued to userspace are failed, as are any
|
||||
subsequently issued requests. Applications continuously issuing I/O against
|
||||
devices with this flag set will see a stream of I/O errors until a new ublk
|
||||
server recovers the device.
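
The rule for combining the recovery flags can be sketched as a small validation helper. This is an illustration only: the bit positions are assumed to match ``include/uapi/linux/ublk_cmd.h`` and should be verified against that header, the function ``ublk_recovery_flags_valid()`` is hypothetical, and rejecting a modifier flag without ``UBLK_F_USER_RECOVERY`` is an interpretation of "additionally specify" in the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; check include/uapi/linux/ublk_cmd.h before relying on them. */
#ifndef UBLK_F_USER_RECOVERY
#define UBLK_F_USER_RECOVERY		(1ULL << 3)
#endif
#ifndef UBLK_F_USER_RECOVERY_REISSUE
#define UBLK_F_USER_RECOVERY_REISSUE	(1ULL << 4)
#endif
#ifndef UBLK_F_USER_RECOVERY_FAIL_IO
#define UBLK_F_USER_RECOVERY_FAIL_IO	(1ULL << 9)
#endif

/*
 * Hypothetical check a ublk server could run before creating a device:
 * - REISSUE and FAIL_IO are mutually exclusive;
 * - either of them only makes sense together with USER_RECOVERY.
 */
bool ublk_recovery_flags_valid(uint64_t flags)
{
	bool recovery = (flags & UBLK_F_USER_RECOVERY) != 0;
	bool reissue = (flags & UBLK_F_USER_RECOVERY_REISSUE) != 0;
	bool fail_io = (flags & UBLK_F_USER_RECOVERY_FAIL_IO) != 0;

	assert(!(reissue && fail_io) || flags != 0);	/* sanity: both set implies non-zero */

	if (reissue && fail_io)
		return false;
	if ((reissue || fail_io) && !recovery)
		return false;
	return true;
}
```

With these rules, the accepted combinations are: no recovery flag, ``UBLK_F_USER_RECOVERY`` alone, or ``UBLK_F_USER_RECOVERY`` plus exactly one of the two modifiers.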
|
||||
|
||||
Unprivileged ublk device is supported by passing ``UBLK_F_UNPRIVILEGED_DEV``.
|
||||
Once the flag is set, all control commands can be sent by unprivileged
|
||||
user. Except for command of ``UBLK_CMD_ADD_DEV``, permission check on
|
||||
|
@ -835,7 +835,7 @@ section named by ``btf_ext_info_sec->sec_name_off``.
|
||||
See :ref:`Documentation/bpf/llvm_reloc.rst <btf-co-re-relocations>`
|
||||
for more information on CO-RE relocations.
|
||||
|
||||
4.2 .BTF_ids section
|
||||
4.3 .BTF_ids section
|
||||
--------------------
|
||||
|
||||
The .BTF_ids section encodes BTF ID values that are used within the kernel.
|
||||
@ -896,6 +896,81 @@ and is used as a filter when resolving the BTF ID value.
|
||||
All the BTF ID lists and sets are compiled in the .BTF_ids section and
|
||||
resolved during the linking phase of kernel build by ``resolve_btfids`` tool.
|
||||
|
||||
4.4 .BTF.base section
|
||||
---------------------
|
||||
Split BTF - where the .BTF section only contains types not in the associated
|
||||
base .BTF section - is an extremely efficient way to encode type information
|
||||
for kernel modules, since they generally consist of a few module-specific
|
||||
types along with a large set of shared kernel types. The former are encoded
|
||||
in split BTF, while the latter are encoded in base BTF, resulting in more
|
||||
compact representations. A type in split BTF that refers to a type in
|
||||
base BTF refers to it using its base BTF ID, and split BTF IDs start
|
||||
at last_base_BTF_ID + 1.
|
||||
|
||||
The downside of this approach however is that this makes the split BTF
|
||||
somewhat brittle - when the base BTF changes, base BTF ID references are
|
||||
no longer valid and the split BTF itself becomes useless. The role of the
|
||||
.BTF.base section is to make split BTF more resilient for cases where
|
||||
the base BTF may change, as is the case for kernel modules not built every
|
||||
time the kernel is for example. .BTF.base contains named base types; INTs,
|
||||
FLOATs, STRUCTs, UNIONs, ENUM[64]s and FWDs. INTs and FLOATs are fully
|
||||
described in .BTF.base sections, while composite types like structs
|
||||
and unions are not fully defined - the .BTF.base type simply serves as
|
||||
a description of the type the split BTF referred to, so structs/unions
|
||||
have 0 members in the .BTF.base section. ENUM[64]s are similarly recorded
|
||||
with 0 members. Any other types are added to the split BTF. This
|
||||
distillation process then leaves us with a .BTF.base section with
|
||||
such minimal descriptions of base types and .BTF split section which refers
|
||||
to those base types. Later, we can relocate the split BTF using both the
|
||||
information stored in the .BTF.base section and the new .BTF base; the type
|
||||
information in the .BTF.base section allows us to update the split BTF
|
||||
references to point at the corresponding new base BTF IDs.
|
||||
|
||||
BTF relocation happens on kernel module load when a kernel module has a
|
||||
.BTF.base section, and libbpf also provides a btf__relocate() API to
|
||||
accomplish this.
|
||||
|
||||
As an example consider the following base BTF::
|
||||
|
||||
[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
|
||||
[2] STRUCT 'foo' size=8 vlen=2
|
||||
'f1' type_id=1 bits_offset=0
|
||||
'f2' type_id=1 bits_offset=32
|
||||
|
||||
...and associated split BTF::
|
||||
|
||||
[3] PTR '(anon)' type_id=2
|
||||
|
||||
i.e. split BTF describes a pointer to struct foo { int f1; int f2; };
|
||||
|
||||
.BTF.base will consist of::
|
||||
|
||||
[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
|
||||
[2] STRUCT 'foo' size=8 vlen=0
|
||||
|
||||
If we relocate the split BTF later using the following new base BTF::
|
||||
|
||||
[1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
|
||||
[2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
|
||||
[3] STRUCT 'foo' size=8 vlen=2
|
||||
'f1' type_id=2 bits_offset=0
|
||||
'f2' type_id=2 bits_offset=32
|
||||
|
||||
...we can use our .BTF.base description to know that the split BTF reference
|
||||
is to struct foo, and relocation results in new split BTF::
|
||||
|
||||
[4] PTR '(anon)' type_id=3
|
||||
|
||||
Note that we had to update BTF ID and start BTF ID for the split BTF.
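
The ID remapping in this example can be sketched in C. This is a simplified model and not how libbpf's btf__relocate() is implemented: the ``named_type`` structure and both helper functions are hypothetical, and matching is done on kind and name only, whereas a real implementation would need further compatibility checks.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical minimal description of a named base type. */
struct named_type {
	const char *kind;	/* e.g. "INT", "STRUCT" */
	const char *name;
};

/* Return the 1-based ID of (kind, name) in types[], or 0 if it is absent. */
unsigned int find_type_id(const struct named_type *types, unsigned int nr,
			  const char *kind, const char *name)
{
	unsigned int i;

	for (i = 0; i < nr; i++) {
		if (!strcmp(types[i].kind, kind) && !strcmp(types[i].name, name))
			return i + 1;
	}
	return 0;
}

/*
 * Map a type ID that was valid against the distilled .BTF.base onto the
 * new base BTF. IDs within the distilled base are looked up by kind and
 * name; split BTF IDs are shifted so they start after the new base.
 */
unsigned int relocate_id(unsigned int id,
			 const struct named_type *distilled, unsigned int nr_distilled,
			 const struct named_type *new_base, unsigned int nr_new)
{
	assert(distilled && new_base);

	if (id == 0)
		return 0;	/* ID 0 (void) is never remapped */
	if (id <= nr_distilled)
		return find_type_id(new_base, nr_new,
				    distilled[id - 1].kind, distilled[id - 1].name);
	return id - nr_distilled + nr_new;
}
```

Applied to the example, the reference to struct foo moves from ID 2 to ID 3, and the split BTF pointer type itself moves from ID 3 to ID 4, because the new base holds three types instead of two.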
|
||||
|
||||
So we see how .BTF.base plays the role of facilitating later relocation,
|
||||
leading to more resilient split BTF.
|
||||
|
||||
.BTF.base sections will be generated automatically for out-of-tree kernel module
|
||||
builds - i.e. where KBUILD_EXTMOD is set (as it would be for "make M=path/2/mod"
|
||||
cases). .BTF.base generation requires pahole support for the "distilled_base"
|
||||
BTF feature; this is available in pahole v1.28 and later.
|
||||
|
||||
5. Using BTF
|
||||
============
|
||||
|
||||
|
@ -507,7 +507,7 @@ Notes:
|
||||
from the parent state to the current state.
|
||||
|
||||
* Details about REG_LIVE_READ32 are omitted.
|
||||
|
||||
|
||||
* Function ``propagate_liveness()`` (see section :ref:`read_marks_for_cache_hits`)
|
||||
might override the first parent link. Please refer to the comments in the
|
||||
``propagate_liveness()`` and ``mark_reg_read()`` source code for further
|
||||
@ -571,7 +571,7 @@ works::
|
||||
are considered equivalent.
|
||||
|
||||
.. _read_marks_for_cache_hits:
|
||||
|
||||
|
||||
Read marks propagation for cache hits
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
|
@ -616,7 +616,7 @@ ONLINE section for notifications on online and offline operation::
|
||||
....
|
||||
cpuhp_remove_instance(state, &inst2->node);
|
||||
....
|
||||
remove_multi_state(state);
|
||||
cpuhp_remove_multi_state(state);
|
||||
|
||||
|
||||
Testing of hotplug states
|
||||
|
@ -55,14 +55,16 @@ scope.
|
||||
What about __vmalloc(GFP_NOFS)
|
||||
==============================
|
||||
|
||||
vmalloc doesn't support GFP_NOFS semantic because there are hardcoded
|
||||
GFP_KERNEL allocations deep inside the allocator which are quite non-trivial
|
||||
to fix up. That means that calling ``vmalloc`` with GFP_NOFS/GFP_NOIO is
|
||||
almost always a bug. The good news is that the NOFS/NOIO semantic can be
|
||||
achieved by the scope API.
|
||||
Since v5.17, and specifically after the commit 451769ebb7e79 ("mm/vmalloc:
|
||||
alloc GFP_NO{FS,IO} for vmalloc"), GFP_NOFS/GFP_NOIO are now supported in
|
||||
``[k]vmalloc`` by implicitly using scope API.
|
||||
|
||||
In earlier kernels ``vmalloc`` didn't support GFP_NOFS semantic because there
|
||||
were hardcoded GFP_KERNEL allocations deep inside the allocator. That means
|
||||
that calling ``vmalloc`` with GFP_NOFS/GFP_NOIO was almost always a bug.
|
||||
|
||||
In the ideal world, upper layers should already mark dangerous contexts
|
||||
and so no special care is required and vmalloc should be called without
|
||||
any problems. Sometimes if the context is not really clear or there are
|
||||
layering violations then the recommended way around that is to wrap ``vmalloc``
|
||||
by the scope API with a comment explaining the problem.
|
||||
and so no special care is required and ``vmalloc`` should be called without any
|
||||
problems. Sometimes if the context is not really clear or there are layering
|
||||
violations then the recommended way around that (on pre-v5.17 kernels) is to
|
||||
wrap ``vmalloc`` by the scope API with a comment explaining the problem.
|
||||
|
@ -52,6 +52,7 @@ Library functionality that is used throughout the kernel.
|
||||
wrappers/atomic_bitops
|
||||
floating-point
|
||||
union_find
|
||||
min_heap
|
||||
|
||||
Low level entry and exit
|
||||
========================
|
||||
|
300
Documentation/core-api/min_heap.rst
Normal file
@ -0,0 +1,300 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
============
|
||||
Min Heap API
|
||||
============
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
The Min Heap API provides a set of functions and macros for managing min-heaps
|
||||
in the Linux kernel. A min-heap is a binary tree structure where the value of
|
||||
each node is less than or equal to the values of its children, ensuring that
|
||||
the smallest element is always at the root.
|
||||
|
||||
This document provides a guide to the Min Heap API, detailing how to define and
|
||||
use min-heaps. Users should not directly call functions with **__min_heap_*()**
|
||||
prefixes, but should instead use the provided macro wrappers.
|
||||
|
||||
In addition to the standard version of the functions, the API also includes a
|
||||
set of inline versions for performance-critical scenarios. These inline
|
||||
functions have the same names as their non-inline counterparts but include an
|
||||
**_inline** suffix. For example, **__min_heap_init_inline** and its
|
||||
corresponding macro wrapper **min_heap_init_inline**. The inline versions allow
|
||||
custom comparison and swap functions to be called directly, rather than through
|
||||
indirect function calls. This can significantly reduce overhead, especially
|
||||
when CONFIG_MITIGATION_RETPOLINE is enabled, as indirect function calls become
|
||||
more expensive. As with the non-inline versions, it is important to use the
|
||||
macro wrappers for inline functions instead of directly calling the functions
|
||||
themselves.
|
||||
|
||||
Data Structures
|
||||
===============
|
||||
|
||||
Min-Heap Definition
|
||||
-------------------
|
||||
|
||||
The core data structure for representing a min-heap is defined using the
|
||||
**MIN_HEAP_PREALLOCATED** and **DEFINE_MIN_HEAP** macros. These macros allow
|
||||
you to define a min-heap with a preallocated buffer or dynamically allocated
|
||||
memory.
|
||||
|
||||
Example:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
#define MIN_HEAP_PREALLOCATED(_type, _name, _nr) \
struct _name { \
    int nr; /* Number of elements in the heap */ \
    int size; /* Maximum number of elements that can be held */ \
    _type *data; /* Pointer to the heap data */ \
    _type preallocated[_nr]; /* Static preallocated array */ \
}
|
||||
|
||||
#define DEFINE_MIN_HEAP(_type, _name) MIN_HEAP_PREALLOCATED(_type, _name, 0)
|
||||
|
||||
A typical heap structure will include a counter for the number of elements
|
||||
(`nr`), the maximum capacity of the heap (`size`), and a pointer to an array of
|
||||
elements (`data`). Optionally, you can specify a static array for preallocated
|
||||
heap storage using **MIN_HEAP_PREALLOCATED**.
|
||||
|
||||
Min Heap Callbacks
|
||||
------------------
|
||||
|
||||
The **struct min_heap_callbacks** provides customization options for ordering
|
||||
elements in the heap and swapping them. It contains two function pointers:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
struct min_heap_callbacks {
|
||||
bool (*less)(const void *lhs, const void *rhs, void *args);
|
||||
void (*swp)(void *lhs, void *rhs, void *args);
|
||||
};
|
||||
|
||||
- **less** is the comparison function used to establish the order of elements.
|
||||
- **swp** is a function for swapping elements in the heap. If swp is set to
|
||||
NULL, the default swap function will be used, which swaps the elements based on their size.
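As a rough illustration of how the two callbacks fit together, the following userspace sketch orders a hypothetical `struct task` by deadline and uses `args` to count swaps. The `struct min_heap_callbacks` here is a local stand-in mirroring the definition above so the snippet compiles outside the kernel; `struct task`, `task_less`, `task_swap` and `task_cb` are names invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in mirroring the callback struct shown above. */
struct min_heap_callbacks {
	bool (*less)(const void *lhs, const void *rhs, void *args);
	void (*swp)(void *lhs, void *rhs, void *args);
};

/* Hypothetical element type: a task ordered by its deadline. */
struct task {
	unsigned long deadline;
	int id;
};

/* Returns true when lhs must sit closer to the root than rhs. */
static bool task_less(const void *lhs, const void *rhs, void *args)
{
	const struct task *a = lhs, *b = rhs;

	(void)args;	/* unused by the comparison */
	return a->deadline < b->deadline;
}

/* Swaps two elements; args optionally points to a swap counter. */
static void task_swap(void *lhs, void *rhs, void *args)
{
	struct task *a = lhs, *b = rhs, tmp = *a;
	size_t *nr_swaps = args;

	*a = *b;
	*b = tmp;
	if (nr_swaps)
		(*nr_swaps)++;
}

static const struct min_heap_callbacks task_cb = {
	.less = task_less,
	.swp = task_swap,
};
```

In kernel code the same pair of functions would be passed through the macro wrappers described below, with `args` forwarded unchanged to both callbacks.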
|
||||
|
||||
Macro Wrappers
|
||||
==============
|
||||
|
||||
The following macro wrappers are provided for interacting with the heap in a
|
||||
user-friendly manner. Each macro corresponds to a function that operates on the
|
||||
heap, and they abstract away direct calls to internal functions.
|
||||
|
||||
Each macro accepts various parameters that are detailed below.
|
||||
|
||||
Heap Initialization
|
||||
--------------------
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
min_heap_init(heap, data, size);
|
||||
|
||||
- **heap**: A pointer to the min-heap structure to be initialized.
|
||||
- **data**: A pointer to the buffer where the heap elements will be stored. If
|
||||
`NULL`, the preallocated buffer within the heap structure will be used.
|
||||
- **size**: The maximum number of elements the heap can hold.
|
||||
|
||||
This macro initializes the heap, setting its initial state. If `data` is
|
||||
`NULL`, the preallocated memory inside the heap structure will be used for
|
||||
storage. Otherwise, the user-provided buffer is used. The operation is **O(1)**.
|
||||
|
||||
**Inline Version:** min_heap_init_inline(heap, data, size)
|
||||
|
||||
Accessing the Top Element
|
||||
-------------------------
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
element = min_heap_peek(heap);
|
||||
|
||||
- **heap**: A pointer to the min-heap from which to retrieve the smallest
|
||||
element.
|
||||
|
||||
This macro returns a pointer to the smallest element (the root) of the heap, or
|
||||
`NULL` if the heap is empty. The operation is **O(1)**.
|
||||
|
||||
**Inline Version:** min_heap_peek_inline(heap)
|
||||
|
||||
Heap Insertion
|
||||
--------------
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
success = min_heap_push(heap, element, callbacks, args);
|
||||
|
||||
- **heap**: A pointer to the min-heap into which the element should be inserted.
|
||||
- **element**: A pointer to the element to be inserted into the heap.
|
||||
- **callbacks**: A pointer to a `struct min_heap_callbacks` providing the
|
||||
`less` and `swp` functions.
|
||||
- **args**: Optional arguments passed to the `less` and `swp` functions.
|
||||
|
||||
This macro inserts an element into the heap. It returns `true` if the insertion
|
||||
was successful and `false` if the heap is full. The operation is **O(log n)**.
|
||||
|
||||
**Inline Version:** min_heap_push_inline(heap, element, callbacks, args)
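The insertion behaviour described above (append, then move the new element up until its parent is not larger, and refuse when the heap is full) can be sketched in plain userspace C. This is a simplified illustration on `int` elements, not the kernel implementation; `struct int_heap` and `heap_push` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-capacity heap of ints, for illustration only. */
struct int_heap {
	size_t nr;	/* elements currently stored */
	size_t size;	/* capacity of data[] */
	int *data;
};

/* Append the value at the end, then sift it up toward the root. */
static bool heap_push(struct int_heap *h, int value)
{
	size_t pos;

	if (h->nr >= h->size)
		return false;	/* heap is full */

	pos = h->nr++;
	h->data[pos] = value;

	while (pos > 0) {
		size_t parent = (pos - 1) / 2;
		int tmp;

		if (h->data[parent] <= h->data[pos])
			break;	/* heap property already holds */

		tmp = h->data[parent];
		h->data[parent] = h->data[pos];
		h->data[pos] = tmp;
		pos = parent;
	}
	return true;
}
```

Each loop iteration moves one level closer to the root, which is where the **O(log n)** bound comes from.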
|
||||
|
||||
Heap Removal
|
||||
------------
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
success = min_heap_pop(heap, callbacks, args);
|
||||
|
||||
- **heap**: A pointer to the min-heap from which to remove the smallest element.
|
||||
- **callbacks**: A pointer to a `struct min_heap_callbacks` providing the
|
||||
`less` and `swp` functions.
|
||||
- **args**: Optional arguments passed to the `less` and `swp` functions.
|
||||
|
||||
This macro removes the smallest element (the root) from the heap. It returns
|
||||
`true` if the element was successfully removed, or `false` if the heap is
|
||||
empty. The operation is **O(log n)**.
|
||||
|
||||
**Inline Version:** min_heap_pop_inline(heap, callbacks, args)
|
||||
|
||||
Heap Maintenance
|
||||
----------------
|
||||
|
||||
You can use the following macros to maintain the heap's structure:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
min_heap_sift_down(heap, pos, callbacks, args);
|
||||
|
||||
- **heap**: A pointer to the min-heap.
|
||||
- **pos**: The index from which to start sifting down.
|
||||
- **callbacks**: A pointer to a `struct min_heap_callbacks` providing the
|
||||
`less` and `swp` functions.
|
||||
- **args**: Optional arguments passed to the `less` and `swp` functions.
|
||||
|
||||
This macro restores the heap property by moving the element at the specified
|
||||
index (`pos`) down the heap until it is in the correct position. The operation
|
||||
is **O(log n)**.
|
||||
|
||||
**Inline Version:** min_heap_sift_down_inline(heap, pos, callbacks, args)
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
min_heap_sift_up(heap, idx, callbacks, args);
|
||||
|
||||
- **heap**: A pointer to the min-heap.
|
||||
- **idx**: The index of the element to sift up.
|
||||
- **callbacks**: A pointer to a `struct min_heap_callbacks` providing the
|
||||
`less` and `swp` functions.
|
||||
- **args**: Optional arguments passed to the `less` and `swp` functions.
|
||||
|
||||
This macro restores the heap property by moving the element at the specified
|
||||
index (`idx`) up the heap. The operation is **O(log n)**.
|
||||
|
||||
**Inline Version:** min_heap_sift_up_inline(heap, idx, callbacks, args)
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
min_heapify_all(heap, callbacks, args);
|
||||
|
||||
- **heap**: A pointer to the min-heap.
|
||||
- **callbacks**: A pointer to a `struct min_heap_callbacks` providing the
|
||||
`less` and `swp` functions.
|
||||
- **args**: Optional arguments passed to the `less` and `swp` functions.
|
||||
|
||||
This macro ensures that the entire heap satisfies the heap property. It is
|
||||
called when the heap is built from scratch or after many modifications. The
|
||||
operation is **O(n)**.
|
||||
|
||||
**Inline Version:** min_heapify_all_inline(heap, callbacks, args)
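To make the relationship between sifting down and building a whole heap more concrete, here is a minimal userspace sketch on an `int` array. It is an illustration of the general technique under simplifying assumptions (fixed element type, built-in comparison), not the kernel code; `sift_down` and `heapify_all` are hypothetical names.

```c
#include <stddef.h>

/* Move data[pos] down until neither child is smaller than it. */
static void sift_down(int *data, size_t nr, size_t pos)
{
	for (;;) {
		size_t left = 2 * pos + 1, right = left + 1, smallest = pos;
		int tmp;

		if (left < nr && data[left] < data[smallest])
			smallest = left;
		if (right < nr && data[right] < data[smallest])
			smallest = right;
		if (smallest == pos)
			return;	/* heap property holds at pos */

		tmp = data[pos];
		data[pos] = data[smallest];
		data[smallest] = tmp;
		pos = smallest;
	}
}

/* Bottom-up construction: sift down every non-leaf node, last one first. */
static void heapify_all(int *data, size_t nr)
{
	for (size_t i = nr / 2; i-- > 0; )
		sift_down(data, nr, i);
}
```

Because most nodes sit near the bottom of the tree and only travel a short distance, the bottom-up construction is linear overall, whereas inserting the elements one by one would cost O(n log n).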
|
||||
|
||||
Removing Specific Elements
|
||||
--------------------------
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
success = min_heap_del(heap, idx, callbacks, args);
|
||||
|
||||
- **heap**: A pointer to the min-heap.
|
||||
- **idx**: The index of the element to delete.
|
||||
- **callbacks**: A pointer to a `struct min_heap_callbacks` providing the
|
||||
`less` and `swp` functions.
|
||||
- **args**: Optional arguments passed to the `less` and `swp` functions.
|
||||
|
||||
This macro removes an element at the specified index (`idx`) from the heap and
|
||||
restores the heap property. The operation is **O(log n)**.
|
||||
|
||||
**Inline Version:** min_heap_del_inline(heap, idx, callbacks, args)
|
||||
|
||||
Other Utilities
|
||||
===============
|
||||
|
||||
- **min_heap_full(heap)**: Checks whether the heap is full.
|
||||
Complexity: **O(1)**.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
bool full = min_heap_full(heap);
|
||||
|
||||
- `heap`: A pointer to the min-heap to check.
|
||||
|
||||
This macro returns `true` if the heap is full, otherwise `false`.
|
||||
|
||||
**Inline Version:** min_heap_full_inline(heap)
|
||||
|
||||
- **min_heap_empty(heap)**: Checks whether the heap is empty.
|
||||
Complexity: **O(1)**.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
bool empty = min_heap_empty(heap);
|
||||
|
||||
- `heap`: A pointer to the min-heap to check.
|
||||
|
||||
This macro returns `true` if the heap is empty, otherwise `false`.
|
||||
|
||||
**Inline Version:** min_heap_empty_inline(heap)
|
||||
|
||||
Example Usage
|
||||
=============
|
||||
|
||||
An example usage of the min-heap API would involve defining a heap structure,
|
||||
initializing it, and inserting and removing elements as needed.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
#include <linux/min_heap.h>
|
||||
|
||||
/* Must match the bool-returning signature of min_heap_callbacks.less */
static bool my_less_function(const void *lhs, const void *rhs, void *args) {
    return *(const int *)lhs < *(const int *)rhs;
}
|
||||
|
||||
struct min_heap_callbacks heap_cb = {
|
||||
.less = my_less_function, /* Comparison function for heap order */
|
||||
.swp = NULL, /* Use default swap function */
|
||||
};
|
||||
|
||||
void example_usage(void) {
|
||||
/* Pre-populate the buffer with elements */
|
||||
int buffer[5] = {5, 2, 8, 1, 3};
|
||||
/* Declare a min-heap type (struct my_heap_type) and an instance of it */
DEFINE_MIN_HEAP(int, my_heap_type) my_heap;
|
||||
|
||||
/* Initialize the heap with preallocated buffer and size */
|
||||
min_heap_init(&my_heap, buffer, 5);
|
||||
|
||||
/* Build the heap using min_heapify_all */
|
||||
my_heap.nr = 5; /* Set the number of elements in the heap */
|
||||
min_heapify_all(&my_heap, &heap_cb, NULL);
|
||||
|
||||
/* Peek at the top element (should be 1 in this case) */
|
||||
int *top = min_heap_peek(&my_heap);
|
||||
pr_info("Top element: %d\n", *top);
|
||||
|
||||
/* Pop the top element (1) and get the new top (2) */
|
||||
min_heap_pop(&my_heap, &heap_cb, NULL);
|
||||
top = min_heap_peek(&my_heap);
|
||||
pr_info("New top element: %d\n", *top);
|
||||
|
||||
/* Insert a new element (0) and recheck the top */
|
||||
int new_element = 0;
|
||||
min_heap_push(&my_heap, &new_element, &heap_cb, NULL);
|
||||
top = min_heap_peek(&my_heap);
|
||||
pr_info("Top element after insertion: %d\n", *top);
|
||||
}
|
@ -151,6 +151,77 @@ the more significant 4-byte word.
|
||||
We always think of our offsets as if there were no quirk, and we translate
|
||||
them afterwards, before accessing the memory region.
|
||||
|
||||
Note on buffer lengths not multiple of 4
|
||||
----------------------------------------
|
||||
|
||||
To deal with memory layout quirks where groups of 4 bytes are laid out "little
|
||||
endian" relative to each other, but "big endian" within the group itself, the
|
||||
concept of groups of 4 bytes is intrinsic to the packing API (not to be
|
||||
confused with the memory access, which is performed byte by byte, though).
|
||||
|
||||
With buffer lengths not multiple of 4, this means one group will be incomplete.
|
||||
Depending on the quirks, this may lead to discontinuities in the bit fields
|
||||
accessible through the buffer. The packing API assumes discontinuities were not
|
||||
the intention of the memory layout, so it avoids them by effectively logically
|
||||
shortening the most significant group of 4 octets to the number of octets
|
||||
actually available.
|
||||
|
||||
Example with a 31 byte sized buffer given below. Physical buffer offsets are
|
||||
implicit, and increase from left to right within a group, and from top to
|
||||
bottom within a column.
|
||||
|
||||
No quirks:
|
||||
|
||||
::
|
||||
|
||||
30 29 28 | Group 7 (most significant)
|
||||
27 26 25 24 | Group 6
|
||||
23 22 21 20 | Group 5
|
||||
19 18 17 16 | Group 4
|
||||
15 14 13 12 | Group 3
|
||||
11 10 9 8 | Group 2
|
||||
7 6 5 4 | Group 1
|
||||
3 2 1 0 | Group 0 (least significant)
|
||||
|
||||
QUIRK_LSW32_IS_FIRST:
|
||||
|
||||
::
|
||||
|
||||
3 2 1 0 | Group 0 (least significant)
|
||||
7 6 5 4 | Group 1
|
||||
11 10 9 8 | Group 2
|
||||
15 14 13 12 | Group 3
|
||||
19 18 17 16 | Group 4
|
||||
23 22 21 20 | Group 5
|
||||
27 26 25 24 | Group 6
|
||||
30 29 28 | Group 7 (most significant)
|
||||
|
||||
QUIRK_LITTLE_ENDIAN:
|
||||
|
||||
::
|
||||
|
||||
28 29 30 | Group 7 (most significant)
|
||||
24 25 26 27 | Group 6
|
||||
20 21 22 23 | Group 5
|
||||
16 17 18 19 | Group 4
|
||||
12 13 14 15 | Group 3
|
||||
8 9 10 11 | Group 2
|
||||
4 5 6 7 | Group 1
|
||||
0 1 2 3 | Group 0 (least significant)
|
||||
|
||||
QUIRK_LITTLE_ENDIAN | QUIRK_LSW32_IS_FIRST:
|
||||
|
||||
::
|
||||
|
||||
0 1 2 3 | Group 0 (least significant)
|
||||
4 5 6 7 | Group 1
|
||||
8 9 10 11 | Group 2
|
||||
12 13 14 15 | Group 3
|
||||
16 17 18 19 | Group 4
|
||||
20 21 22 23 | Group 5
|
||||
24 25 26 27 | Group 6
|
||||
28 29 30 | Group 7 (most significant)
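The layouts above can be reproduced with a small userspace sketch that maps a logical byte number (0 being the least significant) to its physical offset in the buffer. This is an illustration derived from the diagrams, not the packing library's own code: ``byte_to_offset`` is a hypothetical helper, the two quirk constants are defined locally for the example, and the sketch assumes the incomplete group orders its bytes by the same rule as the complete ones.

```c
#include <stddef.h>

/* Local definitions for this sketch only. */
#define QUIRK_LITTLE_ENDIAN	(1u << 1)
#define QUIRK_LSW32_IS_FIRST	(1u << 2)

/*
 * Map logical byte number 'byte' (0 = least significant) of a buffer of
 * 'len' bytes to its physical offset, honouring the two quirks.
 */
static size_t byte_to_offset(size_t byte, size_t len, unsigned int quirks)
{
	size_t group = byte / 4;		/* which group of 4 bytes */
	size_t in_group = byte % 4;		/* significance within the group */
	size_t remaining = len - group * 4;
	size_t group_size = remaining < 4 ? remaining : 4;
	size_t group_start, offset_in_group;

	/* Least significant group first, or last (the default)? */
	if (quirks & QUIRK_LSW32_IS_FIRST)
		group_start = group * 4;
	else
		group_start = len - group * 4 - group_size;

	/* Least significant byte of the group first, or last (the default)? */
	if (quirks & QUIRK_LITTLE_ENDIAN)
		offset_in_group = in_group;
	else
		offset_in_group = group_size - 1 - in_group;

	return group_start + offset_in_group;
}
```

With ``len`` set to 31, group 7 is shortened to three bytes, so with no quirks logical byte 30 lands at physical offset 0, and with both quirks set every logical byte number equals its physical offset.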
|
||||
|
||||
Intended use
|
||||
------------
|
||||
|
||||
|
@ -209,12 +209,17 @@ Struct Resources
|
||||
::
|
||||
|
||||
%pr [mem 0x60000000-0x6fffffff flags 0x2200] or
|
||||
[mem 0x60000000 flags 0x2200] or
|
||||
[mem 0x0000000060000000-0x000000006fffffff flags 0x2200]
|
||||
[mem 0x0000000060000000 flags 0x2200]
|
||||
%pR [mem 0x60000000-0x6fffffff pref] or
|
||||
[mem 0x60000000 pref] or
|
||||
[mem 0x0000000060000000-0x000000006fffffff pref]
|
||||
[mem 0x0000000060000000 pref]
|
||||
|
||||
For printing struct resources. The ``R`` and ``r`` specifiers result in a
|
||||
printed resource with (R) or without (r) a decoded flags member.
|
||||
printed resource with (R) or without (r) a decoded flags member. If start is
|
||||
equal to end only print the start value.
|
||||
|
||||
Passed by reference.
|
||||
|
||||
@ -231,6 +236,19 @@ width of the CPU data path.
|
||||
|
||||
Passed by reference.
|
||||
|
||||
Struct Range
|
||||
------------
|
||||
|
||||
::
|
||||
|
||||
%pra [range 0x0000000060000000-0x000000006fffffff] or
|
||||
[range 0x0000000060000000]
|
||||
|
||||
For printing struct range. struct range holds an arbitrary range of u64
|
||||
values. If start is equal to end only print the start value.
|
||||
|
||||
Passed by reference.
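The "print only the start value when start equals end" rule can be mimicked in userspace as follows. This is a sketch of the output format shown above, not the kernel's vsprintf code; ``format_range`` is a hypothetical helper and the ``struct range`` here is a local stand-in holding two u64 bounds.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-in for a range of u64 values. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Format like the %pra examples above; returns snprintf()'s result. */
static int format_range(char *buf, size_t len, const struct range *r)
{
	if (r->start == r->end)
		return snprintf(buf, len, "[range 0x%016" PRIx64 "]",
				r->start);

	return snprintf(buf, len,
			"[range 0x%016" PRIx64 "-0x%016" PRIx64 "]",
			r->start, r->end);
}
```

In kernel code no such helper is needed: the range is passed by reference to the format specifier, along the lines of ``pr_info("%pra\n", &r)``.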
|
||||
|
||||
DMA address types dma_addr_t
|
||||
----------------------------
|
||||
|
||||
|
@ -295,9 +295,9 @@ slot set.
|
||||
|
||||
Fourth, the io_tlb_slot array keeps track of any "padding slots" allocated to
|
||||
meet alloc_align_mask requirements described above. When
|
||||
swiotlb_tlb_map_single() allocates bounce buffer space to meet alloc_align_mask
|
||||
swiotlb_tbl_map_single() allocates bounce buffer space to meet alloc_align_mask
|
||||
requirements, it may allocate pre-padding space across zero or more slots. But
|
||||
when swiotbl_tlb_unmap_single() is called with the bounce buffer address, the
|
||||
when swiotlb_tbl_unmap_single() is called with the bounce buffer address, the
|
||||
alloc_align_mask value that governed the allocation, and therefore the
|
||||
allocation of any padding slots, is not known. The "pad_slots" field records
|
||||
the number of padding slots so that swiotlb_tbl_unmap_single() can free them.
|
||||
|
@ -245,8 +245,8 @@ CPU which can be assigned to the work items of a wq. For example, with
|
||||
at the same time per CPU. This is always a per-CPU attribute, even for
|
||||
unbound workqueues.
|
||||
|
||||
The maximum limit for ``@max_active`` is 512 and the default value used
|
||||
when 0 is specified is 256. These values are chosen sufficiently high
|
||||
The maximum limit for ``@max_active`` is 2048 and the default value used
|
||||
when 0 is specified is 1024. These values are chosen sufficiently high
|
||||
such that they are not the limiting factor while providing protection in
|
||||
runaway cases.
|
||||
|
||||
@ -357,6 +357,11 @@ Guidelines
|
||||
difference in execution characteristics between using a dedicated wq
|
||||
and a system wq.
|
||||
|
||||
Note: If something may generate more than @max_active outstanding
|
||||
work items (do stress test your producers), it may saturate a system
|
||||
wq and potentially lead to deadlock. It should utilize its own
|
||||
dedicated workqueue rather than the system wq.
|
||||
|
||||
* Unless work items are expected to consume a huge amount of CPU
|
||||
cycles, using a bound wq is usually beneficial due to the increased
|
||||
level of locality in wq operations and work item execution.
|
||||
|
@ -8,10 +8,10 @@ Asymmetric Cipher API
|
||||
---------------------
|
||||
|
||||
.. kernel-doc:: include/crypto/akcipher.h
|
||||
:doc: Generic Public Key API
|
||||
:doc: Generic Public Key Cipher API
|
||||
|
||||
.. kernel-doc:: include/crypto/akcipher.h
|
||||
:functions: crypto_alloc_akcipher crypto_free_akcipher crypto_akcipher_set_pub_key crypto_akcipher_set_priv_key crypto_akcipher_maxsize crypto_akcipher_encrypt crypto_akcipher_decrypt crypto_akcipher_sign crypto_akcipher_verify
|
||||
:functions: crypto_alloc_akcipher crypto_free_akcipher crypto_akcipher_set_pub_key crypto_akcipher_set_priv_key crypto_akcipher_maxsize crypto_akcipher_encrypt crypto_akcipher_decrypt
|
||||
|
||||
Asymmetric Cipher Request Handle
|
||||
--------------------------------
|
||||
|
15
Documentation/crypto/api-sig.rst
Normal file
@ -0,0 +1,15 @@
|
||||
Asymmetric Signature Algorithm Definitions
|
||||
------------------------------------------
|
||||
|
||||
.. kernel-doc:: include/crypto/sig.h
|
||||
:functions: sig_alg
|
||||
|
||||
Asymmetric Signature API
|
||||
------------------------
|
||||
|
||||
.. kernel-doc:: include/crypto/sig.h
|
||||
:doc: Generic Public Key Signature API
|
||||
|
||||
.. kernel-doc:: include/crypto/sig.h
|
||||
:functions: crypto_alloc_sig crypto_free_sig crypto_sig_set_pubkey crypto_sig_set_privkey crypto_sig_keysize crypto_sig_maxsize crypto_sig_digestsize crypto_sig_sign crypto_sig_verify
|
||||
|
@ -10,4 +10,5 @@ Programming Interface
|
||||
api-digest
|
||||
api-rng
|
||||
api-akcipher
|
||||
api-sig
|
||||
api-kpp
|
||||
|
@ -214,6 +214,8 @@ the aforementioned cipher types:
|
||||
|
||||
- CRYPTO_ALG_TYPE_AKCIPHER Asymmetric cipher
|
||||
|
||||
- CRYPTO_ALG_TYPE_SIG Asymmetric signature
|
||||
|
||||
- CRYPTO_ALG_TYPE_PCOMPRESS Enhanced version of
|
||||
CRYPTO_ALG_TYPE_COMPRESS allowing for segmented compression /
|
||||
decompression instead of performing the operation on one segment
|
||||
|
168
Documentation/dev-tools/autofdo.rst
Normal file
@ -0,0 +1,168 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===================================
|
||||
Using AutoFDO with the Linux kernel
|
||||
===================================
|
||||
|
||||
This enables AutoFDO build support for the kernel when using
|
||||
the Clang compiler. AutoFDO (Auto-Feedback-Directed Optimization)
|
||||
is a type of profile-guided optimization (PGO) used to enhance the
|
||||
performance of binary executables. It gathers information about the
|
||||
frequency of execution of various code paths within a binary using
|
||||
hardware sampling. This data is then used to guide the compiler's
|
||||
optimization decisions, resulting in a more efficient binary. AutoFDO
|
||||
is a powerful optimization technique, and data indicates that it can
|
||||
significantly improve kernel performance. It's especially beneficial
|
||||
for workloads affected by front-end stalls.
|
||||
|
||||
For AutoFDO builds, unlike non-FDO builds, the user must supply a
|
||||
profile. Acquiring an AutoFDO profile can be done in several ways.
|
||||
AutoFDO profiles are created by converting hardware samples collected
with the "perf" tool. It is crucial that the workload used to create
these perf files is representative; it must exhibit runtime
characteristics similar to the workloads that are intended to be
|
||||
optimized. Failure to do so will result in the compiler optimizing
|
||||
for the wrong objective.
|
||||
|
||||
The AutoFDO profile often encapsulates the program's behavior. If the
|
||||
performance-critical code is architecture-independent, the profile
|
||||
can be applied across platforms to achieve performance gains. For
|
||||
instance, using the profile generated on Intel architecture to build
|
||||
a kernel for AMD architecture can also yield performance improvements.
|
||||
|
||||
There are two methods for acquiring a representative profile:
|
||||
(1) Sample real workloads using a production environment.
|
||||
(2) Generate the profile using a representative load test.
|
||||
When enabling the AutoFDO build configuration without providing an
|
||||
AutoFDO profile, the compiler only modifies the dwarf information in
|
||||
the kernel without impacting runtime performance. It's advisable to
|
||||
use a kernel binary built with the same AutoFDO configuration to
|
||||
collect the perf profile. While it's possible to use a kernel built
|
||||
with different options, it may result in inferior performance.
|
||||
|
||||
One can collect profiles using an AutoFDO build of the previous kernel.
|
||||
AutoFDO employs relative line numbers to match the profiles, offering
|
||||
some tolerance for source changes. This mode is commonly used in a
|
||||
production environment for profile collection.
|
||||
|
||||
In a profile collection based on a load test, the AutoFDO collection
|
||||
process consists of the following steps:
|
||||
|
||||
#. Initial build: The kernel is built with AutoFDO options
|
||||
without a profile.
|
||||
|
||||
#. Profiling: The above kernel is then run with a representative
|
||||
workload to gather execution frequency data. This data is
|
||||
collected using hardware sampling, via perf. AutoFDO is most
|
||||
effective on platforms supporting advanced PMU features like
|
||||
LBR on Intel machines.
|
||||
|
||||
#. AutoFDO profile generation: Perf output file is converted to
|
||||
the AutoFDO profile via offline tools.
|
||||
|
||||
AutoFDO support requires Clang from LLVM 17 or later.
|
||||
|
||||
Preparation
|
||||
===========
|
||||
|
||||
Configure the kernel with::
|
||||
|
||||
CONFIG_AUTOFDO_CLANG=y
|
||||
|
||||
Customization
|
||||
=============
|
||||
|
||||
The default CONFIG_AUTOFDO_CLANG setting covers kernel space objects for
|
||||
AutoFDO builds. One can, however, enable or disable AutoFDO build for
|
||||
individual files and directories by adding a line similar to the following
|
||||
to the respective kernel Makefile:
|
||||
|
||||
- For enabling a single file (e.g. foo.o) ::
|
||||
|
||||
AUTOFDO_PROFILE_foo.o := y
|
||||
|
||||
- For enabling all files in one directory ::
|
||||
|
||||
AUTOFDO_PROFILE := y
|
||||
|
||||
- For disabling one file ::
|
||||
|
||||
AUTOFDO_PROFILE_foo.o := n
|
||||
|
||||
- For disabling all files in one directory ::
|
||||
|
||||
AUTOFDO_PROFILE := n
|
||||
|
||||
Workflow
|
||||
========
|
||||
|
||||
Here is an example workflow for AutoFDO kernel:
|
||||
|
||||
1) Build the kernel on the host machine with LLVM enabled,
|
||||
for example, ::
|
||||
|
||||
$ make menuconfig LLVM=1
|
||||
|
||||
Turn on AutoFDO build config::
|
||||
|
||||
CONFIG_AUTOFDO_CLANG=y
|
||||
|
||||
With a configuration that has LLVM enabled, use the following command::
|
||||
|
||||
$ scripts/config -e AUTOFDO_CLANG
|
||||
|
||||
After getting the config, build with ::
|
||||
|
||||
$ make LLVM=1
|
||||
|
||||
2) Install the kernel on the test machine.
|
||||
|
||||
3) Run the load tests. The '-c' option in perf specifies the sample
|
||||
event period. We suggest using a suitable prime number, like 500009,
|
||||
for this purpose.
|
||||
|
||||
- For Intel platforms::
|
||||
|
||||
$ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
|
||||
|
||||
- For AMD platforms:
|
||||
|
||||
The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2. To check for support:
|
||||
|
||||
For Zen3::
|
||||
|
||||
$ cat /proc/cpuinfo | grep " brs"
|
||||
|
||||
For Zen4::
|
||||
|
||||
$ cat /proc/cpuinfo | grep amd_lbr_v2
|
||||
|
||||
The following command generates the perf data file::
|
||||
|
||||
$ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
|
||||
|
||||
4) (Optional) Download the raw perf file to the host machine.
|
||||
|
||||
5) To generate an AutoFDO profile, two offline tools are available:
|
||||
create_llvm_prof and llvm-profgen. The create_llvm_prof tool is part
|
||||
of the AutoFDO project and can be found on GitHub
|
||||
(https://github.com/google/autofdo), version v0.30.1 or later.
|
||||
The llvm-profgen tool is included in the LLVM compiler itself. Note
that the version of llvm-profgen doesn't need to match the version of
Clang used to build the kernel, but it needs to come from the LLVM 19
release or later, or from the LLVM trunk. ::
|
||||
|
||||
$ llvm-profgen --kernel --binary=<vmlinux> --perfdata=<perf_file> -o <profile_file>
|
||||
|
||||
or ::
|
||||
|
||||
$ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> --format=extbinary --out=<profile_file>
|
||||
|
||||
Note that multiple AutoFDO profile files can be merged into one via::
|
||||
|
||||
$ llvm-profdata merge -o <profile_file> <profile_1> <profile_2> ... <profile_n>
|
||||
|
||||
6) Rebuild the kernel using the AutoFDO profile file with the same config as step 1,
|
||||
(Note CONFIG_AUTOFDO_CLANG needs to be enabled)::
|
||||
|
||||
$ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file>
|
@ -470,8 +470,6 @@ API usage
|
||||
usleep_range() should be preferred over udelay(). The proper way of
|
||||
using usleep_range() is mentioned in the kernel docs.
|
||||
|
||||
See: https://www.kernel.org/doc/html/latest/timers/timers-howto.html#delays-information-on-the-various-kernel-delay-sleep-mechanisms
|
||||
|
||||
|
||||
Comments
|
||||
--------
|
||||
|
@ -250,25 +250,17 @@ variables for .cocciconfig is as follows:
|
||||
- Your directory from which spatch is called is processed next
|
||||
- The directory provided with the ``--dir`` option is processed last, if used
|
||||
|
||||
Since coccicheck runs through make, it naturally runs from the kernel
|
||||
proper dir; as such the second rule above would be implied for picking up a
|
||||
.cocciconfig when using ``make coccicheck``.
|
||||
|
||||
``make coccicheck`` also supports using M= targets. If you do not supply
|
||||
any M= target, it is assumed you want to target the entire kernel.
|
||||
The kernel coccicheck script has::
|
||||
|
||||
if [ "$KBUILD_EXTMOD" = "" ] ; then
|
||||
OPTIONS="--dir $srctree $COCCIINCLUDE"
|
||||
else
|
||||
OPTIONS="--dir $KBUILD_EXTMOD $COCCIINCLUDE"
|
||||
fi
|
||||
OPTIONS="--dir $srcroot $COCCIINCLUDE"
|
||||
|
||||
KBUILD_EXTMOD is set when an explicit target with M= is used. For both cases
|
||||
the spatch ``--dir`` argument is used, as such third rule applies when whether
|
||||
M= is used or not, and when M= is used the target directory can have its own
|
||||
.cocciconfig file. When M= is not passed as an argument to coccicheck the
|
||||
target directory is the same as the directory from where spatch was called.
|
||||
Here, $srcroot refers to the source directory of the target: it points to the
|
||||
external module's source directory when M= is used, and otherwise, to the kernel
|
||||
source directory. The third rule ensures that spatch reads the .cocciconfig from
|
||||
the target directory, allowing external modules to have their own .cocciconfig
|
||||
file.
|
||||
|
||||
If not using the kernel's coccicheck target, keep the above precedence
|
||||
order logic of .cocciconfig reading. If using the kernel's coccicheck target,
|
||||
|
@ -23,7 +23,7 @@ Possible uses:
|
||||
associated code is never run?)
|
||||
|
||||
.. _gcov: https://gcc.gnu.org/onlinedocs/gcc/Gcov.html
|
||||
.. _lcov: http://ltp.sourceforge.net/coverage/lcov.php
|
||||
.. _lcov: https://github.com/linux-test-project/lcov
|
||||
|
||||
|
||||
Preparation
|
||||
|
@ -34,6 +34,8 @@ Documentation/dev-tools/testing-overview.rst
|
||||
ktap
|
||||
checkuapi
|
||||
gpio-sloppy-logic-analyzer
|
||||
autofdo
|
||||
propeller
|
||||
|
||||
|
||||
.. only:: subproject and html
|
||||
|
@ -511,19 +511,14 @@ Tests
|
||||
~~~~~
|
||||
|
||||
There are KASAN tests that allow verifying that KASAN works and can detect
|
||||
certain types of memory corruptions. The tests consist of two parts:
|
||||
certain types of memory corruptions.
|
||||
|
||||
1. Tests that are integrated with the KUnit Test Framework. Enabled with
|
||||
``CONFIG_KASAN_KUNIT_TEST``. These tests can be run and partially verified
|
||||
All KASAN tests are integrated with the KUnit Test Framework and can be enabled
|
||||
via ``CONFIG_KASAN_KUNIT_TEST``. The tests can be run and partially verified
|
||||
automatically in a few different ways; see the instructions below.
|
||||
|
||||
2. Tests that are currently incompatible with KUnit. Enabled with
|
||||
``CONFIG_KASAN_MODULE_TEST`` and can only be run as a module. These tests can
|
||||
only be verified manually by loading the kernel module and inspecting the
|
||||
kernel log for KASAN reports.
|
||||
|
||||
Each KUnit-compatible KASAN test prints one of multiple KASAN reports if an
|
||||
error is detected. Then the test prints its number and status.
|
||||
Each KASAN test prints one of multiple KASAN reports if an error is detected.
|
||||
Then the test prints its number and status.
|
||||
|
||||
When a test passes::
|
||||
|
||||
@ -550,16 +545,16 @@ Or, if one of the tests failed::
|
||||
|
||||
not ok 1 - kasan
|
||||
|
||||
There are a few ways to run KUnit-compatible KASAN tests.
|
||||
There are a few ways to run the KASAN tests.
|
||||
|
||||
1. Loadable module
|
||||
|
||||
With ``CONFIG_KUNIT`` enabled, KASAN-KUnit tests can be built as a loadable
|
||||
module and run by loading ``kasan_test.ko`` with ``insmod`` or ``modprobe``.
|
||||
With ``CONFIG_KUNIT`` enabled, the tests can be built as a loadable module
|
||||
and run by loading ``kasan_test.ko`` with ``insmod`` or ``modprobe``.
|
||||
|
||||
2. Built-In
|
||||
|
||||
With ``CONFIG_KUNIT`` built-in, KASAN-KUnit tests can be built-in as well.
|
||||
With ``CONFIG_KUNIT`` built-in, the tests can be built-in as well.
|
||||
In this case, the tests will run at boot as a late-init call.
|
||||
|
||||
3. Using kunit_tool
|
||||
|
@ -75,11 +75,11 @@ supports it for the architecture you are using, you can use hardware
|
||||
breakpoints if you desire to run with the ``CONFIG_STRICT_KERNEL_RWX``
|
||||
option turned on, else you need to turn off this option.
|
||||
|
||||
Next you should choose one of more I/O drivers to interconnect debugging
|
||||
Next you should choose one or more I/O drivers to interconnect the debugging
|
||||
host and debugged target. Early boot debugging requires a KGDB I/O
|
||||
driver that supports early debugging and the driver must be built into
|
||||
the kernel directly. Kgdb I/O driver configuration takes place via
|
||||
kernel or module parameters which you can learn more about in the in the
|
||||
kernel or module parameters which you can learn more about in the
|
||||
section that describes the parameter kgdboc.
|
||||
|
||||
Here is an example set of ``.config`` symbols to enable or disable for kgdb::
|
||||
@ -201,8 +201,8 @@ Using loadable module or built-in
|
||||
Configure kgdboc at runtime with sysfs
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
At run time you can enable or disable kgdboc by echoing a parameters
|
||||
into the sysfs. Here are two examples:
|
||||
At run time you can enable or disable kgdboc by writing parameters
|
||||
into sysfs. Here are two examples:
|
||||
|
||||
1. Enable kgdboc on ttyS0::
|
||||
|
||||
@ -329,7 +329,7 @@ ways to activate this feature.
|
||||
|
||||
2. Use sysfs before configuring an I/O driver::
|
||||
|
||||
echo 1 > /sys/module/kgdb/parameters/kgdb_use_con
|
||||
echo 1 > /sys/module/debug_core/parameters/kgdb_use_con
|
||||
|
||||
.. note::
|
||||
|
||||
@ -374,10 +374,10 @@ default behavior is always set to 0.
|
||||
Kernel parameter: ``nokaslr``
|
||||
-----------------------------
|
||||
|
||||
If the architecture that you are using enable KASLR by default,
|
||||
If the architecture that you are using enables KASLR by default,
|
||||
you should consider turning it off. KASLR randomizes the
|
||||
virtual address where the kernel image is mapped and confuse
|
||||
gdb which resolve kernel symbol address from symbol table
|
||||
virtual address where the kernel image is mapped and confuses
|
||||
gdb which resolves addresses of kernel symbols from the symbol table
|
||||
of vmlinux.
|
||||
|
||||
Using kdb
|
||||
@ -631,8 +631,6 @@ automatically changes into kgdb mode.
|
||||
|
||||
kgdb
|
||||
|
||||
Now disconnect your terminal program and connect gdb in its place
|
||||
|
||||
2. At the kdb prompt, disconnect the terminal program and connect gdb in
|
||||
its place.
|
||||
|
||||
@ -749,7 +747,7 @@ The kernel debugger is organized into a number of components:
|
||||
helper functions in some of the other kernel components to make it
|
||||
possible for kdb to examine and report information about the kernel
|
||||
without taking locks that could cause a kernel deadlock. The kdb core
|
||||
contains implements the following functionality.
|
||||
implements the following functionality.
|
||||
|
||||
- A simple shell
|
||||
|
||||
|
@ -161,6 +161,7 @@ See the include/linux/kmemleak.h header for the functions prototype.
|
||||
- ``kmemleak_free_percpu`` - notify of a percpu memory block freeing
|
||||
- ``kmemleak_update_trace`` - update object allocation stack trace
|
||||
- ``kmemleak_not_leak`` - mark an object as not a leak
|
||||
- ``kmemleak_transient_leak`` - mark an object as a transient leak
|
||||
- ``kmemleak_ignore`` - do not scan or report an object as leak
|
||||
- ``kmemleak_scan_area`` - add scan areas inside a memory block
|
||||
- ``kmemleak_no_scan`` - do not scan a memory block
|
||||
|
@ -133,7 +133,7 @@ KMSAN shadow memory
|
||||
-------------------
|
||||
|
||||
KMSAN associates a metadata byte (also called shadow byte) with every byte of
|
||||
kernel memory. A bit in the shadow byte is set iff the corresponding bit of the
|
||||
kernel memory. A bit in the shadow byte is set if the corresponding bit of the
|
||||
kernel memory byte is uninitialized. Marking the memory uninitialized (i.e.
|
||||
setting its shadow bytes to ``0xff``) is called poisoning, marking it
|
||||
initialized (setting the shadow bytes to ``0x00``) is called unpoisoning.
|
||||
|
@ -31,6 +31,15 @@ kselftest runs as a userspace process. Tests that can be written/run in
|
||||
userspace may wish to use the `Test Harness`_. Tests that need to be
|
||||
run in kernel space may wish to use a `Test Module`_.
|
||||
|
||||
Documentation on the tests
|
||||
==========================
|
||||
|
||||
For documentation on the kselftests themselves, see:
|
||||
|
||||
.. toctree::
|
||||
|
||||
testing-devices
|
||||
|
||||
Running the selftests (hotplug tests are run in limited mode)
|
||||
=============================================================
|
||||
|
||||
|
162
Documentation/dev-tools/propeller.rst
Normal file
@ -0,0 +1,162 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=====================================
|
||||
Using Propeller with the Linux kernel
|
||||
=====================================
|
||||
|
||||
This enables Propeller build support for the kernel when using the Clang
|
||||
compiler. Propeller is a profile-guided optimization (PGO) method used
|
||||
to optimize binary executables. Like AutoFDO, it utilizes hardware
|
||||
sampling to gather information about the frequency of execution of
|
||||
different code paths within a binary. Unlike AutoFDO, this information
|
||||
is then used right before the linking phase to optimize (among others)
|
||||
block layout within and across functions.
|
||||
|
||||
A few important notes about adopting Propeller optimization:
|
||||
|
||||
#. Although it can be used as a standalone optimization step, it is
|
||||
strongly recommended to apply Propeller on top of AutoFDO,
|
||||
AutoFDO+ThinLTO or Instrument FDO. The rest of this document
|
||||
assumes this paradigm.
|
||||
|
||||
#. Propeller uses another round of profiling on top of
|
||||
AutoFDO/AutoFDO+ThinLTO/iFDO. The whole build process involves
|
||||
"build-afdo - train-afdo - build-propeller - train-propeller -
|
||||
build-optimized".
|
||||
|
||||
#. Propeller requires the LLVM 19 release or later for Clang/Clang++
|
||||
and the linker (ld.lld).
|
||||
|
||||
#. In addition to the LLVM toolchain, Propeller requires a profiling
|
||||
conversion tool: https://github.com/google/autofdo with a release
|
||||
after v0.30.1: https://github.com/google/autofdo/releases/tag/v0.30.1.
|
||||
|
||||
The Propeller optimization process involves the following steps:
|
||||
|
||||
#. Initial building: Build the AutoFDO or AutoFDO+ThinLTO binary as
|
||||
you would normally do, but with a set of compile-time / link-time
|
||||
flags, so that a special metadata section is created within the
|
||||
kernel binary. The special section is only intended to be used by the
|
||||
profiling tool; it is not part of the runtime image, nor does it
|
||||
change kernel run time text sections.
|
||||
|
||||
#. Profiling: The above kernel is then run with a representative
|
||||
workload to gather execution frequency data. This data is collected
|
||||
using hardware sampling, via perf. Propeller is most effective on
|
||||
platforms supporting advanced PMU features like LBR on Intel
|
||||
machines. This step is the same as profiling the kernel for AutoFDO
|
||||
(the exact perf parameters can be different).
|
||||
|
||||
#. Propeller profile generation: The perf output file is converted to a
|
||||
pair of Propeller profiles via an offline tool.
|
||||
|
||||
#. Optimized build: Build the AutoFDO or AutoFDO+ThinLTO optimized
|
||||
binary as you would normally do, but with a compile-time /
|
||||
link-time flag to pick up the Propeller compile time and link time
|
||||
profiles. This build step uses 3 profiles - the AutoFDO profile,
|
||||
the Propeller compile-time profile and the Propeller link-time
|
||||
profile.
|
||||
|
||||
#. Deployment: The optimized kernel binary is deployed and used
|
||||
in production environments, providing improved performance
|
||||
and reduced latency.
|
||||
|
||||
Preparation
|
||||
===========
|
||||
|
||||
Configure the kernel with::
|
||||
|
||||
CONFIG_AUTOFDO_CLANG=y
|
||||
CONFIG_PROPELLER_CLANG=y
|
||||
|
||||
Customization
|
||||
=============
|
||||
|
||||
The default CONFIG_PROPELLER_CLANG setting covers kernel space objects
|
||||
for Propeller builds. One can, however, enable or disable Propeller build
|
||||
for individual files and directories by adding a line similar to the
|
||||
following to the respective kernel Makefile:
|
||||
|
||||
- For enabling a single file (e.g. foo.o)::
|
||||
|
||||
PROPELLER_PROFILE_foo.o := y
|
||||
|
||||
- For enabling all files in one directory::
|
||||
|
||||
PROPELLER_PROFILE := y
|
||||
|
||||
- For disabling one file::
|
||||
|
||||
PROPELLER_PROFILE_foo.o := n
|
||||
|
||||
- For disabling all files in one directory::
|
||||
|
||||
PROPELLER_PROFILE := n
|
||||
|
||||
|
||||
Workflow
|
||||
========
|
||||
|
||||
Here is an example workflow for building an AutoFDO+Propeller kernel:
|
||||
|
||||
1) Assuming an AutoFDO profile is already collected following
|
||||
instructions in the AutoFDO document, build the kernel on the host
|
||||
machine, with AutoFDO and Propeller build configs ::
|
||||
|
||||
CONFIG_AUTOFDO_CLANG=y
|
||||
CONFIG_PROPELLER_CLANG=y
|
||||
|
||||
and ::
|
||||
|
||||
$ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo-profile-name>
|
||||
|
||||
2) Install the kernel on the test machine.
|
||||
|
||||
3) Run the load tests. The '-c' option in perf specifies the sample
|
||||
event period. We suggest using a suitable prime number, like 500009,
|
||||
for this purpose.
|
||||
|
||||
- For Intel platforms::
|
||||
|
||||
$ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
|
||||
|
||||
- For AMD platforms::
|
||||
|
||||
$ perf record --pfm-event RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
|
||||
|
||||
Note you can repeat the above steps to collect multiple <perf_file>s.
|
||||
|
||||
4) (Optional) Download the raw perf file(s) to the host machine.
|
||||
|
||||
5) Use the create_llvm_prof tool (https://github.com/google/autofdo) to
|
||||
generate the Propeller profile. ::
|
||||
|
||||
$ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
|
||||
--format=propeller --propeller_output_module_name \
|
||||
--out=<propeller_profile_prefix>_cc_profile.txt \
|
||||
--propeller_symorder=<propeller_profile_prefix>_ld_profile.txt
|
||||
|
||||
"<propeller_profile_prefix>" can be something like "/home/user/dir/any_string".
|
||||
|
||||
This command generates a pair of Propeller profiles:
|
||||
"<propeller_profile_prefix>_cc_profile.txt" and
|
||||
"<propeller_profile_prefix>_ld_profile.txt".
|
||||
|
||||
If more than one perf_file was collected in the previous step,
|
||||
you can create a temp list file "<perf_file_list>" with each line
|
||||
containing one perf file name and run::
|
||||
|
||||
$ create_llvm_prof --binary=<vmlinux> --profile=@<perf_file_list> \
|
||||
--format=propeller --propeller_output_module_name \
|
||||
--out=<propeller_profile_prefix>_cc_profile.txt \
|
||||
--propeller_symorder=<propeller_profile_prefix>_ld_profile.txt
|
||||
|
||||
6) Rebuild the kernel using the AutoFDO and Propeller
|
||||
profiles. ::
|
||||
|
||||
CONFIG_AUTOFDO_CLANG=y
|
||||
CONFIG_PROPELLER_CLANG=y
|
||||
|
||||
and ::
|
||||
|
||||
$ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file> CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>
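Editor's note, not part of the commit: step 5 of the workflow above mentions a list file with one perf file name per line, to be passed as ``--profile=@<perf_file_list>``. The following is a minimal sketch of one way to produce such a file. The helper name ``make_perf_list`` and the ``*.data`` naming of the perf output files are assumptions made for illustration; neither comes from the document.

```shell
#!/bin/sh
# Hypothetical helper (not from the kernel documentation): collect the perf
# output files found in a directory into a list file, one path per line.
# Assumes the perf files were saved with a ".data" suffix.
# Usage: make_perf_list <perf_dir> <perf_file_list>
make_perf_list() {
    perf_dir=$1
    list_file=$2
    : > "$list_file"                  # create or truncate the list file
    for f in "$perf_dir"/*.data; do
        [ -e "$f" ] || continue       # skip the literal pattern if nothing matched
        printf '%s\n' "$f" >> "$list_file"
    done
}
```

The resulting file could then be passed to the create_llvm_prof invocation shown in step 5 in place of ``<perf_file_list>``.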
|
Some files were not shown because too many files have changed in this diff