ASoC: Fixes for v6.13

A few small fixes for v6.13, all system specific - the biggest thing is
 the fix for jack handling over suspend on some Intel laptops.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmdR3IYACgkQJNaLcl1U
 h9CaAAgAhU+wN7LEym7648q33gVEy/I335+ZHf0gLEMnAF1iNzoOx0Gy3cXBPHq3
 sic1P37UmkIIWi6BTr19aBxQ9Z0Vk3WhUsk+elmg3vhR1lodBZ9m8lYlLyEGbCh/
 Ur/AFSoewPbYGdJAVL7FclDiMXlnallF6xFWbh9O9Fw85hTh4WF07dRyws8j9tZQ
 qMkoic95lLPZTCplt8vHVC9sTXWuVp1HNiUKZDLLQ044PRlehLH21W4HJJgk/Dtl
 TO5U1zZpY3gB1QsxaR3+6vMDgPHHCUxvRkb4/hydHmKIqoFGuu0Ootipm9su1b/L
 jOKeEX2Gk6416fHpTWPUvJQTlv9MAA==
 =lnCs
 -----END PGP SIGNATURE-----

Merge tag 'asoc-fix-v6.13-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.13

A few small fixes for v6.13, all system specific - the biggest thing is
the fix for jack handling over suspend on some Intel laptops.
Merged by Takashi Iwai on 2024-12-05 18:09:29 +01:00
commit c34e9ab9a6
13256 changed files with 492845 additions and 320095 deletions

.clippy.toml (new file, 9 lines added)

@ -0,0 +1,9 @@
# SPDX-License-Identifier: GPL-2.0
check-private-items = true
disallowed-macros = [
# The `clippy::dbg_macro` lint only works with `std::dbg!`, thus we simulate
# it here, see: https://github.com/rust-lang/rust-clippy/issues/11303.
{ path = "kernel::dbg", reason = "the `dbg!` macro is intended as a debugging tool" },
]


@ -3,3 +3,4 @@ Alan Cox <root@hraefn.swansea.linux.org.uk>
Christoph Hellwig <hch@lst.de>
Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Marc Gonzalez <marc.w.gonzalez@free.fr>
Ralf Baechle <ralf@linux-mips.org>

.gitignore (2 lines changed)

@ -103,6 +103,7 @@ modules.order
# We don't want to ignore the following even if they are dot-files
#
!.clang-format
!.clippy.toml
!.cocciconfig
!.editorconfig
!.get_maintainer.ignore
@ -128,6 +129,7 @@ series
# ctags files
tags
!tags/
TAGS
# cscope files


@ -37,6 +37,7 @@ Alexei Avshalom Lazar <quic_ailizaro@quicinc.com> <ailizaro@codeaurora.org>
Alexei Starovoitov <ast@kernel.org> <alexei.starovoitov@gmail.com>
Alexei Starovoitov <ast@kernel.org> <ast@fb.com>
Alexei Starovoitov <ast@kernel.org> <ast@plumgrid.com>
Alexey Klimov <alexey.klimov@linaro.org> <klimov.linux@gmail.com>
Alexey Makhalov <alexey.amakhalov@broadcom.com> <amakhalov@vmware.com>
Alex Elder <elder@kernel.org>
Alex Elder <elder@kernel.org> <aelder@sgi.com>
@ -251,6 +252,8 @@ Guru Das Srinagesh <quic_gurus@quicinc.com> <gurus@codeaurora.org>
Gustavo Padovan <gustavo@las.ic.unicamp.br>
Gustavo Padovan <padovan@profusion.mobi>
Hanjun Guo <guohanjun@huawei.com> <hanjun.guo@linaro.org>
Hans Verkuil <hverkuil@xs4all.nl> <hansverk@cisco.com>
Hans Verkuil <hverkuil@xs4all.nl> <hverkuil-cisco@xs4all.nl>
Heiko Carstens <hca@linux.ibm.com> <h.carstens@de.ibm.com>
Heiko Carstens <hca@linux.ibm.com> <heiko.carstens@de.ibm.com>
Heiko Stuebner <heiko@sntech.de> <heiko.stuebner@bqreaders.com>
@ -269,6 +272,7 @@ Jack Pham <quic_jackp@quicinc.com> <jackp@codeaurora.org>
Jaegeuk Kim <jaegeuk@kernel.org> <jaegeuk@google.com>
Jaegeuk Kim <jaegeuk@kernel.org> <jaegeuk.kim@samsung.com>
Jaegeuk Kim <jaegeuk@kernel.org> <jaegeuk@motorola.com>
Jai Luthra <jai.luthra@linux.dev> <j-luthra@ti.com>
Jakub Kicinski <kuba@kernel.org> <jakub.kicinski@netronome.com>
James Bottomley <jejb@mulgrave.(none)>
James Bottomley <jejb@titanic.il.steeleye.com>
@ -730,6 +734,7 @@ Will Deacon <will@kernel.org> <will.deacon@arm.com>
Wolfram Sang <wsa@kernel.org> <w.sang@pengutronix.de>
Wolfram Sang <wsa@kernel.org> <wsa@the-dreams.de>
Yakir Yang <kuankuan.y@gmail.com> <ykk@rock-chips.com>
Yanteng Si <si.yanteng@linux.dev> <siyanteng@loongson.cn>
Yusuke Goda <goda.yusuke@renesas.com>
Zack Rusin <zack.rusin@broadcom.com> <zackr@vmware.com>
Zhu Yanjun <zyjzyj2000@gmail.com> <yanjunz@nvidia.com>

CREDITS (12 lines changed)

@ -185,6 +185,11 @@ P: 1024/AF7B30C1 CF 97 C2 CC 6D AE A7 FE C8 BA 9C FC 88 DE 32 C3
D: Linux/MIPS port
D: Linux/68k hacker
D: AX25 maintainer
D: EDAC-CAVIUM OCTEON maintainer
D: IOC3 ETHERNET DRIVER maintainer
D: NETROM NETWORK LAYER maintainer
D: ROSE NETWORK LAYER maintainer
D: TURBOCHANNEL SUBSYSTEM maintainer
S: Hauptstrasse 19
S: 79837 St. Blasien
S: Germany
@ -574,6 +579,9 @@ N: Zach Brown
E: zab@zabbo.net
D: maestro pci sound
N: Zefan Li
D: Contribution to control group stuff
N: David Brownell
D: Kernel engineer, mentor, and friend. Maintained USB EHCI and
D: gadget layers, SPI subsystem, GPIO subsystem, and more than a few
@ -3795,6 +3803,10 @@ S: Department of Zoology, University of Washington
S: Seattle, WA 98195-1800
S: USA
N: York Sun
E: york.sun@nxp.com
D: Freescale DDR EDAC
N: Eugene Surovegin
E: ebs@ebshome.net
W: https://kernel.ebshome.net/


@ -0,0 +1,12 @@
What: /sys/fs/selinux/user
Date: April 2005 (predates git)
KernelVersion: 2.6.12-rc2 (predates git)
Contact: selinux@vger.kernel.org
Description:
The selinuxfs "user" node allows userspace to request a list
of security contexts that can be reached for a given SELinux
user from a given starting context. This was used by libselinux
when various login-style programs requested contexts for
users, but libselinux stopped using it in 2020.
Kernel support will be removed no sooner than Dec 2025.


@ -424,6 +424,13 @@ Description:
[RW] This file is used to control (on/off) the iostats
accounting of the disk.
What: /sys/block/<disk>/queue/iostats_passthrough
Date: October 2024
Contact: linux-block@vger.kernel.org
Description:
[RW] This file is used to control (on/off) the iostats
accounting of the disk for passthrough commands.
What: /sys/block/<disk>/queue/logical_block_size
Date: May 2009
@ -594,6 +601,9 @@ Description:
[RW] Maximum number of kilobytes to read-ahead for filesystems
on this block device.
For MADV_HUGEPAGE, the readahead size may exceed this setting
since its granularity is based on the hugepage size.
What: /sys/block/<disk>/queue/rotational
Date: January 2009


@ -342,6 +342,70 @@ Description: Specific uncompressed frame descriptors
support
========================= =====================================
What: /config/usb-gadget/gadget/functions/uvc.name/streaming/framebased
Date: Sept 2024
KernelVersion: 5.15
Description: Framebased format descriptors
What: /config/usb-gadget/gadget/functions/uvc.name/streaming/framebased/name
Date: Sept 2024
KernelVersion: 5.15
Description: Specific framebased format descriptors
================== =======================================
bFormatIndex unique id for this format descriptor;
only defined after parent header is
linked into the streaming class;
read-only
bmaControls this format's data for bmaControls in
the streaming header
bmInterlaceFlags specifies interlace information,
read-only
bAspectRatioY the X dimension of the picture aspect
ratio, read-only
bAspectRatioX the Y dimension of the picture aspect
ratio, read-only
bDefaultFrameIndex optimum frame index for this stream
bBitsPerPixel number of bits per pixel used to
specify color in the decoded video
frame
guidFormat globally unique id used to identify
stream-encoding format
================== =======================================
What: /config/usb-gadget/gadget/functions/uvc.name/streaming/framebased/name/name
Date: Sept 2024
KernelVersion: 5.15
Description: Specific framebased frame descriptors
========================= =====================================
bFrameIndex unique id for this framedescriptor;
only defined after parent format is
linked into the streaming header;
read-only
dwFrameInterval indicates how frame interval can be
programmed; a number of values
separated by newline can be specified
dwDefaultFrameInterval the frame interval the device would
like to use as default
dwBytesPerLine Specifies the number of bytes per line
of video for packed fixed frame size
formats, allowing the receiver to
perform stride alignment of the video.
If the bVariableSize value (above) is
TRUE (1), or if the format does not
permit such alignment, this value shall
be set to zero (0).
dwMaxBitRate the maximum bit rate at the shortest
frame interval in bps
dwMinBitRate the minimum bit rate at the longest
frame interval in bps
wHeight height of decoded bitmap frame in px
wWidth width of decoded bitmap frame in px
bmCapabilities still image support, fixed frame-rate
support
========================= =====================================
What: /config/usb-gadget/gadget/functions/uvc.name/streaming/header
Date: Dec 2014
KernelVersion: 4.0


@ -184,3 +184,10 @@ Date: Apr 2020
Contact: linux-crypto@vger.kernel.org
Description: Dump the total number of time out requests.
Available for both PF and VF, and take no other effect on HPRE.
What: /sys/kernel/debug/hisi_hpre/<bdf>/cap_regs
Date: Oct 2024
Contact: linux-crypto@vger.kernel.org
Description: Dump the values of the qm and hpre capability bit registers and
support the query of device specifications to facilitate fault locating.
Available for both PF and VF, and take no other effect on HPRE.


@ -0,0 +1,25 @@
What: /sys/kernel/debug/vfio/<device>/migration/hisi_acc/dev_data
Date: Jan 2025
KernelVersion: 6.13
Contact: Longfang Liu <liulongfang@huawei.com>
Description: Read the configuration data and some status data
required for device live migration. These data include device
status data, queue configuration data, some task configuration
data and device attribute data. The output format of the data
is defined by the live migration driver.
What: /sys/kernel/debug/vfio/<device>/migration/hisi_acc/migf_data
Date: Jan 2025
KernelVersion: 6.13
Contact: Longfang Liu <liulongfang@huawei.com>
Description: Read the data from the last completed live migration.
This data includes the same device status data as in "dev_data".
The migf_data is the dev_data that is migrated.
What: /sys/kernel/debug/vfio/<device>/migration/hisi_acc/cmd_state
Date: Jan 2025
KernelVersion: 6.13
Contact: Longfang Liu <liulongfang@huawei.com>
Description: Used to obtain the device command sending and receiving
channel status. Returns failure or success logs based on the
results.


@ -157,3 +157,10 @@ Contact: linux-crypto@vger.kernel.org
Description: Dump the total number of completed but marked error requests
to be received.
Available for both PF and VF, and take no other effect on SEC.
What: /sys/kernel/debug/hisi_sec2/<bdf>/cap_regs
Date: Oct 2024
Contact: linux-crypto@vger.kernel.org
Description: Dump the values of the qm and sec capability bit registers and
support the query of device specifications to facilitate fault locating.
Available for both PF and VF, and take no other effect on SEC.


@ -158,3 +158,10 @@ Contact: linux-crypto@vger.kernel.org
Description: Dump the total number of BD type error requests
to be received.
Available for both PF and VF, and take no other effect on ZIP.
What: /sys/kernel/debug/hisi_zip/<bdf>/cap_regs
Date: Oct 2024
Contact: linux-crypto@vger.kernel.org
Description: Dump the values of the qm and zip capability bit registers and
support the query of device specifications to facilitate fault locating.
Available for both PF and VF, and take no other effect on ZIP.


@ -0,0 +1,25 @@
What: /sys/bus/event_source/devices/vpa_pmu/format
Date: November 2024
Contact: Linux on PowerPC Developer List <linuxppc-dev@lists.ozlabs.org>
Description: Read-only. Attribute group to describe the magic bits
that go into perf_event_attr.config for a particular pmu.
(See ABI/testing/sysfs-bus-event_source-devices-format).
Each attribute under this group defines a bit range of the
perf_event_attr.config. Supported attribute are listed
below::
event = "config:0-31" - event ID
For example::
l1_to_l2_lat = "event=0x1"
What: /sys/bus/event_source/devices/vpa_pmu/events
Date: November 2024
Contact: Linux on PowerPC Developer List <linuxppc-dev@lists.ozlabs.org>
Description: Read-only. Attribute group to describe performance monitoring
events for the Virtual Processor Area events. Each attribute
in this group describes a single performance monitoring event
supported by vpa_pmu. The name of the file is the name of
the event (See ABI/testing/sysfs-bus-event_source-devices-events).


@ -2268,6 +2268,30 @@ Description:
An example format is 16-bytes, 2-digits-per-byte, HEX-string
representing the sensor unique ID number.
What: /sys/bus/iio/devices/iio:deviceX/filter_type_available
What: /sys/bus/iio/devices/iio:deviceX/in_voltage-voltage_filter_mode_available
KernelVersion: 6.1
Contact: linux-iio@vger.kernel.org
Description:
Reading returns a list with the possible filter modes. Options
for the attribute:
* "sinc3" - The digital sinc3 filter. Moderate 1st
conversion time. Good noise performance.
* "sinc4" - Sinc 4. Excellent noise performance. Long
1st conversion time.
* "sinc5" - The digital sinc5 filter. Excellent noise
performance
* "sinc4+sinc1" - Sinc4 + averaging by 8. Low 1st conversion
time.
* "sinc3+rej60" - Sinc3 + 60Hz rejection.
* "sinc3+sinc1" - Sinc3 + averaging by 8. Low 1st conversion
time.
* "sinc3+pf1" - Sinc3 + device specific Post Filter 1.
* "sinc3+pf2" - Sinc3 + device specific Post Filter 2.
* "sinc3+pf3" - Sinc3 + device specific Post Filter 3.
* "sinc3+pf4" - Sinc3 + device specific Post Filter 4.
What: /sys/.../events/in_proximity_thresh_either_runningperiod
KernelVersion: 6.6
Contact: linux-iio@vger.kernel.org
@ -2339,3 +2363,11 @@ KernelVersion: 6.10
Contact: linux-iio@vger.kernel.org
Description:
The value of current sense resistor in Ohms.
What: /sys/.../iio:deviceX/in_attention_input
KernelVersion: 6.13
Contact: linux-iio@vger.kernel.org
Description:
Value representing the user's attention to the system, expressed
as a percentage. This usually indicates whether the user is
looking at the screen or not.


@ -1,46 +0,0 @@
What: /sys/bus/iio/devices/iio:deviceX/in_voltage-voltage_filter_mode_available
KernelVersion: 6.2
Contact: linux-iio@vger.kernel.org
Description:
Reading returns a list with the possible filter modes.
* "sinc4" - Sinc 4. Excellent noise performance. Long
1st conversion time. No natural 50/60Hz rejection.
* "sinc4+sinc1" - Sinc4 + averaging by 8. Low 1st conversion
time.
* "sinc3" - Sinc3. Moderate 1st conversion time.
Good noise performance.
* "sinc3+rej60" - Sinc3 + 60Hz rejection. At a sampling
frequency of 50Hz, achieves simultaneous 50Hz and 60Hz
rejection.
* "sinc3+sinc1" - Sinc3 + averaging by 8. Low 1st conversion
time. Best used with a sampling frequency of at least
216.19Hz.
* "sinc3+pf1" - Sinc3 + Post Filter 1. 53dB rejection @
50Hz, 58dB rejection @ 60Hz.
* "sinc3+pf2" - Sinc3 + Post Filter 2. 70dB rejection @
50Hz, 70dB rejection @ 60Hz.
* "sinc3+pf3" - Sinc3 + Post Filter 3. 99dB rejection @
50Hz, 103dB rejection @ 60Hz.
* "sinc3+pf4" - Sinc3 + Post Filter 4. 103dB rejection @
50Hz, 109dB rejection @ 60Hz.
What: /sys/bus/iio/devices/iio:deviceX/in_voltageY-voltageZ_filter_mode
KernelVersion: 6.2
Contact: linux-iio@vger.kernel.org
Description:
Set the filter mode of the differential channel. When the filter
mode changes, the in_voltageY-voltageZ_sampling_frequency and
in_voltageY-voltageZ_sampling_frequency_available attributes
might also change to accommodate the new filter mode.
If the current sampling frequency is out of range for the new
filter mode, the sampling frequency will be changed to the
closest valid one.


@ -163,6 +163,17 @@ Description:
will be present in sysfs. Writing 1 to this file
will perform reset.
What: /sys/bus/pci/devices/.../reset_subordinate
Date: October 2024
Contact: linux-pci@vger.kernel.org
Description:
This is visible only for bridge devices. If you want to reset
all devices attached through the subordinate bus of a specific
bridge device, writing 1 to this will try to do it. This will
affect all devices attached to the system through this bridge
similar to writing 1 to their individual "reset" file, so use
with caution.
What: /sys/bus/pci/devices/.../vpd
Date: February 2008
Contact: Ben Hutchings <bwh@kernel.org>


@ -0,0 +1,12 @@
What: /sys/bus/platform/drivers/amd_x3d_vcache/AMDI0101:00/amd_x3d_mode
Date: November 2024
KernelVersion: 6.13
Contact: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
Description: (RW) AMD 3D V-Cache optimizer allows users to switch CPU core
rankings dynamically.
This file switches between these two modes:
- "frequency" cores within the faster CCD are prioritized before
those in the slower CCD.
- "cache" cores within the larger L3 CCD are prioritized before
those in the smaller L3 CCD.


@ -193,7 +193,7 @@ Description:
mechanism:
The means of authentication. This attribute is mandatory.
Only supported type currently is "password".
Supported types are "password" or "certificate".
max_password_length:
A file that can be read to obtain the
@ -303,6 +303,7 @@ Description:
being configured allowing anyone to make changes.
After any of these operations the system must reboot for the changes to
take effect.
Admin and System certificates are supported from 2025 systems onward.
certificate_thumbprint:
Read only attribute used to display the MD5, SHA1 and SHA256 thumbprints


@ -149,6 +149,19 @@ Description:
advertise to the partner. The currently used capabilities are in
brackets. Selection happens by writing to the file.
What: /sys/class/typec/<port>/usb_capability
Date: November 2024
Contact: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Description: Lists the supported USB Modes. The default USB mode that is used
next time with the Enter_USB Message is in brackets. The default
mode can be changed by writing to the file when supported by the
driver.
Valid values:
- usb2 (USB 2.0)
- usb3 (USB 3.2)
- usb4 (USB4)
USB Type-C partner devices (eg. /sys/class/typec/port0-partner/)
What: /sys/class/typec/<port>-partner/accessory_mode
@ -220,6 +233,20 @@ Description:
directory exists, it will have an attribute file for every VDO
in Discover Identity command result.
What: /sys/class/typec/<port>-partner/usb_mode
Date: November 2024
Contact: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Description: The USB Modes that the partner device supports. The active mode
is displayed in brackets. The active USB mode can be changed by
writing to this file when the port driver is able to send Data
Reset Message to the partner. That requires USB Power Delivery
contract between the partner and the port.
Valid values:
- usb2 (USB 2.0)
- usb3 (USB 3.2)
- usb4 (USB4)
USB Type-C cable devices (eg. /sys/class/typec/port0-cable/)
Note: Electronically Marked Cables will have a device also for one cable plug


@ -79,3 +79,48 @@ Description:
indicates a lane.
crc_err_cnt: (RO) CRC err count on this port.
============= ==== =============================================
What: /sys/devices/platform/HISI04Bx:00/used_types
Date: August 2024
KernelVersion: 6.12
Contact: Huisong Li <lihuisong@huawei.com>
Description:
This interface is used to show all HCCS types used on the
platform, such as HCCS-v1, HCCS-v2 and so on.
What: /sys/devices/platform/HISI04Bx:00/available_inc_dec_lane_types
What: /sys/devices/platform/HISI04Bx:00/dec_lane_of_type
What: /sys/devices/platform/HISI04Bx:00/inc_lane_of_type
Date: August 2024
KernelVersion: 6.12
Contact: Huisong Li <lihuisong@huawei.com>
Description:
These interfaces under /sys/devices/platform/HISI04Bx/ are
used to support the low power consumption feature of some
HCCS types by changing the number of lanes used. The interfaces
changing the number of lanes used are 'dec_lane_of_type' and
'inc_lane_of_type' which require root privileges. These
interfaces aren't exposed if no HCCS type on the platform supports
this feature. Please note that decreasing the lane number is only
allowed if all the specified HCCS ports are not busy.
The low power consumption interfaces are as follows:
============================= ==== ================================
available_inc_dec_lane_types: (RO) available HCCS types (string) to
increase and decrease the number
of lane used, e.g. HCCS-v2.
dec_lane_of_type: (WO) input HCCS type supported
decreasing lane to decrease the
used lane number of all specified
HCCS type ports on platform to
the minimum.
You can query the 'cur_lane_num'
to get the minimum lane number
after executing successfully.
inc_lane_of_type: (WO) input HCCS type supported
increasing lane to increase the
used lane number of all specified
HCCS type ports on platform to
the full lane state.
============================= ==== ================================


@ -0,0 +1,38 @@
What: /sys/bus/hid/drivers/hid-corsair-void/<dev>/fw_version_headset
Date: January 2024
KernelVersion: 6.13
Contact: Stuart Hayhurst <stuart.a.hayhurst@gmail.com>
Description: (R) The firmware version of the headset
* Returns -ENODATA if no version was reported
What: /sys/bus/hid/drivers/hid-corsair-void/<dev>/fw_version_receiver
Date: January 2024
KernelVersion: 6.13
Contact: Stuart Hayhurst <stuart.a.hayhurst@gmail.com>
Description: (R) The firmware version of the receiver
What: /sys/bus/hid/drivers/hid-corsair-void/<dev>/microphone_up
Date: July 2023
KernelVersion: 6.13
Contact: Stuart Hayhurst <stuart.a.hayhurst@gmail.com>
Description: (R) Get the physical position of the microphone
* 1 -> Microphone up
* 0 -> Microphone down
What: /sys/bus/hid/drivers/hid-corsair-void/<dev>/send_alert
Date: July 2023
KernelVersion: 6.13
Contact: Stuart Hayhurst <stuart.a.hayhurst@gmail.com>
Description: (W) Play a built-in notification from the headset (0 / 1)
What: /sys/bus/hid/drivers/hid-corsair-void/<dev>/set_sidetone
Date: December 2023
KernelVersion: 6.13
Contact: Stuart Hayhurst <stuart.a.hayhurst@gmail.com>
Description: (W) Set the sidetone volume (0 - sidetone_max)
What: /sys/bus/hid/drivers/hid-corsair-void/<dev>/sidetone_max
Date: July 2024
KernelVersion: 6.13
Contact: Stuart Hayhurst <stuart.a.hayhurst@gmail.com>
Description: (R) Report the maximum sidetone volume


@ -83,3 +83,11 @@ Contact: intel-gfx@lists.freedesktop.org
Description: RO. Fan speed of device in RPM.
Only supported for particular Intel i915 graphics platforms.
What: /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/temp1_input
Date: November 2024
KernelVersion: 6.12
Contact: intel-gfx@lists.freedesktop.org
Description: RO. GPU package temperature in millidegree Celsius.
Only supported for particular Intel i915 graphics platforms.


@ -0,0 +1,10 @@
What: /sys/bus/platform/drivers/panthor/.../profiling
Date: September 2024
KernelVersion: 6.11.0
Contact: Adrian Larumbe <adrian.larumbe@collabora.com>
Description:
Bitmask to enable drm fdinfo's job profiling measurements.
Valid values are:
0: Don't enable fdinfo job profiling sources.
1: Enable GPU cycle measurements for running jobs.
2: Enable GPU timestamp sampling for running jobs.


@ -0,0 +1,20 @@
What: /sys/devices/.../intel_spi_protected
Date: Feb 2025
KernelVersion: 6.13
Contact: Alexander Usyskin <alexander.usyskin@intel.com>
Description: This attribute allows the userspace to check if the
Intel SPI flash controller is write protected from the host.
What: /sys/devices/.../intel_spi_locked
Date: Feb 2025
KernelVersion: 6.13
Contact: Alexander Usyskin <alexander.usyskin@intel.com>
Description: This attribute allows the user space to check if the
Intel SPI flash controller locks supported opcodes.
What: /sys/devices/.../intel_spi_bios_locked
Date: Feb 2025
KernelVersion: 6.13
Contact: Alexander Usyskin <alexander.usyskin@intel.com>
Description: This attribute allows the user space to check if the
Intel SPI flash controller BIOS region is locked for writes.


@ -16,3 +16,14 @@ Description: Control strategy of sync decompression:
readahead on atomic contexts only.
- 1 (force on): enable for readpage and readahead.
- 2 (force off): disable for all situations.
What: /sys/fs/erofs/<disk>/drop_caches
Date: November 2024
Contact: "Guo Chunhai" <guochunhai@vivo.com>
Description: Writing to this will drop compression-related caches,
currently used to drop in-memory pclusters and cached
compressed folios:
- 1 : invalidate cached compressed folios
- 2 : drop in-memory pclusters
- 3 : drop in-memory pclusters and cached compressed folios


@ -311,10 +311,13 @@ Description: Do background GC aggressively when set. Set to 0 by default.
GC approach and turns SSR mode on.
gc urgent low(2): lowers the bar of checking I/O idling in
order to process outstanding discard commands and GC a
little bit aggressively. uses cost benefit GC approach.
little bit aggressively. always uses cost benefit GC approach,
and will override age-threshold GC approach if ATGC is enabled
at the same time.
gc urgent mid(3): does GC forcibly in a period of given
gc_urgent_sleep_time and executes a mid level of I/O idling check.
uses cost benefit GC approach.
always uses cost benefit GC approach, and will override
age-threshold GC approach if ATGC is enabled at the same time.
What: /sys/fs/f2fs/<disk>/gc_urgent_sleep_time
Date: August 2017
@ -819,3 +822,9 @@ Description: It controls the valid block ratio threshold not to trigger excessiv
for zoned devices. The initial value of it is 95(%). F2FS will stop the
background GC thread from initiating GC for sections having valid blocks
exceeding the ratio.
What: /sys/fs/f2fs/<disk>/max_read_extent_count
Date: November 2024
Contact: "Chao Yu" <chao@kernel.org>
Description: It controls the maximum read extent count per inode; the threshold
value is 10240 by default.


@ -117,6 +117,35 @@ by the PCI endpoint function driver.
The PCI endpoint function driver should use pci_epc_mem_free_addr() to
free the memory space allocated using pci_epc_mem_alloc_addr().
* pci_epc_map_addr()
A PCI endpoint function driver should use pci_epc_map_addr() to map to a RC
PCI address the CPU address of local memory obtained with
pci_epc_mem_alloc_addr().
* pci_epc_unmap_addr()
A PCI endpoint function driver should use pci_epc_unmap_addr() to unmap the
CPU address of local memory mapped to a RC address with pci_epc_map_addr().
* pci_epc_mem_map()
A PCI endpoint controller may impose constraints on the RC PCI addresses that
can be mapped. The function pci_epc_mem_map() allows endpoint function
drivers to allocate and map controller memory while handling such
constraints. This function will determine the size of the memory that must be
allocated with pci_epc_mem_alloc_addr() for successfully mapping a RC PCI
address range. This function will also indicate the size of the PCI address
range that was actually mapped, which can be less than the requested size, as
well as the offset into the allocated memory to use for accessing the mapped
RC PCI address range.
* pci_epc_mem_unmap()
A PCI endpoint function driver can use pci_epc_mem_unmap() to unmap and free
controller memory that was allocated and mapped using pci_epc_mem_map().
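Putting the address-mapping helpers above together, a minimal sketch of how an
endpoint function driver might reach a location in RC memory could look as
follows. The helper name, the use of memcpy_toio() and the error handling are
illustrative only; the pci_epc_* prototypes are assumed to match recent
kernels.

.. code-block:: c

   #include <linux/pci-epc.h>
   #include <linux/pci-epf.h>

   /* Illustrative only: copy a buffer to an RC PCI address. */
   static int epf_copy_to_host(struct pci_epf *epf, u64 rc_addr,
                               const void *src, size_t len)
   {
           struct pci_epc *epc = epf->epc;
           phys_addr_t phys;
           void __iomem *dst;
           int ret;

           /* Allocate a window in the controller's outbound address space. */
           dst = pci_epc_mem_alloc_addr(epc, &phys, len);
           if (!dst)
                   return -ENOMEM;

           /* Map the window so that accesses reach rc_addr on the host. */
           ret = pci_epc_map_addr(epc, epf->func_no, epf->vfunc_no, phys,
                                  rc_addr, len);
           if (ret) {
                   pci_epc_mem_free_addr(epc, phys, dst, len);
                   return ret;
           }

           memcpy_toio(dst, src, len);

           pci_epc_unmap_addr(epc, epf->func_no, epf->vfunc_no, phys);
           pci_epc_mem_free_addr(epc, phys, dst, len);
           return 0;
   }

pci_epc_mem_map() and pci_epc_mem_unmap() bundle the allocate+map and
unmap+free steps above while also honouring the controller's alignment
constraints, so they are usually the simpler choice when such constraints
apply.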
Other EPC APIs
~~~~~~~~~~~~~~


@ -18,3 +18,4 @@ PCI Bus Subsystem
pcieaer-howto
endpoint/index
boot-interrupts
tph


@ -217,8 +217,12 @@ capability structure except the PCI Express capability structure,
that is shared between many drivers including the service drivers.
RMW Capability accessors (pcie_capability_clear_and_set_word(),
pcie_capability_set_word(), and pcie_capability_clear_word()) protect
a selected set of PCI Express Capability Registers (Link Control
Register and Root Control Register). Any change to those registers
should be performed using RMW accessors to avoid problems due to
concurrent updates. For the up-to-date list of protected registers,
see pcie_capability_clear_and_set_word().
a selected set of PCI Express Capability Registers:
* Link Control Register
* Root Control Register
* Link Control 2 Register
Any change to those registers should be performed using RMW accessors to
avoid problems due to concurrent updates. For the up-to-date list of
protected registers, see pcie_capability_clear_and_set_word().
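As a hedged illustration (not taken from the original text), a driver that
needs to modify one of these protected registers would go through the RMW
accessor rather than a raw config-space read/modify/write; the ASPM L1 bit
below is just an example of a Link Control Register field.

.. code-block:: c

   #include <linux/pci.h>

   /* Illustrative only: clear the ASPM L1 enable bit in the Link Control
    * Register using the concurrency-safe RMW accessor.
    */
   static void example_disable_aspm_l1(struct pci_dev *pdev)
   {
           pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL,
                                              PCI_EXP_LNKCTL_ASPM_L1, 0);
   }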

Documentation/PCI/tph.rst (new file, 132 lines added)

@ -0,0 +1,132 @@
.. SPDX-License-Identifier: GPL-2.0
===========
TPH Support
===========
:Copyright: 2024 Advanced Micro Devices, Inc.
:Authors: - Eric van Tassell <eric.vantassell@amd.com>
- Wei Huang <wei.huang2@amd.com>
Overview
========
TPH (TLP Processing Hints) is a PCIe feature that allows endpoint devices
to provide optimization hints for requests that target memory space.
These hints, in a format called Steering Tags (STs), are embedded in the
requester's TLP headers, enabling the system hardware, such as the Root
Complex, to better manage platform resources for these requests.
For example, on platforms with TPH-based direct data cache injection
support, an endpoint device can include appropriate STs in its DMA
traffic to specify which cache the data should be written to. This allows
the CPU core to have a higher probability of getting data from cache,
potentially improving performance and reducing latency in data
processing.
How to Use TPH
==============
TPH is presented as an optional extended capability in PCIe. The Linux
kernel handles TPH discovery during boot, but it is up to the device
driver to request TPH enablement if it is to be utilized. Once enabled,
the driver uses the provided API to obtain the Steering Tag for the
target memory and to program the ST into the device's ST table.
Enable TPH support in Linux
---------------------------
To support TPH, the kernel must be built with the CONFIG_PCIE_TPH option
enabled.
Manage TPH
----------
To enable TPH for a device, use the following function::
int pcie_enable_tph(struct pci_dev *pdev, int mode);
This function enables TPH support for device with a specific ST mode.
Current supported modes include:
* PCI_TPH_ST_NS_MODE - NO ST Mode
* PCI_TPH_ST_IV_MODE - Interrupt Vector Mode
* PCI_TPH_ST_DS_MODE - Device Specific Mode
`pcie_enable_tph()` checks whether the requested mode is actually
supported by the device before enabling. The device driver can figure out
which TPH mode is supported and can be properly enabled based on the
return value of `pcie_enable_tph()`.
To disable TPH, use the following function::
void pcie_disable_tph(struct pci_dev *pdev);
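For illustration, a sketch of how a driver might request TPH at probe time;
the drv_probe()/drv_remove() wrappers and the choice of Interrupt Vector Mode
are hypothetical, only pcie_enable_tph() and pcie_disable_tph() come from the
API described here.

.. code-block:: c

   static int drv_probe(struct pci_dev *pdev, const struct pci_device_id *id)
   {
           /*
            * Request Interrupt Vector Mode; pcie_enable_tph() verifies that
            * the device supports the requested ST mode before enabling it.
            */
           if (pcie_enable_tph(pdev, PCI_TPH_ST_IV_MODE))
                   dev_info(&pdev->dev, "TPH unavailable, continuing without it\n");

           /* ... remaining device setup ... */
           return 0;
   }

   static void drv_remove(struct pci_dev *pdev)
   {
           /* A real driver would skip this if pcie_enable_tph() had failed. */
           pcie_disable_tph(pdev);
   }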
Manage ST
---------
Steering Tags are platform specific. PCIe spec does not specify where STs
are from. Instead PCI Firmware Specification defines an ACPI _DSM method
(see the `Revised _DSM for Cache Locality TPH Features ECN
<https://members.pcisig.com/wg/PCI-SIG/document/15470>`_) for retrieving
STs for a target memory of various properties. This method is what is
supported in this implementation.
To retrieve a Steering Tag for a target memory associated with a specific
CPU, use the following function::
int pcie_tph_get_cpu_st(struct pci_dev *pdev, enum tph_mem_type type,
unsigned int cpu_uid, u16 *tag);
The `type` argument is used to specify the memory type, either volatile
or persistent, of the target memory. The `cpu_uid` argument specifies the
CPU where the memory is associated to.
After the ST value is retrieved, the device driver can use the following
function to write the ST into the device::
int pcie_tph_set_st_entry(struct pci_dev *pdev, unsigned int index,
u16 tag);
The `index` argument is the ST table entry index the ST tag will be
written into. `pcie_tph_set_st_entry()` will figure out the proper
location of ST table, either in the MSI-X table or in the TPH Extended
Capability space, and write the Steering Tag into the ST entry pointed by
the `index` argument.
It is completely up to the driver to decide how to use these TPH
functions. For example a network device driver can use the TPH APIs above
to update the Steering Tag when interrupt affinity of a RX/TX queue has
been changed. Here is a sample code for IRQ affinity notifier:
.. code-block:: c

   static void irq_affinity_notified(struct irq_affinity_notify *notify,
                                     const cpumask_t *mask)
   {
           struct drv_irq *irq;
           unsigned int cpu_id;
           u16 tag;

           irq = container_of(notify, struct drv_irq, affinity_notify);
           cpumask_copy(irq->cpu_mask, mask);

           /* Pick a right CPU as the target - here is just an example */
           cpu_id = cpumask_first(irq->cpu_mask);

           if (pcie_tph_get_cpu_st(irq->pdev, TPH_MEM_TYPE_VM, cpu_id,
                                   &tag))
                   return;

           if (pcie_tph_set_st_entry(irq->pdev, irq->msix_nr, tag))
                   return;
   }
Disable TPH system-wide
-----------------------
There is a kernel command line option available to control TPH feature:
* "notph": TPH will be disabled for all endpoint devices.


@ -249,7 +249,7 @@ ticks this GP)" indicates that this CPU has not taken any scheduling-clock
interrupts during the current stalled grace period.
The "idle=" portion of the message prints the dyntick-idle state.
The hex number before the first "/" is the low-order 12 bits of the
The hex number before the first "/" is the low-order 16 bits of the
dynticks counter, which will have an even-numbered value if the CPU
is in dyntick-idle mode and an odd-numbered value otherwise. The hex
number between the two "/"s is the value of the nesting, which will be


@ -0,0 +1,14 @@
.. SPDX-License-Identifier: GPL-2.0-only
===============================
Qualcomm Cloud AI 80 (AIC080)
===============================
Overview
========
The Qualcomm Cloud AI 80/AIC080 family of products is a derivative of AIC100.
The number of NSPs and clock rates are reduced to fit within resource
constrained solutions. The PCIe Product ID is 0xa080.
As a derivative product, all AIC100 documentation applies.


@ -229,6 +229,8 @@ of the defined channels, and their uses.
| _PERIODIC | | | timestamps in the device side logs with|
| | | | the host time source. |
+----------------+---------+----------+----------------------------------------+
| IPCR | 24 & 25 | AMSS | AF_QIPCRTR clients and servers. |
+----------------+---------+----------+----------------------------------------+
DMA Bridge
==========


@ -10,4 +10,5 @@ accelerator cards.
.. toctree::
qaic
aic080
aic100


@ -18,8 +18,11 @@ set ``CONFIG_SECURITY_APPARMOR=y``
If AppArmor should be selected as the default security module then set::
CONFIG_DEFAULT_SECURITY="apparmor"
CONFIG_SECURITY_APPARMOR_BOOTPARAM_VALUE=1
CONFIG_DEFAULT_SECURITY_APPARMOR=y
The CONFIG_LSM parameter manages the order and selection of LSMs.
Specify apparmor as the first "major" module (e.g. AppArmor, SELinux, Smack)
in the list.
Build the kernel


@ -47,6 +47,8 @@ The list of possible return codes:
-ENOMEM zram was not able to allocate enough memory to fulfil your
needs.
-EINVAL invalid input has been provided.
-EAGAIN re-try operation later (e.g. when attempting to run recompress
and writeback simultaneously).
======== =============================================================
If you use 'echo', the returned value is set by the 'echo' utility,


@ -108,6 +108,27 @@ a fully reliable and straight-forward way to reproduce the regression, too.*
With that the process is complete. Now report the regression as described by
Documentation/admin-guide/reporting-issues.rst.
Bisecting linux-next
--------------------
If you face a problem only happening in linux-next, bisect between the
linux-next branches 'stable' and 'master'. The following commands will start
the process for a linux-next tree you added as a remote called 'next'::
git bisect start
git bisect good next/stable
git bisect bad next/master
The 'stable' branch refers to the state of linux-mainline that the current
linux-next release (found in the 'master' branch) is based on -- the former
thus should be free of any problems that show up in -next, but not in Linus'
tree.
This will bisect across a wide range of changes, some of which you might have
used in earlier linux-next releases without problems. Sadly there is no simple
way to avoid checking them: bisecting from one linux-next release to a later
one (say between 'next-20241020' and 'next-20241021') is impossible, as they
share no common history.
Additional reading material
---------------------------


@ -90,9 +90,7 @@ Brief summary of control files.
used.
memory.swappiness set/show swappiness parameter of vmscan
(See sysctl's vm.swappiness)
memory.move_charge_at_immigrate set/show controls of moving charges
This knob is deprecated and shouldn't be
used.
memory.move_charge_at_immigrate This knob is deprecated.
memory.oom_control set/show oom controls.
This knob is deprecated and shouldn't be
used.
@ -243,10 +241,6 @@ behind this approach is that a cgroup that aggressively uses a shared
page will eventually get charged for it (once it is uncharged from
the cgroup that brought it in -- this will happen on memory pressure).
But see :ref:`section 8.2 <cgroup-v1-memory-movable-charges>` when moving a
task to another cgroup, its pages may be recharged to the new cgroup, if
move_charge_at_immigrate has been chosen.
2.4 Swap Extension
--------------------------------------
@ -756,78 +750,8 @@ If we want to change this to 1G, we can at any time use::
THIS IS DEPRECATED!
It's expensive and unreliable! It's better practice to launch workload
tasks directly from inside their target cgroup. Use dedicated workload
cgroups to allow fine-grained policy adjustments without having to
move physical pages between control domains.
Users can move charges associated with a task along with task migration, that
is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
This feature is not supported in !CONFIG_MMU environments because of lack of
page tables.
8.1 Interface
-------------
This feature is disabled by default. It can be enabled (and disabled again) by
writing to memory.move_charge_at_immigrate of the destination cgroup.
If you want to enable it::
# echo (some positive value) > memory.move_charge_at_immigrate
.. note::
Each bits of move_charge_at_immigrate has its own meaning about what type
of charges should be moved. See :ref:`section 8.2
<cgroup-v1-memory-movable-charges>` for details.
.. note::
Charges are moved only when you move mm->owner, in other words,
a leader of a thread group.
.. note::
If we cannot find enough space for the task in the destination cgroup, we
try to make space by reclaiming memory. Task migration may fail if we
cannot make enough space.
.. note::
It can take several seconds if you move charges much.
And if you want disable it again::
# echo 0 > memory.move_charge_at_immigrate
.. _cgroup-v1-memory-movable-charges:
8.2 Type of charges which can be moved
--------------------------------------
Each bit in move_charge_at_immigrate has its own meaning about what type of
charges should be moved. But in any case, it must be noted that an account of
a page or a swap can be moved only when it is charged to the task's current
(old) memory cgroup.
+---+--------------------------------------------------------------------------+
|bit| what type of charges would be moved ? |
+===+==========================================================================+
| 0 | A charge of an anonymous page (or swap of it) used by the target task. |
| | You must enable Swap Extension (see 2.4) to enable move of swap charges. |
+---+--------------------------------------------------------------------------+
| 1 | A charge of file pages (normal file, tmpfs file (e.g. ipc shared memory) |
| | and swaps of tmpfs file) mmapped by the target task. Unlike the case of |
| | anonymous pages, file pages (and swaps) in the range mmapped by the task |
| | will be moved even if the task hasn't done page fault, i.e. they might |
| | not be the task's "RSS", but other task's "RSS" that maps the same file. |
| | The mapcount of the page is ignored (the page can be moved independent |
| | of the mapcount). You must enable Swap Extension (see 2.4) to |
| | enable move of swap charges. |
+---+--------------------------------------------------------------------------+
8.3 TODO
--------
- All of moving charge operations are done under cgroup_mutex. It's not good
behavior to hold the mutex too long, so we may need some trick.
Reading memory.move_charge_at_immigrate will always return 0 and writing
to it will always return -EINVAL.
9. Memory thresholds
====================


@ -1599,6 +1599,15 @@ The following nested keys are defined.
pglazyfreed (npn)
Amount of reclaimed lazyfree pages
swpin_zero
Number of pages swapped into memory and filled with zero, where I/O
was optimized out because the page content was detected to be zero
during swapout.
swpout_zero
Number of zero-filled pages swapped out with I/O skipped due to the
content being detected as zero.
zswpin
Number of pages moved in to memory from zswap.
@ -1646,6 +1655,11 @@ The following nested keys are defined.
pgdemote_khugepaged
Number of pages demoted by khugepaged.
hugetlb
Amount of memory used by hugetlb pages. This metric only shows
up if hugetlb usage is accounted for in memory.current (i.e.
cgroup is mounted with the memory_hugetlb_accounting option).
memory.numa_stat
A read-only nested-keyed file which exists on non-root cgroups.
@ -2945,7 +2959,7 @@ following two functions.
a queue (device) has been associated with the bio and
before submission.
wbc_account_cgroup_owner(@wbc, @page, @bytes)
wbc_account_cgroup_owner(@wbc, @folio, @bytes)
Should be called for each data segment being written out.
While this function doesn't care exactly when it's called
during the writeback session, it's the easiest and most


@ -27,6 +27,16 @@ kernel command line (/proc/cmdline) and collects module parameters
when it loads a module, so the kernel command line can be used for
loadable modules too.
This document may not be entirely up to date and comprehensive. The command
"modinfo -p ${modulename}" shows a current list of all parameters of a loadable
module. Loadable modules, after being loaded into the running kernel, also
reveal their parameters in /sys/module/${modulename}/parameters/. Some of these
parameters may be changed at runtime by the command
``echo -n ${value} > /sys/module/${modulename}/parameters/${parm}``.
Special handling
----------------
Hyphens (dashes) and underscores are equivalent in parameter names, so::
log_buf_len=1M print-fatal-signals=1
@ -39,8 +49,8 @@ Double-quotes can be used to protect spaces in values, e.g.::
param="spaces in here"
cpu lists:
----------
cpu lists
~~~~~~~~~
Some kernel parameters take a list of CPUs as a value, e.g. isolcpus,
nohz_full, irqaffinity, rcu_nocbs. The format of this list is:
@ -82,12 +92,17 @@ so that "nohz_full=all" is the equivalent of "nohz_full=0-N".
The semantics of "N" and "all" is supported on a level of bitmaps and holds for
all users of bitmap_parselist().
This document may not be entirely up to date and comprehensive. The command
"modinfo -p ${modulename}" shows a current list of all parameters of a loadable
module. Loadable modules, after being loaded into the running kernel, also
reveal their parameters in /sys/module/${modulename}/parameters/. Some of these
parameters may be changed at runtime by the command
``echo -n ${value} > /sys/module/${modulename}/parameters/${parm}``.
Metric suffixes
~~~~~~~~~~~~~~~
The [KMG] suffix is commonly described after a number of kernel
parameter values. 'K', 'M', 'G', 'T', 'P', and 'E' suffixes are allowed.
These letters represent the _binary_ multipliers 'Kilo', 'Mega', 'Giga',
'Tera', 'Peta', and 'Exa', equaling 2^10, 2^20, 2^30, 2^40, 2^50, and
2^60 bytes respectively. Such letter suffixes can also be entirely omitted.
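For example, ``log_buf_len=64K`` requests 64 * 2^10 = 65536 bytes, and passing
the bare number ``log_buf_len=65536`` is equivalent because the suffix may be
omitted.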
Kernel Build Options
--------------------
The parameters listed below are only valid if certain kernel build options
were enabled and if respective hardware is present. This list should be kept
@ -159,6 +174,7 @@ is applicable::
SCSI Appropriate SCSI support is enabled.
A lot of drivers have their options described inside
the Documentation/scsi/ sub-directory.
SDW SoundWire support is enabled.
SECURITY Different security models are enabled.
SELINUX SELinux support is enabled.
SERIAL Serial support is enabled.
@ -211,10 +227,5 @@ a fixed number of characters. This limit depends on the architecture
and is between 256 and 4096 characters. It is defined in the file
./include/uapi/asm-generic/setup.h as COMMAND_LINE_SIZE.
Finally, the [KMG] suffix is commonly described after a number of kernel
parameter values. These 'K', 'M', and 'G' letters represent the _binary_
multipliers 'Kilo', 'Mega', and 'Giga', equaling 2^10, 2^20, and 2^30
bytes respectively. Such letter suffixes can also be entirely omitted:
.. include:: kernel-parameters.txt
:literal:


@ -446,6 +446,9 @@
arm64.nobti [ARM64] Unconditionally disable Branch Target
Identification support
arm64.nogcs [ARM64] Unconditionally disable Guarded Control Stack
support
arm64.nomops [ARM64] Unconditionally disable Memory Copy and Memory
Set instructions support
@ -918,12 +921,16 @@
the parameter has no effect.
crash_kexec_post_notifiers
Run kdump after running panic-notifiers and dumping
kmsg. This only for the users who doubt kdump always
succeeds in any situation.
Note that this also increases risks of kdump failure,
because some panic notifiers can make the crashed
kernel more unstable.
Only jump to kdump kernel after running the panic
notifiers and dumping kmsg. This option increases
the risks of a kdump failure, since some panic
notifiers can make the crashed kernel more unstable.
In configurations where kdump may not be reliable,
running the panic notifiers could allow collecting
more data on dmesg, like stack traces from other CPUS
or extra data dumped by panic_print. Note that some
configurations enable this option unconditionally,
like Hyper-V, PowerPC (fadump) and AMD SEV-SNP.
crashkernel=size[KMG][@offset[KMG]]
[KNL,EARLY] Using kexec, Linux can switch to a 'crash kernel'
@ -1546,6 +1553,7 @@
failslab=
fail_usercopy=
fail_page_alloc=
fail_skb_realloc=
fail_make_request=[KNL]
General fault injection mechanism.
Format: <interval>,<probability>,<space>,<times>
@ -4678,6 +4686,10 @@
nomio [S390] Do not use MIO instructions.
norid [S390] ignore the RID field and force use of
one PCI domain per PCI function
notph [PCIE] If the PCIE_TPH kernel config parameter
is enabled, this kernel boot option can be used
to disable PCIe TLP Processing Hints support
system-wide.
pcie_aspm= [PCIE] Forcibly enable or ignore PCIe Active State Power
Management.
@ -5412,11 +5424,6 @@
Set time (jiffies) between CPU-hotplug operations,
or zero to disable CPU-hotplug testing.
rcutorture.read_exit= [KNL]
Set the number of read-then-exit kthreads used
to test the interaction of RCU updaters and
task-exit processing.
rcutorture.read_exit_burst= [KNL]
The number of times in a given read-then-exit
episode that a set of read-then-exit kthreads
@ -5426,6 +5433,14 @@
The delay, in seconds, between successive
read-then-exit testing episodes.
rcutorture.reader_flavor= [KNL]
A bit mask indicating which readers to use.
If there is more than one bit set, the readers
are entered from low-order bit up, and are
exited in the opposite order. For SRCU, the
0x1 bit is normal readers, 0x2 NMI-safe readers,
and 0x4 light-weight readers.
rcutorture.shuffle_interval= [KNL]
Set task-shuffle interval (s). Shuffling tasks
allows some CPUs to go into dyntick-idle mode
@ -6060,6 +6075,10 @@
non-zero "wait" parameter. See weight_single
and weight_many.
sdw_mclk_divider=[SDW]
Specify the MCLK divider for Intel SoundWire buses in
case the BIOS does not provide the clock rate properly.
skew_tick= [KNL,EARLY] Offset the periodic timer tick per cpu to mitigate
xtime_lock contention on larger systems, and/or RCU lock
contention on all systems with CONFIG_MAXSMP set.
@ -6147,6 +6166,16 @@
For more information see Documentation/mm/slub.rst.
(slub_nomerge legacy name also accepted for now)
slab_strict_numa [MM]
Support memory policies on a per object level
in the slab allocator. The default is for memory
policies to be applied at the folio level when
a new folio is needed or a partial folio is
retrieved from the lists. Increases overhead
in the slab fastpaths but gains more accurate
NUMA kernel object placement which helps with slow
interconnects in NUMA systems.
slram= [HW,MTD]
smart2= [HW]
@ -6700,6 +6729,16 @@
Force threading of all interrupt handlers except those
marked explicitly IRQF_NO_THREAD.
thp_shmem= [KNL]
Format: <size>[KMG],<size>[KMG]:<policy>;<size>[KMG]-<size>[KMG]:<policy>
Control the default policy of each hugepage size for the
internal shmem mount. <policy> is one of policies available
for the shmem mount ("always", "inherit", "never", "within_size",
and "advise").
It can be used multiple times for multiple shmem THP sizes.
See Documentation/admin-guide/mm/transhuge.rst for more
details.
topology= [S390,EARLY]
Format: {off | on}
Specify if the kernel should make use of the cpu
@ -6727,6 +6766,15 @@
torture.verbose_sleep_duration= [KNL]
Duration of each verbose-printk() sleep in jiffies.
tpm.disable_pcr_integrity= [HW,TPM]
Do not protect PCR registers from unintended physical
access, or interposers in the bus by the means of
having an integrity protected session wrapped around
TPM2_PCR_Extend command. Consider this in a situation
where TPM is heavily utilized by IMA, thus protection
causing a major performance hit, and the space where
machines are deployed is by other means guarded.
tpm_suspend_pcr=[HW,TPM]
Format: integer pcr id
Specify that at suspend time, the tpm driver
@ -6867,6 +6915,12 @@
reserve_mem=12M:4096:trace trace_instance=boot_map^traceoff^traceprintk@trace,sched,irq
Note, saving the trace buffer across reboots does require that the system
is set up to not wipe memory. For instance, CONFIG_RESET_ATTACK_MITIGATION
can force a memory reset on boot which will clear any trace that was stored.
This is just one of many ways that can clear memory. Make sure your system
keeps the content of memory across reboots before relying on this option.
See also Documentation/trace/debugging.rst
@ -6926,6 +6980,13 @@
See Documentation/admin-guide/mm/transhuge.rst
for more details.
transparent_hugepage_shmem= [KNL]
Format: [always|within_size|advise|never|deny|force]
Can be used to control the hugepage allocation policy for
the internal shmem mount.
See Documentation/admin-guide/mm/transhuge.rst
for more details.
trusted.source= [KEYS]
Format: <string>
This parameter identifies the trust source as a backend


@ -315,7 +315,7 @@ To reduce its OS jitter, do at least one of the following:
to do.
Name:
rcuop/%d and rcuos/%d
rcuop/%d, rcuos/%d, and rcuog/%d
Purpose:
Offload RCU callbacks from the corresponding CPU.


@ -15,7 +15,7 @@ Please notice, however, that, if:
you should use the main media development tree ``master`` branch:
https://git.linuxtv.org/media_tree.git/
https://git.linuxtv.org/media.git/
In this case, you may find some useful information at the
`LinuxTv wiki pages <https://linuxtv.org/wiki>`_:


@ -20,6 +20,11 @@ Documentation/driver-api/media/index.rst
- for driver development information and Kernel APIs used by
media devices;
Documentation/process/debugging/media_specific_debugging_guide.rst
- for advice about essential tools and techniques to debug drivers on this
subsystem
.. toctree::
:caption: Table of Contents
:maxdepth: 2


@ -1,62 +0,0 @@
.. SPDX-License-Identifier: GPL-2.0
OMAP4 ISS Driver
================
Author: Sergio Aguirre <sergio.a.aguirre@gmail.com>
Copyright (C) 2012, Texas Instruments
Introduction
------------
The OMAP44XX family of chips contains the Imaging SubSystem (a.k.a. ISS),
Which contains several components that can be categorized in 3 big groups:
- Interfaces (2 Interfaces: CSI2-A & CSI2-B/CCP2)
- ISP (Image Signal Processor)
- SIMCOP (Still Image Coprocessor)
For more information, please look in [#f1]_ for latest version of:
"OMAP4430 Multimedia Device Silicon Revision 2.x"
As of Revision AB, the ISS is described in detail in section 8.
This driver is supporting **only** the CSI2-A/B interfaces for now.
It makes use of the Media Controller framework [#f2]_, and inherited most of the
code from OMAP3 ISP driver (found under drivers/media/platform/ti/omap3isp/\*),
except that it doesn't need an IOMMU now for ISS buffers memory mapping.
Supports usage of MMAP buffers only (for now).
Tested platforms
----------------
- OMAP4430SDP, w/ ES2.1 GP & SEVM4430-CAM-V1-0 (Contains IMX060 & OV5640, in
which only the last one is supported, outputting YUV422 frames).
- TI Blaze MDP, w/ OMAP4430 ES2.2 EMU (Contains 1 IMX060 & 2 OV5650 sensors, in
which only the OV5650 are supported, outputting RAW10 frames).
- PandaBoard, Rev. A2, w/ OMAP4430 ES2.1 GP & OV adapter board, tested with
following sensors:
* OV5640
* OV5650
- Tested on mainline kernel:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=summary
Tag: v3.3 (commit c16fa4f2ad19908a47c63d8fa436a1178438c7e7)
File list
---------
drivers/staging/media/omap4iss/
include/linux/platform_data/media/omap4iss.h
References
----------
.. [#f1] http://focus.ti.com/general/docs/wtbu/wtbudocumentcenter.tsp?navigationId=12037&templateId=6123#62
.. [#f2] http://lwn.net/Articles/420485/


@ -0,0 +1,27 @@
digraph board {
rankdir=TB
n00000001 [label="{{<port0> 0} | csi2\n/dev/v4l-subdev0 | {<port1> 1 | <port2> 2 | <port3> 3 | <port4> 4}}", shape=Mrecord, style=filled, fillcolor=green]
n00000001:port1 -> n00000011 [style=dashed]
n00000001:port1 -> n00000007:port0
n00000001:port2 -> n00000015
n00000001:port2 -> n00000007:port0 [style=dashed]
n00000001:port3 -> n00000019 [style=dashed]
n00000001:port3 -> n00000007:port0 [style=dashed]
n00000001:port4 -> n0000001d [style=dashed]
n00000001:port4 -> n00000007:port0 [style=dashed]
n00000007 [label="{{<port0> 0 | <port1> 1} | pisp-fe\n/dev/v4l-subdev1 | {<port2> 2 | <port3> 3 | <port4> 4}}", shape=Mrecord, style=filled, fillcolor=green]
n00000007:port2 -> n00000021
n00000007:port3 -> n00000025 [style=dashed]
n00000007:port4 -> n00000029
n0000000d [label="{imx219 6-0010\n/dev/v4l-subdev2 | {<port0> 0}}", shape=Mrecord, style=filled, fillcolor=green]
n0000000d:port0 -> n00000001:port0 [style=bold]
n00000011 [label="rp1-cfe-csi2-ch0\n/dev/video0", shape=box, style=filled, fillcolor=yellow]
n00000015 [label="rp1-cfe-csi2-ch1\n/dev/video1", shape=box, style=filled, fillcolor=yellow]
n00000019 [label="rp1-cfe-csi2-ch2\n/dev/video2", shape=box, style=filled, fillcolor=yellow]
n0000001d [label="rp1-cfe-csi2-ch3\n/dev/video3", shape=box, style=filled, fillcolor=yellow]
n00000021 [label="rp1-cfe-fe-image0\n/dev/video4", shape=box, style=filled, fillcolor=yellow]
n00000025 [label="rp1-cfe-fe-image1\n/dev/video5", shape=box, style=filled, fillcolor=yellow]
n00000029 [label="rp1-cfe-fe-stats\n/dev/video6", shape=box, style=filled, fillcolor=yellow]
n0000002d [label="rp1-cfe-fe-config\n/dev/video7", shape=box, style=filled, fillcolor=yellow]
n0000002d -> n00000007:port1
}


@ -0,0 +1,78 @@
.. SPDX-License-Identifier: GPL-2.0
============================================
Raspberry Pi PiSP Camera Front End (rp1-cfe)
============================================
The PiSP Camera Front End
=========================
The PiSP Camera Front End (CFE) is a module which combines a CSI-2 receiver with
a simple ISP, called the Front End (FE).
The CFE has four DMA engines and can write frames from four separate streams
received from the CSI-2 to the memory. One of those streams can also be routed
directly to the FE, which can do minimal image processing, write two versions
(e.g. non-scaled and downscaled versions) of the received frames to memory and
provide statistics of the received frames.
The FE registers are documented in the `Raspberry Pi Image Signal Processor
(ISP) Specification document
<https://datasheets.raspberrypi.com/camera/raspberry-pi-image-signal-processor-specification.pdf>`_,
and example code for FE can be found in `libpisp
<https://github.com/raspberrypi/libpisp>`_.
The rp1-cfe driver
==================
The Raspberry Pi PiSP Camera Front End (rp1-cfe) driver is located under
drivers/media/platform/raspberrypi/rp1-cfe. It uses the `V4L2 API` to register
a number of video capture and output devices, the `V4L2 subdev API` to register
subdevices for the CSI-2 received and the FE that connects the video devices in
a single media graph realized using the `Media Controller (MC) API`.
The media topology registered by the `rp1-cfe` driver, in this particular
example connected to an imx219 sensor, is the following one:
.. _rp1-cfe-topology:
.. kernel-figure:: raspberrypi-rp1-cfe.dot
:alt: Diagram of an example media pipeline topology
:align: center
The media graph contains the following video device nodes:
- rp1-cfe-csi2-ch0: capture device for the first CSI-2 stream
- rp1-cfe-csi2-ch1: capture device for the second CSI-2 stream
- rp1-cfe-csi2-ch2: capture device for the third CSI-2 stream
- rp1-cfe-csi2-ch3: capture device for the fourth CSI-2 stream
- rp1-cfe-fe-image0: capture device for the first FE output
- rp1-cfe-fe-image1: capture device for the second FE output
- rp1-cfe-fe-stats: capture device for the FE statistics
- rp1-cfe-fe-config: output device for FE configuration
rp1-cfe-csi2-chX
----------------
The rp1-cfe-csi2-chX capture devices are normal V4L2 capture devices which
can be used to capture video frames or metadata received from the CSI-2.
rp1-cfe-fe-image0, rp1-cfe-fe-image1
------------------------------------
The rp1-cfe-fe-image0 and rp1-cfe-fe-image1 capture devices are used to write
the processed frames to memory.
rp1-cfe-fe-stats
----------------
The format of the FE statistics buffer is defined by
:c:type:`pisp_statistics` C structure and the meaning of each parameter is
described in the `PiSP specification` document.
rp1-cfe-fe-config
-----------------
The format of the FE configuration buffer is defined by
:c:type:`pisp_fe_config` C structure and the meaning of each parameter is
described in the `PiSP specification` document.

View File

@ -67,7 +67,7 @@ Changes / Fixes
Please mail to linux-media AT vger.kernel.org unified diffs against
the linux media git tree:
https://git.linuxtv.org/media_tree.git/
https://git.linuxtv.org/media.git/
This is done by committing a patch at a clone of the git tree and
submitting the patch using ``git send-email``. Don't forget to

View File

@ -20,12 +20,12 @@ Video4Linux (V4L) driver-specific documentation
ivtv
mgb4
omap3isp
omap4_camera
philips
qcom_camss
raspberrypi-pisp-be
rcar-fdp1
rkisp1
raspberrypi-rp1-cfe
saa7134
si470x
si4713

View File

@ -326,6 +326,29 @@ PMD_ORDER THP policy will be overridden. If the policy for PMD_ORDER
is not defined within a valid ``thp_anon``, its policy will default to
``never``.
Similarly to ``transparent_hugepage``, you can control the hugepage
allocation policy for the internal shmem mount by using the kernel parameter
``transparent_hugepage_shmem=<policy>``, where ``<policy>`` is one of the
six valid policies for shmem (``always``, ``within_size``, ``advise``,
``never``, ``deny``, and ``force``).
In the same manner as ``thp_anon`` controls each supported anonymous THP
size, ``thp_shmem`` controls each supported shmem THP size. ``thp_shmem``
has the same format as ``thp_anon``, but also supports the policy
``within_size``.
``thp_shmem=`` may be specified multiple times to configure all THP sizes
as required. If ``thp_shmem=`` is specified at least once, any shmem THP
sizes not explicitly configured on the command line are implicitly set to
``never``.
``transparent_hugepage_shmem`` setting only affects the global toggle. If
``thp_shmem`` is not specified, PMD_ORDER hugepage will default to
``inherit``. However, if a valid ``thp_shmem`` setting is provided by the
user, the PMD_ORDER hugepage policy will be overridden. If the policy for
PMD_ORDER is not defined within a valid ``thp_shmem``, its policy will
default to ``never``.
Hugepages in tmpfs/shmem
========================
@ -530,10 +553,18 @@ anon_fault_fallback_charge
instead falls back to using huge pages with lower orders or
small pages even though the allocation was successful.
swpout
is incremented every time a huge page is swapped out in one
zswpout
is incremented every time a huge page is swapped out to zswap in one
piece without splitting.
swpin
is incremented every time a huge page is swapped in from a non-zswap
swap device in one piece.
swpout
is incremented every time a huge page is swapped out to a non-zswap
swap device in one piece without splitting.
swpout_fallback
is incremented if a huge page has to be split before swapout.
Usually this is because the kernel failed to allocate some contiguous swap space

View File

@ -26,3 +26,4 @@ Performance monitor support
meson-ddr-pmu
cxl
ampere_cspmu
mrvl-pem-pmu

View File

@ -0,0 +1,56 @@
=================================================================
Marvell Odyssey PEM Performance Monitoring Unit (PMU UNCORE)
=================================================================
Each PCI Express Interface Unit (PEM) is associated with a corresponding
monitoring unit, which includes performance counters to track various
characteristics of the data transmitted over the PCIe link.
The counters track inbound and outbound transactions, with
separate counters for posted/non-posted/completion TLPs.
Inbound and outbound memory read requests, along with their
latencies, can also be monitored. Address Translation Services (ATS) events
such as ATS Translation, ATS Page Request and ATS Invalidation, along with
their corresponding latencies, are also tracked.
There are separate 64-bit counters to measure posted/non-posted/completion
TLPs in inbound and outbound transactions. ATS events are measured by
different counters.
The PMU driver exposes the available events and format options under sysfs,
/sys/bus/event_source/devices/mrvl_pcie_rc_pmu_<>/events/
/sys/bus/event_source/devices/mrvl_pcie_rc_pmu_<>/format/
Examples::
# perf list | grep mrvl_pcie_rc_pmu
mrvl_pcie_rc_pmu_<>/ats_inv/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ats_inv_latency/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ats_pri/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ats_pri_latency/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ats_trans/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ats_trans_latency/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ib_inflight/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ib_reads/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ib_req_no_ro_ebus/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ib_req_no_ro_ncb/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ib_tlp_cpl_partid/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ib_tlp_dwords_cpl_partid/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ib_tlp_dwords_npr/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ib_tlp_dwords_pr/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ib_tlp_npr/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ib_tlp_pr/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ob_inflight_partid/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ob_merges_cpl_partid/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ob_merges_npr_partid/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ob_merges_pr_partid/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ob_reads_partid/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ob_tlp_cpl_partid/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ob_tlp_dwords_cpl_partid/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ob_tlp_dwords_npr_partid/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ob_tlp_dwords_pr_partid/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ob_tlp_npr_partid/ [Kernel PMU event]
mrvl_pcie_rc_pmu_<>/ob_tlp_pr_partid/ [Kernel PMU event]
# perf stat -e ib_inflight,ib_reads,ib_req_no_ro_ebus,ib_req_no_ro_ncb <workload>

View File

@ -38,6 +38,11 @@ requests. ``aio-max-nr`` allows you to change the maximum value
``aio-max-nr`` does not result in the
pre-allocation or re-sizing of any kernel data structures.
dentry-negative
----------------------------
Policy for negative dentries. Set to 1 to always delete the dentry when a
file is removed, and 0 to disable this behavior. It is disabled by default.
dentry-state
------------
@ -332,3 +337,13 @@ Each "watch" costs roughly 90 bytes on a 32-bit kernel, and roughly 160 bytes
on a 64-bit one.
The current default value for ``max_user_watches`` is 4% of the
available low memory, divided by the "watch" cost in bytes.
5. /proc/sys/fs/fuse - Configuration options for FUSE filesystems
=====================================================================
This directory contains the following configuration options for FUSE
filesystems:
``/proc/sys/fs/fuse/max_pages_limit`` is a read/write file for
setting/getting the maximum number of pages that can be used for servicing
requests in FUSE.
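
As a concrete illustration, the following is a minimal user-space sketch of
querying and raising this limit; the value 256 written back is purely an
example, not a recommended setting, and writing requires sufficient privilege.

.. code-block:: c

   #include <stdio.h>

   int main(void)
   {
           const char *path = "/proc/sys/fs/fuse/max_pages_limit";
           unsigned long limit;
           FILE *f;

           f = fopen(path, "r");
           if (!f)
                   return 1;
           if (fscanf(f, "%lu", &limit) == 1)
                   printf("current max_pages_limit: %lu\n", limit);
           fclose(f);

           f = fopen(path, "w");           /* typically requires root */
           if (!f)
                   return 1;
           fprintf(f, "256\n");            /* example value only */
           fclose(f);
           return 0;
   }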

View File

@ -401,6 +401,15 @@ The upper bound on the number of tasks that are checked.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
hung_task_detect_count
======================
Indicates the total number of tasks that have been detected as hung since
the system boot.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
hung_task_timeout_secs
======================

View File

@ -0,0 +1,69 @@
.. SPDX-License-Identifier: GPL-2.0
=====================================
Arm Confidential Compute Architecture
=====================================
Arm systems that support the Realm Management Extension (RME) contain
hardware to allow a VM guest to be run in a way which protects the code
and data of the guest from the hypervisor. It extends the older "two
world" model (Normal and Secure World) into four worlds: Normal, Secure,
Root and Realm. Linux can then also be run as a guest to a monitor
running in the Realm world.
The monitor running in the Realm world is known as the Realm Management
Monitor (RMM) and implements the Realm Management Monitor
specification[1]. The monitor acts a bit like a hypervisor (e.g. it runs
in EL2 and manages the stage 2 page tables etc of the guests running in
Realm world), however much of the control is handled by a hypervisor
running in the Normal World. The Normal World hypervisor uses the Realm
Management Interface (RMI) defined by the RMM specification to request
the RMM to perform operations (e.g. mapping memory or executing a vCPU).
The RMM defines an environment for guests where the address space (IPA)
is split into two. The lower half is protected - any memory that is
mapped in this half cannot be seen by the Normal World and the RMM
restricts what operations the Normal World can perform on this memory
(e.g. the Normal World cannot replace pages in this region without the
guest's cooperation). The upper half is shared, the Normal World is free
to make changes to the pages in this region, and is able to emulate MMIO
devices in this region too.
A guest running in a Realm may also communicate with the RMM using the
Realm Services Interface (RSI) to request changes in its environment or
to perform attestation about its environment. In particular it may
request that areas of the protected address space are transitioned
between 'RAM' and 'EMPTY' (in either direction). This allows a Realm
guest to give up memory to be returned to the Normal World, or to
request new memory from the Normal World. Without an explicit request
from the Realm guest the RMM will otherwise prevent the Normal World
from making these changes.
Linux as a Realm Guest
----------------------
To run Linux as a guest within a Realm, the following must be provided
either by the VMM or by a `boot loader` run in the Realm before Linux:
* All protected RAM described to Linux (by DT or ACPI) must be marked
RIPAS RAM before handing control over to Linux.
* MMIO devices must be either unprotected (e.g. emulated by the Normal
World) or marked RIPAS DEV.
* MMIO devices emulated by the Normal World and used very early in boot
(specifically earlycon) must be specified in the upper half of IPA.
For earlycon this can be done by specifying the address on the
command line, e.g. with an IPA size of 33 bits and the base address
of the emulated UART at 0x1000000: ``earlycon=uart,mmio,0x101000000``
* Linux will use bounce buffers for communicating with unprotected
devices. It will transition some protected memory to RIPAS EMPTY and
expect to be able to access unprotected pages at the same IPA address
but with the highest valid IPA bit set. The expectation is that the
VMM will remove the physical pages from the protected mapping and
provide those pages as unprotected pages.
References
----------
[1] https://developer.arm.com/documentation/den0137/

View File

@ -41,6 +41,9 @@ to automatically locate and size all RAM, or it may use knowledge of
the RAM in the machine, or any other method the boot loader designer
sees fit.)
For Arm Confidential Compute Realms this includes ensuring that all
protected RAM has a Realm IPA state (RIPAS) of "RAM".
2. Setup the device tree
-------------------------
@ -385,6 +388,9 @@ Before jumping into the kernel, the following conditions must be met:
- HCRX_EL2.MSCEn (bit 11) must be initialised to 0b1.
- HCRX_EL2.MCE2 (bit 10) must be initialised to 0b1 and the hypervisor
must handle MOPS exceptions as described in :ref:`arm64_mops_hyp`.
For CPUs with the Extended Translation Control Register feature (FEAT_TCR2):
- If EL3 is present:
@ -411,6 +417,38 @@ Before jumping into the kernel, the following conditions must be met:
- HFGRWR_EL2.nPIRE0_EL1 (bit 57) must be initialised to 0b1.
- For CPUs with Guarded Control Stacks (FEAT_GCS):
- GCSCR_EL1 must be initialised to 0.
- GCSCRE0_EL1 must be initialised to 0.
- If EL3 is present:
- SCR_EL3.GCSEn (bit 39) must be initialised to 0b1.
- If EL2 is present:
- GCSCR_EL2 must be initialised to 0.
- If the kernel is entered at EL1 and EL2 is present:
- HCRX_EL2.GCSEn must be initialised to 0b1.
- HFGITR_EL2.nGCSEPP (bit 59) must be initialised to 0b1.
- HFGITR_EL2.nGCSSTR_EL1 (bit 58) must be initialised to 0b1.
- HFGITR_EL2.nGCSPUSHM_EL1 (bit 57) must be initialised to 0b1.
- HFGRTR_EL2.nGCS_EL1 (bit 53) must be initialised to 0b1.
- HFGRTR_EL2.nGCS_EL0 (bit 52) must be initialised to 0b1.
- HFGWTR_EL2.nGCS_EL1 (bit 53) must be initialised to 0b1.
- HFGWTR_EL2.nGCS_EL0 (bit 52) must be initialised to 0b1.
The requirements described above for CPU mode, caches, MMUs, architected
timers, coherency and system registers apply to all CPUs. All CPUs must
enter the kernel in the same exception level. Where the values documented

View File

@ -152,6 +152,8 @@ infrastructure:
+------------------------------+---------+---------+
| DIT | [51-48] | y |
+------------------------------+---------+---------+
| MPAM | [43-40] | n |
+------------------------------+---------+---------+
| SVE | [35-32] | y |
+------------------------------+---------+---------+
| GIC | [27-24] | n |

View File

@ -16,9 +16,9 @@ architected discovery mechanism available to userspace code at EL0. The
kernel exposes the presence of these features to userspace through a set
of flags called hwcaps, exposed in the auxiliary vector.
Userspace software can test for features by acquiring the AT_HWCAP or
AT_HWCAP2 entry of the auxiliary vector, and testing whether the relevant
flags are set, e.g.::
Userspace software can test for features by acquiring the AT_HWCAP,
AT_HWCAP2 or AT_HWCAP3 entry of the auxiliary vector, and testing
whether the relevant flags are set, e.g.::
bool floating_point_is_present(void)
{
@ -170,6 +170,10 @@ HWCAP_PACG
ID_AA64ISAR1_EL1.GPI == 0b0001, as described by
Documentation/arch/arm64/pointer-authentication.rst.
HWCAP_GCS
Functionality implied by ID_AA64PFR1_EL1.GCS == 0b1, as
described by Documentation/arch/arm64/gcs.rst.
HWCAP2_DCPODP
Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0010.

View File

@ -0,0 +1,227 @@
===============================================
Guarded Control Stack support for AArch64 Linux
===============================================
This document outlines briefly the interface provided to userspace by Linux in
order to support use of the ARM Guarded Control Stack (GCS) feature.
This is an outline of the most important features and issues only and not
intended to be exhaustive.
1. General
-----------
* GCS is an architecture feature intended to provide greater protection
against return oriented programming (ROP) attacks and to simplify the
implementation of features that need to collect stack traces such as
profiling.
* When GCS is enabled a separate guarded control stack is maintained by the
PE which is writeable only through specific GCS operations. This
stores the call stack only, when a procedure call instruction is
performed the current PC is pushed onto the GCS and on RET the
address in the LR is verified against that on the top of the GCS.
* When active the current GCS pointer is stored in the system register
GCSPR_EL0. This is readable by userspace but can only be updated
via specific GCS instructions.
* The architecture provides instructions for switching between guarded
control stacks with checks to ensure that the new stack is a valid
target for switching.
* The functionality of GCS is similar to that provided by the x86 Shadow
Stack feature, due to sharing of userspace interfaces the ABI refers to
shadow stacks rather than GCS.
* Support for GCS is reported to userspace via HWCAP_GCS in the aux vector
AT_HWCAP2 entry.
* GCS is enabled per thread. While there is support for disabling GCS
at runtime this should be done with great care.
* GCS memory access faults are reported as normal memory access faults.
* GCS specific errors (those reported with EC 0x2d) will be reported as
SIGSEGV with a si_code of SEGV_CPERR (control protection error).
* GCS is supported only for AArch64.
* On systems where GCS is supported GCSPR_EL0 is always readable by EL0
regardless of the GCS configuration for the thread.
* The architecture supports enabling GCS without verifying that return values
in LR match those in the GCS, in which case the LR will be ignored. This is
not supported by Linux.
2. Enabling and disabling Guarded Control Stacks
-------------------------------------------------
* GCS is enabled and disabled for a thread via the PR_SET_SHADOW_STACK_STATUS
prctl(), this takes a single flags argument specifying which GCS features
should be used.
* When set PR_SHADOW_STACK_ENABLE flag allocates a Guarded Control Stack
and enables GCS for the thread, enabling the functionality controlled by
GCSCRE0_EL1.{nTR, RVCHKEN, PCRSEL}.
* When set the PR_SHADOW_STACK_PUSH flag enables the functionality controlled
by GCSCRE0_EL1.PUSHMEn, allowing explicit GCS pushes.
* When set the PR_SHADOW_STACK_WRITE flag enables the functionality controlled
by GCSCRE0_EL1.STREn, allowing explicit stores to the Guarded Control Stack.
* Any unknown flags will cause PR_SET_SHADOW_STACK_STATUS to return -EINVAL.
* PR_LOCK_SHADOW_STACK_STATUS is passed a bitmask of features with the same
values as used for PR_SET_SHADOW_STACK_STATUS. Any future changes to the
status of the specified GCS mode bits will be rejected.
* PR_LOCK_SHADOW_STACK_STATUS allows any bit to be locked, this allows
userspace to prevent changes to any future features.
* There is no support for a process to remove a lock that has been set for
it.
* PR_SET_SHADOW_STACK_STATUS and PR_LOCK_SHADOW_STACK_STATUS affect only the
thread that called them, any other running threads will be unaffected.
* New threads inherit the GCS configuration of the thread that created them.
* GCS is disabled on exec().
* The current GCS configuration for a thread may be read with the
PR_GET_SHADOW_STACK_STATUS prctl(), this returns the same flags that
are passed to PR_SET_SHADOW_STACK_STATUS.
* If GCS is disabled for a thread after having previously been enabled then
the stack will remain allocated for the lifetime of the thread. At present
any attempt to reenable GCS for the thread will be rejected, this may be
revisited in future.
* It should be noted that, since enabling GCS causes it to become active
immediately, it is not normally possible to return from the function that
invoked the prctl() that enabled GCS. It is expected that GCS will normally
be enabled very early in the execution of a program (see the sketch below).
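
As a concrete illustration of the interface described above, the following is
a minimal sketch of enabling GCS at program start. The PR_SET_SHADOW_STACK_STATUS
and PR_SHADOW_STACK_ENABLE constants are assumed to come from recent kernel
UAPI headers (<linux/prctl.h>), and error handling is reduced to the bare minimum.

.. code-block:: c

   #include <stdlib.h>
   #include <sys/prctl.h>

   /*
    * Minimal sketch: enable GCS as early as possible.  As noted above,
    * the function that performs the enabling prctl() cannot normally
    * return, so the sketch terminates via exit() rather than returning
    * from main().
    */
   int main(void)
   {
           if (prctl(PR_SET_SHADOW_STACK_STATUS, PR_SHADOW_STACK_ENABLE, 0, 0, 0))
                   exit(1);        /* GCS unsupported or enabling failed */

           /* ... application code runs with a Guarded Control Stack ... */

           exit(0);
   }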
3. Allocation of Guarded Control Stacks
----------------------------------------
* When GCS is enabled for a thread a new Guarded Control Stack will be
allocated for it of half the standard stack size or 2 gigabytes,
whichever is smaller.
* When a new thread is created by a thread which has GCS enabled then a
new Guarded Control Stack will be allocated for the new thread with
half the size of the standard stack.
* When a stack is allocated by enabling GCS or during thread creation then
the top 8 bytes of the stack will be initialised to 0 and GCSPR_EL0 will
be set to point to the address of this 0 value, this can be used to
detect the top of the stack.
* Additional Guarded Control Stacks can be allocated using the
map_shadow_stack() system call.
* Stacks allocated using map_shadow_stack() can optionally have an end of
stack marker and cap placed at the top of the stack. If the flag
SHADOW_STACK_SET_TOKEN is specified a cap will be placed on the stack,
if SHADOW_STACK_SET_MARKER is not specified the cap will be the top 8
bytes of the stack and if it is specified then the cap will be the next
8 bytes. While specifying just SHADOW_STACK_SET_MARKER by itself is
valid, since the marker is all bits 0 it has no observable effect.
* Stacks allocated using map_shadow_stack() must have a size which is a
multiple of 8 bytes and larger than 8 bytes, and must be 8 bytes aligned.
* An address can be specified to map_shadow_stack(), if one is provided then
it must be aligned to a page boundary.
* When a thread is freed the Guarded Control Stack initially allocated for
that thread will be freed. Note carefully that if the stack has been
switched this may not be the stack currently in use by the thread.
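
The following is a minimal sketch of allocating an additional Guarded Control
Stack with the map_shadow_stack() system call described above. There is no
libc wrapper, so the raw syscall() interface is used; __NR_map_shadow_stack
and the SHADOW_STACK_SET_* flags are assumed to come from recent kernel UAPI
headers.

.. code-block:: c

   #include <unistd.h>
   #include <sys/syscall.h>
   #include <linux/mman.h>

   /*
    * Minimal sketch: allocate an additional GCS of 'size' bytes with an
    * end-of-stack marker and a cap token at the top.  Returns NULL on
    * failure.
    */
   static void *alloc_gcs(size_t size)
   {
           long ret = syscall(__NR_map_shadow_stack, 0, size,
                              SHADOW_STACK_SET_TOKEN | SHADOW_STACK_SET_MARKER);

           return ret == -1 ? NULL : (void *)ret;
   }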
4. Signal handling
--------------------
* A new signal frame record gcs_context encodes the current GCS mode and
pointer for the interrupted context on signal delivery. This will always
be present on systems that support GCS.
* The record contains a flag field which reports the current GCS configuration
for the interrupted context as PR_GET_SHADOW_STACK_STATUS would.
* The signal handler is run with the same GCS configuration as the interrupted
context.
* When GCS is enabled for the interrupted thread a signal handling specific
GCS cap token will be written to the GCS, this is an architectural GCS cap
with the token type (bits 0..11) all clear. The GCSPR_EL0 reported in the
signal frame will point to this cap token.
* The signal handler will use the same GCS as the interrupted context.
* When GCS is enabled on signal entry a frame with the address of the signal
return handler will be pushed onto the GCS, allowing return from the signal
handler via RET as normal. This will not be reported in the gcs_context in
the signal frame.
5. Signal return
-----------------
When returning from a signal handler:
* If there is a gcs_context record in the signal frame then the GCS flags
and GCSPR_EL0 will be restored from that context prior to further
validation.
* If there is no gcs_context record in the signal frame then the GCS
configuration will be unchanged.
* If GCS is enabled on return from a signal handler then GCSPR_EL0 must
point to a valid GCS signal cap record, this will be popped from the
GCS prior to signal return.
* If the GCS configuration is locked when returning from a signal then any
attempt to change the GCS configuration will be treated as an error. This
is true even if GCS was not enabled prior to signal entry.
* GCS may be disabled via signal return but any attempt to enable GCS via
signal return will be rejected.
6. ptrace extensions
---------------------
* A new regset NT_ARM_GCS is defined for use with PTRACE_GETREGSET and
PTRACE_SETREGSET.
* The GCS mode, including enable and disable, may be configured via ptrace.
If GCS is enabled via ptrace no new GCS will be allocated for the thread.
* Configuration via ptrace ignores locking of GCS mode bits.
7. ELF coredump extensions
---------------------------
* NT_ARM_GCS notes will be added to each coredump for each thread of the
dumped process. The contents will be equivalent to the data that would
have been read if a PTRACE_GETREGSET of the corresponding type were
executed for each thread when the coredump was generated.
8. /proc extensions
--------------------
* Guarded Control Stack pages will include "ss" in their VmFlags in
/proc/<pid>/smaps.

View File

@ -10,16 +10,19 @@ ARM64 Architecture
acpi_object_usage
amu
arm-acpi
arm-cca
asymmetric-32bit
booting
cpu-feature-registers
cpu-hotplug
elf_hwcaps
gcs
hugetlbpage
kdump
legacy_instructions
memory
memory-tagging-extension
mops
perf
pointer-authentication
ptdump

View File

@ -0,0 +1,44 @@
.. SPDX-License-Identifier: GPL-2.0
===================================
Memory copy/set instructions (MOPS)
===================================
A MOPS memory copy/set operation consists of three consecutive CPY* or SET*
instructions: a prologue, main and epilogue (for example: CPYP, CPYM, CPYE).
A main or epilogue instruction can take a MOPS exception for various reasons,
for example when a task is migrated to a CPU with a different MOPS
implementation, or when the instruction's alignment and size requirements are
not met. The software exception handler is then expected to reset the registers
and restart execution from the prologue instruction. Normally this is handled
by the kernel.
For more details refer to "D1.3.5.7 Memory Copy and Memory Set exceptions" in
the Arm Architecture Reference Manual DDI 0487K.a (Arm ARM).
.. _arm64_mops_hyp:
Hypervisor requirements
-----------------------
A hypervisor running a Linux guest must handle all MOPS exceptions from the
guest kernel, as Linux may not be able to handle the exception at all times.
For example, a MOPS exception can be taken when the hypervisor migrates a vCPU
to another physical CPU with a different MOPS implementation.
To do this, the hypervisor must:
- Set HCRX_EL2.MCE2 to 1 so that the exception is taken to the hypervisor.
- Have an exception handler that implements the algorithm from the Arm ARM
rules CNTMJ and MWFQH.
- Set the guest's PSTATE.SS to 0 in the exception handler, to handle a
potential step of the current instruction.
Note: Clearing PSTATE.SS is needed so that a single step exception is taken
on the next instruction (the prologue instruction). Otherwise the prologue
would get silently stepped over and the single step exception taken on the
main instruction. Note that if the guest instruction is not being stepped
then clearing PSTATE.SS has no effect.

View File

@ -258,6 +258,8 @@ stable kernels.
| Hisilicon | Hip{08,09,10,10C| #162001900 | N/A |
| | ,11} SMMU PMCG | | |
+----------------+-----------------+-----------------+-----------------------------+
| Hisilicon | Hip09 | #162100801 | HISILICON_ERRATUM_162100801 |
+----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
| Qualcomm Tech. | Kryo/Falkor v1 | E1003 | QCOM_FALKOR_ERRATUM_1003 |
+----------------+-----------------+-----------------+-----------------------------+

View File

@ -346,6 +346,10 @@ The regset data starts with struct user_za_header, containing:
* Writes to NT_ARM_ZT will set PSTATE.ZA to 1.
* If any register data is provided along with SME_PT_VL_ONEXEC then the
registers data will be interpreted with the current vector length, not
the vector length configured for use on exec.
8. ELF coredump extensions
---------------------------

View File

@ -402,6 +402,10 @@ The regset data starts with struct user_sve_header, containing:
streaming mode and any SETREGSET of NT_ARM_SSVE will enter streaming mode
if the target was not in streaming mode.
* If any register data is provided along with SVE_PT_VL_ONEXEC then the
registers data will be interpreted with the current vector length, not
the vector length configured for use on exec.
* The effect of writing a partial, incomplete payload is unspecified.

View File

@ -85,6 +85,70 @@ to CPUINTC directly::
| Devices |
+---------+
Virtual Extended IRQ model
==========================
In this model, IPI (Inter-Processor Interrupt) and CPU Local Timer interrupts
go to CPUINTC directly, CPU UART interrupts go to PCH-PIC, while all other
device interrupts go to PCH-PIC/PCH-MSI, are gathered by V-EIOINTC (Virtual
Extended I/O Interrupt Controller), and then go to CPUINTC directly::
+-----+ +-------------------+ +-------+
| IPI |--> | CPUINTC(0-255vcpu)| <-- | Timer |
+-----+ +-------------------+ +-------+
^
|
+-----------+
| V-EIOINTC |
+-----------+
^ ^
| |
+---------+ +---------+
| PCH-PIC | | PCH-MSI |
+---------+ +---------+
^ ^ ^
| | |
+--------+ +---------+ +---------+
| UARTs | | Devices | | Devices |
+--------+ +---------+ +---------+
Description
-----------
V-EIOINTC (Virtual Extended I/O Interrupt Controller) is an extension of
EIOINTC; it only works in VM mode, running under the KVM hypervisor. Interrupts
can be routed to up to four vCPUs via the standard EIOINTC, whereas with
V-EIOINTC interrupts can be routed to up to 256 virtual CPUs.
With the standard EIOINTC, the interrupt routing setting consists of two parts:
eight bits for CPU selection and four bits for CPU IP (Interrupt Pin) selection.
Of the CPU selection bits, four select the EIOINTC node and four select the
EIOINTC CPU. A bitmap method is used for both CPU selection and CPU IP
selection, so an interrupt can only be routed to CPU0 - CPU3 and IP0 - IP3
within one EIOINTC node.
V-EIOINTC adds support for routing to more CPUs and CPU IPs (Interrupt Pins);
two new registers are added for this purpose.
EXTIOI_VIRT_FEATURES
--------------------
This is a read-only register which indicates the features supported by
V-EIOINTC. The features EXTIOI_HAS_INT_ENCODE and EXTIOI_HAS_CPU_ENCODE are
added.
Feature EXTIOI_HAS_INT_ENCODE is part of the standard EIOINTC. If it is 1, it
indicates that CPU Interrupt Pin selection can use the normal (encoded) method
rather than the bitmap method, so an interrupt can be routed to IP0 - IP15.
Feature EXTIOI_HAS_CPU_ENCODE is an extension of V-EIOINTC. If it is 1, it
indicates that CPU selection can use the normal (encoded) method rather than
the bitmap method, so an interrupt can be routed to CPU0 - CPU255.
EXTIOI_VIRT_CONFIG
------------------
This is a read-write register. For compatibility, interrupt routing uses the
default method, which is the same as the standard EIOINTC. If a bit is set
to 1, it directs the hardware to use the normal (encoded) method rather than
the bitmap method.
Advanced Extended IRQ model
===========================

View File

@ -93,8 +93,8 @@ given platform based on the content of the device-tree. Thus, you
should:
a) add your platform support as a _boolean_ option in
arch/powerpc/Kconfig, following the example of PPC_PSERIES,
PPC_PMAC and PPC_MAPLE. The latter is probably a good
arch/powerpc/Kconfig, following the example of PPC_PSERIES
and PPC_PMAC. The latter is probably a good
example of a board support to start from.
b) create your main platform file as

View File

@ -239,6 +239,9 @@ The following keys are defined:
ratified in commit 98918c844281 ("Merge pull request #1217 from
riscv/zawrs") of riscv-isa-manual.
* :c:macro:`RISCV_HWPROBE_EXT_SUPM`: The Supm extension is supported as
defined in version 1.0 of the RISC-V Pointer Masking extensions.
* :c:macro:`RISCV_HWPROBE_KEY_CPUPERF_0`: Deprecated. Returns similar values to
:c:macro:`RISCV_HWPROBE_KEY_MISALIGNED_SCALAR_PERF`, but the key was
mistakenly classified as a bitmask rather than a value.
@ -274,3 +277,19 @@ The following keys are defined:
represent the highest userspace virtual address usable.
* :c:macro:`RISCV_HWPROBE_KEY_TIME_CSR_FREQ`: Frequency (in Hz) of `time CSR`.
* :c:macro:`RISCV_HWPROBE_KEY_MISALIGNED_VECTOR_PERF`: An enum value describing the
performance of misaligned vector accesses on the selected set of processors.
* :c:macro:`RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN`: The performance of misaligned
vector accesses is unknown.
* :c:macro:`RISCV_HWPROBE_MISALIGNED_VECTOR_SLOW`: 32-bit misaligned accesses using vector
registers are slower than the equivalent quantity of byte accesses via vector registers.
Misaligned accesses may be supported directly in hardware, or trapped and emulated by software.
* :c:macro:`RISCV_HWPROBE_MISALIGNED_VECTOR_FAST`: 32-bit misaligned accesses using vector
registers are faster than the equivalent quantity of byte accesses via vector registers.
* :c:macro:`RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED`: Misaligned vector accesses are
not supported at all and will generate a misaligned address fault.
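
As an illustration of how these keys are consumed, the following is a minimal
sketch that queries RISCV_HWPROBE_KEY_MISALIGNED_VECTOR_PERF through the raw
hwprobe system call. The struct and constants are assumed to come from
<asm/hwprobe.h> of a sufficiently new kernel; the vDSO entry point is not used
here.

.. code-block:: c

   #include <stdio.h>
   #include <unistd.h>
   #include <sys/syscall.h>
   #include <asm/hwprobe.h>

   int main(void)
   {
           struct riscv_hwprobe pair = {
                   .key = RISCV_HWPROBE_KEY_MISALIGNED_VECTOR_PERF,
           };

           /* Query all online CPUs (cpusetsize == 0, cpus == NULL). */
           if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0))
                   return 1;

           if (pair.value == RISCV_HWPROBE_MISALIGNED_VECTOR_FAST)
                   printf("misaligned vector accesses are fast\n");
           else
                   printf("misaligned vector access performance: %llu\n",
                          (unsigned long long)pair.value);
           return 0;
   }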

View File

@ -68,3 +68,19 @@ Misaligned accesses
Misaligned scalar accesses are supported in userspace, but they may perform
poorly. Misaligned vector accesses are only supported if the Zicclsm extension
is supported.
Pointer masking
---------------
Support for pointer masking in userspace (the Supm extension) is provided via
the ``PR_SET_TAGGED_ADDR_CTRL`` and ``PR_GET_TAGGED_ADDR_CTRL`` ``prctl()``
operations. Pointer masking is disabled by default. To enable it, userspace
must call ``PR_SET_TAGGED_ADDR_CTRL`` with the ``PR_PMLEN`` field set to the
number of mask/tag bits needed by the application. ``PR_PMLEN`` is interpreted
as a lower bound; if the kernel is unable to satisfy the request, the
``PR_SET_TAGGED_ADDR_CTRL`` operation will fail. The actual number of tag bits
is returned in ``PR_PMLEN`` by the ``PR_GET_TAGGED_ADDR_CTRL`` operation.
Additionally, when pointer masking is enabled (``PR_PMLEN`` is greater than 0),
a tagged address ABI is supported, with the same interface and behavior as
documented for AArch64 (Documentation/arch/arm64/tagged-address-abi.rst).
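
A minimal sketch of the prctl() sequence described above is shown below; the
PR_PMLEN_SHIFT/PR_PMLEN_MASK encoding of the ``PR_PMLEN`` field is assumed to
be provided by recent kernel UAPI headers (<linux/prctl.h>).

.. code-block:: c

   #include <stdio.h>
   #include <sys/prctl.h>

   int main(void)
   {
           long ctrl;

           /* Request at least 7 tag bits (example value only). */
           if (prctl(PR_SET_TAGGED_ADDR_CTRL, 7UL << PR_PMLEN_SHIFT, 0, 0, 0)) {
                   printf("pointer masking not available\n");
                   return 1;
           }

           /* Read back how many tag bits were actually granted. */
           ctrl = prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0);
           if (ctrl < 0)
                   return 1;

           printf("granted PMLEN: %lu\n",
                  ((unsigned long)ctrl & PR_PMLEN_MASK) >> PR_PMLEN_SHIFT);
           return 0;
   }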

View File

@ -4,8 +4,9 @@
AMD HSMP interface
============================================
Newer Fam19h EPYC server line of processors from AMD support system
management functionality via HSMP (Host System Management Port).
Newer Fam19h (models 0x00-0x1f, 0x30-0x3f, 0x90-0x9f, 0xa0-0xaf) and
Fam1Ah (models 0x00-0x1f) EPYC server line processors from AMD support
system management functionality via HSMP (Host System Management Port).
The Host System Management Port (HSMP) is an interface to provide
OS-level software with access to system management functions via a
@ -16,14 +17,25 @@ More details on the interface can be found in chapter
Eg: https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/programmer-references/55898_B1_pub_0_50.zip
HSMP interface is supported on EPYC server CPU models only.
HSMP interface is supported on EPYC line of server CPUs and MI300A (APU).
HSMP device
============================================
amd_hsmp driver under the drivers/platforms/x86/ creates miscdevice
/dev/hsmp to let user space programs run hsmp mailbox commands.
amd_hsmp driver under drivers/platform/x86/amd/hsmp/ has separate driver files
for ACPI object based probing, platform device based probing and for the common
code shared by these two drivers.
Kconfig option CONFIG_AMD_HSMP_PLAT compiles plat.c and creates amd_hsmp.ko.
Kconfig option CONFIG_AMD_HSMP_ACPI compiles acpi.c and creates hsmp_acpi.ko.
Selecting any of these two configs automatically selects CONFIG_AMD_HSMP. This
compiles common code hsmp.c and creates hsmp_common.ko module.
Both the ACPI and plat drivers create the miscdevice /dev/hsmp to let
user space programs run hsmp mailbox commands.
The ACPI object format supported by the driver is defined below.
$ ls -al /dev/hsmp
crw-r--r-- 1 root root 10, 123 Jan 21 21:41 /dev/hsmp
@ -59,6 +71,51 @@ Note: lseek() is not supported as entire metrics table is read.
Metrics table definitions will be documented as part of Public PPR.
The same is defined in the amd_hsmp.h header.
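
As an illustration, a minimal sketch of issuing one mailbox command through
/dev/hsmp is shown below. The structure layout and ioctl number are taken from
the amd_hsmp.h UAPI header; the choice of HSMP_GET_SMU_VER as the message, and
the assumption that read-only access suffices for query messages, are
illustrative only.

.. code-block:: c

   #include <stdio.h>
   #include <fcntl.h>
   #include <unistd.h>
   #include <sys/ioctl.h>
   #include <asm/amd_hsmp.h>

   int main(void)
   {
           struct hsmp_message msg = {
                   .msg_id = HSMP_GET_SMU_VER,     /* illustrative message */
                   .num_args = 0,
                   .response_sz = 1,
                   .sock_ind = 0,                  /* socket 0 */
           };
           int fd = open("/dev/hsmp", O_RDONLY);

           if (fd < 0)
                   return 1;
           if (ioctl(fd, HSMP_IOCTL_CMD, &msg) < 0) {
                   close(fd);
                   return 1;
           }
           printf("SMU firmware version word: 0x%x\n", msg.args[0]);
           close(fd);
           return 0;
   }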
ACPI device object format
=========================
The ACPI object format expected from the amd_hsmp driver
for socket with ID00 is given below::
Device(HSMP)
{
Name(_HID, "AMDI0097")
Name(_UID, "ID00")
Name(HSE0, 0x00000001)
Name(RBF0, ResourceTemplate()
{
Memory32Fixed(ReadWrite, 0xxxxxxx, 0x00100000)
})
Method(_CRS, 0, NotSerialized)
{
Return(RBF0)
}
Method(_STA, 0, NotSerialized)
{
If(LEqual(HSE0, One))
{
Return(0x0F)
}
Else
{
Return(Zero)
}
}
Name(_DSD, Package(2)
{
Buffer(0x10)
{
0x9D, 0x61, 0x4D, 0xB7, 0x07, 0x57, 0xBD, 0x48,
0xA6, 0x9F, 0x4E, 0xA2, 0x87, 0x1F, 0xC2, 0xF6
},
Package(3)
{
Package(2) {"MsgIdOffset", 0x00010934},
Package(2) {"MsgRspOffset", 0x00010980},
Package(2) {"MsgArgOffset", 0x000109E0}
}
})
}
An example
==========

View File

@ -896,10 +896,19 @@ Offset/size: 0x260/4
The kernel runtime start address is determined by the following algorithm::
if (relocatable_kernel)
runtime_start = align_up(load_address, kernel_alignment)
else
runtime_start = pref_address
if (relocatable_kernel) {
if (load_address < pref_address)
load_address = pref_address;
runtime_start = align_up(load_address, kernel_alignment);
} else {
runtime_start = pref_address;
}
Hence the necessary memory window location and size can be estimated by
a boot loader as::
memory_window_start = runtime_start;
memory_window_size = init_size;
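
A small boot-loader-side sketch of the calculation above, with align_up()
written out explicitly, might look as follows; the parameter names simply
mirror the setup-header fields described in this document.

.. code-block:: c

   #include <stdint.h>

   /* Round x up to the next multiple of a, where a is a power of two. */
   static uint64_t align_up(uint64_t x, uint64_t a)
   {
           return (x + a - 1) & ~(a - 1);
   }

   /* Compute the kernel runtime start and the memory window to reserve. */
   static void kernel_memory_window(uint64_t load_address, int relocatable_kernel,
                                    uint64_t pref_address, uint64_t kernel_alignment,
                                    uint64_t init_size,
                                    uint64_t *window_start, uint64_t *window_size)
   {
           uint64_t runtime_start;

           if (relocatable_kernel) {
                   if (load_address < pref_address)
                           load_address = pref_address;
                   runtime_start = align_up(load_address, kernel_alignment);
           } else {
                   runtime_start = pref_address;
           }

           *window_start = runtime_start;
           *window_size = init_size;
   }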
============ ===============
Field name: handover_offset

View File

@ -26,7 +26,8 @@ Detection
=========
Intel processors may support either or both of the following hardware
mechanisms to detect split locks and bus locks.
mechanisms to detect split locks and bus locks. Some AMD processors also
support bus lock detect.
#AC exception for split lock detection
--------------------------------------

View File

@ -305,3 +305,8 @@ The available options are:
debug
Enable debug messages.
nosnp
Do not enable SEV-SNP (applies to host/hypervisor only). Setting
'nosnp' avoids the RMP check overhead in memory accesses when
users do not want to run SEV-SNP guests.

View File

@ -29,15 +29,27 @@ Complete virtual memory map with 4-level page tables
Start addr | Offset | End addr | Size | VM area description
========================================================================================================================
| | | |
0000000000000000 | 0 | 00007fffffffffff | 128 TB | user-space virtual memory, different per mm
0000000000000000 | 0 | 00007fffffffefff | ~128 TB | user-space virtual memory, different per mm
00007ffffffff000 | ~128 TB | 00007fffffffffff | 4 kB | ... guard hole
__________________|____________|__________________|_________|___________________________________________________________
| | | |
0000800000000000 | +128 TB | ffff7fffffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical
| | | | virtual memory addresses up to the -128 TB
0000800000000000 | +128 TB | 7fffffffffffffff | ~8 EB | ... huge, almost 63 bits wide hole of non-canonical
| | | | virtual memory addresses up to the -8 EB
| | | | starting offset of kernel mappings.
| | | |
| | | | LAM relaxes the canonicality check, allowing aliases to be created
| | | | for userspace memory here.
__________________|____________|__________________|_________|___________________________________________________________
|
| Kernel-space virtual memory, shared between all processes:
__________________|____________|__________________|_________|___________________________________________________________
| | | |
8000000000000000 | -8 EB | ffff7fffffffffff | ~8 EB | ... huge, almost 63 bits wide hole of non-canonical
| | | | virtual memory addresses up to the -128 TB
| | | | starting offset of kernel mappings.
| | | |
| | | | LAM_SUP relaxes the canonicality check, allowing the creation of
| | | | aliases for kernel memory here.
____________________________________________________________|___________________________________________________________
| | | |
ffff800000000000 | -128 TB | ffff87ffffffffff | 8 TB | ... guard hole, also reserved for hypervisor
@ -88,15 +100,26 @@ Complete virtual memory map with 5-level page tables
Start addr | Offset | End addr | Size | VM area description
========================================================================================================================
| | | |
0000000000000000 | 0 | 00ffffffffffffff | 64 PB | user-space virtual memory, different per mm
0000000000000000 | 0 | 00fffffffffff000 | ~64 PB | user-space virtual memory, different per mm
00fffffffffff000 | ~64 PB | 00ffffffffffffff | 4 kB | ... guard hole
__________________|____________|__________________|_________|___________________________________________________________
| | | |
0100000000000000 | +64 PB | feffffffffffffff | ~16K PB | ... huge, still almost 64 bits wide hole of non-canonical
| | | | virtual memory addresses up to the -64 PB
0100000000000000 | +64 PB | 7fffffffffffffff | ~8 EB | ... huge, almost 63 bits wide hole of non-canonical
| | | | virtual memory addresses up to the -8 EB
| | | | starting offset of kernel mappings.
| | | |
| | | | LAM relaxes the canonicality check, allowing aliases to be created
| | | | for userspace memory here.
__________________|____________|__________________|_________|___________________________________________________________
|
| Kernel-space virtual memory, shared between all processes:
____________________________________________________________|___________________________________________________________
8000000000000000 | -8 EB | feffffffffffffff | ~8 EB | ... huge, almost 63 bits wide hole of non-canonical
| | | | virtual memory addresses up to the -64 PB
| | | | starting offset of kernel mappings.
| | | |
| | | | LAM_SUP relaxes the canonicality check, allowing the creation of
| | | | aliases for kernel memory here.
____________________________________________________________|___________________________________________________________
| | | |
ff00000000000000 | -64 PB | ff0fffffffffffff | 4 PB | ... guard hole, also reserved for hypervisor

View File

@ -39,13 +39,16 @@ blkdevparts=<blkdev-def>[;<blkdev-def>]
create a link to block device partition with the name "PARTNAME".
User space application can access partition by partition name.
ro
read-only. Flag the partition as read-only.
Example:
eMMC disk names are "mmcblk0" and "mmcblk0boot0".
bootargs::
'blkdevparts=mmcblk0:1G(data0),1G(data1),-;mmcblk0boot0:1m(boot),-(kernel)'
'blkdevparts=mmcblk0:1G(data0),1G(data1),-;mmcblk0boot0:1m(boot)ro,-(kernel)'
dmesg::

View File

@ -199,24 +199,36 @@ managing and controlling ublk devices with help of several control commands:
- user recovery feature description
Two new features are added for user recovery: ``UBLK_F_USER_RECOVERY`` and
``UBLK_F_USER_RECOVERY_REISSUE``.
Three new features are added for user recovery: ``UBLK_F_USER_RECOVERY``,
``UBLK_F_USER_RECOVERY_REISSUE``, and ``UBLK_F_USER_RECOVERY_FAIL_IO``. To
enable recovery of ublk devices after the ublk server exits, the ublk server
should specify the ``UBLK_F_USER_RECOVERY`` flag when creating the device. The
ublk server may additionally specify at most one of
``UBLK_F_USER_RECOVERY_REISSUE`` and ``UBLK_F_USER_RECOVERY_FAIL_IO`` to
modify how I/O is handled while the ublk server is dying/dead (this is called
the ``nosrv`` case in the driver code).
With ``UBLK_F_USER_RECOVERY`` set, after one ubq_daemon(ublk server's io
With just ``UBLK_F_USER_RECOVERY`` set, after one ubq_daemon (ublk server's io
handler) is dying, ublk does not delete ``/dev/ublkb*`` during the whole
recovery stage and ublk device ID is kept. It is ublk server's
responsibility to recover the device context by its own knowledge.
Requests which have not been issued to userspace are requeued. Requests
which have been issued to userspace are aborted.
With ``UBLK_F_USER_RECOVERY_REISSUE`` set, after one ubq_daemon(ublk
server's io handler) is dying, contrary to ``UBLK_F_USER_RECOVERY``,
With ``UBLK_F_USER_RECOVERY_REISSUE`` additionally set, after one ubq_daemon
(ublk server's io handler) is dying, contrary to ``UBLK_F_USER_RECOVERY``,
requests which have been issued to userspace are requeued and will be
re-issued to the new process after handling ``UBLK_CMD_END_USER_RECOVERY``.
``UBLK_F_USER_RECOVERY_REISSUE`` is designed for backends who tolerate
double-write since the driver may issue the same I/O request twice. It
might be useful to a read-only FS or a VM backend.
With ``UBLK_F_USER_RECOVERY_FAIL_IO`` additionally set, after the ublk server
exits, requests which have been issued to userspace are failed, as are any
subsequently issued requests. Applications continuously issuing I/O against
devices with this flag set will see a stream of I/O errors until a new ublk
server recovers the device.
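
A minimal sketch of selecting the recovery behaviour when preparing the
``UBLK_CMD_ADD_DEV`` parameters is shown below; the flag and structure names
come from the ublk UAPI header, and the rest of the control-path plumbing
(io_uring setup and command submission) is omitted.

.. code-block:: c

   #include <linux/ublk_cmd.h>

   /*
    * Choose the recovery behaviour for a ublk device.  Only one of the
    * two modifier flags may be combined with UBLK_F_USER_RECOVERY.
    */
   static void set_recovery_flags(struct ublksrv_ctrl_dev_info *info,
                                  int fail_io_while_dead)
   {
           info->flags |= UBLK_F_USER_RECOVERY;

           if (fail_io_while_dead)
                   info->flags |= UBLK_F_USER_RECOVERY_FAIL_IO;    /* fail I/O until recovery */
           else
                   info->flags |= UBLK_F_USER_RECOVERY_REISSUE;    /* requeue and re-issue */
   }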
Unprivileged ublk device is supported by passing ``UBLK_F_UNPRIVILEGED_DEV``.
Once the flag is set, all control commands can be sent by unprivileged
user. Except for command of ``UBLK_CMD_ADD_DEV``, permission check on

View File

@ -835,7 +835,7 @@ section named by ``btf_ext_info_sec->sec_name_off``.
See :ref:`Documentation/bpf/llvm_reloc.rst <btf-co-re-relocations>`
for more information on CO-RE relocations.
4.2 .BTF_ids section
4.3 .BTF_ids section
--------------------
The .BTF_ids section encodes BTF ID values that are used within the kernel.
@ -896,6 +896,81 @@ and is used as a filter when resolving the BTF ID value.
All the BTF ID lists and sets are compiled in the .BTF_ids section and
resolved during the linking phase of kernel build by ``resolve_btfids`` tool.
4.4 .BTF.base section
---------------------
Split BTF - where the .BTF section only contains types not in the associated
base .BTF section - is an extremely efficient way to encode type information
for kernel modules, since they generally consist of a few module-specific
types along with a large set of shared kernel types. The former are encoded
in split BTF, while the latter are encoded in base BTF, resulting in more
compact representations. A type in split BTF that refers to a type in
base BTF refers to it using its base BTF ID, and split BTF IDs start
at last_base_BTF_ID + 1.
The downside of this approach however is that this makes the split BTF
somewhat brittle - when the base BTF changes, base BTF ID references are
no longer valid and the split BTF itself becomes useless. The role of the
.BTF.base section is to make split BTF more resilient for cases where
the base BTF may change, as is the case, for example, for kernel modules that
are not rebuilt every time the kernel is. .BTF.base contains named base types: INTs,
FLOATs, STRUCTs, UNIONs, ENUM[64]s and FWDs. INTs and FLOATs are fully
described in .BTF.base sections, while composite types like structs
and unions are not fully defined - the .BTF.base type simply serves as
a description of the type the split BTF referred to, so structs/unions
have 0 members in the .BTF.base section. ENUM[64]s are similarly recorded
with 0 members. Any other types are added to the split BTF. This
distillation process then leaves us with a .BTF.base section with
such minimal descriptions of base types and .BTF split section which refers
to those base types. Later, we can relocate the split BTF using both the
information stored in the .BTF.base section and the new .BTF base; the type
information in the .BTF.base section allows us to update the split BTF
references to point at the corresponding new base BTF IDs.
BTF relocation happens on kernel module load when a kernel module has a
.BTF.base section, and libbpf also provides a btf__relocate() API to
accomplish this.
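
A minimal user-space sketch of the libbpf path is shown below, assuming a
libbpf version that provides btf__relocate(); how the module's split BTF
object (together with its distilled .BTF.base types) is obtained is left out
of the sketch.

.. code-block:: c

   #include <bpf/btf.h>

   /*
    * 'split' is the module's split BTF, already loaded against its
    * distilled base types.  Relocate it against the running kernel's
    * vmlinux BTF; 'new_base' is kept alive since the relocated split
    * BTF keeps referring to it.
    */
   static int relocate_split_btf(struct btf *split)
   {
           struct btf *new_base = btf__parse("/sys/kernel/btf/vmlinux", NULL);

           if (!new_base)
                   return -1;

           /* Rewrites base-BTF ID references in 'split' using .BTF.base info. */
           return btf__relocate(split, new_base);
   }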
As an example consider the following base BTF::
[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[2] STRUCT 'foo' size=8 vlen=2
'f1' type_id=1 bits_offset=0
'f2' type_id=1 bits_offset=32
...and associated split BTF::
[3] PTR '(anon)' type_id=2
i.e. split BTF describes a pointer to struct foo { int f1; int f2 };
.BTF.base will consist of::
[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[2] STRUCT 'foo' size=8 vlen=0
If we relocate the split BTF later using the following new base BTF::
[1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
[2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[3] STRUCT 'foo' size=8 vlen=2
'f1' type_id=2 bits_offset=0
'f2' type_id=2 bits_offset=32
...we can use our .BTF.base description to know that the split BTF reference
is to struct foo, and relocation results in new split BTF::
[4] PTR '(anon)' type_id=3
Note that we had to update BTF ID and start BTF ID for the split BTF.
So we see how .BTF.base plays the role of facilitating later relocation,
leading to more resilient split BTF.
.BTF.base sections will be generated automatically for out-of-tree kernel module
builds - i.e. where KBUILD_EXTMOD is set (as it would be for "make M=path/2/mod"
cases). .BTF.base generation requires pahole support for the "distilled_base"
BTF feature; this is available in pahole v1.28 and later.
5. Using BTF
============

View File

@ -507,7 +507,7 @@ Notes:
from the parent state to the current state.
* Details about REG_LIVE_READ32 are omitted.
* Function ``propagate_liveness()`` (see section :ref:`read_marks_for_cache_hits`)
might override the first parent link. Please refer to the comments in the
``propagate_liveness()`` and ``mark_reg_read()`` source code for further
@ -571,7 +571,7 @@ works::
are considered equivalent.
.. _read_marks_for_cache_hits:
Read marks propagation for cache hits
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

View File

@ -616,7 +616,7 @@ ONLINE section for notifications on online and offline operation::
....
cpuhp_remove_instance(state, &inst2->node);
....
remove_multi_state(state);
cpuhp_remove_multi_state(state);
Testing of hotplug states

View File

@ -55,14 +55,16 @@ scope.
What about __vmalloc(GFP_NOFS)
==============================
vmalloc doesn't support GFP_NOFS semantic because there are hardcoded
GFP_KERNEL allocations deep inside the allocator which are quite non-trivial
to fix up. That means that calling ``vmalloc`` with GFP_NOFS/GFP_NOIO is
almost always a bug. The good news is that the NOFS/NOIO semantic can be
achieved by the scope API.
Since v5.17, and specifically after the commit 451769ebb7e79 ("mm/vmalloc:
alloc GFP_NO{FS,IO} for vmalloc"), GFP_NOFS/GFP_NOIO are now supported in
``[k]vmalloc`` by implicitly using scope API.
In earlier kernels ``vmalloc`` didn't support GFP_NOFS semantic because there
were hardcoded GFP_KERNEL allocations deep inside the allocator. That means
that calling ``vmalloc`` with GFP_NOFS/GFP_NOIO was almost always a bug.
In the ideal world, upper layers should already mark dangerous contexts
and so no special care is required and vmalloc should be called without
any problems. Sometimes if the context is not really clear or there are
layering violations then the recommended way around that is to wrap ``vmalloc``
by the scope API with a comment explaining the problem.
and so no special care is required and ``vmalloc`` should be called without any
problems. Sometimes if the context is not really clear or there are layering
violations then the recommended way around that (on pre-v5.17 kernels) is to
wrap ``vmalloc`` by the scope API with a comment explaining the problem.
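
For pre-v5.17 kernels, and for marking a whole reclaim-unsafe section in
general, a minimal sketch of the scope API usage referred to above is:

.. code-block:: c

   #include <linux/sched/mm.h>
   #include <linux/vmalloc.h>

   /*
    * The NOFS constraint is applied to the scope; the allocation itself
    * uses the normal GFP_KERNEL-based vmalloc() path.
    */
   static void *alloc_in_fs_context(size_t size)
   {
           /* Transaction context: recursing into the FS could deadlock. */
           unsigned int flags = memalloc_nofs_save();
           void *p;

           p = vmalloc(size);

           memalloc_nofs_restore(flags);
           return p;
   }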

View File

@ -52,6 +52,7 @@ Library functionality that is used throughout the kernel.
wrappers/atomic_bitops
floating-point
union_find
min_heap
Low level entry and exit
========================

View File

@ -0,0 +1,300 @@
.. SPDX-License-Identifier: GPL-2.0
============
Min Heap API
============
Introduction
============
The Min Heap API provides a set of functions and macros for managing min-heaps
in the Linux kernel. A min-heap is a binary tree structure where the value of
each node is less than or equal to the values of its children, ensuring that
the smallest element is always at the root.
This document provides a guide to the Min Heap API, detailing how to define and
use min-heaps. Users should not directly call functions with **__min_heap_*()**
prefixes, but should instead use the provided macro wrappers.
In addition to the standard version of the functions, the API also includes a
set of inline versions for performance-critical scenarios. These inline
functions have the same names as their non-inline counterparts but include an
**_inline** suffix. For example, **__min_heap_init_inline** and its
corresponding macro wrapper **min_heap_init_inline**. The inline versions allow
custom comparison and swap functions to be called directly, rather than through
indirect function calls. This can significantly reduce overhead, especially
when CONFIG_MITIGATION_RETPOLINE is enabled, as indirect function calls become
more expensive. As with the non-inline versions, it is important to use the
macro wrappers for inline functions instead of directly calling the functions
themselves.
Data Structures
===============
Min-Heap Definition
-------------------
The core data structure for representing a min-heap is defined using the
**MIN_HEAP_PREALLOCATED** and **DEFINE_MIN_HEAP** macros. These macros allow
you to define a min-heap with a preallocated buffer or dynamically allocated
memory.
Example:
.. code-block:: c
#define MIN_HEAP_PREALLOCATED(_type, _name, _nr)
struct _name {
int nr; /* Number of elements in the heap */
int size; /* Maximum number of elements that can be held */
_type *data; /* Pointer to the heap data */
_type preallocated[_nr]; /* Static preallocated array */
}
#define DEFINE_MIN_HEAP(_type, _name) MIN_HEAP_PREALLOCATED(_type, _name, 0)
A typical heap structure will include a counter for the number of elements
(`nr`), the maximum capacity of the heap (`size`), and a pointer to an array of
elements (`data`). Optionally, you can specify a static array for preallocated
heap storage using **MIN_HEAP_PREALLOCATED**.
Min Heap Callbacks
------------------
The **struct min_heap_callbacks** provides customization options for ordering
elements in the heap and swapping them. It contains two function pointers:
.. code-block:: c
struct min_heap_callbacks {
bool (*less)(const void *lhs, const void *rhs, void *args);
void (*swp)(void *lhs, void *rhs, void *args);
};
- **less** is the comparison function used to establish the order of elements.
- **swp** is a function for swapping elements in the heap. If swp is set to
NULL, the default swap function will be used, which swaps the elements based on their size.
Macro Wrappers
==============
The following macro wrappers are provided for interacting with the heap in a
user-friendly manner. Each macro corresponds to a function that operates on the
heap, and they abstract away direct calls to internal functions.
Each macro accepts various parameters that are detailed below.
Heap Initialization
--------------------
.. code-block:: c
min_heap_init(heap, data, size);
- **heap**: A pointer to the min-heap structure to be initialized.
- **data**: A pointer to the buffer where the heap elements will be stored. If
`NULL`, the preallocated buffer within the heap structure will be used.
- **size**: The maximum number of elements the heap can hold.
This macro initializes the heap, setting its initial state. If `data` is
`NULL`, the preallocated memory inside the heap structure will be used for
storage. Otherwise, the user-provided buffer is used. The operation is **O(1)**.
**Inline Version:** min_heap_init_inline(heap, data, size)
Accessing the Top Element
-------------------------
.. code-block:: c
element = min_heap_peek(heap);
- **heap**: A pointer to the min-heap from which to retrieve the smallest
element.
This macro returns a pointer to the smallest element (the root) of the heap, or
`NULL` if the heap is empty. The operation is **O(1)**.
**Inline Version:** min_heap_peek_inline(heap)
Heap Insertion
--------------
.. code-block:: c
success = min_heap_push(heap, element, callbacks, args);
- **heap**: A pointer to the min-heap into which the element should be inserted.
- **element**: A pointer to the element to be inserted into the heap.
- **callbacks**: A pointer to a `struct min_heap_callbacks` providing the
`less` and `swp` functions.
- **args**: Optional arguments passed to the `less` and `swp` functions.
This macro inserts an element into the heap. It returns `true` if the insertion
was successful and `false` if the heap is full. The operation is **O(log n)**.
**Inline Version:** min_heap_push_inline(heap, element, callbacks, args)
Heap Removal
------------
.. code-block:: c
success = min_heap_pop(heap, callbacks, args);
- **heap**: A pointer to the min-heap from which to remove the smallest element.
- **callbacks**: A pointer to a `struct min_heap_callbacks` providing the
`less` and `swp` functions.
- **args**: Optional arguments passed to the `less` and `swp` functions.
This macro removes the smallest element (the root) from the heap. It returns
`true` if the element was successfully removed, or `false` if the heap is
empty. The operation is **O(log n)**.
**Inline Version:** min_heap_pop_inline(heap, callbacks, args)
Heap Maintenance
----------------
You can use the following macros to maintain the heap's structure:
.. code-block:: c
min_heap_sift_down(heap, pos, callbacks, args);
- **heap**: A pointer to the min-heap.
- **pos**: The index from which to start sifting down.
- **callbacks**: A pointer to a `struct min_heap_callbacks` providing the
`less` and `swp` functions.
- **args**: Optional arguments passed to the `less` and `swp` functions.
This macro restores the heap property by moving the element at the specified
index (`pos`) down the heap until it is in the correct position. The operation
is **O(log n)**.
**Inline Version:** min_heap_sift_down_inline(heap, pos, callbacks, args)
.. code-block:: c
min_heap_sift_up(heap, idx, callbacks, args);
- **heap**: A pointer to the min-heap.
- **idx**: The index of the element to sift up.
- **callbacks**: A pointer to a `struct min_heap_callbacks` providing the
`less` and `swp` functions.
- **args**: Optional arguments passed to the `less` and `swp` functions.
This macro restores the heap property by moving the element at the specified
index (`idx`) up the heap. The operation is **O(log n)**.
**Inline Version:** min_heap_sift_up_inline(heap, idx, callbacks, args)
.. code-block:: c
min_heapify_all(heap, callbacks, args);
- **heap**: A pointer to the min-heap.
- **callbacks**: A pointer to a `struct min_heap_callbacks` providing the
`less` and `swp` functions.
- **args**: Optional arguments passed to the `less` and `swp` functions.
This macro ensures that the entire heap satisfies the heap property. It is
called when the heap is built from scratch or after many modifications. The
operation is **O(n)**.
**Inline Version:** min_heapify_all_inline(heap, callbacks, args)
Removing Specific Elements
--------------------------
.. code-block:: c
success = min_heap_del(heap, idx, callbacks, args);
- **heap**: A pointer to the min-heap.
- **idx**: The index of the element to delete.
- **callbacks**: A pointer to a `struct min_heap_callbacks` providing the
`less` and `swp` functions.
- **args**: Optional arguments passed to the `less` and `swp` functions.
This macro removes an element at the specified index (`idx`) from the heap and
restores the heap property. The operation is **O(log n)**.
**Inline Version:** min_heap_del_inline(heap, idx, callbacks, args)
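For instance, a small sketch (``my_heap`` and ``heap_cb`` are the names used in the example at the end of this document) of dropping an arbitrary entry by index:
.. code-block:: c
/* Remove whatever is currently stored at index 2 and re-heapify. */
if (!min_heap_del(&my_heap, 2, &heap_cb, NULL))
	pr_warn("no element was removed at index 2\n");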
Other Utilities
===============
- **min_heap_full(heap)**: Checks whether the heap is full.
Complexity: **O(1)**.
.. code-block:: c
bool full = min_heap_full(heap);
- `heap`: A pointer to the min-heap to check.
This macro returns `true` if the heap is full, otherwise `false`.
**Inline Version:** min_heap_full_inline(heap)
- **min_heap_empty(heap)**: Checks whether the heap is empty.
Complexity: **O(1)**.
.. code-block:: c
bool empty = min_heap_empty(heap);
- `heap`: A pointer to the min-heap to check.
This macro returns `true` if the heap is empty, otherwise `false`.
**Inline Version:** min_heap_empty_inline(heap)
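A typical pattern (a sketch using the ``my_heap`` and ``heap_cb`` names from the example below) is to guard insertions with ``min_heap_full()`` and to drain the heap with ``min_heap_empty()``:
.. code-block:: c
int value = 42;

/* Insert only while the backing storage has room left. */
if (!min_heap_full(&my_heap))
	min_heap_push(&my_heap, &value, &heap_cb, NULL);

/* Drain the heap in ascending order. */
while (!min_heap_empty(&my_heap)) {
	pr_info("next smallest: %d\n", *min_heap_peek(&my_heap));
	min_heap_pop(&my_heap, &heap_cb, NULL);
}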
Example Usage
=============
An example usage of the min-heap API would involve defining a heap structure,
initializing it, and inserting and removing elements as needed.
.. code-block:: c
#include <linux/min_heap.h>
/* Define a min-heap type that stores ints */
DEFINE_MIN_HEAP(int, int_min_heap);

bool my_less_function(const void *lhs, const void *rhs, void *args)
{
	return *(int *)lhs < *(int *)rhs;
}

struct min_heap_callbacks heap_cb = {
	.less = my_less_function,	/* Comparison function for heap order */
	.swp = NULL,			/* Use the default swap function */
};

void example_usage(void)
{
	/* Pre-populate the buffer with elements */
	int buffer[5] = {5, 2, 8, 1, 3};
	/* Declare a min-heap instance */
	struct int_min_heap my_heap;

	/* Initialize the heap with the user-supplied buffer and its size */
	min_heap_init(&my_heap, buffer, 5);

	/* Build the heap using min_heapify_all */
	my_heap.nr = 5;		/* Set the number of elements in the heap */
	min_heapify_all(&my_heap, &heap_cb, NULL);

	/* Peek at the top element (should be 1 in this case) */
	int *top = min_heap_peek(&my_heap);
	pr_info("Top element: %d\n", *top);

	/* Pop the top element (1) and get the new top (2) */
	min_heap_pop(&my_heap, &heap_cb, NULL);
	top = min_heap_peek(&my_heap);
	pr_info("New top element: %d\n", *top);

	/* Insert a new element (0) and recheck the top */
	int new_element = 0;
	min_heap_push(&my_heap, &new_element, &heap_cb, NULL);
	top = min_heap_peek(&my_heap);
	pr_info("Top element after insertion: %d\n", *top);
}
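If the elements needed more than a plain byte-wise exchange when reordered, a custom swap callback could be supplied instead of the default; a minimal sketch:
.. code-block:: c
/* Custom swap for int elements (the example above relies on the default by
 * leaving .swp set to NULL).
 */
void swap_ints(void *lhs, void *rhs, void *args)
{
	int tmp = *(int *)lhs;

	*(int *)lhs = *(int *)rhs;
	*(int *)rhs = tmp;
}

struct min_heap_callbacks heap_cb_custom = {
	.less = my_less_function,
	.swp = swap_ints,
};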


@ -151,6 +151,77 @@ the more significant 4-byte word.
We always think of our offsets as if there were no quirk, and we translate
them afterwards, before accessing the memory region.
Note on buffer lengths not multiple of 4
----------------------------------------
To deal with memory layout quirks where groups of 4 bytes are laid out "little
endian" relative to each other, but "big endian" within the group itself, the
concept of groups of 4 bytes is intrinsic to the packing API (not to be
confused with the memory access, which is performed byte by byte, though).
With buffer lengths not multiple of 4, this means one group will be incomplete.
Depending on the quirks, this may lead to discontinuities in the bit fields
accessible through the buffer. The packing API assumes discontinuities were not
the intention of the memory layout, so it avoids them by effectively logically
shortening the most significant group of 4 octets to the number of octets
actually available.
An example with a 31-byte buffer is given below. Physical buffer offsets are
implicit, and increase from left to right within a group, and from top to
bottom within a column.
No quirks:
::
30 29 28 | Group 7 (most significant)
27 26 25 24 | Group 6
23 22 21 20 | Group 5
19 18 17 16 | Group 4
15 14 13 12 | Group 3
11 10 9 8 | Group 2
7 6 5 4 | Group 1
3 2 1 0 | Group 0 (least significant)
QUIRK_LSW32_IS_FIRST:
::
3 2 1 0 | Group 0 (least significant)
7 6 5 4 | Group 1
11 10 9 8 | Group 2
15 14 13 12 | Group 3
19 18 17 16 | Group 4
23 22 21 20 | Group 5
27 26 25 24 | Group 6
30 29 28 | Group 7 (most significant)
QUIRK_LITTLE_ENDIAN:
::
30 28 29 | Group 7 (most significant)
24 25 26 27 | Group 6
20 21 22 23 | Group 5
16 17 18 19 | Group 4
12 13 14 15 | Group 3
8 9 10 11 | Group 2
4 5 6 7 | Group 1
0 1 2 3 | Group 0 (least significant)
QUIRK_LITTLE_ENDIAN | QUIRK_LSW32_IS_FIRST:
::
0 1 2 3 | Group 0 (least significant)
4 5 6 7 | Group 1
8 9 10 11 | Group 2
12 13 14 15 | Group 3
16 17 18 19 | Group 4
20 21 22 23 | Group 5
24 25 26 27 | Group 6
28 29 30 | Group 7 (most significant)
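As a rough illustration (a sketch only: it assumes the long-standing ``packing()`` helper and the quirk macros from include/linux/packing.h, which are not shown in this document), a field that lands in the shortened most significant group is still addressed purely by its logical bit positions::
#include <linux/packing.h>

int pack_field_example(void)
{
	u8 buf[31] = {};
	u64 val = 0xcafe;
	int err;

	/* Bits 239:224 of the 248-bit logical buffer live in the shortened
	 * 3-byte group; the caller never deals with the physical layout.
	 */
	err = packing(buf, &val, 239, 224, sizeof(buf), PACK,
		      QUIRK_LITTLE_ENDIAN | QUIRK_LSW32_IS_FIRST);
	if (err)
		return err;

	/* Read the same field back out. */
	val = 0;
	return packing(buf, &val, 239, 224, sizeof(buf), UNPACK,
		      QUIRK_LITTLE_ENDIAN | QUIRK_LSW32_IS_FIRST);
}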
Intended use
------------


@ -209,12 +209,17 @@ Struct Resources
::
%pr [mem 0x60000000-0x6fffffff flags 0x2200] or
[mem 0x60000000 flags 0x2200] or
[mem 0x0000000060000000-0x000000006fffffff flags 0x2200]
[mem 0x0000000060000000 flags 0x2200]
%pR [mem 0x60000000-0x6fffffff pref] or
[mem 0x60000000 pref] or
[mem 0x0000000060000000-0x000000006fffffff pref]
[mem 0x0000000060000000 pref]
For printing struct resources. The ``R`` and ``r`` specifiers result in a
printed resource with (R) or without (r) a decoded flags member.
printed resource with (R) or without (r) a decoded flags member. If start is
equal to end only print the start value.
Passed by reference.
@ -231,6 +236,19 @@ width of the CPU data path.
Passed by reference.
Struct Range
------------
::
%pra [range 0x0000000060000000-0x000000006fffffff] or
[range 0x0000000060000000]
For printing struct range. struct range holds an arbitrary range of u64
values. If start is equal to end only print the start value.
Passed by reference.
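For example, a minimal sketch (the resource and range initializers here are illustrative, not taken from this document)::
/* struct resource lives in <linux/ioport.h>, struct range in <linux/range.h>. */
struct resource res = DEFINE_RES_MEM(0x60000000, 0x10000000);
struct range r = {
	.start = 0x0000000060000000,
	.end   = 0x000000006fffffff,
};

pr_info("res:   %pR\n", &res);	/* with decoded flags */
pr_info("range: %pra\n", &r);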
DMA address types dma_addr_t
----------------------------


@ -295,9 +295,9 @@ slot set.
Fourth, the io_tlb_slot array keeps track of any "padding slots" allocated to
meet alloc_align_mask requirements described above. When
swiotlb_tlb_map_single() allocates bounce buffer space to meet alloc_align_mask
swiotlb_tbl_map_single() allocates bounce buffer space to meet alloc_align_mask
requirements, it may allocate pre-padding space across zero or more slots. But
when swiotbl_tlb_unmap_single() is called with the bounce buffer address, the
when swiotlb_tbl_unmap_single() is called with the bounce buffer address, the
alloc_align_mask value that governed the allocation, and therefore the
allocation of any padding slots, is not known. The "pad_slots" field records
the number of padding slots so that swiotlb_tbl_unmap_single() can free them.


@ -245,8 +245,8 @@ CPU which can be assigned to the work items of a wq. For example, with
at the same time per CPU. This is always a per-CPU attribute, even for
unbound workqueues.
The maximum limit for ``@max_active`` is 512 and the default value used
when 0 is specified is 256. These values are chosen sufficiently high
The maximum limit for ``@max_active`` is 2048 and the default value used
when 0 is specified is 1024. These values are chosen sufficiently high
such that they are not the limiting factor while providing protection in
runaway cases.
@ -357,6 +357,11 @@ Guidelines
difference in execution characteristics between using a dedicated wq
and a system wq.
Note: If something may generate more than @max_active outstanding
work items (do stress test your producers), it may saturate a system
wq and potentially lead to deadlock. It should utilize its own
dedicated workqueue rather than the system wq.
* Unless work items are expected to consume a huge amount of CPU
cycles, using a bound wq is usually beneficial due to the increased
level of locality in wq operations and work item execution.
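As a rough sketch of the note above about saturating a system wq (the ``frob_wq`` and ``frob_init`` names are illustrative, not part of the workqueue API)::
#include <linux/workqueue.h>

static struct workqueue_struct *frob_wq;

static int frob_init(void)
{
	/* A dedicated queue; max_active of 0 picks the default (now 1024). */
	frob_wq = alloc_workqueue("frob_wq", WQ_UNBOUND, 0);
	if (!frob_wq)
		return -ENOMEM;
	return 0;
}

/*
 * Producers that may have many outstanding work items then use
 * queue_work(frob_wq, &work) rather than queueing on the system wq.
 */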


@ -8,10 +8,10 @@ Asymmetric Cipher API
---------------------
.. kernel-doc:: include/crypto/akcipher.h
:doc: Generic Public Key API
:doc: Generic Public Key Cipher API
.. kernel-doc:: include/crypto/akcipher.h
:functions: crypto_alloc_akcipher crypto_free_akcipher crypto_akcipher_set_pub_key crypto_akcipher_set_priv_key crypto_akcipher_maxsize crypto_akcipher_encrypt crypto_akcipher_decrypt crypto_akcipher_sign crypto_akcipher_verify
:functions: crypto_alloc_akcipher crypto_free_akcipher crypto_akcipher_set_pub_key crypto_akcipher_set_priv_key crypto_akcipher_maxsize crypto_akcipher_encrypt crypto_akcipher_decrypt
Asymmetric Cipher Request Handle
--------------------------------


@ -0,0 +1,15 @@
Asymmetric Signature Algorithm Definitions
------------------------------------------
.. kernel-doc:: include/crypto/sig.h
:functions: sig_alg
Asymmetric Signature API
------------------------
.. kernel-doc:: include/crypto/sig.h
:doc: Generic Public Key Signature API
.. kernel-doc:: include/crypto/sig.h
:functions: crypto_alloc_sig crypto_free_sig crypto_sig_set_pubkey crypto_sig_set_privkey crypto_sig_keysize crypto_sig_maxsize crypto_sig_digestsize crypto_sig_sign crypto_sig_verify


@ -10,4 +10,5 @@ Programming Interface
api-digest
api-rng
api-akcipher
api-sig
api-kpp


@ -214,6 +214,8 @@ the aforementioned cipher types:
- CRYPTO_ALG_TYPE_AKCIPHER Asymmetric cipher
- CRYPTO_ALG_TYPE_SIG Asymmetric signature
- CRYPTO_ALG_TYPE_PCOMPRESS Enhanced version of
CRYPTO_ALG_TYPE_COMPRESS allowing for segmented compression /
decompression instead of performing the operation on one segment


@ -0,0 +1,168 @@
.. SPDX-License-Identifier: GPL-2.0
===================================
Using AutoFDO with the Linux kernel
===================================
This enables AutoFDO build support for the kernel when using
the Clang compiler. AutoFDO (Auto-Feedback-Directed Optimization)
is a type of profile-guided optimization (PGO) used to enhance the
performance of binary executables. It gathers information about the
frequency of execution of various code paths within a binary using
hardware sampling. This data is then used to guide the compiler's
optimization decisions, resulting in a more efficient binary. AutoFDO
is a powerful optimization technique, and data indicates that it can
significantly improve kernel performance. It's especially beneficial
for workloads affected by front-end stalls.
For AutoFDO builds, unlike non-FDO builds, the user must supply a
profile. Acquiring an AutoFDO profile can be done in several ways.
AutoFDO profiles are created by converting hardware sampling using
the "perf" tool. It is crucial that the workload used to create these
perf files is representative; they must exhibit runtime
characteristics similar to the workloads that are intended to be
optimized. Failure to do so will result in the compiler optimizing
for the wrong objective.
The AutoFDO profile often encapsulates the program's behavior. If the
performance-critical codes are architecture-independent, the profile
can be applied across platforms to achieve performance gains. For
instance, using the profile generated on Intel architecture to build
a kernel for AMD architecture can also yield performance improvements.
There are two methods for acquiring a representative profile:
(1) Sample real workloads using a production environment.
(2) Generate the profile using a representative load test.
When enabling the AutoFDO build configuration without providing an
AutoFDO profile, the compiler only modifies the DWARF information in
the kernel without impacting runtime performance. It's advisable to
use a kernel binary built with the same AutoFDO configuration to
collect the perf profile. While it's possible to use a kernel built
with different options, it may result in inferior performance.
One can collect profiles using an AutoFDO build of the previous kernel.
AutoFDO employs relative line numbers to match the profiles, offering
some tolerance for source changes. This mode is commonly used in a
production environment for profile collection.
In a profile collection based on a load test, the AutoFDO collection
process consists of the following steps:
#. Initial build: The kernel is built with AutoFDO options
without a profile.
#. Profiling: The above kernel is then run with a representative
workload to gather execution frequency data. This data is
collected using hardware sampling, via perf. AutoFDO is most
effective on platforms supporting advanced PMU features like
LBR on Intel machines.
#. AutoFDO profile generation: Perf output file is converted to
the AutoFDO profile via offline tools.
AutoFDO support requires Clang (LLVM) 17 or later.
Preparation
===========
Configure the kernel with::
CONFIG_AUTOFDO_CLANG=y
Customization
=============
The default CONFIG_AUTOFDO_CLANG setting covers kernel space objects for
AutoFDO builds. One can, however, enable or disable AutoFDO build for
individual files and directories by adding a line similar to the following
to the respective kernel Makefile:
- For enabling a single file (e.g. foo.o) ::
AUTOFDO_PROFILE_foo.o := y
- For enabling all files in one directory ::
AUTOFDO_PROFILE := y
- For disabling one file ::
AUTOFDO_PROFILE_foo.o := n
- For disabling all files in one directory ::
AUTOFDO_PROFILE := n
Workflow
========
Here is an example workflow for AutoFDO kernel:
1) Build the kernel on the host machine with LLVM enabled,
for example, ::
$ make menuconfig LLVM=1
Turn on AutoFDO build config::
CONFIG_AUTOFDO_CLANG=y
With a configuration that has LLVM enabled, use the following command::
$ scripts/config -e AUTOFDO_CLANG
After getting the config, build with ::
$ make LLVM=1
2) Install the kernel on the test machine.
3) Run the load tests. The '-c' option in perf specifies the sample
event period. We suggest using a suitable prime number, like 500009,
for this purpose.
- For Intel platforms::
$ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
- For AMD platforms:
The supported systems are Zen3 with BRS, or Zen4 with amd_lbr_v2. To check:
For Zen3::
$ cat /proc/cpuinfo | grep " brs"
For Zen4::
$ cat /proc/cpuinfo | grep amd_lbr_v2
The following command generates the perf data file::
$ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
4) (Optional) Download the raw perf file to the host machine.
5) To generate an AutoFDO profile, two offline tools are available:
create_llvm_prof and llvm_profgen. The create_llvm_prof tool is part
of the AutoFDO project and can be found on GitHub
(https://github.com/google/autofdo), version v0.30.1 or later.
The llvm_profgen tool is included in the LLVM toolchain itself. Its version
does not need to match the version of Clang used to build the kernel; it only
needs to come from the LLVM 19 release or later, or from LLVM trunk. ::
$ llvm-profgen --kernel --binary=<vmlinux> --perfdata=<perf_file> -o <profile_file>
or ::
$ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> --format=extbinary --out=<profile_file>
Note that multiple AutoFDO profile files can be merged into one via::
$ llvm-profdata merge -o <profile_file> <profile_1> <profile_2> ... <profile_n>
6) Rebuild the kernel using the AutoFDO profile file with the same config as
in step 1 (note that CONFIG_AUTOFDO_CLANG needs to be enabled)::
$ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file>


@ -470,8 +470,6 @@ API usage
usleep_range() should be preferred over udelay(). The proper way of
using usleep_range() is mentioned in the kernel docs.
See: https://www.kernel.org/doc/html/latest/timers/timers-howto.html#delays-information-on-the-various-kernel-delay-sleep-mechanisms
Comments
--------


@ -250,25 +250,17 @@ variables for .cocciconfig is as follows:
- Your directory from which spatch is called is processed next
- The directory provided with the ``--dir`` option is processed last, if used
Since coccicheck runs through make, it naturally runs from the kernel
proper dir; as such the second rule above would be implied for picking up a
.cocciconfig when using ``make coccicheck``.
``make coccicheck`` also supports using M= targets. If you do not supply
any M= target, it is assumed you want to target the entire kernel.
The kernel coccicheck script has::
if [ "$KBUILD_EXTMOD" = "" ] ; then
OPTIONS="--dir $srctree $COCCIINCLUDE"
else
OPTIONS="--dir $KBUILD_EXTMOD $COCCIINCLUDE"
fi
OPTIONS="--dir $srcroot $COCCIINCLUDE"
KBUILD_EXTMOD is set when an explicit target with M= is used. For both cases
the spatch ``--dir`` argument is used, as such third rule applies when whether
M= is used or not, and when M= is used the target directory can have its own
.cocciconfig file. When M= is not passed as an argument to coccicheck the
target directory is the same as the directory from where spatch was called.
Here, $srcroot refers to the source directory of the target: it points to the
external module's source directory when M= is used, and otherwise to the kernel
source directory. The third rule ensures that spatch reads the .cocciconfig from
the target directory, allowing external modules to have their own .cocciconfig
file.
If not using the kernel's coccicheck target, keep the above precedence
order logic of .cocciconfig reading. If using the kernel's coccicheck target,


@ -23,7 +23,7 @@ Possible uses:
associated code is never run?)
.. _gcov: https://gcc.gnu.org/onlinedocs/gcc/Gcov.html
.. _lcov: http://ltp.sourceforge.net/coverage/lcov.php
.. _lcov: https://github.com/linux-test-project/lcov
Preparation


@ -34,6 +34,8 @@ Documentation/dev-tools/testing-overview.rst
ktap
checkuapi
gpio-sloppy-logic-analyzer
autofdo
propeller
.. only:: subproject and html


@ -511,19 +511,14 @@ Tests
~~~~~
There are KASAN tests that allow verifying that KASAN works and can detect
certain types of memory corruptions. The tests consist of two parts:
certain types of memory corruptions.
1. Tests that are integrated with the KUnit Test Framework. Enabled with
``CONFIG_KASAN_KUNIT_TEST``. These tests can be run and partially verified
All KASAN tests are integrated with the KUnit Test Framework and can be enabled
via ``CONFIG_KASAN_KUNIT_TEST``. The tests can be run and partially verified
automatically in a few different ways; see the instructions below.
2. Tests that are currently incompatible with KUnit. Enabled with
``CONFIG_KASAN_MODULE_TEST`` and can only be run as a module. These tests can
only be verified manually by loading the kernel module and inspecting the
kernel log for KASAN reports.
Each KUnit-compatible KASAN test prints one of multiple KASAN reports if an
error is detected. Then the test prints its number and status.
Each KASAN test prints one of multiple KASAN reports if an error is detected.
Then the test prints its number and status.
When a test passes::
@ -550,16 +545,16 @@ Or, if one of the tests failed::
not ok 1 - kasan
There are a few ways to run KUnit-compatible KASAN tests.
There are a few ways to run the KASAN tests.
1. Loadable module
With ``CONFIG_KUNIT`` enabled, KASAN-KUnit tests can be built as a loadable
module and run by loading ``kasan_test.ko`` with ``insmod`` or ``modprobe``.
With ``CONFIG_KUNIT`` enabled, the tests can be built as a loadable module
and run by loading ``kasan_test.ko`` with ``insmod`` or ``modprobe``.
2. Built-In
With ``CONFIG_KUNIT`` built-in, KASAN-KUnit tests can be built-in as well.
With ``CONFIG_KUNIT`` built-in, the tests can be built-in as well.
In this case, the tests will run at boot as a late-init call.
3. Using kunit_tool


@ -75,11 +75,11 @@ supports it for the architecture you are using, you can use hardware
breakpoints if you desire to run with the ``CONFIG_STRICT_KERNEL_RWX``
option turned on, else you need to turn off this option.
Next you should choose one of more I/O drivers to interconnect debugging
Next you should choose one or more I/O drivers to interconnect the debugging
host and debugged target. Early boot debugging requires a KGDB I/O
driver that supports early debugging and the driver must be built into
the kernel directly. Kgdb I/O driver configuration takes place via
kernel or module parameters which you can learn more about in the in the
kernel or module parameters which you can learn more about in the
section that describes the parameter kgdboc.
Here is an example set of ``.config`` symbols to enable or disable for kgdb::
@ -201,8 +201,8 @@ Using loadable module or built-in
Configure kgdboc at runtime with sysfs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
At run time you can enable or disable kgdboc by echoing a parameters
into the sysfs. Here are two examples:
At run time you can enable or disable kgdboc by writing parameters
into sysfs. Here are two examples:
1. Enable kgdboc on ttyS0::
@ -329,7 +329,7 @@ ways to activate this feature.
2. Use sysfs before configuring an I/O driver::
echo 1 > /sys/module/kgdb/parameters/kgdb_use_con
echo 1 > /sys/module/debug_core/parameters/kgdb_use_con
.. note::
@ -374,10 +374,10 @@ default behavior is always set to 0.
Kernel parameter: ``nokaslr``
-----------------------------
If the architecture that you are using enable KASLR by default,
If the architecture that you are using enables KASLR by default,
you should consider turning it off. KASLR randomizes the
virtual address where the kernel image is mapped and confuse
gdb which resolve kernel symbol address from symbol table
virtual address where the kernel image is mapped and confuses
gdb which resolves addresses of kernel symbols from the symbol table
of vmlinux.
Using kdb
@ -631,8 +631,6 @@ automatically changes into kgdb mode.
kgdb
Now disconnect your terminal program and connect gdb in its place
2. At the kdb prompt, disconnect the terminal program and connect gdb in
its place.
@ -749,7 +747,7 @@ The kernel debugger is organized into a number of components:
helper functions in some of the other kernel components to make it
possible for kdb to examine and report information about the kernel
without taking locks that could cause a kernel deadlock. The kdb core
contains implements the following functionality.
implements the following functionality.
- A simple shell


@ -161,6 +161,7 @@ See the include/linux/kmemleak.h header for the functions prototype.
- ``kmemleak_free_percpu`` - notify of a percpu memory block freeing
- ``kmemleak_update_trace`` - update object allocation stack trace
- ``kmemleak_not_leak`` - mark an object as not a leak
- ``kmemleak_transient_leak`` - mark an object as a transient leak
- ``kmemleak_ignore`` - do not scan or report an object as leak
- ``kmemleak_scan_area`` - add scan areas inside a memory block
- ``kmemleak_no_scan`` - do not scan a memory block
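For instance, a brief sketch of using the transient-leak hook listed above (the allocation helper is made up, and the hook is assumed to take the object pointer like the other annotations)::
#include <linux/kmemleak.h>
#include <linux/slab.h>

void *stash_object(size_t size)
{
	void *obj = kmalloc(size, GFP_KERNEL);

	/*
	 * The only reference to obj will temporarily be held somewhere the
	 * scanner cannot see, so mark it as a transient leak rather than
	 * excluding it from reporting permanently.
	 */
	if (obj)
		kmemleak_transient_leak(obj);
	return obj;
}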


@ -133,7 +133,7 @@ KMSAN shadow memory
-------------------
KMSAN associates a metadata byte (also called shadow byte) with every byte of
kernel memory. A bit in the shadow byte is set iff the corresponding bit of the
kernel memory. A bit in the shadow byte is set if the corresponding bit of the
kernel memory byte is uninitialized. Marking the memory uninitialized (i.e.
setting its shadow bytes to ``0xff``) is called poisoning, marking it
initialized (setting the shadow bytes to ``0x00``) is called unpoisoning.


@ -31,6 +31,15 @@ kselftest runs as a userspace process. Tests that can be written/run in
userspace may wish to use the `Test Harness`_. Tests that need to be
run in kernel space may wish to use a `Test Module`_.
Documentation on the tests
==========================
For documentation on the kselftests themselves, see:
.. toctree::
testing-devices
Running the selftests (hotplug tests are run in limited mode)
=============================================================


@ -0,0 +1,162 @@
.. SPDX-License-Identifier: GPL-2.0
=====================================
Using Propeller with the Linux kernel
=====================================
This enables Propeller build support for the kernel when using Clang
compiler. Propeller is a profile-guided optimization (PGO) method used
to optimize binary executables. Like AutoFDO, it utilizes hardware
sampling to gather information about the frequency of execution of
different code paths within a binary. Unlike AutoFDO, this information
is then used right before the linking phase to optimize, among other
things, block layout within and across functions.
A few important notes about adopting Propeller optimization:
#. Although it can be used as a standalone optimization step, it is
strongly recommended to apply Propeller on top of AutoFDO,
AutoFDO+ThinLTO or Instrument FDO. The rest of this document
assumes this paradigm.
#. Propeller uses another round of profiling on top of
AutoFDO/AutoFDO+ThinLTO/iFDO. The whole build process involves
"build-afdo - train-afdo - build-propeller - train-propeller -
build-optimized".
#. Propeller requires the LLVM 19 release or later for Clang/Clang++
and the linker (ld.lld).
#. In addition to the LLVM toolchain, Propeller requires a profiling
conversion tool: https://github.com/google/autofdo with a release
after v0.30.1: https://github.com/google/autofdo/releases/tag/v0.30.1.
The Propeller optimization process involves the following steps:
#. Initial building: Build the AutoFDO or AutoFDO+ThinLTO binary as
you would normally do, but with a set of compile-time / link-time
flags, so that a special metadata section is created within the
kernel binary. The special section is only intended to be used by the
profiling tool; it is not part of the runtime image, nor does it
change the kernel's runtime text sections.
#. Profiling: The above kernel is then run with a representative
workload to gather execution frequency data. This data is collected
using hardware sampling, via perf. Propeller is most effective on
platforms supporting advanced PMU features like LBR on Intel
machines. This step is the same as profiling the kernel for AutoFDO
(the exact perf parameters can be different).
#. Propeller profile generation: Perf output file is converted to a
pair of Propeller profiles via an offline tool.
#. Optimized build: Build the AutoFDO or AutoFDO+ThinLTO optimized
binary as you would normally do, but with a compile-time /
link-time flag to pick up the Propeller compile time and link time
profiles. This build step uses 3 profiles - the AutoFDO profile,
the Propeller compile-time profile and the Propeller link-time
profile.
#. Deployment: The optimized kernel binary is deployed and used
in production environments, providing improved performance
and reduced latency.
Preparation
===========
Configure the kernel with::
CONFIG_AUTOFDO_CLANG=y
CONFIG_PROPELLER_CLANG=y
Customization
=============
The default CONFIG_PROPELLER_CLANG setting covers kernel space objects
for Propeller builds. One can, however, enable or disable Propeller build
for individual files and directories by adding a line similar to the
following to the respective kernel Makefile:
- For enabling a single file (e.g. foo.o)::
PROPELLER_PROFILE_foo.o := y
- For enabling all files in one directory::
PROPELLER_PROFILE := y
- For disabling one file::
PROPELLER_PROFILE_foo.o := n
- For disabling all files in one directory::
PROPELLER_PROFILE := n
Workflow
========
Here is an example workflow for building an AutoFDO+Propeller kernel:
1) Assuming an AutoFDO profile is already collected following
instructions in the AutoFDO document, build the kernel on the host
machine, with AutoFDO and Propeller build configs ::
CONFIG_AUTOFDO_CLANG=y
CONFIG_PROPELLER_CLANG=y
and ::
$ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo-profile-name>
2) Install the kernel on the test machine.
3) Run the load tests. The '-c' option in perf specifies the sample
event period. We suggest using a suitable prime number, like 500009,
for this purpose.
- For Intel platforms::
$ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
- For AMD platforms::
$ perf record --pfm-event RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
Note you can repeat the above steps to collect multiple <perf_file>s.
4) (Optional) Download the raw perf file(s) to the host machine.
5) Use the create_llvm_prof tool (https://github.com/google/autofdo) to
generate Propeller profile. ::
$ create_llvm_prof --binary=<vmlinux> --profile=<perf_file>
--format=propeller --propeller_output_module_name
--out=<propeller_profile_prefix>_cc_profile.txt
--propeller_symorder=<propeller_profile_prefix>_ld_profile.txt
"<propeller_profile_prefix>" can be something like "/home/user/dir/any_string".
This command generates a pair of Propeller profiles:
"<propeller_profile_prefix>_cc_profile.txt" and
"<propeller_profile_prefix>_ld_profile.txt".
If more than one <perf_file> was collected in the previous step,
you can create a temporary list file "<perf_file_list>", with each line
containing one perf file name, and run::
$ create_llvm_prof --binary=<vmlinux> --profile=@<perf_file_list>
--format=propeller --propeller_output_module_name
--out=<propeller_profile_prefix>_cc_profile.txt
--propeller_symorder=<propeller_profile_prefix>_ld_profile.txt
6) Rebuild the kernel using the AutoFDO and Propeller
profiles. ::
CONFIG_AUTOFDO_CLANG=y
CONFIG_PROPELLER_CLANG=y
and ::
$ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file> CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>
