Merge branch 'linus' into perf/urgent, to synchronize UAPI headers

Signed-off-by: Ingo Molnar <mingo@kernel.org>
This commit is contained in:
Ingo Molnar 2017-12-06 22:39:39 +01:00
commit d6eabce257
11139 changed files with 536742 additions and 258674 deletions

55
.gitignore vendored
View File

@ -7,38 +7,40 @@
# command after changing this file, to see if there are
# any tracked files which get ignored after the change.
#
# Normal rules
# Normal rules (sorted alphabetically)
#
.*
*.a
*.bin
*.bz2
*.c.[012]*.*
*.dtb
*.dtb.S
*.dwo
*.elf
*.gcno
*.gz
*.i
*.ko
*.ll
*.lst
*.lz4
*.lzma
*.lzo
*.mod.c
*.o
*.o.*
*.a
*.order
*.patch
*.s
*.ko
*.so
*.so.dbg
*.mod.c
*.i
*.lst
*.symtypes
*.order
*.elf
*.bin
*.tar
*.gz
*.bz2
*.lzma
*.xz
*.lz4
*.lzo
*.patch
*.gcno
*.ll
modules.builtin
Module.symvers
*.dwo
*.su
*.c.[012]*.*
*.symtypes
*.tar
*.xz
Module.symvers
modules.builtin
#
# Top-level generic files
@ -53,6 +55,11 @@ Module.symvers
/System.map
/Module.markers
#
# RPM spec file (make rpm-pkg)
#
/*.spec
#
# Debian directory (make deb-pkg)
#

View File

@ -73,6 +73,8 @@ James E Wilson <wilson@specifix.com>
James Hogan <jhogan@kernel.org> <james.hogan@imgtec.com>
James Hogan <jhogan@kernel.org> <james@albanarts.com>
James Ketrenos <jketreno@io.(none)>
Jason Gunthorpe <jgg@ziepe.ca> <jgg@mellanox.com>
Jason Gunthorpe <jgg@ziepe.ca> <jgunthorpe@obsidianresearch.com>
Javi Merino <javi.merino@kernel.org> <javi.merino@arm.com>
<javier@osg.samsung.com> <javier.martinez@collabora.co.uk>
Jean Tourrilhes <jt@hpl.hp.com>

View File

@ -1,5 +0,0 @@
What: /proc/sys/vm/nr_pdflush_threads
Date: June 2012
Contact: Wanpeng Li <liwp@linux.vnet.ibm.com>
Description: Since pdflush is replaced by per-BDI flusher, the interface of old pdflush
exported in /proc/sys/vm/ should be removed.

View File

@ -11,7 +11,7 @@ Description:
Kernel code may export it for complete or partial access.
GPIOs are identified as they are inside the kernel, using integers in
the range 0..INT_MAX. See Documentation/gpio.txt for more information.
the range 0..INT_MAX. See Documentation/gpio/gpio.txt for more information.
/sys/class/gpio
/export ... asks the kernel to export a GPIO to userspace

View File

@ -41,3 +41,73 @@ KernelVersion: 4.5
Contact: K. Y. Srinivasan <kys@microsoft.com>
Description: The 16 bit vendor ID of the device
Users: tools/hv/lsvmbus and user level RDMA libraries
What: /sys/bus/vmbus/devices/vmbus_*/channels/relid/cpu
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin@microsoft.com>
Description: VCPU (sub)channel is affinitized to
Users: tools/hv/lsvmbus and other debuggig tools
What: /sys/bus/vmbus/devices/vmbus_*/channels/relid/cpu
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin@microsoft.com>
Description: VCPU (sub)channel is affinitized to
Users: tools/hv/lsvmbus and other debuggig tools
What: /sys/bus/vmbus/devices/vmbus_*/channels/relid/in_mask
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin@microsoft.com>
Description: Inbound channel signaling state
Users: Debugging tools
What: /sys/bus/vmbus/devices/vmbus_*/channels/relid/latency
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin@microsoft.com>
Description: Channel signaling latency
Users: Debugging tools
What: /sys/bus/vmbus/devices/vmbus_*/channels/relid/out_mask
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin@microsoft.com>
Description: Outbound channel signaling state
Users: Debugging tools
What: /sys/bus/vmbus/devices/vmbus_*/channels/relid/pending
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin@microsoft.com>
Description: Channel interrupt pending state
Users: Debugging tools
What: /sys/bus/vmbus/devices/vmbus_*/channels/relid/read_avail
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin@microsoft.com>
Description: Bytes availabble to read
Users: Debugging tools
What: /sys/bus/vmbus/devices/vmbus_*/channels/relid/write_avail
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin@microsoft.com>
Description: Bytes availabble to write
Users: Debugging tools
What: /sys/bus/vmbus/devices/vmbus_*/channels/relid/events
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin@microsoft.com>
Description: Number of times we have signaled the host
Users: Debugging tools
What: /sys/bus/vmbus/devices/vmbus_*/channels/relid/interrupts
Date: September. 2017
KernelVersion: 4.14
Contact: Stephen Hemminger <sthemmin@microsoft.com>
Description: Number of times we have taken an interrupt (incoming)
Users: Debugging tools

View File

@ -0,0 +1,41 @@
What: /dev/wmi/dell-smbios
Date: November 2017
KernelVersion: 4.15
Contact: "Mario Limonciello" <mario.limonciello@dell.com>
Description:
Perform SMBIOS calls on supported Dell machines.
through the Dell ACPI-WMI interface.
IOCTL's and buffer formats are defined in:
<uapi/linux/wmi.h>
1) To perform an SMBIOS call from userspace, you'll need to
first determine the minimum size of the calling interface
buffer for your machine.
Platforms that contain larger buffers can return larger
objects from the system firmware.
Commonly this size is either 4k or 32k.
To determine the size of the buffer read() a u64 dword from
the WMI character device /dev/wmi/dell-smbios.
2) After you've determined the minimum size of the calling
interface buffer, you can allocate a structure that represents
the structure documented above.
3) In the 'length' object store the size of the buffer you
determined above and allocated.
4) In this buffer object, prepare as necessary for the SMBIOS
call you're interested in. Typically SMBIOS buffers have
"class", "select", and "input" defined to values that coincide
with the data you are interested in.
Documenting class/select/input values is outside of the scope
of this documentation. Check with the libsmbios project for
further documentation on these values.
6) Run the call by using ioctl() as described in the header.
7) The output will be returned in the buffer object.
8) Be sure to free up your allocated object.

View File

@ -522,6 +522,7 @@ Description:
Specifies the output powerdown mode.
DAC output stage is disconnected from the amplifier and
1kohm_to_gnd: connected to ground via an 1kOhm resistor,
2.5kohm_to_gnd: connected to ground via a 2.5kOhm resistor,
6kohm_to_gnd: connected to ground via a 6kOhm resistor,
20kohm_to_gnd: connected to ground via a 20kOhm resistor,
90kohm_to_gnd: connected to ground via a 90kOhm resistor,
@ -1242,9 +1243,9 @@ What: /sys/.../iio:deviceX/in_distance_raw
KernelVersion: 4.0
Contact: linux-iio@vger.kernel.org
Description:
This attribute is used to read the distance covered by the user
since the last reboot while activated. Units after application
of scale are meters.
This attribute is used to read the measured distance to an object
or the distance covered by the user since the last reboot while
activated. Units after application of scale are meters.
What: /sys/bus/iio/devices/iio:deviceX/store_eeprom
KernelVersion: 3.4.0

View File

@ -16,3 +16,13 @@ Description:
the motion sensor is placed. For example, in a laptop a motion
sensor can be located on the base or on the lid. Current valid
values are 'base' and 'lid'.
What: /sys/bus/iio/devices/iio:deviceX/id
Date: Septembre 2017
KernelVersion: 4.14
Contact: linux-iio@vger.kernel.org
Description:
This attribute is exposed by the CrOS EC legacy accelerometer
driver and represents the sensor ID as exposed by the EC. This
ID is used by the Android sensor service hardware abstraction
layer (sensor HAL) through the Android container on ChromeOS.

View File

@ -110,3 +110,51 @@ Description: When new NVM image is written to the non-active NVM
is directly the status value from the DMA configuration
based mailbox before the device is power cycled. Writing
0 here clears the status.
What: /sys/bus/thunderbolt/devices/<xdomain>.<service>/key
Date: Jan 2018
KernelVersion: 4.15
Contact: thunderbolt-software@lists.01.org
Description: This contains name of the property directory the XDomain
service exposes. This entry describes the protocol in
question. Following directories are already reserved by
the Apple XDomain specification:
network: IP/ethernet over Thunderbolt
targetdm: Target disk mode protocol over Thunderbolt
extdisp: External display mode protocol over Thunderbolt
What: /sys/bus/thunderbolt/devices/<xdomain>.<service>/modalias
Date: Jan 2018
KernelVersion: 4.15
Contact: thunderbolt-software@lists.01.org
Description: Stores the same MODALIAS value emitted by uevent for
the XDomain service. Format: tbtsvc:kSpNvNrN
What: /sys/bus/thunderbolt/devices/<xdomain>.<service>/prtcid
Date: Jan 2018
KernelVersion: 4.15
Contact: thunderbolt-software@lists.01.org
Description: This contains XDomain protocol identifier the XDomain
service supports.
What: /sys/bus/thunderbolt/devices/<xdomain>.<service>/prtcvers
Date: Jan 2018
KernelVersion: 4.15
Contact: thunderbolt-software@lists.01.org
Description: This contains XDomain protocol version the XDomain
service supports.
What: /sys/bus/thunderbolt/devices/<xdomain>.<service>/prtcrevs
Date: Jan 2018
KernelVersion: 4.15
Contact: thunderbolt-software@lists.01.org
Description: This contains XDomain software version the XDomain
service supports.
What: /sys/bus/thunderbolt/devices/<xdomain>.<service>/prtcstns
Date: Jan 2018
KernelVersion: 4.15
Contact: thunderbolt-software@lists.01.org
Description: This contains XDomain service specific settings as
bitmask. Format: %x

View File

@ -211,7 +211,9 @@ Description:
device, after it has been suspended at run time, from a resume
request to the moment the device will be ready to process I/O,
in microseconds. If it is equal to 0, however, this means that
the PM QoS resume latency may be arbitrary.
the PM QoS resume latency may be arbitrary and the special value
"n/a" means that user space cannot accept any resume latency at
all for the given device.
Not all drivers support this attribute. If it isn't supported,
it is not present.
@ -258,19 +260,3 @@ Description:
This attribute has no effect on system-wide suspend/resume and
hibernation.
What: /sys/devices/.../power/pm_qos_remote_wakeup
Date: September 2012
Contact: Rafael J. Wysocki <rjw@rjwysocki.net>
Description:
The /sys/devices/.../power/pm_qos_remote_wakeup attribute
is used for manipulating the PM QoS "remote wakeup required"
flag. If set, this flag indicates to the kernel that the
device is a source of user events that have to be signaled from
its low-power states.
Not all drivers support this attribute. If it isn't supported,
it is not present.
This attribute has no effect on system-wide suspend/resume and
hibernation.

View File

@ -0,0 +1,21 @@
What: /sys/bus/w1/devices/19-<id>/speed
Date: Sep 2017
KernelVersion: 4.14
Contact: Jan Kandziora <jjj@gmx.de>
Description: When written, this file sets the I2C speed on the connected
DS28E17 chip. When read, it reads the current setting from
the DS28E17 chip.
Valid values: 100, 400, 900 [kBaud].
Default 100, can be set by w1_ds28e17.speed= module parameter.
Users: w1_ds28e17 driver
What: /sys/bus/w1/devices/19-<id>/stretch
Date: Sep 2017
KernelVersion: 4.14
Contact: Jan Kandziora <jjj@gmx.de>
Description: When written, this file sets the multiplier used to calculate
the busy timeout for I2C operations on the connected DS28E17
chip. When read, returns the current setting.
Valid values: 1 to 9.
Default 1, can be set by w1_ds28e17.stretch= module parameter.
Users: w1_ds28e17 driver

View File

@ -51,6 +51,18 @@ Description:
Controls the dirty page count condition for the in-place-update
policies.
What: /sys/fs/f2fs/<disk>/min_hot_blocks
Date: March 2017
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description:
Controls the dirty page count condition for redefining hot data.
What: /sys/fs/f2fs/<disk>/min_ssr_sections
Date: October 2017
Contact: "Chao Yu" <yuchao0@huawei.com>
Description:
Controls the fee section threshold to trigger SSR allocation.
What: /sys/fs/f2fs/<disk>/max_small_discards
Date: November 2013
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
@ -102,6 +114,12 @@ Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description:
Controls the idle timing.
What: /sys/fs/f2fs/<disk>/iostat_enable
Date: August 2017
Contact: "Chao Yu" <yuchao0@huawei.com>
Description:
Controls to enable/disable IO stat.
What: /sys/fs/f2fs/<disk>/ra_nid_pages
Date: October 2015
Contact: "Chao Yu" <chao2.yu@samsung.com>
@ -122,6 +140,12 @@ Contact: "Shuoran Liu" <liushuoran@huawei.com>
Description:
Shows total written kbytes issued to disk.
What: /sys/fs/f2fs/<disk>/feature
Date: July 2017
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description:
Shows all enabled features in current device.
What: /sys/fs/f2fs/<disk>/inject_rate
Date: May 2016
Contact: "Sheng Yong" <shengyong1@huawei.com>
@ -138,7 +162,18 @@ What: /sys/fs/f2fs/<disk>/reserved_blocks
Date: June 2017
Contact: "Chao Yu" <yuchao0@huawei.com>
Description:
Controls current reserved blocks in system.
Controls target reserved blocks in system, the threshold
is soft, it could exceed current available user space.
What: /sys/fs/f2fs/<disk>/current_reserved_blocks
Date: October 2017
Contact: "Yunlong Song" <yunlong.song@huawei.com>
Contact: "Chao Yu" <yuchao0@huawei.com>
Description:
Shows current reserved blocks in system, it may be temporarily
smaller than target_reserved_blocks, but will gradually
increase to target_reserved_blocks when more free blocks are
freed by user later.
What: /sys/fs/f2fs/<disk>/gc_urgent
Date: August 2017

View File

@ -0,0 +1,21 @@
What: /sys/devices/platform/<platform>/tokens/*
Date: November 2017
KernelVersion: 4.15
Contact: "Mario Limonciello" <mario.limonciello@dell.com>
Description:
A read-only description of Dell platform tokens
available on the machine.
Each token attribute is available as a pair of
sysfs attributes readable by a process with
CAP_SYS_ADMIN.
For example the token ID "5" would be available
as the following attributes:
0005_location
0005_value
Tokens will vary from machine to machine, and
only tokens available on that machine will be
displayed.

View File

@ -0,0 +1,11 @@
What: /sys/devices/platform/<platform>/force_power
Date: September 2017
KernelVersion: 4.15
Contact: "Mario Limonciello" <mario.limonciello@dell.com>
Description:
Modify the platform force power state, influencing
Thunderbolt controllers to turn on or off when no
devices are connected (write-only)
There are two available states:
* 0 -> Force power disabled
* 1 -> Force power enabled

View File

@ -81,7 +81,9 @@ If you want the driver to put an event into the event log on a panic,
enable the 'Generate a panic event to all BMCs on a panic' option. If
you want the whole panic string put into the event log using OEM
events, enable the 'Generate OEM events containing the panic string'
option.
option. You can also enable these dynamically by setting the module
parameter named "panic_op" in the ipmi_msghandler module to "event"
or "string". Setting that parameter to "none" disables this function.
Basic Design
------------

View File

@ -0,0 +1,25 @@
To enumerate platform Low Power Idle states, Intel platforms are using
“Low Power Idle Table” (LPIT). More details about this table can be
downloaded from:
http://www.uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf
Residencies for each low power state can be read via FFH
(Function fixed hardware) or a memory mapped interface.
On platforms supporting S0ix sleep states, there can be two types of
residencies:
- CPU PKG C10 (Read via FFH interface)
- Platform Controller Hub (PCH) SLP_S0 (Read via memory mapped interface)
The following attributes are added dynamically to the cpuidle
sysfs attribute group:
/sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us
/sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
The "low_power_idle_cpu_residency_us" attribute shows time spent
by the CPU package in PKG C10
The "low_power_idle_system_residency_us" attribute shows SLP_S0
residency, or system time spent with the SLP_S0# signal asserted.
This is the lowest possible system power state, achieved only when CPU is in
PKG C10 and all functional blocks in PCH are in a low power state.

View File

@ -18,7 +18,7 @@ shortcut for ``print_hex_dump(KERN_DEBUG)``.
For ``print_hex_dump_debug()``/``print_hex_dump_bytes()``, format string is
its ``prefix_str`` argument, if it is constant string; or ``hexdump``
in case ``prefix_str`` is build dynamically.
in case ``prefix_str`` is built dynamically.
Dynamic debug has even more useful features:
@ -197,8 +197,8 @@ line
line number matches the callsite line number exactly. A
range of line numbers matches any callsite between the first
and last line number inclusive. An empty first number means
the first line in the file, an empty line number means the
last number in the file. Examples::
the first line in the file, an empty last line number means the
last line number in the file. Examples::
line 1603 // exactly line 1603
line 1600-1605 // the six lines from line 1600 to line 1605

View File

@ -857,7 +857,7 @@
The filter can be disabled or changed to another
driver later using sysfs.
drm_kms_helper.edid_firmware=[<connector>:]<file>[,[<connector>:]<file>]
drm.edid_firmware=[<connector>:]<file>[,[<connector>:]<file>]
Broken monitors, graphic adapters, KVMs and EDIDless
panels may send no or incorrect EDID data sets.
This parameter allows to specify an EDID data sets
@ -1864,13 +1864,6 @@
Built with CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y,
the default is off.
kmemcheck= [X86] Boot-time kmemcheck enable/disable/one-shot mode
Valid arguments: 0, 1, 2
kmemcheck=0 (disabled)
kmemcheck=1 (enabled)
kmemcheck=2 (one-shot mode)
Default: 2 (one-shot mode)
kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs.
Default is 0 (don't ignore, but inject #GP)
@ -1897,6 +1890,10 @@
[KVM,ARM] Trap guest accesses to GICv3 common
system registers
kvm-arm.vgic_v4_enable=
[KVM,ARM] Allow use of GICv4 for direct injection of
LPIs.
kvm-intel.ept= [KVM,Intel] Disable extended page tables
(virtualized MMU) support on capable Intel chips.
Default is 1 (enabled)
@ -3211,6 +3208,10 @@
allowed (eg kernel_enable_fpu()/kernel_disable_fpu()).
There is some performance impact when enabling this.
ppc_tm= [PPC]
Format: {"off"}
Disable Hardware Transactional Memory
print-fatal-signals=
[KNL] debug: print fatal signals
@ -3249,13 +3250,15 @@
instead using the legacy FADT method
profile= [KNL] Enable kernel profiling via /proc/profile
Format: [schedule,]<number>
Format: [<profiletype>,]<number>
Param: <profiletype>: "schedule", "sleep", or "kvm"
[defaults to kernel profiling]
Param: "schedule" - profile schedule points.
Param: <number> - step/bucket size as a power of 2 for
statistical time based profiling.
Param: "sleep" - profile D-state sleeping (millisecs).
Requires CONFIG_SCHEDSTATS
Param: "kvm" - profile VM exits.
Param: <number> - step/bucket size as a power of 2 for
statistical time based profiling.
prompt_ramdisk= [RAM] List of RAM disks to prompt for floppy disk
before loading.

View File

@ -197,3 +197,42 @@ information is missing.
To recover from this mode, one needs to flash a valid NVM image to the
host host controller in the same way it is done in the previous chapter.
Networking over Thunderbolt cable
---------------------------------
Thunderbolt technology allows software communication across two hosts
connected by a Thunderbolt cable.
It is possible to tunnel any kind of traffic over Thunderbolt link but
currently we only support Apple ThunderboltIP protocol.
If the other host is running Windows or macOS only thing you need to
do is to connect Thunderbolt cable between the two hosts, the
``thunderbolt-net`` is loaded automatically. If the other host is also
Linux you should load ``thunderbolt-net`` manually on one host (it does
not matter which one)::
# modprobe thunderbolt-net
This triggers module load on the other host automatically. If the driver
is built-in to the kernel image, there is no need to do anything.
The driver will create one virtual ethernet interface per Thunderbolt
port which are named like ``thunderbolt0`` and so on. From this point
you can either use standard userspace tools like ``ifconfig`` to
configure the interface or let your GUI to handle it automatically.
Forcing power
-------------
Many OEMs include a method that can be used to force the power of a
thunderbolt controller to an "On" state even if nothing is connected.
If supported by your machine this will be exposed by the WMI bus with
a sysfs attribute called "force_power".
For example the intel-wmi-thunderbolt driver exposes this attribute in:
/sys/devices/platform/PNP0C14:00/wmi_bus/wmi_bus-PNP0C14:00/86CCFD48-205E-4A77-9C48-2021CBEDE341/force_power
To force the power to on, write 1 to this attribute file.
To disable force power, write 0 to this attribute file.
Note: it's currently not possible to query the force power state of a platform.

View File

@ -33,6 +33,11 @@ SunXi family
- Next Thing Co GR8 (sun5i)
* Single ARM Cortex-A7 based SoCs
- Allwinner V3s (sun8i)
+ Datasheet
http://linux-sunxi.org/File:Allwinner_V3s_Datasheet_V1.0.pdf
* Dual ARM Cortex-A7 based SoCs
- Allwinner A20 (sun7i)
+ User Manual
@ -71,9 +76,11 @@ SunXi family
+ Datasheet
http://dl.linux-sunxi.org/H3/Allwinner_H3_Datasheet_V1.0.pdf
- Allwinner V3s (sun8i)
- Allwinner R40 (sun8i)
+ Datasheet
http://linux-sunxi.org/File:Allwinner_V3s_Datasheet_V1.0.pdf
https://github.com/tinalinux/docs/raw/r40-v1.y/R40_Datasheet_V1.0.pdf
+ User Manual
https://github.com/tinalinux/docs/raw/r40-v1.y/Allwinner_R40_User_Manual_V1.0.pdf
* Quad ARM Cortex-A15, Quad ARM Cortex-A7 based SoCs
- Allwinner A80

View File

@ -110,10 +110,20 @@ infrastructure:
x--------------------------------------------------x
| Name | bits | visible |
|--------------------------------------------------|
| RES0 | [63-32] | n |
| RES0 | [63-48] | n |
|--------------------------------------------------|
| DP | [47-44] | y |
|--------------------------------------------------|
| SM4 | [43-40] | y |
|--------------------------------------------------|
| SM3 | [39-36] | y |
|--------------------------------------------------|
| SHA3 | [35-32] | y |
|--------------------------------------------------|
| RDM | [31-28] | y |
|--------------------------------------------------|
| RES0 | [27-24] | n |
|--------------------------------------------------|
| ATOMICS | [23-20] | y |
|--------------------------------------------------|
| CRC32 | [19-16] | y |
@ -132,7 +142,11 @@ infrastructure:
x--------------------------------------------------x
| Name | bits | visible |
|--------------------------------------------------|
| RES0 | [63-28] | n |
| RES0 | [63-36] | n |
|--------------------------------------------------|
| SVE | [35-32] | y |
|--------------------------------------------------|
| RES0 | [31-28] | n |
|--------------------------------------------------|
| GIC | [27-24] | n |
|--------------------------------------------------|

View File

@ -0,0 +1,160 @@
ARM64 ELF hwcaps
================
This document describes the usage and semantics of the arm64 ELF hwcaps.
1. Introduction
---------------
Some hardware or software features are only available on some CPU
implementations, and/or with certain kernel configurations, but have no
architected discovery mechanism available to userspace code at EL0. The
kernel exposes the presence of these features to userspace through a set
of flags called hwcaps, exposed in the auxilliary vector.
Userspace software can test for features by acquiring the AT_HWCAP entry
of the auxilliary vector, and testing whether the relevant flags are
set, e.g.
bool floating_point_is_present(void)
{
unsigned long hwcaps = getauxval(AT_HWCAP);
if (hwcaps & HWCAP_FP)
return true;
return false;
}
Where software relies on a feature described by a hwcap, it should check
the relevant hwcap flag to verify that the feature is present before
attempting to make use of the feature.
Features cannot be probed reliably through other means. When a feature
is not available, attempting to use it may result in unpredictable
behaviour, and is not guaranteed to result in any reliable indication
that the feature is unavailable, such as a SIGILL.
2. Interpretation of hwcaps
---------------------------
The majority of hwcaps are intended to indicate the presence of features
which are described by architected ID registers inaccessible to
userspace code at EL0. These hwcaps are defined in terms of ID register
fields, and should be interpreted with reference to the definition of
these fields in the ARM Architecture Reference Manual (ARM ARM).
Such hwcaps are described below in the form:
Functionality implied by idreg.field == val.
Such hwcaps indicate the availability of functionality that the ARM ARM
defines as being present when idreg.field has value val, but do not
indicate that idreg.field is precisely equal to val, nor do they
indicate the absence of functionality implied by other values of
idreg.field.
Other hwcaps may indicate the presence of features which cannot be
described by ID registers alone. These may be described without
reference to ID registers, and may refer to other documentation.
3. The hwcaps exposed in AT_HWCAP
---------------------------------
HWCAP_FP
Functionality implied by ID_AA64PFR0_EL1.FP == 0b0000.
HWCAP_ASIMD
Functionality implied by ID_AA64PFR0_EL1.AdvSIMD == 0b0000.
HWCAP_EVTSTRM
The generic timer is configured to generate events at a frequency of
approximately 100KHz.
HWCAP_AES
Functionality implied by ID_AA64ISAR1_EL1.AES == 0b0001.
HWCAP_PMULL
Functionality implied by ID_AA64ISAR1_EL1.AES == 0b0010.
HWCAP_SHA1
Functionality implied by ID_AA64ISAR0_EL1.SHA1 == 0b0001.
HWCAP_SHA2
Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0001.
HWCAP_CRC32
Functionality implied by ID_AA64ISAR0_EL1.CRC32 == 0b0001.
HWCAP_ATOMICS
Functionality implied by ID_AA64ISAR0_EL1.Atomic == 0b0010.
HWCAP_FPHP
Functionality implied by ID_AA64PFR0_EL1.FP == 0b0001.
HWCAP_ASIMDHP
Functionality implied by ID_AA64PFR0_EL1.AdvSIMD == 0b0001.
HWCAP_CPUID
EL0 access to certain ID registers is available, to the extent
described by Documentation/arm64/cpu-feature-registers.txt.
These ID registers may imply the availability of features.
HWCAP_ASIMDRDM
Functionality implied by ID_AA64ISAR0_EL1.RDM == 0b0001.
HWCAP_JSCVT
Functionality implied by ID_AA64ISAR1_EL1.JSCVT == 0b0001.
HWCAP_FCMA
Functionality implied by ID_AA64ISAR1_EL1.FCMA == 0b0001.
HWCAP_LRCPC
Functionality implied by ID_AA64ISAR1_EL1.LRCPC == 0b0001.
HWCAP_DCPOP
Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0001.
HWCAP_SHA3
Functionality implied by ID_AA64ISAR0_EL1.SHA3 == 0b0001.
HWCAP_SM3
Functionality implied by ID_AA64ISAR0_EL1.SM3 == 0b0001.
HWCAP_SM4
Functionality implied by ID_AA64ISAR0_EL1.SM4 == 0b0001.
HWCAP_ASIMDDP
Functionality implied by ID_AA64ISAR0_EL1.DP == 0b0001.
HWCAP_SHA512
Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0002.
HWCAP_SVE
Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001.

View File

@ -86,9 +86,9 @@ Translation table lookup with 64KB pages:
+-------------------------------------------------> [63] TTBR0/1
When using KVM, the hypervisor maps kernel pages in EL2, at a fixed
offset from the kernel VA (top 24bits of the kernel VA set to zero):
When using KVM without the Virtualization Host Extensions, the hypervisor
maps kernel pages in EL2 at a fixed offset from the kernel VA. See the
kern_hyp_va macro for more details.
Start End Size Use
-----------------------------------------------------------------------
0000004000000000 0000007fffffffff 256GB kernel objects mapped in HYP
When using KVM with the Virtualization Host Extensions, no additional
mappings are created, since the host kernel runs directly in EL2.

508
Documentation/arm64/sve.txt Normal file
View File

@ -0,0 +1,508 @@
Scalable Vector Extension support for AArch64 Linux
===================================================
Author: Dave Martin <Dave.Martin@arm.com>
Date: 4 August 2017
This document outlines briefly the interface provided to userspace by Linux in
order to support use of the ARM Scalable Vector Extension (SVE).
This is an outline of the most important features and issues only and not
intended to be exhaustive.
This document does not aim to describe the SVE architecture or programmer's
model. To aid understanding, a minimal description of relevant programmer's
model features for SVE is included in Appendix A.
1. General
-----------
* SVE registers Z0..Z31, P0..P15 and FFR and the current vector length VL, are
tracked per-thread.
* The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector
AT_HWCAP entry. Presence of this flag implies the presence of the SVE
instructions and registers, and the Linux-specific system interfaces
described in this document. SVE is reported in /proc/cpuinfo as "sve".
* Support for the execution of SVE instructions in userspace can also be
detected by reading the CPU ID register ID_AA64PFR0_EL1 using an MRS
instruction, and checking that the value of the SVE field is nonzero. [3]
It does not guarantee the presence of the system interfaces described in the
following sections: software that needs to verify that those interfaces are
present must check for HWCAP_SVE instead.
* Debuggers should restrict themselves to interacting with the target via the
NT_ARM_SVE regset. The recommended way of detecting support for this regset
is to connect to a target process first and then attempt a
ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov).
2. Vector length terminology
-----------------------------
The size of an SVE vector (Z) register is referred to as the "vector length".
To avoid confusion about the units used to express vector length, the kernel
adopts the following conventions:
* Vector length (VL) = size of a Z-register in bytes
* Vector quadwords (VQ) = size of a Z-register in units of 128 bits
(So, VL = 16 * VQ.)
The VQ convention is used where the underlying granularity is important, such
as in data structure definitions. In most other situations, the VL convention
is used. This is consistent with the meaning of the "VL" pseudo-register in
the SVE instruction set architecture.
3. System call behaviour
-------------------------
* On syscall, V0..V31 are preserved (as without SVE). Thus, bits [127:0] of
Z0..Z31 are preserved. All other bits of Z0..Z31, and all of P0..P15 and FFR
become unspecified on return from a syscall.
* The SVE registers are not used to pass arguments to or receive results from
any syscall.
* In practice the affected registers/bits will be preserved or will be replaced
with zeros on return from a syscall, but userspace should not make
assumptions about this. The kernel behaviour may vary on a case-by-case
basis.
* All other SVE state of a thread, including the currently configured vector
length, the state of the PR_SVE_VL_INHERIT flag, and the deferred vector
length (if any), is preserved across all syscalls, subject to the specific
exceptions for execve() described in section 6.
In particular, on return from a fork() or clone(), the parent and new child
process or thread share identical SVE configuration, matching that of the
parent before the call.
4. Signal handling
-------------------
* A new signal frame record sve_context encodes the SVE registers on signal
delivery. [1]
* This record is supplementary to fpsimd_context. The FPSR and FPCR registers
are only present in fpsimd_context. For convenience, the content of V0..V31
is duplicated between sve_context and fpsimd_context.
* The signal frame record for SVE always contains basic metadata, in particular
the thread's vector length (in sve_context.vl).
* The SVE registers may or may not be included in the record, depending on
whether the registers are live for the thread. The registers are present if
and only if:
sve_context.head.size >= SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(sve_context.vl)).
* If the registers are present, the remainder of the record has a vl-dependent
size and layout. Macros SVE_SIG_* are defined [1] to facilitate access to
the members.
* If the SVE context is too big to fit in sigcontext.__reserved[], then extra
space is allocated on the stack, an extra_context record is written in
__reserved[] referencing this space. sve_context is then written in the
extra space. Refer to [1] for further details about this mechanism.
5. Signal return
-----------------
When returning from a signal handler:
* If there is no sve_context record in the signal frame, or if the record is
present but contains no register data as desribed in the previous section,
then the SVE registers/bits become non-live and take unspecified values.
* If sve_context is present in the signal frame and contains full register
data, the SVE registers become live and are populated with the specified
data. However, for backward compatibility reasons, bits [127:0] of Z0..Z31
are always restored from the corresponding members of fpsimd_context.vregs[]
and not from sve_context. The remaining bits are restored from sve_context.
* Inclusion of fpsimd_context in the signal frame remains mandatory,
irrespective of whether sve_context is present or not.
* The vector length cannot be changed via signal return. If sve_context.vl in
the signal frame does not match the current vector length, the signal return
attempt is treated as illegal, resulting in a forced SIGSEGV.
6. prctl extensions
--------------------
Some new prctl() calls are added to allow programs to manage the SVE vector
length:
prctl(PR_SVE_SET_VL, unsigned long arg)
Sets the vector length of the calling thread and related flags, where
arg == vl | flags. Other threads of the calling process are unaffected.
vl is the desired vector length, where sve_vl_valid(vl) must be true.
flags:
PR_SVE_SET_VL_INHERIT
Inherit the current vector length across execve(). Otherwise, the
vector length is reset to the system default at execve(). (See
Section 9.)
PR_SVE_SET_VL_ONEXEC
Defer the requested vector length change until the next execve()
performed by this thread.
The effect is equivalent to implicit exceution of the following
call immediately after the next execve() (if any) by the thread:
prctl(PR_SVE_SET_VL, arg & ~PR_SVE_SET_VL_ONEXEC)
This allows launching of a new program with a different vector
length, while avoiding runtime side effects in the caller.
Without PR_SVE_SET_VL_ONEXEC, the requested change takes effect
immediately.
Return value: a nonnegative on success, or a negative value on error:
EINVAL: SVE not supported, invalid vector length requested, or
invalid flags.
On success:
* Either the calling thread's vector length or the deferred vector length
to be applied at the next execve() by the thread (dependent on whether
PR_SVE_SET_VL_ONEXEC is present in arg), is set to the largest value
supported by the system that is less than or equal to vl. If vl ==
SVE_VL_MAX, the value set will be the largest value supported by the
system.
* Any previously outstanding deferred vector length change in the calling
thread is cancelled.
* The returned value describes the resulting configuration, encoded as for
PR_SVE_GET_VL. The vector length reported in this value is the new
current vector length for this thread if PR_SVE_SET_VL_ONEXEC was not
present in arg; otherwise, the reported vector length is the deferred
vector length that will be applied at the next execve() by the calling
thread.
* Changing the vector length causes all of P0..P15, FFR and all bits of
Z0..V31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become
unspecified. Calling PR_SVE_SET_VL with vl equal to the thread's current
vector length, or calling PR_SVE_SET_VL with the PR_SVE_SET_VL_ONEXEC
flag, does not constitute a change to the vector length for this purpose.
prctl(PR_SVE_GET_VL)
Gets the vector length of the calling thread.
The following flag may be OR-ed into the result:
PR_SVE_SET_VL_INHERIT
Vector length will be inherited across execve().
There is no way to determine whether there is an outstanding deferred
vector length change (which would only normally be the case between a
fork() or vfork() and the corresponding execve() in typical use).
To extract the vector length from the result, and it with
PR_SVE_VL_LEN_MASK.
Return value: a nonnegative value on success, or a negative value on error:
EINVAL: SVE not supported.
7. ptrace extensions
---------------------
* A new regset NT_ARM_SVE is defined for use with PTRACE_GETREGSET and
PTRACE_SETREGSET.
Refer to [2] for definitions.
The regset data starts with struct user_sve_header, containing:
size
Size of the complete regset, in bytes.
This depends on vl and possibly on other things in the future.
If a call to PTRACE_GETREGSET requests less data than the value of
size, the caller can allocate a larger buffer and retry in order to
read the complete regset.
max_size
Maximum size in bytes that the regset can grow to for the target
thread. The regset won't grow bigger than this even if the target
thread changes its vector length etc.
vl
Target thread's current vector length, in bytes.
max_vl
Maximum possible vector length for the target thread.
flags
either
SVE_PT_REGS_FPSIMD
SVE registers are not live (GETREGSET) or are to be made
non-live (SETREGSET).
The payload is of type struct user_fpsimd_state, with the same
meaning as for NT_PRFPREG, starting at offset
SVE_PT_FPSIMD_OFFSET from the start of user_sve_header.
Extra data might be appended in the future: the size of the
payload should be obtained using SVE_PT_FPSIMD_SIZE(vq, flags).
vq should be obtained using sve_vq_from_vl(vl).
or
SVE_PT_REGS_SVE
SVE registers are live (GETREGSET) or are to be made live
(SETREGSET).
The payload contains the SVE register data, starting at offset
SVE_PT_SVE_OFFSET from the start of user_sve_header, and with
size SVE_PT_SVE_SIZE(vq, flags);
... OR-ed with zero or more of the following flags, which have the same
meaning and behaviour as the corresponding PR_SET_VL_* flags:
SVE_PT_VL_INHERIT
SVE_PT_VL_ONEXEC (SETREGSET only).
* The effects of changing the vector length and/or flags are equivalent to
those documented for PR_SVE_SET_VL.
The caller must make a further GETREGSET call if it needs to know what VL is
actually set by SETREGSET, unless is it known in advance that the requested
VL is supported.
* In the SVE_PT_REGS_SVE case, the size and layout of the payload depends on
the header fields. The SVE_PT_SVE_*() macros are provided to facilitate
access to the members.
* In either case, for SETREGSET it is permissible to omit the payload, in which
case only the vector length and flags are changed (along with any
consequences of those changes).
* For SETREGSET, if an SVE_PT_REGS_SVE payload is present and the
requested VL is not supported, the effect will be the same as if the
payload were omitted, except that an EIO error is reported. No
attempt is made to translate the payload data to the correct layout
for the vector length actually set. The thread's FPSIMD state is
preserved, but the remaining bits of the SVE registers become
unspecified. It is up to the caller to translate the payload layout
for the actual VL and retry.
* The effect of writing a partial, incomplete payload is unspecified.
8. ELF coredump extensions
---------------------------
* A NT_ARM_SVE note will be added to each coredump for each thread of the
dumped process. The contents will be equivalent to the data that would have
been read if a PTRACE_GETREGSET of NT_ARM_SVE were executed for each thread
when the coredump was generated.
9. System runtime configuration
--------------------------------
* To mitigate the ABI impact of expansion of the signal frame, a policy
mechanism is provided for administrators, distro maintainers and developers
to set the default vector length for userspace processes:
/proc/sys/abi/sve_default_vector_length
Writing the text representation of an integer to this file sets the system
default vector length to the specified value, unless the value is greater
than the maximum vector length supported by the system in which case the
default vector length is set to that maximum.
The result can be determined by reopening the file and reading its
contents.
At boot, the default vector length is initially set to 64 or the maximum
supported vector length, whichever is smaller. This determines the initial
vector length of the init process (PID 1).
Reading this file returns the current system default vector length.
* At every execve() call, the new vector length of the new process is set to
the system default vector length, unless
* PR_SVE_SET_VL_INHERIT (or equivalently SVE_PT_VL_INHERIT) is set for the
calling thread, or
* a deferred vector length change is pending, established via the
PR_SVE_SET_VL_ONEXEC flag (or SVE_PT_VL_ONEXEC).
* Modifying the system default vector length does not affect the vector length
of any existing process or thread that does not make an execve() call.
Appendix A. SVE programmer's model (informative)
=================================================
This section provides a minimal description of the additions made by SVE to the
ARMv8-A programmer's model that are relevant to this document.
Note: This section is for information only and not intended to be complete or
to replace any architectural specification.
A.1. Registers
---------------
In A64 state, SVE adds the following:
* 32 8VL-bit vector registers Z0..Z31
For each Zn, Zn bits [127:0] alias the ARMv8-A vector register Vn.
A register write using a Vn register name zeros all bits of the corresponding
Zn except for bits [127:0].
* 16 VL-bit predicate registers P0..P15
* 1 VL-bit special-purpose predicate register FFR (the "first-fault register")
* a VL "pseudo-register" that determines the size of each vector register
The SVE instruction set architecture provides no way to write VL directly.
Instead, it can be modified only by EL1 and above, by writing appropriate
system registers.
* The value of VL can be configured at runtime by EL1 and above:
16 <= VL <= VLmax, where VL must be a multiple of 16.
* The maximum vector length is determined by the hardware:
16 <= VLmax <= 256.
(The SVE architecture specifies 256, but permits future architecture
revisions to raise this limit.)
* FPSR and FPCR are retained from ARMv8-A, and interact with SVE floating-point
operations in a similar way to the way in which they interact with ARMv8
floating-point operations.
8VL-1 128 0 bit index
+---- //// -----------------+
Z0 | : V0 |
: :
Z7 | : V7 |
Z8 | : * V8 |
: : :
Z15 | : *V15 |
Z16 | : V16 |
: :
Z31 | : V31 |
+---- //// -----------------+
31 0
VL-1 0 +-------+
+---- //// --+ FPSR | |
P0 | | +-------+
: | | *FPCR | |
P15 | | +-------+
+---- //// --+
FFR | | +-----+
+---- //// --+ VL | |
+-----+
(*) callee-save:
This only applies to bits [63:0] of Z-/V-registers.
FPCR contains callee-save and caller-save bits. See [4] for details.
A.2. Procedure call standard
-----------------------------
The ARMv8-A base procedure call standard is extended as follows with respect to
the additional SVE register state:
* All SVE register bits that are not shared with FP/SIMD are caller-save.
* Z8 bits [63:0] .. Z15 bits [63:0] are callee-save.
This follows from the way these bits are mapped to V8..V15, which are caller-
save in the base procedure call standard.
Appendix B. ARMv8-A FP/SIMD programmer's model
===============================================
Note: This section is for information only and not intended to be complete or
to replace any architectural specification.
Refer to [4] for for more information.
ARMv8-A defines the following floating-point / SIMD register state:
* 32 128-bit vector registers V0..V31
* 2 32-bit status/control registers FPSR, FPCR
127 0 bit index
+---------------+
V0 | |
: : :
V7 | |
* V8 | |
: : : :
*V15 | |
V16 | |
: : :
V31 | |
+---------------+
31 0
+-------+
FPSR | |
+-------+
*FPCR | |
+-------+
(*) callee-save:
This only applies to bits [63:0] of V-registers.
FPCR contains a mixture of callee-save and caller-save bits.
References
==========
[1] arch/arm64/include/uapi/asm/sigcontext.h
AArch64 Linux signal ABI definitions
[2] arch/arm64/include/uapi/asm/ptrace.h
AArch64 Linux ptrace ABI definitions
[3] linux/Documentation/arm64/cpu-feature-registers.txt
[4] ARM IHI0055C
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html
Procedure Call Standard for the ARM 64-bit Architecture (AArch64)

View File

@ -20,12 +20,27 @@ for that device, by setting low_latency to 0. See Section 3 for
details on how to configure BFQ for the desired tradeoff between
latency and throughput, or on how to maximize throughput.
On average CPUs, the current version of BFQ can handle devices
performing at most ~30K IOPS; at most ~50 KIOPS on faster CPUs. As a
reference, 30-50 KIOPS correspond to very high bandwidths with
sequential I/O (e.g., 8-12 GB/s if I/O requests are 256 KB large), and
to 120-200 MB/s with 4KB random I/O. BFQ is currently being tested on
multi-queue devices too.
BFQ has a non-null overhead, which limits the maximum IOPS that a CPU
can process for a device scheduled with BFQ. To give an idea of the
limits on slow or average CPUs, here are, first, the limits of BFQ for
three different CPUs, on, respectively, an average laptop, an old
desktop, and a cheap embedded system, in case full hierarchical
support is enabled (i.e., CONFIG_BFQ_GROUP_IOSCHED is set), but
CONFIG_DEBUG_BLK_CGROUP is not set (Section 4-2):
- Intel i7-4850HQ: 400 KIOPS
- AMD A8-3850: 250 KIOPS
- ARM CortexTM-A53 Octa-core: 80 KIOPS
If CONFIG_DEBUG_BLK_CGROUP is set (and of course full hierarchical
support is enabled), then the sustainable throughput with BFQ
decreases, because all blkio.bfq* statistics are created and updated
(Section 4-2). For BFQ, this leads to the following maximum
sustainable throughputs, on the same systems as above:
- Intel i7-4850HQ: 310 KIOPS
- AMD A8-3850: 200 KIOPS
- ARM CortexTM-A53 Octa-core: 56 KIOPS
BFQ works for multi-queue devices too.
The table of contents follow. Impatients can just jump to Section 3.
@ -500,6 +515,22 @@ BFQ-specific files is "blkio.bfq." or "io.bfq." For example, the group
parameter to set the weight of a group with BFQ is blkio.bfq.weight
or io.bfq.weight.
As for cgroups-v1 (blkio controller), the exact set of stat files
created, and kept up-to-date by bfq, depends on whether
CONFIG_DEBUG_BLK_CGROUP is set. If it is set, then bfq creates all
the stat files documented in
Documentation/cgroup-v1/blkio-controller.txt. If, instead,
CONFIG_DEBUG_BLK_CGROUP is not set, then bfq creates only the files
blkio.bfq.io_service_bytes
blkio.bfq.io_service_bytes_recursive
blkio.bfq.io_serviced
blkio.bfq.io_serviced_recursive
The value of CONFIG_DEBUG_BLK_CGROUP greatly influences the maximum
throughput sustainable with bfq, because updating the blkio.bfq.*
stats is rather costly, especially for some of the stats enabled by
CONFIG_DEBUG_BLK_CGROUP.
Parameters to set
-----------------

View File

@ -216,10 +216,9 @@ may need to abort DMA operations and revert to PIO for the transfer, in
which case a virtual mapping of the page is required. For SCSI it is also
done in some scenarios where the low level driver cannot be trusted to
handle a single sg entry correctly. The driver is expected to perform the
kmaps as needed on such occasions using the __bio_kmap_atomic and bio_kmap_irq
routines as appropriate. A driver could also use the blk_queue_bounce()
routine on its own to bounce highmem i/o to low memory for specific requests
if so desired.
kmaps as needed on such occasions as appropriate. A driver could also use
the blk_queue_bounce() routine on its own to bounce highmem i/o to low
memory for specific requests if so desired.
iii. The i/o scheduler algorithm itself can be replaced/set as appropriate
@ -1137,8 +1136,8 @@ use dma_map_sg for scatter gather) to be able to ship it to the driver. For
PIO drivers (or drivers that need to revert to PIO transfer once in a
while (IDE for example)), where the CPU is doing the actual data
transfer a virtual mapping is needed. If the driver supports highmem I/O,
(Sec 1.1, (ii) ) it needs to use __bio_kmap_atomic and bio_kmap_irq to
temporarily map a bio into the virtual address space.
(Sec 1.1, (ii) ) it needs to use kmap_atomic or similar to temporarily map
a bio into the virtual address space.
8. Prior/Related/Impacted patches

View File

@ -38,7 +38,7 @@ gb=[Size in GB]: Default: 250GB
bs=[Block size (in bytes)]: Default: 512 bytes
The block size reported to the system.
nr_devices=[Number of devices]: Default: 2
nr_devices=[Number of devices]: Default: 1
Number of block devices instantiated. They are instantiated as /dev/nullb0,
etc.
@ -52,13 +52,13 @@ irqmode=[0-2]: Default: 1-Soft-irq
2: Timer: Waits a specific period (completion_nsec) for each IO before
completion.
completion_nsec=[ns]: Default: 10.000ns
completion_nsec=[ns]: Default: 10,000ns
Combined with irqmode=2 (timer). The time each completion event must wait.
submit_queues=[0..nr_cpus]:
submit_queues=[1..nr_cpus]:
The number of submission queues attached to the device driver. If unset, it
defaults to 1 on single-queue and bio-based instances. For multi-queue,
it is ignored when use_per_node_hctx module parameter is 1.
defaults to 1. For multi-queue, it is ignored when use_per_node_hctx module
parameter is 1.
hw_queue_depth=[0..qdepth]: Default: 64
The hardware queue depth of the device.
@ -73,3 +73,12 @@ use_per_node_hctx=[0/1]: Default: 0
use_lightnvm=[0/1]: Default: 0
Register device with LightNVM. Requires blk-mq and CONFIG_NVM to be enabled.
no_sched=[0/1]: Default: 0
0: nullb* use default blk-mq io scheduler.
1: nullb* doesn't use io scheduler.
shared_tags=[0/1]: Default: 0
0: Tag set is not shared.
1: Tag set shared between devices for blk-mq. Only makes sense with
nr_devices > 1, otherwise there's no tag set to share.

View File

@ -0,0 +1,156 @@
BPF extensibility and applicability to networking, tracing, security
in the linux kernel and several user space implementations of BPF
virtual machine led to a number of misunderstanding on what BPF actually is.
This short QA is an attempt to address that and outline a direction
of where BPF is heading long term.
Q: Is BPF a generic instruction set similar to x64 and arm64?
A: NO.
Q: Is BPF a generic virtual machine ?
A: NO.
BPF is generic instruction set _with_ C calling convention.
Q: Why C calling convention was chosen?
A: Because BPF programs are designed to run in the linux kernel
which is written in C, hence BPF defines instruction set compatible
with two most used architectures x64 and arm64 (and takes into
consideration important quirks of other architectures) and
defines calling convention that is compatible with C calling
convention of the linux kernel on those architectures.
Q: can multiple return values be supported in the future?
A: NO. BPF allows only register R0 to be used as return value.
Q: can more than 5 function arguments be supported in the future?
A: NO. BPF calling convention only allows registers R1-R5 to be used
as arguments. BPF is not a standalone instruction set.
(unlike x64 ISA that allows msft, cdecl and other conventions)
Q: can BPF programs access instruction pointer or return address?
A: NO.
Q: can BPF programs access stack pointer ?
A: NO. Only frame pointer (register R10) is accessible.
From compiler point of view it's necessary to have stack pointer.
For example LLVM defines register R11 as stack pointer in its
BPF backend, but it makes sure that generated code never uses it.
Q: Does C-calling convention diminishes possible use cases?
A: YES. BPF design forces addition of major functionality in the form
of kernel helper functions and kernel objects like BPF maps with
seamless interoperability between them. It lets kernel call into
BPF programs and programs call kernel helpers with zero overhead.
As all of them were native C code. That is particularly the case
for JITed BPF programs that are indistinguishable from
native kernel C code.
Q: Does it mean that 'innovative' extensions to BPF code are disallowed?
A: Soft yes. At least for now until BPF core has support for
bpf-to-bpf calls, indirect calls, loops, global variables,
jump tables, read only sections and all other normal constructs
that C code can produce.
Q: Can loops be supported in a safe way?
A: It's not clear yet. BPF developers are trying to find a way to
support bounded loops where the verifier can guarantee that
the program terminates in less than 4096 instructions.
Q: How come LD_ABS and LD_IND instruction are present in BPF whereas
C code cannot express them and has to use builtin intrinsics?
A: This is artifact of compatibility with classic BPF. Modern
networking code in BPF performs better without them.
See 'direct packet access'.
Q: It seems not all BPF instructions are one-to-one to native CPU.
For example why BPF_JNE and other compare and jumps are not cpu-like?
A: This was necessary to avoid introducing flags into ISA which are
impossible to make generic and efficient across CPU architectures.
Q: why BPF_DIV instruction doesn't map to x64 div?
A: Because if we picked one-to-one relationship to x64 it would have made
it more complicated to support on arm64 and other archs. Also it
needs div-by-zero runtime check.
Q: why there is no BPF_SDIV for signed divide operation?
A: Because it would be rarely used. llvm errors in such case and
prints a suggestion to use unsigned divide instead
Q: Why BPF has implicit prologue and epilogue?
A: Because architectures like sparc have register windows and in general
there are enough subtle differences between architectures, so naive
store return address into stack won't work. Another reason is BPF has
to be safe from division by zero (and legacy exception path
of LD_ABS insn). Those instructions need to invoke epilogue and
return implicitly.
Q: Why BPF_JLT and BPF_JLE instructions were not introduced in the beginning?
A: Because classic BPF didn't have them and BPF authors felt that compiler
workaround would be acceptable. Turned out that programs lose performance
due to lack of these compare instructions and they were added.
These two instructions is a perfect example what kind of new BPF
instructions are acceptable and can be added in the future.
These two already had equivalent instructions in native CPUs.
New instructions that don't have one-to-one mapping to HW instructions
will not be accepted.
Q: BPF 32-bit subregisters have a requirement to zero upper 32-bits of BPF
registers which makes BPF inefficient virtual machine for 32-bit
CPU architectures and 32-bit HW accelerators. Can true 32-bit registers
be added to BPF in the future?
A: NO. The first thing to improve performance on 32-bit archs is to teach
LLVM to generate code that uses 32-bit subregisters. Then second step
is to teach verifier to mark operations where zero-ing upper bits
is unnecessary. Then JITs can take advantage of those markings and
drastically reduce size of generated code and improve performance.
Q: Does BPF have a stable ABI?
A: YES. BPF instructions, arguments to BPF programs, set of helper
functions and their arguments, recognized return codes are all part
of ABI. However when tracing programs are using bpf_probe_read() helper
to walk kernel internal datastructures and compile with kernel
internal headers these accesses can and will break with newer
kernels. The union bpf_attr -> kern_version is checked at load time
to prevent accidentally loading kprobe-based bpf programs written
for a different kernel. Networking programs don't do kern_version check.
Q: How much stack space a BPF program uses?
A: Currently all program types are limited to 512 bytes of stack
space, but the verifier computes the actual amount of stack used
and both interpreter and most JITed code consume necessary amount.
Q: Can BPF be offloaded to HW?
A: YES. BPF HW offload is supported by NFP driver.
Q: Does classic BPF interpreter still exist?
A: NO. Classic BPF programs are converted into extend BPF instructions.
Q: Can BPF call arbitrary kernel functions?
A: NO. BPF programs can only call a set of helper functions which
is defined for every program type.
Q: Can BPF overwrite arbitrary kernel memory?
A: NO. Tracing bpf programs can _read_ arbitrary memory with bpf_probe_read()
and bpf_probe_read_str() helpers. Networking programs cannot read
arbitrary memory, since they don't have access to these helpers.
Programs can never read or write arbitrary memory directly.
Q: Can BPF overwrite arbitrary user memory?
A: Sort-of. Tracing BPF programs can overwrite the user memory
of the current task with bpf_probe_write_user(). Every time such
program is loaded the kernel will print warning message, so
this helper is only useful for experiments and prototypes.
Tracing BPF programs are root only.
Q: When bpf_trace_printk() helper is used the kernel prints nasty
warning message. Why is that?
A: This is done to nudge program authors into better interfaces when
programs need to pass data to user space. Like bpf_perf_event_output()
can be used to efficiently stream data via perf ring buffer.
BPF maps can be used for asynchronous data sharing between kernel
and user space. bpf_trace_printk() should only be used for debugging.
Q: Can BPF functionality such as new program or map types, new
helpers, etc be added out of kernel module code?
A: NO.

View File

@ -893,10 +893,6 @@ Controllers
CPU
---
.. note::
The interface for the cpu controller hasn't been merged yet
The "cpu" controllers regulates distribution of CPU cycles. This
controller implements weight and absolute bandwidth limit models for
normal scheduling policy and absolute bandwidth allocation model for
@ -910,12 +906,16 @@ All time durations are in microseconds.
cpu.stat
A read-only flat-keyed file which exists on non-root cgroups.
This file exists whether the controller is enabled or not.
It reports the following six stats:
It always reports the following three stats:
- usage_usec
- user_usec
- system_usec
and the following three when the controller is enabled:
- nr_periods
- nr_throttled
- throttled_usec
@ -926,6 +926,18 @@ All time durations are in microseconds.
The weight in the range [1, 10000].
cpu.weight.nice
A read-write single value file which exists on non-root
cgroups. The default is "0".
The nice value is in the range [-20, 19].
This interface file is an alternative interface for
"cpu.weight" and allows reading and setting weight using the
same values used by nice(2). Because the range is smaller and
granularity is coarser for the nice values, the read value is
the closest approximation of the current weight.
cpu.max
A read-write two value file which exists on non-root cgroups.
The default is "max 100000".
@ -938,26 +950,6 @@ All time durations are in microseconds.
$PERIOD duration. "max" for $MAX indicates no limit. If only
one number is written, $MAX is updated.
cpu.rt.max
.. note::
The semantics of this file is still under discussion and the
interface hasn't been merged yet
A read-write two value file which exists on all cgroups.
The default is "0 100000".
The maximum realtime runtime allocation. Over-committing
configurations are disallowed and process migrations are
rejected if not enough bandwidth is available. It's in the
following format::
$MAX $PERIOD
which indicates that the group may consume upto $MAX in each
$PERIOD duration. If only one number is written, $MAX is
updated.
Memory
------

View File

@ -0,0 +1,7 @@
WARN_ONCE / WARN_ON_ONCE only print a warning once.
echo 1 > /sys/kernel/debug/clear_warn_once
clears the state and allows the warnings to print once again.
This can be useful after test suite runs to reproduce problems.

View File

@ -225,9 +225,9 @@ interrupts.
The following control flow is implemented (simplified excerpt)::
:c:func:`desc->irq_data.chip->irq_mask_ack`;
desc->irq_data.chip->irq_mask_ack();
handle_irq_event(desc->action);
:c:func:`desc->irq_data.chip->irq_unmask`;
desc->irq_data.chip->irq_unmask();
Default Fast EOI IRQ flow handler
@ -239,7 +239,7 @@ which only need an EOI at the end of the handler.
The following control flow is implemented (simplified excerpt)::
handle_irq_event(desc->action);
:c:func:`desc->irq_data.chip->irq_eoi`;
desc->irq_data.chip->irq_eoi();
Default Edge IRQ flow handler
@ -251,15 +251,15 @@ interrupts.
The following control flow is implemented (simplified excerpt)::
if (desc->status & running) {
:c:func:`desc->irq_data.chip->irq_mask_ack`;
desc->irq_data.chip->irq_mask_ack();
desc->status |= pending | masked;
return;
}
:c:func:`desc->irq_data.chip->irq_ack`;
desc->irq_data.chip->irq_ack();
desc->status |= running;
do {
if (desc->status & masked)
:c:func:`desc->irq_data.chip->irq_unmask`;
desc->irq_data.chip->irq_unmask();
desc->status &= ~pending;
handle_irq_event(desc->action);
} while (status & pending);
@ -293,10 +293,10 @@ simplified version without locking.
The following control flow is implemented (simplified excerpt)::
if (desc->irq_data.chip->irq_ack)
:c:func:`desc->irq_data.chip->irq_ack`;
desc->irq_data.chip->irq_ack();
handle_irq_event(desc->action);
if (desc->irq_data.chip->irq_eoi)
:c:func:`desc->irq_data.chip->irq_eoi`;
desc->irq_data.chip->irq_eoi();
EOI Edge IRQ flow handler

View File

@ -177,18 +177,14 @@ Here is a sample module which implements a basic per cpu counter using
printk("Read : CPU %d, count %ld\n", cpu,
local_read(&per_cpu(counters, cpu)));
}
del_timer(&test_timer);
test_timer.expires = jiffies + 1000;
add_timer(&test_timer);
mod_timer(&test_timer, jiffies + 1000);
}
static int __init test_init(void)
{
/* initialize the timer that will increment the counter */
init_timer(&test_timer);
test_timer.function = do_test_timer;
test_timer.expires = jiffies + 1;
add_timer(&test_timer);
timer_setup(&test_timer, do_test_timer, 0);
mod_timer(&test_timer, jiffies + 1);
return 0;
}

View File

@ -90,6 +90,9 @@ Freq_i to Freq_j. Freq_i is in descending order with increasing rows and
Freq_j is in descending order with increasing columns. The output here also
contains the actual freq values for each row and column for better readability.
If the transition table is bigger than PAGE_SIZE, reading this will
return an -EFBIG error.
--------------------------------------------------------------------------------
<mysystem>:/sys/devices/system/cpu/cpu0/cpufreq/stats # cat trans_table
From : To

View File

@ -7,59 +7,27 @@ Code Example For Symmetric Key Cipher Operation
::
struct tcrypt_result {
struct completion completion;
int err;
};
/* tie all data structures together */
struct skcipher_def {
struct scatterlist sg;
struct crypto_skcipher *tfm;
struct skcipher_request *req;
struct tcrypt_result result;
struct crypto_wait wait;
};
/* Callback function */
static void test_skcipher_cb(struct crypto_async_request *req, int error)
{
struct tcrypt_result *result = req->data;
if (error == -EINPROGRESS)
return;
result->err = error;
complete(&result->completion);
pr_info("Encryption finished successfully\n");
}
/* Perform cipher operation */
static unsigned int test_skcipher_encdec(struct skcipher_def *sk,
int enc)
{
int rc = 0;
int rc;
if (enc)
rc = crypto_skcipher_encrypt(sk->req);
rc = crypto_wait_req(crypto_skcipher_encrypt(sk->req), &sk->wait);
else
rc = crypto_skcipher_decrypt(sk->req);
rc = crypto_wait_req(crypto_skcipher_decrypt(sk->req), &sk->wait);
switch (rc) {
case 0:
break;
case -EINPROGRESS:
case -EBUSY:
rc = wait_for_completion_interruptible(
&sk->result.completion);
if (!rc && !sk->result.err) {
reinit_completion(&sk->result.completion);
break;
}
default:
pr_info("skcipher encrypt returned with %d result %d\n",
rc, sk->result.err);
break;
}
init_completion(&sk->result.completion);
if (rc)
pr_info("skcipher encrypt returned with result %d\n", rc);
return rc;
}
@ -89,8 +57,8 @@ Code Example For Symmetric Key Cipher Operation
}
skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
test_skcipher_cb,
&sk.result);
crypto_req_done,
&sk.wait);
/* AES 256 with random key */
get_random_bytes(&key, 32);
@ -122,7 +90,7 @@ Code Example For Symmetric Key Cipher Operation
/* We encrypt one block */
sg_init_one(&sk.sg, scratchpad, 16);
skcipher_request_set_crypt(req, &sk.sg, &sk.sg, 16, ivdata);
init_completion(&sk.result.completion);
crypto_init_wait(&sk.wait);
/* encrypt data */
ret = test_skcipher_encdec(&sk, 1);

View File

@ -33,9 +33,6 @@ of many distributions, e.g. :
You can get the latest version released from the Coccinelle homepage at
http://coccinelle.lip6.fr/
Information and tips about Coccinelle are also provided on the wiki
pages at http://cocci.ekstranet.diku.dk/wiki/doku.php
Once you have it, run the following command::
./configure

View File

@ -21,7 +21,6 @@ whole; patches welcome!
kasan
ubsan
kmemleak
kmemcheck
gdb-kernel-debugging
kgdb
kselftest

View File

@ -12,19 +12,30 @@ To achieve this goal it does not collect coverage in soft/hard interrupts
and instrumentation of some inherently non-deterministic parts of kernel is
disabled (e.g. scheduler, locking).
Usage
-----
kcov is also able to collect comparison operands from the instrumented code
(this feature currently requires that the kernel is compiled with clang).
Prerequisites
-------------
Configure the kernel with::
CONFIG_KCOV=y
CONFIG_KCOV requires gcc built on revision 231296 or later.
If the comparison operands need to be collected, set::
CONFIG_KCOV_ENABLE_COMPARISONS=y
Profiling data will only become accessible once debugfs has been mounted::
mount -t debugfs none /sys/kernel/debug
The following program demonstrates kcov usage from within a test program:
Coverage collection
-------------------
The following program demonstrates coverage collection from within a test
program using kcov:
.. code-block:: c
@ -44,6 +55,9 @@ The following program demonstrates kcov usage from within a test program:
#define KCOV_DISABLE _IO('c', 101)
#define COVER_SIZE (64<<10)
#define KCOV_TRACE_PC 0
#define KCOV_TRACE_CMP 1
int main(int argc, char **argv)
{
int fd;
@ -64,7 +78,7 @@ The following program demonstrates kcov usage from within a test program:
if ((void*)cover == MAP_FAILED)
perror("mmap"), exit(1);
/* Enable coverage collection on the current thread. */
if (ioctl(fd, KCOV_ENABLE, 0))
if (ioctl(fd, KCOV_ENABLE, KCOV_TRACE_PC))
perror("ioctl"), exit(1);
/* Reset coverage from the tail of the ioctl() call. */
__atomic_store_n(&cover[0], 0, __ATOMIC_RELAXED);
@ -111,3 +125,80 @@ The interface is fine-grained to allow efficient forking of test processes.
That is, a parent process opens /sys/kernel/debug/kcov, enables trace mode,
mmaps coverage buffer and then forks child processes in a loop. Child processes
only need to enable coverage (disable happens automatically on thread end).
Comparison operands collection
------------------------------
Comparison operands collection is similar to coverage collection:
.. code-block:: c
/* Same includes and defines as above. */
/* Number of 64-bit words per record. */
#define KCOV_WORDS_PER_CMP 4
/*
* The format for the types of collected comparisons.
*
* Bit 0 shows whether one of the arguments is a compile-time constant.
* Bits 1 & 2 contain log2 of the argument size, up to 8 bytes.
*/
#define KCOV_CMP_CONST (1 << 0)
#define KCOV_CMP_SIZE(n) ((n) << 1)
#define KCOV_CMP_MASK KCOV_CMP_SIZE(3)
int main(int argc, char **argv)
{
int fd;
uint64_t *cover, type, arg1, arg2, is_const, size;
unsigned long n, i;
fd = open("/sys/kernel/debug/kcov", O_RDWR);
if (fd == -1)
perror("open"), exit(1);
if (ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE))
perror("ioctl"), exit(1);
/*
* Note that the buffer pointer is of type uint64_t*, because all
* the comparison operands are promoted to uint64_t.
*/
cover = (uint64_t *)mmap(NULL, COVER_SIZE * sizeof(unsigned long),
PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if ((void*)cover == MAP_FAILED)
perror("mmap"), exit(1);
/* Note KCOV_TRACE_CMP instead of KCOV_TRACE_PC. */
if (ioctl(fd, KCOV_ENABLE, KCOV_TRACE_CMP))
perror("ioctl"), exit(1);
__atomic_store_n(&cover[0], 0, __ATOMIC_RELAXED);
read(-1, NULL, 0);
/* Read number of comparisons collected. */
n = __atomic_load_n(&cover[0], __ATOMIC_RELAXED);
for (i = 0; i < n; i++) {
type = cover[i * KCOV_WORDS_PER_CMP + 1];
/* arg1 and arg2 - operands of the comparison. */
arg1 = cover[i * KCOV_WORDS_PER_CMP + 2];
arg2 = cover[i * KCOV_WORDS_PER_CMP + 3];
/* ip - caller address. */
ip = cover[i * KCOV_WORDS_PER_CMP + 4];
/* size of the operands. */
size = 1 << ((type & KCOV_CMP_MASK) >> 1);
/* is_const - true if either operand is a compile-time constant.*/
is_const = type & KCOV_CMP_CONST;
printf("ip: 0x%lx type: 0x%lx, arg1: 0x%lx, arg2: 0x%lx, "
"size: %lu, %s\n",
ip, type, arg1, arg2, size,
is_const ? "const" : "non-const");
}
if (ioctl(fd, KCOV_DISABLE, 0))
perror("ioctl"), exit(1);
/* Free resources. */
if (munmap(cover, COVER_SIZE * sizeof(unsigned long)))
perror("munmap"), exit(1);
if (close(fd))
perror("close"), exit(1);
return 0;
}
Note that the kcov modes (coverage collection or comparison operands) are
mutually exclusive.

View File

@ -1,733 +0,0 @@
Getting started with kmemcheck
==============================
Vegard Nossum <vegardno@ifi.uio.no>
Introduction
------------
kmemcheck is a debugging feature for the Linux Kernel. More specifically, it
is a dynamic checker that detects and warns about some uses of uninitialized
memory.
Userspace programmers might be familiar with Valgrind's memcheck. The main
difference between memcheck and kmemcheck is that memcheck works for userspace
programs only, and kmemcheck works for the kernel only. The implementations
are of course vastly different. Because of this, kmemcheck is not as accurate
as memcheck, but it turns out to be good enough in practice to discover real
programmer errors that the compiler is not able to find through static
analysis.
Enabling kmemcheck on a kernel will probably slow it down to the extent that
the machine will not be usable for normal workloads such as e.g. an
interactive desktop. kmemcheck will also cause the kernel to use about twice
as much memory as normal. For this reason, kmemcheck is strictly a debugging
feature.
Downloading
-----------
As of version 2.6.31-rc1, kmemcheck is included in the mainline kernel.
Configuring and compiling
-------------------------
kmemcheck only works for the x86 (both 32- and 64-bit) platform. A number of
configuration variables must have specific settings in order for the kmemcheck
menu to even appear in "menuconfig". These are:
- ``CONFIG_CC_OPTIMIZE_FOR_SIZE=n``
This option is located under "General setup" / "Optimize for size".
Without this, gcc will use certain optimizations that usually lead to
false positive warnings from kmemcheck. An example of this is a 16-bit
field in a struct, where gcc may load 32 bits, then discard the upper
16 bits. kmemcheck sees only the 32-bit load, and may trigger a
warning for the upper 16 bits (if they're uninitialized).
- ``CONFIG_SLAB=y`` or ``CONFIG_SLUB=y``
This option is located under "General setup" / "Choose SLAB
allocator".
- ``CONFIG_FUNCTION_TRACER=n``
This option is located under "Kernel hacking" / "Tracers" / "Kernel
Function Tracer"
When function tracing is compiled in, gcc emits a call to another
function at the beginning of every function. This means that when the
page fault handler is called, the ftrace framework will be called
before kmemcheck has had a chance to handle the fault. If ftrace then
modifies memory that was tracked by kmemcheck, the result is an
endless recursive page fault.
- ``CONFIG_DEBUG_PAGEALLOC=n``
This option is located under "Kernel hacking" / "Memory Debugging"
/ "Debug page memory allocations".
In addition, I highly recommend turning on ``CONFIG_DEBUG_INFO=y``. This is also
located under "Kernel hacking". With this, you will be able to get line number
information from the kmemcheck warnings, which is extremely valuable in
debugging a problem. This option is not mandatory, however, because it slows
down the compilation process and produces a much bigger kernel image.
Now the kmemcheck menu should be visible (under "Kernel hacking" / "Memory
Debugging" / "kmemcheck: trap use of uninitialized memory"). Here follows
a description of the kmemcheck configuration variables:
- ``CONFIG_KMEMCHECK``
This must be enabled in order to use kmemcheck at all...
- ``CONFIG_KMEMCHECK_``[``DISABLED`` | ``ENABLED`` | ``ONESHOT``]``_BY_DEFAULT``
This option controls the status of kmemcheck at boot-time. "Enabled"
will enable kmemcheck right from the start, "disabled" will boot the
kernel as normal (but with the kmemcheck code compiled in, so it can
be enabled at run-time after the kernel has booted), and "one-shot" is
a special mode which will turn kmemcheck off automatically after
detecting the first use of uninitialized memory.
If you are using kmemcheck to actively debug a problem, then you
probably want to choose "enabled" here.
The one-shot mode is mostly useful in automated test setups because it
can prevent floods of warnings and increase the chances of the machine
surviving in case something is really wrong. In other cases, the one-
shot mode could actually be counter-productive because it would turn
itself off at the very first error -- in the case of a false positive
too -- and this would come in the way of debugging the specific
problem you were interested in.
If you would like to use your kernel as normal, but with a chance to
enable kmemcheck in case of some problem, it might be a good idea to
choose "disabled" here. When kmemcheck is disabled, most of the run-
time overhead is not incurred, and the kernel will be almost as fast
as normal.
- ``CONFIG_KMEMCHECK_QUEUE_SIZE``
Select the maximum number of error reports to store in an internal
(fixed-size) buffer. Since errors can occur virtually anywhere and in
any context, we need a temporary storage area which is guaranteed not
to generate any other page faults when accessed. The queue will be
emptied as soon as a tasklet may be scheduled. If the queue is full,
new error reports will be lost.
The default value of 64 is probably fine. If some code produces more
than 64 errors within an irqs-off section, then the code is likely to
produce many, many more, too, and these additional reports seldom give
any more information (the first report is usually the most valuable
anyway).
This number might have to be adjusted if you are not using serial
console or similar to capture the kernel log. If you are using the
"dmesg" command to save the log, then getting a lot of kmemcheck
warnings might overflow the kernel log itself, and the earlier reports
will get lost in that way instead. Try setting this to 10 or so on
such a setup.
- ``CONFIG_KMEMCHECK_SHADOW_COPY_SHIFT``
Select the number of shadow bytes to save along with each entry of the
error-report queue. These bytes indicate what parts of an allocation
are initialized, uninitialized, etc. and will be displayed when an
error is detected to help the debugging of a particular problem.
The number entered here is actually the logarithm of the number of
bytes that will be saved. So if you pick for example 5 here, kmemcheck
will save 2^5 = 32 bytes.
The default value should be fine for debugging most problems. It also
fits nicely within 80 columns.
- ``CONFIG_KMEMCHECK_PARTIAL_OK``
This option (when enabled) works around certain GCC optimizations that
produce 32-bit reads from 16-bit variables where the upper 16 bits are
thrown away afterwards.
The default value (enabled) is recommended. This may of course hide
some real errors, but disabling it would probably produce a lot of
false positives.
- ``CONFIG_KMEMCHECK_BITOPS_OK``
This option silences warnings that would be generated for bit-field
accesses where not all the bits are initialized at the same time. This
may also hide some real bugs.
This option is probably obsolete, or it should be replaced with
the kmemcheck-/bitfield-annotations for the code in question. The
default value is therefore fine.
Now compile the kernel as usual.
How to use
----------
Booting
~~~~~~~
First some information about the command-line options. There is only one
option specific to kmemcheck, and this is called "kmemcheck". It can be used
to override the default mode as chosen by the ``CONFIG_KMEMCHECK_*_BY_DEFAULT``
option. Its possible settings are:
- ``kmemcheck=0`` (disabled)
- ``kmemcheck=1`` (enabled)
- ``kmemcheck=2`` (one-shot mode)
If SLUB debugging has been enabled in the kernel, it may take precedence over
kmemcheck in such a way that the slab caches which are under SLUB debugging
will not be tracked by kmemcheck. In order to ensure that this doesn't happen
(even though it shouldn't by default), use SLUB's boot option ``slub_debug``,
like this: ``slub_debug=-``
In fact, this option may also be used for fine-grained control over SLUB vs.
kmemcheck. For example, if the command line includes
``kmemcheck=1 slub_debug=,dentry``, then SLUB debugging will be used only
for the "dentry" slab cache, and with kmemcheck tracking all the other
caches. This is advanced usage, however, and is not generally recommended.
Run-time enable/disable
~~~~~~~~~~~~~~~~~~~~~~~
When the kernel has booted, it is possible to enable or disable kmemcheck at
run-time. WARNING: This feature is still experimental and may cause false
positive warnings to appear. Therefore, try not to use this. If you find that
it doesn't work properly (e.g. you see an unreasonable amount of warnings), I
will be happy to take bug reports.
Use the file ``/proc/sys/kernel/kmemcheck`` for this purpose, e.g.::
$ echo 0 > /proc/sys/kernel/kmemcheck # disables kmemcheck
The numbers are the same as for the ``kmemcheck=`` command-line option.
Debugging
~~~~~~~~~
A typical report will look something like this::
WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88003e4a2024)
80000000000000000000000000000000000000000088ffff0000000000000000
i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u
^
Pid: 1856, comm: ntpdate Not tainted 2.6.29-rc5 #264 945P-A
RIP: 0010:[<ffffffff8104ede8>] [<ffffffff8104ede8>] __dequeue_signal+0xc8/0x190
RSP: 0018:ffff88003cdf7d98 EFLAGS: 00210002
RAX: 0000000000000030 RBX: ffff88003d4ea968 RCX: 0000000000000009
RDX: ffff88003e5d6018 RSI: ffff88003e5d6024 RDI: ffff88003cdf7e84
RBP: ffff88003cdf7db8 R08: ffff88003e5d6000 R09: 0000000000000000
R10: 0000000000000080 R11: 0000000000000000 R12: 000000000000000e
R13: ffff88003cdf7e78 R14: ffff88003d530710 R15: ffff88003d5a98c8
FS: 0000000000000000(0000) GS:ffff880001982000(0063) knlGS:00000
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: ffff88003f806ea0 CR3: 000000003c036000 CR4: 00000000000006a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
[<ffffffff8104f04e>] dequeue_signal+0x8e/0x170
[<ffffffff81050bd8>] get_signal_to_deliver+0x98/0x390
[<ffffffff8100b87d>] do_notify_resume+0xad/0x7d0
[<ffffffff8100c7b5>] int_signal+0x12/0x17
[<ffffffffffffffff>] 0xffffffffffffffff
The single most valuable information in this report is the RIP (or EIP on 32-
bit) value. This will help us pinpoint exactly which instruction that caused
the warning.
If your kernel was compiled with ``CONFIG_DEBUG_INFO=y``, then all we have to do
is give this address to the addr2line program, like this::
$ addr2line -e vmlinux -i ffffffff8104ede8
arch/x86/include/asm/string_64.h:12
include/asm-generic/siginfo.h:287
kernel/signal.c:380
kernel/signal.c:410
The "``-e vmlinux``" tells addr2line which file to look in. **IMPORTANT:**
This must be the vmlinux of the kernel that produced the warning in the
first place! If not, the line number information will almost certainly be
wrong.
The "``-i``" tells addr2line to also print the line numbers of inlined
functions. In this case, the flag was very important, because otherwise,
it would only have printed the first line, which is just a call to
``memcpy()``, which could be called from a thousand places in the kernel, and
is therefore not very useful. These inlined functions would not show up in
the stack trace above, simply because the kernel doesn't load the extra
debugging information. This technique can of course be used with ordinary
kernel oopses as well.
In this case, it's the caller of ``memcpy()`` that is interesting, and it can be
found in ``include/asm-generic/siginfo.h``, line 287::
281 static inline void copy_siginfo(struct siginfo *to, struct siginfo *from)
282 {
283 if (from->si_code < 0)
284 memcpy(to, from, sizeof(*to));
285 else
286 /* _sigchld is currently the largest know union member */
287 memcpy(to, from, __ARCH_SI_PREAMBLE_SIZE + sizeof(from->_sifields._sigchld));
288 }
Since this was a read (kmemcheck usually warns about reads only, though it can
warn about writes to unallocated or freed memory as well), it was probably the
"from" argument which contained some uninitialized bytes. Following the chain
of calls, we move upwards to see where "from" was allocated or initialized,
``kernel/signal.c``, line 380::
359 static void collect_signal(int sig, struct sigpending *list, siginfo_t *info)
360 {
...
367 list_for_each_entry(q, &list->list, list) {
368 if (q->info.si_signo == sig) {
369 if (first)
370 goto still_pending;
371 first = q;
...
377 if (first) {
378 still_pending:
379 list_del_init(&first->list);
380 copy_siginfo(info, &first->info);
381 __sigqueue_free(first);
...
392 }
393 }
Here, it is ``&first->info`` that is being passed on to ``copy_siginfo()``. The
variable ``first`` was found on a list -- passed in as the second argument to
``collect_signal()``. We continue our journey through the stack, to figure out
where the item on "list" was allocated or initialized. We move to line 410::
395 static int __dequeue_signal(struct sigpending *pending, sigset_t *mask,
396 siginfo_t *info)
397 {
...
410 collect_signal(sig, pending, info);
...
414 }
Now we need to follow the ``pending`` pointer, since that is being passed on to
``collect_signal()`` as ``list``. At this point, we've run out of lines from the
"addr2line" output. Not to worry, we just paste the next addresses from the
kmemcheck stack dump, i.e.::
[<ffffffff8104f04e>] dequeue_signal+0x8e/0x170
[<ffffffff81050bd8>] get_signal_to_deliver+0x98/0x390
[<ffffffff8100b87d>] do_notify_resume+0xad/0x7d0
[<ffffffff8100c7b5>] int_signal+0x12/0x17
$ addr2line -e vmlinux -i ffffffff8104f04e ffffffff81050bd8 \
ffffffff8100b87d ffffffff8100c7b5
kernel/signal.c:446
kernel/signal.c:1806
arch/x86/kernel/signal.c:805
arch/x86/kernel/signal.c:871
arch/x86/kernel/entry_64.S:694
Remember that since these addresses were found on the stack and not as the
RIP value, they actually point to the _next_ instruction (they are return
addresses). This becomes obvious when we look at the code for line 446::
422 int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
423 {
...
431 signr = __dequeue_signal(&tsk->signal->shared_pending,
432 mask, info);
433 /*
434 * itimer signal ?
435 *
436 * itimers are process shared and we restart periodic
437 * itimers in the signal delivery path to prevent DoS
438 * attacks in the high resolution timer case. This is
439 * compliant with the old way of self restarting
440 * itimers, as the SIGALRM is a legacy signal and only
441 * queued once. Changing the restart behaviour to
442 * restart the timer in the signal dequeue path is
443 * reducing the timer noise on heavy loaded !highres
444 * systems too.
445 */
446 if (unlikely(signr == SIGALRM)) {
...
489 }
So instead of looking at 446, we should be looking at 431, which is the line
that executes just before 446. Here we see that what we are looking for is
``&tsk->signal->shared_pending``.
Our next task is now to figure out which function that puts items on this
``shared_pending`` list. A crude, but efficient tool, is ``git grep``::
$ git grep -n 'shared_pending' kernel/
...
kernel/signal.c:828: pending = group ? &t->signal->shared_pending : &t->pending;
kernel/signal.c:1339: pending = group ? &t->signal->shared_pending : &t->pending;
...
There were more results, but none of them were related to list operations,
and these were the only assignments. We inspect the line numbers more closely
and find that this is indeed where items are being added to the list::
816 static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
817 int group)
818 {
...
828 pending = group ? &t->signal->shared_pending : &t->pending;
...
851 q = __sigqueue_alloc(t, GFP_ATOMIC, (sig < SIGRTMIN &&
852 (is_si_special(info) ||
853 info->si_code >= 0)));
854 if (q) {
855 list_add_tail(&q->list, &pending->list);
...
890 }
and::
1309 int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
1310 {
....
1339 pending = group ? &t->signal->shared_pending : &t->pending;
1340 list_add_tail(&q->list, &pending->list);
....
1347 }
In the first case, the list element we are looking for, ``q``, is being
returned from the function ``__sigqueue_alloc()``, which looks like an
allocation function. Let's take a look at it::
187 static struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags,
188 int override_rlimit)
189 {
190 struct sigqueue *q = NULL;
191 struct user_struct *user;
192
193 /*
194 * We won't get problems with the target's UID changing under us
195 * because changing it requires RCU be used, and if t != current, the
196 * caller must be holding the RCU readlock (by way of a spinlock) and
197 * we use RCU protection here
198 */
199 user = get_uid(__task_cred(t)->user);
200 atomic_inc(&user->sigpending);
201 if (override_rlimit ||
202 atomic_read(&user->sigpending) <=
203 t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur)
204 q = kmem_cache_alloc(sigqueue_cachep, flags);
205 if (unlikely(q == NULL)) {
206 atomic_dec(&user->sigpending);
207 free_uid(user);
208 } else {
209 INIT_LIST_HEAD(&q->list);
210 q->flags = 0;
211 q->user = user;
212 }
213
214 return q;
215 }
We see that this function initializes ``q->list``, ``q->flags``, and
``q->user``. It seems that now is the time to look at the definition of
``struct sigqueue``, e.g.::
14 struct sigqueue {
15 struct list_head list;
16 int flags;
17 siginfo_t info;
18 struct user_struct *user;
19 };
And, you might remember, it was a ``memcpy()`` on ``&first->info`` that
caused the warning, so this makes perfect sense. It also seems reasonable
to assume that it is the caller of ``__sigqueue_alloc()`` that has the
responsibility of filling out (initializing) this member.
But just which fields of the struct were uninitialized? Let's look at
kmemcheck's report again::
WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88003e4a2024)
80000000000000000000000000000000000000000088ffff0000000000000000
i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u
^
These first two lines are the memory dump of the memory object itself, and
the shadow bytemap, respectively. The memory object itself is in this case
``&first->info``. Just beware that the start of this dump is NOT the start
of the object itself! The position of the caret (^) corresponds with the
address of the read (ffff88003e4a2024).
The shadow bytemap dump legend is as follows:
- i: initialized
- u: uninitialized
- a: unallocated (memory has been allocated by the slab layer, but has not
yet been handed off to anybody)
- f: freed (memory has been allocated by the slab layer, but has been freed
by the previous owner)
In order to figure out where (relative to the start of the object) the
uninitialized memory was located, we have to look at the disassembly. For
that, we'll need the RIP address again::
RIP: 0010:[<ffffffff8104ede8>] [<ffffffff8104ede8>] __dequeue_signal+0xc8/0x190
$ objdump -d --no-show-raw-insn vmlinux | grep -C 8 ffffffff8104ede8:
ffffffff8104edc8: mov %r8,0x8(%r8)
ffffffff8104edcc: test %r10d,%r10d
ffffffff8104edcf: js ffffffff8104ee88 <__dequeue_signal+0x168>
ffffffff8104edd5: mov %rax,%rdx
ffffffff8104edd8: mov $0xc,%ecx
ffffffff8104eddd: mov %r13,%rdi
ffffffff8104ede0: mov $0x30,%eax
ffffffff8104ede5: mov %rdx,%rsi
ffffffff8104ede8: rep movsl %ds:(%rsi),%es:(%rdi)
ffffffff8104edea: test $0x2,%al
ffffffff8104edec: je ffffffff8104edf0 <__dequeue_signal+0xd0>
ffffffff8104edee: movsw %ds:(%rsi),%es:(%rdi)
ffffffff8104edf0: test $0x1,%al
ffffffff8104edf2: je ffffffff8104edf5 <__dequeue_signal+0xd5>
ffffffff8104edf4: movsb %ds:(%rsi),%es:(%rdi)
ffffffff8104edf5: mov %r8,%rdi
ffffffff8104edf8: callq ffffffff8104de60 <__sigqueue_free>
As expected, it's the "``rep movsl``" instruction from the ``memcpy()``
that causes the warning. We know about ``REP MOVSL`` that it uses the register
``RCX`` to count the number of remaining iterations. By taking a look at the
register dump again (from the kmemcheck report), we can figure out how many
bytes were left to copy::
RAX: 0000000000000030 RBX: ffff88003d4ea968 RCX: 0000000000000009
By looking at the disassembly, we also see that ``%ecx`` is being loaded
with the value ``$0xc`` just before (ffffffff8104edd8), so we are very
lucky. Keep in mind that this is the number of iterations, not bytes. And
since this is a "long" operation, we need to multiply by 4 to get the
number of bytes. So this means that the uninitialized value was encountered
at 4 * (0xc - 0x9) = 12 bytes from the start of the object.
We can now try to figure out which field of the "``struct siginfo``" that
was not initialized. This is the beginning of the struct::
40 typedef struct siginfo {
41 int si_signo;
42 int si_errno;
43 int si_code;
44
45 union {
..
92 } _sifields;
93 } siginfo_t;
On 64-bit, the int is 4 bytes long, so it must the union member that has
not been initialized. We can verify this using gdb::
$ gdb vmlinux
...
(gdb) p &((struct siginfo *) 0)->_sifields
$1 = (union {...} *) 0x10
Actually, it seems that the union member is located at offset 0x10 -- which
means that gcc has inserted 4 bytes of padding between the members ``si_code``
and ``_sifields``. We can now get a fuller picture of the memory dump::
_----------------------------=> si_code
/ _--------------------=> (padding)
| / _------------=> _sifields(._kill._pid)
| | / _----=> _sifields(._kill._uid)
| | | /
-------|-------|-------|-------|
80000000000000000000000000000000000000000088ffff0000000000000000
i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u
This allows us to realize another important fact: ``si_code`` contains the
value 0x80. Remember that x86 is little endian, so the first 4 bytes
"80000000" are really the number 0x00000080. With a bit of research, we
find that this is actually the constant ``SI_KERNEL`` defined in
``include/asm-generic/siginfo.h``::
144 #define SI_KERNEL 0x80 /* sent by the kernel from somewhere */
This macro is used in exactly one place in the x86 kernel: In ``send_signal()``
in ``kernel/signal.c``::
816 static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
817 int group)
818 {
...
828 pending = group ? &t->signal->shared_pending : &t->pending;
...
851 q = __sigqueue_alloc(t, GFP_ATOMIC, (sig < SIGRTMIN &&
852 (is_si_special(info) ||
853 info->si_code >= 0)));
854 if (q) {
855 list_add_tail(&q->list, &pending->list);
856 switch ((unsigned long) info) {
...
865 case (unsigned long) SEND_SIG_PRIV:
866 q->info.si_signo = sig;
867 q->info.si_errno = 0;
868 q->info.si_code = SI_KERNEL;
869 q->info.si_pid = 0;
870 q->info.si_uid = 0;
871 break;
...
890 }
Not only does this match with the ``.si_code`` member, it also matches the place
we found earlier when looking for where siginfo_t objects are enqueued on the
``shared_pending`` list.
So to sum up: It seems that it is the padding introduced by the compiler
between two struct fields that is uninitialized, and this gets reported when
we do a ``memcpy()`` on the struct. This means that we have identified a false
positive warning.
Normally, kmemcheck will not report uninitialized accesses in ``memcpy()`` calls
when both the source and destination addresses are tracked. (Instead, we copy
the shadow bytemap as well). In this case, the destination address clearly
was not tracked. We can dig a little deeper into the stack trace from above::
arch/x86/kernel/signal.c:805
arch/x86/kernel/signal.c:871
arch/x86/kernel/entry_64.S:694
And we clearly see that the destination siginfo object is located on the
stack::
782 static void do_signal(struct pt_regs *regs)
783 {
784 struct k_sigaction ka;
785 siginfo_t info;
...
804 signr = get_signal_to_deliver(&info, &ka, regs, NULL);
...
854 }
And this ``&info`` is what eventually gets passed to ``copy_siginfo()`` as the
destination argument.
Now, even though we didn't find an actual error here, the example is still a
good one, because it shows how one would go about to find out what the report
was all about.
Annotating false positives
~~~~~~~~~~~~~~~~~~~~~~~~~~
There are a few different ways to make annotations in the source code that
will keep kmemcheck from checking and reporting certain allocations. Here
they are:
- ``__GFP_NOTRACK_FALSE_POSITIVE``
This flag can be passed to ``kmalloc()`` or ``kmem_cache_alloc()``
(therefore also to other functions that end up calling one of
these) to indicate that the allocation should not be tracked
because it would lead to a false positive report. This is a "big
hammer" way of silencing kmemcheck; after all, even if the false
positive pertains to particular field in a struct, for example, we
will now lose the ability to find (real) errors in other parts of
the same struct.
Example::
/* No warnings will ever trigger on accessing any part of x */
x = kmalloc(sizeof *x, GFP_KERNEL | __GFP_NOTRACK_FALSE_POSITIVE);
- ``kmemcheck_bitfield_begin(name)``/``kmemcheck_bitfield_end(name)`` and
``kmemcheck_annotate_bitfield(ptr, name)``
The first two of these three macros can be used inside struct
definitions to signal, respectively, the beginning and end of a
bitfield. Additionally, this will assign the bitfield a name, which
is given as an argument to the macros.
Having used these markers, one can later use
kmemcheck_annotate_bitfield() at the point of allocation, to indicate
which parts of the allocation is part of a bitfield.
Example::
struct foo {
int x;
kmemcheck_bitfield_begin(flags);
int flag_a:1;
int flag_b:1;
kmemcheck_bitfield_end(flags);
int y;
};
struct foo *x = kmalloc(sizeof *x);
/* No warnings will trigger on accessing the bitfield of x */
kmemcheck_annotate_bitfield(x, flags);
Note that ``kmemcheck_annotate_bitfield()`` can be used even before the
return value of ``kmalloc()`` is checked -- in other words, passing NULL
as the first argument is legal (and will do nothing).
Reporting errors
----------------
As we have seen, kmemcheck will produce false positive reports. Therefore, it
is not very wise to blindly post kmemcheck warnings to mailing lists and
maintainers. Instead, I encourage maintainers and developers to find errors
in their own code. If you get a warning, you can try to work around it, try
to figure out if it's a real error or not, or simply ignore it. Most
developers know their own code and will quickly and efficiently determine the
root cause of a kmemcheck report. This is therefore also the most efficient
way to work with kmemcheck.
That said, we (the kmemcheck maintainers) will always be on the lookout for
false positives that we can annotate and silence. So whatever you find,
please drop us a note privately! Kernel configs and steps to reproduce (if
available) are of course a great help too.
Happy hacking!
Technical description
---------------------
kmemcheck works by marking memory pages non-present. This means that whenever
somebody attempts to access the page, a page fault is generated. The page
fault handler notices that the page was in fact only hidden, and so it calls
on the kmemcheck code to make further investigations.
When the investigations are completed, kmemcheck "shows" the page by marking
it present (as it would be under normal circumstances). This way, the
interrupted code can continue as usual.
But after the instruction has been executed, we should hide the page again, so
that we can catch the next access too! Now kmemcheck makes use of a debugging
feature of the processor, namely single-stepping. When the processor has
finished the one instruction that generated the memory access, a debug
exception is raised. From here, we simply hide the page again and continue
execution, this time with the single-stepping feature turned off.
kmemcheck requires some assistance from the memory allocator in order to work.
The memory allocator needs to
1. Tell kmemcheck about newly allocated pages and pages that are about to
be freed. This allows kmemcheck to set up and tear down the shadow memory
for the pages in question. The shadow memory stores the status of each
byte in the allocation proper, e.g. whether it is initialized or
uninitialized.
2. Tell kmemcheck which parts of memory should be marked uninitialized.
There are actually a few more states, such as "not yet allocated" and
"recently freed".
If a slab cache is set up using the SLAB_NOTRACK flag, it will never return
memory that can take page faults because of kmemcheck.
If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still
request memory with the __GFP_NOTRACK or __GFP_NOTRACK_FALSE_POSITIVE flags.
This does not prevent the page faults from occurring, however, but marks the
object in question as being initialized so that no warnings will ever be
produced for this object.
Currently, the SLAB and SLUB allocators are supported by kmemcheck.

View File

@ -21,6 +21,7 @@ Boards:
Root node property compatible must contain, depending on board:
- Cubietech CubieBoard6: "cubietech,cubieboard6"
- LeMaker Guitar Base Board rev. B: "lemaker,guitar-bb-rev-b", "lemaker,guitar"

View File

@ -41,6 +41,10 @@ Boards with the Amlogic Meson GXM S912 SoC shall have the following properties:
Required root node property:
compatible: "amlogic,s912", "amlogic,meson-gxm";
Boards with the Amlogic Meson AXG A113D SoC shall have the following properties:
Required root node property:
compatible: "amlogic,a113d", "amlogic,meson-axg";
Board compatible values (alphabetically, grouped by SoC):
- "geniatech,atv1200" (Meson6)
@ -71,8 +75,12 @@ Board compatible values (alphabetically, grouped by SoC):
- "amlogic,q200" (Meson gxm s912)
- "amlogic,q201" (Meson gxm s912)
- "khadas,vim2" (Meson gxm s912)
- "kingnovel,r-box-pro" (Meson gxm S912)
- "nexbox,a1" (Meson gxm s912)
- "tronsmart,vega-s96" (Meson gxm s912)
- "amlogic,s400" (Meson axg a113d)
Amlogic Meson Firmware registers Interface
------------------------------------------

View File

@ -0,0 +1,20 @@
Amlogic Meson8 and Meson8b "analog top" registers:
--------------------------------------------------
The analog top registers contain information about the so-called
"metal revision" (which encodes the "minor version") of the SoC.
Required properties:
- reg: the register range of the analog top registers
- compatible: depending on the SoC this should be one of:
- "amlogic,meson8-analog-top"
- "amlogic,meson8b-analog-top"
along with "syscon"
Example:
analog_top: analog-top@81a8 {
compatible = "amlogic,meson8-analog-top", "syscon";
reg = <0x81a8 0x14>;
};

View File

@ -0,0 +1,17 @@
Amlogic Meson6/Meson8/Meson8b assist registers:
-----------------------------------------------
The assist registers contain basic information about the SoC,
for example the encoded SoC part number.
Required properties:
- reg: the register range of the assist registers
- compatible: should be "amlogic,meson-mx-assist" along with "syscon"
Example:
assist: assist@7c00 {
compatible = "amlogic,meson-mx-assist", "syscon";
reg = <0x7c00 0x200>;
};

View File

@ -0,0 +1,17 @@
Amlogic Meson6/Meson8/Meson8b bootrom:
--------------------------------------
The bootrom register area can be used to access SoC specific
information, such as the "misc version".
Required properties:
- reg: the register range of the bootrom registers
- compatible: should be "amlogic,meson-mx-bootrom" along with "syscon"
Example:
bootrom: bootrom@d9040000 {
compatible = "amlogic,meson-mx-bootrom", "syscon";
reg = <0xd9040000 0x10000>;
};

View File

@ -0,0 +1,18 @@
Amlogic Meson8 and Meson8b power-management-unit:
-------------------------------------------------
The pmu is used to turn off and on different power domains of the SoCs
This includes the power to the CPU cores.
Required node properties:
- compatible value : depending on the SoC this should be one of:
"amlogic,meson8-pmu"
"amlogic,meson8b-pmu"
- reg : physical base address and the size of the registers window
Example:
pmu@c81000e4 {
compatible = "amlogic,meson8b-pmu", "syscon";
reg = <0xc81000e0 0x18>;
};

View File

@ -0,0 +1,32 @@
Amlogic Meson8 and Meson8b SRAM for smp bringup:
------------------------------------------------
Amlogic's SMP-capable SoCs use part of the sram for the bringup of the cores.
Once the core gets powered up it executes the code that is residing at a
specific location.
Therefore a reserved section sub-node has to be added to the mmio-sram
declaration.
Required sub-node properties:
- compatible : depending on the SoC this should be one of:
"amlogic,meson8-smp-sram"
"amlogic,meson8b-smp-sram"
The rest of the properties should follow the generic mmio-sram discription
found in ../../misc/sram.txt
Example:
sram: sram@d9000000 {
compatible = "mmio-sram";
reg = <0xd9000000 0x20000>;
#address-cells = <1>;
#size-cells = <1>;
ranges = <0 0xd9000000 0x20000>;
smp-sram@1ff80 {
compatible = "amlogic,meson8b-smp-sram";
reg = <0x1ff80 0x8>;
};
};

View File

@ -164,6 +164,8 @@ Control registers for this memory controller's DDR PHY.
Required properties:
- compatible : should contain one of these
"brcm,brcmstb-ddr-phy-v71.1"
"brcm,brcmstb-ddr-phy-v72.0"
"brcm,brcmstb-ddr-phy-v225.1"
"brcm,brcmstb-ddr-phy-v240.1"
"brcm,brcmstb-ddr-phy-v240.2"
@ -184,7 +186,9 @@ Sequencer DRAM parameters and control registers. Used for Self-Refresh
Power-Down (SRPD), among other things.
Required properties:
- compatible : should contain "brcm,brcmstb-memc-ddr"
- compatible : should contain one of these
"brcm,brcmstb-memc-ddr-rev-b.2.2"
"brcm,brcmstb-memc-ddr"
- reg : the MEMC DDR register range
Example:

View File

@ -0,0 +1,14 @@
Broadcom Hurricane 2 device tree bindings
---------------------------------------
Broadcom Hurricane 2 family of SoCs are used for switching control. These SoCs
are based on Broadcom's iProc SoC architecture and feature a single core Cortex
A9 ARM CPUs, DDR2/DDR3 memory, PCIe GEN-2, USB 2.0 and USB 3.0, serial and NAND
flash and a PCIe attached integrated switching engine.
Boards with Hurricane SoCs shall have the following properties:
Required root node property:
BCM53342
compatible = "brcm,bcm53342", "brcm,hr2";

View File

@ -197,6 +197,8 @@ described below.
"actions,s500-smp"
"allwinner,sun6i-a31"
"allwinner,sun8i-a23"
"amlogic,meson8-smp"
"amlogic,meson8b-smp"
"arm,realview-smp"
"brcm,bcm11351-cpu-method"
"brcm,bcm23550"

View File

@ -7,7 +7,9 @@ Required Properties:
- compatible: Should be one of:
- "mediatek,mt2701-apmixedsys"
- "mediatek,mt2712-apmixedsys", "syscon"
- "mediatek,mt6797-apmixedsys"
- "mediatek,mt7622-apmixedsys"
- "mediatek,mt8135-apmixedsys"
- "mediatek,mt8173-apmixedsys"
- #clock-cells: Must be 1

View File

@ -0,0 +1,22 @@
MediaTek AUDSYS controller
============================
The MediaTek AUDSYS controller provides various clocks to the system.
Required Properties:
- compatible: Should be one of:
- "mediatek,mt7622-audsys", "syscon"
- #clock-cells: Must be 1
The AUDSYS controller uses the common clk binding from
Documentation/devicetree/bindings/clock/clock-bindings.txt
The available clocks are defined in dt-bindings/clock/mt*-clk.h.
Example:
audsys: audsys@11220000 {
compatible = "mediatek,mt7622-audsys", "syscon";
reg = <0 0x11220000 0 0x1000>;
#clock-cells = <1>;
};

View File

@ -7,6 +7,7 @@ Required Properties:
- compatible: Should be:
- "mediatek,mt2701-bdpsys", "syscon"
- "mediatek,mt2712-bdpsys", "syscon"
- #clock-cells: Must be 1
The bdpsys controller uses the common clk binding from

View File

@ -7,6 +7,7 @@ Required Properties:
- compatible: Should be:
- "mediatek,mt2701-ethsys", "syscon"
- "mediatek,mt7622-ethsys", "syscon"
- #clock-cells: Must be 1
The ethsys controller uses the common clk binding from

View File

@ -8,6 +8,7 @@ Required Properties:
- compatible: Should be:
- "mediatek,mt2701-hifsys", "syscon"
- "mediatek,mt7622-hifsys", "syscon"
- #clock-cells: Must be 1
The hifsys controller uses the common clk binding from

View File

@ -7,6 +7,7 @@ Required Properties:
- compatible: Should be one of:
- "mediatek,mt2701-imgsys", "syscon"
- "mediatek,mt2712-imgsys", "syscon"
- "mediatek,mt6797-imgsys", "syscon"
- "mediatek,mt8173-imgsys", "syscon"
- #clock-cells: Must be 1

View File

@ -8,7 +8,9 @@ Required Properties:
- compatible: Should be one of:
- "mediatek,mt2701-infracfg", "syscon"
- "mediatek,mt2712-infracfg", "syscon"
- "mediatek,mt6797-infracfg", "syscon"
- "mediatek,mt7622-infracfg", "syscon"
- "mediatek,mt8135-infracfg", "syscon"
- "mediatek,mt8173-infracfg", "syscon"
- #clock-cells: Must be 1

View File

@ -0,0 +1,22 @@
Mediatek jpgdecsys controller
============================
The Mediatek jpgdecsys controller provides various clocks to the system.
Required Properties:
- compatible: Should be:
- "mediatek,mt2712-jpgdecsys", "syscon"
- #clock-cells: Must be 1
The jpgdecsys controller uses the common clk binding from
Documentation/devicetree/bindings/clock/clock-bindings.txt
The available clocks are defined in dt-bindings/clock/mt*-clk.h.
Example:
jpgdecsys: syscon@19000000 {
compatible = "mediatek,mt2712-jpgdecsys", "syscon";
reg = <0 0x19000000 0 0x1000>;
#clock-cells = <1>;
};

View File

@ -0,0 +1,22 @@
Mediatek mcucfg controller
============================
The Mediatek mcucfg controller provides various clocks to the system.
Required Properties:
- compatible: Should be one of:
- "mediatek,mt2712-mcucfg", "syscon"
- #clock-cells: Must be 1
The mcucfg controller uses the common clk binding from
Documentation/devicetree/bindings/clock/clock-bindings.txt
The available clocks are defined in dt-bindings/clock/mt*-clk.h.
Example:
mcucfg: syscon@10220000 {
compatible = "mediatek,mt2712-mcucfg", "syscon";
reg = <0 0x10220000 0 0x1000>;
#clock-cells = <1>;
};

View File

@ -0,0 +1,22 @@
Mediatek mfgcfg controller
============================
The Mediatek mfgcfg controller provides various clocks to the system.
Required Properties:
- compatible: Should be one of:
- "mediatek,mt2712-mfgcfg", "syscon"
- #clock-cells: Must be 1
The mfgcfg controller uses the common clk binding from
Documentation/devicetree/bindings/clock/clock-bindings.txt
The available clocks are defined in dt-bindings/clock/mt*-clk.h.
Example:
mfgcfg: syscon@13000000 {
compatible = "mediatek,mt2712-mfgcfg", "syscon";
reg = <0 0x13000000 0 0x1000>;
#clock-cells = <1>;
};

View File

@ -7,6 +7,7 @@ Required Properties:
- compatible: Should be one of:
- "mediatek,mt2701-mmsys", "syscon"
- "mediatek,mt2712-mmsys", "syscon"
- "mediatek,mt6797-mmsys", "syscon"
- "mediatek,mt8173-mmsys", "syscon"
- #clock-cells: Must be 1

View File

@ -0,0 +1,22 @@
MediaTek PCIESYS controller
============================
The MediaTek PCIESYS controller provides various clocks to the system.
Required Properties:
- compatible: Should be:
- "mediatek,mt7622-pciesys", "syscon"
- #clock-cells: Must be 1
The PCIESYS controller uses the common clk binding from
Documentation/devicetree/bindings/clock/clock-bindings.txt
The available clocks are defined in dt-bindings/clock/mt*-clk.h.
Example:
pciesys: pciesys@1a100800 {
compatible = "mediatek,mt7622-pciesys", "syscon";
reg = <0 0x1a100800 0 0x1000>;
#clock-cells = <1>;
};

View File

@ -8,6 +8,8 @@ Required Properties:
- compatible: Should be one of:
- "mediatek,mt2701-pericfg", "syscon"
- "mediatek,mt2712-pericfg", "syscon"
- "mediatek,mt7622-pericfg", "syscon"
- "mediatek,mt8135-pericfg", "syscon"
- "mediatek,mt8173-pericfg", "syscon"
- #clock-cells: Must be 1

View File

@ -0,0 +1,22 @@
MediaTek SGMIISYS controller
============================
The MediaTek SGMIISYS controller provides various clocks to the system.
Required Properties:
- compatible: Should be:
- "mediatek,mt7622-sgmiisys", "syscon"
- #clock-cells: Must be 1
The SGMIISYS controller uses the common clk binding from
Documentation/devicetree/bindings/clock/clock-bindings.txt
The available clocks are defined in dt-bindings/clock/mt*-clk.h.
Example:
sgmiisys: sgmiisys@1b128000 {
compatible = "mediatek,mt7622-sgmiisys", "syscon";
reg = <0 0x1b128000 0 0x1000>;
#clock-cells = <1>;
};

View File

@ -0,0 +1,22 @@
MediaTek SSUSBSYS controller
============================
The MediaTek SSUSBSYS controller provides various clocks to the system.
Required Properties:
- compatible: Should be:
- "mediatek,mt7622-ssusbsys", "syscon"
- #clock-cells: Must be 1
The SSUSBSYS controller uses the common clk binding from
Documentation/devicetree/bindings/clock/clock-bindings.txt
The available clocks are defined in dt-bindings/clock/mt*-clk.h.
Example:
ssusbsys: ssusbsys@1a000000 {
compatible = "mediatek,mt7622-ssusbsys", "syscon";
reg = <0 0x1a000000 0 0x1000>;
#clock-cells = <1>;
};

View File

@ -7,7 +7,9 @@ Required Properties:
- compatible: Should be one of:
- "mediatek,mt2701-topckgen"
- "mediatek,mt2712-topckgen", "syscon"
- "mediatek,mt6797-topckgen"
- "mediatek,mt7622-topckgen"
- "mediatek,mt8135-topckgen"
- "mediatek,mt8173-topckgen"
- #clock-cells: Must be 1

View File

@ -7,6 +7,7 @@ Required Properties:
- compatible: Should be one of:
- "mediatek,mt2701-vdecsys", "syscon"
- "mediatek,mt2712-vdecsys", "syscon"
- "mediatek,mt6797-vdecsys", "syscon"
- "mediatek,mt8173-vdecsys", "syscon"
- #clock-cells: Must be 1

View File

@ -6,6 +6,7 @@ The Mediatek vencsys controller provides various clocks to the system.
Required Properties:
- compatible: Should be one of:
- "mediatek,mt2712-vencsys", "syscon"
- "mediatek,mt6797-vencsys", "syscon"
- "mediatek,mt8173-vencsys", "syscon"
- #clock-cells: Must be 1

View File

@ -21,6 +21,8 @@ Required properties:
"ti,omap3-scm"
"ti,omap4-scm-core"
"ti,omap4-scm-padconf-core"
"ti,omap4-scm-wkup"
"ti,omap4-scm-padconf-wkup"
"ti,omap5-scm-core"
"ti,omap5-scm-padconf-core"
"ti,dra7-scm-core"

View File

@ -12,6 +12,8 @@ Required root node properties:
Root node property compatible must contain, depending on board:
- MeLE V9: "mele,v9"
- ProBox2 AVA: "probox2,ava"
- Zidoo X9S: "zidoo,x9s"

View File

@ -1,5 +1,9 @@
Rockchip platforms device tree bindings
---------------------------------------
- Amarula Vyasa RK3288 board
Required root node properties:
- compatible = "amarula,vyasa-rk3288", "rockchip,rk3288";
- Asus Tinker board
Required root node properties:
- compatible = "asus,rk3288-tinker", "rockchip,rk3288";

View File

@ -4,7 +4,6 @@ Properties:
- compatible : should contain two values. First value must be one from following list:
- "samsung,exynos3250-pmu" - for Exynos3250 SoC,
- "samsung,exynos4210-pmu" - for Exynos4210 SoC,
- "samsung,exynos4212-pmu" - for Exynos4212 SoC,
- "samsung,exynos4412-pmu" - for Exynos4412 SoC,
- "samsung,exynos5250-pmu" - for Exynos5250 SoC,
- "samsung,exynos5260-pmu" - for Exynos5260 SoC.
@ -62,7 +61,7 @@ pmu_system_controller: system-controller@10040000 {
Example of clock consumer :
usb3503: usb3503@08 {
usb3503: usb3503@8 {
/* ... */
clock-names = "refclk";
clocks = <&pmu_system_controller 0>;

View File

@ -57,6 +57,7 @@ Required root node properties:
- "hardkernel,odroid-xu3-lite" - for Exynos5422-based Hardkernel
Odroid XU3 Lite board.
- "hardkernel,odroid-xu4" - for Exynos5422-based Hardkernel Odroid XU4.
- "hardkernel,odroid-hc1" - for Exynos5422-based Hardkernel Odroid HC1.
* Insignal
- "insignal,arndale" - for Exynos5250-based Insignal Arndale board.
@ -71,7 +72,7 @@ Optional nodes:
- compatible: only "samsung,secure-firmware" is currently supported
- reg: address of non-secure SYSRAM used for communication with firmware
firmware@0203F000 {
firmware@203F000 {
compatible = "samsung,secure-firmware";
reg = <0x0203F000 0x1000>;
};

View File

@ -39,6 +39,8 @@ SoCs:
compatible = "renesas,r8a7795"
- R-Car M3-W (R8A77960)
compatible = "renesas,r8a7796"
- R-Car V3M (R8A77970)
compatible = "renesas,r8a77970"
- R-Car D3 (R8A77995)
compatible = "renesas,r8a77995"
@ -57,6 +59,8 @@ Boards:
compatible = "renesas,bockw", "renesas,r8a7778"
- Draak (RTP0RC77995SEB0010S)
compatible = "renesas,draak", "renesas,r8a77995"
- Eagle (RTP0RC77970SEB0010S)
compatible = "renesas,eagle", "renesas,r8a77970"
- Genmai (RTK772100BC00000BR)
compatible = "renesas,genmai", "renesas,r7s72100"
- GR-Peach (X28A-M01-E/F)
@ -65,7 +69,7 @@ Boards:
compatible = "renesas,gose", "renesas,r8a7793"
- H3ULCB (R-Car Starter Kit Premier, RTP0RC7795SKBX0010SA00 (H3 ES1.1))
H3ULCB (R-Car Starter Kit Premier, RTP0RC77951SKBX010SA00 (H3 ES2.0))
compatible = "renesas,h3ulcb", "renesas,r8a7795";
compatible = "renesas,h3ulcb", "renesas,r8a7795"
- Henninger
compatible = "renesas,henninger", "renesas,r8a7791"
- iWave Systems RZ/G1E SODIMM SOM Development Platform (iW-RainboW-G22D)
@ -76,6 +80,8 @@ Boards:
compatible = "iwave,g20d", "iwave,g20m", "renesas,r8a7743"
- iWave Systems RZ/G1M Qseven System On Module (iW-RainboW-G20M-Qseven)
compatible = "iwave,g20m", "renesas,r8a7743"
- Kingfisher (SBEV-RCAR-KF-M03)
compatible = "shimafuji,kingfisher"
- Koelsch (RTP0RC7791SEB00010S)
compatible = "renesas,koelsch", "renesas,r8a7791"
- Kyoto Microcomputer Co. KZM-A9-Dual
@ -85,7 +91,7 @@ Boards:
- Lager (RTP0RC7790SEB00010S)
compatible = "renesas,lager", "renesas,r8a7790"
- M3ULCB (R-Car Starter Kit Pro, RTP0RC7796SKBX0010SA09 (M3 ES1.0))
compatible = "renesas,m3ulcb", "renesas,r8a7796";
compatible = "renesas,m3ulcb", "renesas,r8a7796"
- Marzen (R0P7779A00010S)
compatible = "renesas,marzen", "renesas,r8a7779"
- Porter (M2-LCDP)
@ -93,11 +99,11 @@ Boards:
- RSKRZA1 (YR0K77210C000BE)
compatible = "renesas,rskrza1", "renesas,r7s72100"
- Salvator-X (RTP0RC7795SIPB0010S)
compatible = "renesas,salvator-x", "renesas,r8a7795";
compatible = "renesas,salvator-x", "renesas,r8a7795"
- Salvator-X (RTP0RC7796SIPB0011S)
compatible = "renesas,salvator-x", "renesas,r8a7796";
compatible = "renesas,salvator-x", "renesas,r8a7796"
- Salvator-XS (Salvator-X 2nd version, RTP0RC7795SIPB0012S)
compatible = "renesas,salvator-xs", "renesas,r8a7795";
compatible = "renesas,salvator-xs", "renesas,r8a7795"
- SILK (RTP0RC7794LCB00011S)
compatible = "renesas,silk", "renesas,r8a7794"
- SK-RZG1E (YR8A77450S000BE)

View File

@ -33,7 +33,7 @@ Required properties:
property with the highest frequency
Example:
v2m_sysctl: sysctl@020000 {
v2m_sysctl: sysctl@20000 {
compatible = "arm,sp810", "arm,primecell";
reg = <0x020000 0x1000>;
clocks = <&v2m_refclk32khz>, <&v2m_refclk1mhz>, <&smbclk>;

View File

@ -0,0 +1,20 @@
* ARMv8.2 Statistical Profiling Extension (SPE) Performance Monitor Units (PMU)
ARMv8.2 introduces the optional Statistical Profiling Extension for collecting
performance sample data using an in-memory trace buffer.
** SPE Required properties:
- compatible : should be one of:
"arm,statistical-profiling-extension-v1"
- interrupts : Exactly 1 PPI must be listed. For heterogeneous systems where
SPE is only supported on a subset of the CPUs, please consult
the arm,gic-v3 binding for details on describing a PPI partition.
** Example:
spe-pmu {
compatible = "arm,statistical-profiling-extension-v1";
interrupts = <GIC_PPI 05 IRQ_TYPE_LEVEL_HIGH &part1>;
};

View File

@ -14,6 +14,8 @@ using one of the following compatible strings:
allwinner,sun8i-a83t
allwinner,sun8i-h2-plus
allwinner,sun8i-h3
allwinner-sun8i-r40
allwinner,sun8i-v3s
allwinner,sun9i-a80
allwinner,sun50i-a64
nextthing,gr8

View File

@ -37,7 +37,7 @@ Example:
compatible = "arm,vexpress-sysreg";
reg = <0x10000000 0x1000>;
v2m_led_gpios: sys_led@08 {
v2m_led_gpios: sys_led@8 {
compatible = "arm,vexpress-sysreg,sys_led";
gpio-controller;
#gpio-cells = <2>;

View File

@ -5,6 +5,36 @@ Required properties:
- compatible: Compatibility string. Must be 'ceva,ahci-1v84'.
- clocks: Input clock specifier. Refer to common clock bindings.
- interrupts: Interrupt specifier. Refer to interrupt binding.
- ceva,p0-cominit-params: OOB timing value for COMINIT parameter for port 0.
- ceva,p1-cominit-params: OOB timing value for COMINIT parameter for port 1.
The fields for the above parameter must be as shown below:
ceva,pN-cominit-params = /bits/ 8 <CIBGMN CIBGMX CIBGN CINMP>;
CINMP : COMINIT Negate Minimum Period.
CIBGN : COMINIT Burst Gap Nominal.
CIBGMX: COMINIT Burst Gap Maximum.
CIBGMN: COMINIT Burst Gap Minimum.
- ceva,p0-comwake-params: OOB timing value for COMWAKE parameter for port 0.
- ceva,p1-comwake-params: OOB timing value for COMWAKE parameter for port 1.
The fields for the above parameter must be as shown below:
ceva,pN-comwake-params = /bits/ 8 <CWBGMN CWBGMX CWBGN CWNMP>;
CWBGMN: COMWAKE Burst Gap Minimum.
CWBGMX: COMWAKE Burst Gap Maximum.
CWBGN: COMWAKE Burst Gap Nominal.
CWNMP: COMWAKE Negate Minimum Period.
- ceva,p0-burst-params: Burst timing value for COM parameter for port 0.
- ceva,p1-burst-params: Burst timing value for COM parameter for port 1.
The fields for the above parameter must be as shown below:
ceva,pN-burst-params = /bits/ 8 <BMX BNM SFD PTST>;
BMX: COM Burst Maximum.
BNM: COM Burst Nominal.
SFD: Signal Failure Detection value.
PTST: Partial to Slumber timer value.
- ceva,p0-retry-params: Retry interval timing value for port 0.
- ceva,p1-retry-params: Retry interval timing value for port 1.
The fields for the above parameter must be as shown below:
ceva,pN-retry-params = /bits/ 16 <RIT RCT>;
RIT: Retry Interval Timer.
RCT: Rate Change Timer.
Optional properties:
- ceva,broken-gen2: limit to gen1 speed instead of gen2.
@ -16,5 +46,14 @@ Examples:
interrupt-parent = <&gic>;
interrupts = <0 133 4>;
clocks = <&clkc SATA_CLK_ID>;
ceva,p0-cominit-params = /bits/ 8 <0x0F 0x25 0x18 0x29>;
ceva,p0-comwake-params = /bits/ 8 <0x04 0x0B 0x08 0x0F>;
ceva,p0-burst-params = /bits/ 8 <0x0A 0x08 0x4A 0x06>;
ceva,p0-retry-params = /bits/ 16 <0x0216 0x7F06>;
ceva,p1-cominit-params = /bits/ 8 <0x0F 0x25 0x18 0x29>;
ceva,p1-comwake-params = /bits/ 8 <0x04 0x0B 0x08 0x0F>;
ceva,p1-burst-params = /bits/ 8 <0x0A 0x08 0x4A 0x06>;
ceva,p1-retry-params = /bits/ 16 <0x0216 0x7F06>;
ceva,broken-gen2;
};

View File

@ -56,7 +56,7 @@ Examples:
interrupts = <115>;
};
ahci: sata@01c18000 {
ahci: sata@1c18000 {
compatible = "allwinner,sun4i-a10-ahci";
reg = <0x01c18000 0x1000>;
interrupts = <56>;

View File

@ -25,7 +25,7 @@ Optional properties:
Examples:
sata@02200000 {
sata@2200000 {
compatible = "fsl,imx6q-ahci";
reg = <0x02200000 0x4000>;
interrupts = <0 39 IRQ_TYPE_LEVEL_HIGH>;

View File

@ -61,7 +61,7 @@ Timing property for child nodes. It is mandatory, not optional.
Example for an imx6q-sabreauto board, the NOR flash connected to the WEIM:
weim: weim@021b8000 {
weim: weim@21b8000 {
compatible = "fsl,imx6q-weim";
reg = <0x021b8000 0x4000>;
clocks = <&clks 196>;

View File

@ -28,7 +28,7 @@ which can normally be found in the datasheet.
Example:
rsb@01f03400 {
rsb@1f03400 {
compatible = "allwinner,sun8i-a23-rsb";
reg = <0x01f03400 0x400>;
interrupts = <0 39 4>;

View File

@ -0,0 +1,93 @@
Texas Instruments sysc interconnect target module wrapper binding
Texas Instruments SoCs can have a generic interconnect target module
hardware for devices connected to various interconnects such as L3
interconnect (Arteris NoC) and L4 interconnect (Sonics s3220). The sysc
is mostly used for interaction between module and PRCM. It participates
in the OCP Disconnect Protocol but other than that is mostly independent
of the interconnect.
Each interconnect target module can have one or more devices connected to
it. There is a set of control registers for managing interconnect target
module clocks, idle modes and interconnect level resets for the module.
These control registers are sprinkled into the unused register address
space of the first child device IP block managed by the interconnect
target module and typically are named REVISION, SYSCONFIG and SYSSTATUS.
Required standard properties:
- compatible shall be one of the following generic types:
"ti,sysc-omap2"
"ti,sysc-omap4"
"ti,sysc-omap4-simple"
or one of the following derivative types for hardware
needing special workarounds:
"ti,sysc-omap3430-sr"
"ti,sysc-omap3630-sr"
"ti,sysc-omap4-sr"
"ti,sysc-omap3-sham"
"ti,sysc-omap-aes"
"ti,sysc-mcasp"
"ti,sysc-usb-host-fs"
- reg shall have register areas implemented for the interconnect
target module in question such as revision, sysc and syss
- reg-names shall contain the register names implemented for the
interconnect target module in question such as
"rev, "sysc", and "syss"
- ranges shall contain the interconnect target module IO range
available for one or more child device IP blocks managed
by the interconnect target module, the ranges may include
multiple ranges such as device L4 range for control and
parent L3 range for DMA access
Optional properties:
- clocks clock specifier for each name in the clock-names as
specified in the binding documentation for ti-clkctrl,
typically available for all interconnect targets on TI SoCs
based on omap4 except if it's read-only register in hwauto
mode as for example omap4 L4_CFG_CLKCTRL
- clock-names should contain at least "fck", and optionally also "ick"
depending on the SoC and the interconnect target module
- ti,hwmods optional TI interconnect module name to use legacy
hwmod platform data
Example: Single instance of MUSB controller on omap4 using interconnect ranges
using offsets from l4_cfg second segment (0x4a000000 + 0x80000 = 0x4a0ab000):
target-module@2b000 { /* 0x4a0ab000, ap 84 12.0 */
compatible = "ti,sysc-omap2";
ti,hwmods = "usb_otg_hs";
reg = <0x2b400 0x4>,
<0x2b404 0x4>,
<0x2b408 0x4>;
reg-names = "rev", "sysc", "syss";
clocks = <&l3_init_clkctrl OMAP4_USB_OTG_HS_CLKCTRL 0>;
clock-names = "fck";
#address-cells = <1>;
#size-cells = <1>;
ranges = <0 0x2b000 0x1000>;
usb_otg_hs: otg@0 {
compatible = "ti,omap4-musb";
reg = <0x0 0x7ff>;
interrupts = <GIC_SPI 92 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 93 IRQ_TYPE_LEVEL_HIGH>;
usb-phy = <&usb2_phy>;
...
};
};
Note that other SoCs, such as am335x can have multipe child devices. On am335x
there are two MUSB instances, two USB PHY instances, and a single CPPI41 DMA
instance as children of a single interconnet target module.

View File

@ -0,0 +1,50 @@
Technologic Systems NBUS
The NBUS is a bus used to interface with peripherals in the Technologic
Systems FPGA on the TS-4600 SoM.
Required properties :
- compatible : "technologic,ts-nbus"
- #address-cells : must be 1
- #size-cells : must be 0
- pwms : The PWM bound to the FPGA
- ts,data-gpios : The 8 GPIO pins connected to the data lines on the FPGA
- ts,csn-gpios : The GPIO pin connected to the csn line on the FPGA
- ts,txrx-gpios : The GPIO pin connected to the txrx line on the FPGA
- ts,strobe-gpios : The GPIO pin connected to the stobe line on the FPGA
- ts,ale-gpios : The GPIO pin connected to the ale line on the FPGA
- ts,rdy-gpios : The GPIO pin connected to the rdy line on the FPGA
Child nodes:
The NBUS node can contain zero or more child nodes representing peripherals
on the bus.
Example:
nbus {
compatible = "technologic,ts-nbus";
pinctrl-0 = <&nbus_pins>;
#address-cells = <1>;
#size-cells = <0>;
pwms = <&pwm 2 83>;
ts,data-gpios = <&gpio0 0 GPIO_ACTIVE_HIGH
&gpio0 1 GPIO_ACTIVE_HIGH
&gpio0 2 GPIO_ACTIVE_HIGH
&gpio0 3 GPIO_ACTIVE_HIGH
&gpio0 4 GPIO_ACTIVE_HIGH
&gpio0 5 GPIO_ACTIVE_HIGH
&gpio0 6 GPIO_ACTIVE_HIGH
&gpio0 7 GPIO_ACTIVE_HIGH>;
ts,csn-gpios = <&gpio0 16 GPIO_ACTIVE_HIGH>;
ts,txrx-gpios = <&gpio0 24 GPIO_ACTIVE_HIGH>;
ts,strobe-gpios = <&gpio0 25 GPIO_ACTIVE_HIGH>;
ts,ale-gpios = <&gpio0 26 GPIO_ACTIVE_HIGH>;
ts,rdy-gpios = <&gpio0 21 GPIO_ACTIVE_HIGH>;
watchdog@2a {
compatible = "...";
/* ... */
};
};

View File

@ -59,7 +59,7 @@ syscon: syscon@10000000 {
compatible = "syscon";
reg = <0x10000000 0x1000>;
oscclk0: osc0@0c {
oscclk0: osc0@c {
compatible = "arm,syscon-icst307";
#clock-cells = <0>;
lock-offset = <0x20>;

View File

@ -137,6 +137,20 @@ These clock IDs are defined in:
ch1_audio audiopll 2 BCM_CYGNUS_AUDIOPLL_CH1
ch2_audio audiopll 3 BCM_CYGNUS_AUDIOPLL_CH2
Hurricane 2
------
PLL and leaf clock compatible strings for Hurricane 2 are:
"brcm,hr2-armpll"
The following table defines the set of PLL/clock for Hurricane 2:
Clock Source Index ID
--- ----- ----- ---------
crystal N/A N/A N/A
armpll crystal N/A N/A
Northstar and Northstar Plus
------
PLL and leaf clock compatible strings for Northstar and Northstar Plus are:

View File

@ -33,6 +33,12 @@ Required Properties:
- clock-names: Aliases for the above clocks. They should be "pll_ref",
"pll_in", "cdclk", "sclk_audio", and "sclk_pcm_in" respectively.
Optional Properties:
- power-domains: a phandle to respective power domain node as described by
generic PM domain bindings (see power/power_domain.txt for more
information).
The following is the list of clocks generated by the controller. Each clock is
assigned an identifier and client nodes use this identifier to specify the
clock which they consume. Some of the clocks are available only on a particular
@ -80,7 +86,7 @@ Example 3: I2S controller node that consumes the clock generated by the clock
controller. Refer to the standard clock bindings for information
about 'clocks' and 'clock-names' property.
i2s0: i2s@03830000 {
i2s0: i2s@3830000 {
compatible = "samsung,i2s-v5";
reg = <0x03830000 0x100>;
dmas = <&pdma0 10

View File

@ -43,7 +43,7 @@ Example: I2S controller node that consumes the clock generated by the clock
controller. Refer to the standard clock bindings for information
about 'clocks' and 'clock-names' property.
i2s0: i2s@03830000 {
i2s0: i2s@3830000 {
/* ... */
clock-names = "iis", "i2s_opclk0",
"i2s_opclk1";

View File

@ -21,7 +21,7 @@ Required properties:
a size of 8.
- #clock-cells : from common clock binding; shall be set to 1
divider_clk: core-clock@0064 {
divider_clk: core-clock@64 {
compatible = "marvell,dove-divider-clock";
reg = <0x0064 0x8>;
#clock-cells = <1>;

View File

@ -41,3 +41,46 @@ Example 2: UART controller node that consumes the clock generated by the clock
clocks = <&clock CLK_UART2>, <&clock CLK_SCLK_UART2>;
clock-names = "uart", "clk_uart_baud0";
};
Exynos4412 SoC contains some additional clocks for FIMC-ISP (Camera ISP)
subsystem. Registers for those clocks are located in the ISP power domain.
Because those registers are also located in a different memory region than
the main clock controller, a separate clock controller has to be defined for
handling them.
Required Properties:
- compatible: should be "samsung,exynos4412-isp-clock".
- reg: physical base address of the ISP clock controller and length of memory
mapped region.
- #clock-cells: should be 1.
- clocks: list of the clock controller input clock identifiers,
from common clock bindings, should point to CLK_ACLK200 and
CLK_ACLK400_MCUISP clocks from the main clock controller.
- clock-names: list of the clock controller input clock names,
as described in clock-bindings.txt, should be "aclk200" and
"aclk400_mcuisp".
- power-domains: a phandle to ISP power domain node as described by
generic PM domain bindings.
Example 3: The clock controllers bindings for Exynos4412 SoCs.
clock: clock-controller@10030000 {
compatible = "samsung,exynos4412-clock";
reg = <0x10030000 0x18000>;
#clock-cells = <1>;
};
isp_clock: clock-controller@10048000 {
compatible = "samsung,exynos4412-isp-clock";
reg = <0x10048000 0x1000>;
#clock-cells = <1>;
power-domains = <&pd_isp>;
clocks = <&clock CLK_ACLK200>, <&clock CLK_ACLK400_MCUISP>;
clock-names = "aclk200", "aclk400_mcuisp";
};

View File

@ -168,6 +168,11 @@ Required Properties:
- aclk_cam1_400
- aclk_cam1_552
Optional properties:
- power-domains: a phandle to respective power domain node as described by
generic PM domain bindings (see power/power_domain.txt for more
information).
Each clock is assigned an identifier and client nodes can use this identifier
to specify the clock which they consume.
@ -270,6 +275,7 @@ Example 2: Examples of clock controller nodes are listed below.
clocks = <&xxti>,
<&cmu_top CLK_ACLK_G2D_266>,
<&cmu_top CLK_ACLK_G2D_400>;
power-domains = <&pd_g2d>;
};
cmu_disp: clock-controller@13b90000 {
@ -295,6 +301,7 @@ Example 2: Examples of clock controller nodes are listed below.
<&cmu_mif CLK_SCLK_DECON_ECLK_DISP>,
<&cmu_mif CLK_SCLK_DECON_TV_VCLK_DISP>,
<&cmu_mif CLK_ACLK_DISP_333>;
power-domains = <&pd_disp>;
};
cmu_aud: clock-controller@114c0000 {
@ -304,6 +311,7 @@ Example 2: Examples of clock controller nodes are listed below.
clock-names = "oscclk", "fout_aud_pll";
clocks = <&xxti>, <&cmu_top CLK_FOUT_AUD_PLL>;
power-domains = <&pd_aud>;
};
cmu_bus0: clock-controller@13600000 {
@ -340,6 +348,7 @@ Example 2: Examples of clock controller nodes are listed below.
clock-names = "oscclk", "aclk_g3d_400";
clocks = <&xxti>, <&cmu_top CLK_ACLK_G3D_400>;
power-domains = <&pd_g3d>;
};
cmu_gscl: clock-controller@13cf0000 {
@ -353,6 +362,7 @@ Example 2: Examples of clock controller nodes are listed below.
clocks = <&xxti>,
<&cmu_top CLK_ACLK_GSCL_111>,
<&cmu_top CLK_ACLK_GSCL_333>;
power-domains = <&pd_gscl>;
};
cmu_apollo: clock-controller@11900000 {
@ -384,6 +394,7 @@ Example 2: Examples of clock controller nodes are listed below.
clocks = <&xxti>,
<&cmu_top CLK_SCLK_JPEG_MSCL>,
<&cmu_top CLK_ACLK_MSCL_400>;
power-domains = <&pd_mscl>;
};
cmu_mfc: clock-controller@15280000 {
@ -393,6 +404,7 @@ Example 2: Examples of clock controller nodes are listed below.
clock-names = "oscclk", "aclk_mfc_400";
clocks = <&xxti>, <&cmu_top CLK_ACLK_MFC_400>;
power-domains = <&pd_mfc>;
};
cmu_hevc: clock-controller@14f80000 {
@ -402,6 +414,7 @@ Example 2: Examples of clock controller nodes are listed below.
clock-names = "oscclk", "aclk_hevc_400";
clocks = <&xxti>, <&cmu_top CLK_ACLK_HEVC_400>;
power-domains = <&pd_hevc>;
};
cmu_isp: clock-controller@146d0000 {
@ -415,6 +428,7 @@ Example 2: Examples of clock controller nodes are listed below.
clocks = <&xxti>,
<&cmu_top CLK_ACLK_ISP_DIS_400>,
<&cmu_top CLK_ACLK_ISP_400>;
power-domains = <&pd_isp>;
};
cmu_cam0: clock-controller@120d0000 {
@ -430,6 +444,7 @@ Example 2: Examples of clock controller nodes are listed below.
<&cmu_top CLK_ACLK_CAM0_333>,
<&cmu_top CLK_ACLK_CAM0_400>,
<&cmu_top CLK_ACLK_CAM0_552>;
power-domains = <&pd_cam0>;
};
cmu_cam1: clock-controller@145d0000 {
@ -451,6 +466,7 @@ Example 2: Examples of clock controller nodes are listed below.
<&cmu_top CLK_ACLK_CAM1_333>,
<&cmu_top CLK_ACLK_CAM1_400>,
<&cmu_top CLK_ACLK_CAM1_552>;
power-domains = <&pd_cam1>;
};
Example 3: UART controller node that consumes the clock generated by the clock

View File

@ -10,13 +10,13 @@ ID in its "clocks" phandle cell. See include/dt-bindings/clock/imx1-clock.h
for the full list of i.MX1 clock IDs.
Examples:
clks: ccm@0021b000 {
clks: ccm@21b000 {
#clock-cells = <1>;
compatible = "fsl,imx1-ccm";
reg = <0x0021b000 0x1000>;
};
pwm: pwm@00208000 {
pwm: pwm@208000 {
#pwm-cells = <2>;
compatible = "fsl,imx1-pwm";
reg = <0x00208000 0x1000>;

View File

@ -14,14 +14,14 @@ Examples:
#include <dt-bindings/clock/imx6qdl-clock.h>
clks: ccm@020c4000 {
clks: ccm@20c4000 {
compatible = "fsl,imx6q-ccm";
reg = <0x020c4000 0x4000>;
interrupts = <0 87 0x04 0 88 0x04>;
#clock-cells = <1>;
};
uart1: serial@02020000 {
uart1: serial@2020000 {
compatible = "fsl,imx6q-uart", "fsl,imx21-uart";
reg = <0x02020000 0x4000>;
interrupts = <0 26 0x04>;

View File

@ -46,7 +46,7 @@ Example:
/* ... */
Node of the MFD chip
max77686: max77686@09 {
max77686: max77686@9 {
compatible = "maxim,max77686";
interrupt-parent = <&wakeup_eint>;
interrupts = <26 0>;
@ -71,7 +71,7 @@ Example:
/* ... */
Node of the MFD chip
max77802: max77802@09 {
max77802: max77802@9 {
compatible = "maxim,max77802";
interrupt-parent = <&wakeup_eint>;
interrupts = <26 0>;

View File

@ -10,12 +10,23 @@ Required properties :
- compatible : shall contain only one of the following. The generic
compatible "qcom,rpmcc" should be also included.
"qcom,rpmcc-msm8660", "qcom,rpmcc"
"qcom,rpmcc-apq8060", "qcom,rpmcc"
"qcom,rpmcc-msm8916", "qcom,rpmcc"
"qcom,rpmcc-msm8974", "qcom,rpmcc"
"qcom,rpmcc-apq8064", "qcom,rpmcc"
"qcom,rpmcc-msm8996", "qcom,rpmcc"
- #clock-cells : shall contain 1
The clock enumerators are defined in <dt-bindings/clock/qcom,rpmcc.h>
and come in pairs: FOO_CLK followed by FOO_A_CLK. The latter clock
is an "active" clock, which means that the consumer only care that the
clock is available when the apps CPU subsystem is active, i.e. not
suspended or in deep idle. If it is important that the clock keeps running
during system suspend, you need to specify the non-active clock, the one
not containing *_A_* in the enumerator name.
Example:
smd {
compatible = "qcom,smd";

View File

@ -22,6 +22,7 @@ Required Properties:
- "renesas,r8a7794-cpg-mssr" for the r8a7794 SoC (R-Car E2)
- "renesas,r8a7795-cpg-mssr" for the r8a7795 SoC (R-Car H3)
- "renesas,r8a7796-cpg-mssr" for the r8a7796 SoC (R-Car M3-W)
- "renesas,r8a77970-cpg-mssr" for the r8a77970 SoC (R-Car V3M)
- "renesas,r8a77995-cpg-mssr" for the r8a77995 SoC (R-Car D3)
- reg: Base address and length of the memory resource used by the CPG/MSSR
@ -31,8 +32,8 @@ Required Properties:
clock-names
- clock-names: List of external parent clock names. Valid names are:
- "extal" (r8a7743, r8a7745, r8a7790, r8a7791, r8a7792, r8a7793, r8a7794,
r8a7795, r8a7796, r8a77995)
- "extalr" (r8a7795, r8a7796)
r8a7795, r8a7796, r8a77970, r8a77995)
- "extalr" (r8a7795, r8a7796, r8a77970)
- "usb_extal" (r8a7743, r8a7745, r8a7790, r8a7791, r8a7793, r8a7794)
- #clock-cells: Must be 2

View File

@ -1,6 +1,6 @@
* Renesas RZ Clock Pulse Generator (CPG)
* Renesas RZ/A1 Clock Pulse Generator (CPG)
The CPG generates core clocks for the RZ SoCs. It includes the PLL, variable
The CPG generates core clocks for the RZ/A1 SoCs. It includes the PLL, variable
CPU and GPU clocks, and several fixed ratio dividers.
The CPG also provides a Clock Domain for SoC devices, in combination with the
CPG Module Stop (MSTP) Clocks.

View File

@ -42,7 +42,7 @@ Required properties:
Example:
clockgen-a@090ff000 {
clockgen-a@90ff000 {
compatible = "st,clkgen-c32";
reg = <0x90ff000 0x1000>;

View File

@ -36,7 +36,7 @@ For the PRCM CCUs on A83T/H3/A64, two more clocks are needed:
- "iosc": the SoC's internal frequency oscillator
Example for generic CCU:
ccu: clock@01c20000 {
ccu: clock@1c20000 {
compatible = "allwinner,sun8i-h3-ccu";
reg = <0x01c20000 0x400>;
clocks = <&osc24M>, <&osc32k>;
@ -46,7 +46,7 @@ ccu: clock@01c20000 {
};
Example for PRCM CCU:
r_ccu: clock@01f01400 {
r_ccu: clock@1f01400 {
compatible = "allwinner,sun50i-a64-r-ccu";
reg = <0x01f01400 0x100>;
clocks = <&osc24M>, <&osc32k>, <&iosc>, <&ccu CLK_PLL_PERIPH0>;

View File

@ -137,7 +137,7 @@ the address block, which is related to the overall mmc block.
For example:
osc24M: clk@01c20050 {
osc24M: clk@1c20050 {
#clock-cells = <0>;
compatible = "allwinner,sun4i-a10-osc-clk";
reg = <0x01c20050 0x4>;
@ -145,7 +145,7 @@ osc24M: clk@01c20050 {
clock-output-names = "osc24M";
};
pll1: clk@01c20000 {
pll1: clk@1c20000 {
#clock-cells = <0>;
compatible = "allwinner,sun4i-a10-pll1-clk";
reg = <0x01c20000 0x4>;
@ -153,7 +153,7 @@ pll1: clk@01c20000 {
clock-output-names = "pll1";
};
pll5: clk@01c20020 {
pll5: clk@1c20020 {
#clock-cells = <1>;
compatible = "allwinner,sun4i-pll5-clk";
reg = <0x01c20020 0x4>;
@ -161,7 +161,7 @@ pll5: clk@01c20020 {
clock-output-names = "pll5_ddr", "pll5_other";
};
pll6: clk@01c20028 {
pll6: clk@1c20028 {
#clock-cells = <1>;
compatible = "allwinner,sun6i-a31-pll6-clk";
reg = <0x01c20028 0x4>;
@ -169,7 +169,7 @@ pll6: clk@01c20028 {
clock-output-names = "pll6", "pll6x2";
};
cpu: cpu@01c20054 {
cpu: cpu@1c20054 {
#clock-cells = <0>;
compatible = "allwinner,sun4i-a10-cpu-clk";
reg = <0x01c20054 0x4>;
@ -177,7 +177,7 @@ cpu: cpu@01c20054 {
clock-output-names = "cpu";
};
mmc0_clk: clk@01c20088 {
mmc0_clk: clk@1c20088 {
#clock-cells = <1>;
compatible = "allwinner,sun4i-a10-mmc-clk";
reg = <0x01c20088 0x4>;
@ -199,7 +199,7 @@ gmac_int_tx_clk: clk@3 {
clock-output-names = "gmac_int_tx";
};
gmac_clk: clk@01c20164 {
gmac_clk: clk@1c20164 {
#clock-cells = <0>;
compatible = "allwinner,sun7i-a20-gmac-clk";
reg = <0x01c20164 0x4>;
@ -211,7 +211,7 @@ gmac_clk: clk@01c20164 {
clock-output-names = "gmac";
};
mmc_config_clk: clk@01c13000 {
mmc_config_clk: clk@1c13000 {
compatible = "allwinner,sun9i-a80-mmc-config-clk";
reg = <0x01c13000 0x10>;
clocks = <&ahb0_gates 8>;

View File

@ -25,7 +25,7 @@ Example:
};
};
...
i2c0: i2c-master@0d090000 {
i2c0: i2c-master@d090000 {
...
cdce706: clock-synth@69 {
compatible = "ti,cdce706";

Some files were not shown because too many files have changed in this diff Show More