KVM/arm64 fixes for 5.17, take #2
- A couple of fixes when handling an exception while an SError has been delivered
- Workaround for Cortex-A510's single-step erratum

Merge tag 'kvmarm-fixes-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
@ -216,7 +216,6 @@ ForEachMacros:
|
||||
- 'for_each_migratetype_order'
|
||||
- 'for_each_msi_entry'
|
||||
- 'for_each_msi_entry_safe'
|
||||
- 'for_each_msi_vector'
|
||||
- 'for_each_net'
|
||||
- 'for_each_net_continue_reverse'
|
||||
- 'for_each_netdev'
|
||||
|
10
.mailmap
@ -10,10 +10,12 @@
|
||||
# Please keep this list dictionary sorted.
|
||||
#
|
||||
Aaron Durbin <adurbin@google.com>
|
||||
Abhinav Kumar <quic_abhinavk@quicinc.com> <abhinavk@codeaurora.org>
|
||||
Adam Oldham <oldhamca@gmail.com>
|
||||
Adam Radford <aradford@gmail.com>
|
||||
Adriana Reus <adi.reus@gmail.com> <adriana.reus@intel.com>
|
||||
Adrian Bunk <bunk@stusta.de>
|
||||
Akhil P Oommen <quic_akhilpo@quicinc.com> <akhilpo@codeaurora.org>
|
||||
Alan Cox <alan@lxorguk.ukuu.org.uk>
|
||||
Alan Cox <root@hraefn.swansea.linux.org.uk>
|
||||
Aleksandar Markovic <aleksandar.markovic@mips.com> <aleksandar.markovic@imgtec.com>
|
||||
@ -42,6 +44,7 @@ Andrew Vasquez <andrew.vasquez@qlogic.com>
|
||||
Andrey Konovalov <andreyknvl@gmail.com> <andreyknvl@google.com>
|
||||
Andrey Ryabinin <ryabinin.a.a@gmail.com> <a.ryabinin@samsung.com>
|
||||
Andrey Ryabinin <ryabinin.a.a@gmail.com> <aryabinin@virtuozzo.com>
|
||||
Andrzej Hajda <andrzej.hajda@intel.com> <a.hajda@samsung.com>
|
||||
Andy Adamson <andros@citi.umich.edu>
|
||||
Antoine Tenart <atenart@kernel.org> <antoine.tenart@bootlin.com>
|
||||
Antoine Tenart <atenart@kernel.org> <antoine.tenart@free-electrons.com>
|
||||
@ -67,6 +70,7 @@ Boris Brezillon <bbrezillon@kernel.org> <boris.brezillon@bootlin.com>
|
||||
Boris Brezillon <bbrezillon@kernel.org> <boris.brezillon@free-electrons.com>
|
||||
Brian Avery <b.avery@hp.com>
|
||||
Brian King <brking@us.ibm.com>
|
||||
Brian Silverman <bsilver16384@gmail.com> <brian.silverman@bluerivertech.com>
|
||||
Changbin Du <changbin.du@intel.com> <changbin.du@gmail.com>
|
||||
Changbin Du <changbin.du@intel.com> <changbin.du@intel.com>
|
||||
Chao Yu <chao@kernel.org> <chao2.yu@samsung.com>
|
||||
@ -174,6 +178,7 @@ Jeff Layton <jlayton@kernel.org> <jlayton@redhat.com>
|
||||
Jens Axboe <axboe@suse.de>
|
||||
Jens Osterkamp <Jens.Osterkamp@de.ibm.com>
|
||||
Jernej Skrabec <jernej.skrabec@gmail.com> <jernej.skrabec@siol.net>
|
||||
Jessica Zhang <quic_jesszhan@quicinc.com> <jesszhan@codeaurora.org>
|
||||
Jiri Slaby <jirislaby@kernel.org> <jirislaby@gmail.com>
|
||||
Jiri Slaby <jirislaby@kernel.org> <jslaby@novell.com>
|
||||
Jiri Slaby <jirislaby@kernel.org> <jslaby@suse.com>
|
||||
@ -193,6 +198,7 @@ Juha Yrjola <at solidboot.com>
|
||||
Juha Yrjola <juha.yrjola@nokia.com>
|
||||
Juha Yrjola <juha.yrjola@solidboot.com>
|
||||
Julien Thierry <julien.thierry.kdev@gmail.com> <julien.thierry@arm.com>
|
||||
Kalyan Thota <quic_kalyant@quicinc.com> <kalyan_t@codeaurora.org>
|
||||
Kay Sievers <kay.sievers@vrfy.org>
|
||||
Kees Cook <keescook@chromium.org> <kees.cook@canonical.com>
|
||||
Kees Cook <keescook@chromium.org> <keescook@google.com>
|
||||
@ -204,9 +210,11 @@ Kenneth W Chen <kenneth.w.chen@intel.com>
|
||||
Konstantin Khlebnikov <koct9i@gmail.com> <khlebnikov@yandex-team.ru>
|
||||
Konstantin Khlebnikov <koct9i@gmail.com> <k.khlebnikov@samsung.com>
|
||||
Koushik <raghavendra.koushik@neterion.com>
|
||||
Krishna Manikandan <quic_mkrishn@quicinc.com> <mkrishn@codeaurora.org>
|
||||
Krzysztof Kozlowski <krzk@kernel.org> <k.kozlowski.k@gmail.com>
|
||||
Krzysztof Kozlowski <krzk@kernel.org> <k.kozlowski@samsung.com>
|
||||
Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
|
||||
Kuogee Hsieh <quic_khsieh@quicinc.com> <khsieh@codeaurora.org>
|
||||
Leonardo Bras <leobras.c@gmail.com> <leonardo@linux.ibm.com>
|
||||
Leonid I Ananiev <leonid.i.ananiev@intel.com>
|
||||
Leon Romanovsky <leon@kernel.org> <leon@leon.nu>
|
||||
@ -313,6 +321,7 @@ Qais Yousef <qsyousef@gmail.com> <qais.yousef@imgtec.com>
|
||||
Quentin Monnet <quentin@isovalent.com> <quentin.monnet@netronome.com>
|
||||
Quentin Perret <qperret@qperret.net> <quentin.perret@arm.com>
|
||||
Rafael J. Wysocki <rjw@rjwysocki.net> <rjw@sisk.pl>
|
||||
Rajeev Nandan <quic_rajeevny@quicinc.com> <rajeevny@codeaurora.org>
|
||||
Rajesh Shah <rajesh.shah@intel.com>
|
||||
Ralf Baechle <ralf@linux-mips.org>
|
||||
Ralf Wildenhues <Ralf.Wildenhues@gmx.de>
|
||||
@ -327,6 +336,7 @@ Rui Saraiva <rmps@joel.ist.utl.pt>
|
||||
Sachin P Sant <ssant@in.ibm.com>
|
||||
Sakari Ailus <sakari.ailus@linux.intel.com> <sakari.ailus@iki.fi>
|
||||
Sam Ravnborg <sam@mars.ravnborg.org>
|
||||
Sankeerth Billakanti <quic_sbillaka@quicinc.com> <sbillaka@codeaurora.org>
|
||||
Santosh Shilimkar <santosh.shilimkar@oracle.org>
|
||||
Santosh Shilimkar <ssantosh@kernel.org>
|
||||
Sarangdhar Joshi <spjoshi@codeaurora.org>
|
||||
|
5
CREDITS
@ -315,6 +315,11 @@ S: Via Delle Palme, 9
|
||||
S: Terni 05100
|
||||
S: Italy
|
||||
|
||||
N: Ohad Ben Cohen
|
||||
E: ohad@wizery.com
|
||||
D: Remote Processor (remoteproc) subsystem
|
||||
D: Remote Processor Messaging (rpmsg) subsystem
|
||||
|
||||
N: Krzysztof Benedyczak
|
||||
E: golbi@mat.uni.torun.pl
|
||||
W: http://www.mat.uni.torun.pl/~golbi
|
||||
|
@ -1,22 +0,0 @@
|
||||
What: /sys/class/dax/
|
||||
Date: May, 2016
|
||||
KernelVersion: v4.7
|
||||
Contact: nvdimm@lists.linux.dev
|
||||
Description: Device DAX is the device-centric analogue of Filesystem
|
||||
DAX (CONFIG_FS_DAX). It allows memory ranges to be
|
||||
allocated and mapped without need of an intervening file
|
||||
system. Device DAX is strict, precise and predictable.
|
||||
Specifically this interface:
|
||||
|
||||
1. Guarantees fault granularity with respect to a given
|
||||
page size (pte, pmd, or pud) set at configuration time.
|
||||
|
||||
2. Enforces deterministic behavior by being strict about
|
||||
what fault scenarios are supported.
|
||||
|
||||
The /sys/class/dax/ interface enumerates all the
|
||||
device-dax instances in the system. The ABI is
|
||||
deprecated and will be removed after 2020. It is
|
||||
replaced with the DAX bus interface /sys/bus/dax/ where
|
||||
device-dax instances can be found under
|
||||
/sys/bus/dax/devices/
|
676
Documentation/ABI/stable/sysfs-block
@ -0,0 +1,676 @@
|
||||
What: /sys/block/<disk>/alignment_offset
|
||||
Date: April 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Storage devices may report a physical block size that is
|
||||
bigger than the logical block size (for instance a drive
|
||||
with 4KB physical sectors exposing 512-byte logical
|
||||
blocks to the operating system). This parameter
|
||||
indicates how many bytes the beginning of the device is
|
||||
offset from the disk's natural alignment.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/discard_alignment
|
||||
Date: May 2011
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Devices that support discard functionality may
|
||||
internally allocate space in units that are bigger than
|
||||
the exported logical block size. The discard_alignment
|
||||
parameter indicates how many bytes the beginning of the
|
||||
device is offset from the internal allocation unit's
|
||||
natural alignment.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/diskseq
|
||||
Date: February 2021
|
||||
Contact: Matteo Croce <mcroce@microsoft.com>
|
||||
Description:
|
||||
The /sys/block/<disk>/diskseq file reports the disk
sequence number, which is a monotonically increasing
number assigned to every drive.
Some devices, like the loop device, refresh this number
every time the backing file is changed.
The value type is 64-bit unsigned.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/inflight
|
||||
Date: October 2009
|
||||
Contact: Jens Axboe <axboe@kernel.dk>, Nikanth Karthikesan <knikanth@suse.de>
|
||||
Description:
|
||||
Reports the number of I/O requests currently in progress
|
||||
(pending / in flight) in a device driver. This can be less
|
||||
than the number of requests queued in the block device queue.
|
||||
The report contains 2 fields: one for read requests
|
||||
and one for write requests.
|
||||
The value type is unsigned int.
|
||||
Cf. Documentation/block/stat.rst which contains a single value for
|
||||
requests in flight.
|
||||
This is related to /sys/block/<disk>/queue/nr_requests
and, for SCSI devices, also to their queue_depth.
|
||||
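The two-field layout described above can be parsed along the following lines. This is a minimal sketch: parse_inflight is a hypothetical helper, and the sample string is made up for illustration rather than captured from a real device.

```python
from typing import Tuple


def parse_inflight(text: str) -> Tuple[int, int]:
    """Split the two whitespace-separated counters into (reads, writes)."""
    fields = text.split()
    if len(fields) != 2:
        raise ValueError(f"expected 2 fields, got {len(fields)}")
    reads, writes = (int(field) for field in fields)
    return reads, writes


# Illustrative content only; on a live system the text would be read
# from /sys/block/<disk>/inflight.
sample = "       3        1\n"
reads, writes = parse_inflight(sample)
print(f"reads in flight: {reads}, writes in flight: {writes}")
```

Parsing a string instead of opening the sysfs file directly keeps the helper usable on systems without the attribute.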
|
||||
|
||||
What: /sys/block/<disk>/integrity/device_is_integrity_capable
|
||||
Date: July 2014
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Indicates whether a storage device is capable of storing
|
||||
integrity metadata. Set if the device is T10 PI-capable.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/integrity/format
|
||||
Date: June 2008
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Metadata format for integrity capable block device.
|
||||
E.g. T10-DIF-TYPE1-CRC.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/integrity/protection_interval_bytes
|
||||
Date: July 2015
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Describes the number of data bytes which are protected
|
||||
by one integrity tuple. Typically the device's logical
|
||||
block size.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/integrity/read_verify
|
||||
Date: June 2008
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Indicates whether the block layer should verify the
|
||||
integrity of read requests serviced by devices that
|
||||
support sending integrity metadata.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/integrity/tag_size
|
||||
Date: June 2008
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Number of bytes of integrity tag space available per
|
||||
512 bytes of data.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/integrity/write_generate
|
||||
Date: June 2008
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Indicates whether the block layer should automatically
|
||||
generate checksums for write requests bound for
|
||||
devices that support receiving integrity metadata.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/<partition>/alignment_offset
|
||||
Date: April 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Storage devices may report a physical block size that is
|
||||
bigger than the logical block size (for instance a drive
|
||||
with 4KB physical sectors exposing 512-byte logical
|
||||
blocks to the operating system). This parameter
|
||||
indicates how many bytes the beginning of the partition
|
||||
is offset from the disk's natural alignment.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/<partition>/discard_alignment
|
||||
Date: May 2011
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Devices that support discard functionality may
|
||||
internally allocate space in units that are bigger than
|
||||
the exported logical block size. The discard_alignment
|
||||
parameter indicates how many bytes the beginning of the
|
||||
partition is offset from the internal allocation unit's
|
||||
natural alignment.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/<partition>/stat
|
||||
Date: February 2008
|
||||
Contact: Jerome Marchand <jmarchan@redhat.com>
|
||||
Description:
|
||||
The /sys/block/<disk>/<partition>/stat files display the
|
||||
I/O statistics of partition <partition>. The format is the
|
||||
same as the format of /sys/block/<disk>/stat.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/add_random
|
||||
Date: June 2010
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RW] This file allows turning off the disk's entropy contribution.
The default value of this file is '1' (on).
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/chunk_sectors
|
||||
Date: September 2016
|
||||
Contact: Hannes Reinecke <hare@suse.com>
|
||||
Description:
|
||||
[RO] chunk_sectors has a different meaning depending on the type
of the disk. For a RAID device (dm-raid), chunk_sectors
indicates the size in 512B sectors of the RAID volume stripe
segment. For a zoned block device, either host-aware or
host-managed, chunk_sectors indicates the size in 512B sectors
of the zones of the device, with the possible exception of the
last zone of the device, which may be smaller.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/dax
|
||||
Date: June 2016
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RO] This file indicates whether the device supports Direct
|
||||
Access (DAX), used by CPU-addressable storage to bypass the
|
||||
pagecache. It shows '1' if true, '0' if not.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/discard_granularity
|
||||
Date: May 2011
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
[RO] Devices that support discard functionality may internally
|
||||
allocate space using units that are bigger than the logical
|
||||
block size. The discard_granularity parameter indicates the size
|
||||
of the internal allocation unit in bytes if reported by the
|
||||
device. Otherwise the discard_granularity will be set to match
|
||||
the device's physical block size. A discard_granularity of 0
|
||||
means that the device does not support discard functionality.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/discard_max_bytes
|
||||
Date: May 2011
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
[RW] While discard_max_hw_bytes is the hardware limit for the
device, this setting is the software limit. Some devices exhibit
large latencies when large discards are issued; setting this
value lower will make Linux issue smaller discards and
potentially help reduce latencies induced by large discard
operations.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/discard_max_hw_bytes
|
||||
Date: July 2015
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RO] Devices that support discard functionality may have
|
||||
internal limits on the number of bytes that can be trimmed or
|
||||
unmapped in a single operation. The `discard_max_hw_bytes`
|
||||
parameter is set by the device driver to the maximum number of
|
||||
bytes that can be discarded in a single operation. Discard
|
||||
requests issued to the device must not exceed this limit. A
|
||||
`discard_max_hw_bytes` value of 0 means that the device does not
|
||||
support discard functionality.
|
||||
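To illustrate how the software limit (discard_max_bytes) and the hardware limit (discard_max_hw_bytes) interact, the sketch below splits one large discard into chunks no bigger than the smaller of the two. It is not the kernel's splitting code, which also takes granularity and alignment into account. split_discard is a hypothetical name, and treating a software limit of 0 as "no discard" is an assumption of this sketch.

```python
from typing import Iterator, Tuple


def split_discard(start: int, length: int, max_bytes: int,
                  max_hw_bytes: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, size) chunks no larger than the effective limit."""
    # The documentation states that a hardware limit of 0 means discard is
    # unsupported; rejecting a software limit of 0 is this sketch's choice.
    limit = min(max_bytes, max_hw_bytes)
    if limit <= 0:
        raise ValueError("discard not supported or disabled")
    offset, end = start, start + length
    while offset < end:
        size = min(limit, end - offset)
        yield offset, size
        offset += size


# Example values are arbitrary: a 10 MiB discard with a 4 MiB software
# limit and an 8 MiB hardware limit.
MiB = 1024 * 1024
for offset, size in split_discard(0, 10 * MiB, 4 * MiB, 8 * MiB):
    print(offset, size)
```

Lowering the software limit only changes how many chunks are produced; the total number of discarded bytes stays the same.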
|
||||
|
||||
What: /sys/block/<disk>/queue/discard_zeroes_data
|
||||
Date: May 2011
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
[RO] Will always return 0. Don't rely on any specific behavior
|
||||
for discards, and don't read this file.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/fua
|
||||
Date: May 2018
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RO] Whether or not the block driver supports the FUA flag for
|
||||
write requests. FUA stands for Force Unit Access. If the FUA
|
||||
flag is set that means that write requests must bypass the
|
||||
volatile cache of the storage device.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/hw_sector_size
|
||||
Date: January 2008
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RO] This is the hardware sector size of the device, in bytes.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/independent_access_ranges/
|
||||
Date: October 2021
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RO] The presence of this sub-directory of the
/sys/block/<disk>/queue/ directory indicates that the device is
capable of executing requests targeting different sector ranges
in parallel. For instance, single LUN multi-actuator hard-disks
will have an independent_access_ranges directory if the device
correctly advertises the sector ranges of its actuators.

The independent_access_ranges directory contains one directory
per access range, with each range described using the sector
(RO) attribute file to indicate the first sector of the range
and the nr_sectors (RO) attribute file to indicate the total
number of sectors in the range starting from the first sector of
the range. For example, a dual-actuator hard-disk will have the
following independent_access_ranges entries::
|
||||
|
||||
    $ tree /sys/block/<disk>/queue/independent_access_ranges/
    /sys/block/<disk>/queue/independent_access_ranges/
    |-- 0
    |   |-- nr_sectors
    |   `-- sector
    `-- 1
        |-- nr_sectors
        `-- sector
|
||||
|
||||
The sector and nr_sectors attributes use 512B sector units,
|
||||
regardless of the actual block size of the device. Independent
|
||||
access ranges do not overlap and include all sectors within the
|
||||
device capacity. The access ranges are numbered in increasing
|
||||
order of the range start sector, that is, the sector attribute
|
||||
of range 0 always has the value 0.
|
||||
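The layout and invariants described above (one numbered directory per range, no overlap, range 0 starting at sector 0) could be read and checked roughly as follows. This is a sketch under stated assumptions: the helper names are hypothetical, the demo runs against a temporary stand-in tree with made-up values instead of a real /sys/block/<disk>/queue/independent_access_ranges/ directory, and coverage of the full device capacity is not verified.

```python
import os
import tempfile
from typing import List, Tuple


def read_access_ranges(base: str) -> List[Tuple[int, int]]:
    """Return [(first_sector, nr_sectors), ...] in range-number order."""
    ranges = []
    # Each numbered sub-directory describes one access range.
    for name in sorted((n for n in os.listdir(base) if n.isdigit()), key=int):
        with open(os.path.join(base, name, "sector")) as f:
            sector = int(f.read())
        with open(os.path.join(base, name, "nr_sectors")) as f:
            nr_sectors = int(f.read())
        ranges.append((sector, nr_sectors))
    return ranges


def ranges_are_contiguous(ranges: List[Tuple[int, int]]) -> bool:
    """True if range 0 starts at 0 and each range starts where the last ended."""
    expected_start = 0
    for sector, nr_sectors in ranges:
        if sector != expected_start or nr_sectors <= 0:
            return False
        expected_start = sector + nr_sectors
    return True


def make_fake_tree(root: str, ranges: List[Tuple[int, int]]) -> None:
    """Build a stand-in directory tree (hypothetical values) for the demo."""
    for idx, (sector, nr_sectors) in enumerate(ranges):
        range_dir = os.path.join(root, str(idx))
        os.makedirs(range_dir)
        with open(os.path.join(range_dir, "sector"), "w") as f:
            f.write(f"{sector}\n")
        with open(os.path.join(range_dir, "nr_sectors"), "w") as f:
            f.write(f"{nr_sectors}\n")


with tempfile.TemporaryDirectory() as root:
    make_fake_tree(root, [(0, 1000), (1000, 500)])
    found = read_access_ranges(root)
    print(found, ranges_are_contiguous(found))
```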
|
||||
|
||||
What: /sys/block/<disk>/queue/io_poll
|
||||
Date: November 2015
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RW] When read, this file shows whether polling is enabled (1)
|
||||
or disabled (0). Writing '0' to this file will disable polling
|
||||
for this device. Writing any non-zero value will enable this
|
||||
feature.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/io_poll_delay
|
||||
Date: November 2016
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RW] If polling is enabled, this controls what kind of polling
|
||||
will be performed. It defaults to -1, which is classic polling.
|
||||
In this mode, the CPU will repeatedly ask for completions
|
||||
without giving up any time. If set to 0, a hybrid polling mode
|
||||
is used, where the kernel will attempt to make an educated guess
|
||||
at when the IO will complete. Based on this guess, the kernel
|
||||
will put the process issuing IO to sleep for an amount of time,
|
||||
before entering a classic poll loop. This mode might be a little
|
||||
slower than pure classic polling, but it will be more efficient.
|
||||
If set to a value larger than 0, the kernel will put the process
issuing IO to sleep for this number of microseconds before
entering classic polling.
|
||||
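The three cases of io_poll_delay described above can be summarized as a small lookup. The function name is hypothetical, and rejecting values below -1 is an assumption of this sketch, since the text above does not say how they are handled.

```python
def describe_io_poll_delay(value: int) -> str:
    """Map an io_poll_delay value to the polling mode described in the text."""
    if value == -1:
        return "classic polling (no sleep before polling)"
    if value == 0:
        return "hybrid polling (kernel estimates the sleep time)"
    if value > 0:
        return f"sleep {value} microseconds, then classic polling"
    # Values below -1 are not covered by the description above.
    raise ValueError(f"undocumented io_poll_delay value: {value}")


for value in (-1, 0, 50):
    print(value, "->", describe_io_poll_delay(value))
```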
|
||||
|
||||
What: /sys/block/<disk>/queue/io_timeout
|
||||
Date: November 2018
|
||||
Contact: Weiping Zhang <zhangweiping@didiglobal.com>
|
||||
Description:
|
||||
[RW] io_timeout is the request timeout in milliseconds. If a
|
||||
request does not complete in this time then the block driver
|
||||
timeout handler is invoked. That timeout handler can decide to
|
||||
retry the request, to fail it or to start a device recovery
|
||||
strategy.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/iostats
|
||||
Date: January 2009
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RW] This file is used to control (on/off) the iostats
|
||||
accounting of the disk.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/logical_block_size
|
||||
Date: May 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
[RO] This is the smallest unit the storage device can address.
|
||||
It is typically 512 bytes.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/max_active_zones
|
||||
Date: July 2020
|
||||
Contact: Niklas Cassel <niklas.cassel@wdc.com>
|
||||
Description:
|
||||
[RO] For zoned block devices (zoned attribute indicating
|
||||
"host-managed" or "host-aware"), the sum of zones belonging to
|
||||
any of the zone states: EXPLICIT OPEN, IMPLICIT OPEN or CLOSED,
|
||||
is limited by this value. If this value is 0, there is no limit.
|
||||
|
||||
If the host attempts to exceed this limit, the driver should
|
||||
report this error with BLK_STS_ZONE_ACTIVE_RESOURCE, which user
|
||||
space may see as the EOVERFLOW errno.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/max_discard_segments
|
||||
Date: February 2017
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RO] The maximum number of DMA scatter/gather entries in a
|
||||
discard request.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/max_hw_sectors_kb
|
||||
Date: September 2004
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RO] This is the maximum number of kilobytes supported in a
|
||||
single data transfer.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/max_integrity_segments
|
||||
Date: September 2010
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RO] Maximum number of elements in a DMA scatter/gather list
|
||||
with integrity data that will be submitted by the block layer
|
||||
core to the associated block driver.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/max_open_zones
|
||||
Date: July 2020
|
||||
Contact: Niklas Cassel <niklas.cassel@wdc.com>
|
||||
Description:
|
||||
[RO] For zoned block devices (zoned attribute indicating
|
||||
"host-managed" or "host-aware"), the sum of zones belonging to
|
||||
any of the zone states: EXPLICIT OPEN or IMPLICIT OPEN, is
|
||||
limited by this value. If this value is 0, there is no limit.
|
||||
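The relationship between max_open_zones and max_active_zones can be expressed as a simple check. This is a sketch, not driver code: ZoneCounts and within_zone_limits are hypothetical names, and the counts would have to come from whatever zone-state bookkeeping the caller maintains.

```python
from dataclasses import dataclass


@dataclass
class ZoneCounts:
    explicit_open: int
    implicit_open: int
    closed: int


def within_zone_limits(counts: ZoneCounts, max_open_zones: int,
                       max_active_zones: int) -> bool:
    """Check both limits; a limit of 0 means "no limit", as stated above."""
    # Open zones: EXPLICIT OPEN plus IMPLICIT OPEN.
    open_zones = counts.explicit_open + counts.implicit_open
    # Active zones: the open zones plus CLOSED zones.
    active_zones = open_zones + counts.closed
    open_ok = max_open_zones == 0 or open_zones <= max_open_zones
    active_ok = max_active_zones == 0 or active_zones <= max_active_zones
    return open_ok and active_ok


# Arbitrary example values.
print(within_zone_limits(ZoneCounts(2, 1, 4), max_open_zones=4, max_active_zones=8))
```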
|
||||
|
||||
What: /sys/block/<disk>/queue/max_sectors_kb
|
||||
Date: September 2004
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RW] This is the maximum number of kilobytes that the block
|
||||
layer will allow for a filesystem request. Must be smaller than
|
||||
or equal to the maximum size allowed by the hardware.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/max_segment_size
|
||||
Date: March 2010
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RO] Maximum size in bytes of a single element in a DMA
|
||||
scatter/gather list.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/max_segments
|
||||
Date: March 2010
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RO] Maximum number of elements in a DMA scatter/gather list
|
||||
that is submitted to the associated block driver.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/minimum_io_size
|
||||
Date: April 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
[RO] Storage devices may report a granularity or preferred
|
||||
minimum I/O size which is the smallest request the device can
|
||||
perform without incurring a performance penalty. For disk
|
||||
drives this is often the physical block size. For RAID arrays
|
||||
it is often the stripe chunk size. A properly aligned multiple
|
||||
of minimum_io_size is the preferred request size for workloads
|
||||
where a high number of I/O operations is desired.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/nomerges
|
||||
Date: January 2010
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RW] Standard I/O elevator operations include attempts to merge
|
||||
contiguous I/Os. For known random I/O loads these attempts will
|
||||
always fail and result in extra cycles being spent in the
|
||||
kernel. This allows one to turn off this behavior in one of two
|
||||
ways: When set to 1, complex merge checks are disabled, but the
|
||||
simple one-shot merges with the previous I/O request are
|
||||
enabled. When set to 2, all merge tries are disabled. The
|
||||
default value is 0 - which enables all types of merge tries.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/nr_requests
|
||||
Date: July 2003
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RW] This controls how many requests may be allocated in the
|
||||
block layer for read or write requests. Note that the total
|
||||
allocated number may be twice this amount, since it applies only
|
||||
to reads or writes (not the accumulated sum).
|
||||
|
||||
To avoid priority inversion through request starvation, a
|
||||
request queue maintains a separate request pool per cgroup
|
||||
when CONFIG_BLK_CGROUP is enabled, and this parameter applies to
|
||||
each such per-block-cgroup request pool. IOW, if there are N
|
||||
block cgroups, each request queue may have up to N request
|
||||
pools, each independently regulated by nr_requests.
|
||||
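Taken together, the two paragraphs above suggest an upper bound on the number of requests a queue may allocate: up to nr_requests for reads plus nr_requests for writes, per request pool. The following is a back-of-the-envelope sketch of that arithmetic; the function name is hypothetical and the result is only the bound implied by the description, not a value the kernel reports.

```python
def max_allocated_requests(nr_requests: int, nr_block_cgroups: int = 1) -> int:
    """Upper bound implied by the text: reads + writes, per request pool."""
    if nr_requests < 0 or nr_block_cgroups < 1:
        raise ValueError("nr_requests must be >= 0 and nr_block_cgroups >= 1")
    # nr_requests applies separately to reads and to writes (factor 2), and
    # with CONFIG_BLK_CGROUP each block cgroup may get its own pool.
    return 2 * nr_requests * nr_block_cgroups


# Arbitrary example values.
print(max_allocated_requests(128))       # single pool
print(max_allocated_requests(128, 4))    # four block cgroups
```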
|
||||
|
||||
What: /sys/block/<disk>/queue/nr_zones
|
||||
Date: November 2018
|
||||
Contact: Damien Le Moal <damien.lemoal@wdc.com>
|
||||
Description:
|
||||
[RO] nr_zones indicates the total number of zones of a zoned
|
||||
block device ("host-aware" or "host-managed" zone model). For
|
||||
regular block devices, the value is always 0.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/optimal_io_size
|
||||
Date: April 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
[RO] Storage devices may report an optimal I/O size, which is
|
||||
the device's preferred unit for sustained I/O. This is rarely
|
||||
reported for disk drives. For RAID arrays it is usually the
|
||||
stripe width or the internal track size. A properly aligned
|
||||
multiple of optimal_io_size is the preferred request size for
|
||||
workloads where sustained throughput is desired. If no optimal
|
||||
I/O size is reported this file contains 0.
|
||||
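Since a properly aligned multiple of optimal_io_size (or of minimum_io_size) is the preferred request size, an application might round its request sizes up as sketched below. The helper name is hypothetical, and leaving the size unchanged when the attribute reads 0 is this sketch's interpretation of "no optimal I/O size is reported".

```python
def round_up_to_io_size(nbytes: int, io_size: int) -> int:
    """Round nbytes up to the next multiple of io_size (0 means no hint)."""
    if io_size == 0:
        # The device reported no preferred size; leave the request as is.
        return nbytes
    # Ceiling division written with integer arithmetic only.
    return -(-nbytes // io_size) * io_size


# Arbitrary example: a 5000-byte request against a 4096-byte hint.
print(round_up_to_io_size(5000, 4096))
```

Rounding the size is only half of the requirement; the starting offset of the request also has to be aligned for the result to be "properly aligned".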
|
||||
|
||||
What: /sys/block/<disk>/queue/physical_block_size
|
||||
Date: May 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
[RO] This is the smallest unit a physical storage device can
|
||||
write atomically. It is usually the same as the logical block
|
||||
size but may be bigger. One example is SATA drives with 4KB
|
||||
sectors that expose a 512-byte logical block size to the
|
||||
operating system. For stacked block devices the
|
||||
physical_block_size variable contains the maximum
|
||||
physical_block_size of the component devices.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/read_ahead_kb
|
||||
Date: May 2004
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RW] Maximum number of kilobytes to read-ahead for filesystems
|
||||
on this block device.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/rotational
|
||||
Date: January 2009
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RW] This file indicates whether the device is of rotational
type or non-rotational type.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/rq_affinity
|
||||
Date: September 2008
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RW] If this option is '1', the block layer will migrate request
|
||||
completions to the cpu "group" that originally submitted the
|
||||
request. For some workloads this provides a significant
|
||||
reduction in CPU cycles due to caching effects.
|
||||
|
||||
For storage configurations that need to maximize distribution of
|
||||
completion processing, setting this option to '2' forces the
|
||||
completion to run on the requesting cpu (bypassing the "group"
|
||||
aggregation logic).
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/scheduler
|
||||
Date: October 2004
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RW] When read, this file will display the current and available
|
||||
IO schedulers for this block device. The currently active IO
|
||||
scheduler will be enclosed in [] brackets. Writing an IO
|
||||
scheduler name to this file will switch control of this block
|
||||
device to that new IO scheduler. Note that writing an IO
|
||||
scheduler name to this file will attempt to load that IO
|
||||
scheduler module, if it isn't already present in the system.
|
||||
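The bracket convention described above can be parsed as follows. This is a minimal sketch: parse_scheduler is a hypothetical helper, and the sample line is illustrative, since the set of schedulers listed depends on the kernel configuration.

```python
from typing import List, Optional, Tuple


def parse_scheduler(text: str) -> Tuple[Optional[str], List[str]]:
    """Return (active_scheduler, all_listed_schedulers)."""
    active = None
    available = []
    for token in text.split():
        # The currently active scheduler is enclosed in [] brackets.
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


# Illustrative content only; real output depends on the running kernel.
sample = "mq-deadline kyber [bfq] none\n"
active, available = parse_scheduler(sample)
print("active:", active)
print("available:", ", ".join(available))
```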
|
||||
|
||||
What: /sys/block/<disk>/queue/stable_writes
|
||||
Date: September 2020
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RW] This file will contain '1' if memory must not be modified
|
||||
while it is being used in a write request to this device. When
|
||||
this is the case and the kernel is performing writeback of a
|
||||
page, the kernel will wait for writeback to complete before
|
||||
allowing the page to be modified again, rather than allowing
|
||||
immediate modification as is normally the case. This
|
||||
restriction arises when the device accesses the memory multiple
|
||||
times where the same data must be seen every time -- for
|
||||
example, once to calculate a checksum and once to actually write
|
||||
the data. If no such restriction exists, this file will contain
|
||||
'0'. This file is writable for testing purposes.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/throttle_sample_time
|
||||
Date: March 2017
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RW] This is the time window over which blk-throttle samples
data, in milliseconds. blk-throttle makes decisions based on the
samples. A shorter window gives cgroups smoother throughput, but
at the cost of higher CPU overhead. This exists only when
CONFIG_BLK_DEV_THROTTLING_LOW is enabled.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/virt_boundary_mask
|
||||
Date: April 2021
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RO] This file shows the I/O segment memory alignment mask for
|
||||
the block device. I/O requests to this device will be split
|
||||
between segments wherever either the memory address of the end
|
||||
of the previous segment or the memory address of the beginning
|
||||
of the current segment is not aligned to virt_boundary_mask + 1
|
||||
bytes.
|
||||
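The splitting rule above comes down to a bit test: an address is aligned to virt_boundary_mask + 1 bytes exactly when none of its bits overlap the mask. The sketch below applies that test to a pair of adjacent segments. needs_split is a hypothetical name, and the addresses are arbitrary example values.

```python
def needs_split(prev_end: int, cur_start: int, virt_boundary_mask: int) -> bool:
    """True if a request must be split between the two segments.

    prev_end is the memory address of the end of the previous segment and
    cur_start the address of the beginning of the current one.
    """
    # Any bit in common with the mask means the address is not aligned
    # to virt_boundary_mask + 1 bytes.
    prev_unaligned = (prev_end & virt_boundary_mask) != 0
    cur_unaligned = (cur_start & virt_boundary_mask) != 0
    return prev_unaligned or cur_unaligned


mask = 0xFFF  # example: 4096-byte boundary
print(needs_split(0x2000, 0x5000, mask))  # both addresses aligned
print(needs_split(0x2000, 0x5100, mask))  # current segment starts unaligned
```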
|
||||
|
||||
What: /sys/block/<disk>/queue/wbt_lat_usec
|
||||
Date: November 2016
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RW] If the device is registered for writeback throttling, then
|
||||
this file shows the target minimum read latency. If this latency
|
||||
is exceeded in a given window of time (see wb_window_usec), then
|
||||
the writeback throttling will start scaling back writes. Writing
|
||||
a value of '0' to this file disables the feature. Writing a
|
||||
value of '-1' to this file resets the value to the default
|
||||
setting.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/write_cache
|
||||
Date: April 2016
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RW] When read, this file will display whether the device has
|
||||
write back caching enabled or not. It will return "write back"
|
||||
for the former case, and "write through" for the latter. Writing
|
||||
to this file can change the kernel's view of the device, but it
|
||||
doesn't alter the device state. This means that it might not be
|
||||
safe to toggle the setting from "write back" to "write through",
|
||||
since that will also eliminate cache flushes issued by the
|
||||
kernel.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/write_same_max_bytes
|
||||
Date: January 2012
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
[RO] Some devices support a write same operation in which a
|
||||
single data block can be written to a range of several
|
||||
contiguous blocks on storage. This can be used to wipe areas on
|
||||
disk or to initialize drives in a RAID configuration.
|
||||
write_same_max_bytes indicates how many bytes can be written in
|
||||
a single write same command. If write_same_max_bytes is 0, write
|
||||
same is not supported by the device.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/write_zeroes_max_bytes
|
||||
Date: November 2016
|
||||
Contact: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
|
||||
Description:
|
||||
[RO] Some devices support a write zeroes operation in which a
|
||||
single request can be issued to zero out the range of contiguous
|
||||
blocks on storage without having any payload in the request.
|
||||
This can be used to optimize writing zeroes to the devices.
|
||||
write_zeroes_max_bytes indicates how many bytes can be written
|
||||
in a single write zeroes command. If write_zeroes_max_bytes is
|
||||
0, write zeroes is not supported by the device.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/zone_append_max_bytes
|
||||
Date: May 2020
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RO] This is the maximum number of bytes that can be written to
|
||||
a sequential zone of a zoned block device using a zone append
|
||||
write operation (REQ_OP_ZONE_APPEND). This value is always 0 for
|
||||
regular block devices.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/zone_write_granularity
|
||||
Date: January 2021
|
||||
Contact: linux-block@vger.kernel.org
|
||||
Description:
|
||||
[RO] This indicates the alignment constraint, in bytes, for
|
||||
write operations in sequential zones of zoned block devices
|
||||
(devices with a zoned attribute that reports "host-managed" or
|
||||
"host-aware"). This value is always 0 for regular block devices.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/queue/zoned
|
||||
Date: September 2016
|
||||
Contact: Damien Le Moal <damien.lemoal@wdc.com>
|
||||
Description:
|
||||
[RO] zoned indicates if the device is a zoned block device and
|
||||
the zone model of the device if it is indeed zoned. The
|
||||
possible values indicated by zoned are "none" for regular block
|
||||
devices and "host-aware" or "host-managed" for zoned block
|
||||
devices. The characteristics of host-aware and host-managed
|
||||
zoned block devices are described in the ZBC (Zoned Block
|
||||
Commands) and ZAC (Zoned Device ATA Command Set) standards.
|
||||
These standards also define the "drive-managed" zone model.
|
||||
However, since drive-managed zoned block devices do not support
|
||||
zone commands, they will be treated as regular block devices and
|
||||
zoned will report "none".
|
||||
|
||||
|
||||
What: /sys/block/<disk>/stat
|
||||
Date: February 2008
|
||||
Contact: Jerome Marchand <jmarchan@redhat.com>
|
||||
Description:
|
||||
The /sys/block/<disk>/stat file displays the I/O
|
||||
statistics of disk <disk>. It contains 17 fields:
|
||||
|
||||
== ==============================================
|
||||
1 reads completed successfully
|
||||
2 reads merged
|
||||
3 sectors read
|
||||
4 time spent reading (ms)
|
||||
5 writes completed
|
||||
6 writes merged
|
||||
7 sectors written
|
||||
8 time spent writing (ms)
|
||||
9 I/Os currently in progress
|
||||
10 time spent doing I/Os (ms)
|
||||
11 weighted time spent doing I/Os (ms)
|
||||
12 discards completed
|
||||
13 discards merged
|
||||
14 sectors discarded
|
||||
15 time spent discarding (ms)
|
||||
16 flush requests completed
|
||||
17 time spent flushing (ms)
|
||||
== ==============================================
|
||||
|
||||
For more details refer to Documentation/admin-guide/iostats.rst
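
As a rough illustration of the field order listed above, the following sketch maps the whitespace-separated values of a stat file onto names. The field names and the function `parse_block_stat` are made up for this example; only the ordering comes from the table. Kernels that expose fewer fields simply yield a shorter mapping.

```python
# Hypothetical names for the fields, in the order given in the table above.
STAT_FIELDS = (
    "reads_completed", "reads_merged", "sectors_read", "read_ms",
    "writes_completed", "writes_merged", "sectors_written", "write_ms",
    "in_flight", "io_ms", "weighted_io_ms",
    "discards_completed", "discards_merged", "sectors_discarded", "discard_ms",
    "flushes_completed", "flush_ms",
)


def parse_block_stat(text: str) -> dict:
    """Map the values of one stat line onto the field names above."""
    values = [int(v) for v in text.split()]
    # zip() stops at the shorter sequence, so a file with fewer
    # fields than names is handled without error.
    return dict(zip(STAT_FIELDS, values))


if __name__ == "__main__":
    # Assumed usage: pass a path such as /sys/block/<disk>/stat.
    import sys
    with open(sys.argv[1]) as f:
        for name, value in parse_block_stat(f.read()).items():
            print(f"{name}: {value}")
```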
|
@ -176,3 +176,9 @@ Contact: Keith Busch <keith.busch@intel.com>
|
||||
Description:
|
||||
The cache write policy: 0 for write-back, 1 for write-through,
|
||||
other or unknown.
|
||||
|
||||
What: /sys/devices/system/node/nodeX/x86/sgx_total_bytes
|
||||
Date: November 2021
|
||||
Contact: Jarkko Sakkinen <jarkko@kernel.org>
|
||||
Description:
|
||||
The total amount of SGX physical memory in bytes.
|
||||
|
@ -41,14 +41,14 @@ KernelVersion: 5.6.0
|
||||
Contact: dmaengine@vger.kernel.org
|
||||
Description: The maximum number of groups that can be created under this device.
|
||||
|
||||
What: /sys/bus/dsa/devices/dsa<m>/max_tokens
|
||||
Date: Oct 25, 2019
|
||||
KernelVersion: 5.6.0
|
||||
What: /sys/bus/dsa/devices/dsa<m>/max_read_buffers
|
||||
Date: Dec 10, 2021
|
||||
KernelVersion: 5.17.0
|
||||
Contact: dmaengine@vger.kernel.org
|
||||
Description: The total number of bandwidth tokens supported by this device.
|
||||
The bandwidth tokens represent resources within the DSA
|
||||
Description: The total number of read buffers supported by this device.
|
||||
The read buffers represent resources within the DSA
|
||||
implementation, and these resources are allocated by engines to
|
||||
support operations.
|
||||
support operations. See DSA spec v1.2 9.2.4 Total Read Buffers.
|
||||
|
||||
What: /sys/bus/dsa/devices/dsa<m>/max_transfer_size
|
||||
Date: Oct 25, 2019
|
||||
@ -115,13 +115,13 @@ KernelVersion: 5.6.0
|
||||
Contact: dmaengine@vger.kernel.org
|
||||
Description: To indicate if this device is configurable or not.
|
||||
|
||||
What: /sys/bus/dsa/devices/dsa<m>/token_limit
|
||||
Date: Oct 25, 2019
|
||||
KernelVersion: 5.6.0
|
||||
What: /sys/bus/dsa/devices/dsa<m>/read_buffer_limit
|
||||
Date: Dec 10, 2021
|
||||
KernelVersion: 5.17.0
|
||||
Contact: dmaengine@vger.kernel.org
|
||||
Description: The maximum number of bandwidth tokens that may be in use at
|
||||
Description: The maximum number of read buffers that may be in use at
|
||||
one time by operations that access low bandwidth memory in the
|
||||
device.
|
||||
device. See DSA spec v1.2 9.2.8 GENCFG on Global Read Buffer Limit.
|
||||
|
||||
What: /sys/bus/dsa/devices/dsa<m>/cmd_status
|
||||
Date: Aug 28, 2020
|
||||
@ -220,8 +220,38 @@ Contact: dmaengine@vger.kernel.org
|
||||
Description: Show the current number of entries in this WQ if WQ Occupancy
|
||||
Support bit in WQ capabilities is 1.
|
||||
|
||||
What: /sys/bus/dsa/devices/wq<m>.<n>/enqcmds_retries
|
||||
Date: Oct 29, 2021
|
||||
KernelVersion: 5.17.0
|
||||
Contact: dmaengine@vger.kernel.org
|
||||
Description: Indicates the number of retries for an enqcmds submission on a shared WQ.
|
||||
The value written to this attribute is capped at 64.
|
||||
|
||||
What: /sys/bus/dsa/devices/engine<m>.<n>/group_id
|
||||
Date: Oct 25, 2019
|
||||
KernelVersion: 5.6.0
|
||||
Contact: dmaengine@vger.kernel.org
|
||||
Description: The group that this engine belongs to.
|
||||
|
||||
What: /sys/bus/dsa/devices/group<m>.<n>/use_read_buffer_limit
|
||||
Date: Dec 10, 2021
|
||||
KernelVersion: 5.17.0
|
||||
Contact: dmaengine@vger.kernel.org
|
||||
Description: Enable the use of global read buffer limit for the group. See DSA
|
||||
spec v1.2 9.2.18 GRPCFG Use Global Read Buffer Limit.
|
||||
|
||||
What: /sys/bus/dsa/devices/group<m>.<n>/read_buffers_allowed
|
||||
Date: Dec 10, 2021
|
||||
KernelVersion: 5.17.0
|
||||
Contact: dmaengine@vger.kernel.org
|
||||
Description: Indicates max number of read buffers that may be in use at one time
|
||||
by all engines in the group. See DSA spec v1.2 9.2.18 GRPCFG Read
|
||||
Buffers Allowed.
|
||||
|
||||
What: /sys/bus/dsa/devices/group<m>.<n>/read_buffers_reserved
|
||||
Date: Dec 10, 2021
|
||||
KernelVersion: 5.17.0
|
||||
Contact: dmaengine@vger.kernel.org
|
||||
Description: Indicates the number of Read Buffers reserved for the use of
|
||||
engines in the group. See DSA spec v1.2 9.2.18 GRPCFG Read Buffers
|
||||
Reserved.
|
||||
|
@ -27,6 +27,6 @@ Description:
|
||||
(in 1/256 dB)
|
||||
p_volume_res playback volume control resolution
|
||||
(in 1/256 dB)
|
||||
req_number the number of pre-allocated request
|
||||
req_number the number of pre-allocated requests
|
||||
for both capture and playback
|
||||
===================== =======================================
|
||||
|
@ -30,4 +30,6 @@ Description:
|
||||
(in 1/256 dB)
|
||||
p_volume_res playback volume control resolution
|
||||
(in 1/256 dB)
|
||||
req_number the number of pre-allocated requests
|
||||
for both capture and playback
|
||||
===================== =======================================
|
||||
|
@ -21,11 +21,11 @@ Description: Allow the root user to disable/enable in runtime the clock
|
||||
a different engine to disable/enable its clock gating feature.
|
||||
The bitmask is composed of 20 bits:
|
||||
|
||||
======= ============
|
||||
======= ============
|
||||
0 - 7 DMA channels
|
||||
8 - 11 MME engines
|
||||
12 - 19 TPC engines
|
||||
======= ============
|
||||
======= ============
|
||||
|
||||
The bit's location of a specific engine can be determined
|
||||
using (1 << GAUDI_ENGINE_ID_*). GAUDI_ENGINE_ID_* values
|
||||
@ -155,6 +155,13 @@ Description: Triggers an I2C transaction that is generated by the device's
|
||||
CPU. Writing to this file generates a write transaction while
|
||||
reading from the file generates a read transaction
|
||||
|
||||
What: /sys/kernel/debug/habanalabs/hl<n>/i2c_len
|
||||
Date: Dec 2021
|
||||
KernelVersion: 5.17
|
||||
Contact: obitton@habana.ai
|
||||
Description: Sets I2C length in bytes for I2C transaction that is generated by
|
||||
the device's CPU
|
||||
|
||||
What: /sys/kernel/debug/habanalabs/hl<n>/i2c_reg
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
@ -226,12 +233,6 @@ Description: Gets the state dump occurring on a CS timeout or failure.
|
||||
Writing an integer X discards X state dumps, so that the
|
||||
next read would return X+1-st newest state dump.
|
||||
|
||||
What: /sys/kernel/debug/habanalabs/hl<n>/timeout_locked
|
||||
Date: Sep 2021
|
||||
KernelVersion: 5.16
|
||||
Contact: obitton@habana.ai
|
||||
Description: Sets the command submission timeout value in seconds.
|
||||
|
||||
What: /sys/kernel/debug/habanalabs/hl<n>/stop_on_err
|
||||
Date: Mar 2020
|
||||
KernelVersion: 5.6
|
||||
@ -239,6 +240,12 @@ Contact: ogabbay@kernel.org
|
||||
Description: Sets the stop-on-error option for the device engines. Value of
|
||||
"0" is for disable, otherwise enable.
|
||||
|
||||
What: /sys/kernel/debug/habanalabs/hl<n>/timeout_locked
|
||||
Date: Sep 2021
|
||||
KernelVersion: 5.16
|
||||
Contact: obitton@habana.ai
|
||||
Description: Sets the command submission timeout value in seconds.
|
||||
|
||||
What: /sys/kernel/debug/habanalabs/hl<n>/userptr
|
||||
Date: Jan 2019
|
||||
KernelVersion: 5.1
|
||||
|
@ -1,346 +0,0 @@
|
||||
What: /sys/block/<disk>/stat
|
||||
Date: February 2008
|
||||
Contact: Jerome Marchand <jmarchan@redhat.com>
|
||||
Description:
|
||||
The /sys/block/<disk>/stat file displays the I/O
|
||||
statistics of disk <disk>. It contains 17 fields:
|
||||
|
||||
== ==============================================
|
||||
1 reads completed successfully
|
||||
2 reads merged
|
||||
3 sectors read
|
||||
4 time spent reading (ms)
|
||||
5 writes completed
|
||||
6 writes merged
|
||||
7 sectors written
|
||||
8 time spent writing (ms)
|
||||
9 I/Os currently in progress
|
||||
10 time spent doing I/Os (ms)
|
||||
11 weighted time spent doing I/Os (ms)
|
||||
12 discards completed
|
||||
13 discards merged
|
||||
14 sectors discarded
|
||||
15 time spent discarding (ms)
|
||||
16 flush requests completed
|
||||
17 time spent flushing (ms)
|
||||
== ==============================================
|
||||
|
||||
For more details refer to Documentation/admin-guide/iostats.rst
|
||||
|
||||
|
||||
What: /sys/block/<disk>/inflight
|
||||
Date: October 2009
|
||||
Contact: Jens Axboe <axboe@kernel.dk>, Nikanth Karthikesan <knikanth@suse.de>
|
||||
Description:
|
||||
Reports the number of I/O requests currently in progress
|
||||
(pending / in flight) in a device driver. This can be less
|
||||
than the number of requests queued in the block device queue.
|
||||
The report contains 2 fields: one for read requests
|
||||
and one for write requests.
|
||||
The value type is unsigned int.
|
||||
Cf. Documentation/block/stat.rst which contains a single value for
|
||||
requests in flight.
|
||||
This is related to nr_requests in Documentation/block/queue-sysfs.rst
|
||||
and for SCSI device also its queue_depth.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/diskseq
|
||||
Date: February 2021
|
||||
Contact: Matteo Croce <mcroce@microsoft.com>
|
||||
Description:
|
||||
The /sys/block/<disk>/diskseq file reports the disk
|
||||
sequence number, which is a monotonically increasing
|
||||
number assigned to every drive.
|
||||
Some devices, like the loop device, refresh such number
|
||||
every time the backing file is changed.
|
||||
The value type is 64 bit unsigned.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/<part>/stat
|
||||
Date: February 2008
|
||||
Contact: Jerome Marchand <jmarchan@redhat.com>
|
||||
Description:
|
||||
The /sys/block/<disk>/<part>/stat files display the
|
||||
I/O statistics of partition <part>. The format is the
|
||||
same as the above-written /sys/block/<disk>/stat
|
||||
format.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/integrity/format
|
||||
Date: June 2008
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Metadata format for integrity capable block device.
|
||||
E.g. T10-DIF-TYPE1-CRC.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/integrity/read_verify
|
||||
Date: June 2008
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Indicates whether the block layer should verify the
|
||||
integrity of read requests serviced by devices that
|
||||
support sending integrity metadata.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/integrity/tag_size
|
||||
Date: June 2008
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Number of bytes of integrity tag space available per
|
||||
512 bytes of data.
|
||||
|
||||
|
||||
What: /sys/block/<disk>/integrity/device_is_integrity_capable
|
||||
Date: July 2014
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Indicates whether a storage device is capable of storing
|
||||
integrity metadata. Set if the device is T10 PI-capable.
|
||||
|
||||
What: /sys/block/<disk>/integrity/protection_interval_bytes
|
||||
Date: July 2015
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Describes the number of data bytes which are protected
|
||||
by one integrity tuple. Typically the device's logical
|
||||
block size.
|
||||
|
||||
What: /sys/block/<disk>/integrity/write_generate
|
||||
Date: June 2008
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Indicates whether the block layer should automatically
|
||||
generate checksums for write requests bound for
|
||||
devices that support receiving integrity metadata.
|
||||
|
||||
What: /sys/block/<disk>/alignment_offset
|
||||
Date: April 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Storage devices may report a physical block size that is
|
||||
bigger than the logical block size (for instance a drive
|
||||
with 4KB physical sectors exposing 512-byte logical
|
||||
blocks to the operating system). This parameter
|
||||
indicates how many bytes the beginning of the device is
|
||||
offset from the disk's natural alignment.
|
||||
|
||||
What: /sys/block/<disk>/<partition>/alignment_offset
|
||||
Date: April 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Storage devices may report a physical block size that is
|
||||
bigger than the logical block size (for instance a drive
|
||||
with 4KB physical sectors exposing 512-byte logical
|
||||
blocks to the operating system). This parameter
|
||||
indicates how many bytes the beginning of the partition
|
||||
is offset from the disk's natural alignment.
|
||||
|
||||
What: /sys/block/<disk>/queue/logical_block_size
|
||||
Date: May 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
This is the smallest unit the storage device can
|
||||
address. It is typically 512 bytes.
|
||||
|
||||
What: /sys/block/<disk>/queue/physical_block_size
|
||||
Date: May 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
This is the smallest unit a physical storage device can
|
||||
write atomically. It is usually the same as the logical
|
||||
block size but may be bigger. One example is SATA
|
||||
drives with 4KB sectors that expose a 512-byte logical
|
||||
block size to the operating system. For stacked block
|
||||
devices the physical_block_size variable contains the
|
||||
maximum physical_block_size of the component devices.
|
||||
|
||||
What: /sys/block/<disk>/queue/minimum_io_size
|
||||
Date: April 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Storage devices may report a granularity or preferred
|
||||
minimum I/O size which is the smallest request the
|
||||
device can perform without incurring a performance
|
||||
penalty. For disk drives this is often the physical
|
||||
block size. For RAID arrays it is often the stripe
|
||||
chunk size. A properly aligned multiple of
|
||||
minimum_io_size is the preferred request size for
|
||||
workloads where a high number of I/O operations is
|
||||
desired.
|
||||
|
||||
What: /sys/block/<disk>/queue/optimal_io_size
|
||||
Date: April 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Storage devices may report an optimal I/O size, which is
|
||||
the device's preferred unit for sustained I/O. This is
|
||||
rarely reported for disk drives. For RAID arrays it is
|
||||
usually the stripe width or the internal track size. A
|
||||
properly aligned multiple of optimal_io_size is the
|
||||
preferred request size for workloads where sustained
|
||||
throughput is desired. If no optimal I/O size is
|
||||
reported this file contains 0.
|
||||
|
||||
What: /sys/block/<disk>/queue/nomerges
|
||||
Date: January 2010
|
||||
Contact:
|
||||
Description:
|
||||
Standard I/O elevator operations include attempts to
|
||||
merge contiguous I/Os. For known random I/O loads these
|
||||
attempts will always fail and result in extra cycles
|
||||
being spent in the kernel. This allows one to turn off
|
||||
this behavior in one of two ways: When set to 1, complex
|
||||
merge checks are disabled, but the simple one-shot merges
|
||||
with the previous I/O request are enabled. When set to 2,
|
||||
all merge tries are disabled. The default value is 0 -
|
||||
which enables all types of merge tries.
|
||||
|
||||
What: /sys/block/<disk>/discard_alignment
|
||||
Date: May 2011
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Devices that support discard functionality may
|
||||
internally allocate space in units that are bigger than
|
||||
the exported logical block size. The discard_alignment
|
||||
parameter indicates how many bytes the beginning of the
|
||||
device is offset from the internal allocation unit's
|
||||
natural alignment.
|
||||
|
||||
What: /sys/block/<disk>/<partition>/discard_alignment
|
||||
Date: May 2011
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Devices that support discard functionality may
|
||||
internally allocate space in units that are bigger than
|
||||
the exported logical block size. The discard_alignment
|
||||
parameter indicates how many bytes the beginning of the
|
||||
partition is offset from the internal allocation unit's
|
||||
natural alignment.
|
||||
|
||||
What: /sys/block/<disk>/queue/discard_granularity
|
||||
Date: May 2011
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Devices that support discard functionality may
|
||||
internally allocate space using units that are bigger
|
||||
than the logical block size. The discard_granularity
|
||||
parameter indicates the size of the internal allocation
|
||||
unit in bytes if reported by the device. Otherwise the
|
||||
discard_granularity will be set to match the device's
|
||||
physical block size. A discard_granularity of 0 means
|
||||
that the device does not support discard functionality.
|
||||
|
||||
What: /sys/block/<disk>/queue/discard_max_bytes
|
||||
Date: May 2011
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Devices that support discard functionality may have
|
||||
internal limits on the number of bytes that can be
|
||||
trimmed or unmapped in a single operation. Some storage
|
||||
protocols also have inherent limits on the number of
|
||||
blocks that can be described in a single command. The
|
||||
discard_max_bytes parameter is set by the device driver
|
||||
to the maximum number of bytes that can be discarded in
|
||||
a single operation. Discard requests issued to the
|
||||
device must not exceed this limit. A discard_max_bytes
|
||||
value of 0 means that the device does not support
|
||||
discard functionality.
|
||||
|
||||
What: /sys/block/<disk>/queue/discard_zeroes_data
|
||||
Date: May 2011
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Will always return 0. Don't rely on any specific behavior
|
||||
for discards, and don't read this file.
|
||||
|
||||
What: /sys/block/<disk>/queue/write_same_max_bytes
|
||||
Date: January 2012
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Some devices support a write same operation in which a
|
||||
single data block can be written to a range of several
|
||||
contiguous blocks on storage. This can be used to wipe
|
||||
areas on disk or to initialize drives in a RAID
|
||||
configuration. write_same_max_bytes indicates how many
|
||||
bytes can be written in a single write same command. If
|
||||
write_same_max_bytes is 0, write same is not supported
|
||||
by the device.
|
||||
|
||||
What: /sys/block/<disk>/queue/write_zeroes_max_bytes
|
||||
Date: November 2016
|
||||
Contact: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
|
||||
Description:
|
||||
Some devices support a write zeroes operation in which a
|
||||
single request can be issued to zero out the range of
|
||||
contiguous blocks on storage without having any payload
|
||||
in the request. This can be used to optimize writing zeroes
|
||||
to the devices. write_zeroes_max_bytes indicates how many
|
||||
bytes can be written in a single write zeroes command. If
|
||||
write_zeroes_max_bytes is 0, write zeroes is not supported
|
||||
by the device.
|
||||
|
||||
What: /sys/block/<disk>/queue/zoned
|
||||
Date: September 2016
|
||||
Contact: Damien Le Moal <damien.lemoal@wdc.com>
|
||||
Description:
|
||||
zoned indicates if the device is a zoned block device
|
||||
and the zone model of the device if it is indeed zoned.
|
||||
The possible values indicated by zoned are "none" for
|
||||
regular block devices and "host-aware" or "host-managed"
|
||||
for zoned block devices. The characteristics of
|
||||
host-aware and host-managed zoned block devices are
|
||||
described in the ZBC (Zoned Block Commands) and ZAC
|
||||
(Zoned Device ATA Command Set) standards. These standards
|
||||
also define the "drive-managed" zone model. However,
|
||||
since drive-managed zoned block devices do not support
|
||||
zone commands, they will be treated as regular block
|
||||
devices and zoned will report "none".
|
||||
|
||||
What: /sys/block/<disk>/queue/nr_zones
|
||||
Date: November 2018
|
||||
Contact: Damien Le Moal <damien.lemoal@wdc.com>
|
||||
Description:
|
||||
nr_zones indicates the total number of zones of a zoned block
|
||||
device ("host-aware" or "host-managed" zone model). For regular
|
||||
block devices, the value is always 0.
|
||||
|
||||
What: /sys/block/<disk>/queue/max_active_zones
|
||||
Date: July 2020
|
||||
Contact: Niklas Cassel <niklas.cassel@wdc.com>
|
||||
Description:
|
||||
For zoned block devices (zoned attribute indicating
|
||||
"host-managed" or "host-aware"), the sum of zones belonging to
|
||||
any of the zone states: EXPLICIT OPEN, IMPLICIT OPEN or CLOSED,
|
||||
is limited by this value. If this value is 0, there is no limit.
|
||||
|
||||
What: /sys/block/<disk>/queue/max_open_zones
|
||||
Date: July 2020
|
||||
Contact: Niklas Cassel <niklas.cassel@wdc.com>
|
||||
Description:
|
||||
For zoned block devices (zoned attribute indicating
|
||||
"host-managed" or "host-aware"), the sum of zones belonging to
|
||||
any of the zone states: EXPLICIT OPEN or IMPLICIT OPEN,
|
||||
is limited by this value. If this value is 0, there is no limit.
|
||||
|
||||
What: /sys/block/<disk>/queue/chunk_sectors
|
||||
Date: September 2016
|
||||
Contact: Hannes Reinecke <hare@suse.com>
|
||||
Description:
|
||||
chunk_sectors has different meaning depending on the type
|
||||
of the disk. For a RAID device (dm-raid), chunk_sectors
|
||||
indicates the size in 512B sectors of the RAID volume
|
||||
stripe segment. For a zoned block device, either
|
||||
host-aware or host-managed, chunk_sectors indicates the
|
||||
size in 512B sectors of the zones of the device, with
|
||||
the eventual exception of the last zone of the device
|
||||
which may be smaller.
|
||||
|
||||
What: /sys/block/<disk>/queue/io_timeout
|
||||
Date: November 2018
|
||||
Contact: Weiping Zhang <zhangweiping@didiglobal.com>
|
||||
Description:
|
||||
io_timeout is the request timeout in milliseconds. If a request
|
||||
does not complete in this time then the block driver timeout
|
||||
handler is invoked. That timeout handler can decide to retry
|
||||
the request, to fail it or to start a device recovery strategy.
|
16
Documentation/ABI/testing/sysfs-bus-iio-filter-admv8818
Normal file
@ -0,0 +1,16 @@
|
||||
What: /sys/bus/iio/devices/iio:deviceX/filter_mode_available
|
||||
KernelVersion:
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Reading this returns the valid values that can be written to the
|
||||
on_altvoltage0_mode attribute:
|
||||
|
||||
- auto -> Adjust bandpass filter to track changes in input clock rate.
|
||||
- manual -> disable/unregister the clock rate notifier / input clock tracking.
|
||||
|
||||
What: /sys/bus/iio/devices/iio:deviceX/filter_mode
|
||||
KernelVersion:
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
This attribute configures the filter mode.
|
||||
Reading returns the actual mode.
|
38
Documentation/ABI/testing/sysfs-bus-iio-frequency-admv1013
Normal file
@ -0,0 +1,38 @@
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_altvoltage0-1_i_calibphase
|
||||
KernelVersion:
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Read/write unscaled value for the Local Oscillator path quadrature I phase shift.
|
||||
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_altvoltage0-1_q_calibphase
|
||||
KernelVersion:
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Read/write unscaled value for the Local Oscillator path quadrature Q phase shift.
|
||||
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_altvoltage0_i_calibbias
|
||||
KernelVersion:
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Read/write value for the Local Oscillator Feedthrough Offset Calibration I Positive
|
||||
side.
|
||||
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_altvoltage0_q_calibbias
|
||||
KernelVersion:
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Read/write value for the Local Oscillator Feedthrough Offset Calibration Q Positive side.
|
||||
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_altvoltage1_i_calibbias
|
||||
KernelVersion:
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Read/write raw value for the Local Oscillator Feedthrough Offset Calibration I Negative
|
||||
side.
|
||||
|
||||
What: /sys/bus/iio/devices/iio:deviceX/in_altvoltage1_q_calibbias
|
||||
KernelVersion:
|
||||
Contact: linux-iio@vger.kernel.org
|
||||
Description:
|
||||
Read/write raw value for the Local Oscillator Feedthrough Offset Calibration Q Negative
|
||||
side.
|
@ -244,6 +244,15 @@ Description:
|
||||
is permitted, "u2" if only u2 is permitted, "u1_u2" if both u1 and
|
||||
u2 are permitted.
|
||||
|
||||
What: /sys/bus/usb/devices/.../<hub_interface>/port<X>/connector
|
||||
Date: December 2021
|
||||
Contact: Heikki Krogerus <heikki.krogerus@linux.intel.com>
|
||||
Description:
|
||||
Link to the USB Type-C connector when available. This link is
|
||||
only created when USB Type-C Connector Class is enabled, and
|
||||
only if the system firmware is capable of describing the
|
||||
connection between a port and its connector.
|
||||
|
||||
What: /sys/bus/usb/devices/.../power/usb2_lpm_l1_timeout
|
||||
Date: May 2013
|
||||
Contact: Mathias Nyman <mathias.nyman@linux.intel.com>
|
||||
|
57
Documentation/ABI/testing/sysfs-bus-vdpa
Normal file
@ -0,0 +1,57 @@
|
||||
What: /sys/bus/vdpa/driver_autoprobe
|
||||
Date: March 2020
|
||||
Contact: virtualization@lists.linux-foundation.org
|
||||
Description:
|
||||
This file determines whether new devices are immediately bound
|
||||
to a driver after the creation. It initially contains 1, which
|
||||
means the kernel automatically binds devices to a compatible
|
||||
driver immediately after they are created.
|
||||
|
||||
Writing "0" to this file disables this feature; any other string
|
||||
enables it.
|
||||
|
||||
What: /sys/bus/vdpa/driver_probe
|
||||
Date: March 2020
|
||||
Contact: virtualization@lists.linux-foundation.org
|
||||
Description:
|
||||
Writing a device name to this file will cause the kernel to bind
|
||||
devices to a compatible driver.
|
||||
|
||||
This can be useful when /sys/bus/vdpa/driver_autoprobe is
|
||||
disabled.
|
||||
|
||||
What: /sys/bus/vdpa/drivers/.../bind
|
||||
Date: March 2020
|
||||
Contact: virtualization@lists.linux-foundation.org
|
||||
Description:
|
||||
Writing a device name to this file will cause the driver to
|
||||
attempt to bind to the device. This is useful for overriding
|
||||
default bindings.
|
||||
|
||||
What: /sys/bus/vdpa/drivers/.../unbind
|
||||
Date: March 2020
|
||||
Contact: virtualization@lists.linux-foundation.org
|
||||
Description:
|
||||
Writing a device name to this file will cause the driver to
|
||||
attempt to unbind from the device. This may be useful when
|
||||
overriding default bindings.
|
||||
|
||||
What: /sys/bus/vdpa/devices/.../driver_override
|
||||
Date: November 2021
|
||||
Contact: virtualization@lists.linux-foundation.org
|
||||
Description:
|
||||
This file allows the driver for a device to be specified.
|
||||
When specified, only a driver with a name matching the value
|
||||
written to driver_override will have an opportunity to bind to
|
||||
the device. The override is specified by writing a string to the
|
||||
driver_override file (echo vhost-vdpa > driver_override) and may
|
||||
be cleared with an empty string (echo > driver_override).
|
||||
This returns the device to standard matching rules binding.
|
||||
Writing to driver_override does not automatically unbind the
|
||||
device from its current driver or make any attempt to
|
||||
automatically load the specified driver. If no driver with a
|
||||
matching name is currently loaded in the kernel, the device will
|
||||
not bind to any driver. This also allows devices to opt-out of
|
||||
driver binding using a driver_override name such as "none".
|
||||
Only a single driver may be specified in the override; there is
|
||||
no support for parsing delimiters.
|
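The driver_override rules above can be sketched as a small helper. This is only an illustration: the function names, the `device_dir` parameter, and the delimiter check are assumptions made for the sketch and are not part of the patch. The only behaviour taken from the ABI text is that a single driver name is written to set the override and an empty string clears it.

```python
from pathlib import Path


def set_driver_override(device_dir, driver_name):
    """Write one driver name to <device_dir>/driver_override."""
    # The ABI text says delimiters are not parsed, so this sketch
    # rejects anything that looks like a list (hypothetical check).
    if not driver_name or any(c in driver_name for c in " ,;\n"):
        raise ValueError("exactly one driver name is expected")
    Path(device_dir, "driver_override").write_text(driver_name + "\n")


def clear_driver_override(device_dir):
    """Mirror `echo > driver_override`: write an empty line."""
    Path(device_dir, "driver_override").write_text("\n")
```

On a real system `device_dir` would be a directory under /sys/bus/vdpa/devices/; as the text notes, writing the override neither unbinds the current driver nor loads the named one.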
@ -161,6 +161,15 @@ Description:
|
||||
power-on:
|
||||
Representing a password required to use
|
||||
the system
|
||||
system-mgmt:
|
||||
Representing System Management password.
|
||||
See Lenovo extensions section for details
|
||||
HDD:
|
||||
Representing HDD password
|
||||
See Lenovo extensions section for details
|
||||
NVMe:
|
||||
Representing NVMe password
|
||||
See Lenovo extensions section for details
|
||||
|
||||
mechanism:
|
||||
The means of authentication. This attribute is mandatory.
|
||||
@ -207,6 +216,13 @@ Description:
|
||||
|
||||
On Lenovo systems the following additional settings are available:
|
||||
|
||||
role: system-mgmt This gives the same authority as the bios-admin password to control
|
||||
security related features. The authorities allocated can be set via
|
||||
the BIOS menu SMP Access Control Policy
|
||||
|
||||
role: HDD & NVMe This password is used to unlock access to the drive at boot. Note see
|
||||
'level' and 'index' extensions below.
|
||||
|
||||
lenovo_encoding:
|
||||
The encoding method that is used. This can be either "ascii"
|
||||
or "scancode". Default is set to "ascii"
|
||||
@ -216,6 +232,22 @@ Description:
|
||||
two char code (e.g. "us", "fr", "gr") and may vary per platform.
|
||||
Default is set to "us"
|
||||
|
||||
level:
|
||||
Available for HDD and NVMe authentication to set 'user' or 'master'
|
||||
privilege level.
|
||||
If only the user password is configured then this should be used to
|
||||
unlock the drive at boot. If both master and user passwords are set
|
||||
then either can be used. If a master password is set a user password
|
||||
is required.
|
||||
This attribute defaults to 'user' level
|
||||
|
||||
index:
|
||||
Used with HDD and NVMe authentication to set the drive index
|
||||
that is being referenced (e.g. hdd0, hdd1, etc.)
|
||||
This attribute defaults to device 0.
|
||||
|
||||
|
||||
|
||||
What: /sys/class/firmware-attributes/*/attributes/pending_reboot
|
||||
Date: February 2021
|
||||
KernelVersion: 5.11
|
||||
|
@ -413,7 +413,7 @@ Description:
|
||||
"Over voltage", "Unspecified failure", "Cold",
|
||||
"Watchdog timer expire", "Safety timer expire",
|
||||
"Over current", "Calibration required", "Warm",
|
||||
"Cool", "Hot"
|
||||
"Cool", "Hot", "No battery"
|
||||
|
||||
What: /sys/class/power_supply/<supply_name>/precharge_current
|
||||
Date: June 2017
|
||||
@ -455,6 +455,20 @@ Description:
|
||||
"Unknown", "Charging", "Discharging",
|
||||
"Not charging", "Full"
|
||||
|
||||
What: /sys/class/power_supply/<supply_name>/charge_behaviour
|
||||
Date: November 2021
|
||||
Contact: linux-pm@vger.kernel.org
|
||||
Description:
|
||||
Represents the charging behaviour.
|
||||
|
||||
Access: Read, Write
|
||||
|
||||
Valid values:
|
||||
================ ====================================
|
||||
auto: Charge normally, respect thresholds
|
||||
inhibit-charge: Do not charge while AC is attached
|
||||
force-discharge: Force discharge while AC is attached
|
||||
|
||||
What: /sys/class/power_supply/<supply_name>/technology
|
||||
Date: May 2007
|
||||
Contact: linux-pm@vger.kernel.org
|
||||
|
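The charge_behaviour attribute added in this hunk accepts three values. A minimal sketch of a writer that validates against that list follows; the function name and the `supply_dir` parameter are hypothetical, and the sketch assumes the value is written as plain text followed by a newline.

```python
from pathlib import Path

# Values copied from the "Valid values" table in the ABI text.
VALID_CHARGE_BEHAVIOURS = ("auto", "inhibit-charge", "force-discharge")


def set_charge_behaviour(supply_dir, behaviour):
    """Write a validated value to <supply_dir>/charge_behaviour."""
    if behaviour not in VALID_CHARGE_BEHAVIOURS:
        raise ValueError(f"unsupported charge behaviour: {behaviour!r}")
    Path(supply_dir, "charge_behaviour").write_text(behaviour + "\n")
```

On real hardware `supply_dir` would be /sys/class/power_supply/<supply_name>; whether a given driver supports every value is not stated in the text.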
@ -666,3 +666,18 @@ Description: Preferred MTE tag checking mode
|
||||
================ ==============================================
|
||||
|
||||
See also: Documentation/arm64/memory-tagging-extension.rst
|
||||
|
||||
What: /sys/devices/system/cpu/nohz_full
|
||||
Date: Apr 2015
|
||||
Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
|
||||
Description:
|
||||
(RO) the list of CPUs that are in nohz_full mode.
|
||||
These CPUs are set by boot parameter "nohz_full=".
|
||||
|
||||
What: /sys/devices/system/cpu/isolated
|
||||
Date: Apr 2015
|
||||
Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
|
||||
Description:
|
||||
(RO) the list of CPUs that are isolated and don't
|
||||
participate in load balancing. These CPUs are set by
|
||||
boot parameter "isolcpus=".
|
||||
|
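Both nohz_full and isolated report a list of CPUs. The sketch below parses such a list under the assumption that it uses the comma-separated range notation common to kernel CPU lists (for example "1-3,8"); the ABI text itself does not spell out the format, the function name is hypothetical, and stride or group syntax is not handled.

```python
def parse_cpu_list(text):
    """Expand a string such as "1-3,8" into a sorted list of CPU numbers."""
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            # An empty file (no CPUs in the set) yields an empty chunk.
            continue
        if "-" in chunk:
            first, last = chunk.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(chunk))
    return sorted(cpus)
```

The input would typically be the contents of /sys/devices/system/cpu/isolated or /sys/devices/system/cpu/nohz_full.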
Documentation/ABI/testing/sysfs-fs-erofs (new normal file, 16 lines)
@ -0,0 +1,16 @@
|
||||
What: /sys/fs/erofs/features/
|
||||
Date: November 2021
|
||||
Contact: "Huang Jianan" <huangjianan@oppo.com>
|
||||
Description: Shows all enabled kernel features.
|
||||
Supported features:
|
||||
zero_padding, compr_cfgs, big_pcluster, chunked_file,
|
||||
device_table, compr_head2, sb_chksum.
|
||||
|
||||
What: /sys/fs/erofs/<disk>/sync_decompress
|
||||
Date: November 2021
|
||||
Contact: "Huang Jianan" <huangjianan@oppo.com>
|
||||
Description: Control strategy of sync decompression
|
||||
- 0 (default, auto): enable for readpage, and enable for
|
||||
readahead on atomic contexts only,
|
||||
- 1 (force on): enable for readpage and readahead.
|
||||
- 2 (force off): disable for all situations.
|
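The three sync_decompress modes can be read as a small decision table. The function below models only the wording of the description above; it is not the erofs implementation, and its name and parameters are made up for illustration.

```python
def uses_sync_decompress(mode, is_readahead, atomic_context):
    """Return True if the described policy decompresses synchronously."""
    if mode == 0:
        # auto: always for readpage, for readahead only in atomic context
        return (not is_readahead) or atomic_context
    if mode == 1:
        # force on: readpage and readahead
        return True
    if mode == 2:
        # force off: never
        return False
    raise ValueError(f"unknown sync_decompress mode: {mode}")
```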
@ -112,6 +112,11 @@ Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
|
||||
Description: Set timeout to issue discard commands during umount.
|
||||
Default: 5 secs
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/pending_discard
|
||||
Date: November 2021
|
||||
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
|
||||
Description: Shows the number of pending discard commands in the queue.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/max_victim_search
|
||||
Date: January 2014
|
||||
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
|
||||
@ -528,3 +533,10 @@ Description: With "mode=fragment:block" mount options, we can scatter block allo
|
||||
f2fs will allocate 1..<max_fragment_chunk> blocks in a chunk and make a hole
|
||||
in the length of 1..<max_fragment_hole> by turns. This value can be set
|
||||
between 1..512 and the default value is 4.
|
||||
|
||||
What: /sys/fs/f2fs/<disk>/gc_urgent_high_remaining
|
||||
Date: December 2021
|
||||
Contact: "Daeho Jeong" <daehojeong@google.com>
|
||||
Description: You can set the trial count limit for GC urgent high mode with this value.
|
||||
If GC thread gets to the limit, the mode will turn back to GC normal mode.
|
||||
By default, the value is zero, which means there is no limit like before.
|
||||
|
Documentation/ABI/testing/sysfs-fs-ubifs (new normal file, 35 lines)
@ -0,0 +1,35 @@
|
||||
What: /sys/fs/ubifsX_Y/error_magic
|
||||
Date: October 2021
|
||||
KernelVersion: 5.16
|
||||
Contact: linux-mtd@lists.infradead.org
|
||||
Description:
|
||||
Exposes magic errors: every node starts with a magic number.
|
||||
|
||||
This counter keeps track of the number of accesses of nodes
|
||||
with a corrupted magic number.
|
||||
|
||||
The counter is reset to 0 with a remount.
|
||||
|
||||
What: /sys/fs/ubifsX_Y/error_node
|
||||
Date: October 2021
|
||||
KernelVersion: 5.16
|
||||
Contact: linux-mtd@lists.infradead.org
|
||||
Description:
|
||||
Exposes node errors. Every node embeds its type.
|
||||
|
||||
This counter keeps track of the number of accesses of nodes
|
||||
with a corrupted node type.
|
||||
|
||||
The counter is reset to 0 with a remount.
|
||||
|
||||
What: /sys/fs/ubifsX_Y/error_crc
|
||||
Date: October 2021
|
||||
KernelVersion: 5.16
|
||||
Contact: linux-mtd@lists.infradead.org
|
||||
Description:
|
||||
Exposes crc errors: every node embeds a crc checksum.
|
||||
|
||||
This counter keeps track of the number of accesses of nodes
|
||||
with a bad crc checksum.
|
||||
|
||||
The counter is reset to 0 with a remount.
|
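The three UBIFS error counters can be collected in one pass. This sketch assumes each file contains a single decimal integer, which the ABI text implies but does not state; the function name and the `ubifs_dir` parameter are hypothetical.

```python
from pathlib import Path

# Counter file names taken from the ABI entries above.
UBIFS_ERROR_COUNTERS = ("error_magic", "error_node", "error_crc")


def read_ubifs_error_counters(ubifs_dir):
    """Return {counter_name: value} for the three error counters."""
    return {
        name: int(Path(ubifs_dir, name).read_text().strip())
        for name in UBIFS_ERROR_COUNTERS
    }
```

Here `ubifs_dir` stands for a /sys/fs/ubifsX_Y directory. Because the counters reset to 0 on remount, values are only comparable within one mount.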
@ -19,6 +19,8 @@ endif
|
||||
SPHINXBUILD = sphinx-build
|
||||
SPHINXOPTS =
|
||||
SPHINXDIRS = .
|
||||
DOCS_THEME =
|
||||
DOCS_CSS =
|
||||
_SPHINXDIRS = $(sort $(patsubst $(srctree)/Documentation/%/index.rst,%,$(wildcard $(srctree)/Documentation/*/index.rst)))
|
||||
SPHINX_CONF = conf.py
|
||||
PAPER =
|
||||
@ -84,7 +86,10 @@ quiet_cmd_sphinx = SPHINX $@ --> file://$(abspath $(BUILDDIR)/$3/$4)
|
||||
-D version=$(KERNELVERSION) -D release=$(KERNELRELEASE) \
|
||||
$(ALLSPHINXOPTS) \
|
||||
$(abspath $(srctree)/$(src)/$5) \
|
||||
$(abspath $(BUILDDIR)/$3/$4)
|
||||
$(abspath $(BUILDDIR)/$3/$4) && \
|
||||
if [ "x$(DOCS_CSS)" != "x" ]; then \
|
||||
cp $(if $(patsubst /%,,$(DOCS_CSS)),$(abspath $(srctree)/$(DOCS_CSS)),$(DOCS_CSS)) $(BUILDDIR)/$3/_static/; \
|
||||
fi
|
||||
|
||||
htmldocs:
|
||||
@$(srctree)/scripts/sphinx-pre-install --version-check
|
||||
@ -154,4 +159,8 @@ dochelp:
|
||||
@echo ' make SPHINX_CONF={conf-file} [target] use *additional* sphinx-build'
|
||||
@echo ' configuration. This is e.g. useful to build with nit-picking config.'
|
||||
@echo
|
||||
@echo ' make DOCS_THEME={sphinx-theme} selects a different Sphinx theme.'
|
||||
@echo
|
||||
@echo ' make DOCS_CSS={a .css file} adds a DOCS_CSS override file for html/epub output.'
|
||||
@echo
|
||||
@echo ' Default location for the generated documents is Documentation/output'
|
||||
|
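The `$(if $(patsubst /%,,$(DOCS_CSS)),...)` expression in the Makefile hunk above picks the source path for the `cp`: a DOCS_CSS value beginning with "/" is used unchanged, and anything else is resolved against the source tree. A rough Python equivalent, with a hypothetical function name and written only to illustrate that rule:

```python
import os


def resolve_docs_css(docs_css, srctree):
    """Return the CSS path the Makefile rule would copy, or None if unset."""
    if not docs_css:
        # Matches the shell guard: nothing is copied when DOCS_CSS is empty.
        return None
    if docs_css.startswith("/"):
        # patsubst /%,, yields an empty string, so the value is used as-is.
        return docs_css
    # Relative value: $(abspath $(srctree)/$(DOCS_CSS))
    return os.path.abspath(os.path.join(srctree, docs_css))
```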
@ -116,7 +116,7 @@
|
||||
<flowRoot
|
||||
xml:space="preserve"
|
||||
id="flowRoot2985"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"><flowRegion
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"><flowRegion
|
||||
id="flowRegion2987"><rect
|
||||
id="rect2989"
|
||||
width="82.85714"
|
||||
@ -125,7 +125,7 @@
|
||||
y="492.36218" /></flowRegion><flowPara
|
||||
id="flowPara2991" /></flowRoot> <text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="362.371"
|
||||
y="262.51819"
|
||||
id="text4441"
|
||||
|
@ -116,7 +116,7 @@
|
||||
<flowRoot
|
||||
xml:space="preserve"
|
||||
id="flowRoot2985"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"><flowRegion
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"><flowRegion
|
||||
id="flowRegion2987"><rect
|
||||
id="rect2989"
|
||||
width="82.85714"
|
||||
@ -125,7 +125,7 @@
|
||||
y="492.36218" /></flowRegion><flowPara
|
||||
id="flowPara2991" /></flowRoot> <text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="362.371"
|
||||
y="262.51819"
|
||||
id="text4441"
|
||||
|
@ -116,7 +116,7 @@
|
||||
<flowRoot
|
||||
xml:space="preserve"
|
||||
id="flowRoot2985"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"><flowRegion
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"><flowRegion
|
||||
id="flowRegion2987"><rect
|
||||
id="rect2989"
|
||||
width="82.85714"
|
||||
@ -125,7 +125,7 @@
|
||||
y="492.36218" /></flowRegion><flowPara
|
||||
id="flowPara2991" /></flowRoot> <text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="362.371"
|
||||
y="262.51819"
|
||||
id="text4441"
|
||||
|
@ -116,7 +116,7 @@
|
||||
<flowRoot
|
||||
xml:space="preserve"
|
||||
id="flowRoot2985"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"><flowRegion
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"><flowRegion
|
||||
id="flowRegion2987"><rect
|
||||
id="rect2989"
|
||||
width="82.85714"
|
||||
@ -125,7 +125,7 @@
|
||||
y="492.36218" /></flowRegion><flowPara
|
||||
id="flowPara2991" /></flowRoot> <text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="362.371"
|
||||
y="262.51819"
|
||||
id="text4441"
|
||||
|
@ -116,7 +116,7 @@
|
||||
<flowRoot
|
||||
xml:space="preserve"
|
||||
id="flowRoot2985"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"><flowRegion
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"><flowRegion
|
||||
id="flowRegion2987"><rect
|
||||
id="rect2989"
|
||||
width="82.85714"
|
||||
@ -125,7 +125,7 @@
|
||||
y="492.36218" /></flowRegion><flowPara
|
||||
id="flowPara2991" /></flowRoot> <text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="362.371"
|
||||
y="262.51819"
|
||||
id="text4441"
|
||||
|
@ -116,7 +116,7 @@
|
||||
<flowRoot
|
||||
xml:space="preserve"
|
||||
id="flowRoot2985"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"><flowRegion
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"><flowRegion
|
||||
id="flowRegion2987"><rect
|
||||
id="rect2989"
|
||||
width="82.85714"
|
||||
@ -125,7 +125,7 @@
|
||||
y="492.36218" /></flowRegion><flowPara
|
||||
id="flowPara2991" /></flowRoot> <text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="362.371"
|
||||
y="262.51819"
|
||||
id="text4441"
|
||||
|
@ -116,7 +116,7 @@
|
||||
<flowRoot
|
||||
xml:space="preserve"
|
||||
id="flowRoot2985"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"><flowRegion
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"><flowRegion
|
||||
id="flowRegion2987"><rect
|
||||
id="rect2989"
|
||||
width="82.85714"
|
||||
@ -125,7 +125,7 @@
|
||||
y="492.36218" /></flowRegion><flowPara
|
||||
id="flowPara2991" /></flowRoot> <text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="362.371"
|
||||
y="262.51819"
|
||||
id="text4441"
|
||||
|
@ -116,7 +116,7 @@
|
||||
<flowRoot
|
||||
xml:space="preserve"
|
||||
id="flowRoot2985"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"><flowRegion
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"><flowRegion
|
||||
id="flowRegion2987"><rect
|
||||
id="rect2989"
|
||||
width="82.85714"
|
||||
@ -125,7 +125,7 @@
|
||||
y="492.36218" /></flowRegion><flowPara
|
||||
id="flowPara2991" /></flowRoot> <text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="362.371"
|
||||
y="262.51819"
|
||||
id="text4441"
|
||||
|
@ -116,7 +116,7 @@
|
||||
<flowRoot
|
||||
xml:space="preserve"
|
||||
id="flowRoot2985"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"><flowRegion
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"><flowRegion
|
||||
id="flowRegion2987"><rect
|
||||
id="rect2989"
|
||||
width="82.85714"
|
||||
@ -125,7 +125,7 @@
|
||||
y="492.36218" /></flowRegion><flowPara
|
||||
id="flowPara2991" /></flowRoot> <text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="362.371"
|
||||
y="262.51819"
|
||||
id="text4441"
|
||||
|
@ -88,7 +88,7 @@
|
||||
<flowRoot
|
||||
xml:space="preserve"
|
||||
id="flowRoot2985"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"><flowRegion
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"><flowRegion
|
||||
id="flowRegion2987"><rect
|
||||
id="rect2989"
|
||||
width="82.85714"
|
||||
@ -103,7 +103,7 @@
|
||||
id="text2993"
|
||||
y="-261.66608"
|
||||
x="412.12299"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
xml:space="preserve"
|
||||
transform="matrix(0,1,-1,0,0,0)"><tspan
|
||||
y="-261.66608"
|
||||
@ -135,7 +135,7 @@
|
||||
</g>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="112.04738"
|
||||
y="268.18076"
|
||||
id="text4429"
|
||||
@ -146,7 +146,7 @@
|
||||
y="268.18076">WRITE_ONCE(a, 1);</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="112.04738"
|
||||
y="439.13766"
|
||||
id="text4441"
|
||||
@ -157,7 +157,7 @@
|
||||
y="439.13766">WRITE_ONCE(b, 1);</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="255.60869"
|
||||
y="309.29346"
|
||||
id="text4445"
|
||||
@ -168,7 +168,7 @@
|
||||
y="309.29346">r1 = READ_ONCE(a);</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="255.14423"
|
||||
y="520.61786"
|
||||
id="text4449"
|
||||
@ -179,7 +179,7 @@
|
||||
y="520.61786">WRITE_ONCE(c, 1);</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="396.10254"
|
||||
y="384.71124"
|
||||
id="text4453"
|
||||
@ -190,7 +190,7 @@
|
||||
y="384.71124">r2 = READ_ONCE(b);</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="396.10254"
|
||||
y="582.13617"
|
||||
id="text4457"
|
||||
@ -201,7 +201,7 @@
|
||||
y="582.13617">r3 = READ_ONCE(c);</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="112.08231"
|
||||
y="213.91006"
|
||||
id="text4461"
|
||||
@ -212,7 +212,7 @@
|
||||
y="213.91006">thread0()</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="252.34512"
|
||||
y="213.91006"
|
||||
id="text4461-6"
|
||||
@ -223,7 +223,7 @@
|
||||
y="213.91006">thread1()</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="396.42557"
|
||||
y="213.91006"
|
||||
id="text4461-2"
|
||||
@ -251,7 +251,7 @@
|
||||
inkscape:connector-curvature="0" />
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="111.75929"
|
||||
y="251.53981"
|
||||
id="text4429-8"
|
||||
@ -262,7 +262,7 @@
|
||||
y="251.53981">rcu_read_lock();</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="396.10254"
|
||||
y="367.91556"
|
||||
id="text4429-8-9"
|
||||
@ -273,7 +273,7 @@
|
||||
y="367.91556">rcu_read_lock();</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="396.10254"
|
||||
y="597.40289"
|
||||
id="text4429-8-9-3"
|
||||
@ -284,7 +284,7 @@
|
||||
y="597.40289">rcu_read_unlock();</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="111.75929"
|
||||
y="453.15311"
|
||||
id="text4429-8-9-3-1"
|
||||
@ -300,7 +300,7 @@
|
||||
inkscape:connector-curvature="0" />
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="394.94427"
|
||||
y="345.66351"
|
||||
id="text4648"
|
||||
@ -324,7 +324,7 @@
|
||||
sodipodi:open="true" />
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="112.11968"
|
||||
y="475.77856"
|
||||
id="text4648-4"
|
||||
@ -361,7 +361,7 @@
|
||||
sodipodi:open="true" />
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="254.85066"
|
||||
y="348.96619"
|
||||
id="text4648-4-3"
|
||||
|
Image size before: 17 KiB; after: 17 KiB
@ -116,7 +116,7 @@
|
||||
<flowRoot
|
||||
xml:space="preserve"
|
||||
id="flowRoot2985"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"><flowRegion
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"><flowRegion
|
||||
id="flowRegion2987"><rect
|
||||
id="rect2989"
|
||||
width="82.85714"
|
||||
@ -131,7 +131,7 @@
|
||||
id="text2993"
|
||||
y="-261.66608"
|
||||
x="436.12299"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
xml:space="preserve"
|
||||
transform="matrix(0,1,-1,0,0,0)"><tspan
|
||||
y="-261.66608"
|
||||
@ -163,7 +163,7 @@
|
||||
</g>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="112.04738"
|
||||
y="268.18076"
|
||||
id="text4429"
|
||||
@ -174,7 +174,7 @@
|
||||
y="268.18076">WRITE_ONCE(a, 1);</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="112.04738"
|
||||
y="487.13766"
|
||||
id="text4441"
|
||||
@ -185,7 +185,7 @@
|
||||
y="487.13766">WRITE_ONCE(b, 1);</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="255.60869"
|
||||
y="297.29346"
|
||||
id="text4445"
|
||||
@ -196,7 +196,7 @@
|
||||
y="297.29346">r1 = READ_ONCE(a);</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="255.14423"
|
||||
y="554.61786"
|
||||
id="text4449"
|
||||
@ -207,7 +207,7 @@
|
||||
y="554.61786">WRITE_ONCE(c, 1);</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="396.10254"
|
||||
y="370.71124"
|
||||
id="text4453"
|
||||
@ -218,7 +218,7 @@
|
||||
y="370.71124">WRITE_ONCE(d, 1);</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="396.10254"
|
||||
y="572.13617"
|
||||
id="text4457"
|
||||
@ -229,7 +229,7 @@
|
||||
y="572.13617">r2 = READ_ONCE(c);</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="112.08231"
|
||||
y="213.91006"
|
||||
id="text4461"
|
||||
@ -240,7 +240,7 @@
|
||||
y="213.91006">thread0()</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="252.34512"
|
||||
y="213.91006"
|
||||
id="text4461-6"
|
||||
@ -251,7 +251,7 @@
|
||||
y="213.91006">thread1()</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="396.42557"
|
||||
y="213.91006"
|
||||
id="text4461-2"
|
||||
@ -281,7 +281,7 @@
|
||||
sodipodi:nodetypes="cc" />
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="111.75929"
|
||||
y="251.53981"
|
||||
id="text4429-8"
|
||||
@ -292,7 +292,7 @@
|
||||
y="251.53981">rcu_read_lock();</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="396.10254"
|
||||
y="353.91556"
|
||||
id="text4429-8-9"
|
||||
@ -303,7 +303,7 @@
|
||||
y="353.91556">rcu_read_lock();</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="396.10254"
|
||||
y="587.40289"
|
||||
id="text4429-8-9-3"
|
||||
@ -314,7 +314,7 @@
|
||||
y="587.40289">rcu_read_unlock();</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="111.75929"
|
||||
y="501.15311"
|
||||
id="text4429-8-9-3-1"
|
||||
@ -331,7 +331,7 @@
|
||||
sodipodi:nodetypes="cc" />
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="394.94427"
|
||||
y="331.66351"
|
||||
id="text4648"
|
||||
@ -355,7 +355,7 @@
|
||||
sodipodi:open="true" />
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="112.11968"
|
||||
y="523.77856"
|
||||
id="text4648-4"
|
||||
@ -392,7 +392,7 @@
|
||||
sodipodi:open="true" />
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="254.85066"
|
||||
y="336.96619"
|
||||
id="text4648-4-3"
|
||||
@ -421,7 +421,7 @@
|
||||
id="text2993-7"
|
||||
y="-261.66608"
|
||||
x="440.12299"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
xml:space="preserve"
|
||||
transform="matrix(0,1,-1,0,0,0)"><tspan
|
||||
y="-261.66608"
|
||||
@ -453,7 +453,7 @@
|
||||
</g>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="541.70508"
|
||||
y="387.6217"
|
||||
id="text4445-0"
|
||||
@ -464,7 +464,7 @@
|
||||
y="387.6217">r3 = READ_ONCE(d);</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="541.2406"
|
||||
y="646.94611"
|
||||
id="text4449-6"
|
||||
@ -488,7 +488,7 @@
|
||||
sodipodi:open="true" />
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="540.94702"
|
||||
y="427.29443"
|
||||
id="text4648-4-3-1"
|
||||
@ -499,7 +499,7 @@
|
||||
y="427.29443">QS</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="686.27747"
|
||||
y="461.83929"
|
||||
id="text4453-7"
|
||||
@ -510,7 +510,7 @@
|
||||
y="461.83929">r4 = READ_ONCE(b);</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="686.27747"
|
||||
y="669.26422"
|
||||
id="text4457-9"
|
||||
@ -521,7 +521,7 @@
|
||||
y="669.26422">r5 = READ_ONCE(e);</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="686.27747"
|
||||
y="445.04358"
|
||||
id="text4429-8-9-33"
|
||||
@ -532,7 +532,7 @@
|
||||
y="445.04358">rcu_read_lock();</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="686.27747"
|
||||
y="684.53094"
|
||||
id="text4429-8-9-3-8"
|
||||
@ -543,7 +543,7 @@
|
||||
y="684.53094">rcu_read_unlock();</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="685.11914"
|
||||
y="422.79153"
|
||||
id="text4648-9"
|
||||
@ -567,7 +567,7 @@
|
||||
sodipodi:open="true" />
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="397.85934"
|
||||
y="609.59003"
|
||||
id="text4648-5"
|
||||
@ -591,7 +591,7 @@
|
||||
sodipodi:open="true" />
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="256.75986"
|
||||
y="586.99133"
|
||||
id="text4648-5-2"
|
||||
@ -615,7 +615,7 @@
|
||||
sodipodi:open="true" />
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="546.22791"
|
||||
y="213.91006"
|
||||
id="text4461-2-5"
|
||||
@ -626,7 +626,7 @@
|
||||
y="213.91006">thread3()</tspan></text>
|
||||
<text
|
||||
xml:space="preserve"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Symbol;-inkscape-font-specification:Symbol"
|
||||
style="font-size:10px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:monospace;-inkscape-font-specification:monospace"
|
||||
x="684.00067"
|
||||
y="213.91006"
|
||||
id="text4461-2-1"
|
||||
|
@ -254,17 +254,6 @@ period (in this case 2603), the grace-period sequence number (7075), and
|
||||
an estimate of the total number of RCU callbacks queued across all CPUs
|
||||
(625 in this case).
|
||||
|
||||
In kernels with CONFIG_RCU_FAST_NO_HZ, more information is printed
|
||||
for each CPU::
|
||||
|
||||
0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 dyntick_enabled: 1
|
||||
|
||||
The "last_accelerate:" prints the low-order 16 bits (in hex) of the
|
||||
jiffies counter when this CPU last invoked rcu_try_advance_all_cbs()
|
||||
from rcu_needs_cpu() or last invoked rcu_accelerate_cbs() from
|
||||
rcu_prepare_for_idle(). "dyntick_enabled: 1" indicates that dyntick-idle
|
||||
processing is enabled.
|
||||
|
||||
If the grace period ends just as the stall warning starts printing,
|
||||
there will be a spurious stall-warning message, which will include
|
||||
the following::
|
||||
|
@ -39,9 +39,11 @@ different paths, as follows:
|
||||
|
||||
:ref:`6. ANALOGY WITH READER-WRITER LOCKING <6_whatisRCU>`
|
||||
|
||||
:ref:`7. FULL LIST OF RCU APIs <7_whatisRCU>`
|
||||
:ref:`7. ANALOGY WITH REFERENCE COUNTING <7_whatisRCU>`
|
||||
|
||||
:ref:`8. ANSWERS TO QUICK QUIZZES <8_whatisRCU>`
|
||||
:ref:`8. FULL LIST OF RCU APIs <8_whatisRCU>`
|
||||
|
||||
:ref:`9. ANSWERS TO QUICK QUIZZES <9_whatisRCU>`
|
||||
|
||||
People who prefer starting with a conceptual overview should focus on
|
||||
Section 1, though most readers will profit by reading this section at
|
||||
@ -677,7 +679,7 @@ Quick Quiz #1:
|
||||
occur when using this algorithm in a real-world Linux
|
||||
kernel? How could this deadlock be avoided?
|
||||
|
||||
:ref:`Answers to Quick Quiz <8_whatisRCU>`
|
||||
:ref:`Answers to Quick Quiz <9_whatisRCU>`
|
||||
|
||||
5B. "TOY" EXAMPLE #2: CLASSIC RCU
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
@ -732,7 +734,7 @@ Quick Quiz #2:
|
||||
Give an example where Classic RCU's read-side
|
||||
overhead is **negative**.
|
||||
|
||||
:ref:`Answers to Quick Quiz <8_whatisRCU>`
|
||||
:ref:`Answers to Quick Quiz <9_whatisRCU>`
|
||||
|
||||
.. _quiz_3:
|
||||
|
||||
@ -741,7 +743,7 @@ Quick Quiz #3:
|
||||
critical section, what the heck do you do in
|
||||
CONFIG_PREEMPT_RT, where normal spinlocks can block???
|
||||
|
||||
:ref:`Answers to Quick Quiz <8_whatisRCU>`
|
||||
:ref:`Answers to Quick Quiz <9_whatisRCU>`
|
||||
|
||||
.. _6_whatisRCU:
|
||||
|
||||
@ -872,7 +874,79 @@ be used in place of synchronize_rcu().
|
||||
|
||||
.. _7_whatisRCU:
|
||||
|
||||
7. FULL LIST OF RCU APIs
|
||||
7. ANALOGY WITH REFERENCE COUNTING
|
||||
-----------------------------------
|
||||
|
||||
The reader-writer analogy (illustrated by the previous section) is not
|
||||
always the best way to think about using RCU. Another helpful analogy
|
||||
considers RCU an effective reference count on everything which is
|
||||
protected by RCU.
|
||||
|
||||
A reference count typically does not prevent the referenced object's
|
||||
values from changing, but does prevent changes to type -- particularly the
|
||||
gross change of type that happens when that object's memory is freed and
|
||||
re-allocated for some other purpose. Once a type-safe reference to the
|
||||
object is obtained, some other mechanism is needed to ensure consistent
|
||||
access to the data in the object. This could involve taking a spinlock,
|
||||
but with RCU the typical approach is to perform reads with SMP-aware
|
||||
operations such as smp_load_acquire(), to perform updates with atomic
|
||||
read-modify-write operations, and to provide the necessary ordering.
|
||||
RCU provides a number of support functions that embed the required
|
||||
operations and ordering, such as the list_for_each_entry_rcu() macro
|
||||
used in the previous section.
|
||||
|
||||
A more focused view of the reference counting behavior is that,
|
||||
between rcu_read_lock() and rcu_read_unlock(), any reference taken with
|
||||
rcu_dereference() on a pointer marked as ``__rcu`` can be treated as
|
||||
though a reference-count on that object has been temporarily increased.
|
||||
This prevents the object from changing type. Exactly what this means
|
||||
will depend on normal expectations of objects of that type, but it
|
||||
typically includes that spinlocks can still be safely locked, normal
|
||||
reference counters can be safely manipulated, and ``__rcu`` pointers
|
||||
can be safely dereferenced.
|
||||
|
||||
Some operations that one might expect to see on an object for
|
||||
which an RCU reference is held include:
|
||||
|
||||
- Copying out data that is guaranteed to be stable by the object's type.
|
||||
- Using kref_get_unless_zero() or similar to get a longer-term
|
||||
reference. This may fail of course.
|
||||
- Acquiring a spinlock in the object, and checking if the object still
|
||||
is the expected object and if so, manipulating it freely.
|
||||
|
||||
The understanding that RCU provides a reference that only prevents a
|
||||
change of type is particularly visible with objects allocated from a
|
||||
slab cache marked ``SLAB_TYPESAFE_BY_RCU``. RCU operations may yield a
|
||||
reference to an object from such a cache that has been concurrently
|
||||
freed and the memory reallocated to a completely different object,
|
||||
though of the same type. In this case RCU doesn't even protect the
|
||||
identity of the object from changing, only its type. So the object
|
||||
found may not be the one expected, but it will be one where it is safe
|
||||
to take a reference or spinlock and then confirm that the identity
|
||||
matches the expectations.
|
||||
|
||||
With traditional reference counting -- such as that implemented by the
|
||||
kref library in Linux -- there is typically code that runs when the last
|
||||
reference to an object is dropped. With kref, this is the function
|
||||
passed to kref_put(). When RCU is being used, such finalization code
|
||||
must not be run until all ``__rcu`` pointers referencing the object have
|
||||
been updated, and then a grace period has passed. Every remaining
|
||||
globally visible pointer to the object must be considered to be a
|
||||
potential counted reference, and the finalization code is typically run
|
||||
using call_rcu() only after all those pointers have been changed.
|
||||
|
||||
To see how to choose between these two analogies -- of RCU as a
|
||||
reader-writer lock and RCU as a reference counting system -- it is useful
|
||||
to reflect on the scale of the thing being protected. The reader-writer
|
||||
lock analogy looks at larger multi-part objects such as a linked list
|
||||
and shows how RCU can facilitate concurrency while elements are added
|
||||
to, and removed from, the list. The reference-count analogy looks at
|
||||
the individual objects and looks at how they can be accessed safely
|
||||
within whatever whole they are a part of.
|
||||
|
||||
.. _8_whatisRCU:
|
||||
|
||||
8. FULL LIST OF RCU APIs
|
||||
-------------------------
|
||||
|
||||
The RCU APIs are documented in docbook-format header comments in the
|
||||
@ -1035,9 +1109,9 @@ g. Otherwise, use RCU.
|
||||
Of course, this all assumes that you have determined that RCU is in fact
|
||||
the right tool for your job.
|
||||
|
||||
.. _8_whatisRCU:
|
||||
.. _9_whatisRCU:
|
||||
|
||||
8. ANSWERS TO QUICK QUIZZES
|
||||
9. ANSWERS TO QUICK QUIZZES
|
||||
----------------------------
|
||||
|
||||
Quick Quiz #1:
|
||||
|
@ -13,6 +13,8 @@ a) waiting for a CPU (while being runnable)
|
||||
b) completion of synchronous block I/O initiated by the task
|
||||
c) swapping in pages
|
||||
d) memory reclaim
|
||||
e) thrashing page cache
|
||||
f) direct compact
|
||||
|
||||
and makes these statistics available to userspace through
|
||||
the taskstats interface.
|
||||
@ -41,11 +43,12 @@ generic data structure to userspace corresponding to per-pid and per-tgid
|
||||
statistics. The delay accounting functionality populates specific fields of
|
||||
this structure. See
|
||||
|
||||
include/linux/taskstats.h
|
||||
include/uapi/linux/taskstats.h
|
||||
|
||||
for a description of the fields pertaining to delay accounting.
|
||||
It will generally be in the form of counters returning the cumulative
|
||||
delay seen for cpu, sync block I/O, swapin, memory reclaim etc.
|
||||
delay seen for cpu, sync block I/O, swapin, memory reclaim, thrash page
|
||||
cache, direct compact etc.
|
||||
|
||||
Taking the difference of two successive readings of a given
|
||||
counter (say cpu_delay_total) for a task will give the delay
|
||||
@ -88,41 +91,37 @@ seen.
|
||||
|
||||
General format of the getdelays command::
|
||||
|
||||
getdelays [-t tgid] [-p pid] [-c cmd...]
|
||||
|
||||
getdelays [-dilv] [-t tgid] [-p pid]
|
||||
|
||||
Get delays, since system boot, for pid 10::
|
||||
|
||||
# ./getdelays -p 10
|
||||
# ./getdelays -d -p 10
|
||||
(output similar to next case)
|
||||
|
||||
Get sum of delays, since system boot, for all pids with tgid 5::
|
||||
|
||||
# ./getdelays -t 5
|
||||
# ./getdelays -d -t 5
|
||||
print delayacct stats ON
|
||||
TGID 5
|
||||
|
||||
|
||||
CPU count real total virtual total delay total
|
||||
7876 92005750 100000000 24001500
|
||||
IO count delay total
|
||||
0 0
|
||||
SWAP count delay total
|
||||
0 0
|
||||
RECLAIM count delay total
|
||||
0 0
|
||||
CPU count real total virtual total delay total delay average
|
||||
8 7000000 6872122 3382277 0.423ms
|
||||
IO count delay total delay average
|
||||
0 0 0ms
|
||||
SWAP count delay total delay average
|
||||
0 0 0ms
|
||||
RECLAIM count delay total delay average
|
||||
0 0 0ms
|
||||
THRASHING count delay total delay average
|
||||
0 0 0ms
|
||||
COMPACT count delay total delay average
|
||||
0 0 0ms
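The "delay average" column in the sample output above appears to be the delay total divided by the count. The following is a minimal sketch of that arithmetic, assuming the totals are reported in nanoseconds (as the taskstats fields are documented); the function names are hypothetical and are not part of getdelays.

```python
def delay_average_ms(count: int, delay_total_ns: int) -> float:
    """Mean delay per event in milliseconds; 0.0 when nothing was counted."""
    if count == 0:
        return 0.0
    return delay_total_ns / count / 1_000_000


def delay_delta_ns(earlier_total_ns: int, later_total_ns: int) -> int:
    """Delay accumulated between two successive readings of one counter."""
    return later_total_ns - earlier_total_ns


# CPU row from the sample output: count=8, delay total=3382277 ns.
print(f"{delay_average_ms(8, 3382277):.3f}ms")  # prints 0.423ms
```

Because the counters are cumulative, the same average can be computed over an interval by passing the deltas of count and delay total between two readings.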
|
||||
|
||||
Get delays seen in executing a given simple command::
|
||||
Get IO accounting for pid 1 (this works only with -p)::
|
||||
|
||||
# ./getdelays -c ls /
|
||||
# ./getdelays -i -p 1
|
||||
printing IO accounting
|
||||
linuxrc: read=65536, write=0, cancelled_write=0
|
||||
|
||||
bin data1 data3 data5 dev home media opt root srv sys usr
|
||||
boot data2 data4 data6 etc lib mnt proc sbin subdomain tmp var
|
||||
|
||||
|
||||
CPU count real total virtual total delay total
|
||||
6 4000250 4000000 0
|
||||
IO count delay total
|
||||
0 0
|
||||
SWAP count delay total
|
||||
0 0
|
||||
RECLAIM count delay total
|
||||
0 0
|
||||
The above command can be used with -v to get more debug information.
|
||||
|
@ -92,7 +92,8 @@ Triggers can be set on more than one psi metric and more than one trigger
|
||||
for the same psi metric can be specified. However for each trigger a separate
|
||||
file descriptor is required to be able to poll it separately from others,
|
||||
therefore for each trigger a separate open() syscall should be made even
|
||||
when opening the same psi interface file.
|
||||
when opening the same psi interface file. Write operations to a file descriptor
|
||||
with an already existing psi trigger will fail with EBUSY.
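The one-descriptor-per-trigger rule can be sketched from userspace as follows. This is an illustration only: the helper names are hypothetical, the trigger string layout ("some" or "full", then the stall amount and the window in microseconds) is taken from the psi documentation, and the psi file path passed in (for example /proc/pressure/memory) is an assumption about the running system.

```python
import errno
import os


def format_trigger(kind: str, stall_us: int, window_us: int) -> bytes:
    """Build a trigger string: "<some|full> <stall us> <window us>"."""
    if kind not in ("some", "full"):
        raise ValueError(f"unknown psi trigger kind: {kind!r}")
    # The trailing NUL mirrors the strlen()+1 write used in C examples.
    return f"{kind} {stall_us} {window_us}".encode() + b"\0"


def open_trigger(psi_file: str, trigger: bytes) -> int:
    """Open a fresh descriptor for one trigger and register it."""
    fd = os.open(psi_file, os.O_RDWR | os.O_NONBLOCK)
    try:
        os.write(fd, trigger)
    except OSError:
        os.close(fd)
        raise
    return fd


def is_already_registered(exc: OSError) -> bool:
    """True when a write failed because the descriptor already has a trigger."""
    return exc.errno == errno.EBUSY
```

Calling open_trigger() once per trigger yields independent descriptors that can be polled separately. A second os.write() on a descriptor that already carries a trigger is expected to raise OSError with EBUSY, which is_already_registered() detects.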
|
||||
|
||||
Monitors activate only when the system enters a stall state for the monitored
|
||||
psi metric and deactivate upon exit from the stall state. While the system is
|
||||
|
@ -4,6 +4,8 @@
|
||||
Collaborative Processor Performance Control (CPPC)
|
||||
==================================================
|
||||
|
||||
.. _cppc_sysfs:
|
||||
|
||||
CPPC
|
||||
====
|
||||
|
||||
|
@ -29,12 +29,14 @@ Brief summary of control files::
|
||||
hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb usage recorded
|
||||
hugetlb.<hugepagesize>.usage_in_bytes # show current usage for "hugepagesize" hugetlb
|
||||
hugetlb.<hugepagesize>.failcnt # show the number of allocation failures due to the HugeTLB usage limit
|
||||
hugetlb.<hugepagesize>.numa_stat # show the numa information of the hugetlb memory charged to this cgroup
|
||||
|
||||
For a system supporting three hugepage sizes (64k, 32M and 1G), the control
|
||||
files include::
|
||||
|
||||
hugetlb.1GB.limit_in_bytes
|
||||
hugetlb.1GB.max_usage_in_bytes
|
||||
hugetlb.1GB.numa_stat
|
||||
hugetlb.1GB.usage_in_bytes
|
||||
hugetlb.1GB.failcnt
|
||||
hugetlb.1GB.rsvd.limit_in_bytes
|
||||
@ -43,6 +45,7 @@ files include::
|
||||
hugetlb.1GB.rsvd.failcnt
|
||||
hugetlb.64KB.limit_in_bytes
|
||||
hugetlb.64KB.max_usage_in_bytes
|
||||
hugetlb.64KB.numa_stat
|
||||
hugetlb.64KB.usage_in_bytes
|
||||
hugetlb.64KB.failcnt
|
||||
hugetlb.64KB.rsvd.limit_in_bytes
|
||||
@ -51,6 +54,7 @@ files include::
|
||||
hugetlb.64KB.rsvd.failcnt
|
||||
hugetlb.32MB.limit_in_bytes
|
||||
hugetlb.32MB.max_usage_in_bytes
|
||||
hugetlb.32MB.numa_stat
|
||||
hugetlb.32MB.usage_in_bytes
|
||||
hugetlb.32MB.failcnt
|
||||
hugetlb.32MB.rsvd.limit_in_bytes
|
||||
|
@ -1268,6 +1268,9 @@ PAGE_SIZE multiple when read back.
|
||||
The number of processes belonging to this cgroup
|
||||
killed by any kind of OOM killer.
|
||||
|
||||
oom_group_kill
|
||||
The number of times a group OOM has occurred.
|
||||
|
||||
memory.events.local
|
||||
Similar to memory.events but the fields in the file are local
|
||||
to the cgroup i.e. not hierarchical. The file modified event
|
||||
@ -1311,6 +1314,9 @@ PAGE_SIZE multiple when read back.
|
||||
sock (npn)
|
||||
Amount of memory used in network transmission buffers
|
||||
|
||||
vmalloc (npn)
|
||||
Amount of memory used for vmap backed memory.
|
||||
|
||||
shmem
|
||||
Amount of cached filesystem data that is swap-backed,
|
||||
such as tmpfs, shm segments, shared anonymous mmap()s
|
||||
@ -2260,6 +2266,11 @@ HugeTLB Interface Files
|
||||
are local to the cgroup i.e. not hierarchical. The file modified event
|
||||
generated on this file reflects only the local events.
|
||||
|
||||
hugetlb.<hugepagesize>.numa_stat
|
||||
Similar to memory.numa_stat, it shows the numa information of the
|
||||
hugetlb pages of <hugepagesize> in this cgroup. Only hugetlb pages that are
|
||||
actively in use are included. The per-node values are in bytes.
|
||||
|
||||
Misc
|
||||
----
|
||||
|
||||
|
@ -734,10 +734,9 @@ SecurityFlags Flags which control security negotiation and
|
||||
using weaker password hashes is 0x37037 (lanman,
|
||||
plaintext, ntlm, ntlmv2, signing allowed). Some
|
||||
SecurityFlags require the corresponding menuconfig
|
||||
options to be enabled (lanman and plaintext require
|
||||
CONFIG_CIFS_WEAK_PW_HASH for example). Enabling
|
||||
plaintext authentication currently requires also
|
||||
enabling lanman authentication in the security flags
|
||||
options to be enabled. Enabling plaintext
|
||||
authentication currently requires also enabling
|
||||
lanman authentication in the security flags
|
||||
because the cifs module only supports sending
|
||||
plaintext passwords using the older lanman dialect
|
||||
form of the session setup SMB. (e.g. for authentication
|
||||
|
@ -8,11 +8,9 @@ to /proc/cpuinfo output of some architectures. They reside in
|
||||
Documentation/ABI/stable/sysfs-devices-system-cpu.
|
||||
|
||||
Architecture-neutral, drivers/base/topology.c, exports these attributes.
|
||||
However, the book and drawer related sysfs files will only be created if
|
||||
CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are selected, respectively.
|
||||
|
||||
CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are currently only used on s390,
|
||||
where they reflect the cpu and cache hierarchy.
|
||||
However the die, cluster, book, and drawer hierarchy related sysfs files will
|
||||
only be created if an architecture provides the related macros as described
|
||||
below.
|
||||
|
||||
For an architecture to support this feature, it must define some of
|
||||
these macros in include/asm-XXX/topology.h::
|
||||
@ -43,15 +41,14 @@ not defined by include/asm-XXX/topology.h:
|
||||
2) topology_die_id: -1
|
||||
3) topology_cluster_id: -1
|
||||
4) topology_core_id: 0
|
||||
5) topology_sibling_cpumask: just the given CPU
|
||||
6) topology_core_cpumask: just the given CPU
|
||||
7) topology_cluster_cpumask: just the given CPU
|
||||
8) topology_die_cpumask: just the given CPU
|
||||
|
||||
For architectures that don't support books (CONFIG_SCHED_BOOK) there are no
|
||||
default definitions for topology_book_id() and topology_book_cpumask().
|
||||
For architectures that don't support drawers (CONFIG_SCHED_DRAWER) there are
|
||||
no default definitions for topology_drawer_id() and topology_drawer_cpumask().
|
||||
5) topology_book_id: -1
|
||||
6) topology_drawer_id: -1
|
||||
7) topology_sibling_cpumask: just the given CPU
|
||||
8) topology_core_cpumask: just the given CPU
|
||||
9) topology_cluster_cpumask: just the given CPU
|
||||
10) topology_die_cpumask: just the given CPU
|
||||
11) topology_book_cpumask: just the given CPU
|
||||
12) topology_drawer_cpumask: just the given CPU
|
||||
|
||||
Additionally, CPU topology information is provided under
|
||||
/sys/devices/system/cpu and includes these files. The internal
|
||||
|
@ -2339,13 +2339,7 @@
|
||||
disks (see major number 3) except that the limit on
|
||||
partitions is 31.
|
||||
|
||||
162 char Raw block device interface
|
||||
0 = /dev/rawctl Raw I/O control device
|
||||
1 = /dev/raw/raw1 First raw I/O device
|
||||
2 = /dev/raw/raw2 Second raw I/O device
|
||||
...
|
||||
max minor number of raw device is set by kernel config
|
||||
MAX_RAW_DEVS or raw module parameter 'max_raw_devs'
|
||||
162 char Used for (now removed) raw block device interface
|
||||
|
||||
163 char
|
||||
|
||||
|
134
Documentation/admin-guide/gpio/gpio-sim.rst
Normal file
@ -0,0 +1,134 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0-or-later
|
||||
|
||||
Configfs GPIO Simulator
|
||||
=======================
|
||||
|
||||
The configfs GPIO Simulator (gpio-sim) provides a way to create simulated GPIO
|
||||
chips for testing purposes. The lines exposed by these chips can be accessed
|
||||
using the standard GPIO character device interface as well as manipulated
|
||||
using sysfs attributes.
|
||||
|
||||
Creating simulated chips
|
||||
------------------------
|
||||
|
||||
The gpio-sim module registers a configfs subsystem called ``'gpio-sim'``. For
|
||||
details of the configfs filesystem, please refer to the configfs documentation.
|
||||
|
||||
The user can create a hierarchy of configfs groups and items as well as modify
|
||||
values of exposed attributes. Once the chip is instantiated, this hierarchy
|
||||
will be translated to appropriate device properties. The general structure is:
|
||||
|
||||
**Group:** ``/config/gpio-sim``
|
||||
|
||||
This is the top directory of the gpio-sim configfs tree.
|
||||
|
||||
**Group:** ``/config/gpio-sim/gpio-device``
|
||||
|
||||
**Attribute:** ``/config/gpio-sim/gpio-device/dev_name``
|
||||
|
||||
**Attribute:** ``/config/gpio-sim/gpio-device/live``
|
||||
|
||||
This is a directory representing a GPIO platform device. The ``'dev_name'``
|
||||
attribute is read-only and allows the user-space to read the platform device
|
||||
name (e.g. ``'gpio-sim.0'``). The ``'live'`` attribute triggers the
|
||||
actual creation of the device once it's fully configured. The accepted values
|
||||
are: ``'1'`` to enable the simulated device and ``'0'`` to disable and tear
|
||||
it down.
|
||||
|
||||
**Group:** ``/config/gpio-sim/gpio-device/gpio-bankX``
|
||||
|
||||
**Attribute:** ``/config/gpio-sim/gpio-device/gpio-bankX/chip_name``
|
||||
|
||||
**Attribute:** ``/config/gpio-sim/gpio-device/gpio-bankX/num_lines``
|
||||
|
||||
This group represents a bank of GPIOs under the top platform device. The
|
||||
``'chip_name'`` attribute is read-only and allows the user-space to read the
|
||||
device name of the bank device. The ``'num_lines'`` attribute specifies
|
||||
the number of lines exposed by this bank.
|
||||
|
||||
**Group:** ``/config/gpio-sim/gpio-device/gpio-bankX/lineY``
|
||||
|
||||
**Attribute:** ``/config/gpio-sim/gpio-device/gpio-bankX/lineY/name``
|
||||
|
||||
This group represents a single line at the offset Y. The 'name' attribute
|
||||
sets the line name as represented by the 'gpio-line-names' property.
|
||||
|
||||
**Item:** ``/config/gpio-sim/gpio-device/gpio-bankX/lineY/hog``
|
||||
|
||||
**Attribute:** ``/config/gpio-sim/gpio-device/gpio-bankX/lineY/hog/name``
|
||||
|
||||
**Attribute:** ``/config/gpio-sim/gpio-device/gpio-bankX/lineY/hog/direction``
|
||||
|
||||
This item makes the gpio-sim module hog the associated line. The ``'name'``
|
||||
attribute specifies the in-kernel consumer name to use. The ``'direction'``
|
||||
attribute specifies the hog direction and must be one of: ``'input'``,
|
||||
``'output-high'`` and ``'output-low'``.
|
||||
|
||||
Inside each bank directory, there's a set of attributes that can be used to
|
||||
configure the new chip. Additionally the user can ``mkdir()`` subdirectories
|
||||
inside the chip's directory that pass additional configuration for
|
||||
specific lines. The name of those subdirectories must take the form of:
|
||||
``'line<offset>'`` (e.g. ``'line0'``, ``'line20'``, etc.) as the name will be
|
||||
used by the module to assign the config to the specific line at given offset.
|
||||
|
||||
Once the configuration is complete, the ``'live'`` attribute must be set to 1 in
|
||||
order to instantiate the chip. It can be set back to 0 to destroy the simulated
|
||||
chip. The module will synchronously wait for the new simulated device to be
|
||||
successfully probed and if this doesn't happen, writing to ``'live'`` will
|
||||
result in an error.
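The configuration sequence described above can be sketched as a short userspace script. This is a sketch under stated assumptions: the helper names, the device name ``my-device`` and the bank name ``bank0`` are hypothetical, and the root directory must be wherever the gpio-sim configfs tree is visible on the system (this document uses ``/config/gpio-sim``).

```python
import os


def _write(path: str, value: str) -> None:
    """Write one attribute value."""
    with open(path, "w") as attr:
        attr.write(value)


def configure_bank(root, device, bank, num_lines, line_names=None):
    """Create <root>/<device>/<bank>, set num_lines and optional line names.

    line_names maps a line offset to its name; each entry becomes a
    'line<offset>' subdirectory holding a 'name' attribute.
    """
    bank_dir = os.path.join(root, device, bank)
    os.makedirs(bank_dir, exist_ok=True)
    _write(os.path.join(bank_dir, "num_lines"), str(num_lines))
    for offset, name in (line_names or {}).items():
        line_dir = os.path.join(bank_dir, f"line{offset}")
        os.makedirs(line_dir, exist_ok=True)
        _write(os.path.join(line_dir, "name"), name)
    return bank_dir


def set_live(root, device, live: bool) -> None:
    """Write '1' to instantiate the simulated device, '0' to tear it down."""
    _write(os.path.join(root, device, "live"), "1" if live else "0")
```

A typical call order would be ``configure_bank("/config/gpio-sim", "my-device", "bank0", 8, {3: "sim-foo"})`` followed by ``set_live("/config/gpio-sim", "my-device", True)``. Since the write to ``live`` is described as failing when the device does not probe, callers may want to catch ``OSError`` around ``set_live()``.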
|
||||
|
||||
Simulated GPIO chips can also be defined in device-tree. The compatible string
|
||||
must be: ``"gpio-simulator"``. Supported properties are:
|
||||
|
||||
``"gpio-sim,label"`` - chip label
|
||||
|
||||
Other standard GPIO properties (like ``"gpio-line-names"``, ``"ngpios"`` or
|
||||
``"gpio-hog"``) are also supported. Please refer to the GPIO documentation for
|
||||
details.
|
||||
|
||||
An example device-tree code defining a GPIO simulator:
|
||||
|
||||
.. code-block :: none
|
||||
|
||||
gpio-sim {
|
||||
compatible = "gpio-simulator";
|
||||
|
||||
bank0 {
|
||||
gpio-controller;
|
||||
#gpio-cells = <2>;
|
||||
ngpios = <16>;
|
||||
gpio-sim,label = "dt-bank0";
|
||||
gpio-line-names = "", "sim-foo", "", "sim-bar";
|
||||
};
|
||||
|
||||
bank1 {
|
||||
gpio-controller;
|
||||
#gpio-cells = <2>;
|
||||
ngpios = <8>;
|
||||
gpio-sim,label = "dt-bank1";
|
||||
|
||||
line3 {
|
||||
gpio-hog;
|
||||
gpios = <3 0>;
|
||||
output-high;
|
||||
line-name = "sim-hog-from-dt";
|
||||
};
|
||||
};
|
||||
};
|
||||
|
||||
Manipulating simulated lines
|
||||
----------------------------
|
||||
|
||||
Each simulated GPIO chip creates a separate sysfs group under its device
|
||||
directory for each exposed line
|
||||
(e.g. ``/sys/devices/platform/gpio-sim.X/gpiochipY/``). The name of each group
|
||||
is of the form: ``'sim_gpioX'`` where X is the offset of the line. Inside each
|
||||
group there are two attributes:
|
||||
|
||||
``pull`` - reads and sets the current simulated pull setting for
|
||||
every line; when writing, the value must be one of: ``'pull-up'``,
|
||||
``'pull-down'``
|
||||
|
||||
``value`` - reads the current value of the line, which may be
|
||||
different from the pull if the line is being driven from
|
||||
user-space
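As an illustration of the two per-line attributes, the sketch below sets the simulated pull and reads back the line value. The helper names are hypothetical, and the chip directory argument is assumed to be the device directory that holds the ``sim_gpioX`` groups (for example a path of the form shown above).

```python
import os

VALID_PULLS = ("pull-up", "pull-down")


def line_group(chip_dir: str, offset: int) -> str:
    """Path of the sysfs group for the line at the given offset."""
    return os.path.join(chip_dir, f"sim_gpio{offset}")


def set_pull(chip_dir: str, offset: int, pull: str) -> None:
    """Write the simulated pull setting after validating it locally."""
    if pull not in VALID_PULLS:
        raise ValueError(f"pull must be one of {VALID_PULLS}, got {pull!r}")
    with open(os.path.join(line_group(chip_dir, offset), "pull"), "w") as attr:
        attr.write(pull)


def read_value(chip_dir: str, offset: int) -> str:
    """Read the current line value, stripped of surrounding whitespace."""
    with open(os.path.join(line_group(chip_dir, offset), "value")) as attr:
        return attr.read().strip()
```

A test might call ``set_pull(chip_dir, 3, "pull-up")`` and then compare ``read_value(chip_dir, 3)`` with the expected level, keeping in mind that the value can differ from the pull when the line is driven from user space.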
|
@ -10,6 +10,7 @@ gpio
|
||||
gpio-aggregator
|
||||
sysfs
|
||||
gpio-mockup
|
||||
gpio-sim
|
||||
|
||||
.. only:: subproject and html
|
||||
|
||||
|
@ -468,7 +468,7 @@ Spectre variant 2
|
||||
before invoking any firmware code to prevent Spectre variant 2 exploits
|
||||
using the firmware.
|
||||
|
||||
Using kernel address space randomization (CONFIG_RANDOMIZE_SLAB=y
|
||||
Using kernel address space randomization (CONFIG_RANDOMIZE_BASE=y
|
||||
and CONFIG_SLAB_FREELIST_RANDOM=y in the kernel configuration) makes
|
||||
attacks on the kernel generally more difficult.
|
||||
|
||||
|
@ -225,14 +225,23 @@
|
||||
For broken nForce2 BIOS resulting in XT-PIC timer.
|
||||
|
||||
acpi_sleep= [HW,ACPI] Sleep options
|
||||
Format: { s3_bios, s3_mode, s3_beep, s4_nohwsig,
|
||||
old_ordering, nonvs, sci_force_enable, nobl }
|
||||
Format: { s3_bios, s3_mode, s3_beep, s4_hwsig,
|
||||
s4_nohwsig, old_ordering, nonvs,
|
||||
sci_force_enable, nobl }
|
||||
See Documentation/power/video.rst for information on
|
||||
s3_bios and s3_mode.
|
||||
s3_beep is for debugging; it makes the PC's speaker beep
|
||||
as soon as the kernel's real-mode entry point is called.
|
||||
s4_hwsig causes the kernel to check the ACPI hardware
|
||||
signature during resume from hibernation, and gracefully
|
||||
refuse to resume if it has changed. This complies with
|
||||
the ACPI specification but not with reality, since
|
||||
Windows does not do this and many laptops do change it
|
||||
on docking. So the default behaviour is to allow resume
|
||||
and simply warn when the signature changes, unless the
|
||||
s4_hwsig option is enabled.
|
||||
s4_nohwsig prevents ACPI hardware signature from being
|
||||
used during resume from hibernation.
|
||||
used (or even warned about) during resume.
|
||||
old_ordering causes the ACPI 1.0 ordering of the _PTS
|
||||
control method, with respect to putting devices into
|
||||
low power states, to be enforced (the ACPI 2.0 ordering
|
||||
@ -603,8 +612,8 @@
|
||||
clocksource.max_cswd_read_retries= [KNL]
|
||||
Number of clocksource_watchdog() retries due to
|
||||
external delays before the clock will be marked
|
||||
unstable. Defaults to three retries, that is,
|
||||
four attempts to read the clock under test.
|
||||
unstable. Defaults to two retries, that is,
|
||||
three attempts to read the clock under test.
|
||||
|
||||
clocksource.verify_n_cpus= [KNL]
|
||||
Limit the number of CPUs checked for clocksources
|
||||
@ -2940,7 +2949,7 @@
|
||||
both parameters are enabled, hugetlb_free_vmemmap takes
|
||||
precedence over memory_hotplug.memmap_on_memory.
|
||||
|
||||
memtest= [KNL,X86,ARM,PPC,RISCV] Enable memtest
|
||||
memtest= [KNL,X86,ARM,M68K,PPC,RISCV] Enable memtest
|
||||
Format: <integer>
|
||||
default : 0 <disable>
|
||||
Specifies the number of memtest passes to be
|
||||
@ -3384,7 +3393,7 @@
|
||||
Disable SMAP (Supervisor Mode Access Prevention)
|
||||
even if it is supported by processor.
|
||||
|
||||
nosmep [X86,PPC]
|
||||
nosmep [X86,PPC64s]
|
||||
Disable SMEP (Supervisor Mode Execution Prevention)
|
||||
even if it is supported by processor.
|
||||
|
||||
@ -3551,6 +3560,13 @@
|
||||
shutdown the other cpus. Instead use the REBOOT_VECTOR
|
||||
irq.
|
||||
|
||||
nomodeset Disable kernel modesetting. DRM drivers will not perform
|
||||
display-mode changes or accelerated rendering. Only the
|
||||
system framebuffer will be available for use if this was
|
||||
set-up by the firmware or boot loader.
|
||||
|
||||
Useful as fallback, or for testing and debugging.
|
||||
|
||||
nomodule Disable module load
|
||||
|
||||
nopat [X86] Disable PAT (page attribute table extension of
|
||||
@ -4357,19 +4373,30 @@
|
||||
Disable the Correctable Errors Collector,
|
||||
see CONFIG_RAS_CEC help text.
|
||||
|
||||
rcu_nocbs= [KNL]
|
||||
The argument is a cpu list, as described above.
|
||||
rcu_nocbs[=cpu-list]
|
||||
[KNL] The optional argument is a cpu list,
|
||||
as described above.
|
||||
|
||||
In kernels built with CONFIG_RCU_NOCB_CPU=y, set
|
||||
the specified list of CPUs to be no-callback CPUs.
|
||||
Invocation of these CPUs' RCU callbacks will be
|
||||
offloaded to "rcuox/N" kthreads created for that
|
||||
purpose, where "x" is "p" for RCU-preempt, and
|
||||
"s" for RCU-sched, and "N" is the CPU number.
|
||||
This reduces OS jitter on the offloaded CPUs,
|
||||
which can be useful for HPC and real-time
|
||||
workloads. It can also improve energy efficiency
|
||||
for asymmetric multiprocessors.
|
||||
In kernels built with CONFIG_RCU_NOCB_CPU=y,
|
||||
enable the no-callback CPU mode, which prevents
|
||||
such CPUs' callbacks from being invoked in
|
||||
softirq context. Invocation of such CPUs' RCU
|
||||
callbacks will instead be offloaded to "rcuox/N"
|
||||
kthreads created for that purpose, where "x" is
|
||||
"p" for RCU-preempt, "s" for RCU-sched, and "g"
|
||||
for the kthreads that mediate grace periods; and
|
||||
"N" is the CPU number. This reduces OS jitter on
|
||||
the offloaded CPUs, which can be useful for HPC
|
||||
and real-time workloads. It can also improve
|
||||
energy efficiency for asymmetric multiprocessors.
|
||||
|
||||
If a cpulist is passed as an argument, the specified
|
||||
list of CPUs is set to no-callback mode from boot.
|
||||
|
||||
Otherwise, if the '=' sign and the cpulist
|
||||
arguments are omitted, no CPU will be set to
|
||||
no-callback mode from boot but the mode may be
|
||||
toggled at runtime via cpusets.
|
||||
|
||||
rcu_nocb_poll [KNL]
|
||||
Rather than requiring that offloaded CPUs
|
||||
@ -4503,10 +4530,6 @@
|
||||
on rcutree.qhimark at boot time and to zero to
|
||||
disable more aggressive help enlistment.
|
||||
|
||||
rcutree.rcu_idle_gp_delay= [KNL]
|
||||
Set wakeup interval for idle CPUs that have
|
||||
RCU callbacks (RCU_FAST_NO_HZ=y).
|
||||
|
||||
rcutree.rcu_kick_kthreads= [KNL]
|
||||
Cause the grace-period kthread to get an extra
|
||||
wake_up() if it sleeps three times longer than
|
||||
@ -4617,8 +4640,12 @@
|
||||
in seconds.
|
||||
|
||||
rcutorture.fwd_progress= [KNL]
|
||||
Enable RCU grace-period forward-progress testing
|
||||
Specifies the number of kthreads to be used
|
||||
for RCU grace-period forward-progress testing
|
||||
for the types of RCU supporting this notion.
|
||||
Defaults to 1 kthread, values less than zero or
|
||||
greater than the number of CPUs cause the number
|
||||
of CPUs to be used.
|
||||
|
||||
rcutorture.fwd_progress_div= [KNL]
|
||||
Specify the fraction of a CPU-stall-warning
|
||||
@ -4819,6 +4846,29 @@
|
||||
period to instead use normal non-expedited
|
||||
grace-period processing.
|
||||
|
||||
rcupdate.rcu_task_collapse_lim= [KNL]
|
||||
Set the maximum number of callbacks present
|
||||
at the beginning of a grace period that allows
|
||||
the RCU Tasks flavors to collapse back to using
|
||||
a single callback queue. This switching only
|
||||
occurs when rcupdate.rcu_task_enqueue_lim is
|
||||
set to the default value of -1.
|
||||
|
||||
rcupdate.rcu_task_contend_lim= [KNL]
|
||||
Set the minimum number of callback-queuing-time
|
||||
lock-contention events per jiffy required to
|
||||
cause the RCU Tasks flavors to switch to per-CPU
|
||||
callback queuing. This switching only occurs
|
||||
when rcupdate.rcu_task_enqueue_lim is set to
|
||||
the default value of -1.
|
||||
|
||||
rcupdate.rcu_task_enqueue_lim= [KNL]
|
||||
Set the number of callback queues to use for the
|
||||
RCU Tasks family of RCU flavors. The default
|
||||
of -1 allows this to be automatically (and
|
||||
dynamically) adjusted. This parameter is intended
|
||||
for use in testing.
|
||||
|
||||
rcupdate.rcu_task_ipi_delay= [KNL]
|
||||
Set time in jiffies during which RCU tasks will
|
||||
avoid sending IPIs, starting with the beginning
|
||||
@ -6452,6 +6502,12 @@
|
||||
controller on both pseries and powernv
|
||||
platforms. Only useful on POWER9 and above.
|
||||
|
||||
xive.store-eoi=off [PPC]
|
||||
By default on POWER10 and above, the kernel will use
|
||||
stores for EOI handling when the XIVE interrupt mode
|
||||
is active. This option allows the XIVE driver to use
|
||||
loads instead, as on POWER9.
|
||||
|
||||
xhci-hcd.quirks [USB,KNL]
|
||||
A hex value specifying bitmask with supplemental xhci
|
||||
host controller quirks. Meaning of each bit can be
|
||||
|
@ -208,7 +208,7 @@ Do at least one of the following:
|
||||
2. Enable RCU to do its processing remotely via dyntick-idle by
|
||||
doing all of the following:
|
||||
|
||||
a. Build with CONFIG_NO_HZ=y and CONFIG_RCU_FAST_NO_HZ=y.
|
||||
a. Build with CONFIG_NO_HZ=y.
|
||||
b. Ensure that the CPU goes idle frequently, allowing other
|
||||
CPUs to detect that it has passed through an RCU quiescent
|
||||
state. If the kernel is built with CONFIG_NO_HZ_FULL=y,
|
||||
|
@ -60,6 +60,7 @@ s5p-mfc Samsung S5P MFC Video Codec
|
||||
sh_veu SuperH VEU mem2mem video processing
|
||||
sh_vou SuperH VOU video output
|
||||
stm32-dcmi STM32 Digital Camera Memory Interface (DCMI)
|
||||
stm32-dma2d STM32 Chrom-Art Accelerator Unit
|
||||
sun4i-csi Allwinner A10 CMOS Sensor Interface Support
|
||||
sun6i-csi Allwinner V3s Camera Sensor Interface
|
||||
sun8i-di Allwinner Deinterlace
|
||||
|
@ -208,6 +208,31 @@ PID of the DAMON thread.
|
||||
If DAMON_RECLAIM is enabled, this becomes the PID of the worker thread. Else,
|
||||
-1.
|
||||
|
||||
nr_reclaim_tried_regions
|
||||
------------------------
|
||||
|
||||
Number of memory regions that DAMON_RECLAIM tried to reclaim.
|
||||
|
||||
bytes_reclaim_tried_regions
|
||||
---------------------------
|
||||
|
||||
Total bytes of memory regions that DAMON_RECLAIM tried to reclaim.
|
||||
|
||||
nr_reclaimed_regions
|
||||
--------------------
|
||||
|
||||
Number of memory regions that DAMON_RECLAIM successfully reclaimed.
|
||||
|
||||
bytes_reclaimed_regions
|
||||
-----------------------
|
||||
|
||||
Total bytes of memory regions that DAMON_RECLAIM successfully reclaimed.
|
||||
|
||||
nr_quota_exceeds
|
||||
----------------
|
||||
|
||||
Number of times that the time/space quota limits have been exceeded.
|
||||
|
||||
Example
|
||||
=======
|
||||
|
||||
|
@ -7,37 +7,40 @@ Detailed Usages
|
||||
DAMON provides below three interfaces for different users.
|
||||
|
||||
- *DAMON user space tool.*
|
||||
This is for privileged people such as system administrators who want a
|
||||
just-working human-friendly interface. Using this, users can use the DAMON’s
|
||||
major features in a human-friendly way. It may not be highly tuned for
|
||||
special cases, though. It supports both virtual and physical address spaces
|
||||
monitoring.
|
||||
`This <https://github.com/awslabs/damo>`_ is for privileged people such as
|
||||
system administrators who want a just-working human-friendly interface.
|
||||
Using this, users can use the DAMON’s major features in a human-friendly way.
|
||||
It may not be highly tuned for special cases, though. It supports both
|
||||
virtual and physical address spaces monitoring. For more detail, please
|
||||
refer to its `usage document
|
||||
<https://github.com/awslabs/damo/blob/next/USAGE.md>`_.
|
||||
- *debugfs interface.*
|
||||
This is for privileged user space programmers who want more optimized use of
|
||||
DAMON. Using this, users can use DAMON’s major features by reading
|
||||
from and writing to special debugfs files. Therefore, you can write and use
|
||||
your personalized DAMON debugfs wrapper programs that reads/writes the
|
||||
debugfs files instead of you. The DAMON user space tool is also a reference
|
||||
implementation of such programs. It supports both virtual and physical
|
||||
address spaces monitoring.
|
||||
:ref:`This <debugfs_interface>` is for privileged user space programmers who
|
||||
want more optimized use of DAMON. Using this, users can use DAMON’s major
|
||||
features by reading from and writing to special debugfs files. Therefore,
|
||||
you can write and use your personalized DAMON debugfs wrapper programs that
|
||||
read/write the debugfs files instead of you. The `DAMON user space tool
|
||||
<https://github.com/awslabs/damo>`_ is one example of such programs. It
|
||||
supports both virtual and physical address spaces monitoring. Note that this
|
||||
interface provides only simple :ref:`statistics <damos_stats>` for the
|
||||
monitoring results. For detailed monitoring results, DAMON provides a
|
||||
:ref:`tracepoint <tracepoint>`.
|
||||
- *Kernel Space Programming Interface.*
|
||||
This is for kernel space programmers. Using this, users can utilize every
|
||||
feature of DAMON most flexibly and efficiently by writing kernel space
|
||||
DAMON application programs for you. You can even extend DAMON for various
|
||||
address spaces.
|
||||
:doc:`This </vm/damon/api>` is for kernel space programmers. Using this,
|
||||
users can utilize every feature of DAMON most flexibly and efficiently by
|
||||
writing kernel space DAMON application programs for you. You can even extend
|
||||
DAMON for various address spaces. For detail, please refer to the interface
|
||||
:doc:`document </vm/damon/api>`.
|
||||
|
||||
Nevertheless, you could write your own user space tool using the debugfs
|
||||
interface. A reference implementation is available at
|
||||
https://github.com/awslabs/damo. If you are a kernel programmer, you could
|
||||
refer to :doc:`/vm/damon/api` for the kernel space programming interface. For
|
||||
the reason, this document describes only the debugfs interface
|
||||
|
||||
.. _debugfs_interface:
|
||||
|
||||
debugfs Interface
|
||||
=================
|
||||
|
||||
DAMON exports five files, ``attrs``, ``target_ids``, ``init_regions``,
|
||||
``schemes`` and ``monitor_on`` under its debugfs directory,
|
||||
``<debugfs>/damon/``.
|
||||
DAMON exports eight files, ``attrs``, ``target_ids``, ``init_regions``,
|
||||
``schemes``, ``monitor_on``, ``kdamond_pid``, ``mk_contexts`` and
|
||||
``rm_contexts`` under its debugfs directory, ``<debugfs>/damon/``.
|
||||
|
||||
|
||||
Attributes
|
||||
@ -131,24 +134,38 @@ Schemes
|
||||
|
||||
For usual DAMON-based data access aware memory management optimizations, users
|
||||
would simply want the system to apply a memory management action to a memory
|
||||
region of a specific size having a specific access frequency for a specific
|
||||
time. DAMON receives such formalized operation schemes from the user and
|
||||
applies those to the target processes. It also counts the total number and
|
||||
size of regions that each scheme is applied. This statistics can be used for
|
||||
online analysis or tuning of the schemes.
|
||||
region of a specific access pattern. DAMON receives such formalized operation
|
||||
schemes from the user and applies those to the target processes.
|
||||
|
||||
Users can get and set the schemes by reading from and writing to ``schemes``
|
||||
debugfs file. Reading the file also shows the statistics of each scheme. To
|
||||
the file, each of the schemes should be represented in each line in below form:
|
||||
the file, each of the schemes should be represented in each line in below
|
||||
form::
|
||||
|
||||
min-size max-size min-acc max-acc min-age max-age action
|
||||
<target access pattern> <action> <quota> <watermarks>
|
||||
|
||||
Note that the ranges are closed interval. Bytes for the size of regions
|
||||
(``min-size`` and ``max-size``), number of monitored accesses per aggregate
|
||||
interval for access frequency (``min-acc`` and ``max-acc``), number of
|
||||
aggregate intervals for the age of regions (``min-age`` and ``max-age``), and a
|
||||
predefined integer for memory management actions should be used. The supported
|
||||
numbers and their meanings are as below.
|
||||
You can disable schemes by simply writing an empty string to the file.
|
||||
|
||||
Target Access Pattern
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The ``<target access pattern>`` is constructed with three ranges in below
|
||||
form::
|
||||
|
||||
min-size max-size min-acc max-acc min-age max-age
|
||||
|
||||
Specifically, bytes for the size of regions (``min-size`` and ``max-size``),
|
||||
number of monitored accesses per aggregate interval for access frequency
|
||||
(``min-acc`` and ``max-acc``), number of aggregate intervals for the age of
|
||||
regions (``min-age`` and ``max-age``) are specified. Note that the ranges are
|
||||
closed interval.
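As a rough illustration of the closed-interval semantics, the sketch below checks whether a region falls inside a target access pattern. The ``AccessPattern`` and ``Region`` types and the ``matches`` helper are hypothetical names used only for this example; they are not part of DAMON.

```python
from dataclasses import dataclass


@dataclass
class AccessPattern:
    # Field names mirror the six values of <target access pattern>.
    min_size: int
    max_size: int
    min_acc: int
    max_acc: int
    min_age: int
    max_age: int


@dataclass
class Region:
    size: int         # bytes
    nr_accesses: int  # monitored accesses per aggregate interval
    age: int          # number of aggregate intervals


def matches(region: Region, pattern: AccessPattern) -> bool:
    """Return True if the region lies inside all three closed ranges."""
    return (pattern.min_size <= region.size <= pattern.max_size
            and pattern.min_acc <= region.nr_accesses <= pattern.max_acc
            and pattern.min_age <= region.age <= pattern.max_age)


pattern = AccessPattern(4096, 8192, 0, 5, 10, 20)
# Boundary values are included because the ranges are closed.
print(matches(Region(size=8192, nr_accesses=5, age=20), pattern))
# One byte above max-size falls outside the pattern.
print(matches(Region(size=8193, nr_accesses=5, age=20), pattern))
```

Both ends of every range are inclusive, so a region exactly at ``max-size``, ``max-acc`` and ``max-age`` still matches.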
|
||||
|
||||
Action
|
||||
~~~~~~
|
||||
|
||||
The ``<action>`` is a predefined integer for memory management actions, which
|
||||
DAMON will apply to the regions having the target access pattern. The
|
||||
supported numbers and their meanings are as below.
|
||||
|
||||
- 0: Call ``madvise()`` for the region with ``MADV_WILLNEED``
|
||||
- 1: Call ``madvise()`` for the region with ``MADV_COLD``
|
||||
@ -157,20 +174,82 @@ numbers and their meanings are as below.
|
||||
- 4: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``
|
||||
- 5: Do nothing but count the statistics
|
||||
|
||||
You can disable schemes by simply writing an empty string to the file. For
|
||||
example, below commands applies a scheme saying "If a memory region of size in
|
||||
[4KiB, 8KiB] is showing accesses per aggregate interval in [0, 5] for aggregate
|
||||
interval in [10, 20], page out the region", check the entered scheme again, and
|
||||
finally remove the scheme. ::
|
||||
Quota
|
||||
~~~~~
|
||||
|
||||
Optimal ``target access pattern`` for each ``action`` is workload dependent, so
|
||||
not easy to find. Worse yet, setting a scheme of some action too aggressive
|
||||
can cause severe overhead. To avoid such overhead, users can limit time and
|
||||
size quota for the scheme via the ``<quota>`` in below form::
|
||||
|
||||
<ms> <sz> <reset interval> <priority weights>
|
||||
|
||||
This makes DAMON try to use only up to ``<ms>`` milliseconds for applying
|
||||
the action to memory regions of the ``target access pattern`` within the
|
||||
``<reset interval>`` milliseconds, and to apply the action to only up to
|
||||
``<sz>`` bytes of memory regions within the ``<reset interval>``. Setting both
|
||||
``<ms>`` and ``<sz>`` zero disables the quota limits.
|
||||
|
||||
When the quota limit is expected to be exceeded, DAMON prioritizes found memory
|
||||
regions of the ``target access pattern`` based on their size, access frequency,
|
||||
and age. For personalized prioritization, users can set the weights for the
|
||||
three properties in ``<priority weights>`` in below form::
|
||||
|
||||
<size weight> <access frequency weight> <age weight>
|
||||
|
||||
Watermarks
|
||||
~~~~~~~~~~
|
||||
|
||||
Some schemes would need to run based on current value of the system's specific
|
||||
metrics like free memory ratio. For such cases, users can specify watermarks
|
||||
for the condition in the below form::
|
||||
|
||||
<metric> <check interval> <high mark> <middle mark> <low mark>
|
||||
|
||||
``<metric>`` is a predefined integer for the metric to be checked. The
|
||||
supported numbers and their meanings are as below.
|
||||
|
||||
- 0: Ignore the watermarks
|
||||
- 1: System's free memory rate (per thousand)
|
||||
|
||||
The value of the metric is checked every ``<check interval>`` microseconds.
|
||||
|
||||
If the value is higher than ``<high mark>`` or lower than ``<low mark>``, the
|
||||
scheme is deactivated. If the value is lower than ``<middle mark>``, the scheme
|
||||
is activated.
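The watermark rules can be read as a small state transition. The sketch below is only an illustration of the two rules stated above; ``next_active_state`` is a hypothetical helper, and keeping the previous state when the value lies between the middle and high marks is an assumption, since the text does not spell out that case.

```python
def next_active_state(value: int, high: int, mid: int, low: int,
                      currently_active: bool) -> bool:
    """Return whether the scheme is active after one watermark check."""
    if value > high or value < low:
        # Outside [low, high]: the scheme is deactivated.
        return False
    if value < mid:
        # Inside [low, mid): the scheme is activated.
        return True
    # Inside [mid, high]: assumption, keep whatever state we had.
    return currently_active


# Marks are in per-thousand units, e.g. 600 means 60% free memory.
print(next_active_state(450, high=600, mid=500, low=300, currently_active=False))
print(next_active_state(650, high=600, mid=500, low=300, currently_active=True))
```

The gap between the middle and high marks acts as hysteresis under this reading, so a scheme does not flap on and off around a single threshold.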
|
||||
|
||||
.. _damos_stats:
|
||||
|
||||
Statistics
|
||||
~~~~~~~~~~
|
||||
|
||||
It also counts the total number and bytes of regions that each scheme is tried
|
||||
to be applied, the two numbers for the regions that each scheme is successfully
|
||||
applied, and the total number of times the quota limits were exceeded. These statistics can
|
||||
be used for online analysis or tuning of the schemes.
|
||||
|
||||
The statistics can be shown by reading the ``schemes`` file. Reading the file
|
||||
will show each scheme you entered in each line, and the five numbers for the
|
||||
statistics will be added at the end of each line.
|
||||
|
||||
Example
|
||||
~~~~~~~
|
||||
|
||||
The commands below apply a scheme saying "If a memory region of size in [4KiB,
|
||||
8KiB] is showing accesses per aggregate interval in [0, 5] for aggregate
|
||||
interval in [10, 20], page out the region. For the paging out, use only up to
|
||||
10ms per second, and also don't page out more than 1GiB per second. Under the
|
||||
limitation, page out memory regions having longer age first. Also, check the
|
||||
free memory rate of the system every 5 seconds, start the monitoring and paging
|
||||
out when the free memory rate becomes lower than 50%, but stop it if the free
|
||||
memory rate becomes larger than 60%, or lower than 30%". ::
|
||||
|
||||
# cd <debugfs>/damon
|
||||
# echo "4096 8192 0 5 10 20 2" > schemes
|
||||
# cat schemes
|
||||
4096 8192 0 5 10 20 2 0 0
|
||||
# echo > schemes
|
||||
|
||||
The last two integers in the 4th line of above example is the total number and
|
||||
the total size of the regions that the scheme is applied.
|
||||
# scheme="4096 8192 0 5 10 20 2" # target access pattern and action
|
||||
# scheme+=" 10 $((1024*1024*1024)) 1000" # quotas
|
||||
# scheme+=" 0 0 100" # prioritization weights
|
||||
# scheme+=" 1 5000000 600 500 300" # watermarks
|
||||
# echo "$scheme" > schemes
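A wrapper program could assemble the same line from its parts. The following is a minimal sketch, assuming the field order shown in the forms above; ``build_scheme`` is a hypothetical helper, not something DAMON or its user space tool provides.

```python
def build_scheme(pattern, action, quota, weights, watermarks) -> str:
    """Join the scheme fields in the order the ``schemes`` file expects.

    pattern:    (min-size, max-size, min-acc, max-acc, min-age, max-age)
    action:     predefined integer for the memory management action
    quota:      (<ms>, <sz>, <reset interval>)
    weights:    (<size weight>, <access frequency weight>, <age weight>)
    watermarks: (<metric>, <check interval>, <high>, <middle>, <low>)
    """
    fields = [*pattern, action, *quota, *weights, *watermarks]
    return " ".join(str(field) for field in fields)


scheme = build_scheme(
    pattern=(4096, 8192, 0, 5, 10, 20),
    action=2,
    quota=(10, 1024 ** 3, 1000),            # 10ms and 1GiB per 1000ms
    weights=(0, 0, 100),                    # prioritize by age only
    watermarks=(1, 5_000_000, 600, 500, 300),
)
print(scheme)
# Writing the string to <debugfs>/damon/schemes needs root privileges and a
# mounted debugfs, so that step is intentionally left out of this sketch.
```

The resulting string is the same one the shell example builds up in ``$scheme``.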
|
||||
|
||||
|
||||
Turning On/Off
|
||||
@ -195,6 +274,54 @@ the monitoring is turned on. If you write to the files while DAMON is running,
|
||||
an error code such as ``-EBUSY`` will be returned.
|
||||
|
||||
|
||||
Monitoring Thread PID
|
||||
---------------------
|
||||
|
||||
DAMON does requested monitoring with a kernel thread called ``kdamond``. You
|
||||
can get the pid of the thread by reading the ``kdamond_pid`` file. When the
|
||||
monitoring is turned off, reading the file returns ``none``. ::
|
||||
|
||||
# cd <debugfs>/damon
|
||||
# cat monitor_on
|
||||
off
|
||||
# cat kdamond_pid
|
||||
none
|
||||
# echo on > monitor_on
|
||||
# cat kdamond_pid
|
||||
18594
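A program reading this file has to handle both the ``none`` case and a numeric pid. A minimal sketch follows; ``parse_kdamond_pid`` and ``read_kdamond_pid`` are hypothetical helper names, and the directory argument stands for wherever ``<debugfs>/damon`` is mounted on your system.

```python
from pathlib import Path
from typing import Optional


def parse_kdamond_pid(text: str) -> Optional[int]:
    """Turn the content of ``kdamond_pid`` into a pid, or None if off."""
    text = text.strip()
    return None if text == "none" else int(text)


def read_kdamond_pid(damon_dir: str) -> Optional[int]:
    """Read ``kdamond_pid`` from the given DAMON debugfs directory."""
    return parse_kdamond_pid(Path(damon_dir, "kdamond_pid").read_text())


print(parse_kdamond_pid("none\n"))
print(parse_kdamond_pid("18594\n"))
```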
|
||||
|
||||
|
||||
Using Multiple Monitoring Threads
|
||||
---------------------------------
|
||||
|
||||
One ``kdamond`` thread is created for each monitoring context. You can create
|
||||
and remove monitoring contexts for use cases requiring multiple ``kdamond`` threads using
|
||||
the ``mk_contexts`` and ``rm_contexts`` files.
|
||||
|
||||
Writing the name of the new context to the ``mk_contexts`` file creates a
|
||||
directory of the name on the DAMON debugfs directory. The directory will have
|
||||
DAMON debugfs files for the context. ::
|
||||
|
||||
# cd <debugfs>/damon
|
||||
# ls foo
|
||||
ls: cannot access 'foo': No such file or directory
|
||||
# echo foo > mk_contexts
|
||||
# ls foo
|
||||
attrs init_regions kdamond_pid schemes target_ids
|
||||
|
||||
If the context is not needed anymore, you can remove it and the corresponding
|
||||
directory by putting the name of the context to the ``rm_contexts`` file. ::
|
||||
|
||||
# echo foo > rm_contexts
|
||||
# ls foo
|
||||
ls: cannot access 'foo': No such file or directory
|
||||
|
||||
Note that ``mk_contexts``, ``rm_contexts``, and ``monitor_on`` files are in the
|
||||
root directory only.
|
||||
|
||||
|
||||
.. _tracepoint:
|
||||
|
||||
Tracepoint for Monitoring Results
|
||||
=================================
|
||||
|
||||
|
@ -408,7 +408,7 @@ follows:
|
||||
Memory Policy APIs
|
||||
==================
|
||||
|
||||
Linux supports 3 system calls for controlling memory policy. These APIs
|
||||
Linux supports 4 system calls for controlling memory policy. These APIs
|
||||
always affect only the calling task, the calling task's address space, or
|
||||
some shared object mapped into the calling task's address space.
|
||||
|
||||
@ -460,6 +460,20 @@ requested via the 'flags' argument.
|
||||
|
||||
See the mbind(2) man page for more details.
|
||||
|
||||
Set home node for a Range of Task's Address Space::
|
||||
|
||||
long sys_set_mempolicy_home_node(unsigned long start, unsigned long len,
|
||||
unsigned long home_node,
|
||||
unsigned long flags);
|
||||
|
||||
sys_set_mempolicy_home_node sets the home node for a VMA policy present in the
|
||||
task's address range. The system call updates the home node only for the existing
|
||||
mempolicy range. Other address ranges are ignored. A home node is the NUMA node
|
||||
closest to which page allocation will come from. Specifying the home node overrides
|
||||
the default allocation policy to allocate memory close to the local node for an
|
||||
executing CPU.
|
||||
|
||||
|
||||
Memory Policy Command Line Interface
|
||||
====================================
|
||||
|
||||
|
106
Documentation/admin-guide/perf/hisi-pcie-pmu.rst
Normal file
@ -0,0 +1,106 @@
|
||||
================================================
|
||||
HiSilicon PCIe Performance Monitoring Unit (PMU)
|
||||
================================================
|
||||
|
||||
On Hip09, HiSilicon PCIe Performance Monitoring Unit (PMU) could monitor
|
||||
bandwidth, latency, bus utilization and buffer occupancy data of PCIe.
|
||||
|
||||
Each PCIe Core has a PMU to monitor multiple Root Ports of this PCIe Core and
|
||||
all Endpoints downstream of these Root Ports.
|
||||
|
||||
|
||||
HiSilicon PCIe PMU driver
|
||||
=========================
|
||||
|
||||
The PCIe PMU driver registers a perf PMU with the name of its sicl-id and PCIe
|
||||
Core id::
|
||||
|
||||
/sys/bus/event_source/devices/hisi_pcie<sicl>_<core>
|
||||
|
||||
PMU driver provides description of available events and filter options in sysfs,
|
||||
see /sys/bus/event_source/devices/hisi_pcie<sicl>_<core>.
|
||||
|
||||
The "format" directory describes all formats of the config (events) and config1
|
||||
(filter options) fields of the perf_event_attr structure. The "events" directory
|
||||
describes all documented events shown in perf list.
|
||||
|
||||
The "identifier" sysfs file allows users to identify the version of the
|
||||
PMU hardware device.
|
||||
|
||||
The "bus" sysfs file allows users to get the bus number of Root Ports
|
||||
monitored by PMU.
|
||||
|
||||
Example usage of perf::
|
||||
|
||||
$# perf list
|
||||
hisi_pcie0_0/rx_mwr_latency/ [kernel PMU event]
|
||||
hisi_pcie0_0/rx_mwr_cnt/ [kernel PMU event]
|
||||
------------------------------------------
|
||||
|
||||
$# perf stat -e hisi_pcie0_0/rx_mwr_latency/
|
||||
$# perf stat -e hisi_pcie0_0/rx_mwr_cnt/
|
||||
$# perf stat -g -e hisi_pcie0_0/rx_mwr_latency/ -e hisi_pcie0_0/rx_mwr_cnt/
|
||||
|
||||
The current driver does not support sampling. So "perf record" is unsupported.
|
||||
Attaching to a task is also unsupported for the PCIe PMU.
|
||||
|
||||
Filter options
|
||||
--------------
|
||||
|
||||
1. Target filter
|
||||
PMU could only monitor the performance of traffic downstream target Root Ports
|
||||
or downstream target Endpoint. PCIe PMU driver supports "port" and "bdf"
|
||||
interfaces for users, and these two interfaces aren't supported at the same
|
||||
time.
|
||||
|
||||
-port
|
||||
"port" filter can be used in all PCIe PMU events, target Root Port can be
|
||||
selected by configuring the 16-bit bitmap "port". Multiple ports can be selected
|
||||
for AP-layer-events, and only one port can be selected for TL/DL-layer-events.
|
||||
|
||||
For example, if target Root Port is 0000:00:00.0 (x8 lanes), bit0 of bitmap
|
||||
should be set, port=0x1; if target Root Port is 0000:00:04.0 (x4 lanes),
|
||||
bit8 is set, port=0x100; if these two Root Ports are both monitored, port=0x101.
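The "port" value is a plain bitwise OR of the bits for each selected Root Port. The sketch below reproduces the three values from the example; ``port_mask`` is a hypothetical helper, and which bit belongs to which Root Port depends on the platform, so the bit numbers here are taken from the example only.

```python
def port_mask(*bits: int) -> int:
    """Build the 16-bit "port" bitmap from the given bit positions."""
    mask = 0
    for bit in bits:
        if not 0 <= bit < 16:
            raise ValueError(f"bit {bit} does not fit in a 16-bit bitmap")
        mask |= 1 << bit
    return mask


print(hex(port_mask(0)))     # Root Port 0000:00:00.0 in the example
print(hex(port_mask(8)))     # Root Port 0000:00:04.0 in the example
print(hex(port_mask(0, 8)))  # both Root Ports monitored together
```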
|
||||
|
||||
Example usage of perf::
|
||||
|
||||
$# perf stat -e hisi_pcie0_0/rx_mwr_latency,port=0x1/ sleep 5
|
||||
|
||||
-bdf
|
||||
|
||||
"bdf" filter can only be used in bandwidth events, target Endpoint is selected
|
||||
by configuring BDF to "bdf". Counter only counts the bandwidth of message
|
||||
requested by target Endpoint.
|
||||
|
||||
For example, "bdf=0x3900" means BDF of target Endpoint is 0000:39:00.0.
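The "bdf" value can be derived from a PCI address string. This is a minimal sketch assuming the conventional 16-bit layout (8-bit bus, 5-bit device, 3-bit function), which is consistent with the example above; ``encode_bdf`` and ``bdf_from_address`` are hypothetical helper names.

```python
def encode_bdf(bus: int, device: int, function: int) -> int:
    """Pack bus/device/function into the conventional 16-bit BDF value."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus/device/function out of range")
    return (bus << 8) | (device << 3) | function


def bdf_from_address(address: str) -> int:
    """Convert an address such as '0000:39:00.0' to its BDF value."""
    _domain, bus, devfn = address.split(":")
    device, function = devfn.split(".")
    return encode_bdf(int(bus, 16), int(device, 16), int(function, 16))


print(hex(bdf_from_address("0000:39:00.0")))
```

The PCI domain is parsed but ignored, because the filter value only carries the bus, device and function numbers.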
|
||||
|
||||
Example usage of perf::
|
||||
|
||||
$# perf stat -e hisi_pcie0_0/rx_mrd_flux,bdf=0x3900/ sleep 5
|
||||
|
||||
2. Trigger filter
|
||||
Event statistics start the first time the TLP length is greater/smaller
|
||||
than trigger condition. You can set the trigger condition by writing "trig_len",
|
||||
and set the trigger mode by writing "trig_mode". This filter can only be used
|
||||
in bandwidth events.
|
||||
|
||||
For example, "trig_len=4" means trigger condition is 2^4 DW, "trig_mode=0"
|
||||
means statistics start when TLP length > trigger condition, "trig_mode=1"
|
||||
means start when TLP length < condition.
|
||||
|
||||
Example usage of perf::
|
||||
|
||||
$# perf stat -e hisi_pcie0_0/rx_mrd_flux,trig_len=0x4,trig_mode=1/ sleep 5
|
||||
|
||||
3. Threshold filter
|
||||
The counter counts when the TLP length is within the specified range. You can set the
|
||||
threshold by writing "thr_len", and set the threshold mode by writing
|
||||
"thr_mode". This filter can only be used in bandwidth events.
|
||||
|
||||
For example, "thr_len=4" means threshold is 2^4 DW, "thr_mode=0" means
|
||||
counter counts when TLP length >= threshold, and "thr_mode=1" means counts
|
||||
when TLP length < threshold.
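The threshold rule can be written out as a predicate. This sketch only restates the two modes described above; ``threshold_dw`` and ``tlp_is_counted`` are hypothetical helper names, and the conversion to bytes uses the usual PCIe definition of one DW as 4 bytes.

```python
def threshold_dw(thr_len: int) -> int:
    """Threshold in DW for a given "thr_len", i.e. 2^thr_len."""
    return 1 << thr_len


def tlp_is_counted(tlp_len_dw: int, thr_len: int, thr_mode: int) -> bool:
    """Return True if a TLP of the given length passes the filter."""
    threshold = threshold_dw(thr_len)
    if thr_mode == 0:
        return tlp_len_dw >= threshold
    if thr_mode == 1:
        return tlp_len_dw < threshold
    raise ValueError("thr_mode must be 0 or 1")


print(threshold_dw(4), "DW =", threshold_dw(4) * 4, "bytes")
print(tlp_is_counted(16, thr_len=4, thr_mode=0))
print(tlp_is_counted(16, thr_len=4, thr_mode=1))
```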
|
||||
|
||||
Example usage of perf::
|
||||
|
||||
$# perf stat -e hisi_pcie0_0/rx_mrd_flux,thr_len=0x4,thr_mode=1/ sleep 5
|
382
Documentation/admin-guide/pm/amd-pstate.rst
Normal file
@ -0,0 +1,382 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
.. include:: <isonum.txt>
|
||||
|
||||
===============================================
|
||||
``amd-pstate`` CPU Performance Scaling Driver
|
||||
===============================================
|
||||
|
||||
:Copyright: |copy| 2021 Advanced Micro Devices, Inc.
|
||||
|
||||
:Author: Huang Rui <ray.huang@amd.com>
|
||||
|
||||
|
||||
Introduction
|
||||
===================
|
||||
|
||||
``amd-pstate`` is the AMD CPU performance scaling driver that introduces a
|
||||
new CPU frequency control mechanism on modern AMD APU and CPU series in
|
||||
Linux kernel. The new mechanism is based on Collaborative Processor
|
||||
Performance Control (CPPC) which provides finer grain frequency management
|
||||
than legacy ACPI hardware P-States. Current AMD CPU/APU platforms are using
|
||||
the ACPI P-states driver to manage CPU frequency and clocks with switching
|
||||
only in 3 P-states. CPPC replaces the ACPI P-states controls, allows a
|
||||
flexible, low-latency interface for the Linux kernel to directly
|
||||
communicate the performance hints to hardware.
|
||||
|
||||
``amd-pstate`` leverages the Linux kernel governors such as ``schedutil``,
|
||||
``ondemand``, etc. to manage the performance hints which are provided by
|
||||
CPPC hardware functionality that internally follows the hardware
|
||||
specification (for details refer to AMD64 Architecture Programmer's Manual
|
||||
Volume 2: System Programming [1]_). Currently ``amd-pstate`` supports basic
|
||||
frequency control function according to kernel governors on some of the
|
||||
Zen2 and Zen3 processors, and we will implement more AMD specific functions
|
||||
in future after we verify them on the hardware and SBIOS.
|
||||
|
||||
|
||||
AMD CPPC Overview
|
||||
=======================
|
||||
|
||||
Collaborative Processor Performance Control (CPPC) interface enumerates a
|
||||
continuous, abstract, and unit-less performance value in a scale that is
|
||||
not tied to a specific performance state / frequency. This is an ACPI
|
||||
standard [2]_ through which software can specify application performance goals and
|
||||
hints as a relative target to the infrastructure limits. AMD processors
|
||||
provide the low latency register model (MSR) instead of AML code
|
||||
interpreter for performance adjustments. ``amd-pstate`` will initialize a
|
||||
``struct cpufreq_driver`` instance ``amd_pstate_driver`` with the callbacks
|
||||
to manage each performance update behavior. ::
|
||||
|
||||
Highest Perf ------>+-----------------------+ +-----------------------+
|
||||
| | | |
|
||||
| | | |
|
||||
| | Max Perf ---->| |
|
||||
| | | |
|
||||
| | | |
|
||||
Nominal Perf ------>+-----------------------+ +-----------------------+
|
||||
| | | |
|
||||
| | | |
|
||||
| | | |
|
||||
| | | |
|
||||
| | | |
|
||||
| | | |
|
||||
| | Desired Perf ---->| |
|
||||
| | | |
|
||||
| | | |
|
||||
| | | |
|
||||
| | | |
|
||||
| | | |
|
||||
| | | |
|
||||
| | | |
|
||||
| | | |
|
||||
| | | |
|
||||
Lowest non- | | | |
|
||||
linear perf ------>+-----------------------+ +-----------------------+
|
||||
| | | |
|
||||
| | Lowest perf ---->| |
|
||||
| | | |
|
||||
Lowest perf ------>+-----------------------+ +-----------------------+
|
||||
| | | |
|
||||
| | | |
|
||||
| | | |
|
||||
0 ------>+-----------------------+ +-----------------------+
|
||||
|
||||
AMD P-States Performance Scale
|
||||
|
||||
|
||||
.. _perf_cap:
|
||||
|
||||
AMD CPPC Performance Capability
|
||||
--------------------------------
|
||||
|
||||
Highest Performance (RO)
|
||||
.........................
|
||||
|
||||
It is the absolute maximum performance an individual processor may reach,
|
||||
assuming ideal conditions. This performance level may not be sustainable
|
||||
for long durations and may only be achievable if other platform components
|
||||
are in a specific state; for example, it may require other processors be in
|
||||
an idle state. This would be equivalent to the highest frequencies
|
||||
supported by the processor.
|
||||
|
||||
Nominal (Guaranteed) Performance (RO)
|
||||
......................................
|
||||
|
||||
It is the maximum sustained performance level of the processor, assuming
|
||||
ideal operating conditions. In absence of an external constraint (power,
|
||||
thermal, etc.) this is the performance level the processor is expected to
|
||||
be able to maintain continuously. All cores/processors are expected to be
|
||||
able to sustain their nominal performance state simultaneously.
|
||||
|
||||
Lowest non-linear Performance (RO)
|
||||
...................................
|
||||
|
||||
It is the lowest performance level at which nonlinear power savings are
|
||||
achieved, for example, due to the combined effects of voltage and frequency
|
||||
scaling. Above this threshold, lower performance levels should be generally
|
||||
more energy efficient than higher performance levels. This register
|
||||
effectively conveys the most efficient performance level to ``amd-pstate``.
|
||||
|
||||
Lowest Performance (RO)
|
||||
........................
|
||||
|
||||
It is the absolute lowest performance level of the processor. Selecting a
|
||||
performance level lower than the lowest nonlinear performance level may
|
||||
cause an efficiency penalty but should reduce the instantaneous power
|
||||
consumption of the processor.
|
||||
|
||||
AMD CPPC Performance Control
|
||||
------------------------------
|
||||
|
||||
``amd-pstate`` passes performance goals through these registers. The
|
||||
register drives the behavior of the desired performance target.
|
||||
|
||||
Minimum requested performance (RW)
|
||||
...................................
|
||||
|
||||
``amd-pstate`` specifies the minimum allowed performance level.
|
||||
|
||||
Maximum requested performance (RW)
|
||||
...................................
|
||||
|
||||
``amd-pstate`` specifies a limit on the maximum performance that is expected
|
||||
to be supplied by the hardware.
|
||||
|
||||
Desired performance target (RW)
|
||||
...................................
|
||||
|
||||
``amd-pstate`` specifies a desired target in the CPPC performance scale as
|
||||
a relative number. This can be expressed as percentage of nominal
|
||||
performance (infrastructure max). Below the nominal sustained performance
|
||||
level, desired performance expresses the average performance level of the
|
||||
processor subject to hardware. Above the nominal performance level,
|
||||
processor must provide at least nominal performance requested and go higher
|
||||
if current operating conditions allow.
|
||||
|
||||
Energy Performance Preference (EPP) (RW)
|
||||
.........................................
|
||||
|
||||
Provides a hint to the hardware if software wants to bias toward performance
|
||||
(0x0) or energy efficiency (0xff).
|
||||
|
||||
|
||||
Key Governors Support
|
||||
=======================
|
||||
|
||||
``amd-pstate`` can be used with all the (generic) scaling governors listed
|
||||
by the ``scaling_available_governors`` policy attribute in ``sysfs``. Then,
|
||||
it is responsible for the configuration of policy objects corresponding to
|
||||
CPUs and provides the ``CPUFreq`` core (and the scaling governors attached
|
||||
to the policy objects) with accurate information on the maximum and minimum
|
||||
operating frequencies supported by the hardware. Users can check the
|
||||
``scaling_cur_freq`` information that comes from the ``CPUFreq`` core.
|
||||
|
||||
``amd-pstate`` mainly supports ``schedutil`` and ``ondemand`` for dynamic
|
||||
frequency control. It is to fine tune the processor configuration on
|
||||
``amd-pstate`` to the ``schedutil`` with CPU CFS scheduler. ``amd-pstate``
|
||||
registers adjust_perf callback to implement the CPPC similar performance
|
||||
update behavior. It is initialized by ``sugov_start`` and then populates the
|
||||
CPU's update_util_data pointer to assign ``sugov_update_single_perf`` as
|
||||
the utilization update callback function in CPU scheduler. CPU scheduler
|
||||
will call ``cpufreq_update_util`` and assign the target performance
|
||||
according to the ``struct sugov_cpu`` that utilization update belongs to.
|
||||
Then ``amd-pstate`` updates the desired performance according to the CPU
|
||||
scheduler assigned.
|
||||
|
||||
|
||||
Processor Support
|
||||
=======================
|
||||
|
||||
The ``amd-pstate`` initialization will fail if the _CPC in ACPI SBIOS is
|
||||
not present for the detected processor, and it uses ``acpi_cpc_valid`` to
|
||||
check the _CPC existence. All Zen based processors support legacy ACPI
|
||||
hardware P-States function, so while the ``amd-pstate`` fails to be
|
||||
initialized, the kernel will fall back to initialize ``acpi-cpufreq``
|
||||
driver.
|
||||
|
||||
There are two types of hardware implementations for ``amd-pstate``: one is
|
||||
`Full MSR Support <perf_cap_>`_ and another is `Shared Memory Support
|
||||
<perf_cap_>`_. It can use :c:macro:`X86_FEATURE_CPPC` feature flag (for
|
||||
details refer to Processor Programming Reference (PPR) for AMD Family
|
||||
19h Model 51h, Revision A1 Processors [3]_) to indicate the different
|
||||
types. ``amd-pstate`` is to register different ``static_call`` instances
|
||||
for different hardware implementations.
|
||||
|
||||
Currently, some of Zen2 and Zen3 processors support ``amd-pstate``. In the
|
||||
future, it will be supported on more and more AMD processors.
|
||||
|
||||
Full MSR Support
|
||||
-----------------
|
||||
|
||||
Some new Zen3 processors such as Cezanne provide the MSR registers directly
|
||||
while the :c:macro:`X86_FEATURE_CPPC` CPU feature flag is set.
|
||||
``amd-pstate`` can handle the MSR register to implement the fast switch
|
||||
function in ``CPUFreq`` that can shrink latency of frequency control on the
|
||||
interrupt context. The functions with ``pstate_xxx`` prefix represent the
|
||||
operations of MSR registers.
|
||||
|
||||
Shared Memory Support
|
||||
----------------------
|
||||
|
||||
If :c:macro:`X86_FEATURE_CPPC` CPU feature flag is not set, that means the
|
||||
processor supports shared memory solution. In this case, ``amd-pstate``
|
||||
uses the ``cppc_acpi`` helper methods to implement the callback functions
|
||||
that are defined on ``static_call``. The functions with ``cppc_xxx`` prefix
|
||||
represent the operations of acpi cppc helpers for shared memory solution.
|
||||
|
||||
|
||||
AMD P-States and ACPI hardware P-States always can be supported in one
|
||||
processor. But AMD P-States has the higher priority and if it is enabled
|
||||
with :c:macro:`MSR_AMD_CPPC_ENABLE` or ``cppc_set_enable``, it will respond
|
||||
to the request from AMD P-States.
|
||||
|
||||
|
||||
User Space Interface in ``sysfs``
|
||||
==================================
|
||||
|
||||
``amd-pstate`` exposes several global attributes (files) in ``sysfs`` to
|
||||
control its functionality at the system level. They are located in the
|
||||
``/sys/devices/system/cpu/cpufreq/policyX/`` directory and affect all CPUs. ::
|
||||
|
||||
root@hr-test1:/home/ray# ls /sys/devices/system/cpu/cpufreq/policy0/*amd*
|
||||
/sys/devices/system/cpu/cpufreq/policy0/amd_pstate_highest_perf
|
||||
/sys/devices/system/cpu/cpufreq/policy0/amd_pstate_lowest_nonlinear_freq
|
||||
/sys/devices/system/cpu/cpufreq/policy0/amd_pstate_max_freq
|
||||
|
||||
|
||||
``amd_pstate_highest_perf / amd_pstate_max_freq``
|
||||
|
||||
Maximum CPPC performance and CPU frequency that the driver is allowed to
|
||||
set in percent of the maximum supported CPPC performance level (the highest
|
||||
performance supported in `AMD CPPC Performance Capability <perf_cap_>`_).
|
||||
In some ASICs, the highest CPPC performance is not the one in the _CPC
|
||||
table, so we need to expose it to sysfs. If boost is not active but
|
||||
supported, this maximum frequency will be larger than the one in
|
||||
``cpuinfo``.
|
||||
This attribute is read-only.
|
||||
|
||||
``amd_pstate_lowest_nonlinear_freq``
|
||||
|
||||
The lowest non-linear CPPC CPU frequency that the driver is allowed to set
|
||||
in percent of the maximum supported CPPC performance level (please see the
|
||||
lowest non-linear performance in `AMD CPPC Performance Capability
|
||||
<perf_cap_>`_).
|
||||
This attribute is read-only.
|
||||
|
||||
For other performance and frequency values, we can read them back from
|
||||
``/sys/devices/system/cpu/cpuX/acpi_cppc/``, see :ref:`cppc_sysfs`.
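The attributes described above can also be collected programmatically. The following is a minimal sketch, assuming that each ``amd_pstate_*`` file holds a single integer; the helper name ``read_amd_pstate_attrs`` is hypothetical and not part of the driver or of any existing tool.

```python
from pathlib import Path


def read_amd_pstate_attrs(policy_dir):
    """Return {attribute_name: int_value} for every amd_pstate_* file in policy_dir.

    Assumption: each attribute file contains one integer followed by a newline.
    """
    attrs = {}
    for entry in sorted(Path(policy_dir).glob("amd_pstate_*")):
        # Only the amd_pstate_* files are read; other cpufreq files are ignored.
        attrs[entry.name] = int(entry.read_text().strip())
    return attrs
```

On a system running the driver, it would be called with a policy directory such as ``/sys/devices/system/cpu/cpufreq/policy0``.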
|
||||
|
||||
|
||||
``amd-pstate`` vs ``acpi-cpufreq``
|
||||
======================================
|
||||
|
||||
On the majority of AMD platforms supported by ``acpi-cpufreq``, the ACPI tables
provided by the platform firmware are used for CPU performance scaling, but
they only provide 3 P-states on AMD processors.
|
||||
However, modern AMD APU and CPU series provide collaborative
processor performance control according to the ACPI protocol, customized
for AMD platforms. This offers a fine-grained and continuous frequency range
instead of the legacy hardware P-states. ``amd-pstate`` is the kernel
module which supports the new AMD P-States mechanism on most future AMD
platforms. The AMD P-States mechanism is the more performance- and
energy-efficient frequency management method on AMD processors.
|
||||
|
||||
Kernel Module Options for ``amd-pstate``
|
||||
=========================================
|
||||
|
||||
``shared_mem``
|
||||
Use the ``shared_mem`` module parameter to enable the related processors manually with
**amd_pstate.shared_mem=1**.
Because of a performance issue on processors with `Shared Memory Support
<perf_cap_>`_, it is disabled for the moment and will be enabled by default
once the performance issue with this solution is addressed.
|
||||
|
||||
To check whether the current processor provides `Full MSR Support <perf_cap_>`_
or `Shared Memory Support <perf_cap_>`_: ::
|
||||
|
||||
ray@hr-test1:~$ lscpu | grep cppc
|
||||
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm
|
||||
|
||||
If the CPU flags include ``cppc``, then this processor supports `Full MSR Support
|
||||
<perf_cap_>`_. Otherwise it supports `Shared Memory Support <perf_cap_>`_.
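The same check can be scripted. This is a small sketch that applies the rule stated above to the text of ``/proc/cpuinfo`` or ``lscpu`` output; the function name ``cppc_support_mode`` is hypothetical, and the classification simply follows the rule in this document rather than querying the driver.

```python
def cppc_support_mode(cpuinfo_text):
    """Classify a CPU from the text of /proc/cpuinfo or lscpu output."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "flags":
            # Flags are whitespace-separated; compare whole words so that a
            # flag merely containing "cppc" as a substring does not match.
            if "cppc" in value.split():
                return "Full MSR Support"
            return "Shared Memory Support"
    raise ValueError("no CPU flags line found")
```

For example, passing the contents of ``/proc/cpuinfo`` (read with ``open("/proc/cpuinfo").read()``) returns one of the two mode names used in this document.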
|
||||
|
||||
|
||||
``cpupower`` tool support for ``amd-pstate``
|
||||
===============================================
|
||||
|
||||
``amd-pstate`` is supported by the ``cpupower`` tool, which can be used to dump the frequency
information. Support for more operations on the new
``amd-pstate`` module with this tool is in progress. ::
|
||||
|
||||
root@hr-test1:/home/ray# cpupower frequency-info
|
||||
analyzing CPU 0:
|
||||
driver: amd-pstate
|
||||
CPUs which run at the same hardware frequency: 0
|
||||
CPUs which need to have their frequency coordinated by software: 0
|
||||
maximum transition latency: 131 us
|
||||
hardware limits: 400 MHz - 4.68 GHz
|
||||
available cpufreq governors: ondemand conservative powersave userspace performance schedutil
|
||||
current policy: frequency should be within 400 MHz and 4.68 GHz.
|
||||
The governor "schedutil" may decide which speed to use
|
||||
within this range.
|
||||
current CPU frequency: Unable to call hardware
|
||||
current CPU frequency: 4.02 GHz (asserted by call to kernel)
|
||||
boost state support:
|
||||
Supported: yes
|
||||
Active: yes
|
||||
AMD PSTATE Highest Performance: 166. Maximum Frequency: 4.68 GHz.
|
||||
AMD PSTATE Nominal Performance: 117. Nominal Frequency: 3.30 GHz.
|
||||
AMD PSTATE Lowest Non-linear Performance: 39. Lowest Non-linear Frequency: 1.10 GHz.
|
||||
AMD PSTATE Lowest Performance: 15. Lowest Frequency: 400 MHz.
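The performance and frequency pairs in the sample output above can be cross-checked with simple arithmetic. The sketch below assumes that frequency scales proportionally with the CPPC performance level relative to the nominal pair; that proportionality is an assumption made for illustration, and the helper ``perf_to_freq_mhz`` is hypothetical.

```python
def perf_to_freq_mhz(perf, nominal_perf, nominal_freq_mhz):
    """Scale a CPPC performance level to MHz, assuming a proportional relation."""
    return nominal_freq_mhz * perf / nominal_perf


# Nominal pair taken from the sample output: performance 117 at 3.30 GHz.
NOMINAL_PERF = 117
NOMINAL_FREQ_MHZ = 3300

highest = perf_to_freq_mhz(166, NOMINAL_PERF, NOMINAL_FREQ_MHZ)
lowest_nonlinear = perf_to_freq_mhz(39, NOMINAL_PERF, NOMINAL_FREQ_MHZ)

print(f"highest: {highest / 1000:.2f} GHz")                    # highest: 4.68 GHz
print(f"lowest non-linear: {lowest_nonlinear / 1000:.2f} GHz")  # lowest non-linear: 1.10 GHz
```

Both results agree with the sample output. The lowest level does not fit the same ratio exactly (performance 15 would give about 423 MHz, while 400 MHz is reported), so the proportional model should be treated as approximate.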
|
||||
|
||||
|
||||
Diagnostics and Tuning
|
||||
=======================
|
||||
|
||||
Trace Events
|
||||
--------------
|
||||
|
||||
There are two static trace events that can be used for ``amd-pstate``
|
||||
diagnostics. One of them is the cpu_frequency trace event generally used
|
||||
by ``CPUFreq``, and the other one is the ``amd_pstate_perf`` trace event
|
||||
specific to ``amd-pstate``. The following sequence of shell commands can
|
||||
be used to enable them and see their output (if the kernel is generally
|
||||
configured to support event tracing). ::
|
||||
|
||||
root@hr-test1:/home/ray# cd /sys/kernel/tracing/
|
||||
root@hr-test1:/sys/kernel/tracing# echo 1 > events/amd_cpu/enable
|
||||
root@hr-test1:/sys/kernel/tracing# cat trace
|
||||
# tracer: nop
|
||||
#
|
||||
# entries-in-buffer/entries-written: 47827/42233061 #P:2
|
||||
#
|
||||
# _-----=> irqs-off
|
||||
# / _----=> need-resched
|
||||
# | / _---=> hardirq/softirq
|
||||
# || / _--=> preempt-depth
|
||||
# ||| / delay
|
||||
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
|
||||
# | | | |||| | |
|
||||
<idle>-0 [015] dN... 4995.979886: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=15 changed=false fast_switch=true
|
||||
<idle>-0 [007] d.h.. 4995.979893: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=7 changed=false fast_switch=true
|
||||
cat-2161 [000] d.... 4995.980841: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=0 changed=false fast_switch=true
|
||||
sshd-2125 [004] d.s.. 4995.980968: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=4 changed=false fast_switch=true
|
||||
<idle>-0 [007] d.s.. 4995.980968: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=7 changed=false fast_switch=true
|
||||
<idle>-0 [003] d.s.. 4995.980971: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=3 changed=false fast_switch=true
|
||||
<idle>-0 [011] d.s.. 4995.980996: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=11 changed=false fast_switch=true
|
||||
|
||||
The cpu_frequency trace event will be triggered either by the ``schedutil`` scaling
|
||||
governor (for the policies it is attached to), or by the ``CPUFreq`` core (for the
|
||||
policies with other scaling governors).
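For post-processing, the ``amd_pstate_perf`` lines in the trace output shown earlier can be split into fields. The following is a minimal sketch that relies only on the ``key=value`` layout visible in that sample; the function name ``parse_amd_pstate_perf`` is hypothetical.

```python
import re

# One key=value pair, e.g. "amd_max_perf=166" or "changed=false".
_FIELD = re.compile(r"(\w+)=(\S+)")


def parse_amd_pstate_perf(line):
    """Return the fields of one amd_pstate_perf trace line, or None for other lines."""
    marker = "amd_pstate_perf:"
    if marker not in line:
        return None
    payload = line.split(marker, 1)[1]
    fields = {}
    for key, raw in _FIELD.findall(payload):
        if raw in ("true", "false"):
            fields[key] = raw == "true"   # boolean fields such as fast_switch
        else:
            fields[key] = int(raw)        # numeric fields such as cpu_id
    return fields
```

Applied to each line of ``/sys/kernel/tracing/trace``, this makes it straightforward to filter, for instance, the entries where ``changed`` is true.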
|
||||
|
||||
|
||||
Reference
|
||||
===========
|
||||
|
||||
.. [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming,
|
||||
https://www.amd.com/system/files/TechDocs/24593.pdf
|
||||
|
||||
.. [2] Advanced Configuration and Power Interface Specification,
|
||||
https://uefi.org/sites/default/files/resources/ACPI_Spec_6_4_Jan22.pdf
|
||||
|
||||
.. [3] Processor Programming Reference (PPR) for AMD Family 19h Model 51h, Revision A1 Processors
|
||||
https://www.amd.com/system/files/TechDocs/56569-A1-PUB.zip
|
@ -11,6 +11,7 @@ Working-State Power Management
|
||||
intel_idle
|
||||
cpufreq
|
||||
intel_pstate
|
||||
amd-pstate
|
||||
cpufreq_drivers
|
||||
intel_epb
|
||||
intel-speed-select
|
||||
|
@ -905,6 +905,17 @@ enabled, otherwise writing to this file will return ``-EBUSY``.
|
||||
The default value is 8.
|
||||
|
||||
|
||||
perf_user_access (arm64 only)
|
||||
=================================
|
||||
|
||||
Controls user space access for reading perf event counters. When set to 1,
|
||||
user space can read performance monitor counter registers directly.
|
||||
|
||||
The default value is 0 (access disabled).
|
||||
|
||||
See Documentation/arm64/perf.rst for more information.
|
||||
|
||||
|
||||
pid_max
|
||||
=======
|
||||
|
||||
|
@ -948,7 +948,7 @@ how much memory needs to be free before kswapd goes back to sleep.
|
||||
|
||||
The unit is in fractions of 10,000. The default value of 10 means the
|
||||
distances between watermarks are 0.1% of the available memory in the
|
||||
node/system. The maximum value is 1000, or 10% of memory.
|
||||
node/system. The maximum value is 3000, or 30% of memory.
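As a quick check of the unit conversion, the sketch below turns a value expressed in fractions of 10,000 into a percentage of memory. The helper name is hypothetical and the inputs are the values quoted in the text.

```python
def watermark_distance_percent(scale_factor):
    """Convert a value given in fractions of 10,000 to a percentage."""
    return scale_factor * 100 / 10_000


print(watermark_distance_percent(10))    # 0.1
print(watermark_distance_percent(3000))  # 30.0
```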
|
||||
|
||||
A high rate of threads entering direct reclaim (allocstall) or kswapd
|
||||
going to sleep prematurely (kswapd_low_wmark_hit_quickly) can indicate
|
||||
|
85
Documentation/arc/arc.rst
Normal file
@ -0,0 +1,85 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
Linux kernel for ARC processors
|
||||
*******************************
|
||||
|
||||
Other sources of information
|
||||
############################
|
||||
|
||||
Below are some resources where more information can be found on
|
||||
ARC processors and relevant open source projects.
|
||||
|
||||
- `<https://embarc.org>`_ - Community portal for open source on ARC.
|
||||
Good place to start to find relevant FOSS projects, toolchain releases,
|
||||
news items and more.
|
||||
|
||||
- `<https://github.com/foss-for-synopsys-dwc-arc-processors>`_ -
|
||||
Home for all development activities regarding open source projects for
|
||||
ARC processors. Some of the projects are forks of various upstream projects,
|
||||
where "work in progress" is hosted prior to submission to upstream projects.
|
||||
Other projects are developed by Synopsys and made available to community
|
||||
as open source for use on ARC Processors.
|
||||
|
||||
- `Official Synopsys ARC Processors website
|
||||
<https://www.synopsys.com/designware-ip/processor-solutions.html>`_ -
|
||||
location, with access to some IP documentation (`Programmer's Reference
|
||||
Manual, AKA PRM for ARC HS processors
|
||||
<https://www.synopsys.com/dw/doc.php/ds/cc/programmers-reference-manual-ARC-HS.pdf>`_)
|
||||
and free versions of some commercial tools (`Free nSIM
|
||||
<https://www.synopsys.com/cgi-bin/dwarcnsim/req1.cgi>`_ and
|
||||
`MetaWare Light Edition <https://www.synopsys.com/cgi-bin/arcmwtk_lite/reg1.cgi>`_).
|
||||
Please note though, registration is required to access both the documentation and
|
||||
the tools.
|
||||
|
||||
Important note on ARC processors configurability
|
||||
################################################
|
||||
|
||||
ARC processors are highly configurable and several configurable options
|
||||
are supported in Linux. Some options are transparent to software
|
||||
(e.g. cache geometries), some can be detected at runtime and configured
|
||||
and used accordingly, while some need to be explicitly selected or configured
|
||||
in the kernel's configuration utility (AKA "make menuconfig").
|
||||
|
||||
However, not all configurable options are supported when an ARC processor
|
||||
is to run Linux. SoC design teams should refer to "Appendix E:
|
||||
Configuration for ARC Linux" in the ARC HS Databook for configurability
|
||||
guidelines.
|
||||
|
||||
Following these guidelines and selecting valid configuration options
|
||||
up front is critical to help prevent any unwanted issues during
|
||||
SoC bringup and software development in general.
|
||||
|
||||
Building the Linux kernel for ARC processors
|
||||
############################################
|
||||
|
||||
The process of kernel building for ARC processors is the same as for any other
|
||||
architecture and can be done in two ways:
|
||||
|
||||
- Cross-compilation: process of compiling for ARC targets on a development
|
||||
host with a different processor architecture (generally x86_64/amd64).
|
||||
- Native compilation: process of compiling for ARC on an ARC platform
|
||||
(hardware board or a simulator like QEMU) with complete development environment
|
||||
(GNU toolchain, dtc, make etc) installed on the platform.
|
||||
|
||||
In both cases, an up-to-date GNU toolchain for ARC for the host is needed.
|
||||
Synopsys offers prebuilt toolchain releases which can be used for this purpose,
|
||||
available from:
|
||||
|
||||
- Synopsys GNU toolchain releases:
|
||||
`<https://github.com/foss-for-synopsys-dwc-arc-processors/toolchain/releases>`_
|
||||
|
||||
- Linux kernel compilers collection:
|
||||
`<https://mirrors.edge.kernel.org/pub/tools/crosstool>`_
|
||||
|
||||
- Bootlin's toolchain collection: `<https://toolchains.bootlin.com>`_
|
||||
|
||||
Once the toolchain is installed in the system, make sure its "bin" folder
|
||||
is added to your ``PATH`` environment variable. Then set ``ARCH=arc`` and
``CROSS_COMPILE=arc-linux-`` (or whatever matches the installed ARC toolchain prefix)
and then, as usual, run ``make defconfig && make``.
|
||||
|
||||
This will produce a "vmlinux" file in the root of the kernel source tree,
usable for loading on the target system via JTAG.
If you need an image usable with the U-Boot bootloader,
type ``make uImage`` and ``uImage`` will be produced in the ``arch/arc/boot``
|
||||
folder.
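The build steps above can be scripted. The sketch below only assembles the ``make`` invocations and does not run them; the function name ``arc_build_commands`` is hypothetical, and the default prefix ``arc-linux-`` (with a trailing dash, because the kernel build system prepends the prefix directly to tool names such as ``gcc``) is an assumption that must match the installed toolchain.

```python
def arc_build_commands(cross_compile="arc-linux-", uimage=False):
    """Return the make invocations (as argv lists) for an ARC cross-build."""
    # Passing ARCH and CROSS_COMPILE on the make command line has the same
    # effect as exporting them in the environment.
    common = ["make", "ARCH=arc", f"CROSS_COMPILE={cross_compile}"]
    commands = [
        common + ["defconfig"],  # generate the default configuration
        list(common),            # build vmlinux
    ]
    if uimage:
        commands.append(common + ["uImage"])  # image for the U-Boot bootloader
    return commands
```

Each returned list can be passed to ``subprocess.run(argv, check=True)`` from the root of the kernel source tree.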
|
3
Documentation/arc/features.rst
Normal file
@ -0,0 +1,3 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
.. kernel-feat:: $srctree/Documentation/features arc
|
17
Documentation/arc/index.rst
Normal file
@ -0,0 +1,17 @@
|
||||
===================
|
||||
ARC architecture
|
||||
===================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
arc
|
||||
|
||||
features
|
||||
|
||||
.. only:: subproject and html
|
||||
|
||||
Indices
|
||||
=======
|
||||
|
||||
* :ref:`genindex`
|
@ -9,6 +9,7 @@ implementation.
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
arc/index
|
||||
arm/index
|
||||
arm64/index
|
||||
ia64/index
|
||||
|
@ -266,10 +266,12 @@ Avanta family
|
||||
-------------
|
||||
|
||||
Flavors:
|
||||
- 88F6500
|
||||
- 88F6510
|
||||
- 88F6530P
|
||||
- 88F6550
|
||||
- 88F6560
|
||||
- 88F6601
|
||||
|
||||
Homepage:
|
||||
https://web.archive.org/web/20181005145041/http://www.marvell.com/broadband/
|
||||
|
@ -275,6 +275,23 @@ infrastructure:
|
||||
| SVEVer | [3-0] | y |
|
||||
+------------------------------+---------+---------+
|
||||
|
||||
8) ID_AA64MMFR1_EL1 - Memory model feature register 1
|
||||
|
||||
+------------------------------+---------+---------+
|
||||
| Name | bits | visible |
|
||||
+------------------------------+---------+---------+
|
||||
| AFP | [47-44] | y |
|
||||
+------------------------------+---------+---------+
|
||||
|
||||
9) ID_AA64ISAR2_EL1 - Instruction set attribute register 2
|
||||
|
||||
+------------------------------+---------+---------+
|
||||
| Name | bits | visible |
|
||||
+------------------------------+---------+---------+
|
||||
| RPRES | [7-4] | y |
|
||||
+------------------------------+---------+---------+
|
||||
|
||||
|
||||
Appendix I: Example
|
||||
-------------------
|
||||
|
||||
|
@ -251,6 +251,14 @@ HWCAP2_ECV
|
||||
|
||||
Functionality implied by ID_AA64MMFR0_EL1.ECV == 0b0001.
|
||||
|
||||
HWCAP2_AFP
|
||||
|
||||
Functionality implied by ID_AA64MMFR1_EL1.AFP == 0b0001.
|
||||
|
||||
HWCAP2_RPRES
|
||||
|
||||
Functionality implied by ID_AA64ISAR2_EL1.RPRES == 0b0001.
|
||||
|
||||
4. Unused AT_HWCAP bits
|
||||
-----------------------
|
||||
|
||||
|
@ -2,7 +2,10 @@
|
||||
|
||||
.. _perf_index:
|
||||
|
||||
=====================
|
||||
====
|
||||
Perf
|
||||
====
|
||||
|
||||
Perf Event Attributes
|
||||
=====================
|
||||
|
||||
@ -88,3 +91,76 @@ exclude_host. However when using !exclude_hv there is a small blackout
|
||||
window at the guest entry/exit where host events are not captured.
|
||||
|
||||
On VHE systems there are no blackout windows.
|
||||
|
||||
Perf Userspace PMU Hardware Counter Access
|
||||
==========================================
|
||||
|
||||
Overview
|
||||
--------
|
||||
The perf userspace tool relies on the PMU to monitor events. It offers an
|
||||
abstraction layer over the hardware counters since the underlying
|
||||
implementation is cpu-dependent.
|
||||
Arm64 allows userspace tools to have access to the registers storing the
|
||||
hardware counters' values directly.
|
||||
|
||||
This targets specifically self-monitoring tasks in order to reduce the overhead
|
||||
by directly accessing the registers without having to go through the kernel.
|
||||
|
||||
How-to
|
||||
------
|
||||
The focus is set on the Armv8 PMUv3, which makes sure that access to the PMU
registers is enabled and that userspace has access to the relevant
information in order to use them.
|
||||
|
||||
In order to have access to the hardware counters, the global sysctl
|
||||
kernel/perf_user_access must first be enabled:
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
echo 1 > /proc/sys/kernel/perf_user_access
|
||||
|
||||
It is necessary to open the event using the perf tool interface with config1:1
|
||||
attr bit set: the sys_perf_event_open syscall returns a fd which can
|
||||
subsequently be used with the mmap syscall in order to retrieve a page of memory
|
||||
containing information about the event. The PMU driver uses this page to expose
|
||||
to the user the hardware counter's index and other necessary data. Using this
|
||||
index enables the user to access the PMU registers using the `mrs` instruction.
|
||||
Access to the PMU registers is only valid while the sequence lock is unchanged.
|
||||
In particular, the PMSELR_EL0 register is zeroed each time the sequence lock is
|
||||
changed.
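The rule that a read is only valid while the sequence lock is unchanged leads to a retry loop. The following is a simplified simulation of that pattern, not the kernel or libperf implementation. The ``UserPage`` field names are assumptions modelled on the generic perf user page, and ``load_page`` and ``read_counter`` are hypothetical callables standing in for reading the mmap'ed page and for the ``mrs`` instruction.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UserPage:
    """Hypothetical snapshot of the mmap'ed event page (field names assumed)."""
    lock: int       # sequence lock value
    index: int      # 0 means no hardware counter; otherwise counter number + 1
    offset: int     # value to add to the hardware counter
    pmc_width: int  # number of valid bits in the hardware counter


def read_event_count(load_page: Callable[[], UserPage],
                     read_counter: Callable[[int], int]) -> int:
    """Read an event count, retrying whenever the sequence lock changed."""
    while True:
        before = load_page()
        count = before.offset
        if before.index != 0:
            raw = read_counter(before.index - 1)
            # Keep only the bits that the page declares as valid.
            count += raw & ((1 << before.pmc_width) - 1)
        after = load_page()
        if after.lock == before.lock:
            return count
        # The lock changed while reading, so the values may be inconsistent.
```

A real implementation also needs compiler and memory barriers around the two lock reads, which this simulation cannot express.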
|
||||
|
||||
The userspace access is supported in libperf using the perf_evsel__mmap()
|
||||
and perf_evsel__read() functions. See `tools/lib/perf/tests/test-evsel.c`_ for
|
||||
an example.
|
||||
|
||||
About heterogeneous systems
|
||||
---------------------------
|
||||
On heterogeneous systems such as big.LITTLE, userspace PMU counter access can
|
||||
only be enabled when the tasks are pinned to a homogeneous subset of cores and
|
||||
the corresponding PMU instance is opened by specifying the 'type' attribute.
|
||||
The use of generic event types is not supported in this case.
|
||||
|
||||
Have a look at `tools/perf/arch/arm64/tests/user-events.c`_ for an example. It
|
||||
can be run using the perf tool to check that the access to the registers works
|
||||
correctly from userspace:
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
perf test -v user
|
||||
|
||||
About chained events and counter sizes
|
||||
--------------------------------------
|
||||
The user can request either a 32-bit (config1:0 == 0) or 64-bit (config1:0 == 1)
|
||||
counter along with userspace access. The sys_perf_event_open syscall will fail
|
||||
if a 64-bit counter is requested and the hardware doesn't support 64-bit
|
||||
counters. Chained events are not supported in conjunction with userspace counter
|
||||
access. If a 32-bit counter is requested on hardware with 64-bit counters, then
|
||||
userspace must treat the upper 32-bits read from the counter as UNKNOWN. The
|
||||
'pmc_width' field in the user page will indicate the valid width of the counter
|
||||
and should be used to mask the upper bits as needed.
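The masking step can be illustrated with plain integer arithmetic. This is a minimal sketch; ``mask_counter`` is a hypothetical helper, and ``pmc_width`` is the value taken from the user page as described above.

```python
def mask_counter(raw_value, pmc_width):
    """Keep only the low pmc_width bits of a raw 64-bit counter read."""
    if not 1 <= pmc_width <= 64:
        raise ValueError("pmc_width must be between 1 and 64")
    return raw_value & ((1 << pmc_width) - 1)


# A 32-bit counter read on 64-bit hardware: the upper half is UNKNOWN.
print(hex(mask_counter(0xDEADBEEF00000001, 32)))  # 0x1
```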
|
||||
|
||||
.. Links
|
||||
.. _tools/perf/arch/arm64/tests/user-events.c:
|
||||
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm64/tests/user-events.c
|
||||
.. _tools/lib/perf/tests/test-evsel.c:
|
||||
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/tests/test-evsel.c
|
||||
|
@ -52,6 +52,12 @@ stable kernels.
|
||||
| Allwinner | A64/R18 | UNKNOWN1 | SUN50I_ERRATUM_UNKNOWN1 |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| ARM | Cortex-A510 | #2064142 | ARM64_ERRATUM_2064142 |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| ARM | Cortex-A510 | #2038923 | ARM64_ERRATUM_2038923 |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| ARM | Cortex-A510 | #1902691 | ARM64_ERRATUM_1902691 |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| ARM | Cortex-A53 | #826319 | ARM64_ERRATUM_826319 |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| ARM | Cortex-A53 | #827319 | ARM64_ERRATUM_827319 |
|
||||
@ -92,12 +98,20 @@ stable kernels.
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| ARM | Cortex-A77 | #1508412 | ARM64_ERRATUM_1508412 |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| ARM | Cortex-A510 | #2051678 | ARM64_ERRATUM_2051678 |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| ARM | Cortex-A510 | #2077057 | ARM64_ERRATUM_2077057 |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| ARM | Cortex-A710 | #2119858 | ARM64_ERRATUM_2119858 |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| ARM | Cortex-A710 | #2054223 | ARM64_ERRATUM_2054223 |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| ARM | Cortex-A710 | #2224489 | ARM64_ERRATUM_2224489 |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| ARM | Cortex-X2 | #2119858 | ARM64_ERRATUM_2119858 |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| ARM | Cortex-X2 | #2224489 | ARM64_ERRATUM_2224489 |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 |
|
||||
+----------------+-----------------+-----------------+-----------------------------+
|
||||
| ARM | Neoverse-N1 | #1349291 | N/A |
|
||||
|
@ -255,7 +255,7 @@ prctl(PR_SVE_GET_VL)
|
||||
vector length change (which would only normally be the case between a
|
||||
fork() or vfork() and the corresponding execve() in typical use).
|
||||
|
||||
To extract the vector length from the result, and it with
|
||||
To extract the vector length from the result, bitwise and it with
|
||||
PR_SVE_VL_LEN_MASK.
|
||||
|
||||
Return value: a nonnegative value on success, or a negative value on error:
|
||||
|
@ -49,7 +49,7 @@ how the user addresses are used by the kernel:
|
||||
|
||||
- ``brk()``, ``mmap()`` and the ``new_address`` argument to
|
||||
``mremap()`` as these have the potential to alias with existing
|
||||
user addresses.
|
||||
user addresses.
|
||||
|
||||
NOTE: This behaviour changed in v5.6 and so some earlier kernels may
|
||||
incorrectly accept valid tagged pointers for the ``brk()``,
|
||||
|
@ -20,7 +20,6 @@ Block
|
||||
kyber-iosched
|
||||
null_blk
|
||||
pr
|
||||
queue-sysfs
|
||||
request
|
||||
stat
|
||||
switching-sched
|
||||
|
@ -1,321 +0,0 @@
|
||||
=================
|
||||
Queue sysfs files
|
||||
=================
|
||||
|
||||
This text file will detail the queue files that are located in the sysfs tree
|
||||
for each block device. Note that stacked devices typically do not export
|
||||
any settings, since their queue merely functions as a remapping target.
|
||||
These files are the ones found in the /sys/block/xxx/queue/ directory.
|
||||
|
||||
Files denoted with a RO postfix are readonly and the RW postfix means
|
||||
read-write.
|
||||
|
||||
add_random (RW)
|
||||
---------------
|
||||
This file allows turning off the disk entropy contribution. Default
|
||||
value of this file is '1'(on).
|
||||
|
||||
chunk_sectors (RO)
|
||||
------------------
|
||||
This has different meaning depending on the type of the block device.
|
||||
For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors
|
||||
of the RAID volume stripe segment. For a zoned block device, either host-aware
|
||||
or host-managed, chunk_sectors indicates the size in 512B sectors of the zones
|
||||
of the device, with the eventual exception of the last zone of the device which
|
||||
may be smaller.
|
||||
|
||||
dax (RO)
|
||||
--------
|
||||
This file indicates whether the device supports Direct Access (DAX),
|
||||
used by CPU-addressable storage to bypass the pagecache. It shows '1'
|
||||
if true, '0' if not.
|
||||
|
||||
discard_granularity (RO)
|
||||
------------------------
|
||||
This shows the size of internal allocation of the device in bytes, if
|
||||
reported by the device. A value of '0' means the device does not support
|
||||
the discard functionality.
|
||||
|
||||
discard_max_hw_bytes (RO)
|
||||
-------------------------
|
||||
Devices that support discard functionality may have internal limits on
|
||||
the number of bytes that can be trimmed or unmapped in a single operation.
|
||||
The `discard_max_hw_bytes` parameter is set by the device driver to the
|
||||
maximum number of bytes that can be discarded in a single operation.
|
||||
Discard requests issued to the device must not exceed this limit.
|
||||
A `discard_max_hw_bytes` value of 0 means that the device does not support
|
||||
discard functionality.
|
||||
|
||||
discard_max_bytes (RW)
|
||||
----------------------
|
||||
While discard_max_hw_bytes is the hardware limit for the device, this
|
||||
setting is the software limit. Some devices exhibit large latencies when
large discards are issued; setting this value lower will make Linux issue
|
||||
smaller discards and potentially help reduce latencies induced by large
|
||||
discard operations.
|
||||
|
||||
discard_zeroes_data (RO)
|
||||
------------------------
|
||||
Obsolete. Always zero.
|
||||
|
||||
fua (RO)
|
||||
--------
|
||||
Whether or not the block driver supports the FUA flag for write requests.
|
||||
FUA stands for Force Unit Access. If the FUA flag is set that means that
|
||||
write requests must bypass the volatile cache of the storage device.
|
||||
|
||||
hw_sector_size (RO)
|
||||
-------------------
|
||||
This is the hardware sector size of the device, in bytes.
|
||||
|
||||
io_poll (RW)
|
||||
------------
|
||||
When read, this file shows whether polling is enabled (1) or disabled
|
||||
(0). Writing '0' to this file will disable polling for this device.
|
||||
Writing any non-zero value will enable this feature.
|
||||
|
||||
io_poll_delay (RW)
|
||||
------------------
|
||||
If polling is enabled, this controls what kind of polling will be
|
||||
performed. It defaults to -1, which is classic polling. In this mode,
|
||||
the CPU will repeatedly ask for completions without giving up any time.
|
||||
If set to 0, a hybrid polling mode is used, where the kernel will attempt
|
||||
to make an educated guess at when the IO will complete. Based on this
|
||||
guess, the kernel will put the process issuing IO to sleep for an amount
|
||||
of time, before entering a classic poll loop. This mode might be a
|
||||
little slower than pure classic polling, but it will be more efficient.
|
||||
If set to a value larger than 0, the kernel will put the process issuing
|
||||
IO to sleep for this amount of microseconds before entering classic
|
||||
polling.
|
||||
|
||||
io_timeout (RW)
|
||||
---------------
|
||||
io_timeout is the request timeout in milliseconds. If a request does not
|
||||
complete in this time then the block driver timeout handler is invoked.
|
||||
That timeout handler can decide to retry the request, to fail it or to start
|
||||
a device recovery strategy.
|
||||
|
||||
iostats (RW)
|
||||
-------------
|
||||
This file is used to control (on/off) the iostats accounting of the
|
||||
disk.
|
||||
|
||||
logical_block_size (RO)
|
||||
-----------------------
|
||||
This is the logical block size of the device, in bytes.
|
||||
|
||||
max_discard_segments (RO)
|
||||
-------------------------
|
||||
The maximum number of DMA scatter/gather entries in a discard request.
|
||||
|
||||
max_hw_sectors_kb (RO)
|
||||
----------------------
|
||||
This is the maximum number of kilobytes supported in a single data transfer.
|
||||
|
||||
max_integrity_segments (RO)
|
||||
---------------------------
|
||||
Maximum number of elements in a DMA scatter/gather list with integrity
|
||||
data that will be submitted by the block layer core to the associated
|
||||
block driver.
|
||||
|
||||
max_active_zones (RO)
|
||||
---------------------
|
||||
For zoned block devices (zoned attribute indicating "host-managed" or
|
||||
"host-aware"), the sum of zones belonging to any of the zone states:
|
||||
EXPLICIT OPEN, IMPLICIT OPEN or CLOSED, is limited by this value.
|
||||
If this value is 0, there is no limit.
|
||||
|
||||
If the host attempts to exceed this limit, the driver should report this error
|
||||
with BLK_STS_ZONE_ACTIVE_RESOURCE, which user space may see as the EOVERFLOW
|
||||
errno.
|
||||
|
||||
max_open_zones (RO)
|
||||
-------------------
|
||||
For zoned block devices (zoned attribute indicating "host-managed" or
|
||||
"host-aware"), the sum of zones belonging to any of the zone states:
|
||||
EXPLICIT OPEN or IMPLICIT OPEN, is limited by this value.
|
||||
If this value is 0, there is no limit.
|
||||
|
||||
If the host attempts to exceed this limit, the driver should report this error
|
||||
with BLK_STS_ZONE_OPEN_RESOURCE, which user space may see as the ETOOMANYREFS
|
||||
errno.
|
||||
|
||||
max_sectors_kb (RW)
|
||||
-------------------
|
||||
This is the maximum number of kilobytes that the block layer will allow
|
||||
for a filesystem request. Must be smaller than or equal to the maximum
|
||||
size allowed by the hardware.
|
||||
|
||||
max_segments (RO)
|
||||
-----------------
|
||||
Maximum number of elements in a DMA scatter/gather list that is submitted
|
||||
to the associated block driver.
|
||||
|
||||
max_segment_size (RO)
|
||||
---------------------
|
||||
Maximum size in bytes of a single element in a DMA scatter/gather list.
|
||||
|
||||
minimum_io_size (RO)
|
||||
--------------------
|
||||
This is the smallest preferred IO size reported by the device.
|
||||
|
||||
nomerges (RW)
|
||||
-------------
|
||||
This enables the user to disable the lookup logic involved with IO
|
||||
merging requests in the block layer. By default (0) all merges are
|
||||
enabled. When set to 1 only simple one-hit merges will be tried. When
|
||||
set to 2 no merge algorithms will be tried (including one-hit or more
|
||||
complex tree/hash lookups).
|
||||
|
||||
nr_requests (RW)
|
||||
----------------
|
||||
This controls how many requests may be allocated in the block layer for
|
||||
read or write requests. Note that the total allocated number may be twice
|
||||
this amount, since it applies only to reads or writes (not the accumulated
|
||||
sum).
|
||||
|
||||
To avoid priority inversion through request starvation, a request
|
||||
queue maintains a separate request pool per each cgroup when
|
||||
CONFIG_BLK_CGROUP is enabled, and this parameter applies to each such
|
||||
per-block-cgroup request pool. IOW, if there are N block cgroups,
|
||||
each request queue may have up to N request pools, each independently
|
||||
regulated by nr_requests.
|
||||
|
||||
nr_zones (RO)
|
||||
-------------
|
||||
For zoned block devices (zoned attribute indicating "host-managed" or
|
||||
"host-aware"), this indicates the total number of zones of the device.
|
||||
This is always 0 for regular block devices.
|
||||
|
||||
optimal_io_size (RO)
|
||||
--------------------
|
||||
This is the optimal IO size reported by the device.
|
||||
|
||||
physical_block_size (RO)
|
||||
------------------------
|
||||
This is the physical block size of the device, in bytes.
|
||||
|
||||
read_ahead_kb (RW)
|
||||
------------------
|
||||
Maximum number of kilobytes to read-ahead for filesystems on this block
|
||||
device.
|
||||
|
||||
rotational (RW)
|
||||
---------------
|
||||
This file is used to state whether the device is of rotational type or
|
||||
non-rotational type.
|
||||
|
||||
rq_affinity (RW)
|
||||
----------------
|
||||
If this option is '1', the block layer will migrate request completions to the
|
||||
cpu "group" that originally submitted the request. For some workloads this
|
||||
provides a significant reduction in CPU cycles due to caching effects.
|
||||
|
||||
For storage configurations that need to maximize distribution of completion
|
||||
processing setting this option to '2' forces the completion to run on the
|
||||
requesting cpu (bypassing the "group" aggregation logic).
|
||||
|
||||
scheduler (RW)
|
||||
--------------
|
||||
When read, this file will display the current and available IO schedulers
|
||||
for this block device. The currently active IO scheduler will be enclosed
|
||||
in [] brackets. Writing an IO scheduler name to this file will switch
|
||||
control of this block device to that new IO scheduler. Note that writing
|
||||
an IO scheduler name to this file will attempt to load that IO scheduler
|
||||
module, if it isn't already present in the system.
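The bracket convention makes the active scheduler easy to pick out of the file content. The following is a minimal sketch in C, assuming the content of the scheduler file has already been read into a string; the helper name ``active_scheduler`` is hypothetical and not part of any kernel or library API.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the scheduler name enclosed in [] from a line such as
 * "mq-deadline kyber [bfq] none" into out (at most outlen - 1 bytes).
 * Returns 0 on success, -1 if no bracketed name is found or if it
 * does not fit. Hypothetical helper, for illustration only.
 */
static int active_scheduler(const char *line, char *out, size_t outlen)
{
	const char *lb = strchr(line, '[');
	const char *rb = lb ? strchr(lb, ']') : NULL;
	size_t len;

	if (!lb || !rb)
		return -1;

	len = (size_t)(rb - lb - 1);
	if (len + 1 > outlen)
		return -1;

	memcpy(out, lb + 1, len);
	out[len] = '\0';
	return 0;
}
```

A caller would read /sys/block/<device>/queue/scheduler into a buffer and pass that buffer as ``line``.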
|
||||
|
||||
write_cache (RW)
|
||||
----------------
|
||||
When read, this file will display whether the device has write back
|
||||
caching enabled or not. It will return "write back" for the former
|
||||
case, and "write through" for the latter. Writing to this file can
|
||||
change the kernel's view of the device, but it doesn't alter the
|
||||
device state. This means that it might not be safe to toggle the
|
||||
setting from "write back" to "write through", since that will also
|
||||
eliminate cache flushes issued by the kernel.
|
||||
|
||||
write_same_max_bytes (RO)
|
||||
-------------------------
|
||||
This is the number of bytes the device can write in a single write-same
|
||||
command. A value of '0' means write-same is not supported by this
|
||||
device.
|
||||
|
||||
wbt_lat_usec (RW)
|
||||
-----------------
|
||||
If the device is registered for writeback throttling, then this file shows
|
||||
the target minimum read latency. If this latency is exceeded in a given
|
||||
window of time (see wb_window_usec), then the writeback throttling will start
|
||||
scaling back writes. Writing a value of '0' to this file disables the
|
||||
feature. Writing a value of '-1' to this file resets the value to the
|
||||
default setting.
|
||||
|
||||
throttle_sample_time (RW)
|
||||
-------------------------
|
||||
This is the time window over which blk-throttle samples data, in milliseconds.
|
||||
blk-throttle makes decisions based on the samples. A lower time means cgroups
|
||||
have smoother throughput, but higher CPU overhead. This exists only when
|
||||
CONFIG_BLK_DEV_THROTTLING_LOW is enabled.
|
||||
|
||||
write_zeroes_max_bytes (RO)
|
||||
---------------------------
|
||||
For block drivers that support REQ_OP_WRITE_ZEROES, the maximum number of
|
||||
bytes that can be zeroed at once. The value 0 means that REQ_OP_WRITE_ZEROES
|
||||
is not supported.
|
||||
|
||||
zone_append_max_bytes (RO)
|
||||
--------------------------
|
||||
This is the maximum number of bytes that can be written to a sequential
|
||||
zone of a zoned block device using a zone append write operation
|
||||
(REQ_OP_ZONE_APPEND). This value is always 0 for regular block devices.
|
||||
|
||||
zoned (RO)
|
||||
----------
|
||||
This indicates if the device is a zoned block device and the zone model of the
|
||||
device if it is indeed zoned. The possible values indicated by zoned are
|
||||
"none" for regular block devices and "host-aware" or "host-managed" for zoned
|
||||
block devices. The characteristics of host-aware and host-managed zoned block
|
||||
devices are described in the ZBC (Zoned Block Commands) and ZAC
|
||||
(Zoned Device ATA Command Set) standards. These standards also define the
|
||||
"drive-managed" zone model. However, since drive-managed zoned block devices
|
||||
do not support zone commands, they will be treated as regular block devices
|
||||
and zoned will report "none".
|
||||
|
||||
zone_write_granularity (RO)
|
||||
---------------------------
|
||||
This indicates the alignment constraint, in bytes, for write operations in
|
||||
sequential zones of zoned block devices (devices with a zoned attribute
|
||||
that reports "host-managed" or "host-aware"). This value is always 0 for
|
||||
regular block devices.
|
||||
|
||||
independent_access_ranges (RO)
|
||||
------------------------------
|
||||
|
||||
The presence of this sub-directory of the /sys/block/xxx/queue/ directory
|
||||
indicates that the device is capable of executing requests targeting
|
||||
different sector ranges in parallel. For instance, single LUN multi-actuator
|
||||
hard-disks will have an independent_access_ranges directory if the device
|
||||
correctly advertises the sector ranges of its actuators.
|
||||
|
||||
The independent_access_ranges directory contains one directory per access
|
||||
range, with each range described using the sector (RO) attribute file to
|
||||
indicate the first sector of the range and the nr_sectors (RO) attribute file
|
||||
to indicate the total number of sectors in the range starting from the first
|
||||
sector of the range. For example, a dual-actuator hard-disk will have the
|
||||
following independent_access_ranges entries.::
|
||||
|
||||
$ tree /sys/block/<device>/queue/independent_access_ranges/
|
||||
/sys/block/<device>/queue/independent_access_ranges/
|
||||
|-- 0
|
||||
| |-- nr_sectors
|
||||
| `-- sector
|
||||
`-- 1
|
||||
|-- nr_sectors
|
||||
`-- sector
|
||||
|
||||
The sector and nr_sectors attributes use 512B sector units, regardless of
|
||||
the actual block size of the device. Independent access ranges do not
|
||||
overlap and include all sectors within the device capacity. The access
|
||||
ranges are numbered in increasing order of the range start sector,
|
||||
that is, the sector attribute of range 0 always has the value 0.
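The properties above (512B units, no overlap, full coverage, increasing order) can be checked mechanically. The sketch below assumes the sector and nr_sectors values of each range have already been read from sysfs into an array; the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One access range, as read from the sector and nr_sectors files. */
struct access_range {
	uint64_t sector;     /* first sector, in 512B units */
	uint64_t nr_sectors; /* length of the range, in 512B units */
};

/* Byte offset of the start of a range; the unit is always 512 bytes. */
static uint64_t range_start_bytes(const struct access_range *r)
{
	return r->sector << 9;
}

/*
 * Check that range 0 starts at sector 0 and that every range begins
 * exactly where the previous one ends (no gap and no overlap).
 */
static bool ranges_are_contiguous(const struct access_range *r, size_t n)
{
	uint64_t next = 0;

	for (size_t i = 0; i < n; i++) {
		if (r[i].sector != next)
			return false;
		next += r[i].nr_sectors;
	}
	return n > 0;
}
```

A complete check would also compare the end of the last range with the device capacity, which this sketch leaves out.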
|
||||
|
||||
Jens Axboe <jens.axboe@oracle.com>, February 2009
|
@ -3,7 +3,7 @@ BPF Type Format (BTF)
|
||||
=====================
|
||||
|
||||
1. Introduction
|
||||
***************
|
||||
===============
|
||||
|
||||
BTF (BPF Type Format) is the metadata format which encodes the debug info
|
||||
related to BPF program/map. The name BTF was used initially to describe data
|
||||
@ -30,7 +30,7 @@ sections are discussed in details in :ref:`BTF_Type_String`.
|
||||
.. _BTF_Type_String:
|
||||
|
||||
2. BTF Type and String Encoding
|
||||
*******************************
|
||||
===============================
|
||||
|
||||
The file ``include/uapi/linux/btf.h`` provides high-level definition of how
|
||||
types/strings are encoded.
|
||||
@ -57,13 +57,13 @@ little-endian target. The ``btf_header`` is designed to be extensible with
|
||||
generated.
|
||||
|
||||
2.1 String Encoding
|
||||
===================
|
||||
-------------------
|
||||
|
||||
The first string in the string section must be a null string. The rest of
|
||||
string table is a concatenation of other null-terminated strings.
|
||||
|
||||
2.2 Type Encoding
|
||||
=================
|
||||
-----------------
|
||||
|
||||
The type id ``0`` is reserved for ``void`` type. The type section is parsed
|
||||
sequentially and type id is assigned to each recognized type starting from id
|
||||
@ -86,6 +86,7 @@ sequentially and type id is assigned to each recognized type starting from id
|
||||
#define BTF_KIND_DATASEC 15 /* Section */
|
||||
#define BTF_KIND_FLOAT 16 /* Floating point */
|
||||
#define BTF_KIND_DECL_TAG 17 /* Decl Tag */
|
||||
#define BTF_KIND_TYPE_TAG 18 /* Type Tag */
|
||||
|
||||
Note that the type section encodes debug info, not just pure types.
|
||||
``BTF_KIND_FUNC`` is not a type, and it represents a defined subprogram.
|
||||
@ -107,7 +108,7 @@ Each type contains the following common data::
|
||||
* "size" tells the size of the type it is describing.
|
||||
*
|
||||
* "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT,
|
||||
* FUNC, FUNC_PROTO and DECL_TAG.
|
||||
* FUNC, FUNC_PROTO, DECL_TAG and TYPE_TAG.
|
||||
* "type" is a type_id referring to another type.
|
||||
*/
|
||||
union {
|
||||
@ -492,8 +493,18 @@ the attribute is applied to a ``struct``/``union`` member or
|
||||
a ``func`` argument, and ``btf_decl_tag.component_idx`` should be a
|
||||
valid index (starting from 0) pointing to a member or an argument.
|
||||
|
||||
2.2.17 BTF_KIND_TYPE_TAG
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
``struct btf_type`` encoding requirement:
|
||||
* ``name_off``: offset to a non-empty string
|
||||
* ``info.kind_flag``: 0
|
||||
* ``info.kind``: BTF_KIND_TYPE_TAG
|
||||
* ``info.vlen``: 0
|
||||
* ``type``: the type with ``btf_type_tag`` attribute
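As an illustration of these requirements, the ``info`` word for a type tag can be assembled as follows. This is a sketch that assumes the bit layout documented in ``include/uapi/linux/btf.h`` for this kernel version (vlen in bits 0-15, kind in bits 24-28, kind_flag in bit 31); the helper names are hypothetical.

```c
#include <stdint.h>

#define KIND_TYPE_TAG 18u /* same value as BTF_KIND_TYPE_TAG */

/*
 * Pack the info word of struct btf_type.
 * Assumed layout: vlen in bits 0-15, kind in bits 24-28,
 * kind_flag in bit 31; the remaining bits stay zero.
 */
static uint32_t btf_info_pack(uint32_t kind, uint32_t vlen, uint32_t kind_flag)
{
	return ((kind_flag & 1u) << 31) | ((kind & 0x1fu) << 24) | (vlen & 0xffffu);
}

/* Extract the kind back out of an info word. */
static uint32_t btf_info_kind(uint32_t info)
{
	return (info >> 24) & 0x1fu;
}
```

With ``kind_flag`` and ``vlen`` both 0, the resulting word for a type tag is 0x12000000.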
|
||||
|
||||
3. BTF Kernel API
|
||||
*****************
|
||||
=================
|
||||
|
||||
The following bpf syscall command involves BTF:
|
||||
* BPF_BTF_LOAD: load a blob of BTF data into kernel
|
||||
@ -536,14 +547,14 @@ The workflow typically looks like:
|
||||
|
||||
|
||||
3.1 BPF_BTF_LOAD
|
||||
================
|
||||
----------------
|
||||
|
||||
Load a blob of BTF data into kernel. A blob of data, described in
|
||||
:ref:`BTF_Type_String`, can be directly loaded into the kernel. A ``btf_fd``
|
||||
is returned to userspace.
|
||||
|
||||
3.2 BPF_MAP_CREATE
|
||||
==================
|
||||
------------------
|
||||
|
||||
A map can be created with ``btf_fd`` and specified key/value type id.::
|
||||
|
||||
@ -570,7 +581,7 @@ automatically.
|
||||
.. _BPF_Prog_Load:
|
||||
|
||||
3.3 BPF_PROG_LOAD
|
||||
=================
|
||||
-----------------
|
||||
|
||||
During prog_load, func_info and line_info can be passed to kernel with proper
|
||||
values for the following attributes:
|
||||
@ -620,7 +631,7 @@ For line_info, the line number and column number are defined as below:
|
||||
#define BPF_LINE_INFO_LINE_COL(line_col) ((line_col) & 0x3ff)
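The packing of line and column into one 32-bit value can be sketched as below. It assumes the line number occupies the bits above the 10-bit column field, as the 0x3ff mask implies; the function names are hypothetical.

```c
#include <stdint.h>

/* The low 10 bits hold the column, the bits above them the line number. */
static uint32_t line_info_line(uint32_t line_col)
{
	return line_col >> 10;
}

static uint32_t line_info_col(uint32_t line_col)
{
	return line_col & 0x3ff;
}

/* Inverse operation: build line_col from a line and a column. */
static uint32_t line_info_pack(uint32_t line, uint32_t col)
{
	return (line << 10) | (col & 0x3ff);
}
```

For example, line 8 and column 14 pack to (8 << 10) | 14 = 8206.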
|
||||
|
||||
3.4 BPF_{PROG,MAP}_GET_NEXT_ID
|
||||
==============================
|
||||
------------------------------
|
||||
|
||||
In kernel, every loaded program, map or btf has a unique id. The id won't
|
||||
change during the lifetime of a program, map, or btf.
|
||||
@ -630,13 +641,13 @@ each command, to user space, for bpf program or maps, respectively, so an
|
||||
inspection tool can inspect all programs and maps.
|
||||
|
||||
3.5 BPF_{PROG,MAP}_GET_FD_BY_ID
|
||||
===============================
|
||||
-------------------------------
|
||||
|
||||
An introspection tool cannot use id to get details about program or maps.
|
||||
A file descriptor needs to be obtained first for reference-counting purpose.
|
||||
|
||||
3.6 BPF_OBJ_GET_INFO_BY_FD
|
||||
==========================
|
||||
--------------------------
|
||||
|
||||
Once a program/map fd is acquired, an introspection tool can get the detailed
|
||||
information from kernel about this fd, some of which are BTF-related. For
|
||||
@ -645,7 +656,7 @@ example, ``bpf_map_info`` returns ``btf_id`` and key/value type ids.
|
||||
bpf byte codes, and jited_line_info.
|
||||
|
||||
3.7 BPF_BTF_GET_FD_BY_ID
|
||||
========================
|
||||
------------------------
|
||||
|
||||
With ``btf_id`` obtained in ``bpf_map_info`` and ``bpf_prog_info``, bpf
|
||||
syscall command BPF_BTF_GET_FD_BY_ID can retrieve a btf fd. Then, with
|
||||
@ -657,10 +668,10 @@ tool has full btf knowledge and is able to pretty print map key/values, dump
|
||||
func signatures and line info, along with byte/jit codes.
|
||||
|
||||
4. ELF File Format Interface
|
||||
****************************
|
||||
============================
|
||||
|
||||
4.1 .BTF section
|
||||
================
|
||||
----------------
|
||||
|
||||
The .BTF section contains type and string data. The format of this section is
|
||||
the same as the one described in :ref:`BTF_Type_String`.
|
||||
@ -668,7 +679,7 @@ same as the one describe in :ref:`BTF_Type_String`.
|
||||
.. _BTF_Ext_Section:
|
||||
|
||||
4.2 .BTF.ext section
|
||||
====================
|
||||
--------------------
|
||||
|
||||
The .BTF.ext section encodes func_info and line_info which needs loader
|
||||
manipulation before loading into the kernel.
|
||||
@ -732,7 +743,7 @@ bpf_insn``. For ELF API, the ``insn_off`` is the byte offset from the
|
||||
beginning of section (``btf_ext_info_sec->sec_name_off``).
|
||||
|
||||
4.3 .BTF_ids section
|
||||
====================
|
||||
--------------------
|
||||
|
||||
The .BTF_ids section encodes BTF ID values that are used within the kernel.
|
||||
|
||||
@ -793,10 +804,10 @@ All the BTF ID lists and sets are compiled in the .BTF_ids section and
|
||||
resolved during the linking phase of kernel build by ``resolve_btfids`` tool.
|
||||
|
||||
5. Using BTF
|
||||
************
|
||||
============
|
||||
|
||||
5.1 bpftool map pretty print
|
||||
============================
|
||||
----------------------------
|
||||
|
||||
With BTF, the map key/value can be printed based on fields rather than simply
|
||||
raw bytes. This is especially valuable for large structure or if your data
|
||||
@ -838,7 +849,7 @@ bpftool is able to pretty print like below:
|
||||
]
|
||||
|
||||
5.2 bpftool prog dump
|
||||
=====================
|
||||
---------------------
|
||||
|
||||
The following is an example showing how func_info and line_info can help prog
|
||||
dump with better kernel symbol names, function prototypes and line
|
||||
@ -872,7 +883,7 @@ information.::
|
||||
[...]
|
||||
|
||||
5.3 Verifier Log
|
||||
================
|
||||
----------------
|
||||
|
||||
The following is an example of how line_info can help debugging verification
|
||||
failure.::
|
||||
@ -898,7 +909,7 @@ failure.::
|
||||
R2 offset is outside of the packet
|
||||
|
||||
6. BTF Generation
|
||||
*****************
|
||||
=================
|
||||
|
||||
You need latest pahole
|
||||
|
||||
@ -1005,6 +1016,6 @@ format.::
|
||||
.long 8206 # Line 8 Col 14
|
||||
|
||||
7. Testing
|
||||
**********
|
||||
==========
|
||||
|
||||
Kernel bpf selftest `test_btf.c` provides an extensive set of BTF-related tests.
|
||||
|
376
Documentation/bpf/classic_vs_extended.rst
Normal file
@ -0,0 +1,376 @@
|
||||
|
||||
===================
|
||||
Classic BPF vs eBPF
|
||||
===================
|
||||
|
||||
eBPF is designed to be JITed with one to one mapping, which can also open up
|
||||
the possibility for GCC/LLVM compilers to generate optimized eBPF code through
|
||||
an eBPF backend that performs almost as fast as natively compiled code.
|
||||
|
||||
Some core changes of the eBPF format from classic BPF:
|
||||
|
||||
- Number of registers increases from 2 to 10:
|
||||
|
||||
The old format had two registers A and X, and a hidden frame pointer. The
|
||||
new layout extends this to be 10 internal registers and a read-only frame
|
||||
pointer. Since 64-bit CPUs are passing arguments to functions via registers
|
||||
the number of args from eBPF program to in-kernel function is restricted
|
||||
to 5 and one register is used to accept return value from an in-kernel
|
||||
function. Natively, x86_64 passes first 6 arguments in registers, aarch64/
|
||||
sparcv9/mips64 have 7 - 8 registers for arguments; x86_64 has 6 callee saved
|
||||
registers, and aarch64/sparcv9/mips64 have 11 or more callee saved registers.
|
||||
|
||||
Thus, all eBPF registers map one to one to HW registers on x86_64, aarch64,
|
||||
etc, and eBPF calling convention maps directly to ABIs used by the kernel on
|
||||
64-bit architectures.
|
||||
|
||||
On 32-bit architectures JIT may map programs that use only 32-bit arithmetic
|
||||
and may leave more complex programs to be interpreted.
|
||||
|
||||
R0 - R5 are scratch registers and an eBPF program needs to spill/fill them if
|
||||
necessary across calls. Note that there is only one eBPF program (== one
|
||||
eBPF main routine) and it cannot call other eBPF functions, it can only
|
||||
call predefined in-kernel functions, though.
|
||||
|
||||
- Register width increases from 32-bit to 64-bit:
|
||||
|
||||
Still, the semantics of the original 32-bit ALU operations are preserved
|
||||
via 32-bit subregisters. All eBPF registers are 64-bit with 32-bit lower
|
||||
subregisters that zero-extend into 64-bit if they are being written to.
|
||||
That behavior maps directly to x86_64 and arm64 subregister definition, but
|
||||
makes other JITs more difficult.
|
||||
|
||||
32-bit architectures run 64-bit eBPF programs via interpreter.
|
||||
Their JITs may convert BPF programs that only use 32-bit subregisters into
|
||||
the native instruction set and let the rest be interpreted.
|
||||
|
||||
Operation is 64-bit, because on 64-bit architectures, pointers are also
|
||||
64-bit wide, and we want to pass 64-bit values in/out of kernel functions,
|
||||
so 32-bit eBPF registers would otherwise require defining a register-pair
|
||||
ABI, thus, it would not be possible to use a direct eBPF register to HW register
|
||||
mapping and JIT would need to do combine/split/move operations for every
|
||||
register in and out of the function, which is complex, bug prone and slow.
|
||||
Another reason is the use of atomic 64-bit counters.
|
||||
|
||||
- Conditional jt/jf targets replaced with jt/fall-through:
|
||||
|
||||
While the original design has constructs such as ``if (cond) jump_true;
|
||||
else jump_false;``, they are replaced with alternative constructs like
|
||||
``if (cond) jump_true; /* else fall-through */``.
|
||||
|
||||
- Introduces bpf_call insn and register passing convention for zero overhead
|
||||
calls from/to other kernel functions:
|
||||
|
||||
Before an in-kernel function call, the eBPF program needs to
|
||||
place function arguments into R1 to R5 registers to satisfy calling
|
||||
convention, then the interpreter will take them from registers and pass
|
||||
to in-kernel function. If R1 - R5 registers are mapped to CPU registers
|
||||
that are used for argument passing on given architecture, the JIT compiler
|
||||
doesn't need to emit extra moves. Function arguments will be in the correct
|
||||
registers and BPF_CALL instruction will be JITed as single 'call' HW
|
||||
instruction. This calling convention was picked to cover common call
|
||||
situations without performance penalty.
|
||||
|
||||
After an in-kernel function call, R1 - R5 are reset to unreadable and R0 has
|
||||
a return value of the function. Since R6 - R9 are callee saved, their state
|
||||
is preserved across the call.
|
||||
|
||||
For example, consider three C functions::
|
||||
|
||||
u64 f1() { return (*_f2)(1); }
|
||||
u64 f2(u64 a) { return f3(a + 1, a); }
|
||||
u64 f3(u64 a, u64 b) { return a - b; }
|
||||
|
||||
GCC can compile f1, f3 into x86_64::
|
||||
|
||||
f1:
|
||||
movl $1, %edi
|
||||
movq _f2(%rip), %rax
|
||||
jmp *%rax
|
||||
f3:
|
||||
movq %rdi, %rax
|
||||
subq %rsi, %rax
|
||||
ret
|
||||
|
||||
Function f2 in eBPF may look like::
|
||||
|
||||
f2:
|
||||
bpf_mov R2, R1
|
||||
bpf_add R1, 1
|
||||
bpf_call f3
|
||||
bpf_exit
|
||||
|
||||
If f2 is JITed and the pointer is stored to ``_f2``, the calls f1 -> f2 -> f3 and
|
||||
returns will be seamless. Without JIT, __bpf_prog_run() interpreter needs to
|
||||
be used to call into f2.
|
||||
|
||||
For practical reasons all eBPF programs have only one argument 'ctx' which is
|
||||
already placed into R1 (e.g. on __bpf_prog_run() startup) and the programs
|
||||
can call kernel functions with up to 5 arguments. Calls with 6 or more arguments
|
||||
are currently not supported, but these restrictions can be lifted if necessary
|
||||
in the future.
|
||||
|
||||
On 64-bit architectures all registers map to HW registers one to one. For
|
||||
example, x86_64 JIT compiler can map them as ...
|
||||
|
||||
::
|
||||
|
||||
R0 - rax
|
||||
R1 - rdi
|
||||
R2 - rsi
|
||||
R3 - rdx
|
||||
R4 - rcx
|
||||
R5 - r8
|
||||
R6 - rbx
|
||||
R7 - r13
|
||||
R8 - r14
|
||||
R9 - r15
|
||||
R10 - rbp
|
||||
|
||||
... since x86_64 ABI mandates rdi, rsi, rdx, rcx, r8, r9 for argument passing
|
||||
and rbx, r12 - r15 are callee saved.
|
||||
|
||||
Then the following eBPF pseudo-program::
|
||||
|
||||
bpf_mov R6, R1 /* save ctx */
|
||||
bpf_mov R2, 2
|
||||
bpf_mov R3, 3
|
||||
bpf_mov R4, 4
|
||||
bpf_mov R5, 5
|
||||
bpf_call foo
|
||||
bpf_mov R7, R0 /* save foo() return value */
|
||||
bpf_mov R1, R6 /* restore ctx for next call */
|
||||
bpf_mov R2, 6
|
||||
bpf_mov R3, 7
|
||||
bpf_mov R4, 8
|
||||
bpf_mov R5, 9
|
||||
bpf_call bar
|
||||
bpf_add R0, R7
|
||||
bpf_exit
|
||||
|
||||
After JIT to x86_64 may look like::
|
||||
|
||||
push %rbp
|
||||
mov %rsp,%rbp
|
||||
sub $0x228,%rsp
|
||||
mov %rbx,-0x228(%rbp)
|
||||
mov %r13,-0x220(%rbp)
|
||||
mov %rdi,%rbx
|
||||
mov $0x2,%esi
|
||||
mov $0x3,%edx
|
||||
mov $0x4,%ecx
|
||||
mov $0x5,%r8d
|
||||
callq foo
|
||||
mov %rax,%r13
|
||||
mov %rbx,%rdi
|
||||
mov $0x6,%esi
|
||||
mov $0x7,%edx
|
||||
mov $0x8,%ecx
|
||||
mov $0x9,%r8d
|
||||
callq bar
|
||||
add %r13,%rax
|
||||
mov -0x228(%rbp),%rbx
|
||||
mov -0x220(%rbp),%r13
|
||||
leaveq
|
||||
retq
|
||||
|
||||
Which is in this example equivalent in C to::
|
||||
|
||||
u64 bpf_filter(u64 ctx)
|
||||
{
|
||||
return foo(ctx, 2, 3, 4, 5) + bar(ctx, 6, 7, 8, 9);
|
||||
}
|
||||
|
||||
In-kernel functions foo() and bar() with prototype: u64 (*)(u64 arg1, u64
|
||||
arg2, u64 arg3, u64 arg4, u64 arg5); will receive arguments in proper
|
||||
registers and place their return value into ``%rax`` which is R0 in eBPF.
|
||||
Prologue and epilogue are emitted by JIT and are implicit in the
|
||||
interpreter. R0-R5 are scratch registers, so an eBPF program needs to preserve
|
||||
them across the calls as defined by calling convention.
|
||||
|
||||
For example the following program is invalid::
|
||||
|
||||
bpf_mov R1, 1
|
||||
bpf_call foo
|
||||
bpf_mov R0, R1
|
||||
bpf_exit
|
||||
|
||||
After the call the registers R1-R5 contain junk values and cannot be read.
|
||||
An in-kernel verifier.rst is used to validate eBPF programs.
|
||||
|
||||
Also in the new design, eBPF is limited to 4096 insns, which means that any
|
||||
program will terminate quickly and will only call a fixed number of kernel
|
||||
functions. Original BPF and eBPF are two operand instructions,
|
||||
which helps to do one-to-one mapping between eBPF insn and x86 insn during JIT.
|
||||
|
||||
The input context pointer for invoking the interpreter function is generic,
|
||||
its content is defined by a specific use case. For seccomp register R1 points
|
||||
to seccomp_data, for converted BPF filters R1 points to a skb.
|
||||
|
||||
A program that is translated internally consists of the following elements::
|
||||
|
||||
op:16, jt:8, jf:8, k:32 ==> op:8, dst_reg:4, src_reg:4, off:16, imm:32
|
||||
|
||||
So far 87 eBPF instructions were implemented. 8-bit 'op' opcode field
|
||||
has room for new instructions. Some of them may use 16/24/32 byte encoding. New
|
||||
instructions must be a multiple of 8 bytes to preserve backward compatibility.
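The two layouts can be written out as C structures to make the field widths concrete. This is a sketch with hypothetical type names; it assumes a compiler such as GCC or Clang that accepts ``uint8_t`` bit-fields and packs two 4-bit fields into one byte.

```c
#include <stdint.h>

/* Classic BPF instruction: op:16, jt:8, jf:8, k:32 */
struct classic_insn {
	uint16_t code; /* opcode */
	uint8_t  jt;   /* jump offset if the condition is true */
	uint8_t  jf;   /* jump offset if the condition is false */
	uint32_t k;    /* generic 32-bit operand */
};

/* eBPF instruction: op:8, dst_reg:4, src_reg:4, off:16, imm:32 */
struct ebpf_insn {
	uint8_t code;      /* opcode */
	uint8_t dst_reg:4; /* destination register */
	uint8_t src_reg:4; /* source register */
	int16_t off;       /* signed offset */
	int32_t imm;       /* signed 32-bit immediate */
};

/* Both encodings occupy exactly 8 bytes. */
_Static_assert(sizeof(struct classic_insn) == 8, "classic insn is 8 bytes");
_Static_assert(sizeof(struct ebpf_insn) == 8, "eBPF insn is 8 bytes");
```

Keeping both formats at 8 bytes is what allows wider encodings to be expressed as multiples of the basic instruction size.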
|
||||
|
||||
eBPF is a general purpose RISC instruction set. Not every register and
|
||||
every instruction are used during translation from original BPF to eBPF.
|
||||
For example, socket filters are not using ``exclusive add`` instruction, but
|
||||
tracing filters may do so to maintain counters of events. Register R9
|
||||
is not used by socket filters either, but more complex filters may be running
|
||||
out of registers and would have to resort to spill/fill to stack.
|
||||
|
||||
eBPF can be used as a generic assembler for last step performance
|
||||
optimizations, socket filters and seccomp are using it as assembler. Tracing
|
||||
filters may use it as assembler to generate code from kernel. In kernel usage
|
||||
may not be bounded by security considerations, since generated eBPF code
|
||||
may be optimizing internal code path and not being exposed to the user space.
|
||||
Safety of eBPF can come from the verifier.rst. In such use cases as
|
||||
described, it may be used as a safe instruction set.
|
||||
|
||||
Just like the original BPF, eBPF runs within a controlled environment,
|
||||
is deterministic and the kernel can easily prove that. The safety of the program
|
||||
can be determined in two steps: first step does depth-first-search to disallow
|
||||
loops and other CFG validation; second step starts from the first insn and
|
||||
descends all possible paths. It simulates execution of every insn and observes
|
||||
the state change of registers and stack.
|
||||
|
||||
opcode encoding
|
||||
===============
|
||||
|
||||
eBPF is reusing most of the opcode encoding from classic to simplify conversion
|
||||
of classic BPF to eBPF.
|
||||
|
||||
For arithmetic and jump instructions the 8-bit 'code' field is divided into three
|
||||
parts::
|
||||
|
||||
+----------------+--------+--------------------+
|
||||
| 4 bits | 1 bit | 3 bits |
|
||||
| operation code | source | instruction class |
|
||||
+----------------+--------+--------------------+
|
||||
(MSB) (LSB)
|
||||
|
||||
Three LSB bits store instruction class which is one of:
|
||||
|
||||
=================== ===============
|
||||
Classic BPF classes eBPF classes
|
||||
=================== ===============
|
||||
BPF_LD 0x00 BPF_LD 0x00
|
||||
BPF_LDX 0x01 BPF_LDX 0x01
|
||||
BPF_ST 0x02 BPF_ST 0x02
|
||||
BPF_STX 0x03 BPF_STX 0x03
|
||||
BPF_ALU 0x04 BPF_ALU 0x04
|
||||
BPF_JMP 0x05 BPF_JMP 0x05
|
||||
BPF_RET 0x06 BPF_JMP32 0x06
|
||||
BPF_MISC 0x07 BPF_ALU64 0x07
|
||||
=================== ===============
|
||||
|
||||
The 4th bit encodes the source operand ...
|
||||
|
||||
::
|
||||
|
||||
BPF_K 0x00
|
||||
BPF_X 0x08
|
||||
|
||||
* in classic BPF, this means::
|
||||
|
||||
BPF_SRC(code) == BPF_X - use register X as source operand
|
||||
BPF_SRC(code) == BPF_K - use 32-bit immediate as source operand
|
||||
|
||||
* in eBPF, this means::
|
||||
|
||||
BPF_SRC(code) == BPF_X - use 'src_reg' register as source operand
|
||||
BPF_SRC(code) == BPF_K - use 32-bit immediate as source operand
|
||||
|
||||
... and four MSB bits store operation code.
|
||||
|
||||
If BPF_CLASS(code) == BPF_ALU or BPF_ALU64 [ in eBPF ], BPF_OP(code) is one of::
|
||||
|
||||
BPF_ADD 0x00
|
||||
BPF_SUB 0x10
|
||||
BPF_MUL 0x20
|
||||
BPF_DIV 0x30
|
||||
BPF_OR 0x40
|
||||
BPF_AND 0x50
|
||||
BPF_LSH 0x60
|
||||
BPF_RSH 0x70
|
||||
BPF_NEG 0x80
|
||||
BPF_MOD 0x90
|
||||
BPF_XOR 0xa0
|
||||
BPF_MOV 0xb0 /* eBPF only: mov reg to reg */
|
||||
BPF_ARSH 0xc0 /* eBPF only: sign extending shift right */
|
||||
BPF_END 0xd0 /* eBPF only: endianness conversion */
|
||||
|
||||
If BPF_CLASS(code) == BPF_JMP or BPF_JMP32 [ in eBPF ], BPF_OP(code) is one of::
|
||||
|
||||
BPF_JA 0x00 /* BPF_JMP only */
|
||||
BPF_JEQ 0x10
|
||||
BPF_JGT 0x20
|
||||
BPF_JGE 0x30
|
||||
BPF_JSET 0x40
|
||||
BPF_JNE 0x50 /* eBPF only: jump != */
|
||||
BPF_JSGT 0x60 /* eBPF only: signed '>' */
|
||||
BPF_JSGE 0x70 /* eBPF only: signed '>=' */
|
||||
BPF_CALL 0x80 /* eBPF BPF_JMP only: function call */
|
||||
BPF_EXIT 0x90 /* eBPF BPF_JMP only: function return */
|
||||
BPF_JLT 0xa0 /* eBPF only: unsigned '<' */
|
||||
BPF_JLE 0xb0 /* eBPF only: unsigned '<=' */
|
||||
BPF_JSLT 0xc0 /* eBPF only: signed '<' */
|
||||
BPF_JSLE 0xd0 /* eBPF only: signed '<=' */
|
||||
|
||||
So BPF_ADD | BPF_X | BPF_ALU means 32-bit addition in both classic BPF
|
||||
and eBPF. There are only two registers in classic BPF, so it means A += X.
|
||||
In eBPF it means dst_reg = (u32) dst_reg + (u32) src_reg; similarly,
|
||||
BPF_XOR | BPF_K | BPF_ALU means A ^= imm32 in classic BPF and analogous
|
||||
dst_reg = (u32) dst_reg ^ (u32) imm32 in eBPF.
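The three-way split of the 'code' field for arithmetic and jump instructions can be sketched as a set of mask operations. The masks follow the bit diagram above; the enum and function names are hypothetical and chosen for this sketch only.

```c
#include <stdint.h>

/* Field masks for the 8-bit code of arithmetic and jump instructions. */
enum {
	CODE_CLASS_MASK = 0x07, /* 3 LSB bits: instruction class */
	CODE_SRC_MASK   = 0x08, /* 1 bit: BPF_K (0x00) or BPF_X (0x08) */
	CODE_OP_MASK    = 0xf0, /* 4 MSB bits: operation code */
};

static unsigned int code_class(uint8_t code)
{
	return code & CODE_CLASS_MASK;
}

static unsigned int code_src(uint8_t code)
{
	return code & CODE_SRC_MASK;
}

static unsigned int code_op(uint8_t code)
{
	return code & CODE_OP_MASK;
}
```

For example, BPF_ADD | BPF_X | BPF_ALU is 0x00 | 0x08 | 0x04 = 0x0c, which decodes to class 0x04 (BPF_ALU), source 0x08 (BPF_X) and operation 0x00 (BPF_ADD).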
|
||||
|
||||
Classic BPF is using BPF_MISC class to represent A = X and X = A moves.
|
||||
eBPF is using BPF_MOV | BPF_X | BPF_ALU code instead. Since there are no
|
||||
BPF_MISC operations in eBPF, the class 7 is used as BPF_ALU64 to mean
|
||||
exactly the same operations as BPF_ALU, but with 64-bit wide operands
|
||||
instead. So BPF_ADD | BPF_X | BPF_ALU64 means 64-bit addition, i.e.:
|
||||
dst_reg = dst_reg + src_reg
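The difference between the two classes can be shown with plain C integer arithmetic. This is only a model of the semantics described above, with hypothetical function names, where registers are represented as 64-bit values.

```c
#include <stdint.h>

/*
 * BPF_ADD | BPF_X | BPF_ALU: add the lower 32-bit subregisters and
 * zero-extend the 32-bit result into the 64-bit destination.
 */
static uint64_t alu32_add(uint64_t dst_reg, uint64_t src_reg)
{
	return (uint32_t)((uint32_t)dst_reg + (uint32_t)src_reg);
}

/* BPF_ADD | BPF_X | BPF_ALU64: full 64-bit addition. */
static uint64_t alu64_add(uint64_t dst_reg, uint64_t src_reg)
{
	return dst_reg + src_reg;
}
```

With dst_reg = 0xffffffff and src_reg = 1, the 32-bit form wraps around to 0 while the 64-bit form yields 0x100000000.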
|
||||
|
||||
Classic BPF wastes the whole BPF_RET class to represent a single ``ret``
|
||||
operation. Classic BPF_RET | BPF_K means copy imm32 into return register
|
||||
and perform function exit. eBPF is modeled to match CPU, so BPF_JMP | BPF_EXIT
|
||||
in eBPF means function exit only. The eBPF program needs to store return
|
||||
value into register R0 before doing a BPF_EXIT. Class 6 in eBPF is used as
|
||||
BPF_JMP32 to mean exactly the same operations as BPF_JMP, but with 32-bit wide
|
||||
operands for the comparisons instead.
|
||||
|
||||
For load and store instructions the 8-bit 'code' field is divided as::
|
||||
|
||||
+--------+--------+-------------------+
|
||||
| 3 bits | 2 bits | 3 bits |
|
||||
| mode | size | instruction class |
|
||||
+--------+--------+-------------------+
|
||||
(MSB) (LSB)
|
||||
|
||||
Size modifier is one of ...
|
||||
|
||||
::
|
||||
|
||||
BPF_W 0x00 /* word */
|
||||
BPF_H 0x08 /* half word */
|
||||
BPF_B 0x10 /* byte */
|
||||
BPF_DW 0x18 /* eBPF only, double word */
|
||||
|
||||
... which encodes size of load/store operation::
|
||||
|
||||
B - 1 byte
|
||||
H - 2 byte
|
||||
W - 4 byte
|
||||
DW - 8 byte (eBPF only)
|
||||
|
||||
Mode modifier is one of::
|
||||
|
||||
BPF_IMM 0x00 /* used for 32-bit mov in classic BPF and 64-bit in eBPF */
|
||||
BPF_ABS 0x20
|
||||
BPF_IND 0x40
|
||||
BPF_MEM 0x60
|
||||
BPF_LEN 0x80 /* classic BPF only, reserved in eBPF */
|
||||
BPF_MSH 0xa0 /* classic BPF only, reserved in eBPF */
|
||||
BPF_ATOMIC 0xc0 /* eBPF only, atomic operations */
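The mode and size fields of a load or store opcode can be extracted in the same way as the arithmetic fields. The sketch below derives its masks from the bit diagram above (3 mode bits, 2 size bits); the function names are hypothetical.

```c
#include <stdint.h>

/* 3 MSB bits: mode modifier (BPF_IMM, BPF_ABS, BPF_MEM, ...). */
static unsigned int ldst_mode(uint8_t code)
{
	return code & 0xe0;
}

/* 2 middle bits: size modifier (BPF_W, BPF_H, BPF_B, BPF_DW). */
static unsigned int ldst_size(uint8_t code)
{
	return code & 0x18;
}

/* Width in bytes of the access selected by the size modifier. */
static unsigned int ldst_size_bytes(uint8_t code)
{
	switch (ldst_size(code)) {
	case 0x00: return 4; /* BPF_W */
	case 0x08: return 2; /* BPF_H */
	case 0x10: return 1; /* BPF_B */
	case 0x18: return 8; /* BPF_DW, eBPF only */
	}
	return 0; /* not reachable: the mask leaves only the four values above */
}
```

For example, BPF_LDX | BPF_MEM | BPF_DW is 0x01 | 0x60 | 0x18 = 0x79, which decodes to mode 0x60 (BPF_MEM) and an 8-byte access.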
|
11
Documentation/bpf/faq.rst
Normal file
@ -0,0 +1,11 @@
|
||||
================================
|
||||
Frequently asked questions (FAQ)
|
||||
================================
|
||||
|
||||
Two sets of Questions and Answers (Q&A) are maintained.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
bpf_design_QA
|
||||
bpf_devel_QA
|
7
Documentation/bpf/helpers.rst
Normal file
@ -0,0 +1,7 @@
|
||||
Helper functions
|
||||
================
|
||||
|
||||
* `bpf-helpers(7)`_ maintains a list of helpers available to eBPF programs.
|
||||
|
||||
.. Links
|
||||
.. _bpf-helpers(7): https://man7.org/linux/man-pages/man7/bpf-helpers.7.html
|
@ -5,104 +5,33 @@ BPF Documentation
|
||||
This directory contains documentation for the BPF (Berkeley Packet
|
||||
Filter) facility, with a focus on the extended BPF version (eBPF).
|
||||
|
||||
This kernel side documentation is still work in progress. The main
|
||||
textual documentation is (for historical reasons) described in
|
||||
:ref:`networking-filter`, which describe both classical and extended
|
||||
BPF instruction-set.
|
||||
This kernel side documentation is still work in progress.
|
||||
The Cilium project also maintains a `BPF and XDP Reference Guide`_
|
||||
that goes into great technical depth about the BPF Architecture.
|
||||
|
||||
libbpf
|
||||
======
|
||||
|
||||
Documentation/bpf/libbpf/index.rst is a userspace library for loading and interacting with bpf programs.
|
||||
|
||||
BPF Type Format (BTF)
|
||||
=====================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
instruction-set
|
||||
verifier
|
||||
libbpf/index
|
||||
btf
|
||||
|
||||
|
||||
Frequently asked questions (FAQ)
|
||||
================================
|
||||
|
||||
Two sets of Questions and Answers (Q&A) are maintained.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
bpf_design_QA
|
||||
bpf_devel_QA
|
||||
|
||||
Syscall API
|
||||
===========
|
||||
|
||||
The primary info for the bpf syscall is available in the `man-pages`_
|
||||
for `bpf(2)`_. For more information about the userspace API, see
|
||||
Documentation/userspace-api/ebpf/index.rst.
|
||||
|
||||
Helper functions
|
||||
================
|
||||
|
||||
* `bpf-helpers(7)`_ maintains a list of helpers available to eBPF programs.
|
||||
|
||||
|
||||
Program types
|
||||
=============
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
prog_cgroup_sockopt
|
||||
prog_cgroup_sysctl
|
||||
prog_flow_dissector
|
||||
bpf_lsm
|
||||
prog_sk_lookup
|
||||
|
||||
|
||||
Map types
|
||||
=========
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
map_cgroup_storage
|
||||
|
||||
|
||||
Testing and debugging BPF
|
||||
=========================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
drgn
|
||||
s390
|
||||
|
||||
|
||||
Licensing
|
||||
=========
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
faq
|
||||
syscall_api
|
||||
helpers
|
||||
programs
|
||||
maps
|
||||
classic_vs_extended.rst
|
||||
bpf_licensing
|
||||
test_debug
|
||||
other
|
||||
|
||||
.. only:: subproject and html
|
||||
|
||||
Other
|
||||
=====
|
||||
Indices
|
||||
=======
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
ringbuf
|
||||
llvm_reloc
|
||||
* :ref:`genindex`
|
||||
|
||||
.. Links:
|
||||
.. _networking-filter: ../networking/filter.rst
|
||||
.. _man-pages: https://www.kernel.org/doc/man-pages/
|
||||
.. _bpf(2): https://man7.org/linux/man-pages/man2/bpf.2.html
|
||||
.. _bpf-helpers(7): https://man7.org/linux/man-pages/man7/bpf-helpers.7.html
|
||||
.. _BPF and XDP Reference Guide: https://docs.cilium.io/en/latest/bpf/
|
||||
|
279
Documentation/bpf/instruction-set.rst
Normal file
@ -0,0 +1,279 @@
|
||||
|
||||
====================
|
||||
eBPF Instruction Set
|
||||
====================
|
||||
|
||||
Registers and calling convention
|
||||
================================
|
||||
|
||||
eBPF has 10 general purpose registers and a read-only frame pointer register,
|
||||
all of which are 64-bits wide.
|
||||
|
||||
The eBPF calling convention is defined as:
|
||||
|
||||
* R0: return value from function calls, and exit value for eBPF programs
|
||||
* R1 - R5: arguments for function calls
|
||||
* R6 - R9: callee saved registers that function calls will preserve
|
||||
* R10: read-only frame pointer to access stack
|
||||
|
||||
R0 - R5 are scratch registers and eBPF programs need to spill/fill them if
|
||||
necessary across calls.
|
||||
|
||||
Instruction encoding
|
||||
====================
|
||||
|
||||
eBPF uses 64-bit instructions with the following encoding:
|
||||
|
||||
============= ======= =============== ==================== ============
|
||||
32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB)
|
||||
============= ======= =============== ==================== ============
|
||||
immediate offset source register destination register opcode
|
||||
============= ======= =============== ==================== ============
|
||||
|
||||
Note that most instructions do not use all of the fields.
|
||||
Unused fields shall be cleared to zero.
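As a rough illustration of the table above, the following sketch treats one
instruction as a 64-bit integer whose least significant byte is the opcode.
The helper names (``insn_opcode()``, ``insn_pack()`` and so on) are hypothetical
and not kernel APIs; treating the offset and immediate as signed is an
assumption based on how the kernel declares ``struct bpf_insn``.

```c
#include <stdint.h>

/* Hypothetical helpers: split a 64-bit instruction word into the five
 * fields of the encoding table (opcode in the least significant byte). */
static inline uint8_t insn_opcode(uint64_t insn) { return insn & 0xff; }
static inline uint8_t insn_dst(uint64_t insn)    { return (insn >> 8) & 0xf; }
static inline uint8_t insn_src(uint64_t insn)    { return (insn >> 12) & 0xf; }
static inline int16_t insn_off(uint64_t insn)    { return (int16_t)(uint16_t)(insn >> 16); }
static inline int32_t insn_imm(uint64_t insn)    { return (int32_t)(uint32_t)(insn >> 32); }

/* Inverse of the helpers above. Fields an instruction does not use are
 * passed as zero, matching the "cleared to zero" rule. */
static inline uint64_t insn_pack(uint8_t opcode, uint8_t dst, uint8_t src,
                                 int16_t off, int32_t imm)
{
    return (uint64_t)opcode |
           ((uint64_t)(dst & 0xf) << 8) |
           ((uint64_t)(src & 0xf) << 12) |
           ((uint64_t)(uint16_t)off << 16) |
           ((uint64_t)(uint32_t)imm << 32);
}
```

Packing and then unpacking a word returns the original field values, which is
a convenient sanity check when writing an assembler or disassembler.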
|
||||
|
||||
Instruction classes
|
||||
-------------------
|
||||
|
||||
The three LSB bits of the 'opcode' field store the instruction class:
|
||||
|
||||
========= ===== ===============================
|
||||
class value description
|
||||
========= ===== ===============================
|
||||
BPF_LD 0x00 non-standard load operations
|
||||
BPF_LDX 0x01 load into register operations
|
||||
BPF_ST 0x02 store from immediate operations
|
||||
BPF_STX 0x03 store from register operations
|
||||
BPF_ALU 0x04 32-bit arithmetic operations
|
||||
BPF_JMP 0x05 64-bit jump operations
|
||||
BPF_JMP32 0x06 32-bit jump operations
|
||||
BPF_ALU64 0x07 64-bit arithmetic operations
|
||||
========= ===== ===============================
|
||||
|
||||
Arithmetic and jump instructions
|
||||
================================
|
||||
|
||||
For arithmetic and jump instructions (BPF_ALU, BPF_ALU64, BPF_JMP and
|
||||
BPF_JMP32), the 8-bit 'opcode' field is divided into three parts:
|
||||
|
||||
============== ====== =================
|
||||
4 bits (MSB) 1 bit 3 bits (LSB)
|
||||
============== ====== =================
|
||||
operation code source instruction class
|
||||
============== ====== =================
|
||||
|
||||
The 4th bit encodes the source operand:
|
||||
|
||||
====== ===== ========================================
|
||||
source value description
|
||||
====== ===== ========================================
|
||||
BPF_K 0x00 use 32-bit immediate as source operand
|
||||
BPF_X 0x08 use 'src_reg' register as source operand
|
||||
====== ===== ========================================
|
||||
|
||||
The four MSB bits store the operation code.
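The three-way split can be sketched with simple masks. The macros below mirror
the ones of the same names in the UAPI header ``linux/bpf_common.h``; they are
redefined here only so that the example is self-contained.

```c
#include <stdint.h>

#define BPF_CLASS(code) ((code) & 0x07)  /* 3 LSB bits: instruction class */
#define BPF_SRC(code)   ((code) & 0x08)  /* 4th bit: BPF_K (0x00) or BPF_X (0x08) */
#define BPF_OP(code)    ((code) & 0xf0)  /* 4 MSB bits: operation code */

/* Build an arithmetic or jump opcode from its three parts. */
static inline uint8_t alu_jmp_opcode(uint8_t op, uint8_t src, uint8_t cls)
{
    return (uint8_t)(op | src | cls);
}
```

For example, ``BPF_ADD | BPF_X | BPF_ALU64`` is ``0x00 | 0x08 | 0x07``, so the
opcode byte is ``0x0f``.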
|
||||
|
||||
|
||||
Arithmetic instructions
|
||||
-----------------------
|
||||
|
||||
BPF_ALU uses 32-bit wide operands while BPF_ALU64 uses 64-bit wide operands for
|
||||
otherwise identical operations.
|
||||
The code field encodes the operation as below:
|
||||
|
||||
======== ===== ==========================
|
||||
code value description
|
||||
======== ===== ==========================
|
||||
BPF_ADD 0x00 dst += src
|
||||
BPF_SUB 0x10 dst -= src
|
||||
BPF_MUL 0x20 dst \*= src
|
||||
BPF_DIV 0x30 dst /= src
|
||||
BPF_OR 0x40 dst \|= src
|
||||
BPF_AND 0x50 dst &= src
|
||||
BPF_LSH 0x60 dst <<= src
|
||||
BPF_RSH 0x70 dst >>= src
|
||||
BPF_NEG 0x80 dst = -dst
|
||||
BPF_MOD 0x90 dst %= src
|
||||
BPF_XOR 0xa0 dst ^= src
|
||||
BPF_MOV 0xb0 dst = src
|
||||
BPF_ARSH 0xc0 sign extending shift right
|
||||
BPF_END 0xd0 endianness conversion
|
||||
======== ===== ==========================
|
||||
|
||||
BPF_ADD | BPF_X | BPF_ALU means::
|
||||
|
||||
dst_reg = (u32) dst_reg + (u32) src_reg;
|
||||
|
||||
BPF_ADD | BPF_X | BPF_ALU64 means::
|
||||
|
||||
dst_reg = dst_reg + src_reg
|
||||
|
||||
BPF_XOR | BPF_K | BPF_ALU means::
|
||||
|
||||
dst_reg = (u32) dst_reg ^ (u32) imm32
|
||||
|
||||
BPF_XOR | BPF_K | BPF_ALU64 means::
|
||||
|
||||
dst_reg = dst_reg ^ imm32
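The difference between the two classes can be modelled in plain C. This is a
sketch, not kernel code: the function names are hypothetical, and it assumes
(as the kernel interpreter does) that a BPF_ALU result is zero-extended into
the 64-bit destination register and that imm32 is sign-extended for BPF_ALU64.

```c
#include <stdint.h>

/* BPF_XOR | BPF_K | BPF_ALU: operate on the low 32 bits only; the
 * result is zero-extended when stored in the 64-bit register. */
static inline uint64_t alu32_xor_k(uint64_t dst, int32_t imm)
{
    return (uint32_t)dst ^ (uint32_t)imm;
}

/* BPF_XOR | BPF_K | BPF_ALU64: imm32 is sign-extended to 64 bits
 * before the operation. */
static inline uint64_t alu64_xor_k(uint64_t dst, int32_t imm)
{
    return dst ^ (uint64_t)(int64_t)imm;
}
```

With an immediate of -1, the 32-bit form flips only the low 32 bits and clears
the upper half, while the 64-bit form flips all 64 bits.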
|
||||
|
||||
|
||||
Jump instructions
|
||||
-----------------
|
||||
|
||||
BPF_JMP32 uses 32-bit wide operands while BPF_JMP uses 64-bit wide operands for
|
||||
otherwise identical operations.
|
||||
The code field encodes the operation as below:
|
||||
|
||||
======== ===== ========================= ============
|
||||
code value description notes
|
||||
======== ===== ========================= ============
|
||||
BPF_JA 0x00 PC += off BPF_JMP only
|
||||
BPF_JEQ 0x10 PC += off if dst == src
|
||||
BPF_JGT 0x20 PC += off if dst > src unsigned
|
||||
BPF_JGE 0x30 PC += off if dst >= src unsigned
|
||||
BPF_JSET 0x40 PC += off if dst & src
|
||||
BPF_JNE 0x50 PC += off if dst != src
|
||||
BPF_JSGT 0x60 PC += off if dst > src signed
|
||||
BPF_JSGE 0x70 PC += off if dst >= src signed
|
||||
BPF_CALL 0x80 function call
|
||||
BPF_EXIT 0x90 function / program return BPF_JMP only
|
||||
BPF_JLT 0xa0 PC += off if dst < src unsigned
|
||||
BPF_JLE 0xb0 PC += off if dst <= src unsigned
|
||||
BPF_JSLT 0xc0 PC += off if dst < src signed
|
||||
BPF_JSLE 0xd0 PC += off if dst <= src signed
|
||||
======== ===== ========================= ============
|
||||
|
||||
The eBPF program needs to store the return value into register R0 before doing a
|
||||
BPF_EXIT.
|
||||
|
||||
|
||||
Load and store instructions
|
||||
===========================
|
||||
|
||||
For load and store instructions (BPF_LD, BPF_LDX, BPF_ST and BPF_STX), the
|
||||
8-bit 'opcode' field is divided as:
|
||||
|
||||
============ ====== =================
|
||||
3 bits (MSB) 2 bits 3 bits (LSB)
|
||||
============ ====== =================
|
||||
mode size instruction class
|
||||
============ ====== =================
|
||||
|
||||
The size modifier is one of:
|
||||
|
||||
============= ===== =====================
|
||||
size modifier value description
|
||||
============= ===== =====================
|
||||
BPF_W 0x00 word (4 bytes)
|
||||
BPF_H 0x08 half word (2 bytes)
|
||||
BPF_B 0x10 byte
|
||||
BPF_DW 0x18 double word (8 bytes)
|
||||
============= ===== =====================
|
||||
|
||||
The mode modifier is one of:
|
||||
|
||||
============= ===== ====================================
|
||||
mode modifier value description
|
||||
============= ===== ====================================
|
||||
BPF_IMM 0x00 used for 64-bit mov
|
||||
BPF_ABS 0x20 legacy BPF packet access
|
||||
BPF_IND 0x40 legacy BPF packet access
|
||||
BPF_MEM 0x60 all normal load and store operations
|
||||
BPF_ATOMIC 0xc0 atomic operations
|
||||
============= ===== ====================================
|
||||
|
||||
BPF_MEM | <size> | BPF_STX means::
|
||||
|
||||
*(size *) (dst_reg + off) = src_reg
|
||||
|
||||
BPF_MEM | <size> | BPF_ST means::
|
||||
|
||||
*(size *) (dst_reg + off) = imm32
|
||||
|
||||
BPF_MEM | <size> | BPF_LDX means::
|
||||
|
||||
dst_reg = *(size *) (src_reg + off)
|
||||
|
||||
Where size is one of: BPF_B or BPF_H or BPF_W or BPF_DW.
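Decoding a load or store opcode follows the same masking pattern as for
arithmetic instructions. ``BPF_MODE()`` and ``BPF_SIZE()`` mirror the macros in
``linux/bpf_common.h``; ``bpf_size_bytes()`` is a hypothetical helper added for
illustration.

```c
#include <stdint.h>

#define BPF_MODE(code) ((code) & 0xe0)  /* 3 MSB bits: mode modifier */
#define BPF_SIZE(code) ((code) & 0x18)  /* 2 middle bits: size modifier */

/* Width in bytes of the memory access encoded in a load/store opcode. */
static inline int bpf_size_bytes(uint8_t code)
{
    switch (BPF_SIZE(code)) {
    case 0x00: return 4;  /* BPF_W */
    case 0x08: return 2;  /* BPF_H */
    case 0x10: return 1;  /* BPF_B */
    default:   return 8;  /* 0x18: BPF_DW */
    }
}
```

For instance, ``BPF_MEM | BPF_DW | BPF_STX`` is ``0x60 | 0x18 | 0x03 = 0x7b``,
which decodes to mode ``BPF_MEM`` and an 8-byte access.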
|
||||
|
||||
Atomic operations
|
||||
-----------------
|
||||
|
||||
eBPF includes atomic operations, which use the immediate field for extra
|
||||
encoding::
|
||||
|
||||
.imm = BPF_ADD, .code = BPF_ATOMIC | BPF_W | BPF_STX: lock xadd *(u32 *)(dst_reg + off16) += src_reg
|
||||
.imm = BPF_ADD, .code = BPF_ATOMIC | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg
|
||||
|
||||
The basic atomic operations supported are::
|
||||
|
||||
BPF_ADD
|
||||
BPF_AND
|
||||
BPF_OR
|
||||
BPF_XOR
|
||||
|
||||
Each having equivalent semantics with the ``BPF_ADD`` example, that is: the
|
||||
memory location addressed by ``dst_reg + off`` is atomically modified, with
|
||||
``src_reg`` as the other operand. If the ``BPF_FETCH`` flag is set in the
|
||||
immediate, then these operations also overwrite ``src_reg`` with the
|
||||
value that was in memory before it was modified.
|
||||
|
||||
The more special operations are::
|
||||
|
||||
BPF_XCHG
|
||||
|
||||
This atomically exchanges ``src_reg`` with the value addressed by ``dst_reg +
|
||||
off``. ::
|
||||
|
||||
BPF_CMPXCHG
|
||||
|
||||
This atomically compares the value addressed by ``dst_reg + off`` with
|
||||
``R0``. If they match it is replaced with ``src_reg``. In either case, the
|
||||
value that was there before is zero-extended and loaded back to ``R0``.
|
||||
|
||||
Note that 1 and 2 byte atomic operations are not supported.
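The semantics described above can be approximated with C11 atomics. This is
only a userspace model for a 64-bit location; the ``model_*`` names are
hypothetical, and each function returns the value that the affected register
(``src_reg``, or ``R0`` for the compare-and-exchange) would hold afterwards.

```c
#include <stdatomic.h>
#include <stdint.h>

/* BPF_ADD | BPF_FETCH: add src to memory, return the old memory value
 * (which is what ends up in src_reg). */
static inline uint64_t model_fetch_add(_Atomic uint64_t *addr, uint64_t src)
{
    return atomic_fetch_add(addr, src);
}

/* BPF_XCHG: store src, return the previous memory value. */
static inline uint64_t model_xchg(_Atomic uint64_t *addr, uint64_t src)
{
    return atomic_exchange(addr, src);
}

/* BPF_CMPXCHG: if memory equals r0, replace it with src. In either
 * case the old memory value is returned (it is loaded back into R0). */
static inline uint64_t model_cmpxchg(_Atomic uint64_t *addr, uint64_t r0,
                                     uint64_t src)
{
    uint64_t expected = r0;

    /* On failure, 'expected' is overwritten with the current value. */
    atomic_compare_exchange_strong(addr, &expected, src);
    return expected;
}
```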
|
||||
|
||||
Clang can generate atomic instructions by default when ``-mcpu=v3`` is
|
||||
enabled. If a lower version for ``-mcpu`` is set, the only atomic instruction
|
||||
Clang can generate is ``BPF_ADD`` *without* ``BPF_FETCH``. If you need to enable
|
||||
the atomics features, while keeping a lower ``-mcpu`` version, you can use
|
||||
``-Xclang -target-feature -Xclang +alu32``.
|
||||
|
||||
You may encounter ``BPF_XADD`` - this is a legacy name for ``BPF_ATOMIC``,
|
||||
referring to the exclusive-add operation encoded when the immediate field is
|
||||
zero.
|
||||
|
||||
16-byte instructions
|
||||
--------------------
|
||||
|
||||
eBPF has one 16-byte instruction: ``BPF_LD | BPF_DW | BPF_IMM`` which consists
|
||||
of two consecutive ``struct bpf_insn`` 8-byte blocks and is interpreted as a single
|
||||
instruction that loads a 64-bit immediate value into dst_reg.
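A sketch of how the 64-bit value is assembled from the two blocks is shown
below. The text above does not spell out the split; the assumption here,
based on the kernel interpreter, is that the ``imm`` field of the first block
holds the low 32 bits and the ``imm`` field of the second block holds the high
32 bits. ``ld_imm64_value()`` is a hypothetical helper name.

```c
#include <stdint.h>

/* Combine the two 32-bit immediates of a BPF_LD | BPF_DW | BPF_IMM
 * instruction pair. Each half is treated as unsigned so that a negative
 * low half does not sign-extend into the high half. */
static inline uint64_t ld_imm64_value(int32_t imm_lo, int32_t imm_hi)
{
    return (uint64_t)(uint32_t)imm_lo | ((uint64_t)(uint32_t)imm_hi << 32);
}
```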
|
||||
|
||||
Packet access instructions
|
||||
--------------------------
|
||||
|
||||
eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and
|
||||
(BPF_IND | <size> | BPF_LD) which are used to access packet data.
|
||||
|
||||
They had to be carried over from classic BPF to keep socket filters
|
||||
running in the eBPF interpreter fast. These instructions can only
|
||||
be used when the interpreter context is a pointer to ``struct sk_buff`` and
|
||||
have seven implicit operands. Register R6 is an implicit input that must
|
||||
contain a pointer to the sk_buff. Register R0 is an implicit output which contains
|
||||
the data fetched from the packet. Registers R1-R5 are scratch registers
|
||||
and must not be used to store data across BPF_ABS | BPF_LD or
|
||||
BPF_IND | BPF_LD instructions.
|
||||
|
||||
These instructions have an implicit program exit condition as well. When an
|
||||
eBPF program tries to access data beyond the packet boundary,
|
||||
the interpreter will abort the execution of the program. JIT compilers
|
||||
therefore must preserve this property. The src_reg and imm32 fields are
|
||||
explicit inputs to these instructions.
|
||||
|
||||
For example, BPF_IND | BPF_W | BPF_LD means::
|
||||
|
||||
R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))
|
||||
|
||||
and R1 - R5 are clobbered.
|
@ -3,8 +3,6 @@
|
||||
libbpf
|
||||
======
|
||||
|
||||
For API documentation see the `versioned API documentation site <https://libbpf.readthedocs.io/en/latest/api.html>`_.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
@ -14,6 +12,8 @@ For API documentation see the `versioned API documentation site <https://libbpf.
|
||||
This is documentation for libbpf, a userspace library for loading and
|
||||
interacting with bpf programs.
|
||||
|
||||
For API documentation see the `versioned API documentation site <https://libbpf.readthedocs.io/en/latest/api.html>`_.
|
||||
|
||||
All general BPF questions, including kernel functionality, libbpf APIs and
|
||||
their application, should be sent to bpf@vger.kernel.org mailing list.
|
||||
You can `subscribe <http://vger.kernel.org/vger-lists.html#bpf>`_ to the
|
||||
|
52
Documentation/bpf/maps.rst
Normal file
@ -0,0 +1,52 @@
|
||||
|
||||
=========
|
||||
eBPF maps
|
||||
=========
|
||||
|
||||
'maps' is a generic storage of different types for sharing data between kernel
|
||||
and userspace.
|
||||
|
||||
The maps are accessed from user space via BPF syscall, which has commands:
|
||||
|
||||
- create a map with given type and attributes
|
||||
``map_fd = bpf(BPF_MAP_CREATE, union bpf_attr *attr, u32 size)``
|
||||
using attr->map_type, attr->key_size, attr->value_size, attr->max_entries
|
||||
returns process-local file descriptor or negative error
|
||||
|
||||
- lookup key in a given map
|
||||
``err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size)``
|
||||
using attr->map_fd, attr->key, attr->value
|
||||
returns zero and stores found elem into value or negative error
|
||||
|
||||
- create or update key/value pair in a given map
|
||||
``err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)``
|
||||
using attr->map_fd, attr->key, attr->value
|
||||
returns zero or negative error
|
||||
|
||||
- find and delete element by key in a given map
|
||||
``err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size)``
|
||||
using attr->map_fd, attr->key
|
||||
|
||||
- to delete map: close(fd)
|
||||
Exiting process will delete maps automatically
|
||||
|
||||
Userspace programs use this syscall to create/access maps that eBPF programs
|
||||
are concurrently updating.
|
||||
|
||||
Maps can have different types: hash, array, bloom filter, radix-tree, etc.
|
||||
|
||||
The map is defined by:
|
||||
|
||||
- type
|
||||
- max number of elements
|
||||
- key size in bytes
|
||||
- value size in bytes
|
||||
|
||||
Map Types
|
||||
=========
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
:glob:
|
||||
|
||||
map_*
|
9
Documentation/bpf/other.rst
Normal file
@ -0,0 +1,9 @@
|
||||
=====
|
||||
Other
|
||||
=====
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
ringbuf
|
||||
llvm_reloc
|
9
Documentation/bpf/programs.rst
Normal file
@ -0,0 +1,9 @@
|
||||
=============
|
||||
Program Types
|
||||
=============
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
:glob:
|
||||
|
||||
prog_*
|
11
Documentation/bpf/syscall_api.rst
Normal file
@ -0,0 +1,11 @@
|
||||
===========
|
||||
Syscall API
|
||||
===========
|
||||
|
||||
The primary info for the bpf syscall is available in the `man-pages`_
|
||||
for `bpf(2)`_. For more information about the userspace API, see
|
||||
Documentation/userspace-api/ebpf/index.rst.
|
||||
|
||||
.. Links:
|
||||
.. _man-pages: https://www.kernel.org/doc/man-pages/
|
||||
.. _bpf(2): https://man7.org/linux/man-pages/man2/bpf.2.html
|
9
Documentation/bpf/test_debug.rst
Normal file
@ -0,0 +1,9 @@
|
||||
=========================
|
||||
Testing and debugging BPF
|
||||
=========================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
drgn
|
||||
s390
|
529
Documentation/bpf/verifier.rst
Normal file
@ -0,0 +1,529 @@
|
||||
|
||||
=============
|
||||
eBPF verifier
|
||||
=============
|
||||
|
||||
The safety of the eBPF program is determined in two steps.
|
||||
|
||||
The first step does a DAG check to disallow loops and performs other CFG validation.
|
||||
In particular it will detect programs that have unreachable instructions.
|
||||
(though classic BPF checker allows them)
|
||||
|
||||
The second step starts from the first insn and descends all possible paths.
|
||||
It simulates execution of every insn and observes the state change of
|
||||
registers and stack.
|
||||
|
||||
At the start of the program the register R1 contains a pointer to context
|
||||
and has type PTR_TO_CTX.
|
||||
If verifier sees an insn that does R2=R1, then R2 has now type
|
||||
PTR_TO_CTX as well and can be used on the right hand side of expression.
|
||||
If R1=PTR_TO_CTX and insn is R2=R1+R1, then R2=SCALAR_VALUE,
|
||||
since the addition of two valid pointers makes an invalid pointer.
|
||||
(In 'secure' mode verifier will reject any type of pointer arithmetic to make
|
||||
sure that kernel addresses don't leak to unprivileged users)
|
||||
|
||||
If register was never written to, it's not readable::
|
||||
|
||||
bpf_mov R0 = R2
|
||||
bpf_exit
|
||||
|
||||
will be rejected, since R2 is unreadable at the start of the program.
|
||||
|
||||
After kernel function call, R1-R5 are reset to unreadable and
|
||||
R0 has a return type of the function.
|
||||
|
||||
Since R6-R9 are callee saved, their state is preserved across the call.
|
||||
|
||||
::
|
||||
|
||||
bpf_mov R6 = 1
|
||||
bpf_call foo
|
||||
bpf_mov R0 = R6
|
||||
bpf_exit
|
||||
|
||||
is a correct program. If there was R1 instead of R6, it would have
|
||||
been rejected.
|
||||
|
||||
load/store instructions are allowed only with registers of valid types, which
|
||||
are PTR_TO_CTX, PTR_TO_MAP, PTR_TO_STACK. They are bounds and alignment checked.
|
||||
For example::
|
||||
|
||||
bpf_mov R1 = 1
|
||||
bpf_mov R2 = 2
|
||||
bpf_xadd *(u32 *)(R1 + 3) += R2
|
||||
bpf_exit
|
||||
|
||||
will be rejected, since R1 doesn't have a valid pointer type at the time of
|
||||
execution of instruction bpf_xadd.
|
||||
|
||||
At the start R1 type is PTR_TO_CTX (a pointer to generic ``struct bpf_context``).
|
||||
A callback is used to customize verifier to restrict eBPF program access to only
|
||||
certain fields within ctx structure with specified size and alignment.
|
||||
|
||||
For example, the following insn::
|
||||
|
||||
bpf_ld R0 = *(u32 *)(R6 + 8)
|
||||
|
||||
intends to load a word from address R6 + 8 and store it into R0.
|
||||
If R6=PTR_TO_CTX, via is_valid_access() callback the verifier will know
|
||||
that offset 8 of size 4 bytes can be accessed for reading, otherwise
|
||||
the verifier will reject the program.
|
||||
If R6=PTR_TO_STACK, then access should be aligned and be within
|
||||
stack bounds, which are [-MAX_BPF_STACK, 0). In this example offset is 8,
|
||||
so it will fail verification, since it's out of bounds.
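The stack rule can be written down as a small predicate. This is a simplified
model rather than the verifier's code: ``stack_access_ok()`` is a hypothetical
name, the value 512 for ``MAX_BPF_STACK`` is taken from the kernel headers and
is an assumption as far as this text is concerned, and alignment is modelled
as natural alignment of the access size.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_BPF_STACK 512  /* assumed value, see lead-in */

/* 'off' is the offset from the frame pointer (R10), 'size' the access
 * width in bytes. Valid accesses lie within [-MAX_BPF_STACK, 0). */
static inline bool stack_access_ok(int32_t off, int32_t size)
{
    if (size <= 0)
        return false;
    if (off < -MAX_BPF_STACK || (int64_t)off + size > 0)
        return false;
    return off % size == 0;  /* naturally aligned */
}
```

With this model the example above (offset 8, size 4) is rejected, while an
access at ``R10 - 4`` of size 4 is within bounds and aligned.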
|
||||
|
||||
The verifier will allow eBPF program to read data from stack only after
|
||||
it wrote into it.
|
||||
|
||||
Classic BPF verifier does similar check with M[0-15] memory slots.
|
||||
For example::
|
||||
|
||||
bpf_ld R0 = *(u32 *)(R10 - 4)
|
||||
bpf_exit
|
||||
|
||||
is an invalid program.
|
||||
Though R10 is correct read-only register and has type PTR_TO_STACK
|
||||
and R10 - 4 is within stack bounds, there were no stores into that location.
|
||||
|
||||
Pointer register spill/fill is tracked as well, since four (R6-R9)
|
||||
callee saved registers may not be enough for some programs.
|
||||
|
||||
Allowed function calls are customized with bpf_verifier_ops->get_func_proto().
|
||||
The eBPF verifier will check that registers match argument constraints.
|
||||
After the call register R0 will be set to return type of the function.
|
||||
|
||||
Function calls are the main mechanism to extend functionality of eBPF programs.
|
||||
Socket filters may let programs call one set of functions, whereas tracing
|
||||
filters may allow a completely different set.
|
||||
|
||||
If a function is made accessible to eBPF programs, it needs to be thought through
|
||||
from a safety point of view. The verifier will guarantee that the function is
|
||||
called with valid arguments.
|
||||
|
||||
seccomp vs socket filters have different security restrictions for classic BPF.
|
||||
Seccomp solves this by two stage verifier: classic BPF verifier is followed
|
||||
by seccomp verifier. In case of eBPF one configurable verifier is shared for
|
||||
all use cases.
|
||||
|
||||
See details of eBPF verifier in kernel/bpf/verifier.c
|
||||
|
||||
Register value tracking
|
||||
=======================
|
||||
|
||||
In order to determine the safety of an eBPF program, the verifier must track
|
||||
the range of possible values in each register and also in each stack slot.
|
||||
This is done with ``struct bpf_reg_state``, defined in include/linux/
|
||||
bpf_verifier.h, which unifies tracking of scalar and pointer values. Each
|
||||
register state has a type, which is either NOT_INIT (the register has not been
|
||||
written to), SCALAR_VALUE (some value which is not usable as a pointer), or a
|
||||
pointer type. The types of pointers describe their base, as follows:
|
||||
|
||||
|
||||
PTR_TO_CTX
|
||||
Pointer to bpf_context.
|
||||
CONST_PTR_TO_MAP
|
||||
Pointer to struct bpf_map. "Const" because arithmetic
|
||||
on these pointers is forbidden.
|
||||
PTR_TO_MAP_VALUE
|
||||
Pointer to the value stored in a map element.
|
||||
PTR_TO_MAP_VALUE_OR_NULL
|
||||
Either a pointer to a map value, or NULL; map accesses
|
||||
(see maps.rst) return this type, which becomes a
|
||||
PTR_TO_MAP_VALUE when checked != NULL. Arithmetic on
|
||||
these pointers is forbidden.
|
||||
PTR_TO_STACK
|
||||
Frame pointer.
|
||||
PTR_TO_PACKET
|
||||
skb->data.
|
||||
PTR_TO_PACKET_END
|
||||
skb->data + headlen; arithmetic forbidden.
|
||||
PTR_TO_SOCKET
|
||||
Pointer to struct bpf_sock_ops, implicitly refcounted.
|
||||
PTR_TO_SOCKET_OR_NULL
|
||||
Either a pointer to a socket, or NULL; socket lookup
|
||||
returns this type, which becomes a PTR_TO_SOCKET when
|
||||
checked != NULL. PTR_TO_SOCKET is reference-counted,
|
||||
so programs must release the reference through the
|
||||
socket release function before the end of the program.
|
||||
Arithmetic on these pointers is forbidden.
|
||||
|
||||
However, a pointer may be offset from this base (as a result of pointer
|
||||
arithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable
|
||||
offset'. The former is used when an exactly-known value (e.g. an immediate
|
||||
operand) is added to a pointer, while the latter is used for values which are
|
||||
not exactly known. The variable offset is also used in SCALAR_VALUEs, to track
|
||||
the range of possible values in the register.
|
||||
|
||||
The verifier's knowledge about the variable offset consists of:
|
||||
|
||||
* minimum and maximum values as unsigned
|
||||
* minimum and maximum values as signed
|
||||
|
||||
* knowledge of the values of individual bits, in the form of a 'tnum': a u64
|
||||
'mask' and a u64 'value'. 1s in the mask represent bits whose value is unknown;
|
||||
1s in the value represent bits known to be 1. Bits known to be 0 have 0 in both
|
||||
mask and value; no bit should ever be 1 in both. For example, if a byte is read
|
||||
into a register from memory, the register's top 56 bits are known zero, while
|
||||
the low 8 are unknown - which is represented as the tnum (0x0; 0xff). If we
|
||||
then OR this with 0x40, we get (0x40; 0xbf), then if we add 1 we get (0x0;
|
||||
0x1ff), because of potential carries.
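The worked example above can be reproduced with a few lines of C. This is a
minimal sketch modelled on the helpers in kernel/bpf/tnum.c; the kernel
versions may differ in detail, and the types here are plain ``stdint.h`` types
rather than the kernel's ``u64``.

```c
#include <stdint.h>

/* Tristate number: bits set in 'mask' are unknown, bits set in 'value'
 * are known to be 1. No bit may be set in both. */
struct tnum {
    uint64_t value;
    uint64_t mask;
};

/* A fully known constant has an empty mask. */
static inline struct tnum tnum_const(uint64_t v)
{
    return (struct tnum){ v, 0 };
}

/* OR: a bit is known 1 if it is known 1 in either input; it is unknown
 * if it is unknown in either input and not already known to be 1. */
static inline struct tnum tnum_or(struct tnum a, struct tnum b)
{
    uint64_t v = a.value | b.value;
    uint64_t mu = a.mask | b.mask;

    return (struct tnum){ v, mu & ~v };
}

/* ADD: besides the unknown input bits, every bit position that a carry
 * could reach becomes unknown as well. */
static inline struct tnum tnum_add(struct tnum a, struct tnum b)
{
    uint64_t sm = a.mask + b.mask;
    uint64_t sv = a.value + b.value;
    uint64_t sigma = sm + sv;
    uint64_t chi = sigma ^ sv;           /* bits affected by possible carries */
    uint64_t mu = chi | a.mask | b.mask;

    return (struct tnum){ sv & ~mu, mu };
}
```

Starting from the byte load ``(0x0; 0xff)``, ``tnum_or()`` with the constant
0x40 gives ``(0x40; 0xbf)``, and ``tnum_add()`` with the constant 1 then gives
``(0x0; 0x1ff)``, matching the text.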
|
||||
|
||||
Besides arithmetic, the register state can also be updated by conditional
|
||||
branches. For instance, if a SCALAR_VALUE is compared > 8, in the 'true' branch
|
||||
it will have a umin_value (unsigned minimum value) of 9, whereas in the 'false'
|
||||
branch it will have a umax_value of 8. A signed compare (with BPF_JSGT or
|
||||
BPF_JSGE) would instead update the signed minimum/maximum values. Information
|
||||
from the signed and unsigned bounds can be combined; for instance if a value is
|
||||
first tested < 8 and then tested s> 4, the verifier will conclude that the value
|
||||
is also > 4 and s< 8, since the bounds prevent crossing the sign boundary.
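The unsigned half of this refinement can be sketched as follows. The struct and
function names are hypothetical, only the unsigned bounds are modelled, and the
sketch assumes the constant is smaller than ``UINT64_MAX`` (otherwise the
'true' branch can never be taken).

```c
#include <stdint.h>

struct ubounds {
    uint64_t umin;  /* unsigned minimum value */
    uint64_t umax;  /* unsigned maximum value */
};

/* Bounds in the branch where "reg > k" (unsigned) was true.
 * Assumes k < UINT64_MAX. */
static inline struct ubounds refine_jgt_true(struct ubounds b, uint64_t k)
{
    if (b.umin < k + 1)
        b.umin = k + 1;
    return b;
}

/* Bounds in the branch where "reg > k" was false, i.e. reg <= k. */
static inline struct ubounds refine_jgt_false(struct ubounds b, uint64_t k)
{
    if (b.umax > k)
        b.umax = k;
    return b;
}
```

For a completely unknown scalar compared ``> 8``, the 'true' branch ends up
with ``umin == 9`` and the 'false' branch with ``umax == 8``, as described
above.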
|
||||
|
||||
PTR_TO_PACKETs with a variable offset part have an 'id', which is common to all
|
||||
pointers sharing that same variable offset. This is important for packet range
|
||||
checks: after adding a variable to a packet pointer register A, if you then copy
|
||||
it to another register B and then add a constant 4 to A, both registers will
|
||||
share the same 'id' but A will have a fixed offset of +4. Then if A is
|
||||
bounds-checked and found to be less than a PTR_TO_PACKET_END, the register B is
|
||||
now known to have a safe range of at least 4 bytes. See 'Direct packet access',
|
||||
below, for more on PTR_TO_PACKET ranges.
|
||||
|
||||
The 'id' field is also used on PTR_TO_MAP_VALUE_OR_NULL, common to all copies of
|
||||
the pointer returned from a map lookup. This means that when one copy is
|
||||
checked and found to be non-NULL, all copies can become PTR_TO_MAP_VALUEs.
|
||||
As well as range-checking, the tracked information is also used for enforcing
|
||||
alignment of pointer accesses. For instance, on most systems the packet pointer
|
||||
is 2 bytes after a 4-byte alignment. If a program adds 14 bytes to that to jump
|
||||
over the Ethernet header, then reads IHL and adds (IHL * 4), the resulting
|
||||
pointer will have a variable offset known to be 4n+2 for some n, so adding the 2
|
||||
bytes (NET_IP_ALIGN) gives a 4-byte alignment and so word-sized accesses through
|
||||
that pointer are safe.
|
||||
The 'id' field is also used on PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL, common
|
||||
to all copies of the pointer returned from a socket lookup. This has similar
|
||||
behaviour to the handling for PTR_TO_MAP_VALUE_OR_NULL->PTR_TO_MAP_VALUE, but
|
||||
it also handles reference tracking for the pointer. PTR_TO_SOCKET implicitly
|
||||
represents a reference to the corresponding ``struct sock``. To ensure that the
|
||||
reference is not leaked, it is imperative to NULL-check the reference and,
|
||||
in the non-NULL case, pass the valid reference to the socket release function.
|
||||
|
||||
Direct packet access
|
||||
====================
|
||||
|
||||
In cls_bpf and act_bpf programs the verifier allows direct access to the packet
|
||||
data via skb->data and skb->data_end pointers.
|
||||
Ex::
|
||||
|
||||
1: r4 = *(u32 *)(r1 +80) /* load skb->data_end */
|
||||
2: r3 = *(u32 *)(r1 +76) /* load skb->data */
|
||||
3: r5 = r3
|
||||
4: r5 += 14
|
||||
5: if r5 > r4 goto pc+16
|
||||
R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
|
||||
6: r0 = *(u16 *)(r3 +12) /* access bytes 12 and 13 of the packet */
|
||||
|
||||
This 2-byte load from the packet is safe to do, since the program author
|
||||
did check ``if (skb->data + 14 > skb->data_end) goto err`` at insn #5 which
|
||||
means that in the fall-through case the register R3 (which points to skb->data)
|
||||
has at least 14 directly accessible bytes. The verifier marks it
|
||||
as R3=pkt(id=0,off=0,r=14).
|
||||
id=0 means that no additional variables were added to the register.
|
||||
off=0 means that no additional constants were added.
|
||||
r=14 is the range of safe access which means that bytes [R3, R3 + 14) are ok.
|
||||
Note that R5 is marked as R5=pkt(id=0,off=14,r=14). It also points
|
||||
to the packet data, but constant 14 was added to the register, so
|
||||
it now points to ``skb->data + 14`` and accessible range is [R5, R5 + 14 - 14)
|
||||
which is zero bytes.
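The meaning of ``off`` and ``r`` can be captured in a small model. This is a
simplification with hypothetical names, not the verifier's own check: an access
is accepted when it starts at or after the packet start and ends within the
verified range, with the register's fixed offset added to the offset encoded in
the instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified view of a pkt(id, off, r) register state; 'id' is omitted
 * because it does not affect this check. */
struct pkt_reg {
    int32_t off;     /* constant already added to the packet pointer */
    uint32_t range;  /* r: number of verified bytes from skb->data */
};

/* Would a 'size'-byte access at *(reg + insn_off) stay inside the
 * verified range? */
static inline bool pkt_access_ok(struct pkt_reg reg, int32_t insn_off,
                                 uint32_t size)
{
    int64_t start = (int64_t)reg.off + insn_off;

    return start >= 0 && start + (int64_t)size <= (int64_t)reg.range;
}
```

With R3=pkt(id=0,off=0,r=14) the two-byte load at offset 12 is accepted, while
any load through R5=pkt(id=0,off=14,r=14) at offset 0 is rejected because no
verified bytes remain past ``skb->data + 14``.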
|
||||
|
||||
More complex packet access may look like::
|
||||
|
||||
|
||||
R0=inv1 R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
|
||||
6: r0 = *(u8 *)(r3 +7) /* load 7th byte from the packet */
|
||||
7: r4 = *(u8 *)(r3 +12)
|
||||
8: r4 *= 14
|
||||
9: r3 = *(u32 *)(r1 +76) /* load skb->data */
|
||||
10: r3 += r4
|
||||
11: r2 = r1
|
||||
12: r2 <<= 48
|
||||
13: r2 >>= 48
|
||||
14: r3 += r2
|
||||
15: r2 = r3
|
||||
16: r2 += 8
|
||||
17: r1 = *(u32 *)(r1 +80) /* load skb->data_end */
|
||||
18: if r2 > r1 goto pc+2
|
||||
R0=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=pkt_end R2=pkt(id=2,off=8,r=8) R3=pkt(id=2,off=0,r=8) R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)) R5=pkt(id=0,off=14,r=14) R10=fp
|
||||
19: r1 = *(u8 *)(r3 +4)
|
||||
|
||||
The state of the register R3 is R3=pkt(id=2,off=0,r=8)
|
||||
id=2 means that two ``r3 += rX`` instructions were seen, so r3 points to some
|
||||
offset within a packet and since the program author did
|
||||
``if (r3 + 8 > r1) goto err`` at insn #18, the safe range is [R3, R3 + 8).
|
||||
The verifier only allows 'add'/'sub' operations on packet registers. Any other
|
||||
operation will set the register state to 'SCALAR_VALUE' and it won't be
|
||||
available for direct packet access.
|
||||
|
||||
Operation ``r3 += rX`` may overflow and become less than original skb->data,
|
||||
therefore the verifier has to prevent that. So when it sees ``r3 += rX``
|
||||
instruction and rX is more than 16-bit value, any subsequent bounds-check of r3
|
||||
against skb->data_end will not give us 'range' information, so attempts to read
|
||||
through the pointer will give "invalid access to packet" error.
|
||||
|
||||
Ex. after insn ``r4 = *(u8 *)(r3 +12)`` (insn #7 above) the state of r4 is
|
||||
R4=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) which means that upper 56 bits
|
||||
of the register are guaranteed to be zero, and nothing is known about the lower
|
||||
8 bits. After insn ``r4 *= 14`` the state becomes
|
||||
R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)), since multiplying an 8-bit
|
||||
value by constant 14 will keep upper 52 bits as zero, also the least significant
|
||||
bit will be zero as 14 is even. Similarly ``r2 >>= 48`` will make
|
||||
R2=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)), since the shift is not sign
|
||||
extending. This logic is implemented in adjust_reg_min_max_vals() function,
|
||||
which calls adjust_ptr_min_max_vals() for adding pointer to scalar (or vice
|
||||
versa) and adjust_scalar_min_max_vals() for operations on two scalars.
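The bounds quoted above can be sanity-checked by brute force. The sketch below is not the verifier's tnum arithmetic; it simply enumerates every possible 8-bit input (``possible_bits`` is a hypothetical helper) and confirms that the verifier's reported state is a safe over-approximation of what can actually occur.

```python
MASK64 = (1 << 64) - 1


def possible_bits(values):
    """OR all values together: a bit left clear here is zero in every value."""
    bits = 0
    for v in values:
        bits |= v
    return bits


# r4 = *(u8 *)(...) can be any value in 0..255; then r4 *= 14.
products = [(x * 14) & MASK64 for x in range(256)]
umax = max(products)                # largest reachable value
mul_bits = possible_bits(products)  # bits that can ever be set

# The verifier reports var_off=(0x0; 0xfffe): every reachable bit must lie
# inside that mask, and bit 0 must never be set because 14 is even.
assert mul_bits & ~0xFFFE == 0
assert mul_bits & 1 == 0

# r2 >>= 48 on an arbitrary 64-bit value: a logical shift leaves 16 bits.
shift_umax = MASK64 >> 48

print(umax, hex(mul_bits), hex(shift_umax))
```

The exact set of reachable bits is narrower than the mask the verifier keeps, which is expected: the tracked state only has to cover every possible value, not describe it exactly.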
|
||||
|
||||
The end result is that a BPF program author can access the packet directly
using normal C code as::
|
||||
|
||||
void *data = (void *)(long)skb->data;
|
||||
void *data_end = (void *)(long)skb->data_end;
|
||||
struct eth_hdr *eth = data;
|
||||
struct iphdr *iph = data + sizeof(*eth);
|
||||
struct udphdr *udp = data + sizeof(*eth) + sizeof(*iph);
|
||||
|
||||
if (data + sizeof(*eth) + sizeof(*iph) + sizeof(*udp) > data_end)
|
||||
return 0;
|
||||
if (eth->h_proto != htons(ETH_P_IP))
|
||||
return 0;
|
||||
if (iph->protocol != IPPROTO_UDP || iph->ihl != 5)
|
||||
return 0;
|
||||
if (udp->dest == 53 || udp->source == 9)
|
||||
...;
|
||||
|
||||
which makes such programs easier to write compared to the LD_ABS insn
and significantly faster.
|
||||
|
||||
Pruning
|
||||
=======
|
||||
|
||||
The verifier does not actually walk all possible paths through the program. For
|
||||
each new branch to analyse, the verifier looks at all the states it's previously
|
||||
been in when at this instruction. If any of them contain the current state as a
|
||||
subset, the branch is 'pruned' - that is, the fact that the previous state was
|
||||
accepted implies the current state would be as well. For instance, if in the
|
||||
previous state, r1 held a packet-pointer, and in the current state, r1 holds a
|
||||
packet-pointer with a range as long or longer and at least as strict an
|
||||
alignment, then r1 is safe. Similarly, if r2 was NOT_INIT before then it can't
|
||||
have been used by any path from that point, so any value in r2 (including
|
||||
another NOT_INIT) is safe. The implementation is in the function regsafe().
|
||||
Pruning considers not only the registers but also the stack (and any spilled
|
||||
registers it may hold). They must all be safe for the branch to be pruned.
|
||||
This is implemented in states_equal().
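The subset test can be illustrated with a toy model. The names ``Reg``, ``reg_safe`` and ``can_prune`` are hypothetical, and the sketch deliberately ignores alignment, the stack and every register type other than the two cases named above; it is not a description of the kernel's regsafe() or states_equal().

```python
from dataclasses import dataclass

NOT_INIT = "NOT_INIT"
PKT_PTR = "PKT_PTR"
SCALAR = "SCALAR"


@dataclass
class Reg:
    kind: str
    pkt_range: int = 0  # only meaningful for PKT_PTR


def reg_safe(old: Reg, cur: Reg) -> bool:
    """Is `cur` covered by a previously accepted register state `old`?"""
    if old.kind == NOT_INIT:
        # The old path never read this register, so any current value is fine.
        return True
    if old.kind == PKT_PTR:
        # The current pointer must have at least as much verified range.
        return cur.kind == PKT_PTR and cur.pkt_range >= old.pkt_range
    # Everything else: require an exact match in this simplified model.
    return old == cur


def can_prune(old_state, cur_state) -> bool:
    """Prune the branch only if every register is safe."""
    return all(reg_safe(o, c) for o, c in zip(old_state, cur_state))


old = [Reg(PKT_PTR, 8), Reg(NOT_INIT)]
print(can_prune(old, [Reg(PKT_PTR, 14), Reg(SCALAR)]))  # wider range, r2 unused before
print(can_prune(old, [Reg(PKT_PTR, 4), Reg(NOT_INIT)])) # range shrank
```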
|
||||
|
||||
Understanding eBPF verifier messages
|
||||
====================================
|
||||
|
||||
The following are a few examples of invalid eBPF programs and verifier error
messages as seen in the log:
|
||||
|
||||
Program with unreachable instructions::
|
||||
|
||||
static struct bpf_insn prog[] = {
|
||||
BPF_EXIT_INSN(),
|
||||
BPF_EXIT_INSN(),
|
||||
};
|
||||
|
||||
Error::
|
||||
|
||||
unreachable insn 1
|
||||
|
||||
Program that reads uninitialized register::
|
||||
|
||||
BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
|
||||
BPF_EXIT_INSN(),
|
||||
|
||||
Error::
|
||||
|
||||
0: (bf) r0 = r2
|
||||
R2 !read_ok
|
||||
|
||||
Program that doesn't initialize R0 before exiting::
|
||||
|
||||
BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
|
||||
BPF_EXIT_INSN(),
|
||||
|
||||
Error::
|
||||
|
||||
0: (bf) r2 = r1
|
||||
1: (95) exit
|
||||
R0 !read_ok
|
||||
|
||||
Program that accesses stack out of bounds::
|
||||
|
||||
BPF_ST_MEM(BPF_DW, BPF_REG_10, 8, 0),
|
||||
BPF_EXIT_INSN(),
|
||||
|
||||
Error::
|
||||
|
||||
0: (7a) *(u64 *)(r10 +8) = 0
|
||||
invalid stack off=8 size=8
|
||||
|
||||
Program that doesn't initialize stack before passing its address into function::
|
||||
|
||||
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
|
||||
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
|
||||
BPF_LD_MAP_FD(BPF_REG_1, 0),
|
||||
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
|
||||
BPF_EXIT_INSN(),
|
||||
|
||||
Error::
|
||||
|
||||
0: (bf) r2 = r10
|
||||
1: (07) r2 += -8
|
||||
2: (b7) r1 = 0x0
|
||||
3: (85) call 1
|
||||
invalid indirect read from stack off -8+0 size 8
|
||||
|
||||
Program that uses invalid map_fd=0 while calling to map_lookup_elem() function::
|
||||
|
||||
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
|
||||
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
|
||||
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
|
||||
BPF_LD_MAP_FD(BPF_REG_1, 0),
|
||||
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
|
||||
BPF_EXIT_INSN(),
|
||||
|
||||
Error::
|
||||
|
||||
0: (7a) *(u64 *)(r10 -8) = 0
|
||||
1: (bf) r2 = r10
|
||||
2: (07) r2 += -8
|
||||
3: (b7) r1 = 0x0
|
||||
4: (85) call 1
|
||||
fd 0 is not pointing to valid bpf_map
|
||||
|
||||
Program that doesn't check return value of map_lookup_elem() before accessing
|
||||
map element::
|
||||
|
||||
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
|
||||
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
|
||||
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
|
||||
BPF_LD_MAP_FD(BPF_REG_1, 0),
|
||||
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
|
||||
BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
|
||||
BPF_EXIT_INSN(),
|
||||
|
||||
Error::
|
||||
|
||||
0: (7a) *(u64 *)(r10 -8) = 0
|
||||
1: (bf) r2 = r10
|
||||
2: (07) r2 += -8
|
||||
3: (b7) r1 = 0x0
|
||||
4: (85) call 1
|
||||
5: (7a) *(u64 *)(r0 +0) = 0
|
||||
R0 invalid mem access 'map_value_or_null'
|
||||
|
||||
Program that correctly checks map_lookup_elem() returned value for NULL, but
|
||||
accesses the memory with incorrect alignment::
|
||||
|
||||
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
|
||||
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
|
||||
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
|
||||
BPF_LD_MAP_FD(BPF_REG_1, 0),
|
||||
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
|
||||
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
|
||||
BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0),
|
||||
BPF_EXIT_INSN(),
|
||||
|
||||
Error::
|
||||
|
||||
0: (7a) *(u64 *)(r10 -8) = 0
|
||||
1: (bf) r2 = r10
|
||||
2: (07) r2 += -8
|
||||
3: (b7) r1 = 1
|
||||
4: (85) call 1
|
||||
5: (15) if r0 == 0x0 goto pc+1
|
||||
R0=map_ptr R10=fp
|
||||
6: (7a) *(u64 *)(r0 +4) = 0
|
||||
misaligned access off 4 size 8
|
||||
|
||||
Program that correctly checks map_lookup_elem() returned value for NULL and
|
||||
accesses memory with correct alignment in one side of 'if' branch, but fails
|
||||
to do so in the other side of 'if' branch::
|
||||
|
||||
BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
|
||||
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
|
||||
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
|
||||
BPF_LD_MAP_FD(BPF_REG_1, 0),
|
||||
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
|
||||
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
|
||||
BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
|
||||
BPF_EXIT_INSN(),
|
||||
BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1),
|
||||
BPF_EXIT_INSN(),
|
||||
|
||||
Error::
|
||||
|
||||
0: (7a) *(u64 *)(r10 -8) = 0
|
||||
1: (bf) r2 = r10
|
||||
2: (07) r2 += -8
|
||||
3: (b7) r1 = 1
|
||||
4: (85) call 1
|
||||
5: (15) if r0 == 0x0 goto pc+2
|
||||
R0=map_ptr R10=fp
|
||||
6: (7a) *(u64 *)(r0 +0) = 0
|
||||
7: (95) exit
|
||||
|
||||
from 5 to 8: R0=imm0 R10=fp
|
||||
8: (7a) *(u64 *)(r0 +0) = 1
|
||||
R0 invalid mem access 'imm'
|
||||
|
||||
Program that performs a socket lookup then sets the pointer to NULL without
|
||||
checking it::
|
||||
|
||||
BPF_MOV64_IMM(BPF_REG_2, 0),
|
||||
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
|
||||
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
|
||||
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
|
||||
BPF_MOV64_IMM(BPF_REG_3, 4),
|
||||
BPF_MOV64_IMM(BPF_REG_4, 0),
|
||||
BPF_MOV64_IMM(BPF_REG_5, 0),
|
||||
BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
|
||||
BPF_MOV64_IMM(BPF_REG_0, 0),
|
||||
BPF_EXIT_INSN(),
|
||||
|
||||
Error::
|
||||
|
||||
0: (b7) r2 = 0
|
||||
1: (63) *(u32 *)(r10 -8) = r2
|
||||
2: (bf) r2 = r10
|
||||
3: (07) r2 += -8
|
||||
4: (b7) r3 = 4
|
||||
5: (b7) r4 = 0
|
||||
6: (b7) r5 = 0
|
||||
7: (85) call bpf_sk_lookup_tcp#65
|
||||
8: (b7) r0 = 0
|
||||
9: (95) exit
|
||||
Unreleased reference id=1, alloc_insn=7
|
||||
|
||||
Program that performs a socket lookup but does not NULL-check the returned
|
||||
value::
|
||||
|
||||
BPF_MOV64_IMM(BPF_REG_2, 0),
|
||||
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
|
||||
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
|
||||
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
|
||||
BPF_MOV64_IMM(BPF_REG_3, 4),
|
||||
BPF_MOV64_IMM(BPF_REG_4, 0),
|
||||
BPF_MOV64_IMM(BPF_REG_5, 0),
|
||||
BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
|
||||
BPF_EXIT_INSN(),
|
||||
|
||||
Error::
|
||||
|
||||
0: (b7) r2 = 0
|
||||
1: (63) *(u32 *)(r10 -8) = r2
|
||||
2: (bf) r2 = r10
|
||||
3: (07) r2 += -8
|
||||
4: (b7) r3 = 4
|
||||
5: (b7) r4 = 0
|
||||
6: (b7) r5 = 0
|
||||
7: (85) call bpf_sk_lookup_tcp#65
|
||||
8: (95) exit
|
||||
Unreleased reference id=1, alloc_insn=7
|
@ -208,16 +208,86 @@ highlight_language = 'none'
|
||||
# The theme to use for HTML and HTML Help pages. See the documentation for
|
||||
# a list of builtin themes.
|
||||
|
||||
# The Read the Docs theme is available from
|
||||
# - https://github.com/snide/sphinx_rtd_theme
|
||||
# - https://pypi.python.org/pypi/sphinx_rtd_theme
|
||||
# - python-sphinx-rtd-theme package (on Debian)
|
||||
try:
|
||||
import sphinx_rtd_theme
|
||||
html_theme = 'sphinx_rtd_theme'
|
||||
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
|
||||
except ImportError:
|
||||
sys.stderr.write('Warning: The Sphinx \'sphinx_rtd_theme\' HTML theme was not found. Make sure you have the theme installed to produce pretty HTML output. Falling back to the default theme.\n')
|
||||
# Default theme
|
||||
html_theme = 'sphinx_rtd_theme'
|
||||
html_css_files = []
|
||||
|
||||
if "DOCS_THEME" in os.environ:
|
||||
html_theme = os.environ["DOCS_THEME"]
|
||||
|
||||
if html_theme == 'sphinx_rtd_theme' or html_theme == 'sphinx_rtd_dark_mode':
|
||||
# Read the Docs theme
|
||||
try:
|
||||
import sphinx_rtd_theme
|
||||
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
|
||||
|
||||
# Add any paths that contain custom static files (such as style sheets) here,
|
||||
# relative to this directory. They are copied after the builtin static files,
|
||||
# so a file named "default.css" will overwrite the builtin "default.css".
|
||||
html_css_files = [
|
||||
'theme_overrides.css',
|
||||
]
|
||||
|
||||
# Read the Docs dark mode override theme
|
||||
if html_theme == 'sphinx_rtd_dark_mode':
|
||||
try:
|
||||
import sphinx_rtd_dark_mode
|
||||
extensions.append('sphinx_rtd_dark_mode')
|
||||
except ImportError:
|
||||
html_theme = 'sphinx_rtd_theme'
|
||||
|
||||
if html_theme == 'sphinx_rtd_theme':
|
||||
# Add color-specific RTD normal mode
|
||||
html_css_files.append('theme_rtd_colors.css')
|
||||
|
||||
except ImportError:
|
||||
html_theme = 'classic'
|
||||
|
||||
if "DOCS_CSS" in os.environ:
|
||||
css = os.environ["DOCS_CSS"].split(" ")
|
||||
|
||||
for l in css:
|
||||
html_css_files.append(l)
|
||||
|
||||
if major <= 1 and minor < 8:
|
||||
html_context = {
|
||||
'css_files': [],
|
||||
}
|
||||
|
||||
for l in html_css_files:
|
||||
html_context['css_files'].append('_static/' + l)
|
||||
|
||||
if html_theme == 'classic':
|
||||
html_theme_options = {
|
||||
'rightsidebar': False,
|
||||
'stickysidebar': True,
|
||||
'collapsiblesidebar': True,
|
||||
'externalrefs': False,
|
||||
|
||||
'footerbgcolor': "white",
|
||||
'footertextcolor': "white",
|
||||
'sidebarbgcolor': "white",
|
||||
'sidebarbtncolor': "black",
|
||||
'sidebartextcolor': "black",
|
||||
'sidebarlinkcolor': "#686bff",
|
||||
'relbarbgcolor': "#133f52",
|
||||
'relbartextcolor': "white",
|
||||
'relbarlinkcolor': "white",
|
||||
'bgcolor': "white",
|
||||
'textcolor': "black",
|
||||
'headbgcolor': "#f2f2f2",
|
||||
'headtextcolor': "#20435c",
|
||||
'headlinkcolor': "#c60f0f",
|
||||
'linkcolor': "#355f7c",
|
||||
'visitedlinkcolor': "#355f7c",
|
||||
'codebgcolor': "#3f3f3f",
|
||||
'codetextcolor': "white",
|
||||
|
||||
'bodyfont': "serif",
|
||||
'headfont': "sans-serif",
|
||||
}
|
||||
|
||||
sys.stderr.write("Using %s theme\n" % html_theme)
|
||||
|
||||
# Theme options are theme-specific and customize the look and feel of a theme
|
||||
# further. For a list of options available for each theme, see the
|
||||
@ -246,20 +316,8 @@ except ImportError:
|
||||
# Add any paths that contain custom static files (such as style sheets) here,
|
||||
# relative to this directory. They are copied after the builtin static files,
|
||||
# so a file named "default.css" will overwrite the builtin "default.css".
|
||||
|
||||
html_static_path = ['sphinx-static']
|
||||
|
||||
html_css_files = [
|
||||
'theme_overrides.css',
|
||||
]
|
||||
|
||||
if major <= 1 and minor < 8:
|
||||
html_context = {
|
||||
'css_files': [
|
||||
'_static/theme_overrides.css',
|
||||
],
|
||||
}
|
||||
|
||||
# Add any extra paths that contain custom files (such as robots.txt or
|
||||
# .htaccess) here, relative to this directory. These files are copied
|
||||
# directly to the root of the documentation.
|
||||
|
@ -279,6 +279,7 @@ Accounting Framework
|
||||
Block Devices
|
||||
=============
|
||||
|
||||
.. kernel-doc:: include/linux/bio.h
|
||||
.. kernel-doc:: block/blk-core.c
|
||||
:export:
|
||||
|
||||
@ -294,9 +295,6 @@ Block Devices
|
||||
.. kernel-doc:: block/blk-settings.c
|
||||
:export:
|
||||
|
||||
.. kernel-doc:: block/blk-exec.c
|
||||
:export:
|
||||
|
||||
.. kernel-doc:: block/blk-flush.c
|
||||
:export:
|
||||
|
||||
|
@ -118,7 +118,7 @@ Initialization of kobjects
|
||||
Code which creates a kobject must, of course, initialize that object. Some
|
||||
of the internal fields are setup with a (mandatory) call to kobject_init()::
|
||||
|
||||
void kobject_init(struct kobject *kobj, struct kobj_type *ktype);
|
||||
void kobject_init(struct kobject *kobj, const struct kobj_type *ktype);
|
||||
|
||||
The ktype is required for a kobject to be created properly, as every kobject
|
||||
must have an associated kobj_type. After calling kobject_init(), to
|
||||
@ -156,7 +156,7 @@ kobject_name()::
|
||||
There is a helper function to both initialize and add the kobject to the
|
||||
kernel at the same time, called surprisingly enough kobject_init_and_add()::
|
||||
|
||||
int kobject_init_and_add(struct kobject *kobj, struct kobj_type *ktype,
|
||||
int kobject_init_and_add(struct kobject *kobj, const struct kobj_type *ktype,
|
||||
struct kobject *parent, const char *fmt, ...);
|
||||
|
||||
The arguments are the same as the individual kobject_init() and
|
||||
@ -299,7 +299,6 @@ kobj_type::
|
||||
struct kobj_type {
|
||||
void (*release)(struct kobject *kobj);
|
||||
const struct sysfs_ops *sysfs_ops;
|
||||
struct attribute **default_attrs;
|
||||
const struct attribute_group **default_groups;
|
||||
const struct kobj_ns_type_operations *(*child_ns_type)(struct kobject *kobj);
|
||||
const void *(*namespace)(struct kobject *kobj);
|
||||
@ -313,10 +312,10 @@ call kobject_init() or kobject_init_and_add().
|
||||
|
||||
The release field in struct kobj_type is, of course, a pointer to the
|
||||
release() method for this type of kobject. The other two fields (sysfs_ops
|
||||
and default_attrs) control how objects of this type are represented in
|
||||
and default_groups) control how objects of this type are represented in
|
||||
sysfs; they are beyond the scope of this document.
|
||||
|
||||
The default_attrs pointer is a list of default attributes that will be
|
||||
The default_groups pointer is a list of default attributes that will be
|
||||
automatically created for any kobject that is registered with this ktype.
|
||||
|
||||
|
||||
@ -373,10 +372,9 @@ If a kset wishes to control the uevent operations of the kobjects
|
||||
associated with it, it can use the struct kset_uevent_ops to handle it::
|
||||
|
||||
struct kset_uevent_ops {
|
||||
int (* const filter)(struct kset *kset, struct kobject *kobj);
|
||||
const char *(* const name)(struct kset *kset, struct kobject *kobj);
|
||||
int (* const uevent)(struct kset *kset, struct kobject *kobj,
|
||||
struct kobj_uevent_env *env);
|
||||
int (* const filter)(struct kobject *kobj);
|
||||
const char *(* const name)(struct kobject *kobj);
|
||||
int (* const uevent)(struct kobject *kobj, struct kobj_uevent_env *env);
|
||||
};
|
||||
|
||||
|
||||
|
@ -32,6 +32,7 @@ Documentation/dev-tools/testing-overview.rst
|
||||
kgdb
|
||||
kselftest
|
||||
kunit/index
|
||||
ktap
|
||||
|
||||
|
||||
.. only:: subproject and html
|
||||
|
@ -204,17 +204,17 @@ Ultimately this allows to determine the possible executions of concurrent code,
|
||||
and if that code is free from data races.
|
||||
|
||||
KCSAN is aware of *marked atomic operations* (``READ_ONCE``, ``WRITE_ONCE``,
|
||||
``atomic_*``, etc.), but is oblivious of any ordering guarantees and simply
|
||||
assumes that memory barriers are placed correctly. In other words, KCSAN
|
||||
assumes that as long as a plain access is not observed to race with another
|
||||
conflicting access, memory operations are correctly ordered.
|
||||
``atomic_*``, etc.), and a subset of ordering guarantees implied by memory
|
||||
barriers. With ``CONFIG_KCSAN_WEAK_MEMORY=y``, KCSAN models load or store
|
||||
buffering, and can detect missing ``smp_mb()``, ``smp_wmb()``, ``smp_rmb()``,
|
||||
``smp_store_release()``, and all ``atomic_*`` operations with equivalent
|
||||
implied barriers.
|
||||
|
||||
This means that KCSAN will not report *potential* data races due to missing
|
||||
memory ordering. Developers should therefore carefully consider the required
|
||||
memory ordering requirements that remain unchecked. If, however, missing
|
||||
memory ordering (that is observable with a particular compiler and
|
||||
architecture) leads to an observable data race (e.g. entering a critical
|
||||
section erroneously), KCSAN would report the resulting data race.
|
||||
Note, KCSAN will not report all data races due to missing memory ordering,
|
||||
specifically where a memory barrier would be required to prohibit subsequent
|
||||
memory operation from reordering before the barrier. Developers should
|
||||
therefore carefully consider the required memory ordering requirements that
|
||||
remain unchecked.
|
||||
|
||||
Race Detection Beyond Data Races
|
||||
--------------------------------
|
||||
@ -268,6 +268,56 @@ marked operations, if all accesses to a variable that is accessed concurrently
|
||||
are properly marked, KCSAN will never trigger a watchpoint and therefore never
|
||||
report the accesses.
|
||||
|
||||
Modeling Weak Memory
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
KCSAN's approach to detecting data races due to missing memory barriers is
|
||||
based on modeling access reordering (with ``CONFIG_KCSAN_WEAK_MEMORY=y``).
|
||||
Each plain memory access for which a watchpoint is set up, is also selected for
|
||||
simulated reordering within the scope of its function (at most 1 in-flight
|
||||
access).
|
||||
|
||||
Once an access has been selected for reordering, it is checked along every
|
||||
other access until the end of the function scope. If an appropriate memory
|
||||
barrier is encountered, the access will no longer be considered for simulated
|
||||
reordering.
|
||||
|
||||
When the result of a memory operation should be ordered by a barrier, KCSAN can
|
||||
then detect data races where the conflict only occurs as a result of a missing
|
||||
barrier. Consider the example::
|
||||
|
||||
int x, flag;
|
||||
void T1(void)
|
||||
{
|
||||
x = 1; // data race!
|
||||
WRITE_ONCE(flag, 1); // correct: smp_store_release(&flag, 1)
|
||||
}
|
||||
void T2(void)
|
||||
{
|
||||
while (!READ_ONCE(flag)); // correct: smp_load_acquire(&flag)
|
||||
... = x; // data race!
|
||||
}
|
||||
|
||||
When weak memory modeling is enabled, KCSAN can consider ``x`` in ``T1`` for
|
||||
simulated reordering. After the write of ``flag``, ``x`` is again checked for
|
||||
concurrent accesses: because ``T2`` is able to proceed after the write of
|
||||
``flag``, a data race is detected. With the correct barriers in place, ``x``
|
||||
would not be considered for reordering after the proper release of ``flag``,
|
||||
and no data race would be detected.
|
||||
|
||||
Deliberate trade-offs in complexity but also practical limitations mean only a
|
||||
subset of data races due to missing memory barriers can be detected. With
|
||||
currently available compiler support, the implementation is limited to modeling
|
||||
the effects of "buffering" (delaying accesses), since the runtime cannot
|
||||
"prefetch" accesses. Also recall that watchpoints are only set up for plain
|
||||
accesses, which are also the only access type for which KCSAN simulates reordering. This
|
||||
means reordering of marked accesses is not modeled.
|
||||
|
||||
A consequence of the above is that acquire operations do not require barrier
|
||||
instrumentation (no prefetching). Furthermore, marked accesses introducing
|
||||
address or control dependencies do not require special handling (the marked
|
||||
access cannot be reordered, later dependent accesses cannot be prefetched).
|
||||
|
||||
Key Properties
|
||||
~~~~~~~~~~~~~~
|
||||
|
||||
@ -290,8 +340,8 @@ Key Properties
|
||||
4. **Detects Racy Writes from Devices:** Due to checking data values upon
|
||||
setting up watchpoints, racy writes from devices can also be detected.
|
||||
|
||||
5. **Memory Ordering:** KCSAN is *not* explicitly aware of the LKMM's ordering
|
||||
rules; this may result in missed data races (false negatives).
|
||||
5. **Memory Ordering:** KCSAN is aware of only a subset of LKMM ordering rules;
|
||||
this may result in missed data races (false negatives).
|
||||
|
||||
6. **Analysis Accuracy:** For observed executions, due to using a sampling
|
||||
strategy, the analysis is *unsound* (false negatives possible), but aims to
|
||||
|
@ -402,7 +402,7 @@ This is a quick example of how to use kdb.
|
||||
2. Enter the kernel debugger manually or by waiting for an oops or
|
||||
fault. There are several ways you can enter the kernel debugger
|
||||
manually; all involve using the :kbd:`SysRq-G`, which means you must have
|
||||
enabled ``CONFIG_MAGIC_SysRq=y`` in your kernel config.
|
||||
enabled ``CONFIG_MAGIC_SYSRQ=y`` in your kernel config.
|
||||
|
||||
- When logged in as root or with a super user session you can run::
|
||||
|
||||
@ -461,7 +461,7 @@ This is a quick example of how to use kdb with a keyboard.
|
||||
2. Enter the kernel debugger manually or by waiting for an oops or
|
||||
fault. There are several ways you can enter the kernel debugger
|
||||
manually; all involve using the :kbd:`SysRq-G`, which means you must have
|
||||
enabled ``CONFIG_MAGIC_SysRq=y`` in your kernel config.
|
||||
enabled ``CONFIG_MAGIC_SYSRQ=y`` in your kernel config.
|
||||
|
||||
- When logged in as root or with a super user session you can run::
|
||||
|
||||
@ -557,7 +557,7 @@ Connecting with gdb to a serial port
|
||||
Example (using a directly connected port)::
|
||||
|
||||
% gdb ./vmlinux
|
||||
(gdb) set remotebaud 115200
|
||||
(gdb) set serial baud 115200
|
||||
(gdb) target remote /dev/ttyS0
|
||||
|
||||
|
||||
|
298
Documentation/dev-tools/ktap.rst
Normal file
@ -0,0 +1,298 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
========================================
|
||||
The Kernel Test Anything Protocol (KTAP)
|
||||
========================================
|
||||
|
||||
TAP, or the Test Anything Protocol, is a format for specifying test results used
by a number of projects. Its website and specification are found at this `link
<https://testanything.org/>`_. The Linux Kernel largely uses TAP output for test
results. However, Kernel testing frameworks have special needs for test results
which don't align with the original TAP specification. Thus, a "Kernel TAP"
(KTAP) format is specified to extend and alter TAP to support these use-cases.
This specification describes the generally accepted format of KTAP as it is
currently used in the kernel.
|
||||
|
||||
KTAP test results describe a series of tests (which may be nested: i.e., a test
can have subtests), each of which can contain both diagnostic data -- e.g., log
lines -- and a final result. The test structure and results are
machine-readable, whereas the diagnostic data is unstructured and is there to
aid human debugging.
|
||||
|
||||
KTAP output is built from four different types of lines:
|
||||
- Version lines
|
||||
- Plan lines
|
||||
- Test case result lines
|
||||
- Diagnostic lines
|
||||
|
||||
In general, valid KTAP output should also form valid TAP output, but some
information, in particular nested test results, may be lost. Also note that
there is a stagnant draft specification for TAP14; KTAP diverges from this in
a couple of places (notably the "Subtest" header), which are described where
relevant later in this document.
|
||||
|
||||
Version lines
|
||||
-------------
|
||||
|
||||
All KTAP-formatted results begin with a "version line" which specifies which
|
||||
version of the (K)TAP standard the result is compliant with.
|
||||
|
||||
For example:
|
||||
- "KTAP version 1"
|
||||
- "TAP version 13"
|
||||
- "TAP version 14"
|
||||
|
||||
Note that, in KTAP, subtests also begin with a version line, which denotes the
|
||||
start of the nested test results. This differs from TAP14, which uses a
|
||||
separate "Subtest" line.
|
||||
|
||||
While, going forward, "KTAP version 1" should be used by compliant tests, it
|
||||
is expected that most parsers and other tooling will accept the other versions
|
||||
listed here for compatibility with existing tests and frameworks.
|
||||
|
||||
Plan lines
|
||||
----------
|
||||
|
||||
A test plan provides the number of tests (or subtests) in the KTAP output.
|
||||
|
||||
Plan lines must follow the format of "1..N" where N is the number of tests or subtests.
|
||||
Plan lines follow version lines to indicate the number of nested tests.
|
||||
|
||||
While there are cases where the number of tests is not known in advance -- in
|
||||
which case the test plan may be omitted -- it is strongly recommended one is
|
||||
present where possible.
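As a rough illustration of the plan-line rule, the sketch below parses a ``1..N`` line and compares N with the number of result lines that follow. It assumes a flat, non-nested block of output, and ``parse_plan`` and ``plan_matches`` are hypothetical helper names rather than part of any kernel tool.

```python
import re

PLAN_RE = re.compile(r"^1\.\.(\d+)$")


def parse_plan(line):
    """Return N for a plan line of the form "1..N", otherwise None."""
    m = PLAN_RE.match(line.strip())
    return int(m.group(1)) if m else None


def plan_matches(lines):
    """Check a flat KTAP block: the planned count must equal the result count.

    A missing plan is tolerated, because the plan may be omitted when the
    number of tests is not known in advance.
    """
    planned = None
    seen = 0
    for line in lines:
        n = parse_plan(line)
        if n is not None and planned is None:
            planned = n
        elif line.startswith(("ok ", "not ok ")):
            seen += 1
    return planned is None or planned == seen


print(plan_matches(["KTAP version 1", "1..2", "ok 1 a", "not ok 2 b"]))
```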
|
||||
|
||||
Test case result lines
|
||||
----------------------
|
||||
|
||||
Test case result lines indicate the final status of a test.
|
||||
They are required and must have the format:
|
||||
|
||||
.. code-block::
|
||||
|
||||
<result> <number> [<description>][ # [<directive>] [<diagnostic data>]]
|
||||
|
||||
The result can be either "ok", which indicates the test case passed,
|
||||
or "not ok", which indicates that the test case failed.
|
||||
|
||||
<number> represents the number of the test being performed. The first test must
|
||||
have the number 1 and the number then must increase by 1 for each additional
|
||||
subtest within the same test at the same nesting level.
|
||||
|
||||
The description is a description of the test, generally the name of
|
||||
the test, and can be any string of words (can't include #). The
|
||||
description is optional, but recommended.
|
||||
|
||||
The directive and any diagnostic data are optional. If either is present, it
must follow a hash sign, "#".
|
||||
|
||||
A directive is a keyword that indicates a different outcome for a test other
|
||||
than passed and failed. The directive is optional, and consists of a single
|
||||
keyword preceding the diagnostic data. In the event that a parser encounters
|
||||
a directive it doesn't support, it should fall back to the "ok" / "not ok"
|
||||
result.
|
||||
|
||||
Currently accepted directives are:
|
||||
|
||||
- "SKIP", which indicates a test was skipped (note the result of the test case
  result line can be either "ok" or "not ok" if the SKIP directive is used)
- "TODO", which indicates that a test is not expected to pass at the moment,
  e.g. because the feature it is testing is known to be broken. While this
  directive is inherited from TAP, its use in the kernel is discouraged.
- "XFAIL", which indicates that a test is expected to fail. This is similar
  to "TODO", above, and is used by some kselftest tests.
- "TIMEOUT", which indicates a test has timed out (note the result of the test
  case result line should be "not ok" if the TIMEOUT directive is used)
- "ERROR", which indicates that the execution of a test has failed due to a
  specific error that is included in the diagnostic data. (note the result of
  the test case result line should be "not ok" if the ERROR directive is used)
|
||||
|
||||
The diagnostic data is a plain-text field which contains any additional details
|
||||
about why this result was produced. This is typically an error message for ERROR
|
||||
or failed tests, or a description of missing dependencies for a SKIP result.
|
||||
|
||||
The diagnostic data field is optional, and results which have neither a
|
||||
directive nor any diagnostic data do not need to include the "#" field
|
||||
separator.
|
||||
|
||||
Example result lines include:
|
||||
|
||||
.. code-block::
|
||||
|
||||
ok 1 test_case_name
|
||||
|
||||
The test "test_case_name" passed.
|
||||
|
||||
.. code-block::
|
||||
|
||||
not ok 1 test_case_name
|
||||
|
||||
The test "test_case_name" failed.
|
||||
|
||||
.. code-block::
|
||||
|
||||
ok 1 test # SKIP necessary dependency unavailable
|
||||
|
||||
The test "test" was SKIPPED with the diagnostic message "necessary dependency
|
||||
unavailable".
|
||||
|
||||
.. code-block::
|
||||
|
||||
not ok 1 test # TIMEOUT 30 seconds
|
||||
|
||||
The test "test" timed out, with diagnostic data "30 seconds".
|
||||
|
||||
.. code-block::
|
||||
|
||||
ok 5 check return code # rcode=0
|
||||
|
||||
The test "check return code" passed, with additional diagnostic data "rcode=0".
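The result-line format above maps naturally onto a regular expression. The following is a minimal sketch, not the parser used by any kernel tool: ``parse_result`` is a hypothetical helper, and the directive set is simply the list of currently accepted directives given earlier in this section.

```python
import re

DIRECTIVES = {"SKIP", "TODO", "XFAIL", "TIMEOUT", "ERROR"}

# <result> <number> [<description>][ # [<directive>] [<diagnostic data>]]
RESULT_RE = re.compile(r"^(ok|not ok) (\d+)(?: ([^#]*?))?\s*(?:# ?(.*))?$")


def parse_result(line):
    """Split a test case result line into its fields, or return None."""
    m = RESULT_RE.match(line.strip())
    if not m:
        return None
    status, number, description, tail = m.groups()
    directive = None
    diagnostic = (tail or "").strip()
    first, _, rest = diagnostic.partition(" ")
    if first in DIRECTIVES:
        # A directive is a single keyword preceding the diagnostic data.
        directive, diagnostic = first, rest.strip()
    return {
        "passed": status == "ok",
        "number": int(number),
        "description": description or None,
        "directive": directive,
        "diagnostic": diagnostic or None,
    }


print(parse_result("ok 1 test # SKIP necessary dependency unavailable"))
print(parse_result("ok 5 check return code # rcode=0"))
```

An unrecognised first word after the "#" is treated as plain diagnostic data, so the caller falls back to the "ok" / "not ok" result, as the text recommends for unsupported directives.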
|
||||
|
||||
|
||||
Diagnostic lines
|
||||
----------------
|
||||
|
||||
If tests wish to output any further information, they should do so using
|
||||
"diagnostic lines". Diagnostic lines are optional, freeform text, and are
|
||||
often used to describe what is being tested and any intermediate results in
|
||||
more detail than the final result and diagnostic data line provides.
|
||||
|
||||
Diagnostic lines are formatted as "# <diagnostic_description>", where the
|
||||
description can be any string. Diagnostic lines can be anywhere in the test
|
||||
output. As a rule, diagnostic lines regarding a test are directly before the
|
||||
test result line for that test.
|
||||
|
||||
Note that most tools will treat unknown lines (see below) as diagnostic lines,
|
||||
even if they do not start with a "#": this is to capture any other useful
|
||||
kernel output which may help debug the test. It is nevertheless recommended
|
||||
that tests always prefix any diagnostic output they have with a "#" character.
|
||||
|
||||
Unknown lines
|
||||
-------------
|
||||
|
||||
There may be lines within KTAP output that do not match any of the four line
formats described above. Such lines are allowed; however, they will
not influence the status of the tests.
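A parser therefore has to sort each line into one of five buckets: the four known line types plus "unknown". The sketch below shows one way to do that; ``classify`` and the regular expressions are illustrative assumptions, not taken from an existing tool.

```python
import re

VERSION_RE = re.compile(r"^(KTAP|TAP) version \d+$")
PLAN_RE = re.compile(r"^1\.\.\d+$")
RESULT_RE = re.compile(r"^(ok|not ok) \d+( |$)")


def classify(line):
    """Return the KTAP line type, ignoring leading indentation."""
    text = line.strip()
    if VERSION_RE.match(text):
        return "version"
    if PLAN_RE.match(text):
        return "plan"
    if RESULT_RE.match(text):
        return "result"
    if text.startswith("#"):
        return "diagnostic"
    # Anything else (e.g. stray kernel log output) does not affect test status.
    return "unknown"


for sample in ["KTAP version 1", "1..2", "ok 1 test_1",
               "# test_2: FAIL", "[ 1.234] some kernel message"]:
    print(classify(sample), "<-", sample)
```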
|
||||
|
||||
Nested tests
|
||||
------------
|
||||
|
||||
In KTAP, tests can be nested. This is done by having a test include within its
|
||||
output an entire set of KTAP-formatted results. This can be used to categorize
|
||||
and group related tests, or to split out different results from the same test.
|
||||
|
||||
The "parent" test's result should consist of all of its subtests' results,
starting with another KTAP version line and test plan, and end with the overall
result. If one of the subtests fails, for example, the parent test should also
fail.
|
||||
|
||||
Additionally, all result lines in a subtest should be indented. One level of
indentation is two spaces: "  ". The indentation should begin at the version
line and should end before the parent test's result line.
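Because one level of nesting is exactly two spaces, the depth of a line can be recovered from its leading whitespace. The helper below, ``nesting_depth``, is a hypothetical name used only for this sketch, and it assumes the output is indented with spaces rather than tabs.

```python
def nesting_depth(line, indent="  "):
    """Return how many levels deep a KTAP line is nested (0 = top level)."""
    stripped = line.lstrip(" ")
    leading_spaces = len(line) - len(stripped)
    return leading_spaces // len(indent)


for sample in ["KTAP version 1", "  1..2", "    ok 1 test_1"]:
    print(nesting_depth(sample), repr(sample))
```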
|
||||
|
||||
An example of a test with two nested subtests:
|
||||
|
||||
.. code-block::
|
||||
|
||||
	KTAP version 1
	1..1
	  KTAP version 1
	  1..2
	  ok 1 test_1
	  not ok 2 test_2
	# example failed
	not ok 1 example
|
||||
|
||||
An example format with multiple levels of nested testing:
|
||||
|
||||
.. code-block::
|
||||
|
||||
	KTAP version 1
	1..2
	  KTAP version 1
	  1..2
	    KTAP version 1
	    1..2
	    not ok 1 test_1
	    ok 2 test_2
	  not ok 1 test_3
	  ok 2 test_4 # SKIP
	not ok 1 example_test_1
	ok 2 example_test_2
|
||||
|
||||
|
||||
Major differences between TAP and KTAP
|
||||
--------------------------------------
|
||||
|
||||
Note the major differences between the TAP and KTAP specification:
|
||||
- yaml and json are not recommended in diagnostic messages
|
||||
- TODO directive not recognized
|
||||
- KTAP allows for an arbitrary number of tests to be nested
|
||||
|
||||
The TAP14 specification does permit nested tests, but instead of using another
|
||||
nested version line, uses a line of the form
|
||||
"Subtest: <name>" where <name> is the name of the parent test.
|
||||
|
||||
Example KTAP output
|
||||
--------------------
|
||||
.. code-block::

	KTAP version 1
	1..1
	  KTAP version 1
	  1..3
	    KTAP version 1
	    1..1
	    # test_1: initializing test_1
	    ok 1 test_1
	  ok 1 example_test_1
	    KTAP version 1
	    1..2
	    ok 1 test_1 # SKIP test_1 skipped
	    ok 2 test_2
	  ok 2 example_test_2
	    KTAP version 1
	    1..3
	    ok 1 test_1
	    # test_2: FAIL
	    not ok 2 test_2
	    ok 3 test_3 # SKIP test_3 skipped
	  not ok 3 example_test_3
	not ok 1 main_test
|
||||
|
||||
This output defines the following hierarchy:
|
||||
|
||||
A single test called "main_test", which fails, and has three subtests:

- "example_test_1", which passes, and has one subtest:

   - "test_1", which passes, and outputs the diagnostic message "test_1: initializing test_1"

- "example_test_2", which passes, and has two subtests:

   - "test_1", which is skipped, with the explanation "test_1 skipped"
   - "test_2", which passes

- "example_test_3", which fails, and has three subtests:

   - "test_1", which passes
   - "test_2", which outputs the diagnostic line "test_2: FAIL", and fails.
   - "test_3", which is skipped with the explanation "test_3 skipped"
|
||||
|
||||
Note that the individual subtests with the same names do not conflict, as they
|
||||
are found in different parent tests. This output also exhibits some sensible
|
||||
rules for "bubbling up" test results: a test fails if any of its subtests fail.
|
||||
Skipped tests do not affect the result of the parent test (though it often
|
||||
makes sense for a test to be marked skipped if _all_ of its subtests have been
|
||||
skipped).
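
The "bubbling up" rules described above can be written down as a short Python sketch. The ``Node`` class and the ``overall`` function are hypothetical names used only for illustration. Marking a parent as skipped when all of its subtests are skipped is the optional convention mentioned above, not a requirement of the format.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A test in the hierarchy; leaf tests carry their own status."""
    name: str
    status: str = "pass"  # 'pass', 'fail' or 'skip'
    subtests: list = field(default_factory=list)


def overall(test: Node) -> str:
    """Compute a test's result from its subtests, bottom-up."""
    if not test.subtests:
        return test.status
    results = [overall(sub) for sub in test.subtests]
    if "fail" in results:
        return "fail"  # a test fails if any of its subtests fail
    if all(result == "skip" for result in results):
        return "skip"  # optional convention: all subtests skipped
    return "pass"      # skipped subtests do not affect the parent
```

Applied to the hierarchy above, "example_test_2" passes despite its skipped subtest, while "example_test_3" and therefore "main_test" fail.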
|
||||
|
||||
See also:
|
||||
---------
|
||||
|
||||
- The TAP specification:
|
||||
https://testanything.org/tap-version-13-specification.html
|
||||
- The (stagnant) TAP version 14 specification:
|
||||
https://github.com/TestAnything/Specification/blob/tap-14-specification/specification.md
|
||||
- The kselftest documentation:
|
||||
Documentation/dev-tools/kselftest.rst
|
||||
- The KUnit documentation:
|
||||
Documentation/dev-tools/kunit/index.rst
|
@ -12,5 +12,4 @@ following sections:
|
||||
|
||||
Documentation/dev-tools/kunit/api/test.rst
|
||||
|
||||
- documents all of the standard testing API excluding mocking
|
||||
or mocking related features.
|
||||
- documents all of the standard testing API
|
||||
|
@ -4,8 +4,7 @@
|
||||
Test API
|
||||
========
|
||||
|
||||
This file documents all of the standard testing API excluding mocking or mocking
|
||||
related features.
|
||||
This file documents all of the standard testing API.
|
||||
|
||||
.. kernel-doc:: include/kunit/test.h
|
||||
:internal:
|
||||
|
204
Documentation/dev-tools/kunit/architecture.rst
Normal file
@ -0,0 +1,204 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==================
|
||||
KUnit Architecture
|
||||
==================
|
||||
|
||||
The KUnit architecture can be divided into two parts:
|
||||
|
||||
- Kernel testing library
|
||||
- kunit_tool (Command line test harness)
|
||||
|
||||
In-Kernel Testing Framework
|
||||
===========================
|
||||
|
||||
The kernel testing library supports tests written in C using
|
||||
KUnit. KUnit tests are kernel code. KUnit does several things:
|
||||
|
||||
- Organizes tests
|
||||
- Reports test results
|
||||
- Provides test utilities
|
||||
|
||||
Test Cases
|
||||
----------
|
||||
|
||||
The fundamental unit in KUnit is the test case. The KUnit test cases are
|
||||
grouped into KUnit suites. A KUnit test case is a function with type
|
||||
signature ``void (*)(struct kunit *test)``.
|
||||
These test case functions are wrapped in a struct called
|
||||
``struct kunit_case``. For code, see:
|
||||
|
||||
.. kernel-doc:: include/kunit/test.h
|
||||
:identifiers: kunit_case
|
||||
|
||||
.. note::

	``generate_params`` is optional for non-parameterized tests.
|
||||
|
||||
Each KUnit test case gets a ``struct kunit`` context
|
||||
object passed to it that tracks a running test. The KUnit assertion
|
||||
macros and other KUnit utilities use the ``struct kunit`` context
|
||||
object. As an exception, there are two fields:
|
||||
|
||||
- ``->priv``: The setup functions can use it to store arbitrary test
|
||||
user data.
|
||||
|
||||
- ``->param_value``: It contains the parameter value which can be
|
||||
retrieved in the parameterized tests.
|
||||
|
||||
Test Suites
|
||||
-----------
|
||||
|
||||
A KUnit suite includes a collection of test cases. The KUnit suites
|
||||
are represented by the ``struct kunit_suite``. For example:
|
||||
|
||||
.. code-block:: c

	static struct kunit_case example_test_cases[] = {
		KUNIT_CASE(example_test_foo),
		KUNIT_CASE(example_test_bar),
		KUNIT_CASE(example_test_baz),
		{}	/* empty entry terminates the array */
	};

	static struct kunit_suite example_test_suite = {
		.name = "example",
		.init = example_test_init,	/* called before running a test */
		.exit = example_test_exit,	/* called after running a test */
		.test_cases = example_test_cases,
	};
	kunit_test_suite(example_test_suite);
|
||||
|
||||
In the above example, the test suite ``example_test_suite`` runs the
|
||||
test cases ``example_test_foo``, ``example_test_bar``, and
|
||||
``example_test_baz``. Before running the test, the ``example_test_init``
|
||||
is called and after running the test, ``example_test_exit`` is called.
|
||||
The ``kunit_test_suite(example_test_suite)`` registers the test suite
|
||||
with the KUnit test framework.
|
||||
|
||||
Executor
|
||||
--------
|
||||
|
||||
The KUnit executor can list and run built-in KUnit tests on boot.
|
||||
The test suites are stored in a linker section
|
||||
called ``.kunit_test_suites``. For code, see:
|
||||
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/asm-generic/vmlinux.lds.h?h=v5.15#n945.
|
||||
The linker section consists of an array of pointers to
|
||||
``struct kunit_suite``, and is populated by the ``kunit_test_suites()``
|
||||
macro. To run all tests compiled into the kernel, the KUnit executor
|
||||
iterates over the linker section array.
|
||||
|
||||
.. kernel-figure:: kunit_suitememorydiagram.svg
|
||||
:alt: KUnit Suite Memory
|
||||
|
||||
KUnit Suite Memory Diagram
|
||||
|
||||
On kernel boot, the KUnit executor uses the start and end addresses
|
||||
of this section to iterate over and run all tests. For code, see:
|
||||
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/kunit/executor.c
|
||||
|
||||
When built as a module, the ``kunit_test_suites()`` macro defines a
|
||||
``module_init()`` function, which runs all the tests in the compilation
|
||||
unit instead of utilizing the executor.
|
||||
|
||||
To ensure that some classes of errors do not affect other tests
|
||||
or parts of the kernel, each KUnit case executes in a separate thread
|
||||
context. For code, see:
|
||||
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/kunit/try-catch.c?h=v5.15#n58
|
||||
|
||||
Assertion Macros
|
||||
----------------
|
||||
|
||||
KUnit tests verify state using expectations/assertions.
|
||||
All expectations/assertions are formatted as:
|
||||
``KUNIT_{EXPECT|ASSERT}_<op>[_MSG](kunit, property[, message])``
|
||||
|
||||
- ``{EXPECT|ASSERT}`` determines whether the check is an assertion or an
|
||||
expectation.
|
||||
|
||||
- For an expectation, if the check fails, marks the test as failed
|
||||
and logs the failure.
|
||||
|
||||
- An assertion, on failure, causes the test case to terminate
|
||||
immediately.
|
||||
|
||||
- Assertions call function:
|
||||
``void __noreturn kunit_abort(struct kunit *)``.
|
||||
|
||||
- ``kunit_abort`` calls function:
|
||||
``void __noreturn kunit_try_catch_throw(struct kunit_try_catch *try_catch)``.
|
||||
|
||||
- ``kunit_try_catch_throw`` calls function:
|
||||
``void complete_and_exit(struct completion *, long) __noreturn;``
|
||||
and terminates the special thread context.
|
||||
|
||||
- ``<op>`` denotes a check with options: ``TRUE`` (supplied property
|
||||
has the boolean value “true”), ``EQ`` (two supplied properties are
|
||||
equal), ``NOT_ERR_OR_NULL`` (supplied pointer is not null and does not
|
||||
contain an “err” value).
|
||||
|
||||
- ``[_MSG]`` prints a custom message on failure.
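
The difference between an expectation and an assertion can be modelled in a few lines of Python. This is only a conceptual sketch: in the kernel, the abort path goes through ``kunit_abort`` and ends the test's thread, whereas here a Python exception stands in for that mechanism. ``FakeTest``, ``CaseAborted`` and ``run_case`` are hypothetical names.

```python
class CaseAborted(Exception):
    """Stands in for the abort path taken by a failed assertion."""


class FakeTest:
    """Minimal stand-in for the per-test context object."""

    def __init__(self):
        self.failed = False
        self.log = []

    def expect_eq(self, left, right):
        # Expectation: mark the test as failed, log, and keep going.
        if left != right:
            self.failed = True
            self.log.append(f"expected {left!r} == {right!r}")

    def assert_eq(self, left, right):
        # Assertion: mark as failed, log, and terminate the test case.
        if left != right:
            self.failed = True
            self.log.append(f"asserted {left!r} == {right!r}")
            raise CaseAborted()


def run_case(case):
    """Run one test case and return its context, even if it aborted."""
    test = FakeTest()
    try:
        case(test)
    except CaseAborted:
        pass  # the case stopped early; its failure is already recorded
    return test


def example_case(test):
    test.expect_eq(1, 2)  # logged, execution continues
    test.assert_eq(3, 4)  # logged, the case stops here
    test.expect_eq(5, 6)  # never reached
```

Running ``run_case(example_case)`` records two failures rather than three, because the failed assertion ends the case before the last check runs.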
|
||||
|
||||
Test Result Reporting
|
||||
---------------------
|
||||
KUnit prints test results in KTAP format. KTAP is based on TAP14, see:
|
||||
https://github.com/isaacs/testanything.github.io/blob/tap14/tap-version-14-specification.md.
|
||||
KTAP (a format that is yet to be standardized) works with KUnit and Kselftest.
|
||||
The KUnit executor prints KTAP results to dmesg, and debugfs
|
||||
(if configured).
|
||||
|
||||
Parameterized Tests
|
||||
-------------------
|
||||
|
||||
Each KUnit parameterized test is associated with a collection of
|
||||
parameters. The test is invoked multiple times, once for each parameter
|
||||
value and the parameter is stored in the ``param_value`` field.
|
||||
The test case includes a ``KUNIT_CASE_PARAM()`` macro that accepts a
|
||||
generator function.
|
||||
The generator function is passed the previous parameter and returns the next
|
||||
parameter. It also provides a macro to generate common-case generators based on
|
||||
arrays.
|
||||
|
||||
For code, see:
|
||||
|
||||
.. kernel-doc:: include/kunit/test.h
|
||||
:identifiers: KUNIT_ARRAY_PARAM
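
The generator protocol described above (receive the previous parameter, return the next one, stop when nothing is left) can be sketched in Python. This is an illustration in the spirit of an array-based generator, not a translation of the kernel macro. ``ParamRef`` models a pointer into the parameter array, and all names here are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamRef:
    """Models a pointer to one entry of the parameter array."""
    index: int
    value: object


def make_array_generator(params):
    """Build a generator function that walks over ``params`` in order."""
    def gen_params(prev):
        # No previous parameter means "start at the first entry".
        index = 0 if prev is None else prev.index + 1
        if index >= len(params):
            return None  # no more parameters: the test stops here
        return ParamRef(index, params[index])
    return gen_params


def run_parameterized(case, gen_params):
    """Invoke ``case`` once per parameter value and collect the results."""
    results = []
    param = gen_params(None)
    while param is not None:
        results.append(case(param.value))
        param = gen_params(param)
    return results
```

For example, ``run_parameterized(lambda x: x * x, make_array_generator([2, 3, 3]))`` invokes the case three times, once per array entry.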
|
||||
|
||||
|
||||
kunit_tool (Command Line Test Harness)
|
||||
======================================
|
||||
|
||||
kunit_tool is a Python script ``(tools/testing/kunit/kunit.py)``
|
||||
that can be used to configure, build, execute, and parse tests and their
results, or to do all of these in order with ``run``. You can either run KUnit
tests using kunit_tool, or include KUnit in the kernel and parse the results
manually.
|
||||
|
||||
- ``configure`` command generates the kernel ``.config`` from a
|
||||
``.kunitconfig`` file (and any architecture-specific options).
|
||||
For some architectures, additional config options are specified in the
|
||||
``qemu_config`` Python script
|
||||
(For example: ``tools/testing/kunit/qemu_configs/powerpc.py``).
|
||||
It parses both the existing ``.config`` and the ``.kunitconfig`` files
|
||||
and ensures that ``.config`` is a superset of ``.kunitconfig``.
|
||||
If this is not the case, it will combine the two and run
|
||||
``make olddefconfig`` to regenerate the ``.config`` file. It then
|
||||
verifies that ``.config`` is now a superset. This checks if all
|
||||
Kconfig dependencies are correctly specified in ``.kunitconfig``.
|
||||
``kunit_config.py`` includes the parsing Kconfigs code. The code which
|
||||
runs ``make olddefconfig`` is a part of ``kunit_kernel.py``. You can
|
||||
invoke this command via: ``./tools/testing/kunit/kunit.py config`` and
|
||||
generate a ``.config`` file.
|
||||
- ``build`` runs ``make`` on the kernel tree with required options
|
||||
(depends on the architecture and some options, for example: build_dir)
|
||||
and reports any errors.
|
||||
To build a KUnit kernel from the current ``.config``, you can use the
|
||||
``build`` argument: ``./tools/testing/kunit/kunit.py build``.
|
||||
- ``exec`` command runs the kernel either directly (using
|
||||
User-mode Linux configuration), or via an emulator such
|
||||
as QEMU. It reads results from the log via standard
|
||||
output (stdout), and passes them to ``parse`` to be parsed.
|
||||
If you already have built a kernel with built-in KUnit tests,
|
||||
you can run the kernel and display the test results with the ``exec``
|
||||
argument: ``./tools/testing/kunit/kunit.py exec``.
|
||||
- ``parse`` extracts the KTAP output from a kernel log, parses
|
||||
the test results, and prints a summary. For failed tests, any
|
||||
diagnostic output will be included.
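
The superset check performed by the ``configure`` step can be illustrated with a short Python sketch. This is a simplified model under stated assumptions: it only understands ``CONFIG_FOO=value`` lines and ignores comments (including ``# CONFIG_FOO is not set``). The function names are hypothetical and are not the ones used in ``kunit_config.py``.

```python
def parse_kconfig(text):
    """Parse 'CONFIG_FOO=value' lines into a dict, skipping comments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        entries[name] = value
    return entries


def missing_options(config_text, kunitconfig_text):
    """Return the .kunitconfig entries that .config does not satisfy."""
    config = parse_kconfig(config_text)
    wanted = parse_kconfig(kunitconfig_text)
    return {
        name: value
        for name, value in wanted.items()
        if config.get(name) != value
    }


config = "CONFIG_KUNIT=y\nCONFIG_PRINTK=y\n"
kunitconfig = "CONFIG_KUNIT=y\nCONFIG_KUNIT_EXAMPLE_TEST=y\n"
print(missing_options(config, kunitconfig))
```

An empty result means ``.config`` is already a superset of ``.kunitconfig``. A non-empty result after regenerating the configuration would suggest that a Kconfig dependency is missing from ``.kunitconfig``.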
|
@ -4,56 +4,55 @@
|
||||
Frequently Asked Questions
|
||||
==========================
|
||||
|
||||
How is this different from Autotest, kselftest, etc?
|
||||
====================================================
|
||||
How is this different from Autotest, kselftest, and so on?
|
||||
==========================================================
|
||||
KUnit is a unit testing framework. Autotest, kselftest (and some others) are
|
||||
not.
|
||||
|
||||
A `unit test <https://martinfowler.com/bliki/UnitTest.html>`_ is supposed to
|
||||
test a single unit of code in isolation, hence the name. A unit test should be
|
||||
the finest granularity of testing and as such should allow all possible code
|
||||
paths to be tested in the code under test; this is only possible if the code
|
||||
under test is very small and does not have any external dependencies outside of
|
||||
test a single unit of code in isolation and hence the name *unit test*. A unit
|
||||
test should be the finest granularity of testing and should allow all possible
|
||||
code paths to be tested in the code under test. This is only possible if the
|
||||
code under test is small and does not have any external dependencies outside of
|
||||
the test's control like hardware.
|
||||
|
||||
There are no testing frameworks currently available for the kernel that do not
|
||||
require installing the kernel on a test machine or in a VM and all require
|
||||
tests to be written in userspace and run on the kernel under test; this is true
|
||||
for Autotest, kselftest, and some others, disqualifying any of them from being
|
||||
considered unit testing frameworks.
|
||||
require installing the kernel on a test machine or in a virtual machine. All
|
||||
testing frameworks require tests to be written in userspace and run on the
|
||||
kernel under test. This is true for Autotest, kselftest, and some others,
|
||||
disqualifying any of them from being considered unit testing frameworks.
|
||||
|
||||
Does KUnit support running on architectures other than UML?
|
||||
===========================================================
|
||||
|
||||
Yes, well, mostly.
|
||||
Yes, mostly.
|
||||
|
||||
For the most part, the KUnit core framework (what you use to write the tests)
|
||||
can compile to any architecture; it compiles like just another part of the
|
||||
For the most part, the KUnit core framework (what we use to write the tests)
|
||||
can compile to any architecture. It compiles like just another part of the
|
||||
kernel and runs when the kernel boots, or when built as a module, when the
|
||||
module is loaded. However, there is some infrastructure,
|
||||
like the KUnit Wrapper (``tools/testing/kunit/kunit.py``) that does not support
|
||||
other architectures.
|
||||
module is loaded. However, there is infrastructure, like the KUnit Wrapper
|
||||
(``tools/testing/kunit/kunit.py``) that does not support other architectures.
|
||||
|
||||
In short, this means that, yes, you can run KUnit on other architectures, but
|
||||
it might require more work than using KUnit on UML.
|
||||
In short, yes, you can run KUnit on other architectures, but it might require
|
||||
more work than using KUnit on UML.
|
||||
|
||||
For more information, see :ref:`kunit-on-non-uml`.
|
||||
|
||||
What is the difference between a unit test and these other kinds of tests?
|
||||
==========================================================================
|
||||
What is the difference between a unit test and other kinds of tests?
|
||||
====================================================================
|
||||
Most existing tests for the Linux kernel would be categorized as an integration
|
||||
test, or an end-to-end test.
|
||||
|
||||
- A unit test is supposed to test a single unit of code in isolation, hence the
|
||||
name. A unit test should be the finest granularity of testing and as such
|
||||
should allow all possible code paths to be tested in the code under test; this
|
||||
is only possible if the code under test is very small and does not have any
|
||||
external dependencies outside of the test's control like hardware.
|
||||
- A unit test is supposed to test a single unit of code in isolation. A unit
|
||||
test should be the finest granularity of testing and, as such, allows all
|
||||
possible code paths to be tested in the code under test. This is only possible
|
||||
if the code under test is small and does not have any external dependencies
|
||||
outside of the test's control like hardware.
|
||||
- An integration test tests the interaction between a minimal set of components,
|
||||
usually just two or three. For example, someone might write an integration
|
||||
test to test the interaction between a driver and a piece of hardware, or to
|
||||
test the interaction between the userspace libraries the kernel provides and
|
||||
the kernel itself; however, one of these tests would probably not test the
|
||||
the kernel itself. However, one of these tests would probably not test the
|
||||
entire kernel along with hardware interactions and interactions with the
|
||||
userspace.
|
||||
- An end-to-end test usually tests the entire system from the perspective of the
|
||||
@ -62,26 +61,26 @@ test, or an end-to-end test.
|
||||
hardware with a production userspace and then trying to exercise some behavior
|
||||
that depends on interactions between the hardware, the kernel, and userspace.
|
||||
|
||||
KUnit isn't working, what should I do?
|
||||
======================================
|
||||
KUnit is not working, what should I do?
|
||||
=======================================
|
||||
|
||||
Unfortunately, there are a number of things which can break, but here are some
|
||||
things to try.
|
||||
|
||||
1. Try running ``./tools/testing/kunit/kunit.py run`` with the ``--raw_output``
|
||||
1. Run ``./tools/testing/kunit/kunit.py run`` with the ``--raw_output``
|
||||
parameter. This might show details or error messages hidden by the kunit_tool
|
||||
parser.
|
||||
2. Instead of running ``kunit.py run``, try running ``kunit.py config``,
|
||||
``kunit.py build``, and ``kunit.py exec`` independently. This can help track
|
||||
down where an issue is occurring. (If you think the parser is at fault, you
|
||||
can run it manually against stdin or a file with ``kunit.py parse``.)
|
||||
3. Running the UML kernel directly can often reveal issues or error messages
|
||||
kunit_tool ignores. This should be as simple as running ``./vmlinux`` after
|
||||
building the UML kernel (e.g., by using ``kunit.py build``). Note that UML
|
||||
has some unusual requirements (such as the host having a tmpfs filesystem
|
||||
mounted), and has had issues in the past when built statically and the host
|
||||
has KASLR enabled. (On older host kernels, you may need to run ``setarch
|
||||
`uname -m` -R ./vmlinux`` to disable KASLR.)
|
||||
can run it manually against ``stdin`` or a file with ``kunit.py parse``.)
|
||||
3. Running the UML kernel directly can often reveal issues or error messages that
|
||||
``kunit_tool`` ignores. This should be as simple as running ``./vmlinux``
|
||||
after building the UML kernel (for example, by using ``kunit.py build``).
|
||||
Note that UML has some unusual requirements (such as the host having a tmpfs
|
||||
filesystem mounted), and has had issues in the past when built statically and
|
||||
the host has KASLR enabled. (On older host kernels, you may need to run
|
||||
``setarch `uname -m` -R ./vmlinux`` to disable KASLR.)
|
||||
4. Make sure the kernel .config has ``CONFIG_KUNIT=y`` and at least one test
|
||||
(e.g. ``CONFIG_KUNIT_EXAMPLE_TEST=y``). kunit_tool will keep its .config
|
||||
around, so you can see what config was used after running ``kunit.py run``.
|
||||
|
@ -1,13 +1,17 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=========================================
|
||||
KUnit - Unit Testing for the Linux Kernel
|
||||
=========================================
|
||||
=================================
|
||||
KUnit - Linux Kernel Unit Testing
|
||||
=================================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
:caption: Contents:
|
||||
|
||||
start
|
||||
architecture
|
||||
run_wrapper
|
||||
run_manual
|
||||
usage
|
||||
kunit-tool
|
||||
api/index
|
||||
@ -16,82 +20,94 @@ KUnit - Unit Testing for the Linux Kernel
|
||||
tips
|
||||
running_tips
|
||||
|
||||
What is KUnit?
|
||||
==============
|
||||
This section details the kernel unit testing framework.
|
||||
|
||||
KUnit is a lightweight unit testing and mocking framework for the Linux kernel.
|
||||
Introduction
|
||||
============
|
||||
|
||||
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
|
||||
Googletest/Googlemock for C++. KUnit provides facilities for defining unit test
|
||||
cases, grouping related test cases into test suites, providing common
|
||||
infrastructure for running tests, and much more.
|
||||
KUnit (Kernel unit testing framework) provides a common framework for
|
||||
unit tests within the Linux kernel. Using KUnit, you can define groups
|
||||
of test cases called test suites. The tests either run on kernel boot
|
||||
if built-in, or load as a module. KUnit automatically flags and reports
|
||||
failed test cases in the kernel log. The test results appear in `TAP
|
||||
(Test Anything Protocol) format <https://testanything.org/>`_. KUnit is inspired by
|
||||
JUnit, Python’s unittest.mock, and GoogleTest/GoogleMock (C++ unit testing
|
||||
framework).
|
||||
|
||||
KUnit consists of a kernel component, which provides a set of macros for easily
|
||||
writing unit tests. Tests written against KUnit will run on kernel boot if
|
||||
built-in, or when loaded if built as a module. These tests write out results to
|
||||
the kernel log in `TAP <https://testanything.org/>`_ format.
|
||||
KUnit tests are part of the kernel, written in the C (programming)
|
||||
language, and test parts of the kernel implementation (example: a C
|
||||
language function). Excluding build time, from invocation to
|
||||
completion, KUnit can run around 100 tests in less than 10 seconds.
|
||||
KUnit can test any kernel component, for example: file system, system
|
||||
calls, memory management, device drivers and so on.
|
||||
|
||||
To make running these tests (and reading the results) easier, KUnit offers
|
||||
:doc:`kunit_tool <kunit-tool>`, which builds a `User Mode Linux
|
||||
<http://user-mode-linux.sourceforge.net>`_ kernel, runs it, and parses the test
|
||||
results. This provides a quick way of running KUnit tests during development,
|
||||
without requiring a virtual machine or separate hardware.
|
||||
KUnit follows the white-box testing approach. The test has access to
|
||||
internal system functionality. KUnit runs in kernel space and is not
|
||||
restricted to things exposed to user-space.
|
||||
|
||||
Get started now: Documentation/dev-tools/kunit/start.rst
|
||||
In addition, KUnit has kunit_tool, a script (``tools/testing/kunit/kunit.py``)
|
||||
that configures the Linux kernel, runs KUnit tests under QEMU or UML (`User Mode
|
||||
Linux <http://user-mode-linux.sourceforge.net/>`_), parses the test results and
|
||||
displays them in a user-friendly manner.
|
||||
|
||||
Why KUnit?
|
||||
==========
|
||||
Features
|
||||
--------
|
||||
|
||||
A unit test is supposed to test a single unit of code in isolation, hence the
|
||||
name. A unit test should be the finest granularity of testing and as such should
|
||||
allow all possible code paths to be tested in the code under test; this is only
|
||||
possible if the code under test is very small and does not have any external
|
||||
dependencies outside of the test's control like hardware.
|
||||
- Provides a framework for writing unit tests.
|
||||
- Runs tests on any kernel architecture.
|
||||
- Runs a test in milliseconds.
|
||||
|
||||
KUnit provides a common framework for unit tests within the kernel.
|
||||
Prerequisites
|
||||
-------------
|
||||
|
||||
KUnit tests can be run on most architectures, and most tests are architecture
|
||||
independent. All built-in KUnit tests run on kernel startup. Alternatively,
|
||||
KUnit and KUnit tests can be built as modules and tests will run when the test
|
||||
module is loaded.
|
||||
- Any Linux kernel compatible hardware.
|
||||
- For the kernel under test, Linux kernel version 5.5 or greater.
|
||||
|
||||
.. note::
|
||||
Unit Testing
|
||||
============
|
||||
|
||||
KUnit can also run tests without needing a virtual machine or actual
|
||||
hardware under User Mode Linux. User Mode Linux is a Linux architecture,
|
||||
like ARM or x86, which compiles the kernel as a Linux executable. KUnit
|
||||
can be used with UML either by building with ``ARCH=um`` (like any other
|
||||
architecture), or by using :doc:`kunit_tool <kunit-tool>`.
|
||||
A unit test tests a single unit of code in isolation. A unit test is the finest
|
||||
granularity of testing and allows all possible code paths to be tested in the
|
||||
code under test. This is possible if the code under test is small and does not
|
||||
have any external dependencies outside of the test's control like hardware.
|
||||
|
||||
KUnit is fast. Excluding build time, from invocation to completion KUnit can run
|
||||
several dozen tests in only 10 to 20 seconds; this might not sound like a big
|
||||
deal to some people, but having such fast and easy to run tests fundamentally
|
||||
changes the way you go about testing and even writing code in the first place.
|
||||
Linus himself said in his `git talk at Google
|
||||
<https://gist.github.com/lorn/1272686/revisions#diff-53c65572127855f1b003db4064a94573R874>`_:
|
||||
|
||||
"... a lot of people seem to think that performance is about doing the
|
||||
same thing, just doing it faster, and that is not true. That is not what
|
||||
performance is all about. If you can do something really fast, really
|
||||
well, people will start using it differently."
|
||||
Write Unit Tests
|
||||
----------------
|
||||
|
||||
In this context Linus was talking about branching and merging,
|
||||
but this point also applies to testing. If your tests are slow, unreliable, are
|
||||
difficult to write, and require a special setup or special hardware to run,
|
||||
then you wait a lot longer to write tests, and you wait a lot longer to run
|
||||
tests; this means that tests are likely to break, unlikely to test a lot of
|
||||
things, and are unlikely to be rerun once they pass. If your tests are really
|
||||
fast, you run them all the time, every time you make a change, and every time
|
||||
someone sends you some code. Why trust that someone ran all their tests
|
||||
correctly on every change when you can just run them yourself in less time than
|
||||
it takes to read their test log?
|
||||
To write good unit tests, there is a simple but powerful pattern:
|
||||
Arrange-Act-Assert. This is a great way to structure test cases and
|
||||
defines an order of operations.
|
||||
|
||||
- Arrange inputs and targets: At the start of the test, arrange the data
|
||||
that allows a function to work. Example: initialize a statement or
|
||||
object.
|
||||
- Act on the target behavior: Call your function/code under test.
|
||||
- Assert expected outcome: Verify that the result (or resulting state) is as
|
||||
expected.
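
The three steps can be seen in a very small test. The sketch below uses Python purely to show the shape of the pattern; a real KUnit test would be written in C and use the KUnit expectation macros. The ``clamp`` function is a hypothetical function under test, invented for this example.

```python
def clamp(value, low, high):
    """Hypothetical function under test: limit value to [low, high]."""
    return max(low, min(value, high))


def check_clamp_limits_large_values():
    # Arrange: set up the inputs the function needs.
    low, high = 0, 10

    # Act: call the code under test.
    result = clamp(42, low, high)

    # Assert: verify that the result is as expected.
    assert result == 10


check_clamp_limits_large_values()
```

Keeping the three steps visually separate makes it easier to see what a failing test was trying to check.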
|
||||
|
||||
Unit Testing Advantages
|
||||
-----------------------
|
||||
|
||||
- Increases testing speed and development in the long run.
|
||||
- Detects bugs at an early stage and therefore decreases bug fix cost
|
||||
compared to acceptance testing.
|
||||
- Improves code quality.
|
||||
- Encourages writing testable code.
|
||||
|
||||
How do I use it?
|
||||
================
|
||||
|
||||
* Documentation/dev-tools/kunit/start.rst - for new users of KUnit
|
||||
* Documentation/dev-tools/kunit/tips.rst - for short examples of best practices
|
||||
* Documentation/dev-tools/kunit/usage.rst - for a more detailed explanation of KUnit features
|
||||
* Documentation/dev-tools/kunit/api/index.rst - for the list of KUnit APIs used for testing
|
||||
* Documentation/dev-tools/kunit/kunit-tool.rst - for more information on the kunit_tool helper script
|
||||
* Documentation/dev-tools/kunit/faq.rst - for answers to some common questions about KUnit
|
||||
* Documentation/dev-tools/kunit/start.rst - for new KUnit users.
|
||||
* Documentation/dev-tools/kunit/architecture.rst - KUnit architecture.
|
||||
* Documentation/dev-tools/kunit/run_wrapper.rst - run kunit_tool.
|
||||
* Documentation/dev-tools/kunit/run_manual.rst - run tests without kunit_tool.
|
||||
* Documentation/dev-tools/kunit/usage.rst - write tests.
|
||||
* Documentation/dev-tools/kunit/tips.rst - best practices with
|
||||
examples.
|
||||
* Documentation/dev-tools/kunit/api/index.rst - KUnit APIs
|
||||
used for testing.
|
||||
* Documentation/dev-tools/kunit/kunit-tool.rst - kunit_tool helper
|
||||
script.
|
||||
* Documentation/dev-tools/kunit/faq.rst - KUnit common questions and
|
||||
answers.
|
||||
|
81
Documentation/dev-tools/kunit/kunit_suitememorydiagram.svg
Normal file
@ -0,0 +1,81 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<svg width="796.93" height="555.73" version="1.1" viewBox="0 0 796.93 555.73" xmlns="http://www.w3.org/2000/svg">
|
||||
<g transform="translate(-13.724 -17.943)">
|
||||
<g fill="#dad4d4" fill-opacity=".91765" stroke="#1a1a1a">
|
||||
<rect x="323.56" y="18.443" width="115.75" height="41.331"/>
|
||||
<rect x="323.56" y="463.09" width="115.75" height="41.331"/>
|
||||
<rect x="323.56" y="531.84" width="115.75" height="41.331"/>
|
||||
<rect x="323.56" y="88.931" width="115.75" height="74.231"/>
|
||||
</g>
|
||||
<g>
|
||||
<rect x="323.56" y="421.76" width="115.75" height="41.331" fill="#b9dbc6" stroke="#1a1a1a"/>
|
||||
<text x="328.00888" y="446.61826" fill="#000000" font-family="sans-serif" font-size="16px" style="line-height:1.25" xml:space="preserve"><tspan x="328.00888" y="446.61826" font-family="monospace" font-size="16px">kunit_suite</tspan></text>
|
||||
</g>
|
||||
<g transform="translate(0 -258.6)">
|
||||
<rect x="323.56" y="421.76" width="115.75" height="41.331" fill="#b9dbc6" stroke="#1a1a1a"/>
|
||||
<text x="328.00888" y="446.61826" fill="#000000" font-family="sans-serif" font-size="16px" style="line-height:1.25" xml:space="preserve"><tspan x="328.00888" y="446.61826" font-family="monospace" font-size="16px">kunit_suite</tspan></text>
|
||||
</g>
|
||||
<g transform="translate(0 -217.27)">
|
||||
<rect x="323.56" y="421.76" width="115.75" height="41.331" fill="#b9dbc6" stroke="#1a1a1a"/>
|
||||
<text x="328.00888" y="446.61826" fill="#000000" font-family="sans-serif" font-size="16px" style="line-height:1.25" xml:space="preserve"><tspan x="328.00888" y="446.61826" font-family="monospace" font-size="16px">kunit_suite</tspan></text>
|
||||
</g>
|
||||
<g transform="translate(0 -175.94)">
|
||||
<rect x="323.56" y="421.76" width="115.75" height="41.331" fill="#b9dbc6" stroke="#1a1a1a"/>
|
||||
<text x="328.00888" y="446.61826" fill="#000000" font-family="sans-serif" font-size="16px" style="line-height:1.25" xml:space="preserve"><tspan x="328.00888" y="446.61826" font-family="monospace" font-size="16px">kunit_suite</tspan></text>
|
||||
</g>
|
||||
<g transform="translate(0 -134.61)">
|
||||
<rect x="323.56" y="421.76" width="115.75" height="41.331" fill="#b9dbc6" stroke="#1a1a1a"/>
|
||||
<text x="328.00888" y="446.61826" fill="#000000" font-family="sans-serif" font-size="16px" style="line-height:1.25" xml:space="preserve"><tspan x="328.00888" y="446.61826" font-family="monospace" font-size="16px">kunit_suite</tspan></text>
|
||||
</g>
|
||||
<g transform="translate(0 -41.331)">
|
||||
<rect x="323.56" y="421.76" width="115.75" height="41.331" fill="#b9dbc6" stroke="#1a1a1a"/>
|
||||
<text x="328.00888" y="446.61826" fill="#000000" font-family="sans-serif" font-size="16px" style="line-height:1.25" xml:space="preserve"><tspan x="328.00888" y="446.61826" font-family="monospace" font-size="16px">kunit_suite</tspan></text>
|
||||
</g>
|
||||
<g transform="translate(3.4459e-5 -.71088)">
|
||||
<rect x="502.19" y="143.16" width="201.13" height="41.331" fill="#dad4d4" fill-opacity=".91765" stroke="#1a1a1a"/>
|
||||
<text x="512.02319" y="168.02026" font-family="sans-serif" font-size="16px" style="line-height:1.25" xml:space="preserve"><tspan x="512.02319" y="168.02026" font-family="monospace">_kunit_suites_start</tspan></text>
|
||||
</g>
|
||||
<g transform="translate(3.0518e-5 -3.1753)">
|
||||
<rect x="502.19" y="445.69" width="201.13" height="41.331" fill="#dad4d4" fill-opacity=".91765" stroke="#1a1a1a"/>
|
||||
<text x="521.61694" y="470.54846" font-family="sans-serif" font-size="16px" style="line-height:1.25" xml:space="preserve"><tspan x="521.61694" y="470.54846" font-family="monospace">_kunit_suites_end</tspan></text>
|
||||
</g>
|
||||
<rect x="14.224" y="277.78" width="134.47" height="41.331" fill="#dad4d4" fill-opacity=".91765" stroke="#1a1a1a"/>
|
||||
<text x="32.062176" y="304.41287" font-family="sans-serif" font-size="16px" style="line-height:1.25" xml:space="preserve"><tspan x="32.062176" y="304.41287" font-family="monospace">.init.data</tspan></text>
|
||||
<g transform="translate(217.98 145.12)" stroke="#1a1a1a">
|
||||
<circle cx="149.97" cy="373.01" r="3.4012"/>
|
||||
<circle cx="163.46" cy="373.01" r="3.4012"/>
|
||||
<circle cx="176.95" cy="373.01" r="3.4012"/>
|
||||
</g>
|
||||
<g transform="translate(217.98 -298.66)" stroke="#1a1a1a">
|
||||
<circle cx="149.97" cy="373.01" r="3.4012"/>
|
||||
<circle cx="163.46" cy="373.01" r="3.4012"/>
|
||||
<circle cx="176.95" cy="373.01" r="3.4012"/>
|
||||
</g>
|
||||
<g stroke="#1a1a1a">
|
||||
<rect x="323.56" y="328.49" width="115.75" height="51.549" fill="#b9dbc6"/>
|
||||
<g transform="translate(217.98 -18.75)">
|
||||
<circle cx="149.97" cy="373.01" r="3.4012"/>
|
||||
<circle cx="163.46" cy="373.01" r="3.4012"/>
|
||||
<circle cx="176.95" cy="373.01" r="3.4012"/>
|
||||
</g>
|
||||
</g>
|
||||
<g transform="scale(1.0933 .9147)" stroke-width="32.937" aria-label="{">
|
||||
<path d="m275.49 545.57c-35.836-8.432-47.43-24.769-47.957-64.821v-88.536c-0.527-44.795-10.54-57.97-49.538-67.456 38.998-10.013 49.011-23.715 49.538-67.983v-88.536c0.527-40.052 12.121-56.389 47.957-64.821v-5.797c-65.348 0-85.901 17.391-86.955 73.253v93.806c-0.527 36.89-10.013 50.065-44.795 59.551 34.782 10.013 44.268 23.188 44.795 60.078v93.279c1.581 56.389 21.607 73.78 86.955 73.78z"/>
|
||||
</g>
|
||||
<g transform="scale(1.1071 .90325)" stroke-width="14.44" aria-label="{">
|
||||
<path d="m461.46 443.55c-15.711-3.6967-20.794-10.859-21.025-28.418v-38.815c-0.23104-19.639-4.6209-25.415-21.718-29.574 17.097-4.3898 21.487-10.397 21.718-29.805v-38.815c0.23105-17.559 5.314-24.722 21.025-28.418v-2.5415c-28.649 0-37.66 7.6244-38.122 32.115v41.126c-0.23105 16.173-4.3898 21.949-19.639 26.108 15.249 4.3898 19.408 10.166 19.639 26.339v40.895c0.69313 24.722 9.4728 32.346 38.122 32.346z"/>
|
||||
</g>
|
||||
<path d="m449.55 161.84v2.5h49.504v-2.5z" color="#000000" style="-inkscape-stroke:none"/>
|
||||
<g fill-rule="evenodd">
|
||||
<path d="m443.78 163.09 8.65-5v10z" color="#000000" stroke-width="1pt" style="-inkscape-stroke:none"/>
|
||||
<path d="m453.1 156.94-10.648 6.1543 0.99804 0.57812 9.6504 5.5781zm-1.334 2.3125v7.6856l-6.6504-3.8438z" color="#000000" style="-inkscape-stroke:none"/>
|
||||
</g>
|
||||
<path d="m449.55 461.91v2.5h49.504v-2.5z" color="#000000" style="-inkscape-stroke:none"/>
|
||||
<g fill-rule="evenodd">
|
||||
<path d="m443.78 463.16 8.65-5v10z" color="#000000" stroke-width="1pt" style="-inkscape-stroke:none"/>
|
||||
<path d="m453.1 457-10.648 6.1562 0.99804 0.57617 9.6504 5.5781zm-1.334 2.3125v7.6856l-6.6504-3.8438z" color="#000000" style="-inkscape-stroke:none"/>
|
||||
</g>
|
||||
<rect x="515.64" y="223.9" width="294.52" height="178.49" fill="#dad4d4" fill-opacity=".91765" stroke="#1a1a1a"/>
|
||||
<text x="523.33319" y="262.52542" font-family="monospace" font-size="14.667px" style="line-height:1.25" xml:space="preserve"><tspan x="523.33319" y="262.52542"><tspan fill="#008000" font-family="monospace" font-size="14.667px" font-weight="bold">struct</tspan> kunit_suite {</tspan><tspan x="523.33319" y="280.8588"><tspan fill="#008000" font-family="monospace" font-size="14.667px" font-weight="bold"> const char</tspan> name[<tspan fill="#ff00ff" font-size="14.667px">256</tspan>];</tspan><tspan x="523.33319" y="299.19217"> <tspan fill="#008000" font-family="monospace" font-size="14.667px" font-weight="bold">int</tspan> (*init)(<tspan fill="#008000" font-family="monospace" font-size="14.667px" font-weight="bold">struct</tspan> kunit *);</tspan><tspan x="523.33319" y="317.52554"> <tspan fill="#008000" font-family="monospace" font-size="14.667px" font-weight="bold">void</tspan> (*exit)(<tspan fill="#008000" font-family="monospace" font-size="14.667px" font-weight="bold">struct</tspan> kunit *);</tspan><tspan x="523.33319" y="335.85892"> <tspan fill="#008000" font-family="monospace" font-size="14.667px" font-weight="bold">struct</tspan> kunit_case *test_cases;</tspan><tspan x="523.33319" y="354.19229"> ...</tspan><tspan x="523.33319" y="372.52567">};</tspan></text>
|
||||
</g>
|
||||
</svg>
|
After Width: | Height: | Size: 7.6 KiB |
57
Documentation/dev-tools/kunit/run_manual.rst
Normal file
@ -0,0 +1,57 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
============================
|
||||
Run Tests without kunit_tool
|
||||
============================
|
||||
|
||||
If we do not want to use kunit_tool (For example: we want to integrate
|
||||
with other systems, or run tests on real hardware), we can
|
||||
include KUnit in any kernel, read out results, and parse manually.
|
||||
|
||||
.. note:: KUnit is not designed for use in a production system. It is
|
||||
possible that tests may reduce the stability or security of
|
||||
the system.
|
||||
|
||||
Configure the Kernel
|
||||
====================
|
||||
|
||||
KUnit tests can run without kunit_tool. This can be useful if:
|
||||
|
||||
- We have an existing kernel configuration to test.
|
||||
- We need to run on real hardware (or using an emulator/VM kunit_tool
|
||||
does not support).
|
||||
- We wish to integrate with some existing testing systems.
|
||||
|
||||
KUnit is configured with the ``CONFIG_KUNIT`` option, and individual
|
||||
tests can also be built by enabling their config options in our
|
||||
``.config``. KUnit tests usually (but don't always) have config options
|
||||
ending in ``_KUNIT_TEST``. Most tests can either be built as a module,
|
||||
or be built into the kernel.
|
||||
|
||||
.. note ::
|
||||
|
||||
We can enable the ``KUNIT_ALL_TESTS`` config option to
|
||||
automatically enable all tests with satisfied dependencies. This is
|
||||
a good way of quickly testing everything applicable to the current
|
||||
config.
|
||||
|
||||
Once we have built our kernel (and/or modules), it is simple to run
|
||||
the tests. If the tests are built-in, they will run automatically during
|
||||
kernel boot. The results will be written to the kernel log (``dmesg``)
|
||||
in TAP format.
|
||||
|
||||
If the tests are built as modules, they will run when the module is
|
||||
loaded.
|
||||
|
||||
.. code-block :: bash
|
||||
|
||||
# modprobe example-test
|
||||
|
||||
The results will appear in TAP format in ``dmesg``.
|
||||
|
||||
.. note ::
|
||||
|
||||
If ``CONFIG_KUNIT_DEBUGFS`` is enabled, KUnit test results will
|
||||
be accessible from the ``debugfs`` filesystem (if mounted).
|
||||
They will be in ``/sys/kernel/debug/kunit/<test_suite>/results``, in
|
||||
TAP format.
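
When parsing the results manually, a short script is often enough to
summarize the pass/fail lines. The following is a minimal sketch in Python.
It assumes each result line has the form ``ok N - name`` or
``not ok N - name``, optionally indented and preceded by a ``dmesg``
timestamp. The function name ``summarize_tap`` and the sample log are
hypothetical, and this is not how kunit_tool itself parses results.

.. code-block:: python

	import re

	# Optional "[   1.234567] " timestamp, optional indentation, then the
	# TAP result line. This pattern is an assumption about the log layout.
	_RESULT = re.compile(
	    r"^(?:\[\s*\d+\.\d+\]\s)?\s*(not ok|ok) (\d+)(?: - (.*))?$")

	def summarize_tap(log_text):
	    """Return (passed, failed) lists of test names found in log_text."""
	    passed, failed = [], []
	    for line in log_text.splitlines():
	        match = _RESULT.match(line)
	        if not match:
	            continue  # unrelated kernel log line or TAP diagnostic
	        status, _number, name = match.groups()
	        (passed if status == "ok" else failed).append(name or "")
	    return passed, failed

	# Hypothetical log excerpt, for illustration only.
	SAMPLE_LOG = """\
	[    1.000000] TAP version 14
	[    1.000100]     # Subtest: example
	[    1.000200]     ok 1 - example_simple_test
	[    1.000300]     not ok 2 - example_failing_test
	[    1.000400] not ok 1 - example
	[    1.000500] some unrelated kernel message
	"""

	if __name__ == "__main__":
	    passed, failed = summarize_tap(SAMPLE_LOG)
	    print(f"{len(passed)} passed, {len(failed)} failed")

The input text could come from ``dmesg`` or, if ``CONFIG_KUNIT_DEBUGFS``
is enabled, from a suite's ``results`` file.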
|
247
Documentation/dev-tools/kunit/run_wrapper.rst
Normal file
@ -0,0 +1,247 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=========================
|
||||
Run Tests with kunit_tool
|
||||
=========================
|
||||
|
||||
We can either run KUnit tests using kunit_tool, or run tests
|
||||
manually, and then use kunit_tool to parse the results. To run tests
|
||||
manually, see: Documentation/dev-tools/kunit/run_manual.rst.
|
||||
As long as we can build the kernel, we can run KUnit.
|
||||
|
||||
kunit_tool is a Python script which configures and builds a kernel, runs
|
||||
tests, and formats the test results.
|
||||
|
||||
Run command:
|
||||
|
||||
.. code-block::
|
||||
|
||||
./tools/testing/kunit/kunit.py run
|
||||
|
||||
We should see the following:
|
||||
|
||||
.. code-block::
|
||||
|
||||
Generating .config...
|
||||
Building KUnit kernel...
|
||||
Starting KUnit kernel...
|
||||
|
||||
We may want to use the following options:
|
||||
|
||||
.. code-block::
|
||||
|
||||
./tools/testing/kunit/kunit.py run --timeout=30 --jobs=`nproc --all`
|
||||
|
||||
- ``--timeout`` sets a maximum amount of time for tests to run.
|
||||
- ``--jobs`` sets the number of threads used to build the kernel.
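
The same invocation can be assembled from a script. This is a minimal
sketch in Python, assuming it is run from the top of the kernel tree.
``build_run_command`` is a hypothetical helper, and ``os.cpu_count()`` is
used here as a stand-in for ``nproc --all``.

.. code-block:: python

	import os

	KUNIT_PY = "./tools/testing/kunit/kunit.py"

	def build_run_command(timeout=30, jobs=None):
	    """Return the argument list for a ``kunit.py run`` invocation."""
	    if jobs is None:
	        # cpu_count() may return None on unusual platforms.
	        jobs = os.cpu_count() or 1
	    return [KUNIT_PY, "run", f"--timeout={timeout}", f"--jobs={jobs}"]

	if __name__ == "__main__":
	    # Print the command instead of running it, so the sketch is safe
	    # to try outside a kernel tree.
	    print(" ".join(build_run_command()))

The returned list can be passed directly to ``subprocess.run()``.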
|
||||
|
||||
kunit_tool will generate a ``.kunitconfig`` with a default
|
||||
configuration, if no other ``.kunitconfig`` file exists
|
||||
(in the build directory). In addition, it verifies that the
|
||||
generated ``.config`` file contains the ``CONFIG`` options in the
|
||||
``.kunitconfig``.
|
||||
It is also possible to pass a separate ``.kunitconfig`` fragment to
|
||||
kunit_tool. This is useful if we have several different groups of
|
||||
tests we want to run independently, or if we want to use pre-defined
|
||||
test configs for certain subsystems.
|
||||
|
||||
To use a different ``.kunitconfig`` file (such as one
|
||||
provided to test a particular subsystem), pass it as an option:
|
||||
|
||||
.. code-block::
|
||||
|
||||
./tools/testing/kunit/kunit.py run --kunitconfig=fs/ext4/.kunitconfig
|
||||
|
||||
To view kunit_tool flags (optional command-line arguments), run:
|
||||
|
||||
.. code-block::
|
||||
|
||||
./tools/testing/kunit/kunit.py run --help
|
||||
|
||||
Create a ``.kunitconfig`` File
|
||||
===============================
|
||||
|
||||
If we want to run a specific set of tests (rather than those listed
|
||||
in the KUnit ``defconfig``), we can provide Kconfig options in the
|
||||
``.kunitconfig`` file. For the default ``.kunitconfig``, see:
|
||||
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/kunit/configs/default.config.
|
||||
A ``.kunitconfig`` is a ``minconfig`` (a .config
|
||||
generated by running ``make savedefconfig``), used for running a
|
||||
specific set of tests. This file contains the regular Kernel configs
|
||||
with specific test targets. The ``.kunitconfig`` also
|
||||
contains any other config options required by the tests (For example:
|
||||
dependencies for features under tests, configs that enable/disable
|
||||
certain code blocks, arch configs and so on).
|
||||
|
||||
To create a ``.kunitconfig``, using the KUnit ``defconfig``:
|
||||
|
||||
.. code-block::
|
||||
|
||||
cd $PATH_TO_LINUX_REPO
|
||||
cp tools/testing/kunit/configs/default.config .kunit/.kunitconfig
|
||||
|
||||
We can then add any other Kconfig options. For example:
|
||||
|
||||
.. code-block::
|
||||
|
||||
CONFIG_LIST_KUNIT_TEST=y
|
||||
|
||||
kunit_tool ensures that all config options in ``.kunitconfig`` are
|
||||
set in the kernel ``.config`` before running the tests. It warns if we
|
||||
have not included the options' dependencies.
|
||||
|
||||
.. note:: Removing something from the ``.kunitconfig`` will
|
||||
not rebuild the ``.config`` file. The configuration is only
|
||||
updated if the ``.kunitconfig`` is not a subset of ``.config``.
|
||||
This means that we can use other tools
|
||||
(For example: ``make menuconfig``) to adjust other config options.
|
||||
The build dir needs to be set for ``make menuconfig`` to
|
||||
work, therefore by default use ``make O=.kunit menuconfig``.
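
The subset rule in the note can be modelled in a few lines. The sketch
below is a simplified illustration in Python, not kunit_tool's actual
implementation: ``parse_kconfig`` and ``is_subset`` are hypothetical
helpers, and an option that is absent from a file is assumed to be
equivalent to ``# CONFIG_FOO is not set``.

.. code-block:: python

	def parse_kconfig(text):
	    """Parse Kconfig fragment text into a {name: value} dict."""
	    entries = {}
	    for line in text.splitlines():
	        line = line.strip()
	        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
	            name = line[len("# "):-len(" is not set")]
	            entries[name] = "n"
	        elif line and not line.startswith("#") and "=" in line:
	            name, value = line.split("=", 1)
	            entries[name] = value
	    return entries

	def is_subset(kunitconfig, config):
	    """True if every option in kunitconfig has the same value in config."""
	    # Assumption: an option missing from config counts as "n".
	    return all(config.get(name, "n") == value
	               for name, value in kunitconfig.items())

	if __name__ == "__main__":
	    wanted = parse_kconfig("CONFIG_KUNIT=y\nCONFIG_LIST_KUNIT_TEST=y\n")
	    current = parse_kconfig("CONFIG_KUNIT=y\nCONFIG_LIST_KUNIT_TEST=y\n"
	                            "CONFIG_PRINTK=y\n")
	    # Extra options in .config do not break the subset relation.
	    print(is_subset(wanted, current))

Under this model, adding ``CONFIG_PRINTK=y`` to ``.config`` by hand keeps
the subset relation intact, whereas adding a new option to
``.kunitconfig`` breaks it and would require ``.config`` to be updated.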
|
||||
|
||||
Configure, Build, and Run Tests
|
||||
===============================
|
||||
|
||||
If we want to make manual changes to the KUnit build process, we
|
||||
can run part of the KUnit build process independently.
|
||||
When running kunit_tool, we can generate a ``.config`` from a
|
||||
``.kunitconfig`` by using the ``config`` argument:
|
||||
|
||||
.. code-block::
|
||||
|
||||
./tools/testing/kunit/kunit.py config
|
||||
|
||||
To build a KUnit kernel from the current ``.config``, we can use the
|
||||
``build`` argument:
|
||||
|
||||
.. code-block::
|
||||
|
||||
./tools/testing/kunit/kunit.py build
|
||||
|
||||
If we have already built a UML kernel with built-in KUnit tests, we
|
||||
can run the kernel, and display the test results with the ``exec``
|
||||
argument:
|
||||
|
||||
.. code-block::
|
||||
|
||||
./tools/testing/kunit/kunit.py exec
|
||||
|
||||
The ``run`` command discussed in section: **Run Tests with kunit_tool**,
|
||||
is equivalent to running the above three commands in sequence.
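
The sequence of stages can be sketched as a small Python driver. This is
an illustration under assumptions rather than part of kunit_tool:
``run_stages`` is a hypothetical helper, and the ``runner`` parameter
exists only so that the sequence can be checked without a kernel tree.

.. code-block:: python

	import subprocess

	KUNIT_PY = "./tools/testing/kunit/kunit.py"
	STAGES = ("config", "build", "exec")

	def run_stages(runner=subprocess.run):
	    """Run the config, build and exec stages in order."""
	    for stage in STAGES:
	        # check=True makes subprocess.run() raise CalledProcessError
	        # if a stage fails, so later stages are not attempted.
	        runner([KUNIT_PY, stage], check=True)

Running the stages separately leaves room to inspect or adjust the
``.config`` between ``config`` and ``build``.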
|
||||
|
||||
Parse Test Results
|
||||
==================
|
||||
|
||||
KUnit test output is written in TAP (Test Anything Protocol)
|
||||
format. When running tests, kunit_tool parses this output and prints
|
||||
a summary. To see the raw test results in TAP format, we can pass the
|
||||
``--raw_output`` argument:
|
||||
|
||||
.. code-block::
|
||||
|
||||
./tools/testing/kunit/kunit.py run --raw_output
|
||||
|
||||
If we have KUnit results in the raw TAP format, we can parse them and
|
||||
print the human-readable summary with the ``parse`` command for
|
||||
kunit_tool. This accepts a filename for an argument, or will read from
|
||||
standard input.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Reading from a file
|
||||
./tools/testing/kunit/kunit.py parse /var/log/dmesg
|
||||
# Reading from stdin
|
||||
dmesg | ./tools/testing/kunit/kunit.py parse
|
||||
|
||||
Run Selected Test Suites
|
||||
========================
|
||||
|
||||
By passing a bash style glob filter to the ``exec`` or ``run``
|
||||
commands, we can run a subset of the tests built into a kernel. For
|
||||
example: if we only want to run KUnit resource tests, use:
|
||||
|
||||
.. code-block::
|
||||
|
||||
./tools/testing/kunit/kunit.py run 'kunit-resource*'
|
||||
|
||||
This uses the standard glob format with wildcard characters.
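
To get a feel for which suites a filter selects, the matching can be
approximated with Python's ``fnmatch`` module. This sketch only
illustrates glob semantics; it makes no claim about how or where
kunit_tool applies the filter, and the suite names are examples chosen
for illustration.

.. code-block:: python

	from fnmatch import fnmatchcase

	def filter_suites(suite_names, glob):
	    """Return the suite names that match a shell-style glob."""
	    return [name for name in suite_names if fnmatchcase(name, glob)]

	# Example suite names, for illustration only.
	SUITES = ["kunit-resource-test", "kunit-try-catch-test", "list-kunit-test"]

	if __name__ == "__main__":
	    print(filter_suites(SUITES, "kunit-resource*"))

Quoting the glob on the command line, as in the ``run`` example above,
keeps the shell from expanding it against file names in the current
directory.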
|
||||
|
||||
Run Tests on qemu
|
||||
=================
|
||||
|
||||
kunit_tool supports running tests on qemu as well as
|
||||
via UML. To run tests on qemu, by default it requires two flags:
|
||||
|
||||
- ``--arch``: Selects a collection of configs (Kconfig, qemu config options
|
||||
and so on) that allow KUnit tests to be run on the specified
|
||||
architecture in a minimal way. The architecture argument is the same as
|
||||
the option name passed to the ``ARCH`` variable used by Kbuild.
|
||||
Not all architectures currently support this flag, but we can use
|
||||
``--qemu_config`` to handle it. If ``um`` is passed (or this flag
|
||||
is omitted), the tests will run via UML. Non-UML architectures,
|
||||
for example: i386, x86_64, arm and so on, run on qemu.
|
||||
|
||||
- ``--cross_compile``: Specifies the Kbuild toolchain. It takes the
|
||||
same argument as passed to the ``CROSS_COMPILE`` variable used by
|
||||
Kbuild. As a reminder, this will be the prefix for the toolchain
|
||||
binaries such as GCC. For example:
|
||||
|
||||
- ``sparc64-linux-gnu`` if we have the sparc toolchain installed on
|
||||
our system.
|
||||
|
||||
- ``$HOME/toolchains/microblaze/gcc-9.2.0-nolibc/microblaze-linux/bin/microblaze-linux``
|
||||
if we have downloaded the microblaze toolchain from the 0-day
|
||||
website to a directory in our home directory called toolchains.
|
||||
|
||||
If we want to run KUnit tests on an architecture not supported by
|
||||
the ``--arch`` flag, or want to run KUnit tests on qemu using a
|
||||
non-default configuration, then we can write our own ``QemuConfig``.
|
||||
These ``QemuConfigs`` are written in Python. They have an import line
|
||||
``from ..qemu_config import QemuArchParams`` at the top of the file.
|
||||
The file must contain a variable called ``QEMU_ARCH`` that has an
|
||||
instance of ``QemuArchParams`` assigned to it. See example in:
|
||||
``tools/testing/kunit/qemu_configs/x86_64.py``.
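
As a rough sketch of the shape of such a file, the Python below defines a
``QEMU_ARCH`` variable. To keep the sketch runnable on its own,
``QemuArchParams`` is replaced by a local stand-in; its field names and
all values are assumptions modelled on an x86_64 setup and should be
checked against the ``qemu_config`` module and the in-tree example before
use.

.. code-block:: python

	from collections import namedtuple

	# Stand-in so that this sketch runs on its own. A real QemuConfig
	# would instead begin with:
	#     from ..qemu_config import QemuArchParams
	# The field names below are assumptions, not a confirmed API.
	QemuArchParams = namedtuple('QemuArchParams', [
	    'linux_arch',           # value for the Kbuild ARCH variable
	    'kconfig',              # extra Kconfig options for this target
	    'qemu_arch',            # suffix selecting the qemu-system binary
	    'kernel_path',          # built kernel image, relative to build dir
	    'kernel_command_line',  # kernel command line passed at boot
	    'extra_qemu_params',    # additional qemu arguments
	])

	# kunit_tool looks for a variable with exactly this name.
	QEMU_ARCH = QemuArchParams(
	    linux_arch='x86_64',
	    kconfig='CONFIG_SERIAL_8250=y\nCONFIG_SERIAL_8250_CONSOLE=y',
	    qemu_arch='x86_64',
	    kernel_path='arch/x86/boot/bzImage',
	    kernel_command_line='console=ttyS0',
	    extra_qemu_params=[])

The serial console options matter because the test results are read from
the kernel's console output.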
|
||||
|
||||
Once we have a ``QemuConfig``, we can pass it into kunit_tool,
|
||||
using the ``--qemu_config`` flag. When used, this flag replaces the
|
||||
``--arch`` flag. For example: using
|
||||
``tools/testing/kunit/qemu_configs/x86_64.py``, the invocation appears
|
||||
as:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
./tools/testing/kunit/kunit.py run \
|
||||
--timeout=60 \
|
||||
--jobs=12 \
|
||||
--qemu_config=./tools/testing/kunit/qemu_configs/x86_64.py
|
||||
|
||||
To run existing KUnit tests on non-UML architectures, see:
|
||||
Documentation/dev-tools/kunit/non_uml.rst.
|
||||
|
||||
Command-Line Arguments
|
||||
======================
|
||||
|
||||
kunit_tool has a number of other command-line arguments which can
|
||||
be useful for our test environment. Below are the most commonly used
|
||||
command-line arguments:
|
||||
|
||||
- ``--help``: Lists all available options. To list common options,
|
||||
place ``--help`` before the command. To list options specific to that
|
||||
command, place ``--help`` after the command.
|
||||
|
||||
.. note:: Different commands (``config``, ``build``, ``run``, etc)
|
||||
have different supported options.
|
||||
- ``--build_dir``: Specifies the kunit_tool build directory. It includes
|
||||
the ``.kunitconfig``, ``.config`` files and compiled kernel.
|
||||
|
||||
- ``--make_options``: Specifies additional options to pass to make, when
|
||||
compiling a kernel (using ``build`` or ``run`` commands). For example:
|
||||
to enable compiler warnings, we can pass ``--make_options W=1``.
|
||||
|
||||
- ``--alltests``: Builds a UML kernel with all config options enabled
|
||||
using ``make allyesconfig``. This allows us to run as many tests as
|
||||
possible.
|
||||
|
||||
.. note:: It is slow and prone to breakage as new options are
|
||||
added or modified. Instead, enable all tests
|
||||
which have satisfied dependencies by adding
|
||||
``CONFIG_KUNIT_ALL_TESTS=y`` to your ``.kunitconfig``.
|
@ -4,132 +4,137 @@
|
||||
Getting Started
|
||||
===============
|
||||
|
||||
Installing dependencies
|
||||
Installing Dependencies
|
||||
=======================
|
||||
KUnit has the same dependencies as the Linux kernel. As long as you can build
|
||||
the kernel, you can run KUnit.
|
||||
KUnit has the same dependencies as the Linux kernel. As long as you can
|
||||
build the kernel, you can run KUnit.
|
||||
|
||||
Running tests with the KUnit Wrapper
|
||||
====================================
|
||||
Included with KUnit is a simple Python wrapper which runs tests under User Mode
|
||||
Linux, and formats the test results.
|
||||
|
||||
The wrapper can be run with:
|
||||
Running tests with kunit_tool
|
||||
=============================
|
||||
kunit_tool is a Python script, which configures and builds a kernel, runs
|
||||
tests, and formats the test results. From the kernel repository, you
|
||||
can run kunit_tool:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
./tools/testing/kunit/kunit.py run
|
||||
|
||||
For more information on this wrapper (also called kunit_tool) check out the
|
||||
Documentation/dev-tools/kunit/kunit-tool.rst page.
|
||||
For more information on this wrapper, see:
|
||||
Documentation/dev-tools/kunit/run_wrapper.rst.
|
||||
|
||||
Creating a .kunitconfig
|
||||
-----------------------
|
||||
If you want to run a specific set of tests (rather than those listed in the
|
||||
KUnit defconfig), you can provide Kconfig options in the ``.kunitconfig`` file.
|
||||
This file essentially contains the regular Kernel config, with the specific
|
||||
test targets as well. The ``.kunitconfig`` should also contain any other config
|
||||
options required by the tests.
|
||||
Creating a ``.kunitconfig``
|
||||
---------------------------
|
||||
|
||||
A good starting point for a ``.kunitconfig`` is the KUnit defconfig:
|
||||
By default, kunit_tool runs a selection of tests. However, you can specify which
|
||||
unit tests to run by creating a ``.kunitconfig`` file with kernel config options
|
||||
that enable only a specific set of tests and their dependencies.
|
||||
The ``.kunitconfig`` file contains a list of kconfig options which are required
|
||||
to run the desired targets. The ``.kunitconfig`` also contains any other test
|
||||
specific config options, such as test dependencies. For example: the
|
||||
``FAT_FS`` tests (``FAT_KUNIT_TEST``) depend on
|
||||
``FAT_FS``. ``FAT_FS`` can be enabled by selecting either ``MSDOS_FS``
|
||||
or ``VFAT_FS``. To run ``FAT_KUNIT_TEST``, the ``.kunitconfig`` has:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
CONFIG_KUNIT=y
|
||||
CONFIG_MSDOS_FS=y
|
||||
CONFIG_FAT_KUNIT_TEST=y
|
||||
|
||||
1. A good starting point for the ``.kunitconfig``, is the KUnit default
|
||||
config. Run the command:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
cd $PATH_TO_LINUX_REPO
|
||||
cp tools/testing/kunit/configs/default.config .kunitconfig
|
||||
|
||||
You can then add any other Kconfig options you wish, e.g.:
|
||||
.. note ::
|
||||
You may want to remove CONFIG_KUNIT_ALL_TESTS from the ``.kunitconfig`` as
|
||||
it will enable a number of additional tests that you may not want.
|
||||
|
||||
2. You can then add any other Kconfig options, for example:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
CONFIG_LIST_KUNIT_TEST=y
|
||||
|
||||
:doc:`kunit_tool <kunit-tool>` will ensure that all config options set in
|
||||
``.kunitconfig`` are set in the kernel ``.config`` before running the tests.
|
||||
It'll warn you if you haven't included the dependencies of the options you're
|
||||
using.
|
||||
Before running the tests, kunit_tool ensures that all config options
|
||||
set in ``.kunitconfig`` are set in the kernel ``.config``. It will warn
|
||||
you if you have not included dependencies for the options used.
|
||||
|
||||
.. note::
|
||||
Note that removing something from the ``.kunitconfig`` will not trigger a
|
||||
rebuild of the ``.config`` file: the configuration is only updated if the
|
||||
``.kunitconfig`` is not a subset of ``.config``. This means that you can use
|
||||
other tools (such as make menuconfig) to adjust other config options.
|
||||
.. note ::
|
||||
If you change the ``.kunitconfig``, kunit.py will trigger a rebuild of the
|
||||
``.config`` file. But you can edit the ``.config`` file directly or with
|
||||
tools like ``make menuconfig O=.kunit``. As long as it is a superset of
|
||||
``.kunitconfig``, kunit.py won't overwrite your changes.
|
||||
|
||||
|
||||
Running the tests (KUnit Wrapper)
|
||||
---------------------------------
|
||||
|
||||
To make sure that everything is set up correctly, simply invoke the Python
|
||||
wrapper from your kernel repo:
|
||||
Running Tests (KUnit Wrapper)
|
||||
-----------------------------
|
||||
1. To make sure that everything is set up correctly, invoke the Python
|
||||
wrapper from your kernel repository:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
./tools/testing/kunit/kunit.py run
|
||||
|
||||
.. note::
|
||||
You may want to run ``make mrproper`` first.
|
||||
|
||||
If everything worked correctly, you should see the following:
|
||||
|
||||
.. code-block:: bash
|
||||
.. code-block::
|
||||
|
||||
Generating .config ...
|
||||
Building KUnit Kernel ...
|
||||
Starting KUnit Kernel ...
|
||||
|
||||
followed by a list of tests that are run. All of them should be passing.
|
||||
The tests will pass or fail.
|
||||
|
||||
.. note::
|
||||
Because it is building a lot of sources for the first time, the
|
||||
``Building KUnit kernel`` step may take a while.
|
||||
.. note ::
|
||||
Because it is building a lot of sources for the first time, the
|
||||
``Building KUnit kernel`` step may take a while.
|
||||
|
||||
Running tests without the KUnit Wrapper
|
||||
Running Tests without the KUnit Wrapper
|
||||
=======================================
|
||||
If you do not want to use the KUnit Wrapper (for example: you want code
|
||||
under test to integrate with other systems, or use a different/
|
||||
unsupported architecture or configuration), KUnit can be included in
|
||||
any kernel, and the results are read out and parsed manually.
|
||||
|
||||
If you'd rather not use the KUnit Wrapper (if, for example, you need to
|
||||
integrate with other systems, or use an architecture other than UML), KUnit can
|
||||
be included in any kernel, and the results read out and parsed manually.
|
||||
.. note ::
|
||||
``CONFIG_KUNIT`` should not be enabled in a production environment.
|
||||
Enabling KUnit disables Kernel Address-Space Layout Randomization
|
||||
(KASLR), and tests may affect the state of the kernel in ways not
|
||||
suitable for production.
|
||||
|
||||
.. note::
|
||||
KUnit is not designed for use in a production system, and it's possible that
|
||||
tests may reduce the stability or security of the system.
|
||||
|
||||
|
||||
|
||||
Configuring the kernel
|
||||
Configuring the Kernel
|
||||
----------------------
|
||||
To enable KUnit itself, you need to enable the ``CONFIG_KUNIT`` Kconfig
|
||||
option (under Kernel Hacking/Kernel Testing and Coverage in
|
||||
``menuconfig``). From there, you can enable any KUnit tests. They
|
||||
usually have config options ending in ``_KUNIT_TEST``.
|
||||
|
||||
In order to enable KUnit itself, you simply need to enable the ``CONFIG_KUNIT``
|
||||
Kconfig option (it's under Kernel Hacking/Kernel Testing and Coverage in
|
||||
menuconfig). From there, you can enable any KUnit tests you want: they usually
|
||||
have config options ending in ``_KUNIT_TEST``.
|
||||
KUnit and KUnit tests can be compiled as modules. The tests in a module
|
||||
will run when the module is loaded.
|
||||
|
||||
KUnit and KUnit tests can be compiled as modules: in this case the tests in a
|
||||
module will be run when the module is loaded.
|
||||
|
||||
|
||||
Running the tests (w/o KUnit Wrapper)
|
||||
Running Tests (without KUnit Wrapper)
|
||||
-------------------------------------
|
||||
Build and run your kernel. In the kernel log, the test output is printed
|
||||
out in the TAP format. This will only happen by default if KUnit/tests
|
||||
are built-in. Otherwise the module will need to be loaded.
|
||||
|
||||
Build and run your kernel as usual. Test output will be written to the kernel
|
||||
log in `TAP <https://testanything.org/>`_ format.
|
||||
.. note ::
|
||||
Some lines and/or data may get interspersed in the TAP output.
|
||||
|
||||
.. note::
|
||||
It's possible that there will be other lines and/or data interspersed in the
|
||||
TAP output.
|
||||
|
||||
|
||||
Writing your first test
|
||||
Writing Your First Test
|
||||
=======================
|
||||
In your kernel repository, let's add some code that we can test.
|
||||
|
||||
In your kernel repo let's add some code that we can test. Create a file
|
||||
``drivers/misc/example.h`` with the contents:
|
||||
1. Create a file ``drivers/misc/example.h``, which includes:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int misc_example_add(int left, int right);
|
||||
|
||||
create a file ``drivers/misc/example.c``:
|
||||
2. Create a file ``drivers/misc/example.c``, which includes:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
@ -142,21 +147,22 @@ create a file ``drivers/misc/example.c``:
|
||||
return left + right;
|
||||
}
|
||||
|
||||
Now add the following lines to ``drivers/misc/Kconfig``:
|
||||
3. Add the following lines to ``drivers/misc/Kconfig``:
|
||||
|
||||
.. code-block:: kconfig
|
||||
|
||||
config MISC_EXAMPLE
|
||||
bool "My example"
|
||||
|
||||
and the following lines to ``drivers/misc/Makefile``:
|
||||
4. Add the following lines to ``drivers/misc/Makefile``:
|
||||
|
||||
.. code-block:: make
|
||||
|
||||
obj-$(CONFIG_MISC_EXAMPLE) += example.o
|
||||
|
||||
Now we are ready to write the test. The test will be in
|
||||
``drivers/misc/example-test.c``:
|
||||
Now we are ready to write the test cases.
|
||||
|
||||
1. Add the below test case in ``drivers/misc/example_test.c``:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
@ -191,7 +197,7 @@ Now we are ready to write the test. The test will be in
|
||||
};
|
||||
kunit_test_suite(misc_example_test_suite);
|
||||
|
||||
Now add the following to ``drivers/misc/Kconfig``:
|
||||
2. Add the following lines to ``drivers/misc/Kconfig``:
|
||||
|
||||
.. code-block:: kconfig
|
||||
|
||||
@ -200,20 +206,20 @@ Now add the following to ``drivers/misc/Kconfig``:
|
||||
depends on MISC_EXAMPLE && KUNIT=y
|
||||
default KUNIT_ALL_TESTS
|
||||
|
||||
and the following to ``drivers/misc/Makefile``:
|
||||
3. Add the following lines to ``drivers/misc/Makefile``:
|
||||
|
||||
.. code-block:: make
|
||||
|
||||
obj-$(CONFIG_MISC_EXAMPLE_TEST) += example-test.o
|
||||
obj-$(CONFIG_MISC_EXAMPLE_TEST) += example_test.o
|
||||
|
||||
Now add it to your ``.kunitconfig``:
|
||||
4. Add the following lines to ``.kunitconfig``:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
CONFIG_MISC_EXAMPLE=y
|
||||
CONFIG_MISC_EXAMPLE_TEST=y
|
||||
|
||||
Now you can run the test:
|
||||
5. Run the test:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
@ -227,16 +233,23 @@ You should see the following failure:
|
||||
[16:08:57] [PASSED] misc-example:misc_example_add_test_basic
|
||||
[16:08:57] [FAILED] misc-example:misc_example_test_failure
|
||||
[16:08:57] EXPECTATION FAILED at drivers/misc/example-test.c:17
|
||||
[16:08:57] This test never passes.
|
||||
[16:08:57] This test never passes.
|
||||
...
|
||||
|
||||
Congrats! You just wrote your first KUnit test!
|
||||
Congrats! You just wrote your first KUnit test.
|
||||
|
||||
Next Steps
|
||||
==========
|
||||
* Check out the Documentation/dev-tools/kunit/tips.rst page for tips on
|
||||
writing idiomatic KUnit tests.
|
||||
* Check out the :doc:`running_tips` page for tips on
|
||||
how to make running KUnit tests easier.
|
||||
* Optional: see the :doc:`usage` page for a more
|
||||
in-depth explanation of KUnit.
|
||||
|
||||
* Documentation/dev-tools/kunit/architecture.rst - KUnit architecture.
|
||||
* Documentation/dev-tools/kunit/run_wrapper.rst - run kunit_tool.
|
||||
* Documentation/dev-tools/kunit/run_manual.rst - run tests without kunit_tool.
|
||||
* Documentation/dev-tools/kunit/usage.rst - write tests.
|
||||
* Documentation/dev-tools/kunit/tips.rst - best practices with
|
||||
examples.
|
||||
* Documentation/dev-tools/kunit/api/index.rst - KUnit APIs
|
||||
used for testing.
|
||||
* Documentation/dev-tools/kunit/kunit-tool.rst - kunit_tool helper
|
||||
script.
|
||||
* Documentation/dev-tools/kunit/faq.rst - KUnit common questions and
|
||||
answers.
|
||||
|