mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
synced 2025-01-15 21:23:23 +00:00
Merge commit 'v2.6.31-rc1' into dmaengine
This commit is contained in:
commit
a348a7e6fd
10
.gitignore
vendored
10
.gitignore
vendored
@ -3,7 +3,7 @@
|
||||
# subdirectories here. Add them in the ".gitignore" file
|
||||
# in that subdirectory instead.
|
||||
#
|
||||
# NOTE! Please use 'git-ls-files -i --exclude-standard'
|
||||
# NOTE! Please use 'git ls-files -i --exclude-standard'
|
||||
# command after changing this file, to see if there are
|
||||
# any tracked files which get ignored after the change.
|
||||
#
|
||||
@ -25,6 +25,8 @@
|
||||
*.elf
|
||||
*.bin
|
||||
*.gz
|
||||
*.lzma
|
||||
*.patch
|
||||
|
||||
#
|
||||
# Top-level generic files
|
||||
@ -62,6 +64,12 @@ series
|
||||
cscope.*
|
||||
ncscope.*
|
||||
|
||||
# gnu global files
|
||||
GPATH
|
||||
GRTAGS
|
||||
GSYMS
|
||||
GTAGS
|
||||
|
||||
*.orig
|
||||
*~
|
||||
\#*#
|
||||
|
4
CREDITS
4
CREDITS
@ -1253,6 +1253,10 @@ S: 8124 Constitution Apt. 7
|
||||
S: Sterling Heights, Michigan 48313
|
||||
S: USA
|
||||
|
||||
N: Wolfgang Grandegger
|
||||
E: wg@grandegger.com
|
||||
D: Controller Area Network (device drivers)
|
||||
|
||||
N: William Greathouse
|
||||
E: wgreathouse@smva.com
|
||||
E: wgreathouse@myfavoritei.com
|
||||
|
@ -60,3 +60,62 @@ Description:
|
||||
Indicates whether the block layer should automatically
|
||||
generate checksums for write requests bound for
|
||||
devices that support receiving integrity metadata.
|
||||
|
||||
What: /sys/block/<disk>/alignment_offset
|
||||
Date: April 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Storage devices may report a physical block size that is
|
||||
bigger than the logical block size (for instance a drive
|
||||
with 4KB physical sectors exposing 512-byte logical
|
||||
blocks to the operating system). This parameter
|
||||
indicates how many bytes the beginning of the device is
|
||||
offset from the disk's natural alignment.
|
||||
|
||||
What: /sys/block/<disk>/<partition>/alignment_offset
|
||||
Date: April 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Storage devices may report a physical block size that is
|
||||
bigger than the logical block size (for instance a drive
|
||||
with 4KB physical sectors exposing 512-byte logical
|
||||
blocks to the operating system). This parameter
|
||||
indicates how many bytes the beginning of the partition
|
||||
is offset from the disk's natural alignment.
|
||||
|
||||
What: /sys/block/<disk>/queue/logical_block_size
|
||||
Date: May 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
This is the smallest unit the storage device can
|
||||
address. It is typically 512 bytes.
|
||||
|
||||
What: /sys/block/<disk>/queue/physical_block_size
|
||||
Date: May 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
This is the smallest unit the storage device can write
|
||||
without resorting to read-modify-write operation. It is
|
||||
usually the same as the logical block size but may be
|
||||
bigger. One example is SATA drives with 4KB sectors
|
||||
that expose a 512-byte logical block size to the
|
||||
operating system.
|
||||
|
||||
What: /sys/block/<disk>/queue/minimum_io_size
|
||||
Date: April 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Storage devices may report a preferred minimum I/O size,
|
||||
which is the smallest request the device can perform
|
||||
without incurring a read-modify-write penalty. For disk
|
||||
drives this is often the physical block size. For RAID
|
||||
arrays it is often the stripe chunk size.
|
||||
|
||||
What: /sys/block/<disk>/queue/optimal_io_size
|
||||
Date: April 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Storage devices may report an optimal I/O size, which is
|
||||
the device's preferred unit of receiving I/O. This is
|
||||
rarely reported for disk drives. For RAID devices it is
|
||||
usually the stripe width or the internal block size.
|
||||
|
@ -122,3 +122,10 @@ Description:
|
||||
This symbolic link appears when a device is a Virtual Function.
|
||||
The symbolic link points to the PCI device sysfs entry of the
|
||||
Physical Function this device associates with.
|
||||
|
||||
What: /sys/bus/pci/slots/.../module
|
||||
Date: June 2009
|
||||
Contact: linux-pci@vger.kernel.org
|
||||
Description:
|
||||
This symbolic link points to the PCI hotplug controller driver
|
||||
module that manages the hotplug slot.
|
||||
|
33
Documentation/ABI/testing/sysfs-bus-pci-devices-cciss
Normal file
33
Documentation/ABI/testing/sysfs-bus-pci-devices-cciss
Normal file
@ -0,0 +1,33 @@
|
||||
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/model
|
||||
Date: March 2009
|
||||
Kernel Version: 2.6.30
|
||||
Contact: iss_storagedev@hp.com
|
||||
Description: Displays the SCSI INQUIRY page 0 model for logical drive
|
||||
Y of controller X.
|
||||
|
||||
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/rev
|
||||
Date: March 2009
|
||||
Kernel Version: 2.6.30
|
||||
Contact: iss_storagedev@hp.com
|
||||
Description: Displays the SCSI INQUIRY page 0 revision for logical
|
||||
drive Y of controller X.
|
||||
|
||||
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/unique_id
|
||||
Date: March 2009
|
||||
Kernel Version: 2.6.30
|
||||
Contact: iss_storagedev@hp.com
|
||||
Description: Displays the SCSI INQUIRY page 83 serial number for logical
|
||||
drive Y of controller X.
|
||||
|
||||
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/vendor
|
||||
Date: March 2009
|
||||
Kernel Version: 2.6.30
|
||||
Contact: iss_storagedev@hp.com
|
||||
Description: Displays the SCSI INQUIRY page 0 vendor for logical drive
|
||||
Y of controller X.
|
||||
|
||||
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/block:cciss!cXdY
|
||||
Date: March 2009
|
||||
Kernel Version: 2.6.30
|
||||
Contact: iss_storagedev@hp.com
|
||||
Description: A symbolic link to /sys/block/cciss!cXdY
|
125
Documentation/ABI/testing/sysfs-class-mtd
Normal file
125
Documentation/ABI/testing/sysfs-class-mtd
Normal file
@ -0,0 +1,125 @@
|
||||
What: /sys/class/mtd/
|
||||
Date: April 2009
|
||||
KernelVersion: 2.6.29
|
||||
Contact: linux-mtd@lists.infradead.org
|
||||
Description:
|
||||
The mtd/ class subdirectory belongs to the MTD subsystem
|
||||
(MTD core).
|
||||
|
||||
What: /sys/class/mtd/mtdX/
|
||||
Date: April 2009
|
||||
KernelVersion: 2.6.29
|
||||
Contact: linux-mtd@lists.infradead.org
|
||||
Description:
|
||||
The /sys/class/mtd/mtd{0,1,2,3,...} directories correspond
|
||||
to each /dev/mtdX character device. These may represent
|
||||
physical/simulated flash devices, partitions on a flash
|
||||
device, or concatenated flash devices. They exist regardless
|
||||
of whether CONFIG_MTD_CHAR is actually enabled.
|
||||
|
||||
What: /sys/class/mtd/mtdXro/
|
||||
Date: April 2009
|
||||
KernelVersion: 2.6.29
|
||||
Contact: linux-mtd@lists.infradead.org
|
||||
Description:
|
||||
These directories provide the corresponding read-only device
|
||||
nodes for /sys/class/mtd/mtdX/ . They are only created
|
||||
(for the benefit of udev) if CONFIG_MTD_CHAR is enabled.
|
||||
|
||||
What: /sys/class/mtd/mtdX/dev
|
||||
Date: April 2009
|
||||
KernelVersion: 2.6.29
|
||||
Contact: linux-mtd@lists.infradead.org
|
||||
Description:
|
||||
Major and minor numbers of the character device corresponding
|
||||
to this MTD device (in <major>:<minor> format). This is the
|
||||
read-write device so <minor> will be even.
|
||||
|
||||
What: /sys/class/mtd/mtdXro/dev
|
||||
Date: April 2009
|
||||
KernelVersion: 2.6.29
|
||||
Contact: linux-mtd@lists.infradead.org
|
||||
Description:
|
||||
Major and minor numbers of the character device corresponding
|
||||
to the read-only variant of thie MTD device (in
|
||||
<major>:<minor> format). In this case <minor> will be odd.
|
||||
|
||||
What: /sys/class/mtd/mtdX/erasesize
|
||||
Date: April 2009
|
||||
KernelVersion: 2.6.29
|
||||
Contact: linux-mtd@lists.infradead.org
|
||||
Description:
|
||||
"Major" erase size for the device. If numeraseregions is
|
||||
zero, this is the eraseblock size for the entire device.
|
||||
Otherwise, the MEMGETREGIONCOUNT/MEMGETREGIONINFO ioctls
|
||||
can be used to determine the actual eraseblock layout.
|
||||
|
||||
What: /sys/class/mtd/mtdX/flags
|
||||
Date: April 2009
|
||||
KernelVersion: 2.6.29
|
||||
Contact: linux-mtd@lists.infradead.org
|
||||
Description:
|
||||
A hexadecimal value representing the device flags, ORed
|
||||
together:
|
||||
|
||||
0x0400: MTD_WRITEABLE - device is writable
|
||||
0x0800: MTD_BIT_WRITEABLE - single bits can be flipped
|
||||
0x1000: MTD_NO_ERASE - no erase necessary
|
||||
0x2000: MTD_POWERUP_LOCK - always locked after reset
|
||||
|
||||
What: /sys/class/mtd/mtdX/name
|
||||
Date: April 2009
|
||||
KernelVersion: 2.6.29
|
||||
Contact: linux-mtd@lists.infradead.org
|
||||
Description:
|
||||
A human-readable ASCII name for the device or partition.
|
||||
This will match the name in /proc/mtd .
|
||||
|
||||
What: /sys/class/mtd/mtdX/numeraseregions
|
||||
Date: April 2009
|
||||
KernelVersion: 2.6.29
|
||||
Contact: linux-mtd@lists.infradead.org
|
||||
Description:
|
||||
For devices that have variable eraseblock sizes, this
|
||||
provides the total number of erase regions. Otherwise,
|
||||
it will read back as zero.
|
||||
|
||||
What: /sys/class/mtd/mtdX/oobsize
|
||||
Date: April 2009
|
||||
KernelVersion: 2.6.29
|
||||
Contact: linux-mtd@lists.infradead.org
|
||||
Description:
|
||||
Number of OOB bytes per page.
|
||||
|
||||
What: /sys/class/mtd/mtdX/size
|
||||
Date: April 2009
|
||||
KernelVersion: 2.6.29
|
||||
Contact: linux-mtd@lists.infradead.org
|
||||
Description:
|
||||
Total size of the device/partition, in bytes.
|
||||
|
||||
What: /sys/class/mtd/mtdX/type
|
||||
Date: April 2009
|
||||
KernelVersion: 2.6.29
|
||||
Contact: linux-mtd@lists.infradead.org
|
||||
Description:
|
||||
One of the following ASCII strings, representing the device
|
||||
type:
|
||||
|
||||
absent, ram, rom, nor, nand, dataflash, ubi, unknown
|
||||
|
||||
What: /sys/class/mtd/mtdX/writesize
|
||||
Date: April 2009
|
||||
KernelVersion: 2.6.29
|
||||
Contact: linux-mtd@lists.infradead.org
|
||||
Description:
|
||||
Minimal writable flash unit size. This will always be
|
||||
a positive integer.
|
||||
|
||||
In the case of NOR flash it is 1 (even though individual
|
||||
bits can be cleared).
|
||||
|
||||
In the case of NAND flash it is one NAND page (or a
|
||||
half page, or a quarter page).
|
||||
|
||||
In the case of ECC NOR, it is the ECC block size.
|
18
Documentation/ABI/testing/sysfs-devices-cache_disable
Normal file
18
Documentation/ABI/testing/sysfs-devices-cache_disable
Normal file
@ -0,0 +1,18 @@
|
||||
What: /sys/devices/system/cpu/cpu*/cache/index*/cache_disable_X
|
||||
Date: August 2008
|
||||
KernelVersion: 2.6.27
|
||||
Contact: mark.langsdorf@amd.com
|
||||
Description: These files exist in every cpu's cache index directories.
|
||||
There are currently 2 cache_disable_# files in each
|
||||
directory. Reading from these files on a supported
|
||||
processor will return that cache disable index value
|
||||
for that processor and node. Writing to one of these
|
||||
files will cause the specificed cache index to be disabled.
|
||||
|
||||
Currently, only AMD Family 10h Processors support cache index
|
||||
disable, and only for their L3 caches. See the BIOS and
|
||||
Kernel Developer's Guide at
|
||||
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/31116-Public-GH-BKDG_3.20_2-4-09.pdf
|
||||
for formatting information and other details on the
|
||||
cache index disable.
|
||||
Users: joachim.deguara@amd.com
|
@ -79,3 +79,13 @@ Description:
|
||||
This file is read-only and shows the number of
|
||||
kilobytes of data that have been written to this
|
||||
filesystem since it was mounted.
|
||||
|
||||
What: /sys/fs/ext4/<disk>/inode_goal
|
||||
Date: June 2008
|
||||
Contact: "Theodore Ts'o" <tytso@mit.edu>
|
||||
Description:
|
||||
Tuning parameter which (if non-zero) controls the goal
|
||||
inode used by the inode allocator in p0reference to
|
||||
all other allocation hueristics. This is intended for
|
||||
debugging use only, and should be 0 on production
|
||||
systems.
|
||||
|
73
Documentation/ABI/testing/sysfs-pps
Normal file
73
Documentation/ABI/testing/sysfs-pps
Normal file
@ -0,0 +1,73 @@
|
||||
What: /sys/class/pps/
|
||||
Date: February 2008
|
||||
Contact: Rodolfo Giometti <giometti@linux.it>
|
||||
Description:
|
||||
The /sys/class/pps/ directory will contain files and
|
||||
directories that will provide a unified interface to
|
||||
the PPS sources.
|
||||
|
||||
What: /sys/class/pps/ppsX/
|
||||
Date: February 2008
|
||||
Contact: Rodolfo Giometti <giometti@linux.it>
|
||||
Description:
|
||||
The /sys/class/pps/ppsX/ directory is related to X-th
|
||||
PPS source into the system. Each directory will
|
||||
contain files to manage and control its PPS source.
|
||||
|
||||
What: /sys/class/pps/ppsX/assert
|
||||
Date: February 2008
|
||||
Contact: Rodolfo Giometti <giometti@linux.it>
|
||||
Description:
|
||||
The /sys/class/pps/ppsX/assert file reports the assert events
|
||||
and the assert sequence number of the X-th source in the form:
|
||||
|
||||
<secs>.<nsec>#<sequence>
|
||||
|
||||
If the source has no assert events the content of this file
|
||||
is empty.
|
||||
|
||||
What: /sys/class/pps/ppsX/clear
|
||||
Date: February 2008
|
||||
Contact: Rodolfo Giometti <giometti@linux.it>
|
||||
Description:
|
||||
The /sys/class/pps/ppsX/clear file reports the clear events
|
||||
and the clear sequence number of the X-th source in the form:
|
||||
|
||||
<secs>.<nsec>#<sequence>
|
||||
|
||||
If the source has no clear events the content of this file
|
||||
is empty.
|
||||
|
||||
What: /sys/class/pps/ppsX/mode
|
||||
Date: February 2008
|
||||
Contact: Rodolfo Giometti <giometti@linux.it>
|
||||
Description:
|
||||
The /sys/class/pps/ppsX/mode file reports the functioning
|
||||
mode of the X-th source in hexadecimal encoding.
|
||||
|
||||
Please, refer to linux/include/linux/pps.h for further
|
||||
info.
|
||||
|
||||
What: /sys/class/pps/ppsX/echo
|
||||
Date: February 2008
|
||||
Contact: Rodolfo Giometti <giometti@linux.it>
|
||||
Description:
|
||||
The /sys/class/pps/ppsX/echo file reports if the X-th does
|
||||
or does not support an "echo" function.
|
||||
|
||||
What: /sys/class/pps/ppsX/name
|
||||
Date: February 2008
|
||||
Contact: Rodolfo Giometti <giometti@linux.it>
|
||||
Description:
|
||||
The /sys/class/pps/ppsX/name file reports the name of the
|
||||
X-th source.
|
||||
|
||||
What: /sys/class/pps/ppsX/path
|
||||
Date: February 2008
|
||||
Contact: Rodolfo Giometti <giometti@linux.it>
|
||||
Description:
|
||||
The /sys/class/pps/ppsX/path file reports the path name of
|
||||
the device connected with the X-th source.
|
||||
|
||||
If the source is not connected with any device the content
|
||||
of this file is empty.
|
@ -29,7 +29,7 @@ hardware, for example, you probably needn't concern yourself with
|
||||
isdn4k-utils.
|
||||
|
||||
o Gnu C 3.2 # gcc --version
|
||||
o Gnu make 3.79.1 # make --version
|
||||
o Gnu make 3.80 # make --version
|
||||
o binutils 2.12 # ld -v
|
||||
o util-linux 2.10o # fdformat --version
|
||||
o module-init-tools 0.9.10 # depmod -V
|
||||
@ -48,6 +48,7 @@ o procps 3.2.0 # ps --version
|
||||
o oprofile 0.9 # oprofiled --version
|
||||
o udev 081 # udevinfo -V
|
||||
o grub 0.93 # grub --version
|
||||
o mcelog 0.6
|
||||
|
||||
Kernel compilation
|
||||
==================
|
||||
@ -61,7 +62,7 @@ computer.
|
||||
Make
|
||||
----
|
||||
|
||||
You will need Gnu make 3.79.1 or later to build the kernel.
|
||||
You will need Gnu make 3.80 or later to build the kernel.
|
||||
|
||||
Binutils
|
||||
--------
|
||||
@ -71,6 +72,13 @@ assembling the 16-bit boot code, removing the need for as86 to compile
|
||||
your kernel. This change does, however, mean that you need a recent
|
||||
release of binutils.
|
||||
|
||||
Perl
|
||||
----
|
||||
|
||||
You will need perl 5 and the following modules: Getopt::Long, Getopt::Std,
|
||||
File::Basename, and File::Find to build the kernel.
|
||||
|
||||
|
||||
System utilities
|
||||
================
|
||||
|
||||
@ -276,6 +284,16 @@ before running exportfs or mountd. It is recommended that all NFS
|
||||
services be protected from the internet-at-large by a firewall where
|
||||
that is possible.
|
||||
|
||||
mcelog
|
||||
------
|
||||
|
||||
In Linux 2.6.31+ the i386 kernel needs to run the mcelog utility
|
||||
as a regular cronjob similar to the x86-64 kernel to process and log
|
||||
machine check events when CONFIG_X86_NEW_MCE is enabled. Machine check
|
||||
events are errors reported by the CPU. Processing them is strongly encouraged.
|
||||
All x86-64 kernels since 2.6.4 require the mcelog utility to
|
||||
process machine checks.
|
||||
|
||||
Getting updated software
|
||||
========================
|
||||
|
||||
@ -365,6 +383,10 @@ FUSE
|
||||
----
|
||||
o <http://sourceforge.net/projects/fuse>
|
||||
|
||||
mcelog
|
||||
------
|
||||
o <ftp://ftp.kernel.org/pub/linux/utils/cpu/mce/mcelog/>
|
||||
|
||||
Networking
|
||||
**********
|
||||
|
||||
|
@ -698,8 +698,8 @@ very often is not. Abundant use of the inline keyword leads to a much bigger
|
||||
kernel, which in turn slows the system as a whole down, due to a bigger
|
||||
icache footprint for the CPU and simply because there is less memory
|
||||
available for the pagecache. Just think about it; a pagecache miss causes a
|
||||
disk seek, which easily takes 5 miliseconds. There are a LOT of cpu cycles
|
||||
that can go into these 5 miliseconds.
|
||||
disk seek, which easily takes 5 milliseconds. There are a LOT of cpu cycles
|
||||
that can go into these 5 milliseconds.
|
||||
|
||||
A reasonable rule of thumb is to not put inline at functions that have more
|
||||
than 3 lines of code in them. An exception to this rule are the cases where
|
||||
|
@ -676,8 +676,8 @@ this directory the following files can currently be found:
|
||||
dma-api/all_errors This file contains a numeric value. If this
|
||||
value is not equal to zero the debugging code
|
||||
will print a warning for every error it finds
|
||||
into the kernel log. Be carefull with this
|
||||
option. It can easily flood your logs.
|
||||
into the kernel log. Be careful with this
|
||||
option, as it can easily flood your logs.
|
||||
|
||||
dma-api/disabled This read-only file contains the character 'Y'
|
||||
if the debugging code is disabled. This can
|
||||
@ -704,12 +704,24 @@ this directory the following files can currently be found:
|
||||
The current number of free dma_debug_entries
|
||||
in the allocator.
|
||||
|
||||
dma-api/driver-filter
|
||||
You can write a name of a driver into this file
|
||||
to limit the debug output to requests from that
|
||||
particular driver. Write an empty string to
|
||||
that file to disable the filter and see
|
||||
all errors again.
|
||||
|
||||
If you have this code compiled into your kernel it will be enabled by default.
|
||||
If you want to boot without the bookkeeping anyway you can provide
|
||||
'dma_debug=off' as a boot parameter. This will disable DMA-API debugging.
|
||||
Notice that you can not enable it again at runtime. You have to reboot to do
|
||||
so.
|
||||
|
||||
If you want to see debug messages only for a special device driver you can
|
||||
specify the dma_debug_driver=<drivername> parameter. This will enable the
|
||||
driver filter at boot time. The debug code will only print errors for that
|
||||
driver afterwards. This filter can be disabled or changed later using debugfs.
|
||||
|
||||
When the code disables itself at runtime this is most likely because it ran
|
||||
out of dma_debug_entries. These entries are preallocated at boot. The number
|
||||
of preallocated entries is defined per architecture. If it is too low for you
|
||||
|
@ -13,7 +13,8 @@ DOCBOOKS := z8530book.xml mcabook.xml device-drivers.xml \
|
||||
gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \
|
||||
genericirq.xml s390-drivers.xml uio-howto.xml scsi.xml \
|
||||
mac80211.xml debugobjects.xml sh.xml regulator.xml \
|
||||
alsa-driver-api.xml writing-an-alsa-driver.xml
|
||||
alsa-driver-api.xml writing-an-alsa-driver.xml \
|
||||
tracepoint.xml
|
||||
|
||||
###
|
||||
# The build process is as follows (targets):
|
||||
|
@ -106,7 +106,7 @@
|
||||
number of errors are printk'ed including a full stack trace.
|
||||
</para>
|
||||
<para>
|
||||
The statistics are available via debugfs/debug_objects/stats.
|
||||
The statistics are available via /sys/kernel/debug/debug_objects/stats.
|
||||
They provide information about the number of warnings and the
|
||||
number of successful fixups along with information about the
|
||||
usage of the internal tracking objects and the state of the
|
||||
|
@ -145,7 +145,6 @@ usage should require reading the full document.
|
||||
interface in STA mode at first!
|
||||
</para>
|
||||
!Finclude/net/mac80211.h ieee80211_if_init_conf
|
||||
!Finclude/net/mac80211.h ieee80211_if_conf
|
||||
</chapter>
|
||||
|
||||
<chapter id="rx-tx">
|
||||
|
89
Documentation/DocBook/tracepoint.tmpl
Normal file
89
Documentation/DocBook/tracepoint.tmpl
Normal file
@ -0,0 +1,89 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
|
||||
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
|
||||
|
||||
<book id="Tracepoints">
|
||||
<bookinfo>
|
||||
<title>The Linux Kernel Tracepoint API</title>
|
||||
|
||||
<authorgroup>
|
||||
<author>
|
||||
<firstname>Jason</firstname>
|
||||
<surname>Baron</surname>
|
||||
<affiliation>
|
||||
<address>
|
||||
<email>jbaron@redhat.com</email>
|
||||
</address>
|
||||
</affiliation>
|
||||
</author>
|
||||
</authorgroup>
|
||||
|
||||
<legalnotice>
|
||||
<para>
|
||||
This documentation is free software; you can redistribute
|
||||
it and/or modify it under the terms of the GNU General Public
|
||||
License as published by the Free Software Foundation; either
|
||||
version 2 of the License, or (at your option) any later
|
||||
version.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
This program is distributed in the hope that it will be
|
||||
useful, but WITHOUT ANY WARRANTY; without even the implied
|
||||
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
|
||||
See the GNU General Public License for more details.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
You should have received a copy of the GNU General Public
|
||||
License along with this program; if not, write to the Free
|
||||
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
|
||||
MA 02111-1307 USA
|
||||
</para>
|
||||
|
||||
<para>
|
||||
For more details see the file COPYING in the source
|
||||
distribution of Linux.
|
||||
</para>
|
||||
</legalnotice>
|
||||
</bookinfo>
|
||||
|
||||
<toc></toc>
|
||||
<chapter id="intro">
|
||||
<title>Introduction</title>
|
||||
<para>
|
||||
Tracepoints are static probe points that are located in strategic points
|
||||
throughout the kernel. 'Probes' register/unregister with tracepoints
|
||||
via a callback mechanism. The 'probes' are strictly typed functions that
|
||||
are passed a unique set of parameters defined by each tracepoint.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
From this simple callback mechanism, 'probes' can be used to profile, debug,
|
||||
and understand kernel behavior. There are a number of tools that provide a
|
||||
framework for using 'probes'. These tools include Systemtap, ftrace, and
|
||||
LTTng.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Tracepoints are defined in a number of header files via various macros. Thus,
|
||||
the purpose of this document is to provide a clear accounting of the available
|
||||
tracepoints. The intention is to understand not only what tracepoints are
|
||||
available but also to understand where future tracepoints might be added.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The API presented has functions of the form:
|
||||
<function>trace_tracepointname(function parameters)</function>. These are the
|
||||
tracepoints callbacks that are found throughout the code. Registering and
|
||||
unregistering probes with these callback sites is covered in the
|
||||
<filename>Documentation/trace/*</filename> directory.
|
||||
</para>
|
||||
</chapter>
|
||||
|
||||
<chapter id="irq">
|
||||
<title>IRQ</title>
|
||||
!Iinclude/trace/events/irq.h
|
||||
</chapter>
|
||||
|
||||
</book>
|
@ -61,6 +61,10 @@ be initiated although firmwares have no _OSC support. To enable the
|
||||
walkaround, pls. add aerdriver.forceload=y to kernel boot parameter line
|
||||
when booting kernel. Note that forceload=n by default.
|
||||
|
||||
nosourceid, another parameter of type bool, can be used when broken
|
||||
hardware (mostly chipsets) has root ports that cannot obtain the reporting
|
||||
source ID. nosourceid=n by default.
|
||||
|
||||
2.3 AER error output
|
||||
When a PCI-E AER error is captured, an error message will be outputed to
|
||||
console. If it's a correctable error, it is outputed as a warning.
|
||||
@ -246,3 +250,24 @@ with the PCI Express AER Root driver?
|
||||
A: It could call the helper functions to enable AER in devices and
|
||||
cleanup uncorrectable status register. Pls. refer to section 3.3.
|
||||
|
||||
|
||||
4. Software error injection
|
||||
|
||||
Debugging PCIE AER error recovery code is quite difficult because it
|
||||
is hard to trigger real hardware errors. Software based error
|
||||
injection can be used to fake various kinds of PCIE errors.
|
||||
|
||||
First you should enable PCIE AER software error injection in kernel
|
||||
configuration, that is, following item should be in your .config.
|
||||
|
||||
CONFIG_PCIEAER_INJECT=y or CONFIG_PCIEAER_INJECT=m
|
||||
|
||||
After reboot with new kernel or insert the module, a device file named
|
||||
/dev/aer_inject should be created.
|
||||
|
||||
Then, you need a user space tool named aer-inject, which can be gotten
|
||||
from:
|
||||
http://www.kernel.org/pub/linux/utils/pci/aer-inject/
|
||||
|
||||
More information about aer-inject can be found in the document comes
|
||||
with its source code.
|
||||
|
@ -118,7 +118,7 @@ to another chain) checking the final 'nulls' value if
|
||||
the lookup met the end of chain. If final 'nulls' value
|
||||
is not the slot number, then we must restart the lookup at
|
||||
the beginning. If the object was moved to the same chain,
|
||||
then the reader doesnt care : It might eventually
|
||||
then the reader doesn't care : It might eventually
|
||||
scan the list again without harm.
|
||||
|
||||
|
||||
|
@ -192,23 +192,24 @@ rcu/rcuhier (which displays the struct rcu_node hierarchy).
|
||||
The output of "cat rcu/rcudata" looks as follows:
|
||||
|
||||
rcu:
|
||||
0 c=4011 g=4012 pq=1 pqc=4011 qp=0 rpfq=1 rp=3c2a dt=23301/73 dn=2 df=1882 of=0 ri=2126 ql=2 b=10
|
||||
1 c=4011 g=4012 pq=1 pqc=4011 qp=0 rpfq=3 rp=39a6 dt=78073/1 dn=2 df=1402 of=0 ri=1875 ql=46 b=10
|
||||
2 c=4010 g=4010 pq=1 pqc=4010 qp=0 rpfq=-5 rp=1d12 dt=16646/0 dn=2 df=3140 of=0 ri=2080 ql=0 b=10
|
||||
3 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=2b50 dt=21159/1 dn=2 df=2230 of=0 ri=1923 ql=72 b=10
|
||||
4 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=1644 dt=5783/1 dn=2 df=3348 of=0 ri=2805 ql=7 b=10
|
||||
5 c=4012 g=4013 pq=0 pqc=4011 qp=1 rpfq=3 rp=1aac dt=5879/1 dn=2 df=3140 of=0 ri=2066 ql=10 b=10
|
||||
6 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=ed8 dt=5847/1 dn=2 df=3797 of=0 ri=1266 ql=10 b=10
|
||||
7 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=1fa2 dt=6199/1 dn=2 df=2795 of=0 ri=2162 ql=28 b=10
|
||||
rcu:
|
||||
0 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=10951/1 dn=0 df=1101 of=0 ri=36 ql=0 b=10
|
||||
1 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=16117/1 dn=0 df=1015 of=0 ri=0 ql=0 b=10
|
||||
2 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1445/1 dn=0 df=1839 of=0 ri=0 ql=0 b=10
|
||||
3 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=6681/1 dn=0 df=1545 of=0 ri=0 ql=0 b=10
|
||||
4 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1003/1 dn=0 df=1992 of=0 ri=0 ql=0 b=10
|
||||
5 c=17829 g=17830 pq=1 pqc=17829 qp=1 dt=3887/1 dn=0 df=3331 of=0 ri=4 ql=2 b=10
|
||||
6 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=859/1 dn=0 df=3224 of=0 ri=0 ql=0 b=10
|
||||
7 c=17829 g=17830 pq=0 pqc=17829 qp=1 dt=3761/1 dn=0 df=1818 of=0 ri=0 ql=2 b=10
|
||||
rcu_bh:
|
||||
0 c=-268 g=-268 pq=1 pqc=-268 qp=0 rpfq=-145 rp=21d6 dt=23301/73 dn=2 df=0 of=0 ri=0 ql=0 b=10
|
||||
1 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-170 rp=20ce dt=78073/1 dn=2 df=26 of=0 ri=5 ql=0 b=10
|
||||
2 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-83 rp=fbd dt=16646/0 dn=2 df=28 of=0 ri=4 ql=0 b=10
|
||||
3 c=-268 g=-268 pq=1 pqc=-268 qp=0 rpfq=-105 rp=178c dt=21159/1 dn=2 df=28 of=0 ri=2 ql=0 b=10
|
||||
4 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-30 rp=b54 dt=5783/1 dn=2 df=32 of=0 ri=0 ql=0 b=10
|
||||
5 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-29 rp=df5 dt=5879/1 dn=2 df=30 of=0 ri=3 ql=0 b=10
|
||||
6 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-28 rp=788 dt=5847/1 dn=2 df=32 of=0 ri=0 ql=0 b=10
|
||||
7 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-53 rp=1098 dt=6199/1 dn=2 df=30 of=0 ri=3 ql=0 b=10
|
||||
0 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=10951/1 dn=0 df=0 of=0 ri=0 ql=0 b=10
|
||||
1 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=16117/1 dn=0 df=13 of=0 ri=0 ql=0 b=10
|
||||
2 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=1445/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
|
||||
3 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=6681/1 dn=0 df=9 of=0 ri=0 ql=0 b=10
|
||||
4 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=1003/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
|
||||
5 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3887/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
|
||||
6 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=859/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
|
||||
7 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3761/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
|
||||
|
||||
The first section lists the rcu_data structures for rcu, the second for
|
||||
rcu_bh. Each section has one line per CPU, or eight for this 8-CPU system.
|
||||
@ -253,12 +254,6 @@ o "pqc" indicates which grace period the last-observed quiescent
|
||||
o "qp" indicates that RCU still expects a quiescent state from
|
||||
this CPU.
|
||||
|
||||
o "rpfq" is the number of rcu_pending() calls on this CPU required
|
||||
to induce this CPU to invoke force_quiescent_state().
|
||||
|
||||
o "rp" is low-order four hex digits of the count of how many times
|
||||
rcu_pending() has been invoked on this CPU.
|
||||
|
||||
o "dt" is the current value of the dyntick counter that is incremented
|
||||
when entering or leaving dynticks idle state, either by the
|
||||
scheduler or by irq. The number after the "/" is the interrupt
|
||||
@ -305,6 +300,9 @@ o "b" is the batch limit for this CPU. If more than this number
|
||||
of RCU callbacks is ready to invoke, then the remainder will
|
||||
be deferred.
|
||||
|
||||
There is also an rcu/rcudata.csv file with the same information in
|
||||
comma-separated-variable spreadsheet format.
|
||||
|
||||
|
||||
The output of "cat rcu/rcugp" looks as follows:
|
||||
|
||||
@ -411,3 +409,63 @@ o Each element of the form "1/1 0:127 ^0" represents one struct
|
||||
For example, the first entry at the lowest level shows
|
||||
"^0", indicating that it corresponds to bit zero in
|
||||
the first entry at the middle level.
|
||||
|
||||
|
||||
The output of "cat rcu/rcu_pending" looks as follows:
|
||||
|
||||
rcu:
|
||||
0 np=255892 qsp=53936 cbr=0 cng=14417 gpc=10033 gps=24320 nf=6445 nn=146741
|
||||
1 np=261224 qsp=54638 cbr=0 cng=25723 gpc=16310 gps=2849 nf=5912 nn=155792
|
||||
2 np=237496 qsp=49664 cbr=0 cng=2762 gpc=45478 gps=1762 nf=1201 nn=136629
|
||||
3 np=236249 qsp=48766 cbr=0 cng=286 gpc=48049 gps=1218 nf=207 nn=137723
|
||||
4 np=221310 qsp=46850 cbr=0 cng=26 gpc=43161 gps=4634 nf=3529 nn=123110
|
||||
5 np=237332 qsp=48449 cbr=0 cng=54 gpc=47920 gps=3252 nf=201 nn=137456
|
||||
6 np=219995 qsp=46718 cbr=0 cng=50 gpc=42098 gps=6093 nf=4202 nn=120834
|
||||
7 np=249893 qsp=49390 cbr=0 cng=72 gpc=38400 gps=17102 nf=41 nn=144888
|
||||
rcu_bh:
|
||||
0 np=146741 qsp=1419 cbr=0 cng=6 gpc=0 gps=0 nf=2 nn=145314
|
||||
1 np=155792 qsp=12597 cbr=0 cng=0 gpc=4 gps=8 nf=3 nn=143180
|
||||
2 np=136629 qsp=18680 cbr=0 cng=0 gpc=7 gps=6 nf=0 nn=117936
|
||||
3 np=137723 qsp=2843 cbr=0 cng=0 gpc=10 gps=7 nf=0 nn=134863
|
||||
4 np=123110 qsp=12433 cbr=0 cng=0 gpc=4 gps=2 nf=0 nn=110671
|
||||
5 np=137456 qsp=4210 cbr=0 cng=0 gpc=6 gps=5 nf=0 nn=133235
|
||||
6 np=120834 qsp=9902 cbr=0 cng=0 gpc=6 gps=3 nf=2 nn=110921
|
||||
7 np=144888 qsp=26336 cbr=0 cng=0 gpc=8 gps=2 nf=0 nn=118542
|
||||
|
||||
As always, this is once again split into "rcu" and "rcu_bh" portions.
|
||||
The fields are as follows:
|
||||
|
||||
o "np" is the number of times that __rcu_pending() has been invoked
|
||||
for the corresponding flavor of RCU.
|
||||
|
||||
o "qsp" is the number of times that the RCU was waiting for a
|
||||
quiescent state from this CPU.
|
||||
|
||||
o "cbr" is the number of times that this CPU had RCU callbacks
|
||||
that had passed through a grace period, and were thus ready
|
||||
to be invoked.
|
||||
|
||||
o "cng" is the number of times that this CPU needed another
|
||||
grace period while RCU was idle.
|
||||
|
||||
o "gpc" is the number of times that an old grace period had
|
||||
completed, but this CPU was not yet aware of it.
|
||||
|
||||
o "gps" is the number of times that a new grace period had started,
|
||||
but this CPU was not yet aware of it.
|
||||
|
||||
o "nf" is the number of times that this CPU suspected that the
|
||||
current grace period had run for too long, and thus needed to
|
||||
be forced.
|
||||
|
||||
Please note that "forcing" consists of sending resched IPIs
|
||||
to holdout CPUs. If that CPU really still is in an old RCU
|
||||
read-side critical section, then we really do have to wait for it.
|
||||
The assumption behing "forcing" is that the CPU is not still in
|
||||
an old RCU read-side critical section, but has not yet responded
|
||||
for some other reason.
|
||||
|
||||
o "nn" is the number of times that this CPU needed nothing. Alert
|
||||
readers will note that the rcu "nn" number for a given CPU very
|
||||
closely matches the rcu_bh "np" number for that same CPU. This
|
||||
is due to short-circuit evaluation in rcu_pending().
|
||||
|
@ -5,7 +5,7 @@ Copyright 2006, 2007 Simtec Electronics
|
||||
|
||||
The Silicon Motion SM501 multimedia companion chip is a multifunction device
|
||||
which may provide numerous interfaces including USB host controller USB gadget,
|
||||
Asyncronous Serial ports, Audio functions and a dual display video interface.
|
||||
asynchronous serial ports, audio functions, and a dual display video interface.
|
||||
The device may be connected by PCI or local bus with varying functions enabled.
|
||||
|
||||
Core
|
||||
|
@ -184,8 +184,9 @@ length. Single character labels using special characters, that being anything
|
||||
other than a letter or digit, are reserved for use by the Smack development
|
||||
team. Smack labels are unstructured, case sensitive, and the only operation
|
||||
ever performed on them is comparison for equality. Smack labels cannot
|
||||
contain unprintable characters or the "/" (slash) character. Smack labels
|
||||
cannot begin with a '-', which is reserved for special options.
|
||||
contain unprintable characters, the "/" (slash), the "\" (backslash), the "'"
|
||||
(quote) and '"' (double-quote) characters.
|
||||
Smack labels cannot begin with a '-', which is reserved for special options.
|
||||
|
||||
There are some predefined labels:
|
||||
|
||||
@ -523,3 +524,18 @@ Smack supports some mount options:
|
||||
|
||||
These mount options apply to all file system types.
|
||||
|
||||
Smack auditing
|
||||
|
||||
If you want Smack auditing of security events, you need to set CONFIG_AUDIT
|
||||
in your kernel configuration.
|
||||
By default, all denied events will be audited. You can change this behavior by
|
||||
writing a single character to the /smack/logging file :
|
||||
0 : no logging
|
||||
1 : log denied (default)
|
||||
2 : log accepted
|
||||
3 : log denied & accepted
|
||||
|
||||
Events are logged as 'key=value' pairs, for each event you at least will get
|
||||
the subjet, the object, the rights requested, the action, the kernel function
|
||||
that triggered the event, plus other pairs depending on the type of event
|
||||
audited.
|
||||
|
@ -54,7 +54,7 @@ kernel patches.
|
||||
CONFIG_PREEMPT.
|
||||
|
||||
14: If the patch affects IO/Disk, etc: has been tested with and without
|
||||
CONFIG_LBD.
|
||||
CONFIG_LBDAF.
|
||||
|
||||
15: All codepaths have been exercised with all lockdep features enabled.
|
||||
|
||||
|
@ -91,6 +91,10 @@ Be as specific as possible. The WORST descriptions possible include
|
||||
things like "update driver X", "bug fix for driver X", or "this patch
|
||||
includes updates for subsystem X. Please apply."
|
||||
|
||||
The maintainer will thank you if you write your patch description in a
|
||||
form which can be easily pulled into Linux's source code management
|
||||
system, git, as a "commit log". See #15, below.
|
||||
|
||||
If your description starts to get long, that's a sign that you probably
|
||||
need to split up your patch. See #3, next.
|
||||
|
||||
@ -183,8 +187,9 @@ Even if the maintainer did not respond in step #4, make sure to ALWAYS
|
||||
copy the maintainer when you change their code.
|
||||
|
||||
For small patches you may want to CC the Trivial Patch Monkey
|
||||
trivial@kernel.org managed by Jesper Juhl; which collects "trivial"
|
||||
patches. Trivial patches must qualify for one of the following rules:
|
||||
trivial@kernel.org which collects "trivial" patches. Have a look
|
||||
into the MAINTAINERS file for its current manager.
|
||||
Trivial patches must qualify for one of the following rules:
|
||||
Spelling fixes in documentation
|
||||
Spelling fixes which could break grep(1)
|
||||
Warning fixes (cluttering with useless warnings is bad)
|
||||
@ -196,7 +201,6 @@ patches. Trivial patches must qualify for one of the following rules:
|
||||
since people copy, as long as it's trivial)
|
||||
Any fix by the author/maintainer of the file (ie. patch monkey
|
||||
in re-transmission mode)
|
||||
URL: <http://www.kernel.org/pub/linux/kernel/people/juhl/trivial/>
|
||||
|
||||
|
||||
|
||||
@ -405,7 +409,14 @@ person it names. This tag documents that potentially interested parties
|
||||
have been included in the discussion
|
||||
|
||||
|
||||
14) Using Tested-by: and Reviewed-by:
|
||||
14) Using Reported-by:, Tested-by: and Reviewed-by:
|
||||
|
||||
If this patch fixes a problem reported by somebody else, consider adding a
|
||||
Reported-by: tag to credit the reporter for their contribution. Please
|
||||
note that this tag should not be added without the reporter's permission,
|
||||
especially if the problem was not reported in a public forum. That said,
|
||||
if we diligently credit our bug reporters, they will, hopefully, be
|
||||
inspired to help us again in the future.
|
||||
|
||||
A Tested-by: tag indicates that the patch has been successfully tested (in
|
||||
some environment) by the person named. This tag informs maintainers that
|
||||
@ -444,7 +455,7 @@ offer a Reviewed-by tag for a patch. This tag serves to give credit to
|
||||
reviewers and to inform maintainers of the degree of review which has been
|
||||
done on the patch. Reviewed-by: tags, when supplied by reviewers known to
|
||||
understand the subject area and to perform thorough reviews, will normally
|
||||
increase the liklihood of your patch getting into the kernel.
|
||||
increase the likelihood of your patch getting into the kernel.
|
||||
|
||||
|
||||
15) The canonical patch format
|
||||
@ -485,12 +496,33 @@ phrase" should not be a filename. Do not use the same "summary
|
||||
phrase" for every patch in a whole patch series (where a "patch
|
||||
series" is an ordered sequence of multiple, related patches).
|
||||
|
||||
Bear in mind that the "summary phrase" of your email becomes
|
||||
a globally-unique identifier for that patch. It propagates
|
||||
all the way into the git changelog. The "summary phrase" may
|
||||
later be used in developer discussions which refer to the patch.
|
||||
People will want to google for the "summary phrase" to read
|
||||
discussion regarding that patch.
|
||||
Bear in mind that the "summary phrase" of your email becomes a
|
||||
globally-unique identifier for that patch. It propagates all the way
|
||||
into the git changelog. The "summary phrase" may later be used in
|
||||
developer discussions which refer to the patch. People will want to
|
||||
google for the "summary phrase" to read discussion regarding that
|
||||
patch. It will also be the only thing that people may quickly see
|
||||
when, two or three months later, they are going through perhaps
|
||||
thousands of patches using tools such as "gitk" or "git log
|
||||
--oneline".
|
||||
|
||||
For these reasons, the "summary" must be no more than 70-75
|
||||
characters, and it must describe both what the patch changes, as well
|
||||
as why the patch might be necessary. It is challenging to be both
|
||||
succinct and descriptive, but that is what a well-written summary
|
||||
should do.
|
||||
|
||||
The "summary phrase" may be prefixed by tags enclosed in square
|
||||
brackets: "Subject: [PATCH tag] <summary phrase>". The tags are not
|
||||
considered part of the summary phrase, but describe how the patch
|
||||
should be treated. Common tags might include a version descriptor if
|
||||
the multiple versions of the patch have been sent out in response to
|
||||
comments (i.e., "v1, v2, v3"), or "RFC" to indicate a request for
|
||||
comments. If there are four patches in a patch series the individual
|
||||
patches may be numbered like this: 1/4, 2/4, 3/4, 4/4. This assures
|
||||
that developers understand the order in which the patches should be
|
||||
applied and that they have reviewed or applied all of the patches in
|
||||
the patch series.
|
||||
|
||||
A couple of example Subjects:
|
||||
|
||||
@ -510,19 +542,31 @@ the patch author in the changelog.
|
||||
The explanation body will be committed to the permanent source
|
||||
changelog, so should make sense to a competent reader who has long
|
||||
since forgotten the immediate details of the discussion that might
|
||||
have led to this patch.
|
||||
have led to this patch. Including symptoms of the failure which the
|
||||
patch addresses (kernel log messages, oops messages, etc.) is
|
||||
especially useful for people who might be searching the commit logs
|
||||
looking for the applicable patch. If a patch fixes a compile failure,
|
||||
it may not be necessary to include _all_ of the compile failures; just
|
||||
enough that it is likely that someone searching for the patch can find
|
||||
it. As in the "summary phrase", it is important to be both succinct as
|
||||
well as descriptive.
|
||||
|
||||
The "---" marker line serves the essential purpose of marking for patch
|
||||
handling tools where the changelog message ends.
|
||||
|
||||
One good use for the additional comments after the "---" marker is for
|
||||
a diffstat, to show what files have changed, and the number of inserted
|
||||
and deleted lines per file. A diffstat is especially useful on bigger
|
||||
patches. Other comments relevant only to the moment or the maintainer,
|
||||
not suitable for the permanent changelog, should also go here.
|
||||
Use diffstat options "-p 1 -w 70" so that filenames are listed from the
|
||||
top of the kernel source tree and don't use too much horizontal space
|
||||
(easily fit in 80 columns, maybe with some indentation).
|
||||
a diffstat, to show what files have changed, and the number of
|
||||
inserted and deleted lines per file. A diffstat is especially useful
|
||||
on bigger patches. Other comments relevant only to the moment or the
|
||||
maintainer, not suitable for the permanent changelog, should also go
|
||||
here. A good example of such comments might be "patch changelogs"
|
||||
which describe what has changed between the v1 and v2 version of the
|
||||
patch.
|
||||
|
||||
If you are going to include a diffstat after the "---" marker, please
|
||||
use diffstat options "-p 1 -w 70" so that filenames are listed from
|
||||
the top of the kernel source tree and don't use too much horizontal
|
||||
space (easily fit in 80 columns, maybe with some indentation).
|
||||
|
||||
See more details on the proper patch format in the following
|
||||
references.
|
||||
|
@ -246,7 +246,8 @@ void print_ioacct(struct taskstats *t)
|
||||
|
||||
int main(int argc, char *argv[])
|
||||
{
|
||||
int c, rc, rep_len, aggr_len, len2, cmd_type;
|
||||
int c, rc, rep_len, aggr_len, len2;
|
||||
int cmd_type = TASKSTATS_CMD_ATTR_UNSPEC;
|
||||
__u16 id;
|
||||
__u32 mypid;
|
||||
|
||||
|
@ -51,7 +51,7 @@ PIN Numbers
|
||||
-----------
|
||||
|
||||
Each pin has an unique number associated with it in regs-gpio.h,
|
||||
eg S3C2410_GPA0 or S3C2410_GPF1. These defines are used to tell
|
||||
eg S3C2410_GPA(0) or S3C2410_GPF(1). These defines are used to tell
|
||||
the GPIO functions which pin is to be used.
|
||||
|
||||
|
||||
@ -65,11 +65,11 @@ Configuring a pin
|
||||
|
||||
Eg:
|
||||
|
||||
s3c2410_gpio_cfgpin(S3C2410_GPA0, S3C2410_GPA0_ADDR0);
|
||||
s3c2410_gpio_cfgpin(S3C2410_GPE8, S3C2410_GPE8_SDDAT1);
|
||||
s3c2410_gpio_cfgpin(S3C2410_GPA(0), S3C2410_GPA0_ADDR0);
|
||||
s3c2410_gpio_cfgpin(S3C2410_GPE(8), S3C2410_GPE8_SDDAT1);
|
||||
|
||||
which would turn GPA0 into the lowest Address line A0, and set
|
||||
GPE8 to be connected to the SDIO/MMC controller's SDDAT1 line.
|
||||
which would turn GPA(0) into the lowest Address line A0, and set
|
||||
GPE(8) to be connected to the SDIO/MMC controller's SDDAT1 line.
|
||||
|
||||
|
||||
Reading the current configuration
|
||||
|
@ -229,10 +229,10 @@ kernel. It is the use of atomic counters to implement reference
|
||||
counting, and it works such that once the counter falls to zero it can
|
||||
be guaranteed that no other entity can be accessing the object:
|
||||
|
||||
static void obj_list_add(struct obj *obj)
|
||||
static void obj_list_add(struct obj *obj, struct list_head *head)
|
||||
{
|
||||
obj->active = 1;
|
||||
list_add(&obj->list);
|
||||
list_add(&obj->list, head);
|
||||
}
|
||||
|
||||
static void obj_list_del(struct obj *obj)
|
||||
|
@ -186,7 +186,7 @@ a virtual address mapping (unlike the earlier scheme of virtual address
|
||||
do not have a corresponding kernel virtual address space mapping) and
|
||||
low-memory pages.
|
||||
|
||||
Note: Please refer to Documentation/PCI/PCI-DMA-mapping.txt for a discussion
|
||||
Note: Please refer to Documentation/DMA-mapping.txt for a discussion
|
||||
on PCI high mem DMA aspects and mapping of scatter gather lists, and support
|
||||
for 64 bit PCI.
|
||||
|
||||
|
@ -58,7 +58,7 @@ same criteria as reads.
|
||||
front_merges (bool)
|
||||
------------
|
||||
|
||||
Sometimes it happens that a request enters the io scheduler that is contigious
|
||||
Sometimes it happens that a request enters the io scheduler that is contiguous
|
||||
with a request that is already on the queue. Either it fits in the back of that
|
||||
request, or it fits at the front. That is called either a back merge candidate
|
||||
or a front merge candidate. Due to the way files are typically laid out,
|
||||
|
@ -27,7 +27,7 @@ parameter.
|
||||
|
||||
For simplicity, only one braille console can be enabled, other uses of
|
||||
console=brl,... will be discarded. Also note that it does not interfere with
|
||||
the console selection mecanism described in serial-console.txt
|
||||
the console selection mechanism described in serial-console.txt
|
||||
|
||||
For now, only the VisioBraille device is supported.
|
||||
|
||||
|
@ -117,7 +117,7 @@ Using the pktcdvd debugfs interface
|
||||
|
||||
To read pktcdvd device infos in human readable form, do:
|
||||
|
||||
# cat /debug/pktcdvd/pktcdvd[0-7]/info
|
||||
# cat /sys/kernel/debug/pktcdvd/pktcdvd[0-7]/info
|
||||
|
||||
For a description of the debugfs interface look into the file:
|
||||
|
||||
|
@ -152,14 +152,19 @@ When swap is accounted, following files are added.
|
||||
|
||||
usage of mem+swap is limited by memsw.limit_in_bytes.
|
||||
|
||||
Note: why 'mem+swap' rather than swap.
|
||||
* why 'mem+swap' rather than swap.
|
||||
The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
|
||||
to move account from memory to swap...there is no change in usage of
|
||||
mem+swap.
|
||||
mem+swap. In other words, when we want to limit the usage of swap without
|
||||
affecting global LRU, mem+swap limit is better than just limiting swap from
|
||||
OS point of view.
|
||||
|
||||
In other words, when we want to limit the usage of swap without affecting
|
||||
global LRU, mem+swap limit is better than just limiting swap from OS point
|
||||
of view.
|
||||
* What happens when a cgroup hits memory.memsw.limit_in_bytes
|
||||
When a cgroup his memory.memsw.limit_in_bytes, it's useless to do swap-out
|
||||
in this cgroup. Then, swap-out will not be done by cgroup routine and file
|
||||
caches are dropped. But as mentioned above, global LRU can do swapout memory
|
||||
from it for sanity of the system's memory management state. You can't forbid
|
||||
it by cgroup.
|
||||
|
||||
2.5 Reclaim
|
||||
|
||||
@ -204,6 +209,7 @@ We can alter the memory limit:
|
||||
|
||||
NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo,
|
||||
mega or gigabytes.
|
||||
NOTE: We can write "-1" to reset the *.limit_in_bytes(unlimited).
|
||||
|
||||
# cat /cgroups/0/memory.limit_in_bytes
|
||||
4194304
|
||||
|
@ -41,6 +41,12 @@ void cn_test_callback(void *data)
|
||||
msg->seq, msg->ack, msg->len, (char *)msg->data);
|
||||
}
|
||||
|
||||
/*
|
||||
* Do not remove this function even if no one is using it as
|
||||
* this is an example of how to get notifications about new
|
||||
* connector user registration
|
||||
*/
|
||||
#if 0
|
||||
static int cn_test_want_notify(void)
|
||||
{
|
||||
struct cn_ctl_msg *ctl;
|
||||
@ -117,6 +123,7 @@ nlmsg_failure:
|
||||
kfree_skb(skb);
|
||||
return -EINVAL;
|
||||
}
|
||||
#endif
|
||||
|
||||
static u32 cn_test_timer_counter;
|
||||
static void cn_test_timer_func(unsigned long __data)
|
||||
|
@ -155,7 +155,7 @@ actual frequency must be determined using the following rules:
|
||||
- if relation==CPUFREQ_REL_H, try to select a new_freq lower than or equal
|
||||
target_freq. ("H for highest, but no higher than")
|
||||
|
||||
Here again the frequency table helper might assist you - see section 3
|
||||
Here again the frequency table helper might assist you - see section 2
|
||||
for details.
|
||||
|
||||
|
||||
|
@ -119,10 +119,6 @@ want the kernel to look at the CPU usage and to make decisions on
|
||||
what to do about the frequency. Typically this is set to values of
|
||||
around '10000' or more. It's default value is (cmp. with users-guide.txt):
|
||||
transition_latency * 1000
|
||||
The lowest value you can set is:
|
||||
transition_latency * 100 or it may get restricted to a value where it
|
||||
makes not sense for the kernel anymore to poll that often which depends
|
||||
on your HZ config variable (HZ=1000: max=20000us, HZ=250: max=5000).
|
||||
Be aware that transition latency is in ns and sampling_rate is in us, so you
|
||||
get the same sysfs value by default.
|
||||
Sampling rate should always get adjusted considering the transition latency
|
||||
@ -131,14 +127,20 @@ in the bash (as said, 1000 is default), do:
|
||||
echo `$(($(cat cpuinfo_transition_latency) * 750 / 1000)) \
|
||||
>ondemand/sampling_rate
|
||||
|
||||
show_sampling_rate_(min|max): THIS INTERFACE IS DEPRECATED, DON'T USE IT.
|
||||
You can use wider ranges now and the general
|
||||
cpuinfo_transition_latency variable (cmp. with user-guide.txt) can be
|
||||
used to obtain exactly the same info:
|
||||
show_sampling_rate_min = transtition_latency * 500 / 1000
|
||||
show_sampling_rate_max = transtition_latency * 500000 / 1000
|
||||
(divided by 1000 is to illustrate that sampling rate is in us and
|
||||
transition latency is exported ns).
|
||||
show_sampling_rate_min:
|
||||
The sampling rate is limited by the HW transition latency:
|
||||
transition_latency * 100
|
||||
Or by kernel restrictions:
|
||||
If CONFIG_NO_HZ is set, the limit is 10ms fixed.
|
||||
If CONFIG_NO_HZ is not set or no_hz=off boot parameter is used, the
|
||||
limits depend on the CONFIG_HZ option:
|
||||
HZ=1000: min=20000us (20ms)
|
||||
HZ=250: min=80000us (80ms)
|
||||
HZ=100: min=200000us (200ms)
|
||||
The highest value of kernel and HW latency restrictions is shown and
|
||||
used as the minimum sampling rate.
|
||||
|
||||
show_sampling_rate_max: THIS INTERFACE IS DEPRECATED, DON'T USE IT.
|
||||
|
||||
up_threshold: defines what the average CPU usage between the samplings
|
||||
of 'sampling_rate' needs to be for the kernel to make a decision on
|
||||
|
@ -31,7 +31,6 @@ Contents:
|
||||
|
||||
3. How to change the CPU cpufreq policy and/or speed
|
||||
3.1 Preferred interface: sysfs
|
||||
3.2 Deprecated interfaces
|
||||
|
||||
|
||||
|
||||
|
@ -76,9 +76,9 @@ Do the steps below to download the BIOS image.
|
||||
|
||||
The /sys/class/firmware/dell_rbu/ entries will remain till the following is
|
||||
done.
|
||||
echo -1 > /sys/class/firmware/dell_rbu/loading.
|
||||
echo -1 > /sys/class/firmware/dell_rbu/loading
|
||||
Until this step is completed the driver cannot be unloaded.
|
||||
Also echoing either mono ,packet or init in to image_type will free up the
|
||||
Also echoing either mono, packet or init in to image_type will free up the
|
||||
memory allocated by the driver.
|
||||
|
||||
If a user by accident executes steps 1 and 3 above without executing step 2;
|
||||
|
@ -119,7 +119,7 @@ which takes quite a bit of time and thought after the "real work" has been
|
||||
done. When done properly, though, it is time well spent.
|
||||
|
||||
|
||||
5.4: PATCH FORMATTING
|
||||
5.4: PATCH FORMATTING AND CHANGELOGS
|
||||
|
||||
So now you have a perfect series of patches for posting, but the work is
|
||||
not done quite yet. Each patch needs to be formatted into a message which
|
||||
@ -146,8 +146,33 @@ that end, each patch will be composed of the following:
|
||||
- One or more tag lines, with, at a minimum, one Signed-off-by: line from
|
||||
the author of the patch. Tags will be described in more detail below.
|
||||
|
||||
The above three items should, normally, be the text used when committing
|
||||
the change to a revision control system. They are followed by:
|
||||
The items above, together, form the changelog for the patch. Writing good
|
||||
changelogs is a crucial but often-neglected art; it's worth spending
|
||||
another moment discussing this issue. When writing a changelog, you should
|
||||
bear in mind that a number of different people will be reading your words.
|
||||
These include subsystem maintainers and reviewers who need to decide
|
||||
whether the patch should be included, distributors and other maintainers
|
||||
trying to decide whether a patch should be backported to other kernels, bug
|
||||
hunters wondering whether the patch is responsible for a problem they are
|
||||
chasing, users who want to know how the kernel has changed, and more. A
|
||||
good changelog conveys the needed information to all of these people in the
|
||||
most direct and concise way possible.
|
||||
|
||||
To that end, the summary line should describe the effects of and motivation
|
||||
for the change as well as possible given the one-line constraint. The
|
||||
detailed description can then amplify on those topics and provide any
|
||||
needed additional information. If the patch fixes a bug, cite the commit
|
||||
which introduced the bug if possible. If a problem is associated with
|
||||
specific log or compiler output, include that output to help others
|
||||
searching for a solution to the same problem. If the change is meant to
|
||||
support other changes coming in later patch, say so. If internal APIs are
|
||||
changed, detail those changes and how other developers should respond. In
|
||||
general, the more you can put yourself into the shoes of everybody who will
|
||||
be reading your changelog, the better that changelog (and the kernel as a
|
||||
whole) will be.
|
||||
|
||||
Needless to say, the changelog should be the text used when committing the
|
||||
change to a revision control system. It will be followed by:
|
||||
|
||||
- The patch itself, in the unified ("-u") patch format. Using the "-p"
|
||||
option to diff will associate function names with changes, making the
|
||||
|
54
Documentation/device-mapper/dm-log.txt
Normal file
54
Documentation/device-mapper/dm-log.txt
Normal file
@ -0,0 +1,54 @@
|
||||
Device-Mapper Logging
|
||||
=====================
|
||||
The device-mapper logging code is used by some of the device-mapper
|
||||
RAID targets to track regions of the disk that are not consistent.
|
||||
A region (or portion of the address space) of the disk may be
|
||||
inconsistent because a RAID stripe is currently being operated on or
|
||||
a machine died while the region was being altered. In the case of
|
||||
mirrors, a region would be considered dirty/inconsistent while you
|
||||
are writing to it because the writes need to be replicated for all
|
||||
the legs of the mirror and may not reach the legs at the same time.
|
||||
Once all writes are complete, the region is considered clean again.
|
||||
|
||||
There is a generic logging interface that the device-mapper RAID
|
||||
implementations use to perform logging operations (see
|
||||
dm_dirty_log_type in include/linux/dm-dirty-log.h). Various different
|
||||
logging implementations are available and provide different
|
||||
capabilities. The list includes:
|
||||
|
||||
Type Files
|
||||
==== =====
|
||||
disk drivers/md/dm-log.c
|
||||
core drivers/md/dm-log.c
|
||||
userspace drivers/md/dm-log-userspace* include/linux/dm-log-userspace.h
|
||||
|
||||
The "disk" log type
|
||||
-------------------
|
||||
This log implementation commits the log state to disk. This way, the
|
||||
logging state survives reboots/crashes.
|
||||
|
||||
The "core" log type
|
||||
-------------------
|
||||
This log implementation keeps the log state in memory. The log state
|
||||
will not survive a reboot or crash, but there may be a small boost in
|
||||
performance. This method can also be used if no storage device is
|
||||
available for storing log state.
|
||||
|
||||
The "userspace" log type
|
||||
------------------------
|
||||
This log type simply provides a way to export the log API to userspace,
|
||||
so log implementations can be done there. This is done by forwarding most
|
||||
logging requests to userspace, where a daemon receives and processes the
|
||||
request.
|
||||
|
||||
The structure used for communication between kernel and userspace are
|
||||
located in include/linux/dm-log-userspace.h. Due to the frequency,
|
||||
diversity, and 2-way communication nature of the exchanges between
|
||||
kernel and userspace, 'connector' is used as the interface for
|
||||
communication.
|
||||
|
||||
There are currently two userspace log implementations that leverage this
|
||||
framework - "clustered_disk" and "clustered_core". These implementations
|
||||
provide a cluster-coherent log for shared-storage. Device-mapper mirroring
|
||||
can be used in a shared-storage environment when the cluster log implementations
|
||||
are employed.
|
39
Documentation/device-mapper/dm-queue-length.txt
Normal file
39
Documentation/device-mapper/dm-queue-length.txt
Normal file
@ -0,0 +1,39 @@
|
||||
dm-queue-length
|
||||
===============
|
||||
|
||||
dm-queue-length is a path selector module for device-mapper targets,
|
||||
which selects a path with the least number of in-flight I/Os.
|
||||
The path selector name is 'queue-length'.
|
||||
|
||||
Table parameters for each path: [<repeat_count>]
|
||||
<repeat_count>: The number of I/Os to dispatch using the selected
|
||||
path before switching to the next path.
|
||||
If not given, internal default is used. To check
|
||||
the default value, see the activated table.
|
||||
|
||||
Status for each path: <status> <fail-count> <in-flight>
|
||||
<status>: 'A' if the path is active, 'F' if the path is failed.
|
||||
<fail-count>: The number of path failures.
|
||||
<in-flight>: The number of in-flight I/Os on the path.
|
||||
|
||||
|
||||
Algorithm
|
||||
=========
|
||||
|
||||
dm-queue-length increments/decrements 'in-flight' when an I/O is
|
||||
dispatched/completed respectively.
|
||||
dm-queue-length selects a path with the minimum 'in-flight'.
|
||||
|
||||
|
||||
Examples
|
||||
========
|
||||
In case that 2 paths (sda and sdb) are used with repeat_count == 128.
|
||||
|
||||
# echo "0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128" \
|
||||
dmsetup create test
|
||||
#
|
||||
# dmsetup table
|
||||
test: 0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128
|
||||
#
|
||||
# dmsetup status
|
||||
test: 0 10 multipath 2 0 0 0 1 1 E 0 2 1 8:0 A 0 0 8:16 A 0 0
|
91
Documentation/device-mapper/dm-service-time.txt
Normal file
91
Documentation/device-mapper/dm-service-time.txt
Normal file
@ -0,0 +1,91 @@
|
||||
dm-service-time
|
||||
===============
|
||||
|
||||
dm-service-time is a path selector module for device-mapper targets,
|
||||
which selects a path with the shortest estimated service time for
|
||||
the incoming I/O.
|
||||
|
||||
The service time for each path is estimated by dividing the total size
|
||||
of in-flight I/Os on a path with the performance value of the path.
|
||||
The performance value is a relative throughput value among all paths
|
||||
in a path-group, and it can be specified as a table argument.
|
||||
|
||||
The path selector name is 'service-time'.
|
||||
|
||||
Table parameters for each path: [<repeat_count> [<relative_throughput>]]
|
||||
<repeat_count>: The number of I/Os to dispatch using the selected
|
||||
path before switching to the next path.
|
||||
If not given, internal default is used. To check
|
||||
the default value, see the activated table.
|
||||
<relative_throughput>: The relative throughput value of the path
|
||||
among all paths in the path-group.
|
||||
The valid range is 0-100.
|
||||
If not given, minimum value '1' is used.
|
||||
If '0' is given, the path isn't selected while
|
||||
other paths having a positive value are available.
|
||||
|
||||
Status for each path: <status> <fail-count> <in-flight-size> \
|
||||
<relative_throughput>
|
||||
<status>: 'A' if the path is active, 'F' if the path is failed.
|
||||
<fail-count>: The number of path failures.
|
||||
<in-flight-size>: The size of in-flight I/Os on the path.
|
||||
<relative_throughput>: The relative throughput value of the path
|
||||
among all paths in the path-group.
|
||||
|
||||
|
||||
Algorithm
|
||||
=========
|
||||
|
||||
dm-service-time adds the I/O size to 'in-flight-size' when the I/O is
|
||||
dispatched and substracts when completed.
|
||||
Basically, dm-service-time selects a path having minimum service time
|
||||
which is calculated by:
|
||||
|
||||
('in-flight-size' + 'size-of-incoming-io') / 'relative_throughput'
|
||||
|
||||
However, some optimizations below are used to reduce the calculation
|
||||
as much as possible.
|
||||
|
||||
1. If the paths have the same 'relative_throughput', skip
|
||||
the division and just compare the 'in-flight-size'.
|
||||
|
||||
2. If the paths have the same 'in-flight-size', skip the division
|
||||
and just compare the 'relative_throughput'.
|
||||
|
||||
3. If some paths have non-zero 'relative_throughput' and others
|
||||
have zero 'relative_throughput', ignore those paths with zero
|
||||
'relative_throughput'.
|
||||
|
||||
If such optimizations can't be applied, calculate service time, and
|
||||
compare service time.
|
||||
If calculated service time is equal, the path having maximum
|
||||
'relative_throughput' may be better. So compare 'relative_throughput'
|
||||
then.
|
||||
|
||||
|
||||
Examples
|
||||
========
|
||||
In case that 2 paths (sda and sdb) are used with repeat_count == 128
|
||||
and sda has an average throughput 1GB/s and sdb has 4GB/s,
|
||||
'relative_throughput' value may be '1' for sda and '4' for sdb.
|
||||
|
||||
# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4" \
|
||||
dmsetup create test
|
||||
#
|
||||
# dmsetup table
|
||||
test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4
|
||||
#
|
||||
# dmsetup status
|
||||
test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 1 8:16 A 0 0 4
|
||||
|
||||
|
||||
Or '2' for sda and '8' for sdb would be also true.
|
||||
|
||||
# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8" \
|
||||
dmsetup create test
|
||||
#
|
||||
# dmsetup table
|
||||
test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8
|
||||
#
|
||||
# dmsetup status
|
||||
test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 2 8:16 A 0 0 8
|
@ -162,3 +162,35 @@ device_remove_file(dev,&dev_attr_power);
|
||||
|
||||
The file name will be 'power' with a mode of 0644 (-rw-r--r--).
|
||||
|
||||
Word of warning: While the kernel allows device_create_file() and
|
||||
device_remove_file() to be called on a device at any time, userspace has
|
||||
strict expectations on when attributes get created. When a new device is
|
||||
registered in the kernel, a uevent is generated to notify userspace (like
|
||||
udev) that a new device is available. If attributes are added after the
|
||||
device is registered, then userspace won't get notified and userspace will
|
||||
not know about the new attributes.
|
||||
|
||||
This is important for device driver that need to publish additional
|
||||
attributes for a device at driver probe time. If the device driver simply
|
||||
calls device_create_file() on the device structure passed to it, then
|
||||
userspace will never be notified of the new attributes. Instead, it should
|
||||
probably use class_create() and class->dev_attrs to set up a list of
|
||||
desired attributes in the modules_init function, and then in the .probe()
|
||||
hook, and then use device_create() to create a new device as a child
|
||||
of the probed device. The new device will generate a new uevent and
|
||||
properly advertise the new attributes to userspace.
|
||||
|
||||
For example, if a driver wanted to add the following attributes:
|
||||
struct device_attribute mydriver_attribs[] = {
|
||||
__ATTR(port_count, 0444, port_count_show),
|
||||
__ATTR(serial_number, 0444, serial_number_show),
|
||||
NULL
|
||||
};
|
||||
|
||||
Then in the module init function is would do:
|
||||
mydriver_class = class_create(THIS_MODULE, "my_attrs");
|
||||
mydriver_class.dev_attr = mydriver_attribs;
|
||||
|
||||
And assuming 'dev' is the struct device passed into the probe hook, the driver
|
||||
probe function would do something like:
|
||||
create_device(&mydriver_class, dev, chrdev, &private_data, "my_name");
|
||||
|
@ -188,7 +188,7 @@ For example, you can do something like the following.
|
||||
|
||||
void my_midlayer_destroy_something()
|
||||
{
|
||||
devres_release_group(dev, my_midlayer_create_soemthing);
|
||||
devres_release_group(dev, my_midlayer_create_something);
|
||||
}
|
||||
|
||||
|
||||
|
@ -112,7 +112,7 @@ sub tda10045 {
|
||||
|
||||
sub tda10046 {
|
||||
my $sourcefile = "TT_PCI_2.19h_28_11_2006.zip";
|
||||
my $url = "http://technotrend-online.com/download/software/219/$sourcefile";
|
||||
my $url = "http://www.tt-download.com/download/updates/219/$sourcefile";
|
||||
my $hash = "6a7e1e2f2644b162ff0502367553c72d";
|
||||
my $outfile = "dvb-fe-tda10046.fw";
|
||||
my $tmpdir = tempdir(DIR => "/tmp", CLEANUP => 1);
|
||||
@ -129,8 +129,8 @@ sub tda10046 {
|
||||
}
|
||||
|
||||
sub tda10046lifeview {
|
||||
my $sourcefile = "Drv_2.11.02.zip";
|
||||
my $url = "http://www.lifeview.com.tw/drivers/pci_card/FlyDVB-T/$sourcefile";
|
||||
my $sourcefile = "7%5Cdrv_2.11.02.zip";
|
||||
my $url = "http://www.lifeview.hk/dbimages/document/$sourcefile";
|
||||
my $hash = "1ea24dee4eea8fe971686981f34fd2e0";
|
||||
my $outfile = "dvb-fe-tda10046.fw";
|
||||
my $tmpdir = tempdir(DIR => "/tmp", CLEANUP => 1);
|
||||
@ -317,7 +317,7 @@ sub nxt2002 {
|
||||
|
||||
sub nxt2004 {
|
||||
my $sourcefile = "AVerTVHD_MCE_A180_Drv_v1.2.2.16.zip";
|
||||
my $url = "http://www.aver.com/support/Drivers/$sourcefile";
|
||||
my $url = "http://www.avermedia-usa.com/support/Drivers/$sourcefile";
|
||||
my $hash = "111cb885b1e009188346d72acfed024c";
|
||||
my $outfile = "dvb-fe-nxt2004.fw";
|
||||
my $tmpdir = tempdir(DIR => "/tmp", CLEANUP => 1);
|
||||
|
@ -23,8 +23,8 @@ first time, it was renamed to 'EDAC'.
|
||||
The bluesmoke project at sourceforge.net is now utilized as a 'staging area'
|
||||
for EDAC development, before it is sent upstream to kernel.org
|
||||
|
||||
At the bluesmoke/EDAC project site, is a series of quilt patches against
|
||||
recent kernels, stored in a SVN respository. For easier downloading, there
|
||||
At the bluesmoke/EDAC project site is a series of quilt patches against
|
||||
recent kernels, stored in a SVN repository. For easier downloading, there
|
||||
is also a tarball snapshot available.
|
||||
|
||||
============================================================================
|
||||
@ -73,9 +73,9 @@ the vendor should tie the parity status bits to 0 if they do not intend
|
||||
to generate parity. Some vendors do not do this, and thus the parity bit
|
||||
can "float" giving false positives.
|
||||
|
||||
In the kernel there is a pci device attribute located in sysfs that is
|
||||
In the kernel there is a PCI device attribute located in sysfs that is
|
||||
checked by the EDAC PCI scanning code. If that attribute is set,
|
||||
PCI parity/error scannining is skipped for that device. The attribute
|
||||
PCI parity/error scanning is skipped for that device. The attribute
|
||||
is:
|
||||
|
||||
broken_parity_status
|
||||
|
@ -29,16 +29,16 @@ o debugfs entries
|
||||
fault-inject-debugfs kernel module provides some debugfs entries for runtime
|
||||
configuration of fault-injection capabilities.
|
||||
|
||||
- /debug/fail*/probability:
|
||||
- /sys/kernel/debug/fail*/probability:
|
||||
|
||||
likelihood of failure injection, in percent.
|
||||
Format: <percent>
|
||||
|
||||
Note that one-failure-per-hundred is a very high error rate
|
||||
for some testcases. Consider setting probability=100 and configure
|
||||
/debug/fail*/interval for such testcases.
|
||||
/sys/kernel/debug/fail*/interval for such testcases.
|
||||
|
||||
- /debug/fail*/interval:
|
||||
- /sys/kernel/debug/fail*/interval:
|
||||
|
||||
specifies the interval between failures, for calls to
|
||||
should_fail() that pass all the other tests.
|
||||
@ -46,18 +46,18 @@ configuration of fault-injection capabilities.
|
||||
Note that if you enable this, by setting interval>1, you will
|
||||
probably want to set probability=100.
|
||||
|
||||
- /debug/fail*/times:
|
||||
- /sys/kernel/debug/fail*/times:
|
||||
|
||||
specifies how many times failures may happen at most.
|
||||
A value of -1 means "no limit".
|
||||
|
||||
- /debug/fail*/space:
|
||||
- /sys/kernel/debug/fail*/space:
|
||||
|
||||
specifies an initial resource "budget", decremented by "size"
|
||||
on each call to should_fail(,size). Failure injection is
|
||||
suppressed until "space" reaches zero.
|
||||
|
||||
- /debug/fail*/verbose
|
||||
- /sys/kernel/debug/fail*/verbose
|
||||
|
||||
Format: { 0 | 1 | 2 }
|
||||
specifies the verbosity of the messages when failure is
|
||||
@ -65,17 +65,17 @@ configuration of fault-injection capabilities.
|
||||
log line per failure; '2' will print a call trace too -- useful
|
||||
to debug the problems revealed by fault injection.
|
||||
|
||||
- /debug/fail*/task-filter:
|
||||
- /sys/kernel/debug/fail*/task-filter:
|
||||
|
||||
Format: { 'Y' | 'N' }
|
||||
A value of 'N' disables filtering by process (default).
|
||||
Any positive value limits failures to only processes indicated by
|
||||
/proc/<pid>/make-it-fail==1.
|
||||
|
||||
- /debug/fail*/require-start:
|
||||
- /debug/fail*/require-end:
|
||||
- /debug/fail*/reject-start:
|
||||
- /debug/fail*/reject-end:
|
||||
- /sys/kernel/debug/fail*/require-start:
|
||||
- /sys/kernel/debug/fail*/require-end:
|
||||
- /sys/kernel/debug/fail*/reject-start:
|
||||
- /sys/kernel/debug/fail*/reject-end:
|
||||
|
||||
specifies the range of virtual addresses tested during
|
||||
stacktrace walking. Failure is injected only if some caller
|
||||
@ -84,26 +84,26 @@ configuration of fault-injection capabilities.
|
||||
Default required range is [0,ULONG_MAX) (whole of virtual address space).
|
||||
Default rejected range is [0,0).
|
||||
|
||||
- /debug/fail*/stacktrace-depth:
|
||||
- /sys/kernel/debug/fail*/stacktrace-depth:
|
||||
|
||||
specifies the maximum stacktrace depth walked during search
|
||||
for a caller within [require-start,require-end) OR
|
||||
[reject-start,reject-end).
|
||||
|
||||
- /debug/fail_page_alloc/ignore-gfp-highmem:
|
||||
- /sys/kernel/debug/fail_page_alloc/ignore-gfp-highmem:
|
||||
|
||||
Format: { 'Y' | 'N' }
|
||||
default is 'N', setting it to 'Y' won't inject failures into
|
||||
highmem/user allocations.
|
||||
|
||||
- /debug/failslab/ignore-gfp-wait:
|
||||
- /debug/fail_page_alloc/ignore-gfp-wait:
|
||||
- /sys/kernel/debug/failslab/ignore-gfp-wait:
|
||||
- /sys/kernel/debug/fail_page_alloc/ignore-gfp-wait:
|
||||
|
||||
Format: { 'Y' | 'N' }
|
||||
default is 'N', setting it to 'Y' will inject failures
|
||||
only into non-sleep allocations (GFP_ATOMIC allocations).
|
||||
|
||||
- /debug/fail_page_alloc/min-order:
|
||||
- /sys/kernel/debug/fail_page_alloc/min-order:
|
||||
|
||||
specifies the minimum page allocation order to be injected
|
||||
failures.
|
||||
@ -166,13 +166,13 @@ o Inject slab allocation failures into module init/exit code
|
||||
#!/bin/bash
|
||||
|
||||
FAILTYPE=failslab
|
||||
echo Y > /debug/$FAILTYPE/task-filter
|
||||
echo 10 > /debug/$FAILTYPE/probability
|
||||
echo 100 > /debug/$FAILTYPE/interval
|
||||
echo -1 > /debug/$FAILTYPE/times
|
||||
echo 0 > /debug/$FAILTYPE/space
|
||||
echo 2 > /debug/$FAILTYPE/verbose
|
||||
echo 1 > /debug/$FAILTYPE/ignore-gfp-wait
|
||||
echo Y > /sys/kernel/debug/$FAILTYPE/task-filter
|
||||
echo 10 > /sys/kernel/debug/$FAILTYPE/probability
|
||||
echo 100 > /sys/kernel/debug/$FAILTYPE/interval
|
||||
echo -1 > /sys/kernel/debug/$FAILTYPE/times
|
||||
echo 0 > /sys/kernel/debug/$FAILTYPE/space
|
||||
echo 2 > /sys/kernel/debug/$FAILTYPE/verbose
|
||||
echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait
|
||||
|
||||
faulty_system()
|
||||
{
|
||||
@ -217,20 +217,20 @@ then
|
||||
exit 1
|
||||
fi
|
||||
|
||||
cat /sys/module/$module/sections/.text > /debug/$FAILTYPE/require-start
|
||||
cat /sys/module/$module/sections/.data > /debug/$FAILTYPE/require-end
|
||||
cat /sys/module/$module/sections/.text > /sys/kernel/debug/$FAILTYPE/require-start
|
||||
cat /sys/module/$module/sections/.data > /sys/kernel/debug/$FAILTYPE/require-end
|
||||
|
||||
echo N > /debug/$FAILTYPE/task-filter
|
||||
echo 10 > /debug/$FAILTYPE/probability
|
||||
echo 100 > /debug/$FAILTYPE/interval
|
||||
echo -1 > /debug/$FAILTYPE/times
|
||||
echo 0 > /debug/$FAILTYPE/space
|
||||
echo 2 > /debug/$FAILTYPE/verbose
|
||||
echo 1 > /debug/$FAILTYPE/ignore-gfp-wait
|
||||
echo 1 > /debug/$FAILTYPE/ignore-gfp-highmem
|
||||
echo 10 > /debug/$FAILTYPE/stacktrace-depth
|
||||
echo N > /sys/kernel/debug/$FAILTYPE/task-filter
|
||||
echo 10 > /sys/kernel/debug/$FAILTYPE/probability
|
||||
echo 100 > /sys/kernel/debug/$FAILTYPE/interval
|
||||
echo -1 > /sys/kernel/debug/$FAILTYPE/times
|
||||
echo 0 > /sys/kernel/debug/$FAILTYPE/space
|
||||
echo 2 > /sys/kernel/debug/$FAILTYPE/verbose
|
||||
echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait
|
||||
echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-highmem
|
||||
echo 10 > /sys/kernel/debug/$FAILTYPE/stacktrace-depth
|
||||
|
||||
trap "echo 0 > /debug/$FAILTYPE/probability" SIGINT SIGTERM EXIT
|
||||
trap "echo 0 > /sys/kernel/debug/$FAILTYPE/probability" SIGINT SIGTERM EXIT
|
||||
|
||||
echo "Injecting errors into the module $module... (interrupt to stop)"
|
||||
sleep 1000000
|
||||
|
@ -1,7 +1,7 @@
|
||||
SH7760/SH7763 integrated LCDC Framebuffer driver
|
||||
================================================
|
||||
|
||||
0. Overwiew
|
||||
0. Overview
|
||||
-----------
|
||||
The SH7760/SH7763 have an integrated LCD Display controller (LCDC) which
|
||||
supports (in theory) resolutions ranging from 1x1 to 1024x1024,
|
||||
|
@ -95,7 +95,7 @@ There is no way to change the vesafb video mode and/or timings after
|
||||
booting linux. If you are not happy with the 60 Hz refresh rate, you
|
||||
have these options:
|
||||
|
||||
* configure and load the DOS-Tools for your the graphics board (if
|
||||
* configure and load the DOS-Tools for the graphics board (if
|
||||
available) and boot linux with loadlin.
|
||||
* use a native driver (matroxfb/atyfb) instead if vesafb. If none
|
||||
is available, write a new one!
|
||||
|
@ -6,6 +6,20 @@ be removed from this file.
|
||||
|
||||
---------------------------
|
||||
|
||||
What: IRQF_SAMPLE_RANDOM
|
||||
Check: IRQF_SAMPLE_RANDOM
|
||||
When: July 2009
|
||||
|
||||
Why: Many of IRQF_SAMPLE_RANDOM users are technically bogus as entropy
|
||||
sources in the kernel's current entropy model. To resolve this, every
|
||||
input point to the kernel's entropy pool needs to better document the
|
||||
type of entropy source it actually is. This will be replaced with
|
||||
additional add_*_randomness functions in drivers/char/random.c
|
||||
|
||||
Who: Robin Getz <rgetz@blackfin.uclinux.org> & Matt Mackall <mpm@selenic.com>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: The ieee80211_regdom module parameter
|
||||
When: March 2010 / desktop catchup
|
||||
|
||||
@ -354,16 +368,6 @@ Who: Krzysztof Piotr Oledzki <ole@ans.pl>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: i2c_attach_client(), i2c_detach_client(), i2c_driver->detach_client(),
|
||||
i2c_adapter->client_register(), i2c_adapter->client_unregister
|
||||
When: 2.6.30
|
||||
Check: i2c_attach_client i2c_detach_client
|
||||
Why: Deprecated by the new (standard) device driver binding model. Use
|
||||
i2c_driver->probe() and ->remove() instead.
|
||||
Who: Jean Delvare <khali@linux-fr.org>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: fscher and fscpos drivers
|
||||
When: June 2009
|
||||
Why: Deprecated by the new fschmd driver.
|
||||
@ -437,3 +441,20 @@ Why: Superseded by tdfxfb. I2C/DDC support used to live in a separate
|
||||
driver but this caused driver conflicts.
|
||||
Who: Jean Delvare <khali@linux-fr.org>
|
||||
Krzysztof Helt <krzysztof.h1@wp.pl>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: CONFIG_RFKILL_INPUT
|
||||
When: 2.6.33
|
||||
Why: Should be implemented in userspace, policy daemon.
|
||||
Who: Johannes Berg <johannes@sipsolutions.net>
|
||||
|
||||
----------------------------
|
||||
|
||||
What: CONFIG_X86_OLD_MCE
|
||||
When: 2.6.32
|
||||
Why: Remove the old legacy 32bit machine check code. This has been
|
||||
superseded by the newer machine check code from the 64bit port,
|
||||
but the old version has been kept around for easier testing. Note this
|
||||
doesn't impact the old P5 and WinChip machine check handlers.
|
||||
Who: Andi Kleen <andi@firstfloor.org>
|
||||
|
@ -66,6 +66,10 @@ mandatory-locking.txt
|
||||
- info on the Linux implementation of Sys V mandatory file locking.
|
||||
ncpfs.txt
|
||||
- info on Novell Netware(tm) filesystem using NCP protocol.
|
||||
nfs41-server.txt
|
||||
- info on the Linux server implementation of NFSv4 minor version 1.
|
||||
nfs-rdma.txt
|
||||
- how to install and setup the Linux NFS/RDMA client and server software.
|
||||
nfsroot.txt
|
||||
- short guide on setting up a diskless box with NFS root filesystem.
|
||||
nilfs2.txt
|
||||
|
@ -109,27 +109,28 @@ prototypes:
|
||||
|
||||
locking rules:
|
||||
All may block.
|
||||
BKL s_lock s_umount
|
||||
alloc_inode: no no no
|
||||
destroy_inode: no
|
||||
dirty_inode: no (must not sleep)
|
||||
write_inode: no
|
||||
drop_inode: no !!!inode_lock!!!
|
||||
delete_inode: no
|
||||
put_super: yes yes no
|
||||
write_super: no yes read
|
||||
sync_fs: no no read
|
||||
freeze_fs: ?
|
||||
unfreeze_fs: ?
|
||||
statfs: no no no
|
||||
remount_fs: yes yes maybe (see below)
|
||||
clear_inode: no
|
||||
umount_begin: yes no no
|
||||
show_options: no (vfsmount->sem)
|
||||
quota_read: no no no (see below)
|
||||
quota_write: no no no (see below)
|
||||
None have BKL
|
||||
s_umount
|
||||
alloc_inode:
|
||||
destroy_inode:
|
||||
dirty_inode: (must not sleep)
|
||||
write_inode:
|
||||
drop_inode: !!!inode_lock!!!
|
||||
delete_inode:
|
||||
put_super: write
|
||||
write_super: read
|
||||
sync_fs: read
|
||||
freeze_fs: read
|
||||
unfreeze_fs: read
|
||||
statfs: no
|
||||
remount_fs: maybe (see below)
|
||||
clear_inode:
|
||||
umount_begin: no
|
||||
show_options: no (namespace_sem)
|
||||
quota_read: no (see below)
|
||||
quota_write: no (see below)
|
||||
|
||||
->remount_fs() will have the s_umount lock if it's already mounted.
|
||||
->remount_fs() will have the s_umount exclusive lock if it's already mounted.
|
||||
When called from get_sb_single, it does NOT have the s_umount lock.
|
||||
->quota_read() and ->quota_write() functions are both guaranteed to
|
||||
be the only ones operating on the quota file by the quota code (via
|
||||
@ -187,7 +188,7 @@ readpages: no
|
||||
write_begin: no locks the page yes
|
||||
write_end: no yes, unlocks yes
|
||||
perform_write: no n/a yes
|
||||
bmap: yes
|
||||
bmap: no
|
||||
invalidatepage: no yes
|
||||
releasepage: no yes
|
||||
direct_IO: no
|
||||
|
@ -369,7 +369,7 @@ The call requires an initialized struct autofs_dev_ioctl. There are two
|
||||
possible variations. Both use the path field set to the path of the mount
|
||||
point to check and the size field adjusted appropriately. One uses the
|
||||
ioctlfd field to identify a specific mount point to check while the other
|
||||
variation uses the path and optionaly arg1 set to an autofs mount type.
|
||||
variation uses the path and optionally arg1 set to an autofs mount type.
|
||||
The call returns 1 if this is a mount point and sets arg1 to the device
|
||||
number of the mount and field arg2 to the relevant super block magic
|
||||
number (described below) or 0 if it isn't a mountpoint. In both cases
|
||||
|
@ -184,7 +184,7 @@ This has the following fields:
|
||||
have index children.
|
||||
|
||||
If this function is not supplied or if it returns NULL then the first
|
||||
cache in the parent's list will be chosed, or failing that, the first
|
||||
cache in the parent's list will be chosen, or failing that, the first
|
||||
cache in the master list.
|
||||
|
||||
(4) A function to retrieve an object's key from the netfs [mandatory].
|
||||
|
158
Documentation/filesystems/debugfs.txt
Normal file
158
Documentation/filesystems/debugfs.txt
Normal file
@ -0,0 +1,158 @@
|
||||
Copyright 2009 Jonathan Corbet <corbet@lwn.net>
|
||||
|
||||
Debugfs exists as a simple way for kernel developers to make information
|
||||
available to user space. Unlike /proc, which is only meant for information
|
||||
about a process, or sysfs, which has strict one-value-per-file rules,
|
||||
debugfs has no rules at all. Developers can put any information they want
|
||||
there. The debugfs filesystem is also intended to not serve as a stable
|
||||
ABI to user space; in theory, there are no stability constraints placed on
|
||||
files exported there. The real world is not always so simple, though [1];
|
||||
even debugfs interfaces are best designed with the idea that they will need
|
||||
to be maintained forever.
|
||||
|
||||
Debugfs is typically mounted with a command like:
|
||||
|
||||
mount -t debugfs none /sys/kernel/debug
|
||||
|
||||
(Or an equivalent /etc/fstab line).
|
||||
|
||||
Note that the debugfs API is exported GPL-only to modules.
|
||||
|
||||
Code using debugfs should include <linux/debugfs.h>. Then, the first order
|
||||
of business will be to create at least one directory to hold a set of
|
||||
debugfs files:
|
||||
|
||||
struct dentry *debugfs_create_dir(const char *name, struct dentry *parent);
|
||||
|
||||
This call, if successful, will make a directory called name underneath the
|
||||
indicated parent directory. If parent is NULL, the directory will be
|
||||
created in the debugfs root. On success, the return value is a struct
|
||||
dentry pointer which can be used to create files in the directory (and to
|
||||
clean it up at the end). A NULL return value indicates that something went
|
||||
wrong. If ERR_PTR(-ENODEV) is returned, that is an indication that the
|
||||
kernel has been built without debugfs support and none of the functions
|
||||
described below will work.
|
||||
|
||||
The most general way to create a file within a debugfs directory is with:
|
||||
|
||||
struct dentry *debugfs_create_file(const char *name, mode_t mode,
|
||||
struct dentry *parent, void *data,
|
||||
const struct file_operations *fops);
|
||||
|
||||
Here, name is the name of the file to create, mode describes the access
|
||||
permissions the file should have, parent indicates the directory which
|
||||
should hold the file, data will be stored in the i_private field of the
|
||||
resulting inode structure, and fops is a set of file operations which
|
||||
implement the file's behavior. At a minimum, the read() and/or write()
|
||||
operations should be provided; others can be included as needed. Again,
|
||||
the return value will be a dentry pointer to the created file, NULL for
|
||||
error, or ERR_PTR(-ENODEV) if debugfs support is missing.
|
||||
|
||||
In a number of cases, the creation of a set of file operations is not
|
||||
actually necessary; the debugfs code provides a number of helper functions
|
||||
for simple situations. Files containing a single integer value can be
|
||||
created with any of:
|
||||
|
||||
struct dentry *debugfs_create_u8(const char *name, mode_t mode,
|
||||
struct dentry *parent, u8 *value);
|
||||
struct dentry *debugfs_create_u16(const char *name, mode_t mode,
|
||||
struct dentry *parent, u16 *value);
|
||||
struct dentry *debugfs_create_u32(const char *name, mode_t mode,
|
||||
struct dentry *parent, u32 *value);
|
||||
struct dentry *debugfs_create_u64(const char *name, mode_t mode,
|
||||
struct dentry *parent, u64 *value);
|
||||
|
||||
These files support both reading and writing the given value; if a specific
|
||||
file should not be written to, simply set the mode bits accordingly. The
|
||||
values in these files are in decimal; if hexadecimal is more appropriate,
|
||||
the following functions can be used instead:
|
||||
|
||||
struct dentry *debugfs_create_x8(const char *name, mode_t mode,
|
||||
struct dentry *parent, u8 *value);
|
||||
struct dentry *debugfs_create_x16(const char *name, mode_t mode,
|
||||
struct dentry *parent, u16 *value);
|
||||
struct dentry *debugfs_create_x32(const char *name, mode_t mode,
|
||||
struct dentry *parent, u32 *value);
|
||||
|
||||
Note that there is no debugfs_create_x64().
|
||||
|
||||
These functions are useful as long as the developer knows the size of the
|
||||
value to be exported. Some types can have different widths on different
|
||||
architectures, though, complicating the situation somewhat. There is a
|
||||
function meant to help out in one special case:
|
||||
|
||||
struct dentry *debugfs_create_size_t(const char *name, mode_t mode,
|
||||
struct dentry *parent,
|
||||
size_t *value);
|
||||
|
||||
As might be expected, this function will create a debugfs file to represent
|
||||
a variable of type size_t.
|
||||
|
||||
Boolean values can be placed in debugfs with:
|
||||
|
||||
struct dentry *debugfs_create_bool(const char *name, mode_t mode,
|
||||
struct dentry *parent, u32 *value);
|
||||
|
||||
A read on the resulting file will yield either Y (for non-zero values) or
|
||||
N, followed by a newline. If written to, it will accept either upper- or
|
||||
lower-case values, or 1 or 0. Any other input will be silently ignored.
|
||||
|
||||
Finally, a block of arbitrary binary data can be exported with:
|
||||
|
||||
struct debugfs_blob_wrapper {
|
||||
void *data;
|
||||
unsigned long size;
|
||||
};
|
||||
|
||||
struct dentry *debugfs_create_blob(const char *name, mode_t mode,
|
||||
struct dentry *parent,
|
||||
struct debugfs_blob_wrapper *blob);
|
||||
|
||||
A read of this file will return the data pointed to by the
|
||||
debugfs_blob_wrapper structure. Some drivers use "blobs" as a simple way
|
||||
to return several lines of (static) formatted text output. This function
|
||||
can be used to export binary information, but there does not appear to be
|
||||
any code which does so in the mainline. Note that all files created with
|
||||
debugfs_create_blob() are read-only.
|
||||
|
||||
There are a couple of other directory-oriented helper functions:
|
||||
|
||||
struct dentry *debugfs_rename(struct dentry *old_dir,
|
||||
struct dentry *old_dentry,
|
||||
struct dentry *new_dir,
|
||||
const char *new_name);
|
||||
|
||||
struct dentry *debugfs_create_symlink(const char *name,
|
||||
struct dentry *parent,
|
||||
const char *target);
|
||||
|
||||
A call to debugfs_rename() will give a new name to an existing debugfs
|
||||
file, possibly in a different directory. The new_name must not exist prior
|
||||
to the call; the return value is old_dentry with updated information.
|
||||
Symbolic links can be created with debugfs_create_symlink().
|
||||
|
||||
There is one important thing that all debugfs users must take into account:
|
||||
there is no automatic cleanup of any directories created in debugfs. If a
|
||||
module is unloaded without explicitly removing debugfs entries, the result
|
||||
will be a lot of stale pointers and no end of highly antisocial behavior.
|
||||
So all debugfs users - at least those which can be built as modules - must
|
||||
be prepared to remove all files and directories they create there. A file
|
||||
can be removed with:
|
||||
|
||||
void debugfs_remove(struct dentry *dentry);
|
||||
|
||||
The dentry value can be NULL, in which case nothing will be removed.
|
||||
|
||||
Once upon a time, debugfs users were required to remember the dentry
|
||||
pointer for every debugfs file they created so that all files could be
|
||||
cleaned up. We live in more civilized times now, though, and debugfs users
|
||||
can call:
|
||||
|
||||
void debugfs_remove_recursive(struct dentry *dentry);
|
||||
|
||||
If this function is passed a pointer for the dentry corresponding to the
|
||||
top-level directory, the entire hierarchy below that directory will be
|
||||
removed.
|
||||
|
||||
Notes:
|
||||
[1] http://lwn.net/Articles/309298/
|
@ -322,7 +322,7 @@ an upper limit on the block size imposed by the page size of the kernel,
|
||||
so 8kB blocks are only allowed on Alpha systems (and other architectures
|
||||
which support larger pages).
|
||||
|
||||
There is an upper limit of 32768 subdirectories in a single directory.
|
||||
There is an upper limit of 32000 subdirectories in a single directory.
|
||||
|
||||
There is a "soft" upper limit of about 10-15k files in a single directory
|
||||
with the current linear linked-list directory implementation. This limit
|
||||
|
@ -235,6 +235,10 @@ minixdf Make 'df' act like Minix.
|
||||
|
||||
debug Extra debugging information is sent to syslog.
|
||||
|
||||
abort Simulate the effects of calling ext4_abort() for
|
||||
debugging purposes. This is normally used while
|
||||
remounting a filesystem which is already mounted.
|
||||
|
||||
errors=remount-ro Remount the filesystem read-only on an error.
|
||||
errors=continue Keep going on a filesystem error.
|
||||
errors=panic Panic and halt the machine if an error occurs.
|
||||
@ -294,7 +298,7 @@ max_batch_time=usec Maximum amount of time ext4 should wait for
|
||||
amount of time (on average) that it takes to
|
||||
finish committing a transaction. Call this time
|
||||
the "commit time". If the time that the
|
||||
transactoin has been running is less than the
|
||||
transaction has been running is less than the
|
||||
commit time, ext4 will try sleeping for the
|
||||
commit time to see if other operations will join
|
||||
the transaction. The commit time is capped by
|
||||
@ -328,7 +332,7 @@ noauto_da_alloc replacing existing files via patterns such as
|
||||
journal commit, in the default data=ordered
|
||||
mode, the data blocks of the new file are forced
|
||||
to disk before the rename() operation is
|
||||
commited. This provides roughly the same level
|
||||
committed. This provides roughly the same level
|
||||
of guarantees as ext3, and avoids the
|
||||
"zero-length" problem that can happen when a
|
||||
system crashes before the delayed allocation
|
||||
@ -358,7 +362,7 @@ written to the journal first, and then to its final location.
|
||||
In the event of a crash, the journal can be replayed, bringing both data and
|
||||
metadata into a consistent state. This mode is the slowest except when data
|
||||
needs to be read from and written to disk at the same time where it
|
||||
outperforms all others modes. Curently ext4 does not have delayed
|
||||
outperforms all others modes. Currently ext4 does not have delayed
|
||||
allocation support if this data journalling mode is selected.
|
||||
|
||||
References
|
||||
|
@ -204,7 +204,7 @@ fiemap_check_flags() helper:
|
||||
|
||||
int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags);
|
||||
|
||||
The struct fieinfo should be passed in as recieved from ioctl_fiemap(). The
|
||||
The struct fieinfo should be passed in as received from ioctl_fiemap(). The
|
||||
set of fiemap flags which the fs understands should be passed via fs_flags. If
|
||||
fiemap_check_flags finds invalid user flags, it will place the bad values in
|
||||
fieinfo->fi_flags and return -EBADR. If the file system gets -EBADR, from
|
||||
|
@ -60,7 +60,7 @@ go_lock | Called for the first local holder of a lock
|
||||
go_unlock | Called on the final local unlock of a lock
|
||||
go_dump | Called to print content of object for debugfs file, or on
|
||||
| error to dump glock to the log.
|
||||
go_type; | The type of the glock, LM_TYPE_.....
|
||||
go_type | The type of the glock, LM_TYPE_.....
|
||||
go_min_hold_time | The minimum hold time
|
||||
|
||||
The minimum hold time for each lock is the time after a remote lock
|
||||
|
@ -11,18 +11,15 @@ their I/O so file system consistency is maintained. One of the nifty
|
||||
features of GFS is perfect consistency -- changes made to the file system
|
||||
on one machine show up immediately on all other machines in the cluster.
|
||||
|
||||
GFS uses interchangable inter-node locking mechanisms. Different lock
|
||||
modules can plug into GFS and each file system selects the appropriate
|
||||
lock module at mount time. Lock modules include:
|
||||
GFS uses interchangable inter-node locking mechanisms, the currently
|
||||
supported mechanisms are:
|
||||
|
||||
lock_nolock -- allows gfs to be used as a local file system
|
||||
|
||||
lock_dlm -- uses a distributed lock manager (dlm) for inter-node locking
|
||||
The dlm is found at linux/fs/dlm/
|
||||
|
||||
In addition to interfacing with an external locking manager, a gfs lock
|
||||
module is responsible for interacting with external cluster management
|
||||
systems. Lock_dlm depends on user space cluster management systems found
|
||||
Lock_dlm depends on user space cluster management systems found
|
||||
at the URL above.
|
||||
|
||||
To use gfs as a local file system, no external clustering systems are
|
||||
@ -31,13 +28,19 @@ needed, simply:
|
||||
$ mkfs -t gfs2 -p lock_nolock -j 1 /dev/block_device
|
||||
$ mount -t gfs2 /dev/block_device /dir
|
||||
|
||||
GFS2 is not on-disk compatible with previous versions of GFS.
|
||||
If you are using Fedora, you need to install the gfs2-utils package
|
||||
and, for lock_dlm, you will also need to install the cman package
|
||||
and write a cluster.conf as per the documentation.
|
||||
|
||||
GFS2 is not on-disk compatible with previous versions of GFS, but it
|
||||
is pretty close.
|
||||
|
||||
The following man pages can be found at the URL above:
|
||||
gfs2_fsck to repair a filesystem
|
||||
fsck.gfs2 to repair a filesystem
|
||||
gfs2_grow to expand a filesystem online
|
||||
gfs2_jadd to add journals to a filesystem online
|
||||
gfs2_tool to manipulate, examine and tune a filesystem
|
||||
gfs2_quota to examine and change quota values in a filesystem
|
||||
gfs2_convert to convert a gfs filesystem to gfs2 in-place
|
||||
mount.gfs2 to help mount(8) mount a filesystem
|
||||
mkfs.gfs2 to make a filesystem
|
||||
|
@ -23,8 +23,13 @@ Mount options unique to the isofs filesystem.
|
||||
map=off Do not map non-Rock Ridge filenames to lower case
|
||||
map=normal Map non-Rock Ridge filenames to lower case
|
||||
map=acorn As map=normal but also apply Acorn extensions if present
|
||||
mode=xxx Sets the permissions on files to xxx
|
||||
dmode=xxx Sets the permissions on directories to xxx
|
||||
mode=xxx Sets the permissions on files to xxx unless Rock Ridge
|
||||
extensions set the permissions otherwise
|
||||
dmode=xxx Sets the permissions on directories to xxx unless Rock Ridge
|
||||
extensions set the permissions otherwise
|
||||
overriderockperm Set permissions on files and directories according to
|
||||
'mode' and 'dmode' even though Rock Ridge extensions are
|
||||
present.
|
||||
nojoliet Ignore Joliet extensions if they are present.
|
||||
norock Ignore Rock Ridge extensions if they are present.
|
||||
hide Completely strip hidden files from the file system.
|
||||
|
@ -100,7 +100,7 @@ Installation
|
||||
$ sudo cp utils/mount/mount.nfs /sbin/mount.nfs
|
||||
|
||||
In this location, mount.nfs will be invoked automatically for NFS mounts
|
||||
by the system mount commmand.
|
||||
by the system mount command.
|
||||
|
||||
NOTE: mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed
|
||||
on the NFS client machine. You do not need this specific version of
|
||||
|
@ -39,9 +39,8 @@ Features which NILFS2 does not support yet:
|
||||
- extended attributes
|
||||
- POSIX ACLs
|
||||
- quotas
|
||||
- writable snapshots
|
||||
- remote backup (CDP)
|
||||
- data integrity
|
||||
- fsck
|
||||
- resize
|
||||
- defragmentation
|
||||
|
||||
Mount options
|
||||
|
@ -5,11 +5,12 @@
|
||||
Bodo Bauer <bb@ricochet.net>
|
||||
|
||||
2.4.x update Jorge Nerin <comandante@zaralinux.com> November 14 2000
|
||||
move /proc/sys Shen Feng <shen@cn.fujitsu.com> April 1 2009
|
||||
move /proc/sys Shen Feng <shen@cn.fujitsu.com> April 1 2009
|
||||
------------------------------------------------------------------------------
|
||||
Version 1.3 Kernel version 2.2.12
|
||||
Kernel version 2.4.0-test11-pre4
|
||||
------------------------------------------------------------------------------
|
||||
fixes/update part 1.1 Stefani Seibold <stefani@seibold.net> June 9 2009
|
||||
|
||||
Table of Contents
|
||||
-----------------
|
||||
@ -116,7 +117,7 @@ The link self points to the process reading the file system. Each process
|
||||
subdirectory has the entries listed in Table 1-1.
|
||||
|
||||
|
||||
Table 1-1: Process specific entries in /proc
|
||||
Table 1-1: Process specific entries in /proc
|
||||
..............................................................................
|
||||
File Content
|
||||
clear_refs Clears page referenced bits shown in smaps output
|
||||
@ -134,46 +135,103 @@ Table 1-1: Process specific entries in /proc
|
||||
status Process status in human readable form
|
||||
wchan If CONFIG_KALLSYMS is set, a pre-decoded wchan
|
||||
stack Report full stack trace, enable via CONFIG_STACKTRACE
|
||||
smaps Extension based on maps, the rss size for each mapped file
|
||||
smaps a extension based on maps, showing the memory consumption of
|
||||
each mapping
|
||||
..............................................................................
|
||||
|
||||
For example, to get the status information of a process, all you have to do is
|
||||
read the file /proc/PID/status:
|
||||
|
||||
>cat /proc/self/status
|
||||
Name: cat
|
||||
State: R (running)
|
||||
Pid: 5452
|
||||
PPid: 743
|
||||
>cat /proc/self/status
|
||||
Name: cat
|
||||
State: R (running)
|
||||
Tgid: 5452
|
||||
Pid: 5452
|
||||
PPid: 743
|
||||
TracerPid: 0 (2.4)
|
||||
Uid: 501 501 501 501
|
||||
Gid: 100 100 100 100
|
||||
Groups: 100 14 16
|
||||
VmSize: 1112 kB
|
||||
VmLck: 0 kB
|
||||
VmRSS: 348 kB
|
||||
VmData: 24 kB
|
||||
VmStk: 12 kB
|
||||
VmExe: 8 kB
|
||||
VmLib: 1044 kB
|
||||
SigPnd: 0000000000000000
|
||||
SigBlk: 0000000000000000
|
||||
SigIgn: 0000000000000000
|
||||
SigCgt: 0000000000000000
|
||||
CapInh: 00000000fffffeff
|
||||
CapPrm: 0000000000000000
|
||||
CapEff: 0000000000000000
|
||||
|
||||
Uid: 501 501 501 501
|
||||
Gid: 100 100 100 100
|
||||
FDSize: 256
|
||||
Groups: 100 14 16
|
||||
VmPeak: 5004 kB
|
||||
VmSize: 5004 kB
|
||||
VmLck: 0 kB
|
||||
VmHWM: 476 kB
|
||||
VmRSS: 476 kB
|
||||
VmData: 156 kB
|
||||
VmStk: 88 kB
|
||||
VmExe: 68 kB
|
||||
VmLib: 1412 kB
|
||||
VmPTE: 20 kb
|
||||
Threads: 1
|
||||
SigQ: 0/28578
|
||||
SigPnd: 0000000000000000
|
||||
ShdPnd: 0000000000000000
|
||||
SigBlk: 0000000000000000
|
||||
SigIgn: 0000000000000000
|
||||
SigCgt: 0000000000000000
|
||||
CapInh: 00000000fffffeff
|
||||
CapPrm: 0000000000000000
|
||||
CapEff: 0000000000000000
|
||||
CapBnd: ffffffffffffffff
|
||||
voluntary_ctxt_switches: 0
|
||||
nonvoluntary_ctxt_switches: 1
|
||||
|
||||
This shows you nearly the same information you would get if you viewed it with
|
||||
the ps command. In fact, ps uses the proc file system to obtain its
|
||||
information. The statm file contains more detailed information about the
|
||||
process memory usage. Its seven fields are explained in Table 1-2. The stat
|
||||
file contains details information about the process itself. Its fields are
|
||||
explained in Table 1-3.
|
||||
information. But you get a more detailed view of the process by reading the
|
||||
file /proc/PID/status. It fields are described in table 1-2.
|
||||
|
||||
The statm file contains more detailed information about the process
|
||||
memory usage. Its seven fields are explained in Table 1-3. The stat file
|
||||
contains details information about the process itself. Its fields are
|
||||
explained in Table 1-4.
|
||||
|
||||
Table 1-2: Contents of the statm files (as of 2.6.8-rc3)
|
||||
Table 1-2: Contents of the statm files (as of 2.6.30-rc7)
|
||||
..............................................................................
|
||||
Field Content
|
||||
Name filename of the executable
|
||||
State state (R is running, S is sleeping, D is sleeping
|
||||
in an uninterruptible wait, Z is zombie,
|
||||
T is traced or stopped)
|
||||
Tgid thread group ID
|
||||
Pid process id
|
||||
PPid process id of the parent process
|
||||
TracerPid PID of process tracing this process (0 if not)
|
||||
Uid Real, effective, saved set, and file system UIDs
|
||||
Gid Real, effective, saved set, and file system GIDs
|
||||
FDSize number of file descriptor slots currently allocated
|
||||
Groups supplementary group list
|
||||
VmPeak peak virtual memory size
|
||||
VmSize total program size
|
||||
VmLck locked memory size
|
||||
VmHWM peak resident set size ("high water mark")
|
||||
VmRSS size of memory portions
|
||||
VmData size of data, stack, and text segments
|
||||
VmStk size of data, stack, and text segments
|
||||
VmExe size of text segment
|
||||
VmLib size of shared library code
|
||||
VmPTE size of page table entries
|
||||
Threads number of threads
|
||||
SigQ number of signals queued/max. number for queue
|
||||
SigPnd bitmap of pending signals for the thread
|
||||
ShdPnd bitmap of shared pending signals for the process
|
||||
SigBlk bitmap of blocked signals
|
||||
SigIgn bitmap of ignored signals
|
||||
SigCgt bitmap of catched signals
|
||||
CapInh bitmap of inheritable capabilities
|
||||
CapPrm bitmap of permitted capabilities
|
||||
CapEff bitmap of effective capabilities
|
||||
CapBnd bitmap of capabilities bounding set
|
||||
Cpus_allowed mask of CPUs on which this process may run
|
||||
Cpus_allowed_list Same as previous, but in "list format"
|
||||
Mems_allowed mask of memory nodes allowed to this process
|
||||
Mems_allowed_list Same as previous, but in "list format"
|
||||
voluntary_ctxt_switches number of voluntary context switches
|
||||
nonvoluntary_ctxt_switches number of non voluntary context switches
|
||||
..............................................................................
|
||||
|
||||
Table 1-3: Contents of the statm files (as of 2.6.8-rc3)
|
||||
..............................................................................
|
||||
Field Content
|
||||
size total program size (pages) (same as VmSize in status)
|
||||
@ -188,7 +246,7 @@ Table 1-2: Contents of the statm files (as of 2.6.8-rc3)
|
||||
..............................................................................
|
||||
|
||||
|
||||
Table 1-3: Contents of the stat files (as of 2.6.22-rc3)
|
||||
Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
|
||||
..............................................................................
|
||||
Field Content
|
||||
pid process id
|
||||
@ -222,10 +280,10 @@ Table 1-3: Contents of the stat files (as of 2.6.22-rc3)
|
||||
start_stack address of the start of the stack
|
||||
esp current value of ESP
|
||||
eip current value of EIP
|
||||
pending bitmap of pending signals (obsolete)
|
||||
blocked bitmap of blocked signals (obsolete)
|
||||
sigign bitmap of ignored signals (obsolete)
|
||||
sigcatch bitmap of catched signals (obsolete)
|
||||
pending bitmap of pending signals
|
||||
blocked bitmap of blocked signals
|
||||
sigign bitmap of ignored signals
|
||||
sigcatch bitmap of catched signals
|
||||
wchan address where process went to sleep
|
||||
0 (place holder)
|
||||
0 (place holder)
|
||||
@ -234,19 +292,99 @@ Table 1-3: Contents of the stat files (as of 2.6.22-rc3)
|
||||
rt_priority realtime priority
|
||||
policy scheduling policy (man sched_setscheduler)
|
||||
blkio_ticks time spent waiting for block IO
|
||||
gtime guest time of the task in jiffies
|
||||
cgtime guest time of the task children in jiffies
|
||||
..............................................................................
|
||||
|
||||
The /proc/PID/map file containing the currently mapped memory regions and
|
||||
their access permissions.
|
||||
|
||||
The format is:
|
||||
|
||||
address perms offset dev inode pathname
|
||||
|
||||
08048000-08049000 r-xp 00000000 03:00 8312 /opt/test
|
||||
08049000-0804a000 rw-p 00001000 03:00 8312 /opt/test
|
||||
0804a000-0806b000 rw-p 00000000 00:00 0 [heap]
|
||||
a7cb1000-a7cb2000 ---p 00000000 00:00 0
|
||||
a7cb2000-a7eb2000 rw-p 00000000 00:00 0
|
||||
a7eb2000-a7eb3000 ---p 00000000 00:00 0
|
||||
a7eb3000-a7ed5000 rw-p 00000000 00:00 0
|
||||
a7ed5000-a8008000 r-xp 00000000 03:00 4222 /lib/libc.so.6
|
||||
a8008000-a800a000 r--p 00133000 03:00 4222 /lib/libc.so.6
|
||||
a800a000-a800b000 rw-p 00135000 03:00 4222 /lib/libc.so.6
|
||||
a800b000-a800e000 rw-p 00000000 00:00 0
|
||||
a800e000-a8022000 r-xp 00000000 03:00 14462 /lib/libpthread.so.0
|
||||
a8022000-a8023000 r--p 00013000 03:00 14462 /lib/libpthread.so.0
|
||||
a8023000-a8024000 rw-p 00014000 03:00 14462 /lib/libpthread.so.0
|
||||
a8024000-a8027000 rw-p 00000000 00:00 0
|
||||
a8027000-a8043000 r-xp 00000000 03:00 8317 /lib/ld-linux.so.2
|
||||
a8043000-a8044000 r--p 0001b000 03:00 8317 /lib/ld-linux.so.2
|
||||
a8044000-a8045000 rw-p 0001c000 03:00 8317 /lib/ld-linux.so.2
|
||||
aff35000-aff4a000 rw-p 00000000 00:00 0 [stack]
|
||||
ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso]
|
||||
|
||||
where "address" is the address space in the process that it occupies, "perms"
|
||||
is a set of permissions:
|
||||
|
||||
r = read
|
||||
w = write
|
||||
x = execute
|
||||
s = shared
|
||||
p = private (copy on write)
|
||||
|
||||
"offset" is the offset into the mapping, "dev" is the device (major:minor), and
|
||||
"inode" is the inode on that device. 0 indicates that no inode is associated
|
||||
with the memory region, as the case would be with BSS (uninitialized data).
|
||||
The "pathname" shows the name associated file for this mapping. If the mapping
|
||||
is not associated with a file:
|
||||
|
||||
[heap] = the heap of the program
|
||||
[stack] = the stack of the main process
|
||||
[vdso] = the "virtual dynamic shared object",
|
||||
the kernel system call handler
|
||||
|
||||
or if empty, the mapping is anonymous.
|
||||
|
||||
|
||||
The /proc/PID/smaps is an extension based on maps, showing the memory
|
||||
consumption for each of the process's mappings. For each of mappings there
|
||||
is a series of lines such as the following:
|
||||
|
||||
08048000-080bc000 r-xp 00000000 03:02 13130 /bin/bash
|
||||
Size: 1084 kB
|
||||
Rss: 892 kB
|
||||
Pss: 374 kB
|
||||
Shared_Clean: 892 kB
|
||||
Shared_Dirty: 0 kB
|
||||
Private_Clean: 0 kB
|
||||
Private_Dirty: 0 kB
|
||||
Referenced: 892 kB
|
||||
Swap: 0 kB
|
||||
KernelPageSize: 4 kB
|
||||
MMUPageSize: 4 kB
|
||||
|
||||
The first of these lines shows the same information as is displayed for the
|
||||
mapping in /proc/PID/maps. The remaining lines show the size of the mapping,
|
||||
the amount of the mapping that is currently resident in RAM, the "proportional
|
||||
set size” (divide each shared page by the number of processes sharing it), the
|
||||
number of clean and dirty shared pages in the mapping, and the number of clean
|
||||
and dirty private pages in the mapping. The "Referenced" indicates the amount
|
||||
of memory currently marked as referenced or accessed.
|
||||
|
||||
This file is only present if the CONFIG_MMU kernel configuration option is
|
||||
enabled.
|
||||
|
||||
1.2 Kernel data
|
||||
---------------
|
||||
|
||||
Similar to the process entries, the kernel data files give information about
|
||||
the running kernel. The files used to obtain this information are contained in
|
||||
/proc and are listed in Table 1-4. Not all of these will be present in your
|
||||
/proc and are listed in Table 1-5. Not all of these will be present in your
|
||||
system. It depends on the kernel configuration and the loaded modules, which
|
||||
files are there, and which are missing.
|
||||
|
||||
Table 1-4: Kernel info in /proc
|
||||
Table 1-5: Kernel info in /proc
|
||||
..............................................................................
|
||||
File Content
|
||||
apm Advanced power management info
|
||||
@ -283,6 +421,7 @@ Table 1-4: Kernel info in /proc
|
||||
rtc Real time clock
|
||||
scsi SCSI info (see text)
|
||||
slabinfo Slab pool info
|
||||
softirqs softirq usage
|
||||
stat Overall statistics
|
||||
swaps Swap space utilization
|
||||
sys See chapter 2
|
||||
@ -366,7 +505,7 @@ just those considered 'most important'. The new vectors are:
|
||||
RES, CAL, TLB -- rescheduling, call and TLB flush interrupts are
|
||||
sent from one CPU to another per the needs of the OS. Typically,
|
||||
their statistics are used by kernel developers and interested users to
|
||||
determine the occurance of interrupt of the given type.
|
||||
determine the occurrence of interrupts of the given type.
|
||||
|
||||
The above IRQ vectors are displayed only when relevent. For example,
|
||||
the threshold vector does not exist on x86_64 platforms. Others are
|
||||
@ -551,7 +690,7 @@ Committed_AS: The amount of memory presently allocated on the system.
|
||||
memory once that memory has been successfully allocated.
|
||||
VmallocTotal: total size of vmalloc memory area
|
||||
VmallocUsed: amount of vmalloc area which is used
|
||||
VmallocChunk: largest contigious block of vmalloc area which is free
|
||||
VmallocChunk: largest contiguous block of vmalloc area which is free
|
||||
|
||||
..............................................................................
|
||||
|
||||
@ -597,6 +736,25 @@ on the kind of area :
|
||||
0xffffffffa0017000-0xffffffffa0022000 45056 sys_init_module+0xc27/0x1d00 ...
|
||||
pages=10 vmalloc N0=10
|
||||
|
||||
..............................................................................
|
||||
|
||||
softirqs:
|
||||
|
||||
Provides counts of softirq handlers serviced since boot time, for each cpu.
|
||||
|
||||
> cat /proc/softirqs
|
||||
CPU0 CPU1 CPU2 CPU3
|
||||
HI: 0 0 0 0
|
||||
TIMER: 27166 27120 27097 27034
|
||||
NET_TX: 0 0 0 17
|
||||
NET_RX: 42 0 0 39
|
||||
BLOCK: 0 0 107 1121
|
||||
TASKLET: 0 0 0 290
|
||||
SCHED: 27035 26983 26971 26746
|
||||
HRTIMER: 0 0 0 0
|
||||
RCU: 1678 1769 2178 2250
|
||||
|
||||
|
||||
1.3 IDE devices in /proc/ide
|
||||
----------------------------
|
||||
|
||||
@ -614,10 +772,10 @@ IDE devices:
|
||||
|
||||
More detailed information can be found in the controller specific
|
||||
subdirectories. These are named ide0, ide1 and so on. Each of these
|
||||
directories contains the files shown in table 1-5.
|
||||
directories contains the files shown in table 1-6.
|
||||
|
||||
|
||||
Table 1-5: IDE controller info in /proc/ide/ide?
|
||||
Table 1-6: IDE controller info in /proc/ide/ide?
|
||||
..............................................................................
|
||||
File Content
|
||||
channel IDE channel (0 or 1)
|
||||
@ -627,11 +785,11 @@ Table 1-5: IDE controller info in /proc/ide/ide?
|
||||
..............................................................................
|
||||
|
||||
Each device connected to a controller has a separate subdirectory in the
|
||||
controllers directory. The files listed in table 1-6 are contained in these
|
||||
controllers directory. The files listed in table 1-7 are contained in these
|
||||
directories.
|
||||
|
||||
|
||||
Table 1-6: IDE device information
|
||||
Table 1-7: IDE device information
|
||||
..............................................................................
|
||||
File Content
|
||||
cache The cache
|
||||
@ -673,12 +831,12 @@ the drive parameters:
|
||||
1.4 Networking info in /proc/net
|
||||
--------------------------------
|
||||
|
||||
The subdirectory /proc/net follows the usual pattern. Table 1-6 shows the
|
||||
The subdirectory /proc/net follows the usual pattern. Table 1-8 shows the
|
||||
additional values you get for IP version 6 if you configure the kernel to
|
||||
support this. Table 1-7 lists the files and their meaning.
|
||||
support this. Table 1-9 lists the files and their meaning.
|
||||
|
||||
|
||||
Table 1-6: IPv6 info in /proc/net
|
||||
Table 1-8: IPv6 info in /proc/net
|
||||
..............................................................................
|
||||
File Content
|
||||
udp6 UDP sockets (IPv6)
|
||||
@ -693,7 +851,7 @@ Table 1-6: IPv6 info in /proc/net
|
||||
..............................................................................
|
||||
|
||||
|
||||
Table 1-7: Network info in /proc/net
|
||||
Table 1-9: Network info in /proc/net
|
||||
..............................................................................
|
||||
File Content
|
||||
arp Kernel ARP table
|
||||
@ -817,10 +975,10 @@ The directory /proc/parport contains information about the parallel ports of
|
||||
your system. It has one subdirectory for each port, named after the port
|
||||
number (0,1,2,...).
|
||||
|
||||
These directories contain the four files shown in Table 1-8.
|
||||
These directories contain the four files shown in Table 1-10.
|
||||
|
||||
|
||||
Table 1-8: Files in /proc/parport
|
||||
Table 1-10: Files in /proc/parport
|
||||
..............................................................................
|
||||
File Content
|
||||
autoprobe Any IEEE-1284 device ID information that has been acquired.
|
||||
@ -838,10 +996,10 @@ Table 1-8: Files in /proc/parport
|
||||
|
||||
Information about the available and actually used tty's can be found in the
|
||||
directory /proc/tty.You'll find entries for drivers and line disciplines in
|
||||
this directory, as shown in Table 1-9.
|
||||
this directory, as shown in Table 1-11.
|
||||
|
||||
|
||||
Table 1-9: Files in /proc/tty
|
||||
Table 1-11: Files in /proc/tty
|
||||
..............................................................................
|
||||
File Content
|
||||
drivers list of drivers and their usage
|
||||
@ -883,6 +1041,7 @@ since the system first booted. For a quick look, simply cat the file:
|
||||
processes 2915
|
||||
procs_running 1
|
||||
procs_blocked 0
|
||||
softirq 183433 0 21755 12 39 1137 231 21459 2263
|
||||
|
||||
The very first "cpu" line aggregates the numbers in all of the other "cpuN"
|
||||
lines. These numbers identify the amount of time the CPU has spent performing
|
||||
@ -918,6 +1077,11 @@ CPUs.
|
||||
The "procs_blocked" line gives the number of processes currently blocked,
|
||||
waiting for I/O to complete.
|
||||
|
||||
The "softirq" line gives counts of softirqs serviced since boot time, for each
|
||||
of the possible system softirqs. The first column is the total of all
|
||||
softirqs serviced; each subsequent column is the total for that particular
|
||||
softirq.
|
||||
|
||||
|
||||
1.9 Ext4 file system parameters
|
||||
------------------------------
|
||||
@ -926,9 +1090,9 @@ Information about mounted ext4 file systems can be found in
|
||||
/proc/fs/ext4. Each mounted filesystem will have a directory in
|
||||
/proc/fs/ext4 based on its device name (i.e., /proc/fs/ext4/hdc or
|
||||
/proc/fs/ext4/dm-0). The files in each per-device directory are shown
|
||||
in Table 1-10, below.
|
||||
in Table 1-12, below.
|
||||
|
||||
Table 1-10: Files in /proc/fs/ext4/<devname>
|
||||
Table 1-12: Files in /proc/fs/ext4/<devname>
|
||||
..............................................................................
|
||||
File Content
|
||||
mb_groups details of multiblock allocator buddy cache of free blocks
|
||||
@ -1003,11 +1167,13 @@ CHAPTER 3: PER-PROCESS PARAMETERS
|
||||
3.1 /proc/<pid>/oom_adj - Adjust the oom-killer score
|
||||
------------------------------------------------------
|
||||
|
||||
This file can be used to adjust the score used to select which processes
|
||||
should be killed in an out-of-memory situation. Giving it a high score will
|
||||
increase the likelihood of this process being killed by the oom-killer. Valid
|
||||
values are in the range -16 to +15, plus the special value -17, which disables
|
||||
oom-killing altogether for this process.
|
||||
This file can be used to adjust the score used to select which processes should
|
||||
be killed in an out-of-memory situation. The oom_adj value is a characteristic
|
||||
of the task's mm, so all threads that share an mm with pid will have the same
|
||||
oom_adj value. A high value will increase the likelihood of this process being
|
||||
killed by the oom-killer. Valid values are in the range -16 to +15 as
|
||||
explained below and a special value of -17, which disables oom-killing
|
||||
altogether for threads sharing pid's mm.
|
||||
|
||||
The process to be killed in an out-of-memory situation is selected among all others
|
||||
based on its badness score. This value equals the original memory size of the process
|
||||
@ -1021,6 +1187,9 @@ the parent's score if they do not share the same memory. Thus forking servers
|
||||
are the prime candidates to be killed. Having only one 'hungry' child will make
|
||||
parent less preferable than the child.
|
||||
|
||||
/proc/<pid>/oom_adj cannot be changed for kthreads since they are immune from
|
||||
oom-killing already.
|
||||
|
||||
/proc/<pid>/oom_score shows process' current badness score.
|
||||
|
||||
The following heuristics are then applied:
|
||||
|
@ -72,7 +72,7 @@ The 'rom' file is special in that it provides read-only access to the device's
|
||||
ROM file, if available. It's disabled by default, however, so applications
|
||||
should write the string "1" to the file to enable it before attempting a read
|
||||
call, and disable it following the access by writing "0" to the file. Note
|
||||
that the device must be enabled for a rom read to return data succesfully.
|
||||
that the device must be enabled for a rom read to return data successfully.
|
||||
In the event a driver is not bound to the device, it can be enabled using the
|
||||
'enable' file, documented above.
|
||||
|
||||
|
@ -124,14 +124,19 @@ sys_immutable -- If set, ATTR_SYS attribute on FAT is handled as
|
||||
flush -- If set, the filesystem will try to flush to disk more
|
||||
early than normal. Not set by default.
|
||||
|
||||
rodir -- FAT has the ATTR_RO (read-only) attribute. But on Windows,
|
||||
the ATTR_RO of the directory will be just ignored actually,
|
||||
and is used by only applications as flag. E.g. it's setted
|
||||
for the customized folder.
|
||||
rodir -- FAT has the ATTR_RO (read-only) attribute. On Windows,
|
||||
the ATTR_RO of the directory will just be ignored,
|
||||
and is used only by applications as a flag (e.g. it's set
|
||||
for the customized folder).
|
||||
|
||||
If you want to use ATTR_RO as read-only flag even for
|
||||
the directory, set this option.
|
||||
|
||||
errors=panic|continue|remount-ro
|
||||
-- specify FAT behavior on critical errors: panic, continue
|
||||
without doing anything or remount the partition in
|
||||
read-only mode (default behavior).
|
||||
|
||||
<bool>: 0,1,yes,no,true,false
|
||||
|
||||
TODO
|
||||
|
@ -77,7 +77,8 @@
|
||||
seconds for the whole load operation.
|
||||
|
||||
- request_firmware_nowait() is also provided for convenience in
|
||||
non-user contexts.
|
||||
user contexts to request firmware asynchronously, but can't be called
|
||||
in atomic contexts.
|
||||
|
||||
|
||||
about in-kernel persistence:
|
||||
|
131
Documentation/futex-requeue-pi.txt
Normal file
131
Documentation/futex-requeue-pi.txt
Normal file
@ -0,0 +1,131 @@
|
||||
Futex Requeue PI
|
||||
----------------
|
||||
|
||||
Requeueing of tasks from a non-PI futex to a PI futex requires
|
||||
special handling in order to ensure the underlying rt_mutex is never
|
||||
left without an owner if it has waiters; doing so would break the PI
|
||||
boosting logic [see rt-mutex-desgin.txt] For the purposes of
|
||||
brevity, this action will be referred to as "requeue_pi" throughout
|
||||
this document. Priority inheritance is abbreviated throughout as
|
||||
"PI".
|
||||
|
||||
Motivation
|
||||
----------
|
||||
|
||||
Without requeue_pi, the glibc implementation of
|
||||
pthread_cond_broadcast() must resort to waking all the tasks waiting
|
||||
on a pthread_condvar and letting them try to sort out which task
|
||||
gets to run first in classic thundering-herd formation. An ideal
|
||||
implementation would wake the highest-priority waiter, and leave the
|
||||
rest to the natural wakeup inherent in unlocking the mutex
|
||||
associated with the condvar.
|
||||
|
||||
Consider the simplified glibc calls:
|
||||
|
||||
/* caller must lock mutex */
|
||||
pthread_cond_wait(cond, mutex)
|
||||
{
|
||||
lock(cond->__data.__lock);
|
||||
unlock(mutex);
|
||||
do {
|
||||
unlock(cond->__data.__lock);
|
||||
futex_wait(cond->__data.__futex);
|
||||
lock(cond->__data.__lock);
|
||||
} while(...)
|
||||
unlock(cond->__data.__lock);
|
||||
lock(mutex);
|
||||
}
|
||||
|
||||
pthread_cond_broadcast(cond)
|
||||
{
|
||||
lock(cond->__data.__lock);
|
||||
unlock(cond->__data.__lock);
|
||||
futex_requeue(cond->data.__futex, cond->mutex);
|
||||
}
|
||||
|
||||
Once pthread_cond_broadcast() requeues the tasks, the cond->mutex
|
||||
has waiters. Note that pthread_cond_wait() attempts to lock the
|
||||
mutex only after it has returned to user space. This will leave the
|
||||
underlying rt_mutex with waiters, and no owner, breaking the
|
||||
previously mentioned PI-boosting algorithms.
|
||||
|
||||
In order to support PI-aware pthread_condvar's, the kernel needs to
|
||||
be able to requeue tasks to PI futexes. This support implies that
|
||||
upon a successful futex_wait system call, the caller would return to
|
||||
user space already holding the PI futex. The glibc implementation
|
||||
would be modified as follows:
|
||||
|
||||
|
||||
/* caller must lock mutex */
|
||||
pthread_cond_wait_pi(cond, mutex)
|
||||
{
|
||||
lock(cond->__data.__lock);
|
||||
unlock(mutex);
|
||||
do {
|
||||
unlock(cond->__data.__lock);
|
||||
futex_wait_requeue_pi(cond->__data.__futex);
|
||||
lock(cond->__data.__lock);
|
||||
} while(...)
|
||||
unlock(cond->__data.__lock);
|
||||
/* the kernel acquired the the mutex for us */
|
||||
}
|
||||
|
||||
pthread_cond_broadcast_pi(cond)
|
||||
{
|
||||
lock(cond->__data.__lock);
|
||||
unlock(cond->__data.__lock);
|
||||
futex_requeue_pi(cond->data.__futex, cond->mutex);
|
||||
}
|
||||
|
||||
The actual glibc implementation will likely test for PI and make the
|
||||
necessary changes inside the existing calls rather than creating new
|
||||
calls for the PI cases. Similar changes are needed for
|
||||
pthread_cond_timedwait() and pthread_cond_signal().
|
||||
|
||||
Implementation
|
||||
--------------
|
||||
|
||||
In order to ensure the rt_mutex has an owner if it has waiters, it
|
||||
is necessary for both the requeue code, as well as the waiting code,
|
||||
to be able to acquire the rt_mutex before returning to user space.
|
||||
The requeue code cannot simply wake the waiter and leave it to
|
||||
acquire the rt_mutex as it would open a race window between the
|
||||
requeue call returning to user space and the waiter waking and
|
||||
starting to run. This is especially true in the uncontended case.
|
||||
|
||||
The solution involves two new rt_mutex helper routines,
|
||||
rt_mutex_start_proxy_lock() and rt_mutex_finish_proxy_lock(), which
|
||||
allow the requeue code to acquire an uncontended rt_mutex on behalf
|
||||
of the waiter and to enqueue the waiter on a contended rt_mutex.
|
||||
Two new system calls provide the kernel<->user interface to
|
||||
requeue_pi: FUTEX_WAIT_REQUEUE_PI and FUTEX_REQUEUE_CMP_PI.
|
||||
|
||||
FUTEX_WAIT_REQUEUE_PI is called by the waiter (pthread_cond_wait()
|
||||
and pthread_cond_timedwait()) to block on the initial futex and wait
|
||||
to be requeued to a PI-aware futex. The implementation is the
|
||||
result of a high-speed collision between futex_wait() and
|
||||
futex_lock_pi(), with some extra logic to check for the additional
|
||||
wake-up scenarios.
|
||||
|
||||
FUTEX_REQUEUE_CMP_PI is called by the waker
|
||||
(pthread_cond_broadcast() and pthread_cond_signal()) to requeue and
|
||||
possibly wake the waiting tasks. Internally, this system call is
|
||||
still handled by futex_requeue (by passing requeue_pi=1). Before
|
||||
requeueing, futex_requeue() attempts to acquire the requeue target
|
||||
PI futex on behalf of the top waiter. If it can, this waiter is
|
||||
woken. futex_requeue() then proceeds to requeue the remaining
|
||||
nr_wake+nr_requeue tasks to the PI futex, calling
|
||||
rt_mutex_start_proxy_lock() prior to each requeue to prepare the
|
||||
task as a waiter on the underlying rt_mutex. It is possible that
|
||||
the lock can be acquired at this stage as well, if so, the next
|
||||
waiter is woken to finish the acquisition of the lock.
|
||||
|
||||
FUTEX_REQUEUE_PI accepts nr_wake and nr_requeue as arguments, but
|
||||
their sum is all that really matters. futex_requeue() will wake or
|
||||
requeue up to nr_wake + nr_requeue tasks. It will wake only as many
|
||||
tasks as it can acquire the lock for, which in the majority of cases
|
||||
should be 0 as good programming practice dictates that the caller of
|
||||
either pthread_cond_broadcast() or pthread_cond_signal() acquire the
|
||||
mutex prior to making the call. FUTEX_REQUEUE_PI requires that
|
||||
nr_wake=1. nr_requeue should be INT_MAX for broadcast and 0 for
|
||||
signal.
|
246
Documentation/gcov.txt
Normal file
246
Documentation/gcov.txt
Normal file
@ -0,0 +1,246 @@
|
||||
Using gcov with the Linux kernel
|
||||
================================
|
||||
|
||||
1. Introduction
|
||||
2. Preparation
|
||||
3. Customization
|
||||
4. Files
|
||||
5. Modules
|
||||
6. Separated build and test machines
|
||||
7. Troubleshooting
|
||||
Appendix A: sample script: gather_on_build.sh
|
||||
Appendix B: sample script: gather_on_test.sh
|
||||
|
||||
|
||||
1. Introduction
|
||||
===============
|
||||
|
||||
gcov profiling kernel support enables the use of GCC's coverage testing
|
||||
tool gcov [1] with the Linux kernel. Coverage data of a running kernel
|
||||
is exported in gcov-compatible format via the "gcov" debugfs directory.
|
||||
To get coverage data for a specific file, change to the kernel build
|
||||
directory and use gcov with the -o option as follows (requires root):
|
||||
|
||||
# cd /tmp/linux-out
|
||||
# gcov -o /sys/kernel/debug/gcov/tmp/linux-out/kernel spinlock.c
|
||||
|
||||
This will create source code files annotated with execution counts
|
||||
in the current directory. In addition, graphical gcov front-ends such
|
||||
as lcov [2] can be used to automate the process of collecting data
|
||||
for the entire kernel and provide coverage overviews in HTML format.
|
||||
|
||||
Possible uses:
|
||||
|
||||
* debugging (has this line been reached at all?)
|
||||
* test improvement (how do I change my test to cover these lines?)
|
||||
* minimizing kernel configurations (do I need this option if the
|
||||
associated code is never run?)
|
||||
|
||||
--
|
||||
|
||||
[1] http://gcc.gnu.org/onlinedocs/gcc/Gcov.html
|
||||
[2] http://ltp.sourceforge.net/coverage/lcov.php
|
||||
|
||||
|
||||
2. Preparation
|
||||
==============
|
||||
|
||||
Configure the kernel with:
|
||||
|
||||
CONFIG_DEBUGFS=y
|
||||
CONFIG_GCOV_KERNEL=y
|
||||
|
||||
and to get coverage data for the entire kernel:
|
||||
|
||||
CONFIG_GCOV_PROFILE_ALL=y
|
||||
|
||||
Note that kernels compiled with profiling flags will be significantly
|
||||
larger and run slower. Also CONFIG_GCOV_PROFILE_ALL may not be supported
|
||||
on all architectures.
|
||||
|
||||
Profiling data will only become accessible once debugfs has been
|
||||
mounted:
|
||||
|
||||
mount -t debugfs none /sys/kernel/debug
|
||||
|
||||
|
||||
3. Customization
|
||||
================
|
||||
|
||||
To enable profiling for specific files or directories, add a line
|
||||
similar to the following to the respective kernel Makefile:
|
||||
|
||||
For a single file (e.g. main.o):
|
||||
GCOV_PROFILE_main.o := y
|
||||
|
||||
For all files in one directory:
|
||||
GCOV_PROFILE := y
|
||||
|
||||
To exclude files from being profiled even when CONFIG_GCOV_PROFILE_ALL
|
||||
is specified, use:
|
||||
|
||||
GCOV_PROFILE_main.o := n
|
||||
and:
|
||||
GCOV_PROFILE := n
|
||||
|
||||
Only files which are linked to the main kernel image or are compiled as
|
||||
kernel modules are supported by this mechanism.
|
||||
|
||||
|
||||
4. Files
|
||||
========
|
||||
|
||||
The gcov kernel support creates the following files in debugfs:
|
||||
|
||||
/sys/kernel/debug/gcov
|
||||
Parent directory for all gcov-related files.
|
||||
|
||||
/sys/kernel/debug/gcov/reset
|
||||
Global reset file: resets all coverage data to zero when
|
||||
written to.
|
||||
|
||||
/sys/kernel/debug/gcov/path/to/compile/dir/file.gcda
|
||||
The actual gcov data file as understood by the gcov
|
||||
tool. Resets file coverage data to zero when written to.
|
||||
|
||||
/sys/kernel/debug/gcov/path/to/compile/dir/file.gcno
|
||||
Symbolic link to a static data file required by the gcov
|
||||
tool. This file is generated by gcc when compiling with
|
||||
option -ftest-coverage.
|
||||
|
||||
|
||||
5. Modules
|
||||
==========
|
||||
|
||||
Kernel modules may contain cleanup code which is only run during
|
||||
module unload time. The gcov mechanism provides a means to collect
|
||||
coverage data for such code by keeping a copy of the data associated
|
||||
with the unloaded module. This data remains available through debugfs.
|
||||
Once the module is loaded again, the associated coverage counters are
|
||||
initialized with the data from its previous instantiation.
|
||||
|
||||
This behavior can be deactivated by specifying the gcov_persist kernel
|
||||
parameter:
|
||||
|
||||
gcov_persist=0
|
||||
|
||||
At run-time, a user can also choose to discard data for an unloaded
|
||||
module by writing to its data file or the global reset file.
|
||||
|
||||
|
||||
6. Separated build and test machines
|
||||
====================================
|
||||
|
||||
The gcov kernel profiling infrastructure is designed to work out-of-the
|
||||
box for setups where kernels are built and run on the same machine. In
|
||||
cases where the kernel runs on a separate machine, special preparations
|
||||
must be made, depending on where the gcov tool is used:
|
||||
|
||||
a) gcov is run on the TEST machine
|
||||
|
||||
The gcov tool version on the test machine must be compatible with the
|
||||
gcc version used for kernel build. Also the following files need to be
|
||||
copied from build to test machine:
|
||||
|
||||
from the source tree:
|
||||
- all C source files + headers
|
||||
|
||||
from the build tree:
|
||||
- all C source files + headers
|
||||
- all .gcda and .gcno files
|
||||
- all links to directories
|
||||
|
||||
It is important to note that these files need to be placed into the
|
||||
exact same file system location on the test machine as on the build
|
||||
machine. If any of the path components is symbolic link, the actual
|
||||
directory needs to be used instead (due to make's CURDIR handling).
|
||||
|
||||
b) gcov is run on the BUILD machine
|
||||
|
||||
The following files need to be copied after each test case from test
|
||||
to build machine:
|
||||
|
||||
from the gcov directory in sysfs:
|
||||
- all .gcda files
|
||||
- all links to .gcno files
|
||||
|
||||
These files can be copied to any location on the build machine. gcov
|
||||
must then be called with the -o option pointing to that directory.
|
||||
|
||||
Example directory setup on the build machine:
|
||||
|
||||
/tmp/linux: kernel source tree
|
||||
/tmp/out: kernel build directory as specified by make O=
|
||||
/tmp/coverage: location of the files copied from the test machine
|
||||
|
||||
[user@build] cd /tmp/out
|
||||
[user@build] gcov -o /tmp/coverage/tmp/out/init main.c
|
||||
|
||||
|
||||
7. Troubleshooting
|
||||
==================
|
||||
|
||||
Problem: Compilation aborts during linker step.
|
||||
Cause: Profiling flags are specified for source files which are not
|
||||
linked to the main kernel or which are linked by a custom
|
||||
linker procedure.
|
||||
Solution: Exclude affected source files from profiling by specifying
|
||||
GCOV_PROFILE := n or GCOV_PROFILE_basename.o := n in the
|
||||
corresponding Makefile.
|
||||
|
||||
|
||||
Appendix A: gather_on_build.sh
|
||||
==============================
|
||||
|
||||
Sample script to gather coverage meta files on the build machine
|
||||
(see 6a):
|
||||
|
||||
#!/bin/bash
|
||||
|
||||
KSRC=$1
|
||||
KOBJ=$2
|
||||
DEST=$3
|
||||
|
||||
if [ -z "$KSRC" ] || [ -z "$KOBJ" ] || [ -z "$DEST" ]; then
|
||||
echo "Usage: $0 <ksrc directory> <kobj directory> <output.tar.gz>" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
KSRC=$(cd $KSRC; printf "all:\n\t@echo \${CURDIR}\n" | make -f -)
|
||||
KOBJ=$(cd $KOBJ; printf "all:\n\t@echo \${CURDIR}\n" | make -f -)
|
||||
|
||||
find $KSRC $KOBJ \( -name '*.gcno' -o -name '*.[ch]' -o -type l \) -a \
|
||||
-perm /u+r,g+r | tar cfz $DEST -P -T -
|
||||
|
||||
if [ $? -eq 0 ] ; then
|
||||
echo "$DEST successfully created, copy to test system and unpack with:"
|
||||
echo " tar xfz $DEST -P"
|
||||
else
|
||||
echo "Could not create file $DEST"
|
||||
fi
|
||||
|
||||
|
||||
Appendix B: gather_on_test.sh
|
||||
=============================
|
||||
|
||||
Sample script to gather coverage data files on the test machine
|
||||
(see 6b):
|
||||
|
||||
#!/bin/bash
|
||||
|
||||
DEST=$1
|
||||
GCDA=/sys/kernel/debug/gcov
|
||||
|
||||
if [ -z "$DEST" ] ; then
|
||||
echo "Usage: $0 <output.tar.gz>" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
find $GCDA -name '*.gcno' -o -name '*.gcda' | tar cfz $DEST -T -
|
||||
|
||||
if [ $? -eq 0 ] ; then
|
||||
echo "$DEST successfully created, copy to build system and unpack with:"
|
||||
echo " tar xfz $DEST"
|
||||
else
|
||||
echo "Could not create file $DEST"
|
||||
fi
|
@ -458,7 +458,7 @@ debugfs interface, since it provides control over GPIO direction and
|
||||
value instead of just showing a gpio state summary. Plus, it could be
|
||||
present on production systems without debugging support.
|
||||
|
||||
Given approprate hardware documentation for the system, userspace could
|
||||
Given appropriate hardware documentation for the system, userspace could
|
||||
know for example that GPIO #23 controls the write protect line used to
|
||||
protect boot loader segments in flash memory. System upgrade procedures
|
||||
may need to temporarily remove that protection, first importing a GPIO,
|
||||
|
@ -2,14 +2,18 @@ Kernel driver f71882fg
|
||||
======================
|
||||
|
||||
Supported chips:
|
||||
* Fintek F71882FG and F71883FG
|
||||
Prefix: 'f71882fg'
|
||||
* Fintek F71858FG
|
||||
Prefix: 'f71858fg'
|
||||
Addresses scanned: none, address read from Super I/O config space
|
||||
Datasheet: Available from the Fintek website
|
||||
* Fintek F71862FG and F71863FG
|
||||
Prefix: 'f71862fg'
|
||||
Addresses scanned: none, address read from Super I/O config space
|
||||
Datasheet: Available from the Fintek website
|
||||
* Fintek F71882FG and F71883FG
|
||||
Prefix: 'f71882fg'
|
||||
Addresses scanned: none, address read from Super I/O config space
|
||||
Datasheet: Available from the Fintek website
|
||||
* Fintek F8000
|
||||
Prefix: 'f8000'
|
||||
Addresses scanned: none, address read from Super I/O config space
|
||||
@ -66,13 +70,13 @@ printed when loading the driver.
|
||||
|
||||
Three different fan control modes are supported; the mode number is written
|
||||
to the pwm#_enable file. Note that not all modes are supported on all
|
||||
chips, and some modes may only be available in RPM / PWM mode on the F8000.
|
||||
chips, and some modes may only be available in RPM / PWM mode.
|
||||
Writing an unsupported mode will result in an invalid parameter error.
|
||||
|
||||
* 1: Manual mode
|
||||
You ask for a specific PWM duty cycle / DC voltage or a specific % of
|
||||
fan#_full_speed by writing to the pwm# file. This mode is only
|
||||
available on the F8000 if the fan channel is in RPM mode.
|
||||
available on the F71858FG / F8000 if the fan channel is in RPM mode.
|
||||
|
||||
* 2: Normal auto mode
|
||||
You can define a number of temperature/fan speed trip points, which % the
|
||||
|
@ -7,7 +7,7 @@ henceforth as AEM.
|
||||
Supported systems:
|
||||
* Any recent IBM System X server with AEM support.
|
||||
This includes the x3350, x3550, x3650, x3655, x3755, x3850 M2,
|
||||
x3950 M2, and certain HS2x/LS2x/QS2x blades. The IPMI host interface
|
||||
x3950 M2, and certain HC10/HS2x/LS2x/QS2x blades. The IPMI host interface
|
||||
driver ("ipmi-si") needs to be loaded for this driver to do anything.
|
||||
Prefix: 'ibmaem'
|
||||
Datasheet: Not available
|
||||
|
@ -70,6 +70,7 @@ are interpreted as 0! For more on how written strings are interpreted see the
|
||||
[0-*] denotes any positive number starting from 0
|
||||
[1-*] denotes any positive number starting from 1
|
||||
RO read only value
|
||||
WO write only value
|
||||
RW read/write value
|
||||
|
||||
Read/write values may be read-only for some chips, depending on the
|
||||
@ -295,6 +296,24 @@ temp[1-*]_label Suggested temperature channel label.
|
||||
user-space.
|
||||
RO
|
||||
|
||||
temp[1-*]_lowest
|
||||
Historical minimum temperature
|
||||
Unit: millidegree Celsius
|
||||
RO
|
||||
|
||||
temp[1-*]_highest
|
||||
Historical maximum temperature
|
||||
Unit: millidegree Celsius
|
||||
RO
|
||||
|
||||
temp[1-*]_reset_history
|
||||
Reset temp_lowest and temp_highest
|
||||
WO
|
||||
|
||||
temp_reset_history
|
||||
Reset temp_lowest and temp_highest for all sensors
|
||||
WO
|
||||
|
||||
Some chips measure temperature using external thermistors and an ADC, and
|
||||
report the temperature measurement as a voltage. Converting this voltage
|
||||
back to a temperature (or the other way around for limits) requires
|
||||
|
42
Documentation/hwmon/tmp401
Normal file
42
Documentation/hwmon/tmp401
Normal file
@ -0,0 +1,42 @@
|
||||
Kernel driver tmp401
|
||||
====================
|
||||
|
||||
Supported chips:
|
||||
* Texas Instruments TMP401
|
||||
Prefix: 'tmp401'
|
||||
Addresses scanned: I2C 0x4c
|
||||
Datasheet: http://focus.ti.com/docs/prod/folders/print/tmp401.html
|
||||
* Texas Instruments TMP411
|
||||
Prefix: 'tmp411'
|
||||
Addresses scanned: I2C 0x4c
|
||||
Datasheet: http://focus.ti.com/docs/prod/folders/print/tmp411.html
|
||||
|
||||
Authors:
|
||||
Hans de Goede <hdegoede@redhat.com>
|
||||
Andre Prendel <andre.prendel@gmx.de>
|
||||
|
||||
Description
|
||||
-----------
|
||||
|
||||
This driver implements support for Texas Instruments TMP401 and
|
||||
TMP411 chips. These chips implements one remote and one local
|
||||
temperature sensor. Temperature is measured in degrees
|
||||
Celsius. Resolution of the remote sensor is 0.0625 degree. Local
|
||||
sensor resolution can be set to 0.5, 0.25, 0.125 or 0.0625 degree (not
|
||||
supported by the driver so far, so using the default resolution of 0.5
|
||||
degree).
|
||||
|
||||
The driver provides the common sysfs-interface for temperatures (see
|
||||
/Documentation/hwmon/sysfs-interface under Temperatures).
|
||||
|
||||
The TMP411 chip is compatible with TMP401. It provides some additional
|
||||
features.
|
||||
|
||||
* Minimum and Maximum temperature measured since power-on, chip-reset
|
||||
|
||||
Exported via sysfs attributes tempX_lowest and tempX_highest.
|
||||
|
||||
* Reset of historical minimum/maximum temperature measurements
|
||||
|
||||
Exported via sysfs attribute temp_reset_history. Writing 1 to this
|
||||
file triggers a reset.
|
@ -12,6 +12,10 @@ Supported chips:
|
||||
Addresses scanned: ISA address retrieved from Super I/O registers
|
||||
Datasheet:
|
||||
http://www.nuvoton.com.tw/NR/rdonlyres/7885623D-A487-4CF9-A47F-30C5F73D6FE6/0/W83627DHG.pdf
|
||||
* Winbond W83627DHG-P
|
||||
Prefix: 'w83627dhg'
|
||||
Addresses scanned: ISA address retrieved from Super I/O registers
|
||||
Datasheet: not available
|
||||
* Winbond W83667HG
|
||||
Prefix: 'w83667hg'
|
||||
Addresses scanned: ISA address retrieved from Super I/O registers
|
||||
@ -28,8 +32,8 @@ Description
|
||||
-----------
|
||||
|
||||
This driver implements support for the Winbond W83627EHF, W83627EHG,
|
||||
W83627DHG and W83667HG super I/O chips. We will refer to them collectively
|
||||
as Winbond chips.
|
||||
W83627DHG, W83627DHG-P and W83667HG super I/O chips. We will refer to them
|
||||
collectively as Winbond chips.
|
||||
|
||||
The chips implement three temperature sensors, five fan rotation
|
||||
speed sensors, ten analog voltage sensors (only nine for the 627DHG), one
|
||||
@ -135,3 +139,6 @@ done in the driver for all register addresses.
|
||||
The DHG also supports PECI, where the DHG queries Intel CPU temperatures, and
|
||||
the ICH8 southbridge gets that data via PECI from the DHG, so that the
|
||||
southbridge drives the fans. And the DHG supports SST, a one-wire serial bus.
|
||||
|
||||
The DHG-P has an additional automatic fan speed control mode named Smart Fan
|
||||
(TM) III+. This mode is not yet supported by the driver.
|
||||
|
@ -20,6 +20,8 @@ platform_device with the base address and interrupt number. The
|
||||
dev.platform_data of the device should also point to a struct
|
||||
ocores_i2c_platform_data (see linux/i2c-ocores.h) describing the
|
||||
distance between registers and the input clock speed.
|
||||
There is also a possibility to attach a list of i2c_board_info which
|
||||
the i2c-ocores driver will add to the bus upon creation.
|
||||
|
||||
E.G. something like:
|
||||
|
||||
@ -36,9 +38,24 @@ static struct resource ocores_resources[] = {
|
||||
},
|
||||
};
|
||||
|
||||
/* optional board info */
|
||||
struct i2c_board_info ocores_i2c_board_info[] = {
|
||||
{
|
||||
I2C_BOARD_INFO("tsc2003", 0x48),
|
||||
.platform_data = &tsc2003_platform_data,
|
||||
.irq = TSC_IRQ
|
||||
},
|
||||
{
|
||||
I2C_BOARD_INFO("adv7180", 0x42 >> 1),
|
||||
.irq = ADV_IRQ
|
||||
}
|
||||
};
|
||||
|
||||
static struct ocores_i2c_platform_data myi2c_data = {
|
||||
.regstep = 2, /* two bytes between registers */
|
||||
.clock_khz = 50000, /* input clock of 50MHz */
|
||||
.devices = ocores_i2c_board_info, /* optional table of devices */
|
||||
.num_devices = ARRAY_SIZE(ocores_i2c_board_info), /* table size */
|
||||
};
|
||||
|
||||
static struct platform_device myi2c = {
|
||||
|
@ -19,6 +19,9 @@ Supported adapters:
|
||||
* VIA Technologies, Inc. VX800/VX820
|
||||
Datasheet: available on http://linux.via.com.tw
|
||||
|
||||
* VIA Technologies, Inc. VX855/VX875
|
||||
Datasheet: Availability unknown
|
||||
|
||||
Authors:
|
||||
Kyösti Mälkki <kmalkki@cc.hut.fi>,
|
||||
Mark D. Studebaker <mdsxyz123@yahoo.com>,
|
||||
@ -53,6 +56,7 @@ Your lspci -n listing must show one of these :
|
||||
device 1106:3287 (VT8251)
|
||||
device 1106:8324 (CX700)
|
||||
device 1106:8353 (VX800/VX820)
|
||||
device 1106:8409 (VX855/VX875)
|
||||
|
||||
If none of these show up, you should look in the BIOS for settings like
|
||||
enable ACPI / SMBus or even USB.
|
||||
|
@ -165,3 +165,47 @@ was done there. Two significant differences are:
|
||||
Once again, method 3 should be avoided wherever possible. Explicit device
|
||||
instantiation (methods 1 and 2) is much preferred for it is safer and
|
||||
faster.
|
||||
|
||||
|
||||
Method 4: Instantiate from user-space
|
||||
-------------------------------------
|
||||
|
||||
In general, the kernel should know which I2C devices are connected and
|
||||
what addresses they live at. However, in certain cases, it does not, so a
|
||||
sysfs interface was added to let the user provide the information. This
|
||||
interface is made of 2 attribute files which are created in every I2C bus
|
||||
directory: new_device and delete_device. Both files are write only and you
|
||||
must write the right parameters to them in order to properly instantiate,
|
||||
respectively delete, an I2C device.
|
||||
|
||||
File new_device takes 2 parameters: the name of the I2C device (a string)
|
||||
and the address of the I2C device (a number, typically expressed in
|
||||
hexadecimal starting with 0x, but can also be expressed in decimal.)
|
||||
|
||||
File delete_device takes a single parameter: the address of the I2C
|
||||
device. As no two devices can live at the same address on a given I2C
|
||||
segment, the address is sufficient to uniquely identify the device to be
|
||||
deleted.
|
||||
|
||||
Example:
|
||||
# echo eeprom 0x50 > /sys/class/i2c-adapter/i2c-3/new_device
|
||||
|
||||
While this interface should only be used when in-kernel device declaration
|
||||
can't be done, there is a variety of cases where it can be helpful:
|
||||
* The I2C driver usually detects devices (method 3 above) but the bus
|
||||
segment your device lives on doesn't have the proper class bit set and
|
||||
thus detection doesn't trigger.
|
||||
* The I2C driver usually detects devices, but your device lives at an
|
||||
unexpected address.
|
||||
* The I2C driver usually detects devices, but your device is not detected,
|
||||
either because the detection routine is too strict, or because your
|
||||
device is not officially supported yet but you know it is compatible.
|
||||
* You are developing a driver on a test board, where you soldered the I2C
|
||||
device yourself.
|
||||
|
||||
This interface is a replacement for the force_* module parameters some I2C
|
||||
drivers implement. Being implemented in i2c-core rather than in each
|
||||
device driver individually, it is much more efficient, and also has the
|
||||
advantage that you do not have to reload the driver to change a setting.
|
||||
You can also instantiate the device before the driver is loaded or even
|
||||
available, and you don't need to know what driver the device needs.
|
||||
|
@ -126,19 +126,9 @@ different) configuration information, as do drivers handling chip variants
|
||||
that can't be distinguished by protocol probing, or which need some board
|
||||
specific information to operate correctly.
|
||||
|
||||
Accordingly, the I2C stack now has two models for associating I2C devices
|
||||
with their drivers: the original "legacy" model, and a newer one that's
|
||||
fully compatible with the Linux 2.6 driver model. These models do not mix,
|
||||
since the "legacy" model requires drivers to create "i2c_client" device
|
||||
objects after SMBus style probing, while the Linux driver model expects
|
||||
drivers to be given such device objects in their probe() routines.
|
||||
|
||||
The legacy model is deprecated now and will soon be removed, so we no
|
||||
longer document it here.
|
||||
|
||||
|
||||
Standard Driver Model Binding ("New Style")
|
||||
-------------------------------------------
|
||||
Device/Driver Binding
|
||||
---------------------
|
||||
|
||||
System infrastructure, typically board-specific initialization code or
|
||||
boot firmware, reports what I2C devices exist. For example, there may be
|
||||
@ -201,7 +191,7 @@ a given I2C bus. This is for example the case of hardware monitoring
|
||||
devices on a PC's SMBus. In that case, you may want to let your driver
|
||||
detect supported devices automatically. This is how the legacy model
|
||||
was working, and is now available as an extension to the standard
|
||||
driver model (so that we can finally get rid of the legacy model.)
|
||||
driver model.
|
||||
|
||||
You simply have to define a detect callback which will attempt to
|
||||
identify supported devices (returning 0 for supported ones and -ENODEV
|
||||
|
@ -216,6 +216,8 @@ Other kernel parameters for ide_core are:
|
||||
|
||||
* "noflush=[interface_number.device_number]" to disable flush requests
|
||||
|
||||
* "nohpa=[interface_number.device_number]" to disable Host Protected Area
|
||||
|
||||
* "noprobe=[interface_number.device_number]" to skip probing
|
||||
|
||||
* "nowerr=[interface_number.device_number]" to ignore the WRERR_STAT bit
|
||||
|
@ -278,7 +278,7 @@ struct input_event {
|
||||
};
|
||||
|
||||
'time' is the timestamp, it returns the time at which the event happened.
|
||||
Type is for example EV_REL for relative moment, REL_KEY for a keypress or
|
||||
Type is for example EV_REL for relative moment, EV_KEY for a keypress or
|
||||
release. More types are defined in include/linux/input.h.
|
||||
|
||||
'code' is event code, for example REL_X or KEY_BACKSPACE, again a complete
|
||||
|
@ -67,7 +67,12 @@ data with it.
|
||||
struct rotary_encoder_platform_data is declared in
|
||||
include/linux/rotary-encoder.h and needs to be filled with the number of
|
||||
steps the encoder has and can carry information about externally inverted
|
||||
signals (because of used invertig buffer or other reasons).
|
||||
signals (because of an inverting buffer or other reasons). The encoder
|
||||
can be set up to deliver input information as either an absolute or relative
|
||||
axes. For relative axes the input event returns +/-1 for each step. For
|
||||
absolute axes the position of the encoder can either roll over between zero
|
||||
and the number of steps or will clamp at the maximum and zero depending on
|
||||
the configuration.
|
||||
|
||||
Because GPIO to IRQ mapping is platform specific, this information must
|
||||
be given in seperately to the driver. See the example below.
|
||||
@ -85,6 +90,8 @@ be given in seperately to the driver. See the example below.
|
||||
static struct rotary_encoder_platform_data my_rotary_encoder_info = {
|
||||
.steps = 24,
|
||||
.axis = ABS_X,
|
||||
.relative_axis = false,
|
||||
.rollover = false,
|
||||
.gpio_a = GPIO_ROTARY_A,
|
||||
.gpio_b = GPIO_ROTARY_B,
|
||||
.inverted_a = 0,
|
||||
|
@ -149,6 +149,8 @@ Code Seq# Include File Comments
|
||||
'p' 40-7F linux/nvram.h
|
||||
'p' 80-9F user-space parport
|
||||
<mailto:tim@cyberelk.net>
|
||||
'p' a1-a4 linux/pps.h LinuxPPS
|
||||
<mailto:giometti@linux.it>
|
||||
'q' 00-1F linux/serio.h
|
||||
'q' 80-FF Internet PhoneJACK, Internet LineJACK
|
||||
<http://www.quicknet.net>
|
||||
|
@ -14,39 +14,37 @@ README
|
||||
- general info on what you need and what to do for Linux ISDN.
|
||||
README.FAQ
|
||||
- general info for FAQ.
|
||||
README.audio
|
||||
- info for running audio over ISDN.
|
||||
README.fax
|
||||
- info for using Fax over ISDN.
|
||||
README.gigaset
|
||||
- info on the drivers for Siemens Gigaset ISDN adapters.
|
||||
README.icn
|
||||
- info on the ICN-ISDN-card and its driver.
|
||||
README.HiSax
|
||||
- info on the HiSax driver which replaces the old teles.
|
||||
README.hfc-pci
|
||||
- info on hfc-pci based cards.
|
||||
README.pcbit
|
||||
- info on the PCBIT-D ISDN adapter and driver.
|
||||
README.syncppp
|
||||
- info on running Sync PPP over ISDN.
|
||||
syncPPP.FAQ
|
||||
- frequently asked questions about running PPP over ISDN.
|
||||
README.avmb1
|
||||
- info on driver for AVM-B1 ISDN card.
|
||||
README.act2000
|
||||
- info on driver for IBM ACT-2000 card.
|
||||
README.eicon
|
||||
- info on driver for Eicon active cards.
|
||||
README.audio
|
||||
- info for running audio over ISDN.
|
||||
README.avmb1
|
||||
- info on driver for AVM-B1 ISDN card.
|
||||
README.concap
|
||||
- info on "CONCAP" encapsulation protocol interface used for X.25.
|
||||
README.diversion
|
||||
- info on module for isdn diversion services.
|
||||
README.fax
|
||||
- info for using Fax over ISDN.
|
||||
README.gigaset
|
||||
- info on the drivers for Siemens Gigaset ISDN adapters
|
||||
README.hfc-pci
|
||||
- info on hfc-pci based cards.
|
||||
README.hysdn
|
||||
- info on driver for Hypercope active HYSDN cards
|
||||
README.icn
|
||||
- info on the ICN-ISDN-card and its driver.
|
||||
README.mISDN
|
||||
- info on the Modular ISDN subsystem (mISDN)
|
||||
README.pcbit
|
||||
- info on the PCBIT-D ISDN adapter and driver.
|
||||
README.sc
|
||||
- info on driver for Spellcaster cards.
|
||||
README.syncppp
|
||||
- info on running Sync PPP over ISDN.
|
||||
README.x25
|
||||
- info for running X.25 over ISDN.
|
||||
README.hysdn
|
||||
- info on driver for Hypercope active HYSDN cards
|
||||
README.mISDN
|
||||
- info on the Modular ISDN subsystem (mISDN).
|
||||
syncPPP.FAQ
|
||||
- frequently asked questions about running PPP over ISDN.
|
||||
|
@ -45,7 +45,7 @@ From then on, Kernel CAPI may call the registered callback functions for the
|
||||
device.
|
||||
|
||||
If the device becomes unusable for any reason (shutdown, disconnect ...), the
|
||||
driver has to call capi_ctr_reseted(). This will prevent further calls to the
|
||||
driver has to call capi_ctr_down(). This will prevent further calls to the
|
||||
callback functions by Kernel CAPI.
|
||||
|
||||
|
||||
@ -114,20 +114,36 @@ char *driver_name
|
||||
int (*load_firmware)(struct capi_ctr *ctrlr, capiloaddata *ldata)
|
||||
(optional) pointer to a callback function for sending firmware and
|
||||
configuration data to the device
|
||||
Return value: 0 on success, error code on error
|
||||
Called in process context.
|
||||
|
||||
void (*reset_ctr)(struct capi_ctr *ctrlr)
|
||||
pointer to a callback function for performing a reset on the device,
|
||||
releasing all registered applications
|
||||
(optional) pointer to a callback function for performing a reset on
|
||||
the device, releasing all registered applications
|
||||
Called in process context.
|
||||
|
||||
void (*register_appl)(struct capi_ctr *ctrlr, u16 applid,
|
||||
capi_register_params *rparam)
|
||||
void (*release_appl)(struct capi_ctr *ctrlr, u16 applid)
|
||||
pointers to callback functions for registration and deregistration of
|
||||
applications with the device
|
||||
Calls to these functions are serialized by Kernel CAPI so that only
|
||||
one call to any of them is active at any time.
|
||||
|
||||
u16 (*send_message)(struct capi_ctr *ctrlr, struct sk_buff *skb)
|
||||
pointer to a callback function for sending a CAPI message to the
|
||||
device
|
||||
Return value: CAPI error code
|
||||
If the method returns 0 (CAPI_NOERROR) the driver has taken ownership
|
||||
of the skb and the caller may no longer access it. If it returns a
|
||||
non-zero (error) value then ownership of the skb returns to the caller
|
||||
who may reuse or free it.
|
||||
The return value should only be used to signal problems with respect
|
||||
to accepting or queueing the message. Errors occurring during the
|
||||
actual processing of the message should be signaled with an
|
||||
appropriate reply message.
|
||||
Calls to this function are not serialized by Kernel CAPI, ie. it must
|
||||
be prepared to be re-entered.
|
||||
|
||||
char *(*procinfo)(struct capi_ctr *ctrlr)
|
||||
pointer to a callback function returning the entry for the device in
|
||||
@ -138,6 +154,8 @@ read_proc_t *ctr_read_proc
|
||||
system entry, /proc/capi/controllers/<n>; will be called with a
|
||||
pointer to the device's capi_ctr structure as the last (data) argument
|
||||
|
||||
Note: Callback functions are never called in interrupt context.
|
||||
|
||||
- to be filled in before calling capi_ctr_ready():
|
||||
|
||||
u8 manu[CAPI_MANUFACTURER_LEN]
|
||||
@ -153,6 +171,45 @@ u8 serial[CAPI_SERIAL_LEN]
|
||||
value to return for CAPI_GET_SERIAL
|
||||
|
||||
|
||||
4.3 The _cmsg Structure
|
||||
|
||||
(declared in <linux/isdn/capiutil.h>)
|
||||
|
||||
The _cmsg structure stores the contents of a CAPI 2.0 message in an easily
|
||||
accessible form. It contains members for all possible CAPI 2.0 parameters, of
|
||||
which only those appearing in the message type currently being processed are
|
||||
actually used. Unused members should be set to zero.
|
||||
|
||||
Members are named after the CAPI 2.0 standard names of the parameters they
|
||||
represent. See <linux/isdn/capiutil.h> for the exact spelling. Member data
|
||||
types are:
|
||||
|
||||
u8 for CAPI parameters of type 'byte'
|
||||
|
||||
u16 for CAPI parameters of type 'word'
|
||||
|
||||
u32 for CAPI parameters of type 'dword'
|
||||
|
||||
_cstruct for CAPI parameters of type 'struct' not containing any
|
||||
variably-sized (struct) subparameters (eg. 'Called Party Number')
|
||||
The member is a pointer to a buffer containing the parameter in
|
||||
CAPI encoding (length + content). It may also be NULL, which will
|
||||
be taken to represent an empty (zero length) parameter.
|
||||
|
||||
_cmstruct for CAPI parameters of type 'struct' containing 'struct'
|
||||
subparameters ('Additional Info' and 'B Protocol')
|
||||
The representation is a single byte containing one of the values:
|
||||
CAPI_DEFAULT: the parameter is empty
|
||||
CAPI_COMPOSE: the values of the subparameters are stored
|
||||
individually in the corresponding _cmsg structure members
|
||||
|
||||
Functions capi_cmsg2message() and capi_message2cmsg() are provided to convert
|
||||
messages between their transport encoding described in the CAPI 2.0 standard
|
||||
and their _cmsg structure representation. Note that capi_cmsg2message() does
|
||||
not know or check the size of its destination buffer. The caller must make
|
||||
sure it is big enough to accomodate the resulting CAPI message.
|
||||
|
||||
|
||||
5. Lower Layer Interface Functions
|
||||
|
||||
(declared in <linux/isdn/capilli.h>)
|
||||
@ -166,7 +223,7 @@ int detach_capi_ctr(struct capi_ctr *ctrlr)
|
||||
register/unregister a device (controller) with Kernel CAPI
|
||||
|
||||
void capi_ctr_ready(struct capi_ctr *ctrlr)
|
||||
void capi_ctr_reseted(struct capi_ctr *ctrlr)
|
||||
void capi_ctr_down(struct capi_ctr *ctrlr)
|
||||
signal controller ready/not ready
|
||||
|
||||
void capi_ctr_suspend_output(struct capi_ctr *ctrlr)
|
||||
@ -211,3 +268,32 @@ CAPIMSG_CONTROL(m) CAPIMSG_SETCONTROL(m, contr) Controller/PLCI/NCCI
|
||||
(u32)
|
||||
CAPIMSG_DATALEN(m) CAPIMSG_SETDATALEN(m, len) Data Length (u16)
|
||||
|
||||
|
||||
Library functions for working with _cmsg structures
|
||||
(from <linux/isdn/capiutil.h>):
|
||||
|
||||
unsigned capi_cmsg2message(_cmsg *cmsg, u8 *msg)
|
||||
Assembles a CAPI 2.0 message from the parameters in *cmsg, storing the
|
||||
result in *msg.
|
||||
|
||||
unsigned capi_message2cmsg(_cmsg *cmsg, u8 *msg)
|
||||
Disassembles the CAPI 2.0 message in *msg, storing the parameters in
|
||||
*cmsg.
|
||||
|
||||
unsigned capi_cmsg_header(_cmsg *cmsg, u16 ApplId, u8 Command, u8 Subcommand,
|
||||
u16 Messagenumber, u32 Controller)
|
||||
Fills the header part and address field of the _cmsg structure *cmsg
|
||||
with the given values, zeroing the remainder of the structure so only
|
||||
parameters with non-default values need to be changed before sending
|
||||
the message.
|
||||
|
||||
void capi_cmsg_answer(_cmsg *cmsg)
|
||||
Sets the low bit of the Subcommand field in *cmsg, thereby converting
|
||||
_REQ to _CONF and _IND to _RESP.
|
||||
|
||||
char *capi_cmd2str(u8 Command, u8 Subcommand)
|
||||
Returns the CAPI 2.0 message name corresponding to the given command
|
||||
and subcommand values, as a static ASCII string. The return value may
|
||||
be NULL if the command/subcommand is not one of those defined in the
|
||||
CAPI 2.0 standard.
|
||||
|
||||
|
@ -149,10 +149,8 @@ GigaSet 307x Device Driver
|
||||
configuration files and chat scripts in the gigaset-VERSION/ppp directory
|
||||
in the driver packages from http://sourceforge.net/projects/gigaset307x/.
|
||||
Please note that the USB drivers are not able to change the state of the
|
||||
control lines (the M105 driver can be configured to use some undocumented
|
||||
control requests, if you really need the control lines, though). This means
|
||||
you must use "Stupid Mode" if you are using wvdial or you should use the
|
||||
nocrtscts option of pppd.
|
||||
control lines. This means you must use "Stupid Mode" if you are using
|
||||
wvdial or you should use the nocrtscts option of pppd.
|
||||
You must also assure that the ppp_async module is loaded with the parameter
|
||||
flag_time=0. You can do this e.g. by adding a line like
|
||||
|
||||
@ -190,20 +188,19 @@ GigaSet 307x Device Driver
|
||||
You can also use /sys/class/tty/ttyGxy/cidmode for changing the CID mode
|
||||
setting (ttyGxy is ttyGU0 or ttyGB0).
|
||||
|
||||
2.6. M105 Undocumented USB Requests
|
||||
------------------------------
|
||||
|
||||
The Gigaset M105 USB data box understands a couple of useful, but
|
||||
undocumented USB commands. These requests are not used in normal
|
||||
operation (for wireless access to the base), but are needed for access
|
||||
to the M105's own configuration mode (registration to the base, baudrate
|
||||
and line format settings, device status queries) via the gigacontr
|
||||
utility. Their use is controlled by the kernel configuration option
|
||||
"Support for undocumented USB requests" (CONFIG_GIGASET_UNDOCREQ). If you
|
||||
encounter error code -ENOTTY when trying to use some features of the
|
||||
M105, try setting that option to "y" via 'make {x,menu}config' and
|
||||
recompiling the driver.
|
||||
2.6. Unregistered Wireless Devices (M101/M105)
|
||||
-----------------------------------------
|
||||
The main purpose of the ser_gigaset and usb_gigaset drivers is to allow
|
||||
the M101 and M105 wireless devices to be used as ISDN devices for ISDN
|
||||
connections through a Gigaset base. Therefore they assume that the device
|
||||
is registered to a DECT base.
|
||||
|
||||
If the M101/M105 device is not registered to a base, initialization of
|
||||
the device fails, and a corresponding error message is logged by the
|
||||
driver. In that situation, a restricted set of functions is available
|
||||
which includes, in particular, those necessary for registering the device
|
||||
to a base or for switching it between Fixed Part and Portable Part
|
||||
modes.
|
||||
|
||||
3. Troubleshooting
|
||||
---------------
|
||||
@ -234,11 +231,12 @@ GigaSet 307x Device Driver
|
||||
Select Unimodem mode for all DECT data adapters. (see section 2.4.)
|
||||
|
||||
Problem:
|
||||
You want to configure your USB DECT data adapter (M105) but gigacontr
|
||||
reports an error: "/dev/ttyGU0: Inappropriate ioctl for device".
|
||||
Messages like this:
|
||||
usb_gigaset 3-2:1.0: Could not initialize the device.
|
||||
appear in your syslog.
|
||||
Solution:
|
||||
Recompile the usb_gigaset driver with the kernel configuration option
|
||||
CONFIG_GIGASET_UNDOCREQ set to 'y'. (see section 2.6.)
|
||||
Check whether your M10x wireless device is correctly registered to the
|
||||
Gigaset base. (see section 2.6.)
|
||||
|
||||
3.2. Telling the driver to provide more information
|
||||
----------------------------------------------
|
||||
|
@ -75,7 +75,7 @@ Linux カーネルパッチ投稿者向けチェックリスト
|
||||
ビルドした上、動作確認を行ってください。
|
||||
|
||||
14: もしパッチがディスクのI/O性能などに影響を与えるようであれば、
|
||||
'CONFIG_LBD'オプションを有効にした場合と無効にした場合の両方で
|
||||
'CONFIG_LBDAF'オプションを有効にした場合と無効にした場合の両方で
|
||||
テストを実施してみてください。
|
||||
|
||||
15: lockdepの機能を全て有効にした上で、全てのコードパスを評価してください。
|
||||
|
@ -35,6 +35,79 @@ new .config files to see the differences:
|
||||
|
||||
(Yes, we need something better here.)
|
||||
|
||||
______________________________________________________________________
|
||||
Environment variables for '*config'
|
||||
|
||||
KCONFIG_CONFIG
|
||||
--------------------------------------------------
|
||||
This environment variable can be used to specify a default kernel config
|
||||
file name to override the default name of ".config".
|
||||
|
||||
KCONFIG_OVERWRITECONFIG
|
||||
--------------------------------------------------
|
||||
If you set KCONFIG_OVERWRITECONFIG in the environment, Kconfig will not
|
||||
break symlinks when .config is a symlink to somewhere else.
|
||||
|
||||
KCONFIG_NOTIMESTAMP
|
||||
--------------------------------------------------
|
||||
If this environment variable exists and is non-null, the timestamp line
|
||||
in generated .config files is omitted.
|
||||
|
||||
______________________________________________________________________
|
||||
Environment variables for '{allyes/allmod/allno/rand}config'
|
||||
|
||||
KCONFIG_ALLCONFIG
|
||||
--------------------------------------------------
|
||||
(partially based on lkml email from/by Rob Landley, re: miniconfig)
|
||||
--------------------------------------------------
|
||||
The allyesconfig/allmodconfig/allnoconfig/randconfig variants can
|
||||
also use the environment variable KCONFIG_ALLCONFIG as a flag or a
|
||||
filename that contains config symbols that the user requires to be
|
||||
set to a specific value. If KCONFIG_ALLCONFIG is used without a
|
||||
filename, "make *config" checks for a file named
|
||||
"all{yes/mod/no/random}.config" (corresponding to the *config command
|
||||
that was used) for symbol values that are to be forced. If this file
|
||||
is not found, it checks for a file named "all.config" to contain forced
|
||||
values.
|
||||
|
||||
This enables you to create "miniature" config (miniconfig) or custom
|
||||
config files containing just the config symbols that you are interested
|
||||
in. Then the kernel config system generates the full .config file,
|
||||
including symbols of your miniconfig file.
|
||||
|
||||
This 'KCONFIG_ALLCONFIG' file is a config file which contains
|
||||
(usually a subset of all) preset config symbols. These variable
|
||||
settings are still subject to normal dependency checks.
|
||||
|
||||
Examples:
|
||||
KCONFIG_ALLCONFIG=custom-notebook.config make allnoconfig
|
||||
or
|
||||
KCONFIG_ALLCONFIG=mini.config make allnoconfig
|
||||
or
|
||||
make KCONFIG_ALLCONFIG=mini.config allnoconfig
|
||||
|
||||
These examples will disable most options (allnoconfig) but enable or
|
||||
disable the options that are explicitly listed in the specified
|
||||
mini-config files.
|
||||
|
||||
______________________________________________________________________
|
||||
Environment variables for 'silentoldconfig'
|
||||
|
||||
KCONFIG_NOSILENTUPDATE
|
||||
--------------------------------------------------
|
||||
If this variable has a non-blank value, it prevents silent kernel
|
||||
config udpates (requires explicit updates).
|
||||
|
||||
KCONFIG_AUTOCONFIG
|
||||
--------------------------------------------------
|
||||
This environment variable can be set to specify the path & name of the
|
||||
"auto.conf" file. Its default value is "include/config/auto.conf".
|
||||
|
||||
KCONFIG_AUTOHEADER
|
||||
--------------------------------------------------
|
||||
This environment variable can be set to specify the path & name of the
|
||||
"autoconf.h" (header) file. Its default value is "include/linux/autoconf.h".
|
||||
|
||||
|
||||
======================================================================
|
||||
menuconfig
|
||||
@ -60,10 +133,11 @@ Searching in menuconfig:
|
||||
|
||||
/^hotplug
|
||||
|
||||
|
||||
______________________________________________________________________
|
||||
Color Themes for 'menuconfig'
|
||||
User interface options for 'menuconfig'
|
||||
|
||||
MENUCONFIG_COLOR
|
||||
--------------------------------------------------
|
||||
It is possible to select different color themes using the variable
|
||||
MENUCONFIG_COLOR. To select a theme use:
|
||||
|
||||
@ -75,83 +149,13 @@ Available themes are:
|
||||
classic => theme with blue background. The classic look
|
||||
bluetitle => a LCD friendly version of classic. (default)
|
||||
|
||||
______________________________________________________________________
|
||||
Environment variables in 'menuconfig'
|
||||
|
||||
KCONFIG_ALLCONFIG
|
||||
--------------------------------------------------
|
||||
(partially based on lkml email from/by Rob Landley, re: miniconfig)
|
||||
--------------------------------------------------
|
||||
The allyesconfig/allmodconfig/allnoconfig/randconfig variants can
|
||||
also use the environment variable KCONFIG_ALLCONFIG as a flag or a
|
||||
filename that contains config symbols that the user requires to be
|
||||
set to a specific value. If KCONFIG_ALLCONFIG is used without a
|
||||
filename, "make *config" checks for a file named
|
||||
"all{yes/mod/no/random}.config" (corresponding to the *config command
|
||||
that was used) for symbol values that are to be forced. If this file
|
||||
is not found, it checks for a file named "all.config" to contain forced
|
||||
values.
|
||||
|
||||
This enables you to create "miniature" config (miniconfig) or custom
|
||||
config files containing just the config symbols that you are interested
|
||||
in. Then the kernel config system generates the full .config file,
|
||||
including dependencies of your miniconfig file, based on the miniconfig
|
||||
file.
|
||||
|
||||
This 'KCONFIG_ALLCONFIG' file is a config file which contains
|
||||
(usually a subset of all) preset config symbols. These variable
|
||||
settings are still subject to normal dependency checks.
|
||||
|
||||
Examples:
|
||||
KCONFIG_ALLCONFIG=custom-notebook.config make allnoconfig
|
||||
or
|
||||
KCONFIG_ALLCONFIG=mini.config make allnoconfig
|
||||
or
|
||||
make KCONFIG_ALLCONFIG=mini.config allnoconfig
|
||||
|
||||
These examples will disable most options (allnoconfig) but enable or
|
||||
disable the options that are explicitly listed in the specified
|
||||
mini-config files.
|
||||
|
||||
KCONFIG_NOSILENTUPDATE
|
||||
--------------------------------------------------
|
||||
If this variable has a non-blank value, it prevents silent kernel
|
||||
config udpates (requires explicit updates).
|
||||
|
||||
KCONFIG_CONFIG
|
||||
--------------------------------------------------
|
||||
This environment variable can be used to specify a default kernel config
|
||||
file name to override the default name of ".config".
|
||||
|
||||
KCONFIG_OVERWRITECONFIG
|
||||
--------------------------------------------------
|
||||
If you set KCONFIG_OVERWRITECONFIG in the environment, Kconfig will not
|
||||
break symlinks when .config is a symlink to somewhere else.
|
||||
|
||||
KCONFIG_NOTIMESTAMP
|
||||
--------------------------------------------------
|
||||
If this environment variable exists and is non-null, the timestamp line
|
||||
in generated .config files is omitted.
|
||||
|
||||
KCONFIG_AUTOCONFIG
|
||||
--------------------------------------------------
|
||||
This environment variable can be set to specify the path & name of the
|
||||
"auto.conf" file. Its default value is "include/config/auto.conf".
|
||||
|
||||
KCONFIG_AUTOHEADER
|
||||
--------------------------------------------------
|
||||
This environment variable can be set to specify the path & name of the
|
||||
"autoconf.h" (header) file. Its default value is "include/linux/autoconf.h".
|
||||
|
||||
______________________________________________________________________
|
||||
menuconfig User Interface Options
|
||||
----------------------------------------------------------------------
|
||||
MENUCONFIG_MODE
|
||||
--------------------------------------------------
|
||||
This mode shows all sub-menus in one large tree.
|
||||
|
||||
Example:
|
||||
MENUCONFIG_MODE=single_menu make menuconfig
|
||||
make MENUCONFIG_MODE=single_menu menuconfig
|
||||
|
||||
|
||||
======================================================================
|
||||
xconfig
|
||||
|
@ -275,7 +275,7 @@ following files:
|
||||
|
||||
KERNELDIR := /lib/modules/`uname -r`/build
|
||||
all::
|
||||
$(MAKE) -C $KERNELDIR M=`pwd` $@
|
||||
$(MAKE) -C $(KERNELDIR) M=`pwd` $@
|
||||
|
||||
# Module specific targets
|
||||
genbin:
|
||||
|
@ -108,7 +108,7 @@ There are two possible methods of using Kdump.
|
||||
|
||||
2) Or use the system kernel binary itself as dump-capture kernel and there is
|
||||
no need to build a separate dump-capture kernel. This is possible
|
||||
only with the architecutres which support a relocatable kernel. As
|
||||
only with the architectures which support a relocatable kernel. As
|
||||
of today, i386, x86_64, ppc64 and ia64 architectures support relocatable
|
||||
kernel.
|
||||
|
||||
@ -222,7 +222,7 @@ Dump-capture kernel config options (Arch Dependent, ia64)
|
||||
----------------------------------------------------------
|
||||
|
||||
- No specific options are required to create a dump-capture kernel
|
||||
for ia64, other than those specified in the arch idependent section
|
||||
for ia64, other than those specified in the arch independent section
|
||||
above. This means that it is possible to use the system kernel
|
||||
as a dump-capture kernel if desired.
|
||||
|
||||
|
@ -48,6 +48,7 @@ parameter is applicable:
|
||||
EFI EFI Partitioning (GPT) is enabled
|
||||
EIDE EIDE/ATAPI support is enabled.
|
||||
FB The frame buffer device is enabled.
|
||||
GCOV GCOV profiling is enabled.
|
||||
HW Appropriate hardware is enabled.
|
||||
IA-64 IA-64 architecture is enabled.
|
||||
IMA Integrity measurement architecture is enabled.
|
||||
@ -56,7 +57,6 @@ parameter is applicable:
|
||||
ISAPNP ISA PnP code is enabled.
|
||||
ISDN Appropriate ISDN support is enabled.
|
||||
JOY Appropriate joystick support is enabled.
|
||||
KMEMTRACE kmemtrace is enabled.
|
||||
LIBATA Libata driver is enabled
|
||||
LP Printer support is enabled.
|
||||
LOOP Loopback device support is enabled.
|
||||
@ -229,14 +229,6 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
to assume that this machine's pmtimer latches its value
|
||||
and always returns good values.
|
||||
|
||||
acpi.power_nocheck= [HW,ACPI]
|
||||
Format: 1/0 enable/disable the check of power state.
|
||||
On some bogus BIOS the _PSC object/_STA object of
|
||||
power resource can't return the correct device power
|
||||
state. In such case it is unneccessary to check its
|
||||
power state again in power transition.
|
||||
1 : disable the power state check
|
||||
|
||||
acpi_sci= [HW,ACPI] ACPI System Control Interrupt trigger mode
|
||||
Format: { level | edge | high | low }
|
||||
|
||||
@ -329,11 +321,6 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
flushed before they will be reused, which
|
||||
is a lot of faster
|
||||
|
||||
amd_iommu_size= [HW,X86-64]
|
||||
Define the size of the aperture for the AMD IOMMU
|
||||
driver. Possible values are:
|
||||
'32M', '64M' (default), '128M', '256M', '512M', '1G'
|
||||
|
||||
amijoy.map= [HW,JOY] Amiga joystick support
|
||||
Map of devices attached to JOY0DAT and JOY1DAT
|
||||
Format: <a>,<b>
|
||||
@ -497,6 +484,13 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
Also note the kernel might malfunction if you disable
|
||||
some critical bits.
|
||||
|
||||
cmo_free_hint= [PPC] Format: { yes | no }
|
||||
Specify whether pages are marked as being inactive
|
||||
when they are freed. This is used in CMO environments
|
||||
to determine OS memory pressure for page stealing by
|
||||
a hypervisor.
|
||||
Default: yes
|
||||
|
||||
code_bytes [X86] How many bytes of object code to print
|
||||
in an oops report.
|
||||
Range: 0 - 8192
|
||||
@ -545,6 +539,10 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
console=brl,ttyS0
|
||||
For now, only VisioBraille is supported.
|
||||
|
||||
consoleblank= [KNL] The console blank (screen saver) timeout in
|
||||
seconds. Defaults to 10*60 = 10mins. A value of 0
|
||||
disables the blank timer.
|
||||
|
||||
coredump_filter=
|
||||
[KNL] Change the default value for
|
||||
/proc/<pid>/coredump_filter.
|
||||
@ -646,6 +644,13 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
DMA-API debugging code disables itself because the
|
||||
architectural default is too low.
|
||||
|
||||
dma_debug_driver=<driver_name>
|
||||
With this option the DMA-API debugging driver
|
||||
filter feature can be enabled at boot time. Just
|
||||
pass the driver to filter for as the parameter.
|
||||
The filter can be disabled or changed to another
|
||||
driver later using sysfs.
|
||||
|
||||
dscc4.setup= [NET]
|
||||
|
||||
dtc3181e= [HW,SCSI]
|
||||
@ -752,12 +757,25 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
|
||||
|
||||
ftrace=[tracer]
|
||||
[ftrace] will set and start the specified tracer
|
||||
[FTRACE] will set and start the specified tracer
|
||||
as early as possible in order to facilitate early
|
||||
boot debugging.
|
||||
|
||||
ftrace_dump_on_oops
|
||||
[ftrace] will dump the trace buffers on oops.
|
||||
[FTRACE] will dump the trace buffers on oops.
|
||||
|
||||
ftrace_filter=[function-list]
|
||||
[FTRACE] Limit the functions traced by the function
|
||||
tracer at boot up. function-list is a comma separated
|
||||
list of functions. This list can be changed at run
|
||||
time by the set_ftrace_filter file in the debugfs
|
||||
tracing directory.
|
||||
|
||||
ftrace_notrace=[function-list]
|
||||
[FTRACE] Do not trace the functions specified in
|
||||
function-list. This list can be changed at run time
|
||||
by the set_ftrace_notrace file in the debugfs
|
||||
tracing directory.
|
||||
|
||||
gamecon.map[2|3]=
|
||||
[HW,JOY] Multisystem joystick and NES/SNES/PSX pad
|
||||
@ -771,6 +789,12 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
Format: off | on
|
||||
default: on
|
||||
|
||||
gcov_persist= [GCOV] When non-zero (default), profiling data for
|
||||
kernel modules is saved and remains accessible via
|
||||
debugfs, even when the module is unloaded/reloaded.
|
||||
When zero, profiling data is discarded and associated
|
||||
debugfs files are removed at module unload time.
|
||||
|
||||
gdth= [HW,SCSI]
|
||||
See header of drivers/scsi/gdth.c.
|
||||
|
||||
@ -873,11 +897,8 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
|
||||
ide-core.nodma= [HW] (E)IDE subsystem
|
||||
Format: =0.0 to prevent dma on hda, =0.1 hdb =1.0 hdc
|
||||
.vlb_clock .pci_clock .noflush .noprobe .nowerr .cdrom
|
||||
.chs .ignore_cable are additional options
|
||||
See Documentation/ide/ide.txt.
|
||||
|
||||
idebus= [HW] (E)IDE subsystem - VLB/PCI bus speed
|
||||
.vlb_clock .pci_clock .noflush .nohpa .noprobe .nowerr
|
||||
.cdrom .chs .ignore_cable are additional options
|
||||
See Documentation/ide/ide.txt.
|
||||
|
||||
ide-pci-generic.all-generic-ide [HW] (E)IDE subsystem
|
||||
@ -914,6 +935,12 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
Formt: { "sha1" | "md5" }
|
||||
default: "sha1"
|
||||
|
||||
ima_tcb [IMA]
|
||||
Load a policy which meets the needs of the Trusted
|
||||
Computing Base. This means IMA will measure all
|
||||
programs exec'd, files mmap'd for exec, and all files
|
||||
opened for read by uid=0.
|
||||
|
||||
in2000= [HW,SCSI]
|
||||
See header of drivers/scsi/in2000.c.
|
||||
|
||||
@ -971,6 +998,7 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
nomerge
|
||||
forcesac
|
||||
soft
|
||||
pt [x86, IA64]
|
||||
|
||||
io7= [HW] IO7 for Marvel based alpha systems
|
||||
See comment before marvel_specify_io7 in
|
||||
@ -1054,24 +1082,19 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
use the HighMem zone if it exists, and the Normal
|
||||
zone if it does not.
|
||||
|
||||
kmemtrace.enable= [KNL,KMEMTRACE] Format: { yes | no }
|
||||
Controls whether kmemtrace is enabled
|
||||
at boot-time.
|
||||
|
||||
kmemtrace.subbufs=n [KNL,KMEMTRACE] Overrides the number of
|
||||
subbufs kmemtrace's relay channel has. Set this
|
||||
higher than default (KMEMTRACE_N_SUBBUFS in code) if
|
||||
you experience buffer overruns.
|
||||
|
||||
kgdboc= [HW] kgdb over consoles.
|
||||
Requires a tty driver that supports console polling.
|
||||
(only serial suported for now)
|
||||
(only serial supported for now)
|
||||
Format: <serial_device>[,baud]
|
||||
|
||||
kmac= [MIPS] korina ethernet MAC address.
|
||||
Configure the RouterBoard 532 series on-chip
|
||||
Ethernet adapter MAC address.
|
||||
|
||||
kmemleak= [KNL] Boot-time kmemleak enable/disable
|
||||
Valid arguments: on, off
|
||||
Default: on
|
||||
|
||||
kstack=N [X86] Print N words from the kernel stack
|
||||
in oops dumps.
|
||||
|
||||
@ -1339,6 +1362,27 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
min_addr=nn[KMG] [KNL,BOOT,ia64] All physical memory below this
|
||||
physical address is ignored.
|
||||
|
||||
mini2440= [ARM,HW,KNL]
|
||||
Format:[0..2][b][c][t]
|
||||
Default: "0tb"
|
||||
MINI2440 configuration specification:
|
||||
0 - The attached screen is the 3.5" TFT
|
||||
1 - The attached screen is the 7" TFT
|
||||
2 - The VGA Shield is attached (1024x768)
|
||||
Leaving out the screen size parameter will not load
|
||||
the TFT driver, and the framebuffer will be left
|
||||
unconfigured.
|
||||
b - Enable backlight. The TFT backlight pin will be
|
||||
linked to the kernel VESA blanking code and a GPIO
|
||||
LED. This parameter is not necessary when using the
|
||||
VGA shield.
|
||||
c - Enable the s3c camera interface.
|
||||
t - Reserved for enabling touchscreen support. The
|
||||
touchscreen support is not enabled in the mainstream
|
||||
kernel as of 2.6.30, a preliminary port can be found
|
||||
in the "bleeding edge" mini2440 support kernel at
|
||||
http://repo.or.cz/w/linux-2.6/mini2440.git
|
||||
|
||||
mminit_loglevel=
|
||||
[KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this
|
||||
parameter allows control of the logging verbosity for
|
||||
@ -1380,6 +1424,16 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
mtdparts= [MTD]
|
||||
See drivers/mtd/cmdlinepart.c.
|
||||
|
||||
onenand.bdry= [HW,MTD] Flex-OneNAND Boundary Configuration
|
||||
|
||||
Format: [die0_boundary][,die0_lock][,die1_boundary][,die1_lock]
|
||||
|
||||
boundary - index of last SLC block on Flex-OneNAND.
|
||||
The remaining blocks are configured as MLC blocks.
|
||||
lock - Configure if Flex-OneNAND boundary should be locked.
|
||||
Once locked, the boundary cannot be changed.
|
||||
1 indicates lock status, 0 indicates unlock status.
|
||||
|
||||
mtdset= [ARM]
|
||||
ARM/S3C2412 JIVE boot control
|
||||
|
||||
@ -1390,7 +1444,7 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
('y', default) or cooked coordinates ('n')
|
||||
|
||||
mtrr_chunk_size=nn[KMG] [X86]
|
||||
used for mtrr cleanup. It is largest continous chunk
|
||||
used for mtrr cleanup. It is largest continuous chunk
|
||||
that could hold holes aka. UC entries.
|
||||
|
||||
mtrr_gran_size=nn[KMG] [X86]
|
||||
@ -1575,6 +1629,9 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
noinitrd [RAM] Tells the kernel not to load any configured
|
||||
initial RAM disk.
|
||||
|
||||
nointremap [X86-64, Intel-IOMMU] Do not enable interrupt
|
||||
remapping.
|
||||
|
||||
nointroute [IA-64]
|
||||
|
||||
nojitter [IA64] Disables jitter checking for ITC timers.
|
||||
@ -1660,6 +1717,14 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
oprofile.timer= [HW]
|
||||
Use timer interrupt instead of performance counters
|
||||
|
||||
oprofile.cpu_type= Force an oprofile cpu type
|
||||
This might be useful if you have an older oprofile
|
||||
userland or if you want common events.
|
||||
Format: { archperfmon }
|
||||
archperfmon: [X86] Force use of architectural
|
||||
perfmon on Intel CPUs instead of the
|
||||
CPU specific event set.
|
||||
|
||||
osst= [HW,SCSI] SCSI Tape Driver
|
||||
Format: <buffer_size>,<write_threshold>
|
||||
See also Documentation/scsi/st.txt.
|
||||
@ -1735,6 +1800,9 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
root domains (aka PCI segments, in ACPI-speak).
|
||||
nommconf [X86] Disable use of MMCONFIG for PCI
|
||||
Configuration
|
||||
check_enable_amd_mmconf [X86] check for and enable
|
||||
properly configured MMIO access to PCI
|
||||
config space on AMD family 10h CPU
|
||||
nomsi [MSI] If the PCI_MSI kernel config parameter is
|
||||
enabled, this kernel boot option can be used to
|
||||
disable the use of MSI interrupts system-wide.
|
||||
@ -1824,6 +1892,12 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
PAGE_SIZE is used as alignment.
|
||||
PCI-PCI bridge can be specified, if resource
|
||||
windows need to be expanded.
|
||||
ecrc= Enable/disable PCIe ECRC (transaction layer
|
||||
end-to-end CRC checking).
|
||||
bios: Use BIOS/firmware settings. This is the
|
||||
the default.
|
||||
off: Turn ECRC off
|
||||
on: Turn ECRC on.
|
||||
|
||||
pcie_aspm= [PCIE] Forcibly enable or disable PCIe Active State Power
|
||||
Management.
|
||||
|
773
Documentation/kmemcheck.txt
Normal file
773
Documentation/kmemcheck.txt
Normal file
@ -0,0 +1,773 @@
|
||||
GETTING STARTED WITH KMEMCHECK
|
||||
==============================
|
||||
|
||||
Vegard Nossum <vegardno@ifi.uio.no>
|
||||
|
||||
|
||||
Contents
|
||||
========
|
||||
0. Introduction
|
||||
1. Downloading
|
||||
2. Configuring and compiling
|
||||
3. How to use
|
||||
3.1. Booting
|
||||
3.2. Run-time enable/disable
|
||||
3.3. Debugging
|
||||
3.4. Annotating false positives
|
||||
4. Reporting errors
|
||||
5. Technical description
|
||||
|
||||
|
||||
0. Introduction
|
||||
===============
|
||||
|
||||
kmemcheck is a debugging feature for the Linux Kernel. More specifically, it
|
||||
is a dynamic checker that detects and warns about some uses of uninitialized
|
||||
memory.
|
||||
|
||||
Userspace programmers might be familiar with Valgrind's memcheck. The main
|
||||
difference between memcheck and kmemcheck is that memcheck works for userspace
|
||||
programs only, and kmemcheck works for the kernel only. The implementations
|
||||
are of course vastly different. Because of this, kmemcheck is not as accurate
|
||||
as memcheck, but it turns out to be good enough in practice to discover real
|
||||
programmer errors that the compiler is not able to find through static
|
||||
analysis.
|
||||
|
||||
Enabling kmemcheck on a kernel will probably slow it down to the extent that
|
||||
the machine will not be usable for normal workloads such as e.g. an
|
||||
interactive desktop. kmemcheck will also cause the kernel to use about twice
|
||||
as much memory as normal. For this reason, kmemcheck is strictly a debugging
|
||||
feature.
|
||||
|
||||
|
||||
1. Downloading
|
||||
==============
|
||||
|
||||
kmemcheck can only be downloaded using git. If you want to write patches
|
||||
against the current code, you should use the kmemcheck development branch of
|
||||
the tip tree. It is also possible to use the linux-next tree, which also
|
||||
includes the latest version of kmemcheck.
|
||||
|
||||
Assuming that you've already cloned the linux-2.6.git repository, all you
|
||||
have to do is add the -tip tree as a remote, like this:
|
||||
|
||||
$ git remote add tip git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git
|
||||
|
||||
To actually download the tree, fetch the remote:
|
||||
|
||||
$ git fetch tip
|
||||
|
||||
And to check out a new local branch with the kmemcheck code:
|
||||
|
||||
$ git checkout -b kmemcheck tip/kmemcheck
|
||||
|
||||
General instructions for the -tip tree can be found here:
|
||||
http://people.redhat.com/mingo/tip.git/readme.txt
|
||||
|
||||
|
||||
2. Configuring and compiling
|
||||
============================
|
||||
|
||||
kmemcheck only works for the x86 (both 32- and 64-bit) platform. A number of
|
||||
configuration variables must have specific settings in order for the kmemcheck
|
||||
menu to even appear in "menuconfig". These are:
|
||||
|
||||
o CONFIG_CC_OPTIMIZE_FOR_SIZE=n
|
||||
|
||||
This option is located under "General setup" / "Optimize for size".
|
||||
|
||||
Without this, gcc will use certain optimizations that usually lead to
|
||||
false positive warnings from kmemcheck. An example of this is a 16-bit
|
||||
field in a struct, where gcc may load 32 bits, then discard the upper
|
||||
16 bits. kmemcheck sees only the 32-bit load, and may trigger a
|
||||
warning for the upper 16 bits (if they're uninitialized).
|
||||
|
||||
o CONFIG_SLAB=y or CONFIG_SLUB=y
|
||||
|
||||
This option is located under "General setup" / "Choose SLAB
|
||||
allocator".
|
||||
|
||||
o CONFIG_FUNCTION_TRACER=n
|
||||
|
||||
This option is located under "Kernel hacking" / "Tracers" / "Kernel
|
||||
Function Tracer"
|
||||
|
||||
When function tracing is compiled in, gcc emits a call to another
|
||||
function at the beginning of every function. This means that when the
|
||||
page fault handler is called, the ftrace framework will be called
|
||||
before kmemcheck has had a chance to handle the fault. If ftrace then
|
||||
modifies memory that was tracked by kmemcheck, the result is an
|
||||
endless recursive page fault.
|
||||
|
||||
o CONFIG_DEBUG_PAGEALLOC=n
|
||||
|
||||
This option is located under "Kernel hacking" / "Debug page memory
|
||||
allocations".
|
||||
|
||||
In addition, I highly recommend turning on CONFIG_DEBUG_INFO=y. This is also
|
||||
located under "Kernel hacking". With this, you will be able to get line number
|
||||
information from the kmemcheck warnings, which is extremely valuable in
|
||||
debugging a problem. This option is not mandatory, however, because it slows
|
||||
down the compilation process and produces a much bigger kernel image.
|
||||
|
||||
Now the kmemcheck menu should be visible (under "Kernel hacking" / "kmemcheck:
|
||||
trap use of uninitialized memory"). Here follows a description of the
|
||||
kmemcheck configuration variables:
|
||||
|
||||
o CONFIG_KMEMCHECK
|
||||
|
||||
This must be enabled in order to use kmemcheck at all...
|
||||
|
||||
o CONFIG_KMEMCHECK_[DISABLED | ENABLED | ONESHOT]_BY_DEFAULT
|
||||
|
||||
This option controls the status of kmemcheck at boot-time. "Enabled"
|
||||
will enable kmemcheck right from the start, "disabled" will boot the
|
||||
kernel as normal (but with the kmemcheck code compiled in, so it can
|
||||
be enabled at run-time after the kernel has booted), and "one-shot" is
|
||||
a special mode which will turn kmemcheck off automatically after
|
||||
detecting the first use of uninitialized memory.
|
||||
|
||||
If you are using kmemcheck to actively debug a problem, then you
|
||||
probably want to choose "enabled" here.
|
||||
|
||||
The one-shot mode is mostly useful in automated test setups because it
|
||||
can prevent floods of warnings and increase the chances of the machine
|
||||
surviving in case something is really wrong. In other cases, the one-
|
||||
shot mode could actually be counter-productive because it would turn
|
||||
itself off at the very first error -- in the case of a false positive
|
||||
too -- and this would come in the way of debugging the specific
|
||||
problem you were interested in.
|
||||
|
||||
If you would like to use your kernel as normal, but with a chance to
|
||||
enable kmemcheck in case of some problem, it might be a good idea to
|
||||
choose "disabled" here. When kmemcheck is disabled, most of the run-
|
||||
time overhead is not incurred, and the kernel will be almost as fast
|
||||
as normal.
|
||||
|
||||
o CONFIG_KMEMCHECK_QUEUE_SIZE
|
||||
|
||||
Select the maximum number of error reports to store in an internal
|
||||
(fixed-size) buffer. Since errors can occur virtually anywhere and in
|
||||
any context, we need a temporary storage area which is guaranteed not
|
||||
to generate any other page faults when accessed. The queue will be
|
||||
emptied as soon as a tasklet may be scheduled. If the queue is full,
|
||||
new error reports will be lost.
|
||||
|
||||
The default value of 64 is probably fine. If some code produces more
|
||||
than 64 errors within an irqs-off section, then the code is likely to
|
||||
produce many, many more, too, and these additional reports seldom give
|
||||
any more information (the first report is usually the most valuable
|
||||
anyway).
|
||||
|
||||
This number might have to be adjusted if you are not using serial
|
||||
console or similar to capture the kernel log. If you are using the
|
||||
"dmesg" command to save the log, then getting a lot of kmemcheck
|
||||
warnings might overflow the kernel log itself, and the earlier reports
|
||||
will get lost in that way instead. Try setting this to 10 or so on
|
||||
such a setup.
|
||||
|
||||
o CONFIG_KMEMCHECK_SHADOW_COPY_SHIFT
|
||||
|
||||
Select the number of shadow bytes to save along with each entry of the
|
||||
error-report queue. These bytes indicate what parts of an allocation
|
||||
are initialized, uninitialized, etc. and will be displayed when an
|
||||
error is detected to help the debugging of a particular problem.
|
||||
|
||||
The number entered here is actually the logarithm of the number of
|
||||
bytes that will be saved. So if you pick for example 5 here, kmemcheck
|
||||
will save 2^5 = 32 bytes.
|
||||
|
||||
The default value should be fine for debugging most problems. It also
|
||||
fits nicely within 80 columns.
|
||||
|
||||
o CONFIG_KMEMCHECK_PARTIAL_OK
|
||||
|
||||
This option (when enabled) works around certain GCC optimizations that
|
||||
produce 32-bit reads from 16-bit variables where the upper 16 bits are
|
||||
thrown away afterwards.
|
||||
|
||||
The default value (enabled) is recommended. This may of course hide
|
||||
some real errors, but disabling it would probably produce a lot of
|
||||
false positives.
|
||||
|
||||
o CONFIG_KMEMCHECK_BITOPS_OK
|
||||
|
||||
This option silences warnings that would be generated for bit-field
|
||||
accesses where not all the bits are initialized at the same time. This
|
||||
may also hide some real bugs.
|
||||
|
||||
This option is probably obsolete, or it should be replaced with
|
||||
the kmemcheck-/bitfield-annotations for the code in question. The
|
||||
default value is therefore fine.
|
||||
|
||||
Now compile the kernel as usual.
|
||||
|
||||
|
||||
3. How to use
|
||||
=============
|
||||
|
||||
3.1. Booting
|
||||
============
|
||||
|
||||
First some information about the command-line options. There is only one
|
||||
option specific to kmemcheck, and this is called "kmemcheck". It can be used
|
||||
to override the default mode as chosen by the CONFIG_KMEMCHECK_*_BY_DEFAULT
|
||||
option. Its possible settings are:
|
||||
|
||||
o kmemcheck=0 (disabled)
|
||||
o kmemcheck=1 (enabled)
|
||||
o kmemcheck=2 (one-shot mode)
|
||||
|
||||
If SLUB debugging has been enabled in the kernel, it may take precedence over
|
||||
kmemcheck in such a way that the slab caches which are under SLUB debugging
|
||||
will not be tracked by kmemcheck. In order to ensure that this doesn't happen
|
||||
(even though it shouldn't by default), use SLUB's boot option "slub_debug",
|
||||
like this: slub_debug=-
|
||||
|
||||
In fact, this option may also be used for fine-grained control over SLUB vs.
|
||||
kmemcheck. For example, if the command line includes "kmemcheck=1
|
||||
slub_debug=,dentry", then SLUB debugging will be used only for the "dentry"
|
||||
slab cache, and with kmemcheck tracking all the other caches. This is advanced
|
||||
usage, however, and is not generally recommended.
|
||||
|
||||
|
||||
3.2. Run-time enable/disable
|
||||
============================
|
||||
|
||||
When the kernel has booted, it is possible to enable or disable kmemcheck at
|
||||
run-time. WARNING: This feature is still experimental and may cause false
|
||||
positive warnings to appear. Therefore, try not to use this. If you find that
|
||||
it doesn't work properly (e.g. you see an unreasonable amount of warnings), I
|
||||
will be happy to take bug reports.
|
||||
|
||||
Use the file /proc/sys/kernel/kmemcheck for this purpose, e.g.:
|
||||
|
||||
$ echo 0 > /proc/sys/kernel/kmemcheck # disables kmemcheck
|
||||
|
||||
The numbers are the same as for the kmemcheck= command-line option.
|
||||
|
||||
|
||||
3.3. Debugging
|
||||
==============
|
||||
|
||||
A typical report will look something like this:
|
||||
|
||||
WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88003e4a2024)
|
||||
80000000000000000000000000000000000000000088ffff0000000000000000
|
||||
i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u
|
||||
^
|
||||
|
||||
Pid: 1856, comm: ntpdate Not tainted 2.6.29-rc5 #264 945P-A
|
||||
RIP: 0010:[<ffffffff8104ede8>] [<ffffffff8104ede8>] __dequeue_signal+0xc8/0x190
|
||||
RSP: 0018:ffff88003cdf7d98 EFLAGS: 00210002
|
||||
RAX: 0000000000000030 RBX: ffff88003d4ea968 RCX: 0000000000000009
|
||||
RDX: ffff88003e5d6018 RSI: ffff88003e5d6024 RDI: ffff88003cdf7e84
|
||||
RBP: ffff88003cdf7db8 R08: ffff88003e5d6000 R09: 0000000000000000
|
||||
R10: 0000000000000080 R11: 0000000000000000 R12: 000000000000000e
|
||||
R13: ffff88003cdf7e78 R14: ffff88003d530710 R15: ffff88003d5a98c8
|
||||
FS: 0000000000000000(0000) GS:ffff880001982000(0063) knlGS:00000
|
||||
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
|
||||
CR2: ffff88003f806ea0 CR3: 000000003c036000 CR4: 00000000000006a0
|
||||
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
|
||||
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
|
||||
[<ffffffff8104f04e>] dequeue_signal+0x8e/0x170
|
||||
[<ffffffff81050bd8>] get_signal_to_deliver+0x98/0x390
|
||||
[<ffffffff8100b87d>] do_notify_resume+0xad/0x7d0
|
||||
[<ffffffff8100c7b5>] int_signal+0x12/0x17
|
||||
[<ffffffffffffffff>] 0xffffffffffffffff
|
||||
|
||||
The single most valuable information in this report is the RIP (or EIP on 32-
|
||||
bit) value. This will help us pinpoint exactly which instruction that caused
|
||||
the warning.
|
||||
|
||||
If your kernel was compiled with CONFIG_DEBUG_INFO=y, then all we have to do
|
||||
is give this address to the addr2line program, like this:
|
||||
|
||||
$ addr2line -e vmlinux -i ffffffff8104ede8
|
||||
arch/x86/include/asm/string_64.h:12
|
||||
include/asm-generic/siginfo.h:287
|
||||
kernel/signal.c:380
|
||||
kernel/signal.c:410
|
||||
|
||||
The "-e vmlinux" tells addr2line which file to look in. IMPORTANT: This must
|
||||
be the vmlinux of the kernel that produced the warning in the first place! If
|
||||
not, the line number information will almost certainly be wrong.
|
||||
|
||||
The "-i" tells addr2line to also print the line numbers of inlined functions.
|
||||
In this case, the flag was very important, because otherwise, it would only
|
||||
have printed the first line, which is just a call to memcpy(), which could be
|
||||
called from a thousand places in the kernel, and is therefore not very useful.
|
||||
These inlined functions would not show up in the stack trace above, simply
|
||||
because the kernel doesn't load the extra debugging information. This
|
||||
technique can of course be used with ordinary kernel oopses as well.
|
||||
|
||||
In this case, it's the caller of memcpy() that is interesting, and it can be
|
||||
found in include/asm-generic/siginfo.h, line 287:
|
||||
|
||||
281 static inline void copy_siginfo(struct siginfo *to, struct siginfo *from)
|
||||
282 {
|
||||
283 if (from->si_code < 0)
|
||||
284 memcpy(to, from, sizeof(*to));
|
||||
285 else
|
||||
286 /* _sigchld is currently the largest know union member */
|
||||
287 memcpy(to, from, __ARCH_SI_PREAMBLE_SIZE + sizeof(from->_sifields._sigchld));
|
||||
288 }
|
||||
|
||||
Since this was a read (kmemcheck usually warns about reads only, though it can
|
||||
warn about writes to unallocated or freed memory as well), it was probably the
|
||||
"from" argument which contained some uninitialized bytes. Following the chain
|
||||
of calls, we move upwards to see where "from" was allocated or initialized,
|
||||
kernel/signal.c, line 380:
|
||||
|
||||
359 static void collect_signal(int sig, struct sigpending *list, siginfo_t *info)
|
||||
360 {
|
||||
...
|
||||
367 list_for_each_entry(q, &list->list, list) {
|
||||
368 if (q->info.si_signo == sig) {
|
||||
369 if (first)
|
||||
370 goto still_pending;
|
||||
371 first = q;
|
||||
...
|
||||
377 if (first) {
|
||||
378 still_pending:
|
||||
379 list_del_init(&first->list);
|
||||
380 copy_siginfo(info, &first->info);
|
||||
381 __sigqueue_free(first);
|
||||
...
|
||||
392 }
|
||||
393 }
|
||||
|
||||
Here, it is &first->info that is being passed on to copy_siginfo(). The
|
||||
variable "first" was found on a list -- passed in as the second argument to
|
||||
collect_signal(). We continue our journey through the stack, to figure out
|
||||
where the item on "list" was allocated or initialized. We move to line 410:
|
||||
|
||||
395 static int __dequeue_signal(struct sigpending *pending, sigset_t *mask,
|
||||
396 siginfo_t *info)
|
||||
397 {
|
||||
...
|
||||
410 collect_signal(sig, pending, info);
|
||||
...
|
||||
414 }
|
||||
|
||||
Now we need to follow the "pending" pointer, since that is being passed on to
|
||||
collect_signal() as "list". At this point, we've run out of lines from the
|
||||
"addr2line" output. Not to worry, we just paste the next addresses from the
|
||||
kmemcheck stack dump, i.e.:
|
||||
|
||||
[<ffffffff8104f04e>] dequeue_signal+0x8e/0x170
|
||||
[<ffffffff81050bd8>] get_signal_to_deliver+0x98/0x390
|
||||
[<ffffffff8100b87d>] do_notify_resume+0xad/0x7d0
|
||||
[<ffffffff8100c7b5>] int_signal+0x12/0x17
|
||||
|
||||
$ addr2line -e vmlinux -i ffffffff8104f04e ffffffff81050bd8 \
|
||||
ffffffff8100b87d ffffffff8100c7b5
|
||||
kernel/signal.c:446
|
||||
kernel/signal.c:1806
|
||||
arch/x86/kernel/signal.c:805
|
||||
arch/x86/kernel/signal.c:871
|
||||
arch/x86/kernel/entry_64.S:694
|
||||
|
||||
Remember that since these addresses were found on the stack and not as the
|
||||
RIP value, they actually point to the _next_ instruction (they are return
|
||||
addresses). This becomes obvious when we look at the code for line 446:
|
||||
|
||||
422 int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
|
||||
423 {
|
||||
...
|
||||
431 signr = __dequeue_signal(&tsk->signal->shared_pending,
|
||||
432 mask, info);
|
||||
433 /*
|
||||
434 * itimer signal ?
|
||||
435 *
|
||||
436 * itimers are process shared and we restart periodic
|
||||
437 * itimers in the signal delivery path to prevent DoS
|
||||
438 * attacks in the high resolution timer case. This is
|
||||
439 * compliant with the old way of self restarting
|
||||
440 * itimers, as the SIGALRM is a legacy signal and only
|
||||
441 * queued once. Changing the restart behaviour to
|
||||
442 * restart the timer in the signal dequeue path is
|
||||
443 * reducing the timer noise on heavy loaded !highres
|
||||
444 * systems too.
|
||||
445 */
|
||||
446 if (unlikely(signr == SIGALRM)) {
|
||||
...
|
||||
489 }
|
||||
|
||||
So instead of looking at 446, we should be looking at 431, which is the line
|
||||
that executes just before 446. Here we see that what we are looking for is
|
||||
&tsk->signal->shared_pending.
|
||||
|
||||
Our next task is now to figure out which function that puts items on this
|
||||
"shared_pending" list. A crude, but efficient tool, is git grep:
|
||||
|
||||
$ git grep -n 'shared_pending' kernel/
|
||||
...
|
||||
kernel/signal.c:828: pending = group ? &t->signal->shared_pending : &t->pending;
|
||||
kernel/signal.c:1339: pending = group ? &t->signal->shared_pending : &t->pending;
|
||||
...
|
||||
|
||||
There were more results, but none of them were related to list operations,
|
||||
and these were the only assignments. We inspect the line numbers more closely
|
||||
and find that this is indeed where items are being added to the list:
|
||||
|
||||
816 static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
|
||||
817 int group)
|
||||
818 {
|
||||
...
|
||||
828 pending = group ? &t->signal->shared_pending : &t->pending;
|
||||
...
|
||||
851 q = __sigqueue_alloc(t, GFP_ATOMIC, (sig < SIGRTMIN &&
|
||||
852 (is_si_special(info) ||
|
||||
853 info->si_code >= 0)));
|
||||
854 if (q) {
|
||||
855 list_add_tail(&q->list, &pending->list);
|
||||
...
|
||||
890 }
|
||||
|
||||
and:
|
||||
|
||||
1309 int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
|
||||
1310 {
|
||||
....
|
||||
1339 pending = group ? &t->signal->shared_pending : &t->pending;
|
||||
1340 list_add_tail(&q->list, &pending->list);
|
||||
....
|
||||
1347 }
|
||||
|
||||
In the first case, the list element we are looking for, "q", is being returned
|
||||
from the function __sigqueue_alloc(), which looks like an allocation function.
|
||||
Let's take a look at it:
|
||||
|
||||
187 static struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags,
|
||||
188 int override_rlimit)
|
||||
189 {
|
||||
190 struct sigqueue *q = NULL;
|
||||
191 struct user_struct *user;
|
||||
192
|
||||
193 /*
|
||||
194 * We won't get problems with the target's UID changing under us
|
||||
195 * because changing it requires RCU be used, and if t != current, the
|
||||
196 * caller must be holding the RCU readlock (by way of a spinlock) and
|
||||
197 * we use RCU protection here
|
||||
198 */
|
||||
199 user = get_uid(__task_cred(t)->user);
|
||||
200 atomic_inc(&user->sigpending);
|
||||
201 if (override_rlimit ||
|
||||
202 atomic_read(&user->sigpending) <=
|
||||
203 t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur)
|
||||
204 q = kmem_cache_alloc(sigqueue_cachep, flags);
|
||||
205 if (unlikely(q == NULL)) {
|
||||
206 atomic_dec(&user->sigpending);
|
||||
207 free_uid(user);
|
||||
208 } else {
|
||||
209 INIT_LIST_HEAD(&q->list);
|
||||
210 q->flags = 0;
|
||||
211 q->user = user;
|
||||
212 }
|
||||
213
|
||||
214 return q;
|
||||
215 }
|
||||
|
||||
We see that this function initializes q->list, q->flags, and q->user. It seems
|
||||
that now is the time to look at the definition of "struct sigqueue", e.g.:
|
||||
|
||||
14 struct sigqueue {
|
||||
15 struct list_head list;
|
||||
16 int flags;
|
||||
17 siginfo_t info;
|
||||
18 struct user_struct *user;
|
||||
19 };
|
||||
|
||||
And, you might remember, it was a memcpy() on &first->info that caused the
|
||||
warning, so this makes perfect sense. It also seems reasonable to assume that
|
||||
it is the caller of __sigqueue_alloc() that has the responsibility of filling
|
||||
out (initializing) this member.
|
||||
|
||||
But just which fields of the struct were uninitialized? Let's look at
|
||||
kmemcheck's report again:
|
||||
|
||||
WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88003e4a2024)
|
||||
80000000000000000000000000000000000000000088ffff0000000000000000
|
||||
i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u
|
||||
^
|
||||
|
||||
These first two lines are the memory dump of the memory object itself, and the
|
||||
shadow bytemap, respectively. The memory object itself is in this case
|
||||
&first->info. Just beware that the start of this dump is NOT the start of the
|
||||
object itself! The position of the caret (^) corresponds with the address of
|
||||
the read (ffff88003e4a2024).
|
||||
|
||||
The shadow bytemap dump legend is as follows:
|
||||
|
||||
i - initialized
|
||||
u - uninitialized
|
||||
a - unallocated (memory has been allocated by the slab layer, but has not
|
||||
yet been handed off to anybody)
|
||||
f - freed (memory has been allocated by the slab layer, but has been freed
|
||||
by the previous owner)
|
||||
|
||||
In order to figure out where (relative to the start of the object) the
|
||||
uninitialized memory was located, we have to look at the disassembly. For
|
||||
that, we'll need the RIP address again:
|
||||
|
||||
RIP: 0010:[<ffffffff8104ede8>] [<ffffffff8104ede8>] __dequeue_signal+0xc8/0x190
|
||||
|
||||
$ objdump -d --no-show-raw-insn vmlinux | grep -C 8 ffffffff8104ede8:
|
||||
ffffffff8104edc8: mov %r8,0x8(%r8)
|
||||
ffffffff8104edcc: test %r10d,%r10d
|
||||
ffffffff8104edcf: js ffffffff8104ee88 <__dequeue_signal+0x168>
|
||||
ffffffff8104edd5: mov %rax,%rdx
|
||||
ffffffff8104edd8: mov $0xc,%ecx
|
||||
ffffffff8104eddd: mov %r13,%rdi
|
||||
ffffffff8104ede0: mov $0x30,%eax
|
||||
ffffffff8104ede5: mov %rdx,%rsi
|
||||
ffffffff8104ede8: rep movsl %ds:(%rsi),%es:(%rdi)
|
||||
ffffffff8104edea: test $0x2,%al
|
||||
ffffffff8104edec: je ffffffff8104edf0 <__dequeue_signal+0xd0>
|
||||
ffffffff8104edee: movsw %ds:(%rsi),%es:(%rdi)
|
||||
ffffffff8104edf0: test $0x1,%al
|
||||
ffffffff8104edf2: je ffffffff8104edf5 <__dequeue_signal+0xd5>
|
||||
ffffffff8104edf4: movsb %ds:(%rsi),%es:(%rdi)
|
||||
ffffffff8104edf5: mov %r8,%rdi
|
||||
ffffffff8104edf8: callq ffffffff8104de60 <__sigqueue_free>
|
||||
|
||||
As expected, it's the "rep movsl" instruction from the memcpy() that causes
|
||||
the warning. We know about REP MOVSL that it uses the register RCX to count
|
||||
the number of remaining iterations. By taking a look at the register dump
|
||||
again (from the kmemcheck report), we can figure out how many bytes were left
|
||||
to copy:
|
||||
|
||||
RAX: 0000000000000030 RBX: ffff88003d4ea968 RCX: 0000000000000009
|
||||
|
||||
By looking at the disassembly, we also see that %ecx is being loaded with the
|
||||
value $0xc just before (ffffffff8104edd8), so we are very lucky. Keep in mind
|
||||
that this is the number of iterations, not bytes. And since this is a "long"
|
||||
operation, we need to multiply by 4 to get the number of bytes. So this means
|
||||
that the uninitialized value was encountered at 4 * (0xc - 0x9) = 12 bytes
|
||||
from the start of the object.
|
||||
|
||||
We can now try to figure out which field of the "struct siginfo" that was not
|
||||
initialized. This is the beginning of the struct:
|
||||
|
||||
40 typedef struct siginfo {
|
||||
41 int si_signo;
|
||||
42 int si_errno;
|
||||
43 int si_code;
|
||||
44
|
||||
45 union {
|
||||
..
|
||||
92 } _sifields;
|
||||
93 } siginfo_t;
|
||||
|
||||
On 64-bit, the int is 4 bytes long, so it must the the union member that has
|
||||
not been initialized. We can verify this using gdb:
|
||||
|
||||
$ gdb vmlinux
|
||||
...
|
||||
(gdb) p &((struct siginfo *) 0)->_sifields
|
||||
$1 = (union {...} *) 0x10
|
||||
|
||||
Actually, it seems that the union member is located at offset 0x10 -- which
|
||||
means that gcc has inserted 4 bytes of padding between the members si_code
|
||||
and _sifields. We can now get a fuller picture of the memory dump:
|
||||
|
||||
_----------------------------=> si_code
|
||||
/ _--------------------=> (padding)
|
||||
| / _------------=> _sifields(._kill._pid)
|
||||
| | / _----=> _sifields(._kill._uid)
|
||||
| | | /
|
||||
-------|-------|-------|-------|
|
||||
80000000000000000000000000000000000000000088ffff0000000000000000
|
||||
i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u
|
||||
|
||||
This allows us to realize another important fact: si_code contains the value
|
||||
0x80. Remember that x86 is little endian, so the first 4 bytes "80000000" are
|
||||
really the number 0x00000080. With a bit of research, we find that this is
|
||||
actually the constant SI_KERNEL defined in include/asm-generic/siginfo.h:
|
||||
|
||||
144 #define SI_KERNEL 0x80 /* sent by the kernel from somewhere */
|
||||
|
||||
This macro is used in exactly one place in the x86 kernel: In send_signal()
|
||||
in kernel/signal.c:
|
||||
|
||||
816 static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
|
||||
817 int group)
|
||||
818 {
|
||||
...
|
||||
828 pending = group ? &t->signal->shared_pending : &t->pending;
|
||||
...
|
||||
851 q = __sigqueue_alloc(t, GFP_ATOMIC, (sig < SIGRTMIN &&
|
||||
852 (is_si_special(info) ||
|
||||
853 info->si_code >= 0)));
|
||||
854 if (q) {
|
||||
855 list_add_tail(&q->list, &pending->list);
|
||||
856 switch ((unsigned long) info) {
|
||||
...
|
||||
865 case (unsigned long) SEND_SIG_PRIV:
|
||||
866 q->info.si_signo = sig;
|
||||
867 q->info.si_errno = 0;
|
||||
868 q->info.si_code = SI_KERNEL;
|
||||
869 q->info.si_pid = 0;
|
||||
870 q->info.si_uid = 0;
|
||||
871 break;
|
||||
...
|
||||
890 }
|
||||
|
||||
Not only does this match with the .si_code member, it also matches the place
|
||||
we found earlier when looking for where siginfo_t objects are enqueued on the
|
||||
"shared_pending" list.
|
||||
|
||||
So to sum up: It seems that it is the padding introduced by the compiler
|
||||
between two struct fields that is uninitialized, and this gets reported when
|
||||
we do a memcpy() on the struct. This means that we have identified a false
|
||||
positive warning.
|
||||
|
||||
Normally, kmemcheck will not report uninitialized accesses in memcpy() calls
|
||||
when both the source and destination addresses are tracked. (Instead, we copy
|
||||
the shadow bytemap as well). In this case, the destination address clearly
|
||||
was not tracked. We can dig a little deeper into the stack trace from above:
|
||||
|
||||
arch/x86/kernel/signal.c:805
|
||||
arch/x86/kernel/signal.c:871
|
||||
arch/x86/kernel/entry_64.S:694
|
||||
|
||||
And we clearly see that the destination siginfo object is located on the
|
||||
stack:
|
||||
|
||||
782 static void do_signal(struct pt_regs *regs)
|
||||
783 {
|
||||
784 struct k_sigaction ka;
|
||||
785 siginfo_t info;
|
||||
...
|
||||
804 signr = get_signal_to_deliver(&info, &ka, regs, NULL);
|
||||
...
|
||||
854 }
|
||||
|
||||
And this &info is what eventually gets passed to copy_siginfo() as the
|
||||
destination argument.
|
||||
|
||||
Now, even though we didn't find an actual error here, the example is still a
|
||||
good one, because it shows how one would go about to find out what the report
|
||||
was all about.
|
||||
|
||||
|
||||
3.4. Annotating false positives
|
||||
===============================
|
||||
|
||||
There are a few different ways to make annotations in the source code that
|
||||
will keep kmemcheck from checking and reporting certain allocations. Here
|
||||
they are:
|
||||
|
||||
o __GFP_NOTRACK_FALSE_POSITIVE
|
||||
|
||||
This flag can be passed to kmalloc() or kmem_cache_alloc() (therefore
|
||||
also to other functions that end up calling one of these) to indicate
|
||||
that the allocation should not be tracked because it would lead to
|
||||
a false positive report. This is a "big hammer" way of silencing
|
||||
kmemcheck; after all, even if the false positive pertains to
|
||||
particular field in a struct, for example, we will now lose the
|
||||
ability to find (real) errors in other parts of the same struct.
|
||||
|
||||
Example:
|
||||
|
||||
/* No warnings will ever trigger on accessing any part of x */
|
||||
x = kmalloc(sizeof *x, GFP_KERNEL | __GFP_NOTRACK_FALSE_POSITIVE);
|
||||
|
||||
o kmemcheck_bitfield_begin(name)/kmemcheck_bitfield_end(name) and
|
||||
kmemcheck_annotate_bitfield(ptr, name)
|
||||
|
||||
The first two of these three macros can be used inside struct
|
||||
definitions to signal, respectively, the beginning and end of a
|
||||
bitfield. Additionally, this will assign the bitfield a name, which
|
||||
is given as an argument to the macros.
|
||||
|
||||
Having used these markers, one can later use
|
||||
kmemcheck_annotate_bitfield() at the point of allocation, to indicate
|
||||
which parts of the allocation is part of a bitfield.
|
||||
|
||||
Example:
|
||||
|
||||
struct foo {
|
||||
int x;
|
||||
|
||||
kmemcheck_bitfield_begin(flags);
|
||||
int flag_a:1;
|
||||
int flag_b:1;
|
||||
kmemcheck_bitfield_end(flags);
|
||||
|
||||
int y;
|
||||
};
|
||||
|
||||
struct foo *x = kmalloc(sizeof *x);
|
||||
|
||||
/* No warnings will trigger on accessing the bitfield of x */
|
||||
kmemcheck_annotate_bitfield(x, flags);
|
||||
|
||||
Note that kmemcheck_annotate_bitfield() can be used even before the
|
||||
return value of kmalloc() is checked -- in other words, passing NULL
|
||||
as the first argument is legal (and will do nothing).
|
||||
|
||||
|
||||
4. Reporting errors
|
||||
===================
|
||||
|
||||
As we have seen, kmemcheck will produce false positive reports. Therefore, it
|
||||
is not very wise to blindly post kmemcheck warnings to mailing lists and
|
||||
maintainers. Instead, I encourage maintainers and developers to find errors
|
||||
in their own code. If you get a warning, you can try to work around it, try
|
||||
to figure out if it's a real error or not, or simply ignore it. Most
|
||||
developers know their own code and will quickly and efficiently determine the
|
||||
root cause of a kmemcheck report. This is therefore also the most efficient
|
||||
way to work with kmemcheck.
|
||||
|
||||
That said, we (the kmemcheck maintainers) will always be on the lookout for
|
||||
false positives that we can annotate and silence. So whatever you find,
|
||||
please drop us a note privately! Kernel configs and steps to reproduce (if
|
||||
available) are of course a great help too.
|
||||
|
||||
Happy hacking!
|
||||
|
||||
|
||||
5. Technical description
|
||||
========================
|
||||
|
||||
kmemcheck works by marking memory pages non-present. This means that whenever
|
||||
somebody attempts to access the page, a page fault is generated. The page
|
||||
fault handler notices that the page was in fact only hidden, and so it calls
|
||||
on the kmemcheck code to make further investigations.
|
||||
|
||||
When the investigations are completed, kmemcheck "shows" the page by marking
|
||||
it present (as it would be under normal circumstances). This way, the
|
||||
interrupted code can continue as usual.
|
||||
|
||||
But after the instruction has been executed, we should hide the page again, so
|
||||
that we can catch the next access too! Now kmemcheck makes use of a debugging
|
||||
feature of the processor, namely single-stepping. When the processor has
|
||||
finished the one instruction that generated the memory access, a debug
|
||||
exception is raised. From here, we simply hide the page again and continue
|
||||
execution, this time with the single-stepping feature turned off.
|
||||
|
||||
kmemcheck requires some assistance from the memory allocator in order to work.
|
||||
The memory allocator needs to
|
||||
|
||||
1. Tell kmemcheck about newly allocated pages and pages that are about to
|
||||
be freed. This allows kmemcheck to set up and tear down the shadow memory
|
||||
for the pages in question. The shadow memory stores the status of each
|
||||
byte in the allocation proper, e.g. whether it is initialized or
|
||||
uninitialized.
|
||||
|
||||
2. Tell kmemcheck which parts of memory should be marked uninitialized.
|
||||
There are actually a few more states, such as "not yet allocated" and
|
||||
"recently freed".
|
||||
|
||||
If a slab cache is set up using the SLAB_NOTRACK flag, it will never return
|
||||
memory that can take page faults because of kmemcheck.
|
||||
|
||||
If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still
|
||||
request memory with the __GFP_NOTRACK or __GFP_NOTRACK_FALSE_POSITIVE flags.
|
||||
This does not prevent the page faults from occurring, however, but marks the
|
||||
object in question as being initialized so that no warnings will ever be
|
||||
produced for this object.
|
||||
|
||||
Currently, the SLAB and SLUB allocators are supported by kmemcheck.
|
142
Documentation/kmemleak.txt
Normal file
142
Documentation/kmemleak.txt
Normal file
@ -0,0 +1,142 @@
|
||||
Kernel Memory Leak Detector
|
||||
===========================
|
||||
|
||||
Introduction
|
||||
------------
|
||||
|
||||
Kmemleak provides a way of detecting possible kernel memory leaks in a
|
||||
way similar to a tracing garbage collector
|
||||
(http://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29#Tracing_garbage_collectors),
|
||||
with the difference that the orphan objects are not freed but only
|
||||
reported via /sys/kernel/debug/kmemleak. A similar method is used by the
|
||||
Valgrind tool (memcheck --leak-check) to detect the memory leaks in
|
||||
user-space applications.
|
||||
|
||||
Usage
|
||||
-----
|
||||
|
||||
CONFIG_DEBUG_KMEMLEAK in "Kernel hacking" has to be enabled. A kernel
|
||||
thread scans the memory every 10 minutes (by default) and prints any new
|
||||
unreferenced objects found. To trigger an intermediate scan and display
|
||||
all the possible memory leaks:
|
||||
|
||||
# mount -t debugfs nodev /sys/kernel/debug/
|
||||
# cat /sys/kernel/debug/kmemleak
|
||||
|
||||
Note that the orphan objects are listed in the order they were allocated
|
||||
and one object at the beginning of the list may cause other subsequent
|
||||
objects to be reported as orphan.
|
||||
|
||||
Memory scanning parameters can be modified at run-time by writing to the
|
||||
/sys/kernel/debug/kmemleak file. The following parameters are supported:
|
||||
|
||||
off - disable kmemleak (irreversible)
|
||||
stack=on - enable the task stacks scanning
|
||||
stack=off - disable the tasks stacks scanning
|
||||
scan=on - start the automatic memory scanning thread
|
||||
scan=off - stop the automatic memory scanning thread
|
||||
scan=<secs> - set the automatic memory scanning period in seconds (0
|
||||
to disable it)
|
||||
|
||||
Kmemleak can also be disabled at boot-time by passing "kmemleak=off" on
|
||||
the kernel command line.
|
||||
|
||||
Basic Algorithm
|
||||
---------------
|
||||
|
||||
The memory allocations via kmalloc, vmalloc, kmem_cache_alloc and
|
||||
friends are traced and the pointers, together with additional
|
||||
information like size and stack trace, are stored in a prio search tree.
|
||||
The corresponding freeing function calls are tracked and the pointers
|
||||
removed from the kmemleak data structures.
|
||||
|
||||
An allocated block of memory is considered orphan if no pointer to its
|
||||
start address or to any location inside the block can be found by
|
||||
scanning the memory (including saved registers). This means that there
|
||||
might be no way for the kernel to pass the address of the allocated
|
||||
block to a freeing function and therefore the block is considered a
|
||||
memory leak.
|
||||
|
||||
The scanning algorithm steps:
|
||||
|
||||
1. mark all objects as white (remaining white objects will later be
|
||||
considered orphan)
|
||||
2. scan the memory starting with the data section and stacks, checking
|
||||
the values against the addresses stored in the prio search tree. If
|
||||
a pointer to a white object is found, the object is added to the
|
||||
gray list
|
||||
3. scan the gray objects for matching addresses (some white objects
|
||||
can become gray and added at the end of the gray list) until the
|
||||
gray set is finished
|
||||
4. the remaining white objects are considered orphan and reported via
|
||||
/sys/kernel/debug/kmemleak
|
||||
|
||||
Some allocated memory blocks have pointers stored in the kernel's
|
||||
internal data structures and they cannot be detected as orphans. To
|
||||
avoid this, kmemleak can also store the number of values pointing to an
|
||||
address inside the block address range that need to be found so that the
|
||||
block is not considered a leak. One example is __vmalloc().
|
||||
|
||||
Kmemleak API
|
||||
------------
|
||||
|
||||
See the include/linux/kmemleak.h header for the functions prototype.
|
||||
|
||||
kmemleak_init - initialize kmemleak
|
||||
kmemleak_alloc - notify of a memory block allocation
|
||||
kmemleak_free - notify of a memory block freeing
|
||||
kmemleak_not_leak - mark an object as not a leak
|
||||
kmemleak_ignore - do not scan or report an object as leak
|
||||
kmemleak_scan_area - add scan areas inside a memory block
|
||||
kmemleak_no_scan - do not scan a memory block
|
||||
kmemleak_erase - erase an old value in a pointer variable
|
||||
kmemleak_alloc_recursive - as kmemleak_alloc but checks the recursiveness
|
||||
kmemleak_free_recursive - as kmemleak_free but checks the recursiveness
|
||||
|
||||
Dealing with false positives/negatives
|
||||
--------------------------------------
|
||||
|
||||
The false negatives are real memory leaks (orphan objects) but not
|
||||
reported by kmemleak because values found during the memory scanning
|
||||
point to such objects. To reduce the number of false negatives, kmemleak
|
||||
provides the kmemleak_ignore, kmemleak_scan_area, kmemleak_no_scan and
|
||||
kmemleak_erase functions (see above). The task stacks also increase the
|
||||
amount of false negatives and their scanning is not enabled by default.
|
||||
|
||||
The false positives are objects wrongly reported as being memory leaks
|
||||
(orphan). For objects known not to be leaks, kmemleak provides the
|
||||
kmemleak_not_leak function. The kmemleak_ignore could also be used if
|
||||
the memory block is known not to contain other pointers and it will no
|
||||
longer be scanned.
|
||||
|
||||
Some of the reported leaks are only transient, especially on SMP
|
||||
systems, because of pointers temporarily stored in CPU registers or
|
||||
stacks. Kmemleak defines MSECS_MIN_AGE (defaulting to 1000) representing
|
||||
the minimum age of an object to be reported as a memory leak.
|
||||
|
||||
Limitations and Drawbacks
|
||||
-------------------------
|
||||
|
||||
The main drawback is the reduced performance of memory allocation and
|
||||
freeing. To avoid other penalties, the memory scanning is only performed
|
||||
when the /sys/kernel/debug/kmemleak file is read. Anyway, this tool is
|
||||
intended for debugging purposes where the performance might not be the
|
||||
most important requirement.
|
||||
|
||||
To keep the algorithm simple, kmemleak scans for values pointing to any
|
||||
address inside a block's address range. This may lead to an increased
|
||||
number of false negatives. However, it is likely that a real memory leak
|
||||
will eventually become visible.
|
||||
|
||||
Another source of false negatives is the data stored in non-pointer
|
||||
values. In a future version, kmemleak could only scan the pointer
|
||||
members in the allocated structures. This feature would solve many of
|
||||
the false negative cases described above.
|
||||
|
||||
The tool can report false positives. These are cases where an allocated
|
||||
block doesn't need to be freed (some cases in the init_call functions),
|
||||
the pointer is calculated by other methods than the usual container_of
|
||||
macro or the pointer is stored in a location not scanned by kmemleak.
|
||||
|
||||
Page allocations and ioremap are not tracked. Only the ARM and x86
|
||||
architectures are currently supported.
|
@ -132,7 +132,7 @@ kobject_name():
|
||||
const char *kobject_name(const struct kobject * kobj);
|
||||
|
||||
There is a helper function to both initialize and add the kobject to the
|
||||
kernel at the same time, called supprisingly enough kobject_init_and_add():
|
||||
kernel at the same time, called surprisingly enough kobject_init_and_add():
|
||||
|
||||
int kobject_init_and_add(struct kobject *kobj, struct kobj_type *ktype,
|
||||
struct kobject *parent, const char *fmt, ...);
|
||||
|
@ -507,9 +507,9 @@ http://www.linuxsymposium.org/2006/linuxsymposium_procv2.pdf (pages 101-115)
|
||||
Appendix A: The kprobes debugfs interface
|
||||
|
||||
With recent kernels (> 2.6.20) the list of registered kprobes is visible
|
||||
under the /debug/kprobes/ directory (assuming debugfs is mounted at /debug).
|
||||
under the /sys/kernel/debug/kprobes/ directory (assuming debugfs is mounted at //sys/kernel/debug).
|
||||
|
||||
/debug/kprobes/list: Lists all registered probes on the system
|
||||
/sys/kernel/debug/kprobes/list: Lists all registered probes on the system
|
||||
|
||||
c015d71a k vfs_read+0x0
|
||||
c011a316 j do_fork+0x0
|
||||
@ -525,7 +525,7 @@ virtual addresses that correspond to modules that've been unloaded),
|
||||
such probes are marked with [GONE]. If the probe is temporarily disabled,
|
||||
such probes are marked with [DISABLED].
|
||||
|
||||
/debug/kprobes/enabled: Turn kprobes ON/OFF forcibly.
|
||||
/sys/kernel/debug/kprobes/enabled: Turn kprobes ON/OFF forcibly.
|
||||
|
||||
Provides a knob to globally and forcibly turn registered kprobes ON or OFF.
|
||||
By default, all kprobes are enabled. By echoing "0" to this file, all
|
||||
|
@ -40,7 +40,7 @@ NOTE: The Acer Aspire One is not supported hardware. It cannot work with
|
||||
acer-wmi until Acer fix their ACPI-WMI implementation on them, so has been
|
||||
blacklisted until that happens.
|
||||
|
||||
Please see the website for the current list of known working hardare:
|
||||
Please see the website for the current list of known working hardware:
|
||||
|
||||
http://code.google.com/p/aceracpi/wiki/SupportedHardware
|
||||
|
||||
|
@ -22,7 +22,7 @@ If your laptop model supports it, you will find sysfs files in the
|
||||
/sys/class/backlight/sony/
|
||||
directory. You will be able to query and set the current screen
|
||||
brightness:
|
||||
brightness get/set screen brightness (an iteger
|
||||
brightness get/set screen brightness (an integer
|
||||
between 0 and 7)
|
||||
actual_brightness reading from this file will query the HW
|
||||
to get real brightness value
|
||||
|
@ -506,7 +506,7 @@ generate input device EV_KEY events.
|
||||
In addition to the EV_KEY events, thinkpad-acpi may also issue EV_SW
|
||||
events for switches:
|
||||
|
||||
SW_RFKILL_ALL T60 and later hardare rfkill rocker switch
|
||||
SW_RFKILL_ALL T60 and later hardware rfkill rocker switch
|
||||
SW_TABLET_MODE Tablet ThinkPads HKEY events 0x5009 and 0x500A
|
||||
|
||||
Non hot-key ACPI HKEY event map:
|
||||
@ -920,7 +920,7 @@ The available commands are:
|
||||
echo '<LED number> off' >/proc/acpi/ibm/led
|
||||
echo '<LED number> blink' >/proc/acpi/ibm/led
|
||||
|
||||
The <LED number> range is 0 to 7. The set of LEDs that can be
|
||||
The <LED number> range is 0 to 15. The set of LEDs that can be
|
||||
controlled varies from model to model. Here is the common ThinkPad
|
||||
mapping:
|
||||
|
||||
@ -932,6 +932,11 @@ mapping:
|
||||
5 - UltraBase battery slot
|
||||
6 - (unknown)
|
||||
7 - standby
|
||||
8 - dock status 1
|
||||
9 - dock status 2
|
||||
10, 11 - (unknown)
|
||||
12 - thinkvantage
|
||||
13, 14, 15 - (unknown)
|
||||
|
||||
All of the above can be turned on and off and can be made to blink.
|
||||
|
||||
@ -940,10 +945,12 @@ sysfs notes:
|
||||
The ThinkPad LED sysfs interface is described in detail by the LED class
|
||||
documentation, in Documentation/leds-class.txt.
|
||||
|
||||
The leds are named (in LED ID order, from 0 to 7):
|
||||
The LEDs are named (in LED ID order, from 0 to 12):
|
||||
"tpacpi::power", "tpacpi:orange:batt", "tpacpi:green:batt",
|
||||
"tpacpi::dock_active", "tpacpi::bay_active", "tpacpi::dock_batt",
|
||||
"tpacpi::unknown_led", "tpacpi::standby".
|
||||
"tpacpi::unknown_led", "tpacpi::standby", "tpacpi::dock_status1",
|
||||
"tpacpi::dock_status2", "tpacpi::unknown_led2", "tpacpi::unknown_led3",
|
||||
"tpacpi::thinkvantage".
|
||||
|
||||
Due to limitations in the sysfs LED class, if the status of the LED
|
||||
indicators cannot be read due to an error, thinkpad-acpi will report it as
|
||||
@ -958,6 +965,12 @@ ThinkPad indicator LED should blink in hardware accelerated mode, use the
|
||||
"timer" trigger, and leave the delay_on and delay_off parameters set to
|
||||
zero (to request hardware acceleration autodetection).
|
||||
|
||||
LEDs that are known not to exist in a given ThinkPad model are not
|
||||
made available through the sysfs interface. If you have a dock and you
|
||||
notice there are LEDs listed for your ThinkPad that do not exist (and
|
||||
are not in the dock), or if you notice that there are missing LEDs,
|
||||
a report to ibm-acpi-devel@lists.sourceforge.net is appreciated.
|
||||
|
||||
|
||||
ACPI sounds -- /proc/acpi/ibm/beep
|
||||
----------------------------------
|
||||
@ -1156,17 +1169,19 @@ may not be distinct. Later Lenovo models that implement the ACPI
|
||||
display backlight brightness control methods have 16 levels, ranging
|
||||
from 0 to 15.
|
||||
|
||||
There are two interfaces to the firmware for direct brightness control,
|
||||
EC and UCMS (or CMOS). To select which one should be used, use the
|
||||
brightness_mode module parameter: brightness_mode=1 selects EC mode,
|
||||
brightness_mode=2 selects UCMS mode, brightness_mode=3 selects EC
|
||||
mode with NVRAM backing (so that brightness changes are remembered
|
||||
across shutdown/reboot).
|
||||
For IBM ThinkPads, there are two interfaces to the firmware for direct
|
||||
brightness control, EC and UCMS (or CMOS). To select which one should be
|
||||
used, use the brightness_mode module parameter: brightness_mode=1 selects
|
||||
EC mode, brightness_mode=2 selects UCMS mode, brightness_mode=3 selects EC
|
||||
mode with NVRAM backing (so that brightness changes are remembered across
|
||||
shutdown/reboot).
|
||||
|
||||
The driver tries to select which interface to use from a table of
|
||||
defaults for each ThinkPad model. If it makes a wrong choice, please
|
||||
report this as a bug, so that we can fix it.
|
||||
|
||||
Lenovo ThinkPads only support brightness_mode=2 (UCMS).
|
||||
|
||||
When display backlight brightness controls are available through the
|
||||
standard ACPI interface, it is best to use it instead of this direct
|
||||
ThinkPad-specific interface. The driver will disable its native
|
||||
@ -1254,7 +1269,7 @@ Fan control and monitoring: fan speed, fan enable/disable
|
||||
|
||||
procfs: /proc/acpi/ibm/fan
|
||||
sysfs device attributes: (hwmon "thinkpad") fan1_input, pwm1,
|
||||
pwm1_enable
|
||||
pwm1_enable, fan2_input
|
||||
sysfs hwmon driver attributes: fan_watchdog
|
||||
|
||||
NOTE NOTE NOTE: fan control operations are disabled by default for
|
||||
@ -1267,6 +1282,9 @@ from the hardware registers of the embedded controller. This is known
|
||||
to work on later R, T, X and Z series ThinkPads but may show a bogus
|
||||
value on other models.
|
||||
|
||||
Some Lenovo ThinkPads support a secondary fan. This fan cannot be
|
||||
controlled separately, it shares the main fan control.
|
||||
|
||||
Fan levels:
|
||||
|
||||
Most ThinkPad fans work in "levels" at the firmware interface. Level 0
|
||||
@ -1397,6 +1415,11 @@ hwmon device attribute fan1_input:
|
||||
which can take up to two minutes. May return rubbish on older
|
||||
ThinkPads.
|
||||
|
||||
hwmon device attribute fan2_input:
|
||||
Fan tachometer reading, in RPM, for the secondary fan.
|
||||
Available only on some ThinkPads. If the secondary fan is
|
||||
not installed, will always read 0.
|
||||
|
||||
hwmon driver attribute fan_watchdog:
|
||||
Fan safety watchdog timer interval, in seconds. Minimum is
|
||||
1 second, maximum is 120 seconds. 0 disables the watchdog.
|
||||
@ -1555,3 +1578,7 @@ Sysfs interface changelog:
|
||||
0x020300: hotkey enable/disable support removed, attributes
|
||||
hotkey_bios_enabled and hotkey_enable deprecated and
|
||||
marked for removal.
|
||||
|
||||
0x020400: Marker for 16 LEDs support. Also, LEDs that are known
|
||||
to not exist in a given model are not registered with
|
||||
the LED sysfs class anymore.
|
||||
|
@ -1,6 +1,5 @@
|
||||
# This creates the demonstration utility "lguest" which runs a Linux guest.
|
||||
CFLAGS:=-Wall -Wmissing-declarations -Wmissing-prototypes -O3 -I../../include -I../../arch/x86/include -U_FORTIFY_SOURCE
|
||||
LDLIBS:=-lz
|
||||
CFLAGS:=-m32 -Wall -Wmissing-declarations -Wmissing-prototypes -O3 -I../../include -I../../arch/x86/include -U_FORTIFY_SOURCE
|
||||
|
||||
all: lguest
|
||||
|
||||
|
File diff suppressed because it is too large
Load Diff
@ -37,7 +37,6 @@ Running Lguest:
|
||||
"Paravirtualized guest support" = Y
|
||||
"Lguest guest support" = Y
|
||||
"High Memory Support" = off/4GB
|
||||
"PAE (Physical Address Extension) Support" = N
|
||||
"Alignment value to which kernel should be aligned" = 0x100000
|
||||
(CONFIG_PARAVIRT=y, CONFIG_LGUEST_GUEST=y, CONFIG_HIGHMEM64G=n and
|
||||
CONFIG_PHYSICAL_ALIGN=0x100000)
|
||||
|
@ -34,7 +34,7 @@ out of order wrt other memory writes by the owner CPU.
|
||||
|
||||
It can be done by slightly modifying the standard atomic operations : only
|
||||
their UP variant must be kept. It typically means removing LOCK prefix (on
|
||||
i386 and x86_64) and any SMP sychronization barrier. If the architecture does
|
||||
i386 and x86_64) and any SMP synchronization barrier. If the architecture does
|
||||
not have a different behavior between SMP and UP, including asm-generic/local.h
|
||||
in your architecture's local.h is sufficient.
|
||||
|
||||
|
Some files were not shown because too many files have changed in this diff Show More
Loading…
x
Reference in New Issue
Block a user