mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
synced 2025-01-04 04:06:26 +00:00
Merge branch 'linus' into tracing/hw-breakpoints
Conflicts: arch/x86/Kconfig arch/x86/kernel/traps.c arch/x86/power/cpu.c arch/x86/power/cpu_32.c kernel/Makefile Semantic conflict: arch/x86/kernel/hw_breakpoint.c Merge reason: Resolve the conflicts, move from put_cpu_no_sched() to put_cpu() in arch/x86/kernel/hw_breakpoint.c. Signed-off-by: Ingo Molnar <mingo@elte.hu>
This commit is contained in:
commit
eadb8a091b
10
.gitignore
vendored
10
.gitignore
vendored
@ -3,7 +3,7 @@
|
||||
# subdirectories here. Add them in the ".gitignore" file
|
||||
# in that subdirectory instead.
|
||||
#
|
||||
# NOTE! Please use 'git-ls-files -i --exclude-standard'
|
||||
# NOTE! Please use 'git ls-files -i --exclude-standard'
|
||||
# command after changing this file, to see if there are
|
||||
# any tracked files which get ignored after the change.
|
||||
#
|
||||
@ -25,6 +25,8 @@
|
||||
*.elf
|
||||
*.bin
|
||||
*.gz
|
||||
*.lzma
|
||||
*.patch
|
||||
|
||||
#
|
||||
# Top-level generic files
|
||||
@ -62,6 +64,12 @@ series
|
||||
cscope.*
|
||||
ncscope.*
|
||||
|
||||
# gnu global files
|
||||
GPATH
|
||||
GRTAGS
|
||||
GSYMS
|
||||
GTAGS
|
||||
|
||||
*.orig
|
||||
*~
|
||||
\#*#
|
||||
|
4
CREDITS
4
CREDITS
@ -1253,6 +1253,10 @@ S: 8124 Constitution Apt. 7
|
||||
S: Sterling Heights, Michigan 48313
|
||||
S: USA
|
||||
|
||||
N: Wolfgang Grandegger
|
||||
E: wg@grandegger.com
|
||||
D: Controller Area Network (device drivers)
|
||||
|
||||
N: William Greathouse
|
||||
E: wgreathouse@smva.com
|
||||
E: wgreathouse@myfavoritei.com
|
||||
|
@ -60,3 +60,62 @@ Description:
|
||||
Indicates whether the block layer should automatically
|
||||
generate checksums for write requests bound for
|
||||
devices that support receiving integrity metadata.
|
||||
|
||||
What: /sys/block/<disk>/alignment_offset
|
||||
Date: April 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Storage devices may report a physical block size that is
|
||||
bigger than the logical block size (for instance a drive
|
||||
with 4KB physical sectors exposing 512-byte logical
|
||||
blocks to the operating system). This parameter
|
||||
indicates how many bytes the beginning of the device is
|
||||
offset from the disk's natural alignment.
|
||||
|
||||
What: /sys/block/<disk>/<partition>/alignment_offset
|
||||
Date: April 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Storage devices may report a physical block size that is
|
||||
bigger than the logical block size (for instance a drive
|
||||
with 4KB physical sectors exposing 512-byte logical
|
||||
blocks to the operating system). This parameter
|
||||
indicates how many bytes the beginning of the partition
|
||||
is offset from the disk's natural alignment.
|
||||
|
||||
What: /sys/block/<disk>/queue/logical_block_size
|
||||
Date: May 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
This is the smallest unit the storage device can
|
||||
address. It is typically 512 bytes.
|
||||
|
||||
What: /sys/block/<disk>/queue/physical_block_size
|
||||
Date: May 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
This is the smallest unit the storage device can write
|
||||
without resorting to read-modify-write operation. It is
|
||||
usually the same as the logical block size but may be
|
||||
bigger. One example is SATA drives with 4KB sectors
|
||||
that expose a 512-byte logical block size to the
|
||||
operating system.
|
||||
|
||||
What: /sys/block/<disk>/queue/minimum_io_size
|
||||
Date: April 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Storage devices may report a preferred minimum I/O size,
|
||||
which is the smallest request the device can perform
|
||||
without incurring a read-modify-write penalty. For disk
|
||||
drives this is often the physical block size. For RAID
|
||||
arrays it is often the stripe chunk size.
|
||||
|
||||
What: /sys/block/<disk>/queue/optimal_io_size
|
||||
Date: April 2009
|
||||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
Description:
|
||||
Storage devices may report an optimal I/O size, which is
|
||||
the device's preferred unit of receiving I/O. This is
|
||||
rarely reported for disk drives. For RAID devices it is
|
||||
usually the stripe width or the internal block size.
|
||||
|
33
Documentation/ABI/testing/sysfs-bus-pci-devices-cciss
Normal file
33
Documentation/ABI/testing/sysfs-bus-pci-devices-cciss
Normal file
@ -0,0 +1,33 @@
|
||||
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/model
|
||||
Date: March 2009
|
||||
Kernel Version: 2.6.30
|
||||
Contact: iss_storagedev@hp.com
|
||||
Description: Displays the SCSI INQUIRY page 0 model for logical drive
|
||||
Y of controller X.
|
||||
|
||||
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/rev
|
||||
Date: March 2009
|
||||
Kernel Version: 2.6.30
|
||||
Contact: iss_storagedev@hp.com
|
||||
Description: Displays the SCSI INQUIRY page 0 revision for logical
|
||||
drive Y of controller X.
|
||||
|
||||
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/unique_id
|
||||
Date: March 2009
|
||||
Kernel Version: 2.6.30
|
||||
Contact: iss_storagedev@hp.com
|
||||
Description: Displays the SCSI INQUIRY page 83 serial number for logical
|
||||
drive Y of controller X.
|
||||
|
||||
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/vendor
|
||||
Date: March 2009
|
||||
Kernel Version: 2.6.30
|
||||
Contact: iss_storagedev@hp.com
|
||||
Description: Displays the SCSI INQUIRY page 0 vendor for logical drive
|
||||
Y of controller X.
|
||||
|
||||
Where: /sys/bus/pci/devices/<dev>/ccissX/cXdY/block:cciss!cXdY
|
||||
Date: March 2009
|
||||
Kernel Version: 2.6.30
|
||||
Contact: iss_storagedev@hp.com
|
||||
Description: A symbolic link to /sys/block/cciss!cXdY
|
18
Documentation/ABI/testing/sysfs-devices-cache_disable
Normal file
18
Documentation/ABI/testing/sysfs-devices-cache_disable
Normal file
@ -0,0 +1,18 @@
|
||||
What: /sys/devices/system/cpu/cpu*/cache/index*/cache_disable_X
|
||||
Date: August 2008
|
||||
KernelVersion: 2.6.27
|
||||
Contact: mark.langsdorf@amd.com
|
||||
Description: These files exist in every cpu's cache index directories.
|
||||
There are currently 2 cache_disable_# files in each
|
||||
directory. Reading from these files on a supported
|
||||
processor will return that cache disable index value
|
||||
for that processor and node. Writing to one of these
|
||||
files will cause the specificed cache index to be disabled.
|
||||
|
||||
Currently, only AMD Family 10h Processors support cache index
|
||||
disable, and only for their L3 caches. See the BIOS and
|
||||
Kernel Developer's Guide at
|
||||
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/31116-Public-GH-BKDG_3.20_2-4-09.pdf
|
||||
for formatting information and other details on the
|
||||
cache index disable.
|
||||
Users: joachim.deguara@amd.com
|
479
Documentation/ABI/testing/sysfs-kernel-slab
Normal file
479
Documentation/ABI/testing/sysfs-kernel-slab
Normal file
@ -0,0 +1,479 @@
|
||||
What: /sys/kernel/slab
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The /sys/kernel/slab directory contains a snapshot of the
|
||||
internal state of the SLUB allocator for each cache. Certain
|
||||
files may be modified to change the behavior of the cache (and
|
||||
any cache it aliases, if any).
|
||||
Users: kernel memory tuning tools
|
||||
|
||||
What: /sys/kernel/slab/cache/aliases
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The aliases file is read-only and specifies how many caches
|
||||
have merged into this cache.
|
||||
|
||||
What: /sys/kernel/slab/cache/align
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The align file is read-only and specifies the cache's object
|
||||
alignment in bytes.
|
||||
|
||||
What: /sys/kernel/slab/cache/alloc_calls
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The alloc_calls file is read-only and lists the kernel code
|
||||
locations from which allocations for this cache were performed.
|
||||
The alloc_calls file only contains information if debugging is
|
||||
enabled for that cache (see Documentation/vm/slub.txt).
|
||||
|
||||
What: /sys/kernel/slab/cache/alloc_fastpath
|
||||
Date: February 2008
|
||||
KernelVersion: 2.6.25
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The alloc_fastpath file is read-only and specifies how many
|
||||
objects have been allocated using the fast path.
|
||||
Available when CONFIG_SLUB_STATS is enabled.
|
||||
|
||||
What: /sys/kernel/slab/cache/alloc_from_partial
|
||||
Date: February 2008
|
||||
KernelVersion: 2.6.25
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The alloc_from_partial file is read-only and specifies how
|
||||
many times a cpu slab has been full and it has been refilled
|
||||
by using a slab from the list of partially used slabs.
|
||||
Available when CONFIG_SLUB_STATS is enabled.
|
||||
|
||||
What: /sys/kernel/slab/cache/alloc_refill
|
||||
Date: February 2008
|
||||
KernelVersion: 2.6.25
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The alloc_refill file is read-only and specifies how many
|
||||
times the per-cpu freelist was empty but there were objects
|
||||
available as the result of remote cpu frees.
|
||||
Available when CONFIG_SLUB_STATS is enabled.
|
||||
|
||||
What: /sys/kernel/slab/cache/alloc_slab
|
||||
Date: February 2008
|
||||
KernelVersion: 2.6.25
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The alloc_slab file is read-only and specifies how many times
|
||||
a new slab had to be allocated from the page allocator.
|
||||
Available when CONFIG_SLUB_STATS is enabled.
|
||||
|
||||
What: /sys/kernel/slab/cache/alloc_slowpath
|
||||
Date: February 2008
|
||||
KernelVersion: 2.6.25
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The alloc_slowpath file is read-only and specifies how many
|
||||
objects have been allocated using the slow path because of a
|
||||
refill or allocation from a partial or new slab.
|
||||
Available when CONFIG_SLUB_STATS is enabled.
|
||||
|
||||
What: /sys/kernel/slab/cache/cache_dma
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The cache_dma file is read-only and specifies whether objects
|
||||
are from ZONE_DMA.
|
||||
Available when CONFIG_ZONE_DMA is enabled.
|
||||
|
||||
What: /sys/kernel/slab/cache/cpu_slabs
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The cpu_slabs file is read-only and displays how many cpu slabs
|
||||
are active and their NUMA locality.
|
||||
|
||||
What: /sys/kernel/slab/cache/cpuslab_flush
|
||||
Date: April 2009
|
||||
KernelVersion: 2.6.31
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The file cpuslab_flush is read-only and specifies how many
|
||||
times a cache's cpu slabs have been flushed as the result of
|
||||
destroying or shrinking a cache, a cpu going offline, or as
|
||||
the result of forcing an allocation from a certain node.
|
||||
Available when CONFIG_SLUB_STATS is enabled.
|
||||
|
||||
What: /sys/kernel/slab/cache/ctor
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The ctor file is read-only and specifies the cache's object
|
||||
constructor function, which is invoked for each object when a
|
||||
new slab is allocated.
|
||||
|
||||
What: /sys/kernel/slab/cache/deactivate_empty
|
||||
Date: February 2008
|
||||
KernelVersion: 2.6.25
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The file deactivate_empty is read-only and specifies how many
|
||||
times an empty cpu slab was deactivated.
|
||||
Available when CONFIG_SLUB_STATS is enabled.
|
||||
|
||||
What: /sys/kernel/slab/cache/deactivate_full
|
||||
Date: February 2008
|
||||
KernelVersion: 2.6.25
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The file deactivate_full is read-only and specifies how many
|
||||
times a full cpu slab was deactivated.
|
||||
Available when CONFIG_SLUB_STATS is enabled.
|
||||
|
||||
What: /sys/kernel/slab/cache/deactivate_remote_frees
|
||||
Date: February 2008
|
||||
KernelVersion: 2.6.25
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The file deactivate_remote_frees is read-only and specifies how
|
||||
many times a cpu slab has been deactivated and contained free
|
||||
objects that were freed remotely.
|
||||
Available when CONFIG_SLUB_STATS is enabled.
|
||||
|
||||
What: /sys/kernel/slab/cache/deactivate_to_head
|
||||
Date: February 2008
|
||||
KernelVersion: 2.6.25
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The file deactivate_to_head is read-only and specifies how
|
||||
many times a partial cpu slab was deactivated and added to the
|
||||
head of its node's partial list.
|
||||
Available when CONFIG_SLUB_STATS is enabled.
|
||||
|
||||
What: /sys/kernel/slab/cache/deactivate_to_tail
|
||||
Date: February 2008
|
||||
KernelVersion: 2.6.25
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The file deactivate_to_tail is read-only and specifies how
|
||||
many times a partial cpu slab was deactivated and added to the
|
||||
tail of its node's partial list.
|
||||
Available when CONFIG_SLUB_STATS is enabled.
|
||||
|
||||
What: /sys/kernel/slab/cache/destroy_by_rcu
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The destroy_by_rcu file is read-only and specifies whether
|
||||
slabs (not objects) are freed by rcu.
|
||||
|
||||
What: /sys/kernel/slab/cache/free_add_partial
|
||||
Date: February 2008
|
||||
KernelVersion: 2.6.25
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The file free_add_partial is read-only and specifies how many
|
||||
times an object has been freed in a full slab so that it had to
|
||||
added to its node's partial list.
|
||||
Available when CONFIG_SLUB_STATS is enabled.
|
||||
|
||||
What: /sys/kernel/slab/cache/free_calls
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The free_calls file is read-only and lists the locations of
|
||||
object frees if slab debugging is enabled (see
|
||||
Documentation/vm/slub.txt).
|
||||
|
||||
What: /sys/kernel/slab/cache/free_fastpath
|
||||
Date: February 2008
|
||||
KernelVersion: 2.6.25
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The free_fastpath file is read-only and specifies how many
|
||||
objects have been freed using the fast path because it was an
|
||||
object from the cpu slab.
|
||||
Available when CONFIG_SLUB_STATS is enabled.
|
||||
|
||||
What: /sys/kernel/slab/cache/free_frozen
|
||||
Date: February 2008
|
||||
KernelVersion: 2.6.25
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The free_frozen file is read-only and specifies how many
|
||||
objects have been freed to a frozen slab (i.e. a remote cpu
|
||||
slab).
|
||||
Available when CONFIG_SLUB_STATS is enabled.
|
||||
|
||||
What: /sys/kernel/slab/cache/free_remove_partial
|
||||
Date: February 2008
|
||||
KernelVersion: 2.6.25
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The file free_remove_partial is read-only and specifies how
|
||||
many times an object has been freed to a now-empty slab so
|
||||
that it had to be removed from its node's partial list.
|
||||
Available when CONFIG_SLUB_STATS is enabled.
|
||||
|
||||
What: /sys/kernel/slab/cache/free_slab
|
||||
Date: February 2008
|
||||
KernelVersion: 2.6.25
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The free_slab file is read-only and specifies how many times an
|
||||
empty slab has been freed back to the page allocator.
|
||||
Available when CONFIG_SLUB_STATS is enabled.
|
||||
|
||||
What: /sys/kernel/slab/cache/free_slowpath
|
||||
Date: February 2008
|
||||
KernelVersion: 2.6.25
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The free_slowpath file is read-only and specifies how many
|
||||
objects have been freed using the slow path (i.e. to a full or
|
||||
partial slab).
|
||||
Available when CONFIG_SLUB_STATS is enabled.
|
||||
|
||||
What: /sys/kernel/slab/cache/hwcache_align
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The hwcache_align file is read-only and specifies whether
|
||||
objects are aligned on cachelines.
|
||||
|
||||
What: /sys/kernel/slab/cache/min_partial
|
||||
Date: February 2009
|
||||
KernelVersion: 2.6.30
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
David Rientjes <rientjes@google.com>
|
||||
Description:
|
||||
The min_partial file specifies how many empty slabs shall
|
||||
remain on a node's partial list to avoid the overhead of
|
||||
allocating new slabs. Such slabs may be reclaimed by utilizing
|
||||
the shrink file.
|
||||
|
||||
What: /sys/kernel/slab/cache/object_size
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The object_size file is read-only and specifies the cache's
|
||||
object size.
|
||||
|
||||
What: /sys/kernel/slab/cache/objects
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The objects file is read-only and displays how many objects are
|
||||
active and from which nodes they are from.
|
||||
|
||||
What: /sys/kernel/slab/cache/objects_partial
|
||||
Date: April 2008
|
||||
KernelVersion: 2.6.26
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The objects_partial file is read-only and displays how many
|
||||
objects are on partial slabs and from which nodes they are
|
||||
from.
|
||||
|
||||
What: /sys/kernel/slab/cache/objs_per_slab
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The file objs_per_slab is read-only and specifies how many
|
||||
objects may be allocated from a single slab of the order
|
||||
specified in /sys/kernel/slab/cache/order.
|
||||
|
||||
What: /sys/kernel/slab/cache/order
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The order file specifies the page order at which new slabs are
|
||||
allocated. It is writable and can be changed to increase the
|
||||
number of objects per slab. If a slab cannot be allocated
|
||||
because of fragmentation, SLUB will retry with the minimum order
|
||||
possible depending on its characteristics.
|
||||
|
||||
What: /sys/kernel/slab/cache/order_fallback
|
||||
Date: April 2008
|
||||
KernelVersion: 2.6.26
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The file order_fallback is read-only and specifies how many
|
||||
times an allocation of a new slab has not been possible at the
|
||||
cache's order and instead fallen back to its minimum possible
|
||||
order.
|
||||
Available when CONFIG_SLUB_STATS is enabled.
|
||||
|
||||
What: /sys/kernel/slab/cache/partial
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The partial file is read-only and displays how long many
|
||||
partial slabs there are and how long each node's list is.
|
||||
|
||||
What: /sys/kernel/slab/cache/poison
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The poison file specifies whether objects should be poisoned
|
||||
when a new slab is allocated.
|
||||
|
||||
What: /sys/kernel/slab/cache/reclaim_account
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The reclaim_account file specifies whether the cache's objects
|
||||
are reclaimable (and grouped by their mobility).
|
||||
|
||||
What: /sys/kernel/slab/cache/red_zone
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The red_zone file specifies whether the cache's objects are red
|
||||
zoned.
|
||||
|
||||
What: /sys/kernel/slab/cache/remote_node_defrag_ratio
|
||||
Date: January 2008
|
||||
KernelVersion: 2.6.25
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The file remote_node_defrag_ratio specifies the percentage of
|
||||
times SLUB will attempt to refill the cpu slab with a partial
|
||||
slab from a remote node as opposed to allocating a new slab on
|
||||
the local node. This reduces the amount of wasted memory over
|
||||
the entire system but can be expensive.
|
||||
Available when CONFIG_NUMA is enabled.
|
||||
|
||||
What: /sys/kernel/slab/cache/sanity_checks
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The sanity_checks file specifies whether expensive checks
|
||||
should be performed on free and, at minimum, enables double free
|
||||
checks. Caches that enable sanity_checks cannot be merged with
|
||||
caches that do not.
|
||||
|
||||
What: /sys/kernel/slab/cache/shrink
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The shrink file is written when memory should be reclaimed from
|
||||
a cache. Empty partial slabs are freed and the partial list is
|
||||
sorted so the slabs with the fewest available objects are used
|
||||
first.
|
||||
|
||||
What: /sys/kernel/slab/cache/slab_size
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The slab_size file is read-only and specifies the object size
|
||||
with metadata (debugging information and alignment) in bytes.
|
||||
|
||||
What: /sys/kernel/slab/cache/slabs
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The slabs file is read-only and displays how long many slabs
|
||||
there are (both cpu and partial) and from which nodes they are
|
||||
from.
|
||||
|
||||
What: /sys/kernel/slab/cache/store_user
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The store_user file specifies whether the location of
|
||||
allocation or free should be tracked for a cache.
|
||||
|
||||
What: /sys/kernel/slab/cache/total_objects
|
||||
Date: April 2008
|
||||
KernelVersion: 2.6.26
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The total_objects file is read-only and displays how many total
|
||||
objects a cache has and from which nodes they are from.
|
||||
|
||||
What: /sys/kernel/slab/cache/trace
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
The trace file specifies whether object allocations and frees
|
||||
should be traced.
|
||||
|
||||
What: /sys/kernel/slab/cache/validate
|
||||
Date: May 2007
|
||||
KernelVersion: 2.6.22
|
||||
Contact: Pekka Enberg <penberg@cs.helsinki.fi>,
|
||||
Christoph Lameter <cl@linux-foundation.org>
|
||||
Description:
|
||||
Writing to the validate file causes SLUB to traverse all of its
|
||||
cache's objects and check the validity of metadata.
|
@ -29,7 +29,7 @@ hardware, for example, you probably needn't concern yourself with
|
||||
isdn4k-utils.
|
||||
|
||||
o Gnu C 3.2 # gcc --version
|
||||
o Gnu make 3.79.1 # make --version
|
||||
o Gnu make 3.80 # make --version
|
||||
o binutils 2.12 # ld -v
|
||||
o util-linux 2.10o # fdformat --version
|
||||
o module-init-tools 0.9.10 # depmod -V
|
||||
@ -48,6 +48,7 @@ o procps 3.2.0 # ps --version
|
||||
o oprofile 0.9 # oprofiled --version
|
||||
o udev 081 # udevinfo -V
|
||||
o grub 0.93 # grub --version
|
||||
o mcelog 0.6
|
||||
|
||||
Kernel compilation
|
||||
==================
|
||||
@ -61,7 +62,7 @@ computer.
|
||||
Make
|
||||
----
|
||||
|
||||
You will need Gnu make 3.79.1 or later to build the kernel.
|
||||
You will need Gnu make 3.80 or later to build the kernel.
|
||||
|
||||
Binutils
|
||||
--------
|
||||
@ -276,6 +277,16 @@ before running exportfs or mountd. It is recommended that all NFS
|
||||
services be protected from the internet-at-large by a firewall where
|
||||
that is possible.
|
||||
|
||||
mcelog
|
||||
------
|
||||
|
||||
In Linux 2.6.31+ the i386 kernel needs to run the mcelog utility
|
||||
as a regular cronjob similar to the x86-64 kernel to process and log
|
||||
machine check events when CONFIG_X86_NEW_MCE is enabled. Machine check
|
||||
events are errors reported by the CPU. Processing them is strongly encouraged.
|
||||
All x86-64 kernels since 2.6.4 require the mcelog utility to
|
||||
process machine checks.
|
||||
|
||||
Getting updated software
|
||||
========================
|
||||
|
||||
@ -365,6 +376,10 @@ FUSE
|
||||
----
|
||||
o <http://sourceforge.net/projects/fuse>
|
||||
|
||||
mcelog
|
||||
------
|
||||
o <ftp://ftp.kernel.org/pub/linux/utils/cpu/mce/mcelog/>
|
||||
|
||||
Networking
|
||||
**********
|
||||
|
||||
|
@ -698,8 +698,8 @@ very often is not. Abundant use of the inline keyword leads to a much bigger
|
||||
kernel, which in turn slows the system as a whole down, due to a bigger
|
||||
icache footprint for the CPU and simply because there is less memory
|
||||
available for the pagecache. Just think about it; a pagecache miss causes a
|
||||
disk seek, which easily takes 5 miliseconds. There are a LOT of cpu cycles
|
||||
that can go into these 5 miliseconds.
|
||||
disk seek, which easily takes 5 milliseconds. There are a LOT of cpu cycles
|
||||
that can go into these 5 milliseconds.
|
||||
|
||||
A reasonable rule of thumb is to not put inline at functions that have more
|
||||
than 3 lines of code in them. An exception to this rule are the cases where
|
||||
|
@ -676,8 +676,8 @@ this directory the following files can currently be found:
|
||||
dma-api/all_errors This file contains a numeric value. If this
|
||||
value is not equal to zero the debugging code
|
||||
will print a warning for every error it finds
|
||||
into the kernel log. Be carefull with this
|
||||
option. It can easily flood your logs.
|
||||
into the kernel log. Be careful with this
|
||||
option, as it can easily flood your logs.
|
||||
|
||||
dma-api/disabled This read-only file contains the character 'Y'
|
||||
if the debugging code is disabled. This can
|
||||
@ -704,12 +704,24 @@ this directory the following files can currently be found:
|
||||
The current number of free dma_debug_entries
|
||||
in the allocator.
|
||||
|
||||
dma-api/driver-filter
|
||||
You can write a name of a driver into this file
|
||||
to limit the debug output to requests from that
|
||||
particular driver. Write an empty string to
|
||||
that file to disable the filter and see
|
||||
all errors again.
|
||||
|
||||
If you have this code compiled into your kernel it will be enabled by default.
|
||||
If you want to boot without the bookkeeping anyway you can provide
|
||||
'dma_debug=off' as a boot parameter. This will disable DMA-API debugging.
|
||||
Notice that you can not enable it again at runtime. You have to reboot to do
|
||||
so.
|
||||
|
||||
If you want to see debug messages only for a special device driver you can
|
||||
specify the dma_debug_driver=<drivername> parameter. This will enable the
|
||||
driver filter at boot time. The debug code will only print errors for that
|
||||
driver afterwards. This filter can be disabled or changed later using debugfs.
|
||||
|
||||
When the code disables itself at runtime this is most likely because it ran
|
||||
out of dma_debug_entries. These entries are preallocated at boot. The number
|
||||
of preallocated entries is defined per architecture. If it is too low for you
|
||||
|
@ -106,7 +106,7 @@
|
||||
number of errors are printk'ed including a full stack trace.
|
||||
</para>
|
||||
<para>
|
||||
The statistics are available via debugfs/debug_objects/stats.
|
||||
The statistics are available via /sys/kernel/debug/debug_objects/stats.
|
||||
They provide information about the number of warnings and the
|
||||
number of successful fixups along with information about the
|
||||
usage of the internal tracking objects and the state of the
|
||||
|
@ -145,7 +145,6 @@ usage should require reading the full document.
|
||||
interface in STA mode at first!
|
||||
</para>
|
||||
!Finclude/net/mac80211.h ieee80211_if_init_conf
|
||||
!Finclude/net/mac80211.h ieee80211_if_conf
|
||||
</chapter>
|
||||
|
||||
<chapter id="rx-tx">
|
||||
|
@ -118,7 +118,7 @@ to another chain) checking the final 'nulls' value if
|
||||
the lookup met the end of chain. If final 'nulls' value
|
||||
is not the slot number, then we must restart the lookup at
|
||||
the beginning. If the object was moved to the same chain,
|
||||
then the reader doesnt care : It might eventually
|
||||
then the reader doesn't care : It might eventually
|
||||
scan the list again without harm.
|
||||
|
||||
|
||||
|
@ -192,23 +192,24 @@ rcu/rcuhier (which displays the struct rcu_node hierarchy).
|
||||
The output of "cat rcu/rcudata" looks as follows:
|
||||
|
||||
rcu:
|
||||
0 c=4011 g=4012 pq=1 pqc=4011 qp=0 rpfq=1 rp=3c2a dt=23301/73 dn=2 df=1882 of=0 ri=2126 ql=2 b=10
|
||||
1 c=4011 g=4012 pq=1 pqc=4011 qp=0 rpfq=3 rp=39a6 dt=78073/1 dn=2 df=1402 of=0 ri=1875 ql=46 b=10
|
||||
2 c=4010 g=4010 pq=1 pqc=4010 qp=0 rpfq=-5 rp=1d12 dt=16646/0 dn=2 df=3140 of=0 ri=2080 ql=0 b=10
|
||||
3 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=2b50 dt=21159/1 dn=2 df=2230 of=0 ri=1923 ql=72 b=10
|
||||
4 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=1644 dt=5783/1 dn=2 df=3348 of=0 ri=2805 ql=7 b=10
|
||||
5 c=4012 g=4013 pq=0 pqc=4011 qp=1 rpfq=3 rp=1aac dt=5879/1 dn=2 df=3140 of=0 ri=2066 ql=10 b=10
|
||||
6 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=ed8 dt=5847/1 dn=2 df=3797 of=0 ri=1266 ql=10 b=10
|
||||
7 c=4012 g=4013 pq=1 pqc=4012 qp=1 rpfq=3 rp=1fa2 dt=6199/1 dn=2 df=2795 of=0 ri=2162 ql=28 b=10
|
||||
rcu:
|
||||
0 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=10951/1 dn=0 df=1101 of=0 ri=36 ql=0 b=10
|
||||
1 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=16117/1 dn=0 df=1015 of=0 ri=0 ql=0 b=10
|
||||
2 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1445/1 dn=0 df=1839 of=0 ri=0 ql=0 b=10
|
||||
3 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=6681/1 dn=0 df=1545 of=0 ri=0 ql=0 b=10
|
||||
4 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=1003/1 dn=0 df=1992 of=0 ri=0 ql=0 b=10
|
||||
5 c=17829 g=17830 pq=1 pqc=17829 qp=1 dt=3887/1 dn=0 df=3331 of=0 ri=4 ql=2 b=10
|
||||
6 c=17829 g=17829 pq=1 pqc=17829 qp=0 dt=859/1 dn=0 df=3224 of=0 ri=0 ql=0 b=10
|
||||
7 c=17829 g=17830 pq=0 pqc=17829 qp=1 dt=3761/1 dn=0 df=1818 of=0 ri=0 ql=2 b=10
|
||||
rcu_bh:
|
||||
0 c=-268 g=-268 pq=1 pqc=-268 qp=0 rpfq=-145 rp=21d6 dt=23301/73 dn=2 df=0 of=0 ri=0 ql=0 b=10
|
||||
1 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-170 rp=20ce dt=78073/1 dn=2 df=26 of=0 ri=5 ql=0 b=10
|
||||
2 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-83 rp=fbd dt=16646/0 dn=2 df=28 of=0 ri=4 ql=0 b=10
|
||||
3 c=-268 g=-268 pq=1 pqc=-268 qp=0 rpfq=-105 rp=178c dt=21159/1 dn=2 df=28 of=0 ri=2 ql=0 b=10
|
||||
4 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-30 rp=b54 dt=5783/1 dn=2 df=32 of=0 ri=0 ql=0 b=10
|
||||
5 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-29 rp=df5 dt=5879/1 dn=2 df=30 of=0 ri=3 ql=0 b=10
|
||||
6 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-28 rp=788 dt=5847/1 dn=2 df=32 of=0 ri=0 ql=0 b=10
|
||||
7 c=-268 g=-268 pq=1 pqc=-268 qp=1 rpfq=-53 rp=1098 dt=6199/1 dn=2 df=30 of=0 ri=3 ql=0 b=10
|
||||
0 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=10951/1 dn=0 df=0 of=0 ri=0 ql=0 b=10
|
||||
1 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=16117/1 dn=0 df=13 of=0 ri=0 ql=0 b=10
|
||||
2 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=1445/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
|
||||
3 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=6681/1 dn=0 df=9 of=0 ri=0 ql=0 b=10
|
||||
4 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=1003/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
|
||||
5 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3887/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
|
||||
6 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=859/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
|
||||
7 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3761/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
|
||||
|
||||
The first section lists the rcu_data structures for rcu, the second for
|
||||
rcu_bh. Each section has one line per CPU, or eight for this 8-CPU system.
|
||||
@ -253,12 +254,6 @@ o "pqc" indicates which grace period the last-observed quiescent
|
||||
o "qp" indicates that RCU still expects a quiescent state from
|
||||
this CPU.
|
||||
|
||||
o "rpfq" is the number of rcu_pending() calls on this CPU required
|
||||
to induce this CPU to invoke force_quiescent_state().
|
||||
|
||||
o "rp" is low-order four hex digits of the count of how many times
|
||||
rcu_pending() has been invoked on this CPU.
|
||||
|
||||
o "dt" is the current value of the dyntick counter that is incremented
|
||||
when entering or leaving dynticks idle state, either by the
|
||||
scheduler or by irq. The number after the "/" is the interrupt
|
||||
@ -305,6 +300,9 @@ o "b" is the batch limit for this CPU. If more than this number
|
||||
of RCU callbacks is ready to invoke, then the remainder will
|
||||
be deferred.
|
||||
|
||||
There is also an rcu/rcudata.csv file with the same information in
|
||||
comma-separated-variable spreadsheet format.
|
||||
|
||||
|
||||
The output of "cat rcu/rcugp" looks as follows:
|
||||
|
||||
@ -411,3 +409,63 @@ o Each element of the form "1/1 0:127 ^0" represents one struct
|
||||
For example, the first entry at the lowest level shows
|
||||
"^0", indicating that it corresponds to bit zero in
|
||||
the first entry at the middle level.
|
||||
|
||||
|
||||
The output of "cat rcu/rcu_pending" looks as follows:
|
||||
|
||||
rcu:
|
||||
0 np=255892 qsp=53936 cbr=0 cng=14417 gpc=10033 gps=24320 nf=6445 nn=146741
|
||||
1 np=261224 qsp=54638 cbr=0 cng=25723 gpc=16310 gps=2849 nf=5912 nn=155792
|
||||
2 np=237496 qsp=49664 cbr=0 cng=2762 gpc=45478 gps=1762 nf=1201 nn=136629
|
||||
3 np=236249 qsp=48766 cbr=0 cng=286 gpc=48049 gps=1218 nf=207 nn=137723
|
||||
4 np=221310 qsp=46850 cbr=0 cng=26 gpc=43161 gps=4634 nf=3529 nn=123110
|
||||
5 np=237332 qsp=48449 cbr=0 cng=54 gpc=47920 gps=3252 nf=201 nn=137456
|
||||
6 np=219995 qsp=46718 cbr=0 cng=50 gpc=42098 gps=6093 nf=4202 nn=120834
|
||||
7 np=249893 qsp=49390 cbr=0 cng=72 gpc=38400 gps=17102 nf=41 nn=144888
|
||||
rcu_bh:
|
||||
0 np=146741 qsp=1419 cbr=0 cng=6 gpc=0 gps=0 nf=2 nn=145314
|
||||
1 np=155792 qsp=12597 cbr=0 cng=0 gpc=4 gps=8 nf=3 nn=143180
|
||||
2 np=136629 qsp=18680 cbr=0 cng=0 gpc=7 gps=6 nf=0 nn=117936
|
||||
3 np=137723 qsp=2843 cbr=0 cng=0 gpc=10 gps=7 nf=0 nn=134863
|
||||
4 np=123110 qsp=12433 cbr=0 cng=0 gpc=4 gps=2 nf=0 nn=110671
|
||||
5 np=137456 qsp=4210 cbr=0 cng=0 gpc=6 gps=5 nf=0 nn=133235
|
||||
6 np=120834 qsp=9902 cbr=0 cng=0 gpc=6 gps=3 nf=2 nn=110921
|
||||
7 np=144888 qsp=26336 cbr=0 cng=0 gpc=8 gps=2 nf=0 nn=118542
|
||||
|
||||
As always, this is once again split into "rcu" and "rcu_bh" portions.
|
||||
The fields are as follows:
|
||||
|
||||
o "np" is the number of times that __rcu_pending() has been invoked
|
||||
for the corresponding flavor of RCU.
|
||||
|
||||
o "qsp" is the number of times that the RCU was waiting for a
|
||||
quiescent state from this CPU.
|
||||
|
||||
o "cbr" is the number of times that this CPU had RCU callbacks
|
||||
that had passed through a grace period, and were thus ready
|
||||
to be invoked.
|
||||
|
||||
o "cng" is the number of times that this CPU needed another
|
||||
grace period while RCU was idle.
|
||||
|
||||
o "gpc" is the number of times that an old grace period had
|
||||
completed, but this CPU was not yet aware of it.
|
||||
|
||||
o "gps" is the number of times that a new grace period had started,
|
||||
but this CPU was not yet aware of it.
|
||||
|
||||
o "nf" is the number of times that this CPU suspected that the
|
||||
current grace period had run for too long, and thus needed to
|
||||
be forced.
|
||||
|
||||
Please note that "forcing" consists of sending resched IPIs
|
||||
to holdout CPUs. If that CPU really still is in an old RCU
|
||||
read-side critical section, then we really do have to wait for it.
|
||||
The assumption behing "forcing" is that the CPU is not still in
|
||||
an old RCU read-side critical section, but has not yet responded
|
||||
for some other reason.
|
||||
|
||||
o "nn" is the number of times that this CPU needed nothing. Alert
|
||||
readers will note that the rcu "nn" number for a given CPU very
|
||||
closely matches the rcu_bh "np" number for that same CPU. This
|
||||
is due to short-circuit evaluation in rcu_pending().
|
||||
|
@ -5,7 +5,7 @@ Copyright 2006, 2007 Simtec Electronics
|
||||
|
||||
The Silicon Motion SM501 multimedia companion chip is a multifunction device
|
||||
which may provide numerous interfaces including USB host controller USB gadget,
|
||||
Asyncronous Serial ports, Audio functions and a dual display video interface.
|
||||
asynchronous serial ports, audio functions, and a dual display video interface.
|
||||
The device may be connected by PCI or local bus with varying functions enabled.
|
||||
|
||||
Core
|
||||
|
@ -184,8 +184,9 @@ length. Single character labels using special characters, that being anything
|
||||
other than a letter or digit, are reserved for use by the Smack development
|
||||
team. Smack labels are unstructured, case sensitive, and the only operation
|
||||
ever performed on them is comparison for equality. Smack labels cannot
|
||||
contain unprintable characters or the "/" (slash) character. Smack labels
|
||||
cannot begin with a '-', which is reserved for special options.
|
||||
contain unprintable characters, the "/" (slash), the "\" (backslash), the "'"
|
||||
(quote) and '"' (double-quote) characters.
|
||||
Smack labels cannot begin with a '-', which is reserved for special options.
|
||||
|
||||
There are some predefined labels:
|
||||
|
||||
@ -523,3 +524,18 @@ Smack supports some mount options:
|
||||
|
||||
These mount options apply to all file system types.
|
||||
|
||||
Smack auditing
|
||||
|
||||
If you want Smack auditing of security events, you need to set CONFIG_AUDIT
|
||||
in your kernel configuration.
|
||||
By default, all denied events will be audited. You can change this behavior by
|
||||
writing a single character to the /smack/logging file :
|
||||
0 : no logging
|
||||
1 : log denied (default)
|
||||
2 : log accepted
|
||||
3 : log denied & accepted
|
||||
|
||||
Events are logged as 'key=value' pairs, for each event you at least will get
|
||||
the subjet, the object, the rights requested, the action, the kernel function
|
||||
that triggered the event, plus other pairs depending on the type of event
|
||||
audited.
|
||||
|
@ -91,6 +91,10 @@ Be as specific as possible. The WORST descriptions possible include
|
||||
things like "update driver X", "bug fix for driver X", or "this patch
|
||||
includes updates for subsystem X. Please apply."
|
||||
|
||||
The maintainer will thank you if you write your patch description in a
|
||||
form which can be easily pulled into Linux's source code management
|
||||
system, git, as a "commit log". See #15, below.
|
||||
|
||||
If your description starts to get long, that's a sign that you probably
|
||||
need to split up your patch. See #3, next.
|
||||
|
||||
@ -183,8 +187,9 @@ Even if the maintainer did not respond in step #4, make sure to ALWAYS
|
||||
copy the maintainer when you change their code.
|
||||
|
||||
For small patches you may want to CC the Trivial Patch Monkey
|
||||
trivial@kernel.org managed by Jesper Juhl; which collects "trivial"
|
||||
patches. Trivial patches must qualify for one of the following rules:
|
||||
trivial@kernel.org which collects "trivial" patches. Have a look
|
||||
into the MAINTAINERS file for its current manager.
|
||||
Trivial patches must qualify for one of the following rules:
|
||||
Spelling fixes in documentation
|
||||
Spelling fixes which could break grep(1)
|
||||
Warning fixes (cluttering with useless warnings is bad)
|
||||
@ -196,7 +201,6 @@ patches. Trivial patches must qualify for one of the following rules:
|
||||
since people copy, as long as it's trivial)
|
||||
Any fix by the author/maintainer of the file (ie. patch monkey
|
||||
in re-transmission mode)
|
||||
URL: <http://www.kernel.org/pub/linux/kernel/people/juhl/trivial/>
|
||||
|
||||
|
||||
|
||||
@ -405,7 +409,14 @@ person it names. This tag documents that potentially interested parties
|
||||
have been included in the discussion
|
||||
|
||||
|
||||
14) Using Tested-by: and Reviewed-by:
|
||||
14) Using Reported-by:, Tested-by: and Reviewed-by:
|
||||
|
||||
If this patch fixes a problem reported by somebody else, consider adding a
|
||||
Reported-by: tag to credit the reporter for their contribution. Please
|
||||
note that this tag should not be added without the reporter's permission,
|
||||
especially if the problem was not reported in a public forum. That said,
|
||||
if we diligently credit our bug reporters, they will, hopefully, be
|
||||
inspired to help us again in the future.
|
||||
|
||||
A Tested-by: tag indicates that the patch has been successfully tested (in
|
||||
some environment) by the person named. This tag informs maintainers that
|
||||
@ -444,7 +455,7 @@ offer a Reviewed-by tag for a patch. This tag serves to give credit to
|
||||
reviewers and to inform maintainers of the degree of review which has been
|
||||
done on the patch. Reviewed-by: tags, when supplied by reviewers known to
|
||||
understand the subject area and to perform thorough reviews, will normally
|
||||
increase the liklihood of your patch getting into the kernel.
|
||||
increase the likelihood of your patch getting into the kernel.
|
||||
|
||||
|
||||
15) The canonical patch format
|
||||
@ -485,12 +496,33 @@ phrase" should not be a filename. Do not use the same "summary
|
||||
phrase" for every patch in a whole patch series (where a "patch
|
||||
series" is an ordered sequence of multiple, related patches).
|
||||
|
||||
Bear in mind that the "summary phrase" of your email becomes
|
||||
a globally-unique identifier for that patch. It propagates
|
||||
all the way into the git changelog. The "summary phrase" may
|
||||
later be used in developer discussions which refer to the patch.
|
||||
People will want to google for the "summary phrase" to read
|
||||
discussion regarding that patch.
|
||||
Bear in mind that the "summary phrase" of your email becomes a
|
||||
globally-unique identifier for that patch. It propagates all the way
|
||||
into the git changelog. The "summary phrase" may later be used in
|
||||
developer discussions which refer to the patch. People will want to
|
||||
google for the "summary phrase" to read discussion regarding that
|
||||
patch. It will also be the only thing that people may quickly see
|
||||
when, two or three months later, they are going through perhaps
|
||||
thousands of patches using tools such as "gitk" or "git log
|
||||
--oneline".
|
||||
|
||||
For these reasons, the "summary" must be no more than 70-75
|
||||
characters, and it must describe both what the patch changes, as well
|
||||
as why the patch might be necessary. It is challenging to be both
|
||||
succinct and descriptive, but that is what a well-written summary
|
||||
should do.
|
||||
|
||||
The "summary phrase" may be prefixed by tags enclosed in square
|
||||
brackets: "Subject: [PATCH tag] <summary phrase>". The tags are not
|
||||
considered part of the summary phrase, but describe how the patch
|
||||
should be treated. Common tags might include a version descriptor if
|
||||
the multiple versions of the patch have been sent out in response to
|
||||
comments (i.e., "v1, v2, v3"), or "RFC" to indicate a request for
|
||||
comments. If there are four patches in a patch series the individual
|
||||
patches may be numbered like this: 1/4, 2/4, 3/4, 4/4. This assures
|
||||
that developers understand the order in which the patches should be
|
||||
applied and that they have reviewed or applied all of the patches in
|
||||
the patch series.
|
||||
|
||||
A couple of example Subjects:
|
||||
|
||||
@ -510,19 +542,31 @@ the patch author in the changelog.
|
||||
The explanation body will be committed to the permanent source
|
||||
changelog, so should make sense to a competent reader who has long
|
||||
since forgotten the immediate details of the discussion that might
|
||||
have led to this patch.
|
||||
have led to this patch. Including symptoms of the failure which the
|
||||
patch addresses (kernel log messages, oops messages, etc.) is
|
||||
especially useful for people who might be searching the commit logs
|
||||
looking for the applicable patch. If a patch fixes a compile failure,
|
||||
it may not be necessary to include _all_ of the compile failures; just
|
||||
enough that it is likely that someone searching for the patch can find
|
||||
it. As in the "summary phrase", it is important to be both succinct as
|
||||
well as descriptive.
|
||||
|
||||
The "---" marker line serves the essential purpose of marking for patch
|
||||
handling tools where the changelog message ends.
|
||||
|
||||
One good use for the additional comments after the "---" marker is for
|
||||
a diffstat, to show what files have changed, and the number of inserted
|
||||
and deleted lines per file. A diffstat is especially useful on bigger
|
||||
patches. Other comments relevant only to the moment or the maintainer,
|
||||
not suitable for the permanent changelog, should also go here.
|
||||
Use diffstat options "-p 1 -w 70" so that filenames are listed from the
|
||||
top of the kernel source tree and don't use too much horizontal space
|
||||
(easily fit in 80 columns, maybe with some indentation).
|
||||
a diffstat, to show what files have changed, and the number of
|
||||
inserted and deleted lines per file. A diffstat is especially useful
|
||||
on bigger patches. Other comments relevant only to the moment or the
|
||||
maintainer, not suitable for the permanent changelog, should also go
|
||||
here. A good example of such comments might be "patch changelogs"
|
||||
which describe what has changed between the v1 and v2 version of the
|
||||
patch.
|
||||
|
||||
If you are going to include a diffstat after the "---" marker, please
|
||||
use diffstat options "-p 1 -w 70" so that filenames are listed from
|
||||
the top of the kernel source tree and don't use too much horizontal
|
||||
space (easily fit in 80 columns, maybe with some indentation).
|
||||
|
||||
See more details on the proper patch format in the following
|
||||
references.
|
||||
|
@ -246,7 +246,8 @@ void print_ioacct(struct taskstats *t)
|
||||
|
||||
int main(int argc, char *argv[])
|
||||
{
|
||||
int c, rc, rep_len, aggr_len, len2, cmd_type;
|
||||
int c, rc, rep_len, aggr_len, len2;
|
||||
int cmd_type = TASKSTATS_CMD_ATTR_UNSPEC;
|
||||
__u16 id;
|
||||
__u32 mypid;
|
||||
|
||||
|
@ -51,7 +51,7 @@ PIN Numbers
|
||||
-----------
|
||||
|
||||
Each pin has an unique number associated with it in regs-gpio.h,
|
||||
eg S3C2410_GPA0 or S3C2410_GPF1. These defines are used to tell
|
||||
eg S3C2410_GPA(0) or S3C2410_GPF(1). These defines are used to tell
|
||||
the GPIO functions which pin is to be used.
|
||||
|
||||
|
||||
@ -65,11 +65,11 @@ Configuring a pin
|
||||
|
||||
Eg:
|
||||
|
||||
s3c2410_gpio_cfgpin(S3C2410_GPA0, S3C2410_GPA0_ADDR0);
|
||||
s3c2410_gpio_cfgpin(S3C2410_GPE8, S3C2410_GPE8_SDDAT1);
|
||||
s3c2410_gpio_cfgpin(S3C2410_GPA(0), S3C2410_GPA0_ADDR0);
|
||||
s3c2410_gpio_cfgpin(S3C2410_GPE(8), S3C2410_GPE8_SDDAT1);
|
||||
|
||||
which would turn GPA0 into the lowest Address line A0, and set
|
||||
GPE8 to be connected to the SDIO/MMC controller's SDDAT1 line.
|
||||
which would turn GPA(0) into the lowest Address line A0, and set
|
||||
GPE(8) to be connected to the SDIO/MMC controller's SDDAT1 line.
|
||||
|
||||
|
||||
Reading the current configuration
|
||||
|
@ -229,10 +229,10 @@ kernel. It is the use of atomic counters to implement reference
|
||||
counting, and it works such that once the counter falls to zero it can
|
||||
be guaranteed that no other entity can be accessing the object:
|
||||
|
||||
static void obj_list_add(struct obj *obj)
|
||||
static void obj_list_add(struct obj *obj, struct list_head *head)
|
||||
{
|
||||
obj->active = 1;
|
||||
list_add(&obj->list);
|
||||
list_add(&obj->list, head);
|
||||
}
|
||||
|
||||
static void obj_list_del(struct obj *obj)
|
||||
|
@ -186,7 +186,7 @@ a virtual address mapping (unlike the earlier scheme of virtual address
|
||||
do not have a corresponding kernel virtual address space mapping) and
|
||||
low-memory pages.
|
||||
|
||||
Note: Please refer to Documentation/PCI/PCI-DMA-mapping.txt for a discussion
|
||||
Note: Please refer to Documentation/DMA-mapping.txt for a discussion
|
||||
on PCI high mem DMA aspects and mapping of scatter gather lists, and support
|
||||
for 64 bit PCI.
|
||||
|
||||
|
@ -58,7 +58,7 @@ same criteria as reads.
|
||||
front_merges (bool)
|
||||
------------
|
||||
|
||||
Sometimes it happens that a request enters the io scheduler that is contigious
|
||||
Sometimes it happens that a request enters the io scheduler that is contiguous
|
||||
with a request that is already on the queue. Either it fits in the back of that
|
||||
request, or it fits at the front. That is called either a back merge candidate
|
||||
or a front merge candidate. Due to the way files are typically laid out,
|
||||
|
@ -27,7 +27,7 @@ parameter.
|
||||
|
||||
For simplicity, only one braille console can be enabled, other uses of
|
||||
console=brl,... will be discarded. Also note that it does not interfere with
|
||||
the console selection mecanism described in serial-console.txt
|
||||
the console selection mechanism described in serial-console.txt
|
||||
|
||||
For now, only the VisioBraille device is supported.
|
||||
|
||||
|
@ -117,7 +117,7 @@ Using the pktcdvd debugfs interface
|
||||
|
||||
To read pktcdvd device infos in human readable form, do:
|
||||
|
||||
# cat /debug/pktcdvd/pktcdvd[0-7]/info
|
||||
# cat /sys/kernel/debug/pktcdvd/pktcdvd[0-7]/info
|
||||
|
||||
For a description of the debugfs interface look into the file:
|
||||
|
||||
|
@ -76,9 +76,9 @@ Do the steps below to download the BIOS image.
|
||||
|
||||
The /sys/class/firmware/dell_rbu/ entries will remain till the following is
|
||||
done.
|
||||
echo -1 > /sys/class/firmware/dell_rbu/loading.
|
||||
echo -1 > /sys/class/firmware/dell_rbu/loading
|
||||
Until this step is completed the driver cannot be unloaded.
|
||||
Also echoing either mono ,packet or init in to image_type will free up the
|
||||
Also echoing either mono, packet or init in to image_type will free up the
|
||||
memory allocated by the driver.
|
||||
|
||||
If a user by accident executes steps 1 and 3 above without executing step 2;
|
||||
|
@ -119,7 +119,7 @@ which takes quite a bit of time and thought after the "real work" has been
|
||||
done. When done properly, though, it is time well spent.
|
||||
|
||||
|
||||
5.4: PATCH FORMATTING
|
||||
5.4: PATCH FORMATTING AND CHANGELOGS
|
||||
|
||||
So now you have a perfect series of patches for posting, but the work is
|
||||
not done quite yet. Each patch needs to be formatted into a message which
|
||||
@ -146,8 +146,33 @@ that end, each patch will be composed of the following:
|
||||
- One or more tag lines, with, at a minimum, one Signed-off-by: line from
|
||||
the author of the patch. Tags will be described in more detail below.
|
||||
|
||||
The above three items should, normally, be the text used when committing
|
||||
the change to a revision control system. They are followed by:
|
||||
The items above, together, form the changelog for the patch. Writing good
|
||||
changelogs is a crucial but often-neglected art; it's worth spending
|
||||
another moment discussing this issue. When writing a changelog, you should
|
||||
bear in mind that a number of different people will be reading your words.
|
||||
These include subsystem maintainers and reviewers who need to decide
|
||||
whether the patch should be included, distributors and other maintainers
|
||||
trying to decide whether a patch should be backported to other kernels, bug
|
||||
hunters wondering whether the patch is responsible for a problem they are
|
||||
chasing, users who want to know how the kernel has changed, and more. A
|
||||
good changelog conveys the needed information to all of these people in the
|
||||
most direct and concise way possible.
|
||||
|
||||
To that end, the summary line should describe the effects of and motivation
|
||||
for the change as well as possible given the one-line constraint. The
|
||||
detailed description can then amplify on those topics and provide any
|
||||
needed additional information. If the patch fixes a bug, cite the commit
|
||||
which introduced the bug if possible. If a problem is associated with
|
||||
specific log or compiler output, include that output to help others
|
||||
searching for a solution to the same problem. If the change is meant to
|
||||
support other changes coming in later patch, say so. If internal APIs are
|
||||
changed, detail those changes and how other developers should respond. In
|
||||
general, the more you can put yourself into the shoes of everybody who will
|
||||
be reading your changelog, the better that changelog (and the kernel as a
|
||||
whole) will be.
|
||||
|
||||
Needless to say, the changelog should be the text used when committing the
|
||||
change to a revision control system. It will be followed by:
|
||||
|
||||
- The patch itself, in the unified ("-u") patch format. Using the "-p"
|
||||
option to diff will associate function names with changes, making the
|
||||
|
@ -162,3 +162,35 @@ device_remove_file(dev,&dev_attr_power);
|
||||
|
||||
The file name will be 'power' with a mode of 0644 (-rw-r--r--).
|
||||
|
||||
Word of warning: While the kernel allows device_create_file() and
|
||||
device_remove_file() to be called on a device at any time, userspace has
|
||||
strict expectations on when attributes get created. When a new device is
|
||||
registered in the kernel, a uevent is generated to notify userspace (like
|
||||
udev) that a new device is available. If attributes are added after the
|
||||
device is registered, then userspace won't get notified and userspace will
|
||||
not know about the new attributes.
|
||||
|
||||
This is important for device driver that need to publish additional
|
||||
attributes for a device at driver probe time. If the device driver simply
|
||||
calls device_create_file() on the device structure passed to it, then
|
||||
userspace will never be notified of the new attributes. Instead, it should
|
||||
probably use class_create() and class->dev_attrs to set up a list of
|
||||
desired attributes in the modules_init function, and then in the .probe()
|
||||
hook, and then use device_create() to create a new device as a child
|
||||
of the probed device. The new device will generate a new uevent and
|
||||
properly advertise the new attributes to userspace.
|
||||
|
||||
For example, if a driver wanted to add the following attributes:
|
||||
struct device_attribute mydriver_attribs[] = {
|
||||
__ATTR(port_count, 0444, port_count_show),
|
||||
__ATTR(serial_number, 0444, serial_number_show),
|
||||
NULL
|
||||
};
|
||||
|
||||
Then in the module init function is would do:
|
||||
mydriver_class = class_create(THIS_MODULE, "my_attrs");
|
||||
mydriver_class.dev_attr = mydriver_attribs;
|
||||
|
||||
And assuming 'dev' is the struct device passed into the probe hook, the driver
|
||||
probe function would do something like:
|
||||
create_device(&mydriver_class, dev, chrdev, &private_data, "my_name");
|
||||
|
@ -188,7 +188,7 @@ For example, you can do something like the following.
|
||||
|
||||
void my_midlayer_destroy_something()
|
||||
{
|
||||
devres_release_group(dev, my_midlayer_create_soemthing);
|
||||
devres_release_group(dev, my_midlayer_create_something);
|
||||
}
|
||||
|
||||
|
||||
|
@ -112,7 +112,7 @@ sub tda10045 {
|
||||
|
||||
sub tda10046 {
|
||||
my $sourcefile = "TT_PCI_2.19h_28_11_2006.zip";
|
||||
my $url = "http://technotrend-online.com/download/software/219/$sourcefile";
|
||||
my $url = "http://www.tt-download.com/download/updates/219/$sourcefile";
|
||||
my $hash = "6a7e1e2f2644b162ff0502367553c72d";
|
||||
my $outfile = "dvb-fe-tda10046.fw";
|
||||
my $tmpdir = tempdir(DIR => "/tmp", CLEANUP => 1);
|
||||
@ -129,8 +129,8 @@ sub tda10046 {
|
||||
}
|
||||
|
||||
sub tda10046lifeview {
|
||||
my $sourcefile = "Drv_2.11.02.zip";
|
||||
my $url = "http://www.lifeview.com.tw/drivers/pci_card/FlyDVB-T/$sourcefile";
|
||||
my $sourcefile = "7%5Cdrv_2.11.02.zip";
|
||||
my $url = "http://www.lifeview.hk/dbimages/document/$sourcefile";
|
||||
my $hash = "1ea24dee4eea8fe971686981f34fd2e0";
|
||||
my $outfile = "dvb-fe-tda10046.fw";
|
||||
my $tmpdir = tempdir(DIR => "/tmp", CLEANUP => 1);
|
||||
@ -317,7 +317,7 @@ sub nxt2002 {
|
||||
|
||||
sub nxt2004 {
|
||||
my $sourcefile = "AVerTVHD_MCE_A180_Drv_v1.2.2.16.zip";
|
||||
my $url = "http://www.aver.com/support/Drivers/$sourcefile";
|
||||
my $url = "http://www.avermedia-usa.com/support/Drivers/$sourcefile";
|
||||
my $hash = "111cb885b1e009188346d72acfed024c";
|
||||
my $outfile = "dvb-fe-nxt2004.fw";
|
||||
my $tmpdir = tempdir(DIR => "/tmp", CLEANUP => 1);
|
||||
|
@ -23,8 +23,8 @@ first time, it was renamed to 'EDAC'.
|
||||
The bluesmoke project at sourceforge.net is now utilized as a 'staging area'
|
||||
for EDAC development, before it is sent upstream to kernel.org
|
||||
|
||||
At the bluesmoke/EDAC project site, is a series of quilt patches against
|
||||
recent kernels, stored in a SVN respository. For easier downloading, there
|
||||
At the bluesmoke/EDAC project site is a series of quilt patches against
|
||||
recent kernels, stored in a SVN repository. For easier downloading, there
|
||||
is also a tarball snapshot available.
|
||||
|
||||
============================================================================
|
||||
@ -73,9 +73,9 @@ the vendor should tie the parity status bits to 0 if they do not intend
|
||||
to generate parity. Some vendors do not do this, and thus the parity bit
|
||||
can "float" giving false positives.
|
||||
|
||||
In the kernel there is a pci device attribute located in sysfs that is
|
||||
In the kernel there is a PCI device attribute located in sysfs that is
|
||||
checked by the EDAC PCI scanning code. If that attribute is set,
|
||||
PCI parity/error scannining is skipped for that device. The attribute
|
||||
PCI parity/error scanning is skipped for that device. The attribute
|
||||
is:
|
||||
|
||||
broken_parity_status
|
||||
|
@ -29,16 +29,16 @@ o debugfs entries
|
||||
fault-inject-debugfs kernel module provides some debugfs entries for runtime
|
||||
configuration of fault-injection capabilities.
|
||||
|
||||
- /debug/fail*/probability:
|
||||
- /sys/kernel/debug/fail*/probability:
|
||||
|
||||
likelihood of failure injection, in percent.
|
||||
Format: <percent>
|
||||
|
||||
Note that one-failure-per-hundred is a very high error rate
|
||||
for some testcases. Consider setting probability=100 and configure
|
||||
/debug/fail*/interval for such testcases.
|
||||
/sys/kernel/debug/fail*/interval for such testcases.
|
||||
|
||||
- /debug/fail*/interval:
|
||||
- /sys/kernel/debug/fail*/interval:
|
||||
|
||||
specifies the interval between failures, for calls to
|
||||
should_fail() that pass all the other tests.
|
||||
@ -46,18 +46,18 @@ configuration of fault-injection capabilities.
|
||||
Note that if you enable this, by setting interval>1, you will
|
||||
probably want to set probability=100.
|
||||
|
||||
- /debug/fail*/times:
|
||||
- /sys/kernel/debug/fail*/times:
|
||||
|
||||
specifies how many times failures may happen at most.
|
||||
A value of -1 means "no limit".
|
||||
|
||||
- /debug/fail*/space:
|
||||
- /sys/kernel/debug/fail*/space:
|
||||
|
||||
specifies an initial resource "budget", decremented by "size"
|
||||
on each call to should_fail(,size). Failure injection is
|
||||
suppressed until "space" reaches zero.
|
||||
|
||||
- /debug/fail*/verbose
|
||||
- /sys/kernel/debug/fail*/verbose
|
||||
|
||||
Format: { 0 | 1 | 2 }
|
||||
specifies the verbosity of the messages when failure is
|
||||
@ -65,17 +65,17 @@ configuration of fault-injection capabilities.
|
||||
log line per failure; '2' will print a call trace too -- useful
|
||||
to debug the problems revealed by fault injection.
|
||||
|
||||
- /debug/fail*/task-filter:
|
||||
- /sys/kernel/debug/fail*/task-filter:
|
||||
|
||||
Format: { 'Y' | 'N' }
|
||||
A value of 'N' disables filtering by process (default).
|
||||
Any positive value limits failures to only processes indicated by
|
||||
/proc/<pid>/make-it-fail==1.
|
||||
|
||||
- /debug/fail*/require-start:
|
||||
- /debug/fail*/require-end:
|
||||
- /debug/fail*/reject-start:
|
||||
- /debug/fail*/reject-end:
|
||||
- /sys/kernel/debug/fail*/require-start:
|
||||
- /sys/kernel/debug/fail*/require-end:
|
||||
- /sys/kernel/debug/fail*/reject-start:
|
||||
- /sys/kernel/debug/fail*/reject-end:
|
||||
|
||||
specifies the range of virtual addresses tested during
|
||||
stacktrace walking. Failure is injected only if some caller
|
||||
@ -84,26 +84,26 @@ configuration of fault-injection capabilities.
|
||||
Default required range is [0,ULONG_MAX) (whole of virtual address space).
|
||||
Default rejected range is [0,0).
|
||||
|
||||
- /debug/fail*/stacktrace-depth:
|
||||
- /sys/kernel/debug/fail*/stacktrace-depth:
|
||||
|
||||
specifies the maximum stacktrace depth walked during search
|
||||
for a caller within [require-start,require-end) OR
|
||||
[reject-start,reject-end).
|
||||
|
||||
- /debug/fail_page_alloc/ignore-gfp-highmem:
|
||||
- /sys/kernel/debug/fail_page_alloc/ignore-gfp-highmem:
|
||||
|
||||
Format: { 'Y' | 'N' }
|
||||
default is 'N', setting it to 'Y' won't inject failures into
|
||||
highmem/user allocations.
|
||||
|
||||
- /debug/failslab/ignore-gfp-wait:
|
||||
- /debug/fail_page_alloc/ignore-gfp-wait:
|
||||
- /sys/kernel/debug/failslab/ignore-gfp-wait:
|
||||
- /sys/kernel/debug/fail_page_alloc/ignore-gfp-wait:
|
||||
|
||||
Format: { 'Y' | 'N' }
|
||||
default is 'N', setting it to 'Y' will inject failures
|
||||
only into non-sleep allocations (GFP_ATOMIC allocations).
|
||||
|
||||
- /debug/fail_page_alloc/min-order:
|
||||
- /sys/kernel/debug/fail_page_alloc/min-order:
|
||||
|
||||
specifies the minimum page allocation order to be injected
|
||||
failures.
|
||||
@ -166,13 +166,13 @@ o Inject slab allocation failures into module init/exit code
|
||||
#!/bin/bash
|
||||
|
||||
FAILTYPE=failslab
|
||||
echo Y > /debug/$FAILTYPE/task-filter
|
||||
echo 10 > /debug/$FAILTYPE/probability
|
||||
echo 100 > /debug/$FAILTYPE/interval
|
||||
echo -1 > /debug/$FAILTYPE/times
|
||||
echo 0 > /debug/$FAILTYPE/space
|
||||
echo 2 > /debug/$FAILTYPE/verbose
|
||||
echo 1 > /debug/$FAILTYPE/ignore-gfp-wait
|
||||
echo Y > /sys/kernel/debug/$FAILTYPE/task-filter
|
||||
echo 10 > /sys/kernel/debug/$FAILTYPE/probability
|
||||
echo 100 > /sys/kernel/debug/$FAILTYPE/interval
|
||||
echo -1 > /sys/kernel/debug/$FAILTYPE/times
|
||||
echo 0 > /sys/kernel/debug/$FAILTYPE/space
|
||||
echo 2 > /sys/kernel/debug/$FAILTYPE/verbose
|
||||
echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait
|
||||
|
||||
faulty_system()
|
||||
{
|
||||
@ -217,20 +217,20 @@ then
|
||||
exit 1
|
||||
fi
|
||||
|
||||
cat /sys/module/$module/sections/.text > /debug/$FAILTYPE/require-start
|
||||
cat /sys/module/$module/sections/.data > /debug/$FAILTYPE/require-end
|
||||
cat /sys/module/$module/sections/.text > /sys/kernel/debug/$FAILTYPE/require-start
|
||||
cat /sys/module/$module/sections/.data > /sys/kernel/debug/$FAILTYPE/require-end
|
||||
|
||||
echo N > /debug/$FAILTYPE/task-filter
|
||||
echo 10 > /debug/$FAILTYPE/probability
|
||||
echo 100 > /debug/$FAILTYPE/interval
|
||||
echo -1 > /debug/$FAILTYPE/times
|
||||
echo 0 > /debug/$FAILTYPE/space
|
||||
echo 2 > /debug/$FAILTYPE/verbose
|
||||
echo 1 > /debug/$FAILTYPE/ignore-gfp-wait
|
||||
echo 1 > /debug/$FAILTYPE/ignore-gfp-highmem
|
||||
echo 10 > /debug/$FAILTYPE/stacktrace-depth
|
||||
echo N > /sys/kernel/debug/$FAILTYPE/task-filter
|
||||
echo 10 > /sys/kernel/debug/$FAILTYPE/probability
|
||||
echo 100 > /sys/kernel/debug/$FAILTYPE/interval
|
||||
echo -1 > /sys/kernel/debug/$FAILTYPE/times
|
||||
echo 0 > /sys/kernel/debug/$FAILTYPE/space
|
||||
echo 2 > /sys/kernel/debug/$FAILTYPE/verbose
|
||||
echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-wait
|
||||
echo 1 > /sys/kernel/debug/$FAILTYPE/ignore-gfp-highmem
|
||||
echo 10 > /sys/kernel/debug/$FAILTYPE/stacktrace-depth
|
||||
|
||||
trap "echo 0 > /debug/$FAILTYPE/probability" SIGINT SIGTERM EXIT
|
||||
trap "echo 0 > /sys/kernel/debug/$FAILTYPE/probability" SIGINT SIGTERM EXIT
|
||||
|
||||
echo "Injecting errors into the module $module... (interrupt to stop)"
|
||||
sleep 1000000
|
||||
|
@ -1,7 +1,7 @@
|
||||
SH7760/SH7763 integrated LCDC Framebuffer driver
|
||||
================================================
|
||||
|
||||
0. Overwiew
|
||||
0. Overview
|
||||
-----------
|
||||
The SH7760/SH7763 have an integrated LCD Display controller (LCDC) which
|
||||
supports (in theory) resolutions ranging from 1x1 to 1024x1024,
|
||||
|
@ -95,7 +95,7 @@ There is no way to change the vesafb video mode and/or timings after
|
||||
booting linux. If you are not happy with the 60 Hz refresh rate, you
|
||||
have these options:
|
||||
|
||||
* configure and load the DOS-Tools for your the graphics board (if
|
||||
* configure and load the DOS-Tools for the graphics board (if
|
||||
available) and boot linux with loadlin.
|
||||
* use a native driver (matroxfb/atyfb) instead if vesafb. If none
|
||||
is available, write a new one!
|
||||
|
@ -437,3 +437,20 @@ Why: Superseded by tdfxfb. I2C/DDC support used to live in a separate
|
||||
driver but this caused driver conflicts.
|
||||
Who: Jean Delvare <khali@linux-fr.org>
|
||||
Krzysztof Helt <krzysztof.h1@wp.pl>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: CONFIG_RFKILL_INPUT
|
||||
When: 2.6.33
|
||||
Why: Should be implemented in userspace, policy daemon.
|
||||
Who: Johannes Berg <johannes@sipsolutions.net>
|
||||
|
||||
----------------------------
|
||||
|
||||
What: CONFIG_X86_OLD_MCE
|
||||
When: 2.6.32
|
||||
Why: Remove the old legacy 32bit machine check code. This has been
|
||||
superseded by the newer machine check code from the 64bit port,
|
||||
but the old version has been kept around for easier testing. Note this
|
||||
doesn't impact the old P5 and WinChip machine check handlers.
|
||||
Who: Andi Kleen <andi@firstfloor.org>
|
||||
|
@ -369,7 +369,7 @@ The call requires an initialized struct autofs_dev_ioctl. There are two
|
||||
possible variations. Both use the path field set to the path of the mount
|
||||
point to check and the size field adjusted appropriately. One uses the
|
||||
ioctlfd field to identify a specific mount point to check while the other
|
||||
variation uses the path and optionaly arg1 set to an autofs mount type.
|
||||
variation uses the path and optionally arg1 set to an autofs mount type.
|
||||
The call returns 1 if this is a mount point and sets arg1 to the device
|
||||
number of the mount and field arg2 to the relevant super block magic
|
||||
number (described below) or 0 if it isn't a mountpoint. In both cases
|
||||
|
@ -184,7 +184,7 @@ This has the following fields:
|
||||
have index children.
|
||||
|
||||
If this function is not supplied or if it returns NULL then the first
|
||||
cache in the parent's list will be chosed, or failing that, the first
|
||||
cache in the parent's list will be chosen, or failing that, the first
|
||||
cache in the master list.
|
||||
|
||||
(4) A function to retrieve an object's key from the netfs [mandatory].
|
||||
|
158
Documentation/filesystems/debugfs.txt
Normal file
158
Documentation/filesystems/debugfs.txt
Normal file
@ -0,0 +1,158 @@
|
||||
Copyright 2009 Jonathan Corbet <corbet@lwn.net>
|
||||
|
||||
Debugfs exists as a simple way for kernel developers to make information
|
||||
available to user space. Unlike /proc, which is only meant for information
|
||||
about a process, or sysfs, which has strict one-value-per-file rules,
|
||||
debugfs has no rules at all. Developers can put any information they want
|
||||
there. The debugfs filesystem is also intended to not serve as a stable
|
||||
ABI to user space; in theory, there are no stability constraints placed on
|
||||
files exported there. The real world is not always so simple, though [1];
|
||||
even debugfs interfaces are best designed with the idea that they will need
|
||||
to be maintained forever.
|
||||
|
||||
Debugfs is typically mounted with a command like:
|
||||
|
||||
mount -t debugfs none /sys/kernel/debug
|
||||
|
||||
(Or an equivalent /etc/fstab line).
|
||||
|
||||
Note that the debugfs API is exported GPL-only to modules.
|
||||
|
||||
Code using debugfs should include <linux/debugfs.h>. Then, the first order
|
||||
of business will be to create at least one directory to hold a set of
|
||||
debugfs files:
|
||||
|
||||
struct dentry *debugfs_create_dir(const char *name, struct dentry *parent);
|
||||
|
||||
This call, if successful, will make a directory called name underneath the
|
||||
indicated parent directory. If parent is NULL, the directory will be
|
||||
created in the debugfs root. On success, the return value is a struct
|
||||
dentry pointer which can be used to create files in the directory (and to
|
||||
clean it up at the end). A NULL return value indicates that something went
|
||||
wrong. If ERR_PTR(-ENODEV) is returned, that is an indication that the
|
||||
kernel has been built without debugfs support and none of the functions
|
||||
described below will work.
|
||||
|
||||
The most general way to create a file within a debugfs directory is with:
|
||||
|
||||
struct dentry *debugfs_create_file(const char *name, mode_t mode,
|
||||
struct dentry *parent, void *data,
|
||||
const struct file_operations *fops);
|
||||
|
||||
Here, name is the name of the file to create, mode describes the access
|
||||
permissions the file should have, parent indicates the directory which
|
||||
should hold the file, data will be stored in the i_private field of the
|
||||
resulting inode structure, and fops is a set of file operations which
|
||||
implement the file's behavior. At a minimum, the read() and/or write()
|
||||
operations should be provided; others can be included as needed. Again,
|
||||
the return value will be a dentry pointer to the created file, NULL for
|
||||
error, or ERR_PTR(-ENODEV) if debugfs support is missing.
|
||||
|
||||
In a number of cases, the creation of a set of file operations is not
|
||||
actually necessary; the debugfs code provides a number of helper functions
|
||||
for simple situations. Files containing a single integer value can be
|
||||
created with any of:
|
||||
|
||||
struct dentry *debugfs_create_u8(const char *name, mode_t mode,
|
||||
struct dentry *parent, u8 *value);
|
||||
struct dentry *debugfs_create_u16(const char *name, mode_t mode,
|
||||
struct dentry *parent, u16 *value);
|
||||
struct dentry *debugfs_create_u32(const char *name, mode_t mode,
|
||||
struct dentry *parent, u32 *value);
|
||||
struct dentry *debugfs_create_u64(const char *name, mode_t mode,
|
||||
struct dentry *parent, u64 *value);
|
||||
|
||||
These files support both reading and writing the given value; if a specific
|
||||
file should not be written to, simply set the mode bits accordingly. The
|
||||
values in these files are in decimal; if hexadecimal is more appropriate,
|
||||
the following functions can be used instead:
|
||||
|
||||
struct dentry *debugfs_create_x8(const char *name, mode_t mode,
|
||||
struct dentry *parent, u8 *value);
|
||||
struct dentry *debugfs_create_x16(const char *name, mode_t mode,
|
||||
struct dentry *parent, u16 *value);
|
||||
struct dentry *debugfs_create_x32(const char *name, mode_t mode,
|
||||
struct dentry *parent, u32 *value);
|
||||
|
||||
Note that there is no debugfs_create_x64().
|
||||
|
||||
These functions are useful as long as the developer knows the size of the
|
||||
value to be exported. Some types can have different widths on different
|
||||
architectures, though, complicating the situation somewhat. There is a
|
||||
function meant to help out in one special case:
|
||||
|
||||
struct dentry *debugfs_create_size_t(const char *name, mode_t mode,
|
||||
struct dentry *parent,
|
||||
size_t *value);
|
||||
|
||||
As might be expected, this function will create a debugfs file to represent
|
||||
a variable of type size_t.
|
||||
|
||||
Boolean values can be placed in debugfs with:
|
||||
|
||||
struct dentry *debugfs_create_bool(const char *name, mode_t mode,
|
||||
struct dentry *parent, u32 *value);
|
||||
|
||||
A read on the resulting file will yield either Y (for non-zero values) or
|
||||
N, followed by a newline. If written to, it will accept either upper- or
|
||||
lower-case values, or 1 or 0. Any other input will be silently ignored.
|
||||
|
||||
Finally, a block of arbitrary binary data can be exported with:
|
||||
|
||||
struct debugfs_blob_wrapper {
|
||||
void *data;
|
||||
unsigned long size;
|
||||
};
|
||||
|
||||
struct dentry *debugfs_create_blob(const char *name, mode_t mode,
|
||||
struct dentry *parent,
|
||||
struct debugfs_blob_wrapper *blob);
|
||||
|
||||
A read of this file will return the data pointed to by the
|
||||
debugfs_blob_wrapper structure. Some drivers use "blobs" as a simple way
|
||||
to return several lines of (static) formatted text output. This function
|
||||
can be used to export binary information, but there does not appear to be
|
||||
any code which does so in the mainline. Note that all files created with
|
||||
debugfs_create_blob() are read-only.
|
||||
|
||||
There are a couple of other directory-oriented helper functions:
|
||||
|
||||
struct dentry *debugfs_rename(struct dentry *old_dir,
|
||||
struct dentry *old_dentry,
|
||||
struct dentry *new_dir,
|
||||
const char *new_name);
|
||||
|
||||
struct dentry *debugfs_create_symlink(const char *name,
|
||||
struct dentry *parent,
|
||||
const char *target);
|
||||
|
||||
A call to debugfs_rename() will give a new name to an existing debugfs
|
||||
file, possibly in a different directory. The new_name must not exist prior
|
||||
to the call; the return value is old_dentry with updated information.
|
||||
Symbolic links can be created with debugfs_create_symlink().
|
||||
|
||||
There is one important thing that all debugfs users must take into account:
|
||||
there is no automatic cleanup of any directories created in debugfs. If a
|
||||
module is unloaded without explicitly removing debugfs entries, the result
|
||||
will be a lot of stale pointers and no end of highly antisocial behavior.
|
||||
So all debugfs users - at least those which can be built as modules - must
|
||||
be prepared to remove all files and directories they create there. A file
|
||||
can be removed with:
|
||||
|
||||
void debugfs_remove(struct dentry *dentry);
|
||||
|
||||
The dentry value can be NULL, in which case nothing will be removed.
|
||||
|
||||
Once upon a time, debugfs users were required to remember the dentry
|
||||
pointer for every debugfs file they created so that all files could be
|
||||
cleaned up. We live in more civilized times now, though, and debugfs users
|
||||
can call:
|
||||
|
||||
void debugfs_remove_recursive(struct dentry *dentry);
|
||||
|
||||
If this function is passed a pointer for the dentry corresponding to the
|
||||
top-level directory, the entire hierarchy below that directory will be
|
||||
removed.
|
||||
|
||||
Notes:
|
||||
[1] http://lwn.net/Articles/309298/
|
@ -294,7 +294,7 @@ max_batch_time=usec Maximum amount of time ext4 should wait for
|
||||
amount of time (on average) that it takes to
|
||||
finish committing a transaction. Call this time
|
||||
the "commit time". If the time that the
|
||||
transactoin has been running is less than the
|
||||
transaction has been running is less than the
|
||||
commit time, ext4 will try sleeping for the
|
||||
commit time to see if other operations will join
|
||||
the transaction. The commit time is capped by
|
||||
@ -328,7 +328,7 @@ noauto_da_alloc replacing existing files via patterns such as
|
||||
journal commit, in the default data=ordered
|
||||
mode, the data blocks of the new file are forced
|
||||
to disk before the rename() operation is
|
||||
commited. This provides roughly the same level
|
||||
committed. This provides roughly the same level
|
||||
of guarantees as ext3, and avoids the
|
||||
"zero-length" problem that can happen when a
|
||||
system crashes before the delayed allocation
|
||||
@ -358,7 +358,7 @@ written to the journal first, and then to its final location.
|
||||
In the event of a crash, the journal can be replayed, bringing both data and
|
||||
metadata into a consistent state. This mode is the slowest except when data
|
||||
needs to be read from and written to disk at the same time where it
|
||||
outperforms all others modes. Curently ext4 does not have delayed
|
||||
outperforms all others modes. Currently ext4 does not have delayed
|
||||
allocation support if this data journalling mode is selected.
|
||||
|
||||
References
|
||||
|
@ -204,7 +204,7 @@ fiemap_check_flags() helper:
|
||||
|
||||
int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags);
|
||||
|
||||
The struct fieinfo should be passed in as recieved from ioctl_fiemap(). The
|
||||
The struct fieinfo should be passed in as received from ioctl_fiemap(). The
|
||||
set of fiemap flags which the fs understands should be passed via fs_flags. If
|
||||
fiemap_check_flags finds invalid user flags, it will place the bad values in
|
||||
fieinfo->fi_flags and return -EBADR. If the file system gets -EBADR, from
|
||||
|
@ -60,7 +60,7 @@ go_lock | Called for the first local holder of a lock
|
||||
go_unlock | Called on the final local unlock of a lock
|
||||
go_dump | Called to print content of object for debugfs file, or on
|
||||
| error to dump glock to the log.
|
||||
go_type; | The type of the glock, LM_TYPE_.....
|
||||
go_type | The type of the glock, LM_TYPE_.....
|
||||
go_min_hold_time | The minimum hold time
|
||||
|
||||
The minimum hold time for each lock is the time after a remote lock
|
||||
|
@ -11,18 +11,15 @@ their I/O so file system consistency is maintained. One of the nifty
|
||||
features of GFS is perfect consistency -- changes made to the file system
|
||||
on one machine show up immediately on all other machines in the cluster.
|
||||
|
||||
GFS uses interchangable inter-node locking mechanisms. Different lock
|
||||
modules can plug into GFS and each file system selects the appropriate
|
||||
lock module at mount time. Lock modules include:
|
||||
GFS uses interchangable inter-node locking mechanisms, the currently
|
||||
supported mechanisms are:
|
||||
|
||||
lock_nolock -- allows gfs to be used as a local file system
|
||||
|
||||
lock_dlm -- uses a distributed lock manager (dlm) for inter-node locking
|
||||
The dlm is found at linux/fs/dlm/
|
||||
|
||||
In addition to interfacing with an external locking manager, a gfs lock
|
||||
module is responsible for interacting with external cluster management
|
||||
systems. Lock_dlm depends on user space cluster management systems found
|
||||
Lock_dlm depends on user space cluster management systems found
|
||||
at the URL above.
|
||||
|
||||
To use gfs as a local file system, no external clustering systems are
|
||||
@ -31,13 +28,19 @@ needed, simply:
|
||||
$ mkfs -t gfs2 -p lock_nolock -j 1 /dev/block_device
|
||||
$ mount -t gfs2 /dev/block_device /dir
|
||||
|
||||
GFS2 is not on-disk compatible with previous versions of GFS.
|
||||
If you are using Fedora, you need to install the gfs2-utils package
|
||||
and, for lock_dlm, you will also need to install the cman package
|
||||
and write a cluster.conf as per the documentation.
|
||||
|
||||
GFS2 is not on-disk compatible with previous versions of GFS, but it
|
||||
is pretty close.
|
||||
|
||||
The following man pages can be found at the URL above:
|
||||
gfs2_fsck to repair a filesystem
|
||||
fsck.gfs2 to repair a filesystem
|
||||
gfs2_grow to expand a filesystem online
|
||||
gfs2_jadd to add journals to a filesystem online
|
||||
gfs2_tool to manipulate, examine and tune a filesystem
|
||||
gfs2_quota to examine and change quota values in a filesystem
|
||||
gfs2_convert to convert a gfs filesystem to gfs2 in-place
|
||||
mount.gfs2 to help mount(8) mount a filesystem
|
||||
mkfs.gfs2 to make a filesystem
|
||||
|
@ -100,7 +100,7 @@ Installation
|
||||
$ sudo cp utils/mount/mount.nfs /sbin/mount.nfs
|
||||
|
||||
In this location, mount.nfs will be invoked automatically for NFS mounts
|
||||
by the system mount commmand.
|
||||
by the system mount command.
|
||||
|
||||
NOTE: mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed
|
||||
on the NFS client machine. You do not need this specific version of
|
||||
|
@ -39,9 +39,8 @@ Features which NILFS2 does not support yet:
|
||||
- extended attributes
|
||||
- POSIX ACLs
|
||||
- quotas
|
||||
- writable snapshots
|
||||
- remote backup (CDP)
|
||||
- data integrity
|
||||
- fsck
|
||||
- resize
|
||||
- defragmentation
|
||||
|
||||
Mount options
|
||||
|
@ -366,7 +366,7 @@ just those considered 'most important'. The new vectors are:
|
||||
RES, CAL, TLB -- rescheduling, call and TLB flush interrupts are
|
||||
sent from one CPU to another per the needs of the OS. Typically,
|
||||
their statistics are used by kernel developers and interested users to
|
||||
determine the occurance of interrupt of the given type.
|
||||
determine the occurrence of interrupts of the given type.
|
||||
|
||||
The above IRQ vectors are displayed only when relevent. For example,
|
||||
the threshold vector does not exist on x86_64 platforms. Others are
|
||||
@ -551,7 +551,7 @@ Committed_AS: The amount of memory presently allocated on the system.
|
||||
memory once that memory has been successfully allocated.
|
||||
VmallocTotal: total size of vmalloc memory area
|
||||
VmallocUsed: amount of vmalloc area which is used
|
||||
VmallocChunk: largest contigious block of vmalloc area which is free
|
||||
VmallocChunk: largest contiguous block of vmalloc area which is free
|
||||
|
||||
..............................................................................
|
||||
|
||||
@ -1003,11 +1003,13 @@ CHAPTER 3: PER-PROCESS PARAMETERS
|
||||
3.1 /proc/<pid>/oom_adj - Adjust the oom-killer score
|
||||
------------------------------------------------------
|
||||
|
||||
This file can be used to adjust the score used to select which processes
|
||||
should be killed in an out-of-memory situation. Giving it a high score will
|
||||
increase the likelihood of this process being killed by the oom-killer. Valid
|
||||
values are in the range -16 to +15, plus the special value -17, which disables
|
||||
oom-killing altogether for this process.
|
||||
This file can be used to adjust the score used to select which processes should
|
||||
be killed in an out-of-memory situation. The oom_adj value is a characteristic
|
||||
of the task's mm, so all threads that share an mm with pid will have the same
|
||||
oom_adj value. A high value will increase the likelihood of this process being
|
||||
killed by the oom-killer. Valid values are in the range -16 to +15 as
|
||||
explained below and a special value of -17, which disables oom-killing
|
||||
altogether for threads sharing pid's mm.
|
||||
|
||||
The process to be killed in an out-of-memory situation is selected among all others
|
||||
based on its badness score. This value equals the original memory size of the process
|
||||
@ -1021,6 +1023,9 @@ the parent's score if they do not share the same memory. Thus forking servers
|
||||
are the prime candidates to be killed. Having only one 'hungry' child will make
|
||||
parent less preferable than the child.
|
||||
|
||||
/proc/<pid>/oom_adj cannot be changed for kthreads since they are immune from
|
||||
oom-killing already.
|
||||
|
||||
/proc/<pid>/oom_score shows process' current badness score.
|
||||
|
||||
The following heuristics are then applied:
|
||||
|
@ -72,7 +72,7 @@ The 'rom' file is special in that it provides read-only access to the device's
|
||||
ROM file, if available. It's disabled by default, however, so applications
|
||||
should write the string "1" to the file to enable it before attempting a read
|
||||
call, and disable it following the access by writing "0" to the file. Note
|
||||
that the device must be enabled for a rom read to return data succesfully.
|
||||
that the device must be enabled for a rom read to return data successfully.
|
||||
In the event a driver is not bound to the device, it can be enabled using the
|
||||
'enable' file, documented above.
|
||||
|
||||
|
@ -133,4 +133,4 @@ RAM/SWAP in 10240 inodes and it is only accessible by root.
|
||||
Author:
|
||||
Christoph Rohland <cr@sap.com>, 1.12.01
|
||||
Updated:
|
||||
Hugh Dickins <hugh@veritas.com>, 4 June 2007
|
||||
Hugh Dickins, 4 June 2007
|
||||
|
@ -124,14 +124,19 @@ sys_immutable -- If set, ATTR_SYS attribute on FAT is handled as
|
||||
flush -- If set, the filesystem will try to flush to disk more
|
||||
early than normal. Not set by default.
|
||||
|
||||
rodir -- FAT has the ATTR_RO (read-only) attribute. But on Windows,
|
||||
the ATTR_RO of the directory will be just ignored actually,
|
||||
and is used by only applications as flag. E.g. it's setted
|
||||
for the customized folder.
|
||||
rodir -- FAT has the ATTR_RO (read-only) attribute. On Windows,
|
||||
the ATTR_RO of the directory will just be ignored,
|
||||
and is used only by applications as a flag (e.g. it's set
|
||||
for the customized folder).
|
||||
|
||||
If you want to use ATTR_RO as read-only flag even for
|
||||
the directory, set this option.
|
||||
|
||||
errors=panic|continue|remount-ro
|
||||
-- specify FAT behavior on critical errors: panic, continue
|
||||
without doing anything or remount the partition in
|
||||
read-only mode (default behavior).
|
||||
|
||||
<bool>: 0,1,yes,no,true,false
|
||||
|
||||
TODO
|
||||
|
@ -77,7 +77,8 @@
|
||||
seconds for the whole load operation.
|
||||
|
||||
- request_firmware_nowait() is also provided for convenience in
|
||||
non-user contexts.
|
||||
user contexts to request firmware asynchronously, but can't be called
|
||||
in atomic contexts.
|
||||
|
||||
|
||||
about in-kernel persistence:
|
||||
|
131
Documentation/futex-requeue-pi.txt
Normal file
131
Documentation/futex-requeue-pi.txt
Normal file
@ -0,0 +1,131 @@
|
||||
Futex Requeue PI
|
||||
----------------
|
||||
|
||||
Requeueing of tasks from a non-PI futex to a PI futex requires
|
||||
special handling in order to ensure the underlying rt_mutex is never
|
||||
left without an owner if it has waiters; doing so would break the PI
|
||||
boosting logic [see rt-mutex-desgin.txt] For the purposes of
|
||||
brevity, this action will be referred to as "requeue_pi" throughout
|
||||
this document. Priority inheritance is abbreviated throughout as
|
||||
"PI".
|
||||
|
||||
Motivation
|
||||
----------
|
||||
|
||||
Without requeue_pi, the glibc implementation of
|
||||
pthread_cond_broadcast() must resort to waking all the tasks waiting
|
||||
on a pthread_condvar and letting them try to sort out which task
|
||||
gets to run first in classic thundering-herd formation. An ideal
|
||||
implementation would wake the highest-priority waiter, and leave the
|
||||
rest to the natural wakeup inherent in unlocking the mutex
|
||||
associated with the condvar.
|
||||
|
||||
Consider the simplified glibc calls:
|
||||
|
||||
/* caller must lock mutex */
|
||||
pthread_cond_wait(cond, mutex)
|
||||
{
|
||||
lock(cond->__data.__lock);
|
||||
unlock(mutex);
|
||||
do {
|
||||
unlock(cond->__data.__lock);
|
||||
futex_wait(cond->__data.__futex);
|
||||
lock(cond->__data.__lock);
|
||||
} while(...)
|
||||
unlock(cond->__data.__lock);
|
||||
lock(mutex);
|
||||
}
|
||||
|
||||
pthread_cond_broadcast(cond)
|
||||
{
|
||||
lock(cond->__data.__lock);
|
||||
unlock(cond->__data.__lock);
|
||||
futex_requeue(cond->data.__futex, cond->mutex);
|
||||
}
|
||||
|
||||
Once pthread_cond_broadcast() requeues the tasks, the cond->mutex
|
||||
has waiters. Note that pthread_cond_wait() attempts to lock the
|
||||
mutex only after it has returned to user space. This will leave the
|
||||
underlying rt_mutex with waiters, and no owner, breaking the
|
||||
previously mentioned PI-boosting algorithms.
|
||||
|
||||
In order to support PI-aware pthread_condvar's, the kernel needs to
|
||||
be able to requeue tasks to PI futexes. This support implies that
|
||||
upon a successful futex_wait system call, the caller would return to
|
||||
user space already holding the PI futex. The glibc implementation
|
||||
would be modified as follows:
|
||||
|
||||
|
||||
/* caller must lock mutex */
|
||||
pthread_cond_wait_pi(cond, mutex)
|
||||
{
|
||||
lock(cond->__data.__lock);
|
||||
unlock(mutex);
|
||||
do {
|
||||
unlock(cond->__data.__lock);
|
||||
futex_wait_requeue_pi(cond->__data.__futex);
|
||||
lock(cond->__data.__lock);
|
||||
} while(...)
|
||||
unlock(cond->__data.__lock);
|
||||
/* the kernel acquired the the mutex for us */
|
||||
}
|
||||
|
||||
pthread_cond_broadcast_pi(cond)
|
||||
{
|
||||
lock(cond->__data.__lock);
|
||||
unlock(cond->__data.__lock);
|
||||
futex_requeue_pi(cond->data.__futex, cond->mutex);
|
||||
}
|
||||
|
||||
The actual glibc implementation will likely test for PI and make the
|
||||
necessary changes inside the existing calls rather than creating new
|
||||
calls for the PI cases. Similar changes are needed for
|
||||
pthread_cond_timedwait() and pthread_cond_signal().
|
||||
|
||||
Implementation
|
||||
--------------
|
||||
|
||||
In order to ensure the rt_mutex has an owner if it has waiters, it
|
||||
is necessary for both the requeue code, as well as the waiting code,
|
||||
to be able to acquire the rt_mutex before returning to user space.
|
||||
The requeue code cannot simply wake the waiter and leave it to
|
||||
acquire the rt_mutex as it would open a race window between the
|
||||
requeue call returning to user space and the waiter waking and
|
||||
starting to run. This is especially true in the uncontended case.
|
||||
|
||||
The solution involves two new rt_mutex helper routines,
|
||||
rt_mutex_start_proxy_lock() and rt_mutex_finish_proxy_lock(), which
|
||||
allow the requeue code to acquire an uncontended rt_mutex on behalf
|
||||
of the waiter and to enqueue the waiter on a contended rt_mutex.
|
||||
Two new system calls provide the kernel<->user interface to
|
||||
requeue_pi: FUTEX_WAIT_REQUEUE_PI and FUTEX_REQUEUE_CMP_PI.
|
||||
|
||||
FUTEX_WAIT_REQUEUE_PI is called by the waiter (pthread_cond_wait()
|
||||
and pthread_cond_timedwait()) to block on the initial futex and wait
|
||||
to be requeued to a PI-aware futex. The implementation is the
|
||||
result of a high-speed collision between futex_wait() and
|
||||
futex_lock_pi(), with some extra logic to check for the additional
|
||||
wake-up scenarios.
|
||||
|
||||
FUTEX_REQUEUE_CMP_PI is called by the waker
|
||||
(pthread_cond_broadcast() and pthread_cond_signal()) to requeue and
|
||||
possibly wake the waiting tasks. Internally, this system call is
|
||||
still handled by futex_requeue (by passing requeue_pi=1). Before
|
||||
requeueing, futex_requeue() attempts to acquire the requeue target
|
||||
PI futex on behalf of the top waiter. If it can, this waiter is
|
||||
woken. futex_requeue() then proceeds to requeue the remaining
|
||||
nr_wake+nr_requeue tasks to the PI futex, calling
|
||||
rt_mutex_start_proxy_lock() prior to each requeue to prepare the
|
||||
task as a waiter on the underlying rt_mutex. It is possible that
|
||||
the lock can be acquired at this stage as well, if so, the next
|
||||
waiter is woken to finish the acquisition of the lock.
|
||||
|
||||
FUTEX_REQUEUE_PI accepts nr_wake and nr_requeue as arguments, but
|
||||
their sum is all that really matters. futex_requeue() will wake or
|
||||
requeue up to nr_wake + nr_requeue tasks. It will wake only as many
|
||||
tasks as it can acquire the lock for, which in the majority of cases
|
||||
should be 0 as good programming practice dictates that the caller of
|
||||
either pthread_cond_broadcast() or pthread_cond_signal() acquire the
|
||||
mutex prior to making the call. FUTEX_REQUEUE_PI requires that
|
||||
nr_wake=1. nr_requeue should be INT_MAX for broadcast and 0 for
|
||||
signal.
|
@ -458,7 +458,7 @@ debugfs interface, since it provides control over GPIO direction and
|
||||
value instead of just showing a gpio state summary. Plus, it could be
|
||||
present on production systems without debugging support.
|
||||
|
||||
Given approprate hardware documentation for the system, userspace could
|
||||
Given appropriate hardware documentation for the system, userspace could
|
||||
know for example that GPIO #23 controls the write protect line used to
|
||||
protect boot loader segments in flash memory. System upgrade procedures
|
||||
may need to temporarily remove that protection, first importing a GPIO,
|
||||
|
@ -2,14 +2,18 @@ Kernel driver f71882fg
|
||||
======================
|
||||
|
||||
Supported chips:
|
||||
* Fintek F71882FG and F71883FG
|
||||
Prefix: 'f71882fg'
|
||||
* Fintek F71858FG
|
||||
Prefix: 'f71858fg'
|
||||
Addresses scanned: none, address read from Super I/O config space
|
||||
Datasheet: Available from the Fintek website
|
||||
* Fintek F71862FG and F71863FG
|
||||
Prefix: 'f71862fg'
|
||||
Addresses scanned: none, address read from Super I/O config space
|
||||
Datasheet: Available from the Fintek website
|
||||
* Fintek F71882FG and F71883FG
|
||||
Prefix: 'f71882fg'
|
||||
Addresses scanned: none, address read from Super I/O config space
|
||||
Datasheet: Available from the Fintek website
|
||||
* Fintek F8000
|
||||
Prefix: 'f8000'
|
||||
Addresses scanned: none, address read from Super I/O config space
|
||||
@ -66,13 +70,13 @@ printed when loading the driver.
|
||||
|
||||
Three different fan control modes are supported; the mode number is written
|
||||
to the pwm#_enable file. Note that not all modes are supported on all
|
||||
chips, and some modes may only be available in RPM / PWM mode on the F8000.
|
||||
chips, and some modes may only be available in RPM / PWM mode.
|
||||
Writing an unsupported mode will result in an invalid parameter error.
|
||||
|
||||
* 1: Manual mode
|
||||
You ask for a specific PWM duty cycle / DC voltage or a specific % of
|
||||
fan#_full_speed by writing to the pwm# file. This mode is only
|
||||
available on the F8000 if the fan channel is in RPM mode.
|
||||
available on the F71858FG / F8000 if the fan channel is in RPM mode.
|
||||
|
||||
* 2: Normal auto mode
|
||||
You can define a number of temperature/fan speed trip points, which % the
|
||||
|
@ -7,7 +7,7 @@ henceforth as AEM.
|
||||
Supported systems:
|
||||
* Any recent IBM System X server with AEM support.
|
||||
This includes the x3350, x3550, x3650, x3655, x3755, x3850 M2,
|
||||
x3950 M2, and certain HS2x/LS2x/QS2x blades. The IPMI host interface
|
||||
x3950 M2, and certain HC10/HS2x/LS2x/QS2x blades. The IPMI host interface
|
||||
driver ("ipmi-si") needs to be loaded for this driver to do anything.
|
||||
Prefix: 'ibmaem'
|
||||
Datasheet: Not available
|
||||
|
@ -70,6 +70,7 @@ are interpreted as 0! For more on how written strings are interpreted see the
|
||||
[0-*] denotes any positive number starting from 0
|
||||
[1-*] denotes any positive number starting from 1
|
||||
RO read only value
|
||||
WO write only value
|
||||
RW read/write value
|
||||
|
||||
Read/write values may be read-only for some chips, depending on the
|
||||
@ -150,6 +151,11 @@ fan[1-*]_min Fan minimum value
|
||||
Unit: revolution/min (RPM)
|
||||
RW
|
||||
|
||||
fan[1-*]_max Fan maximum value
|
||||
Unit: revolution/min (RPM)
|
||||
Only rarely supported by the hardware.
|
||||
RW
|
||||
|
||||
fan[1-*]_input Fan input value.
|
||||
Unit: revolution/min (RPM)
|
||||
RO
|
||||
@ -290,6 +296,24 @@ temp[1-*]_label Suggested temperature channel label.
|
||||
user-space.
|
||||
RO
|
||||
|
||||
temp[1-*]_lowest
|
||||
Historical minimum temperature
|
||||
Unit: millidegree Celsius
|
||||
RO
|
||||
|
||||
temp[1-*]_highest
|
||||
Historical maximum temperature
|
||||
Unit: millidegree Celsius
|
||||
RO
|
||||
|
||||
temp[1-*]_reset_history
|
||||
Reset temp_lowest and temp_highest
|
||||
WO
|
||||
|
||||
temp_reset_history
|
||||
Reset temp_lowest and temp_highest for all sensors
|
||||
WO
|
||||
|
||||
Some chips measure temperature using external thermistors and an ADC, and
|
||||
report the temperature measurement as a voltage. Converting this voltage
|
||||
back to a temperature (or the other way around for limits) requires
|
||||
@ -390,6 +414,7 @@ OR
|
||||
in[0-*]_min_alarm
|
||||
in[0-*]_max_alarm
|
||||
fan[1-*]_min_alarm
|
||||
fan[1-*]_max_alarm
|
||||
temp[1-*]_min_alarm
|
||||
temp[1-*]_max_alarm
|
||||
temp[1-*]_crit_alarm
|
||||
|
42
Documentation/hwmon/tmp401
Normal file
42
Documentation/hwmon/tmp401
Normal file
@ -0,0 +1,42 @@
|
||||
Kernel driver tmp401
|
||||
====================
|
||||
|
||||
Supported chips:
|
||||
* Texas Instruments TMP401
|
||||
Prefix: 'tmp401'
|
||||
Addresses scanned: I2C 0x4c
|
||||
Datasheet: http://focus.ti.com/docs/prod/folders/print/tmp401.html
|
||||
* Texas Instruments TMP411
|
||||
Prefix: 'tmp411'
|
||||
Addresses scanned: I2C 0x4c
|
||||
Datasheet: http://focus.ti.com/docs/prod/folders/print/tmp411.html
|
||||
|
||||
Authors:
|
||||
Hans de Goede <hdegoede@redhat.com>
|
||||
Andre Prendel <andre.prendel@gmx.de>
|
||||
|
||||
Description
|
||||
-----------
|
||||
|
||||
This driver implements support for Texas Instruments TMP401 and
|
||||
TMP411 chips. These chips implements one remote and one local
|
||||
temperature sensor. Temperature is measured in degrees
|
||||
Celsius. Resolution of the remote sensor is 0.0625 degree. Local
|
||||
sensor resolution can be set to 0.5, 0.25, 0.125 or 0.0625 degree (not
|
||||
supported by the driver so far, so using the default resolution of 0.5
|
||||
degree).
|
||||
|
||||
The driver provides the common sysfs-interface for temperatures (see
|
||||
/Documentation/hwmon/sysfs-interface under Temperatures).
|
||||
|
||||
The TMP411 chip is compatible with TMP401. It provides some additional
|
||||
features.
|
||||
|
||||
* Minimum and Maximum temperature measured since power-on, chip-reset
|
||||
|
||||
Exported via sysfs attributes tempX_lowest and tempX_highest.
|
||||
|
||||
* Reset of historical minimum/maximum temperature measurements
|
||||
|
||||
Exported via sysfs attribute temp_reset_history. Writing 1 to this
|
||||
file triggers a reset.
|
@ -12,6 +12,10 @@ Supported chips:
|
||||
Addresses scanned: ISA address retrieved from Super I/O registers
|
||||
Datasheet:
|
||||
http://www.nuvoton.com.tw/NR/rdonlyres/7885623D-A487-4CF9-A47F-30C5F73D6FE6/0/W83627DHG.pdf
|
||||
* Winbond W83627DHG-P
|
||||
Prefix: 'w83627dhg'
|
||||
Addresses scanned: ISA address retrieved from Super I/O registers
|
||||
Datasheet: not available
|
||||
* Winbond W83667HG
|
||||
Prefix: 'w83667hg'
|
||||
Addresses scanned: ISA address retrieved from Super I/O registers
|
||||
@ -28,8 +32,8 @@ Description
|
||||
-----------
|
||||
|
||||
This driver implements support for the Winbond W83627EHF, W83627EHG,
|
||||
W83627DHG and W83667HG super I/O chips. We will refer to them collectively
|
||||
as Winbond chips.
|
||||
W83627DHG, W83627DHG-P and W83667HG super I/O chips. We will refer to them
|
||||
collectively as Winbond chips.
|
||||
|
||||
The chips implement three temperature sensors, five fan rotation
|
||||
speed sensors, ten analog voltage sensors (only nine for the 627DHG), one
|
||||
@ -135,3 +139,6 @@ done in the driver for all register addresses.
|
||||
The DHG also supports PECI, where the DHG queries Intel CPU temperatures, and
|
||||
the ICH8 southbridge gets that data via PECI from the DHG, so that the
|
||||
southbridge drives the fans. And the DHG supports SST, a one-wire serial bus.
|
||||
|
||||
The DHG-P has an additional automatic fan speed control mode named Smart Fan
|
||||
(TM) III+. This mode is not yet supported by the driver.
|
||||
|
@ -20,6 +20,8 @@ platform_device with the base address and interrupt number. The
|
||||
dev.platform_data of the device should also point to a struct
|
||||
ocores_i2c_platform_data (see linux/i2c-ocores.h) describing the
|
||||
distance between registers and the input clock speed.
|
||||
There is also a possibility to attach a list of i2c_board_info which
|
||||
the i2c-ocores driver will add to the bus upon creation.
|
||||
|
||||
E.G. something like:
|
||||
|
||||
@ -36,9 +38,24 @@ static struct resource ocores_resources[] = {
|
||||
},
|
||||
};
|
||||
|
||||
/* optional board info */
|
||||
struct i2c_board_info ocores_i2c_board_info[] = {
|
||||
{
|
||||
I2C_BOARD_INFO("tsc2003", 0x48),
|
||||
.platform_data = &tsc2003_platform_data,
|
||||
.irq = TSC_IRQ
|
||||
},
|
||||
{
|
||||
I2C_BOARD_INFO("adv7180", 0x42 >> 1),
|
||||
.irq = ADV_IRQ
|
||||
}
|
||||
};
|
||||
|
||||
static struct ocores_i2c_platform_data myi2c_data = {
|
||||
.regstep = 2, /* two bytes between registers */
|
||||
.clock_khz = 50000, /* input clock of 50MHz */
|
||||
.devices = ocores_i2c_board_info, /* optional table of devices */
|
||||
.num_devices = ARRAY_SIZE(ocores_i2c_board_info), /* table size */
|
||||
};
|
||||
|
||||
static struct platform_device myi2c = {
|
||||
|
@ -19,6 +19,9 @@ Supported adapters:
|
||||
* VIA Technologies, Inc. VX800/VX820
|
||||
Datasheet: available on http://linux.via.com.tw
|
||||
|
||||
* VIA Technologies, Inc. VX855/VX875
|
||||
Datasheet: Availability unknown
|
||||
|
||||
Authors:
|
||||
Kyösti Mälkki <kmalkki@cc.hut.fi>,
|
||||
Mark D. Studebaker <mdsxyz123@yahoo.com>,
|
||||
@ -53,6 +56,7 @@ Your lspci -n listing must show one of these :
|
||||
device 1106:3287 (VT8251)
|
||||
device 1106:8324 (CX700)
|
||||
device 1106:8353 (VX800/VX820)
|
||||
device 1106:8409 (VX855/VX875)
|
||||
|
||||
If none of these show up, you should look in the BIOS for settings like
|
||||
enable ACPI / SMBus or even USB.
|
||||
|
@ -216,6 +216,8 @@ Other kernel parameters for ide_core are:
|
||||
|
||||
* "noflush=[interface_number.device_number]" to disable flush requests
|
||||
|
||||
* "nohpa=[interface_number.device_number]" to disable Host Protected Area
|
||||
|
||||
* "noprobe=[interface_number.device_number]" to skip probing
|
||||
|
||||
* "nowerr=[interface_number.device_number]" to ignore the WRERR_STAT bit
|
||||
|
@ -18,8 +18,12 @@ Usage
|
||||
Anonymous finger details are sent sequentially as separate packets of ABS
|
||||
events. Only the ABS_MT events are recognized as part of a finger
|
||||
packet. The end of a packet is marked by calling the input_mt_sync()
|
||||
function, which generates a SYN_MT_REPORT event. The end of multi-touch
|
||||
transfer is marked by calling the usual input_sync() function.
|
||||
function, which generates a SYN_MT_REPORT event. This instructs the
|
||||
receiver to accept the data for the current finger and prepare to receive
|
||||
another. The end of a multi-touch transfer is marked by calling the usual
|
||||
input_sync() function. This instructs the receiver to act upon events
|
||||
accumulated since last EV_SYN/SYN_REPORT and prepare to receive a new
|
||||
set of events/packets.
|
||||
|
||||
A set of ABS_MT events with the desired properties is defined. The events
|
||||
are divided into categories, to allow for partial implementation. The
|
||||
@ -27,11 +31,26 @@ minimum set consists of ABS_MT_TOUCH_MAJOR, ABS_MT_POSITION_X and
|
||||
ABS_MT_POSITION_Y, which allows for multiple fingers to be tracked. If the
|
||||
device supports it, the ABS_MT_WIDTH_MAJOR may be used to provide the size
|
||||
of the approaching finger. Anisotropy and direction may be specified with
|
||||
ABS_MT_TOUCH_MINOR, ABS_MT_WIDTH_MINOR and ABS_MT_ORIENTATION. Devices with
|
||||
more granular information may specify general shapes as blobs, i.e., as a
|
||||
sequence of rectangular shapes grouped together by an
|
||||
ABS_MT_BLOB_ID. Finally, the ABS_MT_TOOL_TYPE may be used to specify
|
||||
whether the touching tool is a finger or a pen or something else.
|
||||
ABS_MT_TOUCH_MINOR, ABS_MT_WIDTH_MINOR and ABS_MT_ORIENTATION. The
|
||||
ABS_MT_TOOL_TYPE may be used to specify whether the touching tool is a
|
||||
finger or a pen or something else. Devices with more granular information
|
||||
may specify general shapes as blobs, i.e., as a sequence of rectangular
|
||||
shapes grouped together by an ABS_MT_BLOB_ID. Finally, for the few devices
|
||||
that currently support it, the ABS_MT_TRACKING_ID event may be used to
|
||||
report finger tracking from hardware [5].
|
||||
|
||||
Here is what a minimal event sequence for a two-finger touch would look
|
||||
like:
|
||||
|
||||
ABS_MT_TOUCH_MAJOR
|
||||
ABS_MT_POSITION_X
|
||||
ABS_MT_POSITION_Y
|
||||
SYN_MT_REPORT
|
||||
ABS_MT_TOUCH_MAJOR
|
||||
ABS_MT_POSITION_X
|
||||
ABS_MT_POSITION_Y
|
||||
SYN_MT_REPORT
|
||||
SYN_REPORT
|
||||
|
||||
|
||||
Event Semantics
|
||||
@ -44,24 +63,24 @@ ABS_MT_TOUCH_MAJOR
|
||||
|
||||
The length of the major axis of the contact. The length should be given in
|
||||
surface units. If the surface has an X times Y resolution, the largest
|
||||
possible value of ABS_MT_TOUCH_MAJOR is sqrt(X^2 + Y^2), the diagonal.
|
||||
possible value of ABS_MT_TOUCH_MAJOR is sqrt(X^2 + Y^2), the diagonal [4].
|
||||
|
||||
ABS_MT_TOUCH_MINOR
|
||||
|
||||
The length, in surface units, of the minor axis of the contact. If the
|
||||
contact is circular, this event can be omitted.
|
||||
contact is circular, this event can be omitted [4].
|
||||
|
||||
ABS_MT_WIDTH_MAJOR
|
||||
|
||||
The length, in surface units, of the major axis of the approaching
|
||||
tool. This should be understood as the size of the tool itself. The
|
||||
orientation of the contact and the approaching tool are assumed to be the
|
||||
same.
|
||||
same [4].
|
||||
|
||||
ABS_MT_WIDTH_MINOR
|
||||
|
||||
The length, in surface units, of the minor axis of the approaching
|
||||
tool. Omit if circular.
|
||||
tool. Omit if circular [4].
|
||||
|
||||
The above four values can be used to derive additional information about
|
||||
the contact. The ratio ABS_MT_TOUCH_MAJOR / ABS_MT_WIDTH_MAJOR approximates
|
||||
@ -70,14 +89,17 @@ different characteristic widths [1].
|
||||
|
||||
ABS_MT_ORIENTATION
|
||||
|
||||
The orientation of the ellipse. The value should describe half a revolution
|
||||
clockwise around the touch center. The scale of the value is arbitrary, but
|
||||
zero should be returned for an ellipse aligned along the Y axis of the
|
||||
surface. As an example, an index finger placed straight onto the axis could
|
||||
return zero orientation, something negative when twisted to the left, and
|
||||
something positive when twisted to the right. This value can be omitted if
|
||||
the touching object is circular, or if the information is not available in
|
||||
the kernel driver.
|
||||
The orientation of the ellipse. The value should describe a signed quarter
|
||||
of a revolution clockwise around the touch center. The signed value range
|
||||
is arbitrary, but zero should be returned for a finger aligned along the Y
|
||||
axis of the surface, a negative value when finger is turned to the left, and
|
||||
a positive value when finger turned to the right. When completely aligned with
|
||||
the X axis, the range max should be returned. Orientation can be omitted
|
||||
if the touching object is circular, or if the information is not available
|
||||
in the kernel driver. Partial orientation support is possible if the device
|
||||
can distinguish between the two axis, but not (uniquely) any values in
|
||||
between. In such cases, the range of ABS_MT_ORIENTATION should be [0, 1]
|
||||
[4].
|
||||
|
||||
ABS_MT_POSITION_X
|
||||
|
||||
@ -98,8 +120,35 @@ ABS_MT_BLOB_ID
|
||||
|
||||
The BLOB_ID groups several packets together into one arbitrarily shaped
|
||||
contact. This is a low-level anonymous grouping, and should not be confused
|
||||
with the high-level contactID, explained below. Most kernel drivers will
|
||||
not have this capability, and can safely omit the event.
|
||||
with the high-level trackingID [5]. Most kernel drivers will not have blob
|
||||
capability, and can safely omit the event.
|
||||
|
||||
ABS_MT_TRACKING_ID
|
||||
|
||||
The TRACKING_ID identifies an initiated contact throughout its life cycle
|
||||
[5]. There are currently only a few devices that support it, so this event
|
||||
should normally be omitted.
|
||||
|
||||
|
||||
Event Computation
|
||||
-----------------
|
||||
|
||||
The flora of different hardware unavoidably leads to some devices fitting
|
||||
better to the MT protocol than others. To simplify and unify the mapping,
|
||||
this section gives recipes for how to compute certain events.
|
||||
|
||||
For devices reporting contacts as rectangular shapes, signed orientation
|
||||
cannot be obtained. Assuming X and Y are the lengths of the sides of the
|
||||
touching rectangle, here is a simple formula that retains the most
|
||||
information possible:
|
||||
|
||||
ABS_MT_TOUCH_MAJOR := max(X, Y)
|
||||
ABS_MT_TOUCH_MINOR := min(X, Y)
|
||||
ABS_MT_ORIENTATION := bool(X > Y)
|
||||
|
||||
The range of ABS_MT_ORIENTATION should be set to [0, 1], to indicate that
|
||||
the device can distinguish between a finger along the Y axis (0) and a
|
||||
finger along the X axis (1).
|
||||
|
||||
|
||||
Finger Tracking
|
||||
@ -109,14 +158,18 @@ The kernel driver should generate an arbitrary enumeration of the set of
|
||||
anonymous contacts currently on the surface. The order in which the packets
|
||||
appear in the event stream is not important.
|
||||
|
||||
The process of finger tracking, i.e., to assign a unique contactID to each
|
||||
The process of finger tracking, i.e., to assign a unique trackingID to each
|
||||
initiated contact on the surface, is left to user space; preferably the
|
||||
multi-touch X driver [3]. In that driver, the contactID stays the same and
|
||||
multi-touch X driver [3]. In that driver, the trackingID stays the same and
|
||||
unique until the contact vanishes (when the finger leaves the surface). The
|
||||
problem of assigning a set of anonymous fingers to a set of identified
|
||||
fingers is a euclidian bipartite matching problem at each event update, and
|
||||
relies on a sufficiently rapid update rate.
|
||||
|
||||
There are a few devices that support trackingID in hardware. User space can
|
||||
make use of these native identifiers to reduce bandwidth and cpu usage.
|
||||
|
||||
|
||||
Notes
|
||||
-----
|
||||
|
||||
@ -136,5 +189,7 @@ could be used to derive tilt.
|
||||
time of writing (April 2009), the MT protocol is not yet merged, and the
|
||||
prototype implements finger matching, basic mouse support and two-finger
|
||||
scrolling. The project aims at improving the quality of current multi-touch
|
||||
functionality available in the synaptics X driver, and in addition
|
||||
functionality available in the Synaptics X driver, and in addition
|
||||
implement more advanced gestures.
|
||||
[4] See the section on event computation.
|
||||
[5] See the section on finger tracking.
|
||||
|
@ -22,16 +22,11 @@ README.gigaset
|
||||
- info on the drivers for Siemens Gigaset ISDN adapters.
|
||||
README.icn
|
||||
- info on the ICN-ISDN-card and its driver.
|
||||
>>>>>>> 93af7aca44f0e82e67bda10a0fb73d383edcc8bd:Documentation/isdn/00-INDEX
|
||||
README.HiSax
|
||||
- info on the HiSax driver which replaces the old teles.
|
||||
README.hfc-pci
|
||||
- info on hfc-pci based cards.
|
||||
README.pcbit
|
||||
- info on the PCBIT-D ISDN adapter and driver.
|
||||
README.syncppp
|
||||
- info on running Sync PPP over ISDN.
|
||||
syncPPP.FAQ
|
||||
- frequently asked questions about running PPP over ISDN.
|
||||
README.audio
|
||||
- info for running audio over ISDN.
|
||||
README.avmb1
|
||||
- info on driver for AVM-B1 ISDN card.
|
||||
README.act2000
|
||||
@ -42,10 +37,28 @@ README.concap
|
||||
- info on "CONCAP" encapsulation protocol interface used for X.25.
|
||||
README.diversion
|
||||
- info on module for isdn diversion services.
|
||||
README.fax
|
||||
- info for using Fax over ISDN.
|
||||
README.gigaset
|
||||
- info on the drivers for Siemens Gigaset ISDN adapters
|
||||
README.hfc-pci
|
||||
- info on hfc-pci based cards.
|
||||
README.hysdn
|
||||
- info on driver for Hypercope active HYSDN cards
|
||||
README.icn
|
||||
- info on the ICN-ISDN-card and its driver.
|
||||
README.mISDN
|
||||
- info on the Modular ISDN subsystem (mISDN)
|
||||
README.pcbit
|
||||
- info on the PCBIT-D ISDN adapter and driver.
|
||||
README.sc
|
||||
- info on driver for Spellcaster cards.
|
||||
README.syncppp
|
||||
- info on running Sync PPP over ISDN.
|
||||
README.x25
|
||||
- info for running X.25 over ISDN.
|
||||
syncPPP.FAQ
|
||||
- frequently asked questions about running PPP over ISDN.
|
||||
README.hysdn
|
||||
- info on driver for Hypercope active HYSDN cards
|
||||
README.mISDN
|
||||
|
@ -45,7 +45,7 @@ From then on, Kernel CAPI may call the registered callback functions for the
|
||||
device.
|
||||
|
||||
If the device becomes unusable for any reason (shutdown, disconnect ...), the
|
||||
driver has to call capi_ctr_reseted(). This will prevent further calls to the
|
||||
driver has to call capi_ctr_down(). This will prevent further calls to the
|
||||
callback functions by Kernel CAPI.
|
||||
|
||||
|
||||
@ -114,20 +114,36 @@ char *driver_name
|
||||
int (*load_firmware)(struct capi_ctr *ctrlr, capiloaddata *ldata)
|
||||
(optional) pointer to a callback function for sending firmware and
|
||||
configuration data to the device
|
||||
Return value: 0 on success, error code on error
|
||||
Called in process context.
|
||||
|
||||
void (*reset_ctr)(struct capi_ctr *ctrlr)
|
||||
pointer to a callback function for performing a reset on the device,
|
||||
releasing all registered applications
|
||||
(optional) pointer to a callback function for performing a reset on
|
||||
the device, releasing all registered applications
|
||||
Called in process context.
|
||||
|
||||
void (*register_appl)(struct capi_ctr *ctrlr, u16 applid,
|
||||
capi_register_params *rparam)
|
||||
void (*release_appl)(struct capi_ctr *ctrlr, u16 applid)
|
||||
pointers to callback functions for registration and deregistration of
|
||||
applications with the device
|
||||
Calls to these functions are serialized by Kernel CAPI so that only
|
||||
one call to any of them is active at any time.
|
||||
|
||||
u16 (*send_message)(struct capi_ctr *ctrlr, struct sk_buff *skb)
|
||||
pointer to a callback function for sending a CAPI message to the
|
||||
device
|
||||
Return value: CAPI error code
|
||||
If the method returns 0 (CAPI_NOERROR) the driver has taken ownership
|
||||
of the skb and the caller may no longer access it. If it returns a
|
||||
non-zero (error) value then ownership of the skb returns to the caller
|
||||
who may reuse or free it.
|
||||
The return value should only be used to signal problems with respect
|
||||
to accepting or queueing the message. Errors occurring during the
|
||||
actual processing of the message should be signaled with an
|
||||
appropriate reply message.
|
||||
Calls to this function are not serialized by Kernel CAPI, ie. it must
|
||||
be prepared to be re-entered.
|
||||
|
||||
char *(*procinfo)(struct capi_ctr *ctrlr)
|
||||
pointer to a callback function returning the entry for the device in
|
||||
@ -138,6 +154,8 @@ read_proc_t *ctr_read_proc
|
||||
system entry, /proc/capi/controllers/<n>; will be called with a
|
||||
pointer to the device's capi_ctr structure as the last (data) argument
|
||||
|
||||
Note: Callback functions are never called in interrupt context.
|
||||
|
||||
- to be filled in before calling capi_ctr_ready():
|
||||
|
||||
u8 manu[CAPI_MANUFACTURER_LEN]
|
||||
@ -153,6 +171,45 @@ u8 serial[CAPI_SERIAL_LEN]
|
||||
value to return for CAPI_GET_SERIAL
|
||||
|
||||
|
||||
4.3 The _cmsg Structure
|
||||
|
||||
(declared in <linux/isdn/capiutil.h>)
|
||||
|
||||
The _cmsg structure stores the contents of a CAPI 2.0 message in an easily
|
||||
accessible form. It contains members for all possible CAPI 2.0 parameters, of
|
||||
which only those appearing in the message type currently being processed are
|
||||
actually used. Unused members should be set to zero.
|
||||
|
||||
Members are named after the CAPI 2.0 standard names of the parameters they
|
||||
represent. See <linux/isdn/capiutil.h> for the exact spelling. Member data
|
||||
types are:
|
||||
|
||||
u8 for CAPI parameters of type 'byte'
|
||||
|
||||
u16 for CAPI parameters of type 'word'
|
||||
|
||||
u32 for CAPI parameters of type 'dword'
|
||||
|
||||
_cstruct for CAPI parameters of type 'struct' not containing any
|
||||
variably-sized (struct) subparameters (eg. 'Called Party Number')
|
||||
The member is a pointer to a buffer containing the parameter in
|
||||
CAPI encoding (length + content). It may also be NULL, which will
|
||||
be taken to represent an empty (zero length) parameter.
|
||||
|
||||
_cmstruct for CAPI parameters of type 'struct' containing 'struct'
|
||||
subparameters ('Additional Info' and 'B Protocol')
|
||||
The representation is a single byte containing one of the values:
|
||||
CAPI_DEFAULT: the parameter is empty
|
||||
CAPI_COMPOSE: the values of the subparameters are stored
|
||||
individually in the corresponding _cmsg structure members
|
||||
|
||||
Functions capi_cmsg2message() and capi_message2cmsg() are provided to convert
|
||||
messages between their transport encoding described in the CAPI 2.0 standard
|
||||
and their _cmsg structure representation. Note that capi_cmsg2message() does
|
||||
not know or check the size of its destination buffer. The caller must make
|
||||
sure it is big enough to accomodate the resulting CAPI message.
|
||||
|
||||
|
||||
5. Lower Layer Interface Functions
|
||||
|
||||
(declared in <linux/isdn/capilli.h>)
|
||||
@ -166,7 +223,7 @@ int detach_capi_ctr(struct capi_ctr *ctrlr)
|
||||
register/unregister a device (controller) with Kernel CAPI
|
||||
|
||||
void capi_ctr_ready(struct capi_ctr *ctrlr)
|
||||
void capi_ctr_reseted(struct capi_ctr *ctrlr)
|
||||
void capi_ctr_down(struct capi_ctr *ctrlr)
|
||||
signal controller ready/not ready
|
||||
|
||||
void capi_ctr_suspend_output(struct capi_ctr *ctrlr)
|
||||
@ -211,3 +268,32 @@ CAPIMSG_CONTROL(m) CAPIMSG_SETCONTROL(m, contr) Controller/PLCI/NCCI
|
||||
(u32)
|
||||
CAPIMSG_DATALEN(m) CAPIMSG_SETDATALEN(m, len) Data Length (u16)
|
||||
|
||||
|
||||
Library functions for working with _cmsg structures
|
||||
(from <linux/isdn/capiutil.h>):
|
||||
|
||||
unsigned capi_cmsg2message(_cmsg *cmsg, u8 *msg)
|
||||
Assembles a CAPI 2.0 message from the parameters in *cmsg, storing the
|
||||
result in *msg.
|
||||
|
||||
unsigned capi_message2cmsg(_cmsg *cmsg, u8 *msg)
|
||||
Disassembles the CAPI 2.0 message in *msg, storing the parameters in
|
||||
*cmsg.
|
||||
|
||||
unsigned capi_cmsg_header(_cmsg *cmsg, u16 ApplId, u8 Command, u8 Subcommand,
|
||||
u16 Messagenumber, u32 Controller)
|
||||
Fills the header part and address field of the _cmsg structure *cmsg
|
||||
with the given values, zeroing the remainder of the structure so only
|
||||
parameters with non-default values need to be changed before sending
|
||||
the message.
|
||||
|
||||
void capi_cmsg_answer(_cmsg *cmsg)
|
||||
Sets the low bit of the Subcommand field in *cmsg, thereby converting
|
||||
_REQ to _CONF and _IND to _RESP.
|
||||
|
||||
char *capi_cmd2str(u8 Command, u8 Subcommand)
|
||||
Returns the CAPI 2.0 message name corresponding to the given command
|
||||
and subcommand values, as a static ASCII string. The return value may
|
||||
be NULL if the command/subcommand is not one of those defined in the
|
||||
CAPI 2.0 standard.
|
||||
|
||||
|
@ -149,10 +149,8 @@ GigaSet 307x Device Driver
|
||||
configuration files and chat scripts in the gigaset-VERSION/ppp directory
|
||||
in the driver packages from http://sourceforge.net/projects/gigaset307x/.
|
||||
Please note that the USB drivers are not able to change the state of the
|
||||
control lines (the M105 driver can be configured to use some undocumented
|
||||
control requests, if you really need the control lines, though). This means
|
||||
you must use "Stupid Mode" if you are using wvdial or you should use the
|
||||
nocrtscts option of pppd.
|
||||
control lines. This means you must use "Stupid Mode" if you are using
|
||||
wvdial or you should use the nocrtscts option of pppd.
|
||||
You must also assure that the ppp_async module is loaded with the parameter
|
||||
flag_time=0. You can do this e.g. by adding a line like
|
||||
|
||||
@ -190,20 +188,19 @@ GigaSet 307x Device Driver
|
||||
You can also use /sys/class/tty/ttyGxy/cidmode for changing the CID mode
|
||||
setting (ttyGxy is ttyGU0 or ttyGB0).
|
||||
|
||||
2.6. M105 Undocumented USB Requests
|
||||
------------------------------
|
||||
|
||||
The Gigaset M105 USB data box understands a couple of useful, but
|
||||
undocumented USB commands. These requests are not used in normal
|
||||
operation (for wireless access to the base), but are needed for access
|
||||
to the M105's own configuration mode (registration to the base, baudrate
|
||||
and line format settings, device status queries) via the gigacontr
|
||||
utility. Their use is controlled by the kernel configuration option
|
||||
"Support for undocumented USB requests" (CONFIG_GIGASET_UNDOCREQ). If you
|
||||
encounter error code -ENOTTY when trying to use some features of the
|
||||
M105, try setting that option to "y" via 'make {x,menu}config' and
|
||||
recompiling the driver.
|
||||
2.6. Unregistered Wireless Devices (M101/M105)
|
||||
-----------------------------------------
|
||||
The main purpose of the ser_gigaset and usb_gigaset drivers is to allow
|
||||
the M101 and M105 wireless devices to be used as ISDN devices for ISDN
|
||||
connections through a Gigaset base. Therefore they assume that the device
|
||||
is registered to a DECT base.
|
||||
|
||||
If the M101/M105 device is not registered to a base, initialization of
|
||||
the device fails, and a corresponding error message is logged by the
|
||||
driver. In that situation, a restricted set of functions is available
|
||||
which includes, in particular, those necessary for registering the device
|
||||
to a base or for switching it between Fixed Part and Portable Part
|
||||
modes.
|
||||
|
||||
3. Troubleshooting
|
||||
---------------
|
||||
@ -234,11 +231,12 @@ GigaSet 307x Device Driver
|
||||
Select Unimodem mode for all DECT data adapters. (see section 2.4.)
|
||||
|
||||
Problem:
|
||||
You want to configure your USB DECT data adapter (M105) but gigacontr
|
||||
reports an error: "/dev/ttyGU0: Inappropriate ioctl for device".
|
||||
Messages like this:
|
||||
usb_gigaset 3-2:1.0: Could not initialize the device.
|
||||
appear in your syslog.
|
||||
Solution:
|
||||
Recompile the usb_gigaset driver with the kernel configuration option
|
||||
CONFIG_GIGASET_UNDOCREQ set to 'y'. (see section 2.6.)
|
||||
Check whether your M10x wireless device is correctly registered to the
|
||||
Gigaset base. (see section 2.6.)
|
||||
|
||||
3.2. Telling the driver to provide more information
|
||||
----------------------------------------------
|
||||
|
@ -35,6 +35,79 @@ new .config files to see the differences:
|
||||
|
||||
(Yes, we need something better here.)
|
||||
|
||||
______________________________________________________________________
|
||||
Environment variables for '*config'
|
||||
|
||||
KCONFIG_CONFIG
|
||||
--------------------------------------------------
|
||||
This environment variable can be used to specify a default kernel config
|
||||
file name to override the default name of ".config".
|
||||
|
||||
KCONFIG_OVERWRITECONFIG
|
||||
--------------------------------------------------
|
||||
If you set KCONFIG_OVERWRITECONFIG in the environment, Kconfig will not
|
||||
break symlinks when .config is a symlink to somewhere else.
|
||||
|
||||
KCONFIG_NOTIMESTAMP
|
||||
--------------------------------------------------
|
||||
If this environment variable exists and is non-null, the timestamp line
|
||||
in generated .config files is omitted.
|
||||
|
||||
______________________________________________________________________
|
||||
Environment variables for '{allyes/allmod/allno/rand}config'
|
||||
|
||||
KCONFIG_ALLCONFIG
|
||||
--------------------------------------------------
|
||||
(partially based on lkml email from/by Rob Landley, re: miniconfig)
|
||||
--------------------------------------------------
|
||||
The allyesconfig/allmodconfig/allnoconfig/randconfig variants can
|
||||
also use the environment variable KCONFIG_ALLCONFIG as a flag or a
|
||||
filename that contains config symbols that the user requires to be
|
||||
set to a specific value. If KCONFIG_ALLCONFIG is used without a
|
||||
filename, "make *config" checks for a file named
|
||||
"all{yes/mod/no/random}.config" (corresponding to the *config command
|
||||
that was used) for symbol values that are to be forced. If this file
|
||||
is not found, it checks for a file named "all.config" to contain forced
|
||||
values.
|
||||
|
||||
This enables you to create "miniature" config (miniconfig) or custom
|
||||
config files containing just the config symbols that you are interested
|
||||
in. Then the kernel config system generates the full .config file,
|
||||
including symbols of your miniconfig file.
|
||||
|
||||
This 'KCONFIG_ALLCONFIG' file is a config file which contains
|
||||
(usually a subset of all) preset config symbols. These variable
|
||||
settings are still subject to normal dependency checks.
|
||||
|
||||
Examples:
|
||||
KCONFIG_ALLCONFIG=custom-notebook.config make allnoconfig
|
||||
or
|
||||
KCONFIG_ALLCONFIG=mini.config make allnoconfig
|
||||
or
|
||||
make KCONFIG_ALLCONFIG=mini.config allnoconfig
|
||||
|
||||
These examples will disable most options (allnoconfig) but enable or
|
||||
disable the options that are explicitly listed in the specified
|
||||
mini-config files.
|
||||
|
||||
______________________________________________________________________
|
||||
Environment variables for 'silentoldconfig'
|
||||
|
||||
KCONFIG_NOSILENTUPDATE
|
||||
--------------------------------------------------
|
||||
If this variable has a non-blank value, it prevents silent kernel
|
||||
config udpates (requires explicit updates).
|
||||
|
||||
KCONFIG_AUTOCONFIG
|
||||
--------------------------------------------------
|
||||
This environment variable can be set to specify the path & name of the
|
||||
"auto.conf" file. Its default value is "include/config/auto.conf".
|
||||
|
||||
KCONFIG_AUTOHEADER
|
||||
--------------------------------------------------
|
||||
This environment variable can be set to specify the path & name of the
|
||||
"autoconf.h" (header) file. Its default value is "include/linux/autoconf.h".
|
||||
|
||||
|
||||
======================================================================
|
||||
menuconfig
|
||||
@ -60,10 +133,11 @@ Searching in menuconfig:
|
||||
|
||||
/^hotplug
|
||||
|
||||
|
||||
______________________________________________________________________
|
||||
Color Themes for 'menuconfig'
|
||||
User interface options for 'menuconfig'
|
||||
|
||||
MENUCONFIG_COLOR
|
||||
--------------------------------------------------
|
||||
It is possible to select different color themes using the variable
|
||||
MENUCONFIG_COLOR. To select a theme use:
|
||||
|
||||
@ -75,83 +149,13 @@ Available themes are:
|
||||
classic => theme with blue background. The classic look
|
||||
bluetitle => a LCD friendly version of classic. (default)
|
||||
|
||||
______________________________________________________________________
|
||||
Environment variables in 'menuconfig'
|
||||
|
||||
KCONFIG_ALLCONFIG
|
||||
--------------------------------------------------
|
||||
(partially based on lkml email from/by Rob Landley, re: miniconfig)
|
||||
--------------------------------------------------
|
||||
The allyesconfig/allmodconfig/allnoconfig/randconfig variants can
|
||||
also use the environment variable KCONFIG_ALLCONFIG as a flag or a
|
||||
filename that contains config symbols that the user requires to be
|
||||
set to a specific value. If KCONFIG_ALLCONFIG is used without a
|
||||
filename, "make *config" checks for a file named
|
||||
"all{yes/mod/no/random}.config" (corresponding to the *config command
|
||||
that was used) for symbol values that are to be forced. If this file
|
||||
is not found, it checks for a file named "all.config" to contain forced
|
||||
values.
|
||||
|
||||
This enables you to create "miniature" config (miniconfig) or custom
|
||||
config files containing just the config symbols that you are interested
|
||||
in. Then the kernel config system generates the full .config file,
|
||||
including dependencies of your miniconfig file, based on the miniconfig
|
||||
file.
|
||||
|
||||
This 'KCONFIG_ALLCONFIG' file is a config file which contains
|
||||
(usually a subset of all) preset config symbols. These variable
|
||||
settings are still subject to normal dependency checks.
|
||||
|
||||
Examples:
|
||||
KCONFIG_ALLCONFIG=custom-notebook.config make allnoconfig
|
||||
or
|
||||
KCONFIG_ALLCONFIG=mini.config make allnoconfig
|
||||
or
|
||||
make KCONFIG_ALLCONFIG=mini.config allnoconfig
|
||||
|
||||
These examples will disable most options (allnoconfig) but enable or
|
||||
disable the options that are explicitly listed in the specified
|
||||
mini-config files.
|
||||
|
||||
KCONFIG_NOSILENTUPDATE
|
||||
--------------------------------------------------
|
||||
If this variable has a non-blank value, it prevents silent kernel
|
||||
config udpates (requires explicit updates).
|
||||
|
||||
KCONFIG_CONFIG
|
||||
--------------------------------------------------
|
||||
This environment variable can be used to specify a default kernel config
|
||||
file name to override the default name of ".config".
|
||||
|
||||
KCONFIG_OVERWRITECONFIG
|
||||
--------------------------------------------------
|
||||
If you set KCONFIG_OVERWRITECONFIG in the environment, Kconfig will not
|
||||
break symlinks when .config is a symlink to somewhere else.
|
||||
|
||||
KCONFIG_NOTIMESTAMP
|
||||
--------------------------------------------------
|
||||
If this environment variable exists and is non-null, the timestamp line
|
||||
in generated .config files is omitted.
|
||||
|
||||
KCONFIG_AUTOCONFIG
|
||||
--------------------------------------------------
|
||||
This environment variable can be set to specify the path & name of the
|
||||
"auto.conf" file. Its default value is "include/config/auto.conf".
|
||||
|
||||
KCONFIG_AUTOHEADER
|
||||
--------------------------------------------------
|
||||
This environment variable can be set to specify the path & name of the
|
||||
"autoconf.h" (header) file. Its default value is "include/linux/autoconf.h".
|
||||
|
||||
______________________________________________________________________
|
||||
menuconfig User Interface Options
|
||||
----------------------------------------------------------------------
|
||||
MENUCONFIG_MODE
|
||||
--------------------------------------------------
|
||||
This mode shows all sub-menus in one large tree.
|
||||
|
||||
Example:
|
||||
MENUCONFIG_MODE=single_menu make menuconfig
|
||||
make MENUCONFIG_MODE=single_menu menuconfig
|
||||
|
||||
|
||||
======================================================================
|
||||
xconfig
|
||||
|
@ -275,7 +275,7 @@ following files:
|
||||
|
||||
KERNELDIR := /lib/modules/`uname -r`/build
|
||||
all::
|
||||
$(MAKE) -C $KERNELDIR M=`pwd` $@
|
||||
$(MAKE) -C $(KERNELDIR) M=`pwd` $@
|
||||
|
||||
# Module specific targets
|
||||
genbin:
|
||||
|
@ -108,7 +108,7 @@ There are two possible methods of using Kdump.
|
||||
|
||||
2) Or use the system kernel binary itself as dump-capture kernel and there is
|
||||
no need to build a separate dump-capture kernel. This is possible
|
||||
only with the architecutres which support a relocatable kernel. As
|
||||
only with the architectures which support a relocatable kernel. As
|
||||
of today, i386, x86_64, ppc64 and ia64 architectures support relocatable
|
||||
kernel.
|
||||
|
||||
@ -222,7 +222,7 @@ Dump-capture kernel config options (Arch Dependent, ia64)
|
||||
----------------------------------------------------------
|
||||
|
||||
- No specific options are required to create a dump-capture kernel
|
||||
for ia64, other than those specified in the arch idependent section
|
||||
for ia64, other than those specified in the arch independent section
|
||||
above. This means that it is possible to use the system kernel
|
||||
as a dump-capture kernel if desired.
|
||||
|
||||
|
@ -328,11 +328,6 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
flushed before they will be reused, which
|
||||
is a lot of faster
|
||||
|
||||
amd_iommu_size= [HW,X86-64]
|
||||
Define the size of the aperture for the AMD IOMMU
|
||||
driver. Possible values are:
|
||||
'32M', '64M' (default), '128M', '256M', '512M', '1G'
|
||||
|
||||
amijoy.map= [HW,JOY] Amiga joystick support
|
||||
Map of devices attached to JOY0DAT and JOY1DAT
|
||||
Format: <a>,<b>
|
||||
@ -496,6 +491,13 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
Also note the kernel might malfunction if you disable
|
||||
some critical bits.
|
||||
|
||||
cmo_free_hint= [PPC] Format: { yes | no }
|
||||
Specify whether pages are marked as being inactive
|
||||
when they are freed. This is used in CMO environments
|
||||
to determine OS memory pressure for page stealing by
|
||||
a hypervisor.
|
||||
Default: yes
|
||||
|
||||
code_bytes [X86] How many bytes of object code to print
|
||||
in an oops report.
|
||||
Range: 0 - 8192
|
||||
@ -544,6 +546,10 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
console=brl,ttyS0
|
||||
For now, only VisioBraille is supported.
|
||||
|
||||
consoleblank= [KNL] The console blank (screen saver) timeout in
|
||||
seconds. Defaults to 10*60 = 10mins. A value of 0
|
||||
disables the blank timer.
|
||||
|
||||
coredump_filter=
|
||||
[KNL] Change the default value for
|
||||
/proc/<pid>/coredump_filter.
|
||||
@ -645,6 +651,13 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
DMA-API debugging code disables itself because the
|
||||
architectural default is too low.
|
||||
|
||||
dma_debug_driver=<driver_name>
|
||||
With this option the DMA-API debugging driver
|
||||
filter feature can be enabled at boot time. Just
|
||||
pass the driver to filter for as the parameter.
|
||||
The filter can be disabled or changed to another
|
||||
driver later using sysfs.
|
||||
|
||||
dscc4.setup= [NET]
|
||||
|
||||
dtc3181e= [HW,SCSI]
|
||||
@ -751,12 +764,25 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
|
||||
|
||||
ftrace=[tracer]
|
||||
[ftrace] will set and start the specified tracer
|
||||
[FTRACE] will set and start the specified tracer
|
||||
as early as possible in order to facilitate early
|
||||
boot debugging.
|
||||
|
||||
ftrace_dump_on_oops
|
||||
[ftrace] will dump the trace buffers on oops.
|
||||
[FTRACE] will dump the trace buffers on oops.
|
||||
|
||||
ftrace_filter=[function-list]
|
||||
[FTRACE] Limit the functions traced by the function
|
||||
tracer at boot up. function-list is a comma separated
|
||||
list of functions. This list can be changed at run
|
||||
time by the set_ftrace_filter file in the debugfs
|
||||
tracing directory.
|
||||
|
||||
ftrace_notrace=[function-list]
|
||||
[FTRACE] Do not trace the functions specified in
|
||||
function-list. This list can be changed at run time
|
||||
by the set_ftrace_notrace file in the debugfs
|
||||
tracing directory.
|
||||
|
||||
gamecon.map[2|3]=
|
||||
[HW,JOY] Multisystem joystick and NES/SNES/PSX pad
|
||||
@ -872,11 +898,8 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
|
||||
ide-core.nodma= [HW] (E)IDE subsystem
|
||||
Format: =0.0 to prevent dma on hda, =0.1 hdb =1.0 hdc
|
||||
.vlb_clock .pci_clock .noflush .noprobe .nowerr .cdrom
|
||||
.chs .ignore_cable are additional options
|
||||
See Documentation/ide/ide.txt.
|
||||
|
||||
idebus= [HW] (E)IDE subsystem - VLB/PCI bus speed
|
||||
.vlb_clock .pci_clock .noflush .nohpa .noprobe .nowerr
|
||||
.cdrom .chs .ignore_cable are additional options
|
||||
See Documentation/ide/ide.txt.
|
||||
|
||||
ide-pci-generic.all-generic-ide [HW] (E)IDE subsystem
|
||||
@ -913,6 +936,12 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
Formt: { "sha1" | "md5" }
|
||||
default: "sha1"
|
||||
|
||||
ima_tcb [IMA]
|
||||
Load a policy which meets the needs of the Trusted
|
||||
Computing Base. This means IMA will measure all
|
||||
programs exec'd, files mmap'd for exec, and all files
|
||||
opened for read by uid=0.
|
||||
|
||||
in2000= [HW,SCSI]
|
||||
See header of drivers/scsi/in2000.c.
|
||||
|
||||
@ -1055,13 +1084,17 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
|
||||
kgdboc= [HW] kgdb over consoles.
|
||||
Requires a tty driver that supports console polling.
|
||||
(only serial suported for now)
|
||||
(only serial supported for now)
|
||||
Format: <serial_device>[,baud]
|
||||
|
||||
kmac= [MIPS] korina ethernet MAC address.
|
||||
Configure the RouterBoard 532 series on-chip
|
||||
Ethernet adapter MAC address.
|
||||
|
||||
kmemleak= [KNL] Boot-time kmemleak enable/disable
|
||||
Valid arguments: on, off
|
||||
Default: on
|
||||
|
||||
kstack=N [X86] Print N words from the kernel stack
|
||||
in oops dumps.
|
||||
|
||||
@ -1380,7 +1413,7 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
('y', default) or cooked coordinates ('n')
|
||||
|
||||
mtrr_chunk_size=nn[KMG] [X86]
|
||||
used for mtrr cleanup. It is largest continous chunk
|
||||
used for mtrr cleanup. It is largest continuous chunk
|
||||
that could hold holes aka. UC entries.
|
||||
|
||||
mtrr_gran_size=nn[KMG] [X86]
|
||||
@ -1525,6 +1558,10 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
register save and restore. The kernel will only save
|
||||
legacy floating-point registers on task switch.
|
||||
|
||||
noxsave [BUGS=X86] Disables x86 extended register state save
|
||||
and restore using xsave. The kernel will fallback to
|
||||
enabling legacy floating-point and sse state.
|
||||
|
||||
nohlt [BUGS=ARM,SH] Tells the kernel that the sleep(SH) or
|
||||
wfi(ARM) instruction doesn't work correctly and not to
|
||||
use it. This is also useful when using JTAG debugger.
|
||||
@ -1561,6 +1598,9 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
noinitrd [RAM] Tells the kernel not to load any configured
|
||||
initial RAM disk.
|
||||
|
||||
nointremap [X86-64, Intel-IOMMU] Do not enable interrupt
|
||||
remapping.
|
||||
|
||||
nointroute [IA-64]
|
||||
|
||||
nojitter [IA64] Disables jitter checking for ITC timers.
|
||||
@ -1646,6 +1686,14 @@ and is between 256 and 4096 characters. It is defined in the file
|
||||
oprofile.timer= [HW]
|
||||
Use timer interrupt instead of performance counters
|
||||
|
||||
oprofile.cpu_type= Force an oprofile cpu type
|
||||
This might be useful if you have an older oprofile
|
||||
userland or if you want common events.
|
||||
Format: { archperfmon }
|
||||
archperfmon: [X86] Force use of architectural
|
||||
perfmon on Intel CPUs instead of the
|
||||
CPU specific event set.
|
||||
|
||||
osst= [HW,SCSI] SCSI Tape Driver
|
||||
Format: <buffer_size>,<write_threshold>
|
||||
See also Documentation/scsi/st.txt.
|
||||
|
773
Documentation/kmemcheck.txt
Normal file
773
Documentation/kmemcheck.txt
Normal file
@ -0,0 +1,773 @@
|
||||
GETTING STARTED WITH KMEMCHECK
|
||||
==============================
|
||||
|
||||
Vegard Nossum <vegardno@ifi.uio.no>
|
||||
|
||||
|
||||
Contents
|
||||
========
|
||||
0. Introduction
|
||||
1. Downloading
|
||||
2. Configuring and compiling
|
||||
3. How to use
|
||||
3.1. Booting
|
||||
3.2. Run-time enable/disable
|
||||
3.3. Debugging
|
||||
3.4. Annotating false positives
|
||||
4. Reporting errors
|
||||
5. Technical description
|
||||
|
||||
|
||||
0. Introduction
|
||||
===============
|
||||
|
||||
kmemcheck is a debugging feature for the Linux Kernel. More specifically, it
|
||||
is a dynamic checker that detects and warns about some uses of uninitialized
|
||||
memory.
|
||||
|
||||
Userspace programmers might be familiar with Valgrind's memcheck. The main
|
||||
difference between memcheck and kmemcheck is that memcheck works for userspace
|
||||
programs only, and kmemcheck works for the kernel only. The implementations
|
||||
are of course vastly different. Because of this, kmemcheck is not as accurate
|
||||
as memcheck, but it turns out to be good enough in practice to discover real
|
||||
programmer errors that the compiler is not able to find through static
|
||||
analysis.
|
||||
|
||||
Enabling kmemcheck on a kernel will probably slow it down to the extent that
|
||||
the machine will not be usable for normal workloads such as e.g. an
|
||||
interactive desktop. kmemcheck will also cause the kernel to use about twice
|
||||
as much memory as normal. For this reason, kmemcheck is strictly a debugging
|
||||
feature.
|
||||
|
||||
|
||||
1. Downloading
|
||||
==============
|
||||
|
||||
kmemcheck can only be downloaded using git. If you want to write patches
|
||||
against the current code, you should use the kmemcheck development branch of
|
||||
the tip tree. It is also possible to use the linux-next tree, which also
|
||||
includes the latest version of kmemcheck.
|
||||
|
||||
Assuming that you've already cloned the linux-2.6.git repository, all you
|
||||
have to do is add the -tip tree as a remote, like this:
|
||||
|
||||
$ git remote add tip git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git
|
||||
|
||||
To actually download the tree, fetch the remote:
|
||||
|
||||
$ git fetch tip
|
||||
|
||||
And to check out a new local branch with the kmemcheck code:
|
||||
|
||||
$ git checkout -b kmemcheck tip/kmemcheck
|
||||
|
||||
General instructions for the -tip tree can be found here:
|
||||
http://people.redhat.com/mingo/tip.git/readme.txt
|
||||
|
||||
|
||||
2. Configuring and compiling
|
||||
============================
|
||||
|
||||
kmemcheck only works for the x86 (both 32- and 64-bit) platform. A number of
|
||||
configuration variables must have specific settings in order for the kmemcheck
|
||||
menu to even appear in "menuconfig". These are:
|
||||
|
||||
o CONFIG_CC_OPTIMIZE_FOR_SIZE=n
|
||||
|
||||
This option is located under "General setup" / "Optimize for size".
|
||||
|
||||
Without this, gcc will use certain optimizations that usually lead to
|
||||
false positive warnings from kmemcheck. An example of this is a 16-bit
|
||||
field in a struct, where gcc may load 32 bits, then discard the upper
|
||||
16 bits. kmemcheck sees only the 32-bit load, and may trigger a
|
||||
warning for the upper 16 bits (if they're uninitialized).
|
||||
|
||||
o CONFIG_SLAB=y or CONFIG_SLUB=y
|
||||
|
||||
This option is located under "General setup" / "Choose SLAB
|
||||
allocator".
|
||||
|
||||
o CONFIG_FUNCTION_TRACER=n
|
||||
|
||||
This option is located under "Kernel hacking" / "Tracers" / "Kernel
|
||||
Function Tracer"
|
||||
|
||||
When function tracing is compiled in, gcc emits a call to another
|
||||
function at the beginning of every function. This means that when the
|
||||
page fault handler is called, the ftrace framework will be called
|
||||
before kmemcheck has had a chance to handle the fault. If ftrace then
|
||||
modifies memory that was tracked by kmemcheck, the result is an
|
||||
endless recursive page fault.
|
||||
|
||||
o CONFIG_DEBUG_PAGEALLOC=n
|
||||
|
||||
This option is located under "Kernel hacking" / "Debug page memory
|
||||
allocations".
|
||||
|
||||
In addition, I highly recommend turning on CONFIG_DEBUG_INFO=y. This is also
|
||||
located under "Kernel hacking". With this, you will be able to get line number
|
||||
information from the kmemcheck warnings, which is extremely valuable in
|
||||
debugging a problem. This option is not mandatory, however, because it slows
|
||||
down the compilation process and produces a much bigger kernel image.
|
||||
|
||||
Now the kmemcheck menu should be visible (under "Kernel hacking" / "kmemcheck:
|
||||
trap use of uninitialized memory"). Here follows a description of the
|
||||
kmemcheck configuration variables:
|
||||
|
||||
o CONFIG_KMEMCHECK
|
||||
|
||||
This must be enabled in order to use kmemcheck at all...
|
||||
|
||||
o CONFIG_KMEMCHECK_[DISABLED | ENABLED | ONESHOT]_BY_DEFAULT
|
||||
|
||||
This option controls the status of kmemcheck at boot-time. "Enabled"
|
||||
will enable kmemcheck right from the start, "disabled" will boot the
|
||||
kernel as normal (but with the kmemcheck code compiled in, so it can
|
||||
be enabled at run-time after the kernel has booted), and "one-shot" is
|
||||
a special mode which will turn kmemcheck off automatically after
|
||||
detecting the first use of uninitialized memory.
|
||||
|
||||
If you are using kmemcheck to actively debug a problem, then you
|
||||
probably want to choose "enabled" here.
|
||||
|
||||
The one-shot mode is mostly useful in automated test setups because it
|
||||
can prevent floods of warnings and increase the chances of the machine
|
||||
surviving in case something is really wrong. In other cases, the one-
|
||||
shot mode could actually be counter-productive because it would turn
|
||||
itself off at the very first error -- in the case of a false positive
|
||||
too -- and this would come in the way of debugging the specific
|
||||
problem you were interested in.
|
||||
|
||||
If you would like to use your kernel as normal, but with a chance to
|
||||
enable kmemcheck in case of some problem, it might be a good idea to
|
||||
choose "disabled" here. When kmemcheck is disabled, most of the run-
|
||||
time overhead is not incurred, and the kernel will be almost as fast
|
||||
as normal.
|
||||
|
||||
o CONFIG_KMEMCHECK_QUEUE_SIZE
|
||||
|
||||
Select the maximum number of error reports to store in an internal
|
||||
(fixed-size) buffer. Since errors can occur virtually anywhere and in
|
||||
any context, we need a temporary storage area which is guaranteed not
|
||||
to generate any other page faults when accessed. The queue will be
|
||||
emptied as soon as a tasklet may be scheduled. If the queue is full,
|
||||
new error reports will be lost.
|
||||
|
||||
The default value of 64 is probably fine. If some code produces more
|
||||
than 64 errors within an irqs-off section, then the code is likely to
|
||||
produce many, many more, too, and these additional reports seldom give
|
||||
any more information (the first report is usually the most valuable
|
||||
anyway).
|
||||
|
||||
This number might have to be adjusted if you are not using serial
|
||||
console or similar to capture the kernel log. If you are using the
|
||||
"dmesg" command to save the log, then getting a lot of kmemcheck
|
||||
warnings might overflow the kernel log itself, and the earlier reports
|
||||
will get lost in that way instead. Try setting this to 10 or so on
|
||||
such a setup.
|
||||
|
||||
o CONFIG_KMEMCHECK_SHADOW_COPY_SHIFT
|
||||
|
||||
Select the number of shadow bytes to save along with each entry of the
|
||||
error-report queue. These bytes indicate what parts of an allocation
|
||||
are initialized, uninitialized, etc. and will be displayed when an
|
||||
error is detected to help the debugging of a particular problem.
|
||||
|
||||
The number entered here is actually the logarithm of the number of
|
||||
bytes that will be saved. So if you pick for example 5 here, kmemcheck
|
||||
will save 2^5 = 32 bytes.
|
||||
|
||||
The default value should be fine for debugging most problems. It also
|
||||
fits nicely within 80 columns.
|
||||
|
||||
o CONFIG_KMEMCHECK_PARTIAL_OK
|
||||
|
||||
This option (when enabled) works around certain GCC optimizations that
|
||||
produce 32-bit reads from 16-bit variables where the upper 16 bits are
|
||||
thrown away afterwards.
|
||||
|
||||
The default value (enabled) is recommended. This may of course hide
|
||||
some real errors, but disabling it would probably produce a lot of
|
||||
false positives.
|
||||
|
||||
o CONFIG_KMEMCHECK_BITOPS_OK
|
||||
|
||||
This option silences warnings that would be generated for bit-field
|
||||
accesses where not all the bits are initialized at the same time. This
|
||||
may also hide some real bugs.
|
||||
|
||||
This option is probably obsolete, or it should be replaced with
|
||||
the kmemcheck-/bitfield-annotations for the code in question. The
|
||||
default value is therefore fine.
|
||||
|
||||
Now compile the kernel as usual.
|
||||
|
||||
|
||||
3. How to use
|
||||
=============
|
||||
|
||||
3.1. Booting
|
||||
============
|
||||
|
||||
First some information about the command-line options. There is only one
|
||||
option specific to kmemcheck, and this is called "kmemcheck". It can be used
|
||||
to override the default mode as chosen by the CONFIG_KMEMCHECK_*_BY_DEFAULT
|
||||
option. Its possible settings are:
|
||||
|
||||
o kmemcheck=0 (disabled)
|
||||
o kmemcheck=1 (enabled)
|
||||
o kmemcheck=2 (one-shot mode)
|
||||
|
||||
If SLUB debugging has been enabled in the kernel, it may take precedence over
|
||||
kmemcheck in such a way that the slab caches which are under SLUB debugging
|
||||
will not be tracked by kmemcheck. In order to ensure that this doesn't happen
|
||||
(even though it shouldn't by default), use SLUB's boot option "slub_debug",
|
||||
like this: slub_debug=-
|
||||
|
||||
In fact, this option may also be used for fine-grained control over SLUB vs.
|
||||
kmemcheck. For example, if the command line includes "kmemcheck=1
|
||||
slub_debug=,dentry", then SLUB debugging will be used only for the "dentry"
|
||||
slab cache, and with kmemcheck tracking all the other caches. This is advanced
|
||||
usage, however, and is not generally recommended.
|
||||
|
||||
|
||||
3.2. Run-time enable/disable
|
||||
============================
|
||||
|
||||
When the kernel has booted, it is possible to enable or disable kmemcheck at
|
||||
run-time. WARNING: This feature is still experimental and may cause false
|
||||
positive warnings to appear. Therefore, try not to use this. If you find that
|
||||
it doesn't work properly (e.g. you see an unreasonable amount of warnings), I
|
||||
will be happy to take bug reports.
|
||||
|
||||
Use the file /proc/sys/kernel/kmemcheck for this purpose, e.g.:
|
||||
|
||||
$ echo 0 > /proc/sys/kernel/kmemcheck # disables kmemcheck
|
||||
|
||||
The numbers are the same as for the kmemcheck= command-line option.
|
||||
|
||||
|
||||
3.3. Debugging
|
||||
==============
|
||||
|
||||
A typical report will look something like this:
|
||||
|
||||
WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88003e4a2024)
|
||||
80000000000000000000000000000000000000000088ffff0000000000000000
|
||||
i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u
|
||||
^
|
||||
|
||||
Pid: 1856, comm: ntpdate Not tainted 2.6.29-rc5 #264 945P-A
|
||||
RIP: 0010:[<ffffffff8104ede8>] [<ffffffff8104ede8>] __dequeue_signal+0xc8/0x190
|
||||
RSP: 0018:ffff88003cdf7d98 EFLAGS: 00210002
|
||||
RAX: 0000000000000030 RBX: ffff88003d4ea968 RCX: 0000000000000009
|
||||
RDX: ffff88003e5d6018 RSI: ffff88003e5d6024 RDI: ffff88003cdf7e84
|
||||
RBP: ffff88003cdf7db8 R08: ffff88003e5d6000 R09: 0000000000000000
|
||||
R10: 0000000000000080 R11: 0000000000000000 R12: 000000000000000e
|
||||
R13: ffff88003cdf7e78 R14: ffff88003d530710 R15: ffff88003d5a98c8
|
||||
FS: 0000000000000000(0000) GS:ffff880001982000(0063) knlGS:00000
|
||||
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
|
||||
CR2: ffff88003f806ea0 CR3: 000000003c036000 CR4: 00000000000006a0
|
||||
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
|
||||
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
|
||||
[<ffffffff8104f04e>] dequeue_signal+0x8e/0x170
|
||||
[<ffffffff81050bd8>] get_signal_to_deliver+0x98/0x390
|
||||
[<ffffffff8100b87d>] do_notify_resume+0xad/0x7d0
|
||||
[<ffffffff8100c7b5>] int_signal+0x12/0x17
|
||||
[<ffffffffffffffff>] 0xffffffffffffffff
|
||||
|
||||
The single most valuable information in this report is the RIP (or EIP on 32-
|
||||
bit) value. This will help us pinpoint exactly which instruction that caused
|
||||
the warning.
|
||||
|
||||
If your kernel was compiled with CONFIG_DEBUG_INFO=y, then all we have to do
|
||||
is give this address to the addr2line program, like this:
|
||||
|
||||
$ addr2line -e vmlinux -i ffffffff8104ede8
|
||||
arch/x86/include/asm/string_64.h:12
|
||||
include/asm-generic/siginfo.h:287
|
||||
kernel/signal.c:380
|
||||
kernel/signal.c:410
|
||||
|
||||
The "-e vmlinux" tells addr2line which file to look in. IMPORTANT: This must
|
||||
be the vmlinux of the kernel that produced the warning in the first place! If
|
||||
not, the line number information will almost certainly be wrong.
|
||||
|
||||
The "-i" tells addr2line to also print the line numbers of inlined functions.
|
||||
In this case, the flag was very important, because otherwise, it would only
|
||||
have printed the first line, which is just a call to memcpy(), which could be
|
||||
called from a thousand places in the kernel, and is therefore not very useful.
|
||||
These inlined functions would not show up in the stack trace above, simply
|
||||
because the kernel doesn't load the extra debugging information. This
|
||||
technique can of course be used with ordinary kernel oopses as well.
|
||||
|
||||
In this case, it's the caller of memcpy() that is interesting, and it can be
|
||||
found in include/asm-generic/siginfo.h, line 287:
|
||||
|
||||
281 static inline void copy_siginfo(struct siginfo *to, struct siginfo *from)
|
||||
282 {
|
||||
283 if (from->si_code < 0)
|
||||
284 memcpy(to, from, sizeof(*to));
|
||||
285 else
|
||||
286 /* _sigchld is currently the largest know union member */
|
||||
287 memcpy(to, from, __ARCH_SI_PREAMBLE_SIZE + sizeof(from->_sifields._sigchld));
|
||||
288 }
|
||||
|
||||
Since this was a read (kmemcheck usually warns about reads only, though it can
|
||||
warn about writes to unallocated or freed memory as well), it was probably the
|
||||
"from" argument which contained some uninitialized bytes. Following the chain
|
||||
of calls, we move upwards to see where "from" was allocated or initialized,
|
||||
kernel/signal.c, line 380:
|
||||
|
||||
359 static void collect_signal(int sig, struct sigpending *list, siginfo_t *info)
|
||||
360 {
|
||||
...
|
||||
367 list_for_each_entry(q, &list->list, list) {
|
||||
368 if (q->info.si_signo == sig) {
|
||||
369 if (first)
|
||||
370 goto still_pending;
|
||||
371 first = q;
|
||||
...
|
||||
377 if (first) {
|
||||
378 still_pending:
|
||||
379 list_del_init(&first->list);
|
||||
380 copy_siginfo(info, &first->info);
|
||||
381 __sigqueue_free(first);
|
||||
...
|
||||
392 }
|
||||
393 }
|
||||
|
||||
Here, it is &first->info that is being passed on to copy_siginfo(). The
|
||||
variable "first" was found on a list -- passed in as the second argument to
|
||||
collect_signal(). We continue our journey through the stack, to figure out
|
||||
where the item on "list" was allocated or initialized. We move to line 410:
|
||||
|
||||
395 static int __dequeue_signal(struct sigpending *pending, sigset_t *mask,
|
||||
396 siginfo_t *info)
|
||||
397 {
|
||||
...
|
||||
410 collect_signal(sig, pending, info);
|
||||
...
|
||||
414 }
|
||||
|
||||
Now we need to follow the "pending" pointer, since that is being passed on to
|
||||
collect_signal() as "list". At this point, we've run out of lines from the
|
||||
"addr2line" output. Not to worry, we just paste the next addresses from the
|
||||
kmemcheck stack dump, i.e.:
|
||||
|
||||
[<ffffffff8104f04e>] dequeue_signal+0x8e/0x170
|
||||
[<ffffffff81050bd8>] get_signal_to_deliver+0x98/0x390
|
||||
[<ffffffff8100b87d>] do_notify_resume+0xad/0x7d0
|
||||
[<ffffffff8100c7b5>] int_signal+0x12/0x17
|
||||
|
||||
$ addr2line -e vmlinux -i ffffffff8104f04e ffffffff81050bd8 \
|
||||
ffffffff8100b87d ffffffff8100c7b5
|
||||
kernel/signal.c:446
|
||||
kernel/signal.c:1806
|
||||
arch/x86/kernel/signal.c:805
|
||||
arch/x86/kernel/signal.c:871
|
||||
arch/x86/kernel/entry_64.S:694
|
||||
|
||||
Remember that since these addresses were found on the stack and not as the
|
||||
RIP value, they actually point to the _next_ instruction (they are return
|
||||
addresses). This becomes obvious when we look at the code for line 446:
|
||||
|
||||
422 int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
|
||||
423 {
|
||||
...
|
||||
431 signr = __dequeue_signal(&tsk->signal->shared_pending,
|
||||
432 mask, info);
|
||||
433 /*
|
||||
434 * itimer signal ?
|
||||
435 *
|
||||
436 * itimers are process shared and we restart periodic
|
||||
437 * itimers in the signal delivery path to prevent DoS
|
||||
438 * attacks in the high resolution timer case. This is
|
||||
439 * compliant with the old way of self restarting
|
||||
440 * itimers, as the SIGALRM is a legacy signal and only
|
||||
441 * queued once. Changing the restart behaviour to
|
||||
442 * restart the timer in the signal dequeue path is
|
||||
443 * reducing the timer noise on heavy loaded !highres
|
||||
444 * systems too.
|
||||
445 */
|
||||
446 if (unlikely(signr == SIGALRM)) {
|
||||
...
|
||||
489 }
|
||||
|
||||
So instead of looking at 446, we should be looking at 431, which is the line
|
||||
that executes just before 446. Here we see that what we are looking for is
|
||||
&tsk->signal->shared_pending.
|
||||
|
||||
Our next task is now to figure out which function that puts items on this
|
||||
"shared_pending" list. A crude, but efficient tool, is git grep:
|
||||
|
||||
$ git grep -n 'shared_pending' kernel/
|
||||
...
|
||||
kernel/signal.c:828: pending = group ? &t->signal->shared_pending : &t->pending;
|
||||
kernel/signal.c:1339: pending = group ? &t->signal->shared_pending : &t->pending;
|
||||
...
|
||||
|
||||
There were more results, but none of them were related to list operations,
|
||||
and these were the only assignments. We inspect the line numbers more closely
|
||||
and find that this is indeed where items are being added to the list:
|
||||
|
||||
816 static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
|
||||
817 int group)
|
||||
818 {
|
||||
...
|
||||
828 pending = group ? &t->signal->shared_pending : &t->pending;
|
||||
...
|
||||
851 q = __sigqueue_alloc(t, GFP_ATOMIC, (sig < SIGRTMIN &&
|
||||
852 (is_si_special(info) ||
|
||||
853 info->si_code >= 0)));
|
||||
854 if (q) {
|
||||
855 list_add_tail(&q->list, &pending->list);
|
||||
...
|
||||
890 }
|
||||
|
||||
and:
|
||||
|
||||
1309 int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
|
||||
1310 {
|
||||
....
|
||||
1339 pending = group ? &t->signal->shared_pending : &t->pending;
|
||||
1340 list_add_tail(&q->list, &pending->list);
|
||||
....
|
||||
1347 }
|
||||
|
||||
In the first case, the list element we are looking for, "q", is being returned
|
||||
from the function __sigqueue_alloc(), which looks like an allocation function.
|
||||
Let's take a look at it:
|
||||
|
||||
187 static struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags,
|
||||
188 int override_rlimit)
|
||||
189 {
|
||||
190 struct sigqueue *q = NULL;
|
||||
191 struct user_struct *user;
|
||||
192
|
||||
193 /*
|
||||
194 * We won't get problems with the target's UID changing under us
|
||||
195 * because changing it requires RCU be used, and if t != current, the
|
||||
196 * caller must be holding the RCU readlock (by way of a spinlock) and
|
||||
197 * we use RCU protection here
|
||||
198 */
|
||||
199 user = get_uid(__task_cred(t)->user);
|
||||
200 atomic_inc(&user->sigpending);
|
||||
201 if (override_rlimit ||
|
||||
202 atomic_read(&user->sigpending) <=
|
||||
203 t->signal->rlim[RLIMIT_SIGPENDING].rlim_cur)
|
||||
204 q = kmem_cache_alloc(sigqueue_cachep, flags);
|
||||
205 if (unlikely(q == NULL)) {
|
||||
206 atomic_dec(&user->sigpending);
|
||||
207 free_uid(user);
|
||||
208 } else {
|
||||
209 INIT_LIST_HEAD(&q->list);
|
||||
210 q->flags = 0;
|
||||
211 q->user = user;
|
||||
212 }
|
||||
213
|
||||
214 return q;
|
||||
215 }
|
||||
|
||||
We see that this function initializes q->list, q->flags, and q->user. It seems
|
||||
that now is the time to look at the definition of "struct sigqueue", e.g.:
|
||||
|
||||
14 struct sigqueue {
|
||||
15 struct list_head list;
|
||||
16 int flags;
|
||||
17 siginfo_t info;
|
||||
18 struct user_struct *user;
|
||||
19 };
|
||||
|
||||
And, you might remember, it was a memcpy() on &first->info that caused the
|
||||
warning, so this makes perfect sense. It also seems reasonable to assume that
|
||||
it is the caller of __sigqueue_alloc() that has the responsibility of filling
|
||||
out (initializing) this member.
|
||||
|
||||
But just which fields of the struct were uninitialized? Let's look at
|
||||
kmemcheck's report again:
|
||||
|
||||
WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88003e4a2024)
|
||||
80000000000000000000000000000000000000000088ffff0000000000000000
|
||||
i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u
|
||||
^
|
||||
|
||||
These first two lines are the memory dump of the memory object itself, and the
|
||||
shadow bytemap, respectively. The memory object itself is in this case
|
||||
&first->info. Just beware that the start of this dump is NOT the start of the
|
||||
object itself! The position of the caret (^) corresponds with the address of
|
||||
the read (ffff88003e4a2024).
|
||||
|
||||
The shadow bytemap dump legend is as follows:
|
||||
|
||||
i - initialized
|
||||
u - uninitialized
|
||||
a - unallocated (memory has been allocated by the slab layer, but has not
|
||||
yet been handed off to anybody)
|
||||
f - freed (memory has been allocated by the slab layer, but has been freed
|
||||
by the previous owner)
|
||||
|
||||
In order to figure out where (relative to the start of the object) the
|
||||
uninitialized memory was located, we have to look at the disassembly. For
|
||||
that, we'll need the RIP address again:
|
||||
|
||||
RIP: 0010:[<ffffffff8104ede8>] [<ffffffff8104ede8>] __dequeue_signal+0xc8/0x190
|
||||
|
||||
$ objdump -d --no-show-raw-insn vmlinux | grep -C 8 ffffffff8104ede8:
|
||||
ffffffff8104edc8: mov %r8,0x8(%r8)
|
||||
ffffffff8104edcc: test %r10d,%r10d
|
||||
ffffffff8104edcf: js ffffffff8104ee88 <__dequeue_signal+0x168>
|
||||
ffffffff8104edd5: mov %rax,%rdx
|
||||
ffffffff8104edd8: mov $0xc,%ecx
|
||||
ffffffff8104eddd: mov %r13,%rdi
|
||||
ffffffff8104ede0: mov $0x30,%eax
|
||||
ffffffff8104ede5: mov %rdx,%rsi
|
||||
ffffffff8104ede8: rep movsl %ds:(%rsi),%es:(%rdi)
|
||||
ffffffff8104edea: test $0x2,%al
|
||||
ffffffff8104edec: je ffffffff8104edf0 <__dequeue_signal+0xd0>
|
||||
ffffffff8104edee: movsw %ds:(%rsi),%es:(%rdi)
|
||||
ffffffff8104edf0: test $0x1,%al
|
||||
ffffffff8104edf2: je ffffffff8104edf5 <__dequeue_signal+0xd5>
|
||||
ffffffff8104edf4: movsb %ds:(%rsi),%es:(%rdi)
|
||||
ffffffff8104edf5: mov %r8,%rdi
|
||||
ffffffff8104edf8: callq ffffffff8104de60 <__sigqueue_free>
|
||||
|
||||
As expected, it's the "rep movsl" instruction from the memcpy() that causes
|
||||
the warning. We know about REP MOVSL that it uses the register RCX to count
|
||||
the number of remaining iterations. By taking a look at the register dump
|
||||
again (from the kmemcheck report), we can figure out how many bytes were left
|
||||
to copy:
|
||||
|
||||
RAX: 0000000000000030 RBX: ffff88003d4ea968 RCX: 0000000000000009
|
||||
|
||||
By looking at the disassembly, we also see that %ecx is being loaded with the
|
||||
value $0xc just before (ffffffff8104edd8), so we are very lucky. Keep in mind
|
||||
that this is the number of iterations, not bytes. And since this is a "long"
|
||||
operation, we need to multiply by 4 to get the number of bytes. So this means
|
||||
that the uninitialized value was encountered at 4 * (0xc - 0x9) = 12 bytes
|
||||
from the start of the object.
|
||||
|
||||
We can now try to figure out which field of the "struct siginfo" that was not
|
||||
initialized. This is the beginning of the struct:
|
||||
|
||||
40 typedef struct siginfo {
|
||||
41 int si_signo;
|
||||
42 int si_errno;
|
||||
43 int si_code;
|
||||
44
|
||||
45 union {
|
||||
..
|
||||
92 } _sifields;
|
||||
93 } siginfo_t;
|
||||
|
||||
On 64-bit, the int is 4 bytes long, so it must the the union member that has
|
||||
not been initialized. We can verify this using gdb:
|
||||
|
||||
$ gdb vmlinux
|
||||
...
|
||||
(gdb) p &((struct siginfo *) 0)->_sifields
|
||||
$1 = (union {...} *) 0x10
|
||||
|
||||
Actually, it seems that the union member is located at offset 0x10 -- which
|
||||
means that gcc has inserted 4 bytes of padding between the members si_code
|
||||
and _sifields. We can now get a fuller picture of the memory dump:
|
||||
|
||||
_----------------------------=> si_code
|
||||
/ _--------------------=> (padding)
|
||||
| / _------------=> _sifields(._kill._pid)
|
||||
| | / _----=> _sifields(._kill._uid)
|
||||
| | | /
|
||||
-------|-------|-------|-------|
|
||||
80000000000000000000000000000000000000000088ffff0000000000000000
|
||||
i i i i u u u u i i i i i i i i u u u u u u u u u u u u u u u u
|
||||
|
||||
This allows us to realize another important fact: si_code contains the value
|
||||
0x80. Remember that x86 is little endian, so the first 4 bytes "80000000" are
|
||||
really the number 0x00000080. With a bit of research, we find that this is
|
||||
actually the constant SI_KERNEL defined in include/asm-generic/siginfo.h:
|
||||
|
||||
144 #define SI_KERNEL 0x80 /* sent by the kernel from somewhere */
|
||||
|
||||
This macro is used in exactly one place in the x86 kernel: In send_signal()
|
||||
in kernel/signal.c:
|
||||
|
||||
816 static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
|
||||
817 int group)
|
||||
818 {
|
||||
...
|
||||
828 pending = group ? &t->signal->shared_pending : &t->pending;
|
||||
...
|
||||
851 q = __sigqueue_alloc(t, GFP_ATOMIC, (sig < SIGRTMIN &&
|
||||
852 (is_si_special(info) ||
|
||||
853 info->si_code >= 0)));
|
||||
854 if (q) {
|
||||
855 list_add_tail(&q->list, &pending->list);
|
||||
856 switch ((unsigned long) info) {
|
||||
...
|
||||
865 case (unsigned long) SEND_SIG_PRIV:
|
||||
866 q->info.si_signo = sig;
|
||||
867 q->info.si_errno = 0;
|
||||
868 q->info.si_code = SI_KERNEL;
|
||||
869 q->info.si_pid = 0;
|
||||
870 q->info.si_uid = 0;
|
||||
871 break;
|
||||
...
|
||||
890 }
|
||||
|
||||
Not only does this match with the .si_code member, it also matches the place
|
||||
we found earlier when looking for where siginfo_t objects are enqueued on the
|
||||
"shared_pending" list.
|
||||
|
||||
So to sum up: It seems that it is the padding introduced by the compiler
|
||||
between two struct fields that is uninitialized, and this gets reported when
|
||||
we do a memcpy() on the struct. This means that we have identified a false
|
||||
positive warning.
|
||||
|
||||
Normally, kmemcheck will not report uninitialized accesses in memcpy() calls
|
||||
when both the source and destination addresses are tracked. (Instead, we copy
|
||||
the shadow bytemap as well). In this case, the destination address clearly
|
||||
was not tracked. We can dig a little deeper into the stack trace from above:
|
||||
|
||||
arch/x86/kernel/signal.c:805
|
||||
arch/x86/kernel/signal.c:871
|
||||
arch/x86/kernel/entry_64.S:694
|
||||
|
||||
And we clearly see that the destination siginfo object is located on the
|
||||
stack:
|
||||
|
||||
782 static void do_signal(struct pt_regs *regs)
|
||||
783 {
|
||||
784 struct k_sigaction ka;
|
||||
785 siginfo_t info;
|
||||
...
|
||||
804 signr = get_signal_to_deliver(&info, &ka, regs, NULL);
|
||||
...
|
||||
854 }
|
||||
|
||||
And this &info is what eventually gets passed to copy_siginfo() as the
|
||||
destination argument.
|
||||
|
||||
Now, even though we didn't find an actual error here, the example is still a
|
||||
good one, because it shows how one would go about to find out what the report
|
||||
was all about.
|
||||
|
||||
|
||||
3.4. Annotating false positives
|
||||
===============================
|
||||
|
||||
There are a few different ways to make annotations in the source code that
|
||||
will keep kmemcheck from checking and reporting certain allocations. Here
|
||||
they are:
|
||||
|
||||
o __GFP_NOTRACK_FALSE_POSITIVE
|
||||
|
||||
This flag can be passed to kmalloc() or kmem_cache_alloc() (therefore
|
||||
also to other functions that end up calling one of these) to indicate
|
||||
that the allocation should not be tracked because it would lead to
|
||||
a false positive report. This is a "big hammer" way of silencing
|
||||
kmemcheck; after all, even if the false positive pertains to
|
||||
particular field in a struct, for example, we will now lose the
|
||||
ability to find (real) errors in other parts of the same struct.
|
||||
|
||||
Example:
|
||||
|
||||
/* No warnings will ever trigger on accessing any part of x */
|
||||
x = kmalloc(sizeof *x, GFP_KERNEL | __GFP_NOTRACK_FALSE_POSITIVE);
|
||||
|
||||
o kmemcheck_bitfield_begin(name)/kmemcheck_bitfield_end(name) and
|
||||
kmemcheck_annotate_bitfield(ptr, name)
|
||||
|
||||
The first two of these three macros can be used inside struct
|
||||
definitions to signal, respectively, the beginning and end of a
|
||||
bitfield. Additionally, this will assign the bitfield a name, which
|
||||
is given as an argument to the macros.
|
||||
|
||||
Having used these markers, one can later use
|
||||
kmemcheck_annotate_bitfield() at the point of allocation, to indicate
|
||||
which parts of the allocation is part of a bitfield.
|
||||
|
||||
Example:
|
||||
|
||||
struct foo {
|
||||
int x;
|
||||
|
||||
kmemcheck_bitfield_begin(flags);
|
||||
int flag_a:1;
|
||||
int flag_b:1;
|
||||
kmemcheck_bitfield_end(flags);
|
||||
|
||||
int y;
|
||||
};
|
||||
|
||||
struct foo *x = kmalloc(sizeof *x);
|
||||
|
||||
/* No warnings will trigger on accessing the bitfield of x */
|
||||
kmemcheck_annotate_bitfield(x, flags);
|
||||
|
||||
Note that kmemcheck_annotate_bitfield() can be used even before the
|
||||
return value of kmalloc() is checked -- in other words, passing NULL
|
||||
as the first argument is legal (and will do nothing).
|
||||
|
||||
|
||||
4. Reporting errors
|
||||
===================
|
||||
|
||||
As we have seen, kmemcheck will produce false positive reports. Therefore, it
|
||||
is not very wise to blindly post kmemcheck warnings to mailing lists and
|
||||
maintainers. Instead, I encourage maintainers and developers to find errors
|
||||
in their own code. If you get a warning, you can try to work around it, try
|
||||
to figure out if it's a real error or not, or simply ignore it. Most
|
||||
developers know their own code and will quickly and efficiently determine the
|
||||
root cause of a kmemcheck report. This is therefore also the most efficient
|
||||
way to work with kmemcheck.
|
||||
|
||||
That said, we (the kmemcheck maintainers) will always be on the lookout for
|
||||
false positives that we can annotate and silence. So whatever you find,
|
||||
please drop us a note privately! Kernel configs and steps to reproduce (if
|
||||
available) are of course a great help too.
|
||||
|
||||
Happy hacking!
|
||||
|
||||
|
||||
5. Technical description
|
||||
========================
|
||||
|
||||
kmemcheck works by marking memory pages non-present. This means that whenever
|
||||
somebody attempts to access the page, a page fault is generated. The page
|
||||
fault handler notices that the page was in fact only hidden, and so it calls
|
||||
on the kmemcheck code to make further investigations.
|
||||
|
||||
When the investigations are completed, kmemcheck "shows" the page by marking
|
||||
it present (as it would be under normal circumstances). This way, the
|
||||
interrupted code can continue as usual.
|
||||
|
||||
But after the instruction has been executed, we should hide the page again, so
|
||||
that we can catch the next access too! Now kmemcheck makes use of a debugging
|
||||
feature of the processor, namely single-stepping. When the processor has
|
||||
finished the one instruction that generated the memory access, a debug
|
||||
exception is raised. From here, we simply hide the page again and continue
|
||||
execution, this time with the single-stepping feature turned off.
|
||||
|
||||
kmemcheck requires some assistance from the memory allocator in order to work.
|
||||
The memory allocator needs to
|
||||
|
||||
1. Tell kmemcheck about newly allocated pages and pages that are about to
|
||||
be freed. This allows kmemcheck to set up and tear down the shadow memory
|
||||
for the pages in question. The shadow memory stores the status of each
|
||||
byte in the allocation proper, e.g. whether it is initialized or
|
||||
uninitialized.
|
||||
|
||||
2. Tell kmemcheck which parts of memory should be marked uninitialized.
|
||||
There are actually a few more states, such as "not yet allocated" and
|
||||
"recently freed".
|
||||
|
||||
If a slab cache is set up using the SLAB_NOTRACK flag, it will never return
|
||||
memory that can take page faults because of kmemcheck.
|
||||
|
||||
If a slab cache is NOT set up using the SLAB_NOTRACK flag, callers can still
|
||||
request memory with the __GFP_NOTRACK or __GFP_NOTRACK_FALSE_POSITIVE flags.
|
||||
This does not prevent the page faults from occurring, however, but marks the
|
||||
object in question as being initialized so that no warnings will ever be
|
||||
produced for this object.
|
||||
|
||||
Currently, the SLAB and SLUB allocators are supported by kmemcheck.
|
142
Documentation/kmemleak.txt
Normal file
142
Documentation/kmemleak.txt
Normal file
@ -0,0 +1,142 @@
|
||||
Kernel Memory Leak Detector
|
||||
===========================
|
||||
|
||||
Introduction
|
||||
------------
|
||||
|
||||
Kmemleak provides a way of detecting possible kernel memory leaks in a
|
||||
way similar to a tracing garbage collector
|
||||
(http://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29#Tracing_garbage_collectors),
|
||||
with the difference that the orphan objects are not freed but only
|
||||
reported via /sys/kernel/debug/kmemleak. A similar method is used by the
|
||||
Valgrind tool (memcheck --leak-check) to detect the memory leaks in
|
||||
user-space applications.
|
||||
|
||||
Usage
|
||||
-----
|
||||
|
||||
CONFIG_DEBUG_KMEMLEAK in "Kernel hacking" has to be enabled. A kernel
|
||||
thread scans the memory every 10 minutes (by default) and prints any new
|
||||
unreferenced objects found. To trigger an intermediate scan and display
|
||||
all the possible memory leaks:
|
||||
|
||||
# mount -t debugfs nodev /sys/kernel/debug/
|
||||
# cat /sys/kernel/debug/kmemleak
|
||||
|
||||
Note that the orphan objects are listed in the order they were allocated
|
||||
and one object at the beginning of the list may cause other subsequent
|
||||
objects to be reported as orphan.
|
||||
|
||||
Memory scanning parameters can be modified at run-time by writing to the
|
||||
/sys/kernel/debug/kmemleak file. The following parameters are supported:
|
||||
|
||||
off - disable kmemleak (irreversible)
|
||||
stack=on - enable the task stacks scanning
|
||||
stack=off - disable the tasks stacks scanning
|
||||
scan=on - start the automatic memory scanning thread
|
||||
scan=off - stop the automatic memory scanning thread
|
||||
scan=<secs> - set the automatic memory scanning period in seconds (0
|
||||
to disable it)
|
||||
|
||||
Kmemleak can also be disabled at boot-time by passing "kmemleak=off" on
|
||||
the kernel command line.
|
||||
|
||||
Basic Algorithm
|
||||
---------------
|
||||
|
||||
The memory allocations via kmalloc, vmalloc, kmem_cache_alloc and
|
||||
friends are traced and the pointers, together with additional
|
||||
information like size and stack trace, are stored in a prio search tree.
|
||||
The corresponding freeing function calls are tracked and the pointers
|
||||
removed from the kmemleak data structures.
|
||||
|
||||
An allocated block of memory is considered orphan if no pointer to its
|
||||
start address or to any location inside the block can be found by
|
||||
scanning the memory (including saved registers). This means that there
|
||||
might be no way for the kernel to pass the address of the allocated
|
||||
block to a freeing function and therefore the block is considered a
|
||||
memory leak.
|
||||
|
||||
The scanning algorithm steps:
|
||||
|
||||
1. mark all objects as white (remaining white objects will later be
|
||||
considered orphan)
|
||||
2. scan the memory starting with the data section and stacks, checking
|
||||
the values against the addresses stored in the prio search tree. If
|
||||
a pointer to a white object is found, the object is added to the
|
||||
gray list
|
||||
3. scan the gray objects for matching addresses (some white objects
|
||||
can become gray and added at the end of the gray list) until the
|
||||
gray set is finished
|
||||
4. the remaining white objects are considered orphan and reported via
|
||||
/sys/kernel/debug/kmemleak
|
||||
|
||||
Some allocated memory blocks have pointers stored in the kernel's
|
||||
internal data structures and they cannot be detected as orphans. To
|
||||
avoid this, kmemleak can also store the number of values pointing to an
|
||||
address inside the block address range that need to be found so that the
|
||||
block is not considered a leak. One example is __vmalloc().
|
||||
|
||||
Kmemleak API
|
||||
------------
|
||||
|
||||
See the include/linux/kmemleak.h header for the functions prototype.
|
||||
|
||||
kmemleak_init - initialize kmemleak
|
||||
kmemleak_alloc - notify of a memory block allocation
|
||||
kmemleak_free - notify of a memory block freeing
|
||||
kmemleak_not_leak - mark an object as not a leak
|
||||
kmemleak_ignore - do not scan or report an object as leak
|
||||
kmemleak_scan_area - add scan areas inside a memory block
|
||||
kmemleak_no_scan - do not scan a memory block
|
||||
kmemleak_erase - erase an old value in a pointer variable
|
||||
kmemleak_alloc_recursive - as kmemleak_alloc but checks the recursiveness
|
||||
kmemleak_free_recursive - as kmemleak_free but checks the recursiveness
|
||||
|
||||
Dealing with false positives/negatives
|
||||
--------------------------------------
|
||||
|
||||
The false negatives are real memory leaks (orphan objects) but not
|
||||
reported by kmemleak because values found during the memory scanning
|
||||
point to such objects. To reduce the number of false negatives, kmemleak
|
||||
provides the kmemleak_ignore, kmemleak_scan_area, kmemleak_no_scan and
|
||||
kmemleak_erase functions (see above). The task stacks also increase the
|
||||
amount of false negatives and their scanning is not enabled by default.
|
||||
|
||||
The false positives are objects wrongly reported as being memory leaks
|
||||
(orphan). For objects known not to be leaks, kmemleak provides the
|
||||
kmemleak_not_leak function. The kmemleak_ignore could also be used if
|
||||
the memory block is known not to contain other pointers and it will no
|
||||
longer be scanned.
|
||||
|
||||
Some of the reported leaks are only transient, especially on SMP
|
||||
systems, because of pointers temporarily stored in CPU registers or
|
||||
stacks. Kmemleak defines MSECS_MIN_AGE (defaulting to 1000) representing
|
||||
the minimum age of an object to be reported as a memory leak.
|
||||
|
||||
Limitations and Drawbacks
|
||||
-------------------------
|
||||
|
||||
The main drawback is the reduced performance of memory allocation and
|
||||
freeing. To avoid other penalties, the memory scanning is only performed
|
||||
when the /sys/kernel/debug/kmemleak file is read. Anyway, this tool is
|
||||
intended for debugging purposes where the performance might not be the
|
||||
most important requirement.
|
||||
|
||||
To keep the algorithm simple, kmemleak scans for values pointing to any
|
||||
address inside a block's address range. This may lead to an increased
|
||||
number of false negatives. However, it is likely that a real memory leak
|
||||
will eventually become visible.
|
||||
|
||||
Another source of false negatives is the data stored in non-pointer
|
||||
values. In a future version, kmemleak could only scan the pointer
|
||||
members in the allocated structures. This feature would solve many of
|
||||
the false negative cases described above.
|
||||
|
||||
The tool can report false positives. These are cases where an allocated
|
||||
block doesn't need to be freed (some cases in the init_call functions),
|
||||
the pointer is calculated by other methods than the usual container_of
|
||||
macro or the pointer is stored in a location not scanned by kmemleak.
|
||||
|
||||
Page allocations and ioremap are not tracked. Only the ARM and x86
|
||||
architectures are currently supported.
|
@ -132,7 +132,7 @@ kobject_name():
|
||||
const char *kobject_name(const struct kobject * kobj);
|
||||
|
||||
There is a helper function to both initialize and add the kobject to the
|
||||
kernel at the same time, called supprisingly enough kobject_init_and_add():
|
||||
kernel at the same time, called surprisingly enough kobject_init_and_add():
|
||||
|
||||
int kobject_init_and_add(struct kobject *kobj, struct kobj_type *ktype,
|
||||
struct kobject *parent, const char *fmt, ...);
|
||||
|
@ -507,9 +507,9 @@ http://www.linuxsymposium.org/2006/linuxsymposium_procv2.pdf (pages 101-115)
|
||||
Appendix A: The kprobes debugfs interface
|
||||
|
||||
With recent kernels (> 2.6.20) the list of registered kprobes is visible
|
||||
under the /debug/kprobes/ directory (assuming debugfs is mounted at /debug).
|
||||
under the /sys/kernel/debug/kprobes/ directory (assuming debugfs is mounted at //sys/kernel/debug).
|
||||
|
||||
/debug/kprobes/list: Lists all registered probes on the system
|
||||
/sys/kernel/debug/kprobes/list: Lists all registered probes on the system
|
||||
|
||||
c015d71a k vfs_read+0x0
|
||||
c011a316 j do_fork+0x0
|
||||
@ -525,7 +525,7 @@ virtual addresses that correspond to modules that've been unloaded),
|
||||
such probes are marked with [GONE]. If the probe is temporarily disabled,
|
||||
such probes are marked with [DISABLED].
|
||||
|
||||
/debug/kprobes/enabled: Turn kprobes ON/OFF forcibly.
|
||||
/sys/kernel/debug/kprobes/enabled: Turn kprobes ON/OFF forcibly.
|
||||
|
||||
Provides a knob to globally and forcibly turn registered kprobes ON or OFF.
|
||||
By default, all kprobes are enabled. By echoing "0" to this file, all
|
||||
|
@ -40,7 +40,7 @@ NOTE: The Acer Aspire One is not supported hardware. It cannot work with
|
||||
acer-wmi until Acer fix their ACPI-WMI implementation on them, so has been
|
||||
blacklisted until that happens.
|
||||
|
||||
Please see the website for the current list of known working hardare:
|
||||
Please see the website for the current list of known working hardware:
|
||||
|
||||
http://code.google.com/p/aceracpi/wiki/SupportedHardware
|
||||
|
||||
|
@ -22,7 +22,7 @@ If your laptop model supports it, you will find sysfs files in the
|
||||
/sys/class/backlight/sony/
|
||||
directory. You will be able to query and set the current screen
|
||||
brightness:
|
||||
brightness get/set screen brightness (an iteger
|
||||
brightness get/set screen brightness (an integer
|
||||
between 0 and 7)
|
||||
actual_brightness reading from this file will query the HW
|
||||
to get real brightness value
|
||||
|
@ -506,7 +506,7 @@ generate input device EV_KEY events.
|
||||
In addition to the EV_KEY events, thinkpad-acpi may also issue EV_SW
|
||||
events for switches:
|
||||
|
||||
SW_RFKILL_ALL T60 and later hardare rfkill rocker switch
|
||||
SW_RFKILL_ALL T60 and later hardware rfkill rocker switch
|
||||
SW_TABLET_MODE Tablet ThinkPads HKEY events 0x5009 and 0x500A
|
||||
|
||||
Non hot-key ACPI HKEY event map:
|
||||
|
@ -1,6 +1,5 @@
|
||||
# This creates the demonstration utility "lguest" which runs a Linux guest.
|
||||
CFLAGS:=-Wall -Wmissing-declarations -Wmissing-prototypes -O3 -I../../include -I../../arch/x86/include -U_FORTIFY_SOURCE
|
||||
LDLIBS:=-lz
|
||||
CFLAGS:=-m32 -Wall -Wmissing-declarations -Wmissing-prototypes -O3 -I../../include -I../../arch/x86/include -U_FORTIFY_SOURCE
|
||||
|
||||
all: lguest
|
||||
|
||||
|
File diff suppressed because it is too large
Load Diff
@ -37,7 +37,6 @@ Running Lguest:
|
||||
"Paravirtualized guest support" = Y
|
||||
"Lguest guest support" = Y
|
||||
"High Memory Support" = off/4GB
|
||||
"PAE (Physical Address Extension) Support" = N
|
||||
"Alignment value to which kernel should be aligned" = 0x100000
|
||||
(CONFIG_PARAVIRT=y, CONFIG_LGUEST_GUEST=y, CONFIG_HIGHMEM64G=n and
|
||||
CONFIG_PHYSICAL_ALIGN=0x100000)
|
||||
|
@ -34,7 +34,7 @@ out of order wrt other memory writes by the owner CPU.
|
||||
|
||||
It can be done by slightly modifying the standard atomic operations : only
|
||||
their UP variant must be kept. It typically means removing LOCK prefix (on
|
||||
i386 and x86_64) and any SMP sychronization barrier. If the architecture does
|
||||
i386 and x86_64) and any SMP synchronization barrier. If the architecture does
|
||||
not have a different behavior between SMP and UP, including asm-generic/local.h
|
||||
in your architecture's local.h is sufficient.
|
||||
|
||||
|
@ -31,6 +31,7 @@ Contents:
|
||||
|
||||
- Locking functions.
|
||||
- Interrupt disabling functions.
|
||||
- Sleep and wake-up functions.
|
||||
- Miscellaneous functions.
|
||||
|
||||
(*) Inter-CPU locking barrier effects.
|
||||
@ -1217,6 +1218,132 @@ barriers are required in such a situation, they must be provided from some
|
||||
other means.
|
||||
|
||||
|
||||
SLEEP AND WAKE-UP FUNCTIONS
|
||||
---------------------------
|
||||
|
||||
Sleeping and waking on an event flagged in global data can be viewed as an
|
||||
interaction between two pieces of data: the task state of the task waiting for
|
||||
the event and the global data used to indicate the event. To make sure that
|
||||
these appear to happen in the right order, the primitives to begin the process
|
||||
of going to sleep, and the primitives to initiate a wake up imply certain
|
||||
barriers.
|
||||
|
||||
Firstly, the sleeper normally follows something like this sequence of events:
|
||||
|
||||
for (;;) {
|
||||
set_current_state(TASK_UNINTERRUPTIBLE);
|
||||
if (event_indicated)
|
||||
break;
|
||||
schedule();
|
||||
}
|
||||
|
||||
A general memory barrier is interpolated automatically by set_current_state()
|
||||
after it has altered the task state:
|
||||
|
||||
CPU 1
|
||||
===============================
|
||||
set_current_state();
|
||||
set_mb();
|
||||
STORE current->state
|
||||
<general barrier>
|
||||
LOAD event_indicated
|
||||
|
||||
set_current_state() may be wrapped by:
|
||||
|
||||
prepare_to_wait();
|
||||
prepare_to_wait_exclusive();
|
||||
|
||||
which therefore also imply a general memory barrier after setting the state.
|
||||
The whole sequence above is available in various canned forms, all of which
|
||||
interpolate the memory barrier in the right place:
|
||||
|
||||
wait_event();
|
||||
wait_event_interruptible();
|
||||
wait_event_interruptible_exclusive();
|
||||
wait_event_interruptible_timeout();
|
||||
wait_event_killable();
|
||||
wait_event_timeout();
|
||||
wait_on_bit();
|
||||
wait_on_bit_lock();
|
||||
|
||||
|
||||
Secondly, code that performs a wake up normally follows something like this:
|
||||
|
||||
event_indicated = 1;
|
||||
wake_up(&event_wait_queue);
|
||||
|
||||
or:
|
||||
|
||||
event_indicated = 1;
|
||||
wake_up_process(event_daemon);
|
||||
|
||||
A write memory barrier is implied by wake_up() and co. if and only if they wake
|
||||
something up. The barrier occurs before the task state is cleared, and so sits
|
||||
between the STORE to indicate the event and the STORE to set TASK_RUNNING:
|
||||
|
||||
CPU 1 CPU 2
|
||||
=============================== ===============================
|
||||
set_current_state(); STORE event_indicated
|
||||
set_mb(); wake_up();
|
||||
STORE current->state <write barrier>
|
||||
<general barrier> STORE current->state
|
||||
LOAD event_indicated
|
||||
|
||||
The available waker functions include:
|
||||
|
||||
complete();
|
||||
wake_up();
|
||||
wake_up_all();
|
||||
wake_up_bit();
|
||||
wake_up_interruptible();
|
||||
wake_up_interruptible_all();
|
||||
wake_up_interruptible_nr();
|
||||
wake_up_interruptible_poll();
|
||||
wake_up_interruptible_sync();
|
||||
wake_up_interruptible_sync_poll();
|
||||
wake_up_locked();
|
||||
wake_up_locked_poll();
|
||||
wake_up_nr();
|
||||
wake_up_poll();
|
||||
wake_up_process();
|
||||
|
||||
|
||||
[!] Note that the memory barriers implied by the sleeper and the waker do _not_
|
||||
order multiple stores before the wake-up with respect to loads of those stored
|
||||
values after the sleeper has called set_current_state(). For instance, if the
|
||||
sleeper does:
|
||||
|
||||
set_current_state(TASK_INTERRUPTIBLE);
|
||||
if (event_indicated)
|
||||
break;
|
||||
__set_current_state(TASK_RUNNING);
|
||||
do_something(my_data);
|
||||
|
||||
and the waker does:
|
||||
|
||||
my_data = value;
|
||||
event_indicated = 1;
|
||||
wake_up(&event_wait_queue);
|
||||
|
||||
there's no guarantee that the change to event_indicated will be perceived by
|
||||
the sleeper as coming after the change to my_data. In such a circumstance, the
|
||||
code on both sides must interpolate its own memory barriers between the
|
||||
separate data accesses. Thus the above sleeper ought to do:
|
||||
|
||||
set_current_state(TASK_INTERRUPTIBLE);
|
||||
if (event_indicated) {
|
||||
smp_rmb();
|
||||
do_something(my_data);
|
||||
}
|
||||
|
||||
and the waker should do:
|
||||
|
||||
my_data = value;
|
||||
smp_wmb();
|
||||
event_indicated = 1;
|
||||
wake_up(&event_wait_queue);
|
||||
|
||||
|
||||
MISCELLANEOUS FUNCTIONS
|
||||
-----------------------
|
||||
|
||||
@ -1366,7 +1493,7 @@ WHERE ARE MEMORY BARRIERS NEEDED?
|
||||
|
||||
Under normal operation, memory operation reordering is generally not going to
|
||||
be a problem as a single-threaded linear piece of code will still appear to
|
||||
work correctly, even if it's in an SMP kernel. There are, however, three
|
||||
work correctly, even if it's in an SMP kernel. There are, however, four
|
||||
circumstances in which reordering definitely _could_ be a problem:
|
||||
|
||||
(*) Interprocessor interaction.
|
||||
|
@ -73,13 +73,13 @@ this phase is triggered automatically. ACPI can notify this event. If not,
|
||||
(see Section 4.).
|
||||
|
||||
Logical Memory Hotplug phase is to change memory state into
|
||||
avaiable/unavailable for users. Amount of memory from user's view is
|
||||
available/unavailable for users. Amount of memory from user's view is
|
||||
changed by this phase. The kernel makes all memory in it as free pages
|
||||
when a memory range is available.
|
||||
|
||||
In this document, this phase is described as online/offline.
|
||||
|
||||
Logical Memory Hotplug phase is triggred by write of sysfs file by system
|
||||
Logical Memory Hotplug phase is triggered by write of sysfs file by system
|
||||
administrator. For the hot-add case, it must be executed after Physical Hotplug
|
||||
phase by hand.
|
||||
(However, if you writes udev's hotplug scripts for memory hotplug, these
|
||||
@ -334,7 +334,7 @@ MEMORY_CANCEL_ONLINE
|
||||
Generated if MEMORY_GOING_ONLINE fails.
|
||||
|
||||
MEMORY_ONLINE
|
||||
Generated when memory has succesfully brought online. The callback may
|
||||
Generated when memory has successfully brought online. The callback may
|
||||
allocate pages from the new memory.
|
||||
|
||||
MEMORY_GOING_OFFLINE
|
||||
@ -359,7 +359,7 @@ The third argument is passed by pointer of struct memory_notify.
|
||||
struct memory_notify {
|
||||
unsigned long start_pfn;
|
||||
unsigned long nr_pages;
|
||||
int status_cahnge_nid;
|
||||
int status_change_nid;
|
||||
}
|
||||
|
||||
start_pfn is start_pfn of online/offline memory.
|
||||
|
@ -26,7 +26,7 @@ registers and the stack. If the first argument is a 64-bit value, it will be
|
||||
passed in D0:D1. If the first argument is not a 64-bit value, but the second
|
||||
is, the second will be passed entirely on the stack and D1 will be unused.
|
||||
|
||||
Arguments smaller than 32-bits are not coelesced within a register or a stack
|
||||
Arguments smaller than 32-bits are not coalesced within a register or a stack
|
||||
word. For example, two byte-sized arguments will always be passed in separate
|
||||
registers or word-sized stack slots.
|
||||
|
||||
|
@ -50,7 +50,7 @@ byte 255: bit7 bit6 bit5 bit4 bit3 bit2 bit1 bit0 rp1 rp3 rp5 ... rp15
|
||||
cp5 cp5 cp5 cp5 cp4 cp4 cp4 cp4
|
||||
|
||||
This figure represents a sector of 256 bytes.
|
||||
cp is my abbreviaton for column parity, rp for row parity.
|
||||
cp is my abbreviation for column parity, rp for row parity.
|
||||
|
||||
Let's start to explain column parity.
|
||||
cp0 is the parity that belongs to all bit0, bit2, bit4, bit6.
|
||||
@ -560,7 +560,7 @@ Measuring this code again showed big gain. When executing the original
|
||||
linux code 1 million times, this took about 1 second on my system.
|
||||
(using time to measure the performance). After this iteration I was back
|
||||
to 0.075 sec. Actually I had to decide to start measuring over 10
|
||||
million interations in order not to loose too much accuracy. This one
|
||||
million iterations in order not to lose too much accuracy. This one
|
||||
definitely seemed to be the jackpot!
|
||||
|
||||
There is a little bit more room for improvement though. There are three
|
||||
@ -571,8 +571,8 @@ loop; This eliminates 3 statements per loop. Of course after the loop we
|
||||
need to correct by adding:
|
||||
rp4 ^= rp4_6;
|
||||
rp6 ^= rp4_6
|
||||
Furthermore there are 4 sequential assingments to rp8. This can be
|
||||
encoded slightly more efficient by saving tmppar before those 4 lines
|
||||
Furthermore there are 4 sequential assignments to rp8. This can be
|
||||
encoded slightly more efficiently by saving tmppar before those 4 lines
|
||||
and later do rp8 = rp8 ^ tmppar ^ notrp8;
|
||||
(where notrp8 is the value of rp8 before those 4 lines).
|
||||
Again a use of the commutative property of xor.
|
||||
@ -622,7 +622,7 @@ Not a big change, but every penny counts :-)
|
||||
Analysis 7
|
||||
==========
|
||||
|
||||
Acutally this made things worse. Not very much, but I don't want to move
|
||||
Actually this made things worse. Not very much, but I don't want to move
|
||||
into the wrong direction. Maybe something to investigate later. Could
|
||||
have to do with caching again.
|
||||
|
||||
@ -642,7 +642,7 @@ Analysis 8
|
||||
This makes things worse. Let's stick with attempt 6 and continue from there.
|
||||
Although it seems that the code within the loop cannot be optimised
|
||||
further there is still room to optimize the generation of the ecc codes.
|
||||
We can simply calcualate the total parity. If this is 0 then rp4 = rp5
|
||||
We can simply calculate the total parity. If this is 0 then rp4 = rp5
|
||||
etc. If the parity is 1, then rp4 = !rp5;
|
||||
But if rp4 = rp5 we do not need rp5 etc. We can just write the even bits
|
||||
in the result byte and then do something like
|
||||
|
@ -221,7 +221,7 @@ ad_select
|
||||
|
||||
- Any slave's 802.3ad association state changes
|
||||
|
||||
- The bond's adminstrative state changes to up
|
||||
- The bond's administrative state changes to up
|
||||
|
||||
count or 2
|
||||
|
||||
@ -369,7 +369,7 @@ fail_over_mac
|
||||
When this policy is used in conjuction with the mii
|
||||
monitor, devices which assert link up prior to being
|
||||
able to actually transmit and receive are particularly
|
||||
susecptible to loss of the gratuitous ARP, and an
|
||||
susceptible to loss of the gratuitous ARP, and an
|
||||
appropriate updelay setting may be required.
|
||||
|
||||
follow or 2
|
||||
@ -1794,7 +1794,7 @@ target to query.
|
||||
generally referred to as "trunk failover." This is a feature of the
|
||||
switch that causes the link state of a particular switch port to be set
|
||||
down (or up) when the state of another switch port goes down (or up).
|
||||
It's purpose is to propogate link failures from logically "exterior" ports
|
||||
Its purpose is to propagate link failures from logically "exterior" ports
|
||||
to the logically "interior" ports that bonding is able to monitor via
|
||||
miimon. Availability and configuration for trunk failover varies by
|
||||
switch, but this can be a viable alternative to the ARP monitor when using
|
||||
|
@ -36,10 +36,15 @@ This file contains
|
||||
6.2 local loopback of sent frames
|
||||
6.3 CAN controller hardware filters
|
||||
6.4 The virtual CAN driver (vcan)
|
||||
6.5 currently supported CAN hardware
|
||||
6.6 todo
|
||||
6.5 The CAN network device driver interface
|
||||
6.5.1 Netlink interface to set/get devices properties
|
||||
6.5.2 Setting the CAN bit-timing
|
||||
6.5.3 Starting and stopping the CAN network device
|
||||
6.6 supported CAN hardware
|
||||
|
||||
7 Credits
|
||||
7 Socket CAN resources
|
||||
|
||||
8 Credits
|
||||
|
||||
============================================================================
|
||||
|
||||
@ -234,6 +239,8 @@ solution for a couple of reasons:
|
||||
the user application using the common CAN filter mechanisms. Inside
|
||||
this filter definition the (interested) type of errors may be
|
||||
selected. The reception of error frames is disabled by default.
|
||||
The format of the CAN error frame is briefly decribed in the Linux
|
||||
header file "include/linux/can/error.h".
|
||||
|
||||
4. How to use Socket CAN
|
||||
------------------------
|
||||
@ -327,7 +334,7 @@ solution for a couple of reasons:
|
||||
return 1;
|
||||
}
|
||||
|
||||
/* paraniod check ... */
|
||||
/* paranoid check ... */
|
||||
if (nbytes < sizeof(struct can_frame)) {
|
||||
fprintf(stderr, "read: incomplete CAN frame\n");
|
||||
return 1;
|
||||
@ -605,61 +612,213 @@ solution for a couple of reasons:
|
||||
removal of vcan network devices can be managed with the ip(8) tool:
|
||||
|
||||
- Create a virtual CAN network interface:
|
||||
ip link add type vcan
|
||||
$ ip link add type vcan
|
||||
|
||||
- Create a virtual CAN network interface with a specific name 'vcan42':
|
||||
ip link add dev vcan42 type vcan
|
||||
$ ip link add dev vcan42 type vcan
|
||||
|
||||
- Remove a (virtual CAN) network interface 'vcan42':
|
||||
ip link del vcan42
|
||||
$ ip link del vcan42
|
||||
|
||||
The tool 'vcan' from the SocketCAN SVN repository on BerliOS is obsolete.
|
||||
6.5 The CAN network device driver interface
|
||||
|
||||
Virtual CAN network device creation in older Kernels:
|
||||
In Linux Kernel versions < 2.6.24 the vcan driver creates 4 vcan
|
||||
netdevices at module load time by default. This value can be changed
|
||||
with the module parameter 'numdev'. E.g. 'modprobe vcan numdev=8'
|
||||
The CAN network device driver interface provides a generic interface
|
||||
to setup, configure and monitor CAN network devices. The user can then
|
||||
configure the CAN device, like setting the bit-timing parameters, via
|
||||
the netlink interface using the program "ip" from the "IPROUTE2"
|
||||
utility suite. The following chapter describes briefly how to use it.
|
||||
Furthermore, the interface uses a common data structure and exports a
|
||||
set of common functions, which all real CAN network device drivers
|
||||
should use. Please have a look to the SJA1000 or MSCAN driver to
|
||||
understand how to use them. The name of the module is can-dev.ko.
|
||||
|
||||
6.5 currently supported CAN hardware
|
||||
6.5.1 Netlink interface to set/get devices properties
|
||||
|
||||
On the project website http://developer.berlios.de/projects/socketcan
|
||||
there are different drivers available:
|
||||
The CAN device must be configured via netlink interface. The supported
|
||||
netlink message types are defined and briefly described in
|
||||
"include/linux/can/netlink.h". CAN link support for the program "ip"
|
||||
of the IPROUTE2 utility suite is avaiable and it can be used as shown
|
||||
below:
|
||||
|
||||
vcan: Virtual CAN interface driver (if no real hardware is available)
|
||||
sja1000: Philips SJA1000 CAN controller (recommended)
|
||||
i82527: Intel i82527 CAN controller
|
||||
mscan: Motorola/Freescale CAN controller (e.g. inside SOC MPC5200)
|
||||
ccan: CCAN controller core (e.g. inside SOC h7202)
|
||||
slcan: For a bunch of CAN adaptors that are attached via a
|
||||
serial line ASCII protocol (for serial / USB adaptors)
|
||||
- Setting CAN device properties:
|
||||
|
||||
Additionally the different CAN adaptors (ISA/PCI/PCMCIA/USB/Parport)
|
||||
from PEAK Systemtechnik support the CAN netdevice driver model
|
||||
since Linux driver v6.0: http://www.peak-system.com/linux/index.htm
|
||||
$ ip link set can0 type can help
|
||||
Usage: ip link set DEVICE type can
|
||||
[ bitrate BITRATE [ sample-point SAMPLE-POINT] ] |
|
||||
[ tq TQ prop-seg PROP_SEG phase-seg1 PHASE-SEG1
|
||||
phase-seg2 PHASE-SEG2 [ sjw SJW ] ]
|
||||
|
||||
Please check the Mailing Lists on the berlios OSS project website.
|
||||
[ loopback { on | off } ]
|
||||
[ listen-only { on | off } ]
|
||||
[ triple-sampling { on | off } ]
|
||||
|
||||
6.6 todo
|
||||
[ restart-ms TIME-MS ]
|
||||
[ restart ]
|
||||
|
||||
The configuration interface for CAN network drivers is still an open
|
||||
issue that has not been finalized in the socketcan project. Also the
|
||||
idea of having a library module (candev.ko) that holds functions
|
||||
that are needed by all CAN netdevices is not ready to ship.
|
||||
Your contribution is welcome.
|
||||
Where: BITRATE := { 1..1000000 }
|
||||
SAMPLE-POINT := { 0.000..0.999 }
|
||||
TQ := { NUMBER }
|
||||
PROP-SEG := { 1..8 }
|
||||
PHASE-SEG1 := { 1..8 }
|
||||
PHASE-SEG2 := { 1..8 }
|
||||
SJW := { 1..4 }
|
||||
RESTART-MS := { 0 | NUMBER }
|
||||
|
||||
7. Credits
|
||||
- Display CAN device details and statistics:
|
||||
|
||||
$ ip -details -statistics link show can0
|
||||
2: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP qlen 10
|
||||
link/can
|
||||
can <TRIPLE-SAMPLING> state ERROR-ACTIVE restart-ms 100
|
||||
bitrate 125000 sample_point 0.875
|
||||
tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
|
||||
sja1000: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
|
||||
clock 8000000
|
||||
re-started bus-errors arbit-lost error-warn error-pass bus-off
|
||||
41 17457 0 41 42 41
|
||||
RX: bytes packets errors dropped overrun mcast
|
||||
140859 17608 17457 0 0 0
|
||||
TX: bytes packets errors dropped carrier collsns
|
||||
861 112 0 41 0 0
|
||||
|
||||
More info to the above output:
|
||||
|
||||
"<TRIPLE-SAMPLING>"
|
||||
Shows the list of selected CAN controller modes: LOOPBACK,
|
||||
LISTEN-ONLY, or TRIPLE-SAMPLING.
|
||||
|
||||
"state ERROR-ACTIVE"
|
||||
The current state of the CAN controller: "ERROR-ACTIVE",
|
||||
"ERROR-WARNING", "ERROR-PASSIVE", "BUS-OFF" or "STOPPED"
|
||||
|
||||
"restart-ms 100"
|
||||
Automatic restart delay time. If set to a non-zero value, a
|
||||
restart of the CAN controller will be triggered automatically
|
||||
in case of a bus-off condition after the specified delay time
|
||||
in milliseconds. By default it's off.
|
||||
|
||||
"bitrate 125000 sample_point 0.875"
|
||||
Shows the real bit-rate in bits/sec and the sample-point in the
|
||||
range 0.000..0.999. If the calculation of bit-timing parameters
|
||||
is enabled in the kernel (CONFIG_CAN_CALC_BITTIMING=y), the
|
||||
bit-timing can be defined by setting the "bitrate" argument.
|
||||
Optionally the "sample-point" can be specified. By default it's
|
||||
0.000 assuming CIA-recommended sample-points.
|
||||
|
||||
"tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1"
|
||||
Shows the time quanta in ns, propagation segment, phase buffer
|
||||
segment 1 and 2 and the synchronisation jump width in units of
|
||||
tq. They allow to define the CAN bit-timing in a hardware
|
||||
independent format as proposed by the Bosch CAN 2.0 spec (see
|
||||
chapter 8 of http://www.semiconductors.bosch.de/pdf/can2spec.pdf).
|
||||
|
||||
"sja1000: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
|
||||
clock 8000000"
|
||||
Shows the bit-timing constants of the CAN controller, here the
|
||||
"sja1000". The minimum and maximum values of the time segment 1
|
||||
and 2, the synchronisation jump width in units of tq, the
|
||||
bitrate pre-scaler and the CAN system clock frequency in Hz.
|
||||
These constants could be used for user-defined (non-standard)
|
||||
bit-timing calculation algorithms in user-space.
|
||||
|
||||
"re-started bus-errors arbit-lost error-warn error-pass bus-off"
|
||||
Shows the number of restarts, bus and arbitration lost errors,
|
||||
and the state changes to the error-warning, error-passive and
|
||||
bus-off state. RX overrun errors are listed in the "overrun"
|
||||
field of the standard network statistics.
|
||||
|
||||
6.5.2 Setting the CAN bit-timing
|
||||
|
||||
The CAN bit-timing parameters can always be defined in a hardware
|
||||
independent format as proposed in the Bosch CAN 2.0 specification
|
||||
specifying the arguments "tq", "prop_seg", "phase_seg1", "phase_seg2"
|
||||
and "sjw":
|
||||
|
||||
$ ip link set canX type can tq 125 prop-seg 6 \
|
||||
phase-seg1 7 phase-seg2 2 sjw 1
|
||||
|
||||
If the kernel option CONFIG_CAN_CALC_BITTIMING is enabled, CIA
|
||||
recommended CAN bit-timing parameters will be calculated if the bit-
|
||||
rate is specified with the argument "bitrate":
|
||||
|
||||
$ ip link set canX type can bitrate 125000
|
||||
|
||||
Note that this works fine for the most common CAN controllers with
|
||||
standard bit-rates but may *fail* for exotic bit-rates or CAN system
|
||||
clock frequencies. Disabling CONFIG_CAN_CALC_BITTIMING saves some
|
||||
space and allows user-space tools to solely determine and set the
|
||||
bit-timing parameters. The CAN controller specific bit-timing
|
||||
constants can be used for that purpose. They are listed by the
|
||||
following command:
|
||||
|
||||
$ ip -details link show can0
|
||||
...
|
||||
sja1000: clock 8000000 tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
|
||||
|
||||
6.5.3 Starting and stopping the CAN network device
|
||||
|
||||
A CAN network device is started or stopped as usual with the command
|
||||
"ifconfig canX up/down" or "ip link set canX up/down". Be aware that
|
||||
you *must* define proper bit-timing parameters for real CAN devices
|
||||
before you can start it to avoid error-prone default settings:
|
||||
|
||||
$ ip link set canX up type can bitrate 125000
|
||||
|
||||
A device may enter the "bus-off" state if too much errors occurred on
|
||||
the CAN bus. Then no more messages are received or sent. An automatic
|
||||
bus-off recovery can be enabled by setting the "restart-ms" to a
|
||||
non-zero value, e.g.:
|
||||
|
||||
$ ip link set canX type can restart-ms 100
|
||||
|
||||
Alternatively, the application may realize the "bus-off" condition
|
||||
by monitoring CAN error frames and do a restart when appropriate with
|
||||
the command:
|
||||
|
||||
$ ip link set canX type can restart
|
||||
|
||||
Note that a restart will also create a CAN error frame (see also
|
||||
chapter 3.4).
|
||||
|
||||
6.6 Supported CAN hardware
|
||||
|
||||
Please check the "Kconfig" file in "drivers/net/can" to get an actual
|
||||
list of the support CAN hardware. On the Socket CAN project website
|
||||
(see chapter 7) there might be further drivers available, also for
|
||||
older kernel versions.
|
||||
|
||||
7. Socket CAN resources
|
||||
-----------------------
|
||||
|
||||
You can find further resources for Socket CAN like user space tools,
|
||||
support for old kernel versions, more drivers, mailing lists, etc.
|
||||
at the BerliOS OSS project website for Socket CAN:
|
||||
|
||||
http://developer.berlios.de/projects/socketcan
|
||||
|
||||
If you have questions, bug fixes, etc., don't hesitate to post them to
|
||||
the Socketcan-Users mailing list. But please search the archives first.
|
||||
|
||||
8. Credits
|
||||
----------
|
||||
|
||||
Oliver Hartkopp (PF_CAN core, filters, drivers, bcm)
|
||||
Oliver Hartkopp (PF_CAN core, filters, drivers, bcm, SJA1000 driver)
|
||||
Urs Thuermann (PF_CAN core, kernel integration, socket interfaces, raw, vcan)
|
||||
Jan Kizka (RT-SocketCAN core, Socket-API reconciliation)
|
||||
Wolfgang Grandegger (RT-SocketCAN core & drivers, Raw Socket-API reviews)
|
||||
Wolfgang Grandegger (RT-SocketCAN core & drivers, Raw Socket-API reviews,
|
||||
CAN device driver interface, MSCAN driver)
|
||||
Robert Schwebel (design reviews, PTXdist integration)
|
||||
Marc Kleine-Budde (design reviews, Kernel 2.6 cleanups, drivers)
|
||||
Benedikt Spranger (reviews)
|
||||
Thomas Gleixner (LKML reviews, coding style, posting hints)
|
||||
Andrey Volkov (kernel subtree structure, ioctls, mscan driver)
|
||||
Andrey Volkov (kernel subtree structure, ioctls, MSCAN driver)
|
||||
Matthias Brukner (first SJA1000 CAN netdevice implementation Q2/2003)
|
||||
Klaus Hitschler (PEAK driver integration)
|
||||
Uwe Koppe (CAN netdevices with PF_PACKET approach)
|
||||
Michael Schulze (driver layer loopback requirement, RT CAN drivers review)
|
||||
Pavel Pisa (Bit-timing calculation)
|
||||
Sascha Hauer (SJA1000 platform driver)
|
||||
Sebastian Haas (SJA1000 EMS PCI driver)
|
||||
Markus Plessing (SJA1000 EMS PCI driver)
|
||||
Per Dalen (SJA1000 Kvaser PCI driver)
|
||||
Sam Ravnborg (reviews, coding style, kbuild help)
|
||||
|
@ -129,7 +129,7 @@ PHY Link state polling
|
||||
----------------------
|
||||
|
||||
The driver keeps track of the link state and informs the network core
|
||||
about link (carrier) availablilty. This is managed by several methods
|
||||
about link (carrier) availability. This is managed by several methods
|
||||
depending on the version of the chip and on which PHY is being used.
|
||||
|
||||
For the internal PHY, the original (and currently default) method is
|
||||
|
76
Documentation/networking/ieee802154.txt
Normal file
76
Documentation/networking/ieee802154.txt
Normal file
@ -0,0 +1,76 @@
|
||||
|
||||
Linux IEEE 802.15.4 implementation
|
||||
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
The Linux-ZigBee project goal is to provide complete implementation
|
||||
of IEEE 802.15.4 / ZigBee / 6LoWPAN protocols. IEEE 802.15.4 is a stack
|
||||
of protocols for organizing Low-Rate Wireless Personal Area Networks.
|
||||
|
||||
Currently only IEEE 802.15.4 layer is implemented. We have choosen
|
||||
to use plain Berkeley socket API, the generic Linux networking stack
|
||||
to transfer IEEE 802.15.4 messages and a special protocol over genetlink
|
||||
for configuration/management
|
||||
|
||||
|
||||
Socket API
|
||||
==========
|
||||
|
||||
int sd = socket(PF_IEEE802154, SOCK_DGRAM, 0);
|
||||
.....
|
||||
|
||||
The address family, socket addresses etc. are defined in the
|
||||
include/net/ieee802154/af_ieee802154.h header or in the special header
|
||||
in our userspace package (see either linux-zigbee sourceforge download page
|
||||
or git tree at git://linux-zigbee.git.sourceforge.net/gitroot/linux-zigbee).
|
||||
|
||||
One can use SOCK_RAW for passing raw data towards device xmit function. YMMV.
|
||||
|
||||
|
||||
MLME - MAC Level Management
|
||||
============================
|
||||
|
||||
Most of IEEE 802.15.4 MLME interfaces are directly mapped on netlink commands.
|
||||
See the include/net/ieee802154/nl802154.h header. Our userspace tools package
|
||||
(see above) provides CLI configuration utility for radio interfaces and simple
|
||||
coordinator for IEEE 802.15.4 networks as an example users of MLME protocol.
|
||||
|
||||
|
||||
Kernel side
|
||||
=============
|
||||
|
||||
Like with WiFi, there are several types of devices implementing IEEE 802.15.4.
|
||||
1) 'HardMAC'. The MAC layer is implemented in the device itself, the device
|
||||
exports MLME and data API.
|
||||
2) 'SoftMAC' or just radio. These types of devices are just radio transceivers
|
||||
possibly with some kinds of acceleration like automatic CRC computation and
|
||||
comparation, automagic ACK handling, address matching, etc.
|
||||
|
||||
Those types of devices require different approach to be hooked into Linux kernel.
|
||||
|
||||
|
||||
HardMAC
|
||||
=======
|
||||
|
||||
See the header include/net/ieee802154/netdevice.h. You have to implement Linux
|
||||
net_device, with .type = ARPHRD_IEEE802154. Data is exchanged with socket family
|
||||
code via plain sk_buffs. The control block of sk_buffs will contain additional
|
||||
info as described in the struct ieee802154_mac_cb.
|
||||
|
||||
To hook the MLME interface you have to populate the ml_priv field of your
|
||||
net_device with a pointer to struct ieee802154_mlme_ops instance. All fields are
|
||||
required.
|
||||
|
||||
We provide an example of simple HardMAC driver at drivers/ieee802154/fakehard.c
|
||||
|
||||
|
||||
SoftMAC
|
||||
=======
|
||||
|
||||
We are going to provide intermediate layer impelementing IEEE 802.15.4 MAC
|
||||
in software. This is currently WIP.
|
||||
|
||||
See header include/net/ieee802154/mac802154.h and several drivers in
|
||||
drivers/ieee802154/
|
@ -168,7 +168,16 @@ tcp_dsack - BOOLEAN
|
||||
Allows TCP to send "duplicate" SACKs.
|
||||
|
||||
tcp_ecn - BOOLEAN
|
||||
Enable Explicit Congestion Notification in TCP.
|
||||
Enable Explicit Congestion Notification (ECN) in TCP. ECN is only
|
||||
used when both ends of the TCP flow support it. It is useful to
|
||||
avoid losses due to congestion (when the bottleneck router supports
|
||||
ECN).
|
||||
Possible values are:
|
||||
0 disable ECN
|
||||
1 ECN enabled
|
||||
2 Only server-side ECN enabled. If the other end does
|
||||
not support ECN, behavior is like with ECN disabled.
|
||||
Default: 2
|
||||
|
||||
tcp_fack - BOOLEAN
|
||||
Enable FACK congestion avoidance and fast retransmission.
|
||||
@ -1048,6 +1057,13 @@ disable_ipv6 - BOOLEAN
|
||||
address.
|
||||
Default: FALSE (enable IPv6 operation)
|
||||
|
||||
When this value is changed from 1 to 0 (IPv6 is being enabled),
|
||||
it will dynamically create a link-local address on the given
|
||||
interface and start Duplicate Address Detection, if necessary.
|
||||
|
||||
When this value is changed from 0 to 1 (IPv6 is being disabled),
|
||||
it will dynamically delete all address on the given interface.
|
||||
|
||||
accept_dad - INTEGER
|
||||
Whether to accept DAD (Duplicate Address Detection).
|
||||
0: Disable DAD
|
||||
@ -1266,13 +1282,22 @@ sctp_rmem - vector of 3 INTEGERs: min, default, max
|
||||
sctp_wmem - vector of 3 INTEGERs: min, default, max
|
||||
See tcp_wmem for a description.
|
||||
|
||||
UNDOCUMENTED:
|
||||
|
||||
/proc/sys/net/core/*
|
||||
dev_weight FIXME
|
||||
dev_weight - INTEGER
|
||||
The maximum number of packets that kernel can handle on a NAPI
|
||||
interrupt, it's a Per-CPU variable.
|
||||
|
||||
Default: 64
|
||||
|
||||
/proc/sys/net/unix/*
|
||||
max_dgram_qlen FIXME
|
||||
max_dgram_qlen - INTEGER
|
||||
The maximum length of dgram socket receive queue
|
||||
|
||||
Default: 10
|
||||
|
||||
|
||||
UNDOCUMENTED:
|
||||
|
||||
/proc/sys/net/irda/*
|
||||
fast_poll_increase FIXME
|
||||
|
@ -33,3 +33,40 @@ disable
|
||||
|
||||
A reboot is required to enable IPv6.
|
||||
|
||||
autoconf
|
||||
|
||||
Specifies whether to enable IPv6 address autoconfiguration
|
||||
on all interfaces. This might be used when one does not wish
|
||||
for addresses to be automatically generated from prefixes
|
||||
received in Router Advertisements.
|
||||
|
||||
The possible values and their effects are:
|
||||
|
||||
0
|
||||
IPv6 address autoconfiguration is disabled on all interfaces.
|
||||
|
||||
Only the IPv6 loopback address (::1) and link-local addresses
|
||||
will be added to interfaces.
|
||||
|
||||
1
|
||||
IPv6 address autoconfiguration is enabled on all interfaces.
|
||||
|
||||
This is the default value.
|
||||
|
||||
disable_ipv6
|
||||
|
||||
Specifies whether to disable IPv6 on all interfaces.
|
||||
This might be used when no IPv6 addresses are desired.
|
||||
|
||||
The possible values and their effects are:
|
||||
|
||||
0
|
||||
IPv6 is enabled on all interfaces.
|
||||
|
||||
This is the default value.
|
||||
|
||||
1
|
||||
IPv6 is disabled on all interfaces.
|
||||
|
||||
No IPv6 addresses will be added to interfaces.
|
||||
|
||||
|
@ -158,7 +158,7 @@ Sample Userspace Code
|
||||
}
|
||||
return 0;
|
||||
|
||||
Miscellanous
|
||||
Miscellaneous
|
||||
============
|
||||
|
||||
The PPPoL2TP driver was developed as part of the OpenL2TP project by
|
||||
|
@ -12,38 +12,22 @@ following format:
|
||||
The radiotap format is discussed in
|
||||
./Documentation/networking/radiotap-headers.txt.
|
||||
|
||||
Despite 13 radiotap argument types are currently defined, most only make sense
|
||||
Despite many radiotap parameters being currently defined, most only make sense
|
||||
to appear on received packets. The following information is parsed from the
|
||||
radiotap headers and used to control injection:
|
||||
|
||||
* IEEE80211_RADIOTAP_RATE
|
||||
|
||||
rate in 500kbps units, automatic if invalid or not present
|
||||
|
||||
|
||||
* IEEE80211_RADIOTAP_ANTENNA
|
||||
|
||||
antenna to use, automatic if not present
|
||||
|
||||
|
||||
* IEEE80211_RADIOTAP_DBM_TX_POWER
|
||||
|
||||
transmit power in dBm, automatic if not present
|
||||
|
||||
|
||||
* IEEE80211_RADIOTAP_FLAGS
|
||||
|
||||
IEEE80211_RADIOTAP_F_FCS: FCS will be removed and recalculated
|
||||
IEEE80211_RADIOTAP_F_WEP: frame will be encrypted if key available
|
||||
IEEE80211_RADIOTAP_F_FRAG: frame will be fragmented if longer than the
|
||||
current fragmentation threshold. Note that
|
||||
this flag is only reliable when software
|
||||
fragmentation is enabled)
|
||||
current fragmentation threshold.
|
||||
|
||||
|
||||
The injection code can also skip all other currently defined radiotap fields
|
||||
facilitating replay of captured radiotap headers directly.
|
||||
|
||||
Here is an example valid radiotap header defining these three parameters
|
||||
Here is an example valid radiotap header defining some parameters
|
||||
|
||||
0x00, 0x00, // <-- radiotap version
|
||||
0x0b, 0x00, // <- radiotap header length
|
||||
@ -72,8 +56,8 @@ interface), along the following lines:
|
||||
...
|
||||
r = pcap_inject(ppcap, u8aSendBuffer, nLength);
|
||||
|
||||
You can also find sources for a complete inject test applet here:
|
||||
You can also find a link to a complete inject application here:
|
||||
|
||||
http://penumbra.warmcat.com/_twk/tiki-index.php?page=packetspammer
|
||||
http://wireless.kernel.org/en/users/Documentation/packetspammer
|
||||
|
||||
Andy Green <andy@warmcat.com>
|
||||
|
@ -74,7 +74,7 @@ dev->hard_start_xmit:
|
||||
for this and return NETDEV_TX_LOCKED when the spin lock fails.
|
||||
The locking there should also properly protect against
|
||||
set_multicast_list. Note that the use of NETIF_F_LLTX is deprecated.
|
||||
Dont use it for new drivers.
|
||||
Don't use it for new drivers.
|
||||
|
||||
Context: Process with BHs disabled or BH (timer),
|
||||
will be called with interrupts disabled by netconsole.
|
||||
|
@ -38,9 +38,6 @@ ifinfomsg::if_flags & IFF_LOWER_UP:
|
||||
ifinfomsg::if_flags & IFF_DORMANT:
|
||||
Driver has signaled netif_dormant_on()
|
||||
|
||||
These interface flags can also be queried without netlink using the
|
||||
SIOCGIFFLAGS ioctl.
|
||||
|
||||
TLV IFLA_OPERSTATE
|
||||
|
||||
contains RFC2863 state of the interface in numeric representation:
|
||||
|
@ -4,16 +4,18 @@
|
||||
|
||||
This file documents the CONFIG_PACKET_MMAP option available with the PACKET
|
||||
socket interface on 2.4 and 2.6 kernels. This type of sockets is used for
|
||||
capture network traffic with utilities like tcpdump or any other that uses
|
||||
the libpcap library.
|
||||
|
||||
You can find the latest version of this document at
|
||||
capture network traffic with utilities like tcpdump or any other that needs
|
||||
raw access to network interface.
|
||||
|
||||
You can find the latest version of this document at:
|
||||
http://pusa.uv.es/~ulisses/packet_mmap/
|
||||
|
||||
Please send me your comments to
|
||||
Howto can be found at:
|
||||
http://wiki.gnu-log.net (packet_mmap)
|
||||
|
||||
Please send your comments to
|
||||
Ulisses Alonso Camaró <uaca@i.hate.spam.alumni.uv.es>
|
||||
Johann Baudy <johann.baudy@gnu-log.net>
|
||||
|
||||
-------------------------------------------------------------------------------
|
||||
+ Why use PACKET_MMAP
|
||||
@ -25,19 +27,24 @@ to capture each packet, it requires two if you want to get packet's
|
||||
timestamp (like libpcap always does).
|
||||
|
||||
In the other hand PACKET_MMAP is very efficient. PACKET_MMAP provides a size
|
||||
configurable circular buffer mapped in user space. This way reading packets just
|
||||
needs to wait for them, most of the time there is no need to issue a single
|
||||
system call. By using a shared buffer between the kernel and the user
|
||||
also has the benefit of minimizing packet copies.
|
||||
configurable circular buffer mapped in user space that can be used to either
|
||||
send or receive packets. This way reading packets just needs to wait for them,
|
||||
most of the time there is no need to issue a single system call. Concerning
|
||||
transmission, multiple packets can be sent through one system call to get the
|
||||
highest bandwidth.
|
||||
By using a shared buffer between the kernel and the user also has the benefit
|
||||
of minimizing packet copies.
|
||||
|
||||
It's fine to use PACKET_MMAP to improve the performance of the capture process,
|
||||
but it isn't everything. At least, if you are capturing at high speeds (this
|
||||
is relative to the cpu speed), you should check if the device driver of your
|
||||
network interface card supports some sort of interrupt load mitigation or
|
||||
(even better) if it supports NAPI, also make sure it is enabled.
|
||||
It's fine to use PACKET_MMAP to improve the performance of the capture and
|
||||
transmission process, but it isn't everything. At least, if you are capturing
|
||||
at high speeds (this is relative to the cpu speed), you should check if the
|
||||
device driver of your network interface card supports some sort of interrupt
|
||||
load mitigation or (even better) if it supports NAPI, also make sure it is
|
||||
enabled. For transmission, check the MTU (Maximum Transmission Unit) used and
|
||||
supported by devices of your network.
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
+ How to use CONFIG_PACKET_MMAP
|
||||
+ How to use CONFIG_PACKET_MMAP to improve capture process
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
From the user standpoint, you should use the higher level libpcap library, which
|
||||
@ -57,7 +64,7 @@ the low level details or want to improve libpcap by including PACKET_MMAP
|
||||
support.
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
+ How to use CONFIG_PACKET_MMAP directly
|
||||
+ How to use CONFIG_PACKET_MMAP directly to improve capture process
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
From the system calls stand point, the use of PACKET_MMAP involves
|
||||
@ -66,6 +73,7 @@ the following process:
|
||||
|
||||
[setup] socket() -------> creation of the capture socket
|
||||
setsockopt() ---> allocation of the circular buffer (ring)
|
||||
option: PACKET_RX_RING
|
||||
mmap() ---------> mapping of the allocated buffer to the
|
||||
user process
|
||||
|
||||
@ -96,6 +104,65 @@ Next I will describe PACKET_MMAP settings and it's constraints,
|
||||
also the mapping of the circular buffer in the user process and
|
||||
the use of this buffer.
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
+ How to use CONFIG_PACKET_MMAP directly to improve transmission process
|
||||
--------------------------------------------------------------------------------
|
||||
Transmission process is similar to capture as shown below.
|
||||
|
||||
[setup] socket() -------> creation of the transmission socket
|
||||
setsockopt() ---> allocation of the circular buffer (ring)
|
||||
option: PACKET_TX_RING
|
||||
bind() ---------> bind transmission socket with a network interface
|
||||
mmap() ---------> mapping of the allocated buffer to the
|
||||
user process
|
||||
|
||||
[transmission] poll() ---------> wait for free packets (optional)
|
||||
send() ---------> send all packets that are set as ready in
|
||||
the ring
|
||||
The flag MSG_DONTWAIT can be used to return
|
||||
before end of transfer.
|
||||
|
||||
[shutdown] close() --------> destruction of the transmission socket and
|
||||
deallocation of all associated resources.
|
||||
|
||||
Binding the socket to your network interface is mandatory (with zero copy) to
|
||||
know the header size of frames used in the circular buffer.
|
||||
|
||||
As capture, each frame contains two parts:
|
||||
|
||||
--------------------
|
||||
| struct tpacket_hdr | Header. It contains the status of
|
||||
| | of this frame
|
||||
|--------------------|
|
||||
| data buffer |
|
||||
. . Data that will be sent over the network interface.
|
||||
. .
|
||||
--------------------
|
||||
|
||||
bind() associates the socket to your network interface thanks to
|
||||
sll_ifindex parameter of struct sockaddr_ll.
|
||||
|
||||
Initialization example:
|
||||
|
||||
struct sockaddr_ll my_addr;
|
||||
struct ifreq s_ifr;
|
||||
...
|
||||
|
||||
strncpy (s_ifr.ifr_name, "eth0", sizeof(s_ifr.ifr_name));
|
||||
|
||||
/* get interface index of eth0 */
|
||||
ioctl(this->socket, SIOCGIFINDEX, &s_ifr);
|
||||
|
||||
/* fill sockaddr_ll struct to prepare binding */
|
||||
my_addr.sll_family = AF_PACKET;
|
||||
my_addr.sll_protocol = ETH_P_ALL;
|
||||
my_addr.sll_ifindex = s_ifr.ifr_ifindex;
|
||||
|
||||
/* bind socket to eth0 */
|
||||
bind(this->socket, (struct sockaddr *)&my_addr, sizeof(struct sockaddr_ll));
|
||||
|
||||
A complete tutorial is available at: http://wiki.gnu-log.net/
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
+ PACKET_MMAP settings
|
||||
--------------------------------------------------------------------------------
|
||||
@ -103,7 +170,10 @@ the use of this buffer.
|
||||
|
||||
To setup PACKET_MMAP from user level code is done with a call like
|
||||
|
||||
- Capture process
|
||||
setsockopt(fd, SOL_PACKET, PACKET_RX_RING, (void *) &req, sizeof(req))
|
||||
- Transmission process
|
||||
setsockopt(fd, SOL_PACKET, PACKET_TX_RING, (void *) &req, sizeof(req))
|
||||
|
||||
The most significant argument in the previous call is the req parameter,
|
||||
this parameter must to have the following structure:
|
||||
@ -117,11 +187,11 @@ this parameter must to have the following structure:
|
||||
};
|
||||
|
||||
This structure is defined in /usr/include/linux/if_packet.h and establishes a
|
||||
circular buffer (ring) of unswappable memory mapped in the capture process.
|
||||
circular buffer (ring) of unswappable memory.
|
||||
Being mapped in the capture process allows reading the captured frames and
|
||||
related meta-information like timestamps without requiring a system call.
|
||||
|
||||
Captured frames are grouped in blocks. Each block is a physically contiguous
|
||||
Frames are grouped in blocks. Each block is a physically contiguous
|
||||
region of memory and holds tp_block_size/tp_frame_size frames. The total number
|
||||
of blocks is tp_block_nr. Note that tp_frame_nr is a redundant parameter because
|
||||
|
||||
@ -336,6 +406,7 @@ struct tpacket_hdr). If this field is 0 means that the frame is ready
|
||||
to be used for the kernel, If not, there is a frame the user can read
|
||||
and the following flags apply:
|
||||
|
||||
+++ Capture process:
|
||||
from include/linux/if_packet.h
|
||||
|
||||
#define TP_STATUS_COPY 2
|
||||
@ -391,6 +462,37 @@ packets are in the ring:
|
||||
It doesn't incur in a race condition to first check the status value and
|
||||
then poll for frames.
|
||||
|
||||
|
||||
++ Transmission process
|
||||
Those defines are also used for transmission:
|
||||
|
||||
#define TP_STATUS_AVAILABLE 0 // Frame is available
|
||||
#define TP_STATUS_SEND_REQUEST 1 // Frame will be sent on next send()
|
||||
#define TP_STATUS_SENDING 2 // Frame is currently in transmission
|
||||
#define TP_STATUS_WRONG_FORMAT 4 // Frame format is not correct
|
||||
|
||||
First, the kernel initializes all frames to TP_STATUS_AVAILABLE. To send a
|
||||
packet, the user fills a data buffer of an available frame, sets tp_len to
|
||||
current data buffer size and sets its status field to TP_STATUS_SEND_REQUEST.
|
||||
This can be done on multiple frames. Once the user is ready to transmit, it
|
||||
calls send(). Then all buffers with status equal to TP_STATUS_SEND_REQUEST are
|
||||
forwarded to the network device. The kernel updates each status of sent
|
||||
frames with TP_STATUS_SENDING until the end of transfer.
|
||||
At the end of each transfer, buffer status returns to TP_STATUS_AVAILABLE.
|
||||
|
||||
header->tp_len = in_i_size;
|
||||
header->tp_status = TP_STATUS_SEND_REQUEST;
|
||||
retval = send(this->socket, NULL, 0, 0);
|
||||
|
||||
The user can also use poll() to check if a buffer is available:
|
||||
(status == TP_STATUS_SENDING)
|
||||
|
||||
struct pollfd pfd;
|
||||
pfd.fd = fd;
|
||||
pfd.revents = 0;
|
||||
pfd.events = POLLOUT;
|
||||
retval = poll(&pfd, 1, timeout);
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
+ THANKS
|
||||
--------------------------------------------------------------------------------
|
||||
|
@ -36,7 +36,7 @@ Phonet packets have a common header as follows:
|
||||
On Linux, the link-layer header includes the pn_media byte (see below).
|
||||
The next 7 bytes are part of the network-layer header.
|
||||
|
||||
The device ID is split: the 6 higher-order bits consitute the device
|
||||
The device ID is split: the 6 higher-order bits constitute the device
|
||||
address, while the 2 lower-order bits are used for multiplexing, as are
|
||||
the 8-bit object identifiers. As such, Phonet can be considered as a
|
||||
network layer with 6 bits of address space and 10 bits for transport
|
||||
|
@ -89,7 +89,7 @@ added to this document when its support is enabled.
|
||||
Device drivers who provide their own built regulatory domain
|
||||
do not need a callback as the channels registered by them are
|
||||
the only ones that will be allowed and therefore *additional*
|
||||
cannels cannot be enabled.
|
||||
channels cannot be enabled.
|
||||
|
||||
Example code - drivers hinting an alpha2:
|
||||
------------------------------------------
|
||||
|
@ -75,9 +75,6 @@ may need to apply in domain-specific ways to their devices:
|
||||
struct bus_type {
|
||||
...
|
||||
int (*suspend)(struct device *dev, pm_message_t state);
|
||||
int (*suspend_late)(struct device *dev, pm_message_t state);
|
||||
|
||||
int (*resume_early)(struct device *dev);
|
||||
int (*resume)(struct device *dev);
|
||||
};
|
||||
|
||||
@ -226,20 +223,7 @@ The phases are seen by driver notifications issued in this order:
|
||||
|
||||
This call should handle parts of device suspend logic that require
|
||||
sleeping. It probably does work to quiesce the device which hasn't
|
||||
been abstracted into class.suspend() or bus.suspend_late().
|
||||
|
||||
3 bus.suspend_late(dev, message) is called with IRQs disabled, and
|
||||
with only one CPU active. Until the bus.resume_early() phase
|
||||
completes (see later), IRQs are not enabled again. This method
|
||||
won't be exposed by all busses; for message based busses like USB,
|
||||
I2C, or SPI, device interactions normally require IRQs. This bus
|
||||
call may be morphed into a driver call with bus-specific parameters.
|
||||
|
||||
This call might save low level hardware state that might otherwise
|
||||
be lost in the upcoming low power state, and actually put the
|
||||
device into a low power state ... so that in some cases the device
|
||||
may stay partly usable until this late. This "late" call may also
|
||||
help when coping with hardware that behaves badly.
|
||||
been abstracted into class.suspend().
|
||||
|
||||
The pm_message_t parameter is currently used to refine those semantics
|
||||
(described later).
|
||||
@ -351,19 +335,11 @@ devices processing each phase's calls before the next phase begins.
|
||||
|
||||
The phases are seen by driver notifications issued in this order:
|
||||
|
||||
1 bus.resume_early(dev) is called with IRQs disabled, and with
|
||||
only one CPU active. As with bus.suspend_late(), this method
|
||||
won't be supported on busses that require IRQs in order to
|
||||
interact with devices.
|
||||
1 bus.resume(dev) reverses the effects of bus.suspend(). This may
|
||||
be morphed into a device driver call with bus-specific parameters;
|
||||
implementations may sleep.
|
||||
|
||||
This reverses the effects of bus.suspend_late().
|
||||
|
||||
2 bus.resume(dev) is called next. This may be morphed into a device
|
||||
driver call with bus-specific parameters; implementations may sleep.
|
||||
|
||||
This reverses the effects of bus.suspend().
|
||||
|
||||
3 class.resume(dev) is called for devices associated with a class
|
||||
2 class.resume(dev) is called for devices associated with a class
|
||||
that has such a method. Implementations may sleep.
|
||||
|
||||
This reverses the effects of class.suspend(), and would usually
|
||||
|
@ -178,5 +178,5 @@ Consumers can uregister interest by calling :-
|
||||
int regulator_unregister_notifier(struct regulator *regulator,
|
||||
struct notifier_block *nb);
|
||||
|
||||
Regulators use the kernel notifier framework to send event to thier interested
|
||||
Regulators use the kernel notifier framework to send event to their interested
|
||||
consumers.
|
||||
|
@ -119,7 +119,7 @@ Some terms used in this document:-
|
||||
battery power, USB power)
|
||||
|
||||
Regulator Domains: is the new current limit within the
|
||||
regulator operating parameters for input/ouput voltage.
|
||||
regulator operating parameters for input/output voltage.
|
||||
|
||||
If the regulator request passes all the constraint tests
|
||||
then the new regulator value is applied.
|
||||
|
@ -63,7 +63,7 @@ hardware during resume operations where a value can be set that will
|
||||
survive a reboot.
|
||||
|
||||
Consequence is that after a resume (even if it is successful) your system
|
||||
clock will have a value corresponding to the magic mumber instead of the
|
||||
clock will have a value corresponding to the magic number instead of the
|
||||
correct date/time! It is therefore advisable to use a program like ntp-date
|
||||
or rdate to reset the correct date/time from an external time source when
|
||||
using this trace option.
|
||||
|
@ -109,7 +109,7 @@ unfreeze user space processes frozen by SNAPSHOT_UNFREEZE if they are
|
||||
still frozen when the device is being closed).
|
||||
|
||||
Currently it is assumed that the userland utilities reading/writing the
|
||||
snapshot image from/to the kernel will use a swap parition, called the resume
|
||||
snapshot image from/to the kernel will use a swap partition, called the resume
|
||||
partition, or a swap file as storage space (if a swap file is used, the resume
|
||||
partition is the partition that holds this file). However, this is not really
|
||||
required, as they can use, for example, a special (blank) suspend partition or
|
||||
|
@ -1356,7 +1356,7 @@ platforms are moved over to use the flattened-device-tree model.
|
||||
- phy-map : 1 cell, optional, bitmap of addresses to probe the PHY
|
||||
for, used if phy-address is absent. bit 0x00000001 is
|
||||
MDIO address 0.
|
||||
For Axon it can be absent, thouugh my current driver
|
||||
For Axon it can be absent, though my current driver
|
||||
doesn't handle phy-address yet so for now, keep
|
||||
0x00ffffff in it.
|
||||
- rx-fifo-size-gige : 1 cell, Rx fifo size in bytes for 1000 Mb/sec
|
||||
@ -1438,7 +1438,7 @@ platforms are moved over to use the flattened-device-tree model.
|
||||
|
||||
The Xilinx EDK toolchain ships with a set of IP cores (devices) for use
|
||||
in Xilinx Spartan and Virtex FPGAs. The devices cover the whole range
|
||||
of standard device types (network, serial, etc.) and miscellanious
|
||||
of standard device types (network, serial, etc.) and miscellaneous
|
||||
devices (gpio, LCD, spi, etc). Also, since these devices are
|
||||
implemented within the fpga fabric every instance of the device can be
|
||||
synthesised with different options that change the behaviour.
|
||||
|
53
Documentation/powerpc/dts-bindings/can/sja1000.txt
Normal file
53
Documentation/powerpc/dts-bindings/can/sja1000.txt
Normal file
@ -0,0 +1,53 @@
|
||||
Memory mapped SJA1000 CAN controller from NXP (formerly Philips)
|
||||
|
||||
Required properties:
|
||||
|
||||
- compatible : should be "nxp,sja1000".
|
||||
|
||||
- reg : should specify the chip select, address offset and size required
|
||||
to map the registers of the SJA1000. The size is usually 0x80.
|
||||
|
||||
- interrupts: property with a value describing the interrupt source
|
||||
(number and sensitivity) required for the SJA1000.
|
||||
|
||||
Optional properties:
|
||||
|
||||
- nxp,external-clock-frequency : Frequency of the external oscillator
|
||||
clock in Hz. Note that the internal clock frequency used by the
|
||||
SJA1000 is half of that value. If not specified, a default value
|
||||
of 16000000 (16 MHz) is used.
|
||||
|
||||
- nxp,tx-output-mode : operation mode of the TX output control logic:
|
||||
<0x0> : bi-phase output mode
|
||||
<0x1> : normal output mode (default)
|
||||
<0x2> : test output mode
|
||||
<0x3> : clock output mode
|
||||
|
||||
- nxp,tx-output-config : TX output pin configuration:
|
||||
<0x01> : TX0 invert
|
||||
<0x02> : TX0 pull-down (default)
|
||||
<0x04> : TX0 pull-up
|
||||
<0x06> : TX0 push-pull
|
||||
<0x08> : TX1 invert
|
||||
<0x10> : TX1 pull-down
|
||||
<0x20> : TX1 pull-up
|
||||
<0x30> : TX1 push-pull
|
||||
|
||||
- nxp,clock-out-frequency : clock frequency in Hz on the CLKOUT pin.
|
||||
If not specified or if the specified value is 0, the CLKOUT pin
|
||||
will be disabled.
|
||||
|
||||
- nxp,no-comparator-bypass : Allows to disable the CAN input comperator.
|
||||
|
||||
For futher information, please have a look to the SJA1000 data sheet.
|
||||
|
||||
Examples:
|
||||
|
||||
can@3,100 {
|
||||
compatible = "nxp,sja1000";
|
||||
reg = <3 0x100 0x80>;
|
||||
interrupts = <2 0>;
|
||||
interrupt-parent = <&mpic>;
|
||||
nxp,external-clock-frequency = <16000000>;
|
||||
};
|
||||
|
Some files were not shown because too many files have changed in this diff Show More
Loading…
Reference in New Issue
Block a user