mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-01-07 22:03:14 +00:00
Merge rsync://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts: include/linux/kernel.h
This commit is contained in:
commit
0a1340c185
15
CREDITS
15
CREDITS
@ -24,6 +24,11 @@ S: C. Negri 6, bl. D3
|
||||
S: Iasi 6600
|
||||
S: Romania
|
||||
|
||||
N: Mark Adler
|
||||
E: madler@alumni.caltech.edu
|
||||
W: http://alumnus.caltech.edu/~madler/
|
||||
D: zlib decompression
|
||||
|
||||
N: Monalisa Agrawal
|
||||
E: magrawal@nortelnetworks.com
|
||||
D: Basic Interphase 5575 driver with UBR and ABR support.
|
||||
@ -1573,12 +1578,8 @@ S: 160 00 Praha 6
|
||||
S: Czech Republic
|
||||
|
||||
N: Niels Kristian Bech Jensen
|
||||
E: nkbj@image.dk
|
||||
W: http://www.image.dk/~nkbj
|
||||
E: nkbj1970@hotmail.com
|
||||
D: Miscellaneous kernel updates and fixes.
|
||||
S: Dr. Holsts Vej 34, lejl. 164
|
||||
S: DK-8230 Åbyhøj
|
||||
S: Denmark
|
||||
|
||||
N: Michael K. Johnson
|
||||
E: johnsonm@redhat.com
|
||||
@ -3400,10 +3401,10 @@ S: Czech Republic
|
||||
|
||||
N: Thibaut Varene
|
||||
E: T-Bone@parisc-linux.org
|
||||
W: http://www.parisc-linux.org/
|
||||
W: http://www.parisc-linux.org/~varenet/
|
||||
P: 1024D/B7D2F063 E67C 0D43 A75E 12A5 BB1C FA2F 1E32 C3DA B7D2 F063
|
||||
D: PA-RISC port minion, PDC and GSCPS2 drivers, debuglocks and other bits
|
||||
D: Some bits in an ARM port, S1D13XXX FB driver, random patches here and there
|
||||
D: Some ARM at91rm9200 bits, S1D13XXX FB driver, random patches here and there
|
||||
D: AD1889 sound driver
|
||||
S: Paris, France
|
||||
|
||||
|
77
Documentation/ABI/README
Normal file
77
Documentation/ABI/README
Normal file
@ -0,0 +1,77 @@
|
||||
This directory attempts to document the ABI between the Linux kernel and
|
||||
userspace, and the relative stability of these interfaces. Due to the
|
||||
everchanging nature of Linux, and the differing maturity levels, these
|
||||
interfaces should be used by userspace programs in different ways.
|
||||
|
||||
We have four different levels of ABI stability, as shown by the four
|
||||
different subdirectories in this location. Interfaces may change levels
|
||||
of stability according to the rules described below.
|
||||
|
||||
The different levels of stability are:
|
||||
|
||||
stable/
|
||||
This directory documents the interfaces that the developer has
|
||||
defined to be stable. Userspace programs are free to use these
|
||||
interfaces with no restrictions, and backward compatibility for
|
||||
them will be guaranteed for at least 2 years. Most interfaces
|
||||
(like syscalls) are expected to never change and always be
|
||||
available.
|
||||
|
||||
testing/
|
||||
This directory documents interfaces that are felt to be stable,
|
||||
as the main development of this interface has been completed.
|
||||
The interface can be changed to add new features, but the
|
||||
current interface will not break by doing this, unless grave
|
||||
errors or security problems are found in them. Userspace
|
||||
programs can start to rely on these interfaces, but they must be
|
||||
aware of changes that can occur before these interfaces move to
|
||||
be marked stable. Programs that use these interfaces are
|
||||
strongly encouraged to add their name to the description of
|
||||
these interfaces, so that the kernel developers can easily
|
||||
notify them if any changes occur (see the description of the
|
||||
layout of the files below for details on how to do this.)
|
||||
|
||||
obsolete/
|
||||
This directory documents interfaces that are still remaining in
|
||||
the kernel, but are marked to be removed at some later point in
|
||||
time. The description of the interface will document the reason
|
||||
why it is obsolete and when it can be expected to be removed.
|
||||
The file Documentation/feature-removal-schedule.txt may describe
|
||||
some of these interfaces, giving a schedule for when they will
|
||||
be removed.
|
||||
|
||||
removed/
|
||||
This directory contains a list of the old interfaces that have
|
||||
been removed from the kernel.
|
||||
|
||||
Every file in these directories will contain the following information:
|
||||
|
||||
What: Short description of the interface
|
||||
Date: Date created
|
||||
KernelVersion: Kernel version this feature first showed up in.
|
||||
Contact: Primary contact for this interface (may be a mailing list)
|
||||
Description: Long description of the interface and how to use it.
|
||||
Users: All users of this interface who wish to be notified when
|
||||
it changes. This is very important for interfaces in
|
||||
the "testing" stage, so that kernel developers can work
|
||||
with userspace developers to ensure that things do not
|
||||
break in ways that are unacceptable. It is also
|
||||
important to get feedback for these interfaces to make
|
||||
sure they are working in a proper way and do not need to
|
||||
be changed further.
|
||||
|
||||
|
||||
How things move between levels:
|
||||
|
||||
Interfaces in stable may move to obsolete, as long as the proper
|
||||
notification is given.
|
||||
|
||||
Interfaces may be removed from obsolete and the kernel as long as the
|
||||
documented amount of time has gone by.
|
||||
|
||||
Interfaces in the testing state can move to the stable state when the
|
||||
developers feel they are finished. They cannot be removed from the
|
||||
kernel tree without going through the obsolete state first.
|
||||
|
||||
It's up to the developer to place their interfaces in the category they
|
||||
wish for it to start out in.
|
13
Documentation/ABI/obsolete/devfs
Normal file
13
Documentation/ABI/obsolete/devfs
Normal file
@ -0,0 +1,13 @@
|
||||
What: devfs
|
||||
Date: July 2005
|
||||
Contact: Greg Kroah-Hartman <gregkh@suse.de>
|
||||
Description:
|
||||
devfs has been unmaintained for a number of years, has unfixable
|
||||
races, contains a naming policy within the kernel that is
|
||||
against the LSB, and can be replaced by using udev.
|
||||
The files fs/devfs/*, include/linux/devfs_fs*.h will be removed,
|
||||
along with the the assorted devfs function calls throughout the
|
||||
kernel tree.
|
||||
|
||||
Users:
|
||||
|
10
Documentation/ABI/stable/syscalls
Normal file
10
Documentation/ABI/stable/syscalls
Normal file
@ -0,0 +1,10 @@
|
||||
What: The kernel syscall interface
|
||||
Description:
|
||||
This interface matches much of the POSIX interface and is based
|
||||
on it and other Unix based interfaces. It will only be added to
|
||||
over time, and not have things removed from it.
|
||||
|
||||
Note that this interface is different for every architecture
|
||||
that Linux supports. Please see the architecture-specific
|
||||
documentation for details on the syscall numbers that are to be
|
||||
mapped to each syscall.
|
30
Documentation/ABI/stable/sysfs-module
Normal file
30
Documentation/ABI/stable/sysfs-module
Normal file
@ -0,0 +1,30 @@
|
||||
What: /sys/module
|
||||
Description:
|
||||
The /sys/module tree consists of the following structure:
|
||||
|
||||
/sys/module/MODULENAME
|
||||
The name of the module that is in the kernel. This
|
||||
module name will show up either if the module is built
|
||||
directly into the kernel, or if it is loaded as a
|
||||
dyanmic module.
|
||||
|
||||
/sys/module/MODULENAME/parameters
|
||||
This directory contains individual files that are each
|
||||
individual parameters of the module that are able to be
|
||||
changed at runtime. See the individual module
|
||||
documentation as to the contents of these parameters and
|
||||
what they accomplish.
|
||||
|
||||
Note: The individual parameter names and values are not
|
||||
considered stable, only the fact that they will be
|
||||
placed in this location within sysfs. See the
|
||||
individual driver documentation for details as to the
|
||||
stability of the different parameters.
|
||||
|
||||
/sys/module/MODULENAME/refcnt
|
||||
If the module is able to be unloaded from the kernel, this file
|
||||
will contain the current reference count of the module.
|
||||
|
||||
Note: If the module is built into the kernel, or if the
|
||||
CONFIG_MODULE_UNLOAD kernel configuration value is not enabled,
|
||||
this file will not be present.
|
16
Documentation/ABI/testing/sysfs-class
Normal file
16
Documentation/ABI/testing/sysfs-class
Normal file
@ -0,0 +1,16 @@
|
||||
What: /sys/class/
|
||||
Date: Febuary 2006
|
||||
Contact: Greg Kroah-Hartman <gregkh@suse.de>
|
||||
Description:
|
||||
The /sys/class directory will consist of a group of
|
||||
subdirectories describing individual classes of devices
|
||||
in the kernel. The individual directories will consist
|
||||
of either subdirectories, or symlinks to other
|
||||
directories.
|
||||
|
||||
All programs that use this directory tree must be able
|
||||
to handle both subdirectories or symlinks in order to
|
||||
work properly.
|
||||
|
||||
Users:
|
||||
udev <linux-hotplug-devel@lists.sourceforge.net>
|
25
Documentation/ABI/testing/sysfs-devices
Normal file
25
Documentation/ABI/testing/sysfs-devices
Normal file
@ -0,0 +1,25 @@
|
||||
What: /sys/devices
|
||||
Date: February 2006
|
||||
Contact: Greg Kroah-Hartman <gregkh@suse.de>
|
||||
Description:
|
||||
The /sys/devices tree contains a snapshot of the
|
||||
internal state of the kernel device tree. Devices will
|
||||
be added and removed dynamically as the machine runs,
|
||||
and between different kernel versions, the layout of the
|
||||
devices within this tree will change.
|
||||
|
||||
Please do not rely on the format of this tree because of
|
||||
this. If a program wishes to find different things in
|
||||
the tree, please use the /sys/class structure and rely
|
||||
on the symlinks there to point to the proper location
|
||||
within the /sys/devices tree of the individual devices.
|
||||
Or rely on the uevent messages to notify programs of
|
||||
devices being added and removed from this tree to find
|
||||
the location of those devices.
|
||||
|
||||
Note that sometimes not all devices along the directory
|
||||
chain will have emitted uevent messages, so userspace
|
||||
programs must be able to handle such occurrences.
|
||||
|
||||
Users:
|
||||
udev <linux-hotplug-devel@lists.sourceforge.net>
|
@ -181,8 +181,8 @@ Intel IA32 microcode
|
||||
--------------------
|
||||
|
||||
A driver has been added to allow updating of Intel IA32 microcode,
|
||||
accessible as both a devfs regular file and as a normal (misc)
|
||||
character device. If you are not using devfs you may need to:
|
||||
accessible as a normal (misc) character device. If you are not using
|
||||
udev you may need to:
|
||||
|
||||
mkdir /dev/cpu
|
||||
mknod /dev/cpu/microcode c 10 184
|
||||
@ -201,7 +201,9 @@ with programs using shared memory.
|
||||
udev
|
||||
----
|
||||
udev is a userspace application for populating /dev dynamically with
|
||||
only entries for devices actually present. udev replaces devfs.
|
||||
only entries for devices actually present. udev replaces the basic
|
||||
functionality of devfs, while allowing persistant device naming for
|
||||
devices.
|
||||
|
||||
FUSE
|
||||
----
|
||||
@ -231,18 +233,13 @@ The PPP driver has been restructured to support multilink and to
|
||||
enable it to operate over diverse media layers. If you use PPP,
|
||||
upgrade pppd to at least 2.4.0.
|
||||
|
||||
If you are not using devfs, you must have the device file /dev/ppp
|
||||
If you are not using udev, you must have the device file /dev/ppp
|
||||
which can be made by:
|
||||
|
||||
mknod /dev/ppp c 108 0
|
||||
|
||||
as root.
|
||||
|
||||
If you use devfsd and build ppp support as modules, you will need
|
||||
the following in your /etc/devfsd.conf file:
|
||||
|
||||
LOOKUP PPP MODLOAD
|
||||
|
||||
Isdn4k-utils
|
||||
------------
|
||||
|
||||
|
@ -155,7 +155,83 @@ problem, which is called the function-growth-hormone-imbalance syndrome.
|
||||
See next chapter.
|
||||
|
||||
|
||||
Chapter 5: Functions
|
||||
Chapter 5: Typedefs
|
||||
|
||||
Please don't use things like "vps_t".
|
||||
|
||||
It's a _mistake_ to use typedef for structures and pointers. When you see a
|
||||
|
||||
vps_t a;
|
||||
|
||||
in the source, what does it mean?
|
||||
|
||||
In contrast, if it says
|
||||
|
||||
struct virtual_container *a;
|
||||
|
||||
you can actually tell what "a" is.
|
||||
|
||||
Lots of people think that typedefs "help readability". Not so. They are
|
||||
useful only for:
|
||||
|
||||
(a) totally opaque objects (where the typedef is actively used to _hide_
|
||||
what the object is).
|
||||
|
||||
Example: "pte_t" etc. opaque objects that you can only access using
|
||||
the proper accessor functions.
|
||||
|
||||
NOTE! Opaqueness and "accessor functions" are not good in themselves.
|
||||
The reason we have them for things like pte_t etc. is that there
|
||||
really is absolutely _zero_ portably accessible information there.
|
||||
|
||||
(b) Clear integer types, where the abstraction _helps_ avoid confusion
|
||||
whether it is "int" or "long".
|
||||
|
||||
u8/u16/u32 are perfectly fine typedefs, although they fit into
|
||||
category (d) better than here.
|
||||
|
||||
NOTE! Again - there needs to be a _reason_ for this. If something is
|
||||
"unsigned long", then there's no reason to do
|
||||
|
||||
typedef unsigned long myflags_t;
|
||||
|
||||
but if there is a clear reason for why it under certain circumstances
|
||||
might be an "unsigned int" and under other configurations might be
|
||||
"unsigned long", then by all means go ahead and use a typedef.
|
||||
|
||||
(c) when you use sparse to literally create a _new_ type for
|
||||
type-checking.
|
||||
|
||||
(d) New types which are identical to standard C99 types, in certain
|
||||
exceptional circumstances.
|
||||
|
||||
Although it would only take a short amount of time for the eyes and
|
||||
brain to become accustomed to the standard types like 'uint32_t',
|
||||
some people object to their use anyway.
|
||||
|
||||
Therefore, the Linux-specific 'u8/u16/u32/u64' types and their
|
||||
signed equivalents which are identical to standard types are
|
||||
permitted -- although they are not mandatory in new code of your
|
||||
own.
|
||||
|
||||
When editing existing code which already uses one or the other set
|
||||
of types, you should conform to the existing choices in that code.
|
||||
|
||||
(e) Types safe for use in userspace.
|
||||
|
||||
In certain structures which are visible to userspace, we cannot
|
||||
require C99 types and cannot use the 'u32' form above. Thus, we
|
||||
use __u32 and similar types in all structures which are shared
|
||||
with userspace.
|
||||
|
||||
Maybe there are other cases too, but the rule should basically be to NEVER
|
||||
EVER use a typedef unless you can clearly match one of those rules.
|
||||
|
||||
In general, a pointer, or a struct that has elements that can reasonably
|
||||
be directly accessed should _never_ be a typedef.
|
||||
|
||||
|
||||
Chapter 6: Functions
|
||||
|
||||
Functions should be short and sweet, and do just one thing. They should
|
||||
fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24,
|
||||
@ -183,7 +259,7 @@ and it gets confused. You know you're brilliant, but maybe you'd like
|
||||
to understand what you did 2 weeks from now.
|
||||
|
||||
|
||||
Chapter 6: Centralized exiting of functions
|
||||
Chapter 7: Centralized exiting of functions
|
||||
|
||||
Albeit deprecated by some people, the equivalent of the goto statement is
|
||||
used frequently by compilers in form of the unconditional jump instruction.
|
||||
@ -220,7 +296,7 @@ out:
|
||||
return result;
|
||||
}
|
||||
|
||||
Chapter 7: Commenting
|
||||
Chapter 8: Commenting
|
||||
|
||||
Comments are good, but there is also a danger of over-commenting. NEVER
|
||||
try to explain HOW your code works in a comment: it's much better to
|
||||
@ -240,7 +316,7 @@ When commenting the kernel API functions, please use the kerneldoc format.
|
||||
See the files Documentation/kernel-doc-nano-HOWTO.txt and scripts/kernel-doc
|
||||
for details.
|
||||
|
||||
Chapter 8: You've made a mess of it
|
||||
Chapter 9: You've made a mess of it
|
||||
|
||||
That's OK, we all do. You've probably been told by your long-time Unix
|
||||
user helper that "GNU emacs" automatically formats the C sources for
|
||||
@ -288,7 +364,7 @@ re-formatting you may want to take a look at the man page. But
|
||||
remember: "indent" is not a fix for bad programming.
|
||||
|
||||
|
||||
Chapter 9: Configuration-files
|
||||
Chapter 10: Configuration-files
|
||||
|
||||
For configuration options (arch/xxx/Kconfig, and all the Kconfig files),
|
||||
somewhat different indentation is used.
|
||||
@ -313,7 +389,7 @@ support for file-systems, for instance) should be denoted (DANGEROUS), other
|
||||
experimental options should be denoted (EXPERIMENTAL).
|
||||
|
||||
|
||||
Chapter 10: Data structures
|
||||
Chapter 11: Data structures
|
||||
|
||||
Data structures that have visibility outside the single-threaded
|
||||
environment they are created and destroyed in should always have
|
||||
@ -344,7 +420,7 @@ Remember: if another thread can find your data structure, and you don't
|
||||
have a reference count on it, you almost certainly have a bug.
|
||||
|
||||
|
||||
Chapter 11: Macros, Enums and RTL
|
||||
Chapter 12: Macros, Enums and RTL
|
||||
|
||||
Names of macros defining constants and labels in enums are capitalized.
|
||||
|
||||
@ -399,7 +475,7 @@ The cpp manual deals with macros exhaustively. The gcc internals manual also
|
||||
covers RTL which is used frequently with assembly language in the kernel.
|
||||
|
||||
|
||||
Chapter 12: Printing kernel messages
|
||||
Chapter 13: Printing kernel messages
|
||||
|
||||
Kernel developers like to be seen as literate. Do mind the spelling
|
||||
of kernel messages to make a good impression. Do not use crippled
|
||||
@ -410,7 +486,7 @@ Kernel messages do not have to be terminated with a period.
|
||||
Printing numbers in parentheses (%d) adds no value and should be avoided.
|
||||
|
||||
|
||||
Chapter 13: Allocating memory
|
||||
Chapter 14: Allocating memory
|
||||
|
||||
The kernel provides the following general purpose memory allocators:
|
||||
kmalloc(), kzalloc(), kcalloc(), and vmalloc(). Please refer to the API
|
||||
@ -429,7 +505,7 @@ from void pointer to any other pointer type is guaranteed by the C programming
|
||||
language.
|
||||
|
||||
|
||||
Chapter 14: The inline disease
|
||||
Chapter 15: The inline disease
|
||||
|
||||
There appears to be a common misperception that gcc has a magic "make me
|
||||
faster" speedup option called "inline". While the use of inlines can be
|
||||
@ -457,7 +533,7 @@ something it would have done anyway.
|
||||
|
||||
|
||||
|
||||
Chapter 15: References
|
||||
Appendix I: References
|
||||
|
||||
The C Programming Language, Second Edition
|
||||
by Brian W. Kernighan and Dennis M. Ritchie.
|
||||
@ -481,4 +557,4 @@ Kernel CodingStyle, by greg@kroah.com at OLS 2002:
|
||||
http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/
|
||||
|
||||
--
|
||||
Last updated on 30 December 2005 by a community effort on LKML.
|
||||
Last updated on 30 April 2006.
|
||||
|
@ -10,7 +10,8 @@ DOCBOOKS := wanbook.xml z8530book.xml mcabook.xml videobook.xml \
|
||||
kernel-hacking.xml kernel-locking.xml deviceiobook.xml \
|
||||
procfs-guide.xml writing_usb_driver.xml \
|
||||
kernel-api.xml journal-api.xml lsm.xml usb.xml \
|
||||
gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml
|
||||
gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \
|
||||
genericirq.xml
|
||||
|
||||
###
|
||||
# The build process is as follows (targets):
|
||||
|
474
Documentation/DocBook/genericirq.tmpl
Normal file
474
Documentation/DocBook/genericirq.tmpl
Normal file
@ -0,0 +1,474 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
|
||||
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
|
||||
|
||||
<book id="Generic-IRQ-Guide">
|
||||
<bookinfo>
|
||||
<title>Linux generic IRQ handling</title>
|
||||
|
||||
<authorgroup>
|
||||
<author>
|
||||
<firstname>Thomas</firstname>
|
||||
<surname>Gleixner</surname>
|
||||
<affiliation>
|
||||
<address>
|
||||
<email>tglx@linutronix.de</email>
|
||||
</address>
|
||||
</affiliation>
|
||||
</author>
|
||||
<author>
|
||||
<firstname>Ingo</firstname>
|
||||
<surname>Molnar</surname>
|
||||
<affiliation>
|
||||
<address>
|
||||
<email>mingo@elte.hu</email>
|
||||
</address>
|
||||
</affiliation>
|
||||
</author>
|
||||
</authorgroup>
|
||||
|
||||
<copyright>
|
||||
<year>2005-2006</year>
|
||||
<holder>Thomas Gleixner</holder>
|
||||
</copyright>
|
||||
<copyright>
|
||||
<year>2005-2006</year>
|
||||
<holder>Ingo Molnar</holder>
|
||||
</copyright>
|
||||
|
||||
<legalnotice>
|
||||
<para>
|
||||
This documentation is free software; you can redistribute
|
||||
it and/or modify it under the terms of the GNU General Public
|
||||
License version 2 as published by the Free Software Foundation.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
This program is distributed in the hope that it will be
|
||||
useful, but WITHOUT ANY WARRANTY; without even the implied
|
||||
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
|
||||
See the GNU General Public License for more details.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
You should have received a copy of the GNU General Public
|
||||
License along with this program; if not, write to the Free
|
||||
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
|
||||
MA 02111-1307 USA
|
||||
</para>
|
||||
|
||||
<para>
|
||||
For more details see the file COPYING in the source
|
||||
distribution of Linux.
|
||||
</para>
|
||||
</legalnotice>
|
||||
</bookinfo>
|
||||
|
||||
<toc></toc>
|
||||
|
||||
<chapter id="intro">
|
||||
<title>Introduction</title>
|
||||
<para>
|
||||
The generic interrupt handling layer is designed to provide a
|
||||
complete abstraction of interrupt handling for device drivers.
|
||||
It is able to handle all the different types of interrupt controller
|
||||
hardware. Device drivers use generic API functions to request, enable,
|
||||
disable and free interrupts. The drivers do not have to know anything
|
||||
about interrupt hardware details, so they can be used on different
|
||||
platforms without code changes.
|
||||
</para>
|
||||
<para>
|
||||
This documentation is provided to developers who want to implement
|
||||
an interrupt subsystem based for their architecture, with the help
|
||||
of the generic IRQ handling layer.
|
||||
</para>
|
||||
</chapter>
|
||||
|
||||
<chapter id="rationale">
|
||||
<title>Rationale</title>
|
||||
<para>
|
||||
The original implementation of interrupt handling in Linux is using
|
||||
the __do_IRQ() super-handler, which is able to deal with every
|
||||
type of interrupt logic.
|
||||
</para>
|
||||
<para>
|
||||
Originally, Russell King identified different types of handlers to
|
||||
build a quite universal set for the ARM interrupt handler
|
||||
implementation in Linux 2.5/2.6. He distinguished between:
|
||||
<itemizedlist>
|
||||
<listitem><para>Level type</para></listitem>
|
||||
<listitem><para>Edge type</para></listitem>
|
||||
<listitem><para>Simple type</para></listitem>
|
||||
</itemizedlist>
|
||||
In the SMP world of the __do_IRQ() super-handler another type
|
||||
was identified:
|
||||
<itemizedlist>
|
||||
<listitem><para>Per CPU type</para></listitem>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
<para>
|
||||
This split implementation of highlevel IRQ handlers allows us to
|
||||
optimize the flow of the interrupt handling for each specific
|
||||
interrupt type. This reduces complexity in that particular codepath
|
||||
and allows the optimized handling of a given type.
|
||||
</para>
|
||||
<para>
|
||||
The original general IRQ implementation used hw_interrupt_type
|
||||
structures and their ->ack(), ->end() [etc.] callbacks to
|
||||
differentiate the flow control in the super-handler. This leads to
|
||||
a mix of flow logic and lowlevel hardware logic, and it also leads
|
||||
to unnecessary code duplication: for example in i386, there is a
|
||||
ioapic_level_irq and a ioapic_edge_irq irq-type which share many
|
||||
of the lowlevel details but have different flow handling.
|
||||
</para>
|
||||
<para>
|
||||
A more natural abstraction is the clean separation of the
|
||||
'irq flow' and the 'chip details'.
|
||||
</para>
|
||||
<para>
|
||||
Analysing a couple of architecture's IRQ subsystem implementations
|
||||
reveals that most of them can use a generic set of 'irq flow'
|
||||
methods and only need to add the chip level specific code.
|
||||
The separation is also valuable for (sub)architectures
|
||||
which need specific quirks in the irq flow itself but not in the
|
||||
chip-details - and thus provides a more transparent IRQ subsystem
|
||||
design.
|
||||
</para>
|
||||
<para>
|
||||
Each interrupt descriptor is assigned its own highlevel flow
|
||||
handler, which is normally one of the generic
|
||||
implementations. (This highlevel flow handler implementation also
|
||||
makes it simple to provide demultiplexing handlers which can be
|
||||
found in embedded platforms on various architectures.)
|
||||
</para>
|
||||
<para>
|
||||
The separation makes the generic interrupt handling layer more
|
||||
flexible and extensible. For example, an (sub)architecture can
|
||||
use a generic irq-flow implementation for 'level type' interrupts
|
||||
and add a (sub)architecture specific 'edge type' implementation.
|
||||
</para>
|
||||
<para>
|
||||
To make the transition to the new model easier and prevent the
|
||||
breakage of existing implementations, the __do_IRQ() super-handler
|
||||
is still available. This leads to a kind of duality for the time
|
||||
being. Over time the new model should be used in more and more
|
||||
architectures, as it enables smaller and cleaner IRQ subsystems.
|
||||
</para>
|
||||
</chapter>
|
||||
<chapter id="bugs">
|
||||
<title>Known Bugs And Assumptions</title>
|
||||
<para>
|
||||
None (knock on wood).
|
||||
</para>
|
||||
</chapter>
|
||||
|
||||
<chapter id="Abstraction">
|
||||
<title>Abstraction layers</title>
|
||||
<para>
|
||||
There are three main levels of abstraction in the interrupt code:
|
||||
<orderedlist>
|
||||
<listitem><para>Highlevel driver API</para></listitem>
|
||||
<listitem><para>Highlevel IRQ flow handlers</para></listitem>
|
||||
<listitem><para>Chiplevel hardware encapsulation</para></listitem>
|
||||
</orderedlist>
|
||||
</para>
|
||||
<sect1>
|
||||
<title>Interrupt control flow</title>
|
||||
<para>
|
||||
Each interrupt is described by an interrupt descriptor structure
|
||||
irq_desc. The interrupt is referenced by an 'unsigned int' numeric
|
||||
value which selects the corresponding interrupt decription structure
|
||||
in the descriptor structures array.
|
||||
The descriptor structure contains status information and pointers
|
||||
to the interrupt flow method and the interrupt chip structure
|
||||
which are assigned to this interrupt.
|
||||
</para>
|
||||
<para>
|
||||
Whenever an interrupt triggers, the lowlevel arch code calls into
|
||||
the generic interrupt code by calling desc->handle_irq().
|
||||
This highlevel IRQ handling function only uses desc->chip primitives
|
||||
referenced by the assigned chip descriptor structure.
|
||||
</para>
|
||||
</sect1>
|
||||
<sect1>
|
||||
<title>Highlevel Driver API</title>
|
||||
<para>
|
||||
The highlevel Driver API consists of following functions:
|
||||
<itemizedlist>
|
||||
<listitem><para>request_irq()</para></listitem>
|
||||
<listitem><para>free_irq()</para></listitem>
|
||||
<listitem><para>disable_irq()</para></listitem>
|
||||
<listitem><para>enable_irq()</para></listitem>
|
||||
<listitem><para>disable_irq_nosync() (SMP only)</para></listitem>
|
||||
<listitem><para>synchronize_irq() (SMP only)</para></listitem>
|
||||
<listitem><para>set_irq_type()</para></listitem>
|
||||
<listitem><para>set_irq_wake()</para></listitem>
|
||||
<listitem><para>set_irq_data()</para></listitem>
|
||||
<listitem><para>set_irq_chip()</para></listitem>
|
||||
<listitem><para>set_irq_chip_data()</para></listitem>
|
||||
</itemizedlist>
|
||||
See the autogenerated function documentation for details.
|
||||
</para>
|
||||
</sect1>
|
||||
<sect1>
|
||||
<title>Highlevel IRQ flow handlers</title>
|
||||
<para>
|
||||
The generic layer provides a set of pre-defined irq-flow methods:
|
||||
<itemizedlist>
|
||||
<listitem><para>handle_level_irq</para></listitem>
|
||||
<listitem><para>handle_edge_irq</para></listitem>
|
||||
<listitem><para>handle_simple_irq</para></listitem>
|
||||
<listitem><para>handle_percpu_irq</para></listitem>
|
||||
</itemizedlist>
|
||||
The interrupt flow handlers (either predefined or architecture
|
||||
specific) are assigned to specific interrupts by the architecture
|
||||
either during bootup or during device initialization.
|
||||
</para>
|
||||
<sect2>
|
||||
<title>Default flow implementations</title>
|
||||
<sect3>
|
||||
<title>Helper functions</title>
|
||||
<para>
|
||||
The helper functions call the chip primitives and
|
||||
are used by the default flow implementations.
|
||||
The following helper functions are implemented (simplified excerpt):
|
||||
<programlisting>
|
||||
default_enable(irq)
|
||||
{
|
||||
desc->chip->unmask(irq);
|
||||
}
|
||||
|
||||
default_disable(irq)
|
||||
{
|
||||
if (!delay_disable(irq))
|
||||
desc->chip->mask(irq);
|
||||
}
|
||||
|
||||
default_ack(irq)
|
||||
{
|
||||
chip->ack(irq);
|
||||
}
|
||||
|
||||
default_mask_ack(irq)
|
||||
{
|
||||
if (chip->mask_ack) {
|
||||
chip->mask_ack(irq);
|
||||
} else {
|
||||
chip->mask(irq);
|
||||
chip->ack(irq);
|
||||
}
|
||||
}
|
||||
|
||||
noop(irq)
|
||||
{
|
||||
}
|
||||
|
||||
</programlisting>
|
||||
</para>
|
||||
</sect3>
|
||||
</sect2>
|
||||
<sect2>
|
||||
<title>Default flow handler implementations</title>
|
||||
<sect3>
|
||||
<title>Default Level IRQ flow handler</title>
|
||||
<para>
|
||||
handle_level_irq provides a generic implementation
|
||||
for level-triggered interrupts.
|
||||
</para>
|
||||
<para>
|
||||
The following control flow is implemented (simplified excerpt):
|
||||
<programlisting>
|
||||
desc->chip->start();
|
||||
handle_IRQ_event(desc->action);
|
||||
desc->chip->end();
|
||||
</programlisting>
|
||||
</para>
|
||||
</sect3>
|
||||
<sect3>
|
||||
<title>Default Edge IRQ flow handler</title>
|
||||
<para>
|
||||
handle_edge_irq provides a generic implementation
|
||||
for edge-triggered interrupts.
|
||||
</para>
|
||||
<para>
|
||||
The following control flow is implemented (simplified excerpt):
|
||||
<programlisting>
|
||||
if (desc->status & running) {
|
||||
desc->chip->hold();
|
||||
desc->status |= pending | masked;
|
||||
return;
|
||||
}
|
||||
desc->chip->start();
|
||||
desc->status |= running;
|
||||
do {
|
||||
if (desc->status & masked)
|
||||
desc->chip->enable();
|
||||
desc-status &= ~pending;
|
||||
handle_IRQ_event(desc->action);
|
||||
} while (status & pending);
|
||||
desc-status &= ~running;
|
||||
desc->chip->end();
|
||||
</programlisting>
|
||||
</para>
|
||||
</sect3>
|
||||
<sect3>
|
||||
<title>Default simple IRQ flow handler</title>
|
||||
<para>
|
||||
handle_simple_irq provides a generic implementation
|
||||
for simple interrupts.
|
||||
</para>
|
||||
<para>
|
||||
Note: The simple flow handler does not call any
|
||||
handler/chip primitives.
|
||||
</para>
|
||||
<para>
|
||||
The following control flow is implemented (simplified excerpt):
|
||||
<programlisting>
|
||||
handle_IRQ_event(desc->action);
|
||||
</programlisting>
|
||||
</para>
|
||||
</sect3>
|
||||
<sect3>
|
||||
<title>Default per CPU flow handler</title>
|
||||
<para>
|
||||
handle_percpu_irq provides a generic implementation
|
||||
for per CPU interrupts.
|
||||
</para>
|
||||
<para>
|
||||
Per CPU interrupts are only available on SMP and
|
||||
the handler provides a simplified version without
|
||||
locking.
|
||||
</para>
|
||||
<para>
|
||||
The following control flow is implemented (simplified excerpt):
|
||||
<programlisting>
|
||||
desc->chip->start();
|
||||
handle_IRQ_event(desc->action);
|
||||
desc->chip->end();
|
||||
</programlisting>
|
||||
</para>
|
||||
</sect3>
|
||||
</sect2>
|
||||
<sect2>
|
||||
<title>Quirks and optimizations</title>
|
||||
<para>
|
||||
The generic functions are intended for 'clean' architectures and chips,
|
||||
which have no platform-specific IRQ handling quirks. If an architecture
|
||||
needs to implement quirks on the 'flow' level then it can do so by
|
||||
overriding the highlevel irq-flow handler.
|
||||
</para>
|
||||
</sect2>
|
||||
<sect2>
|
||||
<title>Delayed interrupt disable</title>
|
||||
<para>
|
||||
This per interrupt selectable feature, which was introduced by Russell
|
||||
King in the ARM interrupt implementation, does not mask an interrupt
|
||||
at the hardware level when disable_irq() is called. The interrupt is
|
||||
kept enabled and is masked in the flow handler when an interrupt event
|
||||
happens. This prevents losing edge interrupts on hardware which does
|
||||
not store an edge interrupt event while the interrupt is disabled at
|
||||
the hardware level. When an interrupt arrives while the IRQ_DISABLED
|
||||
flag is set, then the interrupt is masked at the hardware level and
|
||||
the IRQ_PENDING bit is set. When the interrupt is re-enabled by
|
||||
enable_irq() the pending bit is checked and if it is set, the
|
||||
interrupt is resent either via hardware or by a software resend
|
||||
mechanism. (It's necessary to enable CONFIG_HARDIRQS_SW_RESEND when
|
||||
you want to use the delayed interrupt disable feature and your
|
||||
hardware is not capable of retriggering an interrupt.)
|
||||
The delayed interrupt disable can be runtime enabled, per interrupt,
|
||||
by setting the IRQ_DELAYED_DISABLE flag in the irq_desc status field.
|
||||
</para>
|
||||
</sect2>
|
||||
</sect1>
|
||||
<sect1>
|
||||
<title>Chiplevel hardware encapsulation</title>
|
||||
<para>
|
||||
The chip level hardware descriptor structure irq_chip
|
||||
contains all the direct chip relevant functions, which
|
||||
can be utilized by the irq flow implementations.
|
||||
<itemizedlist>
|
||||
<listitem><para>ack()</para></listitem>
|
||||
<listitem><para>mask_ack() - Optional, recommended for performance</para></listitem>
|
||||
<listitem><para>mask()</para></listitem>
|
||||
<listitem><para>unmask()</para></listitem>
|
||||
<listitem><para>retrigger() - Optional</para></listitem>
|
||||
<listitem><para>set_type() - Optional</para></listitem>
|
||||
<listitem><para>set_wake() - Optional</para></listitem>
|
||||
</itemizedlist>
|
||||
These primitives are strictly intended to mean what they say: ack means
|
||||
ACK, masking means masking of an IRQ line, etc. It is up to the flow
|
||||
handler(s) to use these basic units of lowlevel functionality.
|
||||
</para>
|
||||
</sect1>
|
||||
</chapter>
|
||||
|
||||
<chapter id="doirq">
|
||||
<title>__do_IRQ entry point</title>
|
||||
<para>
|
||||
The original implementation __do_IRQ() is an alternative entry
|
||||
point for all types of interrupts.
|
||||
</para>
|
||||
<para>
|
||||
This handler turned out to be not suitable for all
|
||||
interrupt hardware and was therefore reimplemented with split
|
||||
functionality for egde/level/simple/percpu interrupts. This is not
|
||||
only a functional optimization. It also shortens code paths for
|
||||
interrupts.
|
||||
</para>
|
||||
<para>
|
||||
To make use of the split implementation, replace the call to
|
||||
__do_IRQ by a call to desc->chip->handle_irq() and associate
|
||||
the appropriate handler function to desc->chip->handle_irq().
|
||||
In most cases the generic handler implementations should
|
||||
be sufficient.
|
||||
</para>
|
||||
</chapter>
|
||||
|
||||
<chapter id="locking">
|
||||
<title>Locking on SMP</title>
|
||||
<para>
|
||||
The locking of chip registers is up to the architecture that
|
||||
defines the chip primitives. There is a chip->lock field that can be used
|
||||
for serialization, but the generic layer does not touch it. The per-irq
|
||||
structure is protected via desc->lock, by the generic layer.
|
||||
</para>
|
||||
</chapter>
|
||||
<chapter id="structs">
|
||||
<title>Structures</title>
|
||||
<para>
|
||||
This chapter contains the autogenerated documentation of the structures which are
|
||||
used in the generic IRQ layer.
|
||||
</para>
|
||||
!Iinclude/linux/irq.h
|
||||
</chapter>
|
||||
|
||||
<chapter id="pubfunctions">
|
||||
<title>Public Functions Provided</title>
|
||||
<para>
|
||||
This chapter contains the autogenerated documentation of the kernel API functions
|
||||
which are exported.
|
||||
</para>
|
||||
!Ekernel/irq/manage.c
|
||||
!Ekernel/irq/chip.c
|
||||
</chapter>
|
||||
|
||||
<chapter id="intfunctions">
|
||||
<title>Internal Functions Provided</title>
|
||||
<para>
|
||||
This chapter contains the autogenerated documentation of the internal functions.
|
||||
</para>
|
||||
!Ikernel/irq/handle.c
|
||||
!Ikernel/irq/chip.c
|
||||
</chapter>
|
||||
|
||||
<chapter id="credits">
|
||||
<title>Credits</title>
|
||||
<para>
|
||||
The following people have contributed to this document:
|
||||
<orderedlist>
|
||||
<listitem><para>Thomas Gleixner<email>tglx@linutronix.de</email></para></listitem>
|
||||
<listitem><para>Ingo Molnar<email>mingo@elte.hu</email></para></listitem>
|
||||
</orderedlist>
|
||||
</para>
|
||||
</chapter>
|
||||
</book>
|
@ -62,6 +62,8 @@
|
||||
<sect1><title>Internal Functions</title>
|
||||
!Ikernel/exit.c
|
||||
!Ikernel/signal.c
|
||||
!Iinclude/linux/kthread.h
|
||||
!Ekernel/kthread.c
|
||||
</sect1>
|
||||
|
||||
<sect1><title>Kernel objects manipulation</title>
|
||||
@ -114,9 +116,33 @@ X!Ilib/string.c
|
||||
</sect1>
|
||||
</chapter>
|
||||
|
||||
<chapter id="kernel-lib">
|
||||
<title>Basic Kernel Library Functions</title>
|
||||
|
||||
<para>
|
||||
The Linux kernel provides more basic utility functions.
|
||||
</para>
|
||||
|
||||
<sect1><title>Bitmap Operations</title>
|
||||
!Elib/bitmap.c
|
||||
!Ilib/bitmap.c
|
||||
</sect1>
|
||||
|
||||
<sect1><title>Command-line Parsing</title>
|
||||
!Elib/cmdline.c
|
||||
</sect1>
|
||||
|
||||
<sect1><title>CRC Functions</title>
|
||||
!Elib/crc16.c
|
||||
!Elib/crc32.c
|
||||
!Elib/crc-ccitt.c
|
||||
</sect1>
|
||||
</chapter>
|
||||
|
||||
<chapter id="mm">
|
||||
<title>Memory Management in Linux</title>
|
||||
<sect1><title>The Slab Cache</title>
|
||||
!Iinclude/linux/slab.h
|
||||
!Emm/slab.c
|
||||
</sect1>
|
||||
<sect1><title>User Space Memory Access</title>
|
||||
@ -280,12 +306,13 @@ X!Ekernel/module.c
|
||||
<sect1><title>MTRR Handling</title>
|
||||
!Earch/i386/kernel/cpu/mtrr/main.c
|
||||
</sect1>
|
||||
|
||||
<sect1><title>PCI Support Library</title>
|
||||
!Edrivers/pci/pci.c
|
||||
!Edrivers/pci/pci-driver.c
|
||||
!Edrivers/pci/remove.c
|
||||
!Edrivers/pci/pci-acpi.c
|
||||
<!-- kerneldoc does not understand to __devinit
|
||||
<!-- kerneldoc does not understand __devinit
|
||||
X!Edrivers/pci/search.c
|
||||
-->
|
||||
!Edrivers/pci/msi.c
|
||||
@ -314,9 +341,11 @@ X!Earch/i386/kernel/mca.c
|
||||
</sect1>
|
||||
</chapter>
|
||||
|
||||
<chapter id="devfs">
|
||||
<title>The Device File System</title>
|
||||
!Efs/devfs/base.c
|
||||
<chapter id="firmware">
|
||||
<title>Firmware Interfaces</title>
|
||||
<sect1><title>DMI Interfaces</title>
|
||||
!Edrivers/firmware/dmi_scan.c
|
||||
</sect1>
|
||||
</chapter>
|
||||
|
||||
<chapter id="sysfs">
|
||||
@ -331,6 +360,18 @@ X!Earch/i386/kernel/mca.c
|
||||
!Esecurity/security.c
|
||||
</chapter>
|
||||
|
||||
<chapter id="audit">
|
||||
<title>Audit Interfaces</title>
|
||||
!Ekernel/audit.c
|
||||
!Ikernel/auditsc.c
|
||||
!Ikernel/auditfilter.c
|
||||
</chapter>
|
||||
|
||||
<chapter id="accounting">
|
||||
<title>Accounting Framework</title>
|
||||
!Ikernel/acct.c
|
||||
</chapter>
|
||||
|
||||
<chapter id="pmfuncs">
|
||||
<title>Power Management</title>
|
||||
!Ekernel/power/pm.c
|
||||
@ -390,7 +431,6 @@ X!Edrivers/pnp/system.c
|
||||
</sect1>
|
||||
</chapter>
|
||||
|
||||
|
||||
<chapter id="blkdev">
|
||||
<title>Block Devices</title>
|
||||
!Eblock/ll_rw_blk.c
|
||||
@ -401,6 +441,14 @@ X!Edrivers/pnp/system.c
|
||||
!Edrivers/char/misc.c
|
||||
</chapter>
|
||||
|
||||
<chapter id="parportdev">
|
||||
<title>Parallel Port Devices</title>
|
||||
!Iinclude/linux/parport.h
|
||||
!Edrivers/parport/ieee1284.c
|
||||
!Edrivers/parport/share.c
|
||||
!Idrivers/parport/daisy.c
|
||||
</chapter>
|
||||
|
||||
<chapter id="viddev">
|
||||
<title>Video4Linux</title>
|
||||
!Edrivers/media/video/videodev.c
|
||||
|
@ -1590,7 +1590,7 @@ the amount of locking which needs to be done.
|
||||
<para>
|
||||
Our final dilemma is this: when can we actually destroy the
|
||||
removed element? Remember, a reader might be stepping through
|
||||
this element in the list right now: it we free this element and
|
||||
this element in the list right now: if we free this element and
|
||||
the <symbol>next</symbol> pointer changes, the reader will jump
|
||||
off into garbage and crash. We need to wait until we know that
|
||||
all the readers who were traversing the list when we deleted the
|
||||
|
@ -169,6 +169,22 @@ void (*tf_read) (struct ata_port *ap, struct ata_taskfile *tf);
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2><title>PIO data read/write</title>
|
||||
<programlisting>
|
||||
void (*data_xfer) (struct ata_device *, unsigned char *, unsigned int, int);
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
All bmdma-style drivers must implement this hook. This is the low-level
|
||||
operation that actually copies the data bytes during a PIO data
|
||||
transfer.
|
||||
Typically the driver
|
||||
will choose one of ata_pio_data_xfer_noirq(), ata_pio_data_xfer(), or
|
||||
ata_mmio_data_xfer().
|
||||
</para>
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2><title>ATA command execute</title>
|
||||
<programlisting>
|
||||
void (*exec_command)(struct ata_port *ap, struct ata_taskfile *tf);
|
||||
@ -204,11 +220,10 @@ command.
|
||||
<programlisting>
|
||||
u8 (*check_status)(struct ata_port *ap);
|
||||
u8 (*check_altstatus)(struct ata_port *ap);
|
||||
u8 (*check_err)(struct ata_port *ap);
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
Reads the Status/AltStatus/Error ATA shadow register from
|
||||
Reads the Status/AltStatus ATA shadow register from
|
||||
hardware. On some hardware, reading the Status register has
|
||||
the side effect of clearing the interrupt condition.
|
||||
Most drivers for taskfile-based hardware use
|
||||
@ -269,23 +284,6 @@ void (*set_mode) (struct ata_port *ap);
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2><title>Reset ATA bus</title>
|
||||
<programlisting>
|
||||
void (*phy_reset) (struct ata_port *ap);
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
The very first step in the probe phase. Actions vary depending
|
||||
on the bus type, typically. After waking up the device and probing
|
||||
for device presence (PATA and SATA), typically a soft reset
|
||||
(SRST) will be performed. Drivers typically use the helper
|
||||
functions ata_bus_reset() or sata_phy_reset() for this hook.
|
||||
Many SATA drivers use sata_phy_reset() or call it from within
|
||||
their own phy_reset() functions.
|
||||
</para>
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2><title>Control PCI IDE BMDMA engine</title>
|
||||
<programlisting>
|
||||
void (*bmdma_setup) (struct ata_queued_cmd *qc);
|
||||
@ -354,16 +352,74 @@ int (*qc_issue) (struct ata_queued_cmd *qc);
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2><title>Timeout (error) handling</title>
|
||||
<sect2><title>Exception and probe handling (EH)</title>
|
||||
<programlisting>
|
||||
void (*eng_timeout) (struct ata_port *ap);
|
||||
void (*phy_reset) (struct ata_port *ap);
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
This is a high level error handling function, called from the
|
||||
error handling thread, when a command times out. Most newer
|
||||
hardware will implement its own error handling code here. IDE BMDMA
|
||||
drivers may use the helper function ata_eng_timeout().
|
||||
Deprecated. Use ->error_handler() instead.
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
void (*freeze) (struct ata_port *ap);
|
||||
void (*thaw) (struct ata_port *ap);
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
ata_port_freeze() is called when HSM violations or some other
|
||||
condition disrupts normal operation of the port. A frozen port
|
||||
is not allowed to perform any operation until the port is
|
||||
thawed, which usually follows a successful reset.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The optional ->freeze() callback can be used for freezing the port
|
||||
hardware-wise (e.g. mask interrupt and stop DMA engine). If a
|
||||
port cannot be frozen hardware-wise, the interrupt handler
|
||||
must ack and clear interrupts unconditionally while the port
|
||||
is frozen.
|
||||
</para>
|
||||
<para>
|
||||
The optional ->thaw() callback is called to perform the opposite of ->freeze():
|
||||
prepare the port for normal operation once again. Unmask interrupts,
|
||||
start DMA engine, etc.
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
void (*error_handler) (struct ata_port *ap);
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
->error_handler() is a driver's hook into probe, hotplug, and recovery
|
||||
and other exceptional conditions. The primary responsibility of an
|
||||
implementation is to call ata_do_eh() or ata_bmdma_drive_eh() with a set
|
||||
of EH hooks as arguments:
|
||||
</para>
|
||||
|
||||
<para>
|
||||
'prereset' hook (may be NULL) is called during an EH reset, before any other actions
|
||||
are taken.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
'postreset' hook (may be NULL) is called after the EH reset is performed. Based on
|
||||
existing conditions, severity of the problem, and hardware capabilities,
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Either 'softreset' (may be NULL) or 'hardreset' (may be NULL) will be
|
||||
called to perform the low-level EH reset.
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
void (*post_internal_cmd) (struct ata_queued_cmd *qc);
|
||||
</programlisting>
|
||||
|
||||
<para>
|
||||
Perform any hardware-specific actions necessary to finish processing
|
||||
after executing a probe-time or EH-time command via ata_exec_internal().
|
||||
</para>
|
||||
|
||||
</sect2>
|
||||
|
@ -189,9 +189,9 @@ static unsigned long baseaddr;
|
||||
<sect1>
|
||||
<title>Partition defines</title>
|
||||
<para>
|
||||
If you want to divide your device into parititions, then
|
||||
enable the configuration switch CONFIG_MTD_PARITIONS and define
|
||||
a paritioning scheme suitable to your board.
|
||||
If you want to divide your device into partitions, then
|
||||
enable the configuration switch CONFIG_MTD_PARTITIONS and define
|
||||
a partitioning scheme suitable to your board.
|
||||
</para>
|
||||
<programlisting>
|
||||
#define NUM_PARTITIONS 2
|
||||
|
@ -976,7 +976,7 @@ static int camera_close(struct video_device *dev)
|
||||
<title>Interrupt Handling</title>
|
||||
<para>
|
||||
Our example handler is for an ISA bus device. If it was PCI you would be
|
||||
able to share the interrupt and would have set SA_SHIRQ to indicate a
|
||||
able to share the interrupt and would have set IRQF_SHARED to indicate a
|
||||
shared IRQ. We pass the device pointer as the interrupt routine argument. We
|
||||
don't need to since we only support one card but doing this will make it
|
||||
easier to upgrade the driver for multiple devices in the future.
|
||||
|
@ -10,7 +10,7 @@ standard for controlling intelligent devices that monitor a system.
|
||||
It provides for dynamic discovery of sensors in the system and the
|
||||
ability to monitor the sensors and be informed when the sensor's
|
||||
values change or go outside certain boundaries. It also has a
|
||||
standardized database for field-replacable units (FRUs) and a watchdog
|
||||
standardized database for field-replaceable units (FRUs) and a watchdog
|
||||
timer.
|
||||
|
||||
To use this, you need an interface to an IPMI controller in your
|
||||
@ -64,7 +64,7 @@ situation, you need to read the section below named 'The SI Driver' or
|
||||
IPMI defines a standard watchdog timer. You can enable this with the
|
||||
'IPMI Watchdog Timer' config option. If you compile the driver into
|
||||
the kernel, then via a kernel command-line option you can have the
|
||||
watchdog timer start as soon as it intitializes. It also have a lot
|
||||
watchdog timer start as soon as it initializes. It also have a lot
|
||||
of other options, see the 'Watchdog' section below for more details.
|
||||
Note that you can also have the watchdog continue to run if it is
|
||||
closed (by default it is disabled on close). Go into the 'Watchdog
|
||||
|
22
Documentation/IRQ.txt
Normal file
22
Documentation/IRQ.txt
Normal file
@ -0,0 +1,22 @@
|
||||
What is an IRQ?
|
||||
|
||||
An IRQ is an interrupt request from a device.
|
||||
Currently they can come in over a pin, or over a packet.
|
||||
Several devices may be connected to the same pin thus
|
||||
sharing an IRQ.
|
||||
|
||||
An IRQ number is a kernel identifier used to talk about a hardware
|
||||
interrupt source. Typically this is an index into the global irq_desc
|
||||
array, but except for what linux/interrupt.h implements the details
|
||||
are architecture specific.
|
||||
|
||||
An IRQ number is an enumeration of the possible interrupt sources on a
|
||||
machine. Typically what is enumerated is the number of input pins on
|
||||
all of the interrupt controller in the system. In the case of ISA
|
||||
what is enumerated are the 16 input pins on the two i8259 interrupt
|
||||
controllers.
|
||||
|
||||
Architectures can assign additional meaning to the IRQ numbers, and
|
||||
are encouraged to in the case where there is any manual configuration
|
||||
of the hardware involved. The ISA IRQs are a classic example of
|
||||
assigning this kind of additional meaning.
|
@ -144,9 +144,47 @@ over a rather long period of time, but improvements are always welcome!
|
||||
whether the increased speed is worth it.
|
||||
|
||||
8. Although synchronize_rcu() is a bit slower than is call_rcu(),
|
||||
it usually results in simpler code. So, unless update performance
|
||||
is important or the updaters cannot block, synchronize_rcu()
|
||||
should be used in preference to call_rcu().
|
||||
it usually results in simpler code. So, unless update
|
||||
performance is critically important or the updaters cannot block,
|
||||
synchronize_rcu() should be used in preference to call_rcu().
|
||||
|
||||
An especially important property of the synchronize_rcu()
|
||||
primitive is that it automatically self-limits: if grace periods
|
||||
are delayed for whatever reason, then the synchronize_rcu()
|
||||
primitive will correspondingly delay updates. In contrast,
|
||||
code using call_rcu() should explicitly limit update rate in
|
||||
cases where grace periods are delayed, as failing to do so can
|
||||
result in excessive realtime latencies or even OOM conditions.
|
||||
|
||||
Ways of gaining this self-limiting property when using call_rcu()
|
||||
include:
|
||||
|
||||
a. Keeping a count of the number of data-structure elements
|
||||
used by the RCU-protected data structure, including those
|
||||
waiting for a grace period to elapse. Enforce a limit
|
||||
on this number, stalling updates as needed to allow
|
||||
previously deferred frees to complete.
|
||||
|
||||
Alternatively, limit only the number awaiting deferred
|
||||
free rather than the total number of elements.
|
||||
|
||||
b. Limiting update rate. For example, if updates occur only
|
||||
once per hour, then no explicit rate limiting is required,
|
||||
unless your system is already badly broken. The dcache
|
||||
subsystem takes this approach -- updates are guarded
|
||||
by a global lock, limiting their rate.
|
||||
|
||||
c. Trusted update -- if updates can only be done manually by
|
||||
superuser or some other trusted user, then it might not
|
||||
be necessary to automatically limit them. The theory
|
||||
here is that superuser already has lots of ways to crash
|
||||
the machine.
|
||||
|
||||
d. Use call_rcu_bh() rather than call_rcu(), in order to take
|
||||
advantage of call_rcu_bh()'s faster grace periods.
|
||||
|
||||
e. Periodically invoke synchronize_rcu(), permitting a limited
|
||||
number of updates per grace period.
|
||||
|
||||
9. All RCU list-traversal primitives, which include
|
||||
list_for_each_rcu(), list_for_each_entry_rcu(),
|
||||
|
@ -7,7 +7,7 @@ The CONFIG_RCU_TORTURE_TEST config option is available for all RCU
|
||||
implementations. It creates an rcutorture kernel module that can
|
||||
be loaded to run a torture test. The test periodically outputs
|
||||
status messages via printk(), which can be examined via the dmesg
|
||||
command (perhaps grepping for "rcutorture"). The test is started
|
||||
command (perhaps grepping for "torture"). The test is started
|
||||
when the module is loaded, and stops when the module is unloaded.
|
||||
|
||||
However, actually setting this config option to "y" results in the system
|
||||
@ -35,6 +35,19 @@ stat_interval The number of seconds between output of torture
|
||||
be printed -only- when the module is unloaded, and this
|
||||
is the default.
|
||||
|
||||
shuffle_interval
|
||||
The number of seconds to keep the test threads affinitied
|
||||
to a particular subset of the CPUs. Used in conjunction
|
||||
with test_no_idle_hz.
|
||||
|
||||
test_no_idle_hz Whether or not to test the ability of RCU to operate in
|
||||
a kernel that disables the scheduling-clock interrupt to
|
||||
idle CPUs. Boolean parameter, "1" to test, "0" otherwise.
|
||||
|
||||
torture_type The type of RCU to test: "rcu" for the rcu_read_lock()
|
||||
API, "rcu_bh" for the rcu_read_lock_bh() API, and "srcu"
|
||||
for the "srcu_read_lock()" API.
|
||||
|
||||
verbose Enable debug printk()s. Default is disabled.
|
||||
|
||||
|
||||
@ -42,14 +55,14 @@ OUTPUT
|
||||
|
||||
The statistics output is as follows:
|
||||
|
||||
rcutorture: --- Start of test: nreaders=16 stat_interval=0 verbose=0
|
||||
rcutorture: rtc: 0000000000000000 ver: 1916 tfle: 0 rta: 1916 rtaf: 0 rtf: 1915
|
||||
rcutorture: Reader Pipe: 1466408 9747 0 0 0 0 0 0 0 0 0
|
||||
rcutorture: Reader Batch: 1464477 11678 0 0 0 0 0 0 0 0
|
||||
rcutorture: Free-Block Circulation: 1915 1915 1915 1915 1915 1915 1915 1915 1915 1915 0
|
||||
rcutorture: --- End of test
|
||||
rcu-torture: --- Start of test: nreaders=16 stat_interval=0 verbose=0
|
||||
rcu-torture: rtc: 0000000000000000 ver: 1916 tfle: 0 rta: 1916 rtaf: 0 rtf: 1915
|
||||
rcu-torture: Reader Pipe: 1466408 9747 0 0 0 0 0 0 0 0 0
|
||||
rcu-torture: Reader Batch: 1464477 11678 0 0 0 0 0 0 0 0
|
||||
rcu-torture: Free-Block Circulation: 1915 1915 1915 1915 1915 1915 1915 1915 1915 1915 0
|
||||
rcu-torture: --- End of test
|
||||
|
||||
The command "dmesg | grep rcutorture:" will extract this information on
|
||||
The command "dmesg | grep torture:" will extract this information on
|
||||
most systems. On more esoteric configurations, it may be necessary to
|
||||
use other commands to access the output of the printk()s used by
|
||||
the RCU torture test. The printk()s use KERN_ALERT, so they should
|
||||
@ -115,8 +128,9 @@ The following script may be used to torture RCU:
|
||||
modprobe rcutorture
|
||||
sleep 100
|
||||
rmmod rcutorture
|
||||
dmesg | grep rcutorture:
|
||||
dmesg | grep torture:
|
||||
|
||||
The output can be manually inspected for the error flag of "!!!".
|
||||
One could of course create a more elaborate script that automatically
|
||||
checked for such errors.
|
||||
checked for such errors. The "rmmod" command forces a "SUCCESS" or
|
||||
"FAILURE" indication to be printk()ed.
|
||||
|
@ -184,7 +184,17 @@ synchronize_rcu()
|
||||
blocking, it registers a function and argument which are invoked
|
||||
after all ongoing RCU read-side critical sections have completed.
|
||||
This callback variant is particularly useful in situations where
|
||||
it is illegal to block.
|
||||
it is illegal to block or where update-side performance is
|
||||
critically important.
|
||||
|
||||
However, the call_rcu() API should not be used lightly, as use
|
||||
of the synchronize_rcu() API generally results in simpler code.
|
||||
In addition, the synchronize_rcu() API has the nice property
|
||||
of automatically limiting update rate should grace periods
|
||||
be delayed. This property results in system resilience in face
|
||||
of denial-of-service attacks. Code using call_rcu() should limit
|
||||
update rate in order to gain this same sort of resilience. See
|
||||
checklist.txt for some approaches to limiting the update rate.
|
||||
|
||||
rcu_assign_pointer()
|
||||
|
||||
@ -790,7 +800,6 @@ RCU pointer update:
|
||||
|
||||
RCU grace period:
|
||||
|
||||
synchronize_kernel (deprecated)
|
||||
synchronize_net
|
||||
synchronize_sched
|
||||
synchronize_rcu
|
||||
|
@ -78,9 +78,9 @@ also known as "System Drives", and Drive Groups are also called "Packs". Both
|
||||
terms are in use in the Mylex documentation; I have chosen to standardize on
|
||||
the more generic "Logical Drive" and "Drive Group".
|
||||
|
||||
DAC960 RAID disk devices are named in the style of the Device File System
|
||||
(DEVFS). The device corresponding to Logical Drive D on Controller C is
|
||||
referred to as /dev/rd/cCdD, and the partitions are called /dev/rd/cCdDp1
|
||||
DAC960 RAID disk devices are named in the style of the obsolete Device File
|
||||
System (DEVFS). The device corresponding to Logical Drive D on Controller C
|
||||
is referred to as /dev/rd/cCdD, and the partitions are called /dev/rd/cCdDp1
|
||||
through /dev/rd/cCdDp7. For example, partition 3 of Logical Drive 5 on
|
||||
Controller 2 is referred to as /dev/rd/c2d5p3. Note that unlike with SCSI
|
||||
disks the device names will not change in the event of a disk drive failure.
|
||||
|
57
Documentation/SubmitChecklist
Normal file
57
Documentation/SubmitChecklist
Normal file
@ -0,0 +1,57 @@
|
||||
Linux Kernel patch sumbittal checklist
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Here are some basic things that developers should do if they
|
||||
want to see their kernel patch submittals accepted quicker.
|
||||
|
||||
These are all above and beyond the documentation that is provided
|
||||
in Documentation/SubmittingPatches and elsewhere about submitting
|
||||
Linux kernel patches.
|
||||
|
||||
|
||||
|
||||
- Builds cleanly with applicable or modified CONFIG options =y, =m, and =n.
|
||||
No gcc warnings/errors, no linker warnings/errors.
|
||||
|
||||
- Passes allnoconfig, allmodconfig
|
||||
|
||||
- Builds on multiple CPU arch-es by using local cross-compile tools
|
||||
or something like PLM at OSDL.
|
||||
|
||||
- ppc64 is a good architecture for cross-compilation checking because it
|
||||
tends to use `unsigned long' for 64-bit quantities.
|
||||
|
||||
- Matches kernel coding style(!)
|
||||
|
||||
- Any new or modified CONFIG options don't muck up the config menu.
|
||||
|
||||
- All new Kconfig options have help text.
|
||||
|
||||
- Has been carefully reviewed with respect to relevant Kconfig
|
||||
combinations. This is very hard to get right with testing --
|
||||
brainpower pays off here.
|
||||
|
||||
- Check cleanly with sparse.
|
||||
|
||||
- Use 'make checkstack' and 'make namespacecheck' and fix any
|
||||
problems that they find. Note: checkstack does not point out
|
||||
problems explicitly, but any one function that uses more than
|
||||
512 bytes on the stack is a candidate for change.
|
||||
|
||||
- Include kernel-doc to document global kernel APIs. (Not required
|
||||
for static functions, but OK there also.) Use 'make htmldocs'
|
||||
or 'make mandocs' to check the kernel-doc and fix any issues.
|
||||
|
||||
- Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT,
|
||||
CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES,
|
||||
CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_SPINLOCK_SLEEP all simultaneously
|
||||
enabled.
|
||||
|
||||
- Has been build- and runtime tested with and without CONFIG_SMP and
|
||||
CONFIG_PREEMPT.
|
||||
|
||||
- If the patch affects IO/Disk, etc: has been tested with and without
|
||||
CONFIG_LBD.
|
||||
|
||||
|
||||
2006-APR-27
|
@ -85,7 +85,7 @@ IXP4xx provides two methods of accessing PCI memory space:
|
||||
2) If > 64MB of memory space is required, the IXP4xx can be
|
||||
configured to use indirect registers to access PCI This allows
|
||||
for up to 128MB (0x48000000 to 0x4fffffff) of memory on the bus.
|
||||
The disadvantadge of this is that every PCI access requires
|
||||
The disadvantage of this is that every PCI access requires
|
||||
three local register accesses plus a spinlock, but in some
|
||||
cases the performance hit is acceptable. In addition, you cannot
|
||||
mmap() PCI devices in this case due to the indirect nature
|
||||
|
@ -7,11 +7,13 @@ Introduction
|
||||
------------
|
||||
|
||||
The Samsung S3C24XX range of ARM9 System-on-Chip CPUs are supported
|
||||
by the 's3c2410' architecture of ARM Linux. Currently the S3C2410 and
|
||||
the S3C2440 are supported CPUs.
|
||||
by the 's3c2410' architecture of ARM Linux. Currently the S3C2410,
|
||||
S3C2440 and S3C2442 devices are supported.
|
||||
|
||||
Support for the S3C2400 series is in progress.
|
||||
|
||||
Support for the S3C2412 and S3C2413 CPUs is being merged.
|
||||
|
||||
|
||||
Configuration
|
||||
-------------
|
||||
@ -43,9 +45,18 @@ Machines
|
||||
|
||||
Samsung's own development board, geared for PDA work.
|
||||
|
||||
Samsung/Aiji SMDK2412
|
||||
|
||||
The S3C2412 version of the SMDK2440.
|
||||
|
||||
Samsung/Aiji SMDK2413
|
||||
|
||||
The S3C2412 version of the SMDK2440.
|
||||
|
||||
Samsung/Meritech SMDK2440
|
||||
|
||||
The S3C2440 compatible version of the SMDK2440
|
||||
The S3C2440 compatible version of the SMDK2440, which has the
|
||||
option of an S3C2440 or S3C2442 CPU module.
|
||||
|
||||
Thorcom VR1000
|
||||
|
||||
@ -211,24 +222,6 @@ Port Contributors
|
||||
Lucas Correia Villa Real (S3C2400 port)
|
||||
|
||||
|
||||
Document Changes
|
||||
----------------
|
||||
|
||||
05 Sep 2004 - BJD - Added Document Changes section
|
||||
05 Sep 2004 - BJD - Added Klaus Fetscher to list of contributors
|
||||
25 Oct 2004 - BJD - Added Dimitry Andric to list of contributors
|
||||
25 Oct 2004 - BJD - Updated the MTD from the 2.6.9 merge
|
||||
21 Jan 2005 - BJD - Added rx3715, added Shannon to contributors
|
||||
10 Feb 2005 - BJD - Added Guillaume Gourat to contributors
|
||||
02 Mar 2005 - BJD - Added SMDK2440 to list of machines
|
||||
06 Mar 2005 - BJD - Added Christer Weinigel
|
||||
08 Mar 2005 - BJD - Added LCVR to list of people, updated introduction
|
||||
08 Mar 2005 - BJD - Added section on adding machines
|
||||
09 Sep 2005 - BJD - Added section on platform data
|
||||
11 Feb 2006 - BJD - Added I2C, RTC and Watchdog sections
|
||||
11 Feb 2006 - BJD - Added Osiris machine, and S3C2400 information
|
||||
|
||||
|
||||
Document Author
|
||||
---------------
|
||||
|
||||
|
120
Documentation/arm/Samsung-S3C24XX/S3C2412.txt
Normal file
120
Documentation/arm/Samsung-S3C24XX/S3C2412.txt
Normal file
@ -0,0 +1,120 @@
|
||||
S3C2412 ARM Linux Overview
|
||||
==========================
|
||||
|
||||
Introduction
|
||||
------------
|
||||
|
||||
The S3C2412 is part of the S3C24XX range of ARM9 System-on-Chip CPUs
|
||||
from Samsung. This part has an ARM926-EJS core, capable of running up
|
||||
to 266MHz (see data-sheet for more information)
|
||||
|
||||
|
||||
Clock
|
||||
-----
|
||||
|
||||
The core clock code provides a set of clocks to the drivers, and allows
|
||||
for source selection and a number of other features.
|
||||
|
||||
|
||||
Power
|
||||
-----
|
||||
|
||||
No support for suspend/resume to RAM in the current system.
|
||||
|
||||
|
||||
DMA
|
||||
---
|
||||
|
||||
No current support for DMA.
|
||||
|
||||
|
||||
GPIO
|
||||
----
|
||||
|
||||
There is support for setting the GPIO to input/output/special function
|
||||
and reading or writing to them.
|
||||
|
||||
|
||||
UART
|
||||
----
|
||||
|
||||
The UART hardware is similar to the S3C2440, and is supported by the
|
||||
s3c2410 driver in the drivers/serial directory.
|
||||
|
||||
|
||||
NAND
|
||||
----
|
||||
|
||||
The NAND hardware is similar to the S3C2440, and is supported by the
|
||||
s3c2410 driver in the drivers/mtd/nand directory.
|
||||
|
||||
|
||||
USB Host
|
||||
--------
|
||||
|
||||
The USB hardware is similar to the S3C2410, with extended clock source
|
||||
control. The OHCI portion is supported by the ohci-s3c2410 driver, and
|
||||
the clock control selection is supported by the core clock code.
|
||||
|
||||
|
||||
USB Device
|
||||
----------
|
||||
|
||||
No current support in the kernel
|
||||
|
||||
|
||||
IRQs
|
||||
----
|
||||
|
||||
All the standard, and external interrupt sources are supported. The
|
||||
extra sub-sources are not yet supported.
|
||||
|
||||
|
||||
RTC
|
||||
---
|
||||
|
||||
The RTC hardware is similar to the S3C2410, and is supported by the
|
||||
s3c2410-rtc driver.
|
||||
|
||||
|
||||
Watchdog
|
||||
--------
|
||||
|
||||
The watchdog harware is the same as the S3C2410, and is supported by
|
||||
the s3c2410_wdt driver.
|
||||
|
||||
|
||||
MMC/SD/SDIO
|
||||
-----------
|
||||
|
||||
No current support for the MMC/SD/SDIO block.
|
||||
|
||||
IIC
|
||||
---
|
||||
|
||||
The IIC hardware is the same as the S3C2410, and is supported by the
|
||||
i2c-s3c24xx driver.
|
||||
|
||||
|
||||
IIS
|
||||
---
|
||||
|
||||
No current support for the IIS interface.
|
||||
|
||||
|
||||
SPI
|
||||
---
|
||||
|
||||
No current support for the SPI interfaces.
|
||||
|
||||
|
||||
ATA
|
||||
---
|
||||
|
||||
No current support for the on-board ATA block.
|
||||
|
||||
|
||||
Document Author
|
||||
---------------
|
||||
|
||||
Ben Dooks, (c) 2006 Simtec Electronics
|
21
Documentation/arm/Samsung-S3C24XX/S3C2413.txt
Normal file
21
Documentation/arm/Samsung-S3C24XX/S3C2413.txt
Normal file
@ -0,0 +1,21 @@
|
||||
S3C2413 ARM Linux Overview
|
||||
==========================
|
||||
|
||||
Introduction
|
||||
------------
|
||||
|
||||
The S3C2413 is an extended version of the S3C2412, with an camera
|
||||
interface and mobile DDR memory support. See the S3C2412 support
|
||||
documentation for more information.
|
||||
|
||||
|
||||
Camera Interface
|
||||
---------------
|
||||
|
||||
This block is currently not supported.
|
||||
|
||||
|
||||
Document Author
|
||||
---------------
|
||||
|
||||
Ben Dooks, (c) 2006 Simtec Electronics
|
61
Documentation/arm/Sharp-LH/ADC-LH7-Touchscreen
Normal file
61
Documentation/arm/Sharp-LH/ADC-LH7-Touchscreen
Normal file
@ -0,0 +1,61 @@
|
||||
README on the ADC/Touchscreen Controller
|
||||
========================================
|
||||
|
||||
The LH79524 and LH7A404 include a built-in Analog to Digital
|
||||
controller (ADC) that is used to process input from a touchscreen.
|
||||
The driver only implements a four-wire touch panel protocol.
|
||||
|
||||
The touchscreen driver is maintenance free except for the pen-down or
|
||||
touch threshold. Some resistive displays and board combinations may
|
||||
require tuning of this threshold. The driver exposes some of it's
|
||||
internal state in the sys filesystem. If the kernel is configured
|
||||
with it, CONFIG_SYSFS, and sysfs is mounted at /sys, there will be a
|
||||
directory
|
||||
|
||||
/sys/devices/platform/adc-lh7.0
|
||||
|
||||
containing these files.
|
||||
|
||||
-r--r--r-- 1 root root 4096 Jan 1 00:00 samples
|
||||
-rw-r--r-- 1 root root 4096 Jan 1 00:00 threshold
|
||||
-r--r--r-- 1 root root 4096 Jan 1 00:00 threshold_range
|
||||
|
||||
The threshold is the current touch threshold. It defaults to 750 on
|
||||
most targets.
|
||||
|
||||
# cat threshold
|
||||
750
|
||||
|
||||
The threshold_range contains the range of valid values for the
|
||||
threshold. Values outside of this range will be silently ignored.
|
||||
|
||||
# cat threshold_range
|
||||
0 1023
|
||||
|
||||
To change the threshold, write a value to the threshold file.
|
||||
|
||||
# echo 500 > threshold
|
||||
# cat threshold
|
||||
500
|
||||
|
||||
The samples file contains the most recently sampled values from the
|
||||
ADC. There are 12. Below are typical of the last sampled values when
|
||||
the pen has been released. The first two and last two samples are for
|
||||
detecting whether or not the pen is down. The third through sixth are
|
||||
X coordinate samples. The seventh through tenth are Y coordinate
|
||||
samples.
|
||||
|
||||
# cat samples
|
||||
1023 1023 0 0 0 0 530 529 530 529 1023 1023
|
||||
|
||||
To determine a reasonable threshold, press on the touch panel with an
|
||||
appropriate stylus and read the values from samples.
|
||||
|
||||
# cat samples
|
||||
1023 676 92 103 101 102 855 919 922 922 1023 679
|
||||
|
||||
The first and eleventh samples are discarded. Thus, the important
|
||||
values are the second and twelfth which are used to determine if the
|
||||
pen is down. When both are below the threshold, the driver registers
|
||||
that the pen is down. When either is above the threshold, it
|
||||
registers then pen is up.
|
59
Documentation/arm/Sharp-LH/LCDPanels
Normal file
59
Documentation/arm/Sharp-LH/LCDPanels
Normal file
@ -0,0 +1,59 @@
|
||||
README on the LCD Panels
|
||||
========================
|
||||
|
||||
Configuration options for several LCD panels, available from Logic PD,
|
||||
are included in the kernel source. This README will help you
|
||||
understand the configuration data and give you some guidance for
|
||||
adding support for other panels if you wish.
|
||||
|
||||
|
||||
lcd-panels.h
|
||||
------------
|
||||
|
||||
There is no way, at present, to detect which panel is attached to the
|
||||
system at runtime. Thus the kernel configuration is static. The file
|
||||
arch/arm/mach-ld7a40x/lcd-panels.h (or similar) defines all of the
|
||||
panel specific parameters.
|
||||
|
||||
It should be possible for this data to be shared among several device
|
||||
families. The current layout may be insufficiently general, but it is
|
||||
amenable to improvement.
|
||||
|
||||
|
||||
PIXEL_CLOCK
|
||||
-----------
|
||||
|
||||
The panel data sheets will give a range of acceptable pixel clocks.
|
||||
The fundamental LCDCLK input frequency is divided down by a PCD
|
||||
constant in field '.tim2'. It may happen that it is impossible to set
|
||||
the pixel clock within this range. A clock which is too slow will
|
||||
tend to flicker. For the highest quality image, set the clock as high
|
||||
as possible.
|
||||
|
||||
|
||||
MARGINS
|
||||
-------
|
||||
|
||||
These values may be difficult to glean from the panel data sheet. In
|
||||
the case of the Sharp panels, the upper margin is explicitly called
|
||||
out as a specific number of lines from the top of the frame. The
|
||||
other values may not matter as much as the panels tend to
|
||||
automatically center the image.
|
||||
|
||||
|
||||
Sync Sense
|
||||
----------
|
||||
|
||||
The sense of the hsync and vsync pulses may be called out in the data
|
||||
sheet. On one panel, the sense of these pulses determine the height
|
||||
of the visible region on the panel. Most of the Sharp panels use
|
||||
negative sense sync pulses set by the TIM2_IHS and TIM2_IVS bits in
|
||||
'.tim2'.
|
||||
|
||||
|
||||
Pel Layout
|
||||
----------
|
||||
|
||||
The Sharp color TFT panels are all configured for 16 bit direct color
|
||||
modes. The amba-lcd driver sets the pel mode to 565 for 5 bits of
|
||||
each red and blue and 6 bits of green.
|
@ -157,13 +157,13 @@ For example, smp_mb__before_atomic_dec() can be used like so:
|
||||
smp_mb__before_atomic_dec();
|
||||
atomic_dec(&obj->ref_count);
|
||||
|
||||
It makes sure that all memory operations preceeding the atomic_dec()
|
||||
It makes sure that all memory operations preceding the atomic_dec()
|
||||
call are strongly ordered with respect to the atomic counter
|
||||
operation. In the above example, it guarentees that the assignment of
|
||||
operation. In the above example, it guarantees that the assignment of
|
||||
"1" to obj->dead will be globally visible to other cpus before the
|
||||
atomic counter decrement.
|
||||
|
||||
Without the explicitl smp_mb__before_atomic_dec() call, the
|
||||
Without the explicit smp_mb__before_atomic_dec() call, the
|
||||
implementation could legally allow the atomic counter update visible
|
||||
to other cpus before the "obj->dead = 1;" assignment.
|
||||
|
||||
@ -173,11 +173,11 @@ ordering with respect to memory operations after an atomic_dec() call
|
||||
(smp_mb__{before,after}_atomic_inc()).
|
||||
|
||||
A missing memory barrier in the cases where they are required by the
|
||||
atomic_t implementation above can have disasterous results. Here is
|
||||
an example, which follows a pattern occuring frequently in the Linux
|
||||
atomic_t implementation above can have disastrous results. Here is
|
||||
an example, which follows a pattern occurring frequently in the Linux
|
||||
kernel. It is the use of atomic counters to implement reference
|
||||
counting, and it works such that once the counter falls to zero it can
|
||||
be guarenteed that no other entity can be accessing the object:
|
||||
be guaranteed that no other entity can be accessing the object:
|
||||
|
||||
static void obj_list_add(struct obj *obj)
|
||||
{
|
||||
@ -291,9 +291,9 @@ to the size of an "unsigned long" C data type, and are least of that
|
||||
size. The endianness of the bits within each "unsigned long" are the
|
||||
native endianness of the cpu.
|
||||
|
||||
void set_bit(unsigned long nr, volatils unsigned long *addr);
|
||||
void clear_bit(unsigned long nr, volatils unsigned long *addr);
|
||||
void change_bit(unsigned long nr, volatils unsigned long *addr);
|
||||
void set_bit(unsigned long nr, volatile unsigned long *addr);
|
||||
void clear_bit(unsigned long nr, volatile unsigned long *addr);
|
||||
void change_bit(unsigned long nr, volatile unsigned long *addr);
|
||||
|
||||
These routines set, clear, and change, respectively, the bit number
|
||||
indicated by "nr" on the bit mask pointed to by "ADDR".
|
||||
@ -301,9 +301,9 @@ indicated by "nr" on the bit mask pointed to by "ADDR".
|
||||
They must execute atomically, yet there are no implicit memory barrier
|
||||
semantics required of these interfaces.
|
||||
|
||||
int test_and_set_bit(unsigned long nr, volatils unsigned long *addr);
|
||||
int test_and_clear_bit(unsigned long nr, volatils unsigned long *addr);
|
||||
int test_and_change_bit(unsigned long nr, volatils unsigned long *addr);
|
||||
int test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
|
||||
int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
|
||||
int test_and_change_bit(unsigned long nr, volatile unsigned long *addr);
|
||||
|
||||
Like the above, except that these routines return a boolean which
|
||||
indicates whether the changed bit was set _BEFORE_ the atomic bit
|
||||
@ -335,7 +335,7 @@ subsequent memory operation is made visible. For example:
|
||||
/* ... */;
|
||||
obj->killed = 1;
|
||||
|
||||
The implementation of test_and_set_bit() must guarentee that
|
||||
The implementation of test_and_set_bit() must guarantee that
|
||||
"obj->dead = 1;" is visible to cpus before the atomic memory operation
|
||||
done by test_and_set_bit() becomes visible. Likewise, the atomic
|
||||
memory operation done by test_and_set_bit() must become visible before
|
||||
@ -474,7 +474,7 @@ Now, as far as memory barriers go, as long as spin_lock()
|
||||
strictly orders all subsequent memory operations (including
|
||||
the cas()) with respect to itself, things will be fine.
|
||||
|
||||
Said another way, _atomic_dec_and_lock() must guarentee that
|
||||
Said another way, _atomic_dec_and_lock() must guarantee that
|
||||
a counter dropping to zero is never made visible before the
|
||||
spinlock being acquired.
|
||||
|
||||
|
144
Documentation/console/console.txt
Normal file
144
Documentation/console/console.txt
Normal file
@ -0,0 +1,144 @@
|
||||
Console Drivers
|
||||
===============
|
||||
|
||||
The linux kernel has 2 general types of console drivers. The first type is
|
||||
assigned by the kernel to all the virtual consoles during the boot process.
|
||||
This type will be called 'system driver', and only one system driver is allowed
|
||||
to exist. The system driver is persistent and it can never be unloaded, though
|
||||
it may become inactive.
|
||||
|
||||
The second type has to be explicitly loaded and unloaded. This will be called
|
||||
'modular driver' by this document. Multiple modular drivers can coexist at
|
||||
any time with each driver sharing the console with other drivers including
|
||||
the system driver. However, modular drivers cannot take over the console
|
||||
that is currently occupied by another modular driver. (Exception: Drivers that
|
||||
call take_over_console() will succeed in the takeover regardless of the type
|
||||
of driver occupying the consoles.) They can only take over the console that is
|
||||
occupied by the system driver. In the same token, if the modular driver is
|
||||
released by the console, the system driver will take over.
|
||||
|
||||
Modular drivers, from the programmer's point of view, has to call:
|
||||
|
||||
take_over_console() - load and bind driver to console layer
|
||||
give_up_console() - unbind and unload driver
|
||||
|
||||
In newer kernels, the following are also available:
|
||||
|
||||
register_con_driver()
|
||||
unregister_con_driver()
|
||||
|
||||
If sysfs is enabled, the contents of /sys/class/vtconsole can be
|
||||
examined. This shows the console backends currently registered by the
|
||||
system which are named vtcon<n> where <n> is an integer fro 0 to 15. Thus:
|
||||
|
||||
ls /sys/class/vtconsole
|
||||
. .. vtcon0 vtcon1
|
||||
|
||||
Each directory in /sys/class/vtconsole has 3 files:
|
||||
|
||||
ls /sys/class/vtconsole/vtcon0
|
||||
. .. bind name uevent
|
||||
|
||||
What do these files signify?
|
||||
|
||||
1. bind - this is a read/write file. It shows the status of the driver if
|
||||
read, or acts to bind or unbind the driver to the virtual consoles
|
||||
when written to. The possible values are:
|
||||
|
||||
0 - means the driver is not bound and if echo'ed, commands the driver
|
||||
to unbind
|
||||
|
||||
1 - means the driver is bound and if echo'ed, commands the driver to
|
||||
bind
|
||||
|
||||
2. name - read-only file. Shows the name of the driver in this format:
|
||||
|
||||
cat /sys/class/vtconsole/vtcon0/name
|
||||
(S) VGA+
|
||||
|
||||
'(S)' stands for a (S)ystem driver, ie, it cannot be directly
|
||||
commanded to bind or unbind
|
||||
|
||||
'VGA+' is the name of the driver
|
||||
|
||||
cat /sys/class/vtconsole/vtcon1/name
|
||||
(M) frame buffer device
|
||||
|
||||
In this case, '(M)' stands for a (M)odular driver, one that can be
|
||||
directly commanded to bind or unbind.
|
||||
|
||||
3. uevent - ignore this file
|
||||
|
||||
When unbinding, the modular driver is detached first, and then the system
|
||||
driver takes over the consoles vacated by the driver. Binding, on the other
|
||||
hand, will bind the driver to the consoles that are currently occupied by a
|
||||
system driver.
|
||||
|
||||
NOTE1: Binding and binding must be selected in Kconfig. It's under:
|
||||
|
||||
Device Drivers -> Character devices -> Support for binding and unbinding
|
||||
console drivers
|
||||
|
||||
NOTE2: If any of the virtual consoles are in KD_GRAPHICS mode, then binding or
|
||||
unbinding will not succeed. An example of an application that sets the console
|
||||
to KD_GRAPHICS is X.
|
||||
|
||||
How useful is this feature? This is very useful for console driver
|
||||
developers. By unbinding the driver from the console layer, one can unload the
|
||||
driver, make changes, recompile, reload and rebind the driver without any need
|
||||
for rebooting the kernel. For regular users who may want to switch from
|
||||
framebuffer console to VGA console and vice versa, this feature also makes
|
||||
this possible. (NOTE NOTE NOTE: Please read fbcon.txt under Documentation/fb
|
||||
for more details).
|
||||
|
||||
Notes for developers:
|
||||
=====================
|
||||
|
||||
take_over_console() is now broken up into:
|
||||
|
||||
register_con_driver()
|
||||
bind_con_driver() - private function
|
||||
|
||||
give_up_console() is a wrapper to unregister_con_driver(), and a driver must
|
||||
be fully unbound for this call to succeed. con_is_bound() will check if the
|
||||
driver is bound or not.
|
||||
|
||||
Guidelines for console driver writers:
|
||||
=====================================
|
||||
|
||||
In order for binding to and unbinding from the console to properly work,
|
||||
console drivers must follow these guidelines:
|
||||
|
||||
1. All drivers, except system drivers, must call either register_con_driver()
|
||||
or take_over_console(). register_con_driver() will just add the driver to
|
||||
the console's internal list. It won't take over the
|
||||
console. take_over_console(), as it name implies, will also take over (or
|
||||
bind to) the console.
|
||||
|
||||
2. All resources allocated during con->con_init() must be released in
|
||||
con->con_deinit().
|
||||
|
||||
3. All resources allocated in con->con_startup() must be released when the
|
||||
driver, which was previously bound, becomes unbound. The console layer
|
||||
does not have a complementary call to con->con_startup() so it's up to the
|
||||
driver to check when it's legal to release these resources. Calling
|
||||
con_is_bound() in con->con_deinit() will help. If the call returned
|
||||
false(), then it's safe to release the resources. This balance has to be
|
||||
ensured because con->con_startup() can be called again when a request to
|
||||
rebind the driver to the console arrives.
|
||||
|
||||
4. Upon exit of the driver, ensure that the driver is totally unbound. If the
|
||||
condition is satisfied, then the driver must call unregister_con_driver()
|
||||
or give_up_console().
|
||||
|
||||
5. unregister_con_driver() can also be called on conditions which make it
|
||||
impossible for the driver to service console requests. This can happen
|
||||
with the framebuffer console that suddenly lost all of its drivers.
|
||||
|
||||
The current crop of console drivers should still work correctly, but binding
|
||||
and unbinding them may cause problems. With minimal fixes, these drivers can
|
||||
be made to work correctly.
|
||||
|
||||
==========================
|
||||
Antonino Daplas <adaplas@pol.net>
|
||||
|
@ -3,7 +3,7 @@
|
||||
|
||||
Maintained by Torben Mathiasen <device@lanana.org>
|
||||
|
||||
Last revised: 25 January 2005
|
||||
Last revised: 15 May 2006
|
||||
|
||||
This list is the Linux Device List, the official registry of allocated
|
||||
device numbers and /dev directory nodes for the Linux operating
|
||||
@ -94,7 +94,6 @@ Your cooperation is appreciated.
|
||||
9 = /dev/urandom Faster, less secure random number gen.
|
||||
10 = /dev/aio Asyncronous I/O notification interface
|
||||
11 = /dev/kmsg Writes to this come out as printk's
|
||||
12 = /dev/oldmem Access to crash dump from kexec kernel
|
||||
1 block RAM disk
|
||||
0 = /dev/ram0 First RAM disk
|
||||
1 = /dev/ram1 Second RAM disk
|
||||
@ -262,13 +261,13 @@ Your cooperation is appreciated.
|
||||
NOTE: These devices permit both read and write access.
|
||||
|
||||
7 block Loopback devices
|
||||
0 = /dev/loop0 First loopback device
|
||||
1 = /dev/loop1 Second loopback device
|
||||
0 = /dev/loop0 First loop device
|
||||
1 = /dev/loop1 Second loop device
|
||||
...
|
||||
|
||||
The loopback devices are used to mount filesystems not
|
||||
The loop devices are used to mount filesystems not
|
||||
associated with block devices. The binding to the
|
||||
loopback devices is handled by mount(8) or losetup(8).
|
||||
loop devices is handled by mount(8) or losetup(8).
|
||||
|
||||
8 block SCSI disk devices (0-15)
|
||||
0 = /dev/sda First SCSI disk whole disk
|
||||
@ -943,7 +942,7 @@ Your cooperation is appreciated.
|
||||
240 = /dev/ftlp FTL on 16th Memory Technology Device
|
||||
|
||||
Partitions are handled in the same way as for IDE
|
||||
disks (see major number 3) expect that the partition
|
||||
disks (see major number 3) except that the partition
|
||||
limit is 15 rather than 63 per disk (same as SCSI.)
|
||||
|
||||
45 char isdn4linux ISDN BRI driver
|
||||
@ -1168,7 +1167,7 @@ Your cooperation is appreciated.
|
||||
The filename of the encrypted container and the passwords
|
||||
are sent via ioctls (using the sdmount tool) to the master
|
||||
node which then activates them via one of the
|
||||
/dev/scramdisk/x nodes for loopback mounting (all handled
|
||||
/dev/scramdisk/x nodes for loop mounting (all handled
|
||||
through the sdmount tool).
|
||||
|
||||
Requested by: andy@scramdisklinux.org
|
||||
@ -2538,18 +2537,32 @@ Your cooperation is appreciated.
|
||||
0 = /dev/usb/lp0 First USB printer
|
||||
...
|
||||
15 = /dev/usb/lp15 16th USB printer
|
||||
16 = /dev/usb/mouse0 First USB mouse
|
||||
...
|
||||
31 = /dev/usb/mouse15 16th USB mouse
|
||||
32 = /dev/usb/ez0 First USB firmware loader
|
||||
...
|
||||
47 = /dev/usb/ez15 16th USB firmware loader
|
||||
48 = /dev/usb/scanner0 First USB scanner
|
||||
...
|
||||
63 = /dev/usb/scanner15 16th USB scanner
|
||||
64 = /dev/usb/rio500 Diamond Rio 500
|
||||
65 = /dev/usb/usblcd USBLCD Interface (info@usblcd.de)
|
||||
66 = /dev/usb/cpad0 Synaptics cPad (mouse/LCD)
|
||||
96 = /dev/usb/hiddev0 1st USB HID device
|
||||
...
|
||||
111 = /dev/usb/hiddev15 16th USB HID device
|
||||
112 = /dev/usb/auer0 1st auerswald ISDN device
|
||||
...
|
||||
127 = /dev/usb/auer15 16th auerswald ISDN device
|
||||
128 = /dev/usb/brlvgr0 First Braille Voyager device
|
||||
...
|
||||
131 = /dev/usb/brlvgr3 Fourth Braille Voyager device
|
||||
132 = /dev/usb/idmouse ID Mouse (fingerprint scanner) device
|
||||
133 = /dev/usb/sisusbvga1 First SiSUSB VGA device
|
||||
...
|
||||
140 = /dev/usb/sisusbvga8 Eigth SISUSB VGA device
|
||||
144 = /dev/usb/lcd USB LCD device
|
||||
160 = /dev/usb/legousbtower0 1st USB Legotower device
|
||||
...
|
||||
175 = /dev/usb/legousbtower15 16th USB Legotower device
|
||||
240 = /dev/usb/dabusb0 First daubusb device
|
||||
...
|
||||
243 = /dev/usb/dabusb3 Fourth dabusb device
|
||||
|
||||
180 block USB block devices
|
||||
0 = /dev/uba First USB block device
|
||||
@ -2710,6 +2723,17 @@ Your cooperation is appreciated.
|
||||
1 = /dev/cpu/1/msr MSRs on CPU 1
|
||||
...
|
||||
|
||||
202 block Xen Virtual Block Device
|
||||
0 = /dev/xvda First Xen VBD whole disk
|
||||
16 = /dev/xvdb Second Xen VBD whole disk
|
||||
32 = /dev/xvdc Third Xen VBD whole disk
|
||||
...
|
||||
240 = /dev/xvdp Sixteenth Xen VBD whole disk
|
||||
|
||||
Partitions are handled in the same way as for IDE
|
||||
disks (see major number 3) except that the limit on
|
||||
partitions is 15.
|
||||
|
||||
203 char CPU CPUID information
|
||||
0 = /dev/cpu/0/cpuid CPUID on CPU 0
|
||||
1 = /dev/cpu/1/cpuid CPUID on CPU 1
|
||||
@ -2747,11 +2771,27 @@ Your cooperation is appreciated.
|
||||
46 = /dev/ttyCPM0 PPC CPM (SCC or SMC) - port 0
|
||||
...
|
||||
47 = /dev/ttyCPM5 PPC CPM (SCC or SMC) - port 5
|
||||
50 = /dev/ttyIOC40 Altix serial card
|
||||
50 = /dev/ttyIOC0 Altix serial card
|
||||
...
|
||||
81 = /dev/ttyIOC431 Altix serial card
|
||||
82 = /dev/ttyVR0 NEC VR4100 series SIU
|
||||
83 = /dev/ttyVR1 NEC VR4100 series DSIU
|
||||
81 = /dev/ttyIOC31 Altix serial card
|
||||
82 = /dev/ttyVR0 NEC VR4100 series SIU
|
||||
83 = /dev/ttyVR1 NEC VR4100 series DSIU
|
||||
84 = /dev/ttyIOC84 Altix ioc4 serial card
|
||||
...
|
||||
115 = /dev/ttyIOC115 Altix ioc4 serial card
|
||||
116 = /dev/ttySIOC0 Altix ioc3 serial card
|
||||
...
|
||||
147 = /dev/ttySIOC31 Altix ioc3 serial card
|
||||
148 = /dev/ttyPSC0 PPC PSC - port 0
|
||||
...
|
||||
153 = /dev/ttyPSC5 PPC PSC - port 5
|
||||
154 = /dev/ttyAT0 ATMEL serial port 0
|
||||
...
|
||||
169 = /dev/ttyAT15 ATMEL serial port 15
|
||||
170 = /dev/ttyNX0 Hilscher netX serial port 0
|
||||
...
|
||||
185 = /dev/ttyNX15 Hilscher netX serial port 15
|
||||
186 = /dev/ttyJ0 JTAG1 DCC protocol based serial port emulation
|
||||
|
||||
205 char Low-density serial ports (alternate device)
|
||||
0 = /dev/culu0 Callout device for ttyLU0
|
||||
@ -2786,8 +2826,8 @@ Your cooperation is appreciated.
|
||||
50 = /dev/cuioc40 Callout device for ttyIOC40
|
||||
...
|
||||
81 = /dev/cuioc431 Callout device for ttyIOC431
|
||||
82 = /dev/cuvr0 Callout device for ttyVR0
|
||||
83 = /dev/cuvr1 Callout device for ttyVR1
|
||||
82 = /dev/cuvr0 Callout device for ttyVR0
|
||||
83 = /dev/cuvr1 Callout device for ttyVR1
|
||||
|
||||
|
||||
206 char OnStream SC-x0 tape devices
|
||||
@ -2897,7 +2937,6 @@ Your cooperation is appreciated.
|
||||
...
|
||||
196 = /dev/dvb/adapter3/video0 first video decoder of fourth card
|
||||
|
||||
|
||||
216 char Bluetooth RFCOMM TTY devices
|
||||
0 = /dev/rfcomm0 First Bluetooth RFCOMM TTY device
|
||||
1 = /dev/rfcomm1 Second Bluetooth RFCOMM TTY device
|
||||
@ -3002,12 +3041,43 @@ Your cooperation is appreciated.
|
||||
ioctl()'s can be used to rewind the tape regardless of
|
||||
the device used to access it.
|
||||
|
||||
231 char InfiniBand MAD
|
||||
231 char InfiniBand
|
||||
0 = /dev/infiniband/umad0
|
||||
1 = /dev/infiniband/umad1
|
||||
...
|
||||
...
|
||||
63 = /dev/infiniband/umad63 63rd InfiniBandMad device
|
||||
64 = /dev/infiniband/issm0 First InfiniBand IsSM device
|
||||
65 = /dev/infiniband/issm1 Second InfiniBand IsSM device
|
||||
...
|
||||
127 = /dev/infiniband/issm63 63rd InfiniBand IsSM device
|
||||
128 = /dev/infiniband/uverbs0 First InfiniBand verbs device
|
||||
129 = /dev/infiniband/uverbs1 Second InfiniBand verbs device
|
||||
...
|
||||
159 = /dev/infiniband/uverbs31 31st InfiniBand verbs device
|
||||
|
||||
232-239 UNASSIGNED
|
||||
232 char Biometric Devices
|
||||
0 = /dev/biometric/sensor0/fingerprint first fingerprint sensor on first device
|
||||
1 = /dev/biometric/sensor0/iris first iris sensor on first device
|
||||
2 = /dev/biometric/sensor0/retina first retina sensor on first device
|
||||
3 = /dev/biometric/sensor0/voiceprint first voiceprint sensor on first device
|
||||
4 = /dev/biometric/sensor0/facial first facial sensor on first device
|
||||
5 = /dev/biometric/sensor0/hand first hand sensor on first device
|
||||
...
|
||||
10 = /dev/biometric/sensor1/fingerprint first fingerprint sensor on second device
|
||||
...
|
||||
20 = /dev/biometric/sensor2/fingerprint first fingerprint sensor on third device
|
||||
...
|
||||
|
||||
233 char PathScale InfiniPath interconnect
|
||||
0 = /dev/ipath Primary device for programs (any unit)
|
||||
1 = /dev/ipath0 Access specifically to unit 0
|
||||
2 = /dev/ipath1 Access specifically to unit 1
|
||||
...
|
||||
4 = /dev/ipath3 Access specifically to unit 3
|
||||
129 = /dev/ipath_sma Device used by Subnet Management Agent
|
||||
130 = /dev/ipath_diag Device used by diagnostics programs
|
||||
|
||||
234-239 UNASSIGNED
|
||||
|
||||
240-254 char LOCAL/EXPERIMENTAL USE
|
||||
240-254 block LOCAL/EXPERIMENTAL USE
|
||||
@ -3021,6 +3091,28 @@ Your cooperation is appreciated.
|
||||
This major is reserved to assist the expansion to a
|
||||
larger number space. No device nodes with this major
|
||||
should ever be created on the filesystem.
|
||||
(This is probaly not true anymore, but I'll leave it
|
||||
for now /Torben)
|
||||
|
||||
---LARGE MAJORS!!!!!---
|
||||
|
||||
256 char Equinox SST multi-port serial boards
|
||||
0 = /dev/ttyEQ0 First serial port on first Equinox SST board
|
||||
127 = /dev/ttyEQ127 Last serial port on first Equinox SST board
|
||||
128 = /dev/ttyEQ128 First serial port on second Equinox SST board
|
||||
...
|
||||
1027 = /dev/ttyEQ1027 Last serial port on eighth Equinox SST board
|
||||
|
||||
256 block Resident Flash Disk Flash Translation Layer
|
||||
0 = /dev/rfda First RFD FTL layer
|
||||
16 = /dev/rfdb Second RFD FTL layer
|
||||
...
|
||||
240 = /dev/rfdp 16th RFD FTL layer
|
||||
|
||||
257 char Phoenix Technologies Cryptographic Services Driver
|
||||
0 = /dev/ptlsec Crypto Services Driver
|
||||
|
||||
|
||||
|
||||
**** ADDITIONAL /dev DIRECTORY ENTRIES
|
||||
|
||||
|
@ -2,7 +2,7 @@ NOTE: This driver is obsolete. Digi provides a 2.6 driver (dgdm) at
|
||||
http://www.digi.com for PCI cards. They no longer maintain this driver,
|
||||
and have no 2.6 driver for ISA cards.
|
||||
|
||||
This driver requires a number of user-space tools. They can be aquired from
|
||||
This driver requires a number of user-space tools. They can be acquired from
|
||||
http://www.digi.com, but only works with 2.4 kernels.
|
||||
|
||||
|
||||
|
@ -18,7 +18,7 @@ Traditional driver models implemented some sort of tree-like structure
|
||||
(sometimes just a list) for the devices they control. There wasn't any
|
||||
uniformity across the different bus types.
|
||||
|
||||
The current driver model provides a comon, uniform data model for describing
|
||||
The current driver model provides a common, uniform data model for describing
|
||||
a bus and the devices that can appear under the bus. The unified bus
|
||||
model includes a set of common attributes which all busses carry, and a set
|
||||
of common callbacks, such as device discovery during bus probing, bus
|
||||
|
@ -135,10 +135,10 @@ C. Boot options
|
||||
|
||||
The angle can be changed anytime afterwards by 'echoing' the same
|
||||
numbers to any one of the 2 attributes found in
|
||||
/sys/class/graphics/fb{x}
|
||||
/sys/class/graphics/fbcon
|
||||
|
||||
con_rotate - rotate the display of the active console
|
||||
con_rotate_all - rotate the display of all consoles
|
||||
rotate - rotate the display of the active console
|
||||
rotate_all - rotate the display of all consoles
|
||||
|
||||
Console rotation will only become available if Console Rotation
|
||||
Support is compiled in your kernel.
|
||||
@ -148,5 +148,177 @@ C. Boot options
|
||||
Actually, the underlying fb driver is totally ignorant of console
|
||||
rotation.
|
||||
|
||||
---
|
||||
C. Attaching, Detaching and Unloading
|
||||
|
||||
Before going on on how to attach, detach and unload the framebuffer console, an
|
||||
illustration of the dependencies may help.
|
||||
|
||||
The console layer, as with most subsystems, needs a driver that interfaces with
|
||||
the hardware. Thus, in a VGA console:
|
||||
|
||||
console ---> VGA driver ---> hardware.
|
||||
|
||||
Assuming the VGA driver can be unloaded, one must first unbind the VGA driver
|
||||
from the console layer before unloading the driver. The VGA driver cannot be
|
||||
unloaded if it is still bound to the console layer. (See
|
||||
Documentation/console/console.txt for more information).
|
||||
|
||||
This is more complicated in the case of the the framebuffer console (fbcon),
|
||||
because fbcon is an intermediate layer between the console and the drivers:
|
||||
|
||||
console ---> fbcon ---> fbdev drivers ---> hardware
|
||||
|
||||
The fbdev drivers cannot be unloaded if it's bound to fbcon, and fbcon cannot
|
||||
be unloaded if it's bound to the console layer.
|
||||
|
||||
So to unload the fbdev drivers, one must first unbind fbcon from the console,
|
||||
then unbind the fbdev drivers from fbcon. Fortunately, unbinding fbcon from
|
||||
the console layer will automatically unbind framebuffer drivers from
|
||||
fbcon. Thus, there is no need to explicitly unbind the fbdev drivers from
|
||||
fbcon.
|
||||
|
||||
So, how do we unbind fbcon from the console? Part of the answer is in
|
||||
Documentation/console/console.txt. To summarize:
|
||||
|
||||
Echo a value to the bind file that represents the framebuffer console
|
||||
driver. So assuming vtcon1 represents fbcon, then:
|
||||
|
||||
echo 1 > sys/class/vtconsole/vtcon1/bind - attach framebuffer console to
|
||||
console layer
|
||||
echo 0 > sys/class/vtconsole/vtcon1/bind - detach framebuffer console from
|
||||
console layer
|
||||
|
||||
If fbcon is detached from the console layer, your boot console driver (which is
|
||||
usually VGA text mode) will take over. A few drivers (rivafb and i810fb) will
|
||||
restore VGA text mode for you. With the rest, before detaching fbcon, you
|
||||
must take a few additional steps to make sure that your VGA text mode is
|
||||
restored properly. The following is one of the several methods that you can do:
|
||||
|
||||
1. Download or install vbetool. This utility is included with most
|
||||
distributions nowadays, and is usually part of the suspend/resume tool.
|
||||
|
||||
2. In your kernel configuration, ensure that CONFIG_FRAMEBUFFER_CONSOLE is set
|
||||
to 'y' or 'm'. Enable one or more of your favorite framebuffer drivers.
|
||||
|
||||
3. Boot into text mode and as root run:
|
||||
|
||||
vbetool vbestate save > <vga state file>
|
||||
|
||||
The above command saves the register contents of your graphics
|
||||
hardware to <vga state file>. You need to do this step only once as
|
||||
the state file can be reused.
|
||||
|
||||
4. If fbcon is compiled as a module, load fbcon by doing:
|
||||
|
||||
modprobe fbcon
|
||||
|
||||
5. Now to detach fbcon:
|
||||
|
||||
vbetool vbestate restore < <vga state file> && \
|
||||
echo 0 > /sys/class/vtconsole/vtcon1/bind
|
||||
|
||||
6. That's it, you're back to VGA mode. And if you compiled fbcon as a module,
|
||||
you can unload it by 'rmmod fbcon'
|
||||
|
||||
7. To reattach fbcon:
|
||||
|
||||
echo 1 > /sys/class/vtconsole/vtcon1/bind
|
||||
|
||||
8. Once fbcon is unbound, all drivers registered to the system will also
|
||||
become unbound. This means that fbcon and individual framebuffer drivers
|
||||
can be unloaded or reloaded at will. Reloading the drivers or fbcon will
|
||||
automatically bind the console, fbcon and the drivers together. Unloading
|
||||
all the drivers without unloading fbcon will make it impossible for the
|
||||
console to bind fbcon.
|
||||
|
||||
Notes for vesafb users:
|
||||
=======================
|
||||
|
||||
Unfortunately, if your bootline includes a vga=xxx parameter that sets the
|
||||
hardware in graphics mode, such as when loading vesafb, vgacon will not load.
|
||||
Instead, vgacon will replace the default boot console with dummycon, and you
|
||||
won't get any display after detaching fbcon. Your machine is still alive, so
|
||||
you can reattach vesafb. However, to reattach vesafb, you need to do one of
|
||||
the following:
|
||||
|
||||
Variation 1:
|
||||
|
||||
a. Before detaching fbcon, do
|
||||
|
||||
vbetool vbemode save > <vesa state file> # do once for each vesafb mode,
|
||||
# the file can be reused
|
||||
|
||||
b. Detach fbcon as in step 5.
|
||||
|
||||
c. Attach fbcon
|
||||
|
||||
vbetool vbestate restore < <vesa state file> && \
|
||||
echo 1 > /sys/class/vtconsole/vtcon1/bind
|
||||
|
||||
Variation 2:
|
||||
|
||||
a. Before detaching fbcon, do:
|
||||
echo <ID> > /sys/class/tty/console/bind
|
||||
|
||||
|
||||
vbetool vbemode get
|
||||
|
||||
b. Take note of the mode number
|
||||
|
||||
b. Detach fbcon as in step 5.
|
||||
|
||||
c. Attach fbcon:
|
||||
|
||||
vbetool vbemode set <mode number> && \
|
||||
echo 1 > /sys/class/vtconsole/vtcon1/bind
|
||||
|
||||
Samples:
|
||||
========
|
||||
|
||||
Here are 2 sample bash scripts that you can use to bind or unbind the
|
||||
framebuffer console driver if you are in an X86 box:
|
||||
|
||||
---------------------------------------------------------------------------
|
||||
#!/bin/bash
|
||||
# Unbind fbcon
|
||||
|
||||
# Change this to where your actual vgastate file is located
|
||||
# Or Use VGASTATE=$1 to indicate the state file at runtime
|
||||
VGASTATE=/tmp/vgastate
|
||||
|
||||
# path to vbetool
|
||||
VBETOOL=/usr/local/bin
|
||||
|
||||
|
||||
for (( i = 0; i < 16; i++))
|
||||
do
|
||||
if test -x /sys/class/vtconsole/vtcon$i; then
|
||||
if [ `cat /sys/class/vtconsole/vtcon$i/name | grep -c "frame buffer"` \
|
||||
= 1 ]; then
|
||||
if test -x $VBETOOL/vbetool; then
|
||||
echo Unbinding vtcon$i
|
||||
$VBETOOL/vbetool vbestate restore < $VGASTATE
|
||||
echo 0 > /sys/class/vtconsole/vtcon$i/bind
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
done
|
||||
|
||||
---------------------------------------------------------------------------
|
||||
#!/bin/bash
|
||||
# Bind fbcon
|
||||
|
||||
for (( i = 0; i < 16; i++))
|
||||
do
|
||||
if test -x /sys/class/vtconsole/vtcon$i; then
|
||||
if [ `cat /sys/class/vtconsole/vtcon$i/name | grep -c "frame buffer"` \
|
||||
= 1 ]; then
|
||||
echo Unbinding vtcon$i
|
||||
echo 1 > /sys/class/vtconsole/vtcon$i/bind
|
||||
fi
|
||||
fi
|
||||
done
|
||||
---------------------------------------------------------------------------
|
||||
|
||||
--
|
||||
Antonino Daplas <adaplas@pol.net>
|
||||
|
@ -6,17 +6,6 @@ be removed from this file.
|
||||
|
||||
---------------------------
|
||||
|
||||
What: devfs
|
||||
When: July 2005
|
||||
Files: fs/devfs/*, include/linux/devfs_fs*.h and assorted devfs
|
||||
function calls throughout the kernel tree
|
||||
Why: It has been unmaintained for a number of years, has unfixable
|
||||
races, contains a naming policy within the kernel that is
|
||||
against the LSB, and can be replaced by using udev.
|
||||
Who: Greg Kroah-Hartman <greg@kroah.com>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: RAW driver (CONFIG_RAW_DRIVER)
|
||||
When: December 2005
|
||||
Why: declared obsolete since kernel 2.6.3
|
||||
@ -33,27 +22,12 @@ Who: Adrian Bunk <bunk@stusta.de>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: RCU API moves to EXPORT_SYMBOL_GPL
|
||||
When: April 2006
|
||||
Files: include/linux/rcupdate.h, kernel/rcupdate.c
|
||||
Why: Outside of Linux, the only implementations of anything even
|
||||
vaguely resembling RCU that I am aware of are in DYNIX/ptx,
|
||||
VM/XA, Tornado, and K42. I do not expect anyone to port binary
|
||||
drivers or kernel modules from any of these, since the first two
|
||||
are owned by IBM and the last two are open-source research OSes.
|
||||
So these will move to GPL after a grace period to allow
|
||||
people, who might be using implementations that I am not aware
|
||||
of, to adjust to this upcoming change.
|
||||
Who: Paul E. McKenney <paulmck@us.ibm.com>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: raw1394: requests of type RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN
|
||||
When: November 2005
|
||||
When: November 2006
|
||||
Why: Deprecated in favour of the new ioctl-based rawiso interface, which is
|
||||
more efficient. You should really be using libraw1394 for raw1394
|
||||
access anyway.
|
||||
Who: Jody McIntyre <scjody@steamballoon.com>
|
||||
Who: Jody McIntyre <scjody@modernduck.com>
|
||||
|
||||
---------------------------
|
||||
|
||||
@ -147,16 +121,6 @@ Who: NeilBrown <neilb@suse.de>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: au1x00_uart driver
|
||||
When: January 2006
|
||||
Why: The 8250 serial driver now has the ability to deal with the differences
|
||||
between the standard 8250 family of UARTs and their slightly strange
|
||||
brother on Alchemy SOCs. The loss of features is not considered an
|
||||
issue.
|
||||
Who: Ralf Baechle <ralf@linux-mips.org>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: eepro100 network driver
|
||||
When: January 2007
|
||||
Why: replaced by the e100 driver
|
||||
@ -192,6 +156,16 @@ Who: Jean Delvare <khali@linux-fr.org>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: Unused EXPORT_SYMBOL/EXPORT_SYMBOL_GPL exports
|
||||
(temporary transition config option provided until then)
|
||||
The transition config option will also be removed at the same time.
|
||||
When: before 2.6.19
|
||||
Why: Unused symbols are both increasing the size of the kernel binary
|
||||
and are often a sign of "wrong API"
|
||||
Who: Arjan van de Ven <arjan@linux.intel.com>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: remove EXPORT_SYMBOL(tasklist_lock)
|
||||
When: August 2006
|
||||
Files: kernel/fork.c
|
||||
@ -239,3 +213,56 @@ Why: The interface no longer has any callers left in the kernel. It
|
||||
Who: Nick Piggin <npiggin@suse.de>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: Support for the MIPS EV96100 evaluation board
|
||||
When: September 2006
|
||||
Why: Does no longer build since at least November 15, 2003, apparently
|
||||
no userbase left.
|
||||
Who: Ralf Baechle <ralf@linux-mips.org>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: Support for the Momentum / PMC-Sierra Jaguar ATX evaluation board
|
||||
When: September 2006
|
||||
Why: Does no longer build since quite some time, and was never popular,
|
||||
due to the platform being replaced by successor models. Apparently
|
||||
no user base left. It also is one of the last users of
|
||||
WANT_PAGE_VIRTUAL.
|
||||
Who: Ralf Baechle <ralf@linux-mips.org>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: Support for the Momentum Ocelot, Ocelot 3, Ocelot C and Ocelot G
|
||||
When: September 2006
|
||||
Why: Some do no longer build and apparently there is no user base left
|
||||
for these platforms.
|
||||
Who: Ralf Baechle <ralf@linux-mips.org>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: Support for MIPS Technologies' Altas and SEAD evaluation board
|
||||
When: September 2006
|
||||
Why: Some do no longer build and apparently there is no user base left
|
||||
for these platforms. Hardware out of production since several years.
|
||||
Who: Ralf Baechle <ralf@linux-mips.org>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: Support for the IT8172-based platforms, ITE 8172G and Globespan IVR
|
||||
When: September 2006
|
||||
Why: Code does no longer build since at least 2.6.0, apparently there is
|
||||
no user base left for these platforms. Hardware out of production
|
||||
since several years and hardly a trace of the manufacturer left on
|
||||
the net.
|
||||
Who: Ralf Baechle <ralf@linux-mips.org>
|
||||
|
||||
---------------------------
|
||||
|
||||
What: Interrupt only SA_* flags
|
||||
When: Januar 2007
|
||||
Why: The interrupt related SA_* flags are replaced by IRQF_* to move them
|
||||
out of the signal namespace.
|
||||
|
||||
Who: Thomas Gleixner <tglx@linutronix.de>
|
||||
|
||||
---------------------------
|
||||
|
@ -99,7 +99,7 @@ prototypes:
|
||||
int (*sync_fs)(struct super_block *sb, int wait);
|
||||
void (*write_super_lockfs) (struct super_block *);
|
||||
void (*unlockfs) (struct super_block *);
|
||||
int (*statfs) (struct super_block *, struct kstatfs *);
|
||||
int (*statfs) (struct dentry *, struct kstatfs *);
|
||||
int (*remount_fs) (struct super_block *, int *, char *);
|
||||
void (*clear_inode) (struct inode *);
|
||||
void (*umount_begin) (struct super_block *);
|
||||
@ -142,15 +142,16 @@ see also dquot_operations section.
|
||||
|
||||
--------------------------- file_system_type ---------------------------
|
||||
prototypes:
|
||||
struct super_block *(*get_sb) (struct file_system_type *, int,
|
||||
const char *, void *);
|
||||
struct int (*get_sb) (struct file_system_type *, int,
|
||||
const char *, void *, struct vfsmount *);
|
||||
void (*kill_sb) (struct super_block *);
|
||||
locking rules:
|
||||
may block BKL
|
||||
get_sb yes yes
|
||||
kill_sb yes yes
|
||||
|
||||
->get_sb() returns error or a locked superblock (exclusive on ->s_umount).
|
||||
->get_sb() returns error or 0 with locked superblock attached to the vfsmount
|
||||
(exclusive on ->s_umount).
|
||||
->kill_sb() takes a write-locked superblock, does all shutdown work on it,
|
||||
unlocks and drops the reference.
|
||||
|
||||
|
@ -19,7 +19,7 @@ following procedure:
|
||||
|
||||
(2) Have the follow_link() op do the following steps:
|
||||
|
||||
(a) Call do_kern_mount() to call the appropriate filesystem to set up a
|
||||
(a) Call vfs_kern_mount() to call the appropriate filesystem to set up a
|
||||
superblock and gain a vfsmount structure representing it.
|
||||
|
||||
(b) Copy the nameidata provided as an argument and substitute the dentry
|
||||
|
@ -264,6 +264,15 @@ static struct config_item_type simple_child_type = {
|
||||
};
|
||||
|
||||
|
||||
struct simple_children {
|
||||
struct config_group group;
|
||||
};
|
||||
|
||||
static inline struct simple_children *to_simple_children(struct config_item *item)
|
||||
{
|
||||
return item ? container_of(to_config_group(item), struct simple_children, group) : NULL;
|
||||
}
|
||||
|
||||
static struct config_item *simple_children_make_item(struct config_group *group, const char *name)
|
||||
{
|
||||
struct simple_child *simple_child;
|
||||
@ -304,7 +313,13 @@ static ssize_t simple_children_attr_show(struct config_item *item,
|
||||
"items have only one attribute that is readable and writeable.\n");
|
||||
}
|
||||
|
||||
static void simple_children_release(struct config_item *item)
|
||||
{
|
||||
kfree(to_simple_children(item));
|
||||
}
|
||||
|
||||
static struct configfs_item_operations simple_children_item_ops = {
|
||||
.release = simple_children_release,
|
||||
.show_attribute = simple_children_attr_show,
|
||||
};
|
||||
|
||||
@ -345,10 +360,6 @@ static struct configfs_subsystem simple_children_subsys = {
|
||||
* children of its own.
|
||||
*/
|
||||
|
||||
struct simple_children {
|
||||
struct config_group group;
|
||||
};
|
||||
|
||||
static struct config_group *group_children_make_group(struct config_group *group, const char *name)
|
||||
{
|
||||
struct simple_children *simple_children;
|
||||
|
File diff suppressed because it is too large
Load Diff
File diff suppressed because it is too large
Load Diff
@ -1,40 +0,0 @@
|
||||
Device File System (devfs) ToDo List
|
||||
|
||||
Richard Gooch <rgooch@atnf.csiro.au>
|
||||
|
||||
3-JUL-2000
|
||||
|
||||
This is a list of things to be done for better devfs support in the
|
||||
Linux kernel. If you'd like to contribute to the devfs, please have a
|
||||
look at this list for anything that is unallocated. Also, if there are
|
||||
items missing (surely), please contact me so I can add them to the
|
||||
list (preferably with your name attached to them:-).
|
||||
|
||||
|
||||
- >256 ptys
|
||||
Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
|
||||
|
||||
- Amiga floppy driver (drivers/block/amiflop.c)
|
||||
|
||||
- Atari floppy driver (drivers/block/ataflop.c)
|
||||
|
||||
- SWIM3 (Super Woz Integrated Machine 3) floppy driver (drivers/block/swim3.c)
|
||||
|
||||
- Amiga ZorroII ramdisc driver (drivers/block/z2ram.c)
|
||||
|
||||
- Parallel port ATAPI CD-ROM (drivers/block/paride/pcd.c)
|
||||
|
||||
- Parallel port ATAPI floppy (drivers/block/paride/pf.c)
|
||||
|
||||
- AP1000 block driver (drivers/ap1000/ap.c, drivers/ap1000/ddv.c)
|
||||
|
||||
- Archimedes floppy (drivers/acorn/block/fd1772.c)
|
||||
|
||||
- MFM hard drive (drivers/acorn/block/mfmhd.c)
|
||||
|
||||
- I2O block device (drivers/message/i2o/i2o_block.c)
|
||||
|
||||
- ST-RAM device (arch/m68k/atari/stram.c)
|
||||
|
||||
- Raw devices
|
||||
|
@ -1,65 +0,0 @@
|
||||
/* -*- auto-fill -*- */
|
||||
|
||||
Device File System (devfs) Boot Options
|
||||
|
||||
Richard Gooch <rgooch@atnf.csiro.au>
|
||||
|
||||
18-AUG-2001
|
||||
|
||||
|
||||
When CONFIG_DEVFS_DEBUG is enabled, you can pass several boot options
|
||||
to the kernel to debug devfs. The boot options are prefixed by
|
||||
"devfs=", and are separated by commas. Spaces are not allowed. The
|
||||
syntax looks like this:
|
||||
|
||||
devfs=<option1>,<option2>,<option3>
|
||||
|
||||
and so on. For example, if you wanted to turn on debugging for module
|
||||
load requests and device registration, you would do:
|
||||
|
||||
devfs=dmod,dreg
|
||||
|
||||
You may prefix "no" to any option. This will invert the option.
|
||||
|
||||
|
||||
Debugging Options
|
||||
=================
|
||||
|
||||
These requires CONFIG_DEVFS_DEBUG to be enabled.
|
||||
Note that all debugging options have 'd' as the first character. By
|
||||
default all options are off. All debugging output is sent to the
|
||||
kernel logs. The debugging options do not take effect until the devfs
|
||||
version message appears (just prior to the root filesystem being
|
||||
mounted).
|
||||
|
||||
These are the options:
|
||||
|
||||
dmod print module load requests to <request_module>
|
||||
|
||||
dreg print device register requests to <devfs_register>
|
||||
|
||||
dunreg print device unregister requests to <devfs_unregister>
|
||||
|
||||
dchange print device change requests to <devfs_set_flags>
|
||||
|
||||
dilookup print inode lookup requests
|
||||
|
||||
diget print VFS inode allocations
|
||||
|
||||
diunlink print inode unlinks
|
||||
|
||||
dichange print inode changes
|
||||
|
||||
dimknod print calls to mknod(2)
|
||||
|
||||
dall some debugging turned on
|
||||
|
||||
|
||||
Other Options
|
||||
=============
|
||||
|
||||
These control the default behaviour of devfs. The options are:
|
||||
|
||||
mount mount devfs onto /dev at boot time
|
||||
|
||||
only disable non-devfs device nodes for devfs-capable drivers
|
@ -113,6 +113,14 @@ noquota
|
||||
grpquota
|
||||
usrquota
|
||||
|
||||
bh (*) ext3 associates buffer heads to data pages to
|
||||
nobh (a) cache disk block mapping information
|
||||
(b) link pages into transaction to provide
|
||||
ordering guarantees.
|
||||
"bh" option forces use of buffer heads.
|
||||
"nobh" option tries to avoid associating buffer
|
||||
heads (supported only for "writeback" mode).
|
||||
|
||||
|
||||
Specification
|
||||
=============
|
||||
|
@ -18,6 +18,14 @@ Non-privileged mount (or user mount):
|
||||
user. NOTE: this is not the same as mounts allowed with the "user"
|
||||
option in /etc/fstab, which is not discussed here.
|
||||
|
||||
Filesystem connection:
|
||||
|
||||
A connection between the filesystem daemon and the kernel. The
|
||||
connection exists until either the daemon dies, or the filesystem is
|
||||
umounted. Note that detaching (or lazy umounting) the filesystem
|
||||
does _not_ break the connection, in this case it will exist until
|
||||
the last reference to the filesystem is released.
|
||||
|
||||
Mount owner:
|
||||
|
||||
The user who does the mounting.
|
||||
@ -86,16 +94,20 @@ Mount options
|
||||
The default is infinite. Note that the size of read requests is
|
||||
limited anyway to 32 pages (which is 128kbyte on i386).
|
||||
|
||||
Sysfs
|
||||
~~~~~
|
||||
Control filesystem
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
FUSE sets up the following hierarchy in sysfs:
|
||||
There's a control filesystem for FUSE, which can be mounted by:
|
||||
|
||||
/sys/fs/fuse/connections/N/
|
||||
mount -t fusectl none /sys/fs/fuse/connections
|
||||
|
||||
where N is an increasing number allocated to each new connection.
|
||||
Mounting it under the '/sys/fs/fuse/connections' directory makes it
|
||||
backwards compatible with earlier versions.
|
||||
|
||||
For each connection the following attributes are defined:
|
||||
Under the fuse control filesystem each connection has a directory
|
||||
named by a unique number.
|
||||
|
||||
For each connection the following files exist within this directory:
|
||||
|
||||
'waiting'
|
||||
|
||||
@ -110,7 +122,47 @@ For each connection the following attributes are defined:
|
||||
connection. This means that all waiting requests will be aborted an
|
||||
error returned for all aborted and new requests.
|
||||
|
||||
Only a privileged user may read or write these attributes.
|
||||
Only the owner of the mount may read or write these files.
|
||||
|
||||
Interrupting filesystem operations
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
If a process issuing a FUSE filesystem request is interrupted, the
|
||||
following will happen:
|
||||
|
||||
1) If the request is not yet sent to userspace AND the signal is
|
||||
fatal (SIGKILL or unhandled fatal signal), then the request is
|
||||
dequeued and returns immediately.
|
||||
|
||||
2) If the request is not yet sent to userspace AND the signal is not
|
||||
fatal, then an 'interrupted' flag is set for the request. When
|
||||
the request has been successfully transfered to userspace and
|
||||
this flag is set, an INTERRUPT request is queued.
|
||||
|
||||
3) If the request is already sent to userspace, then an INTERRUPT
|
||||
request is queued.
|
||||
|
||||
INTERRUPT requests take precedence over other requests, so the
|
||||
userspace filesystem will receive queued INTERRUPTs before any others.
|
||||
|
||||
The userspace filesystem may ignore the INTERRUPT requests entirely,
|
||||
or may honor them by sending a reply to the _original_ request, with
|
||||
the error set to EINTR.
|
||||
|
||||
It is also possible that there's a race between processing the
|
||||
original request and it's INTERRUPT request. There are two possibilities:
|
||||
|
||||
1) The INTERRUPT request is processed before the original request is
|
||||
processed
|
||||
|
||||
2) The INTERRUPT request is processed after the original request has
|
||||
been answered
|
||||
|
||||
If the filesystem cannot find the original request, it should wait for
|
||||
some timeout and/or a number of new requests to arrive, after which it
|
||||
should reply to the INTERRUPT request with an EAGAIN error. In case
|
||||
1) the INTERRUPT request will be requeued. In case 2) the INTERRUPT
|
||||
reply will be ignored.
|
||||
|
||||
Aborting a filesystem connection
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
@ -139,8 +191,8 @@ the filesystem. There are several ways to do this:
|
||||
- Use forced umount (umount -f). Works in all cases but only if
|
||||
filesystem is still attached (it hasn't been lazy unmounted)
|
||||
|
||||
- Abort filesystem through the sysfs interface. Most powerful
|
||||
method, always works.
|
||||
- Abort filesystem through the FUSE control filesystem. Most
|
||||
powerful method, always works.
|
||||
|
||||
How do non-privileged mounts work?
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
@ -304,25 +356,7 @@ Scenario 1 - Simple deadlock
|
||||
| | for "file"]
|
||||
| | *DEADLOCK*
|
||||
|
||||
The solution for this is to allow requests to be interrupted while
|
||||
they are in userspace:
|
||||
|
||||
| [interrupted by signal] |
|
||||
| <fuse_unlink() |
|
||||
| [release semaphore] | [semaphore acquired]
|
||||
| <sys_unlink() |
|
||||
| | >fuse_unlink()
|
||||
| | [queue req on fc->pending]
|
||||
| | [wake up fc->waitq]
|
||||
| | [sleep on req->waitq]
|
||||
|
||||
If the filesystem daemon was single threaded, this will stop here,
|
||||
since there's no other thread to dequeue and execute the request.
|
||||
In this case the solution is to kill the FUSE daemon as well. If
|
||||
there are multiple serving threads, you just have to kill them as
|
||||
long as any remain.
|
||||
|
||||
Moral: a filesystem which deadlocks, can soon find itself dead.
|
||||
The solution for this is to allow the filesystem to be aborted.
|
||||
|
||||
Scenario 2 - Tricky deadlock
|
||||
----------------------------
|
||||
@ -355,24 +389,14 @@ but is caused by a pagefault.
|
||||
| | [lock page]
|
||||
| | * DEADLOCK *
|
||||
|
||||
Solution is again to let the the request be interrupted (not
|
||||
elaborated further).
|
||||
Solution is basically the same as above.
|
||||
|
||||
An additional problem is that while the write buffer is being
|
||||
copied to the request, the request must not be interrupted. This
|
||||
is because the destination address of the copy may not be valid
|
||||
after the request is interrupted.
|
||||
An additional problem is that while the write buffer is being copied
|
||||
to the request, the request must not be interrupted/aborted. This is
|
||||
because the destination address of the copy may not be valid after the
|
||||
request has returned.
|
||||
|
||||
This is solved with doing the copy atomically, and allowing
|
||||
interruption while the page(s) belonging to the write buffer are
|
||||
faulted with get_user_pages(). The 'req->locked' flag indicates
|
||||
when the copy is taking place, and interruption is delayed until
|
||||
this flag is unset.
|
||||
|
||||
Scenario 3 - Tricky deadlock with asynchronous read
|
||||
---------------------------------------------------
|
||||
|
||||
The same situation as above, except thread-1 will wait on page lock
|
||||
and hence it will be uninterruptible as well. The solution is to
|
||||
abort the connection with forced umount (if mount is attached) or
|
||||
through the abort attribute in sysfs.
|
||||
This is solved with doing the copy atomically, and allowing abort
|
||||
while the page(s) belonging to the write buffer are faulted with
|
||||
get_user_pages(). The 'req->locked' flag indicates when the copy is
|
||||
taking place, and abort is delayed until this flag is unset.
|
||||
|
@ -69,17 +69,135 @@ Prototypes:
|
||||
int inotify_rm_watch (int fd, __u32 mask);
|
||||
|
||||
|
||||
(iii) Internal Kernel Implementation
|
||||
(iii) Kernel Interface
|
||||
|
||||
Each inotify instance is associated with an inotify_device structure.
|
||||
Inotify's kernel API consists a set of functions for managing watches and an
|
||||
event callback.
|
||||
|
||||
To use the kernel API, you must first initialize an inotify instance with a set
|
||||
of inotify_operations. You are given an opaque inotify_handle, which you use
|
||||
for any further calls to inotify.
|
||||
|
||||
struct inotify_handle *ih = inotify_init(my_event_handler);
|
||||
|
||||
You must provide a function for processing events and a function for destroying
|
||||
the inotify watch.
|
||||
|
||||
void handle_event(struct inotify_watch *watch, u32 wd, u32 mask,
|
||||
u32 cookie, const char *name, struct inode *inode)
|
||||
|
||||
watch - the pointer to the inotify_watch that triggered this call
|
||||
wd - the watch descriptor
|
||||
mask - describes the event that occurred
|
||||
cookie - an identifier for synchronizing events
|
||||
name - the dentry name for affected files in a directory-based event
|
||||
inode - the affected inode in a directory-based event
|
||||
|
||||
void destroy_watch(struct inotify_watch *watch)
|
||||
|
||||
You may add watches by providing a pre-allocated and initialized inotify_watch
|
||||
structure and specifying the inode to watch along with an inotify event mask.
|
||||
You must pin the inode during the call. You will likely wish to embed the
|
||||
inotify_watch structure in a structure of your own which contains other
|
||||
information about the watch. Once you add an inotify watch, it is immediately
|
||||
subject to removal depending on filesystem events. You must grab a reference if
|
||||
you depend on the watch hanging around after the call.
|
||||
|
||||
inotify_init_watch(&my_watch->iwatch);
|
||||
inotify_get_watch(&my_watch->iwatch); // optional
|
||||
s32 wd = inotify_add_watch(ih, &my_watch->iwatch, inode, mask);
|
||||
inotify_put_watch(&my_watch->iwatch); // optional
|
||||
|
||||
You may use the watch descriptor (wd) or the address of the inotify_watch for
|
||||
other inotify operations. You must not directly read or manipulate data in the
|
||||
inotify_watch. Additionally, you must not call inotify_add_watch() more than
|
||||
once for a given inotify_watch structure, unless you have first called either
|
||||
inotify_rm_watch() or inotify_rm_wd().
|
||||
|
||||
To determine if you have already registered a watch for a given inode, you may
|
||||
call inotify_find_watch(), which gives you both the wd and the watch pointer for
|
||||
the inotify_watch, or an error if the watch does not exist.
|
||||
|
||||
wd = inotify_find_watch(ih, inode, &watchp);
|
||||
|
||||
You may use container_of() on the watch pointer to access your own data
|
||||
associated with a given watch. When an existing watch is found,
|
||||
inotify_find_watch() bumps the refcount before releasing its locks. You must
|
||||
put that reference with:
|
||||
|
||||
put_inotify_watch(watchp);
|
||||
|
||||
Call inotify_find_update_watch() to update the event mask for an existing watch.
|
||||
inotify_find_update_watch() returns the wd of the updated watch, or an error if
|
||||
the watch does not exist.
|
||||
|
||||
wd = inotify_find_update_watch(ih, inode, mask);
|
||||
|
||||
An existing watch may be removed by calling either inotify_rm_watch() or
|
||||
inotify_rm_wd().
|
||||
|
||||
int ret = inotify_rm_watch(ih, &my_watch->iwatch);
|
||||
int ret = inotify_rm_wd(ih, wd);
|
||||
|
||||
A watch may be removed while executing your event handler with the following:
|
||||
|
||||
inotify_remove_watch_locked(ih, iwatch);
|
||||
|
||||
Call inotify_destroy() to remove all watches from your inotify instance and
|
||||
release it. If there are no outstanding references, inotify_destroy() will call
|
||||
your destroy_watch op for each watch.
|
||||
|
||||
inotify_destroy(ih);
|
||||
|
||||
When inotify removes a watch, it sends an IN_IGNORED event to your callback.
|
||||
You may use this event as an indication to free the watch memory. Note that
|
||||
inotify may remove a watch due to filesystem events, as well as by your request.
|
||||
If you use IN_ONESHOT, inotify will remove the watch after the first event, at
|
||||
which point you may call the final inotify_put_watch.
|
||||
|
||||
(iv) Kernel Interface Prototypes
|
||||
|
||||
struct inotify_handle *inotify_init(struct inotify_operations *ops);
|
||||
|
||||
inotify_init_watch(struct inotify_watch *watch);
|
||||
|
||||
s32 inotify_add_watch(struct inotify_handle *ih,
|
||||
struct inotify_watch *watch,
|
||||
struct inode *inode, u32 mask);
|
||||
|
||||
s32 inotify_find_watch(struct inotify_handle *ih, struct inode *inode,
|
||||
struct inotify_watch **watchp);
|
||||
|
||||
s32 inotify_find_update_watch(struct inotify_handle *ih,
|
||||
struct inode *inode, u32 mask);
|
||||
|
||||
int inotify_rm_wd(struct inotify_handle *ih, u32 wd);
|
||||
|
||||
int inotify_rm_watch(struct inotify_handle *ih,
|
||||
struct inotify_watch *watch);
|
||||
|
||||
void inotify_remove_watch_locked(struct inotify_handle *ih,
|
||||
struct inotify_watch *watch);
|
||||
|
||||
void inotify_destroy(struct inotify_handle *ih);
|
||||
|
||||
void get_inotify_watch(struct inotify_watch *watch);
|
||||
void put_inotify_watch(struct inotify_watch *watch);
|
||||
|
||||
|
||||
(v) Internal Kernel Implementation
|
||||
|
||||
Each inotify instance is represented by an inotify_handle structure.
|
||||
Inotify's userspace consumers also have an inotify_device which is
|
||||
associated with the inotify_handle, and on which events are queued.
|
||||
|
||||
Each watch is associated with an inotify_watch structure. Watches are chained
|
||||
off of each associated device and each associated inode.
|
||||
off of each associated inotify_handle and each associated inode.
|
||||
|
||||
See fs/inotify.c for the locking and lifetime rules.
|
||||
See fs/inotify.c and fs/inotify_user.c for the locking and lifetime rules.
|
||||
|
||||
|
||||
(iv) Rationale
|
||||
(vi) Rationale
|
||||
|
||||
Q: What is the design decision behind not tying the watch to the open fd of
|
||||
the watched object?
|
||||
@ -145,7 +263,7 @@ A: The poor user-space interface is the second biggest problem with dnotify.
|
||||
file descriptor-based one that allows basic file I/O and poll/select.
|
||||
Obtaining the fd and managing the watches could have been done either via a
|
||||
device file or a family of new system calls. We decided to implement a
|
||||
family of system calls because that is the preffered approach for new kernel
|
||||
family of system calls because that is the preferred approach for new kernel
|
||||
interfaces. The only real difference was whether we wanted to use open(2)
|
||||
and ioctl(2) or a couple of new system calls. System calls beat ioctls.
|
||||
|
||||
|
@ -50,10 +50,11 @@ Turn your foo_read_super() into a function that would return 0 in case of
|
||||
success and negative number in case of error (-EINVAL unless you have more
|
||||
informative error value to report). Call it foo_fill_super(). Now declare
|
||||
|
||||
struct super_block foo_get_sb(struct file_system_type *fs_type,
|
||||
int flags, const char *dev_name, void *data)
|
||||
int foo_get_sb(struct file_system_type *fs_type,
|
||||
int flags, const char *dev_name, void *data, struct vfsmount *mnt)
|
||||
{
|
||||
return get_sb_bdev(fs_type, flags, dev_name, data, ext2_fill_super);
|
||||
return get_sb_bdev(fs_type, flags, dev_name, data, foo_fill_super,
|
||||
mnt);
|
||||
}
|
||||
|
||||
(or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of
|
||||
|
@ -70,11 +70,13 @@ tmpfs mounts. See Documentation/filesystems/tmpfs.txt for more information.
|
||||
What is rootfs?
|
||||
---------------
|
||||
|
||||
Rootfs is a special instance of ramfs, which is always present in 2.6 systems.
|
||||
(It's used internally as the starting and stopping point for searches of the
|
||||
kernel's doubly-linked list of mount points.)
|
||||
Rootfs is a special instance of ramfs (or tmpfs, if that's enabled), which is
|
||||
always present in 2.6 systems. You can't unmount rootfs for approximately the
|
||||
same reason you can't kill the init process; rather than having special code
|
||||
to check for and handle an empty list, it's smaller and simpler for the kernel
|
||||
to just make sure certain lists can't become empty.
|
||||
|
||||
Most systems just mount another filesystem over it and ignore it. The
|
||||
Most systems just mount another filesystem over rootfs and ignore it. The
|
||||
amount of space an empty instance of ramfs takes up is tiny.
|
||||
|
||||
What is initramfs?
|
||||
@ -92,14 +94,16 @@ out of that.
|
||||
|
||||
All this differs from the old initrd in several ways:
|
||||
|
||||
- The old initrd was a separate file, while the initramfs archive is linked
|
||||
into the linux kernel image. (The directory linux-*/usr is devoted to
|
||||
generating this archive during the build.)
|
||||
- The old initrd was always a separate file, while the initramfs archive is
|
||||
linked into the linux kernel image. (The directory linux-*/usr is devoted
|
||||
to generating this archive during the build.)
|
||||
|
||||
- The old initrd file was a gzipped filesystem image (in some file format,
|
||||
such as ext2, that had to be built into the kernel), while the new
|
||||
such as ext2, that needed a driver built into the kernel), while the new
|
||||
initramfs archive is a gzipped cpio archive (like tar only simpler,
|
||||
see cpio(1) and Documentation/early-userspace/buffer-format.txt).
|
||||
see cpio(1) and Documentation/early-userspace/buffer-format.txt). The
|
||||
kernel's cpio extraction code is not only extremely small, it's also
|
||||
__init data that can be discarded during the boot process.
|
||||
|
||||
- The program run by the old initrd (which was called /initrd, not /init) did
|
||||
some setup and then returned to the kernel, while the init program from
|
||||
@ -124,13 +128,14 @@ Populating initramfs:
|
||||
|
||||
The 2.6 kernel build process always creates a gzipped cpio format initramfs
|
||||
archive and links it into the resulting kernel binary. By default, this
|
||||
archive is empty (consuming 134 bytes on x86). The config option
|
||||
CONFIG_INITRAMFS_SOURCE (for some reason buried under devices->block devices
|
||||
in menuconfig, and living in usr/Kconfig) can be used to specify a source for
|
||||
the initramfs archive, which will automatically be incorporated into the
|
||||
resulting binary. This option can point to an existing gzipped cpio archive, a
|
||||
directory containing files to be archived, or a text file specification such
|
||||
as the following example:
|
||||
archive is empty (consuming 134 bytes on x86).
|
||||
|
||||
The config option CONFIG_INITRAMFS_SOURCE (for some reason buried under
|
||||
devices->block devices in menuconfig, and living in usr/Kconfig) can be used
|
||||
to specify a source for the initramfs archive, which will automatically be
|
||||
incorporated into the resulting binary. This option can point to an existing
|
||||
gzipped cpio archive, a directory containing files to be archived, or a text
|
||||
file specification such as the following example:
|
||||
|
||||
dir /dev 755 0 0
|
||||
nod /dev/console 644 0 0 c 5 1
|
||||
@ -146,23 +151,84 @@ as the following example:
|
||||
Run "usr/gen_init_cpio" (after the kernel build) to get a usage message
|
||||
documenting the above file format.
|
||||
|
||||
One advantage of the text file is that root access is not required to
|
||||
One advantage of the configuration file is that root access is not required to
|
||||
set permissions or create device nodes in the new archive. (Note that those
|
||||
two example "file" entries expect to find files named "init.sh" and "busybox" in
|
||||
a directory called "initramfs", under the linux-2.6.* directory. See
|
||||
Documentation/early-userspace/README for more details.)
|
||||
|
||||
The kernel does not depend on external cpio tools, gen_init_cpio is created
|
||||
from usr/gen_init_cpio.c which is entirely self-contained, and the kernel's
|
||||
boot-time extractor is also (obviously) self-contained. However, if you _do_
|
||||
happen to have cpio installed, the following command line can extract the
|
||||
generated cpio image back into its component files:
|
||||
The kernel does not depend on external cpio tools. If you specify a
|
||||
directory instead of a configuration file, the kernel's build infrastructure
|
||||
creates a configuration file from that directory (usr/Makefile calls
|
||||
scripts/gen_initramfs_list.sh), and proceeds to package up that directory
|
||||
using the config file (by feeding it to usr/gen_init_cpio, which is created
|
||||
from usr/gen_init_cpio.c). The kernel's build-time cpio creation code is
|
||||
entirely self-contained, and the kernel's boot-time extractor is also
|
||||
(obviously) self-contained.
|
||||
|
||||
The one thing you might need external cpio utilities installed for is creating
|
||||
or extracting your own preprepared cpio files to feed to the kernel build
|
||||
(instead of a config file or directory).
|
||||
|
||||
The following command line can extract a cpio image (either by the above script
|
||||
or by the kernel build) back into its component files:
|
||||
|
||||
cpio -i -d -H newc -F initramfs_data.cpio --no-absolute-filenames
|
||||
|
||||
The following shell script can create a prebuilt cpio archive you can
|
||||
use in place of the above config file:
|
||||
|
||||
#!/bin/sh
|
||||
|
||||
# Copyright 2006 Rob Landley <rob@landley.net> and TimeSys Corporation.
|
||||
# Licensed under GPL version 2
|
||||
|
||||
if [ $# -ne 2 ]
|
||||
then
|
||||
echo "usage: mkinitramfs directory imagename.cpio.gz"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if [ -d "$1" ]
|
||||
then
|
||||
echo "creating $2 from $1"
|
||||
(cd "$1"; find . | cpio -o -H newc | gzip) > "$2"
|
||||
else
|
||||
echo "First argument must be a directory"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
Note: The cpio man page contains some bad advice that will break your initramfs
|
||||
archive if you follow it. It says "A typical way to generate the list
|
||||
of filenames is with the find command; you should give find the -depth option
|
||||
to minimize problems with permissions on directories that are unwritable or not
|
||||
searchable." Don't do this when creating initramfs.cpio.gz images, it won't
|
||||
work. The Linux kernel cpio extractor won't create files in a directory that
|
||||
doesn't exist, so the directory entries must go before the files that go in
|
||||
those directories. The above script gets them in the right order.
|
||||
|
||||
External initramfs images:
|
||||
--------------------------
|
||||
|
||||
If the kernel has initrd support enabled, an external cpio.gz archive can also
|
||||
be passed into a 2.6 kernel in place of an initrd. In this case, the kernel
|
||||
will autodetect the type (initramfs, not initrd) and extract the external cpio
|
||||
archive into rootfs before trying to run /init.
|
||||
|
||||
This has the memory efficiency advantages of initramfs (no ramdisk block
|
||||
device) but the separate packaging of initrd (which is nice if you have
|
||||
non-GPL code you'd like to run from initramfs, without conflating it with
|
||||
the GPL licensed Linux kernel binary).
|
||||
|
||||
It can also be used to supplement the kernel's built-in initamfs image. The
|
||||
files in the external archive will overwrite any conflicting files in
|
||||
the built-in initramfs archive. Some distributors also prefer to customize
|
||||
a single kernel image with task-specific initramfs images, without recompiling.
|
||||
|
||||
Contents of initramfs:
|
||||
----------------------
|
||||
|
||||
An initramfs archive is a complete self-contained root filesystem for Linux.
|
||||
If you don't already understand what shared libraries, devices, and paths
|
||||
you need to get a minimal root filesystem up and running, here are some
|
||||
references:
|
||||
@ -176,13 +242,36 @@ code against, along with some related utilities. It is BSD licensed.
|
||||
|
||||
I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net)
|
||||
myself. These are LGPL and GPL, respectively. (A self-contained initramfs
|
||||
package is planned for the busybox 1.2 release.)
|
||||
package is planned for the busybox 1.3 release.)
|
||||
|
||||
In theory you could use glibc, but that's not well suited for small embedded
|
||||
uses like this. (A "hello world" program statically linked against glibc is
|
||||
over 400k. With uClibc it's 7k. Also note that glibc dlopens libnss to do
|
||||
name lookups, even when otherwise statically linked.)
|
||||
|
||||
A good first step is to get initramfs to run a statically linked "hello world"
|
||||
program as init, and test it under an emulator like qemu (www.qemu.org) or
|
||||
User Mode Linux, like so:
|
||||
|
||||
cat > hello.c << EOF
|
||||
#include <stdio.h>
|
||||
#include <unistd.h>
|
||||
|
||||
int main(int argc, char *argv[])
|
||||
{
|
||||
printf("Hello world!\n");
|
||||
sleep(999999999);
|
||||
}
|
||||
EOF
|
||||
gcc -static hello2.c -o init
|
||||
echo init | cpio -o -H newc | gzip > test.cpio.gz
|
||||
# Testing external initramfs using the initrd loading mechanism.
|
||||
qemu -kernel /boot/vmlinuz -initrd test.cpio.gz /dev/zero
|
||||
|
||||
When debugging a normal root filesystem, it's nice to be able to boot with
|
||||
"init=/bin/sh". The initramfs equivalent is "rdinit=/bin/sh", and it's
|
||||
just as useful.
|
||||
|
||||
Why cpio rather than tar?
|
||||
-------------------------
|
||||
|
||||
@ -241,7 +330,7 @@ the above threads) is:
|
||||
Future directions:
|
||||
------------------
|
||||
|
||||
Today (2.6.14), initramfs is always compiled in, but not always used. The
|
||||
Today (2.6.16), initramfs is always compiled in, but not always used. The
|
||||
kernel falls back to legacy boot code that is reached only if initramfs does
|
||||
not contain an /init program. The fallback is legacy code, there to ensure a
|
||||
smooth transition and allowing early boot functionality to gradually move to
|
||||
@ -258,8 +347,9 @@ and so on.
|
||||
|
||||
This kind of complexity (which inevitably includes policy) is rightly handled
|
||||
in userspace. Both klibc and busybox/uClibc are working on simple initramfs
|
||||
packages to drop into a kernel build, and when standard solutions are ready
|
||||
and widely deployed, the kernel's legacy early boot code will become obsolete
|
||||
and a candidate for the feature removal schedule.
|
||||
packages to drop into a kernel build.
|
||||
|
||||
But that's a while off yet.
|
||||
The klibc package has now been accepted into Andrew Morton's 2.6.17-mm tree.
|
||||
The kernel's current early boot code (partition detection, etc) will probably
|
||||
be migrated into a default initramfs, automatically created and used by the
|
||||
kernel build.
|
||||
|
@ -113,8 +113,8 @@ members are defined:
|
||||
struct file_system_type {
|
||||
const char *name;
|
||||
int fs_flags;
|
||||
struct super_block *(*get_sb) (struct file_system_type *, int,
|
||||
const char *, void *);
|
||||
struct int (*get_sb) (struct file_system_type *, int,
|
||||
const char *, void *, struct vfsmount *);
|
||||
void (*kill_sb) (struct super_block *);
|
||||
struct module *owner;
|
||||
struct file_system_type * next;
|
||||
@ -211,7 +211,7 @@ struct super_operations {
|
||||
int (*sync_fs)(struct super_block *sb, int wait);
|
||||
void (*write_super_lockfs) (struct super_block *);
|
||||
void (*unlockfs) (struct super_block *);
|
||||
int (*statfs) (struct super_block *, struct kstatfs *);
|
||||
int (*statfs) (struct dentry *, struct kstatfs *);
|
||||
int (*remount_fs) (struct super_block *, int *, char *);
|
||||
void (*clear_inode) (struct inode *);
|
||||
void (*umount_begin) (struct super_block *);
|
||||
|
59
Documentation/hwmon/abituguru
Normal file
59
Documentation/hwmon/abituguru
Normal file
@ -0,0 +1,59 @@
|
||||
Kernel driver abituguru
|
||||
=======================
|
||||
|
||||
Supported chips:
|
||||
* Abit uGuru (Hardware Monitor part only)
|
||||
Prefix: 'abituguru'
|
||||
Addresses scanned: ISA 0x0E0
|
||||
Datasheet: Not available, this driver is based on reverse engineering.
|
||||
A "Datasheet" has been written based on the reverse engineering it
|
||||
should be available in the same dir as this file under the name
|
||||
abituguru-datasheet.
|
||||
|
||||
Authors:
|
||||
Hans de Goede <j.w.r.degoede@hhs.nl>,
|
||||
(Initial reverse engineering done by Olle Sandberg
|
||||
<ollebull@gmail.com>)
|
||||
|
||||
|
||||
Module Parameters
|
||||
-----------------
|
||||
|
||||
* force: bool Force detection. Note this parameter only causes the
|
||||
detection to be skipped, if the uGuru can't be read
|
||||
the module initialization (insmod) will still fail.
|
||||
* fan_sensors: int Tell the driver how many fan speed sensors there are
|
||||
on your motherboard. Default: 0 (autodetect).
|
||||
* pwms: int Tell the driver how many fan speed controls (fan
|
||||
pwms) your motherboard has. Default: 0 (autodetect).
|
||||
* verbose: int How verbose should the driver be? (0-3):
|
||||
0 normal output
|
||||
1 + verbose error reporting
|
||||
2 + sensors type probing info\n"
|
||||
3 + retryable error reporting
|
||||
Default: 2 (the driver is still in the testing phase)
|
||||
|
||||
Notice if you need any of the first three options above please insmod the
|
||||
driver with verbose set to 3 and mail me <j.w.r.degoede@hhs.nl> the output of:
|
||||
dmesg | grep abituguru
|
||||
|
||||
|
||||
Description
|
||||
-----------
|
||||
|
||||
This driver supports the hardware monitoring features of the Abit uGuru chip
|
||||
found on Abit uGuru featuring motherboards (most modern Abit motherboards).
|
||||
|
||||
The uGuru chip in reality is a Winbond W83L950D in disguise (despite Abit
|
||||
claiming it is "a new microprocessor designed by the ABIT Engineers").
|
||||
Unfortunatly this doesn't help since the W83L950D is a generic
|
||||
microcontroller with a custom Abit application running on it.
|
||||
|
||||
Despite Abit not releasing any information regarding the uGuru, Olle
|
||||
Sandberg <ollebull@gmail.com> has managed to reverse engineer the sensor part
|
||||
of the uGuru. Without his work this driver would not have been possible.
|
||||
|
||||
Known Issues
|
||||
------------
|
||||
|
||||
The voltage and frequency control parts of the Abit uGuru are not supported.
|
312
Documentation/hwmon/abituguru-datasheet
Normal file
312
Documentation/hwmon/abituguru-datasheet
Normal file
@ -0,0 +1,312 @@
|
||||
uGuru datasheet
|
||||
===============
|
||||
|
||||
First of all, what I know about uGuru is no fact based on any help, hints or
|
||||
datasheet from Abit. The data I have got on uGuru have I assembled through
|
||||
my weak knowledge in "backwards engineering".
|
||||
And just for the record, you may have noticed uGuru isn't a chip developed by
|
||||
Abit, as they claim it to be. It's realy just an microprocessor (uC) created by
|
||||
Winbond (W83L950D). And no, reading the manual for this specific uC or
|
||||
mailing Windbond for help won't give any usefull data about uGuru, as it is
|
||||
the program inside the uC that is responding to calls.
|
||||
|
||||
Olle Sandberg <ollebull@gmail.com>, 2005-05-25
|
||||
|
||||
|
||||
Original version by Olle Sandberg who did the heavy lifting of the initial
|
||||
reverse engineering. This version has been almost fully rewritten for clarity
|
||||
and extended with write support and info on more databanks, the write support
|
||||
is once again reverse engineered by Olle the additional databanks have been
|
||||
reverse engineered by me. I would like to express my thanks to Olle, this
|
||||
document and the Linux driver could not have been written without his efforts.
|
||||
|
||||
Note: because of the lack of specs only the sensors part of the uGuru is
|
||||
described here and not the CPU / RAM / etc voltage & frequency control.
|
||||
|
||||
Hans de Goede <j.w.r.degoede@hhs.nl>, 28-01-2006
|
||||
|
||||
|
||||
Detection
|
||||
=========
|
||||
|
||||
As far as known the uGuru is always placed at and using the (ISA) I/O-ports
|
||||
0xE0 and 0xE4, so we don't have to scan any port-range, just check what the two
|
||||
ports are holding for detection. We will refer to 0xE0 as CMD (command-port)
|
||||
and 0xE4 as DATA because Abit refers to them with these names.
|
||||
|
||||
If DATA holds 0x00 or 0x08 and CMD holds 0x00 or 0xAC an uGuru could be
|
||||
present. We have to check for two different values at data-port, because
|
||||
after a reboot uGuru will hold 0x00 here, but if the driver is removed and
|
||||
later on attached again data-port will hold 0x08, more about this later.
|
||||
|
||||
After wider testing of the Linux kernel driver some variants of the uGuru have
|
||||
turned up which will hold 0x00 instead of 0xAC at the CMD port, thus we also
|
||||
have to test CMD for two different values. On these uGuru's DATA will initally
|
||||
hold 0x09 and will only hold 0x08 after reading CMD first, so CMD must be read
|
||||
first!
|
||||
|
||||
To be really sure an uGuru is present a test read of one or more register
|
||||
sets should be done.
|
||||
|
||||
|
||||
Reading / Writing
|
||||
=================
|
||||
|
||||
Addressing
|
||||
----------
|
||||
|
||||
The uGuru has a number of different addressing levels. The first addressing
|
||||
level we will call banks. A bank holds data for one or more sensors. The data
|
||||
in a bank for a sensor is one or more bytes large.
|
||||
|
||||
The number of bytes is fixed for a given bank, you should always read or write
|
||||
that many bytes, reading / writing more will fail, the results when writing
|
||||
less then the number of bytes for a given bank are undetermined.
|
||||
|
||||
See below for all known bank addresses, numbers of sensors in that bank,
|
||||
number of bytes data per sensor and contents/meaning of those bytes.
|
||||
|
||||
Although both this document and the kernel driver have kept the sensor
|
||||
terminoligy for the addressing within a bank this is not 100% correct, in
|
||||
bank 0x24 for example the addressing within the bank selects a PWM output not
|
||||
a sensor.
|
||||
|
||||
Notice that some banks have both a read and a write address this is how the
|
||||
uGuru determines if a read from or a write to the bank is taking place, thus
|
||||
when reading you should always use the read address and when writing the
|
||||
write address. The write address is always one (1) more then the read address.
|
||||
|
||||
|
||||
uGuru ready
|
||||
-----------
|
||||
|
||||
Before you can read from or write to the uGuru you must first put the uGuru
|
||||
in "ready" mode.
|
||||
|
||||
To put the uGuru in ready mode first write 0x00 to DATA and then wait for DATA
|
||||
to hold 0x09, DATA should read 0x09 within 250 read cycles.
|
||||
|
||||
Next CMD _must_ be read and should hold 0xAC, usually CMD will hold 0xAC the
|
||||
first read but sometimes it takes a while before CMD holds 0xAC and thus it
|
||||
has to be read a number of times (max 50).
|
||||
|
||||
After reading CMD, DATA should hold 0x08 which means that the uGuru is ready
|
||||
for input. As above DATA will usually hold 0x08 the first read but not always.
|
||||
This step can be skipped, but it is undetermined what happens if the uGuru has
|
||||
not yet reported 0x08 at DATA and you proceed with writing a bank address.
|
||||
|
||||
|
||||
Sending bank and sensor addresses to the uGuru
|
||||
----------------------------------------------
|
||||
|
||||
First the uGuru must be in "ready" mode as described above, DATA should hold
|
||||
0x08 indicating that the uGuru wants input, in this case the bank address.
|
||||
|
||||
Next write the bank address to DATA. After the bank address has been written
|
||||
wait for to DATA to hold 0x08 again indicating that it wants / is ready for
|
||||
more input (max 250 reads).
|
||||
|
||||
Once DATA holds 0x08 again write the sensor address to CMD.
|
||||
|
||||
|
||||
Reading
|
||||
-------
|
||||
|
||||
First send the bank and sensor addresses as described above.
|
||||
Then for each byte of data you want to read wait for DATA to hold 0x01
|
||||
which indicates that the uGuru is ready to be read (max 250 reads) and once
|
||||
DATA holds 0x01 read the byte from CMD.
|
||||
|
||||
Once all bytes have been read data will hold 0x09, but there is no reason to
|
||||
test for this. Notice that the number of bytes is bank address dependent see
|
||||
above and below.
|
||||
|
||||
After completing a successfull read it is advised to put the uGuru back in
|
||||
ready mode, so that it is ready for the next read / write cycle. This way
|
||||
if your program / driver is unloaded and later loaded again the detection
|
||||
algorithm described above will still work.
|
||||
|
||||
|
||||
|
||||
Writing
|
||||
-------
|
||||
|
||||
First send the bank and sensor addresses as described above.
|
||||
Then for each byte of data you want to write wait for DATA to hold 0x00
|
||||
which indicates that the uGuru is ready to be written (max 250 reads) and
|
||||
once DATA holds 0x00 write the byte to CMD.
|
||||
|
||||
Once all bytes have been written wait for DATA to hold 0x01 (max 250 reads)
|
||||
don't ask why this is the way it is.
|
||||
|
||||
Once DATA holds 0x01 read CMD it should hold 0xAC now.
|
||||
|
||||
After completing a successfull write it is advised to put the uGuru back in
|
||||
ready mode, so that it is ready for the next read / write cycle. This way
|
||||
if your program / driver is unloaded and later loaded again the detection
|
||||
algorithm described above will still work.
|
||||
|
||||
|
||||
Gotchas
|
||||
-------
|
||||
|
||||
After wider testing of the Linux kernel driver some variants of the uGuru have
|
||||
turned up which do not hold 0x08 at DATA within 250 reads after writing the
|
||||
bank address. With these versions this happens quite frequent, using larger
|
||||
timeouts doesn't help, they just go offline for a second or 2, doing some
|
||||
internal callibration or whatever. Your code should be prepared to handle
|
||||
this and in case of no response in this specific case just goto sleep for a
|
||||
while and then retry.
|
||||
|
||||
|
||||
Address Map
|
||||
===========
|
||||
|
||||
Bank 0x20 Alarms (R)
|
||||
--------------------
|
||||
This bank contains 0 sensors, iow the sensor address is ignored (but must be
|
||||
written) just use 0. Bank 0x20 contains 3 bytes:
|
||||
|
||||
Byte 0:
|
||||
This byte holds the alarm flags for sensor 0-7 of Sensor Bank1, with bit 0
|
||||
corresponding to sensor 0, 1 to 1, etc.
|
||||
|
||||
Byte 1:
|
||||
This byte holds the alarm flags for sensor 8-15 of Sensor Bank1, with bit 0
|
||||
corresponding to sensor 8, 1 to 9, etc.
|
||||
|
||||
Byte 2:
|
||||
This byte holds the alarm flags for sensor 0-5 of Sensor Bank2, with bit 0
|
||||
corresponding to sensor 0, 1 to 1, etc.
|
||||
|
||||
|
||||
Bank 0x21 Sensor Bank1 Values / Readings (R)
|
||||
--------------------------------------------
|
||||
This bank contains 16 sensors, for each sensor it contains 1 byte.
|
||||
So far the following sensors are known to be available on all motherboards:
|
||||
Sensor 0 CPU temp
|
||||
Sensor 1 SYS temp
|
||||
Sensor 3 CPU core volt
|
||||
Sensor 4 DDR volt
|
||||
Sensor 10 DDR Vtt volt
|
||||
Sensor 15 PWM temp
|
||||
|
||||
Byte 0:
|
||||
This byte holds the reading from the sensor. Sensors in Bank1 can be both
|
||||
volt and temp sensors, this is motherboard specific. The uGuru however does
|
||||
seem to know (be programmed with) what kindoff sensor is attached see Sensor
|
||||
Bank1 Settings description.
|
||||
|
||||
Volt sensors use a linear scale, a reading 0 corresponds with 0 volt and a
|
||||
reading of 255 with 3494 mV. The sensors for higher voltages however are
|
||||
connected through a division circuit. The currently known division circuits
|
||||
in use result in ranges of: 0-4361mV, 0-6248mV or 0-14510mV. 3.3 volt sources
|
||||
use the 0-4361mV range, 5 volt the 0-6248mV and 12 volt the 0-14510mV .
|
||||
|
||||
Temp sensors also use a linear scale, a reading of 0 corresponds with 0 degree
|
||||
Celsius and a reading of 255 with a reading of 255 degrees Celsius.
|
||||
|
||||
|
||||
Bank 0x22 Sensor Bank1 Settings (R)
|
||||
Bank 0x23 Sensor Bank1 Settings (W)
|
||||
-----------------------------------
|
||||
|
||||
This bank contains 16 sensors, for each sensor it contains 3 bytes. Each
|
||||
set of 3 bytes contains the settings for the sensor with the same sensor
|
||||
address in Bank 0x21 .
|
||||
|
||||
Byte 0:
|
||||
Alarm behaviour for the selected sensor. A 1 enables the described behaviour.
|
||||
Bit 0: Give an alarm if measured temp is over the warning threshold (RW) *
|
||||
Bit 1: Give an alarm if measured volt is over the max threshold (RW) **
|
||||
Bit 2: Give an alarm if measured volt is under the min threshold (RW) **
|
||||
Bit 3: Beep if alarm (RW)
|
||||
Bit 4: 1 if alarm cause measured temp is over the warning threshold (R)
|
||||
Bit 5: 1 if alarm cause measured volt is over the max threshold (R)
|
||||
Bit 6: 1 if alarm cause measured volt is under the min threshold (R)
|
||||
Bit 7: Volt sensor: Shutdown if alarm persist for more then 4 seconds (RW)
|
||||
Temp sensor: Shutdown if temp is over the shutdown threshold (RW)
|
||||
|
||||
* This bit is only honored/used by the uGuru if a temp sensor is connected
|
||||
** This bit is only honored/used by the uGuru if a volt sensor is connected
|
||||
Note with some trickery this can be used to find out what kinda sensor is
|
||||
detected see the Linux kernel driver for an example with many comments on
|
||||
how todo this.
|
||||
|
||||
Byte 1:
|
||||
Temp sensor: warning threshold (scale as bank 0x21)
|
||||
Volt sensor: min threshold (scale as bank 0x21)
|
||||
|
||||
Byte 2:
|
||||
Temp sensor: shutdown threshold (scale as bank 0x21)
|
||||
Volt sensor: max threshold (scale as bank 0x21)
|
||||
|
||||
|
||||
Bank 0x24 PWM outputs for FAN's (R)
|
||||
Bank 0x25 PWM outputs for FAN's (W)
|
||||
-----------------------------------
|
||||
|
||||
This bank contains 3 "sensors", for each sensor it contains 5 bytes.
|
||||
Sensor 0 usually controls the CPU fan
|
||||
Sensor 1 usually controls the NB (or chipset for single chip) fan
|
||||
Sensor 2 usually controls the System fan
|
||||
|
||||
Byte 0:
|
||||
Flag 0x80 to enable control, Fan runs at 100% when disabled.
|
||||
low nibble (temp)sensor address at bank 0x21 used for control.
|
||||
|
||||
Byte 1:
|
||||
0-255 = 0-12v (linear), specify voltage at which fan will rotate when under
|
||||
low threshold temp (specified in byte 3)
|
||||
|
||||
Byte 2:
|
||||
0-255 = 0-12v (linear), specify voltage at which fan will rotate when above
|
||||
high threshold temp (specified in byte 4)
|
||||
|
||||
Byte 3:
|
||||
Low threshold temp (scale as bank 0x21)
|
||||
|
||||
byte 4:
|
||||
High threshold temp (scale as bank 0x21)
|
||||
|
||||
|
||||
Bank 0x26 Sensors Bank2 Values / Readings (R)
|
||||
---------------------------------------------
|
||||
|
||||
This bank contains 6 sensors (AFAIK), for each sensor it contains 1 byte.
|
||||
So far the following sensors are known to be available on all motherboards:
|
||||
Sensor 0: CPU fan speed
|
||||
Sensor 1: NB (or chipset for single chip) fan speed
|
||||
Sensor 2: SYS fan speed
|
||||
|
||||
Byte 0:
|
||||
This byte holds the reading from the sensor. 0-255 = 0-15300 (linear)
|
||||
|
||||
|
||||
Bank 0x27 Sensors Bank2 Settings (R)
|
||||
Bank 0x28 Sensors Bank2 Settings (W)
|
||||
------------------------------------
|
||||
|
||||
This bank contains 6 sensors (AFAIK), for each sensor it contains 2 bytes.
|
||||
|
||||
Byte 0:
|
||||
Alarm behaviour for the selected sensor. A 1 enables the described behaviour.
|
||||
Bit 0: Give an alarm if measured rpm is under the min threshold (RW)
|
||||
Bit 3: Beep if alarm (RW)
|
||||
Bit 7: Shutdown if alarm persist for more then 4 seconds (RW)
|
||||
|
||||
Byte 1:
|
||||
min threshold (scale as bank 0x26)
|
||||
|
||||
|
||||
Warning for the adventerous
|
||||
===========================
|
||||
|
||||
A word of caution to those who want to experiment and see if they can figure
|
||||
the voltage / clock programming out, I tried reading and only reading banks
|
||||
0-0x30 with the reading code used for the sensor banks (0x20-0x28) and this
|
||||
resulted in a _permanent_ reprogramming of the voltages, luckily I had the
|
||||
sensors part configured so that it would shutdown my system on any out of spec
|
||||
voltages which proprably safed my computer (after a reboot I managed to
|
||||
immediatly enter the bios and reload the defaults). This probably means that
|
||||
the read/write cycle for the non sensor part is different from the sensor part.
|
31
Documentation/hwmon/lm70
Normal file
31
Documentation/hwmon/lm70
Normal file
@ -0,0 +1,31 @@
|
||||
Kernel driver lm70
|
||||
==================
|
||||
|
||||
Supported chip:
|
||||
* National Semiconductor LM70
|
||||
Datasheet: http://www.national.com/pf/LM/LM70.html
|
||||
|
||||
Author:
|
||||
Kaiwan N Billimoria <kaiwan@designergraphix.com>
|
||||
|
||||
Description
|
||||
-----------
|
||||
|
||||
This driver implements support for the National Semiconductor LM70
|
||||
temperature sensor.
|
||||
|
||||
The LM70 temperature sensor chip supports a single temperature sensor.
|
||||
It communicates with a host processor (or microcontroller) via an
|
||||
SPI/Microwire Bus interface.
|
||||
|
||||
Communication with the LM70 is simple: when the temperature is to be sensed,
|
||||
the driver accesses the LM70 using SPI communication: 16 SCLK cycles
|
||||
comprise the MOSI/MISO loop. At the end of the transfer, the 11-bit 2's
|
||||
complement digital temperature (sent via the SIO line), is available in the
|
||||
driver for interpretation. This driver makes use of the kernel's in-core
|
||||
SPI support.
|
||||
|
||||
Thanks to
|
||||
---------
|
||||
Jean Delvare <khali@linux-fr.org> for mentoring the hwmon-side driver
|
||||
development.
|
@ -7,6 +7,10 @@ Supported chips:
|
||||
Addresses scanned: I2C 0x18 - 0x1a, 0x29 - 0x2b, 0x4c - 0x4e
|
||||
Datasheet: Publicly available at the National Semiconductor website
|
||||
http://www.national.com/pf/LM/LM83.html
|
||||
* National Semiconductor LM82
|
||||
Addresses scanned: I2C 0x18 - 0x1a, 0x29 - 0x2b, 0x4c - 0x4e
|
||||
Datasheet: Publicly available at the National Semiconductor website
|
||||
http://www.national.com/pf/LM/LM82.html
|
||||
|
||||
|
||||
Author: Jean Delvare <khali@linux-fr.org>
|
||||
@ -15,10 +19,11 @@ Description
|
||||
-----------
|
||||
|
||||
The LM83 is a digital temperature sensor. It senses its own temperature as
|
||||
well as the temperature of up to three external diodes. It is compatible
|
||||
with many other devices such as the LM84 and all other ADM1021 clones.
|
||||
The main difference between the LM83 and the LM84 in that the later can
|
||||
only sense the temperature of one external diode.
|
||||
well as the temperature of up to three external diodes. The LM82 is
|
||||
a stripped down version of the LM83 that only supports one external diode.
|
||||
Both are compatible with many other devices such as the LM84 and all
|
||||
other ADM1021 clones. The main difference between the LM83 and the LM84
|
||||
in that the later can only sense the temperature of one external diode.
|
||||
|
||||
Using the adm1021 driver for a LM83 should work, but only two temperatures
|
||||
will be reported instead of four.
|
||||
@ -30,12 +35,16 @@ contact us. Note that the LM90 can easily be misdetected as a LM83.
|
||||
|
||||
Confirmed motherboards:
|
||||
SBS P014
|
||||
SBS PSL09
|
||||
|
||||
Unconfirmed motherboards:
|
||||
Gigabyte GA-8IK1100
|
||||
Iwill MPX2
|
||||
Soltek SL-75DRV5
|
||||
|
||||
The LM82 is confirmed to have been found on most AMD Geode reference
|
||||
designs and test platforms.
|
||||
|
||||
The driver has been successfully tested by Magnus Forsström, who I'd
|
||||
like to thank here. More testers will be of course welcome.
|
||||
|
||||
|
102
Documentation/hwmon/smsc47m192
Normal file
102
Documentation/hwmon/smsc47m192
Normal file
@ -0,0 +1,102 @@
|
||||
Kernel driver smsc47m192
|
||||
========================
|
||||
|
||||
Supported chips:
|
||||
* SMSC LPC47M192 and LPC47M997
|
||||
Prefix: 'smsc47m192'
|
||||
Addresses scanned: I2C 0x2c - 0x2d
|
||||
Datasheet: The datasheet for LPC47M192 is publicly available from
|
||||
http://www.smsc.com/
|
||||
The LPC47M997 is compatible for hardware monitoring.
|
||||
|
||||
Author: Hartmut Rick <linux@rick.claranet.de>
|
||||
Special thanks to Jean Delvare for careful checking
|
||||
of the code and many helpful comments and suggestions.
|
||||
|
||||
|
||||
Description
|
||||
-----------
|
||||
|
||||
This driver implements support for the hardware sensor capabilities
|
||||
of the SMSC LPC47M192 and LPC47M997 Super-I/O chips.
|
||||
|
||||
These chips support 3 temperature channels and 8 voltage inputs
|
||||
as well as CPU voltage VID input.
|
||||
|
||||
They do also have fan monitoring and control capabilities, but the
|
||||
these features are accessed via ISA bus and are not supported by this
|
||||
driver. Use the 'smsc47m1' driver for fan monitoring and control.
|
||||
|
||||
Voltages and temperatures are measured by an 8-bit ADC, the resolution
|
||||
of the temperatures is 1 bit per degree C.
|
||||
Voltages are scaled such that the nominal voltage corresponds to
|
||||
192 counts, i.e. 3/4 of the full range. Thus the available range for
|
||||
each voltage channel is 0V ... 255/192*(nominal voltage), the resolution
|
||||
is 1 bit per (nominal voltage)/192.
|
||||
Both voltage and temperature values are scaled by 1000, the sys files
|
||||
show voltages in mV and temperatures in units of 0.001 degC.
|
||||
|
||||
The +12V analog voltage input channel (in4_input) is multiplexed with
|
||||
bit 4 of the encoded CPU voltage. This means that you either get
|
||||
a +12V voltage measurement or a 5 bit CPU VID, but not both.
|
||||
The default setting is to use the pin as 12V input, and use only 4 bit VID.
|
||||
This driver assumes that the information in the configuration register
|
||||
is correct, i.e. that the BIOS has updated the configuration if
|
||||
the motherboard has this input wired to VID4.
|
||||
|
||||
The temperature and voltage readings are updated once every 1.5 seconds.
|
||||
Reading them more often repeats the same values.
|
||||
|
||||
|
||||
sysfs interface
|
||||
---------------
|
||||
|
||||
in0_input - +2.5V voltage input
|
||||
in1_input - CPU voltage input (nominal 2.25V)
|
||||
in2_input - +3.3V voltage input
|
||||
in3_input - +5V voltage input
|
||||
in4_input - +12V voltage input (may be missing if used as VID4)
|
||||
in5_input - Vcc voltage input (nominal 3.3V)
|
||||
This is the supply voltage of the sensor chip itself.
|
||||
in6_input - +1.5V voltage input
|
||||
in7_input - +1.8V voltage input
|
||||
|
||||
in[0-7]_min,
|
||||
in[0-7]_max - lower and upper alarm thresholds for in[0-7]_input reading
|
||||
|
||||
All voltages are read and written in mV.
|
||||
|
||||
in[0-7]_alarm - alarm flags for voltage inputs
|
||||
These files read '1' in case of alarm, '0' otherwise.
|
||||
|
||||
temp1_input - chip temperature measured by on-chip diode
|
||||
temp[2-3]_input - temperature measured by external diodes (one of these would
|
||||
typically be wired to the diode inside the CPU)
|
||||
|
||||
temp[1-3]_min,
|
||||
temp[1-3]_max - lower and upper alarm thresholds for temperatures
|
||||
|
||||
temp[1-3]_offset - temperature offset registers
|
||||
The chip adds the offsets stored in these registers to
|
||||
the corresponding temperature readings.
|
||||
Note that temp1 and temp2 offsets share the same register,
|
||||
they cannot both be different from zero at the same time.
|
||||
Writing a non-zero number to one of them will reset the other
|
||||
offset to zero.
|
||||
|
||||
All temperatures and offsets are read and written in
|
||||
units of 0.001 degC.
|
||||
|
||||
temp[1-3]_alarm - alarm flags for temperature inputs, '1' in case of alarm,
|
||||
'0' otherwise.
|
||||
temp[2-3]_input_fault - diode fault flags for temperature inputs 2 and 3.
|
||||
A fault is detected if the two pins for the corresponding
|
||||
sensor are open or shorted, or any of the two is shorted
|
||||
to ground or Vcc. '1' indicates a diode fault.
|
||||
|
||||
cpu0_vid - CPU voltage as received from the CPU
|
||||
|
||||
vrm - CPU VID standard used for decoding CPU voltage
|
||||
|
||||
The *_min, *_max, *_offset and vrm files can be read and
|
||||
written, all others are read-only.
|
@ -3,15 +3,15 @@ Naming and data format standards for sysfs files
|
||||
|
||||
The libsensors library offers an interface to the raw sensors data
|
||||
through the sysfs interface. See libsensors documentation and source for
|
||||
more further information. As of writing this document, libsensors
|
||||
(from lm_sensors 2.8.3) is heavily chip-dependant. Adding or updating
|
||||
further information. As of writing this document, libsensors
|
||||
(from lm_sensors 2.8.3) is heavily chip-dependent. Adding or updating
|
||||
support for any given chip requires modifying the library's code.
|
||||
This is because libsensors was written for the procfs interface
|
||||
older kernel modules were using, which wasn't standardized enough.
|
||||
Recent versions of libsensors (from lm_sensors 2.8.2 and later) have
|
||||
support for the sysfs interface, though.
|
||||
|
||||
The new sysfs interface was designed to be as chip-independant as
|
||||
The new sysfs interface was designed to be as chip-independent as
|
||||
possible.
|
||||
|
||||
Note that motherboards vary widely in the connections to sensor chips.
|
||||
@ -24,7 +24,7 @@ range using external resistors. Since the values of these resistors
|
||||
can change from motherboard to motherboard, the conversions cannot be
|
||||
hard coded into the driver and have to be done in user space.
|
||||
|
||||
For this reason, even if we aim at a chip-independant libsensors, it will
|
||||
For this reason, even if we aim at a chip-independent libsensors, it will
|
||||
still require a configuration file (e.g. /etc/sensors.conf) for proper
|
||||
values conversion, labeling of inputs and hiding of unused inputs.
|
||||
|
||||
@ -39,15 +39,16 @@ If you are developing a userspace application please send us feedback on
|
||||
this standard.
|
||||
|
||||
Note that this standard isn't completely established yet, so it is subject
|
||||
to changes, even important ones. One more reason to use the library instead
|
||||
of accessing sysfs files directly.
|
||||
to changes. If you are writing a new hardware monitoring driver those
|
||||
features can't seem to fit in this interface, please contact us with your
|
||||
extension proposal. Keep in mind that backward compatibility must be
|
||||
preserved.
|
||||
|
||||
Each chip gets its own directory in the sysfs /sys/devices tree. To
|
||||
find all sensor chips, it is easier to follow the symlinks from
|
||||
/sys/i2c/devices/
|
||||
find all sensor chips, it is easier to follow the device symlinks from
|
||||
/sys/class/hwmon/hwmon*.
|
||||
|
||||
All sysfs values are fixed point numbers. To get the true value of some
|
||||
of the values, you should divide by the specified value.
|
||||
All sysfs values are fixed point numbers.
|
||||
|
||||
There is only one value per file, unlike the older /proc specification.
|
||||
The common scheme for files naming is: <type><number>_<item>. Usual
|
||||
@ -69,28 +70,40 @@ to cause an alarm) is chip-dependent.
|
||||
|
||||
-------------------------------------------------------------------------
|
||||
|
||||
[0-*] denotes any positive number starting from 0
|
||||
[1-*] denotes any positive number starting from 1
|
||||
RO read only value
|
||||
RW read/write value
|
||||
|
||||
Read/write values may be read-only for some chips, depending on the
|
||||
hardware implementation.
|
||||
|
||||
All entries are optional, and should only be created in a given driver
|
||||
if the chip has the feature.
|
||||
|
||||
************
|
||||
* Voltages *
|
||||
************
|
||||
|
||||
in[0-8]_min Voltage min value.
|
||||
in[0-*]_min Voltage min value.
|
||||
Unit: millivolt
|
||||
Read/Write
|
||||
RW
|
||||
|
||||
in[0-8]_max Voltage max value.
|
||||
in[0-*]_max Voltage max value.
|
||||
Unit: millivolt
|
||||
Read/Write
|
||||
RW
|
||||
|
||||
in[0-8]_input Voltage input value.
|
||||
in[0-*]_input Voltage input value.
|
||||
Unit: millivolt
|
||||
Read only
|
||||
RO
|
||||
Voltage measured on the chip pin.
|
||||
Actual voltage depends on the scaling resistors on the
|
||||
motherboard, as recommended in the chip datasheet.
|
||||
This varies by chip and by motherboard.
|
||||
Because of this variation, values are generally NOT scaled
|
||||
by the chip driver, and must be done by the application.
|
||||
However, some drivers (notably lm87 and via686a)
|
||||
do scale, with various degrees of success.
|
||||
do scale, because of internal resistors built into a chip.
|
||||
These drivers will output the actual voltage.
|
||||
|
||||
Typical usage:
|
||||
@ -104,58 +117,72 @@ in[0-8]_input Voltage input value.
|
||||
in7_* varies
|
||||
in8_* varies
|
||||
|
||||
cpu[0-1]_vid CPU core reference voltage.
|
||||
cpu[0-*]_vid CPU core reference voltage.
|
||||
Unit: millivolt
|
||||
Read only.
|
||||
RO
|
||||
Not always correct.
|
||||
|
||||
vrm Voltage Regulator Module version number.
|
||||
Read only.
|
||||
Two digit number, first is major version, second is
|
||||
minor version.
|
||||
RW (but changing it should no more be necessary)
|
||||
Originally the VRM standard version multiplied by 10, but now
|
||||
an arbitrary number, as not all standards have a version
|
||||
number.
|
||||
Affects the way the driver calculates the CPU core reference
|
||||
voltage from the vid pins.
|
||||
|
||||
Also see the Alarms section for status flags associated with voltages.
|
||||
|
||||
|
||||
********
|
||||
* Fans *
|
||||
********
|
||||
|
||||
fan[1-3]_min Fan minimum value
|
||||
fan[1-*]_min Fan minimum value
|
||||
Unit: revolution/min (RPM)
|
||||
Read/Write.
|
||||
RW
|
||||
|
||||
fan[1-3]_input Fan input value.
|
||||
fan[1-*]_input Fan input value.
|
||||
Unit: revolution/min (RPM)
|
||||
Read only.
|
||||
RO
|
||||
|
||||
fan[1-3]_div Fan divisor.
|
||||
fan[1-*]_div Fan divisor.
|
||||
Integer value in powers of two (1, 2, 4, 8, 16, 32, 64, 128).
|
||||
RW
|
||||
Some chips only support values 1, 2, 4 and 8.
|
||||
Note that this is actually an internal clock divisor, which
|
||||
affects the measurable speed range, not the read value.
|
||||
|
||||
Also see the Alarms section for status flags associated with fans.
|
||||
|
||||
|
||||
*******
|
||||
* PWM *
|
||||
*******
|
||||
|
||||
pwm[1-3] Pulse width modulation fan control.
|
||||
pwm[1-*] Pulse width modulation fan control.
|
||||
Integer value in the range 0 to 255
|
||||
Read/Write
|
||||
RW
|
||||
255 is max or 100%.
|
||||
|
||||
pwm[1-3]_enable
|
||||
pwm[1-*]_enable
|
||||
Switch PWM on and off.
|
||||
Not always present even if fan*_pwm is.
|
||||
0 to turn off
|
||||
1 to turn on in manual mode
|
||||
2 to turn on in automatic mode
|
||||
Read/Write
|
||||
0: turn off
|
||||
1: turn on in manual mode
|
||||
2+: turn on in automatic mode
|
||||
Check individual chip documentation files for automatic mode details.
|
||||
RW
|
||||
|
||||
pwm[1-*]_mode
|
||||
0: DC mode
|
||||
1: PWM mode
|
||||
RW
|
||||
|
||||
pwm[1-*]_auto_channels_temp
|
||||
Select which temperature channels affect this PWM output in
|
||||
auto mode. Bitfield, 1 is temp1, 2 is temp2, 4 is temp3 etc...
|
||||
Which values are possible depend on the chip used.
|
||||
RW
|
||||
|
||||
pwm[1-*]_auto_point[1-*]_pwm
|
||||
pwm[1-*]_auto_point[1-*]_temp
|
||||
@ -163,6 +190,7 @@ pwm[1-*]_auto_point[1-*]_temp_hyst
|
||||
Define the PWM vs temperature curve. Number of trip points is
|
||||
chip-dependent. Use this for chips which associate trip points
|
||||
to PWM output channels.
|
||||
RW
|
||||
|
||||
OR
|
||||
|
||||
@ -172,50 +200,57 @@ temp[1-*]_auto_point[1-*]_temp_hyst
|
||||
Define the PWM vs temperature curve. Number of trip points is
|
||||
chip-dependent. Use this for chips which associate trip points
|
||||
to temperature channels.
|
||||
RW
|
||||
|
||||
|
||||
****************
|
||||
* Temperatures *
|
||||
****************
|
||||
|
||||
temp[1-3]_type Sensor type selection.
|
||||
temp[1-*]_type Sensor type selection.
|
||||
Integers 1 to 4 or thermistor Beta value (typically 3435)
|
||||
Read/Write.
|
||||
RW
|
||||
1: PII/Celeron Diode
|
||||
2: 3904 transistor
|
||||
3: thermal diode
|
||||
4: thermistor (default/unknown Beta)
|
||||
Not all types are supported by all chips
|
||||
|
||||
temp[1-4]_max Temperature max value.
|
||||
Unit: millidegree Celcius
|
||||
Read/Write value.
|
||||
temp[1-*]_max Temperature max value.
|
||||
Unit: millidegree Celsius (or millivolt, see below)
|
||||
RW
|
||||
|
||||
temp[1-3]_min Temperature min value.
|
||||
Unit: millidegree Celcius
|
||||
Read/Write value.
|
||||
temp[1-*]_min Temperature min value.
|
||||
Unit: millidegree Celsius
|
||||
RW
|
||||
|
||||
temp[1-3]_max_hyst
|
||||
temp[1-*]_max_hyst
|
||||
Temperature hysteresis value for max limit.
|
||||
Unit: millidegree Celcius
|
||||
Unit: millidegree Celsius
|
||||
Must be reported as an absolute temperature, NOT a delta
|
||||
from the max value.
|
||||
Read/Write value.
|
||||
RW
|
||||
|
||||
temp[1-4]_input Temperature input value.
|
||||
Unit: millidegree Celcius
|
||||
Read only value.
|
||||
temp[1-*]_input Temperature input value.
|
||||
Unit: millidegree Celsius
|
||||
RO
|
||||
|
||||
temp[1-4]_crit Temperature critical value, typically greater than
|
||||
temp[1-*]_crit Temperature critical value, typically greater than
|
||||
corresponding temp_max values.
|
||||
Unit: millidegree Celcius
|
||||
Read/Write value.
|
||||
Unit: millidegree Celsius
|
||||
RW
|
||||
|
||||
temp[1-2]_crit_hyst
|
||||
temp[1-*]_crit_hyst
|
||||
Temperature hysteresis value for critical limit.
|
||||
Unit: millidegree Celcius
|
||||
Unit: millidegree Celsius
|
||||
Must be reported as an absolute temperature, NOT a delta
|
||||
from the critical value.
|
||||
RW
|
||||
|
||||
temp[1-4]_offset
|
||||
Temperature offset which is added to the temperature reading
|
||||
by the chip.
|
||||
Unit: millidegree Celsius
|
||||
Read/Write value.
|
||||
|
||||
If there are multiple temperature sensors, temp1_* is
|
||||
@ -225,6 +260,17 @@ temp[1-2]_crit_hyst
|
||||
itself, for example the thermal diode inside the CPU or
|
||||
a thermistor nearby.
|
||||
|
||||
Some chips measure temperature using external thermistors and an ADC, and
|
||||
report the temperature measurement as a voltage. Converting this voltage
|
||||
back to a temperature (or the other way around for limits) requires
|
||||
mathematical functions not available in the kernel, so the conversion
|
||||
must occur in user space. For these chips, all temp* files described
|
||||
above should contain values expressed in millivolt instead of millidegree
|
||||
Celsius. In other words, such temperature channels are handled as voltage
|
||||
channels by the driver.
|
||||
|
||||
Also see the Alarms section for status flags associated with temperatures.
|
||||
|
||||
|
||||
************
|
||||
* Currents *
|
||||
@ -233,25 +279,88 @@ temp[1-2]_crit_hyst
|
||||
Note that no known chip provides current measurements as of writing,
|
||||
so this part is theoretical, so to say.
|
||||
|
||||
curr[1-n]_max Current max value
|
||||
curr[1-*]_max Current max value
|
||||
Unit: milliampere
|
||||
Read/Write.
|
||||
RW
|
||||
|
||||
curr[1-n]_min Current min value.
|
||||
curr[1-*]_min Current min value.
|
||||
Unit: milliampere
|
||||
Read/Write.
|
||||
RW
|
||||
|
||||
curr[1-n]_input Current input value
|
||||
curr[1-*]_input Current input value
|
||||
Unit: milliampere
|
||||
Read only.
|
||||
RO
|
||||
|
||||
|
||||
*********
|
||||
* Other *
|
||||
*********
|
||||
**********
|
||||
* Alarms *
|
||||
**********
|
||||
|
||||
Each channel or limit may have an associated alarm file, containing a
|
||||
boolean value. 1 means than an alarm condition exists, 0 means no alarm.
|
||||
|
||||
Usually a given chip will either use channel-related alarms, or
|
||||
limit-related alarms, not both. The driver should just reflect the hardware
|
||||
implementation.
|
||||
|
||||
in[0-*]_alarm
|
||||
fan[1-*]_alarm
|
||||
temp[1-*]_alarm
|
||||
Channel alarm
|
||||
0: no alarm
|
||||
1: alarm
|
||||
RO
|
||||
|
||||
OR
|
||||
|
||||
in[0-*]_min_alarm
|
||||
in[0-*]_max_alarm
|
||||
fan[1-*]_min_alarm
|
||||
temp[1-*]_min_alarm
|
||||
temp[1-*]_max_alarm
|
||||
temp[1-*]_crit_alarm
|
||||
Limit alarm
|
||||
0: no alarm
|
||||
1: alarm
|
||||
RO
|
||||
|
||||
Each input channel may have an associated fault file. This can be used
|
||||
to notify open diodes, unconnected fans etc. where the hardware
|
||||
supports it. When this boolean has value 1, the measurement for that
|
||||
channel should not be trusted.
|
||||
|
||||
in[0-*]_input_fault
|
||||
fan[1-*]_input_fault
|
||||
temp[1-*]_input_fault
|
||||
Input fault condition
|
||||
0: no fault occured
|
||||
1: fault condition
|
||||
RO
|
||||
|
||||
Some chips also offer the possibility to get beeped when an alarm occurs:
|
||||
|
||||
beep_enable Master beep enable
|
||||
0: no beeps
|
||||
1: beeps
|
||||
RW
|
||||
|
||||
in[0-*]_beep
|
||||
fan[1-*]_beep
|
||||
temp[1-*]_beep
|
||||
Channel beep
|
||||
0: disable
|
||||
1: enable
|
||||
RW
|
||||
|
||||
In theory, a chip could provide per-limit beep masking, but no such chip
|
||||
was seen so far.
|
||||
|
||||
Old drivers provided a different, non-standard interface to alarms and
|
||||
beeps. These interface files are deprecated, but will be kept around
|
||||
for compatibility reasons:
|
||||
|
||||
alarms Alarm bitmask.
|
||||
Read only.
|
||||
RO
|
||||
Integer representation of one to four bytes.
|
||||
A '1' bit means an alarm.
|
||||
Chips should be programmed for 'comparator' mode so that
|
||||
@ -259,35 +368,26 @@ alarms Alarm bitmask.
|
||||
if it is still valid.
|
||||
Generally a direct representation of a chip's internal
|
||||
alarm registers; there is no standard for the position
|
||||
of individual bits.
|
||||
of individual bits. For this reason, the use of this
|
||||
interface file for new drivers is discouraged. Use
|
||||
individual *_alarm and *_fault files instead.
|
||||
Bits are defined in kernel/include/sensors.h.
|
||||
|
||||
alarms_in Alarm bitmask relative to in (voltage) channels
|
||||
Read only
|
||||
A '1' bit means an alarm, LSB corresponds to in0 and so on
|
||||
Prefered to 'alarms' for newer chips
|
||||
|
||||
alarms_fan Alarm bitmask relative to fan channels
|
||||
Read only
|
||||
A '1' bit means an alarm, LSB corresponds to fan1 and so on
|
||||
Prefered to 'alarms' for newer chips
|
||||
|
||||
alarms_temp Alarm bitmask relative to temp (temperature) channels
|
||||
Read only
|
||||
A '1' bit means an alarm, LSB corresponds to temp1 and so on
|
||||
Prefered to 'alarms' for newer chips
|
||||
|
||||
beep_enable Beep/interrupt enable
|
||||
0 to disable.
|
||||
1 to enable.
|
||||
Read/Write
|
||||
|
||||
beep_mask Bitmask for beep.
|
||||
Same format as 'alarms' with the same bit locations.
|
||||
Read/Write
|
||||
Same format as 'alarms' with the same bit locations,
|
||||
use discouraged for the same reason. Use individual
|
||||
*_beep files instead.
|
||||
RW
|
||||
|
||||
|
||||
*********
|
||||
* Other *
|
||||
*********
|
||||
|
||||
eeprom Raw EEPROM data in binary form.
|
||||
Read only.
|
||||
RO
|
||||
|
||||
pec Enable or disable PEC (SMBus only)
|
||||
Read/Write
|
||||
0: disable
|
||||
1: enable
|
||||
RW
|
||||
|
@ -6,31 +6,32 @@ voltages, fans speed). They are often connected through an I2C bus, but some
|
||||
are also connected directly through the ISA bus.
|
||||
|
||||
The kernel drivers make the data from the sensor chips available in the /sys
|
||||
virtual filesystem. Userspace tools are then used to display or set or the
|
||||
data in a more friendly manner.
|
||||
virtual filesystem. Userspace tools are then used to display the measured
|
||||
values or configure the chips in a more friendly manner.
|
||||
|
||||
Lm-sensors
|
||||
----------
|
||||
|
||||
Core set of utilites that will allow you to obtain health information,
|
||||
Core set of utilities that will allow you to obtain health information,
|
||||
setup monitoring limits etc. You can get them on their homepage
|
||||
http://www.lm-sensors.nu/ or as a package from your Linux distribution.
|
||||
|
||||
If from website:
|
||||
Get lmsensors from project web site. Please note, you need only userspace
|
||||
part, so compile with "make user_install" target.
|
||||
Get lm-sensors from project web site. Please note, you need only userspace
|
||||
part, so compile with "make user" and install with "make user_install".
|
||||
|
||||
General hints to get things working:
|
||||
|
||||
0) get lm-sensors userspace utils
|
||||
1) compile all drivers in I2C section as modules in your kernel
|
||||
1) compile all drivers in I2C and Hardware Monitoring sections as modules
|
||||
in your kernel
|
||||
2) run sensors-detect script, it will tell you what modules you need to load.
|
||||
3) load them and run "sensors" command, you should see some results.
|
||||
4) fix sensors.conf, labels, limits, fan divisors
|
||||
5) if any more problems consult FAQ, or documentation
|
||||
|
||||
Other utilites
|
||||
--------------
|
||||
Other utilities
|
||||
---------------
|
||||
|
||||
If you want some graphical indicators of system health look for applications
|
||||
like: gkrellm, ksensors, xsensors, wmtemp, wmsensors, wmgtemp, ksysguardd,
|
||||
|
113
Documentation/hwmon/w83791d
Normal file
113
Documentation/hwmon/w83791d
Normal file
@ -0,0 +1,113 @@
|
||||
Kernel driver w83791d
|
||||
=====================
|
||||
|
||||
Supported chips:
|
||||
* Winbond W83791D
|
||||
Prefix: 'w83791d'
|
||||
Addresses scanned: I2C 0x2c - 0x2f
|
||||
Datasheet: http://www.winbond-usa.com/products/winbond_products/pdfs/PCIC/W83791Da.pdf
|
||||
|
||||
Author: Charles Spirakis <bezaur@gmail.com>
|
||||
|
||||
This driver was derived from the w83781d.c and w83792d.c source files.
|
||||
|
||||
Credits:
|
||||
w83781d.c:
|
||||
Frodo Looijaard <frodol@dds.nl>,
|
||||
Philip Edelbrock <phil@netroedge.com>,
|
||||
and Mark Studebaker <mdsxyz123@yahoo.com>
|
||||
w83792d.c:
|
||||
Chunhao Huang <DZShen@Winbond.com.tw>,
|
||||
Rudolf Marek <r.marek@sh.cvut.cz>
|
||||
|
||||
Module Parameters
|
||||
-----------------
|
||||
|
||||
* init boolean
|
||||
(default 0)
|
||||
Use 'init=1' to have the driver do extra software initializations.
|
||||
The default behavior is to do the minimum initialization possible
|
||||
and depend on the BIOS to properly setup the chip. If you know you
|
||||
have a w83791d and you're having problems, try init=1 before trying
|
||||
reset=1.
|
||||
|
||||
* reset boolean
|
||||
(default 0)
|
||||
Use 'reset=1' to reset the chip (via index 0x40, bit 7). The default
|
||||
behavior is no chip reset to preserve BIOS settings.
|
||||
|
||||
* force_subclients=bus,caddr,saddr,saddr
|
||||
This is used to force the i2c addresses for subclients of
|
||||
a certain chip. Example usage is `force_subclients=0,0x2f,0x4a,0x4b'
|
||||
to force the subclients of chip 0x2f on bus 0 to i2c addresses
|
||||
0x4a and 0x4b.
|
||||
|
||||
|
||||
Description
|
||||
-----------
|
||||
|
||||
This driver implements support for the Winbond W83791D chip.
|
||||
|
||||
Detection of the chip can sometimes be foiled because it can be in an
|
||||
internal state that allows no clean access (Bank with ID register is not
|
||||
currently selected). If you know the address of the chip, use a 'force'
|
||||
parameter; this will put it into a more well-behaved state first.
|
||||
|
||||
The driver implements three temperature sensors, five fan rotation speed
|
||||
sensors, and ten voltage sensors.
|
||||
|
||||
Temperatures are measured in degrees Celsius and measurement resolution is 1
|
||||
degC for temp1 and 0.5 degC for temp2 and temp3. An alarm is triggered when
|
||||
the temperature gets higher than the Overtemperature Shutdown value; it stays
|
||||
on until the temperature falls below the Hysteresis value.
|
||||
|
||||
Fan rotation speeds are reported in RPM (rotations per minute). An alarm is
|
||||
triggered if the rotation speed has dropped below a programmable limit. Fan
|
||||
readings can be divided by a programmable divider (1, 2, 4, 8 for fan 1/2/3
|
||||
and 1, 2, 4, 8, 16, 32, 64 or 128 for fan 4/5) to give the readings more
|
||||
range or accuracy.
|
||||
|
||||
Voltage sensors (also known as IN sensors) report their values in millivolts.
|
||||
An alarm is triggered if the voltage has crossed a programmable minimum
|
||||
or maximum limit.
|
||||
|
||||
Alarms are provided as output from a "realtime status register". The
|
||||
following bits are defined:
|
||||
|
||||
bit - alarm on:
|
||||
0 - Vcore
|
||||
1 - VINR0
|
||||
2 - +3.3VIN
|
||||
3 - 5VDD
|
||||
4 - temp1
|
||||
5 - temp2
|
||||
6 - fan1
|
||||
7 - fan2
|
||||
8 - +12VIN
|
||||
9 - -12VIN
|
||||
10 - -5VIN
|
||||
11 - fan3
|
||||
12 - chassis
|
||||
13 - temp3
|
||||
14 - VINR1
|
||||
15 - reserved
|
||||
16 - tart1
|
||||
17 - tart2
|
||||
18 - tart3
|
||||
19 - VSB
|
||||
20 - VBAT
|
||||
21 - fan4
|
||||
22 - fan5
|
||||
23 - reserved
|
||||
|
||||
When an alarm goes off, you can be warned by a beeping signal through your
|
||||
computer speaker. It is possible to enable all beeping globally, or only
|
||||
the beeping for some alarms.
|
||||
|
||||
The driver only reads the chip values each 3 seconds; reading them more
|
||||
often will do no harm, but will return 'old' values.
|
||||
|
||||
W83791D TODO:
|
||||
---------------
|
||||
Provide a patch for per-file alarms as discussed on the mailing list
|
||||
Provide a patch for smart-fan control (still need appropriate motherboard/fans)
|
@ -21,8 +21,7 @@ Authors:
|
||||
Module Parameters
|
||||
-----------------
|
||||
|
||||
* force_addr: int
|
||||
Forcibly enable the ICH at the given address. EXTREMELY DANGEROUS!
|
||||
None.
|
||||
|
||||
|
||||
Description
|
||||
|
@ -7,6 +7,8 @@ Supported adapters:
|
||||
* nForce3 250Gb MCP 10de:00E4
|
||||
* nForce4 MCP 10de:0052
|
||||
* nForce4 MCP-04 10de:0034
|
||||
* nForce4 MCP51 10de:0264
|
||||
* nForce4 MCP55 10de:0368
|
||||
|
||||
Datasheet: not publically available, but seems to be similar to the
|
||||
AMD-8111 SMBus 2.0 adapter.
|
||||
|
51
Documentation/i2c/busses/i2c-ocores
Normal file
51
Documentation/i2c/busses/i2c-ocores
Normal file
@ -0,0 +1,51 @@
|
||||
Kernel driver i2c-ocores
|
||||
|
||||
Supported adapters:
|
||||
* OpenCores.org I2C controller by Richard Herveille (see datasheet link)
|
||||
Datasheet: http://www.opencores.org/projects.cgi/web/i2c/overview
|
||||
|
||||
Author: Peter Korsgaard <jacmet@sunsite.dk>
|
||||
|
||||
Description
|
||||
-----------
|
||||
|
||||
i2c-ocores is an i2c bus driver for the OpenCores.org I2C controller
|
||||
IP core by Richard Herveille.
|
||||
|
||||
Usage
|
||||
-----
|
||||
|
||||
i2c-ocores uses the platform bus, so you need to provide a struct
|
||||
platform_device with the base address and interrupt number. The
|
||||
dev.platform_data of the device should also point to a struct
|
||||
ocores_i2c_platform_data (see linux/i2c-ocores.h) describing the
|
||||
distance between registers and the input clock speed.
|
||||
|
||||
E.G. something like:
|
||||
|
||||
static struct resource ocores_resources[] = {
|
||||
[0] = {
|
||||
.start = MYI2C_BASEADDR,
|
||||
.end = MYI2C_BASEADDR + 8,
|
||||
.flags = IORESOURCE_MEM,
|
||||
},
|
||||
[1] = {
|
||||
.start = MYI2C_IRQ,
|
||||
.end = MYI2C_IRQ,
|
||||
.flags = IORESOURCE_IRQ,
|
||||
},
|
||||
};
|
||||
|
||||
static struct ocores_i2c_platform_data myi2c_data = {
|
||||
.regstep = 2, /* two bytes between registers */
|
||||
.clock_khz = 50000, /* input clock of 50MHz */
|
||||
};
|
||||
|
||||
static struct platform_device myi2c = {
|
||||
.name = "ocores-i2c",
|
||||
.dev = {
|
||||
.platform_data = &myi2c_data,
|
||||
},
|
||||
.num_resources = ARRAY_SIZE(ocores_resources),
|
||||
.resource = ocores_resources,
|
||||
};
|
@ -6,6 +6,8 @@ Supported adapters:
|
||||
Datasheet: Publicly available at the Intel website
|
||||
* ServerWorks OSB4, CSB5, CSB6 and HT-1000 southbridges
|
||||
Datasheet: Only available via NDA from ServerWorks
|
||||
* ATI IXP southbridges IXP200, IXP300, IXP400
|
||||
Datasheet: Not publicly available
|
||||
* Standard Microsystems (SMSC) SLC90E66 (Victory66) southbridge
|
||||
Datasheet: Publicly available at the SMSC website http://www.smsc.com
|
||||
|
||||
@ -21,8 +23,6 @@ Module Parameters
|
||||
Forcibly enable the PIIX4. DANGEROUS!
|
||||
* force_addr: int
|
||||
Forcibly enable the PIIX4 at the given address. EXTREMELY DANGEROUS!
|
||||
* fix_hstcfg: int
|
||||
Fix config register. Needed on some boards (Force CPCI735).
|
||||
|
||||
|
||||
Description
|
||||
@ -63,10 +63,36 @@ The PIIX4E is just an new version of the PIIX4; it is supported as well.
|
||||
The PIIX/PIIX3 does not implement an SMBus or I2C bus, so you can't use
|
||||
this driver on those mainboards.
|
||||
|
||||
The ServerWorks Southbridges, the Intel 440MX, and the Victory766 are
|
||||
The ServerWorks Southbridges, the Intel 440MX, and the Victory66 are
|
||||
identical to the PIIX4 in I2C/SMBus support.
|
||||
|
||||
A few OSB4 southbridges are known to be misconfigured by the BIOS. In this
|
||||
case, you have you use the fix_hstcfg module parameter. Do not use it
|
||||
unless you know you have to, because in some cases it also breaks
|
||||
configuration on southbridges that don't need it.
|
||||
If you own Force CPCI735 motherboard or other OSB4 based systems you may need
|
||||
to change the SMBus Interrupt Select register so the SMBus controller uses
|
||||
the SMI mode.
|
||||
|
||||
1) Use lspci command and locate the PCI device with the SMBus controller:
|
||||
00:0f.0 ISA bridge: ServerWorks OSB4 South Bridge (rev 4f)
|
||||
The line may vary for different chipsets. Please consult the driver source
|
||||
for all possible PCI ids (and lspci -n to match them). Lets assume the
|
||||
device is located at 00:0f.0.
|
||||
2) Now you just need to change the value in 0xD2 register. Get it first with
|
||||
command: lspci -xxx -s 00:0f.0
|
||||
If the value is 0x3 then you need to change it to 0x1
|
||||
setpci -s 00:0f.0 d2.b=1
|
||||
|
||||
Please note that you don't need to do that in all cases, just when the SMBus is
|
||||
not working properly.
|
||||
|
||||
|
||||
Hardware-specific issues
|
||||
------------------------
|
||||
|
||||
This driver will refuse to load on IBM systems with an Intel PIIX4 SMBus.
|
||||
Some of these machines have an RFID EEPROM (24RF08) connected to the SMBus,
|
||||
which can easily get corrupted due to a state machine bug. These are mostly
|
||||
Thinkpad laptops, but desktop systems may also be affected. We have no list
|
||||
of all affected systems, so the only safe solution was to prevent access to
|
||||
the SMBus on all IBM systems (detected using DMI data.)
|
||||
|
||||
For additional information, read:
|
||||
http://www2.lm-sensors.nu/~lm78/cvs/lm_sensors2/README.thinkpad
|
||||
|
@ -2,14 +2,31 @@ Kernel driver scx200_acb
|
||||
|
||||
Author: Christer Weinigel <wingel@nano-system.com>
|
||||
|
||||
The driver supersedes the older, never merged driver named i2c-nscacb.
|
||||
|
||||
Module Parameters
|
||||
-----------------
|
||||
|
||||
* base: int
|
||||
* base: up to 4 ints
|
||||
Base addresses for the ACCESS.bus controllers on SCx200 and SC1100 devices
|
||||
|
||||
By default the driver uses two base addresses 0x820 and 0x840.
|
||||
If you want only one base address, specify the second as 0 so as to
|
||||
override this default.
|
||||
|
||||
Description
|
||||
-----------
|
||||
|
||||
Enable the use of the ACCESS.bus controller on the Geode SCx200 and
|
||||
SC1100 processors and the CS5535 and CS5536 Geode companion devices.
|
||||
|
||||
Device-specific notes
|
||||
---------------------
|
||||
|
||||
The SC1100 WRAP boards are known to use base addresses 0x810 and 0x820.
|
||||
If the scx200_acb driver is built into the kernel, add the following
|
||||
parameter to your boot command line:
|
||||
scx200_acb.base=0x810,0x820
|
||||
If the scx200_acb driver is built as a module, add the following line to
|
||||
the file /etc/modprobe.conf instead:
|
||||
options scx200_acb base=0x810,0x820
|
||||
|
208
Documentation/ia64/aliasing.txt
Normal file
208
Documentation/ia64/aliasing.txt
Normal file
@ -0,0 +1,208 @@
|
||||
MEMORY ATTRIBUTE ALIASING ON IA-64
|
||||
|
||||
Bjorn Helgaas
|
||||
<bjorn.helgaas@hp.com>
|
||||
May 4, 2006
|
||||
|
||||
|
||||
MEMORY ATTRIBUTES
|
||||
|
||||
Itanium supports several attributes for virtual memory references.
|
||||
The attribute is part of the virtual translation, i.e., it is
|
||||
contained in the TLB entry. The ones of most interest to the Linux
|
||||
kernel are:
|
||||
|
||||
WB Write-back (cacheable)
|
||||
UC Uncacheable
|
||||
WC Write-coalescing
|
||||
|
||||
System memory typically uses the WB attribute. The UC attribute is
|
||||
used for memory-mapped I/O devices. The WC attribute is uncacheable
|
||||
like UC is, but writes may be delayed and combined to increase
|
||||
performance for things like frame buffers.
|
||||
|
||||
The Itanium architecture requires that we avoid accessing the same
|
||||
page with both a cacheable mapping and an uncacheable mapping[1].
|
||||
|
||||
The design of the chipset determines which attributes are supported
|
||||
on which regions of the address space. For example, some chipsets
|
||||
support either WB or UC access to main memory, while others support
|
||||
only WB access.
|
||||
|
||||
MEMORY MAP
|
||||
|
||||
Platform firmware describes the physical memory map and the
|
||||
supported attributes for each region. At boot-time, the kernel uses
|
||||
the EFI GetMemoryMap() interface. ACPI can also describe memory
|
||||
devices and the attributes they support, but Linux/ia64 currently
|
||||
doesn't use this information.
|
||||
|
||||
The kernel uses the efi_memmap table returned from GetMemoryMap() to
|
||||
learn the attributes supported by each region of physical address
|
||||
space. Unfortunately, this table does not completely describe the
|
||||
address space because some machines omit some or all of the MMIO
|
||||
regions from the map.
|
||||
|
||||
The kernel maintains another table, kern_memmap, which describes the
|
||||
memory Linux is actually using and the attribute for each region.
|
||||
This contains only system memory; it does not contain MMIO space.
|
||||
|
||||
The kern_memmap table typically contains only a subset of the system
|
||||
memory described by the efi_memmap. Linux/ia64 can't use all memory
|
||||
in the system because of constraints imposed by the identity mapping
|
||||
scheme.
|
||||
|
||||
The efi_memmap table is preserved unmodified because the original
|
||||
boot-time information is required for kexec.
|
||||
|
||||
KERNEL IDENTITY MAPPINGS
|
||||
|
||||
Linux/ia64 identity mappings are done with large pages, currently
|
||||
either 16MB or 64MB, referred to as "granules." Cacheable mappings
|
||||
are speculative[2], so the processor can read any location in the
|
||||
page at any time, independent of the programmer's intentions. This
|
||||
means that to avoid attribute aliasing, Linux can create a cacheable
|
||||
identity mapping only when the entire granule supports cacheable
|
||||
access.
|
||||
|
||||
Therefore, kern_memmap contains only full granule-sized regions that
|
||||
can referenced safely by an identity mapping.
|
||||
|
||||
Uncacheable mappings are not speculative, so the processor will
|
||||
generate UC accesses only to locations explicitly referenced by
|
||||
software. This allows UC identity mappings to cover granules that
|
||||
are only partially populated, or populated with a combination of UC
|
||||
and WB regions.
|
||||
|
||||
USER MAPPINGS
|
||||
|
||||
User mappings are typically done with 16K or 64K pages. The smaller
|
||||
page size allows more flexibility because only 16K or 64K has to be
|
||||
homogeneous with respect to memory attributes.
|
||||
|
||||
POTENTIAL ATTRIBUTE ALIASING CASES
|
||||
|
||||
There are several ways the kernel creates new mappings:
|
||||
|
||||
mmap of /dev/mem
|
||||
|
||||
This uses remap_pfn_range(), which creates user mappings. These
|
||||
mappings may be either WB or UC. If the region being mapped
|
||||
happens to be in kern_memmap, meaning that it may also be mapped
|
||||
by a kernel identity mapping, the user mapping must use the same
|
||||
attribute as the kernel mapping.
|
||||
|
||||
If the region is not in kern_memmap, the user mapping should use
|
||||
an attribute reported as being supported in the EFI memory map.
|
||||
|
||||
Since the EFI memory map does not describe MMIO on some
|
||||
machines, this should use an uncacheable mapping as a fallback.
|
||||
|
||||
mmap of /sys/class/pci_bus/.../legacy_mem
|
||||
|
||||
This is very similar to mmap of /dev/mem, except that legacy_mem
|
||||
only allows mmap of the one megabyte "legacy MMIO" area for a
|
||||
specific PCI bus. Typically this is the first megabyte of
|
||||
physical address space, but it may be different on machines with
|
||||
several VGA devices.
|
||||
|
||||
"X" uses this to access VGA frame buffers. Using legacy_mem
|
||||
rather than /dev/mem allows multiple instances of X to talk to
|
||||
different VGA cards.
|
||||
|
||||
The /dev/mem mmap constraints apply.
|
||||
|
||||
However, since this is for mapping legacy MMIO space, WB access
|
||||
does not make sense. This matters on machines without legacy
|
||||
VGA support: these machines may have WB memory for the entire
|
||||
first megabyte (or even the entire first granule).
|
||||
|
||||
On these machines, we could mmap legacy_mem as WB, which would
|
||||
be safe in terms of attribute aliasing, but X has no way of
|
||||
knowing that it is accessing regular memory, not a frame buffer,
|
||||
so the kernel should fail the mmap rather than doing it with WB.
|
||||
|
||||
read/write of /dev/mem
|
||||
|
||||
This uses copy_from_user(), which implicitly uses a kernel
|
||||
identity mapping. This is obviously safe for things in
|
||||
kern_memmap.
|
||||
|
||||
There may be corner cases of things that are not in kern_memmap,
|
||||
but could be accessed this way. For example, registers in MMIO
|
||||
space are not in kern_memmap, but could be accessed with a UC
|
||||
mapping. This would not cause attribute aliasing. But
|
||||
registers typically can be accessed only with four-byte or
|
||||
eight-byte accesses, and the copy_from_user() path doesn't allow
|
||||
any control over the access size, so this would be dangerous.
|
||||
|
||||
ioremap()
|
||||
|
||||
This returns a kernel identity mapping for use inside the
|
||||
kernel.
|
||||
|
||||
If the region is in kern_memmap, we should use the attribute
|
||||
specified there. Otherwise, if the EFI memory map reports that
|
||||
the entire granule supports WB, we should use that (granules
|
||||
that are partially reserved or occupied by firmware do not appear
|
||||
in kern_memmap). Otherwise, we should use a UC mapping.
|
||||
|
||||
PAST PROBLEM CASES
|
||||
|
||||
mmap of various MMIO regions from /dev/mem by "X" on Intel platforms
|
||||
|
||||
The EFI memory map may not report these MMIO regions.
|
||||
|
||||
These must be allowed so that X will work. This means that
|
||||
when the EFI memory map is incomplete, every /dev/mem mmap must
|
||||
succeed. It may create either WB or UC user mappings, depending
|
||||
on whether the region is in kern_memmap or the EFI memory map.
|
||||
|
||||
mmap of 0x0-0xA0000 /dev/mem by "hwinfo" on HP sx1000 with VGA enabled
|
||||
|
||||
See https://bugzilla.novell.com/show_bug.cgi?id=140858.
|
||||
|
||||
The EFI memory map reports the following attributes:
|
||||
0x00000-0x9FFFF WB only
|
||||
0xA0000-0xBFFFF UC only (VGA frame buffer)
|
||||
0xC0000-0xFFFFF WB only
|
||||
|
||||
This mmap is done with user pages, not kernel identity mappings,
|
||||
so it is safe to use WB mappings.
|
||||
|
||||
The kernel VGA driver may ioremap the VGA frame buffer at 0xA0000,
|
||||
which will use a granule-sized UC mapping covering 0-0xFFFFF. This
|
||||
granule covers some WB-only memory, but since UC is non-speculative,
|
||||
the processor will never generate an uncacheable reference to the
|
||||
WB-only areas unless the driver explicitly touches them.
|
||||
|
||||
mmap of 0x0-0xFFFFF legacy_mem by "X"
|
||||
|
||||
If the EFI memory map reports this entire range as WB, there
|
||||
is no VGA MMIO hole, and the mmap should fail or be done with
|
||||
a WB mapping.
|
||||
|
||||
There's no easy way for X to determine whether the 0xA0000-0xBFFFF
|
||||
region is a frame buffer or just memory, so I think it's best to
|
||||
just fail this mmap request rather than using a WB mapping. As
|
||||
far as I know, there's no need to map legacy_mem with WB
|
||||
mappings.
|
||||
|
||||
Otherwise, a UC mapping of the entire region is probably safe.
|
||||
The VGA hole means the region will not be in kern_memmap. The
|
||||
HP sx1000 chipset doesn't support UC access to the memory surrounding
|
||||
the VGA hole, but X doesn't need that area anyway and should not
|
||||
reference it.
|
||||
|
||||
mmap of 0xA0000-0xBFFFF legacy_mem by "X" on HP sx1000 with VGA disabled
|
||||
|
||||
The EFI memory map reports the following attributes:
|
||||
0x00000-0xFFFFF WB only (no VGA MMIO hole)
|
||||
|
||||
This is a special case of the previous case, and the mmap should
|
||||
fail for the same reason as above.
|
||||
|
||||
NOTES
|
||||
|
||||
[1] SDM rev 2.2, vol 2, sec 4.4.1.
|
||||
[2] SDM rev 2.2, vol 2, sec 4.4.6.
|
@ -67,8 +67,7 @@ initrd adds the following new options:
|
||||
as the last process has closed it, all data is freed and /dev/initrd
|
||||
can't be opened anymore.
|
||||
|
||||
root=/dev/ram0 (without devfs)
|
||||
root=/dev/rd/0 (with devfs)
|
||||
root=/dev/ram0
|
||||
|
||||
initrd is mounted as root, and the normal boot procedure is followed,
|
||||
with the RAM disk still mounted as root.
|
||||
@ -90,8 +89,7 @@ you're building an install floppy), the root file system creation
|
||||
procedure should create the /initrd directory.
|
||||
|
||||
If initrd will not be mounted in some cases, its content is still
|
||||
accessible if the following device has been created (note that this
|
||||
does not work if using devfs):
|
||||
accessible if the following device has been created:
|
||||
|
||||
# mknod /dev/initrd b 1 250
|
||||
# chmod 400 /dev/initrd
|
||||
@ -119,8 +117,7 @@ We'll describe the loopback device method:
|
||||
(if space is critical, you may want to use the Minix FS instead of Ext2)
|
||||
3) mount the file system, e.g.
|
||||
# mount -t ext2 -o loop initrd /mnt
|
||||
4) create the console device (not necessary if using devfs, but it can't
|
||||
hurt to do it anyway):
|
||||
4) create the console device:
|
||||
# mkdir /mnt/dev
|
||||
# mknod /mnt/dev/console c 5 1
|
||||
5) copy all the files that are needed to properly use the initrd
|
||||
@ -152,12 +149,7 @@ have to be given:
|
||||
|
||||
root=/dev/ram0 init=/linuxrc rw
|
||||
|
||||
if not using devfs, or
|
||||
|
||||
root=/dev/rd/0 init=/linuxrc rw
|
||||
|
||||
if using devfs. (rw is only necessary if writing to the initrd file
|
||||
system.)
|
||||
(rw is only necessary if writing to the initrd file system.)
|
||||
|
||||
With LOADLIN, you simply execute
|
||||
|
||||
@ -217,9 +209,9 @@ following command:
|
||||
# exec chroot . what-follows <dev/console >dev/console 2>&1
|
||||
|
||||
Where what-follows is a program under the new root, e.g. /sbin/init
|
||||
If the new root file system will be used with devfs and has no valid
|
||||
/dev directory, devfs must be mounted before invoking chroot in order to
|
||||
provide /dev/console.
|
||||
If the new root file system will be used with udev and has no valid
|
||||
/dev directory, udev must be initialized before invoking chroot in order
|
||||
to provide /dev/console.
|
||||
|
||||
Note: implementation details of pivot_root may change with time. In order
|
||||
to ensure compatibility, the following points should be observed:
|
||||
@ -236,7 +228,7 @@ Now, the initrd can be unmounted and the memory allocated by the RAM
|
||||
disk can be freed:
|
||||
|
||||
# umount /initrd
|
||||
# blockdev --flushbufs /dev/ram0 # /dev/rd/0 if using devfs
|
||||
# blockdev --flushbufs /dev/ram0
|
||||
|
||||
It is also possible to use initrd with an NFS-mounted root, see the
|
||||
pivot_root(8) man page for details.
|
||||
|
@ -85,7 +85,9 @@ Code Seq# Include File Comments
|
||||
<mailto:maassen@uni-freiburg.de>
|
||||
'C' all linux/soundcard.h
|
||||
'D' all asm-s390/dasd.h
|
||||
'E' all linux/input.h
|
||||
'F' all linux/fb.h
|
||||
'H' all linux/hiddev.h
|
||||
'I' all linux/isdn.h
|
||||
'J' 00-1F drivers/scsi/gdth_ioctl.h
|
||||
'K' all linux/kd.h
|
||||
@ -117,7 +119,6 @@ Code Seq# Include File Comments
|
||||
'c' 00-7F linux/comstats.h conflict!
|
||||
'c' 00-7F linux/coda.h conflict!
|
||||
'd' 00-FF linux/char/drm/drm/h conflict!
|
||||
'd' 00-1F linux/devfs_fs.h conflict!
|
||||
'd' 00-DF linux/video_decoder.h conflict!
|
||||
'd' F0-FF linux/digi1.h
|
||||
'e' all linux/digi1.h conflict!
|
||||
|
@ -124,7 +124,8 @@ GigaSet 307x Device Driver
|
||||
|
||||
You can use some configuration tool of your distribution to configure this
|
||||
"modem" or configure pppd/wvdial manually. There are some example ppp
|
||||
configuration files and chat scripts in the gigaset-VERSION/ppp directory.
|
||||
configuration files and chat scripts in the gigaset-VERSION/ppp directory
|
||||
in the driver packages from http://sourceforge.net/projects/gigaset307x/.
|
||||
Please note that the USB drivers are not able to change the state of the
|
||||
control lines (the M105 driver can be configured to use some undocumented
|
||||
control requests, if you really need the control lines, though). This means
|
||||
@ -164,8 +165,8 @@ GigaSet 307x Device Driver
|
||||
|
||||
If you want both of these at once, you are out of luck.
|
||||
|
||||
You can also use /sys/module/<name>/parameters/cidmode for changing
|
||||
the CID mode setting (<name> is usb_gigaset or bas_gigaset).
|
||||
You can also use /sys/class/tty/ttyGxy/cidmode for changing the CID mode
|
||||
setting (ttyGxy is ttyGU0 or ttyGB0).
|
||||
|
||||
|
||||
3. Troubleshooting
|
||||
|
@ -1123,6 +1123,14 @@ The top Makefile exports the following variables:
|
||||
$(INSTALL_MOD_PATH)/lib/modules/$(KERNELRELEASE). The user may
|
||||
override this value on the command line if desired.
|
||||
|
||||
INSTALL_MOD_STRIP
|
||||
|
||||
If this variable is specified, will cause modules to be stripped
|
||||
after they are installed. If INSTALL_MOD_STRIP is '1', then the
|
||||
default option --strip-debug will be used. Otherwise,
|
||||
INSTALL_MOD_STRIP will used as the option(s) to the strip command.
|
||||
|
||||
|
||||
=== 8 Makefile language
|
||||
|
||||
The kernel Makefiles are designed to run with GNU Make. The Makefiles
|
||||
|
@ -175,7 +175,7 @@ end
|
||||
document trapinfo
|
||||
Run info threads and lookup pid of thread #1
|
||||
'trapinfo <pid>' will tell you by which trap & possibly
|
||||
addresthe kernel paniced.
|
||||
address the kernel panicked.
|
||||
end
|
||||
|
||||
|
||||
|
@ -1,155 +1,325 @@
|
||||
Documentation for kdump - the kexec-based crash dumping solution
|
||||
================================================================
|
||||
Documentation for Kdump - The kexec-based Crash Dumping Solution
|
||||
================================================================
|
||||
|
||||
DESIGN
|
||||
======
|
||||
This document includes overview, setup and installation, and analysis
|
||||
information.
|
||||
|
||||
Kdump uses kexec to reboot to a second kernel whenever a dump needs to be
|
||||
taken. This second kernel is booted with very little memory. The first kernel
|
||||
reserves the section of memory that the second kernel uses. This ensures that
|
||||
on-going DMA from the first kernel does not corrupt the second kernel.
|
||||
Overview
|
||||
========
|
||||
|
||||
All the necessary information about Core image is encoded in ELF format and
|
||||
stored in reserved area of memory before crash. Physical address of start of
|
||||
ELF header is passed to new kernel through command line parameter elfcorehdr=.
|
||||
Kdump uses kexec to quickly boot to a dump-capture kernel whenever a
|
||||
dump of the system kernel's memory needs to be taken (for example, when
|
||||
the system panics). The system kernel's memory image is preserved across
|
||||
the reboot and is accessible to the dump-capture kernel.
|
||||
|
||||
On i386, the first 640 KB of physical memory is needed to boot, irrespective
|
||||
of where the kernel loads. Hence, this region is backed up by kexec just before
|
||||
rebooting into the new kernel.
|
||||
You can use common Linux commands, such as cp and scp, to copy the
|
||||
memory image to a dump file on the local disk, or across the network to
|
||||
a remote system.
|
||||
|
||||
In the second kernel, "old memory" can be accessed in two ways.
|
||||
Kdump and kexec are currently supported on the x86, x86_64, and ppc64
|
||||
architectures.
|
||||
|
||||
- The first one is through a /dev/oldmem device interface. A capture utility
|
||||
can read the device file and write out the memory in raw format. This is raw
|
||||
dump of memory and analysis/capture tool should be intelligent enough to
|
||||
determine where to look for the right information. ELF headers (elfcorehdr=)
|
||||
can become handy here.
|
||||
When the system kernel boots, it reserves a small section of memory for
|
||||
the dump-capture kernel. This ensures that ongoing Direct Memory Access
|
||||
(DMA) from the system kernel does not corrupt the dump-capture kernel.
|
||||
The kexec -p command loads the dump-capture kernel into this reserved
|
||||
memory.
|
||||
|
||||
- The second interface is through /proc/vmcore. This exports the dump as an ELF
|
||||
format file which can be written out using any file copy command
|
||||
(cp, scp, etc). Further, gdb can be used to perform limited debugging on
|
||||
the dump file. This method ensures methods ensure that there is correct
|
||||
ordering of the dump pages (corresponding to the first 640 KB that has been
|
||||
relocated).
|
||||
On x86 machines, the first 640 KB of physical memory is needed to boot,
|
||||
regardless of where the kernel loads. Therefore, kexec backs up this
|
||||
region just before rebooting into the dump-capture kernel.
|
||||
|
||||
SETUP
|
||||
=====
|
||||
All of the necessary information about the system kernel's core image is
|
||||
encoded in the ELF format, and stored in a reserved area of memory
|
||||
before a crash. The physical address of the start of the ELF header is
|
||||
passed to the dump-capture kernel through the elfcorehdr= boot
|
||||
parameter.
|
||||
|
||||
1) Download the upstream kexec-tools userspace package from
|
||||
http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.101.tar.gz.
|
||||
With the dump-capture kernel, you can access the memory image, or "old
|
||||
memory," in two ways:
|
||||
|
||||
Apply the latest consolidated kdump patch on top of kexec-tools-1.101
|
||||
from http://lse.sourceforge.net/kdump/. This arrangment has been made
|
||||
till all the userspace patches supporting kdump are integrated with
|
||||
upstream kexec-tools userspace.
|
||||
- Through a /dev/oldmem device interface. A capture utility can read the
|
||||
device file and write out the memory in raw format. This is a raw dump
|
||||
of memory. Analysis and capture tools must be intelligent enough to
|
||||
determine where to look for the right information.
|
||||
|
||||
2) Download and build the appropriate (2.6.13-rc1 onwards) vanilla kernels.
|
||||
Two kernels need to be built in order to get this feature working.
|
||||
Following are the steps to properly configure the two kernels specific
|
||||
to kexec and kdump features:
|
||||
|
||||
A) First kernel or regular kernel:
|
||||
----------------------------------
|
||||
a) Enable "kexec system call" feature (in Processor type and features).
|
||||
CONFIG_KEXEC=y
|
||||
b) Enable "sysfs file system support" (in Pseudo filesystems).
|
||||
CONFIG_SYSFS=y
|
||||
c) make
|
||||
d) Boot into first kernel with the command line parameter "crashkernel=Y@X".
|
||||
Use appropriate values for X and Y. Y denotes how much memory to reserve
|
||||
for the second kernel, and X denotes at what physical address the
|
||||
reserved memory section starts. For example: "crashkernel=64M@16M".
|
||||
- Through /proc/vmcore. This exports the dump as an ELF-format file that
|
||||
you can write out using file copy commands such as cp or scp. Further,
|
||||
you can use analysis tools such as the GNU Debugger (GDB) and the Crash
|
||||
tool to debug the dump file. This method ensures that the dump pages are
|
||||
correctly ordered.
|
||||
|
||||
|
||||
B) Second kernel or dump capture kernel:
|
||||
---------------------------------------
|
||||
a) For i386 architecture enable Highmem support
|
||||
CONFIG_HIGHMEM=y
|
||||
b) Enable "kernel crash dumps" feature (under "Processor type and features")
|
||||
CONFIG_CRASH_DUMP=y
|
||||
c) Make sure a suitable value for "Physical address where the kernel is
|
||||
loaded" (under "Processor type and features"). By default this value
|
||||
is 0x1000000 (16MB) and it should be same as X (See option d above),
|
||||
e.g., 16 MB or 0x1000000.
|
||||
CONFIG_PHYSICAL_START=0x1000000
|
||||
d) Enable "/proc/vmcore support" (Optional, under "Pseudo filesystems").
|
||||
CONFIG_PROC_VMCORE=y
|
||||
Setup and Installation
|
||||
======================
|
||||
|
||||
3) After booting to regular kernel or first kernel, load the second kernel
|
||||
using the following command:
|
||||
Install kexec-tools and the Kdump patch
|
||||
---------------------------------------
|
||||
|
||||
kexec -p <second-kernel> --args-linux --elf32-core-headers
|
||||
--append="root=<root-dev> init 1 irqpoll maxcpus=1"
|
||||
1) Login as the root user.
|
||||
|
||||
Notes:
|
||||
======
|
||||
i) <second-kernel> has to be a vmlinux image ie uncompressed elf image.
|
||||
bzImage will not work, as of now.
|
||||
ii) --args-linux has to be speicfied as if kexec it loading an elf image,
|
||||
it needs to know that the arguments supplied are of linux type.
|
||||
iii) By default ELF headers are stored in ELF64 format to support systems
|
||||
with more than 4GB memory. Option --elf32-core-headers forces generation
|
||||
of ELF32 headers. The reason for this option being, as of now gdb can
|
||||
not open vmcore file with ELF64 headers on a 32 bit systems. So ELF32
|
||||
headers can be used if one has non-PAE systems and hence memory less
|
||||
than 4GB.
|
||||
iv) Specify "irqpoll" as command line parameter. This reduces driver
|
||||
initialization failures in second kernel due to shared interrupts.
|
||||
v) <root-dev> needs to be specified in a format corresponding to the root
|
||||
device name in the output of mount command.
|
||||
vi) If you have built the drivers required to mount root file system as
|
||||
modules in <second-kernel>, then, specify
|
||||
--initrd=<initrd-for-second-kernel>.
|
||||
vii) Specify maxcpus=1 as, if during first kernel run, if panic happens on
|
||||
non-boot cpus, second kernel doesn't seem to be boot up all the cpus.
|
||||
The other option is to always built the second kernel without SMP
|
||||
support ie CONFIG_SMP=n
|
||||
2) Download the kexec-tools user-space package from the following URL:
|
||||
|
||||
4) After successfully loading the second kernel as above, if a panic occurs
|
||||
system reboots into the second kernel. A module can be written to force
|
||||
the panic or "ALT-SysRq-c" can be used initiate a crash dump for testing
|
||||
purposes.
|
||||
http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.101.tar.gz
|
||||
|
||||
5) Once the second kernel has booted, write out the dump file using
|
||||
3) Unpack the tarball with the tar command, as follows:
|
||||
|
||||
tar xvpzf kexec-tools-1.101.tar.gz
|
||||
|
||||
4) Download the latest consolidated Kdump patch from the following URL:
|
||||
|
||||
http://lse.sourceforge.net/kdump/
|
||||
|
||||
(This location is being used until all the user-space Kdump patches
|
||||
are integrated with the kexec-tools package.)
|
||||
|
||||
5) Change to the kexec-tools-1.101 directory, as follows:
|
||||
|
||||
cd kexec-tools-1.101
|
||||
|
||||
6) Apply the consolidated patch to the kexec-tools-1.101 source tree
|
||||
with the patch command, as follows. (Modify the path to the downloaded
|
||||
patch as necessary.)
|
||||
|
||||
patch -p1 < /path-to-kdump-patch/kexec-tools-1.101-kdump.patch
|
||||
|
||||
7) Configure the package, as follows:
|
||||
|
||||
./configure
|
||||
|
||||
8) Compile the package, as follows:
|
||||
|
||||
make
|
||||
|
||||
9) Install the package, as follows:
|
||||
|
||||
make install
|
||||
|
||||
|
||||
Download and build the system and dump-capture kernels
|
||||
------------------------------------------------------
|
||||
|
||||
Download the mainline (vanilla) kernel source code (2.6.13-rc1 or newer)
|
||||
from http://www.kernel.org. Two kernels must be built: a system kernel
|
||||
and a dump-capture kernel. Use the following steps to configure these
|
||||
kernels with the necessary kexec and Kdump features:
|
||||
|
||||
System kernel
|
||||
-------------
|
||||
|
||||
1) Enable "kexec system call" in "Processor type and features."
|
||||
|
||||
CONFIG_KEXEC=y
|
||||
|
||||
2) Enable "sysfs file system support" in "Filesystem" -> "Pseudo
|
||||
filesystems." This is usually enabled by default.
|
||||
|
||||
CONFIG_SYSFS=y
|
||||
|
||||
Note that "sysfs file system support" might not appear in the "Pseudo
|
||||
filesystems" menu if "Configure standard kernel features (for small
|
||||
systems)" is not enabled in "General Setup." In this case, check the
|
||||
.config file itself to ensure that sysfs is turned on, as follows:
|
||||
|
||||
grep 'CONFIG_SYSFS' .config
|
||||
|
||||
3) Enable "Compile the kernel with debug info" in "Kernel hacking."
|
||||
|
||||
CONFIG_DEBUG_INFO=Y
|
||||
|
||||
This causes the kernel to be built with debug symbols. The dump
|
||||
analysis tools require a vmlinux with debug symbols in order to read
|
||||
and analyze a dump file.
|
||||
|
||||
4) Make and install the kernel and its modules. Update the boot loader
|
||||
(such as grub, yaboot, or lilo) configuration files as necessary.
|
||||
|
||||
5) Boot the system kernel with the boot parameter "crashkernel=Y@X",
|
||||
where Y specifies how much memory to reserve for the dump-capture kernel
|
||||
and X specifies the beginning of this reserved memory. For example,
|
||||
"crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
|
||||
starting at physical address 0x01000000 for the dump-capture kernel.
|
||||
|
||||
On x86 and x86_64, use "crashkernel=64M@16M".
|
||||
|
||||
On ppc64, use "crashkernel=128M@32M".
|
||||
|
||||
|
||||
The dump-capture kernel
|
||||
-----------------------
|
||||
|
||||
1) Under "General setup," append "-kdump" to the current string in
|
||||
"Local version."
|
||||
|
||||
2) On x86, enable high memory support under "Processor type and
|
||||
features":
|
||||
|
||||
CONFIG_HIGHMEM64G=y
|
||||
or
|
||||
CONFIG_HIGHMEM4G
|
||||
|
||||
3) On x86 and x86_64, disable symmetric multi-processing support
|
||||
under "Processor type and features":
|
||||
|
||||
CONFIG_SMP=n
|
||||
(If CONFIG_SMP=y, then specify maxcpus=1 on the kernel command line
|
||||
when loading the dump-capture kernel, see section "Load the Dump-capture
|
||||
Kernel".)
|
||||
|
||||
4) On ppc64, disable NUMA support and enable EMBEDDED support:
|
||||
|
||||
CONFIG_NUMA=n
|
||||
CONFIG_EMBEDDED=y
|
||||
CONFIG_EEH=N for the dump-capture kernel
|
||||
|
||||
5) Enable "kernel crash dumps" support under "Processor type and
|
||||
features":
|
||||
|
||||
CONFIG_CRASH_DUMP=y
|
||||
|
||||
6) Use a suitable value for "Physical address where the kernel is
|
||||
loaded" (under "Processor type and features"). This only appears when
|
||||
"kernel crash dumps" is enabled. By default this value is 0x1000000
|
||||
(16MB). It should be the same as X in the "crashkernel=Y@X" boot
|
||||
parameter discussed above.
|
||||
|
||||
On x86 and x86_64, use "CONFIG_PHYSICAL_START=0x1000000".
|
||||
|
||||
On ppc64 the value is automatically set at 32MB when
|
||||
CONFIG_CRASH_DUMP is set.
|
||||
|
||||
6) Optionally enable "/proc/vmcore support" under "Filesystems" ->
|
||||
"Pseudo filesystems".
|
||||
|
||||
CONFIG_PROC_VMCORE=y
|
||||
(CONFIG_PROC_VMCORE is set by default when CONFIG_CRASH_DUMP is selected.)
|
||||
|
||||
7) Make and install the kernel and its modules. DO NOT add this kernel
|
||||
to the boot loader configuration files.
|
||||
|
||||
|
||||
Load the Dump-capture Kernel
|
||||
============================
|
||||
|
||||
After booting to the system kernel, load the dump-capture kernel using
|
||||
the following command:
|
||||
|
||||
kexec -p <dump-capture-kernel> \
|
||||
--initrd=<initrd-for-dump-capture-kernel> --args-linux \
|
||||
--append="root=<root-dev> init 1 irqpoll"
|
||||
|
||||
|
||||
Notes on loading the dump-capture kernel:
|
||||
|
||||
* <dump-capture-kernel> must be a vmlinux image (that is, an
|
||||
uncompressed ELF image). bzImage does not work at this time.
|
||||
|
||||
* By default, the ELF headers are stored in ELF64 format to support
|
||||
systems with more than 4GB memory. The --elf32-core-headers option can
|
||||
be used to force the generation of ELF32 headers. This is necessary
|
||||
because GDB currently cannot open vmcore files with ELF64 headers on
|
||||
32-bit systems. ELF32 headers can be used on non-PAE systems (that is,
|
||||
less than 4GB of memory).
|
||||
|
||||
* The "irqpoll" boot parameter reduces driver initialization failures
|
||||
due to shared interrupts in the dump-capture kernel.
|
||||
|
||||
* You must specify <root-dev> in the format corresponding to the root
|
||||
device name in the output of mount command.
|
||||
|
||||
* "init 1" boots the dump-capture kernel into single-user mode without
|
||||
networking. If you want networking, use "init 3."
|
||||
|
||||
|
||||
Kernel Panic
|
||||
============
|
||||
|
||||
After successfully loading the dump-capture kernel as previously
|
||||
described, the system will reboot into the dump-capture kernel if a
|
||||
system crash is triggered. Trigger points are located in panic(),
|
||||
die(), die_nmi() and in the sysrq handler (ALT-SysRq-c).
|
||||
|
||||
The following conditions will execute a crash trigger point:
|
||||
|
||||
If a hard lockup is detected and "NMI watchdog" is configured, the system
|
||||
will boot into the dump-capture kernel ( die_nmi() ).
|
||||
|
||||
If die() is called, and it happens to be a thread with pid 0 or 1, or die()
|
||||
is called inside interrupt context or die() is called and panic_on_oops is set,
|
||||
the system will boot into the dump-capture kernel.
|
||||
|
||||
On powererpc systems when a soft-reset is generated, die() is called by all cpus and the system system will boot into the dump-capture kernel.
|
||||
|
||||
For testing purposes, you can trigger a crash by using "ALT-SysRq-c",
|
||||
"echo c > /proc/sysrq-trigger or write a module to force the panic.
|
||||
|
||||
Write Out the Dump File
|
||||
=======================
|
||||
|
||||
After the dump-capture kernel is booted, write out the dump file with
|
||||
the following command:
|
||||
|
||||
cp /proc/vmcore <dump-file>
|
||||
|
||||
Dump memory can also be accessed as a /dev/oldmem device for a linear/raw
|
||||
view. To create the device, type:
|
||||
You can also access dumped memory as a /dev/oldmem device for a linear
|
||||
and raw view. To create the device, use the following command:
|
||||
|
||||
mknod /dev/oldmem c 1 12
|
||||
mknod /dev/oldmem c 1 12
|
||||
|
||||
Use "dd" with suitable options for count, bs and skip to access specific
|
||||
portions of the dump.
|
||||
Use the dd command with suitable options for count, bs, and skip to
|
||||
access specific portions of the dump.
|
||||
|
||||
Entire memory: dd if=/dev/oldmem of=oldmem.001
|
||||
To see the entire memory, use the following command:
|
||||
|
||||
dd if=/dev/oldmem of=oldmem.001
|
||||
|
||||
|
||||
ANALYSIS
|
||||
Analysis
|
||||
========
|
||||
Limited analysis can be done using gdb on the dump file copied out of
|
||||
/proc/vmcore. Use vmlinux built with -g and run
|
||||
|
||||
gdb vmlinux <dump-file>
|
||||
Before analyzing the dump image, you should reboot into a stable kernel.
|
||||
|
||||
Stack trace for the task on processor 0, register display, memory display
|
||||
work fine.
|
||||
You can do limited analysis using GDB on the dump file copied out of
|
||||
/proc/vmcore. Use the debug vmlinux built with -g and run the following
|
||||
command:
|
||||
|
||||
Note: gdb cannot analyse core files generated in ELF64 format for i386.
|
||||
gdb vmlinux <dump-file>
|
||||
|
||||
Latest "crash" (crash-4.0-2.18) as available on Dave Anderson's site
|
||||
http://people.redhat.com/~anderson/ works well with kdump format.
|
||||
Stack trace for the task on processor 0, register display, and memory
|
||||
display work fine.
|
||||
|
||||
Note: GDB cannot analyze core files generated in ELF64 format for x86.
|
||||
On systems with a maximum of 4GB of memory, you can generate
|
||||
ELF32-format headers using the --elf32-core-headers kernel option on the
|
||||
dump kernel.
|
||||
|
||||
You can also use the Crash utility to analyze dump files in Kdump
|
||||
format. Crash is available on Dave Anderson's site at the following URL:
|
||||
|
||||
http://people.redhat.com/~anderson/
|
||||
|
||||
|
||||
TODO
|
||||
====
|
||||
1) Provide a kernel pages filtering mechanism so that core file size is not
|
||||
insane on systems having huge memory banks.
|
||||
2) Relocatable kernel can help in maintaining multiple kernels for crashdump
|
||||
and same kernel as the first kernel can be used to capture the dump.
|
||||
To Do
|
||||
=====
|
||||
|
||||
1) Provide a kernel pages filtering mechanism, so core file size is not
|
||||
extreme on systems with huge memory banks.
|
||||
|
||||
2) Relocatable kernel can help in maintaining multiple kernels for
|
||||
crash_dump, and the same kernel as the system kernel can be used to
|
||||
capture the dump.
|
||||
|
||||
|
||||
CONTACT
|
||||
Contact
|
||||
=======
|
||||
|
||||
Vivek Goyal (vgoyal@in.ibm.com)
|
||||
Maneesh Soni (maneesh@in.ibm.com)
|
||||
|
||||
|
||||
Trademark
|
||||
=========
|
||||
|
||||
Linux is a trademark of Linus Torvalds in the United States, other
|
||||
countries, or both.
|
||||
|
@ -35,7 +35,6 @@ parameter is applicable:
|
||||
APM Advanced Power Management support is enabled.
|
||||
AX25 Appropriate AX.25 support is enabled.
|
||||
CD Appropriate CD support is enabled.
|
||||
DEVFS devfs support is enabled.
|
||||
DRM Direct Rendering Management support is enabled.
|
||||
EDD BIOS Enhanced Disk Drive Services (EDD) is enabled
|
||||
EFI EFI Partitioning (GPT) is enabled
|
||||
@ -61,6 +60,7 @@ parameter is applicable:
|
||||
MTD MTD support is enabled.
|
||||
NET Appropriate network support is enabled.
|
||||
NUMA NUMA support is enabled.
|
||||
GENERIC_TIME The generic timeofday code is enabled.
|
||||
NFS Appropriate NFS support is enabled.
|
||||
OSS OSS sound support is enabled.
|
||||
PARIDE The ParIDE subsystem is enabled.
|
||||
@ -147,6 +147,9 @@ running once the system is up.
|
||||
acpi_irq_isa= [HW,ACPI] If irq_balance, mark listed IRQs used by ISA
|
||||
Format: <irq>,<irq>...
|
||||
|
||||
acpi_os_name= [HW,ACPI] Tell ACPI BIOS the name of the OS
|
||||
Format: To spoof as Windows 98: ="Microsoft Windows"
|
||||
|
||||
acpi_osi= [HW,ACPI] empty param disables _OSI
|
||||
|
||||
acpi_serialize [HW,ACPI] force serialization of AML methods
|
||||
@ -176,6 +179,11 @@ running once the system is up.
|
||||
override platform specific driver.
|
||||
See also Documentation/acpi-hotkey.txt.
|
||||
|
||||
acpi_pm_good [IA-32,X86-64]
|
||||
Override the pmtimer bug detection: force the kernel
|
||||
to assume that this machine's pmtimer latches its value
|
||||
and always returns good values.
|
||||
|
||||
enable_timer_pin_1 [i386,x86-64]
|
||||
Enable PIN 1 of APIC timer
|
||||
Can be useful to work around chipset bugs
|
||||
@ -338,10 +346,11 @@ running once the system is up.
|
||||
Value can be changed at runtime via
|
||||
/selinux/checkreqprot.
|
||||
|
||||
clock= [BUGS=IA-32,HW] gettimeofday timesource override.
|
||||
Forces specified timesource (if avaliable) to be used
|
||||
when calculating gettimeofday(). If specicified
|
||||
timesource is not avalible, it defaults to PIT.
|
||||
clock= [BUGS=IA-32, HW] gettimeofday clocksource override.
|
||||
[Deprecated]
|
||||
Forces specified clocksource (if avaliable) to be used
|
||||
when calculating gettimeofday(). If specified
|
||||
clocksource is not avalible, it defaults to PIT.
|
||||
Format: { pit | tsc | cyclone | pmtmr }
|
||||
|
||||
disable_8254_timer
|
||||
@ -430,9 +439,6 @@ running once the system is up.
|
||||
Format: <area>[,<node>]
|
||||
See also Documentation/networking/decnet.txt.
|
||||
|
||||
devfs= [DEVFS]
|
||||
See Documentation/filesystems/devfs/boot-options.
|
||||
|
||||
dhash_entries= [KNL]
|
||||
Set number of hash buckets for dentry cache.
|
||||
|
||||
@ -1614,6 +1620,10 @@ running once the system is up.
|
||||
|
||||
time Show timing data prefixed to each printk message line
|
||||
|
||||
clocksource= [GENERIC_TIME] Override the default clocksource
|
||||
Override the default clocksource and use the clocksource
|
||||
with the name specified.
|
||||
|
||||
tipar.timeout= [HW,PPT]
|
||||
Set communications timeout in tenths of a second
|
||||
(default 15).
|
||||
@ -1655,6 +1665,10 @@ running once the system is up.
|
||||
usbhid.mousepoll=
|
||||
[USBHID] The interval which mice are to be polled at.
|
||||
|
||||
vdso= [IA-32]
|
||||
vdso=1: enable VDSO (default)
|
||||
vdso=0: disable VDSO mapping
|
||||
|
||||
video= [FB] Frame buffer configuration
|
||||
See Documentation/fb/modedb.txt.
|
||||
|
||||
@ -1671,9 +1685,14 @@ running once the system is up.
|
||||
decrease the size and leave more room for directly
|
||||
mapped kernel RAM.
|
||||
|
||||
vmhalt= [KNL,S390]
|
||||
vmhalt= [KNL,S390] Perform z/VM CP command after system halt.
|
||||
Format: <command>
|
||||
|
||||
vmpoff= [KNL,S390]
|
||||
vmpanic= [KNL,S390] Perform z/VM CP command after kernel panic.
|
||||
Format: <command>
|
||||
|
||||
vmpoff= [KNL,S390] Perform z/VM CP command after power off.
|
||||
Format: <command>
|
||||
|
||||
waveartist= [HW,OSS]
|
||||
Format: <io>,<irq>,<dma>,<dma2>
|
||||
|
@ -3,16 +3,23 @@
|
||||
===================
|
||||
|
||||
The key request service is part of the key retention service (refer to
|
||||
Documentation/keys.txt). This document explains more fully how that the
|
||||
requesting algorithm works.
|
||||
Documentation/keys.txt). This document explains more fully how the requesting
|
||||
algorithm works.
|
||||
|
||||
The process starts by either the kernel requesting a service by calling
|
||||
request_key():
|
||||
request_key*():
|
||||
|
||||
struct key *request_key(const struct key_type *type,
|
||||
const char *description,
|
||||
const char *callout_string);
|
||||
|
||||
or:
|
||||
|
||||
struct key *request_key_with_auxdata(const struct key_type *type,
|
||||
const char *description,
|
||||
const char *callout_string,
|
||||
void *aux);
|
||||
|
||||
Or by userspace invoking the request_key system call:
|
||||
|
||||
key_serial_t request_key(const char *type,
|
||||
@ -20,16 +27,26 @@ Or by userspace invoking the request_key system call:
|
||||
const char *callout_info,
|
||||
key_serial_t dest_keyring);
|
||||
|
||||
The main difference between the two access points is that the in-kernel
|
||||
interface does not need to link the key to a keyring to prevent it from being
|
||||
immediately destroyed. The kernel interface returns a pointer directly to the
|
||||
key, and it's up to the caller to destroy the key.
|
||||
The main difference between the access points is that the in-kernel interface
|
||||
does not need to link the key to a keyring to prevent it from being immediately
|
||||
destroyed. The kernel interface returns a pointer directly to the key, and
|
||||
it's up to the caller to destroy the key.
|
||||
|
||||
The request_key_with_auxdata() call is like the in-kernel request_key() call,
|
||||
except that it permits auxiliary data to be passed to the upcaller (the default
|
||||
is NULL). This is only useful for those key types that define their own upcall
|
||||
mechanism rather than using /sbin/request-key.
|
||||
|
||||
The userspace interface links the key to a keyring associated with the process
|
||||
to prevent the key from going away, and returns the serial number of the key to
|
||||
the caller.
|
||||
|
||||
|
||||
The following example assumes that the key types involved don't define their
|
||||
own upcall mechanisms. If they do, then those should be substituted for the
|
||||
forking and execution of /sbin/request-key.
|
||||
|
||||
|
||||
===========
|
||||
THE PROCESS
|
||||
===========
|
||||
@ -40,8 +57,8 @@ A request proceeds in the following manner:
|
||||
interface].
|
||||
|
||||
(2) request_key() searches the process's subscribed keyrings to see if there's
|
||||
a suitable key there. If there is, it returns the key. If there isn't, and
|
||||
callout_info is not set, an error is returned. Otherwise the process
|
||||
a suitable key there. If there is, it returns the key. If there isn't,
|
||||
and callout_info is not set, an error is returned. Otherwise the process
|
||||
proceeds to the next step.
|
||||
|
||||
(3) request_key() sees that A doesn't have the desired key yet, so it creates
|
||||
@ -62,7 +79,7 @@ A request proceeds in the following manner:
|
||||
instantiation.
|
||||
|
||||
(7) The program may want to access another key from A's context (say a
|
||||
Kerberos TGT key). It just requests the appropriate key, and the keyring
|
||||
Kerberos TGT key). It just requests the appropriate key, and the keyring
|
||||
search notes that the session keyring has auth key V in its bottom level.
|
||||
|
||||
This will permit it to then search the keyrings of process A with the
|
||||
@ -79,10 +96,11 @@ A request proceeds in the following manner:
|
||||
(10) The program then exits 0 and request_key() deletes key V and returns key
|
||||
U to the caller.
|
||||
|
||||
This also extends further. If key W (step 7 above) didn't exist, key W would be
|
||||
created uninstantiated, another auth key (X) would be created (as per step 3)
|
||||
and another copy of /sbin/request-key spawned (as per step 4); but the context
|
||||
specified by auth key X will still be process A, as it was in auth key V.
|
||||
This also extends further. If key W (step 7 above) didn't exist, key W would
|
||||
be created uninstantiated, another auth key (X) would be created (as per step
|
||||
3) and another copy of /sbin/request-key spawned (as per step 4); but the
|
||||
context specified by auth key X will still be process A, as it was in auth key
|
||||
V.
|
||||
|
||||
This is because process A's keyrings can't simply be attached to
|
||||
/sbin/request-key at the appropriate places because (a) execve will discard two
|
||||
@ -118,17 +136,17 @@ A search of any particular keyring proceeds in the following fashion:
|
||||
|
||||
(2) It considers all the non-keyring keys within that keyring and, if any key
|
||||
matches the criteria specified, calls key_permission(SEARCH) on it to see
|
||||
if the key is allowed to be found. If it is, that key is returned; if
|
||||
if the key is allowed to be found. If it is, that key is returned; if
|
||||
not, the search continues, and the error code is retained if of higher
|
||||
priority than the one currently set.
|
||||
|
||||
(3) It then considers all the keyring-type keys in the keyring it's currently
|
||||
searching. It calls key_permission(SEARCH) on each keyring, and if this
|
||||
searching. It calls key_permission(SEARCH) on each keyring, and if this
|
||||
grants permission, it recurses, executing steps (2) and (3) on that
|
||||
keyring.
|
||||
|
||||
The process stops immediately a valid key is found with permission granted to
|
||||
use it. Any error from a previous match attempt is discarded and the key is
|
||||
use it. Any error from a previous match attempt is discarded and the key is
|
||||
returned.
|
||||
|
||||
When search_process_keyrings() is invoked, it performs the following searches
|
||||
@ -153,7 +171,7 @@ The moment one succeeds, all pending errors are discarded and the found key is
|
||||
returned.
|
||||
|
||||
Only if all these fail does the whole thing fail with the highest priority
|
||||
error. Note that several errors may have come from LSM.
|
||||
error. Note that several errors may have come from LSM.
|
||||
|
||||
The error priority is:
|
||||
|
||||
|
@ -19,6 +19,7 @@ This document has the following sections:
|
||||
- Key overview
|
||||
- Key service overview
|
||||
- Key access permissions
|
||||
- SELinux support
|
||||
- New procfs files
|
||||
- Userspace system call interface
|
||||
- Kernel services
|
||||
@ -232,6 +233,39 @@ For changing the ownership, group ID or permissions mask, being the owner of
|
||||
the key or having the sysadmin capability is sufficient.
|
||||
|
||||
|
||||
===============
|
||||
SELINUX SUPPORT
|
||||
===============
|
||||
|
||||
The security class "key" has been added to SELinux so that mandatory access
|
||||
controls can be applied to keys created within various contexts. This support
|
||||
is preliminary, and is likely to change quite significantly in the near future.
|
||||
Currently, all of the basic permissions explained above are provided in SELinux
|
||||
as well; SELinux is simply invoked after all basic permission checks have been
|
||||
performed.
|
||||
|
||||
The value of the file /proc/self/attr/keycreate influences the labeling of
|
||||
newly-created keys. If the contents of that file correspond to an SELinux
|
||||
security context, then the key will be assigned that context. Otherwise, the
|
||||
key will be assigned the current context of the task that invoked the key
|
||||
creation request. Tasks must be granted explicit permission to assign a
|
||||
particular context to newly-created keys, using the "create" permission in the
|
||||
key security class.
|
||||
|
||||
The default keyrings associated with users will be labeled with the default
|
||||
context of the user if and only if the login programs have been instrumented to
|
||||
properly initialize keycreate during the login process. Otherwise, they will
|
||||
be labeled with the context of the login program itself.
|
||||
|
||||
Note, however, that the default keyrings associated with the root user are
|
||||
labeled with the default kernel context, since they are created early in the
|
||||
boot process, before root has a chance to log in.
|
||||
|
||||
The keyrings associated with new threads are each labeled with the context of
|
||||
their associated thread, and both session and process keyrings are handled
|
||||
similarly.
|
||||
|
||||
|
||||
================
|
||||
NEW PROCFS FILES
|
||||
================
|
||||
@ -241,9 +275,17 @@ about the status of the key service:
|
||||
|
||||
(*) /proc/keys
|
||||
|
||||
This lists all the keys on the system, giving information about their
|
||||
type, description and permissions. The payload of the key is not available
|
||||
this way:
|
||||
This lists the keys that are currently viewable by the task reading the
|
||||
file, giving information about their type, description and permissions.
|
||||
It is not possible to view the payload of the key this way, though some
|
||||
information about it may be given.
|
||||
|
||||
The only keys included in the list are those that grant View permission to
|
||||
the reading process whether or not it possesses them. Note that LSM
|
||||
security checks are still performed, and may further filter out keys that
|
||||
the current process is not authorised to view.
|
||||
|
||||
The contents of the file look like this:
|
||||
|
||||
SERIAL FLAGS USAGE EXPY PERM UID GID TYPE DESCRIPTION: SUMMARY
|
||||
00000001 I----- 39 perm 1f3f0000 0 0 keyring _uid_ses.0: 1/4
|
||||
@ -271,7 +313,7 @@ about the status of the key service:
|
||||
(*) /proc/key-users
|
||||
|
||||
This file lists the tracking data for each user that has at least one key
|
||||
on the system. Such data includes quota information and statistics:
|
||||
on the system. Such data includes quota information and statistics:
|
||||
|
||||
[root@andromeda root]# cat /proc/key-users
|
||||
0: 46 45/45 1/100 13/10000
|
||||
@ -738,6 +780,17 @@ payload contents" for more information.
|
||||
See also Documentation/keys-request-key.txt.
|
||||
|
||||
|
||||
(*) To search for a key, passing auxiliary data to the upcaller, call:
|
||||
|
||||
struct key *request_key_with_auxdata(const struct key_type *type,
|
||||
const char *description,
|
||||
const char *callout_string,
|
||||
void *aux);
|
||||
|
||||
This is identical to request_key(), except that the auxiliary data is
|
||||
passed to the key_type->request_key() op if it exists.
|
||||
|
||||
|
||||
(*) When it is no longer required, the key should be released using:
|
||||
|
||||
void key_put(struct key *key);
|
||||
@ -935,6 +988,16 @@ The structure has a number of fields, some of which are mandatory:
|
||||
It is not safe to sleep in this method; the caller may hold spinlocks.
|
||||
|
||||
|
||||
(*) void (*revoke)(struct key *key);
|
||||
|
||||
This method is optional. It is called to discard part of the payload
|
||||
data upon a key being revoked. The caller will have the key semaphore
|
||||
write-locked.
|
||||
|
||||
It is safe to sleep in this method, though care should be taken to avoid
|
||||
a deadlock against the key semaphore.
|
||||
|
||||
|
||||
(*) void (*destroy)(struct key *key);
|
||||
|
||||
This method is optional. It is called to discard the payload data on a key
|
||||
@ -979,6 +1042,24 @@ The structure has a number of fields, some of which are mandatory:
|
||||
as might happen when the userspace buffer is accessed.
|
||||
|
||||
|
||||
(*) int (*request_key)(struct key *key, struct key *authkey, const char *op,
|
||||
void *aux);
|
||||
|
||||
This method is optional. If provided, request_key() and
|
||||
request_key_with_auxdata() will invoke this function rather than
|
||||
upcalling to /sbin/request-key to operate upon a key of this type.
|
||||
|
||||
The aux parameter is as passed to request_key_with_auxdata() or is NULL
|
||||
otherwise. Also passed are the key to be operated upon, the
|
||||
authorisation key for this operation and the operation type (currently
|
||||
only "create").
|
||||
|
||||
This function should return only when the upcall is complete. Upon return
|
||||
the authorisation key will be revoked, and the target key will be
|
||||
negatively instantiated if it is still uninstantiated. The error will be
|
||||
returned to the caller of request_key*().
|
||||
|
||||
|
||||
============================
|
||||
REQUEST-KEY CALLBACK SERVICE
|
||||
============================
|
||||
|
@ -200,6 +200,17 @@ All md devices contain:
|
||||
This can be written only while the array is being assembled, not
|
||||
after it is started.
|
||||
|
||||
layout
|
||||
The "layout" for the array for the particular level. This is
|
||||
simply a number that is interpretted differently by different
|
||||
levels. It can be written while assembling an array.
|
||||
|
||||
resync_start
|
||||
The point at which resync should start. If no resync is needed,
|
||||
this will be a very large number. At array creation it will
|
||||
default to 0, though starting the array as 'clean' will
|
||||
set it much larger.
|
||||
|
||||
new_dev
|
||||
This file can be written but not read. The value written should
|
||||
be a block device number as major:minor. e.g. 8:0
|
||||
@ -207,6 +218,54 @@ All md devices contain:
|
||||
available. It will then appear at md/dev-XXX (depending on the
|
||||
name of the device) and further configuration is then possible.
|
||||
|
||||
safe_mode_delay
|
||||
When an md array has seen no write requests for a certain period
|
||||
of time, it will be marked as 'clean'. When another write
|
||||
request arrive, the array is marked as 'dirty' before the write
|
||||
commenses. This is known as 'safe_mode'.
|
||||
The 'certain period' is controlled by this file which stores the
|
||||
period as a number of seconds. The default is 200msec (0.200).
|
||||
Writing a value of 0 disables safemode.
|
||||
|
||||
array_state
|
||||
This file contains a single word which describes the current
|
||||
state of the array. In many cases, the state can be set by
|
||||
writing the word for the desired state, however some states
|
||||
cannot be explicitly set, and some transitions are not allowed.
|
||||
|
||||
clear
|
||||
No devices, no size, no level
|
||||
Writing is equivalent to STOP_ARRAY ioctl
|
||||
inactive
|
||||
May have some settings, but array is not active
|
||||
all IO results in error
|
||||
When written, doesn't tear down array, but just stops it
|
||||
suspended (not supported yet)
|
||||
All IO requests will block. The array can be reconfigured.
|
||||
Writing this, if accepted, will block until array is quiessent
|
||||
readonly
|
||||
no resync can happen. no superblocks get written.
|
||||
write requests fail
|
||||
read-auto
|
||||
like readonly, but behaves like 'clean' on a write request.
|
||||
|
||||
clean - no pending writes, but otherwise active.
|
||||
When written to inactive array, starts without resync
|
||||
If a write request arrives then
|
||||
if metadata is known, mark 'dirty' and switch to 'active'.
|
||||
if not known, block and switch to write-pending
|
||||
If written to an active array that has pending writes, then fails.
|
||||
active
|
||||
fully active: IO and resync can be happening.
|
||||
When written to inactive array, starts with resync
|
||||
|
||||
write-pending
|
||||
clean, but writes are blocked waiting for 'active' to be written.
|
||||
|
||||
active-idle
|
||||
like active, but no writes have been seen for a while (safe_mode_delay).
|
||||
|
||||
|
||||
sync_speed_min
|
||||
sync_speed_max
|
||||
This are similar to /proc/sys/dev/raid/speed_limit_{min,max}
|
||||
@ -250,10 +309,18 @@ Each directory contains:
|
||||
faulty - device has been kicked from active use due to
|
||||
a detected fault
|
||||
in_sync - device is a fully in-sync member of the array
|
||||
writemostly - device will only be subject to read
|
||||
requests if there are no other options.
|
||||
This applies only to raid1 arrays.
|
||||
spare - device is working, but not a full member.
|
||||
This includes spares that are in the process
|
||||
of being recoverred to
|
||||
This list make grow in future.
|
||||
This can be written to.
|
||||
Writing "faulty" simulates a failure on the device.
|
||||
Writing "remove" removes the device from the array.
|
||||
Writing "writemostly" sets the writemostly flag.
|
||||
Writing "-writemostly" clears the writemostly flag.
|
||||
|
||||
errors
|
||||
An approximate count of read errors that have been detected on
|
||||
|
@ -262,9 +262,14 @@ What is required is some way of intervening to instruct the compiler and the
|
||||
CPU to restrict the order.
|
||||
|
||||
Memory barriers are such interventions. They impose a perceived partial
|
||||
ordering between the memory operations specified on either side of the barrier.
|
||||
They request that the sequence of memory events generated appears to other
|
||||
parts of the system as if the barrier is effective on that CPU.
|
||||
ordering over the memory operations on either side of the barrier.
|
||||
|
||||
Such enforcement is important because the CPUs and other devices in a system
|
||||
can use a variety of tricks to improve performance - including reordering,
|
||||
deferral and combination of memory operations; speculative loads; speculative
|
||||
branch prediction and various types of caching. Memory barriers are used to
|
||||
override or suppress these tricks, allowing the code to sanely control the
|
||||
interaction of multiple CPUs and/or devices.
|
||||
|
||||
|
||||
VARIETIES OF MEMORY BARRIER
|
||||
@ -282,7 +287,7 @@ Memory barriers come in four basic varieties:
|
||||
A write barrier is a partial ordering on stores only; it is not required
|
||||
to have any effect on loads.
|
||||
|
||||
A CPU can be viewed as as commiting a sequence of store operations to the
|
||||
A CPU can be viewed as committing a sequence of store operations to the
|
||||
memory system as time progresses. All stores before a write barrier will
|
||||
occur in the sequence _before_ all the stores after the write barrier.
|
||||
|
||||
@ -413,7 +418,7 @@ There are certain things that the Linux kernel memory barriers do not guarantee:
|
||||
indirect effect will be the order in which the second CPU sees the effects
|
||||
of the first CPU's accesses occur, but see the next point:
|
||||
|
||||
(*) There is no guarantee that the a CPU will see the correct order of effects
|
||||
(*) There is no guarantee that a CPU will see the correct order of effects
|
||||
from a second CPU's accesses, even _if_ the second CPU uses a memory
|
||||
barrier, unless the first CPU _also_ uses a matching memory barrier (see
|
||||
the subsection on "SMP Barrier Pairing").
|
||||
@ -461,8 +466,8 @@ Whilst this may seem like a failure of coherency or causality maintenance, it
|
||||
isn't, and this behaviour can be observed on certain real CPUs (such as the DEC
|
||||
Alpha).
|
||||
|
||||
To deal with this, a data dependency barrier must be inserted between the
|
||||
address load and the data load:
|
||||
To deal with this, a data dependency barrier or better must be inserted
|
||||
between the address load and the data load:
|
||||
|
||||
CPU 1 CPU 2
|
||||
=============== ===============
|
||||
@ -484,7 +489,7 @@ lines. The pointer P might be stored in an odd-numbered cache line, and the
|
||||
variable B might be stored in an even-numbered cache line. Then, if the
|
||||
even-numbered bank of the reading CPU's cache is extremely busy while the
|
||||
odd-numbered bank is idle, one can see the new value of the pointer P (&B),
|
||||
but the old value of the variable B (1).
|
||||
but the old value of the variable B (2).
|
||||
|
||||
|
||||
Another example of where data dependency barriers might by required is where a
|
||||
@ -597,7 +602,7 @@ Consider the following sequence of events:
|
||||
|
||||
This sequence of events is committed to the memory coherence system in an order
|
||||
that the rest of the system might perceive as the unordered set of { STORE A,
|
||||
STORE B, STORE C } all occuring before the unordered set of { STORE D, STORE E
|
||||
STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E
|
||||
}:
|
||||
|
||||
+-------+ : :
|
||||
@ -744,7 +749,7 @@ some effectively random order, despite the write barrier issued by CPU 1:
|
||||
: :
|
||||
|
||||
|
||||
If, however, a read barrier were to be placed between the load of E and the
|
||||
If, however, a read barrier were to be placed between the load of B and the
|
||||
load of A on CPU 2:
|
||||
|
||||
CPU 1 CPU 2
|
||||
@ -1461,9 +1466,8 @@ instruction itself is complete.
|
||||
|
||||
On a UP system - where this wouldn't be a problem - the smp_mb() is just a
|
||||
compiler barrier, thus making sure the compiler emits the instructions in the
|
||||
right order without actually intervening in the CPU. Since there there's only
|
||||
one CPU, that CPU's dependency ordering logic will take care of everything
|
||||
else.
|
||||
right order without actually intervening in the CPU. Since there's only one
|
||||
CPU, that CPU's dependency ordering logic will take care of everything else.
|
||||
|
||||
|
||||
ATOMIC OPERATIONS
|
||||
@ -1640,9 +1644,9 @@ functions:
|
||||
|
||||
The PCI bus, amongst others, defines an I/O space concept - which on such
|
||||
CPUs as i386 and x86_64 cpus readily maps to the CPU's concept of I/O
|
||||
space. However, it may also mapped as a virtual I/O space in the CPU's
|
||||
memory map, particularly on those CPUs that don't support alternate
|
||||
I/O spaces.
|
||||
space. However, it may also be mapped as a virtual I/O space in the CPU's
|
||||
memory map, particularly on those CPUs that don't support alternate I/O
|
||||
spaces.
|
||||
|
||||
Accesses to this space may be fully synchronous (as on i386), but
|
||||
intermediary bridges (such as the PCI host bridge) may not fully honour
|
||||
|
@ -74,7 +74,7 @@ Examples:
|
||||
pgset "pkt_size 9014" sets packet size to 9014
|
||||
pgset "frags 5" packet will consist of 5 fragments
|
||||
pgset "count 200000" sets number of packets to send, set to zero
|
||||
for continious sends untill explicitl stopped.
|
||||
for continuous sends until explicitly stopped.
|
||||
|
||||
pgset "delay 5000" adds delay to hard_start_xmit(). nanoseconds
|
||||
|
||||
|
@ -39,10 +39,13 @@ Copyright (C) 1999-2000 Maxim Krasnyansky <max_mk@yahoo.com>
|
||||
mknod /dev/net/tun c 10 200
|
||||
|
||||
Set permissions:
|
||||
e.g. chmod 0700 /dev/net/tun
|
||||
if you want the device only accessible by root. Giving regular users the
|
||||
right to assign network devices is NOT a good idea. Users could assign
|
||||
bogus network interfaces to trick firewalls or administrators.
|
||||
e.g. chmod 0666 /dev/net/tun
|
||||
There's no harm in allowing the device to be accessible by non-root users,
|
||||
since CAP_NET_ADMIN is required for creating network devices or for
|
||||
connecting to network devices which aren't owned by the user in question.
|
||||
If you want to create persistent devices and give ownership of them to
|
||||
unprivileged users, then you need the /dev/net/tun device to be usable by
|
||||
those users.
|
||||
|
||||
Driver module autoloading
|
||||
|
||||
|
@ -213,11 +213,19 @@ have been remapped by the kernel.
|
||||
|
||||
See Documentation/IO-mapping.txt for how to access device memory.
|
||||
|
||||
You still need to call request_region() for I/O regions and
|
||||
request_mem_region() for memory regions to make sure nobody else is using the
|
||||
same device.
|
||||
The device driver needs to call pci_request_region() to make sure
|
||||
no other device is already using the same resource. The driver is expected
|
||||
to determine MMIO and IO Port resource availability _before_ calling
|
||||
pci_enable_device(). Conversely, drivers should call pci_release_region()
|
||||
_after_ calling pci_disable_device(). The idea is to prevent two devices
|
||||
colliding on the same address range.
|
||||
|
||||
All interrupt handlers should be registered with SA_SHIRQ and use the devid
|
||||
Generic flavors of pci_request_region() are request_mem_region()
|
||||
(for MMIO ranges) and request_region() (for IO Port ranges).
|
||||
Use these for address resources that are not described by "normal" PCI
|
||||
interfaces (e.g. BAR).
|
||||
|
||||
All interrupt handlers should be registered with IRQF_SHARED and use the devid
|
||||
to map IRQs to devices (remember that all PCI interrupts are shared).
|
||||
|
||||
|
||||
|
32
Documentation/pcmcia/crc32hash.c
Normal file
32
Documentation/pcmcia/crc32hash.c
Normal file
@ -0,0 +1,32 @@
|
||||
/* crc32hash.c - derived from linux/lib/crc32.c, GNU GPL v2 */
|
||||
/* Usage example:
|
||||
$ ./crc32hash "Dual Speed"
|
||||
*/
|
||||
|
||||
#include <string.h>
|
||||
#include <stdio.h>
|
||||
#include <ctype.h>
|
||||
#include <stdlib.h>
|
||||
|
||||
unsigned int crc32(unsigned char const *p, unsigned int len)
|
||||
{
|
||||
int i;
|
||||
unsigned int crc = 0;
|
||||
while (len--) {
|
||||
crc ^= *p++;
|
||||
for (i = 0; i < 8; i++)
|
||||
crc = (crc >> 1) ^ ((crc & 1) ? 0xedb88320 : 0);
|
||||
}
|
||||
return crc;
|
||||
}
|
||||
|
||||
int main(int argc, char **argv) {
|
||||
unsigned int result;
|
||||
if (argc != 2) {
|
||||
printf("no string passed as argument\n");
|
||||
return -1;
|
||||
}
|
||||
result = crc32(argv[1], strlen(argv[1]));
|
||||
printf("0x%x\n", result);
|
||||
return 0;
|
||||
}
|
@ -27,37 +27,7 @@ pcmcia:m0149cC1ABf06pfn00fn00pa725B842DpbF1EFEE84pc0877B627pd00000000
|
||||
The hex value after "pa" is the hash of product ID string 1, after "pb" for
|
||||
string 2 and so on.
|
||||
|
||||
Alternatively, you can use this small tool to determine the crc32 hash.
|
||||
simply pass the string you want to evaluate as argument to this program,
|
||||
e.g.
|
||||
Alternatively, you can use crc32hash (see Documentation/pcmcia/crc32hash.c)
|
||||
to determine the crc32 hash. Simply pass the string you want to evaluate
|
||||
as argument to this program, e.g.:
|
||||
$ ./crc32hash "Dual Speed"
|
||||
|
||||
-------------------------------------------------------------------------
|
||||
/* crc32hash.c - derived from linux/lib/crc32.c, GNU GPL v2 */
|
||||
#include <string.h>
|
||||
#include <stdio.h>
|
||||
#include <ctype.h>
|
||||
#include <stdlib.h>
|
||||
|
||||
unsigned int crc32(unsigned char const *p, unsigned int len)
|
||||
{
|
||||
int i;
|
||||
unsigned int crc = 0;
|
||||
while (len--) {
|
||||
crc ^= *p++;
|
||||
for (i = 0; i < 8; i++)
|
||||
crc = (crc >> 1) ^ ((crc & 1) ? 0xedb88320 : 0);
|
||||
}
|
||||
return crc;
|
||||
}
|
||||
|
||||
int main(int argc, char **argv) {
|
||||
unsigned int result;
|
||||
if (argc != 2) {
|
||||
printf("no string passed as argument\n");
|
||||
return -1;
|
||||
}
|
||||
result = crc32(argv[1], strlen(argv[1]));
|
||||
printf("0x%x\n", result);
|
||||
return 0;
|
||||
}
|
||||
|
121
Documentation/pi-futex.txt
Normal file
121
Documentation/pi-futex.txt
Normal file
@ -0,0 +1,121 @@
|
||||
Lightweight PI-futexes
|
||||
----------------------
|
||||
|
||||
We are calling them lightweight for 3 reasons:
|
||||
|
||||
- in the user-space fastpath a PI-enabled futex involves no kernel work
|
||||
(or any other PI complexity) at all. No registration, no extra kernel
|
||||
calls - just pure fast atomic ops in userspace.
|
||||
|
||||
- even in the slowpath, the system call and scheduling pattern is very
|
||||
similar to normal futexes.
|
||||
|
||||
- the in-kernel PI implementation is streamlined around the mutex
|
||||
abstraction, with strict rules that keep the implementation
|
||||
relatively simple: only a single owner may own a lock (i.e. no
|
||||
read-write lock support), only the owner may unlock a lock, no
|
||||
recursive locking, etc.
|
||||
|
||||
Priority Inheritance - why?
|
||||
---------------------------
|
||||
|
||||
The short reply: user-space PI helps achieving/improving determinism for
|
||||
user-space applications. In the best-case, it can help achieve
|
||||
determinism and well-bound latencies. Even in the worst-case, PI will
|
||||
improve the statistical distribution of locking related application
|
||||
delays.
|
||||
|
||||
The longer reply:
|
||||
-----------------
|
||||
|
||||
Firstly, sharing locks between multiple tasks is a common programming
|
||||
technique that often cannot be replaced with lockless algorithms. As we
|
||||
can see it in the kernel [which is a quite complex program in itself],
|
||||
lockless structures are rather the exception than the norm - the current
|
||||
ratio of lockless vs. locky code for shared data structures is somewhere
|
||||
between 1:10 and 1:100. Lockless is hard, and the complexity of lockless
|
||||
algorithms often endangers to ability to do robust reviews of said code.
|
||||
I.e. critical RT apps often choose lock structures to protect critical
|
||||
data structures, instead of lockless algorithms. Furthermore, there are
|
||||
cases (like shared hardware, or other resource limits) where lockless
|
||||
access is mathematically impossible.
|
||||
|
||||
Media players (such as Jack) are an example of reasonable application
|
||||
design with multiple tasks (with multiple priority levels) sharing
|
||||
short-held locks: for example, a highprio audio playback thread is
|
||||
combined with medium-prio construct-audio-data threads and low-prio
|
||||
display-colory-stuff threads. Add video and decoding to the mix and
|
||||
we've got even more priority levels.
|
||||
|
||||
So once we accept that synchronization objects (locks) are an
|
||||
unavoidable fact of life, and once we accept that multi-task userspace
|
||||
apps have a very fair expectation of being able to use locks, we've got
|
||||
to think about how to offer the option of a deterministic locking
|
||||
implementation to user-space.
|
||||
|
||||
Most of the technical counter-arguments against doing priority
|
||||
inheritance only apply to kernel-space locks. But user-space locks are
|
||||
different, there we cannot disable interrupts or make the task
|
||||
non-preemptible in a critical section, so the 'use spinlocks' argument
|
||||
does not apply (user-space spinlocks have the same priority inversion
|
||||
problems as other user-space locking constructs). Fact is, pretty much
|
||||
the only technique that currently enables good determinism for userspace
|
||||
locks (such as futex-based pthread mutexes) is priority inheritance:
|
||||
|
||||
Currently (without PI), if a high-prio and a low-prio task shares a lock
|
||||
[this is a quite common scenario for most non-trivial RT applications],
|
||||
even if all critical sections are coded carefully to be deterministic
|
||||
(i.e. all critical sections are short in duration and only execute a
|
||||
limited number of instructions), the kernel cannot guarantee any
|
||||
deterministic execution of the high-prio task: any medium-priority task
|
||||
could preempt the low-prio task while it holds the shared lock and
|
||||
executes the critical section, and could delay it indefinitely.
|
||||
|
||||
Implementation:
|
||||
---------------
|
||||
|
||||
As mentioned before, the userspace fastpath of PI-enabled pthread
|
||||
mutexes involves no kernel work at all - they behave quite similarly to
|
||||
normal futex-based locks: a 0 value means unlocked, and a value==TID
|
||||
means locked. (This is the same method as used by list-based robust
|
||||
futexes.) Userspace uses atomic ops to lock/unlock these mutexes without
|
||||
entering the kernel.
|
||||
|
||||
To handle the slowpath, we have added two new futex ops:
|
||||
|
||||
FUTEX_LOCK_PI
|
||||
FUTEX_UNLOCK_PI
|
||||
|
||||
If the lock-acquire fastpath fails, [i.e. an atomic transition from 0 to
|
||||
TID fails], then FUTEX_LOCK_PI is called. The kernel does all the
|
||||
remaining work: if there is no futex-queue attached to the futex address
|
||||
yet then the code looks up the task that owns the futex [it has put its
|
||||
own TID into the futex value], and attaches a 'PI state' structure to
|
||||
the futex-queue. The pi_state includes an rt-mutex, which is a PI-aware,
|
||||
kernel-based synchronization object. The 'other' task is made the owner
|
||||
of the rt-mutex, and the FUTEX_WAITERS bit is atomically set in the
|
||||
futex value. Then this task tries to lock the rt-mutex, on which it
|
||||
blocks. Once it returns, it has the mutex acquired, and it sets the
|
||||
futex value to its own TID and returns. Userspace has no other work to
|
||||
perform - it now owns the lock, and futex value contains
|
||||
FUTEX_WAITERS|TID.
|
||||
|
||||
If the unlock side fastpath succeeds, [i.e. userspace manages to do a
|
||||
TID -> 0 atomic transition of the futex value], then no kernel work is
|
||||
triggered.
|
||||
|
||||
If the unlock fastpath fails (because the FUTEX_WAITERS bit is set),
|
||||
then FUTEX_UNLOCK_PI is called, and the kernel unlocks the futex on the
|
||||
behalf of userspace - and it also unlocks the attached
|
||||
pi_state->rt_mutex and thus wakes up any potential waiters.
|
||||
|
||||
Note that under this approach, contrary to previous PI-futex approaches,
|
||||
there is no prior 'registration' of a PI-futex. [which is not quite
|
||||
possible anyway, due to existing ABI properties of pthread mutexes.]
|
||||
|
||||
Also, under this scheme, 'robustness' and 'PI' are two orthogonal
|
||||
properties of futexes, and all four combinations are possible: futex,
|
||||
robust-futex, PI-futex, robust+PI-futex.
|
||||
|
||||
More details about priority inheritance can be found in
|
||||
Documentation/rtmutex.txt.
|
@ -118,96 +118,6 @@ will fail.
|
||||
There is currently no way to know what states a device or driver
|
||||
supports a priori. This will change in the future.
|
||||
|
||||
pm_message_t meaning
|
||||
|
||||
pm_message_t has two fields. event ("major"), and flags. If driver
|
||||
does not know event code, it aborts the request, returning error. Some
|
||||
drivers may need to deal with special cases based on the actual type
|
||||
of suspend operation being done at the system level. This is why
|
||||
there are flags.
|
||||
|
||||
Event codes are:
|
||||
|
||||
ON -- no need to do anything except special cases like broken
|
||||
HW.
|
||||
|
||||
# NOTIFICATION -- pretty much same as ON?
|
||||
|
||||
FREEZE -- stop DMA and interrupts, and be prepared to reinit HW from
|
||||
scratch. That probably means stop accepting upstream requests, the
|
||||
actual policy of what to do with them beeing specific to a given
|
||||
driver. It's acceptable for a network driver to just drop packets
|
||||
while a block driver is expected to block the queue so no request is
|
||||
lost. (Use IDE as an example on how to do that). FREEZE requires no
|
||||
power state change, and it's expected for drivers to be able to
|
||||
quickly transition back to operating state.
|
||||
|
||||
SUSPEND -- like FREEZE, but also put hardware into low-power state. If
|
||||
there's need to distinguish several levels of sleep, additional flag
|
||||
is probably best way to do that.
|
||||
|
||||
Transitions are only from a resumed state to a suspended state, never
|
||||
between 2 suspended states. (ON -> FREEZE or ON -> SUSPEND can happen,
|
||||
FREEZE -> SUSPEND or SUSPEND -> FREEZE can not).
|
||||
|
||||
All events are:
|
||||
|
||||
[NOTE NOTE NOTE: If you are driver author, you should not care; you
|
||||
should only look at event, and ignore flags.]
|
||||
|
||||
#Prepare for suspend -- userland is still running but we are going to
|
||||
#enter suspend state. This gives drivers chance to load firmware from
|
||||
#disk and store it in memory, or do other activities taht require
|
||||
#operating userland, ability to kmalloc GFP_KERNEL, etc... All of these
|
||||
#are forbiden once the suspend dance is started.. event = ON, flags =
|
||||
#PREPARE_TO_SUSPEND
|
||||
|
||||
Apm standby -- prepare for APM event. Quiesce devices to make life
|
||||
easier for APM BIOS. event = FREEZE, flags = APM_STANDBY
|
||||
|
||||
Apm suspend -- same as APM_STANDBY, but it we should probably avoid
|
||||
spinning down disks. event = FREEZE, flags = APM_SUSPEND
|
||||
|
||||
System halt, reboot -- quiesce devices to make life easier for BIOS. event
|
||||
= FREEZE, flags = SYSTEM_HALT or SYSTEM_REBOOT
|
||||
|
||||
System shutdown -- at least disks need to be spun down, or data may be
|
||||
lost. Quiesce devices, just to make life easier for BIOS. event =
|
||||
FREEZE, flags = SYSTEM_SHUTDOWN
|
||||
|
||||
Kexec -- turn off DMAs and put hardware into some state where new
|
||||
kernel can take over. event = FREEZE, flags = KEXEC
|
||||
|
||||
Powerdown at end of swsusp -- very similar to SYSTEM_SHUTDOWN, except wake
|
||||
may need to be enabled on some devices. This actually has at least 3
|
||||
subtypes, system can reboot, enter S4 and enter S5 at the end of
|
||||
swsusp. event = FREEZE, flags = SWSUSP and one of SYSTEM_REBOOT,
|
||||
SYSTEM_SHUTDOWN, SYSTEM_S4
|
||||
|
||||
Suspend to ram -- put devices into low power state. event = SUSPEND,
|
||||
flags = SUSPEND_TO_RAM
|
||||
|
||||
Freeze for swsusp snapshot -- stop DMA and interrupts. No need to put
|
||||
devices into low power mode, but you must be able to reinitialize
|
||||
device from scratch in resume method. This has two flavors, its done
|
||||
once on suspending kernel, once on resuming kernel. event = FREEZE,
|
||||
flags = DURING_SUSPEND or DURING_RESUME
|
||||
|
||||
Device detach requested from /sys -- deinitialize device; proably same as
|
||||
SYSTEM_SHUTDOWN, I do not understand this one too much. probably event
|
||||
= FREEZE, flags = DEV_DETACH.
|
||||
|
||||
#These are not really events sent:
|
||||
#
|
||||
#System fully on -- device is working normally; this is probably never
|
||||
#passed to suspend() method... event = ON, flags = 0
|
||||
#
|
||||
#Ready after resume -- userland is now running, again. Time to free any
|
||||
#memory you ate during prepare to suspend... event = ON, flags =
|
||||
#READY_AFTER_RESUME
|
||||
#
|
||||
|
||||
|
||||
pm_message_t meaning
|
||||
|
||||
pm_message_t has two fields. event ("major"), and flags. If driver
|
||||
|
@ -18,10 +18,11 @@ Some warnings, first.
|
||||
*
|
||||
* (*) suspend/resume support is needed to make it safe.
|
||||
*
|
||||
* If you have any filesystems on USB devices mounted before suspend,
|
||||
* If you have any filesystems on USB devices mounted before software suspend,
|
||||
* they won't be accessible after resume and you may lose data, as though
|
||||
* you have unplugged the USB devices with mounted filesystems on them
|
||||
* (see the FAQ below for details).
|
||||
* you have unplugged the USB devices with mounted filesystems on them;
|
||||
* see the FAQ below for details. (This is not true for more traditional
|
||||
* power states like "standby", which normally don't turn USB off.)
|
||||
|
||||
You need to append resume=/dev/your_swap_partition to kernel command
|
||||
line. Then you suspend by
|
||||
@ -204,7 +205,7 @@ Q: There don't seem to be any generally useful behavioral
|
||||
distinctions between SUSPEND and FREEZE.
|
||||
|
||||
A: Doing SUSPEND when you are asked to do FREEZE is always correct,
|
||||
but it may be unneccessarily slow. If you want USB to stay simple,
|
||||
but it may be unneccessarily slow. If you want your driver to stay simple,
|
||||
slowness may not matter to you. It can always be fixed later.
|
||||
|
||||
For devices like disk it does matter, you do not want to spindown for
|
||||
@ -349,25 +350,72 @@ Q: How do I make suspend more verbose?
|
||||
|
||||
A: If you want to see any non-error kernel messages on the virtual
|
||||
terminal the kernel switches to during suspend, you have to set the
|
||||
kernel console loglevel to at least 5, for example by doing
|
||||
kernel console loglevel to at least 4 (KERN_WARNING), for example by
|
||||
doing
|
||||
|
||||
echo 5 > /proc/sys/kernel/printk
|
||||
# save the old loglevel
|
||||
read LOGLEVEL DUMMY < /proc/sys/kernel/printk
|
||||
# set the loglevel so we see the progress bar.
|
||||
# if the level is higher than needed, we leave it alone.
|
||||
if [ $LOGLEVEL -lt 5 ]; then
|
||||
echo 5 > /proc/sys/kernel/printk
|
||||
fi
|
||||
|
||||
IMG_SZ=0
|
||||
read IMG_SZ < /sys/power/image_size
|
||||
echo -n disk > /sys/power/state
|
||||
RET=$?
|
||||
#
|
||||
# the logic here is:
|
||||
# if image_size > 0 (without kernel support, IMG_SZ will be zero),
|
||||
# then try again with image_size set to zero.
|
||||
if [ $RET -ne 0 -a $IMG_SZ -ne 0 ]; then # try again with minimal image size
|
||||
echo 0 > /sys/power/image_size
|
||||
echo -n disk > /sys/power/state
|
||||
RET=$?
|
||||
fi
|
||||
|
||||
# restore previous loglevel
|
||||
echo $LOGLEVEL > /proc/sys/kernel/printk
|
||||
exit $RET
|
||||
|
||||
Q: Is this true that if I have a mounted filesystem on a USB device and
|
||||
I suspend to disk, I can lose data unless the filesystem has been mounted
|
||||
with "sync"?
|
||||
|
||||
A: That's right. It depends on your hardware, and it could be true even for
|
||||
suspend-to-RAM. In fact, even with "-o sync" you can lose data if your
|
||||
programs have information in buffers they haven't written out to disk.
|
||||
A: That's right ... if you disconnect that device, you may lose data.
|
||||
In fact, even with "-o sync" you can lose data if your programs have
|
||||
information in buffers they haven't written out to a disk you disconnect,
|
||||
or if you disconnect before the device finished saving data you wrote.
|
||||
|
||||
If you're lucky, your hardware will support low-power modes for USB
|
||||
controllers while the system is asleep. Lots of hardware doesn't,
|
||||
however. Shutting off the power to a USB controller is equivalent to
|
||||
unplugging all the attached devices.
|
||||
Software suspend normally powers down USB controllers, which is equivalent
|
||||
to disconnecting all USB devices attached to your system.
|
||||
|
||||
Your system might well support low-power modes for its USB controllers
|
||||
while the system is asleep, maintaining the connection, using true sleep
|
||||
modes like "suspend-to-RAM" or "standby". (Don't write "disk" to the
|
||||
/sys/power/state file; write "standby" or "mem".) We've not seen any
|
||||
hardware that can use these modes through software suspend, although in
|
||||
theory some systems might support "platform" or "firmware" modes that
|
||||
won't break the USB connections.
|
||||
|
||||
Remember that it's always a bad idea to unplug a disk drive containing a
|
||||
mounted filesystem. With USB that's true even when your system is asleep!
|
||||
The safest thing is to unmount all USB-based filesystems before suspending
|
||||
and remount them after resuming.
|
||||
mounted filesystem. That's true even when your system is asleep! The
|
||||
safest thing is to unmount all filesystems on removable media (such USB,
|
||||
Firewire, CompactFlash, MMC, external SATA, or even IDE hotplug bays)
|
||||
before suspending; then remount them after resuming.
|
||||
|
||||
Q: I upgraded the kernel from 2.6.15 to 2.6.16. Both kernels were
|
||||
compiled with the similar configuration files. Anyway I found that
|
||||
suspend to disk (and resume) is much slower on 2.6.16 compared to
|
||||
2.6.15. Any idea for why that might happen or how can I speed it up?
|
||||
|
||||
A: This is because the size of the suspend image is now greater than
|
||||
for 2.6.15 (by saving more data we can get more responsive system
|
||||
after resume).
|
||||
|
||||
There's the /sys/power/image_size knob that controls the size of the
|
||||
image. If you set it to 0 (eg. by echo 0 > /sys/power/image_size as
|
||||
root), the 2.6.15 behavior should be restored. If it is still too
|
||||
slow, take a look at suspend.sf.net -- userland suspend is faster and
|
||||
supports LZF compression to speed it up further.
|
||||
|
@ -90,6 +90,7 @@ Table of known working notebooks:
|
||||
Model hack (or "how to do it")
|
||||
------------------------------------------------------------------------------
|
||||
Acer Aspire 1406LC ole's late BIOS init (7), turn off DRI
|
||||
Acer TM 230 s3_bios (2)
|
||||
Acer TM 242FX vbetool (6)
|
||||
Acer TM C110 video_post (8)
|
||||
Acer TM C300 vga=normal (only suspend on console, not in X), vbetool (6) or video_post (8)
|
||||
@ -115,6 +116,7 @@ Dell D610 vga=normal and X (possibly vbestate (6) too, but not tested)
|
||||
Dell Inspiron 4000 ??? (*)
|
||||
Dell Inspiron 500m ??? (*)
|
||||
Dell Inspiron 510m ???
|
||||
Dell Inspiron 5150 vbetool needed (6)
|
||||
Dell Inspiron 600m ??? (*)
|
||||
Dell Inspiron 8200 ??? (*)
|
||||
Dell Inspiron 8500 ??? (*)
|
||||
@ -125,6 +127,7 @@ HP NX7000 ??? (*)
|
||||
HP Pavilion ZD7000 vbetool post needed, need open-source nv driver for X
|
||||
HP Omnibook XE3 athlon version none (1)
|
||||
HP Omnibook XE3GC none (1), video is S3 Savage/IX-MV
|
||||
HP Omnibook XE3L-GF vbetool (6)
|
||||
HP Omnibook 5150 none (1), (S1 also works OK)
|
||||
IBM TP T20, model 2647-44G none (1), video is S3 Inc. 86C270-294 Savage/IX-MV, vesafb gets "interesting" but X work.
|
||||
IBM TP A31 / Type 2652-M5G s3_mode (3) [works ok with BIOS 1.04 2002-08-23, but not at all with BIOS 1.11 2004-11-05 :-(]
|
||||
@ -157,6 +160,7 @@ Sony Vaio vgn-s260 X or boot-radeon can init it (5)
|
||||
Sony Vaio vgn-S580BH vga=normal, but suspend from X. Console will be blank unless you return to X.
|
||||
Sony Vaio vgn-FS115B s3_bios (2),s3_mode (4)
|
||||
Toshiba Libretto L5 none (1)
|
||||
Toshiba Libretto 100CT/110CT vbetool (6)
|
||||
Toshiba Portege 3020CT s3_mode (3)
|
||||
Toshiba Satellite 4030CDT s3_mode (3) (S1 also works OK)
|
||||
Toshiba Satellite 4080XCDT s3_mode (3) (S1 also works OK)
|
||||
|
@ -95,7 +95,7 @@ comparison. If the thread has registered a list, then normally the list
|
||||
is empty. If the thread/process crashed or terminated in some incorrect
|
||||
way then the list might be non-empty: in this case the kernel carefully
|
||||
walks the list [not trusting it], and marks all locks that are owned by
|
||||
this thread with the FUTEX_OWNER_DEAD bit, and wakes up one waiter (if
|
||||
this thread with the FUTEX_OWNER_DIED bit, and wakes up one waiter (if
|
||||
any).
|
||||
|
||||
The list is guaranteed to be private and per-thread at do_exit() time,
|
||||
|
781
Documentation/rt-mutex-design.txt
Normal file
781
Documentation/rt-mutex-design.txt
Normal file
@ -0,0 +1,781 @@
|
||||
#
|
||||
# Copyright (c) 2006 Steven Rostedt
|
||||
# Licensed under the GNU Free Documentation License, Version 1.2
|
||||
#
|
||||
|
||||
RT-mutex implementation design
|
||||
------------------------------
|
||||
|
||||
This document tries to describe the design of the rtmutex.c implementation.
|
||||
It doesn't describe the reasons why rtmutex.c exists. For that please see
|
||||
Documentation/rt-mutex.txt. Although this document does explain problems
|
||||
that happen without this code, but that is in the concept to understand
|
||||
what the code actually is doing.
|
||||
|
||||
The goal of this document is to help others understand the priority
|
||||
inheritance (PI) algorithm that is used, as well as reasons for the
|
||||
decisions that were made to implement PI in the manner that was done.
|
||||
|
||||
|
||||
Unbounded Priority Inversion
|
||||
----------------------------
|
||||
|
||||
Priority inversion is when a lower priority process executes while a higher
|
||||
priority process wants to run. This happens for several reasons, and
|
||||
most of the time it can't be helped. Anytime a high priority process wants
|
||||
to use a resource that a lower priority process has (a mutex for example),
|
||||
the high priority process must wait until the lower priority process is done
|
||||
with the resource. This is a priority inversion. What we want to prevent
|
||||
is something called unbounded priority inversion. That is when the high
|
||||
priority process is prevented from running by a lower priority process for
|
||||
an undetermined amount of time.
|
||||
|
||||
The classic example of unbounded priority inversion is were you have three
|
||||
processes, let's call them processes A, B, and C, where A is the highest
|
||||
priority process, C is the lowest, and B is in between. A tries to grab a lock
|
||||
that C owns and must wait and lets C run to release the lock. But in the
|
||||
meantime, B executes, and since B is of a higher priority than C, it preempts C,
|
||||
but by doing so, it is in fact preempting A which is a higher priority process.
|
||||
Now there's no way of knowing how long A will be sleeping waiting for C
|
||||
to release the lock, because for all we know, B is a CPU hog and will
|
||||
never give C a chance to release the lock. This is called unbounded priority
|
||||
inversion.
|
||||
|
||||
Here's a little ASCII art to show the problem.
|
||||
|
||||
grab lock L1 (owned by C)
|
||||
|
|
||||
A ---+
|
||||
C preempted by B
|
||||
|
|
||||
C +----+
|
||||
|
||||
B +-------->
|
||||
B now keeps A from running.
|
||||
|
||||
|
||||
Priority Inheritance (PI)
|
||||
-------------------------
|
||||
|
||||
There are several ways to solve this issue, but other ways are out of scope
|
||||
for this document. Here we only discuss PI.
|
||||
|
||||
PI is where a process inherits the priority of another process if the other
|
||||
process blocks on a lock owned by the current process. To make this easier
|
||||
to understand, let's use the previous example, with processes A, B, and C again.
|
||||
|
||||
This time, when A blocks on the lock owned by C, C would inherit the priority
|
||||
of A. So now if B becomes runnable, it would not preempt C, since C now has
|
||||
the high priority of A. As soon as C releases the lock, it loses its
|
||||
inherited priority, and A then can continue with the resource that C had.
|
||||
|
||||
Terminology
|
||||
-----------
|
||||
|
||||
Here I explain some terminology that is used in this document to help describe
|
||||
the design that is used to implement PI.
|
||||
|
||||
PI chain - The PI chain is an ordered series of locks and processes that cause
|
||||
processes to inherit priorities from a previous process that is
|
||||
blocked on one of its locks. This is described in more detail
|
||||
later in this document.
|
||||
|
||||
mutex - In this document, to differentiate from locks that implement
|
||||
PI and spin locks that are used in the PI code, from now on
|
||||
the PI locks will be called a mutex.
|
||||
|
||||
lock - In this document from now on, I will use the term lock when
|
||||
referring to spin locks that are used to protect parts of the PI
|
||||
algorithm. These locks disable preemption for UP (when
|
||||
CONFIG_PREEMPT is enabled) and on SMP prevents multiple CPUs from
|
||||
entering critical sections simultaneously.
|
||||
|
||||
spin lock - Same as lock above.
|
||||
|
||||
waiter - A waiter is a struct that is stored on the stack of a blocked
|
||||
process. Since the scope of the waiter is within the code for
|
||||
a process being blocked on the mutex, it is fine to allocate
|
||||
the waiter on the process's stack (local variable). This
|
||||
structure holds a pointer to the task, as well as the mutex that
|
||||
the task is blocked on. It also has the plist node structures to
|
||||
place the task in the waiter_list of a mutex as well as the
|
||||
pi_list of a mutex owner task (described below).
|
||||
|
||||
waiter is sometimes used in reference to the task that is waiting
|
||||
on a mutex. This is the same as waiter->task.
|
||||
|
||||
waiters - A list of processes that are blocked on a mutex.
|
||||
|
||||
top waiter - The highest priority process waiting on a specific mutex.
|
||||
|
||||
top pi waiter - The highest priority process waiting on one of the mutexes
|
||||
that a specific process owns.
|
||||
|
||||
Note: task and process are used interchangeably in this document, mostly to
|
||||
differentiate between two processes that are being described together.
|
||||
|
||||
|
||||
PI chain
|
||||
--------
|
||||
|
||||
The PI chain is a list of processes and mutexes that may cause priority
|
||||
inheritance to take place. Multiple chains may converge, but a chain
|
||||
would never diverge, since a process can't be blocked on more than one
|
||||
mutex at a time.
|
||||
|
||||
Example:
|
||||
|
||||
Process: A, B, C, D, E
|
||||
Mutexes: L1, L2, L3, L4
|
||||
|
||||
A owns: L1
|
||||
B blocked on L1
|
||||
B owns L2
|
||||
C blocked on L2
|
||||
C owns L3
|
||||
D blocked on L3
|
||||
D owns L4
|
||||
E blocked on L4
|
||||
|
||||
The chain would be:
|
||||
|
||||
E->L4->D->L3->C->L2->B->L1->A
|
||||
|
||||
To show where two chains merge, we could add another process F and
|
||||
another mutex L5 where B owns L5 and F is blocked on mutex L5.
|
||||
|
||||
The chain for F would be:
|
||||
|
||||
F->L5->B->L1->A
|
||||
|
||||
Since a process may own more than one mutex, but never be blocked on more than
|
||||
one, the chains merge.
|
||||
|
||||
Here we show both chains:
|
||||
|
||||
E->L4->D->L3->C->L2-+
|
||||
|
|
||||
+->B->L1->A
|
||||
|
|
||||
F->L5-+
|
||||
|
||||
For PI to work, the processes at the right end of these chains (or we may
|
||||
also call it the Top of the chain) must be equal to or higher in priority
|
||||
than the processes to the left or below in the chain.
|
||||
|
||||
Also since a mutex may have more than one process blocked on it, we can
|
||||
have multiple chains merge at mutexes. If we add another process G that is
|
||||
blocked on mutex L2:
|
||||
|
||||
G->L2->B->L1->A
|
||||
|
||||
And once again, to show how this can grow I will show the merging chains
|
||||
again.
|
||||
|
||||
E->L4->D->L3->C-+
|
||||
+->L2-+
|
||||
| |
|
||||
G-+ +->B->L1->A
|
||||
|
|
||||
F->L5-+
|
||||
|
||||
|
||||
Plist
|
||||
-----
|
||||
|
||||
Before I go further and talk about how the PI chain is stored through lists
|
||||
on both mutexes and processes, I'll explain the plist. This is similar to
|
||||
the struct list_head functionality that is already in the kernel.
|
||||
The implementation of plist is out of scope for this document, but it is
|
||||
very important to understand what it does.
|
||||
|
||||
There are a few differences between plist and list, the most important one
|
||||
being that plist is a priority sorted linked list. This means that the
|
||||
priorities of the plist are sorted, such that it takes O(1) to retrieve the
|
||||
highest priority item in the list. Obviously this is useful to store processes
|
||||
based on their priorities.
|
||||
|
||||
Another difference, which is important for implementation, is that, unlike
|
||||
list, the head of the list is a different element than the nodes of a list.
|
||||
So the head of the list is declared as struct plist_head and nodes that will
|
||||
be added to the list are declared as struct plist_node.
|
||||
|
||||
|
||||
Mutex Waiter List
|
||||
-----------------
|
||||
|
||||
Every mutex keeps track of all the waiters that are blocked on itself. The mutex
|
||||
has a plist to store these waiters by priority. This list is protected by
|
||||
a spin lock that is located in the struct of the mutex. This lock is called
|
||||
wait_lock. Since the modification of the waiter list is never done in
|
||||
interrupt context, the wait_lock can be taken without disabling interrupts.
|
||||
|
||||
|
||||
Task PI List
|
||||
------------
|
||||
|
||||
To keep track of the PI chains, each process has its own PI list. This is
|
||||
a list of all top waiters of the mutexes that are owned by the process.
|
||||
Note that this list only holds the top waiters and not all waiters that are
|
||||
blocked on mutexes owned by the process.
|
||||
|
||||
The top of the task's PI list is always the highest priority task that
|
||||
is waiting on a mutex that is owned by the task. So if the task has
|
||||
inherited a priority, it will always be the priority of the task that is
|
||||
at the top of this list.
|
||||
|
||||
This list is stored in the task structure of a process as a plist called
|
||||
pi_list. This list is protected by a spin lock also in the task structure,
|
||||
called pi_lock. This lock may also be taken in interrupt context, so when
|
||||
locking the pi_lock, interrupts must be disabled.
|
||||
|
||||
|
||||
Depth of the PI Chain
|
||||
---------------------
|
||||
|
||||
The maximum depth of the PI chain is not dynamic, and could actually be
|
||||
defined. But is very complex to figure it out, since it depends on all
|
||||
the nesting of mutexes. Let's look at the example where we have 3 mutexes,
|
||||
L1, L2, and L3, and four separate functions func1, func2, func3 and func4.
|
||||
The following shows a locking order of L1->L2->L3, but may not actually
|
||||
be directly nested that way.
|
||||
|
||||
void func1(void)
|
||||
{
|
||||
mutex_lock(L1);
|
||||
|
||||
/* do anything */
|
||||
|
||||
mutex_unlock(L1);
|
||||
}
|
||||
|
||||
void func2(void)
|
||||
{
|
||||
mutex_lock(L1);
|
||||
mutex_lock(L2);
|
||||
|
||||
/* do something */
|
||||
|
||||
mutex_unlock(L2);
|
||||
mutex_unlock(L1);
|
||||
}
|
||||
|
||||
void func3(void)
|
||||
{
|
||||
mutex_lock(L2);
|
||||
mutex_lock(L3);
|
||||
|
||||
/* do something else */
|
||||
|
||||
mutex_unlock(L3);
|
||||
mutex_unlock(L2);
|
||||
}
|
||||
|
||||
void func4(void)
|
||||
{
|
||||
mutex_lock(L3);
|
||||
|
||||
/* do something again */
|
||||
|
||||
mutex_unlock(L3);
|
||||
}
|
||||
|
||||
Now we add 4 processes that run each of these functions separately.
|
||||
Processes A, B, C, and D which run functions func1, func2, func3 and func4
|
||||
respectively, and such that D runs first and A last. With D being preempted
|
||||
in func4 in the "do something again" area, we have a locking that follows:
|
||||
|
||||
D owns L3
|
||||
C blocked on L3
|
||||
C owns L2
|
||||
B blocked on L2
|
||||
B owns L1
|
||||
A blocked on L1
|
||||
|
||||
And thus we have the chain A->L1->B->L2->C->L3->D.
|
||||
|
||||
This gives us a PI depth of 4 (four processes), but looking at any of the
|
||||
functions individually, it seems as though they only have at most a locking
|
||||
depth of two. So, although the locking depth is defined at compile time,
|
||||
it still is very difficult to find the possibilities of that depth.
|
||||
|
||||
Now since mutexes can be defined by user-land applications, we don't want a DOS
|
||||
type of application that nests large amounts of mutexes to create a large
|
||||
PI chain, and have the code holding spin locks while looking at a large
|
||||
amount of data. So to prevent this, the implementation not only implements
|
||||
a maximum lock depth, but also only holds at most two different locks at a
|
||||
time, as it walks the PI chain. More about this below.
|
||||
|
||||
|
||||
Mutex owner and flags
|
||||
---------------------
|
||||
|
||||
The mutex structure contains a pointer to the owner of the mutex. If the
|
||||
mutex is not owned, this owner is set to NULL. Since all architectures
|
||||
have the task structure on at least a four byte alignment (and if this is
|
||||
not true, the rtmutex.c code will be broken!), this allows for the two
|
||||
least significant bits to be used as flags. This part is also described
|
||||
in Documentation/rt-mutex.txt, but will also be briefly described here.
|
||||
|
||||
Bit 0 is used as the "Pending Owner" flag. This is described later.
|
||||
Bit 1 is used as the "Has Waiters" flags. This is also described later
|
||||
in more detail, but is set whenever there are waiters on a mutex.
|
||||
|
||||
|
||||
cmpxchg Tricks
|
||||
--------------
|
||||
|
||||
Some architectures implement an atomic cmpxchg (Compare and Exchange). This
|
||||
is used (when applicable) to keep the fast path of grabbing and releasing
|
||||
mutexes short.
|
||||
|
||||
cmpxchg is basically the following function performed atomically:
|
||||
|
||||
unsigned long _cmpxchg(unsigned long *A, unsigned long *B, unsigned long *C)
|
||||
{
|
||||
unsigned long T = *A;
|
||||
if (*A == *B) {
|
||||
*A = *C;
|
||||
}
|
||||
return T;
|
||||
}
|
||||
#define cmpxchg(a,b,c) _cmpxchg(&a,&b,&c)
|
||||
|
||||
This is really nice to have, since it allows you to only update a variable
|
||||
if the variable is what you expect it to be. You know if it succeeded if
|
||||
the return value (the old value of A) is equal to B.
|
||||
|
||||
The macro rt_mutex_cmpxchg is used to try to lock and unlock mutexes. If
|
||||
the architecture does not support CMPXCHG, then this macro is simply set
|
||||
to fail every time. But if CMPXCHG is supported, then this will
|
||||
help out extremely to keep the fast path short.
|
||||
|
||||
The use of rt_mutex_cmpxchg with the flags in the owner field help optimize
|
||||
the system for architectures that support it. This will also be explained
|
||||
later in this document.
|
||||
|
||||
|
||||
Priority adjustments
|
||||
--------------------
|
||||
|
||||
The implementation of the PI code in rtmutex.c has several places that a
|
||||
process must adjust its priority. With the help of the pi_list of a
|
||||
process this is rather easy to know what needs to be adjusted.
|
||||
|
||||
The functions implementing the task adjustments are rt_mutex_adjust_prio,
|
||||
__rt_mutex_adjust_prio (same as the former, but expects the task pi_lock
|
||||
to already be taken), rt_mutex_get_prio, and rt_mutex_setprio.
|
||||
|
||||
rt_mutex_getprio and rt_mutex_setprio are only used in __rt_mutex_adjust_prio.
|
||||
|
||||
rt_mutex_getprio returns the priority that the task should have. Either the
|
||||
task's own normal priority, or if a process of a higher priority is waiting on
|
||||
a mutex owned by the task, then that higher priority should be returned.
|
||||
Since the pi_list of a task holds an order by priority list of all the top
|
||||
waiters of all the mutexes that the task owns, rt_mutex_getprio simply needs
|
||||
to compare the top pi waiter to its own normal priority, and return the higher
|
||||
priority back.
|
||||
|
||||
(Note: if looking at the code, you will notice that the lower number of
|
||||
prio is returned. This is because the prio field in the task structure
|
||||
is an inverse order of the actual priority. So a "prio" of 5 is
|
||||
of higher priority than a "prio" of 10.)
|
||||
|
||||
__rt_mutex_adjust_prio examines the result of rt_mutex_getprio, and if the
|
||||
result does not equal the task's current priority, then rt_mutex_setprio
|
||||
is called to adjust the priority of the task to the new priority.
|
||||
Note that rt_mutex_setprio is defined in kernel/sched.c to implement the
|
||||
actual change in priority.
|
||||
|
||||
It is interesting to note that __rt_mutex_adjust_prio can either increase
|
||||
or decrease the priority of the task. In the case that a higher priority
|
||||
process has just blocked on a mutex owned by the task, __rt_mutex_adjust_prio
|
||||
would increase/boost the task's priority. But if a higher priority task
|
||||
were for some reason to leave the mutex (timeout or signal), this same function
|
||||
would decrease/unboost the priority of the task. That is because the pi_list
|
||||
always contains the highest priority task that is waiting on a mutex owned
|
||||
by the task, so we only need to compare the priority of that top pi waiter
|
||||
to the normal priority of the given task.
|
||||
|
||||
|
||||
High level overview of the PI chain walk
|
||||
----------------------------------------
|
||||
|
||||
The PI chain walk is implemented by the function rt_mutex_adjust_prio_chain.
|
||||
|
||||
The implementation has gone through several iterations, and has ended up
|
||||
with what we believe is the best. It walks the PI chain by only grabbing
|
||||
at most two locks at a time, and is very efficient.
|
||||
|
||||
The rt_mutex_adjust_prio_chain can be used either to boost or lower process
|
||||
priorities.
|
||||
|
||||
rt_mutex_adjust_prio_chain is called with a task to be checked for PI
|
||||
(de)boosting (the owner of a mutex that a process is blocking on), a flag to
|
||||
check for deadlocking, the mutex that the task owns, and a pointer to a waiter
|
||||
that is the process's waiter struct that is blocked on the mutex (although this
|
||||
parameter may be NULL for deboosting).
|
||||
|
||||
For this explanation, I will not mention deadlock detection. This explanation
|
||||
will try to stay at a high level.
|
||||
|
||||
When this function is called, there are no locks held. That also means
|
||||
that the state of the owner and lock can change when entered into this function.
|
||||
|
||||
Before this function is called, the task has already had rt_mutex_adjust_prio
|
||||
performed on it. This means that the task is set to the priority that it
|
||||
should be at, but the plist nodes of the task's waiter have not been updated
|
||||
with the new priorities, and that this task may not be in the proper locations
|
||||
in the pi_lists and wait_lists that the task is blocked on. This function
|
||||
solves all that.
|
||||
|
||||
A loop is entered, where task is the owner to be checked for PI changes that
|
||||
was passed by parameter (for the first iteration). The pi_lock of this task is
|
||||
taken to prevent any more changes to the pi_list of the task. This also
|
||||
prevents new tasks from completing the blocking on a mutex that is owned by this
|
||||
task.
|
||||
|
||||
If the task is not blocked on a mutex then the loop is exited. We are at
|
||||
the top of the PI chain.
|
||||
|
||||
A check is now done to see if the original waiter (the process that is blocked
|
||||
on the current mutex) is the top pi waiter of the task. That is, is this
|
||||
waiter on the top of the task's pi_list. If it is not, it either means that
|
||||
there is another process higher in priority that is blocked on one of the
|
||||
mutexes that the task owns, or that the waiter has just woken up via a signal
|
||||
or timeout and has left the PI chain. In either case, the loop is exited, since
|
||||
we don't need to do any more changes to the priority of the current task, or any
|
||||
task that owns a mutex that this current task is waiting on. A priority chain
|
||||
walk is only needed when a new top pi waiter is made to a task.
|
||||
|
||||
The next check sees if the task's waiter plist node has the priority equal to
|
||||
the priority the task is set at. If they are equal, then we are done with
|
||||
the loop. Remember that the function started with the priority of the
|
||||
task adjusted, but the plist nodes that hold the task in other processes
|
||||
pi_lists have not been adjusted.
|
||||
|
||||
Next, we look at the mutex that the task is blocked on. The mutex's wait_lock
|
||||
is taken. This is done by a spin_trylock, because the locking order of the
|
||||
pi_lock and wait_lock goes in the opposite direction. If we fail to grab the
|
||||
lock, the pi_lock is released, and we restart the loop.
|
||||
|
||||
Now that we have both the pi_lock of the task as well as the wait_lock of
|
||||
the mutex the task is blocked on, we update the task's waiter's plist node
|
||||
that is located on the mutex's wait_list.
|
||||
|
||||
Now we release the pi_lock of the task.
|
||||
|
||||
Next the owner of the mutex has its pi_lock taken, so we can update the
|
||||
task's entry in the owner's pi_list. If the task is the highest priority
|
||||
process on the mutex's wait_list, then we remove the previous top waiter
|
||||
from the owner's pi_list, and replace it with the task.
|
||||
|
||||
Note: It is possible that the task was the current top waiter on the mutex,
|
||||
in which case the task is not yet on the pi_list of the waiter. This
|
||||
is OK, since plist_del does nothing if the plist node is not on any
|
||||
list.
|
||||
|
||||
If the task was not the top waiter of the mutex, but it was before we
|
||||
did the priority updates, that means we are deboosting/lowering the
|
||||
task. In this case, the task is removed from the pi_list of the owner,
|
||||
and the new top waiter is added.
|
||||
|
||||
Lastly, we unlock both the pi_lock of the task, as well as the mutex's
|
||||
wait_lock, and continue the loop again. On the next iteration of the
|
||||
loop, the previous owner of the mutex will be the task that will be
|
||||
processed.
|
||||
|
||||
Note: One might think that the owner of this mutex might have changed
|
||||
since we just grab the mutex's wait_lock. And one could be right.
|
||||
The important thing to remember is that the owner could not have
|
||||
become the task that is being processed in the PI chain, since
|
||||
we have taken that task's pi_lock at the beginning of the loop.
|
||||
So as long as there is an owner of this mutex that is not the same
|
||||
process as the tasked being worked on, we are OK.
|
||||
|
||||
Looking closely at the code, one might be confused. The check for the
|
||||
end of the PI chain is when the task isn't blocked on anything or the
|
||||
task's waiter structure "task" element is NULL. This check is
|
||||
protected only by the task's pi_lock. But the code to unlock the mutex
|
||||
sets the task's waiter structure "task" element to NULL with only
|
||||
the protection of the mutex's wait_lock, which was not taken yet.
|
||||
Isn't this a race condition if the task becomes the new owner?
|
||||
|
||||
The answer is No! The trick is the spin_trylock of the mutex's
|
||||
wait_lock. If we fail that lock, we release the pi_lock of the
|
||||
task and continue the loop, doing the end of PI chain check again.
|
||||
|
||||
In the code to release the lock, the wait_lock of the mutex is held
|
||||
the entire time, and it is not let go when we grab the pi_lock of the
|
||||
new owner of the mutex. So if the switch of a new owner were to happen
|
||||
after the check for end of the PI chain and the grabbing of the
|
||||
wait_lock, the unlocking code would spin on the new owner's pi_lock
|
||||
but never give up the wait_lock. So the PI chain loop is guaranteed to
|
||||
fail the spin_trylock on the wait_lock, release the pi_lock, and
|
||||
try again.
|
||||
|
||||
If you don't quite understand the above, that's OK. You don't have to,
|
||||
unless you really want to make a proof out of it ;)
|
||||
|
||||
|
||||
Pending Owners and Lock stealing
|
||||
--------------------------------
|
||||
|
||||
One of the flags in the owner field of the mutex structure is "Pending Owner".
|
||||
What this means is that an owner was chosen by the process releasing the
|
||||
mutex, but that owner has yet to wake up and actually take the mutex.
|
||||
|
||||
Why is this important? Why can't we just give the mutex to another process
|
||||
and be done with it?
|
||||
|
||||
The PI code is to help with real-time processes, and to let the highest
|
||||
priority process run as long as possible with little latencies and delays.
|
||||
If a high priority process owns a mutex that a lower priority process is
|
||||
blocked on, when the mutex is released it would be given to the lower priority
|
||||
process. What if the higher priority process wants to take that mutex again.
|
||||
The high priority process would fail to take that mutex that it just gave up
|
||||
and it would need to boost the lower priority process to run with full
|
||||
latency of that critical section (since the low priority process just entered
|
||||
it).
|
||||
|
||||
There's no reason a high priority process that gives up a mutex should be
|
||||
penalized if it tries to take that mutex again. If the new owner of the
|
||||
mutex has not woken up yet, there's no reason that the higher priority process
|
||||
could not take that mutex away.
|
||||
|
||||
To solve this, we introduced Pending Ownership and Lock Stealing. When a
|
||||
new process is given a mutex that it was blocked on, it is only given
|
||||
pending ownership. This means that it's the new owner, unless a higher
|
||||
priority process comes in and tries to grab that mutex. If a higher priority
|
||||
process does come along and wants that mutex, we let the higher priority
|
||||
process "steal" the mutex from the pending owner (only if it is still pending)
|
||||
and continue with the mutex.
|
||||
|
||||
|
||||
Taking of a mutex (The walk through)
|
||||
------------------------------------
|
||||
|
||||
OK, now let's take a look at the detailed walk through of what happens when
|
||||
taking a mutex.
|
||||
|
||||
The first thing that is tried is the fast taking of the mutex. This is
|
||||
done when we have CMPXCHG enabled (otherwise the fast taking automatically
|
||||
fails). Only when the owner field of the mutex is NULL can the lock be
|
||||
taken with the CMPXCHG and nothing else needs to be done.
|
||||
|
||||
If there is contention on the lock, whether it is owned or pending owner
|
||||
we go about the slow path (rt_mutex_slowlock).
|
||||
|
||||
The slow path function is where the task's waiter structure is created on
|
||||
the stack. This is because the waiter structure is only needed for the
|
||||
scope of this function. The waiter structure holds the nodes to store
|
||||
the task on the wait_list of the mutex, and if need be, the pi_list of
|
||||
the owner.
|
||||
|
||||
The wait_lock of the mutex is taken since the slow path of unlocking the
|
||||
mutex also takes this lock.
|
||||
|
||||
We then call try_to_take_rt_mutex. This is where the architecture that
|
||||
does not implement CMPXCHG would always grab the lock (if there's no
|
||||
contention).
|
||||
|
||||
try_to_take_rt_mutex is used every time the task tries to grab a mutex in the
|
||||
slow path. The first thing that is done here is an atomic setting of
|
||||
the "Has Waiters" flag of the mutex's owner field. Yes, this could really
|
||||
be false, because if the the mutex has no owner, there are no waiters and
|
||||
the current task also won't have any waiters. But we don't have the lock
|
||||
yet, so we assume we are going to be a waiter. The reason for this is to
|
||||
play nice for those architectures that do have CMPXCHG. By setting this flag
|
||||
now, the owner of the mutex can't release the mutex without going into the
|
||||
slow unlock path, and it would then need to grab the wait_lock, which this
|
||||
code currently holds. So setting the "Has Waiters" flag forces the owner
|
||||
to synchronize with this code.
|
||||
|
||||
Now that we know that we can't have any races with the owner releasing the
|
||||
mutex, we check to see if we can take the ownership. This is done if the
|
||||
mutex doesn't have a owner, or if we can steal the mutex from a pending
|
||||
owner. Let's look at the situations we have here.
|
||||
|
||||
1) Has owner that is pending
|
||||
----------------------------
|
||||
|
||||
The mutex has a owner, but it hasn't woken up and the mutex flag
|
||||
"Pending Owner" is set. The first check is to see if the owner isn't the
|
||||
current task. This is because this function is also used for the pending
|
||||
owner to grab the mutex. When a pending owner wakes up, it checks to see
|
||||
if it can take the mutex, and this is done if the owner is already set to
|
||||
itself. If so, we succeed and leave the function, clearing the "Pending
|
||||
Owner" bit.
|
||||
|
||||
If the pending owner is not current, we check to see if the current priority is
|
||||
higher than the pending owner. If not, we fail the function and return.
|
||||
|
||||
There's also something special about a pending owner. That is a pending owner
|
||||
is never blocked on a mutex. So there is no PI chain to worry about. It also
|
||||
means that if the mutex doesn't have any waiters, there's no accounting needed
|
||||
to update the pending owner's pi_list, since we only worry about processes
|
||||
blocked on the current mutex.
|
||||
|
||||
If there are waiters on this mutex, and we just stole the ownership, we need
|
||||
to take the top waiter, remove it from the pi_list of the pending owner, and
|
||||
add it to the current pi_list. Note that at this moment, the pending owner
|
||||
is no longer on the list of waiters. This is fine, since the pending owner
|
||||
would add itself back when it realizes that it had the ownership stolen
|
||||
from itself. When the pending owner tries to grab the mutex, it will fail
|
||||
in try_to_take_rt_mutex if the owner field points to another process.
|
||||
|
||||
2) No owner
|
||||
-----------
|
||||
|
||||
If there is no owner (or we successfully stole the lock), we set the owner
|
||||
of the mutex to current, and set the flag of "Has Waiters" if the current
|
||||
mutex actually has waiters, or we clear the flag if it doesn't. See, it was
|
||||
OK that we set that flag early, since now it is cleared.
|
||||
|
||||
3) Failed to grab ownership
|
||||
---------------------------
|
||||
|
||||
The most interesting case is when we fail to take ownership. This means that
|
||||
there exists an owner, or there's a pending owner with equal or higher
|
||||
priority than the current task.
|
||||
|
||||
We'll continue on the failed case.
|
||||
|
||||
If the mutex has a timeout, we set up a timer to go off to break us out
|
||||
of this mutex if we failed to get it after a specified amount of time.
|
||||
|
||||
Now we enter a loop that will continue to try to take ownership of the mutex, or
|
||||
fail from a timeout or signal.
|
||||
|
||||
Once again we try to take the mutex. This will usually fail the first time
|
||||
in the loop, since it had just failed to get the mutex. But the second time
|
||||
in the loop, this would likely succeed, since the task would likely be
|
||||
the pending owner.
|
||||
|
||||
If the mutex is TASK_INTERRUPTIBLE a check for signals and timeout is done
|
||||
here.
|
||||
|
||||
The waiter structure has a "task" field that points to the task that is blocked
|
||||
on the mutex. This field can be NULL the first time it goes through the loop
|
||||
or if the task is a pending owner and had it's mutex stolen. If the "task"
|
||||
field is NULL then we need to set up the accounting for it.
|
||||
|
||||
Task blocks on mutex
|
||||
--------------------
|
||||
|
||||
The accounting of a mutex and process is done with the waiter structure of
|
||||
the process. The "task" field is set to the process, and the "lock" field
|
||||
to the mutex. The plist nodes are initialized to the processes current
|
||||
priority.
|
||||
|
||||
Since the wait_lock was taken at the entry of the slow lock, we can safely
|
||||
add the waiter to the wait_list. If the current process is the highest
|
||||
priority process currently waiting on this mutex, then we remove the
|
||||
previous top waiter process (if it exists) from the pi_list of the owner,
|
||||
and add the current process to that list. Since the pi_list of the owner
|
||||
has changed, we call rt_mutex_adjust_prio on the owner to see if the owner
|
||||
should adjust its priority accordingly.
|
||||
|
||||
If the owner is also blocked on a lock, and had its pi_list changed
|
||||
(or deadlock checking is on), we unlock the wait_lock of the mutex and go ahead
|
||||
and run rt_mutex_adjust_prio_chain on the owner, as described earlier.
|
||||
|
||||
Now all locks are released, and if the current process is still blocked on a
|
||||
mutex (waiter "task" field is not NULL), then we go to sleep (call schedule).
|
||||
|
||||
Waking up in the loop
|
||||
---------------------
|
||||
|
||||
The schedule can then wake up for a few reasons.
|
||||
1) we were given pending ownership of the mutex.
|
||||
2) we received a signal and was TASK_INTERRUPTIBLE
|
||||
3) we had a timeout and was TASK_INTERRUPTIBLE
|
||||
|
||||
In any of these cases, we continue the loop and once again try to grab the
|
||||
ownership of the mutex. If we succeed, we exit the loop, otherwise we continue
|
||||
and on signal and timeout, will exit the loop, or if we had the mutex stolen
|
||||
we just simply add ourselves back on the lists and go back to sleep.
|
||||
|
||||
Note: For various reasons, because of timeout and signals, the steal mutex
|
||||
algorithm needs to be careful. This is because the current process is
|
||||
still on the wait_list. And because of dynamic changing of priorities,
|
||||
especially on SCHED_OTHER tasks, the current process can be the
|
||||
highest priority task on the wait_list.
|
||||
|
||||
Failed to get mutex on Timeout or Signal
|
||||
----------------------------------------
|
||||
|
||||
If a timeout or signal occurred, the waiter's "task" field would not be
|
||||
NULL and the task needs to be taken off the wait_list of the mutex and perhaps
|
||||
pi_list of the owner. If this process was a high priority process, then
|
||||
the rt_mutex_adjust_prio_chain needs to be executed again on the owner,
|
||||
but this time it will be lowering the priorities.
|
||||
|
||||
|
||||
Unlocking the Mutex
|
||||
-------------------
|
||||
|
||||
The unlocking of a mutex also has a fast path for those architectures with
|
||||
CMPXCHG. Since the taking of a mutex on contention always sets the
|
||||
"Has Waiters" flag of the mutex's owner, we use this to know if we need to
|
||||
take the slow path when unlocking the mutex. If the mutex doesn't have any
|
||||
waiters, the owner field of the mutex would equal the current process and
|
||||
the mutex can be unlocked by just replacing the owner field with NULL.
|
||||
|
||||
If the owner field has the "Has Waiters" bit set (or CMPXCHG is not available),
|
||||
the slow unlock path is taken.
|
||||
|
||||
The first thing done in the slow unlock path is to take the wait_lock of the
|
||||
mutex. This synchronizes the locking and unlocking of the mutex.
|
||||
|
||||
A check is made to see if the mutex has waiters or not. On architectures that
|
||||
do not have CMPXCHG, this is the location that the owner of the mutex will
|
||||
determine if a waiter needs to be awoken or not. On architectures that
|
||||
do have CMPXCHG, that check is done in the fast path, but it is still needed
|
||||
in the slow path too. If a waiter of a mutex woke up because of a signal
|
||||
or timeout between the time the owner failed the fast path CMPXCHG check and
|
||||
the grabbing of the wait_lock, the mutex may not have any waiters, thus the
|
||||
owner still needs to make this check. If there are no waiters than the mutex
|
||||
owner field is set to NULL, the wait_lock is released and nothing more is
|
||||
needed.
|
||||
|
||||
If there are waiters, then we need to wake one up and give that waiter
|
||||
pending ownership.
|
||||
|
||||
On the wake up code, the pi_lock of the current owner is taken. The top
|
||||
waiter of the lock is found and removed from the wait_list of the mutex
|
||||
as well as the pi_list of the current owner. The task field of the new
|
||||
pending owner's waiter structure is set to NULL, and the owner field of the
|
||||
mutex is set to the new owner with the "Pending Owner" bit set, as well
|
||||
as the "Has Waiters" bit if there still are other processes blocked on the
|
||||
mutex.
|
||||
|
||||
The pi_lock of the previous owner is released, and the new pending owner's
|
||||
pi_lock is taken. Remember that this is the trick to prevent the race
|
||||
condition in rt_mutex_adjust_prio_chain from adding itself as a waiter
|
||||
on the mutex.
|
||||
|
||||
We now clear the "pi_blocked_on" field of the new pending owner, and if
|
||||
the mutex still has waiters pending, we add the new top waiter to the pi_list
|
||||
of the pending owner.
|
||||
|
||||
Finally we unlock the pi_lock of the pending owner and wake it up.
|
||||
|
||||
|
||||
Contact
|
||||
-------
|
||||
|
||||
For updates on this document, please email Steven Rostedt <rostedt@goodmis.org>
|
||||
|
||||
|
||||
Credits
|
||||
-------
|
||||
|
||||
Author: Steven Rostedt <rostedt@goodmis.org>
|
||||
|
||||
Reviewers: Ingo Molnar, Thomas Gleixner, Thomas Duetsch, and Randy Dunlap
|
||||
|
||||
Updates
|
||||
-------
|
||||
|
||||
This document was originally written for 2.6.17-rc3-mm1
|
79
Documentation/rt-mutex.txt
Normal file
79
Documentation/rt-mutex.txt
Normal file
@ -0,0 +1,79 @@
|
||||
RT-mutex subsystem with PI support
|
||||
----------------------------------
|
||||
|
||||
RT-mutexes with priority inheritance are used to support PI-futexes,
|
||||
which enable pthread_mutex_t priority inheritance attributes
|
||||
(PTHREAD_PRIO_INHERIT). [See Documentation/pi-futex.txt for more details
|
||||
about PI-futexes.]
|
||||
|
||||
This technology was developed in the -rt tree and streamlined for
|
||||
pthread_mutex support.
|
||||
|
||||
Basic principles:
|
||||
-----------------
|
||||
|
||||
RT-mutexes extend the semantics of simple mutexes by the priority
|
||||
inheritance protocol.
|
||||
|
||||
A low priority owner of a rt-mutex inherits the priority of a higher
|
||||
priority waiter until the rt-mutex is released. If the temporarily
|
||||
boosted owner blocks on a rt-mutex itself it propagates the priority
|
||||
boosting to the owner of the other rt_mutex it gets blocked on. The
|
||||
priority boosting is immediately removed once the rt_mutex has been
|
||||
unlocked.
|
||||
|
||||
This approach allows us to shorten the block of high-prio tasks on
|
||||
mutexes which protect shared resources. Priority inheritance is not a
|
||||
magic bullet for poorly designed applications, but it allows
|
||||
well-designed applications to use userspace locks in critical parts of
|
||||
an high priority thread, without losing determinism.
|
||||
|
||||
The enqueueing of the waiters into the rtmutex waiter list is done in
|
||||
priority order. For same priorities FIFO order is chosen. For each
|
||||
rtmutex, only the top priority waiter is enqueued into the owner's
|
||||
priority waiters list. This list too queues in priority order. Whenever
|
||||
the top priority waiter of a task changes (for example it timed out or
|
||||
got a signal), the priority of the owner task is readjusted. [The
|
||||
priority enqueueing is handled by "plists", see include/linux/plist.h
|
||||
for more details.]
|
||||
|
||||
RT-mutexes are optimized for fastpath operations and have no internal
|
||||
locking overhead when locking an uncontended mutex or unlocking a mutex
|
||||
without waiters. The optimized fastpath operations require cmpxchg
|
||||
support. [If that is not available then the rt-mutex internal spinlock
|
||||
is used]
|
||||
|
||||
The state of the rt-mutex is tracked via the owner field of the rt-mutex
|
||||
structure:
|
||||
|
||||
rt_mutex->owner holds the task_struct pointer of the owner. Bit 0 and 1
|
||||
are used to keep track of the "owner is pending" and "rtmutex has
|
||||
waiters" state.
|
||||
|
||||
owner bit1 bit0
|
||||
NULL 0 0 mutex is free (fast acquire possible)
|
||||
NULL 0 1 invalid state
|
||||
NULL 1 0 Transitional state*
|
||||
NULL 1 1 invalid state
|
||||
taskpointer 0 0 mutex is held (fast release possible)
|
||||
taskpointer 0 1 task is pending owner
|
||||
taskpointer 1 0 mutex is held and has waiters
|
||||
taskpointer 1 1 task is pending owner and mutex has waiters
|
||||
|
||||
Pending-ownership handling is a performance optimization:
|
||||
pending-ownership is assigned to the first (highest priority) waiter of
|
||||
the mutex, when the mutex is released. The thread is woken up and once
|
||||
it starts executing it can acquire the mutex. Until the mutex is taken
|
||||
by it (bit 0 is cleared) a competing higher priority thread can "steal"
|
||||
the mutex which puts the woken up thread back on the waiters list.
|
||||
|
||||
The pending-ownership optimization is especially important for the
|
||||
uninterrupted workflow of high-prio tasks which repeatedly
|
||||
takes/releases locks that have lower-prio waiters. Without this
|
||||
optimization the higher-prio thread would ping-pong to the lower-prio
|
||||
task [because at unlock time we always assign a new owner].
|
||||
|
||||
(*) The "mutex has waiters" bit gets set to take the lock. If the lock
|
||||
doesn't already have an owner, this bit is quickly cleared if there are
|
||||
no waiters. So this is a transitional state to synchronize with looking
|
||||
at the owner field of the mutex and the mutex owner releasing the lock.
|
@ -44,8 +44,10 @@ normal timer interrupt, which is 100Hz.
|
||||
Programming and/or enabling interrupt frequencies greater than 64Hz is
|
||||
only allowed by root. This is perhaps a bit conservative, but we don't want
|
||||
an evil user generating lots of IRQs on a slow 386sx-16, where it might have
|
||||
a negative impact on performance. Note that the interrupt handler is only
|
||||
a few lines of code to minimize any possibility of this effect.
|
||||
a negative impact on performance. This 64Hz limit can be changed by writing
|
||||
a different value to /proc/sys/dev/rtc/max-user-freq. Note that the
|
||||
interrupt handler is only a few lines of code to minimize any possibility
|
||||
of this effect.
|
||||
|
||||
Also, if the kernel time is synchronized with an external source, the
|
||||
kernel will write the time back to the CMOS clock every 11 minutes. In
|
||||
@ -81,6 +83,7 @@ that will be using this driver.
|
||||
*/
|
||||
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <linux/rtc.h>
|
||||
#include <sys/ioctl.h>
|
||||
#include <sys/time.h>
|
||||
|
@ -30,8 +30,6 @@ aic7xxx.txt
|
||||
- info on driver for Adaptec controllers
|
||||
aic7xxx_old.txt
|
||||
- info on driver for Adaptec controllers, old generation
|
||||
cpqfc.txt
|
||||
- info on driver for Compaq Tachyon TS adapters
|
||||
dpti.txt
|
||||
- info on driver for DPT SmartRAID and Adaptec I2O RAID based adapters
|
||||
dtc3x80.txt
|
||||
|
@ -1,3 +1,16 @@
|
||||
|
||||
1 Release Date : Wed Feb 03 14:31:44 PST 2006 - Sumant Patro <Sumant.Patro@lsil.com>
|
||||
2 Current Version : 00.00.02.04
|
||||
3 Older Version : 00.00.02.04
|
||||
|
||||
i. Remove superflous instance_lock
|
||||
|
||||
gets rid of the otherwise superflous instance_lock and avoids an unsave
|
||||
unsynchronized access in the error handler.
|
||||
|
||||
- Christoph Hellwig <hch@lst.de>
|
||||
|
||||
|
||||
1 Release Date : Wed Feb 03 14:31:44 PST 2006 - Sumant Patro <Sumant.Patro@lsil.com>
|
||||
2 Current Version : 00.00.02.04
|
||||
3 Older Version : 00.00.02.04
|
||||
|
@ -24,10 +24,10 @@ Supported Cards/Chipsets
|
||||
9005:0285:9005:0296 Adaptec 2240S (SabreExpress)
|
||||
9005:0285:9005:0290 Adaptec 2410SA (Jaguar)
|
||||
9005:0285:9005:0293 Adaptec 21610SA (Corsair-16)
|
||||
9005:0285:103c:3227 Adaptec 2610SA (Bearcat)
|
||||
9005:0285:103c:3227 Adaptec 2610SA (Bearcat HP release)
|
||||
9005:0285:9005:0292 Adaptec 2810SA (Corsair-8)
|
||||
9005:0285:9005:0294 Adaptec Prowler
|
||||
9005:0286:9005:029d Adaptec 2420SA (Intruder)
|
||||
9005:0286:9005:029d Adaptec 2420SA (Intruder HP release)
|
||||
9005:0286:9005:029c Adaptec 2620SA (Intruder)
|
||||
9005:0286:9005:029b Adaptec 2820SA (Intruder)
|
||||
9005:0286:9005:02a7 Adaptec 2830SA (Skyray)
|
||||
@ -38,7 +38,7 @@ Supported Cards/Chipsets
|
||||
9005:0285:9005:0297 Adaptec 4005SAS (AvonPark)
|
||||
9005:0285:9005:0299 Adaptec 4800SAS (Marauder-X)
|
||||
9005:0285:9005:029a Adaptec 4805SAS (Marauder-E)
|
||||
9005:0286:9005:02a2 Adaptec 4810SAS (Hurricane)
|
||||
9005:0286:9005:02a2 Adaptec 3800SAS (Hurricane44)
|
||||
1011:0046:9005:0364 Adaptec 5400S (Mustang)
|
||||
1011:0046:9005:0365 Adaptec 5400S (Mustang)
|
||||
9005:0283:9005:0283 Adaptec Catapult (3210S with arc firmware)
|
||||
@ -72,7 +72,7 @@ Supported Cards/Chipsets
|
||||
9005:0286:9005:02a1 ICP ICP9087MA (Lancer)
|
||||
9005:0286:9005:02a4 ICP ICP9085LI (Marauder-X)
|
||||
9005:0286:9005:02a5 ICP ICP5085BR (Marauder-E)
|
||||
9005:0286:9005:02a3 ICP ICP5085AU (Hurricane)
|
||||
9005:0286:9005:02a3 ICP ICP5445AU (Hurricane44)
|
||||
9005:0286:9005:02a6 ICP ICP9067MA (Intruder-6)
|
||||
9005:0286:9005:02a9 ICP ICP5087AU (Skyray)
|
||||
9005:0286:9005:02aa ICP ICP5047AU (Skyray)
|
||||
|
@ -1,272 +0,0 @@
|
||||
Notes for CPQFCTS driver for Compaq Tachyon TS
|
||||
Fibre Channel Host Bus Adapter, PCI 64-bit, 66MHz
|
||||
for Linux (RH 6.1, 6.2 kernel 2.2.12-32, 2.2.14-5)
|
||||
SMP tested
|
||||
Tested in single and dual HBA configuration, 32 and 64bit busses,
|
||||
33 and 66MHz. Only supports FC-AL.
|
||||
SEST size 512 Exchanges (simultaneous I/Os) limited by module kmalloc()
|
||||
max of 128k bytes contiguous.
|
||||
|
||||
Ver 2.5.4 Oct 03, 2002
|
||||
* fixed memcpy of sense buffer in ioctl to copy the smaller defined size
|
||||
Ver 2.5.3 Aug 01, 2002
|
||||
* fix the passthru ioctl to handle the Scsi_Cmnd->request being a pointer
|
||||
Ver 2.5.1 Jul 30, 2002
|
||||
* fix ioctl to pay attention to the specified LUN.
|
||||
Ver 2.5.0 Nov 29, 2001
|
||||
* eliminated io_request_lock. This change makes the driver specific
|
||||
to the 2.5.x kernels.
|
||||
* silenced excessively noisy printks.
|
||||
|
||||
Ver 2.1.2 July 23, 2002
|
||||
* initialize DumCmnd->lun in cpqfcTS_ioctl (used in fcFindLoggedInPorts as LUN index)
|
||||
|
||||
Ver 2.1.1 Oct 18, 2001
|
||||
* reinitialize Cmnd->SCp.sent_command (used to identify commands as
|
||||
passthrus) on calling scsi_done, since the scsi mid layer does not
|
||||
use (or reinitialize) this field to prevent subsequent comands from
|
||||
having it set incorrectly.
|
||||
|
||||
Ver 2.1.0 Aug 27, 2001
|
||||
* Revise driver to use new kernel 2.4.x PCI DMA API, instead of
|
||||
virt_to_bus(). (enables driver to work w/ ia64 systems with >2Gb RAM.)
|
||||
Rework main scatter-gather code to handle cases where SG element
|
||||
lengths are larger than 0x7FFFF bytes and use as many scatter
|
||||
gather pages as necessary. (Steve Cameron)
|
||||
* Makefile changes to bring cpqfc into line w/ rest of SCSI drivers
|
||||
(thanks to Keith Owens)
|
||||
|
||||
Ver 2.0.5 Aug 06, 2001
|
||||
* Reject non-existent luns in the driver rather than letting the
|
||||
hardware do it. (some HW behaves differently than others in this area.)
|
||||
* Changed Makefile to rely on "make dep" instead of explicit dependencies
|
||||
* ifdef'ed out fibre channel analyzer triggering debug code
|
||||
* fixed a jiffies wrapping issue
|
||||
|
||||
Ver 2.0.4 Aug 01, 2001
|
||||
* Incorporated fix for target device reset from Steeleye
|
||||
* Fixed passthrough ioctl so it doesn't hang.
|
||||
* Fixed hang in launch_FCworker_thread() that occurred on some machines.
|
||||
* Avoid problem when number of volumes in a single cabinet > 8
|
||||
|
||||
Ver 2.0.2 July 23, 2001
|
||||
Changed the semiphore changes so the driver would compile in 2.4.7.
|
||||
This version is for 2.4.7 and beyond.
|
||||
|
||||
Ver 2.0.1 May 7, 2001
|
||||
Merged version 1.3.6 fixes into version 2.0.0.
|
||||
|
||||
Ver 2.0.0 May 7, 2001
|
||||
Fixed problem so spinlock is being initialized to UNLOCKED.
|
||||
Fixed updated driver so it compiles in the 2.4 tree.
|
||||
|
||||
Ver 1.3.6 Feb 27, 2001
|
||||
Added Target_Device_Reset function for SCSI error handling
|
||||
Fixed problem with not reseting addressing mode after implicit logout
|
||||
|
||||
|
||||
Ver 1.3.4 Sep 7, 2000
|
||||
Added Modinfo information
|
||||
Fixed problem with statically linking the driver
|
||||
|
||||
Ver 1.3.3, Aug 23, 2000
|
||||
Fixed device/function number in ioctl
|
||||
|
||||
Ver 1.3.2, July 27, 2000
|
||||
Add include for Alpha compile on 2.2.14 kernel (cpq*i2c.c)
|
||||
Change logic for different FCP-RSP sense_buffer location for HSG80 target
|
||||
And search for Agilent Tachyon XL2 HBAs (not finished! - in test)
|
||||
|
||||
Tested with
|
||||
(storage):
|
||||
Compaq RA-4x000, RAID firmware ver 2.40 - 2.54
|
||||
Seagate FC drives model ST39102FC, rev 0006
|
||||
Hitachi DK31CJ-72FC rev J8A8
|
||||
IBM DDYF-T18350R rev F60K
|
||||
Compaq FC-SCSI bridge w/ DLT 35/70 Gb DLT (tape)
|
||||
(servers):
|
||||
Compaq PL-1850R
|
||||
Compaq PL-6500 Xeon (400MHz)
|
||||
Compaq PL-8500 (500MHz, 66MHz, 64bit PCI)
|
||||
Compaq Alpha DS20 (RH 6.1)
|
||||
(hubs):
|
||||
Vixel Rapport 1000 (7-port "dumb")
|
||||
Gadzoox Gibralter (12-port "dumb")
|
||||
Gadzoox Capellix 2000, 3000
|
||||
(switches):
|
||||
Brocade 2010, 2400, 2800, rev 2.0.3a (& later)
|
||||
Gadzoox 3210 (Fabric blade beta)
|
||||
Vixel 7100 (Fabric beta firmare - known hot plug issues)
|
||||
using "qa_test" (esp. io_test script) suite modified from Unix tests.
|
||||
|
||||
Installation:
|
||||
make menuconfig
|
||||
(select SCSI low-level, Compaq FC HBA)
|
||||
make modules
|
||||
make modules_install
|
||||
|
||||
e.g. insmod -f cpqfc
|
||||
|
||||
Due to Fabric/switch delays, driver requires 4 seconds
|
||||
to initialize. If adapters are found, there will be a entries at
|
||||
/proc/scsi/cpqfcTS/*
|
||||
|
||||
sample contents of startup messages
|
||||
|
||||
*************************
|
||||
scsi_register allocating 3596 bytes for CPQFCHBA
|
||||
ioremap'd Membase: c887e600
|
||||
HBA Tachyon RevId 1.2
|
||||
Allocating 119808 for 576 Exchanges @ c0dc0000
|
||||
Allocating 112904 for LinkQ @ c0c20000 (576 elements)
|
||||
Allocating 110600 for TachSEST for 512 Exchanges
|
||||
cpqfcTS: writing IMQ BASE 7C0000h PI 7C4000h
|
||||
cpqfcTS: SEST c0e40000(virt): Wrote base E40000h @ c887e740
|
||||
cpqfcTS: New FC port 0000E8h WWN: 500507650642499D SCSI Chan/Trgt 0/0
|
||||
cpqfcTS: New FC port 0000EFh WWN: 50000E100000D5A6 SCSI Chan/Trgt 0/1
|
||||
cpqfcTS: New FC port 0000E4h WWN: 21000020370097BB SCSI Chan/Trgt 0/2
|
||||
cpqfcTS: New FC port 0000E2h WWN: 2100002037009946 SCSI Chan/Trgt 0/3
|
||||
cpqfcTS: New FC port 0000E1h WWN: 21000020370098FE SCSI Chan/Trgt 0/4
|
||||
cpqfcTS: New FC port 0000E0h WWN: 21000020370097B2 SCSI Chan/Trgt 0/5
|
||||
cpqfcTS: New FC port 0000DCh WWN: 2100002037006CC1 SCSI Chan/Trgt 0/6
|
||||
cpqfcTS: New FC port 0000DAh WWN: 21000020370059F6 SCSI Chan/Trgt 0/7
|
||||
cpqfcTS: New FC port 00000Fh WWN: 500805F1FADB0E20 SCSI Chan/Trgt 0/8
|
||||
cpqfcTS: New FC port 000008h WWN: 500805F1FADB0EBA SCSI Chan/Trgt 0/9
|
||||
cpqfcTS: New FC port 000004h WWN: 500805F1FADB1EB9 SCSI Chan/Trgt 0/10
|
||||
cpqfcTS: New FC port 000002h WWN: 500805F1FADB1ADE SCSI Chan/Trgt 0/11
|
||||
cpqfcTS: New FC port 000001h WWN: 500805F1FADBA2CA SCSI Chan/Trgt 0/12
|
||||
scsi4 : Compaq FibreChannel HBA Tachyon TS HPFC-5166A/1.2: WWN 500508B200193F50
|
||||
on PCI bus 0 device 0xa0fc irq 5 IObaseL 0x3400, MEMBASE 0xc6ef8600
|
||||
PCI bus width 32 bits, bus speed 33 MHz
|
||||
FCP-SCSI Driver v1.3.0
|
||||
GBIC detected: Short-wave. LPSM 0h Monitor
|
||||
scsi : 5 hosts.
|
||||
Vendor: IBM Model: DDYF-T18350R Rev: F60K
|
||||
Type: Direct-Access ANSI SCSI revision: 03
|
||||
Detected scsi disk sdb at scsi4, channel 0, id 0, lun 0
|
||||
Vendor: HITACHI Model: DK31CJ-72FC Rev: J8A8
|
||||
Type: Direct-Access ANSI SCSI revision: 02
|
||||
Detected scsi disk sdc at scsi4, channel 0, id 1, lun 0
|
||||
Vendor: SEAGATE Model: ST39102FC Rev: 0006
|
||||
Type: Direct-Access ANSI SCSI revision: 02
|
||||
Detected scsi disk sdd at scsi4, channel 0, id 2, lun 0
|
||||
Vendor: SEAGATE Model: ST39102FC Rev: 0006
|
||||
Type: Direct-Access ANSI SCSI revision: 02
|
||||
Detected scsi disk sde at scsi4, channel 0, id 3, lun 0
|
||||
Vendor: SEAGATE Model: ST39102FC Rev: 0006
|
||||
Type: Direct-Access ANSI SCSI revision: 02
|
||||
Detected scsi disk sdf at scsi4, channel 0, id 4, lun 0
|
||||
Vendor: SEAGATE Model: ST39102FC Rev: 0006
|
||||
Type: Direct-Access ANSI SCSI revision: 02
|
||||
Detected scsi disk sdg at scsi4, channel 0, id 5, lun 0
|
||||
Vendor: SEAGATE Model: ST39102FC Rev: 0006
|
||||
Type: Direct-Access ANSI SCSI revision: 02
|
||||
Detected scsi disk sdh at scsi4, channel 0, id 6, lun 0
|
||||
Vendor: SEAGATE Model: ST39102FC Rev: 0006
|
||||
Type: Direct-Access ANSI SCSI revision: 02
|
||||
Detected scsi disk sdi at scsi4, channel 0, id 7, lun 0
|
||||
Vendor: COMPAQ Model: LOGICAL VOLUME Rev: 2.48
|
||||
Type: Direct-Access ANSI SCSI revision: 02
|
||||
Detected scsi disk sdj at scsi4, channel 0, id 8, lun 0
|
||||
Vendor: COMPAQ Model: LOGICAL VOLUME Rev: 2.48
|
||||
Type: Direct-Access ANSI SCSI revision: 02
|
||||
Detected scsi disk sdk at scsi4, channel 0, id 8, lun 1
|
||||
Vendor: COMPAQ Model: LOGICAL VOLUME Rev: 2.40
|
||||
Type: Direct-Access ANSI SCSI revision: 02
|
||||
Detected scsi disk sdl at scsi4, channel 0, id 9, lun 0
|
||||
Vendor: COMPAQ Model: LOGICAL VOLUME Rev: 2.40
|
||||
Type: Direct-Access ANSI SCSI revision: 02
|
||||
Detected scsi disk sdm at scsi4, channel 0, id 9, lun 1
|
||||
Vendor: COMPAQ Model: LOGICAL VOLUME Rev: 2.54
|
||||
Type: Direct-Access ANSI SCSI revision: 02
|
||||
Detected scsi disk sdn at scsi4, channel 0, id 10, lun 0
|
||||
Vendor: COMPAQ Model: LOGICAL VOLUME Rev: 2.54
|
||||
Type: Direct-Access ANSI SCSI revision: 02
|
||||
Detected scsi disk sdo at scsi4, channel 0, id 11, lun 0
|
||||
Vendor: COMPAQ Model: LOGICAL VOLUME Rev: 2.54
|
||||
Type: Direct-Access ANSI SCSI revision: 02
|
||||
Detected scsi disk sdp at scsi4, channel 0, id 11, lun 1
|
||||
Vendor: COMPAQ Model: LOGICAL VOLUME Rev: 2.54
|
||||
Type: Direct-Access ANSI SCSI revision: 02
|
||||
Detected scsi disk sdq at scsi4, channel 0, id 12, lun 0
|
||||
Vendor: COMPAQ Model: LOGICAL VOLUME Rev: 2.54
|
||||
Type: Direct-Access ANSI SCSI revision: 02
|
||||
Detected scsi disk sdr at scsi4, channel 0, id 12, lun 1
|
||||
resize_dma_pool: unknown device type 12
|
||||
resize_dma_pool: unknown device type 12
|
||||
SCSI device sdb: hdwr sector= 512 bytes. Sectors= 35843670 [17501 MB] [17.5 GB]
|
||||
sdb: sdb1
|
||||
SCSI device sdc: hdwr sector= 512 bytes. Sectors= 144410880 [70513 MB] [70.5 GB]
|
||||
sdc: sdc1
|
||||
SCSI device sdd: hdwr sector= 512 bytes. Sectors= 17783240 [8683 MB] [8.7 GB]
|
||||
sdd: sdd1
|
||||
SCSI device sde: hdwr sector= 512 bytes. Sectors= 17783240 [8683 MB] [8.7 GB]
|
||||
sde: sde1
|
||||
SCSI device sdf: hdwr sector= 512 bytes. Sectors= 17783240 [8683 MB] [8.7 GB]
|
||||
sdf: sdf1
|
||||
SCSI device sdg: hdwr sector= 512 bytes. Sectors= 17783240 [8683 MB] [8.7 GB]
|
||||
sdg: sdg1
|
||||
SCSI device sdh: hdwr sector= 512 bytes. Sectors= 17783240 [8683 MB] [8.7 GB]
|
||||
sdh: sdh1
|
||||
SCSI device sdi: hdwr sector= 512 bytes. Sectors= 17783240 [8683 MB] [8.7 GB]
|
||||
sdi: sdi1
|
||||
SCSI device sdj: hdwr sector= 512 bytes. Sectors= 2056160 [1003 MB] [1.0 GB]
|
||||
sdj: sdj1
|
||||
SCSI device sdk: hdwr sector= 512 bytes. Sectors= 2052736 [1002 MB] [1.0 GB]
|
||||
sdk: sdk1
|
||||
SCSI device sdl: hdwr sector= 512 bytes. Sectors= 17764320 [8673 MB] [8.7 GB]
|
||||
sdl: sdl1
|
||||
SCSI device sdm: hdwr sector= 512 bytes. Sectors= 8380320 [4091 MB] [4.1 GB]
|
||||
sdm: sdm1
|
||||
SCSI device sdn: hdwr sector= 512 bytes. Sectors= 17764320 [8673 MB] [8.7 GB]
|
||||
sdn: sdn1
|
||||
SCSI device sdo: hdwr sector= 512 bytes. Sectors= 17764320 [8673 MB] [8.7 GB]
|
||||
sdo: sdo1
|
||||
SCSI device sdp: hdwr sector= 512 bytes. Sectors= 17764320 [8673 MB] [8.7 GB]
|
||||
sdp: sdp1
|
||||
SCSI device sdq: hdwr sector= 512 bytes. Sectors= 2056160 [1003 MB] [1.0 GB]
|
||||
sdq: sdq1
|
||||
SCSI device sdr: hdwr sector= 512 bytes. Sectors= 2052736 [1002 MB] [1.0 GB]
|
||||
sdr: sdr1
|
||||
|
||||
*************************
|
||||
|
||||
If a GBIC of type Short-wave, Long-wave, or Copper is detected, it will
|
||||
print out; otherwise, "none" is displayed. If the cabling is correct
|
||||
and a loop circuit is completed, you should see "Monitor"; otherwise,
|
||||
"LoopFail" (on open circuit) or some LPSM number/state with bit 3 set.
|
||||
|
||||
|
||||
ERRATA:
|
||||
1. Normally, Linux Scsi queries FC devices with INQUIRY strings. All LUNs
|
||||
found according to INQUIRY should get READ commands at sector 0 to find
|
||||
partition table, etc. Older kernels only query the first 4 devices. Some
|
||||
Linux kernels only look for one LUN per target (i.e. FC device).
|
||||
|
||||
2. Physically removing a device, or a malfunctioning system which hides a
|
||||
device, leads to a 30-second timeout and subsequent _abort call.
|
||||
In some process contexts, this will hang the kernel (crashing the system).
|
||||
Single bit errors in frames and virtually all hot plugging events are
|
||||
gracefully handled with internal driver timer and Abort processing.
|
||||
|
||||
3. Some SCSI drives with error conditions will not handle the 7 second timeout
|
||||
in this software driver, leading to infinite retries on timed out SCSI commands.
|
||||
The 7 secs balances the need to quickly recover from lost frames (esp. on sequence
|
||||
initiatives) and time needed by older/slower/error-state drives in responding.
|
||||
This can be easily changed in "Exchanges[].timeOut".
|
||||
|
||||
4. Due to the nature of FC soft addressing, there is no assurance that the
|
||||
same LUNs (drives) will have the same path (e.g. /dev/sdb1) from one boot to
|
||||
next. Dynamic soft address changes (i.e. 24-bit FC port_id) are
|
||||
supported during run time (e.g. due to hot plug event) by the use of WWN to
|
||||
SCSI Nexus (channel/target/LUN) mapping.
|
||||
|
||||
5. Compaq RA4x00 firmware version 2.54 and later supports SSP (Selective
|
||||
Storage Presentation), which maps LUNs to a WWN. If RA4x00 firmware prior
|
||||
2.54 (e.g. older controller) is used, or the FC HBA is replaced (another WWN
|
||||
is used), logical volumes on the RA4x00 will no longer be visible.
|
||||
|
||||
|
||||
Send questions/comments to:
|
||||
Amy Vanzant-Hodge (fibrechannel@compaq.com)
|
||||
|
92
Documentation/scsi/hptiop.txt
Normal file
92
Documentation/scsi/hptiop.txt
Normal file
@ -0,0 +1,92 @@
|
||||
HIGHPOINT ROCKETRAID 3xxx RAID DRIVER (hptiop)
|
||||
|
||||
Controller Register Map
|
||||
-------------------------
|
||||
|
||||
The controller IOP is accessed via PCI BAR0.
|
||||
|
||||
BAR0 offset Register
|
||||
0x10 Inbound Message Register 0
|
||||
0x14 Inbound Message Register 1
|
||||
0x18 Outbound Message Register 0
|
||||
0x1C Outbound Message Register 1
|
||||
0x20 Inbound Doorbell Register
|
||||
0x24 Inbound Interrupt Status Register
|
||||
0x28 Inbound Interrupt Mask Register
|
||||
0x30 Outbound Interrupt Status Register
|
||||
0x34 Outbound Interrupt Mask Register
|
||||
0x40 Inbound Queue Port
|
||||
0x44 Outbound Queue Port
|
||||
|
||||
|
||||
I/O Request Workflow
|
||||
----------------------
|
||||
|
||||
All queued requests are handled via inbound/outbound queue port.
|
||||
A request packet can be allocated in either IOP or host memory.
|
||||
|
||||
To send a request to the controller:
|
||||
|
||||
- Get a free request packet by reading the inbound queue port or
|
||||
allocate a free request in host DMA coherent memory.
|
||||
|
||||
The value returned from the inbound queue port is an offset
|
||||
relative to the IOP BAR0.
|
||||
|
||||
Requests allocated in host memory must be aligned on 32-bytes boundary.
|
||||
|
||||
- Fill the packet.
|
||||
|
||||
- Post the packet to IOP by writing it to inbound queue. For requests
|
||||
allocated in IOP memory, write the offset to inbound queue port. For
|
||||
requests allocated in host memory, write (0x80000000|(bus_addr>>5))
|
||||
to the inbound queue port.
|
||||
|
||||
- The IOP process the request. When the request is completed, it
|
||||
will be put into outbound queue. An outbound interrupt will be
|
||||
generated.
|
||||
|
||||
For requests allocated in IOP memory, the request offset is posted to
|
||||
outbound queue.
|
||||
|
||||
For requests allocated in host memory, (0x80000000|(bus_addr>>5))
|
||||
is posted to the outbound queue. If IOP_REQUEST_FLAG_OUTPUT_CONTEXT
|
||||
flag is set in the request, the low 32-bit context value will be
|
||||
posted instead.
|
||||
|
||||
- The host read the outbound queue and complete the request.
|
||||
|
||||
For requests allocated in IOP memory, the host driver free the request
|
||||
by writing it to the outbound queue.
|
||||
|
||||
Non-queued requests (reset/flush etc) can be sent via inbound message
|
||||
register 0. An outbound message with the same value indicates the completion
|
||||
of an inbound message.
|
||||
|
||||
|
||||
User-level Interface
|
||||
---------------------
|
||||
|
||||
The driver exposes following sysfs attributes:
|
||||
|
||||
NAME R/W Description
|
||||
driver-version R driver version string
|
||||
firmware-version R firmware version string
|
||||
|
||||
The driver registers char device "hptiop" to communicate with HighPoint RAID
|
||||
management software. Its ioctl routine acts as a general binary interface
|
||||
between the IOP firmware and HighPoint RAID management software. New management
|
||||
functions can be implemented in application/firmware without modification
|
||||
in driver code.
|
||||
|
||||
|
||||
-----------------------------------------------------------------------------
|
||||
Copyright (C) 2006 HighPoint Technologies, Inc. All Rights Reserved.
|
||||
|
||||
This file is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
linux@highpoint-tech.com
|
||||
http://www.highpoint-tech.com
|
@ -12,5 +12,3 @@ http://www.torque.net/parport/
|
||||
Email list for Linux Parport
|
||||
linux-parport@torque.net
|
||||
|
||||
Email for problems with ZIP or ZIP Plus drivers
|
||||
campbell@torque.net
|
||||
|
@ -109,7 +109,7 @@ than the 33.33 MHz being in the PCI spec.
|
||||
|
||||
If you want to share the IRQ with another device and the driver refuses to
|
||||
do so, you might succeed with changing the DC390_IRQ type in tmscsim.c to
|
||||
SA_SHIRQ | SA_INTERRUPT.
|
||||
IRQF_SHARED | IRQF_DISABLED.
|
||||
|
||||
|
||||
3.Features
|
||||
|
@ -366,7 +366,9 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
|
||||
|
||||
Module for C-Media CMI8338 and 8738 PCI sound cards.
|
||||
|
||||
mpu_port - 0x300,0x310,0x320,0x330, 0 = disable (default)
|
||||
mpu_port - 0x300,0x310,0x320,0x330 = legacy port,
|
||||
1 = integrated PCI port,
|
||||
0 = disable (default)
|
||||
fm_port - 0x388 (default), 0 = disable (default)
|
||||
soft_ac3 - Software-conversion of raw SPDIF packets (model 033 only)
|
||||
(default = 1)
|
||||
@ -468,7 +470,23 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
|
||||
|
||||
Module for multifunction CS5535 companion PCI device
|
||||
|
||||
The power-management is supported.
|
||||
|
||||
Module snd-darla20
|
||||
------------------
|
||||
|
||||
Module for Echoaudio Darla20
|
||||
|
||||
This module supports multiple cards.
|
||||
The driver requires the firmware loader support on kernel.
|
||||
|
||||
Module snd-darla24
|
||||
------------------
|
||||
|
||||
Module for Echoaudio Darla24
|
||||
|
||||
This module supports multiple cards.
|
||||
The driver requires the firmware loader support on kernel.
|
||||
|
||||
Module snd-dt019x
|
||||
-----------------
|
||||
@ -497,6 +515,14 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
|
||||
|
||||
The power-management is supported.
|
||||
|
||||
Module snd-echo3g
|
||||
-----------------
|
||||
|
||||
Module for Echoaudio 3G cards (Gina3G/Layla3G)
|
||||
|
||||
This module supports multiple cards.
|
||||
The driver requires the firmware loader support on kernel.
|
||||
|
||||
Module snd-emu10k1
|
||||
------------------
|
||||
|
||||
@ -655,6 +681,22 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
|
||||
|
||||
The power-management is supported.
|
||||
|
||||
Module snd-gina20
|
||||
-----------------
|
||||
|
||||
Module for Echoaudio Gina20
|
||||
|
||||
This module supports multiple cards.
|
||||
The driver requires the firmware loader support on kernel.
|
||||
|
||||
Module snd-gina24
|
||||
-----------------
|
||||
|
||||
Module for Echoaudio Gina24
|
||||
|
||||
This module supports multiple cards.
|
||||
The driver requires the firmware loader support on kernel.
|
||||
|
||||
Module snd-gusclassic
|
||||
---------------------
|
||||
|
||||
@ -707,8 +749,10 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
|
||||
Module snd-hda-intel
|
||||
--------------------
|
||||
|
||||
Module for Intel HD Audio (ICH6, ICH6M, ICH7), ATI SB450,
|
||||
VIA VT8251/VT8237A
|
||||
Module for Intel HD Audio (ICH6, ICH6M, ESB2, ICH7, ICH8),
|
||||
ATI SB450, SB600, RS600,
|
||||
VIA VT8251/VT8237A,
|
||||
SIS966, ULI M5461
|
||||
|
||||
model - force the model name
|
||||
position_fix - Fix DMA pointer (0 = auto, 1 = none, 2 = POSBUF, 3 = FIFO size)
|
||||
@ -756,12 +800,18 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
|
||||
basic fixed pin assignment w/o SPDIF
|
||||
auto auto-config reading BIOS (default)
|
||||
|
||||
ALC882/883/885
|
||||
ALC882/885
|
||||
3stack-dig 3-jack with SPDIF I/O
|
||||
6stck-dig 6-jack digital with SPDIF I/O
|
||||
auto auto-config reading BIOS (default)
|
||||
|
||||
ALC861
|
||||
ALC883/888
|
||||
3stack-dig 3-jack with SPDIF I/O
|
||||
6stack-dig 6-jack digital with SPDIF I/O
|
||||
6stack-dig-demo 6-stack digital for Intel demo board
|
||||
auto auto-config reading BIOS (default)
|
||||
|
||||
ALC861/660
|
||||
3stack 3-jack
|
||||
3stack-dig 3-jack with SPDIF I/O
|
||||
6stack-dig 6-jack with SPDIF I/O
|
||||
@ -778,6 +828,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
|
||||
AD1981
|
||||
basic 3-jack (default)
|
||||
hp HP nx6320
|
||||
thinkpad Lenovo Thinkpad T60/X60/Z60
|
||||
|
||||
AD1986A
|
||||
6stack 6-jack, separate surrounds (default)
|
||||
@ -932,6 +983,30 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
|
||||
driver isn't configured properly or you want to try another
|
||||
type for testing.
|
||||
|
||||
Module snd-indigo
|
||||
-----------------
|
||||
|
||||
Module for Echoaudio Indigo
|
||||
|
||||
This module supports multiple cards.
|
||||
The driver requires the firmware loader support on kernel.
|
||||
|
||||
Module snd-indigodj
|
||||
-------------------
|
||||
|
||||
Module for Echoaudio Indigo DJ
|
||||
|
||||
This module supports multiple cards.
|
||||
The driver requires the firmware loader support on kernel.
|
||||
|
||||
Module snd-indigoio
|
||||
-------------------
|
||||
|
||||
Module for Echoaudio Indigo IO
|
||||
|
||||
This module supports multiple cards.
|
||||
The driver requires the firmware loader support on kernel.
|
||||
|
||||
Module snd-intel8x0
|
||||
-------------------
|
||||
|
||||
@ -1031,6 +1106,22 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
|
||||
|
||||
This module supports multiple cards.
|
||||
|
||||
Module snd-layla20
|
||||
------------------
|
||||
|
||||
Module for Echoaudio Layla20
|
||||
|
||||
This module supports multiple cards.
|
||||
The driver requires the firmware loader support on kernel.
|
||||
|
||||
Module snd-layla24
|
||||
------------------
|
||||
|
||||
Module for Echoaudio Layla24
|
||||
|
||||
This module supports multiple cards.
|
||||
The driver requires the firmware loader support on kernel.
|
||||
|
||||
Module snd-maestro3
|
||||
-------------------
|
||||
|
||||
@ -1051,6 +1142,14 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
|
||||
|
||||
The power-management is supported.
|
||||
|
||||
Module snd-mia
|
||||
---------------
|
||||
|
||||
Module for Echoaudio Mia
|
||||
|
||||
This module supports multiple cards.
|
||||
The driver requires the firmware loader support on kernel.
|
||||
|
||||
Module snd-miro
|
||||
---------------
|
||||
|
||||
@ -1083,6 +1182,14 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
|
||||
When no hotplug fw loader is available, you need to load the
|
||||
firmware via mixartloader utility in alsa-tools package.
|
||||
|
||||
Module snd-mona
|
||||
---------------
|
||||
|
||||
Module for Echoaudio Mona
|
||||
|
||||
This module supports multiple cards.
|
||||
The driver requires the firmware loader support on kernel.
|
||||
|
||||
Module snd-mpu401
|
||||
-----------------
|
||||
|
||||
@ -1633,9 +1740,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
|
||||
|
||||
About capture IBL, see the description of snd-vx222 module.
|
||||
|
||||
Note: the driver is build only when CONFIG_ISA is set.
|
||||
|
||||
Note2: snd-vxp440 driver is merged to snd-vxpocket driver since
|
||||
Note: snd-vxp440 driver is merged to snd-vxpocket driver since
|
||||
ALSA 1.0.10.
|
||||
|
||||
The power-management is supported.
|
||||
@ -1662,8 +1767,6 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
|
||||
|
||||
Module for Sound Core PDAudioCF sound card.
|
||||
|
||||
Note: the driver is build only when CONFIG_ISA is set.
|
||||
|
||||
The power-management is supported.
|
||||
|
||||
|
||||
|
@ -1149,7 +1149,7 @@
|
||||
}
|
||||
chip->port = pci_resource_start(pci, 0);
|
||||
if (request_irq(pci->irq, snd_mychip_interrupt,
|
||||
SA_INTERRUPT|SA_SHIRQ, "My Chip", chip)) {
|
||||
IRQF_DISABLED|IRQF_SHARED, "My Chip", chip)) {
|
||||
printk(KERN_ERR "cannot grab irq %d\n", pci->irq);
|
||||
snd_mychip_free(chip);
|
||||
return -EBUSY;
|
||||
@ -1323,7 +1323,7 @@
|
||||
<programlisting>
|
||||
<![CDATA[
|
||||
if (request_irq(pci->irq, snd_mychip_interrupt,
|
||||
SA_INTERRUPT|SA_SHIRQ, "My Chip", chip)) {
|
||||
IRQF_DISABLED|IRQF_SHARED, "My Chip", chip)) {
|
||||
printk(KERN_ERR "cannot grab irq %d\n", pci->irq);
|
||||
snd_mychip_free(chip);
|
||||
return -EBUSY;
|
||||
@ -1342,7 +1342,7 @@
|
||||
|
||||
<para>
|
||||
On the PCI bus, the interrupts can be shared. Thus,
|
||||
<constant>SA_SHIRQ</constant> is given as the interrupt flag of
|
||||
<constant>IRQF_SHARED</constant> is given as the interrupt flag of
|
||||
<function>request_irq()</function>.
|
||||
</para>
|
||||
|
||||
@ -3048,7 +3048,7 @@ struct _snd_pcm_runtime {
|
||||
</para>
|
||||
|
||||
<para>
|
||||
If you aquire a spinlock in the interrupt handler, and the
|
||||
If you acquire a spinlock in the interrupt handler, and the
|
||||
lock is used in other pcm callbacks, too, then you have to
|
||||
release the lock before calling
|
||||
<function>snd_pcm_period_elapsed()</function>, because
|
||||
@ -4215,7 +4215,7 @@ struct _snd_pcm_runtime {
|
||||
<programlisting>
|
||||
<![CDATA[
|
||||
struct snd_rawmidi *rmidi;
|
||||
snd_mpu401_uart_new(card, 0, MPU401_HW_MPU401, port, integrated,
|
||||
snd_mpu401_uart_new(card, 0, MPU401_HW_MPU401, port, info_flags,
|
||||
irq, irq_flags, &rmidi);
|
||||
]]>
|
||||
</programlisting>
|
||||
@ -4242,15 +4242,36 @@ struct _snd_pcm_runtime {
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The 5th argument is bitflags for additional information.
|
||||
When the i/o port address above is a part of the PCI i/o
|
||||
region, the MPU401 i/o port might have been already allocated
|
||||
(reserved) by the driver itself. In such a case, pass non-zero
|
||||
to the 5th argument
|
||||
(<parameter>integrated</parameter>). Otherwise, pass 0 to it,
|
||||
(reserved) by the driver itself. In such a case, pass a bit flag
|
||||
<constant>MPU401_INFO_INTEGRATED</constant>,
|
||||
and
|
||||
the mpu401-uart layer will allocate the i/o ports by itself.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
When the controller supports only the input or output MIDI stream,
|
||||
pass <constant>MPU401_INFO_INPUT</constant> or
|
||||
<constant>MPU401_INFO_OUTPUT</constant> bitflag, respectively.
|
||||
Then the rawmidi instance is created as a single stream.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
<constant>MPU401_INFO_MMIO</constant> bitflag is used to change
|
||||
the access method to MMIO (via readb and writeb) instead of
|
||||
iob and outb. In this case, you have to pass the iomapped address
|
||||
to <function>snd_mpu401_uart_new()</function>.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
When <constant>MPU401_INFO_TX_IRQ</constant> is set, the output
|
||||
stream isn't checked in the default interrupt handler. The driver
|
||||
needs to call <function>snd_mpu401_uart_interrupt_tx()</function>
|
||||
by itself to start processing the output stream in irq handler.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Usually, the port address corresponds to the command port and
|
||||
port + 1 corresponds to the data port. If not, you may change
|
||||
@ -5333,7 +5354,7 @@ struct _snd_pcm_runtime {
|
||||
<informalexample>
|
||||
<programlisting>
|
||||
<![CDATA[
|
||||
snd_info_set_text_ops(entry, chip, read_size, my_proc_read);
|
||||
snd_info_set_text_ops(entry, chip, my_proc_read);
|
||||
]]>
|
||||
</programlisting>
|
||||
</informalexample>
|
||||
@ -5394,29 +5415,12 @@ struct _snd_pcm_runtime {
|
||||
<informalexample>
|
||||
<programlisting>
|
||||
<![CDATA[
|
||||
entry->c.text.write_size = 256;
|
||||
entry->c.text.write = my_proc_write;
|
||||
]]>
|
||||
</programlisting>
|
||||
</informalexample>
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The buffer size for read is set to 1024 implicitly by
|
||||
<function>snd_info_set_text_ops()</function>. It should suffice
|
||||
in most cases (the size will be aligned to
|
||||
<constant>PAGE_SIZE</constant> anyway), but if you need to handle
|
||||
very large text files, you can set it explicitly, too.
|
||||
|
||||
<informalexample>
|
||||
<programlisting>
|
||||
<![CDATA[
|
||||
entry->c.text.read_size = 65536;
|
||||
]]>
|
||||
</programlisting>
|
||||
</informalexample>
|
||||
</para>
|
||||
|
||||
<para>
|
||||
For the write callback, you can use
|
||||
<function>snd_info_get_line()</function> to get a text line, and
|
||||
@ -5562,7 +5566,7 @@ struct _snd_pcm_runtime {
|
||||
power status.</para></listitem>
|
||||
<listitem><para>Call <function>snd_pcm_suspend_all()</function> to suspend the running PCM streams.</para></listitem>
|
||||
<listitem><para>If AC97 codecs are used, call
|
||||
<function>snd_ac97_resume()</function> for each codec.</para></listitem>
|
||||
<function>snd_ac97_suspend()</function> for each codec.</para></listitem>
|
||||
<listitem><para>Save the register values if necessary.</para></listitem>
|
||||
<listitem><para>Stop the hardware if necessary.</para></listitem>
|
||||
<listitem><para>Disable the PCI device by calling
|
||||
|
@ -25,42 +25,84 @@ the bits necessary to run your device. The most commonly
|
||||
used members of this structure, and their typical usage,
|
||||
will be detailed below.
|
||||
|
||||
Here is how probing is performed by an SBUS driver
|
||||
under Linux:
|
||||
Here is a piece of skeleton code for perofming a device
|
||||
probe in an SBUS driverunder Linux:
|
||||
|
||||
static void init_one_mydevice(struct sbus_dev *sdev)
|
||||
static int __devinit mydevice_probe_one(struct sbus_dev *sdev)
|
||||
{
|
||||
struct mysdevice *mp = kzalloc(sizeof(*mp), GFP_KERNEL);
|
||||
|
||||
if (!mp)
|
||||
return -ENODEV;
|
||||
|
||||
...
|
||||
dev_set_drvdata(&sdev->ofdev.dev, mp);
|
||||
return 0;
|
||||
...
|
||||
}
|
||||
|
||||
static int mydevice_match(struct sbus_dev *sdev)
|
||||
static int __devinit mydevice_probe(struct of_device *dev,
|
||||
const struct of_device_id *match)
|
||||
{
|
||||
if (some_criteria(sdev))
|
||||
return 1;
|
||||
return 0;
|
||||
struct sbus_dev *sdev = to_sbus_device(&dev->dev);
|
||||
|
||||
return mydevice_probe_one(sdev);
|
||||
}
|
||||
|
||||
static void mydevice_probe(void)
|
||||
static int __devexit mydevice_remove(struct of_device *dev)
|
||||
{
|
||||
struct sbus_bus *sbus;
|
||||
struct sbus_dev *sdev;
|
||||
struct sbus_dev *sdev = to_sbus_device(&dev->dev);
|
||||
struct mydevice *mp = dev_get_drvdata(&dev->dev);
|
||||
|
||||
for_each_sbus(sbus) {
|
||||
for_each_sbusdev(sdev, sbus) {
|
||||
if (mydevice_match(sdev))
|
||||
init_one_mydevice(sdev);
|
||||
}
|
||||
}
|
||||
return mydevice_remove_one(sdev, mp);
|
||||
}
|
||||
|
||||
All this does is walk through all SBUS devices in the
|
||||
system, checks each to see if it is of the type which
|
||||
your driver is written for, and if so it calls the init
|
||||
routine to attach the device and prepare to drive it.
|
||||
static struct of_device_id mydevice_match[] = {
|
||||
{
|
||||
.name = "mydevice",
|
||||
},
|
||||
{},
|
||||
};
|
||||
|
||||
"init_one_mydevice" might do things like allocate software
|
||||
state structures, map in I/O registers, place the hardware
|
||||
into an initialized state, etc.
|
||||
MODULE_DEVICE_TABLE(of, mydevice_match);
|
||||
|
||||
static struct of_platform_driver mydevice_driver = {
|
||||
.name = "mydevice",
|
||||
.match_table = mydevice_match,
|
||||
.probe = mydevice_probe,
|
||||
.remove = __devexit_p(mydevice_remove),
|
||||
};
|
||||
|
||||
static int __init mydevice_init(void)
|
||||
{
|
||||
return of_register_driver(&mydevice_driver, &sbus_bus_type);
|
||||
}
|
||||
|
||||
static void __exit mydevice_exit(void)
|
||||
{
|
||||
of_unregister_driver(&mydevice_driver);
|
||||
}
|
||||
|
||||
module_init(mydevice_init);
|
||||
module_exit(mydevice_exit);
|
||||
|
||||
The mydevice_match table is a series of entries which
|
||||
describes what SBUS devices your driver is meant for. In the
|
||||
simplest case you specify a string for the 'name' field. Every
|
||||
SBUS device with a 'name' property matching your string will
|
||||
be passed one-by-one to your .probe method.
|
||||
|
||||
You should store away your device private state structure
|
||||
pointer in the drvdata area so that you can retrieve it later on
|
||||
in your .remove method.
|
||||
|
||||
Any memory allocated, registers mapped, IRQs registered,
|
||||
etc. must be undone by your .remove method so that all resources
|
||||
of your device are relased by the time it returns.
|
||||
|
||||
You should _NOT_ use the for_each_sbus(), for_each_sbusdev(),
|
||||
and for_all_sbusdev() interfaces. They are deprecated, will be
|
||||
removed, and no new driver should reference them ever.
|
||||
|
||||
Mapping and Accessing I/O Registers
|
||||
|
||||
@ -263,10 +305,3 @@ discussed above and plus it handles both PCI and SBUS boards.
|
||||
Lance driver abuses consistent mappings for data transfer.
|
||||
It is a nifty trick which we do not particularly recommend...
|
||||
Just check it out and know that it's legal.
|
||||
|
||||
Bad examples, do NOT use
|
||||
|
||||
drivers/video/cgsix.c
|
||||
This one uses result of sbus_ioremap as if it is an address.
|
||||
This does NOT work on sparc64 and therefore is broken. We will
|
||||
convert it at a later date.
|
||||
|
@ -1,5 +1,6 @@
|
||||
Copyright 2004 Linus Torvalds
|
||||
Copyright 2004 Pavel Machek <pavel@suse.cz>
|
||||
Copyright 2006 Bob Copeland <me@bobcopeland.com>
|
||||
|
||||
Using sparse for typechecking
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
@ -41,15 +42,8 @@ sure that bitwise types don't get mixed up (little-endian vs big-endian
|
||||
vs cpu-endian vs whatever), and there the constant "0" really _is_
|
||||
special.
|
||||
|
||||
Use
|
||||
|
||||
make C=[12] CF=-Wbitwise
|
||||
|
||||
or you don't get any checking at all.
|
||||
|
||||
|
||||
Where to get sparse
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
Getting sparse
|
||||
~~~~~~~~~~~~~~
|
||||
|
||||
With git, you can just get it from
|
||||
|
||||
@ -57,7 +51,7 @@ With git, you can just get it from
|
||||
|
||||
and DaveJ has tar-balls at
|
||||
|
||||
http://www.codemonkey.org.uk/projects/git-snapshots/sparse/
|
||||
http://www.codemonkey.org.uk/projects/git-snapshots/sparse/
|
||||
|
||||
|
||||
Once you have it, just do
|
||||
@ -65,8 +59,20 @@ Once you have it, just do
|
||||
make
|
||||
make install
|
||||
|
||||
as your regular user, and it will install sparse in your ~/bin directory.
|
||||
After that, doing a kernel make with "make C=1" will run sparse on all the
|
||||
C files that get recompiled, or with "make C=2" will run sparse on the
|
||||
files whether they need to be recompiled or not (ie the latter is fast way
|
||||
to check the whole tree if you have already built it).
|
||||
as a regular user, and it will install sparse in your ~/bin directory.
|
||||
|
||||
Using sparse
|
||||
~~~~~~~~~~~~
|
||||
|
||||
Do a kernel make with "make C=1" to run sparse on all the C files that get
|
||||
recompiled, or use "make C=2" to run sparse on the files whether they need to
|
||||
be recompiled or not. The latter is a fast way to check the whole tree if you
|
||||
have already built it.
|
||||
|
||||
The optional make variable CF can be used to pass arguments to sparse. The
|
||||
build system passes -Wbitwise to sparse automatically. To perform endianness
|
||||
checks, you may define __CHECK_ENDIAN__:
|
||||
|
||||
make C=2 CF="-D__CHECK_ENDIAN__"
|
||||
|
||||
These checks are disabled by default as they generate a host of warnings.
|
||||
|
@ -28,7 +28,7 @@ Currently, these files are in /proc/sys/vm:
|
||||
- block_dump
|
||||
- drop-caches
|
||||
- zone_reclaim_mode
|
||||
- zone_reclaim_interval
|
||||
- panic_on_oom
|
||||
|
||||
==============================================================
|
||||
|
||||
@ -166,15 +166,15 @@ use of files and builds up large slab caches. However, the slab
|
||||
shrink operation is global, may take a long time and free slabs
|
||||
in all nodes of the system.
|
||||
|
||||
================================================================
|
||||
=============================================================
|
||||
|
||||
zone_reclaim_interval:
|
||||
panic_on_oom
|
||||
|
||||
The time allowed for off node allocations after zone reclaim
|
||||
has failed to reclaim enough pages to allow a local allocation.
|
||||
This enables or disables panic on out-of-memory feature. If this is set to 1,
|
||||
the kernel panics when out-of-memory happens. If this is set to 0, the kernel
|
||||
will kill some rogue process, called oom_killer. Usually, oom_killer can kill
|
||||
rogue processes and system will survive. If you want to panic the system
|
||||
rather than killing rogue processes, set this to 1.
|
||||
|
||||
Time is set in seconds and set by default to 30 seconds.
|
||||
|
||||
Reduce the interval if undesired off node allocations occur. However, too
|
||||
frequent scans will have a negative impact onoff node allocation performance.
|
||||
The default value is 0.
|
||||
|
||||
|
@ -115,8 +115,9 @@ trojan program is running at console and which could grab your password
|
||||
when you would try to login. It will kill all programs on given console
|
||||
and thus letting you make sure that the login prompt you see is actually
|
||||
the one from init, not some trojan program.
|
||||
IMPORTANT:In its true form it is not a true SAK like the one in :IMPORTANT
|
||||
IMPORTANT:c2 compliant systems, and it should be mistook as such. :IMPORTANT
|
||||
IMPORTANT: In its true form it is not a true SAK like the one in a :IMPORTANT
|
||||
IMPORTANT: c2 compliant system, and it should not be mistaken as :IMPORTANT
|
||||
IMPORTANT: such. :IMPORTANT
|
||||
It seems other find it useful as (System Attention Key) which is
|
||||
useful when you want to exit a program that will not let you switch consoles.
|
||||
(For example, X or a svgalib program.)
|
||||
|
Some files were not shown because too many files have changed in this diff Show More
Loading…
Reference in New Issue
Block a user