Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/Changes | 31
-rw-r--r-- Documentation/CodingStyle | 43
-rw-r--r-- Documentation/DocBook/.gitignore | 6
-rw-r--r-- Documentation/DocBook/kernel-api.tmpl | 6
-rw-r--r-- Documentation/DocBook/kernel-locking.tmpl | 22
-rw-r--r-- Documentation/DocBook/usb.tmpl | 1
-rw-r--r-- Documentation/DocBook/videobook.tmpl | 4
-rw-r--r-- Documentation/RCU/rcuref.txt | 87
-rw-r--r-- Documentation/SubmittingDrivers | 24
-rw-r--r-- Documentation/SubmittingPatches | 68
-rw-r--r-- Documentation/applying-patches.txt | 81
-rw-r--r-- Documentation/block/barrier.txt | 271
-rw-r--r-- Documentation/block/biodoc.txt | 12
-rw-r--r-- Documentation/block/stat.txt | 82
-rw-r--r-- Documentation/cachetlb.txt | 2
-rw-r--r-- Documentation/cpu-freq/governors.txt | 62
-rw-r--r-- Documentation/cpu-hotplug.txt | 357
-rw-r--r-- Documentation/cpusets.txt | 165
-rw-r--r-- Documentation/drivers/edac/edac.txt | 673
-rw-r--r-- Documentation/dvb/avermedia.txt | 3
-rw-r--r-- Documentation/dvb/get_dvb_firmware | 23
-rw-r--r-- Documentation/dvb/ttusb-dec.txt | 3
-rw-r--r-- Documentation/fb/cyblafb/bugs | 1
-rw-r--r-- Documentation/fb/cyblafb/fb.modes | 57
-rw-r--r-- Documentation/fb/cyblafb/performance | 1
-rw-r--r-- Documentation/fb/cyblafb/todo | 5
-rw-r--r-- Documentation/fb/cyblafb/usage | 33
-rw-r--r-- Documentation/fb/cyblafb/whatsnew | 29
-rw-r--r-- Documentation/feature-removal-schedule.txt | 29
-rw-r--r-- Documentation/filesystems/00-INDEX | 8
-rw-r--r-- Documentation/filesystems/configfs/configfs.txt | 434
-rw-r--r-- Documentation/filesystems/configfs/configfs_example.c | 474
-rw-r--r-- Documentation/filesystems/dlmfs.txt | 130
-rw-r--r-- Documentation/filesystems/ext3.txt | 181
-rw-r--r-- Documentation/filesystems/fuse.txt | 63
-rw-r--r-- Documentation/filesystems/ocfs2.txt | 55
-rw-r--r-- Documentation/filesystems/proc.txt | 19
-rw-r--r-- Documentation/filesystems/ramfs-rootfs-initramfs.txt | 72
-rw-r--r-- Documentation/filesystems/relayfs.txt | 126
-rw-r--r-- Documentation/filesystems/spufs.txt | 521
-rw-r--r-- Documentation/filesystems/sysfs-pci.txt | 21
-rw-r--r-- Documentation/filesystems/tmpfs.txt | 12
-rw-r--r-- Documentation/filesystems/vfs.txt | 5
-rw-r--r-- Documentation/hpet.txt | 2
-rw-r--r-- Documentation/hrtimers.txt | 178
-rw-r--r-- Documentation/hwmon/w83627hf | 19
-rw-r--r-- Documentation/i2c/busses/i2c-nforce2 | 3
-rw-r--r-- Documentation/i2c/busses/i2c-parport | 1
-rw-r--r-- Documentation/i2c/porting-clients | 90
-rw-r--r-- Documentation/i2c/writing-clients | 20
-rw-r--r-- Documentation/i2o/ioctl | 2
-rw-r--r-- Documentation/input/appletouch.txt | 5
-rw-r--r-- Documentation/input/ff.txt | 2
-rw-r--r-- Documentation/ioctl/hdio.txt | 2
-rw-r--r-- Documentation/kbuild/makefiles.txt | 4
-rw-r--r-- Documentation/kbuild/modules.txt | 66
-rw-r--r-- Documentation/kdump/gdbmacros.txt | 22
-rw-r--r-- Documentation/kdump/kdump.txt | 149
-rw-r--r-- Documentation/kernel-parameters.txt | 88
-rw-r--r-- Documentation/keys-request-key.txt | 22
-rw-r--r-- Documentation/keys.txt | 61
-rw-r--r-- Documentation/kprobes.txt | 3
-rw-r--r-- Documentation/laptop-mode.txt | 8
-rw-r--r-- Documentation/locks.txt | 17
-rw-r--r-- Documentation/md.txt | 120
-rw-r--r-- Documentation/mutex-design.txt | 135
-rw-r--r-- Documentation/networking/bonding.txt | 2
-rw-r--r-- Documentation/networking/gianfar.txt | 72
-rw-r--r-- Documentation/networking/ip-sysctl.txt | 23
-rw-r--r-- Documentation/networking/sk98lin.txt | 4
-rw-r--r-- Documentation/oops-tracing.txt | 8
-rw-r--r-- Documentation/pci-error-recovery.txt | 246
-rw-r--r-- Documentation/pcmcia/driver-changes.txt | 11
-rw-r--r-- Documentation/pm.txt | 2
-rw-r--r-- Documentation/power/interface.txt | 11
-rw-r--r-- Documentation/power/swsusp.txt | 9
-rw-r--r-- Documentation/powerpc/00-INDEX | 10
-rw-r--r-- Documentation/powerpc/eeh-pci-error-recovery.txt | 31
-rw-r--r-- Documentation/scsi/ChangeLog.megaraid | 35
-rw-r--r-- Documentation/scsi/aacraid.txt | 108
-rw-r--r-- Documentation/scsi/scsi_mid_low_api.txt | 39
-rw-r--r-- Documentation/sound/alsa/ALSA-Configuration.txt | 194
-rw-r--r-- Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl | 592
-rw-r--r-- Documentation/sound/alsa/Procfile.txt | 16
-rw-r--r-- Documentation/sound/alsa/hda_codec.txt | 14
-rw-r--r-- Documentation/spi/butterfly | 57
-rw-r--r-- Documentation/spi/spi-summary | 457
-rw-r--r-- Documentation/stable_kernel_rules.txt | 60
-rw-r--r-- Documentation/sysctl/vm.txt | 38
-rw-r--r-- Documentation/sysrq.txt | 6
-rw-r--r-- Documentation/video4linux/CARDLIST.bttv | 2
-rw-r--r-- Documentation/video4linux/CARDLIST.cx88 | 12
-rw-r--r-- Documentation/video4linux/CARDLIST.saa7134 | 5
-rw-r--r-- Documentation/video4linux/CARDLIST.tuner | 7
-rw-r--r-- Documentation/x86_64/boot-options.txt | 4
-rw-r--r-- Documentation/x86_64/cpu-hotplug-spec | 21
96 files changed, 6424 insertions, 993 deletions
diff --git a/Documentation/Changes b/Documentation/Changes
index 86b86399d61..fe5ae0f5502 100644
--- a/Documentation/Changes
+++ b/Documentation/Changes
@@ -31,8 +31,6 @@ al español de este documento en varios formatos.
Eine deutsche Version dieser Datei finden Sie unter
<http://www.stefan-winter.de/Changes-2.4.0.txt>.
-Last updated: October 29th, 2002
-
Chris Ricker (kaboom@gatech.edu or chris.ricker@genetics.utah.edu).
Current Minimal Requirements
@@ -48,7 +46,7 @@ necessary on all systems; obviously, if you don't have any ISDN
hardware, for example, you probably needn't concern yourself with
isdn4k-utils.
-o Gnu C 2.95.3 # gcc --version
+o Gnu C 3.2 # gcc --version
o Gnu make 3.79.1 # make --version
o binutils 2.12 # ld -v
o util-linux 2.10o # fdformat --version
@@ -74,26 +72,7 @@ GCC
---
The gcc version requirements may vary depending on the type of CPU in your
-computer. The next paragraph applies to users of x86 CPUs, but not
-necessarily to users of other CPUs. Users of other CPUs should obtain
-information about their gcc version requirements from another source.
-
-The recommended compiler for the kernel is gcc 2.95.x (x >= 3), and it
-should be used when you need absolute stability. You may use gcc 3.0.x
-instead if you wish, although it may cause problems. Later versions of gcc
-have not received much testing for Linux kernel compilation, and there are
-almost certainly bugs (mainly, but not exclusively, in the kernel) that
-will need to be fixed in order to use these compilers. In any case, using
-pgcc instead of plain gcc is just asking for trouble.
-
-The Red Hat gcc 2.96 compiler subtree can also be used to build this tree.
-You should ensure you use gcc-2.96-74 or later. gcc-2.96-54 will not build
-the kernel correctly.
-
-In addition, please pay attention to compiler optimization. Anything
-greater than -O2 may not be wise. Similarly, if you choose to use gcc-2.95.x
-or derivatives, be sure not to use -fstrict-aliasing (which, depending on
-your version of gcc 2.95.x, may necessitate using -fno-strict-aliasing).
+computer.
Make
----
@@ -322,9 +301,9 @@ Getting updated software
Kernel compilation
******************
-gcc 2.95.3
-----------
-o <ftp://ftp.gnu.org/gnu/gcc/gcc-2.95.3.tar.gz>
+gcc
+---
+o <ftp://ftp.gnu.org/gnu/gcc/>
Make
----
diff --git a/Documentation/CodingStyle b/Documentation/CodingStyle
index eb7db3c1922..ce5d2c038cf 100644
--- a/Documentation/CodingStyle
+++ b/Documentation/CodingStyle
@@ -199,7 +199,7 @@ The rationale is:
modifications are prevented
- saves the compiler work to optimize redundant code away ;)
-int fun(int )
+int fun(int a)
{
int result = 0;
char *buffer = kmalloc(SIZE);
@@ -344,7 +344,7 @@ Remember: if another thread can find your data structure, and you don't
have a reference count on it, you almost certainly have a bug.
- Chapter 11: Macros, Enums, Inline functions and RTL
+ Chapter 11: Macros, Enums and RTL
Names of macros defining constants and labels in enums are capitalized.
@@ -429,7 +429,35 @@ from void pointer to any other pointer type is guaranteed by the C programming
language.
- Chapter 14: References
+ Chapter 14: The inline disease
+
+There appears to be a common misperception that gcc has a magic "make me
+faster" speedup option called "inline". While the use of inlines can be
+appropriate (for example as a means of replacing macros, see Chapter 11), it
+very often is not. Abundant use of the inline keyword leads to a much bigger
+kernel, which in turn slows the system as a whole down, due to a bigger
+icache footprint for the CPU and simply because there is less memory
+available for the pagecache. Just think about it; a pagecache miss causes a
+disk seek, which easily takes 5 milliseconds. There are a LOT of cpu cycles
+that can go into these 5 milliseconds.
+
+A reasonable rule of thumb is to not put inline at functions that have more
+than 3 lines of code in them. An exception to this rule are the cases where
+a parameter is known to be a compile-time constant, and as a result of this
+constantness you *know* the compiler will be able to optimize most of your
+function away at compile time. For a good example of this latter case, see
+the kmalloc() inline function.
+
+Often people argue that adding inline to functions that are static and used
+only once is always a win since there is no space tradeoff. While this is
+technically correct, gcc is capable of inlining these automatically without
+help, and the maintenance issue of removing the inline when a second user
+appears outweighs the potential value of the hint that tells gcc to do
+something it would have done anyway.
+
+
+
+ Chapter 15: References
The C Programming Language, Second Edition
by Brian W. Kernighan and Dennis M. Ritchie.
@@ -444,10 +472,13 @@ ISBN 0-201-61586-X.
URL: http://cm.bell-labs.com/cm/cs/tpop/
GNU manuals - where in compliance with K&R and this text - for cpp, gcc,
-gcc internals and indent, all available from http://www.gnu.org
+gcc internals and indent, all available from http://www.gnu.org/manual/
WG14 is the international standardization working group for the programming
-language C, URL: http://std.dkuug.dk/JTC1/SC22/WG14/
+language C, URL: http://www.open-std.org/JTC1/SC22/WG14/
+
+Kernel CodingStyle, by greg@kroah.com at OLS 2002:
+http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/
--
-Last updated on 16 February 2004 by a community effort on LKML.
+Last updated on 30 December 2005 by a community effort on LKML.
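
Annotation on the new "inline disease" chapter in CodingStyle: the
compile-time-constant exception can be illustrated with a small userspace
sketch. The names size_to_bucket() and buckets_needed() are hypothetical
and are not taken from the kernel; the size classes are made up and this
is not how kmalloc() is actually implemented. It only shows the shape of
a function that folds away when its argument is a constant, next to a
longer helper that is better left without the inline keyword.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical size-class lookup, loosely modelled on the idea behind
 * the kmalloc() inline (assumption: NOT the kernel's implementation).
 *
 * When 'size' is a compile-time constant, the compiler can fold the
 * whole if-chain down to one constant, so inlining costs nothing.
 */
static inline int size_to_bucket(size_t size)
{
	if (size <= 32)
		return 0;
	if (size <= 64)
		return 1;
	if (size <= 128)
		return 2;
	if (size <= 256)
		return 3;
	return -1;	/* too large for the small buckets */
}

/*
 * A longer helper working on run-time data: nothing can be folded
 * away, so it is left as an ordinary static function and the compiler
 * decides on its own whether inlining is worthwhile.
 */
static int buckets_needed(const size_t *sizes, size_t count)
{
	int used[4] = { 0, 0, 0, 0 };
	int total = 0;
	size_t i;

	assert(sizes != NULL || count == 0);
	for (i = 0; i < count; i++) {
		int b = size_to_bucket(sizes[i]);

		/* Count each bucket only the first time it is hit. */
		if (b >= 0 && !used[b]) {
			used[b] = 1;
			total++;
		}
	}
	return total;
}
```

A call such as size_to_bucket(48) with a literal argument is the case the
chapter has in mind: an optimizing compiler can typically reduce it to a
constant at the call site.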
diff --git a/Documentation/DocBook/.gitignore b/Documentation/DocBook/.gitignore
new file mode 100644
index 00000000000..c102c02ecf8
--- /dev/null
+++ b/Documentation/DocBook/.gitignore
@@ -0,0 +1,6 @@
+*.xml
+*.ps
+*.pdf
+*.html
+*.9.gz
+*.9
diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl
index 767433bdbc4..8c9c6704e85 100644
--- a/Documentation/DocBook/kernel-api.tmpl
+++ b/Documentation/DocBook/kernel-api.tmpl
@@ -54,6 +54,11 @@
!Ekernel/sched.c
!Ekernel/timer.c
</sect1>
+ <sect1><title>High-resolution timers</title>
+!Iinclude/linux/ktime.h
+!Iinclude/linux/hrtimer.h
+!Ekernel/hrtimer.c
+ </sect1>
<sect1><title>Internal Functions</title>
!Ikernel/exit.c
!Ikernel/signal.c
@@ -369,6 +374,7 @@ X!Edrivers/acpi/motherboard.c
X!Edrivers/acpi/bus.c
-->
!Edrivers/acpi/scan.c
+!Idrivers/acpi/scan.c
<!-- No correct structured comments
X!Edrivers/acpi/pci_bind.c
-->
diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl
index 90dc2de8e0a..158ffe9bfad 100644
--- a/Documentation/DocBook/kernel-locking.tmpl
+++ b/Documentation/DocBook/kernel-locking.tmpl
@@ -222,7 +222,7 @@
<title>Two Main Types of Kernel Locks: Spinlocks and Semaphores</title>
<para>
- There are two main types of kernel locks. The fundamental type
+ There are three main types of kernel locks. The fundamental type
is the spinlock
(<filename class="headerfile">include/asm/spinlock.h</filename>),
which is a very simple single-holder lock: if you can't get the
@@ -230,16 +230,22 @@
very small and fast, and can be used anywhere.
</para>
<para>
- The second type is a semaphore
+ The second type is a mutex
+ (<filename class="headerfile">include/linux/mutex.h</filename>): it
+ is like a spinlock, but you may block holding a mutex.
+ If you can't lock a mutex, your task will suspend itself, and be woken
+ up when the mutex is released. This means the CPU can do something
+ else while you are waiting. There are many cases when you simply
+ can't sleep (see <xref linkend="sleeping-things"/>), and so have to
+ use a spinlock instead.
+ </para>
+ <para>
+ The third type is a semaphore
(<filename class="headerfile">include/asm/semaphore.h</filename>): it
can have more than one holder at any time (the number decided at
initialization time), although it is most commonly used as a
- single-holder lock (a mutex). If you can't get a semaphore,
- your task will put itself on the queue, and be woken up when the
- semaphore is released. This means the CPU will do something
- else while you are waiting, but there are many cases when you
- simply can't sleep (see <xref linkend="sleeping-things"/>), and so
- have to use a spinlock instead.
+ single-holder lock (a mutex). If you can't get a semaphore, your
+ task will be suspended and later on woken up - just like for mutexes.
</para>
<para>
Neither type of lock is recursive: see
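
Annotation on the kernel-locking.tmpl hunk: the difference between a
single-holder mutex and a semaphore whose holder count is fixed at
initialization can be sketched in userspace. This is only an analogy
under stated assumptions: it uses POSIX pthread mutexes and POSIX
semaphores, not the kernel's struct mutex or struct semaphore, and the
helper names are hypothetical. Older C libraries may need -pthread when
compiling.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

/*
 * Returns 1 if a second attempt to take an already-held mutex is
 * refused with EBUSY, 0 if it is not, and -1 on setup failure.
 */
static int mutex_is_single_holder(void)
{
	pthread_mutex_t m;
	int second;

	if (pthread_mutex_init(&m, NULL) != 0)
		return -1;
	if (pthread_mutex_trylock(&m) != 0) {
		pthread_mutex_destroy(&m);
		return -1;
	}
	/* The mutex is held now; trylock must not succeed again. */
	second = pthread_mutex_trylock(&m);
	pthread_mutex_unlock(&m);
	pthread_mutex_destroy(&m);
	return second == EBUSY;
}

/*
 * Returns how many holders a semaphore initialised to 'limit' admits
 * before it refuses further ones, or -1 on failure.
 */
static int semaphore_holders(unsigned int limit)
{
	sem_t s;
	int holders = 0;
	int i;

	if (sem_init(&s, 0, limit) != 0)
		return -1;
	/* Take the semaphore without blocking until it is exhausted. */
	while (sem_trywait(&s) == 0)
		holders++;
	assert(holders <= (int)limit);
	/* Release every slot that was taken before destroying it. */
	for (i = 0; i < holders; i++)
		sem_post(&s);
	sem_destroy(&s);
	return holders;
}
```

The trylock variants are used so that the sketch never blocks; the
blocking calls would put the caller to sleep instead, which is the
behaviour the text describes for mutexes and semaphores.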
diff --git a/Documentation/DocBook/usb.tmpl b/Documentation/DocBook/usb.tmpl
index 15ce0f21e5e..320af25de3a 100644
--- a/Documentation/DocBook/usb.tmpl
+++ b/Documentation/DocBook/usb.tmpl
@@ -253,6 +253,7 @@
!Edrivers/usb/core/urb.c
!Edrivers/usb/core/message.c
!Edrivers/usb/core/file.c
+!Edrivers/usb/core/driver.c
!Edrivers/usb/core/usb.c
!Edrivers/usb/core/hub.c
</chapter>
diff --git a/Documentation/DocBook/videobook.tmpl b/Documentation/DocBook/videobook.tmpl
index 3ec6c875588..fdff984a516 100644
--- a/Documentation/DocBook/videobook.tmpl
+++ b/Documentation/DocBook/videobook.tmpl
@@ -229,7 +229,7 @@ int __init myradio_init(struct video_init *v)
static int users = 0;
-static int radio_open(stuct video_device *dev, int flags)
+static int radio_open(struct video_device *dev, int flags)
{
if(users)
return -EBUSY;
@@ -949,7 +949,7 @@ int __init mycamera_init(struct video_init *v)
static int users = 0;
-static int camera_open(stuct video_device *dev, int flags)
+static int camera_open(struct video_device *dev, int flags)
{
if(users)
return -EBUSY;
diff --git a/Documentation/RCU/rcuref.txt b/Documentation/RCU/rcuref.txt
index a23fee66064..3f60db41b2f 100644
--- a/Documentation/RCU/rcuref.txt
+++ b/Documentation/RCU/rcuref.txt
@@ -1,74 +1,67 @@
-Refcounter framework for elements of lists/arrays protected by
-RCU.
+Refcounter design for elements of lists/arrays protected by RCU.
Refcounting on elements of lists which are protected by traditional
reader/writer spinlocks or semaphores are straight forward as in:
-1. 2.
-add() search_and_reference()
-{ {
- alloc_object read_lock(&list_lock);
- ... search_for_element
- atomic_set(&el->rc, 1); atomic_inc(&el->rc);
- write_lock(&list_lock); ...
- add_element read_unlock(&list_lock);
- ... ...
- write_unlock(&list_lock); }
+1. 2.
+add() search_and_reference()
+{ {
+ alloc_object read_lock(&list_lock);
+ ... search_for_element
+ atomic_set(&el->rc, 1); atomic_inc(&el->rc);
+ write_lock(&list_lock); ...
+ add_element read_unlock(&list_lock);
+ ... ...
+ write_unlock(&list_lock); }
}
3. 4.
release_referenced() delete()
{ {
- ... write_lock(&list_lock);
- atomic_dec(&el->rc, relfunc) ...
- ... delete_element
-} write_unlock(&list_lock);
- ...
- if (atomic_dec_and_test(&el->rc))
- kfree(el);
- ...
+ ... write_lock(&list_lock);
+ atomic_dec(&el->rc, relfunc) ...
+ ... delete_element
+} write_unlock(&list_lock);
+ ...
+ if (atomic_dec_and_test(&el->rc))
+ kfree(el);
+ ...
}
If this list/array is made lock free using rcu as in changing the
write_lock in add() and delete() to spin_lock and changing read_lock
-in search_and_reference to rcu_read_lock(), the rcuref_get in
+in search_and_reference to rcu_read_lock(), the atomic_inc in
search_and_reference could potentially hold reference to an element which
-has already been deleted from the list/array. rcuref_lf_get_rcu takes
+has already been deleted from the list/array. atomic_inc_not_zero takes
care of this scenario. search_and_reference should look as;
1. 2.
add() search_and_reference()
{ {
- alloc_object rcu_read_lock();
- ... search_for_element
- atomic_set(&el->rc, 1); if (rcuref_inc_lf(&el->rc)) {
- write_lock(&list_lock); rcu_read_unlock();
- return FAIL;
- add_element }
- ... ...
- write_unlock(&list_lock); rcu_read_unlock();
+ alloc_object rcu_read_lock();
+ ... search_for_element
+ atomic_set(&el->rc, 1); if (!atomic_inc_not_zero(&el->rc)) {
+ write_lock(&list_lock); rcu_read_unlock();
+ return FAIL;
+ add_element }
+ ... ...
+ write_unlock(&list_lock); rcu_read_unlock();
} }
3. 4.
release_referenced() delete()
{ {
- ... write_lock(&list_lock);
- rcuref_dec(&el->rc, relfunc) ...
- ... delete_element
-} write_unlock(&list_lock);
- ...
- if (rcuref_dec_and_test(&el->rc))
- call_rcu(&el->head, el_free);
- ...
+ ... write_lock(&list_lock);
+ atomic_dec(&el->rc, relfunc) ...
+ ... delete_element
+} write_unlock(&list_lock);
+ ...
+ if (atomic_dec_and_test(&el->rc))
+ call_rcu(&el->head, el_free);
+ ...
}
Sometimes, reference to the element need to be obtained in the
-update (write) stream. In such cases, rcuref_inc_lf might be an overkill
-since the spinlock serialising list updates are held. rcuref_inc
+update (write) stream. In such cases, atomic_inc_not_zero might be an
+overkill since the spinlock serialising list updates are held. atomic_inc
is to be used in such cases.
-For arches which do not have cmpxchg rcuref_inc_lf
-api uses a hashed spinlock implementation and the same hashed spinlock
-is acquired in all rcuref_xxx primitives to preserve atomicity.
-Note: Use rcuref_inc api only if you need to use rcuref_inc_lf on the
-refcounter atleast at one place. Mixing rcuref_inc and atomic_xxx api
-might lead to races. rcuref_inc_lf() must be used in lockfree
-RCU critical sections only.
+
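
Annotation on the rcuref.txt rewrite: the "increment unless zero" step
that search_and_reference() relies on can be sketched in userspace with
C11 atomics. This is an illustration under stated assumptions, not the
kernel's atomic_inc_not_zero(): struct element and the ref_* and
try_get() helpers are hypothetical names, and RCU itself (read-side
critical sections, call_rcu()) is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical element type; 'rc' plays the role of el->rc. */
struct element {
	atomic_int rc;
	int key;
};

/*
 * Take a reference unless the count has already dropped to zero.
 * Returns true if the reference was taken, false if the element is
 * already on its way to being freed and must not be used.
 */
static bool ref_inc_not_zero(atomic_int *rc)
{
	int old = atomic_load(rc);

	while (old != 0) {
		/* On failure 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(rc, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop one reference; true means the caller dropped the last one. */
static bool ref_dec_and_test(atomic_int *rc)
{
	return atomic_fetch_sub(rc, 1) == 1;
}

/* Reader side: hand out the element only if a reference was obtained. */
static struct element *try_get(struct element *el)
{
	assert(el != NULL);
	return ref_inc_not_zero(&el->rc) ? el : NULL;
}
```

The lookup fails when the helper returns false, which is why a reader
has to test the negated result before bailing out with FAIL.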
diff --git a/Documentation/SubmittingDrivers b/Documentation/SubmittingDrivers
index c3cca924e94..6bd30fdd078 100644
--- a/Documentation/SubmittingDrivers
+++ b/Documentation/SubmittingDrivers
@@ -27,18 +27,17 @@ Who To Submit Drivers To
------------------------
Linux 2.0:
- No new drivers are accepted for this kernel tree
+ No new drivers are accepted for this kernel tree.
Linux 2.2:
+ No new drivers are accepted for this kernel tree.
+
+Linux 2.4:
If the code area has a general maintainer then please submit it to
the maintainer listed in MAINTAINERS in the kernel file. If the
maintainer does not respond or you cannot find the appropriate
- maintainer then please contact the 2.2 kernel maintainer:
- Marc-Christian Petersen <m.c.p@wolk-project.de>.
-
-Linux 2.4:
- The same rules apply as 2.2. The final contact point for Linux 2.4
- submissions is Marcelo Tosatti <marcelo.tosatti@cyclades.com>.
+ maintainer then please contact Marcelo Tosatti
+ <marcelo.tosatti@cyclades.com>.
Linux 2.6:
The same rules apply as 2.4 except that you should follow linux-kernel
@@ -53,6 +52,7 @@ Licensing: The code must be released to us under the
of exclusive GPL licensing, and if you wish the driver
to be useful to other communities such as BSD you may well
wish to release under multiple licenses.
+ See accepted licenses at include/linux/module.h
Copyright: The copyright owner must agree to use of GPL.
It's best if the submitter and copyright owner
@@ -143,5 +143,13 @@ KernelNewbies:
http://kernelnewbies.org/
Linux USB project:
- http://sourceforge.net/projects/linux-usb/
+ http://www.linux-usb.org/
+
+How to NOT write kernel driver by arjanv@redhat.com
+ http://people.redhat.com/arjanv/olspaper.pdf
+
+Kernel Janitor:
+ http://janitor.kernelnewbies.org/
+--
+Last updated on 17 Nov 2005.
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 237d54c44bc..c2c85bcb3d4 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -78,7 +78,9 @@ Randy Dunlap's patch scripts:
http://www.xenotime.net/linux/scripts/patching-scripts-002.tar.gz
Andrew Morton's patch scripts:
-http://www.zip.com.au/~akpm/linux/patches/patch-scripts-0.20
+http://www.zip.com.au/~akpm/linux/patches/
+Instead of these scripts, quilt is the recommended patch management
+tool (see above).
@@ -97,7 +99,7 @@ need to split up your patch. See #3, next.
3) Separate your changes.
-Separate each logical change into its own patch.
+Separate _logical changes_ into a single patch file.
For example, if your changes include both bug fixes and performance
enhancements for a single driver, separate those changes into two
@@ -112,6 +114,10 @@ If one patch depends on another patch in order for a change to be
complete, that is OK. Simply note "this patch depends on patch X"
in your patch description.
+If you cannot condense your patch set into a smaller set of patches,
+then post only 15 or so at a time and wait for review and integration.
+
+
4) Select e-mail destination.
@@ -124,6 +130,10 @@ your patch to the primary Linux kernel developer's mailing list,
linux-kernel@vger.kernel.org. Most kernel developers monitor this
e-mail list, and can comment on your changes.
+
+Do not send more than 15 patches at once to the vger mailing lists!!!
+
+
Linus Torvalds is the final arbiter of all changes accepted into the
Linux kernel. His e-mail address is <torvalds@osdl.org>. He gets
a lot of e-mail, so typically you should do your best to -avoid- sending
@@ -149,6 +159,9 @@ USB, framebuffer devices, the VFS, the SCSI subsystem, etc. See the
MAINTAINERS file for a mailing list that relates specifically to
your change.
+Majordomo lists of VGER.KERNEL.ORG at:
+ <http://vger.kernel.org/vger-lists.html>
+
If changes affect userland-kernel interfaces, please send
the MAN-PAGES maintainer (as listed in the MAINTAINERS file)
a man-pages patch, or at least a notification of the change,
@@ -158,7 +171,7 @@ Even if the maintainer did not respond in step #4, make sure to ALWAYS
copy the maintainer when you change their code.
For small patches you may want to CC the Trivial Patch Monkey
-trivial@rustcorp.com.au set up by Rusty Russell; which collects "trivial"
+trivial@kernel.org managed by Adrian Bunk; which collects "trivial"
patches. Trivial patches must qualify for one of the following rules:
Spelling fixes in documentation
Spelling fixes which could break grep(1).
@@ -171,7 +184,7 @@ patches. Trivial patches must qualify for one of the following rules:
since people copy, as long as it's trivial)
Any fix by the author/maintainer of the file. (ie. patch monkey
in re-transmission mode)
-URL: <http://www.kernel.org/pub/linux/kernel/people/rusty/trivial/>
+URL: <http://www.kernel.org/pub/linux/kernel/people/bunk/trivial/>
@@ -373,27 +386,14 @@ a diffstat, to show what files have changed, and the number of inserted
and deleted lines per file. A diffstat is especially useful on bigger
patches. Other comments relevant only to the moment or the maintainer,
not suitable for the permanent changelog, should also go here.
+Use diffstat options "-p 1 -w 70" so that filenames are listed from the
+top of the kernel source tree and don't use too much horizontal space
+(easily fit in 80 columns, maybe with some indentation).
See more details on the proper patch format in the following
references.
-13) More references for submitting patches
-
-Andrew Morton, "The perfect patch" (tpp).
- <http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt>
-
-Jeff Garzik, "Linux kernel patch submission format."
- <http://linux.yyz.us/patch-format.html>
-
-Greg KH, "How to piss off a kernel subsystem maintainer"
- <http://www.kroah.com/log/2005/03/31/>
-
-Kernel Documentation/CodingStyle
- <http://sosdg.org/~coywolf/lxr/source/Documentation/CodingStyle>
-
-Linus Torvald's mail on the canonical patch format:
- <http://lkml.org/lkml/2005/4/7/183>
-----------------------------------
@@ -466,3 +466,31 @@ and 'extern __inline__'.
Don't try to anticipate nebulous future cases which may or may not
be useful: "Make it as simple as you can, and no simpler."
+
+
+----------------------
+SECTION 3 - REFERENCES
+----------------------
+
+Andrew Morton, "The perfect patch" (tpp).
+ <http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt>
+
+Jeff Garzik, "Linux kernel patch submission format."
+ <http://linux.yyz.us/patch-format.html>
+
+Greg Kroah-Hartman "How to piss off a kernel subsystem maintainer".
+ <http://www.kroah.com/log/2005/03/31/>
+ <http://www.kroah.com/log/2005/07/08/>
+ <http://www.kroah.com/log/2005/10/19/>
+ <http://www.kroah.com/log/2006/01/11/>
+
+NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people!.
+ <http://marc.theaimsgroup.com/?l=linux-kernel&m=112112749912944&w=2>
+
+Kernel Documentation/CodingStyle
+ <http://sosdg.org/~coywolf/lxr/source/Documentation/CodingStyle>
+
+Linus Torvalds's mail on the canonical patch format:
+ <http://lkml.org/lkml/2005/4/7/183>
+--
+Last updated on 17 Nov 2005.
diff --git a/Documentation/applying-patches.txt b/Documentation/applying-patches.txt
index 681e426e248..a083ba35d1a 100644
--- a/Documentation/applying-patches.txt
+++ b/Documentation/applying-patches.txt
@@ -2,8 +2,8 @@
Applying Patches To The Linux Kernel
------------------------------------
- (Written by Jesper Juhl, August 2005)
-
+ Original by: Jesper Juhl, August 2005
+ Last update: 2006-01-05
A frequently asked question on the Linux Kernel Mailing List is how to apply
@@ -76,7 +76,7 @@ instead:
If you wish to uncompress the patch file by hand first before applying it
(what I assume you've done in the examples below), then you simply run
-gunzip or bunzip2 on the file - like this:
+gunzip or bunzip2 on the file -- like this:
gunzip patch-x.y.z.gz
bunzip2 patch-x.y.z.bz2
@@ -94,7 +94,7 @@ Common errors when patching
---
When patch applies a patch file it attempts to verify the sanity of the
file in different ways.
-Checking that the file looks like a valid patch file, checking the code
+Checking that the file looks like a valid patch file & checking the code
around the bits being modified matches the context provided in the patch are
just two of the basic sanity checks patch does.
@@ -118,16 +118,16 @@ wrong.
When patch encounters a change that it can't fix up with fuzz it rejects it
outright and leaves a file with a .rej extension (a reject file). You can
-read this file to see exactely what change couldn't be applied, so you can
+read this file to see exactly what change couldn't be applied, so you can
go fix it up by hand if you wish.
-If you don't have any third party patches applied to your kernel source, but
+If you don't have any third-party patches applied to your kernel source, but
only patches from kernel.org and you apply the patches in the correct order,
and have made no modifications yourself to the source files, then you should
never see a fuzz or reject message from patch. If you do see such messages
anyway, then there's a high risk that either your local source tree or the
patch file is corrupted in some way. In that case you should probably try
-redownloading the patch and if things are still not OK then you'd be advised
+re-downloading the patch and if things are still not OK then you'd be advised
to start with a fresh tree downloaded in full from kernel.org.
Let's look a bit more at some of the messages patch can produce.
@@ -136,7 +136,7 @@ If patch stops and presents a "File to patch:" prompt, then patch could not
find a file to be patched. Most likely you forgot to specify -p1 or you are
in the wrong directory. Less often, you'll find patches that need to be
applied with -p0 instead of -p1 (reading the patch file should reveal if
-this is the case - if so, then this is an error by the person who created
+this is the case -- if so, then this is an error by the person who created
the patch but is not fatal).
If you get "Hunk #2 succeeded at 1887 with fuzz 2 (offset 7 lines)." or a
@@ -167,22 +167,28 @@ the patch will in fact apply it.
A message similar to "patch: **** unexpected end of file in patch" or "patch
unexpectedly ends in middle of line" means that patch could make no sense of
-the file you fed to it. Either your download is broken or you tried to feed
-patch a compressed patch file without uncompressing it first.
+the file you fed to it. Either your download is broken, you tried to feed
+patch a compressed patch file without uncompressing it first, or the patch
+file that you are using has been mangled by a mail client or mail transfer
+agent along the way somewhere, e.g., by splitting a long line into two lines.
+Often these warnings can easily be fixed by joining (concatenating) the
+two lines that had been split.
As I already mentioned above, these errors should never happen if you apply
a patch from kernel.org to the correct version of an unmodified source tree.
So if you get these errors with kernel.org patches then you should probably
-assume that either your patch file or your tree is broken and I'd advice you
+assume that either your patch file or your tree is broken and I'd advise you
to start over with a fresh download of a full kernel tree and the patch you
wish to apply.
Are there any alternatives to `patch'?
---
- Yes there are alternatives. You can use the `interdiff' program
-(http://cyberelk.net/tim/patchutils/) to generate a patch representing the
-differences between two patches and then apply the result.
+ Yes there are alternatives.
+
+ You can use the `interdiff' program (http://cyberelk.net/tim/patchutils/) to
+generate a patch representing the differences between two patches and then
+apply the result.
This will let you move from something like 2.6.12.2 to 2.6.12.3 in a single
step. The -z flag to interdiff will even let you feed it patches in gzip or
bzip2 compressed form directly without the use of zcat or bzcat or manual
@@ -197,10 +203,10 @@ do the additional steps since interdiff can get things wrong in some cases.
Another alternative is `ketchup', which is a python script for automatic
downloading and applying of patches (http://www.selenic.com/ketchup/).
-Other nice tools are diffstat which shows a summary of changes made by a
-patch, lsdiff which displays a short listing of affected files in a patch
-file, along with (optionally) the line numbers of the start of each patch
-and grepdiff which displays a list of the files modified by a patch where
+ Other nice tools are diffstat, which shows a summary of changes made by a
+patch; lsdiff, which displays a short listing of affected files in a patch
+file, along with (optionally) the line numbers of the start of each patch;
+and grepdiff, which displays a list of the files modified by a patch where
the patch contains a given regular expression.
@@ -225,8 +231,8 @@ The -mm kernels live at
In place of ftp.kernel.org you can use ftp.cc.kernel.org, where cc is a
country code. This way you'll be downloading from a mirror site that's most
likely geographically closer to you, resulting in faster downloads for you,
-less bandwidth used globally and less load on the main kernel.org servers -
-these are good things, do use mirrors when possible.
+less bandwidth used globally and less load on the main kernel.org servers --
+these are good things, so do use mirrors when possible.
The 2.6.x kernels
@@ -234,14 +240,14 @@ The 2.6.x kernels
These are the base stable releases released by Linus. The highest numbered
release is the most recent.
-If regressions or other serious flaws are found then a -stable fix patch
+If regressions or other serious flaws are found, then a -stable fix patch
will be released (see below) on top of this base. Once a new 2.6.x base
kernel is released, a patch is made available that is a delta between the
previous 2.6.x kernel and the new one.
-To apply a patch moving from 2.6.11 to 2.6.12 you'd do the following (note
+To apply a patch moving from 2.6.11 to 2.6.12, you'd do the following (note
that such patches do *NOT* apply on top of 2.6.x.y kernels but on top of the
-base 2.6.x kernel - if you need to move from 2.6.x.y to 2.6.x+1 you need to
+base 2.6.x kernel -- if you need to move from 2.6.x.y to 2.6.x+1 you need to
first revert the 2.6.x.y patch).
Here are some examples:
@@ -258,12 +264,12 @@ $ patch -p1 -R < ../patch-2.6.11.1 # revert the 2.6.11.1 patch
# source dir is now 2.6.11
$ patch -p1 < ../patch-2.6.12 # apply new 2.6.12 patch
$ cd ..
-$ mv linux-2.6.11.1 inux-2.6.12 # rename source dir
+$ mv linux-2.6.11.1 linux-2.6.12 # rename source dir
The 2.6.x.y kernels
---
- Kernels with 4 digit versions are -stable kernels. They contain small(ish)
+ Kernels with 4-digit versions are -stable kernels. They contain small(ish)
critical fixes for security problems or significant regressions discovered
in a given 2.6.x kernel.
@@ -274,9 +280,14 @@ versions.
If no 2.6.x.y kernel is available, then the highest numbered 2.6.x kernel is
the current stable kernel.
+ note: the -stable team usually do make incremental patches available as well
+ as patches against the latest mainline release, but I only cover the
+ non-incremental ones below. The incremental ones can be found at
+ ftp://ftp.kernel.org/pub/linux/kernel/v2.6/incr/
+
These patches are not incremental, meaning that for example the 2.6.12.3
patch does not apply on top of the 2.6.12.2 kernel source, but rather on top
-of the base 2.6.12 kernel source.
+of the base 2.6.12 kernel source.
So, in order to apply the 2.6.12.3 patch to your existing 2.6.12.2 kernel
source you have to first back out the 2.6.12.2 patch (so you are left with a
base 2.6.12 kernel source) and then apply the new 2.6.12.3 patch.
@@ -342,12 +353,12 @@ The -git kernels
repository, hence the name).
These patches are usually released daily and represent the current state of
-Linus' tree. They are more experimental than -rc kernels since they are
+Linus's tree. They are more experimental than -rc kernels since they are
generated automatically without even a cursory glance to see if they are
sane.
-git patches are not incremental and apply either to a base 2.6.x kernel or
-a base 2.6.x-rc kernel - you can see which from their name.
+a base 2.6.x-rc kernel -- you can see which from their name.
A patch named 2.6.12-git1 applies to the 2.6.12 kernel source and a patch
named 2.6.13-rc3-git2 applies to the source of the 2.6.13-rc3 kernel.
@@ -390,12 +401,12 @@ You should generally strive to get your patches into mainline via -mm to
ensure maximum testing.
This branch is in constant flux and contains many experimental features, a
-lot of debugging patches not appropriate for mainline etc and is the most
+lot of debugging patches not appropriate for mainline etc., and is the most
experimental of the branches described in this document.
These kernels are not appropriate for use on systems that are supposed to be
stable and they are more risky to run than any of the other branches (make
-sure you have up-to-date backups - that goes for any experimental kernel but
+sure you have up-to-date backups -- that goes for any experimental kernel but
even more so for -mm kernels).
These kernels in addition to all the other experimental patches they contain
@@ -433,7 +444,11 @@ $ cd ..
$ mv linux-2.6.12-mm1 linux-2.6.13-rc3-mm3 # rename the source dir
-This concludes this list of explanations of the various kernel trees and I
-hope you are now crystal clear on how to apply the various patches and help
-testing the kernel.
+This concludes this list of explanations of the various kernel trees.
+I hope you are now clear on how to apply the various patches and help test
+the kernel.
+
+Thanks to Randy Dunlap, Rolf Eike Beer, Linus Torvalds, Bodo Eggert,
+Johannes Stezenbach, Grant Coady, Pavel Machek and others whom I may have
+forgotten for their reviews and contributions to this document.
diff --git a/Documentation/block/barrier.txt b/Documentation/block/barrier.txt
new file mode 100644
index 00000000000..03971518b22
--- /dev/null
+++ b/Documentation/block/barrier.txt
@@ -0,0 +1,271 @@
+I/O Barriers
+============
+Tejun Heo <htejun@gmail.com>, July 22 2005
+
+I/O barrier requests are used to guarantee ordering around the barrier
+requests. Unless you're crazy enough to use disk drives for
+implementing synchronization constructs (wow, sounds interesting...),
+the ordering is meaningful only for write requests for things like
+journal checkpoints. All requests queued before a barrier request
+must be finished (made it to the physical medium) before the barrier
+request is started, and all requests queued after the barrier request
+must be started only after the barrier request is finished (again,
+made it to the physical medium).
+
+In other words, I/O barrier requests have the following two properties.
+
+1. Request ordering
+
+Requests cannot pass the barrier request. Preceding requests are
+processed before the barrier and following requests after.
+
+Depending on what features a drive supports, this can be done in one
+of the following three ways.
+
+i. For devices which have queue depth greater than 1 (TCQ devices) and
+support ordered tags, block layer can just issue the barrier as an
+ordered request and the lower level driver, controller and drive
+itself are responsible for making sure that the ordering constraint is
+met. Most modern SCSI controllers/drives should support this.
+
+NOTE: SCSI ordered tag isn't currently used due to a limitation in the
+ SCSI midlayer; see the random notes section below.
+
+ii. For devices which have queue depth greater than 1 but don't
+support ordered tags, block layer ensures that the requests preceding
+a barrier request finish before issuing the barrier request. Also,
+it defers requests following the barrier until the barrier request is
+finished. Older SCSI controllers/drives and SATA drives fall in this
+category.
+
+iii. Devices which have queue depth of 1. This is a degenerate case
+of ii. Just keeping issue order suffices. Ancient SCSI
+controllers/drives and IDE drives are in this category.
+
+2. Forced flushing to physical medium
+
+Again, if you're not gonna do synchronization with disk drives (dang,
+it sounds even more appealing now!), the reason you use I/O barriers
+is mainly to protect filesystem integrity when power failure or some
+other events abruptly stop the drive from operating and possibly make
+the drive lose data in its cache. So, I/O barriers need to guarantee
+that requests actually get written to non-volatile medium in order.
+
+There are four cases.
+
+i. No write-back cache. Keeping requests ordered is enough.
+
+ii. Write-back cache but no flush operation. There's no way to
+guarantee physical-medium commit order. This kind of device can't do
+I/O barriers.
+
+iii. Write-back cache and flush operation but no FUA (forced unit
+access). We need two cache flushes - before and after the barrier
+request.
+
+iv. Write-back cache, flush operation and FUA. We still need one
+flush to make sure requests preceding a barrier are written to medium,
+but post-barrier flush can be avoided by using FUA write on the
+barrier itself.
+
+
+How to support barrier requests in drivers
+------------------------------------------
+
+All barrier handling is done inside block layer proper. All low level
+drivers have to do is implement their prepare_flush_fn and use one of
+the following two functions to indicate what barrier type they support
+and how to prepare flush requests. Note that the term 'ordered' is
+used to indicate the whole sequence of performing barrier requests
+including draining and flushing.
+
+typedef void (prepare_flush_fn)(request_queue_t *q, struct request *rq);
+
+int blk_queue_ordered(request_queue_t *q, unsigned ordered,
+ prepare_flush_fn *prepare_flush_fn,
+ unsigned gfp_mask);
+
+int blk_queue_ordered_locked(request_queue_t *q, unsigned ordered,
+ prepare_flush_fn *prepare_flush_fn,
+ unsigned gfp_mask);
+
+The only difference between the two functions is whether or not the
+caller is holding q->queue_lock on entry. The latter expects the
+caller to be holding the lock.
+
+@q : the queue in question
+@ordered : the ordered mode the driver/device supports
+@prepare_flush_fn : this function should prepare @rq such that it
+ flushes cache to physical medium when executed
+@gfp_mask : gfp_mask used when allocating data structures
+ for ordered processing
+
+For example, SCSI disk driver's prepare_flush_fn looks like the
+following.
+
+static void sd_prepare_flush(request_queue_t *q, struct request *rq)
+{
+ memset(rq->cmd, 0, sizeof(rq->cmd));
+ rq->flags |= REQ_BLOCK_PC;
+ rq->timeout = SD_TIMEOUT;
+ rq->cmd[0] = SYNCHRONIZE_CACHE;
+}
+
+The following seven ordered modes are supported. The table below
+shows which mode should be used depending on what features a
+device/driver supports. In the leftmost column of the table, the
+QUEUE_ORDERED_ prefix is omitted from the mode names to save space.
+
+The table is followed by description of each mode. Note that in the
+descriptions of QUEUE_ORDERED_DRAIN*, '=>' is used whereas '->' is
+used for QUEUE_ORDERED_TAG* descriptions. '=>' indicates that the
+preceding step must be complete before proceeding to the next step.
+'->' indicates that the next step can start as soon as the previous
+step is issued.
+
+               write-back cache    ordered tag    flush    FUA
+-----------------------------------------------------------------------
+NONE           yes/no              N/A            no       N/A
+DRAIN          no                  no             N/A      N/A
+DRAIN_FLUSH    yes                 no             yes      no
+DRAIN_FUA      yes                 no             yes      yes
+TAG            no                  yes            N/A      N/A
+TAG_FLUSH      yes                 yes            yes      no
+TAG_FUA        yes                 yes            yes      yes
+
+
+QUEUE_ORDERED_NONE
+ I/O barriers are not needed and/or supported.
+
+ Sequence: N/A
+
+QUEUE_ORDERED_DRAIN
+ Requests are ordered by draining the request queue and cache
+ flushing isn't needed.
+
+ Sequence: drain => barrier
+
+QUEUE_ORDERED_DRAIN_FLUSH
+ Requests are ordered by draining the request queue and both
+ pre-barrier and post-barrier cache flushings are needed.
+
+ Sequence: drain => preflush => barrier => postflush
+
+QUEUE_ORDERED_DRAIN_FUA
+ Requests are ordered by draining the request queue and
+ pre-barrier cache flushing is needed. By using FUA on barrier
+ request, post-barrier flushing can be skipped.
+
+ Sequence: drain => preflush => barrier
+
+QUEUE_ORDERED_TAG
+ Requests are ordered by ordered tag and cache flushing isn't
+ needed.
+
+ Sequence: barrier
+
+QUEUE_ORDERED_TAG_FLUSH
+ Requests are ordered by ordered tag and both pre-barrier and
+ post-barrier cache flushings are needed.
+
+ Sequence: preflush -> barrier -> postflush
+
+QUEUE_ORDERED_TAG_FUA
+ Requests are ordered by ordered tag and pre-barrier cache
+ flushing is needed. By using FUA on barrier request,
+ post-barrier flushing can be skipped.
+
+ Sequence: preflush -> barrier
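
The mode table above can be read as a small decision function. The following
is a minimal sketch, assuming a driver knows four capability bits for its
device. The enum and function names are hypothetical stand-ins, not the
kernel's QUEUE_ORDERED_* constants, and returning NONE for a write-back cache
without a flush operation is one reading of the NONE row.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the QUEUE_ORDERED_* modes; the values are
 * illustrative and are not the kernel's actual constants. */
enum ordered_mode {
	ORDERED_NONE,
	ORDERED_DRAIN,
	ORDERED_DRAIN_FLUSH,
	ORDERED_DRAIN_FUA,
	ORDERED_TAG,
	ORDERED_TAG_FLUSH,
	ORDERED_TAG_FUA,
};

/* Pick an ordered mode from device capabilities, following the table. */
enum ordered_mode pick_ordered_mode(bool wb_cache, bool ordered_tag,
				    bool flush, bool fua)
{
	/* No write-back cache: keeping requests ordered is enough. */
	if (!wb_cache)
		return ordered_tag ? ORDERED_TAG : ORDERED_DRAIN;

	/* Write-back cache without flush: commit order can't be guaranteed. */
	if (!flush)
		return ORDERED_NONE;

	/* Write-back cache with flush: FUA saves the post-barrier flush. */
	if (ordered_tag)
		return fua ? ORDERED_TAG_FUA : ORDERED_TAG_FLUSH;
	return fua ? ORDERED_DRAIN_FUA : ORDERED_DRAIN_FLUSH;
}
```

A driver would pass the real constant corresponding to the result to
blk_queue_ordered() together with its prepare_flush_fn.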
+
+
+Random notes/caveats
+--------------------
+
+* SCSI layer currently can't use TAG ordering even if the drive,
+controller and driver support it. The problem is that the SCSI midlayer
+request dispatch function is not atomic. It releases the queue lock and
+switches to the SCSI host lock during issue, and it's possible, and over
+time likely, that requests change their relative positions. Once
+this problem is solved, TAG ordering can be enabled.
+
+* Currently, no matter which ordered mode is used, there can be only
+one barrier request in progress. All I/O barriers are held off by
+block layer until the previous I/O barrier is complete. This doesn't
+make any difference for DRAIN ordered devices, but, for TAG ordered
+devices with very high command latency, passing multiple I/O barriers
+to low level *might* be helpful if they are very frequent. Well, this
+certainly is a non-issue. I'm writing this just to make clear that no
+two I/O barriers are ever passed to a low-level driver at the same time.
+
+* Completion order. Requests in ordered sequence are issued in order
+but are not required to finish in order. Barrier implementation can
+handle out-of-order completion of ordered sequence. IOW, the requests
+MUST be processed in order but the hardware/software completion paths
+are allowed to reorder completion notifications - e.g. current SCSI
+midlayer doesn't preserve completion order during error handling.
+
+* Requeueing order. Low-level drivers are free to requeue any request
+after they removed it from the request queue with
+blkdev_dequeue_request(). As barrier sequence should be kept in order
+when requeued, generic elevator code takes care of putting requests in
+order around barrier. See blk_ordered_req_seq() and
+ELEVATOR_INSERT_REQUEUE handling in __elv_add_request() for details.
+
+Note that block drivers must not requeue preceding requests while
+completing latter requests in an ordered sequence. Currently, no
+error checking is done against this.
+
+* Error handling. Currently, block layer will report error to upper
+layer if any of requests in an ordered sequence fails. Unfortunately,
+this doesn't seem to be enough. Look at the following request flow.
+QUEUE_ORDERED_TAG_FLUSH is in use.
+
+ [0] [1] [2] [3] [pre] [barrier] [post] < [4] [5] [6] ... >
+ still in elevator
+
+Let's say requests [2] and [3] are write requests to update file system
+metadata (journal or whatever) and [barrier] is used to mark that
+those updates are valid. Consider the following sequence.
+
+ i. Requests [0] ~ [post] leave the request queue and enter
+ low-level driver.
+ ii. After a while, unfortunately, something goes wrong and the
+ drive fails [2]. Note that any of [0], [1] and [3] could have
+ completed by this time, but [pre] couldn't have been finished
+ as the drive must process it in order and it failed before
+ processing that command.
+ iii. Error handling kicks in and determines that the error is
+ unrecoverable and fails [2], and resumes operation.
+ iv. [pre] [barrier] [post] get processed.
+ v. *BOOM* power fails
+
+The problem here is that the barrier request is *supposed* to indicate
+that filesystem update requests [2] and [3] made it safely to the
+physical medium and, if the machine crashes after the barrier is
+written, filesystem recovery code can depend on that. Sadly, that
+isn't true in this case anymore. IOW, the success of an I/O barrier
+should also be dependent on success of some of the preceding requests,
+where only upper layer (filesystem) knows what 'some' is.
+
+This can be solved by implementing a way to tell the block layer which
+requests affect the success of the following barrier request and
+making low-level drivers resume operation on error only after
+block layer tells them to do so.
+
+As the probability of this happening is very low and the drive should
+be faulty, implementing the fix is probably overkill. But, still,
+it's there.
+
+* In previous drafts of barrier implementation, there was a fallback
+mechanism such that, if FUA or ordered TAG fails, a less fancy ordered
+mode can be selected and the failed barrier request is retried
+automatically. The rationale for this feature was that as FUA is
+pretty new in ATA world and ordered tag was never used widely, there
+could be devices which claim to support those features but choke when
+actually given such requests.
+
+ This was removed for two reasons: 1. it's overkill; 2. it's
+impossible to implement properly when TAG ordering is used, as low
+level drivers resume after an error automatically. If it's ever
+needed, adding it back and modifying low level drivers accordingly
+shouldn't be difficult.
diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt
index 0fe01c80548..8e63831971d 100644
--- a/Documentation/block/biodoc.txt
+++ b/Documentation/block/biodoc.txt
@@ -31,7 +31,7 @@ The following people helped with review comments and inputs for this
document:
Christoph Hellwig <hch@infradead.org>
Arjan van de Ven <arjanv@redhat.com>
- Randy Dunlap <rddunlap@osdl.org>
+ Randy Dunlap <rdunlap@xenotime.net>
Andre Hedrick <andre@linux-ide.org>
The following people helped with fixes/contributions to the bio patches
@@ -263,14 +263,8 @@ A flag in the bio structure, BIO_BARRIER is used to identify a barrier i/o.
The generic i/o scheduler would make sure that it places the barrier request and
all other requests coming after it after all the previous requests in the
queue. Barriers may be implemented in different ways depending on the
-driver. A SCSI driver for example could make use of ordered tags to
-preserve the necessary ordering with a lower impact on throughput. For IDE
-this might be two sync cache flush: a pre and post flush when encountering
-a barrier write.
-
-There is a provision for queues to indicate what kind of barriers they
-can provide. This is as of yet unmerged, details will be added here once it
-is in the kernel.
+driver. For more details regarding I/O barriers, please read barrier.txt
+in this directory.
1.2.2 Request Priority/Latency
diff --git a/Documentation/block/stat.txt b/Documentation/block/stat.txt
new file mode 100644
index 00000000000..0dbc946de2e
--- /dev/null
+++ b/Documentation/block/stat.txt
@@ -0,0 +1,82 @@
+Block layer statistics in /sys/block/<dev>/stat
+===============================================
+
+This file documents the contents of the /sys/block/<dev>/stat file.
+
+The stat file provides several statistics about the state of block
+device <dev>.
+
+Q. Why are there multiple statistics in a single file? Doesn't sysfs
+ normally contain a single value per file?
+A. By having a single file, the kernel can guarantee that the statistics
+ represent a consistent snapshot of the state of the device. If the
+ statistics were exported as multiple files containing one statistic
+ each, it would be impossible to guarantee that a set of readings
+ represent a single point in time.
+
+The stat file consists of a single line of text containing 11 decimal
+values separated by whitespace. The fields are summarized in the
+following table, and described in more detail below.
+
+Name            units         description
+----            -----         -----------
+read I/Os       requests      number of read I/Os processed
+read merges     requests      number of read I/Os merged with in-queue I/O
+read sectors    sectors       number of sectors read
+read ticks      milliseconds  total wait time for read requests
+write I/Os      requests      number of write I/Os processed
+write merges    requests      number of write I/Os merged with in-queue I/O
+write sectors   sectors       number of sectors written
+write ticks     milliseconds  total wait time for write requests
+in_flight       requests      number of I/Os currently in flight
+io_ticks        milliseconds  total time this block device has been active
+time_in_queue   milliseconds  total wait time for all requests
+
+read I/Os, write I/Os
+=====================
+
+These values increment when an I/O request completes.
+
+read merges, write merges
+=========================
+
+These values increment when an I/O request is merged with an
+already-queued I/O request.
+
+read sectors, write sectors
+===========================
+
+These values count the number of sectors read from or written to this
+block device. The "sectors" in question are the standard UNIX 512-byte
+sectors, not any device- or filesystem-specific block size. The
+counters are incremented when the I/O completes.
+
+read ticks, write ticks
+=======================
+
+These values count the number of milliseconds that I/O requests have
+waited on this block device. If there are multiple I/O requests waiting,
+these values will increase at a rate greater than 1000/second; for
+example, if 60 read requests wait for an average of 30 ms, the read ticks
+field will increase by 60*30 = 1800.
+
+in_flight
+=========
+
+This value counts the number of I/O requests that have been issued to
+the device driver but have not yet completed. It does not include I/O
+requests that are in the queue but not yet issued to the device driver.
+
+io_ticks
+========
+
+This value counts the number of milliseconds during which the device has
+had I/O requests queued.
+
+time_in_queue
+=============
+
+This value counts the number of milliseconds that I/O requests have waited
+on this block device. If there are multiple I/O requests waiting, this
+value will increase as the product of the number of milliseconds times the
+number of requests waiting (see "read ticks" above for an example).
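
Because the stat file is a single line of 11 whitespace-separated decimal
values, userspace can read it with one sscanf() call. The following is a
minimal sketch, assuming the line has already been read into a string. The
struct, its member names and the function names are illustrative and are not
part of any kernel or libc interface.

```c
#include <stdio.h>

/* The 11 fields of /sys/block/<dev>/stat, in file order.  The struct and
 * its member names are illustrative only. */
struct blk_stat {
	unsigned long long read_ios, read_merges, read_sectors, read_ticks;
	unsigned long long write_ios, write_merges, write_sectors, write_ticks;
	unsigned long long in_flight, io_ticks, time_in_queue;
};

/* Parse one line of stat text; returns 0 on success, -1 on a short line. */
int parse_blk_stat(const char *line, struct blk_stat *st)
{
	int n = sscanf(line,
		       "%llu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
		       &st->read_ios, &st->read_merges,
		       &st->read_sectors, &st->read_ticks,
		       &st->write_ios, &st->write_merges,
		       &st->write_sectors, &st->write_ticks,
		       &st->in_flight, &st->io_ticks, &st->time_in_queue);

	return n == 11 ? 0 : -1;
}

/* Bytes read, using the fixed 512-byte sector unit described above. */
unsigned long long blk_stat_bytes_read(const struct blk_stat *st)
{
	return st->read_sectors * 512ULL;
}
```

Reading all values from one line in a single pass keeps the consistent
snapshot that the single-file layout is meant to provide.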
diff --git a/Documentation/cachetlb.txt b/Documentation/cachetlb.txt
index 7eb715e07ed..4ae418889b8 100644
--- a/Documentation/cachetlb.txt
+++ b/Documentation/cachetlb.txt
@@ -136,7 +136,7 @@ changes occur:
8) void lazy_mmu_prot_update(pte_t pte)
This interface is called whenever the protection on
any user PTEs change. This interface provides a notification
- to architecture specific code to take appropiate action.
+ to architecture specific code to take appropriate action.
Next, we have the cache flushing interfaces. In general, when Linux
diff --git a/Documentation/cpu-freq/governors.txt b/Documentation/cpu-freq/governors.txt
index 933fae74c33..f4b8dc4237e 100644
--- a/Documentation/cpu-freq/governors.txt
+++ b/Documentation/cpu-freq/governors.txt
@@ -27,6 +27,7 @@ Contents:
2.2 Powersave
2.3 Userspace
2.4 Ondemand
+2.5 Conservative
3. The Governor Interface in the CPUfreq Core
@@ -110,9 +111,64 @@ directory.
The CPUfreq govenor "ondemand" sets the CPU depending on the
current usage. To do this the CPU must have the capability to
-switch the frequency very fast.
-
-
+switch the frequency very quickly. There are a number of sysfs file
+accessible parameters:
+
+sampling_rate: measured in uS (10^-6 seconds), this is how often you
+want the kernel to look at the CPU usage and to make decisions on
+what to do about the frequency. Typically this is set to values of
+around '10000' or more.
+
+show_sampling_rate_(min|max): the minimum and maximum sampling rates
+available that you may set 'sampling_rate' to.
+
+up_threshold: defines what the average CPU usage between the samplings
+of 'sampling_rate' needs to be for the kernel to make a decision on
+whether it should increase the frequency. For example when it is set
+to its default value of '80' it means that between the checking
+intervals the CPU needs to be on average more than 80% in use to then
+decide that the CPU frequency needs to be increased.
+
+sampling_down_factor: this parameter controls the rate at which the kernel
+makes a decision on when to decrease the frequency. When set to its
+default value of '5' it means that at 1/5 the sampling_rate the kernel
+makes a decision to lower the frequency. Five "lower rate" decisions
+have to be made in a row before the CPU frequency is actually lowered.
+If set to '1' then the frequency decreases as quickly as it increases;
+if set to '2' it decreases at half the rate of the increase.
+
+ignore_nice_load: this parameter takes a value of '0' or '1'. When set
+to '0' (its default), all processes are counted towards the
+'cpu utilisation' value. When set to '1', processes that are
+run with a 'nice' value will not count (and thus be ignored) in the
+overall usage calculation. This is useful if you are running a CPU
+intensive calculation on your laptop that you do not care how long it
+takes to complete, as you can 'nice' it and prevent it from taking part
+in the deciding process of whether to increase your CPU frequency.
+
+
+2.5 Conservative
+----------------
+
+The CPUfreq governor "conservative", much like the "ondemand"
+governor, sets the CPU depending on the current usage. It differs in
+behaviour in that it gracefully increases and decreases the CPU speed
+rather than jumping to max speed the moment there is any load on the
+CPU. This behaviour is more suitable in a battery powered environment.
+The governor is tweaked in the same manner as the "ondemand" governor
+through sysfs with the addition of:
+
+freq_step: this describes what percentage steps the cpu freq should be
+increased and decreased smoothly by. By default the cpu frequency will
+increase in 5% chunks of your maximum cpu frequency. You can change this
+value to anywhere between 0 and 100 where '0' will effectively lock your
+CPU at a speed regardless of its load whilst '100' will, in theory, make
+it behave identically to the "ondemand" governor.
+
+down_threshold: same as the 'up_threshold' found for the "ondemand"
+governor but for the opposite direction. For example when set to its
+default value of '20' it means that the CPU usage needs to be below
+20% between samples for the frequency to be decreased.
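
The interaction of up_threshold, down_threshold and freq_step can be sketched
as a single sampling decision. This is a simplified illustration rather than
the governor's actual code: the struct and function names are hypothetical,
frequencies are in kHz, load is in percent, min <= cur <= max is assumed, and
sampling_down_factor is deliberately left out.

```c
/* Tunables as described above; the struct itself is hypothetical. */
struct cons_tunables {
	unsigned int up_threshold;	/* percent, e.g. 80 */
	unsigned int down_threshold;	/* percent, e.g. 20 */
	unsigned int freq_step;		/* percent of max frequency, e.g. 5 */
};

/* Return the frequency to use after one sample with the given load. */
unsigned int cons_next_freq(const struct cons_tunables *t,
			    unsigned int cur, unsigned int min,
			    unsigned int max, unsigned int load)
{
	/* The step is a percentage of the maximum frequency. */
	unsigned int step =
		(unsigned int)((unsigned long long)max * t->freq_step / 100);

	if (load > t->up_threshold)
		return (max - cur < step) ? max : cur + step;
	if (load < t->down_threshold)
		return (cur - min < step) ? min : cur - step;
	return cur;	/* between the thresholds: leave the speed alone */
}
```

With freq_step set to 0 the step is zero and the frequency never moves; with
100 a single busy sample goes straight to the maximum, which matches the two
extremes described for freq_step.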
3. The Governor Interface in the CPUfreq Core
=============================================
diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt
new file mode 100644
index 00000000000..08c5d04f308
--- /dev/null
+++ b/Documentation/cpu-hotplug.txt
@@ -0,0 +1,357 @@
+ CPU hotplug Support in Linux(tm) Kernel
+
+ Maintainers:
+ CPU Hotplug Core:
+ Rusty Russell <rusty@rustycorp.com.au>
+ Srivatsa Vaddagiri <vatsa@in.ibm.com>
+ i386:
+ Zwane Mwaikambo <zwane@arm.linux.org.uk>
+ ppc64:
+ Nathan Lynch <nathanl@austin.ibm.com>
+ Joel Schopp <jschopp@austin.ibm.com>
+ ia64/x86_64:
+ Ashok Raj <ashok.raj@intel.com>
+
+Authors: Ashok Raj <ashok.raj@intel.com>
+Lots of feedback: Nathan Lynch <nathanl@austin.ibm.com>,
+ Joel Schopp <jschopp@austin.ibm.com>
+
+Introduction
+
+Modern advances in system architectures have introduced advanced error
+reporting and correction capabilities in processors. CPU architectures permit
+partitioning support, where compute resources of a single CPU could be made
+available to virtual machine environments. There are a couple of OEMs that
+support NUMA hardware which is hot-pluggable as well, where physical
+node insertion and removal require support for CPU hotplug.
+
+Such advances require CPUs available to a kernel to be removed either for
+provisioning reasons, or for RAS purposes to keep an offending CPU off
+system execution path. Hence the need for CPU hotplug support in the
+Linux kernel.
+
+A more novel use of CPU-hotplug support is its use today in suspend
+resume support for SMP. Dual-core and HT support makes even
+a laptop run SMP kernels which didn't support these methods. SMP support
+for suspend/resume is a work in progress.
+
+General Stuff about CPU Hotplug
+--------------------------------
+
+Command Line Switches
+---------------------
+maxcpus=n Restrict boot time cpus to n. For example, if you have 4 cpus,
+ using maxcpus=2 will only boot 2. You can choose to bring the
+ other cpus online later; read the FAQs for more info.
+
+additional_cpus=n [x86_64 only] use this to limit hotpluggable cpus.
+ This option sets
+ cpu_possible_map = cpu_present_map + additional_cpus
+
+CPU maps and such
+-----------------
+[For more on cpumaps and the primitives to manipulate them, please check
+include/linux/cpumask.h, which has more descriptive text.]
+
+cpu_possible_map: Bitmap of possible CPUs that can ever be available in the
+system. This is used to allocate some boot time memory for per_cpu variables
+that aren't designed to grow/shrink as CPUs are made available or removed.
+Once set during boot time discovery phase, the map is static, i.e. no bits
+are added or removed at any time. Trimming it accurately for your system needs
+upfront can save some boot time memory. See below for how we use heuristics
+in the x86_64 case to keep this in check.
+
+cpu_online_map: Bitmap of all CPUs currently online. It's set in __cpu_up()
+after a cpu is available for kernel scheduling and ready to receive
+interrupts from devices. It's cleared when a cpu is brought down using
+__cpu_disable(), before which all OS services including interrupts are
+migrated to another target CPU.
+
+cpu_present_map: Bitmap of CPUs currently present in the system. Not all
+of them may be online. When physical hotplug is processed by the relevant
+subsystem (e.g. ACPI), the map can change: a bit is either added to or
+removed from the map depending on whether the event is hot-add or
+hot-remove. There are currently no locking rules. Typical usage is to init
+topology during boot, at which time hotplug is disabled.
+
+You really don't need to manipulate any of the system cpu maps. They should
+be read-only for most uses. When setting up per-cpu resources almost always use
+cpu_possible_map/for_each_cpu() to iterate.
+
+Never use anything other than cpumask_t to represent bitmap of CPUs.
+
+#include <linux/cpumask.h>
+
+for_each_cpu - Iterate over cpu_possible_map
+for_each_online_cpu - Iterate over cpu_online_map
+for_each_present_cpu - Iterate over cpu_present_map
+for_each_cpu_mask(x,mask) - Iterate over some random collection of cpu mask.
+
+#include <linux/cpu.h>
+lock_cpu_hotplug() and unlock_cpu_hotplug():
+
+The above calls are used to inhibit cpu hotplug operations. While holding the
+cpucontrol mutex, cpu_online_map will not change. If you merely need to avoid
+cpus going away, you could also use preempt_disable() and preempt_enable()
+for those sections. Just remember the critical section cannot call any
+function that can sleep or schedule this process away. The preempt_disable()
+will work as long as stop_machine_run() is used to take a cpu down.
+
+CPU Hotplug - Frequently Asked Questions.
+
+Q: How do I enable my kernel to support CPU hotplug?
+A: When doing make defconfig, enable CPU hotplug support
+
+ "Processor type and Features" -> Support for Hotpluggable CPUs
+
+Make sure that you have CONFIG_HOTPLUG, and CONFIG_SMP turned on as well.
+
+You would need to enable CONFIG_HOTPLUG_CPU for SMP suspend/resume support
+as well.
+
+Q: What architectures support CPU hotplug?
+A: As of 2.6.14, the following architectures support CPU hotplug.
+
+i386 (Intel), ppc, ppc64, parisc, s390, ia64 and x86_64
+
+Q: How to test if hotplug is supported on the newly built kernel?
+A: You should now notice an entry in sysfs.
+
+Check if sysfs is mounted, using the "mount" command. You should notice
+an entry as shown below in the output.
+
+....
+none on /sys type sysfs (rw)
+....
+
+If it is not mounted, do the following.
+
+#mkdir /sys
+#mount -t sysfs sys /sys
+
+Now you should see entries for all present cpus. The following is an example
+on an 8-way system.
+
+#pwd
+#/sys/devices/system/cpu
+#ls -l
+total 0
+drwxr-xr-x 10 root root 0 Sep 19 07:44 .
+drwxr-xr-x 13 root root 0 Sep 19 07:45 ..
+drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu0
+drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu1
+drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu2
+drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu3
+drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu4
+drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu5
+drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu6
+drwxr-xr-x 3 root root 0 Sep 19 07:48 cpu7
+
+Under each directory you would find an "online" file which is the control
+file to logically online/offline a processor.
+
+Q: Does hot-add/hot-remove refer to physical add/remove of cpus?
+A: The terms hot-add/remove may not be used very consistently in the code.
+CONFIG_HOTPLUG_CPU enables logical online/offline capability in the kernel.
+To support physical addition/removal, one would need some BIOS hooks and
+the platform should have something like an attention button in PCI hotplug.
+CONFIG_ACPI_HOTPLUG_CPU enables ACPI support for physical add/remove of CPUs.
+
+Q: How do I logically offline a CPU?
+A: Do the following.
+
+#echo 0 > /sys/devices/system/cpu/cpuX/online
+
+Once the logical offline is successful, check
+
+#cat /proc/interrupts
+
+You should no longer see the CPU that you removed. Also, the online file will
+report the state as 0 when a cpu is offline and 1 when it's online.
+
+#To display the current cpu state.
+#cat /sys/devices/system/cpu/cpuX/online
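
The same control file can be driven from a program instead of the shell. The
following is a minimal userspace sketch, assuming the sysfs layout shown
above. The helper names are hypothetical, and an error return from the write
helper corresponds to a failing "echo".

```c
#include <stdio.h>

/* Build the path of the "online" control file for a given cpu number.
 * Returns 0 on success, -1 if the buffer is too small. */
int cpu_online_path(char *buf, size_t len, unsigned int cpu)
{
	int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%u/online", cpu);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read the state from a control file: 1 online, 0 offline, -1 on error
 * (for example when the file is absent because the cpu is not removable). */
int cpu_read_online(const char *path)
{
	FILE *f = fopen(path, "r");
	int state = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &state) != 1)
		state = -1;
	fclose(f);
	return state;
}

/* Write 0 or 1 to a control file; returns 0 on success, -1 on failure.
 * stdio buffers the write, so a kernel refusal may only show at fclose(). */
int cpu_set_online(const char *path, int online)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", online ? 1 : 0) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would build the path for cpuX, call cpu_set_online(path, 0), and
then confirm the result with cpu_read_online(path).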
+
+Q: Why can't I remove CPU0 on some systems?
+A: Some architectures may have some special dependency on a certain CPU.
+
+For example, IA64 platforms have the ability to send platform interrupts to
+the OS, a.k.a. Corrected Platform Error Interrupts (CPEI). Current ACPI
+specifications don't provide a way to change the target CPU. Hence if the
+current ACPI version doesn't support such re-direction, we disable removal
+of that CPU by marking it not-removable.
+
+In such cases you will also notice that the online file is missing under cpu0.
+
+Q: How do I find out if a particular CPU is not removable?
+A: Depending on the implementation, some architectures may show this by the
+absence of the "online" file. This is done if it can be determined ahead of
+time that this CPU cannot be removed.
+
+In some situations, this can be a run time check, i.e if you try to remove the
+last CPU, this will not be permitted. You can find such failures by
+investigating the return value of the "echo" command.
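+
+Both checks can be combined in a small shell helper. This is only a sketch:
+the function names cpu_hotplug_state and cpu_offline, and the optional second
+argument (an alternate sysfs root, so the helpers can be tried without
+touching real CPUs), are assumptions made for illustration and are not part
+of the kernel interface.
+

```shell
# Report the hotplug state of CPU $1.
# $2 is an optional sysfs root (hypothetical convenience for testing);
# it defaults to the real location.
cpu_hotplug_state() {
    cpu_dir="${2:-/sys/devices/system/cpu}/cpu$1"
    if [ ! -d "$cpu_dir" ]; then
        echo "absent"            # no such CPU directory
    elif [ ! -f "$cpu_dir/online" ]; then
        echo "not-removable"     # no control file was created
    elif [ "$(cat "$cpu_dir/online")" = "1" ]; then
        echo "online"
    else
        echo "offline"
    fi
}

# Try to offline CPU $1. A run time refusal shows up as a failed
# write, so the exit status of "echo" is what gets checked.
cpu_offline() {
    cpu_dir="${2:-/sys/devices/system/cpu}/cpu$1"
    if echo 0 > "$cpu_dir/online"; then
        echo "cpu$1 offlined"
    else
        echo "cpu$1 could not be offlined" >&2
        return 1
    fi
}
```

+
+For example, "cpu_hotplug_state 0" would print "not-removable" on a system
+where cpu0 has no "online" file.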
+
+Q: What happens when a CPU is being logically offlined?
+A: The following happen, listed in no particular order :-)
+
+- A notification is sent to in-kernel registered modules by sending an event
+ CPU_DOWN_PREPARE
+- All processes are migrated away from this outgoing CPU to a new CPU
+- All interrupts targeted to this CPU are migrated to a new CPU
+- Timers, bottom halves and tasklets are also migrated to a new CPU
+- Once all services are migrated, kernel calls an arch specific routine
+ __cpu_disable() to perform arch specific cleanup.
+- Once this is successful, successful cleanup is signalled by the event
+ CPU_DEAD.
+
+ "It is expected that each service cleans up when the CPU_DOWN_PREPARE
+ notifier is called. When CPU_DEAD is called, it is expected that nothing
+ is running on behalf of the CPU that was offlined."
+
+Q: If I have some kernel code that needs to be aware of CPU arrival and
+ departure, how do I arrange for proper notification?
+A: This is what you would need in your kernel code to receive notifications.
+
+ #include <linux/cpu.h>
+ static int __cpuinit foobar_cpu_callback(struct notifier_block *nfb,
+ unsigned long action, void *hcpu)
+ {
+ unsigned int cpu = (unsigned long)hcpu;
+
+ switch (action) {
+ case CPU_ONLINE:
+ foobar_online_action(cpu);
+ break;
+ case CPU_DEAD:
+ foobar_dead_action(cpu);
+ break;
+ }
+ return NOTIFY_OK;
+ }
+
+ static struct notifier_block foobar_cpu_notifier =
+ {
+ .notifier_call = foobar_cpu_callback,
+ };
+
+
+In your init function,
+
+ register_cpu_notifier(&foobar_cpu_notifier);
+
+You can fail PREPARE notifiers if something doesn't work to prepare resources.
+This will stop the activity and send a subsequent CANCELED event back.
+
+CPU_DEAD should not be failed; it is just a goodness indication, but bad
+things will happen if a notifier in the path returns a BAD notify code.
+
+Q: I don't see my action being called for all CPUs already up and running?
+A: Yes, CPU notifiers are called only when new CPUs are on-lined or offlined.
+ If you need to perform some action for each cpu already in the system, then
+
+ for_each_online_cpu(i) {
+ foobar_cpu_callback(&foobar_cpu_notifier, CPU_UP_PREPARE, (void *)(long)i);
+ foobar_cpu_callback(&foobar_cpu_notifier, CPU_ONLINE, (void *)(long)i);
+ }
+
+Q: If I would like to develop cpu hotplug support for a new architecture,
+ what do I need at a minimum?
+A: The following is required for the CPU hotplug infrastructure to work
+ correctly.
+
+ - Make sure you have an entry in Kconfig to enable CONFIG_HOTPLUG_CPU
+ - __cpu_up() - Arch interface to bring up a CPU
+ - __cpu_disable() - Arch interface to shutdown a CPU, no more interrupts
+ can be handled by the kernel after the routine
+ returns. This includes shutting down local APIC
+ timers etc.
+ - __cpu_die() - This is supposed to ensure the death of the CPU.
+ Look at some example code in other architectures
+ that implement CPU hotplug. The processor is taken
+ down from the idle() loop for that specific
+ architecture. __cpu_die() typically waits for some
+ per_cpu state to be set, to be positively sure that
+ the processor dead routine has been called.
+
+Q: I need to ensure that a particular cpu is not removed while some
+ work specific to this cpu is in progress.
+A: First switch the current thread context to the preferred cpu
+
+ int my_func_on_cpu(int cpu)
+ {
+ cpumask_t saved_mask, new_mask = CPU_MASK_NONE;
+ int curr_cpu, err = 0;
+
+ saved_mask = current->cpus_allowed;
+ cpu_set(cpu, new_mask);
+ err = set_cpus_allowed(current, new_mask);
+
+ if (err)
+ return err;
+
+ /*
+ * If we got scheduled out just after the return from
+ * set_cpus_allowed() before running the work, this ensures
+ * we stay locked.
+ */
+ curr_cpu = get_cpu();
+
+ if (curr_cpu != cpu) {
+ err = -EAGAIN;
+ goto ret;
+ } else {
+ /*
+ * Do work : But can't sleep, since get_cpu() disables preempt
+ */
+ }
+ ret:
+ put_cpu();
+ set_cpus_allowed(current, saved_mask);
+ return err;
+ }
+
+
+Q: How do we determine how many CPUs are available for hotplug?
+A: There is no clear spec defined way from ACPI that can give us that
+ information today. Based on some input from Natalie of Unisys,
+ the ACPI MADT (Multiple APIC Description Table) marks those possible
+ CPUs in a system with disabled status.
+
+ Andi implemented some simple heuristics that count the number of disabled
+ CPUs in MADT as hotpluggable CPUS. In the case there are no disabled CPUS
+ we assume 1/2 the number of CPUs currently present can be hotplugged.
+
+ Caveat: Today's ACPI MADT can only provide 256 entries since the apicid field
+ in MADT is only 8 bits.
+
+User Space Notification
+
+Hotplug support for devices is common in Linux today. It is being used today to
+support automatic configuration of network, usb and pci devices. A hotplug
+event can be used to invoke an agent script to perform the configuration task.
+
+You can add /etc/hotplug/cpu.agent to handle hotplug notifications in user
+space.
+
+ #!/bin/bash
+ # $Id: cpu.agent
+ # Kernel hotplug params include:
+ #ACTION=%s [online or offline]
+ #DEVPATH=%s
+ #
+ cd /etc/hotplug
+ . ./hotplug.functions
+
+ case $ACTION in
+ online)
+ echo `date` ":cpu.agent" add cpu >> /tmp/hotplug.txt
+ ;;
+ offline)
+ echo `date` ":cpu.agent" remove cpu >>/tmp/hotplug.txt
+ ;;
+ *)
+ debug_mesg CPU $ACTION event not supported
+ exit 1
+ ;;
+ esac
diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt
index a09a8eb8066..990998ee10b 100644
--- a/Documentation/cpusets.txt
+++ b/Documentation/cpusets.txt
@@ -14,7 +14,10 @@ CONTENTS:
1.1 What are cpusets ?
1.2 Why are cpusets needed ?
1.3 How are cpusets implemented ?
- 1.4 How do I use cpusets ?
+ 1.4 What are exclusive cpusets ?
+ 1.5 What does notify_on_release do ?
+ 1.6 What is memory_pressure ?
+ 1.7 How do I use cpusets ?
2. Usage Examples and Syntax
2.1 Basic Usage
2.2 Adding/removing cpus
@@ -49,29 +52,6 @@ its cpus_allowed vector, and the kernel page allocator will not
allocate a page on a node that is not allowed in the requesting tasks
mems_allowed vector.
-If a cpuset is cpu or mem exclusive, no other cpuset, other than a direct
-ancestor or descendent, may share any of the same CPUs or Memory Nodes.
-A cpuset that is cpu exclusive has a sched domain associated with it.
-The sched domain consists of all cpus in the current cpuset that are not
-part of any exclusive child cpusets.
-This ensures that the scheduler load balacing code only balances
-against the cpus that are in the sched domain as defined above and not
-all of the cpus in the system. This removes any overhead due to
-load balancing code trying to pull tasks outside of the cpu exclusive
-cpuset only to be prevented by the tasks' cpus_allowed mask.
-
-A cpuset that is mem_exclusive restricts kernel allocations for
-page, buffer and other data commonly shared by the kernel across
-multiple users. All cpusets, whether mem_exclusive or not, restrict
-allocations of memory for user space. This enables configuring a
-system so that several independent jobs can share common kernel
-data, such as file system pages, while isolating each jobs user
-allocation in its own cpuset. To do this, construct a large
-mem_exclusive cpuset to hold all the jobs, and construct child,
-non-mem_exclusive cpusets for each individual job. Only a small
-amount of typical kernel memory, such as requests from interrupt
-handlers, is allowed to be taken outside even a mem_exclusive cpuset.
-
User level code may create and destroy cpusets by name in the cpuset
virtual file system, manage the attributes and permissions of these
cpusets and which CPUs and Memory Nodes are assigned to each cpuset,
@@ -155,7 +135,7 @@ Cpusets extends these two mechanisms as follows:
The implementation of cpusets requires a few, simple hooks
into the rest of the kernel, none in performance critical paths:
- - in main/init.c, to initialize the root cpuset at system boot.
+ - in init/main.c, to initialize the root cpuset at system boot.
- in fork and exit, to attach and detach a task from its cpuset.
- in sched_setaffinity, to mask the requested CPUs by what's
allowed in that tasks cpuset.
@@ -166,7 +146,7 @@ into the rest of the kernel, none in performance critical paths:
and related changes in both sched.c and arch/ia64/kernel/domain.c
- in the mbind and set_mempolicy system calls, to mask the requested
Memory Nodes by what's allowed in that tasks cpuset.
- - in page_alloc, to restrict memory to allowed nodes.
+ - in page_alloc.c, to restrict memory to allowed nodes.
- in vmscan.c, to restrict page recovery to the current cpuset.
In addition a new file system, of type "cpuset" may be mounted,
@@ -192,9 +172,15 @@ containing the following files describing that cpuset:
- cpus: list of CPUs in that cpuset
- mems: list of Memory Nodes in that cpuset
+ - memory_migrate flag: if set, move pages to cpuset's nodes
- cpu_exclusive flag: is cpu placement exclusive?
- mem_exclusive flag: is memory placement exclusive?
- tasks: list of tasks (by pid) attached to that cpuset
+ - notify_on_release flag: run /sbin/cpuset_release_agent on exit?
+ - memory_pressure: measure of how much paging pressure in cpuset
+
+In addition, only the root cpuset has the following file:
+ - memory_pressure_enabled flag: compute memory_pressure?
New cpusets are created using the mkdir system call or shell
command. The properties of a cpuset, such as its flags, allowed
@@ -228,7 +214,108 @@ exclusive cpuset. Also, the use of a Linux virtual file system (vfs)
to represent the cpuset hierarchy provides for a familiar permission
and name space for cpusets, with a minimum of additional kernel code.
-1.4 How do I use cpusets ?
+
+1.4 What are exclusive cpusets ?
+--------------------------------
+
+If a cpuset is cpu or mem exclusive, no other cpuset, other than
+a direct ancestor or descendent, may share any of the same CPUs or
+Memory Nodes.
+
+A cpuset that is cpu_exclusive has a scheduler (sched) domain
+associated with it. The sched domain consists of all CPUs in the
+current cpuset that are not part of any exclusive child cpusets.
+This ensures that the scheduler load balancing code only balances
+against the CPUs that are in the sched domain as defined above and
+not all of the CPUs in the system. This removes any overhead due to
+load balancing code trying to pull tasks outside of the cpu_exclusive
+cpuset only to be prevented by the tasks' cpus_allowed mask.
+
+A cpuset that is mem_exclusive restricts kernel allocations for
+page, buffer and other data commonly shared by the kernel across
+multiple users. All cpusets, whether mem_exclusive or not, restrict
+allocations of memory for user space. This enables configuring a
+system so that several independent jobs can share common kernel data,
+such as file system pages, while isolating each job's user allocation in
+its own cpuset. To do this, construct a large mem_exclusive cpuset to
+hold all the jobs, and construct child, non-mem_exclusive cpusets for
+each individual job. Only a small amount of typical kernel memory,
+such as requests from interrupt handlers, is allowed to be taken
+outside even a mem_exclusive cpuset.
+
+
+1.5 What does notify_on_release do ?
+------------------------------------
+
+If the notify_on_release flag is enabled (1) in a cpuset, then whenever
+the last task in the cpuset leaves (exits or attaches to some other
+cpuset) and the last child cpuset of that cpuset is removed, then
+the kernel runs the command /sbin/cpuset_release_agent, supplying the
+pathname (relative to the mount point of the cpuset file system) of the
+abandoned cpuset. This enables automatic removal of abandoned cpusets.
+The default value of notify_on_release in the root cpuset at system
+boot is disabled (0). The default value of other cpusets at creation
+is the current value of their parent's notify_on_release setting.
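+
+As a rough illustration, the body of such a release agent could look like
+the sketch below. The function name cpuset_release and the optional mount
+point argument are hypothetical; /dev/cpuset is assumed as the default mount
+point of the cpuset file system.
+

```shell
# Hypothetical body for /sbin/cpuset_release_agent.
# $1: pathname of the abandoned cpuset, relative to the mount point
# $2: optional mount point of the cpuset file system
#     (assumed default: /dev/cpuset)
cpuset_release() {
    mount_point="${2:-/dev/cpuset}"
    # rmdir only succeeds if the cpuset really is abandoned, so a
    # cpuset that gained a task or a child in the meantime is kept.
    rmdir "$mount_point/$1"
}
```

+
+A real agent would call this with its first argument, i.e.
+cpuset_release "$1".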
+
+
+1.6 What is memory_pressure ?
+-----------------------------
+The memory_pressure of a cpuset provides a simple per-cpuset metric
+of the rate that the tasks in a cpuset are attempting to free up
+in-use memory on the nodes of the cpuset to satisfy additional memory
+requests.
+
+This enables batch managers monitoring jobs running in dedicated
+cpusets to efficiently detect what level of memory pressure that job
+is causing.
+
+This is useful both on tightly managed systems running a wide mix of
+submitted jobs, which may choose to terminate or re-prioritize jobs that
+are trying to use more memory than allowed on the nodes assigned them,
+and with tightly coupled, long running, massively parallel scientific
+computing jobs that will dramatically fail to meet required performance
+goals if they start to use more memory than allowed to them.
+
+This mechanism provides a very economical way for the batch manager
+to monitor a cpuset for signs of memory pressure. It's up to the
+batch manager or other user code to decide what to do about it and
+take action.
+
+==> Unless this feature is enabled by writing "1" to the special file
+ /dev/cpuset/memory_pressure_enabled, the hook in the rebalance
+ code of __alloc_pages() for this metric reduces to simply noticing
+ that the cpuset_memory_pressure_enabled flag is zero. So only
+ systems that enable this feature will compute the metric.
+
+Why a per-cpuset, running average:
+
+ Because this meter is per-cpuset, rather than per-task or mm,
+ the system load imposed by a batch scheduler monitoring this
+ metric is sharply reduced on large systems, because a scan of
+ the tasklist can be avoided on each set of queries.
+
+ Because this meter is a running average, instead of an accumulating
+ counter, a batch scheduler can detect memory pressure with a
+ single read, instead of having to read and accumulate results
+ for a period of time.
+
+ Because this meter is per-cpuset rather than per-task or mm,
+ the batch scheduler can obtain the key information, memory
+ pressure in a cpuset, with a single read, rather than having to
+ query and accumulate results over all the (dynamically changing)
+ set of tasks in the cpuset.
+
+A per-cpuset simple digital filter (requires a spinlock and 3 words
+of data per-cpuset) is kept, and updated by any task attached to that
+cpuset, if it enters the synchronous (direct) page reclaim code.
+
+A per-cpuset file provides an integer number representing the recent
+(half-life of 10 seconds) rate of direct page reclaims caused by
+the tasks in the cpuset, in units of reclaims attempted per second,
+times 1000.
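+
+Since the file value is scaled by 1000, a monitoring script has to divide
+it again to get reclaims per second. A minimal sketch, assuming a helper
+named cpuset_reclaim_rate (hypothetical) that is given the cpuset directory:
+

```shell
# Print the recent direct reclaim rate of a cpuset in reclaims per
# second. The memory_pressure file holds that rate times 1000.
# $1: cpuset directory, for example /dev/cpuset/batch/job1
cpuset_reclaim_rate() {
    raw=$(cat "$1/memory_pressure") || return 1
    # Integer arithmetic only: whole part, then three decimals.
    printf '%d.%03d\n' $((raw / 1000)) $((raw % 1000))
}
```

+
+A file value of 2500 is printed as 2.500. The metric stays at zero unless
+"1" has been written to /dev/cpuset/memory_pressure_enabled first.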
+
+
+1.7 How do I use cpusets ?
--------------------------
In order to minimize the impact of cpusets on critical kernel
@@ -277,6 +364,30 @@ rewritten to the 'tasks' file of its cpuset. This is done to avoid
impacting the scheduler code in the kernel with a check for changes
in a tasks processor placement.
+Normally, once a page is allocated (given a physical page
+of main memory) then that page stays on whatever node it
+was allocated, so long as it remains allocated, even if the
+cpuset's memory placement policy 'mems' subsequently changes.
+If the cpuset flag file 'memory_migrate' is set true, then when
+tasks are attached to that cpuset, any pages that task had
+allocated to it on nodes in its previous cpuset are migrated
+to the task's new cpuset. Depending on the implementation,
+this migration may either be done by swapping the page out,
+so that the next time the page is referenced, it will be paged
+into the task's new cpuset, usually on the node where it was
+referenced, or this migration may be done by directly copying
+the pages from the task's previous cpuset to the new cpuset,
+where possible to the same node, relative to the new cpuset,
+as the node that held the page, relative to the old cpuset.
+Also if 'memory_migrate' is set true, then if that cpuset's
+'mems' file is modified, pages allocated to tasks in that
+cpuset, that were on nodes in the previous setting of 'mems',
+will be moved to nodes in the new setting of 'mems.' Again,
+depending on the implementation, this might be done by swapping,
+or by direct copying. In either case, pages that were not in
+the task's prior cpuset, or in the cpuset's prior 'mems' setting,
+will not be moved.
+
There is an exception to the above. If hotplug functionality is used
to remove all the CPUs that are currently assigned to a cpuset,
then the kernel will automatically update the cpus_allowed of all
diff --git a/Documentation/drivers/edac/edac.txt b/Documentation/drivers/edac/edac.txt
new file mode 100644
index 00000000000..d37191fe568
--- /dev/null
+++ b/Documentation/drivers/edac/edac.txt
@@ -0,0 +1,673 @@
+
+
+EDAC - Error Detection And Correction
+
+Written by Doug Thompson <norsk5@xmission.com>
+7 Dec 2005
+
+
+EDAC was written by:
+ Thayne Harbaugh,
+ modified by Dave Peterson, Doug Thompson, et al,
+ from the bluesmoke.sourceforge.net project.
+
+
+============================================================================
+EDAC PURPOSE
+
+The goal of the 'edac' kernel module is to detect and report errors that occur
+within the computer system. In the initial release, memory Correctable Errors
+(CE) and Uncorrectable Errors (UE) are the primary errors being harvested.
+
+Detecting CE events, then harvesting those events and reporting them,
+CAN be a predictor of future UE events. With CE events, the system can
+continue to operate, but with less safety. Preventive maintenance and
+proactive part replacement of memory DIMMs exhibiting CEs can reduce
+the likelihood of the dreaded UE events and system 'panics'.
+
+
+In addition, PCI Bus Parity and SERR Errors are scanned for on PCI devices
+in order to determine if errors are occurring on data transfers.
+The presence of PCI Parity errors must be examined with a grain of salt.
+There are several add-in adapters that do NOT follow the PCI specification
+with regard to Parity generation and reporting. The specification says
+the vendor should tie the parity status bits to 0 if they do not intend
+to generate parity. Some vendors do not do this, and thus the parity bit
+can "float" giving false positives.
+
+The PCI Parity EDAC device has the ability to "skip" known flakey
+cards during the parity scan. These are set by the parity "blacklist"
+interface in the sysfs for PCI Parity. (See the PCI section in the sysfs
+section below.) There is also a parity "whitelist" which is used as
+an explicit list of devices to scan, while the blacklist is a list
+of devices to skip.
+
+EDAC will have future error detectors that will be added or integrated
+into EDAC in the following list:
+
+ MCE Machine Check Exception
+ MCA Machine Check Architecture
+ NMI NMI notification of ECC errors
+ MSRs Machine Specific Register error cases
+ and other mechanisms.
+
+These errors are usually bus errors, ECC errors, thermal throttling
+and the like.
+
+
+============================================================================
+EDAC VERSIONING
+
+EDAC is composed of a "core" module (edac_mc.ko) and several Memory
+Controller (MC) driver modules. On a given system, the CORE
+is loaded and one MC driver will be loaded. Both the CORE and
+the MC driver have individual versions that reflect current release
+level of their respective modules. Thus, to "report" on what version
+a system is running, one must report both the CORE's and the
+MC driver's versions.
+
+
+LOADING
+
+If 'edac' was statically linked with the kernel then no loading is
+necessary. If 'edac' was built as modules then simply modprobe the
+'edac' pieces that you need. You should be able to modprobe
+hardware-specific modules and have the dependencies load the necessary core
+modules.
+
+Example:
+
+$> modprobe amd76x_edac
+
+loads both the amd76x_edac.ko memory controller module and the edac_mc.ko
+core module.
+
+
+============================================================================
+EDAC sysfs INTERFACE
+
+EDAC presents a 'sysfs' interface for control, reporting and attribute
+reporting purposes.
+
+EDAC lives in the /sys/devices/system/edac directory. Within this directory
+there currently reside 2 'edac' components:
+
+ mc memory controller(s) system
+ pci PCI status system
+
+
+============================================================================
+Memory Controller (mc) Model
+
+First a background on the memory controller's model abstracted in EDAC.
+Each mc device controls a set of DIMM memory modules. These modules are
+laid out in a Chip-Select Row (csrowX) and Channel table (chX). There can
+be multiple csrows and two channels.
+
+Memory controllers allow for several csrows, with 8 csrows being a typical value.
+Yet, the actual number of csrows depends on the electrical "loading"
+of a given motherboard, memory controller and DIMM characteristics.
+
+Dual channels allow for 128 bit data transfers to the CPU from memory.
+
+
+ Channel 0 Channel 1
+ ===================================
+ csrow0 | DIMM_A0 | DIMM_B0 |
+ csrow1 | DIMM_A0 | DIMM_B0 |
+ ===================================
+
+ ===================================
+ csrow2 | DIMM_A1 | DIMM_B1 |
+ csrow3 | DIMM_A1 | DIMM_B1 |
+ ===================================
+
+In the above example table there are 4 physical slots on the motherboard
+for memory DIMMs:
+
+ DIMM_A0
+ DIMM_B0
+ DIMM_A1
+ DIMM_B1
+
+Labels for these slots are usually silk screened on the motherboard. Slots
+labeled 'A' are channel 0 in this example. Slots labeled 'B'
+are channel 1. Notice that there are two csrows possible on a
+physical DIMM. These csrows are allocated their csrow assignment
+based on the slot into which the memory DIMM is placed. Thus, when 1 DIMM
+is placed in each Channel, the csrows cross both DIMMs.
+
+Memory DIMMs come single or dual "ranked". A rank is a populated csrow.
+Thus, 2 single ranked DIMMs, placed in slots DIMM_A0 and DIMM_B0 above
+will have 1 csrow, csrow0. csrow1 will be empty. On the other hand,
+when 2 dual ranked DIMMs are similarly placed, then both csrow0 and
+csrow1 will be populated. The pattern repeats itself for csrow2 and
+csrow3.
+
+The representation of the above is reflected in the directory tree
+in EDAC's sysfs interface. Starting in directory
+/sys/devices/system/edac/mc each memory controller will be represented
+by its own 'mcX' directory, where 'X' is the index of the MC.
+
+
+ ..../edac/mc/
+ |
+ |->mc0
+ |->mc1
+ |->mc2
+ ....
+
+Under each 'mcX' directory each csrow is again represented by a
+'csrowX' directory, where 'X' is the csrow index:
+
+
+ .../mc/mc0/
+ |
+ |->csrow0
+ |->csrow2
+ |->csrow3
+ ....
+
+Notice that there is no csrow1, which indicates that csrow0 is
+composed of single ranked DIMMs. This should also apply in both
+Channels, in order to have dual-channel mode be operational. Since
+both csrow2 and csrow3 are populated, this indicates a dual ranked
+set of DIMMs for channels 0 and 1.
+
+
+Within each of the 'mc','mcX' and 'csrowX' directories are several
+EDAC control and attribute files.
+
+
+============================================================================
+DIRECTORY 'mc'
+
+In directory 'mc' are EDAC system overall control and attribute files:
+
+
+Panic on UE control file:
+
+ 'panic_on_ue'
+
+ An uncorrectable error will cause a machine panic. This is usually
+ desirable. It is a bad idea to continue when an uncorrectable error
+ occurs - it is indeterminate what was uncorrected and the operating
+ system context might be so mangled that continuing will lead to further
+ corruption. If the kernel has MCE configured, then EDAC will never
+ notice the UE.
+
+ LOAD TIME: module/kernel parameter: panic_on_ue=[0|1]
+
+ RUN TIME: echo "1" >/sys/devices/system/edac/mc/panic_on_ue
+
+
+Log UE control file:
+
+ 'log_ue'
+
+ Generate kernel messages describing uncorrectable errors. These errors
+ are reported through the system message log system. UE statistics
+ will be accumulated even when UE logging is disabled.
+
+ LOAD TIME: module/kernel parameter: log_ue=[0|1]
+
+ RUN TIME: echo "1" >/sys/devices/system/edac/mc/log_ue
+
+
+Log CE control file:
+
+ 'log_ce'
+
+ Generate kernel messages describing correctable errors. These
+ errors are reported through the system message log system.
+ CE statistics will be accumulated even when CE logging is disabled.
+
+ LOAD TIME: module/kernel parameter: log_ce=[0|1]
+
+ RUN TIME: echo "1" >/sys/devices/system/edac/mc/log_ce
+
+
+Polling period control file:
+
+ 'poll_msec'
+
+ The time period, in milliseconds, for polling for error information.
+ Too small a value wastes resources. Too large a value might delay
+ necessary handling of errors and might lose valuable information for
+ locating the error. 1000 milliseconds (once each second) is about
+ right for most uses.
+
+ LOAD TIME: module/kernel parameter: poll_msec=<milliseconds>
+
+ RUN TIME: echo "1000" >/sys/devices/system/edac/mc/poll_msec
+
+
+Module Version read-only attribute file:
+
+ 'mc_version'
+
+ The EDAC CORE module's version and compile date are shown here to
+ indicate what EDAC is running.
+
+
+
+============================================================================
+'mcX' DIRECTORIES
+
+
+In 'mcX' directories are EDAC control and attribute files for
+this 'X' instance of the memory controllers:
+
+
+Counter reset control file:
+
+ 'reset_counters'
+
+ This write-only control file will zero all the statistical counters
+ for UE and CE errors. Zeroing the counters will also reset the timer
+ indicating how long since the last counter zero. This is useful
+ for computing errors/time. Since the counters are always reset at
+ driver initialization time, no module/kernel parameter is available.
+
+ RUN TIME: echo "anything" >/sys/devices/system/edac/mc/mc0/reset_counters
+
+ This resets the counters on memory controller 0
+
+
+Seconds since last counter reset control file:
+
+ 'seconds_since_reset'
+
+ This attribute file displays how many seconds have elapsed since the
+ last counter reset. This can be used with the error counters to
+ measure error rates.
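+
+ As a sketch of such a rate calculation, the helper below divides the
+ 'ce_count' attribute (described further down) by the elapsed time. The
+ function name edac_ce_per_hour is hypothetical, and the result is
+ truncated to whole errors per hour.
+

```shell
# Print correctable errors per hour for one memory controller.
# $1: mcX directory, for example /sys/devices/system/edac/mc/mc0
edac_ce_per_hour() {
    ce=$(cat "$1/ce_count") || return 1
    secs=$(cat "$1/seconds_since_reset") || return 1
    if [ "$secs" -eq 0 ]; then
        echo 0                    # counters were reset just now
    else
        echo $((ce * 3600 / secs))
    fi
}
```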
+
+
+
+DIMM capability attribute file:
+
+ 'edac_capability'
+
+ The EDAC (Error Detection and Correction) capabilities/modes of
+ the memory controller hardware.
+
+
+DIMM Current Capability attribute file:
+
+ 'edac_current_capability'
+
+ The EDAC capabilities available with the hardware
+ configuration. This may not be the same as "EDAC capability"
+ if the correct memory is not used. If a memory controller is
+ capable of EDAC, but DIMMs without check bits are in use, then
+ Parity, SECDED, S4ECD4ED capabilities will not be available
+ even though the memory controller might be capable of those
+ modes with the proper memory loaded.
+
+
+Memory Type supported on this controller attribute file:
+
+ 'supported_mem_type'
+
+ This attribute file displays the memory type, usually
+ buffered and unbuffered DIMMs.
+
+
+Memory Controller name attribute file:
+
+ 'mc_name'
+
+ This attribute file displays the type of memory controller
+ that is being utilized.
+
+
+Memory Controller Module name attribute file:
+
+ 'module_name'
+
+ This attribute file displays the memory controller module name,
+ version and date built. The name of the memory controller
+ hardware - some drivers work with multiple controllers and
+ this field shows which hardware is present.
+
+
+Total memory managed by this memory controller attribute file:
+
+ 'size_mb'
+
+ This attribute file displays the amount of memory, in megabytes,
+ that this instance of memory controller manages.
+
+
+Total Uncorrectable Errors count attribute file:
+
+ 'ue_count'
+
+ This attribute file displays the total count of uncorrectable
+ errors that have occurred on this memory controller. If panic_on_ue
+ is set this counter will not have a chance to increment,
+ since EDAC will panic the system.
+
+
+Total UE count that had no information attribute file:
+
+ 'ue_noinfo_count'
+
+ This attribute file displays the number of UEs that
+ have occurred with no information as to which DIMM
+ slot is having errors.
+
+
+Total Correctable Errors count attribute file:
+
+ 'ce_count'
+
+ This attribute file displays the total count of correctable
+ errors that have occurred on this memory controller. This
+ count is very important to examine. CEs provide early
+ indications that a DIMM is beginning to fail. This count
+ field should be monitored for non-zero values and such
+ information reported to the system administrator.
+
+
+Total CE count that had no information attribute file:
+
+ 'ce_noinfo_count'
+
+ This attribute file displays the number of CEs that
+ have occurred with no information as to which DIMM slot
+ is having errors. Memory is handicapped, but operational,
+ yet no information is available to indicate which slot
+ the failing memory is in. This count field should also
+ be monitored for non-zero values.
+
+Device Symlink:
+
+ 'device'
+
+ Symlink to the memory controller device
+
+
+
+============================================================================
+'csrowX' DIRECTORIES
+
+In the 'csrowX' directories are EDAC control and attribute files for
+this 'X' instance of csrow:
+
+
+Total Uncorrectable Errors count attribute file:
+
+ 'ue_count'
+
+ This attribute file displays the total count of uncorrectable
+ errors that have occurred on this csrow. If panic_on_ue is set
+ this counter will not have a chance to increment, since EDAC
+ will panic the system.
+
+
+Total Correctable Errors count attribute file:
+
+ 'ce_count'
+
+ This attribute file displays the total count of correctable
+ errors that have occurred on this csrow. This
+ count is very important to examine. CEs provide early
+ indications that a DIMM is beginning to fail. This count
+ field should be monitored for non-zero values and such
+ information reported to the system administrator.
+
+
+Total memory managed by this csrow attribute file:
+
+ 'size_mb'
+
+ This attribute file displays the amount of memory, in megabytes,
+ that this csrow contains.
+
+
+Memory Type attribute file:
+
+ 'mem_type'
+
+ This attribute file will display what type of memory is currently
+ on this csrow. Normally, either buffered or unbuffered memory.
+
+
+EDAC Mode of operation attribute file:
+
+ 'edac_mode'
+
+ This attribute file will display what type of Error detection
+ and correction is being utilized.
+
+
+Device type attribute file:
+
+ 'dev_type'
+
+ This attribute file will display what type of DIMM device is
+ being utilized. Example: x4
+
+
+Channel 0 CE Count attribute file:
+
+ 'ch0_ce_count'
+
+ This attribute file will display the count of CEs on this
+ DIMM located in channel 0.
+
+
+Channel 0 UE Count attribute file:
+
+ 'ch0_ue_count'
+
+ This attribute file will display the count of UEs on this
+ DIMM located in channel 0.
+
+
+Channel 0 DIMM Label control file:
+
+ 'ch0_dimm_label'
+
+ This control file allows this DIMM to have a label assigned
+ to it. With this label in the module, when errors occur
+ the output can provide the DIMM label in the system log.
+ This becomes vital for panic events to isolate the
+ cause of the UE event.
+
+ DIMM Labels must be assigned after booting, with information
+ that correctly identifies the physical slot with its
+ silk screen label. This information is currently very
+ motherboard specific and determination of this information
+ must occur in userland at this time.
+
+
+Channel 1 CE Count attribute file:
+
+ 'ch1_ce_count'
+
+ This attribute file will display the count of CEs on this
+ DIMM located in channel 1.
+
+
+Channel 1 UE Count attribute file:
+
+ 'ch1_ue_count'
+
+ This attribute file will display the count of UEs on this
+ DIMM located in channel 1.
+
+
+Channel 1 DIMM Label control file:
+
+ 'ch1_dimm_label'
+
+ This control file allows this DIMM to have a label assigned
+ to it. With this label in the module, when errors occur
+ the output can provide the DIMM label in the system log.
+ This becomes vital for panic events to isolate the
+ cause of the UE event.
+
+ DIMM Labels must be assigned after booting, with information
+ that correctly identifies the physical slot with its
+ silk screen label. This information is currently very
+ motherboard specific and determination of this information
+ must occur in userland at this time.
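
The per-channel counters described above can be totalled from userland. What follows is a minimal sketch, not part of EDAC: csrow_ce_total is a hypothetical helper name, and the csrow directory is passed in as an argument because its location is not stated in this section.

```shell
# Hypothetical helper: sum the per-channel CE counts (ch0_ce_count,
# ch1_ce_count, ...) found in one csrow directory.
# The directory is an argument; no sysfs path is assumed here.
csrow_ce_total() {
    dir=$1
    total=0
    for f in "$dir"/ch*_ce_count; do
        # Skip the unexpanded glob or unreadable files.
        [ -r "$f" ] || continue
        n=$(cat "$f")
        total=$((total + ${n:-0}))
    done
    echo "$total"
}
```

Called with the path of a csrow directory, it prints one number. UE counts are deliberately left out, since a UE usually calls for different handling than a CE.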
+
+
+============================================================================
+SYSTEM LOGGING
+
+If logging for UEs and CEs is enabled then system logs will have
+error notices indicating errors that have been detected:
+
+MC0: CE page 0x283, offset 0xce0, grain 8, syndrome 0x6ec3, row 0,
+channel 1 "DIMM_B1": amd76x_edac
+
+MC0: CE page 0x1e5, offset 0xfb0, grain 8, syndrome 0xb741, row 0,
+channel 1 "DIMM_B1": amd76x_edac
+
+
+The structure of the message is:
+ the memory controller (MC0)
+ Error type (CE)
+ memory page (0x283)
+ offset in the page (0xce0)
+ the byte granularity (grain 8)
+ or resolution of the error
+ the error syndrome (0x6ec3)
+ memory row (row 0)
+ memory channel (channel 1)
+ DIMM label, if set prior ("DIMM_B1")
+ and then an optional, driver-specific message that may
+ have additional information.
+
+Both UEs and CEs with no info will lack all but memory controller,
+error type, a notice of "no info" and then an optional,
+driver-specific error message.
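
The message structure above is regular enough to pick apart with sed. This is a sketch only: edac_field and edac_label are hypothetical helper names, and it assumes the whole message arrives on one log line (the examples above are wrapped for readability) and that field names do not recur in the driver-specific text.

```shell
# Hypothetical helper: print the value of one field of an EDAC log line.
# Usage: edac_field <field> <line>
# where <field> is page, offset, grain, syndrome, row or channel.
edac_field() {
    field=$1
    line=$2
    printf '%s\n' "$line" |
        sed -n "s/.*[ ,]$field \([0-9a-fx]*\).*/\1/p"
}

# Hypothetical helper: print the quoted DIMM label, if one was set.
edac_label() {
    printf '%s\n' "$1" | sed -n 's/.*"\([^"]*\)".*/\1/p'
}
```

Both helpers print nothing for a "no info" message, since none of the fields are present in that case.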
+
+
+
+============================================================================
+PCI Bus Parity Detection
+
+
+On Header Type 00 devices the primary status is looked at
+for any parity error regardless of whether Parity is enabled on the
+device. (The spec indicates parity is generated in some cases).
+On Header Type 01 bridges, the secondary status register is also
+looked at to see if a parity error occurred on the bus on the other side of
+the bridge.
+
+
+SYSFS CONFIGURATION
+
+Under /sys/devices/system/edac/pci are control and attribute files as follows:
+
+
+Enable/Disable PCI Parity checking control file:
+
+ 'check_pci_parity'
+
+
+ This control file enables or disables the PCI Bus Parity scanning
+ operation. Writing a 1 to this file enables the scanning. Writing
+ a 0 to this file disables the scanning.
+
+ Enable:
+ echo "1" >/sys/devices/system/edac/pci/check_pci_parity
+
+ Disable:
+ echo "0" >/sys/devices/system/edac/pci/check_pci_parity
+
+
+
+Panic on PCI PARITY Error:
+
+ 'panic_on_pci_parity'
+
+
+ This control file enables or disables panicking when a parity
+ error has been detected.
+
+
+ module/kernel parameter: panic_on_pci_parity=[0|1]
+
+ Enable:
+ echo "1" >/sys/devices/system/edac/pci/panic_on_pci_parity
+
+ Disable:
+ echo "0" >/sys/devices/system/edac/pci/panic_on_pci_parity
+
+
+Parity Count:
+
+ 'pci_parity_count'
+
+ This attribute file will display the number of parity errors that
+ have been detected.
+
+
+
+PCI Device Whitelist:
+
+ 'pci_parity_whitelist'
+
+ This control file allows for an explicit list of PCI devices to be
+ scanned for parity errors. Only devices found on this list will
+ be examined. The list is a line of hexadecimal VENDOR and DEVICE
+ ID tuples:
+
+ 1022:7450,1434:16a6
+
+ One or more can be inserted, separated by a comma.
+
+ To write the above list, enter the following as one command line:
+
+ echo "1022:7450,1434:16a6"
+ > /sys/devices/system/edac/pci/pci_parity_whitelist
+
+
+
+ To display what the whitelist is, simply 'cat' the same file.
+
+
+PCI Device Blacklist:
+
+ 'pci_parity_blacklist'
+
+ This control file allows for a list of PCI devices to be
+ skipped for scanning.
+ The list is a line of hexadecimal VENDOR and DEVICE ID tuples:
+
+ 1022:7450,1434:16a6
+
+ One or more can be inserted, separated by a comma.
+
+ To write the above list, enter the following as one command line:
+
+ echo "1022:7450,1434:16a6"
+ > /sys/devices/system/edac/pci/pci_parity_blacklist
+
+
+ To display what the blacklist currently contains,
+ simply 'cat' the same file.
+
+=======================================================================
+
+PCI Vendor and Devices IDs can be obtained with the lspci command. Using
+the -n option lspci will display the vendor and device IDs. The system
+administrator will have to determine which devices should be scanned or
+skipped.
+
+
+
+The two lists (white and black) are prioritized. blacklist is the lower
+priority and will NOT be utilized when a whitelist has been set.
+Turn OFF a whitelist by an empty echo command:
+
+ echo > /sys/devices/system/edac/pci/pci_parity_whitelist
+
+and any previous blacklist will be utilized.
+
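
Building the comma-separated list by hand is error prone, so a small helper may be useful. This is a sketch: edac_pci_list is a hypothetical name, and the check that every ID is exactly four hexadecimal digits on each side of the colon is an assumption drawn from the examples above, not a documented rule of the whitelist and blacklist files.

```shell
# Hypothetical helper: join VENDOR:DEVICE IDs into the comma-separated
# line used by pci_parity_whitelist and pci_parity_blacklist.
# Returns 1 (and prints nothing on stdout) if any ID is malformed.
edac_pci_list() {
    h='[0-9a-fA-F]'
    list=
    for id in "$@"; do
        case $id in
            $h$h$h$h:$h$h$h$h) ;;
            *) echo "bad VENDOR:DEVICE ID: $id" >&2; return 1 ;;
        esac
        # Add a comma only between entries.
        list=${list:+$list,}$id
    done
    printf '%s\n' "$list"
}
```

Its output can be redirected to either list file, for example: edac_pci_list 1022:7450 1434:16a6 > /sys/devices/system/edac/pci/pci_parity_whitelist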
diff --git a/Documentation/dvb/avermedia.txt b/Documentation/dvb/avermedia.txt
index 2dc260b2b0a..068070ff13c 100644
--- a/Documentation/dvb/avermedia.txt
+++ b/Documentation/dvb/avermedia.txt
@@ -150,7 +150,8 @@ Getting the card going
The frontend module sp887x.o, requires an external firmware.
Please use the command "get_dvb_firmware sp887x" to download
- it. Then copy it to /usr/lib/hotplug/firmware.
+ it. Then copy it to /usr/lib/hotplug/firmware or /lib/firmware/
+ (depending on configuration of firmware hotplug).
Receiving DVB-T in Australia
diff --git a/Documentation/dvb/get_dvb_firmware b/Documentation/dvb/get_dvb_firmware
index be6eb4c7599..75c28a17409 100644
--- a/Documentation/dvb/get_dvb_firmware
+++ b/Documentation/dvb/get_dvb_firmware
@@ -23,7 +23,7 @@ use IO::Handle;
@components = ( "sp8870", "sp887x", "tda10045", "tda10046", "av7110", "dec2000t",
"dec2540t", "dec3000s", "vp7041", "dibusb", "nxt2002", "nxt2004",
- "or51211", "or51132_qam", "or51132_vsb");
+ "or51211", "or51132_qam", "or51132_vsb", "bluebird");
# Check args
syntax() if (scalar(@ARGV) != 1);
@@ -34,7 +34,11 @@ for ($i=0; $i < scalar(@components); $i++) {
if ($cid eq $components[$i]) {
$outfile = eval($cid);
die $@ if $@;
- print STDERR "Firmware $outfile extracted successfully. Now copy it to either /lib/firmware or /usr/lib/hotplug/firmware/ (depending on your hotplug version).\n";
+ print STDERR <<EOF;
+Firmware $outfile extracted successfully.
+Now copy it to either /usr/lib/hotplug/firmware or /lib/firmware
+(depending on configuration of firmware hotplug).
+EOF
exit(0);
}
}
@@ -243,7 +247,7 @@ sub nxt2002 {
my $tmpdir = tempdir(DIR => "/tmp", CLEANUP => 1);
checkstandard();
-
+
wgetfile($sourcefile, $url);
unzip($sourcefile, $tmpdir);
verify("$tmpdir/SkyNETU.sys", $hash);
@@ -308,6 +312,19 @@ sub or51132_vsb {
$fwfile;
}
+sub bluebird {
+ my $url = "http://www.linuxtv.org/download/dvb/firmware/dvb-usb-bluebird-01.fw";
+ my $outfile = "dvb-usb-bluebird-01.fw";
+ my $hash = "658397cb9eba9101af9031302671f49d";
+
+ checkstandard();
+
+ wgetfile($outfile, $url);
+ verify($outfile,$hash);
+
+ $outfile;
+}
+
# ---------------------------------------------------------------
# Utilities
diff --git a/Documentation/dvb/ttusb-dec.txt b/Documentation/dvb/ttusb-dec.txt
index 5c1e984c26a..b2f271cd784 100644
--- a/Documentation/dvb/ttusb-dec.txt
+++ b/Documentation/dvb/ttusb-dec.txt
@@ -41,4 +41,5 @@ Hotplug Firmware Loading for 2.6 kernels
For 2.6 kernels the firmware is loaded at the point that the driver module is
loaded. See linux/Documentation/dvb/firmware.txt for more information.
-Copy the three files downloaded above into the /usr/lib/hotplug/firmware directory.
+Copy the three files downloaded above into the /usr/lib/hotplug/firmware or
+/lib/firmware directory (depending on configuration of firmware hotplug).
diff --git a/Documentation/fb/cyblafb/bugs b/Documentation/fb/cyblafb/bugs
index f90cc66ea91..9443a6d72cd 100644
--- a/Documentation/fb/cyblafb/bugs
+++ b/Documentation/fb/cyblafb/bugs
@@ -11,4 +11,3 @@ Untested features
All LCD stuff is untested. If it worked in tridentfb, it should work in
cyblafb. Please test and report the results to Knut_Petersen@t-online.de.
-
diff --git a/Documentation/fb/cyblafb/fb.modes b/Documentation/fb/cyblafb/fb.modes
index cf4351fc32f..fe0e5223ba8 100644
--- a/Documentation/fb/cyblafb/fb.modes
+++ b/Documentation/fb/cyblafb/fb.modes
@@ -14,142 +14,141 @@
#
mode "640x480-50"
- geometry 640 480 640 3756 8
+ geometry 640 480 2048 4096 8
timings 47619 4294967256 24 17 0 216 3
endmode
mode "640x480-60"
- geometry 640 480 640 3756 8
+ geometry 640 480 2048 4096 8
timings 39682 4294967256 24 17 0 216 3
endmode
mode "640x480-70"
- geometry 640 480 640 3756 8
+ geometry 640 480 2048 4096 8
timings 34013 4294967256 24 17 0 216 3
endmode
mode "640x480-72"
- geometry 640 480 640 3756 8
+ geometry 640 480 2048 4096 8
timings 33068 4294967256 24 17 0 216 3
endmode
mode "640x480-75"
- geometry 640 480 640 3756 8
+ geometry 640 480 2048 4096 8
timings 31746 4294967256 24 17 0 216 3
endmode
mode "640x480-80"
- geometry 640 480 640 3756 8
+ geometry 640 480 2048 4096 8
timings 29761 4294967256 24 17 0 216 3
endmode
mode "640x480-85"
- geometry 640 480 640 3756 8
+ geometry 640 480 2048 4096 8
timings 28011 4294967256 24 17 0 216 3
endmode
mode "800x600-50"
- geometry 800 600 800 3221 8
+ geometry 800 600 2048 4096 8
timings 30303 96 24 14 0 136 11
endmode
mode "800x600-60"
- geometry 800 600 800 3221 8
+ geometry 800 600 2048 4096 8
timings 25252 96 24 14 0 136 11
endmode
mode "800x600-70"
- geometry 800 600 800 3221 8
+ geometry 800 600 2048 4096 8
timings 21645 96 24 14 0 136 11
endmode
mode "800x600-72"
- geometry 800 600 800 3221 8
+ geometry 800 600 2048 4096 8
timings 21043 96 24 14 0 136 11
endmode
mode "800x600-75"
- geometry 800 600 800 3221 8
+ geometry 800 600 2048 4096 8
timings 20202 96 24 14 0 136 11
endmode
mode "800x600-80"
- geometry 800 600 800 3221 8
+ geometry 800 600 2048 4096 8
timings 18939 96 24 14 0 136 11
endmode
mode "800x600-85"
- geometry 800 600 800 3221 8
+ geometry 800 600 2048 4096 8
timings 17825 96 24 14 0 136 11
endmode
mode "1024x768-50"
- geometry 1024 768 1024 2815 8
+ geometry 1024 768 2048 4096 8
timings 19054 144 24 29 0 120 3
endmode
mode "1024x768-60"
- geometry 1024 768 1024 2815 8
+ geometry 1024 768 2048 4096 8
timings 15880 144 24 29 0 120 3
endmode
mode "1024x768-70"
- geometry 1024 768 1024 2815 8
+ geometry 1024 768 2048 4096 8
timings 13610 144 24 29 0 120 3
endmode
mode "1024x768-72"
- geometry 1024 768 1024 2815 8
+ geometry 1024 768 2048 4096 8
timings 13232 144 24 29 0 120 3
endmode
mode "1024x768-75"
- geometry 1024 768 1024 2815 8
+ geometry 1024 768 2048 4096 8
timings 12703 144 24 29 0 120 3
endmode
mode "1024x768-80"
- geometry 1024 768 1024 2815 8
+ geometry 1024 768 2048 4096 8
timings 11910 144 24 29 0 120 3
endmode
mode "1024x768-85"
- geometry 1024 768 1024 2815 8
+ geometry 1024 768 2048 4096 8
timings 11209 144 24 29 0 120 3
endmode
mode "1280x1024-50"
- geometry 1280 1024 1280 2662 8
+ geometry 1280 1024 2048 4096 8
timings 11114 232 16 39 0 160 3
endmode
mode "1280x1024-60"
- geometry 1280 1024 1280 2662 8
+ geometry 1280 1024 2048 4096 8
timings 9262 232 16 39 0 160 3
endmode
mode "1280x1024-70"
- geometry 1280 1024 1280 2662 8
+ geometry 1280 1024 2048 4096 8
timings 7939 232 16 39 0 160 3
endmode
mode "1280x1024-72"
- geometry 1280 1024 1280 2662 8
+ geometry 1280 1024 2048 4096 8
timings 7719 232 16 39 0 160 3
endmode
mode "1280x1024-75"
- geometry 1280 1024 1280 2662 8
+ geometry 1280 1024 2048 4096 8
timings 7410 232 16 39 0 160 3
endmode
mode "1280x1024-80"
- geometry 1280 1024 1280 2662 8
+ geometry 1280 1024 2048 4096 8
timings 6946 232 16 39 0 160 3
endmode
mode "1280x1024-85"
- geometry 1280 1024 1280 2662 8
+ geometry 1280 1024 2048 4096 8
timings 6538 232 16 39 0 160 3
endmode
-
diff --git a/Documentation/fb/cyblafb/performance b/Documentation/fb/cyblafb/performance
index eb4e47a9cea..8d15d5dfc6b 100644
--- a/Documentation/fb/cyblafb/performance
+++ b/Documentation/fb/cyblafb/performance
@@ -77,4 +77,3 @@ patch that speeds up kernel bitblitting a lot ( > 20%).
| | | | |
| | | | |
+-----------+-----------------+-----------------+-----------------+
-
diff --git a/Documentation/fb/cyblafb/todo b/Documentation/fb/cyblafb/todo
index 80fb2f89b6c..c5f6d0eae54 100644
--- a/Documentation/fb/cyblafb/todo
+++ b/Documentation/fb/cyblafb/todo
@@ -22,11 +22,10 @@ accelerated color blitting Who needs it? The console driver does use color
everything else is done using color expanding
blitting of 1bpp character bitmaps.
-xpanning Who needs it?
-
ioctls Who needs it?
-TV-out Will be done later
+TV-out Will be done later. Use "vga= " at boot time
+ to set a suitable video mode.
??? Feel free to contact me if you have any
feature requests
diff --git a/Documentation/fb/cyblafb/usage b/Documentation/fb/cyblafb/usage
index e627c8f5421..a39bb3d402a 100644
--- a/Documentation/fb/cyblafb/usage
+++ b/Documentation/fb/cyblafb/usage
@@ -40,6 +40,16 @@ Selecting Modes
None of the modes possible to select as startup modes are affected by
the problems described at the end of the next subsection.
+ For all startup modes cyblafb chooses a virtual x resolution of 2048;
+ the only exception is mode 1280x1024 in combination with 32 bpp. This
+ allows ywrap scrolling for all those modes if rotation is 0 or 2, and
+ also fast scrolling if rotation is 1 or 3. The default virtual y reso-
+ lution is 4096 for bpp == 8, 2048 for bpp==16 and 1024 for bpp == 32,
+ again with the only exception of 1280x1024 at 32 bpp.
+
+ Please do set your video memory size to 8 Mb in the Bios setup. Other
+ values will work, but performance is decreased for a lot of modes.
+
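
The default virtual resolutions listed above all describe the same amount of video memory, which is presumably why 8 Mb is the recommended setting. A sketch of the arithmetic, assuming a packed framebuffer with no per-line padding; fb_virtual_bytes is a hypothetical helper, not something cyblafb provides.

```shell
# Hypothetical helper: bytes needed for a virtual screen.
# Usage: fb_virtual_bytes <vxres> <vyres> <bpp>
fb_virtual_bytes() {
    echo $(( $1 * $2 * $3 / 8 ))
}
```

fb_virtual_bytes 2048 4096 8, fb_virtual_bytes 2048 2048 16 and fb_virtual_bytes 2048 1024 32 each print 8388608, which is 8 * 1024 * 1024 bytes.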
Mode changes using fbset
========================
@@ -54,20 +64,26 @@ Selecting Modes
- if a flat panel is found, cyblafb does not allow you
to program a resolution higher than the physical
resolution of the flat panel monitor
- - cyblafb does not allow xres to differ from xres_virtual
- cyblafb does not allow vclk to exceed 230 MHz. As 32 bpp
and (currently) 24 bit modes use a doubled vclk internally,
the dotclock limit as seen by fbset is 115 MHz for those
modes and 230 MHz for 8 and 16 bpp modes.
+ - cyblafb will allow you to select very high resolutions as
+ long as the hardware can be programmed to these modes. The
+ documented limit 1600x1200 is not enforced, but don't expect
+ perfect signal quality.
- Any request that violates the rules given above will be ignored and
- fbset will return an error.
+ Any request that violates the rules given above will be either changed
+ to something the hardware supports or an error value will be returned.
If you program a virtual y resolution higher than the hardware limit,
cyblafb will silently decrease that value to the highest possible
- value.
+ value. The same is true for a virtual x resolution that is not
+ supported by the hardware. Cyblafb tries to adapt vyres first because
+ vxres decides if ywrap scrolling is possible or not.
- Attempts to disable acceleration are ignored.
+ Attempts to disable acceleration are ignored; I believe that this is
+ safe.
Some video modes that should work do not work as expected. If you use
the standard fb.modes, fbset 640x480-60 will program that mode, but
@@ -129,10 +145,6 @@ mode 640x480 or 800x600 or 1024x768 or 1280x1024
verbosity 0 is the default, increase to at least 2 for every
bug report!
-vesafb allows cyblafb to be loaded after vesafb has been
- loaded. See sections "Module unloading ...".
-
-
Development hints
=================
@@ -195,7 +207,7 @@ a graphics mode.
After booting, load cyblafb without any mode and bpp parameter and assign
cyblafb to individual ttys using con2fb, e.g.:
- modprobe cyblafb vesafb=1
+ modprobe cyblafb
con2fb /dev/fb1 /dev/tty1
Unloading cyblafb works without problems after you assign vesafb to all
@@ -203,4 +215,3 @@ ttys again, e.g.:
con2fb /dev/fb0 /dev/tty1
rmmod cyblafb
-
diff --git a/Documentation/fb/cyblafb/whatsnew b/Documentation/fb/cyblafb/whatsnew
new file mode 100644
index 00000000000..76c07a26e04
--- /dev/null
+++ b/Documentation/fb/cyblafb/whatsnew
@@ -0,0 +1,29 @@
+0.62
+====
+
+ - the vesafb parameter has been removed as I decided to allow the
+ feature without any special parameter.
+
+ - Cyblafb does not use the vga style of panning any longer, now the
+ "right view" register in the graphics engine IO space is used. Without
+ that change it was impossible to use all available memory, and without
+ access to all available memory it is impossible to ywrap.
+
+ - The imageblit function now uses hardware acceleration for all font
+ widths. Hardware blitting across pixel column 2048 is broken in the
+ cyberblade/i1 graphics core, but we work around that hardware bug.
+
+ - modes with vxres != xres are supported now.
+
+ - ywrap scrolling is supported now and the default. This is a big
+ performance gain.
+
+ - default video modes use vyres > yres and vxres > xres to allow
+ almost optimal scrolling speed for normal and rotated screens
+
+ - some features mainly useful for debugging the upper layers of the
+ framebuffer system have been added, have a look at the code
+
+ - fixed: Oops after unloading cyblafb when reading /proc/io*
+
+ - we work around some bugs of the higher framebuffer layers.
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 9b743198f77..b4a1ea76269 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -47,17 +47,6 @@ Who: Paul E. McKenney <paulmck@us.ibm.com>
---------------------------
-What: IEEE1394 Audio and Music Data Transmission Protocol driver,
- Connection Management Procedures driver
-When: November 2005
-Files: drivers/ieee1394/{amdtp,cmp}*
-Why: These are incomplete, have never worked, and are better implemented
- in userland via raw1394 (see http://freebob.sourceforge.net/ for
- example.)
-Who: Jody McIntyre <scjody@steamballoon.com>
-
----------------------------
-
What: raw1394: requests of type RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN
When: November 2005
Why: Deprecated in favour of the new ioctl-based rawiso interface, which is
@@ -82,15 +71,6 @@ Who: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
---------------------------
-What: i2c sysfs name change: in1_ref, vid deprecated in favour of cpu0_vid
-When: November 2005
-Files: drivers/i2c/chips/adm1025.c, drivers/i2c/chips/adm1026.c
-Why: Match the other drivers' name for the same function, duplicate names
- will be available until removal of old names.
-Who: Grant Coady <gcoady@gmail.com>
-
----------------------------
-
What: remove EXPORT_SYMBOL(panic_timeout)
When: April 2006
Files: kernel/panic.c
@@ -143,6 +123,15 @@ Who: Christoph Hellwig <hch@lst.de>
---------------------------
+What: CONFIG_FORCED_INLINING
+When: June 2006
+Why: Config option is there to see if gcc is good enough (in January
+ 2006). If it is, the behavior should just be the default. If it's not,
+ the option should just go away entirely.
+Who: Arjan van de Ven
+
+---------------------------
+
What: START_ARRAY ioctl for md
When: July 2006
Files: drivers/md/md.c
diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
index bcfbab899b3..74052d22d86 100644
--- a/Documentation/filesystems/00-INDEX
+++ b/Documentation/filesystems/00-INDEX
@@ -12,14 +12,16 @@ cifs.txt
- description of the CIFS filesystem
coda.txt
- description of the CODA filesystem.
+configfs/
+ - directory containing configfs documentation and example code.
cramfs.txt
- info on the cram filesystem for small storage (ROMs etc)
devfs/
- directory containing devfs documentation.
+dlmfs.txt
+ - info on the userspace interface to the OCFS2 DLM.
ext2.txt
- info, mount options and specifications for the Ext2 filesystem.
-fat_cvf.txt
- - info on the Compressed Volume Files extension to the FAT filesystem
hpfs.txt
- info and mount options for the OS/2 HPFS.
isofs.txt
@@ -32,6 +34,8 @@ ntfs.txt
- info and mount options for the NTFS filesystem (Windows NT).
proc.txt
- info on Linux's /proc filesystem.
+ocfs2.txt
+ - info and mount options for the OCFS2 clustered filesystem.
romfs.txt
- Description of the ROMFS filesystem.
smbfs.txt
diff --git a/Documentation/filesystems/configfs/configfs.txt b/Documentation/filesystems/configfs/configfs.txt
new file mode 100644
index 00000000000..c4ff96b7c4e
--- /dev/null
+++ b/Documentation/filesystems/configfs/configfs.txt
@@ -0,0 +1,434 @@
+
+configfs - Userspace-driven kernel object configuration.
+
+Joel Becker <joel.becker@oracle.com>
+
+Updated: 31 March 2005
+
+Copyright (c) 2005 Oracle Corporation,
+ Joel Becker <joel.becker@oracle.com>
+
+
+[What is configfs?]
+
+configfs is a ram-based filesystem that provides the converse of
+sysfs's functionality. Where sysfs is a filesystem-based view of
+kernel objects, configfs is a filesystem-based manager of kernel
+objects, or config_items.
+
+With sysfs, an object is created in kernel (for example, when a device
+is discovered) and it is registered with sysfs. Its attributes then
+appear in sysfs, allowing userspace to read the attributes via
+readdir(3)/read(2). It may allow some attributes to be modified via
+write(2). The important point is that the object is created and
+destroyed in kernel, the kernel controls the lifecycle of the sysfs
+representation, and sysfs is merely a window on all this.
+
+A configfs config_item is created via an explicit userspace operation:
+mkdir(2). It is destroyed via rmdir(2). The attributes appear at
+mkdir(2) time, and can be read or modified via read(2) and write(2).
+As with sysfs, readdir(3) queries the list of items and/or attributes.
+symlink(2) can be used to group items together. Unlike sysfs, the
+lifetime of the representation is completely driven by userspace. The
+kernel modules backing the items must respond to this.
+
+Both sysfs and configfs can and should exist together on the same
+system. One is not a replacement for the other.
+
+[Using configfs]
+
+configfs can be compiled as a module or into the kernel. You can access
+it by doing
+
+ mount -t configfs none /config
+
+The configfs tree will be empty unless client modules are also loaded.
+These are modules that register their item types with configfs as
+subsystems. Once a client subsystem is loaded, it will appear as a
+subdirectory (or more than one) under /config. Like sysfs, the
+configfs tree is always there, whether mounted on /config or not.
+
+An item is created via mkdir(2). The item's attributes will also
+appear at this time. readdir(3) can determine what the attributes are,
+read(2) can query their default values, and write(2) can store new
+values. Like sysfs, attributes should be ASCII text files, preferably
+with only one value per file. The same efficiency caveats from sysfs
+apply. Don't mix more than one attribute in one attribute file.
+
+Like sysfs, configfs expects write(2) to store the entire buffer at
+once. When writing to configfs attributes, userspace processes should
+first read the entire file, modify the portions they wish to change, and
+then write the entire buffer back. Attribute files have a maximum size
+of one page (PAGE_SIZE, 4096 on i386).
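
The read, modify, write-back sequence above can be sketched in shell. This is an illustration under assumptions: attr_update is a hypothetical helper, the 4096 limit is the i386 PAGE_SIZE quoted above, and the sketch relies on the shell's printf handing a buffer this small to the kernel in one write(2), which is usual but not guaranteed.

```shell
# Hypothetical helper: rewrite an attribute file as a whole.
# Usage: attr_update <file> <sed expression>
attr_update() {
    file=$1
    expr=$2
    # Read the entire file and apply the change in memory.
    new=$(sed "$expr" "$file") || return 1
    # Refuse buffers larger than one page (+1 for the trailing newline).
    [ $(( ${#new} + 1 )) -le 4096 ] || return 1
    # Write the entire buffer back at once.
    printf '%s\n' "$new" > "$file"
}
```

For example, attr_update "$file" 's/old/new/' replaces one value while leaving the rest of the attribute text as it was read.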
+
+When an item needs to be destroyed, remove it with rmdir(2). An
+item cannot be destroyed if any other item has a link to it (via
+symlink(2)). Links can be removed via unlink(2).
+
+[Configuring FakeNBD: an Example]
+
+Imagine there's a Network Block Device (NBD) driver that allows you to
+access remote block devices. Call it FakeNBD. FakeNBD uses configfs
+for its configuration. Obviously, there will be a nice program that
+sysadmins use to configure FakeNBD, but somehow that program has to tell
+the driver about it. Here's where configfs comes in.
+
+When the FakeNBD driver is loaded, it registers itself with configfs.
+readdir(3) sees this just fine:
+
+ # ls /config
+ fakenbd
+
+A fakenbd connection can be created with mkdir(2). The name is
+arbitrary, but likely the tool will make some use of the name. Perhaps
+it is a uuid or a disk name:
+
+ # mkdir /config/fakenbd/disk1
+ # ls /config/fakenbd/disk1
+ target device rw
+
+The target attribute contains the IP address of the server FakeNBD will
+connect to. The device attribute is the device on the server.
+Predictably, the rw attribute determines whether the connection is
+read-only or read-write.
+
+ # echo 10.0.0.1 > /config/fakenbd/disk1/target
+ # echo /dev/sda1 > /config/fakenbd/disk1/device
+ # echo 1 > /config/fakenbd/disk1/rw
+
+That's it. That's all there is. Now the device is configured, via the
+shell no less.
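
The "nice program" mentioned above could be as small as a shell function. A sketch, assuming the imaginary FakeNBD layout from this example; fakenbd_setup is a hypothetical name, and the configfs mount point is passed in rather than hard-coded.

```shell
# Hypothetical helper: create and configure one FakeNBD connection.
# Usage: fakenbd_setup <configfs root> <name> <target> <device> <rw>
fakenbd_setup() {
    item=$1/fakenbd/$2
    # mkdir(2) creates the item; its attribute files appear with it.
    mkdir "$item" || return 1
    # Each attribute takes exactly one value.
    echo "$3" > "$item/target" &&
    echo "$4" > "$item/device" &&
    echo "$5" > "$item/rw"
}
```

With configfs mounted on /config, fakenbd_setup /config disk1 10.0.0.1 /dev/sda1 1 performs the same steps as the commands above and stops at the first one that fails.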
+
+[Coding With configfs]
+
+Every object in configfs is a config_item. A config_item reflects an
+object in the subsystem. It has attributes that match values on that
+object. configfs handles the filesystem representation of that object
+and its attributes, allowing the subsystem to ignore all but the
+basic show/store interaction.
+
+Items are created and destroyed inside a config_group. A group is a
+collection of items that share the same attributes and operations.
+Items are created by mkdir(2) and removed by rmdir(2), but configfs
+handles that. The group has a set of operations to perform these tasks.
+
+A subsystem is the top level of a client module. During initialization,
+the client module registers the subsystem with configfs, and the subsystem
+appears as a directory at the top of the configfs filesystem. A
+subsystem is also a config_group, and can do everything a config_group
+can.
+
+[struct config_item]
+
+ struct config_item {
+ char *ci_name;
+ char ci_namebuf[UOBJ_NAME_LEN];
+ struct kref ci_kref;
+ struct list_head ci_entry;
+ struct config_item *ci_parent;
+ struct config_group *ci_group;
+ struct config_item_type *ci_type;
+ struct dentry *ci_dentry;
+ };
+
+ void config_item_init(struct config_item *);
+ void config_item_init_type_name(struct config_item *,
+ const char *name,
+ struct config_item_type *type);
+ struct config_item *config_item_get(struct config_item *);
+ void config_item_put(struct config_item *);
+
+Generally, struct config_item is embedded in a container structure, a
+structure that actually represents what the subsystem is doing. The
+config_item portion of that structure is how the object interacts with
+configfs.
+
+Whether statically defined in a source file or created by a parent
+config_group, a config_item must have one of the _init() functions
+called on it. This initializes the reference count and sets up the
+appropriate fields.
+
+All users of a config_item should have a reference on it via
+config_item_get(), and drop the reference when they are done via
+config_item_put().
+
+By itself, a config_item cannot do much more than appear in configfs.
+Usually a subsystem wants the item to display and/or store attributes,
+among other things. For that, it needs a type.
+
+[struct config_item_type]
+
+ struct configfs_item_operations {
+ void (*release)(struct config_item *);
+ ssize_t (*show_attribute)(struct config_item *,
+ struct configfs_attribute *,
+ char *);
+ ssize_t (*store_attribute)(struct config_item *,
+ struct configfs_attribute *,
+ const char *, size_t);
+ int (*allow_link)(struct config_item *src,
+ struct config_item *target);
+ int (*drop_link)(struct config_item *src,
+ struct config_item *target);
+ };
+
+ struct config_item_type {
+ struct module *ct_owner;
+ struct configfs_item_operations *ct_item_ops;
+ struct configfs_group_operations *ct_group_ops;
+ struct configfs_attribute **ct_attrs;
+ };
+
+The most basic function of a config_item_type is to define what
+operations can be performed on a config_item. All items that have been
+allocated dynamically will need to provide the ct_item_ops->release()
+method. This method is called when the config_item's reference count
+reaches zero. Items that wish to display an attribute need to provide
+the ct_item_ops->show_attribute() method. Similarly, storing a new
+attribute value uses the store_attribute() method.
+
+[struct configfs_attribute]
+
+ struct configfs_attribute {
+ char *ca_name;
+ struct module *ca_owner;
+ mode_t ca_mode;
+ };
+
+When a config_item wants an attribute to appear as a file in the item's
+configfs directory, it must define a configfs_attribute describing it.
+It then adds the attribute to the NULL-terminated array
+config_item_type->ct_attrs. When the item appears in configfs, the
+attribute file will appear with the configfs_attribute->ca_name
+filename. configfs_attribute->ca_mode specifies the file permissions.
+
+If an attribute is readable and the config_item provides a
+ct_item_ops->show_attribute() method, that method will be called
+whenever userspace asks for a read(2) on the attribute. The converse
+will happen for write(2).
+
+[struct config_group]
+
+A config_item cannot live in a vacuum. The only way one can be created
+is via mkdir(2) on a config_group. This will trigger creation of a
+child item.
+
+ struct config_group {
+ struct config_item cg_item;
+ struct list_head cg_children;
+ struct configfs_subsystem *cg_subsys;
+ struct config_group **default_groups;
+ };
+
+ void config_group_init(struct config_group *group);
+ void config_group_init_type_name(struct config_group *group,
+ const char *name,
+ struct config_item_type *type);
+
+
+The config_group structure contains a config_item. Properly configuring
+that item means that a group can behave as an item in its own right.
+However, it can do more: it can create child items or groups. This is
+accomplished via the group operations specified on the group's
+config_item_type.
+
+ struct configfs_group_operations {
+ struct config_item *(*make_item)(struct config_group *group,
+ const char *name);
+ struct config_group *(*make_group)(struct config_group *group,
+ const char *name);
+ int (*commit_item)(struct config_item *item);
+ void (*drop_item)(struct config_group *group,
+ struct config_item *item);
+ };
+
+A group creates child items by providing the
+ct_group_ops->make_item() method. If provided, this method is called
+from mkdir(2) in the group's directory. The subsystem allocates a new
+config_item (or more likely, its container structure), initializes it,
+and returns it to configfs. Configfs will then populate the filesystem
+tree to reflect the new item.
+
+If the subsystem wants the child to be a group itself, the subsystem
+provides ct_group_ops->make_group(). Everything else behaves the same,
+using the group _init() functions on the group.
+
+Finally, when userspace calls rmdir(2) on the item or group,
+ct_group_ops->drop_item() is called. As a config_group is also a
+config_item, a separate drop_group() method is not necessary.
+The subsystem must config_item_put() the reference that was initialized
+upon item allocation. If a subsystem has no work to do, it may omit
+the ct_group_ops->drop_item() method, and configfs will call
+config_item_put() on the item on behalf of the subsystem.
+
+IMPORTANT: drop_item() is void, and as such cannot fail. When rmdir(2)
+is called, configfs WILL remove the item from the filesystem tree
+(assuming that it has no children to keep it busy). The subsystem is
+responsible for responding to this. If the subsystem has references to
+the item in other threads, the memory is safe. It may take some time
+for the item to actually disappear from the subsystem's usage. But it
+is gone from configfs.
+
+A config_group cannot be removed while it still has child items. This
+is implemented in the configfs rmdir(2) code. ->drop_item() will not be
+called, as the item has not been dropped. rmdir(2) will fail, as the
+directory is not empty.
+
+[struct configfs_subsystem]
+
+A subsystem must register itself, usually at module_init time. This
+tells configfs to make the subsystem appear in the file tree.
+
+ struct configfs_subsystem {
+ struct config_group su_group;
+ struct semaphore su_sem;
+ };
+
+ int configfs_register_subsystem(struct configfs_subsystem *subsys);
+ void configfs_unregister_subsystem(struct configfs_subsystem *subsys);
+
+ A subsystem consists of a toplevel config_group and a semaphore.
+The group is where child config_items are created. For a subsystem,
+this group is usually defined statically. Before calling
+configfs_register_subsystem(), the subsystem must have initialized the
+group via the usual group _init() functions, and it must also have
+initialized the semaphore.
+ When the register call returns, the subsystem is live, and it
+will be visible via configfs. At that point, mkdir(2) can be called and
+the subsystem must be ready for it.
+
+[An Example]
+
+The best example of these basic concepts is the simple_children
+subsystem/group and the simple_child item in configfs_example.c. It
+shows a trivial object displaying and storing an attribute, and a simple
+group creating and destroying these children.
+
+[Hierarchy Navigation and the Subsystem Semaphore]
+
+There is an extra bonus that configfs provides. The config_groups and
+config_items are arranged in a hierarchy due to the fact that they
+appear in a filesystem. A subsystem is NEVER to touch the filesystem
+parts, but the subsystem might be interested in this hierarchy. For
+this reason, the hierarchy is mirrored via the config_group->cg_children
+and config_item->ci_parent structure members.
+
+A subsystem can navigate the cg_children list and the ci_parent pointer
+to see the tree created by the subsystem. This can race with configfs'
+management of the hierarchy, so configfs uses the subsystem semaphore to
+protect modifications. Whenever a subsystem wants to navigate the
+hierarchy, it must do so under the protection of the subsystem
+semaphore.
+
+A subsystem will be prevented from acquiring the semaphore while a newly
+allocated item has not been linked into this hierarchy. Similarly, it
+will not be able to acquire the semaphore while a dropping item has not
+yet been unlinked. This means that an item's ci_parent pointer will
+never be NULL while the item is in configfs, and that an item will only
+be in its parent's cg_children list for the same duration. This allows
+a subsystem to trust ci_parent and cg_children while it holds the
+semaphore.
+
+[Item Aggregation Via symlink(2)]
+
+configfs provides a simple group via the group->item parent/child
+relationship. Often, however, a larger environment requires aggregation
+outside of the parent/child connection. This is implemented via
+symlink(2).
+
+A config_item may provide the ct_item_ops->allow_link() and
+ct_item_ops->drop_link() methods. If the ->allow_link() method exists,
+symlink(2) may be called with the config_item as the source of the link.
+These links are only allowed between configfs config_items. Any
+symlink(2) attempt outside the configfs filesystem will be denied.
+
+When symlink(2) is called, the source config_item's ->allow_link()
+method is called with itself and a target item. If the source item
+allows linking to target item, it returns 0. A source item may wish to
+reject a link if it only wants links to a certain type of object (say,
+in its own subsystem).
+
+When unlink(2) is called on the symbolic link, the source item is
+notified via the ->drop_link() method. Like the ->drop_item() method,
+this is a void function and cannot return failure. The subsystem is
+responsible for responding to the change.
+
+A config_item cannot be removed while it links to any other item, nor
+can it be removed while an item links to it. Dangling symlinks are not
+allowed in configfs.
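From userspace, the link handling described above comes down to plain
symlink(2) and unlink(2) calls. The following is a minimal sketch, not
part of the patch itself; the helper names link_item() and unlink_item()
are hypothetical, and the paths a caller passes in are assumed to live
under a mounted configfs tree.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Create a symbolic link at link_path that points at target_path.
 * On configfs, link_path sits inside the source item's directory and
 * target_path names the target item, so this is the call that ends up
 * invoking the source item's ->allow_link() method.
 * Returns 0 on success or a negative errno value on failure.
 */
static int link_item(const char *target_path, const char *link_path)
{
	if (symlink(target_path, link_path) != 0)
		return -errno;
	return 0;
}

/*
 * Remove the symbolic link again. On configfs this is the call that
 * notifies the source item through ->drop_link().
 * Returns 0 on success or a negative errno value on failure.
 */
static int unlink_item(const char *link_path)
{
	if (unlink(link_path) != 0)
		return -errno;
	return 0;
}
```

A caller would pass something like a target item path and a link name
inside the source item's directory (both hypothetical here), and treat a
negative return from link_item() as the source item, or configfs itself,
refusing the link.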
+
+[Automatically Created Subgroups]
+
+A new config_group may want to have two types of child config_items.
+While this could be codified by magic names in ->make_item(), it is much
+more explicit to have a method whereby userspace sees this divergence.
+
+Rather than have a group where some items behave differently than
+others, configfs provides a method whereby one or many subgroups are
+automatically created inside the parent at its creation. Thus,
+mkdir("parent") results in "parent", "parent/subgroup1", up through
+"parent/subgroupN". Items of type 1 can now be created in
+"parent/subgroup1", and items of type N can be created in
+"parent/subgroupN".
+
+These automatic subgroups, or default groups, do not preclude other
+children of the parent group. If ct_group_ops->make_group() exists,
+other child groups can be created on the parent group directly.
+
+A configfs subsystem specifies default groups by filling in the
+NULL-terminated array default_groups on the config_group structure.
+Each group in that array is populated in the configfs tree at the same
+time as the parent group. Similarly, they are removed at the same time
+as the parent. No extra notification is provided. When a ->drop_item()
+method call notifies the subsystem that the parent group is going away,
+the same applies to every default group child of that parent group.
+
+As a consequence of this, default_groups cannot be removed directly via
+rmdir(2). They also are not considered when rmdir(2) on the parent
+group is checking for children.
+
+[Committable Items]
+
+NOTE: Committable items are currently unimplemented.
+
+Some config_items cannot have a valid initial state. That is, no
+default values can be specified for the item's attributes such that the
+item can do its work. Userspace must configure one or more attributes,
+after which the subsystem can start whatever entity this item
+represents.
+
+Consider the FakeNBD device from above. Without a target address *and*
+a target device, the subsystem has no idea what block device to import.
+The simple example assumes that the subsystem merely waits until all the
+appropriate attributes are configured, and then connects. This will,
+indeed, work, but now every attribute store must check if the attributes
+are initialized. Every attribute store must fire off the connection if
+that condition is met.
+
+Far better would be an explicit action notifying the subsystem that the
+config_item is ready to go. More importantly, an explicit action allows
+the subsystem to provide feedback as to whether the attributes are
+initialized in a way that makes sense. configfs provides this as
+committable items.
+
+configfs still uses only normal filesystem operations. An item is
+committed via rename(2). The item is moved from a directory where it
+can be modified to a directory where it cannot.
+
+Any group that provides the ct_group_ops->commit_item() method has
+committable items. When this group appears in configfs, mkdir(2) will
+not work directly in the group. Instead, the group will have two
+subdirectories: "live" and "pending". The "live" directory does not
+support mkdir(2) or rmdir(2) either. It only allows rename(2). The
+"pending" directory does allow mkdir(2) and rmdir(2). An item is
+created in the "pending" directory. Its attributes can be modified at
+will. Userspace commits the item by renaming it into the "live"
+directory. At this point, the subsystem receives the ->commit_item()
+callback. If all required attributes are filled to satisfaction, the
+method returns zero and the item is moved to the "live" directory.
+
+As rmdir(2) does not work in the "live" directory, an item must be
+shut down, or "uncommitted". Again, this is done via rename(2), this
+time from the "live" directory back to the "pending" one. The subsystem
+is notified by the ct_group_ops->uncommit_object() method.
+
+
diff --git a/Documentation/filesystems/configfs/configfs_example.c b/Documentation/filesystems/configfs/configfs_example.c
new file mode 100644
index 00000000000..f3c6e4946f9
--- /dev/null
+++ b/Documentation/filesystems/configfs/configfs_example.c
@@ -0,0 +1,474 @@
+/*
+ * vim: noexpandtab ts=8 sts=0 sw=8:
+ *
+ * configfs_example.c - This file is a demonstration module containing
+ * a number of configfs subsystems.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 02111-1307, USA.
+ *
+ * Based on sysfs:
+ * sysfs is Copyright (C) 2001, 2002, 2003 Patrick Mochel
+ *
+ * configfs Copyright (C) 2005 Oracle. All rights reserved.
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+
+#include <linux/configfs.h>
+
+
+
+/*
+ * 01-childless
+ *
+ * This first example is a childless subsystem. It cannot create
+ * any config_items. It just has attributes.
+ *
+ * Note that we are enclosing the configfs_subsystem inside a container.
+ * This is not necessary if a subsystem has no attributes directly
+ * on the subsystem. See the next example, 02-simple-children, for
+ * such a subsystem.
+ */
+
+struct childless {
+ struct configfs_subsystem subsys;
+ int showme;
+ int storeme;
+};
+
+struct childless_attribute {
+ struct configfs_attribute attr;
+ ssize_t (*show)(struct childless *, char *);
+ ssize_t (*store)(struct childless *, const char *, size_t);
+};
+
+static inline struct childless *to_childless(struct config_item *item)
+{
+ return item ? container_of(to_configfs_subsystem(to_config_group(item)), struct childless, subsys) : NULL;
+}
+
+static ssize_t childless_showme_read(struct childless *childless,
+ char *page)
+{
+ ssize_t pos;
+
+ pos = sprintf(page, "%d\n", childless->showme);
+ childless->showme++;
+
+ return pos;
+}
+
+static ssize_t childless_storeme_read(struct childless *childless,
+ char *page)
+{
+ return sprintf(page, "%d\n", childless->storeme);
+}
+
+static ssize_t childless_storeme_write(struct childless *childless,
+ const char *page,
+ size_t count)
+{
+ unsigned long tmp;
+ char *p = (char *) page;
+
+ tmp = simple_strtoul(p, &p, 10);
+ if (!p || (*p && (*p != '\n')))
+ return -EINVAL;
+
+ if (tmp > INT_MAX)
+ return -ERANGE;
+
+ childless->storeme = tmp;
+
+ return count;
+}
+
+static ssize_t childless_description_read(struct childless *childless,
+ char *page)
+{
+ return sprintf(page,
+"[01-childless]\n"
+"\n"
+"The childless subsystem is the simplest possible subsystem in\n"
+"configfs. It does not support the creation of child config_items.\n"
+"It only has a few attributes. In fact, it isn't much different\n"
+"than a directory in /proc.\n");
+}
+
+static struct childless_attribute childless_attr_showme = {
+ .attr = { .ca_owner = THIS_MODULE, .ca_name = "showme", .ca_mode = S_IRUGO },
+ .show = childless_showme_read,
+};
+static struct childless_attribute childless_attr_storeme = {
+ .attr = { .ca_owner = THIS_MODULE, .ca_name = "storeme", .ca_mode = S_IRUGO | S_IWUSR },
+ .show = childless_storeme_read,
+ .store = childless_storeme_write,
+};
+static struct childless_attribute childless_attr_description = {
+ .attr = { .ca_owner = THIS_MODULE, .ca_name = "description", .ca_mode = S_IRUGO },
+ .show = childless_description_read,
+};
+
+static struct configfs_attribute *childless_attrs[] = {
+ &childless_attr_showme.attr,
+ &childless_attr_storeme.attr,
+ &childless_attr_description.attr,
+ NULL,
+};
+
+static ssize_t childless_attr_show(struct config_item *item,
+ struct configfs_attribute *attr,
+ char *page)
+{
+ struct childless *childless = to_childless(item);
+ struct childless_attribute *childless_attr =
+ container_of(attr, struct childless_attribute, attr);
+ ssize_t ret = 0;
+
+ if (childless_attr->show)
+ ret = childless_attr->show(childless, page);
+ return ret;
+}
+
+static ssize_t childless_attr_store(struct config_item *item,
+ struct configfs_attribute *attr,
+ const char *page, size_t count)
+{
+ struct childless *childless = to_childless(item);
+ struct childless_attribute *childless_attr =
+ container_of(attr, struct childless_attribute, attr);
+ ssize_t ret = -EINVAL;
+
+ if (childless_attr->store)
+ ret = childless_attr->store(childless, page, count);
+ return ret;
+}
+
+static struct configfs_item_operations childless_item_ops = {
+ .show_attribute = childless_attr_show,
+ .store_attribute = childless_attr_store,
+};
+
+static struct config_item_type childless_type = {
+ .ct_item_ops = &childless_item_ops,
+ .ct_attrs = childless_attrs,
+ .ct_owner = THIS_MODULE,
+};
+
+static struct childless childless_subsys = {
+ .subsys = {
+ .su_group = {
+ .cg_item = {
+ .ci_namebuf = "01-childless",
+ .ci_type = &childless_type,
+ },
+ },
+ },
+};
+
+
+/* ----------------------------------------------------------------- */
+
+/*
+ * 02-simple-children
+ *
+ * This example merely has a simple one-attribute child. Note that
+ * there is no extra attribute structure, as the child's attribute is
+ * known from the get-go. Also, there is no container for the
+ * subsystem, as it has no attributes of its own.
+ */
+
+struct simple_child {
+ struct config_item item;
+ int storeme;
+};
+
+static inline struct simple_child *to_simple_child(struct config_item *item)
+{
+ return item ? container_of(item, struct simple_child, item) : NULL;
+}
+
+static struct configfs_attribute simple_child_attr_storeme = {
+ .ca_owner = THIS_MODULE,
+ .ca_name = "storeme",
+ .ca_mode = S_IRUGO | S_IWUSR,
+};
+
+static struct configfs_attribute *simple_child_attrs[] = {
+ &simple_child_attr_storeme,
+ NULL,
+};
+
+static ssize_t simple_child_attr_show(struct config_item *item,
+ struct configfs_attribute *attr,
+ char *page)
+{
+ ssize_t count;
+ struct simple_child *simple_child = to_simple_child(item);
+
+ count = sprintf(page, "%d\n", simple_child->storeme);
+
+ return count;
+}
+
+static ssize_t simple_child_attr_store(struct config_item *item,
+ struct configfs_attribute *attr,
+ const char *page, size_t count)
+{
+ struct simple_child *simple_child = to_simple_child(item);
+ unsigned long tmp;
+ char *p = (char *) page;
+
+ tmp = simple_strtoul(p, &p, 10);
+ if (!p || (*p && (*p != '\n')))
+ return -EINVAL;
+
+ if (tmp > INT_MAX)
+ return -ERANGE;
+
+ simple_child->storeme = tmp;
+
+ return count;
+}
+
+static void simple_child_release(struct config_item *item)
+{
+ kfree(to_simple_child(item));
+}
+
+static struct configfs_item_operations simple_child_item_ops = {
+ .release = simple_child_release,
+ .show_attribute = simple_child_attr_show,
+ .store_attribute = simple_child_attr_store,
+};
+
+static struct config_item_type simple_child_type = {
+ .ct_item_ops = &simple_child_item_ops,
+ .ct_attrs = simple_child_attrs,
+ .ct_owner = THIS_MODULE,
+};
+
+
+static struct config_item *simple_children_make_item(struct config_group *group, const char *name)
+{
+ struct simple_child *simple_child;
+
+ simple_child = kmalloc(sizeof(struct simple_child), GFP_KERNEL);
+ if (!simple_child)
+ return NULL;
+
+ memset(simple_child, 0, sizeof(struct simple_child));
+
+ config_item_init_type_name(&simple_child->item, name,
+ &simple_child_type);
+
+ simple_child->storeme = 0;
+
+ return &simple_child->item;
+}
+
+static struct configfs_attribute simple_children_attr_description = {
+ .ca_owner = THIS_MODULE,
+ .ca_name = "description",
+ .ca_mode = S_IRUGO,
+};
+
+static struct configfs_attribute *simple_children_attrs[] = {
+ &simple_children_attr_description,
+ NULL,
+};
+
+static ssize_t simple_children_attr_show(struct config_item *item,
+ struct configfs_attribute *attr,
+ char *page)
+{
+ return sprintf(page,
+"[02-simple-children]\n"
+"\n"
+"This subsystem allows the creation of child config_items. These\n"
+"items have only one attribute that is readable and writeable.\n");
+}
+
+static struct configfs_item_operations simple_children_item_ops = {
+ .show_attribute = simple_children_attr_show,
+};
+
+/*
+ * Note that, since no extra work is required on ->drop_item(),
+ * no ->drop_item() is provided.
+ */
+static struct configfs_group_operations simple_children_group_ops = {
+ .make_item = simple_children_make_item,
+};
+
+static struct config_item_type simple_children_type = {
+ .ct_item_ops = &simple_children_item_ops,
+ .ct_group_ops = &simple_children_group_ops,
+ .ct_attrs = simple_children_attrs,
+};
+
+static struct configfs_subsystem simple_children_subsys = {
+ .su_group = {
+ .cg_item = {
+ .ci_namebuf = "02-simple-children",
+ .ci_type = &simple_children_type,
+ },
+ },
+};
+
+
+/* ----------------------------------------------------------------- */
+
+/*
+ * 03-group-children
+ *
+ * This example reuses the simple_children group from above. However,
+ * the simple_children group is not the subsystem itself, it is a
+ * child of the subsystem. Creation of a group in the subsystem creates
+ * a new simple_children group. That group can then have simple_child
+ * children of its own.
+ */
+
+struct simple_children {
+ struct config_group group;
+};
+
+static struct config_group *group_children_make_group(struct config_group *group, const char *name)
+{
+ struct simple_children *simple_children;
+
+ simple_children = kmalloc(sizeof(struct simple_children),
+ GFP_KERNEL);
+ if (!simple_children)
+ return NULL;
+
+ memset(simple_children, 0, sizeof(struct simple_children));
+
+ config_group_init_type_name(&simple_children->group, name,
+ &simple_children_type);
+
+ return &simple_children->group;
+}
+
+static struct configfs_attribute group_children_attr_description = {
+ .ca_owner = THIS_MODULE,
+ .ca_name = "description",
+ .ca_mode = S_IRUGO,
+};
+
+static struct configfs_attribute *group_children_attrs[] = {
+ &group_children_attr_description,
+ NULL,
+};
+
+static ssize_t group_children_attr_show(struct config_item *item,
+ struct configfs_attribute *attr,
+ char *page)
+{
+ return sprintf(page,
+"[03-group-children]\n"
+"\n"
+"This subsystem allows the creation of child config_groups. These\n"
+"groups are like the subsystem simple-children.\n");
+}
+
+static struct configfs_item_operations group_children_item_ops = {
+ .show_attribute = group_children_attr_show,
+};
+
+/*
+ * Note that, since no extra work is required on ->drop_item(),
+ * no ->drop_item() is provided.
+ */
+static struct configfs_group_operations group_children_group_ops = {
+ .make_group = group_children_make_group,
+};
+
+static struct config_item_type group_children_type = {
+ .ct_item_ops = &group_children_item_ops,
+ .ct_group_ops = &group_children_group_ops,
+ .ct_attrs = group_children_attrs,
+};
+
+static struct configfs_subsystem group_children_subsys = {
+ .su_group = {
+ .cg_item = {
+ .ci_namebuf = "03-group-children",
+ .ci_type = &group_children_type,
+ },
+ },
+};
+
+/* ----------------------------------------------------------------- */
+
+/*
+ * We're now done with our subsystem definitions.
+ * For convenience in this module, here's a list of them all. It
+ * allows the init function to easily register them. Most modules
+ * will only have one subsystem, and will only call register_subsystem
+ * on it directly.
+ */
+static struct configfs_subsystem *example_subsys[] = {
+ &childless_subsys.subsys,
+ &simple_children_subsys,
+ &group_children_subsys,
+ NULL,
+};
+
+static int __init configfs_example_init(void)
+{
+ int ret;
+ int i;
+ struct configfs_subsystem *subsys;
+
+ for (i = 0; example_subsys[i]; i++) {
+ subsys = example_subsys[i];
+
+ config_group_init(&subsys->su_group);
+ init_MUTEX(&subsys->su_sem);
+ ret = configfs_register_subsystem(subsys);
+ if (ret) {
+ printk(KERN_ERR "Error %d while registering subsystem %s\n",
+ ret,
+ subsys->su_group.cg_item.ci_namebuf);
+ goto out_unregister;
+ }
+ }
+
+ return 0;
+
+out_unregister:
+ for (i--; i >= 0; i--) {
+ configfs_unregister_subsystem(example_subsys[i]);
+ }
+
+ return ret;
+}
+
+static void __exit configfs_example_exit(void)
+{
+ int i;
+
+ for (i = 0; example_subsys[i]; i++) {
+ configfs_unregister_subsystem(example_subsys[i]);
+ }
+}
+
+module_init(configfs_example_init);
+module_exit(configfs_example_exit);
+MODULE_LICENSE("GPL");
diff --git a/Documentation/filesystems/dlmfs.txt b/Documentation/filesystems/dlmfs.txt
new file mode 100644
index 00000000000..9afab845a90
--- /dev/null
+++ b/Documentation/filesystems/dlmfs.txt
@@ -0,0 +1,130 @@
+dlmfs
+==================
+A minimal DLM userspace interface implemented via a virtual file
+system.
+
+dlmfs is built with OCFS2 as it requires most of its infrastructure.
+
+Project web page: http://oss.oracle.com/projects/ocfs2
+Tools web page: http://oss.oracle.com/projects/ocfs2-tools
+OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/
+
+All code copyright 2005 Oracle except when otherwise noted.
+
+CREDITS
+=======
+
+Some code taken from ramfs which is Copyright (C) 2000 Linus Torvalds
+and Transmeta Corp.
+
+Mark Fasheh <mark.fasheh@oracle.com>
+
+Caveats
+=======
+- Right now it only works with the OCFS2 DLM, though support for other
+ DLM implementations should not be a major issue.
+
+Mount options
+=============
+None
+
+Usage
+=====
+
+If you're just interested in OCFS2, then please see ocfs2.txt. The
+rest of this document will be geared towards those who want to use
+dlmfs for easy to set up and easy to use clustered locking in
+userspace.
+
+Setup
+=====
+
+dlmfs requires that the OCFS2 cluster infrastructure be in
+place. Please download ocfs2-tools from the above URL and configure a
+cluster.
+
+You'll want to start heartbeating on a volume which all the nodes in
+your lockspace can access. The easiest way to do this is via
+ocfs2_hb_ctl (distributed with ocfs2-tools). Right now it requires
+that an OCFS2 file system be in place so that it can automatically
+find its heartbeat area, though it will eventually support heartbeat
+against raw disks.
+
+Please see the ocfs2_hb_ctl and mkfs.ocfs2 manual pages distributed
+with ocfs2-tools.
+
+Once you're heartbeating, DLM lock 'domains' can be easily created /
+destroyed and locks within them accessed.
+
+Locking
+=======
+
+Users may access dlmfs via standard file system calls, or they can use
+'libo2dlm' (distributed with ocfs2-tools) which abstracts the file
+system calls and presents a more traditional locking API.
+
+dlmfs handles lock caching automatically for the user, so a lock
+request for an already acquired lock will not generate another DLM
+call. Userspace programs are assumed to handle their own local
+locking.
+
+Two levels of locks are supported - Shared Read, and Exclusive.
+Also supported is a Trylock operation.
+
+For information on the libo2dlm interface, please see o2dlm.h,
+distributed with ocfs2-tools.
+
+Lock value blocks can be read and written to a resource via read(2)
+and write(2) against the fd obtained via your open(2) call. The
+maximum currently supported LVB length is 64 bytes (though that is an
+OCFS2 DLM limitation). Through this mechanism, users of dlmfs can share
+small amounts of data amongst their nodes.
+
+mkdir(2) signals dlmfs to join a domain (which will have the same name
+as the resulting directory).
+
+rmdir(2) signals dlmfs to leave the domain.
+
+Locks for a given domain are represented by regular inodes inside the
+domain directory. Locking against them is done via the open(2) system
+call.
+
+The open(2) call will not return until your lock has been granted or
+an error has occurred, unless it has been instructed to do a trylock
+operation. If the lock succeeds, you'll get an fd.
+
+Call open(2) with O_CREAT to ensure the resource inode is created - dlmfs
+does not automatically create inodes for existing lock resources.
+
+Open Flag Lock Request Type
+--------- -----------------
+O_RDONLY Shared Read
+O_RDWR Exclusive
+
+Open Flag Resulting Locking Behavior
+--------- --------------------------
+O_NONBLOCK Trylock operation
+
+You must provide exactly one of O_RDONLY or O_RDWR.
+
+If O_NONBLOCK is also provided and the trylock operation was valid but
+could not lock the resource then open(2) will fail with ETXTBSY.
+
+close(2) drops the lock associated with your fd.
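The flag tables above can be turned into a small userspace wrapper. This
is a minimal sketch rather than the libo2dlm interface; the names
dlmfs_level, dlmfs_open_flags(), dlmfs_lock() and dlmfs_unlock() are
hypothetical, and the path handed to dlmfs_lock() is assumed to name a
lock inside a domain directory on a mounted dlmfs.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* The two lock levels described in this document. */
enum dlmfs_level { DLMFS_SHARED_READ, DLMFS_EXCLUSIVE };

/*
 * Translate a lock request into open(2) flags: O_RDONLY for Shared
 * Read, O_RDWR for Exclusive, O_NONBLOCK for a trylock. O_CREAT is
 * always set so that the resource inode exists locally.
 */
static int dlmfs_open_flags(enum dlmfs_level level, int trylock)
{
	int flags = O_CREAT;

	flags |= (level == DLMFS_EXCLUSIVE) ? O_RDWR : O_RDONLY;
	if (trylock)
		flags |= O_NONBLOCK;
	return flags;
}

/*
 * Take a lock by opening the lock file. Returns the fd on success or
 * a negative errno value; per the text above, a trylock that could not
 * be granted is expected to show up as -ETXTBSY.
 */
static int dlmfs_lock(const char *path, enum dlmfs_level level, int trylock)
{
	int fd = open(path, dlmfs_open_flags(level, trylock), 0600);

	return fd < 0 ? -errno : fd;
}

/* Drop the lock associated with fd by closing it. */
static int dlmfs_unlock(int fd)
{
	return close(fd) != 0 ? -errno : 0;
}
```

The mode 0600 is only an illustrative choice; as noted below, modes are
honoured on the local node only.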
+
+Modes passed to mkdir(2) or open(2) are adhered to locally. Chown is
+supported locally as well. This means you can use them to restrict
+access to the resources via dlmfs on your local node only.
+
+The resource LVB may be read from the fd in either Shared Read or
+Exclusive modes via the read(2) system call. It can be written via
+write(2) only when open in Exclusive mode.
+
+Once written, an LVB will be visible to other nodes that obtain Shared
+Read or higher level locks on the resource.
+
+See Also
+========
+http://opendlm.sourceforge.net/cvsmirror/opendlm/docs/dlmbook_final.pdf
+
+For more information on the VMS distributed locking API.
diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt
index 9840d5b8d5b..afb1335c05d 100644
--- a/Documentation/filesystems/ext3.txt
+++ b/Documentation/filesystems/ext3.txt
@@ -2,11 +2,11 @@
Ext3 Filesystem
===============
-ext3 was originally released in September 1999. Written by Stephen Tweedie
-for 2.2 branch, and ported to 2.4 kernels by Peter Braam, Andreas Dilger,
+Ext3 was originally released in September 1999. Written by Stephen Tweedie
+for the 2.2 branch, and ported to 2.4 kernels by Peter Braam, Andreas Dilger,
Andrew Morton, Alexander Viro, Ted Ts'o and Stephen Tweedie.
-ext3 is ext2 filesystem enhanced with journalling capabilities.
+Ext3 is the ext2 filesystem enhanced with journalling capabilities.
Options
=======
@@ -14,76 +14,81 @@ Options
When mounting an ext3 filesystem, the following option are accepted:
(*) == default
-jounal=update Update the ext3 file system's journal to the
- current format.
+journal=update Update the ext3 file system's journal to the current
+ format.
-journal=inum When a journal already exists, this option is
- ignored. Otherwise, it specifies the number of
- the inode which will represent the ext3 file
- system's journal file.
+journal=inum When a journal already exists, this option is ignored.
+ Otherwise, it specifies the number of the inode which
+ will represent the ext3 file system's journal file.
+
+journal_dev=devnum When the external journal device's major/minor numbers
+ have changed, this option allows the user to specify
+ the new journal location. The journal device is
+ identified through its new major/minor numbers encoded
+ in devnum.
noload Don't load the journal on mounting.
-data=journal All data are committed into the journal prior
- to being written into the main file system.
+data=journal All data are committed into the journal prior to being
+ written into the main file system.
data=ordered (*) All data are forced directly out to the main file
- system prior to its metadata being committed to
- the journal.
+ system prior to its metadata being committed to the
+ journal.
-data=writeback Data ordering is not preserved, data may be
- written into the main file system after its
- metadata has been committed to the journal.
+data=writeback Data ordering is not preserved, data may be written
+ into the main file system after its metadata has been
+ committed to the journal.
commit=nrsec (*) Ext3 can be told to sync all its data and metadata
every 'nrsec' seconds. The default value is 5 seconds.
- This means that if you lose your power, you will lose,
- as much, the latest 5 seconds of work (your filesystem
- will not be damaged though, thanks to journaling). This
- default value (or any low value) will hurt performance,
- but it's good for data-safety. Setting it to 0 will
- have the same effect than leaving the default 5 sec.
+ This means that if you lose your power, you will lose
+ as much as the latest 5 seconds of work (your
+ filesystem will not be damaged though, thanks to the
+ journaling). This default value (or any low value)
+ will hurt performance, but it's good for data-safety.
+ Setting it to 0 will have the same effect as leaving
+ it at the default (5 seconds).
Setting it to very large values will improve
performance.
-barrier=1 This enables/disables barriers. barrier=0 disables it,
- barrier=1 enables it.
+barrier=1 This enables/disables barriers. barrier=0 disables
+ it, barrier=1 enables it.
-orlov (*) This enables the new Orlov block allocator. It's enabled
- by default.
+orlov (*) This enables the new Orlov block allocator. It is
+ enabled by default.
-oldalloc This disables the Orlov block allocator and enables the
- old block allocator. Orlov should have better performance,
- we'd like to get some feedback if it's the contrary for
- you.
+oldalloc This disables the Orlov block allocator and enables
+ the old block allocator. Orlov should have better
+ performance - we'd like to get some feedback if it's
+ the contrary for you.
-user_xattr Enables Extended User Attributes. Additionally, you need
- to have extended attribute support enabled in the kernel
- configuration (CONFIG_EXT3_FS_XATTR). See the attr(5)
- manual page and http://acl.bestbits.at to learn more
- about extended attributes.
+user_xattr Enables Extended User Attributes. Additionally, you
+ need to have extended attribute support enabled in the
+ kernel configuration (CONFIG_EXT3_FS_XATTR). See the
+ attr(5) manual page and http://acl.bestbits.at/ to
+ learn more about extended attributes.
nouser_xattr Disables Extended User Attributes.
-acl Enables POSIX Access Control Lists support. Additionally,
- you need to have ACL support enabled in the kernel
- configuration (CONFIG_EXT3_FS_POSIX_ACL). See the acl(5)
- manual page and http://acl.bestbits.at for more
- information.
+acl Enables POSIX Access Control Lists support.
+ Additionally, you need to have ACL support enabled in
+ the kernel configuration (CONFIG_EXT3_FS_POSIX_ACL).
+ See the acl(5) manual page and http://acl.bestbits.at/
+ for more information.
-noacl This option disables POSIX Access Control List support.
+noacl This option disables POSIX Access Control List
+ support.
reservation
noreservation
-resize=
-
bsddf (*) Make 'df' act like BSD.
minixdf Make 'df' act like Minix.
check=none Don't do extra checking of bitmaps on mount.
-nocheck
+nocheck
debug Extra debugging information is sent to syslog.
@@ -92,7 +97,7 @@ errors=continue Keep going on a filesystem error.
errors=panic Panic and halt the machine if an error occurs.
grpid Give objects the same group ID as their creator.
-bsdgroups
+bsdgroups
nogrpid (*) New objects have the group ID of their creator.
sysvgroups
@@ -103,81 +108,83 @@ resuid=n The user ID which may use the reserved blocks.
sb=n Use alternate superblock at this location.
-quota Quota options are currently silently ignored.
-noquota (see fs/ext3/super.c, line 594)
+quota
+noquota
grpquota
usrquota
Specification
=============
-ext3 shares all disk implementation with ext2 filesystem, and add
-transactions capabilities to ext2. Journaling is done by the
-Journaling block device layer.
+Ext3 shares all disk implementation with the ext2 filesystem, and adds
+transaction capabilities to ext2. Journaling is done by the Journaling Block
+Device layer.
Journaling Block Device layer
-----------------------------
-The Journaling Block Device layer (JBD) isn't ext3 specific. It was
-design to add journaling capabilities on a block device. The ext3
-filesystem code will inform the JBD of modifications it is performing
-(Call a transaction). the journal support the transactions start and
-stop, and in case of crash, the journal can replayed the transactions
-to put the partition on a consistent state fastly.
+The Journaling Block Device layer (JBD) isn't ext3 specific. It was designed
+to add journaling capabilities to a block device. The ext3 filesystem code
+will inform the JBD of modifications it is performing (called a transaction).
+The journal supports the transactions start and stop, and in case of a crash,
+the journal can replay the transactions to quickly put the partition back
+in a consistent state.
-handles represent a single atomic update to a filesystem. JBD can
-handle external journal on a block device.
+Handles represent a single atomic update to a filesystem. JBD can handle an
+external journal on a block device.
Data Mode
---------
-There's 3 different data modes:
+There are 3 different data modes:
* writeback mode
-In data=writeback mode, ext3 does not journal data at all. This mode
-provides a similar level of journaling as XFS, JFS, and ReiserFS in its
-default mode - metadata journaling. A crash+recovery can cause
-incorrect data to appear in files which were written shortly before the
-crash. This mode will typically provide the best ext3 performance.
+In data=writeback mode, ext3 does not journal data at all. This mode provides
+a similar level of journaling as that of XFS, JFS, and ReiserFS in its default
+mode - metadata journaling. A crash+recovery can cause incorrect data to
+appear in files which were written shortly before the crash. This mode will
+typically provide the best ext3 performance.
* ordered mode
-In data=ordered mode, ext3 only officially journals metadata, but it
-logically groups metadata and data blocks into a single unit called a
-transaction. When it's time to write the new metadata out to disk, the
-associated data blocks are written first. In general, this mode
-perform slightly slower than writeback but significantly faster than
-journal mode.
+In data=ordered mode, ext3 only officially journals metadata, but it logically
+groups metadata and data blocks into a single unit called a transaction. When
+it's time to write the new metadata out to disk, the associated data blocks
+are written first. In general, this mode performs slightly slower than
+writeback but significantly faster than journal mode.
* journal mode
-data=journal mode provides full data and metadata journaling. All new
-data is written to the journal first, and then to its final location.
-In the event of a crash, the journal can be replayed, bringing both
-data and metadata into a consistent state. This mode is the slowest
-except when data needs to be read from and written to disk at the same
-time where it outperform all others mode.
+data=journal mode provides full data and metadata journaling. All new data is
+written to the journal first, and then to its final location.
+In the event of a crash, the journal can be replayed, bringing both data and
+metadata into a consistent state. This mode is the slowest except when data
+needs to be read from and written to disk at the same time, where it
+outperforms all other modes.
Compatibility
-------------
Ext2 partitions can be easily converted to ext3, with `tune2fs -j <dev>`.
-Ext3 is fully compatible with Ext2. Ext3 partitions can easily be
-mounted as Ext2.
+Ext3 is fully compatible with Ext2. Ext3 partitions can easily be mounted as
+Ext2.
+
External Tools
==============
-see manual pages to know more.
+See manual pages to learn more.
+
+tune2fs: create an ext3 journal on an ext2 partition with the -j flag.
+mke2fs: create an ext3 partition with the -j flag.
+debugfs: ext2 and ext3 file system debugger.
+ext2online: online (mounted) ext2 and ext3 filesystem resizer.
-tune2fs: create a ext3 journal on a ext2 partition with the -j flags
-mke2fs: create a ext3 partition with the -j flags
-debugfs: ext2 and ext3 file system debugger
References
==========
-kernel source: file:/usr/src/linux/fs/ext3
- file:/usr/src/linux/fs/jbd
+kernel source: <file:fs/ext3/>
+ <file:fs/jbd/>
-programs: http://e2fsprogs.sourceforge.net
+programs: http://e2fsprogs.sourceforge.net/
+ http://ext2resize.sourceforge.net
-useful link:
- http://www.zip.com.au/~akpm/linux/ext3/ext3-usage.html
+useful links: http://www.zip.com.au/~akpm/linux/ext3/ext3-usage.html
http://www-106.ibm.com/developerworks/linux/library/l-fs7/
http://www-106.ibm.com/developerworks/linux/library/l-fs8/
diff --git a/Documentation/filesystems/fuse.txt b/Documentation/filesystems/fuse.txt
index 6b5741e651a..33f74310d16 100644
--- a/Documentation/filesystems/fuse.txt
+++ b/Documentation/filesystems/fuse.txt
@@ -86,6 +86,62 @@ Mount options
The default is infinite. Note that the size of read requests is
limited anyway to 32 pages (which is 128kbyte on i386).
+Sysfs
+~~~~~
+
+FUSE sets up the following hierarchy in sysfs:
+
+ /sys/fs/fuse/connections/N/
+
+where N is an increasing number allocated to each new connection.
+
+For each connection the following attributes are defined:
+
+ 'waiting'
+
+ The number of requests which are waiting to be transferred to
+ userspace or being processed by the filesystem daemon. If there is
+ no filesystem activity and 'waiting' is non-zero, then the
+ filesystem is hung or deadlocked.
+
+ 'abort'
+
+ Writing anything into this file will abort the filesystem
+ connection. This means that all waiting requests will be aborted and an
+ error returned for all aborted and new requests.
+
+Only a privileged user may read or write these attributes.
+
+Aborting a filesystem connection
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It is possible to get into certain situations where the filesystem is
+not responding. Reasons for this may be:
+
+ a) Broken userspace filesystem implementation
+
+ b) Network connection down
+
+ c) Accidental deadlock
+
+ d) Malicious deadlock
+
+(For more on c) and d) see later sections)
+
+In any of these cases it may be useful to abort the connection to
+the filesystem. There are several ways to do this:
+
+ - Kill the filesystem daemon. Works in case of a) and b)
+
+ - Kill the filesystem daemon and all users of the filesystem. Works
+ in all cases except some malicious deadlocks
+
+ - Use forced umount (umount -f). Works in all cases but only if
+ filesystem is still attached (it hasn't been lazy unmounted)
+
+ - Abort filesystem through the sysfs interface. Most powerful
+ method, always works.
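
As a rough illustration of the sysfs interface described above, the following
Python sketch reads 'waiting' and writes to 'abort' for one connection
directory. The helper names are hypothetical, and the sketch assumes that
'waiting' holds a single decimal number; the text only says that it is a
request count.

```python
import os

# Directory layout taken from the text above; N is the connection number.
FUSE_CONNECTIONS = "/sys/fs/fuse/connections"


def read_waiting(conn_dir):
    """Return the request count stored in <conn_dir>/waiting.

    Assumption: the attribute is a single decimal integer.
    """
    with open(os.path.join(conn_dir, "waiting")) as f:
        return int(f.read().strip())


def abort_connection(conn_dir):
    """Abort the connection by writing to <conn_dir>/abort.

    The text says writing anything aborts; "1" is an arbitrary choice.
    """
    with open(os.path.join(conn_dir, "abort"), "w") as f:
        f.write("1\n")
```

A privileged user could then call, for example,
abort_connection(os.path.join(FUSE_CONNECTIONS, "3")) for connection 3.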
+
How do non-privileged mounts work?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -313,3 +369,10 @@ faulted with get_user_pages(). The 'req->locked' flag indicates
when the copy is taking place, and interruption is delayed until
this flag is unset.
+Scenario 3 - Tricky deadlock with asynchronous read
+---------------------------------------------------
+
+The same situation as above, except thread-1 will wait on page lock
+and hence it will be uninterruptible as well. The solution is to
+abort the connection with forced umount (if mount is attached) or
+through the abort attribute in sysfs.
diff --git a/Documentation/filesystems/ocfs2.txt b/Documentation/filesystems/ocfs2.txt
new file mode 100644
index 00000000000..f2595caf052
--- /dev/null
+++ b/Documentation/filesystems/ocfs2.txt
@@ -0,0 +1,55 @@
+OCFS2 filesystem
+==================
+OCFS2 is a general purpose extent based shared disk cluster file
+system with many similarities to ext3. It supports 64 bit inode
+numbers, and has automatically extending metadata groups which may
+also make it attractive for non-clustered use.
+
+You'll want to install the ocfs2-tools package in order to at least
+get "mount.ocfs2" and "ocfs2_hb_ctl".
+
+Project web page: http://oss.oracle.com/projects/ocfs2
+Tools web page: http://oss.oracle.com/projects/ocfs2-tools
+OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/
+
+All code copyright 2005 Oracle except when otherwise noted.
+
+CREDITS:
+Lots of code taken from ext3 and other projects.
+
+Authors in alphabetical order:
+Joel Becker <joel.becker@oracle.com>
+Zach Brown <zach.brown@oracle.com>
+Mark Fasheh <mark.fasheh@oracle.com>
+Kurt Hackel <kurt.hackel@oracle.com>
+Sunil Mushran <sunil.mushran@oracle.com>
+Manish Singh <manish.singh@oracle.com>
+
+Caveats
+=======
+Features which OCFS2 does not support yet:
+ - sparse files
+ - extended attributes
+ - shared writeable mmap
+ - loopback is supported, but data written will not
+ be cluster coherent.
+ - quotas
+ - cluster aware flock
+ - Directory change notification (F_NOTIFY)
+ - Distributed Caching (F_SETLEASE/F_GETLEASE/break_lease)
+ - POSIX ACLs
+ - readpages / writepages (not user visible)
+
+Mount options
+=============
+
+OCFS2 supports the following mount options:
+(*) == default
+
+barrier=1 This enables/disables barriers. barrier=0 disables them,
+ barrier=1 enables them.
+errors=remount-ro(*) Remount the filesystem read-only on an error.
+errors=panic Panic and halt the machine if an error occurs.
+intr (*) Allow signals to interrupt cluster operations.
+nointr Do not allow signals to interrupt cluster
+ operations.
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index d4773565ea2..944cf109a6f 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -418,7 +418,7 @@ VmallocChunk: 111088 kB
Dirty: Memory which is waiting to get written back to the disk
Writeback: Memory which is actively being written back to the disk
Mapped: files which have been mmaped, such as libraries
- Slab: in-kernel data structures cache
+ Slab: in-kernel data structures cache
CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'),
this is the total amount of memory currently available to
be allocated on the system. This limit is only adhered to
@@ -1302,6 +1302,23 @@ VM has token based thrashing control mechanism and uses the token to prevent
unnecessary page faults in thrashing situation. The unit of the value is
second. The value would be useful to tune thrashing behavior.
+drop_caches
+-----------
+
+Writing to this will cause the kernel to drop clean caches, dentries and
+inodes from memory, causing that memory to become free.
+
+To free pagecache:
+ echo 1 > /proc/sys/vm/drop_caches
+To free dentries and inodes:
+ echo 2 > /proc/sys/vm/drop_caches
+To free pagecache, dentries and inodes:
+ echo 3 > /proc/sys/vm/drop_caches
+
+As this is a non-destructive operation and dirty objects are not freeable, the
+user should run `sync' first.
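
The same steps can be scripted. The sketch below is only an illustration: the
function and constant names are made up here, and only the file path and the
values 1, 2 and 3 come from the text above. It calls sync first, as the text
recommends, and must run as root to write the real file.

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"

# Values documented above.
PAGECACHE = 1
DENTRIES_AND_INODES = 2
ALL = 3


def drop_caches(what, path=DROP_CACHES, sync_first=True):
    """Write one of the documented values to drop_caches."""
    if what not in (PAGECACHE, DENTRIES_AND_INODES, ALL):
        raise ValueError("expected 1, 2 or 3, got %r" % (what,))
    if sync_first:
        # Dirty objects are not freeable, so write them back first.
        os.sync()
    with open(path, "w") as f:
        f.write("%d\n" % what)
```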
+
+
2.5 /proc/sys/dev - Device specific parameters
----------------------------------------------
diff --git a/Documentation/filesystems/ramfs-rootfs-initramfs.txt b/Documentation/filesystems/ramfs-rootfs-initramfs.txt
index b3404a03259..60ab61e54e8 100644
--- a/Documentation/filesystems/ramfs-rootfs-initramfs.txt
+++ b/Documentation/filesystems/ramfs-rootfs-initramfs.txt
@@ -143,12 +143,26 @@ as the following example:
dir /mnt 755 0 0
file /init initramfs/init.sh 755 0 0
+Run "usr/gen_init_cpio" (after the kernel build) to get a usage message
+documenting the above file format.
+
One advantage of the text file is that root access is not required to
set permissions or create device nodes in the new archive. (Note that those
two example "file" entries expect to find files named "init.sh" and "busybox" in
a directory called "initramfs", under the linux-2.6.* directory. See
Documentation/early-userspace/README for more details.)
+The kernel does not depend on external cpio tools: gen_init_cpio is created
+from usr/gen_init_cpio.c which is entirely self-contained, and the kernel's
+boot-time extractor is also (obviously) self-contained. However, if you _do_
+happen to have cpio installed, the following command line can extract the
+generated cpio image back into its component files:
+
+ cpio -i -d -H newc -F initramfs_data.cpio --no-absolute-filenames
+
+Contents of initramfs:
+----------------------
+
If you don't already understand what shared libraries, devices, and paths
you need to get a minimal root filesystem up and running, here are some
references:
@@ -161,13 +175,69 @@ designed to be a tiny C library to statically link early userspace
code against, along with some related utilities. It is BSD licensed.
I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net)
-myself. These are LGPL and GPL, respectively.
+myself. These are LGPL and GPL, respectively. (A self-contained initramfs
+package is planned for the busybox 1.2 release.)
In theory you could use glibc, but that's not well suited for small embedded
uses like this. (A "hello world" program statically linked against glibc is
over 400k. With uClibc it's 7k. Also note that glibc dlopens libnss to do
name lookups, even when otherwise statically linked.)
+Why cpio rather than tar?
+-------------------------
+
+This decision was made back in December, 2001. The discussion started here:
+
+ http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1538.html
+
+And spawned a second thread (specifically on tar vs cpio), starting here:
+
+ http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1587.html
+
+The quick and dirty summary version (which is no substitute for reading
+the above threads) is:
+
+1) cpio is a standard. It's decades old (from the AT&T days), and already
+ widely used on Linux (inside RPM, Red Hat's device driver disks). Here's
+ a Linux Journal article about it from 1996:
+
+ http://www.linuxjournal.com/article/1213
+
+ It's not as popular as tar because the traditional cpio command line tools
+ require _truly_hideous_ command line arguments. But that says nothing
+ either way about the archive format, and there are alternative tools,
+ such as:
+
+ http://freshmeat.net/projects/afio/
+
+2) The cpio archive format chosen by the kernel is simpler and cleaner (and
+ thus easier to create and parse) than any of the (literally dozens of)
+ various tar archive formats. The complete initramfs archive format is
+ explained in buffer-format.txt, created in usr/gen_init_cpio.c, and
+ extracted in init/initramfs.c. All three together come to less than 26k
+ total of human-readable text.
+
+3) The GNU project standardizing on tar is approximately as relevant as
+ Windows standardizing on zip. Linux is not part of either, and is free
+ to make its own technical decisions.
+
+4) Since this is a kernel internal format, it could easily have been
+ something brand new. The kernel provides its own tools to create and
+ extract this format anyway. Using an existing standard was preferable,
+ but not essential.
+
+5) Al Viro made the decision (quote: "tar is ugly as hell and not going to be
+ supported on the kernel side"):
+
+ http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1540.html
+
+ explained his reasoning:
+
+ http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1550.html
+ http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1638.html
+
+ and, most importantly, designed and implemented the initramfs code.
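
To give a feel for point 2, here is a small Python sketch that parses one
entry of a "newc" cpio archive. The field layout used (the magic "070701"
followed by thirteen 8-digit hexadecimal fields, with the name and the data
each padded to a multiple of 4 bytes) is stated from general knowledge of the
format, not from this patch; buffer-format.txt remains the authoritative
description. All names in the code are hypothetical.

```python
NEWC_MAGIC = b"070701"
HEADER_LEN = 110  # 6-byte magic + 13 fields of 8 hex digits each
FIELDS = ("ino", "mode", "uid", "gid", "nlink", "mtime", "filesize",
          "devmajor", "devminor", "rdevmajor", "rdevminor", "namesize",
          "check")


def align4(n):
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def parse_entry(buf, offset=0):
    """Parse the entry at offset; return (header, name, data, next_offset)."""
    if buf[offset:offset + 6] != NEWC_MAGIC:
        raise ValueError("not a newc header at offset %d" % offset)
    header = {}
    pos = offset + 6
    for field in FIELDS:
        header[field] = int(buf[pos:pos + 8], 16)
        pos += 8
    # namesize counts the terminating NUL byte of the file name.
    name_end = pos + header["namesize"]
    name = buf[pos:name_end - 1].decode("ascii")
    data_start = align4(name_end)
    data_end = data_start + header["filesize"]
    return header, name, buf[data_start:data_end], align4(data_end)
```

Calling parse_entry() repeatedly with the returned next_offset walks the
archive until an entry named "TRAILER!!!" is reached.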
+
Future directions:
------------------
diff --git a/Documentation/filesystems/relayfs.txt b/Documentation/filesystems/relayfs.txt
index d803abed29f..5832377b734 100644
--- a/Documentation/filesystems/relayfs.txt
+++ b/Documentation/filesystems/relayfs.txt
@@ -44,30 +44,41 @@ relayfs can operate in a mode where it will overwrite data not yet
collected by userspace, and not wait for it to consume it.
relayfs itself does not provide for communication of such data between
-userspace and kernel, allowing the kernel side to remain simple and not
-impose a single interface on userspace. It does provide a separate
-helper though, described below.
+userspace and kernel, allowing the kernel side to remain simple and
+not impose a single interface on userspace. It does provide a set of
+examples and a separate helper though, described below.
+
+klog and relay-apps example code
+================================
+
+relayfs itself is ready to use, but to make things easier, a couple of
+simple utility functions and a set of examples are provided.
+
+The relay-apps example tarball, available on the relayfs sourceforge
+site, contains a set of self-contained examples, each consisting of a
+pair of .c files containing boilerplate code for each of the user and
+kernel sides of a relayfs application; combined, these two sets of
+boilerplate code provide glue to easily stream data to disk, without
+having to bother with mundane housekeeping chores.
+
+The 'klog debugging functions' patch (klog.patch in the relay-apps
+tarball) provides a couple of high-level logging functions to the
+kernel which allow writing formatted text or raw data to a channel,
+regardless of whether a channel to write into exists or not, or
+whether relayfs is compiled into the kernel or is configured as a
+module. These functions allow you to put unconditional 'trace'
+statements anywhere in the kernel or kernel modules; only when there
+is a 'klog handler' registered will data actually be logged (see the
+klog and kleak examples for details).
+
+It is of course possible to use relayfs from scratch i.e. without
+using any of the relay-apps example code or klog, but you'll have to
+implement communication between userspace and kernel, allowing both to
+convey the state of buffers (full, empty, amount of padding).
+
+klog and the relay-apps examples can be found in the relay-apps
+tarball on http://relayfs.sourceforge.net
-klog, relay-app & librelay
-==========================
-
-relayfs itself is ready to use, but to make things easier, two
-additional systems are provided. klog is a simple wrapper to make
-writing formatted text or raw data to a channel simpler, regardless of
-whether a channel to write into exists or not, or whether relayfs is
-compiled into the kernel or is configured as a module. relay-app is
-the kernel counterpart of userspace librelay.c, combined these two
-files provide glue to easily stream data to disk, without having to
-bother with housekeeping. klog and relay-app can be used together,
-with klog providing high-level logging functions to the kernel and
-relay-app taking care of kernel-user control and disk-logging chores.
-
-It is possible to use relayfs without relay-app & librelay, but you'll
-have to implement communication between userspace and kernel, allowing
-both to convey the state of buffers (full, empty, amount of padding).
-
-klog, relay-app and librelay can be found in the relay-apps tarball on
-http://relayfs.sourceforge.net
The relayfs user space API
==========================
@@ -125,6 +136,8 @@ Here's a summary of the API relayfs provides to in-kernel clients:
relay_reset(chan)
relayfs_create_dir(name, parent)
relayfs_remove_dir(dentry)
+ relayfs_create_file(name, parent, mode, fops, data)
+ relayfs_remove_file(dentry)
channel management typically called on instigation of userspace:
@@ -141,6 +154,8 @@ Here's a summary of the API relayfs provides to in-kernel clients:
subbuf_start(buf, subbuf, prev_subbuf, prev_padding)
buf_mapped(buf, filp)
buf_unmapped(buf, filp)
+ create_buf_file(filename, parent, mode, buf, is_global)
+ remove_buf_file(dentry)
helper functions:
@@ -320,6 +335,71 @@ forces a sub-buffer switch on all the channel buffers, and can be used
to finalize and process the last sub-buffers before the channel is
closed.
+Creating non-relay files
+------------------------
+
+relay_open() automatically creates files in the relayfs filesystem to
+represent the per-cpu kernel buffers; it's often useful for
+applications to be able to create their own files alongside the relay
+files in the relayfs filesystem as well e.g. 'control' files much like
+those created in /proc or debugfs for similar purposes, used to
+communicate control information between the kernel and user sides of a
+relayfs application. For this purpose the relayfs_create_file() and
+relayfs_remove_file() API functions exist. For relayfs_create_file(),
+the caller passes in a set of user-defined file operations to be used
+for the file and an optional void * to a user-specified data item,
+which will be accessible via inode->u.generic_ip (see the relay-apps
+tarball for examples). The file_operations are a required parameter
+to relayfs_create_file() and thus the semantics of these files are
+completely defined by the caller.
+
+See the relay-apps tarball at http://relayfs.sourceforge.net for
+examples of how these non-relay files are meant to be used.
+
+Creating relay files in other filesystems
+-----------------------------------------
+
+By default of course, relay_open() creates relay files in the relayfs
+filesystem. Because relay_file_operations is exported, however, it's
+also possible to create and use relay files in other pseudo-filesystems
+such as debugfs.
+
+For this purpose, two callback functions are provided,
+create_buf_file() and remove_buf_file(). create_buf_file() is called
+once for each per-cpu buffer from relay_open() to allow the client to
+create a file to be used to represent the corresponding buffer; if
+this callback is not defined, the default implementation will create
+and return a file in the relayfs filesystem to represent the buffer.
+The callback should return the dentry of the file created to represent
+the relay buffer. Note that the parent directory passed to
+relay_open() (and passed along to the callback), if specified, must
+exist in the same filesystem the new relay file is created in. If
+create_buf_file() is defined, remove_buf_file() must also be defined;
+it's responsible for deleting the file(s) created in create_buf_file()
+and is called during relay_close().
+
+The create_buf_file() implementation can also be defined in such a way
+as to allow the creation of a single 'global' buffer instead of the
+default per-cpu set. This can be useful for applications interested
+mainly in seeing the relative ordering of system-wide events without
+the need to bother with saving explicit timestamps for the purpose of
+merging/sorting per-cpu files in a postprocessing step.
+
+To have relay_open() create a global buffer, the create_buf_file()
+implementation should set the value of the is_global outparam to a
+non-zero value in addition to creating the file that will be used to
+represent the single buffer. In the case of a global buffer,
+create_buf_file() and remove_buf_file() will be called only once. The
+normal channel-writing functions e.g. relay_write() can still be used
+- writes from any cpu will transparently end up in the global buffer -
+but since it is a global buffer, callers should make sure they use the
+proper locking for such a buffer, either by wrapping writes in a
+spinlock, or by copying a write function from relayfs_fs.h and
+creating a local version that internally does the proper locking.
+
+See the 'exported-relayfile' examples in the relay-apps tarball for
+examples of creating and using relay files in debugfs.
+
Misc
----
diff --git a/Documentation/filesystems/spufs.txt b/Documentation/filesystems/spufs.txt
new file mode 100644
index 00000000000..8edc3952eff
--- /dev/null
+++ b/Documentation/filesystems/spufs.txt
@@ -0,0 +1,521 @@
+SPUFS(2) Linux Programmer's Manual SPUFS(2)
+
+
+
+NAME
+ spufs - the SPU file system
+
+
+DESCRIPTION
+ The SPU file system is used on PowerPC machines that implement the Cell
+ Broadband Engine Architecture in order to access Synergistic Processor
+ Units (SPUs).
+
+ The file system provides a name space similar to POSIX shared memory or
+ message queues. Users that have write permissions on the file system
+ can use spu_create(2) to establish SPU contexts in the spufs root.
+
+ Every SPU context is represented by a directory containing a predefined
+ set of files. These files can be used for manipulating the state of the
+ logical SPU. Users can change permissions on those files, but not actu-
+ ally add or remove files.
+
+
+MOUNT OPTIONS
+ uid=<uid>
+ set the user owning the mount point, the default is 0 (root).
+
+ gid=<gid>
+ set the group owning the mount point, the default is 0 (root).
+
+
+FILES
+ The files in spufs mostly follow the standard behavior for regular sys-
+ tem calls like read(2) or write(2), but often support only a subset of
+ the operations supported on regular file systems. This list details the
+ supported operations and the deviations from the behaviour in the
+ respective man pages.
+
+ All files that support the read(2) operation also support readv(2) and
+ all files that support the write(2) operation also support writev(2).
+ All files support the access(2) and stat(2) family of operations, but
+ only the st_mode, st_nlink, st_uid and st_gid fields of struct stat
+ contain reliable information.
+
+ All files support the chmod(2)/fchmod(2) and chown(2)/fchown(2) opera-
+ tions, but will not be able to grant permissions that contradict the
+ possible operations, e.g. read access on the wbox file.
+
+ The current set of files is:
+
+
+ /mem
+ the contents of the local storage memory of the SPU. This can be
+ accessed like a regular shared memory file and contains both code and
+ data in the address space of the SPU. The possible operations on an
+ open mem file are:
+
+ read(2), pread(2), write(2), pwrite(2), lseek(2)
+ These operate as documented, with the exception that lseek(2),
+ write(2) and pwrite(2) are not supported beyond the end of the
+ file. The file size is the size of the local storage of the SPU,
+ which normally is 256 kilobytes.
+
+ mmap(2)
+ Mapping mem into the process address space gives access to the
+ SPU local storage within the process address space. Only
+ MAP_SHARED mappings are allowed.
+
+
+ /mbox
+ The first SPU to CPU communication mailbox. This file is read-only and
+ can be read in units of 32 bits. The file can only be used in
+ non-blocking mode, and even poll() will not block on it. The possible
+ operations on an open mbox file are:
+
+ read(2)
+ If a count smaller than four is requested, read returns -1 and
+ sets errno to EINVAL. If there is no data available in the mail
+ box, the return value is set to -1 and errno becomes EAGAIN.
+ When data has been read successfully, four bytes are placed in
+ the data buffer and the value four is returned.
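
The read(2) behaviour above can be wrapped as in the following Python sketch.
The function name is hypothetical, and the use of native byte order for the
32-bit value is an assumption; the text does not state the byte order of
mailbox data.

```python
import os
import struct


def read_mbox_word(fd):
    """Read one 32-bit value from an open mbox file descriptor.

    Returns None when the mailbox is empty (read fails with EAGAIN).
    """
    try:
        # Always ask for four bytes; a smaller count yields EINVAL.
        data = os.read(fd, 4)
    except BlockingIOError:
        # Python raises BlockingIOError for EAGAIN.
        return None
    if len(data) != 4:
        raise OSError("expected 4 bytes, got %d" % len(data))
    # Assumption: the word is stored in native byte order.
    return struct.unpack("=I", data)[0]
```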
+
+
+ /ibox
+ The second SPU to CPU communication mailbox. This file is similar to
+ the first mailbox file, but can be read in blocking I/O mode, and the
+ poll family of system calls can be used to wait for it. The possible
+ operations on an open ibox file are:
+
+ read(2)
+ If a count smaller than four is requested, read returns -1 and
+ sets errno to EINVAL. If there is no data available in the mail
+ box and the file descriptor has been opened with O_NONBLOCK, the
+ return value is set to -1 and errno becomes EAGAIN.
+
+ If there is no data available in the mail box and the file
+ descriptor has been opened without O_NONBLOCK, the call will
+ block until the SPU writes to its interrupt mailbox channel.
+ When data has been read successfully, four bytes are placed in
+ the data buffer and the value four is returned.
+
+ poll(2)
+ Poll on the ibox file returns (POLLIN | POLLRDNORM) whenever
+ data is available for reading.
+
+
+ /wbox
+ The CPU to SPU communication mailbox. It is write-only and can be written
+ in units of 32 bits. If the mailbox is full, write() will block and
+ poll can be used to wait for it becoming empty again. The possible
+ operations on an open wbox file are:
+
+ write(2)
+ If a count smaller than four is requested, write returns -1 and
+ sets errno to EINVAL. If there is no space available in the mail
+ box and the file descriptor has been opened with O_NONBLOCK, the
+ return value is set to -1 and errno becomes EAGAIN.
+
+ If there is no space available in the mail box and the file
+ descriptor has been opened without O_NONBLOCK, the call will
+ block until the SPU reads from its PPE mailbox channel. When
+ data has been written successfully, the value four is returned.
+
+ poll(2)
+ Poll on the wbox file returns (POLLOUT | POLLWRNORM) whenever
+ space is available for writing.
+
+
+ /mbox_stat
+ /ibox_stat
+ /wbox_stat
+ Read-only files that contain the length of the current queue, i.e. how
+ many words can be read from mbox or ibox or how many words can be
+ written to wbox without blocking. The files can be read only in 4-byte
+ units and return a big-endian binary integer number. The possible
+ operations on an open *box_stat file are:
+
+ read(2)
+ If a count smaller than four is requested, read returns -1 and
+ sets errno to EINVAL. Otherwise, a four byte value is placed in
+ the data buffer, containing the number of elements that can be
+ read from (for mbox_stat and ibox_stat) or written to (for
+ wbox_stat) the respective mail box without blocking or resulting
+ in EAGAIN.
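
Decoding the value returned by a *box_stat file is a one-line conversion, since
the text specifies a four-byte big-endian integer. The helper below is a
hypothetical sketch; the path passed in would be one of the three stat files
inside an SPU context directory.

```python
import os
import struct


def read_box_stat(path):
    """Return the queue length reported by an mbox/ibox/wbox _stat file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # The files can be read only in 4-byte units.
        data = os.read(fd, 4)
    finally:
        os.close(fd)
    if len(data) != 4:
        raise OSError("expected 4 bytes, got %d" % len(data))
    # ">I" is a big-endian unsigned 32-bit integer.
    return struct.unpack(">I", data)[0]
```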
+
+
+ /npc
+ /decr
+ /decr_status
+ /spu_tag_mask
+ /event_mask
+ /srr0
+ Internal registers of the SPU. The representation is an ASCII string
+ with the numeric value of the register; for npc, this is the address of
+ the next instruction to be executed. These can be used in read/write
+ mode for debugging, but normal operation of programs should not rely on
+ them because access to any of them except npc requires an SPU context
+ save and is therefore very inefficient.
+
+ The contents of these files are:
+
+ npc Next Program Counter
+
+ decr SPU Decrementer
+
+ decr_status Decrementer Status
+
+ spu_tag_mask MFC tag mask for SPU DMA
+
+ event_mask Event mask for SPU interrupts
+
+ srr0 Interrupt Return address register
+
+
+ The possible operations on an open npc, decr, decr_status,
+ spu_tag_mask, event_mask or srr0 file are:
+
+ read(2)
+ When the count supplied to the read call is shorter than the
+ required length for the pointer value plus a newline character,
+ subsequent reads from the same file descriptor will result in
+ completing the string, regardless of changes to the register by
+ a running SPU task. When a complete string has been read, all
+ subsequent read operations will return zero bytes and a new file
+ descriptor needs to be opened to read the value again.
+
+ write(2)
+ A write operation on the file results in setting the register to
+ the value given in the string. The string is parsed from the
+ beginning to the first non-numeric character or the end of the
+ buffer. Subsequent writes to the same file descriptor overwrite
+ the previous setting.
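
A sketch of how a debugger might use these files from Python follows. It opens
a fresh file descriptor for every read, because a descriptor returns zero
bytes once the complete string has been consumed. The function names are
hypothetical, and the accepted input format (decimal, or hexadecimal with a
0x prefix) is an assumption; the text only says "numeric value".

```python
import os


def read_register(path):
    """Return the register value from a file such as npc or decr."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            # Short reads are completed by subsequent reads on the same fd.
            chunk = os.read(fd, 64)
            if not chunk:
                break
            chunks.append(chunk)
    finally:
        os.close(fd)
    text = b"".join(chunks).decode("ascii").strip()
    # Assumption: decimal, or hexadecimal with a 0x prefix.
    return int(text, 0)


def write_register(path, value):
    """Set the register; written as decimal digits, which parse as numeric."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.write(fd, ("%d\n" % value).encode("ascii"))
    finally:
        os.close(fd)
```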
+
+
+ /fpcr
+ This file gives access to the Floating Point Status and Control Regis-
+ ter as a four byte long file. The operations on the fpcr file are:
+
+ read(2)
+ If a count smaller than four is requested, read returns -1 and
+ sets errno to EINVAL. Otherwise, a four byte value is placed in
+ the data buffer, containing the current value of the fpcr regis-
+ ter.
+
+ write(2)
+ If a count smaller than four is requested, write returns -1 and
+ sets errno to EINVAL. Otherwise, a four byte value is copied
+ from the data buffer, updating the value of the fpcr register.
+
+
+ /signal1
+ /signal2
+ The two signal notification channels of an SPU. These are read-write
+ files that operate on a 32 bit word. Writing to one of these files
+ triggers an interrupt on the SPU. The value written to the signal
+ files can be read from the SPU through a channel read or from host user
+ space through the file. After the value has been read by the SPU, it
+ is reset to zero. The possible operations on an open signal1 or sig-
+ nal2 file are:
+
+ read(2)
+ If a count smaller than four is requested, read returns -1 and
+ sets errno to EINVAL. Otherwise, a four byte value is placed in
+ the data buffer, containing the current value of the specified
+ signal notification register.
+
+ write(2)
+ If a count smaller than four is requested, write returns -1 and
+ sets errno to EINVAL. Otherwise, a four byte value is copied
+ from the data buffer, updating the value of the specified signal
+ notification register. The signal notification register will
+ either be replaced with the input data or will be updated to the
+ bitwise OR of the old value and the input data, depending on the
+ contents of the signal1_type, or signal2_type respectively,
+ file.
+
+
+ /signal1_type
+ /signal2_type
+ These two files change the behavior of the signal1 and signal2
+ notification files. They contain a numerical ASCII string which is read
+ as either "1" or "0". In mode 0 (overwrite), the hardware replaces the
+ contents of the signal channel with the data that is written to it. In
+ mode 1 (logical OR), the hardware accumulates the bits that are
+ subsequently written to it. The possible operations on an open
+ signal1_type or signal2_type file are:
+
+ read(2)
+ When the count supplied to the read call is shorter than the
+ required length for the digit plus a newline character, subse-
+ quent reads from the same file descriptor will result in com-
+ pleting the string. When a complete string has been read, all
+ subsequent read operations will return zero bytes and a new file
+ descriptor needs to be opened to read the value again.
+
+ write(2)
+ A write operation on the file results in setting the register to
+ the value given in the string. The string is parsed from the
+ beginning to the first non-numeric character or the end of the
+ buffer. Subsequent writes to the same file descriptor overwrite
+ the previous setting.
+
+
+EXAMPLES
+ /etc/fstab entry
+ none /spu spufs gid=spu 0 0
+
+
+AUTHORS
+ Arnd Bergmann <arndb@de.ibm.com>, Mark Nutter <mnutter@us.ibm.com>,
+ Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
+
+SEE ALSO
+ capabilities(7), close(2), spu_create(2), spu_run(2), spufs(7)
+
+
+
+Linux 2005-09-28 SPUFS(2)
+
+------------------------------------------------------------------------------
+
+SPU_RUN(2) Linux Programmer's Manual SPU_RUN(2)
+
+
+
+NAME
+ spu_run - execute an spu context
+
+
+SYNOPSIS
+ #include <sys/spu.h>
+
+ int spu_run(int fd, unsigned int *npc, unsigned int *event);
+
+DESCRIPTION
+ The spu_run system call is used on PowerPC machines that implement the
+ Cell Broadband Engine Architecture in order to access Synergistic Pro-
+ cessor Units (SPUs). It uses the fd that was returned from spu_cre-
+ ate(2) to address a specific SPU context. When the context gets sched-
+ uled to a physical SPU, it starts execution at the instruction pointer
+ passed in npc.
+
+ Execution of SPU code happens synchronously, meaning that spu_run does
+ not return while the SPU is still running. If there is a need to exe-
+ cute SPU code in parallel with other code on either the main CPU or
+ other SPUs, you need to create a new thread of execution first, e.g.
+ using the pthread_create(3) call.
+
+ When spu_run returns, the current value of the SPU instruction pointer
+ is written back to npc, so you can call spu_run again without updating
+ the pointers.
+
+ event can be a NULL pointer or point to an extended status code that
+ gets filled when spu_run returns. It can be one of the following con-
+ stants:
+
+ SPE_EVENT_DMA_ALIGNMENT
+ A DMA alignment error
+
+ SPE_EVENT_SPE_DATA_SEGMENT
+ A DMA segmentation error
+
+ SPE_EVENT_SPE_DATA_STORAGE
+ A DMA storage error
+
+ If NULL is passed as the event argument, these errors will result in a
+ signal delivered to the calling process.
+
+RETURN VALUE
+ spu_run returns the value of the spu_status register or -1 to indicate
+ an error and set errno to one of the error codes listed below. The
+ spu_status register value contains a bit mask of status codes and
+ optionally a 14 bit code returned from the stop-and-signal instruction
+ on the SPU. The bit masks for the status codes are:
+
+ 0x02 SPU was stopped by stop-and-signal.
+
+ 0x04 SPU was stopped by halt.
+
+ 0x08 SPU is waiting for a channel.
+
+ 0x10 SPU is in single-step mode.
+
+ 0x20 SPU has tried to execute an invalid instruction.
+
+ 0x40 SPU has tried to access an invalid channel.
+
+ 0x3fff0000
+ The bits masked with this value contain the code returned from
+ stop-and-signal.
+
+ Either at least one of the lower eight bits is set, or an error
+ code is returned from spu_run.
+
+ERRORS
+ EAGAIN or EWOULDBLOCK
+ fd is in non-blocking mode and spu_run would block.
+
+ EBADF fd is not a valid file descriptor.
+
+ EFAULT npc is not a valid pointer or event is neither NULL nor a valid
+ pointer.
+
+ EINTR A signal occurred while spu_run was in progress. The npc value
+ has been updated to the new program counter value if necessary.
+
+ EINVAL fd is not a file descriptor returned from spu_create(2).
+
+ ENOMEM Insufficient memory was available to handle a page fault result-
+ ing from an MFC direct memory access.
+
+ ENOSYS The functionality is not provided by the current system, because
+ either the hardware does not provide SPUs or the spufs module is
+ not loaded.
+
+
+NOTES
+ spu_run is meant to be used from libraries that implement a more
+ abstract interface to SPUs, not to be used from regular applications.
+ See http://www.bsc.es/projects/deepcomputing/linuxoncell/ for the rec-
+ ommended libraries.
+
+
+CONFORMING TO
+ This call is Linux specific and only implemented by the ppc64 architec-
+ ture. Programs using this system call are not portable.
+
+
+BUGS
+ The code does not yet fully implement all features outlined here.
+
+
+AUTHOR
+ Arnd Bergmann <arndb@de.ibm.com>
+
+SEE ALSO
+ capabilities(7), close(2), spu_create(2), spufs(7)
+
+
+
+Linux 2005-09-28 SPU_RUN(2)
+
+------------------------------------------------------------------------------
+
+SPU_CREATE(2) Linux Programmer's Manual SPU_CREATE(2)
+
+
+
+NAME
+ spu_create - create a new spu context
+
+
+SYNOPSIS
+ #include <sys/types.h>
+ #include <sys/spu.h>
+
+ int spu_create(const char *pathname, int flags, mode_t mode);
+
+DESCRIPTION
+ The spu_create system call is used on PowerPC machines that implement
+ the Cell Broadband Engine Architecture in order to access Synergistic
+ Processor Units (SPUs). It creates a new logical context for an SPU in
+ pathname and returns a handle associated with it. pathname must
+ point to a non-existing directory in the mount point of the SPU file
+ system (spufs). When spu_create is successful, a directory gets cre-
+ ated on pathname and it is populated with files.
+
+ The returned file handle can only be passed to spu_run(2) or closed;
+ other operations are not defined on it. When it is closed, all associ-
+ ated directory entries in spufs are removed. When the last file handle
+ pointing either inside of the context directory or to this file
+ descriptor is closed, the logical SPU context is destroyed.
+
+ The parameter flags can be zero or any bitwise or'd combination of the
+ following constants:
+
+ SPU_RAWIO
+ Allow mapping of some of the hardware registers of the SPU into
+ user space. This flag requires the CAP_SYS_RAWIO capability, see
+ capabilities(7).
+
+ The mode parameter specifies the permissions used for creating the new
+ directory in spufs. mode is modified with the user's umask(2) value
+ and then used for both the directory and the files contained in it. The
+ file permissions mask out some more bits of mode because they typically
+ support only read or write access. See stat(2) for a full list of the
+ possible mode values.
+
+
+RETURN VALUE
+ spu_create returns a new file descriptor. It may return -1 to indicate
+ an error condition and set errno to one of the error codes listed
+ below.
+
+
+ERRORS
+ EACCES
+ The current user does not have write access on the spufs mount
+ point.
+
+ EEXIST An SPU context already exists at the given path name.
+
+ EFAULT pathname is not a valid string pointer in the current address
+ space.
+
+ EINVAL pathname is not a directory in the spufs mount point.
+
+ ELOOP Too many symlinks were found while resolving pathname.
+
+ EMFILE The process has reached its maximum open file limit.
+
+ ENAMETOOLONG
+ pathname was too long.
+
+ ENFILE The system has reached the global open file limit.
+
+ ENOENT Part of pathname could not be resolved.
+
+ ENOMEM The kernel could not allocate all resources required.
+
+ ENOSPC There are not enough SPU resources available to create a new
+ context or the user specific limit for the number of SPU con-
+ texts has been reached.
+
+ ENOSYS The functionality is not provided by the current system, because
+ either the hardware does not provide SPUs or the spufs module is
+ not loaded.
+
+ ENOTDIR
+ A part of pathname is not a directory.
+
+
+
+NOTES
+ spu_create is meant to be used from libraries that implement a more
+ abstract interface to SPUs, not to be used from regular applications.
+ See http://www.bsc.es/projects/deepcomputing/linuxoncell/ for the rec-
+ ommended libraries.
+
+
+FILES
+ pathname must point to a location beneath the mount point of spufs. By
+ convention, it gets mounted in /spu.
+
+
+CONFORMING TO
+ This call is Linux specific and only implemented by the ppc64 architec-
+ ture. Programs using this system call are not portable.
+
+
+BUGS
+ The code does not yet fully implement all features outlined here.
+
+
+AUTHOR
+ Arnd Bergmann <arndb@de.ibm.com>
+
+SEE ALSO
+ capabilities(7), close(2), spu_run(2), spufs(7)
+
+
+
+Linux 2005-09-28 SPU_CREATE(2)
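The RETURN VALUE section of the spu_run(2) page above lists the status bit
masks only as numbers. The following is a minimal sketch of how a caller
might decode such a value. The macro and function names (SKETCH_*,
sketch_spu_stop_code, sketch_spu_print_status) are made up for this
illustration and are not taken from any header; only the numeric masks come
from the page. The shift by 16 is an inference from the 0x3fff0000 mask.

```c
#include <stdio.h>

/* Numeric values from spu_run(2) RETURN VALUE; the names are hypothetical. */
#define SKETCH_SPU_STOPPED_BY_STOP	0x02u
#define SKETCH_SPU_STOPPED_BY_HALT	0x04u
#define SKETCH_SPU_WAITING_CHANNEL	0x08u
#define SKETCH_SPU_SINGLE_STEP		0x10u
#define SKETCH_SPU_INVALID_INSTR	0x20u
#define SKETCH_SPU_INVALID_CHANNEL	0x40u
#define SKETCH_SPU_STOP_CODE_MASK	0x3fff0000u

/*
 * Return the 14-bit stop-and-signal code carried in a status value, or -1
 * if the status does not report a stop-and-signal.
 */
int sketch_spu_stop_code(unsigned int status)
{
	if (!(status & SKETCH_SPU_STOPPED_BY_STOP))
		return -1;
	return (int)((status & SKETCH_SPU_STOP_CODE_MASK) >> 16);
}

/* Print one line per status bit that is set. */
void sketch_spu_print_status(unsigned int status)
{
	if (status & SKETCH_SPU_STOPPED_BY_STOP)
		printf("stopped by stop-and-signal, code 0x%x\n",
		       (unsigned int)sketch_spu_stop_code(status));
	if (status & SKETCH_SPU_STOPPED_BY_HALT)
		printf("stopped by halt\n");
	if (status & SKETCH_SPU_WAITING_CHANNEL)
		printf("waiting for a channel\n");
	if (status & SKETCH_SPU_SINGLE_STEP)
		printf("single-step mode\n");
	if (status & SKETCH_SPU_INVALID_INSTR)
		printf("invalid instruction\n");
	if (status & SKETCH_SPU_INVALID_CHANNEL)
		printf("invalid channel\n");
}
```

A caller would first check the spu_run return value for -1 (and consult
errno) and only then pass it to helpers like these.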
diff --git a/Documentation/filesystems/sysfs-pci.txt b/Documentation/filesystems/sysfs-pci.txt
index 988a62fae11..7ba2baa165f 100644
--- a/Documentation/filesystems/sysfs-pci.txt
+++ b/Documentation/filesystems/sysfs-pci.txt
@@ -1,4 +1,5 @@
Accessing PCI device resources through sysfs
+--------------------------------------------
sysfs, usually mounted at /sys, provides access to PCI resources on platforms
that support it. For example, a given bus might look like this:
@@ -47,14 +48,21 @@ files, each with their own function.
binary - file contains binary data
cpumask - file contains a cpumask type
-The read only files are informational, writes to them will be ignored.
-Writable files can be used to perform actions on the device (e.g. changing
-config space, detaching a device). mmapable files are available via an
-mmap of the file at offset 0 and can be used to do actual device programming
-from userspace. Note that some platforms don't support mmapping of certain
-resources, so be sure to check the return value from any attempted mmap.
+The read only files are informational, writes to them will be ignored, with
+the exception of the 'rom' file. Writable files can be used to perform
+actions on the device (e.g. changing config space, detaching a device).
+mmapable files are available via an mmap of the file at offset 0 and can be
+used to do actual device programming from userspace. Note that some platforms
+don't support mmapping of certain resources, so be sure to check the return
+value from any attempted mmap.
+
+The 'rom' file is special in that it provides read-only access to the device's
+ROM file, if available. It's disabled by default, however, so applications
+should write the string "1" to the file to enable it before attempting a read
+call, and disable it following the access by writing "0" to the file.
Accessing legacy resources through sysfs
+----------------------------------------
Legacy I/O port and ISA memory resources are also provided in sysfs if the
underlying platform supports them. They're located in the PCI class heirarchy,
@@ -75,6 +83,7 @@ simply dereference the returned pointer (after checking for errors of course)
to access legacy memory space.
Supporting PCI access on new platforms
+--------------------------------------
In order to support PCI resource mapping as described above, Linux platform
code must define HAVE_PCI_MMAP and provide a pci_mmap_page_range function.
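The enable/read/disable sequence that the sysfs-pci.txt hunk above describes
for the 'rom' file could look roughly like the sketch below. The helper names
(sketch_write_str, sketch_read_pci_rom) are hypothetical, and the caller is
assumed to supply the path of a device's 'rom' file. The trailing newline in
the control strings mirrors what a shell "echo 1 > rom" would send; whether a
bare "1" or "0" is also accepted is an assumption the text above does not
settle.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a short control string to the file at path; 0 on success, -1 on error. */
static int sketch_write_str(const char *path, const char *s)
{
	size_t len = strlen(s);
	int fd = open(path, O_WRONLY);
	int ok;

	if (fd < 0)
		return -1;
	ok = write(fd, s, len) == (ssize_t)len;
	if (close(fd) < 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Enable the ROM, read up to len bytes from its start, then disable it
 * again. Returns the number of bytes read, or -1 on error.
 */
ssize_t sketch_read_pci_rom(const char *rom_path, void *buf, size_t len)
{
	ssize_t n;
	int fd;

	if (sketch_write_str(rom_path, "1\n") < 0)
		return -1;

	fd = open(rom_path, O_RDONLY);
	if (fd < 0) {
		n = -1;
	} else {
		n = read(fd, buf, len);
		close(fd);
	}

	/* Always try to disable the ROM again, even if the read failed. */
	if (sketch_write_str(rom_path, "0\n") < 0)
		return -1;
	return n;
}
```

The file is opened separately for each step so that every access starts at
offset 0, which is also what a sequence of shell commands would do.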
diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt
index 0d783c504ea..dbe4d87d261 100644
--- a/Documentation/filesystems/tmpfs.txt
+++ b/Documentation/filesystems/tmpfs.txt
@@ -78,6 +78,18 @@ use up all the memory on the machine; but enhances the scalability of
that instance in a system with many cpus making intensive use of it.
+tmpfs has a mount option to set the NUMA memory allocation policy for
+all files in that instance:
+mpol=interleave prefers to allocate memory from each node in turn
+mpol=default prefers to allocate memory from the local node
+mpol=bind prefers to allocate from mpol_nodelist
+mpol=preferred prefers to allocate from first node in mpol_nodelist
+
+The following mount option is used in conjunction with mpol=interleave,
+mpol=bind or mpol=preferred:
+mpol_nodelist: nodelist suitable for parsing with nodelist_parse.
+
+
To specify the initial root directory you can use the following mount
options:
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index ee4c0a8b8db..e56e842847d 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -162,9 +162,8 @@ get_sb() method fills in is the "s_op" field. This is a pointer to
a "struct super_operations" which describes the next level of the
filesystem implementation.
-Usually, a filesystem uses generic one of the generic get_sb()
-implementations and provides a fill_super() method instead. The
-generic methods are:
+Usually, a filesystem uses one of the generic get_sb() implementations
+and provides a fill_super() method instead. The generic methods are:
get_sb_bdev: mount a filesystem residing on a block device
diff --git a/Documentation/hpet.txt b/Documentation/hpet.txt
index e52457581f4..b7a3dc38dd5 100644
--- a/Documentation/hpet.txt
+++ b/Documentation/hpet.txt
@@ -2,7 +2,7 @@
The High Precision Event Timer (HPET) hardware is the future replacement
for the 8254 and Real Time Clock (RTC) periodic timer functionality.
-Each HPET can have up two 32 timers. It is possible to configure the
+Each HPET can have up to 32 timers. It is possible to configure the
first two timers as legacy replacements for 8254 and RTC periodic timers.
A specification done by Intel and Microsoft can be found at
<http://www.intel.com/hardwaredesign/hpetspec.htm>.
diff --git a/Documentation/hrtimers.txt b/Documentation/hrtimers.txt
new file mode 100644
index 00000000000..7620ff735fa
--- /dev/null
+++ b/Documentation/hrtimers.txt
@@ -0,0 +1,178 @@
+
+hrtimers - subsystem for high-resolution kernel timers
+------------------------------------------------------
+
+This patch introduces a new subsystem for high-resolution kernel timers.
+
+One might ask the question: we already have a timer subsystem
+(kernel/timers.c), why do we need two timer subsystems? After a lot of
+back and forth trying to integrate high-resolution and high-precision
+features into the existing timer framework, and after testing various
+such high-resolution timer implementations in practice, we came to the
+conclusion that the timer wheel code is fundamentally not suitable for
+such an approach. We initially didn't believe this ('there must be a way
+to solve this'), and spent a considerable effort trying to integrate
+things into the timer wheel, but we failed. In hindsight, there are
+several reasons why such integration is hard/impossible:
+
+- the forced handling of low-resolution and high-resolution timers in
+ the same way leads to a lot of compromises, macro magic and #ifdef
+ mess. The timers.c code is very "tightly coded" around jiffies and
+ 32-bitness assumptions, and has been honed and micro-optimized for a
+ relatively narrow use case (jiffies in a relatively narrow HZ range)
+ for many years - and thus even small extensions to it easily break
+ the wheel concept, leading to even worse compromises. The timer wheel
+ code is very good and tight code, there's zero problems with it in its
+ current usage - but it is simply not suitable to be extended for
+ high-res timers.
+
+- the unpredictable [O(N)] overhead of cascading leads to delays which
+ necessitate a more complex handling of high resolution timers, which
+ in turn decreases robustness. Such a design still led to rather large
+ timing inaccuracies. Cascading is a fundamental property of the timer
+ wheel concept, it cannot be 'designed out' without inevitably
+ degrading other portions of the timers.c code in an unacceptable way.
+
+- the implementation of the current posix-timer subsystem on top of
+ the timer wheel has already introduced a quite complex handling of
+ the required readjusting of absolute CLOCK_REALTIME timers at
+ settimeofday or NTP time - further underlining our experience by
+ example: that the timer wheel data structure is too rigid for high-res
+ timers.
+
+- the timer wheel code is most optimal for use cases which can be
+ identified as "timeouts". Such timeouts are usually set up to cover
+ error conditions in various I/O paths, such as networking and block
+ I/O. The vast majority of those timers never expire and are rarely
+ recascaded because the expected correct event arrives in time so they
+ can be removed from the timer wheel before any further processing of
+ them becomes necessary. Thus the users of these timeouts can accept
+ the granularity and precision tradeoffs of the timer wheel, and
+ largely expect the timer subsystem to have near-zero overhead.
+ Accurate timing for them is not a core purpose - in fact most of the
+ timeout values used are ad-hoc. For them it is at most a necessary
+ evil to guarantee the processing of actual timeout completions
+ (because most of the timeouts are deleted before completion), which
+ should thus be as cheap and unintrusive as possible.
+
+The primary users of precision timers are user-space applications that
+utilize nanosleep, posix-timers and itimer interfaces. Also, in-kernel
+users like drivers and subsystems which require precise timed events
+(e.g. multimedia) can benefit from the availability of a separate
+high-resolution timer subsystem as well.
+
+While this subsystem does not offer high-resolution clock sources just
+yet, the hrtimer subsystem can be easily extended with high-resolution
+clock capabilities, and patches for that exist and are maturing quickly.
+The increasing demand for realtime and multimedia applications along
+with other potential users for precise timers gives another reason to
+separate the "timeout" and "precise timer" subsystems.
+
+Another potential benefit is that such a separation allows even more
+special-purpose optimization of the existing timer wheel for the low
+resolution and low precision use cases - once the precision-sensitive
+APIs are separated from the timer wheel and are migrated over to
+hrtimers. E.g. we could decrease the frequency of the timeout subsystem
+from 250 Hz to 100 Hz (or even lower).
+
+hrtimer subsystem implementation details
+----------------------------------------
+
+The basic design considerations were:
+
+- simplicity
+
+- data structure not bound to jiffies or any other granularity. All the
+ kernel logic works at 64-bit nanoseconds resolution - no compromises.
+
+- simplification of existing, timing related kernel code
+
+Another basic requirement was the immediate enqueueing and ordering of
+timers at activation time. After looking at several possible solutions
+such as radix trees and hashes, we chose the red black tree as the basic
+data structure. Rbtrees are available as a library in the kernel and are
+used in various performance-critical areas of e.g. memory management and
+file systems. The rbtree is solely used for time sorted ordering, while
+a separate list is used to give the expiry code fast access to the
+queued timers, without having to walk the rbtree.
+
+(This separate list is also useful for later when we'll introduce
+high-resolution clocks, where we need separate pending and expired
+queues while keeping the time-order intact.)
+
+Time-ordered enqueueing is not purely for the purposes of
+high-resolution clocks though, it also simplifies the handling of
+absolute timers based on a low-resolution CLOCK_REALTIME. The existing
+implementation needed to keep an extra list of all armed absolute
+CLOCK_REALTIME timers along with complex locking. In case of
+settimeofday and NTP, all the timers (!) had to be dequeued, the
+time-changing code had to fix them up one by one, and all of them had to
+be enqueued again. The time-ordered enqueueing and the storage of the
+expiry time in absolute time units removes all this complex and poorly
+scaling code from the posix-timer implementation - the clock can simply
+be set without having to touch the rbtree. This also makes the handling
+of posix-timers simpler in general.
+
+The locking and per-CPU behavior of hrtimers was mostly taken from the
+existing timer wheel code, as it is mature and well suited. Sharing code
+was not really a win, due to the different data structures. Also, the
+hrtimer functions now have clearer behavior and clearer names - such as
+hrtimer_try_to_cancel() and hrtimer_cancel() [which are roughly
+equivalent to del_timer() and del_timer_sync()] - so there's no direct
+1:1 mapping between them on the algorithmic level, and thus no real
+potential for code sharing either.
+
+Basic data types: every time value, absolute or relative, is in a
+special nanosecond-resolution type: ktime_t. The kernel-internal
+representation of ktime_t values and operations is implemented via
+macros and inline functions, and can be switched between a "hybrid
+union" type and a plain "scalar" 64bit nanoseconds representation (at
+compile time). The hybrid union type optimizes time conversions on 32bit
+CPUs. This build-time-selectable ktime_t storage format was implemented
+to avoid the performance impact of 64-bit multiplications and divisions
+on 32bit CPUs. Such operations are frequently necessary to convert
+between the storage formats provided by kernel and userspace interfaces
+and the internal time format. (See include/linux/ktime.h for further
+details.)
+
+hrtimers - rounding of timer values
+-----------------------------------
+
+The hrtimer code will round timer events to lower-resolution clocks
+because it has to. Otherwise it will do no artificial rounding at all.
+
+One question is, what resolution value should be returned to the user by
+the clock_getres() interface. This will return whatever real resolution
+a given clock has - be it low-res, high-res, or artificially-low-res.
+
+hrtimers - testing and verification
+-----------------------------------
+
+We used the high-resolution clock subsystem on top of hrtimers to verify
+the hrtimer implementation details in practice, and we also ran the posix
+timer tests in order to ensure specification compliance. We also ran
+tests on low-resolution clocks.
+
+The hrtimer patch converts the following kernel functionality to use
+hrtimers:
+
+ - nanosleep
+ - itimers
+ - posix-timers
+
+The conversion of nanosleep and posix-timers enabled the unification of
+nanosleep and clock_nanosleep.
+
+The code was successfully compiled for the following platforms:
+
+ i386, x86_64, ARM, PPC, PPC64, IA64
+
+The code was run-tested on the following platforms:
+
+ i386(UP/SMP), x86_64(UP/SMP), ARM, PPC
+
+hrtimers were also integrated into the -rt tree, along with a
+hrtimers-based high-resolution clock implementation, so the hrtimers
+code got a healthy amount of testing and use in practice.
+
+ Thomas Gleixner, Ingo Molnar
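The "Basic data types" paragraph of hrtimers.txt above contrasts a plain
64-bit nanosecond scalar with a hybrid layout that avoids 64-bit
multiplications and divisions on 32-bit CPUs. The user-space sketch below
only illustrates that trade-off. It is not the kernel's ktime_t: all names
are hypothetical, the "hybrid" struct is a simplified stand-in with separate
seconds and nanoseconds fields, and only non-negative times are handled.

```c
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000

/* "Scalar" flavour: a single signed 64-bit nanosecond count. */
typedef int64_t sketch_ktime_scalar;

/* Simplified "hybrid" flavour: seconds and nanoseconds kept apart. */
struct sketch_ktime_hybrid {
	int32_t sec;
	int32_t nsec;	/* kept in the range 0 .. 999999999 */
};

/* Building a scalar value from sec/nsec needs a 64-bit multiplication. */
sketch_ktime_scalar sketch_scalar_set(int32_t sec, int32_t nsec)
{
	return (int64_t)sec * SKETCH_NSEC_PER_SEC + nsec;
}

/* Getting the seconds back out of a scalar needs a 64-bit division. */
int32_t sketch_scalar_seconds(sketch_ktime_scalar kt)
{
	return (int32_t)(kt / SKETCH_NSEC_PER_SEC);
}

/* The split form stores sec/nsec as they are: no multiply, no divide. */
struct sketch_ktime_hybrid sketch_hybrid_set(int32_t sec, int32_t nsec)
{
	struct sketch_ktime_hybrid kt = { sec, nsec };

	return kt;
}

/* Addition in the split form needs only 32-bit adds and one carry check. */
struct sketch_ktime_hybrid sketch_hybrid_add(struct sketch_ktime_hybrid a,
					     struct sketch_ktime_hybrid b)
{
	struct sketch_ktime_hybrid sum = { a.sec + b.sec, a.nsec + b.nsec };

	if (sum.nsec >= SKETCH_NSEC_PER_SEC) {
		sum.nsec -= SKETCH_NSEC_PER_SEC;
		sum.sec++;
	}
	return sum;
}
```

Conversions to and from sec/nsec pairs are where the scalar form pays for
64-bit arithmetic, which is the cost the build-time-selectable format is
described as avoiding on 32-bit CPUs.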
diff --git a/Documentation/hwmon/w83627hf b/Documentation/hwmon/w83627hf
index 78f37c2d602..5d23776e990 100644
--- a/Documentation/hwmon/w83627hf
+++ b/Documentation/hwmon/w83627hf
@@ -54,13 +54,16 @@ If you really want i2c accesses for these Super I/O chips,
use the w83781d driver. However this is not the preferred method
now that this ISA driver has been developed.
-Technically, the w83627thf does not support a VID reading. However, it's
-possible or even likely that your mainboard maker has routed these signals
-to a specific set of general purpose IO pins (the Asus P4C800-E is one such
-board). The w83627thf driver now interprets these as VID. If the VID on
-your board doesn't work, first see doc/vid in the lm_sensors package. If
-that still doesn't help, email us at lm-sensors@lm-sensors.org.
+The w83627_HF_ uses pins 110-106 as VID0-VID4. The w83627_THF_ uses the
+same pins as GPIO[0:4]. Technically, the w83627_THF_ does not support a
+VID reading. However, the two chips have an identical 128-pin package. So,
+it is possible or even likely for a w83627thf to have the VID signals routed
+to these pins despite their not being labeled for that purpose. Therefore,
+the w83627thf driver interprets these as VID. If the VID on your board
+doesn't work, first see doc/vid in the lm_sensors package[1]. If that still
+doesn't help, you may just ignore the bogus VID reading with no harm done.
-For further information on this driver see the w83781d driver
-documentation.
+For further information on this driver see the w83781d driver documentation.
+
+[1] http://www2.lm-sensors.nu/~lm78/cvs/browse.cgi/lm_sensors2/doc/vid
diff --git a/Documentation/i2c/busses/i2c-nforce2 b/Documentation/i2c/busses/i2c-nforce2
index e379e182e64..d751282d9b2 100644
--- a/Documentation/i2c/busses/i2c-nforce2
+++ b/Documentation/i2c/busses/i2c-nforce2
@@ -5,7 +5,8 @@ Supported adapters:
* nForce2 Ultra 400 MCP 10de:0084
* nForce3 Pro150 MCP 10de:00D4
* nForce3 250Gb MCP 10de:00E4
- * nForce4 MCP 10de:0052
+ * nForce4 MCP 10de:0052
+ * nForce4 MCP-04 10de:0034
Datasheet: not publically available, but seems to be similar to the
AMD-8111 SMBus 2.0 adapter.
diff --git a/Documentation/i2c/busses/i2c-parport b/Documentation/i2c/busses/i2c-parport
index 9f1d0082da1..d9f23c0763f 100644
--- a/Documentation/i2c/busses/i2c-parport
+++ b/Documentation/i2c/busses/i2c-parport
@@ -17,6 +17,7 @@ It currently supports the following devices:
* Velleman K8000 adapter
* ELV adapter
* Analog Devices evaluation boards (ADM1025, ADM1030, ADM1031, ADM1032)
+ * Barco LPT->DVI (K5800236) adapter
These devices use different pinout configurations, so you have to tell
the driver what you have, using the type module parameter. There is no
diff --git a/Documentation/i2c/porting-clients b/Documentation/i2c/porting-clients
index 184fac2377a..f03c2a02f80 100644
--- a/Documentation/i2c/porting-clients
+++ b/Documentation/i2c/porting-clients
@@ -1,10 +1,13 @@
-Revision 5, 2005-07-29
+Revision 6, 2005-11-20
Jean Delvare <khali@linux-fr.org>
Greg KH <greg@kroah.com>
This is a guide on how to convert I2C chip drivers from Linux 2.4 to
Linux 2.6. I have been using existing drivers (lm75, lm78) as examples.
Then I converted a driver myself (lm83) and updated this document.
+Note that this guide is strongly oriented towards hardware monitoring
+drivers. Many points are still valid for other types of drivers, but
+others may be irrelevant.
There are two sets of points below. The first set concerns technical
changes. The second set concerns coding policy. Both are mandatory.
@@ -22,16 +25,20 @@ Technical changes:
#include <linux/module.h>
#include <linux/init.h>
#include <linux/slab.h>
+ #include <linux/jiffies.h>
#include <linux/i2c.h>
+ #include <linux/i2c-isa.h> /* for ISA drivers */
#include <linux/hwmon.h> /* for hardware monitoring drivers */
#include <linux/hwmon-sysfs.h>
#include <linux/hwmon-vid.h> /* if you need VRM support */
+ #include <linux/err.h> /* for class registration */
#include <asm/io.h> /* if you have I/O operations */
Please respect this inclusion order. Some extra headers may be
required for a given driver (e.g. "lm75.h").
* [Addresses] SENSORS_I2C_END becomes I2C_CLIENT_END, ISA addresses
- are no more handled by the i2c core.
+ are no longer handled by the i2c core. Address ranges are no longer
+ supported either; define each individual address separately.
SENSORS_INSMOD_<n> becomes I2C_CLIENT_INSMOD_<n>.
* [Client data] Get rid of sysctl_id. Try using standard names for
@@ -48,23 +55,23 @@ Technical changes:
int kind);
static void lm75_init_client(struct i2c_client *client);
static int lm75_detach_client(struct i2c_client *client);
- static void lm75_update_client(struct i2c_client *client);
+ static struct lm75_data *lm75_update_device(struct device *dev);
* [Sysctl] All sysctl stuff is of course gone (defines, ctl_table
and functions). Instead, you have to define show and set functions for
each sysfs file. Only define set for writable values. Take a look at an
- existing 2.6 driver for details (lm78 for example). Don't forget
+ existing 2.6 driver for details (it87 for example). Don't forget
to define the attributes for each file (this is that step that
links callback functions). Use the file names specified in
- Documentation/i2c/sysfs-interface for the individual files. Also
+ Documentation/hwmon/sysfs-interface for the individual files. Also
convert the units these files read and write to the specified ones.
If you need to add a new type of file, please discuss it on the
sensors mailing list <lm-sensors@lm-sensors.org> by providing a
- patch to the Documentation/i2c/sysfs-interface file.
+ patch to the Documentation/hwmon/sysfs-interface file.
* [Attach] For I2C drivers, the attach function should make sure
- that the adapter's class has I2C_CLASS_HWMON, using the
- following construct:
+ that the adapter's class has I2C_CLASS_HWMON (or whatever class is
+ suitable for your driver), using the following construct:
if (!(adapter->class & I2C_CLASS_HWMON))
return 0;
ISA-only drivers of course don't need this.
@@ -72,63 +79,72 @@ Technical changes:
* [Detect] As mentioned earlier, the flags parameter is gone.
The type_name and client_name strings are replaced by a single
- name string, which will be filled with a lowercase, short string
- (typically the driver name, e.g. "lm75").
+ name string, which will be filled with a lowercase, short string.
In i2c-only drivers, drop the i2c_is_isa_adapter check, it's
useless. Same for isa-only drivers, as the test would always be
true. Only hybrid drivers (which are quite rare) still need it.
- The errorN labels are reduced to the number needed. If that number
- is 2 (i2c-only drivers), it is advised that the labels are named
- exit and exit_free. For i2c+isa drivers, labels should be named
- ERROR0, ERROR1 and ERROR2. Don't forget to properly set err before
+ The labels used for error paths are reduced to the number needed.
+ It is advised that the labels are given descriptive names such as
+ exit and exit_free. Don't forget to properly set err before
jumping to error labels. By the way, labels should be left-aligned.
Use kzalloc instead of kmalloc.
Use i2c_set_clientdata to set the client data (as opposed to
a direct access to client->data).
- Use strlcpy instead of strcpy to copy the client name.
+ Use strlcpy instead of strcpy or snprintf to copy the client name.
Replace the sysctl directory registration by calls to
device_create_file. Move the driver initialization before any
sysfs file creation.
+ Register the client with the hwmon class (using hwmon_device_register)
+ if applicable.
Drop client->id.
Drop any 24RF08 corruption prevention you find, as this is now done
at the i2c-core level, and doing it twice voids it.
+ Don't add I2C_CLIENT_ALLOW_USE to client->flags, it's the default now.
* [Init] Limits must not be set by the driver (can be done later in
user-space). Chip should not be reset default (although a module
- parameter may be used to force is), and initialization should be
+ parameter may be used to force it), and initialization should be
limited to the strictly necessary steps.
-* [Detach] Get rid of data, remove the call to
- i2c_deregister_entry. Do not log an error message if
- i2c_detach_client fails, as i2c-core will now do it for you.
-
-* [Update] Don't access client->data directly, use
- i2c_get_clientdata(client) instead.
-
-* [Interface] Init function should not print anything. Make sure
- there is a MODULE_LICENSE() line, at the bottom of the file
- (after MODULE_AUTHOR() and MODULE_DESCRIPTION(), in this order).
+* [Detach] Remove the call to i2c_deregister_entry. Do not log an
+ error message if i2c_detach_client fails, as i2c-core will now do
+ it for you.
+ Unregister from the hwmon class if applicable.
+
+* [Update] The function prototype changed, it is now
+ passed a device structure, which you have to convert to a client
+ using to_i2c_client(dev). The update function should return a
+ pointer to the client data.
+ Don't access client->data directly, use i2c_get_clientdata(client)
+ instead.
+ Use time_after() instead of direct jiffies comparison.
+
+* [Interface] Make sure there is a MODULE_LICENSE() line, at the bottom
+ of the file (after MODULE_AUTHOR() and MODULE_DESCRIPTION(), in this
+ order).
+
+* [Driver] The flags field of the i2c_driver structure is gone.
+ I2C_DF_NOTIFY is now the default behavior.
+ The i2c_driver structure has a driver member, which is itself a
+ structure, whose name member should be initialized to a driver name
+ string. i2c_driver itself has no name member anymore.
Coding policy:
* [Copyright] Use (C), not (c), for copyright.
* [Debug/log] Get rid of #ifdef DEBUG/#endif constructs whenever you
- can. Calls to printk/pr_debug for debugging purposes are replaced
- by calls to dev_dbg. Here is an example on how to call it (taken
- from lm75_detect):
+ can. Calls to printk for debugging purposes are replaced by calls to
+ dev_dbg where possible, else to pr_debug. Here is an example of how
+ to call it (taken from lm75_detect):
dev_dbg(&client->dev, "Starting lm75 update\n");
Replace other printk calls with the dev_info, dev_err or dev_warn
function, as appropriate.
-* [Constants] Constants defines (registers, conversions, initial
- values) should be aligned. This greatly improves readability.
- Same goes for variables declarations. Alignments are achieved by the
- means of tabs, not spaces. Remember that tabs are set to 8 in the
- Linux kernel code.
-
-* [Structure definition] The name field should be standardized. All
- lowercase and as simple as the driver name itself (e.g. "lm75").
+* [Constants] Constant defines (registers, conversions) should be
+ aligned. This greatly improves readability.
+ Alignment is achieved by means of tabs, not spaces. Remember
+ that tabs are set to 8 in the Linux kernel code.
* [Layout] Avoid extra empty lines between comments and what they
comment. Respect the coding style (see Documentation/CodingStyle),
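The [Update] item above combines three changes: the update function receives a device, converts it to a client, and uses a wrap-safe time comparison before refreshing cached values. The following is a minimal userspace sketch of that pattern, not driver code. The struct layouts, to_client(), get_clientdata(), ticks_after(), REFRESH_INTERVAL and the foo_* names are all hypothetical stand-ins for the kernel's struct device, to_i2c_client(), i2c_get_clientdata(), time_after() and jiffies, and the current time and sensor reading are passed in as parameters so the sketch runs without hardware.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mock types standing in for the kernel's; layouts are illustrative only. */
struct device { void *driver_data; };
struct i2c_client { int addr; struct device dev; };

struct foo_data {
	bool valid;
	unsigned long last_updated;	/* in ticks, like jiffies */
	int temp;
};

/* Same idea as to_i2c_client(): recover the client that embeds dev. */
static struct i2c_client *to_client(struct device *dev)
{
	return (struct i2c_client *)((char *)dev -
				     offsetof(struct i2c_client, dev));
}

/* Stand-in for i2c_get_clientdata(): no direct client->data access. */
static void *get_clientdata(struct i2c_client *client)
{
	return client->dev.driver_data;
}

/* Wrap-safe "a is after b": compare the signed difference. */
static bool ticks_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

#define REFRESH_INTERVAL 100UL	/* hypothetical cache lifetime, in ticks */

/* New-style update function: takes a device, returns the client data. */
static struct foo_data *foo_update_device(struct device *dev,
					   unsigned long now, int reading)
{
	struct i2c_client *client = to_client(dev);
	struct foo_data *data = get_clientdata(client);

	if (!data->valid ||
	    ticks_after(now, data->last_updated + REFRESH_INTERVAL)) {
		data->temp = reading;
		data->last_updated = now;
		data->valid = true;
	}
	return data;
}
```

The signed-difference comparison matters when the tick counter is close to wrapping: a plain `now > last_updated + interval` test can report a fresh cache as expired once the deadline has wrapped past zero while `now` has not.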
diff --git a/Documentation/i2c/writing-clients b/Documentation/i2c/writing-clients
index d19993cc060..3a057c8e550 100644
--- a/Documentation/i2c/writing-clients
+++ b/Documentation/i2c/writing-clients
@@ -25,9 +25,9 @@ routines, a client structure specific information like the actual I2C
address.
static struct i2c_driver foo_driver = {
- .owner = THIS_MODULE,
- .name = "Foo version 2.3 driver",
- .flags = I2C_DF_NOTIFY,
+ .driver = {
+ .name = "foo",
+ },
.attach_adapter = &foo_attach_adapter,
.detach_client = &foo_detach_client,
.command = &foo_command /* may be NULL */
@@ -36,10 +36,6 @@ static struct i2c_driver foo_driver = {
The name field must match the driver name, including the case. It must not
contain spaces, and may be up to 31 characters long.
-Don't worry about the flags field; just put I2C_DF_NOTIFY into it. This
-means that your driver will be notified when new adapters are found.
-This is almost always what you want.
-
All other fields are for call-back functions which will be explained
below.
@@ -496,17 +492,13 @@ Note that some functions are marked by `__init', and some data structures
by `__init_data'. Hose functions and structures can be removed after
kernel booting (or module loading) is completed.
+
Command function
================
A generic ioctl-like function call back is supported. You will seldom
-need this. You may even set it to NULL.
-
- /* No commands defined */
- int foo_command(struct i2c_client *client, unsigned int cmd, void *arg)
- {
- return 0;
- }
+need this, and its use is deprecated anyway, so newer design should not
+use it. Set it to NULL.
Sending and receiving
diff --git a/Documentation/i2o/ioctl b/Documentation/i2o/ioctl
index 3e174978997..1e77fac4e12 100644
--- a/Documentation/i2o/ioctl
+++ b/Documentation/i2o/ioctl
@@ -185,7 +185,7 @@ VII. Getting Parameters
ENOMEM Kernel memory allocation error
A return value of 0 does not mean that the value was actually
- properly retreived. The user should check the result list
+ properly retrieved. The user should check the result list
to determine the specific status of the transaction.
VIII. Downloading Software
diff --git a/Documentation/input/appletouch.txt b/Documentation/input/appletouch.txt
index b48d11d0326..4f7c633a76d 100644
--- a/Documentation/input/appletouch.txt
+++ b/Documentation/input/appletouch.txt
@@ -3,7 +3,7 @@ Apple Touchpad Driver (appletouch)
Copyright (C) 2005 Stelian Pop <stelian@popies.net>
appletouch is a Linux kernel driver for the USB touchpad found on post
-February 2005 Apple Alu Powerbooks.
+February 2005 and October 2005 Apple Aluminium Powerbooks.
This driver is derived from Johannes Berg's appletrackpad driver[1], but it has
been improved in some areas:
@@ -13,7 +13,8 @@ been improved in some areas:
Credits go to Johannes Berg for reverse-engineering the touchpad protocol,
Frank Arnold for further improvements, and Alex Harper for some additional
-information about the inner workings of the touchpad sensors.
+information about the inner workings of the touchpad sensors. Michael
+Hanselmann added support for the October 2005 models.
Usage:
------
diff --git a/Documentation/input/ff.txt b/Documentation/input/ff.txt
index efa7dd6751f..c7e10eaff20 100644
--- a/Documentation/input/ff.txt
+++ b/Documentation/input/ff.txt
@@ -120,7 +120,7 @@ to the unique id assigned by the driver. This data is required for performing
some operations (removing an effect, controlling the playback).
This if field must be set to -1 by the user in order to tell the driver to
allocate a new effect.
-See <linux/input.h> for a description of the ff_effect stuct. You should also
+See <linux/input.h> for a description of the ff_effect struct. You should also
find help in a few sketches, contained in files shape.fig and interactive.fig.
You need xfig to visualize these files.
diff --git a/Documentation/ioctl/hdio.txt b/Documentation/ioctl/hdio.txt
index 9a7aea0636a..11c9be49f37 100644
--- a/Documentation/ioctl/hdio.txt
+++ b/Documentation/ioctl/hdio.txt
@@ -946,7 +946,7 @@ HDIO_SCAN_HWIF register and (re)scan interface
This ioctl initializes the addresses and irq for a disk
controller, probes for drives, and creates /proc/ide
- interfaces as appropiate.
+ interfaces as appropriate.
diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt
index d802ce88bed..443230b43e0 100644
--- a/Documentation/kbuild/makefiles.txt
+++ b/Documentation/kbuild/makefiles.txt
@@ -1033,9 +1033,9 @@ When kbuild executes the following steps are followed (roughly):
Example:
#arch/i386/Makefile
- GCC_VERSION := $(call cc-version)
cflags-y += $(shell \
- if [ $(GCC_VERSION) -ge 0300 ] ; then echo "-mregparm=3"; fi ;)
+ if [ $(call cc-version) -ge 0300 ] ; then \
+ echo "-mregparm=3"; fi ;)
In the above example -mregparm=3 is only used for gcc version greater
than or equal to gcc 3.0.
diff --git a/Documentation/kbuild/modules.txt b/Documentation/kbuild/modules.txt
index c91caf7eb30..7e77f93634e 100644
--- a/Documentation/kbuild/modules.txt
+++ b/Documentation/kbuild/modules.txt
@@ -18,6 +18,7 @@ In this document you will find information about:
=== 5. Include files
--- 5.1 How to include files from the kernel include dir
--- 5.2 External modules using an include/ dir
+ --- 5.3 External modules using several directories
=== 6. Module installation
--- 6.1 INSTALL_MOD_PATH
--- 6.2 INSTALL_MOD_DIR
@@ -38,7 +39,7 @@ included in the kernel tree.
What is covered within this file is mainly information to authors
of modules. The author of an external modules should supply
a makefile that hides most of the complexity so one only has to type
-'make' to buld the module. A complete example will be present in
+'make' to build the module. A complete example will be present in
chapter "4. Creating a kbuild file for an external module".
@@ -69,7 +70,7 @@ when building an external module.
--- 2.2 Available targets
- $KDIR refers to path to kernel source top-level directory
+ $KDIR refers to the path to the kernel source top-level directory
make -C $KDIR M=`pwd`
Will build the module(s) located in current directory.
@@ -87,11 +88,11 @@ when building an external module.
make -C $KDIR M=$PWD modules_install
Install the external module(s).
Installation default is in /lib/modules/<kernel-version>/extra,
- but may be prefixed with INSTALL_MOD_PATH - see separate chater.
+ but may be prefixed with INSTALL_MOD_PATH - see separate chapter.
make -C $KDIR M=$PWD clean
Remove all generated files for the module - the kernel
- source directory is not moddified.
+ source directory is not modified.
make -C $KDIR M=`pwd` help
help will list the available target when building external
@@ -99,7 +100,7 @@ when building an external module.
--- 2.3 Available options:
- $KDIR refer to path to kernel src
+ $KDIR refers to the path to the kernel source top-level directory
make -C $KDIR
Used to specify where to find the kernel source.
@@ -206,11 +207,11 @@ following files:
KERNELDIR := /lib/modules/`uname -r`/build
all::
- $(MAKE) -C $KERNELDIR M=`pwd` $@
+ $(MAKE) -C $(KERNELDIR) M=`pwd` $@
# Module specific targets
genbin:
- echo "X" > 8123_bini.o_shipped
+ echo "X" > 8123_bin.o_shipped
endif
@@ -341,13 +342,52 @@ directory and therefore needs to deal with this in their kbuild file.
EXTRA_CFLAGS := -Iinclude
8123-y := 8123_if.o 8123_pci.o 8123_bin.o
- Note that in the assingment there is no space between -I and the path.
- This is a kbuild limitation and no space must be present.
-
+ Note that in the assignment there is no space between -I and the path.
+ This is a kbuild limitation: there must be no space present.
+
+--- 5.3 External modules using several directories
+
+ If an external module does not follow the usual kernel style but
+ decides to spread files over several directories, then kbuild can
+ support this too.
+
+ Consider the following example:
+
+ |
+ +- src/complex_main.c
+ | +- hal/hardwareif.c
+ | +- hal/include/hardwareif.h
+ +- include/complex.h
+
+ To build a single module named complex.ko we then need the following
+ kbuild file:
+
+ Kbuild:
+ obj-m := complex.o
+ complex-y := src/complex_main.o
+ complex-y += src/hal/hardwareif.o
+
+ EXTRA_CFLAGS := -I$(src)/include
+ EXTRA_CFLAGS += -I$(src)/src/hal/include
+
+
+ kbuild knows how to handle .o files located in another directory -
+ although this is NOT recommended practice. The syntax is to specify
+ the directory relative to the directory where the Kbuild file is
+ located.
+
+ To find the .h files we have to explicitly tell kbuild where to look
+ for them. When kbuild executes, the current directory is always
+ the root of the kernel tree (the argument to -C), and therefore we
+ have to tell kbuild how to find the .h files using absolute paths.
+ $(src) specifies the absolute path to the directory where the
+ Kbuild file is located when building as an external module.
+ Therefore -I$(src)/ is used to point out the directory of the Kbuild
+ file, and any additional paths are just appended.
=== 6. Module installation
-Modules which are included in the kernel is installed in the directory:
+Modules which are included in the kernel are installed in the directory:
/lib/modules/$(KERNELRELEASE)/kernel
@@ -365,7 +405,7 @@ External modules are installed in the directory:
=> Install dir: /frodo/lib/modules/$(KERNELRELEASE)/kernel
INSTALL_MOD_PATH may be set as an ordinary shell variable or as in the
- example above be specified on the commandline when calling make.
+ example above be specified on the command line when calling make.
INSTALL_MOD_PATH has effect both when installing modules included in
the kernel as well as when installing external modules.
@@ -384,7 +424,7 @@ External modules are installed in the directory:
=== 7. Module versioning
-Module versioning are enabled by the CONFIG_MODVERSIONS tag.
+Module versioning is enabled by the CONFIG_MODVERSIONS tag.
Module versioning is used as a simple ABI consistency check. The Module
versioning creates a CRC value of the full prototype for an exported symbol and
diff --git a/Documentation/kdump/gdbmacros.txt b/Documentation/kdump/gdbmacros.txt
index bc1b9eb92ae..dcf5580380a 100644
--- a/Documentation/kdump/gdbmacros.txt
+++ b/Documentation/kdump/gdbmacros.txt
@@ -177,3 +177,25 @@ document trapinfo
'trapinfo <pid>' will tell you by which trap & possibly
addresthe kernel paniced.
end
+
+
+define dmesg
+ set $i = 0
+ set $end_idx = (log_end - 1) & (log_buf_len - 1)
+
+ while ($i < logged_chars)
+ set $idx = (log_end - 1 - logged_chars + $i) & (log_buf_len - 1)
+
+ if ($idx + 100 <= $end_idx) || \
+ ($end_idx <= $idx && $idx + 100 < log_buf_len)
+ printf "%.100s", &log_buf[$idx]
+ set $i = $i + 100
+ else
+ printf "%c", log_buf[$idx]
+ set $i = $i + 1
+ end
+ end
+end
+document dmesg
+ print the kernel ring buffer
+end
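The dmesg macro above relies on masking a running offset with (log_buf_len - 1) to walk a power-of-two ring buffer. The following is a small userspace sketch of that indexing idea only. The function name ring_tail() and its convention that `end` is one past the newest character are assumptions of this sketch; it does not reproduce the macro's exact offsets or its 100-character fast path.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy the newest `count` characters of a power-of-two ring buffer
 * into out[], oldest first, and NUL-terminate the result.
 * `end` is the running write position (one past the newest character)
 * and is allowed to exceed buf_len; out must hold count + 1 bytes.
 */
static void ring_tail(const char *buf, size_t buf_len, size_t end,
		      size_t count, char *out)
{
	size_t i;

	/* The mask trick only works for power-of-two sizes. */
	assert(buf_len != 0 && (buf_len & (buf_len - 1)) == 0);
	assert(count <= buf_len);

	for (i = 0; i < count; i++) {
		/* Masking folds the running offset back into the buffer. */
		size_t idx = (end - count + i) & (buf_len - 1);

		out[i] = buf[idx];
	}
	out[count] = '\0';
}
```

Because the buffer length is a power of two, the mask is equivalent to a modulo and stays correct even when the running position has advanced far past the buffer size.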
diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index 5f08f9ce604..212cf3c21ab 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -4,10 +4,10 @@ Documentation for kdump - the kexec-based crash dumping solution
DESIGN
======
-Kdump uses kexec to reboot to a second kernel whenever a dump needs to be taken.
-This second kernel is booted with very little memory. The first kernel reserves
-the section of memory that the second kernel uses. This ensures that on-going
-DMA from the first kernel does not corrupt the second kernel.
+Kdump uses kexec to reboot to a second kernel whenever a dump needs to be
+taken. This second kernel is booted with very little memory. The first kernel
+reserves the section of memory that the second kernel uses. This ensures that
+on-going DMA from the first kernel does not corrupt the second kernel.
All the necessary information about Core image is encoded in ELF format and
stored in reserved area of memory before crash. Physical address of start of
@@ -35,77 +35,82 @@ In the second kernel, "old memory" can be accessed in two ways.
SETUP
=====
-1) Download http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.101.tar.gz
- and apply http://lse.sourceforge.net/kdump/patches/kexec-tools-1.101-kdump.patch
- and after that build the source.
+1) Download the upstream kexec-tools userspace package from
+ http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.101.tar.gz.
-2) Download and build the appropriate (2.6.13-rc1 onwards) vanilla kernel.
+ Apply the latest consolidated kdump patch on top of kexec-tools-1.101
+ from http://lse.sourceforge.net/kdump/. This arrangement has been made
+ until all the userspace patches supporting kdump are integrated with
+ upstream kexec-tools userspace.
+2) Download and build the appropriate (2.6.13-rc1 onwards) vanilla kernels.
Two kernels need to be built in order to get this feature working.
+ Following are the steps to properly configure the two kernels specific
+ to kexec and kdump features:
- A) First kernel:
+ A) First kernel or regular kernel:
+ ----------------------------------
a) Enable "kexec system call" feature (in Processor type and features).
- CONFIG_KEXEC=y
- b) This kernel's physical load address should be the default value of
- 0x100000 (0x100000, 1 MB) (in Processor type and features).
- CONFIG_PHYSICAL_START=0x100000
- c) Enable "sysfs file system support" (in Pseudo filesystems).
- CONFIG_SYSFS=y
+ CONFIG_KEXEC=y
+ b) Enable "sysfs file system support" (in Pseudo filesystems).
+ CONFIG_SYSFS=y
+ c) make
d) Boot into first kernel with the command line parameter "crashkernel=Y@X".
Use appropriate values for X and Y. Y denotes how much memory to reserve
- for the second kernel, and X denotes at what physical address the reserved
- memory section starts. For example: "crashkernel=64M@16M".
-
- B) Second kernel:
- a) Enable "kernel crash dumps" feature (in Processor type and features).
- CONFIG_CRASH_DUMP=y
- b) Specify a suitable value for "Physical address where the kernel is
- loaded" (in Processor type and features). Typically this value
- should be same as X (See option d) above, e.g., 16 MB or 0x1000000.
- CONFIG_PHYSICAL_START=0x1000000
- c) Enable "/proc/vmcore support" (Optional, in Pseudo filesystems).
- CONFIG_PROC_VMCORE=y
- d) Disable SMP support and build a UP kernel (Until it is fixed).
- CONFIG_SMP=n
- e) Enable "Local APIC support on uniprocessors".
- CONFIG_X86_UP_APIC=y
- f) Enable "IO-APIC support on uniprocessors"
- CONFIG_X86_UP_IOAPIC=y
-
- Note: i) Options a) and b) depend upon "Configure standard kernel features
- (for small systems)" (under General setup).
- ii) Option a) also depends on CONFIG_HIGHMEM (under Processor
- type and features).
- iii) Both option a) and b) are under "Processor type and features".
-
-3) Boot into the first kernel. You are now ready to try out kexec-based crash
- dumps.
-
-4) Load the second kernel to be booted using:
+ for the second kernel, and X denotes at what physical address the
+ reserved memory section starts. For example: "crashkernel=64M@16M".
+
+
+ B) Second kernel or dump capture kernel:
+ ---------------------------------------
+ a) For i386 architecture enable Highmem support
+ CONFIG_HIGHMEM=y
+ b) Enable "kernel crash dumps" feature (under "Processor type and features")
+ CONFIG_CRASH_DUMP=y
+ c) Make sure a suitable value is set for "Physical address where the
+ kernel is loaded" (under "Processor type and features"). By default
+ this value is 0x1000000 (16 MB), and it should be the same as X (see
+ option d) above), e.g., 16 MB or 0x1000000.
+ CONFIG_PHYSICAL_START=0x1000000
+ d) Enable "/proc/vmcore support" (Optional, under "Pseudo filesystems").
+ CONFIG_PROC_VMCORE=y
+
+3) After booting into the first (regular) kernel, load the second kernel
+ using the following command:
kexec -p <second-kernel> --args-linux --elf32-core-headers
- --append="root=<root-dev> init 1 irqpoll"
-
- Note: i) <second-kernel> has to be a vmlinux image. bzImage will not work,
- as of now.
- ii) By default ELF headers are stored in ELF64 format. Option
- --elf32-core-headers forces generation of ELF32 headers. gdb can
- not open ELF64 headers on 32 bit systems. So creating ELF32
- headers can come handy for users who have got non-PAE systems and
- hence have memory less than 4GB.
- iii) Specify "irqpoll" as command line parameter. This reduces driver
- initialization failures in second kernel due to shared interrupts.
- iv) <root-dev> needs to be specified in a format corresponding to
- the root device name in the output of mount command.
- v) If you have built the drivers required to mount root file
- system as modules in <second-kernel>, then, specify
- --initrd=<initrd-for-second-kernel>.
-
-5) System reboots into the second kernel when a panic occurs. A module can be
- written to force the panic or "ALT-SysRq-c" can be used initiate a crash
- dump for testing purposes.
-
-6) Write out the dump file using
+ --append="root=<root-dev> init 1 irqpoll maxcpus=1"
+
+ Notes:
+ ======
+ i) <second-kernel> has to be a vmlinux image, i.e. an uncompressed ELF
+ image. bzImage will not work, as of now.
+ ii) --args-linux has to be specified because, when kexec is loading an
+ ELF image, it needs to know that the arguments supplied are of Linux
+ type.
+ iii) By default ELF headers are stored in ELF64 format to support systems
+ with more than 4GB memory. Option --elf32-core-headers forces generation
+ of ELF32 headers. This option exists because, as of now, gdb cannot
+ open a vmcore file with ELF64 headers on 32-bit systems. So ELF32
+ headers can be used on non-PAE systems, which have less than 4GB of
+ memory.
+ iv) Specify "irqpoll" as a command line parameter. This reduces driver
+ initialization failures in the second kernel due to shared interrupts.
+ v) <root-dev> needs to be specified in a format corresponding to the root
+ device name in the output of the mount command.
+ vi) If you have built the drivers required to mount the root file system
+ as modules in <second-kernel>, then specify
+ --initrd=<initrd-for-second-kernel>.
+ vii) Specify maxcpus=1 because, if a panic happens on a non-boot CPU
+ during the first kernel's run, the second kernel does not seem to be
+ able to boot up all the CPUs. The other option is to always build the
+ second kernel without SMP support, i.e. CONFIG_SMP=n.
+
+4) After the second kernel has been loaded successfully as above, the
+ system reboots into the second kernel if a panic occurs. A module can be
+ written to force the panic, or "ALT-SysRq-c" can be used to initiate a
+ crash dump for testing purposes.
+
+5) Once the second kernel has booted, write out the dump file using
cp /proc/vmcore <dump-file>
@@ -119,9 +124,9 @@ SETUP
Entire memory: dd if=/dev/oldmem of=oldmem.001
+
ANALYSIS
========
-
Limited analysis can be done using gdb on the dump file copied out of
/proc/vmcore. Use vmlinux built with -g and run
@@ -132,15 +137,19 @@ work fine.
Note: gdb cannot analyse core files generated in ELF64 format for i386.
+The latest "crash" (crash-4.0-2.18), available on Dave Anderson's site
+http://people.redhat.com/~anderson/, works well with the kdump format.
+
+
TODO
====
-
1) Provide a kernel pages filtering mechanism so that core file size is not
insane on systems having huge memory banks.
-2) Modify "crash" tool to make it recognize this dump.
+2) A relocatable kernel can help avoid maintaining multiple kernels for
+ crashdump, as the same kernel as the first kernel can then be used to
+ capture the dump.
+
CONTACT
=======
-
Vivek Goyal (vgoyal@in.ibm.com)
Maneesh Soni (maneesh@in.ibm.com)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 5dffcfefc3c..1cbcf65b764 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -471,14 +471,15 @@ running once the system is up.
arch/i386/kernel/cpu/cpufreq/elanfreq.c.
elevator= [IOSCHED]
- Format: {"as" | "cfq" | "deadline" | "noop"}
+ Format: {"anticipatory" | "cfq" | "deadline" | "noop"}
See Documentation/block/as-iosched.txt and
Documentation/block/deadline-iosched.txt for details.
- elfcorehdr= [IA-32]
+ elfcorehdr= [IA-32, X86_64]
Specifies physical address of start of kernel core
- image elf header.
- See Documentation/kdump.txt for details.
+ image elf header. Generally the kexec loader will
+ pass this option to the capture kernel.
+ See Documentation/kdump/kdump.txt for details.
enforcing [SELINUX] Set initial enforcing status.
Format: {"0" | "1"}
@@ -633,6 +634,14 @@ running once the system is up.
inport.irq= [HW] Inport (ATI XL and Microsoft) busmouse driver
Format: <irq>
+ combined_mode= [HW] control which driver uses IDE ports in combined
+ mode: legacy IDE driver, libata, or both
+ (in the libata case, libata.atapi_enabled=1 may be
+ useful as well). Note that using the ide or libata
+ options may affect your device naming (e.g. by
+ changing hdc to sdb).
+ Format: combined (default), ide, or libata
+
inttest= [IA64]
io7= [HW] IO7 for Marvel based alpha systems
@@ -703,9 +712,17 @@ running once the system is up.
load_ramdisk= [RAM] List of ramdisks to load from floppy
See Documentation/ramdisk.txt.
- lockd.udpport= [NFS]
+ lockd.nlm_grace_period=P [NFS] Assign grace period.
+ Format: <integer>
+
+ lockd.nlm_tcpport=N [NFS] Assign TCP port.
+ Format: <integer>
+
+ lockd.nlm_timeout=T [NFS] Assign timeout value.
+ Format: <integer>
- lockd.tcpport= [NFS]
+ lockd.nlm_udpport=M [NFS] Assign UDP port.
+ Format: <integer>
logibm.irq= [HW,MOUSE] Logitech Bus Mouse Driver
Format: <irq>
@@ -824,7 +841,7 @@ running once the system is up.
mem=nopentium [BUGS=IA-32] Disable usage of 4MB pages for kernel
memory.
- memmap=exactmap [KNL,IA-32] Enable setting of an exact
+ memmap=exactmap [KNL,IA-32,X86_64] Enable setting of an exact
E820 memory map, as specified by the user.
Such memmap=exactmap lines can be constructed based on
BIOS output or other requirements. See the memmap=nn@ss
@@ -847,6 +864,49 @@ running once the system is up.
mga= [HW,DRM]
+ migration_cost=
+ [KNL,SMP] debug: override scheduler migration costs
+ Format: <level-1-usecs>,<level-2-usecs>,...
+ This debugging option can be used to override the
+ default scheduler migration cost matrix. The numbers
+ are indexed by 'CPU domain distance'.
+ E.g. migration_cost=1000,2000,3000 on an SMT NUMA
+ box will set up an intra-core migration cost of
+ 1 msec, an inter-core migration cost of 2 msecs,
+ and an inter-node migration cost of 3 msecs.
+
+ WARNING: using the wrong values here can break
+ scheduler performance, so it's only for scheduler
+ development purposes, not production environments.
+
+ migration_debug=
+ [KNL,SMP] migration cost auto-detect verbosity
+ Format: <0|1|2>
+ If a system's migration matrix reported at bootup
+ seems erroneous then this option can be used to
+ increase verbosity of the detection process.
+ We default to 0 (no extra messages), 1 will print
+ some more information, and 2 will be really
+ verbose (probably only useful if you also have a
+ serial console attached to the system).
+
+ migration_factor=
+ [KNL,SMP] multiply/divide migration costs by a factor
+ Format: <percent>
+ This debug option can be used to proportionally
+ increase or decrease the auto-detected migration
+ costs for all entries of the migration matrix.
+ E.g. migration_factor=150 will increase migration
+ costs by 50% (and thus the scheduler will be less
+ eager to migrate cache-hot tasks).
+ migration_factor=80 will decrease migration costs
+ by 20% (thus the scheduler will be more eager to
+ migrate tasks).
+
+ WARNING: using the wrong values here can break
+ scheduler performance, so it's only for scheduler
+ development purposes, not production environments.
+
mousedev.tap_time=
[MOUSE] Maximum time between finger touching and
leaving touchpad surface for touch to be considered
@@ -902,6 +962,14 @@ running once the system is up.
nfsroot= [NFS] nfs root filesystem for disk-less boxes.
See Documentation/nfsroot.txt.
+ nfs.callback_tcpport=
+ [NFS] set the TCP port on which the NFSv4 callback
+ channel should listen.
+
+ nfs.idmap_cache_timeout=
+ [NFS] set the maximum lifetime for idmapper cache
+ entries.
+
nmi_watchdog= [KNL,BUGS=IA-32] Debugging features for SMP kernels
no387 [BUGS=IA-32] Tells the kernel to use the 387 maths
@@ -982,6 +1050,8 @@ running once the system is up.
nowb [ARM]
+ nr_uarts= [SERIAL] maximum number of UARTs to be registered.
+
opl3= [HW,OSS]
Format: <io>
@@ -1160,6 +1230,10 @@ running once the system is up.
Limit processor to maximum C-state
max_cstate=9 overrides any DMI blacklist limit.
+ processor.nocst [HW,ACPI]
+ Ignore the _CST method to determine C-states,
+ instead using the legacy FADT method
+
prompt_ramdisk= [RAM] List of RAM disks to prompt for floppy disk
before loading.
See Documentation/ramdisk.txt.
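The migration_factor= entry in this file describes a percentage scaling of the auto-detected migration costs. As a quick check of the arithmetic in its examples, here is a minimal sketch. The function name and the integer rounding are assumptions; how the scheduler itself applies the factor is not shown in this patch.

```c
/*
 * Scale a migration cost (in usecs) by a percentage factor.
 * A factor of 100 leaves the cost unchanged; integer division
 * truncates any fractional microsecond.
 */
static unsigned long long scale_migration_cost(unsigned long long cost_usecs,
						unsigned int factor_percent)
{
	return cost_usecs * factor_percent / 100;
}
```

With a detected cost of 1000 usecs, a factor of 150 gives 1500 usecs (50% higher) and a factor of 80 gives 800 usecs (20% lower), matching the examples in the parameter description.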
diff --git a/Documentation/keys-request-key.txt b/Documentation/keys-request-key.txt
index 5f2b9c5edbb..22488d79116 100644
--- a/Documentation/keys-request-key.txt
+++ b/Documentation/keys-request-key.txt
@@ -56,10 +56,12 @@ A request proceeds in the following manner:
(4) request_key() then forks and executes /sbin/request-key with a new session
keyring that contains a link to auth key V.
- (5) /sbin/request-key execs an appropriate program to perform the actual
+ (5) /sbin/request-key assumes the authority associated with key U.
+
+ (6) /sbin/request-key execs an appropriate program to perform the actual
instantiation.
- (6) The program may want to access another key from A's context (say a
+ (7) The program may want to access another key from A's context (say a
Kerberos TGT key). It just requests the appropriate key, and the keyring
search notes that the session keyring has auth key V in its bottom level.
@@ -67,19 +69,19 @@ A request proceeds in the following manner:
UID, GID, groups and security info of process A as if it was process A,
and come up with key W.
- (7) The program then does what it must to get the data with which to
+ (8) The program then does what it must to get the data with which to
instantiate key U, using key W as a reference (perhaps it contacts a
Kerberos server using the TGT) and then instantiates key U.
- (8) Upon instantiating key U, auth key V is automatically revoked so that it
+ (9) Upon instantiating key U, auth key V is automatically revoked so that it
may not be used again.
- (9) The program then exits 0 and request_key() deletes key V and returns key
+(10) The program then exits 0 and request_key() deletes key V and returns key
U to the caller.
-This also extends further. If key W (step 5 above) didn't exist, key W would be
-created uninstantiated, another auth key (X) would be created [as per step 3]
-and another copy of /sbin/request-key spawned [as per step 4]; but the context
+This also extends further. If key W (step 7 above) didn't exist, key W would be
+created uninstantiated, another auth key (X) would be created (as per step 3)
+and another copy of /sbin/request-key spawned (as per step 4); but the context
specified by auth key X will still be process A, as it was in auth key V.
This is because process A's keyrings can't simply be attached to
@@ -138,8 +140,8 @@ until one succeeds:
(3) The process's session keyring is searched.
- (4) If the process has a request_key() authorisation key in its session
- keyring then:
+ (4) If the process has assumed the authority associated with a request_key()
+ authorisation key then:
(a) If extant, the calling process's thread keyring is searched.
diff --git a/Documentation/keys.txt b/Documentation/keys.txt
index 31154882000..aaa01b0e3ee 100644
--- a/Documentation/keys.txt
+++ b/Documentation/keys.txt
@@ -308,6 +308,8 @@ process making the call:
KEY_SPEC_USER_KEYRING -4 UID-specific keyring
KEY_SPEC_USER_SESSION_KEYRING -5 UID-session keyring
KEY_SPEC_GROUP_KEYRING -6 GID-specific keyring
+ KEY_SPEC_REQKEY_AUTH_KEY -7 assumed request_key()
+ authorisation key
The main syscalls are:
@@ -498,7 +500,11 @@ The keyctl syscall functions are:
keyring is full, error ENFILE will result.
The link procedure checks the nesting of the keyrings, returning ELOOP if
- it appears to deep or EDEADLK if the link would introduce a cycle.
+ it appears too deep or EDEADLK if the link would introduce a cycle.
+
+ Any links within the keyring to keys that match the new key in terms of
+ type and description will be discarded from the keyring as the new one is
+ added.
(*) Unlink a key or keyring from another keyring:
@@ -628,6 +634,41 @@ The keyctl syscall functions are:
there is one, otherwise the user default session keyring.
+ (*) Set the timeout on a key.
+
+ long keyctl(KEYCTL_SET_TIMEOUT, key_serial_t key, unsigned timeout);
+
+ This sets or clears the timeout on a key. The timeout can be 0 to clear
+ the timeout or a number of seconds to set the expiry time that far into
+ the future.
+
+ The process must have attribute modification access on a key to set its
+ timeout. Timeouts may not be set with this function on negative, revoked
+ or expired keys.
+
+
+ (*) Assume the authority granted to instantiate a key
+
+ long keyctl(KEYCTL_ASSUME_AUTHORITY, key_serial_t key);
+
+ This assumes or divests the authority required to instantiate the
+ specified key. Authority can only be assumed if the thread has the
+ authorisation key associated with the specified key in its keyrings
+ somewhere.
+
+ Once authority is assumed, searches for keys will also search the
+ requester's keyrings using the requester's security label, UID, GID and
+ groups.
+
+ If the requested authority is unavailable, error EPERM will be returned,
+ likewise if the authority has been revoked because the target key is
+ already instantiated.
+
+ If the specified key is 0, then any assumed authority will be divested.
+
+ The assumed authoritative key is inherited across fork and exec.
+
+
===============
KERNEL SERVICES
===============
@@ -860,24 +901,6 @@ The structure has a number of fields, some of which are mandatory:
It is safe to sleep in this method.
- (*) int (*duplicate)(struct key *key, const struct key *source);
-
- If this type of key can be duplicated, then this method should be
- provided. It is called to copy the payload attached to the source into the
- new key. The data length on the new key will have been updated and the
- quota adjusted already.
-
- This method will be called with the source key's semaphore read-locked to
- prevent its payload from being changed, thus RCU constraints need not be
- applied to the source key.
-
- This method does not have to lock the destination key in order to attach a
- payload. The fact that KEY_FLAG_INSTANTIATED is not set in key->flags
- prevents anything else from gaining access to the key.
-
- It is safe to sleep in this method.
-
-
(*) int (*update)(struct key *key, const void *data, size_t datalen);
If this type of key can be updated, then this method should be provided.
diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt
index 0541fe1de70..0ea5a0c6e82 100644
--- a/Documentation/kprobes.txt
+++ b/Documentation/kprobes.txt
@@ -411,7 +411,8 @@ int init_module(void)
printk("Couldn't find %s to plant kprobe\n", "do_fork");
return -1;
}
- if ((ret = register_kprobe(&kp) < 0)) {
+ ret = register_kprobe(&kp);
+ if (ret < 0) {
printk("register_kprobe failed, returned %d\n", ret);
return -1;
}
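The kprobes hunk above fixes an operator-precedence problem: in `ret = register_kprobe(&kp) < 0`, the comparison binds tighter than the assignment, so ret receives the result of the comparison (0 or 1) rather than the error code. The sketch below reproduces the effect in userspace. fake_register() is a hypothetical stand-in that always fails with -22; it is not the kprobes API.

```c
/* Hypothetical stand-in for a registration call that fails with -22. */
static int fake_register(void)
{
	return -22;
}

/* Old form: parsed as ret = (fake_register() < 0), so ret becomes 1. */
static int buggy_form(void)
{
	int ret;

	if ((ret = fake_register() < 0))
		return ret;
	return 0;
}

/* Fixed form: assign first, then test, so the error code survives. */
static int fixed_form(void)
{
	int ret;

	ret = fake_register();
	if (ret < 0)
		return ret;
	return 0;
}
```

Both forms detect the failure, but only the fixed form preserves the real error value for the "returned %d" message that follows.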
diff --git a/Documentation/laptop-mode.txt b/Documentation/laptop-mode.txt
index dc4e810afdc..b18e2167590 100644
--- a/Documentation/laptop-mode.txt
+++ b/Documentation/laptop-mode.txt
@@ -3,7 +3,7 @@ How to conserve battery power using laptop-mode
Document Author: Bart Samwel (bart@samwel.tk)
Date created: January 2, 2004
-Last modified: July 10, 2004
+Last modified: December 06, 2004
Introduction
------------
@@ -33,7 +33,7 @@ or anything. Simply install all the files included in this document, and
laptop mode will automatically be started when you're on battery. For
your convenience, a tarball containing an installer can be downloaded at:
-http://www.xs4all.nl/~bsamwel/laptop_mode/tools
+http://www.xs4all.nl/~bsamwel/laptop_mode/tools/
To configure laptop mode, you need to edit the configuration file, which is
located in /etc/default/laptop-mode on Debian-based systems, or in
@@ -357,7 +357,7 @@ MAX_AGE=${MAX_AGE:-'600'}
# Read-ahead, in kilobytes
READAHEAD=${READAHEAD:-'4096'}
-# Shall we remount journaled fs. with appropiate commit interval? (1=yes)
+# Shall we remount journaled fs. with appropriate commit interval? (1=yes)
DO_REMOUNTS=${DO_REMOUNTS:-'1'}
# And shall we add the "noatime" option to that as well? (1=yes)
@@ -912,7 +912,7 @@ void usage()
exit(0);
}
-int main(int ac, char **av)
+int main(int argc, char **argv)
{
int fd;
char *disk = 0;
diff --git a/Documentation/locks.txt b/Documentation/locks.txt
index ce1be79edfb..e3b402ef33b 100644
--- a/Documentation/locks.txt
+++ b/Documentation/locks.txt
@@ -65,20 +65,3 @@ The default is to disallow mandatory locking. The intention is that
mandatory locking only be enabled on a local filesystem as the specific need
arises.
-Until an updated version of mount(8) becomes available you may have to apply
-this patch to the mount sources (based on the version distributed with Rick
-Faith's util-linux-2.5 package):
-
-*** mount.c.orig Sat Jun 8 09:14:31 1996
---- mount.c Sat Jun 8 09:13:02 1996
-***************
-*** 100,105 ****
---- 100,107 ----
- { "noauto", 0, MS_NOAUTO }, /* Can only be mounted explicitly */
- { "user", 0, MS_USER }, /* Allow ordinary user to mount */
- { "nouser", 1, MS_USER }, /* Forbid ordinary user to mount */
-+ { "mand", 0, MS_MANDLOCK }, /* Allow mandatory locks on this FS */
-+ { "nomand", 1, MS_MANDLOCK }, /* Forbid mandatory locks on this FS */
- /* add new options here */
- #ifdef MS_NOSUB
- { "sub", 1, MS_NOSUB }, /* allow submounts */
diff --git a/Documentation/md.txt b/Documentation/md.txt
index 23e6cce40f9..03a13c462cf 100644
--- a/Documentation/md.txt
+++ b/Documentation/md.txt
@@ -51,6 +51,30 @@ superblock can be autodetected and run at boot time.
The kernel parameter "raid=partitionable" (or "raid=part") means
that all auto-detected arrays are assembled as partitionable.
+Boot time assembly of degraded/dirty arrays
+-------------------------------------------
+
+If a raid5 or raid6 array is both dirty and degraded, it could have
+undetectable data corruption. This is because the fact that it is
+'dirty' means that the parity cannot be trusted, and the fact that it
+is degraded means that some datablocks are missing and cannot reliably
+be reconstructed (due to no parity).
+
+For this reason, md will normally refuse to start such an array. This
+requires the sysadmin to take action to explicitly start the array
+despite possible corruption. This is normally done with
+ mdadm --assemble --force ....
+
+This option is not really available if the array has the root
+filesystem on it. In order to support booting from such an
+array, md supports a module parameter "start_dirty_degraded" which,
+when set to 1, bypasses the checks and allows dirty degraded
+arrays to be started.
+
+So, to boot with a root filesystem of a dirty degraded raid[56], use
+
+ md-mod.start_dirty_degraded=1
+
Superblock formats
------------------
@@ -141,6 +165,70 @@ All md devices contain:
in a fully functional array. If this is not yet known, the file
will be empty. If an array is being resized (not currently
possible) this will contain the larger of the old and new sizes.
+ Some raid levels (RAID1) allow this value to be set while the
+ array is active. This will reconfigure the array. Otherwise
+ it can only be set while assembling an array.
+
+ chunk_size
+ This is the size in bytes for 'chunks' and is only relevant to
+ raid levels that involve striping (0,4,5,6,10). The address space
+ of the array is conceptually divided into chunks and consecutive
+ chunks are striped onto neighbouring devices.
+ The size should be at least PAGE_SIZE (4k) and should be a power
+ of 2. This can only be set while assembling an array.
+
+ component_size
+ For arrays with data redundancy (i.e. not raid0, linear, faulty,
+ multipath), all components must be the same size - or at least
+ there must be a size that they all provide space for. This is a key
+ part of the geometry of the array. It is measured in sectors
+ and can be read from here. Writing to this value may resize
+ the array if the personality supports it (raid1, raid5, raid6),
+ and if the component drives are large enough.
+
+ metadata_version
+ This indicates the format that is being used to record metadata
+ about the array. It can be 0.90 (traditional format), 1.0, 1.1,
+ 1.2 (newer format in varying locations) or "none" indicating that
+ the kernel isn't managing metadata at all.
+
+ level
+ The raid 'level' for this array. The name will often (but not
+ always) be the same as the name of the module that implements the
+ level. To be auto-loaded the module must have an alias
+ md-$LEVEL e.g. md-raid5
+ This can be written only while the array is being assembled, not
+ after it is started.
+
+ new_dev
+ This file can be written but not read. The value written should
+ be a block device number as major:minor. e.g. 8:0
+ This will cause that device to be attached to the array, if it is
+ available. It will then appear at md/dev-XXX (depending on the
+ name of the device) and further configuration is then possible.
+
+ sync_speed_min
+ sync_speed_max
+ These are similar to /proc/sys/dev/raid/speed_limit_{min,max}
+ however they only apply to the particular array.
+ If no value has been written to these, or if the word 'system'
+ is written, then the system-wide value is used. If a value
+ in kibibytes-per-second is written, then it is used.
+ When the files are read, they show the currently active value
+ followed by "(local)" or "(system)" depending on whether it is
+ a locally set or system-wide value.
+
+ sync_completed
+ This shows the number of sectors that have been completed of
+ whatever the current sync_action is, followed by the number of
+ sectors in total that could need to be processed. The two
+ numbers are separated by a '/' thus effectively showing one
+ value, a fraction of the process that is complete.
+
+ sync_speed
+ This shows the current actual speed, in K/sec, of the current
+ sync_action. It is averaged over the last 30 seconds.
+
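As an illustration of the sync_completed format described above, the
following is a minimal userspace sketch in C. It assumes the file's text
has already been read into a string; the helper names
parse_sync_completed() and sync_percent() are hypothetical and not part
of md or of any library.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse the text of sync_completed, described above as two sector counts
 * separated by a '/'.  Returns 0 and fills *done and *total on success,
 * or -1 if the text does not contain two numbers separated by a '/'.
 */
int parse_sync_completed(const char *text,
                         unsigned long long *done,
                         unsigned long long *total)
{
    assert(text && done && total);
    /* Spaces in the format match any amount of whitespace, including none. */
    if (sscanf(text, "%llu / %llu", done, total) != 2)
        return -1;
    return 0;
}

/* Fraction complete as a percentage, or -1.0 when total is zero. */
double sync_percent(unsigned long long done, unsigned long long total)
{
    if (total == 0)
        return -1.0;
    return 100.0 * (double)done / (double)total;
}
```

A caller would read the attribute with ordinary file I/O, pass the text
to parse_sync_completed(), and treat a -1 return as "no progress
information available".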
As component devices are added to an md array, they appear in the 'md'
directory as new directories named
@@ -167,6 +255,38 @@ Each directory contains:
of being recoverred to
This list make grow in future.
+ errors
+ An approximate count of read errors that have been detected on
+ this device but have not caused the device to be evicted from
+ the array (either because they were corrected or because they
+ happened while the array was read-only). When using version-1
+ metadata, this value persists across restarts of the array.
+
+ This value can be written while assembling an array thus
+ providing an ongoing count for arrays with metadata managed by
+ userspace.
+
+ slot
+ This gives the role that the device has in the array. It will
+ either be 'none' if the device is not active in the array
+ (i.e. is a spare or has failed) or an integer less than the
+ 'raid_disks' number for the array indicating which position
+ it currently fills. This can only be set while assembling an
+ array. A device for which this is set is assumed to be working.
+
+ offset
+ This gives the location in the device (in sectors from the
+ start) where data from the array will be stored. Any part of
+ the device before this offset is not touched, unless it is
+ used for storing metadata (Formats 1.1 and 1.2).
+
+ size
+ The amount of the device, after the offset, that can be used
+ for storage of data. This will normally be the same as the
+ component_size. This can be written while assembling an
+ array. If a value less than the current component_size is
+ written, component_size will be reduced to this value.
+
An active md device will also contain and entry for each active device
in the array. These are named
diff --git a/Documentation/mutex-design.txt b/Documentation/mutex-design.txt
new file mode 100644
index 00000000000..cbf79881a41
--- /dev/null
+++ b/Documentation/mutex-design.txt
@@ -0,0 +1,135 @@
+Generic Mutex Subsystem
+
+started by Ingo Molnar <mingo@redhat.com>
+
+ "Why on earth do we need a new mutex subsystem, and what's wrong
+ with semaphores?"
+
+firstly, there's nothing wrong with semaphores. But if the simpler
+mutex semantics are sufficient for your code, then there are a couple
+of advantages of mutexes:
+
+ - 'struct mutex' is smaller on most architectures: e.g. on x86,
+ 'struct semaphore' is 20 bytes, 'struct mutex' is 16 bytes.
+ A smaller structure size means less RAM footprint, and better
+ CPU-cache utilization.
+
+ - tighter code. On x86 i get the following .text sizes when
+ switching all mutex-alike semaphores in the kernel to the mutex
+ subsystem:
+
+ text data bss dec hex filename
+ 3280380 868188 396860 4545428 455b94 vmlinux-semaphore
+ 3255329 865296 396732 4517357 44eded vmlinux-mutex
+
+ that's 25051 bytes of code saved, or a 0.76% win - off the hottest
+ codepaths of the kernel. (The .data savings are 2892 bytes, or 0.33%)
+ Smaller code means better icache footprint, which is one of the
+ major optimization goals in the Linux kernel currently.
+
+ - the mutex subsystem is slightly faster and has better scalability for
+ contended workloads. On an 8-way x86 system, running a mutex-based
+ kernel and testing creat+unlink+close (of separate, per-task files)
+ in /tmp with 16 parallel tasks, the average number of ops/sec is:
+
+ Semaphores: Mutexes:
+
+ $ ./test-mutex V 16 10 $ ./test-mutex V 16 10
+ 8 CPUs, running 16 tasks. 8 CPUs, running 16 tasks.
+ checking VFS performance. checking VFS performance.
+ avg loops/sec: 34713 avg loops/sec: 84153
+ CPU utilization: 63% CPU utilization: 22%
+
+ i.e. in this workload, the mutex based kernel was 2.4 times faster
+ than the semaphore based kernel, _and_ it also had 2.8 times less CPU
+ utilization. (In terms of 'ops per CPU cycle', the semaphore kernel
+ performed 551 ops/sec per 1% of CPU time used, while the mutex kernel
+ performed 3825 ops/sec per 1% of CPU time used - it was 6.9 times
+ more efficient.)
+
+ the scalability difference is visible even on a 2-way P4 HT box:
+
+ Semaphores: Mutexes:
+
+ $ ./test-mutex V 16 10 $ ./test-mutex V 16 10
+ 4 CPUs, running 16 tasks. 8 CPUs, running 16 tasks.
+ checking VFS performance. checking VFS performance.
+ avg loops/sec: 127659 avg loops/sec: 181082
+ CPU utilization: 100% CPU utilization: 34%
+
+ (the straight performance advantage of mutexes is 41%, the per-cycle
+ efficiency of mutexes is 4.1 times better.)
+
+ - there are no fastpath tradeoffs, the mutex fastpath is just as tight
+ as the semaphore fastpath. On x86, the locking fastpath is 2
+ instructions:
+
+ c0377ccb <mutex_lock>:
+ c0377ccb: f0 ff 08 lock decl (%eax)
+ c0377cce: 78 0e js c0377cde <.text.lock.mutex>
+ c0377cd0: c3 ret
+
+ the unlocking fastpath is equally tight:
+
+ c0377cd1 <mutex_unlock>:
+ c0377cd1: f0 ff 00 lock incl (%eax)
+ c0377cd4: 7e 0f jle c0377ce5 <.text.lock.mutex+0x7>
+ c0377cd6: c3 ret
+
+ - 'struct mutex' semantics are well-defined and are enforced if
+ CONFIG_DEBUG_MUTEXES is turned on. Semaphores on the other hand have
+ virtually no debugging code or instrumentation. The mutex subsystem
+ checks and enforces the following rules:
+
+ * - only one task can hold the mutex at a time
+ * - only the owner can unlock the mutex
+ * - multiple unlocks are not permitted
+ * - recursive locking is not permitted
+ * - a mutex object must be initialized via the API
+ * - a mutex object must not be initialized via memset or copying
+ * - task may not exit with mutex held
+ * - memory areas where held locks reside must not be freed
+ * - held mutexes must not be reinitialized
+ * - mutexes may not be used in irq contexts
+
+ furthermore, there are also convenience features in the debugging
+ code:
+
+ * - uses symbolic names of mutexes, whenever they are printed in debug output
+ * - point-of-acquire tracking, symbolic lookup of function names
+ * - list of all locks held in the system, printout of them
+ * - owner tracking
+ * - detects self-recursing locks and prints out all relevant info
+ * - detects multi-task circular deadlocks and prints out all affected
+ * locks and tasks (and only those tasks)
+
+Disadvantages
+-------------
+
+The stricter mutex API means you cannot use mutexes the same way you
+can use semaphores: e.g. they cannot be used from an interrupt context,
+nor can they be unlocked from a different context than that which acquired
+them. [ I'm not aware of any other (e.g. performance) disadvantages from
+using mutexes at the moment, please let me know if you find any. ]
+
+Implementation of mutexes
+-------------------------
+
+'struct mutex' is the new mutex type, defined in include/linux/mutex.h
+and implemented in kernel/mutex.c. It is a counter-based mutex with a
+spinlock and a wait-list. The counter has 3 states: 1 for "unlocked",
+0 for "locked" and negative numbers (usually -1) for "locked, potential
+waiters queued".
+
+the APIs of 'struct mutex' have been streamlined:
+
+ DEFINE_MUTEX(name);
+
+ mutex_init(mutex);
+
+ void mutex_lock(struct mutex *lock);
+ int mutex_lock_interruptible(struct mutex *lock);
+ int mutex_trylock(struct mutex *lock);
+ void mutex_unlock(struct mutex *lock);
+ int mutex_is_locked(struct mutex *lock);
+
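The three counter states and the two-instruction fastpaths described
above can be modelled in userspace with C11 atomics. This is only a
sketch of the counting scheme, not the kernel implementation: the
toy_mutex names are hypothetical, the spinlock, wait-list and slowpath
are left out, and the compare-and-swap trylock is an assumption about
one reasonable way to implement it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* count: 1 = unlocked, 0 = locked, negative = locked, waiters possible */
struct toy_mutex {
    atomic_int count;
};

void toy_mutex_init(struct toy_mutex *m)
{
    assert(m);
    atomic_init(&m->count, 1);
}

bool toy_mutex_is_locked(struct toy_mutex *m)
{
    return atomic_load(&m->count) != 1;
}

/*
 * Mirrors "lock decl; js slowpath": decrement, and succeed only if the
 * new value is not negative.  A false return means a real implementation
 * would now enter its slowpath and queue the caller.
 */
bool toy_mutex_lock_fastpath(struct toy_mutex *m)
{
    int new_count = atomic_fetch_sub(&m->count, 1) - 1;
    return new_count >= 0;
}

/* Take the lock only if it is currently unlocked (1 -> 0); never waits. */
bool toy_mutex_trylock(struct toy_mutex *m)
{
    int expected = 1;
    return atomic_compare_exchange_strong(&m->count, &expected, 0);
}

/*
 * Mirrors "lock incl; jle slowpath": increment, and finish only if the
 * new value is positive.  A false return means waiters may be queued and
 * a real implementation would go on to wake one in its slowpath.
 */
bool toy_mutex_unlock_fastpath(struct toy_mutex *m)
{
    int new_count = atomic_fetch_add(&m->count, 1) + 1;
    return new_count > 0;
}
```

In the uncontended case the counter simply moves 1 -> 0 -> 1 and both
fastpath helpers return true; a second locker drives it to -1, which is
what makes the later unlock fall through to the slowpath.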
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index b0fe41da007..8d8b4e5ea18 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -945,7 +945,6 @@ bond0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4
collisions:0 txqueuelen:0
eth0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4
- inet addr:XXX.XXX.XXX.YYY Bcast:XXX.XXX.XXX.255 Mask:255.255.252.0
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:3573025 errors:0 dropped:0 overruns:0 frame:0
TX packets:1643167 errors:1 dropped:0 overruns:1 carrier:0
@@ -953,7 +952,6 @@ eth0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4
Interrupt:10 Base address:0x1080
eth1 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4
- inet addr:XXX.XXX.XXX.YYY Bcast:XXX.XXX.XXX.255 Mask:255.255.252.0
UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
RX packets:3651769 errors:0 dropped:0 overruns:0 frame:0
TX packets:1643480 errors:0 dropped:0 overruns:0 carrier:0
diff --git a/Documentation/networking/gianfar.txt b/Documentation/networking/gianfar.txt
new file mode 100644
index 00000000000..ad474ea07d0
--- /dev/null
+++ b/Documentation/networking/gianfar.txt
@@ -0,0 +1,72 @@
+The Gianfar Ethernet Driver
+Sysfs File description
+
+Author: Andy Fleming <afleming@freescale.com>
+Updated: 2005-07-28
+
+SYSFS
+
+Several of the features of the gianfar driver are controlled
+through sysfs files. These are:
+
+bd_stash:
+To stash RX Buffer Descriptors in the L2, echo 'on' or '1' to
+bd_stash; echo 'off' or '0' to disable.
+
+rx_stash_len:
+To stash the first n bytes of the packet in L2, echo the number
+of bytes to buf_stash_len. Echo 0 to disable.
+
+WARNING: You could really screw these up if you set them too low or high!
+fifo_threshold:
+To change the number of bytes the controller needs in the
+fifo before it starts transmission, echo the number of bytes to
+fifo_thresh. Range should be 0-511.
+
+fifo_starve:
+When the FIFO has less than this many bytes during a transmit, it
+enters starve mode, and increases the priority of TX memory
+transactions. To change, echo the number of bytes to
+fifo_starve. Range should be 0-511.
+
+fifo_starve_off:
+Once in starve mode, the FIFO remains there until it has this
+many bytes. To change, echo the number of bytes to
+fifo_starve_off. Range should be 0-511.
+
+CHECKSUM OFFLOADING
+
+The eTSEC controller (first included in parts from late 2005 like
+the 8548) has the ability to perform TCP, UDP, and IP checksums
+in hardware. The Linux kernel only offloads the TCP and UDP
+checksums (and always performs the pseudo header checksums), so
+the driver only supports checksumming for TCP/IP and UDP/IP
+packets. Use ethtool to enable or disable this feature for RX
+and TX.
+
+VLAN
+
+In order to use VLAN, please consult Linux documentation on
+configuring VLANs. The gianfar driver supports hardware insertion and
+extraction of VLAN headers, but not filtering. Filtering will be
+done by the kernel.
+
+MULTICASTING
+
+The gianfar driver supports using the group hash table on the
+TSEC (and the extended hash table on the eTSEC) for multicast
+filtering. On the eTSEC, the exact-match MAC registers are used
+before the hash tables. See Linux documentation on how to join
+multicast groups.
+
+PADDING
+
+The gianfar driver supports padding received frames with 2 bytes
+to align the IP header to a 16-byte boundary, when supported by
+hardware.
+
+ETHTOOL
+
+The gianfar driver supports the use of ethtool for many
+configuration options. You must run ethtool only on currently
+open interfaces. See ethtool documentation for details.
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index ebc09a159f6..2b7cf19a06a 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -46,6 +46,29 @@ ipfrag_secret_interval - INTEGER
for the hash secret) for IP fragments.
Default: 600
+ipfrag_max_dist - INTEGER
+ ipfrag_max_dist is a non-negative integer value which defines the
+ maximum "disorder" which is allowed among fragments which share a
+ common IP source address. Note that reordering of packets is
+ not unusual, but if a large number of fragments arrive from a source
+ IP address while a particular fragment queue remains incomplete, it
+ probably indicates that one or more fragments belonging to that queue
+ have been lost. When ipfrag_max_dist is positive, an additional check
+ is done on fragments before they are added to a reassembly queue - if
+ ipfrag_max_dist (or more) fragments have arrived from a particular IP
+ address between additions to any IP fragment queue using that source
+ address, it's presumed that one or more fragments in the queue are
+ lost. The existing fragment queue will be dropped, and a new one
+ started. An ipfrag_max_dist value of zero disables this check.
+
+ Using a very small value, e.g. 1 or 2, for ipfrag_max_dist can
+ result in unnecessarily dropping fragment queues when normal
+ reordering of packets occurs, which could lead to poor application
+ performance. Using a very large value, e.g. 50000, increases the
+ likelihood of incorrectly reassembling IP fragments that originate
+ from different IP datagrams, which could result in data corruption.
+ Default: 64
+
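The ipfrag_max_dist check described above can be sketched with one
counter per source address and a snapshot of that counter in each
fragment queue. This is a simplified userspace model, not the kernel
code: the struct and function names are hypothetical, and locking,
timers and the queues themselves are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-source-address state: running count of fragments seen. */
struct toy_peer {
    unsigned int rid;
};

/* Per-queue state: value of the peer counter at the last addition. */
struct toy_frag_queue {
    unsigned int rid;
};

void toy_queue_init(struct toy_frag_queue *q, const struct toy_peer *p)
{
    assert(q && p);
    q->rid = p->rid;
}

/*
 * Call for each fragment about to be added to queue q.  Returns true if
 * max_dist or more fragments from this source arrived since the previous
 * addition to q, meaning the queue should be dropped and restarted.
 * max_dist == 0 disables the check.
 */
bool toy_frag_too_far(struct toy_peer *p, struct toy_frag_queue *q,
                      unsigned int max_dist)
{
    unsigned int start, end;

    if (max_dist == 0)
        return false;

    start = q->rid;
    end = ++p->rid;     /* count this fragment */
    q->rid = end;

    /* end - start - 1 fragments arrived in between; unsigned math wraps safely. */
    return (end - start) > max_dist;
}
```

With max_dist set to 2, a queue that receives one fragment, then sees
three fragments go to another queue from the same source, is flagged on
its next addition.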
INET peer storage:
inet_peer_threshold - INTEGER
diff --git a/Documentation/networking/sk98lin.txt b/Documentation/networking/sk98lin.txt
index 851fc97bb22..7837c53fd5f 100644
--- a/Documentation/networking/sk98lin.txt
+++ b/Documentation/networking/sk98lin.txt
@@ -91,7 +91,7 @@ To use the driver as a module, proceed as follows:
with (M)
5. Execute the command "make modules".
6. Execute the command "make modules_install".
- The appropiate modules will be installed.
+ The appropriate modules will be installed.
7. Reboot your system.
@@ -245,7 +245,7 @@ Default: Both
This parameters is only relevant if auto-negotiation for this port is
not set to "Sense". If auto-negotiation is set to "On", all three values
are possible. If it is set to "Off", only "Full" and "Half" are allowed.
-This parameter is usefull if your link partner does not support all
+This parameter is useful if your link partner does not support all
possible combinations.
Flow Control
diff --git a/Documentation/oops-tracing.txt b/Documentation/oops-tracing.txt
index 05960f8a748..2503404ae5c 100644
--- a/Documentation/oops-tracing.txt
+++ b/Documentation/oops-tracing.txt
@@ -41,11 +41,9 @@ the disk is not available then you have three options :-
run a null modem to a second machine and capture the output there
using your favourite communication program. Minicom works well.
-(3) Patch the kernel with one of the crash dump patches. These save
- data to a floppy disk or video rom or a swap partition. None of
- these are standard kernel patches so you have to find and apply
- them yourself. Search kernel archives for kmsgdump, lkcd and
- oops+smram.
+(3) Use Kdump (see Documentation/kdump/kdump.txt), then
+ extract the kernel ring buffer from old memory using the dmesg
+ gdbmacro in Documentation/kdump/gdbmacros.txt.
Full Information
diff --git a/Documentation/pci-error-recovery.txt b/Documentation/pci-error-recovery.txt
new file mode 100644
index 00000000000..d089967e494
--- /dev/null
+++ b/Documentation/pci-error-recovery.txt
@@ -0,0 +1,246 @@
+
+ PCI Error Recovery
+ ------------------
+ May 31, 2005
+
+ Current document maintainer:
+ Linas Vepstas <linas@austin.ibm.com>
+
+
+Some PCI bus controllers are able to detect certain "hard" PCI errors
+on the bus, such as parity errors on the data and address busses, as
+well as SERR and PERR errors. These chipsets are then able to disable
+I/O to/from the affected device, so that, for example, a bad DMA
+address doesn't end up corrupting system memory. These same chipsets
+are also able to reset the affected PCI device, and return it to
+working condition. This document describes a generic API for
+performing error recovery.
+
+The core idea is that after a PCI error has been detected, there must
+be a way for the kernel to coordinate with all affected device drivers
+so that the pci card can be made operational again, possibly after
+performing a full electrical #RST of the PCI card. The API below
+provides a generic API for device drivers to be notified of PCI
+errors, and to be notified of, and respond to, a reset sequence.
+
+Preliminary sketch of API, cut-n-pasted-n-modified email from
+Ben Herrenschmidt, circa 5 April 2005
+
+The error recovery API support is exposed to the driver in the form of
+a structure of function pointers pointed to by a new field in struct
+pci_driver. The absence of this pointer in pci_driver denotes a
+"non-aware" driver; behaviour on these is platform dependent.
+Platforms like ppc64 can try to simulate pci hotplug remove/add.
+
+The definition of "pci_error_token" is not covered here. It is based on
+Seto's work on the synchronous error detection. We still need to define
+functions for extracting information out of an opaque error token. This is
+separate from this API.
+
+This structure has the form:
+
+struct pci_error_handlers
+{
+ int (*error_detected)(struct pci_dev *dev, pci_error_token error);
+ int (*mmio_enabled)(struct pci_dev *dev);
+ int (*resume)(struct pci_dev *dev);
+ int (*link_reset)(struct pci_dev *dev);
+ int (*slot_reset)(struct pci_dev *dev);
+};
+
+A driver doesn't have to implement all of these callbacks. The
+only mandatory one is error_detected(). If a callback is not
+implemented, the corresponding feature is considered unsupported.
+For example, if mmio_enabled() and resume() aren't there, then the
+driver is assumed not to do any direct recovery and to require
+a reset. If link_reset() is not implemented, the card is assumed
+not to care about link resets, in which case, if recovery is supported,
+the core can try to recover (but not call slot_reset() unless it really did
+reset the slot). If slot_reset() is not supported, link_reset() can
+be called instead on a slot reset.
+
+At first, the call will always be :
+
+ 1) error_detected()
+
+ Error detected. This is sent once after an error has been detected. At
+this point, the device might not be accessible anymore depending on the
+platform (the slot will be isolated on ppc64). The driver may already
+have "noticed" the error because of a failing IO, but this is the proper
+"synchronisation point", that is, it gives a chance to the driver to
+clean up, waiting for pending stuff (timers, whatever, etc...) to
+complete; it can take semaphores, schedule, etc... everything but touch
+the device. Within this function and after it returns, the driver
+shouldn't do any new IOs. Called in task context. This is sort of a
+"quiesce" point. See note about interrupts at the end of this doc.
+
+ Result codes:
+ - PCIERR_RESULT_CAN_RECOVER:
+ Driver returns this if it thinks it might be able to recover
+ the HW by just banging IOs or if it wants to be given
+ a chance to extract some diagnostic information (see
+ below).
+ - PCIERR_RESULT_NEED_RESET:
+ Driver returns this if it thinks it can't recover unless the
+ slot is reset.
+ - PCIERR_RESULT_DISCONNECT:
+ Return this if driver thinks it won't recover at all,
+ (this will detach the driver ? or just leave it
+ dangling ? to be decided)
+
+So at this point, we have called error_detected() for all drivers
+on the segment that had the error. On ppc64, the slot is isolated. What
+happens now typically depends on the result from the drivers. If all
+drivers on the segment/slot return PCIERR_RESULT_CAN_RECOVER, we would
+re-enable IOs on the slot (or do nothing special if the platform doesn't
+isolate slots) and call 2). If not and we can reset slots, we go to 4),
+if neither, we have a dead slot. If it's an hotplug slot, we might
+"simulate" reset by triggering HW unplug/replug though.
+
+>>> Current ppc64 implementation assumes that a device driver will
+>>> *not* schedule or semaphore in this routine; the current ppc64
+>>> implementation uses one kernel thread to notify all devices;
+>>> thus, if one device sleeps/schedules, all devices are affected.
+>>> Doing better requires complex multi-threaded logic in the error
+>>> recovery implementation (e.g. waiting for all notification threads
+>>> to "join" before proceeding with recovery.) This seems excessively
+>>> complex and not worth implementing.
+
+>>> The current ppc64 implementation doesn't much care if the device
+>>> attempts i/o at this point, or not. I/O's will fail, returning
+>>> a value of 0xff on read, and writes will be dropped. If the device
+>>> driver attempts more than 10K I/O's to a frozen adapter, the platform
+>>> will assume that the device driver has gone into an infinite loop, and
+>>> it will panic the kernel.
+
+ 2) mmio_enabled()
+
+ This is the "early recovery" call. IOs are allowed again, but DMA is
+not (hrm... to be discussed, I prefer not), with some restrictions. This
+is NOT a callback for the driver to start operations again, only to
+peek/poke at the device, extract diagnostic information, if any, and
+eventually do things like trigger a device local reset or some such,
+but not restart operations. This is sent if all drivers on a segment
+agree that they can try to recover and no automatic link reset was
+performed by the HW. If the platform can't just re-enable IOs without
+a slot reset or a link reset, it doesn't call this callback and goes
+directly to 3) or 4). All IOs should be done _synchronously_ from
+within this callback, errors triggered by them will be returned via
+the normal pci_check_whatever() api, no new error_detected() callback
+will be issued due to an error happening here. However, such an error
+might cause IOs to be re-blocked for the whole segment, and thus
+invalidate the recovery that other devices on the same segment might
+have done, forcing the whole segment into one of the next states,
+that is link reset or slot reset.
+
+ Result codes:
+ - PCIERR_RESULT_RECOVERED
+ Driver returns this if it thinks the device is fully
+ functional and thinks it is ready to start
+ normal driver operations again. There is no
+ guarantee that the driver will actually be
+ allowed to proceed, as another driver on the
+ same segment might have failed and thus triggered a
+ slot reset on platforms that support it.
+
+ - PCIERR_RESULT_NEED_RESET
+ Driver returns this if it thinks the device is not
+ recoverable in its current state and it needs a slot
+ reset to proceed.
+
+ - PCIERR_RESULT_DISCONNECT
+ Same as above. Total failure, no recovery even after
+ reset; driver dead. (To be defined more precisely)
+
+>>> The current ppc64 implementation does not implement this callback.
+
+ 3) link_reset()
+
+ This is called after the link has been reset. This is typically
+a PCI Express specific state at this point and is done whenever a
+non-fatal error has been detected that can be "solved" by resetting
+the link. This call informs the driver of the reset and the driver
+should check if the device appears to be in working condition.
+This function acts a bit like 2) mmio_enabled(), in that the driver
+is not supposed to restart normal driver I/O operations right away.
+Instead, it should just "probe" the device to check its recoverability
+status. If all is right, then the core will call resume() once all
+drivers have ack'd link_reset().
+
+ Result codes:
+ (identical to mmio_enabled)
+
+>>> The current ppc64 implementation does not implement this callback.
+
+ 4) slot_reset()
+
+ This is called after the slot has been soft or hard reset by the
+platform. A soft reset consists of asserting the adapter #RST line
+and then restoring the PCI BARs and PCI configuration header. If the
+platform supports PCI hotplug, then it might instead perform a hard
+reset by toggling power on the slot off/on. This call gives drivers
+the chance to re-initialize the hardware (re-download firmware, etc.),
+but drivers shouldn't restart normal I/O processing operations at
+this point. (See note about interrupts; interrupts aren't guaranteed
+to be delivered until the resume() callback has been called). If all
+device drivers report success on this callback, the platform will call
+resume() to complete the error handling and let the driver restart
+normal I/O processing.
+
+A driver can still return a critical failure for this function if
+it can't get the device operational after reset. If the platform
+previously tried a soft reset, it might now try a hard reset (power
+cycle) and then call slot_reset() again. If the device still can't
+be recovered, there is nothing more that can be done; the platform
+will typically report a "permanent failure" in such a case. The
+device will be considered "dead" in this case.
+
+ Result codes:
+ - PCIERR_RESULT_DISCONNECT
+ Same as above.
+
+>>> The current ppc64 implementation does not try a power-cycle reset
+>>> if the driver returned PCIERR_RESULT_DISCONNECT. However, it should.
+
+ 5) resume()
+
+ This is called if all drivers on the segment have returned
+PCIERR_RESULT_RECOVERED from one of the 3 previous callbacks.
+That basically tells the driver to restart activity, that everything
+is back and running. No result code is taken into account here. If
+a new error happens, it will restart a new error handling process.
+
+That's it. I think this covers all the possibilities. The way those
+callbacks are called is platform policy. A platform with no slot reset
+capability for example may want to just "ignore" drivers that can't
+recover (disconnect them) and try to let other cards on the same segment
+recover. Keep in mind that in most real life cases, though, there will
+be only one driver per segment.
+
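The decision taken after step 1) error_detected() can be sketched as a
small function that combines the result codes of all drivers on the
segment. This is a userspace illustration only: the PCIERR_RESULT_*
names come from the text above, but their numeric values, the
next_step enum and the function name are assumptions made for this
sketch, and the hotplug "simulated reset" case is not modelled.

```c
#include <assert.h>

/* Result codes named in this document; the numeric values are assumed. */
enum pcierr_result {
    PCIERR_RESULT_CAN_RECOVER,
    PCIERR_RESULT_NEED_RESET,
    PCIERR_RESULT_DISCONNECT,
    PCIERR_RESULT_RECOVERED
};

/* Hypothetical labels for what the platform does next. */
enum next_step {
    STEP_MMIO_ENABLED,  /* go to 2) mmio_enabled() */
    STEP_SLOT_RESET,    /* go to 4) slot_reset()   */
    STEP_DEAD_SLOT      /* nothing more can be done */
};

/*
 * results[] holds what each driver on the segment returned from
 * error_detected().  If every driver said CAN_RECOVER, IOs are re-enabled
 * and mmio_enabled() is called; otherwise a slot reset is attempted if the
 * platform supports one; otherwise the slot is dead.
 */
enum next_step after_error_detected(const enum pcierr_result *results,
                                    int ndrivers, int platform_can_reset)
{
    int i;

    assert(results && ndrivers > 0);

    for (i = 0; i < ndrivers; i++) {
        if (results[i] != PCIERR_RESULT_CAN_RECOVER)
            return platform_can_reset ? STEP_SLOT_RESET : STEP_DEAD_SLOT;
    }
    return STEP_MMIO_ENABLED;
}
```

A single driver answering PCIERR_RESULT_NEED_RESET is therefore enough
to move the whole segment to a slot reset, which matches the remark
that one failing driver can override the recovery of its neighbours.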
+Now, there is a note about interrupts. If you get an interrupt and your
+device is dead or has been isolated, there is a problem :)
+
+After much thinking, I decided to leave that to the platform. That is,
+the recovery API only specifies that:
+
+ - There is no guarantee that interrupt delivery can proceed from any
+device on the segment starting from the error detection and until the
+restart callback is sent, at which point interrupts are expected to be
+fully operational.
+
+ - There is no guarantee that interrupt delivery is stopped, that is, a
+driver that gets an interrupt after detecting an error, or that detects
+an error within the interrupt handler such that it prevents proper
+ack'ing of the interrupt (and thus removal of the source) should just
+return IRQ_NOTHANDLED. It's up to the platform to deal with that
+condition, typically by masking the irq source during the duration of
+the error handling. It is expected that the platform "knows" which
+interrupts are routed to error-management capable slots and can deal
+with temporarily disabling that irq number during error processing (this
+isn't terribly complex). That means some IRQ latency for other devices
+sharing the interrupt, but there is simply no other way. High end
+platforms aren't supposed to share interrupts between many devices
+anyway :)
+
+
+Revised: 31 May 2005 Linas Vepstas <linas@austin.ibm.com>
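
The slot_reset()/resume() hand-off described above can be sketched as a small decision function. This is only an illustrative userspace model of one possible platform policy, not kernel code: the two PCIERR_RESULT_* names come from the text, but their numeric values, the next_step enum, and after_slot_reset() are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Result codes named in the text; the numeric values are assumptions. */
enum pcierr_result {
	PCIERR_RESULT_RECOVERED,
	PCIERR_RESULT_DISCONNECT,
};

/* Hypothetical next steps a platform could take after slot_reset(). */
enum next_step {
	STEP_RESUME,     /* every driver recovered: call resume() */
	STEP_HARD_RESET, /* power-cycle, then call slot_reset() again */
	STEP_DEAD,       /* permanent failure: device is considered dead */
};

/*
 * Decide what to do once all drivers on the segment have answered
 * slot_reset(). hard_reset_tried is nonzero if a power cycle was
 * already attempted.
 */
enum next_step after_slot_reset(const enum pcierr_result *results,
				size_t n, int hard_reset_tried)
{
	for (size_t i = 0; i < n; i++) {
		if (results[i] != PCIERR_RESULT_RECOVERED)
			return hard_reset_tried ? STEP_DEAD : STEP_HARD_RESET;
	}
	return STEP_RESUME;
}
```

A platform without slot reset capability would pick a different policy, for example disconnecting only the drivers that failed; the text leaves that choice to the platform.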
diff --git a/Documentation/pcmcia/driver-changes.txt b/Documentation/pcmcia/driver-changes.txt
index 403e7b4dcdd..97420f08c78 100644
--- a/Documentation/pcmcia/driver-changes.txt
+++ b/Documentation/pcmcia/driver-changes.txt
@@ -1,5 +1,16 @@
This file details changes in 2.6 which affect PCMCIA card driver authors:
+* Unify detach and REMOVAL event code, as well as attach and INSERTION
+ code (as of 2.6.16)
+ void (*remove) (struct pcmcia_device *dev);
+ int (*probe) (struct pcmcia_device *dev);
+
+* Move suspend, resume and reset out of event handler (as of 2.6.16)
+ int (*suspend) (struct pcmcia_device *dev);
+ int (*resume) (struct pcmcia_device *dev);
+ should be initialized in struct pcmcia_driver, and handle
+ (SUSPEND == RESET_PHYSICAL) and (RESUME == CARD_RESET) events
+
* event handler initialization in struct pcmcia_driver (as of 2.6.13)
The event handler is notified of all events, and must be initialized
as the event() callback in the driver's struct pcmcia_driver.
diff --git a/Documentation/pm.txt b/Documentation/pm.txt
index 2ea1149bf6b..79c0f32a760 100644
--- a/Documentation/pm.txt
+++ b/Documentation/pm.txt
@@ -218,7 +218,7 @@ proceed in the opposite direction.
Q: Who do I contact for additional information about
enabling power management for my specific driver/device?
-ACPI Development mailing list: acpi-devel@lists.sourceforge.net
+ACPI Development mailing list: linux-acpi@vger.kernel.org
System Interface -- OBSOLETE, DO NOT USE!
----------------*************************
diff --git a/Documentation/power/interface.txt b/Documentation/power/interface.txt
index f5ebda5f427..bd4ffb5bd49 100644
--- a/Documentation/power/interface.txt
+++ b/Documentation/power/interface.txt
@@ -41,3 +41,14 @@ to. Writing to this file will accept one of
It will only change to 'firmware' or 'platform' if the system supports
it.
+/sys/power/image_size controls the size of the image created by
+the suspend-to-disk mechanism. Writing a string that represents a
+non-negative integer to this file sets an upper limit on the image
+size, in megabytes. The suspend-to-disk mechanism will
+do its best to ensure the image size will not exceed that number. However,
+if this turns out to be impossible, it will try to suspend anyway using the
+smallest image possible. In particular, if "0" is written to this file, the
+suspend image will be as small as possible.
+
+Reading from this file will display the current image size limit, which
+is set to 500 MB by default.
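
The image_size interface described above is a plain text file, so a program can set and read the limit with ordinary file I/O. The sketch below assumes nothing beyond that: write_image_size() and read_image_size() are hypothetical helper names, and the path is passed in by the caller so the helpers can be tried on a scratch file. Writing the real /sys/power/image_size would normally require root.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write an image size limit, in megabytes, as a decimal string.
 * Returns 0 on success, -1 on failure.
 */
int write_image_size(const char *path, unsigned long megabytes)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%lu\n", megabytes) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Read the current limit back from the same file.
 * Returns 0 on success, -1 on failure.
 */
int read_image_size(const char *path, unsigned long *megabytes)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%lu", megabytes) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

Calling write_image_size("/sys/power/image_size", 0) would correspond to the "as small as possible" case mentioned in the text.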
diff --git a/Documentation/power/swsusp.txt b/Documentation/power/swsusp.txt
index b0d50840788..08c79d4dc54 100644
--- a/Documentation/power/swsusp.txt
+++ b/Documentation/power/swsusp.txt
@@ -27,6 +27,11 @@ echo shutdown > /sys/power/disk; echo disk > /sys/power/state
echo platform > /sys/power/disk; echo disk > /sys/power/state
+If you want to limit the suspend image size to N megabytes, do
+
+echo N > /sys/power/image_size
+
+before suspend (it is limited to 500 MB by default).
Encrypted suspend image:
------------------------
@@ -207,7 +212,7 @@ A: Try running
cat `cat /proc/[0-9]*/maps | grep / | sed 's:.* /:/:' | sort -u` > /dev/null
-after resume. swapoff -a; swapon -a may also be usefull.
+after resume. swapoff -a; swapon -a may also be useful.
Q: What happens to devices during swsusp? They seem to be resumed
during system suspend?
@@ -318,7 +323,7 @@ to be useless to try to suspend to disk while that app is running?
A: No, it should work okay, as long as your app does not mlock()
it. Just prepare big enough swap partition.
-Q: What information is usefull for debugging suspend-to-disk problems?
+Q: What information is useful for debugging suspend-to-disk problems?
A: Well, last messages on the screen are always useful. If something
is broken, it is usually some kernel driver, therefore trying with as
diff --git a/Documentation/powerpc/00-INDEX b/Documentation/powerpc/00-INDEX
index e7bea0a407b..d6d65b9bcfe 100644
--- a/Documentation/powerpc/00-INDEX
+++ b/Documentation/powerpc/00-INDEX
@@ -8,12 +8,18 @@ please mail me.
cpu_features.txt
- info on how we support a variety of CPUs with minimal compile-time
options.
+eeh-pci-error-recovery.txt
+ - info on PCI Bus EEH Error Recovery
+hvcs.txt
+ - IBM "Hypervisor Virtual Console Server" Installation Guide
+mpc52xx.txt
+ - Linux 2.6.x on MPC52xx family
ppc_htab.txt
- info about the Linux/PPC /proc/ppc_htab entry
-smp.txt
- - use and state info about Linux/PPC on MP machines
SBC8260_memory_mapping.txt
- EST SBC8260 board info
+smp.txt
+ - use and state info about Linux/PPC on MP machines
sound.txt
- info on sound support under Linux/PPC
zImage_layout.txt
diff --git a/Documentation/powerpc/eeh-pci-error-recovery.txt b/Documentation/powerpc/eeh-pci-error-recovery.txt
index e75d7474322..67a11a36270 100644
--- a/Documentation/powerpc/eeh-pci-error-recovery.txt
+++ b/Documentation/powerpc/eeh-pci-error-recovery.txt
@@ -115,7 +115,7 @@ Current PPC64 Linux EEH Implementation
At this time, a generic EEH recovery mechanism has been implemented,
so that individual device drivers do not need to be modified to support
EEH recovery. This generic mechanism piggy-backs on the PCI hotplug
-infrastructure, and percolates events up through the hotplug/udev
+infrastructure, and percolates events up through the userspace/udev
infrastructure. Followiing is a detailed description of how this is
accomplished.
@@ -172,7 +172,7 @@ A handler for the EEH notifier_block events is implemented in
drivers/pci/hotplug/pSeries_pci.c, called handle_eeh_events().
It saves the device BAR's and then calls rpaphp_unconfig_pci_adapter().
This last call causes the device driver for the card to be stopped,
-which causes hotplug events to go out to user space. This triggers
+which causes uevents to go out to user space. This triggers
user-space scripts that might issue commands such as "ifdown eth0"
for ethernet cards, and so on. This handler then sleeps for 5 seconds,
hoping to give the user-space scripts enough time to complete.
@@ -258,29 +258,30 @@ rpa_php_unconfig_pci_adapter() { // in rpaphp_pci.c
calls
pci_destroy_dev (struct pci_dev *) {
calls
- device_unregister (&dev->dev) { // in /drivers/base/core.c
+ device_unregister (&dev->dev) { // in /drivers/base/core.c
calls
- device_del(struct device * dev) { // in /drivers/base/core.c
+ device_del(struct device * dev) { // in /drivers/base/core.c
calls
- kobject_del() { //in /libs/kobject.c
+ kobject_del() { //in /lib/kobject.c
calls
- kobject_hotplug() { // in /libs/kobject.c
+ kobject_uevent() { // in /lib/kobject.c
calls
- kset_hotplug() { // in /lib/kobject.c
+ kset_uevent() { // in /lib/kobject.c
calls
- kset->hotplug_ops->hotplug() which is really just
+ kset->uevent_ops->uevent() // which is really just
a call to
- dev_hotplug() { // in /drivers/base/core.c
+ dev_uevent() { // in /drivers/base/core.c
calls
- dev->bus->hotplug() which is really just a call to
- pci_hotplug () { // in drivers/pci/hotplug.c
+ dev->bus->uevent() which is really just a call to
+ pci_uevent () { // in drivers/pci/hotplug.c
which prints device name, etc....
}
}
- then kset_hotplug() calls
- call_usermodehelper () with
- argv[0]=hotplug_path[] which is "/sbin/hotplug"
- --> event to userspace,
+ then kobject_uevent() sends a netlink uevent to userspace
+ --> userspace uevent
+ (during early boot, nobody listens to netlink events and
+ kobject_uevent() executes uevent_helper[], which runs the
+ event process /sbin/hotplug)
}
}
kobject_del() then calls sysfs_remove_dir(), which would
diff --git a/Documentation/scsi/ChangeLog.megaraid b/Documentation/scsi/ChangeLog.megaraid
index 5331d91432c..09f6300eda4 100644
--- a/Documentation/scsi/ChangeLog.megaraid
+++ b/Documentation/scsi/ChangeLog.megaraid
@@ -1,3 +1,38 @@
+Release Date : Fri Nov 11 12:27:22 EST 2005 - Seokmann Ju <sju@lsil.com>
+Current Version : 2.20.4.7 (scsi module), 2.20.2.6 (cmm module)
+Older Version : 2.20.4.6 (scsi module), 2.20.2.6 (cmm module)
+
+1. Sorted out PCI IDs to remove megaraid support overlaps.
+ Based on the patch from Daniel, sorted out PCI IDs along with
+ character node name change from 'megadev' to 'megadev_legacy' to avoid
+ conflict.
+ ---
+ Hopefully we'll be getting the build restriction zapped much sooner,
+ but we should also be thinking about totally removing the hardware
+ support overlap in the megaraid drivers.
+
+ This patch pencils in a date of Feb 06 for this, and performs some
+ printk abuse in hope that existing legacy users might pick up on what's
+ going on.
+
+ Signed-off-by: Daniel Drake <dsd@gentoo.org>
+ ---
+
+2. Fixed an issue: megaraid always fails to reset handler.
+ ---
+ I found that the megaraid driver always fails to reset the
+ adapter with the following message:
+ megaraid: resetting the host...
+ megaraid mbox: reset sequence completed successfully
+ megaraid: fast sync command timed out
+ megaraid: reservation reset failed
+ when the "Cluster mode" of the adapter BIOS is enabled.
+ So, whenever the reset occurs, the adapter goes
+ offline and just becomes unavailable.
+
+ Jun'ichi Nomura [mailto:jnomura@mtc.biglobe.ne.jp]
+ ---
+
Release Date : Mon Mar 07 12:27:22 EST 2005 - Seokmann Ju <sju@lsil.com>
Current Version : 2.20.4.6 (scsi module), 2.20.2.6 (cmm module)
Older Version : 2.20.4.5 (scsi module), 2.20.2.5 (cmm module)
diff --git a/Documentation/scsi/aacraid.txt b/Documentation/scsi/aacraid.txt
new file mode 100644
index 00000000000..820fd079350
--- /dev/null
+++ b/Documentation/scsi/aacraid.txt
@@ -0,0 +1,108 @@
+AACRAID Driver for Linux (take two)
+
+Introduction
+-------------------------
+The aacraid driver adds support for Adaptec (http://www.adaptec.com)
+RAID controllers. This is a major rewrite from the original
+Adaptec supplied driver. It has significantly cleaned up both the code
+and the running binary size (the module is less than half the size of
+the original).
+
+Supported Cards/Chipsets
+-------------------------
+ PCI ID (pci.ids) OEM Product
+ 9005:0285:9005:028a Adaptec 2020ZCR (Skyhawk)
+ 9005:0285:9005:028e Adaptec 2020SA (Skyhawk)
+ 9005:0285:9005:028b Adaptec 2025ZCR (Terminator)
+ 9005:0285:9005:028f Adaptec 2025SA (Terminator)
+ 9005:0285:9005:0286 Adaptec 2120S (Crusader)
+ 9005:0286:9005:028d Adaptec 2130S (Lancer)
+ 9005:0285:9005:0285 Adaptec 2200S (Vulcan)
+ 9005:0285:9005:0287 Adaptec 2200S (Vulcan-2m)
+ 9005:0286:9005:028c Adaptec 2230S (Lancer)
+ 9005:0286:9005:028c Adaptec 2230SLP (Lancer)
+ 9005:0285:9005:0296 Adaptec 2240S (SabreExpress)
+ 9005:0285:9005:0290 Adaptec 2410SA (Jaguar)
+ 9005:0285:9005:0293 Adaptec 21610SA (Corsair-16)
+ 9005:0285:103c:3227 Adaptec 2610SA (Bearcat)
+ 9005:0285:9005:0292 Adaptec 2810SA (Corsair-8)
+ 9005:0285:9005:0294 Adaptec Prowler
+ 9005:0286:9005:029d Adaptec 2420SA (Intruder)
+ 9005:0286:9005:029c Adaptec 2620SA (Intruder)
+ 9005:0286:9005:029b Adaptec 2820SA (Intruder)
+ 9005:0286:9005:02a7 Adaptec 2830SA (Skyray)
+ 9005:0286:9005:02a8 Adaptec 2430SA (Skyray)
+ 9005:0285:9005:0288 Adaptec 3230S (Harrier)
+ 9005:0285:9005:0289 Adaptec 3240S (Tornado)
+ 9005:0285:9005:0298 Adaptec 4000SAS (BlackBird)
+ 9005:0285:9005:0297 Adaptec 4005SAS (AvonPark)
+ 9005:0285:9005:0299 Adaptec 4800SAS (Marauder-X)
+ 9005:0285:9005:029a Adaptec 4805SAS (Marauder-E)
+ 9005:0286:9005:02a2 Adaptec 4810SAS (Hurricane)
+ 1011:0046:9005:0364 Adaptec 5400S (Mustang)
+ 1011:0046:9005:0365 Adaptec 5400S (Mustang)
+ 9005:0283:9005:0283 Adaptec Catapult (3210S with arc firmware)
+ 9005:0284:9005:0284 Adaptec Tomcat (3410S with arc firmware)
+ 9005:0287:9005:0800 Adaptec Themisto (Jupiter)
+ 9005:0200:9005:0200 Adaptec Themisto (Jupiter)
+ 9005:0286:9005:0800 Adaptec Callisto (Jupiter)
+ 1011:0046:9005:1364 Dell PERC 2/QC (Quad Channel, Mustang)
+ 1028:0001:1028:0001 Dell PERC 2/Si (Iguana)
+ 1028:0003:1028:0003 Dell PERC 3/Si (SlimFast)
+ 1028:0002:1028:0002 Dell PERC 3/Di (Opal)
+ 1028:0004:1028:0004 Dell PERC 3/DiF (Iguana)
+ 1028:0002:1028:00d1 Dell PERC 3/DiV (Viper)
+ 1028:0002:1028:00d9 Dell PERC 3/DiL (Lexus)
+ 1028:000a:1028:0106 Dell PERC 3/DiJ (Jaguar)
+ 1028:000a:1028:011b Dell PERC 3/DiD (Dagger)
+ 1028:000a:1028:0121 Dell PERC 3/DiB (Boxster)
+ 9005:0285:1028:0287 Dell PERC 320/DC (Vulcan)
+ 9005:0285:1028:0291 Dell CERC 2 (DellCorsair)
+ 1011:0046:103c:10c2 HP NetRAID-4M (Mustang)
+ 9005:0285:17aa:0286 Legend S220 (Crusader)
+ 9005:0285:17aa:0287 Legend S230 (Vulcan)
+ 9005:0285:9005:0290 IBM ServeRAID 7t (Jaguar)
+ 9005:0285:1014:02F2 IBM ServeRAID 8i (AvonPark)
+ 9005:0285:1014:0312 IBM ServeRAID 8i (AvonParkLite)
+ 9005:0286:1014:9580 IBM ServeRAID 8k/8k-l8 (Aurora)
+ 9005:0286:1014:9540 IBM ServeRAID 8k/8k-l4 (AuroraLite)
+ 9005:0286:9005:029f ICP ICP9014R0 (Lancer)
+ 9005:0286:9005:029e ICP ICP9024R0 (Lancer)
+ 9005:0286:9005:02a0 ICP ICP9047MA (Lancer)
+ 9005:0286:9005:02a1 ICP ICP9087MA (Lancer)
+ 9005:0286:9005:02a4 ICP ICP9085LI (Marauder-X)
+ 9005:0286:9005:02a5 ICP ICP5085BR (Marauder-E)
+ 9005:0286:9005:02a3 ICP ICP5085AU (Hurricane)
+ 9005:0286:9005:02a6 ICP ICP9067MA (Intruder-6)
+ 9005:0286:9005:02a9 ICP ICP5087AU (Skyray)
+ 9005:0286:9005:02aa ICP ICP5047AU (Skyray)
+
+People
+-------------------------
+Alan Cox <alan@redhat.com>
+Christoph Hellwig <hch@infradead.org> (updates for new-style PCI probing and SCSI host registration,
+ small cleanups/fixes)
+Matt Domsch <matt_domsch@dell.com> (revision ioctl, adapter messages)
+Deanna Bonds (non-DASD support, PAE fibs and 64 bit, added new Adaptec controllers,
+ added new ioctls, changed scsi interface to use new error handler,
+ increased the number of fibs and outstanding commands to a container)
+
+ (fixed 64bit and 64G memory model, changed confusing naming convention
+ where fibs that go to the hardware are consistently called hw_fibs and
+ not just fibs like the name of the driver tracking structure)
+Mark Salyzyn <Mark_Salyzyn@adaptec.com> Fixed panic issues and added some new product ids for upcoming hbas. Performance tuning, card failover and bug mitigations.
+
+Original Driver
+-------------------------
+Adaptec Unix OEM Product Group
+
+Mailing List
+-------------------------
+linux-scsi@vger.kernel.org (Interested parties troll here)
+Also note this is very different to Brian's original driver
+so don't expect him to support it.
+Adaptec does support this driver. Contact Adaptec tech support or
+aacraid@adaptec.com
+
+Original by Brian Boerner February 2001
+Rewritten by Alan Cox, November 2001
diff --git a/Documentation/scsi/scsi_mid_low_api.txt b/Documentation/scsi/scsi_mid_low_api.txt
index 66565d42288..8bbae3e1abd 100644
--- a/Documentation/scsi/scsi_mid_low_api.txt
+++ b/Documentation/scsi/scsi_mid_low_api.txt
@@ -150,7 +150,8 @@ scsi devices of which only the first 2 respond:
LLD mid level LLD
===-------------------=========--------------------===------
scsi_host_alloc() -->
-scsi_add_host() --------+
+scsi_add_host() ---->
+scsi_scan_host() -------+
|
slave_alloc()
slave_configure() --> scsi_adjust_queue_depth()
@@ -196,7 +197,7 @@ of the issues involved. See the section on reference counting below.
The hotplug concept may be extended to SCSI devices. Currently, when an
-HBA is added, the scsi_add_host() function causes a scan for SCSI devices
+HBA is added, the scsi_scan_host() function causes a scan for SCSI devices
attached to the HBA's SCSI transport. On newer SCSI transports the HBA
may become aware of a new SCSI device _after_ the scan has completed.
An LLD can use this sequence to make the mid level aware of a SCSI device:
@@ -372,7 +373,7 @@ names all start with "scsi_".
Summary:
scsi_activate_tcq - turn on tag command queueing
scsi_add_device - creates new scsi device (lu) instance
- scsi_add_host - perform sysfs registration and SCSI bus scan.
+ scsi_add_host - perform sysfs registration and set up transport class
scsi_adjust_queue_depth - change the queue depth on a SCSI device
scsi_assign_lock - replace default host_lock with given lock
scsi_bios_ptable - return copy of block device's partition table
@@ -386,6 +387,7 @@ Summary:
scsi_remove_device - detach and remove a SCSI device
scsi_remove_host - detach and remove all SCSI devices owned by host
scsi_report_bus_reset - report scsi _bus_ reset observed
+ scsi_scan_host - scan SCSI bus
scsi_track_queue_full - track successive QUEUE_FULL events
scsi_unblock_requests - allow further commands to be queued to given host
scsi_unregister - [calls scsi_host_put()]
@@ -425,10 +427,10 @@ void scsi_activate_tcq(struct scsi_device *sdev, int depth)
* Might block: yes
*
* Notes: This call is usually performed internally during a scsi
- * bus scan when an HBA is added (i.e. scsi_add_host()). So it
+ * bus scan when an HBA is added (i.e. scsi_scan_host()). So it
* should only be called if the HBA becomes aware of a new scsi
- * device (lu) after scsi_add_host() has completed. If successful
- * this call we lead to slave_alloc() and slave_configure() callbacks
+ * device (lu) after scsi_scan_host() has completed. If successful
+ * this call can lead to slave_alloc() and slave_configure() callbacks
* into the LLD.
*
* Defined in: drivers/scsi/scsi_scan.c
@@ -439,7 +441,7 @@ struct scsi_device * scsi_add_device(struct Scsi_Host *shost,
/**
- * scsi_add_host - perform sysfs registration and SCSI bus scan.
+ * scsi_add_host - perform sysfs registration and set up transport class
* @shost: pointer to scsi host instance
* @dev: pointer to struct device of type scsi class
*
@@ -448,7 +450,11 @@ struct scsi_device * scsi_add_device(struct Scsi_Host *shost,
* Might block: no
*
* Notes: Only required in "hotplug initialization model" after a
- * successful call to scsi_host_alloc().
+ * successful call to scsi_host_alloc(). This function does not
+ * scan the bus; this can be done by calling scsi_scan_host() or
+ * in some other transport-specific way. The LLD must set up
+ * the transport template before calling this function and may only
+ * access the transport class data after this function has been called.
*
* Defined in: drivers/scsi/hosts.c
**/
@@ -559,7 +565,7 @@ void scsi_deactivate_tcq(struct scsi_device *sdev, int depth)
* area for the LLD's exclusive use.
* Both associated refcounting objects have their refcount set to 1.
* Full registration (in sysfs) and a bus scan are performed later when
- * scsi_add_host() is called.
+ * scsi_add_host() and scsi_scan_host() are called.
*
* Defined in: drivers/scsi/hosts.c .
**/
@@ -699,6 +705,19 @@ void scsi_report_bus_reset(struct Scsi_Host * shost, int channel)
/**
+ * scsi_scan_host - scan SCSI bus
+ * @shost: a pointer to a scsi host instance
+ *
+ * Might block: yes
+ *
+ * Notes: Should be called after scsi_add_host()
+ *
+ * Defined in: drivers/scsi/scsi_scan.c
+ **/
+void scsi_scan_host(struct Scsi_Host *shost)
+
+
+/**
* scsi_track_queue_full - track successive QUEUE_FULL events on given
* device to determine if and when there is a need
* to adjust the queue depth on the device.
@@ -1433,7 +1452,7 @@ The following people have contributed to this document:
Christoph Hellwig <hch at infradead dot org>
Doug Ledford <dledford at redhat dot com>
Andries Brouwer <Andries dot Brouwer at cwi dot nl>
- Randy Dunlap <rddunlap at osdl dot org>
+ Randy Dunlap <rdunlap at xenotime dot net>
Alan Stern <stern at rowland dot harvard dot edu>
diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt
index 2f27f391c7c..d2578013e82 100644
--- a/Documentation/sound/alsa/ALSA-Configuration.txt
+++ b/Documentation/sound/alsa/ALSA-Configuration.txt
@@ -105,7 +105,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
Each of top level sound card module takes the following options.
index - index (slot #) of sound card
- - Values: 0 through 7 or negative
+ - Values: 0 through 31 or negative
- If nonnegative, assign that index number
- if negative, interpret as a bitmask of permissible
indices; the first free permitted index is assigned
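
The index rule above (a nonnegative value names one slot, a negative value is read as a bitmask of permitted slots) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the ALSA core code: pick_card_index(), the used bitmap, and the -1 failure value are hypothetical, and the 32-slot limit is taken from the "0 through 31" range in the text.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CARDS 32	/* indices 0 through 31, per the text */

/*
 * Pick a card slot for the given "index" option.
 * used: bit i set means slot i is already taken.
 * Returns the chosen slot, or -1 if none is available.
 */
int pick_card_index(int index, uint32_t used)
{
	if (index >= 0) {
		/* Nonnegative: that exact slot is requested. */
		if (index >= MAX_CARDS || (used & (UINT32_C(1) << index)))
			return -1;
		return index;
	}

	/* Negative: the bit pattern of the value is the permitted set. */
	uint32_t allowed = (uint32_t)index;

	for (int i = 0; i < MAX_CARDS; i++) {
		uint32_t bit = UINT32_C(1) << i;

		if ((allowed & bit) && !(used & bit))
			return i;	/* first free permitted slot */
	}
	return -1;
}
```

With this reading, -1 permits every slot, while -2 has bit 0 cleared and therefore never selects slot 0.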
@@ -134,7 +134,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
dma2 - second DMA # for AD1816A chip (PnP setup)
clockfreq - Clock frequency for AD1816A chip (default = 0, 33000Hz)
- Module supports up to 8 cards, autoprobe and PnP.
+ This module supports multiple cards, autoprobe and PnP.
Module snd-ad1848
-----------------
@@ -145,9 +145,11 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
irq - IRQ # for AD1848 chip
dma1 - DMA # for AD1848 chip (0,1,3)
- Module supports up to 8 cards. This module does not support autoprobe
+ This module supports multiple cards. It does not support autoprobe
thus main port must be specified!!! Other ports are optional.
+ The power-management is supported.
+
Module snd-ad1889
-----------------
@@ -156,7 +158,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
ac97_quirk - AC'97 workaround for strange hardware
See the description of intel8x0 module for details.
- This module supports up to 8 cards.
+ This module supports multiple cards.
Module snd-ali5451
------------------
@@ -184,7 +186,9 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
mpu_irq - IRQ # for MPU-401 (PnP setup)
fm_port - port # for OPL3 FM (PnP setup)
- Module supports up to 8 cards, autoprobe and PnP.
+ This module supports multiple cards, autoprobe and PnP.
+
+ The power-management is supported.
Module snd-als4000
------------------
@@ -194,7 +198,9 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
joystick_port - port # for legacy joystick support.
0 = disabled (default), 1 = auto-detect
- Module supports up to 8 cards, autoprobe and PnP.
+ This module supports multiple cards, autoprobe and PnP.
+
+ The power-management is supported.
Module snd-atiixp
-----------------
@@ -213,6 +219,8 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
implementation depends on the motherboard, and you'll need to
choose the correct one via spdif_aclink module option.
+ The power-management is supported.
+
Module snd-atiixp-modem
-----------------------
@@ -223,6 +231,8 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
Note: The default index value of this module is -2, i.e. the first
slot is excluded.
+ The power-management is supported.
+
Module snd-au8810, snd-au8820, snd-au8830
-----------------------------------------
@@ -263,8 +273,10 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
dma1 - 1st DMA # for AZT2320 (WSS) chip (PnP setup)
dma2 - 2nd DMA # for AZT2320 (WSS) chip (PnP setup)
- Module supports up to 8 cards, PnP and autoprobe.
+ This module supports multiple cards, PnP and autoprobe.
+ The power-management is supported.
+
Module snd-azt3328
------------------
@@ -272,7 +284,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
joystick - Enable joystick (default off)
- Module supports up to 8 cards.
+ This module supports multiple cards.
Module snd-bt87x
----------------
@@ -282,7 +294,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
digital_rate - Override the default digital rate (Hz)
load_all - Load the driver even if the card model isn't known
- Module supports up to 8 cards.
+ This module supports multiple cards.
Note: The default index value of this module is -2, i.e. the first
slot is excluded.
@@ -292,7 +304,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
Module for Creative Audigy LS and SB Live 24bit
- Module supports up to 8 cards.
+ This module supports multiple cards.
Module snd-cmi8330
@@ -308,7 +320,9 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
sbdma8 - 8bit DMA # for CMI8330 chip (SB16)
sbdma16 - 16bit DMA # for CMI8330 chip (SB16)
- Module supports up to 8 cards and autoprobe.
+ This module supports multiple cards and autoprobe.
+
+ The power-management is supported.
Module snd-cmipci
-----------------
@@ -321,8 +335,10 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
(default = 1)
joystick_port - Joystick port address (0 = disable, 1 = auto-detect)
- Module supports autoprobe and multiple chips (max 8).
+ This module supports autoprobe and multiple cards.
+ The power-management is supported.
+
Module snd-cs4231
-----------------
@@ -335,7 +351,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
dma1 - first DMA # for CS4231 chip
dma2 - second DMA # for CS4231 chip
- Module supports up to 8 cards. This module does not support autoprobe
+ This module supports multiple cards. This module does not support autoprobe
thus main port must be specified!!! Other ports are optional.
The power-management is supported.
@@ -355,7 +371,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
dma2 - second DMA # for Yamaha CS4232 chip (0,1,3), -1 = disable
isapnp - ISA PnP detection - 0 = disable, 1 = enable (default)
- Module supports up to 8 cards. This module does not support autoprobe
+ This module supports multiple cards. This module does not support autoprobe
thus main port must be specified!!! Other ports are optional.
The power-management is supported.
@@ -376,7 +392,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
dma2 - second DMA # for CS4236 chip (0,1,3), -1 = disable
isapnp - ISA PnP detection - 0 = disable, 1 = enable (default)
- Module supports up to 8 cards. This module does not support autoprobe
+ This module supports multiple cards. This module does not support autoprobe
(if ISA PnP is not used) thus main port and control port must be
specified!!! Other ports are optional.
@@ -389,7 +405,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
dual_codec - Secondary codec ID (0 = disable, default)
- Module supports up to 8 cards.
+ This module supports multiple cards.
The power-management is supported.
@@ -403,13 +419,20 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
thinkpad - Force to enable Thinkpad's CLKRUN control.
mmap_valid - Support OSS mmap mode (default = 0).
- Module supports up to 8 cards and autoprobe.
+ This module supports multiple cards and autoprobe.
Usually external amp and CLKRUN controls are detected automatically
from PCI sub vendor/device ids. If they don't work, give the options
above explicitly.
The power-management is supported.
+ Module snd-cs5535audio
+ ----------------------
+
+ Module for multifunction CS5535 companion PCI device
+
+ This module supports multiple cards.
+
Module snd-dt019x
-----------------
@@ -423,9 +446,11 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
mpu_irq - IRQ # for MPU-401 (PnP setup)
dma8 - DMA # (PnP setup)
- Module supports up to 8 cards. This module is enabled only with
+ This module supports multiple cards. This module is enabled only with
ISA PnP support.
+ The power-management is supported.
+
Module snd-dummy
----------------
@@ -433,6 +458,8 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
or input, but you may use this module for any application which
requires a sound card (like RealPlayer).
+ The power-management is supported.
+
Module snd-emu10k1
------------------
@@ -450,7 +477,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
given in MB unit. Default value is 128.
enable_ir - enable IR
- Module supports up to 8 cards and autoprobe.
+ This module supports multiple cards and autoprobe.
Input & Output configurations [extin/extout]
* Creative Card wo/Digital out [0x0003/0x1f03]
@@ -466,12 +493,14 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
* Creative Card 5.1 (c) 2003 [0x3fc3/0x7cff]
* Creative Card all ins and outs [0x3fff/0x7fff]
+ The power-management is supported.
+
Module snd-emu10k1x
-------------------
Module for Creative Emu10k1X (SB Live Dell OEM version)
- Module supports up to 8 cards.
+ This module supports multiple cards.
Module snd-ens1370
------------------
@@ -482,7 +511,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
joystick - Enable joystick (default off)
- Module supports up to 8 cards and autoprobe.
+ This module supports multiple cards and autoprobe.
Module snd-ens1371
------------------
@@ -495,7 +524,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
joystick_port - port # for joystick (0x200,0x208,0x210,0x218),
0 = disable (default), 1 = auto-detect
- Module supports up to 8 cards and autoprobe.
+ This module supports multiple cards and autoprobe.
Module snd-es968
----------------
@@ -506,8 +535,10 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
irq - IRQ # for ES968 (SB8) chip (PnP setup)
dma1 - DMA # for ES968 (SB8) chip (PnP setup)
- Module supports up to 8 cards, PnP and autoprobe.
+ This module supports multiple cards, PnP and autoprobe.
+ The power-management is supported.
+
Module snd-es1688
-----------------
@@ -519,7 +550,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
mpu_irq - IRQ # for MPU-401 port (5,7,9,10)
dma8 - DMA # for ES-1688 chip (0,1,3)
- Module supports up to 8 cards and autoprobe (without MPU-401 port).
+ This module supports multiple cards and autoprobe (without MPU-401 port).
Module snd-es18xx
-----------------
@@ -534,8 +565,8 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
dma2 - first DMA # for ES-18xx chip (0,1,3)
isapnp - ISA PnP detection - 0 = disable, 1 = enable (default)
- Module supports up to 8 cards ISA PnP and autoprobe (without MPU-401 port
- if native ISA PnP routines are not used).
+ This module supports multiple cards, ISA PnP and autoprobe (without MPU-401
+ port if native ISA PnP routines are not used).
When dma2 is equal with dma1, the driver works as half-duplex.
The power-management is supported.
@@ -545,7 +576,9 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
Module for sound cards based on ESS Solo-1 (ES1938,ES1946) chips.
- Module supports up to 8 cards and autoprobe.
+ This module supports multiple cards and autoprobe.
+
+ The power-management is supported.
Module snd-es1968
-----------------
@@ -561,7 +594,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
enable_mpu - enable MPU401 (0 = off, 1 = on, 2 = auto (default))
joystick - enable joystick (default off)
- Module supports up to 8 cards and autoprobe.
+ This module supports multiple cards and autoprobe.
The power-management is supported.
@@ -577,8 +610,10 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
- High 16-bits are video (radio) device number + 1
- example: 0x10002 (MediaForte 256-PCPR, device 1)
- Module supports up to 8 cards and autoprobe.
+ This module supports multiple cards and autoprobe.
+ The power-management is supported.
+
Module snd-gusclassic
---------------------
@@ -592,7 +627,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
voices - GF1 voices limit (14-32)
pcm_voices - reserved PCM voices
- Module supports up to 8 cards and autoprobe.
+ This module supports multiple cards and autoprobe.
Module snd-gusextreme
---------------------
@@ -611,7 +646,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
voices - GF1 voices limit (14-32)
pcm_voices - reserved PCM voices
- Module supports up to 8 cards and autoprobe (without MPU-401 port).
+ This module supports multiple cards and autoprobe (without MPU-401 port).
Module snd-gusmax
-----------------
@@ -626,7 +661,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
voices - GF1 voices limit (14-32)
pcm_voices - reserved PCM voices
- Module supports up to 8 cards and autoprobe.
+ This module supports multiple cards and autoprobe.
Module snd-hda-intel
--------------------
@@ -688,12 +723,14 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
(Usually SD_LPLIB register is more accurate than the
position buffer.)
+ The power-management is supported.
+
Module snd-hdsp
---------------
Module for RME Hammerfall DSP audio interface(s)
- Module supports up to 8 cards.
+ This module supports multiple cards.
Note: The firmware data can be automatically loaded via hotplug
when CONFIG_FW_LOADER is set. Otherwise, you need to load
@@ -751,7 +788,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
cs8427_timeout - reset timeout for the CS8427 chip (S/PDIF transciever)
in msec resolution, default value is 500 (0.5 sec)
- Module supports up to 8 cards and autoprobe. Note: The consumer part
+ This module supports multiple cards and autoprobe. Note: The consumer part
is not used with all Envy24 based cards (for example in the MidiMan Delta
serie).
@@ -787,7 +824,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
aureon71, universe, k8x800, phase22, phase28, ms300,
av710
- Module supports up to 8 cards and autoprobe.
+ This module supports multiple cards and autoprobe.
Note: The supported board is detected by reading EEPROM or PCI
SSID (if EEPROM isn't available). You can override the
@@ -839,6 +876,8 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
Note: The default index value of this module is -2, i.e. the first
slot is excluded.
+ The power-management is supported.
+
Module snd-interwave
--------------------
@@ -855,7 +894,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
effect - 1 = InterWave effects enable (default 0);
requires 8 voices
- Module supports up to 8 cards, autoprobe and ISA PnP.
+ This module supports multiple cards, autoprobe and ISA PnP.
Module snd-interwave-stb
------------------------
@@ -875,14 +914,14 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
effect - 1 = InterWave effects enable (default 0);
requires 8 voices
- Module supports up to 8 cards, autoprobe and ISA PnP.
+ This module supports multiple cards, autoprobe and ISA PnP.
Module snd-korg1212
-------------------
Module for Korg 1212 IO PCI card
- Module supports up to 8 cards.
+ This module supports multiple cards.
Module snd-maestro3
-------------------
@@ -894,7 +933,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
-1 for default pin (8 for allegro, 1 for
others)
- Module supports autoprobe and multiple chips (max 8).
+ This module supports autoprobe and multiple chips.
Note: the binding of amplifier is dependent on hardware.
If there is no sound even though all channels are unmuted, try to
@@ -909,7 +948,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
Module for Digigram miXart8 sound cards.
- Module supports multiple cards.
+ This module supports multiple cards.
Note: One miXart8 board will be represented as 4 alsa cards.
See MIXART.txt for details.
@@ -928,7 +967,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
irq - IRQ number or -1 (disable)
pnp - PnP detection - 0 = disable, 1 = enable (default)
- Module supports multiple devices (max 8) and PnP.
+ This module supports multiple devices and PnP.
Module snd-mtpav
----------------
@@ -1014,7 +1053,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
dma2 - second DMA # for Yamaha OPL3-SA chip (0,1,3), -1 = disable
isapnp - ISA PnP detection - 0 = disable, 1 = enable (default)
- Module supports up to 8 cards and ISA PnP. This module does not support
+ This module supports multiple cards and ISA PnP. It does not support
autoprobe (if ISA PnP is not used) thus all ports must be specified!!!
The power-management is supported.
@@ -1064,6 +1103,13 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
This module supports only one card, autoprobe and PnP.
+ Module snd-pcxhr
+ ----------------
+
+ Module for Digigram PCXHR boards
+
+ This module supports multiple cards.
+
Module snd-powermac (on ppc only)
---------------------------------
@@ -1084,20 +1130,22 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
For ARM architecture only.
+ The power-management is supported.
+
Module snd-rme32
----------------
Module for RME Digi32, Digi32 Pro and Digi32/8 (Sek'd Prodif32,
Prodif96 and Prodif Gold) sound cards.
- Module supports up to 8 cards.
+ This module supports multiple cards.
Module snd-rme96
----------------
Module for RME Digi96, Digi96/8 and Digi96/8 PRO/PAD/PST sound cards.
- Module supports up to 8 cards.
+ This module supports multiple cards.
Module snd-rme9652
------------------
@@ -1107,7 +1155,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
precise_ptr - Enable precise pointer (doesn't work reliably).
(default = 0)
- Module supports up to 8 cards.
+ This module supports multiple cards.
Note: snd-page-alloc module does the job which snd-hammerfall-mem
module did formerly. It will allocate the buffers in advance
@@ -1124,6 +1172,8 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
Module supports only one card.
Module has no enable and index options.
+ The power-management is supported.
+
Module snd-sb8
--------------
@@ -1135,8 +1185,10 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
irq - IRQ # for SB DSP chip (5,7,9,10)
dma8 - DMA # for SB DSP chip (1,3)
- Module supports up to 8 cards and autoprobe.
+ This module supports multiple cards and autoprobe.
+ The power-management is supported.
+
Module snd-sb16 and snd-sbawe
-----------------------------
@@ -1155,7 +1207,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
csp - ASP/CSP chip support - 0 = disable (default), 1 = enable
isapnp - ISA PnP detection - 0 = disable, 1 = enable (default)
- Module supports up to 8 cards, autoprobe and ISA PnP.
+ This module supports multiple cards, autoprobe and ISA PnP.
Note: To use Vibra16X cards in 16-bit half duplex mode, you must
disable 16bit DMA with dma16 = -1 module parameter.
@@ -1163,6 +1215,8 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
half duplex mode through 8-bit DMA channel by disabling their
16-bit DMA channel.
+ The power-management is supported.
+
Module snd-sgalaxy
------------------
@@ -1173,7 +1227,9 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
irq - IRQ # (7,9,10,11)
dma1 - DMA #
- Module supports up to 8 cards.
+ This module supports multiple cards.
+
+ The power-management is supported.
Module snd-sscape
-----------------
@@ -1185,7 +1241,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
mpu_irq - MPU-401 IRQ # (PnP setup)
dma - DMA # (PnP setup)
- Module supports up to 8 cards. ISA PnP must be enabled.
+ This module supports multiple cards. ISA PnP must be enabled.
You need sscape_ctl tool in alsa-tools package for loading
the microcode.
@@ -1194,21 +1250,21 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
Module for AMD7930 sound chips found on Sparcs.
- Module supports up to 8 cards.
+ This module supports multiple cards.
Module snd-sun-cs4231 (on sparc only)
-------------------------------------
Module for CS4231 sound chips found on Sparcs.
- Module supports up to 8 cards.
+ This module supports multiple cards.
Module snd-sun-dbri (on sparc only)
-----------------------------------
Module for DBRI sound chips found on Sparcs.
- Module supports up to 8 cards.
+ This module supports multiple cards.
Module snd-wavefront
--------------------
@@ -1228,7 +1284,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
dma2 - DMA2 # for CS4232 PCM interface.
isapnp - ISA PnP detection - 0 = disable, 1 = enable (default)
- Module supports up to 8 cards and ISA PnP.
+ This module supports multiple cards and ISA PnP.
Module snd-sonicvibes
---------------------
@@ -1240,7 +1296,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
- SoundCard must have onboard SRAM for this.
mge - Mic Gain Enable - 1 = enable, 0 = disable (default)
- Module supports up to 8 cards and autoprobe.
+ This module supports multiple cards and autoprobe.
Module snd-serial-u16550
------------------------
@@ -1259,7 +1315,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
0 = Soundcanvas, 1 = MS-124T, 2 = MS-124W S/A,
3 = MS-124W M/B, 4 = Generic
- Module supports up to 8 cards. This module does not support autoprobe
+ This module supports multiple cards. This module does not support autoprobe
thus the main port must be specified!!! Other options are optional.
Module snd-trident
@@ -1278,7 +1334,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
pcm_channels - max channels (voices) reserved for PCM
wavetable_size - max wavetable size in kB (4-?kb)
- Module supports up to 8 cards and autoprobe.
+ This module supports multiple cards and autoprobe.
The power-management is supported.
@@ -1290,14 +1346,14 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
vid - Vendor ID for the device (optional)
pid - Product ID for the device (optional)
- This module supports up to 8 cards, autoprobe and hotplugging.
+ This module supports multiple devices, autoprobe and hotplugging.
Module snd-usb-usx2y
--------------------
Module for Tascam USB US-122, US-224 and US-428 devices.
- This module supports up to 8 cards, autoprobe and hotplugging.
+ This module supports multiple devices, autoprobe and hotplugging.
Note: you need to load the firmware via usx2yloader utility included
in alsa-tools and alsa-firmware packages.
@@ -1356,6 +1412,8 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
Note: for the MPU401 on VIA823x, use snd-mpu401 driver
additionally. The mpu_port option is for VIA686 chips only.
+ The power-management is supported.
+
Module snd-via82xx-modem
------------------------
@@ -1368,6 +1426,8 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
Note: The default index value of this module is -2, i.e. the first
slot is excluded.
+ The power-management is supported.
+
Module snd-virmidi
------------------
@@ -1375,9 +1435,9 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
This module creates virtual rawmidi devices which communicate
to the corresponding ALSA sequencer ports.
- midi_devs - MIDI devices # (1-8, default=4)
+ midi_devs - MIDI devices # (1-4, default=4)
- Module supports up to 8 cards.
+ This module supports multiple cards.
Module snd-vx222
----------------
@@ -1387,7 +1447,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
mic - Enable Microphone on V222 Mic (NYI)
ibl - Capture IBL size. (default = 0, minimum size)
- Module supports up to 8 cards.
+ This module supports multiple cards.
When the driver is compiled as a module and the hotplug firmware
is supported, the firmware data is loaded via hotplug automatically.
@@ -1406,6 +1466,8 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
size is chosen. The possible IBL values can be found in
/proc/asound/cardX/vx-status proc file.
+ The power-management is supported.
+
Module snd-vxpocket
-------------------
@@ -1413,7 +1475,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
ibl - Capture IBL size. (default = 0, minimum size)
- Module supports up to 8 cards. The module is compiled only when
+ This module supports multiple cards. The module is compiled only when
PCMCIA is supported on kernel.
With the older 2.6.x kernel, to activate the driver via the card
@@ -1434,6 +1496,8 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
Note2: snd-vxp440 driver is merged to snd-vxpocket driver since
ALSA 1.0.10.
+ The power-management is supported.
+
Module snd-ymfpci
-----------------
@@ -1447,7 +1511,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
1 (auto-detect)
rear_switch - enable shared rear/line-in switch (bool)
- Module supports autoprobe and multiple chips (max 8).
+ This module supports autoprobe and multiple chips.
The power-management is supported.
@@ -1458,6 +1522,8 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
Note: the driver is build only when CONFIG_ISA is set.
+ The power-management is supported.
+
AC97 Quirk Option
=================
@@ -1474,7 +1540,7 @@ the proper value with this option.
The following strings are accepted:
- default Don't override the default setting
- - disable Disable the quirk
+ - none Disable the quirk
- hp_only Bind Master and Headphone controls as a single control
- swap_hp Swap headphone and master controls
- swap_surround Swap master and surround controls
diff --git a/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl b/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl
index 260334c98d9..e651ed8d1e6 100644
--- a/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl
+++ b/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl
@@ -18,8 +18,8 @@
</affiliation>
</author>
- <date>October 6, 2005</date>
- <edition>0.3.5</edition>
+ <date>November 17, 2005</date>
+ <edition>0.3.6</edition>
<abstract>
<para>
@@ -403,9 +403,8 @@
static int enable[SNDRV_CARDS] = SNDRV_DEFAULT_ENABLE_PNP;
/* definition of the chip-specific record */
- typedef struct snd_mychip mychip_t;
- struct snd_mychip {
- snd_card_t *card;
+ struct mychip {
+ struct snd_card *card;
// rest of implementation will be in the section
// "PCI Resource Managements"
};
@@ -413,7 +412,7 @@
/* chip-specific destructor
* (see "PCI Resource Managements")
*/
- static int snd_mychip_free(mychip_t *chip)
+ static int snd_mychip_free(struct mychip *chip)
{
.... // will be implemented later...
}
@@ -421,22 +420,21 @@
/* component-destructor
* (see "Management of Cards and Components")
*/
- static int snd_mychip_dev_free(snd_device_t *device)
+ static int snd_mychip_dev_free(struct snd_device *device)
{
- mychip_t *chip = device->device_data;
- return snd_mychip_free(chip);
+ return snd_mychip_free(device->device_data);
}
/* chip-specific constructor
* (see "Management of Cards and Components")
*/
- static int __devinit snd_mychip_create(snd_card_t *card,
+ static int __devinit snd_mychip_create(struct snd_card *card,
struct pci_dev *pci,
- mychip_t **rchip)
+ struct mychip **rchip)
{
- mychip_t *chip;
+ struct mychip *chip;
int err;
- static snd_device_ops_t ops = {
+ static struct snd_device_ops ops = {
.dev_free = snd_mychip_dev_free,
};
@@ -474,8 +472,8 @@
const struct pci_device_id *pci_id)
{
static int dev;
- snd_card_t *card;
- mychip_t *chip;
+ struct snd_card *card;
+ struct mychip *chip;
int err;
/* (1) */
@@ -582,7 +580,7 @@
<informalexample>
<programlisting>
<![CDATA[
- snd_card_t *card;
+ struct snd_card *card;
....
card = snd_card_new(index[dev], id[dev], THIS_MODULE, 0);
]]>
@@ -605,7 +603,7 @@
<informalexample>
<programlisting>
<![CDATA[
- mychip_t *chip;
+ struct mychip *chip;
....
if ((err = snd_mychip_create(card, pci, &chip)) < 0) {
snd_card_free(card);
@@ -806,7 +804,7 @@
<informalexample>
<programlisting>
<![CDATA[
- snd_card_t *card;
+ struct snd_card *card;
card = snd_card_new(index, id, module, extra_size);
]]>
</programlisting>
@@ -830,7 +828,7 @@
<para>
After the card is created, you can attach the components
(devices) to the card instance. On ALSA driver, a component is
- represented as a <type>snd_device_t</type> object.
+ represented as a struct <structname>snd_device</structname> object.
A component can be a PCM instance, a control interface, a raw
MIDI interface, etc. Each of such instances has one component
entry.
@@ -891,14 +889,11 @@
The chip-specific information, e.g. the i/o port address, its
resource pointer, or the irq number, is stored in the
chip-specific record.
- Usually, the chip-specific record is typedef'ed as
- <type>xxx_t</type> like the following:
<informalexample>
<programlisting>
<![CDATA[
- typedef struct snd_mychip mychip_t;
- struct snd_mychip {
+ struct mychip {
....
};
]]>
@@ -918,12 +913,12 @@
<informalexample>
<programlisting>
<![CDATA[
- card = snd_card_new(index[dev], id[dev], THIS_MODULE, sizeof(mychip_t));
+ card = snd_card_new(index[dev], id[dev], THIS_MODULE, sizeof(struct mychip));
]]>
</programlisting>
</informalexample>
- whether <type>mychip_t</type> is the type of the chip record.
+ where struct <structname>mychip</structname> is the type of the chip record.
</para>
<para>
@@ -932,7 +927,7 @@
<informalexample>
<programlisting>
<![CDATA[
- mychip_t *chip = (mychip_t *)card->private_data;
+ struct mychip *chip = (struct mychip *)card->private_data;
]]>
</programlisting>
</informalexample>
@@ -954,8 +949,8 @@
<informalexample>
<programlisting>
<![CDATA[
- snd_card_t *card;
- mychip_t *chip;
+ struct snd_card *card;
+ struct mychip *chip;
card = snd_card_new(index[dev], id[dev], THIS_MODULE, NULL);
.....
chip = kzalloc(sizeof(*chip), GFP_KERNEL);
@@ -971,8 +966,8 @@
<informalexample>
<programlisting>
<![CDATA[
- struct snd_mychip {
- snd_card_t *card;
+ struct mychip {
+ struct snd_card *card;
....
};
]]>
@@ -1000,7 +995,7 @@
<informalexample>
<programlisting>
<![CDATA[
- static snd_device_ops_t ops = {
+ static struct snd_device_ops ops = {
.dev_free = snd_mychip_dev_free,
};
....
@@ -1018,10 +1013,9 @@
<informalexample>
<programlisting>
<![CDATA[
- static int snd_mychip_dev_free(snd_device_t *device)
+ static int snd_mychip_dev_free(struct snd_device *device)
{
- mychip_t *chip = device->device_data;
- return snd_mychip_free(chip);
+ return snd_mychip_free(device->device_data);
}
]]>
</programlisting>
@@ -1087,15 +1081,15 @@
<title>PCI Resource Managements Example</title>
<programlisting>
<![CDATA[
- struct snd_mychip {
- snd_card_t *card;
+ struct mychip {
+ struct snd_card *card;
struct pci_dev *pci;
unsigned long port;
int irq;
};
- static int snd_mychip_free(mychip_t *chip)
+ static int snd_mychip_free(struct mychip *chip)
{
/* disable hardware here if any */
.... // (not implemented in this document)
@@ -1113,13 +1107,13 @@
}
/* chip-specific constructor */
- static int __devinit snd_mychip_create(snd_card_t *card,
+ static int __devinit snd_mychip_create(struct snd_card *card,
struct pci_dev *pci,
- mychip_t **rchip)
+ struct mychip **rchip)
{
- mychip_t *chip;
+ struct mychip *chip;
int err;
- static snd_device_ops_t ops = {
+ static struct snd_device_ops ops = {
.dev_free = snd_mychip_dev_free,
};
@@ -1155,8 +1149,7 @@
}
chip->port = pci_resource_start(pci, 0);
if (request_irq(pci->irq, snd_mychip_interrupt,
- SA_INTERRUPT|SA_SHIRQ, "My Chip",
- (void *)chip)) {
+ SA_INTERRUPT|SA_SHIRQ, "My Chip", chip)) {
printk(KERN_ERR "cannot grab irq %d\n", pci->irq);
snd_mychip_free(chip);
return -EBUSY;
@@ -1268,14 +1261,14 @@
<para>
Now assume that this PCI device has an I/O port with 8 bytes
- and an interrupt. Then <type>mychip_t</type> will have the
+ and an interrupt. Then struct <structname>mychip</structname> will have the
following fields:
<informalexample>
<programlisting>
<![CDATA[
- struct snd_mychip {
- snd_card_t *card;
+ struct mychip {
+ struct snd_card *card;
unsigned long port;
int irq;
@@ -1330,8 +1323,7 @@
<programlisting>
<![CDATA[
if (request_irq(pci->irq, snd_mychip_interrupt,
- SA_INTERRUPT|SA_SHIRQ, "My Chip",
- (void *)chip)) {
+ SA_INTERRUPT|SA_SHIRQ, "My Chip", chip)) {
printk(KERN_ERR "cannot grab irq %d\n", pci->irq);
snd_mychip_free(chip);
return -EBUSY;
@@ -1372,7 +1364,7 @@
static irqreturn_t snd_mychip_interrupt(int irq, void *dev_id,
struct pt_regs *regs)
{
- mychip_t *chip = dev_id;
+ struct mychip *chip = dev_id;
....
return IRQ_HANDLED;
}
@@ -1487,7 +1479,7 @@
<informalexample>
<programlisting>
<![CDATA[
- struct snd_mychip {
+ struct mychip {
....
unsigned long iobase_phys;
void __iomem *iobase_virt;
@@ -1517,7 +1509,7 @@
<informalexample>
<programlisting>
<![CDATA[
- static int snd_mychip_free(mychip_t *chip)
+ static int snd_mychip_free(struct mychip *chip)
{
....
if (chip->iobase_virt)
@@ -1537,7 +1529,7 @@
<title>Registration of Device Struct</title>
<para>
At some point, typically after calling <function>snd_device_new()</function>,
- you need to register the <structname>struct device</structname> of the chip
+ you need to register the struct <structname>device</structname> of the chip
you're handling for udev and co. ALSA provides a macro for compatibility with
older kernels. Simply call like the following:
<informalexample>
@@ -1739,7 +1731,7 @@
....
/* hardware definition */
- static snd_pcm_hardware_t snd_mychip_playback_hw = {
+ static struct snd_pcm_hardware snd_mychip_playback_hw = {
.info = (SNDRV_PCM_INFO_MMAP |
SNDRV_PCM_INFO_INTERLEAVED |
SNDRV_PCM_INFO_BLOCK_TRANSFER |
@@ -1758,7 +1750,7 @@
};
/* hardware definition */
- static snd_pcm_hardware_t snd_mychip_capture_hw = {
+ static struct snd_pcm_hardware snd_mychip_capture_hw = {
.info = (SNDRV_PCM_INFO_MMAP |
SNDRV_PCM_INFO_INTERLEAVED |
SNDRV_PCM_INFO_BLOCK_TRANSFER |
@@ -1777,10 +1769,10 @@
};
/* open callback */
- static int snd_mychip_playback_open(snd_pcm_substream_t *substream)
+ static int snd_mychip_playback_open(struct snd_pcm_substream *substream)
{
- mychip_t *chip = snd_pcm_substream_chip(substream);
- snd_pcm_runtime_t *runtime = substream->runtime;
+ struct mychip *chip = snd_pcm_substream_chip(substream);
+ struct snd_pcm_runtime *runtime = substream->runtime;
runtime->hw = snd_mychip_playback_hw;
// more hardware-initialization will be done here
@@ -1788,19 +1780,19 @@
}
/* close callback */
- static int snd_mychip_playback_close(snd_pcm_substream_t *substream)
+ static int snd_mychip_playback_close(struct snd_pcm_substream *substream)
{
- mychip_t *chip = snd_pcm_substream_chip(substream);
+ struct mychip *chip = snd_pcm_substream_chip(substream);
// the hardware-specific codes will be here
return 0;
}
/* open callback */
- static int snd_mychip_capture_open(snd_pcm_substream_t *substream)
+ static int snd_mychip_capture_open(struct snd_pcm_substream *substream)
{
- mychip_t *chip = snd_pcm_substream_chip(substream);
- snd_pcm_runtime_t *runtime = substream->runtime;
+ struct mychip *chip = snd_pcm_substream_chip(substream);
+ struct snd_pcm_runtime *runtime = substream->runtime;
runtime->hw = snd_mychip_capture_hw;
// more hardware-initialization will be done here
@@ -1808,33 +1800,33 @@
}
/* close callback */
- static int snd_mychip_capture_close(snd_pcm_substream_t *substream)
+ static int snd_mychip_capture_close(struct snd_pcm_substream *substream)
{
- mychip_t *chip = snd_pcm_substream_chip(substream);
+ struct mychip *chip = snd_pcm_substream_chip(substream);
// the hardware-specific codes will be here
return 0;
}
/* hw_params callback */
- static int snd_mychip_pcm_hw_params(snd_pcm_substream_t *substream,
- snd_pcm_hw_params_t * hw_params)
+ static int snd_mychip_pcm_hw_params(struct snd_pcm_substream *substream,
+ struct snd_pcm_hw_params *hw_params)
{
return snd_pcm_lib_malloc_pages(substream,
params_buffer_bytes(hw_params));
}
/* hw_free callback */
- static int snd_mychip_pcm_hw_free(snd_pcm_substream_t *substream)
+ static int snd_mychip_pcm_hw_free(struct snd_pcm_substream *substream)
{
return snd_pcm_lib_free_pages(substream);
}
/* prepare callback */
- static int snd_mychip_pcm_prepare(snd_pcm_substream_t *substream)
+ static int snd_mychip_pcm_prepare(struct snd_pcm_substream *substream)
{
- mychip_t *chip = snd_pcm_substream_chip(substream);
- snd_pcm_runtime_t *runtime = substream->runtime;
+ struct mychip *chip = snd_pcm_substream_chip(substream);
+ struct snd_pcm_runtime *runtime = substream->runtime;
/* set up the hardware with the current configuration
* for example...
@@ -1849,7 +1841,7 @@
}
/* trigger callback */
- static int snd_mychip_pcm_trigger(snd_pcm_substream_t *substream,
+ static int snd_mychip_pcm_trigger(struct snd_pcm_substream *substream,
int cmd)
{
switch (cmd) {
@@ -1866,9 +1858,9 @@
/* pointer callback */
static snd_pcm_uframes_t
- snd_mychip_pcm_pointer(snd_pcm_substream_t *substream)
+ snd_mychip_pcm_pointer(struct snd_pcm_substream *substream)
{
- mychip_t *chip = snd_pcm_substream_chip(substream);
+ struct mychip *chip = snd_pcm_substream_chip(substream);
unsigned int current_ptr;
/* get the current hardware pointer */
@@ -1877,7 +1869,7 @@
}
/* operators */
- static snd_pcm_ops_t snd_mychip_playback_ops = {
+ static struct snd_pcm_ops snd_mychip_playback_ops = {
.open = snd_mychip_playback_open,
.close = snd_mychip_playback_close,
.ioctl = snd_pcm_lib_ioctl,
@@ -1889,7 +1881,7 @@
};
/* operators */
- static snd_pcm_ops_t snd_mychip_capture_ops = {
+ static struct snd_pcm_ops snd_mychip_capture_ops = {
.open = snd_mychip_capture_open,
.close = snd_mychip_capture_close,
.ioctl = snd_pcm_lib_ioctl,
@@ -1905,9 +1897,9 @@
*/
/* create a pcm device */
- static int __devinit snd_mychip_new_pcm(mychip_t *chip)
+ static int __devinit snd_mychip_new_pcm(struct mychip *chip)
{
- snd_pcm_t *pcm;
+ struct snd_pcm *pcm;
int err;
if ((err = snd_pcm_new(chip->card, "My Chip", 0, 1, 1,
@@ -1944,9 +1936,9 @@
<informalexample>
<programlisting>
<![CDATA[
- static int __devinit snd_mychip_new_pcm(mychip_t *chip)
+ static int __devinit snd_mychip_new_pcm(struct mychip *chip)
{
- snd_pcm_t *pcm;
+ struct snd_pcm *pcm;
int err;
if ((err = snd_pcm_new(chip->card, "My Chip", 0, 1, 1,
@@ -1989,13 +1981,13 @@
specify more numbers, but they must be handled properly in
open/close, etc. callbacks. When you need to know which
substream you are referring to, then it can be obtained from
- <type>snd_pcm_substream_t</type> data passed to each callback
+ struct <structname>snd_pcm_substream</structname> data passed to each callback
as follows:
<informalexample>
<programlisting>
<![CDATA[
- snd_pcm_substream_t *substream;
+ struct snd_pcm_substream *substream;
int index = substream->number;
]]>
</programlisting>
@@ -2024,7 +2016,7 @@
<informalexample>
<programlisting>
<![CDATA[
- static snd_pcm_ops_t snd_mychip_playback_ops = {
+ static struct snd_pcm_ops snd_mychip_playback_ops = {
.open = snd_mychip_pcm_open,
.close = snd_mychip_pcm_close,
.ioctl = snd_pcm_lib_ioctl,
@@ -2102,18 +2094,18 @@
<title>PCM Instance with a Destructor</title>
<programlisting>
<![CDATA[
- static void mychip_pcm_free(snd_pcm_t *pcm)
+ static void mychip_pcm_free(struct snd_pcm *pcm)
{
- mychip_t *chip = snd_pcm_chip(pcm);
+ struct mychip *chip = snd_pcm_chip(pcm);
/* free your own data */
kfree(chip->my_private_pcm_data);
// do what you like else
....
}
- static int __devinit snd_mychip_new_pcm(mychip_t *chip)
+ static int __devinit snd_mychip_new_pcm(struct mychip *chip)
{
- snd_pcm_t *pcm;
+ struct snd_pcm *pcm;
....
/* allocate your own data */
chip->my_private_pcm_data = kmalloc(...);
@@ -2149,7 +2141,7 @@
<![CDATA[
struct _snd_pcm_runtime {
/* -- Status -- */
- snd_pcm_substream_t *trigger_master;
+ struct snd_pcm_substream *trigger_master;
snd_timestamp_t trigger_tstamp; /* trigger timestamp */
int overrange;
snd_pcm_uframes_t avail_max;
@@ -2192,8 +2184,8 @@ struct _snd_pcm_runtime {
snd_pcm_sync_id_t sync; /* hardware synchronization ID */
/* -- mmap -- */
- volatile snd_pcm_mmap_status_t *status;
- volatile snd_pcm_mmap_control_t *control;
+ volatile struct snd_pcm_mmap_status *status;
+ volatile struct snd_pcm_mmap_control *control;
atomic_t mmap_count;
/* -- locking / scheduling -- */
@@ -2204,15 +2196,15 @@ struct _snd_pcm_runtime {
/* -- private section -- */
void *private_data;
- void (*private_free)(snd_pcm_runtime_t *runtime);
+ void (*private_free)(struct snd_pcm_runtime *runtime);
/* -- hardware description -- */
- snd_pcm_hardware_t hw;
- snd_pcm_hw_constraints_t hw_constraints;
+ struct snd_pcm_hardware hw;
+ struct snd_pcm_hw_constraints hw_constraints;
/* -- interrupt callbacks -- */
- void (*transfer_ack_begin)(snd_pcm_substream_t *substream);
- void (*transfer_ack_end)(snd_pcm_substream_t *substream);
+ void (*transfer_ack_begin)(struct snd_pcm_substream *substream);
+ void (*transfer_ack_end)(struct snd_pcm_substream *substream);
/* -- timer -- */
unsigned int timer_resolution; /* timer resolution */
@@ -2226,7 +2218,7 @@ struct _snd_pcm_runtime {
#if defined(CONFIG_SND_PCM_OSS) || defined(CONFIG_SND_PCM_OSS_MODULE)
/* -- OSS things -- */
- snd_pcm_oss_runtime_t oss;
+ struct snd_pcm_oss_runtime oss;
#endif
};
]]>
@@ -2252,7 +2244,7 @@ struct _snd_pcm_runtime {
<section id="pcm-interface-runtime-hw">
<title>Hardware Description</title>
<para>
- The hardware descriptor (<type>snd_pcm_hardware_t</type>)
+ The hardware descriptor (struct <structname>snd_pcm_hardware</structname>)
contains the definitions of the fundamental hardware
configuration. Above all, you'll need to define this in
<link linkend="pcm-interface-operators-open-callback"><citetitle>
@@ -2267,7 +2259,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- snd_pcm_runtime_t *runtime = substream->runtime;
+ struct snd_pcm_runtime *runtime = substream->runtime;
...
runtime->hw = snd_mychip_playback_hw; /* common definition */
if (chip->model == VERY_OLD_ONE)
@@ -2282,7 +2274,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static snd_pcm_hardware_t snd_mychip_playback_hw = {
+ static struct snd_pcm_hardware snd_mychip_playback_hw = {
.info = (SNDRV_PCM_INFO_MMAP |
SNDRV_PCM_INFO_INTERLEAVED |
SNDRV_PCM_INFO_BLOCK_TRANSFER |
@@ -2337,9 +2329,14 @@ struct _snd_pcm_runtime {
<constant>PAUSE</constant> bit means that the pcm supports the
<quote>pause</quote> operation, while the
<constant>RESUME</constant> bit means that the pcm supports
- the <quote>suspend/resume</quote> operation. If these flags
- are set, the <structfield>trigger</structfield> callback below
- must handle the corresponding commands.
+ the full <quote>suspend/resume</quote> operation.
+ If the <constant>PAUSE</constant> flag is set,
+ the <structfield>trigger</structfield> callback below
+ must handle the corresponding (pause push/release) commands.
+ The suspend/resume trigger commands can be defined even without
+ the <constant>RESUME</constant> flag. See the <link
+ linkend="power-management"><citetitle>
+ Power Management</citetitle></link> section for details.
</para>
<para>
@@ -2512,7 +2509,7 @@ struct _snd_pcm_runtime {
<title>Running Status</title>
<para>
The running status can be referred via <constant>runtime-&gt;status</constant>.
- This is the pointer to <type>snd_pcm_mmap_status_t</type>
+ This is the pointer to struct <structname>snd_pcm_mmap_status</structname>
record. For example, you can get the current DMA hardware
pointer via <constant>runtime-&gt;status-&gt;hw_ptr</constant>.
</para>
@@ -2520,7 +2517,7 @@ struct _snd_pcm_runtime {
<para>
The DMA application pointer can be referred via
<constant>runtime-&gt;control</constant>, which points
- <type>snd_pcm_mmap_control_t</type> record.
+ to the struct <structname>snd_pcm_mmap_control</structname> record.
However, accessing directly to this value is not recommended.
</para>
</section>
@@ -2542,9 +2539,9 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static int snd_xxx_open(snd_pcm_substream_t *substream)
+ static int snd_xxx_open(struct snd_pcm_substream *substream)
{
- my_pcm_data_t *data;
+ struct my_pcm_data *data;
....
data = kmalloc(sizeof(*data), GFP_KERNEL);
substream->runtime->private_data = data;
@@ -2586,7 +2583,7 @@ struct _snd_pcm_runtime {
<para>
The callback function takes at least the argument with
- <type>snd_pcm_substream_t</type> pointer. For retrieving the
+ <structname>snd_pcm_substream</structname> pointer. For retrieving the
chip record from the given substream instance, you can use the
following macro.
@@ -2594,7 +2591,7 @@ struct _snd_pcm_runtime {
<programlisting>
<![CDATA[
int xxx() {
- mychip_t *chip = snd_pcm_substream_chip(substream);
+ struct mychip *chip = snd_pcm_substream_chip(substream);
....
}
]]>
@@ -2616,7 +2613,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static int snd_xxx_open(snd_pcm_substream_t *substream);
+ static int snd_xxx_open(struct snd_pcm_substream *substream);
]]>
</programlisting>
</informalexample>
@@ -2631,10 +2628,10 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static int snd_xxx_open(snd_pcm_substream_t *substream)
+ static int snd_xxx_open(struct snd_pcm_substream *substream)
{
- mychip_t *chip = snd_pcm_substream_chip(substream);
- snd_pcm_runtime_t *runtime = substream->runtime;
+ struct mychip *chip = snd_pcm_substream_chip(substream);
+ struct snd_pcm_runtime *runtime = substream->runtime;
runtime->hw = snd_mychip_playback_hw;
return 0;
@@ -2667,7 +2664,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static int snd_xxx_close(snd_pcm_substream_t *substream);
+ static int snd_xxx_close(struct snd_pcm_substream *substream);
]]>
</programlisting>
</informalexample>
@@ -2682,7 +2679,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static int snd_xxx_close(snd_pcm_substream_t *substream)
+ static int snd_xxx_close(struct snd_pcm_substream *substream)
{
....
kfree(substream->runtime->private_data);
@@ -2709,8 +2706,8 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static int snd_xxx_hw_params(snd_pcm_substream_t * substream,
- snd_pcm_hw_params_t * hw_params);
+ static int snd_xxx_hw_params(struct snd_pcm_substream *substream,
+ struct snd_pcm_hw_params *hw_params);
]]>
</programlisting>
</informalexample>
@@ -2785,7 +2782,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static int snd_xxx_hw_free(snd_pcm_substream_t * substream);
+ static int snd_xxx_hw_free(struct snd_pcm_substream *substream);
]]>
</programlisting>
</informalexample>
@@ -2820,7 +2817,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static int snd_xxx_prepare(snd_pcm_substream_t * substream);
+ static int snd_xxx_prepare(struct snd_pcm_substream *substream);
]]>
</programlisting>
</informalexample>
@@ -2869,7 +2866,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static int snd_xxx_trigger(snd_pcm_substream_t * substream, int cmd);
+ static int snd_xxx_trigger(struct snd_pcm_substream *substream, int cmd);
]]>
</programlisting>
</informalexample>
@@ -2911,8 +2908,8 @@ struct _snd_pcm_runtime {
</para>
<para>
- When the pcm supports the suspend/resume operation
- (i.e. <constant>SNDRV_PCM_INFO_RESUME</constant> flag is set),
+ When the pcm supports the suspend/resume operation,
+ regardless of full or partial suspend/resume support,
<constant>SUSPEND</constant> and <constant>RESUME</constant>
commands must be handled, too.
These commands are issued when the power-management status is
@@ -2921,6 +2918,8 @@ struct _snd_pcm_runtime {
do suspend and resume of the pcm substream, and usually, they
are identical with <constant>STOP</constant> and
<constant>START</constant> commands, respectively.
+ See the <link linkend="power-management"><citetitle>
+ Power Management</citetitle></link> section for details.
</para>
<para>
@@ -2939,7 +2938,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static snd_pcm_uframes_t snd_xxx_pointer(snd_pcm_substream_t * substream)
+ static snd_pcm_uframes_t snd_xxx_pointer(struct snd_pcm_substream *substream)
]]>
</programlisting>
</informalexample>
@@ -3067,7 +3066,7 @@ struct _snd_pcm_runtime {
static irqreturn_t snd_mychip_interrupt(int irq, void *dev_id,
struct pt_regs *regs)
{
- mychip_t *chip = dev_id;
+ struct mychip *chip = dev_id;
spin_lock(&chip->lock);
....
if (pcm_irq_invoked(chip)) {
@@ -3111,7 +3110,7 @@ struct _snd_pcm_runtime {
static irqreturn_t snd_mychip_interrupt(int irq, void *dev_id,
struct pt_regs *regs)
{
- mychip_t *chip = dev_id;
+ struct mychip *chip = dev_id;
spin_lock(&chip->lock);
....
if (pcm_irq_invoked(chip)) {
@@ -3221,13 +3220,13 @@ struct _snd_pcm_runtime {
<![CDATA[
static unsigned int rates[] =
{4000, 10000, 22050, 44100};
- static snd_pcm_hw_constraint_list_t constraints_rates = {
+ static struct snd_pcm_hw_constraint_list constraints_rates = {
.count = ARRAY_SIZE(rates),
.list = rates,
.mask = 0,
};
- static int snd_mychip_pcm_open(snd_pcm_substream_t *substream)
+ static int snd_mychip_pcm_open(struct snd_pcm_substream *substream)
{
int err;
....
@@ -3249,19 +3248,20 @@ struct _snd_pcm_runtime {
You can even define your own constraint rules.
For example, let's suppose my_chip can manage a substream of 1 channel
if and only if the format is S16_LE, otherwise it supports any format
- specified in the <type>snd_pcm_hardware_t</type> stucture (or in any
+ specified in the <structname>snd_pcm_hardware</structname> structure (or in any
other constraint_list). You can build a rule like this:
<example>
<title>Example of Hardware Constraints for Channels</title>
<programlisting>
<![CDATA[
- static int hw_rule_format_by_channels(snd_pcm_hw_params_t *params,
- snd_pcm_hw_rule_t *rule)
+ static int hw_rule_format_by_channels(struct snd_pcm_hw_params *params,
+ struct snd_pcm_hw_rule *rule)
{
- snd_interval_t *c = hw_param_interval(params, SNDRV_PCM_HW_PARAM_CHANNELS);
- snd_mask_t *f = hw_param_mask(params, SNDRV_PCM_HW_PARAM_FORMAT);
- snd_mask_t fmt;
+ struct snd_interval *c = hw_param_interval(params,
+ SNDRV_PCM_HW_PARAM_CHANNELS);
+ struct snd_mask *f = hw_param_mask(params, SNDRV_PCM_HW_PARAM_FORMAT);
+ struct snd_mask fmt;
snd_mask_any(&fmt); /* Init the struct */
if (c->min < 2) {
@@ -3298,12 +3298,13 @@ struct _snd_pcm_runtime {
<title>Example of Hardware Constraints for Channels</title>
<programlisting>
<![CDATA[
- static int hw_rule_channels_by_format(snd_pcm_hw_params_t *params,
- snd_pcm_hw_rule_t *rule)
+ static int hw_rule_channels_by_format(struct snd_pcm_hw_params *params,
+ struct snd_pcm_hw_rule *rule)
{
- snd_interval_t *c = hw_param_interval(params, SNDRV_PCM_HW_PARAM_CHANNELS);
- snd_mask_t *f = hw_param_mask(params, SNDRV_PCM_HW_PARAM_FORMAT);
- snd_interval_t ch;
+ struct snd_interval *c = hw_param_interval(params,
+ SNDRV_PCM_HW_PARAM_CHANNELS);
+ struct snd_mask *f = hw_param_mask(params, SNDRV_PCM_HW_PARAM_FORMAT);
+ struct snd_interval ch;
snd_interval_any(&ch);
if (f->bits[0] == SNDRV_PCM_FMTBIT_S16_LE) {
@@ -3376,13 +3377,13 @@ struct _snd_pcm_runtime {
callbacks: <structfield>info</structfield>,
<structfield>get</structfield> and
<structfield>put</structfield>. Then, define a
- <type>snd_kcontrol_new_t</type> record, such as:
+ struct <structname>snd_kcontrol_new</structname> record, such as:
<example>
<title>Definition of a Control</title>
<programlisting>
<![CDATA[
- static snd_kcontrol_new_t my_control __devinitdata = {
+ static struct snd_kcontrol_new my_control __devinitdata = {
.iface = SNDRV_CTL_ELEM_IFACE_MIXER,
.name = "PCM Playback Switch",
.index = 0,
@@ -3599,7 +3600,7 @@ struct _snd_pcm_runtime {
<para>
The <structfield>info</structfield> callback is used to get
the detailed information of this control. This must store the
- values of the given <type>snd_ctl_elem_info_t</type>
+ values of the given struct <structname>snd_ctl_elem_info</structname>
object. For example, for a boolean control with a single
element will be:
@@ -3607,8 +3608,8 @@ struct _snd_pcm_runtime {
<title>Example of info callback</title>
<programlisting>
<![CDATA[
- static int snd_myctl_info(snd_kcontrol_t *kcontrol,
- snd_ctl_elem_info_t *uinfo)
+ static int snd_myctl_info(struct snd_kcontrol *kcontrol,
+ struct snd_ctl_elem_info *uinfo)
{
uinfo->type = SNDRV_CTL_ELEM_TYPE_BOOLEAN;
uinfo->count = 1;
@@ -3642,8 +3643,8 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static int snd_myctl_info(snd_kcontrol_t *kcontrol,
- snd_ctl_elem_info_t *uinfo)
+ static int snd_myctl_info(struct snd_kcontrol *kcontrol,
+ struct snd_ctl_elem_info *uinfo)
{
static char *texts[4] = {
"First", "Second", "Third", "Fourth"
@@ -3678,10 +3679,10 @@ struct _snd_pcm_runtime {
<title>Example of get callback</title>
<programlisting>
<![CDATA[
- static int snd_myctl_get(snd_kcontrol_t *kcontrol,
- snd_ctl_elem_value_t *ucontrol)
+ static int snd_myctl_get(struct snd_kcontrol *kcontrol,
+ struct snd_ctl_elem_value *ucontrol)
{
- mychip_t *chip = snd_kcontrol_chip(kcontrol);
+ struct mychip *chip = snd_kcontrol_chip(kcontrol);
ucontrol->value.integer.value[0] = get_some_value(chip);
return 0;
}
@@ -3717,8 +3718,8 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static int snd_sbmixer_get_single(snd_kcontrol_t *kcontrol,
- snd_ctl_elem_value_t *ucontrol)
+ static int snd_sbmixer_get_single(struct snd_kcontrol *kcontrol,
+ struct snd_ctl_elem_value *ucontrol)
{
int reg = kcontrol->private_value & 0xff;
int shift = (kcontrol->private_value >> 16) & 0xff;
@@ -3754,10 +3755,10 @@ struct _snd_pcm_runtime {
<title>Example of put callback</title>
<programlisting>
<![CDATA[
- static int snd_myctl_put(snd_kcontrol_t *kcontrol,
- snd_ctl_elem_value_t *ucontrol)
+ static int snd_myctl_put(struct snd_kcontrol *kcontrol,
+ struct snd_ctl_elem_value *ucontrol)
{
- mychip_t *chip = snd_kcontrol_chip(kcontrol);
+ struct mychip *chip = snd_kcontrol_chip(kcontrol);
int changed = 0;
if (chip->current_value !=
ucontrol->value.integer.value[0]) {
@@ -3814,7 +3815,7 @@ struct _snd_pcm_runtime {
</informalexample>
where <parameter>my_control</parameter> is the
- <type>snd_kcontrol_new_t</type> object defined above, and chip
+ struct <structname>snd_kcontrol_new</structname> object defined above, and chip
is the object pointer to be passed to
kcontrol-&gt;private_data
which can be referred in callbacks.
@@ -3822,7 +3823,7 @@ struct _snd_pcm_runtime {
<para>
<function>snd_ctl_new1()</function> allocates a new
- <type>snd_kcontrol_t</type> instance (that's why the definition
+ <structname>snd_kcontrol</structname> instance (that's why the definition
of <parameter>my_control</parameter> can be with
<parameter>__devinitdata</parameter>
prefix), and <function>snd_ctl_add</function> assigns the given
@@ -3849,7 +3850,7 @@ struct _snd_pcm_runtime {
control id pointer for the notification. The event-mask
specifies the types of notification, for example, in the above
example, the change of control values is notified.
- The id pointer is the pointer of <type>snd_ctl_elem_id_t</type>
+ The id pointer is the pointer of struct <structname>snd_ctl_elem_id</structname>
to be notified.
You can find some examples in <filename>es1938.c</filename> or
<filename>es1968.c</filename> for hardware volume interrupts.
@@ -3882,35 +3883,35 @@ struct _snd_pcm_runtime {
<title>Example of AC97 Interface</title>
<programlisting>
<![CDATA[
- struct snd_mychip {
+ struct mychip {
....
- ac97_t *ac97;
+ struct snd_ac97 *ac97;
....
};
- static unsigned short snd_mychip_ac97_read(ac97_t *ac97,
+ static unsigned short snd_mychip_ac97_read(struct snd_ac97 *ac97,
unsigned short reg)
{
- mychip_t *chip = ac97->private_data;
+ struct mychip *chip = ac97->private_data;
....
// read a register value here from the codec
return the_register_value;
}
- static void snd_mychip_ac97_write(ac97_t *ac97,
+ static void snd_mychip_ac97_write(struct snd_ac97 *ac97,
unsigned short reg, unsigned short val)
{
- mychip_t *chip = ac97->private_data;
+ struct mychip *chip = ac97->private_data;
....
// write the given register value to the codec
}
- static int snd_mychip_ac97(mychip_t *chip)
+ static int snd_mychip_ac97(struct mychip *chip)
{
- ac97_bus_t *bus;
- ac97_template_t ac97;
+ struct snd_ac97_bus *bus;
+ struct snd_ac97_template ac97;
int err;
- static ac97_bus_ops_t ops = {
+ static struct snd_ac97_bus_ops ops = {
.write = snd_mychip_ac97_write,
.read = snd_mychip_ac97_read,
};
@@ -3937,8 +3938,8 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- ac97_bus_t *bus;
- static ac97_bus_ops_t ops = {
+ struct snd_ac97_bus *bus;
+ static struct snd_ac97_bus_ops ops = {
.write = snd_mychip_ac97_write,
.read = snd_mychip_ac97_read,
};
@@ -3952,13 +3953,14 @@ struct _snd_pcm_runtime {
</para>
<para>
- And then call <function>snd_ac97_mixer()</function> with an <type>ac97_template_t</type>
+ And then call <function>snd_ac97_mixer()</function> with a
+ struct <structname>snd_ac97_template</structname>
record together with the bus pointer created above.
<informalexample>
<programlisting>
<![CDATA[
- ac97_template_t ac97;
+ struct snd_ac97_template ac97;
int err;
memset(&ac97, 0, sizeof(ac97));
@@ -3995,10 +3997,10 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static unsigned short snd_mychip_ac97_read(ac97_t *ac97,
+ static unsigned short snd_mychip_ac97_read(struct snd_ac97 *ac97,
unsigned short reg)
{
- mychip_t *chip = ac97->private_data;
+ struct mychip *chip = ac97->private_data;
....
return the_register_value;
}
@@ -4016,7 +4018,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static void snd_mychip_ac97_write(ac97_t *ac97,
+ static void snd_mychip_ac97_write(struct snd_ac97 *ac97,
unsigned short reg, unsigned short val)
]]>
</programlisting>
@@ -4163,7 +4165,7 @@ struct _snd_pcm_runtime {
<title>Multiple Codecs</title>
<para>
When there are several codecs on the same card, you need to
- call <function>snd_ac97_new()</function> multiple times with
+ call <function>snd_ac97_mixer()</function> multiple times with
ac97.num=1 or greater. The <structfield>num</structfield> field
specifies the codec
number.
@@ -4212,7 +4214,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- snd_rawmidi_t *rmidi;
+ struct snd_rawmidi *rmidi;
snd_mpu401_uart_new(card, 0, MPU401_HW_MPU401, port, integrated,
irq, irq_flags, &rmidi);
]]>
@@ -4253,17 +4255,17 @@ struct _snd_pcm_runtime {
Usually, the port address corresponds to the command port and
port + 1 corresponds to the data port. If not, you may change
the <structfield>cport</structfield> field of
- <type>mpu401_t</type> manually
- afterward. However, <type>mpu401_t</type> pointer is not
+ struct <structname>snd_mpu401</structname> manually
+ afterward. However, <structname>snd_mpu401</structname> pointer is not
returned explicitly by
<function>snd_mpu401_uart_new()</function>. You need to cast
rmidi-&gt;private_data to
- <type>mpu401_t</type> explicitly,
+ <structname>snd_mpu401</structname> explicitly,
<informalexample>
<programlisting>
<![CDATA[
- mpu401_t *mpu;
+ struct snd_mpu401 *mpu;
mpu = rmidi->private_data;
]]>
</programlisting>
@@ -4359,7 +4361,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- snd_rawmidi_t *rmidi;
+ struct snd_rawmidi *rmidi;
err = snd_rawmidi_new(chip->card, "MyMIDI", 0, outs, ins, &rmidi);
if (err < 0)
return err;
@@ -4419,7 +4421,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static snd_rawmidi_ops_t snd_mymidi_output_ops = {
+ static struct snd_rawmidi_ops snd_mymidi_output_ops = {
.open = snd_mymidi_output_open,
.close = snd_mymidi_output_close,
.trigger = snd_mymidi_output_trigger,
@@ -4439,9 +4441,9 @@ struct _snd_pcm_runtime {
<programlisting>
<![CDATA[
struct list_head *list;
- snd_rawmidi_substream_t *substream;
+ struct snd_rawmidi_substream *substream;
list_for_each(list, &rmidi->streams[SNDRV_RAWMIDI_STREAM_OUTPUT].substreams) {
- substream = list_entry(list, snd_rawmidi_substream_t, list);
+ substream = list_entry(list, struct snd_rawmidi_substream, list);
sprintf(substream->name, "My MIDI Port %d", substream->number + 1);
}
/* same for SNDRV_RAWMIDI_STREAM_INPUT */
@@ -4463,12 +4465,12 @@ struct _snd_pcm_runtime {
<para>
If there is more than one port, your callbacks can determine the
- port index from the snd_rawmidi_substream_t data passed to each
+ port index from the struct snd_rawmidi_substream data passed to each
callback:
<informalexample>
<programlisting>
<![CDATA[
- snd_rawmidi_substream_t *substream;
+ struct snd_rawmidi_substream *substream;
int index = substream->number;
]]>
</programlisting>
@@ -4481,7 +4483,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static int snd_xxx_open(snd_rawmidi_substream_t *substream);
+ static int snd_xxx_open(struct snd_rawmidi_substream *substream);
]]>
</programlisting>
</informalexample>
@@ -4499,7 +4501,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static int snd_xxx_close(snd_rawmidi_substream_t *substream);
+ static int snd_xxx_close(struct snd_rawmidi_substream *substream);
]]>
</programlisting>
</informalexample>
@@ -4522,7 +4524,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static void snd_xxx_output_trigger(snd_rawmidi_substream_t *substream, int up);
+ static void snd_xxx_output_trigger(struct snd_rawmidi_substream *substream, int up);
]]>
</programlisting>
</informalexample>
@@ -4547,7 +4549,7 @@ struct _snd_pcm_runtime {
<![CDATA[
unsigned char data;
while (snd_rawmidi_transmit_peek(substream, &data, 1) == 1) {
- if (mychip_try_to_transmit(data))
+ if (snd_mychip_try_to_transmit(data))
snd_rawmidi_transmit_ack(substream, 1);
else
break; /* hardware FIFO full */
@@ -4564,11 +4566,11 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- while (mychip_transmit_possible()) {
+ while (snd_mychip_transmit_possible()) {
unsigned char data;
if (snd_rawmidi_transmit(substream, &data, 1) != 1)
break; /* no more data */
- mychip_transmit(data);
+ snd_mychip_transmit(data);
}
]]>
</programlisting>
@@ -4603,7 +4605,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static void snd_xxx_input_trigger(snd_rawmidi_substream_t *substream, int up);
+ static void snd_xxx_input_trigger(struct snd_rawmidi_substream *substream, int up);
]]>
</programlisting>
</informalexample>
@@ -4647,7 +4649,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static void snd_xxx_drain(snd_rawmidi_substream_t *substream);
+ static void snd_xxx_drain(struct snd_rawmidi_substream *substream);
]]>
</programlisting>
</informalexample>
@@ -4661,7 +4663,7 @@ struct _snd_pcm_runtime {
<para>
This callback is optional. If you do not set
- <structfield>drain</structfield> in the snd_rawmidi_ops_t
+ <structfield>drain</structfield> in the struct snd_rawmidi_ops
structure, ALSA will simply wait for 50&nbsp;milliseconds
instead.
</para>
@@ -4703,7 +4705,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- opl3_t *opl3;
+ struct snd_opl3 *opl3;
snd_opl3_create(card, lport, rport, OPL3_HW_OPL3_XXX,
integrated, &opl3);
]]>
@@ -4736,7 +4738,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- opl3_t *opl3;
+ struct snd_opl3 *opl3;
snd_opl3_new(card, OPL3_HW_OPL3_XXX, &opl3);
]]>
</programlisting>
@@ -4767,7 +4769,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- snd_hwdep_t *opl3hwdep;
+ struct snd_hwdep *opl3hwdep;
snd_opl3_hwdep_new(opl3, 0, 1, &opl3hwdep);
]]>
</programlisting>
@@ -4804,7 +4806,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- snd_hwdep_t *hw;
+ struct snd_hwdep *hw;
snd_hwdep_new(card, "My HWDEP", 0, &hw);
]]>
</programlisting>
@@ -4823,7 +4825,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- mydata_t *p = kmalloc(sizeof(*p), GFP_KERNEL);
+ struct mydata *p = kmalloc(sizeof(*p), GFP_KERNEL);
hw->private_data = p;
hw->private_free = mydata_free;
]]>
@@ -4835,9 +4837,9 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static void mydata_free(snd_hwdep_t *hw)
+ static void mydata_free(struct snd_hwdep *hw)
{
- mydata_t *p = hw->private_data;
+ struct mydata *p = hw->private_data;
kfree(p);
}
]]>
@@ -5061,9 +5063,9 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static int playback_copy(snd_pcm_substream_t *substream, int channel,
+ static int playback_copy(struct snd_pcm_substream *substream, int channel,
snd_pcm_uframes_t pos, void *src, snd_pcm_uframes_t count);
- static int capture_copy(snd_pcm_substream_t *substream, int channel,
+ static int capture_copy(struct snd_pcm_substream *substream, int channel,
snd_pcm_uframes_t pos, void *dst, snd_pcm_uframes_t count);
]]>
</programlisting>
@@ -5144,7 +5146,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static int silence(snd_pcm_substream_t *substream, int channel,
+ static int silence(struct snd_pcm_substream *substream, int channel,
snd_pcm_uframes_t pos, snd_pcm_uframes_t count);
]]>
</programlisting>
@@ -5211,7 +5213,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- snd_pcm_sgbuf_t *sgbuf = (snd_pcm_sgbuf_t*)substream->dma_private;
+ struct snd_sg_buf *sgbuf = (struct snd_sg_buf *)substream->dma_private;
]]>
</programlisting>
</informalexample>
@@ -5266,7 +5268,7 @@ struct _snd_pcm_runtime {
#include <linux/vmalloc.h>
/* get the physical page pointer on the given offset */
- static struct page *mychip_page(snd_pcm_substream_t *substream,
+ static struct page *mychip_page(struct snd_pcm_substream *substream,
unsigned long offset)
{
void *pageptr = substream->runtime->dma_area + offset;
@@ -5301,7 +5303,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- snd_info_entry_t *entry;
+ struct snd_info_entry *entry;
int err = snd_card_proc_new(card, "my-file", &entry);
]]>
</programlisting>
@@ -5345,8 +5347,8 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static void my_proc_read(snd_info_entry_t *entry,
- snd_info_buffer_t *buffer);
+ static void my_proc_read(struct snd_info_entry *entry,
+ struct snd_info_buffer *buffer);
]]>
</programlisting>
</informalexample>
@@ -5361,10 +5363,10 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static void my_proc_read(snd_info_entry_t *entry,
- snd_info_buffer_t *buffer)
+ static void my_proc_read(struct snd_info_entry *entry,
+ struct snd_info_buffer *buffer)
{
- chip_t *chip = entry->private_data;
+ struct mychip *chip = entry->private_data;
snd_iprintf(buffer, "This is my chip!\n");
snd_iprintf(buffer, "Port = %ld\n", chip->port);
@@ -5453,7 +5455,7 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static long my_file_io_read(snd_info_entry_t *entry,
+ static long my_file_io_read(struct snd_info_entry *entry,
void *file_private_data,
struct file *file,
char *buf,
@@ -5488,22 +5490,60 @@ struct _snd_pcm_runtime {
<constant>CONFIG_PM</constant>.
</para>
+ <para>
+ If the driver supports suspend/resume
+ <emphasis>fully</emphasis>, that is, the device can be
+ properly restored to the state it was in when suspend was called,
+ you can set the <constant>SNDRV_PCM_INFO_RESUME</constant> flag
+ in the pcm info field. Usually, this is possible when the
+ registers of the chip can be safely saved to and restored from
+ RAM. If this is set, the trigger callback is called with
+ <constant>SNDRV_PCM_TRIGGER_RESUME</constant> after the resume
+ callback is finished.
+ </para>
+
+ <para>
+ Even if the driver doesn't support PM fully and only
+ partial suspend/resume is possible, it's still worthwhile to
+ implement suspend/resume callbacks. In such a case, applications
+ would reset the status by calling
+ <function>snd_pcm_prepare()</function> and restart the stream
+ appropriately. Hence, you can define the suspend/resume callbacks
+ below but don't set the <constant>SNDRV_PCM_INFO_RESUME</constant>
+ info flag on the PCM.
+ </para>
+
+ <para>
+ Note that the trigger with SUSPEND can always be called when
+ <function>snd_pcm_suspend_all()</function> is called,
+ regardless of the <constant>SNDRV_PCM_INFO_RESUME</constant> flag.
+ The <constant>RESUME</constant> flag affects only the behavior
+ of <function>snd_pcm_resume()</function>.
+ (Thus, in theory,
+ <constant>SNDRV_PCM_TRIGGER_RESUME</constant> doesn't need
+ to be handled in the trigger callback when the
+ <constant>SNDRV_PCM_INFO_RESUME</constant> flag is not set. But
+ it's better to keep it for compatibility reasons.)
+ </para>
<para>
- ALSA provides the common power-management layer. Each card driver
- needs to have only low-level suspend and resume callbacks.
+ In earlier versions of the ALSA drivers, a common
+ power-management layer was provided, but it has been removed.
+ The driver needs to define the suspend/resume hooks according to
+ the bus the device is attached to. For a PCI driver, the
+ callbacks look like the following:
<informalexample>
<programlisting>
<![CDATA[
#ifdef CONFIG_PM
- static int snd_my_suspend(snd_card_t *card, pm_message_t state)
+ static int snd_my_suspend(struct pci_dev *pci, pm_message_t state)
{
- .... // do things for suspsend
+ .... /* do things for suspend */
return 0;
}
- static int snd_my_resume(snd_card_t *card)
+ static int snd_my_resume(struct pci_dev *pci)
{
- .... // do things for suspsend
+ .... /* do things for resume */
return 0;
}
#endif
@@ -5516,11 +5556,18 @@ struct _snd_pcm_runtime {
The scheme of the real suspend job is as following.
<orderedlist>
- <listitem><para>Retrieve the chip data from pm_private_data field.</para></listitem>
+ <listitem><para>Retrieve the card and the chip data.</para></listitem>
+ <listitem><para>Call <function>snd_power_change_state()</function> with
+ <constant>SNDRV_CTL_POWER_D3hot</constant> to change the
+ power status.</para></listitem>
<listitem><para>Call <function>snd_pcm_suspend_all()</function> to suspend the running PCM streams.</para></listitem>
+ <listitem><para>If AC97 codecs are used, call
+ <function>snd_ac97_suspend()</function> for each codec.</para></listitem>
<listitem><para>Save the register values if necessary.</para></listitem>
<listitem><para>Stop the hardware if necessary.</para></listitem>
- <listitem><para>Disable the PCI device by calling <function>pci_disable_device()</function>.</para></listitem>
+ <listitem><para>Disable the PCI device by calling
+ <function>pci_disable_device()</function>. Finally, call
+ <function>pci_save_state()</function>.</para></listitem>
</orderedlist>
</para>
@@ -5530,18 +5577,24 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static int mychip_suspend(snd_card_t *card, pm_message_t state)
+ static int mychip_suspend(struct pci_dev *pci, pm_message_t state)
{
/* (1) */
- mychip_t *chip = card->pm_private_data;
+ struct snd_card *card = pci_get_drvdata(pci);
+ struct mychip *chip = card->private_data;
/* (2) */
- snd_pcm_suspend_all(chip->pcm);
+ snd_power_change_state(card, SNDRV_CTL_POWER_D3hot);
/* (3) */
- snd_mychip_save_registers(chip);
+ snd_pcm_suspend_all(chip->pcm);
/* (4) */
- snd_mychip_stop_hardware(chip);
+ snd_ac97_suspend(chip->ac97);
/* (5) */
- pci_disable_device(chip->pci);
+ snd_mychip_save_registers(chip);
+ /* (6) */
+ snd_mychip_stop_hardware(chip);
+ /* (7) */
+ pci_disable_device(pci);
+ pci_save_state(pci);
return 0;
}
]]>
@@ -5553,14 +5606,17 @@ struct _snd_pcm_runtime {
The scheme of the real resume job is as following.
<orderedlist>
- <listitem><para>Retrieve the chip data from pm_private_data field.</para></listitem>
- <listitem><para>Enable the pci device again by calling
- <function>pci_enable_device()</function>.</para></listitem>
+ <listitem><para>Retrieve the card and the chip data.</para></listitem>
+ <listitem><para>Set up PCI. First, call <function>pci_restore_state()</function>.
+ Then enable the pci device again by calling <function>pci_enable_device()</function>.
+ Call <function>pci_set_master()</function> if necessary, too.</para></listitem>
<listitem><para>Re-initialize the chip.</para></listitem>
<listitem><para>Restore the saved registers if necessary.</para></listitem>
<listitem><para>Resume the mixer, e.g. calling
<function>snd_ac97_resume()</function>.</para></listitem>
<listitem><para>Restart the hardware (if any).</para></listitem>
+ <listitem><para>Call <function>snd_power_change_state()</function> with
+ <constant>SNDRV_CTL_POWER_D0</constant> to notify the processes.</para></listitem>
</orderedlist>
</para>
@@ -5570,12 +5626,15 @@ struct _snd_pcm_runtime {
<informalexample>
<programlisting>
<![CDATA[
- static void mychip_resume(mychip_t *chip)
+ static int mychip_resume(struct pci_dev *pci)
{
/* (1) */
- mychip_t *chip = card->pm_private_data;
+ struct snd_card *card = pci_get_drvdata(pci);
+ struct mychip *chip = card->private_data;
/* (2) */
- pci_enable_device(chip->pci);
+ pci_restore_state(pci);
+ pci_enable_device(pci);
+ pci_set_master(pci);
/* (3) */
snd_mychip_reinit_chip(chip);
/* (4) */
@@ -5584,6 +5643,8 @@ struct _snd_pcm_runtime {
snd_ac97_resume(chip->ac97);
/* (6) */
snd_mychip_restart_chip(chip);
+ /* (7) */
+ snd_power_change_state(card, SNDRV_CTL_POWER_D0);
return 0;
}
]]>
@@ -5592,8 +5653,23 @@ struct _snd_pcm_runtime {
</para>
<para>
- OK, we have all callbacks now. Let's set up them now. In the
- initialization of the card, add the following:
+ As shown above, it's better to save registers after
+ suspending the PCM operations via
+ <function>snd_pcm_suspend_all()</function> or
+ <function>snd_pcm_suspend()</function>. It means that the PCM
+ streams are already stopped when the register snapshot is
+ taken. But remember that you don't have to restart the PCM
+ stream in the resume callback. It'll be restarted via a
+ trigger call with <constant>SNDRV_PCM_TRIGGER_RESUME</constant>
+ when necessary.
+ </para>
+
+ <para>
+ OK, we have all callbacks now. Let's set them up. In the
+ initialization of the card, make sure that you can get the chip
+ data from the card instance, typically via
+ <structfield>private_data</structfield> field, in case you
+ created the chip data individually.
<informalexample>
<programlisting>
@@ -5602,33 +5678,56 @@ struct _snd_pcm_runtime {
const struct pci_device_id *pci_id)
{
....
- snd_card_t *card;
- mychip_t *chip;
+ struct snd_card *card;
+ struct mychip *chip;
....
- snd_card_set_pm_callback(card, snd_my_suspend, snd_my_resume, chip);
+ card = snd_card_new(index[dev], id[dev], THIS_MODULE, NULL);
+ ....
+ chip = kzalloc(sizeof(*chip), GFP_KERNEL);
+ ....
+ card->private_data = chip;
+ ....
+ }
+]]>
+ </programlisting>
+ </informalexample>
+
+ When the chip data is allocated by
+ <function>snd_card_new()</function>, it's accessible
+ via the <structfield>private_data</structfield> field anyway.
+
+ <informalexample>
+ <programlisting>
+<![CDATA[
+ static int __devinit snd_mychip_probe(struct pci_dev *pci,
+ const struct pci_device_id *pci_id)
+ {
+ ....
+ struct snd_card *card;
+ struct mychip *chip;
+ ....
+ card = snd_card_new(index[dev], id[dev], THIS_MODULE,
+ sizeof(struct mychip));
+ ....
+ chip = card->private_data;
....
}
]]>
</programlisting>
</informalexample>
- Here you don't have to put ifdef CONFIG_PM around, since it's already
- checked in the header and expanded to empty if not needed.
</para>
<para>
- If you need a space for saving the registers, you'll need to
- allocate the buffer for it here, too, since it would be fatal
+ If you need a space for saving the registers, allocate the
+ buffer for it here, too, since it would be fatal
if you cannot allocate a memory in the suspend phase.
The allocated buffer should be released in the corresponding
destructor.
</para>
<para>
- And next, set suspend/resume callbacks to the pci_driver,
- This can be done by passing a macro SND_PCI_PM_CALLBACKS
- in the pci_driver struct. This macro is expanded to the correct
- (global) callbacks if CONFIG_PM is set.
+ And next, set suspend/resume callbacks to the pci_driver.
<informalexample>
<programlisting>
@@ -5638,7 +5737,10 @@ struct _snd_pcm_runtime {
.id_table = snd_my_ids,
.probe = snd_my_probe,
.remove = __devexit_p(snd_my_remove),
- SND_PCI_PM_CALLBACKS
+ #ifdef CONFIG_PM
+ .suspend = snd_my_suspend,
+ .resume = snd_my_resume,
+ #endif
};
]]>
</programlisting>
diff --git a/Documentation/sound/alsa/Procfile.txt b/Documentation/sound/alsa/Procfile.txt
index 25c5d648aef..1fe48846d78 100644
--- a/Documentation/sound/alsa/Procfile.txt
+++ b/Documentation/sound/alsa/Procfile.txt
@@ -138,6 +138,22 @@ card*/codec97#0/ac97#?-?+regs
# echo 02 9f1f > /proc/asound/card0/codec97#0/ac97#0-0+regs
+USB Audio Streams
+-----------------
+
+card*/stream*
+ Shows the assignment and the current status of each audio stream
+ of the given card. This information is very useful for debugging.
+
+
+HD-Audio Codecs
+---------------
+
+card*/codec#*
+ Shows the general codec information and the attribute of each
+ widget node.
+
+
Sequencer Information
---------------------
diff --git a/Documentation/sound/alsa/hda_codec.txt b/Documentation/sound/alsa/hda_codec.txt
index e9d07b8f1ac..0be57ed8130 100644
--- a/Documentation/sound/alsa/hda_codec.txt
+++ b/Documentation/sound/alsa/hda_codec.txt
@@ -63,7 +63,7 @@ The bus instance is created via snd_hda_bus_new(). You need to pass
the card instance, the template, and the pointer to store the
resultant bus instance.
-int snd_hda_bus_new(snd_card_t *card, const struct hda_bus_template *temp,
+int snd_hda_bus_new(struct snd_card *card, const struct hda_bus_template *temp,
struct hda_bus **busp);
It returns zero if successful. A negative return value means any
@@ -166,14 +166,14 @@ The ops field contains the following callback functions:
struct hda_pcm_ops {
int (*open)(struct hda_pcm_stream *info, struct hda_codec *codec,
- snd_pcm_substream_t *substream);
+ struct snd_pcm_substream *substream);
int (*close)(struct hda_pcm_stream *info, struct hda_codec *codec,
- snd_pcm_substream_t *substream);
+ struct snd_pcm_substream *substream);
int (*prepare)(struct hda_pcm_stream *info, struct hda_codec *codec,
unsigned int stream_tag, unsigned int format,
- snd_pcm_substream_t *substream);
+ struct snd_pcm_substream *substream);
int (*cleanup)(struct hda_pcm_stream *info, struct hda_codec *codec,
- snd_pcm_substream_t *substream);
+ struct snd_pcm_substream *substream);
};
All are non-NULL, so you can call them safely without NULL check.
@@ -284,7 +284,7 @@ parameter, and PCI subsystem IDs. If the matching entry is found, it
returns the config field value.
snd_hda_add_new_ctls() can be used to create and add control entries.
-Pass the zero-terminated array of snd_kcontrol_new_t. The same array
+Pass the zero-terminated array of struct snd_kcontrol_new. The same array
can be passed to snd_hda_resume_ctls() for resume.
Note that this will call control->put callback of these entries. So,
put callback should check codec->in_resume and force to restore the
@@ -292,7 +292,7 @@ given value if it's non-zero even if the value is identical with the
cached value.
Macros HDA_CODEC_VOLUME(), HDA_CODEC_MUTE() and their variables can be
-used for the entry of snd_kcontrol_new_t.
+used for the entry of struct snd_kcontrol_new.
The input MUX helper callbacks for such a control are provided, too:
snd_hda_input_mux_info() and snd_hda_input_mux_put(). See
diff --git a/Documentation/spi/butterfly b/Documentation/spi/butterfly
new file mode 100644
index 00000000000..a2e8c8d90e3
--- /dev/null
+++ b/Documentation/spi/butterfly
@@ -0,0 +1,57 @@
+spi_butterfly - parport-to-butterfly adapter driver
+===================================================
+
+This is a hardware and software project that includes building and using
+a parallel port adapter cable, together with an "AVR Butterfly" to run
+firmware for user interfacing and/or sensors. A Butterfly is a $US20
+battery powered card with an AVR microcontroller and lots of goodies:
+sensors, LCD, flash, toggle stick, and more. You can use AVR-GCC to
+develop firmware for this, and flash it using this adapter cable.
+
+You can make this adapter from an old printer cable and solder things
+directly to the Butterfly. Or (if you have the parts and skills) you
+can come up with something fancier, providing circuit protection to the
+Butterfly and the printer port, or with a better power supply than two
+signal pins from the printer port.
+
+
+The first cable connections will hook Linux up to one SPI bus, with the
+AVR and a DataFlash chip; and to the AVR reset line. This is all you
+need to reflash the firmware, and the pins are the standard Atmel "ISP"
+connector pins (used also on non-Butterfly AVR boards).
+
+ Signal Butterfly Parport (DB-25)
+ ------ --------- ---------------
+ SCK = J403.PB1/SCK = pin 2/D0
+ RESET = J403.nRST = pin 3/D1
+ VCC = J403.VCC_EXT = pin 8/D6
+ MOSI = J403.PB2/MOSI = pin 9/D7
+ MISO = J403.PB3/MISO = pin 11/S7,nBUSY
+ GND = J403.GND = pin 23/GND
+
+Then to let Linux master that bus to talk to the DataFlash chip, you must
+(a) flash new firmware that disables SPI (set PRR.2, and disable pullups
+by clearing PORTB.[0-3]); (b) configure the mtd_dataflash driver; and
+(c) cable in the chipselect.
+
+ Signal Butterfly Parport (DB-25)
+ ------ --------- ---------------
+ VCC = J400.VCC_EXT = pin 7/D5
+ SELECT = J400.PB0/nSS = pin 17/C3,nSELECT
+ GND = J400.GND = pin 24/GND
+
+The "USI" controller, using J405, can be used for a second SPI bus. That
+would let you talk to the AVR over SPI, running firmware that makes it act
+as an SPI slave, while letting either Linux or the AVR use the DataFlash.
+There are plenty of spare parport pins to wire this one up, such as:
+
+ Signal Butterfly Parport (DB-25)
+ ------ --------- ---------------
+ SCK = J403.PE4/USCK = pin 5/D3
+ MOSI = J403.PE5/DI = pin 6/D4
+ MISO = J403.PE6/DO = pin 12/S5,nPAPEROUT
+ GND = J403.GND = pin 22/GND
+
+ IRQ = J402.PF4 = pin 10/S6,ACK
+ GND = J402.GND(P2) = pin 25/GND
+
diff --git a/Documentation/spi/spi-summary b/Documentation/spi/spi-summary
new file mode 100644
index 00000000000..a5ffba33a35
--- /dev/null
+++ b/Documentation/spi/spi-summary
@@ -0,0 +1,457 @@
+Overview of Linux kernel SPI support
+====================================
+
+02-Dec-2005
+
+What is SPI?
+------------
+The "Serial Peripheral Interface" (SPI) is a synchronous four wire serial
+link used to connect microcontrollers to sensors, memory, and peripherals.
+
+The three signal wires hold a clock (SCLK, often on the order of 10 MHz),
+and parallel data lines with "Master Out, Slave In" (MOSI) or "Master In,
+Slave Out" (MISO) signals. (Other names are also used.) There are four
+clocking modes through which data is exchanged; mode-0 and mode-3 are most
+commonly used. Each clock cycle shifts data out and data in; the clock
+doesn't cycle except when there is data to shift.
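+
+As a rough illustration of "each clock cycle shifts data out and data in",
+here is a userspace C sketch (not kernel code) that models the two sides
+of the link as a pair of 8-bit shift registers. The names spi_link_model
+and spi_model_clock() are made up for this example, and MSB-first bit
+order is an assumption:
+
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one shift register on each side of the link. */
struct spi_link_model {
	uint8_t master_reg;	/* byte the master shifts out on MOSI */
	uint8_t slave_reg;	/* byte the slave shifts out on MISO */
};

/*
 * Run "clocks" clock cycles, MSB first (an assumption).  On every
 * cycle each side drives its top bit onto its output line and shifts
 * in the bit driven by the other side.
 */
void spi_model_clock(struct spi_link_model *link, unsigned clocks)
{
	assert(clocks <= 8);	/* the model only holds one byte per side */

	while (clocks--) {
		unsigned mosi = (link->master_reg >> 7) & 1;
		unsigned miso = (link->slave_reg >> 7) & 1;

		link->master_reg = (uint8_t)((link->master_reg << 1) | miso);
		link->slave_reg = (uint8_t)((link->slave_reg << 1) | mosi);
	}
}
```
+
+In this model, eight clocks leave the two registers with each other's
+original contents; no clock cycle moves data in only one direction.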
+
+SPI masters may use a "chip select" line to activate a given SPI slave
+device, so those three signal wires may be connected to several chips
+in parallel. All SPI slaves support chipselects. Some devices have
+other signals, often including an interrupt to the master.
+
+Unlike serial busses like USB or SMBUS, even low level protocols for
+SPI slave functions are usually not interoperable between vendors
+(except for cases like SPI memory chips).
+
+ - SPI may be used for request/response style device protocols, as with
+ touchscreen sensors and memory chips.
+
+ - It may also be used to stream data in either direction (half duplex),
+ or both of them at the same time (full duplex).
+
+ - Some devices may use eight bit words. Others may use different word
+ lengths, such as streams of 12-bit or 20-bit digital samples.
+
+In the same way, SPI slaves will only rarely support any kind of automatic
+discovery/enumeration protocol. The tree of slave devices accessible from
+a given SPI master will normally be set up manually, with configuration
+tables.
+
+SPI is only one of the names used by such four-wire protocols, and
+most controllers have no problem handling "MicroWire" (think of it as
+half-duplex SPI, for request/response protocols), SSP ("Synchronous
+Serial Protocol"), PSP ("Programmable Serial Protocol"), and other
+related protocols.
+
+Microcontrollers often support both master and slave sides of the SPI
+protocol. This document (and Linux) currently only supports the master
+side of SPI interactions.
+
+
+Who uses it? On what kinds of systems?
+---------------------------------------
+Linux developers using SPI are probably writing device drivers for embedded
+systems boards. SPI is used to control external chips, and it is also a
+protocol supported by every MMC or SD memory card. (The older "DataFlash"
+cards, predating MMC cards but using the same connectors and card shape,
+support only SPI.) Some PC hardware uses SPI flash for BIOS code.
+
+SPI slave chips range from digital/analog converters used for analog
+sensors and codecs, to memory, to peripherals like USB controllers
+or Ethernet adapters; and more.
+
+Most systems using SPI will integrate a few devices on a mainboard.
+Some provide SPI links on expansion connectors; in cases where no
+dedicated SPI controller exists, GPIO pins can be used to create a
+low speed "bitbanging" adapter. Very few systems will "hotplug" an SPI
+controller; the reasons to use SPI focus on low cost and simple operation,
+and if dynamic reconfiguration is important, USB will often be a more
+appropriate low-pincount peripheral bus.
+
+Many microcontrollers that can run Linux integrate one or more I/O
+interfaces with SPI modes. Given SPI support, they could use MMC or SD
+cards without needing a special purpose MMC/SD/SDIO controller.
+
+
+How do these driver programming interfaces work?
+------------------------------------------------
+The <linux/spi/spi.h> header file includes kerneldoc, as does the
+main source code, and you should certainly read that. This is just
+an overview, so you get the big picture before the details.
+
+SPI requests always go into I/O queues. Requests for a given SPI device
+are always executed in FIFO order, and complete asynchronously through
+completion callbacks. There are also some simple synchronous wrappers
+for those calls, including ones for common transaction types like writing
+a command and then reading its response.
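+
+The queueing rule can be pictured with a small userspace C model. This
+is only a sketch of "submit now, complete later, in FIFO order"; it does
+not use the kernel API, and every name in it (model_msg, model_queue,
+model_submit(), model_run(), model_record()) is hypothetical:
+
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a queued SPI request. */
struct model_msg {
	int id;
	void (*complete)(struct model_msg *msg);	/* completion callback */
	struct model_msg *next;
};

struct model_queue {
	struct model_msg *head, *tail;
};

/* Example callback: record the order in which requests complete. */
int model_order[8];
unsigned model_count;

void model_record(struct model_msg *msg)
{
	if (model_count < 8)
		model_order[model_count++] = msg->id;
}

/* Submit: append to the tail and return at once, without doing I/O. */
void model_submit(struct model_queue *q, struct model_msg *msg)
{
	assert(msg->complete != NULL);

	msg->next = NULL;
	if (q->tail)
		q->tail->next = msg;
	else
		q->head = msg;
	q->tail = msg;
}

/* Later, from some other context: run requests oldest-first. */
unsigned model_run(struct model_queue *q)
{
	unsigned done = 0;

	while (q->head) {
		struct model_msg *msg = q->head;

		q->head = msg->next;
		if (!q->head)
			q->tail = NULL;
		/* ... a real controller would do the transfer here ... */
		msg->complete(msg);
		done++;
	}
	return done;
}
```
+
+Nothing completes at submit time; the callbacks fire only when
+model_run() drains the queue, and they fire in submission order.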
+
+There are two types of SPI driver, here called:
+
+ Controller drivers ... these are often built in to System-On-Chip
+ processors, and often support both Master and Slave roles.
+ These drivers touch hardware registers and may use DMA.
+ Or they can be PIO bitbangers, needing just GPIO pins.
+
+ Protocol drivers ... these pass messages through the controller
+ driver to communicate with a Slave or Master device on the
+ other side of an SPI link.
+
+So for example one protocol driver might talk to the MTD layer to export
+data to filesystems stored on SPI flash like DataFlash; and others might
+control audio interfaces, present touchscreen sensors as input interfaces,
+or monitor temperature and voltage levels during industrial processing.
+And those might all be sharing the same controller driver.
+
+A "struct spi_device" encapsulates the master-side interface between
+those two types of driver. At this writing, Linux has no slave side
+programming interface.
+
+There is a minimal core of SPI programming interfaces, focussing on
+using driver model to connect controller and protocol drivers using
+device tables provided by board specific initialization code. SPI
+shows up in sysfs in several locations:
+
+ /sys/devices/.../CTLR/spiB.C ... spi_device on bus "B",
+ chipselect C, accessed through CTLR.
+
+ /sys/devices/.../CTLR/spiB.C/modalias ... identifies the driver
+ that should be used with this device (for hotplug/coldplug)
+
+ /sys/bus/spi/devices/spiB.C ... symlink to the physical
+ spiB.C device
+
+ /sys/bus/spi/drivers/D ... driver for one or more spi*.* devices
+
+ /sys/class/spi_master/spiB ... class device for the controller
+ managing bus "B". All the spiB.* devices share the same
+ physical SPI bus segment, with SCLK, MOSI, and MISO.
+
+
+How does board-specific init code declare SPI devices?
+------------------------------------------------------
+Linux needs several kinds of information to properly configure SPI devices.
+That information is normally provided by board-specific code, even for
+chips that do support some form of automated discovery/enumeration.
+
+DECLARE CONTROLLERS
+
+The first kind of information is a list of what SPI controllers exist.
+For System-on-Chip (SOC) based boards, these will usually be platform
+devices, and the controller may need some platform_data in order to
+operate properly. The "struct platform_device" will include resources
+like the physical address of the controller's first register and its IRQ.
+
+Platforms will often abstract the "register SPI controller" operation,
+maybe coupling it with code to initialize pin configurations, so that
+the arch/.../mach-*/board-*.c files for several boards can all share the
+same basic controller setup code. This is because most SOCs have several
+SPI-capable controllers, and only the ones actually usable on a given
+board should normally be set up and registered.
+
+So for example arch/.../mach-*/board-*.c files might have code like:
+
+ #include <asm/arch/spi.h> /* for mysoc_spi_data */
+
+ /* if your mach-* infrastructure doesn't support kernels that can
+ * run on multiple boards, pdata wouldn't benefit from "__initdata".
+ */
+ static struct mysoc_spi_data __initdata pdata = { ... };
+
+ static void __init board_init(void)
+ {
+ ...
+ /* this board only uses SPI controller #2 */
+ mysoc_register_spi(2, &pdata);
+ ...
+ }
+
+And SOC-specific utility code might look something like:
+
+ #include <asm/arch/spi.h>
+
+ static struct platform_device spi2 = { ... };
+
+ void mysoc_register_spi(unsigned n, struct mysoc_spi_data *pdata)
+ {
+ struct mysoc_spi_data *pdata2;
+
+ pdata2 = kmalloc(sizeof *pdata2, GFP_KERNEL);
+ if (!pdata2)
+ return;
+ *pdata2 = *pdata;
+ ...
+ if (n == 2) {
+ spi2.dev.platform_data = pdata2;
+ platform_device_register(&spi2);
+
+ /* also: set up pin modes so the spi2 signals are
+ * visible on the relevant pins ... bootloaders on
+ * production boards may already have done this, but
+ * developer boards will often need Linux to do it.
+ */
+ }
+ ...
+ }
+
+Notice how the platform_data for boards may be different, even if the
+same SOC controller is used. For example, on one board SPI might use
+an external clock, where another derives the SPI clock from current
+settings of some master clock.
+
+
+DECLARE SLAVE DEVICES
+
+The second kind of information is a list of what SPI slave devices exist
+on the target board, often with some board-specific data needed for the
+driver to work correctly.
+
+Normally your arch/.../mach-*/board-*.c files would provide a small table
+listing the SPI devices on each board. (This would typically be only a
+small handful.) That might look like:
+
+ static struct ads7846_platform_data ads_info = {
+ .vref_delay_usecs = 100,
+ .x_plate_ohms = 580,
+ .y_plate_ohms = 410,
+ };
+
+ static struct spi_board_info spi_board_info[] __initdata = {
+ {
+ .modalias = "ads7846",
+ .platform_data = &ads_info,
+ .mode = SPI_MODE_0,
+ .irq = GPIO_IRQ(31),
+ .max_speed_hz = 120000 /* max sample rate at 3V */ * 16,
+ .bus_num = 1,
+ .chip_select = 0,
+ },
+ };
+
+Again, notice how board-specific information is provided; each chip may need
+several types. This example shows generic constraints like the fastest SPI
+clock to allow (a function of board voltage in this case) or how an IRQ pin
+is wired, plus chip-specific constraints like an important delay that's
+changed by the capacitance at one pin.
+
+(There's also "controller_data", information that may be useful to the
+controller driver. An example would be peripheral-specific DMA tuning
+data or chipselect callbacks. This is stored in spi_device later.)
+
+The board_info should provide enough information to let the system work
+without the chip's driver being loaded. The most troublesome aspect of
+that is likely the SPI_CS_HIGH bit in the spi_device.mode field, since
+sharing a bus with a device that interprets chipselect "backwards" is
+not possible.
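+
+To make the mode field concrete, here is a userspace C sketch that
+decodes it. The MODEL_* constants are local copies written for this
+example and are assumed to mirror the flag layout in <linux/spi/spi.h>;
+check the header before relying on the values. The helper names are
+hypothetical:
+
```c
#include <assert.h>

/* Local copies for illustration; assumed to match <linux/spi/spi.h>. */
#define MODEL_SPI_CPHA		0x01	/* clock phase */
#define MODEL_SPI_CPOL		0x02	/* clock polarity */
#define MODEL_SPI_CS_HIGH	0x04	/* chipselect is active high */

/* Clocking mode number, 0 through 3, taken from the two low bits. */
unsigned model_spi_mode_number(unsigned mode)
{
	assert(mode <= 0x07);	/* only the three bits above are modelled */
	return mode & (MODEL_SPI_CPOL | MODEL_SPI_CPHA);
}

/*
 * Level the chipselect line must hold while the device is NOT
 * selected: low for an active-high chipselect, high otherwise.
 */
int model_cs_idle_level(unsigned mode)
{
	assert(mode <= 0x07);
	return (mode & MODEL_SPI_CS_HIGH) ? 0 : 1;
}
```
+
+The idle level is the part that matters before any driver is loaded:
+a line parked at the wrong level leaves that chip selected while other
+devices on the bus are being addressed.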
+
+Then your board initialization code would register that table with the SPI
+infrastructure, so that it's available later when the SPI master controller
+driver is registered:
+
+ spi_register_board_info(spi_board_info, ARRAY_SIZE(spi_board_info));
+
+Like with other static board-specific setup, you won't unregister those.
+
+The widely used "card" style computers bundle memory, cpu, and little else
+onto a card that's maybe just thirty square centimeters. On such systems,
+your arch/.../mach-.../board-*.c file would primarily provide information
+about the devices on the mainboard into which such a card is plugged. That
+certainly includes SPI devices hooked up through the card connectors!
+
+
+NON-STATIC CONFIGURATIONS
+
+Developer boards often play by different rules than product boards, and one
+example is the potential need to hotplug SPI devices and/or controllers.
+
+For those cases you might need to use spi_busnum_to_master() to look
+up the spi bus master, and will likely need spi_new_device() to provide the
+board info based on the board that was hotplugged. Of course, you'd later
+call at least spi_unregister_device() when that board is removed.
+
+When Linux includes support for MMC/SD/SDIO/DataFlash cards through SPI, those
+configurations will also be dynamic. Fortunately, those devices all support
+basic device identification probes, so that support should hotplug normally.
+
+
+How do I write an "SPI Protocol Driver"?
+----------------------------------------
+All SPI drivers are currently kernel drivers. A userspace driver API
+would just be another kernel driver, probably offering some lowlevel
+access through aio_read(), aio_write(), and ioctl() calls and using the
+standard userspace sysfs mechanisms to bind to a given SPI device.
+
+SPI protocol drivers somewhat resemble platform device drivers:
+
+ static struct spi_driver CHIP_driver = {
+ .driver = {
+ .name = "CHIP",
+ .bus = &spi_bus_type,
+ .owner = THIS_MODULE,
+ },
+
+ .probe = CHIP_probe,
+ .remove = __devexit_p(CHIP_remove),
+ .suspend = CHIP_suspend,
+ .resume = CHIP_resume,
+ };
+
+The driver core will automatically attempt to bind this driver to any SPI
+device whose board_info gave a modalias of "CHIP". Your probe() code
+might look like this unless you're creating a class_device:
+
+ static int __devinit CHIP_probe(struct spi_device *spi)
+ {
+ struct CHIP *chip;
+ struct CHIP_platform_data *pdata;
+
+ /* assuming the driver requires board-specific data: */
+ pdata = spi->dev.platform_data;
+ if (!pdata)
+ return -ENODEV;
+
+ /* get memory for driver's per-chip state */
+ chip = kzalloc(sizeof *chip, GFP_KERNEL);
+ if (!chip)
+ return -ENOMEM;
+ dev_set_drvdata(&spi->dev, chip);
+
+ ... etc
+ return 0;
+ }
+
+As soon as it enters probe(), the driver may issue I/O requests to
+the SPI device using "struct spi_message". When remove() returns,
+the driver guarantees that it won't submit any more such messages.
+
+ - An spi_message is a sequence of protocol operations, executed
+ as one atomic sequence. SPI driver controls include:
+
+ + when bidirectional reads and writes start ... by how its
+ sequence of spi_transfer requests is arranged;
+
+ + optionally defining short delays after transfers ... using
+ the spi_transfer.delay_usecs setting;
+
+ + whether the chipselect becomes inactive after a transfer and
+ any delay ... by using the spi_transfer.cs_change flag;
+
+ + hinting whether the next message is likely to go to this same
+ device ... using the spi_transfer.cs_change flag on the last
+ transfer in that atomic group, and potentially saving costs
+ for chip deselect and select operations.
+
+ - Follow standard kernel rules, and provide DMA-safe buffers in
+ your messages. That way controller drivers using DMA aren't forced
+ to make extra copies unless the hardware requires it (e.g. working
+ around hardware errata that force the use of bounce buffering).
+
+ If standard dma_map_single() handling of these buffers is inappropriate,
+ you can use spi_message.is_dma_mapped to tell the controller driver
+ that you've already provided the relevant DMA addresses.
+
+ - The basic I/O primitive is spi_async(). Async requests may be
+ issued in any context (irq handler, task, etc) and completion
+ is reported using a callback provided with the message.
+ After any detected error, the chip is deselected and processing
+ of that spi_message is aborted.
+
+ - There are also synchronous wrappers like spi_sync(), and wrappers
+ like spi_read(), spi_write(), and spi_write_then_read(). These
+ may be issued only in contexts that may sleep, and they're all
+ clean (and small, and "optional") layers over spi_async().
+
+ - The spi_write_then_read() call, and convenience wrappers around
+ it, should only be used with small amounts of data where the
+ cost of an extra copy may be ignored. It's designed to support
+ common RPC-style requests, such as writing an eight bit command
+ and reading a sixteen bit response -- spi_w8r16() being one of its
+ wrappers, doing exactly that.
+
+Some drivers may need to modify spi_device characteristics like the
+transfer mode, wordsize, or clock rate. This is done with spi_setup(),
+which would normally be called from probe() before the first I/O is
+done to the device.
+
+While "spi_device" would be the bottom boundary of the driver, the
+upper boundaries might include sysfs (especially for sensor readings),
+the input layer, ALSA, networking, MTD, the character device framework,
+or other Linux subsystems.
+
+Note that there are two types of memory your driver must manage as part
+of interacting with SPI devices.
+
+ - I/O buffers use the usual Linux rules, and must be DMA-safe.
+ You'd normally allocate them from the heap or free page pool.
+ Don't use the stack, or anything that's declared "static".
+
+ - The spi_message and spi_transfer metadata used to glue those
+ I/O buffers into a group of protocol transactions. These can
+ be allocated anywhere it's convenient, including as part of
+ other allocate-once driver data structures. Zero-init these.
+
+If you like, spi_message_alloc() and spi_message_free() convenience
+routines are available to allocate and zero-initialize an spi_message
+with several transfers.
+
+
+How do I write an "SPI Master Controller Driver"?
+-------------------------------------------------
+An SPI controller will probably be registered on the platform_bus; write
+a driver to bind to the device, whichever bus is involved.
+
+The main task of this type of driver is to provide an "spi_master".
+Use spi_alloc_master() to allocate the master, and class_get_devdata()
+to get the driver-private data allocated for that device.
+
+ struct spi_master *master;
+ struct CONTROLLER *c;
+
+ master = spi_alloc_master(dev, sizeof *c);
+ if (!master)
+ return -ENODEV;
+
+ c = class_get_devdata(&master->cdev);
+
+The driver will initialize the fields of that spi_master, including the
+bus number (maybe the same as the platform device ID) and three methods
+used to interact with the SPI core and SPI protocol drivers. It will
+also initialize its own internal state.
+
+ master->setup(struct spi_device *spi)
+ This sets up the device clock rate, SPI mode, and word sizes.
+ Drivers may change the defaults provided by board_info, and then
+ call spi_setup(spi) to invoke this routine. It may sleep.
+
+ master->transfer(struct spi_device *spi, struct spi_message *message)
+ This must not sleep. Its responsibility is to arrange that the
+ transfer happens and its complete() callback is issued; the two
+ will normally happen later, after other transfers complete.
+
+ master->cleanup(struct spi_device *spi)
+ Your controller driver may use spi_device.controller_state to hold
+ state it dynamically associates with that device. If you do that,
+ be sure to provide the cleanup() method to free that state.
+
+The bulk of the driver will be managing the I/O queue fed by transfer().
+
+That queue could be purely conceptual. For example, a driver used only
+for low-frequency sensor access might be fine using synchronous PIO.
+
+But the queue will probably be very real, using message->queue, PIO,
+often DMA (especially if the root filesystem is in SPI flash), and
+execution contexts like IRQ handlers, tasklets, or workqueues (such
+as keventd). Your driver can be as fancy, or as simple, as you need.
+
+
+THANKS TO
+---------
+Contributors to Linux-SPI discussions include (in alphabetical order,
+by last name):
+
+David Brownell
+Russell King
+Dmitry Pervushin
+Stephen Street
+Mark Underwood
+Andrew Victor
+Vitaly Wool
+
diff --git a/Documentation/stable_kernel_rules.txt b/Documentation/stable_kernel_rules.txt
index 2c81305090d..e409e5d0748 100644
--- a/Documentation/stable_kernel_rules.txt
+++ b/Documentation/stable_kernel_rules.txt
@@ -1,58 +1,56 @@
Everything you ever wanted to know about Linux 2.6 -stable releases.
-Rules on what kind of patches are accepted, and what ones are not, into
-the "-stable" tree:
+Rules on what kind of patches are accepted, and which ones are not, into the
+"-stable" tree:
- It must be obviously correct and tested.
- - It can not bigger than 100 lines, with context.
+ - It can not be bigger than 100 lines, with context.
- It must fix only one thing.
- It must fix a real bug that bothers people (not a, "This could be a
- problem..." type thing.)
+ problem..." type thing).
- It must fix a problem that causes a build error (but not for things
marked CONFIG_BROKEN), an oops, a hang, data corruption, a real
- security issue, or some "oh, that's not good" issue. In short,
- something critical.
- - No "theoretical race condition" issues, unless an explanation of how
- the race can be exploited.
+ security issue, or some "oh, that's not good" issue. In short, something
+ critical.
+ - No "theoretical race condition" issues, unless an explanation of how the
+ race can be exploited is also provided.
- It can not contain any "trivial" fixes in it (spelling changes,
- whitespace cleanups, etc.)
+ whitespace cleanups, etc).
- It must be accepted by the relevant subsystem maintainer.
- - It must follow Documentation/SubmittingPatches rules.
+ - It must follow the Documentation/SubmittingPatches rules.
Procedure for submitting patches to the -stable tree:
- Send the patch, after verifying that it follows the above rules, to
stable@kernel.org.
- - The sender will receive an ack when the patch has been accepted into
- the queue, or a nak if the patch is rejected. This response might
- take a few days, according to the developer's schedules.
- - If accepted, the patch will be added to the -stable queue, for review
- by other developers.
+ - The sender will receive an ACK when the patch has been accepted into the
+ queue, or a NAK if the patch is rejected. This response might take a few
+ days, according to the developer's schedules.
+ - If accepted, the patch will be added to the -stable queue, for review by
+ other developers.
- Security patches should not be sent to this alias, but instead to the
- documented security@kernel.org.
+ documented security@kernel.org address.
Review cycle:
- - When the -stable maintainers decide for a review cycle, the patches
- will be sent to the review committee, and the maintainer of the
- affected area of the patch (unless the submitter is the maintainer of
- the area) and CC: to the linux-kernel mailing list.
- - The review committee has 48 hours in which to ack or nak the patch.
+ - When the -stable maintainers decide for a review cycle, the patches will be
+ sent to the review committee, and the maintainer of the affected area of
+ the patch (unless the submitter is the maintainer of the area) and CC: to
+ the linux-kernel mailing list.
+ - The review committee has 48 hours in which to ACK or NAK the patch.
- If the patch is rejected by a member of the committee, or linux-kernel
- members object to the patch, bringing up issues that the maintainers
- and members did not realize, the patch will be dropped from the
- queue.
- - At the end of the review cycle, the acked patches will be added to
- the latest -stable release, and a new -stable release will happen.
- - Security patches will be accepted into the -stable tree directly from
- the security kernel team, and not go through the normal review cycle.
+ members object to the patch, bringing up issues that the maintainers and
+ members did not realize, the patch will be dropped from the queue.
+ - At the end of the review cycle, the ACKed patches will be added to the
+ latest -stable release, and a new -stable release will happen.
+ - Security patches will be accepted into the -stable tree directly from the
+ security kernel team, and not go through the normal review cycle.
Contact the kernel security team for more details on this procedure.
Review committe:
- - This will be made up of a number of kernel developers who have
- volunteered for this task, and a few that haven't.
-
+ - This is made up of a number of kernel developers who have volunteered for
+ this task, and a few that haven't.
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 2f1aae32a5d..391dd64363e 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -26,12 +26,14 @@ Currently, these files are in /proc/sys/vm:
- min_free_kbytes
- laptop_mode
- block_dump
+- drop_caches
+- zone_reclaim_mode
==============================================================
dirty_ratio, dirty_background_ratio, dirty_expire_centisecs,
dirty_writeback_centisecs, vfs_cache_pressure, laptop_mode,
-block_dump, swap_token_timeout:
+block_dump, swap_token_timeout, drop_caches:
See Documentation/filesystems/proc.txt
@@ -102,3 +104,37 @@ This is used to force the Linux VM to keep a minimum number
of kilobytes free. The VM uses this number to compute a pages_min
value for each lowmem zone in the system. Each lowmem zone gets
a number of reserved free pages based proportionally on its size.
+
+==============================================================
+
+percpu_pagelist_fraction
+
+This is the fraction of pages at most (high mark pcp->high) in each zone that
+are allocated for each per cpu page list. The min value for this is 8. It
+means that we don't allow more than 1/8th of pages in each zone to be
+allocated in any single per_cpu_pagelist. This entry only changes the value
+of hot per cpu pagelists. User can specify a number like 100 to allocate
+1/100th of each zone to each per cpu page list.
+
+The batch value of each per cpu pagelist is also updated as a result. It is
+set to pcp->high/4. The upper limit of batch is (PAGE_SHIFT * 8).
+
+The initial value is zero. Kernel does not use this value at boot time to set
+the high water marks for each per cpu page list.
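+
+The arithmetic described above can be sketched in userspace C. This
+follows only the rules stated in this entry (high = zone pages divided
+by the fraction, batch = high/4 capped at PAGE_SHIFT * 8) and is not
+the kernel's code; pcp_model, pcp_from_fraction() and the 4 KiB page
+size behind MODEL_PAGE_SHIFT are assumptions made for the example:
+
```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12UL	/* assumption: 4 KiB pages */

struct pcp_model {
	unsigned long high;	/* high mark of the per cpu page list */
	unsigned long batch;	/* batch size derived from it */
};

/* Derive high and batch for one zone from percpu_pagelist_fraction. */
struct pcp_model pcp_from_fraction(unsigned long zone_pages,
				   unsigned long fraction)
{
	struct pcp_model pcp;

	assert(fraction >= 8);	/* documented minimum value */

	pcp.high = zone_pages / fraction;
	pcp.batch = pcp.high / 4;
	if (pcp.batch > MODEL_PAGE_SHIFT * 8)
		pcp.batch = MODEL_PAGE_SHIFT * 8;	/* upper limit */
	return pcp;
}
```
+
+With 4 KiB pages the cap is 96, so a 262144-page zone with a fraction of
+100 gets high = 2621 and batch = 96, while a 2000-page zone with the
+minimum fraction of 8 gets high = 250 and batch = 62.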
+
+===============================================================
+
+zone_reclaim_mode:
+
+This is set during bootup to 1 if it is determined that pages from
+remote zones will cause a significant performance reduction. The
+page allocator will then reclaim easily reusable pages (those page
+cache pages that are currently not used) before going off node.
+
+The user can override this setting. It may be beneficial to switch
+off zone reclaim if the system is used for a file server and all
+of memory should be used for caching files from disk.
+
+It may be beneficial to switch this on if one wants to do zone
+reclaim regardless of the NUMA distances in the system.
+
diff --git a/Documentation/sysrq.txt b/Documentation/sysrq.txt
index baf17b38158..ad0bedf678b 100644
--- a/Documentation/sysrq.txt
+++ b/Documentation/sysrq.txt
@@ -202,17 +202,13 @@ you must call __handle_sysrq_nolock instead.
* I have more questions, who can I ask?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-You may feel free to send email to myrdraal@deathsdoor.com, and I will
-respond as soon as possible.
- -Myrdraal
-
And I'll answer any questions about the registration system you got, also
responding as soon as possible.
-Crutcher
* Credits
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Written by Mydraal <myrdraal@deathsdoor.com>
+Written by Myrdraal <vulpyne@vulpyne.net>
Updated by Adam Sulmicki <adam@cfar.umd.edu>
Updated by Jeremy M. Dolan <jmd@turbogeek.org> 2001/01/28 10:15:59
Added to by Crutcher Dunnavant <crutcher+kernel@datastacks.com>
diff --git a/Documentation/video4linux/CARDLIST.bttv b/Documentation/video4linux/CARDLIST.bttv
index 330246ac80f..b72706c58a4 100644
--- a/Documentation/video4linux/CARDLIST.bttv
+++ b/Documentation/video4linux/CARDLIST.bttv
@@ -141,3 +141,5 @@
140 -> Osprey 440 [0070:ff07]
141 -> Asound Skyeye PCTV
142 -> Sabrent TV-FM (bttv version)
+143 -> Hauppauge ImpactVCB (bt878) [0070:13eb]
+144 -> MagicTV
diff --git a/Documentation/video4linux/CARDLIST.cx88 b/Documentation/video4linux/CARDLIST.cx88
index a1017d1a85d..56e194f1a0b 100644
--- a/Documentation/video4linux/CARDLIST.cx88
+++ b/Documentation/video4linux/CARDLIST.cx88
@@ -16,10 +16,10 @@
15 -> DViCO FusionHDTV DVB-T1 [18ac:db00]
16 -> KWorld LTV883RF
17 -> DViCO FusionHDTV 3 Gold-Q [18ac:d810]
- 18 -> Hauppauge Nova-T DVB-T [0070:9002]
+ 18 -> Hauppauge Nova-T DVB-T [0070:9002,0070:9001]
19 -> Conexant DVB-T reference design [14f1:0187]
20 -> Provideo PV259 [1540:2580]
- 21 -> DViCO FusionHDTV DVB-T Plus [18ac:db10]
+ 21 -> DViCO FusionHDTV DVB-T Plus [18ac:db10,18ac:db11]
22 -> pcHDTV HD3000 HDTV [7063:3000]
23 -> digitalnow DNTV Live! DVB-T [17de:a8a6]
24 -> Hauppauge WinTV 28xxx (Roslyn) models [0070:2801]
@@ -35,3 +35,11 @@
34 -> ATI HDTV Wonder [1002:a101]
35 -> WinFast DTV1000-T [107d:665f]
36 -> AVerTV 303 (M126) [1461:000a]
+ 37 -> Hauppauge Nova-S-Plus DVB-S [0070:9201,0070:9202]
+ 38 -> Hauppauge Nova-SE2 DVB-S [0070:9200]
+ 39 -> KWorld DVB-S 100 [17de:08b2]
+ 40 -> Hauppauge WinTV-HVR1100 DVB-T/Hybrid [0070:9400,0070:9402]
+ 41 -> Hauppauge WinTV-HVR1100 DVB-T/Hybrid (Low Profile) [0070:9800,0070:9802]
+ 42 -> digitalnow DNTV Live! DVB-T Pro [1822:0025]
+ 43 -> KWorld/VStream XPert DVB-T with cx22702 [17de:08a1]
+ 44 -> DViCO FusionHDTV DVB-T Dual Digital [18ac:db50]
diff --git a/Documentation/video4linux/CARDLIST.saa7134 b/Documentation/video4linux/CARDLIST.saa7134
index efb708ec116..cb3a59bbeb1 100644
--- a/Documentation/video4linux/CARDLIST.saa7134
+++ b/Documentation/video4linux/CARDLIST.saa7134
@@ -56,7 +56,7 @@
55 -> LifeView FlyDVB-T DUO [5168:0502,5168:0306]
56 -> Avermedia AVerTV 307 [1461:a70a]
57 -> Avermedia AVerTV GO 007 FM [1461:f31f]
- 58 -> ADS Tech Instant TV (saa7135) [1421:0350,1421:0370,1421:1370]
+ 58 -> ADS Tech Instant TV (saa7135) [1421:0350,1421:0351,1421:0370,1421:1370]
59 -> Kworld/Tevion V-Stream Xpert TV PVR7134
60 -> Typhoon DVB-T Duo Digital/Analog Cardbus [4e42:0502]
61 -> Philips TOUGH DVB-T reference design [1131:2004]
@@ -81,4 +81,5 @@
80 -> ASUS Digimatrix TV [1043:0210]
81 -> Philips Tiger reference design [1131:2018]
82 -> MSI TV@Anywhere plus [1462:6231]
-
+ 83 -> Terratec Cinergy 250 PCI TV [153b:1160]
+ 84 -> LifeView FlyDVB Trio [5168:0319]
diff --git a/Documentation/video4linux/CARDLIST.tuner b/Documentation/video4linux/CARDLIST.tuner
index 9d6544ea9f4..f6d0cf7b792 100644
--- a/Documentation/video4linux/CARDLIST.tuner
+++ b/Documentation/video4linux/CARDLIST.tuner
@@ -40,7 +40,7 @@ tuner=38 - Philips PAL/SECAM multi (FM1216ME MK3)
tuner=39 - LG NTSC (newer TAPC series)
tuner=40 - HITACHI V7-J180AT
tuner=41 - Philips PAL_MK (FI1216 MK)
-tuner=42 - Philips 1236D ATSC/NTSC daul in
+tuner=42 - Philips 1236D ATSC/NTSC dual in
tuner=43 - Philips NTSC MK3 (FM1236MK3 or FM1236/F)
tuner=44 - Philips 4 in 1 (ATI TV Wonder Pro/Conexant)
tuner=45 - Microtune 4049 FM5
@@ -50,7 +50,7 @@ tuner=48 - Tenna TNF 8831 BGFF)
tuner=49 - Microtune 4042 FI5 ATSC/NTSC dual in
tuner=50 - TCL 2002N
tuner=51 - Philips PAL/SECAM_D (FM 1256 I-H3)
-tuner=52 - Thomson DDT 7610 (ATSC/NTSC)
+tuner=52 - Thomson DTT 7610 (ATSC/NTSC)
tuner=53 - Philips FQ1286
tuner=54 - tda8290+75
tuner=55 - TCL 2002MB
@@ -58,7 +58,7 @@ tuner=56 - Philips PAL/SECAM multi (FQ1216AME MK4)
tuner=57 - Philips FQ1236A MK4
tuner=58 - Ymec TVision TVF-8531MF/8831MF/8731MF
tuner=59 - Ymec TVision TVF-5533MF
-tuner=60 - Thomson DDT 7611 (ATSC/NTSC)
+tuner=60 - Thomson DTT 761X (ATSC/NTSC)
tuner=61 - Tena TNF9533-D/IF/TNF9533-B/DF
tuner=62 - Philips TEA5767HN FM Radio
tuner=63 - Philips FMD1216ME MK3 Hybrid Tuner
@@ -68,3 +68,4 @@ tuner=66 - LG NTSC (TALN mini series)
tuner=67 - Philips TD1316 Hybrid Tuner
tuner=68 - Philips TUV1236D ATSC/NTSC dual in
tuner=69 - Tena TNF 5335 MF
+tuner=70 - Samsung TCPN 2121P30A
diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86_64/boot-options.txt
index e566affeed7..9c5fc15d03d 100644
--- a/Documentation/x86_64/boot-options.txt
+++ b/Documentation/x86_64/boot-options.txt
@@ -125,7 +125,7 @@ SMP
cpumask=MASK only use cpus with bits set in mask
additional_cpus=NUM Allow NUM more CPUs for hotplug
- (defaults are specified by the BIOS or half the available CPUs)
+ (defaults are specified by the BIOS, see Documentation/x86_64/cpu-hotplug-spec)
NUMA
@@ -198,6 +198,6 @@ Debugging
Misc
- noreplacement Don't replace instructions with more appropiate ones
+ noreplacement Don't replace instructions with more appropriate ones
for the CPU. This may be useful on asymmetric MP systems
where some CPU have less capabilities than the others.
diff --git a/Documentation/x86_64/cpu-hotplug-spec b/Documentation/x86_64/cpu-hotplug-spec
new file mode 100644
index 00000000000..5c0fa345e55
--- /dev/null
+++ b/Documentation/x86_64/cpu-hotplug-spec
@@ -0,0 +1,21 @@
+Firmware support for CPU hotplug under Linux/x86-64
+---------------------------------------------------
+
+Linux/x86-64 now supports CPU hotplug. For various reasons Linux wants to
+know in advance, at boot time, the maximum number of CPUs that could be plugged
+into the system. ACPI 3.0 currently has no official way to supply
+this information from the firmware to the operating system.
+
+In ACPI each CPU needs an LAPIC object in the MADT (5.2.11.5 in the
+ACPI 3.0 specification). ACPI already has the concept of a disabled LAPIC
+object: one whose Enabled bit is set to zero.
+
+For CPU hotplug Linux/x86-64 now expects every CPU that could be hotplugged in
+the future to be listed in the MADT already. If the CPU is not available yet
+it should have its LAPIC Enabled bit set to 0. Linux will use the number
+of disabled LAPICs to compute the maximum number of future CPUs.
+
+In the worst case the user can override this choice using a command line
+option (additional_cpus=...), but it is recommended to supply the correct
+number (or a reasonable approximation of it, erring towards more rather than
+less) in the MADT to avoid manual configuration.
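
The counting rule described in cpu-hotplug-spec above can be sketched in a few
lines of C. This is an illustration only, not the kernel's implementation:
struct lapic_entry, struct cpu_counts, count_madt_cpus() and
max_possible_cpus() are hypothetical names, and the entry layout is a
simplified stand-in for the real MADT Processor Local APIC structure. The one
detail taken from ACPI is that bit 0 of the LAPIC Flags field is the Enabled
bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 0 of the Flags field in a MADT Processor Local APIC entry. */
#define LAPIC_FLAG_ENABLED 0x1u

/* Hypothetical, simplified view of one LAPIC entry; not a kernel type. */
struct lapic_entry {
	uint8_t apic_id;
	uint32_t flags;
};

/* Result of one pass over the LAPIC entries. */
struct cpu_counts {
	unsigned int present;	/* entries with the Enabled bit set */
	unsigned int hotplug;	/* entries with the Enabled bit clear */
};

/* Split the LAPIC entries into CPUs present now and future hotplug slots. */
static struct cpu_counts count_madt_cpus(const struct lapic_entry *e, size_t n)
{
	struct cpu_counts c = { 0, 0 };
	size_t i;

	assert(e != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (e[i].flags & LAPIC_FLAG_ENABLED)
			c.present++;
		else
			c.hotplug++;
	}
	return c;
}

/* Maximum number of CPUs: those present plus every disabled LAPIC. */
static unsigned int max_possible_cpus(const struct lapic_entry *e, size_t n)
{
	struct cpu_counts c = count_madt_cpus(e, n);

	return c.present + c.hotplug;
}
```

In this sketch the hotplug count plays the role that additional_cpus=NUM
would otherwise supply by hand, which is why the text recommends that firmware
list every possible CPU in the MADT.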