Merge branch 'tracing/core' into tracing/hw-breakpoints

Conflicts: arch/Kconfig kernel/trace/trace.h Merge reason: resolve the conflicts, plus adopt to the new ring-buffer APIs. Signed-off-by: Ingo Molnar <mingo@elte.hu>
author: Ingo Molnar <mingo@elte.hu> 2009-09-07 08:19:51 +0200
committer: Ingo Molnar <mingo@elte.hu> 2009-09-07 08:19:51 +0200
commit: a1922ed661ab2c1637d0b10cde933bd9cd33d965 (patch)
tree: 0f1777542b385ebefd30b3586d830fd8ed6fda5b /Documentation
parent: 75e33751ca8bbb72dd6f1a74d2810ddc8cbe4bdf (diff)
parent: d28daf923ac5e4a0d7cecebae56f3e339189366b (diff)
84 files changed, 4597 insertions, 1979 deletions
diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index cbbd3e06994..5f3bedaf8e3 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -94,28 +94,37 @@ What:		/sys/block/<disk>/queue/physical_block_size
 Date:		May 2009
 Contact:	Martin K. Petersen <martin.petersen@oracle.com>
 Description:
-		This is the smallest unit the storage device can write
-		without resorting to read-modify-write operation.  It is
-		usually the same as the logical block size but may be
-		bigger.  One example is SATA drives with 4KB sectors
-		that expose a 512-byte logical block size to the
-		operating system.
+		This is the smallest unit a physical storage device can
+		write atomically.  It is usually the same as the logical
+		block size but may be bigger.  One example is SATA
+		drives with 4KB sectors that expose a 512-byte logical
+		block size to the operating system.  For stacked block
+		devices the physical_block_size variable contains the
+		maximum physical_block_size of the component devices.
 
 What:		/sys/block/<disk>/queue/minimum_io_size
 Date:		April 2009
 Contact:	Martin K. Petersen <martin.petersen@oracle.com>
 Description:
-		Storage devices may report a preferred minimum I/O size,
-		which is the smallest request the device can perform
-		without incurring a read-modify-write penalty.  For disk
-		drives this is often the physical block size.  For RAID
-		arrays it is often the stripe chunk size.
+		Storage devices may report a granularity or preferred
+		minimum I/O size which is the smallest request the
+		device can perform without incurring a performance
+		penalty.  For disk drives this is often the physical
+		block size.  For RAID arrays it is often the stripe
+		chunk size.  A properly aligned multiple of
+		minimum_io_size is the preferred request size for
+		workloads where a high number of I/O operations is
+		desired.
 
 What:		/sys/block/<disk>/queue/optimal_io_size
 Date:		April 2009
 Contact:	Martin K. Petersen <martin.petersen@oracle.com>
 Description:
 		Storage devices may report an optimal I/O size, which is
-		the device's preferred unit of receiving I/O.  This is
-		rarely reported for disk drives.  For RAID devices it is
-		usually the stripe width or the internal block size.
+		the device's preferred unit for sustained I/O.  This is
+		rarely reported for disk drives.  For RAID arrays it is
+		usually the stripe width or the internal track size.  A
+		properly aligned multiple of optimal_io_size is the
+		preferred request size for workloads where sustained
+		throughput is desired.  If no optimal I/O size is
+		reported this file contains 0.
diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci
index 97ad190e13a..6bf68053e4b 100644
--- a/Documentation/ABI/testing/sysfs-bus-pci
+++ b/Documentation/ABI/testing/sysfs-bus-pci
@@ -122,3 +122,10 @@ Description:
 		This symbolic link appears when a device is a Virtual Function.
 		The symbolic link points to the PCI device sysfs entry of the
 		Physical Function this device associates with.
+
+What:		/sys/bus/pci/slots/.../module
+Date:		June 2009
+Contact:	linux-pci@vger.kernel.org
+Description:
+		This symbolic link points to the PCI hotplug controller driver
+		module that manages the hotplug slot.
diff --git a/Documentation/ABI/testing/sysfs-class-mtd b/Documentation/ABI/testing/sysfs-class-mtd
new file mode 100644
index 00000000000..4d55a188898
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-mtd
@@ -0,0 +1,125 @@
+What:		/sys/class/mtd/
+Date:		April 2009
+KernelVersion:	2.6.29
+Contact:	linux-mtd@lists.infradead.org
+Description:
+		The mtd/ class subdirectory belongs to the MTD subsystem
+		(MTD core).
+
+What:		/sys/class/mtd/mtdX/
+Date:		April 2009
+KernelVersion:	2.6.29
+Contact:	linux-mtd@lists.infradead.org
+Description:
+		The /sys/class/mtd/mtd{0,1,2,3,...} directories correspond
+		to each /dev/mtdX character device.  These may represent
+		physical/simulated flash devices, partitions on a flash
+		device, or concatenated flash devices.  They exist regardless
+		of whether CONFIG_MTD_CHAR is actually enabled.
+
+What:		/sys/class/mtd/mtdXro/
+Date:		April 2009
+KernelVersion:	2.6.29
+Contact:	linux-mtd@lists.infradead.org
+Description:
+		These directories provide the corresponding read-only device
+		nodes for /sys/class/mtd/mtdX/ .  They are only created
+		(for the benefit of udev) if CONFIG_MTD_CHAR is enabled.
+
+What:		/sys/class/mtd/mtdX/dev
+Date:		April 2009
+KernelVersion:	2.6.29
+Contact:	linux-mtd@lists.infradead.org
+Description:
+		Major and minor numbers of the character device corresponding
+		to this MTD device (in <major>:<minor> format).  This is the
+		read-write device so <minor> will be even.
+
+What:		/sys/class/mtd/mtdXro/dev
+Date:		April 2009
+KernelVersion:	2.6.29
+Contact:	linux-mtd@lists.infradead.org
+Description:
+		Major and minor numbers of the character device corresponding
+		to the read-only variant of thie MTD device (in
+		<major>:<minor> format).  In this case <minor> will be odd.
+
+What:		/sys/class/mtd/mtdX/erasesize
+Date:		April 2009
+KernelVersion:	2.6.29
+Contact:	linux-mtd@lists.infradead.org
+Description:
+		"Major" erase size for the device.  If numeraseregions is
+		zero, this is the eraseblock size for the entire device.
+		Otherwise, the MEMGETREGIONCOUNT/MEMGETREGIONINFO ioctls
+		can be used to determine the actual eraseblock layout.
+
+What:		/sys/class/mtd/mtdX/flags
+Date:		April 2009
+KernelVersion:	2.6.29
+Contact:	linux-mtd@lists.infradead.org
+Description:
+		A hexadecimal value representing the device flags, ORed
+		together:
+
+		0x0400: MTD_WRITEABLE - device is writable
+		0x0800: MTD_BIT_WRITEABLE - single bits can be flipped
+		0x1000: MTD_NO_ERASE - no erase necessary
+		0x2000: MTD_POWERUP_LOCK - always locked after reset
+
+What:		/sys/class/mtd/mtdX/name
+Date:		April 2009
+KernelVersion:	2.6.29
+Contact:	linux-mtd@lists.infradead.org
+Description:
+		A human-readable ASCII name for the device or partition.
+		This will match the name in /proc/mtd .
+
+What:		/sys/class/mtd/mtdX/numeraseregions
+Date:		April 2009
+KernelVersion:	2.6.29
+Contact:	linux-mtd@lists.infradead.org
+Description:
+		For devices that have variable eraseblock sizes, this
+		provides the total number of erase regions.  Otherwise,
+		it will read back as zero.
+
+What:		/sys/class/mtd/mtdX/oobsize
+Date:		April 2009
+KernelVersion:	2.6.29
+Contact:	linux-mtd@lists.infradead.org
+Description:
+		Number of OOB bytes per page.
+
+What:		/sys/class/mtd/mtdX/size
+Date:		April 2009
+KernelVersion:	2.6.29
+Contact:	linux-mtd@lists.infradead.org
+Description:
+		Total size of the device/partition, in bytes.
+
+What:		/sys/class/mtd/mtdX/type
+Date:		April 2009
+KernelVersion:	2.6.29
+Contact:	linux-mtd@lists.infradead.org
+Description:
+		One of the following ASCII strings, representing the device
+		type:
+
+		absent, ram, rom, nor, nand, dataflash, ubi, unknown
+
+What:		/sys/class/mtd/mtdX/writesize
+Date:		April 2009
+KernelVersion:	2.6.29
+Contact:	linux-mtd@lists.infradead.org
+Description:
+		Minimal writable flash unit size.  This will always be
+		a positive integer.
+
+		In the case of NOR flash it is 1 (even though individual
+		bits can be cleared).
+
+		In the case of NAND flash it is one NAND page (or a
+		half page, or a quarter page).
+
+		In the case of ECC NOR, it is the ECC block size.
diff --git a/Documentation/ABI/testing/sysfs-fs-ext4 b/Documentation/ABI/testing/sysfs-fs-ext4
index 4e79074de28..5fb709997d9 100644
--- a/Documentation/ABI/testing/sysfs-fs-ext4
+++ b/Documentation/ABI/testing/sysfs-fs-ext4
@@ -79,3 +79,13 @@ Description:
 		This file is read-only and shows the number of
 		kilobytes of data that have been written to this
 		filesystem since it was mounted.
+
+What:		/sys/fs/ext4/<disk>/inode_goal
+Date:		June 2008
+Contact:	"Theodore Ts'o" <tytso@mit.edu>
+Description:
+		Tuning parameter which (if non-zero) controls the goal
+		inode used by the inode allocator in p0reference to
+		all other allocation hueristics.  This is intended for
+		debugging use only, and should be 0 on production
+		systems.
diff --git a/Documentation/ABI/testing/sysfs-pps b/Documentation/ABI/testing/sysfs-pps
new file mode 100644
index 00000000000..25028c7bc37
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-pps
@@ -0,0 +1,73 @@
+What:		/sys/class/pps/
+Date:		February 2008
+Contact:	Rodolfo Giometti <giometti@linux.it>
+Description:
+		The /sys/class/pps/ directory will contain files and
+		directories that will provide a unified interface to
+		the PPS sources.
+
+What:		/sys/class/pps/ppsX/
+Date:		February 2008
+Contact:	Rodolfo Giometti <giometti@linux.it>
+Description:
+		The /sys/class/pps/ppsX/ directory is related to X-th
+		PPS source into the system. Each directory will
+		contain files to manage and control its PPS source.
+
+What:		/sys/class/pps/ppsX/assert
+Date:		February 2008
+Contact:	Rodolfo Giometti <giometti@linux.it>
+Description:
+		The /sys/class/pps/ppsX/assert file reports the assert events
+		and the assert sequence number of the X-th source in the form:
+
+			<secs>.<nsec>#<sequence>
+
+		If the source has no assert events the content of this file
+		is empty.
+
+What:		/sys/class/pps/ppsX/clear
+Date:		February 2008
+Contact:	Rodolfo Giometti <giometti@linux.it>
+Description:
+		The /sys/class/pps/ppsX/clear file reports the clear events
+		and the clear sequence number of the X-th source in the form:
+
+			<secs>.<nsec>#<sequence>
+
+		If the source has no clear events the content of this file
+		is empty.
+
+What:		/sys/class/pps/ppsX/mode
+Date:		February 2008
+Contact:	Rodolfo Giometti <giometti@linux.it>
+Description:
+		The /sys/class/pps/ppsX/mode file reports the functioning
+		mode of the X-th source in hexadecimal encoding.
+
+		Please, refer to linux/include/linux/pps.h for further
+		info.
+
+What:		/sys/class/pps/ppsX/echo
+Date:		February 2008
+Contact:	Rodolfo Giometti <giometti@linux.it>
+Description:
+		The /sys/class/pps/ppsX/echo file reports if the X-th does
+		or does not support an "echo" function.
+
+What:		/sys/class/pps/ppsX/name
+Date:		February 2008
+Contact:	Rodolfo Giometti <giometti@linux.it>
+Description:
+		The /sys/class/pps/ppsX/name file reports the name of the
+		X-th source.
+
+What:		/sys/class/pps/ppsX/path
+Date:		February 2008
+Contact:	Rodolfo Giometti <giometti@linux.it>
+Description:
+		The /sys/class/pps/ppsX/path file reports the path name of
+		the device connected with the X-th source.
+
+		If the source is not connected with any device the content
+		of this file is empty.
diff --git a/Documentation/Changes b/Documentation/Changes
index 664392481c8..6d0f1efc5bf 100644
--- a/Documentation/Changes
+++ b/Documentation/Changes
@@ -72,6 +72,13 @@ assembling the 16-bit boot code, removing the need for as86 to compile
 your kernel.  This change does, however, mean that you need a recent
 release of binutils.
 
+Perl
+----
+
+You will need perl 5 and the following modules: Getopt::Long, Getopt::Std,
+File::Basename, and File::Find to build the kernel.
+
+
 System utilities
 ================
 
diff --git a/Documentation/DocBook/kernel-hacking.tmpl b/Documentation/DocBook/kernel-hacking.tmpl
index a50d6cd5857..992e67e6be7 100644
--- a/Documentation/DocBook/kernel-hacking.tmpl
+++ b/Documentation/DocBook/kernel-hacking.tmpl
@@ -449,8 +449,8 @@ printk(KERN_INFO "i = %u\n", i);
    </para>
 
    <programlisting>
-__u32 ipaddress;
-printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
+__be32 ipaddress;
+printk(KERN_INFO "my ip: %pI4\n", &amp;ipaddress);
    </programlisting>
 
    <para>
diff --git a/Documentation/DocBook/mac80211.tmpl b/Documentation/DocBook/mac80211.tmpl
index e3698666357..f3f37f141db 100644
--- a/Documentation/DocBook/mac80211.tmpl
+++ b/Documentation/DocBook/mac80211.tmpl
@@ -184,8 +184,6 @@ usage should require reading the full document.
 !Finclude/net/mac80211.h ieee80211_ctstoself_get
 !Finclude/net/mac80211.h ieee80211_ctstoself_duration
 !Finclude/net/mac80211.h ieee80211_generic_frame_duration
-!Finclude/net/mac80211.h ieee80211_get_hdrlen_from_skb
-!Finclude/net/mac80211.h ieee80211_hdrlen
 !Finclude/net/mac80211.h ieee80211_wake_queue
 !Finclude/net/mac80211.h ieee80211_stop_queue
 !Finclude/net/mac80211.h ieee80211_wake_queues
diff --git a/Documentation/PCI/pcieaer-howto.txt b/Documentation/PCI/pcieaer-howto.txt
index ddeb14beacc..be21001ab14 100644
--- a/Documentation/PCI/pcieaer-howto.txt
+++ b/Documentation/PCI/pcieaer-howto.txt
@@ -61,6 +61,10 @@ be initiated although firmwares have no _OSC support. To enable the
 walkaround, pls. add aerdriver.forceload=y to kernel boot parameter line
 when booting kernel. Note that forceload=n by default.
 
+nosourceid, another parameter of type bool, can be used when broken
+hardware (mostly chipsets) has root ports that cannot obtain the reporting
+source ID. nosourceid=n by default.
+
 2.3 AER error output
 When a PCI-E AER error is captured, an error message will be outputed to
 console. If it's a correctable error, it is outputed as a warning.
@@ -246,3 +250,24 @@ with the PCI Express AER Root driver?
 A: It could call the helper functions to enable AER in devices and
 cleanup uncorrectable status register. Pls. refer to section 3.3.
 
+
+4. Software error injection
+
+Debugging PCIE AER error recovery code is quite difficult because it
+is hard to trigger real hardware errors. Software based error
+injection can be used to fake various kinds of PCIE errors.
+
+First you should enable PCIE AER software error injection in kernel
+configuration, that is, following item should be in your .config.
+
+CONFIG_PCIEAER_INJECT=y or CONFIG_PCIEAER_INJECT=m
+
+After reboot with new kernel or insert the module, a device file named
+/dev/aer_inject should be created.
+
+Then, you need a user space tool named aer-inject, which can be gotten
+from:
+    http://www.kernel.org/pub/linux/utils/pci/aer-inject/
+
+More information about aer-inject can be found in the document comes
+with its source code.
diff --git a/Documentation/RCU/rculist_nulls.txt b/Documentation/RCU/rculist_nulls.txt
index 93cb28d05dc..18f9651ff23 100644
--- a/Documentation/RCU/rculist_nulls.txt
+++ b/Documentation/RCU/rculist_nulls.txt
@@ -83,11 +83,12 @@ not detect it missed following items in original chain.
 obj = kmem_cache_alloc(...);
 lock_chain(); // typically a spin_lock()
 obj->key = key;
-atomic_inc(&obj->refcnt);
 /*
  * we need to make sure obj->key is updated before obj->next
+ * or obj->refcnt
  */
 smp_wmb();
+atomic_set(&obj->refcnt, 1);
 hlist_add_head_rcu(&obj->obj_node, list);
 unlock_chain(); // typically a spin_unlock()
 
@@ -159,6 +160,10 @@ out:
 obj = kmem_cache_alloc(cachep);
 lock_chain(); // typically a spin_lock()
 obj->key = key;
+/*
+ * changes to obj->key must be visible before refcnt one
+ */
+smp_wmb();
 atomic_set(&obj->refcnt, 1);
 /*
  * insert obj in RCU way (readers might be traversing chain)
diff --git a/Documentation/SubmitChecklist b/Documentation/SubmitChecklist
index ac5e0b2f109..78a9168ff37 100644
--- a/Documentation/SubmitChecklist
+++ b/Documentation/SubmitChecklist
@@ -54,7 +54,7 @@ kernel patches.
     CONFIG_PREEMPT.
 
 14: If the patch affects IO/Disk, etc: has been tested with and without
-    CONFIG_LBD.
+    CONFIG_LBDAF.
 
 15: All codepaths have been exercised with all lockdep features enabled.
 
diff --git a/Documentation/arm/memory.txt b/Documentation/arm/memory.txt
index 43cb1004d35..9d58c7c5edd 100644
--- a/Documentation/arm/memory.txt
+++ b/Documentation/arm/memory.txt
@@ -21,6 +21,8 @@ ffff8000	ffffffff	copy_user_page / clear_user_page use.
 				For SA11xx and Xscale, this is used to
 				setup a minicache mapping.
 
+ffff4000	ffffffff	cache aliasing on ARMv6 and later CPUs.
+
 ffff1000	ffff7fff	Reserved.
 				Platforms must not use this address range.
 
diff --git a/Documentation/block/data-integrity.txt b/Documentation/block/data-integrity.txt
index e8ca040ba2c..2d735b0ae38 100644
--- a/Documentation/block/data-integrity.txt
+++ b/Documentation/block/data-integrity.txt
@@ -50,7 +50,7 @@ encouraged them to allow separation of the data and integrity metadata
 scatter-gather lists.
 
 The controller will interleave the buffers on write and split them on
-read.  This means that the Linux can DMA the data buffers to and from
+read.  This means that Linux can DMA the data buffers to and from
 host memory without changes to the page cache.
 
 Also, the 16-bit CRC checksum mandated by both the SCSI and SATA specs
@@ -66,7 +66,7 @@ software RAID5).
 
 The IP checksum is weaker than the CRC in terms of detecting bit
 errors.  However, the strength is really in the separation of the data
-buffers and the integrity metadata.  These two distinct buffers much
+buffers and the integrity metadata.  These two distinct buffers must
 match up for an I/O to complete.
 
 The separation of the data and integrity metadata buffers as well as
diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt
index f9ca389dddf..1d7e9784439 100644
--- a/Documentation/cgroups/cpusets.txt
+++ b/Documentation/cgroups/cpusets.txt
@@ -777,6 +777,18 @@ in cpuset directories:
 # /bin/echo 1-4 > cpus		-> set cpus list to cpus 1,2,3,4
 # /bin/echo 1,2,3,4 > cpus	-> set cpus list to cpus 1,2,3,4
 
+To add a CPU to a cpuset, write the new list of CPUs including the
+CPU to be added. To add 6 to the above cpuset:
+
+# /bin/echo 1-4,6 > cpus	-> set cpus list to cpus 1,2,3,4,6
+
+Similarly to remove a CPU from a cpuset, write the new list of CPUs
+without the CPU to be removed.
+
+To remove all the CPUs:
+
+# /bin/echo "" > cpus		-> clear cpus list
+
 2.3 Setting flags
 -----------------
 
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 1a608877b14..23d1262c077 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -152,14 +152,19 @@ When swap is accounted, following files are added.
 
 usage of mem+swap is limited by memsw.limit_in_bytes.
 
-Note: why 'mem+swap' rather than swap.
+* why 'mem+swap' rather than swap.
 The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
 to move account from memory to swap...there is no change in usage of
-mem+swap.
+mem+swap. In other words, when we want to limit the usage of swap without
+affecting global LRU, mem+swap limit is better than just limiting swap from
+OS point of view.
 
-In other words, when we want to limit the usage of swap without affecting
-global LRU, mem+swap limit is better than just limiting swap from OS point
-of view.
+* What happens when a cgroup hits memory.memsw.limit_in_bytes
+When a cgroup his memory.memsw.limit_in_bytes, it's useless to do swap-out
+in this cgroup. Then, swap-out will not be done by cgroup routine and file
+caches are dropped. But as mentioned above, global LRU can do swapout memory
+from it for sanity of the system's memory management state. You can't forbid
+it by cgroup.
 
 2.5 Reclaim
 
@@ -204,6 +209,7 @@ We can alter the memory limit:
 
 NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo,
 mega or gigabytes.
+NOTE: We can write "-1" to reset the *.limit_in_bytes(unlimited).
 
 # cat /cgroups/0/memory.limit_in_bytes
 4194304
diff --git a/Documentation/connector/cn_test.c b/Documentation/connector/cn_test.c
index 6977c178729..6a5be5d5c8e 100644
--- a/Documentation/connector/cn_test.c
+++ b/Documentation/connector/cn_test.c
@@ -1,7 +1,7 @@
 /*
  * 	cn_test.c
  * 
- * 2004-2005 Copyright (c) Evgeniy Polyakov <johnpol@2ka.mipt.ru>
+ * 2004+ Copyright (c) Evgeniy Polyakov <zbr@ioremap.net>
  * All rights reserved.
  * 
  * This program is free software; you can redistribute it and/or modify
@@ -41,6 +41,12 @@ void cn_test_callback(void *data)
 	       msg->seq, msg->ack, msg->len, (char *)msg->data);
 }
 
+/*
+ * Do not remove this function even if no one is using it as
+ * this is an example of how to get notifications about new
+ * connector user registration
+ */
+#if 0
 static int cn_test_want_notify(void)
 {
 	struct cn_ctl_msg *ctl;
@@ -117,6 +123,7 @@ nlmsg_failure:
 	kfree_skb(skb);
 	return -EINVAL;
 }
+#endif
 
 static u32 cn_test_timer_counter;
 static void cn_test_timer_func(unsigned long __data)
@@ -187,5 +194,5 @@ module_init(cn_test_init);
 module_exit(cn_test_fini);
 
 MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Evgeniy Polyakov <johnpol@2ka.mipt.ru>");
+MODULE_AUTHOR("Evgeniy Polyakov <zbr@ioremap.net>");
 MODULE_DESCRIPTION("Connector's test module");
diff --git a/Documentation/connector/ucon.c b/Documentation/connector/ucon.c
index d738cde2a8d..c5092ad0ce4 100644
--- a/Documentation/connector/ucon.c
+++ b/Documentation/connector/ucon.c
@@ -1,7 +1,7 @@
 /*
  * 	ucon.c
  *
- * Copyright (c) 2004+ Evgeniy Polyakov <johnpol@2ka.mipt.ru>
+ * Copyright (c) 2004+ Evgeniy Polyakov <zbr@ioremap.net>
  *
  *
  * This program is free software; you can redistribute it and/or modify
diff --git a/Documentation/cpu-freq/cpu-drivers.txt b/Documentation/cpu-freq/cpu-drivers.txt
index 43c743903dd..75a58d14d3c 100644
--- a/Documentation/cpu-freq/cpu-drivers.txt
+++ b/Documentation/cpu-freq/cpu-drivers.txt
@@ -155,7 +155,7 @@ actual frequency must be determined using the following rules:
 - if relation==CPUFREQ_REL_H, try to select a new_freq lower than or equal
   target_freq. ("H for highest, but no higher than")
 
-Here again the frequency table helper might assist you - see section 3
+Here again the frequency table helper might assist you - see section 2
 for details.
 
 
diff --git a/Documentation/cpu-freq/governors.txt b/Documentation/cpu-freq/governors.txt
index ce73f3eb5dd..aed082f49d0 100644
--- a/Documentation/cpu-freq/governors.txt
+++ b/Documentation/cpu-freq/governors.txt
@@ -119,10 +119,6 @@ want the kernel to look at the CPU usage and to make decisions on
 what to do about the frequency.  Typically this is set to values of
 around '10000' or more. It's default value is (cmp. with users-guide.txt):
 transition_latency * 1000
-The lowest value you can set is:
-transition_latency * 100 or it may get restricted to a value where it
-makes not sense for the kernel anymore to poll that often which depends
-on your HZ config variable (HZ=1000: max=20000us, HZ=250: max=5000).
 Be aware that transition latency is in ns and sampling_rate is in us, so you
 get the same sysfs value by default.
 Sampling rate should always get adjusted considering the transition latency
@@ -131,14 +127,20 @@ in the bash (as said, 1000 is default), do:
 echo `$(($(cat cpuinfo_transition_latency) * 750 / 1000)) \
     >ondemand/sampling_rate
 
-show_sampling_rate_(min|max): THIS INTERFACE IS DEPRECATED, DON'T USE IT.
-You can use wider ranges now and the general
-cpuinfo_transition_latency variable (cmp. with user-guide.txt) can be
-used to obtain exactly the same info:
-show_sampling_rate_min = transtition_latency * 500    / 1000
-show_sampling_rate_max = transtition_latency * 500000 / 1000
-(divided by 1000 is to illustrate that sampling rate is in us and
-transition latency is exported ns).
+show_sampling_rate_min:
+The sampling rate is limited by the HW transition latency:
+transition_latency * 100
+Or by kernel restrictions:
+If CONFIG_NO_HZ is set, the limit is 10ms fixed.
+If CONFIG_NO_HZ is not set or no_hz=off boot parameter is used, the
+limits depend on the CONFIG_HZ option:
+HZ=1000: min=20000us  (20ms)
+HZ=250:  min=80000us  (80ms)
+HZ=100:  min=200000us (200ms)
+The highest value of kernel and HW latency restrictions is shown and
+used as the minimum sampling rate.
+
+show_sampling_rate_max: THIS INTERFACE IS DEPRECATED, DON'T USE IT.
 
 up_threshold: defines what the average CPU usage between the samplings
 of 'sampling_rate' needs to be for the kernel to make a decision on
diff --git a/Documentation/cpu-freq/user-guide.txt b/Documentation/cpu-freq/user-guide.txt
index 75f41193f3e..5d5f5fadd1c 100644
--- a/Documentation/cpu-freq/user-guide.txt
+++ b/Documentation/cpu-freq/user-guide.txt
@@ -31,7 +31,6 @@ Contents:
 
 3. How to change the CPU cpufreq policy and/or speed
 3.1 Preferred interface: sysfs
-3.2 Deprecated interfaces
 
 
 
diff --git a/Documentation/device-mapper/dm-log.txt b/Documentation/device-mapper/dm-log.txt
new file mode 100644
index 00000000000..994dd75475a
--- /dev/null
+++ b/Documentation/device-mapper/dm-log.txt
@@ -0,0 +1,54 @@
+Device-Mapper Logging
+=====================
+The device-mapper logging code is used by some of the device-mapper
+RAID targets to track regions of the disk that are not consistent.
+A region (or portion of the address space) of the disk may be
+inconsistent because a RAID stripe is currently being operated on or
+a machine died while the region was being altered.  In the case of
+mirrors, a region would be considered dirty/inconsistent while you
+are writing to it because the writes need to be replicated for all
+the legs of the mirror and may not reach the legs at the same time.
+Once all writes are complete, the region is considered clean again.
+
+There is a generic logging interface that the device-mapper RAID
+implementations use to perform logging operations (see
+dm_dirty_log_type in include/linux/dm-dirty-log.h).  Various different
+logging implementations are available and provide different
+capabilities.  The list includes:
+
+Type		Files
+====		=====
+disk		drivers/md/dm-log.c
+core		drivers/md/dm-log.c
+userspace	drivers/md/dm-log-userspace* include/linux/dm-log-userspace.h
+
+The "disk" log type
+-------------------
+This log implementation commits the log state to disk.  This way, the
+logging state survives reboots/crashes.
+
+The "core" log type
+-------------------
+This log implementation keeps the log state in memory.  The log state
+will not survive a reboot or crash, but there may be a small boost in
+performance.  This method can also be used if no storage device is
+available for storing log state.
+
+The "userspace" log type
+------------------------
+This log type simply provides a way to export the log API to userspace,
+so log implementations can be done there.  This is done by forwarding most
+logging requests to userspace, where a daemon receives and processes the
+request.
+
+The structure used for communication between kernel and userspace are
+located in include/linux/dm-log-userspace.h.  Due to the frequency,
+diversity, and 2-way communication nature of the exchanges between
+kernel and userspace, 'connector' is used as the interface for
+communication.
+
+There are currently two userspace log implementations that leverage this
+framework - "clustered_disk" and "clustered_core".  These implementations
+provide a cluster-coherent log for shared-storage.  Device-mapper mirroring
+can be used in a shared-storage environment when the cluster log implementations
+are employed.
diff --git a/Documentation/device-mapper/dm-queue-length.txt b/Documentation/device-mapper/dm-queue-length.txt
new file mode 100644
index 00000000000..f4db2562175
--- /dev/null
+++ b/Documentation/device-mapper/dm-queue-length.txt
@@ -0,0 +1,39 @@
+dm-queue-length
+===============
+
+dm-queue-length is a path selector module for device-mapper targets,
+which selects a path with the least number of in-flight I/Os.
+The path selector name is 'queue-length'.
+
+Table parameters for each path: [<repeat_count>]
+	<repeat_count>: The number of I/Os to dispatch using the selected
+			path before switching to the next path.
+			If not given, internal default is used. To check
+			the default value, see the activated table.
+
+Status for each path: <status> <fail-count> <in-flight>
+	<status>: 'A' if the path is active, 'F' if the path is failed.
+	<fail-count>: The number of path failures.
+	<in-flight>: The number of in-flight I/Os on the path.
+
+
+Algorithm
+=========
+
+dm-queue-length increments/decrements 'in-flight' when an I/O is
+dispatched/completed respectively.
+dm-queue-length selects a path with the minimum 'in-flight'.
+
+
+Examples
+========
+In case that 2 paths (sda and sdb) are used with repeat_count == 128.
+
+# echo "0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128" \
+  dmsetup create test
+#
+# dmsetup table
+test: 0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128
+#
+# dmsetup status
+test: 0 10 multipath 2 0 0 0 1 1 E 0 2 1 8:0 A 0 0 8:16 A 0 0
diff --git a/Documentation/device-mapper/dm-service-time.txt b/Documentation/device-mapper/dm-service-time.txt
new file mode 100644
index 00000000000..7d00668e97b
--- /dev/null
+++ b/Documentation/device-mapper/dm-service-time.txt
@@ -0,0 +1,91 @@
+dm-service-time
+===============
+
+dm-service-time is a path selector module for device-mapper targets,
+which selects a path with the shortest estimated service time for
+the incoming I/O.
+
+The service time for each path is estimated by dividing the total size
+of in-flight I/Os on a path with the performance value of the path.
+The performance value is a relative throughput value among all paths
+in a path-group, and it can be specified as a table argument.
+
+The path selector name is 'service-time'.
+
+Table parameters for each path: [<repeat_count> [<relative_throughput>]]
+	<repeat_count>: The number of I/Os to dispatch using the selected
+			path before switching to the next path.
+			If not given, internal default is used.  To check
+			the default value, see the activated table.
+	<relative_throughput>: The relative throughput value of the path
+			among all paths in the path-group.
+			The valid range is 0-100.
+			If not given, minimum value '1' is used.
+			If '0' is given, the path isn't selected while
+			other paths having a positive value are available.
+
+Status for each path: <status> <fail-count> <in-flight-size> \
+		      <relative_throughput>
+	<status>: 'A' if the path is active, 'F' if the path is failed.
+	<fail-count>: The number of path failures.
+	<in-flight-size>: The size of in-flight I/Os on the path.
+	<relative_throughput>: The relative throughput value of the path
+			among all paths in the path-group.
+
+
+Algorithm
+=========
+
+dm-service-time adds the I/O size to 'in-flight-size' when the I/O is
+dispatched and substracts when completed.
+Basically, dm-service-time selects a path having minimum service time
+which is calculated by:
+
+	('in-flight-size' + 'size-of-incoming-io') / 'relative_throughput'
+
+However, some optimizations below are used to reduce the calculation
+as much as possible.
+
+	1. If the paths have the same 'relative_throughput', skip
+	   the division and just compare the 'in-flight-size'.
+
+	2. If the paths have the same 'in-flight-size', skip the division
+	   and just compare the 'relative_throughput'.
+
+	3. If some paths have non-zero 'relative_throughput' and others
+	   have zero 'relative_throughput', ignore those paths with zero
+	   'relative_throughput'.
+
+If such optimizations can't be applied, calculate service time, and
+compare service time.
+If calculated service time is equal, the path having maximum
+'relative_throughput' may be better.  So compare 'relative_throughput'
+then.
+
+
+Examples
+========
+In case that 2 paths (sda and sdb) are used with repeat_count == 128
+and sda has an average throughput 1GB/s and sdb has 4GB/s,
+'relative_throughput' value may be '1' for sda and '4' for sdb.
+
+# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4" \
+  dmsetup create test
+#
+# dmsetup table
+test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4
+#
+# dmsetup status
+test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 1 8:16 A 0 0 4
+
+
+Or '2' for sda and '8' for sdb would be also true.
+
+# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8" \
+  dmsetup create test
+#
+# dmsetup table
+test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8
+#
+# dmsetup status
+test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 2 8:16 A 0 0 8
diff --git a/Documentation/driver-model/driver.txt b/Documentation/driver-model/driver.txt
index 82132169d47..60120fb3b96 100644
--- a/Documentation/driver-model/driver.txt
+++ b/Documentation/driver-model/driver.txt
@@ -207,8 +207,8 @@ Attributes
 ~~~~~~~~~~
 struct driver_attribute {
         struct attribute        attr;
-        ssize_t (*show)(struct device_driver *, char * buf, size_t count, loff_t off);
-        ssize_t (*store)(struct device_driver *, const char * buf, size_t count, loff_t off);
+        ssize_t (*show)(struct device_driver *driver, char *buf);
+        ssize_t (*store)(struct device_driver *, const char * buf, size_t count);
 };
 
 Device drivers can export attributes via their sysfs directories. 
diff --git a/Documentation/dvb/get_dvb_firmware b/Documentation/dvb/get_dvb_firmware
index a52adfc9a57..3d1b0ab70c8 100644
--- a/Documentation/dvb/get_dvb_firmware
+++ b/Documentation/dvb/get_dvb_firmware
@@ -25,7 +25,7 @@ use IO::Handle;
 		"tda10046lifeview", "av7110", "dec2000t", "dec2540t",
 		"dec3000s", "vp7041", "dibusb", "nxt2002", "nxt2004",
 		"or51211", "or51132_qam", "or51132_vsb", "bluebird",
-		"opera1", "cx231xx", "cx18", "cx23885", "pvrusb2" );
+		"opera1", "cx231xx", "cx18", "cx23885", "pvrusb2", "mpc718" );
 
 # Check args
 syntax() if (scalar(@ARGV) != 1);
@@ -381,6 +381,57 @@ sub cx18 {
     $allfiles;
 }
 
+sub mpc718 {
+    my $archive = 'Yuan MPC718 TV Tuner Card 2.13.10.1016.zip';
+    my $url = "ftp://ftp.work.acer-euro.com/desktop/aspire_idea510/vista/Drivers/$archive";
+    my $fwfile = "dvb-cx18-mpc718-mt352.fw";
+    my $tmpdir = tempdir(DIR => "/tmp", CLEANUP => 1);
+
+    checkstandard();
+    wgetfile($archive, $url);
+    unzip($archive, $tmpdir);
+
+    my $sourcefile = "$tmpdir/Yuan MPC718 TV Tuner Card 2.13.10.1016/mpc718_32bit/yuanrap.sys";
+    my $found = 0;
+
+    open IN, '<', $sourcefile or die "Couldn't open $sourcefile to extract $fwfile data\n";
+    binmode IN;
+    open OUT, '>', $fwfile;
+    binmode OUT;
+    {
+	# Block scope because we change the line terminator variable $/
+	my $prevlen = 0;
+	my $currlen;
+
+	# Buried in the data segment are 3 runs of almost identical
+	# register-value pairs that end in 0x5d 0x01 which is a "TUNER GO"
+	# command for the MT352.
+	# Pull out the middle run (because it's easy) of register-value
+	# pairs to make the "firmware" file.
+
+	local $/ = "\x5d\x01"; # MT352 "TUNER GO"
+
+	while (<IN>) {
+	    $currlen = length($_);
+	    if ($prevlen == $currlen && $currlen <= 64) {
+		chop; chop; # Get rid of "TUNER GO"
+		s/^\0\0//;  # get rid of leading 00 00 if it's there
+		printf OUT "$_";
+		$found = 1;
+		last;
+	    }
+	    $prevlen = $currlen;
+	}
+    }
+    close OUT;
+    close IN;
+    if (!$found) {
+	unlink $fwfile;
+	die "Couldn't find valid register-value sequence in $sourcefile for $fwfile\n";
+    }
+    $fwfile;
+}
+
 sub cx23885 {
     my $url = "http://linuxtv.org/downloads/firmware/";
 
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 7129846a278..09e031c5588 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -6,6 +6,20 @@ be removed from this file.
 
 ---------------------------
 
+What:	IRQF_SAMPLE_RANDOM
+Check:	IRQF_SAMPLE_RANDOM
+When:	July 2009
+
+Why:	Many of IRQF_SAMPLE_RANDOM users are technically bogus as entropy
+	sources in the kernel's current entropy model. To resolve this, every
+	input point to the kernel's entropy pool needs to better document the
+	type of entropy source it actually is. This will be replaced with
+	additional add_*_randomness functions in drivers/char/random.c
+
+Who:	Robin Getz <rgetz@blackfin.uclinux.org> & Matt Mackall <mpm@selenic.com>
+
+---------------------------
+
 What:	The ieee80211_regdom module parameter
 When:	March 2010 / desktop catchup
 
@@ -354,16 +368,6 @@ Who:  Krzysztof Piotr Oledzki <ole@ans.pl>
 
 ---------------------------
 
-What:	i2c_attach_client(), i2c_detach_client(), i2c_driver->detach_client(),
-	i2c_adapter->client_register(), i2c_adapter->client_unregister
-When:	2.6.30
-Check:	i2c_attach_client i2c_detach_client
-Why:	Deprecated by the new (standard) device driver binding model. Use
-	i2c_driver->probe() and ->remove() instead.
-Who:	Jean Delvare <khali@linux-fr.org>
-
----------------------------
-
 What:	fscher and fscpos drivers
 When:	June 2009
 Why:	Deprecated by the new fschmd driver.
@@ -454,3 +458,13 @@ Why:	Remove the old legacy 32bit machine check code. This has been
 	but the old version has been kept around for easier testing. Note this
 	doesn't impact the old P5 and WinChip machine check handlers.
 Who:	Andi Kleen <andi@firstfloor.org>
+
+----------------------------
+
+What:	lock_policy_rwsem_* and unlock_policy_rwsem_* will not be
+	exported interface anymore.
+When:	2.6.33
+Why:	cpu_policy_rwsem has a new cleaner definition making it local to
+	cpufreq core and contained inside cpufreq.c. Other dependent
+	drivers should not use it in order to safely avoid lockdep issues.
+Who:	Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
index 8dd6db76171..f15621ee559 100644
--- a/Documentation/filesystems/00-INDEX
+++ b/Documentation/filesystems/00-INDEX
@@ -66,6 +66,10 @@ mandatory-locking.txt
 	- info on the Linux implementation of Sys V mandatory file locking.
 ncpfs.txt
 	- info on Novell Netware(tm) filesystem using NCP protocol.
+nfs41-server.txt
+	- info on the Linux server implementation of NFSv4 minor version 1.
+nfs-rdma.txt
+	- how to install and setup the Linux NFS/RDMA client and server software.
 nfsroot.txt
 	- short guide on setting up a diskless box with NFS root filesystem.
 nilfs2.txt
diff --git a/Documentation/filesystems/9p.txt b/Documentation/filesystems/9p.txt
index bf8080640eb..6208f55c44c 100644
--- a/Documentation/filesystems/9p.txt
+++ b/Documentation/filesystems/9p.txt
@@ -123,6 +123,9 @@ available from the same CVS repository.
 There are user and developer mailing lists available through the v9fs project
 on sourceforge (http://sourceforge.net/projects/v9fs).
 
+A stand-alone version of the module (which should build for any 2.6 kernel)
+is available via (http://github.com/ericvh/9p-sac/tree/master)
+
 News and other information is maintained on SWiK (http://swik.net/v9fs).
 
 Bug reports may be issued through the kernel.org bugzilla 
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 3120f8dd2c3..18b9d0ca063 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -109,27 +109,28 @@ prototypes:
 
 locking rules:
 	All may block.
-			BKL	s_lock	s_umount
-alloc_inode:		no	no	no
-destroy_inode:		no
-dirty_inode:		no				(must not sleep)
-write_inode:		no
-drop_inode:		no				!!!inode_lock!!!
-delete_inode:		no
-put_super:		yes	yes	no
-write_super:		no	yes	read
-sync_fs:		no	no	read
-freeze_fs:		?
-unfreeze_fs:		?
-statfs:			no	no	no
-remount_fs:		yes	yes	maybe		(see below)
-clear_inode:		no
-umount_begin:		yes	no	no
-show_options:		no				(vfsmount->sem)
-quota_read:		no	no	no		(see below)
-quota_write:		no	no	no		(see below)
-
-->remount_fs() will have the s_umount lock if it's already mounted.
+	None have BKL
+			s_umount
+alloc_inode:
+destroy_inode:
+dirty_inode:				(must not sleep)
+write_inode:
+drop_inode:				!!!inode_lock!!!
+delete_inode:
+put_super:		write
+write_super:		read
+sync_fs:		read
+freeze_fs:		read
+unfreeze_fs:		read
+statfs:			no
+remount_fs:		maybe		(see below)
+clear_inode:
+umount_begin:		no
+show_options:		no		(namespace_sem)
+quota_read:		no		(see below)
+quota_write:		no		(see below)
+
+->remount_fs() will have the s_umount exclusive lock if it's already mounted.
 When called from get_sb_single, it does NOT have the s_umount lock.
 ->quota_read() and ->quota_write() functions are both guaranteed to
 be the only ones operating on the quota file by the quota code (via
@@ -187,7 +188,7 @@ readpages:		no
 write_begin:		no	locks the page		yes
 write_end:		no	yes, unlocks		yes
 perform_write:		no	n/a			yes
-bmap:			yes
+bmap:			no
 invalidatepage:		no	yes
 releasepage:		no	yes
 direct_IO:		no
diff --git a/Documentation/filesystems/afs.txt b/Documentation/filesystems/afs.txt
index 12ad6c7f4e5..ffef91c4e0d 100644
--- a/Documentation/filesystems/afs.txt
+++ b/Documentation/filesystems/afs.txt
@@ -23,15 +23,13 @@ it does support include:
 
  (*) Security (currently only AFS kaserver and KerberosIV tickets).
 
- (*) File reading.
+ (*) File reading and writing.
 
  (*) Automounting.
 
-It does not yet support the following AFS features:
-
- (*) Write support.
+ (*) Local caching (via fscache).
 
- (*) Local caching.
+It does not yet support the following AFS features:
 
  (*) pioctl() system call.
 
@@ -56,7 +54,7 @@ They permit the debugging messages to be turned on dynamically by manipulating
 the masks in the following files:
 
 	/sys/module/af_rxrpc/parameters/debug
-	/sys/module/afs/parameters/debug
+	/sys/module/kafs/parameters/debug
 
 
 =====
@@ -66,9 +64,9 @@ USAGE
 When inserting the driver modules the root cell must be specified along with a
 list of volume location server IP addresses:
 
-	insmod af_rxrpc.o
-	insmod rxkad.o
-	insmod kafs.o rootcell=cambridge.redhat.com:172.16.18.73:172.16.18.91
+	modprobe af_rxrpc
+	modprobe rxkad
+	modprobe kafs rootcell=cambridge.redhat.com:172.16.18.73:172.16.18.91
 
 The first module is the AF_RXRPC network protocol driver.  This provides the
 RxRPC remote operation protocol and may also be accessed from userspace.  See:
@@ -81,7 +79,7 @@ is the actual filesystem driver for the AFS filesystem.
 Once the module has been loaded, more modules can be added by the following
 procedure:
 
-	echo add grand.central.org 18.7.14.88:128.2.191.224 >/proc/fs/afs/cells
+	echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 >/proc/fs/afs/cells
 
 Where the parameters to the "add" command are the name of a cell and a list of
 volume location servers within that cell, with the latter separated by colons.
@@ -101,7 +99,7 @@ The name of the volume can be suffixes with ".backup" or ".readonly" to
 specify connection to only volumes of those types.
 
 The name of the cell is optional, and if not given during a mount, then the
-named volume will be looked up in the cell specified during insmod.
+named volume will be looked up in the cell specified during modprobe.
 
 Additional cells can be added through /proc (see later section).
 
@@ -163,14 +161,14 @@ THE CELL DATABASE
 
 The filesystem maintains an internal database of all the cells it knows and the
 IP addresses of the volume location servers for those cells.  The cell to which
-the system belongs is added to the database when insmod is performed by the
+the system belongs is added to the database when modprobe is performed by the
 "rootcell=" argument or, if compiled in, using a "kafs.rootcell=" argument on
 the kernel command line.
 
 Further cells can be added by commands similar to the following:
 
 	echo add CELLNAME VLADDR[:VLADDR][:VLADDR]... >/proc/fs/afs/cells
-	echo add grand.central.org 18.7.14.88:128.2.191.224 >/proc/fs/afs/cells
+	echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 >/proc/fs/afs/cells
 
 No other cell database operations are available at this time.
 
@@ -233,7 +231,7 @@ insmod /tmp/kafs.o rootcell=cambridge.redhat.com:172.16.18.91
 mount -t afs \%root.afs. /afs
 mount -t afs \%cambridge.redhat.com:root.cell. /afs/cambridge.redhat.com/
 
-echo add grand.central.org 18.7.14.88:128.2.191.224 > /proc/fs/afs/cells
+echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 > /proc/fs/afs/cells
 mount -t afs "#grand.central.org:root.cell." /afs/grand.central.org/
 mount -t afs "#grand.central.org:root.archive." /afs/grand.central.org/archive
 mount -t afs "#grand.central.org:root.contrib." /afs/grand.central.org/contrib
diff --git a/Documentation/filesystems/ext2.txt b/Documentation/filesystems/ext2.txt
index e055acb6b2d..67639f905f1 100644
--- a/Documentation/filesystems/ext2.txt
+++ b/Documentation/filesystems/ext2.txt
@@ -322,7 +322,7 @@ an upper limit on the block size imposed by the page size of the kernel,
 so 8kB blocks are only allowed on Alpha systems (and other architectures
 which support larger pages).
 
-There is an upper limit of 32768 subdirectories in a single directory.
+There is an upper limit of 32000 subdirectories in a single directory.
 
 There is a "soft" upper limit of about 10-15k files in a single directory
 with the current linear linked-list directory implementation.  This limit
diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt
index 608fdba97b7..7be02ac5fa3 100644
--- a/Documentation/filesystems/ext4.txt
+++ b/Documentation/filesystems/ext4.txt
@@ -235,6 +235,10 @@ minixdf			Make 'df' act like Minix.
 
 debug			Extra debugging information is sent to syslog.
 
+abort			Simulate the effects of calling ext4_abort() for
+			debugging purposes.  This is normally used while
+			remounting a filesystem which is already mounted.
+
 errors=remount-ro	Remount the filesystem read-only on an error.
 errors=continue		Keep going on a filesystem error.
 errors=panic		Panic and halt the machine if an error occurs.
diff --git a/Documentation/filesystems/isofs.txt b/Documentation/filesystems/isofs.txt
index 6973b980ca2..3c367c3b360 100644
--- a/Documentation/filesystems/isofs.txt
+++ b/Documentation/filesystems/isofs.txt
@@ -23,8 +23,13 @@ Mount options unique to the isofs filesystem.
   map=off       Do not map non-Rock Ridge filenames to lower case
   map=normal    Map non-Rock Ridge filenames to lower case
   map=acorn     As map=normal but also apply Acorn extensions if present
-  mode=xxx      Sets the permissions on files to xxx
-  dmode=xxx     Sets the permissions on directories to xxx
+  mode=xxx      Sets the permissions on files to xxx unless Rock Ridge
+		extensions set the permissions otherwise
+  dmode=xxx     Sets the permissions on directories to xxx unless Rock Ridge
+		extensions set the permissions otherwise
+  overriderockperm Set permissions on files and directories according to
+		'mode' and 'dmode' even though Rock Ridge extensions are
+		present.
   nojoliet      Ignore Joliet extensions if they are present.
   norock        Ignore Rock Ridge extensions if they are present.
   hide		Completely strip hidden files from the file system.
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index ebff3c10a07..ffead13f944 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -5,11 +5,12 @@
                   Bodo Bauer <bb@ricochet.net>
 
 2.4.x update	  Jorge Nerin <comandante@zaralinux.com>      November 14 2000
-move /proc/sys	  Shen Feng <shen@cn.fujitsu.com>		    April 1 2009
+move /proc/sys	  Shen Feng <shen@cn.fujitsu.com>		  April 1 2009
 ------------------------------------------------------------------------------
 Version 1.3                                              Kernel version 2.2.12
 					      Kernel version 2.4.0-test11-pre4
 ------------------------------------------------------------------------------
+fixes/update part 1.1  Stefani Seibold <stefani@seibold.net>       June 9 2009
 
 Table of Contents
 -----------------
@@ -116,7 +117,7 @@ The link  self  points  to  the  process reading the file system. Each process
 subdirectory has the entries listed in Table 1-1.
 
 
-Table 1-1: Process specific entries in /proc 
+Table 1-1: Process specific entries in /proc
 ..............................................................................
  File		Content
  clear_refs	Clears page referenced bits shown in smaps output
@@ -134,46 +135,103 @@ Table 1-1: Process specific entries in /proc
  status		Process status in human readable form
  wchan		If CONFIG_KALLSYMS is set, a pre-decoded wchan
  stack		Report full stack trace, enable via CONFIG_STACKTRACE
- smaps		Extension based on maps, the rss size for each mapped file
+ smaps		a extension based on maps, showing the memory consumption of
+		each mapping
 ..............................................................................
 
 For example, to get the status information of a process, all you have to do is
 read the file /proc/PID/status:
 
-  >cat /proc/self/status 
-  Name:   cat 
-  State:  R (running) 
-  Pid:    5452 
-  PPid:   743 
+  >cat /proc/self/status
+  Name:   cat
+  State:  R (running)
+  Tgid:   5452
+  Pid:    5452
+  PPid:   743
   TracerPid:      0						(2.4)
-  Uid:    501     501     501     501 
-  Gid:    100     100     100     100 
-  Groups: 100 14 16 
-  VmSize:     1112 kB 
-  VmLck:         0 kB 
-  VmRSS:       348 kB 
-  VmData:       24 kB 
-  VmStk:        12 kB 
-  VmExe:         8 kB 
-  VmLib:      1044 kB 
-  SigPnd: 0000000000000000 
-  SigBlk: 0000000000000000 
-  SigIgn: 0000000000000000 
-  SigCgt: 0000000000000000 
-  CapInh: 00000000fffffeff 
-  CapPrm: 0000000000000000 
-  CapEff: 0000000000000000 
-
+  Uid:    501     501     501     501
+  Gid:    100     100     100     100
+  FDSize: 256
+  Groups: 100 14 16
+  VmPeak:     5004 kB
+  VmSize:     5004 kB
+  VmLck:         0 kB
+  VmHWM:       476 kB
+  VmRSS:       476 kB
+  VmData:      156 kB
+  VmStk:        88 kB
+  VmExe:        68 kB
+  VmLib:      1412 kB
+  VmPTE:        20 kb
+  Threads:        1
+  SigQ:   0/28578
+  SigPnd: 0000000000000000
+  ShdPnd: 0000000000000000
+  SigBlk: 0000000000000000
+  SigIgn: 0000000000000000
+  SigCgt: 0000000000000000
+  CapInh: 00000000fffffeff
+  CapPrm: 0000000000000000
+  CapEff: 0000000000000000
+  CapBnd: ffffffffffffffff
+  voluntary_ctxt_switches:        0
+  nonvoluntary_ctxt_switches:     1
 
 This shows you nearly the same information you would get if you viewed it with
 the ps  command.  In  fact,  ps  uses  the  proc  file  system  to  obtain its
-information. The  statm  file  contains  more  detailed  information about the
-process memory usage. Its seven fields are explained in Table 1-2.  The stat
-file contains details information about the process itself.  Its fields are
-explained in Table 1-3.
+information.  But you get a more detailed  view of the  process by reading the
+file /proc/PID/status. It fields are described in table 1-2.
+
+The  statm  file  contains  more  detailed  information about the process
+memory usage. Its seven fields are explained in Table 1-3.  The stat file
+contains details information about the process itself.  Its fields are
+explained in Table 1-4.
 
+Table 1-2: Contents of the statm files (as of 2.6.30-rc7)
+..............................................................................
+ Field                       Content
+ Name                        filename of the executable
+ State                       state (R is running, S is sleeping, D is sleeping
+                             in an uninterruptible wait, Z is zombie,
+			     T is traced or stopped)
+ Tgid                        thread group ID
+ Pid                         process id
+ PPid                        process id of the parent process
+ TracerPid                   PID of process tracing this process (0 if not)
+ Uid                         Real, effective, saved set, and  file system UIDs
+ Gid                         Real, effective, saved set, and  file system GIDs
+ FDSize                      number of file descriptor slots currently allocated
+ Groups                      supplementary group list
+ VmPeak                      peak virtual memory size
+ VmSize                      total program size
+ VmLck                       locked memory size
+ VmHWM                       peak resident set size ("high water mark")
+ VmRSS                       size of memory portions
+ VmData                      size of data, stack, and text segments
+ VmStk                       size of data, stack, and text segments
+ VmExe                       size of text segment
+ VmLib                       size of shared library code
+ VmPTE                       size of page table entries
+ Threads                     number of threads
+ SigQ                        number of signals queued/max. number for queue
+ SigPnd                      bitmap of pending signals for the thread
+ ShdPnd                      bitmap of shared pending signals for the process
+ SigBlk                      bitmap of blocked signals
+ SigIgn                      bitmap of ignored signals
+ SigCgt                      bitmap of catched signals
+ CapInh                      bitmap of inheritable capabilities
+ CapPrm                      bitmap of permitted capabilities
+ CapEff                      bitmap of effective capabilities
+ CapBnd                      bitmap of capabilities bounding set
+ Cpus_allowed                mask of CPUs on which this process may run
+ Cpus_allowed_list           Same as previous, but in "list format"
+ Mems_allowed                mask of memory nodes allowed to this process
+ Mems_allowed_list           Same as previous, but in "list format"
+ voluntary_ctxt_switches     number of voluntary context switches
+ nonvoluntary_ctxt_switches  number of non voluntary context switches
+..............................................................................
 
-Table 1-2: Contents of the statm files (as of 2.6.8-rc3)
+Table 1-3: Contents of the statm files (as of 2.6.8-rc3)
 ..............................................................................
  Field    Content
  size     total program size (pages)		(same as VmSize in status)
@@ -188,7 +246,7 @@ Table 1-2: Contents of the statm files (as of 2.6.8-rc3)
 ..............................................................................
 
 
-Table 1-3: Contents of the stat files (as of 2.6.22-rc3)
+Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
 ..............................................................................
  Field          Content
   pid           process id
@@ -222,10 +280,10 @@ Table 1-3: Contents of the stat files (as of 2.6.22-rc3)
   start_stack   address of the start of the stack
   esp           current value of ESP
   eip           current value of EIP
-  pending       bitmap of pending signals (obsolete)
-  blocked       bitmap of blocked signals (obsolete)
-  sigign        bitmap of ignored signals (obsolete)
-  sigcatch      bitmap of catched signals (obsolete)
+  pending       bitmap of pending signals
+  blocked       bitmap of blocked signals
+  sigign        bitmap of ignored signals
+  sigcatch      bitmap of catched signals
   wchan         address where process went to sleep
   0             (place holder)
   0             (place holder)
@@ -234,19 +292,99 @@ Table 1-3: Contents of the stat files (as of 2.6.22-rc3)
   rt_priority   realtime priority
   policy        scheduling policy (man sched_setscheduler)
   blkio_ticks   time spent waiting for block IO
+  gtime         guest time of the task in jiffies
+  cgtime        guest time of the task children in jiffies
 ..............................................................................
 
+The /proc/PID/map file containing the currently mapped memory regions and
+their access permissions.
+
+The format is:
+
+address           perms offset  dev   inode      pathname
+
+08048000-08049000 r-xp 00000000 03:00 8312       /opt/test
+08049000-0804a000 rw-p 00001000 03:00 8312       /opt/test
+0804a000-0806b000 rw-p 00000000 00:00 0          [heap]
+a7cb1000-a7cb2000 ---p 00000000 00:00 0
+a7cb2000-a7eb2000 rw-p 00000000 00:00 0
+a7eb2000-a7eb3000 ---p 00000000 00:00 0
+a7eb3000-a7ed5000 rw-p 00000000 00:00 0
+a7ed5000-a8008000 r-xp 00000000 03:00 4222       /lib/libc.so.6
+a8008000-a800a000 r--p 00133000 03:00 4222       /lib/libc.so.6
+a800a000-a800b000 rw-p 00135000 03:00 4222       /lib/libc.so.6
+a800b000-a800e000 rw-p 00000000 00:00 0
+a800e000-a8022000 r-xp 00000000 03:00 14462      /lib/libpthread.so.0
+a8022000-a8023000 r--p 00013000 03:00 14462      /lib/libpthread.so.0
+a8023000-a8024000 rw-p 00014000 03:00 14462      /lib/libpthread.so.0
+a8024000-a8027000 rw-p 00000000 00:00 0
+a8027000-a8043000 r-xp 00000000 03:00 8317       /lib/ld-linux.so.2
+a8043000-a8044000 r--p 0001b000 03:00 8317       /lib/ld-linux.so.2
+a8044000-a8045000 rw-p 0001c000 03:00 8317       /lib/ld-linux.so.2
+aff35000-aff4a000 rw-p 00000000 00:00 0          [stack]
+ffffe000-fffff000 r-xp 00000000 00:00 0          [vdso]
+
+where "address" is the address space in the process that it occupies, "perms"
+is a set of permissions:
+
+ r = read
+ w = write
+ x = execute
+ s = shared
+ p = private (copy on write)
+
+"offset" is the offset into the mapping, "dev" is the device (major:minor), and
+"inode" is the inode  on that device.  0 indicates that  no inode is associated
+with the memory region, as the case would be with BSS (uninitialized data).
+The "pathname" shows the name associated file for this mapping.  If the mapping
+is not associated with a file:
+
+ [heap]                   = the heap of the program
+ [stack]                  = the stack of the main process
+ [vdso]                   = the "virtual dynamic shared object",
+                            the kernel system call handler
+
+ or if empty, the mapping is anonymous.
+
+
+The /proc/PID/smaps is an extension based on maps, showing the memory
+consumption for each of the process's mappings. For each of mappings there
+is a series of lines such as the following:
+
+08048000-080bc000 r-xp 00000000 03:02 13130      /bin/bash
+Size:               1084 kB
+Rss:                 892 kB
+Pss:                 374 kB
+Shared_Clean:        892 kB
+Shared_Dirty:          0 kB
+Private_Clean:         0 kB
+Private_Dirty:         0 kB
+Referenced:          892 kB
+Swap:                  0 kB
+KernelPageSize:        4 kB
+MMUPageSize:           4 kB
+
+The first  of these lines shows  the same information  as is displayed for the
+mapping in /proc/PID/maps.  The remaining lines show  the size of the mapping,
+the amount of the mapping that is currently resident in RAM, the "proportional
+set size” (divide each shared page by the number of processes sharing it), the
+number of clean and dirty shared pages in the mapping, and the number of clean
+and dirty private pages in the mapping.  The "Referenced" indicates the amount
+of memory currently marked as referenced or accessed.
+
+This file is only present if the CONFIG_MMU kernel configuration option is
+enabled.
 
 1.2 Kernel data
 ---------------
 
 Similar to  the  process entries, the kernel data files give information about
 the running kernel. The files used to obtain this information are contained in
-/proc and  are  listed  in Table 1-4. Not all of these will be present in your
+/proc and  are  listed  in Table 1-5. Not all of these will be present in your
 system. It  depends  on the kernel configuration and the loaded modules, which
 files are there, and which are missing.
 
-Table 1-4: Kernel info in /proc
+Table 1-5: Kernel info in /proc
 ..............................................................................
  File        Content                                           
  apm         Advanced power management info                    
@@ -283,6 +421,7 @@ Table 1-4: Kernel info in /proc
  rtc         Real time clock                                   
  scsi        SCSI info (see text)                              
  slabinfo    Slab pool info                                    
+ softirqs    softirq usage
  stat        Overall statistics                                
  swaps       Swap space utilization                            
  sys         See chapter 2                                     
@@ -597,6 +736,25 @@ on the kind of area :
 0xffffffffa0017000-0xffffffffa0022000   45056 sys_init_module+0xc27/0x1d00 ...
    pages=10 vmalloc N0=10
 
+..............................................................................
+
+softirqs:
+
+Provides counts of softirq handlers serviced since boot time, for each cpu.
+
+> cat /proc/softirqs
+                CPU0       CPU1       CPU2       CPU3
+      HI:          0          0          0          0
+   TIMER:      27166      27120      27097      27034
+  NET_TX:          0          0          0         17
+  NET_RX:         42          0          0         39
+   BLOCK:          0          0        107       1121
+ TASKLET:          0          0          0        290
+   SCHED:      27035      26983      26971      26746
+ HRTIMER:          0          0          0          0
+     RCU:       1678       1769       2178       2250
+
+
 1.3 IDE devices in /proc/ide
 ----------------------------
 
@@ -614,10 +772,10 @@ IDE devices:
 
 More detailed  information  can  be  found  in  the  controller  specific
 subdirectories. These  are  named  ide0,  ide1  and  so  on.  Each  of  these
-directories contains the files shown in table 1-5.
+directories contains the files shown in table 1-6.
 
 
-Table 1-5: IDE controller info in  /proc/ide/ide?
+Table 1-6: IDE controller info in  /proc/ide/ide?
 ..............................................................................
  File    Content                                 
  channel IDE channel (0 or 1)                    
@@ -627,11 +785,11 @@ Table 1-5: IDE controller info in  /proc/ide/ide?
 ..............................................................................
 
 Each device  connected  to  a  controller  has  a separate subdirectory in the
-controllers directory.  The  files  listed in table 1-6 are contained in these
+controllers directory.  The  files  listed in table 1-7 are contained in these
 directories.
 
 
-Table 1-6: IDE device information
+Table 1-7: IDE device information
 ..............................................................................
  File             Content                                    
  cache            The cache                                  
@@ -673,12 +831,12 @@ the drive parameters:
 1.4 Networking info in /proc/net
 --------------------------------
 
-The subdirectory  /proc/net  follows  the  usual  pattern. Table 1-6 shows the
+The subdirectory  /proc/net  follows  the  usual  pattern. Table 1-8 shows the
 additional values  you  get  for  IP  version 6 if you configure the kernel to
-support this. Table 1-7 lists the files and their meaning.
+support this. Table 1-9 lists the files and their meaning.
 
 
-Table 1-6: IPv6 info in /proc/net 
+Table 1-8: IPv6 info in /proc/net
 ..............................................................................
  File       Content                                               
  udp6       UDP sockets (IPv6)                                    
@@ -693,7 +851,7 @@ Table 1-6: IPv6 info in /proc/net
 ..............................................................................
 
 
-Table 1-7: Network info in /proc/net 
+Table 1-9: Network info in /proc/net
 ..............................................................................
  File          Content                                                         
  arp           Kernel  ARP table                                               
@@ -817,10 +975,10 @@ The directory  /proc/parport  contains information about the parallel ports of
 your system.  It  has  one  subdirectory  for  each port, named after the port
 number (0,1,2,...).
 
-These directories contain the four files shown in Table 1-8.
+These directories contain the four files shown in Table 1-10.
 
 
-Table 1-8: Files in /proc/parport 
+Table 1-10: Files in /proc/parport
 ..............................................................................
  File      Content                                                             
  autoprobe Any IEEE-1284 device ID information that has been acquired.         
@@ -838,10 +996,10 @@ Table 1-8: Files in /proc/parport
 
 Information about  the  available  and actually used tty's can be found in the
 directory /proc/tty.You'll  find  entries  for drivers and line disciplines in
-this directory, as shown in Table 1-9.
+this directory, as shown in Table 1-11.
 
 
-Table 1-9: Files in /proc/tty 
+Table 1-11: Files in /proc/tty
 ..............................................................................
  File          Content                                        
  drivers       list of drivers and their usage                
@@ -883,6 +1041,7 @@ since the system first booted.  For a quick look, simply cat the file:
   processes 2915
   procs_running 1
   procs_blocked 0
+  softirq 183433 0 21755 12 39 1137 231 21459 2263
 
 The very first  "cpu" line aggregates the  numbers in all  of the other "cpuN"
 lines.  These numbers identify the amount of time the CPU has spent performing
@@ -918,6 +1077,11 @@ CPUs.
 The   "procs_blocked" line gives  the  number of  processes currently blocked,
 waiting for I/O to complete.
 
+The "softirq" line gives counts of softirqs serviced since boot time, for each
+of the possible system softirqs. The first column is the total of all
+softirqs serviced; each subsequent column is the total for that particular
+softirq.
+
 
 1.9 Ext4 file system parameters
 ------------------------------
@@ -926,9 +1090,9 @@ Information about mounted ext4 file systems can be found in
 /proc/fs/ext4.  Each mounted filesystem will have a directory in
 /proc/fs/ext4 based on its device name (i.e., /proc/fs/ext4/hdc or
 /proc/fs/ext4/dm-0).   The files in each per-device directory are shown
-in Table 1-10, below.
+in Table 1-12, below.
 
-Table 1-10: Files in /proc/fs/ext4/<devname>
+Table 1-12: Files in /proc/fs/ext4/<devname>
 ..............................................................................
  File            Content                                        
  mb_groups       details of multiblock allocator buddy cache of free blocks
@@ -1003,13 +1167,11 @@ CHAPTER 3: PER-PROCESS PARAMETERS
 3.1 /proc/<pid>/oom_adj - Adjust the oom-killer score
 ------------------------------------------------------
 
-This file can be used to adjust the score used to select which processes should
-be killed in an out-of-memory situation.  The oom_adj value is a characteristic
-of the task's mm, so all threads that share an mm with pid will have the same
-oom_adj value.  A high value will increase the likelihood of this process being
-killed by the oom-killer.  Valid values are in the range -16 to +15 as
-explained below and a special value of -17, which disables oom-killing
-altogether for threads sharing pid's mm.
+This file can be used to adjust the score used to select which processes
+should be killed in an  out-of-memory  situation.  Giving it a high score will
+increase the likelihood of this process being killed by the oom-killer.  Valid
+values are in the range -16 to +15, plus the special value -17, which disables
+oom-killing altogether for this process.
 
 The process to be killed in an out-of-memory situation is selected among all others
 based on its badness score. This value equals the original memory size of the process
@@ -1023,9 +1185,6 @@ the parent's score if they do not share the same memory. Thus forking servers
 are the prime candidates to be killed. Having only one 'hungry' child will make
 parent less preferable than the child.
 
-/proc/<pid>/oom_adj cannot be changed for kthreads since they are immune from
-oom-killing already.
-
 /proc/<pid>/oom_score shows process' current badness score.
 
 The following heuristics are then applied:
diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt
index 7e81e37c0b1..b245d524d56 100644
--- a/Documentation/filesystems/sysfs.txt
+++ b/Documentation/filesystems/sysfs.txt
@@ -23,7 +23,8 @@ interface.
 Using sysfs
 ~~~~~~~~~~~
 
-sysfs is always compiled in. You can access it by doing:
+sysfs is always compiled in if CONFIG_SYSFS is defined. You can access
+it by doing:
 
     mount -t sysfs sysfs /sys 
 
diff --git a/Documentation/gcov.txt b/Documentation/gcov.txt
new file mode 100644
index 00000000000..40ec6335276
--- /dev/null
+++ b/Documentation/gcov.txt
@@ -0,0 +1,253 @@
+Using gcov with the Linux kernel
+================================
+
+1. Introduction
+2. Preparation
+3. Customization
+4. Files
+5. Modules
+6. Separated build and test machines
+7. Troubleshooting
+Appendix A: sample script: gather_on_build.sh
+Appendix B: sample script: gather_on_test.sh
+
+
+1. Introduction
+===============
+
+gcov profiling kernel support enables the use of GCC's coverage testing
+tool gcov [1] with the Linux kernel. Coverage data of a running kernel
+is exported in gcov-compatible format via the "gcov" debugfs directory.
+To get coverage data for a specific file, change to the kernel build
+directory and use gcov with the -o option as follows (requires root):
+
+# cd /tmp/linux-out
+# gcov -o /sys/kernel/debug/gcov/tmp/linux-out/kernel spinlock.c
+
+This will create source code files annotated with execution counts
+in the current directory. In addition, graphical gcov front-ends such
+as lcov [2] can be used to automate the process of collecting data
+for the entire kernel and provide coverage overviews in HTML format.
+
+Possible uses:
+
+* debugging (has this line been reached at all?)
+* test improvement (how do I change my test to cover these lines?)
+* minimizing kernel configurations (do I need this option if the
+  associated code is never run?)
+
+--
+
+[1] http://gcc.gnu.org/onlinedocs/gcc/Gcov.html
+[2] http://ltp.sourceforge.net/coverage/lcov.php
+
+
+2. Preparation
+==============
+
+Configure the kernel with:
+
+        CONFIG_DEBUGFS=y
+        CONFIG_GCOV_KERNEL=y
+
+and to get coverage data for the entire kernel:
+
+        CONFIG_GCOV_PROFILE_ALL=y
+
+Note that kernels compiled with profiling flags will be significantly
+larger and run slower. Also CONFIG_GCOV_PROFILE_ALL may not be supported
+on all architectures.
+
+Profiling data will only become accessible once debugfs has been
+mounted:
+
+        mount -t debugfs none /sys/kernel/debug
+
+
+3. Customization
+================
+
+To enable profiling for specific files or directories, add a line
+similar to the following to the respective kernel Makefile:
+
+        For a single file (e.g. main.o):
+                GCOV_PROFILE_main.o := y
+
+        For all files in one directory:
+                GCOV_PROFILE := y
+
+To exclude files from being profiled even when CONFIG_GCOV_PROFILE_ALL
+is specified, use:
+
+                GCOV_PROFILE_main.o := n
+        and:
+                GCOV_PROFILE := n
+
+Only files which are linked to the main kernel image or are compiled as
+kernel modules are supported by this mechanism.
+
+
+4. Files
+========
+
+The gcov kernel support creates the following files in debugfs:
+
+        /sys/kernel/debug/gcov
+                Parent directory for all gcov-related files.
+
+        /sys/kernel/debug/gcov/reset
+                Global reset file: resets all coverage data to zero when
+                written to.
+
+        /sys/kernel/debug/gcov/path/to/compile/dir/file.gcda
+                The actual gcov data file as understood by the gcov
+                tool. Resets file coverage data to zero when written to.
+
+        /sys/kernel/debug/gcov/path/to/compile/dir/file.gcno
+                Symbolic link to a static data file required by the gcov
+                tool. This file is generated by gcc when compiling with
+                option -ftest-coverage.
+
+
+5. Modules
+==========
+
+Kernel modules may contain cleanup code which is only run during
+module unload time. The gcov mechanism provides a means to collect
+coverage data for such code by keeping a copy of the data associated
+with the unloaded module. This data remains available through debugfs.
+Once the module is loaded again, the associated coverage counters are
+initialized with the data from its previous instantiation.
+
+This behavior can be deactivated by specifying the gcov_persist kernel
+parameter:
+
+        gcov_persist=0
+
+At run-time, a user can also choose to discard data for an unloaded
+module by writing to its data file or the global reset file.
+
+
+6. Separated build and test machines
+====================================
+
+The gcov kernel profiling infrastructure is designed to work out-of-the
+box for setups where kernels are built and run on the same machine. In
+cases where the kernel runs on a separate machine, special preparations
+must be made, depending on where the gcov tool is used:
+
+a) gcov is run on the TEST machine
+
+The gcov tool version on the test machine must be compatible with the
+gcc version used for kernel build. Also the following files need to be
+copied from build to test machine:
+
+from the source tree:
+  - all C source files + headers
+
+from the build tree:
+  - all C source files + headers
+  - all .gcda and .gcno files
+  - all links to directories
+
+It is important to note that these files need to be placed into the
+exact same file system location on the test machine as on the build
+machine. If any of the path components is symbolic link, the actual
+directory needs to be used instead (due to make's CURDIR handling).
+
+b) gcov is run on the BUILD machine
+
+The following files need to be copied after each test case from test
+to build machine:
+
+from the gcov directory in sysfs:
+  - all .gcda files
+  - all links to .gcno files
+
+These files can be copied to any location on the build machine. gcov
+must then be called with the -o option pointing to that directory.
+
+Example directory setup on the build machine:
+
+  /tmp/linux:    kernel source tree
+  /tmp/out:      kernel build directory as specified by make O=
+  /tmp/coverage: location of the files copied from the test machine
+
+  [user@build] cd /tmp/out
+  [user@build] gcov -o /tmp/coverage/tmp/out/init main.c
+
+
+7. Troubleshooting
+==================
+
+Problem:  Compilation aborts during linker step.
+Cause:    Profiling flags are specified for source files which are not
+          linked to the main kernel or which are linked by a custom
+          linker procedure.
+Solution: Exclude affected source files from profiling by specifying
+          GCOV_PROFILE := n or GCOV_PROFILE_basename.o := n in the
+          corresponding Makefile.
+
+Problem:  Files copied from sysfs appear empty or incomplete.
+Cause:    Due to the way seq_file works, some tools such as cp or tar
+          may not correctly copy files from sysfs.
+Solution: Use 'cat' to read .gcda files and 'cp -d' to copy links.
+          Alternatively use the mechanism shown in Appendix B.
+
+
+Appendix A: gather_on_build.sh
+==============================
+
+Sample script to gather coverage meta files on the build machine
+(see 6a):
+#!/bin/bash
+
+KSRC=$1
+KOBJ=$2
+DEST=$3
+
+if [ -z "$KSRC" ] || [ -z "$KOBJ" ] || [ -z "$DEST" ]; then
+  echo "Usage: $0 <ksrc directory> <kobj directory> <output.tar.gz>" >&2
+  exit 1
+fi
+
+KSRC=$(cd $KSRC; printf "all:\n\t@echo \${CURDIR}\n" | make -f -)
+KOBJ=$(cd $KOBJ; printf "all:\n\t@echo \${CURDIR}\n" | make -f -)
+
+find $KSRC $KOBJ \( -name '*.gcno' -o -name '*.[ch]' -o -type l \) -a \
+                 -perm /u+r,g+r | tar cfz $DEST -P -T -
+
+if [ $? -eq 0 ] ; then
+  echo "$DEST successfully created, copy to test system and unpack with:"
+  echo "  tar xfz $DEST -P"
+else
+  echo "Could not create file $DEST"
+fi
+
+
+Appendix B: gather_on_test.sh
+=============================
+
+Sample script to gather coverage data files on the test machine
+(see 6b):
+
+#!/bin/bash -e
+
+DEST=$1
+GCDA=/sys/kernel/debug/gcov
+
+if [ -z "$DEST" ] ; then
+  echo "Usage: $0 <output.tar.gz>" >&2
+  exit 1
+fi
+
+TEMPDIR=$(mktemp -d)
+echo Collecting data..
+find $GCDA -type d -exec mkdir -p $TEMPDIR/\{\} \;
+find $GCDA -name '*.gcda' -exec sh -c 'cat < $0 > '$TEMPDIR'/$0' {} \;
+find $GCDA -name '*.gcno' -exec sh -c 'cp -d $0 '$TEMPDIR'/$0' {} \;
+tar czf $DEST -C $TEMPDIR sys
+rm -rf $TEMPDIR
+
+echo "$DEST successfully created, copy to build system and unpack with:"
+echo "  tar xfz $DEST"
diff --git a/Documentation/i2c/instantiating-devices b/Documentation/i2c/instantiating-devices
index b55ce57a84d..c740b7b4108 100644
--- a/Documentation/i2c/instantiating-devices
+++ b/Documentation/i2c/instantiating-devices
@@ -165,3 +165,47 @@ was done there. Two significant differences are:
 Once again, method 3 should be avoided wherever possible. Explicit device
 instantiation (methods 1 and 2) is much preferred for it is safer and
 faster.
+
+
+Method 4: Instantiate from user-space
+-------------------------------------
+
+In general, the kernel should know which I2C devices are connected and
+what addresses they live at. However, in certain cases, it does not, so a
+sysfs interface was added to let the user provide the information. This
+interface is made of 2 attribute files which are created in every I2C bus
+directory: new_device and delete_device. Both files are write only and you
+must write the right parameters to them in order to properly instantiate,
+respectively delete, an I2C device.
+
+File new_device takes 2 parameters: the name of the I2C device (a string)
+and the address of the I2C device (a number, typically expressed in
+hexadecimal starting with 0x, but can also be expressed in decimal.)
+
+File delete_device takes a single parameter: the address of the I2C
+device. As no two devices can live at the same address on a given I2C
+segment, the address is sufficient to uniquely identify the device to be
+deleted.
+
+Example:
+# echo eeprom 0x50 > /sys/class/i2c-adapter/i2c-3/new_device
+
+While this interface should only be used when in-kernel device declaration
+can't be done, there is a variety of cases where it can be helpful:
+* The I2C driver usually detects devices (method 3 above) but the bus
+  segment your device lives on doesn't have the proper class bit set and
+  thus detection doesn't trigger.
+* The I2C driver usually detects devices, but your device lives at an
+  unexpected address.
+* The I2C driver usually detects devices, but your device is not detected,
+  either because the detection routine is too strict, or because your
+  device is not officially supported yet but you know it is compatible.
+* You are developing a driver on a test board, where you soldered the I2C
+  device yourself.
+
+This interface is a replacement for the force_* module parameters some I2C
+drivers implement. Being implemented in i2c-core rather than in each
+device driver individually, it is much more efficient, and also has the
+advantage that you do not have to reload the driver to change a setting.
+You can also instantiate the device before the driver is loaded or even
+available, and you don't need to know what driver the device needs.
diff --git a/Documentation/i2c/writing-clients b/Documentation/i2c/writing-clients
index c1a06f989cf..7860aafb483 100644
--- a/Documentation/i2c/writing-clients
+++ b/Documentation/i2c/writing-clients
@@ -126,19 +126,9 @@ different) configuration information, as do drivers handling chip variants
 that can't be distinguished by protocol probing, or which need some board
 specific information to operate correctly.
 
-Accordingly, the I2C stack now has two models for associating I2C devices
-with their drivers:  the original "legacy" model, and a newer one that's
-fully compatible with the Linux 2.6 driver model.  These models do not mix,
-since the "legacy" model requires drivers to create "i2c_client" device
-objects after SMBus style probing, while the Linux driver model expects
-drivers to be given such device objects in their probe() routines.
 
-The legacy model is deprecated now and will soon be removed, so we no
-longer document it here.
-
-
-Standard Driver Model Binding ("New Style")
--------------------------------------------
+Device/Driver Binding
+---------------------
 
 System infrastructure, typically board-specific initialization code or
 boot firmware, reports what I2C devices exist.  For example, there may be
@@ -201,7 +191,7 @@ a given I2C bus.  This is for example the case of hardware monitoring
 devices on a PC's SMBus.  In that case, you may want to let your driver
 detect supported devices automatically.  This is how the legacy model
 was working, and is now available as an extension to the standard
-driver model (so that we can finally get rid of the legacy model.)
+driver model.
 
 You simply have to define a detect callback which will attempt to
 identify supported devices (returning 0 for supported ones and -ENODEV
diff --git a/Documentation/input/input.txt b/Documentation/input/input.txt
index 686ee9932df..b93c08442e3 100644
--- a/Documentation/input/input.txt
+++ b/Documentation/input/input.txt
@@ -278,7 +278,7 @@ struct input_event {
 };
 
   'time' is the timestamp, it returns the time at which the event happened.
-Type is for example EV_REL for relative moment, REL_KEY for a keypress or
+Type is for example EV_REL for relative moment, EV_KEY for a keypress or
 release. More types are defined in include/linux/input.h.
 
   'code' is event code, for example REL_X or KEY_BACKSPACE, again a complete
diff --git a/Documentation/input/rotary-encoder.txt b/Documentation/input/rotary-encoder.txt
index 435102a26d9..3a6aec40c0b 100644
--- a/Documentation/input/rotary-encoder.txt
+++ b/Documentation/input/rotary-encoder.txt
@@ -67,7 +67,12 @@ data with it.
 struct rotary_encoder_platform_data is declared in
 include/linux/rotary-encoder.h and needs to be filled with the number of
 steps the encoder has and can carry information about externally inverted
-signals (because of used invertig buffer or other reasons).
+signals (because of an inverting buffer or other reasons). The encoder
+can be set up to deliver input information as either an absolute or relative
+axes. For relative axes the input event returns +/-1 for each step. For
+absolute axes the position of the encoder can either roll over between zero
+and the number of steps or will clamp at the maximum and zero depending on
+the configuration.
 
 Because GPIO to IRQ mapping is platform specific, this information must
 be given in seperately to the driver. See the example below.
@@ -85,6 +90,8 @@ be given in seperately to the driver. See the example below.
 static struct rotary_encoder_platform_data my_rotary_encoder_info = {
 	.steps		= 24,
 	.axis		= ABS_X,
+	.relative_axis	= false,
+	.rollover	= false,
 	.gpio_a		= GPIO_ROTARY_A,
 	.gpio_b		= GPIO_ROTARY_B,
 	.inverted_a	= 0,
diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 1f779a25c70..dbea4f95fc8 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -139,6 +139,7 @@ Code	Seq#	Include File		Comments
 'm'	all	linux/synclink.h	conflict!
 'm'	00-1F	net/irda/irmod.h	conflict!
 'n'	00-7F	linux/ncp_fs.h
+'n'	80-8F	linux/nilfs2_fs.h	NILFS2
 'n'	E0-FF	video/matrox.h          matroxfb
 'o'	00-1F	fs/ocfs2/ocfs2_fs.h	OCFS2
 'o'     00-03   include/mtd/ubi-user.h  conflict! (OCFS2 and UBI overlaps)
@@ -149,6 +150,8 @@ Code	Seq#	Include File		Comments
 'p'	40-7F	linux/nvram.h
 'p'	80-9F				user-space parport
 					<mailto:tim@cyberelk.net>
+'p'	a1-a4	linux/pps.h		LinuxPPS
+					<mailto:giometti@linux.it>
 'q'	00-1F	linux/serio.h
 'q'	80-FF				Internet PhoneJACK, Internet LineJACK
 					<http://www.quicknet.net>
diff --git a/Documentation/isdn/00-INDEX b/Documentation/isdn/00-INDEX
index f6010a53659..e87e336f590 100644
--- a/Documentation/isdn/00-INDEX
+++ b/Documentation/isdn/00-INDEX
@@ -14,25 +14,14 @@ README
 	- general info on what you need and what to do for Linux ISDN.
 README.FAQ
 	- general info for FAQ.
-README.audio
-	- info for running audio over ISDN.
-README.fax
-	- info for using Fax over ISDN.
-README.gigaset
-	- info on the drivers for Siemens Gigaset ISDN adapters.
-README.icn
-	- info on the ICN-ISDN-card and its driver.
->>>>>>> 93af7aca44f0e82e67bda10a0fb73d383edcc8bd:Documentation/isdn/00-INDEX
 README.HiSax
 	- info on the HiSax driver which replaces the old teles.
+README.act2000
+	- info on driver for IBM ACT-2000 card.
 README.audio
 	- info for running audio over ISDN.
 README.avmb1
 	- info on driver for AVM-B1 ISDN card.
-README.act2000
-	- info on driver for IBM ACT-2000 card.
-README.eicon
-	- info on driver for Eicon active cards.
 README.concap
 	- info on "CONCAP" encapsulation protocol interface used for X.25.
 README.diversion
@@ -59,7 +48,3 @@ README.x25
 	- info for running X.25 over ISDN.
 syncPPP.FAQ
 	- frequently asked questions about running PPP over ISDN.
-README.hysdn
-	- info on driver for Hypercope active HYSDN cards
-README.mISDN
-	- info on the Modular ISDN subsystem (mISDN).
diff --git a/Documentation/ja_JP/SubmitChecklist b/Documentation/ja_JP/SubmitChecklist
index 6c42e071d72..2df4576f117 100644
--- a/Documentation/ja_JP/SubmitChecklist
+++ b/Documentation/ja_JP/SubmitChecklist
@@ -75,7 +75,7 @@ Linux カーネルパッチ投稿者向けチェックリスト
     ビルドした上、動作確認を行ってください。
 
 14: もしパッチがディスクのI/O性能などに影響を与えるようであれば、
-    'CONFIG_LBD'オプションを有効にした場合と無効にした場合の両方で
+    'CONFIG_LBDAF'オプションを有効にした場合と無効にした場合の両方で
     テストを実施してみてください。
 
 15: lockdepの機能を全て有効にした上で、全てのコードパスを評価してください。
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 5578248c18a..8e91863190e 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -48,6 +48,7 @@ parameter is applicable:
 	EFI	EFI Partitioning (GPT) is enabled
 	EIDE	EIDE/ATAPI support is enabled.
 	FB	The frame buffer device is enabled.
+	GCOV	GCOV profiling is enabled.
 	HW	Appropriate hardware is enabled.
 	IA-64	IA-64 architecture is enabled.
 	IMA     Integrity measurement architecture is enabled.
@@ -228,14 +229,6 @@ and is between 256 and 4096 characters. It is defined in the file
 			to assume that this machine's pmtimer latches its value
 			and always returns good values.
 
- 	acpi.power_nocheck=	[HW,ACPI]
- 			Format: 1/0 enable/disable the check of power state.
- 			On some bogus BIOS the _PSC object/_STA object of
- 			power resource can't return the correct device power
- 			state. In such case it is unneccessary to check its
- 			power state again in power transition.
- 			1 : disable the power state check
-
 	acpi_sci=	[HW,ACPI] ACPI System Control Interrupt trigger mode
 			Format: { level | edge | high | low }
 
@@ -796,6 +789,12 @@ and is between 256 and 4096 characters. It is defined in the file
 			Format: off | on
 			default: on
 
+	gcov_persist=	[GCOV] When non-zero (default), profiling data for
+			kernel modules is saved and remains accessible via
+			debugfs, even when the module is unloaded/reloaded.
+			When zero, profiling data is discarded and associated
+			debugfs files are removed at module unload time.
+
 	gdth=		[HW,SCSI]
 			See header of drivers/scsi/gdth.c.
 
@@ -999,6 +998,7 @@ and is between 256 and 4096 characters. It is defined in the file
 		nomerge
 		forcesac
 		soft
+		pt	[x86, IA64]
 
 	io7=		[HW] IO7 for Marvel based alpha systems
 			See comment before marvel_specify_io7 in
@@ -1115,6 +1115,10 @@ and is between 256 and 4096 characters. It is defined in the file
 			libata.dma=4	  Compact Flash DMA only 
 			Combinations also work, so libata.dma=3 enables DMA
 			for disks and CDROMs, but not CFs.
+	
+	libata.ignore_hpa=	[LIBATA] Ignore HPA limit
+			libata.ignore_hpa=0	  keep BIOS limits (default)
+			libata.ignore_hpa=1	  ignore limits, using full disk
 
 	libata.noacpi	[LIBATA] Disables use of ACPI in libata suspend/resume
 			when set.
@@ -1362,6 +1366,27 @@ and is between 256 and 4096 characters. It is defined in the file
 	min_addr=nn[KMG]	[KNL,BOOT,ia64] All physical memory below this
 			physical address is ignored.
 
+	mini2440=	[ARM,HW,KNL]
+			Format:[0..2][b][c][t]
+			Default: "0tb"
+			MINI2440 configuration specification:
+			0 - The attached screen is the 3.5" TFT
+			1 - The attached screen is the 7" TFT
+			2 - The VGA Shield is attached (1024x768)
+			Leaving out the screen size parameter will not load
+			the TFT driver, and the framebuffer will be left
+			unconfigured.
+			b - Enable backlight. The TFT backlight pin will be
+			linked to the kernel VESA blanking code and a GPIO
+			LED. This parameter is not necessary when using the
+			VGA shield.
+			c - Enable the s3c camera interface.
+			t - Reserved for enabling touchscreen support. The
+			touchscreen support is not enabled in the mainstream
+			kernel as of 2.6.30, a preliminary port can be found
+			in the "bleeding edge" mini2440 support kernel at
+			http://repo.or.cz/w/linux-2.6/mini2440.git
+
 	mminit_loglevel=
 			[KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this
 			parameter allows control of the logging verbosity for
@@ -1403,6 +1428,16 @@ and is between 256 and 4096 characters. It is defined in the file
 	mtdparts=	[MTD]
 			See drivers/mtd/cmdlinepart.c.
 
+	onenand.bdry=	[HW,MTD] Flex-OneNAND Boundary Configuration
+
+			Format: [die0_boundary][,die0_lock][,die1_boundary][,die1_lock]
+
+			boundary - index of last SLC block on Flex-OneNAND.
+				   The remaining blocks are configured as MLC blocks.
+			lock	 - Configure if Flex-OneNAND boundary should be locked.
+				   Once locked, the boundary cannot be changed.
+				   1 indicates lock status, 0 indicates unlock status.
+
 	mtdset=		[ARM]
 			ARM/S3C2412 JIVE boot control
 
@@ -1689,8 +1724,8 @@ and is between 256 and 4096 characters. It is defined in the file
 	oprofile.cpu_type=	Force an oprofile cpu type
 			This might be useful if you have an older oprofile
 			userland or if you want common events.
-			Format: { archperfmon }
-			archperfmon: [X86] Force use of architectural
+			Format: { arch_perfmon }
+			arch_perfmon: [X86] Force use of architectural
 				perfmon on Intel CPUs instead of the
 				CPU specific event set.
 
@@ -1769,6 +1804,9 @@ and is between 256 and 4096 characters. It is defined in the file
 				root domains (aka PCI segments, in ACPI-speak).
 		nommconf	[X86] Disable use of MMCONFIG for PCI
 				Configuration
+		check_enable_amd_mmconf [X86] check for and enable
+				properly configured MMIO access to PCI
+				config space on AMD family 10h CPU
 		nomsi		[MSI] If the PCI_MSI kernel config parameter is
 				enabled, this kernel boot option can be used to
 				disable the use of MSI interrupts system-wide.
@@ -1858,6 +1896,12 @@ and is between 256 and 4096 characters. It is defined in the file
 				PAGE_SIZE is used as alignment.
 				PCI-PCI bridge can be specified, if resource
 				windows need to be expanded.
+		ecrc=		Enable/disable PCIe ECRC (transaction layer
+				end-to-end CRC checking).
+				bios: Use BIOS/firmware settings. This is the
+				the default.
+				off: Turn ECRC off
+				on: Turn ECRC on.
 
 	pcie_aspm=	[PCIE] Forcibly enable or disable PCIe Active State Power
 			Management.
@@ -1875,6 +1919,12 @@ and is between 256 and 4096 characters. It is defined in the file
 			Format: { 0 | 1 }
 			See arch/parisc/kernel/pdc_chassis.c
 
+	percpu_alloc=	[X86] Select which percpu first chunk allocator to use.
+			Allowed values are one of "lpage", "embed" and "4k".
+			See comments in arch/x86/kernel/setup_percpu.c for
+			details on each allocator.  This parameter is primarily
+			for debugging and performance comparison.
+
 	pf.		[PARIDE]
 			See Documentation/blockdev/paride.txt.
 
@@ -2427,7 +2477,13 @@ and is between 256 and 4096 characters. It is defined in the file
 
 	tp720=		[HW,PS2]
 
-	trace_buf_size=nn[KMG] [ftrace] will set tracing buffer size.
+	trace_buf_size=nn[KMG]
+			[FTRACE] will set tracing buffer size.
+
+	trace_event=[event-list]
+			[FTRACE] Set and start specified trace events in order
+			to facilitate early boot debugging.
+			See also Documentation/trace/events.txt
 
 	trix=		[HW,OSS] MediaTrix AudioTrix Pro
 			Format:
diff --git a/Documentation/kmemleak.txt b/Documentation/kmemleak.txt
index 0112da3b9ab..89068030b01 100644
--- a/Documentation/kmemleak.txt
+++ b/Documentation/kmemleak.txt
@@ -16,13 +16,17 @@ Usage
 -----
 
 CONFIG_DEBUG_KMEMLEAK in "Kernel hacking" has to be enabled. A kernel
-thread scans the memory every 10 minutes (by default) and prints any new
-unreferenced objects found. To trigger an intermediate scan and display
-all the possible memory leaks:
+thread scans the memory every 10 minutes (by default) and prints the
+number of new unreferenced objects found. To display the details of all
+the possible memory leaks:
 
   # mount -t debugfs nodev /sys/kernel/debug/
   # cat /sys/kernel/debug/kmemleak
 
+To trigger an intermediate memory scan:
+
+  # echo scan > /sys/kernel/debug/kmemleak
+
 Note that the orphan objects are listed in the order they were allocated
 and one object at the beginning of the list may cause other subsequent
 objects to be reported as orphan.
@@ -31,16 +35,21 @@ Memory scanning parameters can be modified at run-time by writing to the
 /sys/kernel/debug/kmemleak file. The following parameters are supported:
 
   off		- disable kmemleak (irreversible)
-  stack=on	- enable the task stacks scanning
+  stack=on	- enable the task stacks scanning (default)
   stack=off	- disable the tasks stacks scanning
-  scan=on	- start the automatic memory scanning thread
+  scan=on	- start the automatic memory scanning thread (default)
   scan=off	- stop the automatic memory scanning thread
-  scan=<secs>	- set the automatic memory scanning period in seconds (0
-		  to disable it)
+  scan=<secs>	- set the automatic memory scanning period in seconds
+		  (default 600, 0 to stop the automatic scanning)
+  scan		- trigger a memory scan
 
 Kmemleak can also be disabled at boot-time by passing "kmemleak=off" on
 the kernel command line.
 
+Memory may be allocated or freed before kmemleak is initialised and
+these actions are stored in an early log buffer. The size of this buffer
+is configured via the CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE option.
+
 Basic Algorithm
 ---------------
 
diff --git a/Documentation/laptops/thinkpad-acpi.txt b/Documentation/laptops/thinkpad-acpi.txt
index 78e354b42f6..e2ddcdeb61b 100644
--- a/Documentation/laptops/thinkpad-acpi.txt
+++ b/Documentation/laptops/thinkpad-acpi.txt
@@ -36,8 +36,6 @@ detailed description):
 	- Bluetooth enable and disable
 	- video output switching, expansion control
 	- ThinkLight on and off
-	- limited docking and undocking
-	- UltraBay eject
 	- CMOS/UCMS control
 	- LED control
 	- ACPI sounds
@@ -729,131 +727,6 @@ cannot be read or if it is unknown, thinkpad-acpi will report it as "off".
 It is impossible to know if the status returned through sysfs is valid.
 
 
-Docking / undocking -- /proc/acpi/ibm/dock
-------------------------------------------
-
-Docking and undocking (e.g. with the X4 UltraBase) requires some
-actions to be taken by the operating system to safely make or break
-the electrical connections with the dock.
-
-The docking feature of this driver generates the following ACPI events:
-
-	ibm/dock GDCK 00000003 00000001 -- eject request
-	ibm/dock GDCK 00000003 00000002 -- undocked
-	ibm/dock GDCK 00000000 00000003 -- docked
-
-NOTE: These events will only be generated if the laptop was docked
-when originally booted. This is due to the current lack of support for
-hot plugging of devices in the Linux ACPI framework. If the laptop was
-booted while not in the dock, the following message is shown in the
-logs:
-
-	Mar 17 01:42:34 aero kernel: thinkpad_acpi: dock device not present
-
-In this case, no dock-related events are generated but the dock and
-undock commands described below still work. They can be executed
-manually or triggered by Fn key combinations (see the example acpid
-configuration files included in the driver tarball package available
-on the web site).
-
-When the eject request button on the dock is pressed, the first event
-above is generated. The handler for this event should issue the
-following command:
-
-	echo undock > /proc/acpi/ibm/dock
-
-After the LED on the dock goes off, it is safe to eject the laptop.
-Note: if you pressed this key by mistake, go ahead and eject the
-laptop, then dock it back in. Otherwise, the dock may not function as
-expected.
-
-When the laptop is docked, the third event above is generated. The
-handler for this event should issue the following command to fully
-enable the dock:
-
-	echo dock > /proc/acpi/ibm/dock
-
-The contents of the /proc/acpi/ibm/dock file shows the current status
-of the dock, as provided by the ACPI framework.
-
-The docking support in this driver does not take care of enabling or
-disabling any other devices you may have attached to the dock. For
-example, a CD drive plugged into the UltraBase needs to be disabled or
-enabled separately. See the provided example acpid configuration files
-for how this can be accomplished.
-
-There is no support yet for PCI devices that may be attached to a
-docking station, e.g. in the ThinkPad Dock II. The driver currently
-does not recognize, enable or disable such devices. This means that
-the only docking stations currently supported are the X-series
-UltraBase docks and "dumb" port replicators like the Mini Dock (the
-latter don't need any ACPI support, actually).
-
-
-UltraBay eject -- /proc/acpi/ibm/bay
-------------------------------------
-
-Inserting or ejecting an UltraBay device requires some actions to be
-taken by the operating system to safely make or break the electrical
-connections with the device.
-
-This feature generates the following ACPI events:
-
-	ibm/bay MSTR 00000003 00000000 -- eject request
-	ibm/bay MSTR 00000001 00000000 -- eject lever inserted
-
-NOTE: These events will only be generated if the UltraBay was present
-when the laptop was originally booted (on the X series, the UltraBay
-is in the dock, so it may not be present if the laptop was undocked).
-This is due to the current lack of support for hot plugging of devices
-in the Linux ACPI framework. If the laptop was booted without the
-UltraBay, the following message is shown in the logs:
-
-	Mar 17 01:42:34 aero kernel: thinkpad_acpi: bay device not present
-
-In this case, no bay-related events are generated but the eject
-command described below still works. It can be executed manually or
-triggered by a hot key combination.
-
-Sliding the eject lever generates the first event shown above. The
-handler for this event should take whatever actions are necessary to
-shut down the device in the UltraBay (e.g. call idectl), then issue
-the following command:
-
-	echo eject > /proc/acpi/ibm/bay
-
-After the LED on the UltraBay goes off, it is safe to pull out the
-device.
-
-When the eject lever is inserted, the second event above is
-generated. The handler for this event should take whatever actions are
-necessary to enable the UltraBay device (e.g. call idectl).
-
-The contents of the /proc/acpi/ibm/bay file shows the current status
-of the UltraBay, as provided by the ACPI framework.
-
-EXPERIMENTAL warm eject support on the 600e/x, A22p and A3x (To use
-this feature, you need to supply the experimental=1 parameter when
-loading the module):
-
-These models do not have a button near the UltraBay device to request
-a hot eject but rather require the laptop to be put to sleep
-(suspend-to-ram) before the bay device is ejected or inserted).
-The sequence of steps to eject the device is as follows:
-
-	echo eject > /proc/acpi/ibm/bay
-	put the ThinkPad to sleep
-	remove the drive
-	resume from sleep
-	cat /proc/acpi/ibm/bay should show that the drive was removed
-
-On the A3x, both the UltraBay 2000 and UltraBay Plus devices are
-supported. Use "eject2" instead of "eject" for the second bay.
-
-Note: the UltraBay eject support on the 600e/x, A22p and A3x is
-EXPERIMENTAL and may not work as expected. USE WITH CAUTION!
-
-
 CMOS/UCMS control
 -----------------
 
@@ -920,7 +793,7 @@ The available commands are:
 	echo '<LED number> off' >/proc/acpi/ibm/led
 	echo '<LED number> blink' >/proc/acpi/ibm/led
 
-The <LED number> range is 0 to 7. The set of LEDs that can be
+The <LED number> range is 0 to 15. The set of LEDs that can be
 controlled varies from model to model. Here is the common ThinkPad
 mapping:
 
@@ -932,6 +805,11 @@ mapping:
 	5 - UltraBase battery slot
 	6 - (unknown)
 	7 - standby
+	8 - dock status 1
+	9 - dock status 2
+	10, 11 - (unknown)
+	12 - thinkvantage
+	13, 14, 15 - (unknown)
 
 All of the above can be turned on and off and can be made to blink.
 
@@ -940,10 +818,12 @@ sysfs notes:
 The ThinkPad LED sysfs interface is described in detail by the LED class
 documentation, in Documentation/leds-class.txt.
 
-The leds are named (in LED ID order, from 0 to 7):
+The LEDs are named (in LED ID order, from 0 to 12):
 "tpacpi::power", "tpacpi:orange:batt", "tpacpi:green:batt",
 "tpacpi::dock_active", "tpacpi::bay_active", "tpacpi::dock_batt",
-"tpacpi::unknown_led", "tpacpi::standby".
+"tpacpi::unknown_led", "tpacpi::standby", "tpacpi::dock_status1",
+"tpacpi::dock_status2", "tpacpi::unknown_led2", "tpacpi::unknown_led3",
+"tpacpi::thinkvantage".
 
 Due to limitations in the sysfs LED class, if the status of the LED
 indicators cannot be read due to an error, thinkpad-acpi will report it as
@@ -958,6 +838,12 @@ ThinkPad indicator LED should blink in hardware accelerated mode, use the
 "timer" trigger, and leave the delay_on and delay_off parameters set to
 zero (to request hardware acceleration autodetection).
 
+LEDs that are known not to exist in a given ThinkPad model are not
+made available through the sysfs interface.  If you have a dock and you
+notice there are LEDs listed for your ThinkPad that do not exist (and
+are not in the dock), or if you notice that there are missing LEDs,
+a report to ibm-acpi-devel@lists.sourceforge.net is appreciated.
+
 
 ACPI sounds -- /proc/acpi/ibm/beep
 ----------------------------------
@@ -1156,17 +1042,19 @@ may not be distinct.  Later Lenovo models that implement the ACPI
 display backlight brightness control methods have 16 levels, ranging
 from 0 to 15.
 
-There are two interfaces to the firmware for direct brightness control,
-EC and UCMS (or CMOS).  To select which one should be used, use the
-brightness_mode module parameter: brightness_mode=1 selects EC mode,
-brightness_mode=2 selects UCMS mode, brightness_mode=3 selects EC
-mode with NVRAM backing (so that brightness changes are remembered
-across shutdown/reboot).
+For IBM ThinkPads, there are two interfaces to the firmware for direct
+brightness control, EC and UCMS (or CMOS).  To select which one should be
+used, use the brightness_mode module parameter: brightness_mode=1 selects
+EC mode, brightness_mode=2 selects UCMS mode, brightness_mode=3 selects EC
+mode with NVRAM backing (so that brightness changes are remembered across
+shutdown/reboot).
 
 The driver tries to select which interface to use from a table of
 defaults for each ThinkPad model.  If it makes a wrong choice, please
 report this as a bug, so that we can fix it.
 
+Lenovo ThinkPads only support brightness_mode=2 (UCMS).
+
 When display backlight brightness controls are available through the
 standard ACPI interface, it is best to use it instead of this direct
 ThinkPad-specific interface.  The driver will disable its native
@@ -1254,7 +1142,7 @@ Fan control and monitoring: fan speed, fan enable/disable
 
 procfs: /proc/acpi/ibm/fan
 sysfs device attributes: (hwmon "thinkpad") fan1_input, pwm1,
-			  pwm1_enable
+			  pwm1_enable, fan2_input
 sysfs hwmon driver attributes: fan_watchdog
 
 NOTE NOTE NOTE: fan control operations are disabled by default for
@@ -1267,6 +1155,9 @@ from the hardware registers of the embedded controller.  This is known
 to work on later R, T, X and Z series ThinkPads but may show a bogus
 value on other models.
 
+Some Lenovo ThinkPads support a secondary fan.  This fan cannot be
+controlled separately, it shares the main fan control.
+
 Fan levels:
 
 Most ThinkPad fans work in "levels" at the firmware interface.  Level 0
@@ -1397,6 +1288,11 @@ hwmon device attribute fan1_input:
 	which can take up to two minutes.  May return rubbish on older
 	ThinkPads.
 
+hwmon device attribute fan2_input:
+	Fan tachometer reading, in RPM, for the secondary fan.
+	Available only on some ThinkPads.  If the secondary fan is
+	not installed, will always read 0.
+
 hwmon driver attribute fan_watchdog:
 	Fan safety watchdog timer interval, in seconds.  Minimum is
 	1 second, maximum is 120 seconds.  0 disables the watchdog.
@@ -1555,3 +1451,7 @@ Sysfs interface changelog:
 0x020300:	hotkey enable/disable support removed, attributes
 		hotkey_bios_enabled and hotkey_enable deprecated and
 		marked for removal.
+
+0x020400:	Marker for 16 LEDs support.  Also, LEDs that are known
+		to not exist in a given model are not registered with
+		the LED sysfs class anymore.
diff --git a/Documentation/leds-lp3944.txt b/Documentation/leds-lp3944.txt
new file mode 100644
index 00000000000..c6eda18b15e
--- /dev/null
+++ b/Documentation/leds-lp3944.txt
@@ -0,0 +1,50 @@
+Kernel driver lp3944
+====================
+
+  * National Semiconductor LP3944 Fun-light Chip
+    Prefix: 'lp3944'
+    Addresses scanned: None (see the Notes section below)
+    Datasheet: Publicly available at the National Semiconductor website
+               http://www.national.com/pf/LP/LP3944.html
+
+Authors:
+        Antonio Ospite <ospite@studenti.unina.it>
+
+
+Description
+-----------
+The LP3944 is a helper chip that can drive up to 8 leds, with two programmable
+DIM modes; it could even be used as a gpio expander but this driver assumes it
+is used as a led controller.
+
+The DIM modes are used to set _blink_ patterns for leds, the pattern is
+specified supplying two parameters:
+  - period: from 0s to 1.6s
+  - duty cycle: percentage of the period the led is on, from 0 to 100
+
+Setting a led in DIM0 or DIM1 mode makes it blink according to the pattern.
+See the datasheet for details.
+
+LP3944 can be found on Motorola A910 smartphone, where it drives the rgb
+leds, the camera flash light and the lcds power.
+
+
+Notes
+-----
+The chip is used mainly in embedded contexts, so this driver expects it is
+registered using the i2c_board_info mechanism.
+
+To register the chip at address 0x60 on adapter 0, set the platform data
+according to include/linux/leds-lp3944.h, set the i2c board info:
+
+	static struct i2c_board_info __initdata a910_i2c_board_info[] = {
+		{
+			I2C_BOARD_INFO("lp3944", 0x60),
+			.platform_data = &a910_lp3944_leds,
+		},
+	};
+
+and register it in the platform init function
+
+	i2c_register_board_info(0, a910_i2c_board_info,
+			ARRAY_SIZE(a910_i2c_board_info));
diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c
index 9ebcd6ef361..950cde6d6e5 100644
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -1,7 +1,9 @@
-/*P:100 This is the Launcher code, a simple program which lays out the
- * "physical" memory for the new Guest by mapping the kernel image and
- * the virtual devices, then opens /dev/lguest to tell the kernel
- * about the Guest and control it. :*/
+/*P:100
+ * This is the Launcher code, a simple program which lays out the "physical"
+ * memory for the new Guest by mapping the kernel image and the virtual
+ * devices, then opens /dev/lguest to tell the kernel about the Guest and
+ * control it.
+:*/
 #define _LARGEFILE64_SOURCE
 #define _GNU_SOURCE
 #include <stdio.h>
@@ -46,13 +48,15 @@
 #include "linux/virtio_rng.h"
 #include "linux/virtio_ring.h"
 #include "asm/bootparam.h"
-/*L:110 We can ignore the 39 include files we need for this program, but I do
- * want to draw attention to the use of kernel-style types.
+/*L:110
+ * We can ignore the 42 include files we need for this program, but I do want
+ * to draw attention to the use of kernel-style types.
  *
  * As Linus said, "C is a Spartan language, and so should your naming be."  I
  * like these abbreviations, so we define them here.  Note that u64 is always
  * unsigned long long, which works on all Linux systems: this means that we can
- * use %llu in printf for any u64. */
+ * use %llu in printf for any u64.
+ */
 typedef unsigned long long u64;
 typedef uint32_t u32;
 typedef uint16_t u16;
@@ -69,8 +73,10 @@ typedef uint8_t u8;
 /* This will occupy 3 pages: it must be a power of 2. */
 #define VIRTQUEUE_NUM 256
 
-/*L:120 verbose is both a global flag and a macro.  The C preprocessor allows
- * this, and although I wouldn't recommend it, it works quite nicely here. */
+/*L:120
+ * verbose is both a global flag and a macro.  The C preprocessor allows
+ * this, and although I wouldn't recommend it, it works quite nicely here.
+ */
 static bool verbose;
 #define verbose(args...) \
 	do { if (verbose) printf(args); } while(0)
@@ -87,8 +93,7 @@ static int lguest_fd;
 static unsigned int __thread cpu_id;
 
 /* This is our list of devices. */
-struct device_list
-{
+struct device_list {
 	/* Counter to assign interrupt numbers. */
 	unsigned int next_irq;
 
@@ -100,8 +105,7 @@ struct device_list
 
 	/* A single linked list of devices. */
 	struct device *dev;
-	/* And a pointer to the last device for easy append and also for
-	 * configuration appending. */
+	/* And a pointer to the last device for easy append. */
 	struct device *lastdev;
 };
 
@@ -109,8 +113,7 @@ struct device_list
 static struct device_list devices;
 
 /* The device structure describes a single device. */
-struct device
-{
+struct device {
 	/* The linked-list pointer. */
 	struct device *next;
 
@@ -135,8 +138,7 @@ struct device
 };
 
 /* The virtqueue structure describes a queue attached to a device. */
-struct virtqueue
-{
+struct virtqueue {
 	struct virtqueue *next;
 
 	/* Which device owns me. */
@@ -168,20 +170,24 @@ static char **main_args;
 /* The original tty settings to restore on exit. */
 static struct termios orig_term;
 
-/* We have to be careful with barriers: our devices are all run in separate
+/*
+ * We have to be careful with barriers: our devices are all run in separate
  * threads and so we need to make sure that changes visible to the Guest happen
- * in precise order. */
+ * in precise order.
+ */
 #define wmb() __asm__ __volatile__("" : : : "memory")
 #define mb() __asm__ __volatile__("" : : : "memory")
 
-/* Convert an iovec element to the given type.
+/*
+ * Convert an iovec element to the given type.
  *
  * This is a fairly ugly trick: we need to know the size of the type and
  * alignment requirement to check the pointer is kosher.  It's also nice to
  * have the name of the type in case we report failure.
  *
  * Typing those three things all the time is cumbersome and error prone, so we
- * have a macro which sets them all up and passes to the real function. */
+ * have a macro which sets them all up and passes to the real function.
+ */
 #define convert(iov, type) \
 	((type *)_convert((iov), sizeof(type), __alignof__(type), #type))
 
@@ -198,8 +204,10 @@ static void *_convert(struct iovec *iov, size_t size, size_t align,
 /* Wrapper for the last available index.  Makes it easier to change. */
 #define lg_last_avail(vq)	((vq)->last_avail_idx)
 
-/* The virtio configuration space is defined to be little-endian.  x86 is
- * little-endian too, but it's nice to be explicit so we have these helpers. */
+/*
+ * The virtio configuration space is defined to be little-endian.  x86 is
+ * little-endian too, but it's nice to be explicit so we have these helpers.
+ */
 #define cpu_to_le16(v16) (v16)
 #define cpu_to_le32(v32) (v32)
 #define cpu_to_le64(v64) (v64)
@@ -241,11 +249,12 @@ static u8 *get_feature_bits(struct device *dev)
 		+ dev->num_vq * sizeof(struct lguest_vqconfig);
 }
 
-/*L:100 The Launcher code itself takes us out into userspace, that scary place
- * where pointers run wild and free!  Unfortunately, like most userspace
- * programs, it's quite boring (which is why everyone likes to hack on the
- * kernel!).  Perhaps if you make up an Lguest Drinking Game at this point, it
- * will get you through this section.  Or, maybe not.
+/*L:100
+ * The Launcher code itself takes us out into userspace, that scary place where
+ * pointers run wild and free!  Unfortunately, like most userspace programs,
+ * it's quite boring (which is why everyone likes to hack on the kernel!).
+ * Perhaps if you make up an Lguest Drinking Game at this point, it will get
+ * you through this section.  Or, maybe not.
  *
  * The Launcher sets up a big chunk of memory to be the Guest's "physical"
  * memory and stores it in "guest_base".  In other words, Guest physical ==
@@ -253,7 +262,8 @@ static u8 *get_feature_bits(struct device *dev)
  *
  * This can be tough to get your head around, but usually it just means that we
  * use these trivial conversion functions when the Guest gives us it's
- * "physical" addresses: */
+ * "physical" addresses:
+ */
 static void *from_guest_phys(unsigned long addr)
 {
 	return guest_base + addr;
@@ -268,7 +278,8 @@ static unsigned long to_guest_phys(const void *addr)
  * Loading the Kernel.
  *
  * We start with couple of simple helper routines.  open_or_die() avoids
- * error-checking code cluttering the callers: */
+ * error-checking code cluttering the callers:
+ */
 static int open_or_die(const char *name, int flags)
 {
 	int fd = open(name, flags);
@@ -283,12 +294,19 @@ static void *map_zeroed_pages(unsigned int num)
 	int fd = open_or_die("/dev/zero", O_RDONLY);
 	void *addr;
 
-	/* We use a private mapping (ie. if we write to the page, it will be
-	 * copied). */
+	/*
+	 * We use a private mapping (ie. if we write to the page, it will be
+	 * copied).
+	 */
 	addr = mmap(NULL, getpagesize() * num,
 		    PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, fd, 0);
 	if (addr == MAP_FAILED)
 		err(1, "Mmaping %u pages of /dev/zero", num);
+
+	/*
+	 * One neat mmap feature is that you can close the fd, and it
+	 * stays mapped.
+	 */
 	close(fd);
 
 	return addr;
@@ -305,20 +323,24 @@ static void *get_pages(unsigned int num)
 	return addr;
 }
 
-/* This routine is used to load the kernel or initrd.  It tries mmap, but if
+/*
+ * This routine is used to load the kernel or initrd.  It tries mmap, but if
  * that fails (Plan 9's kernel file isn't nicely aligned on page boundaries),
- * it falls back to reading the memory in. */
+ * it falls back to reading the memory in.
+ */
 static void map_at(int fd, void *addr, unsigned long offset, unsigned long len)
 {
 	ssize_t r;
 
-	/* We map writable even though for some segments are marked read-only.
+	/*
+	 * We map writable even though for some segments are marked read-only.
 	 * The kernel really wants to be writable: it patches its own
 	 * instructions.
 	 *
 	 * MAP_PRIVATE means that the page won't be copied until a write is
 	 * done to it.  This allows us to share untouched memory between
-	 * Guests. */
+	 * Guests.
+	 */
 	if (mmap(addr, len, PROT_READ|PROT_WRITE|PROT_EXEC,
 		 MAP_FIXED|MAP_PRIVATE, fd, offset) != MAP_FAILED)
 		return;
@@ -329,7 +351,8 @@ static void map_at(int fd, void *addr, unsigned long offset, unsigned long len)
 		err(1, "Reading offset %lu len %lu gave %zi", offset, len, r);
 }
 
-/* This routine takes an open vmlinux image, which is in ELF, and maps it into
+/*
+ * This routine takes an open vmlinux image, which is in ELF, and maps it into
  * the Guest memory.  ELF = Embedded Linking Format, which is the format used
  * by all modern binaries on Linux including the kernel.
  *
@@ -337,23 +360,28 @@ static void map_at(int fd, void *addr, unsigned long offset, unsigned long len)
  * address.  We use the physical address; the Guest will map itself to the
  * virtual address.
  *
- * We return the starting address. */
+ * We return the starting address.
+ */
 static unsigned long map_elf(int elf_fd, const Elf32_Ehdr *ehdr)
 {
 	Elf32_Phdr phdr[ehdr->e_phnum];
 	unsigned int i;
 
-	/* Sanity checks on the main ELF header: an x86 executable with a
-	 * reasonable number of correctly-sized program headers. */
+	/*
+	 * Sanity checks on the main ELF header: an x86 executable with a
+	 * reasonable number of correctly-sized program headers.
+	 */
 	if (ehdr->e_type != ET_EXEC
 	    || ehdr->e_machine != EM_386
 	    || ehdr->e_phentsize != sizeof(Elf32_Phdr)
 	    || ehdr->e_phnum < 1 || ehdr->e_phnum > 65536U/sizeof(Elf32_Phdr))
 		errx(1, "Malformed elf header");
 
-	/* An ELF executable contains an ELF header and a number of "program"
+	/*
+	 * An ELF executable contains an ELF header and a number of "program"
 	 * headers which indicate which parts ("segments") of the program to
-	 * load where. */
+	 * load where.
+	 */
 
 	/* We read in all the program headers at once: */
 	if (lseek(elf_fd, ehdr->e_phoff, SEEK_SET) < 0)
@@ -361,8 +389,10 @@ static unsigned long map_elf(int elf_fd, const Elf32_Ehdr *ehdr)
 	if (read(elf_fd, phdr, sizeof(phdr)) != sizeof(phdr))
 		err(1, "Reading program headers");
 
-	/* Try all the headers: there are usually only three.  A read-only one,
-	 * a read-write one, and a "note" section which we don't load. */
+	/*
+	 * Try all the headers: there are usually only three.  A read-only one,
+	 * a read-write one, and a "note" section which we don't load.
+	 */
 	for (i = 0; i < ehdr->e_phnum; i++) {
 		/* If this isn't a loadable segment, we ignore it */
 		if (phdr[i].p_type != PT_LOAD)
@@ -380,13 +410,15 @@ static unsigned long map_elf(int elf_fd, const Elf32_Ehdr *ehdr)
 	return ehdr->e_entry;
 }
 
-/*L:150 A bzImage, unlike an ELF file, is not meant to be loaded.  You're
- * supposed to jump into it and it will unpack itself.  We used to have to
- * perform some hairy magic because the unpacking code scared me.
+/*L:150
+ * A bzImage, unlike an ELF file, is not meant to be loaded.  You're supposed
+ * to jump into it and it will unpack itself.  We used to have to perform some
+ * hairy magic because the unpacking code scared me.
  *
  * Fortunately, Jeremy Fitzhardinge convinced me it wasn't that hard and wrote
  * a small patch to jump over the tricky bits in the Guest, so now we just read
- * the funky header so we know where in the file to load, and away we go! */
+ * the funky header so we know where in the file to load, and away we go!
+ */
 static unsigned long load_bzimage(int fd)
 {
 	struct boot_params boot;
@@ -394,8 +426,10 @@ static unsigned long load_bzimage(int fd)
 	/* Modern bzImages get loaded at 1M. */
 	void *p = from_guest_phys(0x100000);
 
-	/* Go back to the start of the file and read the header.  It should be
-	 * a Linux boot header (see Documentation/x86/i386/boot.txt) */
+	/*
+	 * Go back to the start of the file and read the header.  It should be
+	 * a Linux boot header (see Documentation/x86/i386/boot.txt)
+	 */
 	lseek(fd, 0, SEEK_SET);
 	read(fd, &boot, sizeof(boot));
 
@@ -414,9 +448,11 @@ static unsigned long load_bzimage(int fd)
 	return boot.hdr.code32_start;
 }
 
-/*L:140 Loading the kernel is easy when it's a "vmlinux", but most kernels
+/*L:140
+ * Loading the kernel is easy when it's a "vmlinux", but most kernels
  * come wrapped up in the self-decompressing "bzImage" format.  With a little
- * work, we can load those, too. */
+ * work, we can load those, too.
+ */
 static unsigned long load_kernel(int fd)
 {
 	Elf32_Ehdr hdr;
@@ -433,24 +469,28 @@ static unsigned long load_kernel(int fd)
 	return load_bzimage(fd);
 }
 
-/* This is a trivial little helper to align pages.  Andi Kleen hated it because
+/*
+ * This is a trivial little helper to align pages.  Andi Kleen hated it because
  * it calls getpagesize() twice: "it's dumb code."
  *
  * Kernel guys get really het up about optimization, even when it's not
- * necessary.  I leave this code as a reaction against that. */
+ * necessary.  I leave this code as a reaction against that.
+ */
 static inline unsigned long page_align(unsigned long addr)
 {
 	/* Add upwards and truncate downwards. */
 	return ((addr + getpagesize()-1) & ~(getpagesize()-1));
 }
 
-/*L:180 An "initial ram disk" is a disk image loaded into memory along with
- * the kernel which the kernel can use to boot from without needing any
- * drivers.  Most distributions now use this as standard: the initrd contains
- * the code to load the appropriate driver modules for the current machine.
+/*L:180
+ * An "initial ram disk" is a disk image loaded into memory along with the
+ * kernel which the kernel can use to boot from without needing any drivers.
+ * Most distributions now use this as standard: the initrd contains the code to
+ * load the appropriate driver modules for the current machine.
  *
  * Importantly, James Morris works for RedHat, and Fedora uses initrds for its
- * kernels.  He sent me this (and tells me when I break it). */
+ * kernels.  He sent me this (and tells me when I break it).
+ */
 static unsigned long load_initrd(const char *name, unsigned long mem)
 {
 	int ifd;
@@ -462,12 +502,16 @@ static unsigned long load_initrd(const char *name, unsigned long mem)
 	if (fstat(ifd, &st) < 0)
 		err(1, "fstat() on initrd '%s'", name);
 
-	/* We map the initrd at the top of memory, but mmap wants it to be
-	 * page-aligned, so we round the size up for that. */
+	/*
+	 * We map the initrd at the top of memory, but mmap wants it to be
+	 * page-aligned, so we round the size up for that.
+	 */
 	len = page_align(st.st_size);
 	map_at(ifd, from_guest_phys(mem - len), 0, st.st_size);
-	/* Once a file is mapped, you can close the file descriptor.  It's a
-	 * little odd, but quite useful. */
+	/*
+	 * Once a file is mapped, you can close the file descriptor.  It's a
+	 * little odd, but quite useful.
+	 */
 	close(ifd);
 	verbose("mapped initrd %s size=%lu @ %p\n", name, len, (void*)mem-len);
 
@@ -476,8 +520,10 @@ static unsigned long load_initrd(const char *name, unsigned long mem)
 }
 /*:*/
 
-/* Simple routine to roll all the commandline arguments together with spaces
- * between them. */
+/*
+ * Simple routine to roll all the commandline arguments together with spaces
+ * between them.
+ */
 static void concat(char *dst, char *args[])
 {
 	unsigned int i, len = 0;
@@ -494,10 +540,12 @@ static void concat(char *dst, char *args[])
 	dst[len] = '\0';
 }
 
-/*L:185 This is where we actually tell the kernel to initialize the Guest.  We
+/*L:185
+ * This is where we actually tell the kernel to initialize the Guest.  We
  * saw the arguments it expects when we looked at initialize() in lguest_user.c:
  * the base of Guest "physical" memory, the top physical page to allow and the
- * entry point for the Guest. */
+ * entry point for the Guest.
+ */
 static void tell_kernel(unsigned long start)
 {
 	unsigned long args[] = { LHREQ_INITIALIZE,
@@ -511,7 +559,7 @@ static void tell_kernel(unsigned long start)
 }
 /*:*/
 
-/*
+/*L:200
  * Device Handling.
  *
  * When the Guest gives us a buffer, it sends an array of addresses and sizes.
@@ -522,20 +570,26 @@ static void tell_kernel(unsigned long start)
 static void *_check_pointer(unsigned long addr, unsigned int size,
 			    unsigned int line)
 {
-	/* We have to separately check addr and addr+size, because size could
-	 * be huge and addr + size might wrap around. */
+	/*
+	 * We have to separately check addr and addr+size, because size could
+	 * be huge and addr + size might wrap around.
+	 */
 	if (addr >= guest_limit || addr + size >= guest_limit)
 		errx(1, "%s:%i: Invalid address %#lx", __FILE__, line, addr);
-	/* We return a pointer for the caller's convenience, now we know it's
-	 * safe to use. */
+	/*
+	 * We return a pointer for the caller's convenience, now we know it's
+	 * safe to use.
+	 */
 	return from_guest_phys(addr);
 }
 /* A macro which transparently hands the line number to the real function. */
 #define check_pointer(addr,size) _check_pointer(addr, size, __LINE__)
 
-/* Each buffer in the virtqueues is actually a chain of descriptors.  This
+/*
+ * Each buffer in the virtqueues is actually a chain of descriptors.  This
  * function returns the next descriptor in the chain, or vq->vring.num if we're
- * at the end. */
+ * at the end.
+ */
 static unsigned next_desc(struct vring_desc *desc,
 			  unsigned int i, unsigned int max)
 {
@@ -556,7 +610,10 @@ static unsigned next_desc(struct vring_desc *desc,
 	return next;
 }
 
-/* This actually sends the interrupt for this virtqueue */
+/*
+ * This actually sends the interrupt for this virtqueue, if we've used a
+ * buffer.
+ */
 static void trigger_irq(struct virtqueue *vq)
 {
 	unsigned long buf[] = { LHREQ_IRQ, vq->config.irq };
@@ -576,12 +633,14 @@ static void trigger_irq(struct virtqueue *vq)
 		err(1, "Triggering irq %i", vq->config.irq);
 }
 
-/* This looks in the virtqueue and for the first available buffer, and converts
+/*
+ * This looks in the virtqueue for the first available buffer, and converts
  * it to an iovec for convenient access.  Since descriptors consist of some
  * number of output then some number of input descriptors, it's actually two
  * iovecs, but we pack them into one and note how many of each there were.
  *
- * This function returns the descriptor number found. */
+ * This function waits if necessary, and returns the descriptor number found.
+ */
 static unsigned wait_for_vq_desc(struct virtqueue *vq,
 				 struct iovec iov[],
 				 unsigned int *out_num, unsigned int *in_num)
@@ -590,17 +649,23 @@ static unsigned wait_for_vq_desc(struct virtqueue *vq,
 	struct vring_desc *desc;
 	u16 last_avail = lg_last_avail(vq);
 
+	/* There's nothing available? */
 	while (last_avail == vq->vring.avail->idx) {
 		u64 event;
 
-		/* OK, tell Guest about progress up to now. */
+		/*
+		 * Since we're about to sleep, now is a good time to tell the
+		 * Guest about what we've used up to now.
+		 */
 		trigger_irq(vq);
 
 		/* OK, now we need to know about added descriptors. */
 		vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY;
 
-		/* They could have slipped one in as we were doing that: make
-		 * sure it's written, then check again. */
+		/*
+		 * They could have slipped one in as we were doing that: make
+		 * sure it's written, then check again.
+		 */
 		mb();
 		if (last_avail != vq->vring.avail->idx) {
 			vq->vring.used->flags |= VRING_USED_F_NO_NOTIFY;
@@ -620,8 +685,10 @@ static unsigned wait_for_vq_desc(struct virtqueue *vq,
 		errx(1, "Guest moved used index from %u to %u",
 		     last_avail, vq->vring.avail->idx);
 
-	/* Grab the next descriptor number they're advertising, and increment
-	 * the index we've seen. */
+	/*
+	 * Grab the next descriptor number they're advertising, and increment
+	 * the index we've seen.
+	 */
 	head = vq->vring.avail->ring[last_avail % vq->vring.num];
 	lg_last_avail(vq)++;
 
@@ -636,8 +703,10 @@ static unsigned wait_for_vq_desc(struct virtqueue *vq,
 	desc = vq->vring.desc;
 	i = head;
 
-	/* If this is an indirect entry, then this buffer contains a descriptor
-	 * table which we handle as if it's any normal descriptor chain. */
+	/*
+	 * If this is an indirect entry, then this buffer contains a descriptor
+	 * table which we handle as if it's any normal descriptor chain.
+	 */
 	if (desc[i].flags & VRING_DESC_F_INDIRECT) {
 		if (desc[i].len % sizeof(struct vring_desc))
 			errx(1, "Invalid size for indirect buffer table");
@@ -656,8 +725,10 @@ static unsigned wait_for_vq_desc(struct virtqueue *vq,
 		if (desc[i].flags & VRING_DESC_F_WRITE)
 			(*in_num)++;
 		else {
-			/* If it's an output descriptor, they're all supposed
-			 * to come before any input descriptors. */
+			/*
+			 * If it's an output descriptor, they're all supposed
+			 * to come before any input descriptors.
+			 */
 			if (*in_num)
 				errx(1, "Descriptor has out after in");
 			(*out_num)++;
@@ -671,14 +742,19 @@ static unsigned wait_for_vq_desc(struct virtqueue *vq,
 	return head;
 }
 
-/* After we've used one of their buffers, we tell them about it.  We'll then
- * want to send them an interrupt, using trigger_irq(). */
+/*
+ * After we've used one of their buffers, we tell the Guest about it.  Sometime
+ * later we'll want to send them an interrupt using trigger_irq(); note that
+ * wait_for_vq_desc() does that for us if it has to wait.
+ */
 static void add_used(struct virtqueue *vq, unsigned int head, int len)
 {
 	struct vring_used_elem *used;
 
-	/* The virtqueue contains a ring of used buffers.  Get a pointer to the
-	 * next entry in that used ring. */
+	/*
+	 * The virtqueue contains a ring of used buffers.  Get a pointer to the
+	 * next entry in that used ring.
+	 */
 	used = &vq->vring.used->ring[vq->vring.used->idx % vq->vring.num];
 	used->id = head;
 	used->len = len;
@@ -698,9 +774,9 @@ static void add_used_and_trigger(struct virtqueue *vq, unsigned head, int len)
 /*
  * The Console
  *
- * We associate some data with the console for our exit hack. */
-struct console_abort
-{
+ * We associate some data with the console for our exit hack.
+ */
+struct console_abort {
 	/* How many times have they hit ^C? */
 	int count;
 	/* When did they start? */
@@ -715,30 +791,35 @@ static void console_input(struct virtqueue *vq)
 	struct console_abort *abort = vq->dev->priv;
 	struct iovec iov[vq->vring.num];
 
-	/* Make sure there's a descriptor waiting. */
+	/* Make sure there's a descriptor available. */
 	head = wait_for_vq_desc(vq, iov, &out_num, &in_num);
 	if (out_num)
 		errx(1, "Output buffers in console in queue?");
 
-	/* Read it in. */
+	/* Read into it.  This is where we usually wait. */
 	len = readv(STDIN_FILENO, iov, in_num);
 	if (len <= 0) {
 		/* Ran out of input? */
 		warnx("Failed to get console input, ignoring console.");
-		/* For simplicity, dying threads kill the whole Launcher.  So
-		 * just nap here. */
+		/*
+		 * For simplicity, dying threads kill the whole Launcher.  So
+		 * just nap here.
+		 */
 		for (;;)
 			pause();
 	}
 
+	/* Tell the Guest we used a buffer. */
 	add_used_and_trigger(vq, head, len);
 
-	/* Three ^C within one second?  Exit.
+	/*
+	 * Three ^C within one second?  Exit.
 	 *
 	 * This is such a hack, but works surprisingly well.  Each ^C has to
 	 * be in a buffer by itself, so they can't be too fast.  But we check
 	 * that we get three within about a second, so they can't be too
-	 * slow. */
+	 * slow.
+	 */
 	if (len != 1 || ((char *)iov[0].iov_base)[0] != 3) {
 		abort->count = 0;
 		return;
@@ -763,15 +844,23 @@ static void console_output(struct virtqueue *vq)
 	unsigned int head, out, in;
 	struct iovec iov[vq->vring.num];
 
+	/* We usually wait in here, for the Guest to give us something. */
 	head = wait_for_vq_desc(vq, iov, &out, &in);
 	if (in)
 		errx(1, "Input buffers in console output queue?");
+
+	/* writev can return a partial write, so we loop here. */
 	while (!iov_empty(iov, out)) {
 		int len = writev(STDOUT_FILENO, iov, out);
 		if (len <= 0)
 			err(1, "Write to stdout gave %i", len);
 		iov_consume(iov, out, len);
 	}
+
+	/*
+	 * We're finished with that buffer: if we're going to sleep,
+	 * wait_for_vq_desc() will prod the Guest with an interrupt.
+	 */
 	add_used(vq, head, 0);
 }
 
@@ -791,15 +880,30 @@ static void net_output(struct virtqueue *vq)
 	unsigned int head, out, in;
 	struct iovec iov[vq->vring.num];
 
+	/* We usually wait in here for the Guest to give us a packet. */
 	head = wait_for_vq_desc(vq, iov, &out, &in);
 	if (in)
 		errx(1, "Input buffers in net output queue?");
+	/*
+	 * Send the whole thing through to /dev/net/tun.  It expects the exact
+	 * same format: what a coincidence!
+	 */
 	if (writev(net_info->tunfd, iov, out) < 0)
 		errx(1, "Write to tun failed?");
+
+	/*
+	 * Done with that one; wait_for_vq_desc() will send the interrupt if
+	 * all packets are processed.
+	 */
 	add_used(vq, head, 0);
 }
 
-/* Will reading from this file descriptor block? */
+/*
+ * Handling network input is a bit trickier, because I've tried to optimize it.
+ *
+ * First we have a helper routine which tells is if from this file descriptor
+ * (ie. the /dev/net/tun device) will block:
+ */
 static bool will_block(int fd)
 {
 	fd_set fdset;
@@ -809,8 +913,11 @@ static bool will_block(int fd)
 	return select(fd+1, &fdset, NULL, NULL, &zero) != 1;
 }
 
-/* This is where we handle packets coming in from the tun device to our
- * Guest. */
+/*
+ * This handles packets coming in from the tun device to our Guest.  Like all
+ * service routines, it gets called again as soon as it returns, so you don't
+ * see a while(1) loop here.
+ */
 static void net_input(struct virtqueue *vq)
 {
 	int len;
@@ -818,21 +925,38 @@ static void net_input(struct virtqueue *vq)
 	struct iovec iov[vq->vring.num];
 	struct net_info *net_info = vq->dev->priv;
 
+	/*
+	 * Get a descriptor to write an incoming packet into.  This will also
+	 * send an interrupt if they're out of descriptors.
+	 */
 	head = wait_for_vq_desc(vq, iov, &out, &in);
 	if (out)
 		errx(1, "Output buffers in net input queue?");
 
-	/* Deliver interrupt now, since we're about to sleep. */
+	/*
+	 * If it looks like we'll block reading from the tun device, send them
+	 * an interrupt.
+	 */
 	if (vq->pending_used && will_block(net_info->tunfd))
 		trigger_irq(vq);
 
+	/*
+	 * Read in the packet.  This is where we normally wait (when there's no
+	 * incoming network traffic).
+	 */
 	len = readv(net_info->tunfd, iov, in);
 	if (len <= 0)
 		err(1, "Failed to read from tun.");
+
+	/*
+	 * Mark that packet buffer as used, but don't interrupt here.  We want
+	 * to wait until we've done as much work as we can.
+	 */
 	add_used(vq, head, len);
 }
+/*:*/
 
-/* This is the helper to create threads. */
+/* This is the helper to create threads: run the service routine in a loop. */
 static int do_thread(void *_vq)
 {
 	struct virtqueue *vq = _vq;
@@ -842,8 +966,10 @@ static int do_thread(void *_vq)
 	return 0;
 }
 
-/* When a child dies, we kill our entire process group with SIGTERM.  This
- * also has the side effect that the shell restores the console for us! */
+/*
+ * When a child dies, we kill our entire process group with SIGTERM.  This
+ * also has the side effect that the shell restores the console for us!
+ */
 static void kill_launcher(int signal)
 {
 	kill(0, SIGTERM);
@@ -878,11 +1004,15 @@ static void reset_device(struct device *dev)
 	signal(SIGCHLD, (void *)kill_launcher);
 }
 
+/*L:216
+ * This actually creates the thread which services the virtqueue for a device.
+ */
 static void create_thread(struct virtqueue *vq)
 {
-	/* Create stack for thread and run it.  Since stack grows
-	 * upwards, we point the stack pointer to the end of this
-	 * region. */
+	/*
+	 * Create stack for thread.  Since the stack grows upwards, we point
+	 * the stack pointer to the end of this region.
+	 */
 	char *stack = malloc(32768);
 	unsigned long args[] = { LHREQ_EVENTFD,
 				 vq->config.pfn*getpagesize(), 0 };
@@ -893,17 +1023,22 @@ static void create_thread(struct virtqueue *vq)
 		err(1, "Creating eventfd");
 	args[2] = vq->eventfd;
 
-	/* Attach an eventfd to this virtqueue: it will go off
-	 * when the Guest does an LHCALL_NOTIFY for this vq. */
+	/*
+	 * Attach an eventfd to this virtqueue: it will go off when the Guest
+	 * does an LHCALL_NOTIFY for this vq.
+	 */
 	if (write(lguest_fd, &args, sizeof(args)) != 0)
 		err(1, "Attaching eventfd");
 
-	/* CLONE_VM: because it has to access the Guest memory, and
-	 * SIGCHLD so we get a signal if it dies. */
+	/*
+	 * CLONE_VM: because it has to access the Guest memory, and SIGCHLD so
+	 * we get a signal if it dies.
+	 */
 	vq->thread = clone(do_thread, stack + 32768, CLONE_VM | SIGCHLD, vq);
 	if (vq->thread == (pid_t)-1)
 		err(1, "Creating clone");
-	/* We close our local copy, now the child has it. */
+
+	/* We close our local copy now the child has it. */
 	close(vq->eventfd);
 }
 
@@ -955,7 +1090,10 @@ static void update_device_status(struct device *dev)
 	}
 }
 
-/* This is the generic routine we call when the Guest uses LHCALL_NOTIFY. */
+/*L:215
+ * This is the generic routine we call when the Guest uses LHCALL_NOTIFY.  In
+ * particular, it's used to notify us of device status changes during boot.
+ */
 static void handle_output(unsigned long addr)
 {
 	struct device *i;
@@ -964,25 +1102,42 @@ static void handle_output(unsigned long addr)
 	for (i = devices.dev; i; i = i->next) {
 		struct virtqueue *vq;
 
-		/* Notifications to device descriptors update device status. */
+		/*
+		 * Notifications to device descriptors mean they updated the
+		 * device status.
+		 */
 		if (from_guest_phys(addr) == i->desc) {
 			update_device_status(i);
 			return;
 		}
 
-		/* Devices *can* be used before status is set to DRIVER_OK. */
+		/*
+		 * Devices *can* be used before status is set to DRIVER_OK.
+		 * The original plan was that they would never do this: they
+		 * would always finish setting up their status bits before
+		 * actually touching the virtqueues.  In practice, we allowed
+		 * them to, and they do (eg. the disk probes for partition
+		 * tables as part of initialization).
+		 *
+		 * If we see this, we start the device: once it's running, we
+		 * expect the device to catch all the notifications.
+		 */
 		for (vq = i->vq; vq; vq = vq->next) {
 			if (addr != vq->config.pfn*getpagesize())
 				continue;
 			if (i->running)
 				errx(1, "Notification on running %s", i->name);
+			/* This just calls create_thread() for each virtqueue */
 			start_device(i);
 			return;
 		}
 	}
 
-	/* Early console write is done using notify on a nul-terminated string
-	 * in Guest memory. */
+	/*
+	 * Early console write is done using notify on a nul-terminated string
+	 * in Guest memory.  It's also great for hacking debugging messages
+	 * into a Guest.
+	 */
 	if (addr >= guest_limit)
 		errx(1, "Bad NOTIFY %#lx", addr);
 
@@ -998,10 +1153,12 @@ static void handle_output(unsigned long addr)
  * routines to allocate and manage them.
  */
 
-/* The layout of the device page is a "struct lguest_device_desc" followed by a
+/*
+ * The layout of the device page is a "struct lguest_device_desc" followed by a
  * number of virtqueue descriptors, then two sets of feature bits, then an
  * array of configuration bytes.  This routine returns the configuration
- * pointer. */
+ * pointer.
+ */
 static u8 *device_config(const struct device *dev)
 {
 	return (void *)(dev->desc + 1)
@@ -1009,9 +1166,11 @@ static u8 *device_config(const struct device *dev)
 		+ dev->feature_len * 2;
 }
 
-/* This routine allocates a new "struct lguest_device_desc" from descriptor
+/*
+ * This routine allocates a new "struct lguest_device_desc" from descriptor
  * table page just above the Guest's normal memory.  It returns a pointer to
- * that descriptor. */
+ * that descriptor.
+ */
 static struct lguest_device_desc *new_dev_desc(u16 type)
 {
 	struct lguest_device_desc d = { .type = type };
@@ -1032,8 +1191,10 @@ static struct lguest_device_desc *new_dev_desc(u16 type)
 	return memcpy(p, &d, sizeof(d));
 }
 
-/* Each device descriptor is followed by the description of its virtqueues.  We
- * specify how many descriptors the virtqueue is to have. */
+/*
+ * Each device descriptor is followed by the description of its virtqueues.  We
+ * specify how many descriptors the virtqueue is to have.
+ */
 static void add_virtqueue(struct device *dev, unsigned int num_descs,
 			  void (*service)(struct virtqueue *))
 {
@@ -1050,6 +1211,11 @@ static void add_virtqueue(struct device *dev, unsigned int num_descs,
 	vq->next = NULL;
 	vq->last_avail_idx = 0;
 	vq->dev = dev;
+
+	/*
+	 * This is the routine the service thread will run, and its Process ID
+	 * once it's running.
+	 */
 	vq->service = service;
 	vq->thread = (pid_t)-1;
 
@@ -1061,10 +1227,12 @@ static void add_virtqueue(struct device *dev, unsigned int num_descs,
 	/* Initialize the vring. */
 	vring_init(&vq->vring, num_descs, p, LGUEST_VRING_ALIGN);
 
-	/* Append virtqueue to this device's descriptor.  We use
+	/*
+	 * Append virtqueue to this device's descriptor.  We use
 	 * device_config() to get the end of the device's current virtqueues;
 	 * we check that we haven't added any config or feature information
-	 * yet, otherwise we'd be overwriting them. */
+	 * yet, otherwise we'd be overwriting them.
+	 */
 	assert(dev->desc->config_len == 0 && dev->desc->feature_len == 0);
 	memcpy(device_config(dev), &vq->config, sizeof(vq->config));
 	dev->num_vq++;
@@ -1072,14 +1240,18 @@ static void add_virtqueue(struct device *dev, unsigned int num_descs,
 
 	verbose("Virtqueue page %#lx\n", to_guest_phys(p));
 
-	/* Add to tail of list, so dev->vq is first vq, dev->vq->next is
-	 * second.  */
+	/*
+	 * Add to tail of list, so dev->vq is first vq, dev->vq->next is
+	 * second.
+	 */
 	for (i = &dev->vq; *i; i = &(*i)->next);
 	*i = vq;
 }
 
-/* The first half of the feature bitmask is for us to advertise features.  The
- * second half is for the Guest to accept features. */
+/*
+ * The first half of the feature bitmask is for us to advertise features.  The
+ * second half is for the Guest to accept features.
+ */
 static void add_feature(struct device *dev, unsigned bit)
 {
 	u8 *features = get_feature_bits(dev);
@@ -1093,9 +1265,11 @@ static void add_feature(struct device *dev, unsigned bit)
 	features[bit / CHAR_BIT] |= (1 << (bit % CHAR_BIT));
 }
 
-/* This routine sets the configuration fields for an existing device's
+/*
+ * This routine sets the configuration fields for an existing device's
  * descriptor.  It only works for the last device, but that's OK because that's
- * how we use it. */
+ * how we use it.
+ */
 static void set_config(struct device *dev, unsigned len, const void *conf)
 {
 	/* Check we haven't overflowed our single page. */
@@ -1105,12 +1279,18 @@ static void set_config(struct device *dev, unsigned len, const void *conf)
 	/* Copy in the config information, and store the length. */
 	memcpy(device_config(dev), conf, len);
 	dev->desc->config_len = len;
+
+	/* Size must fit in config_len field (8 bits)! */
+	assert(dev->desc->config_len == len);
 }
 
-/* This routine does all the creation and setup of a new device, including
- * calling new_dev_desc() to allocate the descriptor and device memory.
+/*
+ * This routine does all the creation and setup of a new device, including
+ * calling new_dev_desc() to allocate the descriptor and device memory.  We
+ * don't actually start the service threads until later.
  *
- * See what I mean about userspace being boring? */
+ * See what I mean about userspace being boring?
+ */
 static struct device *new_device(const char *name, u16 type)
 {
 	struct device *dev = malloc(sizeof(*dev));
@@ -1123,10 +1303,12 @@ static struct device *new_device(const char *name, u16 type)
 	dev->num_vq = 0;
 	dev->running = false;
 
-	/* Append to device list.  Prepending to a single-linked list is
+	/*
+	 * Append to device list.  Prepending to a single-linked list is
 	 * easier, but the user expects the devices to be arranged on the bus
 	 * in command-line order.  The first network device on the command line
-	 * is eth0, the first block device /dev/vda, etc. */
+	 * is eth0, the first block device /dev/vda, etc.
+	 */
 	if (devices.lastdev)
 		devices.lastdev->next = dev;
 	else
@@ -1136,8 +1318,10 @@ static struct device *new_device(const char *name, u16 type)
 	return dev;
 }
 
-/* Our first setup routine is the console.  It's a fairly simple device, but
- * UNIX tty handling makes it uglier than it could be. */
+/*
+ * Our first setup routine is the console.  It's a fairly simple device, but
+ * UNIX tty handling makes it uglier than it could be.
+ */
 static void setup_console(void)
 {
 	struct device *dev;
@@ -1145,8 +1329,10 @@ static void setup_console(void)
 	/* If we can save the initial standard input settings... */
 	if (tcgetattr(STDIN_FILENO, &orig_term) == 0) {
 		struct termios term = orig_term;
-		/* Then we turn off echo, line buffering and ^C etc.  We want a
-		 * raw input stream to the Guest. */
+		/*
+		 * Then we turn off echo, line buffering and ^C etc: We want a
+		 * raw input stream to the Guest.
+		 */
 		term.c_lflag &= ~(ISIG|ICANON|ECHO);
 		tcsetattr(STDIN_FILENO, TCSANOW, &term);
 	}
@@ -1157,10 +1343,12 @@ static void setup_console(void)
 	dev->priv = malloc(sizeof(struct console_abort));
 	((struct console_abort *)dev->priv)->count = 0;
 
-	/* The console needs two virtqueues: the input then the output.  When
+	/*
+	 * The console needs two virtqueues: the input then the output.  When
 	 * they put something the input queue, we make sure we're listening to
 	 * stdin.  When they put something in the output queue, we write it to
-	 * stdout. */
+	 * stdout.
+	 */
 	add_virtqueue(dev, VIRTQUEUE_NUM, console_input);
 	add_virtqueue(dev, VIRTQUEUE_NUM, console_output);
 
@@ -1168,7 +1356,8 @@ static void setup_console(void)
 }
 /*:*/
 
-/*M:010 Inter-guest networking is an interesting area.  Simplest is to have a
+/*M:010
+ * Inter-guest networking is an interesting area.  Simplest is to have a
  * --sharenet=<name> option which opens or creates a named pipe.  This can be
  * used to send packets to another guest in a 1:1 manner.
  *
@@ -1182,7 +1371,8 @@ static void setup_console(void)
  * multiple inter-guest channels behind one interface, although it would
  * require some manner of hotplugging new virtio channels.
  *
- * Finally, we could implement a virtio network switch in the kernel. :*/
+ * Finally, we could implement a virtio network switch in the kernel.
+:*/
 
 static u32 str2ip(const char *ipaddr)
 {
@@ -1207,11 +1397,13 @@ static void str2mac(const char *macaddr, unsigned char mac[6])
 	mac[5] = m[5];
 }
 
-/* This code is "adapted" from libbridge: it attaches the Host end of the
+/*
+ * This code is "adapted" from libbridge: it attaches the Host end of the
  * network device to the bridge device specified by the command line.
  *
  * This is yet another James Morris contribution (I'm an IP-level guy, so I
- * dislike bridging), and I just try not to break it. */
+ * dislike bridging), and I just try not to break it.
+ */
 static void add_to_bridge(int fd, const char *if_name, const char *br_name)
 {
 	int ifidx;
@@ -1231,9 +1423,11 @@ static void add_to_bridge(int fd, const char *if_name, const char *br_name)
 		err(1, "can't add %s to bridge %s", if_name, br_name);
 }
 
-/* This sets up the Host end of the network device with an IP address, brings
+/*
+ * This sets up the Host end of the network device with an IP address, brings
  * it up so packets will flow, the copies the MAC address into the hwaddr
- * pointer. */
+ * pointer.
+ */
 static void configure_device(int fd, const char *tapif, u32 ipaddr)
 {
 	struct ifreq ifr;
@@ -1260,10 +1454,12 @@ static int get_tun_device(char tapif[IFNAMSIZ])
 	/* Start with this zeroed.  Messy but sure. */
 	memset(&ifr, 0, sizeof(ifr));
 
-	/* We open the /dev/net/tun device and tell it we want a tap device.  A
+	/*
+	 * We open the /dev/net/tun device and tell it we want a tap device.  A
 	 * tap device is like a tun device, only somehow different.  To tell
 	 * the truth, I completely blundered my way through this code, but it
-	 * works now! */
+	 * works now!
+	 */
 	netfd = open_or_die("/dev/net/tun", O_RDWR);
 	ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR;
 	strcpy(ifr.ifr_name, "tap%d");
@@ -1274,18 +1470,22 @@ static int get_tun_device(char tapif[IFNAMSIZ])
 		  TUN_F_CSUM|TUN_F_TSO4|TUN_F_TSO6|TUN_F_TSO_ECN) != 0)
 		err(1, "Could not set features for tun device");
 
-	/* We don't need checksums calculated for packets coming in this
-	 * device: trust us! */
+	/*
+	 * We don't need checksums calculated for packets coming in this
+	 * device: trust us!
+	 */
 	ioctl(netfd, TUNSETNOCSUM, 1);
 
 	memcpy(tapif, ifr.ifr_name, IFNAMSIZ);
 	return netfd;
 }
 
-/*L:195 Our network is a Host<->Guest network.  This can either use bridging or
+/*L:195
+ * Our network is a Host<->Guest network.  This can either use bridging or
  * routing, but the principle is the same: it uses the "tun" device to inject
  * packets into the Host as if they came in from a normal network card.  We
- * just shunt packets between the Guest and the tun device. */
+ * just shunt packets between the Guest and the tun device.
+ */
 static void setup_tun_net(char *arg)
 {
 	struct device *dev;
@@ -1302,13 +1502,14 @@ static void setup_tun_net(char *arg)
 	dev = new_device("net", VIRTIO_ID_NET);
 	dev->priv = net_info;
 
-	/* Network devices need a receive and a send queue, just like
-	 * console. */
+	/* Network devices need a recv and a send queue, just like console. */
 	add_virtqueue(dev, VIRTQUEUE_NUM, net_input);
 	add_virtqueue(dev, VIRTQUEUE_NUM, net_output);
 
-	/* We need a socket to perform the magic network ioctls to bring up the
-	 * tap interface, connect to the bridge etc.  Any socket will do! */
+	/*
+	 * We need a socket to perform the magic network ioctls to bring up the
+	 * tap interface, connect to the bridge etc.  Any socket will do!
+	 */
 	ipfd = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
 	if (ipfd < 0)
 		err(1, "opening IP socket");
@@ -1362,39 +1563,31 @@ static void setup_tun_net(char *arg)
 		verbose("device %u: tun %s: %s\n",
 			devices.device_num, tapif, arg);
 }
-
-/* Our block (disk) device should be really simple: the Guest asks for a block
- * number and we read or write that position in the file.  Unfortunately, that
- * was amazingly slow: the Guest waits until the read is finished before
- * running anything else, even if it could have been doing useful work.
- *
- * We could use async I/O, except it's reputed to suck so hard that characters
- * actually go missing from your code when you try to use it.
- *
- * So we farm the I/O out to thread, and communicate with it via a pipe. */
+/*:*/
 
 /* This hangs off device->priv. */
-struct vblk_info
-{
+struct vblk_info {
 	/* The size of the file. */
 	off64_t len;
 
 	/* The file descriptor for the file. */
 	int fd;
 
-	/* IO thread listens on this file descriptor [0]. */
-	int workpipe[2];
-
-	/* IO thread writes to this file descriptor to mark it done, then
-	 * Launcher triggers interrupt to Guest. */
-	int done_fd;
 };
 
 /*L:210
  * The Disk
  *
- * Remember that the block device is handled by a separate I/O thread.  We head
- * straight into the core of that thread here:
+ * The disk only has one virtqueue, so it only has one thread.  It is really
+ * simple: the Guest asks for a block number and we read or write that position
+ * in the file.
+ *
+ * Before we serviced each virtqueue in a separate thread, that was unacceptably
+ * slow: the Guest waits until the read is finished before running anything
+ * else, even if it could have been doing useful work.
+ *
+ * We could have used async I/O, except it's reputed to suck so hard that
+ * characters actually go missing from your code when you try to use it.
  */
 static void blk_request(struct virtqueue *vq)
 {
@@ -1406,47 +1599,64 @@ static void blk_request(struct virtqueue *vq)
 	struct iovec iov[vq->vring.num];
 	off64_t off;
 
-	/* Get the next request. */
+	/*
+	 * Get the next request, where we normally wait.  It triggers the
+	 * interrupt to acknowledge previously serviced requests (if any).
+	 */
 	head = wait_for_vq_desc(vq, iov, &out_num, &in_num);
 
-	/* Every block request should contain at least one output buffer
+	/*
+	 * Every block request should contain at least one output buffer
 	 * (detailing the location on disk and the type of request) and one
-	 * input buffer (to hold the result). */
+	 * input buffer (to hold the result).
+	 */
 	if (out_num == 0 || in_num == 0)
 		errx(1, "Bad virtblk cmd %u out=%u in=%u",
 		     head, out_num, in_num);
 
 	out = convert(&iov[0], struct virtio_blk_outhdr);
 	in = convert(&iov[out_num+in_num-1], u8);
+	/*
+	 * For historical reasons, block operations are expressed in 512 byte
+	 * "sectors".
+	 */
 	off = out->sector * 512;
 
-	/* The block device implements "barriers", where the Guest indicates
+	/*
+	 * The block device implements "barriers", where the Guest indicates
 	 * that it wants all previous writes to occur before this write.  We
 	 * don't have a way of asking our kernel to do a barrier, so we just
-	 * synchronize all the data in the file.  Pretty poor, no? */
+	 * synchronize all the data in the file.  Pretty poor, no?
+	 */
 	if (out->type & VIRTIO_BLK_T_BARRIER)
 		fdatasync(vblk->fd);
 
-	/* In general the virtio block driver is allowed to try SCSI commands.
-	 * It'd be nice if we supported eject, for example, but we don't. */
+	/*
+	 * In general the virtio block driver is allowed to try SCSI commands.
+	 * It'd be nice if we supported eject, for example, but we don't.
+	 */
 	if (out->type & VIRTIO_BLK_T_SCSI_CMD) {
 		fprintf(stderr, "Scsi commands unsupported\n");
 		*in = VIRTIO_BLK_S_UNSUPP;
 		wlen = sizeof(*in);
 	} else if (out->type & VIRTIO_BLK_T_OUT) {
-		/* Write */
-
-		/* Move to the right location in the block file.  This can fail
-		 * if they try to write past end. */
+		/*
+		 * Write
+		 *
+		 * Move to the right location in the block file.  This can fail
+		 * if they try to write past end.
+		 */
 		if (lseek64(vblk->fd, off, SEEK_SET) != off)
 			err(1, "Bad seek to sector %llu", out->sector);
 
 		ret = writev(vblk->fd, iov+1, out_num-1);
 		verbose("WRITE to sector %llu: %i\n", out->sector, ret);
 
-		/* Grr... Now we know how long the descriptor they sent was, we
+		/*
+		 * Grr... Now we know how long the descriptor they sent was, we
 		 * make sure they didn't try to write over the end of the block
-		 * file (possibly extending it). */
+		 * file (possibly extending it).
+		 */
 		if (ret > 0 && off + ret > vblk->len) {
 			/* Trim it back to the correct length */
 			ftruncate64(vblk->fd, vblk->len);
@@ -1456,10 +1666,12 @@ static void blk_request(struct virtqueue *vq)
 		wlen = sizeof(*in);
 		*in = (ret >= 0 ? VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR);
 	} else {
-		/* Read */
-
-		/* Move to the right location in the block file.  This can fail
-		 * if they try to read past end. */
+		/*
+		 * Read
+		 *
+		 * Move to the right location in the block file.  This can fail
+		 * if they try to read past end.
+		 */
 		if (lseek64(vblk->fd, off, SEEK_SET) != off)
 			err(1, "Bad seek to sector %llu", out->sector);
 
@@ -1474,13 +1686,16 @@ static void blk_request(struct virtqueue *vq)
 		}
 	}
 
-	/* OK, so we noted that it was pretty poor to use an fdatasync as a
+	/*
+	 * OK, so we noted that it was pretty poor to use an fdatasync as a
 	 * barrier.  But Christoph Hellwig points out that we need a sync
 	 * *afterwards* as well: "Barriers specify no reordering to the front
-	 * or the back."  And Jens Axboe confirmed it, so here we are: */
+	 * or the back."  And Jens Axboe confirmed it, so here we are:
+	 */
 	if (out->type & VIRTIO_BLK_T_BARRIER)
 		fdatasync(vblk->fd);
 
+	/* Finished that request. */
 	add_used(vq, head, wlen);
 }
 
@@ -1491,7 +1706,7 @@ static void setup_block_file(const char *filename)
 	struct vblk_info *vblk;
 	struct virtio_blk_config conf;
 
-	/* The device responds to return from I/O thread. */
+	/* Creat the device. */
 	dev = new_device("block", VIRTIO_ID_BLOCK);
 
 	/* The device has one virtqueue, where the Guest places requests. */
@@ -1510,27 +1725,32 @@ static void setup_block_file(const char *filename)
 	/* Tell Guest how many sectors this device has. */
 	conf.capacity = cpu_to_le64(vblk->len / 512);
 
-	/* Tell Guest not to put in too many descriptors at once: two are used
-	 * for the in and out elements. */
+	/*
+	 * Tell Guest not to put in too many descriptors at once: two are used
+	 * for the in and out elements.
+	 */
 	add_feature(dev, VIRTIO_BLK_F_SEG_MAX);
 	conf.seg_max = cpu_to_le32(VIRTQUEUE_NUM - 2);
 
-	set_config(dev, sizeof(conf), &conf);
+	/* Don't try to put whole struct: we have 8 bit limit. */
+	set_config(dev, offsetof(struct virtio_blk_config, geometry), &conf);
 
 	verbose("device %u: virtblock %llu sectors\n",
 		++devices.device_num, le64_to_cpu(conf.capacity));
 }
 
-struct rng_info {
-	int rfd;
-};
-
-/* Our random number generator device reads from /dev/random into the Guest's
+/*L:211
+ * Our random number generator device reads from /dev/random into the Guest's
  * input buffers.  The usual case is that the Guest doesn't want random numbers
  * and so has no buffers although /dev/random is still readable, whereas
  * console is the reverse.
  *
- * The same logic applies, however. */
+ * The same logic applies, however.
+ */
+struct rng_info {
+	int rfd;
+};
+
 static void rng_input(struct virtqueue *vq)
 {
 	int len;
@@ -1543,9 +1763,10 @@ static void rng_input(struct virtqueue *vq)
 	if (out_num)
 		errx(1, "Output buffers in rng?");
 
-	/* This is why we convert to iovecs: the readv() call uses them, and so
-	 * it reads straight into the Guest's buffer.  We loop to make sure we
-	 * fill it. */
+	/*
+	 * Just like the console write, we loop to cover the whole iovec.
+	 * In this case, short reads actually happen quite a bit.
+	 */
 	while (!iov_empty(iov, in_num)) {
 		len = readv(rng_info->rfd, iov, in_num);
 		if (len <= 0)
@@ -1558,15 +1779,18 @@ static void rng_input(struct virtqueue *vq)
 	add_used(vq, head, totlen);
 }
 
-/* And this creates a "hardware" random number device for the Guest. */
+/*L:199
+ * This creates a "hardware" random number device for the Guest.
+ */
 static void setup_rng(void)
 {
 	struct device *dev;
 	struct rng_info *rng_info = malloc(sizeof(*rng_info));
 
+	/* Our device's privat info simply contains the /dev/random fd. */
 	rng_info->rfd = open_or_die("/dev/random", O_RDONLY);
 
-	/* The device responds to return from I/O thread. */
+	/* Create the new device. */
 	dev = new_device("rng", VIRTIO_ID_RNG);
 	dev->priv = rng_info;
 
@@ -1582,8 +1806,10 @@ static void __attribute__((noreturn)) restart_guest(void)
 {
 	unsigned int i;
 
-	/* Since we don't track all open fds, we simply close everything beyond
-	 * stderr. */
+	/*
+	 * Since we don't track all open fds, we simply close everything beyond
+	 * stderr.
+	 */
 	for (i = 3; i < FD_SETSIZE; i++)
 		close(i);
 
@@ -1594,8 +1820,10 @@ static void __attribute__((noreturn)) restart_guest(void)
 	err(1, "Could not exec %s", main_args[0]);
 }
 
-/*L:220 Finally we reach the core of the Launcher which runs the Guest, serves
- * its input and output, and finally, lays it to rest. */
+/*L:220
+ * Finally we reach the core of the Launcher which runs the Guest, serves
+ * its input and output, and finally, lays it to rest.
+ */
 static void __attribute__((noreturn)) run_guest(void)
 {
 	for (;;) {
@@ -1630,7 +1858,7 @@ static void __attribute__((noreturn)) run_guest(void)
  *
  * Are you ready?  Take a deep breath and join me in the core of the Host, in
  * "make Host".
- :*/
+:*/
 
 static struct option opts[] = {
 	{ "verbose", 0, NULL, 'v' },
@@ -1651,8 +1879,7 @@ static void usage(void)
 /*L:105 The main routine is where the real work begins: */
 int main(int argc, char *argv[])
 {
-	/* Memory, top-level pagetable, code startpoint and size of the
-	 * (optional) initrd. */
+	/* Memory, code startpoint and size of the (optional) initrd. */
 	unsigned long mem = 0, start, initrd_size = 0;
 	/* Two temporaries. */
 	int i, c;
@@ -1664,24 +1891,32 @@ int main(int argc, char *argv[])
 	/* Save the args: we "reboot" by execing ourselves again. */
 	main_args = argv;
 
-	/* First we initialize the device list.  We keep a pointer to the last
+	/*
+	 * First we initialize the device list.  We keep a pointer to the last
 	 * device, and the next interrupt number to use for devices (1:
-	 * remember that 0 is used by the timer). */
+	 * remember that 0 is used by the timer).
+	 */
 	devices.lastdev = NULL;
 	devices.next_irq = 1;
 
+	/* We're CPU 0.  In fact, that's the only CPU possible right now. */
 	cpu_id = 0;
-	/* We need to know how much memory so we can set up the device
+
+	/*
+	 * We need to know how much memory so we can set up the device
 	 * descriptor and memory pages for the devices as we parse the command
 	 * line.  So we quickly look through the arguments to find the amount
-	 * of memory now. */
+	 * of memory now.
+	 */
 	for (i = 1; i < argc; i++) {
 		if (argv[i][0] != '-') {
 			mem = atoi(argv[i]) * 1024 * 1024;
-			/* We start by mapping anonymous pages over all of
+			/*
+			 * We start by mapping anonymous pages over all of
 			 * guest-physical memory range.  This fills it with 0,
 			 * and ensures that the Guest won't be killed when it
-			 * tries to access it. */
+			 * tries to access it.
+			 */
 			guest_base = map_zeroed_pages(mem / getpagesize()
 						      + DEVICE_PAGES);
 			guest_limit = mem;
@@ -1714,8 +1949,10 @@ int main(int argc, char *argv[])
 			usage();
 		}
 	}
-	/* After the other arguments we expect memory and kernel image name,
-	 * followed by command line arguments for the kernel. */
+	/*
+	 * After the other arguments we expect memory and kernel image name,
+	 * followed by command line arguments for the kernel.
+	 */
 	if (optind + 2 > argc)
 		usage();
 
@@ -1733,20 +1970,26 @@ int main(int argc, char *argv[])
 	/* Map the initrd image if requested (at top of physical memory) */
 	if (initrd_name) {
 		initrd_size = load_initrd(initrd_name, mem);
-		/* These are the location in the Linux boot header where the
-		 * start and size of the initrd are expected to be found. */
+		/*
+		 * These are the location in the Linux boot header where the
+		 * start and size of the initrd are expected to be found.
+		 */
 		boot->hdr.ramdisk_image = mem - initrd_size;
 		boot->hdr.ramdisk_size = initrd_size;
 		/* The bootloader type 0xFF means "unknown"; that's OK. */
 		boot->hdr.type_of_loader = 0xFF;
 	}
 
-	/* The Linux boot header contains an "E820" memory map: ours is a
-	 * simple, single region. */
+	/*
+	 * The Linux boot header contains an "E820" memory map: ours is a
+	 * simple, single region.
+	 */
 	boot->e820_entries = 1;
 	boot->e820_map[0] = ((struct e820entry) { 0, mem, E820_RAM });
-	/* The boot header contains a command line pointer: we put the command
-	 * line after the boot header. */
+	/*
+	 * The boot header contains a command line pointer: we put the command
+	 * line after the boot header.
+	 */
 	boot->hdr.cmd_line_ptr = to_guest_phys(boot + 1);
 	/* We use a simple helper to copy the arguments separated by spaces. */
 	concat((char *)(boot + 1), argv+optind+2);
@@ -1760,11 +2003,13 @@ int main(int argc, char *argv[])
 	/* Tell the entry path not to try to reload segment registers. */
 	boot->hdr.loadflags |= KEEP_SEGMENTS;
 
-	/* We tell the kernel to initialize the Guest: this returns the open
-	 * /dev/lguest file descriptor. */
+	/*
+	 * We tell the kernel to initialize the Guest: this returns the open
+	 * /dev/lguest file descriptor.
+	 */
 	tell_kernel(start);
 
-	/* Ensure that we terminate if a child dies. */
+	/* Ensure that we terminate if a device-servicing child dies. */
 	signal(SIGCHLD, kill_launcher);
 
 	/* If we exit via err(), this kills all the threads, restores tty. */
diff --git a/Documentation/lockdep-design.txt b/Documentation/lockdep-design.txt
index e20d913d591..abf768c681e 100644
--- a/Documentation/lockdep-design.txt
+++ b/Documentation/lockdep-design.txt
@@ -30,9 +30,9 @@ State
 The validator tracks lock-class usage history into 4n + 1 separate state bits:
 
 - 'ever held in STATE context'
-- 'ever head as readlock in STATE context'
-- 'ever head with STATE enabled'
-- 'ever head as readlock with STATE enabled'
+- 'ever held as readlock in STATE context'
+- 'ever held with STATE enabled'
+- 'ever held as readlock with STATE enabled'
 
 Where STATE can be either one of (kernel/lockdep_states.h)
  - hardirq
diff --git a/Documentation/networking/6pack.txt b/Documentation/networking/6pack.txt
index d0777a1200e..8f339428fdf 100644
--- a/Documentation/networking/6pack.txt
+++ b/Documentation/networking/6pack.txt
@@ -1,7 +1,7 @@
 This is the 6pack-mini-HOWTO, written by
 
 Andreas Könsgen DG3KQ
-Internet: ajk@iehk.rwth-aachen.de
+Internet: ajk@comnets.uni-bremen.de
 AMPR-net: dg3kq@db0pra.ampr.org
 AX.25:    dg3kq@db0ach.#nrw.deu.eu
 
diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt
index 8d999d862d0..79f533f38c6 100644
--- a/Documentation/powerpc/booting-without-of.txt
+++ b/Documentation/powerpc/booting-without-of.txt
@@ -1238,1122 +1238,7 @@ descriptions for the SOC devices for which new nodes have been
 defined; this list will expand as more and more SOC-containing
 platforms are moved over to use the flattened-device-tree model.
 
-   a) PHY nodes
-
-   Required properties:
-
-    - device_type : Should be "ethernet-phy"
-    - interrupts : <a b> where a is the interrupt number and b is a
-      field that represents an encoding of the sense and level
-      information for the interrupt.  This should be encoded based on
-      the information in section 2) depending on the type of interrupt
-      controller you have.
-    - interrupt-parent : the phandle for the interrupt controller that
-      services interrupts for this device.
-    - reg : The ID number for the phy, usually a small integer
-    - linux,phandle :  phandle for this node; likely referenced by an
-      ethernet controller node.
-
-
-   Example:
-
-	ethernet-phy@0 {
-		linux,phandle = <2452000>
-		interrupt-parent = <40000>;
-		interrupts = <35 1>;
-		reg = <0>;
-		device_type = "ethernet-phy";
-	};
-
-
-   b) Interrupt controllers
-
-   Some SOC devices contain interrupt controllers that are different
-   from the standard Open PIC specification.  The SOC device nodes for
-   these types of controllers should be specified just like a standard
-   OpenPIC controller.  Sense and level information should be encoded
-   as specified in section 2) of this chapter for each device that
-   specifies an interrupt.
-
-   Example :
-
-	pic@40000 {
-		linux,phandle = <40000>;
-		interrupt-controller;
-		#address-cells = <0>;
-		reg = <40000 40000>;
-		compatible = "chrp,open-pic";
-		device_type = "open-pic";
-	};
-
-    c) 4xx/Axon EMAC ethernet nodes
-
-    The EMAC ethernet controller in IBM and AMCC 4xx chips, and also
-    the Axon bridge.  To operate this needs to interact with a ths
-    special McMAL DMA controller, and sometimes an RGMII or ZMII
-    interface.  In addition to the nodes and properties described
-    below, the node for the OPB bus on which the EMAC sits must have a
-    correct clock-frequency property.
-
-      i) The EMAC node itself
-
-    Required properties:
-    - device_type       : "network"
-
-    - compatible        : compatible list, contains 2 entries, first is
-			  "ibm,emac-CHIP" where CHIP is the host ASIC (440gx,
-			  405gp, Axon) and second is either "ibm,emac" or
-			  "ibm,emac4".  For Axon, thus, we have: "ibm,emac-axon",
-			  "ibm,emac4"
-    - interrupts        : <interrupt mapping for EMAC IRQ and WOL IRQ>
-    - interrupt-parent  : optional, if needed for interrupt mapping
-    - reg               : <registers mapping>
-    - local-mac-address : 6 bytes, MAC address
-    - mal-device        : phandle of the associated McMAL node
-    - mal-tx-channel    : 1 cell, index of the tx channel on McMAL associated
-			  with this EMAC
-    - mal-rx-channel    : 1 cell, index of the rx channel on McMAL associated
-			  with this EMAC
-    - cell-index        : 1 cell, hardware index of the EMAC cell on a given
-			  ASIC (typically 0x0 and 0x1 for EMAC0 and EMAC1 on
-			  each Axon chip)
-    - max-frame-size    : 1 cell, maximum frame size supported in bytes
-    - rx-fifo-size      : 1 cell, Rx fifo size in bytes for 10 and 100 Mb/sec
-			  operations.
-			  For Axon, 2048
-    - tx-fifo-size      : 1 cell, Tx fifo size in bytes for 10 and 100 Mb/sec
-			  operations.
-			  For Axon, 2048.
-    - fifo-entry-size   : 1 cell, size of a fifo entry (used to calculate
-			  thresholds).
-			  For Axon, 0x00000010
-    - mal-burst-size    : 1 cell, MAL burst size (used to calculate thresholds)
-			  in bytes.
-			  For Axon, 0x00000100 (I think ...)
-    - phy-mode          : string, mode of operations of the PHY interface.
-			  Supported values are: "mii", "rmii", "smii", "rgmii",
-			  "tbi", "gmii", rtbi", "sgmii".
-			  For Axon on CAB, it is "rgmii"
-    - mdio-device       : 1 cell, required iff using shared MDIO registers
-			  (440EP).  phandle of the EMAC to use to drive the
-			  MDIO lines for the PHY used by this EMAC.
-    - zmii-device       : 1 cell, required iff connected to a ZMII.  phandle of
-			  the ZMII device node
-    - zmii-channel      : 1 cell, required iff connected to a ZMII.  Which ZMII
-			  channel or 0xffffffff if ZMII is only used for MDIO.
-    - rgmii-device      : 1 cell, required iff connected to an RGMII. phandle
-			  of the RGMII device node.
-			  For Axon: phandle of plb5/plb4/opb/rgmii
-    - rgmii-channel     : 1 cell, required iff connected to an RGMII.  Which
-			  RGMII channel is used by this EMAC.
-			  Fox Axon: present, whatever value is appropriate for each
-			  EMAC, that is the content of the current (bogus) "phy-port"
-			  property.
-
-    Optional properties:
-    - phy-address       : 1 cell, optional, MDIO address of the PHY. If absent,
-			  a search is performed.
-    - phy-map           : 1 cell, optional, bitmap of addresses to probe the PHY
-			  for, used if phy-address is absent. bit 0x00000001 is
-			  MDIO address 0.
-			  For Axon it can be absent, though my current driver
-			  doesn't handle phy-address yet so for now, keep
-			  0x00ffffff in it.
-    - rx-fifo-size-gige : 1 cell, Rx fifo size in bytes for 1000 Mb/sec
-			  operations (if absent the value is the same as
-			  rx-fifo-size).  For Axon, either absent or 2048.
-    - tx-fifo-size-gige : 1 cell, Tx fifo size in bytes for 1000 Mb/sec
-			  operations (if absent the value is the same as
-			  tx-fifo-size). For Axon, either absent or 2048.
-    - tah-device        : 1 cell, optional. If connected to a TAH engine for
-			  offload, phandle of the TAH device node.
-    - tah-channel       : 1 cell, optional. If appropriate, channel used on the
-			  TAH engine.
-
-    Example:
-
-	EMAC0: ethernet@40000800 {
-		device_type = "network";
-		compatible = "ibm,emac-440gp", "ibm,emac";
-		interrupt-parent = <&UIC1>;
-		interrupts = <1c 4 1d 4>;
-		reg = <40000800 70>;
-		local-mac-address = [00 04 AC E3 1B 1E];
-		mal-device = <&MAL0>;
-		mal-tx-channel = <0 1>;
-		mal-rx-channel = <0>;
-		cell-index = <0>;
-		max-frame-size = <5dc>;
-		rx-fifo-size = <1000>;
-		tx-fifo-size = <800>;
-		phy-mode = "rmii";
-		phy-map = <00000001>;
-		zmii-device = <&ZMII0>;
-		zmii-channel = <0>;
-	};
-
-      ii) McMAL node
-
-    Required properties:
-    - device_type        : "dma-controller"
-    - compatible         : compatible list, containing 2 entries, first is
-			   "ibm,mcmal-CHIP" where CHIP is the host ASIC (like
-			   emac) and the second is either "ibm,mcmal" or
-			   "ibm,mcmal2".
-			   For Axon, "ibm,mcmal-axon","ibm,mcmal2"
-    - interrupts         : <interrupt mapping for the MAL interrupts sources:
-                           5 sources: tx_eob, rx_eob, serr, txde, rxde>.
-                           For Axon: This is _different_ from the current
-			   firmware.  We use the "delayed" interrupts for txeob
-			   and rxeob. Thus we end up with mapping those 5 MPIC
-			   interrupts, all level positive sensitive: 10, 11, 32,
-			   33, 34 (in decimal)
-    - dcr-reg            : < DCR registers range >
-    - dcr-parent         : if needed for dcr-reg
-    - num-tx-chans       : 1 cell, number of Tx channels
-    - num-rx-chans       : 1 cell, number of Rx channels
-
-      iii) ZMII node
-
-    Required properties:
-    - compatible         : compatible list, containing 2 entries, first is
-			   "ibm,zmii-CHIP" where CHIP is the host ASIC (like
-			   EMAC) and the second is "ibm,zmii".
-			   For Axon, there is no ZMII node.
-    - reg                : <registers mapping>
-
-      iv) RGMII node
-
-    Required properties:
-    - compatible         : compatible list, containing 2 entries, first is
-			   "ibm,rgmii-CHIP" where CHIP is the host ASIC (like
-			   EMAC) and the second is "ibm,rgmii".
-                           For Axon, "ibm,rgmii-axon","ibm,rgmii"
-    - reg                : <registers mapping>
-    - revision           : as provided by the RGMII new version register if
-			   available.
-			   For Axon: 0x0000012a
-
-   d) Xilinx IP cores
-
-   The Xilinx EDK toolchain ships with a set of IP cores (devices) for use
-   in Xilinx Spartan and Virtex FPGAs.  The devices cover the whole range
-   of standard device types (network, serial, etc.) and miscellaneous
-   devices (gpio, LCD, spi, etc).  Also, since these devices are
-   implemented within the fpga fabric every instance of the device can be
-   synthesised with different options that change the behaviour.
-
-   Each IP-core has a set of parameters which the FPGA designer can use to
-   control how the core is synthesized.  Historically, the EDK tool would
-   extract the device parameters relevant to device drivers and copy them
-   into an 'xparameters.h' in the form of #define symbols.  This tells the
-   device drivers how the IP cores are configured, but it requres the kernel
-   to be recompiled every time the FPGA bitstream is resynthesized.
-
-   The new approach is to export the parameters into the device tree and
-   generate a new device tree each time the FPGA bitstream changes.  The
-   parameters which used to be exported as #defines will now become
-   properties of the device node.  In general, device nodes for IP-cores
-   will take the following form:
-
-	(name): (generic-name)@(base-address) {
-		compatible = "xlnx,(ip-core-name)-(HW_VER)"
-			     [, (list of compatible devices), ...];
-		reg = <(baseaddr) (size)>;
-		interrupt-parent = <&interrupt-controller-phandle>;
-		interrupts = < ... >;
-		xlnx,(parameter1) = "(string-value)";
-		xlnx,(parameter2) = <(int-value)>;
-	};
-
-	(generic-name):   an open firmware-style name that describes the
-			generic class of device.  Preferably, this is one word, such
-			as 'serial' or 'ethernet'.
-	(ip-core-name):	the name of the ip block (given after the BEGIN
-			directive in system.mhs).  Should be in lowercase
-			and all underscores '_' converted to dashes '-'.
-	(name):		is derived from the "PARAMETER INSTANCE" value.
-	(parameter#):	C_* parameters from system.mhs.  The C_ prefix is
-			dropped from the parameter name, the name is converted
-			to lowercase and all underscore '_' characters are
-			converted to dashes '-'.
-	(baseaddr):	the baseaddr parameter value (often named C_BASEADDR).
-	(HW_VER):	from the HW_VER parameter.
-	(size):		the address range size (often C_HIGHADDR - C_BASEADDR + 1).
-
-   Typically, the compatible list will include the exact IP core version
-   followed by an older IP core version which implements the same
-   interface or any other device with the same interface.
-
-   'reg', 'interrupt-parent' and 'interrupts' are all optional properties.
-
-   For example, the following block from system.mhs:
-
-	BEGIN opb_uartlite
-		PARAMETER INSTANCE = opb_uartlite_0
-		PARAMETER HW_VER = 1.00.b
-		PARAMETER C_BAUDRATE = 115200
-		PARAMETER C_DATA_BITS = 8
-		PARAMETER C_ODD_PARITY = 0
-		PARAMETER C_USE_PARITY = 0
-		PARAMETER C_CLK_FREQ = 50000000
-		PARAMETER C_BASEADDR = 0xEC100000
-		PARAMETER C_HIGHADDR = 0xEC10FFFF
-		BUS_INTERFACE SOPB = opb_7
-		PORT OPB_Clk = CLK_50MHz
-		PORT Interrupt = opb_uartlite_0_Interrupt
-		PORT RX = opb_uartlite_0_RX
-		PORT TX = opb_uartlite_0_TX
-		PORT OPB_Rst = sys_bus_reset_0
-	END
-
-   becomes the following device tree node:
-
-	opb_uartlite_0: serial@ec100000 {
-		device_type = "serial";
-		compatible = "xlnx,opb-uartlite-1.00.b";
-		reg = <ec100000 10000>;
-		interrupt-parent = <&opb_intc_0>;
-		interrupts = <1 0>; // got this from the opb_intc parameters
-		current-speed = <d#115200>;	// standard serial device prop
-		clock-frequency = <d#50000000>;	// standard serial device prop
-		xlnx,data-bits = <8>;
-		xlnx,odd-parity = <0>;
-		xlnx,use-parity = <0>;
-	};
-
-   Some IP cores actually implement 2 or more logical devices.  In
-   this case, the device should still describe the whole IP core with
-   a single node and add a child node for each logical device.  The
-   ranges property can be used to translate from parent IP-core to the
-   registers of each device.  In addition, the parent node should be
-   compatible with the bus type 'xlnx,compound', and should contain
-   #address-cells and #size-cells, as with any other bus.  (Note: this
-   makes the assumption that both logical devices have the same bus
-   binding.  If this is not true, then separate nodes should be used
-   for each logical device).  The 'cell-index' property can be used to
-   enumerate logical devices within an IP core.  For example, the
-   following is the system.mhs entry for the dual ps2 controller found
-   on the ml403 reference design.
-
-	BEGIN opb_ps2_dual_ref
-		PARAMETER INSTANCE = opb_ps2_dual_ref_0
-		PARAMETER HW_VER = 1.00.a
-		PARAMETER C_BASEADDR = 0xA9000000
-		PARAMETER C_HIGHADDR = 0xA9001FFF
-		BUS_INTERFACE SOPB = opb_v20_0
-		PORT Sys_Intr1 = ps2_1_intr
-		PORT Sys_Intr2 = ps2_2_intr
-		PORT Clkin1 = ps2_clk_rx_1
-		PORT Clkin2 = ps2_clk_rx_2
-		PORT Clkpd1 = ps2_clk_tx_1
-		PORT Clkpd2 = ps2_clk_tx_2
-		PORT Rx1 = ps2_d_rx_1
-		PORT Rx2 = ps2_d_rx_2
-		PORT Txpd1 = ps2_d_tx_1
-		PORT Txpd2 = ps2_d_tx_2
-	END
-
-   It would result in the following device tree nodes:
-
-	opb_ps2_dual_ref_0: opb-ps2-dual-ref@a9000000 {
-		#address-cells = <1>;
-		#size-cells = <1>;
-		compatible = "xlnx,compound";
-		ranges = <0 a9000000 2000>;
-		// If this device had extra parameters, then they would
-		// go here.
-		ps2@0 {
-			compatible = "xlnx,opb-ps2-dual-ref-1.00.a";
-			reg = <0 40>;
-			interrupt-parent = <&opb_intc_0>;
-			interrupts = <3 0>;
-			cell-index = <0>;
-		};
-		ps2@1000 {
-			compatible = "xlnx,opb-ps2-dual-ref-1.00.a";
-			reg = <1000 40>;
-			interrupt-parent = <&opb_intc_0>;
-			interrupts = <3 0>;
-			cell-index = <0>;
-		};
-	};
-
-   Also, the system.mhs file defines bus attachments from the processor
-   to the devices.  The device tree structure should reflect the bus
-   attachments.  Again an example; this system.mhs fragment:
-
-	BEGIN ppc405_virtex4
-		PARAMETER INSTANCE = ppc405_0
-		PARAMETER HW_VER = 1.01.a
-		BUS_INTERFACE DPLB = plb_v34_0
-		BUS_INTERFACE IPLB = plb_v34_0
-	END
-
-	BEGIN opb_intc
-		PARAMETER INSTANCE = opb_intc_0
-		PARAMETER HW_VER = 1.00.c
-		PARAMETER C_BASEADDR = 0xD1000FC0
-		PARAMETER C_HIGHADDR = 0xD1000FDF
-		BUS_INTERFACE SOPB = opb_v20_0
-	END
-
-	BEGIN opb_uart16550
-		PARAMETER INSTANCE = opb_uart16550_0
-		PARAMETER HW_VER = 1.00.d
-		PARAMETER C_BASEADDR = 0xa0000000
-		PARAMETER C_HIGHADDR = 0xa0001FFF
-		BUS_INTERFACE SOPB = opb_v20_0
-	END
-
-	BEGIN plb_v34
-		PARAMETER INSTANCE = plb_v34_0
-		PARAMETER HW_VER = 1.02.a
-	END
-
-	BEGIN plb_bram_if_cntlr
-		PARAMETER INSTANCE = plb_bram_if_cntlr_0
-		PARAMETER HW_VER = 1.00.b
-		PARAMETER C_BASEADDR = 0xFFFF0000
-		PARAMETER C_HIGHADDR = 0xFFFFFFFF
-		BUS_INTERFACE SPLB = plb_v34_0
-	END
-
-	BEGIN plb2opb_bridge
-		PARAMETER INSTANCE = plb2opb_bridge_0
-		PARAMETER HW_VER = 1.01.a
-		PARAMETER C_RNG0_BASEADDR = 0x20000000
-		PARAMETER C_RNG0_HIGHADDR = 0x3FFFFFFF
-		PARAMETER C_RNG1_BASEADDR = 0x60000000
-		PARAMETER C_RNG1_HIGHADDR = 0x7FFFFFFF
-		PARAMETER C_RNG2_BASEADDR = 0x80000000
-		PARAMETER C_RNG2_HIGHADDR = 0xBFFFFFFF
-		PARAMETER C_RNG3_BASEADDR = 0xC0000000
-		PARAMETER C_RNG3_HIGHADDR = 0xDFFFFFFF
-		BUS_INTERFACE SPLB = plb_v34_0
-		BUS_INTERFACE MOPB = opb_v20_0
-	END
-
-   Gives this device tree (some properties removed for clarity):
-
-	plb@0 {
-		#address-cells = <1>;
-		#size-cells = <1>;
-		compatible = "xlnx,plb-v34-1.02.a";
-		device_type = "ibm,plb";
-		ranges; // 1:1 translation
-
-		plb_bram_if_cntrl_0: bram@ffff0000 {
-			reg = <ffff0000 10000>;
-		}
-
-		opb@20000000 {
-			#address-cells = <1>;
-			#size-cells = <1>;
-			ranges = <20000000 20000000 20000000
-				  60000000 60000000 20000000
-				  80000000 80000000 40000000
-				  c0000000 c0000000 20000000>;
-
-			opb_uart16550_0: serial@a0000000 {
-				reg = <a00000000 2000>;
-			};
-
-			opb_intc_0: interrupt-controller@d1000fc0 {
-				reg = <d1000fc0 20>;
-			};
-		};
-	};
-
-   That covers the general approach to binding xilinx IP cores into the
-   device tree.  The following are bindings for specific devices:
-
-      i) Xilinx ML300 Framebuffer
-
-      Simple framebuffer device from the ML300 reference design (also on the
-      ML403 reference design as well as others).
-
-      Optional properties:
-       - resolution = <xres yres> : pixel resolution of framebuffer.  Some
-                                    implementations use a different resolution.
-                                    Default is <d#640 d#480>
-       - virt-resolution = <xvirt yvirt> : Size of framebuffer in memory.
-                                           Default is <d#1024 d#480>.
-       - rotate-display (empty) : rotate display 180 degrees.
-
-      ii) Xilinx SystemACE
-
-      The Xilinx SystemACE device is used to program FPGAs from an FPGA
-      bitstream stored on a CF card.  It can also be used as a generic CF
-      interface device.
-
-      Optional properties:
-       - 8-bit (empty) : Set this property for SystemACE in 8 bit mode
-
-      iii) Xilinx EMAC and Xilinx TEMAC
-
-      Xilinx Ethernet devices.  In addition to general xilinx properties
-      listed above, nodes for these devices should include a phy-handle
-      property, and may include other common network device properties
-      like local-mac-address.
-
-      iv) Xilinx Uartlite
-
-      Xilinx uartlite devices are simple fixed speed serial ports.
-
-      Required properties:
-       - current-speed : Baud rate of uartlite
-
-      v) Xilinx hwicap
-
-		Xilinx hwicap devices provide access to the configuration logic
-		of the FPGA through the Internal Configuration Access Port
-		(ICAP).  The ICAP enables partial reconfiguration of the FPGA,
-		readback of the configuration information, and some control over
-		'warm boots' of the FPGA fabric.
-
-		Required properties:
-		- xlnx,family : The family of the FPGA, necessary since the
-                      capabilities of the underlying ICAP hardware
-                      differ between different families.  May be
-                      'virtex2p', 'virtex4', or 'virtex5'.
-
-      vi) Xilinx Uart 16550
-
-      Xilinx UART 16550 devices are very similar to the NS16550 but with
-      different register spacing and an offset from the base address.
-
-      Required properties:
-       - clock-frequency : Frequency of the clock input
-       - reg-offset : A value of 3 is required
-       - reg-shift : A value of 2 is required
-
-    e) USB EHCI controllers
-
-    Required properties:
-      - compatible : should be "usb-ehci".
-      - reg : should contain at least address and length of the standard EHCI
-        register set for the device. Optional platform-dependent registers
-        (debug-port or other) can be also specified here, but only after
-        definition of standard EHCI registers.
-      - interrupts : one EHCI interrupt should be described here.
-    If device registers are implemented in big endian mode, the device
-    node should have "big-endian-regs" property.
-    If controller implementation operates with big endian descriptors,
-    "big-endian-desc" property should be specified.
-    If both big endian registers and descriptors are used by the controller
-    implementation, "big-endian" property can be specified instead of having
-    both "big-endian-regs" and "big-endian-desc".
-
-     Example (Sequoia 440EPx):
-	    ehci@e0000300 {
-		   compatible = "ibm,usb-ehci-440epx", "usb-ehci";
-		   interrupt-parent = <&UIC0>;
-		   interrupts = <1a 4>;
-		   reg = <0 e0000300 90 0 e0000390 70>;
-		   big-endian;
-	   };
-
-   f) MDIO on GPIOs
-
-   Currently defined compatibles:
-   - virtual,gpio-mdio
-
-   MDC and MDIO lines connected to GPIO controllers are listed in the
-   gpios property as described in section VIII.1 in the following order:
-
-   MDC, MDIO.
-
-   Example:
-
-	mdio {
-		compatible = "virtual,mdio-gpio";
-		#address-cells = <1>;
-		#size-cells = <0>;
-		gpios = <&qe_pio_a 11
-			 &qe_pio_c 6>;
-	};
-
-    g) SPI (Serial Peripheral Interface) busses
-
-    SPI busses can be described with a node for the SPI master device
-    and a set of child nodes for each SPI slave on the bus.  For this
-    discussion, it is assumed that the system's SPI controller is in
-    SPI master mode.  This binding does not describe SPI controllers
-    in slave mode.
-
-    The SPI master node requires the following properties:
-    - #address-cells  - number of cells required to define a chip select
-			address on the SPI bus.
-    - #size-cells     - should be zero.
-    - compatible      - name of SPI bus controller following generic names
-			recommended practice.
-    No other properties are required in the SPI bus node.  It is assumed
-    that a driver for an SPI bus device will understand that it is an SPI bus.
-    However, the binding does not attempt to define the specific method for
-    assigning chip select numbers.  Since SPI chip select configuration is
-    flexible and non-standardized, it is left out of this binding with the
-    assumption that board specific platform code will be used to manage
-    chip selects.  Individual drivers can define additional properties to
-    support describing the chip select layout.
-
-    SPI slave nodes must be children of the SPI master node and can
-    contain the following properties.
-    - reg             - (required) chip select address of device.
-    - compatible      - (required) name of SPI device following generic names
-			recommended practice
-    - spi-max-frequency - (required) Maximum SPI clocking speed of device in Hz
-    - spi-cpol        - (optional) Empty property indicating device requires
-			inverse clock polarity (CPOL) mode
-    - spi-cpha        - (optional) Empty property indicating device requires
-			shifted clock phase (CPHA) mode
-    - spi-cs-high     - (optional) Empty property indicating device requires
-			chip select active high
-
-    SPI example for an MPC5200 SPI bus:
-		spi@f00 {
-			#address-cells = <1>;
-			#size-cells = <0>;
-			compatible = "fsl,mpc5200b-spi","fsl,mpc5200-spi";
-			reg = <0xf00 0x20>;
-			interrupts = <2 13 0 2 14 0>;
-			interrupt-parent = <&mpc5200_pic>;
-
-			ethernet-switch@0 {
-				compatible = "micrel,ks8995m";
-				spi-max-frequency = <1000000>;
-				reg = <0>;
-			};
-
-			codec@1 {
-				compatible = "ti,tlv320aic26";
-				spi-max-frequency = <100000>;
-				reg = <1>;
-			};
-		};
-
-VII - Marvell Discovery mv64[345]6x System Controller chips
-===========================================================
-
-The Marvell mv64[345]60 series of system controller chips contain
-many of the peripherals needed to implement a complete computer
-system.  In this section, we define device tree nodes to describe
-the system controller chip itself and each of the peripherals
-which it contains.  Compatible string values for each node are
-prefixed with the string "marvell,", for Marvell Technology Group Ltd.
-
-1) The /system-controller node
-
-  This node is used to represent the system-controller and must be
-  present when the system uses a system controller chip. The top-level
-  system-controller node contains information that is global to all
-  devices within the system controller chip. The node name begins
-  with "system-controller" followed by the unit address, which is
-  the base address of the memory-mapped register set for the system
-  controller chip.
-
-  Required properties:
-
-    - ranges : Describes the translation of system controller addresses
-      for memory mapped registers.
-    - clock-frequency: Contains the main clock frequency for the system
-      controller chip.
-    - reg : This property defines the address and size of the
-      memory-mapped registers contained within the system controller
-      chip.  The address specified in the "reg" property should match
-      the unit address of the system-controller node.
-    - #address-cells : Address representation for system controller
-      devices.  This field represents the number of cells needed to
-      represent the address of the memory-mapped registers of devices
-      within the system controller chip.
-    - #size-cells : Size representation for for the memory-mapped
-      registers within the system controller chip.
-    - #interrupt-cells : Defines the width of cells used to represent
-      interrupts.
-
-  Optional properties:
-
-    - model : The specific model of the system controller chip.  Such
-      as, "mv64360", "mv64460", or "mv64560".
-    - compatible : A string identifying the compatibility identifiers
-      of the system controller chip.
-
-  The system-controller node contains child nodes for each system
-  controller device that the platform uses.  Nodes should not be created
-  for devices which exist on the system controller chip but are not used
-
-  Example Marvell Discovery mv64360 system-controller node:
-
-    system-controller@f1000000 { /* Marvell Discovery mv64360 */
-	    #address-cells = <1>;
-	    #size-cells = <1>;
-	    model = "mv64360";                      /* Default */
-	    compatible = "marvell,mv64360";
-	    clock-frequency = <133333333>;
-	    reg = <0xf1000000 0x10000>;
-	    virtual-reg = <0xf1000000>;
-	    ranges = <0x88000000 0x88000000 0x1000000 /* PCI 0 I/O Space */
-		    0x80000000 0x80000000 0x8000000 /* PCI 0 MEM Space */
-		    0xa0000000 0xa0000000 0x4000000 /* User FLASH */
-		    0x00000000 0xf1000000 0x0010000 /* Bridge's regs */
-		    0xf2000000 0xf2000000 0x0040000>;/* Integrated SRAM */
-
-	    [ child node definitions... ]
-    }
-
-2) Child nodes of /system-controller
-
-   a) Marvell Discovery MDIO bus
-
-   The MDIO is a bus to which the PHY devices are connected.  For each
-   device that exists on this bus, a child node should be created.  See
-   the definition of the PHY node below for an example of how to define
-   a PHY.
-
-   Required properties:
-     - #address-cells : Should be <1>
-     - #size-cells : Should be <0>
-     - device_type : Should be "mdio"
-     - compatible : Should be "marvell,mv64360-mdio"
-
-   Example:
-
-     mdio {
-	     #address-cells = <1>;
-	     #size-cells = <0>;
-	     device_type = "mdio";
-	     compatible = "marvell,mv64360-mdio";
-
-	     ethernet-phy@0 {
-		     ......
-	     };
-     };
-
-
-   b) Marvell Discovery ethernet controller
-
-   The Discover ethernet controller is described with two levels
-   of nodes.  The first level describes an ethernet silicon block
-   and the second level describes up to 3 ethernet nodes within
-   that block.  The reason for the multiple levels is that the
-   registers for the node are interleaved within a single set
-   of registers.  The "ethernet-block" level describes the
-   shared register set, and the "ethernet" nodes describe ethernet
-   port-specific properties.
-
-   Ethernet block node
-
-   Required properties:
-     - #address-cells : <1>
-     - #size-cells : <0>
-     - compatible : "marvell,mv64360-eth-block"
-     - reg : Offset and length of the register set for this block
-
-   Example Discovery Ethernet block node:
-     ethernet-block@2000 {
-	     #address-cells = <1>;
-	     #size-cells = <0>;
-	     compatible = "marvell,mv64360-eth-block";
-	     reg = <0x2000 0x2000>;
-	     ethernet@0 {
-		     .......
-	     };
-     };
-
-   Ethernet port node
-
-   Required properties:
-     - device_type : Should be "network".
-     - compatible : Should be "marvell,mv64360-eth".
-     - reg : Should be <0>, <1>, or <2>, according to which registers
-       within the silicon block the device uses.
-     - interrupts : <a> where a is the interrupt number for the port.
-     - interrupt-parent : the phandle for the interrupt controller
-       that services interrupts for this device.
-     - phy : the phandle for the PHY connected to this ethernet
-       controller.
-     - local-mac-address : 6 bytes, MAC address
-
-   Example Discovery Ethernet port node:
-     ethernet@0 {
-	     device_type = "network";
-	     compatible = "marvell,mv64360-eth";
-	     reg = <0>;
-	     interrupts = <32>;
-	     interrupt-parent = <&PIC>;
-	     phy = <&PHY0>;
-	     local-mac-address = [ 00 00 00 00 00 00 ];
-     };
-
-
-
-   c) Marvell Discovery PHY nodes
-
-   Required properties:
-     - device_type : Should be "ethernet-phy"
-     - interrupts : <a> where a is the interrupt number for this phy.
-     - interrupt-parent : the phandle for the interrupt controller that
-       services interrupts for this device.
-     - reg : The ID number for the phy, usually a small integer
-
-   Example Discovery PHY node:
-     ethernet-phy@1 {
-	     device_type = "ethernet-phy";
-	     compatible = "broadcom,bcm5421";
-	     interrupts = <76>;      /* GPP 12 */
-	     interrupt-parent = <&PIC>;
-	     reg = <1>;
-     };
-
-
-   d) Marvell Discovery SDMA nodes
-
-   Represent DMA hardware associated with the MPSC (multiprotocol
-   serial controllers).
-
-   Required properties:
-     - compatible : "marvell,mv64360-sdma"
-     - reg : Offset and length of the register set for this device
-     - interrupts : <a> where a is the interrupt number for the DMA
-       device.
-     - interrupt-parent : the phandle for the interrupt controller
-       that services interrupts for this device.
-
-   Example Discovery SDMA node:
-     sdma@4000 {
-	     compatible = "marvell,mv64360-sdma";
-	     reg = <0x4000 0xc18>;
-	     virtual-reg = <0xf1004000>;
-	     interrupts = <36>;
-	     interrupt-parent = <&PIC>;
-     };
-
-
-   e) Marvell Discovery BRG nodes
-
-   Represent baud rate generator hardware associated with the MPSC
-   (multiprotocol serial controllers).
-
-   Required properties:
-     - compatible : "marvell,mv64360-brg"
-     - reg : Offset and length of the register set for this device
-     - clock-src : A value from 0 to 15 which selects the clock
-       source for the baud rate generator.  This value corresponds
-       to the CLKS value in the BRGx configuration register.  See
-       the mv64x60 User's Manual.
-     - clock-frequence : The frequency (in Hz) of the baud rate
-       generator's input clock.
-     - current-speed : The current speed setting (presumably by
-       firmware) of the baud rate generator.
-
-   Example Discovery BRG node:
-     brg@b200 {
-	     compatible = "marvell,mv64360-brg";
-	     reg = <0xb200 0x8>;
-	     clock-src = <8>;
-	     clock-frequency = <133333333>;
-	     current-speed = <9600>;
-     };
-
-
-   f) Marvell Discovery CUNIT nodes
-
-   Represent the Serial Communications Unit device hardware.
-
-   Required properties:
-     - reg : Offset and length of the register set for this device
-
-   Example Discovery CUNIT node:
-     cunit@f200 {
-	     reg = <0xf200 0x200>;
-     };
-
-
-   g) Marvell Discovery MPSCROUTING nodes
-
-   Represent the Discovery's MPSC routing hardware
-
-   Required properties:
-     - reg : Offset and length of the register set for this device
-
-   Example Discovery CUNIT node:
-     mpscrouting@b500 {
-	     reg = <0xb400 0xc>;
-     };
-
-
-   h) Marvell Discovery MPSCINTR nodes
-
-   Represent the Discovery's MPSC DMA interrupt hardware registers
-   (SDMA cause and mask registers).
-
-   Required properties:
-     - reg : Offset and length of the register set for this device
-
-   Example Discovery MPSCINTR node:
-     mpsintr@b800 {
-	     reg = <0xb800 0x100>;
-     };
-
-
-   i) Marvell Discovery MPSC nodes
-
-   Represent the Discovery's MPSC (Multiprotocol Serial Controller)
-   serial port.
-
-   Required properties:
-     - device_type : "serial"
-     - compatible : "marvell,mv64360-mpsc"
-     - reg : Offset and length of the register set for this device
-     - sdma : the phandle for the SDMA node used by this port
-     - brg : the phandle for the BRG node used by this port
-     - cunit : the phandle for the CUNIT node used by this port
-     - mpscrouting : the phandle for the MPSCROUTING node used by this port
-     - mpscintr : the phandle for the MPSCINTR node used by this port
-     - cell-index : the hardware index of this cell in the MPSC core
-     - max_idle : value needed for MPSC CHR3 (Maximum Frame Length)
-       register
-     - interrupts : <a> where a is the interrupt number for the MPSC.
-     - interrupt-parent : the phandle for the interrupt controller
-       that services interrupts for this device.
-
-   Example Discovery MPSCINTR node:
-     mpsc@8000 {
-	     device_type = "serial";
-	     compatible = "marvell,mv64360-mpsc";
-	     reg = <0x8000 0x38>;
-	     virtual-reg = <0xf1008000>;
-	     sdma = <&SDMA0>;
-	     brg = <&BRG0>;
-	     cunit = <&CUNIT>;
-	     mpscrouting = <&MPSCROUTING>;
-	     mpscintr = <&MPSCINTR>;
-	     cell-index = <0>;
-	     max_idle = <40>;
-	     interrupts = <40>;
-	     interrupt-parent = <&PIC>;
-     };
-
-
-   j) Marvell Discovery Watch Dog Timer nodes
-
-   Represent the Discovery's watchdog timer hardware
-
-   Required properties:
-     - compatible : "marvell,mv64360-wdt"
-     - reg : Offset and length of the register set for this device
-
-   Example Discovery Watch Dog Timer node:
-     wdt@b410 {
-	     compatible = "marvell,mv64360-wdt";
-	     reg = <0xb410 0x8>;
-     };
-
-
-   k) Marvell Discovery I2C nodes
-
-   Represent the Discovery's I2C hardware
-
-   Required properties:
-     - device_type : "i2c"
-     - compatible : "marvell,mv64360-i2c"
-     - reg : Offset and length of the register set for this device
-     - interrupts : <a> where a is the interrupt number for the I2C.
-     - interrupt-parent : the phandle for the interrupt controller
-       that services interrupts for this device.
-
-   Example Discovery I2C node:
-	     compatible = "marvell,mv64360-i2c";
-	     reg = <0xc000 0x20>;
-	     virtual-reg = <0xf100c000>;
-	     interrupts = <37>;
-	     interrupt-parent = <&PIC>;
-     };
-
-
-   l) Marvell Discovery PIC (Programmable Interrupt Controller) nodes
-
-   Represent the Discovery's PIC hardware
-
-   Required properties:
-     - #interrupt-cells : <1>
-     - #address-cells : <0>
-     - compatible : "marvell,mv64360-pic"
-     - reg : Offset and length of the register set for this device
-     - interrupt-controller
-
-   Example Discovery PIC node:
-     pic {
-	     #interrupt-cells = <1>;
-	     #address-cells = <0>;
-	     compatible = "marvell,mv64360-pic";
-	     reg = <0x0 0x88>;
-	     interrupt-controller;
-     };
-
-
-   m) Marvell Discovery MPP (Multipurpose Pins) multiplexing nodes
-
-   Represent the Discovery's MPP hardware
-
-   Required properties:
-     - compatible : "marvell,mv64360-mpp"
-     - reg : Offset and length of the register set for this device
-
-   Example Discovery MPP node:
-     mpp@f000 {
-	     compatible = "marvell,mv64360-mpp";
-	     reg = <0xf000 0x10>;
-     };
-
-
-   n) Marvell Discovery GPP (General Purpose Pins) nodes
-
-   Represent the Discovery's GPP hardware
-
-   Required properties:
-     - compatible : "marvell,mv64360-gpp"
-     - reg : Offset and length of the register set for this device
-
-   Example Discovery GPP node:
-     gpp@f000 {
-	     compatible = "marvell,mv64360-gpp";
-	     reg = <0xf100 0x20>;
-     };
-
-
-   o) Marvell Discovery PCI host bridge node
-
-   Represents the Discovery's PCI host bridge device.  The properties
-   for this node conform to Rev 2.1 of the PCI Bus Binding to IEEE
-   1275-1994.  A typical value for the compatible property is
-   "marvell,mv64360-pci".
-
-   Example Discovery PCI host bridge node
-     pci@80000000 {
-	     #address-cells = <3>;
-	     #size-cells = <2>;
-	     #interrupt-cells = <1>;
-	     device_type = "pci";
-	     compatible = "marvell,mv64360-pci";
-	     reg = <0xcf8 0x8>;
-	     ranges = <0x01000000 0x0        0x0
-			     0x88000000 0x0 0x01000000
-		       0x02000000 0x0 0x80000000
-			     0x80000000 0x0 0x08000000>;
-	     bus-range = <0 255>;
-	     clock-frequency = <66000000>;
-	     interrupt-parent = <&PIC>;
-	     interrupt-map-mask = <0xf800 0x0 0x0 0x7>;
-	     interrupt-map = <
-		     /* IDSEL 0x0a */
-		     0x5000 0 0 1 &PIC 80
-		     0x5000 0 0 2 &PIC 81
-		     0x5000 0 0 3 &PIC 91
-		     0x5000 0 0 4 &PIC 93
-
-		     /* IDSEL 0x0b */
-		     0x5800 0 0 1 &PIC 91
-		     0x5800 0 0 2 &PIC 93
-		     0x5800 0 0 3 &PIC 80
-		     0x5800 0 0 4 &PIC 81
-
-		     /* IDSEL 0x0c */
-		     0x6000 0 0 1 &PIC 91
-		     0x6000 0 0 2 &PIC 93
-		     0x6000 0 0 3 &PIC 80
-		     0x6000 0 0 4 &PIC 81
-
-		     /* IDSEL 0x0d */
-		     0x6800 0 0 1 &PIC 93
-		     0x6800 0 0 2 &PIC 80
-		     0x6800 0 0 3 &PIC 81
-		     0x6800 0 0 4 &PIC 91
-	     >;
-     };
-
-
-   p) Marvell Discovery CPU Error nodes
-
-   Represent the Discovery's CPU error handler device.
-
-   Required properties:
-     - compatible : "marvell,mv64360-cpu-error"
-     - reg : Offset and length of the register set for this device
-     - interrupts : the interrupt number for this device
-     - interrupt-parent : the phandle for the interrupt controller
-       that services interrupts for this device.
-
-   Example Discovery CPU Error node:
-     cpu-error@0070 {
-	     compatible = "marvell,mv64360-cpu-error";
-	     reg = <0x70 0x10 0x128 0x28>;
-	     interrupts = <3>;
-	     interrupt-parent = <&PIC>;
-     };
-
-
-   q) Marvell Discovery SRAM Controller nodes
-
-   Represent the Discovery's SRAM controller device.
-
-   Required properties:
-     - compatible : "marvell,mv64360-sram-ctrl"
-     - reg : Offset and length of the register set for this device
-     - interrupts : the interrupt number for this device
-     - interrupt-parent : the phandle for the interrupt controller
-       that services interrupts for this device.
-
-   Example Discovery SRAM Controller node:
-     sram-ctrl@0380 {
-	     compatible = "marvell,mv64360-sram-ctrl";
-	     reg = <0x380 0x80>;
-	     interrupts = <13>;
-	     interrupt-parent = <&PIC>;
-     };
-
-
-   r) Marvell Discovery PCI Error Handler nodes
-
-   Represent the Discovery's PCI error handler device.
-
-   Required properties:
-     - compatible : "marvell,mv64360-pci-error"
-     - reg : Offset and length of the register set for this device
-     - interrupts : the interrupt number for this device
-     - interrupt-parent : the phandle for the interrupt controller
-       that services interrupts for this device.
-
-   Example Discovery PCI Error Handler node:
-     pci-error@1d40 {
-	     compatible = "marvell,mv64360-pci-error";
-	     reg = <0x1d40 0x40 0xc28 0x4>;
-	     interrupts = <12>;
-	     interrupt-parent = <&PIC>;
-     };
-
-
-   s) Marvell Discovery Memory Controller nodes
-
-   Represent the Discovery's memory controller device.
-
-   Required properties:
-     - compatible : "marvell,mv64360-mem-ctrl"
-     - reg : Offset and length of the register set for this device
-     - interrupts : the interrupt number for this device
-     - interrupt-parent : the phandle for the interrupt controller
-       that services interrupts for this device.
-
-   Example Discovery Memory Controller node:
-     mem-ctrl@1400 {
-	     compatible = "marvell,mv64360-mem-ctrl";
-	     reg = <0x1400 0x60>;
-	     interrupts = <17>;
-	     interrupt-parent = <&PIC>;
-     };
-
-
-VIII - Specifying interrupt information for devices
+VII - Specifying interrupt information for devices
 ===================================================
 
 The device tree represents the busses and devices of a hardware
@@ -2439,56 +1324,7 @@ encodings listed below:
 	2 =  high to low edge sensitive type enabled
 	3 =  low to high edge sensitive type enabled
 
-IX - Specifying GPIO information for devices
-============================================
-
-1) gpios property
------------------
-
-Nodes that makes use of GPIOs should define them using `gpios' property,
-format of which is: <&gpio-controller1-phandle gpio1-specifier
-		     &gpio-controller2-phandle gpio2-specifier
-		     0 /* holes are permitted, means no GPIO 3 */
-		     &gpio-controller4-phandle gpio4-specifier
-		     ...>;
-
-Note that gpio-specifier length is controller dependent.
-
-gpio-specifier may encode: bank, pin position inside the bank,
-whether pin is open-drain and whether pin is logically inverted.
-
-Example of the node using GPIOs:
-
-	node {
-		gpios = <&qe_pio_e 18 0>;
-	};
-
-In this example gpio-specifier is "18 0" and encodes GPIO pin number,
-and empty GPIO flags as accepted by the "qe_pio_e" gpio-controller.
-
-2) gpio-controller nodes
-------------------------
-
-Every GPIO controller node must have #gpio-cells property defined,
-this information will be used to translate gpio-specifiers.
-
-Example of two SOC GPIO banks defined as gpio-controller nodes:
-
-	qe_pio_a: gpio-controller@1400 {
-		#gpio-cells = <2>;
-		compatible = "fsl,qe-pario-bank-a", "fsl,qe-pario-bank";
-		reg = <0x1400 0x18>;
-		gpio-controller;
-	};
-
-	qe_pio_e: gpio-controller@1460 {
-		#gpio-cells = <2>;
-		compatible = "fsl,qe-pario-bank-e", "fsl,qe-pario-bank";
-		reg = <0x1460 0x18>;
-		gpio-controller;
-	};
-
-X - Specifying Device Power Management Information (sleep property)
+VIII - Specifying Device Power Management Information (sleep property)
 ===================================================================
 
 Devices on SOCs often have mechanisms for placing devices into low-power
diff --git a/Documentation/powerpc/dts-bindings/4xx/emac.txt b/Documentation/powerpc/dts-bindings/4xx/emac.txt
new file mode 100644
index 00000000000..2161334a7ca
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/4xx/emac.txt
@@ -0,0 +1,148 @@
+    4xx/Axon EMAC ethernet nodes
+
+    The EMAC ethernet controller in IBM and AMCC 4xx chips, and also
+    the Axon bridge.  To operate this needs to interact with a ths
+    special McMAL DMA controller, and sometimes an RGMII or ZMII
+    interface.  In addition to the nodes and properties described
+    below, the node for the OPB bus on which the EMAC sits must have a
+    correct clock-frequency property.
+
+      i) The EMAC node itself
+
+    Required properties:
+    - device_type       : "network"
+
+    - compatible        : compatible list, contains 2 entries, first is
+			  "ibm,emac-CHIP" where CHIP is the host ASIC (440gx,
+			  405gp, Axon) and second is either "ibm,emac" or
+			  "ibm,emac4".  For Axon, thus, we have: "ibm,emac-axon",
+			  "ibm,emac4"
+    - interrupts        : <interrupt mapping for EMAC IRQ and WOL IRQ>
+    - interrupt-parent  : optional, if needed for interrupt mapping
+    - reg               : <registers mapping>
+    - local-mac-address : 6 bytes, MAC address
+    - mal-device        : phandle of the associated McMAL node
+    - mal-tx-channel    : 1 cell, index of the tx channel on McMAL associated
+			  with this EMAC
+    - mal-rx-channel    : 1 cell, index of the rx channel on McMAL associated
+			  with this EMAC
+    - cell-index        : 1 cell, hardware index of the EMAC cell on a given
+			  ASIC (typically 0x0 and 0x1 for EMAC0 and EMAC1 on
+			  each Axon chip)
+    - max-frame-size    : 1 cell, maximum frame size supported in bytes
+    - rx-fifo-size      : 1 cell, Rx fifo size in bytes for 10 and 100 Mb/sec
+			  operations.
+			  For Axon, 2048
+    - tx-fifo-size      : 1 cell, Tx fifo size in bytes for 10 and 100 Mb/sec
+			  operations.
+			  For Axon, 2048.
+    - fifo-entry-size   : 1 cell, size of a fifo entry (used to calculate
+			  thresholds).
+			  For Axon, 0x00000010
+    - mal-burst-size    : 1 cell, MAL burst size (used to calculate thresholds)
+			  in bytes.
+			  For Axon, 0x00000100 (I think ...)
+    - phy-mode          : string, mode of operations of the PHY interface.
+			  Supported values are: "mii", "rmii", "smii", "rgmii",
+			  "tbi", "gmii", rtbi", "sgmii".
+			  For Axon on CAB, it is "rgmii"
+    - mdio-device       : 1 cell, required iff using shared MDIO registers
+			  (440EP).  phandle of the EMAC to use to drive the
+			  MDIO lines for the PHY used by this EMAC.
+    - zmii-device       : 1 cell, required iff connected to a ZMII.  phandle of
+			  the ZMII device node
+    - zmii-channel      : 1 cell, required iff connected to a ZMII.  Which ZMII
+			  channel or 0xffffffff if ZMII is only used for MDIO.
+    - rgmii-device      : 1 cell, required iff connected to an RGMII. phandle
+			  of the RGMII device node.
+			  For Axon: phandle of plb5/plb4/opb/rgmii
+    - rgmii-channel     : 1 cell, required iff connected to an RGMII.  Which
+			  RGMII channel is used by this EMAC.
+			  Fox Axon: present, whatever value is appropriate for each
+			  EMAC, that is the content of the current (bogus) "phy-port"
+			  property.
+
+    Optional properties:
+    - phy-address       : 1 cell, optional, MDIO address of the PHY. If absent,
+			  a search is performed.
+    - phy-map           : 1 cell, optional, bitmap of addresses to probe the PHY
+			  for, used if phy-address is absent. bit 0x00000001 is
+			  MDIO address 0.
+			  For Axon it can be absent, though my current driver
+			  doesn't handle phy-address yet so for now, keep
+			  0x00ffffff in it.
+    - rx-fifo-size-gige : 1 cell, Rx fifo size in bytes for 1000 Mb/sec
+			  operations (if absent the value is the same as
+			  rx-fifo-size).  For Axon, either absent or 2048.
+    - tx-fifo-size-gige : 1 cell, Tx fifo size in bytes for 1000 Mb/sec
+			  operations (if absent the value is the same as
+			  tx-fifo-size). For Axon, either absent or 2048.
+    - tah-device        : 1 cell, optional. If connected to a TAH engine for
+			  offload, phandle of the TAH device node.
+    - tah-channel       : 1 cell, optional. If appropriate, channel used on the
+			  TAH engine.
+
+    Example:
+
+	EMAC0: ethernet@40000800 {
+		device_type = "network";
+		compatible = "ibm,emac-440gp", "ibm,emac";
+		interrupt-parent = <&UIC1>;
+		interrupts = <1c 4 1d 4>;
+		reg = <40000800 70>;
+		local-mac-address = [00 04 AC E3 1B 1E];
+		mal-device = <&MAL0>;
+		mal-tx-channel = <0 1>;
+		mal-rx-channel = <0>;
+		cell-index = <0>;
+		max-frame-size = <5dc>;
+		rx-fifo-size = <1000>;
+		tx-fifo-size = <800>;
+		phy-mode = "rmii";
+		phy-map = <00000001>;
+		zmii-device = <&ZMII0>;
+		zmii-channel = <0>;
+	};
+
+      ii) McMAL node
+
+    Required properties:
+    - device_type        : "dma-controller"
+    - compatible         : compatible list, containing 2 entries, first is
+			   "ibm,mcmal-CHIP" where CHIP is the host ASIC (like
+			   emac) and the second is either "ibm,mcmal" or
+			   "ibm,mcmal2".
+			   For Axon, "ibm,mcmal-axon","ibm,mcmal2"
+    - interrupts         : <interrupt mapping for the MAL interrupts sources:
+                           5 sources: tx_eob, rx_eob, serr, txde, rxde>.
+                           For Axon: This is _different_ from the current
+			   firmware.  We use the "delayed" interrupts for txeob
+			   and rxeob. Thus we end up with mapping those 5 MPIC
+			   interrupts, all level positive sensitive: 10, 11, 32,
+			   33, 34 (in decimal)
+    - dcr-reg            : < DCR registers range >
+    - dcr-parent         : if needed for dcr-reg
+    - num-tx-chans       : 1 cell, number of Tx channels
+    - num-rx-chans       : 1 cell, number of Rx channels
+
+      iii) ZMII node
+
+    Required properties:
+    - compatible         : compatible list, containing 2 entries, first is
+			   "ibm,zmii-CHIP" where CHIP is the host ASIC (like
+			   EMAC) and the second is "ibm,zmii".
+			   For Axon, there is no ZMII node.
+    - reg                : <registers mapping>
+
+      iv) RGMII node
+
+    Required properties:
+    - compatible         : compatible list, containing 2 entries, first is
+			   "ibm,rgmii-CHIP" where CHIP is the host ASIC (like
+			   EMAC) and the second is "ibm,rgmii".
+                           For Axon, "ibm,rgmii-axon","ibm,rgmii"
+    - reg                : <registers mapping>
+    - revision           : as provided by the RGMII new version register if
+			   available.
+			   For Axon: 0x0000012a
+
diff --git a/Documentation/powerpc/dts-bindings/fsl/esdhc.txt b/Documentation/powerpc/dts-bindings/fsl/esdhc.txt
index 5093ddf900d..3ed3797b508 100644
--- a/Documentation/powerpc/dts-bindings/fsl/esdhc.txt
+++ b/Documentation/powerpc/dts-bindings/fsl/esdhc.txt
@@ -10,6 +10,8 @@ Required properties:
   - interrupts : should contain eSDHC interrupt.
   - interrupt-parent : interrupt source phandle.
   - clock-frequency : specifies eSDHC base clock frequency.
+  - sdhci,1-bit-only : (optional) specifies that a controller can
+    only handle 1-bit data transfers.
 
 Example:
 
diff --git a/Documentation/powerpc/dts-bindings/gpio/gpio.txt b/Documentation/powerpc/dts-bindings/gpio/gpio.txt
new file mode 100644
index 00000000000..edaa84d288a
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/gpio/gpio.txt
@@ -0,0 +1,50 @@
+Specifying GPIO information for devices
+============================================
+
+1) gpios property
+-----------------
+
+Nodes that makes use of GPIOs should define them using `gpios' property,
+format of which is: <&gpio-controller1-phandle gpio1-specifier
+		     &gpio-controller2-phandle gpio2-specifier
+		     0 /* holes are permitted, means no GPIO 3 */
+		     &gpio-controller4-phandle gpio4-specifier
+		     ...>;
+
+Note that gpio-specifier length is controller dependent.
+
+gpio-specifier may encode: bank, pin position inside the bank,
+whether pin is open-drain and whether pin is logically inverted.
+
+Example of the node using GPIOs:
+
+	node {
+		gpios = <&qe_pio_e 18 0>;
+	};
+
+In this example gpio-specifier is "18 0" and encodes GPIO pin number,
+and empty GPIO flags as accepted by the "qe_pio_e" gpio-controller.
+
+2) gpio-controller nodes
+------------------------
+
+Every GPIO controller node must have #gpio-cells property defined,
+this information will be used to translate gpio-specifiers.
+
+Example of two SOC GPIO banks defined as gpio-controller nodes:
+
+	qe_pio_a: gpio-controller@1400 {
+		#gpio-cells = <2>;
+		compatible = "fsl,qe-pario-bank-a", "fsl,qe-pario-bank";
+		reg = <0x1400 0x18>;
+		gpio-controller;
+	};
+
+	qe_pio_e: gpio-controller@1460 {
+		#gpio-cells = <2>;
+		compatible = "fsl,qe-pario-bank-e", "fsl,qe-pario-bank";
+		reg = <0x1460 0x18>;
+		gpio-controller;
+	};
+
+
diff --git a/Documentation/powerpc/dts-bindings/gpio/led.txt b/Documentation/powerpc/dts-bindings/gpio/led.txt
index 4fe14deedc0..064db928c3c 100644
--- a/Documentation/powerpc/dts-bindings/gpio/led.txt
+++ b/Documentation/powerpc/dts-bindings/gpio/led.txt
@@ -16,10 +16,17 @@ LED sub-node properties:
   string defining the trigger assigned to the LED.  Current triggers are:
     "backlight" - LED will act as a back-light, controlled by the framebuffer
 		  system
-    "default-on" - LED will turn on
+    "default-on" - LED will turn on, but see "default-state" below
     "heartbeat" - LED "double" flashes at a load average based rate
     "ide-disk" - LED indicates disk activity
     "timer" - LED flashes at a fixed, configurable rate
+- default-state:  (optional) The initial state of the LED.  Valid
+  values are "on", "off", and "keep".  If the LED is already on or off
+  and the default-state property is set the to same value, then no
+  glitch should be produced where the LED momentarily turns off (or
+  on).  The "keep" setting will keep the LED at whatever its current
+  state is, without producing a glitch.  The default is off if this
+  property is not present.
 
 Examples:
 
@@ -30,14 +37,22 @@ leds {
 		gpios = <&mcu_pio 0 1>; /* Active low */
 		linux,default-trigger = "ide-disk";
 	};
+
+	fault {
+		gpios = <&mcu_pio 1 0>;
+		/* Keep LED on if BIOS detected hardware fault */
+		default-state = "keep";
+	};
 };
 
 run-control {
 	compatible = "gpio-leds";
 	red {
 		gpios = <&mpc8572 6 0>;
+		default-state = "off";
 	};
 	green {
 		gpios = <&mpc8572 7 0>;
+		default-state = "on";
 	};
 }
diff --git a/Documentation/powerpc/dts-bindings/gpio/mdio.txt b/Documentation/powerpc/dts-bindings/gpio/mdio.txt
new file mode 100644
index 00000000000..bc954952901
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/gpio/mdio.txt
@@ -0,0 +1,19 @@
+MDIO on GPIOs
+
+Currently defined compatibles:
+- virtual,gpio-mdio
+
+MDC and MDIO lines connected to GPIO controllers are listed in the
+gpios property as described in section VIII.1 in the following order:
+
+MDC, MDIO.
+
+Example:
+
+mdio {
+	compatible = "virtual,mdio-gpio";
+	#address-cells = <1>;
+	#size-cells = <0>;
+	gpios = <&qe_pio_a 11
+		 &qe_pio_c 6>;
+};
diff --git a/Documentation/powerpc/dts-bindings/marvell.txt b/Documentation/powerpc/dts-bindings/marvell.txt
new file mode 100644
index 00000000000..3708a2fd474
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/marvell.txt
@@ -0,0 +1,521 @@
+Marvell Discovery mv64[345]6x System Controller chips
+===========================================================
+
+The Marvell mv64[345]60 series of system controller chips contain
+many of the peripherals needed to implement a complete computer
+system.  In this section, we define device tree nodes to describe
+the system controller chip itself and each of the peripherals
+which it contains.  Compatible string values for each node are
+prefixed with the string "marvell,", for Marvell Technology Group Ltd.
+
+1) The /system-controller node
+
+  This node is used to represent the system-controller and must be
+  present when the system uses a system controller chip. The top-level
+  system-controller node contains information that is global to all
+  devices within the system controller chip. The node name begins
+  with "system-controller" followed by the unit address, which is
+  the base address of the memory-mapped register set for the system
+  controller chip.
+
+  Required properties:
+
+    - ranges : Describes the translation of system controller addresses
+      for memory mapped registers.
+    - clock-frequency: Contains the main clock frequency for the system
+      controller chip.
+    - reg : This property defines the address and size of the
+      memory-mapped registers contained within the system controller
+      chip.  The address specified in the "reg" property should match
+      the unit address of the system-controller node.
+    - #address-cells : Address representation for system controller
+      devices.  This field represents the number of cells needed to
+      represent the address of the memory-mapped registers of devices
+      within the system controller chip.
+    - #size-cells : Size representation for for the memory-mapped
+      registers within the system controller chip.
+    - #interrupt-cells : Defines the width of cells used to represent
+      interrupts.
+
+  Optional properties:
+
+    - model : The specific model of the system controller chip.  Such
+      as, "mv64360", "mv64460", or "mv64560".
+    - compatible : A string identifying the compatibility identifiers
+      of the system controller chip.
+
+  The system-controller node contains child nodes for each system
+  controller device that the platform uses.  Nodes should not be created
+  for devices which exist on the system controller chip but are not used
+
+  Example Marvell Discovery mv64360 system-controller node:
+
+    system-controller@f1000000 { /* Marvell Discovery mv64360 */
+	    #address-cells = <1>;
+	    #size-cells = <1>;
+	    model = "mv64360";                      /* Default */
+	    compatible = "marvell,mv64360";
+	    clock-frequency = <133333333>;
+	    reg = <0xf1000000 0x10000>;
+	    virtual-reg = <0xf1000000>;
+	    ranges = <0x88000000 0x88000000 0x1000000 /* PCI 0 I/O Space */
+		    0x80000000 0x80000000 0x8000000 /* PCI 0 MEM Space */
+		    0xa0000000 0xa0000000 0x4000000 /* User FLASH */
+		    0x00000000 0xf1000000 0x0010000 /* Bridge's regs */
+		    0xf2000000 0xf2000000 0x0040000>;/* Integrated SRAM */
+
+	    [ child node definitions... ]
+    }
+
+2) Child nodes of /system-controller
+
+   a) Marvell Discovery MDIO bus
+
+   The MDIO is a bus to which the PHY devices are connected.  For each
+   device that exists on this bus, a child node should be created.  See
+   the definition of the PHY node below for an example of how to define
+   a PHY.
+
+   Required properties:
+     - #address-cells : Should be <1>
+     - #size-cells : Should be <0>
+     - device_type : Should be "mdio"
+     - compatible : Should be "marvell,mv64360-mdio"
+
+   Example:
+
+     mdio {
+	     #address-cells = <1>;
+	     #size-cells = <0>;
+	     device_type = "mdio";
+	     compatible = "marvell,mv64360-mdio";
+
+	     ethernet-phy@0 {
+		     ......
+	     };
+     };
+
+
+   b) Marvell Discovery ethernet controller
+
+   The Discover ethernet controller is described with two levels
+   of nodes.  The first level describes an ethernet silicon block
+   and the second level describes up to 3 ethernet nodes within
+   that block.  The reason for the multiple levels is that the
+   registers for the node are interleaved within a single set
+   of registers.  The "ethernet-block" level describes the
+   shared register set, and the "ethernet" nodes describe ethernet
+   port-specific properties.
+
+   Ethernet block node
+
+   Required properties:
+     - #address-cells : <1>
+     - #size-cells : <0>
+     - compatible : "marvell,mv64360-eth-block"
+     - reg : Offset and length of the register set for this block
+
+   Example Discovery Ethernet block node:
+     ethernet-block@2000 {
+	     #address-cells = <1>;
+	     #size-cells = <0>;
+	     compatible = "marvell,mv64360-eth-block";
+	     reg = <0x2000 0x2000>;
+	     ethernet@0 {
+		     .......
+	     };
+     };
+
+   Ethernet port node
+
+   Required properties:
+     - device_type : Should be "network".
+     - compatible : Should be "marvell,mv64360-eth".
+     - reg : Should be <0>, <1>, or <2>, according to which registers
+       within the silicon block the device uses.
+     - interrupts : <a> where a is the interrupt number for the port.
+     - interrupt-parent : the phandle for the interrupt controller
+       that services interrupts for this device.
+     - phy : the phandle for the PHY connected to this ethernet
+       controller.
+     - local-mac-address : 6 bytes, MAC address
+
+   Example Discovery Ethernet port node:
+     ethernet@0 {
+	     device_type = "network";
+	     compatible = "marvell,mv64360-eth";
+	     reg = <0>;
+	     interrupts = <32>;
+	     interrupt-parent = <&PIC>;
+	     phy = <&PHY0>;
+	     local-mac-address = [ 00 00 00 00 00 00 ];
+     };
+
+
+
+   c) Marvell Discovery PHY nodes
+
+   Required properties:
+     - device_type : Should be "ethernet-phy"
+     - interrupts : <a> where a is the interrupt number for this phy.
+     - interrupt-parent : the phandle for the interrupt controller that
+       services interrupts for this device.
+     - reg : The ID number for the phy, usually a small integer
+
+   Example Discovery PHY node:
+     ethernet-phy@1 {
+	     device_type = "ethernet-phy";
+	     compatible = "broadcom,bcm5421";
+	     interrupts = <76>;      /* GPP 12 */
+	     interrupt-parent = <&PIC>;
+	     reg = <1>;
+     };
+
+
+   d) Marvell Discovery SDMA nodes
+
+   Represent DMA hardware associated with the MPSC (multiprotocol
+   serial controllers).
+
+   Required properties:
+     - compatible : "marvell,mv64360-sdma"
+     - reg : Offset and length of the register set for this device
+     - interrupts : <a> where a is the interrupt number for the DMA
+       device.
+     - interrupt-parent : the phandle for the interrupt controller
+       that services interrupts for this device.
+
+   Example Discovery SDMA node:
+     sdma@4000 {
+	     compatible = "marvell,mv64360-sdma";
+	     reg = <0x4000 0xc18>;
+	     virtual-reg = <0xf1004000>;
+	     interrupts = <36>;
+	     interrupt-parent = <&PIC>;
+     };
+
+
+   e) Marvell Discovery BRG nodes
+
+   Represent baud rate generator hardware associated with the MPSC
+   (multiprotocol serial controllers).
+
+   Required properties:
+     - compatible : "marvell,mv64360-brg"
+     - reg : Offset and length of the register set for this device
+     - clock-src : A value from 0 to 15 which selects the clock
+       source for the baud rate generator.  This value corresponds
+       to the CLKS value in the BRGx configuration register.  See
+       the mv64x60 User's Manual.
+     - clock-frequence : The frequency (in Hz) of the baud rate
+       generator's input clock.
+     - current-speed : The current speed setting (presumably by
+       firmware) of the baud rate generator.
+
+   Example Discovery BRG node:
+     brg@b200 {
+	     compatible = "marvell,mv64360-brg";
+	     reg = <0xb200 0x8>;
+	     clock-src = <8>;
+	     clock-frequency = <133333333>;
+	     current-speed = <9600>;
+     };
+
+
+   f) Marvell Discovery CUNIT nodes
+
+   Represent the Serial Communications Unit device hardware.
+
+   Required properties:
+     - reg : Offset and length of the register set for this device
+
+   Example Discovery CUNIT node:
+     cunit@f200 {
+	     reg = <0xf200 0x200>;
+     };
+
+
+   g) Marvell Discovery MPSCROUTING nodes
+
+   Represent the Discovery's MPSC routing hardware
+
+   Required properties:
+     - reg : Offset and length of the register set for this device
+
+   Example Discovery CUNIT node:
+     mpscrouting@b500 {
+	     reg = <0xb400 0xc>;
+     };
+
+
+   h) Marvell Discovery MPSCINTR nodes
+
+   Represent the Discovery's MPSC DMA interrupt hardware registers
+   (SDMA cause and mask registers).
+
+   Required properties:
+     - reg : Offset and length of the register set for this device
+
+   Example Discovery MPSCINTR node:
+     mpsintr@b800 {
+	     reg = <0xb800 0x100>;
+     };
+
+
+   i) Marvell Discovery MPSC nodes
+
+   Represent the Discovery's MPSC (Multiprotocol Serial Controller)
+   serial port.
+
+   Required properties:
+     - device_type : "serial"
+     - compatible : "marvell,mv64360-mpsc"
+     - reg : Offset and length of the register set for this device
+     - sdma : the phandle for the SDMA node used by this port
+     - brg : the phandle for the BRG node used by this port
+     - cunit : the phandle for the CUNIT node used by this port
+     - mpscrouting : the phandle for the MPSCROUTING node used by this port
+     - mpscintr : the phandle for the MPSCINTR node used by this port
+     - cell-index : the hardware index of this cell in the MPSC core
+     - max_idle : value needed for MPSC CHR3 (Maximum Frame Length)
+       register
+     - interrupts : <a> where a is the interrupt number for the MPSC.
+     - interrupt-parent : the phandle for the interrupt controller
+       that services interrupts for this device.
+
+   Example Discovery MPSCINTR node:
+     mpsc@8000 {
+	     device_type = "serial";
+	     compatible = "marvell,mv64360-mpsc";
+	     reg = <0x8000 0x38>;
+	     virtual-reg = <0xf1008000>;
+	     sdma = <&SDMA0>;
+	     brg = <&BRG0>;
+	     cunit = <&CUNIT>;
+	     mpscrouting = <&MPSCROUTING>;
+	     mpscintr = <&MPSCINTR>;
+	     cell-index = <0>;
+	     max_idle = <40>;
+	     interrupts = <40>;
+	     interrupt-parent = <&PIC>;
+     };
+
+
+   j) Marvell Discovery Watch Dog Timer nodes
+
+   Represent the Discovery's watchdog timer hardware
+
+   Required properties:
+     - compatible : "marvell,mv64360-wdt"
+     - reg : Offset and length of the register set for this device
+
+   Example Discovery Watch Dog Timer node:
+     wdt@b410 {
+	     compatible = "marvell,mv64360-wdt";
+	     reg = <0xb410 0x8>;
+     };
+
+
+   k) Marvell Discovery I2C nodes
+
+   Represent the Discovery's I2C hardware
+
+   Required properties:
+     - device_type : "i2c"
+     - compatible : "marvell,mv64360-i2c"
+     - reg : Offset and length of the register set for this device
+     - interrupts : <a> where a is the interrupt number for the I2C.
+     - interrupt-parent : the phandle for the interrupt controller
+       that services interrupts for this device.
+
+   Example Discovery I2C node:
+	     compatible = "marvell,mv64360-i2c";
+	     reg = <0xc000 0x20>;
+	     virtual-reg = <0xf100c000>;
+	     interrupts = <37>;
+	     interrupt-parent = <&PIC>;
+     };
+
+
+   l) Marvell Discovery PIC (Programmable Interrupt Controller) nodes
+
+   Represent the Discovery's PIC hardware
+
+   Required properties:
+     - #interrupt-cells : <1>
+     - #address-cells : <0>
+     - compatible : "marvell,mv64360-pic"
+     - reg : Offset and length of the register set for this device
+     - interrupt-controller
+
+   Example Discovery PIC node:
+     pic {
+	     #interrupt-cells = <1>;
+	     #address-cells = <0>;
+	     compatible = "marvell,mv64360-pic";
+	     reg = <0x0 0x88>;
+	     interrupt-controller;
+     };
+
+
+   m) Marvell Discovery MPP (Multipurpose Pins) multiplexing nodes
+
+   Represent the Discovery's MPP hardware
+
+   Required properties:
+     - compatible : "marvell,mv64360-mpp"
+     - reg : Offset and length of the register set for this device
+
+   Example Discovery MPP node:
+     mpp@f000 {
+	     compatible = "marvell,mv64360-mpp";
+	     reg = <0xf000 0x10>;
+     };
+
+
+   n) Marvell Discovery GPP (General Purpose Pins) nodes
+
+   Represent the Discovery's GPP hardware
+
+   Required properties:
+     - compatible : "marvell,mv64360-gpp"
+     - reg : Offset and length of the register set for this device
+
+   Example Discovery GPP node:
+     gpp@f000 {
+	     compatible = "marvell,mv64360-gpp";
+	     reg = <0xf100 0x20>;
+     };
+
+
+   o) Marvell Discovery PCI host bridge node
+
+   Represents the Discovery's PCI host bridge device.  The properties
+   for this node conform to Rev 2.1 of the PCI Bus Binding to IEEE
+   1275-1994.  A typical value for the compatible property is
+   "marvell,mv64360-pci".
+
+   Example Discovery PCI host bridge node
+     pci@80000000 {
+	     #address-cells = <3>;
+	     #size-cells = <2>;
+	     #interrupt-cells = <1>;
+	     device_type = "pci";
+	     compatible = "marvell,mv64360-pci";
+	     reg = <0xcf8 0x8>;
+	     ranges = <0x01000000 0x0        0x0
+			     0x88000000 0x0 0x01000000
+		       0x02000000 0x0 0x80000000
+			     0x80000000 0x0 0x08000000>;
+	     bus-range = <0 255>;
+	     clock-frequency = <66000000>;
+	     interrupt-parent = <&PIC>;
+	     interrupt-map-mask = <0xf800 0x0 0x0 0x7>;
+	     interrupt-map = <
+		     /* IDSEL 0x0a */
+		     0x5000 0 0 1 &PIC 80
+		     0x5000 0 0 2 &PIC 81
+		     0x5000 0 0 3 &PIC 91
+		     0x5000 0 0 4 &PIC 93
+
+		     /* IDSEL 0x0b */
+		     0x5800 0 0 1 &PIC 91
+		     0x5800 0 0 2 &PIC 93
+		     0x5800 0 0 3 &PIC 80
+		     0x5800 0 0 4 &PIC 81
+
+		     /* IDSEL 0x0c */
+		     0x6000 0 0 1 &PIC 91
+		     0x6000 0 0 2 &PIC 93
+		     0x6000 0 0 3 &PIC 80
+		     0x6000 0 0 4 &PIC 81
+
+		     /* IDSEL 0x0d */
+		     0x6800 0 0 1 &PIC 93
+		     0x6800 0 0 2 &PIC 80
+		     0x6800 0 0 3 &PIC 81
+		     0x6800 0 0 4 &PIC 91
+	     >;
+     };
+
+
+   p) Marvell Discovery CPU Error nodes
+
+   Represent the Discovery's CPU error handler device.
+
+   Required properties:
+     - compatible : "marvell,mv64360-cpu-error"
+     - reg : Offset and length of the register set for this device
+     - interrupts : the interrupt number for this device
+     - interrupt-parent : the phandle for the interrupt controller
+       that services interrupts for this device.
+
+   Example Discovery CPU Error node:
+     cpu-error@0070 {
+	     compatible = "marvell,mv64360-cpu-error";
+	     reg = <0x70 0x10 0x128 0x28>;
+	     interrupts = <3>;
+	     interrupt-parent = <&PIC>;
+     };
+
+
+   q) Marvell Discovery SRAM Controller nodes
+
+   Represent the Discovery's SRAM controller device.
+
+   Required properties:
+     - compatible : "marvell,mv64360-sram-ctrl"
+     - reg : Offset and length of the register set for this device
+     - interrupts : the interrupt number for this device
+     - interrupt-parent : the phandle for the interrupt controller
+       that services interrupts for this device.
+
+   Example Discovery SRAM Controller node:
+     sram-ctrl@0380 {
+	     compatible = "marvell,mv64360-sram-ctrl";
+	     reg = <0x380 0x80>;
+	     interrupts = <13>;
+	     interrupt-parent = <&PIC>;
+     };
+
+
+   r) Marvell Discovery PCI Error Handler nodes
+
+   Represent the Discovery's PCI error handler device.
+
+   Required properties:
+     - compatible : "marvell,mv64360-pci-error"
+     - reg : Offset and length of the register set for this device
+     - interrupts : the interrupt number for this device
+     - interrupt-parent : the phandle for the interrupt controller
+       that services interrupts for this device.
+
+   Example Discovery PCI Error Handler node:
+     pci-error@1d40 {
+	     compatible = "marvell,mv64360-pci-error";
+	     reg = <0x1d40 0x40 0xc28 0x4>;
+	     interrupts = <12>;
+	     interrupt-parent = <&PIC>;
+     };
+
+
+   s) Marvell Discovery Memory Controller nodes
+
+   Represent the Discovery's memory controller device.
+
+   Required properties:
+     - compatible : "marvell,mv64360-mem-ctrl"
+     - reg : Offset and length of the register set for this device
+     - interrupts : the interrupt number for this device
+     - interrupt-parent : the phandle for the interrupt controller
+       that services interrupts for this device.
+
+   Example Discovery Memory Controller node:
+     mem-ctrl@1400 {
+	     compatible = "marvell,mv64360-mem-ctrl";
+	     reg = <0x1400 0x60>;
+	     interrupts = <17>;
+	     interrupt-parent = <&PIC>;
+     };
+
+
diff --git a/Documentation/powerpc/dts-bindings/phy.txt b/Documentation/powerpc/dts-bindings/phy.txt
new file mode 100644
index 00000000000..bb8c742eb8c
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/phy.txt
@@ -0,0 +1,25 @@
+PHY nodes
+
+Required properties:
+
+ - device_type : Should be "ethernet-phy"
+ - interrupts : <a b> where a is the interrupt number and b is a
+   field that represents an encoding of the sense and level
+   information for the interrupt.  This should be encoded based on
+   the information in section 2) depending on the type of interrupt
+   controller you have.
+ - interrupt-parent : the phandle for the interrupt controller that
+   services interrupts for this device.
+ - reg : The ID number for the phy, usually a small integer
+ - linux,phandle :  phandle for this node; likely referenced by an
+   ethernet controller node.
+
+Example:
+
+ethernet-phy@0 {
+	linux,phandle = <2452000>
+	interrupt-parent = <40000>;
+	interrupts = <35 1>;
+	reg = <0>;
+	device_type = "ethernet-phy";
+};
diff --git a/Documentation/powerpc/dts-bindings/spi-bus.txt b/Documentation/powerpc/dts-bindings/spi-bus.txt
new file mode 100644
index 00000000000..e782add2e45
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/spi-bus.txt
@@ -0,0 +1,57 @@
+SPI (Serial Peripheral Interface) busses
+
+SPI busses can be described with a node for the SPI master device
+and a set of child nodes for each SPI slave on the bus.  For this
+discussion, it is assumed that the system's SPI controller is in
+SPI master mode.  This binding does not describe SPI controllers
+in slave mode.
+
+The SPI master node requires the following properties:
+- #address-cells  - number of cells required to define a chip select
+    		address on the SPI bus.
+- #size-cells     - should be zero.
+- compatible      - name of SPI bus controller following generic names
+    		recommended practice.
+No other properties are required in the SPI bus node.  It is assumed
+that a driver for an SPI bus device will understand that it is an SPI bus.
+However, the binding does not attempt to define the specific method for
+assigning chip select numbers.  Since SPI chip select configuration is
+flexible and non-standardized, it is left out of this binding with the
+assumption that board specific platform code will be used to manage
+chip selects.  Individual drivers can define additional properties to
+support describing the chip select layout.
+
+SPI slave nodes must be children of the SPI master node and can
+contain the following properties.
+- reg             - (required) chip select address of device.
+- compatible      - (required) name of SPI device following generic names
+    		recommended practice
+- spi-max-frequency - (required) Maximum SPI clocking speed of device in Hz
+- spi-cpol        - (optional) Empty property indicating device requires
+    		inverse clock polarity (CPOL) mode
+- spi-cpha        - (optional) Empty property indicating device requires
+    		shifted clock phase (CPHA) mode
+- spi-cs-high     - (optional) Empty property indicating device requires
+    		chip select active high
+
+SPI example for an MPC5200 SPI bus:
+	spi@f00 {
+		#address-cells = <1>;
+		#size-cells = <0>;
+		compatible = "fsl,mpc5200b-spi","fsl,mpc5200-spi";
+		reg = <0xf00 0x20>;
+		interrupts = <2 13 0 2 14 0>;
+		interrupt-parent = <&mpc5200_pic>;
+
+		ethernet-switch@0 {
+			compatible = "micrel,ks8995m";
+			spi-max-frequency = <1000000>;
+			reg = <0>;
+		};
+
+		codec@1 {
+			compatible = "ti,tlv320aic26";
+			spi-max-frequency = <100000>;
+			reg = <1>;
+		};
+	};
diff --git a/Documentation/powerpc/dts-bindings/usb-ehci.txt b/Documentation/powerpc/dts-bindings/usb-ehci.txt
new file mode 100644
index 00000000000..fa18612f757
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/usb-ehci.txt
@@ -0,0 +1,25 @@
+USB EHCI controllers
+
+Required properties:
+  - compatible : should be "usb-ehci".
+  - reg : should contain at least address and length of the standard EHCI
+    register set for the device. Optional platform-dependent registers
+    (debug-port or other) can be also specified here, but only after
+    definition of standard EHCI registers.
+  - interrupts : one EHCI interrupt should be described here.
+If device registers are implemented in big endian mode, the device
+node should have "big-endian-regs" property.
+If controller implementation operates with big endian descriptors,
+"big-endian-desc" property should be specified.
+If both big endian registers and descriptors are used by the controller
+implementation, "big-endian" property can be specified instead of having
+both "big-endian-regs" and "big-endian-desc".
+
+Example (Sequoia 440EPx):
+    ehci@e0000300 {
+	   compatible = "ibm,usb-ehci-440epx", "usb-ehci";
+	   interrupt-parent = <&UIC0>;
+	   interrupts = <1a 4>;
+	   reg = <0 e0000300 90 0 e0000390 70>;
+	   big-endian;
+   };
diff --git a/Documentation/powerpc/dts-bindings/xilinx.txt b/Documentation/powerpc/dts-bindings/xilinx.txt
new file mode 100644
index 00000000000..80339fe4300
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/xilinx.txt
@@ -0,0 +1,295 @@
+   d) Xilinx IP cores
+
+   The Xilinx EDK toolchain ships with a set of IP cores (devices) for use
+   in Xilinx Spartan and Virtex FPGAs.  The devices cover the whole range
+   of standard device types (network, serial, etc.) and miscellaneous
+   devices (gpio, LCD, spi, etc).  Also, since these devices are
+   implemented within the fpga fabric every instance of the device can be
+   synthesised with different options that change the behaviour.
+
+   Each IP-core has a set of parameters which the FPGA designer can use to
+   control how the core is synthesized.  Historically, the EDK tool would
+   extract the device parameters relevant to device drivers and copy them
+   into an 'xparameters.h' in the form of #define symbols.  This tells the
+   device drivers how the IP cores are configured, but it requres the kernel
+   to be recompiled every time the FPGA bitstream is resynthesized.
+
+   The new approach is to export the parameters into the device tree and
+   generate a new device tree each time the FPGA bitstream changes.  The
+   parameters which used to be exported as #defines will now become
+   properties of the device node.  In general, device nodes for IP-cores
+   will take the following form:
+
+	(name): (generic-name)@(base-address) {
+		compatible = "xlnx,(ip-core-name)-(HW_VER)"
+			     [, (list of compatible devices), ...];
+		reg = <(baseaddr) (size)>;
+		interrupt-parent = <&interrupt-controller-phandle>;
+		interrupts = < ... >;
+		xlnx,(parameter1) = "(string-value)";
+		xlnx,(parameter2) = <(int-value)>;
+	};
+
+	(generic-name):   an open firmware-style name that describes the
+			generic class of device.  Preferably, this is one word, such
+			as 'serial' or 'ethernet'.
+	(ip-core-name):	the name of the ip block (given after the BEGIN
+			directive in system.mhs).  Should be in lowercase
+			and all underscores '_' converted to dashes '-'.
+	(name):		is derived from the "PARAMETER INSTANCE" value.
+	(parameter#):	C_* parameters from system.mhs.  The C_ prefix is
+			dropped from the parameter name, the name is converted
+			to lowercase and all underscore '_' characters are
+			converted to dashes '-'.
+	(baseaddr):	the baseaddr parameter value (often named C_BASEADDR).
+	(HW_VER):	from the HW_VER parameter.
+	(size):		the address range size (often C_HIGHADDR - C_BASEADDR + 1).
+
+   Typically, the compatible list will include the exact IP core version
+   followed by an older IP core version which implements the same
+   interface or any other device with the same interface.
+
+   'reg', 'interrupt-parent' and 'interrupts' are all optional properties.
+
+   For example, the following block from system.mhs:
+
+	BEGIN opb_uartlite
+		PARAMETER INSTANCE = opb_uartlite_0
+		PARAMETER HW_VER = 1.00.b
+		PARAMETER C_BAUDRATE = 115200
+		PARAMETER C_DATA_BITS = 8
+		PARAMETER C_ODD_PARITY = 0
+		PARAMETER C_USE_PARITY = 0
+		PARAMETER C_CLK_FREQ = 50000000
+		PARAMETER C_BASEADDR = 0xEC100000
+		PARAMETER C_HIGHADDR = 0xEC10FFFF
+		BUS_INTERFACE SOPB = opb_7
+		PORT OPB_Clk = CLK_50MHz
+		PORT Interrupt = opb_uartlite_0_Interrupt
+		PORT RX = opb_uartlite_0_RX
+		PORT TX = opb_uartlite_0_TX
+		PORT OPB_Rst = sys_bus_reset_0
+	END
+
+   becomes the following device tree node:
+
+	opb_uartlite_0: serial@ec100000 {
+		device_type = "serial";
+		compatible = "xlnx,opb-uartlite-1.00.b";
+		reg = <ec100000 10000>;
+		interrupt-parent = <&opb_intc_0>;
+		interrupts = <1 0>; // got this from the opb_intc parameters
+		current-speed = <d#115200>;	// standard serial device prop
+		clock-frequency = <d#50000000>;	// standard serial device prop
+		xlnx,data-bits = <8>;
+		xlnx,odd-parity = <0>;
+		xlnx,use-parity = <0>;
+	};
+
+   Some IP cores actually implement 2 or more logical devices.  In
+   this case, the device should still describe the whole IP core with
+   a single node and add a child node for each logical device.  The
+   ranges property can be used to translate from parent IP-core to the
+   registers of each device.  In addition, the parent node should be
+   compatible with the bus type 'xlnx,compound', and should contain
+   #address-cells and #size-cells, as with any other bus.  (Note: this
+   makes the assumption that both logical devices have the same bus
+   binding.  If this is not true, then separate nodes should be used
+   for each logical device).  The 'cell-index' property can be used to
+   enumerate logical devices within an IP core.  For example, the
+   following is the system.mhs entry for the dual ps2 controller found
+   on the ml403 reference design.
+
+	BEGIN opb_ps2_dual_ref
+		PARAMETER INSTANCE = opb_ps2_dual_ref_0
+		PARAMETER HW_VER = 1.00.a
+		PARAMETER C_BASEADDR = 0xA9000000
+		PARAMETER C_HIGHADDR = 0xA9001FFF
+		BUS_INTERFACE SOPB = opb_v20_0
+		PORT Sys_Intr1 = ps2_1_intr
+		PORT Sys_Intr2 = ps2_2_intr
+		PORT Clkin1 = ps2_clk_rx_1
+		PORT Clkin2 = ps2_clk_rx_2
+		PORT Clkpd1 = ps2_clk_tx_1
+		PORT Clkpd2 = ps2_clk_tx_2
+		PORT Rx1 = ps2_d_rx_1
+		PORT Rx2 = ps2_d_rx_2
+		PORT Txpd1 = ps2_d_tx_1
+		PORT Txpd2 = ps2_d_tx_2
+	END
+
+   It would result in the following device tree nodes:
+
+	opb_ps2_dual_ref_0: opb-ps2-dual-ref@a9000000 {
+		#address-cells = <1>;
+		#size-cells = <1>;
+		compatible = "xlnx,compound";
+		ranges = <0 a9000000 2000>;
+		// If this device had extra parameters, then they would
+		// go here.
+		ps2@0 {
+			compatible = "xlnx,opb-ps2-dual-ref-1.00.a";
+			reg = <0 40>;
+			interrupt-parent = <&opb_intc_0>;
+			interrupts = <3 0>;
+			cell-index = <0>;
+		};
+		ps2@1000 {
+			compatible = "xlnx,opb-ps2-dual-ref-1.00.a";
+			reg = <1000 40>;
+			interrupt-parent = <&opb_intc_0>;
+			interrupts = <3 0>;
+			cell-index = <0>;
+		};
+	};
+
+   Also, the system.mhs file defines bus attachments from the processor
+   to the devices.  The device tree structure should reflect the bus
+   attachments.  Again an example; this system.mhs fragment:
+
+	BEGIN ppc405_virtex4
+		PARAMETER INSTANCE = ppc405_0
+		PARAMETER HW_VER = 1.01.a
+		BUS_INTERFACE DPLB = plb_v34_0
+		BUS_INTERFACE IPLB = plb_v34_0
+	END
+
+	BEGIN opb_intc
+		PARAMETER INSTANCE = opb_intc_0
+		PARAMETER HW_VER = 1.00.c
+		PARAMETER C_BASEADDR = 0xD1000FC0
+		PARAMETER C_HIGHADDR = 0xD1000FDF
+		BUS_INTERFACE SOPB = opb_v20_0
+	END
+
+	BEGIN opb_uart16550
+		PARAMETER INSTANCE = opb_uart16550_0
+		PARAMETER HW_VER = 1.00.d
+		PARAMETER C_BASEADDR = 0xa0000000
+		PARAMETER C_HIGHADDR = 0xa0001FFF
+		BUS_INTERFACE SOPB = opb_v20_0
+	END
+
+	BEGIN plb_v34
+		PARAMETER INSTANCE = plb_v34_0
+		PARAMETER HW_VER = 1.02.a
+	END
+
+	BEGIN plb_bram_if_cntlr
+		PARAMETER INSTANCE = plb_bram_if_cntlr_0
+		PARAMETER HW_VER = 1.00.b
+		PARAMETER C_BASEADDR = 0xFFFF0000
+		PARAMETER C_HIGHADDR = 0xFFFFFFFF
+		BUS_INTERFACE SPLB = plb_v34_0
+	END
+
+	BEGIN plb2opb_bridge
+		PARAMETER INSTANCE = plb2opb_bridge_0
+		PARAMETER HW_VER = 1.01.a
+		PARAMETER C_RNG0_BASEADDR = 0x20000000
+		PARAMETER C_RNG0_HIGHADDR = 0x3FFFFFFF
+		PARAMETER C_RNG1_BASEADDR = 0x60000000
+		PARAMETER C_RNG1_HIGHADDR = 0x7FFFFFFF
+		PARAMETER C_RNG2_BASEADDR = 0x80000000
+		PARAMETER C_RNG2_HIGHADDR = 0xBFFFFFFF
+		PARAMETER C_RNG3_BASEADDR = 0xC0000000
+		PARAMETER C_RNG3_HIGHADDR = 0xDFFFFFFF
+		BUS_INTERFACE SPLB = plb_v34_0
+		BUS_INTERFACE MOPB = opb_v20_0
+	END
+
+   Gives this device tree (some properties removed for clarity):
+
+	plb@0 {
+		#address-cells = <1>;
+		#size-cells = <1>;
+		compatible = "xlnx,plb-v34-1.02.a";
+		device_type = "ibm,plb";
+		ranges; // 1:1 translation
+
+		plb_bram_if_cntrl_0: bram@ffff0000 {
+			reg = <ffff0000 10000>;
+		}
+
+		opb@20000000 {
+			#address-cells = <1>;
+			#size-cells = <1>;
+			ranges = <20000000 20000000 20000000
+				  60000000 60000000 20000000
+				  80000000 80000000 40000000
+				  c0000000 c0000000 20000000>;
+
+			opb_uart16550_0: serial@a0000000 {
+				reg = <a00000000 2000>;
+			};
+
+			opb_intc_0: interrupt-controller@d1000fc0 {
+				reg = <d1000fc0 20>;
+			};
+		};
+	};
+
+   That covers the general approach to binding xilinx IP cores into the
+   device tree.  The following are bindings for specific devices:
+
+      i) Xilinx ML300 Framebuffer
+
+      Simple framebuffer device from the ML300 reference design (also on the
+      ML403 reference design as well as others).
+
+      Optional properties:
+       - resolution = <xres yres> : pixel resolution of framebuffer.  Some
+                                    implementations use a different resolution.
+                                    Default is <d#640 d#480>
+       - virt-resolution = <xvirt yvirt> : Size of framebuffer in memory.
+                                           Default is <d#1024 d#480>.
+       - rotate-display (empty) : rotate display 180 degrees.
+
+      ii) Xilinx SystemACE
+
+      The Xilinx SystemACE device is used to program FPGAs from an FPGA
+      bitstream stored on a CF card.  It can also be used as a generic CF
+      interface device.
+
+      Optional properties:
+       - 8-bit (empty) : Set this property for SystemACE in 8 bit mode
+
+      iii) Xilinx EMAC and Xilinx TEMAC
+
+      Xilinx Ethernet devices.  In addition to general xilinx properties
+      listed above, nodes for these devices should include a phy-handle
+      property, and may include other common network device properties
+      like local-mac-address.
+
+      iv) Xilinx Uartlite
+
+      Xilinx uartlite devices are simple fixed speed serial ports.
+
+      Required properties:
+       - current-speed : Baud rate of uartlite
+
+      v) Xilinx hwicap
+
+		Xilinx hwicap devices provide access to the configuration logic
+		of the FPGA through the Internal Configuration Access Port
+		(ICAP).  The ICAP enables partial reconfiguration of the FPGA,
+		readback of the configuration information, and some control over
+		'warm boots' of the FPGA fabric.
+
+		Required properties:
+		- xlnx,family : The family of the FPGA, necessary since the
+                      capabilities of the underlying ICAP hardware
+                      differ between different families.  May be
+                      'virtex2p', 'virtex4', or 'virtex5'.
+
+      vi) Xilinx Uart 16550
+
+      Xilinx UART 16550 devices are very similar to the NS16550 but with
+      different register spacing and an offset from the base address.
+
+      Required properties:
+       - clock-frequency : Frequency of the clock input
+       - reg-offset : A value of 3 is required
+       - reg-shift : A value of 2 is required
+
+
diff --git a/Documentation/pps/pps.txt b/Documentation/pps/pps.txt
new file mode 100644
index 00000000000..125f4ab4899
--- /dev/null
+++ b/Documentation/pps/pps.txt
@@ -0,0 +1,172 @@
+
+			PPS - Pulse Per Second
+			----------------------
+
+(C) Copyright 2007 Rodolfo Giometti <giometti@enneenne.com>
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+
+
+Overview
+--------
+
+LinuxPPS provides a programming interface (API) to define in the
+system several PPS sources.
+
+PPS means "pulse per second" and a PPS source is just a device which
+provides a high precision signal each second so that an application
+can use it to adjust system clock time.
+
+A PPS source can be connected to a serial port (usually to the Data
+Carrier Detect pin) or to a parallel port (ACK-pin) or to a special
+CPU's GPIOs (this is the common case in embedded systems) but in each
+case when a new pulse arrives the system must apply to it a timestamp
+and record it for userland.
+
+Common use is the combination of the NTPD as userland program, with a
+GPS receiver as PPS source, to obtain a wallclock-time with
+sub-millisecond synchronisation to UTC.
+
+
+RFC considerations
+------------------
+
+While implementing a PPS API as RFC 2783 defines and using an embedded
+CPU GPIO-Pin as physical link to the signal, I encountered a deeper
+problem:
+
+   At startup it needs a file descriptor as argument for the function
+   time_pps_create().
+
+This implies that the source has a /dev/... entry. This assumption is
+ok for the serial and parallel port, where you can do something
+useful besides(!) the gathering of timestamps as it is the central
+task for a PPS-API. But this assumption does not work for a single
+purpose GPIO line. In this case even basic file-related functionality
+(like read() and write()) makes no sense at all and should not be a
+precondition for the use of a PPS-API.
+
+The problem can be simply solved if you consider that a PPS source is
+not always connected with a GPS data source.
+
+So your programs should check if the GPS data source (the serial port
+for instance) is a PPS source too, and if not they should provide the
+possibility to open another device as PPS source.
+
+In LinuxPPS the PPS sources are simply char devices usually mapped
+into files /dev/pps0, /dev/pps1, etc..
+
+
+Coding example
+--------------
+
+To register a PPS source into the kernel you should define a struct
+pps_source_info_s as follows:
+
+    static struct pps_source_info pps_ktimer_info = {
+	    .name         = "ktimer",
+	    .path         = "",
+	    .mode         = PPS_CAPTUREASSERT | PPS_OFFSETASSERT | \
+			    PPS_ECHOASSERT | \
+			    PPS_CANWAIT | PPS_TSFMT_TSPEC,
+	    .echo         = pps_ktimer_echo,
+	    .owner        = THIS_MODULE,
+    };
+
+and then calling the function pps_register_source() in your
+intialization routine as follows:
+
+    source = pps_register_source(&pps_ktimer_info,
+			PPS_CAPTUREASSERT | PPS_OFFSETASSERT);
+
+The pps_register_source() prototype is:
+
+  int pps_register_source(struct pps_source_info_s *info, int default_params)
+
+where "info" is a pointer to a structure that describes a particular
+PPS source, "default_params" tells the system what the initial default
+parameters for the device should be (it is obvious that these parameters
+must be a subset of ones defined in the struct
+pps_source_info_s which describe the capabilities of the driver).
+
+Once you have registered a new PPS source into the system you can
+signal an assert event (for example in the interrupt handler routine)
+just using:
+
+    pps_event(source, &ts, PPS_CAPTUREASSERT, ptr)
+
+where "ts" is the event's timestamp.
+
+The same function may also run the defined echo function
+(pps_ktimer_echo(), passing to it the "ptr" pointer) if the user
+asked for that... etc..
+
+Please see the file drivers/pps/clients/ktimer.c for example code.
+
+
+SYSFS support
+-------------
+
+If the SYSFS filesystem is enabled in the kernel it provides a new class:
+
+   $ ls /sys/class/pps/
+   pps0/  pps1/  pps2/
+
+Every directory is the ID of a PPS sources defined in the system and
+inside you find several files:
+
+   $ ls /sys/class/pps/pps0/
+   assert	clear  echo  mode  name  path  subsystem@  uevent
+
+Inside each "assert" and "clear" file you can find the timestamp and a
+sequence number:
+
+   $ cat /sys/class/pps/pps0/assert
+   1170026870.983207967#8
+
+Where before the "#" is the timestamp in seconds; after it is the
+sequence number. Other files are:
+
+* echo: reports if the PPS source has an echo function or not;
+
+* mode: reports available PPS functioning modes;
+
+* name: reports the PPS source's name;
+
+* path: reports the PPS source's device path, that is the device the
+  PPS source is connected to (if it exists).
+
+
+Testing the PPS support
+-----------------------
+
+In order to test the PPS support even without specific hardware you can use
+the ktimer driver (see the client subsection in the PPS configuration menu)
+and the userland tools provided into Documentaion/pps/ directory.
+
+Once you have enabled the compilation of ktimer just modprobe it (if
+not statically compiled):
+
+   # modprobe ktimer
+
+and the run ppstest as follow:
+
+   $ ./ppstest /dev/pps0
+   trying PPS source "/dev/pps1"
+   found PPS source "/dev/pps1"
+   ok, found 1 source(s), now start fetching data...
+   source 0 - assert 1186592699.388832443, sequence: 364 - clear  0.000000000, sequence: 0
+   source 0 - assert 1186592700.388931295, sequence: 365 - clear  0.000000000, sequence: 0
+   source 0 - assert 1186592701.389032765, sequence: 366 - clear  0.000000000, sequence: 0
+
+Please, note that to compile userland programs you need the file timepps.h
+(see Documentation/pps/).
diff --git a/Documentation/rfkill.txt b/Documentation/rfkill.txt
index 1b74b5f30af..b4860509c31 100644
--- a/Documentation/rfkill.txt
+++ b/Documentation/rfkill.txt
@@ -3,9 +3,8 @@ rfkill - RF kill switch support
 
 1. Introduction
 2. Implementation details
-3. Kernel driver guidelines
-4. Kernel API
-5. Userspace support
+3. Kernel API
+4. Userspace support
 
 
 1. Introduction
@@ -19,82 +18,62 @@ disable all transmitters of a certain type (or all). This is intended for
 situations where transmitters need to be turned off, for example on
 aircraft.
 
+The rfkill subsystem has a concept of "hard" and "soft" block, which
+differ little in their meaning (block == transmitters off) but rather in
+whether they can be changed or not:
+ - hard block: read-only radio block that cannot be overriden by software
+ - soft block: writable radio block (need not be readable) that is set by
+               the system software.
 
 
 2. Implementation details
 
-The rfkill subsystem is composed of various components: the rfkill class, the
-rfkill-input module (an input layer handler), and some specific input layer
-events.
-
-The rfkill class is provided for kernel drivers to register their radio
-transmitter with the kernel, provide methods for turning it on and off and,
-optionally, letting the system know about hardware-disabled states that may
-be implemented on the device. This code is enabled with the CONFIG_RFKILL
-Kconfig option, which drivers can "select".
-
-The rfkill class code also notifies userspace of state changes, this is
-achieved via uevents. It also provides some sysfs files for userspace to
-check the status of radio transmitters. See the "Userspace support" section
-below.
+The rfkill subsystem is composed of three main components:
+ * the rfkill core,
+ * the deprecated rfkill-input module (an input layer handler, being
+   replaced by userspace policy code) and
+ * the rfkill drivers.
 
+The rfkill core provides API for kernel drivers to register their radio
+transmitter with the kernel, methods for turning it on and off and, letting
+the system know about hardware-disabled states that may be implemented on
+the device.
 
-The rfkill-input code implements a basic response to rfkill buttons -- it
-implements turning on/off all devices of a certain class (or all).
+The rfkill core code also notifies userspace of state changes, and provides
+ways for userspace to query the current states. See the "Userspace support"
+section below.
 
 When the device is hard-blocked (either by a call to rfkill_set_hw_state()
-or from query_hw_block) set_block() will be invoked but drivers can well
-ignore the method call since they can use the return value of the function
-rfkill_set_hw_state() to sync the software state instead of keeping track
-of calls to set_block().
-
-
-The entire functionality is spread over more than one subsystem:
-
- * The kernel input layer generates KEY_WWAN, KEY_WLAN etc. and
-   SW_RFKILL_ALL -- when the user presses a button. Drivers for radio
-   transmitters generally do not register to the input layer, unless the
-   device really provides an input device (i.e. a button that has no
-   effect other than generating a button press event)
-
- * The rfkill-input code hooks up to these events and switches the soft-block
-   of the various radio transmitters, depending on the button type.
-
- * The rfkill drivers turn off/on their transmitters as requested.
-
- * The rfkill class will generate userspace notifications (uevents) to tell
-   userspace what the current state is.
+or from query_hw_block) set_block() will be invoked for additional software
+block, but drivers can ignore the method call since they can use the return
+value of the function rfkill_set_hw_state() to sync the software state
+instead of keeping track of calls to set_block(). In fact, drivers should
+use the return value of rfkill_set_hw_state() unless the hardware actually
+keeps track of soft and hard block separately.
 
 
+3. Kernel API
 
-3. Kernel driver guidelines
 
-
-Drivers for radio transmitters normally implement only the rfkill class.
-These drivers may not unblock the transmitter based on own decisions, they
-should act on information provided by the rfkill class only.
+Drivers for radio transmitters normally implement an rfkill driver.
 
 Platform drivers might implement input devices if the rfkill button is just
 that, a button. If that button influences the hardware then you need to
-implement an rfkill class instead. This also applies if the platform provides
+implement an rfkill driver instead. This also applies if the platform provides
 a way to turn on/off the transmitter(s).
 
-During suspend/hibernation, transmitters should only be left enabled when
-wake-on wlan or similar functionality requires it and the device wasn't
-blocked before suspend/hibernate. Note that it may be necessary to update
-the rfkill subsystem's idea of what the current state is at resume time if
-the state may have changed over suspend.
-
+For some platforms, it is possible that the hardware state changes during
+suspend/hibernation, in which case it will be necessary to update the rfkill
+core with the current state is at resume time.
 
+To create an rfkill driver, driver's Kconfig needs to have
 
-4. Kernel API
+	depends on RFKILL || !RFKILL
 
-To build a driver with rfkill subsystem support, the driver should depend on
-(or select) the Kconfig symbol RFKILL.
-
-The hardware the driver talks to may be write-only (where the current state
-of the hardware is unknown), or read-write (where the hardware can be queried
-about its current state).
+to ensure the driver cannot be built-in when rfkill is modular. The !RFKILL
+case allows the driver to be built when rfkill is not configured, which which
+case all rfkill API can still be used but will be provided by static inlines
+which compile to almost nothing.
 
 Calling rfkill_set_hw_state() when a state change happens is required from
 rfkill drivers that control devices that can be hard-blocked unless they also
@@ -105,10 +84,35 @@ device). Don't do this unless you cannot get the event in any other way.
 
 5. Userspace support
 
-The following sysfs entries exist for every rfkill device:
+The recommended userspace interface to use is /dev/rfkill, which is a misc
+character device that allows userspace to obtain and set the state of rfkill
+devices and sets of devices. It also notifies userspace about device addition
+and removal. The API is a simple read/write API that is defined in
+linux/rfkill.h, with one ioctl that allows turning off the deprecated input
+handler in the kernel for the transition period.
+
+Except for the one ioctl, communication with the kernel is done via read()
+and write() of instances of 'struct rfkill_event'. In this structure, the
+soft and hard block are properly separated (unlike sysfs, see below) and
+userspace is able to get a consistent snapshot of all rfkill devices in the
+system. Also, it is possible to switch all rfkill drivers (or all drivers of
+a specified type) into a state which also updates the default state for
+hotplugged devices.
+
+After an application opens /dev/rfkill, it can read the current state of
+all devices, and afterwards can poll the descriptor for hotplug or state
+change events.
+
+Applications must ignore operations (the "op" field) they do not handle,
+this allows the API to be extended in the future.
+
+Additionally, each rfkill device is registered in sysfs and there has the
+following attributes:
 
 	name: Name assigned by driver to this key (interface or driver name).
-	type: Name of the key type ("wlan", "bluetooth", etc).
+	type: Driver type string ("wlan", "bluetooth", etc).
+	persistent: Whether the soft blocked state is initialised from
+	            non-volatile storage at startup.
 	state: Current state of the transmitter
 		0: RFKILL_STATE_SOFT_BLOCKED
 			transmitter is turned off by software
@@ -117,7 +121,12 @@ The following sysfs entries exist for every rfkill device:
 		2: RFKILL_STATE_HARD_BLOCKED
 			transmitter is forced off by something outside of
 			the driver's control.
-	claim: 0: Kernel handles events (currently always reads that value)
+	       This file is deprecated because it can only properly show
+	       three of the four possible states, soft-and-hard-blocked is
+	       missing.
+	claim: 0: Kernel handles events
+	       This file is deprecated because there no longer is a way to
+	       claim just control over a single rfkill instance.
 
 rfkill devices also issue uevents (with an action of "change"), with the
 following environment variables set:
@@ -128,9 +137,3 @@ RFKILL_TYPE
 
 The contents of these variables corresponds to the "name", "state" and
 "type" sysfs files explained above.
-
-An alternative userspace interface exists as a misc device /dev/rfkill,
-which allows userspace to obtain and set the state of rfkill devices and
-sets of devices. It also notifies userspace about device addition and
-removal. The API is a simple read/write API that is defined in
-linux/rfkill.h.
diff --git a/Documentation/robust-futex-ABI.txt b/Documentation/robust-futex-ABI.txt
index 535f69fab45..fd1cd8aae4e 100644
--- a/Documentation/robust-futex-ABI.txt
+++ b/Documentation/robust-futex-ABI.txt
@@ -135,7 +135,7 @@ manipulating this list), the user code must observe the following
 protocol on 'lock entry' insertion and removal:
 
 On insertion:
- 1) set the 'list_op_pending' word to the address of the 'lock word'
+ 1) set the 'list_op_pending' word to the address of the 'lock entry'
     to be inserted,
  2) acquire the futex lock,
  3) add the lock entry, with its thread id (TID) in the bottom 29 bits
@@ -143,7 +143,7 @@ On insertion:
  4) clear the 'list_op_pending' word.
 
 On removal:
- 1) set the 'list_op_pending' word to the address of the 'lock word'
+ 1) set the 'list_op_pending' word to the address of the 'lock entry'
     to be removed,
  2) remove the lock entry for this lock from the 'head' list,
  2) release the futex lock, and
diff --git a/Documentation/scheduler/sched-rt-group.txt b/Documentation/scheduler/sched-rt-group.txt
index 1df7f9cdab0..86eabe6c341 100644
--- a/Documentation/scheduler/sched-rt-group.txt
+++ b/Documentation/scheduler/sched-rt-group.txt
@@ -73,7 +73,7 @@ The remaining CPU time will be used for user input and other tasks. Because
 realtime tasks have explicitly allocated the CPU time they need to perform
 their tasks, buffer underruns in the graphics or audio can be eliminated.
 
-NOTE: the above example is not fully implemented as of yet (2.6.25). We still
+NOTE: the above example is not fully implemented yet. We still
 lack an EDF scheduler to make non-uniform periods usable.
 
 
@@ -140,14 +140,15 @@ The other option is:
 
 .o CONFIG_CGROUP_SCHED (aka "Basis for grouping tasks" = "Control groups")
 
-This uses the /cgroup virtual file system and "/cgroup/<cgroup>/cpu.rt_runtime_us"
-to control the CPU time reserved for each control group instead.
+This uses the /cgroup virtual file system and
+"/cgroup/<cgroup>/cpu.rt_runtime_us" to control the CPU time reserved for each
+control group instead.
 
 For more information on working with control groups, you should read
 Documentation/cgroups/cgroups.txt as well.
 
-Group settings are checked against the following limits in order to keep the configuration
-schedulable:
+Group settings are checked against the following limits in order to keep the
+configuration schedulable:
 
    \Sum_{i} runtime_{i} / global_period <= global_runtime / global_period
 
@@ -189,7 +190,7 @@ Implementing SCHED_EDF might take a while to complete. Priority Inheritance is
 the biggest challenge as the current linux PI infrastructure is geared towards
 the limited static priority levels 0-99. With deadline scheduling you need to
 do deadline inheritance (since priority is inversely proportional to the
-deadline delta (deadline - now).
+deadline delta (deadline - now)).
 
 This means the whole PI machinery will have to be reworked - and that is one of
 the most complex pieces of code we have.
diff --git a/Documentation/scsi/scsi_fc_transport.txt b/Documentation/scsi/scsi_fc_transport.txt
index e5b071d4661..d7f181701dc 100644
--- a/Documentation/scsi/scsi_fc_transport.txt
+++ b/Documentation/scsi/scsi_fc_transport.txt
@@ -1,10 +1,11 @@
                              SCSI FC Tansport
                  =============================================
 
-Date:  4/12/2007
+Date:  11/18/2008
 Kernel Revisions for features:
   rports : <<TBS>>
-  vports : 2.6.22 (? TBD)
+  vports : 2.6.22
+  bsg support : 2.6.30 (?TBD?)
 
 
 Introduction
@@ -15,6 +16,7 @@ The FC transport can be found at:
   drivers/scsi/scsi_transport_fc.c
   include/scsi/scsi_transport_fc.h
   include/scsi/scsi_netlink_fc.h
+  include/scsi/scsi_bsg_fc.h
 
 This file is found at Documentation/scsi/scsi_fc_transport.txt
 
@@ -472,6 +474,14 @@ int
 fc_vport_terminate(struct fc_vport *vport)
 
 
+FC BSG support (CT & ELS passthru, and more)
+========================================================================
+<< To Be Supplied >>
+
+
+
+
+
 Credits
 =======
 The following people have contributed to this document:
diff --git a/Documentation/scsi/scsi_mid_low_api.txt b/Documentation/scsi/scsi_mid_low_api.txt
index a6d5354639b..de67229251d 100644
--- a/Documentation/scsi/scsi_mid_low_api.txt
+++ b/Documentation/scsi/scsi_mid_low_api.txt
@@ -1271,6 +1271,11 @@ of interest:
     hostdata[0]  - area reserved for LLD at end of struct Scsi_Host. Size
                    is set by the second argument (named 'xtr_bytes') to
                    scsi_host_alloc() or scsi_register().
+    vendor_id    - a unique value that identifies the vendor supplying
+                   the LLD for the Scsi_Host.  Used most often in validating
+                   vendor-specific message requests.  Value consists of an
+                   identifier type and a vendor-specific value.
+                   See scsi_netlink.h for a description of valid formats.
 
 The scsi_host structure is defined in include/scsi/scsi_host.h
 
diff --git a/Documentation/sound/alsa/HD-Audio-Models.txt b/Documentation/sound/alsa/HD-Audio-Models.txt
index de8e10a9410..939a3dd5814 100644
--- a/Documentation/sound/alsa/HD-Audio-Models.txt
+++ b/Documentation/sound/alsa/HD-Audio-Models.txt
@@ -139,6 +139,7 @@ ALC883/888
   acer		Acer laptops (Travelmate 3012WTMi, Aspire 5600, etc)
   acer-aspire	Acer Aspire 9810
   acer-aspire-4930g Acer Aspire 4930G
+  acer-aspire-6530g Acer Aspire 6530G
   acer-aspire-8930g Acer Aspire 8930G
   medion	Medion Laptops
   medion-md2	Medion MD2
@@ -239,6 +240,7 @@ AD1986A
   laptop-automute 2-channel with EAPD and HP-automute (Lenovo N100)
   ultra		2-channel with EAPD (Samsung Ultra tablet PC)
   samsung	2-channel with EAPD (Samsung R65)
+  samsung-p50	2-channel with HP-automute (Samsung P50)
 
 AD1988/AD1988B/AD1989A/AD1989B
 ==============================
diff --git a/Documentation/sound/alsa/Procfile.txt b/Documentation/sound/alsa/Procfile.txt
index 381908d8ca4..719a819f8cc 100644
--- a/Documentation/sound/alsa/Procfile.txt
+++ b/Documentation/sound/alsa/Procfile.txt
@@ -101,6 +101,8 @@ card*/pcm*/xrun_debug
 	  bit 0 = Enable XRUN/jiffies debug messages
 	  bit 1 = Show stack trace at XRUN / jiffies check
 	  bit 2 = Enable additional jiffies check
+	  bit 3 = Log hwptr update at each period interrupt
+	  bit 4 = Log hwptr update at each snd_pcm_update_hw_ptr()
 
 	When the bit 0 is set, the driver will show the messages to
 	kernel log when an xrun is detected.  The debug message is
@@ -117,6 +119,9 @@ card*/pcm*/xrun_debug
 	buggy) hardware that doesn't give smooth pointer updates.
 	This feature is enabled via the bit 2.
 
+	Bits 3 and 4 are for logging the hwptr records.  Note that
+	these will give flood of kernel messages.
+
 card*/pcm*/sub*/info
 	The general information of this PCM sub-stream.
 
diff --git a/Documentation/spi/spidev_test.c b/Documentation/spi/spidev_test.c
index cf0e3ce0d52..c1a5aad3c75 100644
--- a/Documentation/spi/spidev_test.c
+++ b/Documentation/spi/spidev_test.c
@@ -99,11 +99,13 @@ void parse_opts(int argc, char *argv[])
 			{ "lsb",     0, 0, 'L' },
 			{ "cs-high", 0, 0, 'C' },
 			{ "3wire",   0, 0, '3' },
+			{ "no-cs",   0, 0, 'N' },
+			{ "ready",   0, 0, 'R' },
 			{ NULL, 0, 0, 0 },
 		};
 		int c;
 
-		c = getopt_long(argc, argv, "D:s:d:b:lHOLC3", lopts, NULL);
+		c = getopt_long(argc, argv, "D:s:d:b:lHOLC3NR", lopts, NULL);
 
 		if (c == -1)
 			break;
@@ -139,6 +141,12 @@ void parse_opts(int argc, char *argv[])
 		case '3':
 			mode |= SPI_3WIRE;
 			break;
+		case 'N':
+			mode |= SPI_NO_CS;
+			break;
+		case 'R':
+			mode |= SPI_READY;
+			break;
 		default:
 			print_usage(argv[0]);
 			break;
diff --git a/Documentation/sysrq.txt b/Documentation/sysrq.txt
index cf42b820ff9..d56a0177542 100644
--- a/Documentation/sysrq.txt
+++ b/Documentation/sysrq.txt
@@ -66,7 +66,8 @@ On all -  write a character to /proc/sysrq-trigger.  e.g.:
 'b'     - Will immediately reboot the system without syncing or unmounting
           your disks.
 
-'c'	- Will perform a kexec reboot in order to take a crashdump.
+'c'	- Will perform a system crash by a NULL pointer dereference.
+          A crashdump will be taken if configured.
 
 'd'	- Shows all locks that are held.
 
@@ -141,8 +142,8 @@ useful when you want to exit a program that will not let you switch consoles.
 re'B'oot is good when you're unable to shut down. But you should also 'S'ync
 and 'U'mount first.
 
-'C'rashdump can be used to manually trigger a crashdump when the system is hung.
-The kernel needs to have been built with CONFIG_KEXEC enabled.
+'C'rash can be used to manually trigger a crashdump when the system is hung.
+Note that this just triggers a crash if there is no dump mechanism available.
 
 'S'ync is great when your system is locked up, it allows you to sync your
 disks and will certainly lessen the chance of data loss and fscking. Note
diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
index f157d7594ea..2bcc8d4dea2 100644
--- a/Documentation/trace/events.txt
+++ b/Documentation/trace/events.txt
@@ -83,6 +83,15 @@ When reading one of these enable files, there are four results:
  X - there is a mixture of events enabled and disabled
  ? - this file does not affect any event
 
+2.3 Boot option
+---------------
+
+In order to facilitate early boot debugging, use boot option:
+
+	trace_event=[event-list]
+
+The format of this boot option is the same as described in section 2.1.
+
 3. Defining an event-enabled tracepoint
 =======================================
 
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index a39b3c749de..355d0f1f8c5 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -85,26 +85,19 @@ of ftrace. Here is a list of some of the key files:
 	This file holds the output of the trace in a human
 	readable format (described below).
 
-  latency_trace:
-
-	This file shows the same trace but the information
-	is organized more to display possible latencies
-	in the system (described below).
-
   trace_pipe:
 
 	The output is the same as the "trace" file but this
 	file is meant to be streamed with live tracing.
-	Reads from this file will block until new data
-	is retrieved. Unlike the "trace" and "latency_trace"
-	files, this file is a consumer. This means reading
-	from this file causes sequential reads to display
-	more current data. Once data is read from this
-	file, it is consumed, and will not be read
-	again with a sequential read. The "trace" and
-	"latency_trace" files are static, and if the
-	tracer is not adding more data, they will display
-	the same information every time they are read.
+	Reads from this file will block until new data is
+	retrieved.  Unlike the "trace" file, this file is a
+	consumer. This means reading from this file causes
+	sequential reads to display more current data. Once
+	data is read from this file, it is consumed, and
+	will not be read again with a sequential read. The
+	"trace" file is static, and if the tracer is not
+	adding more data,they will display the same
+	information every time they are read.
 
   trace_options:
 
@@ -117,10 +110,10 @@ of ftrace. Here is a list of some of the key files:
 	Some of the tracers record the max latency.
 	For example, the time interrupts are disabled.
 	This time is saved in this file. The max trace
-	will also be stored, and displayed by either
-	"trace" or "latency_trace".  A new max trace will
-	only be recorded if the latency is greater than
-	the value in this file. (in microseconds)
+	will also be stored, and displayed by "trace".
+	A new max trace will only be recorded if the
+	latency is greater than the value in this
+	file. (in microseconds)
 
   buffer_size_kb:
 
@@ -210,7 +203,7 @@ Here is the list of current tracers that may be configured.
 	the trace with the longest max latency.
 	See tracing_max_latency. When a new max is recorded,
 	it replaces the old trace. It is best to view this
-	trace via the latency_trace file.
+	trace with the latency-format option enabled.
 
   "preemptoff"
 
@@ -307,8 +300,8 @@ the lowest priority thread (pid 0).
 Latency trace format
 --------------------
 
-For traces that display latency times, the latency_trace file
-gives somewhat more information to see why a latency happened.
+When the latency-format option is enabled, the trace file gives
+somewhat more information to see why a latency happened.
 Here is a typical trace.
 
 # tracer: irqsoff
@@ -380,9 +373,10 @@ explains which is which.
 
 The above is mostly meaningful for kernel developers.
 
-  time: This differs from the trace file output. The trace file output
-	includes an absolute timestamp. The timestamp used by the
-	latency_trace file is relative to the start of the trace.
+  time: When the latency-format option is enabled, the trace file
+	output includes a timestamp relative to the start of the
+	trace. This differs from the output when latency-format
+	is disabled, which includes an absolute timestamp.
 
   delay: This is just to help catch your eye a bit better. And
 	 needs to be fixed to be only relative to the same CPU.
@@ -440,7 +434,8 @@ Here are the available options:
   sym-addr:
    bash-4000  [01]  1477.606694: simple_strtoul <c0339346>
 
-  verbose - This deals with the latency_trace file.
+  verbose - This deals with the trace file when the
+            latency-format option is enabled.
 
     bash  4000 1 0 00000000 00010a95 [58127d26] 1720.415ms \
     (+0.000ms): simple_strtoul (strict_strtoul)
@@ -472,7 +467,7 @@ Here are the available options:
 		the app is no longer running
 
 		The lookup is performed when you read
-		trace,trace_pipe,latency_trace. Example:
+		trace,trace_pipe. Example:
 
 		a.out-1623  [000] 40874.465068: /root/a.out[+0x480] <-/root/a.out[+0
 x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
@@ -481,6 +476,11 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
 	       every scheduling event. Will add overhead if
 	       there's a lot of tasks running at once.
 
+  latency-format - This option changes the trace. When
+                   it is enabled, the trace displays
+                   additional information about the
+                   latencies, as described in "Latency
+                   trace format".
 
 sched_switch
 ------------
@@ -596,12 +596,13 @@ To reset the maximum, echo 0 into tracing_max_latency. Here is
 an example:
 
  # echo irqsoff > current_tracer
+ # echo latency-format > trace_options
  # echo 0 > tracing_max_latency
  # echo 1 > tracing_enabled
  # ls -ltr
  [...]
  # echo 0 > tracing_enabled
- # cat latency_trace
+ # cat trace
 # tracer: irqsoff
 #
 irqsoff latency trace v1.1.5 on 2.6.26
@@ -703,12 +704,13 @@ which preemption was disabled. The control of preemptoff tracer
 is much like the irqsoff tracer.
 
  # echo preemptoff > current_tracer
+ # echo latency-format > trace_options
  # echo 0 > tracing_max_latency
  # echo 1 > tracing_enabled
  # ls -ltr
  [...]
  # echo 0 > tracing_enabled
- # cat latency_trace
+ # cat trace
 # tracer: preemptoff
 #
 preemptoff latency trace v1.1.5 on 2.6.26-rc8
@@ -850,12 +852,13 @@ Again, using this trace is much like the irqsoff and preemptoff
 tracers.
 
  # echo preemptirqsoff > current_tracer
+ # echo latency-format > trace_options
  # echo 0 > tracing_max_latency
  # echo 1 > tracing_enabled
  # ls -ltr
  [...]
  # echo 0 > tracing_enabled
- # cat latency_trace
+ # cat trace
 # tracer: preemptirqsoff
 #
 preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8
@@ -1012,11 +1015,12 @@ Instead of performing an 'ls', we will run 'sleep 1' under
 'chrt' which changes the priority of the task.
 
  # echo wakeup > current_tracer
+ # echo latency-format > trace_options
  # echo 0 > tracing_max_latency
  # echo 1 > tracing_enabled
  # chrt -f 5 sleep 1
  # echo 0 > tracing_enabled
- # cat latency_trace
+ # cat trace
 # tracer: wakeup
 #
 wakeup latency trace v1.1.5 on 2.6.26-rc8
diff --git a/Documentation/trace/function-graph-fold.vim b/Documentation/trace/function-graph-fold.vim
new file mode 100644
index 00000000000..0544b504c8b
--- /dev/null
+++ b/Documentation/trace/function-graph-fold.vim
@@ -0,0 +1,42 @@
+" Enable folding for ftrace function_graph traces.
+"
+" To use, :source this file while viewing a function_graph trace, or use vim's
+" -S option to load from the command-line together with a trace.  You can then
+" use the usual vim fold commands, such as "za", to open and close nested
+" functions.  While closed, a fold will show the total time taken for a call,
+" as would normally appear on the line with the closing brace.  Folded
+" functions will not include finish_task_switch(), so folding should remain
+" relatively sane even through a context switch.
+"
+" Note that this will almost certainly only work well with a
+" single-CPU trace (e.g. trace-cmd report --cpu 1).
+
+function! FunctionGraphFoldExpr(lnum)
+  let line = getline(a:lnum)
+  if line[-1:] == '{'
+    if line =~ 'finish_task_switch() {$'
+      return '>1'
+    endif
+    return 'a1'
+  elseif line[-1:] == '}'
+    return 's1'
+  else
+    return '='
+  endif
+endfunction
+
+function! FunctionGraphFoldText()
+  let s = split(getline(v:foldstart), '|', 1)
+  if getline(v:foldend+1) =~ 'finish_task_switch() {$'
+    let s[2] = ' task switch  '
+  else
+    let e = split(getline(v:foldend), '|', 1)
+    let s[2] = e[2]
+  endif
+  return join(s, '|')
+endfunction
+
+setlocal foldexpr=FunctionGraphFoldExpr(v:lnum)
+setlocal foldtext=FunctionGraphFoldText()
+setlocal foldcolumn=12
+setlocal foldmethod=expr
diff --git a/Documentation/trace/ring-buffer-design.txt b/Documentation/trace/ring-buffer-design.txt
new file mode 100644
index 00000000000..5b1d23d604c
--- /dev/null
+++ b/Documentation/trace/ring-buffer-design.txt
@@ -0,0 +1,955 @@
+		Lockless Ring Buffer Design
+		===========================
+
+Copyright 2009 Red Hat Inc.
+   Author:   Steven Rostedt <srostedt@redhat.com>
+  License:   The GNU Free Documentation License, Version 1.2
+               (dual licensed under the GPL v2)
+Reviewers:   Mathieu Desnoyers, Huang Ying, Hidetoshi Seto,
+	     and Frederic Weisbecker.
+
+
+Written for: 2.6.31
+
+Terminology used in this Document
+---------------------------------
+
+tail - where new writes happen in the ring buffer.
+
+head - where new reads happen in the ring buffer.
+
+producer - the task that writes into the ring buffer (same as writer)
+
+writer - same as producer
+
+consumer - the task that reads from the buffer (same as reader)
+
+reader - same as consumer.
+
+reader_page - A page outside the ring buffer used solely (for the most part)
+    by the reader.
+
+head_page - a pointer to the page that the reader will use next
+
+tail_page - a pointer to the page that will be written to next
+
+commit_page - a pointer to the page with the last finished non nested write.
+
+cmpxchg - hardware assisted atomic transaction that performs the following:
+
+   A = B iff previous A == C
+
+   R = cmpxchg(A, C, B) is saying that we replace A with B if and only if
+      current A is equal to C, and we put the old (current) A into R
+
+   R gets the previous A regardless if A is updated with B or not.
+
+   To see if the update was successful a compare of R == C may be used.
+
+The Generic Ring Buffer
+-----------------------
+
+The ring buffer can be used in either an overwrite mode or in
+producer/consumer mode.
+
+Producer/consumer mode is where the producer were to fill up the
+buffer before the consumer could free up anything, the producer
+will stop writing to the buffer. This will lose most recent events.
+
+Overwrite mode is where the produce were to fill up the buffer
+before the consumer could free up anything, the producer will
+overwrite the older data. This will lose the oldest events.
+
+No two writers can write at the same time (on the same per cpu buffer),
+but a writer may interrupt another writer, but it must finish writing
+before the previous writer may continue. This is very important to the
+algorithm. The writers act like a "stack". The way interrupts works
+enforces this behavior.
+
+
+  writer1 start
+     <preempted> writer2 start
+         <preempted> writer3 start
+                     writer3 finishes
+                 writer2 finishes
+  writer1 finishes
+
+This is very much like a writer being preempted by an interrupt and
+the interrupt doing a write as well.
+
+Readers can happen at any time. But no two readers may run at the
+same time, nor can a reader preempt/interrupt another reader. A reader
+can not preempt/interrupt a writer, but it may read/consume from the
+buffer at the same time as a writer is writing, but the reader must be
+on another processor to do so. A reader may read on its own processor
+and can be preempted by a writer.
+
+A writer can preempt a reader, but a reader can not preempt a writer.
+But a reader can read the buffer at the same time (on another processor)
+as a writer.
+
+The ring buffer is made up of a list of pages held together by a link list.
+
+At initialization a reader page is allocated for the reader that is not
+part of the ring buffer.
+
+The head_page, tail_page and commit_page are all initialized to point
+to the same page.
+
+The reader page is initialized to have its next pointer pointing to
+the head page, and its previous pointer pointing to a page before
+the head page.
+
+The reader has its own page to use. At start up time, this page is
+allocated but is not attached to the list. When the reader wants
+to read from the buffer, if its page is empty (like it is on start up)
+it will swap its page with the head_page. The old reader page will
+become part of the ring buffer and the head_page will be removed.
+The page after the inserted page (old reader_page) will become the
+new head page.
+
+Once the new page is given to the reader, the reader could do what
+it wants with it, as long as a writer has left that page.
+
+A sample of how the reader page is swapped: Note this does not
+show the head page in the buffer, it is for demonstrating a swap
+only.
+
+  +------+
+  |reader|          RING BUFFER
+  |page  |
+  +------+
+                  +---+   +---+   +---+
+                  |   |-->|   |-->|   |
+                  |   |<--|   |<--|   |
+                  +---+   +---+   +---+
+                   ^ |             ^ |
+                   | +-------------+ |
+                   +-----------------+
+
+
+  +------+
+  |reader|          RING BUFFER
+  |page  |-------------------+
+  +------+                   v
+    |             +---+   +---+   +---+
+    |             |   |-->|   |-->|   |
+    |             |   |<--|   |<--|   |<-+
+    |             +---+   +---+   +---+  |
+    |              ^ |             ^ |   |
+    |              | +-------------+ |   |
+    |              +-----------------+   |
+    +------------------------------------+
+
+  +------+
+  |reader|          RING BUFFER
+  |page  |-------------------+
+  +------+ <---------------+ v
+    |  ^          +---+   +---+   +---+
+    |  |          |   |-->|   |-->|   |
+    |  |          |   |   |   |<--|   |<-+
+    |  |          +---+   +---+   +---+  |
+    |  |             |             ^ |   |
+    |  |             +-------------+ |   |
+    |  +-----------------------------+   |
+    +------------------------------------+
+
+  +------+
+  |buffer|          RING BUFFER
+  |page  |-------------------+
+  +------+ <---------------+ v
+    |  ^          +---+   +---+   +---+
+    |  |          |   |   |   |-->|   |
+    |  |  New     |   |   |   |<--|   |<-+
+    |  | Reader   +---+   +---+   +---+  |
+    |  |  page ----^                 |   |
+    |  |                             |   |
+    |  +-----------------------------+   |
+    +------------------------------------+
+
+
+
+It is possible that the page swapped is the commit page and the tail page,
+if what is in the ring buffer is less than what is held in a buffer page.
+
+
+          reader page    commit page   tail page
+              |              |             |
+              v              |             |
+             +---+           |             |
+             |   |<----------+             |
+             |   |<------------------------+
+             |   |------+
+             +---+      |
+                        |
+                        v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |--->|   |--->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+This case is still valid for this algorithm.
+When the writer leaves the page, it simply goes into the ring buffer
+since the reader page still points to the next location in the ring
+buffer.
+
+
+The main pointers:
+
+  reader page - The page used solely by the reader and is not part
+                of the ring buffer (may be swapped in)
+
+  head page - the next page in the ring buffer that will be swapped
+              with the reader page.
+
+  tail page - the page where the next write will take place.
+
+  commit page - the page that last finished a write.
+
+The commit page only is updated by the outer most writer in the
+writer stack. A writer that preempts another writer will not move the
+commit page.
+
+When data is written into the ring buffer, a position is reserved
+in the ring buffer and passed back to the writer. When the writer
+is finished writing data into that position, it commits the write.
+
+Another write (or a read) may take place at anytime during this
+transaction. If another write happens it must finish before continuing
+with the previous write.
+
+
+   Write reserve:
+
+       Buffer page
+      +---------+
+      |written  |
+      +---------+  <--- given back to writer (current commit)
+      |reserved |
+      +---------+ <--- tail pointer
+      | empty   |
+      +---------+
+
+   Write commit:
+
+       Buffer page
+      +---------+
+      |written  |
+      +---------+
+      |written  |
+      +---------+  <--- next positon for write (current commit)
+      | empty   |
+      +---------+
+
+
+ If a write happens after the first reserve:
+
+       Buffer page
+      +---------+
+      |written  |
+      +---------+  <-- current commit
+      |reserved |
+      +---------+  <--- given back to second writer
+      |reserved |
+      +---------+ <--- tail pointer
+
+  After second writer commits:
+
+
+       Buffer page
+      +---------+
+      |written  |
+      +---------+  <--(last full commit)
+      |reserved |
+      +---------+
+      |pending  |
+      |commit   |
+      +---------+ <--- tail pointer
+
+  When the first writer commits:
+
+       Buffer page
+      +---------+
+      |written  |
+      +---------+
+      |written  |
+      +---------+
+      |written  |
+      +---------+  <--(last full commit and tail pointer)
+
+
+The commit pointer points to the last write location that was
+committed without preempting another write. When a write that
+preempted another write is committed, it only becomes a pending commit
+and will not be a full commit till all writes have been committed.
+
+The commit page points to the page that has the last full commit.
+The tail page points to the page with the last write (before
+committing).
+
+The tail page is always equal to or after the commit page. It may
+be several pages ahead. If the tail page catches up to the commit
+page then no more writes may take place (regardless of the mode
+of the ring buffer: overwrite and produce/consumer).
+
+The order of pages are:
+
+ head page
+ commit page
+ tail page
+
+Possible scenario:
+                             tail page
+  head page         commit page  |
+      |                 |        |
+      v                 v        v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |--->|   |--->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+There is a special case that the head page is after either the commit page
+and possibly the tail page. That is when the commit (and tail) page has been
+swapped with the reader page. This is because the head page is always
+part of the ring buffer, but the reader page is not. When ever there
+has been less than a full page that has been committed inside the ring buffer,
+and a reader swaps out a page, it will be swapping out the commit page.
+
+
+          reader page    commit page   tail page
+              |              |             |
+              v              |             |
+             +---+           |             |
+             |   |<----------+             |
+             |   |<------------------------+
+             |   |------+
+             +---+      |
+                        |
+                        v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |--->|   |--->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+                        ^
+                        |
+                    head page
+
+
+In this case, the head page will not move when the tail and commit
+move back into the ring buffer.
+
+The reader can not swap a page into the ring buffer if the commit page
+is still on that page. If the read meets the last commit (real commit
+not pending or reserved), then there is nothing more to read.
+The buffer is considered empty until another full commit finishes.
+
+When the tail meets the head page, if the buffer is in overwrite mode,
+the head page will be pushed ahead one. If the buffer is in producer/consumer
+mode, the write will fail.
+
+Overwrite mode:
+
+            tail page
+               |
+               v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |--->|   |--->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+                        ^
+                        |
+                    head page
+
+
+            tail page
+               |
+               v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |--->|   |--->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+                                 ^
+                                 |
+                             head page
+
+
+                    tail page
+                        |
+                        v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |--->|   |--->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+                                 ^
+                                 |
+                             head page
+
+Note, the reader page will still point to the previous head page.
+But when a swap takes place, it will use the most recent head page.
+
+
+Making the Ring Buffer Lockless:
+--------------------------------
+
+The main idea behind the lockless algorithm is to combine the moving
+of the head_page pointer with the swapping of pages with the reader.
+State flags are placed inside the pointer to the page. To do this,
+each page must be aligned in memory by 4 bytes. This will allow the 2
+least significant bits of the address to be used as flags. Since
+they will always be zero for the address. To get the address,
+simply mask out the flags.
+
+  MASK = ~3
+
+  address & MASK
+
+Two flags will be kept by these two bits:
+
+   HEADER - the page being pointed to is a head page
+
+   UPDATE - the page being pointed to is being updated by a writer
+          and was or is about to be a head page.
+
+
+          reader page
+              |
+              v
+             +---+
+             |   |------+
+             +---+      |
+                        |
+                        v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |-H->|   |--->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+
+The above pointer "-H->" would have the HEADER flag set. That is
+the next page is the next page to be swapped out by the reader.
+This pointer means the next page is the head page.
+
+When the tail page meets the head pointer, it will use cmpxchg to
+change the pointer to the UPDATE state:
+
+
+            tail page
+               |
+               v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |-H->|   |--->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+            tail page
+               |
+               v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |-U->|   |--->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+"-U->" represents a pointer in the UPDATE state.
+
+Any access to the reader will need to take some sort of lock to serialize
+the readers. But the writers will never take a lock to write to the
+ring buffer. This means we only need to worry about a single reader,
+and writes only preempt in "stack" formation.
+
+When the reader tries to swap the page with the ring buffer, it
+will also use cmpxchg. If the flag bit in the pointer to the
+head page does not have the HEADER flag set, the compare will fail
+and the reader will need to look for the new head page and try again.
+Note, the flag UPDATE and HEADER are never set at the same time.
+
+The reader swaps the reader page as follows:
+
+  +------+
+  |reader|          RING BUFFER
+  |page  |
+  +------+
+                  +---+    +---+    +---+
+                  |   |--->|   |--->|   |
+                  |   |<---|   |<---|   |
+                  +---+    +---+    +---+
+                   ^ |               ^ |
+                   | +---------------+ |
+                   +-----H-------------+
+
+The reader sets the reader page next pointer as HEADER to the page after
+the head page.
+
+
+  +------+
+  |reader|          RING BUFFER
+  |page  |-------H-----------+
+  +------+                   v
+    |             +---+    +---+    +---+
+    |             |   |--->|   |--->|   |
+    |             |   |<---|   |<---|   |<-+
+    |             +---+    +---+    +---+  |
+    |              ^ |               ^ |   |
+    |              | +---------------+ |   |
+    |              +-----H-------------+   |
+    +--------------------------------------+
+
+It does a cmpxchg with the pointer to the previous head page to make it
+point to the reader page. Note that the new pointer does not have the HEADER
+flag set.  This action atomically moves the head page forward.
+
+  +------+
+  |reader|          RING BUFFER
+  |page  |-------H-----------+
+  +------+                   v
+    |  ^          +---+   +---+   +---+
+    |  |          |   |-->|   |-->|   |
+    |  |          |   |<--|   |<--|   |<-+
+    |  |          +---+   +---+   +---+  |
+    |  |             |             ^ |   |
+    |  |             +-------------+ |   |
+    |  +-----------------------------+   |
+    +------------------------------------+
+
+After the new head page is set, the previous pointer of the head page is
+updated to the reader page.
+
+  +------+
+  |reader|          RING BUFFER
+  |page  |-------H-----------+
+  +------+ <---------------+ v
+    |  ^          +---+   +---+   +---+
+    |  |          |   |-->|   |-->|   |
+    |  |          |   |   |   |<--|   |<-+
+    |  |          +---+   +---+   +---+  |
+    |  |             |             ^ |   |
+    |  |             +-------------+ |   |
+    |  +-----------------------------+   |
+    +------------------------------------+
+
+  +------+
+  |buffer|          RING BUFFER
+  |page  |-------H-----------+  <--- New head page
+  +------+ <---------------+ v
+    |  ^          +---+   +---+   +---+
+    |  |          |   |   |   |-->|   |
+    |  |  New     |   |   |   |<--|   |<-+
+    |  | Reader   +---+   +---+   +---+  |
+    |  |  page ----^                 |   |
+    |  |                             |   |
+    |  +-----------------------------+   |
+    +------------------------------------+
+
+Another important point. The page that the reader page points back to
+by its previous pointer (the one that now points to the new head page)
+never points back to the reader page. That is because the reader page is
+not part of the ring buffer. Traversing the ring buffer via the next pointers
+will always stay in the ring buffer. Traversing the ring buffer via the
+prev pointers may not.
+
+Note, the way to determine a reader page is simply by examining the previous
+pointer of the page. If the next pointer of the previous page does not
+point back to the original page, then the original page is a reader page:
+
+
+             +--------+
+             | reader |  next   +----+
+             |  page  |-------->|    |<====== (buffer page)
+             +--------+         +----+
+                 |                | ^
+                 |                v | next
+            prev |              +----+
+                 +------------->|    |
+                                +----+
+
+The way the head page moves forward:
+
+When the tail page meets the head page and the buffer is in overwrite mode
+and more writes take place, the head page must be moved forward before the
+writer may move the tail page. The way this is done is that the writer
+performs a cmpxchg to convert the pointer to the head page from the HEADER
+flag to have the UPDATE flag set. Once this is done, the reader will
+not be able to swap the head page from the buffer, nor will it be able to
+move the head page, until the writer is finished with the move.
+
+This eliminates any races that the reader can have on the writer. The reader
+must spin, and this is why the reader can not preempt the writer.
+
+            tail page
+               |
+               v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |-H->|   |--->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+            tail page
+               |
+               v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |-U->|   |--->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+The following page will be made into the new head page.
+
+           tail page
+               |
+               v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |-U->|   |-H->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+After the new head page has been set, we can set the old head page
+pointer back to NORMAL.
+
+           tail page
+               |
+               v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |--->|   |-H->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+After the head page has been moved, the tail page may now move forward.
+
+                    tail page
+                        |
+                        v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |--->|   |-H->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+
+The above are the trivial updates. Now for the more complex scenarios.
+
+
+As stated before, if enough writes preempt the first write, the
+tail page may make it all the way around the buffer and meet the commit
+page. At this time, we must start dropping writes (usually with some kind
+of warning to the user). But what happens if the commit was still on the
+reader page? The commit page is not part of the ring buffer. The tail page
+must account for this.
+
+
+          reader page    commit page
+              |              |
+              v              |
+             +---+           |
+             |   |<----------+
+             |   |
+             |   |------+
+             +---+      |
+                        |
+                        v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |-H->|   |--->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+               ^
+               |
+           tail page
+
+If the tail page were to simply push the head page forward, the commit when
+leaving the reader page would not be pointing to the correct page.
+
+The solution to this is to test if the commit page is on the reader page
+before pushing the head page. If it is, then it can be assumed that the
+tail page wrapped the buffer, and we must drop new writes.
+
+This is not a race condition, because the commit page can only be moved
+by the outter most writer (the writer that was preempted).
+This means that the commit will not move while a writer is moving the
+tail page. The reader can not swap the reader page if it is also being
+used as the commit page. The reader can simply check that the commit
+is off the reader page. Once the commit page leaves the reader page
+it will never go back on it unless a reader does another swap with the
+buffer page that is also the commit page.
+
+
+Nested writes
+-------------
+
+In the pushing forward of the tail page we must first push forward
+the head page if the head page is the next page. If the head page
+is not the next page, the tail page is simply updated with a cmpxchg.
+
+Only writers move the tail page. This must be done atomically to protect
+against nested writers.
+
+  temp_page = tail_page
+  next_page = temp_page->next
+  cmpxchg(tail_page, temp_page, next_page)
+
+The above will update the tail page if it is still pointing to the expected
+page. If this fails, a nested write pushed it forward, the the current write
+does not need to push it.
+
+
+           temp page
+               |
+               v
+            tail page
+               |
+               v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |--->|   |--->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+Nested write comes in and moves the tail page forward:
+
+                    tail page (moved by nested writer)
+            temp page   |
+               |        |
+               v        v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |--->|   |--->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+The above would fail the cmpxchg, but since the tail page has already
+been moved forward, the writer will just try again to reserve storage
+on the new tail page.
+
+But the moving of the head page is a bit more complex.
+
+            tail page
+               |
+               v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |-H->|   |--->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+The write converts the head page pointer to UPDATE.
+
+            tail page
+               |
+               v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |-U->|   |--->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+But if a nested writer preempts here. It will see that the next
+page is a head page, but it is also nested. It will detect that
+it is nested and will save that information. The detection is the
+fact that it sees the UPDATE flag instead of a HEADER or NORMAL
+pointer.
+
+The nested writer will set the new head page pointer.
+
+           tail page
+               |
+               v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |-U->|   |-H->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+But it will not reset the update back to normal. Only the writer
+that converted a pointer from HEAD to UPDATE will convert it back
+to NORMAL.
+
+                    tail page
+                        |
+                        v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |-U->|   |-H->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+After the nested writer finishes, the outer most writer will convert
+the UPDATE pointer to NORMAL.
+
+
+                    tail page
+                        |
+                        v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |--->|   |-H->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+
+It can be even more complex if several nested writes came in and moved
+the tail page ahead several pages:
+
+
+(first writer)
+
+            tail page
+               |
+               v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |-H->|   |--->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+The write converts the head page pointer to UPDATE.
+
+            tail page
+               |
+               v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |-U->|   |--->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+Next writer comes in, and sees the update and sets up the new
+head page.
+
+(second writer)
+
+           tail page
+               |
+               v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |-U->|   |-H->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+The nested writer moves the tail page forward. But does not set the old
+update page to NORMAL because it is not the outer most writer.
+
+                    tail page
+                        |
+                        v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |-U->|   |-H->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+Another writer preempts and sees the page after the tail page is a head page.
+It changes it from HEAD to UPDATE.
+
+(third writer)
+
+                    tail page
+                        |
+                        v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |-U->|   |-U->|   |--->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+The writer will move the head page forward:
+
+
+(third writer)
+
+                    tail page
+                        |
+                        v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |-U->|   |-U->|   |-H->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+But now that the third writer did change the HEAD flag to UPDATE it
+will convert it to normal:
+
+
+(third writer)
+
+                    tail page
+                        |
+                        v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |-U->|   |--->|   |-H->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+
+Then it will move the tail page, and return back to the second writer.
+
+
+(second writer)
+
+                             tail page
+                                 |
+                                 v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |-U->|   |--->|   |-H->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+
+The second writer will fail to move the tail page because it was already
+moved, so it will try again and add its data to the new tail page.
+It will return to the first writer.
+
+
+(first writer)
+
+                             tail page
+                                 |
+                                 v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |-U->|   |--->|   |-H->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+The first writer can not know atomically test if the tail page moved
+while it updates the HEAD page. It will then update the head page to
+what it thinks is the new head page.
+
+
+(first writer)
+
+                             tail page
+                                 |
+                                 v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |-U->|   |-H->|   |-H->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+Since the cmpxchg returns the old value of the pointer the first writer
+will see it succeeded in updating the pointer from NORMAL to HEAD.
+But as we can see, this is not good enough. It must also check to see
+if the tail page is either where it use to be or on the next page:
+
+
+(first writer)
+
+               A        B    tail page
+               |        |        |
+               v        v        v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |-U->|   |-H->|   |-H->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+If tail page != A and tail page does not equal B, then it must reset the
+pointer back to NORMAL. The fact that it only needs to worry about
+nested writers, it only needs to check this after setting the HEAD page.
+
+
+(first writer)
+
+               A        B    tail page
+               |        |        |
+               v        v        v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |-U->|   |--->|   |-H->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
+Now the writer can update the head page. This is also why the head page must
+remain in UPDATE and only reset by the outer most writer. This prevents
+the reader from seeing the incorrect head page.
+
+
+(first writer)
+
+               A        B    tail page
+               |        |        |
+               v        v        v
+    +---+    +---+    +---+    +---+
+<---|   |--->|   |--->|   |--->|   |-H->
+--->|   |<---|   |<---|   |<---|   |<---
+    +---+    +---+    +---+    +---+
+
diff --git a/Documentation/video4linux/CARDLIST.cx88 b/Documentation/video4linux/CARDLIST.cx88
index 89093f53172..0736518b2f8 100644
--- a/Documentation/video4linux/CARDLIST.cx88
+++ b/Documentation/video4linux/CARDLIST.cx88
@@ -6,8 +6,8 @@
   5 -> Leadtek Winfast 2000XP Expert                       [107d:6611,107d:6613]
   6 -> AverTV Studio 303 (M126)                            [1461:000b]
   7 -> MSI TV-@nywhere Master                              [1462:8606]
-  8 -> Leadtek Winfast DV2000                              [107d:6620]
-  9 -> Leadtek PVR 2000                                    [107d:663b,107d:663c,107d:6632]
+  8 -> Leadtek Winfast DV2000                              [107d:6620,107d:6621]
+  9 -> Leadtek PVR 2000                                    [107d:663b,107d:663c,107d:6632,107d:6630,107d:6638,107d:6631,107d:6637,107d:663d]
  10 -> IODATA GV-VCP3/PCI                                  [10fc:d003]
  11 -> Prolink PlayTV PVR
  12 -> ASUS PVR-416                                        [1043:4823,1461:c111]
@@ -59,7 +59,7 @@
  58 -> Pinnacle PCTV HD 800i                               [11bd:0051]
  59 -> DViCO FusionHDTV 5 PCI nano                         [18ac:d530]
  60 -> Pinnacle Hybrid PCTV                                [12ab:1788]
- 61 -> Winfast TV2000 XP Global                            [107d:6f18]
+ 61 -> Leadtek TV2000 XP Global                            [107d:6f18,107d:6618]
  62 -> PowerColor RA330                                    [14f1:ea3d]
  63 -> Geniatech X8000-MT DVBT                             [14f1:8852]
  64 -> DViCO FusionHDTV DVB-T PRO                          [18ac:db30]
diff --git a/Documentation/video4linux/CARDLIST.em28xx b/Documentation/video4linux/CARDLIST.em28xx
index a98a688c11b..e352d754875 100644
--- a/Documentation/video4linux/CARDLIST.em28xx
+++ b/Documentation/video4linux/CARDLIST.em28xx
@@ -1,5 +1,5 @@
   0 -> Unknown EM2800 video grabber             (em2800)        [eb1a:2800]
-  1 -> Unknown EM2750/28xx video grabber        (em2820/em2840) [eb1a:2820,eb1a:2821,eb1a:2860,eb1a:2861,eb1a:2870,eb1a:2881,eb1a:2883]
+  1 -> Unknown EM2750/28xx video grabber        (em2820/em2840) [eb1a:2710,eb1a:2820,eb1a:2821,eb1a:2860,eb1a:2861,eb1a:2870,eb1a:2881,eb1a:2883]
   2 -> Terratec Cinergy 250 USB                 (em2820/em2840) [0ccd:0036]
   3 -> Pinnacle PCTV USB 2                      (em2820/em2840) [2304:0208]
   4 -> Hauppauge WinTV USB 2                    (em2820/em2840) [2040:4200,2040:4201]
@@ -20,7 +20,7 @@
  19 -> EM2860/SAA711X Reference Design          (em2860)
  20 -> AMD ATI TV Wonder HD 600                 (em2880)        [0438:b002]
  21 -> eMPIA Technology, Inc. GrabBeeX+ Video Encoder (em2800)        [eb1a:2801]
- 22 -> Unknown EM2750/EM2751 webcam grabber     (em2750)        [eb1a:2750,eb1a:2751]
+ 22 -> EM2710/EM2750/EM2751 webcam grabber      (em2750)        [eb1a:2750,eb1a:2751]
  23 -> Huaqi DLCW-130                           (em2750)
  24 -> D-Link DUB-T210 TV Tuner                 (em2820/em2840) [2001:f112]
  25 -> Gadmei UTV310                            (em2820/em2840)
@@ -65,3 +65,5 @@
  67 -> Terratec Grabby                          (em2860)        [0ccd:0096]
  68 -> Terratec AV350                           (em2860)        [0ccd:0084]
  69 -> KWorld ATSC 315U HDTV TV Box             (em2882)        [eb1a:a313]
+ 70 -> Evga inDtube                             (em2882)
+ 71 -> Silvercrest Webcam 1.3mpix               (em2820/em2840)
diff --git a/Documentation/video4linux/CARDLIST.saa7134 b/Documentation/video4linux/CARDLIST.saa7134
index 15562427e8a..c913e561419 100644
--- a/Documentation/video4linux/CARDLIST.saa7134
+++ b/Documentation/video4linux/CARDLIST.saa7134
@@ -153,8 +153,8 @@
 152 -> Asus Tiger Rev:1.00                      [1043:4857]
 153 -> Kworld Plus TV Analog Lite PCI           [17de:7128]
 154 -> Avermedia AVerTV GO 007 FM Plus          [1461:f31d]
-155 -> Hauppauge WinTV-HVR1120 ATSC/QAM-Hybrid  [0070:6706,0070:6708]
-156 -> Hauppauge WinTV-HVR1110r3 DVB-T/Hybrid   [0070:6707,0070:6709,0070:670a]
+155 -> Hauppauge WinTV-HVR1150 ATSC/QAM-Hybrid  [0070:6706,0070:6708]
+156 -> Hauppauge WinTV-HVR1120 DVB-T/Hybrid     [0070:6707,0070:6709,0070:670a]
 157 -> Avermedia AVerTV Studio 507UA            [1461:a11b]
 158 -> AVerMedia Cardbus TV/Radio (E501R)       [1461:b7e9]
 159 -> Beholder BeholdTV 505 RDS                [0000:505B]
diff --git a/Documentation/video4linux/gspca.txt b/Documentation/video4linux/gspca.txt
index 2bcf78896e2..573f95b5880 100644
--- a/Documentation/video4linux/gspca.txt
+++ b/Documentation/video4linux/gspca.txt
@@ -44,7 +44,9 @@ zc3xx		0458:7007	Genius VideoCam V2
 zc3xx		0458:700c	Genius VideoCam V3
 zc3xx		0458:700f	Genius VideoCam Web V2
 sonixj		0458:7025	Genius Eye 311Q
+sn9c20x		0458:7029	Genius Look 320s
 sonixj		0458:702e	Genius Slim 310 NB
+sn9c20x		045e:00f4	LifeCam VX-6000 (SN9C20x + OV9650)
 sonixj		045e:00f5	MicroSoft VX3000
 sonixj		045e:00f7	MicroSoft VX1000
 ov519		045e:028c	Micro$oft xbox cam
@@ -282,6 +284,28 @@ sonixj		0c45:613a	Microdia Sonix PC Camera
 sonixj		0c45:613b	Surfer SN-206
 sonixj		0c45:613c	Sonix Pccam168
 sonixj		0c45:6143	Sonix Pccam168
+sn9c20x		0c45:6240	PC Camera (SN9C201 + MT9M001)
+sn9c20x		0c45:6242	PC Camera (SN9C201 + MT9M111)
+sn9c20x		0c45:6248	PC Camera (SN9C201 + OV9655)
+sn9c20x		0c45:624e	PC Camera (SN9C201 + SOI968)
+sn9c20x		0c45:624f	PC Camera (SN9C201 + OV9650)
+sn9c20x		0c45:6251	PC Camera (SN9C201 + OV9650)
+sn9c20x		0c45:6253	PC Camera (SN9C201 + OV9650)
+sn9c20x		0c45:6260	PC Camera (SN9C201 + OV7670)
+sn9c20x		0c45:6270	PC Camera (SN9C201 + MT9V011/MT9V111/MT9V112)
+sn9c20x		0c45:627b	PC Camera (SN9C201 + OV7660)
+sn9c20x		0c45:627c	PC Camera (SN9C201 + HV7131R)
+sn9c20x		0c45:627f	PC Camera (SN9C201 + OV9650)
+sn9c20x		0c45:6280	PC Camera (SN9C202 + MT9M001)
+sn9c20x		0c45:6282	PC Camera (SN9C202 + MT9M111)
+sn9c20x		0c45:6288	PC Camera (SN9C202 + OV9655)
+sn9c20x		0c45:628e	PC Camera (SN9C202 + SOI968)
+sn9c20x		0c45:628f	PC Camera (SN9C202 + OV9650)
+sn9c20x		0c45:62a0	PC Camera (SN9C202 + OV7670)
+sn9c20x		0c45:62b0	PC Camera (SN9C202 + MT9V011/MT9V111/MT9V112)
+sn9c20x		0c45:62b3	PC Camera (SN9C202 + OV9655)
+sn9c20x		0c45:62bb	PC Camera (SN9C202 + OV7660)
+sn9c20x		0c45:62bc	PC Camera (SN9C202 + HV7131R)
 sunplus		0d64:0303	Sunplus FashionCam DXG
 etoms		102c:6151	Qcam Sangha CIF
 etoms		102c:6251	Qcam xxxxxx VGA
@@ -290,6 +314,7 @@ spca561		10fd:7e50	FlyCam Usb 100
 zc3xx		10fd:8050	Typhoon Webshot II USB 300k
 ov534		1415:2000	Sony HD Eye for PS3 (SLEH 00201)
 pac207		145f:013a	Trust WB-1300N
+sn9c20x		145f:013d	Trust WB-3600R
 vc032x		15b8:6001	HP 2.0 Megapixel
 vc032x		15b8:6002	HP 2.0 Megapixel rz406aa
 spca501		1776:501c	Arowana 300K CMOS Camera
@@ -300,4 +325,11 @@ spca500		2899:012c	Toptro Industrial
 spca508		8086:0110	Intel Easy PC Camera
 spca500		8086:0630	Intel Pocket PC Camera
 spca506		99fa:8988	Grandtec V.cap
+sn9c20x		a168:0610	Dino-Lite Digital Microscope (SN9C201 + HV7131R)
+sn9c20x		a168:0611	Dino-Lite Digital Microscope (SN9C201 + HV7131R)
+sn9c20x		a168:0613	Dino-Lite Digital Microscope (SN9C201 + HV7131R)
+sn9c20x		a168:0618	Dino-Lite Digital Microscope (SN9C201 + HV7131R)
+sn9c20x		a168:0614	Dino-Lite Digital Microscope (SN9C201 + MT9M111)
+sn9c20x		a168:0615	Dino-Lite Digital Microscope (SN9C201 + MT9M111)
+sn9c20x		a168:0617	Dino-Lite Digital Microscope (SN9C201 + MT9M111)
 spca561		abcd:cdee	Petcam
diff --git a/Documentation/video4linux/v4l2-framework.txt b/Documentation/video4linux/v4l2-framework.txt
index d54c1e4c6a9..ba4706afc5f 100644
--- a/Documentation/video4linux/v4l2-framework.txt
+++ b/Documentation/video4linux/v4l2-framework.txt
@@ -390,6 +390,30 @@ later date. It differs between i2c drivers and as such can be confusing.
 To see which chip variants are supported you can look in the i2c driver code
 for the i2c_device_id table. This lists all the possibilities.
 
+There are two more helper functions:
+
+v4l2_i2c_new_subdev_cfg: this function adds new irq and platform_data
+arguments and has both 'addr' and 'probed_addrs' arguments: if addr is not
+0 then that will be used (non-probing variant), otherwise the probed_addrs
+are probed.
+
+For example: this will probe for address 0x10:
+
+struct v4l2_subdev *sd = v4l2_i2c_new_subdev_cfg(v4l2_dev, adapter,
+	       "module_foo", "chipid", 0, NULL, 0, I2C_ADDRS(0x10));
+
+v4l2_i2c_new_subdev_board uses an i2c_board_info struct which is passed
+to the i2c driver and replaces the irq, platform_data and addr arguments.
+
+If the subdev supports the s_config core ops, then that op is called with
+the irq and platform_data arguments after the subdev was setup. The older
+v4l2_i2c_new_(probed_)subdev functions will call s_config as well, but with
+irq set to 0 and platform_data set to NULL.
+
+Note that in the next kernel release the functions v4l2_i2c_new_subdev,
+v4l2_i2c_new_probed_subdev and v4l2_i2c_new_probed_subdev_addr will all be
+replaced by a single v4l2_i2c_new_subdev that is identical to
+v4l2_i2c_new_subdev_cfg but without the irq and platform_data arguments.
 
 struct video_device
 -------------------
diff --git a/Documentation/vm/Makefile b/Documentation/vm/Makefile
index 27479d43a9b..5bd269b3731 100644
--- a/Documentation/vm/Makefile
+++ b/Documentation/vm/Makefile
@@ -2,7 +2,7 @@
 obj- := dummy.o
 
 # List of programs to build
-hostprogs-y := slabinfo slqbinfo page-types
+hostprogs-y := slabinfo page-types
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
diff --git a/Documentation/watchdog/hpwdt.txt b/Documentation/watchdog/hpwdt.txt
new file mode 100644
index 00000000000..9c24d5ffbb0
--- /dev/null
+++ b/Documentation/watchdog/hpwdt.txt
@@ -0,0 +1,95 @@
+Last reviewed: 06/02/2009
+
+                     HP iLO2 NMI Watchdog Driver
+              NMI sourcing for iLO2 based ProLiant Servers
+                     Documentation and Driver by
+              Thomas Mingarelli <thomas.mingarelli@hp.com>
+
+ The HP iLO2 NMI Watchdog driver is a kernel module that provides basic
+ watchdog functionality and the added benefit of NMI sourcing. Both the
+ watchdog functionality and the NMI sourcing capability need to be enabled
+ by the user. Remember that the two modes are not dependant on one another.
+ A user can have the NMI sourcing without the watchdog timer and vice-versa.
+
+ Watchdog functionality is enabled like any other common watchdog driver. That
+ is, an application needs to be started that kicks off the watchdog timer. A
+ basic application exists in the Documentation/watchdog/src directory called
+ watchdog-test.c. Simply compile the C file and kick it off. If the system
+ gets into a bad state and hangs, the HP ProLiant iLO 2 timer register will
+ not be updated in a timely fashion and a hardware system reset (also known as
+ an Automatic Server Recovery (ASR)) event will occur.
+
+ The hpwdt driver also has four (4) module parameters. They are the following:
+
+ soft_margin - allows the user to set the watchdog timer value
+ allow_kdump - allows the user to save off a kernel dump image after an NMI
+ nowayout    - basic watchdog parameter that does not allow the timer to
+               be restarted or an impending ASR to be escaped.
+ priority    - determines whether or not the hpwdt driver is first on the
+               die_notify list to handle NMIs or last. The default value
+               for this module parameter is 0 or LAST. If the user wants to
+               enable NMI sourcing then reload the hpwdt driver with
+               priority=1 (and boot with nmi_watchdog=0).
+
+ NOTE: More information about watchdog drivers in general, including the ioctl
+       interface to /dev/watchdog can be found in
+       Documentation/watchdog/watchdog-api.txt and Documentation/IPMI.txt.
+
+ The priority parameter was introduced due to other kernel software that relied
+ on handling NMIs (like oprofile). Keeping hpwdt's priority at 0 (or LAST)
+ enables the users of NMIs for non critical events to be work as expected.
+
+ The NMI sourcing capability is disabled by default due to the inability to
+ distinguish between "NMI Watchdog Ticks" and "HW generated NMI events" in the
+ Linux kernel. What this means is that the hpwdt nmi handler code is called
+ each time the NMI signal fires off. This could amount to several thousands of
+ NMIs in a matter of seconds. If a user sees the Linux kernel's "dazed and
+ confused" message in the logs or if the system gets into a hung state, then
+ the hpwdt driver can be reloaded with the "priority" module parameter set
+ (priority=1).
+
+ 1. If the kernel has not been booted with nmi_watchdog turned off then
+    edit /boot/grub/menu.lst and place the nmi_watchdog=0 at the end of the
+    currently booting kernel line.
+ 2. reboot the sever
+ 3. Once the system comes up perform a rmmod hpwdt
+ 4. insmod /lib/modules/`uname -r`/kernel/drivers/char/watchdog/hpwdt.ko priority=1
+
+ Now, the hpwdt can successfully receive and source the NMI and provide a log
+ message that details the reason for the NMI (as determined by the HP BIOS).
+
+ Below is a list of NMIs the HP BIOS understands along with the associated
+ code (reason):
+
+	No source found                00h
+
+	Uncorrectable Memory Error     01h
+
+	ASR NMI                        1Bh
+
+	PCI Parity Error               20h
+
+	NMI Button Press               27h
+
+	SB_BUS_NMI                     28h
+
+	ILO Doorbell NMI               29h
+
+	ILO IOP NMI                    2Ah
+
+	ILO Watchdog NMI               2Bh
+
+	Proc Throt NMI                 2Ch
+
+	Front Side Bus NMI             2Dh
+
+	PCI Express Error              2Fh
+
+	DMA controller NMI             30h
+
+	Hypertransport/CSI Error       31h
+
+
+
+ -- Tom Mingarelli
+    (thomas.mingarelli@hp.com)
diff --git a/Documentation/x86/00-INDEX b/Documentation/x86/00-INDEX
index dbe3377754a..f37b46d3486 100644
--- a/Documentation/x86/00-INDEX
+++ b/Documentation/x86/00-INDEX
@@ -2,3 +2,5 @@
 	- this file
 mtrr.txt
 	- how to use x86 Memory Type Range Registers to increase performance
+exception-tables.txt
+	- why and how Linux kernel uses exception tables on x86
diff --git a/Documentation/exception.txt b/Documentation/x86/exception-tables.txt
index 2d5aded6424..32901aa36f0 100644
--- a/Documentation/exception.txt
+++ b/Documentation/x86/exception-tables.txt
@@ -1,123 +1,123 @@
-     Kernel level exception handling in Linux 2.1.8
+     Kernel level exception handling in Linux
   Commentary by Joerg Pommnitz <joerg@raleigh.ibm.com>
 
-When a process runs in kernel mode, it often has to access user 
-mode memory whose address has been passed by an untrusted program. 
+When a process runs in kernel mode, it often has to access user
+mode memory whose address has been passed by an untrusted program.
 To protect itself the kernel has to verify this address.
 
-In older versions of Linux this was done with the 
-int verify_area(int type, const void * addr, unsigned long size) 
+In older versions of Linux this was done with the
+int verify_area(int type, const void * addr, unsigned long size)
 function (which has since been replaced by access_ok()).
 
-This function verified that the memory area starting at address 
+This function verified that the memory area starting at address
 'addr' and of size 'size' was accessible for the operation specified
-in type (read or write). To do this, verify_read had to look up the 
-virtual memory area (vma) that contained the address addr. In the 
-normal case (correctly working program), this test was successful. 
+in type (read or write). To do this, verify_read had to look up the
+virtual memory area (vma) that contained the address addr. In the
+normal case (correctly working program), this test was successful.
 It only failed for a few buggy programs. In some kernel profiling
 tests, this normally unneeded verification used up a considerable
 amount of time.
 
-To overcome this situation, Linus decided to let the virtual memory 
+To overcome this situation, Linus decided to let the virtual memory
 hardware present in every Linux-capable CPU handle this test.
 
 How does this work?
 
-Whenever the kernel tries to access an address that is currently not 
-accessible, the CPU generates a page fault exception and calls the 
-page fault handler 
+Whenever the kernel tries to access an address that is currently not
+accessible, the CPU generates a page fault exception and calls the
+page fault handler
 
 void do_page_fault(struct pt_regs *regs, unsigned long error_code)
 
-in arch/i386/mm/fault.c. The parameters on the stack are set up by 
-the low level assembly glue in arch/i386/kernel/entry.S. The parameter
-regs is a pointer to the saved registers on the stack, error_code 
+in arch/x86/mm/fault.c. The parameters on the stack are set up by
+the low level assembly glue in arch/x86/kernel/entry_32.S. The parameter
+regs is a pointer to the saved registers on the stack, error_code
 contains a reason code for the exception.
 
-do_page_fault first obtains the unaccessible address from the CPU 
-control register CR2. If the address is within the virtual address 
-space of the process, the fault probably occurred, because the page 
-was not swapped in, write protected or something similar. However, 
-we are interested in the other case: the address is not valid, there 
-is no vma that contains this address. In this case, the kernel jumps 
-to the bad_area label. 
-
-There it uses the address of the instruction that caused the exception 
-(i.e. regs->eip) to find an address where the execution can continue 
-(fixup). If this search is successful, the fault handler modifies the 
-return address (again regs->eip) and returns. The execution will 
+do_page_fault first obtains the unaccessible address from the CPU
+control register CR2. If the address is within the virtual address
+space of the process, the fault probably occurred, because the page
+was not swapped in, write protected or something similar. However,
+we are interested in the other case: the address is not valid, there
+is no vma that contains this address. In this case, the kernel jumps
+to the bad_area label.
+
+There it uses the address of the instruction that caused the exception
+(i.e. regs->eip) to find an address where the execution can continue
+(fixup). If this search is successful, the fault handler modifies the
+return address (again regs->eip) and returns. The execution will
 continue at the address in fixup.
 
 Where does fixup point to?
 
-Since we jump to the contents of fixup, fixup obviously points 
-to executable code. This code is hidden inside the user access macros. 
-I have picked the get_user macro defined in include/asm/uaccess.h as an
-example. The definition is somewhat hard to follow, so let's peek at 
+Since we jump to the contents of fixup, fixup obviously points
+to executable code. This code is hidden inside the user access macros.
+I have picked the get_user macro defined in arch/x86/include/asm/uaccess.h
+as an example. The definition is somewhat hard to follow, so let's peek at
 the code generated by the preprocessor and the compiler. I selected
-the get_user call in drivers/char/console.c for a detailed examination.
+the get_user call in drivers/char/sysrq.c for a detailed examination.
 
-The original code in console.c line 1405:
+The original code in sysrq.c line 587:
         get_user(c, buf);
 
 The preprocessor output (edited to become somewhat readable):
 
 (
-  {        
-    long __gu_err = - 14 , __gu_val = 0;        
-    const __typeof__(*( (  buf ) )) *__gu_addr = ((buf));        
-    if (((((0 + current_set[0])->tss.segment) == 0x18 )  || 
-       (((sizeof(*(buf))) <= 0xC0000000UL) && 
-       ((unsigned long)(__gu_addr ) <= 0xC0000000UL - (sizeof(*(buf)))))))        
+  {
+    long __gu_err = - 14 , __gu_val = 0;
+    const __typeof__(*( (  buf ) )) *__gu_addr = ((buf));
+    if (((((0 + current_set[0])->tss.segment) == 0x18 )  ||
+       (((sizeof(*(buf))) <= 0xC0000000UL) &&
+       ((unsigned long)(__gu_addr ) <= 0xC0000000UL - (sizeof(*(buf)))))))
       do {
-        __gu_err  = 0;        
-        switch ((sizeof(*(buf)))) {        
-          case 1: 
-            __asm__ __volatile__(        
-              "1:      mov" "b" " %2,%" "b" "1\n"        
-              "2:\n"        
-              ".section .fixup,\"ax\"\n"        
-              "3:      movl %3,%0\n"        
-              "        xor" "b" " %" "b" "1,%" "b" "1\n"        
-              "        jmp 2b\n"        
-              ".section __ex_table,\"a\"\n"        
-              "        .align 4\n"        
-              "        .long 1b,3b\n"        
+        __gu_err  = 0;
+        switch ((sizeof(*(buf)))) {
+          case 1:
+            __asm__ __volatile__(
+              "1:      mov" "b" " %2,%" "b" "1\n"
+              "2:\n"
+              ".section .fixup,\"ax\"\n"
+              "3:      movl %3,%0\n"
+              "        xor" "b" " %" "b" "1,%" "b" "1\n"
+              "        jmp 2b\n"
+              ".section __ex_table,\"a\"\n"
+              "        .align 4\n"
+              "        .long 1b,3b\n"
               ".text"        : "=r"(__gu_err), "=q" (__gu_val): "m"((*(struct __large_struct *)
-                            (   __gu_addr   )) ), "i"(- 14 ), "0"(  __gu_err  )) ; 
-              break;        
-          case 2: 
+                            (   __gu_addr   )) ), "i"(- 14 ), "0"(  __gu_err  )) ;
+              break;
+          case 2:
             __asm__ __volatile__(
-              "1:      mov" "w" " %2,%" "w" "1\n"        
-              "2:\n"        
-              ".section .fixup,\"ax\"\n"        
-              "3:      movl %3,%0\n"        
-              "        xor" "w" " %" "w" "1,%" "w" "1\n"        
-              "        jmp 2b\n"        
-              ".section __ex_table,\"a\"\n"        
-              "        .align 4\n"        
-              "        .long 1b,3b\n"        
+              "1:      mov" "w" " %2,%" "w" "1\n"
+              "2:\n"
+              ".section .fixup,\"ax\"\n"
+              "3:      movl %3,%0\n"
+              "        xor" "w" " %" "w" "1,%" "w" "1\n"
+              "        jmp 2b\n"
+              ".section __ex_table,\"a\"\n"
+              "        .align 4\n"
+              "        .long 1b,3b\n"
               ".text"        : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *)
-                            (   __gu_addr   )) ), "i"(- 14 ), "0"(  __gu_err  )); 
-              break;        
-          case 4: 
-            __asm__ __volatile__(        
-              "1:      mov" "l" " %2,%" "" "1\n"        
-              "2:\n"        
-              ".section .fixup,\"ax\"\n"        
-              "3:      movl %3,%0\n"        
-              "        xor" "l" " %" "" "1,%" "" "1\n"        
-              "        jmp 2b\n"        
-              ".section __ex_table,\"a\"\n"        
-              "        .align 4\n"        "        .long 1b,3b\n"        
+                            (   __gu_addr   )) ), "i"(- 14 ), "0"(  __gu_err  ));
+              break;
+          case 4:
+            __asm__ __volatile__(
+              "1:      mov" "l" " %2,%" "" "1\n"
+              "2:\n"
+              ".section .fixup,\"ax\"\n"
+              "3:      movl %3,%0\n"
+              "        xor" "l" " %" "" "1,%" "" "1\n"
+              "        jmp 2b\n"
+              ".section __ex_table,\"a\"\n"
+              "        .align 4\n"        "        .long 1b,3b\n"
               ".text"        : "=r"(__gu_err), "=r" (__gu_val) : "m"((*(struct __large_struct *)
-                            (   __gu_addr   )) ), "i"(- 14 ), "0"(__gu_err)); 
-              break;        
-          default: 
-            (__gu_val) = __get_user_bad();        
-        }        
-      } while (0) ;        
-    ((c)) = (__typeof__(*((buf))))__gu_val;        
+                            (   __gu_addr   )) ), "i"(- 14 ), "0"(__gu_err));
+              break;
+          default:
+            (__gu_val) = __get_user_bad();
+        }
+      } while (0) ;
+    ((c)) = (__typeof__(*((buf))))__gu_val;
     __gu_err;
   }
 );
@@ -127,12 +127,12 @@ see what code gcc generates:
 
  >         xorl %edx,%edx
  >         movl current_set,%eax
- >         cmpl $24,788(%eax)        
- >         je .L1424        
+ >         cmpl $24,788(%eax)
+ >         je .L1424
  >         cmpl $-1073741825,64(%esp)
- >         ja .L1423                
+ >         ja .L1423
  > .L1424:
- >         movl %edx,%eax                        
+ >         movl %edx,%eax
  >         movl 64(%esp),%ebx
  > #APP
  > 1:      movb (%ebx),%dl                /* this is the actual user access */
@@ -149,17 +149,17 @@ see what code gcc generates:
  > .L1423:
  >         movzbl %dl,%esi
 
-The optimizer does a good job and gives us something we can actually 
-understand. Can we? The actual user access is quite obvious. Thanks 
-to the unified address space we can just access the address in user 
+The optimizer does a good job and gives us something we can actually
+understand. Can we? The actual user access is quite obvious. Thanks
+to the unified address space we can just access the address in user
 memory. But what does the .section stuff do?????
 
 To understand this we have to look at the final kernel:
 
  > objdump --section-headers vmlinux
- > 
+ >
  > vmlinux:     file format elf32-i386
- > 
+ >
  > Sections:
  > Idx Name          Size      VMA       LMA       File off  Algn
  >   0 .text         00098f40  c0100000  c0100000  00001000  2**4
@@ -198,18 +198,18 @@ final kernel executable:
 
 The whole user memory access is reduced to 10 x86 machine instructions.
 The instructions bracketed in the .section directives are no longer
-in the normal execution path. They are located in a different section 
+in the normal execution path. They are located in a different section
 of the executable file:
 
  > objdump --disassemble --section=.fixup vmlinux
- > 
+ >
  > c0199ff5 <.fixup+10b5> movl   $0xfffffff2,%eax
  > c0199ffa <.fixup+10ba> xorb   %dl,%dl
  > c0199ffc <.fixup+10bc> jmp    c017e7a7 <do_con_write+e3>
 
 And finally:
  > objdump --full-contents --section=__ex_table vmlinux
- > 
+ >
  >  c01aa7c4 93c017c0 e09f19c0 97c017c0 99c017c0  ................
  >  c01aa7d4 f6c217c0 e99f19c0 a5e717c0 f59f19c0  ................
  >  c01aa7e4 080a18c0 01a019c0 0a0a18c0 04a019c0  ................
@@ -235,8 +235,8 @@ sections in the ELF object file. So the instructions
 ended up in the .fixup section of the object file and the addresses
         .long 1b,3b
 ended up in the __ex_table section of the object file. 1b and 3b
-are local labels. The local label 1b (1b stands for next label 1 
-backward) is the address of the instruction that might fault, i.e. 
+are local labels. The local label 1b (1b stands for next label 1
+backward) is the address of the instruction that might fault, i.e.
 in our case the address of the label 1 is c017e7a5:
 the original assembly code: > 1:      movb (%ebx),%dl
 and linked in vmlinux     : > c017e7a5 <do_con_write+e1> movb   (%ebx),%dl
@@ -254,7 +254,7 @@ The assembly code
 becomes the value pair
  >  c01aa7d4 c017c2f6 c0199fe9 c017e7a5 c0199ff5  ................
                                ^this is ^this is
-                               1b       3b 
+                               1b       3b
 c017e7a5,c0199ff5 in the exception table of the kernel.
 
 So, what actually happens if a fault from kernel mode with no suitable
@@ -266,9 +266,9 @@ vma occurs?
 3.) CPU calls do_page_fault
 4.) do page fault calls search_exception_table (regs->eip == c017e7a5);
 5.) search_exception_table looks up the address c017e7a5 in the
-    exception table (i.e. the contents of the ELF section __ex_table) 
+    exception table (i.e. the contents of the ELF section __ex_table)
     and returns the address of the associated fault handle code c0199ff5.
-6.) do_page_fault modifies its own return address to point to the fault 
+6.) do_page_fault modifies its own return address to point to the fault
     handle code and returns.
 7.) execution continues in the fault handling code.
 8.) 8a) EAX becomes -EFAULT (== -14)
author	Ingo Molnar <mingo@elte.hu>	2009-09-07 08:19:51 +0200
committer	Ingo Molnar <mingo@elte.hu>	2009-09-07 08:19:51 +0200
commit	a1922ed661ab2c1637d0b10cde933bd9cd33d965 (patch)
tree	0f1777542b385ebefd30b3586d830fd8ed6fda5b /Documentation
parent	75e33751ca8bbb72dd6f1a74d2810ddc8cbe4bdf (diff)
parent	d28daf923ac5e4a0d7cecebae56f3e339189366b (diff)