42 files changed, 2225 insertions, 570 deletions
diff --git a/Documentation/arm/00-INDEX b/Documentation/arm/00-INDEX
index ecf7d04bca2..3b08bc2b04c 100644
--- a/Documentation/arm/00-INDEX
+++ b/Documentation/arm/00-INDEX
@@ -4,8 +4,8 @@ Booting
 	- requirements for booting
 Interrupts
 	- ARM Interrupt subsystem documentation
-IXP2000
-	- Release Notes for Linux on Intel's IXP2000 Network Processor
+IXP4xx
+	- Intel IXP4xx Network processor.
 msm
 	- MSM specific documentation
 Netwinder
@@ -26,11 +26,27 @@ SPEAr
 	- ST SPEAr platform Linux Overview
 VFP/
 	- Release notes for Linux Kernel Vector Floating Point support code
+cluster-pm-race-avoidance.txt
+	- Algorithm for CPU and Cluster setup/teardown
 empeg/
 	- Ltd's Empeg MP3 Car Audio Player
+firmware.txt
+	- Secure firmware registration and calling.
+kernel_mode_neon.txt
+	- How to use NEON instructions in kernel mode
+kernel_user_helpers.txt
+	- Helper functions in kernel space made available for userspace.
 mem_alignment
 	- alignment abort handler documentation
 memory.txt
 	- description of the virtual memory layout
 nwfpe/
 	- NWFPE floating point emulator documentation
+swp_emulation
+	- SWP/SWPB emulation handler/logging description
+tcm.txt
+	- ARM Tightly Coupled Memory
+uefi.txt
+	- [U]EFI configuration and runtime services documentation
+vlocks.txt
+	- Voting locks, low-level mechanism relying on memory system atomic writes.
diff --git a/Documentation/arm/Booting b/Documentation/arm/Booting
index 76850295af8..371814a3671 100644
--- a/Documentation/arm/Booting
+++ b/Documentation/arm/Booting
@@ -18,7 +18,8 @@ following:
 2. Initialise one serial port.
 3. Detect the machine type.
 4. Setup the kernel tagged list.
-5. Call the kernel image.
+5. Load initramfs.
+6. Call the kernel image.
 
 
 1. Setup and initialise RAM
@@ -65,13 +66,19 @@ looks at the connected hardware is beyond the scope of this document.
 The boot loader must ultimately be able to provide a MACH_TYPE_xxx
 value to the kernel. (see linux/arch/arm/tools/mach-types).
 
-
-4. Setup the kernel tagged list
--------------------------------
+4. Setup boot data
+------------------
 
 Existing boot loaders:		OPTIONAL, HIGHLY RECOMMENDED
 New boot loaders:		MANDATORY
 
+The boot loader must provide either a tagged list or a dtb image for
+passing configuration data to the kernel.  The physical address of the
+boot data is passed to the kernel in register r2.
+
+4a. Setup the kernel tagged list
+--------------------------------
+
 The boot loader must create and initialise the kernel tagged list.
 A valid tagged list starts with ATAG_CORE and ends with ATAG_NONE.
 The ATAG_CORE tag may or may not be empty.  An empty ATAG_CORE tag
@@ -101,7 +108,40 @@ The tagged list must be placed in a region of memory where neither
 the kernel decompressor nor initrd 'bootp' program will overwrite
 it.  The recommended placement is in the first 16KiB of RAM.
 
-5. Calling the kernel image
+4b. Setup the device tree
+-------------------------
+
+The boot loader must load a device tree image (dtb) into system ram
+at a 64bit aligned address and initialize it with the boot data.  The
+dtb format is documented in Documentation/devicetree/booting-without-of.txt.
+The kernel will look for the dtb magic value of 0xd00dfeed at the dtb
+physical address to determine if a dtb has been passed instead of a
+tagged list.
+
+The boot loader must pass at a minimum the size and location of the
+system memory, and the root filesystem location.  The dtb must be
+placed in a region of memory where the kernel decompressor will not
+overwrite it, whilst remaining within the region which will be covered
+by the kernel's low-memory mapping.
+
+A safe location is just above the 128MiB boundary from start of RAM.
+
+5. Load initramfs.
+------------------
+
+Existing boot loaders:		OPTIONAL
+New boot loaders:		OPTIONAL
+
+If an initramfs is in use then, as with the dtb, it must be placed in
+a region of memory where the kernel decompressor will not overwrite it
+while also with the region which will be covered by the kernel's
+low-memory mapping.
+
+A safe location is just above the device tree blob which itself will
+be loaded just above the 128MiB boundary from the start of RAM as
+recommended above.
+
+6. Calling the kernel image
 ---------------------------
 
 Existing boot loaders:		MANDATORY
@@ -112,11 +152,17 @@ is stored in flash, and is linked correctly to be run from flash,
 then it is legal for the boot loader to call the zImage in flash
 directly.
 
-The zImage may also be placed in system RAM (at any location) and
-called there.  Note that the kernel uses 16K of RAM below the image
-to store page tables.  The recommended placement is 32KiB into RAM.
+The zImage may also be placed in system RAM and called there.  The
+kernel should be placed in the first 128MiB of RAM.  It is recommended
+that it is loaded above 32MiB in order to avoid the need to relocate
+prior to decompression, which will make the boot process slightly
+faster.
 
-In either case, the following conditions must be met:
+When booting a raw (non-zImage) kernel the constraints are tighter.
+In this case the kernel must be loaded at an offset into system equal
+to TEXT_OFFSET - PAGE_OFFSET.
+
+In any case, the following conditions must be met:
 
 - Quiesce all DMA capable devices so that memory does not get
   corrupted by bogus network packets or disk data. This will save
@@ -125,17 +171,43 @@ In either case, the following conditions must be met:
 - CPU register settings
   r0 = 0,
   r1 = machine type number discovered in (3) above.
-  r2 = physical address of tagged list in system RAM.
+  r2 = physical address of tagged list in system RAM, or
+       physical address of device tree block (dtb) in system RAM
 
 - CPU mode
   All forms of interrupts must be disabled (IRQs and FIQs)
-  The CPU must be in SVC mode.  (A special exception exists for Angel)
+
+  For CPUs which do not include the ARM virtualization extensions, the
+  CPU must be in SVC mode.  (A special exception exists for Angel)
+
+  CPUs which include support for the virtualization extensions can be
+  entered in HYP mode in order to enable the kernel to make full use of
+  these extensions.  This is the recommended boot method for such CPUs,
+  unless the virtualisations are already in use by a pre-installed
+  hypervisor.
+
+  If the kernel is not entered in HYP mode for any reason, it must be
+  entered in SVC mode.
 
 - Caches, MMUs
   The MMU must be off.
   Instruction cache may be on or off.
   Data cache must be off.
 
+  If the kernel is entered in HYP mode, the above requirements apply to
+  the HYP mode configuration in addition to the ordinary PL1 (privileged
+  kernel modes) configuration.  In addition, all traps into the
+  hypervisor must be disabled, and PL1 access must be granted for all
+  peripherals and CPU resources for which this is architecturally
+  possible.  Except for entering in HYP mode, the system configuration
+  should be such that a kernel which does not include support for the
+  virtualization extensions can boot correctly without extra help.
+
 - The boot loader is expected to call the kernel image by jumping
   directly to the first instruction of the kernel image.
 
+  On CPUs supporting the ARM instruction set, the entry must be
+  made in ARM state, even for a Thumb-2 kernel.
+
+  On CPUs supporting only the Thumb instruction set such as
+  Cortex-M class CPUs, the entry must be made in Thumb state.
diff --git a/Documentation/arm/IXP2000 b/Documentation/arm/IXP2000
deleted file mode 100644
index 68d21d92a30..00000000000
--- a/Documentation/arm/IXP2000
+++ /dev/null
@@ -1,69 +0,0 @@
-
--------------------------------------------------------------------------
-Release Notes for Linux on Intel's IXP2000 Network Processor
-
-Maintained by Deepak Saxena <dsaxena@plexity.net>
--------------------------------------------------------------------------
-
-1. Overview
-
-Intel's IXP2000 family of NPUs (IXP2400, IXP2800, IXP2850) is designed
-for high-performance network applications such high-availability
-telecom systems. In addition to an XScale core, it contains up to 8
-"MicroEngines" that run special code, several high-end networking 
-interfaces (UTOPIA, SPI, etc), a PCI host bridge, one serial port,
-flash interface, and some other odds and ends. For more information, see:
-
-http://developer.intel.com
-
-2. Linux Support
-
-Linux currently supports the following features on the IXP2000 NPUs:
-
-- On-chip serial
-- PCI
-- Flash (MTD/JFFS2)
-- I2C through GPIO
-- Timers (watchdog, OS)
-
-That is about all we can support under Linux ATM b/c the core networking
-components of the chip are accessed via Intel's closed source SDK. 
-Please contact Intel directly on issues with using those. There is
-also a mailing list run by some folks at Princeton University that might
-be of help:  https://lists.cs.princeton.edu/mailman/listinfo/ixp2xxx
-
-WHATEVER YOU DO, DO NOT POST EMAIL TO THE LINUX-ARM OR LINUX-ARM-KERNEL
-MAILING LISTS REGARDING THE INTEL SDK.
-
-3. Supported Platforms
-
-- Intel IXDP2400 Reference Platform
-- Intel IXDP2800 Reference Platform
-- Intel IXDP2401 Reference Platform
-- Intel IXDP2801 Reference Platform
-- RadiSys ENP-2611
-
-4. Usage Notes
-
-- The IXP2000 platforms usually have rather complex PCI bus topologies
-  with large memory space requirements. In addition, b/c of the way the
-  Intel SDK is designed, devices are enumerated in a very specific
-  way. B/c of this this, we use "pci=firmware" option in the kernel
-  command line so that we do not re-enumerate the bus.
-
-- IXDP2x01 systems have variable clock tick rates that we cannot determine 
-  via HW registers. The "ixdp2x01_clk=XXX" cmd line options allow you
-  to pass the clock rate to the board port.
-
-5. Thanks
-
-The IXP2000 work has been funded by Intel Corp. and MontaVista Software, Inc.
-
-The following people have contributed patches/comments/etc:
-
-Naeem F. Afzal
-Lennert Buytenhek
-Jeffrey Daly
-
--------------------------------------------------------------------------
-Last Update: 8/09/2004
diff --git a/Documentation/arm/IXP4xx b/Documentation/arm/IXP4xx
index 133c5fa6c7a..e48b74de6ac 100644
--- a/Documentation/arm/IXP4xx
+++ b/Documentation/arm/IXP4xx
@@ -36,7 +36,7 @@ Linux currently supports the following features on the IXP4xx chips:
 - Timers (watchdog, OS)
 
 The following components of the chips are not supported by Linux and
-require the use of Intel's propietary CSR softare:
+require the use of Intel's proprietary CSR software:
 
 - USB device interface
 - Network interfaces (HSS, Utopia, NPEs, etc)
@@ -47,7 +47,7 @@ software from:
 
    http://developer.intel.com/design/network/products/npfamily/ixp425.htm    
 
-DO NOT POST QUESTIONS TO THE LINUX MAILING LISTS REGARDING THE PROPIETARY
+DO NOT POST QUESTIONS TO THE LINUX MAILING LISTS REGARDING THE PROPRIETARY
 SOFTWARE.
 
 There are several websites that provide directions/pointers on using
diff --git a/Documentation/arm/Marvell/README b/Documentation/arm/Marvell/README
new file mode 100644
index 00000000000..2cce5401e32
--- /dev/null
+++ b/Documentation/arm/Marvell/README
@@ -0,0 +1,272 @@
+ARM Marvell SoCs
+================
+
+This document lists all the ARM Marvell SoCs that are currently
+supported in mainline by the Linux kernel. As the Marvell families of
+SoCs are large and complex, it is hard to understand where the support
+for a particular SoC is available in the Linux kernel. This document
+tries to help in understanding where those SoCs are supported, and to
+match them with their corresponding public datasheet, when available.
+
+Orion family
+------------
+
+  Flavors:
+        88F5082
+        88F5181
+        88F5181L
+        88F5182
+               Datasheet               : http://www.embeddedarm.com/documentation/third-party/MV88F5182-datasheet.pdf
+               Programmer's User Guide : http://www.embeddedarm.com/documentation/third-party/MV88F5182-opensource-manual.pdf
+               User Manual             : http://www.embeddedarm.com/documentation/third-party/MV88F5182-usermanual.pdf
+        88F5281
+               Datasheet               : http://www.ocmodshop.com/images/reviews/networking/qnap_ts409u/marvel_88f5281_data_sheet.pdf
+        88F6183
+  Core: Feroceon ARMv5 compatible
+  Linux kernel mach directory: arch/arm/mach-orion5x
+  Linux kernel plat directory: arch/arm/plat-orion
+
+Kirkwood family
+---------------
+
+  Flavors:
+        88F6282 a.k.a Armada 300
+                Product Brief  : http://www.marvell.com/embedded-processors/armada-300/assets/armada_310.pdf
+        88F6283 a.k.a Armada 310
+                Product Brief  : http://www.marvell.com/embedded-processors/armada-300/assets/armada_310.pdf
+        88F6190
+                Product Brief  : http://www.marvell.com/embedded-processors/kirkwood/assets/88F6190-003_WEB.pdf
+                Hardware Spec  : http://www.marvell.com/embedded-processors/kirkwood/assets/HW_88F619x_OpenSource.pdf
+                Functional Spec: http://www.marvell.com/embedded-processors/kirkwood/assets/FS_88F6180_9x_6281_OpenSource.pdf
+        88F6192
+                Product Brief  : http://www.marvell.com/embedded-processors/kirkwood/assets/88F6192-003_ver1.pdf
+                Hardware Spec  : http://www.marvell.com/embedded-processors/kirkwood/assets/HW_88F619x_OpenSource.pdf
+                Functional Spec: http://www.marvell.com/embedded-processors/kirkwood/assets/FS_88F6180_9x_6281_OpenSource.pdf
+        88F6182
+        88F6180
+                Product Brief  : http://www.marvell.com/embedded-processors/kirkwood/assets/88F6180-003_ver1.pdf
+                Hardware Spec  : http://www.marvell.com/embedded-processors/kirkwood/assets/HW_88F6180_OpenSource.pdf
+                Functional Spec: http://www.marvell.com/embedded-processors/kirkwood/assets/FS_88F6180_9x_6281_OpenSource.pdf
+        88F6281
+                Product Brief  : http://www.marvell.com/embedded-processors/kirkwood/assets/88F6281-004_ver1.pdf
+                Hardware Spec  : http://www.marvell.com/embedded-processors/kirkwood/assets/HW_88F6281_OpenSource.pdf
+                Functional Spec: http://www.marvell.com/embedded-processors/kirkwood/assets/FS_88F6180_9x_6281_OpenSource.pdf
+  Homepage: http://www.marvell.com/embedded-processors/kirkwood/
+  Core: Feroceon ARMv5 compatible
+  Linux kernel mach directory: arch/arm/mach-kirkwood
+  Linux kernel plat directory: arch/arm/plat-orion
+
+Discovery family
+----------------
+
+  Flavors:
+        MV78100
+                Product Brief  : http://www.marvell.com/embedded-processors/discovery-innovation/assets/MV78100-003_WEB.pdf
+                Hardware Spec  : http://www.marvell.com/embedded-processors/discovery-innovation/assets/HW_MV78100_OpenSource.pdf
+                Functional Spec: http://www.marvell.com/embedded-processors/discovery-innovation/assets/FS_MV76100_78100_78200_OpenSource.pdf
+        MV78200
+                Product Brief  : http://www.marvell.com/embedded-processors/discovery-innovation/assets/MV78200-002_WEB.pdf
+                Hardware Spec  : http://www.marvell.com/embedded-processors/discovery-innovation/assets/HW_MV78200_OpenSource.pdf
+                Functional Spec: http://www.marvell.com/embedded-processors/discovery-innovation/assets/FS_MV76100_78100_78200_OpenSource.pdf
+        MV76100
+                Not supported by the Linux kernel.
+
+  Core: Feroceon ARMv5 compatible
+
+  Linux kernel mach directory: arch/arm/mach-mv78xx0
+  Linux kernel plat directory: arch/arm/plat-orion
+
+EBU Armada family
+-----------------
+
+  Armada 370 Flavors:
+        88F6710
+        88F6707
+        88F6W11
+    Product Brief: http://www.marvell.com/embedded-processors/armada-300/assets/Marvell_ARMADA_370_SoC.pdf
+
+  Armada 375 Flavors:
+	88F6720
+    Product Brief: http://www.marvell.com/embedded-processors/armada-300/assets/ARMADA_375_SoC-01_product_brief.pdf
+
+  Armada 380/385 Flavors:
+	88F6810
+	88F6820
+	88F6828
+
+  Armada XP Flavors:
+        MV78230
+        MV78260
+        MV78460
+    NOTE: not to be confused with the non-SMP 78xx0 SoCs
+    Product Brief: http://www.marvell.com/embedded-processors/armada-xp/assets/Marvell-ArmadaXP-SoC-product%20brief.pdf
+
+  No public datasheet available.
+
+  Core: Sheeva ARMv7 compatible
+
+  Linux kernel mach directory: arch/arm/mach-mvebu
+  Linux kernel plat directory: none
+
+Avanta family
+-------------
+
+  Flavors:
+       88F6510
+       88F6530P
+       88F6550
+       88F6560
+  Homepage     : http://www.marvell.com/broadband/
+  Product Brief: http://www.marvell.com/broadband/assets/Marvell_Avanta_88F6510_305_060-001_product_brief.pdf
+  No public datasheet available.
+
+  Core: ARMv5 compatible
+
+  Linux kernel mach directory: no code in mainline yet, planned for the future
+  Linux kernel plat directory: no code in mainline yet, planned for the future
+
+Dove family (application processor)
+-----------------------------------
+
+  Flavors:
+        88AP510 a.k.a Armada 510
+                Product Brief   : http://www.marvell.com/application-processors/armada-500/assets/Marvell_Armada510_SoC.pdf
+                Hardware Spec   : http://www.marvell.com/application-processors/armada-500/assets/Armada-510-Hardware-Spec.pdf
+                Functional Spec : http://www.marvell.com/application-processors/armada-500/assets/Armada-510-Functional-Spec.pdf
+  Homepage: http://www.marvell.com/application-processors/armada-500/
+  Core: ARMv7 compatible
+  Directory: arch/arm/mach-dove
+
+PXA 2xx/3xx/93x/95x family
+--------------------------
+
+  Flavors:
+        PXA21x, PXA25x, PXA26x
+             Application processor only
+             Core: ARMv5 XScale core
+        PXA270, PXA271, PXA272
+             Product Brief         : http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_pb.pdf
+             Design guide          : http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_design_guide.pdf
+             Developers manual     : http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_dev_man.pdf
+             Specification         : http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_emts.pdf
+             Specification update  : http://www.marvell.com/application-processors/pxa-family/assets/pxa_27x_spec_update.pdf
+             Application processor only
+             Core: ARMv5 XScale core
+        PXA300, PXA310, PXA320
+             PXA 300 Product Brief : http://www.marvell.com/application-processors/pxa-family/assets/PXA300_PB_R4.pdf
+             PXA 310 Product Brief : http://www.marvell.com/application-processors/pxa-family/assets/PXA310_PB_R4.pdf
+             PXA 320 Product Brief : http://www.marvell.com/application-processors/pxa-family/assets/PXA320_PB_R4.pdf
+             Design guide          : http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_Design_Guide.pdf
+             Developers manual     : http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_Developers_Manual.zip
+             Specifications        : http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_EMTS.pdf
+             Specification Update  : http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_Spec_Update.zip
+             Reference Manual      : http://www.marvell.com/application-processors/pxa-family/assets/PXA3xx_TavorP_BootROM_Ref_Manual.pdf
+             Application processor only
+             Core: ARMv5 XScale core
+        PXA930, PXA935
+             Application processor with Communication processor
+             Core: ARMv5 XScale core
+        PXA955
+             Application processor with Communication processor
+             Core: ARMv7 compatible Sheeva PJ4 core
+
+   Comments:
+
+    * This line of SoCs originates from the XScale family developed by
+      Intel and acquired by Marvell in ~2006. The PXA21x, PXA25x,
+      PXA26x, PXA27x, PXA3xx and PXA93x were developed by Intel, while
+      the later PXA95x were developed by Marvell.
+
+    * Due to their XScale origin, these SoCs have virtually nothing in
+      common with the other (Kirkwood, Dove, etc.) families of Marvell
+      SoCs, except with the MMP/MMP2 family of SoCs.
+
+   Linux kernel mach directory: arch/arm/mach-pxa
+   Linux kernel plat directory: arch/arm/plat-pxa
+
+MMP/MMP2 family (communication processor)
+-----------------------------------------
+
+   Flavors:
+        PXA168, a.k.a Armada 168
+             Homepage             : http://www.marvell.com/application-processors/armada-100/armada-168.jsp
+             Product brief        : http://www.marvell.com/application-processors/armada-100/assets/pxa_168_pb.pdf
+             Hardware manual      : http://www.marvell.com/application-processors/armada-100/assets/armada_16x_datasheet.pdf
+             Software manual      : http://www.marvell.com/application-processors/armada-100/assets/armada_16x_software_manual.pdf
+             Specification update : http://www.marvell.com/application-processors/armada-100/assets/ARMADA16x_Spec_update.pdf
+             Boot ROM manual      : http://www.marvell.com/application-processors/armada-100/assets/armada_16x_ref_manual.pdf
+             App node package     : http://www.marvell.com/application-processors/armada-100/assets/armada_16x_app_note_package.pdf
+             Application processor only
+             Core: ARMv5 compatible Marvell PJ1 (Mohawk)
+        PXA910
+             Homepage             : http://www.marvell.com/communication-processors/pxa910/
+             Product Brief        : http://www.marvell.com/communication-processors/pxa910/assets/Marvell_PXA910_Platform-001_PB_final.pdf
+             Application processor with Communication processor
+             Core: ARMv5 compatible Marvell PJ1 (Mohawk)
+        MMP2, a.k.a Armada 610
+             Product Brief        : http://www.marvell.com/application-processors/armada-600/assets/armada610_pb.pdf
+             Application processor only
+             Core: ARMv7 compatible Sheeva PJ4 core
+
+   Comments:
+
+    * This line of SoCs originates from the XScale family developed by
+      Intel and acquired by Marvell in ~2006. All the processors of
+      this MMP/MMP2 family were developed by Marvell.
+
+    * Due to their XScale origin, these SoCs have virtually nothing in
+      common with the other (Kirkwood, Dove, etc.) families of Marvell
+      SoCs, except with the PXA family of SoCs listed above.
+
+   Linux kernel mach directory: arch/arm/mach-mmp
+   Linux kernel plat directory: arch/arm/plat-pxa
+
+Berlin family (Digital Entertainment)
+-------------------------------------
+
+  Flavors:
+	88DE3005, Armada 1500-mini
+		Design name:	BG2CD
+		Core:		ARM Cortex-A9, PL310 L2CC
+		Homepage:	http://www.marvell.com/digital-entertainment/armada-1500-mini/
+	88DE3100, Armada 1500
+		Design name:	BG2
+		Core:		Marvell PJ4B (ARMv7), Tauros3 L2CC
+		Homepage:	http://www.marvell.com/digital-entertainment/armada-1500/
+		Product Brief:	http://www.marvell.com/digital-entertainment/armada-1500/assets/Marvell-ARMADA-1500-Product-Brief.pdf
+	88DE3114, Armada 1500 Pro
+		Design name:	BG2-Q
+		Core:		Quad Core ARM Cortex-A9, PL310 L2CC
+		Homepage:	http://www.marvell.com/digital-entertainment/armada-1500-pro/
+		Product Brief:	http://www.marvell.com/digital-entertainment/armada-1500-pro/assets/Marvell_ARMADA_1500_PRO-01_product_brief.pdf
+	88DE????
+		Design name:	BG3
+		Core:		ARM Cortex-A15, CA15 integrated L2CC
+
+  Homepage: http://www.marvell.com/digital-entertainment/
+  Directory: arch/arm/mach-berlin
+
+  Comments:
+   * This line of SoCs is based on Marvell Sheeva or ARM Cortex CPUs
+     with Synopsys DesignWare (IRQ, GPIO, Timers, ...) and PXA IP (SDHCI, USB, ETH, ...).
+
+Long-term plans
+---------------
+
+ * Unify the mach-dove/, mach-mv78xx0/, mach-orion5x/ and
+   mach-kirkwood/ into the mach-mvebu/ to support all SoCs from the
+   Marvell EBU (Engineering Business Unit) in a single mach-<foo>
+   directory. The plat-orion/ would therefore disappear.
+
+ * Unify the mach-mmp/ and mach-pxa/ into the same mach-pxa
+   directory. The plat-pxa/ would therefore disappear.
+
+Credits
+-------
+
+ Maen Suleiman <maen@marvell.com>
+ Lior Amsalem <alior@marvell.com>
+ Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
+ Andrew Lunn <andrew@lunn.ch>
+ Nicolas Pitre <nico@fluxnic.net>
+ Eric Miao <eric.y.miao@gmail.com>
diff --git a/Documentation/arm/OMAP/DSS b/Documentation/arm/OMAP/DSS
index 0af0e9eed5d..4484e021290 100644
--- a/Documentation/arm/OMAP/DSS
+++ b/Documentation/arm/OMAP/DSS
@@ -47,6 +47,51 @@ flexible way to enable non-common multi-display configuration. In addition to
 modelling the hardware overlays, omapdss supports virtual overlays and overlay
 managers. These can be used when updating a display with CPU or system DMA.
 
+omapdss driver support for audio
+--------------------------------
+There exist several display technologies and standards that support audio as
+well. Hence, it is relevant to update the DSS device driver to provide an audio
+interface that may be used by an audio driver or any other driver interested in
+the functionality.
+
+The audio_enable function is intended to prepare the relevant
+IP for playback (e.g., enabling an audio FIFO, taking in/out of reset
+some IP, enabling companion chips, etc). It is intended to be called before
+audio_start. The audio_disable function performs the reverse operation and is
+intended to be called after audio_stop.
+
+While a given DSS device driver may support audio, it is possible that for
+certain configurations audio is not supported (e.g., an HDMI display using a
+VESA video timing). The audio_supported function is intended to query whether
+the current configuration of the display supports audio.
+
+The audio_config function is intended to configure all the relevant audio
+parameters of the display. In order to make the function independent of any
+specific DSS device driver, a struct omap_dss_audio is defined. Its purpose
+is to contain all the required parameters for audio configuration. At the
+moment, such structure contains pointers to IEC-60958 channel status word
+and CEA-861 audio infoframe structures. This should be enough to support
+HDMI and DisplayPort, as both are based on CEA-861 and IEC-60958.
+
+The audio_enable/disable, audio_config and audio_supported functions could be
+implemented as functions that may sleep. Hence, they should not be called
+while holding a spinlock or a readlock.
+
+The audio_start/audio_stop function is intended to effectively start/stop audio
+playback after the configuration has taken place. These functions are designed
+to be used in an atomic context. Hence, audio_start should return quickly and be
+called only after all the needed resources for audio playback (audio FIFOs,
+DMA channels, companion chips, etc) have been enabled to begin data transfers.
+audio_stop is designed to only stop the audio transfers. The resources used
+for playback are released using audio_disable.
+
+The enum omap_dss_audio_state may be used to help the implementations of
+the interface to keep track of the audio state. The initial state is _DISABLED;
+then, the state transitions to _CONFIGURED, and then, when it is ready to
+play audio, to _ENABLED. The state _PLAYING is used when the audio is being
+rendered.
+
+
 Panel and controller drivers
 ----------------------------
 
@@ -156,6 +201,7 @@ timings		Display timings (pixclock,xres/hfp/hbp/hsw,yres/vfp/vbp/vsw)
 		"pal" and "ntsc"
 panel_name
 tear_elim	Tearing elimination 0=off, 1=on
+output_type	Output type (video encoder only): "composite" or "svideo"
 
 There are also some debugfs files at <debugfs>/omapdss/ which show information
 about clocks and registers.
@@ -239,7 +285,10 @@ FB0 +-- GFX  ---- LCD ---- LCD
 Misc notes
 ----------
 
-OMAP FB allocates the framebuffer memory using the OMAP VRAM allocator.
+OMAP FB allocates the framebuffer memory using the standard dma allocator. You
+can enable Contiguous Memory Allocator (CONFIG_CMA) to improve the dma
+allocator, and if CMA is enabled, you use "cma=" kernel parameter to increase
+the global memory area for CMA.
 
 Using DSI DPLL to generate pixel clock it is possible produce the pixel clock
 of 86.5MHz (max possible), and with that you get 1280x1024@57 output from DVI.
@@ -255,10 +304,6 @@ framebuffer parameters.
 Kernel boot arguments
 ---------------------
 
-vram=<size>
-	- Amount of total VRAM to preallocate. For example, "10M". omapfb
-	  allocates memory for framebuffers from VRAM.
-
 omapfb.mode=<display>:<mode>[,...]
 	- Default video mode for specified displays. For example,
 	  "dvi:800x400MR-24@60".  See drivers/video/modedb.c.
diff --git a/Documentation/arm/OMAP/omap_pm b/Documentation/arm/OMAP/omap_pm
index 5389440aade..4ae915a9f89 100644
--- a/Documentation/arm/OMAP/omap_pm
+++ b/Documentation/arm/OMAP/omap_pm
@@ -78,7 +78,7 @@ to NULL.  Drivers should use the following idiom:
 The most common usage of these functions will probably be to specify
 the maximum time from when an interrupt occurs, to when the device
 becomes accessible.  To accomplish this, driver writers should use the
-set_max_mpu_wakeup_lat() function to to constrain the MPU wakeup
+set_max_mpu_wakeup_lat() function to constrain the MPU wakeup
 latency, and the set_max_dev_wakeup_lat() function to constrain the
 device wakeup latency (from clk_enable() to accessibility).  For
 example,
@@ -127,3 +127,28 @@ implementation needs:
 10. (*pdata->cpu_set_freq)(unsigned long f)
 
 11. (*pdata->cpu_get_freq)(void)
+
+Customizing OPP for platform
+============================
+Defining CONFIG_PM should enable OPP layer for the silicon
+and the registration of OPP table should take place automatically.
+However, in special cases, the default OPP table may need to be
+tweaked, for e.g.:
+ * enable default OPPs which are disabled by default, but which
+   could be enabled on a platform
+ * Disable an unsupported OPP on the platform
+ * Define and add a custom opp table entry
+in these cases, the board file needs to do additional steps as follows:
+arch/arm/mach-omapx/board-xyz.c
+	#include "pm.h"
+	....
+	static void __init omap_xyz_init_irq(void)
+	{
+		....
+		/* Initialize the default table */
+		omapx_opp_init();
+		/* Do customization to the defaults */
+		....
+	}
+NOTE: omapx_opp_init will be omap3_opp_init or as required
+based on the omap family.
diff --git a/Documentation/arm/SH-Mobile/Makefile b/Documentation/arm/SH-Mobile/Makefile
new file mode 100644
index 00000000000..8771d832cf8
--- /dev/null
+++ b/Documentation/arm/SH-Mobile/Makefile
@@ -0,0 +1,8 @@
+BIN := vrl4
+
+.PHONY: all
+all: $(BIN)
+
+.PHONY: clean
+clean:
+	rm -f *.o $(BIN)
diff --git a/Documentation/arm/SH-Mobile/vrl4.c b/Documentation/arm/SH-Mobile/vrl4.c
new file mode 100644
index 00000000000..e8a191358ad
--- /dev/null
+++ b/Documentation/arm/SH-Mobile/vrl4.c
@@ -0,0 +1,169 @@
+/*
+ * vrl4 format generator
+ *
+ * Copyright (C) 2010 Simon Horman
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ */
+
+/*
+ * usage: vrl4 < zImage > out
+ *	  dd if=out of=/dev/sdx bs=512 seek=1 # Write the image to sector 1
+ *
+ * Reads a zImage from stdin and writes a vrl4 image to stdout.
+ * In practice this means writing a padded vrl4 header to stdout followed
+ * by the zImage.
+ *
+ * The padding places the zImage at ALIGN bytes into the output.
+ * The vrl4 uses ALIGN + START_BASE as the start_address.
+ * This is where the mask ROM will jump to after verifying the header.
+ *
+ * The header sets copy_size to min(sizeof(zImage), MAX_BOOT_PROG_LEN) + ALIGN.
+ * That is, the mask ROM will load the padded header (ALIGN bytes)
+ * And then MAX_BOOT_PROG_LEN bytes of the image, or the entire image,
+ * whichever is smaller.
+ *
+ * The zImage is not modified in any way.
+ */
+
+#define _BSD_SOURCE
+#include <endian.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <errno.h>
+
+struct hdr {
+	uint32_t magic1;
+	uint32_t reserved1;
+	uint32_t magic2;
+	uint32_t reserved2;
+	uint16_t copy_size;
+	uint16_t boot_options;
+	uint32_t reserved3;
+	uint32_t start_address;
+	uint32_t reserved4;
+	uint32_t reserved5;
+	char     reserved6[308];
+};
+
+#define DECLARE_HDR(h)					\
+	struct hdr (h) = {				\
+		.magic1 =	htole32(0xea000000),	\
+		.reserved1 =	htole32(0x56),		\
+		.magic2 =	htole32(0xe59ff008),	\
+		.reserved3 =	htole16(0x1) }
+
+/* Align to 512 bytes, the MMCIF sector size */
+#define ALIGN_BITS	9
+#define ALIGN		(1 << ALIGN_BITS)
+
+#define START_BASE	0xe55b0000
+
+/*
+ * With an alignment of 512 the header uses the first sector.
+ * There is a 128 sector (64kbyte) limit on the data loaded by the mask ROM.
+ * So there are 127 sectors left for the boot programme. But in practice
+ * Only a small portion of a zImage is needed, 16 sectors should be more
+ * than enough.
+ *
+ * Note that this sets how much of the zImage is copied by the mask ROM.
+ * The entire zImage is present after the header and is loaded
+ * by the code in the boot program (which is the first portion of the zImage).
+ */
+#define	MAX_BOOT_PROG_LEN (16 * 512)
+
+#define ROUND_UP(x)	((x + ALIGN - 1) & ~(ALIGN - 1))
+
+ssize_t do_read(int fd, void *buf, size_t count)
+{
+	size_t offset = 0;
+	ssize_t l;
+
+	while (offset < count) {
+		l = read(fd, buf + offset, count - offset);
+		if (!l)
+			break;
+		if (l < 0) {
+			if (errno == EAGAIN || errno == EWOULDBLOCK)
+				continue;
+			perror("read");
+			return -1;
+		}
+		offset += l;
+	}
+
+	return offset;
+}
+
+ssize_t do_write(int fd, const void *buf, size_t count)
+{
+	size_t offset = 0;
+	ssize_t l;
+
+	while (offset < count) {
+		l = write(fd, buf + offset, count - offset);
+		if (l < 0) {
+			if (errno == EAGAIN || errno == EWOULDBLOCK)
+				continue;
+			perror("write");
+			return -1;
+		}
+		offset += l;
+	}
+
+	return offset;
+}
+
+ssize_t write_zero(int fd, size_t len)
+{
+	size_t i = len;
+
+	while (i--) {
+		const char x = 0;
+		if (do_write(fd, &x, 1) < 0)
+			return -1;
+	}
+
+	return len;
+}
+
+int main(void)
+{
+	DECLARE_HDR(hdr);
+	char boot_program[MAX_BOOT_PROG_LEN];
+	size_t aligned_hdr_len, alligned_prog_len;
+	ssize_t prog_len;
+
+	prog_len = do_read(0, boot_program, sizeof(boot_program));
+	if (prog_len <= 0)
+		return -1;
+
+	aligned_hdr_len = ROUND_UP(sizeof(hdr));
+	hdr.start_address = htole32(START_BASE + aligned_hdr_len);
+	alligned_prog_len = ROUND_UP(prog_len);
+	hdr.copy_size = htole16(aligned_hdr_len + alligned_prog_len);
+
+	if (do_write(1, &hdr, sizeof(hdr)) < 0)
+		return -1;
+	if (write_zero(1, aligned_hdr_len - sizeof(hdr)) < 0)
+		return -1;
+
+	if (do_write(1, boot_program, prog_len) < 0)
+		return 1;
+
+	/* Write out the rest of the kernel */
+	while (1) {
+		prog_len = do_read(0, boot_program, sizeof(boot_program));
+		if (prog_len < 0)
+			return 1;
+		if (prog_len == 0)
+			break;
+		if (do_write(1, boot_program, prog_len) < 0)
+			return 1;
+	}
+
+	return 0;
+}
diff --git a/Documentation/arm/SH-Mobile/zboot-rom-mmcif.txt b/Documentation/arm/SH-Mobile/zboot-rom-mmcif.txt
new file mode 100644
index 00000000000..efff8ae2713
--- /dev/null
+++ b/Documentation/arm/SH-Mobile/zboot-rom-mmcif.txt
@@ -0,0 +1,29 @@
+ROM-able zImage boot from MMC
+-----------------------------
+
+An ROM-able zImage compiled with ZBOOT_ROM_MMCIF may be written to MMC and
+SuperH Mobile ARM will to boot directly from the MMCIF hardware block.
+
+This is achieved by the mask ROM loading the first portion of the image into
+MERAM and then jumping to it. This portion contains loader code which
+copies the entire image to SDRAM and jumps to it. From there the zImage
+boot code proceeds as normal, uncompressing the image into its final
+location and then jumping to it.
+
+This code has been tested on an AP4EB board using the developer 1A eMMC
+boot mode which is configured using the following jumper settings.
+The board used for testing required a patched mask ROM in order for
+this mode to function.
+
+   8 7 6 5 4 3 2 1
+   x|x|x|x|x| |x|
+S4 -+-+-+-+-+-+-+-
+    | | | | |x| |x on
+
+The zImage must be written to the MMC card at sector 1 (512 bytes) in
+vrl4 format. A utility vrl4 is supplied to accomplish this.
+
+e.g.
+	vrl4 < zImage | dd of=/dev/sdX bs=512 seek=1
+
+A dual-voltage MMC 4.0 card was used for testing.
diff --git a/Documentation/arm/SH-Mobile/zboot-rom-sdhi.txt b/Documentation/arm/SH-Mobile/zboot-rom-sdhi.txt
new file mode 100644
index 00000000000..441959846e1
--- /dev/null
+++ b/Documentation/arm/SH-Mobile/zboot-rom-sdhi.txt
@@ -0,0 +1,42 @@
+ROM-able zImage boot from eSD
+-----------------------------
+
+An ROM-able zImage compiled with ZBOOT_ROM_SDHI may be written to eSD and
+SuperH Mobile ARM will to boot directly from the SDHI hardware block.
+
+This is achieved by the mask ROM loading the first portion of the image into
+MERAM and then jumping to it. This portion contains loader code which
+copies the entire image to SDRAM and jumps to it. From there the zImage
+boot code proceeds as normal, uncompressing the image into its final
+location and then jumping to it.
+
+This code has been tested on an mackerel board using the developer 1A eSD
+boot mode which is configured using the following jumper settings.
+
+   8 7 6 5 4 3 2 1
+   x|x|x|x| |x|x|
+S4 -+-+-+-+-+-+-+-
+    | | | |x| | |x on
+
+The eSD card needs to be present in SDHI slot 1 (CN7).
+As such S1 and S33 also need to be configured as per
+the notes in arch/arm/mach-shmobile/board-mackerel.c.
+
+A partial zImage must be written to physical partition #1 (boot)
+of the eSD at sector 0 in vrl4 format. A utility vrl4 is supplied to
+accomplish this.
+
+e.g.
+	vrl4 < zImage | dd of=/dev/sdX bs=512 count=17
+
+A full copy of _the same_ zImage should be written to physical partition #1
+(boot) of the eSD at sector 0. This should _not_ be in vrl4 format.
+
+	vrl4 < zImage | dd of=/dev/sdX bs=512
+
+Note: The commands above assume that the physical partition has been
+switched. No such facility currently exists in the Linux Kernel.
+
+Physical partitions are described in the eSD specification.  At the time of
+writing they are not the same as partitions that are typically configured
+using fdisk and visible through /proc/partitions
diff --git a/Documentation/arm/SPEAr/overview.txt b/Documentation/arm/SPEAr/overview.txt
index 253a35c6f78..65610bf52eb 100644
--- a/Documentation/arm/SPEAr/overview.txt
+++ b/Documentation/arm/SPEAr/overview.txt
@@ -8,53 +8,56 @@ Introduction
   weblink : http://www.st.com/spear
 
   The ST Microelectronics SPEAr range of ARM9/CortexA9 System-on-Chip CPUs are
-  supported by the 'spear' platform of ARM Linux. Currently SPEAr300,
-  SPEAr310, SPEAr320 and SPEAr600 SOCs are supported. Support for the SPEAr13XX
-  series is in progress.
+  supported by the 'spear' platform of ARM Linux. Currently SPEAr1310,
+  SPEAr1340, SPEAr300, SPEAr310, SPEAr320 and SPEAr600 SOCs are supported.
 
   Hierarchy in SPEAr is as follows:
 
   SPEAr (Platform)
 	- SPEAr3XX (3XX SOC series, based on ARM9)
 		- SPEAr300 (SOC)
-			- SPEAr300_EVB (Evaluation Board)
+			- SPEAr300 Evaluation Board
 		- SPEAr310 (SOC)
-			- SPEAr310_EVB (Evaluation Board)
+			- SPEAr310 Evaluation Board
 		- SPEAr320 (SOC)
-			- SPEAr320_EVB (Evaluation Board)
+			- SPEAr320 Evaluation Board
 	- SPEAr6XX (6XX SOC series, based on ARM9)
 		- SPEAr600 (SOC)
-			- SPEAr600_EVB (Evaluation Board)
+			- SPEAr600 Evaluation Board
 	- SPEAr13XX (13XX SOC series, based on ARM CORTEXA9)
-		- SPEAr1300 (SOC)
+		- SPEAr1310 (SOC)
+			- SPEAr1310 Evaluation Board
+		- SPEAr1340 (SOC)
+			- SPEAr1340 Evaluation Board
 
   Configuration
   -------------
 
   A generic configuration is provided for each machine, and can be used as the
   default by
-	make spear600_defconfig
-	make spear300_defconfig
-	make spear310_defconfig
-	make spear320_defconfig
+	make spear13xx_defconfig
+	make spear3xx_defconfig
+	make spear6xx_defconfig
 
   Layout
   ------
 
-  The common files for multiple machine families (SPEAr3XX, SPEAr6XX and
-  SPEAr13XX) are located in the platform code contained in arch/arm/plat-spear
+  The common files for multiple machine families (SPEAr3xx, SPEAr6xx and
+  SPEAr13xx) are located in the platform code contained in arch/arm/plat-spear
   with headers in plat/.
 
   Each machine series have a directory with name arch/arm/mach-spear followed by
   series name. Like mach-spear3xx, mach-spear6xx and mach-spear13xx.
 
-  Common file for machines of spear3xx family is mach-spear3xx/spear3xx.c and for
-  spear6xx is mach-spear6xx/spear6xx.c. mach-spear* also contain soc/machine
-  specific files, like spear300.c, spear310.c, spear320.c and spear600.c.
-  mach-spear* also contains board specific files for each machine type.
+  Common file for machines of spear3xx family is mach-spear3xx/spear3xx.c, for
+  spear6xx is mach-spear6xx/spear6xx.c and for spear13xx family is
+  mach-spear13xx/spear13xx.c. mach-spear* also contain soc/machine specific
+  files, like spear1310.c, spear1340.c spear300.c, spear310.c, spear320.c and
+  spear600.c.  mach-spear* doesn't contains board specific files as they fully
+  support Flattened Device Tree.
 
 
   Document Author
   ---------------
 
-  Viresh Kumar, (c) 2010 ST Microelectronics
+  Viresh Kumar <viresh.linux@gmail.com>, (c) 2010-2012 ST Microelectronics
diff --git a/Documentation/arm/Samsung-S3C24XX/GPIO.txt b/Documentation/arm/Samsung-S3C24XX/GPIO.txt
index 816d6071669..0ebd7e2244d 100644
--- a/Documentation/arm/Samsung-S3C24XX/GPIO.txt
+++ b/Documentation/arm/Samsung-S3C24XX/GPIO.txt
@@ -1,4 +1,4 @@
-			S3C2410 GPIO Control
+			S3C24XX GPIO Control
 			====================
 
 Introduction
@@ -12,7 +12,7 @@ Introduction
   of the s3c2410 GPIO system, please read the Samsung provided
   data-sheet/users manual to find out the complete list.
 
-  See Documentation/arm/Samsung/GPIO.txt for the core implemetation.
+  See Documentation/arm/Samsung/GPIO.txt for the core implementation.
 
 
 GPIOLIB
@@ -41,8 +41,8 @@ GPIOLIB
 GPIOLIB conversion
 ------------------
 
-If you need to convert your board or driver to use gpiolib from the exiting
-s3c2410 api, then here are some notes on the process.
+If you need to convert your board or driver to use gpiolib from the phased
+out s3c2410 API, then here are some notes on the process.
 
 1) If your board is exclusively using an GPIO, say to control peripheral
    power, then it will require to claim the gpio with gpio_request() before
@@ -55,7 +55,7 @@ s3c2410 api, then here are some notes on the process.
    as they have the same arguments, and can either take the pin specific
    values, or the more generic special-function-number arguments.
 
-3) s3c2410_gpio_pullup() changs have the problem that whilst the 
+3) s3c2410_gpio_pullup() changes have the problem that whilst the
    s3c2410_gpio_pullup(x, 1) can be easily translated to the
    s3c_gpio_setpull(x, S3C_GPIO_PULL_NONE), the s3c2410_gpio_pullup(x, 0)
    are not so easy.
@@ -74,7 +74,7 @@ s3c2410 api, then here are some notes on the process.
    when using gpio_get_value() on an output pin (s3c2410_gpio_getpin
    would return the value the pin is supposed to be outputting).
 
-6) s3c2410_gpio_getirq() should be directly replacable with the
+6) s3c2410_gpio_getirq() should be directly replaceable with the
    gpio_to_irq() call.
 
 The s3c2410_gpio and gpio_ calls have always operated on the same gpio
@@ -85,27 +85,16 @@ between the calls.
 Headers
 -------
 
-  See arch/arm/mach-s3c2410/include/mach/regs-gpio.h for the list
+  See arch/arm/mach-s3c24xx/include/mach/regs-gpio.h for the list
   of GPIO pins, and the configuration values for them. This
   is included by using #include <mach/regs-gpio.h>
 
-  The GPIO management functions are defined in the hardware
-  header arch/arm/mach-s3c2410/include/mach/hardware.h which can be
-  included by #include <mach/hardware.h>
-
-  A useful amount of documentation can be found in the hardware
-  header on how the GPIO functions (and others) work.
-
-  Whilst a number of these functions do make some checks on what
-  is passed to them, for speed of use, they may not always ensure
-  that the user supplied data to them is correct.
-
 
 PIN Numbers
 -----------
 
   Each pin has an unique number associated with it in regs-gpio.h,
-  eg S3C2410_GPA(0) or S3C2410_GPF(1). These defines are used to tell
+  e.g. S3C2410_GPA(0) or S3C2410_GPF(1). These defines are used to tell
   the GPIO functions which pin is to be used.
 
   With the conversion to gpiolib, there is no longer a direct conversion
@@ -120,31 +109,27 @@ Configuring a pin
   The following function allows the configuration of a given pin to
   be changed.
 
-    void s3c2410_gpio_cfgpin(unsigned int pin, unsigned int function);
+    void s3c_gpio_cfgpin(unsigned int pin, unsigned int function);
 
-  Eg:
+  e.g.:
 
-     s3c2410_gpio_cfgpin(S3C2410_GPA(0), S3C2410_GPA0_ADDR0);
-     s3c2410_gpio_cfgpin(S3C2410_GPE(8), S3C2410_GPE8_SDDAT1);
+     s3c_gpio_cfgpin(S3C2410_GPA(0), S3C_GPIO_SFN(1));
+     s3c_gpio_cfgpin(S3C2410_GPE(8), S3C_GPIO_SFN(2));
 
    which would turn GPA(0) into the lowest Address line A0, and set
    GPE(8) to be connected to the SDIO/MMC controller's SDDAT1 line.
 
-   The s3c_gpio_cfgpin() call is a functional replacement for this call.
-
 
 Reading the current configuration
 ---------------------------------
 
-  The current configuration of a pin can be read by using:
+  The current configuration of a pin can be read by using standard
+  gpiolib function:
 
-  s3c2410_gpio_getcfg(unsigned int pin);
+  s3c_gpio_getcfg(unsigned int pin);
 
   The return value will be from the same set of values which can be
-  passed to s3c2410_gpio_cfgpin().
-
-  The s3c_gpio_getcfg() call should be a functional replacement for
-  this call.
+  passed to s3c_gpio_cfgpin().
 
 
 Configuring a pull-up resistor
@@ -154,61 +139,33 @@ Configuring a pull-up resistor
   pull-up resistors enabled. This can be configured by the following
   function:
 
-    void s3c2410_gpio_pullup(unsigned int pin, unsigned int to);
+    void s3c_gpio_setpull(unsigned int pin, unsigned int to);
 
-  Where the to value is zero to set the pull-up off, and 1 to enable
-  the specified pull-up. Any other values are currently undefined.
+  Where the to value is S3C_GPIO_PULL_NONE to set the pull-up off,
+  and S3C_GPIO_PULL_UP to enable the specified pull-up. Any other
+  values are currently undefined.
 
-  The s3c_gpio_setpull() offers similar functionality, but with the
-  ability to encode whether the pull is up or down. Currently there
-  is no 'just on' state, so up or down must be selected.
 
+Getting and setting the state of a PIN
+--------------------------------------
 
-Getting the state of a PIN
---------------------------
-
-  The state of a pin can be read by using the function:
-
-    unsigned int s3c2410_gpio_getpin(unsigned int pin);
-
-  This will return either zero or non-zero. Do not count on this
-  function returning 1 if the pin is set.
-
-  This call is now implemented by the relevant gpiolib calls, convert
-  your board or driver to use gpiolib.
-
-
-Setting the state of a PIN
---------------------------
-
-  The value an pin is outputing can be modified by using the following:
-
-    void s3c2410_gpio_setpin(unsigned int pin, unsigned int to);
-
-  Which sets the given pin to the value. Use 0 to write 0, and 1 to
-  set the output to 1.
-
-  This call is now implemented by the relevant gpiolib calls, convert
+  These calls are now implemented by the relevant gpiolib calls, convert
   your board or driver to use gpiolib.
 
 
 Getting the IRQ number associated with a PIN
 --------------------------------------------
 
-  The following function can map the given pin number to an IRQ
+  A standard gpiolib function can map the given pin number to an IRQ
   number to pass to the IRQ system.
 
-   int s3c2410_gpio_getirq(unsigned int pin);
+   int gpio_to_irq(unsigned int pin);
 
   Note, not all pins have an IRQ.
 
-  This call is now implemented by the relevant gpiolib calls, convert
-  your board or driver to use gpiolib.
-
 
-Authour
+Author
 -------
 
-
 Ben Dooks, 03 October 2004
 Copyright 2004 Ben Dooks, Simtec Electronics
diff --git a/Documentation/arm/Samsung-S3C24XX/H1940.txt b/Documentation/arm/Samsung-S3C24XX/H1940.txt
index f4a7b22c866..b738859b1fc 100644
--- a/Documentation/arm/Samsung-S3C24XX/H1940.txt
+++ b/Documentation/arm/Samsung-S3C24XX/H1940.txt
@@ -37,4 +37,4 @@ Maintainers
   Thanks to the many others who have also provided support.
 
 
-(c) 2005 Ben Dooks
-\ No newline at end of file
+(c) 2005 Ben Dooks
diff --git a/Documentation/arm/Samsung-S3C24XX/Overview.txt b/Documentation/arm/Samsung-S3C24XX/Overview.txt
index c12bfc1a00c..359587b2367 100644
--- a/Documentation/arm/Samsung-S3C24XX/Overview.txt
+++ b/Documentation/arm/Samsung-S3C24XX/Overview.txt
@@ -8,10 +8,13 @@ Introduction
 
   The Samsung S3C24XX range of ARM9 System-on-Chip CPUs are supported
   by the 's3c2410' architecture of ARM Linux. Currently the S3C2410,
-  S3C2412, S3C2413, S3C2416 S3C2440, S3C2442, S3C2443 and S3C2450 devices
+  S3C2412, S3C2413, S3C2416, S3C2440, S3C2442, S3C2443 and S3C2450 devices
   are supported.
 
-  Support for the S3C2400 and S3C24A0 series are in progress.
+  Support for the S3C2400 and S3C24A0 series was never completed and the
+  corresponding code has been removed after a while.  If someone wishes to
+  revive this effort, partial support can be retrieved from earlier Linux
+  versions.
 
   The S3C2416 and S3C2450 devices are very similar and S3C2450 support is
   included under the arch/arm/mach-s3c2416 directory. Note, whilst core
diff --git a/Documentation/arm/Samsung-S3C24XX/SMDK2440.txt b/Documentation/arm/Samsung-S3C24XX/SMDK2440.txt
index 32e1eae6a25..429390bd468 100644
--- a/Documentation/arm/Samsung-S3C24XX/SMDK2440.txt
+++ b/Documentation/arm/Samsung-S3C24XX/SMDK2440.txt
@@ -53,4 +53,4 @@ Maintainers
   and to Simtec Electronics for allowing me time to work on this.
 
 
-(c) 2004 Ben Dooks
-\ No newline at end of file
+(c) 2004 Ben Dooks
diff --git a/Documentation/arm/Samsung-S3C24XX/Suspend.txt b/Documentation/arm/Samsung-S3C24XX/Suspend.txt
index 7edd0e2e6c5..1ca63b3e563 100644
--- a/Documentation/arm/Samsung-S3C24XX/Suspend.txt
+++ b/Documentation/arm/Samsung-S3C24XX/Suspend.txt
@@ -116,7 +116,7 @@ Configuration
     Allows the entire memory to be checksummed before and after the
     suspend to see if there has been any corruption of the contents.
 
-    Note, the time to calculate the CRC is dependant on the CPU speed
+    Note, the time to calculate the CRC is dependent on the CPU speed
     and the size of memory. For an 64Mbyte RAM area on an 200MHz
     S3C2410, this can take approximately 4 seconds to complete.
 
diff --git a/Documentation/arm/Samsung/GPIO.txt b/Documentation/arm/Samsung/GPIO.txt
index 05850c62abe..795adfd8808 100644
--- a/Documentation/arm/Samsung/GPIO.txt
+++ b/Documentation/arm/Samsung/GPIO.txt
@@ -5,14 +5,14 @@ Introduction
 ------------
 
 This outlines the Samsung GPIO implementation and the architecture
-specfic calls provided alongisde the drivers/gpio core.
+specific calls provided alongside the drivers/gpio core.
 
 
 S3C24XX (Legacy)
 ----------------
 
 See Documentation/arm/Samsung-S3C24XX/GPIO.txt for more information
-about these devices. Their implementation is being brought into line
+about these devices. Their implementation has been brought into line
 with the core samsung implementation described in this document.
 
 
@@ -29,7 +29,7 @@ GPIO numbering is synchronised between the Samsung and gpiolib system.
 PIN configuration
 -----------------
 
-Pin configuration is specific to the Samsung architecutre, with each SoC
+Pin configuration is specific to the Samsung architecture, with each SoC
 registering the necessary information for the core gpio configuration
 implementation to configure pins as necessary.
 
@@ -38,5 +38,3 @@ driver or machine to change gpio configuration.
 
 See arch/arm/plat-samsung/include/plat/gpio-cfg.h for more information
 on these functions.
-
-
diff --git a/Documentation/arm/Samsung/Overview.txt b/Documentation/arm/Samsung/Overview.txt
index c3094ea51aa..658abb258ce 100644
--- a/Documentation/arm/Samsung/Overview.txt
+++ b/Documentation/arm/Samsung/Overview.txt
@@ -14,7 +14,6 @@ Introduction
   - S3C24XX: See Documentation/arm/Samsung-S3C24XX/Overview.txt for full list
   - S3C64XX: S3C6400 and S3C6410
   - S5P6440
-  - S5P6442
   - S5PC100
   - S5PC110 / S5PV210
 
@@ -36,7 +35,6 @@ Configuration
   unifying all the SoCs into one kernel.
 
   s5p6440_defconfig - S5P6440 specific default configuration
-  s5p6442_defconfig - S5P6442 specific default configuration
   s5pc100_defconfig - S5PC100 specific default configuration
   s5pc110_defconfig - S5PC110 specific default configuration
   s5pv210_defconfig - S5PV210 specific default configuration
diff --git a/Documentation/arm/Sharp-LH/ADC-LH7-Touchscreen b/Documentation/arm/Sharp-LH/ADC-LH7-Touchscreen
deleted file mode 100644
index dc460f05564..00000000000
--- a/Documentation/arm/Sharp-LH/ADC-LH7-Touchscreen
+++ /dev/null
@@ -1,61 +0,0 @@
-README on the ADC/Touchscreen Controller
-========================================
-
-The LH79524 and LH7A404 include a built-in Analog to Digital
-controller (ADC) that is used to process input from a touchscreen.
-The driver only implements a four-wire touch panel protocol.
-
-The touchscreen driver is maintenance free except for the pen-down or
-touch threshold.  Some resistive displays and board combinations may
-require tuning of this threshold.  The driver exposes some of its
-internal state in the sys filesystem.  If the kernel is configured
-with it, CONFIG_SYSFS, and sysfs is mounted at /sys, there will be a
-directory
-
-  /sys/devices/platform/adc-lh7.0
-
-containing these files.
-
-  -r--r--r--    1 root     root         4096 Jan  1 00:00 samples
-  -rw-r--r--    1 root     root         4096 Jan  1 00:00 threshold
-  -r--r--r--    1 root     root         4096 Jan  1 00:00 threshold_range
-
-The threshold is the current touch threshold.  It defaults to 750 on
-most targets.
-
-  # cat threshold
- 750
-
-The threshold_range contains the range of valid values for the
-threshold.  Values outside of this range will be silently ignored.
-
-  # cat threshold_range
-  0 1023
-
-To change the threshold, write a value to the threshold file.
-
-  # echo 500 > threshold
-  # cat threshold
-  500
-
-The samples file contains the most recently sampled values from the
-ADC.  There are 12.  Below are typical of the last sampled values when
-the pen has been released.  The first two and last two samples are for
-detecting whether or not the pen is down.  The third through sixth are
-X coordinate samples.  The seventh through tenth are Y coordinate
-samples.
-
-  # cat samples
-  1023 1023 0 0 0 0 530 529 530 529 1023 1023
-
-To determine a reasonable threshold, press on the touch panel with an
-appropriate stylus and read the values from samples.
-
-  # cat samples
-  1023 676 92 103 101 102 855 919 922 922 1023 679
-
-The first and eleventh samples are discarded.  Thus, the important
-values are the second and twelfth which are used to determine if the
-pen is down.  When both are below the threshold, the driver registers
-that the pen is down.  When either is above the threshold, it
-registers then pen is up.
diff --git a/Documentation/arm/Sharp-LH/CompactFlash b/Documentation/arm/Sharp-LH/CompactFlash
deleted file mode 100644
index 8616d877df9..00000000000
--- a/Documentation/arm/Sharp-LH/CompactFlash
+++ /dev/null
@@ -1,32 +0,0 @@
-README on the Compact Flash for Card Engines
-============================================
-
-There are three challenges in supporting the CF interface of the Card
-Engines.  First, every IO operation must be followed with IO to
-another memory region.  Second, the slot is wired for one-to-one
-address mapping *and* it is wired for 16 bit access only.  Second, the
-interrupt request line from the CF device isn't wired.
-
-The IOBARRIER issue is covered in README.IOBARRIER.  This isn't an
-onerous problem.  Enough said here.
-
-The addressing issue is solved in the
-arch/arm/mach-lh7a40x/ide-lpd7a40x.c file with some awkward
-work-arounds.  We implement a special SELECT_DRIVE routine that is
-called before the IDE driver performs its own SELECT_DRIVE.  Our code
-recognizes that the SELECT register cannot be modified without also
-writing a command.  It send an IDLE_IMMEDIATE command on selecting a
-drive.  The function also prevents drive select to the slave drive
-since there can be only one.  The awkward part is that the IDE driver,
-even though we have a select procedure, also attempts to change the
-drive by writing directly the SELECT register.  This attempt is
-explicitly blocked by the OUTB function--not pretty, but effective.
-
-The lack of interrupts is a more serious problem.  Even though the CF
-card is fast when compared to a normal IDE device, we don't know that
-the CF is really flash.  A user could use one of the very small hard
-drives being shipped with a CF interface.  The IDE code includes a
-check for interfaces that lack an IRQ.  In these cases, submitting a
-command to the IDE controller is followed by a call to poll for
-completion.  If the device isn't immediately ready, it schedules a
-timer to poll again later.
diff --git a/Documentation/arm/Sharp-LH/IOBarrier b/Documentation/arm/Sharp-LH/IOBarrier
deleted file mode 100644
index 2e953e228f4..00000000000
--- a/Documentation/arm/Sharp-LH/IOBarrier
+++ /dev/null
@@ -1,45 +0,0 @@
-README on the IOBARRIER for CardEngine IO
-=========================================
-
-Due to an unfortunate oversight when the Card Engines were designed,
-the signals that control access to some peripherals, most notably the
-SMC91C9111 ethernet controller, are not properly handled.
-
-The symptom is that some back to back IO with the peripheral returns
-unreliable data.  With the SMC chip, you'll see errors about the bank
-register being 'screwed'.
-
-The cause is that the AEN signal to the SMC chip does not transition
-for every memory access.  It is driven through the CPLD from the CS7
-line of the CPU's static memory controller which is optimized to
-eliminate unnecessary transitions.  Yet, the SMC requires a transition
-for every write access.  The Sharp website has more information about
-the effect this power-conserving feature has on peripheral
-interfacing.
-
-The solution is to follow every write access to the SMC chip with an
-access to another memory region that will force the CPU to release the
-chip select line.  It is important to guarantee that this access
-forces the CPU off-chip.  We map a page of SDRAM as if it were an
-uncacheable IO device and read from it after every SMC IO write
-operation.
-
-  SMC IO
-  BARRIER IO
-
-Only this sequence is important.  It does not matter that there is no
-BARRIER IO before the access to the SMC chip because the AEN latch
-only needs occurs after the SMC IO write cycle.  The routines that
-implement this work-around make an additional concession which is to
-disable interrupts during the IO sequence.  Other hardware devices
-(the LogicPD CPLD) have registers in the same physical memory
-region as the SMC chip.  An interrupt might allow an access to one of
-those registers while SMC IO is being performed.
-
-You might be tempted to think that we have to access another device
-attached to the static memory controller, but the empirical evidence
-indicates that this is not so.  Mapping 0x00000000 (flash) and
-0xc0000000 (SDRAM) appear to have the same effect.  Using SDRAM seems
-to be faster.  Choosing to access an undecoded memory region is not
-desirable as there is no way to know how that chip select will be used
-in the future.
diff --git a/Documentation/arm/Sharp-LH/KEV7A400 b/Documentation/arm/Sharp-LH/KEV7A400
deleted file mode 100644
index be32b14cd53..00000000000
--- a/Documentation/arm/Sharp-LH/KEV7A400
+++ /dev/null
@@ -1,8 +0,0 @@
-README on Implementing Linux for Sharp's KEV7a400
-=================================================
-
-This product has been discontinued by Sharp.  For the time being, the
-partially implemented code remains in the kernel.  At some point in
-the future, either the code will be finished or it will be removed
-completely.  This depends primarily on how many of the development
-boards are in the field.
diff --git a/Documentation/arm/Sharp-LH/LCDPanels b/Documentation/arm/Sharp-LH/LCDPanels
deleted file mode 100644
index fb1b21c2f2f..00000000000
--- a/Documentation/arm/Sharp-LH/LCDPanels
+++ /dev/null
@@ -1,59 +0,0 @@
-README on the LCD Panels
-========================
-
-Configuration options for several LCD panels, available from Logic PD,
-are included in the kernel source.  This README will help you
-understand the configuration data and give you some guidance for
-adding support for other panels if you wish.
-
-
-lcd-panels.h
-------------
-
-There is no way, at present, to detect which panel is attached to the
-system at runtime.  Thus the kernel configuration is static.  The file
-arch/arm/mach-ld7a40x/lcd-panels.h (or similar) defines all of the
-panel specific parameters.
-
-It should be possible for this data to be shared among several device
-families.  The current layout may be insufficiently general, but it is
-amenable to improvement.
-
-
-PIXEL_CLOCK
------------
-
-The panel data sheets will give a range of acceptable pixel clocks.
-The fundamental LCDCLK input frequency is divided down by a PCD
-constant in field '.tim2'.  It may happen that it is impossible to set
-the pixel clock within this range.  A clock which is too slow will
-tend to flicker.  For the highest quality image, set the clock as high
-as possible.
-
-
-MARGINS
--------
-
-These values may be difficult to glean from the panel data sheet.  In
-the case of the Sharp panels, the upper margin is explicitly called
-out as a specific number of lines from the top of the frame.  The
-other values may not matter as much as the panels tend to
-automatically center the image.
-
-
-Sync Sense
-----------
-
-The sense of the hsync and vsync pulses may be called out in the data
-sheet.  On one panel, the sense of these pulses determine the height
-of the visible region on the panel.  Most of the Sharp panels use
-negative sense sync pulses set by the TIM2_IHS and TIM2_IVS bits in
-'.tim2'.
-
-
-Pel Layout
-----------
-
-The Sharp color TFT panels are all configured for 16 bit direct color
-modes.  The amba-lcd driver sets the pel mode to 565 for 5 bits of
-each red and blue and 6 bits of green.
diff --git a/Documentation/arm/Sharp-LH/LPD7A400 b/Documentation/arm/Sharp-LH/LPD7A400
deleted file mode 100644
index 3275b453bfd..00000000000
--- a/Documentation/arm/Sharp-LH/LPD7A400
+++ /dev/null
@@ -1,15 +0,0 @@
-README on Implementing Linux for the Logic PD LPD7A400-10
-=========================================================
-
-- CPLD memory mapping
-
-  The board designers chose to use high address lines for controlling
-  access to the CPLD registers.  It turns out to be a big waste
-  because we're using an MMU and must map IO space into virtual
-  memory.  The result is that we have to make a mapping for every
-  register.
-
-- Serial Console
-
-  It may be OK not to use the serial console option if the user passes
-  the console device name to the kernel.  This deserves some exploration.
diff --git a/Documentation/arm/Sharp-LH/LPD7A40X b/Documentation/arm/Sharp-LH/LPD7A40X
deleted file mode 100644
index 8c29a27e208..00000000000
--- a/Documentation/arm/Sharp-LH/LPD7A40X
+++ /dev/null
@@ -1,16 +0,0 @@
-README on Implementing Linux for the Logic PD LPD7A40X-10
-=========================================================
-
-- CPLD memory mapping
-
-  The board designers chose to use high address lines for controlling
-  access to the CPLD registers.  It turns out to be a big waste
-  because we're using an MMU and must map IO space into virtual
-  memory.  The result is that we have to make a mapping for every
-  register.
-
-- Serial Console
-
-  It may be OK not to use the serial console option if the user passes
-  the console device name to the kernel.  This deserves some exploration.
-
diff --git a/Documentation/arm/Sharp-LH/SDRAM b/Documentation/arm/Sharp-LH/SDRAM
deleted file mode 100644
index 93ddc23c2fa..00000000000
--- a/Documentation/arm/Sharp-LH/SDRAM
+++ /dev/null
@@ -1,51 +0,0 @@
-README on the SDRAM Controller for the LH7a40X
-==============================================
-
-The standard configuration for the SDRAM controller generates a sparse
-memory array.  The precise layout is determined by the SDRAM chips.  A
-default kernel configuration assembles the discontiguous memory
-regions into separate memory nodes via the NUMA (Non-Uniform Memory
-Architecture) facilities.  In this default configuration, the kernel
-is forgiving about the precise layout.  As long as it is given an
-accurate picture of available memory by the bootloader the kernel will
-execute correctly.
-
-The SDRC supports a mode where some of the chip select lines are
-swapped in order to make SDRAM look like a synchronous ROM.  Setting
-this bit means that the RAM will present as a contiguous array.  Some
-programmers prefer this to the discontiguous layout.  Be aware that
-may be a penalty for this feature where some some configurations of
-memory are significantly reduced; i.e. 64MiB of RAM appears as only 32
-MiB.
-
-There are a couple of configuration options to override the default
-behavior.  When the SROMLL bit is set and memory appears as a
-contiguous array, there is no reason to support NUMA.
-CONFIG_LH7A40X_CONTIGMEM disables NUMA support.  When physical memory
-is discontiguous, the memory tables are organized such that there are
-two banks per nodes with a small gap between them.  This layout wastes
-some kernel memory for page tables representing non-existent memory.
-CONFIG_LH7A40X_ONE_BANK_PER_NODE optimizes the node tables such that
-there are no gaps.  These options control the low level organization
-of the memory management tables in ways that may prevent the kernel
-from booting or may cause the kernel to allocated excessively large
-page tables.  Be warned.  Only change these options if you know what
-you are doing.  The default behavior is a reasonable compromise that
-will suit all users.
-
---
-
-A typical 32MiB system with the default configuration options will
-find physical memory managed as follows.
-
-   node 0: 0xc0000000 4MiB
-           0xc1000000 4MiB
-   node 1: 0xc4000000 4MiB
-           0xc5000000 4MiB
-   node 2: 0xc8000000 4MiB
-           0xc9000000 4MiB
-   node 3: 0xcc000000 4MiB
-           0xcd000000 4MiB
-
-Setting CONFIG_LH7A40X_ONE_BANK_PER_NODE will put each bank into a
-separate node.
diff --git a/Documentation/arm/Sharp-LH/VectoredInterruptController b/Documentation/arm/Sharp-LH/VectoredInterruptController
deleted file mode 100644
index 23047e9861e..00000000000
--- a/Documentation/arm/Sharp-LH/VectoredInterruptController
+++ /dev/null
@@ -1,80 +0,0 @@
-README on the Vectored Interrupt Controller of the LH7A404
-==========================================================
-
-The 404 revision of the LH7A40X series comes with two vectored
-interrupts controllers.  While the kernel does use some of the
-features of these devices, it is far from the purpose for which they
-were designed.
-
-When this README was written, the implementation of the VICs was in
-flux.  It is possible that some details, especially with priorities,
-will change.
-
-The VIC support code is inspired by routines written by Sharp.
-
-
-Priority Control
-----------------
-
-The significant reason for using the VIC's vectoring is to control
-interrupt priorities.  There are two tables in
-arch/arm/mach-lh7a40x/irq-lh7a404.c that look something like this.
-
-  static unsigned char irq_pri_vic1[] = { IRQ_GPIO3INTR, };
-  static unsigned char irq_pri_vic2[] = {
-	IRQ_T3UI, IRQ_GPIO7INTR,
-	IRQ_UART1INTR, IRQ_UART2INTR, IRQ_UART3INTR, };
-
-The initialization code reads these tables and inserts a vector
-address and enable for each indicated IRQ.  Vectored interrupts have
-higher priority than non-vectored interrupts.  So, on VIC1,
-IRQ_GPIO3INTR will be served before any other non-FIQ interrupt.  Due
-to the way that the vectoring works, IRQ_T3UI is the next highest
-priority followed by the other vectored interrupts on VIC2.  After
-that, the non-vectored interrupts are scanned in VIC1 then in VIC2.
-
-
-ISR
----
-
-The interrupt service routine macro get_irqnr() in
-arch/arm/kernel/entry-armv.S scans the VICs for the next active
-interrupt.  The vectoring makes this code somewhat larger than it was
-before using vectoring (refer to the LH7A400 implementation).  In the
-case where an interrupt is vectored, the implementation will tend to
-be faster than the non-vectored version.  However, the worst-case path
-is longer.
-
-It is worth noting that at present, there is no need to read
-VIC2_VECTADDR because the register appears to be shared between the
-controllers.  The code is written such that if this changes, it ought
-to still work properly.
-
-
-Vector Addresses
-----------------
-
-The proper use of the vectoring hardware would jump to the ISR
-specified by the vectoring address.  Linux isn't structured to take
-advantage of this feature, though it might be possible to change
-things to support it.
-
-In this implementation, the vectoring address is used to speed the
-search for the active IRQ.  The address is coded such that the lowest
-6 bits store the IRQ number for vectored interrupts.  These numbers
-correspond to the bits in the interrupt status registers.  IRQ zero is
-the lowest interrupt bit in VIC1.  IRQ 32 is the lowest interrupt bit
-in VIC2.  Because zero is a valid IRQ number and because we cannot
-detect whether or not there is a valid vectoring address if that
-address is zero, the eigth bit (0x100) is set for vectored interrupts.
-The address for IRQ 0x18 (VIC2) is 0x118.  Only the ninth bit is set
-for the default handler on VIC1 and only the tenth bit is set for the
-default handler on VIC2.
-
-In other words.
-
-  0x000		- no active interrupt
-  0x1ii		- vectored interrupt 0xii
-  0x2xx		- unvectored interrupt on VIC1 (xx is don't care)
-  0x4xx		- unvectored interrupt on VIC2 (xx is don't care)
-
diff --git a/Documentation/arm/cluster-pm-race-avoidance.txt b/Documentation/arm/cluster-pm-race-avoidance.txt
new file mode 100644
index 00000000000..750b6fc24af
--- /dev/null
+++ b/Documentation/arm/cluster-pm-race-avoidance.txt
@@ -0,0 +1,498 @@
+Cluster-wide Power-up/power-down race avoidance algorithm
+=========================================================
+
+This file documents the algorithm which is used to coordinate CPU and
+cluster setup and teardown operations and to manage hardware coherency
+controls safely.
+
+The section "Rationale" explains what the algorithm is for and why it is
+needed.  "Basic model" explains general concepts using a simplified view
+of the system.  The other sections explain the actual details of the
+algorithm in use.
+
+
+Rationale
+---------
+
+In a system containing multiple CPUs, it is desirable to have the
+ability to turn off individual CPUs when the system is idle, reducing
+power consumption and thermal dissipation.
+
+In a system containing multiple clusters of CPUs, it is also desirable
+to have the ability to turn off entire clusters.
+
+Turning entire clusters off and on is a risky business, because it
+involves performing potentially destructive operations affecting a group
+of independently running CPUs, while the OS continues to run.  This
+means that we need some coordination in order to ensure that critical
+cluster-level operations are only performed when it is truly safe to do
+so.
+
+Simple locking may not be sufficient to solve this problem, because
+mechanisms like Linux spinlocks may rely on coherency mechanisms which
+are not immediately enabled when a cluster powers up.  Since enabling or
+disabling those mechanisms may itself be a non-atomic operation (such as
+writing some hardware registers and invalidating large caches), other
+methods of coordination are required in order to guarantee safe
+power-down and power-up at the cluster level.
+
+The mechanism presented in this document describes a coherent memory
+based protocol for performing the needed coordination.  It aims to be as
+lightweight as possible, while providing the required safety properties.
+
+
+Basic model
+-----------
+
+Each cluster and CPU is assigned a state, as follows:
+
+	DOWN
+	COMING_UP
+	UP
+	GOING_DOWN
+
+	    +---------> UP ----------+
+	    |                        v
+
+	COMING_UP                GOING_DOWN
+
+	    ^                        |
+	    +--------- DOWN <--------+
+
+
+DOWN:	The CPU or cluster is not coherent, and is either powered off or
+	suspended, or is ready to be powered off or suspended.
+
+COMING_UP: The CPU or cluster has committed to moving to the UP state.
+	It may be part way through the process of initialisation and
+	enabling coherency.
+
+UP:	The CPU or cluster is active and coherent at the hardware
+	level.  A CPU in this state is not necessarily being used
+	actively by the kernel.
+
+GOING_DOWN: The CPU or cluster has committed to moving to the DOWN
+	state.  It may be part way through the process of teardown and
+	coherency exit.
+
+
+Each CPU has one of these states assigned to it at any point in time.
+The CPU states are described in the "CPU state" section, below.
+
+Each cluster is also assigned a state, but it is necessary to split the
+state value into two parts (the "cluster" state and "inbound" state) and
+to introduce additional states in order to avoid races between different
+CPUs in the cluster simultaneously modifying the state.  The cluster-
+level states are described in the "Cluster state" section.
+
+To help distinguish the CPU states from cluster states in this
+discussion, the state names are given a CPU_ prefix for the CPU states,
+and a CLUSTER_ or INBOUND_ prefix for the cluster states.
+
+
+CPU state
+---------
+
+In this algorithm, each individual core in a multi-core processor is
+referred to as a "CPU".  CPUs are assumed to be single-threaded:
+therefore, a CPU can only be doing one thing at a single point in time.
+
+This means that CPUs fit the basic model closely.
+
+The algorithm defines the following states for each CPU in the system:
+
+	CPU_DOWN
+	CPU_COMING_UP
+	CPU_UP
+	CPU_GOING_DOWN
+
+	 cluster setup and
+	CPU setup complete          policy decision
+	      +-----------> CPU_UP ------------+
+	      |                                v
+
+	CPU_COMING_UP                   CPU_GOING_DOWN
+
+	      ^                                |
+	      +----------- CPU_DOWN <----------+
+	 policy decision           CPU teardown complete
+	or hardware event
+
+
+The definitions of the four states correspond closely to the states of
+the basic model.
+
+Transitions between states occur as follows.
+
+A trigger event (spontaneous) means that the CPU can transition to the
+next state as a result of making local progress only, with no
+requirement for any external event to happen.
+
+
+CPU_DOWN:
+
+	A CPU reaches the CPU_DOWN state when it is ready for
+	power-down.  On reaching this state, the CPU will typically
+	power itself down or suspend itself, via a WFI instruction or a
+	firmware call.
+
+	Next state:	CPU_COMING_UP
+	Conditions:	none
+
+	Trigger events:
+
+		a) an explicit hardware power-up operation, resulting
+		   from a policy decision on another CPU;
+
+		b) a hardware event, such as an interrupt.
+
+
+CPU_COMING_UP:
+
+	A CPU cannot start participating in hardware coherency until the
+	cluster is set up and coherent.  If the cluster is not ready,
+	then the CPU will wait in the CPU_COMING_UP state until the
+	cluster has been set up.
+
+	Next state:	CPU_UP
+	Conditions:	The CPU's parent cluster must be in CLUSTER_UP.
+	Trigger events:	Transition of the parent cluster to CLUSTER_UP.
+
+	Refer to the "Cluster state" section for a description of the
+	CLUSTER_UP state.
+
+
+CPU_UP:
+	When a CPU reaches the CPU_UP state, it is safe for the CPU to
+	start participating in local coherency.
+
+	This is done by jumping to the kernel's CPU resume code.
+
+	Note that the definition of this state is slightly different
+	from the basic model definition: CPU_UP does not mean that the
+	CPU is coherent yet, but it does mean that it is safe to resume
+	the kernel.  The kernel handles the rest of the resume
+	procedure, so the remaining steps are not visible as part of the
+	race avoidance algorithm.
+
+	The CPU remains in this state until an explicit policy decision
+	is made to shut down or suspend the CPU.
+
+	Next state:	CPU_GOING_DOWN
+	Conditions:	none
+	Trigger events:	explicit policy decision
+
+
+CPU_GOING_DOWN:
+
+	While in this state, the CPU exits coherency, including any
+	operations required to achieve this (such as cleaning data
+	caches).
+
+	Next state:	CPU_DOWN
+	Conditions:	local CPU teardown complete
+	Trigger events:	(spontaneous)
+
+
+Cluster state
+-------------
+
+A cluster is a group of connected CPUs with some common resources.
+Because a cluster contains multiple CPUs, it can be doing multiple
+things at the same time.  This has some implications.  In particular, a
+CPU can start up while another CPU is tearing the cluster down.
+
+In this discussion, the "outbound side" is the view of the cluster state
+as seen by a CPU tearing the cluster down.  The "inbound side" is the
+view of the cluster state as seen by a CPU setting the CPU up.
+
+In order to enable safe coordination in such situations, it is important
+that a CPU which is setting up the cluster can advertise its state
+independently of the CPU which is tearing down the cluster.  For this
+reason, the cluster state is split into two parts:
+
+	"cluster" state: The global state of the cluster; or the state
+		on the outbound side:
+
+		CLUSTER_DOWN
+		CLUSTER_UP
+		CLUSTER_GOING_DOWN
+
+	"inbound" state: The state of the cluster on the inbound side.
+
+		INBOUND_NOT_COMING_UP
+		INBOUND_COMING_UP
+
+
+	The different pairings of these states results in six possible
+	states for the cluster as a whole:
+
+	                            CLUSTER_UP
+	          +==========> INBOUND_NOT_COMING_UP -------------+
+	          #                                               |
+	                                                          |
+	     CLUSTER_UP     <----+                                |
+	  INBOUND_COMING_UP      |                                v
+
+	          ^             CLUSTER_GOING_DOWN       CLUSTER_GOING_DOWN
+	          #              INBOUND_COMING_UP <=== INBOUND_NOT_COMING_UP
+
+	    CLUSTER_DOWN         |                                |
+	  INBOUND_COMING_UP <----+                                |
+	                                                          |
+	          ^                                               |
+	          +===========     CLUSTER_DOWN      <------------+
+	                       INBOUND_NOT_COMING_UP
+
+	Transitions -----> can only be made by the outbound CPU, and
+	only involve changes to the "cluster" state.
+
+	Transitions ===##> can only be made by the inbound CPU, and only
+	involve changes to the "inbound" state, except where there is no
+	further transition possible on the outbound side (i.e., the
+	outbound CPU has put the cluster into the CLUSTER_DOWN state).
+
+	The race avoidance algorithm does not provide a way to determine
+	which exact CPUs within the cluster play these roles.  This must
+	be decided in advance by some other means.  Refer to the section
+	"Last man and first man selection" for more explanation.
+
+
+	CLUSTER_DOWN/INBOUND_NOT_COMING_UP is the only state where the
+	cluster can actually be powered down.
+
+	The parallelism of the inbound and outbound CPUs is observed by
+	the existence of two different paths from CLUSTER_GOING_DOWN/
+	INBOUND_NOT_COMING_UP (corresponding to GOING_DOWN in the basic
+	model) to CLUSTER_DOWN/INBOUND_COMING_UP (corresponding to
+	COMING_UP in the basic model).  The second path avoids cluster
+	teardown completely.
+
+	CLUSTER_UP/INBOUND_COMING_UP is equivalent to UP in the basic
+	model.  The final transition to CLUSTER_UP/INBOUND_NOT_COMING_UP
+	is trivial and merely resets the state machine ready for the
+	next cycle.
+
+	Details of the allowable transitions follow.
+
+	The next state in each case is notated
+
+		<cluster state>/<inbound state> (<transitioner>)
+
+	where the <transitioner> is the side on which the transition
+	can occur; either the inbound or the outbound side.
+
+
+CLUSTER_DOWN/INBOUND_NOT_COMING_UP:
+
+	Next state:	CLUSTER_DOWN/INBOUND_COMING_UP (inbound)
+	Conditions:	none
+	Trigger events:
+
+		a) an explicit hardware power-up operation, resulting
+		   from a policy decision on another CPU;
+
+		b) a hardware event, such as an interrupt.
+
+
+CLUSTER_DOWN/INBOUND_COMING_UP:
+
+	In this state, an inbound CPU sets up the cluster, including
+	enabling of hardware coherency at the cluster level and any
+	other operations (such as cache invalidation) which are required
+	in order to achieve this.
+
+	The purpose of this state is to do sufficient cluster-level
+	setup to enable other CPUs in the cluster to enter coherency
+	safely.
+
+	Next state:	CLUSTER_UP/INBOUND_COMING_UP (inbound)
+	Conditions:	cluster-level setup and hardware coherency complete
+	Trigger events:	(spontaneous)
+
+
+CLUSTER_UP/INBOUND_COMING_UP:
+
+	Cluster-level setup is complete and hardware coherency is
+	enabled for the cluster.  Other CPUs in the cluster can safely
+	enter coherency.
+
+	This is a transient state, leading immediately to
+	CLUSTER_UP/INBOUND_NOT_COMING_UP.  All other CPUs on the cluster
+	should consider treat these two states as equivalent.
+
+	Next state:	CLUSTER_UP/INBOUND_NOT_COMING_UP (inbound)
+	Conditions:	none
+	Trigger events:	(spontaneous)
+
+
+CLUSTER_UP/INBOUND_NOT_COMING_UP:
+
+	Cluster-level setup is complete and hardware coherency is
+	enabled for the cluster.  Other CPUs in the cluster can safely
+	enter coherency.
+
+	The cluster will remain in this state until a policy decision is
+	made to power the cluster down.
+
+	Next state:	CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP (outbound)
+	Conditions:	none
+	Trigger events:	policy decision to power down the cluster
+
+
+CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP:
+
+	An outbound CPU is tearing the cluster down.  The selected CPU
+	must wait in this state until all CPUs in the cluster are in the
+	CPU_DOWN state.
+
+	When all CPUs are in the CPU_DOWN state, the cluster can be torn
+	down, for example by cleaning data caches and exiting
+	cluster-level coherency.
+
+	To avoid wasteful unnecessary teardown operations, the outbound
+	should check the inbound cluster state for asynchronous
+	transitions to INBOUND_COMING_UP.  Alternatively, individual
+	CPUs can be checked for entry into CPU_COMING_UP or CPU_UP.
+
+
+	Next states:
+
+	CLUSTER_DOWN/INBOUND_NOT_COMING_UP (outbound)
+		Conditions:	cluster torn down and ready to power off
+		Trigger events:	(spontaneous)
+
+	CLUSTER_GOING_DOWN/INBOUND_COMING_UP (inbound)
+		Conditions:	none
+		Trigger events:
+
+			a) an explicit hardware power-up operation,
+			   resulting from a policy decision on another
+			   CPU;
+
+			b) a hardware event, such as an interrupt.
+
+
+CLUSTER_GOING_DOWN/INBOUND_COMING_UP:
+
+	The cluster is (or was) being torn down, but another CPU has
+	come online in the meantime and is trying to set up the cluster
+	again.
+
+	If the outbound CPU observes this state, it has two choices:
+
+		a) back out of teardown, restoring the cluster to the
+		   CLUSTER_UP state;
+
+		b) finish tearing the cluster down and put the cluster
+		   in the CLUSTER_DOWN state; the inbound CPU will
+		   set up the cluster again from there.
+
+	Choice (a) permits the removal of some latency by avoiding
+	unnecessary teardown and setup operations in situations where
+	the cluster is not really going to be powered down.
+
+
+	Next states:
+
+	CLUSTER_UP/INBOUND_COMING_UP (outbound)
+		Conditions:	cluster-level setup and hardware
+				coherency complete
+		Trigger events:	(spontaneous)
+
+	CLUSTER_DOWN/INBOUND_COMING_UP (outbound)
+		Conditions:	cluster torn down and ready to power off
+		Trigger events:	(spontaneous)
+
+
+Last man and First man selection
+--------------------------------
+
+The CPU which performs cluster tear-down operations on the outbound side
+is commonly referred to as the "last man".
+
+The CPU which performs cluster setup on the inbound side is commonly
+referred to as the "first man".
+
+The race avoidance algorithm documented above does not provide a
+mechanism to choose which CPUs should play these roles.
+
+
+Last man:
+
+When shutting down the cluster, all the CPUs involved are initially
+executing Linux and hence coherent.  Therefore, ordinary spinlocks can
+be used to select a last man safely, before the CPUs become
+non-coherent.
+
+
+First man:
+
+Because CPUs may power up asynchronously in response to external wake-up
+events, a dynamic mechanism is needed to make sure that only one CPU
+attempts to play the first man role and do the cluster-level
+initialisation: any other CPUs must wait for this to complete before
+proceeding.
+
+Cluster-level initialisation may involve actions such as configuring
+coherency controls in the bus fabric.
+
+The current implementation in mcpm_head.S uses a separate mutual exclusion
+mechanism to do this arbitration.  This mechanism is documented in
+detail in vlocks.txt.
+
+
+Features and Limitations
+------------------------
+
+Implementation:
+
+	The current ARM-based implementation is split between
+	arch/arm/common/mcpm_head.S (low-level inbound CPU operations) and
+	arch/arm/common/mcpm_entry.c (everything else):
+
+	__mcpm_cpu_going_down() signals the transition of a CPU to the
+		CPU_GOING_DOWN state.
+
+	__mcpm_cpu_down() signals the transition of a CPU to the CPU_DOWN
+		state.
+
+	A CPU transitions to CPU_COMING_UP and then to CPU_UP via the
+		low-level power-up code in mcpm_head.S.  This could
+		involve CPU-specific setup code, but in the current
+		implementation it does not.
+
+	__mcpm_outbound_enter_critical() and __mcpm_outbound_leave_critical()
+		handle transitions from CLUSTER_UP to CLUSTER_GOING_DOWN
+		and from there to CLUSTER_DOWN or back to CLUSTER_UP (in
+		the case of an aborted cluster power-down).
+
+		These functions are more complex than the __mcpm_cpu_*()
+		functions due to the extra inter-CPU coordination which
+		is needed for safe transitions at the cluster level.
+
+	A cluster transitions from CLUSTER_DOWN back to CLUSTER_UP via
+		the low-level power-up code in mcpm_head.S.  This
+		typically involves platform-specific setup code,
+		provided by the platform-specific power_up_setup
+		function registered via mcpm_sync_init.
+
+Deep topologies:
+
+	As currently described and implemented, the algorithm does not
+	support CPU topologies involving more than two levels (i.e.,
+	clusters of clusters are not supported).  The algorithm could be
+	extended by replicating the cluster-level states for the
+	additional topological levels, and modifying the transition
+	rules for the intermediate (non-outermost) cluster levels.
+
+
+Colophon
+--------
+
+Originally created and documented by Dave Martin for Linaro Limited, in
+collaboration with Nicolas Pitre and Achin Gupta.
+
+Copyright (C) 2012-2013  Linaro Limited
+Distributed under the terms of Version 2 of the GNU General Public
+License, as defined in linux/COPYING.
diff --git a/Documentation/arm/firmware.txt b/Documentation/arm/firmware.txt
new file mode 100644
index 00000000000..c2e468fe7b0
--- /dev/null
+++ b/Documentation/arm/firmware.txt
@@ -0,0 +1,88 @@
+Interface for registering and calling firmware-specific operations for ARM.
+----
+Written by Tomasz Figa <t.figa@samsung.com>
+
+Some boards are running with secure firmware running in TrustZone secure
+world, which changes the way some things have to be initialized. This makes
+a need to provide an interface for such platforms to specify available firmware
+operations and call them when needed.
+
+Firmware operations can be specified using struct firmware_ops
+
+	struct firmware_ops {
+		/*
+		* Enters CPU idle mode
+		*/
+		int (*do_idle)(void);
+		/*
+		* Sets boot address of specified physical CPU
+		*/
+		int (*set_cpu_boot_addr)(int cpu, unsigned long boot_addr);
+		/*
+		* Boots specified physical CPU
+		*/
+		int (*cpu_boot)(int cpu);
+		/*
+		* Initializes L2 cache
+		*/
+		int (*l2x0_init)(void);
+	};
+
+and then registered with register_firmware_ops function
+
+	void register_firmware_ops(const struct firmware_ops *ops)
+
+the ops pointer must be non-NULL.
+
+There is a default, empty set of operations provided, so there is no need to
+set anything if platform does not require firmware operations.
+
+To call a firmware operation, a helper macro is provided
+
+	#define call_firmware_op(op, ...)				\
+		((firmware_ops->op) ? firmware_ops->op(__VA_ARGS__) : (-ENOSYS))
+
+the macro checks if the operation is provided and calls it or otherwise returns
+-ENOSYS to signal that given operation is not available (for example, to allow
+fallback to legacy operation).
+
+Example of registering firmware operations:
+
+	/* board file */
+
+	static int platformX_do_idle(void)
+	{
+		/* tell platformX firmware to enter idle */
+		return 0;
+	}
+
+	static int platformX_cpu_boot(int i)
+	{
+		/* tell platformX firmware to boot CPU i */
+		return 0;
+	}
+
+	static const struct firmware_ops platformX_firmware_ops = {
+		.do_idle        = exynos_do_idle,
+		.cpu_boot       = exynos_cpu_boot,
+		/* other operations not available on platformX */
+	};
+
+	/* init_early callback of machine descriptor */
+	static void __init board_init_early(void)
+	{
+		register_firmware_ops(&platformX_firmware_ops);
+	}
+
+Example of using a firmware operation:
+
+	/* some platform code, e.g. SMP initialization */
+
+	__raw_writel(virt_to_phys(exynos4_secondary_startup),
+		CPU1_BOOT_REG);
+
+	/* Call Exynos specific smc call */
+	if (call_firmware_op(cpu_boot, cpu) == -ENOSYS)
+		cpu_boot_legacy(...); /* Try legacy way */
+
+	gic_raise_softirq(cpumask_of(cpu), 1);
diff --git a/Documentation/arm/kernel_mode_neon.txt b/Documentation/arm/kernel_mode_neon.txt
new file mode 100644
index 00000000000..525452726d3
--- /dev/null
+++ b/Documentation/arm/kernel_mode_neon.txt
@@ -0,0 +1,121 @@
+Kernel mode NEON
+================
+
+TL;DR summary
+-------------
+* Use only NEON instructions, or VFP instructions that don't rely on support
+  code
+* Isolate your NEON code in a separate compilation unit, and compile it with
+  '-mfpu=neon -mfloat-abi=softfp'
+* Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your
+  NEON code
+* Don't sleep in your NEON code, and be aware that it will be executed with
+  preemption disabled
+
+
+Introduction
+------------
+It is possible to use NEON instructions (and in some cases, VFP instructions) in
+code that runs in kernel mode. However, for performance reasons, the NEON/VFP
+register file is not preserved and restored at every context switch or taken
+exception like the normal register file is, so some manual intervention is
+required. Furthermore, special care is required for code that may sleep [i.e.,
+may call schedule()], as NEON or VFP instructions will be executed in a
+non-preemptible section for reasons outlined below.
+
+
+Lazy preserve and restore
+-------------------------
+The NEON/VFP register file is managed using lazy preserve (on UP systems) and
+lazy restore (on both SMP and UP systems). This means that the register file is
+kept 'live', and is only preserved and restored when multiple tasks are
+contending for the NEON/VFP unit (or, in the SMP case, when a task migrates to
+another core). Lazy restore is implemented by disabling the NEON/VFP unit after
+every context switch, resulting in a trap when subsequently a NEON/VFP
+instruction is issued, allowing the kernel to step in and perform the restore if
+necessary.
+
+Any use of the NEON/VFP unit in kernel mode should not interfere with this, so
+it is required to do an 'eager' preserve of the NEON/VFP register file, and
+enable the NEON/VFP unit explicitly so no exceptions are generated on first
+subsequent use. This is handled by the function kernel_neon_begin(), which
+should be called before any kernel mode NEON or VFP instructions are issued.
+Likewise, the NEON/VFP unit should be disabled again after use to make sure user
+mode will hit the lazy restore trap upon next use. This is handled by the
+function kernel_neon_end().
+
+
+Interruptions in kernel mode
+----------------------------
+For reasons of performance and simplicity, it was decided that there shall be no
+preserve/restore mechanism for the kernel mode NEON/VFP register contents. This
+implies that interruptions of a kernel mode NEON section can only be allowed if
+they are guaranteed not to touch the NEON/VFP registers. For this reason, the
+following rules and restrictions apply in the kernel:
+* NEON/VFP code is not allowed in interrupt context;
+* NEON/VFP code is not allowed to sleep;
+* NEON/VFP code is executed with preemption disabled.
+
+If latency is a concern, it is possible to put back to back calls to
+kernel_neon_end() and kernel_neon_begin() in places in your code where none of
+the NEON registers are live. (Additional calls to kernel_neon_begin() should be
+reasonably cheap if no context switch occurred in the meantime)
+
+
+VFP and support code
+--------------------
+Earlier versions of VFP (prior to version 3) rely on software support for things
+like IEEE-754 compliant underflow handling etc. When the VFP unit needs such
+software assistance, it signals the kernel by raising an undefined instruction
+exception. The kernel responds by inspecting the VFP control registers and the
+current instruction and arguments, and emulates the instruction in software.
+
+Such software assistance is currently not implemented for VFP instructions
+executed in kernel mode. If such a condition is encountered, the kernel will
+fail and generate an OOPS.
+
+
+Separating NEON code from ordinary code
+---------------------------------------
+The compiler is not aware of the special significance of kernel_neon_begin() and
+kernel_neon_end(), i.e., that it is only allowed to issue NEON/VFP instructions
+between calls to these respective functions. Furthermore, GCC may generate NEON
+instructions of its own at -O3 level if -mfpu=neon is selected, and even if the
+kernel is currently compiled at -O2, future changes may result in NEON/VFP
+instructions appearing in unexpected places if no special care is taken.
+
+Therefore, the recommended and only supported way of using NEON/VFP in the
+kernel is by adhering to the following rules:
+* isolate the NEON code in a separate compilation unit and compile it with
+  '-mfpu=neon -mfloat-abi=softfp';
+* issue the calls to kernel_neon_begin(), kernel_neon_end() as well as the calls
+  into the unit containing the NEON code from a compilation unit which is *not*
+  built with the GCC flag '-mfpu=neon' set.
+
+As the kernel is compiled with '-msoft-float', the above will guarantee that
+both NEON and VFP instructions will only ever appear in designated compilation
+units at any optimization level.
+
+
+NEON assembler
+--------------
+NEON assembler is supported with no additional caveats as long as the rules
+above are followed.
+
+
+NEON code generated by GCC
+--------------------------
+The GCC option -ftree-vectorize (implied by -O3) tries to exploit implicit
+parallelism, and generates NEON code from ordinary C source code. This is fully
+supported as long as the rules above are followed.
+
+
+NEON intrinsics
+---------------
+NEON intrinsics are also supported. However, as code using NEON intrinsics
+relies on the GCC header <arm_neon.h>, (which #includes <stdint.h>), you should
+observe the following in addition to the rules above:
+* Compile the unit containing the NEON intrinsics with '-ffreestanding' so GCC
+  uses its builtin version of <stdint.h> (this is a C99 header which the kernel
+  does not supply);
+* Include <arm_neon.h> last, or at least after <linux/types.h>
diff --git a/Documentation/arm/kernel_user_helpers.txt b/Documentation/arm/kernel_user_helpers.txt
new file mode 100644
index 00000000000..5673594717c
--- /dev/null
+++ b/Documentation/arm/kernel_user_helpers.txt
@@ -0,0 +1,267 @@
+Kernel-provided User Helpers
+============================
+
+These are segment of kernel provided user code reachable from user space
+at a fixed address in kernel memory.  This is used to provide user space
+with some operations which require kernel help because of unimplemented
+native feature and/or instructions in many ARM CPUs. The idea is for this
+code to be executed directly in user mode for best efficiency but which is
+too intimate with the kernel counter part to be left to user libraries.
+In fact this code might even differ from one CPU to another depending on
+the available instruction set, or whether it is a SMP systems. In other
+words, the kernel reserves the right to change this code as needed without
+warning. Only the entry points and their results as documented here are
+guaranteed to be stable.
+
+This is different from (but doesn't preclude) a full blown VDSO
+implementation, however a VDSO would prevent some assembly tricks with
+constants that allows for efficient branching to those code segments. And
+since those code segments only use a few cycles before returning to user
+code, the overhead of a VDSO indirect far call would add a measurable
+overhead to such minimalistic operations.
+
+User space is expected to bypass those helpers and implement those things
+inline (either in the code emitted directly by the compiler, or part of
+the implementation of a library call) when optimizing for a recent enough
+processor that has the necessary native support, but only if resulting
+binaries are already to be incompatible with earlier ARM processors due to
+usage of similar native instructions for other things.  In other words
+don't make binaries unable to run on earlier processors just for the sake
+of not using these kernel helpers if your compiled code is not going to
+use new instructions for other purpose.
+
+New helpers may be added over time, so an older kernel may be missing some
+helpers present in a newer kernel.  For this reason, programs must check
+the value of __kuser_helper_version (see below) before assuming that it is
+safe to call any particular helper.  This check should ideally be
+performed only once at process startup time, and execution aborted early
+if the required helpers are not provided by the kernel version that
+process is running on.
+
+kuser_helper_version
+--------------------
+
+Location:	0xffff0ffc
+
+Reference declaration:
+
+  extern int32_t __kuser_helper_version;
+
+Definition:
+
+  This field contains the number of helpers being implemented by the
+  running kernel.  User space may read this to determine the availability
+  of a particular helper.
+
+Usage example:
+
+#define __kuser_helper_version (*(int32_t *)0xffff0ffc)
+
+void check_kuser_version(void)
+{
+	if (__kuser_helper_version < 2) {
+		fprintf(stderr, "can't do atomic operations, kernel too old\n");
+		abort();
+	}
+}
+
+Notes:
+
+  User space may assume that the value of this field never changes
+  during the lifetime of any single process.  This means that this
+  field can be read once during the initialisation of a library or
+  startup phase of a program.
+
+kuser_get_tls
+-------------
+
+Location:	0xffff0fe0
+
+Reference prototype:
+
+  void * __kuser_get_tls(void);
+
+Input:
+
+  lr = return address
+
+Output:
+
+  r0 = TLS value
+
+Clobbered registers:
+
+  none
+
+Definition:
+
+  Get the TLS value as previously set via the __ARM_NR_set_tls syscall.
+
+Usage example:
+
+typedef void * (__kuser_get_tls_t)(void);
+#define __kuser_get_tls (*(__kuser_get_tls_t *)0xffff0fe0)
+
+void foo()
+{
+	void *tls = __kuser_get_tls();
+	printf("TLS = %p\n", tls);
+}
+
+Notes:
+
+  - Valid only if __kuser_helper_version >= 1 (from kernel version 2.6.12).
+
+kuser_cmpxchg
+-------------
+
+Location:	0xffff0fc0
+
+Reference prototype:
+
+  int __kuser_cmpxchg(int32_t oldval, int32_t newval, volatile int32_t *ptr);
+
+Input:
+
+  r0 = oldval
+  r1 = newval
+  r2 = ptr
+  lr = return address
+
+Output:
+
+  r0 = success code (zero or non-zero)
+  C flag = set if r0 == 0, clear if r0 != 0
+
+Clobbered registers:
+
+  r3, ip, flags
+
+Definition:
+
+  Atomically store newval in *ptr only if *ptr is equal to oldval.
+  Return zero if *ptr was changed or non-zero if no exchange happened.
+  The C flag is also set if *ptr was changed to allow for assembly
+  optimization in the calling code.
+
+Usage example:
+
+typedef int (__kuser_cmpxchg_t)(int oldval, int newval, volatile int *ptr);
+#define __kuser_cmpxchg (*(__kuser_cmpxchg_t *)0xffff0fc0)
+
+int atomic_add(volatile int *ptr, int val)
+{
+	int old, new;
+
+	do {
+		old = *ptr;
+		new = old + val;
+	} while(__kuser_cmpxchg(old, new, ptr));
+
+	return new;
+}
+
+Notes:
+
+  - This routine already includes memory barriers as needed.
+
+  - Valid only if __kuser_helper_version >= 2 (from kernel version 2.6.12).
+
+kuser_memory_barrier
+--------------------
+
+Location:	0xffff0fa0
+
+Reference prototype:
+
+  void __kuser_memory_barrier(void);
+
+Input:
+
+  lr = return address
+
+Output:
+
+  none
+
+Clobbered registers:
+
+  none
+
+Definition:
+
+  Apply any needed memory barrier to preserve consistency with data modified
+  manually and __kuser_cmpxchg usage.
+
+Usage example:
+
+typedef void (__kuser_dmb_t)(void);
+#define __kuser_dmb (*(__kuser_dmb_t *)0xffff0fa0)
+
+Notes:
+
+  - Valid only if __kuser_helper_version >= 3 (from kernel version 2.6.15).
+
+kuser_cmpxchg64
+---------------
+
+Location:	0xffff0f60
+
+Reference prototype:
+
+  int __kuser_cmpxchg64(const int64_t *oldval,
+                        const int64_t *newval,
+                        volatile int64_t *ptr);
+
+Input:
+
+  r0 = pointer to oldval
+  r1 = pointer to newval
+  r2 = pointer to target value
+  lr = return address
+
+Output:
+
+  r0 = success code (zero or non-zero)
+  C flag = set if r0 == 0, clear if r0 != 0
+
+Clobbered registers:
+
+  r3, lr, flags
+
+Definition:
+
+  Atomically store the 64-bit value pointed by *newval in *ptr only if *ptr
+  is equal to the 64-bit value pointed by *oldval.  Return zero if *ptr was
+  changed or non-zero if no exchange happened.
+
+  The C flag is also set if *ptr was changed to allow for assembly
+  optimization in the calling code.
+
+Usage example:
+
+typedef int (__kuser_cmpxchg64_t)(const int64_t *oldval,
+                                  const int64_t *newval,
+                                  volatile int64_t *ptr);
+#define __kuser_cmpxchg64 (*(__kuser_cmpxchg64_t *)0xffff0f60)
+
+int64_t atomic_add64(volatile int64_t *ptr, int64_t val)
+{
+	int64_t old, new;
+
+	do {
+		old = *ptr;
+		new = old + val;
+	} while(__kuser_cmpxchg64(&old, &new, ptr));
+
+	return new;
+}
+
+Notes:
+
+  - This routine already includes memory barriers as needed.
+
+  - Due to the length of this sequence, this spans 2 conventional kuser
+    "slots", therefore 0xffff0f80 is not used as a valid entry point.
+
+  - Valid only if __kuser_helper_version >= 5 (from kernel version 3.1).
diff --git a/Documentation/arm/memory.txt b/Documentation/arm/memory.txt
index 771d48d3b33..38dc06d0a79 100644
--- a/Documentation/arm/memory.txt
+++ b/Documentation/arm/memory.txt
@@ -41,25 +41,20 @@ fffe8000	fffeffff	DTCM mapping area for platforms with
 fffe0000	fffe7fff	ITCM mapping area for platforms with
 				ITCM mounted inside the CPU.
 
-fff00000	fffdffff	Fixmap mapping region.  Addresses provided
+ffc00000	ffdfffff	Fixmap mapping region.  Addresses provided
 				by fix_to_virt() will be located here.
 
-ffc00000	ffefffff	DMA memory mapping region.  Memory returned
-				by the dma_alloc_xxx functions will be
-				dynamically mapped here.
-
-ff000000	ffbfffff	Reserved for future expansion of DMA
-				mapping region.
-
-VMALLOC_END	feffffff	Free for platform use, recommended.
-				VMALLOC_END must be aligned to a 2MB
-				boundary.
+fee00000	feffffff	Mapping of PCI I/O space. This is a static
+				mapping within the vmalloc space.
 
 VMALLOC_START	VMALLOC_END-1	vmalloc() / ioremap() space.
 				Memory returned by vmalloc/ioremap will
 				be dynamically placed in this region.
-				VMALLOC_START may be based upon the value
-				of the high_memory variable.
+				Machine specific static mappings are also
+				located here through iotable_init().
+				VMALLOC_START is based upon the value
+				of the high_memory variable, and VMALLOC_END
+				is equal to 0xff000000.
 
 PAGE_OFFSET	high_memory-1	Kernel direct-mapped RAM region.
 				This maps the platforms RAM, and typically
diff --git a/Documentation/arm/sti/overview.txt b/Documentation/arm/sti/overview.txt
new file mode 100644
index 00000000000..1a4e93d6027
--- /dev/null
+++ b/Documentation/arm/sti/overview.txt
@@ -0,0 +1,33 @@
+			STi ARM Linux Overview
+			==========================
+
+Introduction
+------------
+
+  The ST Microelectronics Multimedia and Application Processors range of
+  CortexA9 System-on-Chip are supported by the 'STi' platform of
+  ARM Linux. Currently STiH415, STiH416 SOCs are supported with both
+  B2000 and B2020 Reference boards.
+
+
+  configuration
+  -------------
+
+  A generic configuration is provided for both STiH415/416, and can be used as the
+  default by
+	make stih41x_defconfig
+
+  Layout
+  ------
+  All the files for multiple machine families (STiH415, STiH416, and STiG125)
+  are located in the platform code contained in arch/arm/mach-sti
+
+  There is a generic board board-dt.c in the mach folder which support
+  Flattened Device Tree, which means, It works with any compatible board with
+  Device Trees.
+
+
+  Document Author
+  ---------------
+
+  Srinivas Kandagatla <srinivas.kandagatla@st.com>, (c) 2013 ST Microelectronics
diff --git a/Documentation/arm/sti/stih407-overview.txt b/Documentation/arm/sti/stih407-overview.txt
new file mode 100644
index 00000000000..3343f32f58b
--- /dev/null
+++ b/Documentation/arm/sti/stih407-overview.txt
@@ -0,0 +1,18 @@
+			STiH407 Overview
+			================
+
+Introduction
+------------
+
+    The STiH407 is the new generation of SoC for Multi-HD, AVC set-top boxes
+    and server/connected client application for satellite, cable, terrestrial
+    and IP-STB markets.
+
+    Features
+    - ARM Cortex-A9 1.5 GHz dual core CPU (28nm)
+    - SATA2, USB 3.0, PCIe, Gbit Ethernet
+
+  Document Author
+  ---------------
+
+  Maxime Coquelin <maxime.coquelin@st.com>, (c) 2014 ST Microelectronics
diff --git a/Documentation/arm/sti/stih415-overview.txt b/Documentation/arm/sti/stih415-overview.txt
new file mode 100644
index 00000000000..1383e33f265
--- /dev/null
+++ b/Documentation/arm/sti/stih415-overview.txt
@@ -0,0 +1,12 @@
+			STiH415 Overview
+			================
+
+Introduction
+------------
+
+    The STiH415 is the next generation of HD, AVC set-top box processors
+    for satellite, cable, terrestrial and IP-STB markets.
+
+    Features
+    - ARM Cortex-A9 1.0 GHz, dual-core CPU
+    - SATA2x2,USB 2.0x3, PCIe, Gbit Ethernet MACx2
diff --git a/Documentation/arm/sti/stih416-overview.txt b/Documentation/arm/sti/stih416-overview.txt
new file mode 100644
index 00000000000..558444c201c
--- /dev/null
+++ b/Documentation/arm/sti/stih416-overview.txt
@@ -0,0 +1,12 @@
+			STiH416 Overview
+			================
+
+Introduction
+------------
+
+    The STiH416 is the next generation of HD, AVC set-top box processors
+    for satellite, cable, terrestrial and IP-STB markets.
+
+    Features
+    - ARM Cortex-A9 1.2 GHz dual core CPU
+    - SATA2x2,USB 2.0x3, PCIe, Gbit Ethernet MACx2
diff --git a/Documentation/arm/sunxi/README b/Documentation/arm/sunxi/README
new file mode 100644
index 00000000000..7945238453e
--- /dev/null
+++ b/Documentation/arm/sunxi/README
@@ -0,0 +1,52 @@
+ARM Allwinner SoCs
+==================
+
+This document lists all the ARM Allwinner SoCs that are currently
+supported in mainline by the Linux kernel. This document will also
+provide links to documentation and/or datasheet for these SoCs.
+
+SunXi family
+------------
+  Linux kernel mach directory: arch/arm/mach-sunxi
+
+  Flavors:
+    * ARM926 based SoCs
+      - Allwinner F20 (sun3i)
+        + Not Supported
+
+    * ARM Cortex-A8 based SoCs
+      - Allwinner A10 (sun4i)
+        + Datasheet
+	  http://dl.linux-sunxi.org/A10/A10%20Datasheet%20-%20v1.21%20%282012-04-06%29.pdf
+	+ User Manual
+	  http://dl.linux-sunxi.org/A10/A10%20User%20Manual%20-%20v1.20%20%282012-04-09%2c%20DECRYPTED%29.pdf
+
+      - Allwinner A10s (sun5i)
+        + Datasheet
+          http://dl.linux-sunxi.org/A10s/A10s%20Datasheet%20-%20v1.20%20%282012-03-27%29.pdf
+
+      - Allwinner A13 (sun5i)
+        + Datasheet
+	  http://dl.linux-sunxi.org/A13/A13%20Datasheet%20-%20v1.12%20%282012-03-29%29.pdf
+        + User Manual
+          http://dl.linux-sunxi.org/A13/A13%20User%20Manual%20-%20v1.2%20%282013-01-08%29.pdf
+
+    * Dual ARM Cortex-A7 based SoCs
+      - Allwinner A20 (sun7i)
+        + User Manual
+          http://dl.linux-sunxi.org/A20/A20%20User%20Manual%202013-03-22.pdf
+
+      - Allwinner A23
+        + Not Supported
+
+    * Quad ARM Cortex-A7 based SoCs
+      - Allwinner A31 (sun6i)
+        + Datasheet
+          http://dl.linux-sunxi.org/A31/A31%20Datasheet%20-%20v1.00%20(2012-12-24).pdf
+
+      - Allwinner A31s (sun6i)
+        + Not Supported
+
+    * Quad ARM Cortex-A15, Quad ARM Cortex-A7 based SoCs
+      - Allwinner A80
+        + Not Supported
+\ No newline at end of file
diff --git a/Documentation/arm/sunxi/clocks.txt b/Documentation/arm/sunxi/clocks.txt
new file mode 100644
index 00000000000..e09a88aa313
--- /dev/null
+++ b/Documentation/arm/sunxi/clocks.txt
@@ -0,0 +1,56 @@
+Frequently asked questions about the sunxi clock system
+=======================================================
+
+This document contains useful bits of information that people tend to ask
+about the sunxi clock system, as well as accompanying ASCII art when adequate.
+
+Q: Why is the main 24MHz oscillator gatable? Wouldn't that break the
+   system?
+
+A: The 24MHz oscillator allows gating to save power. Indeed, if gated
+   carelessly the system would stop functioning, but with the right
+   steps, one can gate it and keep the system running. Consider this
+   simplified suspend example:
+
+   While the system is operational, you would see something like
+
+      24MHz         32kHz
+       |
+      PLL1
+       \
+        \_ CPU Mux
+             |
+           [CPU]
+
+   When you are about to suspend, you switch the CPU Mux to the 32kHz
+   oscillator:
+
+      24Mhz         32kHz
+       |              |
+      PLL1            |
+                     /
+           CPU Mux _/
+             |
+           [CPU]
+
+    Finally you can gate the main oscillator
+
+                    32kHz
+                      |
+                      |
+                     /
+           CPU Mux _/
+             |
+           [CPU]
+
+Q: Were can I learn more about the sunxi clocks?
+
+A: The linux-sunxi wiki contains a page documenting the clock registers,
+   you can find it at
+
+        http://linux-sunxi.org/A10/CCM
+
+   The authoritative source for information at this time is the ccmu driver
+   released by Allwinner, you can find it at
+
+        https://github.com/linux-sunxi/linux-sunxi/tree/sunxi-3.0/arch/arm/mach-sun4i/clock/ccmu
diff --git a/Documentation/arm/swp_emulation b/Documentation/arm/swp_emulation
new file mode 100644
index 00000000000..af903d22fd9
--- /dev/null
+++ b/Documentation/arm/swp_emulation
@@ -0,0 +1,27 @@
+Software emulation of deprecated SWP instruction (CONFIG_SWP_EMULATE)
+---------------------------------------------------------------------
+
+ARMv6 architecture deprecates use of the SWP/SWPB instructions, and recommeds
+moving to the load-locked/store-conditional instructions LDREX and STREX.
+
+ARMv7 multiprocessing extensions introduce the ability to disable these
+instructions, triggering an undefined instruction exception when executed.
+Trapped instructions are emulated using an LDREX/STREX or LDREXB/STREXB
+sequence. If a memory access fault (an abort) occurs, a segmentation fault is
+signalled to the triggering process.
+
+/proc/cpu/swp_emulation holds some statistics/information, including the PID of
+the last process to trigger the emulation to be invocated. For example:
+---
+Emulated SWP:		12
+Emulated SWPB:		0
+Aborted SWP{B}:		1
+Last process:		314
+---
+
+NOTE: when accessing uncached shared regions, LDREX/STREX rely on an external
+transaction monitoring block called a global monitor to maintain update
+atomicity. If your system does not implement a global monitor, this option can
+cause programs that perform SWP operations to uncached memory to deadlock, as
+the STREX operation will always fail.
+
diff --git a/Documentation/arm/uefi.txt b/Documentation/arm/uefi.txt
new file mode 100644
index 00000000000..d60030a1b90
--- /dev/null
+++ b/Documentation/arm/uefi.txt
@@ -0,0 +1,64 @@
+UEFI, the Unified Extensible Firmware Interface, is a specification
+governing the behaviours of compatible firmware interfaces. It is
+maintained by the UEFI Forum - http://www.uefi.org/.
+
+UEFI is an evolution of its predecessor 'EFI', so the terms EFI and
+UEFI are used somewhat interchangeably in this document and associated
+source code. As a rule, anything new uses 'UEFI', whereas 'EFI' refers
+to legacy code or specifications.
+
+UEFI support in Linux
+=====================
+Booting on a platform with firmware compliant with the UEFI specification
+makes it possible for the kernel to support additional features:
+- UEFI Runtime Services
+- Retrieving various configuration information through the standardised
+  interface of UEFI configuration tables. (ACPI, SMBIOS, ...)
+
+For actually enabling [U]EFI support, enable:
+- CONFIG_EFI=y
+- CONFIG_EFI_VARS=y or m
+
+The implementation depends on receiving information about the UEFI environment
+in a Flattened Device Tree (FDT) - so is only available with CONFIG_OF.
+
+UEFI stub
+=========
+The "stub" is a feature that extends the Image/zImage into a valid UEFI
+PE/COFF executable, including a loader application that makes it possible to
+load the kernel directly from the UEFI shell, boot menu, or one of the
+lightweight bootloaders like Gummiboot or rEFInd.
+
+The kernel image built with stub support remains a valid kernel image for
+booting in non-UEFI environments.
+
+UEFI kernel support on ARM
+==========================
+UEFI kernel support on the ARM architectures (arm and arm64) is only available
+when boot is performed through the stub.
+
+When booting in UEFI mode, the stub deletes any memory nodes from a provided DT.
+Instead, the kernel reads the UEFI memory map.
+
+The stub populates the FDT /chosen node with (and the kernel scans for) the
+following parameters:
+________________________________________________________________________________
+Name                      | Size   | Description
+================================================================================
+linux,uefi-system-table   | 64-bit | Physical address of the UEFI System Table.
+--------------------------------------------------------------------------------
+linux,uefi-mmap-start     | 64-bit | Physical address of the UEFI memory map,
+                          |        | populated by the UEFI GetMemoryMap() call.
+--------------------------------------------------------------------------------
+linux,uefi-mmap-size      | 32-bit | Size in bytes of the UEFI memory map
+                          |        | pointed to in previous entry.
+--------------------------------------------------------------------------------
+linux,uefi-mmap-desc-size | 32-bit | Size in bytes of each entry in the UEFI
+                          |        | memory map.
+--------------------------------------------------------------------------------
+linux,uefi-mmap-desc-ver  | 32-bit | Version of the mmap descriptor format.
+--------------------------------------------------------------------------------
+linux,uefi-stub-kern-ver  | string | Copy of linux_banner from build.
+--------------------------------------------------------------------------------
+
+For verbose debug messages, specify 'uefi_debug' on the kernel command line.
diff --git a/Documentation/arm/vlocks.txt b/Documentation/arm/vlocks.txt
new file mode 100644
index 00000000000..415960a9bab
--- /dev/null
+++ b/Documentation/arm/vlocks.txt
@@ -0,0 +1,211 @@
+vlocks for Bare-Metal Mutual Exclusion
+======================================
+
+Voting Locks, or "vlocks" provide a simple low-level mutual exclusion
+mechanism, with reasonable but minimal requirements on the memory
+system.
+
+These are intended to be used to coordinate critical activity among CPUs
+which are otherwise non-coherent, in situations where the hardware
+provides no other mechanism to support this and ordinary spinlocks
+cannot be used.
+
+
+vlocks make use of the atomicity provided by the memory system for
+writes to a single memory location.  To arbitrate, every CPU "votes for
+itself", by storing a unique number to a common memory location.  The
+final value seen in that memory location when all the votes have been
+cast identifies the winner.
+
+In order to make sure that the election produces an unambiguous result
+in finite time, a CPU will only enter the election in the first place if
+no winner has been chosen and the election does not appear to have
+started yet.
+
+
+Algorithm
+---------
+
+The easiest way to explain the vlocks algorithm is with some pseudo-code:
+
+
+	int currently_voting[NR_CPUS] = { 0, };
+	int last_vote = -1; /* no votes yet */
+
+	bool vlock_trylock(int this_cpu)
+	{
+		/* signal our desire to vote */
+		currently_voting[this_cpu] = 1;
+		if (last_vote != -1) {
+			/* someone already volunteered himself */
+			currently_voting[this_cpu] = 0;
+			return false; /* not ourself */
+		}
+
+		/* let's suggest ourself */
+		last_vote = this_cpu;
+		currently_voting[this_cpu] = 0;
+
+		/* then wait until everyone else is done voting */
+		for_each_cpu(i) {
+			while (currently_voting[i] != 0)
+				/* wait */;
+		}
+
+		/* result */
+		if (last_vote == this_cpu)
+			return true; /* we won */
+		return false;
+	}
+
+	bool vlock_unlock(void)
+	{
+		last_vote = -1;
+	}
+
+
+The currently_voting[] array provides a way for the CPUs to determine
+whether an election is in progress, and plays a role analogous to the
+"entering" array in Lamport's bakery algorithm [1].
+
+However, once the election has started, the underlying memory system
+atomicity is used to pick the winner.  This avoids the need for a static
+priority rule to act as a tie-breaker, or any counters which could
+overflow.
+
+As long as the last_vote variable is globally visible to all CPUs, it
+will contain only one value that won't change once every CPU has cleared
+its currently_voting flag.
+
+
+Features and limitations
+------------------------
+
+ * vlocks are not intended to be fair.  In the contended case, it is the
+   _last_ CPU which attempts to get the lock which will be most likely
+   to win.
+
+   vlocks are therefore best suited to situations where it is necessary
+   to pick a unique winner, but it does not matter which CPU actually
+   wins.
+
+ * Like other similar mechanisms, vlocks will not scale well to a large
+   number of CPUs.
+
+   vlocks can be cascaded in a voting hierarchy to permit better scaling
+   if necessary, as in the following hypothetical example for 4096 CPUs:
+
+	/* first level: local election */
+	my_town = towns[(this_cpu >> 4) & 0xf];
+	I_won = vlock_trylock(my_town, this_cpu & 0xf);
+	if (I_won) {
+		/* we won the town election, let's go for the state */
+		my_state = states[(this_cpu >> 8) & 0xf];
+		I_won = vlock_lock(my_state, this_cpu & 0xf));
+		if (I_won) {
+			/* and so on */
+			I_won = vlock_lock(the_whole_country, this_cpu & 0xf];
+			if (I_won) {
+				/* ... */
+			}
+			vlock_unlock(the_whole_country);
+		}
+		vlock_unlock(my_state);
+	}
+	vlock_unlock(my_town);
+
+
+ARM implementation
+------------------
+
+The current ARM implementation [2] contains some optimisations beyond
+the basic algorithm:
+
+ * By packing the members of the currently_voting array close together,
+   we can read the whole array in one transaction (providing the number
+   of CPUs potentially contending the lock is small enough).  This
+   reduces the number of round-trips required to external memory.
+
+   In the ARM implementation, this means that we can use a single load
+   and comparison:
+
+	LDR	Rt, [Rn]
+	CMP	Rt, #0
+
+   ...in place of code equivalent to:
+
+	LDRB	Rt, [Rn]
+	CMP	Rt, #0
+	LDRBEQ	Rt, [Rn, #1]
+	CMPEQ	Rt, #0
+	LDRBEQ	Rt, [Rn, #2]
+	CMPEQ	Rt, #0
+	LDRBEQ	Rt, [Rn, #3]
+	CMPEQ	Rt, #0
+
+   This cuts down on the fast-path latency, as well as potentially
+   reducing bus contention in contended cases.
+
+   The optimisation relies on the fact that the ARM memory system
+   guarantees coherency between overlapping memory accesses of
+   different sizes, similarly to many other architectures.  Note that
+   we do not care which element of currently_voting appears in which
+   bits of Rt, so there is no need to worry about endianness in this
+   optimisation.
+
+   If there are too many CPUs to read the currently_voting array in
+   one transaction then multiple transations are still required.  The
+   implementation uses a simple loop of word-sized loads for this
+   case.  The number of transactions is still fewer than would be
+   required if bytes were loaded individually.
+
+
+   In principle, we could aggregate further by using LDRD or LDM, but
+   to keep the code simple this was not attempted in the initial
+   implementation.
+
+
+ * vlocks are currently only used to coordinate between CPUs which are
+   unable to enable their caches yet.  This means that the
+   implementation removes many of the barriers which would be required
+   when executing the algorithm in cached memory.
+
+   packing of the currently_voting array does not work with cached
+   memory unless all CPUs contending the lock are cache-coherent, due
+   to cache writebacks from one CPU clobbering values written by other
+   CPUs.  (Though if all the CPUs are cache-coherent, you should be
+   probably be using proper spinlocks instead anyway).
+
+
+ * The "no votes yet" value used for the last_vote variable is 0 (not
+   -1 as in the pseudocode).  This allows statically-allocated vlocks
+   to be implicitly initialised to an unlocked state simply by putting
+   them in .bss.
+
+   An offset is added to each CPU's ID for the purpose of setting this
+   variable, so that no CPU uses the value 0 for its ID.
+
+
+Colophon
+--------
+
+Originally created and documented by Dave Martin for Linaro Limited, for
+use in ARM-based big.LITTLE platforms, with review and input gratefully
+received from Nicolas Pitre and Achin Gupta.  Thanks to Nicolas for
+grabbing most of this text out of the relevant mail thread and writing
+up the pseudocode.
+
+Copyright (C) 2012-2013  Linaro Limited
+Distributed under the terms of Version 2 of the GNU General Public
+License, as defined in linux/COPYING.
+
+
+References
+----------
+
+[1] Lamport, L. "A New Solution of Dijkstra's Concurrent Programming
+    Problem", Communications of the ACM 17, 8 (August 1974), 453-455.
+
+    http://en.wikipedia.org/wiki/Lamport%27s_bakery_algorithm
+
+[2] linux/arch/arm/common/vlock.S, www.kernel.org.