aboutsummaryrefslogtreecommitdiff
path: root/Documentation/powerpc
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/powerpc')
-rw-r--r--Documentation/powerpc/00-INDEX26
-rw-r--r--Documentation/powerpc/SBC8260_memory_mapping.txt197
-rw-r--r--Documentation/powerpc/booting-without-of.txt1420
-rw-r--r--Documentation/powerpc/bootwrapper.txt141
-rw-r--r--Documentation/powerpc/cpu_families.txt221
-rw-r--r--Documentation/powerpc/cpu_features.txt12
-rw-r--r--Documentation/powerpc/eeh-pci-error-recovery.txt23
-rw-r--r--Documentation/powerpc/firmware-assisted-dump.txt270
-rw-r--r--Documentation/powerpc/hvcs.txt8
-rw-r--r--Documentation/powerpc/kvm_440.txt41
-rw-r--r--Documentation/powerpc/mpc52xx.txt12
-rw-r--r--Documentation/powerpc/pmu-ebb.txt137
-rw-r--r--Documentation/powerpc/ppc_htab.txt118
-rw-r--r--Documentation/powerpc/ptrace.txt151
-rw-r--r--Documentation/powerpc/qe_firmware.txt295
-rw-r--r--Documentation/powerpc/smp.txt34
-rw-r--r--Documentation/powerpc/sound.txt81
-rw-r--r--Documentation/powerpc/transactional_memory.txt198
-rw-r--r--Documentation/powerpc/zImage_layout.txt47
19 files changed, 1498 insertions, 1934 deletions
diff --git a/Documentation/powerpc/00-INDEX b/Documentation/powerpc/00-INDEX
index d6d65b9bcfe..6db73df0427 100644
--- a/Documentation/powerpc/00-INDEX
+++ b/Documentation/powerpc/00-INDEX
@@ -5,22 +5,28 @@ please mail me.
00-INDEX
- this file
+bootwrapper.txt
+ - Information on how the powerpc kernel is wrapped for boot on various
+ different platforms.
cpu_features.txt
- info on how we support a variety of CPUs with minimal compile-time
options.
eeh-pci-error-recovery.txt
- info on PCI Bus EEH Error Recovery
+firmware-assisted-dump.txt
+ - Documentation on the firmware assisted dump mechanism "fadump".
hvcs.txt
- IBM "Hypervisor Virtual Console Server" Installation Guide
+kvm_440.txt
+ - Various notes on the implementation of KVM for PowerPC 440.
mpc52xx.txt
- Linux 2.6.x on MPC52xx family
-ppc_htab.txt
- - info about the Linux/PPC /proc/ppc_htab entry
-SBC8260_memory_mapping.txt
- - EST SBC8260 board info
-smp.txt
- - use and state info about Linux/PPC on MP machines
-sound.txt
- - info on sound support under Linux/PPC
-zImage_layout.txt
- - info on the kernel images for Linux/PPC
+pmu-ebb.txt
+ - Description of the API for using the PMU with Event Based Branches.
+qe_firmware.txt
+ - describes the layout of firmware binaries for the Freescale QUICC
+ Engine and the code that parses and uploads the microcode therein.
+ptrace.txt
+ - Information on the ptrace interfaces for hardware debug registers.
+transactional_memory.txt
+ - Overview of the Power8 transactional memory support.
diff --git a/Documentation/powerpc/SBC8260_memory_mapping.txt b/Documentation/powerpc/SBC8260_memory_mapping.txt
deleted file mode 100644
index e6e9ee0506c..00000000000
--- a/Documentation/powerpc/SBC8260_memory_mapping.txt
+++ /dev/null
@@ -1,197 +0,0 @@
-Please mail me (Jon Diekema, diekema_jon@si.com or diekema@cideas.com)
-if you have questions, comments or corrections.
-
- * EST SBC8260 Linux memory mapping rules
-
- http://www.estc.com/
- http://www.estc.com/products/boards/SBC8260-8240_ds.html
-
- Initial conditions:
- -------------------
-
- Tasks that need to be perform by the boot ROM before control is
- transferred to zImage (compressed Linux kernel):
-
- - Define the IMMR to 0xf0000000
-
- - Initialize the memory controller so that RAM is available at
- physical address 0x00000000. On the SBC8260 is this 16M (64M)
- SDRAM.
-
- - The boot ROM should only clear the RAM that it is using.
-
- The reason for doing this is to enhances the chances of a
- successful post mortem on a Linux panic. One of the first
- items to examine is the 16k (LOG_BUF_LEN) circular console
- buffer called log_buf which is defined in kernel/printk.c.
-
- - To enhance boot ROM performance, the I-cache can be enabled.
-
- Date: Mon, 22 May 2000 14:21:10 -0700
- From: Neil Russell <caret@c-side.com>
-
- LiMon (LInux MONitor) runs with and starts Linux with MMU
- off, I-cache enabled, D-cache disabled. The I-cache doesn't
- need hints from the MMU to work correctly as the D-cache
- does. No D-cache means no special code to handle devices in
- the presence of cache (no snooping, etc). The use of the
- I-cache means that the monitor can run acceptably fast
- directly from ROM, rather than having to copy it to RAM.
-
- - Build the board information structure (see
- include/asm-ppc/est8260.h for its definition)
-
- - The compressed Linux kernel (zImage) contains a bootstrap loader
- that is position independent; you can load it into any RAM,
- ROM or FLASH memory address >= 0x00500000 (above 5 MB), or
- at its link address of 0x00400000 (4 MB).
-
- Note: If zImage is loaded at its link address of 0x00400000 (4 MB),
- then zImage will skip the step of moving itself to
- its link address.
-
- - Load R3 with the address of the board information structure
-
- - Transfer control to zImage
-
- - The Linux console port is SMC1, and the baud rate is controlled
- from the bi_baudrate field of the board information structure.
- On thing to keep in mind when picking the baud rate, is that
- there is no flow control on the SMC ports. I would stick
- with something safe and standard like 19200.
-
- On the EST SBC8260, the SMC1 port is on the COM1 connector of
- the board.
-
-
- EST SBC8260 defaults:
- ---------------------
-
- Chip
- Memory Sel Bus Use
- --------------------- --- --- ----------------------------------
- 0x00000000-0x03FFFFFF CS2 60x (16M or 64M)/64M SDRAM
- 0x04000000-0x04FFFFFF CS4 local 4M/16M SDRAM (soldered to the board)
- 0x21000000-0x21000000 CS7 60x 1B/64K Flash present detect (from the flash SIMM)
- 0x21000001-0x21000001 CS7 60x 1B/64K Switches (read) and LEDs (write)
- 0x22000000-0x2200FFFF CS5 60x 8K/64K EEPROM
- 0xFC000000-0xFCFFFFFF CS6 60x 2M/16M flash (8 bits wide, soldered to the board)
- 0xFE000000-0xFFFFFFFF CS0 60x 4M/16M flash (SIMM)
-
- Notes:
- ------
-
- - The chip selects can map 32K blocks and up (powers of 2)
-
- - The SDRAM machine can handled up to 128Mbytes per chip select
-
- - Linux uses the 60x bus memory (the SDRAM DIMM) for the
- communications buffers.
-
- - BATs can map 128K-256Mbytes each. There are four data BATs and
- four instruction BATs. Generally the data and instruction BATs
- are mapped the same.
-
- - The IMMR must be set above the kernel virtual memory addresses,
- which start at 0xC0000000. Otherwise, the kernel may crash as
- soon as you start any threads or processes due to VM collisions
- in the kernel or user process space.
-
-
- Details from Dan Malek <dan_malek@mvista.com> on 10/29/1999:
-
- The user application virtual space consumes the first 2 Gbytes
- (0x00000000 to 0x7FFFFFFF). The kernel virtual text starts at
- 0xC0000000, with data following. There is a "protection hole"
- between the end of kernel data and the start of the kernel
- dynamically allocated space, but this space is still within
- 0xCxxxxxxx.
-
- Obviously the kernel can't map any physical addresses 1:1 in
- these ranges.
-
-
- Details from Dan Malek <dan_malek@mvista.com> on 5/19/2000:
-
- During the early kernel initialization, the kernel virtual
- memory allocator is not operational. Prior to this KVM
- initialization, we choose to map virtual to physical addresses
- 1:1. That is, the kernel virtual address exactly matches the
- physical address on the bus. These mappings are typically done
- in arch/ppc/kernel/head.S, or arch/ppc/mm/init.c. Only
- absolutely necessary mappings should be done at this time, for
- example board control registers or a serial uart. Normal device
- driver initialization should map resources later when necessary.
-
- Although platform dependent, and certainly the case for embedded
- 8xx, traditionally memory is mapped at physical address zero,
- and I/O devices above physical address 0x80000000. The lowest
- and highest (above 0xf0000000) I/O addresses are traditionally
- used for devices or registers we need to map during kernel
- initialization and prior to KVM operation. For this reason,
- and since it followed prior PowerPC platform examples, I chose
- to map the embedded 8xx kernel to the 0xc0000000 virtual address.
- This way, we can enable the MMU to map the kernel for proper
- operation, and still map a few windows before the KVM is operational.
-
- On some systems, you could possibly run the kernel at the
- 0x80000000 or any other virtual address. It just depends upon
- mapping that must be done prior to KVM operational. You can never
- map devices or kernel spaces that overlap with the user virtual
- space. This is why default IMMR mapping used by most BDM tools
- won't work. They put the IMMR at something like 0x10000000 or
- 0x02000000 for example. You simply can't map these addresses early
- in the kernel, and continue proper system operation.
-
- The embedded 8xx/82xx kernel is mature enough that all you should
- need to do is map the IMMR someplace at or above 0xf0000000 and it
- should boot far enough to get serial console messages and KGDB
- connected on any platform. There are lots of other subtle memory
- management design features that you simply don't need to worry
- about. If you are changing functions related to MMU initialization,
- you are likely breaking things that are known to work and are
- heading down a path of disaster and frustration. Your changes
- should be to make the flexibility of the processor fit Linux,
- not force arbitrary and non-workable memory mappings into Linux.
-
- - You don't want to change KERNELLOAD or KERNELBASE, otherwise the
- virtual memory and MMU code will get confused.
-
- arch/ppc/Makefile:KERNELLOAD = 0xc0000000
-
- include/asm-ppc/page.h:#define PAGE_OFFSET 0xc0000000
- include/asm-ppc/page.h:#define KERNELBASE PAGE_OFFSET
-
- - RAM is at physical address 0x00000000, and gets mapped to
- virtual address 0xC0000000 for the kernel.
-
-
- Physical addresses used by the Linux kernel:
- --------------------------------------------
-
- 0x00000000-0x3FFFFFFF 1GB reserved for RAM
- 0xF0000000-0xF001FFFF 128K IMMR 64K used for dual port memory,
- 64K for 8260 registers
-
-
- Logical addresses used by the Linux kernel:
- -------------------------------------------
-
- 0xF0000000-0xFFFFFFFF 256M BAT0 (IMMR: dual port RAM, registers)
- 0xE0000000-0xEFFFFFFF 256M BAT1 (I/O space for custom boards)
- 0xC0000000-0xCFFFFFFF 256M BAT2 (RAM)
- 0xD0000000-0xDFFFFFFF 256M BAT3 (if RAM > 256MByte)
-
-
- EST SBC8260 Linux mapping:
- --------------------------
-
- DBAT0, IBAT0, cache inhibited:
-
- Chip
- Memory Sel Use
- --------------------- --- ---------------------------------
- 0xF0000000-0xF001FFFF n/a IMMR: dual port RAM, registers
-
- DBAT1, IBAT1, cache inhibited:
-
diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt
deleted file mode 100644
index 1284498e847..00000000000
--- a/Documentation/powerpc/booting-without-of.txt
+++ /dev/null
@@ -1,1420 +0,0 @@
- Booting the Linux/ppc kernel without Open Firmware
- --------------------------------------------------
-
-
-(c) 2005 Benjamin Herrenschmidt <benh at kernel.crashing.org>,
- IBM Corp.
-(c) 2005 Becky Bruce <becky.bruce at freescale.com>,
- Freescale Semiconductor, FSL SOC and 32-bit additions
-
- May 18, 2005: Rev 0.1 - Initial draft, no chapter III yet.
-
- May 19, 2005: Rev 0.2 - Add chapter III and bits & pieces here or
- clarifies the fact that a lot of things are
- optional, the kernel only requires a very
- small device tree, though it is encouraged
- to provide an as complete one as possible.
-
- May 24, 2005: Rev 0.3 - Precise that DT block has to be in RAM
- - Misc fixes
- - Define version 3 and new format version 16
- for the DT block (version 16 needs kernel
- patches, will be fwd separately).
- String block now has a size, and full path
- is replaced by unit name for more
- compactness.
- linux,phandle is made optional, only nodes
- that are referenced by other nodes need it.
- "name" property is now automatically
- deduced from the unit name
-
- June 1, 2005: Rev 0.4 - Correct confusion between OF_DT_END and
- OF_DT_END_NODE in structure definition.
- - Change version 16 format to always align
- property data to 4 bytes. Since tokens are
- already aligned, that means no specific
- required alignement between property size
- and property data. The old style variable
- alignment would make it impossible to do
- "simple" insertion of properties using
- memove (thanks Milton for
- noticing). Updated kernel patch as well
- - Correct a few more alignement constraints
- - Add a chapter about the device-tree
- compiler and the textural representation of
- the tree that can be "compiled" by dtc.
-
-
- November 21, 2005: Rev 0.5
- - Additions/generalizations for 32-bit
- - Changed to reflect the new arch/powerpc
- structure
- - Added chapter VI
-
-
- ToDo:
- - Add some definitions of interrupt tree (simple/complex)
- - Add some definitions for pci host bridges
- - Add some common address format examples
- - Add definitions for standard properties and "compatible"
- names for cells that are not already defined by the existing
- OF spec.
- - Compare FSL SOC use of PCI to standard and make sure no new
- node definition required.
- - Add more information about node definitions for SOC devices
- that currently have no standard, like the FSL CPM.
-
-
-I - Introduction
-================
-
-During the recent development of the Linux/ppc64 kernel, and more
-specifically, the addition of new platform types outside of the old
-IBM pSeries/iSeries pair, it was decided to enforce some strict rules
-regarding the kernel entry and bootloader <-> kernel interfaces, in
-order to avoid the degeneration that had become the ppc32 kernel entry
-point and the way a new platform should be added to the kernel. The
-legacy iSeries platform breaks those rules as it predates this scheme,
-but no new board support will be accepted in the main tree that
-doesn't follows them properly. In addition, since the advent of the
-arch/powerpc merged architecture for ppc32 and ppc64, new 32-bit
-platforms and 32-bit platforms which move into arch/powerpc will be
-required to use these rules as well.
-
-The main requirement that will be defined in more detail below is
-the presence of a device-tree whose format is defined after Open
-Firmware specification. However, in order to make life easier
-to embedded board vendors, the kernel doesn't require the device-tree
-to represent every device in the system and only requires some nodes
-and properties to be present. This will be described in detail in
-section III, but, for example, the kernel does not require you to
-create a node for every PCI device in the system. It is a requirement
-to have a node for PCI host bridges in order to provide interrupt
-routing informations and memory/IO ranges, among others. It is also
-recommended to define nodes for on chip devices and other busses that
-don't specifically fit in an existing OF specification. This creates a
-great flexibility in the way the kernel can then probe those and match
-drivers to device, without having to hard code all sorts of tables. It
-also makes it more flexible for board vendors to do minor hardware
-upgrades without significantly impacting the kernel code or cluttering
-it with special cases.
-
-
-1) Entry point for arch/powerpc
--------------------------------
-
- There is one and one single entry point to the kernel, at the start
- of the kernel image. That entry point supports two calling
- conventions:
-
- a) Boot from Open Firmware. If your firmware is compatible
- with Open Firmware (IEEE 1275) or provides an OF compatible
- client interface API (support for "interpret" callback of
- forth words isn't required), you can enter the kernel with:
-
- r5 : OF callback pointer as defined by IEEE 1275
- bindings to powerpc. Only the 32 bit client interface
- is currently supported
-
- r3, r4 : address & length of an initrd if any or 0
-
- The MMU is either on or off; the kernel will run the
- trampoline located in arch/powerpc/kernel/prom_init.c to
- extract the device-tree and other information from open
- firmware and build a flattened device-tree as described
- in b). prom_init() will then re-enter the kernel using
- the second method. This trampoline code runs in the
- context of the firmware, which is supposed to handle all
- exceptions during that time.
-
- b) Direct entry with a flattened device-tree block. This entry
- point is called by a) after the OF trampoline and can also be
- called directly by a bootloader that does not support the Open
- Firmware client interface. It is also used by "kexec" to
- implement "hot" booting of a new kernel from a previous
- running one. This method is what I will describe in more
- details in this document, as method a) is simply standard Open
- Firmware, and thus should be implemented according to the
- various standard documents defining it and its binding to the
- PowerPC platform. The entry point definition then becomes:
-
- r3 : physical pointer to the device-tree block
- (defined in chapter II) in RAM
-
- r4 : physical pointer to the kernel itself. This is
- used by the assembly code to properly disable the MMU
- in case you are entering the kernel with MMU enabled
- and a non-1:1 mapping.
-
- r5 : NULL (as to differenciate with method a)
-
- Note about SMP entry: Either your firmware puts your other
- CPUs in some sleep loop or spin loop in ROM where you can get
- them out via a soft reset or some other means, in which case
- you don't need to care, or you'll have to enter the kernel
- with all CPUs. The way to do that with method b) will be
- described in a later revision of this document.
-
-
-2) Board support
-----------------
-
-64-bit kernels:
-
- Board supports (platforms) are not exclusive config options. An
- arbitrary set of board supports can be built in a single kernel
- image. The kernel will "know" what set of functions to use for a
- given platform based on the content of the device-tree. Thus, you
- should:
-
- a) add your platform support as a _boolean_ option in
- arch/powerpc/Kconfig, following the example of PPC_PSERIES,
- PPC_PMAC and PPC_MAPLE. The later is probably a good
- example of a board support to start from.
-
- b) create your main platform file as
- "arch/powerpc/platforms/myplatform/myboard_setup.c" and add it
- to the Makefile under the condition of your CONFIG_
- option. This file will define a structure of type "ppc_md"
- containing the various callbacks that the generic code will
- use to get to your platform specific code
-
- c) Add a reference to your "ppc_md" structure in the
- "machines" table in arch/powerpc/kernel/setup_64.c if you are
- a 64-bit platform.
-
- d) request and get assigned a platform number (see PLATFORM_*
- constants in include/asm-powerpc/processor.h
-
-32-bit embedded kernels:
-
- Currently, board support is essentially an exclusive config option.
- The kernel is configured for a single platform. Part of the reason
- for this is to keep kernels on embedded systems small and efficient;
- part of this is due to the fact the code is already that way. In the
- future, a kernel may support multiple platforms, but only if the
- platforms feature the same core architectire. A single kernel build
- cannot support both configurations with Book E and configurations
- with classic Powerpc architectures.
-
- 32-bit embedded platforms that are moved into arch/powerpc using a
- flattened device tree should adopt the merged tree practice of
- setting ppc_md up dynamically, even though the kernel is currently
- built with support for only a single platform at a time. This allows
- unification of the setup code, and will make it easier to go to a
- multiple-platform-support model in the future.
-
-NOTE: I believe the above will be true once Ben's done with the merge
-of the boot sequences.... someone speak up if this is wrong!
-
- To add a 32-bit embedded platform support, follow the instructions
- for 64-bit platforms above, with the exception that the Kconfig
- option should be set up such that the kernel builds exclusively for
- the platform selected. The processor type for the platform should
- enable another config option to select the specific board
- supported.
-
-NOTE: If ben doesn't merge the setup files, may need to change this to
-point to setup_32.c
-
-
- I will describe later the boot process and various callbacks that
- your platform should implement.
-
-
-II - The DT block format
-========================
-
-
-This chapter defines the actual format of the flattened device-tree
-passed to the kernel. The actual content of it and kernel requirements
-are described later. You can find example of code manipulating that
-format in various places, including arch/powerpc/kernel/prom_init.c
-which will generate a flattened device-tree from the Open Firmware
-representation, or the fs2dt utility which is part of the kexec tools
-which will generate one from a filesystem representation. It is
-expected that a bootloader like uboot provides a bit more support,
-that will be discussed later as well.
-
-Note: The block has to be in main memory. It has to be accessible in
-both real mode and virtual mode with no mapping other than main
-memory. If you are writing a simple flash bootloader, it should copy
-the block to RAM before passing it to the kernel.
-
-
-1) Header
----------
-
- The kernel is entered with r3 pointing to an area of memory that is
- roughtly described in include/asm-powerpc/prom.h by the structure
- boot_param_header:
-
-struct boot_param_header {
- u32 magic; /* magic word OF_DT_HEADER */
- u32 totalsize; /* total size of DT block */
- u32 off_dt_struct; /* offset to structure */
- u32 off_dt_strings; /* offset to strings */
- u32 off_mem_rsvmap; /* offset to memory reserve map
-*/
- u32 version; /* format version */
- u32 last_comp_version; /* last compatible version */
-
- /* version 2 fields below */
- u32 boot_cpuid_phys; /* Which physical CPU id we're
- booting on */
- /* version 3 fields below */
- u32 size_dt_strings; /* size of the strings block */
-};
-
- Along with the constants:
-
-/* Definitions used by the flattened device tree */
-#define OF_DT_HEADER 0xd00dfeed /* 4: version,
- 4: total size */
-#define OF_DT_BEGIN_NODE 0x1 /* Start node: full name
-*/
-#define OF_DT_END_NODE 0x2 /* End node */
-#define OF_DT_PROP 0x3 /* Property: name off,
- size, content */
-#define OF_DT_END 0x9
-
- All values in this header are in big endian format, the various
- fields in this header are defined more precisely below. All
- "offset" values are in bytes from the start of the header; that is
- from the value of r3.
-
- - magic
-
- This is a magic value that "marks" the beginning of the
- device-tree block header. It contains the value 0xd00dfeed and is
- defined by the constant OF_DT_HEADER
-
- - totalsize
-
- This is the total size of the DT block including the header. The
- "DT" block should enclose all data structures defined in this
- chapter (who are pointed to by offsets in this header). That is,
- the device-tree structure, strings, and the memory reserve map.
-
- - off_dt_struct
-
- This is an offset from the beginning of the header to the start
- of the "structure" part the device tree. (see 2) device tree)
-
- - off_dt_strings
-
- This is an offset from the beginning of the header to the start
- of the "strings" part of the device-tree
-
- - off_mem_rsvmap
-
- This is an offset from the beginning of the header to the start
- of the reserved memory map. This map is a list of pairs of 64
- bit integers. Each pair is a physical address and a size. The
-
- list is terminated by an entry of size 0. This map provides the
- kernel with a list of physical memory areas that are "reserved"
- and thus not to be used for memory allocations, especially during
- early initialization. The kernel needs to allocate memory during
- boot for things like un-flattening the device-tree, allocating an
- MMU hash table, etc... Those allocations must be done in such a
- way to avoid overriding critical things like, on Open Firmware
- capable machines, the RTAS instance, or on some pSeries, the TCE
- tables used for the iommu. Typically, the reserve map should
- contain _at least_ this DT block itself (header,total_size). If
- you are passing an initrd to the kernel, you should reserve it as
- well. You do not need to reserve the kernel image itself. The map
- should be 64 bit aligned.
-
- - version
-
- This is the version of this structure. Version 1 stops
- here. Version 2 adds an additional field boot_cpuid_phys.
- Version 3 adds the size of the strings block, allowing the kernel
- to reallocate it easily at boot and free up the unused flattened
- structure after expansion. Version 16 introduces a new more
- "compact" format for the tree itself that is however not backward
- compatible. You should always generate a structure of the highest
- version defined at the time of your implementation. Currently
- that is version 16, unless you explicitely aim at being backward
- compatible.
-
- - last_comp_version
-
- Last compatible version. This indicates down to what version of
- the DT block you are backward compatible. For example, version 2
- is backward compatible with version 1 (that is, a kernel build
- for version 1 will be able to boot with a version 2 format). You
- should put a 1 in this field if you generate a device tree of
- version 1 to 3, or 0x10 if you generate a tree of version 0x10
- using the new unit name format.
-
- - boot_cpuid_phys
-
- This field only exist on version 2 headers. It indicate which
- physical CPU ID is calling the kernel entry point. This is used,
- among others, by kexec. If you are on an SMP system, this value
- should match the content of the "reg" property of the CPU node in
- the device-tree corresponding to the CPU calling the kernel entry
- point (see further chapters for more informations on the required
- device-tree contents)
-
-
- So the typical layout of a DT block (though the various parts don't
- need to be in that order) looks like this (addresses go from top to
- bottom):
-
-
- ------------------------------
- r3 -> | struct boot_param_header |
- ------------------------------
- | (alignment gap) (*) |
- ------------------------------
- | memory reserve map |
- ------------------------------
- | (alignment gap) |
- ------------------------------
- | |
- | device-tree structure |
- | |
- ------------------------------
- | (alignment gap) |
- ------------------------------
- | |
- | device-tree strings |
- | |
- -----> ------------------------------
- |
- |
- --- (r3 + totalsize)
-
- (*) The alignment gaps are not necessarily present; their presence
- and size are dependent on the various alignment requirements of
- the individual data blocks.
-
-
-2) Device tree generalities
----------------------------
-
-This device-tree itself is separated in two different blocks, a
-structure block and a strings block. Both need to be aligned to a 4
-byte boundary.
-
-First, let's quickly describe the device-tree concept before detailing
-the storage format. This chapter does _not_ describe the detail of the
-required types of nodes & properties for the kernel, this is done
-later in chapter III.
-
-The device-tree layout is strongly inherited from the definition of
-the Open Firmware IEEE 1275 device-tree. It's basically a tree of
-nodes, each node having two or more named properties. A property can
-have a value or not.
-
-It is a tree, so each node has one and only one parent except for the
-root node who has no parent.
-
-A node has 2 names. The actual node name is generally contained in a
-property of type "name" in the node property list whose value is a
-zero terminated string and is mandatory for version 1 to 3 of the
-format definition (as it is in Open Firmware). Version 0x10 makes it
-optional as it can generate it from the unit name defined below.
-
-There is also a "unit name" that is used to differenciate nodes with
-the same name at the same level, it is usually made of the node
-name's, the "@" sign, and a "unit address", which definition is
-specific to the bus type the node sits on.
-
-The unit name doesn't exist as a property per-se but is included in
-the device-tree structure. It is typically used to represent "path" in
-the device-tree. More details about the actual format of these will be
-below.
-
-The kernel powerpc generic code does not make any formal use of the
-unit address (though some board support code may do) so the only real
-requirement here for the unit address is to ensure uniqueness of
-the node unit name at a given level of the tree. Nodes with no notion
-of address and no possible sibling of the same name (like /memory or
-/cpus) may omit the unit address in the context of this specification,
-or use the "@0" default unit address. The unit name is used to define
-a node "full path", which is the concatenation of all parent node
-unit names separated with "/".
-
-The root node doesn't have a defined name, and isn't required to have
-a name property either if you are using version 3 or earlier of the
-format. It also has no unit address (no @ symbol followed by a unit
-address). The root node unit name is thus an empty string. The full
-path to the root node is "/".
-
-Every node which actually represents an actual device (that is, a node
-which isn't only a virtual "container" for more nodes, like "/cpus"
-is) is also required to have a "device_type" property indicating the
-type of node .
-
-Finally, every node that can be referenced from a property in another
-node is required to have a "linux,phandle" property. Real open
-firmware implementations provide a unique "phandle" value for every
-node that the "prom_init()" trampoline code turns into
-"linux,phandle" properties. However, this is made optional if the
-flattened device tree is used directly. An example of a node
-referencing another node via "phandle" is when laying out the
-interrupt tree which will be described in a further version of this
-document.
-
-This "linux, phandle" property is a 32 bit value that uniquely
-identifies a node. You are free to use whatever values or system of
-values, internal pointers, or whatever to generate these, the only
-requirement is that every node for which you provide that property has
-a unique value for it.
-
-Here is an example of a simple device-tree. In this example, an "o"
-designates a node followed by the node unit name. Properties are
-presented with their name followed by their content. "content"
-represents an ASCII string (zero terminated) value, while <content>
-represents a 32 bit hexadecimal value. The various nodes in this
-example will be discussed in a later chapter. At this point, it is
-only meant to give you a idea of what a device-tree looks like. I have
-purposefully kept the "name" and "linux,phandle" properties which
-aren't necessary in order to give you a better idea of what the tree
-looks like in practice.
-
- / o device-tree
- |- name = "device-tree"
- |- model = "MyBoardName"
- |- compatible = "MyBoardFamilyName"
- |- #address-cells = <2>
- |- #size-cells = <2>
- |- linux,phandle = <0>
- |
- o cpus
- | | - name = "cpus"
- | | - linux,phandle = <1>
- | | - #address-cells = <1>
- | | - #size-cells = <0>
- | |
- | o PowerPC,970@0
- | |- name = "PowerPC,970"
- | |- device_type = "cpu"
- | |- reg = <0>
- | |- clock-frequency = <5f5e1000>
- | |- linux,boot-cpu
- | |- linux,phandle = <2>
- |
- o memory@0
- | |- name = "memory"
- | |- device_type = "memory"
- | |- reg = <00000000 00000000 00000000 20000000>
- | |- linux,phandle = <3>
- |
- o chosen
- |- name = "chosen"
- |- bootargs = "root=/dev/sda2"
- |- linux,platform = <00000600>
- |- linux,phandle = <4>
-
-This tree is almost a minimal tree. It pretty much contains the
-minimal set of required nodes and properties to boot a linux kernel;
-that is, some basic model informations at the root, the CPUs, and the
-physical memory layout. It also includes misc information passed
-through /chosen, like in this example, the platform type (mandatory)
-and the kernel command line arguments (optional).
-
-The /cpus/PowerPC,970@0/linux,boot-cpu property is an example of a
-property without a value. All other properties have a value. The
-significance of the #address-cells and #size-cells properties will be
-explained in chapter IV which defines precisely the required nodes and
-properties and their content.
-
-
-3) Device tree "structure" block
-
-The structure of the device tree is a linearized tree structure. The
-"OF_DT_BEGIN_NODE" token starts a new node, and the "OF_DT_END_NODE"
-ends that node definition. Child nodes are simply defined before
-"OF_DT_END_NODE" (that is nodes within the node). A 'token' is a 32
-bit value. The tree has to be "finished" with a OF_DT_END token
-
-Here's the basic structure of a single node:
-
- * token OF_DT_BEGIN_NODE (that is 0x00000001)
- * for version 1 to 3, this is the node full path as a zero
- terminated string, starting with "/". For version 16 and later,
- this is the node unit name only (or an empty string for the
- root node)
- * [align gap to next 4 bytes boundary]
- * for each property:
- * token OF_DT_PROP (that is 0x00000003)
- * 32 bit value of property value size in bytes (or 0 of no
- * value)
- * 32 bit value of offset in string block of property name
- * property value data if any
- * [align gap to next 4 bytes boundary]
- * [child nodes if any]
- * token OF_DT_END_NODE (that is 0x00000002)
-
-So the node content can be summmarised as a start token, a full path,
-a list of properties, a list of child node and an end token. Every
-child node is a full node structure itself as defined above.
-
-4) Device tree 'strings" block
-
-In order to save space, property names, which are generally redundant,
-are stored separately in the "strings" block. This block is simply the
-whole bunch of zero terminated strings for all property names
-concatenated together. The device-tree property definitions in the
-structure block will contain offset values from the beginning of the
-strings block.
-
-
-III - Required content of the device tree
-=========================================
-
-WARNING: All "linux,*" properties defined in this document apply only
-to a flattened device-tree. If your platform uses a real
-implementation of Open Firmware or an implementation compatible with
-the Open Firmware client interface, those properties will be created
-by the trampoline code in the kernel's prom_init() file. For example,
-that's where you'll have to add code to detect your board model and
-set the platform number. However, when using the flatenned device-tree
-entry point, there is no prom_init() pass, and thus you have to
-provide those properties yourself.
-
-
-1) Note about cells and address representation
-----------------------------------------------
-
-The general rule is documented in the various Open Firmware
-documentations. If you chose to describe a bus with the device-tree
-and there exist an OF bus binding, then you should follow the
-specification. However, the kernel does not require every single
-device or bus to be described by the device tree.
-
-In general, the format of an address for a device is defined by the
-parent bus type, based on the #address-cells and #size-cells
-property. In the absence of such a property, the parent's parent
-values are used, etc... The kernel requires the root node to have
-those properties defining addresses format for devices directly mapped
-on the processor bus.
-
-Those 2 properties define 'cells' for representing an address and a
-size. A "cell" is a 32 bit number. For example, if both contain 2
-like the example tree given above, then an address and a size are both
-composed of 2 cells, and each is a 64 bit number (cells are
-concatenated and expected to be in big endian format). Another example
-is the way Apple firmware defines them, with 2 cells for an address
-and one cell for a size. Most 32-bit implementations should define
-#address-cells and #size-cells to 1, which represents a 32-bit value.
-Some 32-bit processors allow for physical addresses greater than 32
-bits; these processors should define #address-cells as 2.
-
-"reg" properties are always a tuple of the type "address size" where
-the number of cells of address and size is specified by the bus
-#address-cells and #size-cells. When a bus supports various address
-spaces and other flags relative to a given address allocation (like
-prefetchable, etc...) those flags are usually added to the top level
-bits of the physical address. For example, a PCI physical address is
-made of 3 cells, the bottom two containing the actual address itself
-while the top cell contains address space indication, flags, and pci
-bus & device numbers.
-
-For busses that support dynamic allocation, it's the accepted practice
-to then not provide the address in "reg" (keep it 0) though while
-providing a flag indicating the address is dynamically allocated, and
-then, to provide a separate "assigned-addresses" property that
-contains the fully allocated addresses. See the PCI OF bindings for
-details.
-
-In general, a simple bus with no address space bits and no dynamic
-allocation is preferred if it reflects your hardware, as the existing
-kernel address parsing functions will work out of the box. If you
-define a bus type with a more complex address format, including things
-like address space bits, you'll have to add a bus translator to the
-prom_parse.c file of the recent kernels for your bus type.
-
-The "reg" property only defines addresses and sizes (if #size-cells
-is
-non-0) within a given bus. In order to translate addresses upward
-(that is into parent bus addresses, and possibly into cpu physical
-addresses), all busses must contain a "ranges" property. If the
-"ranges" property is missing at a given level, it's assumed that
-translation isn't possible. The format of the "ranges" proprety for a
-bus is a list of:
-
- bus address, parent bus address, size
-
-"bus address" is in the format of the bus this bus node is defining,
-that is, for a PCI bridge, it would be a PCI address. Thus, (bus
-address, size) defines a range of addresses for child devices. "parent
-bus address" is in the format of the parent bus of this bus. For
-example, for a PCI host controller, that would be a CPU address. For a
-PCI<->ISA bridge, that would be a PCI address. It defines the base
-address in the parent bus where the beginning of that range is mapped.
-
-For a new 64 bit powerpc board, I recommend either the 2/2 format or
-Apple's 2/1 format which is slightly more compact since sizes usually
-fit in a single 32 bit word. New 32 bit powerpc boards should use a
-1/1 format, unless the processor supports physical addresses greater
-than 32-bits, in which case a 2/1 format is recommended.
-
-
-2) Note about "compatible" properties
--------------------------------------
-
-These properties are optional, but recommended in devices and the root
-node. The format of a "compatible" property is a list of concatenated
-zero terminated strings. They allow a device to express its
-compatibility with a family of similar devices, in some cases,
-allowing a single driver to match against several devices regardless
-of their actual names.
-
-3) Note about "name" properties
--------------------------------
-
-While earlier users of Open Firmware like OldWorld macintoshes tended
-to use the actual device name for the "name" property, it's nowadays
-considered a good practice to use a name that is closer to the device
-class (often equal to device_type). For example, nowadays, ethernet
-controllers are named "ethernet", an additional "model" property
-defining precisely the chip type/model, and "compatible" property
-defining the family in case a single driver can driver more than one
-of these chips. However, the kernel doesn't generally put any
-restriction on the "name" property; it is simply considered good
-practice to follow the standard and its evolutions as closely as
-possible.
-
-Note also that the new format version 16 makes the "name" property
-optional. If it's absent for a node, then the node's unit name is then
-used to reconstruct the name. That is, the part of the unit name
-before the "@" sign is used (or the entire unit name if no "@" sign
-is present).
-
-4) Note about node and property names and character set
--------------------------------------------------------
-
-While open firmware provides more flexibe usage of 8859-1, this
-specification enforces more strict rules. Nodes and properties should
-be comprised only of ASCII characters 'a' to 'z', '0' to
-'9', ',', '.', '_', '+', '#', '?', and '-'. Node names additionally
-allow uppercase characters 'A' to 'Z' (property names should be
-lowercase. The fact that vendors like Apple don't respect this rule is
-irrelevant here). Additionally, node and property names should always
-begin with a character in the range 'a' to 'z' (or 'A' to 'Z' for node
-names).
-
-The maximum number of characters for both nodes and property names
-is 31. In the case of node names, this is only the leftmost part of
-a unit name (the pure "name" property), it doesn't include the unit
-address which can extend beyond that limit.
-
-
-5) Required nodes and properties
---------------------------------
- These are all that are currently required. However, it is strongly
- recommended that you expose PCI host bridges as documented in the
- PCI binding to open firmware, and your interrupt tree as documented
- in OF interrupt tree specification.
-
- a) The root node
-
- The root node requires some properties to be present:
-
- - model : this is your board name/model
- - #address-cells : address representation for "root" devices
- - #size-cells: the size representation for "root" devices
-
- Additionally, some recommended properties are:
-
- - compatible : the board "family" generally finds its way here,
- for example, if you have 2 board models with a similar layout,
- that typically get driven by the same platform code in the
- kernel, you would use a different "model" property but put a
- value in "compatible". The kernel doesn't directly use that
- value (see /chosen/linux,platform for how the kernel choses a
- platform type) but it is generally useful.
-
- The root node is also generally where you add additional properties
- specific to your board like the serial number if any, that sort of
- thing. it is recommended that if you add any "custom" property whose
- name may clash with standard defined ones, you prefix them with your
- vendor name and a comma.
-
- b) The /cpus node
-
- This node is the parent of all individual CPU nodes. It doesn't
- have any specific requirements, though it's generally good practice
- to have at least:
-
- #address-cells = <00000001>
- #size-cells = <00000000>
-
- This defines that the "address" for a CPU is a single cell, and has
- no meaningful size. This is not necessary but the kernel will assume
- that format when reading the "reg" properties of a CPU node, see
- below
-
- c) The /cpus/* nodes
-
- So under /cpus, you are supposed to create a node for every CPU on
- the machine. There is no specific restriction on the name of the
- CPU, though It's common practice to call it PowerPC,<name>. For
- example, Apple uses PowerPC,G5 while IBM uses PowerPC,970FX.
-
- Required properties:
-
- - device_type : has to be "cpu"
- - reg : This is the physical cpu number, it's a single 32 bit cell
- and is also used as-is as the unit number for constructing the
- unit name in the full path. For example, with 2 CPUs, you would
- have the full path:
- /cpus/PowerPC,970FX@0
- /cpus/PowerPC,970FX@1
- (unit addresses do not require leading zeroes)
- - d-cache-line-size : one cell, L1 data cache line size in bytes
- - i-cache-line-size : one cell, L1 instruction cache line size in
- bytes
- - d-cache-size : one cell, size of L1 data cache in bytes
- - i-cache-size : one cell, size of L1 instruction cache in bytes
- - linux, boot-cpu : Should be defined if this cpu is the boot cpu.
-
- Recommended properties:
-
- - timebase-frequency : a cell indicating the frequency of the
- timebase in Hz. This is not directly used by the generic code,
- but you are welcome to copy/paste the pSeries code for setting
- the kernel timebase/decrementer calibration based on this
- value.
- - clock-frequency : a cell indicating the CPU core clock frequency
- in Hz. A new property will be defined for 64 bit values, but if
- your frequency is < 4Ghz, one cell is enough. Here as well as
- for the above, the common code doesn't use that property, but
- you are welcome to re-use the pSeries or Maple one. A future
- kernel version might provide a common function for this.
-
- You are welcome to add any property you find relevant to your board,
- like some information about the mechanism used to soft-reset the
- CPUs. For example, Apple puts the GPIO number for CPU soft reset
- lines in there as a "soft-reset" property since they start secondary
- CPUs by soft-resetting them.
-
-
- d) the /memory node(s)
-
- To define the physical memory layout of your board, you should
- create one or more memory node(s). You can either create a single
- node with all memory ranges in its reg property, or you can create
- several nodes, as you wish. The unit address (@ part) used for the
- full path is the address of the first range of memory defined by a
- given node. If you use a single memory node, this will typically be
- @0.
-
- Required properties:
-
- - device_type : has to be "memory"
- - reg : This property contains all the physical memory ranges of
- your board. It's a list of addresses/sizes concatenated
- together, with the number of cells of each defined by the
- #address-cells and #size-cells of the root node. For example,
- with both of these properties beeing 2 like in the example given
- earlier, a 970 based machine with 6Gb of RAM could typically
- have a "reg" property here that looks like:
-
- 00000000 00000000 00000000 80000000
- 00000001 00000000 00000001 00000000
-
- That is a range starting at 0 of 0x80000000 bytes and a range
- starting at 0x100000000 and of 0x100000000 bytes. You can see
- that there is no memory covering the IO hole between 2Gb and
- 4Gb. Some vendors prefer splitting those ranges into smaller
- segments, but the kernel doesn't care.
-
- e) The /chosen node
-
- This node is a bit "special". Normally, that's where open firmware
- puts some variable environment information, like the arguments, or
- phandle pointers to nodes like the main interrupt controller, or the
- default input/output devices.
-
- This specification makes a few of these mandatory, but also defines
- some linux-specific properties that would be normally constructed by
- the prom_init() trampoline when booting with an OF client interface,
- but that you have to provide yourself when using the flattened format.
-
- Required properties:
-
- - linux,platform : This is your platform number as assigned by the
- architecture maintainers
-
- Recommended properties:
-
- - bootargs : This zero-terminated string is passed as the kernel
- command line
- - linux,stdout-path : This is the full path to your standard
- console device if any. Typically, if you have serial devices on
- your board, you may want to put the full path to the one set as
- the default console in the firmware here, for the kernel to pick
- it up as it's own default console. If you look at the funciton
- set_preferred_console() in arch/ppc64/kernel/setup.c, you'll see
- that the kernel tries to find out the default console and has
- knowledge of various types like 8250 serial ports. You may want
- to extend this function to add your own.
- - interrupt-controller : This is one cell containing a phandle
- value that matches the "linux,phandle" property of your main
- interrupt controller node. May be used for interrupt routing.
-
-
- Note that u-boot creates and fills in the chosen node for platforms
- that use it.
-
- f) the /soc<SOCname> node
-
- This node is used to represent a system-on-a-chip (SOC) and must be
- present if the processor is a SOC. The top-level soc node contains
- information that is global to all devices on the SOC. The node name
- should contain a unit address for the SOC, which is the base address
- of the memory-mapped register set for the SOC. The name of an soc
- node should start with "soc", and the remainder of the name should
- represent the part number for the soc. For example, the MPC8540's
- soc node would be called "soc8540".
-
- Required properties:
-
- - device_type : Should be "soc"
- - ranges : Should be defined as specified in 1) to describe the
- translation of SOC addresses for memory mapped SOC registers.
-
- Recommended properties:
-
- - reg : This property defines the address and size of the
- memory-mapped registers that are used for the SOC node itself.
- It does not include the child device registers - these will be
- defined inside each child node. The address specified in the
- "reg" property should match the unit address of the SOC node.
- - #address-cells : Address representation for "soc" devices. The
- format of this field may vary depending on whether or not the
- device registers are memory mapped. For memory mapped
- registers, this field represents the number of cells needed to
- represent the address of the registers. For SOCs that do not
- use MMIO, a special address format should be defined that
- contains enough cells to represent the required information.
- See 1) above for more details on defining #address-cells.
- - #size-cells : Size representation for "soc" devices
- - #interrupt-cells : Defines the width of cells used to represent
- interrupts. Typically this value is <2>, which includes a
- 32-bit number that represents the interrupt number, and a
- 32-bit number that represents the interrupt sense and level.
- This field is only needed if the SOC contains an interrupt
- controller.
-
- The SOC node may contain child nodes for each SOC device that the
- platform uses. Nodes should not be created for devices which exist
- on the SOC but are not used by a particular platform. See chapter VI
- for more information on how to specify devices that are part of an
-SOC.
-
- Example SOC node for the MPC8540:
-
- soc8540@e0000000 {
- #address-cells = <1>;
- #size-cells = <1>;
- #interrupt-cells = <2>;
- device_type = "soc";
- ranges = <00000000 e0000000 00100000>
- reg = <e0000000 00003000>;
- }
-
-
-
-IV - "dtc", the device tree compiler
-====================================
-
-
-dtc source code can be found at
-<http://ozlabs.org/~dgibson/dtc/dtc.tar.gz>
-
-WARNING: This version is still in early development stage; the
-resulting device-tree "blobs" have not yet been validated with the
-kernel. The current generated bloc lacks a useful reserve map (it will
-be fixed to generate an empty one, it's up to the bootloader to fill
-it up) among others. The error handling needs work, bugs are lurking,
-etc...
-
-dtc basically takes a device-tree in a given format and outputs a
-device-tree in another format. The currently supported formats are:
-
- Input formats:
- -------------
-
- - "dtb": "blob" format, that is a flattened device-tree block
- with
- header all in a binary blob.
- - "dts": "source" format. This is a text file containing a
- "source" for a device-tree. The format is defined later in this
- chapter.
- - "fs" format. This is a representation equivalent to the
- output of /proc/device-tree, that is nodes are directories and
- properties are files
-
- Output formats:
- ---------------
-
- - "dtb": "blob" format
- - "dts": "source" format
- - "asm": assembly language file. This is a file that can be
- sourced by gas to generate a device-tree "blob". That file can
- then simply be added to your Makefile. Additionally, the
- assembly file exports some symbols that can be use
-
-
-The syntax of the dtc tool is
-
- dtc [-I <input-format>] [-O <output-format>]
- [-o output-filename] [-V output_version] input_filename
-
-
-The "output_version" defines what versio of the "blob" format will be
-generated. Supported versions are 1,2,3 and 16. The default is
-currently version 3 but that may change in the future to version 16.
-
-Additionally, dtc performs various sanity checks on the tree, like the
-uniqueness of linux,phandle properties, validity of strings, etc...
-
-The format of the .dts "source" file is "C" like, supports C and C++
-style commments.
-
-/ {
-}
-
-The above is the "device-tree" definition. It's the only statement
-supported currently at the toplevel.
-
-/ {
- property1 = "string_value"; /* define a property containing a 0
- * terminated string
- */
-
- property2 = <1234abcd>; /* define a property containing a
- * numerical 32 bits value (hexadecimal)
- */
-
- property3 = <12345678 12345678 deadbeef>;
- /* define a property containing 3
- * numerical 32 bits values (cells) in
- * hexadecimal
- */
- property4 = [0a 0b 0c 0d de ea ad be ef];
- /* define a property whose content is
- * an arbitrary array of bytes
- */
-
- childnode@addresss { /* define a child node named "childnode"
- * whose unit name is "childnode at
- * address"
- */
-
- childprop = "hello\n"; /* define a property "childprop" of
- * childnode (in this case, a string)
- */
- };
-};
-
-Nodes can contain other nodes etc... thus defining the hierarchical
-structure of the tree.
-
-Strings support common escape sequences from C: "\n", "\t", "\r",
-"\(octal value)", "\x(hex value)".
-
-It is also suggested that you pipe your source file through cpp (gcc
-preprocessor) so you can use #include's, #define for constants, etc...
-
-Finally, various options are planned but not yet implemented, like
-automatic generation of phandles, labels (exported to the asm file so
-you can point to a property content and change it easily from whatever
-you link the device-tree with), label or path instead of numeric value
-in some cells to "point" to a node (replaced by a phandle at compile
-time), export of reserve map address to the asm file, ability to
-specify reserve map content at compile time, etc...
-
-We may provide a .h include file with common definitions of that
-proves useful for some properties (like building PCI properties or
-interrupt maps) though it may be better to add a notion of struct
-definitions to the compiler...
-
-
-V - Recommendations for a bootloader
-====================================
-
-
-Here are some various ideas/recommendations that have been proposed
-while all this has been defined and implemented.
-
- - The bootloader may want to be able to use the device-tree itself
- and may want to manipulate it (to add/edit some properties,
- like physical memory size or kernel arguments). At this point, 2
- choices can be made. Either the bootloader works directly on the
- flattened format, or the bootloader has its own internal tree
- representation with pointers (similar to the kernel one) and
- re-flattens the tree when booting the kernel. The former is a bit
- more difficult to edit/modify, the later requires probably a bit
- more code to handle the tree structure. Note that the structure
- format has been designed so it's relatively easy to "insert"
- properties or nodes or delete them by just memmoving things
- around. It contains no internal offsets or pointers for this
- purpose.
-
- - An example of code for iterating nodes & retreiving properties
- directly from the flattened tree format can be found in the kernel
- file arch/ppc64/kernel/prom.c, look at scan_flat_dt() function,
- it's usage in early_init_devtree(), and the corresponding various
- early_init_dt_scan_*() callbacks. That code can be re-used in a
- GPL bootloader, and as the author of that code, I would be happy
- do discuss possible free licencing to any vendor who wishes to
- integrate all or part of this code into a non-GPL bootloader.
-
-
-
-VI - System-on-a-chip devices and nodes
-=======================================
-
-Many companies are now starting to develop system-on-a-chip
-processors, where the processor core (cpu) and many peripheral devices
-exist on a single piece of silicon. For these SOCs, an SOC node
-should be used that defines child nodes for the devices that make
-up the SOC. While platforms are not required to use this model in
-order to boot the kernel, it is highly encouraged that all SOC
-implementations define as complete a flat-device-tree as possible to
-describe the devices on the SOC. This will allow for the
-genericization of much of the kernel code.
-
-
-1) Defining child nodes of an SOC
----------------------------------
-
-Each device that is part of an SOC may have its own node entry inside
-the SOC node. For each device that is included in the SOC, the unit
-address property represents the address offset for this device's
-memory-mapped registers in the parent's address space. The parent's
-address space is defined by the "ranges" property in the top-level soc
-node. The "reg" property for each node that exists directly under the
-SOC node should contain the address mapping from the child address space
-to the parent SOC address space and the size of the device's
-memory-mapped register file.
-
-For many devices that may exist inside an SOC, there are predefined
-specifications for the format of the device tree node. All SOC child
-nodes should follow these specifications, except where noted in this
-document.
-
-See appendix A for an example partial SOC node definition for the
-MPC8540.
-
-
-2) Specifying interrupt information for SOC devices
----------------------------------------------------
-
-Each device that is part of an SOC and which generates interrupts
-should have the following properties:
-
- - interrupt-parent : contains the phandle of the interrupt
- controller which handles interrupts for this device
- - interrupts : a list of tuples representing the interrupt
- number and the interrupt sense and level for each interupt
- for this device.
-
-This information is used by the kernel to build the interrupt table
-for the interrupt controllers in the system.
-
-Sense and level information should be encoded as follows:
-
- Devices connected to openPIC-compatible controllers should encode
- sense and polarity as follows:
-
- 0 = high to low edge sensitive type enabled
- 1 = active low level sensitive type enabled
- 2 = low to high edge sensitive type enabled
- 3 = active high level sensitive type enabled
-
- ISA PIC interrupt controllers should adhere to the ISA PIC
- encodings listed below:
-
- 0 = active low level sensitive type enabled
- 1 = active high level sensitive type enabled
- 2 = high to low edge sensitive type enabled
- 3 = low to high edge sensitive type enabled
-
-
-
-3) Representing devices without a current OF specification
-----------------------------------------------------------
-
-Currently, there are many devices on SOCs that do not have a standard
-representation pre-defined as part of the open firmware
-specifications, mainly because the boards that contain these SOCs are
-not currently booted using open firmware. This section contains
-descriptions for the SOC devices for which new nodes have been
-defined; this list will expand as more and more SOC-containing
-platforms are moved over to use the flattened-device-tree model.
-
- a) MDIO IO device
-
- The MDIO is a bus to which the PHY devices are connected. For each
- device that exists on this bus, a child node should be created. See
- the definition of the PHY node below for an example of how to define
- a PHY.
-
- Required properties:
- - reg : Offset and length of the register set for the device
- - device_type : Should be "mdio"
- - compatible : Should define the compatible device type for the
- mdio. Currently, this is most likely to be "gianfar"
-
- Example:
-
- mdio@24520 {
- reg = <24520 20>;
-
- ethernet-phy@0 {
- ......
- };
- };
-
-
- b) Gianfar-compatible ethernet nodes
-
- Required properties:
-
- - device_type : Should be "network"
- - model : Model of the device. Can be "TSEC", "eTSEC", or "FEC"
- - compatible : Should be "gianfar"
- - reg : Offset and length of the register set for the device
- - address : List of bytes representing the ethernet address of
- this controller
- - interrupts : <a b> where a is the interrupt number and b is a
- field that represents an encoding of the sense and level
- information for the interrupt. This should be encoded based on
- the information in section 2) depending on the type of interrupt
- controller you have.
- - interrupt-parent : the phandle for the interrupt controller that
- services interrupts for this device.
- - phy-handle : The phandle for the PHY connected to this ethernet
- controller.
-
- Example:
-
- ethernet@24000 {
- #size-cells = <0>;
- device_type = "network";
- model = "TSEC";
- compatible = "gianfar";
- reg = <24000 1000>;
- address = [ 00 E0 0C 00 73 00 ];
- interrupts = <d 3 e 3 12 3>;
- interrupt-parent = <40000>;
- phy-handle = <2452000>
- };
-
-
-
- c) PHY nodes
-
- Required properties:
-
- - device_type : Should be "ethernet-phy"
- - interrupts : <a b> where a is the interrupt number and b is a
- field that represents an encoding of the sense and level
- information for the interrupt. This should be encoded based on
- the information in section 2) depending on the type of interrupt
- controller you have.
- - interrupt-parent : the phandle for the interrupt controller that
- services interrupts for this device.
- - reg : The ID number for the phy, usually a small integer
- - linux,phandle : phandle for this node; likely referenced by an
- ethernet controller node.
-
-
- Example:
-
- ethernet-phy@0 {
- linux,phandle = <2452000>
- interrupt-parent = <40000>;
- interrupts = <35 1>;
- reg = <0>;
- device_type = "ethernet-phy";
- };
-
-
- d) Interrupt controllers
-
- Some SOC devices contain interrupt controllers that are different
- from the standard Open PIC specification. The SOC device nodes for
- these types of controllers should be specified just like a standard
- OpenPIC controller. Sense and level information should be encoded
- as specified in section 2) of this chapter for each device that
- specifies an interrupt.
-
- Example :
-
- pic@40000 {
- linux,phandle = <40000>;
- clock-frequency = <0>;
- interrupt-controller;
- #address-cells = <0>;
- reg = <40000 40000>;
- built-in;
- compatible = "chrp,open-pic";
- device_type = "open-pic";
- big-endian;
- };
-
-
- e) I2C
-
- Required properties :
-
- - device_type : Should be "i2c"
- - reg : Offset and length of the register set for the device
-
- Recommended properties :
-
- - compatible : Should be "fsl-i2c" for parts compatible with
- Freescale I2C specifications.
- - interrupts : <a b> where a is the interrupt number and b is a
- field that represents an encoding of the sense and level
- information for the interrupt. This should be encoded based on
- the information in section 2) depending on the type of interrupt
- controller you have.
- - interrupt-parent : the phandle for the interrupt controller that
- services interrupts for this device.
- - dfsrr : boolean; if defined, indicates that this I2C device has
- a digital filter sampling rate register
- - fsl5200-clocking : boolean; if defined, indicated that this device
- uses the FSL 5200 clocking mechanism.
-
- Example :
-
- i2c@3000 {
- interrupt-parent = <40000>;
- interrupts = <1b 3>;
- reg = <3000 18>;
- device_type = "i2c";
- compatible = "fsl-i2c";
- dfsrr;
- };
-
-
- More devices will be defined as this spec matures.
-
-
-Appendix A - Sample SOC node for MPC8540
-========================================
-
-Note that the #address-cells and #size-cells for the SoC node
-in this example have been explicitly listed; these are likely
-not necessary as they are usually the same as the root node.
-
- soc8540@e0000000 {
- #address-cells = <1>;
- #size-cells = <1>;
- #interrupt-cells = <2>;
- device_type = "soc";
- ranges = <00000000 e0000000 00100000>
- reg = <e0000000 00003000>;
-
- mdio@24520 {
- reg = <24520 20>;
- device_type = "mdio";
- compatible = "gianfar";
-
- ethernet-phy@0 {
- linux,phandle = <2452000>
- interrupt-parent = <40000>;
- interrupts = <35 1>;
- reg = <0>;
- device_type = "ethernet-phy";
- };
-
- ethernet-phy@1 {
- linux,phandle = <2452001>
- interrupt-parent = <40000>;
- interrupts = <35 1>;
- reg = <1>;
- device_type = "ethernet-phy";
- };
-
- ethernet-phy@3 {
- linux,phandle = <2452002>
- interrupt-parent = <40000>;
- interrupts = <35 1>;
- reg = <3>;
- device_type = "ethernet-phy";
- };
-
- };
-
- ethernet@24000 {
- #size-cells = <0>;
- device_type = "network";
- model = "TSEC";
- compatible = "gianfar";
- reg = <24000 1000>;
- address = [ 00 E0 0C 00 73 00 ];
- interrupts = <d 3 e 3 12 3>;
- interrupt-parent = <40000>;
- phy-handle = <2452000>;
- };
-
- ethernet@25000 {
- #address-cells = <1>;
- #size-cells = <0>;
- device_type = "network";
- model = "TSEC";
- compatible = "gianfar";
- reg = <25000 1000>;
- address = [ 00 E0 0C 00 73 01 ];
- interrupts = <13 3 14 3 18 3>;
- interrupt-parent = <40000>;
- phy-handle = <2452001>;
- };
-
- ethernet@26000 {
- #address-cells = <1>;
- #size-cells = <0>;
- device_type = "network";
- model = "FEC";
- compatible = "gianfar";
- reg = <26000 1000>;
- address = [ 00 E0 0C 00 73 02 ];
- interrupts = <19 3>;
- interrupt-parent = <40000>;
- phy-handle = <2452002>;
- };
-
- serial@4500 {
- device_type = "serial";
- compatible = "ns16550";
- reg = <4500 100>;
- clock-frequency = <0>;
- interrupts = <1a 3>;
- interrupt-parent = <40000>;
- };
-
- pic@40000 {
- linux,phandle = <40000>;
- clock-frequency = <0>;
- interrupt-controller;
- #address-cells = <0>;
- reg = <40000 40000>;
- built-in;
- compatible = "chrp,open-pic";
- device_type = "open-pic";
- big-endian;
- };
-
- i2c@3000 {
- interrupt-parent = <40000>;
- interrupts = <1b 3>;
- reg = <3000 18>;
- device_type = "i2c";
- compatible = "fsl-i2c";
- dfsrr;
- };
-
- };
diff --git a/Documentation/powerpc/bootwrapper.txt b/Documentation/powerpc/bootwrapper.txt
new file mode 100644
index 00000000000..d60fced5e1c
--- /dev/null
+++ b/Documentation/powerpc/bootwrapper.txt
@@ -0,0 +1,141 @@
+The PowerPC boot wrapper
+------------------------
+Copyright (C) Secret Lab Technologies Ltd.
+
+PowerPC image targets compresses and wraps the kernel image (vmlinux) with
+a boot wrapper to make it usable by the system firmware. There is no
+standard PowerPC firmware interface, so the boot wrapper is designed to
+be adaptable for each kind of image that needs to be built.
+
+The boot wrapper can be found in the arch/powerpc/boot/ directory. The
+Makefile in that directory has targets for all the available image types.
+The different image types are used to support all of the various firmware
+interfaces found on PowerPC platforms. OpenFirmware is the most commonly
+used firmware type on general purpose PowerPC systems from Apple, IBM and
+others. U-Boot is typically found on embedded PowerPC hardware, but there
+are a handful of other firmware implementations which are also popular. Each
+firmware interface requires a different image format.
+
+The boot wrapper is built from the makefile in arch/powerpc/boot/Makefile and
+it uses the wrapper script (arch/powerpc/boot/wrapper) to generate target
+image. The details of the build system is discussed in the next section.
+Currently, the following image format targets exist:
+
+ cuImage.%: Backwards compatible uImage for older version of
+ U-Boot (for versions that don't understand the device
+ tree). This image embeds a device tree blob inside
+ the image. The boot wrapper, kernel and device tree
+ are all embedded inside the U-Boot uImage file format
+ with boot wrapper code that extracts data from the old
+ bd_info structure and loads the data into the device
+ tree before jumping into the kernel.
+ Because of the series of #ifdefs found in the
+ bd_info structure used in the old U-Boot interfaces,
+ cuImages are platform specific. Each specific
+ U-Boot platform has a different platform init file
+ which populates the embedded device tree with data
+ from the platform specific bd_info file. The platform
+ specific cuImage platform init code can be found in
+ arch/powerpc/boot/cuboot.*.c. Selection of the correct
+ cuImage init code for a specific board can be found in
+ the wrapper structure.
+ dtbImage.%: Similar to zImage, except device tree blob is embedded
+ inside the image instead of provided by firmware. The
+ output image file can be either an elf file or a flat
+ binary depending on the platform.
+ dtbImages are used on systems which do not have an
+ interface for passing a device tree directly.
+ dtbImages are similar to simpleImages except that
+ dtbImages have platform specific code for extracting
+ data from the board firmware, but simpleImages do not
+ talk to the firmware at all.
+ PlayStation 3 support uses dtbImage. So do Embedded
+ Planet boards using the PlanetCore firmware. Board
+ specific initialization code is typically found in a
+ file named arch/powerpc/boot/<platform>.c; but this
+ can be overridden by the wrapper script.
+ simpleImage.%: Firmware independent compressed image that does not
+ depend on any particular firmware interface and embeds
+ a device tree blob. This image is a flat binary that
+ can be loaded to any location in RAM and jumped to.
+ Firmware cannot pass any configuration data to the
+ kernel with this image type and it depends entirely on
+ the embedded device tree for all information.
+ The simpleImage is useful for booting systems with
+ an unknown firmware interface or for booting from
+ a debugger when no firmware is present (such as on
+ the Xilinx Virtex platform). The only assumption that
+ simpleImage makes is that RAM is correctly initialized
+ and that the MMU is either off or has RAM mapped to
+ base address 0.
+ simpleImage also supports inserting special platform
+ specific initialization code to the start of the bootup
+ sequence. The virtex405 platform uses this feature to
+ ensure that the cache is invalidated before caching
+ is enabled. Platform specific initialization code is
+ added as part of the wrapper script and is keyed on
+ the image target name. For example, all
+ simpleImage.virtex405-* targets will add the
+ virtex405-head.S initialization code (This also means
+ that the dts file for virtex405 targets should be
+ named (virtex405-<board>.dts). Search the wrapper
+ script for 'virtex405' and see the file
+ arch/powerpc/boot/virtex405-head.S for details.
+ treeImage.%; Image format for used with OpenBIOS firmware found
+ on some ppc4xx hardware. This image embeds a device
+ tree blob inside the image.
+ uImage: Native image format used by U-Boot. The uImage target
+ does not add any boot code. It just wraps a compressed
+ vmlinux in the uImage data structure. This image
+ requires a version of U-Boot that is able to pass
+ a device tree to the kernel at boot. If using an older
+ version of U-Boot, then you need to use a cuImage
+ instead.
+ zImage.%: Image format which does not embed a device tree.
+ Used by OpenFirmware and other firmware interfaces
+ which are able to supply a device tree. This image
+ expects firmware to provide the device tree at boot.
+ Typically, if you have general purpose PowerPC
+ hardware then you want this image format.
+
+Image types which embed a device tree blob (simpleImage, dtbImage, treeImage,
+and cuImage) all generate the device tree blob from a file in the
+arch/powerpc/boot/dts/ directory. The Makefile selects the correct device
+tree source based on the name of the target. Therefore, if the kernel is
+built with 'make treeImage.walnut simpleImage.virtex405-ml403', then the
+build system will use arch/powerpc/boot/dts/walnut.dts to build
+treeImage.walnut and arch/powerpc/boot/dts/virtex405-ml403.dts to build
+the simpleImage.virtex405-ml403.
+
+Two special targets called 'zImage' and 'zImage.initrd' also exist. These
+targets build all the default images as selected by the kernel configuration.
+Default images are selected by the boot wrapper Makefile
+(arch/powerpc/boot/Makefile) by adding targets to the $image-y variable. Look
+at the Makefile to see which default image targets are available.
+
+How it is built
+---------------
+arch/powerpc is designed to support multiplatform kernels, which means
+that a single vmlinux image can be booted on many different target boards.
+It also means that the boot wrapper must be able to wrap for many kinds of
+images on a single build. The design decision was made to not use any
+conditional compilation code (#ifdef, etc) in the boot wrapper source code.
+All of the boot wrapper pieces are buildable at any time regardless of the
+kernel configuration. Building all the wrapper bits on every kernel build
+also ensures that obscure parts of the wrapper are at the very least compile
+tested in a large variety of environments.
+
+The wrapper is adapted for different image types at link time by linking in
+just the wrapper bits that are appropriate for the image type. The 'wrapper
+script' (found in arch/powerpc/boot/wrapper) is called by the Makefile and
+is responsible for selecting the correct wrapper bits for the image type.
+The arguments are well documented in the script's comment block, so they
+are not repeated here. However, it is worth mentioning that the script
+uses the -p (platform) argument as the main method of deciding which wrapper
+bits to compile in. Look for the large 'case "$platform" in' block in the
+middle of the script. This is also the place where platform specific fixups
+can be selected by changing the link order.
+
+In particular, care should be taken when working with cuImages. cuImage
+wrapper bits are very board specific and care should be taken to make sure
+the target you are trying to build is supported by the wrapper bits.
diff --git a/Documentation/powerpc/cpu_families.txt b/Documentation/powerpc/cpu_families.txt
new file mode 100644
index 00000000000..fc08e22feb1
--- /dev/null
+++ b/Documentation/powerpc/cpu_families.txt
@@ -0,0 +1,221 @@
+CPU Families
+============
+
+This document tries to summarise some of the different cpu families that exist
+and are supported by arch/powerpc.
+
+
+Book3S (aka sPAPR)
+------------------
+
+ - Hash MMU
+ - Mix of 32 & 64 bit
+
+ +--------------+ +----------------+
+ | Old POWER | --------------> | RS64 (threads) |
+ +--------------+ +----------------+
+ |
+ |
+ v
+ +--------------+ +----------------+ +------+
+ | 601 | --------------> | 603 | ---> | e300 |
+ +--------------+ +----------------+ +------+
+ | |
+ | |
+ v v
+ +--------------+ +----------------+ +-------+
+ | 604 | | 750 (G3) | ---> | 750CX |
+ +--------------+ +----------------+ +-------+
+ | | |
+ | | |
+ v v v
+ +--------------+ +----------------+ +-------+
+ | 620 (64 bit) | | 7400 | | 750CL |
+ +--------------+ +----------------+ +-------+
+ | | |
+ | | |
+ v v v
+ +--------------+ +----------------+ +-------+
+ | POWER3/630 | | 7410 | | 750FX |
+ +--------------+ +----------------+ +-------+
+ | |
+ | |
+ v v
+ +--------------+ +----------------+
+ | POWER3+ | | 7450 |
+ +--------------+ +----------------+
+ | |
+ | |
+ v v
+ +--------------+ +----------------+
+ | POWER4 | | 7455 |
+ +--------------+ +----------------+
+ | |
+ | |
+ v v
+ +--------------+ +-------+ +----------------+
+ | POWER4+ | --> | 970 | | 7447 |
+ +--------------+ +-------+ +----------------+
+ | | |
+ | | |
+ v v v
+ +--------------+ +-------+ +----------------+
+ | POWER5 | | 970FX | | 7448 |
+ +--------------+ +-------+ +----------------+
+ | | |
+ | | |
+ v v v
+ +--------------+ +-------+ +----------------+
+ | POWER5+ | | 970MP | | e600 |
+ +--------------+ +-------+ +----------------+
+ |
+ |
+ v
+ +--------------+
+ | POWER5++ |
+ +--------------+
+ |
+ |
+ v
+ +--------------+ +-------+
+ | POWER6 | <-?-> | Cell |
+ +--------------+ +-------+
+ |
+ |
+ v
+ +--------------+
+ | POWER7 |
+ +--------------+
+ |
+ |
+ v
+ +--------------+
+ | POWER7+ |
+ +--------------+
+ |
+ |
+ v
+ +--------------+
+ | POWER8 |
+ +--------------+
+
+
+ +---------------+
+ | PA6T (64 bit) |
+ +---------------+
+
+
+IBM BookE
+---------
+
+ - Software loaded TLB.
+ - All 32 bit
+
+ +--------------+
+ | 401 |
+ +--------------+
+ |
+ |
+ v
+ +--------------+
+ | 403 |
+ +--------------+
+ |
+ |
+ v
+ +--------------+
+ | 405 |
+ +--------------+
+ |
+ |
+ v
+ +--------------+
+ | 440 |
+ +--------------+
+ |
+ |
+ v
+ +--------------+ +----------------+
+ | 450 | --> | BG/P |
+ +--------------+ +----------------+
+ |
+ |
+ v
+ +--------------+
+ | 460 |
+ +--------------+
+ |
+ |
+ v
+ +--------------+
+ | 476 |
+ +--------------+
+
+
+Motorola/Freescale 8xx
+----------------------
+
+ - Software loaded with hardware assist.
+ - All 32 bit
+
+ +-------------+
+ | MPC8xx Core |
+ +-------------+
+
+
+Freescale BookE
+---------------
+
+ - Software loaded TLB.
+ - e6500 adds HW loaded indirect TLB entries.
+ - Mix of 32 & 64 bit
+
+ +--------------+
+ | e200 |
+ +--------------+
+
+
+ +--------------------------------+
+ | e500 |
+ +--------------------------------+
+ |
+ |
+ v
+ +--------------------------------+
+ | e500v2 |
+ +--------------------------------+
+ |
+ |
+ v
+ +--------------------------------+
+ | e500mc (Book3e) |
+ +--------------------------------+
+ |
+ |
+ v
+ +--------------------------------+
+ | e5500 (64 bit) |
+ +--------------------------------+
+ |
+ |
+ v
+ +--------------------------------+
+ | e6500 (HW TLB) (Multithreaded) |
+ +--------------------------------+
+
+
+IBM A2 core
+-----------
+
+ - Book3E, software loaded TLB + HW loaded indirect TLB entries.
+ - 64 bit
+
+ +--------------+ +----------------+
+ | A2 core | --> | WSP |
+ +--------------+ +----------------+
+ |
+ |
+ v
+ +--------------+
+ | BG/Q |
+ +--------------+
diff --git a/Documentation/powerpc/cpu_features.txt b/Documentation/powerpc/cpu_features.txt
index 472739880e8..ae09df8722c 100644
--- a/Documentation/powerpc/cpu_features.txt
+++ b/Documentation/powerpc/cpu_features.txt
@@ -11,10 +11,10 @@ split instruction and data caches, and if the CPU supports the DOZE and NAP
sleep modes.
Detection of the feature set is simple. A list of processors can be found in
-arch/ppc/kernel/cputable.c. The PVR register is masked and compared with each
-value in the list. If a match is found, the cpu_features of cur_cpu_spec is
-assigned to the feature bitmask for this processor and a __setup_cpu function
-is called.
+arch/powerpc/kernel/cputable.c. The PVR register is masked and compared with
+each value in the list. If a match is found, the cpu_features of cur_cpu_spec
+is assigned to the feature bitmask for this processor and a __setup_cpu
+function is called.
C code may test 'cur_cpu_spec[smp_processor_id()]->cpu_features' for a
particular feature bit. This is done in quite a few places, for example
@@ -31,7 +31,7 @@ anyways).
After detecting the processor type, the kernel patches out sections of code
that shouldn't be used by writing nop's over it. Using cpufeatures requires
-just 2 macros (found in include/asm-ppc/cputable.h), as seen in head.S
+just 2 macros (found in arch/powerpc/include/asm/cputable.h), as seen in head.S
transfer_to_handler:
#ifdef CONFIG_ALTIVEC
@@ -51,6 +51,6 @@ should be used in the majority of cases.
The END_FTR_SECTION macros are implemented by storing information about this
code in the '__ftr_fixup' ELF section. When do_cpu_ftr_fixups
-(arch/ppc/kernel/misc.S) is invoked, it will iterate over the records in
+(arch/powerpc/kernel/misc.S) is invoked, it will iterate over the records in
__ftr_fixup, and if the required feature is not present it will loop writing
nop's from each BEGIN_FTR_SECTION to END_FTR_SECTION.
diff --git a/Documentation/powerpc/eeh-pci-error-recovery.txt b/Documentation/powerpc/eeh-pci-error-recovery.txt
index 67a11a36270..9d4e33df624 100644
--- a/Documentation/powerpc/eeh-pci-error-recovery.txt
+++ b/Documentation/powerpc/eeh-pci-error-recovery.txt
@@ -36,8 +36,8 @@ Causes of EEH Errors
EEH was originally designed to guard against hardware failure, such
as PCI cards dying from heat, humidity, dust, vibration and bad
electrical connections. The vast majority of EEH errors seen in
-"real life" are due to eithr poorly seated PCI cards, or,
-unfortunately quite commonly, due device driver bugs, device firmware
+"real life" are due to either poorly seated PCI cards, or,
+unfortunately quite commonly, due to device driver bugs, device firmware
bugs, and sometimes PCI card hardware bugs.
The most common software bug, is one that causes the device to
@@ -90,7 +90,7 @@ EEH-isolated, there is a firmware call it can make to determine if
this is the case. If so, then the device driver should put itself
into a consistent state (given that it won't be able to complete any
pending work) and start recovery of the card. Recovery normally
-would consist of reseting the PCI device (holding the PCI #RST
+would consist of resetting the PCI device (holding the PCI #RST
line high for two seconds), followed by setting up the device
config space (the base address registers (BAR's), latency timer,
cache line size, interrupt line, and so on). This is followed by a
@@ -116,12 +116,12 @@ At this time, a generic EEH recovery mechanism has been implemented,
so that individual device drivers do not need to be modified to support
EEH recovery. This generic mechanism piggy-backs on the PCI hotplug
infrastructure, and percolates events up through the userspace/udev
-infrastructure. Followiing is a detailed description of how this is
+infrastructure. Following is a detailed description of how this is
accomplished.
EEH must be enabled in the PHB's very early during the boot process,
and if a PCI slot is hot-plugged. The former is performed by
-eeh_init() in arch/ppc64/kernel/eeh.c, and the later by
+eeh_init() in arch/powerpc/platforms/pseries/eeh.c, and the later by
drivers/pci/hotplug/pSeries_pci.c calling in to the eeh.c code.
EEH must be enabled before a PCI scan of the device can proceed.
Current Power5 hardware will not work unless EEH is enabled;
@@ -133,7 +133,7 @@ error. Given an arbitrary address, the routine
pci_get_device_by_addr() will find the pci device associated
with that address (if any).
-The default include/asm-ppc64/io.h macros readb(), inb(), insb(),
+The default arch/powerpc/include/asm/io.h macros readb(), inb(), insb(),
etc. include a check to see if the i/o read returned all-0xff's.
If so, these make a call to eeh_dn_check_failure(), which in turn
asks the firmware if the all-ff's value is the sign of a true EEH
@@ -143,11 +143,12 @@ seen in /proc/ppc64/eeh (subject to change). Normally, almost
all of these occur during boot, when the PCI bus is scanned, where
a large number of 0xff reads are part of the bus scan procedure.
-If a frozen slot is detected, code in arch/ppc64/kernel/eeh.c will
-print a stack trace to syslog (/var/log/messages). This stack trace
-has proven to be very useful to device-driver authors for finding
-out at what point the EEH error was detected, as the error itself
-usually occurs slightly beforehand.
+If a frozen slot is detected, code in
+arch/powerpc/platforms/pseries/eeh.c will print a stack trace to
+syslog (/var/log/messages). This stack trace has proven to be very
+useful to device-driver authors for finding out at what point the EEH
+error was detected, as the error itself usually occurs slightly
+beforehand.
Next, it uses the Linux kernel notifier chain/work queue mechanism to
allow any interested parties to find out about the failure. Device
diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt
new file mode 100644
index 00000000000..3007bc98af2
--- /dev/null
+++ b/Documentation/powerpc/firmware-assisted-dump.txt
@@ -0,0 +1,270 @@
+
+ Firmware-Assisted Dump
+ ------------------------
+ July 2011
+
+The goal of firmware-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+- Firmware assisted dump (fadump) infrastructure is intended to replace
+ the existing phyp assisted dump.
+- Fadump uses the same firmware interfaces and memory reservation model
+ as phyp assisted dump.
+- Unlike phyp dump, fadump exports the memory dump through /proc/vmcore
+ in the ELF format in the same way as kdump. This helps us reuse the
+ kdump infrastructure for dump capture and filtering.
+- Unlike phyp dump, userspace tool does not need to refer any sysfs
+ interface while reading /proc/vmcore.
+- Unlike phyp dump, fadump allows user to release all the memory reserved
+ for dump, with a single operation of echo 1 > /sys/kernel/fadump_release_mem.
+- Once enabled through kernel boot parameter, fadump can be
+ started/stopped through /sys/kernel/fadump_registered interface (see
+ sysfs files section below) and can be easily integrated with kdump
+ service start/stop init scripts.
+
+Comparing with kdump or other strategies, firmware-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+ with a fresh copy of the kernel. In particular,
+ PCI and I/O devices have been reinitialized and are
+ in a clean, consistent state.
+-- Once the dump is copied out, the memory that held the dump
+ is immediately available to the running kernel. And therefore,
+ unlike kdump, fadump doesn't need a 2nd reboot to get back
+ the system to the production configuration.
+
+The above can only be accomplished by coordination with,
+and assistance from the Power firmware. The procedure is
+as follows:
+
+-- The first kernel registers the sections of memory with the
+ Power firmware for dump preservation during OS initialization.
+ These registered sections of memory are reserved by the first
+ kernel during early boot.
+
+-- When a system crashes, the Power firmware will save
+ the low memory (boot memory of size larger of 5% of system RAM
+ or 256MB) of RAM to the previous registered region. It will
+ also save system registers, and hardware PTE's.
+
+ NOTE: The term 'boot memory' means size of the low memory chunk
+ that is required for a kernel to boot successfully when
+ booted with restricted memory. By default, the boot memory
+ size will be the larger of 5% of system RAM or 256MB.
+ Alternatively, user can also specify boot memory size
+ through boot parameter 'fadump_reserve_mem=' which will
+ override the default calculated size. Use this option
+ if default boot memory size is not sufficient for second
+ kernel to boot successfully.
+
+-- After the low memory (boot memory) area has been saved, the
+ firmware will reset PCI and other hardware state. It will
+ *not* clear the RAM. It will then launch the bootloader, as
+ normal.
+
+-- The freshly booted kernel will notice that there is a new
+ node (ibm,dump-kernel) in the device tree, indicating that
+ there is crash data available from a previous boot. During
+ the early boot OS will reserve rest of the memory above
+ boot memory size effectively booting with restricted memory
+ size. This will make sure that the second kernel will not
+ touch any of the dump memory area.
+
+-- User-space tools will read /proc/vmcore to obtain the contents
+ of memory, which holds the previous crashed kernel dump in ELF
+ format. The userspace tools may copy this info to disk, or
+ network, nas, san, iscsi, etc. as desired.
+
+-- Once the userspace tool is done saving dump, it will echo
+ '1' to /sys/kernel/fadump_release_mem to release the reserved
+ memory back to general use, except the memory required for
+ next firmware-assisted dump registration.
+
+ e.g.
+ # echo 1 > /sys/kernel/fadump_release_mem
+
+Please note that the firmware-assisted dump feature
+is only available on Power6 and above systems with recent
+firmware versions.
+
+Implementation details:
+----------------------
+
+During boot, a check is made to see if firmware supports
+this feature on that particular machine. If it does, then
+we check to see if an active dump is waiting for us. If yes
+then everything but boot memory size of RAM is reserved during
+early boot (See Fig. 2). This area is released once we finish
+collecting the dump from user land scripts (e.g. kdump scripts)
+that are run. If there is dump data, then the
+/sys/kernel/fadump_release_mem file is created, and the reserved
+memory is held.
+
+If there is no waiting dump data, then only the memory required
+to hold CPU state, HPTE region, boot memory dump and elfcore
+header, is reserved at the top of memory (see Fig. 1). This area
+is *not* released: this region will be kept permanently reserved,
+so that it can act as a receptacle for a copy of the boot memory
+content in addition to CPU state and HPTE region, in the case a
+crash does occur.
+
+ o Memory Reservation during first kernel
+
+ Low memory Top of memory
+ 0 boot memory size |
+ | | |<--Reserved dump area -->|
+ V V | Permanent Reservation V
+ +-----------+----------/ /----------+---+----+-----------+----+
+ | | |CPU|HPTE| DUMP |ELF |
+ +-----------+----------/ /----------+---+----+-----------+----+
+ | ^
+ | |
+ \ /
+ -------------------------------------------
+ Boot memory content gets transferred to
+ reserved area by firmware at the time of
+ crash
+ Fig. 1
+
+ o Memory Reservation during second kernel after crash
+
+ Low memory Top of memory
+ 0 boot memory size |
+ | |<------------- Reserved dump area ----------- -->|
+ V V V
+ +-----------+----------/ /----------+---+----+-----------+----+
+ | | |CPU|HPTE| DUMP |ELF |
+ +-----------+----------/ /----------+---+----+-----------+----+
+ | |
+ V V
+ Used by second /proc/vmcore
+ kernel to boot
+ Fig. 2
+
+Currently the dump will be copied from /proc/vmcore to a
+a new file upon user intervention. The dump data available through
+/proc/vmcore will be in ELF format. Hence the existing kdump
+infrastructure (kdump scripts) to save the dump works fine with
+minor modifications.
+
+The tools to examine the dump will be same as the ones
+used for kdump.
+
+How to enable firmware-assisted dump (fadump):
+-------------------------------------
+
+1. Set config option CONFIG_FA_DUMP=y and build kernel.
+2. Boot into linux kernel with 'fadump=on' kernel cmdline option.
+3. Optionally, user can also set 'fadump_reserve_mem=' kernel cmdline
+ to specify size of the memory to reserve for boot memory dump
+ preservation.
+
+NOTE: If firmware-assisted dump fails to reserve memory then it will
+ fallback to existing kdump mechanism if 'crashkernel=' option
+ is set at kernel cmdline.
+
+Sysfs/debugfs files:
+------------
+
+Firmware-assisted dump feature uses sysfs file system to hold
+the control files and debugfs file to display memory reserved region.
+
+Here is the list of files under kernel sysfs:
+
+ /sys/kernel/fadump_enabled
+
+ This is used to display the fadump status.
+ 0 = fadump is disabled
+ 1 = fadump is enabled
+
+ This interface can be used by kdump init scripts to identify if
+ fadump is enabled in the kernel and act accordingly.
+
+ /sys/kernel/fadump_registered
+
+ This is used to display the fadump registration status as well
+ as to control (start/stop) the fadump registration.
+ 0 = fadump is not registered.
+ 1 = fadump is registered and ready to handle system crash.
+
+ To register fadump echo 1 > /sys/kernel/fadump_registered and
+ echo 0 > /sys/kernel/fadump_registered for un-register and stop the
+ fadump. Once the fadump is un-registered, the system crash will not
+ be handled and vmcore will not be captured. This interface can be
+ easily integrated with kdump service start/stop.
+
+ /sys/kernel/fadump_release_mem
+
+ This file is available only when fadump is active during
+ second kernel. This is used to release the reserved memory
+ region that are held for saving crash dump. To release the
+ reserved memory echo 1 to it:
+
+ echo 1 > /sys/kernel/fadump_release_mem
+
+ After echo 1, the content of the /sys/kernel/debug/powerpc/fadump_region
+ file will change to reflect the new memory reservations.
+
+ The existing userspace tools (kdump infrastructure) can be easily
+ enhanced to use this interface to release the memory reserved for
+ dump and continue without 2nd reboot.
+
+Here is the list of files under powerpc debugfs:
+(Assuming debugfs is mounted on /sys/kernel/debug directory.)
+
+ /sys/kernel/debug/powerpc/fadump_region
+
+ This file shows the reserved memory regions if fadump is
+ enabled otherwise this file is empty. The output format
+ is:
+ <region>: [<start>-<end>] <reserved-size> bytes, Dumped: <dump-size>
+
+ e.g.
+ Contents when fadump is registered during first kernel
+
+ # cat /sys/kernel/debug/powerpc/fadump_region
+ CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x0
+ HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x0
+ DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x0
+
+ Contents when fadump is active during second kernel
+
+ # cat /sys/kernel/debug/powerpc/fadump_region
+ CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x40020
+ HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x1000
+ DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x10000000
+ : [0x00000010000000-0x0000006ffaffff] 0x5ffb0000 bytes, Dumped: 0x5ffb0000
+
+NOTE: Please refer to Documentation/filesystems/debugfs.txt on
+ how to mount the debugfs filesystem.
+
+
+TODO:
+-----
+ o Need to come up with the better approach to find out more
+ accurate boot memory size that is required for a kernel to
+ boot successfully when booted with restricted memory.
+ o The fadump implementation introduces a fadump crash info structure
+ in the scratch area before the ELF core header. The idea of introducing
+ this structure is to pass some important crash info data to the second
+ kernel which will help second kernel to populate ELF core header with
+ correct data before it gets exported through /proc/vmcore. The current
+ design implementation does not address a possibility of introducing
+ additional fields (in future) to this structure without affecting
+ compatibility. Need to come up with the better approach to address this.
+ The possible approaches are:
+ 1. Introduce version field for version tracking, bump up the version
+ whenever a new field is added to the structure in future. The version
+ field can be used to find out what fields are valid for the current
+ version of the structure.
+ 2. Reserve the area of predefined size (say PAGE_SIZE) for this
+ structure and have unused area as reserved (initialized to zero)
+ for future field additions.
+ The advantage of approach 1 over 2 is we don't need to reserve extra space.
+---
+Author: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
+This document is based on the original documentation written for phyp
+assisted dump by Linas Vepstas and Manish Ahuja.
diff --git a/Documentation/powerpc/hvcs.txt b/Documentation/powerpc/hvcs.txt
index dca75cbda6f..a730ca5a07f 100644
--- a/Documentation/powerpc/hvcs.txt
+++ b/Documentation/powerpc/hvcs.txt
@@ -259,7 +259,7 @@ This index of '2' means that in order to connect to vty-server adapter
It should be noted that due to the system hotplug I/O capabilities of a
system the /dev/hvcs* entry that interacts with a particular vty-server
-adapter is not guarenteed to remain the same across system reboots. Look
+adapter is not guaranteed to remain the same across system reboots. Look
in the Q & A section for more on this issue.
---------------------------------------------------------------------------
@@ -528,7 +528,7 @@ this driver assignment of hotplug added vty-servers may be in a different
order than how they would be exposed on module load. Rebooting or
reloading the module after dynamic addition may result in the /dev/hvcs*
and vty-server coupling changing if a vty-server adapter was added in a
-slot inbetween two other vty-server adapters. Refer to the section above
+slot between two other vty-server adapters. Refer to the section above
on how to determine which vty-server goes with which /dev/hvcs* node.
Hint; look at the sysfs "index" attribute for the vty-server.
@@ -558,9 +558,9 @@ partitions.
The proper channel for reporting bugs is either through the Linux OS
distribution company that provided your OS or by posting issues to the
-ppc64 development mailing list at:
+PowerPC development mailing list at:
-linuxppc64-dev@lists.linuxppc.org
+linuxppc-dev@lists.ozlabs.org
This request is to provide a documented and searchable public exchange
of the problems and solutions surrounding this driver for the benefit of
diff --git a/Documentation/powerpc/kvm_440.txt b/Documentation/powerpc/kvm_440.txt
new file mode 100644
index 00000000000..c02a003fa03
--- /dev/null
+++ b/Documentation/powerpc/kvm_440.txt
@@ -0,0 +1,41 @@
+Hollis Blanchard <hollisb@us.ibm.com>
+15 Apr 2008
+
+Various notes on the implementation of KVM for PowerPC 440:
+
+To enforce isolation, host userspace, guest kernel, and guest userspace all
+run at user privilege level. Only the host kernel runs in supervisor mode.
+Executing privileged instructions in the guest traps into KVM (in the host
+kernel), where we decode and emulate them. Through this technique, unmodified
+440 Linux kernels can be run (slowly) as guests. Future performance work will
+focus on reducing the overhead and frequency of these traps.
+
+The usual code flow is started from userspace invoking an "run" ioctl, which
+causes KVM to switch into guest context. We use IVPR to hijack the host
+interrupt vectors while running the guest, which allows us to direct all
+interrupts to kvmppc_handle_interrupt(). At this point, we could either
+- handle the interrupt completely (e.g. emulate "mtspr SPRG0"), or
+- let the host interrupt handler run (e.g. when the decrementer fires), or
+- return to host userspace (e.g. when the guest performs device MMIO)
+
+Address spaces: We take advantage of the fact that Linux doesn't use the AS=1
+address space (in host or guest), which gives us virtual address space to use
+for guest mappings. While the guest is running, the host kernel remains mapped
+in AS=0, but the guest can only use AS=1 mappings.
+
+TLB entries: The TLB entries covering the host linear mapping remain
+present while running the guest. This reduces the overhead of lightweight
+exits, which are handled by KVM running in the host kernel. We keep three
+copies of the TLB:
+ - guest TLB: contents of the TLB as the guest sees it
+ - shadow TLB: the TLB that is actually in hardware while guest is running
+ - host TLB: to restore TLB state when context switching guest -> host
+When a TLB miss occurs because a mapping was not present in the shadow TLB,
+but was present in the guest TLB, KVM handles the fault without invoking the
+guest. Large guest pages are backed by multiple 4KB shadow pages through this
+mechanism.
+
+IO: MMIO and DCR accesses are emulated by userspace. We use virtio for network
+and block IO, so those drivers must be enabled in the guest. It's possible
+that some qemu device emulation (e.g. e1000 or rtl8139) may also work with
+little effort.
diff --git a/Documentation/powerpc/mpc52xx.txt b/Documentation/powerpc/mpc52xx.txt
index 10dd4ab93b8..0d540a31ea1 100644
--- a/Documentation/powerpc/mpc52xx.txt
+++ b/Documentation/powerpc/mpc52xx.txt
@@ -2,7 +2,7 @@ Linux 2.6.x on MPC52xx family
-----------------------------
For the latest info, go to http://www.246tNt.com/mpc52xx/
-
+
To compile/use :
- U-Boot:
@@ -10,23 +10,23 @@ To compile/use :
if you wish to ).
# make lite5200_defconfig
# make uImage
-
+
then, on U-boot:
=> tftpboot 200000 uImage
=> tftpboot 400000 pRamdisk
=> bootm 200000 400000
-
+
- DBug:
# <edit Makefile to set ARCH=ppc & CROSS_COMPILE=... ( also EXTRAVERSION
if you wish to ).
# make lite5200_defconfig
# cp your_initrd.gz arch/ppc/boot/images/ramdisk.image.gz
- # make zImage.initrd
- # make
+ # make zImage.initrd
+ # make
then in DBug:
DBug> dn -i zImage.initrd.lite5200
-
+
Some remarks :
- The port is named mpc52xxx, and config options are PPC_MPC52xx. The MGT5100
diff --git a/Documentation/powerpc/pmu-ebb.txt b/Documentation/powerpc/pmu-ebb.txt
new file mode 100644
index 00000000000..73cd163dbfb
--- /dev/null
+++ b/Documentation/powerpc/pmu-ebb.txt
@@ -0,0 +1,137 @@
+PMU Event Based Branches
+========================
+
+Event Based Branches (EBBs) are a feature which allows the hardware to
+branch directly to a specified user space address when certain events occur.
+
+The full specification is available in Power ISA v2.07:
+
+ https://www.power.org/documentation/power-isa-version-2-07/
+
+One type of event for which EBBs can be configured is PMU exceptions. This
+document describes the API for configuring the Power PMU to generate EBBs,
+using the Linux perf_events API.
+
+
+Terminology
+-----------
+
+Throughout this document we will refer to an "EBB event" or "EBB events". This
+just refers to a struct perf_event which has set the "EBB" flag in its
+attr.config. All events which can be configured on the hardware PMU are
+possible "EBB events".
+
+
+Background
+----------
+
+When a PMU EBB occurs it is delivered to the currently running process. As such
+EBBs can only sensibly be used by programs for self-monitoring.
+
+It is a feature of the perf_events API that events can be created on other
+processes, subject to standard permission checks. This is also true of EBB
+events, however unless the target process enables EBBs (via mtspr(BESCR)) no
+EBBs will ever be delivered.
+
+This makes it possible for a process to enable EBBs for itself, but not
+actually configure any events. At a later time another process can come along
+and attach an EBB event to the process, which will then cause EBBs to be
+delivered to the first process. It's not clear if this is actually useful.
+
+
+When the PMU is configured for EBBs, all PMU interrupts are delivered to the
+user process. This means once an EBB event is scheduled on the PMU, no non-EBB
+events can be configured. This means that EBB events can not be run
+concurrently with regular 'perf' commands, or any other perf events.
+
+It is however safe to run 'perf' commands on a process which is using EBBs. The
+kernel will in general schedule the EBB event, and perf will be notified that
+its events could not run.
+
+The exclusion between EBB events and regular events is implemented using the
+existing "pinned" and "exclusive" attributes of perf_events. This means EBB
+events will be given priority over other events, unless they are also pinned.
+If an EBB event and a regular event are both pinned, then whichever is enabled
+first will be scheduled and the other will be put in error state. See the
+section below titled "Enabling an EBB event" for more information.
+
+
+Creating an EBB event
+---------------------
+
+To request that an event is counted using EBB, the event code should have bit
+63 set.
+
+EBB events must be created with a particular, and restrictive, set of
+attributes - this is so that they interoperate correctly with the rest of the
+perf_events subsystem.
+
+An EBB event must be created with the "pinned" and "exclusive" attributes set.
+Note that if you are creating a group of EBB events, only the leader can have
+these attributes set.
+
+An EBB event must NOT set any of the "inherit", "sample_period", "freq" or
+"enable_on_exec" attributes.
+
+An EBB event must be attached to a task. This is specified to perf_event_open()
+by passing a pid value, typically 0 indicating the current task.
+
+All events in a group must agree on whether they want EBB. That is all events
+must request EBB, or none may request EBB.
+
+EBB events must specify the PMC they are to be counted on. This ensures
+userspace is able to reliably determine which PMC the event is scheduled on.
+
+
+Enabling an EBB event
+---------------------
+
+Once an EBB event has been successfully opened, it must be enabled with the
+perf_events API. This can be achieved either via the ioctl() interface, or the
+prctl() interface.
+
+However, due to the design of the perf_events API, enabling an event does not
+guarantee that it has been scheduled on the PMU. To ensure that the EBB event
+has been scheduled on the PMU, you must perform a read() on the event. If the
+read() returns EOF, then the event has not been scheduled and EBBs are not
+enabled.
+
+This behaviour occurs because the EBB event is pinned and exclusive. When the
+EBB event is enabled it will force all other non-pinned events off the PMU. In
+this case the enable will be successful. However if there is already an event
+pinned on the PMU then the enable will not be successful.
+
+
+Reading an EBB event
+--------------------
+
+It is possible to read() from an EBB event. However the results are
+meaningless. Because interrupts are being delivered to the user process the
+kernel is not able to count the event, and so will return a junk value.
+
+
+Closing an EBB event
+--------------------
+
+When an EBB event is finished with, you can close it using close() as for any
+regular event. If this is the last EBB event the PMU will be deconfigured and
+no further PMU EBBs will be delivered.
+
+
+EBB Handler
+-----------
+
+The EBB handler is just regular userspace code, however it must be written in
+the style of an interrupt handler. When the handler is entered all registers
+are live (possibly) and so must be saved somehow before the handler can invoke
+other code.
+
+It's up to the program how to handle this. For C programs a relatively simple
+option is to create an interrupt frame on the stack and save registers there.
+
+Fork
+----
+
+EBB events are not inherited across fork. If the child process wishes to use
+EBBs it should open a new event for itself. Similarly the EBB state in
+BESCR/EBBHR/EBBRR is cleared across fork().
diff --git a/Documentation/powerpc/ppc_htab.txt b/Documentation/powerpc/ppc_htab.txt
deleted file mode 100644
index 8b8c7df29fa..00000000000
--- a/Documentation/powerpc/ppc_htab.txt
+++ /dev/null
@@ -1,118 +0,0 @@
- Information about /proc/ppc_htab
-=====================================================================
-
-This document and the related code was written by me (Cort Dougan), please
-email me (cort@fsmlabs.com) if you have questions, comments or corrections.
-
-Last Change: 2.16.98
-
-This entry in the proc directory is readable by all users but only
-writable by root.
-
-The ppc_htab interface is a user level way of accessing the
-performance monitoring registers as well as providing information
-about the PTE hash table.
-
-1. Reading
-
- Reading this file will give you information about the memory management
- hash table that serves as an extended tlb for page translation on the
- powerpc. It will also give you information about performance measurement
- specific to the cpu that you are using.
-
- Explanation of the 604 Performance Monitoring Fields:
- MMCR0 - the current value of the MMCR0 register
- PMC1
- PMC2 - the value of the performance counters and a
- description of what events they are counting
- which are based on MMCR0 bit settings.
- Explanation of the PTE Hash Table fields:
-
- Size - hash table size in Kb.
- Buckets - number of buckets in the table.
- Address - the virtual kernel address of the hash table base.
- Entries - the number of ptes that can be stored in the hash table.
- User/Kernel - how many pte's are in use by the kernel or user at that time.
- Overflows - How many of the entries are in their secondary hash location.
- Percent full - ratio of free pte entries to in use entries.
- Reloads - Count of how many hash table misses have occurred
- that were fixed with a reload from the linux tables.
- Should always be 0 on 603 based machines.
- Non-error Misses - Count of how many hash table misses have occurred
- that were completed with the creation of a pte in the linux
- tables with a call to do_page_fault().
- Error Misses - Number of misses due to errors such as bad address
- and permission violations. This includes kernel access of
- bad user addresses that are fixed up by the trap handler.
-
- Note that calculation of the data displayed from /proc/ppc_htab takes
- a long time and spends a great deal of time in the kernel. It would
- be quite hard on performance to read this file constantly. In time
- there may be a counter in the kernel that allows successive reads from
- this file only after a given amount of time has passed to reduce the
- possibility of a user slowing the system by reading this file.
-
-2. Writing
-
- Writing to the ppc_htab allows you to change the characteristics of
- the powerpc PTE hash table and setup performance monitoring.
-
- Resizing the PTE hash table is not enabled right now due to many
- complications with moving the hash table, rehashing the entries
- and many many SMP issues that would have to be dealt with.
-
- Write options to ppc_htab:
-
- - To set the size of the hash table to 64Kb:
-
- echo 'size 64' > /proc/ppc_htab
-
- The size must be a multiple of 64 and must be greater than or equal to
- 64.
-
- - To turn off performance monitoring:
-
- echo 'off' > /proc/ppc_htab
-
- - To reset the counters without changing what they're counting:
-
- echo 'reset' > /proc/ppc_htab
-
- Note that counting will continue after the reset if it is enabled.
-
- - To count only events in user mode or only in kernel mode:
-
- echo 'user' > /proc/ppc_htab
- ...or...
- echo 'kernel' > /proc/ppc_htab
-
- Note that these two options are exclusive of one another and the
- lack of either of these options counts user and kernel.
- Using 'reset' and 'off' reset these flags.
-
- - The 604 has 2 performance counters which can each count events from
- a specific set of events. These sets are disjoint so it is not
- possible to count _any_ combination of 2 events. One event can
- be counted by PMC1 and one by PMC2.
-
- To start counting a particular event use:
-
- echo 'event' > /proc/ppc_htab
-
- and choose from these events:
-
- PMC1
- ----
- 'ic miss' - instruction cache misses
- 'dtlb' - data tlb misses (not hash table misses)
-
- PMC2
- ----
- 'dc miss' - data cache misses
- 'itlb' - instruction tlb misses (not hash table misses)
- 'load miss time' - cycles to complete a load miss
-
-3. Bugs
-
- The PMC1 and PMC2 counters can overflow and give no indication of that
- in /proc/ppc_htab.
diff --git a/Documentation/powerpc/ptrace.txt b/Documentation/powerpc/ptrace.txt
new file mode 100644
index 00000000000..99c5ce88d0f
--- /dev/null
+++ b/Documentation/powerpc/ptrace.txt
@@ -0,0 +1,151 @@
+GDB intends to support the following hardware debug features of BookE
+processors:
+
+4 hardware breakpoints (IAC)
+2 hardware watchpoints (read, write and read-write) (DAC)
+2 value conditions for the hardware watchpoints (DVC)
+
+For that, we need to extend ptrace so that GDB can query and set these
+resources. Since we're extending, we're trying to create an interface
+that's extendable and that covers both BookE and server processors, so
+that GDB doesn't need to special-case each of them. We added the
+following 3 new ptrace requests.
+
+1. PTRACE_PPC_GETHWDEBUGINFO
+
+Query for GDB to discover the hardware debug features. The main info to
+be returned here is the minimum alignment for the hardware watchpoints.
+BookE processors don't have restrictions here, but server processors have
+an 8-byte alignment restriction for hardware watchpoints. We'd like to avoid
+adding special cases to GDB based on what it sees in AUXV.
+
+Since we're at it, we added other useful info that the kernel can return to
+GDB: this query will return the number of hardware breakpoints, hardware
+watchpoints and whether it supports a range of addresses and a condition.
+The query will fill the following structure provided by the requesting process:
+
+struct ppc_debug_info {
+ unit32_t version;
+ unit32_t num_instruction_bps;
+ unit32_t num_data_bps;
+ unit32_t num_condition_regs;
+ unit32_t data_bp_alignment;
+ unit32_t sizeof_condition; /* size of the DVC register */
+ uint64_t features; /* bitmask of the individual flags */
+};
+
+features will have bits indicating whether there is support for:
+
+#define PPC_DEBUG_FEATURE_INSN_BP_RANGE 0x1
+#define PPC_DEBUG_FEATURE_INSN_BP_MASK 0x2
+#define PPC_DEBUG_FEATURE_DATA_BP_RANGE 0x4
+#define PPC_DEBUG_FEATURE_DATA_BP_MASK 0x8
+#define PPC_DEBUG_FEATURE_DATA_BP_DAWR 0x10
+
+2. PTRACE_SETHWDEBUG
+
+Sets a hardware breakpoint or watchpoint, according to the provided structure:
+
+struct ppc_hw_breakpoint {
+ uint32_t version;
+#define PPC_BREAKPOINT_TRIGGER_EXECUTE 0x1
+#define PPC_BREAKPOINT_TRIGGER_READ 0x2
+#define PPC_BREAKPOINT_TRIGGER_WRITE 0x4
+ uint32_t trigger_type; /* only some combinations allowed */
+#define PPC_BREAKPOINT_MODE_EXACT 0x0
+#define PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE 0x1
+#define PPC_BREAKPOINT_MODE_RANGE_EXCLUSIVE 0x2
+#define PPC_BREAKPOINT_MODE_MASK 0x3
+ uint32_t addr_mode; /* address match mode */
+
+#define PPC_BREAKPOINT_CONDITION_MODE 0x3
+#define PPC_BREAKPOINT_CONDITION_NONE 0x0
+#define PPC_BREAKPOINT_CONDITION_AND 0x1
+#define PPC_BREAKPOINT_CONDITION_EXACT 0x1 /* different name for the same thing as above */
+#define PPC_BREAKPOINT_CONDITION_OR 0x2
+#define PPC_BREAKPOINT_CONDITION_AND_OR 0x3
+#define PPC_BREAKPOINT_CONDITION_BE_ALL 0x00ff0000 /* byte enable bits */
+#define PPC_BREAKPOINT_CONDITION_BE(n) (1<<((n)+16))
+ uint32_t condition_mode; /* break/watchpoint condition flags */
+
+ uint64_t addr;
+ uint64_t addr2;
+ uint64_t condition_value;
+};
+
+A request specifies one event, not necessarily just one register to be set.
+For instance, if the request is for a watchpoint with a condition, both the
+DAC and DVC registers will be set in the same request.
+
+With this GDB can ask for all kinds of hardware breakpoints and watchpoints
+that the BookE supports. COMEFROM breakpoints available in server processors
+are not contemplated, but that is out of the scope of this work.
+
+ptrace will return an integer (handle) uniquely identifying the breakpoint or
+watchpoint just created. This integer will be used in the PTRACE_DELHWDEBUG
+request to ask for its removal. Return -ENOSPC if the requested breakpoint
+can't be allocated on the registers.
+
+Some examples of using the structure to:
+
+- set a breakpoint in the first breakpoint register
+
+ p.version = PPC_DEBUG_CURRENT_VERSION;
+ p.trigger_type = PPC_BREAKPOINT_TRIGGER_EXECUTE;
+ p.addr_mode = PPC_BREAKPOINT_MODE_EXACT;
+ p.condition_mode = PPC_BREAKPOINT_CONDITION_NONE;
+ p.addr = (uint64_t) address;
+ p.addr2 = 0;
+ p.condition_value = 0;
+
+- set a watchpoint which triggers on reads in the second watchpoint register
+
+ p.version = PPC_DEBUG_CURRENT_VERSION;
+ p.trigger_type = PPC_BREAKPOINT_TRIGGER_READ;
+ p.addr_mode = PPC_BREAKPOINT_MODE_EXACT;
+ p.condition_mode = PPC_BREAKPOINT_CONDITION_NONE;
+ p.addr = (uint64_t) address;
+ p.addr2 = 0;
+ p.condition_value = 0;
+
+- set a watchpoint which triggers only with a specific value
+
+ p.version = PPC_DEBUG_CURRENT_VERSION;
+ p.trigger_type = PPC_BREAKPOINT_TRIGGER_READ;
+ p.addr_mode = PPC_BREAKPOINT_MODE_EXACT;
+ p.condition_mode = PPC_BREAKPOINT_CONDITION_AND | PPC_BREAKPOINT_CONDITION_BE_ALL;
+ p.addr = (uint64_t) address;
+ p.addr2 = 0;
+ p.condition_value = (uint64_t) condition;
+
+- set a ranged hardware breakpoint
+
+ p.version = PPC_DEBUG_CURRENT_VERSION;
+ p.trigger_type = PPC_BREAKPOINT_TRIGGER_EXECUTE;
+ p.addr_mode = PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE;
+ p.condition_mode = PPC_BREAKPOINT_CONDITION_NONE;
+ p.addr = (uint64_t) begin_range;
+ p.addr2 = (uint64_t) end_range;
+ p.condition_value = 0;
+
+- set a watchpoint in server processors (BookS)
+
+ p.version = 1;
+ p.trigger_type = PPC_BREAKPOINT_TRIGGER_RW;
+ p.addr_mode = PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE;
+ or
+ p.addr_mode = PPC_BREAKPOINT_MODE_EXACT;
+
+ p.condition_mode = PPC_BREAKPOINT_CONDITION_NONE;
+ p.addr = (uint64_t) begin_range;
+ /* For PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE addr2 needs to be specified, where
+ * addr2 - addr <= 8 Bytes.
+ */
+ p.addr2 = (uint64_t) end_range;
+ p.condition_value = 0;
+
+3. PTRACE_DELHWDEBUG
+
+Takes an integer which identifies an existing breakpoint or watchpoint
+(i.e., the value returned from PTRACE_SETHWDEBUG), and deletes the
+corresponding breakpoint or watchpoint..
diff --git a/Documentation/powerpc/qe_firmware.txt b/Documentation/powerpc/qe_firmware.txt
new file mode 100644
index 00000000000..2031ddb33d0
--- /dev/null
+++ b/Documentation/powerpc/qe_firmware.txt
@@ -0,0 +1,295 @@
+ Freescale QUICC Engine Firmware Uploading
+ -----------------------------------------
+
+(c) 2007 Timur Tabi <timur at freescale.com>,
+ Freescale Semiconductor
+
+Table of Contents
+=================
+
+ I - Software License for Firmware
+
+ II - Microcode Availability
+
+ III - Description and Terminology
+
+ IV - Microcode Programming Details
+
+ V - Firmware Structure Layout
+
+ VI - Sample Code for Creating Firmware Files
+
+Revision Information
+====================
+
+November 30, 2007: Rev 1.0 - Initial version
+
+I - Software License for Firmware
+=================================
+
+Each firmware file comes with its own software license. For information on
+the particular license, please see the license text that is distributed with
+the firmware.
+
+II - Microcode Availability
+===========================
+
+Firmware files are distributed through various channels. Some are available on
+http://opensource.freescale.com. For other firmware files, please contact
+your Freescale representative or your operating system vendor.
+
+III - Description and Terminology
+================================
+
+In this document, the term 'microcode' refers to the sequence of 32-bit
+integers that compose the actual QE microcode.
+
+The term 'firmware' refers to a binary blob that contains the microcode as
+well as other data that
+
+ 1) describes the microcode's purpose
+ 2) describes how and where to upload the microcode
+ 3) specifies the values of various registers
+ 4) includes additional data for use by specific device drivers
+
+Firmware files are binary files that contain only a firmware.
+
+IV - Microcode Programming Details
+===================================
+
+The QE architecture allows for only one microcode present in I-RAM for each
+RISC processor. To replace any current microcode, a full QE reset (which
+disables the microcode) must be performed first.
+
+QE microcode is uploaded using the following procedure:
+
+1) The microcode is placed into I-RAM at a specific location, using the
+ IRAM.IADD and IRAM.IDATA registers.
+
+2) The CERCR.CIR bit is set to 0 or 1, depending on whether the firmware
+ needs split I-RAM. Split I-RAM is only meaningful for SOCs that have
+ QEs with multiple RISC processors, such as the 8360. Splitting the I-RAM
+ allows each processor to run a different microcode, effectively creating an
+ asymmetric multiprocessing (AMP) system.
+
+3) The TIBCR trap registers are loaded with the addresses of the trap handlers
+ in the microcode.
+
+4) The RSP.ECCR register is programmed with the value provided.
+
+5) If necessary, device drivers that need the virtual traps and extended mode
+ data will use them.
+
+Virtual Microcode Traps
+
+These virtual traps are conditional branches in the microcode. These are
+"soft" provisional introduced in the ROMcode in order to enable higher
+flexibility and save h/w traps If new features are activated or an issue is
+being fixed in the RAM package utilizing they should be activated. This data
+structure signals the microcode which of these virtual traps is active.
+
+This structure contains 6 words that the application should copy to some
+specific been defined. This table describes the structure.
+
+ ---------------------------------------------------------------
+ | Offset in | | Destination Offset | Size of |
+ | array | Protocol | within PRAM | Operand |
+ --------------------------------------------------------------|
+ | 0 | Ethernet | 0xF8 | 4 bytes |
+ | | interworking | | |
+ ---------------------------------------------------------------
+ | 4 | ATM | 0xF8 | 4 bytes |
+ | | interworking | | |
+ ---------------------------------------------------------------
+ | 8 | PPP | 0xF8 | 4 bytes |
+ | | interworking | | |
+ ---------------------------------------------------------------
+ | 12 | Ethernet RX | 0x22 | 1 byte |
+ | | Distributor Page | | |
+ ---------------------------------------------------------------
+ | 16 | ATM Globtal | 0x28 | 1 byte |
+ | | Params Table | | |
+ ---------------------------------------------------------------
+ | 20 | Insert Frame | 0xF8 | 4 bytes |
+ ---------------------------------------------------------------
+
+
+Extended Modes
+
+This is a double word bit array (64 bits) that defines special functionality
+which has an impact on the softwarew drivers. Each bit has its own impact
+and has special instructions for the s/w associated with it. This structure is
+described in this table:
+
+ -----------------------------------------------------------------------
+ | Bit # | Name | Description |
+ -----------------------------------------------------------------------
+ | 0 | General | Indicates that prior to each host command |
+ | | push command | given by the application, the software must |
+ | | | assert a special host command (push command)|
+ | | | CECDR = 0x00800000. |
+ | | | CECR = 0x01c1000f. |
+ -----------------------------------------------------------------------
+ | 1 | UCC ATM | Indicates that after issuing ATM RX INIT |
+ | | RX INIT | command, the host must issue another special|
+ | | push command | command (push command) and immediately |
+ | | | following that re-issue the ATM RX INIT |
+ | | | command. (This makes the sequence of |
+ | | | initializing the ATM receiver a sequence of |
+ | | | three host commands) |
+ | | | CECDR = 0x00800000. |
+ | | | CECR = 0x01c1000f. |
+ -----------------------------------------------------------------------
+ | 2 | Add/remove | Indicates that following the specific host |
+ | | command | command: "Add/Remove entry in Hash Lookup |
+ | | validation | Table" used in Interworking setup, the user |
+ | | | must issue another command. |
+ | | | CECDR = 0xce000003. |
+ | | | CECR = 0x01c10f58. |
+ -----------------------------------------------------------------------
+ | 3 | General push | Indicates that the s/w has to initialize |
+ | | command | some pointers in the Ethernet thread pages |
+ | | | which are used when Header Compression is |
+ | | | activated. The full details of these |
+ | | | pointers is located in the software drivers.|
+ -----------------------------------------------------------------------
+ | 4 | General push | Indicates that after issuing Ethernet TX |
+ | | command | INIT command, user must issue this command |
+ | | | for each SNUM of Ethernet TX thread. |
+ | | | CECDR = 0x00800003. |
+ | | | CECR = 0x7'b{0}, 8'b{Enet TX thread SNUM}, |
+ | | | 1'b{1}, 12'b{0}, 4'b{1} |
+ -----------------------------------------------------------------------
+ | 5 - 31 | N/A | Reserved, set to zero. |
+ -----------------------------------------------------------------------
+
+V - Firmware Structure Layout
+==============================
+
+QE microcode from Freescale is typically provided as a header file. This
+header file contains macros that define the microcode binary itself as well as
+some other data used in uploading that microcode. The format of these files
+do not lend themselves to simple inclusion into other code. Hence,
+the need for a more portable format. This section defines that format.
+
+Instead of distributing a header file, the microcode and related data are
+embedded into a binary blob. This blob is passed to the qe_upload_firmware()
+function, which parses the blob and performs everything necessary to upload
+the microcode.
+
+All integers are big-endian. See the comments for function
+qe_upload_firmware() for up-to-date implementation information.
+
+This structure supports versioning, where the version of the structure is
+embedded into the structure itself. To ensure forward and backwards
+compatibility, all versions of the structure must use the same 'qe_header'
+structure at the beginning.
+
+'header' (type: struct qe_header):
+ The 'length' field is the size, in bytes, of the entire structure,
+ including all the microcode embedded in it, as well as the CRC (if
+ present).
+
+ The 'magic' field is an array of three bytes that contains the letters
+ 'Q', 'E', and 'F'. This is an identifier that indicates that this
+ structure is a QE Firmware structure.
+
+ The 'version' field is a single byte that indicates the version of this
+ structure. If the layout of the structure should ever need to be
+ changed to add support for additional types of microcode, then the
+ version number should also be changed.
+
+The 'id' field is a null-terminated string(suitable for printing) that
+identifies the firmware.
+
+The 'count' field indicates the number of 'microcode' structures. There
+must be one and only one 'microcode' structure for each RISC processor.
+Therefore, this field also represents the number of RISC processors for this
+SOC.
+
+The 'soc' structure contains the SOC numbers and revisions used to match
+the microcode to the SOC itself. Normally, the microcode loader should
+check the data in this structure with the SOC number and revisions, and
+only upload the microcode if there's a match. However, this check is not
+made on all platforms.
+
+Although it is not recommended, you can specify '0' in the soc.model
+field to skip matching SOCs altogether.
+
+The 'model' field is a 16-bit number that matches the actual SOC. The
+'major' and 'minor' fields are the major and minor revision numbers,
+respectively, of the SOC.
+
+For example, to match the 8323, revision 1.0:
+ soc.model = 8323
+ soc.major = 1
+ soc.minor = 0
+
+'padding' is necessary for structure alignment. This field ensures that the
+'extended_modes' field is aligned on a 64-bit boundary.
+
+'extended_modes' is a bitfield that defines special functionality which has an
+impact on the device drivers. Each bit has its own impact and has special
+instructions for the driver associated with it. This field is stored in
+the QE library and available to any driver that calles qe_get_firmware_info().
+
+'vtraps' is an array of 8 words that contain virtual trap values for each
+virtual traps. As with 'extended_modes', this field is stored in the QE
+library and available to any driver that calles qe_get_firmware_info().
+
+'microcode' (type: struct qe_microcode):
+ For each RISC processor there is one 'microcode' structure. The first
+ 'microcode' structure is for the first RISC, and so on.
+
+ The 'id' field is a null-terminated string suitable for printing that
+ identifies this particular microcode.
+
+ 'traps' is an array of 16 words that contain hardware trap values
+ for each of the 16 traps. If trap[i] is 0, then this particular
+ trap is to be ignored (i.e. not written to TIBCR[i]). The entire value
+ is written as-is to the TIBCR[i] register, so be sure to set the EN
+ and T_IBP bits if necessary.
+
+ 'eccr' is the value to program into the ECCR register.
+
+ 'iram_offset' is the offset into IRAM to start writing the
+ microcode.
+
+ 'count' is the number of 32-bit words in the microcode.
+
+ 'code_offset' is the offset, in bytes, from the beginning of this
+ structure where the microcode itself can be found. The first
+ microcode binary should be located immediately after the 'microcode'
+ array.
+
+ 'major', 'minor', and 'revision' are the major, minor, and revision
+ version numbers, respectively, of the microcode. If all values are 0,
+ then these fields are ignored.
+
+ 'reserved' is necessary for structure alignment. Since 'microcode'
+ is an array, the 64-bit 'extended_modes' field needs to be aligned
+ on a 64-bit boundary, and this can only happen if the size of
+ 'microcode' is a multiple of 8 bytes. To ensure that, we add
+ 'reserved'.
+
+After the last microcode is a 32-bit CRC. It can be calculated using
+this algorithm:
+
+u32 crc32(const u8 *p, unsigned int len)
+{
+ unsigned int i;
+ u32 crc = 0;
+
+ while (len--) {
+ crc ^= *p++;
+ for (i = 0; i < 8; i++)
+ crc = (crc >> 1) ^ ((crc & 1) ? 0xedb88320 : 0);
+ }
+ return crc;
+}
+
+VI - Sample Code for Creating Firmware Files
+============================================
+
+A Python program that creates firmware binaries from the header files normally
+distributed by Freescale can be found on http://opensource.freescale.com.
diff --git a/Documentation/powerpc/smp.txt b/Documentation/powerpc/smp.txt
deleted file mode 100644
index 5b581b849ff..00000000000
--- a/Documentation/powerpc/smp.txt
+++ /dev/null
@@ -1,34 +0,0 @@
- Information about Linux/PPC SMP mode
-=====================================================================
-
-This document and the related code was written by me
-(Cort Dougan, cort@fsmlabs.com) please email me if you have questions,
-comments or corrections.
-
-Last Change: 3.31.99
-
-If you want to help by writing code or testing different hardware please
-email me!
-
-1. State of Supported Hardware
-
- PowerSurge Architecture - tested on UMAX s900, Apple 9600
- The second processor on this machine boots up just fine and
- enters its idle loop. Hopefully a completely working SMP kernel
- on this machine will be done shortly.
-
- The code makes the assumption of only two processors. The changes
- necessary to work with any number would not be overly difficult but
- I don't have any machines with >2 processors so it's not high on my
- list of priorities. If anyone else would like do to the work email
- me and I can point out the places that need changed. If you have >2
- processors and don't want to add support yourself let me know and I
- can take a look into it.
-
- BeBox
- BeBox support hasn't been added to the 2.1.X kernels from 2.0.X
- but work is being done and SMP support for BeBox is in the works.
-
- CHRP
- CHRP SMP works and is fairly solid. It's been tested on the IBM F50
- with 4 processors for quite some time now.
diff --git a/Documentation/powerpc/sound.txt b/Documentation/powerpc/sound.txt
deleted file mode 100644
index df23d95e03a..00000000000
--- a/Documentation/powerpc/sound.txt
+++ /dev/null
@@ -1,81 +0,0 @@
- Information about PowerPC Sound support
-=====================================================================
-
-Please mail me (Cort Dougan, cort@fsmlabs.com) if you have questions,
-comments or corrections.
-
-Last Change: 6.16.99
-
-This just covers sound on the PReP and CHRP systems for now and later
-will contain information on the PowerMac's.
-
-Sound on PReP has been tested and is working with the PowerStack and IBM
-Power Series onboard sound systems which are based on the cs4231(2) chip.
-The sound options when doing the make config are a bit different from
-the default, though.
-
-The I/O base, irq and dma lines that you enter during the make config
-are ignored and are set when booting according to the machine type.
-This is so that one binary can be used for Motorola and IBM machines
-which use different values and isn't allowed by the driver, so things
-are hacked together in such a way as to allow this information to be
-set automatically on boot.
-
-1. Motorola PowerStack PReP machines
-
- Enable support for "Crystal CS4232 based (PnP) cards" and for the
- Microsoft Sound System. The MSS isn't used, but some of the routines
- that the CS4232 driver uses are in it.
-
- Although the options you set are ignored and determined automatically
- on boot these are included for information only:
-
- (830) CS4232 audio I/O base 530, 604, E80 or F40
- (10) CS4232 audio IRQ 5, 7, 9, 11, 12 or 15
- (6) CS4232 audio DMA 0, 1 or 3
- (7) CS4232 second (duplex) DMA 0, 1 or 3
-
- This will allow simultaneous record and playback, as 2 different dma
- channels are used.
-
- The sound will be all left channel and very low volume since the
- auxiliary input isn't muted by default. I had the changes necessary
- for this in the kernel but the sound driver maintainer didn't want
- to include them since it wasn't common in other machines. To fix this
- you need to mute it using a mixer utility of some sort (if you find one
- please let me know) or by patching the driver yourself and recompiling.
-
- There is a problem on the PowerStack 2's (PowerStack Pro's) using a
- different irq/drq than the kernel expects. Unfortunately, I don't know
- which irq/drq it is so if anyone knows please email me.
-
- Midi is not supported since the cs4232 driver doesn't support midi yet.
-
-2. IBM PowerPersonal PReP machines
-
- I've only tested sound on the Power Personal Series of IBM workstations
- so if you try it on others please let me know the result. I'm especially
- interested in the 43p's sound system, which I know nothing about.
-
- Enable support for "Crystal CS4232 based (PnP) cards" and for the
- Microsoft Sound System. The MSS isn't used, but some of the routines
- that the CS4232 driver uses are in it.
-
- Although the options you set are ignored and determined automatically
- on boot these are included for information only:
-
- (530) CS4232 audio I/O base 530, 604, E80 or F40
- (5) CS4232 audio IRQ 5, 7, 9, 11, 12 or 15
- (1) CS4232 audio DMA 0, 1 or 3
- (7) CS4232 second (duplex) DMA 0, 1 or 3
- (330) CS4232 MIDI I/O base 330, 370, 3B0 or 3F0
- (9) CS4232 MIDI IRQ 5, 7, 9, 11, 12 or 15
-
- This setup does _NOT_ allow for recording yet.
-
- Midi is not supported since the cs4232 driver doesn't support midi yet.
-
-2. IBM CHRP
-
- I have only tested this on the 43P-150. Build the kernel with the cs4232
- set as a module and load the module with irq=9 dma=1 dma2=2 io=0x550
diff --git a/Documentation/powerpc/transactional_memory.txt b/Documentation/powerpc/transactional_memory.txt
new file mode 100644
index 00000000000..9791e98ab49
--- /dev/null
+++ b/Documentation/powerpc/transactional_memory.txt
@@ -0,0 +1,198 @@
+Transactional Memory support
+============================
+
+POWER kernel support for this feature is currently limited to supporting
+its use by user programs. It is not currently used by the kernel itself.
+
+This file aims to sum up how it is supported by Linux and what behaviour you
+can expect from your user programs.
+
+
+Basic overview
+==============
+
+Hardware Transactional Memory is supported on POWER8 processors, and is a
+feature that enables a different form of atomic memory access. Several new
+instructions are presented to delimit transactions; transactions are
+guaranteed to either complete atomically or roll back and undo any partial
+changes.
+
+A simple transaction looks like this:
+
+begin_move_money:
+ tbegin
+ beq abort_handler
+
+ ld r4, SAVINGS_ACCT(r3)
+ ld r5, CURRENT_ACCT(r3)
+ subi r5, r5, 1
+ addi r4, r4, 1
+ std r4, SAVINGS_ACCT(r3)
+ std r5, CURRENT_ACCT(r3)
+
+ tend
+
+ b continue
+
+abort_handler:
+ ... test for odd failures ...
+
+ /* Retry the transaction if it failed because it conflicted with
+ * someone else: */
+ b begin_move_money
+
+
+The 'tbegin' instruction denotes the start point, and 'tend' the end point.
+Between these points the processor is in 'Transactional' state; any memory
+references will complete in one go if there are no conflicts with other
+transactional or non-transactional accesses within the system. In this
+example, the transaction completes as though it were normal straight-line code
+IF no other processor has touched SAVINGS_ACCT(r3) or CURRENT_ACCT(r3); an
+atomic move of money from the current account to the savings account has been
+performed. Even though the normal ld/std instructions are used (note no
+lwarx/stwcx), either *both* SAVINGS_ACCT(r3) and CURRENT_ACCT(r3) will be
+updated, or neither will be updated.
+
+If, in the meantime, there is a conflict with the locations accessed by the
+transaction, the transaction will be aborted by the CPU. Register and memory
+state will roll back to that at the 'tbegin', and control will continue from
+'tbegin+4'. The branch to abort_handler will be taken this second time; the
+abort handler can check the cause of the failure, and retry.
+
+Checkpointed registers include all GPRs, FPRs, VRs/VSRs, LR, CCR/CR, CTR, FPCSR
+and a few other status/flag regs; see the ISA for details.
+
+Causes of transaction aborts
+============================
+
+- Conflicts with cache lines used by other processors
+- Signals
+- Context switches
+- See the ISA for full documentation of everything that will abort transactions.
+
+
+Syscalls
+========
+
+Performing syscalls from within transaction is not recommended, and can lead
+to unpredictable results.
+
+Syscalls do not by design abort transactions, but beware: The kernel code will
+not be running in transactional state. The effect of syscalls will always
+remain visible, but depending on the call they may abort your transaction as a
+side-effect, read soon-to-be-aborted transactional data that should not remain
+invisible, etc. If you constantly retry a transaction that constantly aborts
+itself by calling a syscall, you'll have a livelock & make no progress.
+
+Simple syscalls (e.g. sigprocmask()) "could" be OK. Even things like write()
+from, say, printf() should be OK as long as the kernel does not access any
+memory that was accessed transactionally.
+
+Consider any syscalls that happen to work as debug-only -- not recommended for
+production use. Best to queue them up till after the transaction is over.
+
+
+Signals
+=======
+
+Delivery of signals (both sync and async) during transactions provides a second
+thread state (ucontext/mcontext) to represent the second transactional register
+state. Signal delivery 'treclaim's to capture both register states, so signals
+abort transactions. The usual ucontext_t passed to the signal handler
+represents the checkpointed/original register state; the signal appears to have
+arisen at 'tbegin+4'.
+
+If the sighandler ucontext has uc_link set, a second ucontext has been
+delivered. For future compatibility the MSR.TS field should be checked to
+determine the transactional state -- if so, the second ucontext in uc->uc_link
+represents the active transactional registers at the point of the signal.
+
+For 64-bit processes, uc->uc_mcontext.regs->msr is a full 64-bit MSR and its TS
+field shows the transactional mode.
+
+For 32-bit processes, the mcontext's MSR register is only 32 bits; the top 32
+bits are stored in the MSR of the second ucontext, i.e. in
+uc->uc_link->uc_mcontext.regs->msr. The top word contains the transactional
+state TS.
+
+However, basic signal handlers don't need to be aware of transactions
+and simply returning from the handler will deal with things correctly:
+
+Transaction-aware signal handlers can read the transactional register state
+from the second ucontext. This will be necessary for crash handlers to
+determine, for example, the address of the instruction causing the SIGSEGV.
+
+Example signal handler:
+
+ void crash_handler(int sig, siginfo_t *si, void *uc)
+ {
+ ucontext_t *ucp = uc;
+ ucontext_t *transactional_ucp = ucp->uc_link;
+
+ if (ucp_link) {
+ u64 msr = ucp->uc_mcontext.regs->msr;
+ /* May have transactional ucontext! */
+#ifndef __powerpc64__
+ msr |= ((u64)transactional_ucp->uc_mcontext.regs->msr) << 32;
+#endif
+ if (MSR_TM_ACTIVE(msr)) {
+ /* Yes, we crashed during a transaction. Oops. */
+ fprintf(stderr, "Transaction to be restarted at 0x%llx, but "
+ "crashy instruction was at 0x%llx\n",
+ ucp->uc_mcontext.regs->nip,
+ transactional_ucp->uc_mcontext.regs->nip);
+ }
+ }
+
+ fix_the_problem(ucp->dar);
+ }
+
+When in an active transaction that takes a signal, we need to be careful with
+the stack. It's possible that the stack has moved back up after the tbegin.
+The obvious case here is when the tbegin is called inside a function that
+returns before a tend. In this case, the stack is part of the checkpointed
+transactional memory state. If we write over this non transactionally or in
+suspend, we are in trouble because if we get a tm abort, the program counter and
+stack pointer will be back at the tbegin but our in memory stack won't be valid
+anymore.
+
+To avoid this, when taking a signal in an active transaction, we need to use
+the stack pointer from the checkpointed state, rather than the speculated
+state. This ensures that the signal context (written tm suspended) will be
+written below the stack required for the rollback. The transaction is aborted
+because of the treclaim, so any memory written between the tbegin and the
+signal will be rolled back anyway.
+
+For signals taken in non-TM or suspended mode, we use the
+normal/non-checkpointed stack pointer.
+
+
+Failure cause codes used by kernel
+==================================
+
+These are defined in <asm/reg.h>, and distinguish different reasons why the
+kernel aborted a transaction:
+
+ TM_CAUSE_RESCHED Thread was rescheduled.
+ TM_CAUSE_TLBI Software TLB invalide.
+ TM_CAUSE_FAC_UNAV FP/VEC/VSX unavailable trap.
+ TM_CAUSE_SYSCALL Currently unused; future syscalls that must abort
+ transactions for consistency will use this.
+ TM_CAUSE_SIGNAL Signal delivered.
+ TM_CAUSE_MISC Currently unused.
+ TM_CAUSE_ALIGNMENT Alignment fault.
+ TM_CAUSE_EMULATE Emulation that touched memory.
+
+These can be checked by the user program's abort handler as TEXASR[0:7]. If
+bit 7 is set, it indicates that the error is consider persistent. For example
+a TM_CAUSE_ALIGNMENT will be persistent while a TM_CAUSE_RESCHED will not.q
+
+GDB
+===
+
+GDB and ptrace are not currently TM-aware. If one stops during a transaction,
+it looks like the transaction has just started (the checkpointed state is
+presented). The transaction cannot then be continued and will take the failure
+handler route. Furthermore, the transactional 2nd register state will be
+inaccessible. GDB can currently be used on programs using TM, but not sensibly
+in parts within transactions.
diff --git a/Documentation/powerpc/zImage_layout.txt b/Documentation/powerpc/zImage_layout.txt
deleted file mode 100644
index 048e0150f57..00000000000
--- a/Documentation/powerpc/zImage_layout.txt
+++ /dev/null
@@ -1,47 +0,0 @@
- Information about the Linux/PPC kernel images
-=====================================================================
-
-Please mail me (Cort Dougan, cort@fsmlabs.com) if you have questions,
-comments or corrections.
-
-This document is meant to answer several questions I've had about how
-the PReP system boots and how Linux/PPC interacts with that mechanism.
-It would be nice if we could have information on how other architectures
-boot here as well. If you have anything to contribute, please
-let me know.
-
-
-1. PReP boot file
-
- This is the file necessary to boot PReP systems from floppy or
- hard drive. The firmware reads the PReP partition table entry
- and will load the image accordingly.
-
- To boot the zImage, copy it onto a floppy with dd if=zImage of=/dev/fd0h1440
- or onto a PReP hard drive partition with dd if=zImage of=/dev/sda4
- assuming you've created a PReP partition (type 0x41) with fdisk on
- /dev/sda4.
-
- The layout of the image format is:
-
- 0x0 +------------+
- | | PReP partition table entry
- | |
- 0x400 +------------+
- | | Bootstrap program code + data
- | |
- | |
- +------------+
- | | compressed kernel, elf header removed
- +------------+
- | | initrd (if loaded)
- +------------+
- | | Elf section table for bootstrap program
- +------------+
-
-
-2. MBX boot file
-
- The MBX boards can load an elf image, and relocate it to the
- proper location in memory - it copies the image to the location it was
- linked at.